VerticaPy is a Python library that exposes scikit-like functionality for conducting data science projects on data stored in Vertica, taking advantage of Vertica’s speed and built-in analytics and machine learning capabilities.

Stars: ✭ 59 (-96.26%)

Mutual labels: big-data

Tennis Crystal Ball

Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction

Stars: ✭ 107 (-93.21%)

Mutual labels: big-data

Kibble 1

Apache Kibble - a tool to collect, aggregate and visualize data about any software project

Stars: ✭ 54 (-96.57%)

Mutual labels: big-data

Uproot4

ROOT I/O in pure Python and NumPy.

Stars: ✭ 80 (-94.92%)

Mutual labels: big-data

Datumbox Framework

Datumbox is an open-source Machine Learning framework written in Java which allows the rapid development of Machine Learning and Statistical applications.

Stars: ✭ 1,063 (-32.55%)

Mutual labels: big-data

Orc

An ORC file format reader and writer for Go.

Stars: ✭ 97 (-93.85%)

Mutual labels: big-data

Couchdb Couch

Mirror of Apache CouchDB

Stars: ✭ 43 (-97.27%)

Mutual labels: big-data

Spark Website

Apache Spark Website

Stars: ✭ 75 (-95.24%)

Mutual labels: big-data

Bookkeeper

Apache Bookkeeper

Stars: ✭ 1,178 (-25.25%)

Mutual labels: big-data

Analysispreservation.cern.ch

Source code for the CERN Analysis Preservation portal

Stars: ✭ 37 (-97.65%)

Mutual labels: big-data

Spark Py Notebooks

Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

Stars: ✭ 1,338 (-15.1%)

Mutual labels: big-data

Big Data Engineering Coursera Yandex

Big Data for Data Engineers Coursera Specialization from Yandex

Stars: ✭ 71 (-95.49%)

Mutual labels: big-data

Vizuka

Explore high-dimensional datasets and how your algo handles specific regions.

Stars: ✭ 100 (-93.65%)

Mutual labels: big-data

Countly Sdk Cordova

Countly Product Analytics SDK for Cordova, Icenium and Phonegap

Stars: ✭ 69 (-95.62%)

Mutual labels: big-data

Reef

Mirror of Apache REEF

Stars: ✭ 92 (-94.16%)

Mutual labels: big-data

Hazelcast Cpp Client

Hazelcast IMDG C++ Client

Stars: ✭ 67 (-95.75%)

Mutual labels: big-data

Attic Predictionio Sdk Java

PredictionIO Java SDK

Stars: ✭ 107 (-93.21%)

Mutual labels: big-data

Rsparkling

RSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)

Stars: ✭ 65 (-95.88%)

Mutual labels: big-data

Bitcoin Value Predictor

[NOT MAINTAINED] Predicting Bitcoin price using time series analysis and sentiment analysis of tweets about Bitcoin

Stars: ✭ 91 (-94.23%)

Mutual labels: big-data

Spark Doc Zh

Chinese translation of the official Apache Spark documentation

Stars: ✭ 1,126 (-28.55%)

Mutual labels: big-data

Samza Hello Samza

Mirror of Apache Samza

Stars: ✭ 99 (-93.72%)

Mutual labels: big-data

Nabhash

An extremely fast Non-crypto-safe AES Based Hash algorithm for Big Data

Stars: ✭ 62 (-96.07%)

Mutual labels: big-data

Parquet Mr

Apache Parquet

Stars: ✭ 1,278 (-18.91%)

Mutual labels: big-data

Attic Lens

Mirror of Apache Lens

Stars: ✭ 58 (-96.32%)

Mutual labels: big-data

Bigdataclass

Two-day workshop that covers how to use R to interact with databases and Spark

Stars: ✭ 110 (-93.02%)

Mutual labels: big-data

Docker Spark Cluster

A Spark cluster setup running on Docker containers

Stars: ✭ 57 (-96.38%)

Mutual labels: big-data

Panoptes

A Global Scale Network Telemetry Ecosystem

Stars: ✭ 80 (-94.92%)

Mutual labels: big-data

Lifion Kinesis

A native Node.js producer and consumer library for Amazon Kinesis Data Streams

Stars: ✭ 54 (-96.57%)

Mutual labels: big-data

Kudu

Mirror of Apache Kudu

Stars: ✭ 1,360 (-13.71%)

Mutual labels: big-data

Oodt

Mirror of Apache OODT

Stars: ✭ 52 (-96.7%)

Mutual labels: big-data

Iotdb

Apache IoTDB

Stars: ✭ 1,221 (-22.53%)

Mutual labels: big-data

Trck

Query engine for TrailDB

Stars: ✭ 48 (-96.95%)

Mutual labels: big-data

Mysql perf analyzer

MySQL performance monitoring and analysis.

Stars: ✭ 1,423 (-9.71%)

Mutual labels: big-data

Moosefs

MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)

Stars: ✭ 1,025 (-34.96%)

Mutual labels: big-data

Attic Predictionio Template Recommender

PredictionIO Recommendation Engine Template (Scala-based parallelized engine)

Stars: ✭ 78 (-95.05%)

Mutual labels: big-data

Attaca

Robust, distributed version control for large files.

Stars: ✭ 41 (-97.4%)

Mutual labels: big-data

Logisland

Scalable stream processing platform for advanced real-time analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of ready-to-use processors, data sources and sinks is available.

Stars: ✭ 97 (-93.85%)

Mutual labels: big-data

Cookbook

The Data Engineering Cookbook

Stars: ✭ 9,829 (+523.67%)

Mutual labels: big-data

Genie

Distributed Big Data Orchestration Service

Stars: ✭ 1,544 (-2.03%)

Mutual labels: big-data

Spark R Notebooks

R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks