H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Stars: ✭ 5,656 (+258.88%)

Mutual labels: big-data

Attic Lens

Mirror of Apache Lens

Stars: ✭ 58 (-96.32%)

Mutual labels: big-data

Zeppelin

Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.

Stars: ✭ 5,513 (+249.81%)

Mutual labels: big-data

Bigdataclass

Two-day workshop that covers how to use R to interact with databases and Spark

Stars: ✭ 110 (-93.02%)

Mutual labels: big-data

Scanner

Efficient video analysis at scale

Stars: ✭ 569 (-63.9%)

Mutual labels: big-data

Docker Spark Cluster

A Spark cluster setup running on Docker containers

Stars: ✭ 57 (-96.38%)

Mutual labels: big-data

Nipype

Workflows and interfaces for neuroimaging packages

Stars: ✭ 557 (-64.66%)

Mutual labels: big-data

Panoptes

A Global Scale Network Telemetry Ecosystem

Stars: ✭ 80 (-94.92%)

Mutual labels: big-data

Thrill

Thrill - An EXPERIMENTAL Algorithmic Distributed Big Data Batch Processing Framework in C++

Stars: ✭ 528 (-66.5%)

Mutual labels: big-data

Lifion Kinesis

A native Node.js producer and consumer library for Amazon Kinesis Data Streams

Stars: ✭ 54 (-96.57%)

Mutual labels: big-data

Beam

Apache Beam is a unified programming model for Batch and Streaming

Stars: ✭ 5,149 (+226.71%)

Mutual labels: big-data

Kudu

Mirror of Apache Kudu

Stars: ✭ 1,360 (-13.71%)

Mutual labels: big-data

Magellan

Geo Spatial Data Analytics on Spark

Stars: ✭ 507 (-67.83%)

Mutual labels: big-data

Oodt

Mirror of Apache OODT

Stars: ✭ 52 (-96.7%)

Mutual labels: big-data

Stream Framework

Stream Framework is a Python library that lets you build news feeds, activity streams and notification systems using Cassandra and/or Redis. The authors of Stream-Framework also provide a cloud service for feed technology:

Stars: ✭ 4,576 (+190.36%)

Mutual labels: big-data

Iotdb

Apache IoTDB

Stars: ✭ 1,221 (-22.53%)

Mutual labels: big-data

Redislite

Redis in a Python module.

Stars: ✭ 464 (-70.56%)

Mutual labels: big-data

Trck

Query engine for TrailDB

Stars: ✭ 48 (-96.95%)

Mutual labels: big-data

Courses

Quizzes and assignments from Coursera courses

Stars: ✭ 454 (-71.19%)

Mutual labels: big-data

Mysql perf analyzer

MySQL performance monitoring and analysis.

Stars: ✭ 1,423 (-9.71%)

Mutual labels: big-data

Data Science Ipython Notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command-line tools.

Stars: ✭ 22,048 (+1298.98%)

Mutual labels: big-data

Moosefs

MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)

Stars: ✭ 1,025 (-34.96%)

Mutual labels: big-data

Cortx

CORTX Community Object Storage is 100% open source object storage uniquely optimized for mass capacity storage devices.

Stars: ✭ 426 (-72.97%)

Mutual labels: big-data

Attic Predictionio Template Recommender

PredictionIO Recommendation Engine Template (Scala-based parallelized engine)

Stars: ✭ 78 (-95.05%)

Mutual labels: big-data

Datascience Ai Machinelearning Resources

Alex Castrounis' curated set of resources for artificial intelligence (AI), machine learning, data science, internet of things (IoT), and more.

Stars: ✭ 414 (-73.73%)

Mutual labels: big-data

Attaca

Robust, distributed version control for large files.

Stars: ✭ 41 (-97.4%)

Mutual labels: big-data

Cogcomp Nlp

CogComp's Natural Language Processing libraries and demos:

Stars: ✭ 410 (-73.98%)

Mutual labels: big-data

Logisland

Scalable stream processing platform for advanced real-time analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of ready-to-use processors, data sources and sinks is available.

Stars: ✭ 97 (-93.85%)

Mutual labels: big-data

Analysispreservation.cern.ch

Source code for the CERN Analysis Preservation portal

Stars: ✭ 37 (-97.65%)

Mutual labels: big-data

Genie

Distributed Big Data Orchestration Service

Stars: ✭ 1,544 (-2.03%)

Mutual labels: big-data

Spark R Notebooks

R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks