Scalable stream processing platform for advanced real-time analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform performs complex event processing and is suitable for time series analysis. A large set of ready-to-use processors, data sources, and sinks is available.

Stars: ✭ 97 (-95.59%)

Mutual labels: big-data

SynapseML

Simple and Distributed Machine Learning

Stars: ✭ 3,355 (+52.5%)

Mutual labels: big-data

Sqoop

Mirror of Apache Sqoop

Stars: ✭ 817 (-62.86%)

Mutual labels: big-data

MLBD

Materials for "Machine Learning on Big Data" course

Stars: ✭ 20 (-99.09%)

Mutual labels: big-data

Couchdb Documentation

Apache CouchDB Documentation

Stars: ✭ 128 (-94.18%)

Mutual labels: big-data

Big-Data-Demo

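A data visualization showcase project based on Vue, three.js, and echarts, with features including 3D model import and interaction and 3D model annotation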

Stars: ✭ 146 (-93.36%)

Mutual labels: big-data

Titanoboa

Titanoboa makes complex workflows easy. It is a low-code workflow orchestration platform for JVM - distributed, highly scalable and fault tolerant.

Stars: ✭ 787 (-64.23%)

Mutual labels: big-data

talaria

TalariaDB is a distributed, highly available, and low latency time-series database for Presto

Stars: ✭ 148 (-93.27%)

Mutual labels: big-data

Spark Py Notebooks

Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

Stars: ✭ 1,338 (-39.18%)

Mutual labels: big-data

meetups-archivos

Slides, code, and videos from meetups, data science days, video calls, and workshops. Data Science Research is a non-profit organization that seeks to spread and decentralize knowledge of Data Science and Artificial Intelligence in Peru, giving opportunities to new talent through MeetUps, Workshops, and Semilleros …

Stars: ✭ 60 (-97.27%)

Mutual labels: big-data

Storm

Mirror of Apache Storm

Stars: ✭ 6,297 (+186.23%)

Mutual labels: big-data

bigquery-kafka-connect

☁️ nodejs kafka connect connector for Google BigQuery

Stars: ✭ 17 (-99.23%)

Mutual labels: big-data

Hydrograph

A visual ETL development and debugging tool for big data

Stars: ✭ 144 (-93.45%)

Mutual labels: big-data

LoL-Match-Prediction

Win probability predictions for League of Legends matches using neural networks

Stars: ✭ 34 (-98.45%)

Mutual labels: big-data

Cython

The most widely used Python to C compiler

Stars: ✭ 6,588 (+199.45%)

Mutual labels: big-data

insightedge

InsightEdge Core

Stars: ✭ 22 (-99%)

Mutual labels: big-data

Reef

Mirror of Apache REEF

Stars: ✭ 92 (-95.82%)

Mutual labels: big-data

cloudberry

Big Data Visualization

Stars: ✭ 89 (-95.95%)

Mutual labels: big-data

Samza

Mirror of Apache Samza

Stars: ✭ 676 (-69.27%)

Mutual labels: big-data

nebula

A distributed block-based data storage and compute engine

Stars: ✭ 127 (-94.23%)

Mutual labels: big-data

Azuredatalake

Samples and Docs for Azure Data Lake Store and Analytics

Stars: ✭ 128 (-94.18%)

Mutual labels: big-data

sparkucx

A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing UCX communication layer

Stars: ✭ 32 (-98.55%)

Mutual labels: big-data

Sdc

Intel® Scalable Dataframe Compiler for Pandas*

Stars: ✭ 623 (-71.68%)

Mutual labels: big-data

rastercube

rastercube is a python library for big data analysis of georeferenced time series data (e.g. MODIS NDVI)

Stars: ✭ 15 (-99.32%)

Mutual labels: big-data

Bitcoin Value Predictor

[NOT MAINTAINED] Predicting Bitcoin price using time series analysis and sentiment analysis of tweets about Bitcoin

Stars: ✭ 91 (-95.86%)

Mutual labels: big-data

flume-elasticsearch-sink

Flume sink plugin for Elasticsearch

Stars: ✭ 39 (-98.23%)

Mutual labels: flume

H2o 3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Stars: ✭ 5,656 (+157.09%)

Mutual labels: big-data

CS Book

🔥 Latest computer science e-books. Provides downloads of the latest technical e-books. "All I want is to out-compete every one of you, or be out-competed by you!"

Stars: ✭ 40 (-98.18%)

Mutual labels: big-data

Presto

The official home of the Presto distributed SQL query engine for big data

Stars: ✭ 12,957 (+488.95%)

Mutual labels: big-data

spark-records

Bulletproof Apache Spark jobs with fast root cause analysis of failures.

Stars: ✭ 67 (-96.95%)

Mutual labels: big-data

Zeppelin

Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.

Stars: ✭ 5,513 (+150.59%)

Mutual labels: big-data

RemoteShuffleService

Celeborn provides an elastic and high-performance service for shuffle and spilled data.

Stars: ✭ 262 (-88.09%)

Mutual labels: big-data

Parquet Mr

Apache Parquet

Stars: ✭ 1,278 (-41.91%)

Mutual labels: big-data

terraform-aws-kinesis-firehose

This code creates a Kinesis Firehose in AWS to send CloudWatch log data to S3.

Stars: ✭ 25 (-98.86%)

Mutual labels: big-data

Scanner

Efficient video analysis at scale

Stars: ✭ 569 (-74.14%)

Mutual labels: big-data

dxram

A distributed in-memory key-value storage for billions of small objects.

Stars: ✭ 25 (-98.86%)

Mutual labels: big-data

Griffon Vm

Griffon Data Science Virtual Machine

Stars: ✭ 128 (-94.18%)

Mutual labels: big-data

img2dataset

Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.

Stars: ✭ 1,173 (-46.68%)

Mutual labels: big-data

Nipype

Workflows and interfaces for neuroimaging packages

Stars: ✭ 557 (-74.68%)

Mutual labels: big-data

GDLibrary

Matlab library for gradient descent algorithms: Version 1.0.1

Stars: ✭ 50 (-97.73%)

Mutual labels: big-data

Panoptes

A Global Scale Network Telemetry Ecosystem

Stars: ✭ 80 (-96.36%)

Mutual labels: big-data

Thrill

Thrill - An EXPERIMENTAL Algorithmic Distributed Big Data Batch Processing Framework in C++

Stars: ✭ 528 (-76%)

Mutual labels: big-data

Bigdata Playground

A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL

Stars: ✭ 177 (-91.95%)

Mutual labels: big-data

Keyvi

Keyvi - a key value index that powers Cliqz search engine. It is an in-memory FST-based data structure highly optimized for size and lookup performance.

Stars: ✭ 171 (-92.23%)

Mutual labels: big-data

Fluo

Apache Fluo

Stars: ✭ 159 (-92.77%)

Mutual labels: big-data

100daysofmlcode

My journey to learn and grow in the domain of Machine Learning and Artificial Intelligence by performing the #100DaysofMLCode Challenge.

Stars: ✭ 146 (-93.36%)

Mutual labels: big-data

Gaffer

A large-scale entity and relation database supporting aggregation of properties