Top 364 big-data open source projects

Vue Virtual Scroll List
⚡️ A Vue component that supports large data lists with high, efficient rendering performance.
An easy to use, self-service open BI reporting and BI dashboard platform.
Data Accelerator
Data Accelerator for Apache Spark simplifies onboarding to streaming of big data. It offers a rich, easy-to-use experience for creating, editing, and managing Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.
An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
Aws Etl Orchestrator
A serverless architecture for orchestrating ETL jobs in arbitrarily-complex workflows using AWS Step Functions and AWS Lambda.
Apache Trafodion
An advanced distributed task flow manager built on top of Celery
Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
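A curated collection of books covering C & C++, git, Java, Keras, Linux, NLP, Python, Scala, TensorFlow, big data, recommender systems, databases, data mining, machine learning, deep learning, algorithms, and more.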
Lite Virtual List
A Vue-based virtual list component library that supports waterfall flow layouts
U-SQL Examples and Issue Tracking
Big Data Processing Framework - Unified Data API or SQL on Any Storage
Awkward 0.x
Manipulate arrays of complex data structures as easily as Numpy.
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Helical Insight software is the world's first Open Source Business Intelligence framework, which helps you make sense of your data and make well-informed decisions.
Data Science Live Book
An open source book to learn data science, data analysis and machine learning, suitable for all ages!
Presto Go Client
A Presto client for the Go programming language.
Bigdata Playground
A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Distributed, Versioned, Image-oriented Dataservice
Keyvi - a key value index that powers Cliqz search engine. It is an in-memory FST-based data structure highly optimized for size and lookup performance.
Attic Predictionio
PredictionIO, a machine learning server for developers and ML engineers.
GeoTrellis for PySpark
Keyvi - the key value index. It is an in-memory FST-based data structure highly optimized for size and lookup performance.
The official home of the Presto distributed SQL query engine for big data
Julia binding for Apache Spark
Tools and Docs on the Azure Data Science Virtual Machine
Easily make RESTful web services for time series reporting with Big Data analytics engines like Druid and SQL Databases.
Simple Windows desktop application for viewing and querying Apache Parquet files
My journey to learn and grow in the domain of Machine Learning and Artificial Intelligence by performing the #100DaysofMLCode Challenge.
A visual ETL development and debugging tool for big data
Mirror of Apache Metamodel
Storm Doc Zh
Chinese translation of the official Apache Storm documentation
Open Source Indonesian Python Programming Tutorial Site
Eel Sdk
Big Data Toolkit for the JVM
Sparkling Graph
SparklingGraph provides an easy-to-use set of features for processing large-scale graphs using Spark and GraphX.
A search engine which can hold 100 trillion lines of log data.
The Accelerator is a tool for fast and reproducible processing of large amounts of data.
Attic Apex Malhar
Mirror of Apache Apex Malhar
Calcite Avatica
Mirror of Apache Calcite - Avatica
Mirror of Apache Hama
A large-scale entity and relation database supporting aggregation of properties
Mirror of Apache Tajo
