Top 369 big-data open source projects

Vue Virtual Scroll List
⚡️ A Vue component that supports very large data lists with high, efficient rendering performance.
Cboard
An easy to use, self-service open BI reporting and BI dashboard platform.
Data Accelerator
Data Accelerator for Apache Spark simplifies onboarding to streaming of big data. It offers a rich, easy-to-use experience for creating, editing, and managing Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.
Hyperspace
An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
Aws Etl Orchestrator
A serverless architecture for orchestrating ETL jobs in arbitrarily-complex workflows using AWS Step Functions and AWS Lambda.
Trafodion
Apache Trafodion
Selinon
An advanced distributed task flow management tool built on top of Celery
Eland
Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
Books
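A curated collection of books covering C and C++, git, Java, Keras, Linux, NLP, Python, Scala, TensorFlow, big data, recommender systems, databases, data mining, machine learning, deep learning, algorithms, and more.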
Lite Virtual List
Vue-based virtual list component library that supports waterfall-flow layouts
Usql
U-SQL Examples and Issue Tracking
Gimel
Big Data Processing Framework - Unified Data API or SQL on Any Storage
Awkward 0.x
Manipulate arrays of complex data structures as easily as NumPy.
Sparkrdma
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Helicalinsight
Helical Insight describes itself as the world's first open source Business Intelligence framework, helping you make sense of your data and make well-informed decisions.
Data Science Live Book
An open source book to learn data science, data analysis and machine learning, suitable for all ages!
Presto Go Client
A Presto client for the Go programming language.
Bigdata Playground
A complete example of a big data application using: Kubernetes (kops/AWS), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Dvid
Distributed, Versioned, Image-oriented Dataservice
Keyvi
Keyvi - a key-value index that powers the Cliqz search engine. It is an in-memory FST-based data structure highly optimized for size and lookup performance.
Attic Predictionio
PredictionIO, a machine learning server for developers and ML engineers.
Geopyspark
GeoTrellis for PySpark
Keyvi
Keyvi - the key value index. It is an in-memory FST-based data structure highly optimized for size and lookup performance.
Presto
The official home of the Presto distributed SQL query engine for big data
Spark.jl
Julia binding for Apache Spark
Datasciencevm
Tools and Docs on the Azure Data Science Virtual Machine (http://aka.ms/dsvm)
Fili
Easily make RESTful web services for time series reporting with Big Data analytics engines like Druid and SQL Databases.
Parquetviewer
Simple Windows desktop application for viewing and querying Apache Parquet files
100daysofmlcode
My journey to learn and grow in the domain of Machine Learning and Artificial Intelligence by performing the #100DaysofMLCode Challenge.
Hydrograph
A visual ETL development and debugging tool for big data
Metamodel
Mirror of Apache Metamodel
Storm Doc Zh
Chinese translation of the official Apache Storm documentation
Belajarpython.com
Open Source Indonesian Python Programming Tutorial Site
Eel Sdk
Big Data Toolkit for the JVM
Sparkling Graph
SparklingGraph provides an easy-to-use set of features that let you process large-scale graphs using Spark and GraphX.
Poseidon
A search engine which can hold 100 trillion lines of log data.
Accelerator
The Accelerator is a tool for fast and reproducible processing of large amounts of data.
Attic Apex Malhar
Mirror of Apache Apex Malhar
Calcite Avatica
Mirror of Apache Calcite - Avatica
Hama
Mirror of Apache Hama
Gaffer
A large-scale entity and relation database supporting aggregation of properties
Tajo
Mirror of Apache Tajo