Top 369 big-data open source projects

Azuredatalake
Samples and Docs for Azure Data Lake Store and Analytics
Richdem
High-performance Terrain and Hydrology Analysis
Mobydq
🐳 Tool to automate data quality checks on data pipelines
Report
Automated report configuration platform. Demo: http://58.87.112.247/report (account: visitor, password: 123456)
Scala Spark Tutorial
Project for James' Apache Spark with Scala course
Sigmf
The Signal Metadata Format Specification
Hdfs Shell
HDFS Shell is an HDFS manipulation tool to work with functions integrated in Hadoop DFS
Drill
Apache Drill is a distributed MPP query layer for self-describing data
Cmak
CMAK is a tool for managing Apache Kafka clusters
Amazon S3 Find And Forget
Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)
Bigdataclass
Two-day workshop that covers how to use R to interact with databases and Spark
Spark R Notebooks
R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Tennis Crystal Ball
Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction
Mysql perf analyzer
MySQL performance monitoring and analysis.
Maha
A framework for rapid reporting API development, with out-of-the-box support for high-cardinality dimension lookups with Druid.
Vizuka
Explore high-dimensional datasets and how your algo handles specific regions.
Graph sampling
Graph Sampling is a Python package containing various approaches that sample the original graph according to different sample sizes.
Samza Hello Samza
Mirror of Apache Samza
Kudu
Mirror of Apache Kudu
Orc
An ORC file format reader and writer for Go.
Logisland
Scalable stream processing platform for advanced real-time analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of ready-to-use processors, data sources, and sinks is available.
Streamx
kafka-connect-s3: Ingest data from Kafka to object stores (S3)
Spark Py Notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Treeviz
Tree diagrams with JavaScript 🌲 📈
Reef
Mirror of Apache REEF
Bitcoin Value Predictor
[NOT MAINTAINED] Predicting Bitcoin price using time series analysis and sentiment analysis of tweets about Bitcoin
Smart Array To Tree
Quickly convert large arrays of data to a tree
Parquet Mr
Apache Parquet
Panoptes
A Global Scale Network Telemetry Ecosystem
Uproot4
ROOT I/O in pure Python and NumPy.
Attic Predictionio Template Recommender
PredictionIO Recommendation Engine Template (Scala-based parallelized engine)
Labs
Research on distributed systems
Bookkeeper
Apache BookKeeper
Big Data Engineering Coursera Yandex
Big Data for Data Engineers Coursera Specialization from Yandex
Appdocs
Application Performance Optimization Summary
Carbondata
Mirror of Apache CarbonData
Flink Shaded
Apache Flink shaded artifacts repository
Rsparkling
RSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)
Cloud Volume
Read and write Neuroglancer datasets programmatically.
Spark Doc Zh
Chinese edition of the official Apache Spark documentation
Warp
Convert and analyze large data sets at light speed, on Mac and iOS.
Nabhash
An extremely fast, non-cryptographically-safe AES-based hash algorithm for Big Data