Azuredatalake: Samples and Docs for Azure Data Lake Store and Analytics
Feast: Feature Store for Machine Learning
Richdem: High-performance Terrain and Hydrology Analysis
Mobydq: 🐳 Tool to automate data quality checks on data pipelines
Report: Automated, configuration-driven reporting platform. Demo: http://58.87.112.247/report (username: visitor, password: 123456)
Sigmf: The Signal Metadata Format Specification
Hdfs Shell: HDFS Shell is an HDFS manipulation tool for working with functions integrated in Hadoop DFS
Drill: Apache Drill is a distributed MPP query layer for self-describing data
Cmak: CMAK is a tool for managing Apache Kafka clusters
Amazon S3 Find And Forget: Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)
Pythondata: Repo for code published on pythondata.com
Genie: Distributed Big Data Orchestration Service
Bigdataclass: Two-day workshop that covers how to use R to interact with databases and Spark
Spark R Notebooks: R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Tennis Crystal Ball: Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction
Maha: A framework for rapid reporting API development, with out-of-the-box support for high-cardinality dimension lookups with Druid.
Vizuka: Explore high-dimensional datasets and how your algorithm handles specific regions.
Graph sampling: Graph Sampling is a Python package containing various approaches for sampling the original graph at different sample sizes.
Kudu: Mirror of Apache Kudu
Orc: An ORC file format reader and writer for Go.
Logisland: Scalable stream processing platform for advanced real-time analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of ready-to-use processors, data sources, and sinks is available.
Streamx: kafka-connect-s3, ingests data from Kafka into object stores (S3)
Spark Py Notebooks: Apache Spark & Python (PySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Treeviz: Tree diagrams with JavaScript 🌲 📈
Reef: Mirror of Apache REEF
Bitcoin Value Predictor: [NOT MAINTAINED] Predicting Bitcoin price using time series analysis and sentiment analysis of tweets about Bitcoin
Panoptes: A Global Scale Network Telemetry Ecosystem
Uproot4: ROOT I/O in pure Python and NumPy.
Setl: A simple Spark-powered ETL framework that just works 🍺
Labs: Research on distributed systems
Appdocs: Application Performance Optimization Summary
Rsparkling: RSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)
Cloud Volume: Read and write Neuroglancer datasets programmatically.
Warp: Convert and analyze large data sets at light speed, on Mac and iOS.
Nabhash: An extremely fast, non-cryptographically-secure, AES-based hash algorithm for Big Data