Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)

✭ 115

python aws privacy data big-data s3 gdpr parquet

Asakusafw

Asakusa Framework

✭ 114

java framework big-data hadoop batch mapreduce batch-processing data-flow

Just Dashboard

📊 📋 Dashboards using YAML or JSON files

✭ 1,511

javascript CSS Makefile json data-science dashboard data-visualization data chart csv big-data yaml d3 d3js data-engineering gist business-intelligence data-driven github-gist just-dashboard

Pythondata

Repo for code published on pythondata.com

✭ 113

python jupyter-notebook machine-learning data-science data-visualization data-analysis big-data

Ambari

Mirror of Apache Ambari

✭ 1,576

javascript python java HTML powershell Handlebars big-data ambari

Genie

Distributed Big Data Orchestration Service

✭ 1,544

java groovy CSS javascript PLpgSQL shell spring-boot cloud microservices microservice distributed-systems big-data configuration bigdata configuration-management orchestration netflixoss netflix-oss

Bigdataclass

Two-day workshop that covers how to use R to interact with databases and Spark

✭ 110

r spark big-data db

Spark R Notebooks

R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks

✭ 109

r jupyter-notebook data-science jupyter data-analysis big-data notebook bigdata exploratory-data-analysis

Attic Predictionio Sdk Java

PredictionIO Java SDK

✭ 107

java scala big-data

Tennis Crystal Ball

Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction

✭ 107

java machine-learning database data-science statistics data-analysis big-data bigdata prediction sports forecast

Mysql perf analyzer

MySQL performance monitoring and analysis.

✭ 1,423

java javascript CSS shell mysql big-data performance-analysis

Maha

A framework for rapid reporting API development, with out-of-the-box support for high-cardinality dimension lookups with Druid.

✭ 101

scala postgresql sql analytics big-data oracle hive druid presto

Vizuka

Explore high-dimensional datasets and how your algorithm handles specific regions.

✭ 100

python machine-learning data-science visualization data-visualization big-data data-mining unsupervised-learning pca kmeans

Graph sampling

Graph Sampling is a Python package containing various approaches that sample the original graph according to different sample sizes.

✭ 99

python network big-data data-mining sample network-analysis graphs

Samza Hello Samza

Mirror of Apache Samza

✭ 99

java scala big-data

Bigdata Notes

Beginner's Guide to Big Data ⭐

✭ 10,991

java scala kafka spark big-data yarn hadoop phoenix zookeeper bigdata hive hbase hdfs mapreduce storm flume azkaban sqoop

Kudu

Mirror of Apache Kudu

✭ 1,360

cplusplus big-data

Orc

An ORC file format reader and writer for Go.

✭ 97

go golang big-data

Logisland

Scalable stream processing platform for advanced real-time analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform performs complex event processing and is suitable for time series analysis. A large set of ready-to-use processors, data sources, and sinks is available.

✭ 97

java elasticsearch kafka spark analytics big-data influxdb cassandra stream-processing solr kafka-streams

Streamx

kafka-connect-s3: Ingest data from Kafka into object stores (S3)

✭ 96

java aws kafka streaming big-data s3 gcp kafka-connect connector

Spark Py Notebooks

Apache Spark & Python (PySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

✭ 1,338