Setl: A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-38.76%)
Spark R Notebooks: R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 109 (-15.5%)
Streamx: kafka-connect-s3: Ingest data from Kafka to object stores (S3)
Stars: ✭ 96 (-25.58%)
Appdocs: Application Performance Optimization Summary
Stars: ✭ 1,169 (+806.2%)
Pythondata: Repo for code published on pythondata.com
Stars: ✭ 113 (-12.4%)
Labs: Research on distributed systems
Stars: ✭ 73 (-43.41%)
Maha: A framework for rapid reporting API development, with out-of-the-box support for high-cardinality dimension lookups with Druid.
Stars: ✭ 101 (-21.71%)
Orc: An ORC file format reader and writer for Go.
Stars: ✭ 97 (-24.81%)
Flink Shaded: Apache Flink shaded artifacts repository
Stars: ✭ 67 (-48.06%)
Asakusafw: Asakusa Framework
Stars: ✭ 114 (-11.63%)
Treeviz: Tree diagrams with JavaScript 🌲 📈
Stars: ✭ 95 (-26.36%)
Genie: Distributed Big Data Orchestration Service
Stars: ✭ 1,544 (+1096.9%)
Uproot4: ROOT I/O in pure Python and NumPy.
Stars: ✭ 80 (-37.98%)
Feast: Feature Store for Machine Learning
Stars: ✭ 2,576 (+1896.9%)
Tennis Crystal Ball: Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction
Stars: ✭ 107 (-17.05%)
Hdfs Shell: HDFS Shell is an HDFS manipulation tool to work with functions integrated in Hadoop DFS
Stars: ✭ 117 (-9.3%)
Carbondata: Mirror of Apache CarbonData
Stars: ✭ 1,158 (+797.67%)
Graph sampling: Graph Sampling is a Python package containing various approaches that sample the original graph according to different sample sizes.
Stars: ✭ 99 (-23.26%)
Kudu: Mirror of Apache Kudu
Stars: ✭ 1,360 (+954.26%)
Rsparkling: RSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)
Stars: ✭ 65 (-49.61%)
Amazon S3 Find And Forget: Amazon S3 Find and Forget is a solution for handling data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)
Stars: ✭ 115 (-10.85%)
Logisland: Scalable stream processing platform for advanced real-time analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of ready-to-use processors, data sources, and sinks is available.
Stars: ✭ 97 (-24.81%)
Mobydq: 🐳 Tool to automate data quality checks on data pipelines
Stars: ✭ 123 (-4.65%)
Spark Py Notebooks: Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+937.21%)
Just Dashboard: 📊 📋 Dashboards using YAML or JSON files
Stars: ✭ 1,511 (+1071.32%)
Reef: Mirror of Apache REEF
Stars: ✭ 92 (-28.68%)
Azuredatalake: Samples and Docs for Azure Data Lake Store and Analytics
Stars: ✭ 128 (-0.78%)
Bitcoin Value Predictor: [NOT MAINTAINED] Predicting Bitcoin price using time series analysis and sentiment analysis of tweets on Bitcoin
Stars: ✭ 91 (-29.46%)
Ambari: Mirror of Apache Ambari
Stars: ✭ 1,576 (+1121.71%)
Report: Automated report configuration platform. Demo: http://58.87.112.247/report (account: visitor, password: 123456)
Stars: ✭ 123 (-4.65%)
Panoptes: A Global Scale Network Telemetry Ecosystem
Stars: ✭ 80 (-37.98%)
Bigdataclass: Two-day workshop that covers how to use R to interact with databases and Spark
Stars: ✭ 110 (-14.73%)
Iotdb: Apache IoTDB
Stars: ✭ 1,221 (+846.51%)
Cookbook: The Data Engineering Cookbook
Stars: ✭ 9,829 (+7519.38%)
Sigmf: The Signal Metadata Format Specification
Stars: ✭ 120 (-6.98%)
Bookkeeper: Apache Bookkeeper
Stars: ✭ 1,178 (+813.18%)
Griffon Vm: Griffon Data Science Virtual Machine
Stars: ✭ 128 (-0.78%)
Countly Sdk Cordova: Countly Product Analytics SDK for Cordova, Icenium and Phonegap
Stars: ✭ 69 (-46.51%)
Vizuka: Explore high-dimensional datasets and how your algorithm handles specific regions.
Stars: ✭ 100 (-22.48%)
Drill: Apache Drill is a distributed MPP query layer for self-describing data
Stars: ✭ 1,619 (+1155.04%)
Gaffer: A large-scale entity and relation database supporting aggregation of properties
Stars: ✭ 1,642 (+1172.87%)
Tajo: Mirror of Apache Tajo
Stars: ✭ 128 (-0.78%)
Richdem: High-performance Terrain and Hydrology Analysis
Stars: ✭ 127 (-1.55%)
Cmak: CMAK is a tool for managing Apache Kafka clusters
Stars: ✭ 10,544 (+8073.64%)