Cookbook - The Data Engineering Cookbook
Stars: ✭ 9,829 (+7891.06%)
Maha - A framework for rapid reporting API development, with out-of-the-box support for high-cardinality dimension lookups with Druid.
Stars: ✭ 101 (-17.89%)
Setl - A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-35.77%)
Genie - Distributed Big Data Orchestration Service
Stars: ✭ 1,544 (+1155.28%)
Job - JOB, make your short-term command a long-term job. A tool for scheduling command lines as tasks.
Stars: ✭ 98 (-20.33%)
Parquet Mr - Apache Parquet
Stars: ✭ 1,278 (+939.02%)
Pythondata - Repo for code published on pythondata.com
Stars: ✭ 113 (-8.13%)
Uproot4 - ROOT I/O in pure Python and NumPy.
Stars: ✭ 80 (-34.96%)
Graph sampling - Graph Sampling is a Python package containing various approaches for sampling the original graph according to different sample sizes.
Stars: ✭ 99 (-19.51%)
Pyreportjasper - Python Reporting with JasperReports
Stars: ✭ 77 (-37.4%)
Cmak - CMAK is a tool for managing Apache Kafka clusters
Stars: ✭ 10,544 (+8472.36%)
Bookkeeper - Apache BookKeeper
Stars: ✭ 1,178 (+857.72%)
Orc - An ORC file format reader and writer for Go.
Stars: ✭ 97 (-21.14%)
Countly Sdk Cordova - Countly Product Analytics SDK for Cordova, Icenium and Phonegap
Stars: ✭ 69 (-43.9%)
Bigdataclass - Two-day workshop that covers how to use R to interact with databases and Spark
Stars: ✭ 110 (-10.57%)
Rsparkling - RSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)
Stars: ✭ 65 (-47.15%)
Streamx - kafka-connect-s3: Ingest data from Kafka to object stores (S3)
Stars: ✭ 96 (-21.95%)
Reef - Mirror of Apache REEF
Stars: ✭ 92 (-25.2%)
Warp - Convert and analyze large data sets at light speed, on Mac and iOS.
Stars: ✭ 62 (-49.59%)
Tennis Crystal Ball - Ultimate Tennis Statistics and Tennis Crystal Ball: Tennis Big Data Analysis and Prediction
Stars: ✭ 107 (-13.01%)
Bitcoin Value Predictor - [NOT MAINTAINED] Predicting Bitcoin price using time series analysis and sentiment analysis of tweets about Bitcoin
Stars: ✭ 91 (-26.02%)
Just Dashboard - 📊 📋 Dashboards using YAML or JSON files
Stars: ✭ 1,511 (+1128.46%)
Ureport - UReport2 is a high-performance pure Java report engine based on Spring architecture, where complex Chinese-style statements and reports can be prepared by iterating over cells.
Stars: ✭ 1,295 (+952.85%)
Neuropredict - Easy and comprehensive assessment of predictive power, with support for neuroimaging features
Stars: ✭ 87 (-29.27%)
Drill - Apache Drill is a distributed MPP query layer for self-describing data
Stars: ✭ 1,619 (+1216.26%)
Panoptes - A Global Scale Network Telemetry Ecosystem
Stars: ✭ 80 (-34.96%)
Vizuka - Explore high-dimensional datasets and how your algorithm handles specific regions.
Stars: ✭ 100 (-18.7%)
Iotdb - Apache IoTDB
Stars: ✭ 1,221 (+892.68%)
Ambari - Mirror of Apache Ambari
Stars: ✭ 1,576 (+1181.3%)
Sigmf - The Signal Metadata Format Specification
Stars: ✭ 120 (-2.44%)
Labs - Research on distributed systems
Stars: ✭ 73 (-40.65%)
Kudu - Mirror of Apache Kudu
Stars: ✭ 1,360 (+1005.69%)
Fastreport - Free open-source reporting tool for .NET6/.NET Core/.NET Framework that helps your application generate document-like reports
Stars: ✭ 1,688 (+1272.36%)
Appdocs - Application Performance Optimization Summary
Stars: ✭ 1,169 (+850.41%)
Awe - Dynamic web-based reports/dashboards in Python
Stars: ✭ 98 (-20.33%)
Carbondata - Mirror of Apache CarbonData
Stars: ✭ 1,158 (+841.46%)
Amazon S3 Find And Forget - Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)
Stars: ✭ 115 (-6.5%)
Flink Shaded - Apache Flink shaded artifacts repository
Stars: ✭ 67 (-45.53%)
Logisland - Scalable stream processing platform for advanced real-time analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of ready-to-use processors, data sources and sinks is available.
Stars: ✭ 97 (-21.14%)
Cloud Volume - Read and write Neuroglancer datasets programmatically.
Stars: ✭ 63 (-48.78%)
Spark R Notebooks - R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 109 (-11.38%)
Spark Py Notebooks - Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+987.8%)
Hdfs Shell - HDFS Shell is an HDFS manipulation tool to work with functions integrated in Hadoop DFS
Stars: ✭ 117 (-4.88%)
Asakusafw - Asakusa Framework
Stars: ✭ 114 (-7.32%)
Treeviz - Tree diagrams with JavaScript 🌲 📈
Stars: ✭ 95 (-22.76%)