Amazon S3 Find And Forget: Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR).
Stars: ✭ 115 (-10.85%)
Logisland: Scalable stream processing platform for advanced real-time analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of ready-to-use processors, data sources, and sinks is available.
Stars: ✭ 97 (-24.81%)
Lifion Kinesis: A native Node.js producer and consumer library for Amazon Kinesis Data Streams
Stars: ✭ 54 (-58.14%)
Mobydq: 🐳 Tool to automate data quality checks on data pipelines
Stars: ✭ 123 (-4.65%)
Oodt: Mirror of Apache OODT
Stars: ✭ 52 (-59.69%)
Spark Py Notebooks: Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+937.21%)
Trck: Query engine for TrailDB
Stars: ✭ 48 (-62.79%)
Just Dashboard: 📊 📋 Dashboards using YAML or JSON files
Stars: ✭ 1,511 (+1071.32%)
Moosefs: MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)
Stars: ✭ 1,025 (+694.57%)
Reef: Mirror of Apache REEF
Stars: ✭ 92 (-28.68%)
Attaca: Robust, distributed version control for large files.
Stars: ✭ 41 (-68.22%)
Azuredatalake: Samples and Docs for Azure Data Lake Store and Analytics
Stars: ✭ 128 (-0.78%)
Bitcoin Value Predictor: [NOT MAINTAINED] Predicting Bitcoin price using time series analysis and sentiment analysis of tweets about Bitcoin
Stars: ✭ 91 (-29.46%)
Metrics: Measure behavior of Java applications
Stars: ✭ 35 (-72.87%)
Ambari: Mirror of Apache Ambari
Stars: ✭ 1,576 (+1121.71%)
Skymap: High-throughput gene to knowledge mapping through massive integration of public sequencing data.
Stars: ✭ 29 (-77.52%)
Awesome Scalability: The Patterns of Scalable, Reliable, and Performant Large-Scale Systems
Stars: ✭ 36,688 (+28340.31%)
Report: Automated configurable reporting platform. Demo address: http://58.87.112.247/report (account: visitor, password: 123456)
Stars: ✭ 123 (-4.65%)
K8s Ingress Claim: An admission control policy that safeguards against accidental duplicate claiming of Hosts/Domains.
Stars: ✭ 14 (-89.15%)
Panoptes: A Global Scale Network Telemetry Ecosystem
Stars: ✭ 80 (-37.98%)
Dremio Oss: Dremio - the missing link in modern data
Stars: ✭ 862 (+568.22%)
Bigdataclass: Two-day workshop that covers how to use R to interact with databases and Spark
Stars: ✭ 110 (-14.73%)
Accumulo: Apache Accumulo
Stars: ✭ 857 (+564.34%)
Iotdb: Apache IoTDB
Stars: ✭ 1,221 (+846.51%)
Dataflowjavasdk: Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Stars: ✭ 854 (+562.02%)
Pretzel: JavaScript full-stack framework for Big Data visualisation and analysis
Stars: ✭ 26 (-79.84%)
Bandar Log: Monitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.
Stars: ✭ 19 (-85.27%)
Sqoop: Mirror of Apache Sqoop
Stars: ✭ 817 (+533.33%)
Cookbook: The Data Engineering Cookbook
Stars: ✭ 9,829 (+7519.38%)
Titanoboa: Titanoboa makes complex workflows easy. It is a low-code workflow orchestration platform for JVM - distributed, highly scalable and fault tolerant.
Stars: ✭ 787 (+510.08%)
Sigmf: The Signal Metadata Format Specification
Stars: ✭ 120 (-6.98%)
Storm: Mirror of Apache Storm
Stars: ✭ 6,297 (+4781.4%)
Bookkeeper: Apache Bookkeeper
Stars: ✭ 1,178 (+813.18%)
Cython: The most widely used Python to C compiler
Stars: ✭ 6,588 (+5006.98%)
Samza: Mirror of Apache Samza
Stars: ✭ 676 (+424.03%)
Sdc: Intel® Scalable Dataframe Compiler for Pandas*
Stars: ✭ 623 (+382.95%)
Griffon Vm: Griffon Data Science Virtual Machine
Stars: ✭ 128 (-0.78%)
H2o 3: H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+4284.5%)
Countly Sdk Cordova: Countly Product Analytics SDK for Cordova, Icenium and Phonegap
Stars: ✭ 69 (-46.51%)
Zeppelin: Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Stars: ✭ 5,513 (+4173.64%)
Vizuka: Explore high-dimensional datasets and how your algo handles specific regions.
Stars: ✭ 100 (-22.48%)
Scanner: Efficient video analysis at scale
Stars: ✭ 569 (+341.09%)
Nipype: Workflows and interfaces for neuroimaging packages
Stars: ✭ 557 (+331.78%)
Drill: Apache Drill is a distributed MPP query layer for self describing data
Stars: ✭ 1,619 (+1155.04%)
Rsparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)
Stars: ✭ 65 (-49.61%)
Gaffer: A large-scale entity and relation database supporting aggregation of properties
Stars: ✭ 1,642 (+1172.87%)
Tajo: Mirror of Apache Tajo
Stars: ✭ 128 (-0.78%)
Richdem: High-performance Terrain and Hydrology Analysis
Stars: ✭ 127 (-1.55%)
Cmak: CMAK is a tool for managing Apache Kafka clusters
Stars: ✭ 10,544 (+8073.64%)
Warp: Convert and analyze large data sets at light speed, on Mac and iOS.
Stars: ✭ 62 (-51.94%)