wranglerWrangler Transform: A DMD system for transforming Big Data
Stars: ✭ 63 (+8.62%)
PoseidonA search engine which can hold 100 trillion lines of log data.
Stars: ✭ 1,793 (+2991.38%)
TitanoboaTitanoboa makes complex workflows easy. It is a low-code workflow orchestration platform for JVM - distributed, highly scalable and fault tolerant.
Stars: ✭ 787 (+1256.9%)
AcceleratorThe Accelerator is a tool for fast and reproducible processing of large amounts of data.
Stars: ✭ 137 (+136.21%)
predictionioPredictionIO, a machine learning server for developers and ML engineers.
Stars: ✭ 12,510 (+21468.97%)
React Image MagnifyA responsive image zoom component designed for shopping sites.
Stars: ✭ 391 (+574.14%)
HamaMirror of Apache Hama
Stars: ✭ 129 (+122.41%)
check-engineData validation library for PySpark 3.0.0
Stars: ✭ 29 (-50%)
Lifion KinesisA native Node.js producer and consumer library for Amazon Kinesis Data Streams
Stars: ✭ 54 (-6.9%)
AzuredatalakeSamples and Docs for Azure Data Lake Store and Analytics
Stars: ✭ 128 (+120.69%)
classifai🔥 One of the most comprehensive open-source data annotation platforms.
Stars: ✭ 99 (+70.69%)
Griffon VmGriffon Data Science Virtual Machine
Stars: ✭ 128 (+120.69%)
IgniteApache Ignite
Stars: ✭ 4,027 (+6843.1%)
Mobydq🐳 Tool to automate data quality checks on data pipelines
Stars: ✭ 123 (+112.07%)
storm-mlAn online learning algorithm library for Storm
Stars: ✭ 18 (-68.97%)
ReportAutomated report configuration platform. Demo: http://58.87.112.247/report (account: visitor, password: 123456)
Stars: ✭ 123 (+112.07%)
StormMirror of Apache Storm
Stars: ✭ 6,297 (+10756.9%)
SigmfThe Signal Metadata Format Specification
Stars: ✭ 120 (+106.9%)
SCF4-SDKMotorized zoom lens controller Kurokesu SCF4 module featuring STM32 controller and Onsemi LC898201 driver control software
Stars: ✭ 14 (-75.86%)
DrillApache Drill is a distributed MPP query layer for self-describing data
Stars: ✭ 1,619 (+2691.38%)
HiveApache Hive
Stars: ✭ 4,031 (+6850%)
Amazon S3 Find And ForgetAmazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)
Stars: ✭ 115 (+98.28%)
opticAn Erlang/OTP library for reading and updating deeply nested immutable data.
Stars: ✭ 34 (-41.38%)
Just Dashboard📊 📋 Dashboards using YAML or JSON files
Stars: ✭ 1,511 (+2505.17%)
QcportalA client interface to the QCArchive Project (read-only image of QCFractal)
Stars: ✭ 29 (-50%)
AmbariMirror of Apache Ambari
Stars: ✭ 1,576 (+2617.24%)
MLBDMaterials for "Machine Learning on Big Data" course
Stars: ✭ 20 (-65.52%)
BigdataclassTwo-day workshop that covers how to use R to interact with databases and Spark
Stars: ✭ 110 (+89.66%)
MetorikkuA simplified, lightweight ETL Framework based on Apache Spark
Stars: ✭ 361 (+522.41%)
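Big-Data-DemoA data visualization showcase project based on Vue, three.js, and echarts, including 3D model import and interaction, 3D model annotation, and other features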
Stars: ✭ 146 (+151.72%)
CythonThe most widely used Python to C compiler
Stars: ✭ 6,588 (+11258.62%)
VizukaExplore high-dimensional datasets and how your algo handles specific regions.
Stars: ✭ 100 (+72.41%)
talariaTalariaDB is a distributed, highly available, and low latency time-series database for Presto
Stars: ✭ 148 (+155.17%)
SylphStream computing platform for bigdata
Stars: ✭ 362 (+524.14%)
KuduMirror of Apache Kudu
Stars: ✭ 1,360 (+2244.83%)
xcastA High-Performance Data Science Toolkit for the Earth Sciences
Stars: ✭ 28 (-51.72%)
LogislandScalable stream processing platform for advanced realtime analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of ready-to-use processors, data sources, and sinks is available.
Stars: ✭ 97 (+67.24%)
MoosefsMooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)
Stars: ✭ 1,025 (+1667.24%)
Spark Py NotebooksApache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+2206.9%)
arrow-datafusionApache Arrow DataFusion SQL Query Engine
Stars: ✭ 2,360 (+3968.97%)
ReefMirror of Apache REEF
Stars: ✭ 92 (+58.62%)
VespaThe open big data serving engine. https://vespa.ai
Stars: ✭ 3,747 (+6360.34%)
Bitcoin Value Predictor[NOT MAINTAINED] Predicting Bitcoin price using time series analysis and sentiment analysis of tweets about Bitcoin
Stars: ✭ 91 (+56.9%)
Parquet MrApache Parquet
Stars: ✭ 1,278 (+2103.45%)
SamzaMirror of Apache Samza
Stars: ✭ 676 (+1065.52%)
YmcacheYMCache is a lightweight object caching solution for iOS and Mac OS X that is designed for highly parallel access scenarios.
Stars: ✭ 58 (+0%)
Kibble 1Apache Kibble - a tool to collect, aggregate and visualize data about any software project
Stars: ✭ 54 (-6.9%)
Datumbox FrameworkDatumbox is an open-source Machine Learning framework written in Java which allows the rapid development of Machine Learning and Statistical applications.
Stars: ✭ 1,063 (+1732.76%)
Esper TvEsper instance for TV news analysis
Stars: ✭ 37 (-36.21%)
Partial.lensesPartial lenses is a comprehensive, high-performance optics library for JavaScript
Stars: ✭ 846 (+1358.62%)
Fit SneFast Fourier Transform-accelerated Interpolation-based t-SNE (FIt-SNE)
Stars: ✭ 485 (+736.21%)
ibmpairsOpen source tools for interaction with IBM PAIRS
Stars: ✭ 23 (-60.34%)
mascMicrosoft's contributions for Spark with Apache Accumulo
Stars: ✭ 20 (-65.52%)