Dataflowjavasdk
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Stars: ✭ 854 (-61.18%)
wrangler
Wrangler Transform: A DMD system for transforming Big Data
Stars: ✭ 63 (-97.14%)
Kudu
Mirror of Apache Kudu
Stars: ✭ 1,360 (-38.18%)
predictionio
PredictionIO, a machine learning server for developers and ML engineers.
Stars: ✭ 12,510 (+468.64%)
Pretzel
JavaScript full-stack framework for Big Data visualisation and analysis
Stars: ✭ 26 (-98.82%)
opendc
Collaborative Datacenter Simulation and Exploration for Everybody
Stars: ✭ 40 (-98.18%)
check-engine
Data validation library for PySpark 3.0.0
Stars: ✭ 29 (-98.68%)
Bandar Log
Monitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.
Stars: ✭ 19 (-99.14%)
subsemble
subsemble R package for ensemble learning on subsets of data
Stars: ✭ 40 (-98.18%)
Logisland
Scalable stream processing platform for advanced real-time analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of ready-to-use processors, data sources and sinks is available.
Stars: ✭ 97 (-95.59%)
SynapseML
Simple and Distributed Machine Learning
Stars: ✭ 3,355 (+52.5%)
Sqoop
Mirror of Apache Sqoop
Stars: ✭ 817 (-62.86%)
MLBD
Materials for the "Machine Learning on Big Data" course
Stars: ✭ 20 (-99.09%)
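Big-Data-Demo
A data visualization showcase project built with Vue, three.js, and echarts, including features such as 3D model import and interaction and 3D model annotation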
Stars: ✭ 146 (-93.36%)
Titanoboa
Titanoboa makes complex workflows easy. It is a low-code workflow orchestration platform for the JVM: distributed, highly scalable and fault tolerant.
Stars: ✭ 787 (-64.23%)
talaria
TalariaDB is a distributed, highly available, and low latency time-series database for Presto
Stars: ✭ 148 (-93.27%)
Spark Py Notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (-39.18%)
meetups-archivos
Slides, code, and videos from the meetups, data science days, video calls, and workshops. Data Science Research is a non-profit organization that seeks to spread and decentralize knowledge of Data Science and Artificial Intelligence in Peru, giving opportunities to new talent through MeetUps, Workshops, and Semilleros …
Stars: ✭ 60 (-97.27%)
Storm
Mirror of Apache Storm
Stars: ✭ 6,297 (+186.23%)
Hydrograph
A visual ETL development and debugging tool for big data
Stars: ✭ 144 (-93.45%)
LoL-Match-Prediction
Win probability predictions for League of Legends matches using neural networks
Stars: ✭ 34 (-98.45%)
Cython
The most widely used Python to C compiler
Stars: ✭ 6,588 (+199.45%)
Reef
Mirror of Apache REEF
Stars: ✭ 92 (-95.82%)
cloudberry
Big Data Visualization
Stars: ✭ 89 (-95.95%)
Samza
Mirror of Apache Samza
Stars: ✭ 676 (-69.27%)
nebula
A distributed block-based data storage and compute engine
Stars: ✭ 127 (-94.23%)
Azuredatalake
Samples and Docs for Azure Data Lake Store and Analytics
Stars: ✭ 128 (-94.18%)
sparkucx
A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing the UCX communication layer
Stars: ✭ 32 (-98.55%)
Sdc
Intel® Scalable Dataframe Compiler for Pandas*
Stars: ✭ 623 (-71.68%)
rastercube
rastercube is a Python library for big data analysis of georeferenced time series data (e.g. MODIS NDVI)
Stars: ✭ 15 (-99.32%)
Bitcoin Value Predictor
[NOT MAINTAINED] Predicting Bitcoin price using time series analysis and sentiment analysis of tweets about Bitcoin
Stars: ✭ 91 (-95.86%)
H2o 3
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+157.09%)
CS Book
🔥 Latest computer science e-books. Provides downloads of the latest technical e-books. "All I want is to out-compete every one of you, or be out-competed by you!"
Stars: ✭ 40 (-98.18%)
Presto
The official home of the Presto distributed SQL query engine for big data
Stars: ✭ 12,957 (+488.95%)
spark-records
Bulletproof Apache Spark jobs with fast root cause analysis of failures.
Stars: ✭ 67 (-96.95%)
Zeppelin
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Stars: ✭ 5,513 (+150.59%)
RemoteShuffleService
Celeborn provides an elastic and high-performance service for shuffle and spilled data.
Stars: ✭ 262 (-88.09%)
Scanner
Efficient video analysis at scale
Stars: ✭ 569 (-74.14%)
dxram
A distributed in-memory key-value storage for billions of small objects.
Stars: ✭ 25 (-98.86%)
Griffon Vm
Griffon Data Science Virtual Machine
Stars: ✭ 128 (-94.18%)
img2dataset
Easily turn large sets of image URLs into an image dataset. Can download, resize and package 100M URLs in 20h on one machine.
Stars: ✭ 1,173 (-46.68%)
Nipype
Workflows and interfaces for neuroimaging packages
Stars: ✭ 557 (-74.68%)
GDLibrary
MATLAB library for gradient descent algorithms: Version 1.0.1
Stars: ✭ 50 (-97.73%)
Panoptes
A Global Scale Network Telemetry Ecosystem
Stars: ✭ 80 (-96.36%)
Thrill
Thrill - An EXPERIMENTAL Algorithmic Distributed Big Data Batch Processing Framework in C++
Stars: ✭ 528 (-76%)
Bigdata Playground
A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Stars: ✭ 177 (-91.95%)
Keyvi
Keyvi - a key value index that powers the Cliqz search engine. It is an in-memory FST-based data structure highly optimized for size and lookup performance.
Stars: ✭ 171 (-92.23%)
Fluo
Apache Fluo
Stars: ✭ 159 (-92.77%)
100daysofmlcode
My journey to learn and grow in the domain of Machine Learning and Artificial Intelligence by performing the #100DaysofMLCode Challenge.
Stars: ✭ 146 (-93.36%)
Gaffer
A large-scale entity and relation database supporting aggregation of properties
Stars: ✭ 1,642 (-25.36%)
Orc
An ORC file format reader and writer for Go.
Stars: ✭ 97 (-95.59%)
Pyspark Setup Demo
Demo of PySpark and Jupyter Notebook with the Jupyter Docker Stacks
Stars: ✭ 24 (-98.91%)