380 open source projects that are alternatives to or similar to Flume

SparkProgrammingInScala
Apache Spark Course Material
Stars: ✭ 57 (-97.41%)
Mutual labels:  big-data
Dataflowjavasdk
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Stars: ✭ 854 (-61.18%)
Mutual labels:  big-data
wrangler
Wrangler Transform: A DMD system for transforming Big Data
Stars: ✭ 63 (-97.14%)
Mutual labels:  big-data
Kudu
Mirror of Apache Kudu
Stars: ✭ 1,360 (-38.18%)
Mutual labels:  big-data
predictionio
PredictionIO, a machine learning server for developers and ML engineers.
Stars: ✭ 12,510 (+468.64%)
Mutual labels:  big-data
Pretzel
JavaScript full-stack framework for Big Data visualisation and analysis
Stars: ✭ 26 (-98.82%)
Mutual labels:  big-data
opendc
Collaborative Datacenter Simulation and Exploration for Everybody
Stars: ✭ 40 (-98.18%)
Mutual labels:  big-data
check-engine
Data validation library for PySpark 3.0.0
Stars: ✭ 29 (-98.68%)
Mutual labels:  big-data
Bandar Log
Monitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.
Stars: ✭ 19 (-99.14%)
Mutual labels:  big-data
subsemble
subsemble R package for ensemble learning on subsets of data
Stars: ✭ 40 (-98.18%)
Mutual labels:  big-data
Logisland
Scalable stream processing platform for advanced realtime analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of valuable ready-to-use processors, data sources and sinks is available.
Stars: ✭ 97 (-95.59%)
Mutual labels:  big-data
SynapseML
Simple and Distributed Machine Learning
Stars: ✭ 3,355 (+52.5%)
Mutual labels:  big-data
Sqoop
Mirror of Apache Sqoop
Stars: ✭ 817 (-62.86%)
Mutual labels:  big-data
MLBD
Materials for "Machine Learning on Big Data" course
Stars: ✭ 20 (-99.09%)
Mutual labels:  big-data
Couchdb Documentation
Apache CouchDB Documentation
Stars: ✭ 128 (-94.18%)
Mutual labels:  big-data
Big-Data-Demo
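A data visualization showcase project based on Vue, three.js, and echarts, with features including 3D model import and interaction and 3D model annotation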
Stars: ✭ 146 (-93.36%)
Mutual labels:  big-data
Titanoboa
Titanoboa makes complex workflows easy. It is a low-code workflow orchestration platform for JVM - distributed, highly scalable and fault tolerant.
Stars: ✭ 787 (-64.23%)
Mutual labels:  big-data
talaria
TalariaDB is a distributed, highly available, and low latency time-series database for Presto
Stars: ✭ 148 (-93.27%)
Mutual labels:  big-data
Spark Py Notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (-39.18%)
Mutual labels:  big-data
meetups-archivos
Slides, code, and videos from the meetups, data science days, video calls, and workshops. Data Science Research is a non-profit organization that seeks to spread and decentralize knowledge of Data Science and Artificial Intelligence in Peru, giving opportunities to new talent through MeetUps, Workshops, and Semilleros …
Stars: ✭ 60 (-97.27%)
Mutual labels:  big-data
Storm
Mirror of Apache Storm
Stars: ✭ 6,297 (+186.23%)
Mutual labels:  big-data
bigquery-kafka-connect
☁️ nodejs kafka connect connector for Google BigQuery
Stars: ✭ 17 (-99.23%)
Mutual labels:  big-data
Hydrograph
A visual ETL development and debugging tool for big data
Stars: ✭ 144 (-93.45%)
Mutual labels:  big-data
LoL-Match-Prediction
Win probability predictions for League of Legends matches using neural networks
Stars: ✭ 34 (-98.45%)
Mutual labels:  big-data
Cython
The most widely used Python to C compiler
Stars: ✭ 6,588 (+199.45%)
Mutual labels:  big-data
insightedge
InsightEdge Core
Stars: ✭ 22 (-99%)
Mutual labels:  big-data
Reef
Mirror of Apache REEF
Stars: ✭ 92 (-95.82%)
Mutual labels:  big-data
cloudberry
Big Data Visualization
Stars: ✭ 89 (-95.95%)
Mutual labels:  big-data
Samza
Mirror of Apache Samza
Stars: ✭ 676 (-69.27%)
Mutual labels:  big-data
nebula
A distributed block-based data storage and compute engine
Stars: ✭ 127 (-94.23%)
Mutual labels:  big-data
Azuredatalake
Samples and Docs for Azure Data Lake Store and Analytics
Stars: ✭ 128 (-94.18%)
Mutual labels:  big-data
sparkucx
A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing UCX communication layer
Stars: ✭ 32 (-98.55%)
Mutual labels:  big-data
Sdc
Intel® Scalable Dataframe Compiler for Pandas*
Stars: ✭ 623 (-71.68%)
Mutual labels:  big-data
rastercube
rastercube is a python library for big data analysis of georeferenced time series data (e.g. MODIS NDVI)
Stars: ✭ 15 (-99.32%)
Mutual labels:  big-data
Bitcoin Value Predictor
[NOT MAINTAINED] Predicting Bitcoin price using time series analysis and sentiment analysis of tweets about Bitcoin
Stars: ✭ 91 (-95.86%)
Mutual labels:  big-data
flume-elasticsearch-sink
Flume sink plugin for Elasticsearch
Stars: ✭ 39 (-98.23%)
Mutual labels:  flume
H2o 3
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+157.09%)
Mutual labels:  big-data
CS Book
🔥 Latest computer science e-books. Provides downloads of the latest technical e-books. "All I want is to out-compete every one of you, or be out-competed by you!"
Stars: ✭ 40 (-98.18%)
Mutual labels:  big-data
Presto
The official home of the Presto distributed SQL query engine for big data
Stars: ✭ 12,957 (+488.95%)
Mutual labels:  big-data
spark-records
Bulletproof Apache Spark jobs with fast root cause analysis of failures.
Stars: ✭ 67 (-96.95%)
Mutual labels:  big-data
Zeppelin
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Stars: ✭ 5,513 (+150.59%)
Mutual labels:  big-data
RemoteShuffleService
Celeborn provides an elastic and high-performance service for shuffle and spilled data.
Stars: ✭ 262 (-88.09%)
Mutual labels:  big-data
Parquet Mr
Apache Parquet
Stars: ✭ 1,278 (-41.91%)
Mutual labels:  big-data
terraform-aws-kinesis-firehose
This code creates a Kinesis Firehose in AWS to send CloudWatch log data to S3.
Stars: ✭ 25 (-98.86%)
Mutual labels:  big-data
Scanner
Efficient video analysis at scale
Stars: ✭ 569 (-74.14%)
Mutual labels:  big-data
dxram
A distributed in-memory key-value storage for billions of small objects.
Stars: ✭ 25 (-98.86%)
Mutual labels:  big-data
Griffon Vm
Griffon Data Science Virtual Machine
Stars: ✭ 128 (-94.18%)
Mutual labels:  big-data
img2dataset
Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.
Stars: ✭ 1,173 (-46.68%)
Mutual labels:  big-data
Nipype
Workflows and interfaces for neuroimaging packages
Stars: ✭ 557 (-74.68%)
Mutual labels:  big-data
GDLibrary
Matlab library for gradient descent algorithms: Version 1.0.1
Stars: ✭ 50 (-97.73%)
Mutual labels:  big-data
Panoptes
A Global Scale Network Telemetry Ecosystem
Stars: ✭ 80 (-96.36%)
Mutual labels:  big-data
Thrill
Thrill - An EXPERIMENTAL Algorithmic Distributed Big Data Batch Processing Framework in C++
Stars: ✭ 528 (-76%)
Mutual labels:  big-data
Bigdata Playground
A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Stars: ✭ 177 (-91.95%)
Mutual labels:  big-data
Keyvi
Keyvi - a key value index that powers Cliqz search engine. It is an in-memory FST-based data structure highly optimized for size and lookup performance.
Stars: ✭ 171 (-92.23%)
Mutual labels:  big-data
Fluo
Apache Fluo
Stars: ✭ 159 (-92.77%)
Mutual labels:  big-data
100daysofmlcode
My journey to learn and grow in the domain of Machine Learning and Artificial Intelligence by performing the #100DaysofMLCode Challenge.
Stars: ✭ 146 (-93.36%)
Mutual labels:  big-data
Gaffer
A large-scale entity and relation database supporting aggregation of properties
Stars: ✭ 1,642 (-25.36%)
Mutual labels:  big-data
Orc
An ORC file format reader and writer for Go.
Stars: ✭ 97 (-95.59%)
Mutual labels:  big-data
Pyspark Setup Demo
Demo of PySpark and Jupyter Notebook with the Jupyter Docker Stacks
Stars: ✭ 24 (-98.91%)
Mutual labels:  big-data
web-click-flow
Offline log analysis of website clickstreams
Stars: ✭ 14 (-99.36%)
Mutual labels:  flume
241-300 of 380 similar projects