Streaming reference architecture for ETL with Kafka and Kafka Connect. Find out more at http://lenses.io about how we provide a unified solution to manage your connectors, the most advanced SQL engine for Kafka and Kafka Streams, cluster monitoring and alerting, and more.

Stars: ✭ 753 (+46.78%)

Mutual labels: kafka, streaming

Kafka Storm Starter

Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, using Apache Avro as the data serialization format.

Stars: ✭ 728 (+41.91%)

Mutual labels: kafka, spark

Serverless Analytics

Track website visitors with Serverless Analytics using Kinesis, Lambda, and TypeScript.

Stars: ✭ 219 (-57.31%)

Mutual labels: lambda, analytics

Delta Architecture

Streaming data changes to a data lake with a Debezium and Delta Lake pipeline

Stars: ✭ 43 (-91.62%)

Mutual labels: kafka, spark

Spark On Lambda

Apache Spark on AWS Lambda

Stars: ✭ 137 (-73.29%)

Mutual labels: lambda, spark

wasp

WASP is a framework for building complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze them through complex pipelines, this is the framework for you.

Stars: ✭ 19 (-96.3%)

Mutual labels: spark-streaming, hdfs

matrixone

Hyperconverged cloud-edge native database

Stars: ✭ 1,057 (+106.04%)

Mutual labels: streaming, olap

Spark Mllib Twitter Sentiment Analysis

🌟 ✨ Analyze and visualize Twitter Sentiment on a world map using Spark MLlib

Stars: ✭ 113 (-77.97%)

Mutual labels: spark, spark-streaming

Camus

Mirror of LinkedIn's Camus

Stars: ✭ 81 (-84.21%)

Mutual labels: kafka, hdfs

MStream

Real-time anomaly detection on time-evolving streams: detecting intrusions (DoS and DDoS attacks), fraud, and fake-rating anomalies.

Stars: ✭ 68 (-86.74%)

Mutual labels: streaming, real-time

fastdata-cluster

Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)

Stars: ✭ 20 (-96.1%)

Mutual labels: spark, hdfs

Real Time Stock Market Prediction

In this repository, I have developed the entire server-side architecture for real-time stock market prediction with machine learning. I used TensorFlow.js to build the ML model architecture and Kafka for real-time data streaming and pipelining.

Stars: ✭ 414 (-19.3%)

Mutual labels: kafka, streaming

leaflet heatmap

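A simple visualization of Huzhou phone-call data. Assuming the data volume is too large to draw the heatmap directly in the browser, the heatmap-rendering step is moved offline for computation and analysis. After the data is processed in parallel with Apache Spark, Apache Spark is also used to render the heatmap, and leafletjs then loads an OpenStreetMap layer and the heatmap layer to provide good interactivity. Rendering is currently implemented with Apache Spark; possibly because Apache Spark is not well suited to this kind of computation, or because my algorithm is not well designed, the parallel computation is slower than single-machine computation. The Apache Spark heatmap rendering and computation code is here: https://github.com/yuanzhaokang/ParallelizeHeatmap.git .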

Stars: ✭ 13 (-97.47%)

Mutual labels: spark, hdfs

traffic

Massively real-time traffic streaming application

Stars: ✭ 25 (-95.13%)

Mutual labels: streaming, real-time

awesome-AI-kubernetes

❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++, etc.

Stars: ✭ 95 (-81.48%)

Mutual labels: spark, analytics

TogetherStream

A social and synchronized streaming experience

Stars: ✭ 16 (-96.88%)

Mutual labels: streaming, real-time

beneath

Beneath is a serverless real-time data platform ⚡️

Stars: ✭ 65 (-87.33%)

Mutual labels: streaming, analytics

Technology Talk

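A roundup of knowledge on commonly used technical frameworks and open-source middleware in the Java ecosystem, system architecture, databases, architecture case studies from large companies, common third-party libraries, project management, production troubleshooting, personal growth, reflections, and more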

Stars: ✭ 12,136 (+2265.69%)

Mutual labels: kafka, spark

Benthos

Fancy stream processing made operationally mundane

Stars: ✭ 3,705 (+622.22%)

Mutual labels: kafka, streaming-data

Bigdata Playground

A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL

Stars: ✭ 177 (-65.5%)

Mutual labels: kafka, spark-streaming

Crate

CrateDB is a distributed SQL database that makes it simple to store and analyze massive amounts of data in real-time.

Stars: ✭ 3,254 (+534.31%)

Mutual labels: analytics, olap

Dagster

An orchestration platform for the development, production, and observation of data assets.

Stars: ✭ 4,099 (+699.03%)

Mutual labels: analytics, workflow

Voik

♒︎ [WIP] An experimental ~distributed~ commit-log

Stars: ✭ 200 (-61.01%)

Mutual labels: kafka, streaming

Waterdrop

Production-ready data integration product. Documentation:

Stars: ✭ 1,856 (+261.79%)

Mutual labels: spark, spark-streaming

Video Stream Analytics

Stars: ✭ 240 (-53.22%)

Mutual labels: kafka, spark

Whatsmars

Java ecosystem research (Spring Boot + Redis + Dubbo + RocketMQ + Elasticsearch) 🔥🔥🔥🔥🔥

Stars: ✭ 1,389 (+170.76%)

Mutual labels: lambda, kafka

Storagetapper

StorageTapper is a scalable real-time MySQL change data streaming, logical backup and logical replication service

Stars: ✭ 232 (-54.78%)

Mutual labels: kafka, hdfs

Learningspark

Scala examples for learning to use Spark

Stars: ✭ 421 (-17.93%)

Mutual labels: spark, spark-streaming

Flogo

Project Flogo is an open source ecosystem of opinionated event-driven capabilities to simplify building efficient & modern serverless functions, microservices & edge apps.

Stars: ✭ 1,891 (+268.62%)

Mutual labels: lambda, streaming

Sparkle

Haskell on Apache Spark.

Stars: ✭ 419 (-18.32%)

Mutual labels: spark, analytics

Tributary

Streaming reactive and dataflow graphs in Python

Stars: ✭ 231 (-54.97%)

Mutual labels: kafka, streaming

Cubesviewer

Explore and visualize analytical datasets

Stars: ✭ 416 (-18.91%)

Mutual labels: analytics, olap

transit

Massively real-time city transit streaming application

Stars: ✭ 20 (-96.1%)

Mutual labels: real-time, streaming-data

duckdb

DuckDB is an in-process SQL OLAP Database Management System

Stars: ✭ 4,707 (+817.54%)

Mutual labels: analytics, olap

wink-statistics

Fast & numerically stable statistical analysis

Stars: ✭ 36 (-92.98%)

Mutual labels: streaming, real-time

Clickhouse Native Jdbc

ClickHouse Native Protocol JDBC implementation

Stars: ✭ 310 (-39.57%)

Mutual labels: spark, analytics

transform-hub

A flexible and efficient data processing engine and an evolution of the popular Scramjet Framework based on Node.js. Our Transform Hub was designed specifically for data processing and includes its own unique algorithms.

Stars: ✭ 38 (-92.59%)

Mutual labels: streaming, real-time

bigdata-fun

A complete (distributed) BigData stack, running in containers

Stars: ✭ 14 (-97.27%)

Mutual labels: spark, hdfs

Materialize

Materialize lets you ask questions of your live data, which it answers and then maintains for you as your data continue to change. The moment you need a refreshed answer, you can get it in milliseconds. Materialize is designed to help you interactively explore your streaming data, perform data warehousing analytics against live relational data, or just increase the freshness and reduce the load of your dashboard and monitoring tasks.

Stars: ✭ 3,341 (+551.27%)

Mutual labels: kafka, streaming

Cloudflow

Cloudflow enables users to quickly develop, orchestrate, and operate distributed streaming applications on Kubernetes.

Stars: ✭ 278 (-45.81%)

Mutual labels: spark, streaming-data

Zat

Zeek Analysis Tools (ZAT): Processing and analysis of Zeek network data with Pandas, scikit-learn, Kafka and Spark

Stars: ✭ 303 (-40.94%)

Mutual labels: kafka, spark

Perspective

A data visualization and analytics component, especially well-suited for large and/or streaming datasets.

Stars: ✭ 3,989 (+677.58%)

Mutual labels: analytics, real-time

Kyuubi

Kyuubi is a unified multi-tenant JDBC interface for large-scale data processing and analytics, built on top of Apache Spark

Stars: ✭ 363 (-29.24%)

Mutual labels: spark, analytics

Devops Python Tools

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

Stars: ✭ 406 (-20.86%)

Mutual labels: spark, hdfs

kafka-spark-streaming-zeppelin-docker

One-click docker-compose deployment with Kafka, Spark Streaming, Zeppelin UI and monitoring (Grafana + Kafka Manager)

Stars: ✭ 82 (-84.02%)

Mutual labels: streaming, spark

Coolplayspark

CoolplaySpark (酷玩 Spark): Spark source code analysis, Spark libraries, and more

Stars: ✭ 3,318 (+546.78%)

Mutual labels: spark, spark-streaming

Delta

An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.

Stars: ✭ 3,903 (+660.82%)

Mutual labels: spark, analytics

Wirbelsturm

Wirbelsturm is a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data tech like Kafka.

Stars: ✭ 332 (-35.28%)

Mutual labels: kafka, spark

Redash

Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.

Stars: ✭ 20,147 (+3827.29%)

Mutual labels: spark, analytics

Wedatasphere

WeDataSphere is a financial-grade, one-stop open-source suite for big data platforms. The source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!

Stars: ✭ 372 (-27.49%)

Mutual labels: kafka, spark

Kafka Connect Ui

Web tool for Kafka Connect

Stars: ✭ 388 (-24.37%)

Mutual labels: kafka, hdfs

Dataspherestudio

DataSphereStudio is a one-stop data application development and management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.

Stars: ✭ 1,195 (+132.94%)

Mutual labels: spark, workflow

Spark States

Custom state store providers for Apache Spark

Stars: ✭ 83 (-83.82%)

Mutual labels: spark, spark-streaming

bigkube

Minikube for big data with Scala and Spark

Stars: ✭ 16 (-96.88%)

Mutual labels: spark, hdfs

61-120 of 2813 similar projects