Data Accelerator - Data Accelerator for Apache Spark simplifies onboarding to streaming big data. It offers a rich, easy-to-use experience for creating, editing, and managing Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.
Gimel - Big data processing framework providing a unified Data API or SQL on any storage.
Example Spark - Spark, Spark Streaming, and Spark SQL unit testing strategies.
Bigdata Playground - A complete example of a big data application using: Kubernetes (kops/AWS), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, the Twitter API, MongoDB, Node.js, Angular, and GraphQL.
Scramjet - A simple yet powerful live data computation framework.
Spark - .NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Waterdrop - Production-ready data integration product, documentation:
Utils4s - A collection of test cases and related materials gathered while working with Scala and Spark.
Wormhole - Wormhole is a SPaaS (Stream Processing as a Service) platform.
Mobius - C# and F# language bindings and extensions to Apache Spark.
Bandar Log - A monitoring tool to measure the flow throughput of data sources and processing components in data ingestion and ETL pipelines.
Angel - A flexible and powerful parameter server for large-scale machine learning.
Sparta - Real-time analytics and data pipelines based on Spark Streaming.
CDAP - An open-source framework for building data analytics applications.
Sylph - A stream computing platform for big data.
litemall-dw - A big data project based on the open-source Litemall e-commerce project, including front-end event tracking (OpenResty + Lua) and back-end tracking, a five-layer data warehouse, real-time computation, and user profiling. The platform runs on CDH 6.3.2 (provisioned with Vagrant + Ansible scripts) and also includes Azkaban workflows.
spark-utils - Basic framework utilities to quickly start writing production-ready Apache Spark applications.
cassandra.realtime - Different ways to process data into Cassandra in real time with technologies such as Kafka, Spark, Akka, and Flink.
wasp - WASP is a framework for building complex real-time big data applications. It relies on a Kappa/Lambda-style architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze them through complex pipelines, this is the framework for you.
xxhadoop - Data analysis using Hadoop, Spark, Storm, Elasticsearch, machine learning, and more; the author's daily notes, code, and demos. Don't fork, just star!
interview-refresh-java-bigdata - A one-stop repo for code snippets covering core Java concepts, SQL, data structures, and big data, along with interview questions asked in real interviews.
T-Watch - Real-time Twitter sentiment analysis product.
Spark ALS - Recommendation algorithm implementations based on spark-ml, spark-mllib, and spark-streaming.
fdp-modelserver - An umbrella project for multiple implementations of model serving.
ExDeMon - A general-purpose metrics monitor implemented with Apache Spark: Kafka source, Elastic sink, aggregated metrics, different analyses, notifications, actions, live configuration updates, missing-metrics detection, and more.