
mkuthan / Example Spark Kafka

Apache Spark and Apache Kafka integration example

Programming Languages

scala

Projects that are alternatives to or similar to Example Spark Kafka

Real Time Stream Processing Engine
This is an example of real time stream processing using Spark Streaming, Kafka & Elasticsearch.
Stars: ✭ 37 (-69.17%)
Mutual labels:  kafka, spark, spark-streaming
Gimel
Big Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (+80%)
Mutual labels:  kafka, spark, spark-streaming
Azure Event Hubs Spark
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Stars: ✭ 140 (+16.67%)
Mutual labels:  kafka, spark, spark-streaming
Data Accelerator
Data Accelerator for Apache Spark simplifies onboarding to streaming of big data. It offers a rich, easy-to-use experience to help with the creation, editing, and management of Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.
Stars: ✭ 247 (+105.83%)
Mutual labels:  kafka, spark, spark-streaming
Spark Streaming With Kafka
Self-contained examples of Apache Spark streaming integrated with Apache Kafka.
Stars: ✭ 180 (+50%)
Mutual labels:  kafka, spark, spark-streaming
Sparta
Real Time Analytics and Data Pipelines based on Spark Streaming
Stars: ✭ 513 (+327.5%)
Mutual labels:  kafka, spark, spark-streaming
Awesome Recommendation Engine
The purpose of this tiny project is to put together the know-how I learned from the Big Data Expert course at formacionhadoop.com. The idea is to show how to play with Apache Spark Streaming, Kafka, MongoDB, and Spark machine learning algorithms.
Stars: ✭ 47 (-60.83%)
Mutual labels:  kafka, spark
Utils4s
A collection of test cases and related materials gathered while working with Scala and Spark.
Stars: ✭ 1,070 (+791.67%)
Mutual labels:  spark, spark-streaming
Kinesis Sql
Kinesis Connector for Structured Streaming
Stars: ✭ 120 (+0%)
Mutual labels:  spark, spark-streaming
Spark Mllib Twitter Sentiment Analysis
🌟 ✨ Analyze and visualize Twitter Sentiment on a world map using Spark MLlib
Stars: ✭ 113 (-5.83%)
Mutual labels:  spark, spark-streaming
Bigdata Interview
🎯 🌟 [Big data interview questions] Big-data interview questions collected from around the web, together with my own summarized answers. Currently covers interview questions for the Hadoop/Hive/Spark/Flink/HBase/Kafka/ZooKeeper frameworks.
Stars: ✭ 857 (+614.17%)
Mutual labels:  kafka, spark
Pyspark Examples
Code examples on Apache Spark using python
Stars: ✭ 58 (-51.67%)
Mutual labels:  spark, spark-streaming
Spark States
Custom state store providers for Apache Spark
Stars: ✭ 83 (-30.83%)
Mutual labels:  spark, spark-streaming
Delta Architecture
Streaming data changes to a Data Lake with Debezium and Delta Lake pipeline
Stars: ✭ 43 (-64.17%)
Mutual labels:  kafka, spark
Model Serving Tutorial
Code and presentation for Strata Model Serving tutorial
Stars: ✭ 57 (-52.5%)
Mutual labels:  kafka, spark
Learning Spark
Learning Spark from scratch; big data study notes.
Stars: ✭ 37 (-69.17%)
Mutual labels:  spark, spark-streaming
Thingsboard
Open-source IoT Platform - Device management, data collection, processing and visualization.
Stars: ✭ 10,526 (+8671.67%)
Mutual labels:  kafka, spark
Bigdata Notes
A getting-started guide to big data ⭐
Stars: ✭ 10,991 (+9059.17%)
Mutual labels:  kafka, spark
Logisland
Scalable stream processing platform for advanced real-time analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of valuable, ready-to-use processors, data sources, and sinks is available.
Stars: ✭ 97 (-19.17%)
Mutual labels:  kafka, spark
Bigdata Notebook
Stars: ✭ 100 (-16.67%)
Mutual labels:  kafka, spark

Apache Spark and Apache Kafka integration example


This example shows how to send processing results from Spark Streaming to Apache Kafka in a reliable way. The example follows the Spark convention for integration with external data sinks:

// import implicit conversions
import org.mkuthan.spark.KafkaDStreamSink._

// send dstream to Kafka
dstream.sendToKafka(kafkaProducerConfig, topic)
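
The producer configuration passed to sendToKafka carries ordinary Kafka producer properties. The exact type expected by sendToKafka is defined by this project, so treat the snippet below as an illustrative sketch only:

// illustrative producer configuration; adapt the shape to whatever sendToKafka expects
val kafkaProducerConfig = Map(
  "bootstrap.servers" -> "localhost:9092",
  "key.serializer"    -> "org.apache.kafka.common.serialization.ByteArraySerializer",
  "value.serializer"  -> "org.apache.kafka.common.serialization.ByteArraySerializer")

val topic = "output"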

Features

  • KafkaDStreamSink for sending streaming results to Apache Kafka in a reliable way (a minimal sketch of the pattern follows this list).
  • Stream processing fails fast if the results cannot be sent to Apache Kafka.
  • Stream processing is blocked (back pressure) if the Kafka producer is too slow.
  • Stream processing results are explicitly flushed from the Kafka producer's internal buffer.
  • The Kafka producer is shared by all tasks on a single JVM (see KafkaProducerFactory).
  • The Kafka producer is properly closed when the Spark executor shuts down (see KafkaProducerFactory).
  • Twitter Bijection is used for encoding/decoding KafkaPayload from/into String or Avro (a small encoding sketch is also shown below).
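
To make these guarantees concrete, the sketch below shows roughly how such a sink can be built from the plain Kafka producer API inside foreachRDD/foreachPartition. It is an illustration only, not the project's actual KafkaDStreamSink; all names are hypothetical, and producerFor is a naive stand-in for the per-JVM KafkaProducerFactory.

import java.util.Properties
import java.util.concurrent.atomic.AtomicReference

import org.apache.kafka.clients.producer.{Callback, KafkaProducer, ProducerRecord, RecordMetadata}
import org.apache.spark.streaming.dstream.DStream

// Illustrative sketch of a reliable Kafka sink for a DStream of strings.
object ReliableSendSketch {

  def sendReliably(stream: DStream[String], config: Properties, topic: String): Unit =
    stream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        val producer = producerFor(config)
        val firstError = new AtomicReference[Exception]()
        val callback = new Callback {
          override def onCompletion(metadata: RecordMetadata, exception: Exception): Unit =
            if (exception != null) firstError.compareAndSet(null, exception)
        }
        records.foreach { value =>
          // send() blocks when the producer buffer is full, which gives the
          // back-pressure behaviour described above
          producer.send(new ProducerRecord[String, String](topic, value), callback)
        }
        // flush the producer's internal buffer before the task finishes ...
        producer.flush()
        // ... and fail the task (fail fast) if any record could not be delivered
        if (firstError.get() != null) throw firstError.get()
      }
    }

  // naive stand-in: creates a fresh producer per partition; the real KafkaProducerFactory
  // caches a single producer per configuration and JVM and closes it on executor shutdown
  private def producerFor(config: Properties): KafkaProducer[String, String] =
    new KafkaProducer[String, String](config)
}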

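The Bijection-based encoding boils down to converting between String (or Avro records) and the raw Array[Byte] that Kafka transports. A minimal sketch using Bijection's built-in UTF-8 injection (KafkaPayload itself is this project's wrapper type and is not shown here):

import com.twitter.bijection._
import scala.util.Try

// the library provides an implicit Injection between String and Array[Byte] (UTF-8)
val utf8 = implicitly[Injection[String, Array[Byte]]]

val bytes: Array[Byte] = utf8("some words")   // encode: String -> bytes for the producer
val text: Try[String] = utf8.invert(bytes)    // decode: may fail, hence the Try
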
Quickstart guide

Download the latest Apache Kafka distribution and un-tar it.

Start ZooKeeper server:

./bin/zookeeper-server-start.sh config/zookeeper.properties

Start Kafka server:

./bin/kafka-server-start.sh config/server.properties

Create input topic:

./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 3 --topic input

Create output topic:

./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 3 --topic output

Start Kafka producer:

./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic input

Start Kafka consumer:

./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic output

Run example application:

sbt "runMain example.WordCountJob"

Publish a few words on the input topic using the Kafka console producer and check the processing results on the output topic using the Kafka console consumer.
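
For a quick smoke test, type a line of words into the console producer window. The exact output format is defined by WordCountJob, so the transcript below is only illustrative:

# console producer (input topic)
apache kafka apache spark

# console consumer (output topic), illustrative format
(apache,2)
(kafka,1)
(spark,1)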
