
mkuthan / Example Spark Kafka

Apache Spark and Apache Kafka integration example

Programming Languages

scala

Projects that are alternatives to or similar to Example Spark Kafka

Real Time Stream Processing Engine
This is an example of real time stream processing using Spark Streaming, Kafka & Elasticsearch.
Stars: ✭ 37 (-69.17%)
Mutual labels:  kafka, spark, spark-streaming
Gimel
Big Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (+80%)
Mutual labels:  kafka, spark, spark-streaming
Azure Event Hubs Spark
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Stars: ✭ 140 (+16.67%)
Mutual labels:  kafka, spark, spark-streaming
Data Accelerator
Data Accelerator for Apache Spark simplifies onboarding to streaming of big data. It offers a rich, easy-to-use experience to help with the creation, editing, and management of Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.
Stars: ✭ 247 (+105.83%)
Mutual labels:  kafka, spark, spark-streaming
Spark Streaming With Kafka
Self-contained examples of Apache Spark streaming integrated with Apache Kafka.
Stars: ✭ 180 (+50%)
Mutual labels:  kafka, spark, spark-streaming
Sparta
Real Time Analytics and Data Pipelines based on Spark Streaming
Stars: ✭ 513 (+327.5%)
Mutual labels:  kafka, spark, spark-streaming
Awesome Recommendation Engine
The purpose of this tiny project is to put together the know-how I learned from the Big Data Expert course at formacionhadoop.com. The idea is to show how to play with Apache Spark Streaming, Kafka, MongoDB, and Spark machine learning algorithms.
Stars: ✭ 47 (-60.83%)
Mutual labels:  kafka, spark
Utils4s
A collection of test cases and related materials gathered while working with Scala and Spark.
Stars: ✭ 1,070 (+791.67%)
Mutual labels:  spark, spark-streaming
Kinesis Sql
Kinesis Connector for Structured Streaming
Stars: ✭ 120 (+0%)
Mutual labels:  spark, spark-streaming
Spark Mllib Twitter Sentiment Analysis
🌟 ✨ Analyze and visualize Twitter Sentiment on a world map using Spark MLlib
Stars: ✭ 113 (-5.83%)
Mutual labels:  spark, spark-streaming
Bigdata Interview
🎯 🌟 [Big data interview questions] Big-data interview questions collected from around the web, together with my own summarized answers. Currently covers interview questions for the Hadoop/Hive/Spark/Flink/HBase/Kafka/ZooKeeper frameworks.
Stars: ✭ 857 (+614.17%)
Mutual labels:  kafka, spark
Pyspark Examples
Code examples on Apache Spark using python
Stars: ✭ 58 (-51.67%)
Mutual labels:  spark, spark-streaming
Spark States
Custom state store providers for Apache Spark
Stars: ✭ 83 (-30.83%)
Mutual labels:  spark, spark-streaming
Delta Architecture
Streaming data changes to a Data Lake with Debezium and Delta Lake pipeline
Stars: ✭ 43 (-64.17%)
Mutual labels:  kafka, spark
Model Serving Tutorial
Code and presentation for Strata Model Serving tutorial
Stars: ✭ 57 (-52.5%)
Mutual labels:  kafka, spark
Learning Spark
Learning Spark from scratch; big data study notes.
Stars: ✭ 37 (-69.17%)
Mutual labels:  spark, spark-streaming
Thingsboard
Open-source IoT Platform - Device management, data collection, processing and visualization.
Stars: ✭ 10,526 (+8671.67%)
Mutual labels:  kafka, spark
Bigdata Notes
A getting-started guide to big data ⭐
Stars: ✭ 10,991 (+9059.17%)
Mutual labels:  kafka, spark
Logisland
Scalable stream processing platform for advanced real-time analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of valuable, ready-to-use processors, data sources, and sinks is available.
Stars: ✭ 97 (-19.17%)
Mutual labels:  kafka, spark
Bigdata Notebook
Stars: ✭ 100 (-16.67%)
Mutual labels:  kafka, spark

Apache Spark and Apache Kafka integration example


This example shows how to send processing results from Spark Streaming to Apache Kafka in a reliable way. The example follows the Spark convention for integration with external data sinks:

// import implicit conversions
import org.mkuthan.spark.KafkaDStreamSink._

// send dstream to Kafka
dstream.sendToKafka(kafkaProducerConfig, topic)
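
The producer configuration passed to sendToKafka carries ordinary Kafka producer properties. The exact type expected by sendToKafka is defined by this project, so treat the snippet below as an illustrative sketch only:

// illustrative producer configuration; adapt the shape to whatever sendToKafka expects
val kafkaProducerConfig = Map(
  "bootstrap.servers" -> "localhost:9092",
  "key.serializer"    -> "org.apache.kafka.common.serialization.ByteArraySerializer",
  "value.serializer"  -> "org.apache.kafka.common.serialization.ByteArraySerializer")

val topic = "output"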

Features

  • KafkaDStreamSink for sending streaming results to Apache Kafka in a reliable way (a minimal sketch of the pattern follows this list).
  • Stream processing fails fast if the results cannot be sent to Apache Kafka.
  • Stream processing is blocked (back pressure) if the Kafka producer is too slow.
  • Stream processing results are explicitly flushed from the Kafka producer's internal buffer.
  • The Kafka producer is shared by all tasks on a single JVM (see KafkaProducerFactory).
  • The Kafka producer is properly closed when the Spark executor shuts down (see KafkaProducerFactory).
  • Twitter Bijection is used for encoding/decoding KafkaPayload from/into String or Avro (a small encoding sketch is also shown below).
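
To make these guarantees concrete, the sketch below shows roughly how such a sink can be built from the plain Kafka producer API inside foreachRDD/foreachPartition. It is an illustration only, not the project's actual KafkaDStreamSink; all names are hypothetical, and producerFor is a naive stand-in for the per-JVM KafkaProducerFactory.

import java.util.Properties
import java.util.concurrent.atomic.AtomicReference

import org.apache.kafka.clients.producer.{Callback, KafkaProducer, ProducerRecord, RecordMetadata}
import org.apache.spark.streaming.dstream.DStream

// Illustrative sketch of a reliable Kafka sink for a DStream of strings.
object ReliableSendSketch {

  def sendReliably(stream: DStream[String], config: Properties, topic: String): Unit =
    stream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        val producer = producerFor(config)
        val firstError = new AtomicReference[Exception]()
        val callback = new Callback {
          override def onCompletion(metadata: RecordMetadata, exception: Exception): Unit =
            if (exception != null) firstError.compareAndSet(null, exception)
        }
        records.foreach { value =>
          // send() blocks when the producer buffer is full, which gives the
          // back-pressure behaviour described above
          producer.send(new ProducerRecord[String, String](topic, value), callback)
        }
        // flush the producer's internal buffer before the task finishes ...
        producer.flush()
        // ... and fail the task (fail fast) if any record could not be delivered
        if (firstError.get() != null) throw firstError.get()
      }
    }

  // naive stand-in: creates a fresh producer per partition; the real KafkaProducerFactory
  // caches a single producer per configuration and JVM and closes it on executor shutdown
  private def producerFor(config: Properties): KafkaProducer[String, String] =
    new KafkaProducer[String, String](config)
}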

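The Bijection-based encoding boils down to converting between String (or Avro records) and the raw Array[Byte] that Kafka transports. A minimal sketch using Bijection's built-in UTF-8 injection (KafkaPayload itself is this project's wrapper type and is not shown here):

import com.twitter.bijection._
import scala.util.Try

// the library provides an implicit Injection between String and Array[Byte] (UTF-8)
val utf8 = implicitly[Injection[String, Array[Byte]]]

val bytes: Array[Byte] = utf8("some words")   // encode: String -> bytes for the producer
val text: Try[String] = utf8.invert(bytes)    // decode: may fail, hence the Try
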
Quickstart guide

Download the latest Apache Kafka distribution and un-tar it.

Start ZooKeeper server:

./bin/zookeeper-server-start.sh config/zookeeper.properties

Start Kafka server:

./bin/kafka-server-start.sh config/server.properties

Create input topic:

./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 3 --topic input

Create output topic:

./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 3 --topic output

Start Kafka producer:

./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic input

Start Kafka consumer:

./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic output

Run example application:

sbt "runMain example.WordCountJob"

Publish a few words on the input topic using the Kafka console producer and check the processing results on the output topic using the Kafka console consumer.
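
For a quick smoke test, type a line of words into the console producer window. The exact output format is defined by WordCountJob, so the transcript below is only illustrative:

# console producer (input topic)
apache kafka apache spark

# console consumer (output topic), illustrative format
(apache,2)
(kafka,1)
(spark,1)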
