
EthicalML / kafka-spark-streaming-zeppelin-docker

Licence: MIT License
One click deploy docker-compose with Kafka, Spark Streaming, Zeppelin UI and Monitoring (Grafana + Kafka Manager)

Projects that are alternatives of or similar to kafka-spark-streaming-zeppelin-docker

Zeppelin
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Stars: ✭ 5,513 (+6623.17%)
Mutual labels:  spark, zeppelin
Spark Twitter Stream Example
"Sentiment analysis" on a live Twitter feed with Apache Spark and Apache Bahir
Stars: ✭ 73 (-10.98%)
Mutual labels:  streaming, spark
Strimpack
A platform for livestreamers to make a home for their audience.
Stars: ✭ 378 (+360.98%)
Mutual labels:  streaming, docker-compose
Docker Spark Cluster
A simple Spark standalone cluster for your testing environment purposes
Stars: ✭ 261 (+218.29%)
Mutual labels:  spark, docker-compose
Spark
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Stars: ✭ 1,721 (+1998.78%)
Mutual labels:  streaming, spark
Sparta
Real Time Analytics and Data Pipelines based on Spark Streaming
Stars: ✭ 513 (+525.61%)
Mutual labels:  streaming, spark
Mobius
C# and F# language binding and extensions to Apache Spark
Stars: ✭ 929 (+1032.93%)
Mutual labels:  streaming, spark
Bigdata Notebook
Stars: ✭ 100 (+21.95%)
Mutual labels:  streaming, spark
Teddy
A Spark Streaming monitoring platform with support for job deployment, alerting, and automatic restart
Stars: ✭ 120 (+46.34%)
Mutual labels:  streaming, spark
Flink Learning
Flink learning blog (http://www.54tianzhisheng.cn/). Covers Flink fundamentals, concepts, internals, hands-on practice, performance tuning, and source-code analysis, with worked examples for Flink Connectors, Metrics, Libraries, the DataStream API, and the Table API & SQL, plus large-scale production case studies (PV/UV counting, log storage, real-time deduplication of tens of billions of records, monitoring and alerting). The author also maintains the column "Big Data Real-Time Compute Engine Flink: Practice and Performance Optimization".
Stars: ✭ 11,378 (+13775.61%)
Mutual labels:  streaming, spark
Azure Event Hubs
☁️ Cloud-scale telemetry ingestion from any stream of data with Azure Event Hubs
Stars: ✭ 233 (+184.15%)
Mutual labels:  streaming, spark
Azure Event Hubs Spark
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Stars: ✭ 140 (+70.73%)
Mutual labels:  streaming, spark
Data Accelerator
Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy to use experience to help with creation, editing and management of Spark jobs on Azure HDInsights or Databricks while enabling the full power of the Spark engine.
Stars: ✭ 247 (+201.22%)
Mutual labels:  streaming, spark
django-boilerplate
An opinionated Django boilerplate to run your project on Docker Compose (Redis, RabbitMQ, base/dev/prod settings, etc.) 🌟 Give it a star if you like it.
Stars: ✭ 35 (-57.32%)
Mutual labels:  docker-compose
hakase-labs
Learn and share.
Stars: ✭ 24 (-70.73%)
Mutual labels:  docker-compose
MMseqs2-App
MMseqs2 app to run on your workstation or servers
Stars: ✭ 16 (-80.49%)
Mutual labels:  docker-compose
docker-parse-mongo
Parse Server with MongoDB ReplicaSet using Docker (for AWS EC2 or GCP GCE)
Stars: ✭ 27 (-67.07%)
Mutual labels:  docker-compose
angular-forum
Forum application built with Angular
Stars: ✭ 52 (-36.59%)
Mutual labels:  docker-compose
basin
Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser
Stars: ✭ 25 (-69.51%)
Mutual labels:  spark
spark learning
Spark study material based on the 2019 edition of the Shang Silicon Valley (尚硅谷) big data Spark course
Stars: ✭ 42 (-48.78%)
Mutual labels:  spark


One Click Deploy: Kafka Spark Streaming with Zeppelin UI

This repository contains a docker-compose stack with Kafka and Spark Streaming, plus monitoring via Kafka Manager and a Grafana dashboard. The networking is set up so the Kafka brokers can be reached from the host.

It also comes with a producer-consumer example using a small subset of the US Census adult income prediction dataset.

High level features:

Monitoring with Grafana

Zeppelin UI

Kafka access from host

Multiple Spark interpreters

Detailed Summary

Container Image Tag Accessible
zookeeper wurstmeister/zookeeper latest 172.25.0.11:2181
kafka1 wurstmeister/kafka 2.12-2.2.0 172.25.0.12:9092 (port 8080 for JMX metrics)
kafka2 wurstmeister/kafka 2.12-2.2.0 172.25.0.13:9092 (port 8080 for JMX metrics)
kafka_manager hlebalbau/kafka_manager 1.3.3.18 172.25.0.14:9000
prometheus prom/prometheus v2.8.1 172.25.0.15:9090
grafana grafana/grafana 6.1.1 172.25.0.16:3000
zeppelin apache/zeppelin 0.8.1 172.25.0.19:8080

Quickstart

The easiest way to understand the setup is by diving into it and interacting with it.

Running Docker Compose

To bring up the stack with Docker Compose, run the following command in the repository folder:

docker-compose up -d

This runs the stack detached. If you want to see the logs, you can run:

docker-compose logs -f -t --tail=10

To see memory and CPU usage (handy for checking that Docker has enough memory allocated), use:

docker stats

Accessing the notebook

You can access the default notebook at http://172.25.0.19:8080/#/notebook/2EAB941ZD and start running the cells.

1) Setup

Install the kafka-python dependency
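
In the notebook this is a one-line install cell. A minimal sketch, assuming Zeppelin's standard %sh shell interpreter is available (the client library is published on PyPI as kafka-python):

%sh
pip install kafka-python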

2) Producer

We have a dedicated interpreter called %producer.pyspark, which lets the producer cells run in parallel with the consumer.

Load our example dummy dataset

We have made available a 1000-row version of the US Census adult income prediction dataset.
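
A minimal sketch of the loading cell, assuming pandas is available in the container and the sample CSV lives at a hypothetical path data/adult.csv (check the repository for the actual filename):

%producer.pyspark

import pandas as pd

# Hypothetical path to the bundled 1000-row sample of the
# US Census adult income dataset.
df = pd.read_csv("data/adult.csv")
print(df.shape)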

Start the stream of rows

We now take one row at random and send it using our kafka-python producer. The topic is created automatically if it doesn't exist (because auto.create.topics.enable is set to true).
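
A sketch of what that producer loop looks like with kafka-python. The topic name default_topic and the JSON encoding are assumptions here; the broker addresses come from the summary table above:

%producer.pyspark

import json
import time
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["172.25.0.12:9092", "172.25.0.13:9092"],
    value_serializer=lambda v: json.dumps(v).encode("utf-8"))

while True:
    # Pick one row at random; sending to a missing topic creates it
    # because auto.create.topics.enable is true.
    row = df.sample(n=1).to_dict(orient="records")[0]
    producer.send("default_topic", value=row)
    producer.flush()
    time.sleep(1)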

3) Consumer

We now use the %consumer.pyspark interpreter to run our PySpark job in parallel with the producer.

Connect to the stream and print

Now we can run the Spark Streaming job that connects to the topic and listens for data. The job consumes 2-second windows and prints the ID and "label" of every row within each window.
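
A sketch of such a job with the DStream API that shipped with Spark 2.x (KafkaUtils from the spark-streaming-kafka-0-8 integration; the exact receiver the notebook uses may differ). The field names id and label are assumptions based on the description above:

%consumer.pyspark

import json
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

# 2-second batch interval, matching the window described above.
ssc = StreamingContext(sc, 2)

stream = KafkaUtils.createDirectStream(
    ssc, ["default_topic"],
    {"metadata.broker.list": "172.25.0.12:9092,172.25.0.13:9092"})

# Each element is a (key, value) pair; parse the JSON value and
# print only the fields we care about.
rows = stream.map(lambda kv: json.loads(kv[1]))
rows.map(lambda r: (r.get("id"), r.get("label"))).pprint()

ssc.start()
ssc.awaitTermination()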

4) Monitor Kafka

We can now use Kafka Manager to dive into the current Kafka setup.

Setup Kafka Manager

Kafka Manager first needs to be pointed at the cluster. To do this, go to http://172.25.0.14:9000/addCluster and fill in the following two fields:

  • Cluster name: Kafka
  • Zookeeper hosts: 172.25.0.11:2181

Optionally:

You can tick the following:
    • Enable JMX Polling
    • Poll consumer information

Access the topic information

If your cluster was named "Kafka", then you can go to http://172.25.0.14:9000/clusters/Kafka/topics/default_topic, where you will be able to see the partition offsets. Given that the topic was created automatically, it will have only 1 partition.
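
You can also inspect the topic from the command line. A sketch, assuming the Kafka CLI scripts are on the PATH inside the wurstmeister/kafka container (they live under /opt/kafka/bin in that image):

docker-compose exec kafka1 kafka-topics.sh --describe --topic default_topic --zookeeper 172.25.0.11:2181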

Visualise metrics in Grafana

Finally, you can access the default kafka dashboard in Grafana (username is "admin" and password is "password") by going to http://172.25.0.16:3000/d/xyAGlzgWz/kafka?orgId=1
