
kaiwaehner / Kafka Streams Machine Learning Examples

Licence: apache-2.0
This project contains examples which demonstrate how to deploy analytic models to mission-critical, scalable production environments leveraging Apache Kafka and its Streams API. Models are built with Python, H2O, TensorFlow, Keras, DeepLearning4J and other technologies.

Programming Languages: Python, Java

Projects that are alternatives to or similar to Kafka Streams Machine Learning Examples

Kafka Ui
Open-Source Web GUI for Apache Kafka Management
Stars: ✭ 230 (-65.2%)
Mutual labels:  kafka, kafka-streams, kafka-client, open-source
Java Kafka Client
OpenTracing Instrumentation for Apache Kafka Client
Stars: ✭ 101 (-84.72%)
Mutual labels:  kafka, kafka-streams, kafka-client
Ksql Udf Deep Learning Mqtt Iot
Deep Learning UDF for KSQL for Streaming Anomaly Detection of MQTT IoT Sensor Data
Stars: ✭ 219 (-66.87%)
Mutual labels:  kafka, kafka-client, open-source
Faust
Python Stream Processing
Stars: ✭ 5,899 (+792.44%)
Mutual labels:  kafka, kafka-streams
Kafka
Go driver for Kafka
Stars: ✭ 212 (-67.93%)
Mutual labels:  kafka, open-source
Kafka Streams
equivalent to kafka-streams 🐙 for nodejs ✨🐢🚀✨
Stars: ✭ 613 (-7.26%)
Mutual labels:  kafka, kafka-streams
Data Accelerator
Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy-to-use experience to help with the creation, editing and management of Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.
Stars: ✭ 247 (-62.63%)
Mutual labels:  kafka, kafka-streams
Kafkastreams Cep
Complex Event Processing on top of Kafka Streams
Stars: ✭ 257 (-61.12%)
Mutual labels:  kafka, kafka-streams
Node Sinek
🎩 Most advanced high level Node.js Kafka client
Stars: ✭ 262 (-60.36%)
Mutual labels:  kafka, kafka-client
Rust Rdkafka
A fully asynchronous, futures-based Kafka client library for Rust based on librdkafka
Stars: ✭ 637 (-3.63%)
Mutual labels:  kafka, kafka-client
Kafka Go
Kafka library in Go
Stars: ✭ 4,200 (+535.4%)
Mutual labels:  kafka, kafka-client
Kafka Streams Course
Learn Kafka Streams with several examples!
Stars: ✭ 625 (-5.45%)
Mutual labels:  kafka, kafka-streams
Go Streams
A lightweight stream processing library for Go
Stars: ✭ 615 (-6.96%)
Mutual labels:  kafka, kafka-streams
Hivemq Mqtt Tensorflow Kafka Realtime Iot Machine Learning Training Inference
Real Time Big Data / IoT Machine Learning (Model Training and Inference) with HiveMQ (MQTT), TensorFlow IO and Apache Kafka - no additional data store like S3, HDFS or Spark required
Stars: ✭ 204 (-69.14%)
Mutual labels:  kafka, kafka-streams
Strimzi Kafka Operator
Apache Kafka running on Kubernetes
Stars: ✭ 2,833 (+328.59%)
Mutual labels:  kafka, kafka-streams
Franz Go
franz-go contains a high performance, pure Go library for interacting with Kafka from 0.8.0 through 2.7.0+. Producing, consuming, transacting, administrating, etc.
Stars: ✭ 199 (-69.89%)
Mutual labels:  kafka, kafka-client
Kq
Kafka-based Job Queue for Python
Stars: ✭ 530 (-19.82%)
Mutual labels:  kafka, kafka-client
Mockedstreams
Scala DSL for Unit-Testing Processing Topologies in Kafka Streams
Stars: ✭ 184 (-72.16%)
Mutual labels:  kafka, kafka-streams
Kafka Streams Scala
Thin Scala wrapper around Kafka Streams Java API
Stars: ✭ 192 (-70.95%)
Mutual labels:  kafka, kafka-streams
Scalatest Embedded Kafka
A library that provides an in-memory Kafka instance to run your tests against.
Stars: ✭ 292 (-55.82%)
Mutual labels:  kafka, kafka-streams

Machine Learning + Kafka Streams Examples

This project contains examples which demonstrate how to deploy analytic models to mission-critical, scalable production environments leveraging Apache Kafka and its Streams API. Examples will include analytic models built with TensorFlow, Keras, H2O, Python, DeepLearning4J and other technologies.

Kafka Open Source Ecosystem for a Scalable Mission Critical Machine Learning Infrastructure

Material (Blog Posts, Slides, Videos)

Here is some material about this topic if you want to read about and listen to the theory instead of only doing hands-on work:

Use Cases and Technologies

The following examples, including unit tests, are already available:
  • Deployment of an H2O GBM model to a Kafka Streams application for prediction of flight delays
  • Deployment of an H2O Deep Learning model to a Kafka Streams application for prediction of flight delays
  • Deployment of a pre-built TensorFlow CNN model for image recognition
  • Deployment of a DL4J model to predict the species of Iris flowers
  • Deployment of a Keras model (trained with TensorFlow backend) using the Import Model API from DeepLearning4J
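All of these examples share one basic pattern: the trained model is loaded once when the application starts and is then applied to every incoming event, producing a prediction that is written to an output topic. The following is a minimal plain-Java sketch of that pattern so it runs without a Kafka cluster. `Model`, `LinearToyModel`, `StreamScoring` and the comma-separated input format are hypothetical stand-ins, not classes from this project; in a real Kafka Streams application the per-record function would typically be passed to `KStream#mapValues` instead of being looped over a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical model abstraction: in the real examples this role is played by
// an H2O, TensorFlow or DL4J model loaded once at application startup.
interface Model {
    double predict(double[] features);
}

// Toy stand-in model (a weighted sum of the features), not a trained model.
class LinearToyModel implements Model {
    private final double[] weights;

    LinearToyModel(double[] weights) {
        this.weights = weights.clone();
    }

    @Override
    public double predict(double[] features) {
        double sum = 0.0;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * features[i];
        }
        return sum;
    }
}

class StreamScoring {
    // Builds the per-record function once; the already-loaded model is captured
    // by the lambda and reused for every event instead of being reloaded.
    static Function<String, String> scorer(Model model) {
        return value -> {
            // Assumed input format: comma-separated numeric features, e.g. "1,2".
            String[] parts = value.split(",");
            double[] features = new double[parts.length];
            for (int i = 0; i < parts.length; i++) {
                features[i] = Double.parseDouble(parts[i].trim());
            }
            return "Prediction: " + model.predict(features);
        };
    }

    // Simulates a stateless stream by applying the scorer to each value in order.
    // In Kafka Streams, this loop corresponds to stream.mapValues(scorer::apply).
    static List<String> process(List<String> values, Function<String, String> scorer) {
        List<String> out = new ArrayList<>();
        for (String value : values) {
            out.add(scorer.apply(value));
        }
        return out;
    }
}
```

The design point worth keeping is that the model is created outside the per-record function, so the often expensive model load happens once per application instance rather than once per event.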

More sophisticated use cases around Kafka Streams and other technologies will be added over time in this or related GitHub projects. Some ideas:

  • Image Recognition with H2O and TensorFlow (to show the difference between using H2O and using only low-level TensorFlow APIs)
  • Anomaly Detection with Autoencoders leveraging DeepLearning4J
  • Cross-Selling and Customer Churn Detection using classical Machine Learning algorithms as well as Deep Learning
  • Stateful Stream Processing to combine different model execution steps into a more powerful workflow instead of "just" inferencing single events (a good example might be a streaming process with sliding or session windows).
  • Keras to build different models with Python, TensorFlow, Theano and other Deep Learning frameworks under the hood + Kafka Streams as generic Machine Learning infrastructure to deploy, execute and monitor these different models.
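The stateful idea above can be illustrated with session windowing: events for the same key are grouped into one session until a period of inactivity exceeds a gap threshold, and a model could then score the whole session instead of single events. The following is a plain-Java sketch of that grouping logic for a single key, assuming timestamps arrive already sorted in ascending order. `SessionWindowing` is a hypothetical helper, not the Kafka Streams implementation; Kafka Streams provides this through `SessionWindows` and additionally handles late or out-of-order records and state stores.

```java
import java.util.ArrayList;
import java.util.List;

class SessionWindowing {
    // Groups ascending timestamps (in ms) for one key into sessions: a new
    // session starts whenever the gap to the previous event exceeds inactivityGapMs.
    static List<List<Long>> sessionize(List<Long> sortedTimestamps, long inactivityGapMs) {
        List<List<Long>> sessions = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        Long previous = null;
        for (long ts : sortedTimestamps) {
            if (previous != null && ts - previous > inactivityGapMs) {
                // Gap too large: close the current session and open a new one.
                sessions.add(current);
                current = new ArrayList<>();
            }
            current.add(ts);
            previous = ts;
        }
        if (!current.isEmpty()) {
            sessions.add(current);
        }
        return sessions;
    }
}
```

For example, with a 5-second gap, events at 0 s, 1 s, 2 s, 10 s and 10.5 s form two sessions: {0, 1, 2} and {10, 10.5}.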
Some other GitHub projects already exist with more ML + Kafka content:

The most exciting and powerful example first: Streaming Machine Learning at Scale from 100,000 IoT Devices with HiveMQ, Apache Kafka and TensorFlow

Here are some more demos:

Requirements, Installation and Usage

The code is developed and tested on Mac and Linux operating systems. Because Kafka does not support Windows and does not work well on it, Windows is not tested at all.

Java 8 and Maven 3 are required. Maven will download all required dependencies.

Just download the project and run

            mvn clean package

You can do this in the main directory or in each module separately.

Apache Kafka 2.5 is currently used. The code is also compatible with Kafka and Kafka Streams 1.1 and 2.x.

Please make sure to run the Maven build without any changes first. If it works without errors, you can change library versions, Java version, etc. and see if it still works or if you need to adjust code.

Every example includes an implementation and a unit test. The examples are very simple and lightweight, and no further configuration is needed to build and run them. For this reason, however, the generated models are also included in the project, which increases its download size.
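Because the trained models ship inside the project, an application can read them from the classpath rather than from an external path. The following is a minimal sketch of that approach, assuming a hypothetical `ModelLoader` helper that is not part of this project and a resource name you would replace with the actual model file: the bytes are read once at startup, and a missing resource fails fast with a clear error instead of surfacing later during stream processing.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ModelLoader {
    // Reads a model file bundled on the classpath (e.g. under src/main/resources).
    // The resource name passed in is an assumption; use the real model file's path.
    static byte[] loadBundledModel(String resourceName) {
        try (InputStream in = ModelLoader.class.getClassLoader().getResourceAsStream(resourceName)) {
            if (in == null) {
                // Fail fast at startup rather than during stream processing.
                throw new IllegalStateException("Model resource not found on classpath: " + resourceName);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```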

The unit tests use Kafka helper classes such as EmbeddedSingleNodeKafkaCluster in package com.github.megachucky.kafka.streams.machinelearning.test.utils, so you can run them without any other configuration or Kafka setup. If you want to run the main class of an implementation in package com.github.megachucky.kafka.streams.machinelearning, you need to start a Kafka cluster (with at least one ZooKeeper instance and one Kafka broker running) and also create the required topics. So check out the unit tests first.

Example 1 - Gradient Boosting with H2O.ai for Prediction of Flight Delays

Detailed info in h2o-gbm
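For intuition about what a deployed GBM computes per event: a gradient boosting model for a binary target such as "delayed / not delayed" adds the outputs of many small decision trees to an initial score, giving a log-odds value, and passes it through the logistic function to obtain a probability. The following sketch shows that inference step with two hand-written decision stumps. The features (departure hour, distance), split points, leaf values and initial score are invented for illustration and are not taken from the H2O model in h2o-gbm.

```java
class ToyGbm {
    // One decision stump: if x[featureIndex] < threshold use leftValue, else rightValue.
    // Leaf values are assumed to already include the learning rate.
    static final class Stump {
        final int featureIndex;
        final double threshold;
        final double leftValue;
        final double rightValue;

        Stump(int featureIndex, double threshold, double leftValue, double rightValue) {
            this.featureIndex = featureIndex;
            this.threshold = threshold;
            this.leftValue = leftValue;
            this.rightValue = rightValue;
        }

        double eval(double[] x) {
            return x[featureIndex] < threshold ? leftValue : rightValue;
        }
    }

    // Hypothetical features: x[0] = departure hour, x[1] = distance in miles.
    static final double INITIAL_LOG_ODDS = -1.0;
    static final Stump[] TREES = {
        new Stump(0, 17.0, -0.2, 0.6),  // toy split on departure hour
        new Stump(1, 500.0, 0.1, -0.3), // toy split on distance
    };

    // Additive model: initial score plus every tree's output, then the logistic link.
    static double probabilityOfDelay(double[] x) {
        double logOdds = INITIAL_LOG_ODDS;
        for (Stump tree : TREES) {
            logOdds += tree.eval(x);
        }
        return 1.0 / (1.0 + Math.exp(-logOdds));
    }
}
```

With these toy values, an 18:00 departure on a 300-mile flight gives a log-odds of -1.0 + 0.6 + 0.1 = -0.3, i.e. a delay probability of about 0.43.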

Example 2 - Convolutional Neural Network (CNN) with TensorFlow for Image Recognition

Detailed info in tensorflow-image-recognition
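The building block that gives a CNN its name is the 2D convolution: a small kernel slides over the image and produces a weighted sum at each position, usually followed by a nonlinearity such as ReLU. A CNN such as the pre-built TensorFlow model typically stacks many such layers. The following is a single-channel convolution with "valid" padding and stride 1 in plain Java, for intuition only; `Conv2d` is a hypothetical helper. Like most deep learning frameworks, it actually computes cross-correlation (the kernel is not flipped).

```java
class Conv2d {
    // Single-channel 2D convolution (cross-correlation, as in most DL frameworks),
    // stride 1, "valid" padding: output size is (H - kH + 1) x (W - kW + 1).
    static double[][] convolveValid(double[][] image, double[][] kernel) {
        int h = image.length, w = image[0].length;
        int kh = kernel.length, kw = kernel[0].length;
        double[][] out = new double[h - kh + 1][w - kw + 1];
        for (int i = 0; i <= h - kh; i++) {
            for (int j = 0; j <= w - kw; j++) {
                double sum = 0.0;
                for (int a = 0; a < kh; a++) {
                    for (int b = 0; b < kw; b++) {
                        sum += image[i + a][j + b] * kernel[a][b];
                    }
                }
                out[i][j] = sum;
            }
        }
        return out;
    }

    // Element-wise ReLU activation, as typically applied after a convolution layer.
    static double[][] relu(double[][] x) {
        double[][] out = new double[x.length][];
        for (int i = 0; i < x.length; i++) {
            out[i] = new double[x[i].length];
            for (int j = 0; j < x[i].length; j++) {
                out[i][j] = Math.max(0.0, x[i][j]);
            }
        }
        return out;
    }
}
```

For example, the kernel {{-1, 1}, {-1, 1}} responds to vertical edges where pixel intensity increases from left to right.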

Example 3 - Iris Prediction using a Neural Network with DeepLearning4J (DL4J)

Detailed info in dl4j-deeplearning-iris
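Once trained, a small feed-forward network like the one in the Iris example maps four measurements (sepal length and width, petal length and width) to scores for three species, turns the scores into probabilities with softmax, and picks the most likely class. The following sketch shows that inference path in plain Java, assuming one hidden layer with ReLU and a softmax output layer. `IrisMlp`, the layer sizes, the activation and all weights are hypothetical placeholders rather than the trained DL4J parameters; in the real example, DL4J performs this computation.

```java
class IrisMlp {
    static final String[] SPECIES = {"setosa", "versicolor", "virginica"};

    // Fully connected layer: out[j] = b[j] + sum_i in[i] * w[i][j].
    static double[] dense(double[] in, double[][] w, double[] b) {
        double[] out = new double[b.length];
        for (int j = 0; j < b.length; j++) {
            double sum = b[j];
            for (int i = 0; i < in.length; i++) {
                sum += in[i] * w[i][j];
            }
            out[j] = sum;
        }
        return out;
    }

    // Element-wise ReLU activation for the hidden layer.
    static double[] relu(double[] x) {
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = Math.max(0.0, x[i]);
        }
        return out;
    }

    // Numerically stable softmax: subtract the maximum before exponentiating.
    static double[] softmax(double[] z) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : z) {
            max = Math.max(max, v);
        }
        double[] out = new double[z.length];
        double total = 0.0;
        for (int i = 0; i < z.length; i++) {
            out[i] = Math.exp(z[i] - max);
            total += out[i];
        }
        for (int i = 0; i < z.length; i++) {
            out[i] /= total;
        }
        return out;
    }

    // Index of the highest probability (the first one wins on ties).
    static int argmax(double[] p) {
        int best = 0;
        for (int i = 1; i < p.length; i++) {
            if (p[i] > p[best]) {
                best = i;
            }
        }
        return best;
    }

    // Forward pass: 4 features -> hidden layer (ReLU) -> 3 outputs (softmax) -> label.
    static String predict(double[] features, double[][] w1, double[] b1,
                          double[][] w2, double[] b2) {
        double[] hidden = relu(dense(features, w1, b1));
        double[] probabilities = softmax(dense(hidden, w2, b2));
        return SPECIES[argmax(probabilities)];
    }
}
```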

Example 4 - Python + Keras + TensorFlow + DeepLearning4j

Detailed info in tensorflow-keras
