Yannael / Kafka Sparkstreaming Cassandra

Docker container for Kafka - Spark Streaming - Cassandra

This Dockerfile sets up a complete streaming environment for experimenting with Kafka, Spark streaming (PySpark), and Cassandra. It installs

  • Kafka 0.10.2.1
  • Spark 2.1.1 for Scala 2.11
  • Cassandra 3.7

It additionally installs

  • Anaconda distribution 4.4.0 for Python 2.7.10
  • Jupyter notebook for Python

Quick start-up guide

Run container using DockerHub image

docker run -p 4040:4040 -p 8888:8888 -p 23:22 -ti --privileged yannael/kafka-sparkstreaming-cassandra

See the following video for a usage demo.
Demo

Note that any changes you make in the notebooks will be lost once you exit the container. To keep your changes, put your notebooks in a folder on your host and share that folder with the container, for example with

docker run -v `pwd`:/home/guest/host -p 4040:4040 -p 8888:8888 -p 23:22 -ti --privileged yannael/kafka-sparkstreaming-cassandra

Note:

  • The "-v `pwd`:/home/guest/host" option shares the current local folder on your computer (the 'host'), i.e. the folder containing the Dockerfile, ipynb files, etc., with the container, where it appears as '/home/guest/host'.
  • Ports are mapped as follows (host port first):
    • 4040 bridges to the Spark UI
    • 8888 bridges to the Jupyter Notebook
    • 23 bridges to SSH (port 22 in the container)
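
Each "-p" option takes the form hostPort:containerPort, so only the left-hand number needs to change if a port is already taken on the host. As a minimal sketch, the helper below assembles the run command with a configurable host-side SSH port. The name build_run_cmd and the example port 2222 are assumptions made for illustration; they are not part of the repository.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): prints the docker run
# command for a given image, shared folder, and host-side SSH port.
build_run_cmd() {
    image="$1"            # image name to run
    share_dir="$2"        # host folder mounted at /home/guest/host
    ssh_port="${3:-23}"   # host port forwarded to the container's SSH port 22
    printf 'docker run -v %s:/home/guest/host -p 4040:4040 -p 8888:8888 -p %s:22 -ti --privileged %s\n' \
        "$share_dir" "$ssh_port" "$image"
}

# Print the command for review rather than executing it.
build_run_cmd yannael/kafka-sparkstreaming-cassandra "$PWD" 2222
```

With a different host port, the SSH command has to use that port as well (for instance "-p 2222" instead of "-p 23").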

SSH lets you open a connection to the container:

ssh -p 23 guest@containerIP

where 'containerIP' is the IP of the container (127.0.0.1 on Linux). The password is 'guest'.

Start services

Once the container is running, you are logged in as root. Run startup_script.sh (in /usr/bin) to start the following services:

  • SSH server. You can connect to the container using user 'guest' and password 'guest'
  • Cassandra
  • Zookeeper server
  • Kafka server

startup_script.sh

Connect, create Cassandra table, open notebook and start streaming

Connect as user 'guest' and go to the 'host' folder (shared with the host)

su guest

Start Jupyter notebook

notebook

and connect from your browser at host:8888, where 'host' is the IP of your host. If you run the container locally on your computer, this should be 127.0.0.1 or 192.168.99.100 (check the Docker documentation).

Start Kafka producer

Open kafkaSendDataPy.ipynb and run all cells.

Start Kafka receiver

Open kafkaReceiveAndSaveToCassandraPy.ipynb and run the cells up to the one that starts streaming. Use the subsequent cells to check that Cassandra collects the data properly.

Connect to Spark UI

The Spark UI is available in your browser at port 4040 of your host.

Container configuration details

The container is based on the CentOS 6 Linux distribution. The main steps of the build process are:

  • Install some common Linux tools (wget, unzip, tar, ssh tools, ...) and Java (1.8)
  • Create a guest user (the UID matters for sharing folders with the host, see below), and install Spark and sbt, Kafka, Anaconda and Jupyter notebooks for the guest user
  • Go back to the root user, and install the startup script (for starting the SSH and Cassandra services), the sentenv.sh script to set up environment variables (JAVA, Kafka, Spark, ...), spark-default.conf, and Cassandra

User UID

In the Dockerfile, the line

RUN useradd guest -u 1000

creates the guest user under which you work in the container. The username is 'guest', with password 'guest', and the '-u' parameter sets the Linux UID for that user.

In order to make sharing of folders easier between the container and your host, make sure this UID matches your user UID on the host. You can see what your host UID is with

echo $UID
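
The comparison can also be scripted. The following is a minimal sketch, assuming the container UID is the 1000 set by the Dockerfile line above; check_uid is a hypothetical helper introduced for illustration, not something shipped with the repository.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): compares a host UID
# with the UID given to the 'guest' user in the Dockerfile (assumed 1000).
check_uid() {
    host_uid="$1"
    container_uid="${2:-1000}"
    if [ "$host_uid" -eq "$container_uid" ]; then
        echo "match"
    else
        # Suggest the Dockerfile change; the image must then be rebuilt.
        echo "mismatch: use 'RUN useradd guest -u $host_uid' and rebuild"
    fi
}

# 'id -u' prints the numeric UID of the current user.
check_uid "$(id -u)"
```

If the UIDs differ, files written by 'guest' into the shared folder may show up on the host with an owner other than your own user, which is why rebuilding the image with your UID is the suggested route here.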

Building and running the container from scratch

Clone this repository

git clone https://github.com/Yannael/kafka-sparkstreaming-cassandra

Build

From the folder containing the Dockerfile, run

docker build -t kafka-sparkstreaming-cassandra .

It may take about 30 minutes to complete.

Run

docker run -v `pwd`:/home/guest/host -p 4040:4040 -p 8888:8888 -p 23:22 -ti --privileged kafka-sparkstreaming-cassandra