victorcouste / zeppelin-spark-cassandra-demo

Licence: other
A demo explaining how to use Zeppelin notebook to access Apache Cassandra data via Apache Spark or CQL language

Projects that are alternatives of or similar to zeppelin-spark-cassandra-demo

dockerfiles
Multi docker container images for main Big Data Tools. (Hadoop, Spark, Kafka, HBase, Cassandra, Zookeeper, Zeppelin, Drill, Flink, Hive, Hue, Mesos, ... )
Stars: ✭ 29 (+70.59%)
Mutual labels:  cassandra, zeppelin
newsql nosql library
A collection of reference material on 12 databases: MySQL, MariaDB, Percona Server, MongoDB, Redis, RocksDB, TiDB, CouchDB, Cassandra, TokuDB, MemDB, OceanBase
Stars: ✭ 270 (+1488.24%)
Mutual labels:  cassandra
cassandra-data-apis
Data APIs for Apache Cassandra
Stars: ✭ 18 (+5.88%)
Mutual labels:  cassandra
janusgraph-deployement
A dockerized environment For [JanusGraph + ElasticSearch + Cassandra + GraphExp]
Stars: ✭ 16 (-5.88%)
Mutual labels:  cassandra
cassandra-exporter
Simple Tool to Export / Import Cassandra Tables into JSON
Stars: ✭ 44 (+158.82%)
Mutual labels:  cassandra
docker-cassandra-k8s
Cassandra Docker optimized for Kubernetes
Stars: ✭ 13 (-23.53%)
Mutual labels:  cassandra
cassandra-prometheus
prometheus exporter for cassandra
Stars: ✭ 25 (+47.06%)
Mutual labels:  cassandra
jelass
Janus + Elastic Search + Cassandra docker container with SSL Client Certificates implemented.
Stars: ✭ 13 (-23.53%)
Mutual labels:  cassandra
crystal-cassandra
A Cassandra driver for Crystal
Stars: ✭ 20 (+17.65%)
Mutual labels:  cassandra
battlestax
BattleStax is a stateful JAMStack game that is wholesome fun for the entire crew.
Stars: ✭ 32 (+88.24%)
Mutual labels:  cassandra
cassandra-exporter
Java agent for exporting Cassandra metrics to Prometheus
Stars: ✭ 59 (+247.06%)
Mutual labels:  cassandra
cassandra-web
cassandra web ui
Stars: ✭ 61 (+258.82%)
Mutual labels:  cassandra
spark-notebook-examples
Some notebook examples related to Apache Spark, IPython / Jupyter, Zeppelin
Stars: ✭ 49 (+188.24%)
Mutual labels:  zeppelin
cassandra-phantom
Cassandra + Phantom Example
Stars: ✭ 64 (+276.47%)
Mutual labels:  cassandra
Cassandra-Data-Modeling
Basic Rules of Cassandra Data Modeling
Stars: ✭ 29 (+70.59%)
Mutual labels:  cassandra
ecaudit
Ericsson Audit plug-in for Apache Cassandra
Stars: ✭ 36 (+111.76%)
Mutual labels:  cassandra
microservices-transactions
Choreography-based sagas to maintain data consistency in a microservice architecture.
Stars: ✭ 20 (+17.65%)
Mutual labels:  cassandra
BigInsights-on-Apache-Hadoop
Example projects for 'BigInsights for Apache Hadoop' on IBM Bluemix
Stars: ✭ 21 (+23.53%)
Mutual labels:  zeppelin
cassandra-top
Cassandra top command to monitor cluster without Datastax OpsCenter, and log nodetool administrative commands
Stars: ✭ 13 (-23.53%)
Mutual labels:  cassandra
workshop-intro-to-cassandra
Learn Apache Cassandra fundamentals in this hands-on workshop
Stars: ✭ 208 (+1123.53%)
Mutual labels:  cassandra

Zeppelin + Spark + Cassandra

This is a tutorial explaining how to use an Apache Zeppelin notebook to interact with the Apache Cassandra NoSQL database, either through Apache Spark or directly through the Cassandra CQL language.

Apache Zeppelin

Apache Zeppelin is a web-based notebook that enables interactive data analytics. The Zeppelin interpreter concept allows any language or data-processing backend to be plugged into Zeppelin. Currently Zeppelin supports many interpreters such as Scala (with Apache Spark), Python (with Apache Spark), SparkSQL, Hive, Markdown, Cassandra CQL, and Shell. More details can be found at https://zeppelin.incubator.apache.org/

Apache Spark

Apache Spark is a fast and general engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing. More details can be found here http://spark.apache.org. It will be used for Cassandra data processing needs (ETL, transformations, analytics ...).

DataStax Spark Cassandra Connector

DataStax has developed a Spark Cassandra Connector to read and write Cassandra data through the Spark API. The Spark Cassandra Connector lets you expose Cassandra tables as Spark RDDs (or DataFrames), write Spark RDDs (or DataFrames) to Cassandra tables, and execute arbitrary CQL queries in your Spark applications.

Useful links:

CQL Language

The Cassandra Query Language (CQL) is the primary language for communicating with the Cassandra database. Documentation on CQL usage:

The Cassandra CQL Interpreter for Apache Zeppelin is written by my colleague Duy Hai Doan @doanduyhai

CQL Interpreter documentation for Apache Zeppelin 0.5.5

Installation and Setup

1 - Apache Cassandra and Apache Spark

First you need to install a Cassandra cluster and a Spark cluster connected with the DataStax Spark Cassandra Connector. A very simple way to do that is to use DataStax Enterprise (DSE): it is free for development or testing and it contains Apache Cassandra and Apache Spark already linked. You can download DataStax Enterprise from https://academy.datastax.com/downloads and find installation instructions at http://docs.datastax.com/en/getting_started/doc/getting_started/installDSE.html. After the installation, start your DSE Cassandra cluster (it can be a single node) with Spark enabled, using the command dse cassandra -k.

2 - Apache Zeppelin

  • Clone Zeppelin repository from https://github.com/apache/incubator-zeppelin

    git clone https://github.com/apache/incubator-zeppelin

  • Compile with the cassandra-spark connector

    Select the profile that matches your installed DataStax Enterprise (DSE) or Apache Spark version. For example, for DSE 4.8 or Spark 1.4, run the following from the cloned incubator-zeppelin directory:

    mvn clean package -Pcassandra-spark-1.4 -DskipTests
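As a rough sketch of how the profile could be chosen from a Spark version string, the helper below builds the flag and prints the resulting Maven command. The function name cassandra_spark_profile is hypothetical, and the naming pattern cassandra-spark-<major>.<minor> is an assumption extrapolated from the single example above; check the profiles declared in Zeppelin's pom.xml before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: derive the Maven profile name from a Spark version.
# ASSUMPTION: profiles follow the cassandra-spark-<major>.<minor> pattern of
# the one example in the text (cassandra-spark-1.4). Verify against pom.xml.
cassandra_spark_profile() {
    spark_version="$1"
    # Keep only the major.minor part, e.g. 1.4.1 -> 1.4
    major_minor=$(printf '%s\n' "$spark_version" | cut -d. -f1,2)
    printf 'cassandra-spark-%s\n' "$major_minor"
}

# Print (rather than run) the build command so it can be reviewed first.
echo "mvn clean package -P$(cassandra_spark_profile 1.4.1) -DskipTests"
```

Printing the command instead of executing it keeps the sketch safe to run anywhere; once the profile name is confirmed, the printed line can be run from the cloned repository.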

3 - Link between Zeppelin and Spark

You can either use the Spark instance embedded within Zeppelin (automatically installed) or your own deployed Spark cluster (with DSE or in standalone mode). For the latter option you may need to edit the $ZEPPELIN_HOME/conf/zeppelin-env.sh file to change the MASTER parameter. By default it is set to spark://127.0.0.1:7077
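A minimal sketch of what the relevant lines in $ZEPPELIN_HOME/conf/zeppelin-env.sh might look like. Only MASTER comes from the text above; DEMO_SPARK_HOST and DEMO_SPARK_PORT are hypothetical helper variables introduced here so the host and port can be overridden without editing the file.

```shell
# Fragment for $ZEPPELIN_HOME/conf/zeppelin-env.sh (illustrative only).
# DEMO_SPARK_HOST and DEMO_SPARK_PORT are hypothetical names, not Zeppelin
# settings; they default to the values mentioned in the text.
DEMO_SPARK_HOST="${DEMO_SPARK_HOST:-127.0.0.1}"
DEMO_SPARK_PORT="${DEMO_SPARK_PORT:-7077}"

# MASTER is the parameter the text says to change for an external cluster.
export MASTER="spark://${DEMO_SPARK_HOST}:${DEMO_SPARK_PORT}"
```

With nothing overridden, MASTER resolves to spark://127.0.0.1:7077; exporting DEMO_SPARK_HOST before starting Zeppelin would point it at another Spark master.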

4 - Start Zeppelin

$ZEPPELIN_HOME/bin/zeppelin-daemon.sh start

Zeppelin should then be available at http://localhost:8080/
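Zeppelin can take a little while to come up after the daemon starts. One way to wait for it, assuming curl is installed, is to poll the URL until it answers. The function name wait_for_url and its default attempt count and delay are choices made for this sketch, not part of Zeppelin.

```shell
# Hypothetical helper: poll a URL until it responds or the attempts run out.
# Arguments: URL, number of attempts (default 30), delay in seconds (default 2).
# Assumes curl is available on the PATH.
wait_for_url() {
    url="$1"
    attempts="${2:-30}"
    delay="${3:-2}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -s: silent, -f: treat HTTP errors as failures, -o /dev/null: drop body
        if curl -sf -o /dev/null --max-time 5 "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

For example, running wait_for_url http://localhost:8080/ && echo "Zeppelin is up" right after the start command would block until the web UI answers, or give up after about a minute with the defaults.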

5 - Add the property spark.cassandra.connection.host with the value 127.0.0.1 (or the IP address of one of your Cassandra cluster nodes) to the Spark connector interpreter

6 - Download the demonstration note from https://raw.githubusercontent.com/victorcouste/zeppelin-spark-cassandra-demo/master/Demo_Zeppelin_Spark_Cassandra.json and import it into Zeppelin
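The download can also be done from the command line. The sketch below assumes curl is installed; NOTE_URL is the address given above, while NOTE_FILE and download_demo_note are names introduced only for this example.

```shell
# URL of the demonstration note, as given in the text.
NOTE_URL="https://raw.githubusercontent.com/victorcouste/zeppelin-spark-cassandra-demo/master/Demo_Zeppelin_Spark_Cassandra.json"
# Local file name derived from the last path segment of the URL.
NOTE_FILE=$(basename "$NOTE_URL")

# Hypothetical helper: fetch the note into the given path (default: NOTE_FILE).
download_demo_note() {
    # -f: fail on HTTP errors, -L: follow redirects, -o: write to a file
    curl -fL -o "${1:-$NOTE_FILE}" "$NOTE_URL"
}
```

Calling download_demo_note saves the JSON file in the current directory, from where it can be imported into Zeppelin.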

7 - Follow the paragraphs of the Demo_Zeppelin_Spark_Cassandra note and have fun!

8 - Notes on paragraphs

In paragraph 3, you will have to restart the Spark interpreter just after running it. The error message should then disappear when you re-run paragraph 3.

In the last paragraph (17), you can finally create a dashboard that can be embedded in a website iframe. Click on “Link this paragraph” and a new window opens with the URL of the dashboard.
