
markush81 / fastdata-cluster

Licence: other
Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)

Programming Languages

HTML, Shell

Projects that are alternatives of or similar to fastdata-cluster

Bigdata Interview
🎯 🌟 [Big data interview questions] A collection of big-data-related interview questions gathered from around the web, together with the author's own answer summaries. Currently covers interview knowledge for the Hadoop/Hive/Spark/Flink/HBase/Kafka/Zookeeper frameworks.
Stars: ✭ 857 (+4185%)
Mutual labels:  spark, yarn, hadoop, hdfs, flink
Bigdata Notes
A beginner's guide to big data ⭐
Stars: ✭ 10,991 (+54855%)
Mutual labels:  spark, yarn, hadoop, hdfs
Repository
A personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms and more.
Stars: ✭ 92 (+360%)
Mutual labels:  spark, hadoop, hdfs, flink
God Of Bigdata
Focused on big data learning and interviews; the road to big data mastery starts here. Flink/Spark/Hadoop/HBase/Hive...
Stars: ✭ 6,008 (+29940%)
Mutual labels:  spark, hadoop, hdfs, flink
Dataspherestudio
DataSphereStudio is a one stop data application development& management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (+5875%)
Mutual labels:  spark, hadoop, flink
Learning Spark
Learning Spark from scratch; big data study material.
Stars: ✭ 37 (+85%)
Mutual labels:  spark, hadoop, hdfs
dockerfiles
Multi docker container images for main Big Data Tools. (Hadoop, Spark, Kafka, HBase, Cassandra, Zookeeper, Zeppelin, Drill, Flink, Hive, Hue, Mesos, ... )
Stars: ✭ 29 (+45%)
Mutual labels:  cassandra, hadoop, flink
Bigdata Notebook
Stars: ✭ 100 (+400%)
Mutual labels:  spark, hadoop, flink
wasp
WASP is a framework to build complex real time big data applications. It relies on a kind of Kappa/Lambda architecture mainly leveraging Kafka and Spark. If you need to ingest huge amount of heterogeneous data and analyze them through complex pipelines, this is the framework for you.
Stars: ✭ 19 (-5%)
Mutual labels:  yarn, hadoop, hdfs
Waterdrop
Production Ready Data Integration Product, documentation:
Stars: ✭ 1,856 (+9180%)
Mutual labels:  spark, hadoop, flink
Bigdata docker
Big Data Ecosystem Docker
Stars: ✭ 161 (+705%)
Mutual labels:  spark, hadoop, hdfs
skein
A tool and library for easily deploying applications on Apache YARN
Stars: ✭ 128 (+540%)
Mutual labels:  hadoop, cluster, hdfs
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (+650%)
Mutual labels:  spark, hadoop, hdfs
Big Whale
Scheduling of offline jobs (Spark, Flink, etc.) and monitoring of real-time jobs.
Stars: ✭ 163 (+715%)
Mutual labels:  spark, hadoop, flink
Dockerfiles
50+ DockerHub public images for Docker & Kubernetes - Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, Nifi, Spark, Consul, Riak, TeamCity and DevOps tools built on the major Linux distros: Alpine, CentOS, Debian, Fedora, Ubuntu
Stars: ✭ 847 (+4135%)
Mutual labels:  spark, cassandra, hadoop
Szt Bigdata
A big data passenger flow analysis system for the Shenzhen Metro 🚇🚄🌟
Stars: ✭ 826 (+4030%)
Mutual labels:  spark, hadoop, flink
docker-hadoop
Docker image for main Apache Hadoop components (Yarn/Hdfs)
Stars: ✭ 59 (+195%)
Mutual labels:  yarn, hadoop, hdfs
Bigdataguide
Big data learning from scratch, including tutorial videos and interview material for every learning stage.
Stars: ✭ 817 (+3985%)
Mutual labels:  spark, hadoop, flink
Hadoopcryptoledger
Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
Stars: ✭ 126 (+530%)
Mutual labels:  spark, hadoop, flink
Vagrant Projects
Vagrant projects for various use-cases with Spark, Zeppelin, IPython / Jupyter, SparkR
Stars: ✭ 34 (+70%)
Mutual labels:  vagrant, spark, cassandra

Fast Data Cluster

Content

In case you need a local cluster providing Kafka, Cassandra, Spark, Flink, YARN and HDFS, you're at the right place.

Prerequisites

  • Vagrant (tested with 2.2.14)
  • VirtualBox (tested with 6.1.18)
  • Ansible (tested with 2.10.5)
  • The VMs take approx 18 GB of RAM, so you should have more than that.
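
A quick way to check that the tooling is in place (plain version checks, nothing project-specific; the exact version strings on your machine will differ):

vagrant --version       # e.g. Vagrant 2.2.14
VBoxManage --version    # e.g. 6.1.18
ansible --version       # e.g. ansible 2.10.5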

⚠️ Vagrant might ask you for your admin password. This is because the vagrant-hostsupdater plugin is used to make the VMs reachable by their hostnames from your machine.
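
In case the plugin is missing, installing it is a one-liner (standard Vagrant plugin handling):

vagrant plugin install vagrant-hostsupdater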

Init

git clone https://github.com/markush81/fastdata-cluster.git
cd fastdata-cluster
vagrant up
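
Provisioning all nine VMs takes a while. Once vagrant up has finished, a quick sanity check (plain Vagrant commands):

vagrant status              # every machine should be reported as 'running'
vagrant provision kafka-1   # re-run provisioning for a single VM if it failed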

Cluster

If everything went fine, the result is the FastData Cluster described below.

Coordinates

Servers

| IP | Hostname | Description | Settings |
|----|----------|-------------|----------|
| 192.168.10.2 | kafka-1 | running a kafka broker | 1024 MB RAM |
| 192.168.10.3 | kafka-2 | running a kafka broker | 1024 MB RAM |
| 192.168.10.4 | kafka-3 | running a kafka broker | 1024 MB RAM |
| 192.168.10.5 | cassandra-1 | running a cassandra node | 1024 MB RAM |
| 192.168.10.6 | cassandra-2 | running a cassandra node | 1024 MB RAM |
| 192.168.10.7 | cassandra-3 | running a cassandra node | 1024 MB RAM |
| 192.168.10.8 | hadoop-1 | running a yarn resourcemanager and nodemanager, hdfs namenode, spark distribution, flink distribution | 4096 MB RAM |
| 192.168.10.9 | hadoop-2 | running a yarn nodemanager, hdfs datanode | 4096 MB RAM |
| 192.168.10.10 | hadoop-3 | running a yarn nodemanager, hdfs datanode | 4096 MB RAM |

Connections

| Name | Connection |
|------|------------|
| Zookeeper | kafka-1:2181,kafka-2:2181,kafka-3:2181 |
| Kafka Brokers | kafka-1:9092,kafka-2:9092,kafka-3:9092 |
| Cassandra Hosts | cassandra-1,cassandra-2,cassandra-3 |
| YARN Resource Manager | http://hadoop-1:8088 |
| HDFS Namenode UI | http://hadoop-1:9870 |
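
Because vagrant-hostsupdater puts the hostnames into your hosts file, these endpoints are reachable directly from the host machine. A quick connectivity check could look like this (assuming nc and curl are available on your host; /ws/v1/cluster/info is the standard YARN ResourceManager REST endpoint):

nc -vz kafka-1 9092                              # Kafka broker port
nc -vz cassandra-1 9042                          # Cassandra native protocol port
curl -s http://hadoop-1:8088/ws/v1/cluster/info  # YARN ResourceManager REST API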

Usage

Cassandra

lucky:~ markus$ vagrant ssh cassandra-1
[vagrant@cassandra-1 ~]$ cqlsh
Connected to analytics at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 4.0-beta4 | CQL spec 3.4.5 | Native protocol v4]
Use HELP for help.
cqlsh>
cqlsh> CREATE KEYSPACE example WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 2 };
cqlsh> USE example;
cqlsh:example> CREATE TABLE users (id UUID PRIMARY KEY, lastname text, firstname text );
cqlsh:example> INSERT INTO users (id, lastname, firstname) VALUES (6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47, 'Mustermann','Max') USING TTL 86400 AND TIMESTAMP 123456789;
cqlsh:example> SELECT * FROM users;

 id                                   | firstname | lastname
--------------------------------------+-----------+------------
 6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47 |       Max | Mustermann

(1 rows)
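
Since the INSERT above sets an explicit TTL and timestamp, both can be verified with the built-in TTL() and WRITETIME() functions, e.g. non-interactively via cqlsh -e (just a suggestion, not part of the original walkthrough):

[vagrant@cassandra-1 ~]$ cqlsh -e "SELECT firstname, TTL(lastname), WRITETIME(lastname) FROM example.users;"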

Check Cluster Status:

[vagrant@cassandra-1 ~]$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load        Tokens  Owns  Host ID                               Rack
UN  192.168.10.5  105.69 KiB  16      ?     74e6aff4-3561-4f48-bdbb-d030a9da0c01  rack1
UN  192.168.10.7  100.65 KiB  16      ?     3b428824-a9f2-4a49-ae1d-3639fc584e92  rack1
UN  192.168.10.6  100.66 KiB  16      ?     4418963f-5e94-4046-9cc1-f9614c6eae6e  rack1

Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless

Zookeeper

[vagrant@kafka-1 ~]$ zookeeper-shell.sh kafka-1:2181/
Connecting to kafka-1:2181/
Welcome to ZooKeeper!
JLine support is disabled

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
ls /
[admin, brokers, cluster, config, consumers, controller, controller_epoch, isr_change_notification, latest_producer_id_block, log_dir_event_notification, zookeeper]
ls /brokers/ids
[0, 1, 2]
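
To inspect a single broker registration (host, port, endpoints), a one-off command can be passed directly on the command line instead of using the interactive prompt (ZooKeeperMain executes it and exits; broker id 0 is taken from the listing above):

[vagrant@kafka-1 ~]$ zookeeper-shell.sh kafka-1:2181 get /brokers/ids/0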

Kafka

Topic Creation

lucky:~ markus$ vagrant ssh kafka-1
[vagrant@kafka-1 ~]$ kafka-topics.sh --create --zookeeper kafka-1:2181 --replication-factor 2 --partitions 6 --topic sample
Created topic "sample".
[vagrant@kafka-1 ~]$ kafka-topics.sh --zookeeper kafka-1 --topic sample --describe
Topic:sample	PartitionCount:6	ReplicationFactor:2	Configs:
	Topic: sample	Partition: 0	Leader: 1	Replicas: 1,2	Isr: 1,2
	Topic: sample	Partition: 1	Leader: 2	Replicas: 2,3	Isr: 2,3
	Topic: sample	Partition: 2	Leader: 3	Replicas: 3,1	Isr: 3,1
	Topic: sample	Partition: 3	Leader: 1	Replicas: 1,3	Isr: 1,3
	Topic: sample	Partition: 4	Leader: 2	Replicas: 2,1	Isr: 2,1
	Topic: sample	Partition: 5	Leader: 3	Replicas: 3,2	Isr: 3,2
[vagrant@kafka-1 ~]$

Producer

[vagrant@kafka-1 ~]$ kafka-console-producer.sh --broker-list kafka-1:9092,kafka-3:9092 --topic sample
Hey, is Kafka up and running?

Consumer

[vagrant@kafka-1 ~]$ kafka-console-consumer.sh --bootstrap-server kafka-1:9092,kafka-3:9092 --topic sample --from-beginning
Hey, is Kafka up and running?
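
The console consumer registers itself in a consumer group, so offsets and lag can be inspected with kafka-consumer-groups.sh (the group name is generated by the console consumer; pick it from the --list output):

[vagrant@kafka-1 ~]$ kafka-consumer-groups.sh --bootstrap-server kafka-1:9092 --list
[vagrant@kafka-1 ~]$ kafka-consumer-groups.sh --bootstrap-server kafka-1:9092 --describe --group <group-from-list-output>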

YARN

The YARN ResourceManager UI can be accessed at http://hadoop-1:8088; from there you can navigate to your application.

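The same information is available on the command line of any hadoop node, which is handy for grabbing application IDs (standard YARN CLI):

[vagrant@hadoop-1 ~]$ yarn application -list
[vagrant@hadoop-1 ~]$ yarn application -status <application-id>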

Spark

Spark Examples

lucky:~ markus$ vagrant ssh hadoop-1
[vagrant@hadoop-1 ~]$ spark-submit --master yarn --deploy-mode cluster \
    --class org.apache.spark.examples.SparkPi \
    --driver-memory 512M --executor-memory 512M --num-executors 2 \
    /usr/local/spark-3.0.2-bin-without-hadoop/examples/jars/spark-examples_2.12-3.0.2.jar 1000
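
Since the job runs in cluster deploy mode, the result is printed by the driver inside its YARN container rather than on your shell. Provided log aggregation is enabled, it can be fetched afterwards from the YARN logs (application ID taken from yarn application -list or the ResourceManager UI):

[vagrant@hadoop-1 ~]$ yarn logs -applicationId <application-id> | grep "Pi is roughly"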

Flink

Flink Example Run

Access Flink UI:

Open http://hadoop-1:8088/cluster, click the ID link of the "Flink session cluster" application and then follow "Tracking URL: ApplicationMaster".

Submit a job:

[vagrant@hadoop-1 ~]$ HADOOP_CLASSPATH=$(hadoop classpath) flink run /usr/local/flink-1.12.1/examples/streaming/WordCount.jar
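
Without parameters the WordCount example uses built-in sample data and prints its result to the task managers' stdout. To exercise HDFS as well, the example accepts --input and --output parameters; a possible run could look like this (paths are only an illustration; depending on the job parallelism the output is a single file or a directory of part files, which you can then hdfs dfs -cat):

[vagrant@hadoop-1 ~]$ hdfs dfs -put /etc/hosts /tmp/wordcount-input
[vagrant@hadoop-1 ~]$ HADOOP_CLASSPATH=$(hadoop classpath) flink run /usr/local/flink-1.12.1/examples/streaming/WordCount.jar --input hdfs:///tmp/wordcount-input --output hdfs:///tmp/wordcount-output
[vagrant@hadoop-1 ~]$ hdfs dfs -ls /tmp/wordcount-output*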

