P7h / Docker Spark

Licence: apache-2.0
🚢 Docker image for Apache Spark

docker-spark

Dockerfiles for Apache Spark.
The Apache Spark Docker image is available directly from https://index.docker.io.

This image contains the following software:

  • OpenJDK 64-Bit v1.8.0_131
  • Scala v2.12.2
  • SBT v0.13.15
  • Apache Spark v2.2.0

Various versions of Spark Images

Run the command that corresponds to the version of the Spark image you want.
The latest image always tracks the most recent version of Apache Spark available; as of 11th July, 2017, that is v2.2.0.

Apache Spark latest [i.e. v2.2.0]

Dockerfile for Apache Spark v2.2.0

docker pull p7hb/docker-spark

Apache Spark v2.2.0

Dockerfile for Apache Spark v2.2.0

docker pull p7hb/docker-spark:2.2.0

Apache Spark v2.1.1

Dockerfile for Apache Spark v2.1.1

docker pull p7hb/docker-spark:2.1.1

Apache Spark v2.1.0

Dockerfile for Apache Spark v2.1.0

docker pull p7hb/docker-spark:2.1.0

Apache Spark v2.0.2

Dockerfile for Apache Spark v2.0.2

docker pull p7hb/docker-spark:2.0.2

Apache Spark v2.0.1

Dockerfile for Apache Spark v2.0.1

docker pull p7hb/docker-spark:2.0.1

Apache Spark v2.0.0

Dockerfile for Apache Spark v2.0.0

docker pull p7hb/docker-spark:2.0.0

Apache Spark v1.6.3

Dockerfile for Apache Spark v1.6.3

docker pull p7hb/docker-spark:1.6.3

Apache Spark v1.6.2

Dockerfile for Apache Spark v1.6.2

docker pull p7hb/docker-spark:1.6.2
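
When several of the versions above are needed on one machine, the pull commands can be wrapped in a small loop. This is only a sketch: the function name pull_spark_images is made up for illustration, and it assumes the docker CLI is on the PATH.

```shell
# pull_spark_images TAG...: pull each listed tag of the image in turn.
# Stops at the first tag that fails to download.
pull_spark_images() {
  for tag in "$@"; do
    docker pull "p7hb/docker-spark:${tag}" || return 1
  done
}

# Usage:
#   pull_spark_images 2.2.0 2.1.1 1.6.3
```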

Get the latest image

There are 2 ways of getting this image:

  1. Build this image using Dockerfile OR
  2. Pull the image directly from DockerHub.

Build the latest image

Copy the Dockerfile to a folder on your local machine and then invoke the following command.

docker build -t p7hb/docker-spark .

Pull the latest image

docker pull p7hb/docker-spark

Run Spark image

Run the latest image i.e. Apache Spark 2.2.0

As of 11th July, 2017, the latest Spark version is 2.2.0, so the :latest and 2.2.0 tags both refer to the same image.

docker run -it -p 4040:4040 -p 8080:8080 -p 8081:8081 -h spark --name=spark p7hb/docker-spark

Run images of previous versions

Other Spark image versions from this repository can be started by suffixing the image name with the Spark version as a tag. Valid values are 2.2.0, 2.1.1, 2.1.0, 2.0.2, 2.0.1, 2.0.0, 1.6.3 and 1.6.2.

Apache Spark latest [i.e. v2.2.0]

docker run -it -p 4040:4040 -p 8080:8080 -p 8081:8081 -h spark --name=spark p7hb/docker-spark:2.2.0

Apache Spark v2.1.1

docker run -it -p 4040:4040 -p 8080:8080 -p 8081:8081 -h spark --name=spark p7hb/docker-spark:2.1.1

Apache Spark v2.1.0

docker run -it -p 4040:4040 -p 8080:8080 -p 8081:8081 -h spark --name=spark p7hb/docker-spark:2.1.0

Apache Spark v2.0.2

docker run -it -p 4040:4040 -p 8080:8080 -p 8081:8081 -h spark --name=spark p7hb/docker-spark:2.0.2

Apache Spark v2.0.1

docker run -it -p 4040:4040 -p 8080:8080 -p 8081:8081 -h spark --name=spark p7hb/docker-spark:2.0.1

Apache Spark v2.0.0

docker run -it -p 4040:4040 -p 8080:8080 -p 8081:8081 -h spark --name=spark p7hb/docker-spark:2.0.0

Apache Spark v1.6.3

docker run -it -p 4040:4040 -p 8080:8080 -p 8081:8081 -h spark --name=spark p7hb/docker-spark:1.6.3

Apache Spark v1.6.2

docker run -it -p 4040:4040 -p 8080:8080 -p 8081:8081 -h spark --name=spark p7hb/docker-spark:1.6.2

The above step launches the container with the following settings:

  • root is the user we are logged in as.
  • spark is the container name.
  • spark is the host name of this container.
    • This is very important, as Spark Slaves are started using this host name as the master.
  • The container exposes ports 4040, 8080 and 8081 for the Spark Web UI console(s).
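
The run commands listed above differ only in the image tag, so they can be generated from the tag alone. The helper below is a sketch, not part of this repository: spark_run_cmd and SPARK_TAGS are illustrative names, and the function only prints the command instead of running it.

```shell
# Tags listed in this README; any other value is rejected.
SPARK_TAGS="2.2.0 2.1.1 2.1.0 2.0.2 2.0.1 2.0.0 1.6.3 1.6.2"

# spark_run_cmd TAG: print the docker run command for one of the listed tags.
spark_run_cmd() {
  for known_tag in $SPARK_TAGS; do
    if [ "$known_tag" = "$1" ]; then
      echo "docker run -it -p 4040:4040 -p 8080:8080 -p 8081:8081 -h spark --name=spark p7hb/docker-spark:$1"
      return 0
    fi
  done
  echo "unknown Spark tag: $1" >&2
  return 1
}

# Usage:
#   spark_run_cmd 2.1.1          # prints the command
#   eval "$(spark_run_cmd 2.1.1)"  # runs it
```

Printing the command first makes it easy to review the port mappings and host name before the container is actually started.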

Check software and versions

Host name

root@spark:~# hostname
spark

Java

root@spark:~# java -version
openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_111-8u131-b11-2~bpo8+1-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)

Scala

root@spark:~# scala -version
Scala code runner version 2.12.2 -- Copyright 2002-2017, LAMP/EPFL and Lightbend, Inc.

SBT

Running sbt about will download and set up SBT on the image.

Spark

root@spark:~# spark-shell
Spark context Web UI available at http://172.17.0.2:4040
Spark context available as 'sc' (master = local[*], app id = local-1483032227786).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.1
      /_/

Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_111)
Type in expressions to have them evaluated.
Type :help for more information.

scala>

Spark commands

All the required binaries have been added to the PATH.

Start Spark Master

start-master.sh

Start Spark Slave

start-slave.sh spark://spark:7077
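
When the two start commands are scripted rather than typed by hand, it can help to confirm that the master is answering before the slave is started. The sketch below polls the master's Web UI on port 8080. It assumes curl is installed in the container, which this README does not confirm, and wait_for_url is an illustrative name.

```shell
# wait_for_url URL [TRIES]: poll URL once per second until it answers,
# giving up after TRIES attempts (default 30). Assumes curl is installed.
wait_for_url() {
  wait_url="$1"
  wait_left="${2:-30}"
  while [ "$wait_left" -gt 0 ]; do
    if curl -fsS --max-time 2 -o /dev/null "$wait_url" 2>/dev/null; then
      return 0
    fi
    wait_left=$((wait_left - 1))
    sleep 1
  done
  return 1
}

# Usage inside the container:
#   start-master.sh
#   wait_for_url http://spark:8080/ && start-slave.sh spark://spark:7077
```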

Execute Spark job for calculating Pi Value

spark-submit --class org.apache.spark.examples.SparkPi --master spark://spark:7077 $SPARK_HOME/examples/jars/spark-examples*.jar 100
.......
.......
Pi is roughly 3.140495114049511

OR even simpler

$SPARK_HOME/bin/run-example SparkPi 100
.......
.......
Pi is roughly 3.1413855141385514

Note that the first command expects the Spark Master and Slave to be running; after it runs, the job can also be checked in the Spark Web UI. With the second command, this is not possible.
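
If only the estimate is needed, for example in a smoke test of the image, the result line can be filtered out of the job output. This is a sketch: pi_value is an illustrative name, and it assumes the result line has the "Pi is roughly ..." form shown above and is written to standard output.

```shell
# pi_value: read job output on stdin and print only the estimated value
# from the "Pi is roughly ..." line; prints nothing if no such line exists.
pi_value() {
  sed -n 's/^Pi is roughly \([0-9.]*\).*$/\1/p'
}

# Usage inside the container:
#   $SPARK_HOME/bin/run-example SparkPi 100 2>/dev/null | pi_value
```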

Start Spark Shell

spark-shell --master spark://spark:7077

View Spark Master WebUI console

http://192.168.99.100:8080/

View Spark Worker WebUI console

http://192.168.99.100:8081/

View Spark WebUI console

Only available for the duration of the application.

http://192.168.99.100:4040/

Misc Docker commands

Find IP Address of the Docker machine

This is the IP address to use when accessing the exposed ports of our Docker container.

docker-machine ip default
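
The address printed by this command can be combined with the three exposed ports to get the Web UI URLs listed earlier. A minimal sketch, with spark_ui_urls as an illustrative name:

```shell
# spark_ui_urls IP: print the Master (8080), Worker (8081) and
# application (4040) Web UI URLs for the given Docker machine IP.
spark_ui_urls() {
  for port in 8080 8081 4040; do
    echo "http://$1:${port}/"
  done
}

# Usage:
#   spark_ui_urls "$(docker-machine ip default)"
```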

Find all the running containers

docker ps

Find all the running and stopped containers

docker ps -a

Show a live list of containers

docker stats --all shows a continuously updating list of all containers, running or stopped, along with their resource usage.

Find IP Address of a specific container

docker inspect <<Container_Name>> | grep IPAddress
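
The grep above prints every matching line of the JSON output. As an alternative sketch, docker inspect accepts a Go template through --format, which prints only the address itself. The wrapper name container_ip is illustrative.

```shell
# container_ip NAME: print the IP address Docker assigned to a container.
# The template walks the container's networks and prints each address.
container_ip() {
  docker inspect \
    --format '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' \
    "$1"
}

# Usage:
#   container_ip spark
```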

Open new terminal to a Docker container

We can open a new terminal with a new instance of the container's shell using the following command.

docker exec -it <<Container_ID>> /bin/bash #by Container ID

OR

docker exec -it <<Container_Name>> /bin/bash #by Container Name

Problems? Questions? Contributions?

If you find any issues or would like to discuss further, please ping me on my Twitter handle @P7h or drop me an email.

License

Copyright © 2016 Prashanth Babu.
Licensed under the Apache License, Version 2.0.
