


Spark docker

Docker images to:

  • Set up a standalone Apache Spark cluster running one Spark master and multiple Spark workers
  • Build Spark applications in Java, Scala or Python to run on a Spark cluster
Currently supported versions:
  • Spark 3.1.1 for Hadoop 3.2 with OpenJDK 8 and Scala 2.12
  • Spark 3.1.1 for Hadoop 3.2 with OpenJDK 11 and Scala 2.12
  • Spark 3.0.2 for Hadoop 3.2 with OpenJDK 8 and Scala 2.12
  • Spark 3.0.1 for Hadoop 3.2 with OpenJDK 8 and Scala 2.12
  • Spark 3.0.0 for Hadoop 3.2 with OpenJDK 11 and Scala 2.12
  • Spark 3.0.0 for Hadoop 3.2 with OpenJDK 8 and Scala 2.12
  • Spark 2.4.5 for Hadoop 2.7+ with OpenJDK 8
  • Spark 2.4.4 for Hadoop 2.7+ with OpenJDK 8
  • Spark 2.4.3 for Hadoop 2.7+ with OpenJDK 8
  • Spark 2.4.1 for Hadoop 2.7+ with OpenJDK 8
  • Spark 2.4.0 for Hadoop 2.8 with OpenJDK 8 and Scala 2.12
  • Spark 2.4.0 for Hadoop 2.7+ with OpenJDK 8
  • Spark 2.3.2 for Hadoop 2.7+ with OpenJDK 8
  • Spark 2.3.1 for Hadoop 2.7+ with OpenJDK 8
  • Spark 2.3.1 for Hadoop 2.8 with OpenJDK 8
  • Spark 2.3.0 for Hadoop 2.7+ with OpenJDK 8
  • Spark 2.2.2 for Hadoop 2.7+ with OpenJDK 8
  • Spark 2.2.1 for Hadoop 2.7+ with OpenJDK 8
  • Spark 2.2.0 for Hadoop 2.7+ with OpenJDK 8
  • Spark 2.1.3 for Hadoop 2.7+ with OpenJDK 8
  • Spark 2.1.2 for Hadoop 2.7+ with OpenJDK 8
  • Spark 2.1.1 for Hadoop 2.7+ with OpenJDK 8
  • Spark 2.1.0 for Hadoop 2.7+ with OpenJDK 8
  • Spark 2.0.2 for Hadoop 2.7+ with OpenJDK 8
  • Spark 2.0.1 for Hadoop 2.7+ with OpenJDK 8
  • Spark 2.0.0 for Hadoop 2.7+ with Hive support and OpenJDK 8
  • Spark 2.0.0 for Hadoop 2.7+ with Hive support and OpenJDK 7
  • Spark 1.6.2 for Hadoop 2.6 and later
  • Spark 1.5.1 for Hadoop 2.6 and later
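
Image tags encode the Spark and Hadoop versions. As a quick sketch, pulling the images for the first combination listed above looks like this (other combinations appear to follow the same <spark-version>-hadoop<hadoop-version> pattern; check the bde2020 repositories on Docker Hub for the exact set of published tags):

docker pull bde2020/spark-master:3.1.1-hadoop3.2
docker pull bde2020/spark-worker:3.1.1-hadoop3.2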

Using Docker Compose

Add the following services to your docker-compose.yml to integrate a Spark master and two Spark workers into your BDE pipeline:

spark-master:
  image: bde2020/spark-master:3.1.1-hadoop3.2
  container_name: spark-master
  ports:
    - "8080:8080"
    - "7077:7077"
  environment:
    - INIT_DAEMON_STEP=setup_spark
spark-worker-1:
  image: bde2020/spark-worker:3.1.1-hadoop3.2
  container_name: spark-worker-1
  depends_on:
    - spark-master
  ports:
    - "8081:8081"
  environment:
    - "SPARK_MASTER=spark://spark-master:7077"
spark-worker-2:
  image: bde2020/spark-worker:3.1.1-hadoop3.2
  container_name: spark-worker-2
  depends_on:
    - spark-master
  ports:
    - "8081:8081"
  environment:
    - "SPARK_MASTER=spark://spark-master:7077"

Make sure to fill in the INIT_DAEMON_STEP as configured in your pipeline.
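
With these services in your docker-compose.yml, a minimal sketch for bringing the cluster up and down looks like this (the master web UI is then available on http://localhost:8080 and the first worker's UI on http://localhost:8081, per the port mappings above):

docker-compose up -d
docker-compose logs -f spark-master
docker-compose down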

Running Docker containers without the init daemon

Spark Master

To start a Spark master:

docker run --name spark-master -h spark-master -e ENABLE_INIT_DAEMON=false -d bde2020/spark-master:3.1.1-hadoop3.2
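
This command publishes no ports. If you want to reach the master web UI and cluster port from the host, a variant with the same port mappings as the Compose example above is:

docker run --name spark-master -h spark-master -e ENABLE_INIT_DAEMON=false -p 8080:8080 -p 7077:7077 -d bde2020/spark-master:3.1.1-hadoop3.2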

Spark Worker

To start a Spark worker:

docker run --name spark-worker-1 --link spark-master:spark-master -e ENABLE_INIT_DAEMON=false -d bde2020/spark-worker:3.1.1-hadoop3.2
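
Further workers can be started the same way under different container names. Note that --link is a legacy Docker feature; an equivalent sketch using a user-defined bridge network (the network name spark-net is arbitrary) would be:

docker network create spark-net
docker run --name spark-master -h spark-master --network spark-net -e ENABLE_INIT_DAEMON=false -d bde2020/spark-master:3.1.1-hadoop3.2
docker run --name spark-worker-1 --network spark-net -e ENABLE_INIT_DAEMON=false -e "SPARK_MASTER=spark://spark-master:7077" -d bde2020/spark-worker:3.1.1-hadoop3.2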

Launch a Spark application

Building and running your Spark application on top of the Spark cluster is as simple as extending a template Docker image. Check the template's README for further documentation.
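
For a quick smoke test without building an application image, you can also submit the stock SparkPi example from a throwaway container, as in the following sketch (it assumes the standard Spark distribution layout inside the image, i.e. the examples jar under ./spark/examples/jars/ named for your Spark and Scala versions, and that all containers share the default bridge network so the workers can reach back to the driver):

docker run --rm -it --link spark-master:spark-master bde2020/spark-base:3.1.1-hadoop3.2 ./spark/bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://spark-master:7077 ./spark/examples/jars/spark-examples_2.12-3.1.1.jar 10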

Kubernetes deployment

The BDE Spark images can also be used in a Kubernetes environment.

To deploy a simple Spark standalone cluster, issue:

kubectl apply -f https://raw.githubusercontent.com/big-data-europe/docker-spark/master/k8s-spark-cluster.yaml

This will set up a Spark standalone cluster with one master and a worker on every available node, using the default namespace and resources. The master is reachable in the same namespace at spark://spark-master:7077. It also sets up a headless service so that Spark clients are reachable from the workers under the hostname spark-client.
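
A quick way to verify the deployment (assuming the manifest uses the service names mentioned above; pod names will differ):

kubectl get pods
kubectl get service spark-master spark-client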

Then, to use spark-shell, issue:

kubectl run spark-base --rm -it --labels="app=spark-client" --image bde2020/spark-base:3.1.1-hadoop3.2 -- bash ./spark/bin/spark-shell --master spark://spark-master:7077 --conf spark.driver.host=spark-client
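
Once the shell is up, a small job such as sc.parallelize(1 to 100).count() is a quick way to confirm that the workers are reachable; the running application should also show up in the master's web UI.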

To use spark-submit, issue, for example:

kubectl run spark-base --rm -it --labels="app=spark-client" --image bde2020/spark-base:3.1.1-hadoop3.2 -- bash ./spark/bin/spark-submit --class CLASS_TO_RUN --master spark://spark-master:7077 --deploy-mode client --conf spark.driver.host=spark-client URL_TO_YOUR_APP
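
CLASS_TO_RUN and URL_TO_YOUR_APP are placeholders for your application's main class and a jar location the cluster can fetch. As a concrete sketch using the SparkPi example bundled with the Spark distribution (assuming the examples jar is present in the image at the standard path; adjust the jar name to your Spark and Scala versions):

kubectl run spark-base --rm -it --labels="app=spark-client" --image bde2020/spark-base:3.1.1-hadoop3.2 -- bash ./spark/bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://spark-master:7077 --deploy-mode client --conf spark.driver.host=spark-client ./spark/examples/jars/spark-examples_2.12-3.1.1.jar 10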

You can use your own image packed with Spark and your application, but when deployed the driver must be reachable from the workers. One way to achieve this is to create a headless service for your pod and then use --conf spark.driver.host=YOUR_HEADLESS_SERVICE whenever you submit your application.
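
A minimal sketch of that pattern with plain kubectl (the names my-spark-driver and my-spark-app are hypothetical; kubectl create service sets the selector app=<name>, so the pod's label must match it, and YOUR_IMAGE is assumed to lay out Spark the same way as the bde2020 images):

kubectl create service clusterip my-spark-driver --clusterip="None"
kubectl run my-spark-app --rm -it --labels="app=my-spark-driver" --image YOUR_IMAGE -- ./spark/bin/spark-submit --master spark://spark-master:7077 --conf spark.driver.host=my-spark-driver URL_TO_YOUR_APP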
