actionml / docker-spark

License: Apache-2.0
Apache Spark docker container image (Standalone mode)

Programming Languages

  • Shell (77,523 projects)
  • Dockerfile (14,818 projects)

Projects that are alternatives of or similar to docker-spark

swordfish
Open-source distributed workflow scheduling tool that also supports streaming tasks.
Stars: ✭ 35 (+2.94%)
Mutual labels:  spark
spark-gradle-template
Apache Spark in your IDE with gradle
Stars: ✭ 39 (+14.71%)
Mutual labels:  spark
spark-word2vec
A parallel implementation of word2vec based on Spark
Stars: ✭ 24 (-29.41%)
Mutual labels:  spark
spark-druid-olap
Sparkline BI Accelerator provides fast ad-hoc query capability over Logical Cubes. This has been folded into our SNAP Platform (http://bit.ly/2oBJSpP), an integrated BI platform on Apache Spark.
Stars: ✭ 286 (+741.18%)
Mutual labels:  spark
observable-to-standalone
Importing an Observable notebook into a standalone application
Stars: ✭ 31 (-8.82%)
Mutual labels:  standalone
Search Ads Web Service
Online search advertisement platform & Realtime Campaign Monitoring [Maybe Deprecated]
Stars: ✭ 30 (-11.76%)
Mutual labels:  spark
Spark-Ar
Resources for Spark AR
Stars: ✭ 43 (+26.47%)
Mutual labels:  spark
Python Master Courses
Life is short, I use Python
Stars: ✭ 61 (+79.41%)
Mutual labels:  spark
spark-util
low-level helpers for Apache Spark libraries and tests
Stars: ✭ 16 (-52.94%)
Mutual labels:  spark
spark-kubernetes
Spark on Kubernetes
Stars: ✭ 80 (+135.29%)
Mutual labels:  spark
awesome-AI-kubernetes
❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc
Stars: ✭ 95 (+179.41%)
Mutual labels:  spark
openverse-catalog
Identifies and collects data on CC-licensed content across web crawl data and public APIs.
Stars: ✭ 27 (-20.59%)
Mutual labels:  spark
you-get.exe
You-Get unofficial build executable for Windows
Stars: ✭ 40 (+17.65%)
Mutual labels:  standalone
ODSC India 2018
My presentation at ODSC India 2018 about Deep Learning with Apache Spark
Stars: ✭ 26 (-23.53%)
Mutual labels:  spark
spark-sql-flow-plugin
Visualize column-level data lineage in Spark SQL
Stars: ✭ 20 (-41.18%)
Mutual labels:  spark
sparkar-volts
An extensive non-reactive TypeScript framework that eases the development experience in Spark AR
Stars: ✭ 15 (-55.88%)
Mutual labels:  spark
yuzhouwan
Code Library for My Blog
Stars: ✭ 39 (+14.71%)
Mutual labels:  spark
sentry-spark
Apache Spark Sentry Integration
Stars: ✭ 14 (-58.82%)
Mutual labels:  spark
spark-acid
ACID Data Source for Apache Spark based on Hive ACID
Stars: ✭ 91 (+167.65%)
Mutual labels:  spark
shamash
Autoscaling for Google Cloud Dataproc
Stars: ✭ 31 (-8.82%)
Mutual labels:  spark

Go to Docker Hub

Apache Spark docker container image (Standalone mode)

Standalone Spark cluster mode requires a dedicated instance, called the master, to coordinate the cluster workloads. An additional cluster manager such as Mesos, YARN, or Kubernetes is therefore not necessary, although a Standalone cluster can still be used alongside any of these cluster managers. Additionally, Standalone cluster mode is the most flexible way to deliver Spark workloads to Kubernetes, since as of Spark 2.4.0 the native Spark-on-Kubernetes support is still very limited.
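
For context, applications are delivered to a Standalone cluster by pointing them at the master's spark:// URL. A minimal sketch, assuming a master reachable at spark-master:7077; the class name and jar path below are placeholders:

# hypothetical application class and jar, shown for illustration only
spark-submit \
  --master spark://spark-master:7077 \
  --deploy-mode client \
  --class org.example.MyApp \
  /path/to/my-app.jar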

Starting up

Clone this repo and use docker-compose to bring up the sample standalone Spark cluster.

docker-compose up

Note that the default configuration exposes ports 8080 and 8081 for the master and the worker containers, respectively.
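
A typical session might look like the following; the repository URL is inferred from the project path above, and the UI addresses assume the default port mappings:

git clone https://github.com/actionml/docker-spark.git
cd docker-spark
docker-compose up -d      # start master and worker in the background
# master web UI: http://localhost:8080
# worker web UI: http://localhost:8081
docker-compose down       # tear the sample cluster down again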

Configuration

Standalone mode supports two container roles: master and worker. Depending on which one you need to start, pass either the master or the worker command to this container. The worker requires a single argument, the Spark master URL (spark://host:7077).
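
As a sketch, the two roles could also be started manually with docker run; the image name actionml/docker-spark is an assumption here, so check the Docker Hub link above for the published name:

# image name is assumed -- verify it on Docker Hub
docker network create spark-net
docker run -d --name spark-master --network spark-net -p 8080:8080 -p 7077:7077 \
  actionml/docker-spark master
docker run -d --name spark-worker --network spark-net -p 8081:8081 \
  actionml/docker-spark worker spark://spark-master:7077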

Fine-tuned configuration may be achieved by mounting /spark/conf/spark-defaults.conf, /spark/conf/spark-env.sh, or by passing SPARK_* environment variables directly; a combined example is shown after the volume list below. See the links below for more details.

Important: scratch volumes

  • /spark/work - directory used for scratch space and job output logs (worker only). Can be overridden via the -w path CLI argument.
  • /tmp - directory to use for "scratch" space in Spark, including map output files and RDDs that get stored on disk (spark.local.dir setting).
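
Putting this together, a worker might be launched with a mounted configuration file, a SPARK_* override, and host directories backing the scratch volumes. The memory value, host paths, and image name below are illustrative assumptions:

# SPARK_WORKER_MEMORY value, host paths, and image name are assumptions
docker run -d --name spark-worker --network spark-net \
  -e SPARK_WORKER_MEMORY=4g \
  -v "$PWD/conf/spark-defaults.conf:/spark/conf/spark-defaults.conf" \
  -v /data/spark-work:/spark/work \
  -v /data/spark-tmp:/tmp \
  actionml/docker-spark worker spark://spark-master:7077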

Authors

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].