alexarchambault / Ammonite Spark

Licence: other
Run spark calculations from Ammonite

Programming Languages

scala
Projects that are alternatives of or similar to Ammonite Spark

Dataspherestudio
DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (+1257.95%)
Mutual labels:  spark
Lehar
Visualize data using relative ordering
Stars: ✭ 81 (-7.95%)
Mutual labels:  spark
Flint
Webex Bot SDK for Node.js (deprecated in favor of https://github.com/webex/webex-bot-node-framework)
Stars: ✭ 85 (-3.41%)
Mutual labels:  spark
Cleanframes
type-class based data cleansing library for Apache Spark SQL
Stars: ✭ 75 (-14.77%)
Mutual labels:  spark
Setl
A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-10.23%)
Mutual labels:  spark
Spark Dependencies
Spark job for dependency links
Stars: ✭ 82 (-6.82%)
Mutual labels:  spark
Lpa Detector
Optimize and improve the Label propagation algorithm
Stars: ✭ 75 (-14.77%)
Mutual labels:  spark
Spark python ml examples
Spark 2.0 Python Machine Learning examples
Stars: ✭ 87 (-1.14%)
Mutual labels:  spark
Spark Gbtlr
Hybrid model of Gradient Boosting Trees and Logistic Regression (GBDT+LR) on Spark
Stars: ✭ 81 (-7.95%)
Mutual labels:  spark
Hops Examples
Examples for Deep Learning/Feature Store/Spark/Flink/Hive/Kafka jobs and Jupyter notebooks on Hops
Stars: ✭ 84 (-4.55%)
Mutual labels:  spark
Spark Website
Apache Spark Website
Stars: ✭ 75 (-14.77%)
Mutual labels:  spark
Docker Spark
🚢 Docker image for Apache Spark
Stars: ✭ 78 (-11.36%)
Mutual labels:  spark
Hadoop cookbook
Cookbook to install Hadoop 2.0+ using Chef
Stars: ✭ 82 (-6.82%)
Mutual labels:  spark
Ds Cheatsheets
List of Data Science Cheatsheets to rule the world
Stars: ✭ 9,452 (+10640.91%)
Mutual labels:  spark
Cuesheet
A framework for writing Spark 2.x applications in a pretty way
Stars: ✭ 86 (-2.27%)
Mutual labels:  spark
Apache Spark Hands On
Educational notes,Hands on problems w/ solutions for hadoop ecosystem
Stars: ✭ 74 (-15.91%)
Mutual labels:  spark
Mleap
MLeap: Deploy ML Pipelines to Production
Stars: ✭ 1,232 (+1300%)
Mutual labels:  spark
Spark Nlp Models
Models and Pipelines for the Spark NLP library
Stars: ✭ 88 (+0%)
Mutual labels:  spark
Laravel Spark Google2fa
Google Authenticator support for Laravel Spark
Stars: ✭ 86 (-2.27%)
Mutual labels:  spark
Spark States
Custom state store providers for Apache Spark
Stars: ✭ 83 (-5.68%)
Mutual labels:  spark

ammonite-spark

Run spark calculations from Ammonite

Build Status

ammonite-spark allows one to create SparkSessions from Ammonite. It passes some Ammonite internals to the SparkSession, so that Spark calculations can be driven from Ammonite, as one would do from a spark-shell.

Table of contents

  1. Quick start
  2. AmmoniteSparkSession vs SparkSession
    1. Syncing dependencies
  3. Using with standalone cluster
  4. Using with YARN cluster
  5. Troubleshooting
  6. Compatibility
  7. Missing

Quick start

Start Ammonite >= 1.6.3 with the --class-based option. The compatibility section lists the compatible versions of Ammonite and ammonite-spark. Either install Ammonite following the instructions on its website and run

$ amm --class-based

or use coursier,

$ cs launch ammonite:2.1.4 --scala 2.12.11 -- --class-based

Ensure you are using Scala 2.12, the only supported Scala version as of this writing.

At the Ammonite prompt, load the Spark 2.x version of your choice, along with ammonite-spark,

@ import $ivy.`org.apache.spark::spark-sql:2.4.0`
@ import $ivy.`sh.almond::ammonite-spark:0.3.0`

(Note the two :: before spark-sql and ammonite-spark; these are Scala dependencies.)

Then create a SparkSession using the builder provided by ammonite-spark

@ import org.apache.spark.sql._

@ val spark = {
    AmmoniteSparkSession.builder()
      .master("local[*]")
      .getOrCreate()
  }

Note the use of AmmoniteSparkSession.builder(), instead of the SparkSession.builder() one would use when e.g. writing a Spark job.

The builder returned by AmmoniteSparkSession.builder() extends that of SparkSession.builder(), so one can call .appName("foo"), .config("key", "value"), etc. on it.
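For instance, extra settings can be chained on the builder the same way as with a plain SparkSession.builder() (the application name and configuration value below are arbitrary examples):

```scala
@ val spark = {
    AmmoniteSparkSession.builder()
      .master("local[*]")
      .appName("ammonite-session")                  // arbitrary example name
      .config("spark.sql.shuffle.partitions", "8")  // arbitrary example setting
      .getOrCreate()
  }
```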

See below for how to use it with standalone clusters, and how to use it with YARN clusters.

Note that ammonite-spark does not rely on a Spark distribution. The driver and executors classpaths are handled from the Ammonite session only, via import $ivy.`…` statements. See INTERNALS for more details.

You can then run Spark calculations, like

@ def sc = spark.sparkContext

@ val rdd = sc.parallelize(1 to 100, 10)

@ val n = rdd.map(_ + 1).sum()
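Since the map adds 1 to each of 1 to 100, the sum is 2 + 3 + … + 101:

```scala
@ n
n: Double = 5150.0
```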

Syncing dependencies

If extra dependencies are loaded via import $ivy.`…` after the SparkSession has been created, call AmmoniteSparkSession.sync() so that the newly added JARs are passed to the Spark executors.
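For example (the cats-core dependency below is just an arbitrary illustration of a dependency added after session creation):

```scala
@ import $ivy.`org.typelevel::cats-core:2.9.0`  // extra dependency, arbitrary example

@ AmmoniteSparkSession.sync()  // ships the newly added JARs to the executors
```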

Using with standalone cluster

Simply set the master to spark://… when building the session, e.g.

@ val spark = {
    AmmoniteSparkSession.builder()
      .master("spark://localhost:7077")
      .config("spark.executor.instances", "4")
      .config("spark.executor.memory", "2g")
      .getOrCreate()
  }

Ensure the version of Spark used to start the master and executors matches the one loaded in the Ammonite session (via e.g. import $ivy.`org.apache.spark::spark-sql:X.Y.Z`), and that the machine running Ammonite can access, and is accessible from, all nodes of the standalone cluster.

Using with YARN cluster

Set the master to "yarn" when building the session, e.g.

@ val spark = {
    AmmoniteSparkSession.builder()
      .master("yarn")
      .config("spark.executor.instances", "4")
      .config("spark.executor.memory", "2g")
      .getOrCreate()
  }

Ensure the configuration directory of the cluster is set in HADOOP_CONF_DIR or YARN_CONF_DIR in the environment, or is available at /etc/hadoop/conf. This directory should contain files like core-site.xml, hdfs-site.xml, … Ensure also that the machine you run Ammonite on can indeed act as the driver (it should have access to and be accessible from the YARN nodes, etc.).
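For instance, assuming the cluster configuration lives under /etc/hadoop/conf (adjust the path to your setup), one can export the variable before launching Ammonite:

```shell
$ export HADOOP_CONF_DIR=/etc/hadoop/conf
$ amm --class-based
```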

Before raising issues, ensure you are aware of everything that needs to be set up to get a working spark-shell from a Spark distribution, and that all of it is passed in one way or another to the SparkSession created from Ammonite.

Troubleshooting

Getting org.apache.spark.sql.AnalysisException when calling .toDS

Add org.apache.spark.sql.catalyst.encoders.OuterScopes.addOuterScope(this) on the same line as each case class definition involved, like

@ import spark.implicits._
import spark.implicits._

@ org.apache.spark.sql.catalyst.encoders.OuterScopes.addOuterScope(this); case class Foo(id: String, value: Int)
defined class Foo

@ val ds = List(Foo("Alice", 42), Foo("Bob", 43)).toDS
ds: Dataset[Foo] = [id: string, value: int]

(This should likely be added automatically in the future.)

Compatibility

ammonite-spark relies on the Ammonite API, which undergoes non-backward-compatible changes from time to time. The following table lists the versions of Ammonite that each ammonite-spark release is built against, and is therefore known to be compatible with.

ammonite-spark   Ammonite            almond
0.1.2, 0.1.3     1.3.2
0.2.0            1.5.0               0.2.0
0.3.0            1.6.3               0.3.0
0.4.0            1.6.5               0.4.0
0.4.1            1.6.6               0.5.0
0.4.2            1.6.7               0.5.0
0.5.0            1.6.9-8-2a27ffe     0.6.0
0.6.0, 0.6.1     1.6.9-15-6720d42    0.7.0, 0.8.0
0.7.0            1.7.1               0.8.1
0.7.1            1.7.3-3-b95f921
0.7.2            1.7.4               0.8.2, 0.8.3
0.8.0            1.8.1
0.9.0            2.0.4
0.10.0           2.1.4               0.10.0
0.10.1           2.1.4               0.10.1
0.11.0           2.3.8-36-1cce53f3   0.11.0

Missing

Local clusters, Mesos, and Kubernetes aren't supported yet.

No Scala 2.10 or 2.11 support (support for these was dropped by Ammonite).
