holdenk / sparkProjectTemplate.g8

Licence: Apache-2.0 License
Template for Spark Projects

sparkProjectTemplate

A Giter8 template for Scala Spark Projects.

What this gives you

This template bootstraps a new Spark project with everyone's "favourite" wordcount example (modified to filter out stop words). You can then replace the wordcount example as desired and customize the Spark components your project needs.
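As a rough illustration of what "wordcount modified for stop words" means, here is a minimal sketch on plain Scala collections, so it runs without a Spark dependency. The object name, the `countWords` helper, and the stop-word list are all hypothetical and chosen for illustration; the code generated by the template may be organised differently.

```scala
// Hypothetical sketch: not the template's actual source.
object StopWordCountSketch {
  // Assumed stop-word list, for illustration only.
  val stopWords: Set[String] = Set("a", "an", "the", "of", "and")

  /** Counts words across all lines, ignoring case and skipping stop words. */
  def countWords(lines: Seq[String], stop: Set[String] = stopWords): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))                      // split each line on whitespace
      .map(_.toLowerCase)                            // normalise case
      .filter(w => w.nonEmpty && !stop.contains(w))  // drop blanks and stop words
      .groupBy(identity)                             // group identical words
      .map { case (word, occurrences) => word -> occurrences.size }

  def main(args: Array[String]): Unit = {
    val counts = countWords(Seq("the quick fox", "The lazy dog and the fox"))
    // Print one "word<TAB>count" pair per line, sorted by word.
    counts.toSeq.sortBy(_._1).foreach { case (w, n) => println(s"$w\t$n") }
  }
}
```

On a Spark RDD the same pipeline would typically be written with `flatMap`, `filter`, `map(w => (w, 1))` and `reduceByKey(_ + _)`.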

To encourage good software development practice, the generated project starts at 100% code coverage (that is, one test :p). Coverage is expected to decrease as the project grows, but we hope you use the provided spark-testing-base library or a similar option.
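For orientation, a test built on spark-testing-base might look roughly like the sketch below. It assumes spark-testing-base and a ScalaTest 3.0.x release (where `org.scalatest.FunSuite` is available) are on the test classpath; the class name, test name, and stop-word set are hypothetical, and the test shipped with the template may differ.

```scala
import com.holdenkarau.spark.testing.SharedSparkContext
import org.scalatest.FunSuite

// Hypothetical example test: not the one generated by the template.
class WordCountSketchTest extends FunSuite with SharedSparkContext {

  test("counts words and skips stop words") {
    // Assumed stop-word set, for illustration only.
    val stopWords = Set("the", "and")

    // `sc` is the local SparkContext provided by SharedSparkContext.
    val input = sc.parallelize(Seq("the quick fox", "the lazy dog and the fox"))

    val counts = input
      .flatMap(_.split("\\s+"))                           // split lines into words
      .filter(w => w.nonEmpty && !stopWords.contains(w))  // drop stop words
      .map(w => (w, 1))
      .reduceByKey(_ + _)                                 // sum counts per word
      .collectAsMap()

    assert(counts == Map("quick" -> 1, "fox" -> 2, "lazy" -> 1, "dog" -> 1))
  }
}
```

`SharedSparkContext` starts one local SparkContext for the whole suite and stops it afterwards, so individual tests do not have to manage Spark's lifecycle themselves.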

Creating a new project from this template

Have g8 installed? You can run it with:

g8 holdenk/sparkProjectTemplate --name=projectname --organization=com.my.org --sparkVersion=2.2.0

If you use sbt (0.13.13+), run:

sbt new holdenk/sparkProjectTemplate.g8

Executing the created project

First go to the project you created:

cd projectname

You can test the example Spark job included in this template locally, directly from sbt:

sbt "run inputFile.txt outputFile.txt"

Then choose CountingLocalApp when prompted.

You can also assemble a fat jar (see sbt-assembly for configuration details):

sbt assembly

Then submit it to your Spark cluster as usual:

/path/to/spark-home/bin/spark-submit \
  --class <package-name>.CountingApp \
  --name the_awesome_app \
  --master <master url> \
  ./target/scala-2.11/<jar name> \
  <input file> <output file>

Related

Want to build your application using the Spark Job Server? The spark-jobserver.g8 template can help you get started too.

License

This project is available under your choice of Apache 2 or CC0 1.0. See https://www.apache.org/licenses/LICENSE-2.0 or https://creativecommons.org/publicdomain/zero/1.0/ respectively. This template is distributed without any warranty.