ypriverol / spark-java8

Licence: other
Java 8 and Spark learning through examples

Java 8 and Spark Learning tutorials

This is a collection of Java 8 and Apache Spark examples and concepts, from basic to advanced. It explains the basic concepts introduced by Java 8 and how you can combine them with Apache Spark.

This set of examples offers a way to understand Spark 2.0 in detail and to get familiar with Java 8 and its new features, such as lambdas, streams, and reactive programming.

Why Java 8

Java 8 introduced two major changes: lambda expressions and streams. The release combines a substantial upgrade to the Java programming model with a coordinated evolution of the JVM, the Java language, and its libraries. It also adds features aimed at productivity, ease of use, polyglot programming, security, and performance.

1- Lambda expressions are a new language feature in this release. They let you treat functionality as a method argument, or code as data, and they let you express instances of single-method interfaces (referred to as functional interfaces) more compactly.

2- Classes in the new java.util.stream package provide a Stream API that supports functional-style operations on streams of elements. The Stream API is integrated into the Collections API, which enables bulk operations on collections, such as sequential or parallel map-reduce transformations. The release also improves the performance of HashMaps with key collisions.
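The two features above can be illustrated with a minimal sketch that uses only the Java 8 standard library. The class and method names (LambdaStreamSketch, applyTwice, sumOfEvenSquares, upperCase) are hypothetical and are not taken from the repository.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntUnaryOperator;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration class; not part of the spark-java8 repository.
class LambdaStreamSketch {

    // Functionality as a method argument: IntUnaryOperator is a functional
    // interface, so callers can pass a lambda for f.
    static int applyTwice(IntUnaryOperator f, int x) {
        return f.applyAsInt(f.applyAsInt(x));
    }

    // Map-reduce over a collection: keep the even numbers, square them, sum them.
    // The same pipeline runs sequentially or in parallel.
    static int sumOfEvenSquares(List<Integer> numbers, boolean parallel) {
        Stream<Integer> stream = parallel ? numbers.parallelStream() : numbers.stream();
        return stream
                .filter(n -> n % 2 == 0)
                .mapToInt(n -> n * n)
                .sum();
    }

    // A method reference is a compact form of the lambda s -> s.toUpperCase().
    static List<String> upperCase(List<String> words) {
        return words.stream()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6);
        System.out.println(applyTwice(x -> x + 3, 1));                // 7
        System.out.println(sumOfEvenSquares(numbers, false));         // 56
        System.out.println(sumOfEvenSquares(numbers, true));          // 56
        System.out.println(upperCase(Arrays.asList("spark", "java"))); // [SPARK, JAVA]
    }
}
```

Spark's Java API accepts lambdas in the same way, which is why the Java 8 features pair naturally with it.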

Why Spark

Apache Spark™ is a fast, general-purpose cluster computing engine for large-scale data processing. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.

Instructions

A good way to use these examples is to clone the repo first and then start your own Spark Java 8 project.

Installing Java 8 and Spark

Java 8 can be downloaded here. After installation, make sure that the version you are using is Java 8 by running:

java -version

To set up Spark locally on your machine, download the Spark release from here. Then follow these steps:

> tar zxvf spark-xxx.tgz
> cd spark-xxx
> build/mvn -DskipTests clean package

After compiling, and before running your first example, add the SPARK_LOCAL_IP variable to your profile:

> export SPARK_LOCAL_IP=127.0.0.1

To check that Spark is installed properly on your machine, run the first example that ships with Spark:

> ./bin/run-example SparkPi

Datasets

Some of the datasets we will use in this learning tutorial are:

  • The tweet archive from @ypriverol, used in the word count example.
  • Datasets from the KDD Cup 1999. The results of this competition can be found here.

References

The reference book for these and other Spark-related topics is:

  • Learning Spark by Holden Karau, Andy Konwinski, Patrick Wendell, and Matei Zaharia.

Examples

The following examples can be examined individually, although they tell a more or less linear 'story' when followed in sequence. They use different datasets to solve a related set of tasks.

RDD Basic Examples

Here is a list of the most basic examples in Spark-Java8, together with definitions of the most basic concepts in Spark.

1- SparkWordCount: How to create a simple JavaRDD in Spark.

2- MaptoDouble: How to generate general statistics about an RDD in Spark.

3- SparkAverage: How to compute the average of a set of numbers in Spark.
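The three tasks above can be sketched without a Spark installation using plain Java 8 streams. This is an analogue, not the repository's code: the class RddBasicsSketch and its methods are hypothetical. Spark's Java API has similarly named operations (flatMap, mapToPair and reduceByKey for word count, mapToDouble followed by stats() for statistics), so the shape of the pipelines carries over.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical stream-based analogue of the three basic examples;
// it does not use Spark and is not taken from the repository.
class RddBasicsSketch {

    // Word count: split each line into words, then count occurrences per word.
    // A TreeMap keeps the result sorted by word.
    static Map<String, Long> wordCount(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }

    // General statistics: convert each element to a double, then summarise
    // (count, min, max, sum and average in one pass).
    static DoubleSummaryStatistics stats(List<String> values) {
        return values.stream()
                .mapToDouble(Double::parseDouble)
                .summaryStatistics();
    }

    // Average: reduce (sum, count) pairs so partial results can be merged,
    // then divide once at the end. Returns NaN for an empty list.
    static double average(List<Integer> numbers) {
        long[] sumAndCount = numbers.stream()
                .map(n -> new long[] {n.longValue(), 1L})
                .reduce(new long[] {0L, 0L}, (a, b) -> new long[] {a[0] + b[0], a[1] + b[1]});
        return sumAndCount[1] == 0 ? Double.NaN : (double) sumAndCount[0] / sumAndCount[1];
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Arrays.asList("to be or", "not to be")));
        // {be=2, not=1, or=1, to=2}
        System.out.println(stats(Arrays.asList("1.0", "2.0", "6.0")).getAverage()); // 3.0
        System.out.println(average(Arrays.asList(1, 2, 3, 4)));                     // 2.5
    }
}
```

Carrying the sum and the count together matters on a cluster: averages of partitions cannot be averaged directly, but (sum, count) pairs can be merged in any order.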

RDD Sampling Examples

1- SparkSampling: Basic Spark sampling using the functions sample and takeSample.
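The difference between the two kinds of sampling can be sketched in plain Java. In Spark, sample takes a fraction and returns a sample whose size varies, while takeSample returns a fixed number of elements. The class below is a simplified, hypothetical analogue that mirrors those two ideas on an in-memory list. It is not Spark's implementation and assumes uniform random selection driven by java.util.Random.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;

// Hypothetical in-memory analogue of fraction-based and fixed-size sampling;
// a simplified sketch, not Spark's algorithm.
class SamplingSketch {

    // Fraction-based sampling without replacement: each element is kept
    // independently with probability `fraction`, so the result size varies.
    static <T> List<T> sample(List<T> data, double fraction, long seed) {
        Random random = new Random(seed);
        return data.stream()
                .filter(x -> random.nextDouble() < fraction)
                .collect(Collectors.toList());
    }

    // Fixed-size sampling: returns `num` elements (fewer only when sampling
    // without replacement from a list smaller than `num`).
    static <T> List<T> takeSample(List<T> data, boolean withReplacement, int num, long seed) {
        Random random = new Random(seed);
        if (withReplacement) {
            // The same element may be drawn more than once.
            List<T> result = new ArrayList<>();
            for (int i = 0; i < num && !data.isEmpty(); i++) {
                result.add(data.get(random.nextInt(data.size())));
            }
            return result;
        }
        // Without replacement: shuffle a copy and keep the first `num` elements.
        List<T> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, random);
        return new ArrayList<>(shuffled.subList(0, Math.min(num, shuffled.size())));
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            data.add(i);
        }
        System.out.println(sample(data, 0.1, 42L).size());          // varies with the seed
        System.out.println(takeSample(data, false, 5, 42L).size()); // 5
    }
}
```

Fixing the seed makes a run reproducible, which is useful when comparing results across experiments.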
