Intel-bigdata / Spark-PMoF

License: Apache-2.0
Spark Shuffle Optimization with RDMA+AEP

Spark-PMoF: RPMem extension for Spark Shuffle

Spark-PMoF (Persistent Memory over Fabric) is an RPMem extension for Spark Shuffle: a Spark shuffle plugin that uses persistent memory and high-performance fabric technologies such as RDMA to improve Spark performance in shuffle-intensive scenarios.

IMPORTANT NOTE

Spark-PMoF has been migrated to and integrated into OAP: https://github.com/Intel-bigdata/OAP/tree/master/oap-shuffle/RPMem-shuffle. Please check OAP for the most recent updates.

Contents

Introduction

Installation

Make sure HPNL is installed before building.

git clone https://github.com/Intel-bigdata/Spark-PMoF.git
cd Spark-PMoF; mvn package -DskipTests

If the pmem hardware is ready, it is worth running the tests as well by dropping the -DskipTests option:

mvn package

Benchmark

Usage

This plugin currently supports Spark 2.3 and works well on various network fabrics, including Socket, RDMA and Omni-Path. Before running a Spark workload, add the following settings to spark-defaults.conf, then have fun! :-)

spark.driver.extraClassPath Spark-PMoF-PATH/target/sso-0.1-jar-with-dependencies.jar
spark.executor.extraClassPath Spark-PMoF-PATH/target/sso-0.1-jar-with-dependencies.jar
spark.shuffle.manager org.apache.spark.shuffle.pmof.PmofShuffleManager
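
The settings above can also be appended by a small script, which is handy when several nodes need the same change. This is only a sketch: the PMOF_HOME and SPARK_CONF_DIR variables and their defaults are assumptions for illustration, not names defined by Spark-PMoF, so point them at your actual checkout and Spark configuration directory.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed locations (hypothetical defaults); override them in the environment.
PMOF_HOME="${PMOF_HOME:-$PWD/Spark-PMoF}"
SPARK_CONF_DIR="${SPARK_CONF_DIR:-$PWD/conf}"

JAR="$PMOF_HOME/target/sso-0.1-jar-with-dependencies.jar"
CONF_FILE="$SPARK_CONF_DIR/spark-defaults.conf"

mkdir -p "$SPARK_CONF_DIR"

# Warn, but keep going, if the jar has not been built yet.
[ -f "$JAR" ] || echo "warning: $JAR not found; run 'mvn package -DskipTests' first" >&2

# Append the three settings unless the PMoF shuffle manager is already configured.
if ! grep -q 'PmofShuffleManager' "$CONF_FILE" 2>/dev/null; then
  cat >> "$CONF_FILE" <<EOF
spark.driver.extraClassPath $JAR
spark.executor.extraClassPath $JAR
spark.shuffle.manager org.apache.spark.shuffle.pmof.PmofShuffleManager
EOF
fi
```

Running the script a second time leaves the file unchanged, because the grep check finds the existing PmofShuffleManager entry.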

Contact

Chendi Xue, [email protected]
Jian Zhang, [email protected]