
Mellanox / SparkRDMA

License: Apache-2.0
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark

Programming Languages

  • Java (68,154 projects; the #9 most used programming language on this index)
  • Scala (5,932 projects)

Projects that are alternatives to or similar to SparkRDMA

leaflet heatmap
A simple visualization of Huzhou call data. Assuming the data volume is too large to render a heatmap directly in the browser, the heatmap-rendering step is moved offline for computation and analysis: the data is first processed in parallel with Apache Spark, the heatmap is then drawn with Apache Spark, and leafletjs loads an OpenStreetMap layer plus the heatmap layer for a good interactive experience. The rendering is currently implemented in Apache Spark; perhaps Spark is not well suited to this kind of computation, or the algorithm is poorly designed, since the parallel computation is slower than a single machine. The Spark heatmap-rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .
Stars: ✭ 13 (-93.95%)
Mutual labels:  big-data, spark, apache-spark, hadoop, bigdata
Bigdata Notes
A getting-started guide to big data ⭐
Stars: ✭ 10,991 (+5012.09%)
Mutual labels:  spark, big-data, hadoop, bigdata
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-30.23%)
Mutual labels:  spark, big-data, hadoop, apache-spark
aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (-48.37%)
Mutual labels:  big-data, spark, apache-spark, hadoop
Apache Spark Hands On
Educational notes and hands-on problems with solutions for the Hadoop ecosystem
Stars: ✭ 74 (-65.58%)
Mutual labels:  spark, hadoop, bigdata
Big Data Engineering Coursera Yandex
Big Data for Data Engineers Coursera Specialization from Yandex
Stars: ✭ 71 (-66.98%)
Mutual labels:  spark, big-data, bigdata
Spark Py Notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+522.33%)
Mutual labels:  spark, big-data, bigdata
Hadoopcryptoledger
Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
Stars: ✭ 126 (-41.4%)
Mutual labels:  spark, hadoop, bigdata
Mobius
C# and F# language binding and extensions to Apache Spark
Stars: ✭ 929 (+332.09%)
Mutual labels:  spark, bigdata, apache-spark
Bigdata Notebook
Stars: ✭ 100 (-53.49%)
Mutual labels:  spark, hadoop, bigdata
Mmlspark
Simple and Distributed Machine Learning
Stars: ✭ 2,899 (+1248.37%)
Mutual labels:  spark, big-data, apache-spark
Docker Spark Cluster
A Spark cluster setup running on Docker containers
Stars: ✭ 57 (-73.49%)
Mutual labels:  spark, big-data, hadoop
Bigdata Playground
A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Stars: ✭ 177 (-17.67%)
Mutual labels:  big-data, hadoop, apache-spark
Bigdata Interview
🎯 🌟 [Big data interview questions] Big-data interview questions collected from around the web, together with my own summarized answers. Currently covers the Hadoop/Hive/Spark/Flink/HBase/Kafka/Zookeeper frameworks.
Stars: ✭ 857 (+298.6%)
Mutual labels:  spark, hadoop, bigdata
Gaffer
A large-scale entity and relation database supporting aggregation of properties
Stars: ✭ 1,642 (+663.72%)
Mutual labels:  spark, big-data, hadoop
Spark
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Stars: ✭ 1,721 (+700.47%)
Mutual labels:  spark, bigdata, apache-spark
Splash
Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange
Stars: ✭ 105 (-51.16%)
Mutual labels:  spark, bigdata, apache-spark
Azure Event Hubs Spark
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Stars: ✭ 140 (-34.88%)
Mutual labels:  spark, bigdata, apache-spark
Bigdataguide
Big data learning: learn big data from scratch, with learning videos for every stage and interview materials.
Stars: ✭ 817 (+280%)
Mutual labels:  spark, hadoop, bigdata
Hadoop For Geoevent
ArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.
Stars: ✭ 5 (-97.67%)
Mutual labels:  big-data, hadoop, bigdata

SparkRDMA ShuffleManager Plugin

SparkRDMA is a high-performance ShuffleManager plugin for Apache Spark that uses RDMA (instead of TCP) to perform shuffle data transfers in Spark jobs.

This open-source project is developed, maintained and supported by Mellanox Technologies.

Performance results

TeraSort

TeraSort results

Running a 320GB TeraSort workload with SparkRDMA is 2.63x faster than standard Spark (runtimes in seconds)

Test environment:

7 Spark standalone workers on Azure "H16mr" VM instances: Intel Haswell E5-2667 v3, 224GB RAM, 2000GB SSD for temporary storage, Mellanox InfiniBand FDR (56Gb/s)

SparkRDMA was also featured at Spark+AI Summit 2018; for more information, see our session: https://databricks.com/session/accelerated-spark-on-azure-seamless-and-scalable-hardware-offloads-in-the-cloud

PageRank

PageRank results

Running a 19GB PageRank workload with SparkRDMA is 2.01x faster than standard Spark (runtimes in seconds)

Test environment:

5 Spark standalone workers: 2x Intel Xeon E5-2697 v3 @ 2.60GHz with 25 cores per Worker, 150GB RAM, non-flash storage (HDD), Mellanox ConnectX-5 network adapters on a 100GbE RoCE fabric, connected by a Mellanox Spectrum switch

Wiki pages

For more information on configuration, performance tuning and troubleshooting, please visit the SparkRDMA GitHub Wiki

Runtime requirements

  • Apache Spark 2.0.0/2.1.0/2.2.0/2.3.0/2.4.0
  • Java 8
  • An RDMA-capable network, e.g. RoCE or InfiniBand (a quick sanity check is sketched below)
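
A quick way to sanity-check these requirements on each node is to query the local RDMA devices and the Java runtime. This is a minimal sketch assuming the libibverbs utilities are installed (they ship with standard RDMA userspace packages):

ibv_devinfo    # lists RDMA devices, ports and link state (should show your RoCE/InfiniBand NIC)
java -version  # should report a Java 8 runtime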

Installation

Obtain SparkRDMA and DiSNI binaries

Please use the "Releases" page to download pre-built binaries.
If you would like to build the project yourself, please refer to the "Build" section below.

The pre-built binaries are packed as an archive containing the following files (a download-and-extract sketch follows the list):

  • spark-rdma-3.1-for-spark-2.0.0-jar-with-dependencies.jar
  • spark-rdma-3.1-for-spark-2.1.0-jar-with-dependencies.jar
  • spark-rdma-3.1-for-spark-2.2.0-jar-with-dependencies.jar
  • spark-rdma-3.1-for-spark-2.3.0-jar-with-dependencies.jar
  • spark-rdma-3.1-for-spark-2.4.0-jar-with-dependencies.jar
  • libdisni.so
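
For example, the archive can be fetched and unpacked as follows. This is only a sketch: the URL and target directory are placeholders, and the actual download link should be taken from the "Releases" page:

wget https://github.com/Mellanox/SparkRDMA/releases/download/<version>/<archive>.tgz  # placeholder URL
mkdir -p /opt/sparkrdma                                                               # placeholder path
tar -xzf <archive>.tgz -C /opt/sparkrdma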

libdisni.so must be in java.library.path on every Spark Master and Worker (usually in /usr/lib)
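
The simplest approach is to copy the library into /usr/lib on every node. Alternatively, Spark's extraLibraryPath options can point at the directory holding libdisni.so; in this sketch, /opt/sparkrdma is a placeholder for wherever the archive was extracted:

sudo cp /opt/sparkrdma/libdisni.so /usr/lib/  # run on every Spark Master and Worker

or, equivalently, in spark-defaults.conf:

spark.driver.extraLibraryPath   /opt/sparkrdma
spark.executor.extraLibraryPath /opt/sparkrdma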

Configuration

Provide Spark with the location of the SparkRDMA plugin JARs by using the extraClassPath option. For standalone mode this can be added to either spark-defaults.conf or any runtime configuration file; for client mode it must be added to spark-defaults.conf. For Spark 2.0.0 (replace with 2.1.0, 2.2.0, 2.3.0 or 2.4.0 according to your Spark version):

spark.driver.extraClassPath   /path/to/SparkRDMA/target/spark-rdma-3.1-for-spark-2.0.0-jar-with-dependencies.jar
spark.executor.extraClassPath /path/to/SparkRDMA/target/spark-rdma-3.1-for-spark-2.0.0-jar-with-dependencies.jar
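
When submitting in cluster mode, the same properties can also be passed per job with --conf flags; in client mode the driver side must instead come from spark-defaults.conf (as noted above) or spark-submit's --driver-class-path flag. A sketch, with a placeholder jar path:

spark-submit \
  --driver-class-path /path/to/spark-rdma-3.1-for-spark-2.0.0-jar-with-dependencies.jar \
  --conf spark.executor.extraClassPath=/path/to/spark-rdma-3.1-for-spark-2.0.0-jar-with-dependencies.jar \
  <application jar and arguments>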

Running

To enable the SparkRDMA Shuffle Manager plugin, add the following line to either spark-defaults.conf or any runtime configuration file:

spark.shuffle.manager   org.apache.spark.shuffle.rdma.RdmaShuffleManager
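
Putting the pieces together, a complete spark-defaults.conf fragment for, e.g., Spark 2.4.0 could look like this (the /opt/sparkrdma directory is a placeholder):

spark.driver.extraClassPath     /opt/sparkrdma/spark-rdma-3.1-for-spark-2.4.0-jar-with-dependencies.jar
spark.executor.extraClassPath   /opt/sparkrdma/spark-rdma-3.1-for-spark-2.4.0-jar-with-dependencies.jar
spark.shuffle.manager           org.apache.spark.shuffle.rdma.RdmaShuffleManager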

Build

Building the SparkRDMA plugin requires Apache Maven and Java 8

  1. Obtain a clone of SparkRDMA

  2. Build the plugin for your Spark version (either 2.0.0, 2.1.0, 2.2.0, 2.3.0 or 2.4.0), e.g. for Spark 2.0.0:

mvn -DskipTests clean package -Pspark-2.0.0

  3. Obtain a clone of DiSNI for building libdisni:

git clone https://github.com/zrlio/disni.git
cd disni
git checkout tags/v1.7 -b v1.7

  4. Compile and install only libdisni (the jars are already included in the SparkRDMA plugin):

cd libdisni
./autoprepare.sh
./configure --with-jdk=/path/to/java8/jdk
make
make install
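
With the default autotools prefix, make install places libdisni.so under /usr/local/lib; from there it still needs to reach java.library.path on every node, as described in the Installation section. A quick check and copy, assuming the default paths:

ls -l /usr/local/lib/libdisni*                # confirm the library was installed
sudo cp /usr/local/lib/libdisni.so /usr/lib/  # place it on java.library.path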

Community discussions and support

For any questions, issues or suggestions, please use our Google group: https://groups.google.com/forum/#!forum/sparkrdma

Contributions

Any PR submissions are welcome
