spektom / Spark Flamegraph

Licence: apache-2.0
Easy CPU Profiling for Apache Spark applications

Projects that are alternatives of or similar to Spark Flamegraph

Learningsparkv2
This is the github repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Stars: ✭ 307 (+923.33%)
Mutual labels:  spark, apache-spark
Live log analyzer spark
Spark application for analyzing Apache access logs and detecting anomalies, along with a Medium article.
Stars: ✭ 14 (-53.33%)
Mutual labels:  spark, apache-spark
Coolplayspark
Cool Play Spark: Spark source code analysis, Spark libraries, and more
Stars: ✭ 3,318 (+10960%)
Mutual labels:  spark, apache-spark
spark-structured-streaming-examples
Spark Structured Streaming examples using version 3.0.0
Stars: ✭ 23 (-23.33%)
Mutual labels:  spark, apache-spark
Kafka Storm Starter
Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.
Stars: ✭ 728 (+2326.67%)
Mutual labels:  spark, apache-spark
Spark Jupyter Aws
A guide on how to set up Jupyter with Pyspark painlessly on AWS EC2 clusters, with S3 I/O support
Stars: ✭ 259 (+763.33%)
Mutual labels:  spark, apache-spark
Sparkmeasure
This is the development repository of SparkMeasure, a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task metrics data.
Stars: ✭ 368 (+1126.67%)
Mutual labels:  spark, apache-spark
Data Accelerator
Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy to use experience to help with creation, editing and management of Spark jobs on Azure HDInsights or Databricks while enabling the full power of the Spark engine.
Stars: ✭ 247 (+723.33%)
Mutual labels:  spark, apache-spark
Sparkle
Haskell on Apache Spark.
Stars: ✭ 419 (+1296.67%)
Mutual labels:  spark, apache-spark
Agile data code 2
Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Stars: ✭ 413 (+1276.67%)
Mutual labels:  spark, apache-spark
aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (+270%)
Mutual labels:  spark, apache-spark
Goodreads etl pipeline
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Stars: ✭ 793 (+2543.33%)
Mutual labels:  spark, apache-spark
leaflet heatmap
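A simple visualization of Huzhou call data. Assuming the data volume is too large to draw the heatmap directly in the browser, the heatmap drawing step is moved to offline computation and analysis. After the data is computed in parallel with Apache Spark, Apache Spark is also used to draw the heatmap, and leafletjs then loads the OpenStreetMap layer and the heatmap layer for good interactivity. With the current Apache Spark implementation of the drawing, the parallel computation is slower than single-machine computation, perhaps because Apache Spark is not well suited to this kind of computation or because I did not design the algorithm well. The Apache Spark heatmap drawing and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .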
Stars: ✭ 13 (-56.67%)
Mutual labels:  spark, apache-spark
Spark Notebook
Interactive and Reactive Data Science using Scala and Spark.
Stars: ✭ 3,081 (+10170%)
Mutual labels:  spark, apache-spark
spark-gradle-template
Apache Spark in your IDE with gradle
Stars: ✭ 39 (+30%)
Mutual labels:  spark, apache-spark
Wirbelsturm
Wirbelsturm is a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data tech like Kafka.
Stars: ✭ 332 (+1006.67%)
Mutual labels:  spark, apache-spark
Spark Workshop
Apache Spark™ and Scala Workshops
Stars: ✭ 224 (+646.67%)
Mutual labels:  spark, apache-spark
Mastering Spark Sql Book
The Internals of Spark SQL
Stars: ✭ 234 (+680%)
Mutual labels:  spark, apache-spark
Spark Structured Streaming Book
The Internals of Spark Structured Streaming
Stars: ✭ 371 (+1136.67%)
Mutual labels:  spark, apache-spark
Sparklyr
R interface for Apache Spark
Stars: ✭ 775 (+2483.33%)
Mutual labels:  spark, apache-spark

spark-flamegraph

Easy CPU Profiling for Apache Spark applications.

The script spark-submit-flamegraph is a wrapper around the standard spark-submit command that generates a Flame Graph.

Supported Systems

  • Amazon EMR
  • Most Linux distributions
  • Mac (with Homebrew installed)

Prerequisites

The script is adapted to work on Amazon EMR. On other systems, the following utilities must be present:

  • perl
  • python2.7 (or set the PYTHON environment variable to the Python executable)
  • pip (or set PIP environment variable to the pip utility)
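A quick way to confirm these utilities are available before running the wrapper is sketched below. The function name check_prereqs is hypothetical and is not part of spark-flamegraph; it only assumes that PYTHON and PIP, when unset, fall back to python2.7 and pip as listed above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with spark-flamegraph): reports which of
# the required utilities cannot be found.
check_prereqs() {
  missing=""
  # PYTHON and PIP fall back to the defaults named in the prerequisites list.
  for tool in perl "${PYTHON:-python2.7}" "${PIP:-pip}"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="$missing $tool"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "all prerequisites found"
}
```

Calling check_prereqs prints either "all prerequisites found" or a "missing:" line naming each utility that could not be located, and returns a non-zero status in the latter case.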

Running

wget -O /usr/local/bin/spark-submit-flamegraph \
  https://raw.githubusercontent.com/spektom/spark-flamegraph/master/spark-submit-flamegraph

chmod +x /usr/local/bin/spark-submit-flamegraph

Use spark-submit-flamegraph as a replacement for the spark-submit command.

Configuration

To configure the script, use the following environment variables:

Environment Variable   Description                      Default value
SPARK_CMD              Spark command to run             spark-submit
PYTHON                 Path to the Python executable    python2.7
PIP                    Path to the pip utility          pip

For example, to profile a Spark shell session, set the SPARK_CMD environment variable:

SPARK_CMD=spark-shell /usr/local/bin/spark-submit-flamegraph
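One way to picture how the defaults behave is as shell parameter-expansion fallbacks. The sketch below is an illustration only: resolve_config is a hypothetical function, and the real script may resolve its settings differently.

```shell
#!/bin/sh
# Hypothetical illustration: prints the effective value of each documented
# variable, using the documented default when the variable is unset or empty.
resolve_config() {
  echo "SPARK_CMD=${SPARK_CMD:-spark-submit}"
  echo "PYTHON=${PYTHON:-python2.7}"
  echo "PIP=${PIP:-pip}"
}
```

Running `SPARK_CMD=spark-shell resolve_config` prints `SPARK_CMD=spark-shell` on its first line, while the other two lines keep their defaults when PYTHON and PIP are unset.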

Details

The script performs the following steps to make profiling Spark applications as easy as possible:

  • Downloads InfluxDB and starts it on a random port.
  • Starts the Spark application using the original spark-submit command, with the StatsD profiler JAR on its classpath and with configuration that tells it to report statistics to the InfluxDB instance.
  • After the Spark application finishes, queries all the reported metrics from the InfluxDB instance.
  • Runs a script that generates the Flame Graph .SVG file.
  • Stops the InfluxDB instance.
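
The start, run, and stop steps above follow a common wrapper pattern, sketched below under loud assumptions: run_with_service is a hypothetical name, `sleep` stands in for the InfluxDB server, and the actual spark-submit-flamegraph code may be organized differently.

```shell
#!/bin/sh
# Sketch of the wrapper pattern: start a background service, run the wrapped
# command, then stop the service and pass the command's exit status through.
run_with_service() {
  # Stand-in for starting InfluxDB in the background (assumption).
  sleep 300 >/dev/null 2>&1 &
  service_pid=$!

  # Run the wrapped command (spark-submit in the real script) and remember
  # its exit status without aborting before cleanup.
  status=0
  "$@" || status=$?

  # Stop the background service and reap it.
  kill "$service_pid" 2>/dev/null || true
  wait "$service_pid" 2>/dev/null || true

  return "$status"
}
```

For instance, `run_with_service echo profiled` prints "profiled" and returns 0 once the stand-in service has been stopped; a failing wrapped command has its exit status returned unchanged.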