Data Accelerator for Apache Spark simplifies onboarding to streaming of big data. It offers a rich, easy-to-use experience for creating, editing, and managing Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.

Stars: ✭ 247 (-32.88%)

Mutual labels: spark, apache-spark

Spark States

Custom state store providers for Apache Spark

Stars: ✭ 83 (-77.45%)

Mutual labels: spark, apache-spark

Real Time Stream Processing Engine

An example of real-time stream processing using Spark Streaming, Kafka, and Elasticsearch.

Stars: ✭ 37 (-89.95%)

Mutual labels: spark, apache-spark

Spark On K8s Operator

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.

Stars: ✭ 1,780 (+383.7%)

Mutual labels: spark, apache-spark

Spark

.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.

Stars: ✭ 1,721 (+367.66%)

Mutual labels: spark, apache-spark

Spark Workshop

Apache Spark™ and Scala Workshops

Stars: ✭ 224 (-39.13%)

Mutual labels: spark, apache-spark

Goodreads etl pipeline

An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.

Stars: ✭ 793 (+115.49%)

Mutual labels: spark, apache-spark

Spark Jupyter Aws

A guide to setting up Jupyter with PySpark painlessly on AWS EC2 clusters, with S3 I/O support

Stars: ✭ 259 (-29.62%)

Mutual labels: spark, apache-spark

Spark Structured Streaming Book

The Internals of Spark Structured Streaming

Stars: ✭ 371 (+0.82%)

Mutual labels: spark, apache-spark

Kafka Storm Starter

Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.

Stars: ✭ 728 (+97.83%)

Mutual labels: spark, apache-spark

Mmlspark

Simple and Distributed Machine Learning

Stars: ✭ 2,899 (+687.77%)

Mutual labels: spark, apache-spark

Spark Examples

Spark examples

Stars: ✭ 41 (-88.86%)

Mutual labels: spark, apache-spark

Azure Cosmosdb Spark

Apache Spark Connector for Azure Cosmos DB

Stars: ✭ 165 (-55.16%)

Mutual labels: spark, apache-spark

leaflet heatmap

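A simple visualization of call data for Huzhou. Assuming the data volume is too large for a browser to draw the heatmap directly, the heatmap rendering step is moved to offline computation and analysis. Apache Spark computes the data in parallel and then renders the heatmap, and leafletjs loads the OpenStreetMap layer and the heatmap layer to give good interactivity. With rendering currently implemented in Apache Spark, the parallel computation is slower than single-machine computation, perhaps because Apache Spark is not well suited to this kind of computation or because I did not design the algorithm well. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .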

Stars: ✭ 13 (-96.47%)

Mutual labels: spark, apache-spark

Sparklens

Qubole Sparklens tool for performance tuning Apache Spark

Stars: ✭ 345 (-6.25%)

Mutual labels: spark, performance-metrics

Hbase Rdd

Spark RDD to read, write and delete from HBase

Stars: ✭ 277 (-24.73%)

Mutual labels: spark

Parquet Dotnet

🏐 Apache Parquet for modern .NET

Stars: ✭ 276 (-25%)

Mutual labels: apache-spark

Scalnet

A Scala wrapper for Deeplearning4j, inspired by Keras. Scala + DL + Spark + GPUs

Stars: ✭ 342 (-7.07%)

Mutual labels: spark

Stackimpact Go

DEPRECATED StackImpact Go Profiler - Production-Grade Performance Profiler: CPU, memory allocations, blocking calls, errors, metrics, and more

Stars: ✭ 276 (-25%)

Mutual labels: performance-metrics

Datavec

ETL Library for Machine Learning - data pipelines, data munging and wrangling

Stars: ✭ 272 (-26.09%)

Mutual labels: spark

Helk

The Hunting ELK

Stars: ✭ 3,097 (+741.58%)

Mutual labels: spark

Crayon

Simple framework-agnostic UI router for SPAs

Stars: ✭ 310 (-15.76%)

Mutual labels: spark

Docker Spark Cluster

A simple Spark standalone cluster for your testing environment purposes

Stars: ✭ 261 (-29.08%)

Mutual labels: spark

Around Dataengineering

A Data Engineering & Machine Learning Knowledge Hub

Stars: ✭ 257 (-30.16%)

Mutual labels: spark

Sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.

Stars: ✭ 362 (-1.63%)

Mutual labels: spark

Iql

An ad hoc query service based on the Spark SQL engine.

Stars: ✭ 341 (-7.34%)

Mutual labels: spark

Delta

An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.

Stars: ✭ 3,903 (+960.6%)

Mutual labels: spark

Sk Dist

Distributed scikit-learn meta-estimators in PySpark

Stars: ✭ 260 (-29.35%)

Mutual labels: spark

Mist

Serverless proxy for Spark cluster

Stars: ✭ 309 (-16.03%)

Mutual labels: apache-spark

Succinct

Enabling queries on compressed data.

Stars: ✭ 257 (-30.16%)

Mutual labels: spark

Ytk Learn

Ytk-learn is a distributed machine learning library that implements most of the popular machine learning algorithms (GBDT, GBRT, Mixture Logistic Regression, Gradient Boosting Soft Tree, Factorization Machines, Field-aware Factorization Machines, Logistic Regression, Softmax).

Stars: ✭ 337 (-8.42%)

Mutual labels: spark

Spark Gotchas

A subjective compilation of Apache Spark tips and tricks

Stars: ✭ 308 (-16.3%)

Mutual labels: apache-spark

Big Data Rosetta Code

Code snippets for solving common big data problems in various platforms. Inspired by Rosetta Code

Stars: ✭ 254 (-30.98%)

Mutual labels: spark

1-60 of 546 similar projects
