⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more

✭ 58

java machine-learning json data-science azure spark csv s3 text query scale svm avro hdfs parquet root

Docker Spark Cluster

A Spark cluster setup running on Docker containers

✭ 57

shell scala docker docker-image spark big-data hadoop

Model Serving Tutorial

Code and presentation for Strata Model Serving tutorial

✭ 57

scala tensorflow kafka spark flink akka-streams

Awesome Pulsar

A curated list of Pulsar tools, integrations and resources.

✭ 57

spark prometheus messaging apache-spark apache-kafka grafana-dashboard pub-sub

Net.jgp.labs.spark

Apache Spark examples exclusively in Java

✭ 55

java spark dataframe

Pulsar Spark

When Apache Pulsar meets Apache Spark

✭ 55

scala data-science spark apache-spark stream-processing flink data-processing batch-processing

Docker Hadoop

A Docker container with a full Hadoop cluster setup with Spark and Zeppelin

✭ 54

shell docker spark hadoop

Utils4s

A collection of test cases and related materials compiled while working with Scala and Spark

✭ 1,070

scala spark akka spark-streaming

Spark Submit Ui

A Play Framework-based application for submitting Spark apps

✭ 53

scala css spark play-framework

Play Spark Scala

✭ 51

scala css spark akka sbt play-framework

Spark Nkp

Natural Korean Processor for Apache Spark

✭ 50

scala nlp natural-language-processing spark apache-spark text-mining

Apache Spark Internals

The Internals of Apache Spark

✭ 1,045

book spark apache-spark

Awesome Recommendation Engine

The purpose of this tiny project is to put together the know-how that I learned from the Big Data Expert course at formacionhadoop.com. The idea is to show how to play with Apache Spark Streaming, Kafka, MongoDB, and Spark machine learning algorithms.

✭ 47

scala machine-learning mongodb kafka spark amazon

Spark As Service Using Embedded Server

This application comes as Spark2.1-as-Service-Provider using an embedded, Reactive-Streams-based, fully asynchronous HTTP server

✭ 46

scala rest-api spark embedded akka apache-spark akka-http

Spark Tda

SparkTDA is a package for Apache Spark providing Topological Data Analysis Functionalities.

✭ 45

scala machine-learning spark ml apache-spark

Delta Architecture

Streaming data changes to a Data Lake with Debezium and Delta Lake pipeline

✭ 43

html kafka spark databases streams

Spark Examples

Spark examples

✭ 41

java spark apache-spark

Gatk

Official code repository for GATK versions 4 and up

✭ 1,002

java spark bioinformatics science genomics sequencing ngs dna genome

Azure Kusto Spark

Apache Spark Connector for Azure Kusto

✭ 40

scala azure spark

Pixiedust

Python Helper library for Jupyter Notebooks

✭ 998

python jupyter-notebook data-science visualization spark

Data Ingestion Platform

✭ 39

java spark apex flink batch-processing storm

Snappydata

Project SnappyData - memory optimized analytics database, based on Apache Spark™ and Apache Geode™. Stream, Transact, Analyze, Predict in one cluster

✭ 995

scala spark analytics stream transaction scale

Optimus

🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark

✭ 986

jupyter-notebook machine-learning data-science spark data-analysis bigdata pyspark data-cleaning data-wrangling

Real Time Stream Processing Engine

This is an example of real time stream processing using Spark Streaming, Kafka & Elasticsearch.

✭ 37

scala elasticsearch kafka spark apache-spark spark-streaming

Weblogsanalysissystem

A big data platform for analyzing web access logs

✭ 37

java scala spark hadoop echarts hbase

Learning Spark

Learning Spark from scratch; big data learning

✭ 37

python java scala spark hadoop hbase hdfs spark-streaming

Vagrant Projects

Vagrant projects for various use-cases with Spark, Zeppelin, IPython / Jupyter, SparkR

✭ 34

python shell r spark jupyter vagrant cassandra ipython

Spark Summit East 2017

✭ 33

spark

Spark Flamegraph

Easy CPU Profiling for Apache Spark applications

✭ 30

shell spark apache-spark

Sparkmagic

Jupyter magics and kernels for working with remote Spark clusters

✭ 954

python jupyter-notebook spark jupyter kernel cluster notebook magic pyspark pandas-dataframe sql-query

Pucket

Bucketing and partitioning system for Parquet

✭ 29

scala spark thrift hdfs parquet

Data Algorithms Book

MapReduce, Spark, Java, and Scala for Data Algorithms Book

✭ 949

java scala spark hadoop distributed-computing mapreduce

Heracles

High performance HBase / Spark SQL engine

✭ 27

scala spark hbase

Spark

Apache Spark - A unified analytics engine for large-scale data processing

✭ 31,618

python java scala r jupyter-notebook hiveql sql spark big-data jdbc

Interview Questions Collection

Interview questions organized by knowledge area, including C++, Java, Hadoop, machine learning, and more

✭ 21

java machine-learning database spark interview hadoop

Flint

A Time Series Library for Apache Spark

✭ 878

scala spark timeseries

Tedsds

Apache Spark - Turbofan Engine Degradation Simulation Data Set example in Apache Spark

✭ 14

jupyter-notebook machine-learning dataset spark

Live log analyzer spark

Spark application for analyzing Apache access logs and detecting anomalies, along with a Medium article.

✭ 14

python spark analytics apache-spark pyspark

Urhox

Urho3D extension library

✭ 13

spark imgui

Sparkling Titanic

Training models with Apache Spark, PySpark for Titanic Kaggle competition

✭ 12

python spark pyspark

Mlfeature

Feature engineering toolkit for Spark MLlib.

✭ 12

scala machine-learning spark

Mare

MaRe leverages the power of Docker and Spark to run and scale your serial tools in MapReduce fashion.

✭ 11

scala docker spark scientific-computing mapreduce parallelization

Sparkjni

A heterogeneous Apache Spark framework.

✭ 11

java spark big-data

Bigdata Interview

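🎯 🌟 [Big data interview questions] A collection of big-data interview questions gathered from the web, together with my own summarized answers. Currently covers interview knowledge for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks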

✭ 857

kafka spark interview interview-questions yarn hadoop bigdata flink hbase hdfs mapreduce

Dockerfiles

50+ DockerHub public images for Docker & Kubernetes - Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, Nifi, Spark, Consul, Riak, TeamCity and DevOps tools built on the major Linux distros: Alpine, CentOS, Debian, Fedora, Ubuntu

✭ 847

shell docker linux kubernetes devops kafka spark rabbitmq hadoop consul zookeeper cassandra solr hbase presto

Tiledb Vcf

Efficient variant-call data storage and retrieval library using the TileDB storage library.

✭ 26

python data-science spark bioinformatics genomics vcf

Spark Swagger

Spark (http://sparkjava.com/) support for Swagger (https://swagger.io/)

✭ 25

java swagger spark swagger-ui

Mobius

C# and F# language binding and extensions to Apache Spark

✭ 929

csharp fsharp dataset spark streaming bigdata apache-spark dataframe spark-streaming mapreduce

Chronicler

Scala toolchain for InfluxDB

✭ 24

scala macros spark udp influxdb akka-http

Spark Tdd Example

A simple Spark TDD example

✭ 23

python jupyter-notebook spark tdd pyspark

Digitrecognizer

Java Convolutional Neural Network example for Hand Writing Digit Recognition

✭ 23

java deep-learning machine-learning neural-network convolutional-neural-networks spark machine-learning-algorithms deeplearning4j

Kylo

Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.

✭ 916

java spark hadoop

Spark Scala Tutorial

A free tutorial for Apache Spark.

✭ 907

scala jupyter-notebook tutorial spark jupyter

Yandex Big Data Engineering

✭ 17

jupyter-notebook spark hdfs mapreduce

Parquet Generator

Parquet file generator

✭ 16

scala sql spark parquet

181-240 of 625 spark projects