Top 625 Spark open-source projects

SparkJobServerClient
Java client for the Spark Job Server, implementing its agreed-upon REST APIs
spark-root
Apache Spark Data Source for ROOT File Format
oshinko-s2i
A place for s2i images and utilities for Spark application builders on OpenShift
datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Provides a complete ETL pipeline for a data lake: SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
dockerfiles
Multiple Docker container images for the main Big Data tools (Hadoop, Spark, Kafka, HBase, Cassandra, Zookeeper, Zeppelin, Drill, Flink, Hive, Hue, Mesos, ...)
mongo-spark-jupyter
Docker environment that spins up MongoDB replica set, Spark, and Jupyter Lab. Example code uses PySpark and the MongoDB Spark Connector.
sbt-spark-submit
sbt plugin for spark-submit
osm4scala
Scala and Spark library focused on reading OpenStreetMap Pbf files.
spark-notebook-examples
Notebook examples related to Apache Spark, IPython / Jupyter, and Zeppelin
spark-on-k8s-gcp-examples
Example Spark applications that run on Kubernetes and access GCP products, e.g., GCS, BigQuery, and Cloud PubSub
uberscriptquery
UberScriptQuery, a SQL-like DSL to make writing Spark jobs super easy
local-hashicorp-stack
Local HashiCorp stack for DevOps development without a hypervisor or cloud
SparkFastDataAnalysis
Study notes for the book "Learning Spark: Lightning-Fast Big Data Analysis" (Chinese edition: 《Spark 快速大数据分析》)
interview-refresh-java-bigdata
A one-stop repo for looking up code snippets covering core Java concepts, SQL, data structures, and big data. It also includes interview questions asked in real interviews.
LSTM-TensorSpark
Implementation of an LSTM with TensorFlow, distributed on Apache Spark
cdp-spark-datasource
Spark data source for Cognite Data Fusion
openblockchain
{START HERE} Docker engine to roll your own openblockchain
rulegin
A lightweight rule engine system based on a JavaScript engine, refactored from the open-source IoT project thingboard
almaren-framework
The Almaren Framework provides a simplified, consistent, minimalistic layer over Apache Spark, while still allowing you to take advantage of native Apache Spark features. You can still combine it with standard Spark code.
scalaspark
Tweet-Analysis-With-Kafka-and-Spark
A real-time analytics dashboard to analyze trending hashtags and @mentions at any location using Kafka and Spark Streaming.
smart-data-lake
Smart Automation Tool for building modern Data Lakes and Data Pipelines
kafka-spark-streaming-example
Simple example of Spark Streaming over a Kafka topic
zdh server
zdh data collection platform: ETL processing service
SANSA-Stack
Big Data RDF Processing and Analytics Stack built on Apache Spark and Apache Jena http://sansa-stack.github.io/SANSA-Stack/
lectures-hse-spark
Scalable machine learning and big data analysis with Apache Spark
CuBit
General-purpose, formally-verified, 64-bit operating system in SPARK/Ada for x86-64
learning-hadoop-and-spark
Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning
generator-mitosis
A micro-service infrastructure generator based on Yeoman/Chatbot, Kubernetes/Docker Swarm, Traefik, Ansible, Jenkins, Spark, Hadoop, Kafka, etc.
Spark-The-Definitive-Guide
Source code repository for the first edition of Spark: The Definitive Guide (Korean edition), published by Hanbit Media
dpkb
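A roundup of big data topics, including distributed storage engines, distributed compute engines, data warehouse construction, and more. Keywords: Hadoop, HBase, ES, Kudu, Hive, Presto, Spark, Flink, Kylin, ClickHouse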
Spark ALS
Recommendation algorithm implementations based on spark-ml, spark-mllib, and spark-streaming
spark.sas7bdat
Read in SAS data in parallel into Apache Spark
spark-gdelt
Binding the GDELT universe in a Spark environment
soda-spark
Soda Spark is a PySpark library that helps you test your data in Spark DataFrames
Clustering4Ever
C4E, a JVM friendly library written in Scala for both local and distributed (Spark) Clustering.
spark-summit-2017-Europe
Spark Summit 2017 Europe slide (PPT) downloads
541-600 of 625 spark projects