Top 625 spark open source projects

Sparks
No description or website provided.
snowplow-rdb-loader
Stores Snowplow enriched events in Redshift
spark-transformers
Spark-Transformers: Library for exporting Apache Spark MLlib models for use in any Java application with no other dependencies.
fb scraper
FBLYZE is a Facebook scraping and analysis system.
spark-operator
Operator for managing Spark clusters on Kubernetes and OpenShift.
NYC Taxi Pipeline
Design/Implement stream/batch architecture on NYC taxi data | #DE
spark-waimai
A Spark-based big data analysis platform for food delivery
spark-sdk-ios
DEPRECATED Particle iOS Cloud SDK. Use -->
net.jgp.books.spark.ch01
Spark in Action, 2nd edition - chapter 1 - Introduction
Ginger
Ginger - Opinionated RESTful Routing powered by Spark
platys-modern-data-platform
Support for generating modern platforms dynamically with services such as Kafka, Spark, Streamsets, HDFS, ....
feature-store-api
Python - Java/Scala API for the Hopsworks feature store
spark-hats
Nested array transformation helper extensions for Apache Spark
jobAnalytics and search
JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
learning-spark-with-java
Self-contained examples using Apache Spark with the functional features of Java 8
GeoTriples
Publishing Big Geospatial data as Linked Open Geospatial Data
ros hadoop
Hadoop splittable InputFormat for ROS. Process rosbag with Hadoop Spark and other HDFS compatible systems.
nsmc-zeppelin-notebook
Movie review dataset Word2Vec & sentiment classification Zeppelin notebook
hse spark course
Repository of course materials for the HSE continuing professional education program (https://cs.hse.ru/dpo/) and Apache Spark courses
spark-scala
Spark with Scala example projects
spark-with-python-course
Contains source files used in the Spark with Python course
spark scala ml examples
Spark 2.0 Scala Machine Learning examples
dstlr
scalable knowledge graph construction from unstructured text
rdf2x
RDF2X converts big RDF datasets to the relational database model, CSV, JSON and ElasticSearch.
pytest-spark
pytest plugin for running tests with PySpark support
spark-utils
Basic framework utilities to quickly start writing production ready Apache Spark applications
sbt-spark
Simple SBT plugin to configure Spark applications
tekniq
A framework designed around Kotlin providing Restful HTTP Client, JDBC DSL, Loading Cache, Configurations, Validations, and more
telemetry-streaming
Spark Streaming ETL jobs for Mozilla Telemetry
sagemaker-sparkml-serving-container
This code is used to build & run a Docker container for performing predictions against a Spark ML Pipeline.
Vector-Tile-Spark-Process
🌏 Clip geographic data into MVT files based on Apache Spark
learning
Walkthrough notebooks for Deep Learning, Machine Learning, Reinforcement Learning, Spark, Statistics, Algorithms, Scala, Python
piglet
A compiler for Pig Latin to Spark and Flink.
cassandra.realtime
Different ways to process data into Cassandra in realtime with technologies such as Kafka, Spark, Akka, Flink
Spark-Scala-EKS
Spark Scala docker container sample for AWS testing - EKS & S3
xskipper
An Extensible Data Skipping Framework
cobrix
A COBOL parser and Mainframe/EBCDIC data source for Apache Spark
spark-hadoopoffice-ds
A Spark datasource for the HadoopOffice library
SparkTwitterAnalysis
An Apache Spark standalone application using the Spark API in Scala. The application uses Simple Build Tool (SBT) for building the project.
spark-fm
A parallel implementation of factorization machines based on Spark