Data Accelerator: Data Accelerator for Apache Spark simplifies onboarding to streaming of big data. It offers a rich, easy-to-use experience for creating, editing and managing Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.
Pysparkling: A pure Python implementation of Apache Spark's RDD and DStream interfaces.
Quinn: PySpark methods to enhance developer productivity 📣 👯 🎉
Sparkrdma: RDMA-accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Analytics Zoo: Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray
Sparktorch: Train and run PyTorch models on Apache Spark.
Bigdata Playground: A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Whylogs Java: Profile and monitor your ML data pipeline end-to-end
Albedo: A recommender system for discovering GitHub repos, built with Apache Spark
Parquetviewer: Simple Windows desktop application for viewing & querying Apache Parquet files
Oryx: Oryx 2, a Lambda architecture on Apache Spark and Apache Kafka for real-time, large-scale machine learning
Hydrograph: A visual ETL development and debugging tool for big data
Scalable Data Science: Course sets in big data using Apache Spark over Databricks, and their mathematical, statistical and computational foundations using SageMath.
Spark: .NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Spark On K8s Operator: Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Splash: A flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange
Cuesheet: A framework for writing Spark 2.x applications in a pretty way
Mlflow: Open source platform for the machine learning lifecycle
Awesome Pulsar: A curated list of Pulsar tools, integrations and resources.
Awesome Spark: A curated list of awesome Apache Spark packages and resources.
Spark Nkp: Natural Korean Processor for Apache Spark
Spark Sklearn: (Deprecated) Scikit-learn integration package for Apache Spark
Spark Tda: SparkTDA is a package for Apache Spark providing topological data analysis functionality.
Dblink: Distributed Bayesian Entity Resolution in Apache Spark
Datahacksummit 2017: Apache Zeppelin notebooks for Recommendation Engines using Keras and Machine Learning on Apache Spark
Live log analyzer spark: Spark application that analyzes Apache access logs and detects anomalies, with an accompanying Medium article.
Mobius: C# and F# language binding and extensions to Apache Spark
Goodreads etl pipeline: An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Kafka Storm Starter: Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.
Dist Keras: Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
Flintrock: A command-line tool for launching Apache Spark clusters.
Openscoring: REST web service for true real-time scoring (<1 ms) of Scikit-Learn, R and Apache Spark models