Top 595 spark open source projects

Every Single Day I Tldr
A daily digest of the articles or videos I've found interesting, that I want to share with you.
Spark Fast Tests
Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit)
Data Accelerator
Data Accelerator for Apache Spark simplifies onboarding to streaming of big data. It offers a rich, easy-to-use experience to help with the creation, editing, and management of Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.
An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
Neo4j Spark Connector
Neo4j Connector for Apache Spark, which provides bi-directional read/write access to Neo4j from Spark, using the Spark DataSource APIs
Python clone of Spark, a MapReduce-like framework in Python
Book recommender system using collaborative filtering based on Spark
Hadoop Docker
Azure Event Hubs
☁️ Cloud-scale telemetry ingestion from any stream of data with Azure Event Hubs
Mastering Spark Sql Book
The Internals of Spark SQL
Installations mac ubuntu windows
Installations for Data Science. Anaconda, RStudio, Spark, TensorFlow, AWS (Amazon Web Services).
Applying Data Science and Machine Learning to Solve Real World Business Problems
Spark Workshop
Apache Spark™ and Scala Workshops
Ruby Spark
Ruby wrapper for Apache Spark
Sagemaker Spark
A Spark library for Amazon SageMaker.
Spark Excel
A Spark plugin for reading Excel files via Apache POI
Big Data Processing Framework - Unified Data API or SQL on Any Storage
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Example Spark
Spark, Spark Streaming and Spark SQL unit testing strategies
Spark Knn
k-Nearest Neighbors algorithm on Spark
Javaorbigdata Interview
Spark Practice
Apache Spark (PySpark) Practice on Real Data
Distributed compute platform implemented in Rust, and powered by Apache Arrow.
A scalable nearest neighbor search library in Apache Spark
Js Spark
Distributed real-time computation system, a.k.a. distributed lodash
Version 1 of Technical Best Practices of Azure Databricks based on real world Customer and Technical SME inputs
Kotlin Spark Api
This project provides Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x
Spark Streaming With Kafka
Self-contained examples of Apache Spark streaming integrated with Apache Kafka.
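💥 🚀 Wraps Spark Streaming to adjust the batch time dynamically (computation runs as soon as data arrives); 🚀 supports adding and removing topics at runtime; 🚀 wraps Spark Streaming 1.6 with Kafka 0.10 to support SSL.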
Unified SQL Analytics Engine Based on SparkSQL
Spark Kafka Writer
Write your Spark data to Kafka seamlessly
Kraps Rpc
An RPC framework leveraging Spark RPC module
Firely's open source FHIR server
Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch, a modular and tiny C++ library for running math code, and a Java-based math library on top of the core C++ library. Also includes SameDiff: a PyTorch/TensorFlow-like library for running deep learni…
TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
Spark Structured Streaming Examples
Spark Structured Streaming / Kafka / Cassandra / Elastic
Spark Iforest
Isolation Forest on Spark
GeoTrellis for PySpark
Azure Cosmosdb Spark
Apache Spark Connector for Azure Cosmos DB
Big Whale
Whylogs Java
Profile and monitor your ML data pipeline end-to-end
Linkis helps easily connect to various back-end computation/storage engines (Spark, Python, TiDB...), exposes various interfaces (REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Vue Info Card
Simple and beautiful card component with an elegant spark line, for VueJS.
An open-source toolkit for large-scale genomic analysis
Scalable Data Science Platform
Content for architecting a data science platform for products using Luigi, Spark & Flask.
HandySpark - bringing pandas-like capabilities to Spark dataframes
Monitor Apache Spark from Jupyter Notebook
Compile-time Language Integrated Queries for Scala
Julia binding for Apache Spark