Top 625 spark open source projects

Every Single Day I Tldr
A daily digest of the articles or videos I've found interesting, that I want to share with you.
Spark Fast Tests
Apache Spark testing helpers (dependency-free; works with ScalaTest, uTest, and MUnit)
Data Accelerator
Data Accelerator for Apache Spark simplifies onboarding to streaming of big data. It offers a rich, easy-to-use experience for creating, editing, and managing Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.
Hyperspace
An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
Neo4j Spark Connector
Neo4j Connector for Apache Spark, which provides bi-directional read/write access to Neo4j from Spark, using the Spark DataSource APIs
Dpark
Python clone of Spark, a MapReduce-like framework in Python
Recommendationsystem
Book recommender system using collaborative filtering based on Spark
Hadoop Docker
A Docker-based Hadoop development and test environment, including Hadoop, Hive, HBase, and Spark
Azure Event Hubs
☁️ Cloud-scale telemetry ingestion from any stream of data with Azure Event Hubs
Mastering Spark Sql Book
The Internals of Spark SQL
Installations mac ubuntu windows
Installations for Data Science. Anaconda, RStudio, Spark, TensorFlow, AWS (Amazon Web Services).
Mydatascienceportfolio
Applying Data Science and Machine Learning to Solve Real World Business Problems
Spark.fish
▁▂▄▆▇█▇▆▄▂▁
Spark Workshop
Apache Spark™ and Scala Workshops
Ruby Spark
Ruby wrapper for Apache Spark
Sagemaker Spark
A Spark library for Amazon SageMaker.
Spark Excel
A Spark plugin for reading Excel files via Apache POI
Gimel
Big Data Processing Framework - Unified Data API or SQL on Any Storage
Sparkrdma
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Example Spark
Spark, Spark Streaming and Spark SQL unit testing strategies
Spark Knn
k-Nearest Neighbors algorithm on Spark
✭ 205
scala, spark
Javaorbigdata Interview
A compilation of interview knowledge points for Java developers and big data developers
Spark Practice
Apache Spark (PySpark) Practice on Real Data
Ballista
Distributed compute platform implemented in Rust, and powered by Apache Arrow.
Scanns
A scalable nearest neighbor search library in Apache Spark
Js Spark
Realtime calculation distributed system. AKA distributed lodash
Azuredatabricksbestpractices
Version 1 of Technical Best Practices of Azure Databricks based on real world Customer and Technical SME inputs
Kotlin Spark Api
This project provides Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x
Spark Streaming With Kafka
Self-contained examples of Apache Spark streaming integrated with Apache Kafka.
Sparkstreaming
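💥 🚀 Wraps Spark Streaming to dynamically adjust the batch time (computation runs as soon as data arrives); 🚀 supports adding and removing topics at runtime; 🚀 wraps Spark Streaming 1.6 with Kafka 0.10 to support SSL.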
Xsql
Unified SQL Analytics Engine Based on SparkSQL
Spark Kafka Writer
Write your Spark data to Kafka seamlessly
Kraps Rpc
An RPC framework leveraging the Spark RPC module
Spark
Firely's open source FHIR server
✭ 174
docker, spark
Deeplearning4j
Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch, a modular and tiny C++ library for running math code, and a Java-based math library on top of the core C++ library. Also includes SameDiff: a PyTorch/TensorFlow-like library for running deep learni…
Transmogrifai
TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
Spark Structured Streaming Examples
Spark Structured Streaming / Kafka / Cassandra / Elastic
Spark Iforest
Isolation Forest on Spark
Geopyspark
GeoTrellis for PySpark
Azure Cosmosdb Spark
Apache Spark Connector for Azure Cosmos DB
Big Whale
Scheduling of offline tasks such as Spark and Flink jobs, and monitoring of real-time tasks
Whylogs Java
Profile and monitor your ML data pipeline end-to-end
Linkis
Linkis helps easily connect to various back-end computation/storage engines (Spark, Python, TiDB...), exposes various interfaces (REST, JDBC, Java...), with multi-tenancy, high performance, and resource control.
Vue Info Card
Simple and beautiful card component with an elegant spark line, for VueJS.
Glow
An open-source toolkit for large-scale genomic analysis
Scalable Data Science Platform
Content for architecting a data science platform for products using Luigi, Spark & Flask.
Handyspark
HandySpark - bringing pandas-like capabilities to Spark dataframes
Sparkmonitor
Monitor Apache Spark from Jupyter Notebook
Quill
Compile-time Language Integrated Queries for Scala
Spark.jl
Julia binding for Apache Spark