Koalas: pandas API on Apache Spark
Every Single Day I Tldr: A daily digest of the articles and videos I've found interesting and want to share with you.
Spark Fast Tests: Apache Spark testing helpers (dependency-free; works with ScalaTest, uTest, and MUnit)
Data Accelerator: Data Accelerator for Apache Spark simplifies onboarding to streaming of big data. It offers a rich, easy-to-use experience for creating, editing, and managing Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.
Hyperspace: An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
Neo4j Spark Connector: Neo4j Connector for Apache Spark, which provides bi-directional read/write access to Neo4j from Spark, using the Spark DataSource APIs
Dpark: Python clone of Spark, a MapReduce-like framework in Python
Azure Event Hubs: ☁️ Cloud-scale telemetry ingestion from any stream of data with Azure Event Hubs
Spark Excel: A Spark plugin for reading Excel files via Apache POI
Gimel: Big Data Processing Framework - Unified Data API or SQL on Any Storage
Sparkrdma: RDMA-accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Example Spark: Spark, Spark Streaming and Spark SQL unit testing strategies
Spark Knn: k-Nearest Neighbors algorithm on Spark
Mmlspark: Simple and Distributed Machine Learning
Ballista: Distributed compute platform implemented in Rust, and powered by Apache Arrow.
Scanns: A scalable nearest neighbor search library in Apache Spark
Js Spark: Real-time distributed calculation system, a.k.a. distributed lodash
Kotlin Spark Api: This project provides Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x
Sparkstreaming: 💥 🚀 Wraps Spark Streaming to adjust the batch time dynamically (computation runs as soon as data arrives); 🚀 supports adding and removing topics at runtime; 🚀 wraps Spark Streaming 1.6 with Kafka 0.10 to support SSL.
Xsql: Unified SQL Analytics Engine Based on SparkSQL
Kraps Rpc: An RPC framework leveraging the Spark RPC module
Spark: Firely's open source FHIR server
Spark Nlp: State of the Art Natural Language Processing
Deeplearning4j: Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch, a modular and tiny C++ library for running math code, and a Java-based math library on top of the core C++ library. Also includes SameDiff: a PyTorch/TensorFlow-like library for running deep learni…
Transmogrifai: TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
Whylogs Java: Profile and monitor your ML data pipeline end-to-end
Linkis: Linkis helps easily connect to various back-end computation/storage engines (Spark, Python, TiDB...), exposes various interfaces (REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Vue Info Card: Simple and beautiful card component with an elegant spark line, for VueJS.
Glow: An open-source toolkit for large-scale genomic analysis
Handyspark: HandySpark - bringing pandas-like capabilities to Spark dataframes
Geni: A Clojure dataframe library that runs on Spark
Quill: Compile-time Language Integrated Queries for Scala