Top 625 spark open source projects

Distributed Dataset
A distributed data processing framework in Haskell.
Hnswlib
Java library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs
Flink Learning
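Flink learning blog. http://www.54tianzhisheng.cn/ Covers Flink basics, concepts, principles, hands-on practice, performance tuning, and source code analysis. Includes learning examples for Flink Connector, Metrics, Library, DataStream API, and Table API & SQL, plus case studies of large-scale Flink production projects (PVUV, log storage, real-time deduplication of tens of billions of records, monitoring and alerting). Support for my column "Big Data Real-Time Compute Engine Flink: Practice and Performance Optimization" is welcome.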
Logigsk
A Linux-based software package to control LEDs on Logitech G910, G810, G610 and G410.
Spark On K8s Operator
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Sparktutorial
Source code for James Lee's Apache Spark with Java course
Splash
Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange
Spark Terasort
Spark Terasort
✭ 101
java, spark
Spark Ffm
FFM (Field-aware Factorization Machine) on Spark
✭ 101
scala, spark
Logisland
Scalable stream processing platform for advanced real-time analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of ready-to-use processors, data sources and sinks is available.
Schemer
Schema registry for CSV, TSV, JSON, AVRO and Parquet schema. Supports schema inference and GraphQL API.
Relation extraction
Relation extraction using deep learning (CNN)
Spark Py Notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Repository
Personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.
Spark Summit 2017 Sanfrancisco
Spark Summit 2017 San Francisco
✭ 93
spark
Big Data
🔧 Use dplyr to analyze Big Data 🐘
Spark On Kubernetes Helm
Spark on Kubernetes infrastructure Helm charts repo
Ammonite Spark
Run spark calculations from Ammonite
✭ 88
scala, spark
Spark python ml examples
Spark 2.0 Python Machine Learning examples
Laravel Spark Google2fa
Google Authenticator support for Laravel Spark
Cuesheet
A framework for writing Spark 2.x applications in a pretty way
Flint
Webex Bot SDK for Node.js (deprecated in favor of https://github.com/webex/webex-bot-node-framework)
Hops Examples
Examples for Deep Learning/Feature Store/Spark/Flink/Hive/Kafka jobs and Jupyter notebooks on Hops
Spark States
Custom state store providers for Apache Spark
Hadoop cookbook
Cookbook to install Hadoop 2.0+ using Chef
Spark Dependencies
Spark job for dependency links
✭ 82
java, spark
Mleap
MLeap: Deploy ML Pipelines to Production
Lehar
Visualize data using relative ordering
Spark Gbtlr
Hybrid model of Gradient Boosting Trees and Logistic Regression (GBDT+LR) on Spark
Docker Spark
🚢 Docker image for Apache Spark
Home
ApacheCN open source organization: announcements, introduction, members, events, and contact channels
Cleanframes
type-class based data cleansing library for Apache Spark SQL
Ds Cheatsheets
List of Data Science Cheatsheets to rule the world
Dataspherestudio
DataSphereStudio is a one-stop data application development and management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Apache Spark Hands On
Educational notes and hands-on problems with solutions for the Hadoop ecosystem
Lpa Detector
Optimize and improve the Label propagation algorithm
✭ 75
java, spark
Spark Twitter Stream Example
"Sentiment analysis" on a live Twitter feed with Apache Spark and Apache Bahir
Labs
Research on distributed systems
Kamu Cli
Next generation tool for decentralized exchange and transformation of semi-structured data
Big Data Engineering Coursera Yandex
Big Data for Data Engineers Coursera Specialization from Yandex
Usersessionbehaviorofflineanalysis
Sichuan University 拓思爱诺 project for offline analysis of user session behavior data
✭ 69
scala, spark
Fast Mrmr
An improved implementation of the classical feature selection method: minimum Redundancy and Maximum Relevance (mRMR).
✭ 67
spark, fast
Kontextfrei
Writing application logic for Spark jobs that can be unit-tested without a SparkContext
✭ 67
scala, spark
Rsparkling
RSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)
Spark Bigquery
Google BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration.
W2v
Word2Vec models with Twitter data using Spark. Blog:
Pyspark Twitter Stream Mining
Real-time Machine Learning with Apache Spark on Twitter Public Stream
Pysparkgeoanalysis
🌐 Interactive Workshop on GeoAnalysis using PySpark
Spark Doc Zh
Chinese edition of the official Apache Spark documentation
Roffildlibrary
Library for MQL5 (MetaTrader) with Python, Java, Apache Spark, AWS