Coolplayspark
Cool Play Spark (酷玩 Spark): Spark source code analysis, Spark libraries, and more
Stars: ✭ 3,318 (+13172%)
Mutual labels: apache-spark, structured-streaming
fink-broker
Astronomy Broker based on Apache Spark
Stars: ✭ 18 (-28%)
Mutual labels: apache-spark, structured-streaming
Sparkora
Powerful, rapid, automatic EDA and feature engineering library with a very easy-to-use API 🌟
Stars: ✭ 51 (+104%)
Mutual labels: apache-spark
BigCLAM-ApacheSpark
Overlapping community detection in large-scale networks using the BigCLAM model, built on Apache Spark
Stars: ✭ 40 (+60%)
Mutual labels: apache-spark
Real-time-Data-Warehouse
Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi
Stars: ✭ 52 (+108%)
Mutual labels: deltalake
spark
Apache Spark enhanced with a native Kubernetes scheduler back-end. NOTE: this repository is being ARCHIVED, as all new development for the Kubernetes scheduler back-end is now on https://github.com/apache/spark/
Stars: ✭ 609 (+2336%)
Mutual labels: apache-spark
gan deeplearning4j
Automatic feature engineering using Generative Adversarial Networks with Deeplearning4j and Apache Spark.
Stars: ✭ 19 (-24%)
Mutual labels: apache-spark
telemetry-streaming
Spark Streaming ETL jobs for Mozilla Telemetry
Stars: ✭ 16 (-36%)
Mutual labels: structured-streaming
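wow-spark
🔆 Spark self-study handbook covering Spark Core, Spark SQL, Spark Streaming, spark-kafka, and delta-lake, along with basic Scala exercises and source code analyses (such as master and shuffle), summaries, and translations.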
Stars: ✭ 20 (-20%)
Mutual labels: structured-streaming
hyperdrive
Extensible streaming ingestion pipeline on top of Apache Spark
Stars: ✭ 31 (+24%)
Mutual labels: apache-spark
spark-records
Bulletproof Apache Spark jobs with fast root cause analysis of failures.
Stars: ✭ 67 (+168%)
Mutual labels: apache-spark
DaFlow
Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.
Stars: ✭ 24 (-4%)
Mutual labels: apache-spark
net.jgp.books.spark.ch07
Spark in Action, 2nd edition - chapter 7 - Ingestion from files
Stars: ✭ 13 (-48%)
Mutual labels: apache-spark
SynapseML
Simple and Distributed Machine Learning
Stars: ✭ 3,355 (+13320%)
Mutual labels: apache-spark
datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Provides a complete ETL pipeline for a data lake, including SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.
Stars: ✭ 39 (+56%)
Mutual labels: apache-spark
parquet-dotnet
🐬 Apache Parquet for modern .NET
Stars: ✭ 199 (+696%)
Mutual labels: apache-spark
jupyterlab-sparkmonitor
JupyterLab extension for monitoring Apache Spark jobs launched from within a notebook
Stars: ✭ 78 (+212%)
Mutual labels: apache-spark
cloud-integration
Spark cloud integration: tests, cloud committers and more
Stars: ✭ 20 (-20%)
Mutual labels: apache-spark
kafka-delta-ingest
A highly efficient daemon for streaming data from Kafka into Delta Lake
Stars: ✭ 139 (+456%)
Mutual labels: deltalake