Spark: .NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Redash: Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
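litemall-dw: A big data project based on the open-source Litemall e-commerce project, covering front-end event tracking (openresty+lua) and back-end event tracking, a five-layer data warehouse, real-time computation, and user profiling. The big data platform runs on CDH6.3.2 (scripted with vagrant+ansible) and also includes Azkaban workflows.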
big data: A collection of tutorials on Hadoop, MapReduce, Spark, and Docker
Spark: Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs that let data workers efficiently execute streaming, machine learning, or SQL workloads requiring fast iterative access to datasets. This project contains sample Spark programs written in Scala.
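wow-spark: 🔆 A Spark self-study handbook covering spark core, spark sql, spark streaming, spark-kafka, and delta-lake, along with basic Scala exercises and source-code analyses, summaries, and translations on topics such as master and shuffle.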
spark2-etl-examples: A project with examples of using a few commonly used data manipulation/processing/transformation APIs in Apache Spark 2.0.0
spark-vcf: Spark VCF data source implementation for DataFrames
datalake-etl-pipeline: Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake, with SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
albis: High-Performance File Format for Big Data Systems