All Categories → No Category → spark-sql

Top 26 spark-sql open source projects

spark learning
尚硅谷大数据Spark-2019版最新 Spark 学习
spark-data-sources
Developing Spark External Data Sources using the V2 API
recsys spark
Spark SQL 实现 ItemCF,UserCF,Swing,推荐系统,推荐算法,协同过滤
litemall-dw
基于开源Litemall电商项目的大数据项目,包含前端埋点(openresty+lua)、后端埋点;数据仓库(五层)、实时计算和用户画像。大数据平台采用CDH6.3.2(已使用vagrant+ansible脚本化),同时也包含了Azkaban的workflow。
Spark
Apache Spark is a fast, in-memory data processing engine with elegant and expressive development API's to allow data workers to efficiently execute streaming, machine learning or SQL workloads that require fast iterative access to datasets.This project will have sample programs for Spark in Scala language .
wow-spark
🔆 spark自学手册,包含了例如spark core、spark sql、spark streaming、spark-kafka、delta-lake,以及scala基础练习,还有一些例如master、shuffle源码分析,总结及翻译。
spark2-etl-examples
A project with examples of using few commonly used data manipulation/processing/transformation APIs in Apache Spark 2.0.0
datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
albis
Albis: High-Performance File Format for Big Data Systems
Tweet-Analysis-With-Kafka-and-Spark
A real time analytics dashboard to analyze the trending hashtags and @ mentions at any location using kafka and spark streaming.
1-26 of 26 spark-sql projects