DaFlow: Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.
Stars: ✭ 24 (-38.46%)
Mutual labels: apache-spark, hadoop, etl, etl-framework, etl-pipeline
Hydrograph: A visual ETL development and debugging tool for big data
Stars: ✭ 144 (+269.23%)
Mutual labels: big-data, apache-spark, etl, etl-framework
SparkProgrammingInScala: Apache Spark Course Material
Stars: ✭ 57 (+46.15%)
Mutual labels: big-data, apache-spark, datalake, spark-sql
aut: The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (+184.62%)
Mutual labels: big-data, apache-spark, hadoop, pyspark
Spark With Python: Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (+284.62%)
Mutual labels: big-data, apache-spark, hadoop, pyspark
big data: A collection of tutorials on Hadoop, MapReduce, Spark, Docker
Stars: ✭ 34 (-12.82%)
Mutual labels: big-data, hadoop, pyspark, spark-sql
vixtract: www.vixtract.ru
Stars: ✭ 40 (+2.56%)
Mutual labels: etl, etl-framework, etl-pipeline, etl-components
DIRECT: DIRECT, the Data Integration Run-time Execution Control Tool, is a data logistics framework that can be used to monitor, log, audit and control data integration / ETL processes.
Stars: ✭ 20 (-48.72%)
Mutual labels: etl, etl-framework, etl-pipeline
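leaflet heatmap: A simple visualization of call data from Huzhou. Assuming the data volume is too large to draw a heatmap directly in the browser, the heatmap rendering step is moved offline for computation and analysis. After the data is computed in parallel with Apache Spark, Apache Spark is also used to draw the heatmap, and leafletjs then loads the OpenStreetMap layer and the heatmap layer to achieve a good interactive experience. The rendering is currently implemented with Apache Spark; perhaps Apache Spark is not well suited to this kind of computation, or I did not design the algorithm well, because the parallel computation is slower than single-machine computation. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .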
Stars: ✭ 13 (-66.67%)
Mutual labels: big-data, apache-spark, hadoop
mmtf-workshop-2018: Structural Bioinformatics Training Workshop & Hackathon 2018
Stars: ✭ 50 (+28.21%)
Mutual labels: big-data, apache-spark, pyspark
Metorikku: A simplified, lightweight ETL Framework based on Apache Spark
Stars: ✭ 361 (+825.64%)
Mutual labels: big-data, etl, etl-framework
Mmlspark: Simple and Distributed Machine Learning
Stars: ✭ 2,899 (+7333.33%)
Mutual labels: big-data, apache-spark, pyspark
spark-twitter-sentiment-analysis: Sentiment Analysis of a Twitter Topic with Spark Structured Streaming
Stars: ✭ 55 (+41.03%)
Mutual labels: apache-spark, pyspark, spark-sql
AirflowETL: Blog post on ETL pipelines with Airflow
Stars: ✭ 20 (-48.72%)
Mutual labels: etl, data-pipeline, etl-pipeline
pyspark-cheatsheet: PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
Stars: ✭ 115 (+194.87%)
Mutual labels: big-data, apache-spark, pyspark
Eel Sdk: Big Data Toolkit for the JVM
Stars: ✭ 140 (+258.97%)
Mutual labels: big-data, hadoop, etl
Trino: Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Stars: ✭ 4,581 (+11646.15%)
Mutual labels: big-data, hadoop, datalake
Movies-Analytics-in-Spark-and-Scala: Data cleaning, pre-processing, and analytics on a million movies using Spark and Scala.
Stars: ✭ 47 (+20.51%)
Mutual labels: big-data, hadoop, spark-sql
SynapseML: Simple and Distributed Machine Learning
Stars: ✭ 3,355 (+8502.56%)
Mutual labels: big-data, apache-spark, pyspark
Griffon Vm: Griffon Data Science Virtual Machine
Stars: ✭ 128 (+228.21%)
Mutual labels: big-data, apache-spark, hadoop