
Top 16 etl-pipeline open source projects

Production-ready data integration product.

✭ 1,856

java scala shell spark hadoop flink spark-streaming etl-framework sql-engine etl-pipeline

SEO dashboard built from Google Search Console data, using the Google Search API, a MySQL database, a Node.js REST API (Express.js), and a React dashboard.

✭ 39

javascript python HTML CSS react mysql dashboard rest-api seo expressjs seotools node-js seo-monitor google-search-console etl-pipeline etl-kpi google-search-console-python

tweetsOLAPing

Implements an end-to-end ETL and analysis pipeline for tweets.

✭ 24

python tweets analysis twitter-api multithreading api-client datawarehousing datawarehouse web-crawling ssis google-api-client etl-pipeline tweets-classification cube-analysis powerbi-report ssas-multidimensional multi-dimensional-analysis tweets-scraper

udacity-data-eng-proj2

A production-grade data pipeline that automates the parsing of user search patterns to analyze user engagement. It extracts data from S3, applies a series of transformations, and loads the results into S3 and Redshift.

✭ 25

Jupyter Notebook python docker airflow redshift s3fs etl-pipeline

AzureDataFactoryHOL

Azure Data Factory Hands-On Lab: a comprehensive, step-by-step tutorial on Azure Data Factory and Mapping Data Flows.

✭ 43

azure azure-data-factory hands-on-lab azure-key-vault etl-pipeline adf-pipeline filter-activity lookup-activity foreach-activity metadata-activity mapping-dataflows hands-on-azure-data-factory azure-data-factory-tutorial azure-modern-data-warehous web-activity foreach-loop-activity

kafka-connect-datagen

A Kafka Connect source connector that generates data for tests

✭ 27

java kafka etl kafka-connect data-generator performance-test integration-test etl-pipeline

redis-connect-dist

Real-Time Event Streaming & Change Data Capture

✭ 21

redis csv replication etl event-sourcing connect cdc etl-framework etl-pipeline event-streaming etl-automation rediscdc redisconnect

DaFlow

Apache Spark-based data flow (ETL) framework that supports multiple read sources and write destinations of different types, as well as multiple categories of transformation rules.

✭ 24

scala shell Dockerfile json csv apache-spark hive hadoop avro etl parquet transformation-rules etl-framework etl-pipeline join-data

etlflow

EtlFlow is an ecosystem of functional libraries in Scala, based on ZIO, for writing various tasks and jobs on GCP and AWS.

✭ 38

scala bigquery aws spark etl gcp zio etl-framework etl-pipeline

hamilton

A scalable, general-purpose micro-framework for defining dataflows. You can use it to create dataframes, NumPy matrices, Python objects, ML models, etc.

✭ 612

python shell data-science machine-learning etl numpy pandas data-engineering data-platform software-engineering feature-engineering dataframe dag hamiltonian etl-framework hamilton featurization etl-pipeline stitch-fix

csvplus

csvplus extends the standard Go encoding/csv package with a fluent interface, lazy stream operations, indices, and joins.

✭ 67

go Makefile csv etl stream-processing fluent-interface csv-format go-csv etl-framework etl-pipeline

datalake-etl-pipeline

Simplified ETL process in Hadoop using Apache Spark. Provides a complete ETL pipeline for a data lake, including SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.

✭ 39

python big-data spark apache-spark hadoop etl xml xml-parsing pyspark data-pipeline datalake hadoop-mapreduce spark-sql etl-framework hadoop-hdfs etl-pipeline etl-components

seatunnel-example

Examples of developing SeaTunnel plugins.

✭ 27

scala java spark spark-streaming flink sql-engine etl-framework waterdrop etl-pipeline

DIRECT

DIRECT, the Data Integration Run-time Execution Control Tool, is a data logistics framework that can be used to monitor, log, audit and control data integration / ETL processes.

✭ 20

TSQL C# etl datawarehouse etl-framework etl-pipeline etl-automation datawarehouseautomation

AirflowETL

Blog post on ETL pipelines with Airflow

✭ 20

Jupyter Notebook airflow sql database schedule etl postgresql data-engineering data-pipeline etl-pipeline

vixtract

www.vixtract.ru

✭ 40

HTML python Jupyter Notebook shell javascript Dockerfile etl etl-framework etl-pipeline etl-components etl-job etl-automation
