hamilton
A scalable, general-purpose micro-framework for defining dataflows. You can use it to create dataframes, NumPy matrices, Python objects, ML models, etc.
Stars: ✭ 612 (+2960%)
Mutual labels: etl, data-engineering, etl-pipeline
AirflowDataPipeline
Example of an ETL pipeline using Airflow
Stars: ✭ 24 (+20%)
Mutual labels: airflow, etl, data-engineering
polygon-etl
ETL (extract, transform, and load) tools for ingesting Polygon blockchain data into Google BigQuery and Pub/Sub
Stars: ✭ 53 (+165%)
Mutual labels: airflow, etl, data-engineering
datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake, with SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.
Stars: ✭ 39 (+95%)
Mutual labels: etl, data-pipeline, etl-pipeline
jobAnalytics and search
The JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
Stars: ✭ 25 (+25%)
Mutual labels: airflow, data-engineering, data-pipeline
Around Dataengineering
A Data Engineering & Machine Learning Knowledge Hub
Stars: ✭ 257 (+1185%)
Mutual labels: airflow, data-engineering
Udacity Data Engineering Projects
A few projects related to data engineering, including data modeling, infrastructure setup on the cloud, data warehousing, and data lake development.
Stars: ✭ 458 (+2190%)
Mutual labels: airflow, data-engineering
Soda Sql
Metric collection, data testing, and monitoring for SQL-accessible data
Stars: ✭ 173 (+765%)
Mutual labels: airflow, data-engineering
Discreetly
ETLy is an add-on dashboard service on top of Apache Airflow.
Stars: ✭ 60 (+200%)
Mutual labels: airflow, etl
Goodreads etl pipeline
An end-to-end GoodReads data pipeline for building a data lake, data warehouse, and analytics platform.
Stars: ✭ 793 (+3865%)
Mutual labels: airflow, data-engineering
Dataspherestudio
DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (+5875%)
Mutual labels: airflow, etl
udacity-data-eng-proj2
A production-grade data pipeline designed to automate the parsing of user search patterns in order to analyze user engagement. It extracts data from S3, applies a series of transformations, and loads the results into S3 and Redshift.
Stars: ✭ 25 (+25%)
Mutual labels: airflow, etl-pipeline
Incubator Dolphinscheduler
Apache DolphinScheduler is a distributed and extensible workflow scheduler platform with powerful visual DAG interfaces, dedicated to solving complex job dependencies in the data pipeline and providing various types of jobs out of the box.
Stars: ✭ 6,916 (+34480%)
Mutual labels: airflow, schedule
astro
Astro allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
Stars: ✭ 79 (+295%)
Mutual labels: airflow, etl
vixtract
www.vixtract.ru
Stars: ✭ 40 (+100%)
Mutual labels: etl, etl-pipeline
Aws Ecs Airflow
Run Airflow in AWS ECS (Elastic Container Service) using Fargate tasks
Stars: ✭ 107 (+435%)
Mutual labels: airflow, etl
viewflow
Viewflow is an Airflow-based framework that allows data scientists to create data models without writing Airflow code.
Stars: ✭ 110 (+450%)
Mutual labels: airflow, data-engineering