
mdh266 / AirflowETL

Licence: MIT license
Blog post on ETL pipelines with Airflow

Programming Languages

Jupyter Notebook

Projects that are alternatives to, or similar to, AirflowETL

hamilton
A scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, numpy matrices, python objects, ML models, etc.
Stars: ✭ 612 (+2960%)
Mutual labels:  etl, data-engineering, etl-pipeline
AirflowDataPipeline
Example of an ETL Pipeline using Airflow
Stars: ✭ 24 (+20%)
Mutual labels:  airflow, etl, data-engineering
polygon-etl
ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub
Stars: ✭ 53 (+165%)
Mutual labels:  airflow, etl, data-engineering
datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (+95%)
Mutual labels:  etl, data-pipeline, etl-pipeline
jobAnalytics and search
JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
Stars: ✭ 25 (+25%)
Mutual labels:  airflow, data-engineering, data-pipeline
Around Dataengineering
A Data Engineering & Machine Learning Knowledge Hub
Stars: ✭ 257 (+1185%)
Mutual labels:  airflow, data-engineering
Udacity Data Engineering Projects
A few projects related to Data Engineering including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake development.
Stars: ✭ 458 (+2190%)
Mutual labels:  airflow, data-engineering
Soda Sql
Metric collection, data testing and monitoring for SQL accessible data
Stars: ✭ 173 (+765%)
Mutual labels:  airflow, data-engineering
Discreetly
ETLy is an add-on dashboard service on top of Apache Airflow.
Stars: ✭ 60 (+200%)
Mutual labels:  airflow, etl
Data-Engineering-Projects
Personal Data Engineering Projects
Stars: ✭ 167 (+735%)
Mutual labels:  airflow, data-engineering
Goodreads etl pipeline
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Stars: ✭ 793 (+3865%)
Mutual labels:  airflow, data-engineering
Dataspherestudio
DataSphereStudio is a one-stop data application development and management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (+5875%)
Mutual labels:  airflow, etl
udacity-data-eng-proj2
A production-grade data pipeline has been designed to automate the parsing of user search patterns to analyze user engagement. Extract data from S3, apply a series of transformations and load into S3 and Redshift.
Stars: ✭ 25 (+25%)
Mutual labels:  airflow, etl-pipeline
Incubator Dolphinscheduler
Apache DolphinScheduler is a distributed and extensible workflow scheduler platform with powerful DAG visual interfaces, dedicated to solving complex job dependencies in the data pipeline and providing various types of jobs available out of the box.
Stars: ✭ 6,916 (+34480%)
Mutual labels:  airflow, schedule
astro
Astro allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
Stars: ✭ 79 (+295%)
Mutual labels:  airflow, etl
Phila Airflow
Stars: ✭ 16 (-20%)
Mutual labels:  airflow, etl
vixtract
www.vixtract.ru
Stars: ✭ 40 (+100%)
Mutual labels:  etl, etl-pipeline
Udacity Data Engineering
Udacity Data Engineering Nano Degree (DEND)
Stars: ✭ 89 (+345%)
Mutual labels:  airflow, etl
Aws Ecs Airflow
Run Airflow in AWS ECS (Elastic Container Service) using Fargate tasks
Stars: ✭ 107 (+435%)
Mutual labels:  airflow, etl
viewflow
Viewflow is an Airflow-based framework that allows data scientists to create data models without writing Airflow code.
Stars: ✭ 110 (+450%)
Mutual labels:  airflow, data-engineering

An Example ETL Pipeline With Airflow

In this blog post I want to go over the data engineering operations known as Extract, Transform, Load (ETL) and show how they can be automated and scheduled using Apache Airflow. You can see the source code for this project here.
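The three ETL stages map naturally onto an Airflow DAG with one task per stage. Below is a minimal sketch, not the project's actual DAG: it assumes Python 3 and Airflow 2.4 or later (the project itself lists Python 2.7, which pairs with older Airflow 1.x import paths), and the `dag_id`, task ids, function names, and placeholder task bodies are all hypothetical. Each task's return value is stored in XCom, which is how the downstream tasks receive data here.

```python
from datetime import datetime


def extract_task():
    # Hypothetical placeholder for the API call. Airflow pushes a
    # PythonOperator callable's return value to XCom automatically.
    return {"name": "New York", "main": {"temp": 293.15}}


def transform_task(ti):
    # Airflow 2 passes the TaskInstance as `ti` when the callable asks for it.
    raw = ti.xcom_pull(task_ids="extract")
    return {"city": raw["name"], "temp_c": round(raw["main"]["temp"] - 273.15, 2)}


def load_task(ti):
    row = ti.xcom_pull(task_ids="transform")
    # Hypothetical placeholder for the database insert.
    print("loading", row)


def build_dag():
    # Airflow is imported here only so the task functions above can be
    # exercised without Airflow installed.
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    with DAG(
        dag_id="weather_etl",  # hypothetical name
        start_date=datetime(2024, 1, 1),
        schedule="@daily",  # run once a day
        catchup=False,  # do not backfill runs for past days
    ) as dag:
        extract = PythonOperator(task_id="extract", python_callable=extract_task)
        transform = PythonOperator(task_id="transform", python_callable=transform_task)
        load = PythonOperator(task_id="load", python_callable=load_task)
        extract >> transform >> load  # run order: E, then T, then L
    return dag
```

For the scheduler to discover it, a file in Airflow's DAGs folder would call `dag = build_dag()` at module level. XCom is intended for small values; larger payloads are usually exchanged as a file path or table name instead.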

Extracting data can be done in a multitude of ways, but one of the most common is to query a web API. If the query is successful, the API's server sends data back to us, often in the form of JSON. JSON can be thought of as semi-structured data, or as a dictionary whose keys are strings and whose values are strings, numbers, lists, or further nested dictionaries. This raw data is rarely in the shape or units we want to store, so we must transform it before loading it into a database.

Airflow is a platform to schedule and monitor workflows, and in this post I will show you how to use it to extract the daily weather in New York from the OpenWeatherMap API, convert the temperature from Kelvin (the API's default unit) to Celsius, and load the data into a simple PostgreSQL database.
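The extract and transform steps described above might look like the following sketch. It is an illustration under assumptions, not the project's code: it uses only the Python 3 standard library (`urllib`, `json`), assumes the OpenWeatherMap current-weather endpoint with `q` and `appid` query parameters, and runs `transform` on a hand-written sample payload instead of a live response. The helper names `extract`, `kelvin_to_celsius`, and `transform` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Current-weather endpoint; calling it requires your own API key.
API_URL = "https://api.openweathermap.org/data/2.5/weather"


def extract(api_key, city="New York", timeout=10):
    """Query the API and return the raw JSON text (not called below)."""
    query = urlencode({"q": city, "appid": api_key})
    with urlopen(f"{API_URL}?{query}", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def kelvin_to_celsius(kelvin):
    return kelvin - 273.15


def transform(raw_json):
    """Parse the JSON text and keep a flat record with the temperature in Celsius."""
    payload = json.loads(raw_json)  # JSON numbers become int/float here
    return {
        "city": payload["name"],
        "timestamp": payload["dt"],  # observation time, Unix seconds
        "description": payload["weather"][0]["description"],
        "temp_c": round(kelvin_to_celsius(payload["main"]["temp"]), 2),
    }


# Hand-written sample in the shape of an API response (not real data).
sample = (
    '{"name": "New York", "dt": 1700000000, '
    '"weather": [{"description": "clear sky"}], "main": {"temp": 283.15}}'
)
record = transform(sample)
print(record)
```

Because `json.loads` already turns JSON numbers into Python `int` and `float`, the transform here is mostly about picking the fields worth storing and changing units, not parsing strings.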

Requirements

Airflow

Python 2.7

PostgreSQL

psycopg2

SQLAlchemy

SQLAlchemy-Utils

To install the requirements (everything except Python and PostgreSQL), type:

pip install -r requirements.t
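Once the database and Python packages are in place, the load step can be as small as one idempotent INSERT. The sketch below swaps in Python's built-in `sqlite3` with an in-memory database so it runs without a PostgreSQL server; the `weather` table, its columns, and the `load` helper are hypothetical. With psycopg2 against PostgreSQL the same SQL applies apart from using `%s` instead of `?` placeholders. `ON CONFLICT ... DO NOTHING` requires SQLite 3.24+ or PostgreSQL 9.5+.

```python
import sqlite3

# Hypothetical schema: one row per (city, observation time).
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS weather (
    city        TEXT    NOT NULL,
    observed_at INTEGER NOT NULL,
    temp_c      REAL    NOT NULL,
    PRIMARY KEY (city, observed_at)
)
"""

# Skipping duplicates makes a re-run of the same Airflow task harmless.
INSERT_SQL = (
    "INSERT INTO weather (city, observed_at, temp_c) VALUES (?, ?, ?) "
    "ON CONFLICT (city, observed_at) DO NOTHING"
)


def load(conn, record):
    """Insert one transformed record; a duplicate (city, observed_at) is ignored."""
    with conn:  # commits on success, rolls back on error
        conn.execute(CREATE_SQL)
        conn.execute(INSERT_SQL, (record["city"], record["timestamp"], record["temp_c"]))


conn = sqlite3.connect(":memory:")
record = {"city": "New York", "timestamp": 1700000000, "temp_c": 10.0}
load(conn, record)
load(conn, record)  # second run inserts nothing
print(conn.execute("SELECT city, observed_at, temp_c FROM weather").fetchall())
```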

You can see the actual blog post here.
