ananthdurai / Airflow Training

Licence: other
Airflow training for the crunch conf

Introduction to data pipeline management with Airflow

As the modern data warehouse increases in complexity, it becomes necessary to have a dependable, scalable, intuitive, and simple scheduling and management program to monitor the flow of data and watch how transformations are completed.

Apache Airflow, which helps companies manage the complexities of their enterprise data warehouse, is being adopted by tech companies everywhere for its ease of management, scalability, and elegant design. Airflow is rapidly becoming the go-to technology for companies scaling out large data warehouses.

The "Introduction to data pipeline management with Airflow" training course is designed to familiarize participants with using Airflow to schedule and maintain numerous ETL processes running on a large-scale enterprise data warehouse.

Table of contents:

  1. Introduction to Airflow
  2. Introduction to Airflow core concepts (DAGs, tasks, operators, sensors)
  3. Airflow UI
  4. Airflow Scheduler
  5. Airflow Operators & Sensors
  6. Advanced Airflow Concepts (Hooks, Connections, Variables, Templates, Macros, XCom)

  7. SLA, Monitoring & Alerting
  8. Code examples
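
Airflow models a workflow as a directed acyclic graph (DAG) of tasks, where each task runs only after its upstream tasks have finished. The sketch below illustrates that one idea with Python's standard library only; it deliberately does not use the Airflow API, and the task names (`extract`, `clean`, `enrich`, `load`) and the helper `run_order` are hypothetical names chosen for illustration. It assumes Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical ETL tasks. Each key is a task; its value is the set of
# upstream tasks that must finish before it may start.
dependencies = {
    "extract": set(),
    "clean": {"extract"},
    "enrich": {"extract"},
    "load": {"clean", "enrich"},
}


def run_order(deps):
    """Return one valid execution order that respects every dependency.

    TopologicalSorter raises graphlib.CycleError if the graph has a
    cycle, which is why a workflow graph must be acyclic.
    """
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)
```

Here `clean` and `enrich` do not depend on each other, so either may come first in the printed order; a real scheduler such as Airflow's could run them in parallel, and it also handles scheduling intervals, retries, and state tracking, which this sketch leaves out.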

Prerequisites

Participants should have a technology background and basic programming skills in Python, and should be open to sharing their thoughts and questions.

Participants need to bring their laptops. The examples were tested on Mac and Ubuntu machines. Participants can also use a hosted Airflow solution such as Google Cloud Composer or Astronomer.

Installation

  1. Install sqlite3.

  2. Run ./airflow scheduler to start the Airflow scheduler. The installation script will install all the dependencies.

  3. In another terminal, run ./airflow webserver.

  4. In your browser, visit http://localhost:8080 to access the Airflow UI.
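
After the steps above, it can be handy to confirm from a script that something is answering on the web server port before opening the browser. The following is a minimal sketch using only the Python standard library; the function name `airflow_ui_is_up` is hypothetical, and the check only shows that an HTTP server responds at the URL, not that the server is Airflow.

```python
import urllib.error
import urllib.request


def airflow_ui_is_up(url="http://localhost:8080", timeout=3.0):
    """Return True if an HTTP server answers at `url`, False otherwise."""
    # Bypass any configured HTTP proxy, since the target is a local address.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status (such as 401 or 500),
        # so something is listening and speaking HTTP on the port.
        return True
    except OSError:
        # Covers URLError (for example connection refused) and timeouts.
        return False
```

Calling `airflow_ui_is_up()` with no arguments checks the default address from step 4; a result of False usually means the web server from step 3 is not running yet.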

Contributing

Interested in contributing? Improving documentation? Adding more examples? Check out Contributing.md.

License

As stated in the License file, all lecture slides are provided under Creative Commons BY-NC 4.0. The exercise code is released under an MIT license.
