
Airflow Boilerplate

A complete development environment setup for working with Airflow, based on this Medium article. If you are interested in the thinking and process behind this setup, do read the article first. Otherwise, if you want to get hands-on immediately, skip it and follow the instructions below to get started.

The overall setup diagram

This boilerplate includes more than was covered in the article. In particular, it adds the following (a sketch of a sample DAG follows this list):

  • A sample DAG
  • A sample plugin
  • A sample test for the plugin
  • A sample helper method, dags/common/stringcase.py, accessible in both dags/ and plugins/
  • A sample test for the helper method
  • A spark-conf/ directory that is included in the Docker build step; you can explore this on your own
  • A .pre-commit-config.yaml
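To give a feel for the layout, here is a minimal sketch of what a DAG in dags/ could look like. This is an illustrative assumption, not the repo's actual sample DAG: the dag_id, task_id, and snake_case function are made-up names, with snake_case standing in for whatever dags/common/stringcase.py exposes.

# dags/sample_dag.py -- hypothetical illustration, not the bundled sample DAG
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator  # Airflow 1.10.x import path

# dags/common/ is importable from both dags/ and plugins/ in this setup
from common.stringcase import snake_case  # assumed helper


def print_snake_case(**context):
    print(snake_case("HelloAirflow"))


with DAG(
    dag_id="sample_dag",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(
        task_id="print_snake_case",
        python_callable=print_snake_case,
        provide_context=True,  # needed for **context kwargs on Airflow 1.10.x
    )

Dropped into dags/, a file like this shows up in the webserver alongside the bundled sample.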

Getting Started

Install docker and docker-compose; see the official Docker documentation for installation instructions for your platform.

Clone this repo and cd into it:

git clone https://github.com/ninja-van/airflow-boilerplate.git && cd airflow-boilerplate

Create a virtualenv for this project. Feel free to choose your preferred way of managing Python virtual environments. I usually do it this way:

pip install virtualenv
virtualenv .venv

Activate the virtual environment:

source .venv/bin/activate

Install the requirements:

pip install -r requirements-airflow.txt
pip install -r requirements-dev.txt

Install the pre-commit hook:

pre-commit install

This ensures that, for each commit, any changed files are run through the linter and formatter. On top of that, the tests are run too, to make sure that nothing is broken.

Setting up the Docker environment

If you only want the metadata DB to be up, because you will mostly work from PyCharm:

docker-compose -f docker/docker-compose.yml up -d airflow_initdb

If you want the whole suite of Airflow components to be up and running:

docker-compose -f docker/docker-compose.yml up -d

This brings up the Airflow Postgres metadata database, scheduler, and webserver.

To access the webserver, once the Docker containers are up and healthy, go to localhost:8080. You can start playing around with the sample DAGs.

Setting up PyCharm

Ensure that your Project Interpreter is pointing to the correct virtual environment.

Mark both dags/ and plugins/ as source roots.

Mark dags and plugins directories as "Sources Root"

Run source env.sh on the terminal and copy the environment variables.

Run env.sh and copy the env vars

Add a new Run/Debug Configuration with the following parameters:

  • Name: <whatever_you_want>  
  • Script path: <path_to_your_virtualenv_airflow_executable>
  • Parameters: test <dag_id> <task_id> <execution_date> 
  • Environment variables: paste your env vars here

Run/debug configurations
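For example, with the hypothetical sample DAG sketched earlier, Parameters could read test sample_dag print_snake_case 2021-01-01. On Airflow 1.10.x, airflow test runs a single task instance locally without recording state in the metadata DB, which makes it convenient for debugging.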

Add those environment variables to your test configuration (pytest in my case), so that you can just hit the run/debug button next to your test functions.

Run/debug configurations
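For instance, assuming the hypothetical snake_case helper from the DAG sketch above, a test could look like the following; the repo's actual tests may differ:

# tests/test_stringcase.py -- hypothetical sketch of a helper test
from common.stringcase import snake_case  # assumed helper in dags/common/


def test_snake_case():
    # Plain pytest function: runnable via the run/debug gutter button
    # once the env.sh variables are in the pytest run configuration.
    assert snake_case("HelloAirflow") == "hello_airflow"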

Generating a new fernet key

Included in this boilerplate is a pre-generated fernet key. This is not a security concern, because this environment is meant to be run locally only. If you wish to have a new fernet key, follow the steps below.

Generate a fernet key:

python -c "from cryptography.fernet import Fernet; FERNET_KEY = Fernet.generate_key().decode(); print(FERNET_KEY)"

Copy that fernet key to your clipboard. In env.sh, paste it here:

export AIRFLOW__CORE__FERNET_KEY=<YOUR_FERNET_KEY_HERE>

In airflow.cfg, paste it here:

fernet_key = <YOUR_FERNET_KEY_HERE>
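For context, Airflow uses the fernet key to encrypt sensitive values, such as connection passwords, before storing them in the metadata DB. A purely illustrative sketch of that round trip with the cryptography library:

# Illustration of what the fernet key is for -- not Airflow's internal code
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # same as the one-liner above
f = Fernet(key)

token = f.encrypt(b"my-connection-password")  # what ends up in the metadata DB
assert f.decrypt(token) == b"my-connection-password"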

Caveats

  • PyPI packages are installed at build time instead of run time, to minimise the start-up time of our development environment. As a side effect, if any new PyPI packages are added, the images need to be rebuilt. You can do so by passing the extra --build flag:
    docker-compose -f docker/docker-compose.yml up -d --build
    
  • PyCharm cannot recognise custom plugins registered dynamically by Airflow, because the IDE does static analysis while the custom plugins are only registered at runtime.

PyCharm failing to recognise custom plugin

  • Not related to the build environment, but rather to how Airflow works - some of the configs (like rbac = True) you change in airflow.cfg might not be reflected immediately at runtime, because they are static configurations that are only evaluated once at startup. To solve that problem, just restart your webserver:
    docker-compose -f docker/docker-compose.yml restart airflow_webserver
    
  • Not related to the build environment, but rather to how Airflow works - you cannot have a package/module with the same name in both dags/ and plugins/. This will likely give you a ModuleNotFoundError; see the sketch below for why.
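Both dags/ and plugins/ end up on the Python path in this setup, so a top-level name can only resolve to one of them. A minimal sketch of the failure mode, with hypothetical package names:

# Suppose both of these exist:
#   dags/utils/__init__.py       <- hypothetical package in dags/
#   plugins/utils/helpers.py     <- hypothetical package with the same top-level name
#
# "utils" resolves to whichever directory comes first on sys.path, here
# dags/utils/, which has no "helpers" submodule:
import utils.helpers
# => ModuleNotFoundError: No module named 'utils.helpers'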

Concluding tips

  • If you are only interested in using your IDE, and you do not need the Airflow scheduler or webserver, run:

    docker-compose -f docker/docker-compose.yml up -d airflow_initdb
    
  • To remove the examples from the webserver, change the following line in airflow.cfg:

    load_examples = False
    

    Notice that docker-compose picks up changes in airflow.cfg immediately.
