
konosp / dbt-airflow-docker-compose

License: Apache-2.0
Execution of DBT models using Apache Airflow through Docker Compose

Programming Languages

python
139335 projects - #7 most used programming language
shell
77523 projects

Projects that are alternatives of or similar to dbt-airflow-docker-compose

airflow-dbt
Apache Airflow integration for dbt
Stars: ✭ 233 (+206.58%)
Mutual labels:  airflow, dbt
dbt-cloud-plugin
DBT Cloud Plugin for Airflow
Stars: ✭ 35 (-53.95%)
Mutual labels:  airflow, dbt
dbt-on-airflow
No description or website provided.
Stars: ✭ 30 (-60.53%)
Mutual labels:  airflow, dbt
dataops-platform-airflow-dbt
Build DataOps platform with Apache Airflow and dbt on AWS
Stars: ✭ 33 (-56.58%)
Mutual labels:  airflow, dbt
airflow-dbt-python
A collection of Airflow operators, hooks, and utilities to elevate dbt to a first-class citizen of Airflow.
Stars: ✭ 111 (+46.05%)
Mutual labels:  airflow, dbt
dbt-databricks
A dbt adapter for Databricks.
Stars: ✭ 115 (+51.32%)
Mutual labels:  dbt
T-Watch
Real Time Twitter Sentiment Analysis Product
Stars: ✭ 20 (-73.68%)
Mutual labels:  airflow
kedro-airflow-k8s
Kedro Plugin to support running pipelines on Kubernetes using Airflow.
Stars: ✭ 22 (-71.05%)
Mutual labels:  airflow
aircan
💨🥫 A Data Factory system for running data processing pipelines built on AirFlow and tailored to CKAN. Includes evolution of DataPusher and Xloader for loading data to DataStore.
Stars: ✭ 24 (-68.42%)
Mutual labels:  airflow
qunomon
Testbed of AI Systems Quality Management
Stars: ✭ 15 (-80.26%)
Mutual labels:  airflow
dbt ml
Package for dbt that allows users to train, audit and use BigQuery ML models.
Stars: ✭ 41 (-46.05%)
Mutual labels:  dbt
metriql
The metrics layer for your data. Join us at https://metriql.com/slack
Stars: ✭ 227 (+198.68%)
Mutual labels:  dbt
kedro-airflow
Kedro-Airflow makes it easy to deploy Kedro projects to Airflow.
Stars: ✭ 121 (+59.21%)
Mutual labels:  airflow
fairflow
Functional Airflow DAG definitions.
Stars: ✭ 38 (-50%)
Mutual labels:  airflow
fab-oidc
Flask-AppBuilder SecurityManager for OpenIDConnect
Stars: ✭ 28 (-63.16%)
Mutual labels:  airflow
apache-airflow-cloudera-parcel
Parcel for Apache Airflow
Stars: ✭ 16 (-78.95%)
Mutual labels:  airflow
AirflowETL
Blog post on ETL pipelines with Airflow
Stars: ✭ 20 (-73.68%)
Mutual labels:  airflow
dbt-spotify-analytics
Containerized end-to-end analytics of Spotify data using Python, dbt, Postgres, and Metabase
Stars: ✭ 92 (+21.05%)
Mutual labels:  dbt
k3ai
A lightweight tool to get an AI Infrastructure Stack up in minutes not days. K3ai will take care of setup K8s for You, deploy the AI tool of your choice and even run your code on it.
Stars: ✭ 105 (+38.16%)
Mutual labels:  airflow
airflow-site
Apache Airflow Website
Stars: ✭ 95 (+25%)
Mutual labels:  airflow

Apache Airflow and DBT using Docker Compose

Stand-alone project that utilises public eCommerce data from Instacart to demonstrate how to schedule dbt models through Airflow.

For more Data & Analytics related reading, check https://analyticsmayhem.com

Requirements

  • Docker and Docker Compose installed on the host machine (the whole project runs through docker-compose)

Setup

  • Clone the repository
  • Extract the CSV files within the ./sample_data directory (the files are needed as seed data)

Change directory within the repository and run docker-compose up. This will perform the following:

  • Based on the definition in docker-compose.yml, it will download the necessary images to run the project. This includes the following services:
    • postgres-airflow: DB for Airflow to connect and store task execution information
    • postgres-dbt: DB for the DBT seed data and SQL models to be stored
    • airflow: Python-based image to execute Airflow scheduler and webserver
    • adminer: a lightweight DB client

Connections

How to run the DAGs

Once everything is up and running, navigate to the Airflow UI (see connections above). You will be presented with the list of DAGs, all Off by default.

You will need to execute them in the correct order:

  • 1_load_initial_data: Load the raw Kaggle dataset
  • 2_init_once_dbt_models: Perform some basic transformations (i.e. build an artificial date for the orders)
  • 3_snapshot_dbt_models: Build the snapshot tables
  • 4_daily_dbt_models: Schedule the daily models. The start date is set to Jan 6th, 2019, which will force Airflow to backfill all runs for those dates, so leave this one for last (see the sketch after this list).
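
For context, the sketch below shows one generic way a daily dbt run could be wired into an Airflow DAG with a BashOperator. It is not the repository's actual dag.py (which builds its tasks dynamically from manifest.json, as described further down); the DAG id, schedule, tag selector and Airflow 2.x import paths are assumptions.

    # Hypothetical sketch only; the real dag.py generates its tasks from manifest.json.
    # Assumes Airflow 2.x import paths and a dbt "daily" tag.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="4_daily_dbt_models_sketch",   # hypothetical id, echoing the DAG above
        start_date=datetime(2019, 1, 6),      # matches the start date mentioned above
        schedule_interval="@daily",
        catchup=True,                         # produces the backfill behaviour described above
    ) as dag:
        BashOperator(
            task_id="dbt_run_daily",
            # /dbt is the project volume mounted inside the Airflow container
            bash_command="cd /dbt && dbt run --models tag:daily",
        )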

If everything goes well, the daily models should execute successfully and you should see task durations similar to those shown below.

Finally, within Adminer you can view the final models.
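
If you prefer a programmatic check over Adminer, a minimal sketch along these lines can list what dbt has built in the postgres-dbt database. The host, port, database name and credentials below are placeholders only; the real values are defined in docker-compose.yml.

    # Placeholder connection details; substitute the values from docker-compose.yml.
    import psycopg2

    conn = psycopg2.connect(
        host="localhost",   # assumes the postgres-dbt port is published to the host
        port=5432,          # placeholder
        dbname="dbt",       # placeholder
        user="dbt",         # placeholder
        password="dbt",     # placeholder
    )
    with conn, conn.cursor() as cur:
        # List the tables and views materialised outside the system schemas.
        cur.execute("""
            SELECT table_schema, table_name, table_type
            FROM information_schema.tables
            WHERE table_schema NOT IN ('pg_catalog', 'information_schema')
            ORDER BY table_schema, table_name
        """)
        for schema, name, kind in cur.fetchall():
            print(f"{schema}.{name} ({kind})")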

Docker Compose Commands

  • Enable the services: docker-compose up or docker-compose up -d (the -d flag detaches the terminal from the services' logs)
  • Disable the services: docker-compose down. This is a non-destructive operation.
  • Delete the services: docker-compose rm. This deletes all associated data; the databases will be empty on the next run.
  • Re-build the services: docker-compose build. This re-builds the images based on the docker-compose.yml definition. Since only the Airflow service is built from local files, it is the only image that is re-built (useful if you apply changes to the ./scripts_airflow/init.sh file).

If you need to connect to the running containers, use docker-compose ps to view the running services.

For example, to connect to the Airflow service, you can execute docker exec -it dbt-airflow-docker_airflow_1 /bin/bash. This attaches your terminal to the selected container and opens a bash shell.

Project Notes and Docker Volumes

Because the project directories (./scripts_postgres, ./sample_data, ./dbt and ./airflow) are defined as volumes in docker-compose.yml, they are directly accessible from within the containers. This means:

  • On Airflow startup the existing models are compiled as part of the initialisation script. If you make changes to the models, you need to re-compile them. Two options:
    • From the host machine, navigate to ./dbt and run dbt compile
    • Attach to the container with docker exec -it dbt-airflow-docker_airflow_1 /bin/bash. This opens a session directly in the container running Airflow. Then cd into /dbt and run dbt compile. In general, attaching to the container helps a lot when debugging.
  • You can make changes to the dbt models from the host machine, run dbt compile, and they will be picked up on the next DAG run (beware of major changes that require --full-refresh). It is suggested to connect to the container (docker exec ...) to run a full refresh of the models. Alternatively, you can run docker-compose down && docker-compose rm && docker-compose up.
  • The folder ./airflow/dags stores the DAG files. Changes to them appear in the Airflow UI after a few seconds.
    • The initialise_data.py file contains the upfront data loading operation of the seed data.
    • The dag.py file contains all the handling of the DBT models. A key aspect is the parsing of manifest.json, which holds the models' tree structure and tag details (see the sketch below).
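
As a rough illustration of that last point, the sketch below reads model names, tags and parent models out of manifest.json following dbt's documented manifest layout (nodes, depends_on, tags). The file path and the traversal are assumptions; dag.py may walk the tree differently.

    # Minimal sketch of reading the model tree and tags from dbt's manifest.json.
    import json

    # manifest.json is produced by `dbt compile`; /dbt is the mounted project directory.
    with open("/dbt/target/manifest.json") as f:
        manifest = json.load(f)

    for node_id, node in manifest["nodes"].items():
        if node["resource_type"] != "model":
            continue
        parents = [p for p in node["depends_on"]["nodes"] if p.startswith("model.")]
        print(node["name"], "tags:", node["tags"], "parents:", parents)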

Credit to the very helpful repository: https://github.com/puckel/docker-airflow

Infrastructure model

