
gocardless / airflow-dbt

Licence: MIT License
Apache Airflow integration for dbt

Programming Languages

Python

Projects that are alternatives to or similar to airflow-dbt

dbt-on-airflow
No description or website provided.
Stars: ✭ 30 (-87.12%)
Mutual labels:  airflow, dbt
dataops-platform-airflow-dbt
Build DataOps platform with Apache Airflow and dbt on AWS
Stars: ✭ 33 (-85.84%)
Mutual labels:  airflow, dbt
airflow-dbt-python
A collection of Airflow operators, hooks, and utilities to elevate dbt to a first-class citizen of Airflow.
Stars: ✭ 111 (-52.36%)
Mutual labels:  airflow, dbt
dbt-cloud-plugin
DBT Cloud Plugin for Airflow
Stars: ✭ 35 (-84.98%)
Mutual labels:  airflow, dbt
dbt-airflow-docker-compose
Execution of DBT models using Apache Airflow through Docker Compose
Stars: ✭ 76 (-67.38%)
Mutual labels:  airflow, dbt
dbt artifacts
A dbt package for modelling dbt metadata. https://brooklyn-data.github.io/dbt_artifacts
Stars: ✭ 119 (-48.93%)
Mutual labels:  dbt
AirDataComputer
Air Data Computer
Stars: ✭ 29 (-87.55%)
Mutual labels:  airflow
Data-Engineering-Projects
Personal Data Engineering Projects
Stars: ✭ 167 (-28.33%)
Mutual labels:  airflow
fal
do more with dbt. fal helps you run Python alongside dbt, so you can send Slack alerts, detect anomalies and build machine learning models.
Stars: ✭ 567 (+143.35%)
Mutual labels:  dbt
astro
Astro allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
Stars: ✭ 79 (-66.09%)
Mutual labels:  airflow
kuwala
Kuwala is the no-code data platform for BI analysts and engineers enabling you to build powerful analytics workflows. We are set out to bring state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations together in one intuitive interface built with React Flow. In addition we provide third-party data into data sc…
Stars: ✭ 474 (+103.43%)
Mutual labels:  dbt
dbt-ml-preprocessing
A SQL port of python's scikit-learn preprocessing module, provided as cross-database dbt macros.
Stars: ✭ 128 (-45.06%)
Mutual labels:  dbt
ml-ops
Get your MLOps (Level 1) platform started and going fast.
Stars: ✭ 81 (-65.24%)
Mutual labels:  airflow
dbt-clickhouse
The Clickhouse plugin for dbt (data build tool)
Stars: ✭ 77 (-66.95%)
Mutual labels:  dbt
dbt-superset-lineage
Make dbt docs and Apache Superset talk to one another
Stars: ✭ 60 (-74.25%)
Mutual labels:  dbt
spark-utils
Utility functions for dbt projects running on Spark
Stars: ✭ 19 (-91.85%)
Mutual labels:  dbt
airflow-prometheus-exporter
Export Airflow metrics (from mysql) in prometheus format
Stars: ✭ 25 (-89.27%)
Mutual labels:  airflow
viewflow
Viewflow is an Airflow-based framework that allows data scientists to create data models without writing Airflow code.
Stars: ✭ 110 (-52.79%)
Mutual labels:  airflow
dbt-invoke
A CLI for creating, updating, and deleting dbt property files
Stars: ✭ 42 (-81.97%)
Mutual labels:  dbt
jobAnalytics and search
JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
Stars: ✭ 25 (-89.27%)
Mutual labels:  airflow

airflow-dbt

This is a collection of Airflow operators to provide easy integration with dbt.

from airflow import DAG
from airflow_dbt.operators.dbt_operator import (
    DbtSeedOperator,
    DbtSnapshotOperator,
    DbtRunOperator,
    DbtTestOperator,
    DbtCleanOperator,
)
from airflow.utils.dates import days_ago

default_args = {
  'dir': '/srv/app/dbt',
  'start_date': days_ago(0)
}

with DAG(dag_id='dbt', default_args=default_args, schedule_interval='@daily') as dag:

  dbt_seed = DbtSeedOperator(
    task_id='dbt_seed',
  )

  dbt_snapshot = DbtSnapshotOperator(
    task_id='dbt_snapshot',
  )

  dbt_run = DbtRunOperator(
    task_id='dbt_run',
  )

  dbt_test = DbtTestOperator(
    task_id='dbt_test',
    retries=0,  # Failing tests would fail the task, and we don't want Airflow to try again
  )

  dbt_clean = DbtCleanOperator(
    task_id='dbt_clean',
  )

  dbt_seed >> dbt_snapshot >> dbt_run >> dbt_test >> dbt_clean

Installation

Install from PyPI:

pip install airflow-dbt

The operators also need access to the dbt CLI, which should either be on your PATH or can be pointed to with the dbt_bin argument on each operator, as shown below.
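For example, if dbt is installed in a virtualenv rather than on your PATH (the path below is illustrative):

from airflow_dbt.operators.dbt_operator import DbtRunOperator

# Inside your DAG definition
dbt_run = DbtRunOperator(
  task_id='dbt_run',
  dbt_bin='/srv/app/.venv/bin/dbt',  # illustrative path to the dbt executable
)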

Usage

There are five operators currently implemented:

  • DbtSeedOperator: calls dbt seed
  • DbtSnapshotOperator: calls dbt snapshot
  • DbtRunOperator: calls dbt run
  • DbtTestOperator: calls dbt test
  • DbtCleanOperator: calls dbt clean

Each of the above operators accepts the following arguments:

  • profiles_dir
    • If set, passed as the --profiles-dir argument to the dbt command
  • target
    • If set, passed as the --target argument to the dbt command
  • dir
    • The directory to run the dbt command in
  • full_refresh
    • If set to True, passes --full-refresh
  • vars
    • If set, passed as the --vars argument to the dbt command. Should be set as a Python dictionary; it will be passed to the dbt command as YAML
  • models
    • If set, passed as the --models argument to the dbt command
  • exclude
    • If set, passed as the --exclude argument to the dbt command
  • select
    • If set, passed as the --select argument to the dbt command
  • selector
    • If set, passed as the --selector argument to the dbt command
  • dbt_bin
    • The path to the dbt CLI. Defaults to dbt, which assumes the executable is on your PATH
  • verbose
    • If set to True, the operator will log verbosely to the Airflow logs
  • warn_error
    • If set to True, passes the --warn-error argument to the dbt command, treating warnings as errors
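As a sketch, a run task combining several of these arguments might look like the following (the project directory, target name, and tag selector are illustrative):

from airflow_dbt.operators.dbt_operator import DbtRunOperator

dbt_run_daily = DbtRunOperator(
  task_id='dbt_run_daily',
  dir='/srv/app/dbt',             # run the dbt command in the project directory
  profiles_dir='/srv/app/dbt',    # passed as --profiles-dir
  target='prod',                  # passed as --target
  models='tag:daily',             # passed as --models
  full_refresh=True,              # passes --full-refresh
  vars={'run_date': '{{ ds }}'},  # dictionary, passed to dbt as YAML via --vars
)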

Typically you will want to use the DbtRunOperator, followed by the DbtTestOperator, as shown earlier.

You can also use the hook directly. This is typically useful when you need to combine a dbt command with other work in the same operator, for example running dbt docs generate and uploading the generated docs somewhere they can be served from, as sketched below.
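A minimal sketch, assuming the DbtCliHook interface that the operators use internally (check airflow_dbt.hooks.dbt_hook for the exact signature); the upload helper here is hypothetical:

from airflow.operators.python_operator import PythonOperator
from airflow_dbt.hooks.dbt_hook import DbtCliHook

def upload_docs(target_dir):
  # Hypothetical upload step: replace with S3/GCS/etc. logic for your environment.
  ...

def generate_and_upload_docs():
  # Run `dbt docs generate` through the hook, then ship the generated site.
  DbtCliHook(dir='/srv/app/dbt').run_cli('docs', 'generate')
  upload_docs('/srv/app/dbt/target')

# Inside your DAG definition
dbt_docs = PythonOperator(
  task_id='dbt_docs',
  python_callable=generate_and_upload_docs,
)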

Building Locally

To install from the repository, it's recommended to first create a virtual environment:

python3 -m venv .venv

source .venv/bin/activate

Install using pip:

pip install .

Testing

To run tests locally, first create a virtual environment (see the Building Locally section).

Install dependencies:

pip install . pytest

Run the tests:

pytest tests/

Code style

This project uses flake8.

To check your code, first create a virtual environment (see the Building Locally section), then:

pip install flake8
flake8 airflow_dbt/ tests/ setup.py

Package management

If you use dbt's package manager, you should include all dependencies before deploying your dbt project.

For Docker users, packages specified in packages.yml should be included as part of your Docker image by running dbt deps in your Dockerfile.

Amazon Managed Workflows for Apache Airflow (MWAA)

If you use MWAA, you just need to update the requirements.txt file and add airflow-dbt and dbt to it.
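For example, the added entries might look like this (pin versions as appropriate for your environment):

airflow-dbt
dbt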

Then you can place your dbt code inside a folder {DBT_FOLDER} in the dags folder on S3 and configure the dbt task like below:

dbt_run = DbtRunOperator(
  task_id='dbt_run',
  dbt_bin='/usr/local/airflow/.local/bin/dbt',
  profiles_dir='/usr/local/airflow/dags/{DBT_FOLDER}/',
  dir='/usr/local/airflow/dags/{DBT_FOLDER}/'
)

License & Contributing

GoCardless ♥ open source. If you do too, come join us.
