aws-samples / dataops-platform-airflow-dbt

Licence: MIT-0 license
Build DataOps platform with Apache Airflow and dbt on AWS

Programming Languages

python
139335 projects - #7 most used programming language
shell
77523 projects
Makefile
30231 projects
Dockerfile
14818 projects

Projects that are alternatives of or similar to dataops-platform-airflow-dbt

airflow-dbt-python
A collection of Airflow operators, hooks, and utilities to elevate dbt to a first-class citizen of Airflow.
Stars: ✭ 111 (+236.36%)
Mutual labels:  airflow, dbt
airflow-dbt
Apache Airflow integration for dbt
Stars: ✭ 233 (+606.06%)
Mutual labels:  airflow, dbt
dbt-airflow-docker-compose
Execution of DBT models using Apache Airflow through Docker Compose
Stars: ✭ 76 (+130.3%)
Mutual labels:  airflow, dbt
dbt-on-airflow
No description or website provided.
Stars: ✭ 30 (-9.09%)
Mutual labels:  airflow, dbt
dbt-cloud-plugin
DBT Cloud Plugin for Airflow
Stars: ✭ 35 (+6.06%)
Mutual labels:  airflow, dbt
polygon-etl
ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub
Stars: ✭ 53 (+60.61%)
Mutual labels:  airflow
torchx
TorchX is a universal job launcher for PyTorch applications. TorchX is designed to have fast iteration time for training/research and support for E2E production ML pipelines when you're ready.
Stars: ✭ 165 (+400%)
Mutual labels:  airflow
apache-airflow-cloudera-parcel
Parcel for Apache Airflow
Stars: ✭ 16 (-51.52%)
Mutual labels:  airflow
cli
Polyaxon Core Client & CLI to streamline MLOps
Stars: ✭ 18 (-45.45%)
Mutual labels:  dataops
spark-utils
Utility functions for dbt projects running on Spark
Stars: ✭ 19 (-42.42%)
Mutual labels:  dbt
incubator-liminal
Apache Liminal's goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model training, validation, deployment and inference in production. Liminal provides a Domain Specific Language to build ML workflows on top of Apache Airflow.
Stars: ✭ 117 (+254.55%)
Mutual labels:  airflow
airflow-boilerplate
A complete development environment setup for working with Airflow
Stars: ✭ 94 (+184.85%)
Mutual labels:  airflow
qunomon
Testbed of AI Systems Quality Management
Stars: ✭ 15 (-54.55%)
Mutual labels:  airflow
opentrials-airflow
Configuration and definitions of Airflow for OpenTrials
Stars: ✭ 18 (-45.45%)
Mutual labels:  airflow
siren
Siren provides an easy-to-use universal alert, notification, channels management framework for the entire observability infrastructure.
Stars: ✭ 70 (+112.12%)
Mutual labels:  dataops
aircal
Visualize Airflow's schedule by exporting future DAG runs as events to Google Calendar.
Stars: ✭ 66 (+100%)
Mutual labels:  airflow
dbt ml
Package for dbt that allows users to train, audit and use BigQuery ML models.
Stars: ✭ 41 (+24.24%)
Mutual labels:  dbt
dbt-sugar
dbt-sugar is a CLI tool that allows users of dbt to have fun and ease performing actions around dbt models
Stars: ✭ 139 (+321.21%)
Mutual labels:  dbt
airflow-ci
Apache Airflow CI pipeline
Stars: ✭ 18 (-45.45%)
Mutual labels:  airflow
dbt2looker
Generate lookml for views from dbt models
Stars: ✭ 119 (+260.61%)
Mutual labels:  dbt

DataOps Platform with Apache Airflow and dbt on AWS

This repository contains code to deploy the architecture described in the blog post: "Build DataOps platform to break silos between engineers and analysts".


Architecture overview

The architecture includes the following AWS services:

  • Amazon Elastic Container Service, to run Apache Airflow and dbt
  • Amazon Elastic Container Registry, to store Docker images for Airflow and dbt
  • Amazon Redshift, as the data warehouse
  • Amazon Relational Database Service, as the metadata store for Airflow
  • Amazon ElastiCache for Redis, as a Celery backend for Airflow
  • Amazon Simple Storage Service, to store Airflow and dbt DAGs
  • AWS CodeBuild (optional), to automate deployments
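
To make the roles of the metadata store and the Celery backend more concrete, here is a minimal sketch of the Airflow settings that could tie those services together. It is an illustration, not the configuration shipped in this repository. It assumes a PostgreSQL engine on the database side, Redis used as the Celery message broker, Celery task results kept in the metadata database, and Airflow 2.3 or later (older releases read `AIRFLOW__CORE__SQL_ALCHEMY_CONN` instead of `AIRFLOW__DATABASE__SQL_ALCHEMY_CONN`). The function name `airflow_celery_env` and every hostname and credential are hypothetical.

```python
from urllib.parse import quote


def airflow_celery_env(db_host, db_user, db_password, redis_host,
                       db_name="airflow", db_port=5432, redis_port=6379):
    """Build Airflow environment variables for a CeleryExecutor setup.

    Hypothetical helper: the hosts, ports, and the PostgreSQL engine are
    assumptions made for illustration only.
    """
    # URL-encode credentials so characters such as '@' or '/' do not
    # break the connection URI.
    auth = f"{quote(db_user, safe='')}:{quote(db_password, safe='')}"
    db_location = f"{auth}@{db_host}:{db_port}/{db_name}"
    return {
        "AIRFLOW__CORE__EXECUTOR": "CeleryExecutor",
        # Airflow metadata store (the relational database in the list above).
        "AIRFLOW__DATABASE__SQL_ALCHEMY_CONN": f"postgresql+psycopg2://{db_location}",
        # Celery message broker (assumed here to be the Redis cluster).
        "AIRFLOW__CELERY__BROKER_URL": f"redis://{redis_host}:{redis_port}/0",
        # Celery task results (assumed here to live in the metadata database).
        "AIRFLOW__CELERY__RESULT_BACKEND": f"db+postgresql://{db_location}",
    }


if __name__ == "__main__":
    env = airflow_celery_env(
        db_host="db.example.internal",        # hypothetical endpoint
        db_user="airflow",
        db_password="p@ss/word",
        redis_host="redis.example.internal",  # hypothetical endpoint
    )
    for key, value in env.items():
        print(f"{key}={value}")
```

In a container-based deployment, variables like these would typically be injected into the Airflow containers through the task definition, with the password coming from a secrets store rather than plain text.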

Repository structure

In this repository there are two main project folders: dataops-infra and analytics. This setup is meant to demonstrate how DataOps can foster effective collaboration between data engineers and data analysts, separating the platform infrastructure code from the business logic.

These two folders should be considered as two separate repositories following their own release cycles.

DataOps Platform Infrastructure

The dataops-infra folder contains code and instructions to deploy the platform infrastructure described in the Architecture overview section. This project is written from the perspective of a data engineering team that is responsible for creating and maintaining data infrastructure such as the data lake, data warehouse, orchestration, and CI/CD pipelines for analytics.

Analytics

The analytics folder contains code and instructions to manage and deploy Airflow and dbt DAGs on the DataOps platform. This project is written from the perspective of a data analytics team composed of data analysts and data scientists. They have domain knowledge and are responsible for serving analytics requests from stakeholders such as marketing and business development teams, so that the company can make data-driven decisions.

License

This library is licensed under the MIT-0 License. See the LICENSE file.
