
andresionek91 / Airflow Autoscaling Ecs

License: MIT
Airflow Deployment on AWS ECS Fargate Using CloudFormation

Projects that are alternatives to or similar to Airflow Autoscaling ECS

airflow-dbt-python
A collection of Airflow operators, hooks, and utilities to elevate dbt to a first-class citizen of Airflow.
Stars: ✭ 111 (-18.38%)
Mutual labels:  airflow, data-engineering
polygon-etl
ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub
Stars: ✭ 53 (-61.03%)
Mutual labels:  airflow, data-engineering
Soda Sql
Metric collection, data testing and monitoring for SQL accessible data
Stars: ✭ 173 (+27.21%)
Mutual labels:  airflow, data-engineering
Dataengineeringproject
Example end to end data engineering project.
Stars: ✭ 82 (-39.71%)
Mutual labels:  airflow, data-engineering
AirflowDataPipeline
Example of an ETL Pipeline using Airflow
Stars: ✭ 24 (-82.35%)
Mutual labels:  airflow, data-engineering
viewflow
Viewflow is an Airflow-based framework that allows data scientists to create data models without writing Airflow code.
Stars: ✭ 110 (-19.12%)
Mutual labels:  airflow, data-engineering
AirflowETL
Blog post on ETL pipelines with Airflow
Stars: ✭ 20 (-85.29%)
Mutual labels:  airflow, data-engineering
Udacity Data Engineering Projects
Few projects related to Data Engineering including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake development.
Stars: ✭ 458 (+236.76%)
Mutual labels:  airflow, data-engineering
Data-Engineering-Projects
Personal Data Engineering Projects
Stars: ✭ 167 (+22.79%)
Mutual labels:  airflow, data-engineering
jobAnalytics and search
JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
Stars: ✭ 25 (-81.62%)
Mutual labels:  airflow, data-engineering
Around Dataengineering
A Data Engineering & Machine Learning Knowledge Hub
Stars: ✭ 257 (+88.97%)
Mutual labels:  airflow, data-engineering
Goodreads etl pipeline
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Stars: ✭ 793 (+483.09%)
Mutual labels:  airflow, data-engineering
Bitnami Docker Airflow
Bitnami Docker Image for Apache Airflow
Stars: ✭ 89 (-34.56%)
Mutual labels:  airflow
Spark Alchemy
Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
Stars: ✭ 122 (-10.29%)
Mutual labels:  data-engineering
Applied Ml
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
Stars: ✭ 17,824 (+13005.88%)
Mutual labels:  data-engineering
Pipelinex
PipelineX: Python package to build ML pipelines for experimentation with Kedro, MLflow, and more
Stars: ✭ 127 (-6.62%)
Mutual labels:  data-engineering
D6t Python
Accelerate data science
Stars: ✭ 118 (-13.24%)
Mutual labels:  data-engineering
Udacity Data Engineering
Udacity Data Engineering Nano Degree (DEND)
Stars: ✭ 89 (-34.56%)
Mutual labels:  airflow
Airflow Training
Airflow training for the crunch conf
Stars: ✭ 83 (-38.97%)
Mutual labels:  airflow
Setl
A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-41.91%)
Mutual labels:  data-engineering

Airflow Autoscaling ECS

A setup for running Airflow on AWS ECS (Elastic Container Service) Fargate, with autoscaling enabled for all services. All infrastructure is created with CloudFormation, and secrets are managed by AWS Secrets Manager.

CloudFormation Resources

Requirements

  • Create an AWS IAM user for the infrastructure deployment, with admin permissions
  • Install the AWS CLI by running pip install awscli
  • Install Docker
  • Set up your IAM user credentials inside ~/.aws/config
    [profile my_aws_profile]
    aws_access_key_id = <my_access_key_id>
    aws_secret_access_key = <my_secret_access_key>
    region = us-east-1
  • Create a virtual environment
  • Set up environment variables in your .zshrc or .bashrc, or in the terminal session you are going to use:
	export AWS_REGION=us-east-1;
	export AWS_PROFILE=my_aws_profile;
	export ENVIRONMENT=dev;
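As a quick sanity check before deploying, a small script like the following can confirm those variables are set. This is an illustrative sketch, not part of the repository; the variable names match the exports above.

```python
import os

# Names match the exports above; this is an illustrative sanity check,
# not part of the repository.
REQUIRED_VARS = ("AWS_REGION", "AWS_PROFILE", "ENVIRONMENT")

def missing_vars(environ):
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]

missing = missing_vars(os.environ)
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```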

Deploy Airflow Locally

make airflow-local

Deploy Airflow on AWS ECS

To deploy or update your stack, run the following command:

make airflow-deploy

To rebuild the Airflow Docker image and push it to ECR (without infrastructure changes), run:

make airflow-push-image

To destroy your stack, run the following command:

make airflow-destroy

Update a DAG on AWS

After creating or updating a DAG, you need to rebuild the Airflow image, push it to ECR, and then restart the Airflow service. To do all of that, just execute:

make airflow-push-image
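A target like this typically builds the Docker image, pushes it to ECR, and forces a new deployment of the ECS service so tasks pull the fresh image. A minimal Python sketch of that flow, assuming boto3 and illustrative account, repository, cluster, and service names (the real make target's steps may differ):

```python
import subprocess

def ecr_image_uri(account_id, region, repo, tag="latest"):
    """Build the fully qualified ECR image URI for a repository and tag."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repo}:{tag}"

def push_and_restart(account_id, region, repo, cluster, service):
    """Rebuild the image, push it to ECR, and force a new ECS deployment.

    All names are illustrative; they are not taken from this repository.
    """
    uri = ecr_image_uri(account_id, region, repo)
    subprocess.run(["docker", "build", "-t", uri, "."], check=True)
    subprocess.run(["docker", "push", uri], check=True)
    # Force ECS to restart tasks (and pull the new image) without
    # changing the task definition.
    import boto3
    ecs = boto3.client("ecs", region_name=region)
    ecs.update_service(cluster=cluster, service=service, forceNewDeployment=True)
```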

Features

  • Control all Airflow infrastructure from a single service.yml file.
  • Metadata DB passwords managed with AWS Secrets Manager.
  • Autoscaling enabled and configurable for all Airflow sub-services (workers, flower, webserver, scheduler).
  • TODO: Continuous integration using AWS CodePipeline.
  • TODO: Create isolated DAGs using the docker_operator.

Adjust many infrastructure configs directly in service.yml:

  workers:
    port: 8793
    cpu: 1024
    memory: 2048
    desiredCount: 2
    autoscaling:
      maxCapacity: 8
      minCapacity: 2
      cpu:
        target: 70
        scaleInCooldown: 60
        scaleOutCooldown: 120
      memory:
        target: 70
        scaleInCooldown: 60
        scaleOutCooldown: 120
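One invariant worth checking in a block like this is that desiredCount falls between minCapacity and maxCapacity, since ECS will scale the service within those bounds. A sketch of that check (an illustration, not part of the repository; the dict mirrors the example values above):

```python
# Values mirror the example workers block above; this check is an
# illustrative sketch, not part of the repository.
workers = {
    "desiredCount": 2,
    "autoscaling": {"maxCapacity": 8, "minCapacity": 2},
}

def desired_count_in_bounds(cfg):
    """True when desiredCount lies within the autoscaling capacity bounds."""
    scaling = cfg["autoscaling"]
    return scaling["minCapacity"] <= cfg["desiredCount"] <= scaling["maxCapacity"]
```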

Access the Airflow UI:


Look for AirflowWebServerEndpoint in the outputs logged to your terminal.

    "cfn-airflow-webserver": [
        {
            "OutputKey": "AirflowWebServerEndpoint",
            "OutputValue": "airflow-dev-webserver-alb-1234567890.us-east-1.elb.amazonaws.com"
        }
    ],
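If you want to grab the endpoint programmatically rather than scanning the terminal, output JSON in this shape can be parsed with a few lines of Python. A sketch using the example payload above (the helper name is illustrative):

```python
import json

# Example payload in the shape shown above.
OUTPUTS_JSON = """
{
    "cfn-airflow-webserver": [
        {
            "OutputKey": "AirflowWebServerEndpoint",
            "OutputValue": "airflow-dev-webserver-alb-1234567890.us-east-1.elb.amazonaws.com"
        }
    ]
}
"""

def find_output(outputs, key):
    """Return the OutputValue for the given OutputKey across all stacks."""
    for stack_outputs in outputs.values():
        for output in stack_outputs:
            if output["OutputKey"] == key:
                return output["OutputValue"]
    return None

endpoint = find_output(json.loads(OUTPUTS_JSON), "AirflowWebServerEndpoint")
```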

Access the Flower UI:


Look for AirflowFlowerEndpoint in the outputs logged to your terminal.

    "cfn-airflow-flower": [
        {
            "OutputKey": "AirflowFlowerEndpoint",
            "OutputValue": "airflow-dev-flower-alb-1234567890.us-east-1.elb.amazonaws.com"
        }
    ],

Inspired by the work done by Nicor88.
