FreckleIOT / ecs-airflow

Licence: MIT License
CloudFormation templates for deploying Airflow in ECS

Airflow on ECS

Infrastructure scripts to orchestrate an Apache Airflow cluster in ECS.

These scripts are ported from aws-infrastructure

Prerequisites

AWS CLI

The scripts rely on the AWS CLI. You can configure your credentials in a file:

~/.aws/credentials:

[default]
aws_access_key_id=changeme
aws_secret_access_key=changeme
region=us-west-2

If you prefer, you can use temporary credentials from AWS Security Token Service (STS) as well.

Encrypting sensitive passwords

The bash script kms-encrypt creates a new KMS data key and stores its cipher text in a JSON configuration file if it doesn't already exist. The plaintext data key is then used to encrypt the sensitive configuration value for a given parameter.

kms_key_id=changeme
./kms-encrypt --kms-key-id=${kms_key_id} \
    --param=ParameterKey --secret='changeme' \
    --config-path=path-to-config.json

Note: This script uses a Docker container from the python:3.6-slim image, which has OpenSSL 1.1.0. This is the same version used by puckel/docker-airflow:1.9.0-4, the base of our ECS Docker image. If the OpenSSL versions do not match, the containers will not be able to decrypt passwords and keys.
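
One well-known source of such mismatches is that OpenSSL 1.1.0 changed the default digest that openssl enc uses for key derivation from MD5 to SHA-256. The sketch below is not taken from kms-encrypt. It assumes, hypothetically, that a secret is encrypted with openssl enc under a passphrase, and shows how pinning -md explicitly keeps the result independent of the version default. The function names and the ENC_KEY variable are illustrative only.

```shell
# Hypothetical helpers, not part of kms-encrypt.
# $1 = passphrase (for example a plaintext data key), $2 = text to process.
encrypt_secret() {
    # -md sha256 pins the key-derivation digest instead of relying on the
    # version-dependent default; -a -A emits single-line base64.
    # The passphrase is passed through the environment, not the command line.
    printf '%s' "$2" | ENC_KEY="$1" openssl enc -aes-256-cbc -md sha256 -salt \
        -a -A -pass env:ENC_KEY
}

decrypt_secret() {
    # Must use the same cipher and digest as encrypt_secret.
    printf '%s' "$2" | ENC_KEY="$1" openssl enc -d -aes-256-cbc -md sha256 \
        -a -A -pass env:ENC_KEY
}

# Round trip: newer OpenSSL releases may print a key-derivation warning on
# stderr, which does not affect the output on stdout.
cipher=$(encrypt_secret 'example-key' 'changeme')
decrypt_secret 'example-key' "$cipher"   # prints the original secret
```

If both sides name the digest explicitly, a difference in OpenSSL defaults between the encrypting and decrypting containers no longer matters for this step.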

Deploying the CloudFormation Stacks

Build the docker image

The CloudFormation templates pull Docker images from the ECR registry. However, it might be useful to push images to Docker Hub for developers who may not have access to AWS. The build-docker-image script supports both ECR and Docker Hub.

ECR

Run the following command:

eval $(aws ecr get-login --no-include-email --region us-west-2)

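The inner aws ecr get-login command prints a docker login command, and eval runs it, so this single line logs you in to ECR. Note that get-login exists only in AWS CLI v1; AWS CLI v2 replaces it with aws ecr get-login-password, whose output is piped to docker login --username AWS --password-stdin.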

To build the image, replace changeme with your AWS account ID:

aws_account_id=changeme
./build-docker-image --repo ecr --aws-account-id ${aws_account_id} --region us-west-2 --version 0.0.6
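
As a rough sketch of how the account ID, region, and version fit together, the helper below composes an image URI in the standard ECR form account.dkr.ecr.region.amazonaws.com/repository:tag. The function name is hypothetical, and build-docker-image may assemble the name differently; the repository name airflow is taken from the AirflowDockerImage example later in this document.

```shell
# Hypothetical helper: compose an ECR image URI.
# $1 = AWS account ID, $2 = region, $3 = repository, $4 = version tag
ecr_image_uri() {
    printf '%s.dkr.ecr.%s.amazonaws.com/%s:%s\n' "$1" "$2" "$3" "$4"
}

ecr_image_uri 123456789012 us-west-2 airflow 0.0.6
# prints 123456789012.dkr.ecr.us-west-2.amazonaws.com/airflow:0.0.6
```

The resulting string is the kind of value the AirflowDockerImage parameter expects.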

Docker Hub

Log in to Docker Hub with your credentials:

docker login

To build the image:

./build-docker-image --repo dockerhub --version 0.0.6

Postgres (RDS)

Create a JSON file dev-airflow-postgres.json to override non-default parameters. Note that the SubnetIds are private subnets; instructions for connecting to the database over an SSH tunnel are included below.

{
    "Parameters": {
      "VpcId": "vpc-xxxxxxxx",
      "SubnetIds": "subnet-aaaaaaaa,subnet-bbbbbbbb",
      "DatabaseName": "airflow",
      "StorageInGb": 100,
      "StorageIops": 1000,
      "PostgresVersion": "10.3",
      "DbInstanceType": "db.t2.small",
      "AllowedCIDR": "10.0.0.0/16",
      "BackupRetentionInDays": "7",
      "MultiAZDeployment": "true",
      "PostgresMasterUsername": "airflow_user",
      "KmsKeyId": "arn:aws:kms:us-west-2:************:key/********-****-****-****-************",
      "Organization": "Freckle IoT",
      "Team": "Freckle",
      "Environment": "dev",
      "Component": "Airflow"
    }
}

Deploy the CloudFormation template:

SENSITIVE_PARAMS='"PostgresMasterPassword=changeme"' ./deploy-stack \
    cloudformation/postgres-rds.cloudformation.yaml \
    dev-airflow-postgres ../ecs-airflow-config/dev-airflow-postgres.json

Note: We pass the password in this manner because kms-encrypt cannot be used in this case. It also keeps the sensitive password out of source control.
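
Typing the password inline also leaves it in the shell history. A minimal sketch of one way around that, assuming deploy-stack only needs SENSITIVE_PARAMS in the quoted Key=Value form shown above: the helper name sensitive_param is hypothetical, and the interactive prompt is left as a comment because it needs a terminal.

```shell
# Hypothetical helper: format one Key=Value pair the way the example above
# passes it, wrapped in literal double quotes.
sensitive_param() {
    printf '"%s=%s"' "$1" "$2"
}

# Usage sketch (interactive, so left commented out):
#   stty -echo; printf 'Postgres master password: '; IFS= read -r pw; stty echo; echo
#   SENSITIVE_PARAMS=$(sensitive_param PostgresMasterPassword "$pw") ./deploy-stack \
#       cloudformation/postgres-rds.cloudformation.yaml \
#       dev-airflow-postgres ../ecs-airflow-config/dev-airflow-postgres.json

sensitive_param PostgresMasterPassword 'changeme'
# prints "PostgresMasterPassword=changeme"
```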

SSH Tunnel to RDS:

ssh -i path-to-pem -N -L 5432:postgres-end-point:5432 ec2-user@bastion-host

Run the PostgreSQL client (you may need to install it on your system first):

psql -h 127.0.0.1 -d airflow -U airflow_user

Set up the schema:

CREATE SCHEMA IF NOT EXISTS airflow AUTHORIZATION airflow_user;
ALTER ROLE airflow_user SET search_path TO airflow;
GRANT USAGE ON SCHEMA airflow TO airflow_user;
GRANT CREATE ON SCHEMA airflow TO airflow_user;

Redis (ElastiCache)

Create a JSON file dev-airflow-redis.json to override non-default parameters. Note that the SubnetIds are the same as SubnetIds for the RDS cluster and are private subnets.

{
    "Parameters": {
      "VpcId": "vpc-xxxxxxxx",
      "SubnetIds": "subnet-aaaaaaaa,subnet-bbbbbbbb",
      "RedisCacheNodeType": "cache.t2.small",
      "RedisVersion": "4.0.10",
      "AllowedCIDR": "10.0.0.0/16",
      "Organization": "My Org",
      "Team": "Airflow Team",
      "Environment": "dev",
      "Component": "Airflow"
    }
}

Deploy the CloudFormation template:

./deploy-stack cloudformation/redis-cluster.cloudformation.yaml \
    dev-airflow-redis ../ecs-airflow-config/dev-airflow-redis.json

ECS Cluster

Create a JSON configuration file dev-airflow-ecs.json. Note that the InstanceSubnetIds are the same as SubnetIds for the RDS cluster and are private subnets. The LoadBalancerSubnetIds are public subnets.

{
    "Parameters": {
      "VpcId": "vpc-xxxxxxxx",
      "InstanceSubnetIds": "subnet-aaaaaaaa,subnet-bbbbbbbb",
      "LoadBalancerSubnetIds": "subnet-cccccccc,subnet-dddddddd",
      "EcsInstanceType": "m5.large",
      "UseSSL": "yes",
      "BastionStack": "changeme",
      "CertificateArn": "arn:aws:acm:us-west-2:************:certificate/********-****-****-****-************",
      "LoadBalancerType": "internet-facing",
      "AllowedCidrIp1": "changeme",
      "AllowedCidrIp2": "changeme",
      "CloudWatchLogGroup": "dev-airflow",
      "CloudWatchLogRetentionInDays": 180,
      "KeyName": "changeme",
      "Organization": "My Org",
      "Team": "Airflow Team",
      "Environment": "dev",
      "Component": "Airflow"
    }
}

Deploy the CloudFormation template:

./deploy-stack cloudformation/ecs-cluster.cloudformation.yaml \
    dev-airflow-ecs ../ecs-airflow-config/dev-airflow-ecs.json

NOTES:

  • The CloudFormation stack will also create an S3 bucket with the same name as the stack.
  • The bucket has versioning enabled. Although s3fs does not itself support object versions, it always shows the latest version of each S3 object.
  • The bucket also has the DeletionPolicy set to Retain so if the stack is terminated, the bucket will be left behind.

Airflow Components

The cloudformation/airflow-ecs-services folder contains a nested stack that deploys the following ECS services:

  • Airflow Webserver
  • Celery Flower Monitoring Tool
  • Scheduler
  • Multiple Workers

Create a JSON file dev-airflow-ecs-services.json to override non-default parameters.

{
  "Parameters": {
    "EcsStackName": "dev-airflow-ecs",
    "PostgresDbStackName": "dev-airflow-postgres",
    "RedisStackName": "dev-airflow-redis",
    "PostgresUsername": "airflow_user",
    "RedisDb": "0",
    "CloudWatchLogGroup": "dev-airflow",
    "HostedZoneId": "changeme",
    "HostedZoneName": "example.com.",
    "DNSPrefix": "dev-airflow",
    "AirflowUserName": "admin",
    "AirflowEmail": "[email protected]",
    "GoogleOAuthClientId": "changeme",
    "GoogleOAuthDomain": "changeme",
    "AirflowDockerImage": "************.dkr.ecr.us-west-2.amazonaws.com/airflow:0.0.2",
    "MinWebserverTasks": 1,
    "MaxWebserverTasks": 3,
    "DesiredWebserverTasks": 1,
    "MinFlowerTasks": 1,
    "MaxFlowerTasks": 3,
    "DesiredFlowerTasks": 1,
    "MinWorkerTasks": 1,
    "MaxWorkerTasks": 4,
    "DesiredWorkerTasks": 1,
    "SMTPUser": "changeme",
    "SMTPPassword": "changeme",
    "SMTPHost": "changeme",
    "SMTPPort": "changeme",
    "SMTPStartTLS": "changeme",
    "SMTPSSL": "changeme",
    "Organization": "My Org",
    "Team": "Airflow Team",
    "Environment": "dev",
    "Component": "Airflow"
  }
}
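
Each service takes a Min, Max, and Desired task count. The templates presumably expect each Desired value to lie between its Min and Max; that is an assumption here, not something read from the templates. A small pre-deploy check along those lines could look like this (the function name is hypothetical):

```shell
# Hypothetical pre-deploy check: succeed only if min <= desired <= max.
# $1 = service name, $2 = min, $3 = desired, $4 = max
check_task_bounds() {
    name=$1 min=$2 desired=$3 max=$4
    if [ "$min" -le "$desired" ] && [ "$desired" -le "$max" ]; then
        return 0
    fi
    echo "$name: expected min <= desired <= max, got $min/$desired/$max" >&2
    return 1
}

# Values from the example configuration above.
check_task_bounds Webserver 1 1 3 && echo "Webserver OK"
check_task_bounds Flower 1 1 3 && echo "Flower OK"
check_task_bounds Worker 1 1 4 && echo "Worker OK"
```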

Configure the passwords and keys as follows:

kms_key_id=changeme
./kms-encrypt --kms-key-id=${kms_key_id} \
    --param=PostgresPasswordEnc --secret='changeme' \
    --config-path=../ecs-airflow-config/dev-airflow-ecs-services.json

fernet_key=$(docker run puckel/docker-airflow python -c "from cryptography.fernet import Fernet; FERNET_KEY = Fernet.generate_key().decode(); print(FERNET_KEY)")
./kms-encrypt --kms-key-id=${kms_key_id} \
    --param=FernetKeyEnc --secret="${fernet_key}" \
    --config-path=../ecs-airflow-config/dev-airflow-ecs-services.json

./kms-encrypt --kms-key-id=${kms_key_id} \
    --param=GoogleOAuthClientSecretEnc --secret='changeme' \
    --config-path=../ecs-airflow-config/dev-airflow-ecs-services.json

./kms-encrypt --kms-key-id=${kms_key_id} \
    --param=SMTPPasswordEnc --secret='changeme' \
    --config-path=../ecs-airflow-config/dev-airflow-ecs-services.json
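
A Fernet key is 32 random bytes encoded as URL-safe base64, which comes out to 44 characters. If pulling the puckel/docker-airflow image just to generate one is inconvenient, a key of the same shape can be sketched with standard tools. The function names below are hypothetical, and the sketch assumes /dev/urandom and the head, base64, and tr utilities are available.

```shell
# Hypothetical helpers, not part of this repository.
generate_fernet_key() {
    # 32 random bytes -> base64 -> URL-safe alphabet (+ and / become - and _).
    head -c 32 /dev/urandom | base64 | tr '+/' '-_' | tr -d '\n'
}

looks_like_fernet_key() {
    # 44 characters from the URL-safe base64 alphabet, ending in one '='.
    [ "${#1}" -eq 44 ] || return 1
    case $1 in
        *[!A-Za-z0-9_=-]*) return 1 ;;
        *=) return 0 ;;
        *) return 1 ;;
    esac
}

fernet_key=$(generate_fernet_key)
looks_like_fernet_key "$fernet_key" && echo "key has the expected shape"
```

The shape check only guards against truncated or mangled values before they are passed to kms-encrypt; it does not prove that a key matches the one Airflow is already using.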

Deploy the CloudFormation template:

./deploy-nested-stack airflow-ecs-services \
    dev-airflow-ecs-services ../ecs-airflow-config/dev-airflow-ecs-services.json

NOTE: All changes should be done via the master stack dev-airflow-ecs-services. Do not update or destroy individual stacks within the nested stack as that will make it difficult to manage and deploy changes to the master stack running the ECS services.

Logging

The current ECS deployment for Airflow cannot fetch logs from individual worker tasks. The workers are mapped to random ports on the host machine, whereas the configuration only supports the fixed port 8793. In addition, each worker identifies itself by its internal short hostname, which is the Docker container ID and is not addressable between ECS services.

You will see the following message when trying to view the logs from an Airflow job:

*** Log file isn't local.
*** Fetching here: http://673ee7a2fba0:8793/log/airflow-test/airflow-test-run/2018-07-25T01:00:51.105165/1.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='673ee7a2fba0', port=8793): Max retries exceeded with url: /log/airflow-test/airflow-test-run/2018-07-25T01:00:51.105165/1.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb737ed0ef0>: Failed to establish a new connection: [Errno -2] Name or service not known',))

These logs can be seen in CloudWatch Logs by searching the Log Streams beginning with ecs-service/workers/*.
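
The same lookup can be done from the command line with aws logs describe-log-streams. The sketch below only prints the command so it can be reviewed before it is run; the helper name is hypothetical, and the log group dev-airflow is the CloudWatchLogGroup value from the example configuration.

```shell
# Hypothetical helper: print the AWS CLI command that lists worker log streams.
# $1 = log group (the CloudWatchLogGroup parameter), $2 = stream name prefix
worker_log_streams_cmd() {
    printf 'aws logs describe-log-streams --log-group-name %s --log-stream-name-prefix %s\n' "$1" "$2"
}

worker_log_streams_cmd dev-airflow ecs-service/workers/
# prints aws logs describe-log-streams --log-group-name dev-airflow --log-stream-name-prefix ecs-service/workers/
```

Running the printed command requires AWS credentials with permission to read CloudWatch Logs.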
