villasv / Aws Airflow Stack

Licence: mit
Turbine: the bare metals that get you Airflow

Projects that are alternatives of or similar to Aws Airflow Stack

Aws Ecs Airflow
Run Airflow in AWS ECS(Elastic Container Service) using Fargate tasks
Stars: ✭ 107 (-69.6%)
Mutual labels:  aws, airflow
Cloudformation Cli
The CloudFormation Provider Development Toolkit allows you to author your own resource providers and modules that can be used by CloudFormation.
Stars: ✭ 149 (-57.67%)
Mutual labels:  aws, aws-cloudformation
Cfn Python Lint
CloudFormation Linter
Stars: ✭ 1,770 (+402.84%)
Mutual labels:  aws, aws-cloudformation
Cform Vscode
CloudFormation extension for Visual Studio Code
Stars: ✭ 73 (-79.26%)
Mutual labels:  aws, aws-cloudformation
Docs
Rapid CloudFormation: Modular, production ready, open source.
Stars: ✭ 209 (-40.62%)
Mutual labels:  aws, aws-cloudformation
Perun
A command-line validation tool for AWS Cloud Formation that allows to conquer the cloud faster!
Stars: ✭ 82 (-76.7%)
Mutual labels:  aws, aws-cloudformation
Cfn Secret Provider
A CloudFormation custom resource provider for deploying secrets and keys
Stars: ✭ 125 (-64.49%)
Mutual labels:  aws, aws-cloudformation
Airflow On Kubernetes
Bare minimal Airflow on Kubernetes (Local, EKS, AKS)
Stars: ✭ 38 (-89.2%)
Mutual labels:  aws, airflow
Aws Cf Templates
A cloudonaut.io project. Engineered by widdix.
Stars: ✭ 2,399 (+581.53%)
Mutual labels:  aws, aws-cloudformation
Learn Cloudformation
Learn how to use Infrastructure as Code on AWS with the help of CloudFormation.
Stars: ✭ 191 (-45.74%)
Mutual labels:  aws, aws-cloudformation
Terraform Aws Airflow
Terraform module to deploy an Apache Airflow cluster on AWS, backed by RDS PostgreSQL for metadata, S3 for logs and SQS as message broker with CeleryExecutor
Stars: ✭ 69 (-80.4%)
Mutual labels:  aws, airflow
Deeplearning Cfn
Distributed Deep Learning on AWS Using CloudFormation (CFN), MXNet and TensorFlow
Stars: ✭ 252 (-28.41%)
Mutual labels:  aws, aws-cloudformation
Cfn Create Or Update
Create or update CloudFormation stack also if no updates are to be performed.
Stars: ✭ 59 (-83.24%)
Mutual labels:  aws, aws-cloudformation
Udacity Data Engineering
Udacity Data Engineering Nano Degree (DEND)
Stars: ✭ 89 (-74.72%)
Mutual labels:  aws, airflow
Quickstart Taskcat Ci
AWS Quick Start Team
Stars: ✭ 57 (-83.81%)
Mutual labels:  aws, aws-cloudformation
Cloudformation
Some CF templates
Stars: ✭ 123 (-65.06%)
Mutual labels:  aws, aws-cloudformation
Aws Auto Terminate Idle Emr
AWS Auto Terminate Idle AWS EMR Clusters Framework is an AWS based solution using AWS CloudWatch and AWS Lambda using a Python script that is using Boto3 to terminate AWS EMR clusters that have been idle for a specified period of time.
Stars: ✭ 21 (-94.03%)
Mutual labels:  aws, aws-cloudformation
Aws Unifi Controller
Example of a Ubiquiti Unifi Controller in AWS using Network Load Balancer for TLS termination
Stars: ✭ 37 (-89.49%)
Mutual labels:  aws, aws-cloudformation
Aws Iam Generator
Generate Multi-Account IAM users/groups/roles/policies from a simple YAML configuration file and Jinja2 templates.
Stars: ✭ 191 (-45.74%)
Mutual labels:  aws, aws-cloudformation
Aws Lambda Typescript
This sample uses the Serverless Application Framework to implement an AWS Lambda function in TypeScript, deploy it via CloudFormation, publish it through API Gateway to a custom domain registered on Route53, and document it with Swagger.
Stars: ✭ 228 (-35.23%)
Mutual labels:  aws, aws-cloudformation

Turbine

Turbine is the set of bare metals behind a simple yet complete and efficient Airflow setup.

The project is intended to be easily deployed, making it great for testing, demos and showcasing Airflow solutions. It is also expected to be easily tinkered with, allowing it to be used in real production environments with little extra effort. Deploy in a few clicks, personalize in a few fields, configure in a few commands.

Overview

[Figure: stack diagram]

The stack is composed mainly of three services: the Airflow web server, the Airflow scheduler, and the Airflow worker. Supporting resources include an RDS instance hosting the Airflow metadata database, an SQS queue used as the Celery broker backend, S3 buckets for logs and deployment bundles, an EFS file system serving as a shared directory, and a custom CloudWatch metric computed by a periodically scheduled AWS Lambda function. All other resources are the usual boilerplate to keep the wind blowing.

Deployment and File Sharing

The deployment process through CodeDeploy is very flexible and can be tailored to each project's structure, the only invariant being the Airflow home directory at /airflow. It ensures that every Airflow process has the same files and can be upgraded gracefully, but most importantly, it makes deployments fast and easy to get started with.

There's also an EFS shared directory mounted at /mnt/efs, which can be useful for staging files used by workers on different machines, and for other synchronization scenarios commonly found in ETL/Big Data applications. It also eases migrating legacy workloads that are not yet ready to run on distributed workers.
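For the staging use case, one common pattern is to write into the shared directory under a temporary name and then rename the file into place, so a worker on another machine never reads a half-copied file. The sketch below assumes the /mnt/efs mount point (overridable through a hypothetical EFS_DIR variable) and a hypothetical staging/ subdirectory. It relies on rename being atomic, which POSIX guarantees locally and which you should confirm holds for your EFS access pattern.

```shell
#!/usr/bin/env bash
# Hypothetical helper: publish a file into the shared EFS directory atomically.
# EFS_DIR and the "staging" subdirectory are assumptions, not stack conventions.
EFS_DIR="${EFS_DIR:-/mnt/efs}"

stage_file() {
  local src="$1" name="$2"
  local dest_dir="$EFS_DIR/staging"
  local tmp
  mkdir -p "$dest_dir" || return 1
  # Create the temporary file inside the destination directory so the final
  # rename stays on the same file system.
  tmp="$(mktemp "$dest_dir/.${name}.XXXXXX")" || return 1
  if ! cp "$src" "$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  # mktemp creates files with mode 0600; widen it so other users can read.
  chmod 0644 "$tmp"
  # Rename into place: readers see either no file, the old file, or the
  # complete new one, never a partial copy.
  mv -f "$tmp" "$dest_dir/$name"
}
```

On one worker, stage_file /tmp/extract.csv extract.csv publishes the file; workers on other machines then read /mnt/efs/staging/extract.csv.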

Workers and Auto Scaling

The stack includes an estimate of the cluster load average, derived from the number of failed attempts to retrieve a task from the queue. The metric's objective is to measure whether the cluster is correctly sized for the influx of tasks. Worker instances have lifecycle hooks that promote a graceful shutdown, waiting for running tasks to complete before the instance terminates.
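The README does not give the exact formula behind the load metric. As a simplified sketch, assume load is the share of receive attempts that actually returned a message, computed from the standard SQS CloudWatch metrics NumberOfMessagesReceived and NumberOfEmptyReceives. Turbine's actual Lambda may compute it differently; the helper names are hypothetical, and the date calls assume GNU date (as found on Amazon Linux).

```shell
#!/usr/bin/env bash
# Simplified, hypothetical load estimate; not Turbine's actual Lambda code.

# Sum one AWS/SQS metric over a single 5-minute period that ended 10 minutes
# ago, since the most recent SQS datapoints are not yet stable.
sqs_metric_sum() {
  local queue="$1" metric="$2"
  aws cloudwatch get-metric-statistics \
    --namespace AWS/SQS --metric-name "$metric" \
    --dimensions "Name=QueueName,Value=$queue" \
    --start-time "$(date -u -d '15 minutes ago' +%Y-%m-%dT%H:%M:%SZ)" \
    --end-time "$(date -u -d '10 minutes ago' +%Y-%m-%dT%H:%M:%SZ)" \
    --period 300 --statistics Sum \
    --query 'Datapoints[0].Sum' --output text
}

# Load = share of receive attempts that returned a message.
# With no receive attempts at all (e.g. zero workers) this reports 0, so a
# real implementation would also need to look at the queue backlog.
cluster_load() {
  local empty="$1" received="$2"
  awk -v e="$empty" -v r="$received" 'BEGIN {
    total = e + r
    if (total == 0) { print "0.00"; exit }
    printf "%.2f\n", r / total
  }'
}
```

For example, 30 empty receives and 10 successful ones give cluster_load 30 10, which prints 0.25: workers spent three quarters of their polls finding nothing, suggesting the cluster is oversized for the current influx.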

The goal of the auto scaling feature is to respond to changes in queue load: an idle cluster becoming active or a busy cluster becoming idle, the start or end of a backfill, many DAGs with similar schedules hitting their due time, or DAGs that branch out into many parallel operators. Scaling in response to machine resource usage, such as CPU-intensive tasks, is not the goal; that is a very advanced scenario, best handled by Celery's own scaling mechanism or by offloading the computation to another system (like Spark or Kubernetes) and using Airflow only for orchestration.

Get It Working

0. Prerequisites

  • A configured AWS CLI, needed for deploying your own files (see the AWS CLI configuration guide)

1. Deploy the stack

Create a new stack using the latest template definition at templates/turbine-master.template. The following button will deploy the stack available in this project's master branch (defaults to your last used region):

Launch

The stack resources take around 15 minutes to create, and the Airflow installation and bootstrap take another 3 to 5 minutes. After that, you can access the Airflow UI and deploy your own Airflow DAGs.
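If you prefer the command line to the Launch button, the stack can also be created with the AWS CLI. This is a sketch under a few assumptions: you run it from a checkout of the repository, every template parameter has an acceptable default (otherwise add --parameters ParameterKey=...,ParameterValue=... pairs), and the template fits the 51,200-byte limit of --template-body (otherwise upload it to S3 and pass --template-url). The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the Turbine stack from a local template file.
create_turbine_stack() {
  local stack_name="$1"
  local template="${2:-templates/turbine-master.template}"
  # Acknowledge that the template may create IAM resources or use macros;
  # capabilities the template does not need are simply unused.
  aws cloudformation create-stack \
    --stack-name "$stack_name" \
    --template-body "file://$template" \
    --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM CAPABILITY_AUTO_EXPAND \
    || return 1
  # Block until creation finishes (expect roughly 15 minutes).
  aws cloudformation wait stack-create-complete --stack-name "$stack_name"
}
```

Running create_turbine_stack yourcoolstackname creates the stack in the CLI's configured region and returns once CloudFormation reports CREATE_COMPLETE.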

2. Upstream your files

The only requirement is that you configure the deployment to copy your Airflow home directory to /airflow. After crafting your appspec.yml, you can use the AWS CLI to deploy your project.
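As a rough illustration of those two steps, the sketch below writes a minimal appspec.yml that copies the whole revision into /airflow (only if you don't already have one), then bundles and deploys it with the AWS CLI's CodeDeploy commands. It is not the project's Makefile: the application name, deployment group and bucket are hypothetical arguments that you would look up among the resources your stack created, and the project directory is assumed to hold the contents of your Airflow home.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: minimal appspec plus a manual CodeDeploy rollout.

write_appspec() {
  local project_dir="${1:-.}"
  cat > "$project_dir/appspec.yml" <<'EOF'
version: 0.0
os: linux
files:
  # Copy every file in the revision into the stack's fixed Airflow home.
  - source: /
    destination: /airflow
EOF
}

deploy_revision() {
  local project_dir="$1" app="$2" group="$3" bucket="$4"
  local key
  key="airflow-deploy-$(date +%s).zip"
  # Keep an existing appspec.yml; only generate the minimal one if missing.
  [ -f "$project_dir/appspec.yml" ] || write_appspec "$project_dir" || return 1
  # Zip the directory and upload it to S3 as a CodeDeploy revision...
  aws deploy push --application-name "$app" \
    --s3-location "s3://$bucket/$key" --source "$project_dir" || return 1
  # ...then roll that revision out to the deployment group.
  aws deploy create-deployment --application-name "$app" \
    --deployment-group-name "$group" \
    --s3-location "bucket=$bucket,key=$key,bundleType=zip"
}
```

A call would look like deploy_revision ./airflow my-app my-deployment-group my-bundles-bucket, with the three names replaced by the ones from your stack.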

For convenience, you can use this Makefile to handle the packaging, upload and deployment commands. A minimal working example of an Airflow project to deploy can be found at examples/project/airflow.

If you follow this blueprint, a deployment is as simple as:

make deploy stack-name=yourcoolstackname

Maintenance and Operation

Sometimes cluster operators will want to perform additional setup, debug, or just inspect the Airflow services and database. The stack is designed to minimize this need, but just in case, it also offers decent internal tooling for those scenarios.

Using Systems Manager Sessions

Instead of the usual SSH procedure, this stack encourages the use of AWS Systems Manager Sessions for increased security and auditing capabilities. You can still work from the command line after a bit more configuration, and not having to expose your instances or create bastion hosts is worth the effort. You can read more about it in the Session Manager docs.
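A minimal sketch of that workflow, assuming the Session Manager plugin for the AWS CLI is installed and that the instances carry the aws:cloudformation:stack-name tag CloudFormation normally applies. Because nested stacks typically get names prefixed with the parent stack's name, the filter uses a trailing wildcard. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reaching stack instances without SSH.

# List running instances whose stack-name tag starts with the given name.
list_stack_instances() {
  local stack_name="$1"
  aws ec2 describe-instances \
    --filters "Name=tag:aws:cloudformation:stack-name,Values=${stack_name}*" \
              "Name=instance-state-name,Values=running" \
    --query 'Reservations[].Instances[].InstanceId' --output text
}

# Open an interactive shell on one instance through Session Manager.
open_session() {
  aws ssm start-session --target "$1"
}
```

For example, list_stack_instances yourcoolstackname prints the instance IDs, and open_session with one of them drops you into a shell on that machine.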

Running Airflow commands

The environment variables used by the Airflow service are not immediately available in the shell. Before running Airflow commands, you need to load the Airflow configuration:

$ export $(xargs </etc/sysconfig/airflow.env)
$ airflow list_dags
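The xargs form above works for simple KEY=value lines, but it splits any value containing spaces into separate words. An alternative sketch, assuming the file holds shell-compatible KEY=value assignments, is to source it with automatic export enabled (load_airflow_env is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# Hypothetical helper: export every variable defined in the Airflow env file.
load_airflow_env() {
  local env_file="${1:-/etc/sysconfig/airflow.env}"
  [ -r "$env_file" ] || return 1
  set -a          # mark every variable assigned from now on for export
  . "$env_file"   # assumes plain shell-compatible KEY=value lines
  set +a
}
```

After load_airflow_env, Airflow commands such as airflow list_dags run with the same configuration as the service.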

Inspecting service logs

The Airflow service runs under systemd, so logs are available through journalctl. The most commonly used arguments include --follow, to keep the logs coming, and --no-pager, to dump the text lines directly, but it offers much more.

$ sudo journalctl -u airflow -n 50
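Building on those flags, here is a small sketch for triaging recent problems, assuming the unit is named airflow as in the command above. The helper name and the error/exception pattern are illustrative choices; for live tailing, sudo journalctl -u airflow --follow remains the simplest option.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print recent Airflow log lines mentioning errors.
airflow_errors_since() {
  local since="${1:-1 hour ago}"
  # --no-pager dumps plain text so it can be piped into grep.
  sudo journalctl -u airflow --since "$since" --no-pager \
    | grep -iE 'error|exception'
}
```

For example, airflow_errors_since "30 min ago" narrows the window to the last half hour.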

FAQ

  1. Why does auto scaling take so long to kick in?

    AWS doesn't provide minute-level granularity on SQS metrics, only 5-minute aggregates. Also, CloudWatch stamps aggregate metrics with their initial timestamp, meaning that the latest stable SQS metrics are from 10 minutes in the past. This is why the load metric is always 5 to 10 minutes delayed. To avoid oscillating allocations, the alarm action has a 10-minute cooldown.

  2. Why can't I stop running tasks by terminating all workers?

    Workers have lifecycle hooks that make sure to wait for Celery to finish its tasks before allowing EC2 to terminate that instance (except maybe for Spot Instances going out of capacity). If you want to kill running tasks, you will need to SSH into worker instances and stop the airflow service forcefully.

  3. Is there any documentation around the architectural decisions?

    Yes, most of them should be available in the project's GitHub Wiki. That doesn't mean those decisions are final, but reading them beforehand will help in formulating new proposals.
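Regarding the second question: since the stack favors Session Manager over SSH, the same forced stop can also be sent through Systems Manager Run Command. This sketch assumes the SSM agent on the workers accepts Run Command, that the service unit is named airflow as in the journalctl example, and that skipping Celery's graceful wait is really what you want; the helper name is hypothetical. Queuing a stop job before the SIGKILL keeps systemd from restarting the unit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: forcefully stop Airflow on one worker via Run Command.
# AWS-RunShellScript runs commands as root, so no sudo is needed.
force_stop_airflow() {
  local instance_id="$1"
  # Queue a stop job without waiting for Celery's warm shutdown, then kill
  # every process in the unit right away. Prints the Run Command ID.
  aws ssm send-command \
    --instance-ids "$instance_id" \
    --document-name AWS-RunShellScript \
    --parameters 'commands=["systemctl stop --no-block airflow","systemctl kill --signal=SIGKILL airflow"]' \
    --query 'Command.CommandId' --output text
}
```

The printed command ID can then be passed to aws ssm list-command-invocations --command-id with --details to check the outcome on the instance.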

Contributing

This project aims to evolve constantly with up-to-date tooling and newer AWS features, while improving its design qualities and maintainability. Requests for Enhancement should be abundant and anyone is welcome to pick them up.

Stacks can get quite opinionated. If you have a divergent fork, you may open a Request for Comments and we will index it. Hopefully this will help to build a diverse set of possible deployment models for various production needs.

See the contribution guidelines for details.

You may also want to take a look at the Citizen Code of Conduct.

Did this project help you? Consider buying me a cup of coffee ;-)

Licensing

MIT License

Copyright (c) 2017 Victor Villas

See the license file for details.