
docker-airflow

This repository contains the Dockerfile of apache-airflow for Docker's automated build, published to the public Docker Hub Registry.

Information

Installation

Pull the image from the Docker repository.

docker pull puckel/docker-airflow

Build

Optionally install extra Airflow packages and/or Python dependencies at build time:

docker build --rm --build-arg AIRFLOW_DEPS="datadog,dask" -t puckel/docker-airflow .
docker build --rm --build-arg PYTHON_DEPS="flask_oauthlib>=0.9" -t puckel/docker-airflow .

or combined:

docker build --rm --build-arg AIRFLOW_DEPS="datadog,dask" --build-arg PYTHON_DEPS="flask_oauthlib>=0.9" -t puckel/docker-airflow .

Don't forget to update the airflow images in the docker-compose files to puckel/docker-airflow:latest.
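
For example, with GNU sed this can be done in one line (a sketch; adjust the file name to the compose file you actually use, and use sed -i '' on macOS):

sed -i 's|image: puckel/docker-airflow:.*|image: puckel/docker-airflow:latest|' docker-compose-LocalExecutor.yml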

Usage

By default, docker-airflow runs Airflow with SequentialExecutor:

docker run -d -p 8080:8080 puckel/docker-airflow webserver

If you want to run another executor, use the other docker-compose.yml files provided in this repository.

For LocalExecutor:

docker-compose -f docker-compose-LocalExecutor.yml up -d

For CeleryExecutor:

docker-compose -f docker-compose-CeleryExecutor.yml up -d

NB: Example DAGs are not loaded by default (LOAD_EX=n). If you want them loaded, set the following environment variable:

LOAD_EX=y

docker run -d -p 8080:8080 -e LOAD_EX=y puckel/docker-airflow

If you want to use Ad Hoc Query, make sure you've configured the connections: go to Admin -> Connections, edit "postgres_default", and set these values (matching those in airflow.cfg/docker-compose*.yml):

  • Host: postgres
  • Schema: airflow
  • Login: airflow
  • Password: airflow

For encrypted connection passwords (with the Local or Celery executor), all containers must share the same fernet_key. By default docker-airflow generates a fernet_key at startup, so you have to set an environment variable in the docker-compose file (e.g. docker-compose-LocalExecutor.yml) to use the same key across containers. To generate a fernet_key:

docker run puckel/docker-airflow python -c "from cryptography.fernet import Fernet; FERNET_KEY = Fernet.generate_key().decode(); print(FERNET_KEY)"
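
For example, one way to generate a key once and reuse it across containers (a sketch using plain docker run; the compose files expose the same setting through their environment sections):

FERNET_KEY=$(docker run --rm puckel/docker-airflow python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())")
docker run -d -p 8080:8080 -e AIRFLOW__CORE__FERNET_KEY="$FERNET_KEY" puckel/docker-airflow webserver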

Configuring Airflow

It's possible to set any Airflow configuration value from environment variables; these take precedence over values from airflow.cfg.

The general rule is the environment variable should be named AIRFLOW__<section>__<key>, for example AIRFLOW__CORE__SQL_ALCHEMY_CONN sets the sql_alchemy_conn config option in the [core] section.

Check out the Airflow documentation for more details.
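
For example, to override a couple of [core] options when starting the webserver (a sketch; the values shown are only illustrative):

docker run -d -p 8080:8080 \
  -e AIRFLOW__CORE__SQL_ALCHEMY_CONN="postgresql+psycopg2://airflow:airflow@postgres:5432/airflow" \
  -e AIRFLOW__CORE__LOAD_EXAMPLES=False \
  puckel/docker-airflow webserver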

You can also define connections via environment variables by prefixing them with AIRFLOW_CONN_ - for example, AIRFLOW_CONN_POSTGRES_MASTER=postgres://user:password@localhost:5432/master for a connection called "postgres_master". The value is parsed as a URI. This works for hooks etc., but the connection won't show up in the "Ad-hoc Query" section unless an (empty) connection is also created in the DB.
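
For example, passing such a connection to a container (a sketch; "postgres_master" and its credentials are only illustrative):

docker run -d -p 8080:8080 \
  -e AIRFLOW_CONN_POSTGRES_MASTER="postgres://user:password@localhost:5432/master" \
  puckel/docker-airflow webserver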

Custom Airflow plugins

Airflow allows custom user-created plugins, which are typically placed in the ${AIRFLOW_HOME}/plugins folder. Documentation on plugins can be found here.

To incorporate plugins into your Docker container:

  • Create the folder plugins/ containing your custom plugins.
  • Mount the folder as a volume by doing either of the following:
    • Include the folder as a volume on the command line: -v $(pwd)/plugins/:/usr/local/airflow/plugins (see the example below)
    • Use docker-compose-LocalExecutor.yml or docker-compose-CeleryExecutor.yml, which already mount the plugins folder as a volume
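
For example, with plain docker run (a sketch; $(pwd)/plugins is the folder created above):

docker run -d -p 8080:8080 \
  -v $(pwd)/plugins/:/usr/local/airflow/plugins \
  puckel/docker-airflow webserver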

Install custom Python packages

  • Create a file requirements.txt with the desired Python modules
  • Mount this file as a volume: -v $(pwd)/requirements.txt:/requirements.txt (or add it as a volume in the docker-compose file), as shown in the example below
  • The entrypoint.sh script executes pip install on it (with the --user option)
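
For example, with plain docker run (a sketch; requirements.txt is assumed to sit in the current directory):

docker run -d -p 8080:8080 \
  -v $(pwd)/requirements.txt:/requirements.txt \
  puckel/docker-airflow webserver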

UI Links

  • Airflow: localhost:8080
  • Flower (CeleryExecutor): localhost:5555

Scale the number of workers

Easy scaling using docker-compose:

docker-compose -f docker-compose-CeleryExecutor.yml scale worker=5

This can be used to scale to a multi-node setup using Docker Swarm.

Running other airflow commands

If you want to run other airflow sub-commands, such as list_dags or clear, you can do so like this:

docker run --rm -ti puckel/docker-airflow airflow list_dags

or with your docker-compose set up like this:

docker-compose -f docker-compose-CeleryExecutor.yml run --rm webserver airflow list_dags

You can also use this to run a bash shell or any other command in the same environment that airflow would be run in:

docker run --rm -ti puckel/docker-airflow bash
docker run --rm -ti puckel/docker-airflow ipython

Simplified SQL database configuration using PostgreSQL

If the executor type is set to anything other than SequentialExecutor, you'll need an SQL database. Here is a list of PostgreSQL configuration variables and their default values. They're used to compute the AIRFLOW__CORE__SQL_ALCHEMY_CONN and AIRFLOW__CELERY__RESULT_BACKEND variables for you, when needed, if you don't provide them explicitly:

Variable            Default value   Role
POSTGRES_HOST       postgres        Database server host
POSTGRES_PORT       5432            Database server port
POSTGRES_USER       airflow         Database user
POSTGRES_PASSWORD   airflow         Database password
POSTGRES_DB         airflow         Database name
POSTGRES_EXTRAS     empty           Extras parameters

You can also use those variables to adapt your compose file to match an existing PostgreSQL instance managed elsewhere.
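
For example, pointing the image at an external PostgreSQL instance might look like this (a sketch: the host and credentials are illustrative, EXECUTOR=Local is how the provided compose files select the executor, and the comment shows roughly the connection string the entrypoint derives from these variables):

docker run -d -p 8080:8080 \
  -e EXECUTOR=Local \
  -e POSTGRES_HOST=db.example.com \
  -e POSTGRES_USER=airflow \
  -e POSTGRES_PASSWORD=secret \
  -e POSTGRES_DB=airflow \
  puckel/docker-airflow webserver
# roughly: AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:secret@db.example.com:5432/airflow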

Please refer to the Airflow documentation to understand the use of extras parameters, for example in order to configure a connection that uses TLS encryption.

Here's an important thing to consider:

When specifying the connection as URI (in AIRFLOW_CONN_* variable) you should specify it following the standard syntax of DB connections, where extras are passed as parameters of the URI (note that all components of the URI should be URL-encoded).

Therefore you must provide extras parameters URL-encoded, starting with a leading ?. For example:

POSTGRES_EXTRAS="?sslmode=verify-full&sslrootcert=%2Fetc%2Fssl%2Fcerts%2Fca-certificates.crt"

Simplified Celery broker configuration using Redis

If the executor type is set to CeleryExecutor, you'll need a Celery broker. Here is a list of Redis configuration variables and their default values. They're used to compute the AIRFLOW__CELERY__BROKER_URL variable for you if you don't provide it explicitly:

Variable         Default value   Role
REDIS_PROTO      redis://        Protocol
REDIS_HOST       redis           Redis server host
REDIS_PORT       6379            Redis server port
REDIS_PASSWORD   empty           If Redis is password protected
REDIS_DBNUM      1               Database number

You can also use those variables to adapt your compose file to match an existing Redis instance managed elsewhere.
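
For example, pointing a Celery worker at an external, password-protected Redis might look like this (a sketch with illustrative values; the comment shows roughly the broker URL the entrypoint computes):

docker run -d \
  -e EXECUTOR=Celery \
  -e REDIS_HOST=redis.example.com \
  -e REDIS_PASSWORD=secret \
  puckel/docker-airflow worker
# roughly: AIRFLOW__CELERY__BROKER_URL=redis://:secret@redis.example.com:6379/1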

Wanna help?

Fork, improve and PR.
