
abhioncbr / Docker Airflow

License: apache-2.0
Repo for building a Docker-based Airflow image. Containers support multiple features, such as writing logs to a local folder or S3 and initializing GCP while the container boots. https://abhioncbr.github.io/docker-airflow/

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Docker Airflow

Docker Superset
Repository for Docker Image of Apache-Superset. [Docker Image: https://hub.docker.com/r/abhioncbr/docker-superset]
Stars: ✭ 86 (+196.55%)
Mutual labels:  redis, celery, distributed-systems
Elasticell
Elastic Key-Value Storage With Strong Consistency and Reliability
Stars: ✭ 453 (+1462.07%)
Mutual labels:  redis, distributed-systems
Airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Stars: ✭ 24,101 (+83006.9%)
Mutual labels:  scheduler, airflow
Bibi
An e-commerce fullstack solution for Flask (a full-stack solution for export e-commerce)
Stars: ✭ 914 (+3051.72%)
Mutual labels:  redis, celery
Zenko
Zenko is the open source multi-cloud data controller: own and keep control of your data on any cloud.
Stars: ✭ 353 (+1117.24%)
Mutual labels:  aws-s3, redis
Docker Django
A complete docker package for deploying django which is easy to understand and deploy anywhere.
Stars: ✭ 378 (+1203.45%)
Mutual labels:  redis, celery
Flower
Real-time monitor and web admin for Celery distributed task queue
Stars: ✭ 5,036 (+17265.52%)
Mutual labels:  redis, celery
Redisson
Redisson - Redis Java client with features of In-Memory Data Grid. Over 50 Redis based Java objects and services: Set, Multimap, SortedSet, Map, List, Queue, Deque, Semaphore, Lock, AtomicLong, Map Reduce, Publish / Subscribe, Bloom filter, Spring Cache, Tomcat, Scheduler, JCache API, Hibernate, MyBatis, RPC, local cache ...
Stars: ✭ 17,972 (+61872.41%)
Mutual labels:  scheduler, redis
Redbeat
RedBeat is a Celery Beat Scheduler that stores the scheduled tasks and runtime metadata in Redis.
Stars: ✭ 639 (+2103.45%)
Mutual labels:  redis, celery
Node Celery
Celery client for Node.js
Stars: ✭ 648 (+2134.48%)
Mutual labels:  redis, celery
Goodreads etl pipeline
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Stars: ✭ 793 (+2634.48%)
Mutual labels:  scheduler, airflow
Dis Seckill
👊 A distributed, high-concurrency flash-sale (seckill) system built with SpringBoot + Zookeeper + Dubbo
Stars: ✭ 315 (+986.21%)
Mutual labels:  redis, distributed-systems
Docker Airflow
Docker Apache Airflow
Stars: ✭ 3,375 (+11537.93%)
Mutual labels:  scheduler, airflow
Enferno
A Python framework based on Flask microframework, with batteries included, and best practices in mind.
Stars: ✭ 385 (+1227.59%)
Mutual labels:  redis, celery
Juicefs
JuiceFS is a distributed POSIX file system built on top of Redis and S3.
Stars: ✭ 4,262 (+14596.55%)
Mutual labels:  redis, distributed-systems
Udacity Data Engineering Projects
Few projects related to Data Engineering including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake development.
Stars: ✭ 458 (+1479.31%)
Mutual labels:  aws-s3, airflow
Springbootunity
rabbitmq, redis, scheduled tasks, socket, mongodb, Swagger2, Spring Data JPA, Thymeleaf, FreeMarker, etc. (a multi-module Spring Boot project covering different business scenarios with different technologies)
Stars: ✭ 845 (+2813.79%)
Mutual labels:  scheduler, redis
django-celery-fulldbresult
Django Celery DB Backend that keeps enough info to retry a task.
Stars: ✭ 37 (+27.59%)
Mutual labels:  distributed-systems, celery
bitnami-docker-airflow-scheduler
Bitnami Docker Image for Apache Airflow Scheduler
Stars: ✭ 19 (-34.48%)
Mutual labels:  airflow, scheduler
Haipproxy
💖 Highly available distributed IP proxy pool, powered by Scrapy and Redis
Stars: ✭ 4,993 (+17117.24%)
Mutual labels:  scheduler, redis

docker-airflow


This is a repository for building a Docker container of Apache Airflow (incubating).

Images

Image: abhioncbr/docker-airflow (pull count and available tags are listed on Docker Hub)

Airflow components stack

  • Airflow version: written as XX.YY.ZZ in the commands below.
  • Execution mode: standalone (a single container for exploration, using SQLite as the Airflow metadata DB and the SequentialExecutor), prod (single-node, using the LocalExecutor and MySQL as the metadata DB), or cluster (for distributed, long-running production use cases; each container runs as either a server or a worker).
  • Backend database: standalone - SQLite; prod & cluster - MySQL.
  • Executor: standalone - SequentialExecutor; prod - LocalExecutor; cluster - CeleryExecutor.
  • Task queue: cluster - Redis.
  • Log location: local file system (default) or AWS S3 (via entrypoint-s3.sh).
  • User authentication: password-based, with support for multiple users with superuser privileges.
  • Code enhancement: a password-based multi-user feature in which a superuser can see all DAGs of all owners. (At the time of writing, Airflow itself was still working on password-based multi-user support.)
  • Other features: support for Google Cloud Platform packages in the container.

Airflow ports

  • Airflow web portal port: 2222
  • Airflow Celery Flower port: 5555
  • Redis port: 6379
  • log-files exchange port: 8793

Airflow services information

  • In the server container: Redis, the Airflow webserver, and the scheduler run.
  • In the worker container: the Airflow worker and the Celery Flower UI run.

How to build images

  • DockerFile takes AIRFLOW_VERSION and AIRFLOW_PATCH_VERSION as build args.
  • Build the image yourself if you want to do some customization -
       docker build -t abhioncbr/docker-airflow:$IMAGE_VERSION --build-arg AIRFLOW_VERSION=$AIRFLOW_VERSION \
                  --build-arg AIRFLOW_PATCH_VERSION=$AIRFLOW_PATCH_VERSION -f ~/docker-airflow/docker-files/DockerFile .

    • Arg IMAGE_VERSION should be the Airflow version, for example 1.10.3 or 1.10.2.
    • Arg AIRFLOW_PATCH_VERSION should be the major release version of Airflow; for example, for 1.10.2 it should be 1.10.
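For a concrete release, both build args can be derived from a single version string. A minimal shell sketch (the 1.10.3 values are examples; the command is only echoed here, matching the build pattern above):

```shell
# Derive build args for a concrete Airflow release (values here are examples).
AIRFLOW_VERSION=1.10.3
IMAGE_VERSION=$AIRFLOW_VERSION                # the image tag matches the Airflow version
AIRFLOW_PATCH_VERSION=${AIRFLOW_VERSION%.*}   # strip the last component: 1.10.3 -> 1.10

# Echo the resulting build command (run it for real from the repo root).
echo docker build -t "abhioncbr/docker-airflow:$IMAGE_VERSION" \
  --build-arg "AIRFLOW_VERSION=$AIRFLOW_VERSION" \
  --build-arg "AIRFLOW_PATCH_VERSION=$AIRFLOW_PATCH_VERSION" \
  -f docker-files/DockerFile .
```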

How to run using Kitematic

  • The simplest way, for exploration purposes, is Kitematic (run containers through a simple yet powerful graphical user interface).
    • Search for the abhioncbr/docker-airflow image on Docker Hub. (screenshot: search-docker-airflow-Kitematic)

    • Start a container through the Kitematic UI. (screenshot: run-docker-airflow-Kitematic)

How to run

  • General commands -

    • starting the airflow image as an airflow-standalone container in standalone mode -

      docker run --net=host -p 2222:2222 --name=airflow-standalone abhioncbr/airflow-XX.YY.ZZ -m=standalone &
      
    • Starting the airflow image as an airflow-server container in cluster mode -

      docker run --net=host -p 2222:2222 -p 6379:6379 --name=airflow-server \
      abhioncbr/airflow-XX.YY.ZZ -m=cluster -t=server -d=mysql://<user>:<password>@<mysql-host>:3306/<db-name> &
      
    • Starting the airflow image as an airflow-worker container in cluster mode -

      docker run --net=host -p 5555:5555 -p 8793:8793 --name=airflow-worker \
      abhioncbr/airflow-XX.YY.ZZ -m=cluster -t=worker -d=mysql://<user>:<password>@<mysql-host>:3306/<db-name> -r=redis://<airflow-server-host>:6379/0 &
      
  • On a Mac, using Docker for Mac -

    • Standalone mode - starting the airflow image in standalone mode, mounting the dags, code-artifacts & logs folders to the host machine -

      docker run -p 2222:2222 --name=airflow-standalone \
      -v ~/airflow-data/code-artifacts:/code-artifacts \
      -v ~/airflow-data/logs:/usr/local/airflow/logs \
      -v ~/airflow-data/dags:/usr/local/airflow/dags \
      abhioncbr/airflow-XX.YY.ZZ -m=standalone &
      
    • Cluster Mode

      • starting the airflow image as a server container, mounting the dags, code-artifacts & logs folders to the host machine -

        docker run -p 2222:2222 -p 6379:6379 --name=airflow-server \
        -v ~/airflow-data/code-artifacts:/code-artifacts \
        -v ~/airflow-data/logs:/usr/local/airflow/logs \
        -v ~/airflow-data/dags:/usr/local/airflow/dags \
        abhioncbr/airflow-XX.YY.ZZ \
        -m=cluster -t=server -d=mysql://<user>:<password>@<mysql-host>:3306/<airflow-db-name> &
        
      • starting the airflow image as a worker container, mounting the dags, code-artifacts & logs folders to the host machine -

        docker run -p 5555:5555 -p 8793:8793 --name=airflow-worker \
        -v ~/airflow-data/code-artifacts:/code-artifacts \
        -v ~/airflow-data/logs:/usr/local/airflow/logs \
        -v ~/airflow-data/dags:/usr/local/airflow/dags \
        abhioncbr/airflow-XX.YY.ZZ \
        -m=cluster -t=worker -d=mysql://<user>:<password>@<mysql-host>:3306/<airflow-db-name> -r=redis://host.docker.internal:6379/0 &
        
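To scale the cluster, additional worker containers are started against the same metadata DB and Redis instance. A hedged sketch that only prints a `docker run` command per worker (SERVER_HOST, the DB URL, and the 1.10.3 tag are illustrative placeholders, not values from this repo):

```shell
# Sketch: generate one `docker run` command per worker container.
# SERVER_HOST, DB_URL, and the image tag below are illustrative placeholders.
SERVER_HOST=10.0.0.5
DB_URL="mysql://user:password@db-host:3306/airflow"
CMDS=$(for i in 1 2 3; do
  echo "docker run --net=host -p 5555:5555 -p 8793:8793 --name=airflow-worker-$i" \
       "abhioncbr/airflow-1.10.3 -m=cluster -t=worker -d=$DB_URL" \
       "-r=redis://$SERVER_HOST:6379/0 &"
done)
printf '%s\n' "$CMDS"
```

Each worker publishes 5555 (Flower) and 8793 (log-file exchange), mirroring the single-worker command above.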


Distributed execution of airflow

  • As mentioned above, the Airflow docker image can be leveraged to run Airflow in a completely distributed setup:
    • a single docker-airflow container in server mode, serving the Airflow UI, Redis for Celery tasks, and the scheduler.
    • multiple docker-airflow containers in worker mode, executing tasks using the Celery executor.
    • a centralised Airflow metadata database.
  • The image below depicts the docker-airflow distributed platform: (diagram: Distributed-Airflow)
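The same topology can be written down as a docker-compose file. This is only a config-fragment sketch under assumptions: the service names, MySQL container, and credentials are illustrative and not part of this repo, and it assumes the image's entrypoint accepts the -m/-t/-d/-r flags documented above.

```yaml
# Hypothetical docker-compose sketch of the distributed setup described above.
version: "3"
services:
  mysql:                       # centralised Airflow metadata database (illustrative)
    image: mysql:5.7
    environment:
      MYSQL_DATABASE: airflow
      MYSQL_USER: airflow
      MYSQL_PASSWORD: airflow
      MYSQL_ROOT_PASSWORD: root
  server:                      # webserver + scheduler + Redis
    image: abhioncbr/airflow-1.10.3
    command: ["-m=cluster", "-t=server", "-d=mysql://airflow:airflow@mysql:3306/airflow"]
    ports: ["2222:2222", "6379:6379"]
    depends_on: [mysql]
  worker:                      # Celery worker + Flower UI; scale with `--scale worker=N`
    image: abhioncbr/airflow-1.10.3
    command: ["-m=cluster", "-t=worker", "-d=mysql://airflow:airflow@mysql:3306/airflow", "-r=redis://server:6379/0"]
    ports: ["5555:5555", "8793:8793"]
    depends_on: [server]
```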