Paperboy: A web frontend for scheduling Jupyter notebook reports
Soda Sql: Metric collection, data testing, and monitoring for SQL-accessible data
Airflow Exporter: Airflow plugin to export DAG- and task-based metrics to Prometheus.
Data Science Stack Cookiecutter: 🐳📊🤓 Cookiecutter template to launch an awesome dockerized Data Science toolstack (incl. Jupyter, Superset, Postgres, Minio, Airflow & API Star)
Airflow Chart: A Helm chart to install Apache Airflow on Kubernetes
Beyond Jupyter: 🐍💻📊 All material from the PyCon.DE 2018 Talk "Beyond Jupyter Notebooks - Building your own data science platform with Python & Docker" (incl. Slides, Video, Udemy MOOC & other References)
Airflow Pipeline: An Airflow docker image preconfigured to work well with Spark and Hadoop/EMR
Afctl: afctl helps to manage and deploy Apache Airflow projects faster and smoother.
Whirl: Fast iterative local development and testing of Apache Airflow workflows
Aws Ecs Airflow: Run Airflow in AWS ECS (Elastic Container Service) using Fargate tasks
Dataspherestudio: DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Terraform Aws Airflow: Terraform module to deploy an Apache Airflow cluster on AWS, backed by RDS PostgreSQL for metadata, S3 for logs and SQS as message broker with CeleryExecutor
Discreetly: DiscreETLy is an add-on dashboard service on top of Apache Airflow.
Xene: A distributed workflow runner focusing on performance and simplicity.
Airflow Toolkit: On day 1 of any Airflow project, you can spin up a local desktop Kubernetes Airflow environment and one in Google Cloud Composer with tested data pipelines (DAGs) 🖥 >> [ 🚀, 🚢 ]
Data Pipelines With Apache Airflow: Developed a data pipeline to automate data warehouse ETL by building custom Airflow operators that handle the extraction, transformation, validation, and loading of data from S3 -> Redshift -> S3
Objinsync: Continuously synchronize directories from remote object store to local filesystem
Docker Airflow: Repo for building a Docker-based Airflow image. Containers support multiple features, such as writing logs to a local or S3 folder and initializing GCP while the container boots. https://abhioncbr.github.io/docker-airflow/
Elyra: Elyra extends JupyterLab Notebooks with an AI centric approach.
Goodreads etl pipeline: An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Incubator Dolphinscheduler: Apache DolphinScheduler is a distributed and extensible workflow scheduler platform with powerful DAG visual interfaces, dedicated to solving complex job dependencies in the data pipeline and providing various types of jobs available out of box.
Udacity Data Engineering Projects: A few projects related to Data Engineering, including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake development.
Airflow: Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Dag Factory: Dynamically generate Apache Airflow DAGs from YAML configuration files
helpdesk: Yet another helpdesk based on multiple providers
bigkube: Minikube for big data with Scala and Spark
udacity-data-eng-proj2: A production-grade data pipeline has been designed to automate the parsing of user search patterns to analyze user engagement. Extract data from S3, apply a series of transformations and load into S3 and Redshift.
ecs-airflow: CloudFormation templates for deploying Airflow in ECS