AirflowETLBlog post on ETL pipelines with Airflow
Stars: ✭ 20 (-85.29%)
Soda SqlMetric collection, data testing and monitoring for SQL accessible data
Stars: ✭ 173 (+27.21%)
airflow-dbt-pythonA collection of Airflow operators, hooks, and utilities to elevate dbt to a first-class citizen of Airflow.
Stars: ✭ 111 (-18.38%)
viewflowViewflow is an Airflow-based framework that allows data scientists to create data models without writing Airflow code.
Stars: ✭ 110 (-19.12%)
polygon-etlETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub
Stars: ✭ 53 (-61.03%)
jobAnalytics and searchJobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
Stars: ✭ 25 (-81.62%)
Goodreads etl pipelineAn end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Stars: ✭ 793 (+483.09%)
Udacity Data Engineering ProjectsFew projects related to Data Engineering including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake development.
Stars: ✭ 458 (+236.76%)
Data Pipelines With Apache AirflowDeveloped a data pipeline to automate data warehouse ETL by building custom airflow operators that handle the extraction, transformation, validation and loading of data from S3 -> Redshift -> S3
Stars: ✭ 50 (-63.24%)
Applied Ml📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
Stars: ✭ 17,824 (+13005.88%)
Dbt Sqlserverdbt adapter for SQL Server and Azure SQL
Stars: ✭ 41 (-69.85%)
ObjinsyncContinuously synchronize directories from remote object store to local filesystem
Stars: ✭ 29 (-78.68%)
D6t PythonAccelerate data science
Stars: ✭ 118 (-13.24%)
Docker AirflowRepo for building docker based airflow image. Containers support multiple features like writing logs to local or S3 folder and Initializing GCP while container booting. https://abhioncbr.github.io/docker-airflow/
Stars: ✭ 29 (-78.68%)
ElyraElyra extends JupyterLab Notebooks with an AI centric approach.
Stars: ✭ 839 (+516.91%)
SetlA simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-41.91%)
ButterfreeA tool for building feature stores.
Stars: ✭ 126 (-7.35%)
Just Dashboard📊 📋 Dashboards using YAML or JSON files
Stars: ✭ 1,511 (+1011.03%)
CookbookThe Data Engineering Cookbook
Stars: ✭ 9,829 (+7127.21%)
PrefectThe easiest way to automate your data
Stars: ✭ 7,956 (+5750%)
Argo WorkflowsWorkflow engine for Kubernetes
Stars: ✭ 10,024 (+7270.59%)
QuiltQuilt is a self-organizing data hub for S3
Stars: ✭ 1,007 (+640.44%)
Spark AlchemyCollection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
Stars: ✭ 122 (-10.29%)
PipelinexPipelineX: Python package to build ML pipelines for experimentation with Kedro, MLflow, and more
Stars: ✭ 127 (-6.62%)
Data Science On GcpSource code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Stars: ✭ 864 (+535.29%)
DatabookA facebook for data
Stars: ✭ 26 (-80.88%)
Afctlafctl helps to manage and deploy Apache Airflow projects faster and smoother.
Stars: ✭ 116 (-14.71%)
LakefsGit-like capabilities for your object storage
Stars: ✭ 847 (+522.79%)
SaynData processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Stars: ✭ 79 (-41.91%)
Pyspark Example ProjectExample project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (+365.44%)
Airflow PipelineAn Airflow docker image preconfigured to work well with Spark and Hadoop/EMR
Stars: ✭ 128 (-5.88%)
PyjanitorClean APIs for data cleaning. Python implementation of R package Janitor
Stars: ✭ 647 (+375.74%)
DataspherestudioDataSphereStudio is a one stop data application development& management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (+778.68%)
WhirlFast iterative local development and testing of Apache Airflow workflows
Stars: ✭ 111 (-18.38%)
Incubator DolphinschedulerApache DolphinScheduler is a distributed and extensible workflow scheduler platform with powerful DAG visual interfaces, dedicated to solving complex job dependencies in the data pipeline and providing various types of jobs available out of box.
Stars: ✭ 6,916 (+4985.29%)
Terraform Aws AirflowTerraform module to deploy an Apache Airflow cluster on AWS, backed by RDS PostgreSQL for metadata, S3 for logs and SQS as message broker with CeleryExecutor
Stars: ✭ 69 (-49.26%)
PointblankData validation and organization of metadata for data frames and database tables
Stars: ✭ 480 (+252.94%)
Ansible PlaybookAnsible playbook to deploy distributed technologies
Stars: ✭ 61 (-55.15%)
Data Engineering BookAccumulated knowledge and experience in the field of Data Engineering
Stars: ✭ 471 (+246.32%)
AirflowApache Airflow - A platform to programmatically author, schedule, and monitor workflows
Stars: ✭ 24,101 (+17621.32%)
Aws Data WranglerPandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Stars: ✭ 2,385 (+1653.68%)
WaimakWaimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
Stars: ✭ 60 (-55.88%)
Active workflowTurn complex requirements to workflows without leaving the comfort of your technology stack.
Stars: ✭ 413 (+203.68%)
DiscreetlyETLy is an add-on dashboard service on top of Apache Airflow.
Stars: ✭ 60 (-55.88%)
Agile data code 2Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Stars: ✭ 413 (+203.68%)
Dag FactoryDynamically generate Apache Airflow DAGs from YAML configuration files
Stars: ✭ 385 (+183.09%)
Aws Ecs AirflowRun Airflow in AWS ECS(Elastic Container Service) using Fargate tasks
Stars: ✭ 107 (-21.32%)