AirflowETL
Blog post on ETL pipelines with Airflow
Stars: ✭ 20 (-66.67%)
Example Airflow Dags
Example DAGs using hooks and operators from Airflow Plugins
Stars: ✭ 243 (+305%)
Dataspherestudio
DataSphereStudio is a one-stop data application development and management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (+1891.67%)
astro
Astro allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
Stars: ✭ 79 (+31.67%)
Aws Ecs Airflow
Run Airflow in AWS ECS (Elastic Container Service) using Fargate tasks
Stars: ✭ 107 (+78.33%)
polygon-etl
ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub
Stars: ✭ 53 (-11.67%)
Koop
🔮 Transform, query, and download geospatial data on the web.
Stars: ✭ 505 (+741.67%)
Dswarm Backoffice Web
The backoffice web application of d:swarm (https://github.com/dswarm/dswarm-documentation/wiki)
Stars: ✭ 11 (-81.67%)
Udacity Data Engineering Projects
A few projects related to data engineering, including data modeling, cloud infrastructure setup, data warehousing, and data lake development.
Stars: ✭ 458 (+663.33%)
Baby Names Analysis
Data ETL & Analysis on the dataset 'Baby Names from Social Security Card Applications - National Data'.
Stars: ✭ 557 (+828.33%)
Aws Auto Terminate Idle Emr
AWS Auto Terminate Idle AWS EMR Clusters Framework is an AWS-based solution that uses AWS CloudWatch and an AWS Lambda function running a Python (Boto3) script to terminate AWS EMR clusters that have been idle for a specified period of time.
Stars: ✭ 21 (-65%)
Smartcode
SmartCode = IDataSource -> IBuildTask -> IOutput => Build Everything!!!
Stars: ✭ 464 (+673.33%)
Airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Stars: ✭ 24,101 (+40068.33%)
Elyra
Elyra extends JupyterLab Notebooks with an AI-centric approach.
Stars: ✭ 839 (+1298.33%)
Datacleaner
The premier open source Data Quality solution
Stars: ✭ 391 (+551.67%)
Abc
Power of appbase.io via CLI, with nifty imports from your favorite data sources
Stars: ✭ 375 (+525%)
Data Pipelines With Apache Airflow
Developed a data pipeline to automate data warehouse ETL by building custom Airflow operators that handle the extraction, transformation, validation, and loading of data from S3 -> Redshift -> S3
Stars: ✭ 50 (-16.67%)
Pyetl
Python ETL framework
Stars: ✭ 33 (-45%)
Bandar Log
Monitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.
Stars: ✭ 19 (-68.33%)
Aistore
AIStore: scalable storage for AI applications
Stars: ✭ 367 (+511.67%)
Metorikku
A simplified, lightweight ETL Framework based on Apache Spark
Stars: ✭ 361 (+501.67%)
Aws Airflow Stack
Turbine: the bare metals that gets you Airflow
Stars: ✭ 352 (+486.67%)
Dataform
Dataform is a framework for managing SQL-based data operations in BigQuery, Snowflake, and Redshift
Stars: ✭ 342 (+470%)
Incubator Dolphinscheduler
Apache DolphinScheduler is a distributed and extensible workflow scheduler platform with powerful DAG visual interfaces, dedicated to solving complex job dependencies in the data pipeline and providing various types of jobs out of the box.
Stars: ✭ 6,916 (+11426.67%)
Yunmai Data Extract
Extract your data from the Yunmai weighing scales cloud API so you can use it elsewhere
Stars: ✭ 21 (-65%)
Ananas Desktop
A hackable data integration & analysis tool to enable non-technical users to edit data processing jobs and visualise data on demand.
Stars: ✭ 551 (+818.33%)
Ether sql
A Python library to push Ethereum blockchain data into an SQL database.
Stars: ✭ 41 (-31.67%)
Bigslice
A serverless cluster computing system for the Go programming language
Stars: ✭ 469 (+681.67%)
Panther
Detect threats with log data and improve cloud security posture
Stars: ✭ 885 (+1375%)
Etlalchemy
Extract, Transform, Load: Any SQL Database in 4 lines of Code.
Stars: ✭ 460 (+666.67%)
Argo Workflows
Workflow engine for Kubernetes
Stars: ✭ 10,024 (+16606.67%)
Pglogical
Logical Replication extension for PostgreSQL 13, 12, 11, 10, 9.6, 9.5, 9.4 (Postgres), providing much faster replication than Slony, Bucardo or Londiste, as well as cross-version upgrades.
Stars: ✭ 455 (+658.33%)
Tuna
🐟 A streaming ETL for fish
Stars: ✭ 11 (-81.67%)
Agile data code 2
Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Stars: ✭ 413 (+588.33%)
Configs
Public, free-to-use repository of digger configs for scraping and extracting data from various e-commerce websites and online stores
Stars: ✭ 37 (-38.33%)
Dag Factory
Dynamically generate Apache Airflow DAGs from YAML configuration files
Stars: ✭ 385 (+541.67%)
Databook
A Facebook for data
Stars: ✭ 26 (-56.67%)
Choetl
ETL Framework for .NET / C# (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)
Stars: ✭ 372 (+520%)
Xene
A distributed workflow runner focusing on performance and simplicity.
Stars: ✭ 56 (-6.67%)
Wedatasphere
WeDataSphere is a financial-grade, one-stop open-source suite for big data platforms. The source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!
Stars: ✭ 372 (+520%)
Objinsync
Continuously synchronize directories from remote object store to local filesystem
Stars: ✭ 29 (-51.67%)
Goodreads etl pipeline
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Stars: ✭ 793 (+1221.67%)
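Webkettle
A professional-edition, browser/server (B/S) architecture tool, built on the web version of Kettle, for distributed integrated scheduling, management, and ETL development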
Stars: ✭ 334 (+456.67%)
Kiba Plus
Kiba enhancement for Ruby ETL.
Stars: ✭ 47 (-21.67%)
Ethereum Etl
Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ
Stars: ✭ 956 (+1493.33%)
Getting Started
This repository is a getting started guide to Singer.
Stars: ✭ 734 (+1123.33%)
Smooks
An extensible Java framework for building XML and non-XML streaming applications
Stars: ✭ 293 (+388.33%)
Dagster
An orchestration platform for the development, production, and observation of data assets.
Stars: ✭ 4,099 (+6731.67%)
Monstache
A Go daemon that syncs MongoDB to Elasticsearch in real time
Stars: ✭ 736 (+1126.67%)
Airflow Rest Api Plugin
A plugin for Apache Airflow that exposes REST endpoints for the command-line interface
Stars: ✭ 281 (+368.33%)
Benthos
Fancy stream processing made operationally mundane
Stars: ✭ 3,705 (+6075%)
Docker Airflow
Repo for building a Docker-based Airflow image. Containers support multiple features, such as writing logs to a local or S3 folder and initializing GCP while the container boots. https://abhioncbr.github.io/docker-airflow/
Stars: ✭ 29 (-51.67%)
React Csv
React components to build CSV files on the fly, based on an array or literal object of data
Stars: ✭ 732 (+1120%)