datalake-etl-pipelineSimplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (-2.5%)
DIRECTDIRECT, the Data Integration Run-time Execution Control Tool, is a data logistics framework that can be used to monitor, log, audit and control data integration / ETL processes.
Stars: ✭ 20 (-50%)
DaFlowApache-Spark based Data Flow(ETL) Framework which supports multiple read, write destinations of different types and also support multiple categories of transformation rules.
Stars: ✭ 24 (-40%)
csvpluscsvplus extends the standard Go encoding/csv package with fluent interface, lazy stream operations, indices and joins.
Stars: ✭ 67 (+67.5%)
etlflowEtlFlow is an ecosystem of functional libraries in Scala based on ZIO for writing various different tasks, jobs on GCP and AWS.
Stars: ✭ 38 (-5%)
hamiltonA scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, numpy matrices, python objects, ML models, etc.
Stars: ✭ 612 (+1430%)
BETL-oldBETL. Meta data driven ETL generation using T-SQL
Stars: ✭ 17 (-57.5%)
Openkettlewebui一款基于kettle的数据处理web调度控制平台,支持文档资源库和数据库资源库,通过web平台控制kettle数据转换,可作为中间件集成到现有系统中
Stars: ✭ 125 (+212.5%)
BenderBender - Serverless ETL Framework
Stars: ✭ 171 (+327.5%)
EtlboxA lightweight ETL (extract, transform, load) library and data integration toolbox for .NET.
Stars: ✭ 203 (+407.5%)
ButterfreeA tool for building feature stores.
Stars: ✭ 126 (+215%)
TransformalizeConfigurable Extract, Transform, and Load
Stars: ✭ 125 (+212.5%)
DataBridge.NETConfigurable data bridge for permanent ETL jobs
Stars: ✭ 16 (-60%)
ChoetlETL Framework for .NET / c# (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)
Stars: ✭ 372 (+830%)
link-moveA model-driven dynamically-configurable framework to acquire data from external sources and save it to your database.
Stars: ✭ 32 (-20%)
FlowMasterETL flow framework based on Yaml configs in Python
Stars: ✭ 19 (-52.5%)
csv-cruncherTreats CSV and JSON files as SQL tables, and exports SQL SELECTs back to CSV or JSON.
Stars: ✭ 32 (-20%)
Hale(Spatial) data harmonisation with hale studio (formerly HUMBOLDT Alignment Editor)
Stars: ✭ 84 (+110%)
Metlmito ETL tool
Stars: ✭ 153 (+282.5%)
AirflowETLBlog post on ETL pipelines with Airflow
Stars: ✭ 20 (-50%)
OpenKettleWebUI一款基于kettle的数据处理web调度控制平台,支持文档资源库和数据库资源库,通过web平台控制kettle数据转换,可作为中间件集成到现有系统中
Stars: ✭ 138 (+245%)
HydrographA visual ETL development and debugging tool for big data
Stars: ✭ 144 (+260%)
qweryA SQL-like language for performing ETL transformations.
Stars: ✭ 28 (-30%)
MetorikkuA simplified, lightweight ETL Framework based on Apache Spark
Stars: ✭ 361 (+802.5%)
EtlalchemyExtract, Transform, Load: Any SQL Database in 4 lines of Code.
Stars: ✭ 460 (+1050%)
Pyetlpython ETL framework
Stars: ✭ 33 (-17.5%)
WaterdropProduction Ready Data Integration Product, documentation:
Stars: ✭ 1,856 (+4540%)
cubetlCubETL - Framework and tool for data ETL (Extract, Transform and Load) in Python (PERSONAL PROJECT / SELDOM MAINTAINED)
Stars: ✭ 21 (-47.5%)
Getting StartedThis repository is a getting started guide to Singer.
Stars: ✭ 734 (+1735%)
StetlStetl, Streaming ETL, is a lightweight geospatial processing and ETL framework written in Python.
Stars: ✭ 64 (+60%)
GrafterLinked Data & RDF Manufacturing Tools in Clojure
Stars: ✭ 174 (+335%)
Etl.netMass processing data with a complete ETL for .net developers
Stars: ✭ 129 (+222.5%)
ElasticR client for the Elasticsearch HTTP API
Stars: ✭ 227 (+467.5%)
Bitcoin EtlETL scripts for Bitcoin, Litecoin, Dash, Zcash, Doge, Bitcoin Cash. Available in Google BigQuery https://goo.gl/oY5BCQ
Stars: ✭ 174 (+335%)
Aws Data WranglerPandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Stars: ✭ 2,385 (+5862.5%)
RikoA Python stream processing engine modeled after Yahoo! Pipes
Stars: ✭ 1,571 (+3827.5%)
KibaData processing & ETL framework for Ruby
Stars: ✭ 1,618 (+3945%)
Example Airflow DagsExample DAGs using hooks and operators from Airflow Plugins
Stars: ✭ 243 (+507.5%)
Bulk WriterProvides guidance for fast ETL jobs, an IDataReader implementation for SqlBulkCopy (or the MySql or Oracle equivalents) that wraps an IEnumerable, and libraries for mapping entites to table columns.
Stars: ✭ 210 (+425%)
Linq2dbLinq to database provider.
Stars: ✭ 2,211 (+5427.5%)
Sentinel CrawlerXenomorph Crawler, a Concise, Declarative and Observable Distributed Crawler(Node / Go / Java / Rust) For Web, RDB, OS, also can act as a Monitor(with Prometheus) or ETL for Infrastructure 💫 多语言执行器,分布式爬虫
Stars: ✭ 118 (+195%)
DataxDataX is an open source universal ETL tool that support Cassandra, ClickHouse, DBF, Hive, InfluxDB, Kudu, MySQL, Oracle, Presto(Trino), PostgreSQL, SQL Server
Stars: ✭ 116 (+190%)
AirbyteAirbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
Stars: ✭ 4,919 (+12197.5%)
Aws Ecs AirflowRun Airflow in AWS ECS(Elastic Container Service) using Fargate tasks
Stars: ✭ 107 (+167.5%)
Usaspending ApiServer application to serve U.S. federal spending data via a RESTful API
Stars: ✭ 166 (+315%)
Kafka Connectequivalent to kafka-connect 🔧 for nodejs ✨🐢🚀✨
Stars: ✭ 102 (+155%)
Csv2dbThe CSV to database command line loader
Stars: ✭ 102 (+155%)
Open Semantic EtlPython based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & Named Entity Recognition) & data enrichment (annotation) pipelines & ingestor to Solr or Elastic search index & linked data graph database
Stars: ✭ 165 (+312.5%)
OdČeská otevřená data
Stars: ✭ 99 (+147.5%)
Open Data Etl Utility KitUse Pentaho's open source data integration tool (Kettle) to create Extract-Transform-Load (ETL) processes to update a Socrata open data portal. Documentation is available at http://open-data-etl-utility-kit.readthedocs.io/en/stable
Stars: ✭ 93 (+132.5%)
ElandPython Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
Stars: ✭ 235 (+487.5%)
CqlCategorical Query Language IDE
Stars: ✭ 196 (+390%)
Mara Example Project 2An example mini data warehouse for python project stats, template for new projects
Stars: ✭ 154 (+285%)