
seanharr11 / Etlalchemy

License: MIT
Extract, Transform, Load: Any SQL Database in 4 lines of Code.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives to or similar to Etlalchemy

Hale
(Spatial) data harmonisation with hale studio (formerly HUMBOLDT Alignment Editor)
Stars: ✭ 84 (-81.74%)
Mutual labels:  etl, etl-framework, database
DaFlow
Apache Spark-based data flow (ETL) framework which supports multiple read/write destinations of different types and also supports multiple categories of transformation rules.
Stars: ✭ 24 (-94.78%)
Mutual labels:  etl, etl-framework
etlflow
EtlFlow is an ecosystem of functional libraries in Scala based on ZIO for writing various different tasks, jobs on GCP and AWS.
Stars: ✭ 38 (-91.74%)
Mutual labels:  etl, etl-framework
flask-db
A Flask CLI extension to help migrate and manage your SQL database.
Stars: ✭ 56 (-87.83%)
Mutual labels:  sqlalchemy, migrations
csvplus
csvplus extends the standard Go encoding/csv package with fluent interface, lazy stream operations, indices and joins.
Stars: ✭ 67 (-85.43%)
Mutual labels:  etl, etl-framework
OpenKettleWebUI
A Kettle-based web platform for scheduling and controlling data processing. It supports both file repositories and database repositories, drives Kettle data transformations through the web platform, and can be integrated into existing systems as middleware.
Stars: ✭ 138 (-70%)
Mutual labels:  etl, etl-framework
DataBridge.NET
Configurable data bridge for permanent ETL jobs
Stars: ✭ 16 (-96.52%)
Mutual labels:  etl, etl-framework
DIRECT
DIRECT, the Data Integration Run-time Execution Control Tool, is a data logistics framework that can be used to monitor, log, audit and control data integration / ETL processes.
Stars: ✭ 20 (-95.65%)
Mutual labels:  etl, etl-framework
carry
Python ETL(Extract-Transform-Load) tool / Data migration tool
Stars: ✭ 115 (-75%)
Mutual labels:  sqlalchemy, etl
qwery
A SQL-like language for performing ETL transformations.
Stars: ✭ 28 (-93.91%)
Mutual labels:  etl, etl-framework
Architect
A set of tools which enhances ORMs written in Python with more features
Stars: ✭ 320 (-30.43%)
Mutual labels:  sqlalchemy, database
datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (-91.52%)
Mutual labels:  etl, etl-framework
BETL-old
BETL. Meta data driven ETL generation using T-SQL
Stars: ✭ 17 (-96.3%)
Mutual labels:  etl, etl-framework
hamilton
A scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, numpy matrices, python objects, ML models, etc.
Stars: ✭ 612 (+33.04%)
Mutual labels:  etl, etl-framework
link-move
A model-driven dynamically-configurable framework to acquire data from external sources and save it to your database.
Stars: ✭ 32 (-93.04%)
Mutual labels:  etl, etl-framework
cubetl
CubETL - Framework and tool for data ETL (Extract, Transform and Load) in Python (PERSONAL PROJECT / SELDOM MAINTAINED)
Stars: ✭ 21 (-95.43%)
Mutual labels:  etl, etl-framework
Choetl
ETL Framework for .NET / c# (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)
Stars: ✭ 372 (-19.13%)
Mutual labels:  etl, etl-framework
Flask Boilerplate
Simple flask boilerplate with Postgres, Docker, and Heroku/Zeit now
Stars: ✭ 251 (-45.43%)
Mutual labels:  sqlalchemy, database
vixtract
www.vixtract.ru
Stars: ✭ 40 (-91.3%)
Mutual labels:  etl, etl-framework
redis-connect-dist
Real-Time Event Streaming & Change Data Capture
Stars: ✭ 21 (-95.43%)
Mutual labels:  etl, etl-framework

etlalchemy

Extract, Transform and Load... Migrate any SQL Database in 4 Lines of Code.


Installation

pip install etlalchemy
# On El Capitan:
# pip install --ignore-installed etlalchemy

# Also install the necessary DBAPI drivers and SQLAlchemy dialects.
# For example, for MySQL, you might use:
# pip install pymysql
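
Commonly used drivers for the other supported databases (these package names are the usual community DBAPIs, not requirements pinned by etlalchemy):

pip install psycopg2    # PostgreSQL
pip install cx_Oracle   # Oracle
pip install pyodbc      # SQL Server (via ODBC)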

Basic Usage

from etlalchemy import ETLAlchemySource, ETLAlchemyTarget

source = ETLAlchemySource("mssql+pyodbc://username:[email protected]_NAME")
target = ETLAlchemyTarget("mysql://username:[email protected]/db_name", drop_database=True)
target.addSource(source)
target.migrate()

Examples

Provide a list of tables to include/exclude in migration

from etlalchemy import ETLAlchemySource, ETLAlchemyTarget

# Load ONLY the 'salaries' table
source = ETLAlchemySource("mysql://etlalchemy:[email protected]/employees",
                          included_tables=["salaries"])
# Conversely, you could load ALL tables EXCEPT 'salaries'
# source = ETLAlchemySource("mysql://etlalchemy:password@hostname/employees",\
#                          excluded_tables=["salaries"])

target = ETLAlchemyTarget("postgresql://etlalchemy:[email protected]/test", drop_database=True)
target.addSource(source)
target.migrate()

Only migrate the schema, or only the data, or only FKs, or only indexes (or any combination of the 4!)

from etlalchemy import ETLAlchemySource, ETLAlchemyTarget

source = ETLAlchemySource("mysql://etlalchemy:[email protected]/employees")

target = ETLAlchemyTarget("postgresql://etlalchemy:[email protected]/test", drop_database=True)
target.addSource(source)
# Note that each phase (schema, data, index, fk) is independent of all others,
# and can be run standalone, or in any combination. (Obviously you need a schema to send data, etc...)
target.migrate(migrate_fks=False, migrate_indexes=False, migrate_data=False, migrate_schema=True)
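
For example, a two-pass run is one way to use this. A sketch only, not documented behaviour: fresh Source/Target objects are created for the second pass in case internal state is not reusable across migrate() calls.

from etlalchemy import ETLAlchemySource, ETLAlchemyTarget

# Pass 1: migrate the schema only, then inspect it on the Target.
source = ETLAlchemySource("mysql://etlalchemy:password@hostname/employees")
target = ETLAlchemyTarget("postgresql://etlalchemy:password@hostname/test", drop_database=True)
target.addSource(source)
target.migrate(migrate_schema=True, migrate_data=False, migrate_indexes=False, migrate_fks=False)

# Pass 2: move the data, then build indexes and FKs on top of it.
# drop_database=False so the schema from pass 1 survives.
source = ETLAlchemySource("mysql://etlalchemy:password@hostname/employees")
target = ETLAlchemyTarget("postgresql://etlalchemy:password@hostname/test", drop_database=False)
target.addSource(source)
target.migrate(migrate_schema=False, migrate_data=True, migrate_indexes=True, migrate_fks=True)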

Skip columns and tables if they are empty

from etlalchemy import ETLAlchemySource, ETLAlchemyTarget
# This will skip tables with no rows (or all empty rows), and ignore them during schema migration
# This will skip columns if they have all NULL values, and ignore them during schema migration
source = ETLAlchemySource("mysql://etlalchemy:[email protected]/employees",\
                          skip_column_if_empty=True,\
                          skip_table_if_empty=True)
target = ETLAlchemyTarget("postgresql://etlalchemy:[email protected]/test", drop_database=True)
target.addSource(source)
target.migrate()

Enable 'upserting' of data

from etlalchemy import ETLAlchemySource, ETLAlchemyTarget

source = ETLAlchemySource("mysql://etlalchemy:[email protected]/employees")
# This will leave the target DB as is, and if the tables being migrated from Source -> Target
# already exist on the Target, then rows will be updated based on PKs if they exist, or 
# inserted if they DNE on the Target table.
target = ETLAlchemyTarget("postgresql://etlalchemy:[email protected]/test", drop_database=False)
target.addSource(source)
target.migrate()

Alter schema (change column names, column types, table names, and drop tables/columns)

import os
from etlalchemy import ETLAlchemySource, ETLAlchemyTarget
# See below for the simple structure of the .csv's for schema changes
source = ETLAlchemySource("mysql://etlalchemy:password@hostname/employees",\
                          column_schema_transformation_file=os.getcwd() + "/transformations/column_mappings.csv",\
                          table_schema_transformation_file=os.getcwd() + "/transformations/table_mappings.csv")
target = ETLAlchemyTarget("postgresql://SeanH:[email protected]/test", drop_database=True)
target.addSource(source)
target.migrate()
column_mappings.csv:

Column Name,Table Name,New Column Name,New Column Type,Delete
last_name,employees,,,True
fired,employees,,Boolean,False
birth_date,employees,dob,,False

table_mappings.csv:

Table Name,New Table Name,Delete
table_to_rename,new_table_name,False
table_to_delete,,True
departments,dept,False

Rename any column which ends in a given 'suffix' (or skip the column during migration)

from etlalchemy import ETLAlchemySource, ETLAlchemyTarget
# global_renamed_col_suffixes is useful to standardize column names across tables (like the date example below)
source = ETLAlchemySource("mysql://etlalchemy:[email protected]/employees",\
                          global_ignored_col_suffixes=['drop_all_columns_that_end_in_this'],\
                          global_renamed_col_suffixes={'date': 'dt'},\ #i.e. "created_date -> created_dt"
                         )
target = ETLAlchemyTarget("postgresql://SeanH:[email protected]/test", drop_database=True)
target.addSource(source)
target.migrate()

Known Limitations

  1. 'sqlalchemy_migrate' does not support MSSQL FK migrations, so FK migrations are skipped when the Target is MSSQL.
  2. Currently not compatible with Windows.
  3. If the Target DB is in the Azure Cloud (MSSQL), FreeTDS has some performance-related compatibility issues. These may be noticed when migrating tables with 1,000,000+ rows into an Azure MSSQL Server.
  4. Though the MSSQL 'BULK INSERT' feature is supported in this tool, it is NOT supported in Azure or AWS MSSQL Server environments (no 'bulkadmin' role allowed). Feel free to test this out on a different MSSQL environment!
  5. Regression tests have not (yet) been created due to the unique (and expensive) way one must test all of the different database types.
  6. Migrations to MSSQL and Oracle are extremely slow due to the lack of 'fast' import capabilities.
  • 'SQL Loader' can be used on Oracle, and the 'BULK INSERT' operation can be used on MSSQL; however, the former is a PITA to install, and the latter is not supported in several MSSQL environments (see limitation 4 above).
  • 'BULK INSERT' is supported in etlalchemy (with limited testing), but 'SQL Loader' is not (yet).
  7. When sending data to PostgreSQL, any carriage returns ('^M' or '\r') in VARCHAR() or TEXT() columns will be stripped (a possible pre-migration workaround is sketched after this list).
  • This is due to the lack of an "ENCLOSED BY" option in psycopg2's copy_from(): these characters are interpreted literally, and in turn tell the COPY FROM operation that "the row ends here".
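
If the stripped carriage returns matter to you, one possible workaround is to scrub them on the Source before migrating. A minimal sketch, assuming a MySQL Source; the 'employees' table and 'notes' column are hypothetical, and CHAR(13) is the carriage-return character:

from sqlalchemy import create_engine, text

# Remove carriage returns from a Source text column ahead of a PostgreSQL
# migration, so the COPY FROM fast path cannot silently drop them.
# 'employees' / 'notes' are placeholder names -- adjust for your schema.
engine = create_engine("mysql://etlalchemy:password@hostname/employees")
with engine.begin() as conn:
    conn.execute(text("UPDATE employees SET notes = REPLACE(notes, CHAR(13), '')"))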

Assumptions Made

  1. Default date formats for all Target DBs are assumed to be the 'out-of-the-box' defaults.
  2. Text fields do not contain the character "|" or the string "|,".
    • On some Target DBs, if you have text fields containing "|," (MSSQL) or "|" (SQLite), then the 'fast' import may fail or insert bizarre values into your DB. This is due to the 'delimiter' which separates column values in the file that is sent to the Target DB (a pre-flight check is sketched after this list).
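
A pre-flight check along these lines can tell you whether any text column contains the delimiter before you rely on the 'fast' import. A sketch only: the DSN is a placeholder, and matching "CHAR"/"TEXT" in the type name is a rough heuristic:

from sqlalchemy import create_engine, inspect, text

# Count rows whose string columns contain '|', which could confuse the
# delimiter-separated file handed to the Target's fast import.
engine = create_engine("mysql://etlalchemy:password@hostname/employees")
inspector = inspect(engine)
with engine.connect() as conn:
    for table in inspector.get_table_names():
        for col in inspector.get_columns(table):
            type_name = str(col["type"]).upper()
            if "CHAR" in type_name or "TEXT" in type_name:
                hits = conn.execute(
                    text(f"SELECT COUNT(*) FROM {table} WHERE {col['name']} LIKE :pat"),
                    {"pat": "%|%"},
                ).scalar()
                if hits:
                    print(f"{table}.{col['name']}: {hits} row(s) contain '|'")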

On Testing

  1. The 'Performance' matrix has been put together using a simple script which tests every combination of Source (5) and Target (5) DB migration (25 total combinations).
  • The script is not included (publicly), as it contains the connection strings of AWS RDS instances.
  2. A regression test suite is needed, as is funding to set up an environment for Oracle and MSSQL instances.
  3. There are definitely some untested column types among all 5 RDBMSs. Please create pull requests or open issues that describe the problem in detail as these arise!

Contributors

We are always looking for contributors!

This project has its origins in solving the problem of migrating off of bulky, expensive enterprise-level databases. If the project has helped you migrate off of these databases and onto open-source RDBMSs, the best way to show your support is by opening Pull Requests and Issues.

Donations

Donations through Gratipay are welcome, but Pull Requests are better!

You can also support us via PayPal here.

Other

For help installing cx_Oracle on a Mac (El Capitan + cx_Oracle = misery), check out this blog post.

Run this tool from the same server that hosts your Target database to get maximum performance out of it.
