
taogeYT / Pyetl

License: Apache-2.0
Python ETL framework

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Pyetl

Csv2db
The CSV to database command line loader
Stars: ✭ 102 (+209.09%)
Mutual labels:  oracle, csv, etl, mysql, sqlserver
Datax
DataX is an open source universal ETL tool that supports Cassandra, ClickHouse, DBF, Hive, InfluxDB, Kudu, MySQL, Oracle, Presto (Trino), PostgreSQL, and SQL Server.
Stars: ✭ 116 (+251.52%)
Mutual labels:  oracle, etl, hive, mysql, sqlserver
Addax
Addax is an open source universal ETL tool that supports most RDBMS and NoSQL databases, helping you transfer data from one place to another.
Stars: ✭ 615 (+1763.64%)
Mutual labels:  hive, etl, excel, oracle, sqlserver
qwery
A SQL-like language for performing ETL transformations.
Stars: ✭ 28 (-15.15%)
Mutual labels:  csv, hive, etl, etl-framework
Transformalize
Configurable Extract, Transform, and Load
Stars: ✭ 125 (+278.79%)
Mutual labels:  excel, etl, etl-framework, mysql
DataX-src
DataX is an offline data synchronization tool/platform widely used for heterogeneous data, providing efficient data synchronization between heterogeneous data sources such as MySQL, Oracle, SqlServer, Postgre, HDFS, Hive, ADS, HBase, OTS, and ODPS.
Stars: ✭ 21 (-36.36%)
Mutual labels:  hive, etl, oracle, sqlserver
DaFlow
An Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.
Stars: ✭ 24 (-27.27%)
Mutual labels:  csv, hive, etl, etl-framework
Choetl
ETL Framework for .NET / c# (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)
Stars: ✭ 372 (+1027.27%)
Mutual labels:  csv, etl, etl-framework
Smartsql
SmartSql = MyBatis in C# + .NET Core + Cache (Memory | Redis) + R/W Splitting + PropertyChangedTrack + Dynamic Repository + InvokeSync + Diagnostics
Stars: ✭ 775 (+2248.48%)
Mutual labels:  oracle, mysql, sqlserver
Ezsql
PHP class to make interacting with a database ridiculously easy
Stars: ✭ 804 (+2336.36%)
Mutual labels:  oracle, mysql, sqlserver
Symmetric Ds
SymmetricDS is a database and file synchronization solution that is platform-independent, web-enabled, and database agnostic. SymmetricDS was built to make data replication across two to tens of thousands of databases and file systems fast, easy and resilient. We specialize in near real time, bi-directional data replication across large node networks over the WAN or LAN.
Stars: ✭ 450 (+1263.64%)
Mutual labels:  oracle, mysql, sqlserver
Datafaker
Datafaker is a large-scale test data and flow test data generation tool. Datafaker fakes data and inserts it into a variety of data sources.
Stars: ✭ 327 (+890.91%)
Mutual labels:  oracle, hive, mysql
Jsqlparser
JSqlParser parses an SQL statement and translates it into a hierarchy of Java classes. The generated hierarchy can be navigated using the Visitor pattern.
Stars: ✭ 3,405 (+10218.18%)
Mutual labels:  oracle, mysql, sqlserver
Tableexport
The simple, easy-to-implement library to export HTML tables to xlsx, xls, csv, and txt files.
Stars: ✭ 781 (+2266.67%)
Mutual labels:  excel, csv, export
dbd
dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.
Stars: ✭ 30 (-9.09%)
Mutual labels:  csv, etl, excel
Sqlinjectionwiki
A wiki dedicated to aggregating and documenting various SQL injection methods.
Stars: ✭ 402 (+1118.18%)
Mutual labels:  oracle, mysql, sqlserver
Jooq
jOOQ is the best way to write SQL in Java
Stars: ✭ 4,695 (+14127.27%)
Mutual labels:  oracle, mysql, sqlserver
Antdata.orm
Features: one-click entity generation via a VS plugin or T4 templates, with support for configuring non-physical foreign keys. Separates the LINQ-to-SQL engine (native LINQ, not an extension) from the DAL execution layer; supports async and .NET Core 2.0.
Stars: ✭ 428 (+1196.97%)
Mutual labels:  oracle, mysql, sqlserver
Laracsv
A Laravel package to easily generate CSV files from Eloquent models
Stars: ✭ 583 (+1666.67%)
Mutual labels:  excel, csv, export
Typeorm
ORM for TypeScript and JavaScript (ES7, ES6, ES5). Supports MySQL, PostgreSQL, MariaDB, SQLite, MS SQL Server, Oracle, SAP Hana, WebSQL databases. Works in NodeJS, Browser, Ionic, Cordova and Electron platforms.
Stars: ✭ 26,559 (+80381.82%)
Mutual labels:  oracle, mysql, sqlserver

Pyetl

Pyetl is a Python 3.6+ ETL framework

Installation:

pip3 install pyetl

Example

import sqlite3
import pymysql
from pyetl import Task, DatabaseReader, DatabaseWriter, ElasticsearchWriter, FileWriter
src = sqlite3.connect("file.db")
reader = DatabaseReader(src, table_name="source_table")
# Database-to-database synchronization: table-to-table transfer
dst = pymysql.connect(host="localhost", user="your_user", password="your_password", db="test")
writer = DatabaseWriter(dst, table_name="target_table")
Task(reader, writer).start()
# Export a database table to a file
writer = FileWriter(file_path="./", file_name="file.csv")
Task(reader, writer).start()
# Sync a database table to an Elasticsearch index
writer = ElasticsearchWriter(index_name="target_index")
Task(reader, writer).start()

When the source table and the target table have different column names

import sqlite3
from pyetl import Task, DatabaseReader, DatabaseWriter
con = sqlite3.connect("file.db")
# The source table source_table contains the columns uuid and full_name
reader = DatabaseReader(con, table_name="source_table")
# The target table target_table contains the columns id and name
writer = DatabaseWriter(con, table_name="target_table")
# columns maps target-table column names to source-table column names
columns = {"id": "uuid", "name": "full_name"}
Task(reader, writer, columns=columns).start()

Add per-column UDF mappings to perform rule validation, data standardization, data cleansing, and so on (a fuller sketch follows the snippet below).

# functions maps column names to UDFs; here id is cast to a string and name is stripped of leading and trailing whitespace
functions = {"id": str, "name": lambda x: x.strip()}
Task(reader, writer, columns=columns, functions=functions).start()
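
Building on the snippet above, UDFs can also be ordinary named functions, which is convenient for the validation and cleansing cases mentioned earlier. The following is a minimal sketch rather than an example from the original README: the table and column names are illustrative, and it relies only on the Task(reader, writer, columns=..., functions=...) interface already shown.

import sqlite3
from pyetl import Task, DatabaseReader, DatabaseWriter

def clean_name(value):
    # data standardization: trim whitespace and normalize capitalization
    return value.strip().title() if value else value

def check_id(value):
    # rule validation: raise for non-numeric ids
    if not str(value).isdigit():
        raise ValueError("invalid id: %r" % (value,))
    return str(value)

con = sqlite3.connect("file.db")  # same illustrative database as above
reader = DatabaseReader(con, table_name="source_table")
writer = DatabaseWriter(con, table_name="target_table")
columns = {"id": "uuid", "name": "full_name"}     # target column -> source column, as above
functions = {"id": check_id, "name": clean_name}  # per-column UDFs, keyed as in the snippet above
Task(reader, writer, columns=columns, functions=functions).start()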

Subclass Task for flexible extension

import json
from pyetl import Task, DatabaseReader, DatabaseWriter
class NewTask(Task):
    reader = DatabaseReader("sqlite:///db.sqlite3", table_name="source")
    writer = DatabaseWriter("sqlite:///db.sqlite3", table_name="target")
    
    def get_columns(self):
        """通过函数的方式生成字段映射配置,使用更灵活"""
        # 以下示例将数据库中的字段映射配置取出后转字典类型返回
        sql = "select columns from task where name='new_task'"
        columns = self.writer.db.read_one(sql)["columns"]
        return json.loads(columns)
      
    def get_functions(self):
        """通过函数的方式生成字段的udf映射"""
        # 以下示例将每个字段类型都转换为字符串
        return {col: str for col in self.columns}
      
    def apply_function(self, record):
        """数据流中对一整条数据的udf"""
        record["flag"] = int(record["id"]) % 2
        return record

    def before(self):
        """任务开始前要执行的操作, 如初始化任务表,创建目标表等"""
        sql = "create table destination_table(id int, name varchar(100))"
        self.writer.db.execute(sql)
    
    def after(self):
        """任务完成后要执行的操作,如更新任务状态等"""
        sql = "update task set status='done' where name='new_task'"
        self.writer.db.execute(sql)

NewTask().start()

Readers and Writers

Reader  Description
DatabaseReader  reads from any relational database
FileReader  reads structured text data, e.g. CSV files
ExcelReader  reads Excel files
ElasticsearchReader  reads data from an Elasticsearch index
Writer  Description
DatabaseWriter  writes to any relational database
ElasticsearchWriter  bulk-writes data to an Elasticsearch index
HiveWriter  bulk-inserts into a Hive table
HiveWriter2  loads into a Hive table via LOAD DATA (recommended)
FileWriter  writes data to a text file
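
As a rough combined-usage sketch (not taken from the original README), the snippet below loads a CSV file into a database table with FileReader and DatabaseWriter. The FileReader arguments are an assumption modeled on FileWriter's file_path/file_name parameters shown earlier; the Task(reader, writer) pattern is the documented part, so check the installed pyetl version for the exact reader signatures.

import sqlite3
from pyetl import Task, FileReader, DatabaseWriter

# Assumed FileReader signature, mirroring FileWriter(file_path=..., file_name=...);
# verify against the pyetl source before relying on it.
reader = FileReader(file_path="./", file_name="file.csv")
dst = sqlite3.connect("file.db")
writer = DatabaseWriter(dst, table_name="target_table")
Task(reader, writer).start()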