
taogeYT / Pyetl

License: Apache-2.0
Python ETL framework

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Pyetl

Csv2db
The CSV to database command line loader
Stars: ✭ 102 (+209.09%)
Mutual labels:  oracle, csv, etl, mysql, sqlserver
Datax
DataX is an open source universal ETL tool that supports Cassandra, ClickHouse, DBF, Hive, InfluxDB, Kudu, MySQL, Oracle, Presto (Trino), PostgreSQL, and SQL Server.
Stars: ✭ 116 (+251.52%)
Mutual labels:  oracle, etl, hive, mysql, sqlserver
Addax
Addax is an open source universal ETL tool that supports most RDBMS and NoSQL databases, helping you transfer data from one place to another.
Stars: ✭ 615 (+1763.64%)
Mutual labels:  hive, etl, excel, oracle, sqlserver
qwery
A SQL-like language for performing ETL transformations.
Stars: ✭ 28 (-15.15%)
Mutual labels:  csv, hive, etl, etl-framework
Transformalize
Configurable Extract, Transform, and Load
Stars: ✭ 125 (+278.79%)
Mutual labels:  excel, etl, etl-framework, mysql
DataX-src
DataX is an offline data synchronization tool/platform widely used for heterogeneous data, providing efficient data synchronization between heterogeneous data sources such as MySQL, Oracle, SqlServer, Postgre, HDFS, Hive, ADS, HBase, OTS, and ODPS.
Stars: ✭ 21 (-36.36%)
Mutual labels:  hive, etl, oracle, sqlserver
DaFlow
An Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.
Stars: ✭ 24 (-27.27%)
Mutual labels:  csv, hive, etl, etl-framework
Choetl
ETL Framework for .NET / c# (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)
Stars: ✭ 372 (+1027.27%)
Mutual labels:  csv, etl, etl-framework
Smartsql
SmartSql = MyBatis in C# + .NET Core + Cache (Memory | Redis) + R/W Splitting + PropertyChangedTrack + Dynamic Repository + InvokeSync + Diagnostics
Stars: ✭ 775 (+2248.48%)
Mutual labels:  oracle, mysql, sqlserver
Ezsql
PHP class to make interacting with a database ridiculously easy
Stars: ✭ 804 (+2336.36%)
Mutual labels:  oracle, mysql, sqlserver
Symmetric Ds
SymmetricDS is a database and file synchronization solution that is platform-independent, web-enabled, and database agnostic. SymmetricDS was built to make data replication across two to tens of thousands of databases and file systems fast, easy and resilient. We specialize in near real time, bi-directional data replication across large node networks over the WAN or LAN.
Stars: ✭ 450 (+1263.64%)
Mutual labels:  oracle, mysql, sqlserver
Datafaker
Datafaker is a large-scale test data and flow test data generation tool. Datafaker fakes data and inserts it into a variety of data sources.
Stars: ✭ 327 (+890.91%)
Mutual labels:  oracle, hive, mysql
Jsqlparser
JSqlParser parses an SQL statement and translates it into a hierarchy of Java classes. The generated hierarchy can be navigated using the Visitor pattern.
Stars: ✭ 3,405 (+10218.18%)
Mutual labels:  oracle, mysql, sqlserver
Tableexport
The simple, easy-to-implement library to export HTML tables to xlsx, xls, csv, and txt files.
Stars: ✭ 781 (+2266.67%)
Mutual labels:  excel, csv, export
dbd
dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.
Stars: ✭ 30 (-9.09%)
Mutual labels:  csv, etl, excel
Sqlinjectionwiki
A wiki dedicated to aggregating and documenting various SQL injection methods.
Stars: ✭ 402 (+1118.18%)
Mutual labels:  oracle, mysql, sqlserver
Jooq
jOOQ is the best way to write SQL in Java
Stars: ✭ 4,695 (+14127.27%)
Mutual labels:  oracle, mysql, sqlserver
Antdata.orm
Features: one-click entity generation via a VS plugin or T4 templates, with support for configuring non-physical foreign keys. Separates the LINQ-to-SQL engine (native LINQ, not an extension) from the DAL execution layer; supports async and .NET Core 2.0.
Stars: ✭ 428 (+1196.97%)
Mutual labels:  oracle, mysql, sqlserver
Laracsv
A Laravel package to easily generate CSV files from Eloquent models
Stars: ✭ 583 (+1666.67%)
Mutual labels:  excel, csv, export
Typeorm
ORM for TypeScript and JavaScript (ES7, ES6, ES5). Supports MySQL, PostgreSQL, MariaDB, SQLite, MS SQL Server, Oracle, SAP Hana, WebSQL databases. Works in NodeJS, Browser, Ionic, Cordova and Electron platforms.
Stars: ✭ 26,559 (+80381.82%)
Mutual labels:  oracle, mysql, sqlserver

Pyetl

Pyetl is a Python 3.6+ ETL framework

Installation:

pip3 install pyetl

Example

import sqlite3
import pymysql
from pyetl import Task, DatabaseReader, DatabaseWriter, ElasticsearchWriter, FileWriter
src = sqlite3.connect("file.db")
reader = DatabaseReader(src, table_name="source_table")
# Database-to-database synchronization: table-to-table transfer
dst = pymysql.connect(host="localhost", user="your_user", password="your_password", db="test")
writer = DatabaseWriter(dst, table_name="target_table")
Task(reader, writer).start()
# Export a database table to a file
writer = FileWriter(file_path="./", file_name="file.csv")
Task(reader, writer).start()
# Sync a database table to an Elasticsearch index
writer = ElasticsearchWriter(index_name="target_index")
Task(reader, writer).start()

When the source table and the target table have different column names

import sqlite3
from pyetl import Task, DatabaseReader, DatabaseWriter
con = sqlite3.connect("file.db")
# The source table source_table contains the columns uuid and full_name
reader = DatabaseReader(con, table_name="source_table")
# The target table target_table contains the columns id and name
writer = DatabaseWriter(con, table_name="target_table")
# columns maps target-table column names to source-table column names
columns = {"id": "uuid", "name": "full_name"}
Task(reader, writer, columns=columns).start()

Add per-column UDF mappings to perform rule validation, data standardization, data cleansing, and so on (a fuller sketch follows the snippet below).

# functions maps column names to UDFs; here id is cast to a string and name is stripped of leading and trailing whitespace
functions = {"id": str, "name": lambda x: x.strip()}
Task(reader, writer, columns=columns, functions=functions).start()
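
Building on the snippet above, UDFs can also be ordinary named functions, which is convenient for the validation and cleansing cases mentioned earlier. The following is a minimal sketch rather than an example from the original README: the table and column names are illustrative, and it relies only on the Task(reader, writer, columns=..., functions=...) interface already shown.

import sqlite3
from pyetl import Task, DatabaseReader, DatabaseWriter

def clean_name(value):
    # data standardization: trim whitespace and normalize capitalization
    return value.strip().title() if value else value

def check_id(value):
    # rule validation: raise for non-numeric ids
    if not str(value).isdigit():
        raise ValueError("invalid id: %r" % (value,))
    return str(value)

con = sqlite3.connect("file.db")  # same illustrative database as above
reader = DatabaseReader(con, table_name="source_table")
writer = DatabaseWriter(con, table_name="target_table")
columns = {"id": "uuid", "name": "full_name"}     # target column -> source column, as above
functions = {"id": check_id, "name": clean_name}  # per-column UDFs, keyed as in the snippet above
Task(reader, writer, columns=columns, functions=functions).start()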

Subclass Task for flexible extension

import json
from pyetl import Task, DatabaseReader, DatabaseWriter
class NewTask(Task):
    reader = DatabaseReader("sqlite:///db.sqlite3", table_name="source")
    writer = DatabaseWriter("sqlite:///db.sqlite3", table_name="target")
    
    def get_columns(self):
        """通过函数的方式生成字段映射配置,使用更灵活"""
        # 以下示例将数据库中的字段映射配置取出后转字典类型返回
        sql = "select columns from task where name='new_task'"
        columns = self.writer.db.read_one(sql)["columns"]
        return json.loads(columns)
      
    def get_functions(self):
        """通过函数的方式生成字段的udf映射"""
        # 以下示例将每个字段类型都转换为字符串
        return {col: str for col in self.columns}
      
    def apply_function(self, record):
        """数据流中对一整条数据的udf"""
        record["flag"] = int(record["id"]) % 2
        return record

    def before(self):
        """任务开始前要执行的操作, 如初始化任务表,创建目标表等"""
        sql = "create table destination_table(id int, name varchar(100))"
        self.writer.db.execute(sql)
    
    def after(self):
        """任务完成后要执行的操作,如更新任务状态等"""
        sql = "update task set status='done' where name='new_task'"
        self.writer.db.execute(sql)

NewTask().start()

Readers and Writers

Reader  Description
DatabaseReader  reads from any relational database
FileReader  reads structured text data, e.g. CSV files
ExcelReader  reads Excel files
ElasticsearchReader  reads data from an Elasticsearch index
Writer  Description
DatabaseWriter  writes to any relational database
ElasticsearchWriter  bulk-writes data to an Elasticsearch index
HiveWriter  bulk-inserts into a Hive table
HiveWriter2  loads into a Hive table via LOAD DATA (recommended)
FileWriter  writes data to a text file
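
As a rough combined-usage sketch (not taken from the original README), the snippet below loads a CSV file into a database table with FileReader and DatabaseWriter. The FileReader arguments are an assumption modeled on FileWriter's file_path/file_name parameters shown earlier; the Task(reader, writer) pattern is the documented part, so check the installed pyetl version for the exact reader signatures.

import sqlite3
from pyetl import Task, FileReader, DatabaseWriter

# Assumed FileReader signature, mirroring FileWriter(file_path=..., file_name=...);
# verify against the pyetl source before relying on it.
reader = FileReader(file_path="./", file_name="file.csv")
dst = sqlite3.connect("file.db")
writer = DatabaseWriter(dst, table_name="target_table")
Task(reader, writer).start()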