wgzhao / Datax

Licence: apache-2.0

DataX is an open source universal ETL tool that supports Cassandra, ClickHouse, DBF, Hive, InfluxDB, Kudu, MySQL, Oracle, Presto(Trino), PostgreSQL, SQL Server

Programming Languages

java

Labels

database mysql hadoop oracle influxdb etl sqlserver hive clickhouse

Projects that are alternatives of or similar to Datax

Addax

Addax is an open source universal ETL tool that supports most of those RDBMS and NoSQLs on the planet, helping you transfer data from any one place to another.

Stars: ✭ 615 (+430.17%)

Mutual labels: influxdb, hive, hadoop, etl, clickhouse, oracle, sqlserver

Csv2db

The CSV to database command line loader

Stars: ✭ 102 (-12.07%)

Mutual labels: oracle, etl, database, mysql, sqlserver

Pyetl

python ETL framework

Stars: ✭ 33 (-71.55%)

Mutual labels: oracle, etl, hive, mysql, sqlserver

Szt Bigdata

Shenzhen Metro big data passenger flow analysis system 🚇🚄🌟

Stars: ✭ 826 (+612.07%)

Mutual labels: hadoop, hive, mysql, clickhouse

Linq2db

Linq to database provider.

Stars: ✭ 2,211 (+1806.03%)

Mutual labels: oracle, etl, database, mysql

Liquibase

Main Liquibase Source

Stars: ✭ 2,910 (+2408.62%)

Mutual labels: oracle, database, mysql, sqlserver

Haproxy Configs

80+ HAProxy Configs for Hadoop, Big Data, NoSQL, Docker, Elasticsearch, SolrCloud, HBase, MySQL, PostgreSQL, Apache Drill, Hive, Presto, Impala, Hue, ZooKeeper, SSH, RabbitMQ, Redis, Riak, Cloudera, OpenTSDB, InfluxDB, Prometheus, Kibana, Graphite, Rancher etc.

Stars: ✭ 106 (-8.62%)

Mutual labels: hadoop, hive, mysql, influxdb

Kangaroo

SQL client and admin tool for popular databases

Stars: ✭ 127 (+9.48%)

Mutual labels: oracle, database, mysql, sqlserver

Ebean

Ebean ORM

Stars: ✭ 1,172 (+910.34%)

Mutual labels: oracle, database, mysql, sqlserver

DataX-src

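DataX is a widely used offline data synchronization tool/platform for heterogeneous data. It provides efficient data synchronization between heterogeneous data sources including MySQL, Oracle, SqlServer, Postgre, HDFS, Hive, ADS, HBase, OTS, and ODPS.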

Stars: ✭ 21 (-81.9%)

Mutual labels: hive, etl, oracle, sqlserver

Freesql

🦄 .NET orm, Mysql orm, Postgresql orm, SqlServer orm, Oracle orm, Sqlite orm, Firebird orm, Dameng orm, Kingbase orm, ShenTong orm, HighGo orm, GBase orm, Click house orm, MsAccess orm.

Stars: ✭ 3,077 (+2552.59%)

Mutual labels: oracle, mysql, sqlserver, clickhouse

Jooq

jOOQ is the best way to write SQL in Java

Stars: ✭ 4,695 (+3947.41%)

Mutual labels: oracle, database, mysql, sqlserver

Symmetric Ds

SymmetricDS is a database and file synchronization solution that is platform-independent, web-enabled, and database agnostic. SymmetricDS was built to make data replication across two to tens of thousands of databases and file systems fast, easy and resilient. We specialize in near real time, bi-directional data replication across large node networks over the WAN or LAN.

Stars: ✭ 450 (+287.93%)

Mutual labels: oracle, database, mysql, sqlserver

Typeorm

ORM for TypeScript and JavaScript (ES7, ES6, ES5). Supports MySQL, PostgreSQL, MariaDB, SQLite, MS SQL Server, Oracle, SAP Hana, WebSQL databases. Works in NodeJS, Browser, Ionic, Cordova and Electron platforms.

Stars: ✭ 26,559 (+22795.69%)

Mutual labels: oracle, database, mysql, sqlserver

Smartsql

SmartSql = MyBatis in C# + .NET Core+ Cache(Memory | Redis) + R/W Splitting + PropertyChangedTrack +Dynamic Repository + InvokeSync + Diagnostics

Stars: ✭ 775 (+568.1%)

Mutual labels: oracle, mysql, sqlserver

Zxw.framework.netcore

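A DotNetCore rapid development framework based on the EF Core Code First mode. It includes DBContext, the IoC components autofac and AspectCore.Injector, a code generator (DB First is also supported), memcache and Redis caching components based on AspectCore, a payment library based on ICanPay, and some everyday methods and extensions such as bulk insert, update, and delete and trigger support, plus a demo. Suggestions, feedback, and PRs are welcome.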

Stars: ✭ 691 (+495.69%)

Mutual labels: oracle, mysql, sqlserver

Ezsql

PHP class to make interacting with a database ridiculously easy

Stars: ✭ 804 (+593.1%)

Mutual labels: oracle, mysql, sqlserver

Blog

Everything about databases and business (mostly PostgreSQL).

Stars: ✭ 6,330 (+5356.9%)

Mutual labels: oracle, database, mysql

Flink Learning

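Flink learning blog. http://www.54tianzhisheng.cn/ Covers Flink basics, concepts, principles, hands-on practice, performance tuning, and source code analysis. Includes learning examples for Flink Connector, Metrics, Library, DataStream API, Table API & SQL, and more, as well as case studies of large production Flink projects (PVUV, log storage, real-time deduplication of ten-billion-scale data, monitoring and alerting). You are welcome to support my column "Flink in Action and Performance Optimization for Big Data Real-Time Computing".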

Stars: ✭ 11,378 (+9708.62%)

Mutual labels: mysql, influxdb, clickhouse

Sqlinjectionwiki

A wiki focusing on aggregating and documenting various SQL injection methods

Stars: ✭ 623 (+437.07%)

Mutual labels: oracle, mysql, sqlserver

DataX is an open source universal ETL tool

Documentation: a detailed description of how to install and deploy DataX and how to use each data collection plugin

English | 简体中文

current stable version

3.2.2

Note: As of 3.2.1, the package class names have been changed and are therefore no longer compatible with 3.1.x versions.

The project is derived from Alibaba's DataX and has been streamlined and adapted, as described below

Description of functional differences

Removed

Removed the plugins for databases that exist only inside Alibaba and are not available outside the Alibaba group, including:

ADS
DRDS
OCS
ODPS
OSS
OTS

Added

Added some plug-ins, which currently include

reader plugin

clickhousereader
dbffilereader
hbase20xreader
jsonfilereader
kudureader
influxdbreader
httpreader
elasticsearchreader
tdenginereader

writer plugin

dbffilewriter
greenplumwriter
kuduwriter
influxdbwriter
tdenginewriter

Some plug-in enhancements are listed below

RDBMS-related plugins

Add support for almost all basic data types and some complex data types.

hdfswriter

Add support for Decimal data type.
Add support for writing Parquet files.
Add support for writing with the overwrite mode.
Add support for more compression algorithms.
The temporary directory is now a hidden directory under the current write directory. This fixes the problem of partitions being added automatically, which happened because the temporary directory used to be a sibling of the write directory.
In overwrite mode, the file deletion mechanism has been improved to shorten the time window during which queries on the corresponding table return no rows.

hdfsreader

Add support for reading Parquet files.
Add support for more compression algorithms.

hbase11xsqlwriter

Add support for Kerberos authentication.

oraclewriter

Add support for merge into statement.

postgresqlwriter

Add support for insert into ... on conflict statement.

rdbmsreader/rdbmswriter

Add support for TDH Inceptor and the Trino query engine.
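
To make the postgresqlwriter item above more concrete, here is a minimal Python sketch of what an insert into ... on conflict statement can look like. It only illustrates the SQL form and is not taken from the plugin's source. The function name build_upsert, the JDBC-style ? placeholders, and the choice to update every non-key column are assumptions made for this example.

```python
def build_upsert(table, columns, conflict_columns):
    """Build a parameterized PostgreSQL INSERT ... ON CONFLICT statement.

    Hypothetical helper for illustration only. Identifiers are not quoted,
    so pass trusted table and column names.
    """
    placeholders = ", ".join(["?"] * len(columns))
    sql = "INSERT INTO {} ({}) VALUES ({}) ON CONFLICT ({})".format(
        table,
        ", ".join(columns),
        placeholders,
        ", ".join(conflict_columns),
    )
    # Columns that are not part of the conflict key get overwritten
    # with the values from the row that failed to insert (EXCLUDED).
    updates = [c for c in columns if c not in conflict_columns]
    if updates:
        sql += " DO UPDATE SET " + ", ".join(
            "{0} = EXCLUDED.{0}".format(c) for c in updates
        )
    else:
        sql += " DO NOTHING"
    return sql


print(build_upsert("users", ["id", "name"], ["id"]))
```

When every column belongs to the conflict key, the sketch falls back to DO NOTHING, because there is nothing left to update.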

Supported databases

database/filesystem	reader	writer	plugin(reader/writer)	memo
Cassandra	YES	YES	cassandrareader/cassandrawriter
ClickHouse	YES	YES	clickhousereader/clickhousewriter
DB2	YES	YES	rdbmsreader/rdbmswriter	not fully tested
DBF	YES	YES	dbffilereader/dbffilewriter
ElasticSearch	YES	YES	elasticsearchreader/elasticsearchwriter	originally from @Kestrong
FTP	YES	YES	ftpreader/ftpwriter
HBase 1.x	YES	YES	hbase11xreader/hbase11xwriter	use HBase API
HBase 1.x	YES	YES	hbase11xsqlreader/hbase11xsqlwriter	use Phoenix
HBase 2.x	YES	NO	hbase20xreader	use HBase API
HBase 2.x	YES	YES	hbase20xsqlreader/hbase20xsqlwriter	use Phoenix
HDFS	YES	YES	hdfsreader/hdfswriter	support HDFS 2.0 or later
HTTP	YES	NO	httpreader	supports RESTful APIs
Greenplum	YES	YES	postgresqlreader/greenplumwriter
InfluxDB	YES	YES	influxdbreader/influxdbwriter	ONLY support InfluxDB 1.x
JSON	YES	NO	jsonfilereader
Kudu	YES	YES	kudureader/kuduwriter
MongoDB	YES	YES	mongodbreader/mongodbwriter
MySQL/MariaDB	YES	YES	mysqlreader/mysqlwriter
Oracle	YES	YES	oraclereader/oraclewriter
PostgreSQL	YES	YES	postgresqlreader/postgresqlwriter
Trino	YES	YES	rdbmsreader/rdbmswriter	Trino (formerly PrestoSQL)
Redis	YES	YES	redisreader/rediswriter
SQL Server	YES	YES	sqlserverreader/sqlserverwriter
TDengine	YES	YES	tdenginereader/tdenginewriter	TDengine
TDH Inceptor2	YES	YES	rdbmsreader/rdbmswriter	Transwarp TDH 5.1 or later
TEXT	YES	YES	textfilereader/textfilewriter

quick start

Do not want to compile?

If you would rather not compile, or cannot compile in your environment, you can download the corresponding version from the links below

version	download	md5
3.2.2	https://pan.baidu.com/s/1TQyaERnIk9EQRDULfQE69w code: jh31	b04d2563adb36457b85e48c318757ea3
3.2.1	https://pan.baidu.com/s/1as6sL09HlxAN8b2pZ1DttQ code: hwgx	ecda4a961b032c75718502caf54246a8
3.1.9	https://pan.baidu.com/s/1GYpehEvB-W3qnqilhskXFw code: q4wv	48c4104294cd9bb0c749efc50b32b4dd
3.1.8	https://pan.baidu.com/s/1jv-tb-11grYaUnsgnEhDzw code: 2dnf	ef110ae1ea31e1761dc25d6930300485
3.1.7	https://pan.baidu.com/s/1CE5I8V5TNptdOp6GLid3Jg code: v5u3	fecca6c4a32f2bf7246fdef8bc2912fe
3.1.6	https://pan.baidu.com/s/1Ldg10E3qWkbUT44rkH19og code: 4av4	f6aea7e0ce4b9ec83554e9c6d6ab3cb6
3.1.5	https://pan.baidu.com/s/1yY_lJqulE6hKqktoQbbGmQ code: 2r4p	9ae27c1c434a097f67a17bb704f70731
3.1.4	https://pan.baidu.com/s/1_plsvzD_GrWN-HffPBtz-g code: kpjn	7aca526fe7f6f0f54dc467f6ca1647b1
3.1.2	https://pan.baidu.com/s/1zFqv8E6iJX549zdSZDQgiQ code: 7jdk	3674711fc9b68fad3086f3c8526a3427
3.1.1	https://pan.baidu.com/s/1GwmFA7-hPkd6GKiZEvUKXg code: 1inn	0fa4e7902420704b2e814fef098f40ae

compile and package

git clone https://github.com/wgzhao/datax.git DataX
cd DataX
mvn clean package
mvn package assembly:single

If you want to compile the documentation, run the following commands.

cd docs
mvn clean package

After successful compilation and packaging, a datax-<version> folder will be created in the target/datax directory of the project directory, where <version> indicates the version.
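
Because the folder name depends on the version that was built, a script can look it up instead of hard-coding it. This is a small Python sketch under that assumption. The function name find_datax_home is hypothetical, and picking the last match by name is just one simple choice when several builds are present.

```python
import glob
import os


def find_datax_home(project_dir):
    """Return the last (by name) datax-<version> folder under target/datax.

    Returns None when no such folder exists, for example before the build.
    """
    pattern = os.path.join(project_dir, "target", "datax", "datax-*")
    candidates = sorted(p for p in glob.glob(pattern) if os.path.isdir(p))
    return candidates[-1] if candidates else None
```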

begin your first job

The job subdirectory contains many sample jobs, of which job.json can be used as a smoke test. Run it as follows:

cd target/datax/datax-<version>
python bin/datax.py job/job.json

The output of the above command is roughly as follows.

 bin/datax.py job/job.json

DataX (DATAX-V3), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.


2020-09-23 19:51:30.990 [main] INFO  VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2020-09-23 19:51:30.997 [main] INFO  Engine - the machine info  =>

	osInfo:	Oracle Corporation 1.8 25.181-b13
	jvmInfo:	Mac OS X x86_64 10.15.6
	cpu num:	4

	totalPhysicalMemory:	-0.00G
	freePhysicalMemory:	-0.00G
	maxFileDescriptorCount:	-1
	currentOpenFileDescriptorCount:	-1

	GC Names	[PS MarkSweep, PS Scavenge]

	MEMORY_NAME                    | allocation_size                | init_size
	PS Eden Space                  | 677.50MB                       | 16.00MB
	Code Cache                     | 240.00MB                       | 2.44MB
	Compressed Class Space         | 1,024.00MB                     | 0.00MB
	PS Survivor Space              | 2.50MB                         | 2.50MB
	PS Old Gen                     | 1,365.50MB                     | 43.00MB
	Metaspace                      | -0.00MB                        | 0.00MB


2020-09-23 19:51:31.009 [main] INFO  Engine -
{
	"content":[
		{
			"reader":{
				"parameter":{
					"column":[
						{
							"type":"string",
							"value":"DataX"
						},
						{
							"type":"long",
							"value":19890604
						},
						{
							"type":"date",
							"value":"1989-06-04 00:00:00"
						},
						{
							"type":"bool",
							"value":true
						},
						{
							"type":"bytes",
							"value":"test"
						}
					],
					"sliceRecordCount":10
				},
				"name":"streamreader"
			},
			"writer":{
				"parameter":{
					"print":true,
					"column":[
						"col1"
					],
					"encoding":"UTF-8"
				},
				"name":"streamwriter"
			}
		}
	],
	"setting":{
		"errorLimit":{
			"record":0,
			"percentage":0.02
		},
		"speed":{
			"byte":-1,
			"channel":1
		}
	}
}

2020-09-23 19:51:31.068 [main] WARN  Engine - prioriy set to 0, because NumberFormatException, the value is: null
2020-09-23 19:51:31.069 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2020-09-23 19:51:31.069 [main] INFO  JobContainer - DataX jobContainer starts job.
2020-09-23 19:51:31.070 [main] INFO  JobContainer - Set jobId = 0
2020-09-23 19:51:31.082 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
2020-09-23 19:51:31.082 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] do prepare work .
2020-09-23 19:51:31.083 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] do prepare work .
2020-09-23 19:51:31.083 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2020-09-23 19:51:31.083 [job-0] INFO  JobContainer - Job set Channel-Number to 1 channels.
2020-09-23 19:51:31.083 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] splits to [1] tasks.
2020-09-23 19:51:31.084 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] splits to [1] tasks.
2020-09-23 19:51:31.102 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2020-09-23 19:51:31.111 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2020-09-23 19:51:31.117 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2020-09-23 19:51:31.119 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2020-09-23 19:51:31.120 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2020-09-23 19:51:31.129 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
DataX	19890604	1989-06-04 00:00:00	true	test
DataX	19890604	1989-06-04 00:00:00	true	test
DataX	19890604	1989-06-04 00:00:00	true	test
DataX	19890604	1989-06-04 00:00:00	true	test
DataX	19890604	1989-06-04 00:00:00	true	test
DataX	19890604	1989-06-04 00:00:00	true	test
DataX	19890604	1989-06-04 00:00:00	true	test
DataX	19890604	1989-06-04 00:00:00	true	test
DataX	19890604	1989-06-04 00:00:00	true	test
DataX	19890604	1989-06-04 00:00:00	true	test
2020-09-23 19:51:31.231 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successful, used[103]ms
2020-09-23 19:51:31.232 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2020-09-23 19:51:41.129 [job-0] INFO  StandAloneJobContainerCommunicator - Total 10 records, 260 bytes | Speed 26B/s, 1 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2020-09-23 19:51:41.130 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2020-09-23 19:51:41.130 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] do post work.
2020-09-23 19:51:41.130 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] do post work.
2020-09-23 19:51:41.130 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2020-09-23 19:51:41.130 [job-0] INFO  JobContainer - invokeHooks begin
2020-09-23 19:51:41.130 [job-0] INFO  JobContainer - report url not found
2020-09-23 19:51:41.133 [job-0] INFO  JobContainer -
	 [total cpu info] =>
		averageCpu                     | maxDeltaCpu                    | minDeltaCpu
		-1.00%                         | -1.00%                         | -1.00%


	 [total gc info] =>
		 NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime
		 PS MarkSweep         | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s
		 PS Scavenge          | 2                  | 2                  | 2                  | 0.006s             | 0.006s             | 0.006s

2020-09-23 19:51:41.133 [job-0] INFO  JobContainer - PerfTrace not enable!
2020-09-23 19:51:41.133 [job-0] INFO  StandAloneJobContainerCommunicator - Total 10 records, 260 bytes | Speed 26B/s, 1 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2020-09-23 19:51:41.134 [job-0] INFO  JobContainer - Total 10 records, 260 bytes | Speed 26B/s, 1 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2020-09-23 19:51:41.134 [job-0] INFO  JobContainer -
任务启动时刻                    : 2020-09-23 19:51:31
任务结束时刻                    : 2020-09-23 19:51:41
任务总计耗时                    :                 10s
任务平均流量                    :               26B/s
记录写入速度                    :              1rec/s
读出记录总数                    :                  10
读写失败总数                    :                   0
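
When a job is run from a script, the "Total ... records" line in the output above is a convenient place to read the result from. The following Python sketch parses that line. It assumes the line keeps exactly the format shown in this sample output. The pattern and the function name parse_summary are written for this example and are not part of DataX.

```python
import re

# Matches the statistics line printed near the end of the sample output.
SUMMARY = re.compile(
    r"Total (?P<records>\d+) records, (?P<bytes>\d+) bytes \| "
    r"Speed (?P<speed>[^,]+), (?P<rate>\d+) records/s \| "
    r"Error (?P<error_records>\d+) records, (?P<error_bytes>\d+) bytes"
)


def parse_summary(line):
    """Extract the counters from one log line, or return None if absent."""
    match = SUMMARY.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "records": int(fields["records"]),
        "bytes": int(fields["bytes"]),
        "speed": fields["speed"],  # kept as text, e.g. "26B/s"
        "records_per_second": int(fields["rate"]),
        "error_records": int(fields["error_records"]),
        "error_bytes": int(fields["error_bytes"]),
    }
```

Applied to the last statistics line of the sample run, the function reports 10 records, 260 bytes, and 0 error records. A caller could treat a non-zero error_records value as a failed smoke test.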

runtime requirements

JDK 1.8+
Python 2.7+ / Python 3.7+
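
The requirements above can be checked before launching a job. This is a minimal Python 3 sketch. It reads "Python 2.7+ / Python 3.7+" as "2.7 or later on the 2.x line, or 3.7 or later", which is an interpretation rather than something the project states. The Java check only confirms that a java executable is on the PATH and does not verify that it is 1.8 or newer.

```python
import shutil
import sys


def python_is_supported(version_info=sys.version_info):
    """True for Python 2.7+ on the 2.x line, or for Python 3.7 and later."""
    version = tuple(version_info[:2])
    return (2, 7) <= version < (3, 0) or version >= (3, 7)


def java_on_path():
    """True when a java executable can be found on the PATH."""
    return shutil.which("java") is not None


if __name__ == "__main__":
    print("python ok:", python_is_supported())
    print("java found:", java_on_path())
```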

documentation

License

This software is free to use under the Apache License 2.0.
