All Projects → wgzhao → Addax

wgzhao / Addax

Licence: Apache-2.0 License
Addax is an open source universal ETL tool that supports most of those RDBMS and NoSQLs on the planet, helping you transfer data from any one place to another.

Programming Languages

java
68154 projects - #9 most used programming language
python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Addax

Datax
DataX is an open source universal ETL tool that support Cassandra, ClickHouse, DBF, Hive, InfluxDB, Kudu, MySQL, Oracle, Presto(Trino), PostgreSQL, SQL Server
Stars: ✭ 116 (-81.14%)
Mutual labels:  influxdb, hive, hadoop, etl, clickhouse, oracle, sqlserver
Pyetl
python ETL framework
Stars: ✭ 33 (-94.63%)
Mutual labels:  hive, etl, excel, oracle, sqlserver
DataX-src
DataX 是异构数据广泛使用的离线数据同步工具/平台,实现包括 MySQL、Oracle、SqlServer、Postgre、HDFS、Hive、ADS、HBase、OTS、ODPS 等各种异构数据源之间高效的数据同步功能。
Stars: ✭ 21 (-96.59%)
Mutual labels:  hive, etl, oracle, sqlserver, datax
Csv2db
The CSV to database command line loader
Stars: ✭ 102 (-83.41%)
Mutual labels:  etl, oracle, db2, sqlserver
Apijson
🚀 零代码、热更新、全自动 ORM 库,后端接口和文档零代码,前端(客户端) 定制返回 JSON 的数据和结构。 🚀 A JSON Transmission Protocol and an ORM Library for automatically providing APIs and Docs.
Stars: ✭ 12,559 (+1942.11%)
Mutual labels:  clickhouse, oracle, db2, sqlserver
Trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Stars: ✭ 4,581 (+644.88%)
Mutual labels:  hive, hadoop, trino
Wedatasphere
WeDataSphere is a financial level one-stop open-source suitcase for big data platforms. Currently the source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!
Stars: ✭ 372 (-39.51%)
Mutual labels:  hive, hadoop, etl
web-click-flow
网站点击流离线日志分析
Stars: ✭ 14 (-97.72%)
Mutual labels:  hive, hadoop, etl
Dataspherestudio
DataSphereStudio is a one stop data application development& management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (+94.31%)
Mutual labels:  hive, hadoop, etl
Freesql
🦄 .NET orm, Mysql orm, Postgresql orm, SqlServer orm, Oracle orm, Sqlite orm, Firebird orm, 达梦 orm, 人大金仓 orm, 神通 orm, 翰高 orm, 南大通用 orm, Click house orm, MsAccess orm.
Stars: ✭ 3,077 (+400.33%)
Mutual labels:  clickhouse, oracle, sqlserver
Szt Bigdata
深圳地铁大数据客流分析系统🚇🚄🌟
Stars: ✭ 826 (+34.31%)
Mutual labels:  hive, hadoop, clickhouse
Eel Sdk
Big Data Toolkit for the JVM
Stars: ✭ 140 (-77.24%)
Mutual labels:  hive, hadoop, etl
Linq2db
Linq to database provider.
Stars: ✭ 2,211 (+259.51%)
Mutual labels:  etl, oracle, db2
hive to es
同步Hive数据仓库数据到Elasticsearch的小工具
Stars: ✭ 21 (-96.59%)
Mutual labels:  hive, hadoop, impala
DaFlow
Apache-Spark based Data Flow(ETL) Framework which supports multiple read, write destinations of different types and also support multiple categories of transformation rules.
Stars: ✭ 24 (-96.1%)
Mutual labels:  hive, hadoop, etl
hadoopoffice
HadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)
Stars: ✭ 56 (-90.89%)
Mutual labels:  hive, hadoop, excel
Sharding Method
分表分库的新思路——服务层Sharding框架,全SQL、全数据库兼容,ACID特性与原生数据库一致,能实现RR级别读写分离,无SQL解析性能更高
Stars: ✭ 188 (-69.43%)
Mutual labels:  oracle, db2, sqlserver
Liquibase
Main Liquibase Source
Stars: ✭ 2,910 (+373.17%)
Mutual labels:  oracle, db2, sqlserver
Haproxy Configs
80+ HAProxy Configs for Hadoop, Big Data, NoSQL, Docker, Elasticsearch, SolrCloud, HBase, MySQL, PostgreSQL, Apache Drill, Hive, Presto, Impala, Hue, ZooKeeper, SSH, RabbitMQ, Redis, Riak, Cloudera, OpenTSDB, InfluxDB, Prometheus, Kibana, Graphite, Rancher etc.
Stars: ✭ 106 (-82.76%)
Mutual labels:  influxdb, hive, hadoop
data-profiling
a set of scripts to pull meta data and data profiling metrics from relational database systems
Stars: ✭ 57 (-90.73%)
Mutual labels:  hive, oracle, sqlserver

Addax Logo

Addax is an open source universal ETL tool

Documentation detailed description of how to install and deploy and how to use each collection plugin

release version Maven Package

English | 简体中文

The project, originally from Ali's DataX, has been streamlined and adapted, as described below

Supported Data Sources

Addax supports more than 20 SQL and NoSQL data sources. It can also be extended to support more.

supported databases

Getting Started

Use docker image

docker pull wgzhao/addax:latest
docker run -ti --rm --name addax wgzhao/addax:latest /opt/addax/bin/addax.sh /opt/addax/job/job.json

Use install script

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/wgzhao/Addax/master/install.sh)"

This script installs Addax to its preferred prefix (/usr/local for macOS Intel, /opt/addax for Apple Silicon and /opt/addax/ for Linux)

Compile and Package

git clone https://github.com/wgzhao/addax.git addax
cd addax
mvn clean package
mvn package assembly:single

After successful compilation and packaging, a addax-<version> folder will be created in the target/datax directory of the project directory, where <version indicates the version.

Begin your first task

The job subdirectory contains many sample jobs, of which job.json can be used as a smoke-out test and executed as follows

bin/addax.sh job/job.json

The output of the above command is roughly as follows.

Click to expand
$ bin/addax.sh job/job.json


  ___      _     _
 / _ \    | |   | |
/ /_\ \ __| | __| | __ ___  __
|  _  |/ _` |/ _` |/ _` \ \/ /
| | | | (_| | (_| | (_| |>  <
\_| |_/\__,_|\__,_|\__,_/_/\_\

:: Addax version ::    (v4.0.3)

2021-09-16 11:03:20.328 [        main] INFO  VMInfo               - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2021-09-16 11:03:20.347 [        main] INFO  Engine               -
{
	"content":[
		{
			"reader":{
				"parameter":{
					"column":[
						{
							"type":"string",
							"value":"addax"
						},
						{
							"type":"long",
							"value":19890604
						},
						{
							"type":"date",
							"value":"1989-06-04 00:00:00"
						},
						{
							"type":"bool",
							"value":true
						},
						{
							"type":"bytes",
							"value":"test"
						}
					],
					"sliceRecordCount":10
				},
				"name":"streamreader"
			},
			"writer":{
				"parameter":{
					"print":true,
					"column":[
						"col1"
					],
					"encoding":"UTF-8"
				},
				"name":"streamwriter"
			}
		}
	],
	"setting":{
		"errorLimit":{
			"record":0,
			"percentage":0.02
		},
		"speed":{
			"byte":-1,
			"channel":1
		}
	}
}

2021-09-16 11:03:20.367 [        main] INFO  PerfTrace            - PerfTrace traceId=job_-1, isEnable=false, priority=0
2021-09-16 11:03:20.367 [        main] INFO  JobContainer         - Addax jobContainer starts job.
2021-09-16 11:03:20.368 [        main] INFO  JobContainer         - Set jobId = 0
2021-09-16 11:03:20.382 [       job-0] INFO  JobContainer         - Addax Reader.Job [streamreader] do prepare work .
2021-09-16 11:03:20.382 [       job-0] INFO  JobContainer         - Addax Writer.Job [streamwriter] do prepare work .
2021-09-16 11:03:20.383 [       job-0] INFO  JobContainer         - Job set Channel-Number to 1 channels.
2021-09-16 11:03:20.383 [       job-0] INFO  JobContainer         - Addax Reader.Job [streamreader] splits to [1] tasks.
2021-09-16 11:03:20.383 [       job-0] INFO  JobContainer         - Addax Writer.Job [streamwriter] splits to [1] tasks.
2021-09-16 11:03:20.405 [       job-0] INFO  JobContainer         - Scheduler starts [1] taskGroups.
2021-09-16 11:03:20.412 [ taskGroup-0] INFO  TaskGroupContainer   - taskGroupId=[0] start [1] channels for [1] tasks.
2021-09-16 11:03:20.415 [ taskGroup-0] INFO  Channel              - Channel set byte_speed_limit to -1, No bps activated.
2021-09-16 11:03:20.415 [ taskGroup-0] INFO  Channel              - Channel set record_speed_limit to -1, No tps activated.
addax	19890604	1989-06-04 00:00:00	true	test
addax	19890604	1989-06-04 00:00:00	true	test
addax	19890604	1989-06-04 00:00:00	true	test
addax	19890604	1989-06-04 00:00:00	true	test
addax	19890604	1989-06-04 00:00:00	true	test
addax	19890604	1989-06-04 00:00:00	true	test
addax	19890604	1989-06-04 00:00:00	true	test
addax	19890604	1989-06-04 00:00:00	true	test
addax	19890604	1989-06-04 00:00:00	true	test
addax	19890604	1989-06-04 00:00:00	true	test
2021-09-16 11:03:23.428 [       job-0] INFO  AbstractScheduler    - Scheduler accomplished all tasks.
2021-09-16 11:03:23.428 [       job-0] INFO  JobContainer         - Addax Writer.Job [streamwriter] do post work.
2021-09-16 11:03:23.428 [       job-0] INFO  JobContainer         - Addax Reader.Job [streamreader] do post work.
2021-09-16 11:03:23.430 [       job-0] INFO  JobContainer         - PerfTrace not enable!
2021-09-16 11:03:23.431 [       job-0] INFO  StandAloneJobContainerCommunicator - Total 10 records, 260 bytes | Speed 86B/s, 3 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2021-09-16 11:03:23.432 [       job-0] INFO  JobContainer         -
任务启动时刻                    : 2021-09-16 11:03:20
任务结束时刻                    : 2021-09-16 11:03:23
任务总计耗时                    :                  3s
任务平均流量                    :               86B/s
记录写入速度                    :              3rec/s
读出记录总数                    :                  10
读写失败总数                    :                   0

Here and Here provides all kinds of job configuration examples

Runtime Requirements

  • JDK 1.8+
  • Python 2.7+ / Python 3.7+ (Windows)

Documentation

Code Style

We recommend you use IntelliJ as your IDE. The code style template for the project can be found in the codestyle repository along with our general programming and Java guidelines. In addition to those you should also adhere to the following:

  • Alphabetize sections in the documentation source files (both in table of contents files and other regular documentation files). In general, alphabetize methods/variables/sections if such ordering already exists in the surrounding code.
  • When appropriate, use the Java 8 stream API. However, note that the stream implementation does not perform well so avoid using it in inner loops or otherwise performance sensitive sections.
  • Categorize errors when throwing exceptions. For example, AddaxException takes an error code and error message as arguments, AddaxException(REQUIRE_VALUE, "lack of required item"). This categorization lets you generate reports, so you can monitor the frequency of various failures.
  • Ensure that all files have the appropriate license header; you can generate the license by running mvn license:format.
  • Consider using String formatting (printf style formatting using the Java Formatter class): format("Session property %s is invalid: %s", name, value) (note that format() should always be statically imported). Sometimes, if you only need to append something, consider using the + operator.
  • Avoid using the ternary operator except for trivial expressions.
  • Use an assertion from Airlift's Assertions class if there is one that covers your case rather than writing the assertion by hand. Over time, we may move over to more fluent assertions like AssertJ.
  • When writing a Git commit message, follow these guidelines.

License

This software is free to use under the Apache License Apache license.

Special Thanks

Special thanks to JetBrains for his supports to this project.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].