
nhl / link-move

License: Apache-2.0
A model-driven, dynamically configurable framework to acquire data from external sources and save it to your database.

Programming Languages

Java
68,154 projects - #9 most used programming language

Projects that are alternatives to or similar to link-move

Getting Started
This repository is a getting started guide to Singer.
Stars: ✭ 734 (+2193.75%)
Mutual labels:  etl, etl-framework
Openkettlewebui
A Kettle-based web platform for scheduling and controlling data-processing jobs. It supports both file-based and database repositories, drives Kettle data transformations through the web UI, and can be integrated into existing systems as middleware.
Stars: ✭ 125 (+290.63%)
Mutual labels:  etl, etl-framework
Pyetl
python ETL framework
Stars: ✭ 33 (+3.13%)
Mutual labels:  etl, etl-framework
DIRECT
DIRECT, the Data Integration Run-time Execution Control Tool, is a data logistics framework that can be used to monitor, log, audit and control data integration / ETL processes.
Stars: ✭ 20 (-37.5%)
Mutual labels:  etl, etl-framework
Metl
mito ETL tool
Stars: ✭ 153 (+378.13%)
Mutual labels:  etl, etl-framework
Choetl
ETL Framework for .NET / c# (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)
Stars: ✭ 372 (+1062.5%)
Mutual labels:  etl, etl-framework
Hale
(Spatial) data harmonisation with hale studio (formerly HUMBOLDT Alignment Editor)
Stars: ✭ 84 (+162.5%)
Mutual labels:  etl, etl-framework
DataBridge.NET
Configurable data bridge for permanent ETL jobs
Stars: ✭ 16 (-50%)
Mutual labels:  etl, etl-framework
Hydrograph
A visual ETL development and debugging tool for big data
Stars: ✭ 144 (+350%)
Mutual labels:  etl, etl-framework
Butterfree
A tool for building feature stores.
Stars: ✭ 126 (+293.75%)
Mutual labels:  etl, etl-framework
Metorikku
A simplified, lightweight ETL Framework based on Apache Spark
Stars: ✭ 361 (+1028.13%)
Mutual labels:  etl, etl-framework
Etlbox
A lightweight ETL (extract, transform, load) library and data integration toolbox for .NET.
Stars: ✭ 203 (+534.38%)
Mutual labels:  etl, etl-framework
qwery
A SQL-like language for performing ETL transformations.
Stars: ✭ 28 (-12.5%)
Mutual labels:  etl, etl-framework
Etlalchemy
Extract, Transform, Load: Any SQL Database in 4 lines of Code.
Stars: ✭ 460 (+1337.5%)
Mutual labels:  etl, etl-framework
redis-connect-dist
Real-Time Event Streaming & Change Data Capture
Stars: ✭ 21 (-34.37%)
Mutual labels:  etl, etl-framework
Stetl
Stetl, Streaming ETL, is a lightweight geospatial processing and ETL framework written in Python.
Stars: ✭ 64 (+100%)
Mutual labels:  etl, etl-framework
DaFlow
An Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.
Stars: ✭ 24 (-25%)
Mutual labels:  etl, etl-framework
cubetl
CubETL - Framework and tool for data ETL (Extract, Transform and Load) in Python (PERSONAL PROJECT / SELDOM MAINTAINED)
Stars: ✭ 21 (-34.37%)
Mutual labels:  etl, etl-framework
Transformalize
Configurable Extract, Transform, and Load
Stars: ✭ 125 (+290.63%)
Mutual labels:  etl, etl-framework
Bender
Bender - Serverless ETL Framework
Stars: ✭ 171 (+434.38%)
Mutual labels:  etl, etl-framework


LinkMove

LinkMove is a model-driven, dynamically configurable framework to acquire data from external sources and save it in your database. Its primary motivation is to facilitate domain-driven design (DDD) architectures. In DDD terms, LinkMove is a tool to synchronize data between related models from different "bounded contexts". It can also be used as a general-purpose ETL framework.

LinkMove connects data models in a flexible way that anticipates independent changes between sources and targets. It reuses your existing ORM mapping for the target database, reducing configuration to just describing the source. JDBC, XML, JSON, and CSV sources are supported out of the box.

Support

There are two options:

  • Open an issue on GitHub with a label of "help wanted" or "question" (or "bug" if you think you found a bug).
  • Post your question on the LinkMove forum.

Getting Started

Add LinkMove dependency:

<dependency>
    <groupId>com.nhl.link.move</groupId>
    <artifactId>link-move</artifactId>
    <version>3.0.M1</version>
</dependency>

The core module above supports relational and XML sources. The following optional modules may be added if you need to work with other formats:

<!-- for JSON -->
<dependency>
    <groupId>com.nhl.link.move</groupId>
    <artifactId>link-move-json</artifactId>
    <version>3.0.M1</version>
</dependency>
<!-- for CSV -->
<dependency>
    <groupId>com.nhl.link.move</groupId>
    <artifactId>link-move-csv</artifactId>
    <version>3.0.M1</version>
</dependency>

Use it:

// bootstrap a shared runtime that will run the tasks
DataSource srcDS = ...;            // define how you'd connect to the source data
ServerRuntime targetRuntime = ...; // Cayenne setup for the data target; targets are mapped in Cayenne
File rootDir = ...;                // parent dir of the extractor XML descriptors

LmRuntime lm = new LmRuntimeBuilder()
          .withConnector("myconnector", new DataSourceConnector(srcDS))
          .withTargetRuntime(targetRuntime)
          .extractorModelsRoot(rootDir)
          .build();

// create a reusable task for a given transformation
LmTask task = lm.getTaskService()
         .createOrUpdate(MyTargetEntity.class)
         .sourceExtractor("my-etl.xml")
         .matchBy(MyTargetEntity.NAME)
         .task();

// run the task, e.g. in a scheduled job
Execution e = task.run();
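
The source DataSource and the target Cayenne ServerRuntime are left for the application to provide. Below is a minimal sketch of one way to build them, assuming a HikariCP connection pool for the source and a Cayenne project file named cayenne-project.xml for the target; the JDBC URL, credentials, and file name are placeholders, not LinkMove requirements:

// hypothetical setup for the two elided pieces above
import javax.sql.DataSource;
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import org.apache.cayenne.configuration.server.ServerRuntime;

public class LmBootstrap {

    // a pooled JDBC DataSource for the *source* database (HikariCP is an assumption)
    static DataSource createSourceDataSource() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://localhost:5432/source_db"); // placeholder URL
        config.setUsername("etl_user");                                  // placeholder credentials
        config.setPassword("secret");
        return new HikariDataSource(config);
    }

    // a Cayenne ServerRuntime for the *target* database; the target entities
    // (e.g. MyTargetEntity) are mapped in the referenced Cayenne project
    static ServerRuntime createTargetRuntime() {
        return ServerRuntime.builder()
                .addConfig("cayenne-project.xml") // placeholder project file name
                .build();
    }
}

With these two pieces in place, the LmRuntime and task setup shown above can be used as-is.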

Extractor XML Format

The extractor XML format is described by a formal schema: http://linkmove.io/xsd/extractor_config_2.xsd

An example using a JDBC connector for the source data:

<?xml version="1.0" encoding="utf-8"?>
<config xmlns="http://linkmove.io/xsd/extractor_config_2.xsd"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://linkmove.io/xsd/extractor_config_2.xsd http://linkmove.io/xsd/extractor_config_2.xsd">
	
	<type>jdbc</type>
	<connectorId>myconnector</connectorId>
	
	<extractor>
		<!-- Optional source to target attribute mapping -->
		<attributes>
			<attribute>
				<type>java.lang.Integer</type>
				<source>AGE</source>
				<target>db:age</target>
			</attribute>
			<attribute>
				<type>java.lang.String</type>
				<source>DESCRIPTION</source>
				<target>db:description</target>
			</attribute>
			<attribute>
				<type>java.lang.String</type>
				<source>NAME</source>
				<target>db:name</target>
			</attribute>
		</attributes>
		<!-- JDBC connector properties. -->
		<properties>
			<!-- Query to run against the source. Supports full Cayenne 
			     SQLTemplate syntax, including parameters and directives.
			-->
			<extractor.jdbc.sqltemplate>
			       SELECT age, description, name FROM etl1
			</extractor.jdbc.sqltemplate>
		</properties>
	</extractor>
</config>

Logging Configuration

LinkMove uses the SLF4J logging abstraction, which works with most common logging frameworks (Log4j 2, Logback, etc.). Whichever framework you use, you will need to configure the following loggers, with levels chosen for the desired verbosity of your ETL tasks.

Logging ETL Progress

Configure the com.nhl.link.move.log logger to control logging of ETL task progress. The following table shows what is logged at each level:

Log Level   What is Logged
WARN        Nothing
INFO        Task start/stop with stats
DEBUG       Same as INFO, but also includes start/stop of each segment with segment stats
TRACE       Same as DEBUG, but also includes IDs of all affected target objects (deleted, created, updated)

Logging SQL

ETL-related SQL generated by Cayenne is extremely verbose and barely human-readable. Configure the org.apache.cayenne.log logger to turn it on or off:

Log Level   What is Logged
WARN        Nothing
INFO        Cayenne-generated SQL queries and updates
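
Both loggers are set up in whatever framework backs SLF4J in your application. A minimal sketch of a Logback configuration follows; the logger names come from the tables above, while the appender, pattern, and specific levels are only illustrative assumptions:

<configuration>
    <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>%d{HH:mm:ss.SSS} %-5level %logger{36} - %msg%n</pattern>
        </encoder>
    </appender>

    <!-- ETL progress: INFO for task stats; raise to DEBUG or TRACE for more detail -->
    <logger name="com.nhl.link.move.log" level="INFO"/>

    <!-- Cayenne-generated SQL: keep at WARN unless you need to see the queries -->
    <logger name="org.apache.cayenne.log" level="WARN"/>

    <root level="WARN">
        <appender-ref ref="STDOUT"/>
    </root>
</configuration>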