
RoelantVos / Data-Warehouse-Automation-Metadata-Schema

Licence: LGPL-3.0
Generic interface exchange format for Data Warehouse Automation and ETL generation.

Programming languages: C#, Handlebars, Batchfile

Projects that are alternatives to, or similar to, Data-Warehouse-Automation-Metadata-Schema

DIRECT
DIRECT, the Data Integration Run-time Execution Control Tool, is a data logistics framework that can be used to monitor, log, audit and control data integration / ETL processes.
Stars: ✭ 20
Mutual labels:  datawarehouse, etl-automation, datawarehouseautomation
alphasql
AlphaSQL provides integrated type and schema checking and parallelization for SQL file sets, mainly for BigQuery.
Stars: ✭ 35
Mutual labels:  datawarehouse, datawarehouseautomation
BimlFlex-Community
Community-focused content to supplement working with BimlFlex.
Stars: ✭ 30
Mutual labels:  datawarehouse, datawarehouseautomation
TEAM
The Taxonomy for ETL Automation Metadata (TEAM) is a metadata management tool for data warehouse automation. It is part of the ecosystem for data warehouse automation, alongside the Virtual Data Warehouse pattern manager and the generic schema for Data Warehouse Automation.
Stars: ✭ 27
Mutual labels:  datawarehouseautomation, etlgeneration
vixtract
www.vixtract.ru
Stars: ✭ 40
Mutual labels:  etl-automation
peppy
Project metadata manager for PEPs in Python
Stars: ✭ 29
Mutual labels:  metadata-management
stream-registry
Stream Discovery and Stream Orchestration
Stars: ✭ 105
Mutual labels:  metadata-management
FlowMaster
ETL flow framework based on Yaml configs in Python
Stars: ✭ 19
Mutual labels:  etl-automation
cortana-intelligence-customer360
This repository contains instructions and code to deploy a customer 360 profile solution on Azure stack using the Cortana Intelligence Suite.
Stars: ✭ 22
Mutual labels:  datawarehouse
redis-connect-dist
Real-Time Event Streaming & Change Data Capture
Stars: ✭ 21
Mutual labels:  etl-automation
tweetsOLAPing
Implementing an end-to-end ETL and analysis pipeline for tweets.
Stars: ✭ 24
Mutual labels:  datawarehouse
intelli-swift-core
Distributed, column-oriented, high-performance database for real-time analysis.
Stars: ✭ 17
Mutual labels:  datawarehouse
couchwarehouse
Data warehouse for CouchDB
Stars: ✭ 41
Mutual labels:  datawarehouse
go-xmp
A native Go SDK for the Extensible Metadata Platform (XMP)
Stars: ✭ 36
Mutual labels:  metadata-management
dlink
Dinky is an out-of-the-box, one-stop real-time computing platform dedicated to the construction and practice of Unified Streaming & Batch and Unified Data Lake & Data Warehouse. Based on Apache Flink, Dinky can connect to many big data frameworks, including OLAP and Data Lake.
Stars: ✭ 1,535
Mutual labels:  datawarehouse
Crema
Metadata server and client tools for game development
Stars: ✭ 61
Mutual labels:  metadata-management
DataWarehouse
From data warehousing to user profiling, from data construction to data application
Stars: ✭ 298
Mutual labels:  datawarehouse
csv-cruncher
Treats CSV and JSON files as SQL tables, and exports SQL SELECTs back to CSV or JSON.
Stars: ✭ 32
Mutual labels:  etl-automation
heurist
Core development repository. gitHub: Vsn 6 (2020 - ), Vsn 5 (2018 - 2020), Vsn 4 (2014-2017). Sourceforge: Vsn 3 (2009-2013), Vsn 1 & 2 (2005-2009)
Stars: ✭ 39
Mutual labels:  metadata-management
BETL-old
BETL: metadata-driven ETL generation using T-SQL
Stars: ✭ 17
Mutual labels:  etl-automation

Generation Metadata Schema for Data Warehouse Automation

Intent

To provide a collaborative space to discuss an exchange format for ETL generation metadata that supports Data Warehouse Automation. This exchange format (the metadata adapter) should contain all metadata necessary to generate the transformation logic for a Data Warehouse solution.

Links / structure

The following directories have been set up:

  • Generic interface, containing the JSON schema definition.
  • Class Library (DataWarehouseAutomation), containing the object model for deserialisation as well as various utility classes, such as validation of files against the JSON schema definition.
  • Code examples (examples_handlebars), containing C# examples using the generic interface for various purposes.
  • Regression test project (test_project)
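The class library validates files against the JSON schema definition. The Python sketch below hand-rolls a minimal structural check to illustrate what such validation catches. It is not the project's C# validator, and the property names (`dataObjectMappings`, `sourceDataObject`, `targetDataObject`, `dataItemMappings`) are hypothetical simplifications rather than the real schema.

```python
import json

# Hypothetical, simplified structure; these are NOT the actual property
# names of the project's JSON schema definition.
REQUIRED_MAPPING_KEYS = {"sourceDataObject", "targetDataObject", "dataItemMappings"}


def validate_mapping_document(text: str) -> list[str]:
    """Return a list of validation errors; an empty list means the document passed."""
    try:
        document = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]

    # Expect a top-level object holding a list of mappings.
    mappings = document.get("dataObjectMappings") if isinstance(document, dict) else None
    if not isinstance(mappings, list):
        return ["'dataObjectMappings' must be a list"]

    errors: list[str] = []
    for index, mapping in enumerate(mappings):
        if not isinstance(mapping, dict):
            errors.append(f"mapping {index}: must be an object")
            continue
        # Report each required property that is absent, in a stable order.
        for key in sorted(REQUIRED_MAPPING_KEYS - set(mapping)):
            errors.append(f"mapping {index}: missing '{key}'")
    return errors


good = ('{"dataObjectMappings": [{"sourceDataObject": "stg_customer", '
        '"targetDataObject": "hub_customer", "dataItemMappings": []}]}')
bad = '{"dataObjectMappings": [{"sourceDataObject": "stg_customer"}]}'
print(validate_mapping_document(good))  # []
print(validate_mapping_document(bad))
```

A real implementation would delegate to a full JSON Schema validator, which also checks types, formats and nested definitions rather than only the presence of keys.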

Hypothesis

Across most, if not all, metadata models there is a core set of information that is required for any ETL generation. If this core can be separated from the UI and management of metadata, we could have an exchange format that allows anyone to 'plug in' their own preferred technology.

As an example, TEAM is intended to provide a separate UI, limited to metadata entry and validation, and does not focus on SQL generation. Instead, generation is decoupled through an adapter that is accessible as JSON or as a database view.
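The sketch below illustrates the hypothesis that generation can be decoupled from metadata management. It reads a simplified JSON mapping, the kind of document such an adapter might expose, and renders an `INSERT ... SELECT` statement with Python's `string.Template`. The project's own examples use Handlebars templates in C#. The JSON shape, the template and `render_insert` are hypothetical and serve only as illustration.

```python
import json
from string import Template

# Hypothetical, simplified exchange-format document: one source-to-target mapping.
MAPPING_JSON = """
{
  "sourceDataObject": "stg_customer",
  "targetDataObject": "hub_customer",
  "dataItemMappings": [
    {"source": "customer_id", "target": "customer_key"},
    {"source": "load_date", "target": "load_datetime"}
  ]
}
"""

# The "plugged-in" technology: a plain SQL template. Replacing this template
# (or the whole renderer) does not require any change to the metadata itself.
INSERT_TEMPLATE = Template(
    "INSERT INTO $target ($target_columns)\n"
    "SELECT $source_columns\n"
    "FROM $source;"
)


def render_insert(mapping_text: str) -> str:
    """Render an INSERT ... SELECT statement from one mapping document."""
    mapping = json.loads(mapping_text)
    items = mapping["dataItemMappings"]
    return INSERT_TEMPLATE.substitute(
        target=mapping["targetDataObject"],
        source=mapping["sourceDataObject"],
        target_columns=", ".join(item["target"] for item in items),
        source_columns=", ".join(item["source"] for item in items),
    )


print(render_insert(MAPPING_JSON))
```

The same `MAPPING_JSON` could feed a different renderer (another SQL dialect, or an orchestration config) without modification. That separation is what the exchange format aims for.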

Requirements

The metadata adapter has two fundamental requirements:

  • It contains all metadata required to generate ETL output. This notably includes:
    • source-to-target mappings
    • physical model metadata (columns and tables, data types etc.)
    • connectivity information, or a proxy for it
  • It is text-based, to support version control.
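The requirements above can be sketched as a small object model. This version uses Python dataclasses with hypothetical class and field names (`DataItem`, `DataObject`, `DataObjectMapping`, `connection_key`), not the project's C# object model. It serialises to JSON with sorted keys and fixed indentation, so the same metadata always produces the same text, which keeps version-control diffs minimal.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DataItem:
    # Physical model metadata for a column (hypothetical field names).
    name: str
    data_type: str


@dataclass
class DataObject:
    # A table or view. connection_key is a proxy that refers to connection
    # details stored elsewhere, rather than the credentials themselves.
    name: str
    connection_key: str
    data_items: list[DataItem] = field(default_factory=list)


@dataclass
class DataObjectMapping:
    # One source-to-target mapping with (source column, target column) pairs.
    source: DataObject
    target: DataObject
    item_mappings: list[tuple[str, str]] = field(default_factory=list)


def to_versionable_json(mapping: DataObjectMapping) -> str:
    """Serialise deterministically: sorted keys, fixed indentation, trailing newline."""
    return json.dumps(asdict(mapping), sort_keys=True, indent=2) + "\n"


source = DataObject("stg_customer", "staging", [DataItem("customer_id", "int")])
target = DataObject("hub_customer", "integration", [DataItem("customer_key", "int")])
mapping = DataObjectMapping(source, target, [("customer_id", "customer_key")])
print(to_versionable_json(mapping))
```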

Background

In the Data Warehouse Automation (DWA) domain there are many specialists (e.g. ETL developers, Data Warehouse and Data Architects, BI analysts) who have worked, or are working, on proprietary meta models to support forward-engineering of code and designs.

Some of these meta models are built inside existing tools (e.g. ERwin, PowerDesigner) using SDKs or macros. Others use different development frameworks (.NET, Java), and most use differently modelled repositories or file formats to persist data on disk.

This is in addition to the many off-the-shelf DWA platforms, each with its own repository and format.

In the broader spirit of meritocracy, it is worth exploring whether a common exchange format for metadata can be defined in a way that any developer can build against, in whatever technology or way their passion drives them.

Working guidelines

For any change, create a new branch (no direct commits to master branch).
