reata / sqllineage

Licence: MIT license
SQL Lineage Analysis Tool powered by Python

SQLLineage

SQL Lineage Analysis Tool powered by Python

Never got the hang of a SQL parser? SQLLineage comes to the rescue. Given a SQL command, SQLLineage tells you its source and target tables, so you don't have to worry about tokens, keywords, identifiers, and all the other jargon used by SQL parsers.

Behind the scenes, SQLLineage uses the fantastic sqlparse library to parse the SQL command and presents the result in human-readable form.

Demo & Documentation

Talk is cheap, show me a demo.

Documentation is hosted online on Read the Docs, where you can also check the release notes.

Quick Start

Install sqllineage via PyPI:

$ pip install sqllineage

Use the sqllineage command to parse a quoted query string:

$ sqllineage -e "insert into db1.table1 select * from db2.table2"
Statements(#): 1
Source Tables:
    db2.table2
Target Tables:
    db1.table1

Or parse a SQL file with the -f option:

$ sqllineage -f foo.sql
Statements(#): 1
Source Tables:
    db1.table_foo
    db1.table_bar
Target Tables:
    db2.table_baz
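
If you call the command from a script, you may want the printed summary as a data structure. The following is a minimal sketch, assuming the output layout is exactly as shown above (unindented section headers ending in a colon, indented table names). `parse_summary` is a hypothetical helper, not part of sqllineage, and the output format may differ between versions.

```python
from typing import Dict, List, Optional


def parse_summary(output: str) -> Dict[str, List[str]]:
    """Group the indented table names under their section header."""
    sections: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in output.splitlines():
        if not line.strip():
            continue
        if line[0].isspace():
            # Indented lines are table names belonging to the latest header.
            if current is not None:
                sections[current].append(line.strip())
        elif line.rstrip().endswith(":"):
            # Unindented lines ending in a colon open a new section.
            current = line.rstrip()[:-1]
            sections[current] = []
        else:
            # Anything else (e.g. the statement count) closes the section.
            current = None
    return sections


# Sample text copied from the -f example above.
sample = """Statements(#): 1
Source Tables:
    db1.table_foo
    db1.table_bar
Target Tables:
    db2.table_baz
"""
summary = parse_summary(sample)
print(summary["Source Tables"])
print(summary["Target Tables"])
```

In practice the `sample` string would come from capturing the command's standard output, for example with `subprocess.run`.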

Advanced Usage

Multiple SQL Statements

Lineage results are combined across multiple SQL statements, with intermediate tables identified:

$ sqllineage -e "insert into db1.table1 select * from db2.table2; insert into db3.table3 select * from db1.table1;"
Statements(#): 2
Source Tables:
    db2.table2
Target Tables:
    db3.table3
Intermediate Tables:
    db1.table1
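
One way to picture how the combined result is formed: a table that is only read is a source, a table that is only written is a target, and a table that is both written and read is intermediate. The sketch below illustrates that idea with plain sets. It is a simplification under that assumption, not SQLLineage's actual implementation, and `combine_lineage` is a hypothetical name.

```python
from typing import Dict, List, Set, Tuple

# Each statement is reduced to (tables read, tables written).
Statement = Tuple[Set[str], Set[str]]


def combine_lineage(statements: List[Statement]) -> Dict[str, Set[str]]:
    """Classify tables across all statements by how they are used."""
    reads: Set[str] = set()
    writes: Set[str] = set()
    for read, write in statements:
        reads |= read
        writes |= write
    return {
        "source": reads - writes,        # read but never written
        "target": writes - reads,        # written but never read
        "intermediate": reads & writes,  # written by one statement, read by another
    }


# The two INSERT statements from the example above.
statements = [
    ({"db2.table2"}, {"db1.table1"}),
    ({"db1.table1"}, {"db3.table3"}),
]
combined = combine_lineage(statements)
print(combined)
```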

Verbose Lineage Result

If you want to see the lineage result for every SQL statement, toggle the verbose option:

$ sqllineage -v -e "insert into db1.table1 select * from db2.table2; insert into db3.table3 select * from db1.table1;"
Statement #1: insert into db1.table1 select * from db2.table2;
    table read: [Table: db2.table2]
    table write: [Table: db1.table1]
    table cte: []
    table rename: []
    table drop: []
Statement #2: insert into db3.table3 select * from db1.table1;
    table read: [Table: db1.table1]
    table write: [Table: db3.table3]
    table cte: []
    table rename: []
    table drop: []
==========
Summary:
Statements(#): 2
Source Tables:
    db2.table2
Target Tables:
    db3.table3
Intermediate Tables:
    db1.table1

Column-Level Lineage

Column-level lineage is also supported in the command-line interface. Set the level option to column, and every column lineage path will be printed.

INSERT OVERWRITE TABLE foo
SELECT a.col1,
       b.col1     AS col2,
       c.col3_sum AS col3,
       col4,
       d.*
FROM bar a
         JOIN baz b
              ON a.id = b.bar_id
         LEFT JOIN (SELECT bar_id, sum(col3) AS col3_sum
                    FROM qux
                    GROUP BY bar_id) c
                   ON a.id = c.bar_id
         CROSS JOIN quux d;

INSERT OVERWRITE TABLE corge
SELECT a.col1,
       a.col2 + b.col2 AS col2
FROM foo a
         LEFT JOIN grault b
              ON a.col1 = b.col1;

Suppose this SQL is stored in a file called foo.sql:

$ sqllineage -f foo.sql -l column
<default>.corge.col1 <- <default>.foo.col1 <- <default>.bar.col1
<default>.corge.col2 <- <default>.foo.col2 <- <default>.baz.col1
<default>.corge.col2 <- <default>.grault.col2
<default>.foo.* <- <default>.quux.*
<default>.foo.col3 <- c.col3_sum <- <default>.qux.col3
<default>.foo.col4 <- col4
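
Each printed path reads from the final target column on the left back to its root source on the right. As a minimal sketch, assuming the `<-` separated layout shown above stays stable, the paths can be parsed to map each target column to its root sources. `parse_column_lineage` is a hypothetical helper, not part of sqllineage.

```python
from collections import defaultdict
from typing import Dict, List


def parse_column_lineage(output: str) -> Dict[str, List[str]]:
    """Map each leftmost (target) column to the rightmost (root) column of its paths."""
    roots: Dict[str, List[str]] = defaultdict(list)
    for line in output.splitlines():
        columns = [part.strip() for part in line.split("<-")]
        if len(columns) < 2:
            # Skip blank lines or anything that is not a lineage path.
            continue
        roots[columns[0]].append(columns[-1])
    return dict(roots)


# Three of the paths printed above.
sample = """<default>.corge.col1 <- <default>.foo.col1 <- <default>.bar.col1
<default>.corge.col2 <- <default>.foo.col2 <- <default>.baz.col1
<default>.corge.col2 <- <default>.grault.col2
"""
column_roots = parse_column_lineage(sample)
for target, sources in column_roots.items():
    print(target, "comes from", sources)
```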

Lineage Visualization

One more cool feature: if you want a graph visualization of the lineage result, toggle the graph-visualization option.

Still using the SQL file above:

$ sqllineage -g -f foo.sql

A web server will be started, showing a DAG representation of the lineage result in the browser:

  • Table-Level Lineage
  • Column-Level Lineage
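
To make "DAG representation" concrete, here is a small sketch that treats table-level lineage as a directed acyclic graph and lists the tables in dependency order. It uses the standard-library `graphlib` module (Python 3.9+) and the tables from the multi-statement example. It only illustrates the data structure and is not how SQLLineage builds or renders its graph.

```python
from graphlib import TopologicalSorter
from typing import Dict, Set

# Map each table to the set of tables it is derived from (its predecessors).
lineage: Dict[str, Set[str]] = {
    "db1.table1": {"db2.table2"},
    "db3.table3": {"db1.table1"},
}

# static_order() yields every node after all of its predecessors,
# so sources come first and final targets come last.
order = list(TopologicalSorter(lineage).static_order())
print(" -> ".join(order))
```

If the lineage contained a cycle, `static_order()` would raise `graphlib.CycleError`, which is one quick way to check that a set of edges really is a DAG.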
