tokern / data-lineage

Licence: MIT License
Generate and Visualize Data Lineage from query history

Tokern Lineage Engine

Tokern Lineage Engine is a fast, easy-to-use application to collect, visualize and analyze column-level data lineage in databases, data warehouses and data lakes in AWS and GCP.

Tokern Lineage helps you browse column-level data lineage

Resources

  • Demo of Tokern Lineage App

Quick Start

Install a demo using Docker and Docker Compose

Download the docker-compose file from the GitHub repository.

# in a new directory run
wget https://raw.githubusercontent.com/tokern/data-lineage/master/install-manifests/docker-compose/catalog-demo.yml -O docker-compose.yml
# or run
curl https://raw.githubusercontent.com/tokern/data-lineage/master/install-manifests/docker-compose/catalog-demo.yml -o docker-compose.yml

Run docker-compose

docker-compose up -d

Check that the containers are running.

docker ps
CONTAINER ID   IMAGE                            COMMAND   CREATED        STATUS       PORTS                  NAMES
3f4e77845b81   tokern/data-lineage-viz:latest   ...       4 hours ago    Up 4 hours   0.0.0.0:8000->80/tcp   tokern-data-lineage-visualizer
1e1ce4efd792   tokern/data-lineage:latest       ...       5 days ago     Up 5 days                           tokern-data-lineage
38be15bedd39   tokern/demodb:latest             ...       2 weeks ago    Up 2 weeks                          tokern-demodb
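
The check above can also be scripted. The following is a minimal sketch, not part of the project: the helper name `missing_containers` is hypothetical, and the expected container names are copied from the `docker ps` listing above. It reads the running container names on standard input and reports any expected name that is absent.

```shell
# Container names as shown in the `docker ps` listing above.
EXPECTED="tokern-data-lineage-visualizer tokern-data-lineage tokern-demodb"

# Hypothetical helper: read running container names on stdin (one per line)
# and print every expected name that is absent.
# Returns non-zero if at least one expected container is missing.
missing_containers() {
  running="$(cat)"
  status=0
  for name in $EXPECTED; do
    # -F: literal match, -x: whole line must match, -q: no output
    if ! printf '%s\n' "$running" | grep -Fqx "$name"; then
      echo "not running: $name"
      status=1
    fi
  done
  return "$status"
}

# Usage (requires a running Docker daemon):
#   docker ps --format '{{.Names}}' | missing_containers
```

Matching whole lines matters here because `tokern-data-lineage` is a prefix of `tokern-data-lineage-visualizer`.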

Try out Tokern Lineage App

Head to http://localhost:8000/ to open the Tokern Lineage app.

Install Tokern Lineage Engine

# in a new directory run
wget https://raw.githubusercontent.com/tokern/data-lineage/master/install-manifests/docker-compose/tokern-lineage-engine.yml
# or run
curl https://raw.githubusercontent.com/tokern/data-lineage/master/install-manifests/docker-compose/tokern-lineage-engine.yml -o tokern-lineage-engine.yml

Run docker-compose

docker-compose -f tokern-lineage-engine.yml up -d

If you want to use an external Postgres database, change the following parameters in tokern-lineage-engine.yml:

  • CATALOG_HOST
  • CATALOG_USER
  • CATALOG_PASSWORD
  • CATALOG_DB

You can also override the default values using environment variables.

CATALOG_HOST=... CATALOG_USER=... CATALOG_PASSWORD=... CATALOG_DB=... docker-compose -f ... up -d

For more advanced usage of environment variables with docker-compose, refer to the docker-compose docs.
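
As an alternative to typing the variables inline on every run, they can be kept in a `.env` file, which docker-compose reads for variable substitution from the project directory (by default, the directory that holds the compose file). This is a sketch under assumptions: every value below is a hypothetical placeholder, and it presumes the compose file picks up these four variables as described above.

```shell
# Hypothetical placeholder values; replace them with your own Postgres settings.
cat > .env <<'EOF'
CATALOG_HOST=db.example.internal
CATALOG_USER=catalog_user
CATALOG_PASSWORD=change-me
CATALOG_DB=tokern
EOF

# The file contains a password, so restrict it to the current user.
chmod 600 .env

# With .env next to the compose file, start the engine as usual:
#   docker-compose -f tokern-lineage-engine.yml up -d
```

Variables set in the shell take precedence over values in `.env`, so the inline form shown earlier still works for one-off overrides.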

Pro-tip

If you want to connect to a database on the host machine, set

CATALOG_HOST: host.docker.internal # For macOS or Windows
# OR
CATALOG_HOST: 172.17.0.1 # Linux (default Docker bridge gateway)
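
Choosing between the two values above can be automated. The helper below is a rough sketch, not part of the project: `catalog_host_for` is a hypothetical name, and it assumes that any system whose `uname -s` is not `Linux` runs Docker Desktop. Setups such as Docker Desktop under WSL, or a custom bridge network, may need a different value.

```shell
# Hypothetical helper: print a CATALOG_HOST value for a given `uname -s` string.
catalog_host_for() {
  case "$1" in
    Linux) echo "172.17.0.1" ;;            # default Docker bridge gateway on Linux
    *)     echo "host.docker.internal" ;;  # assumed: Docker Desktop on macOS or Windows
  esac
}

# Pick the value for the current machine and export it for docker-compose.
CATALOG_HOST="$(catalog_host_for "$(uname -s)")"
export CATALOG_HOST
echo "CATALOG_HOST=$CATALOG_HOST"
```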

Supported Technologies

  • Postgres
  • AWS Redshift
  • Snowflake

Coming Soon

  • SparkSQL
  • Presto

Documentation

For advanced usage, please refer to the data-lineage documentation.

Survey

Please take this survey if you are a user or considering using data-lineage. Responses will help us prioritize features better.