ricardolsmendes / datacatalog-tag-manager

Licence: MIT license
datacatalog-tag-manager

A Python package to manage Google Cloud Data Catalog tags, loading metadata from external sources. Currently supports the CSV file format.

1. Environment setup

1.1. Python + virtualenv

Using virtualenv is optional, but strongly recommended unless you use Docker.

1.1.1. Install Python 3.6+

1.1.2. Create a folder

Keeping all related files in one folder makes the next instructions easier to follow.

mkdir ./datacatalog-tag-manager
cd ./datacatalog-tag-manager

All paths starting with ./ in the next steps are relative to the datacatalog-tag-manager folder.

1.1.3. Create and activate an isolated Python environment

pip install --upgrade virtualenv
python3 -m virtualenv --python python3 env
source ./env/bin/activate

1.1.4. Install the package

pip install --upgrade datacatalog-tag-manager
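To confirm the install from the active environment, you can ask Python for the installed version of the distribution. This is a minimal sketch and not part of the package: installed_version is a hypothetical helper, and importlib.metadata needs Python 3.8 or later, which is newer than the 3.6 minimum stated above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("datacatalog-tag-manager")
    print(version or "datacatalog-tag-manager is not installed in this environment")
```

If the helper prints a version number, the package is visible to the interpreter in the activated environment.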

1.2. Docker

Docker is an alternative way to run datacatalog-tag-manager. If you choose it, skip the virtualenv setup instructions above.

1.2.1. Get the source code

git clone https://github.com/ricardolsmendes/datacatalog-tag-manager
cd ./datacatalog-tag-manager

1.3. Auth credentials

1.3.1. Create a service account and grant it the roles below

  • BigQuery Metadata Viewer
  • Data Catalog TagTemplate User
  • A custom role with bigquery.datasets.updateTag and bigquery.tables.updateTag permissions

1.3.2. Download a JSON key and save it as

  • ./credentials/datacatalog-tag-manager.json

1.3.3. Set the environment variable

This step may be skipped if you're using Docker.

export GOOGLE_APPLICATION_CREDENTIALS=./credentials/datacatalog-tag-manager.json
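A misplaced or truncated key file is a common cause of authentication failures, so it can help to sanity-check the file before running any command. The sketch below is an illustration only: missing_key_fields and EXPECTED_FIELDS are hypothetical names, and the check covers just a few fields that a Google service account JSON key contains. It does not verify that the key is valid or that the roles were granted.

```python
import json
import os
from typing import List

# A few fields found in a Google service account JSON key.
EXPECTED_FIELDS = ("type", "project_id", "private_key", "client_email")


def missing_key_fields(path: str) -> List[str]:
    """Return the expected fields that are absent from the JSON key at `path`."""
    with open(path, encoding="utf-8") as handle:
        key = json.load(handle)
    return [field for field in EXPECTED_FIELDS if field not in key]


if __name__ == "__main__":
    path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        print("GOOGLE_APPLICATION_CREDENTIALS is not set")
    elif not os.path.isfile(path):
        print(f"No file found at {path}")
    else:
        missing = missing_key_fields(path)
        print(f"Missing fields: {missing}" if missing else "Key file looks complete")
```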

2. Manage Tags

2.1. Create or Update

2.1.1. From a CSV file

  • SCHEMA

The metadata schema to create or update Tags is presented below. Use as many lines as needed to describe all the Tags and Fields you need.

| Column | Description | Mandatory |
| linked_resource OR entry_name | Full name of the BigQuery or PubSub asset the Entry refers to, or an Entry name if you are working with Custom Entries | |
| template_name | Resource name of the Tag Template for the Tag | |
| column | Attach Tags to a column belonging to the Entry schema | |
| field_id | Id of the Tag field | |
| field_value | Value of the Tag field | |
  • SAMPLE INPUT
  1. sample-input/upsert-tags for reference;
  2. Data Catalog Sample Tags (Google Sheets) might help to create/export a CSV file.
  • COMMANDS

Python + virtualenv

datacatalog-tags upsert --csv-file <CSV-FILE-PATH>

Docker

docker build --rm --tag datacatalog-tag-manager .
docker run --rm --tty \
  --volume <CREDENTIALS-FILE-DIR>:/credentials --volume <CSV-FILE-DIR>:/data \
  datacatalog-tag-manager upsert --csv-file /data/<CSV-FILE-PATH>
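The upsert schema can also be produced programmatically. The sketch below writes one CSV line per Tag field using Python's csv module. Several things here are assumptions: the header labels are copied from the Column names in the schema and may not match what the tool expects (compare with sample-input/upsert-tags), the helper names upsert_rows and to_csv are hypothetical, and the resource name, template name, and field ids are placeholders.

```python
import csv
import io
from typing import Dict, List

# Header labels copied from the schema; the exact labels the tool expects
# are an assumption, so compare with sample-input/upsert-tags.
UPSERT_COLUMNS = [
    "linked_resource OR entry_name",
    "template_name",
    "column",
    "field_id",
    "field_value",
]


def upsert_rows(resource: str, template: str, fields: Dict[str, str],
                column: str = "") -> List[Dict[str, str]]:
    """Build one CSV row per Tag field for a single Tag."""
    return [
        {
            "linked_resource OR entry_name": resource,
            "template_name": template,
            "column": column,
            "field_id": field_id,
            "field_value": value,
        }
        for field_id, value in fields.items()
    ]


def to_csv(rows: List[Dict[str, str]]) -> str:
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=UPSERT_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical resource, template, and field names; replace with your own.
    rows = upsert_rows(
        resource="//bigquery.googleapis.com/projects/my-project/datasets/my_dataset/tables/my_table",
        template="projects/my-project/locations/us-central1/tagTemplates/my_template",
        fields={"owner": "data-team", "has_pii": "false"},
    )
    print(to_csv(rows), end="")
```

Redirect the output to a file and pass that file to the upsert command through --csv-file.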

2.2. Delete

2.2.1. From a CSV file

  • SCHEMA

The metadata schema to delete Tags is presented below. Use as many lines as needed to delete all the Tags you want.

| Column | Description | Mandatory |
| linked_resource OR entry_name | Full name of the BigQuery or PubSub asset the Entry refers to, or an Entry name if you are working with Custom Entries | |
| template_name | Resource name of the Tag Template of the Tag | |
| column | Delete Tags from a column belonging to the Entry schema | |
  • SAMPLE INPUT
  1. sample-input/delete-tags for reference;
  2. Data Catalog Sample Tags (Google Sheets) might help to create/export a CSV file.
  • COMMANDS

Python + virtualenv

datacatalog-tags delete --csv-file <CSV-FILE-PATH>

Docker

docker build --rm --tag datacatalog-tag-manager .
docker run --rm --tty \
  --volume <CREDENTIALS-FILE-DIR>:/credentials --volume <CSV-FILE-DIR>:/data \
  datacatalog-tag-manager delete --csv-file /data/<CSV-FILE-PATH>
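When the upsert and delete commands are driven from a script, building the argument list in one place avoids quoting mistakes. The following is a minimal sketch under that assumption: tag_command and ACTIONS are hypothetical names, and only the two subcommands and the --csv-file option shown in this README are used.

```python
import shlex
from typing import List

ACTIONS = ("upsert", "delete")  # the two subcommands shown in this README


def tag_command(action: str, csv_path: str) -> List[str]:
    """Build the argv list for a datacatalog-tags invocation."""
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return ["datacatalog-tags", action, "--csv-file", csv_path]


if __name__ == "__main__":
    argv = tag_command("delete", "./delete-tags.csv")
    # Print a shell-safe rendering of the command; to execute it instead,
    # pass argv to subprocess.run(argv, check=True).
    print(" ".join(shlex.quote(arg) for arg in argv))
```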

3. How to contribute

Please take a moment to read the Code of Conduct.

3.1. Report issues

Please report bugs and suggest features via GitHub Issues.

Before opening an issue, search the tracker for possible duplicates. If you find a duplicate, please add a comment saying that you encountered the problem as well.

3.2. Contribute code

Please make sure to read the Contributing Guide before making a pull request.
