odpf / columbus

Metadata storage service

License: Apache-2.0

Columbus


Columbus is a search and discovery engine built for querying application deployments, datasets and meta resources. It can also optionally track data flow relationships between these resources and allow the user to view a representation of the data flow graph.

Key Features

Discover why users choose Columbus as their main data discovery and lineage service

  • Full text search: Faster and better search results powered by Elasticsearch's full text search capability.
  • Search tuning: Narrow down your search results by adding filters to get crisp results.
  • Data lineage: Understand the relationships between metadata with the data lineage interface.
  • Scale: Columbus scales in an instant, both vertically and horizontally, for high performance.
  • Extensibility: Add your own metadata types and resources to support a wide variety of metadata.
  • Runtime: Columbus can run inside VMs or containers in a fully managed runtime environment like Kubernetes.

Usage

Explore the following resources to get started with Columbus:

  • Guides provides guidance on ingesting and querying metadata from Columbus.
  • Concepts describes all important Columbus concepts.
  • Reference contains details about configurations, metrics and other aspects of Columbus.
  • Contribute contains resources for anyone who wants to contribute to Columbus.

Requirements

Columbus is written in Go, and requires Go version >= 1.16. Please make sure that the Go toolchain is available on your machine. See Go's documentation for installation instructions.
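Before building, you can confirm your toolchain meets the minimum. The helper below is a hypothetical convenience, not part of Columbus; it relies on `sort -V`'s natural version ordering:

```shell
# version_ge A B — succeeds when version A >= version B, using sort -V's
# natural version ordering. Useful for checking the Go toolchain against
# the 1.16 minimum before building Columbus.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Usage sketch (assumes `go version` prints "go version goX.Y.Z os/arch"):
# installed="$(go version | awk '{print $3}' | sed 's/^go//')"
# version_ge "$installed" 1.16 && echo "Go $installed is new enough"
```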

Alternatively, you can use Docker to build Columbus as a Docker image. More on this in the next section.

Columbus uses Elasticsearch v7 as the query and storage backend. In order to run Columbus locally, you'll need a running Elasticsearch instance. You can either download Elasticsearch and run it manually, or run it inside Docker with the following command:

$ docker run -d -p 9200:9200 -e "discovery.type=single-node" elasticsearch:7.6.1
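Elasticsearch can take a few seconds to become ready after the container starts. One way to check is to poll the standard `_cluster/health` endpoint; the small helper below is a hypothetical sketch, not part of Columbus:

```shell
# es_status_ok <json> — succeeds when an Elasticsearch _cluster/health
# response reports "yellow" or "green" status. A hypothetical helper for
# readiness checks, not part of Columbus itself.
es_status_ok() {
  printf '%s' "$1" | grep -Eq '"status":"(yellow|green)"'
}

# Usage sketch: poll the node until it is ready (assumes curl is installed).
# for _ in $(seq 1 60); do
#   es_status_ok "$(curl -s http://localhost:9200/_cluster/health)" && break
#   sleep 1
# done
```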

Running locally

Begin by cloning this repository. You can then build Columbus in one of two ways:

  • As a native executable
  • As a docker image

To build columbus as a native executable, run make inside the cloned repository.

$ make

This will create the columbus binary in the root directory.

Building Columbus’s Docker image is simple: just run docker build and optionally name the image.

$ docker build . -t columbus

Columbus interfaces with an elasticsearch cluster. Run columbus using:

./columbus -elasticsearch-brokers "http://<broker-host-name>"

Elasticsearch brokers can alternatively be specified via the ELASTICSEARCH_BROKERS environment variable.
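For completeness, the environment-variable form of the same invocation looks like this (a configuration sketch; the broker URL is a placeholder):

```shell
# Equivalent to the -elasticsearch-brokers flag above: point Columbus at the
# Elasticsearch broker via the environment. The URL here is a placeholder.
export ELASTICSEARCH_BROKERS="http://localhost:9200"
./columbus
```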

If you used Docker to build Columbus, then configuring networking requires extra steps. The following is one way of doing it, running Elasticsearch inside Docker:

# create a docker network where columbus and elasticsearch will reside 
$ docker network create columbus-net

# run elasticsearch, bound to the network we created. Since we are using the -d flag to docker run, the command inside the subshell returns the container id
$ ES_CONTAINER_ID=$(docker run -d -e "discovery.type=single-node" --net columbus-net elasticsearch:7.5.2)

# run columbus, passing in the hostname (container id) of the elasticsearch server
# if everything goes ok, you should see something like this:

# time="2020-04-01T18:41:00Z" level=info msg="columbus v0.1.0-103-g83b909b starting on 0.0.0.0:8080" reporter=main
# time="2020-04-01T18:41:00Z" level=info msg="connected to elasticsearch cluster \"docker-cluster\" (server version 7.5.2)" reporter=main
$ docker run --net columbus-net -p 8080:8080 columbus -elasticsearch-brokers http://${ES_CONTAINER_ID}:9200

Running tests

# Run unit tests
$ make unit-test

# Run integration tests
$ make test

The integration test suite requires Docker to run Elasticsearch. If you wish to test against an existing Elasticsearch cluster instead, set ES_TEST_SERVER_URL to the URL of the Elasticsearch server.
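For example, pointing the suite at a locally running cluster might look like this (the URL is a placeholder for your own server):

```shell
# Run the integration suite against an already-running Elasticsearch cluster
# instead of the docker-managed one. The URL is a placeholder.
ES_TEST_SERVER_URL="http://localhost:9200" make test
```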

Contribute

Development of Columbus happens in the open on GitHub, and we are grateful to the community for contributing bugfixes and improvements. Read below to learn how you can take part in improving Columbus.

Read our contributing guide to learn about our development process, how to propose bugfixes and improvements, and how to build and test your changes to Columbus.

To help you get your feet wet and get you familiar with our contribution process, we have a list of good first issues that contain bugs which have a relatively limited scope. This is a great place to get started.

This project exists thanks to all the contributors.

License

Columbus is Apache 2.0 licensed.
