
chop-dbhi / data-models

Licence: other
Collection of various biomedical data models in parseable formats.

Projects that are alternatives of or similar to data-models

Specs
Technical specifications and guidelines for implementing Frictionless Data.
Stars: ✭ 403 (+1652.17%)
Mutual labels:  schema, csv
DataAnalyzer.app
✨🚀 DataAnalyzer.app - Convert JSON/CSV to Typed Data Interfaces - Automatically!
Stars: ✭ 23 (+0%)
Mutual labels:  schema, csv
awesome-csv
Awesome Comma-Separated Values (CSV) - What's Next? - Frequently Asked Questions (F.A.Q.s) - Libraries & Tools
Stars: ✭ 46 (+100%)
Mutual labels:  schema, csv
Flatfiles
Reads and writes CSV, fixed-length and other flat file formats with a focus on schema definition, configuration and speed.
Stars: ✭ 275 (+1095.65%)
Mutual labels:  schema, csv
Omniparser
omniparser: a native Golang ETL streaming parser and transform library for CSV, JSON, XML, EDI, text, etc.
Stars: ✭ 148 (+543.48%)
Mutual labels:  schema, csv
mobivoc
A vocabulary for future-oriented mobility solutions and value-added services supporting them.
Stars: ✭ 27 (+17.39%)
Mutual labels:  schema, vocabulary
transferdb
TransferDB supports heterogeneous database schema conversion, full data export and import, and incremental data synchronization (Oracle → MySQL/TiDB).
Stars: ✭ 30 (+30.43%)
Mutual labels:  schema, csv
datatools
A set of tools for working with JSON, CSV and Excel workbooks
Stars: ✭ 68 (+195.65%)
Mutual labels:  csv
csv2json
Written in C, a CSV file to JSON file/string converter with UTF-8 support.
Stars: ✭ 18 (-21.74%)
Mutual labels:  csv
linkedin-to-jsonresume
Browser extension to turn a LinkedIn profile page into a JSON Resume export.
Stars: ✭ 93 (+304.35%)
Mutual labels:  schema
org-clock-csv
Export Emacs org-mode clock entries to CSV format.
Stars: ✭ 80 (+247.83%)
Mutual labels:  csv
thema
A CUE-based framework for portable, evolvable schema
Stars: ✭ 41 (+78.26%)
Mutual labels:  schema
Clockwork
A roleplaying framework developed by Cloud Sixteen for the people.
Stars: ✭ 37 (+60.87%)
Mutual labels:  schema
tableschema-go
A Go library for working with Table Schema.
Stars: ✭ 41 (+78.26%)
Mutual labels:  csv
haskell-schema
A library for describing Haskell data types and obtaining free generators, JSON codecs, pretty printers, etc.
Stars: ✭ 16 (-30.43%)
Mutual labels:  schema
AlphaVantageAPI
An Opinionated AlphaVantage API Wrapper in Python 3.9. Compatible with Pandas TA (pip install pandas_ta). Get your FREE API Key at https://www.alphavantage.co/support/
Stars: ✭ 77 (+234.78%)
Mutual labels:  csv
bluepine
A DSL for defining API schemas/endpoints, validating, serializing and generating Open API v3
Stars: ✭ 21 (-8.7%)
Mutual labels:  schema
YouPlot
A command line tool that draws plots on the terminal.
Stars: ✭ 412 (+1691.3%)
Mutual labels:  csv
shopify-product-csvs-and-images
Shopify product CSVs and images to seed your store with product data.
Stars: ✭ 76 (+230.43%)
Mutual labels:  csv
workbook
simple framework for containing spreadsheet like data
Stars: ✭ 13 (-43.48%)
Mutual labels:  csv

Data Models

Data models and vocabularies in the biomedical space.

Persistent CSV Format

Data model descriptions are stored persistently in this repository in CSV format for portability and human readability. Each data model has its own directory, with versions of the model in subdirectories. Each version directory has a datamodel.json file that holds metadata about the data model and version, so that interpretation does not rely on directory structure alone. In fact, this file and a collection of CSV files with the header signatures described below are enough to signal that a data model definition exists. However, the organization and naming conventions presented below have been useful in our initial data model definitions.

Each data model version should have at least definitions and schema directories and, optionally, a constraints directory and indexes.csv and references.csv files.

The definitions directory (e.g., omop/v5/definitions) holds basic information about the data model that would be of primary interest to a data user. There is a tables.csv file (e.g., omop/v5/definitions/tables.csv), which lists name and description for each table, as well as a CSV file for each table (e.g., omop/v5/definitions/person.csv), which lists name and description for each field, whether the field is required (a governance, not schema, attribute), and optionally a ref_table and ref_field combination to which the field refers (typically manifested as a foreign key relationship).
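For illustration, a tables.csv file with the attributes above might be parsed with the standard library. The header names and rows below are assumptions based on the description, not taken from the repository:

```python
import csv
import io

# Hypothetical contents of omop/v5/definitions/tables.csv. The header
# names (name, description) follow the description above; the rows are
# invented for illustration.
SAMPLE_TABLES = """\
name,description
person,Demographic information about each person.
visit_occurrence,Records of encounters with the health system.
"""

# DictReader keys each row by the header, so columns can be
# accessed by name rather than position.
tables = list(csv.DictReader(io.StringIO(SAMPLE_TABLES)))
```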

The schema directory holds detailed information that might be used to instantiate the data model in a database or other physical storage medium. There is a CSV file for each table (e.g., omop/v5/schema/person.csv) that lists type, length, precision, scale, and default attributes (all optional except type) for each field, which is identified by model, version, table name, and field name attributes.
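As a sketch, schema rows could be checked for the single required attribute, type. The header names and sample rows here are assumed from the description above:

```python
import csv
import io

# Hypothetical rows from omop/v5/schema/person.csv. Only type is
# required; length, precision, scale, and default may be blank.
SAMPLE_SCHEMA = """\
model,version,table,field,type,length,precision,scale,default
omop,v5,person,person_id,integer,,,,
omop,v5,person,gender_source_value,string,50,,,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_SCHEMA)))

# Collect any fields missing the required type attribute.
missing_type = [r["field"] for r in rows if not r["type"]]
```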

The constraints directory (e.g., omop/v5/constraints), if present, can hold any number of CSV files which list data level constraints that should be applied to any physical representation of the data model. These files (e.g., omop/v5/constraints/not_nulls.csv) contain a type, an optional name, and the target table and field for each constraint.

The indexes.csv file (e.g., omop/v5/indexes.csv), if present, lists indexes that should be built on a physical representation of the data model, with name, whether the index should be unique, target table and field, and order attributes for each index.

The references.csv file (e.g., omop/v5/references.csv), if present, lists references (usually foreign keys) which should be enforced on the data model. Each reference is listed with the source table and field, the target table and field, and an optional name.
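As one possible use, a references row could be rendered as an ALTER TABLE statement. The row values and column names here are hypothetical, assumed from the description:

```python
# A hypothetical row from omop/v5/references.csv, already parsed
# into a dict (column names assumed from the description above).
ref = {
    "source_table": "person",
    "source_field": "location_id",
    "target_table": "location",
    "target_field": "location_id",
    "name": "fpk_person_location",
}

# Render the reference as a foreign key constraint.
stmt = (
    f"ALTER TABLE {ref['source_table']} "
    f"ADD CONSTRAINT {ref['name']} "
    f"FOREIGN KEY ({ref['source_field']}) "
    f"REFERENCES {ref['target_table']} ({ref['target_field']});"
)
```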

Each data model root directory may have a renamings.csv file (e.g., omop/renamings.csv) that maps fields which have been renamed across versions by providing a source data model version, table, and field and a target version, table, and field.

The top-level mappings directory holds a series of CSV files which list field level mappings between data models. The files (e.g., mappings/pedsnet_v2_omop_v5.csv) contain a target_model, target_version, target_table, and target_field as well as a source_model, source_version, source_table, and source_field along with a free text comment for each mapping.
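The mapping files lend themselves to simple scripted lookups. A sketch, using the headers listed above (the sample rows themselves are invented):

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows in the shape of mappings/pedsnet_v2_omop_v5.csv,
# using the headers described above. The data values are invented.
SAMPLE_MAPPINGS = """\
target_model,target_version,target_table,target_field,source_model,source_version,source_table,source_field,comment
omop,v5,person,person_id,pedsnet,v2,person,person_id,Direct copy.
omop,v5,person,gender_concept_id,pedsnet,v2,person,gender_concept_id,Direct copy.
"""

# Group mapped target fields by target table.
fields_by_table = defaultdict(list)
for m in csv.DictReader(io.StringIO(SAMPLE_MAPPINGS)):
    fields_by_table[m["target_table"]].append(m["target_field"])
```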

CSV Tools

Python

The csv module in the standard library can be used.

import csv

# Writes all records to a file given a filename, a list of strings
# representing the header, and a list of rows containing the data.
def write_records(filename, header, rows):
    # newline='' prevents extra blank lines on Windows.
    with open(filename, 'w', newline='') as f:
        w = csv.writer(f)

        w.writerow(header)

        for row in rows:
            w.writerow(row)

PostgreSQL

PostgreSQL produces valid CSV output using the COPY statement. The output can be written to a file using an absolute path or to STDOUT.

Absolute path.

COPY ( ... )
    TO '/path/to/person.csv'
    WITH (
        FORMAT csv,
        DELIMITER ',',
        NULL '',
        HEADER true,
        ENCODING 'utf-8'
    );

To STDOUT.

COPY ( ... )
    TO STDOUT
    WITH (
        FORMAT csv,
        DELIMITER ',',
        NULL '',
        HEADER true,
        ENCODING 'utf-8'
    );

Java

opencsv is a popular library for reading and writing CSV files.

For loop with rows as a Collection or array of String[].

CSVWriter writer = new CSVWriter(new FileWriter(fileName),
                                 CSVWriter.DEFAULT_SEPARATOR,
                                 CSVWriter.NO_QUOTE_CHARACTER);

writer.writeNext(header);

for (String[] row : rows) {
    writer.writeNext(row);
}

writer.close();

If rows is a java.sql.ResultSet, use writeAll directly.

CSVWriter writer = new CSVWriter(new FileWriter(fileName),
                                 CSVWriter.DEFAULT_SEPARATOR,
                                 CSVWriter.NO_QUOTE_CHARACTER);

// Pass the result set and derive the header from the result set's
// metadata (assuming it is consistent with the spec).
writer.writeAll(rows, true);

writer.close();

Oracle

Oracle experts should feel free to chime in, but a very promising option is Oracle's new SQLcl command-line tool, available on an early-adopter basis as part of the SQL Developer family. SQLcl is being touted as a modern replacement for SQL*Plus.

Sample usage:

set sqlformat csv
spool footable.csv
select * from footable;
spool off

Another option is to use the SQL Developer GUI itself, which, although convenient, is not amenable to automation, as SQLcl is.

SQL Developer (and probably SQLcl) export CSV using the following conventions: all text fields are wrapped in quotes (even NULL values, because NULL and empty string are treated the same in Oracle), and no numeric fields are wrapped in quotes. Quotes within fields are escaped via doubling. Newlines within fields are included in the output.

SQL Developer usage:

  • On a Data tab (or a table name in the Connections panel), right-click and choose Export
  • Change format to csv
  • Change line terminator to Unix - other formatting and encoding defaults are fine