ga4gh / vrs-python

License: Apache-2.0
GA4GH Variation Representation Python Implementation

vrs-python

vrs-python provides Python language support for the GA4GH Variation Representation Specification (VRS).

This repository contains several related components:

  • ga4gh.core package: Python language support for certain nascent standards in GA4GH. Eventually, this package should be moved to a distinct repo.

  • ga4gh.vrs package: Python language support for VRS.

  • ga4gh.vrs.extras package: Python language support for additional functionality, including translating from and to other variant formats, and a REST service that exposes similar functionality. ga4gh.vrs.extras requires access to supporting data, as described below.

  • Jupyter notebooks: Demonstrations of the functionality of ga4gh.vrs and ga4gh.vrs.extras in the form of easy-to-read notebooks.

VRS-Python and VRS Version Correspondence

The ga4gh/vrs-python repo embeds the ga4gh/vrs repo as a git submodule, and therefore each ga4gh.vrs package on PyPI embeds a particular version of VRS. The correspondences between the packages may be summarized as:

  • main ~ main: The vrs-python main branch tracks the vrs main branch.
  • develop ~ develop: The vrs-python develop branch tracks the vrs develop branch.
  • 0.6 ~ 1.1: vrs-python 0.6 branch tracks the vrs 1.1 branch.
  • 0.7 ~ 1.2: vrs-python 0.7 branch tracks the vrs 1.2 branch.
  • 0.8 ~ main: vrs-python 0.8 branch tracks the vrs main (dev) branch.
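
The correspondence listed above can also be written as a small lookup table, which may be handy in scripts that need to pick a matching vrs branch. This is only an illustrative sketch: the dictionary `VRS_PYTHON_TO_VRS` and the helper `vrs_branch_for` are hypothetical names and are not part of vrs-python.

```python
# Branch correspondence copied from the list above.
# Keys are vrs-python branches; values are the vrs branches they track.
VRS_PYTHON_TO_VRS = {
    "main": "main",
    "develop": "develop",
    "0.6": "1.1",
    "0.7": "1.2",
    "0.8": "main",
}


def vrs_branch_for(vrs_python_branch: str) -> str:
    """Return the vrs branch tracked by the given vrs-python branch."""
    try:
        return VRS_PYTHON_TO_VRS[vrs_python_branch]
    except KeyError:
        raise ValueError(
            f"no known vrs branch for vrs-python {vrs_python_branch!r}"
        ) from None


print(vrs_branch_for("0.7"))  # prints 1.2
```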

Developers: See the development section below for recommendations for using submodules gracefully (and without causing problems for others!).

Installation

Installing with pip

pip install ga4gh.vrs[extras]

The [extras] argument tells pip to install packages to fulfill the dependencies of the ga4gh.vrs.extras package.
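
To confirm that the install landed in the active environment, one option is to ask the standard library for the installed distribution's version. This is a minimal sketch, assuming Python 3.8 or later (for `importlib.metadata`); the helper name `installed_version` is hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("ga4gh.vrs")
if version is None:
    print("ga4gh.vrs is not installed in this environment")
else:
    print(f"ga4gh.vrs {version} is installed")
```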

Installing dependencies for ga4gh.vrs.extras

The ga4gh.vrs.extras modules are not part of the VRS specification per se. They are bundled with ga4gh.vrs for development and installation convenience. These modules depend directly and indirectly on external data sources of sequences, transcripts, and genome-transcript alignments. This section recommends one way to install the biocommons tools that provide these data.

docker volume create --name=uta_vol
docker volume create --name=seqrepo_vol
docker-compose up

This should start three containers:

  • seqrepo: downloads seqrepo into a docker volume and exits
  • seqrepo-rest-service: a REST service on seqrepo (localhost:5000)
  • uta: a database of transcripts and alignments (localhost:5432)

Check that the containers are running:

$ docker ps
CONTAINER ID        IMAGE                                    //  NAMES
86e872ab0c69        biocommons/seqrepo-rest-service:latest   //  vrs-python_seqrepo-rest-service_1
a40576b8cf1f        biocommons/uta:uta_20180821              //  vrs-python_uta_1

Depending on your network and host, the first run is likely to take 5-15 minutes in order to download and install data. Subsequent startups should be nearly instantaneous.

You can test UTA and seqrepo installations like so:

snafu$ psql -XAt postgres://anonymous@localhost/uta -c 'select count(*) from transcript'
249909
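
As a complementary check from Python, the sketch below probes the two ports listed above (5000 for seqrepo-rest-service, 5432 for uta) with a plain TCP connection. It only shows that something is listening; it does not verify that the data behind the service is complete. The helper name `port_is_open` and the `SERVICES` table are hypothetical.

```python
import socket

# Host/port pairs taken from the container list above.
SERVICES = {
    "seqrepo-rest-service": ("localhost", 5000),
    "uta": ("localhost", 5432),
}


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


for name, (host, port) in SERVICES.items():
    state = "reachable" if port_is_open(host, port) else "not reachable"
    print(f"{name} ({host}:{port}): {state}")
```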

It doesn't work!

Here are some things to try.

  • Bring up one service at a time. For example, if you haven't downloaded seqrepo yet, you might see this:

    snafu$ docker-compose up seqrepo-rest-service
    Starting vrs-python_seqrepo-rest-service_1 ... done
    Attaching to vrs-python_seqrepo-rest-service_1
    seqrepo-rest-service_1  | 2022-07-26 15:59:59 snafu seqrepo_rest_service.__main__[1] INFO Using seqrepo_dir='/usr/local/share/seqrepo/2021-01-29' from command line
    ⋮
    seqrepo-rest-service_1  | OSError: Unable to open SeqRepo directory /usr/local/share/seqrepo/2021-01-29
    vrs-python_seqrepo-rest-service_1 exited with code 1
    

Running the Notebooks

Once installed as described above, type

$ source venv/3.7/bin/activate
$ jupyter notebook --notebook-dir notebooks/

The following Jupyter extensions are recommended but not required:

$ pip install jupyter_contrib_nbextensions
$ jupyter contrib nbextension install --user
$ jupyter nbextension enable toc2/main

Running the Notebooks on the Terra platform

Terra is a cloud platform for biomedical research developed by the Broad Institute, Microsoft and Verily. The platform includes preconfigured environments that provide user-friendly access to various applications commonly used in bioinformatics, including Jupyter Notebooks.

We have created a public VRS-demo-notebooks workspace in Terra that contains the demo notebooks along with instructions for running them with minimal setup. To get started, see either the VRS-demo-notebooks workspace or the Terra.ipynb notebook in this repository.

Development

Submodules!

vrs-python embeds vrs as a submodule. When checking out vrs-python and switching branches, it is important to make sure that the submodule tracks vrs-python correctly. The recommended way to do this is git config --global submodule.recurse true. If you don't set submodule.recurse, developers and reviewers must be extremely careful not to accidentally upgrade or downgrade schemas with respect to vrs-python.

Alternatively, see misc/githooks/.
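
To check whether submodule.recurse is already enabled in your global git configuration, something like the following could be used. It is a sketch rather than part of the project's tooling: the helper names are hypothetical, and it inspects only the global config, so a repository-level override would go unnoticed.

```python
import subprocess


def is_truthy_git_bool(value: str) -> bool:
    """Interpret a git config value the way git treats booleans."""
    return value.strip().lower() in {"true", "yes", "on", "1"}


def submodule_recurse_enabled() -> bool:
    """Return True if submodule.recurse is enabled in the global git config."""
    try:
        result = subprocess.run(
            ["git", "config", "--global", "--get", "submodule.recurse"],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        # git is not installed or not on PATH.
        return False
    # git exits with a non-zero status when the key is unset.
    return result.returncode == 0 and is_truthy_git_bool(result.stdout)


print("submodule.recurse enabled:", submodule_recurse_enabled())
```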

Installing for development

Fork the repo at https://github.com/ga4gh/vrs-python/ .

$ git clone --recurse-submodules git@github.com:YOUR_GITHUB_ID/vrs-python.git
$ cd vrs-python
$ make devready

Testing

This package implements typical unit tests for ga4gh.core and ga4gh.vrs. This package also implements the compliance tests from vrs (vrs/validation) in the tests/validation/ directory.

$ make test

Developing VRS (the schema) too

If you want to develop the VRS schema in conjunction with vrs-python, the recommended approach for most users is to fork and clone the ga4gh/vrs repo, then set the VRS_SCHEMA_DIR environment variable to use an alternative schema location.
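
The general pattern of an environment-variable override can be sketched as follows. This is not the package's actual lookup code: `resolve_schema_dir`, `DEFAULT_SCHEMA_DIR`, and the paths shown are hypothetical and serve only to illustrate how VRS_SCHEMA_DIR might take precedence over a bundled default.

```python
import os
from pathlib import Path

# Hypothetical fallback used only for this sketch.
DEFAULT_SCHEMA_DIR = Path("vrs") / "schema"


def resolve_schema_dir() -> Path:
    """Return the directory named by VRS_SCHEMA_DIR, or the fallback if unset."""
    override = os.environ.get("VRS_SCHEMA_DIR")
    return Path(override) if override else DEFAULT_SCHEMA_DIR


# Point the lookup at a local clone of ga4gh/vrs (placeholder path).
os.environ["VRS_SCHEMA_DIR"] = "/path/to/your/vrs/schema"
print(resolve_schema_dir())
```

Exporting the variable in the shell before starting Python, rather than setting it from within a running program, avoids any question of whether it was set before the package read it.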

Security Note (from the GA4GH Security Team)

A stand-alone security review has been performed on the specification itself. This implementation is offered as-is, and without any security guarantees. It will need an independent security review before it can be considered ready for use in security-critical applications. If you integrate this code into your application it is AT YOUR OWN RISK AND RESPONSIBILITY to arrange for a security audit.
