Cheap and reliable Node.js hosting starts at $3/month, and $1/month static HTML hosting

Created with love in Canada, visit hostnodejs.com today

Feel like to post an Ad? Learn Details

All Projects → biocommons → Hgvs

biocommons / Hgvs

Licence: apache-2.0

Python library to parse, format, validate, normalize, and map sequence variants. `pip install hgvs`

Programming Languages

139335 projects - #7 most used programming language

Labels

bioinformatics genomics sequencing

Projects that are alternatives of or similar to Hgvs

A collection of scripts and notes related to genomics and bioinformatics

Stars: ✭ 101 (-26.81%)

Mutual labels: bioinformatics, genomics, sequencing

Genomics Extension for SQLite

Stars: ✭ 90 (-34.78%)

Mutual labels: bioinformatics, genomics, sequencing

Assembling the cause of phenotypes and genotypes from NGS data

Stars: ✭ 27 (-80.43%)

Mutual labels: bioinformatics, genomics, sequencing

Intuitive local web frontend for the BLAST bioinformatics tool

Stars: ✭ 198 (+43.48%)

Mutual labels: bioinformatics, genomics, sequencing

Analysis pipelines for sequencing data

Stars: ✭ 43 (-68.84%)

Mutual labels: bioinformatics, genomics, sequencing

SaffronTree: Reference free rapid phylogenetic tree construction from raw read data

Stars: ✭ 17 (-87.68%)

Mutual labels: bioinformatics, genomics, sequencing

A tool to circularize genome assemblies

Stars: ✭ 121 (-12.32%)

Mutual labels: bioinformatics, genomics, sequencing

Converts Prokka GFF3 files to EMBL files for uploading annotated assemblies to EBI

Stars: ✭ 27 (-80.43%)

Mutual labels: bioinformatics, genomics, sequencing

Official code repository for GATK versions 4 and up

Stars: ✭ 1,002 (+626.09%)

Mutual labels: bioinformatics, genomics, sequencing

Awesome Sequencing Tech Papers

A collection of publications on comparison of high-throughput sequencing technologies.

Stars: ✭ 21 (-84.78%)

Mutual labels: bioinformatics, genomics, sequencing

Antimicrobial Resistance Identification By Assembly

Stars: ✭ 96 (-30.43%)

Mutual labels: bioinformatics, genomics, sequencing

An interactive web tool for quality control of DNA sequencing data

Stars: ✭ 76 (-44.93%)

Mutual labels: bioinformatics, genomics, sequencing

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.

Stars: ✭ 2,404 (+1642.03%)

Mutual labels: bioinformatics, genomics, sequencing

A package for designing compact and comprehensive capture probe sets.

Stars: ✭ 55 (-60.14%)

Mutual labels: bioinformatics, genomics, sequencing

Rapid large-scale prokaryote pan genome analysis

Stars: ✭ 176 (+27.54%)

Mutual labels: bioinformatics, genomics, sequencing

Data intensive science for everyone.

Stars: ✭ 812 (+488.41%)

Mutual labels: bioinformatics, genomics, sequencing

Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins

Stars: ✭ 67 (-51.45%)

Mutual labels: bioinformatics, genomics, sequencing

Artemis is a free genome viewer and annotation tool that allows visualization of sequence features and the results of analyses within the context of the sequence, and its six-frame translation

Stars: ✭ 135 (-2.17%)

Mutual labels: bioinformatics, genomics, sequencing

Gcp For Bioinformatics

GCP Essentials for Bioinformatics Researchers

Stars: ✭ 95 (-31.16%)

Mutual labels: bioinformatics, genomics

Inference of ploidy and heterozygosity structure using whole genome sequencing data

Stars: ✭ 98 (-28.99%)

Mutual labels: bioinformatics, genomics

View All Similar Projects ➔

hgvs - manipulate biological sequence variants according to Human Genome Variation Society recommendations !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Important: biocommons packages require Python 3.6+. More <https://groups.google.com/forum/#!topic/hgvs-discuss/iLUzjzoD-28>__

The hgvs package provides a Python library to parse, format, validate, normalize, and map sequence variants according to Variation Nomenclature_ (aka Human Genome Variation Society) recommendations.

Specifically, the hgvs package focuses on the subset of the HGVS recommendations that precisely describe sequence-level variation relevant to the application of high-throughput sequencing to clinical diagnostics. The package does not attempt to cover the full scope of HGVS recommendations. Please refer to issues <https://github.com/biocommons/hgvs/issues>_ for limitations.

+--------------------+--------------------------------------------------------------------+ | Information | | |rtd| |changelog| |github_license| |binder| | | | | |gitter| |group| |getting_help| | +--------------------+--------------------------------------------------------------------+ | Latest Release | |github_tag| |pypi_rel| |hit| (pip install) | +--------------------+--------------------------------------------------------------------+ | Development | | |status_rel| |coveralls| | | (master branch) | | |issues| |github_open_pr| | | | | |github_stars| |github_forks| |github_contrib| | +--------------------+--------------------------------------------------------------------+

Features @@@@@@@@

Parsing is based on formal grammar.
An easy-to-use object model that represents most variant types (SNVs, indels, dups, inverstions, etc) and concepts (intronic offsets, uncertain positions, intervals)
A variant normalizer that rewrites variants in canoncial forms and substitutes reference sequences (if reference and transcript sequences differ)
Formatters that generate HGVS strings from internal representations
Tools to map variants between genome, transcript, and protein sequences
Reliable handling of regions genome-transcript discrepancies
Pluggable data providers support alternative sources of transcript mapping data
Extensive automated tests, including those for all variant types and "problematic" transcripts
Easily installed using remote data sources. Installation with local data sources is straightforward and completely obviates network access

Important Notes @@@@@@@@@@@@@@@

You are encouraged to browse issues <https://github.com/biocommons/hgvs/issues>_. All known issues are listed there. Please report any issues you find.
Use a pip package specification to stay within minor releases. For example, hgvs>=1.5,<1.6. hgvs uses Semantic Versioning <http://semver.org/>__.

Examples @@@@@@@@

Installation #############

By default, hgvs uses remote data sources, which makes installation easy.

::

$ mkvirtualenv hgvs-test (hgvs-test)$ pip install --upgrade setuptools (hgvs-test)$ pip install hgvs (hgvs-test)$ python

See Installation instructions <http://hgvs.readthedocs.org/en/stable/installation.html>__ for details, including instructions for installing Universal Transcript Archive (UTA) <https://github.com/biocommons/uta/>__ and SeqRepo <https://github.com/biocommons/biocommons.seqrepo/>__ locally.

Configuration #############

hgvs will use publicly available data sources unless directed otherwise through environment variables, like so::

N.B. These are examples. The correct values will depend on your installation

$ export UTA_DB_URL=postgresql://anonymous:[email protected]:5432/uta/uta_20180821 $ export HGVS_SEQREPO_DIR=/usr/local/share/seqrepo/latest

See the installation instructions for details.

Parsing and Formating #####################

hgvs parses HGVS variants (as strings) into an object model, and can format object models back into HGVS strings.

.. code-block:: python

import hgvs.parser

start with these variants as strings

hgvs_g = 'NC_000007.13:g.36561662C>T' hgvs_c = 'NM_001637.3:c.1582G>A'

parse the genomic variant into a Python structure

hp = hgvs.parser.Parser() var_g = hp.parse_hgvs_variant(hgvs_g) var_g SequenceVariant(ac=NC_000007.13, type=g, posedit=36561662C>T, gene=None)

SequenceVariants are composed of structured objects, e.g.,

var_g.posedit.pos.start SimplePosition(base=36561662, uncertain=False)

format by stringification

str(var_g) 'NC_000007.13:g.36561662C>T'

Projecting ("Mapping") variants between aligned genome and transcript sequences ###############################################################################

hgvs provides tools to project variants between genome, transcript, and protein sequences. Non-coding and intronic variants are supported. Alignment data come from the Universal Transcript Archive (UTA) <https://github.com/biocommons/uta/>__.

.. code-block:: python

import hgvs.dataproviders.uta import hgvs.assemblymapper

initialize the mapper for GRCh37 with splign-based alignments

hdp = hgvs.dataproviders.uta.connect() am = hgvs.assemblymapper.AssemblyMapper(hdp, ... assembly_name='GRCh37', alt_aln_method='splign', ... replace_reference=True)

identify transcripts that overlap this genomic variant

transcripts = am.relevant_transcripts(var_g) sorted(transcripts) ['NM_001177506.1', 'NM_001177507.1', 'NM_001637.3']

map genomic variant to one of these transcripts

var_c = am.g_to_c(var_g, 'NM_001637.3') var_c SequenceVariant(ac=NM_001637.3, type=c, posedit=1582G>A, gene=None) str(var_c) 'NM_001637.3:c.1582G>A'

CDS coordinates use BaseOffsetPosition to support intronic offsets

var_c.posedit.pos.start BaseOffsetPosition(base=1582, offset=0, datum=Datum.CDS_START, uncertain=False)

Translating coding variants to protein sequences ################################################

Coding variants may be translated to their protein consequences. HGVS uses the same pairing of transcript and protein accessions as seen in NCBI and Ensembl.

.. code-block:: python

translate var_c to its protein consequence

The object structure of protein variants is nearly identical to

that of nucleic acid variants and is converted to a string form

by stringification. Per HGVS recommendations, inferred consequences

must have parentheses to indicate uncertainty.

var_p = am.c_to_p(var_c) var_p SequenceVariant(ac=NP_001628.1, type=p, posedit=(Gly528Arg), gene=None) str(var_p) 'NP_001628.1:p.(Gly528Arg)'

setting uncertain to False removes the parentheses on the

stringified form

var_p.posedit.uncertain = False str(var_p) 'NP_001628.1:p.Gly528Arg'

formatting can be customized, e.g., use 1 letter amino acids to

format a specific variant

(configuration may also be set globally)

var_p.format(conf={"p_3_letter": False}) 'NP_001628.1:p.G528R'

Normalizing variants ####################

Some variants have multiple representations due to instrinsic biological ambiguity (e.g., inserting a G in a poly-G run) or due to misunderstanding HGVS recommendations. Normalization rewrites certain veriants into a single representation.

.. code-block:: python

rewrite ins as dup (depends on sequence context)

import hgvs.normalizer hn = hgvs.normalizer.Normalizer(hdp) hn.normalize(hp.parse_hgvs_variant('NM_001166478.1:c.35_36insT')) SequenceVariant(ac=NM_001166478.1, type=c, posedit=35dup, gene=None)

during mapping, variants are normalized (by default)

c1 = hp.parse_hgvs_variant('NM_001166478.1:c.31del') c1 SequenceVariant(ac=NM_001166478.1, type=c, posedit=31del, gene=None) c1n = hn.normalize(c1) c1n SequenceVariant(ac=NM_001166478.1, type=c, posedit=35del, gene=None) g = am.c_to_g(c1) g SequenceVariant(ac=NC_000006.11, type=g, posedit=49917127del, gene=None) c2 = am.g_to_c(g, c1.ac) c2 SequenceVariant(ac=NM_001166478.1, type=c, posedit=35del, gene=None)

There are more examples in the documentation <http://hgvs.readthedocs.org/en/stable/examples.html>_.

Citing hgvs (the package) @@@@@@@@@@@@@@@@@@@@@@@@@

| hgvs: A Python package for manipulating sequence variants using HGVS nomenclature: 2018 Update. | Wang M, Callenberg KM, Dalgleish R, Fedtsov A, Fox N, Freeman PJ, Jacobs KB, Kaleta P, McMurry AJ, Prlić A, Rajaraman V, Hart RK | Human Mutation. 2018 Pubmed <https://www.ncbi.nlm.nih.gov/pubmed/30129167>__ | Open Access PDF <https://doi.org/10.1002/humu.23615>__

| A Python Package for Parsing, Validating, Mapping, and Formatting Sequence Variants Using HGVS Nomenclature. | Hart RK, Rico R, Hare E, Garcia J, Westbrook J, Fusaro VA. | Bioinformatics. 2014 Sep 30. PubMed <http://www.ncbi.nlm.nih.gov/pubmed/25273102>__ | Open Access PDF <http://bioinformatics.oxfordjournals.org/content/31/2/268.full.pdf>__

Contributing @@@@@@@@@@@@

The hgvs package is intended to be a community project. Please see Contributing <http://hgvs.readthedocs.org/en/stable/contributing.html>__ to get started in submitting source code, tests, or documentation. Thanks for getting involved!

See Also @@@@@@@@

Other packages that manipulate HGVS variants:

pyhgvs <https://github.com/counsyl/hgvs>__
Mutalyzer <https://mutalyzer.nl/>__

.. _docs: http://hgvs.readthedocs.org/ .. _Variation Nomenclature: http://varnomen.hgvs.org/

.. |getting_help| image:: https://img.shields.io/badge/!-help%20me-red.svg :target: https://hgvs.readthedocs.io/en/stable/getting_help.html

.. |rtd| image:: https://img.shields.io/badge/docs-readthedocs-green.svg :target: http://hgvs.readthedocs.io/

.. |changelog| image:: https://img.shields.io/badge/docs-changelog-green.svg :target: https://hgvs.readthedocs.io/en/stable/changelog/

.. |github_license| image:: https://img.shields.io/github/license/biocommons/hgvs.svg :alt: GitHub license :target: https://github.com/biocommons/hgvs/blob/master/LICENSE)

.. |group| image:: https://img.shields.io/badge/group-hgvs%20discuss-green.svg :alt: Mailing list :target: https://groups.google.com/forum/#!forum/hgvs-discuss

.. |gitter| image:: https://img.shields.io/badge/chat-gitter-green.svg :alt: Join the chat at https://gitter.im/biocommons/hgvs :target: https://gitter.im/biocommons/hgvs?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge

.. |github_tag| image:: https://img.shields.io/github/tag/biocommons/hgvs.svg :alt: GitHub tag :target: https://github.com/biocommons/hgvs

.. |pypi_rel| image:: https://img.shields.io/pypi/v/hgvs.svg :target: https://pypi.org/project/hgvs/

.. |status_rel| image:: https://img.shields.io/travis/biocommons/hgvs/master.svg :target: https://travis-ci.org/biocommons/hgvs?branch=master

.. |coveralls| image:: https://img.shields.io/coveralls/github/biocommons/hgvs.svg :target: https://coveralls.io/github/biocommons/hgvs

.. |issues| image:: https://img.shields.io/github/issues-raw/biocommons/hgvs.svg :alt: issues :target: https://github.com/biocommons/hgvs/issues

.. |github_open_pr| image:: https://img.shields.io/github/issues-pr/biocommons/hgvs.svg :alt: GitHub Open Pull Requests :target: https://github.com/biocommons/hgvs/pull/

.. |github_stars| image:: https://img.shields.io/github/stars/biocommons/hgvs.svg?style=social&label=Stars :alt: GitHub stars :target: https://github.com/biocommons/hgvs/stargazers

.. |github_forks| image:: https://img.shields.io/github/forks/biocommons/hgvs.svg?style=social&label=Forks :alt: GitHub forks :target: https://github.com/biocommons/hgvs/network

.. |github_contrib| image:: https://img.shields.io/github/contributors/biocommons/hgvs.svg :alt: GitHub license :target: https://github.com/biocommons/hgvs/graphs/contributors/

.. |install_status| image:: https://travis-ci.org/reece/hgvs-integration-test.png?branch=master :target: https://travis-ci.org/reece/hgvs-integration-test

.. |binder| image:: https://mybinder.org/badge_logo.svg :target: https://mybinder.org/v2/gh/biocommons/hgvs/master?filepath=examples

.. |hit| image:: https://travis-ci.org/biocommons/hgvs-installation-test.svg?branch=master :alt: nightly test of ability to pip install, import, and parse a variant :target: https://travis-ci.org/biocommons/hgvs-installation-test

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].

Stars: ✭ 138

Visit Git Page 🔗Visit User Page 🔗Visit Issues Page (74) 🔗