biocommons / Uta

Licence: apache-2.0
Universal Transcript Archive: comprehensive genome-transcript alignments; multiple transcript sources, versions, and alignment methods; available as a docker image


uta -- Universal Transcript Archive !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

bringing smiles to transcript users since 2013

.. Docs <http://pythonhosted.org/uta/>_

|build_status| |docker_badge| `Health Check`_

Overview
@@@@@@@@

The UTA (Universal Transcript Archive) stores transcripts aligned to sequence references (typically genome reference assemblies). It supports aligning the same transcript to multiple references using multiple alignment methods. Specifically, it facilitates the following:

* querying multiple transcript sources through a single interface
* interpreting variants reported in the literature against obsolete transcript records
* identifying regions where transcript and reference genome sequence assemblies disagree
* comparing transcripts from distinct sources
* comparing transcript alignments generated by multiple methods
* identifying ambiguities in transcript alignments

UTA is used by the hgvs_ package to map variants between genomic, transcript, and protein coordinates.

This code repository is primarily used for generating the UTA database. The primary interface for the database itself is via direct PostgreSQL access. (A `REST interface <https://bitbucket.org/biocommons/uta/issue/164/>`_ is planned, but not yet available.)

Accessing the Public UTA Instance @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

Invitae provides a public instance of UTA. The connection parameters are:

============  ===================
param         value
============  ===================
host          uta.invitae.com
port          5432 (default)
database      uta
login         anonymous
password      anonymous
============  ===================

For example::

  $ PGPASSWORD=anonymous psql -h uta.invitae.com -U anonymous -d uta

Or, in Python::

  >>> import psycopg2, psycopg2.extras
  >>> conn = psycopg2.connect("host=uta.invitae.com dbname=uta user=anonymous password=anonymous")
  >>> cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
  >>> cur.execute("select * from uta_20140210.tx_def_summary_v where hgnc='BRCA1'")
  >>> row = cur.fetchone()
  >>> dict(row)
  {
   'hgnc': 'BRCA1',
   'tx_ac': 'ENST00000586385',
   'cds_md5': '5d405c70b9b79add38d28e5011a6ddc0',
   'es_fingerprint': '95d60b8d62f5c23cbeff3499eedf5e89',
   'cds_start_i': 144,
   'cds_end_i': 666,
   'starts_i': [0, 148, 226, 267, 351, 406, 480, 541],
   'ends_i': [148, 226, 267, 351, 406, 480, 541, 781],
   'lengths': [148, 78, 41, 84, 55, 74, 61, 240],
  }
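
The coordinate fields in the row above can be cross-checked without a database connection. The sketch below assumes that the ``_i`` columns are 0-based, end-exclusive coordinates, which is consistent with the values shown (each entry in ``lengths`` equals ``end - start``). The helper names ``exon_lengths`` and ``cds_length`` are illustrative and are not part of UTA.

```python
# Values copied from the example row above (coordinate fields only).
row = {
    "cds_start_i": 144,
    "cds_end_i": 666,
    "starts_i": [0, 148, 226, 267, 351, 406, 480, 541],
    "ends_i": [148, 226, 267, 351, 406, 480, 541, 781],
    "lengths": [148, 78, 41, 84, 55, 74, 61, 240],
}


def exon_lengths(starts_i, ends_i):
    """Return per-exon lengths, assuming 0-based, end-exclusive coordinates."""
    return [end - start for start, end in zip(starts_i, ends_i)]


def cds_length(cds_start_i, cds_end_i):
    """Return the CDS length in bases under the same coordinate assumption."""
    return cds_end_i - cds_start_i


computed = exon_lengths(row["starts_i"], row["ends_i"])
print(computed == row["lengths"])  # True for the example row
print(cds_length(row["cds_start_i"], row["cds_end_i"]))  # 522
```

For this row the computed lengths match the ``lengths`` column, and the CDS spans 522 bases, a multiple of three.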

Installing UTA Locally @@@@@@@@@@@@@@@@@@@@@@

Installing with Docker (preferred)
##################################

`docker <http://docker.com>`_ enables the distribution of lightweight, isolated packages that run on essentially any platform. When you use this approach, you will end up with a local UTA installation that runs as a local postgresql process. The only requirement is docker itself -- you will not need to install postgresql or any of its dependencies.

#. Install `docker <https://docs.docker.com/installation/>`_.

#. Define the uta version to download

::

  $ uta_v=uta_20180821

This variable is used only for consistency in the examples that follow. Defining it is not required for any other reason.

The UTA version string indicates the data release date. The tag is made at the time of loading and is used to derive the filename for the database dumps and docker images. Therefore, the public postgresql instances, database dumps, and docker images will always contain exactly the same content.

#. Fetch the uta docker image from docker hub.

::

  $ docker pull biocommons/uta:$uta_v

This process will likely take 1-3 minutes.

#. Run the image

::

  $ docker volume create --name=$uta_v
  $ docker run \
    -d \
    --name $uta_v \
    -e POSTGRES_PASSWORD=some-password-you-make-up \
    -p 50827:5432 \
    -v $uta_v:/var/lib/postgresql/data \
    biocommons/uta:$uta_v

The first time you run this image, it will initialize a postgresql database cluster, then download a database dump and install it. The ``-d`` option starts the container in daemon (background) mode. To see progress::

  $ docker logs -f $uta_v

You will see messages from several processes running in parallel. Near the end, you'll see::

 == You may now connect to uta.  No password is required.
 ...
 2020-05-28 22:08:45.654 UTC [1] LOG:  database system is ready to accept connections

Hit Ctrl-C to stop watching logs. (The container will still be running.)

#. Test your installation

With the test commands below, you should see a table dump with at least 4 lines showing schema_version, create date, license, and uta (code) version used to build the instance.

Linux

On Linux, where docker runs natively, the ``-p 50827:5432`` option to the ``docker run`` command maps ``localhost:50827`` to port 5432 in the container. The following command connects to the UTA instance::

  $ psql -h localhost -p 50827 -U anonymous -d uta -c "select * from $uta_v.meta"

With DockerToolbox (Mac and Windows)

On Mac and Windows, docker runs in a virtual machine using `DockerToolbox <https://www.docker.com/docker-toolbox>`__. The ``-p 50827:5432`` option to ``docker run`` maps port 50827 of the VM (not that of the host OS). To connect to UTA, you must use the IP address of the VM, like this::

  $ psql -h $(docker-machine ip default) -p 50827 -U anonymous -d uta -c "select * from $uta_v.meta"
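
The naming convention described in the version step above (a release date embedded in the version string, from which the docker image tag and dump filename are derived) can be sketched as follows. The function names are hypothetical; the ``biocommons/uta:<version>`` and ``<version>.pgd.gz`` patterns are inferred from the commands shown in this document rather than taken from the UTA build scripts.

```python
from datetime import date, datetime

PREFIX = "uta_"


def release_date(uta_version: str) -> date:
    """Parse the release date out of a version string such as 'uta_20180821'."""
    if not uta_version.startswith(PREFIX):
        raise ValueError(f"unexpected UTA version string: {uta_version!r}")
    return datetime.strptime(uta_version[len(PREFIX):], "%Y%m%d").date()


def docker_image(uta_version: str) -> str:
    """Image reference as used with 'docker pull' in this document."""
    return f"biocommons/uta:{uta_version}"


def dump_filename(uta_version: str) -> str:
    """Dump filename pattern assumed from the restore example in this document."""
    return f"{uta_version}.pgd.gz"


version = "uta_20180821"
print(release_date(version))   # 2018-08-21
print(docker_image(version))   # biocommons/uta:uta_20180821
print(dump_filename(version))  # uta_20180821.pgd.gz
```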

Installing from database dumps
##############################

Users should prefer the public UTA instance (uta.biocommons.org) or the docker installation wherever possible. When those options are not available, users may wish to create a local postgresql database from database dumps. Users choosing this method of installation should be experienced with PostgreSQL administration.

The public site and docker images are built from exactly the same dumps as provided below. Building a database from these should result in a local database that is essentially identical to those options.

.. warning:: Due to the heterogeneity of operating systems and PostgreSQL installations, installing from database dumps is unsupported.

The following commands will likely need modification appropriate for the installation environment.

#. Download an appropriate database dump from `dl.biocommons.org <http://dl.biocommons.org/uta/>`_.

#. Create a user and database.

You may choose any username and database name you like. Using ``uta_admin`` as the username and ``uta`` as the database name is likely to ease installation.

::

  $ createuser -U postgres uta_admin
  $ createdb -U postgres -O uta_admin uta 

#. Restore the database.

::

  $ gzip -cdq uta_20150827.pgd.gz | psql -U uta_admin -1 -v ON_ERROR_STOP=1 -d uta -Eae

.. note:: See the hgvs docs for information on how to `configure hgvs <http://hgvs.readthedocs.org/en/latest/installation.html#local-uta-docker-instance>`_ to use this instance.
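
As a rough illustration of what that configuration involves, the sketch below assembles a database URL for a locally restored instance. It assumes that hgvs accepts a URL of the form ``postgresql://user@host:port/database/schema`` (for example through the ``UTA_DB_URL`` environment variable) and that the restored schema is named after the dump; both should be checked against the hgvs documentation. ``uta_db_url`` is a hypothetical helper, not part of UTA or hgvs.

```python
import os


def uta_db_url(host, port, database, schema, user="anonymous", password=None):
    """Build a PostgreSQL URL with the schema as the final path component.

    The trailing '/<schema>' segment is an assumption about the URL layout
    that hgvs expects; it is not standard libpq syntax.
    """
    auth = user if password is None else f"{user}:{password}"
    return f"postgresql://{auth}@{host}:{port}/{database}/{schema}"


# Names taken from the createuser/createdb and restore commands above;
# the schema name is assumed to match the dump's version string.
url = uta_db_url("localhost", 5432, "uta", "uta_20150827", user="uta_admin")
print(url)  # postgresql://uta_admin@localhost:5432/uta/uta_20150827

# Make the URL visible to processes started from this one (assumed variable name).
os.environ["UTA_DB_URL"] = url
```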

Development and Testing
@@@@@@@@@@@@@@@@@@@@@@@

To develop UTA, follow these steps.

1. Set up a virtual environment.

   With virtualenvwrapper_::

     mkvirtualenv uta-ve

   Or, with virtualenv_::

     virtualenv uta-ve
     source uta-ve/bin/activate

2. Clone UTA::

     hg clone ssh://[email protected]/biocommons/uta
     cd uta
     make develop

3. Restore a database or load a new one.

   UTA currently expects to have an existing database available. When the loaders are available, instructions will appear here. For now, creating an instance of TranscriptDB without arguments will cause it to connect to a populated Invitae database.

.. _health check: https://updown.io/a7i5
.. _hgvs: https://bitbucket.org/invitae/hgvs
.. _virtualenv: https://pypi.python.org/pypi/virtualenv
.. _virtualenvwrapper: http://virtualenvwrapper.readthedocs.org/en/latest/install.html

.. |build_status| image:: https://travis-ci.org/biocommons/uta.svg?branch=master
  :target: https://travis-ci.org/biocommons/uta
  :align: middle

.. |docker_badge| image:: https://img.shields.io/docker/pulls/biocommons/uta.svg?maxAge=2592000
  :target: https://hub.docker.com/r/biocommons/uta/
  :align: middle
