
biocommons / eutils

License: Apache-2.0
simplified searching, fetching, and parsing records from NCBI using their E-utilities interface



eutils -- a simplified interface to NCBI E-Utilities


eutils is a Python package to simplify searching, fetching, and parsing records from NCBI using their E-utilities interface.

News

  • 0.6.0 was released on 2019-12-17. Support for Python 2.7 has been dropped. See the 0.6 ChangeLog.

Documentation

See https://eutils.readthedocs.io/en/stable/

Features

  • simple Pythonic interface for searching and fetching
  • automatic query rate throttling per NCBI guidelines
  • optional sqlite-based caching of compressed replies
  • "façades" that facilitate access to essential attributes in replies
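
To illustrate the caching feature, here is a minimal sketch of an sqlite-backed cache that stores compressed replies. It uses only the standard library. The ReplyCache class, its table layout, and the key format are hypothetical and are not taken from the eutils source, which may work differently.

```python
import sqlite3
import zlib


class ReplyCache:
    """Hypothetical key/value cache; NOT the eutils implementation."""

    def __init__(self, path=":memory:"):
        # An on-disk path would persist replies between sessions.
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS replies (key TEXT PRIMARY KEY, body BLOB)"
        )

    def get(self, key):
        """Return the cached reply text, or None on a cache miss."""
        row = self._db.execute(
            "SELECT body FROM replies WHERE key = ?", (key,)
        ).fetchone()
        return None if row is None else zlib.decompress(row[0]).decode("utf-8")

    def put(self, key, text):
        """Compress the reply text and store it under key."""
        blob = zlib.compress(text.encode("utf-8"))
        self._db.execute(
            "INSERT OR REPLACE INTO replies (key, body) VALUES (?, ?)", (key, blob)
        )
        self._db.commit()


cache = ReplyCache()
# The key format and the XML placeholder are made up for illustration.
cache.put("efetch:gene:7157", "<Entrezgene-Set>...</Entrezgene-Set>")
```

Compressing before storing trades a little CPU time for a smaller cache file, which matters because XML replies tend to be large and repetitive.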

A Quick Example

As of May 1, 2018, NCBI throttles requests based on whether a client is registered. Unregistered clients are limited to 3 requests/second; registered clients are granted 10 requests/second, and may request more. See the NCBI Announcement for more information.

The eutils package will automatically throttle requests according to NCBI guidelines (3 or 10 requests/second without or with an API key, respectively).
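
The throttling rule above can be sketched as a small rate limiter. This is an illustration of the idea only: the Throttle class and the requests_per_second helper are hypothetical names, and eutils' own throttling code may be structured differently.

```python
import time


def requests_per_second(api_key):
    """Rate from the text above: 10/s with an API key, 3/s without."""
    return 10 if api_key else 3


class Throttle:
    """Hypothetical helper: space calls at least 1/rate seconds apart."""

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = 1.0 / rate
        self._clock = clock  # injectable so the class can be tested
        self._sleep = sleep
        self._last = None  # time of the previous permitted call

    def wait(self):
        """Block until the next request is allowed, then record it."""
        now = self._clock()
        if self._last is not None:
            remaining = self._min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# An unregistered client would be limited to 3 requests/second.
throttle = Throttle(requests_per_second(api_key=None))
```

Calling throttle.wait() before each request keeps a client within the limit. With eutils itself this is unnecessary, since the Client handles throttling.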

$ pip install eutils
$ ipython

>>> import os
>>> from eutils import Client

# Initialize a client. This client handles all caching and query
# throttling.  For example:
>>> ec = Client(api_key=os.environ.get("NCBI_API_KEY", None))

# search for tumor necrosis factor genes
# any valid NCBI query may be used
>>> esr = ec.esearch(db='gene',term='tumor necrosis factor')

# fetch a gene record by id (gene id 7157 is human TP53)
>>> egs = ec.efetch(db='gene', id=7157)

# One may fetch multiple genes at a time. These are returned as an
# EntrezgeneSet. We'll grab the first (and only) child, which returns
# an instance of the Entrezgene class.
>>> eg = egs.entrezgenes[0]

# Easily access some basic information about the gene
>>> eg.hgnc, eg.maploc, eg.description, eg.type, eg.genus_species
('TP53', '17p13.1', 'tumor protein p53', 'protein-coding', 'Homo sapiens')

# get a list of genomic references
>>> sorted([(r.acv, r.label) for r in eg.references])
[('NC_000017.11', 'Chromosome 17 Reference GRCh38...'),
 ('NC_018928.2', 'Chromosome 17 Alternate ...'),
 ('NG_017013.2', 'RefSeqGene')]

# Get the first three products defined on GRCh38
#>>> [p.acv for p in eg.references[0].products][:3]
#['NM_001126112.2', 'NM_001276761.1', 'NM_000546.5']

# As a sample, grab the first product defined on this reference (order is arbitrary)
>>> mrna = eg.references[0].products[0]
>>> str(mrna)
'GeneCommentary(acv=NM_001126112.2,type=mRNA,heading=Reference,label=transcript variant 2)'

# mrna.genomic_coords provides access to the exon definitions on this reference

>>> mrna.genomic_coords.gi, mrna.genomic_coords.strand
('568815581', -1)

>>> mrna.genomic_coords.intervals
[(7687376, 7687549), (7676520, 7676618), (7676381, 7676402),
(7675993, 7676271), (7675052, 7675235), (7674858, 7674970),
(7674180, 7674289), (7673700, 7673836), (7673534, 7673607),
(7670608, 7670714), (7668401, 7669689)]

# and the mrna has a product, the resulting protein:
>>> str(mrna.products[0])
'GeneCommentary(acv=NP_001119584.1,type=peptide,heading=Reference,label=isoform a)'
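
As a follow-up to the exon intervals shown above, here is a minimal sketch that derives per-exon lengths and the overall genomic span from that list. It assumes each tuple is a half-open (start, end) pair, so that end - start is the exon length. The README does not state the coordinate convention, so check it before relying on these numbers; with inclusive coordinates each length would be one larger. The helper names are hypothetical and not part of eutils.

```python
# Intervals copied from the mrna.genomic_coords.intervals output above.
intervals = [
    (7687376, 7687549), (7676520, 7676618), (7676381, 7676402),
    (7675993, 7676271), (7675052, 7675235), (7674858, 7674970),
    (7674180, 7674289), (7673700, 7673836), (7673534, 7673607),
    (7670608, 7670714), (7668401, 7669689),
]


def exon_lengths(intervals):
    """Length of each exon, ASSUMING half-open (start, end) pairs."""
    return [end - start for start, end in intervals]


def genomic_span(intervals):
    """Smallest start and largest end across all exons."""
    return (min(s for s, _ in intervals), max(e for _, e in intervals))


lengths = exon_lengths(intervals)
total_exonic = sum(lengths)  # exonic bases under the half-open assumption
span = genomic_span(intervals)
```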

Important Notes

  • You are encouraged to browse issues. Please report any issues you find.
  • Use a pip version specifier to stay within a single minor release series for API stability. For example, eutils>=0.6,<0.7.

Developing and Contributing

Contributions of bug reports, code patches, and documentation are welcome!

Development occurs in the default branch. Please work in feature branches or bookmarks from the default branch. Feature branches should be named for the eutils issue they fix, as in 121-update-xml-facades. When merging, use a commit message like "closes #121: update xml facades to new-style interface". ("closes #n" is recognized automatically and closes the ticket upon pushing.)

The included Makefile automates many tasks. In particular, make develop prepares a development environment and make test runs the unit tests. (Please run the tests before committing!)

Again, thanks for your contributions.
