zyxue / ncbitax2lin

Licence: MIT license
🐞 Convert NCBI taxonomy dump into lineages

NCBItax2lin

Convert the NCBI taxonomy dump into lineages. An example lineage for human (tax_id=9606) looks like this:

(Shown transposed, one column per line; columns that are empty for this record are listed at the end.)

tax_id         9606
superkingdom   Eukaryota
phylum         Chordata
class          Mammalia
order          Primates
family         Hominidae
genus          Homo
species        Homo sapiens
infraorder     Simiiformes
kingdom        Metazoa
no rank        cellular organisms
no rank1       Opisthokonta
no rank10      Dipnotetrapodomorpha
no rank11      Tetrapoda
no rank12      Amniota
no rank13      Theria
no rank14      Eutheria
no rank15      Boreoeutheria
no rank2       Eumetazoa
no rank3       Bilateria
no rank4       Deuterostomia
no rank5       Vertebrata
no rank6       Gnathostomata
no rank7       Teleostomi
no rank8       Euteleostomi
no rank9       Sarcopterygii
parvorder      Catarrhini
subfamily      Homininae
suborder       Haplorrhini
subphylum      Craniata
superfamily    Hominoidea
superorder     Euarchontoglires

Empty for this record: family1, forma, genus1, infraclass, no rank16, no rank17, no rank18, no rank19, no rank20, no rank21, no rank22, species group, species subgroup, species1, subclass, subgenus, subkingdom, subspecies, subtribe, superclass, superorder1, superphylum, tribe, varietas.

Install

ncbitax2lin supports python-3.7, python-3.8, and python-3.9.

pip install -U ncbitax2lin

Generate lineages

First, download the taxonomy dump from NCBI:

wget -N ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz
mkdir -p taxdump && tar zxf taxdump.tar.gz -C ./taxdump

Then run ncbitax2lin:

ncbitax2lin --nodes-file taxdump/nodes.dmp --names-file taxdump/names.dmp

By default, the generated lineages are saved to ncbi_lineages_[date_of_utcnow].csv.gz. The output path can be overridden with the --output option.
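Once the file exists, it can be read with the Python standard library alone. The snippet below is a minimal sketch, assuming the output is a gzip-compressed CSV with a header row that includes a tax_id column (as the file name and the example above suggest). load_lineages is a hypothetical helper name, not part of ncbitax2lin.

```python
import csv
import gzip


def load_lineages(path):
    """Return {tax_id: {rank: name}} from a gzip-compressed lineage CSV.

    Assumption: the file has a header row containing a `tax_id` column.
    """
    lineages = {}
    with gzip.open(path, "rt", newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            tax_id = int(row.pop("tax_id"))
            # Keep only the ranks that are actually populated for this taxon.
            lineages[tax_id] = {rank: name for rank, name in row.items() if name}
    return lineages
```

Note that this loads the whole table into memory, which may be heavy for the full dump; filtering rows while streaming is an easy variation.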

FAQ

Q: I have a large number of sequences with their corresponding NCBI accession numbers. How do I get their lineages?

A: First, map the accession numbers (GI is deprecated) to tax IDs using the nucl_*accession2taxid.gz files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/. Second, trace each sequence's full lineage from its tax ID. This tax-ID-to-lineage mapping is what NCBItax2lin generates for you.
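The first step can be sketched in a few lines of Python. This assumes the accession2taxid files are gzip-compressed, tab-separated tables with a header row naming the columns accession, accession.version, taxid and gi. taxids_for_accessions is a hypothetical helper, not something shipped with ncbitax2lin.

```python
import csv
import gzip


def taxids_for_accessions(accession2taxid_path, wanted):
    """Map each accession.version in `wanted` to its tax ID.

    Streams the file row by row so the full table never sits in memory.
    Assumption: columns are accession, accession.version, taxid, gi.
    """
    wanted = set(wanted)
    found = {}
    with gzip.open(accession2taxid_path, "rt", newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            accession = row["accession.version"]
            if accession in wanted:
                found[accession] = int(row["taxid"])
                if len(found) == len(wanted):
                    break  # every requested accession has been resolved
    return found
```

The tax IDs returned here can then be looked up in the lineage table that ncbitax2lin generates. Accessions missing from the file are simply absent from the returned dictionary.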

If you have any questions about this project, please feel free to open a new issue.

Note on taxdump.tar.gz.md5

It appears that NCBI periodically regenerates taxdump.tar.gz and taxdump.tar.gz.md5 even when the content is unchanged. I am not sure how the regeneration works, but taxdump.tar.gz.md5 will differ simply because of a different timestamp.
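One way to tell whether the content really changed is to compare checksums of the files inside the archive instead of the archive itself. The following is only a sketch of that idea, not something ncbitax2lin does. member_digests is a hypothetical helper name.

```python
import hashlib
import tarfile


def member_digests(tarball_path):
    """Return {member_name: md5_hexdigest} for every regular file in a .tar.gz."""
    digests = {}
    with tarfile.open(tarball_path, "r:gz") as tar:
        for member in tar:
            if not member.isfile():
                continue
            md5 = hashlib.md5()
            extracted = tar.extractfile(member)
            # Hash in 1 MiB chunks so large members are not read in one go.
            for chunk in iter(lambda: extracted.read(1 << 20), b""):
                md5.update(chunk)
            digests[member.name] = md5.hexdigest()
    return digests
```

Two downloads whose member_digests results are equal contain the same files byte for byte, even if the archive-level MD5 differs.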

Used in

  • Mahmoudabadi, G., & Phillips, R. (2018). A comprehensive and quantitative exploration of thousands of viral genomes. ELife, 7. https://doi.org/10.7554/eLife.31955
  • Dombrowski, N. et al. (2020) Undinarchaeota illuminate DPANN phylogeny and the impact of gene transfer on archaeal evolution, Nature Communications. Springer US, 11(1). doi: 10.1038/s41467-020-17408-w. https://www.nature.com/articles/s41467-020-17408-w
  • Schenberger Santos, A. R. et al. (2020) NAD+ biosynthesis in bacteria is controlled by global carbon/nitrogen levels via PII signaling, Journal of Biological Chemistry, 295(18), pp. 6165–6176. doi: 10.1074/jbc.RA120.012793. https://www.sciencedirect.com/science/article/pii/S0021925817482433
  • Villada, J. C., Duran, M. F. and Lee, P. K. H. (2020) Interplay between Position-Dependent Codon Usage Bias and Hydrogen Bonding at the 5' End of ORFeomes, mSystems, 5(4), pp. 1–18. doi: 10.1128/msystems.00613-20. https://msystems.asm.org/content/5/4/e00613-20
  • Byadgi, O. et al. (2020) Transcriptome analysis of Amyloodinium ocellatum tomonts revealed basic information on the major potential virulence factors, Genes, 11(11), pp. 1–12. doi: 10.3390/genes11111252. https://www.mdpi.com/2073-4425/11/11/1252

Development

Install dependencies

poetry shell
poetry install

Testing

make format
make all

Publish (administrators only)

poetry version [minor/major etc.]
poetry publish --build