Taxadb

Taxadb is an application to locally query the NCBI taxonomy. Taxadb is written in Python and accesses its database using the peewee library.

In brief, Taxadb:

  • is a small tool to query the NCBI taxonomy.
  • is written in Python >= 3.5.
  • has built-in support for SQLite, MySQL and PostgreSQL.
  • has pre-built SQLite databases available.
  • has comprehensive API documentation.

Installation

Taxadb requires Python >= 3.5. To install Taxadb with SQLite support, type the following in your terminal:

pip3 install taxadb

If you wish to use MySQL or PostgreSQL, please refer to the full documentation.
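
As a quick sanity check after installation, the sketch below tests whether the interpreter meets the stated version requirement and whether the taxadb package can be found. The helper name `check_install` is hypothetical and not part of Taxadb.

```python
import importlib.util
import sys


def check_install(package="taxadb", min_version=(3, 5)):
    """Return (python_ok, package_found) for the current interpreter."""
    # This README states that Taxadb requires Python >= 3.5.
    python_ok = sys.version_info[:2] >= min_version
    # find_spec returns None when a top-level package is not installed.
    package_found = importlib.util.find_spec(package) is not None
    return python_ok, package_found


python_ok, package_found = check_install()
print("python >= 3.5:", python_ok)
print("taxadb importable:", package_found)
```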

Usage

Querying the Database

First, make sure you have built the database.

Basic examples are shown below. For more complete examples, please refer to the API documentation.

    >>> from taxadb.taxid import TaxID

    >>> taxid = TaxID(dbtype='sqlite', dbname='mydb.sqlite')
    >>> name = taxid.sci_name(33208)
    >>> print(name)
    Metazoa

    >>> lineage = taxid.lineage_name(33208)
    >>> print(lineage)
    ['Metazoa', 'Opisthokonta', 'Eukaryota', 'cellular organisms']
    >>> lineage = taxid.lineage_name(33208, reverse=True)
    >>> print(lineage)
    ['cellular organisms', 'Eukaryota', 'Opisthokonta', 'Metazoa']

    >>> taxid.has_parent(33208, 'Eukaryota')
    True
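
Lineage lists like the ones above are often flattened into a single delimited string for reports or tables. The following sketch uses the list printed above as sample data, so it runs without a database. The helper `format_lineage` and the semicolon separator are illustrative choices, not part of Taxadb.

```python
def format_lineage(lineage, sep="; ", root_first=True):
    """Join a lineage list into one string, optionally root-first."""
    # In the example above, lineage_name(33208) lists names from the queried
    # taxon up to the root, so reverse the list to read root-first.
    names = list(reversed(lineage)) if root_first else list(lineage)
    return sep.join(names)


# Sample data copied from the lineage_name(33208) output shown above.
lineage = ['Metazoa', 'Opisthokonta', 'Eukaryota', 'cellular organisms']
print(format_lineage(lineage))
# cellular organisms; Eukaryota; Opisthokonta; Metazoa
```

With a real database, the list would come from taxid.lineage_name(33208) instead of the hard-coded sample.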

Get the taxid from a scientific name.

    >>> from taxadb.names import SciName

    >>> names = SciName(dbtype='sqlite', dbname='mydb.sqlite')
    >>> taxid = names.taxid('Physisporinus cinereus')
    >>> print(taxid)
    2056287

Get the taxonomic information for accession number(s).

    >>> from taxadb.accessionid import AccessionID

    >>> my_accessions = ['X17276', 'Z12029']
    >>> accession = AccessionID(dbtype='sqlite', dbname='mydb.sqlite')
    >>> taxids = accession.taxid(my_accessions)
    >>> taxids
    <generator object taxid at 0x1051b0830>

    >>> for tax in taxids:
    ...     print(tax)
    ('X17276', 9646)
    ('Z12029', 9915)

You can also use a configuration file to set the database connection parameters automatically when the object is built. Either pass the config parameter to the object's __init__ method:

   >>> from taxadb.accessionid import AccessionID

   >>> my_accessions = ['X17276', 'Z12029']
   >>> accession = AccessionID(config='/path/to/taxadb.cfg')
   >>> taxids = accession.taxid(my_accessions)
   >>> ...

or set the environment variable TAXADB_CONFIG to point to the configuration file:

   $ export TAXADB_CONFIG='/path/to/taxadb.cfg'

then

   >>> from taxadb.accessionid import AccessionID

   >>> my_accessions = ['X17276', 'Z12029']
   >>> accession = AccessionID()
   >>> taxids = accession.taxid(my_accessions)
   >>> ...

Check the documentation for more information.
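
In a script that should work with either approach, the choice of configuration file can be made explicit. The sketch below is an assumption about how one might organize this in user code; it does not describe how Taxadb resolves its configuration internally. The helper `config_path` is hypothetical, and only the TAXADB_CONFIG variable name comes from the text above.

```python
import os


def config_path(explicit=None, env=os.environ):
    """Return the config file path to use, or None if nothing is set."""
    # Prefer a path passed by the caller, then fall back to TAXADB_CONFIG.
    if explicit:
        return explicit
    return env.get('TAXADB_CONFIG')


# Equivalent of `export TAXADB_CONFIG=...`, for the current process only.
os.environ['TAXADB_CONFIG'] = '/path/to/taxadb.cfg'
print(config_path())                  # /path/to/taxadb.cfg
print(config_path('/tmp/other.cfg'))  # /tmp/other.cfg
```

The returned path could then be passed on as AccessionID(config=path).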

Creating the Database

Download data

The following command downloads the necessary files from the NCBI FTP server into the directory taxadb.

$ taxadb download -o taxadb

Insert data

SQLite

$ taxadb create -i taxadb --dbname taxadb.sqlite

You can then safely remove the downloaded files:

$ rm -r taxadb

MySQL

Creating databases is a very vendor-specific task. Peewee, like most ORMs, can create tables but not databases. To use Taxadb with MySQL, you'll have to create the database yourself.

Connect to your MySQL server and create the database:

$ mysql -u $user -p
mysql> CREATE DATABASE taxadb;

Load data

$ taxadb create -i taxadb --dbname taxadb --dbtype mysql --username <user> --password <pwd> ...

PostgreSQL

Creating databases is a very vendor-specific task. Peewee, like most ORMs, can create tables but not databases. To use Taxadb with PostgreSQL, you'll have to create the database yourself.

Connect to your PostgreSQL server and create the database:

$ psql -U $user -d postgres
psql> CREATE DATABASE taxadb;

Load data

$ taxadb create -i taxadb --dbname taxadb --dbtype postgres --username <user> --password <pwd> ...

You can safely rerun the same command: Taxadb skips taxids and accessions that have already been inserted.
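
For scripted setups, the download and create steps can be driven from Python. The sketch below only assembles the SQLite command lines shown above and leaves execution to the caller. The helpers `build_commands` and `run_all` are hypothetical, and no flags beyond those in this README are used.

```python
import subprocess


def build_commands(workdir='taxadb', dbname='taxadb.sqlite'):
    """Return the download and create commands as argument lists."""
    return [
        ['taxadb', 'download', '-o', workdir],
        ['taxadb', 'create', '-i', workdir, '--dbname', dbname],
    ]


def run_all(commands):
    # check=True raises CalledProcessError if a step fails, so the create
    # step never runs after a failed download.
    for cmd in commands:
        subprocess.run(cmd, check=True)


for cmd in build_commands():
    print(' '.join(cmd))
```

Calling run_all(build_commands()) would execute both steps; this requires Taxadb to be installed and network access for the download.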

Tests

To run the tests, go to the root directory of this project (cd /path/to/taxadb) and run nosetests.

This command runs the tests against an SQLite test database called test_db.sqlite, located in the taxadb/test directory.

It is also possible to run only the tests related to accessionid or taxid, as follows:

$ nosetests -a 'taxid'
$ nosetests -a 'accessionid'

You can also use the configuration file taxadb.ini, located in the root of the distribution, as follows. This file should contain the database connection settings:

$ nosetests --tc-file taxadb.ini

You can override settings from the configuration file with the --tc command-line option, for example:

$ nosetests --tc-file taxadb.ini --tc=sql.dbname:another_dbname

More information is available in the nose-testconfig documentation.

Running tests against PostgreSQL or MySQL

First, create a test database to hold the test data.

  • PostgreSQL

$ createdb <test_db>

or

$ psql -U postgres
psql> CREATE DATABASE <test_db>;

  • MySQL

$ mysql -u root
mysql> CREATE DATABASE <test_db>;

Load test data

  • PostgreSQL

$ gunzip -c /path/to/taxadb/taxadb/test/test_mypg_db.sql.gz | psql -d <test_db> -U <user>

  • MySQL

$ gunzip -c /path/to/taxadb/taxadb/test/test_mypg_db.sql.gz | mysql -D <test_db> -u <user> -p

Run tests

Either edit taxadb.ini to match your database configuration, or use the --tc command-line option to set the appropriate values: username, password, port, hostname, dbtype (postgres or mysql) and dbname.

  1. PostgreSQL

$ nosetests --tc-file taxadb.ini

or

$ nosetests --tc-file taxadb.ini --tc=sql.dbtype:postgres --tc=sql.username:postgres --tc=sql.dbname:test_db2

  2. MySQL

$ nosetests --tc-file taxadb.ini

or

$ nosetests --tc-file taxadb.ini --tc=sql.dbtype:mysql --tc=sql.username:root --tc=sql.dbname:newdbname
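
When several settings need overriding, the --tc options can be generated rather than typed by hand. This is a sketch only: `tc_args` is a hypothetical helper, and the key names are the ones used in the commands above.

```python
def tc_args(overrides, tc_file='taxadb.ini'):
    """Build a nosetests argument list with --tc overrides."""
    args = ['nosetests', '--tc-file', tc_file]
    # Each (key, value) pair becomes --tc=<key>:<value>.
    for key, value in overrides:
        args.append('--tc={}:{}'.format(key, value))
    return args


# A list of pairs keeps the option order stable on every Python version.
overrides = [
    ('sql.dbtype', 'postgres'),
    ('sql.username', 'postgres'),
    ('sql.dbname', 'test_db2'),
]
print(' '.join(tc_args(overrides)))
```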

License

Code is under the MIT license.

Issues

Found a bug or have a question? Please open an issue.

Contributing

Thought of a new feature that you'd like us to implement? Open an issue, or fork the repository and submit a pull request.
