arzwa / wgd

License: GPL-3.0
Python package and CLI for whole-genome duplication related analyses

Programming Languages

python
Jupyter Notebook
Singularity

VIB/UGent center for plant systems biology - Bioinformatics & evolutionary genomics group https://www.vandepeerlab.org/

wgd - simple command line tools for the analysis of ancient whole-genome duplications

Note: If you are interested in the methods implemented in wgd, you may also want to consider the ksrates tool by Sensalari et al. which can be used to carefully compare multiple Ks distributions and model them (ksrates uses wgd under the hood).

Installation

Python package and command line interface (CLI) for the analysis of whole-genome duplications (WGDs). Tested with Python 3 on Linux. If you don't have Python or pip installed, running sudo apt-get install python3-pip should suffice.

To install, simply run

git clone https://github.com/arzwa/wgd.git
cd wgd
pip install --user .

Note that depending on your python installation and whether you're in a virtualenv, pip may default either to pip2 or pip3. If the above installation step fails, please try to use pip3 instead of pip.
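One way to sidestep the pip2/pip3 ambiguity is to call pip through the interpreter you intend to use. The following is a minimal sketch, not part of wgd; the helper name pip_install_command is hypothetical, and it assumes you run the resulting command from the cloned wgd directory.

```python
import sys


def pip_install_command(path=".", user=True):
    """Build a pip command bound to the running interpreter.

    Invoking pip as `<python> -m pip` installs the package for that
    exact interpreter, whatever the bare `pip` command resolves to.
    """
    cmd = [sys.executable, "-m", "pip", "install"]
    if user:
        # --user installs are rejected inside a virtualenv; pass user=False there.
        cmd.append("--user")
    cmd.append(path)
    return cmd


if __name__ == "__main__":
    if sys.version_info[0] < 3:
        raise SystemExit("wgd is tested with Python 3")
    print(" ".join(pip_install_command()))
```

The printed command can be pasted into the shell as is. Inside a virtualenv, build the command with user=False, since pip refuses --user installs there.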

For the command line interface, upon installation run

$ wgd

to get a list of the available commands. To get usage instructions for a command (e.g. ksd) run

$ wgd ksd --help

For external software requirements, please consult the relevant section in the docs.

Note: if you encounter issues, verify that you have the latest PAML version. To install the latest version, it is best not to rely on apt-get or any other package manager but to install from source. Something like this should work (run from within the directory where you want to install PAML):

wget http://abacus.gene.ucl.ac.uk/software/paml4.9j.tgz
tar -xzf paml4.9j.tgz
pushd paml4.9j/src && make -f Makefile && popd 
export PATH=$PATH:$PWD/paml4.9j/src/
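After adjusting PATH, it can help to confirm that the external programs are actually visible. The sketch below is an illustration rather than part of wgd: the helper missing_tools is hypothetical, and the tool list is an assumption (codeml is the PAML program, diamond is the aligner behind wgd dmd); consult the docs for the full requirements.

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Assumed tool list; extend it with the requirements listed in the docs.
    required = ["codeml", "diamond"]
    missing = missing_tools(required)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

Note that the export line above only affects the current shell session, so run this check in the same session or add the export to your shell profile first.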

Quick start

The main aim of wgd is computing whole-paranome and one-vs.-one ortholog Ks distributions. For a whole-paranome distribution of a CDS sequence fasta file, the minimal commands are:

$ wgd dmd ath.cds.fasta
$ wgd ksd wgd_dmd/ath.cds.fasta.mcl ath.cds.fasta

For one-vs.-one orthologs, the minimal commands are:

$ wgd dmd ath.cds.fasta vvi.cds.fasta
$ wgd ksd wgd_dmd/ath.cds.fasta_vvi.cds.fasta.rbh ath.cds.fasta vvi.cds.fasta

For more information on these methods and other tools implemented in wgd, please consult the docs.
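The two-step whole-paranome workflow above can also be scripted. This is a minimal sketch under stated assumptions: the helper names paranome_commands and run_all are hypothetical, and the gene family file is assumed to be written to wgd_dmd/<input file name>.mcl, as in the example commands.

```python
import os
import subprocess


def paranome_commands(cds_fasta, dmd_outdir="wgd_dmd"):
    """Return the two wgd invocations for a whole-paranome Ks distribution."""
    # Assumption: `wgd dmd` writes <dmd_outdir>/<input basename>.mcl
    families = os.path.join(dmd_outdir, os.path.basename(cds_fasta) + ".mcl")
    return [
        ["wgd", "dmd", cds_fasta],
        ["wgd", "ksd", families, cds_fasta],
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling run_all(paranome_commands("ath.cds.fasta")) would run both steps, provided wgd and its external dependencies are installed.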

Singularity container

A Singularity container is available for wgd, allowing you to use all tools in wgd without having to install all required software on your system. To install Singularity, follow the instructions in the Singularity documentation.

Once you have Singularity installed (and you're in the virtual machine when running on Windows or Mac), you can build the container image locally (requires root privileges). To do so, first get the Singularity definition file from the wgd GitHub repository and then run the build command:

git clone https://github.com/arzwa/wgd.git
cd wgd
sudo singularity build wgd.sif Singularity

Then you can use wgd as follows:

singularity exec wgd.sif wgd <command>

Alternatively, if you don't have root privileges, you can pull an older container from Singularity Hub. Note, however, that this older container doesn't support the syn (collinearity via i-ADHoRe) and dmd (diamond aligner) commands:

singularity pull --name wgd.simg shub://arzwa/wgd

Notes

Bug tracking: If the program crashes, exits unexpectedly, or produces unexpected results, please run it again with the --verbosity debug flag placed before the subcommand of interest (e.g. wgd --verbosity debug ksd gf.mcl cds.fasta). If the anomaly persists, please open an issue on the GitHub site.

Note on input data: the input data is rather straightforward (a CDS fasta file will do for most analyses). The wgd suite was extensively tested with data from the PLAZA platform, so for examples of the right input data formats (in particular CDS fasta files for sequence data and GFF files for structural annotation), please have a look there. It is generally advised not to include pipe characters (|) in your gene IDs, since these can have special meanings in certain parts of wgd.
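To check a CDS fasta file for gene IDs containing pipe characters before running wgd, something like the following sketch could be used. It is not part of wgd; the helper ids_with_pipes and the example IDs are hypothetical, and the gene ID is assumed to be the first whitespace-delimited word of each header line.

```python
def ids_with_pipes(fasta_lines):
    """Yield sequence IDs (first word of each header line) that contain '|'."""
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields and "|" in fields[0]:
                yield fields[0]


# Hypothetical in-memory example; pass an open file handle for real data.
example = [
    ">gene1 some description",
    "ATGGAGGAT",
    ">ath|gene2",
    "ATGCCCTAA",
]
print(list(ids_with_pipes(example)))
```

For a real file, open it and pass the handle, e.g. list(ids_with_pipes(open("ath.cds.fasta"))), then rename any reported IDs before running the analysis.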

Note on virtualenv: you can install wgd in a virtual environment (using virtualenv). If, however, you encounter problems running the executable directly (e.g. wgd --help doesn't work), you can circumvent this by calling the CLI directly with python3 ./wgd_cli.py --help (assuming you are currently in the directory where you cloned wgd).

Citation

Please cite us at https://doi.org/10.1093/bioinformatics/bty915

Zwaenepoel, A., and Van de Peer, Y. 
wgd - simple command line tools for the analysis of ancient whole genome duplications. 
Bioinformatics, bty915, https://doi.org/10.1093/bioinformatics/bty915

For citation of the tools used in wgd, please consult the documentation at https://wgd.readthedocs.io/en/latest/index.html#citation.
