nickjcroucher / gubbins

Licence: GPL-2.0 license
Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins

Gubbins

Genealogies Unbiased By recomBinations In Nucleotide Sequences

Introduction

Since the introduction of high-throughput, second-generation DNA sequencing technologies, there has been an enormous increase in the size of datasets being used for estimating bacterial population phylodynamics. Although many phylogenetic techniques are scalable to hundreds of bacterial genomes, methods which have been used for mitigating the effect of horizontal sequence transfer on phylogenetic reconstructions cannot cope with these new datasets. Gubbins (Genealogies Unbiased By recomBinations In Nucleotide Sequences) is an algorithm that iteratively identifies loci containing elevated densities of base substitutions while concurrently constructing a phylogeny based on the putative point mutations outside of these regions. Simulations demonstrate the algorithm generates highly accurate reconstructions under realistic models of short-term bacterial evolution, and can be run in only a few hours on alignments of hundreds of bacterial genome sequences.
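
To make the idea of "elevated densities of base substitutions" concrete, here is a minimal sketch that flags fixed-size windows whose substitution count is well above the genome-wide average. It is an illustration only: the function name, the window size and the `fold` threshold are assumptions made for this example, and it does not reproduce the detection method or the iterative tree building that Gubbins itself uses (see the publication for those).

```python
from typing import List, Tuple


def dense_windows(
    snp_positions: List[int],
    genome_length: int,
    window: int = 1000,
    fold: float = 5.0,
) -> List[Tuple[int, int, int]]:
    """Return (start, end, count) for windows with unusually many substitutions.

    Positions are 0-based and windows are half-open [start, end).
    A window is flagged when its count exceeds `fold` times the count
    expected from the genome-wide mean density (hypothetical threshold).
    """
    if genome_length <= 0 or window <= 0:
        raise ValueError("genome_length and window must be positive")
    mean_density = len(snp_positions) / genome_length  # substitutions per base
    flagged = []
    for start in range(0, genome_length, window):
        end = min(start + window, genome_length)
        count = sum(1 for p in snp_positions if start <= p < end)
        expected = mean_density * (end - start)
        if count > 0 and count > fold * expected:
            flagged.append((start, end, count))
    return flagged


if __name__ == "__main__":
    # Ten scattered substitutions plus a tight cluster of twenty.
    background = [i * 10000 + 500 for i in range(10)]
    cluster = [5000 + i * 10 for i in range(20)]
    print(dense_windows(background + cluster, genome_length=100000))
```

In this toy input only the window containing the cluster is reported; the scattered substitutions are what a phylogeny would then be built from.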

Installation

Before starting your analysis, please have a look at the Gubbins webpage, manual, tutorial and publication.

Required dependencies

Phylogenetic software:

Python modules:

  • Biopython (>1.59)
  • DendroPy (>=4.0)
  • SciPy
  • NumPy
  • Multiprocessing
  • Numba

See environment.yml for details. These are in addition to standard build environment tools (e.g. python >=3.8, pip3, make, autoconf, libtool, gcc, check). There are several ways to install Gubbins; details are provided below. If you encounter an issue when installing Gubbins, please contact your local system administrator.
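
Before building from source, it can help to confirm that the Python modules listed above are importable in the interpreter you intend to use. The following is a small sketch using only the standard library; the function name and the dictionary are made up for this example, and it checks presence only, not the version constraints given in the list.

```python
from importlib.util import find_spec
from typing import Dict

# Package name -> name used in an `import` statement.
# Biopython is imported as `Bio` and DendroPy as `dendropy`.
REQUIRED_MODULES = {
    "Biopython": "Bio",
    "DendroPy": "dendropy",
    "SciPy": "scipy",
    "NumPy": "numpy",
    "Multiprocessing": "multiprocessing",
    "Numba": "numba",
}


def check_modules(modules: Dict[str, str] = REQUIRED_MODULES) -> Dict[str, bool]:
    """Map each package name to True if its module can be located."""
    return {name: find_spec(module) is not None for name, module in modules.items()}


if __name__ == "__main__":
    for name, found in check_modules().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Run it with the same `python3` that will be used for `pip install`, since different interpreters can see different sets of packages.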

Recommended installation method - conda

Install conda and enable the bioconda channels. This can be done from the command line on Linux, from Terminal on OSX, or from PowerShell on Windows (version 10 or later).

conda config --add channels r
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda
conda install gubbins

Linux - Ubuntu Xenial (16.04) & Debian (unstable)

Gubbins has been packaged by the Debian Med team and is trivial to install using apt.

sudo apt-get install gubbins

OSX/Linux - from source

Install the dependencies and include them in your PATH. Clone or download the source code from GitHub and run the following commands to install Gubbins:

autoreconf -i
./configure [--prefix=$PREFIX]
make
[sudo] make install
cd python
[sudo] python3 -m pip install .

Use sudo to install Gubbins system-wide. If you don't have the permissions, run configure with a prefix to install Gubbins in your home directory.

OSX/Linux - installing from the repository

The easiest way to install the latest version of the code from this repository is to create a conda environment containing gubbins, which pulls in the packages needed for installation, and then remove the gubbins package itself so that only its dependencies remain:

conda create -c bioconda -n gubbins_git gubbins python=3.9
conda activate gubbins_git
conda install -c conda-forge libtool autoconf-archive automake pkg-config check pytest
conda remove --force gubbins

Then download and install the repository in the same environment:

git clone https://github.com/nickjcroucher/gubbins
cd gubbins
autoreconf -i
chmod +x configure
./configure --prefix=$CONDA_PREFIX
make
sudo make install
cd python
python3 -m pip install .

OSX/Linux/Windows - Virtual Machine

Gubbins can be run through PowerShell on Windows version 10 or later. We have also created a virtual machine with all of the software set up, along with the test datasets from the paper. It is based on Bio-Linux 8. First install VirtualBox, then load the virtual machine using the 'File -> Import Appliance' menu option. The root password is 'manager'.

  • ftp://ftp.sanger.ac.uk/pub/pathogens/pathogens-vm/pathogens-vm.latest.ova

Running the tests

The tests can be run from the top-level directory:

make check
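
`make check` needs the source tree. For installations where only the installed program is available, a lighter sanity check is to confirm that the `run_gubbins.py` script is on your PATH. This is a sketch using the standard library; the function name is made up for this example, and finding the script does not by itself prove that the installation works.

```python
import shutil
from typing import Optional


def find_executable(name: str = "run_gubbins.py") -> Optional[str]:
    """Return the full path of `name` if it is on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable()
    print(path if path else "run_gubbins.py was not found on PATH")
```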

Usage

To run Gubbins with default settings:

run_gubbins.py [FASTA alignment]

Information on further options can be found in the manual.
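
When Gubbins is run as one step of a larger workflow, the default invocation above can be wrapped in a few lines of Python. This is a minimal sketch, not part of Gubbins: `build_command` and `run_gubbins` are hypothetical helper names, and the only thing assumed about the command line is the form shown above (the script name followed by a FASTA alignment).

```python
import subprocess
from pathlib import Path
from typing import List, Optional, Sequence


def build_command(alignment: str, extra_args: Optional[Sequence[str]] = None) -> List[str]:
    """Build the argument list for one Gubbins run.

    `extra_args` is passed through unchanged; consult the manual for
    the options that your installed version accepts.
    """
    cmd = ["run_gubbins.py"]
    cmd.extend(extra_args or [])
    cmd.append(str(alignment))
    return cmd


def run_gubbins(alignment: str, extra_args: Optional[Sequence[str]] = None) -> int:
    """Run Gubbins on `alignment` and return its exit code."""
    if not Path(alignment).is_file():
        raise FileNotFoundError(alignment)
    # A list of arguments avoids shell quoting problems with file names.
    return subprocess.run(build_command(alignment, extra_args), check=False).returncode
```

For example, `run_gubbins("alignment.fasta")` performs the default run, and `run_gubbins("alignment.fasta", ["--mar"])` passes the `--mar` flag described under "Ancestral sequence reconstruction".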

License

Gubbins is free software, licensed under GPLv2.

Feedback/Issues

There is no specific support for development or maintenance of Gubbins. However, we will try to help if you report problems with using the software on the issues page.

Citation

If you use this software please cite: Croucher N. J., Page A. J., Connor T. R., Delaney A. J., Keane J. A., Bentley S. D., Parkhill J., Harris S. R. "Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins". Nucleic Acids Research, 2014. doi:10.1093/nar/gku1196 (http://nar.oxfordjournals.org/content/43/3/e15)

Further Information

For more information on this software see the Gubbins webpage.

Data from the paper

Midpoint rerooting

From version 1.3.5 (25/6/15) to version 1.4.6 (29/2/16), trees were not midpoint rerooted by default. This does not affect recombination detection, but the output trees may not look as expected. Users are advised to upgrade to the latest version.
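
For readers unfamiliar with the term, midpoint rooting places the root halfway along the longest tip-to-tip path in the tree. The sketch below finds that point on a small tree stored as a dictionary of branch lengths. It illustrates the concept only: the data structure and function names are assumptions made for this example, and this is not the code Gubbins uses.

```python
from typing import Dict, Tuple

# Unrooted tree: node -> {neighbour: branch length}, listed in both directions.
Tree = Dict[str, Dict[str, float]]


def farthest(tree: Tree, start: str) -> Tuple[str, float, Dict[str, str]]:
    """Walk the tree from `start`; return the most distant node, its
    distance, and a parent map pointing back towards `start`."""
    dist = {start: 0.0}
    parent: Dict[str, str] = {}
    stack = [start]
    while stack:
        node = stack.pop()
        for nbr, length in tree[node].items():
            if nbr not in dist:
                dist[nbr] = dist[node] + length
                parent[nbr] = node
                stack.append(nbr)
    far = max(dist, key=dist.get)
    return far, dist[far], parent


def midpoint(tree: Tree) -> Tuple[str, str, float]:
    """Return (node_a, node_b, offset): the midpoint of the longest
    tip-to-tip path lies `offset` along the branch from node_a to node_b."""
    tip_a, _, _ = farthest(tree, next(iter(tree)))
    tip_b, longest, parent = farthest(tree, tip_a)
    # Walk back from tip_b towards tip_a until half the path is covered.
    remaining = longest / 2
    node = tip_b
    while node != tip_a:
        up = parent[node]
        length = tree[node][up]
        if remaining <= length:
            return node, up, remaining
        remaining -= length
        node = up
    raise ValueError("tree has no branches")


if __name__ == "__main__":
    tree = {
        "A": {"X": 1.0},
        "B": {"X": 2.0},
        "C": {"Y": 5.0},
        "X": {"A": 1.0, "B": 2.0, "Y": 1.0},
        "Y": {"X": 1.0, "C": 5.0},
    }
    print(midpoint(tree))
```

In the example tree the longest path runs between tips B and C (length 8), so the midpoint falls on the long branch leading to C, 1.0 away from node Y.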

Ancestral sequence reconstruction

From version 3.0.0 onwards, Gubbins uses joint ancestral reconstruction with a modified version of pyjar by default. Version 2 used marginal ancestral reconstruction with RAxML; this is still available in version 3 using the --mar flag (IQtree can also be used for reconstruction in versions >3.0.0). This may be useful in cases where memory use is limiting. Version 1 used joint ancestral reconstruction with fastML.
