vinuesa / get_phylomarkers

Licence: GPL-3.0 license
A pipeline to select optimal markers for microbial phylogenomics and species tree estimation using coalescent and concatenation approaches

GET_PHYLOMARKERS

GET_PHYLOMARKERS (Vinuesa et al. 2018) is a software package designed to identify optimal genomic markers for phylogenomics, population genetics and genomic taxonomy. It implements a pipeline that filters the orthologous gene clusters computed by the companion package GET_HOMOLOGUES and selects those with optimal attributes for phylogenetic inference. A species tree is estimated with ASTRAL-III from the maximum-likelihood (ML) gene trees inferred from the top-scoring alignments. The selected alignments are also concatenated into a supermatrix, from which a second species tree is estimated under the ML criterion with state-of-the-art fast ML tree-searching algorithms. GET_PHYLOMARKERS can also estimate ML and parsimony trees from the pan-genome matrix, and includes unsupervised learning methods to determine the optimal number of clusters from pan-genome and average genomic distance matrices. A detailed manual and step-by-step tutorials document the software and help users get up and running quickly. For your convenience, HTML and Markdown versions of the documentation are available.

Installation, dependencies and Docker image

For detailed instructions and dependencies please check INSTALL.md.

A GET_PHYLOMARKERS Docker image is available, as well as an image bundling GET_PHYLOMARKERS + GET_HOMOLOGUES, ready to use. Detailed instructions for setting up the Docker environment are provided in INSTALL.md. How to run container instances with the test sequences distributed with GET_PHYLOMARKERS is described in the tutorial.

Aim

GET_PHYLOMARKERS (Vinuesa et al. 2018) implements a series of sequential filters (detailed below) to select markers with optimal attributes for phylogenomic inference from the homologous gene clusters produced by GET_HOMOLOGUES. It estimates gene trees and species trees under the maximum-likelihood (ML) optimality criterion using state-of-the-art fast ML tree-searching algorithms. The species tree is estimated from the supermatrix of concatenated, top-scoring alignments that passed the quality filters outlined in the figures below and explained in detail in the manual and publication.
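The concatenation step can be illustrated with a minimal Python sketch. It is not the pipeline's own code: the function name `concatenate_alignments`, the dictionary-based alignment representation and the toy sequences are all assumptions made for illustration. The sketch pads taxa missing from a gene with gaps, which is a generic convention rather than something the pipeline is stated to do.

```python
from typing import Dict, List

# Hypothetical representation: taxon name -> aligned sequence
Alignment = Dict[str, str]


def concatenate_alignments(alignments: List[Alignment], gap: str = "-") -> Alignment:
    """Concatenate per-gene alignments into one supermatrix.

    Taxa missing from a gene alignment are padded with gap characters
    so that every row of the supermatrix has the same length.
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    rows = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within an alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            rows[taxon].append(aln.get(taxon, gap * width))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}


# Toy data (made up): two gene alignments for three genomes.
gene_a = {"genome1": "ATGC", "genome2": "ATGA", "genome3": "ATGT"}
gene_b = {"genome1": "GGA", "genome2": "GGC"}
supermatrix = concatenate_alignments([gene_a, gene_b])
```

Each row of `supermatrix` is the per-taxon concatenation of the gene alignments in input order. A tree-inference program would then be run on this single matrix rather than on the individual genes.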

Figure 1A. Simplified flow-chart of the GET_PHYLOMARKERS pipeline showing only those parts used and described in this work. The left branch, starting at the top of the diagram, is fully under control of the master script run_get_phylomarkers_pipeline.sh. The names of the worker scripts called by the master program are indicated at the relevant points along the flow, as detailed in the manual. The image corresponds to Fig. 1 of Vinuesa et al. 2018.

Figure 1B. Combined filtering actions performed by GET_HOMOLOGUES and GET_PHYLOMARKERS to select top-ranking phylogenetic markers to be concatenated for phylogenomic analyses, and benchmark results of the performance of the FastTree (FT) and IQ-TREE (IQT) maximum-likelihood (ML) phylogeny inference programs. The image corresponds to Fig. 3 of Vinuesa et al. 2018.

GET_HOMOLOGUES is a genome-analysis software package for microbial pan-genomics and comparative genomics originally described in the following publications:

More recently we developed GET_HOMOLOGUES-EST, which can be used to cluster eukaryotic genes and transcripts, as described in Contreras-Moreira et al., Front. Plant Sci. 2017.

If GET_HOMOLOGUES-EST is fed both .fna and .faa files of CDS sequences, it produces output identical to that of GET_HOMOLOGUES, which can therefore be analyzed with GET_PHYLOMARKERS in the same way.


GET_PHYLOMARKERS is primarily tailored towards selecting CDSs (gene markers) to infer DNA-level phylogenies of different species of the same genus or family. It can also select optimal markers for population genetics, when the source genomes belong to the same species (Vinuesa et al. 2018). For more divergent genome sequences, classified in different genera, families, orders or higher taxa, the pipeline should be run using protein instead of DNA sequences.

Figure 2A. Best maximum-likelihood core-genome phylogeny for the genus Stenotrophomonas found in the IQ-TREE search, based on the supermatrix obtained by concatenation of 55 top-ranking alignments. The image corresponds to Fig. 5 of Vinuesa et al. 2018.

Figure 2B. Maximum-likelihood pan-genome phylogeny estimated with IQ-TREE from the consensus pan-genome clusters displayed in the Venn diagram. Clades of lineages belonging to the S. maltophilia complex are collapsed and are labeled as in Figure 2A. Numbers on the internal nodes represent the approximate Bayesian posterior probability/UFBoot2 bipartition support values (see methods). The tabular inset shows the results of fitting either the binary (GTR2) or morphological (MK) models implemented in IQ-TREE, indicating that the former has an overwhelmingly better fit. The scale bar represents the number of expected substitutions per site under the binary GTR2+F0+R4 substitution model. The image corresponds to Fig. 6 of Vinuesa et al. 2018.
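Binary substitution models such as the one named in Figure 2B operate on presence/absence data. The following minimal Python sketch shows one way such a matrix could be encoded from gene clusters. The function name `presence_absence_matrix`, the cluster identifiers and the genome names are assumptions for illustration only, and the sketch does not reproduce the file formats written by GET_HOMOLOGUES.

```python
from typing import Dict, List, Set


def presence_absence_matrix(
    clusters: Dict[str, Set[str]], genomes: List[str]
) -> Dict[str, str]:
    """Encode each genome as a string of 1/0 flags, one per cluster.

    Columns follow the sorted cluster identifiers, so every genome
    row has the same length and the same column order.
    """
    cluster_ids = sorted(clusters)
    return {
        genome: "".join("1" if genome in clusters[cid] else "0" for cid in cluster_ids)
        for genome in genomes
    }


# Toy data (made up): which genomes contain a member of each cluster.
clusters = {
    "cluster_001": {"genomeA", "genomeB", "genomeC"},  # core cluster
    "cluster_002": {"genomeA", "genomeB"},             # shared by two genomes
    "cluster_003": {"genomeC"},                        # genome-specific
}
matrix = presence_absence_matrix(clusters, ["genomeA", "genomeB", "genomeC"])
```

Each row of `matrix` is a binary character string for one genome, which is the kind of input a two-state model is fitted to.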


Manual and tutorials

Please follow the links for a detailed manual and tutorials, including a graphical flowchart of the pipeline and explanations of the implementation details.

Citation

Pablo Vinuesa, Luz-Edith Ochoa-Sanchez and Bruno Contreras-Moreira (2018). GET_PHYLOMARKERS, a software package to select optimal orthologous clusters for phylogenomics and inferring pan-genome phylogenies, used for a critical geno-taxonomic revision of the genus Stenotrophomonas. Front. Microbiol. | doi: 10.3389/fmicb.2018.00771

Published in the Research Topic on "Microbial Taxonomy, Phylogeny and Biodiversity" http://journal.frontiersin.org/researchtopic/5493/microbial-taxonomy-phylogeny-and-biodiversity

A preprint version is available on bioRxiv

Code

Developers

The code is developed and maintained by Pablo Vinuesa at CCG-UNAM, Mexico and Bruno Contreras-Moreira at EEAD-CSIC, Spain. It is released to the public under the GNU GPLv3 license.

Acknowledgements

Personal

We thank Alfredo J. Hernández and Víctor del Moral at CCG-UNAM for technical support with server administration.

Funding

We gratefully acknowledge the funding provided by DGAPA-PAPIIT/UNAM (grants IN201806-2, IN211814 and IN206318) and CONACyT-Mexico (grants P1-60071, 179133 and A1-S-11242) to Pablo Vinuesa, as well as by the Fundación ARAID, Consejo Superior de Investigaciones Científicas (grant 200720I038) and Spanish MINECO (AGL2013-48756-R) to Bruno Contreras-Moreira.