
fmalmeida / bacannot

Licence: GPL-3.0 license
Generic but comprehensive pipeline for prokaryotic genome annotation and interrogation with interactive reports and shiny app.

Programming Languages: Nextflow, Groovy, R, Dockerfile, Python, Shell




bacannot pipeline

A generic but comprehensive bacterial annotation pipeline


See the documentation »


About

Bacannot is an easy-to-use, Docker-based Nextflow pipeline that adopts state-of-the-art software for prokaryotic genome annotation. It wraps several tools to provide a better understanding of prokaryotic genomes.

Its main steps are:

| Analysis step | Software or databases used |
| --- | --- |
| Genome assembly (if raw reads are given) | Flye and Unicycler |
| Identification of the 10 closest NCBI RefSeq genomes | RefSeq Masher |
| Generic annotation and gene prediction | Prokka |
| rRNA prediction | barrnap |
| Classification within multi-locus sequence types (STs) | mlst |
| KEGG KO annotation and visualization | KofamScan and KEGGDecoder |
| Annotation of secondary metabolites | antiSMASH |
| Methylation annotation | Nanopolish |
| Annotation of antimicrobial resistance (AMR) genes | AMRFinderPlus, ARGminer, ResFinder and RGI |
| Annotation of virulence genes | Victors and VFDB |
| Annotation of prophage sequences and genes | PHASTER, Phigaro and PhiSpy |
| Annotation of integrative and conjugative elements | ICEberg |
| Focused detection of insertion sequences | digIS |
| In silico detection of plasmids | PlasmidFinder and Platon |
| Prediction and visualization of genomic islands | IslandPath-DIMOB and gff-toolbox |
| Custom annotation from formatted FASTA or NCBI protein IDs | BLAST |
| Merging of annotation results | bedtools |
| Genome browser rendering | JBrowse |
| Rendering of automatic reports and a shiny app for interrogating results | R Markdown, Shiny and SequenceServer |

🎯 To increase the accuracy of the Prokka annotation, this pipeline adds an extra HMM database to Prokka's defaults. It can be either TIGRFAM (smaller but curated) or PGAP (a bigger, comprehensive NCBI database that contains TIGRFAM).

Release notes

Are you curious about changes between releases? See the changelog.

  • I strongly recommend using the latest version, hosted in the master branch, which is what Nextflow runs by default.
    • The latest version will always receive support and bug fixes, and it generally keeps the same processes as previous versions (I mainly add features rather than remove them).
    • If you really need to run an earlier release, please see the instructions for that.
  • Versions below 2.0 are no longer supported.

Further reading and complementary analyses

Moreover, this pipeline has two complementary pipelines (also written in Nextflow), one for NGS preprocessing and one for genome assembly, which together give users a more thorough and robust workflow for bacterial genomics analyses.

Requirements

The pipeline's tools are split across several Docker images. They have been kept separate to avoid creating one massive Docker image and to avoid dependency conflicts.

Installation

  1. If you don't have it already, install Docker on your computer.

    • Once it is installed, download the required Docker images:
    docker pull fmalmeida/bacannot:v3.1_misc
    docker pull fmalmeida/bacannot:v3.1_perlenv
    docker pull fmalmeida/bacannot:v3.1_pyenv
    docker pull fmalmeida/bacannot:v3.1_renv
    docker pull fmalmeida/bacannot:jbrowse

🔥 Nextflow can also download the images automatically when the pipeline is executed. If the Docker Hub download rate limit has been exceeded, please try again in a few hours.

  2. Install Nextflow (version 20.10 or higher):

    curl -s https://get.nextflow.io | bash
    
  3. Give it a try:

    nextflow run fmalmeida/bacannot -profile docker --help
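Since step 2 requires Nextflow 20.10 or higher, a quick version comparison can catch an outdated installation before the first run. This is a minimal bash sketch, assuming GNU coreutils `sort -V` (version sort) is available; the function name `version_ge` and the variable `min_nextflow` are hypothetical helpers, not part of bacannot.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a Nextflow version string against the
# minimum stated in this README (20.10).
# Assumes GNU coreutils `sort -V` (version sort) is available.

min_nextflow=20.10

# Succeeds (exit status 0) when version $1 is >= version $2.
version_ge() {
  # Version-sort both strings; the smaller version comes out first.
  # If the minimum sorts first (or both are equal), $1 is new enough.
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

For example, pass the version number printed by `nextflow -version`: `version_ge 21.04.0 "$min_nextflow" && echo "Nextflow is recent enough"`. Version sort is used instead of a plain string comparison so that 20.9 correctly sorts before 20.10.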
    

🔥 To run the pipeline, users now need to pass the -profile docker or -profile singularity parameter explicitly. The pipeline no longer loads a profile automatically.

🔥 Users can keep the pipeline up to date with: nextflow pull fmalmeida/bacannot

Downloading and updating databases

Bacannot databases are no longer shipped inside the Docker images, to avoid huge images as well as connection problems and Docker Hub rate limits.

To get a copy of the required bacannot databases, run:

# Download pipeline databases
nextflow run fmalmeida/bacannot --get_dbs --output bacannot_dbs -profile <docker/singularity>

This will produce a directory like this:

bacannot_dbs
├── amrfinder_db
├── antismash_db
├── argminer_db
├── card_db
├── iceberg_db
├── kofamscan_db
├── mlst_db
├── phast_db
├── phigaro_db
├── pipeline_info
├── plasmidfinder_db
├── platon_db
├── prokka_db
├── resfinder_db
├── vfdb_db
└── victors_db

To update the databases, you have three options. You can download a fresh copy into a new directory. You can remove only the databases you want to refresh from the root bacannot_dbs directory and rerun the command above with the same output directory (the pipeline only tries to download missing databases). Or you can use the --force_update parameter to download everything again.
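The "remove and rerun" option can be scripted. This is a minimal bash sketch, assuming the directory layout shown in the tree above; the names `expected_dbs`, `missing_dbs` and `reset_db` are hypothetical, and `pipeline_info` is assumed to hold run reports rather than a database, so it is not listed.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for managing a bacannot databases directory.
# Database names are copied from the tree produced by --get_dbs;
# pipeline_info is assumed to be run reports, not a database.

expected_dbs=(
  amrfinder_db antismash_db argminer_db card_db iceberg_db kofamscan_db
  mlst_db phast_db phigaro_db plasmidfinder_db platon_db prokka_db
  resfinder_db vfdb_db victors_db
)

# Print, one per line, every expected database directory absent from $1.
missing_dbs() {
  local root="$1" db
  for db in "${expected_dbs[@]}"; do
    [ -d "$root/$db" ] || printf '%s\n' "$db"
  done
}

# Delete a single known database from root $1 so that rerunning the
# --get_dbs command downloads it again. Unknown names are rejected to
# avoid deleting arbitrary paths.
reset_db() {
  local root="$1" db="$2" known
  for known in "${expected_dbs[@]}"; do
    if [ "$known" = "$db" ]; then
      rm -rf -- "${root:?}/$db"
      return 0
    fi
  done
  echo "unknown database: $db" >&2
  return 1
}
```

For example, `reset_db bacannot_dbs card_db` followed by `nextflow run fmalmeida/bacannot --get_dbs --output bacannot_dbs -profile docker` should fetch only the removed database, and `missing_dbs bacannot_dbs` lists what a rerun would download.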

Quickstart

Please refer to the quickstart page »

Overview of outputs

A nice overview of the output directory structure and the main tools/features produced by the pipeline is provided at https://bacannot.readthedocs.io/en/latest/outputs.

Documentation

Usage

Users are advised to read the complete documentation »

  • Complete command line explanation of parameters:
    • nextflow run fmalmeida/bacannot --help

Command line usage examples

Command line executions are exemplified in the manual.

Using the configuration file

All the parameters listed by --help can be, and preferably should be, set through the configuration file. When a configuration file is used, the pipeline is run simply by executing nextflow run fmalmeida/bacannot -c ./configuration-file

Your configuration file tells the pipeline which type of data you have and which processes to execute, so it needs to be set up correctly.

Create a configuration file in your working directory:

  nextflow run fmalmeida/bacannot --get_config

Interactive graphical configuration and execution

Via NF tower launchpad (good for cloud env execution)

Nextflow has a feature called NF Tower. It lets users quickly customise and set up cloud environments to execute any Nextflow pipeline from nf-core, GitHub (this one included), Bitbucket, etc. When a pipeline provides a compliant JSON schema for its parameters, NF Tower renders an input form for them, which makes configuration easier.

Checkout more about this feature at: https://seqera.io/blog/orgs-and-launchpad/

Via nf-core launch (good for local execution)

Users can trigger a graphical and interactive pipeline configuration and execution by using nf-core launch utility. nf-core launch will start an interactive form in your web browser or command line so you can configure the pipeline step by step and start the execution of the pipeline in the end.

# Install nf-core
pip install nf-core

# Launch the pipeline
nf-core launch fmalmeida/bacannot


Known issues

  1. Sometimes, when navigating through the shiny app, the reports and JBrowse tabs may still point to older or different samples analysed before, instead of the sample in question. For example, you open the shiny server for Sample 2, but the reports and JBrowse show results for Sample 1. This is caused by the browser's cached data and cookies.
    • To solve this problem, users can clear the browser's cookies and cached data.
  2. The JBrowse wrapper in the shiny server cannot display the GC content and methylation plots when they are available; it only displays the simpler tracks. To visualise and interrogate the GC or methylation tracks, open JBrowse outside the shiny server. Two options are available:
    • Navigate to the jbrowse directory under your sample's output folder and run http-server, available at: https://www.npmjs.com/package/http-server
    • Or, you can download the JBrowse Desktop app and, from inside the app, select the folder jbrowse/data that is available in your sample's output directory.
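The http-server option above can be wrapped in a small bash function. This is a minimal sketch, assuming the JBrowse files live in a folder named `jbrowse` directly inside the sample's output directory and that `http-server` was installed from npm (for example with `npm install -g http-server`); the function name `serve_jbrowse` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: serve a sample's JBrowse folder outside the shiny
# server so the GC content and methylation tracks can be viewed.
# Assumes the layout <sample_output_dir>/jbrowse.
serve_jbrowse() {
  local jb="$1/jbrowse"
  if [ ! -d "$jb" ]; then
    echo "no jbrowse directory under $1" >&2
    return 1
  fi
  # http-server is the npm package linked above.
  if ! command -v http-server >/dev/null 2>&1; then
    echo "http-server not found; install it with: npm install -g http-server" >&2
    return 127
  fi
  # Blocks and serves the folder until interrupted (Ctrl+C).
  http-server "$jb"
}
```

For example, `serve_jbrowse path/to/output/sample1`, then open the address that http-server prints (port 8080 by default).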

Citation

To cite this tool please refer to our Zenodo tag.

This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the GPLv3.

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

In addition, users are encouraged to cite the programs used in this pipeline whenever they are used. Links to resources of tools and data used in this pipeline are in the list of tools.
