
fmalmeida / ngs-preprocess

License: GPL-3.0
A pipeline for preprocessing NGS data from Illumina, Nanopore and PacBio technologies


ngs-preprocess pipeline

A pipeline for preprocessing short and long sequencing reads


See the documentation »

Report Bug · Request Feature

About

ngs-preprocess is built using Nextflow, a workflow tool that runs tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers, making installation trivial and results highly reproducible. It is an easy-to-use pipeline that relies on state-of-the-art software for quality checking and pre-processing NGS reads from Illumina, PacBio and Oxford Nanopore Technologies.

It wraps up the following software:

| Step                    | Tools                                |
| ----------------------- | ------------------------------------ |
| SRA NCBI fetch          | Entrez-direct & sra-tools            |
| Illumina pre-processing | Fastp                                |
| Nanopore pre-processing | Porechop, pycoQC, NanoPack           |
| PacBio pre-processing   | bam2fastx, bax2bam, lima, pacbio ccs |

Further reading

This pipeline has two complementary pipelines (also written in Nextflow), one for genome assembly and one for prokaryotic genome annotation. Together they give the user a complete workflow for bacterial genomics analyses.

Quickstart

  1. Install Nextflow:

    curl -s https://get.nextflow.io | bash
  2. Give it a try:

    nextflow run fmalmeida/ngs-preprocess --help
  3. Download required tools

    • for docker

      # for docker
      docker pull fmalmeida/ngs-preprocess:v2.5
      
      # run
      nextflow run fmalmeida/ngs-preprocess -profile docker [options]
    • for singularity

      # for singularity
      # remember to properly set NXF_SINGULARITY_LIBRARYDIR
      # read more at https://www.nextflow.io/docs/latest/singularity.html#singularity-docker-hub
      export NXF_SINGULARITY_LIBRARYDIR=MY_SINGULARITY_IMAGES    # your singularity storage dir
      export NXF_SINGULARITY_CACHEDIR=MY_SINGULARITY_CACHE       # your singularity cache dir
      singularity pull \
          --dir $NXF_SINGULARITY_LIBRARYDIR \
          fmalmeida-ngs-preprocess-v2.5.img docker://fmalmeida/ngs-preprocess:v2.5
      
      # run
      nextflow run fmalmeida/ngs-preprocess -profile singularity [options]
    • for conda

      # for conda
      # it is better to create envs with mamba for faster solving
      wget https://github.com/fmalmeida/ngs-preprocess/raw/master/environment.yml
      conda env create -f environment.yml   # advice: use mamba
      
      # must be executed from the base environment
      # This tells nextflow to load the available ngs-preprocess environment when required
      nextflow run fmalmeida/ngs-preprocess -profile conda [options]
  4. Start running your analysis

    nextflow run fmalmeida/ngs-preprocess -profile <docker/singularity/conda>

🔥 Please read the documentation below on selecting between conda, docker or singularity profiles, since the tools will be made available differently depending on the profile desired.

Documentation

Selecting between profiles

Nextflow profiles are sets of "sensible defaults" for the resource requirements of each step in the workflow, and they are enabled with the command line flag -profile. You can learn more about profiles in the Nextflow documentation.

The pipeline has "standard profiles" set to run the workflows with either conda, docker or singularity using the local executor, which is Nextflow's default and runs the pipeline processes on the computer where Nextflow is launched. If you need to run the pipeline with another executor, such as sge, lsf or slurm, consult Nextflow's manual to properly configure one in a new custom profile in your personal copy of the ngs-preprocess config file. Nextflow allows multiple profiles to be used at once, e.g. -profile conda,sge.
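A custom executor profile of the kind described above could look roughly like the sketch below. It writes a small extra config file instead of editing a personal copy of the pipeline config. The file name custom.config and the queue name my_queue are assumptions made for illustration, not values defined by ngs-preprocess, and your cluster may need further settings.

```shell
# Write a minimal, hypothetical Nextflow config that defines an 'sge' profile.
# 'custom.config' and 'my_queue' are placeholder names; adapt them to your site.
cat > custom.config <<'EOF'
profiles {
    sge {
        process.executor = 'sge'
        process.queue    = 'my_queue'   // hypothetical queue name
    }
}
EOF

# The extra file can then be combined with one of the pipeline's own profiles:
#   nextflow run fmalmeida/ngs-preprocess -c custom.config -profile conda,sge [options]
```

The run command is left commented out so the snippet only creates the file; uncomment it once the queue name matches your cluster.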

By default, if no profile is chosen, the pipeline will try to load tools from the local machine's $PATH. The available pre-set profiles for this pipeline are docker, conda and singularity. You can choose between them as follows:

  • conda

    # must be executed from the base environment
    # This tells nextflow to load the available ngs-preprocess environment when required
    nextflow run fmalmeida/ngs-preprocess -profile conda [options]
  • docker

    nextflow run fmalmeida/ngs-preprocess -profile docker [options]
  • singularity

    nextflow run fmalmeida/ngs-preprocess -profile singularity [options]

📖 Please use conda only as a last resort: since its packages are not "frozen and pre-installed", problems may arise.
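If you launch the pipeline on several machines, the profile choice can be scripted. The sketch below is a hypothetical helper, not part of ngs-preprocess: pick_profile is an illustrative name, and the preference order (containers first, conda last) simply follows the advice above. It only prints the command it would run.

```shell
# Hypothetical helper: print the first available backend, preferring containers
# over conda, in line with the recommendation to use conda only as a last resort.
pick_profile() {
    for tool in docker singularity conda; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            return 0
        fi
    done
    return 1   # nothing found: the pipeline would rely on tools in $PATH
}

profile="$(pick_profile || true)"
if [ -n "$profile" ]; then
    echo "nextflow run fmalmeida/ngs-preprocess -profile $profile [options]"
else
    echo "nextflow run fmalmeida/ngs-preprocess [options]"
fi
```

Finding a tool on $PATH does not guarantee it is usable (for example, the Docker daemon may not be running), so treat the result as a suggestion.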

Usage

To understand pipeline usage and configuration, please read the complete online documentation »

Using a configuration file

All pipeline parameters can be, and are advised to be, set through a configuration file. Once a configuration file is set, the pipeline is run by simply executing:

nextflow run fmalmeida/ngs-preprocess -c ./configuration-file

Your configuration file tells the pipeline what type of data you have and which processes to execute. Therefore, it needs to be set up correctly.

Create a configuration file in your working directory:

nextflow run fmalmeida/ngs-preprocess [ --get_config ]

Interactive graphical configuration and execution

Via NF tower launchpad (good for cloud env execution)

Nextflow has an awesome feature called NF tower. It lets users quickly customise and set up the configuration and execution of cloud environments to run any Nextflow pipeline from nf-core, GitHub (this one included), Bitbucket, etc. Because the pipeline has a compliant JSON schema for its configuration, NF tower can render an input form, which makes setting parameters easier.

Check out more about this feature at: https://seqera.io/blog/orgs-and-launchpad/

Via nf-core launch (good for local execution)

Users can trigger a graphical and interactive pipeline configuration and execution with the nf-core launch utility. nf-core launch starts an interactive form in your web browser or command line, so you can configure the pipeline step by step and start its execution at the end.

# Install nf-core
pip install nf-core

# Launch the pipeline
nf-core launch fmalmeida/ngs-preprocess


Citation

To cite this tool please refer to our Zenodo tag.

This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the GPLv3.

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

In addition, users are encouraged to cite the programs used in this pipeline whenever they are used. The wrapped tools are listed in the About section.
