IARCbioinfo / alignment-nf

License: GPL-3.0
Whole Exome/Whole Genome Sequencing alignment pipeline

alignment-nf

Nextflow pipeline for BAM realignment or fastq alignment

Workflow representation

Description

Nextflow pipeline to perform BAM realignment or fastq alignment and QC, with/without local indel realignment and base quality score recalibration.

Dependencies

  1. Nextflow: for common installation procedures, see the IARC-nf repository.

Basic fastq alignment

  1. bwa2 (default) or bwa
  2. samblaster
  3. sambamba

BAM files realignment

  1. samtools

Adapter sequence trimming

  1. AdapterRemoval

ALT contigs handling

  1. the k8 javascript execution shell (e.g., available in the bwakit archive); must be in the PATH
  2. javascript bwa-postalt.js and the additional fasta reference .alt file from bwakit must be in the same directory as the reference genome file.
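
The ALT prerequisites listed above can be checked before launching the pipeline. The following is a minimal sketch, not part of the pipeline: the function name check_alt_prereqs is hypothetical, and it assumes the ALT file is named after the reference with an added .alt extension (as in the bwakit bundle).

```shell
# Report which ALT-handling prerequisites are missing for a given reference.
# Usage: check_alt_prereqs /path/to/ref.fasta
# Assumption: the ALT file is named <reference>.alt (bwakit convention).
check_alt_prereqs() (
  ref="$1"
  ref_dir=$(dirname "$ref")
  status=0
  # k8 must be discoverable through PATH
  if ! command -v k8 >/dev/null 2>&1; then
    echo "missing: k8 (not found in PATH)"
    status=1
  fi
  # bwa-postalt.js and the .alt file must sit next to the reference
  for f in "$ref_dir/bwa-postalt.js" "$ref.alt"; do
    if [ ! -e "$f" ]; then
      echo "missing: $f"
      status=1
    fi
  done
  exit "$status"
)
```

The function prints one line per missing item and returns a non-zero status if anything is missing, so it can be used as a guard in a launch script.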

QC

  1. Qualimap
  2. MultiQC

Base quality score recalibration

  1. GATK4; wrapper 'gatk' must be in the path
  2. GATK bundle VCF files with lists of indels and SNVs (recommended: Mills gold standard indels VCFs, dbsnp VCF), and corresponding tabix indexes (.tbi)
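
Since each VCF needs a tabix index (.tbi) alongside it, a quick check can catch a missing index before the recalibration step fails. This is a minimal sketch under the assumption that the index is named <vcf>.tbi; the function name missing_tbi is hypothetical.

```shell
# Print every VCF that lacks a tabix index (<vcf>.tbi) next to it.
# The exit status is non-zero if at least one index is missing.
missing_tbi() (
  status=0
  for vcf in "$@"; do
    if [ ! -e "$vcf.tbi" ]; then
      echo "$vcf"
      status=1
    fi
  done
  exit "$status"
)
```

For example, `missing_tbi GATKbundle/dbsnp.vcf.gz GATKbundle/Mills_1000G_indels.vcf.gz` prints nothing when both indexes are present.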

A conda recipe, as well as Docker and Singularity containers, are available with all the tools needed to run the pipeline (see "Usage").

Input

Type | Description
--- | ---
--input_folder | a folder with fastq files or BAM files

Parameters

  • Mandatory

Name | Example value | Description
--- | --- | ---
--ref | hg19.fasta | genome reference with its index files (.fai, .sa, .bwt, .ann, .amb, .pac, and .dict; in the same directory)
  • Optional

Name | Default value | Description
--- | --- | ---
--input_file | null | Input file (comma-separated) with 4 columns: SM (sample name), RG (read group ID), pair1 (first fastq of the pair), and pair2 (second fastq of the pair)
--output_folder | . | Output folder for aligned BAMs
--cpu | 8 | Number of CPUs
--cpu_BQSR | 2 | Number of CPUs for GATK base quality score recalibration
--mem | 32 | Memory
--mem_BQSR | 10 | Memory for GATK base quality score recalibration
--RG | PL:ILLUMINA | Sequencing information for aligned reads (for bwa)
--fastq_ext | fastq.gz | Extension of fastq files
--suffix1 | _1 | Suffix for the first element of a read file pair
--suffix2 | _2 | Suffix for the second element of a read file pair
--bed |  | bed file with interval list
--snp_vcf | dbsnp.vcf | Path to SNP VCF from GATK bundle
--indel_vcf | Mills_1000G_indels.vcf | Path to indel VCF from GATK bundle
--postaltjs | bwa-postalt.js | Path to the post-alignment javascript bwa-postalt.js
--feature_file | null | Path to feature file for Qualimap
--multiqc_config | null | Config YAML file for MultiQC
--adapterremoval_opt | null | Command line options for AdapterRemoval
--bwa_mem | bwa-mem2 mem | bwa-mem command; use "bwa mem" to switch to regular bwa-mem (both are in the Docker and Singularity containers)
  • Flags

Flags are special parameters without value.

Name | Description
--- | ---
--help | print usage and optional parameters
--trim | enable adapter sequence trimming
--recalibration | perform quality score recalibration (GATK)
--alt | enable alternative contig handling (for reference genome hg38)
--bwa_option_M | trigger the -M option in bwa and the corresponding compatibility option in samblaster (marks shorter split hits as secondary)
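
The four-column file expected by --input_file can be generated from a folder of paired fastq files. The sketch below is illustrative only: the function name make_input_file is hypothetical, and it assumes that the file starts with a header row naming the columns, that the read group ID can simply reuse the sample name, and that pairs are recognised through the --suffix1/--suffix2 and --fastq_ext conventions (defaults _1, _2, fastq.gz).

```shell
# Write a comma-separated input file (SM,RG,pair1,pair2) to stdout.
# Usage: make_input_file <folder> [ext] [suffix1] [suffix2]
# Assumptions: header row is expected; RG is set equal to the sample name.
make_input_file() (
  dir="$1"
  ext="${2:-fastq.gz}"
  s1="${3:-_1}"
  s2="${4:-_2}"
  echo "SM,RG,pair1,pair2"
  for r1 in "$dir"/*"$s1.$ext"; do
    [ -e "$r1" ] || continue
    # Sample name = file name without suffix and extension
    sample=$(basename "$r1" "$s1.$ext")
    r2="$dir/$sample$s2.$ext"
    if [ ! -e "$r2" ]; then
      echo "skipping $r1: no mate $r2" >&2
      continue
    fi
    echo "$sample,$sample,$r1,$r2"
  done
)
```

For example, `make_input_file input/ > input.csv` writes one row per complete pair and reports unpaired files on stderr; the result should be checked against the format the pipeline version in use actually expects.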

Usage

To run the pipeline on a series of fastq or BAM files in folder input and a fasta reference file hg19.fasta, one can type:

nextflow run iarcbioinfo/alignment-nf -r v1.3 -profile singularity  --input_folder input/ --ref hg19.fasta --output_folder output

To run the pipeline without Singularity, simply remove "-profile singularity". Alternatively, one can run the pipeline using a Docker container (-profile docker) or the conda recipe containing all required dependencies (-profile conda).
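
Before launching any of these commands, it can help to confirm that the index files listed for --ref sit next to the reference. The following is a minimal sketch with a hypothetical function name, missing_ref_indexes; it follows the extension list given in the --ref description and assumes the sequence dictionary may be named either <ref>.dict or <ref without extension>.dict. If bwa-mem2 in your installation uses differently named index files, the list would need adjusting.

```shell
# Print the index files that are missing for a reference fasta.
# Usage: missing_ref_indexes hg19.fasta
# Assumption: extensions follow the --ref description; the .dict file may
# be named <ref>.dict or <ref without extension>.dict.
missing_ref_indexes() (
  ref="$1"
  status=0
  for ext in fai sa bwt ann amb pac; do
    if [ ! -e "$ref.$ext" ]; then
      echo "$ref.$ext"
      status=1
    fi
  done
  if [ ! -e "$ref.dict" ] && [ ! -e "${ref%.*}.dict" ]; then
    echo "${ref%.*}.dict"
    status=1
  fi
  exit "$status"
)
```

A launch script could then run `missing_ref_indexes hg19.fasta || exit 1` ahead of the nextflow command.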

Use bwa-mem instead of bwa-mem2

To use bwa-mem, one can type:

nextflow run iarcbioinfo/alignment-nf -r v1.3 -profile singularity  --input_folder input/ --ref hg19.fasta --output_folder output --bwa_mem "bwa mem"

Enable adapter trimming

To use the adapter trimming step, you must add the --trim option and satisfy the requirements mentioned above. For example:

nextflow run iarcbioinfo/alignment-nf -r v1.3 -profile singularity  --input_folder input/ --ref hg19.fasta --output_folder output --trim

Enable ALT mode

To use the alternative contigs handling mode, you must provide the path to an ALT-aware genome reference (e.g., hg38), add the --alt option, and satisfy the requirements mentioned above. For example:

nextflow run iarcbioinfo/alignment-nf -r v1.3 -profile singularity  --input_folder input/ --ref hg38.fasta --output_folder output --postaltjs /user/bin/bwa-0.7.15/bwakit/bwa-postalt.js --alt

Enable base quality score recalibration

To use the base quality score recalibration step, you must provide the paths to two GATK bundle VCF files with lists of known SNPs and indels, respectively, add the --recalibration option, and satisfy the requirements mentioned above. For example:

nextflow run iarcbioinfo/alignment-nf -r v1.3 -profile singularity  --input_folder input/ --ref hg19.fasta --output_folder output --snp_vcf GATKbundle/dbsnp.vcf.gz --indel_vcf GATKbundle/Mills_1000G_indels.vcf.gz --recalibration

Output

Type | Description
--- | ---
BAM/ | folder with BAM and BAI files of alignments or realignments
QC/BAM/multiqc_qualimap_flagstat_*report.html | MultiQC report for Qualimap and samtools flagstat (duplicates)
QC/BAM/multiqc_qualimap_flagstat_*report_data | data used for the MultiQC report
QC/qualimap/file_BQSRecalibrated.stats.txt | Qualimap summary file
QC/qualimap/file_BQSRecalibrated/ | Qualimap files
QC/BAM/BQSR/ | GATK base quality score recalibration outputs (tables and pdf comparing scores before/after recalibration)
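
After a run, a quick sanity check is to confirm that every BAM in the BAM/ folder has its BAI index. This is a sketch only: the function name bams_without_index is hypothetical, and it assumes the index is named either <name>.bam.bai or <name>.bai.

```shell
# Print BAM files in a folder that have no BAI index next to them.
# Usage: bams_without_index output/BAM
# Assumption: the index is named <name>.bam.bai or <name>.bai.
bams_without_index() (
  bam_dir="$1"
  status=0
  for bam in "$bam_dir"/*.bam; do
    [ -e "$bam" ] || continue
    if [ ! -e "$bam.bai" ] && [ ! -e "${bam%.bam}.bai" ]; then
      echo "$bam"
      status=1
    fi
  done
  exit "$status"
)
```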

Directed Acyclic Graph

DAG

FAQ

Why did Indel realignment disappear from version 1.0?

Indel realignment was removed following new GATK best practices for pre-processing.

Contributions

Name | Email | Description
--- | --- | ---
Nicolas Alcala* | [email protected] | Developer to contact for support
Catherine Voegele | [email protected] | Tester
Vincent Cahais | [email protected] | Tester
Alexis Robitaille | [email protected] | Tester