
ESR-NZ / human_genomics_pipeline

Licence: other
A Snakemake workflow to process single samples or cohorts of paired-end sequencing data (WGS or WES) using trim galore/bwa/GATK4/parabricks.

Programming languages: Python, Shell, CSS

Projects that are alternatives to or similar to human_genomics_pipeline

germline-DNA
A BioWDL variantcalling pipeline for germline DNA data. Starting with FASTQ files to produce VCF files. Category:Multi-Sample
Stars: ✭ 21 (+10.53%)
Mutual labels:  pipeline, gatk4, gatk-bestpractices
MTBseq source
MTBseq is an automated pipeline for mapping, variant calling and detection of resistance mediating and phylogenetic variants from illumina whole genome sequence data of Mycobacterium tuberculosis complex isolates.
Stars: ✭ 26 (+36.84%)
Mutual labels:  pipeline, genomics
redundans
Redundans is a pipeline that assists an assembly of heterozygous/polymorphic genomes.
Stars: ✭ 90 (+373.68%)
Mutual labels:  pipeline, genomics
companion
This repository has been archived, currently maintained version is at https://github.com/iii-companion/companion
Stars: ✭ 21 (+10.53%)
Mutual labels:  pipeline, genomics
fq
Command line utility for manipulating Illumina-generated FastQ files.
Stars: ✭ 31 (+63.16%)
Mutual labels:  genomics, illumina
ngs-preprocess
A pipeline for preprocessing NGS data from Illumina, Nanopore and PacBio technologies
Stars: ✭ 22 (+15.79%)
Mutual labels:  pipeline, illumina
bactmap
A mapping-based pipeline for creating a phylogeny from bacterial whole genome sequences
Stars: ✭ 36 (+89.47%)
Mutual labels:  pipeline, genomics
Flowcraft
FlowCraft: a component-based pipeline composer for omics analysis using Nextflow. 🐳📦
Stars: ✭ 208 (+994.74%)
Mutual labels:  pipeline, genomics
Sarek
Detect germline or somatic variants from normal or tumour/normal whole-genome or targeted sequencing
Stars: ✭ 124 (+552.63%)
Mutual labels:  pipeline, genomics
Sns
Analysis pipelines for sequencing data
Stars: ✭ 43 (+126.32%)
Mutual labels:  pipeline, genomics
get phylomarkers
A pipeline to select optimal markers for microbial phylogenomics and species tree estimation using coalescent and concatenation approaches
Stars: ✭ 34 (+78.95%)
Mutual labels:  pipeline, genomics
MGSE
Mapping-based Genome Size Estimation (MGSE) performs an estimation of a genome size based on a read mapping to an existing genome sequence assembly.
Stars: ✭ 22 (+15.79%)
Mutual labels:  genomics, illumina
Galaxy
Data intensive science for everyone.
Stars: ✭ 812 (+4173.68%)
Mutual labels:  pipeline, genomics
Bedops
🔬 BEDOPS: high-performance genomic feature operations
Stars: ✭ 215 (+1031.58%)
Mutual labels:  pipeline, genomics
gawn
Genome Annotation Without Nightmares
Stars: ✭ 35 (+84.21%)
Mutual labels:  pipeline, genomics
saisoku
Saisoku is a Python module that helps you build complex pipelines of batch file/directory transfer/sync jobs.
Stars: ✭ 40 (+110.53%)
Mutual labels:  pipeline
cherry-on-py
Cloud computing is a game changer for developers. What can you do in a couple hundred lines of code?
Stars: ✭ 67 (+252.63%)
Mutual labels:  pipeline
zenhub-pipeline
Automatically transfer issues in pipeline by commit message
Stars: ✭ 14 (-26.32%)
Mutual labels:  pipeline
psmc
Implementation of the Pairwise Sequentially Markovian Coalescent (PSMC) model
Stars: ✭ 121 (+536.84%)
Mutual labels:  genomics
vrs-python
GA4GH Variation Representation Python Implementation
Stars: ✭ 35 (+84.21%)
Mutual labels:  genomics

human_genomics_pipeline

A Snakemake workflow to process single samples (unrelated individuals) or cohort samples (related individuals) of paired-end sequencing data (WGS or WES) using bwa and GATK4. Quality control checks are also undertaken. The FASTQ files can optionally be trimmed with Trim Galore, and the pipeline can run on NVIDIA GPUs where NVIDIA Clara Parabricks software is available, giving significant speedups in analysis time. This workflow is designed to follow the GATK best practice workflow for germline short variant discovery (SNPs and indels). It is designed to be followed by vcf_annotation_pipeline, with the data then ingested into Scout for clinical interpretation. However, this pipeline also stands on its own, taking the data from FASTQ to VCF (raw sequencing data to called variants).

This pipeline has been developed with human genetic data in mind, but we designed it to be species agnostic. Genetic data from other species can be analysed by setting a species-specific reference genome and variant databases in the configuration file (not all situations have been tested).
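Because every sample is paired-end, each one needs exactly two FASTQ files. A minimal sketch of grouping files into read pairs, assuming a hypothetical `<sample>_1.fastq.gz` / `<sample>_2.fastq.gz` naming convention (check the docs for the convention the pipeline actually expects):

```python
import re
from pathlib import Path

# Hypothetical naming convention: <sample>_1.fastq.gz and <sample>_2.fastq.gz
PAIR_PATTERN = re.compile(r"^(?P<sample>.+)_(?P<read>[12])\.fastq\.gz$")


def pair_fastqs(filenames):
    """Group paired-end FASTQ paths by sample name.

    Returns {sample: (read1_path, read2_path)}. Files that do not match the
    naming convention are ignored; a sample missing a mate raises ValueError.
    """
    reads = {}
    for name in filenames:
        match = PAIR_PATTERN.match(Path(name).name)
        if match is None:
            continue  # not a FASTQ following the assumed convention
        reads.setdefault(match["sample"], {})[match["read"]] = name

    pairs = {}
    for sample, mates in sorted(reads.items()):
        if set(mates) != {"1", "2"}:
            raise ValueError(f"sample {sample!r} is missing a mate: found {sorted(mates)}")
        pairs[sample] = (mates["1"], mates["2"])
    return pairs
```

Failing early on an unpaired file is cheaper than discovering the problem partway through alignment.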

Pipeline summary - single samples

  1. Raw read QC (FastQC and MultiQC)
  2. Adapter trimming (Trim Galore) (optional)
  3. Alignment against reference genome (Burrows-Wheeler Aligner)
  4. Mark duplicates (GATK MarkDuplicates)
  5. Base recalibration (GATK BaseRecalibrator and GATK ApplyBQSR)
  6. Haplotype calling (GATK HaplotypeCaller)
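The CPU steps above roughly correspond to the command lines sketched below. This is an illustration, not the pipeline's actual Snakemake rules: the intermediate file names, the read-group string, and the use of `samtools sort` to sort the bwa output are assumptions, and the optional Trim Galore step is omitted. Only the recalibrated BAM and raw VCF names follow the output list given for this pipeline.

```python
import shlex


def single_sample_commands(sample, read1, read2, ref, known_sites, threads=8):
    """Build the single-sample CPU steps as shell command strings (nothing is run).

    known_sites is a list of known-variant VCFs (e.g. dbSNP) used for BQSR.
    """
    mapped = f"results/mapped/{sample}"
    sorted_bam = f"{mapped}_sorted.bam"            # hypothetical intermediate name
    dedup_bam = f"{mapped}_sorted_mkdups.bam"      # hypothetical intermediate name
    recal_table = f"{mapped}_recal.table"          # hypothetical intermediate name
    recal_bam = f"{mapped}_recalibrated.bam"
    raw_vcf = f"results/called/{sample}_raw_snps_indels.vcf"
    # bwa expects the literal two-character sequence \t between read-group fields
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    known = [arg for site in known_sites for arg in ("--known-sites", site)]

    return [
        # 1. Raw read QC (fastqc -o expects the output directory to exist)
        shlex.join(["fastqc", "-o", "results/qc/fastqc", read1, read2]),
        shlex.join(["multiqc", "results/qc", "-o", "results/qc"]),
        # 3. Alignment: bwa writes SAM to stdout; sorting with samtools is an assumption
        shlex.join(["bwa", "mem", "-t", str(threads), "-R", read_group, ref, read1, read2])
        + " | "
        + shlex.join(["samtools", "sort", "-o", sorted_bam, "-"]),
        # 4. Mark duplicates (-M writes the duplication metrics)
        shlex.join(["gatk", "MarkDuplicates", "-I", sorted_bam, "-O", dedup_bam,
                    "-M", f"{mapped}_mkdups_metrics.txt"]),
        # 5. Base recalibration: build the model, then apply it
        shlex.join(["gatk", "BaseRecalibrator", "-I", dedup_bam, "-R", ref,
                    *known, "-O", recal_table]),
        shlex.join(["gatk", "ApplyBQSR", "-I", dedup_bam, "-R", ref,
                    "--bqsr-recal-file", recal_table, "-O", recal_bam]),
        # 6. Haplotype calling straight to a VCF for a single sample
        shlex.join(["gatk", "HaplotypeCaller", "-I", recal_bam, "-R", ref, "-O", raw_vcf]),
    ]
```

Using `shlex.join` keeps arguments such as the tab-separated read group correctly quoted when the strings are pasted into a shell.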

Pipeline summary - single samples - GPU accelerated

  1. Raw read QC (FastQC and MultiQC)
  2. Adapter trimming (Trim Galore) (optional)
  3. Alignment against reference genome, mark duplicates, base recalibration and haplotype calling (parabricks germline pipeline)
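On the GPU path, a single Parabricks `germline` call replaces steps 3 to 6. A minimal sketch of building that command line; the flag names follow the Parabricks documentation as we understand it and may differ between versions (check `pbrun germline --help`), and the output file names are illustrative assumptions:

```python
import shlex


def parabricks_germline_command(sample, read1, read2, ref, known_sites, gvcf=False):
    """Build a `pbrun germline` command string (nothing is run).

    gvcf=True requests GVCF output, which a later cohort combine step needs.
    """
    out_bam = f"results/mapped/{sample}_recalibrated.bam"
    suffix = "g.vcf" if gvcf else "vcf"
    out_variants = f"results/called/{sample}_raw_snps_indels.{suffix}"

    argv = ["pbrun", "germline", "--ref", ref, "--in-fq", read1, read2]
    for site in known_sites:
        argv += ["--knownSites", site]  # known variants used for base recalibration
    argv += [
        "--out-bam", out_bam,
        "--out-variants", out_variants,
        "--out-recal-file", f"results/mapped/{sample}_recal.txt",  # hypothetical name
    ]
    if gvcf:
        argv.append("--gvcf")
    return shlex.join(argv)
```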

Pipeline summary - cohort samples

  1. Raw read QC (FastQC and MultiQC)
  2. Adapter trimming (Trim Galore) (optional)
  3. Alignment against reference genome (Burrows-Wheeler Aligner)
  4. Mark duplicates (GATK MarkDuplicates)
  5. Base recalibration (GATK BaseRecalibrator and GATK ApplyBQSR)
  6. Haplotype calling (GATK HaplotypeCaller)
  7. Combine GVCFs into a multi-sample GVCF (GATK CombineGVCFs)
  8. Genotyping (GATK GenotypeGVCFs)
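For cohorts, haplotype calling emits one GVCF per sample, and these are combined and jointly genotyped. A minimal sketch of steps 6 to 8 as GATK command lines; the file names are illustrative, and naming the cohort after its proband is an assumption based on the `proband1` output listed for this pipeline:

```python
import shlex


def cohort_joint_calling_commands(cohort, samples, ref):
    """Build per-sample GVCF calling, CombineGVCFs and GenotypeGVCFs commands."""
    gvcfs = [f"results/called/{s}_raw_snps_indels.g.vcf" for s in samples]
    # 6. One GVCF per sample (-ERC GVCF keeps reference-confidence blocks)
    cmds = [
        shlex.join(["gatk", "HaplotypeCaller", "-I", f"results/mapped/{s}_recalibrated.bam",
                    "-R", ref, "-ERC", "GVCF", "-O", g])
        for s, g in zip(samples, gvcfs)
    ]
    # 7. Merge the per-sample GVCFs into one multi-sample GVCF
    combined = f"results/called/{cohort}_raw_snps_indels.g.vcf"
    variant_args = [arg for g in gvcfs for arg in ("-V", g)]
    cmds.append(shlex.join(["gatk", "CombineGVCFs", "-R", ref, *variant_args, "-O", combined]))
    # 8. Joint genotyping produces the final cohort VCF
    cmds.append(shlex.join(["gatk", "GenotypeGVCFs", "-R", ref, "-V", combined,
                            "-O", f"results/called/{cohort}_raw_snps_indels.vcf"]))
    return cmds
```

Calling in GVCF mode first and genotyping jointly lets evidence from related individuals inform each site's genotype, which per-sample VCFs would lose.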

Pipeline summary - cohort samples - GPU accelerated

  1. Raw read QC (FastQC and MultiQC)
  2. Adapter trimming (Trim Galore) (optional)
  3. Alignment against reference genome, mark duplicates, base recalibration and haplotype calling (parabricks germline pipeline)
  4. Combine GVCFs into a multi-sample GVCF (parabricks trio combine gvcf)
  5. Genotyping (GATK GenotypeGVCFs)
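The four summaries differ only in whether the data are a cohort, whether a GPU is used, and whether trimming is switched on. A minimal sketch of that selection logic; the three boolean switches are hypothetical stand-ins for whatever the configuration file actually uses:

```python
from typing import List


def pipeline_steps(cohort: bool, gpu: bool, trim: bool) -> List[str]:
    """Return the ordered step list for one of the four pipeline variants."""
    steps = ["Raw read QC (FastQC and MultiQC)"]
    if trim:
        steps.append("Adapter trimming (Trim Galore)")
    if gpu:
        # Parabricks collapses alignment through haplotype calling into one step
        steps.append("Alignment, mark duplicates, base recalibration and "
                     "haplotype calling (parabricks germline pipeline)")
    else:
        steps += [
            "Alignment against reference genome (Burrows-Wheeler Aligner)",
            "Mark duplicates (GATK MarkDuplicates)",
            "Base recalibration (GATK BaseRecalibrator and GATK ApplyBQSR)",
            "Haplotype calling (GATK HaplotypeCaller)",
        ]
    if cohort:
        steps.append("Combine GVCFs (parabricks trio combine gvcf)" if gpu
                     else "Combine GVCFs (GATK CombineGVCFs)")
        steps.append("Genotyping (GATK GenotypeGVCFs)")
    return steps
```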

Main output files

Single samples:

  • results/qc/multiqc_report.html
  • results/mapped/sample1_recalibrated.bam
  • results/called/sample1_raw_snps_indels.vcf

Cohort samples:

  • results/qc/multiqc_report.html
  • results/mapped/sample1_recalibrated.bam
  • results/mapped/sample2_recalibrated.bam
  • results/mapped/sample3_recalibrated.bam
  • results/called/proband1_raw_snps_indels.vcf
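The output paths follow a simple per-sample pattern. A minimal sketch that reproduces the lists above for any set of sample names; producing one VCF per sample in single-sample mode is an assumption extrapolated from the `sample1` example:

```python
def expected_outputs(samples, cohort_name=None):
    """List the main output files for single samples or, if cohort_name is set, a cohort."""
    outputs = ["results/qc/multiqc_report.html"]
    outputs += [f"results/mapped/{s}_recalibrated.bam" for s in samples]
    if cohort_name is None:
        # Single samples: one raw VCF per sample
        outputs += [f"results/called/{s}_raw_snps_indels.vcf" for s in samples]
    else:
        # Cohort: one jointly genotyped VCF for the whole family
        outputs.append(f"results/called/{cohort_name}_raw_snps_indels.vcf")
    return outputs
```

In a Snakefile, the same target list would typically be built with Snakemake's `expand()` helper in the final rule.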

Prerequisites

Test human_genomics_pipeline

The provided test dataset can be used to test running this pipeline on a new machine, or test pipeline developments/releases.

Run human_genomics_pipeline

See the docs for a walkthrough guide for running human_genomics_pipeline on:

Contribute back!

Contributions and feedback are always welcome! 😊
