
GooglingTheCancerGenome / Sv Callers

Licence: apache-2.0
Snakemake-based workflow for detecting structural variants in WGS data

sv-callers

DOI Published in PeerJ Build Status Codacy Badge

Structural variants (SVs) are an important class of genetic variation implicated in a wide array of genetic diseases. sv-callers is a Snakemake-based workflow that combines several state-of-the-art tools for detecting SVs in whole genome sequencing (WGS) data. The workflow is easy to use and deploy on any Linux-based machine. In particular, it supports automated software deployment, easy configuration and the addition of new analysis tools, and it scales from a single computer to different HPC clusters with minimal effort.

Dependencies

  • Python 3
  • Conda - package/environment management system
  • Snakemake - workflow management system
  • Xenon CLI - command-line interface to compute and storage resources
  • jq - command-line JSON processor (optional)

The workflow includes the following SV callers: Manta, DELLY, LUMPY and GRIDSS.

1. Clone this repo.

git clone https://github.com/GooglingTheCancerGenome/sv-callers.git
cd sv-callers

2. Install dependencies.

# download Miniconda3 installer
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
# install Conda (respond by 'yes')
bash miniconda.sh
# update Conda
conda update -y conda
# create & activate new env with installed deps
conda env create -n wf -f environment.yaml
conda activate wf
cd snakemake

3. Configure the workflow.

  • config files:

    • analysis.yaml - analysis-specific settings (e.g., workflow mode, I/O files, SV callers, post-processing and resources used)
    • environment.yaml - software dependencies and versions
  • input files:

    • example data in sv-callers/snakemake/data directory
    • reference genome in .fasta (incl. index files)
    • excluded regions in .bed (optional)
    • WGS samples in .bam (incl. index files)
    • list of (paired) samples in samples.csv
  • output files:

    • (filtered) SVs per caller and merged calls in .vcf (incl. index files)
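
Because the WGS samples must come with index files, a quick check before running the workflow can save a failed run. The following is a minimal sketch and not part of the workflow itself: the helper name `missing_indexes` is hypothetical, and the index naming (`sample.bam.bai` or `sample.bai`) is an assumption based on common samtools conventions.

```python
from pathlib import Path


def missing_indexes(data_dir):
    """Return the .bam files under data_dir that have no index file next to them.

    Assumption: an index is named either 'sample.bam.bai' or 'sample.bai';
    adjust the candidates if your files follow another convention.
    """
    missing = []
    for bam in sorted(Path(data_dir).rglob("*.bam")):
        candidates = [Path(str(bam) + ".bai"), bam.with_suffix(".bai")]
        if not any(c.exists() for c in candidates):
            missing.append(str(bam.relative_to(data_dir)))
    return missing
```

Calling `missing_indexes("data")` from the snakemake directory would list any BAM files in the example data directory that still need indexing; an empty list means every BAM has an index.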

4. Execute the workflow.

# 'dry' run only checks I/O files
snakemake -np

# 'vanilla' run (default) mimics the execution of SV callers by writing (dummy) VCF files
snakemake -C echo_run=1

Note: One sample or a tumor/normal pair generates eight SV calling jobs (i.e., 1 x Manta, 1 x LUMPY, 1 x GRIDSS and 5 x DELLY) and six post-processing jobs. See the workflow instance of single-sample (germline) or paired-sample (somatic) analysis.
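
The job counts in the note above can be written out as simple arithmetic. This is a sketch only: the function names are hypothetical, and it assumes that job counts scale linearly with the number of samples (or pairs), which the note does not state explicitly.

```python
# SV calling jobs per sample (or tumor/normal pair), as given in the note above.
CALLING_JOBS = {"manta": 1, "lumpy": 1, "gridss": 1, "delly": 5}
# Post-processing jobs per sample when all four callers are enabled.
POSTPROC_JOBS_ALL_CALLERS = 6


def calling_jobs(n_samples, callers=("manta", "delly", "lumpy", "gridss")):
    """Number of SV calling jobs for n_samples samples (or pairs)."""
    return n_samples * sum(CALLING_JOBS[c] for c in callers)


def total_jobs(n_samples):
    """Calling plus post-processing jobs with all four callers enabled."""
    per_sample = sum(CALLING_JOBS.values()) + POSTPROC_JOBS_ALL_CALLERS
    return n_samples * per_sample


print(calling_jobs(1))  # 8
print(total_jobs(1))    # 14
```

The total of 14 jobs for one sample or pair equals the `--jobs 14` value used in the cluster submission command.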

Submit jobs to Slurm or GridEngine cluster

SCH=slurm   # or gridengine
snakemake -C echo_run=1 mode=p enable_callers="['manta','delly','lumpy','gridss']" --use-conda --latency-wait 30 --jobs 14 \
--cluster "xenon scheduler $SCH --location local:// submit --name smk.{rule} --inherit-env --cores-per-task {threads} --max-run-time 1 --max-memory {resources.mem_mb} --working-directory . --stderr stderr-%j.log --stdout stdout-%j.log" &>smk.log&

To perform SV calling:

  • override the default parameters directly in analysis.yaml or via the snakemake CLI (use the -C argument)

    • set echo_run=0
    • choose between two workflow modes: single- (s) or paired-sample (p - default)
    • select one or more callers using enable_callers (default all: "['manta','delly','lumpy','gridss']")
  • use xenon CLI to set:

    • --max-run-time of workflow jobs (in minutes)
    • --temp-space (optional, in MB)
  • adjust compute requirements per SV caller according to the system used:

    • the number of threads,
    • the amount of memory (in MB),
    • the amount of temporary disk space, or tmpspace (path in the TMPDIR env variable), which is used for intermediate files by LUMPY and GRIDSS only.
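
Quoting the `enable_callers` list by hand is easy to get wrong, so the `-C` overrides can also be assembled programmatically. The sketch below is an illustration under assumptions: the helper `config_overrides` is hypothetical, and it uses only the keys and values named in the list above (`echo_run`, `mode`, `enable_callers`).

```python
VALID_MODES = ("s", "p")
VALID_CALLERS = ("manta", "delly", "lumpy", "gridss")


def config_overrides(echo_run=0, mode="p", callers=VALID_CALLERS):
    """Build the argument list for `snakemake -C ...` from the settings above."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be 's' or 'p', got {mode!r}")
    unknown = [c for c in callers if c not in VALID_CALLERS]
    if unknown:
        raise ValueError(f"unknown caller(s): {unknown}")
    # Format the list as it appears on the command line, e.g. ['manta','delly'].
    caller_list = "[" + ",".join(f"'{c}'" for c in callers) + "]"
    return [
        "snakemake",
        "-C",
        f"echo_run={echo_run}",
        f"mode={mode}",
        f"enable_callers={caller_list}",
    ]


# Single-sample analysis with two callers; no shell quoting is needed
# because each list element is passed as one argument.
args = config_overrides(mode="s", callers=["manta", "delly"])
```

The resulting list can be passed to `subprocess.run(args, check=True)` from the snakemake directory; further options such as `--use-conda` or `--cluster` would be appended to the list in the same way.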

Query job accounting information

SCH=slurm   # or gridengine
xenon --json scheduler $SCH --location local:// list --identifier [jobID] | jq ...