sigven / cacao

Licence: MIT license
Callable Cancer Loci - assessment of sequencing coverage for actionable and pathogenic loci in cancer

cacao - callable cancer loci

Overview

cacao is a computational workflow that provides software and data to assess sequencing depth at clinically actionable/pathogenic loci in cancer for a given sequence alignment (BAM/CRAM). Most importantly, the software pinpoints genomic loci of clinical relevance in cancer that have sufficient sequencing coverage for reliable variant calling. In combination with the variants that have actually been identified, it may thus serve to confirm negative findings, a matter of significant clinical value that is underappreciated in current cancer sequencing analysis. The specific requirements for denoting loci as callable (i.e. depth and alignment quality) can be configured by the user, and should reflect how the input is used for variant calling (RNA/DNA, germline/somatic calling).

Technically, cacao combines the speed of mosdepth with the powerful R Markdown framework for interactive data reporting. It currently uses Docker for software encapsulation to ease the installation process (a Conda package is in the making).

News

  • December 9th 2020: 0.3.1 release
    • Updated track directory for ClinVar and CIViC
    • Dockerfile uses renv for improved installation of R package dependencies

Annotation resources (v0.3.1)

Three clinical genomic tracks in BED format have been created:

  • Loci with pathogenic and likely pathogenic variants in protein-coding genes related to cancer predisposition and inherited cancer syndromes (BRCA1, BRCA2, ATM etc.)
  • Loci associated with actionable somatic variants (related to prognosis, diagnosis, or drug sensitivity, e.g. BRAF V600E)
    • Variants have been retrieved from CIViC (data harvested December 9th 2020)
    • Only variants that can be mapped unambiguously to the genome are considered as sources of actionable loci
  • Loci identified as somatic mutational hotspots (i.e. likely driver alterations) in cancer

IMPORTANT: For each variant identified from the three sources above, we use a surrounding sequence window of approximately 10 bp. The mean depth across this window represents the coverage of the locus.
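
To make the window calculation concrete, here is a minimal Python sketch. It assumes per-base depths are already available in a dictionary keyed by 1-based position, and it assumes a symmetric flank of 5 bp on each side of the variant (an 11 bp window), which is only one possible reading of "approximately 10 bp". The function names are hypothetical and not part of cacao, which relies on mosdepth for its depth calculations.

```python
def window_bounds(pos, flank=5):
    """Return the 1-based inclusive (start, end) of a window centred on pos.

    flank=5 is an assumption; the exact window used by cacao is only
    described as "approximately 10bp".
    """
    start = max(1, pos - flank)  # do not run off the start of the chromosome
    end = pos + flank
    return start, end


def mean_window_depth(depth_by_pos, pos, flank=5):
    """Mean depth over the window; positions without reads count as depth 0."""
    start, end = window_bounds(pos, flank)
    depths = [depth_by_pos.get(p, 0) for p in range(start, end + 1)]
    return sum(depths) / len(depths)


# Toy example: reads cover positions 100-105 at depth 22, nothing upstream.
toy_depths = {p: 22 for p in range(100, 106)}
print(mean_window_depth(toy_depths, 100))  # 6 covered bases of 11 -> 12.0
```

In this toy case the variant position itself is covered at depth 22, yet the locus mean is 12.0 because half of the window has no reads, which illustrates why locus coverage can differ from the depth at the variant position.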

All three tracks (hereditary, somatic_actionable, and somatic_hotspot) are available for GRCh37 and GRCh38, and there are also tab-separated files that link each locus to its associated

  • variants and phenotypes (ClinVar)
  • clinical evidence items (therapeutic context, evidence level, from CIViC)
  • tumor types (cancerhotspots.org)

Example reports

  • An example report from the CACAO workflow showing callable cancer loci in an RNA sequence alignment.

Getting started

Installation

  • Prerequisites:
    • Make sure that Docker is installed and running
    • The CACAO workflow script cacao_wflow.py requires that Python3 is installed
  • Download the latest release
  • Pull the latest Docker image: docker pull sigven/cacao:0.3.1

Usage

Run the CACAO workflow with the cacao_wflow.py Python script, which takes the following required and optional arguments:

usage:
cacao_wflow.py -h [options]
--query_aln BAM/CRAM
--track_dir TRACK_DIR
--output_dir OUTPUT_DIR
--genome_assembly grch37|grch38
--sample_id SAMPLE_ID
--mode hereditary|somatic|any

cacao - assessment of sequencing coverage at pathogenic and actionable loci in
cancer

Required arguments:
  --query_aln QUERY_ALN
                        Query alignment file (BAM/CRAM)
  --track_dir TRACK_DIR
                        Directory with BED tracks of pathogenic/actionable cancer loci for grch37/grch38
  --output_dir OUTPUT_DIR
                        Output directory
  --genome_assembly {grch37,grch38}
                        Human genome assembly build: grch37 or grch38
  --mode {hereditary,somatic,any}
                        Choice of loci and clinical cancer context (cancer predisposition/tumor sequencing)
  --sample_id SAMPLE_ID
                        Sample identifier - prefix for output files

Optional arguments:
  -h, --help            show this help message and exit
  --mapq MAPQ           mapping quality threshold (default: 0)
  --threads THREADS     Number of mosdepth BAM decompression threads. (use 4
                        or fewer) (default: 0)
  --callability_levels_germline CALLABILITY_LEVELS_GERMLINE
                        Simple colon-separated string that defines four levels
                        of variant callability: NO_COVERAGE (0), LOW_COVERAGE
                        (1-9), CALLABLE (10-99), HIGH_COVERAGE (>= 100).
                        Initial value must be 0. (default: 0:10:100)
  --callability_levels_somatic CALLABILITY_LEVELS_SOMATIC
                        Simple colon-separated string that defines four levels
                        of variant callability: NO_COVERAGE (0), LOW_COVERAGE
                        (1-29), CALLABLE (30-199), HIGH_COVERAGE (>= 200).
                        Initial value must be 0. (default: 0:30:200)
  --query_target QUERY_TARGET
                        BED file with genome target regions subject to
                        sequencing/analysis (default: None)
  --force_overwrite     By default, the script will fail with an error if any
                        output file already exists. You can force the
                        overwrite of existing result files by using this flag
                        (default: False)
  --version             show program's version number and exit
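
The callability strings above (for example 0:10:100) can be read as three lower bounds that split depth into four levels. The following Python sketch shows that reading. It is not cacao's own code: the function names are hypothetical, the requirement that thresholds increase strictly is an assumption, and so is the handling of fractional mean depths between 0 and 1 (treated here as LOW_COVERAGE).

```python
def parse_levels(spec):
    """Parse a string such as '0:10:100' into three integer thresholds."""
    parts = [int(x) for x in spec.split(":")]
    if len(parts) != 3:
        raise ValueError("expected three colon-separated values")
    if parts[0] != 0:
        raise ValueError("initial value must be 0")
    if not parts[0] < parts[1] < parts[2]:
        # Assumption: thresholds must increase strictly.
        raise ValueError("thresholds must be strictly increasing")
    return parts


def classify_depth(depth, spec="0:10:100"):
    """Map a (mean) depth to one of the four callability levels."""
    _, callable_min, high_min = parse_levels(spec)
    if depth <= 0:
        return "NO_COVERAGE"
    if depth < callable_min:
        return "LOW_COVERAGE"
    if depth < high_min:
        return "CALLABLE"
    return "HIGH_COVERAGE"


# The same depth can be callable for germline but not for somatic calling.
print(classify_depth(12, "0:10:100"))  # CALLABLE (germline default)
print(classify_depth(12, "0:30:200"))  # LOW_COVERAGE (somatic default)
```

With the documented defaults, a locus at depth 12 is CALLABLE under the germline levels but only LOW_COVERAGE under the somatic levels, which is why the two settings are configured separately.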

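As an illustration of how the required arguments fit together, the sketch below assembles a command line in Python. Only the flag names and their allowed values come from the help text above. The helper function, the file and directory names (sample.bam, tracks, results, sample01), and the choice to launch the script through python3 are assumptions made for the example.

```python
import shlex

ASSEMBLIES = ("grch37", "grch38")
MODES = ("hereditary", "somatic", "any")


def build_cacao_command(query_aln, track_dir, output_dir, sample_id,
                        genome_assembly="grch37", mode="hereditary"):
    """Return an argument list for cacao_wflow.py (hypothetical helper)."""
    if genome_assembly not in ASSEMBLIES:
        raise ValueError(f"genome_assembly must be one of {ASSEMBLIES}")
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}")
    return [
        "python3", "cacao_wflow.py",  # assumption: script is in the working directory
        "--query_aln", query_aln,
        "--track_dir", track_dir,
        "--output_dir", output_dir,
        "--genome_assembly", genome_assembly,
        "--mode", mode,
        "--sample_id", sample_id,
    ]


# Placeholder paths; replace them with real files before running anything.
cmd = build_cacao_command("sample.bam", "tracks", "results", "sample01",
                          genome_assembly="grch38", mode="somatic")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once the placeholder paths point at real data and the Docker image has been pulled.
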
Documentation

Coming soon

Contact

sigven AT ifi.uio.no
