All Projects → stjude → CICERO

stjude / CICERO

Licence: other
CICERO: a versatile method for detecting complex and diverse driver fusions using cancer RNA sequencing data.

Programming Languages

perl
6916 projects
shell
77523 projects
HTML
75241 projects
Dockerfile
14818 projects
python
139335 projects - #7 most used programming language
scala
5932 projects

Projects that are alternatives of or similar to CICERO

drop
Pipeline to find aberrant events in RNA-Seq data, useful for diagnosis of rare disorders
Stars: ✭ 69 (+263.16%)
Mutual labels:  rna-seq
biojupies
Automated generation of tailored bioinformatics Jupyter Notebooks via a user interface.
Stars: ✭ 96 (+405.26%)
Mutual labels:  rna-seq
iDEA
Differential expression (DE); gene set Enrichment Analysis (GSEA); single cell RNAseq studies (scRNAseq)
Stars: ✭ 23 (+21.05%)
Mutual labels:  rna-seq
pychopper
A tool to identify, orient, trim and rescue full length cDNA reads
Stars: ✭ 74 (+289.47%)
Mutual labels:  rna-seq
snakefiles
🐍 Snakefiles for common RNA-seq data analysis workflows.
Stars: ✭ 78 (+310.53%)
Mutual labels:  rna-seq
READemption
A pipeline for the computational evaluation of RNA-Seq data
Stars: ✭ 24 (+26.32%)
Mutual labels:  rna-seq
GREIN
GREIN : GEO RNA-seq Experiments Interactive Navigator
Stars: ✭ 40 (+110.53%)
Mutual labels:  rna-seq
CD4-csaw
Reproducible reanalysis of a combined ChIP-Seq & RNA-Seq data set
Stars: ✭ 16 (-15.79%)
Mutual labels:  rna-seq
OrchestratingSingleCellAnalysis-release
An online companion to the OSCA manuscript demonstrating Bioconductor resources and workflows for single-cell RNA-seq analysis.
Stars: ✭ 35 (+84.21%)
Mutual labels:  rna-seq
cruk-summer-school-2018
Summer school course materials collection
Stars: ✭ 24 (+26.32%)
Mutual labels:  rna-seq
GGR-cwl
CWL tools and workflows for GGR
Stars: ✭ 20 (+5.26%)
Mutual labels:  rna-seq
TransPi
TransPi – a comprehensive TRanscriptome ANalysiS PIpeline for de novo transcriptome assembly
Stars: ✭ 18 (-5.26%)
Mutual labels:  rna-seq
IsoQuant
Reference-based transcript discovery from long RNA read
Stars: ✭ 26 (+36.84%)
Mutual labels:  rna-seq
scCATCH
Automatic Annotation on Cell Types of Clusters from Single-Cell RNA Sequencing Data
Stars: ✭ 137 (+621.05%)
Mutual labels:  rna-seq
rnafusion
RNA-seq analysis pipeline for detection gene-fusions
Stars: ✭ 72 (+278.95%)
Mutual labels:  rna-seq
Whippet.jl
Lightweight and Fast; RNA-seq quantification at the event-level
Stars: ✭ 85 (+347.37%)
Mutual labels:  rna-seq
RATTLE
Reference-free reconstruction and error correction of transcriptomes from Nanopore long-read sequencing
Stars: ✭ 35 (+84.21%)
Mutual labels:  rna-seq
diffexpr
Porting DESeq2 and DEXSeq into python via rpy2
Stars: ✭ 49 (+157.89%)
Mutual labels:  rna-seq
squid
SQUID detects both fusion-gene and non-fusion-gene structural variations from RNA-seq data
Stars: ✭ 37 (+94.74%)
Mutual labels:  rna-seq
NGI-RNAseq
Nextflow RNA-Seq Best Practice analysis pipeline, used at the SciLifeLab National Genomics Infrastructure.
Stars: ✭ 50 (+163.16%)
Mutual labels:  rna-seq

CICERO

Status Github Issues Pull Requests Actions: CI Status License: MIT

CICERO (Clipped-reads Extended for RNA Optimization) is an assembly-based algorithm to detect diverse classes of driver gene fusions from RNA-seq.
Explore the docs »
Work with demo data »
Read the paper »

Request Feature Report Bug
Consider starring the repo!


To discover driver fusions beyond canonical exon-to-exon chimeric transcripts, we develop CICERO, a local assembly-based algorithm that integrates RNA-seq read support with extensive annotation for candidate ranking. CICERO outperforms commonly used methods, achieving a 95% detection rate for 184 independently validated driver fusions including internal tandem duplications and other non-canonical events in 170 pediatric cancer transcriptomes. Overview of CICERO algorithm which consists of fusion detection through analysis of candidate SV breakpoints and splice junction, fusion annotation, and ranking.


📝 Table of Contents

Running CICERO

Add the src/scripts directory to your system PATH variable. Add the src/perllib and dependencies/lib/perl directories to your system PERL5LIB variable.

Then invoke the CICERO wrapper as

Cicero.sh [-h] [-n ncores] -b bamfile -g genome -r refdir [-j junctions] [-o outdir] [-t threshold] [-s sc_cutoff] [-c sc_shift] [-p] [-d]

-p - optimize CICERO, sets sc_cutoff=3 and sc_shift=10 [default true]
-s <num> - minimum number of soft clip support required [default=2]
-t <num> - threshold for enabling increased soft clip cutoff [default=200000]
-c <num> - clustering distance for grouping similar sites [default=3]
-j <file> - junctions file from RNApeg
-n <num> - number of cores to utilize with GNU parallel
-d - disable excluded regions file use
  • ncores is the number of cores to be run on (with GNU parallel).
  • bamfile is the input bamfile mapped to human genome builds GRCh37-lite or GRCh38_no_alt. Contact us if your bam is based on other reference version.
  • genome is either GRCh37-lite or GRCh38_no_alt. CICERO only support the two human reference genome versions.
  • refdir is the reference file directory specific to CICERO. Download Reference Files below. e.g. -r /home/user/software/CICERO/reference_hg38/ or -r /home/user/software/CICERO/reference_hg19/
  • outdir is user defined output file folder.
  • junctions is the junctions file output from RNApeg. See Generate Junctions below. CICERO can detect fusion by analysis of splice junction reads. If this option is omitted, fusions generated by small deletions may be missed as these events may lack the soft-clipped reads.
  • threshold CICERO first detects all soft-clipped positions supported by >=2 reads from bam file. For sample with <=threshold (default 200,000) soft-clipped positions, CICERO will detect fusions based on these soft-clipped positons; otherwise, to speed-up CICERO running, CICERO will detect fusions based on soft-clipped positions supported by >=sc_cutoff (3, default for optimize mode, see below) reads. For sample with lots of soft-clipped positions, a smaller threshold will speed-up CICERO running, however, some fusion events (i.e. only supported by 2 reads) may be missed.
  • sc_cutoff controls the number of soft clip reads required to support a putative site. The default is 2, but for samples with large numbers of soft clip reads, it may be desirable to require additional support to reduce the computational time required.
  • sc_shift sets the threshold for considering events to be the same site.
  • optimize defaults to ON. This sets sc_cutoff to 3 for samples where the number of soft clip sites exceeds 200,000. It also sets sc_shift to 10 which sets the distance to consider events the same.
  • -no-optimize turns optimizations off. This can increase sensitivity, but increases the computational requirements.

The final CICERO fusion result file will be located at <outdir>/CICERO_DATADIR/<sample name>/final_fusions.txt. Use the following guide to interpret the results.

To visualize CICERO fusion output you can load the final fusion output file at https://proteinpaint.stjude.org/FusionEditor/.

Dependencies

  • GNU parallel
  • Samtools 1.3.1
  • Cap3
  • Blat
  • Java 1.8.0
  • Perl 5.10.1 with libraries:
    • base
    • Bio
    • Carp
    • Compress
    • Cwd
    • Data
    • DBI
    • diagnostics
    • Digest
    • English
    • enum
    • Exporter
    • File
    • FileHandle
    • List
    • POSIX
    • strict
    • Sys
    • Tree
    • warnings

Running with Docker

CICERO can be run with Docker. Pre-built Docker images are provided for each release in GitHub Packages.

Invoke the CICERO wrapper using the Docker image available in GitHub Packages. You will likely need to add an additional bind mount for the output and input (BAM + junctions) files. Note the following command pulls the latest tag for the Docker image. For reproducible results, it is advisable to specify the exact version to run.

docker run -v <path to reference directory>:/reference ghcr.io/stjude/cicero:latest [-n cores] -b <bam file path> -g <genome> -r /reference -o <output directory> [-j junctions file] [-p] [-s int] [-t int] [-c int]

See Running CICERO for details of the parameters.

Running with St. Jude Cloud

CICERO is integrated in the St. Jude Cloud Rapid RNA-Seq workflow. To run CICERO in St. Jude Cloud, access the tool through the platform page. Documentation for running and interpreting results is available in the user guide.

Generate junctions file with RNApeg

RNApeg is required to generate a junctions file for use by CICERO. You can get RNApeg from both Docker and Singularity. Once RNApeg is complete, the *.junctions.tab.shifted.tab file can be provided to CICERO using the -j argument.

Running RNApeg via Docker:

RNApeg is authored by Michael N. Edmonson (@mnedmonson).

RNApeg overview

This software analyzes nextgen RNA sequencing data which has been mapped to whole-genome coordinates, identifying evidence of both known and novel splicing events from the resulting alignments. The raw junction sites in the mapped BAMs undergo postprocessing to correct various issues related to mapping ambiguity. The result is a more compact and consistent set of junction calls, simplifying downstream quantification, analysis, and comparison.

RNApeg key features

Raw junction extraction

First, the BAM read mappings are analyzed to identify putative junction sites. This produces a list of junction sites along with counts of supporting reads and several associated quality metrics. While reflective of the BAM data, this output typically requires refinement by the following steps.

Correction vs. reference junctions

Novel junctions are compared with reference exon junction boundaries and evaluated for mapping ambiguity which can justify adjusting the sites to match. Even small ambiguities such as the presence of the same nucleotide on either side of a junction can be enough to nudge a prediction that would otherwise perfectly match a reference isoform out of place.

Self-correction for novel junctions

Mapping ambiguity is next evaluated within the novel junctions themselves. Ambiguous junctions are combined where possible, merging their counts of supporting reads and related annotations. This reduces the population of novel junctions while simultaneously improving the evidence for those remaining. Combining evidence for poorly-covered sites also improves the chances of these sites passing the default minimum level of 3 supporting reads required for reporting junctions in the final output.

Correction vs. novel skips of known exons

Additional correction of novel junctions is also performed to identify previously unknown skips of an exon (or exons) within known reference isoforms. Special handling is required in these cases because while the corrected boundaries are known, the events themselves are novel.

Edge correction of novel junctions vs. reference exons

Junctions may also be shifted in cases of ambiguity involving a single edge (i.e. junction start or end). While not doubly-anchorable as with known reference junctions or novel skips of known exons, this adjustment can standardize evidence e.g. for novel exons.

Junction calling

Novel junctions are subjected to additional scrutiny before being reported:

  • must be supported by a minimum of 3 reads
  • at least one read must pass minimum flanking sequence requirements, to avoid false positives near read ends due to insufficient anchoring
  • the junction must be either observed bidirectionally, or be supported by very clean alignments (either perfect or with very few high-quality mismatches, insertions, deletions, or soft clips)

While these requirements are minimal, they substantially reduce background noise.

Cross-sample correction

This step pools results for a set of samples and does additional standardization of novel exons based on the combined set. Mostly this has the effect of standardizing ambiguous novel junction sites across samples, but it can occasionally result in combinations of sites as well.

Output

The primary output files are tab-delimited text.

Output is also written in UCSC .bed format, which can be used to visualize the junctions and supporting read counts within the UCSC genome browser.

Running RNApeg via Docker

docker run -v <outdir>:/results ghcr.io/stjude/rnapeg:latest -b bamfile -f fasta -r refflat
  • fasta reference genome; i.e. "Homo_sapiens/GRCh38_no_alt/FASTA/GRCh38_no_alt.fa" or "Homo_sapiens/GRCh37-lite/FASTA/GRCh37-lite.fa" from Reference Files.
  • refflat i.e. "Homo_sapiens/GRCh38_no_alt/mRNA/RefSeq/refFlat.txt" or "Homo_sapiens/GRCh37-lite/mRNA/Combined/all_refFlats.txt" from Reference Files.

Running RNApeg via Singularity:

singularity run --containall --bind <outdir>:/results docker://ghcr.io/stjude/rnapeg:latest -b bamfile -f fasta -r refflat

You will also need to add --bind arguments to mount the file paths for bamfile, fasta, and refflat into the container.

Downloading reference files

Reference files are required to run CICERO. They can be found at the following location:

Supported Genome Version

CICERO currently supports GRCh37-lite and GRCh38_no_alt.

Demo

A demo of CICERO can be found at the following location:

Citation

Tian, L., Li, Y., Edmonson, M.N. et al. CICERO: a versatile method for detecting complex and diverse driver fusions using cancer RNA sequencing data. Genome Biol 21, 126 (2020). https://doi.org/10.1186/s13059-020-02043-x

License

Copyright 2020 St. Jude Children's Research Hospital

Licensed under a modified version of the Apache License, Version 2.0 (the "License") for academic research use only; you may not use this file except in compliance with the License. To inquire about commercial use, please contact the St. Jude Office of Technology Licensing at [email protected].

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].