
Wittelab / orchid

Licence: other
A novel management, annotation, and machine learning framework for analyzing cancer mutations

Programming Languages: Jupyter Notebook, Python

Projects that are alternatives of or similar to orchid

Music2
identifying mutational significance in cancer genomes
Stars: ✭ 49 (+68.97%)
Mutual labels:  cancer-genomics
Ideogram
Chromosome visualization for the web
Stars: ✭ 181 (+524.14%)
Mutual labels:  cancer-genomics
SCICoNE
Single-cell copy number calling and event history reconstruction.
Stars: ✭ 20 (-31.03%)
Mutual labels:  cancer-genomics
Somaticseq
An ensemble approach to accurately detect somatic mutations using SomaticSeq
Stars: ✭ 119 (+310.34%)
Mutual labels:  cancer-genomics
Lollipops
Lollipop-style mutation diagrams for annotating genetic variations.
Stars: ✭ 147 (+406.9%)
Mutual labels:  cancer-genomics
Delly
DELLY2: Structural variant discovery by integrated paired-end and split-read analysis
Stars: ✭ 247 (+751.72%)
Mutual labels:  cancer-genomics
Sv Callers
Snakemake-based workflow for detecting structural variants in WGS data
Stars: ✭ 28 (-3.45%)
Mutual labels:  cancer-genomics
civic-server
Backend Server for CIViC Project
Stars: ✭ 39 (+34.48%)
Mutual labels:  cancer-genomics
Pcgr
Personal Cancer Genome Reporter (PCGR)
Stars: ✭ 168 (+479.31%)
Mutual labels:  cancer-genomics
cacao
Callable Cancer Loci - assessment of sequencing coverage for actionable and pathogenic loci in cancer
Stars: ✭ 21 (-27.59%)
Mutual labels:  cancer-genomics
Tybalt
Training and evaluating a variational autoencoder for pan-cancer gene expression data
Stars: ✭ 126 (+334.48%)
Mutual labels:  cancer-genomics
Lancet
Microassembly based somatic variant caller for NGS data
Stars: ✭ 134 (+362.07%)
Mutual labels:  cancer-genomics
Maftools
Summarize, Analyze and Visualize MAF files from TCGA or in house studies.
Stars: ✭ 249 (+758.62%)
Mutual labels:  cancer-genomics
Msisensor
microsatellite instability detection using tumor only or paired tumor-normal data
Stars: ✭ 103 (+255.17%)
Mutual labels:  cancer-genomics
deTiN
DeTiN is designed to measure tumor-in-normal contamination and improve somatic variant detection sensitivity when using a contaminated matched control.
Stars: ✭ 46 (+58.62%)
Mutual labels:  cancer-genomics
Agfusion
Python package to annotate and visualize gene fusions.
Stars: ✭ 36 (+24.14%)
Mutual labels:  cancer-genomics
Awesome Cancer Variant Databases
A community-maintained repository of cancer clinical knowledge bases and databases focused on cancer variants.
Stars: ✭ 212 (+631.03%)
Mutual labels:  cancer-genomics
SigProfilerSimulator
SigProfilerSimulator allows realistic simulations of mutational patterns and mutational signatures in cancer genomes. The tool can be used to simulate signatures of single point mutations, double point mutations, and insertion/deletions. Further, the tool makes use of SigProfilerMatrixGenerator and SigProfilerPlotting.
Stars: ✭ 18 (-37.93%)
Mutual labels:  cancer-genomics
SigProfilerMatrixGenerator
SigProfilerMatrixGenerator creates mutational matrices for all types of somatic mutations. It allows downsizing the generated mutations only to parts for the genome (e.g., exome or a custom BED file). The tool seamlessly integrates with other SigProfiler tools.
Stars: ✭ 68 (+134.48%)
Mutual labels:  cancer-genomics
Variants2Neoantigen
A neoantigen calling pipeline begins from variants record file (MAF) (Not maintain now)
Stars: ✭ 27 (-6.9%)
Mutual labels:  cancer-genomics

Installation and usage instructions can be found in the wiki.

Orchid


A management, annotation, and machine learning system for analyzing cancer mutations.

NOTE: This code is still an early release and is being actively developed. Please report any issues using the Issues tab and they will be fixed as soon as possible.

Introduction

Please refer to the following publication for a detailed description of this software:
Bioinformatics, btx709, https://doi.org/10.1093/bioinformatics/btx709

Or, for a quick and dirty explanation, read on.


What is orchid?

The purpose of orchid is to facilitate machine learning on tumor genetic data to gain biological or clinical insight. For example, you might be interested in sub-typing aggressive vs. non-aggressive prostate cancer based on tumor mutational profiles derived from tumor sequence data, or in working out which tumor tissue a cell-free DNA molecule is derived from.


What is a 'tumor mutational profile'?

A tumor mutational profile is the annotated set of mutations within a tumor. A typical tumor might contain thousands of mutations, but most are assumed to be irrelevant to disease because they arise from an important hallmark of cancer: an unstable genome. These are called passenger mutations. However, some mutations (one to hundreds) may play important roles in carcinogenesis and/or be useful in identifying tumor characteristics, like aggressiveness. These are called driver mutations. Many cancer researchers focus only on driver mutations because of their outsized role in cancer, but orchid takes the approach of analyzing all mutations in aggregate with machine learning algorithms to try to tease apart more subtle patterns. This approach makes sense because even mutations that have been deemed irrelevant have been associated with particular tumor types and may encode important information about (or even regulate processes involved in) the underlying biology of a tumor (e.g., trinucleotide signatures).


What is meant by an 'annotated set of mutations'?

An annotation is simply a numeric or ordinal value that can be associated with a particular mutation. For example, 'mutation A' may change the amino acid sequence of a protein, so we can annotate it as a 'non-synonymous single nucleotide polymorphism' or 'nsSNP'. On the other hand, 'mutation B' may not change the amino acid sequence, so we annotate it as a 'synonymous SNP'. Biologically speaking, a non-synonymous SNP is more likely to alter a protein's function than a synonymous one. In machine learning parlance, an annotation is called a feature. If we gather many mutations across a tumor (or tumors) and annotate each mutation with many features, we end up with a set of annotated mutations, or a tumor mutational profile.

At this time, many regulatory and coding features of the human genome have been extensively cataloged, resulting in a wealth of data to mine. If we gather enough biological data, we can increase our understanding of each individual mutation and its possible role in cancer, or at least begin to see if patterns emerge from the data. A list of features used in our publication and available in our public database can be found here: http://wittelab.ucsf.edu/orchid.

Here's an example. If we arrange a set of mutations from a tumor in rows and corresponding feature values in columns, a mutational profile can be created and visualized:
[Figure: Mutational Profile]

Here large feature values (or more 'severe' categories) are shown as more orange, while smaller (less 'severe') feature values are whiter. There is also a final column of sample labels, which is ultimately what we're interested in learning. In other words, this column's values are used to train supervised machine learning algorithms for the purpose of future sample classification.
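To make the role of the label column concrete, here is a minimal sketch in plain Python, assuming a toy profile with invented feature names and values. It uses a simple nearest-centroid classifier as a stand-in; this is not the algorithm orchid uses, only one small example of supervised learning on a labeled feature matrix.

```python
# Rows are mutations, columns are feature values; the final column is the
# sample label. All feature names and values are invented for illustration.
FEATURES = ["consequence", "conservation"]
rows = [
    [1, 0.92, "aggressive"],
    [1, 0.80, "aggressive"],
    [0, 0.15, "non-aggressive"],
    [0, 0.30, "non-aggressive"],
]

# Split the profile into the feature matrix X and the label vector y.
X = [row[:-1] for row in rows]
y = [row[-1] for row in rows]


def fit_centroids(X, y):
    """Compute the mean feature vector for each label."""
    sums, counts = {}, {}
    for features, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: distance(centroids[label]))


centroids = fit_centroids(X, y)
prediction = predict(centroids, [1, 0.85])  # classify a new, unlabeled row
```

The labels are used only during fitting; at prediction time the classifier sees feature values alone, which mirrors how a trained model would be applied to future samples.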

Getting Started

  1. Download this code and install prerequisites
  2. Obtain tumor and annotation data
  3. Build the database
  4. Perform machine learning

Please refer to the wiki to begin!

NOTICE: This software requires the use of other code and/or data that must be obtained in accordance with their respective licenses or copyrights. Generally speaking, this implies that orchid's use is restricted to non-commercial activities. Orchid itself is licensed under the MIT license, which requires only preservation of copyright and license notices. Please see the LICENSE file for more details.
