getzlab / deTiN

Licence: BSD-3-Clause license
DeTiN is designed to measure tumor-in-normal contamination and improve somatic variant detection sensitivity when using a contaminated matched control.

Programming languages: Jupyter Notebook, Python

Synopsis

deTiN estimates tumor-in-normal (TiN) contamination from tumor and matched normal sequencing data. The estimate combines evidence from candidate somatic single-nucleotide variants (SSNVs) and allelic somatic copy-number alterations (aSCNAs). deTiN then applies this joint TiN estimate to reclassify SSNVs and indels as somatic or germline. Installing and running deTiN on standard exome data takes about 5 minutes. For help running deTiN, contact [email protected].
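The SSNV half of the TiN estimate can be illustrated with a toy maximum-likelihood model. This is a minimal sketch under simplifying assumptions, not deTiN's actual likelihood (which also models aSCNAs and uncertainty in the tumor allele fraction): it assumes that a somatic variant seen at allele fraction f in the tumor appears in the normal at allele fraction TiN × f, and that normal alt-read counts are binomial. The function name `estimate_tin_from_ssnvs` is hypothetical.

```python
import numpy as np
from scipy.stats import binom


def estimate_tin_from_ssnvs(tumor_alt, tumor_depth, normal_alt, normal_depth, grid=None):
    """Grid-search maximum-likelihood estimate of TiN from candidate SSNVs.

    Toy model (an assumption, not deTiN's implementation): at each site the
    normal alt-read count ~ Binomial(normal_depth, TiN * tumor_af).
    """
    if grid is None:
        grid = np.linspace(0.0, 1.0, 101)  # candidate TiN values 0.00 .. 1.00
    tumor_af = np.asarray(tumor_alt, dtype=float) / np.asarray(tumor_depth, dtype=float)
    normal_alt = np.asarray(normal_alt)
    normal_depth = np.asarray(normal_depth)
    # Total log-likelihood of all normal read counts for each candidate TiN.
    log_lik = np.array([
        binom.logpmf(normal_alt, normal_depth, tin * tumor_af).sum()
        for tin in grid
    ])
    return grid[np.argmax(log_lik)]


# Synthetic sites: tumor allele fraction 0.4, normal allele fraction 0.04,
# i.e. a normal contaminated with ~10% tumor under the toy model.
tin = estimate_tin_from_ssnvs([40] * 50, [100] * 50, [4] * 50, [100] * 50)
print(tin)
```

With TiN = 0 the model gives zero probability to any alt read in the normal, which is why an uncorrected analysis treats such sites as germline.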

Code Example

See the GitHub wiki for a description of the input files.

python deTiN.py \
  --mutation_data_path example_data/HCC_10_90.call_stats.pon_removed.txt \
  --cn_data_path example_data/HCC-1143_100_T-sim-final.acs.seg \
  --tumor_het_data example_data/HCC_10_90.tumor.hets.tsv \
  --normal_het_data example_data/HCC_10_90.normal.hets.tsv \
  --exac_data_path example_data/exac.pickle_high_af \
  --output_name 10_percent_TiN_simulation \
  --indel_data_path example_data/MuTect2.call_stats.txt \
  --indel_data_type MuTect2 \
  --output_dir example_data/
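When running deTiN over many samples, it can help to build the command from Python rather than by hand. The sketch below assumes only the flags shown in the example command; the helpers `build_detin_command` and `run_detin` are hypothetical names, not part of deTiN. It avoids `shlex.join` so that it also works on Python 3.7.

```python
import shlex
import subprocess
import sys


def build_detin_command(script="deTiN.py", **options):
    """Build a deTiN command line; each keyword becomes a --<keyword> <value> pair."""
    cmd = [sys.executable, script]  # use the current interpreter
    for flag, value in options.items():
        cmd += [f"--{flag}", str(value)]
    return cmd


def run_detin(cmd):
    """Run deTiN; raises subprocess.CalledProcessError on a non-zero exit code."""
    subprocess.run(cmd, check=True)


# The example run from this README, expressed as keyword options.
cmd = build_detin_command(
    mutation_data_path="example_data/HCC_10_90.call_stats.pon_removed.txt",
    cn_data_path="example_data/HCC-1143_100_T-sim-final.acs.seg",
    tumor_het_data="example_data/HCC_10_90.tumor.hets.tsv",
    normal_het_data="example_data/HCC_10_90.normal.hets.tsv",
    exac_data_path="example_data/exac.pickle_high_af",
    output_name="10_percent_TiN_simulation",
    indel_data_path="example_data/MuTect2.call_stats.txt",
    indel_data_type="MuTect2",
    output_dir="example_data/",
)
# Print a shell-quoted version for inspection before calling run_detin(cmd).
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Call `run_detin(cmd)` from the deTiN directory once the printed command looks right.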

Parameter descriptions

See the project wiki for a full description of the required fields in each input file.

Input data:

--mutation_data_path mutation statistics file (a MuTect call stats file, or a similar variant file, e.g. from Strelka v2.9.7 or later).

--cn_data_path allelic copy number segmentation file (GATK4 AllelicCNV seg file).

--tumor_het_data read counts at heterozygous SNP sites in the tumor sample (GATK4 tumor het cov file).

--normal_het_data read counts at heterozygous SNP sites in the normal sample (GATK4 normal het cov file).

--exac_data_path pickle file of ExAC sites with minor allele fraction > 0.01.

Parameters:

--output_name sample name

--output_dir output directory

Optional parameters:

--TiN_prior (default = 0.5) 0.5 is a null prior. Lower this value to require more evidence before concluding that TiN > 0.

--mutation_prior (default = 0.15) The expected ratio of somatically mutated sites to rare germline events (0.15 corresponds to ~2 mutations per megabase).

--ascna_probe_number_filter (default = 200) Minimum number of probes a segment must contain to be used. Empirical results with GATK4 CNV show that segments with fewer than 200 probes tend to be enriched for artifacts. For WGS, this parameter can be set to 0.

--ascna_SNP_number_filter (default = 20) Minimum number of SNPs a segment must contain to be used. Empirical results with GATK4 CNV show that segments with fewer than 20 SNPs tend to be enriched for artifacts.

--coverage_threshold (default = 15) Minimum number of reads required to use a variant for TiN estimation. Sites with lower coverage tend to be enriched for artifacts. (NOTE: all sites are considered for somatic recovery.)

--SSNV_af_threshold (default = 0.15) Minimum alternate allele fraction required for a site to be used in SSNV-based TiN estimation. Sites with lower allele fractions tend to be enriched for artifacts; for more deeply sequenced data, set this to a lower value. (NOTE: all sites are considered for somatic recovery.)

--aSCNA_threshold (default = 0.1) Minimum allelic shift required to use a segment for TiN estimation. Lower this value for extremely well-covered samples (e.g. 500x).

--aSCNA_variance_threshold (default = 0.025) Maximum variance tolerated in a segment's allelic shift before the segment is removed. This filter helps remove regions enriched for artifacts, such as centromeres, telomeres, and low-mappability regions.

--cancer_hot_spots Optional BED file of cancer hotspot mutations that the user believes are more likely a priori to be somatic, e.g. BRAF V600E.
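The filtering parameters above can be pictured as boolean masks over candidate sites and segments. This is a sketch under stated assumptions, not deTiN's code: it assumes inclusive comparisons, that `coverage_threshold` applies to both tumor and normal depth, and that each segment's allelic shift and shift variance are already computed. The function names are hypothetical; defaults mirror the documented defaults.

```python
import numpy as np


def select_ssnv_sites(tumor_alt, tumor_depth, normal_depth,
                      coverage_threshold=15, ssnv_af_threshold=0.15):
    """Boolean mask of candidate SSNV sites kept for TiN estimation."""
    tumor_alt = np.asarray(tumor_alt, dtype=float)
    tumor_depth = np.asarray(tumor_depth, dtype=float)
    normal_depth = np.asarray(normal_depth, dtype=float)
    # Assumption: both samples must reach the coverage threshold.
    covered = (tumor_depth >= coverage_threshold) & (normal_depth >= coverage_threshold)
    # Tumor allele fraction, set to 0 where there are no tumor reads.
    tumor_af = np.divide(tumor_alt, tumor_depth,
                         out=np.zeros_like(tumor_alt), where=tumor_depth > 0)
    return covered & (tumor_af >= ssnv_af_threshold)


def select_ascna_segments(n_probes, n_snps, allele_shift, shift_variance,
                          probe_filter=200, snp_filter=20,
                          ascna_threshold=0.1, variance_threshold=0.025):
    """Boolean mask of copy-number segments kept for TiN estimation."""
    n_probes, n_snps = np.asarray(n_probes), np.asarray(n_snps)
    allele_shift = np.asarray(allele_shift)
    shift_variance = np.asarray(shift_variance)
    return ((n_probes >= probe_filter)                 # --ascna_probe_number_filter
            & (n_snps >= snp_filter)                   # --ascna_SNP_number_filter
            & (allele_shift >= ascna_threshold)        # --aSCNA_threshold
            & (shift_variance <= variance_threshold))  # --aSCNA_variance_threshold


sites = select_ssnv_sites([10, 2, 5], [40, 40, 10], [30, 30, 30])
segments = select_ascna_segments([250, 150], [30, 30], [0.2, 0.2], [0.01, 0.01])
```

For WGS data, passing `probe_filter=0` corresponds to setting `--ascna_probe_number_filter 0`.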

Motivation

Genomic characterization is vital to the understanding and treatment of cancer, and detecting somatic mutations is a critical component of this process. A key step in sensitive and specific somatic mutation detection is comparing the tumor sample to a matched germline control. When the matched normal is contaminated with tumor cells, sensitivity drops sharply, because somatic variants that are also observed in the normal are mistaken for germline variants and filtered out. To overcome this limitation, we developed deTiN, a method that estimates tumor-in-normal contamination (TiN) and improves detection sensitivity when a contaminated normal is used.
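The loss of sensitivity, and how accounting for TiN recovers it, can be seen in a toy two-hypothesis model. This is an illustrative sketch, not deTiN's classifier: it assumes normal alt reads follow Binomial(depth, TiN × tumor_af) for a somatic site and Binomial(depth, 0.5) for a germline heterozygous site, and it reads `mutation_prior` as the prior odds of somatic versus germline. The function `somatic_posterior` is hypothetical.

```python
from scipy.stats import binom


def somatic_posterior(normal_alt, normal_depth, tumor_af, tin, mutation_prior=0.15):
    """Posterior probability that a site is somatic rather than a germline het.

    Toy model (assumptions, not deTiN's implementation):
      somatic:  normal alt reads ~ Binomial(normal_depth, tin * tumor_af)
      germline: normal alt reads ~ Binomial(normal_depth, 0.5)
      mutation_prior is treated as the prior odds somatic : germline.
    """
    lik_somatic = binom.pmf(normal_alt, normal_depth, tin * tumor_af)
    lik_germline = binom.pmf(normal_alt, normal_depth, 0.5)
    weighted_somatic = mutation_prior * lik_somatic
    return weighted_somatic / (weighted_somatic + lik_germline)


# A tumor variant at allele fraction 0.4 with 3 of 60 normal reads supporting it.
naive = somatic_posterior(3, 60, tumor_af=0.4, tin=0.0)   # assumes a clean normal
aware = somatic_posterior(3, 60, tumor_af=0.4, tin=0.1)   # allows 10% TiN
print(naive, aware)
```

Assuming a clean normal (TiN = 0) makes any alt read in the normal impossible under the somatic hypothesis, so the site is rejected; allowing TiN = 0.1 makes 3 of 60 reads entirely consistent with a somatic event, while a site near 50% in the normal still looks germline.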

Installation

deTiN requires Python 3.7, NumPy, pandas==1.0.0, matplotlib==3.1.3, SciPy, and random2.

pip install deTiN

or

git clone https://github.com/broadinstitute/deTiN.git

cd deTiN/

python setup.py install (or run deTiN directly from the deTiN directory)

Example Data

The code example above runs on the included example data, an artificially mixed normal with 10% tumor contamination. The input files were generated with MuTect and GATK4 ACNV.

Data quality

Users should be aware that pervasive sequencing artifacts can confound deTiN results if they are present in both the tumor and normal samples. For example, a very high level of OxoG artifacts (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3616734/) affecting both samples may look like somatic events present in the normal and inflate deTiN's TiN estimate. Remove such artifacts, or flag them in the call stats files, before running deTiN.

License

Copyright 2017 Amaro Taylor-Weiner

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.