herophilus / demuxalot

Licence: MIT License
Reliable and scalable demultiplexing for single-cell RNA sequencing. Improves genotypes with Expectation-Maximization.

Demuxalot

Reliable and efficient identification of genotypes for individual cells in RNA sequencing, which also refines the knowledge about genotypes using the data.

Demuxalot is fast and optimized to work with lots of genotypes.

A preprint is available on bioRxiv.

Background

During single-cell RNA-sequencing (scRnaSeq) we pool cells from different donors and process them together.

  • Pro: all cells come through the same pipeline, so preparation/biological variation effects are cancelled out from analysis automatically. Also experiments are much cheaper!
  • Con: we don't know cell origin, everything is mixed!

Demuxalot solves the con: it infers the genotype of each cell by matching the reads coming from that cell against the known genotypes. This is called demultiplexing.
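To make the idea concrete, here is a toy sketch of matching one cell's reads against known genotypes. It is not demuxalot's actual model: the SNP identifiers, the allele probabilities, and the `log_likelihood` and `assign_cell` helpers are hypothetical, and each genotype is simply represented as an assumed probability of observing the alternative allele at each SNP.

```python
import math

# Hypothetical genotypes: probability of seeing the ALT allele at each SNP.
# 0.01 ~ homozygous reference, 0.5 ~ heterozygous, 0.99 ~ homozygous alternative.
GENOTYPES = {
    "Donor1": {"snp_a": 0.01, "snp_b": 0.50, "snp_c": 0.99},
    "Donor2": {"snp_a": 0.99, "snp_b": 0.01, "snp_c": 0.50},
    "Donor3": {"snp_a": 0.50, "snp_b": 0.99, "snp_c": 0.01},
}


def log_likelihood(observations, genotype):
    """Sum the log-probabilities of the observed alleles under one genotype."""
    total = 0.0
    for snp, is_alt in observations:
        p_alt = genotype[snp]
        total += math.log(p_alt if is_alt else 1.0 - p_alt)
    return total


def assign_cell(observations, genotypes=GENOTYPES):
    """Return (best donor, posterior per donor), assuming a uniform prior."""
    log_liks = {name: log_likelihood(observations, g) for name, g in genotypes.items()}
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_liks.values())
    weights = {name: math.exp(ll - top) for name, ll in log_liks.items()}
    norm = sum(weights.values())
    posteriors = {name: w / norm for name, w in weights.items()}
    best = max(posteriors, key=posteriors.get)
    return best, posteriors


# Reads from one cell: (SNP, whether the read carries the ALT allele).
cell_reads = [("snp_a", False), ("snp_c", True), ("snp_c", True), ("snp_b", True)]
best_donor, posteriors = assign_cell(cell_reads)
print(best_donor, {name: round(p, 4) for name, p in posteriors.items()})
```

Even a handful of informative reads can separate donors sharply in this toy setting, which is why the number of usable SNPs per cell matters so much in practice.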

Herophilus uses scRnaSeq to study cells in organoids with multiple genetic backgrounds at scale.

Comparisons

Demuxalot shows high reliability, data efficiency and speed. Below is a benchmark on PBMC data with 32 donors from the preprint.

[Figure: benchmark on PBMC data with 32 donors]

Known genotypes and refined genotypes: the tale of two scenarios

Typical approaches to obtaining genotype-specific mutations are:

  • whole-genome sequencing (expensive, very good)
    • you have information about almost all (ok, >90%) of the genotype, and it is unlikely that you need to refine it
    • so you just go straight to demultiplexing
    • demuxlet solves this case
  • Bead arrays (aka SNP arrays aka DNA microarrays) are super cheap and practically more relevant
    • you get information about 50k to 650k most common SNPs, and that's only a small fraction, but you also pay very little
    • this case is covered by demuxalot (this package)
    • Illumina's video about this technology

Why is it worth refining genotypes?

An SNP array provides up to ~650k (as of 2021) positions in the genome. Around 20-30% of them would be specific to a genotype (i.e. deviate from the majority).

  • Each genotype has around 10 times more SNVs (single nucleotide variations) that are not captured by the array. Some of these missing SNPs are very valuable for demultiplexing
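As a rough back-of-the-envelope check of the figures quoted above, the snippet below works out how many array positions would be genotype-specific. It only uses the numbers from the text; the variable names are illustrative.

```python
# Figures quoted above: ~650k array positions, 20-30% of them genotype-specific.
array_positions = 650_000
low_pct, high_pct = 20, 30

# Integer arithmetic avoids floating-point rounding in the result.
informative_low = array_positions * low_pct // 100
informative_high = array_positions * high_pct // 100

print(f"{informative_low:,} to {informative_high:,} genotype-specific positions")
```

So at best roughly 130,000 to 195,000 array positions help to tell one genotype from the rest, and only those covered by reads in a given cell contribute to its assignment.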

What is demuxalot's special power?

  • much better handling of multiple reads coming from the same UMI (i.e. same transcript)
    • demuxalot efficiently combines information from multiple reads with same UMI and cross-checks it
  • default settings are CellRanger-specific (that is, optimized for the 10X pipeline). CellRanger's and STAR's flags in BAM break some common conventions, but we can still use them efficiently (by using filtering callbacks)
  • ability to refine genotypes without failing or diverging
    • Vireo is a tool that was created with similar purposes. But it either diverges or does not learn better genotypes
  • optimized variant calling. It's also faster than demuxlet due to multiprocessing
  • this is not a command-line tool, and is not meant to be
    • you write Python code, which gives full control and flexibility over demultiplexing
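The UMI handling mentioned in the first bullet can be illustrated with a toy sketch. This is not demuxalot's implementation: the read tuples, the `collapse_by_umi` helper, and the majority-vote rule are assumptions chosen for illustration. The point is that reads sharing a cell barcode and a UMI come from one transcript, so they should contribute one piece of evidence per position rather than several.

```python
from collections import Counter, defaultdict


def collapse_by_umi(reads):
    """Collapse reads sharing (barcode, UMI, position) into one consensus base.

    `reads` is an iterable of (barcode, umi, position, base) tuples.
    Groups whose reads disagree without a strict majority are dropped,
    a simple stand-in for cross-checking reads against each other.
    """
    groups = defaultdict(Counter)
    for barcode, umi, position, base in reads:
        groups[(barcode, umi, position)][base] += 1

    consensus = {}
    for key, counts in groups.items():
        base, n_top = counts.most_common(1)[0]
        if 2 * n_top > sum(counts.values()):  # strict majority required
            consensus[key] = base
    return consensus


reads = [
    ("CELL1", "UMI_A", 1042, "G"),
    ("CELL1", "UMI_A", 1042, "G"),
    ("CELL1", "UMI_A", 1042, "T"),  # disagreeing read, outvoted
    ("CELL1", "UMI_B", 1042, "G"),
    ("CELL1", "UMI_C", 2210, "A"),
    ("CELL1", "UMI_C", 2210, "C"),  # tie: ambiguous, dropped
]
collapsed = collapse_by_umi(reads)
print(collapsed)
```

Without collapsing, the three reads from `UMI_A` would be counted as three independent observations of the same transcript, overstating the evidence at that position.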

Installation

The package is pip-installable and requires Python >= 3.6.

pip install demuxalot

Developer installation:

git clone https://github.com/herophilus/demuxalot
cd demuxalot && pip install -e .

Here are some common scenarios and how they are implemented in demuxalot. See also the examples/ folder.

Running (simple scenario)

Only using provided genotypes

from demuxalot import Demultiplexer, BarcodeHandler, ProbabilisticGenotypes, count_snps

# Loading genotypes
genotypes = ProbabilisticGenotypes(genotype_names=['Donor1', 'Donor2', 'Donor3'])
genotypes.add_vcf('path/to/genotypes.vcf')

# Loading barcodes
barcode_handler = BarcodeHandler.from_file('path/to/barcodes.csv')

snps = count_snps(
    bamfile_location='path/to/sorted_alignments.bam',
    chromosome2positions=genotypes.get_chromosome2positions(),
    barcode_handler=barcode_handler, 
)

# returns two dataframes with likelihoods and posterior probabilities 
likelihoods, posterior_probabilities = Demultiplexer.predict_posteriors(
    snps,
    genotypes=genotypes,
    barcode_handler=barcode_handler,
)
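Once posteriors are available, a common next step is to pick the most probable genotype per barcode and flag uncertain cells. The sketch below uses a hand-made DataFrame as a stand-in for `posterior_probabilities`. The layout (one row per barcode, one column per genotype), the barcode strings, the 0.9 threshold, and the `assign_donors` helper are assumptions for illustration, so inspect the real output before relying on it.

```python
import pandas as pd


def assign_donors(posteriors: pd.DataFrame, min_confidence: float = 0.9) -> pd.DataFrame:
    """Pick the most probable genotype per barcode and flag confident calls."""
    result = pd.DataFrame({
        "donor": posteriors.idxmax(axis=1),
        "probability": posteriors.max(axis=1),
    })
    result["confident"] = result["probability"] >= min_confidence
    return result


# Stand-in for the posterior_probabilities dataframe (values are made up).
toy_posteriors = pd.DataFrame(
    {
        "Donor1": [0.98, 0.10, 0.40],
        "Donor2": [0.01, 0.85, 0.35],
        "Donor3": [0.01, 0.05, 0.25],
    },
    index=["AAAC-1", "AAAG-1", "AACT-1"],
)
assignments = assign_donors(toy_posteriors)
print(assignments)
```

Barcodes that fall below the threshold are kept but marked as not confident, so they can be excluded from downstream analysis or inspected separately.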

Running (complex scenario)

Refinement of known genotypes is shown in a notebook, see examples/

Saving/loading genotypes

# You can always export learnt genotypes to be used later
refined_genotypes.save_betas('learnt_genotypes.parquet')

# To load them back, list which genotypes to load
# (the donor names below are placeholders, replace them with your own)
refined_genotypes = ProbabilisticGenotypes(genotype_names=['Donor1', 'Donor2', 'Donor3'])
refined_genotypes.add_prior_betas('learnt_genotypes.parquet')

Re-saving VCF genotypes with betas (optional, recommended)

It generally makes sense to export a VCF to the internal format only when you plan to load it many times. Loading the internal format is much faster than parsing and validating a VCF.

genotypes = ProbabilisticGenotypes(genotype_names=['Donor1', 'Donor2', 'Donor3'])
genotypes.add_vcf('path/to/genotypes.vcf')
genotypes.save_betas('learnt_genotypes.parquet')