
architverma1 / tGPLVM

Licence: MIT License
tGPLVM: A Nonparametric, Generative Model for Manifold Learning with scRNA-seq experimental data

Programming Languages

python

Projects that are alternatives to or similar to tGPLVM

D3E
Discrete Distributional Differential Expression
Stars: ✭ 19 (+18.75%)
Mutual labels:  single-cell-rna-seq
EWCE
Expression Weighted Celltype Enrichment. See the package website for up-to-date instructions on usage.
Stars: ✭ 30 (+87.5%)
Mutual labels:  single-cell-rna-seq
Unsupervised-Learning-in-R
Workshop (6 hours): Clustering (Hdbscan, LCA, Hopach), dimension reduction (UMAP, GLRM), and anomaly detection (isolation forests).
Stars: ✭ 34 (+112.5%)
Mutual labels:  dimensionality-reduction
Dimensionality-reduction-and-classification-on-Hyperspectral-Images-Using-Python
In this repository, You can find the files which implement dimensionality reduction on the hyperspectral image(Indian Pines) with classification.
Stars: ✭ 63 (+293.75%)
Mutual labels:  dimensionality-reduction
alevin-fry
🐟 🔬🦀 alevin-fry is an efficient and flexible tool for processing single-cell sequencing data, currently focused on single-cell transcriptomics and feature barcoding.
Stars: ✭ 78 (+387.5%)
Mutual labels:  single-cell-rna-seq
dml
R package for Distance Metric Learning
Stars: ✭ 58 (+262.5%)
Mutual labels:  dimensionality-reduction
pymde
Minimum-distortion embedding with PyTorch
Stars: ✭ 420 (+2525%)
Mutual labels:  dimensionality-reduction
CellO
CellO: Gene expression-based hierarchical cell type classification using the Cell Ontology
Stars: ✭ 34 (+112.5%)
Mutual labels:  single-cell-rna-seq
UMAP.jl
Uniform Manifold Approximation and Projection (UMAP) implementation in Julia
Stars: ✭ 93 (+481.25%)
Mutual labels:  dimensionality-reduction
StackedDAE
Stacked Denoising AutoEncoder based on TensorFlow
Stars: ✭ 23 (+43.75%)
Mutual labels:  single-cell-rna-seq
Spectre
A computational toolkit in R for the integration, exploration, and analysis of high-dimensional single-cell cytometry and imaging data.
Stars: ✭ 31 (+93.75%)
Mutual labels:  dimensionality-reduction
SINCERA
An R implementation of the SINCERA pipeline for single cell RNA-seq profiling analysis
Stars: ✭ 20 (+25%)
Mutual labels:  single-cell-rna-seq
NIDS-Intrusion-Detection
Simple Implementation of Network Intrusion Detection System. KddCup'99 Data set is used for this project. kdd_cup_10_percent is used for training test. correct set is used for test. PCA is used for dimension reduction. SVM and KNN supervised algorithms are the classification algorithms of project. Accuracy : %83.5 For SVM , %80 For KNN
Stars: ✭ 45 (+181.25%)
Mutual labels:  dimensionality-reduction
timecorr
Estimate dynamic high-order correlations in multivariate timeseries data
Stars: ✭ 30 (+87.5%)
Mutual labels:  dimensionality-reduction
dropEst
Pipeline for initial analysis of droplet-based single-cell RNA-seq data
Stars: ✭ 71 (+343.75%)
Mutual labels:  single-cell-rna-seq
adenine
ADENINE: A Data ExploratioN PipelINE
Stars: ✭ 15 (-6.25%)
Mutual labels:  dimensionality-reduction
kmer-homology-paper
Manuscript for functional prediction of transcriptomic “dark matter” across species
Stars: ✭ 12 (-25%)
Mutual labels:  single-cell-rna-seq
SPLiT-Seq demultiplexing
An unofficial demultiplexing strategy for SPLiT-seq RNA-Seq data
Stars: ✭ 20 (+25%)
Mutual labels:  single-cell-rna-seq
monocle3
No description or website provided.
Stars: ✭ 170 (+962.5%)
Mutual labels:  single-cell-rna-seq
dbMAP
A fast, accurate, and modularized dimensionality reduction approach based on diffusion harmonics and graph layouts. Escalates to millions of samples on a personal laptop. Adds high-dimensional big data intrinsic structure to your clustering and data visualization workflow.
Stars: ✭ 39 (+143.75%)
Mutual labels:  dimensionality-reduction

tGPLVM: A robust nonlinear manifold model for single cell RNA-seq data.

Intro

Dimension reduction is a common and critical first step in the analysis of high-throughput single-cell RNA sequencing (scRNA-seq) data. tGPLVM is a nonparametric, generative model for nonlinear manifold learning: a flexible, nearly assumption-free model that does not require setting parameters a priori (e.g., number of dimensions, perplexity) and that provides uncertainty estimates for each sample's mapping. tGPLVM can be used to visualize high-dimensional data or as part of a pipeline for cell type identification or pseudotime reconstruction.
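
To make "generative" concrete, here is a minimal forward-simulation sketch of a GPLVM with Student-t noise. It assumes a standard normal prior on the latent positions, a single Matérn 3/2 kernel, and a gene-specific noise scale; these choices, the function names, and the default sizes are illustrative assumptions, not the package's code.

```python
import numpy as np

def matern32(X, lengthscale=1.0, variance=1.0):
    """Matern 3/2 covariance between all pairs of rows of X."""
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    s = np.sqrt(3.0) * d / lengthscale
    return variance * (1.0 + s) * np.exp(-s)

def sample_tgplvm(n_cells=100, n_genes=50, q=2, df=4.0, seed=0):
    """Hypothetical forward simulation: latent X -> GP functions F -> noisy Y."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_cells, q))             # assumed N(0, I) latent prior
    K = matern32(X) + 1e-5 * np.eye(n_cells)          # jitter keeps K positive definite
    L = np.linalg.cholesky(K)
    F = L @ rng.standard_normal((n_cells, n_genes))   # one GP draw per gene
    scale = rng.uniform(0.1, 0.5, size=n_genes)       # assumed gene-specific noise scale
    # Heavy-tailed Student-t noise with df degrees of freedom instead of Gaussian noise
    Y = F + scale * rng.standard_t(df, size=(n_cells, n_genes))
    return X, Y

X, Y = sample_tgplvm()
print(X.shape, Y.shape)  # (100, 2) (100, 50)
```

Fitting runs this process in reverse: given Y, it infers a posterior over X. The heavier tails of the t-distribution make the fit less sensitive to outlying expression values than a Gaussian error model.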

We provide a script that fits the model with Black Box Variational Inference (BBVI) for speed and scalability. A batch learning implementation is also provided for larger datasets that must be fit under memory constraints.
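
Batch learning relies on the data term of the variational objective being a sum over cells, so a batch of M cells rescaled by N/M gives an unbiased estimate of the full sum. The sketch below shows only this generic rescaling with made-up per-cell values; it is not the script's Edward-based implementation, and the helper name is hypothetical.

```python
import numpy as np

def minibatch_estimate(per_cell_terms, batch_idx, n_total):
    """Unbiased estimate of sum(per_cell_terms) from the cells in batch_idx."""
    m = len(batch_idx)
    return (n_total / m) * np.sum(per_cell_terms[batch_idx])

rng = np.random.default_rng(0)
terms = rng.normal(size=1000)  # stand-in for per-cell expected log-likelihoods
n = terms.size

# With M == N every cell is used, so the estimate equals the full sum.
full = minibatch_estimate(terms, np.arange(n), n)

# With M < N (here M = 250) each estimate is noisy, but the average over
# many random batches approaches the full sum.
estimates = [minibatch_estimate(terms, rng.choice(n, 250, replace=False), n)
             for _ in range(2000)]
print(np.isclose(full, terms.sum()))  # True
```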

Usage

Requirements

tGPLVM is implemented in Python 2.7 with the following packages:

  1. numpy 1.14.5
  2. pandas 0.23.3
  3. h5py 2.8.0
  4. tensorflow 1.6.0
  5. edward 1.3.5
  6. scikit-learn (sklearn) 0.19.2

Running

Input: a numpy array or a sparse CSR/CSC matrix of scRNA-seq counts (or other types of data) with N cells (samples) as rows and p genes (features) as columns, loaded into the variable y_train. Load the data directly in the script's code.
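
Expression tables are often stored with genes as rows, so the matrix may need transposing before it is assigned to y_train. A minimal sketch, assuming NumPy and SciPy are available; the helper to_y_train is hypothetical and not part of tGPLVM.

```python
import numpy as np
import scipy.sparse as sp

def to_y_train(matrix, genes_as_rows=False):
    """Hypothetical helper: orient data as N cells (rows) x p genes (columns).

    Returns the oriented matrix and whether it is sparse, i.e. whether the
    script's --sparse option should be set to True.
    """
    y = matrix.T if genes_as_rows else matrix
    if sp.issparse(y):
        # Transposing CSR gives CSC and vice versa; both are documented inputs.
        if y.format not in ("csr", "csc"):
            y = y.tocsr()
        return y, True
    return np.asarray(y, dtype=np.float64), False

# Example: a 3-gene x 5-cell count table stored with genes as rows.
counts = np.arange(15).reshape(3, 5)
y_train, is_sparse = to_y_train(counts, genes_as_rows=True)
print(y_train.shape, is_sparse)  # (5, 3) False
```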

Options: the following command-line parameters adjust inference:

  1. Degrees of freedom (--df) - default: 4
  2. Use a Student's t-distribution error model (otherwise Gaussian error) (--T) - default: True
  3. Initial Number of Dimensions (--Q) - default: 3
  4. Kernel Function
    • Matérn 1/2, 3/2, 5/2 (--m12, --m32, --m52) - default: True
    • Periodic (--per_bool) - default: False
  5. Number of Inducing Points (--m) - default: 30
  6. Batch size (--M) - default: 250
  7. Max iterations (--iterations) - default: 5000
  8. Save frequency (--save_freq) - default: 250
  9. Sparse input, CSC or CSR matrix (--sparse) - default: False
  10. PCA initialization (otherwise random initialization) (--pca_init) - default: True
  11. Output directory (--out) - default: ./test
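
The three Matérn kernels differ in how smooth the mapping from latent space to expression is (1/2 is roughest, 5/2 smoothest). The sketch below writes out their standard forms with a shared lengthscale; how the script combines several enabled kernels is not stated, so summing them here is an assumption, and all function names are hypothetical. The periodic kernel is omitted.

```python
import numpy as np

def pairwise_dist(A, B):
    """Euclidean distances between rows of A (n x Q) and rows of B (m x Q)."""
    diff = A[:, None, :] - B[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))

def matern(A, B, nu, lengthscale=1.0, variance=1.0):
    """Matern kernel for the half-integer orders 1/2, 3/2 and 5/2."""
    r = pairwise_dist(A, B) / lengthscale
    if nu == 0.5:
        return variance * np.exp(-r)
    if nu == 1.5:
        s = np.sqrt(3.0) * r
        return variance * (1.0 + s) * np.exp(-s)
    if nu == 2.5:
        s = np.sqrt(5.0) * r
        return variance * (1.0 + s + s ** 2 / 3.0) * np.exp(-s)
    raise ValueError("nu must be 0.5, 1.5 or 2.5")

def combined_kernel(A, B, use_m12=True, use_m32=True, use_m52=True):
    """Sum of the enabled Matern kernels (summing is an assumption)."""
    flags = {0.5: use_m12, 1.5: use_m32, 2.5: use_m52}
    terms = [matern(A, B, nu) for nu, on in flags.items() if on]
    if not terms:
        raise ValueError("enable at least one kernel")
    return sum(terms)

X = np.random.default_rng(1).standard_normal((20, 3))  # 20 cells, Q = 3
K = combined_kernel(X, X)
print(np.allclose(np.diag(K), 3.0))  # True: each term adds variance 1 on the diagonal
```

A sum of valid kernels is itself a valid (positive semidefinite) kernel, which is why combining them this way is safe.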

Output: an HDF5 file containing:

  1. Latent mapping posterior (mean and variance)
  2. Gene-specific noise
  3. Kernel hyperparameters (variance, lengthscale)
  4. Inducing points in latent and high-dimensional space
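
The latent posterior's mean and variance are what provide the per-cell uncertainty estimates. A minimal sketch of using them, assuming the variance is stored per cell and per latent dimension (a diagonal Gaussian posterior); the arrays are made-up stand-ins and the helper names are hypothetical. The dataset names inside the output file are not documented here, so inspect its keys with h5py before loading.

```python
import numpy as np

def credible_intervals(mean, var, z=1.96):
    """Approximate 95% intervals per cell and latent dimension,
    assuming an independent Gaussian posterior for each coordinate."""
    sd = np.sqrt(var)
    return mean - z * sd, mean + z * sd

def most_uncertain_cells(var, k=5):
    """Indices of the k cells with the largest total posterior variance."""
    total = var.sum(axis=1)
    return np.argsort(total)[::-1][:k]

# Stand-in arrays (3 cells, Q = 2); real values would come from the output file.
mean = np.array([[0.0, 1.0], [2.0, -1.0], [0.5, 0.5]])
var = np.array([[0.01, 0.02], [0.50, 0.40], [0.05, 0.05]])
lo, hi = credible_intervals(mean, var)
print(most_uncertain_cells(var, k=1))  # [1]
```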

Example:

When the input is Test_3_Pollen.h5, the following command runs 250 iterations on the full dataset:

python tGPLVM-batch.py --Q 2 --M 249 --p 6982 --m12 True --m32 True --m52 True --iterations 250 --out ./test

The script also includes input code for two other datasets:

  1. tapio_tcell_tpm.txt - Data from Lönnberg et al. (GPfates), available at https://github.com/Teichlab/GPfates
  2. 1M_neurons_filtered_gene_bc_matrices_h5.h5 - 1 million mouse brain cells from 10x Genomics, available at https://support.10xgenomics.com/single-cell-gene-expression/datasets/1.3.0/1M_neurons. Set --sparse True for this data.
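
For the 10x file, here is a sketch of building y_train as a cells-by-genes CSR matrix. It assumes the Cell Ranger 2.x HDF5 layout, in which a genome group (e.g. mm10) holds a genes-by-cells CSC matrix in the datasets data, indices, indptr and shape; check the file's keys before relying on this. The helper name is hypothetical and this is not the input code shipped with the script.

```python
import numpy as np
import scipy.sparse as sp

def tenx_to_y_train(group):
    """Build an N-cells x p-genes CSR matrix from a 10x (Cell Ranger 2.x) group.

    `group` can be an h5py group or any mapping of NumPy arrays with the keys
    'data', 'indices', 'indptr' and 'shape' (assumed layout).
    """
    n_genes, n_cells = (int(v) for v in group["shape"][:])
    genes_by_cells = sp.csc_matrix(
        (group["data"][:], group["indices"][:], group["indptr"][:]),
        shape=(n_genes, n_cells),
    )
    # Transposing a genes x cells CSC matrix gives cells as rows; convert to CSR.
    return genes_by_cells.T.tocsr()

# Tiny in-memory stand-in: 2 genes x 3 cells, stored column-wise (CSC).
toy = {
    "data": np.array([1, 3, 2]),
    "indices": np.array([0, 1, 0]),
    "indptr": np.array([0, 1, 2, 3]),
    "shape": np.array([2, 3]),
}
y_train = tenx_to_y_train(toy)
print(y_train.toarray().tolist())  # [[1, 0], [0, 3], [2, 0]]
```

With the real file, the group would be opened with h5py, e.g. `h5py.File(path, "r")["mm10"]` under the assumed layout, and the script run with --sparse True.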

The final dataset used in the paper is available at: https://support.10xgenomics.com/single-cell-gene-expression/datasets/1.1.0/cd34
