architverma1 / tGPLVM

Licence: MIT License

tGPLVM: A Nonparametric, Generative Model for Manifold Learning with scRNA-seq experimental data

Programming Languages

python


Projects that are alternatives to or similar to tGPLVM

D3E

Discrete Distributional Differential Expression

Stars: ✭ 19 (+18.75%)

Mutual labels: single-cell-rna-seq

EWCE

Expression Weighted Celltype Enrichment. See the package website for up-to-date instructions on usage.

Stars: ✭ 30 (+87.5%)

Mutual labels: single-cell-rna-seq

Unsupervised-Learning-in-R

Workshop (6 hours): Clustering (Hdbscan, LCA, Hopach), dimension reduction (UMAP, GLRM), and anomaly detection (isolation forests).

Stars: ✭ 34 (+112.5%)

Mutual labels: dimensionality-reduction

Dimensionality-reduction-and-classification-on-Hyperspectral-Images-Using-Python

In this repository, you can find files that implement dimensionality reduction and classification on the Indian Pines hyperspectral image.

Stars: ✭ 63 (+293.75%)

Mutual labels: dimensionality-reduction

alevin-fry

🐟 🔬🦀 alevin-fry is an efficient and flexible tool for processing single-cell sequencing data, currently focused on single-cell transcriptomics and feature barcoding.

Stars: ✭ 78 (+387.5%)

Mutual labels: single-cell-rna-seq

dml

R package for Distance Metric Learning

Stars: ✭ 58 (+262.5%)

Mutual labels: dimensionality-reduction

pymde

Minimum-distortion embedding with PyTorch

Stars: ✭ 420 (+2525%)

Mutual labels: dimensionality-reduction

CellO

CellO: Gene expression-based hierarchical cell type classification using the Cell Ontology

Stars: ✭ 34 (+112.5%)

Mutual labels: single-cell-rna-seq

UMAP.jl

Uniform Manifold Approximation and Projection (UMAP) implementation in Julia

Stars: ✭ 93 (+481.25%)

Mutual labels: dimensionality-reduction

StackedDAE

Stacked Denoising AutoEncoder based on TensorFlow

Stars: ✭ 23 (+43.75%)

Mutual labels: single-cell-rna-seq

Spectre

A computational toolkit in R for the integration, exploration, and analysis of high-dimensional single-cell cytometry and imaging data.

Stars: ✭ 31 (+93.75%)

Mutual labels: dimensionality-reduction

SINCERA

An R implementation of the SINCERA pipeline for single cell RNA-seq profiling analysis

Stars: ✭ 20 (+25%)

Mutual labels: single-cell-rna-seq

NIDS-Intrusion-Detection

Simple implementation of a network intrusion detection system. The KddCup'99 dataset is used for this project: kdd_cup_10_percent is used for training and the corrected set is used for testing. PCA is used for dimension reduction, and SVM and KNN are the supervised classification algorithms. Accuracy: 83.5% for SVM, 80% for KNN.

Stars: ✭ 45 (+181.25%)

Mutual labels: dimensionality-reduction

timecorr

Estimate dynamic high-order correlations in multivariate timeseries data

Stars: ✭ 30 (+87.5%)

Mutual labels: dimensionality-reduction

dropEst

Pipeline for initial analysis of droplet-based single-cell RNA-seq data

Stars: ✭ 71 (+343.75%)

Mutual labels: single-cell-rna-seq

adenine

ADENINE: A Data ExploratioN PipelINE

Stars: ✭ 15 (-6.25%)

Mutual labels: dimensionality-reduction

kmer-homology-paper

Manuscript for functional prediction of transcriptomic “dark matter” across species

Stars: ✭ 12 (-25%)

Mutual labels: single-cell-rna-seq

SPLiT-Seq demultiplexing

An unofficial demultiplexing strategy for SPLiT-seq RNA-Seq data

Stars: ✭ 20 (+25%)

Mutual labels: single-cell-rna-seq

monocle3

No description or website provided.

Stars: ✭ 170 (+962.5%)

Mutual labels: single-cell-rna-seq

dbMAP

A fast, accurate, and modularized dimensionality reduction approach based on diffusion harmonics and graph layouts. Scales to millions of samples on a personal laptop. Adds the intrinsic structure of high-dimensional big data to your clustering and data visualization workflow.

Stars: ✭ 39 (+143.75%)

Mutual labels: dimensionality-reduction


tGPLVM: A robust nonlinear manifold model for single cell RNA-seq data.

Intro

Dimension reduction is a common and critical first step in the analysis of high-throughput single-cell RNA sequencing data. tGPLVM is a nonparametric, generative model for nonlinear manifold learning: a flexible, nearly assumption-free model that does not require setting parameters a priori (e.g., number of dimensions or perplexity) and that provides uncertainty estimates for sample mappings. tGPLVM can be used to visualize high-dimensional data or as part of a pipeline for cell type identification or pseudotime reconstruction.
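To make "generative model" concrete, here is a minimal NumPy sketch of the kind of generative process a t-distributed GPLVM assumes: each cell has a latent position X (N × Q), each gene is a smooth function of X drawn from a Gaussian process prior (here with a Matern 3/2 kernel), and observations add heavy-tailed Student-t noise with a gene-specific scale. The kernel choice, hyperparameter values, and noise scales are illustrative assumptions, not tGPLVM's fitted values or its exact parameterization.

```python
import numpy as np

def matern32(X, variance=1.0, lengthscale=1.0):
    """Matern 3/2 covariance between every pair of rows of X."""
    d = np.sqrt(np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1))
    r = np.sqrt(3.0) * d / lengthscale
    return variance * (1.0 + r) * np.exp(-r)

def sample_tgplvm(N=100, Q=2, p=50, df=4.0, seed=0):
    """Draw synthetic N x p data from a t-GPLVM-style generative model (illustrative)."""
    rng = np.random.RandomState(seed)
    X = rng.standard_normal((N, Q))            # latent position of each cell
    K = matern32(X) + 1e-6 * np.eye(N)         # small jitter keeps the Cholesky stable
    L = np.linalg.cholesky(K)
    F = L.dot(rng.standard_normal((N, p)))     # one independent GP draw per gene
    sigma = rng.uniform(0.1, 0.5, size=p)      # gene-specific noise scale (made up)
    noise = sigma * rng.standard_t(df, size=(N, p))  # heavy-tailed Student-t errors
    return X, F + noise

X, Y = sample_tgplvm()
print(X.shape, Y.shape)  # (100, 2) (100, 50)
```

With a small number of degrees of freedom (the documented --df default is 4), the Student-t draws include occasional large deviations; a Gaussian error model penalizes such outliers much more heavily, which is what makes a t error model more robust to outlying counts.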

We provide a script that fits the model with Black Box Variational Inference for speed and scalability. A batch-learning implementation is also provided for larger datasets that must be fit under memory restrictions.
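Batch (minibatch) learning rests on one idea: the likelihood term of the variational objective is a sum over cells, so the sum over a batch of M cells, rescaled by N/M, is an unbiased estimate of the full-data sum. The sketch below illustrates that scaling on a simple Gaussian log-likelihood. It is a generic illustration of stochastic variational inference; the function and variable names are hypothetical and are not taken from tGPLVM-batch.py.

```python
import numpy as np

def per_cell_loglik(Y, mean, sigma):
    """Gaussian log-likelihood of each cell (row), summed over genes."""
    z = (Y - mean) / sigma
    return np.sum(-0.5 * z ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi), axis=1)

def minibatch_estimate(Y, mean, sigma, batch_idx):
    """Scale a batch's summed log-likelihood by N/M to estimate the full-data sum."""
    N, M = Y.shape[0], len(batch_idx)
    return (N / float(M)) * per_cell_loglik(Y[batch_idx], mean[batch_idx], sigma).sum()

rng = np.random.RandomState(0)
N, p, M = 1000, 20, 250
Y = rng.standard_normal((N, p))
mean = np.zeros((N, p))
sigma = np.ones(p)

full = per_cell_loglik(Y, mean, sigma).sum()
# Averaging the estimate over a partition of the cells into equal batches recovers
# the full-data value exactly; a single random batch is unbiased but noisy.
perm = rng.permutation(N)
estimates = [minibatch_estimate(Y, mean, sigma, b) for b in np.split(perm, N // M)]
print(np.isclose(np.mean(estimates), full))  # True
```

Only M cells' worth of data needs to be touched per step, which is what keeps memory bounded on large datasets.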

Usage

Requirements

tGPLVM is implemented in Python 2.7 with the following packages:

numpy 1.14.5
pandas 0.23.3
h5py 2.8.0
tensorflow 1.6.0
edward 1.3.5
scikit-learn (sklearn) 0.19.2

Running

Input: a NumPy array or a sparse CSR/CSC matrix of scRNA-seq counts (or other types of data), with N cells (samples) as rows and p genes (features) as columns, loaded into the variable y_train. This input is set directly in the code.
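Many expression tables are stored genes × cells, the transpose of the layout described above. A minimal sketch of putting such a table into the N cells × p genes orientation, returned either as a dense NumPy array or as a SciPy CSR matrix; the helper name and the toy table are hypothetical, and SciPy is assumed to be available for the sparse case.

```python
import numpy as np
import scipy.sparse as sp

def to_cells_by_genes(table, genes_are_rows=True, sparse=False):
    """Return an N cells x p genes matrix suitable for y_train (hypothetical helper).

    table: any 2-D array-like of counts, e.g. a NumPy array or pandas DataFrame.
    genes_are_rows: True when the source lists genes as rows and cells as columns.
    sparse: return a SciPy CSR matrix instead of a dense NumPy array.
    """
    values = np.asarray(table, dtype=np.float64)
    if genes_are_rows:
        values = values.T  # cells become rows, genes become columns
    return sp.csr_matrix(values) if sparse else values

# Toy genes x cells table: 3 genes (rows) by 4 cells (columns).
counts = [[0, 5, 0, 1],
          [2, 0, 0, 3],
          [0, 0, 7, 0]]
y_train = to_cells_by_genes(counts, genes_are_rows=True)
print(y_train.shape)  # (4, 3)
```

For a tab-separated table, pandas.read_csv(path, sep="\t", index_col=0) is one way to load it before calling the helper; check the file's orientation rather than assuming it.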

Options: the following parameters can be adjusted to control inference:

Degrees of freedom of the t-distribution (--df) - default: 4
Use a Student-t error model (otherwise Gaussian error) (--T) - default: True
Initial number of latent dimensions (--Q) - default: 3
Kernel function
- Matern 1/2, 3/2, 5/2 (--m12, --m32, --m52) - default: True
- Periodic (--per_bool) - default: False
Number of inducing points (--m) - default: 30
Batch size (--M) - default: 250
Max iterations (--iterations) - default: 5000
Save frequency (--save_freq) - default: 250
Input is sparse, CSC or CSR (--sparse) - default: False
PCA initialization (otherwise random initialization) (--pca_init) - default: True
Output directory (--out) - default: ./test
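The --m12, --m32 and --m52 flags select Matern kernels with smoothness 1/2, 3/2 and 5/2. For reference, here is a NumPy sketch of these three standard covariance functions and of the standard periodic (exp-sine-squared) kernel, written in terms of the variance and lengthscale hyperparameters the model reports. How tGPLVM combines several enabled kernels is not stated here; the sum used in the test below is only an illustrative assumption.

```python
import numpy as np

def pairwise_dist(X1, X2):
    """Euclidean distances between rows of X1 (n x Q) and X2 (m x Q)."""
    diff = X1[:, None, :] - X2[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))

def matern12(X1, X2, variance=1.0, lengthscale=1.0):
    """Matern 1/2 (exponential) kernel: rough, non-differentiable sample paths."""
    r = pairwise_dist(X1, X2) / lengthscale
    return variance * np.exp(-r)

def matern32(X1, X2, variance=1.0, lengthscale=1.0):
    """Matern 3/2 kernel: once-differentiable sample paths."""
    r = np.sqrt(3.0) * pairwise_dist(X1, X2) / lengthscale
    return variance * (1.0 + r) * np.exp(-r)

def matern52(X1, X2, variance=1.0, lengthscale=1.0):
    """Matern 5/2 kernel: twice-differentiable sample paths."""
    r = np.sqrt(5.0) * pairwise_dist(X1, X2) / lengthscale
    return variance * (1.0 + r + r ** 2 / 3.0) * np.exp(-r)

def periodic(X1, X2, variance=1.0, lengthscale=1.0, period=1.0):
    """Standard periodic kernel on Euclidean distance."""
    d = pairwise_dist(X1, X2)
    return variance * np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / lengthscale ** 2)
```

At a fixed distance, smoother Matern kernels keep a higher correlation, so enabling them favors smoother gene expression trends along the latent space.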

Output: an HDF5 file containing:

Latent mapping posterior (mean and variance)
Gene-specific noise
Kernel hyperparameters (variance, lengthscale)
Inducing points in latent and high-dimensional space
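Because the latent mapping is returned as a posterior mean and variance, each cell's position comes with an uncertainty. A minimal sketch of summarizing it, assuming a Gaussian posterior with one variance per latent dimension and assuming both arrays have already been read from the output file (for example with h5py; list the file's keys first, since the dataset names are not documented here). It computes approximate 95% intervals per dimension and one per-cell spread that can flag poorly placed cells. The function name and toy arrays are hypothetical.

```python
import numpy as np

def summarize_latent_uncertainty(mean, var, z=1.96):
    """Summarize per-cell uncertainty of a Gaussian latent posterior.

    mean, var: arrays of shape (N cells, Q latent dims) holding the posterior
    mean and (diagonal) variance of each cell's latent position.
    Returns lower/upper bounds of ~95% intervals and a per-cell spread,
    the square root of the summed variances across latent dimensions.
    """
    mean = np.asarray(mean, dtype=float)
    sd = np.sqrt(np.asarray(var, dtype=float))
    lower, upper = mean - z * sd, mean + z * sd
    spread = np.sqrt(np.sum(sd ** 2, axis=1))
    return lower, upper, spread

mean = np.array([[0.0, 1.0], [2.0, -1.0], [0.5, 0.5]])
var = np.array([[0.01, 0.04], [0.25, 0.25], [1.0, 0.0]])
lower, upper, spread = summarize_latent_uncertainty(mean, var)
# Cells with the largest spread are the least confidently placed.
print(np.argsort(spread)[::-1])  # [2 1 0]
```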

Example:

With Test_3_Pollen.h5 as input, the following command runs 250 iterations on the full dataset (--M 249 sets the batch size to the number of cells, and --p 6982 gives the number of genes):

python tGPLVM-batch.py --Q 2 --M 249 --p 6982 --m12 True --m32 True --m52 True --iterations 250 --out ./test

We also provide input-loading code for two other datasets:

tapio_tcell_tpm.txt - T-cell data from Lönnberg et al. (GPfates). Data is available at https://github.com/Teichlab/GPfates
1M_neurons_filtered_gene_bc_matrices_h5.h5 - the 10x Genomics "1M neurons" dataset of mouse brain cells. Data is available at https://support.10xgenomics.com/single-cell-gene-expression/datasets/1.3.0/1M_neurons. Make sure to set --sparse True for this data.
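The 1M neurons file uses the legacy Cell Ranger HDF5 layout, in which (to our understanding) a genome group such as mm10 stores a genes × cells count matrix in compressed sparse column form as data, indices, indptr and shape arrays. Assuming that layout, the sketch below turns those arrays into the cells × genes CSR matrix described in the Input section, which is why --sparse True is needed. Reading the arrays from the file (e.g. with h5py) is assumed and not shown, and the helper name is hypothetical.

```python
import numpy as np
import scipy.sparse as sp

def tenx_to_cells_by_genes(data, indices, indptr, shape):
    """Build an N cells x p genes CSR matrix from legacy 10x HDF5 arrays.

    The arrays are assumed to describe a genes x cells matrix in CSC form
    (one column per cell barcode), as in the legacy Cell Ranger layout.
    """
    genes_by_cells = sp.csc_matrix((data, indices, indptr), shape=tuple(shape))
    return genes_by_cells.T.tocsr()  # transpose so cells are rows

# Toy example: 3 genes x 2 cells.
# Cell 0 has gene 0 (count 4) and gene 2 (count 1); cell 1 has gene 1 (count 2).
data = np.array([4, 1, 2])
indices = np.array([0, 2, 1])
indptr = np.array([0, 2, 3])
y_train = tenx_to_cells_by_genes(data, indices, indptr, shape=(3, 2))
print(y_train.toarray())
```

The matrix never needs to be densified, which matters at the scale of this dataset.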

The final data from the paper is available here: https://support.10xgenomics.com/single-cell-gene-expression/datasets/1.1.0/cd34
