broadinstitute / Tangram

License: BSD-3-Clause
Spatial alignment of single cell transcriptomic data.

Tangram is a Python package, written in PyTorch and based on scanpy, for mapping single-cell (or single-nucleus) gene expression data onto spatial gene expression data. The single-cell dataset and the spatial dataset should be collected from the same anatomical region/tissue type, ideally from a biological replicate, and need to share a set of genes. Tangram aligns the single-cell data in space by fitting gene expression on the shared genes. The best way to familiarize yourself with Tangram is to check out our tutorial and our documentation.
If you don't use squidpy yet, check out our previous tutorial.

[Figure: Tangram overview]


How to install Tangram

To install Tangram, make sure you have PyTorch and scanpy installed. For more details on the dependencies, see the environment.yml file.

  • Set up a conda environment for Tangram:
    conda env create -f environment.yml
  • Install tangram-sc from the shell:
    conda activate tangram-env
    pip install tangram-sc
  • To start using Tangram, import tangram in your Jupyter notebooks and/or scripts:
    import tangram as tg

Two ways to run Tangram

How to run Tangram at cell level

Load your spatial data and your single cell data (which should be in AnnData format), and pre-process them using tg.pp_adatas:

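    import scanpy as sc
    import tangram as tg
    ad_sp = sc.read_h5ad(path)  # spatial data; `path` is a placeholder
    ad_sc = sc.read_h5ad(path)  # single-cell data, read from its own .h5ad file
    tg.pp_adatas(ad_sc, ad_sp, genes=None)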
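Conceptually, this pre-processing step comes down to matching gene names between the two datasets. The sketch below illustrates that idea on plain Python lists. The helper `shared_training_genes` and the toy gene lists are hypothetical and are not part of the Tangram API, and Tangram's own implementation may apply further filtering.

```python
def shared_training_genes(sc_genes, sp_genes, genes=None):
    """Return genes present in both datasets, optionally restricted to `genes`.

    Hypothetical helper for illustration only; not part of the Tangram API.
    """
    sp_set = set(sp_genes)
    # Keep the single-cell ordering so the result is deterministic.
    overlap = [g for g in sc_genes if g in sp_set]
    if genes is None:
        return overlap
    wanted = set(genes)
    return [g for g in overlap if g in wanted]


# Toy gene lists (made up for this example).
sc_genes = ["Gad1", "Slc17a7", "Pvalb", "Sst"]
sp_genes = ["Pvalb", "Gad1", "Vip"]

all_shared = shared_training_genes(sc_genes, sp_genes)
restricted = shared_training_genes(sc_genes, sp_genes, genes=["Gad1", "Vip"])
print(all_shared)
print(restricted)
```

With `genes=None` every shared gene is kept. With an explicit list, only the requested genes that are also shared survive, which is why "Vip" (absent from the single-cell list) drops out.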

The function pp_adatas finds the genes shared by ad_sc and ad_sp and saves them in the .uns of both AnnData objects for later mapping and analysis. It also restricts the shared genes to the set of training genes passed via genes. If genes=None, Tangram maps using all genes shared by the two datasets. Once the datasets are pre-processed, we can map:

    ad_map = tg.map_cells_to_space(ad_sc, ad_sp)

The returned AnnData, ad_map, is a cell-by-voxel structure where ad_map.X[i, j] gives the probability that cell i is in voxel j. This structure can be used to project gene expression from the single-cell data onto space, which is achieved via tg.project_genes.

    ad_ge = tg.project_genes(ad_map, ad_sc)

The returned ad_ge is a voxel-by-gene AnnData, similar to the spatial data ad_sp, but with gene expression projected from the single cells. This makes it possible to extend gene throughput, or to correct for dropouts, if the single-cell data have higher quality (or more genes) than the spatial data. It can also be used to transfer cell types onto space.
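For intuition, the projection can be pictured as a probability-weighted sum of single-cell profiles in each voxel. The NumPy sketch below assumes the projected matrix is the product of the transposed cell-by-voxel mapping with the cell-by-gene expression matrix. The toy arrays `M` and `S` are made up, and this is an illustration of the idea, not Tangram's own code.

```python
import numpy as np

# Toy mapping: 3 cells x 2 voxels; each row holds one cell's probabilities over voxels.
M = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [0.5, 0.5]])

# Toy single-cell expression: 3 cells x 2 genes.
S = np.array([[10.0, 0.0],
              [0.0, 4.0],
              [2.0, 2.0]])

# Projected expression: 2 voxels x 2 genes.
# Each voxel receives every cell's profile, weighted by that cell's probability.
projected = M.T @ S
print(projected.shape)
print(projected)
```

Because each row of `M` sums to 1 in this toy example, the total expression of every gene is preserved by the projection. It is only redistributed across voxels.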


How to run Tangram at cluster level

To enable faster training and consume less memory, Tangram mapping can be done at cell cluster level. This modification was introduced by Sten Linnarsson.

Prepare the input data as you would for cell-level Tangram mapping. Then map using the following code:

    ad_map = tg.map_cells_to_space(
        ad_sc,
        ad_sp,
        mode='clusters',
        cluster_label='subclass_label',
    )

The provided cluster_label must be a column of ad_sc.obs. The example above maps at the 'subclass_label' level, so 'subclass_label' must be present in ad_sc.obs.

To project gene expression onto space, use tg.project_genes and be sure to set the cluster_label argument to the same cluster label used for mapping.

    ad_ge = tg.project_genes(
        ad_map,
        ad_sc,
        cluster_label='subclass_label',
    )
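To see why cluster mode is lighter, it helps to picture the single-cell matrix being collapsed to one row per cluster before mapping. The NumPy sketch below averages cells within each label. Both the helper `cluster_expression` and the choice of averaging are assumptions made for illustration, and Tangram's internal aggregation may differ.

```python
import numpy as np


def cluster_expression(S, labels):
    """Average the rows of S (cells x genes) within each cluster label.

    Hypothetical helper; returns (sorted cluster names, clusters x genes matrix).
    """
    labels = np.asarray(labels)
    names = np.unique(labels)  # sorted unique cluster names
    means = np.vstack([S[labels == name].mean(axis=0) for name in names])
    return names, means


# Toy data: 5 cells x 2 genes, with one cluster label per cell.
S = np.array([[1.0, 0.0],
              [3.0, 0.0],
              [0.0, 2.0],
              [0.0, 4.0],
              [0.0, 6.0]])
labels = ["Pvalb", "Pvalb", "Sst", "Sst", "Sst"]

names, S_clusters = cluster_expression(S, labels)
print(list(names), S_clusters.shape)
```

Here five cells collapse into two cluster profiles, so a mapping matrix over clusters has two rows instead of five. On real data the reduction is from tens of thousands of cells to a few dozen clusters.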

How Tangram works under the hood

Tangram instantiates a Mapper object passing the following arguments:

  • S: single-cell matrix with shape cells-by-genes. Note that the gene dimension is the number of training genes.
  • G: spatial data matrix with shape voxels-by-genes. A voxel can contain multiple cells.

Then, Tangram searches for a mapping matrix M, with shape cells-by-voxels, where the element M_ij is the probability that cell i is in voxel j. Tangram computes the matrix M by maximizing a loss function built on cos_sim, the cosine similarity. The loss function expresses that the gene expression of the mapped single cells should be as similar as possible to the spatial data G, in the cosine-similarity sense.
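The idea can be sketched numerically with NumPy. The example below assumes M is stored as cells-by-voxels (matching ad_map), that the predicted spatial expression is the product of the transposed M with S, and that the score sums the cosine similarity gene by gene. The helper `gene_cosine_objective` and the toy matrices are hypothetical. This is one plausible form of such an objective, not necessarily the exact function Tangram optimizes.

```python
import numpy as np


def gene_cosine_objective(M, S, G):
    """Sum over genes of cos_sim between predicted and measured spatial expression.

    M: cells x voxels, S: cells x genes, G: voxels x genes.
    Hypothetical helper; the per-gene sum is an assumption of this sketch.
    """
    predicted = M.T @ S  # voxels x genes
    numerator = (predicted * G).sum(axis=0)
    denominator = np.linalg.norm(predicted, axis=0) * np.linalg.norm(G, axis=0)
    return float((numerator / denominator).sum())


# Toy data: 2 cells, 2 voxels, 2 genes; each voxel truly contains one cell.
S = np.array([[5.0, 0.0],
              [0.0, 3.0]])
G = np.array([[5.0, 0.0],
              [0.0, 3.0]])

good = np.eye(2)               # cell 0 in voxel 0, cell 1 in voxel 1
bad = np.array([[0.0, 1.0],
                [1.0, 0.0]])   # the two cells swapped

good_score = gene_cosine_objective(good, S, G)
bad_score = gene_cosine_objective(bad, S, G)
print(good_score, bad_score)
```

The correct assignment reproduces G exactly and reaches the maximum score of one per gene, while the swapped assignment scores zero. An optimizer that maximizes this quantity is therefore pushed toward the correct placement.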

The above covers basic Tangram usage. In our manuscript, we modified the loss function in several ways to add various kinds of prior knowledge, such as the number of cells contained in each voxel.


Frequently Asked Questions

Do I need a GPU for running Tangram?

Mapping in cluster mode is fine on a standard laptop. For mapping at the single-cell level, a GPU is not required but is recommended. We run most of our mappings on a single P100, which maps ~50k cells in a few minutes.

How do I choose a list of training genes?

A good way to start is to use the top 1k unique marker genes, stratified across cell types, as training genes. Alternatively, you can map using the whole transcriptome. Ideally, training genes should contain high-quality signals: if most training genes are rich in dropouts or were obtained with bad RNA probes, your mapping will not be accurate.
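One way to build such a stratified list is to take the same number of top-ranked markers from every cell type and remove duplicates. The sketch below does this on plain Python data. The helper `stratified_markers` and the ranked lists are hypothetical. In practice the rankings could come from a differential-expression step such as scanpy's sc.tl.rank_genes_groups.

```python
def stratified_markers(ranked_markers, per_type):
    """Take the top `per_type` genes of every cell type and de-duplicate them.

    ranked_markers: dict mapping cell type -> genes ordered from best to worst.
    Hypothetical helper for illustration only; not part of the Tangram API.
    """
    selected = []
    seen = set()
    for cell_type in sorted(ranked_markers):
        for gene in ranked_markers[cell_type][:per_type]:
            if gene not in seen:
                seen.add(gene)
                selected.append(gene)
    return selected


# Toy rankings (made up for this example).
ranked = {
    "Pvalb": ["Pvalb", "Gad1", "Syt2"],
    "Sst": ["Sst", "Gad1", "Npy"],
    "L2/3 IT": ["Slc17a7", "Cux2", "Lamp5"],
}

training_genes = stratified_markers(ranked, per_type=2)
print(training_genes)
```

The resulting list can then be passed as the genes argument of tg.pp_adatas. Taking an equal number of markers per type keeps rare cell types represented among the training genes.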

Do I need cell segmentation for mapping on Visium data?

You do not need to segment cells in your histology for mapping on spatial transcriptomics data (including Visium and Slide-seq). You do, however, need cell segmentation if you wish to deconvolve the data (i.e., deterministically assign a single-cell profile to each cell within a spatial voxel).

I run out of memory when I map: what should I do?

Split your spatial data into several parts and map each part separately. If that is not sufficient, you will also need to downsample your single-cell data.
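A simple way to divide the spatial data is to cut the voxel indices into a few near-equal chunks and map them one at a time. The NumPy sketch below shows only the index bookkeeping. The helper `voxel_chunks` is hypothetical, and the commented usage is an assumption about how the chunks might be fed to Tangram, so check that each part is pre-processed as described above before mapping it.

```python
import numpy as np


def voxel_chunks(n_voxels, n_parts):
    """Split voxel indices 0..n_voxels-1 into n_parts contiguous, near-equal chunks."""
    return np.array_split(np.arange(n_voxels), n_parts)


chunks = voxel_chunks(n_voxels=10, n_parts=3)
sizes = [len(c) for c in chunks]
print(sizes)

# Hypothetical usage with the objects from this README (not run here):
# for idx in chunks:
#     ad_sp_part = ad_sp[idx].copy()   # AnnData rows selected by integer index
#     ad_map_part = tg.map_cells_to_space(ad_sc, ad_sp_part)
```

Contiguous index ranges are not necessarily contiguous in tissue space. If spatial neighbourhoods matter for your analysis, it may be preferable to split on the voxel coordinates instead.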


How to cite Tangram

Tangram has been released in the following publication:

Biancalani* T., Scalia* G. et al. Deep learning and alignment of spatially-resolved whole transcriptomes of single cells in the mouse brain with Tangram. Nature Methods 18, 1352–1362 (2021).

If you have questions, please contact the authors of the method:

PyPI maintainer:

The artwork has been curated by:
