scRNAseq_cell_cluster_labeling

Description

This repository contains scripts to run and benchmark scRNA-seq cell cluster labeling methods. It is a companion to our paper 'Evaluation of methods to assign cell type labels to cell clusters from single-cell RNA-sequencing data' (Diaz-Mejia JJ et al., 2019; https://f1000research.com/articles/8-296).

Scripts

Main wrapper scripts

Script name	Task(s)
`subsamples_gene_classes_and_runs_enrichment_scripts.R`	Main wrapper to run and benchmark cell cluster labeling methods

Scripts to run cell type labeling methods

Script name	Task(s)
`obtains_CIBERSORT_for_MatrixColumns.pl`	Runs CIBERSORT using gene expression signatures and a matrix with average gene expressions per gene, per cell cluster
`obtains_GSEA_for_MatrixColumns.pl`	Runs GSEA using gene expression signatures and a matrix with average gene expressions per cell cluster
`obtains_GSVA_for_MatrixColumns.R`	Runs GSVA using gene expression signatures and a matrix with average gene expressions per cell cluster
`obtains_METANEIGHBOR_for_MatrixColumns.R`	Runs MetaNeighborUS using gene expression signatures, a matrix with average gene expressions per cell cluster, and reference cell types. Note: modifications were made to the R library(metaneighbor) source code. Check `bin/r_programs/obtains_METANEIGHBOR_for_MatrixColumns.R` for details
`obtains_ORA_for_MatrixColumns.pl`	Runs ORA using gene expression signatures and a matrix with average gene expressions per cell cluster

Scripts to run ROC and PR curve analyses

Script name	Task(s)
`obtains_performance_plots_from_cluster_labelings.pl`	Compiles results from cell type labeling methods and obtains ROC and PR curve plots and AUCs
`obtains_ROC_and_PR_curves_from_matrix_with_gold_standards.R`	Obtains ROC and PR curve plots, ROC AUC and PR AUC values from a matrix of reference labels in column 2 and predictions in columns 3 to N

Scripts to subsample cell type gene expression signatures

Script name	Task(s)
`obtains_permuted_samples_from_gmt.R`	Subsamples genes from gene expression signatures in the form of gene sets
`propagates_permuted_gmt_files_to_profile.R`	Propagates subsampling from signatures in the form of gene sets to those in the form of gene expression profiles

Other scripts

Script name	Task(s)
`obtains_average_gene_expression_per_cluster.R`	Obtains a matrix with average gene expressions per cell cluster from scRNA-seq data and cell cluster assignments

How to run the scripts

To see the help of an R script, run it with the -h option:
Rscript ~/path_to_script/script.R -h
To see the help of a Perl script, first make the file executable:
chmod +x ~/path_to_script/script.pl
and then run it without arguments:
~/path_to_script/script.pl

Dependencies

Check the code of each script for its dependencies and further documentation.
To install the required R packages from CRAN use:
install.packages(c("optparse", "vioplot", "GSA", "data.table", "precrec", "ROCR", "Seurat", "dplyr", "Rserve", "e1071", "colorRamps"))
The stats package is also used; it ships with base R and does not need to be installed.
To install the required R packages from Bioconductor use:
install.packages("BiocManager")
BiocManager::install(c("preprocessCore", "GSVA", "qvalue"))
R version used: 3.5.1
To install the Perl script dependencies, download the perl_modules directory from this repository
and add it to your PERL5LIB environment variable.
The Perl module Date::Calc is also required and can be installed from CPAN.
Perl version used: 5
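The PERL5LIB step above can be sketched as follows for a Bourne-style shell. The variable PERL_MODULES_DIR and its default value are assumptions for illustration only; point it at wherever you saved the downloaded perl_modules directory.

```shell
# Hypothetical location of the downloaded perl_modules directory; adjust to your setup.
PERL_MODULES_DIR="${PERL_MODULES_DIR:-$HOME/scRNAseq_cell_cluster_labeling/perl_modules}"

# Prepend the directory to PERL5LIB, keeping any value that was already set.
export PERL5LIB="$PERL_MODULES_DIR${PERL5LIB:+:$PERL5LIB}"

# Show the resulting module search path.
echo "PERL5LIB=$PERL5LIB"
```

To make the setting persistent, the export line can be added to your shell startup file. Date::Calc can then be installed with a CPAN client, for example `cpan Date::Calc`.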
The following Java programs (JAR files) are needed:
CIBERSORT.jar can be obtained from https://cibersort.stanford.edu/download.php
gsea-3.0.jar can be obtained from http://software.broadinstitute.org/gsea/downloads.jsp
Java version used: 1.8.0_162
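Before running the scripts, it may help to confirm that the three interpreters named above are reachable from your shell. The following is a minimal sketch, not part of the repository; the helper name check_tool is hypothetical, and it only checks that each command is on the PATH, not which version is installed.

```shell
# Report whether a command is available on the PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

# Interpreters needed by the R, Perl and Java components.
for tool in Rscript perl java; do
  check_tool "$tool"
done
```

Any tool reported as missing needs to be installed, or added to the PATH, before the corresponding scripts can run.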

Input Datasets

Three scRNA-seq datasets that can be used as inputs for these scripts, from liver cells (MacParland et al, 2018), peripheral blood mononuclear cells (PBMCs) (Zheng et al, 2017) and retinal neurons (Shekhar et al, 2016), were processed, curated and deposited in Zenodo: https://doi.org/10.5281/zenodo.2575050

Example inputs and outputs can be found here:
https://github.com/jdime/scRNAseq_cell_cluster_labeling/tree/master/examples

Archived code at time of publication

Version 1.0 http://doi.org/10.5281/zenodo.2583161
Version 2.0 http://doi.org/10.5281/zenodo.3350461

Issues and feature requests

To report bugs or request features, please open a 'New Issue' at https://github.com/jdime/scRNAseq_cell_cluster_labeling/issues

Authors

Javier Diaz (https://github.com/jdime)
