jdime / scRNAseq_cell_cluster_labeling

Licence: MIT license
Scripts to run and benchmark scRNA-seq cell cluster labeling methods

scRNAseq_cell_cluster_labeling

Description

This repository contains scripts to run and benchmark scRNA-seq cell cluster labeling methods. It is a companion to our paper 'Evaluation of methods to assign cell type labels to cell clusters from single-cell RNA-sequencing data' (Diaz-Mejia JJ et al., 2019; https://f1000research.com/articles/8-296).

Scripts

Main wrapper scripts

Script name | Task(s)
subsamples_gene_classes_and_runs_enrichment_scripts.R | Main wrapper to run and benchmark cell cluster labeling methods

Scripts to run cell type labeling methods

Script name | Task(s)
obtains_CIBERSORT_for_MatrixColumns.pl | Runs CIBERSORT using gene expression signatures and a matrix with average gene expressions per gene, per cell cluster
obtains_GSEA_for_MatrixColumns.pl | Runs GSEA using gene expression signatures and a matrix with average gene expressions per cell cluster
obtains_GSVA_for_MatrixColumns.R | Runs GSVA using gene expression signatures and a matrix with average gene expressions per cell cluster
obtains_METANEIGHBOR_for_MatrixColumns.R | Runs MetaNeighborUS using gene expression signatures, a matrix with average gene expressions per cell cluster, and reference cell types. Note: modifications were made to the R library(metaneighbor) source code. Check bin/r_programs/obtains_METANEIGHBOR_for_MatrixColumns.R for details.
obtains_ORA_for_MatrixColumns.pl | Runs ORA using gene expression signatures and a matrix with average gene expressions per cell cluster
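
To give a feel for what an over-representation analysis (ORA) computes, the sketch below scores the overlap between a cell type signature and a set of genes selected for one cluster with a one-sided hypergeometric test. It is a Python illustration under stated assumptions, not the logic of obtains_ORA_for_MatrixColumns.pl: the function names are hypothetical, and how genes are selected per cluster from the average expression matrix (for example, by an expression cutoff) is assumed rather than taken from the script.

```python
from math import comb


def hypergeom_upper_tail(overlap, signature_size, selected_size, universe_size):
    """P(X >= overlap) when selected_size genes are drawn without replacement
    from universe_size genes, signature_size of which are in the signature."""
    total = comb(universe_size, selected_size)
    max_overlap = min(signature_size, selected_size)
    tail = sum(
        comb(signature_size, i)
        * comb(universe_size - signature_size, selected_size - i)
        for i in range(overlap, max_overlap + 1)
    )
    return tail / total


def ora_p_value(signature_genes, cluster_genes, universe_genes):
    """Hypothetical helper: p-value for the signature/cluster gene overlap.

    Assumption: cluster_genes were already selected from the average
    expression matrix by some cutoff chosen by the user.
    """
    universe = set(universe_genes)
    signature = set(signature_genes) & universe
    selected = set(cluster_genes) & universe
    overlap = len(signature & selected)
    return hypergeom_upper_tail(
        overlap, len(signature), len(selected), len(universe)
    )
```

A smaller p-value means the signature is more over-represented among the cluster's selected genes; in practice the p-values would be compared across all signatures for each cluster.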

Scripts to run ROC and PR curve analyses

Script name | Task(s)
obtains_performance_plots_from_cluster_labelings.pl | Compiles results from cell type labeling methods and obtains ROC and PR curve plots and AUCs
obtains_ROC_and_PR_curves_from_matrix_with_gold_standards.R | Obtains ROC and PR curve plots, ROC AUC and PR AUC values from a matrix with reference labels in column 2 and predictions in columns 3 to N
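
The matrix layout described above (reference labels in column 2, predictions in columns 3 to N) can be illustrated with a small Python sketch that computes one ROC AUC per prediction column. This is not the repository's R implementation. It assumes that column 1 holds a row identifier and that reference labels are coded as 1 (correct label) and 0 (incorrect label); the function names are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of positive/negative pairs in which the
    positive has the higher score (tied pairs count as 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def auc_per_prediction_column(rows):
    """rows: one list per matrix line, laid out as
    [id, reference_label, prediction_1, ..., prediction_k].
    Returns one ROC AUC per prediction column, in column order."""
    labels = [int(row[1]) for row in rows]
    n_columns = len(rows[0])
    return [
        roc_auc(labels, [float(row[col]) for row in rows])
        for col in range(2, n_columns)
    ]
```

The pairwise loop is quadratic in the number of rows, which is acceptable for a handful of cluster-by-cell-type predictions; a rank-based formula would be preferable for large inputs.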

Scripts to subsample cell type gene expression signatures

Script name | Task(s)
obtains_permuted_samples_from_gmt.R | Subsamples genes from gene expression signatures in the form of gene sets
propagates_permuted_gmt_files_to_profile.R | Propagates subsampling from signatures in the form of gene sets to those in the form of gene expression profiles
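
As an illustration of subsampling gene sets, the Python sketch below keeps a random fraction of the genes in each line of a GMT file (tab-separated: set name, description, then genes). It is a minimal sketch and not the code of obtains_permuted_samples_from_gmt.R: the function names, the fraction parameter, the seed handling, and the rule of keeping at least one gene per set are all assumptions.

```python
import random


def parse_gmt_line(line):
    """Split one GMT line into (set_name, description, genes)."""
    fields = line.rstrip("\n").split("\t")
    return fields[0], fields[1], fields[2:]


def subsample_gene_set(genes, fraction, rng):
    """Keep a random fraction of genes, preserving their original order.
    Assumption: non-empty sets always keep at least one gene."""
    if not genes:
        return []
    keep = min(len(genes), max(1, round(len(genes) * fraction)))
    chosen = set(rng.sample(range(len(genes)), keep))
    return [gene for i, gene in enumerate(genes) if i in chosen]


def subsample_gmt(lines, fraction, seed=0):
    """Return GMT lines with each gene set subsampled to the given fraction."""
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    out = []
    for line in lines:
        name, description, genes = parse_gmt_line(line)
        kept = subsample_gene_set(genes, fraction, rng)
        out.append("\t".join([name, description] + kept))
    return out
```

Calling subsample_gmt repeatedly with different seeds yields independent subsamples of the same signatures, which is one way to study how robust a labeling method is to signature size.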

Other scripts

Script name | Task(s)
obtains_average_gene_expression_per_cluster.R | Obtains a matrix with average gene expressions per cell cluster from scRNA-seq data and cell cluster assignments
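
The averaging step can be sketched in a few lines of Python. The sketch assumes a plain arithmetic mean over the cells of each cluster and treats missing values as zero; the repository's R script may normalize or transform the data differently. The function name and the dictionary-based inputs are hypothetical.

```python
from collections import defaultdict


def average_expression_per_cluster(expression, cell_to_cluster):
    """expression: {gene: {cell: value}}; cell_to_cluster: {cell: cluster}.
    Returns {gene: {cluster: mean value over that cluster's cells}}."""
    averages = {}
    for gene, per_cell in expression.items():
        sums = defaultdict(float)
        counts = defaultdict(int)
        for cell, cluster in cell_to_cluster.items():
            # Assumption: a cell with no recorded value for this gene counts as 0.
            sums[cluster] += per_cell.get(cell, 0.0)
            counts[cluster] += 1
        averages[gene] = {c: sums[c] / counts[c] for c in sums}
    return averages
```

The result has one row per gene and one column per cluster, which matches the shape of the matrix that the labeling scripts above take as input.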

How to run the scripts

  • To see the help for an R script, run it with the -h option:
    Rscript ~/path_to_script/script.R -h

  • To see the help for a Perl script, make the file executable with
    chmod +x script.pl and run it without arguments:
    ~/path_to_script/script.pl

Dependencies

  • Check each script's code for dependencies and further documentation.

  • To install the required R packages from CRAN, use:
    install.packages(c("optparse", "vioplot", "GSA", "data.table", "precrec", "ROCR", "Seurat", "dplyr", "Rserve", "e1071", "colorRamps", "stats"))
    To install the required packages from Bioconductor, use:
    install.packages("BiocManager")
    BiocManager::install(c("preprocessCore", "GSVA", "qvalue"))
    R version used: 3.5.1

  • To install the Perl script dependencies, download the perl_modules directory from this repository
    and add it to your PERL5LIB environment variable.
    One other Perl module is required: Date::Calc, which can be installed from CPAN.
    Perl version used: 5

  • The following Java programs (JAR files) are needed:
    CIBERSORT.jar, which can be obtained from https://cibersort.stanford.edu/download.php
    gsea-3.0.jar, which can be obtained from http://software.broadinstitute.org/gsea/downloads.jsp
    Java version used: 1.8.0_162

Input Datasets

  • Three scRNA-seq datasets that can be used as inputs for these scripts were processed, curated and deposited into Zenodo: https://doi.org/10.5281/zenodo.2575050. They come from liver cells (MacParland et al, 2018), peripheral blood mononuclear cells (PBMCs) (Zheng et al, 2017) and retinal neurons (Shekhar et al, 2016).

Example inputs and outputs can be found here:
https://github.com/jdime/scRNAseq_cell_cluster_labeling/tree/master/examples

Archived code at time of publication

Version 1.0 http://doi.org/10.5281/zenodo.2583161
Version 2.0 http://doi.org/10.5281/zenodo.3350461

Issues and feature requests

To report bugs or request features, please open a new issue at https://github.com/jdime/scRNAseq_cell_cluster_labeling/issues

Authors

Javier Diaz (https://github.com/jdime)
