ZJUFanLab / Scdeepsort

Licence: gpl-3.0

Reference-free Cell-type Annotation for Single-cell Transcriptomics using Deep Learning with a Weighted Graph Neural Network

Programming Languages

python

Labels

deep-learning annotation

scDeepSort

Recent advances in single-cell RNA sequencing (scRNA-seq) have enabled large-scale transcriptional characterization of thousands of cells in multiple complex tissues, where accurate cell type identification is a prerequisite and vital step for scRNA-seq studies.

To address this challenge, we developed scDeepSort, a reference-free cell-type annotation method based on a state-of-the-art deep learning algorithm, namely a modified graph neural network (GNN) model. This is the first time that a GNN has been introduced into scRNA-seq studies, and it demonstrates ground-breaking performance in this application scenario. In brief, scDeepSort was built on our weighted GNN framework and then trained on two embedded high-quality scRNA-seq atlases containing 764,741 cells across 88 tissues of human and mouse, which are the most comprehensive multi-organ scRNA-seq data resources to date. For more information, please refer to the preprint on bioRxiv (2020.05.13.094953).

Install

Download the source code of scDeepSort.
Download the pretrained models from the release page and uncompress them:

tar -xzvf pretrained.tar.gz

After executing the above steps, the final scDeepSort tree should look like this:

 |- pretrained
     |- human
        |- graphs
        |- statistics
        |- models
     |- mouse
        |- graphs
        |- statistics
        |- models
 |- test
     |- human
     |- mouse
 |- train
    |- human
    |- mouse
 |- map
    |- human
        |- map.xlsx
    |- mouse
        |- map.xlsx
    |- celltype2subtype.xlsx
 |- models
    |- __init__.py
    |- gnn.py
 |- utils
    |- __init__.py
    |- preprocess.py
    |- preprocess_internal.py
 |- R
    |- example_data.rds
    |- geneinfo.rds
 |- pre-process.R
 |- predict.py
 |- train.py
 |- requirements.txt
 |- README.md
 |- LICENSE
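
As a quick sanity check after unpacking, the data directories in the tree above can be verified with a short script. This is only a sketch: the helper name `missing_dirs` is hypothetical and not part of scDeepSort, and the list covers only the directories shown in the tree.

```python
from pathlib import Path

# Directories copied from the tree above. The helper below is a
# hypothetical convenience and is not shipped with scDeepSort.
EXPECTED_DIRS = [
    "pretrained/human/graphs",
    "pretrained/human/statistics",
    "pretrained/human/models",
    "pretrained/mouse/graphs",
    "pretrained/mouse/statistics",
    "pretrained/mouse/models",
    "test/human",
    "test/mouse",
    "train/human",
    "train/mouse",
]


def missing_dirs(root):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
```

Calling `missing_dirs(".")` from the repository root returns an empty list when every directory is in place.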

Dependency

Dependencies can be installed with pip install -r requirements.txt
To use a GPU, please install the GPU version of dgl; see Install DGL for more details.

Usage

Predict using pre-trained models

Test data files should be named in the format species_TissueNumber_data.csv. For example, human_Pancreas11_data.csv is a data file containing 11 human pancreas cells.
The test single-cell transcriptomics CSV data file should be pre-processed by first revising gene symbols according to the NCBI Gene database (updated on Jan. 10, 2020), during which unmatched and duplicated genes are removed. The data should then be normalized with the default LogNormalize method in Seurat (R package), as detailed in pre-process.R. In the final test data, each column represents a cell and each row represents a gene, as shown below.

	Cell 1	Cell 2	Cell 3	...
Gene 1	0	2.4	5.0	...
Gene 2	0.8	1.1	4.3	...
Gene 3	1.8	0	0	...
...	...	...	...	...

All test data should be placed under the test directory: human datasets under ./test/human and mouse datasets under ./test/mouse.
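
For readers who want to see what the normalization step computes, here is a minimal pure-Python sketch of the arithmetic behind Seurat's default LogNormalize: each count is divided by the total counts of its cell, multiplied by a scale factor of 10,000, and passed through ln(1 + x). The function name `log_normalize` is hypothetical, the sketch skips the gene-symbol revision entirely, and pre-process.R remains the reference for the actual pipeline.

```python
import math


def log_normalize(counts, scale_factor=10000):
    """Log-normalize a genes x cells matrix of raw counts.

    counts is a list of rows (one per gene), each holding one value per cell.
    Every entry becomes ln(1 + count / cell_total * scale_factor).
    Cells with a total count of zero are left as all zeros.
    """
    n_cells = len(counts[0])
    # Column sums: total raw counts per cell.
    totals = [sum(row[j] for row in counts) for j in range(n_cells)]
    return [
        [
            math.log1p(row[j] / totals[j] * scale_factor) if totals[j] else 0.0
            for j in range(n_cells)
        ]
        for row in counts
    ]


# Two genes (rows) measured in two cells (columns).
normalized = log_normalize([[1, 0], [3, 2]])
```

A zero count stays zero after normalization, which is why the example table above contains exact zeros alongside positive values.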

Evaluate

Use --evaluate to reproduce the results as shown in our paper. For example, to evaluate the data mouse_Testis199_data.csv, you should execute the following command:

python predict.py --species mouse --tissue Testis --test_dataset 199 --gpu -1 --evaluate --filetype gz --unsure_rate 2

--species The species of cells, human or mouse.
--tissue The tissue of cells. See wiki page
--test_dataset The number of cells in the test data.
--gpu Specify the GPU to use, 0 for gpu, -1 for cpu.
--filetype The format of datafile, csv for .csv files and gz for .gz files. See pre-process.R
--unsure_rate The threshold to define the unsure type, default is 2. Set it as 0 to exclude the unsure type.

Output: the output file, named species_Tissue_Number.csv, is written to the automatically generated result directory. It contains four columns: the cell ID, the original cell type, the predicted main type, and the predicted subtype if applicable.
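
Given the four-column layout described above, a simple agreement rate between the original cell type and the predicted main type can be computed as follows. This is a sketch under assumptions: the function name `agreement_rate` is hypothetical, the sample rows and the "unsure" label are invented for illustration, the comparison is a case-insensitive exact match, and it is not necessarily the metric used in the paper. If the real file starts with a header row, skip that row first.

```python
import csv
import io


def agreement_rate(rows):
    """Fraction of rows whose original type equals the predicted main type.

    Each row is [cell_id, original_type, predicted_main_type, predicted_subtype].
    """
    total = 0
    correct = 0
    for row in rows:
        total += 1
        if row[1].strip().lower() == row[2].strip().lower():
            correct += 1
    return correct / total if total else 0.0


# Invented sample data standing in for a file from the result directory.
sample = io.StringIO(
    "cell_1,Sertoli cell,Sertoli cell,\n"
    "cell_2,Leydig cell,Sertoli cell,\n"
    "cell_3,Germ cell,Germ cell,\n"
    "cell_4,Germ cell,unsure,\n"
)
print(agreement_rate(csv.reader(sample)))  # 0.5
```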

Note: to evaluate all the testing datasets used in our paper, please download them from the release page.

Test

Use --test to test your own datasets. For example, to test the data human_Pancreas11_data.csv, you should execute the following command:

python predict.py --species human --tissue Pancreas --test_dataset 11 --gpu -1 --test --filetype csv --unsure_rate 2

--species The species of cells, human or mouse.
--tissue The tissue of cells. See wiki page
--test_dataset The number of cells in the test data.
--gpu Specify the GPU to use, 0 for gpu, -1 for cpu.
--filetype The format of datafile, csv for .csv files and gz for .gz files. See pre-process.R
--unsure_rate The threshold to define the unsure type, default is 2. Set it as 0 to exclude the unsure type.
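
The relationship between the naming convention and the --species, --tissue, and --test_dataset options can be sketched as below. Both helper names are hypothetical, the pattern assumes tissue names contain only letters and underscores, and the path covers only .csv files; how predict.py itself resolves file names is not shown here.

```python
import re
from pathlib import Path

# Assumption: tissue names consist of letters and underscores only.
NAME_RE = re.compile(r"^(human|mouse)_([A-Za-z_]+?)(\d+)_data\.csv$")


def parse_test_name(filename):
    """Split species_TissueNumber_data.csv into (species, tissue, number)."""
    match = NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    species, tissue, number = match.groups()
    return species, tissue, int(number)


def expected_test_file(species, tissue, n_cells, root="."):
    """Location of a .csv test file under the test directory."""
    return Path(root) / "test" / species / f"{species}_{tissue}{n_cells}_data.csv"


print(parse_test_name("human_Pancreas11_data.csv"))
print(expected_test_file("human", "Pancreas", 11).as_posix())
```

The three values returned by `parse_test_name` correspond to the arguments passed to --species, --tissue, and --test_dataset in the command above.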

Output: the output file, named species_Tissue_Number.csv, is written to the automatically generated result directory. It contains three columns: the cell ID, the predicted main type, and the predicted subtype if applicable.
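
To get a quick overview of a prediction file with the three-column layout described above, the predicted main types can be tallied. This is a sketch only: `count_main_types` is a hypothetical helper, the sample rows are invented, and a header row, if the real file has one, should be skipped before counting.

```python
import csv
import io
from collections import Counter


def count_main_types(rows):
    """Count cells per predicted main type.

    Each row is [cell_id, predicted_main_type, predicted_subtype].
    """
    return Counter(row[1] for row in rows if len(row) >= 2)


# Invented sample data standing in for a file from the result directory.
sample = io.StringIO(
    "cell_1,Alpha cell,\n"
    "cell_2,Beta cell,\n"
    "cell_3,Alpha cell,\n"
)
for cell_type, n in count_main_types(csv.reader(sample)).most_common():
    print(cell_type, n)
```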

Train your own model and predict

To train your own model, you should prepare two files under the ./train directory, following the example files there: a data file as described above and a cell annotation file. Then execute a command such as the following:

python train.py --species human --tissue Adipose --gpu -1 --filetype gz

python train.py --species mouse --tissue Muscle --gpu -1 --filetype gz

--species The species of cells, human or mouse.
--tissue The tissue of cells.
--gpu Specify the GPU to use, 0 for gpu, -1 for cpu.
--filetype The format of datafile, csv for .csv files and gz for .gz files. See pre-process.R

Output: the trained model will be under the pretrained directory, which can be used to test new datasets on the same tissue using predict.py as described above.
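
When training and prediction are scripted together, the two command lines can be assembled programmatically. The sketch below only builds argument lists from the options documented above; the function names are hypothetical, and "python" is assumed to be the interpreter on the PATH.

```python
def train_command(species, tissue, gpu=-1, filetype="csv"):
    """Argument list for train.py, using only the options listed above."""
    return [
        "python", "train.py",
        "--species", species,
        "--tissue", tissue,
        "--gpu", str(gpu),
        "--filetype", filetype,
    ]


def predict_command(species, tissue, n_cells, gpu=-1, filetype="csv", unsure_rate=2):
    """Argument list for predict.py in --test mode."""
    return [
        "python", "predict.py",
        "--species", species,
        "--tissue", tissue,
        "--test_dataset", str(n_cells),
        "--gpu", str(gpu),
        "--test",
        "--filetype", filetype,
        "--unsure_rate", str(unsure_rate),
    ]


print(" ".join(train_command("human", "Adipose", filetype="gz")))
```

Each list can be passed to `subprocess.run(cmd, check=True)` from the repository root, running the training command first and the prediction command for the same tissue afterwards.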

About

The scDeepSort manuscript is under major revision. For more information, please refer to the preprint on bioRxiv (2020.05.13.094953). Should you have any questions, please contact Xin Shao at [email protected], Haihong Yang at [email protected], or Xiang Zhuang at [email protected].
