
pcahan1 / CellNet

Licence: MIT License
CellNet: network biology applied to stem cell engineering

Programming language: R

Projects that are alternatives to or similar to CellNet

CellO
CellO: Gene expression-based hierarchical cell type classification using the Cell Ontology
Stars: ✭ 34 (-12.82%)
Mutual labels:  rna-seq, cell-type-classification
picardmetrics
🚦 Run Picard on BAM files and collate 90 metrics into one file.
Stars: ✭ 38 (-2.56%)
Mutual labels:  rna-seq
diffexpr
Porting DESeq2 and DEXSeq into python via rpy2
Stars: ✭ 49 (+25.64%)
Mutual labels:  rna-seq
MetaOmGraph
MetaOmGraph: a workbench for interactive exploratory data analysis of large expression datasets
Stars: ✭ 30 (-23.08%)
Mutual labels:  rna-seq
FEELnc
FEELnc : FlExible Extraction of LncRNA
Stars: ✭ 61 (+56.41%)
Mutual labels:  rna-seq
gene-oracle
Feature extraction algorithm for genomic data
Stars: ✭ 13 (-66.67%)
Mutual labels:  rna-seq
squid
SQUID detects both fusion-gene and non-fusion-gene structural variations from RNA-seq data
Stars: ✭ 37 (-5.13%)
Mutual labels:  rna-seq
RNASeq
RNASeq pipeline
Stars: ✭ 30 (-23.08%)
Mutual labels:  rna-seq
cellSNP
Pileup biallelic SNPs from single-cell and bulk RNA-seq data
Stars: ✭ 42 (+7.69%)
Mutual labels:  rna-seq
tailseeker
Software for measuring poly(A) tail length and 3′-end modifications using a high-throughput sequencer
Stars: ✭ 17 (-56.41%)
Mutual labels:  rna-seq
cellrouter
Reconstruction of complex single-cell trajectories using CellRouter
Stars: ✭ 38 (-2.56%)
Mutual labels:  stem-cell-differentiation
ideal
Interactive Differential Expression AnaLysis - DE made accessible and reproducible
Stars: ✭ 24 (-38.46%)
Mutual labels:  rna-seq
CoNekT
CoNekT (short for Co-expression Network Toolkit) is a platform to browse co-expression data and enable cross-species comparisons.
Stars: ✭ 17 (-56.41%)
Mutual labels:  rna-seq
CICERO
CICERO: a versatile method for detecting complex and diverse driver fusions using cancer RNA sequencing data.
Stars: ✭ 19 (-51.28%)
Mutual labels:  rna-seq
velodyn
Dynamical systems methods for RNA velocity analysis
Stars: ✭ 16 (-58.97%)
Mutual labels:  rna-seq
CD4-csaw
Reproducible reanalysis of a combined ChIP-Seq & RNA-Seq data set
Stars: ✭ 16 (-58.97%)
Mutual labels:  rna-seq
alevin-fry
🐟 🔬🦀 alevin-fry is an efficient and flexible tool for processing single-cell sequencing data, currently focused on single-cell transcriptomics and feature barcoding.
Stars: ✭ 78 (+100%)
Mutual labels:  rna-seq
MINTIE
Method for Identifying Novel Transcripts and Isoforms using Equivalence classes, in cancer and rare disease.
Stars: ✭ 24 (-38.46%)
Mutual labels:  rna-seq
pyrpipe
Reproducible bioinformatics pipelines in python. Import any Unix tool/command in python.
Stars: ✭ 53 (+35.9%)
Mutual labels:  rna-seq
dropClust
Version 2.1.0 released
Stars: ✭ 19 (-51.28%)
Mutual labels:  rna-seq

CellNet

Shortcut to bulk rna-seq protocol

Cloud-based RNA-Seq web application

Microarray CellNet web application

Microarray CellNet code

Introduction

CellNet is a network-biology-based computational platform that assesses the fidelity of cellular engineering and generates hypotheses for improving cell derivations. CellNet is based on the reconstruction of cell type-specific gene regulatory networks (GRNs), which we performed using publicly available RNA-Seq data of 16 mouse and 16 human cell and tissue types. There are currently three ways to run CellNet on RNA-Seq data: the easiest is to use our web app; you can also run it as a command-line tool in the cloud through Amazon Web Services (AWS), or run it locally. Below, we describe how to apply CellNet to your RNA-Seq data.

Ways to Run CellNet

Web application

The web application takes as input an expression matrix (counts, TPM, or FPKM) and sample metadata, and performs the CellNet analysis. It also includes analyses of many state-of-the-art differentiation protocols, so that you can benchmark your results against these commonly used methods:

CellNet web app
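If your matrix holds raw counts and you would rather supply TPM, the conversion is short. The sketch below is a generic TPM calculation in base R, not CellNet code; it assumes you have a length (in bases) for every gene, and the gene names, sample names, and numbers are made up for illustration.

```r
# Generic TPM conversion (not part of CellNet). Assumes `gene_lengths` gives a
# length in bases for each row of `counts`, in the same order.
counts_to_tpm <- function(counts, gene_lengths) {
  stopifnot(nrow(counts) == length(gene_lengths))
  # Reads per base: R recycles the length vector down each column,
  # so row i is divided by gene_lengths[i]
  rate <- counts / gene_lengths
  # Rescale every sample (column) so that its values sum to one million
  sweep(rate, 2, colSums(rate), "/") * 1e6
}

# Toy data: 3 genes x 2 samples (made-up numbers)
counts <- matrix(c(100, 200, 300,
                    50,  50, 400),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("sample1", "sample2")))
gene_lengths <- c(1000, 2000, 3000)

tpm <- counts_to_tpm(counts, gene_lengths)
print(round(tpm, 1))
colSums(tpm)  # each sample sums to 1e6
```

TPM normalizes for gene length first and sequencing depth second, which is why every column ends up summing to the same total.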

Running CellNet in the Cloud

The public CellNet Amazon Machine Image (AMI), available on Amazon Web Services (AWS), has all of the prerequisite software and libraries pre-installed. Because of this and the scalable computing capacity of AWS, we highly recommend that you use AWS to run CellNet for RNA-Seq data instead of running it locally. If you are unfamiliar with AWS or cloud computing in general, we recommend the following links for further information:

The current CellNet AMI (CellNet_v_0.1.1, ami-2ab59855, as of July 2018) is available in the AWS US East 1 (N. Virginia) region. Running CellNet on AWS requires uploading your raw data (.fastq files) either directly to your running EC2 instance, or to S3 and from there to your instance. To learn about transferring your fastq files directly to your instance, see Transferring files to Linux machines using SCP. Note that Amazon charges by the hour for compute resources ($1.68/hour for a c3.8xlarge EC2 instance). A complete CellNet analysis of 144 GB of raw data (9 samples of 16 GB each) takes up to 2 hours.
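The figures above are enough for a back-of-the-envelope compute budget. A minimal R sketch, assuming the July 2018 c3.8xlarge price quoted above; it ignores storage and data-transfer charges, and the function name is hypothetical, so check current AWS pricing before relying on the result.

```r
# Rough compute-cost estimate from the figures quoted above
# (c3.8xlarge at $1.68/hour, July 2018). Excludes storage and transfer costs.
estimate_aws_cost <- function(runtime_hours, hourly_rate = 1.68) {
  # Conservatively round up to whole hours, since billing is described as hourly
  ceiling(runtime_hours) * hourly_rate
}

n_samples     <- 9L
gb_per_sample <- 16L
total_gb      <- n_samples * gb_per_sample  # 144 GB of raw fastq data

cat(sprintf("%d GB of raw data, ~2 h of compute: about $%.2f\n",
            total_gb, estimate_aws_cost(2)))
```

For the 144 GB example, the upper bound of 2 hours works out to about $3.36 of instance time.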

Running CellNet Locally

Alternatively, you can run CellNet locally. The steps to do this are covered in our Nature Protocols paper.

You will need to install the following command line software:

If you are using macOS, these can be installed easily with pip and Homebrew.

Background information

Trained CellNet Objects (cnProc)

At the heart of CellNet is a Random Forest classifier, the algorithm that classifies the results of a cell fate experiment. To analyze your own expression data with CellNet, you need a trained CellNet classifier object, which we refer to as a cnProc (CellNet Processor). You can select the appropriate cnProc that we have generated from the list below, or make your own using the code we provide here. Making your own is useful if you want to add more cell types, or if you want to train a cnProc for a different species. Note: generating a human cnProc requires a lot of computing power, so it is generally best done on an EC2 instance rather than locally.

The main ingredients of a cnProc are:

  • An R matrix giving the expression levels of a number of genes across all the samples used to train CellNet
  • An R data frame providing metadata on the samples in the expression matrix (e.g., cell type, alignment metrics, and SRA accession numbers)
SPECIES | DATE | CELL & TISSUE TYPES (# of profiles) | cnProc / raw training data
Human (HS) | Oct_25_2016 | b_cell (83), dendritic_cell (75), endothelial_cell (53), esc (52), fibroblast (79), heart (30), hspc (27), intestine_colon (64), kidney (29), liver (33), lung (95), macrophage (254), monocyte (207), neuron (109), skeletal_muscle (189), t_cell (53) | Download
Mouse | Oct_24_2016 | b_cell (193), dendritic_cell (134), esc (134), fibroblast (182), heart (189), hspc (75), intestine_colon (149), kidney (109), liver (265), lung (116), macrophage (176), neuron (188), nk_cell (53), skeletal_muscle (130), t_cell (87), wat (64) | Download, Download
Human | Apr_05_2017 | b_cell (83), dendritic_cell (55), endothelial_cell (51), esc (52), fibroblast (46), heart (60), hspc (192), intestine_colon (85), kidney (62), liver (107), lung (94), monocyte_macrophage (206), neuron (90), skeletal_muscle (187), t_cell (43) | Download
Human | Jun_20_2017 | | Download, Download
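To make the two cnProc ingredients listed above concrete, here is a toy R sketch of their shape: a genes-by-samples matrix plus a metadata data frame with one row per matrix column. All values are made up, and the sample IDs and the column names sra_id and cell_type are assumptions for illustration; the real training objects may use different names.

```r
# Toy stand-ins for the two cnProc ingredients (made-up values; the
# sample IDs and the column names "sra_id" and "cell_type" are assumptions)
set.seed(1)
sample_ids <- c("SRR0000001", "SRR0000002", "SRR0000003", "SRR0000004")

# 1. Expression matrix: genes (rows) x training samples (columns)
expDat <- matrix(rexp(3 * length(sample_ids)), nrow = 3,
                 dimnames = list(c("Pou5f1", "Nanog", "Col1a1"), sample_ids))

# 2. Metadata data frame: one row per column of expDat
sampTab <- data.frame(sra_id    = sample_ids,
                      cell_type = c("esc", "esc", "fibroblast", "fibroblast"),
                      stringsAsFactors = FALSE)
rownames(sampTab) <- sampTab$sra_id

# Metadata rows must be in the same order as the matrix columns
stopifnot(identical(rownames(sampTab), colnames(expDat)))

# Profiles per cell type, i.e. the counts shown in parentheses in the table above
table(sampTab$cell_type)
```

Keeping the metadata row names identical to the matrix column names makes it easy to check that every profile has an annotation before training.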

Example Data

These are some datasets you can use to test-drive applying CellNet to RNA-Seq data:

SPECIES | DATE | SRA ID | DESCRIPTION | METADATA | EXPRESSION
Human | Oct 30, 2015 | SRP043684 | Engineered Neurons | metadata | expression data
Mouse | Mar 15, 2016 | SRP059670 | Reprogramming to Pluripotency | metadata | expression data

Salmon Index Table

If you are running CellNet locally, you will need to have salmon installed on your machine. Below are a few indexes that we have created from our transcriptome and know to work.

SPECIES | SALMON VERSION | INDEX DOWNLOAD | NOTE/USAGE
Human | 0.6.0 | salmon.index.human.050316.tgz | Default for AWS workflow
Mouse | 0.6.0 | salmon.index.mouse.050316.tgz | Default for AWS workflow
Human | 0.7.3 | salmon.index.human.122116.tgz | Used in the local protocol**
Mouse | 0.7.3 | salmon.index.mouse.122116.tgz | Used in the local protocol**
Human | 0.8.2 | salmon.index.human.052617.tgz | Built with the latest Salmon version at the time
Mouse | 0.8.2 | salmon.index.mouse.052617.tgz | Built with the latest Salmon version at the time

** Here's the link to the Salmon 0.7.3 binary for Mac OS X. Salmon 0.8.2 is a stable update and works on either Mac OS X or Linux.
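Each index in the table is paired with a Salmon version, and it is safest to use the index that matches the Salmon you have installed (you can check with `salmon --version`). A small R sketch that transcribes the table into a data frame and looks up the matching archive; pick_salmon_index is a hypothetical helper, not part of CellNet.

```r
# Lookup table transcribed from the Salmon index table above
salmon_indices <- data.frame(
  species = c("human", "mouse", "human", "mouse", "human", "mouse"),
  salmon  = c("0.6.0", "0.6.0", "0.7.3", "0.7.3", "0.8.2", "0.8.2"),
  file    = c("salmon.index.human.050316.tgz", "salmon.index.mouse.050316.tgz",
              "salmon.index.human.122116.tgz", "salmon.index.mouse.122116.tgz",
              "salmon.index.human.052617.tgz", "salmon.index.mouse.052617.tgz"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the index archive listed for a species and
# Salmon version, or stop with an error if the table has no match
pick_salmon_index <- function(species, salmon_version) {
  hit <- salmon_indices$file[salmon_indices$species == tolower(species) &
                             salmon_indices$salmon == salmon_version]
  if (length(hit) == 0) {
    stop("No listed index for ", species, " / Salmon ", salmon_version)
  }
  hit
}

pick_salmon_index("Mouse", "0.7.3")
```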

Running CellNet on AWS

The steps below demonstrate how to run RNA-Seq CellNet on AWS. You need to log in to the AWS console, then click on EC2, and launch the CellNet_v_0.1.1 image (ami-2ab59855) on a c3.4xlarge or c3.8xlarge instance type.

To log in to the running instance, type the following command in a shell/terminal, replacing aws_private_key with the full path of the AWS key that you used to launch the instance, and instance_public_dns with the public DNS of your instance, which can be found in the AWS console:

ssh -i aws_private_key ec2-user@instance_public_dns

Once you have logged in to the instance, launch screen so that your session keeps running if your connection drops.

Next, load the latest version of CellNet (v0.1.1), which is pre-installed on this image. (If you ever need to install CellNet yourself, follow the steps under Installing CellNet.) You also need to configure the instance for pre-processing RNA-Seq data, including fetching the mouse transcriptome index. It is important to work in the same R session in which you call cn_setup().

screen
R
library(CellNet)
cn_setup() 

Now fetch the mouse Salmon index, the demonstration data, and the mouse cnProc:

fetch_salmon_indices(species="mouse")
download.file("https://s3.amazonaws.com/CellNet/rna_seq/mouse/examples/SRP059670/st_SRP059670_example.rda", "st_SRP059670_example.rda")
stQuery <- utils_loadObject("st_SRP059670_example.rda")
stQuery <- cn_s3_fetchFastq("CellNet","rna_seq/mouse/examples/SRP059670",stQuery,fname="fname", compressed="gz")

download.file("https://s3.amazonaws.com/CellNet/rna_seq/mouse/cnProc_MM_RS_Oct_24_2016.rda", destfile="./cnProc_MM_RS_Oct_24_2016.rda")
cnProc<-utils_loadObject("cnProc_MM_RS_Oct_24_2016.rda")
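Because cn_salmon() assumes the fastq files are in the working directory, a quick check before pre-processing can save a failed run. A minimal sketch of a hypothetical helper (not part of CellNet); it assumes the file names sit in the fname column, as in the cn_s3_fetchFastq() call above, and that those names match the files on disk, so verify this for your own sample table.

```r
# Hypothetical helper (not part of CellNet): return the fastq files named in a
# sample table that are not present in `dir`. Assumes the names are stored in
# the column "fname", as in the cn_s3_fetchFastq() call above.
missing_fastqs <- function(sampTab, col = "fname", dir = ".") {
  files <- as.vector(sampTab[[col]])
  files[!file.exists(file.path(dir, files))]
}

# Self-contained demo in a temporary directory
demo_dir <- tempfile("fastq_demo")
dir.create(demo_dir)
invisible(file.create(file.path(demo_dir, "a.fastq.gz")))
toy <- data.frame(fname = c("a.fastq.gz", "b.fastq.gz"), stringsAsFactors = FALSE)
missing_fastqs(toy, dir = demo_dir)  # "b.fastq.gz" is reported as missing

# On the instance, before cn_salmon():
# miss <- missing_fastqs(stQuery)
# if (length(miss) > 0) stop("Missing fastq files: ", paste(miss, collapse = ", "))
```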

Pre-process the fastq files. This runs Salmon to estimate transcript abundances:

expList <- cn_salmon(stQuery) ## Assumes your fastq files are in the working directory. This takes ~15 minutes on the demo data

Applying CellNet:

cnRes <- cn_apply(expList[['normalized']], stQuery, cnProc)

Interpreting Output

CellNet produces a number of outputs; the most commonly used is the cnRes (CellNet Result) object. Three figures can be created from it:

Classification Heat Map: Displays classification score of each sample (column) to each of the cell and tissue types in the training data (rows):

pdf(file='hmclass_example.pdf', width=7, height=5)
cn_HmClass(cnRes)
dev.off()

To fetch this and other files that you save on AWS/EC2, you can use scp as shown below, replacing aws_private_key and instance_public_dns with your values:

scp -i aws_private_key ec2-user@instance_public_dns:/media/ephemeral0/analysis/*.pdf ./
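The classification heat map can also be read programmatically: for each sample (column), find the cell or tissue type (row) with the highest score. The R sketch below uses a made-up score matrix in the heat map's layout; how cnRes stores its classification scores is not shown here, so extracting them into such a matrix is an assumption. A low top score (sample2 below) suggests a sample that does not strongly resemble any training type; the GRN status plot is the more sensitive follow-up.

```r
# Toy classification scores laid out like the heat map: rows are training
# cell/tissue types, columns are query samples (all values made up)
classScores <- matrix(c(0.90, 0.05, 0.02,
                        0.45, 0.30, 0.05,
                        0.10, 0.80, 0.03),
                      nrow = 3,
                      dimnames = list(c("fibroblast", "esc", "neuron"),
                                      c("sample1", "sample2", "sample3")))

# For each sample (column), the best-scoring type and its score
top_type  <- apply(classScores, 2, function(s) rownames(classScores)[which.max(s)])
top_score <- apply(classScores, 2, max)

data.frame(sample = colnames(classScores), top_type = unname(top_type),
           top_score = unname(top_score))
```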

Gene Regulatory Network Status Bar Plot: A more sensitive measure of the degree to which a particular cell type's GRN has been established in your experimental data.

fname<-'grnstats_esc_example.pdf'
bOrder<-c("fibroblast_train", unique(as.vector(stQuery$description1)), "esc_train")
cn_barplot_grnSing(cnRes,cnProc,"esc", c("fibroblast","esc"), bOrder, sidCol="sra_id")
ggplot2::ggsave(fname, width=5.5, height=5)
dev.off()

Network Influence Score Box and Whisker Plot: Suggests transcription factors that could be better regulated, ranked by their potential impact.

rownames(stQuery)<-as.vector(stQuery$sra_id)
tfScores<-cn_nis_all(cnRes, cnProc, "esc") 

fname<-'nis_esc_example_Day0.pdf'
plot_nis(tfScores, "esc", stQuery, "Day0", dLevel="description1", limitTo=0) 
ggplot2::ggsave(fname, width=4, height=12)
dev.off()

Installing CellNet

If, for some reason, you need to install CellNet anew, you can do so by using devtools:

sudo R
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
library(devtools)
install_github("pcahan1/CellNet", ref="master")
q(save='no')