njpipeorgan / L1000-bayesian

License: Apache-2.0
L1000 peak deconvolution based on Bayesian analysis

Overview

This project aims to generate high-quality perturbagen signatures from the LINCS L1000 assay. We built a pipeline, in parallel with that of the L1000 group, to process raw fluorescence intensity data into z-scores that serve as perturbagen signatures. Pre-computed datasets covering the majority of LINCS L1000 Phase I and Phase II are available under Downloads and on Zenodo.

Our pipeline differs from the L1000 pipeline mainly in the peak deconvolution algorithm. We implement the algorithm in both C++ and CUDA, so it can be used from various languages. We give two examples: how to use these functions natively in C++, and how to call them from Wolfram Mathematica.

We have also prepared a small batch of real data and the relevant code so that you can test our pipeline at a very small scale. You may follow the instructions, run the pipeline, and check the results.

Datasets

Summary

LINCS L1000 Phase I (GSE92742) and Phase II (GSE70138) datasets generated by our pipeline are currently available. The datasets cover three levels: our Level 4 and Level 5 data are equivalent to the Level 4 and Level 5 data provided by L1000; the marginal distribution data of peak locations (GSE92742 small-molecule treatments only, and GSE70138) are similar to L1000 Level 2 data, except that they are probability distributions rather than precise values of the peak locations.

Unless you want to handle z-score inference and combination yourself, we encourage you to use the z-scores combined by bio-replicates (Level 5 data).

Downloads

Description | Download
Marginal distributions of peak locations | Bayesian_GSE70138_Level2_DPEAK.zip, Bayesian_GSE92742_Level2_DPEAK.zip
Plate control z-scores | Bayesian_GSE70138_Level4_ZSPC_n335465x978.h5, Bayesian_GSE92742_Level4_ZSPC_n1093191x978.h5
Combined z-scores by bio-replicates | Bayesian_GSE70138_Level5_COMPZ_n116218x978.h5, Bayesian_GSE92742_Level5_COMPZ_n361481x978.h5
Checksum | Bayesian_L1000_sha512sum.txt
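
After downloading, it is worth checking each file against the published checksum list. The sketch below is one way to do this in Python with only the standard library. It assumes that Bayesian_L1000_sha512sum.txt uses the usual `sha512sum` layout of one "<hex digest>  <filename>" pair per line; that layout is an assumption, not something stated here. The function names are hypothetical helpers, not part of the project.

```python
import hashlib


def sha512_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksum_lines(lines):
    """Parse sha512sum-style lines into a {filename: hex digest} dict.

    Assumes each non-empty line is "<hex digest>  <filename>"; a leading
    "*" on the filename (binary-mode marker) is dropped.
    """
    expected = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        hex_digest, name = line.split(None, 1)
        expected[name.lstrip("*")] = hex_digest.lower()
    return expected


def verify(path, name, expected):
    """True if the file at `path` matches the digest listed under `name`."""
    return sha512_of_file(path) == expected.get(name)
```

A typical use would be to read the checksum file with `open(...).readlines()`, pass the lines to `parse_checksum_lines`, and call `verify` once per downloaded file; a file that is not listed simply fails verification.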

The metadata are available from the data published by the L1000 group: GSE70138 and GSE92742. They include the perturbagen and cell line information associated with the signature and instance IDs in the datasets.

Data structures

The z-score results (HDF5 files) are compatible with those published by the L1000 group. Each file contains three datasets:

  • /colid are the signature IDs (Level 5) or instance IDs (Level 4);

  • /rowid are the names of landmark genes;

  • /data are the z-scores as a matrix.

Each marginal distribution file contains the peak location information for one plate. It contains four datasets:

  • /colid are the instance IDs;

  • /rowid are the names of landmark genes;

  • /peakloc are the peak locations used for calculating the likelihood function;

  • /data are encoded log-likelihoods as a rank-3 array of 16-bit unsigned integers. To retrieve the log-likelihoods, the values should be multiplied by a factor of -0.001. Note that they are not normalized.
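
The decoding step for /data can be sketched as follows. This is a minimal illustration, not the project's own code: it works on a single one-dimensional slice of encoded values and assumes that slice runs along the axis indexed by /peakloc (the axis order of the rank-3 array is not stated here). The -0.001 factor comes from the description above; the normalization into probabilities via log-sum-exp is an added convenience, and the function names are hypothetical.

```python
import math

SCALE = -0.001  # decoding factor given in the dataset description


def decode_log_likelihoods(encoded):
    """Convert 16-bit unsigned integer codes into unnormalized log-likelihoods."""
    return [SCALE * value for value in encoded]


def normalize(log_likelihoods):
    """Turn unnormalized log-likelihoods into probabilities that sum to 1.

    Subtracting the maximum before exponentiating (log-sum-exp) avoids
    underflow when the log-likelihoods are strongly negative.
    """
    peak = max(log_likelihoods)
    weights = [math.exp(x - peak) for x in log_likelihoods]
    total = sum(weights)
    return [w / total for w in weights]


# Example with made-up codes: smaller codes mean higher likelihood.
codes = [0, 1000, 65535]
probabilities = normalize(decode_log_likelihoods(codes))
```

Because the stored values are multiplied by a negative factor, a code of 0 corresponds to the highest log-likelihood and 65535 to the lowest representable one.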

Citation

Qiu, Yue, et al., 2020, Bioinformatics, 36(9), 2787, https://doi.org/10.1093/bioinformatics/btaa064
