PaccMann / paccmann_rl

Licence: MIT license
Code pipeline for the PaccMann^RL paper in iScience: https://www.cell.com/iscience/fulltext/S2589-0042(21)00237-6

paccmann_rl

Pipeline to reproduce the results of the PaccMann^RL paper published in iScience.

Description

In the repo we provide a conda environment and instructions to reproduce the pipeline described in the manuscript:

  1. Train a multimodal drug sensitivity predictor (source code)
  2. Train a generative model for omic profiles, also known as the PVAE (source code)
  3. Train a generative model for molecules, also known as the SVAE (source code)
  4. Train PaccMann^RL (source code)

Requirements

  • conda>=3.7
  • The following data from here:
    • The processed split data from the folder splitted_data
    • The processed gene expression data from GDSC: data/gene_expression/gdsc-rnaseq_gene-expression.csv
    • The processed SMILES from the drugs from GDSC: data/smiles/gdsc.smi
    • A pickled SMILESLanguage object (data/smiles_language_chembl_gdsc_ccle.pkl)
    • A pickled list of genes representing the panel considered in the paper (data/2128_genes.pkl)
    • A pickled pandas DataFrame containing expression values and metadata for the cell lines considered in the paper (data/gdsc_transcriptomics_for_conditional_generation.pkl)
  • The git repos linked in the previous section

NOTE: please refer to the README.md and to the manuscript for details on the datasets used and the preprocessing applied.
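The pickled files listed above can be inspected before training. The following is a minimal sketch, not part of the PaccMann code: the helpers load_pickle and summarize are hypothetical names introduced here. It assumes the packages that define the pickled classes are importable (pandas for the DataFrame, and the environment's own packages for the SMILESLanguage object). Only unpickle files obtained from a source you trust.

```python
import pickle
from pathlib import Path


def load_pickle(path):
    """Load a pickled Python object from `path` (hypothetical helper)."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)


def summarize(obj):
    """Describe a loaded object without assuming its concrete type."""
    size = len(obj) if hasattr(obj, "__len__") else None
    return {"type": type(obj).__name__, "size": size}


# Example call, assuming the data folder described in this README:
#   print(summarize(load_pickle("data/2128_genes.pkl")))
```

Run inside the activated paccmann_rl environment so that the classes referenced by the pickles can be resolved.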

Setup

Install the environment

Create a conda environment:

conda env create -f conda.yml

Activate the environment:

conda activate paccmann_rl

Download data

Download the data reported in the requirements section. From now on, we will assume that they are stored in the root of the repository in a folder called data, following this structure:

data
├── 2128_genes.pkl
├── gdsc-rnaseq_gene-expression.csv
├── gdsc.smi
├── gdsc_transcriptomics_for_conditional_generation.pkl
├── smiles_language_chembl_gdsc_ccle.pkl
└── splitted_data
    ├── gdsc_cell_line_ic50_test_fraction_0.1_id_997_seed_42.csv
    ├── gdsc_cell_line_ic50_train_fraction_0.9_id_997_seed_42.csv
    ├── tcga_rnaseq_test_fraction_0.1_id_242870585127480531622270373503581547167_seed_42.csv
    ├── tcga_rnaseq_train_fraction_0.9_id_242870585127480531622270373503581547167_seed_42.csv
    ├── test_chembl_22_clean_1576904_sorted_std_final.smi
    └── train_chembl_22_clean_1576904_sorted_std_final.smi

1 directory, 11 files

NOTE: no worries, the data folder is in the .gitignore.
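One way to confirm the layout before training is to check for each file shown in the tree above. This is a minimal sketch: the function name missing_data_files is hypothetical, and the file list is copied from the tree rather than read from any PaccMann configuration.

```python
from pathlib import Path

# File names copied from the directory tree in this README.
EXPECTED_FILES = [
    "2128_genes.pkl",
    "gdsc-rnaseq_gene-expression.csv",
    "gdsc.smi",
    "gdsc_transcriptomics_for_conditional_generation.pkl",
    "smiles_language_chembl_gdsc_ccle.pkl",
    "splitted_data/gdsc_cell_line_ic50_test_fraction_0.1_id_997_seed_42.csv",
    "splitted_data/gdsc_cell_line_ic50_train_fraction_0.9_id_997_seed_42.csv",
    "splitted_data/tcga_rnaseq_test_fraction_0.1_id_242870585127480531622270373503581547167_seed_42.csv",
    "splitted_data/tcga_rnaseq_train_fraction_0.9_id_242870585127480531622270373503581547167_seed_42.csv",
    "splitted_data/test_chembl_22_clean_1576904_sorted_std_final.smi",
    "splitted_data/train_chembl_22_clean_1576904_sorted_std_final.smi",
]


def missing_data_files(data_dir="data"):
    """Return the expected files that are absent under `data_dir`."""
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list from missing_data_files() means all 11 files are in place.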

Clone the repos

To get the scripts to run each of the components, create a code folder and clone the repos. Simply type this:

mkdir code && cd code && \
  git clone --branch 0.0.1 https://github.com/PaccMann/paccmann_predictor && \
  git clone --branch 0.0.1 https://github.com/PaccMann/paccmann_omics && \
  git clone --branch 0.0.1 https://github.com/PaccMann/paccmann_chemistry && \
  git clone --branch 0.0.1 https://github.com/PaccMann/paccmann_generator && \
  cd ..

NOTE: no worries, the code folder is in the .gitignore.

Pipeline

Now it's all set to run the full pipeline.

NOTE: the workload required to run the full pipeline is intensive, and running all the steps on a desktop or laptop might not be straightforward. For this reason, we also provide pretrained models that can be downloaded and used to run the different steps.

NOTE: in the following, we assume a folder models has been created in the root of the repository. No worries, the models folder is in the .gitignore.
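The models folder can be created, and its contents checked, with a few lines of Python. This is a sketch under stated assumptions: prepare_models_dir is a hypothetical helper, and the subfolder names svae, pvae and paccmann are inferred from the arguments of the final training command in this README, not from the PaccMann source code.

```python
from pathlib import Path

# Assumption: subfolder names taken from the arguments that the
# PaccMann^RL training command receives (./models/svae, ./models/pvae,
# ./models/paccmann).
REQUIRED_MODELS = ("svae", "pvae", "paccmann")


def prepare_models_dir(models_dir="models"):
    """Create `models_dir` if needed and report which models are present."""
    root = Path(models_dir)
    root.mkdir(parents=True, exist_ok=True)
    return {name: (root / name).is_dir() for name in REQUIRED_MODELS}
```

Any entry reported as False would need to be trained, or replaced by a downloaded pretrained model, before running the last step.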

Multimodal drug sensitivity predictor

(paccmann_rl) $ python ./code/paccmann_predictor/examples/train_paccmann.py \
    ./data/splitted_data/gdsc_cell_line_ic50_train_fraction_0.9_id_997_seed_42.csv \
    ./data/splitted_data/gdsc_cell_line_ic50_test_fraction_0.1_id_997_seed_42.csv \
    ./data/gdsc-rnaseq_gene-expression.csv \
    ./data/gdsc.smi \
    ./data/2128_genes.pkl \
    ./data/smiles_language_chembl_gdsc_ccle.pkl \
    ./models/ \
    ./code/paccmann_predictor/examples/example_params.json paccmann

PVAE

(paccmann_rl) $ python ./code/paccmann_omics/examples/train_vae.py \
    ./data/splitted_data/tcga_rnaseq_train_fraction_0.9_id_242870585127480531622270373503581547167_seed_42.csv \
    ./data/splitted_data/tcga_rnaseq_test_fraction_0.1_id_242870585127480531622270373503581547167_seed_42.csv \
    ./data/2128_genes.pkl \
    ./models/ \
    ./code/paccmann_omics/examples/example_params.json pvae

SVAE

(paccmann_rl) $ python ./code/paccmann_chemistry/examples/train_vae.py \
    ./data/splitted_data/train_chembl_22_clean_1576904_sorted_std_final.smi \
    ./data/splitted_data/test_chembl_22_clean_1576904_sorted_std_final.smi \
    ./data/smiles_language_chembl_gdsc_ccle.pkl \
    ./models/ \
    ./code/paccmann_chemistry/examples/example_params.json svae

PaccMann^RL

(paccmann_rl) $ python ./code/paccmann_generator/examples/train_paccmann_rl.py \
    ./models/svae \
    ./models/pvae \
    ./models/paccmann \
    ./data/smiles_language_chembl_gdsc_ccle.pkl \
    ./data/gdsc_transcriptomics_for_conditional_generation.pkl \
    ./code/paccmann_generator/examples/example_params.json \
    paccmann_rl breast

NOTE: this will create a biased_models folder containing the conditional generator and the baseline SMILES generator used. In this case: breast_paccmann_rl and baseline. No worries, the biased_models folder is in the .gitignore.
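The four commands above can also be assembled and run in order from a single script. The following is a minimal sketch, not a script shipped with the repositories: build_commands and run_pipeline are hypothetical names, the paths are copied from the commands in this README, and the parameter name site assumes that the last argument of the final command (breast) selects the cancer site.

```python
import shlex
import subprocess

DATA, CODE, MODELS = "./data", "./code", "./models/"
SPLITS = DATA + "/splitted_data"
LANGUAGE = DATA + "/smiles_language_chembl_gdsc_ccle.pkl"
GENES = DATA + "/2128_genes.pkl"
GDSC = SPLITS + "/gdsc_cell_line_ic50_{}_id_997_seed_42.csv"
TCGA = (SPLITS + "/tcga_rnaseq_{}_id_"
        "242870585127480531622270373503581547167_seed_42.csv")
CHEMBL = SPLITS + "/{}_chembl_22_clean_1576904_sorted_std_final.smi"
TRAIN, TEST = "train_fraction_0.9", "test_fraction_0.1"


def build_commands(site="breast"):
    """Return the four training commands in the order this README runs them."""
    def example(repo, name):
        return f"{CODE}/{repo}/examples/{name}"

    def params(repo):
        return example(repo, "example_params.json")

    return [
        ["python", example("paccmann_predictor", "train_paccmann.py"),
         GDSC.format(TRAIN), GDSC.format(TEST),
         DATA + "/gdsc-rnaseq_gene-expression.csv", DATA + "/gdsc.smi",
         GENES, LANGUAGE, MODELS, params("paccmann_predictor"), "paccmann"],
        ["python", example("paccmann_omics", "train_vae.py"),
         TCGA.format(TRAIN), TCGA.format(TEST),
         GENES, MODELS, params("paccmann_omics"), "pvae"],
        ["python", example("paccmann_chemistry", "train_vae.py"),
         CHEMBL.format("train"), CHEMBL.format("test"),
         LANGUAGE, MODELS, params("paccmann_chemistry"), "svae"],
        ["python", example("paccmann_generator", "train_paccmann_rl.py"),
         "./models/svae", "./models/pvae", "./models/paccmann", LANGUAGE,
         DATA + "/gdsc_transcriptomics_for_conditional_generation.pkl",
         params("paccmann_generator"), "paccmann_rl", site],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; also execute it when dry_run is False."""
    rendered = []
    for command in commands:
        line = " ".join(shlex.quote(arg) for arg in command)
        rendered.append(line)
        print(line)
        if not dry_run:
            subprocess.run(command, check=True)  # stop at the first failure
    return rendered
```

Calling run_pipeline(build_commands()) only prints the commands. Pass dry_run=False, from the root of the repository with the paccmann_rl environment active, to execute them one after another.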

References

If you use paccmann_rl in your projects, please cite the following:

@article{born2021paccmannrl,
  title = {PaccMann\textsuperscript{RL}: De novo generation of hit-like anticancer molecules from transcriptomic data via reinforcement learning},
  journal = {iScience},
  volume = {24},
  number = {4},
  pages = {102269},
  year = {2021},
  issn = {2589-0042},
  doi = {https://doi.org/10.1016/j.isci.2021.102269},
  url = {https://www.cell.com/iscience/fulltext/S2589-0042(21)00237-6},
  author = {Born, Jannis and Manica, Matteo and Oskooei, Ali and Cadow, Joris and Markert, Greta and {Rodr{\'{i}}guez Mart{\'{i}}nez}, Mar{\'{i}}a}
}