
Property Prediction with Neural Networks on Raw Molecular Graphs

This code is the basis of two works carried out at AstraZeneca: a master's thesis (odr.chalmers.se/handle/20.500.12380/256629) and a paper.

The thesis is richer in technical detail, but it is not peer reviewed and contains an erroneously generated result for the ESOL dataset. The paper contains a more thorough and carefully generated collection of results.

Related work

The four most important related papers are:

  • Gated Graph Sequence Neural Networks presents a graph neural network used as a baseline in the present work as well as in the paper below
  • Neural Message Passing for Quantum Chemistry defines the MPNN framework for graph neural networks, implemented in this code as the abstract class SummationMPNN
  • Graph Attention Networks presents a node-classification model whose message aggregation step does not fit within the MPNN framework but does fit within the slightly more general framework implemented here as the abstract class AggregationMPNN; its attention mechanism may be seen as a computationally lighter variant of the one in the present work's attention models
  • Analyzing Learned Molecular Representations for Property Prediction presents a modification of the MPNN framework that was investigated in parallel in my thesis work: message passing is carried out in the graph defined by the edge-adjacency matrix, and the states of the directed edges are then aggregated into the corresponding nodes so that an MPNN-style readout step can be carried out

The last-mentioned paper's model can be implemented by extending the abstract class EMN, which notably also permits message aggregation schemes other than simple summation.

Install dependencies

The code requires torch, rdkit and sklearn. Create a conda environment and install them, for example, by doing:

$ conda create python=3.6.8 -p ~/gnnenv
$ source activate ~/gnnenv/
$ conda install pytorch=1.0.1 cudatoolkit=9.0 -c pytorch
$ conda install rdkit=2018.09.1.0 -c rdkit
$ conda install scikit-learn=0.20.2

Note that the versions above are the exact ones I used; it may well be possible to relax these constraints. If you already have a conda environment with the three packages, you could first try running the code with that.

If you want to use the tensorboard logging option, you also need to install tensorboardX in your torch environment, and tensorboard in any environment.

Training

To see available models, do:

$ python train.py -h

To see all options available when training for example the GGNN model, do:

$ python train.py GGNN -h

A command line for training a model with some specific options may look like:

$ python train.py ENNS2V --train-set toydata/piece-of-tox21-train.csv.gz --valid-set toydata/piece-of-tox21-valid.csv.gz --test-set toydata/piece-of-tox21-test.csv.gz --loss MaskedMultiTaskCrossEntropy --score roc-auc --s2v-lstm-computations 9 --out-hidden-dim 150 --logging more --epochs 20 --learn-rate 0.0001

To run a training session on your own data files, study the examples in toydata/ to understand how to format them. For classification data, -1 represents a negative label, 0 a missing one and +1 a positive one.

Using a saved model for prediction on new compounds

Use the --savemodel flag when starting the training:

$ python train.py GGNN --epochs 3 --train-set toydata/piece-of-tox21-train.csv.gz --valid-set toydata/piece-of-tox21-valid.csv.gz --test-set toydata/piece-of-tox21-test.csv.gz --savemodel

After training is finished, a file in savedmodels/ will contain the model in the state it was in when it achieved its highest validation score. Use it to make predictions on new data by doing:

$ python predict.py --modelpath savedmodels/GGNN2019-02-22\ 12\:16\:17.432742 --score roc-auc --datapath toydata/piece-of-tox21-train-for-prediction.csv.gz

(Replace GGNN2019-02-22\ 12\:16\:17.432742 with the name of your newly saved file.) Note that the same --score argument used during training needs to be supplied for correct output scaling. The predictions are printed to stdout in CSV format; to store them in a file, append e.g. > predictions.csv to the command.

What are good hyperparameters?

For some hints, see the comments to the default hyperparameter dictionaries in train.py.

Command line examples

Regression:

$ python train.py ENNS2V --train-set toydata/piece-of-esol.csv.gz --valid-set toydata/piece-of-esol.csv.gz --test-set toydata/piece-of-esol.csv.gz --loss MSE --score RMSE --s2v-lstm-computations 9 --out-hidden-dim 150 --logging more --epochs 20 --learn-rate 0.0001

Submitting a job to slurm (if available):

$ PYTHONUNBUFFERED=1 srun -t 60 -c 2 --mem 20g -p gpu --gres gpu:1 python train.py AttentionGGNN --cuda

--mem 20g is conservative enough to never run out of memory. A suitable -t value depends entirely on the dataset, the number of epochs and the model.

Tests

To run all tests, do:

$ python -m unittest discover --verbose

For a specific one, do:

$ python -m unittest tests.test_example --verbose