
undeadpixel / Reinvent Randomized

License: MIT
Recurrent Neural Network using randomized SMILES strings to generate molecules

Programming language: Python


Implementation of the molecular generative model using randomized SMILES strings

Note 1: The version published alongside "Randomized SMILES strings improve the quality of molecular generative models" is available in the separate branch randomized_smiles.

Note 2: This repository supersedes undeadpixel/reinvent-gdb13.

This repository holds the code to create, train and sample models akin to those described in "Randomized SMILES strings improve the quality of molecular generative models" and "SMILES-based deep generative scaffold decorator for de-novo drug design". This version changes the model implementation to use packed sequences and includes several speed improvements. Support for GRU cells has also been dropped.
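As a rough illustration of what packed sequences do (this is a toy sketch, not the repository's code): instead of running the RNN over a padded batch, the sequences are sorted by length and, at each timestep, only the rows of still-active sequences are processed, so no computation is wasted on padding.

```python
def pack_sequences(seqs):
    """Toy illustration of sequence packing (as in PyTorch's
    pack_padded_sequence): sort by length, then store, for each
    timestep, the tokens of every sequence still active.  An RNN
    then processes batch_sizes[t] rows at step t instead of a
    full padded batch, skipping all padding positions."""
    seqs = sorted(seqs, key=len, reverse=True)
    max_len = len(seqs[0])
    data, batch_sizes = [], []
    for t in range(max_len):
        active = [s[t] for s in seqs if len(s) > t]
        data.extend(active)
        batch_sizes.append(len(active))
    return data, batch_sizes

# Three tokenized SMILES of different lengths
tokens = [list("CCO"), list("CC(=O)O"), list("c1ccccc1")]
data, batch_sizes = pack_sequences(tokens)
# batch_sizes shrinks as shorter sequences finish: [3, 3, 3, 2, 2, 2, 2, 1]
```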

Specifically, it includes the following:

  • Python files in the main folder: scripts to create, train, sample from, and calculate NLLs of models.
  • ./training_sets: Training set files (in canonical SMILES).

Requirements

This software has been tested on Linux with Tesla V100 GPUs. It should work on other Linux-based setups with little effort. The randomized SMILES creation script uses Spark 2.4 to parallelize the generation of SMILES. By default it runs in local mode, but further configuration may be needed.

Install

A Conda environment.yml is supplied with all the required libraries.

$> git clone <repo url>
$> cd <repo folder>
$> conda env create -f environment.yml
$> conda activate reinvent-randomized
(reinvent-randomized) $> ...

From here the general usage applies.

General Usage

Five tools are supplied. For further information about a tool's arguments, run it with -h. All output files are in TSV format (the separator is \t).

  1. Create Model (create_model.py): Creates a blank model file.
  2. Train Model (train_model.py): Trains the model with the specified parameters.
  3. Sample Model (sample_from_model.py): Samples an already trained model for a given number of SMILES. It also retrieves the log-likelihood in the process.
  4. Calculate NLL (calculate_nlls.py): Requires as input a SMILES list and outputs a SMILES list with the NLL calculated for each one. It's recommended not to use files with more than 20-30 million SMILES.
  5. Create random SMILES (create_randomized_smiles.py): From a list of canonical SMILES it creates a given number of randomized SMILES files and stores them in the folder specified as output with filenames 000.smi, 001.smi, etc.
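Since all tools write TSV output, the results are easy to post-process. The sketch below reads sampled SMILES with their NLLs using only the standard library; the two-column layout (SMILES, NLL) is an assumption here, so check it against your own output files.

```python
import csv
import io

def read_sampled(fh):
    """Read a sampled-SMILES TSV (assumed columns: SMILES, NLL)
    into a list of (smiles, nll) pairs."""
    return [(row[0], float(row[1]))
            for row in csv.reader(fh, delimiter="\t")]

# In-memory stand-in for a real output file (hypothetical values)
sample = "CCO\t12.3456\nc1ccccc1\t8.9012\n"
pairs = read_sampled(io.StringIO(sample))
```

With real data, replace the StringIO with `open("sampled.tsv")`.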

Usage examples

Create a model, train it for 100 epochs with an adaptive learning rate, and sample it, using the ChEMBL dataset (randomized SMILES).

(reinvent-randomized) $> mkdir -p chembl_randomized/models
(reinvent-randomized) $> ./create_randomized_smiles.py -i training_sets/chembl.training.smi -o chembl_randomized/training -n 100
(reinvent-randomized) $> ./create_randomized_smiles.py -i training_sets/chembl.validation.smi -o chembl_randomized/validation -n 100
(reinvent-randomized) $> ./create_model.py -i chembl_randomized/training/001.smi -o chembl_randomized/models/model.empty
(reinvent-randomized) $> ./train_model.py -i chembl_randomized/models/model.empty -o chembl_randomized/models/model.trained -s chembl_randomized/training -e 100 --lrm ada --csl chembl_randomized/tensorboard --csv chembl_randomized/validation --csn 75000
# (... wait a few days ...)
(reinvent-randomized) $> ./sample_from_model.py -m chembl_randomized/models/model.trained.100 --with-likelihood

CAUTION: When creating randomized SMILES sets, the SMILES representation changes, so some infrequent tokens may not appear in a given set. To solve this, try different subsets until you find one that contains all tokens, or create a fake file that includes all of them.
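One way to spot the problem is to compare token vocabularies between the full training set and a randomized subset. The sketch below uses an approximate regex tokenizer (bracket atoms, two-letter halogens, ring-bond %NN numbers, then single characters); the model's real vocabulary may be built differently, so treat this as a diagnostic only.

```python
import re

# Approximate SMILES tokenizer; a sketch, not the model's tokenizer.
TOKEN_RE = re.compile(r"(\[[^\]]+\]|Br|Cl|%\d{2}|.)")

def vocabulary(smiles_list):
    """Set of tokens appearing in a list of SMILES strings."""
    return {tok for smi in smiles_list for tok in TOKEN_RE.findall(smi)}

full = vocabulary(["CCO", "c1ccccc1", "C(=O)[O-]", "ClCCBr"])
subset = vocabulary(["CCO", "c1ccccc1"])
missing = full - subset   # tokens absent from this subset
```

If `missing` is non-empty for a subset, pick a different one (or add a fake SMILES covering the absent tokens) before creating the model.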

Notice that the TensorBoard data is stored in chembl_randomized/tensorboard and can be accessed (even during training) with:

(reinvent-randomized) $> tensorboard --logdir chembl_randomized/tensorboard --port 9999

Then open localhost:9999 in a browser to access the web interface.

Create a model, train it for 100 epochs with an exponentially decaying learning rate, and sample it, using 1M molecules from the GDB-13 database (canonical SMILES).

(reinvent-randomized) $> mkdir -p gdb13_exp/models
(reinvent-randomized) $> ./create_model.py -i training_sets/gdb13.1M.training.smi -o gdb13_exp/models/model.empty
(reinvent-randomized) $> ./train_model.py -i gdb13_exp/models/model.empty -o gdb13_exp/models/model.trained -s training_sets/gdb13.1M.training.smi -e 100 --lrm exp --lrg 0.9 --csl gdb13_exp/tensorboard --csv trained_models/gdb13.1M.validation.smi --csn 10000
# (... wait for some hours ...)
(reinvent-randomized) $> ./sample_from_model.py -m gdb13_exp/models/model.trained.100 --with-likelihood

Bugs, Errors, Improvements, etc...

We have tested the software, but if you find any bugs (there probably are some), don't hesitate to contact us or, even better, send a pull request or open a GitHub issue. If you have any other questions, you can contact us at [email protected] and we will be happy to answer you 😄.
