
undeadpixel / Reinvent Randomized

License: MIT
Recurrent Neural Network using randomized SMILES strings to generate molecules

Programming language: Python


Implementation of the molecular generative model using randomized SMILES strings

Note 1: The version published alongside "Randomized SMILES strings improve the quality of molecular generative models" is available in the separate branch randomized_smiles.

Note 2: This repository supersedes undeadpixel/reinvent-gdb13.

This repository holds the code to create, train and sample models akin to those described in "Randomized SMILES strings improve the quality of molecular generative models" and "SMILES-based deep generative scaffold decorator for de-novo drug design". This version changes the model implementation to use packed sequences and includes several speed improvements. Support for GRU cells has also been dropped.
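As a rough illustration of what packed sequences do (this is a toy sketch, not the repository's code): instead of running the RNN over a padded batch, the sequences are sorted by length and, at each timestep, only the rows of still-active sequences are processed, so no computation is wasted on padding.

```python
def pack_sequences(seqs):
    """Toy illustration of sequence packing (as in PyTorch's
    pack_padded_sequence): sort by length, then store, for each
    timestep, the tokens of every sequence still active.  An RNN
    then processes batch_sizes[t] rows at step t instead of a
    full padded batch, skipping all padding positions."""
    seqs = sorted(seqs, key=len, reverse=True)
    max_len = len(seqs[0])
    data, batch_sizes = [], []
    for t in range(max_len):
        active = [s[t] for s in seqs if len(s) > t]
        data.extend(active)
        batch_sizes.append(len(active))
    return data, batch_sizes

# Three tokenized SMILES of different lengths
tokens = [list("CCO"), list("CC(=O)O"), list("c1ccccc1")]
data, batch_sizes = pack_sequences(tokens)
# batch_sizes shrinks as shorter sequences finish: [3, 3, 3, 2, 2, 2, 2, 1]
```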

Specifically, it includes the following:

  • Python files in the main folder: scripts to create, train, sample from, and calculate NLLs of models.
  • ./training_sets: Training set files (in canonical SMILES).

Requirements

This software has been tested on Linux with Tesla V100 GPUs. It should work on other Linux-based setups with little effort. The randomized SMILES creation script uses Spark 2.4 to parallelize the generation of SMILES. By default it runs in local mode, but further configuration may be needed.

Install

A Conda environment.yml is supplied with all the required libraries.

$> git clone <repo url>
$> cd <repo folder>
$> conda env create -f environment.yml
$> conda activate reinvent-randomized
(reinvent-randomized) $> ...

From here the general usage applies.

General Usage

Five tools are supplied. For further information about a tool's arguments, run it with -h. All output files are in TSV format (the separator is \t).

  1. Create Model (create_model.py): Creates a blank model file.
  2. Train Model (train_model.py): Trains the model with the specified parameters.
  3. Sample Model (sample_from_model.py): Samples an already trained model for a given number of SMILES. It also retrieves the log-likelihood in the process.
  4. Calculate NLL (calculate_nlls.py): Requires as input a SMILES list and outputs a SMILES list with the NLL calculated for each one. It's recommended not to use files with more than 20-30 million SMILES.
  5. Create random SMILES (create_randomized_smiles.py): From a list of canonical SMILES it creates a given number of randomized SMILES files and stores them in the folder specified as output with filenames 000.smi, 001.smi, etc.
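Since all tools write TSV output, the results are easy to post-process. The sketch below reads sampled SMILES with their NLLs using only the standard library; the two-column layout (SMILES, NLL) is an assumption here, so check it against your own output files.

```python
import csv
import io

def read_sampled(fh):
    """Read a sampled-SMILES TSV (assumed columns: SMILES, NLL)
    into a list of (smiles, nll) pairs."""
    return [(row[0], float(row[1]))
            for row in csv.reader(fh, delimiter="\t")]

# In-memory stand-in for a real output file (hypothetical values)
sample = "CCO\t12.3456\nc1ccccc1\t8.9012\n"
pairs = read_sampled(io.StringIO(sample))
```

With real data, replace the StringIO with `open("sampled.tsv")`.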

Usage examples

Create a model, train it for 100 epochs with an adaptive learning rate, and sample it, using the ChEMBL dataset (randomized SMILES).

(reinvent-randomized) $> mkdir -p chembl_randomized/models
(reinvent-randomized) $> ./create_randomized_smiles.py -i training_sets/chembl.training.smi -o chembl_randomized/training -n 100
(reinvent-randomized) $> ./create_randomized_smiles.py -i training_sets/chembl.validation.smi -o chembl_randomized/validation -n 100
(reinvent-randomized) $> ./create_model.py -i chembl_randomized/training/001.smi -o chembl_randomized/models/model.empty
(reinvent-randomized) $> ./train_model.py -i chembl_randomized/models/model.empty -o chembl_randomized/models/model.trained -s chembl_randomized/training -e 100 --lrm ada --csl chembl_randomized/tensorboard --csv chembl_randomized/validation --csn 75000
# (... wait a few days ...)
(reinvent-randomized) $> ./sample_from_model.py -m chembl_randomized/models/model.trained.100 --with-likelihood

CAUTION: When creating randomized SMILES sets, the SMILES representation changes, so some infrequent tokens may not appear in a given set. To solve this, try different subsets until you find one that contains all tokens, or create a fake file that includes all of them.
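One way to spot the problem is to compare token vocabularies between the full training set and a randomized subset. The sketch below uses an approximate regex tokenizer (bracket atoms, two-letter halogens, ring-bond %NN numbers, then single characters); the model's real vocabulary may be built differently, so treat this as a diagnostic only.

```python
import re

# Approximate SMILES tokenizer; a sketch, not the model's tokenizer.
TOKEN_RE = re.compile(r"(\[[^\]]+\]|Br|Cl|%\d{2}|.)")

def vocabulary(smiles_list):
    """Set of tokens appearing in a list of SMILES strings."""
    return {tok for smi in smiles_list for tok in TOKEN_RE.findall(smi)}

full = vocabulary(["CCO", "c1ccccc1", "C(=O)[O-]", "ClCCBr"])
subset = vocabulary(["CCO", "c1ccccc1"])
missing = full - subset   # tokens absent from this subset
```

If `missing` is non-empty for a subset, pick a different one (or add a fake SMILES covering the absent tokens) before creating the model.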

Notice that the TensorBoard data is stored in chembl_randomized/tensorboard and can be accessed (even during training) with:

(reinvent-randomized) $> tensorboard --logdir chembl_randomized/tensorboard --port 9999

Then open localhost:9999 in a browser to access the web interface.

Create a model, train it for 100 epochs with an exponentially decaying learning rate, and sample it, using 1M molecules from the GDB-13 database (canonical SMILES).

(reinvent-randomized) $> mkdir -p gdb13_exp/models
(reinvent-randomized) $> ./create_model.py -i training_sets/gdb13.1M.training.smi -o gdb13_exp/models/model.empty
(reinvent-randomized) $> ./train_model.py -i gdb13_exp/models/model.empty -o gdb13_exp/models/model.trained -s training_sets/gdb13.1M.training.smi -e 100 --lrm exp --lrg 0.9 --csl gdb13_exp/tensorboard --csv trained_models/gdb13.1M.validation.smi --csn 10000
# (... wait for some hours ...)
(reinvent-randomized) $> ./sample_from_model.py -m gdb13_exp/models/model.trained.100 --with-likelihood

Bugs, Errors, Improvements, etc...

We have tested the software, but if you find any bugs (there probably are some), don't hesitate to contact us or, even better, send a pull request or open a GitHub issue. If you have any other questions, you can contact us at [email protected] and we will be happy to answer you 😄.
