
topazape / Lstm_chem

License: The Unlicense
Implementation of the paper - Generative Recurrent Networks for De Novo Drug Design.

Programming Languages

python3

Projects that are alternatives of or similar to Lstm_chem

Pytorch Seq2seq
Tutorials on implementing a few sequence-to-sequence (seq2seq) models with PyTorch and TorchText.
Stars: ✭ 3,418 (+3828.74%)
Mutual labels:  jupyter-notebook, lstm, rnn
Neural Networks
All about Neural Networks!
Stars: ✭ 34 (-60.92%)
Mutual labels:  jupyter-notebook, lstm, rnn
Lstm Human Activity Recognition
Human Activity Recognition example using TensorFlow on smartphone sensors dataset and an LSTM RNN. Classifying the type of movement amongst six activity categories - Guillaume Chevalier
Stars: ✭ 2,943 (+3282.76%)
Mutual labels:  jupyter-notebook, lstm, rnn
Stylenet
A cute multi-layer LSTM that can perform like a human 🎶
Stars: ✭ 187 (+114.94%)
Mutual labels:  jupyter-notebook, lstm, rnn
Machine Learning
My Attempt(s) In The World Of ML/DL....
Stars: ✭ 78 (-10.34%)
Mutual labels:  jupyter-notebook, lstm, rnn
Natural Language Processing With Tensorflow
Natural Language Processing with TensorFlow, published by Packt
Stars: ✭ 222 (+155.17%)
Mutual labels:  jupyter-notebook, lstm, rnn
Thesemicolon
This repository contains IPython notebooks and datasets for the data analytics YouTube tutorials on The Semicolon.
Stars: ✭ 345 (+296.55%)
Mutual labels:  jupyter-notebook, lstm, rnn
Poetry Seq2seq
Chinese Poetry Generation
Stars: ✭ 159 (+82.76%)
Mutual labels:  jupyter-notebook, lstm, rnn
Bitcoin Price Prediction Using Lstm
Bitcoin price prediction (time series) using an LSTM recurrent neural network
Stars: ✭ 67 (-22.99%)
Mutual labels:  jupyter-notebook, lstm, rnn
Video Classification
Tutorial for video classification/action recognition using 3D CNN and CNN+RNN on UCF101
Stars: ✭ 543 (+524.14%)
Mutual labels:  jupyter-notebook, lstm, rnn
Rnn Notebooks
RNN (SimpleRNN, LSTM, GRU) TensorFlow 2.0 & Keras notebooks (workshop materials)
Stars: ✭ 48 (-44.83%)
Mutual labels:  jupyter-notebook, lstm, rnn
Stockpriceprediction
Stock Price Prediction using Machine Learning Techniques
Stars: ✭ 700 (+704.6%)
Mutual labels:  jupyter-notebook, lstm, rnn
Rnn For Joint Nlu
Pytorch implementation of "Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling" (https://arxiv.org/abs/1609.01454)
Stars: ✭ 176 (+102.3%)
Mutual labels:  jupyter-notebook, lstm, rnn
Pytorch Sentiment Analysis
Tutorials on getting started with PyTorch and TorchText for sentiment analysis.
Stars: ✭ 3,209 (+3588.51%)
Mutual labels:  jupyter-notebook, lstm, rnn
Load forecasting
Load forecasting on Delhi-area electric power load using ARIMA, RNN, LSTM, and GRU models
Stars: ✭ 160 (+83.91%)
Mutual labels:  jupyter-notebook, lstm, rnn
Deeplearning.ai Assignments
Stars: ✭ 268 (+208.05%)
Mutual labels:  jupyter-notebook, lstm, rnn
Linear Attention Recurrent Neural Network
A recurrent attention module consisting of an LSTM cell that can query its own past cell states by means of windowed multi-head attention. The formulas are derived from the BN-LSTM and the Transformer Network. The LARNN cell with attention can be easily used inside a loop on the cell state, just like any other RNN. (LARNN)
Stars: ✭ 119 (+36.78%)
Mutual labels:  jupyter-notebook, lstm, rnn
Chinese Chatbot
A Chinese chatbot trained on 100,000 dialogue pairs, using an attention mechanism; it generates a meaningful reply to most everyday questions. The trained model has been uploaded and can be run directly; if it doesn't run, I'll livestream myself eating my keyboard.
Stars: ✭ 124 (+42.53%)
Mutual labels:  jupyter-notebook, lstm, rnn
Easy Deep Learning With Keras
Keras tutorial for beginners (using TF backend)
Stars: ✭ 367 (+321.84%)
Mutual labels:  jupyter-notebook, lstm, rnn
Telemanom
A framework for using LSTMs to detect anomalies in multivariate time series data. Includes spacecraft anomaly data and experiments from the Mars Science Laboratory and SMAP missions.
Stars: ✭ 589 (+577.01%)
Mutual labels:  jupyter-notebook, lstm, rnn

LSTM_Chem

This is the implementation of the paper - Generative Recurrent Networks for De Novo Drug Design.

Changelog

2020-03-25

  • Changed the code to use TensorFlow 2.1.0 (tf.keras)

2019-12-23

  • Reimplemented all code to use TensorFlow 2.0.0 (tf.keras)
  • Changed data_loader to use a generator to reduce memory usage
  • Removed some unused atoms and symbols
  • Changed directory layout

Requirements

This model is built with Python 3.7 and uses the following packages:

  • numpy 1.18.2
  • tensorflow 2.1.0
  • tqdm 4.43.0
  • Bunch 1.0.1
  • matplotlib 3.1.2
  • RDKit 2019.09.3
  • scikit-learn 0.22.2.post1

I strongly recommend using the GPU version of TensorFlow: training this model on the full dataset is very slow in CPU mode (about 9 hours per epoch). Since TensorFlow 2.1.0 depends on CUDA 10.1, make sure your environment provides that CUDA version.
RDKit and matplotlib are used for SMILES cleanup, validation, and visualization of molecules and their properties. To install RDKit, I strongly recommend using Anaconda (see this document); building RDKit from source is hard.
Scikit-learn is used for PCA.
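For reference, a minimal environment setup could look like the following, assuming conda is available. The channel and exact pins here are suggestions, not a tested recipe; use whatever combination resolves on your platform.

$ conda create -n lstm_chem python=3.7
$ conda activate lstm_chem
$ conda install -c rdkit rdkit=2019.09.3
$ pip install numpy==1.18.2 tensorflow==2.1.0 tqdm==4.43.0 bunch==1.0.1 matplotlib==3.1.2 scikit-learn==0.22.2.post1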

Usage

Training

Just run the command below. Note that with the default settings all of the data is used, so training will take a long time. If you don't have enough time, set data_length to a smaller value in base_config.json.

$ python train.py

After training, experiments/{exp_name}/{YYYY-mm-dd}/config.json is generated. It is a copy of base_config.json with additional settings for internal variables. Since it is used for generation, be careful when editing it.

Generation

See example_Randomly_generate_SMILES.ipynb.
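The notebook loads the trained weights via the generated config.json and samples SMILES strings token by token, using the sampling_temp temperature from the configuration. A minimal, hypothetical sketch of that flow is below; the names process_config, LSTMChem, LSTMChemGenerator, and the sample(num=...) signature are assumptions inferred from this repository's layout, so check the notebook for the authoritative version.

from lstm_chem.utils.config import process_config
from lstm_chem.model import LSTMChem
from lstm_chem.generator import LSTMChemGenerator

# path to the config.json written during training (example path)
config = process_config('experiments/2020-03-25/LSTM_Chem/config.json')

# build the model in generation mode and wrap it in a sampler
modeler = LSTMChem(config, session='generate')
generator = LSTMChemGenerator(modeler)

# sample 100 SMILES strings; validity can then be checked with RDKit
sampled_smiles = generator.sample(num=100)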

Fine-tuning

See example_Fine-tuning_for_TRPM8.ipynb.
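Fine-tuning simply continues training the already-trained model on the small TRPM8 set (finetune_epochs 12, finetune_batch_size 1; see the configuration below). As with generation, the following is a hypothetical sketch: LSTMChemFinetuner and the DataLoader data_type argument are assumptions, and the notebook is the reference.

from lstm_chem.utils.config import process_config
from lstm_chem.model import LSTMChem
from lstm_chem.data_loader import DataLoader
from lstm_chem.finetuner import LSTMChemFinetuner

config = process_config('experiments/2020-03-25/LSTM_Chem/config.json')

# rebuild the model with the trained weights, then train further on the
# fine-tuning SMILES file referenced by finetune_filename
modeler = LSTMChem(config, session='finetune')
finetune_data = DataLoader(config, data_type='finetune')
finetuner = LSTMChemFinetuner(modeler, finetune_data)
finetuner.finetune()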

Details

Configuration

See base_config.json. If you want to change any parameters, edit this file before training.

parameter | meaning
exp_name | experiment name (default: LSTM_Chem)
data_filename | filepath of the training data (SMILES file with newline as delimiter)
data_length | number of SMILES used for training; if set to 0, all the data is used (default: 0)
units | size of the hidden state vector of the two LSTM layers (default: 256, see the paper)
num_epochs | number of epochs (default: 22, see the paper)
optimizer | optimizer (default: adam)
seed | random seed (default: 71)
batch_size | batch size (default: 256)
validation_split | split ratio for validation (default: 0.10)
verbose_training | verbosity mode during training (default: True)
checkpoint_monitor | quantity to monitor (default: val_loss)
checkpoint_mode | one of {auto, min, max} (default: min)
checkpoint_save_best_only | if True, the latest best model according to the monitored quantity will not be overwritten (default: False)
checkpoint_save_weights_only | if True, only the model's weights will be saved (default: True)
checkpoint_verbose | verbosity mode of ModelCheckpoint (default: 1)
tensorboard_write_graph | whether to visualize the graph in TensorBoard (default: True)
sampling_temp | sampling temperature (default: 0.75, see the paper)
smiles_max_length | maximum length of generated SMILES in symbols (default: 128)
finetune_epochs | number of epochs for fine-tuning (default: 12, see the paper)
finetune_batch_size | batch size for fine-tuning (default: 1)
finetune_filename | filepath of the fine-tuning data (SMILES file with newline as delimiter)
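Put together, an illustrative base_config.json built from the defaults above might look like the following. Treat it as a sketch assembled from this table (the file paths are examples based on the dataset files described below), not a verbatim copy of the shipped file.

{
  "exp_name": "LSTM_Chem",
  "data_filename": "datasets/dataset_cleansed.smi",
  "data_length": 0,
  "units": 256,
  "num_epochs": 22,
  "optimizer": "adam",
  "seed": 71,
  "batch_size": 256,
  "validation_split": 0.10,
  "verbose_training": true,
  "checkpoint_monitor": "val_loss",
  "checkpoint_mode": "min",
  "checkpoint_save_best_only": false,
  "checkpoint_save_weights_only": true,
  "checkpoint_verbose": 1,
  "tensorboard_write_graph": true,
  "sampling_temp": 0.75,
  "smiles_max_length": 128,
  "finetune_epochs": 12,
  "finetune_batch_size": 1,
  "finetune_filename": "datasets/TRPM8_inhibitors_for_fine-tune.smi"
}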

Preparing Dataset

Get the database from ChEMBL

Download the SQLite dump for ChEMBL25 (ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/latest/chembl_25_sqlite.tar.gz), which is 3.3 GB compressed and 16 GB uncompressed.
Unpack it the usual way, cd into the resulting directory, and open the database with the sqlite3 console, as shown below.
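For example (the URL is the one above; the directory layout inside the tarball is an assumption, so adjust the cd to whatever the archive actually extracts):

$ wget ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/latest/chembl_25_sqlite.tar.gz
$ tar xzf chembl_25_sqlite.tar.gz
$ cd chembl_25/chembl_25_sqlite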

Extract SMILES for training

$ sqlite3 chembl_25.db
SQLite version 3.30.1 2019-10-10 20:19:45
Enter ".help" for usage hints.
sqlite> .output dataset.smi

The following SQL query extracts the SMILES strings annotated with nM activities.

SELECT
  DISTINCT canonical_smiles
FROM
  compound_structures
WHERE
  molregno IN (
    SELECT
      DISTINCT molregno
    FROM
      activities
    WHERE
      standard_type IN ("Kd", "Ki", "Kb", "IC50", "EC50")
      AND standard_units = "nM"
      AND standard_value < 1000
      AND standard_relation IN ("<", "<<", "<=", "=")
    INTERSECT
    SELECT
      molregno
    FROM
      molecule_dictionary
    WHERE
      molecule_type = "Small molecule"
  );

You can get 556134 SMILES in dataset.smi. According to the paper, the dataset was preprocessed so that duplicates, salts, and stereochemical information were removed, and only SMILES strings of 34 to 74 tokens were kept. I made a SMILES cleanup script that applies the same preprocessing. Run the following to get the cleansed SMILES; it takes about 10 minutes or more.

$ python cleanup_smiles.py datasets/dataset.smi datasets/dataset_cleansed.smi

You can get 438552 SMILES. This dataset is used for training.
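For reference, the core of that cleanup is sketched below. This is a minimal illustration of the preprocessing just described, not the actual cleanup_smiles.py; in particular, plain character length stands in here for the paper's token length.

from rdkit import Chem

def cleanup(smiles, min_len=34, max_len=74):
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None  # skip unparsable SMILES
    # canonicalize and drop stereochemical information
    smi = Chem.MolToSmiles(mol, isomericSmiles=False)
    # desalt: keep only the largest fragment
    smi = max(smi.split('.'), key=len)
    # enforce the paper's 34-74 length window
    if not (min_len <= len(smi) <= max_len):
        return None
    return smi

Collecting the surviving strings into a set also removes the duplicates.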

SMILES for fine-tuning

The paper shows 5 TRPM8 antagonists for fine-tuning.

FC(F)(F)c1ccccc1-c1cc(C(F)(F)F)c2[nH]c(C3=NOC4(CCCCC4)C3)nc2c1
O=C(Nc1ccc(OC(F)(F)F)cc1)N1CCC2(CC1)CC(O)c1cccc(Cl)c1O2
O=C(O)c1ccc(S(=O)(=O)N(Cc2ccc(C(F)(F)C3CC3)c(F)c2)c2ncc3ccccc3c2C2CC2)cc1
Cc1cccc(COc2ccccc2C(=O)N(CCCN)Cc2cccs2)c1
CC(c1ccc(F)cc1F)N(Cc1cccc(C(=O)O)c1)C(=O)c1cc2ccccc2cn1

You can see this in datasets/TRPM8_inhibitors_for_fine-tune.smi.

Extract known TRPM8 inhibitors from ChEMBL25

Open the database with the sqlite3 console.

$ sqlite3 chembl_25.db
SQLite version 3.30.1 2019-10-10 20:19:45
Enter ".help" for usage hints.
sqlite> .output known-TRPM8-inhibitors.smi

Then issue the following SQL query. I set the maximum IC50 activity to 10 uM (10000 nM).

SELECT
  DISTINCT canonical_smiles
FROM
  activities,
  compound_structures
WHERE
  assay_id IN (
    SELECT
      assay_id
    FROM
      assays
    WHERE
      tid IN (
        SELECT
          tid
        FROM
          target_dictionary
        WHERE
          pref_name = "Transient receptor potential cation channel subfamily M member 8"
      )
  )
  AND standard_type = "IC50"
  AND standard_units = "nM"
  AND standard_value < 10000
  AND standard_relation IN ("<", "<<", "<=", "=")
  AND activities.molregno = compound_structures.molregno;

You can get 494 known TRPM8 inhibitors. As described above, clean up the TRPM8 inhibitor SMILES. Use the -ft option to ignore the SMILES token-length restriction.

$ python cleanup_smiles.py -ft datasets/known-TRPM8-inhibitors.smi datasets/known_TRPM8-inhibitors_cleansed.smi

You can get 477 SMILES. I used these only to visualize the results of fine-tuning, as illustrated below.
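As an illustration of that visualization, known and generated molecules can be projected into a shared PCA space. The sketch below assumes Morgan fingerprints as the molecular descriptor, which the notebook may or may not use.

import numpy as np
import matplotlib.pyplot as plt
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.decomposition import PCA

def fingerprints(smiles_list):
    # Morgan (ECFP4-like) bit vectors for every parsable SMILES
    fps = []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is not None:
            fps.append(list(AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)))
    return np.array(fps)

known = open('datasets/known_TRPM8-inhibitors_cleansed.smi').read().splitlines()
X = fingerprints(known)

# project onto the first two principal components and plot
coords = PCA(n_components=2).fit_transform(X)
plt.scatter(coords[:, 0], coords[:, 1], s=10)
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.show()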
