MolecularAI / deep-molecular-optimization

Licence: Apache-2.0 License
Molecular optimization by capturing chemist’s intuition using the Seq2Seq with attention and the Transformer

Projects that are alternatives to or similar to deep-molecular-optimization

transformer
Neutron: A pytorch based implementation of Transformer and its variants.
Stars: ✭ 60 (+0%)
Mutual labels:  transformer, seq2seq
Pytorch Seq2seq
Tutorials on implementing a few sequence-to-sequence (seq2seq) models with PyTorch and TorchText.
Stars: ✭ 3,418 (+5596.67%)
Mutual labels:  transformer, seq2seq
Tensorflow Ml Nlp
Natural language processing starting with TensorFlow and machine learning (from logistic regression to a Transformer chatbot)
Stars: ✭ 176 (+193.33%)
Mutual labels:  transformer, seq2seq
Asr
Stars: ✭ 54 (-10%)
Mutual labels:  transformer, seq2seq
Embedding
Embedding model code and a summary of study notes
Stars: ✭ 25 (-58.33%)
Mutual labels:  transformer, seq2seq
Multiturndialogzoo
Multi-turn dialogue baselines written in PyTorch
Stars: ✭ 106 (+76.67%)
Mutual labels:  transformer, seq2seq
Paddlenlp
NLP Core Library and Model Zoo based on PaddlePaddle 2.0
Stars: ✭ 212 (+253.33%)
Mutual labels:  transformer, seq2seq
Joeynmt
Minimalist NMT for educational purposes
Stars: ✭ 420 (+600%)
Mutual labels:  transformer, seq2seq
tensorflow-ml-nlp-tf2
Hands-on materials for natural language processing starting with TensorFlow 2 and machine learning (from logistic regression to BERT and GPT-3)
Stars: ✭ 245 (+308.33%)
Mutual labels:  transformer, seq2seq
kospeech
Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition leveraging PyTorch and Hydra.
Stars: ✭ 456 (+660%)
Mutual labels:  transformer, seq2seq
Machine Translation
Stars: ✭ 51 (-15%)
Mutual labels:  transformer, seq2seq
transformer
A PyTorch Implementation of "Attention Is All You Need"
Stars: ✭ 28 (-53.33%)
Mutual labels:  transformer, seq2seq
Sockeye
Sequence-to-sequence framework with a focus on Neural Machine Translation based on Apache MXNet
Stars: ✭ 990 (+1550%)
Mutual labels:  transformer, seq2seq
Transformers
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Stars: ✭ 55,742 (+92803.33%)
Mutual labels:  transformer, seq2seq
Seq2seqchatbots
A wrapper around tensor2tensor to flexibly train, interact, and generate data for neural chatbots.
Stars: ✭ 466 (+676.67%)
Mutual labels:  transformer, seq2seq
Kospeech
Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition.
Stars: ✭ 190 (+216.67%)
Mutual labels:  transformer, seq2seq
Nlp Tutorials
Simple implementations of NLP models. Tutorials are written in Chinese on my website https://mofanpy.com
Stars: ✭ 394 (+556.67%)
Mutual labels:  transformer, seq2seq
Neural sp
End-to-end ASR/LM implementation with PyTorch
Stars: ✭ 408 (+580%)
Mutual labels:  transformer, seq2seq
Transformer Temporal Tagger
Code and data from the paper BERT Got a Date: Introducing Transformers to Temporal Tagging
Stars: ✭ 55 (-8.33%)
Mutual labels:  transformer, seq2seq
NLP-paper
🎨 🎨NLP (natural language processing) tutorials 🎨🎨 https://dataxujing.github.io/NLP-paper/
Stars: ✭ 23 (-61.67%)
Mutual labels:  transformer, seq2seq

Maturity level-0

Molecular Optimization by Capturing Chemist's Intuition Using Deep Neural Networks

Description

Implementation of the Seq2Seq with attention and the Transformer models used in Molecular Optimization by Capturing Chemist's Intuition Using Deep Neural Networks. Given a starting molecule and a set of desirable property changes, the goal is to generate molecules that exhibit those property changes. This problem can be viewed as a machine translation problem in natural language processing: the SMILES of the starting molecule is translated into the SMILES of a target molecule. The desirable property changes are incorporated into the input sequence together with the SMILES.
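To make the input representation concrete, the sketch below assumes that each property change has already been encoded as a single token and prepends those tokens to an atom-level tokenization of the SMILES. The token format (e.g. `LogD_up`) and the helper names are hypothetical, not taken from preprocess.py, and the regular expression is a widely used SMILES tokenization pattern rather than necessarily the one used in this repository.

```python
import re

# Widely used atom-level SMILES pattern: bracket atoms, Br/Cl, organic-subset
# atoms, bonds, branches and ring-closure digits (including %nn closures).
SMILES_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into tokens; raise if any character is not covered."""
    tokens = SMILES_PATTERN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"could not tokenize SMILES: {smiles}")
    return tokens


def build_source_sequence(smiles: str, property_changes: dict[str, str]) -> list[str]:
    """Prepend one (hypothetical) token per property change to the SMILES tokens."""
    prop_tokens = [f"{name}_{change}" for name, change in property_changes.items()]
    return prop_tokens + tokenize_smiles(smiles)
```

For example, `build_source_sequence("CCO", {"LogD": "up"})` returns `["LogD_up", "C", "C", "O"]`; the model then learns to translate such a sequence into the SMILES of a molecule with the requested changes.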

Usage

Create environment

conda env create -f environment.yml
source activate molopt

1. Preprocess data

Encode the property changes, build the vocabulary, and split the data into training, validation and test sets. The outputs are saved in the same directory as the input data file.

python preprocess.py --input-data-path data/chembl_02/mmp_prop.csv
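The vocabulary and split steps can be pictured as follows. This is a minimal sketch, assuming the sequences are already tokenized and using hypothetical special-token names and an 80/10/10 random split; preprocess.py may use different tokens, ratios or splitting logic.

```python
import random

# Hypothetical special tokens; preprocess.py may name or order them differently.
SPECIAL_TOKENS = ["<pad>", "<start>", "<end>", "<unk>"]


def build_vocabulary(token_sequences: list[list[str]]) -> dict[str, int]:
    """Assign an integer id to every token, special tokens first."""
    tokens = {tok for seq in token_sequences for tok in seq}
    vocab = {tok: i for i, tok in enumerate(SPECIAL_TOKENS)}
    for tok in sorted(tokens):  # sorted so ids are deterministic across runs
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def split_rows(rows, fractions=(0.8, 0.1, 0.1), seed=42):
    """Shuffle rows reproducibly and split them into train / validation / test lists."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(fractions[0] * len(shuffled))
    n_valid = int(fractions[1] * len(shuffled))
    return (
        shuffled[:n_train],
        shuffled[n_train : n_train + n_valid],
        shuffled[n_train + n_valid :],
    )
```

A random row-level split like this does not by itself keep test molecules out of the training set, which is what the name test_not_in_train.csv suggests; that would need a split grouped by molecule.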

2. Train model

Train the model and save the results and logs to experiments/save_directory/. The model from each epoch is saved in experiments/save_directory/checkpoint/, and the training loss, validation loss and validation accuracy are saved in experiments/save_directory/tensorboard/.

python train.py --data-path data/chembl_02 --save-directory train_transformer --model-choice transformer transformer
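The training output layout described above can be mirrored in a small helper, e.g. for scripts that need to locate checkpoints or logs. This is a sketch: the function name is hypothetical, and it only builds paths from the documented layout without inspecting what train.py actually writes.

```python
from pathlib import Path


def training_output_dirs(save_directory: str, root: str = "experiments") -> dict[str, Path]:
    """Directories train.py is documented to write to for a given --save-directory."""
    base = Path(root) / save_directory
    return {
        "base": base,                         # results and logs
        "checkpoint": base / "checkpoint",    # model saved at each epoch
        "tensorboard": base / "tensorboard",  # losses and validation accuracy
    }
```

Assuming the files in tensorboard/ are standard TensorBoard event files, as the directory name suggests, the curves can be viewed with `tensorboard --logdir experiments/train_transformer/tensorboard`.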

A pre-trained Transformer model can be found here.

3. Generate molecules

Use the model saved at a given epoch (e.g. 60) to generate molecules for a given test file, and save the results to experiments/save_directory/test_file_name/evaluation_epoch/generated_molecules.csv. The three test sets used in our paper are in data/chembl_02/:

  • Test-Original -> data/chembl_02/test.csv
  • Test-Molecule -> data/chembl_02/test_not_in_train.csv
  • Test-Property -> data/chembl_02/test_unseen_L-1_S01_C10_range.csv
python generate.py --model-choice transformer --data-path data/chembl_02 --test-file-name test --model-path experiments/train_transformer/checkpoint --save-directory evaluation_transformer --epoch 60
python generate.py --model-choice transformer --data-path data/chembl_02 --test-file-name test_not_in_train --model-path experiments/train_transformer/checkpoint --save-directory evaluation_transformer --epoch 60
python generate.py --model-choice transformer --data-path data/chembl_02 --test-file-name test_unseen_L-1_S01_C10_range --model-path experiments/train_transformer/checkpoint --save-directory evaluation_transformer --epoch 60
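Since the three generation commands differ only in the test file name, they can be scripted. The sketch below rebuilds exactly the commands listed above; the helper names are hypothetical, and nothing is executed unless `dry_run=False` is passed, which requires the repository and a trained checkpoint.

```python
import shlex
import subprocess

# The three test sets from data/chembl_02/ (file names without .csv).
TEST_SETS = ["test", "test_not_in_train", "test_unseen_L-1_S01_C10_range"]


def generate_command(test_file_name: str, epoch: int = 60) -> list[str]:
    """Argument list for one generate.py run, mirroring the commands above."""
    return [
        "python", "generate.py",
        "--model-choice", "transformer",
        "--data-path", "data/chembl_02",
        "--test-file-name", test_file_name,
        "--model-path", "experiments/train_transformer/checkpoint",
        "--save-directory", "evaluation_transformer",
        "--epoch", str(epoch),
    ]


def output_csv(test_file_name: str, epoch: int = 60) -> str:
    """Expected location of the generated molecules, following the documented layout."""
    return (
        f"experiments/evaluation_transformer/{test_file_name}"
        f"/evaluation_{epoch}/generated_molecules.csv"
    )


def run_all(epoch: int = 60, dry_run: bool = True) -> list[str]:
    """Print every command; execute them only when dry_run is False."""
    printed = []
    for name in TEST_SETS:
        cmd = generate_command(name, epoch)
        printed.append(shlex.join(cmd))
        print(printed[-1])
        if not dry_run:
            # Needs the repository checkout and the trained checkpoint for this epoch.
            subprocess.run(cmd, check=True)
    return printed
```

Calling `run_all(dry_run=False)` from the repository root would run all three generations in sequence and stop at the first failure.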

4. Compute properties for generated molecules

Because our property prediction models were built on in-house experimental data, we cannot make them public. However, the computed properties are provided in experiments/evaluation_transformer/test_file_name/evaluation_60/generated_molecules_prop.csv.

5. Evaluate the generated molecules in terms of satisfying the desirable properties, and draw molecules

python evaluate.py --data-path experiments/evaluation_transformer/test/evaluation_60/generated_molecules_prop.csv
python evaluate.py --data-path experiments/evaluation_transformer/test_not_in_train/evaluation_60/generated_molecules_prop.csv
python evaluate.py --data-path experiments/evaluation_transformer/test_unseen_L-1_S01_C10_range/evaluation_60/generated_molecules_prop.csv --range-evaluation lower
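One way to picture the check for "satisfying the desirable properties" is as an interval test on the predicted property change. In this minimal sketch the record fields, the (low, high] interval convention and the success-rate definition are assumptions for illustration; evaluate.py may define success differently, and the sketch does not reproduce its --range-evaluation option.

```python
from dataclasses import dataclass


@dataclass
class GeneratedMolecule:
    """Hypothetical record pairing a generated molecule with its starting molecule."""
    start_value: float      # predicted property of the starting molecule
    generated_value: float  # predicted property of the generated molecule
    target_low: float       # desired change interval, lower bound (exclusive)
    target_high: float      # desired change interval, upper bound (inclusive)


def satisfies(mol: GeneratedMolecule) -> bool:
    """True if the predicted property change lies in the desired (low, high] interval."""
    delta = mol.generated_value - mol.start_value
    return mol.target_low < delta <= mol.target_high


def success_rate(mols: list[GeneratedMolecule]) -> float:
    """Fraction of generated molecules whose property change is in the desired interval."""
    if not mols:
        return 0.0
    return sum(satisfies(m) for m in mols) / len(mols)
```

With several properties, a molecule would typically count as successful only if every requested change is satisfied, i.e. the per-property `satisfies` results combined with `all(...)`.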

6. Matched molecular pair analysis between starting molecules and generated molecules

  • Download mmpdb for matched molecular pair generation
  • Pass the path of the downloaded mmpdb (e.g. path/mmpdb/) to the --mmpdb-path argument of mmp_analysis.py
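A matched molecular pair is two molecules that differ by a single, localized structural change. The sketch below shows the indexing idea behind MMP identification: molecules that share the same constant part after a single-cut fragmentation form a pair, and their two variable parts define the transformation. It assumes the fragmentations are already given as strings with `[*]` marking the attachment point; mmpdb performs the real fragmentation with RDKit and handles details (e.g. multiple cuts and canonicalization) that are omitted here. The function name is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def find_matched_pairs(fragmentations: dict[str, list[tuple[str, str]]]) -> list[tuple[str, str, str]]:
    """
    fragmentations maps molecule id -> [(constant_part, variable_part), ...].
    Returns sorted (mol_a, mol_b, "var_a>>var_b") tuples for molecules sharing a constant part.
    """
    index = defaultdict(list)  # constant part -> [(molecule id, variable part)]
    for mol_id, frags in fragmentations.items():
        for constant, variable in frags:
            index[constant].append((mol_id, variable))

    pairs = set()
    for members in index.values():
        # Every two molecules with the same constant part differ only in the variable part.
        for (a, var_a), (b, var_b) in combinations(sorted(members), 2):
            if a != b and var_a != var_b:
                pairs.add((a, b, f"{var_a}>>{var_b}"))
    return sorted(pairs)
```

For instance, two benzene derivatives fragmented as `("c1ccccc1[*]", "[*]Cl")` and `("c1ccccc1[*]", "[*]F")` yield the pair with transformation `[*]Cl>>[*]F`.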

Between starting molecules and all the generated molecules

python mmp_analysis.py --data-path experiments/evaluation_transformer/test/evaluation_60/generated_molecules_prop.csv --train-path data/chembl_02/train.csv --mmpdb-path path/mmpdb/
python mmp_analysis.py --data-path experiments/evaluation_transformer/test_not_in_train/evaluation_60/generated_molecules_prop.csv --train-path data/chembl_02/train.csv --mmpdb-path path/mmpdb/
python mmp_analysis.py --data-path experiments/evaluation_transformer/test_unseen_L-1_S01_C10_range/evaluation_60/generated_molecules_prop.csv --train-path data/chembl_02/train.csv --mmpdb-path path/mmpdb/

Between starting molecules and all the generated molecules with desirable properties

python mmp_analysis.py --data-path experiments/evaluation_transformer/test/evaluation_60/generated_molecules_prop_statistics.csv --train-path data/chembl_02/train.csv --mmpdb-path path/mmpdb/ --only-desirable
python mmp_analysis.py --data-path experiments/evaluation_transformer/test_not_in_train/evaluation_60/generated_molecules_prop_statistics.csv --train-path data/chembl_02/train.csv --mmpdb-path path/mmpdb/ --only-desirable
python mmp_analysis.py --data-path experiments/evaluation_transformer/test_unseen_L-1_S01_C10_range/evaluation_60/generated_molecules_prop_statistics.csv --train-path data/chembl_02/train.csv --mmpdb-path path/mmpdb/ --only-desirable
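The six mmp_analysis.py calls above follow one pattern: the input switches from generated_molecules_prop.csv to generated_molecules_prop_statistics.csv, and --only-desirable is appended, when only molecules with desirable properties are analyzed. A hypothetical helper that rebuilds them:

```python
import shlex

TEST_SETS = ["test", "test_not_in_train", "test_unseen_L-1_S01_C10_range"]


def mmp_command(test_file_name: str, only_desirable: bool,
                mmpdb_path: str = "path/mmpdb/", epoch: int = 60) -> list[str]:
    """Return the mmp_analysis.py call for one test set as an argument list."""
    # The *_statistics.csv file is the input when analyzing only desirable molecules.
    csv_name = (
        "generated_molecules_prop_statistics.csv"
        if only_desirable
        else "generated_molecules_prop.csv"
    )
    cmd = [
        "python", "mmp_analysis.py",
        "--data-path",
        f"experiments/evaluation_transformer/{test_file_name}/evaluation_{epoch}/{csv_name}",
        "--train-path", "data/chembl_02/train.csv",
        "--mmpdb-path", mmpdb_path,
    ]
    if only_desirable:
        cmd.append("--only-desirable")
    return cmd


def all_mmp_commands(mmpdb_path: str = "path/mmpdb/") -> list[str]:
    """All six commands as shell strings: all generated molecules first, then desirable only."""
    return [
        shlex.join(mmp_command(name, flag, mmpdb_path))
        for flag in (False, True)
        for name in TEST_SETS
    ]
```

Replace path/mmpdb/ with the location of your mmpdb download; each string can then be run from the repository root with `subprocess.run(shlex.split(cmd), check=True)`.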

License

The code is copyright 2020 by Jiazhen He and distributed under the Apache-2.0 license. See LICENSE for details.
