
ManzoorElahi / organic-chemistry-reaction-prediction-using-NMT

Licence: other
organic chemistry reaction prediction using NMT with Attention

Programming Languages

python
Jupyter Notebook

Projects that are alternatives of or similar to organic-chemistry-reaction-prediction-using-NMT

chembience
A Docker-based, cloudable platform for the development of chemoinformatics-centric web applications and microservices.
Stars: ✭ 41 (+36.67%)
Mutual labels:  chemistry, cheminformatics, chemoinformatics
py4chemoinformatics
Python for chemoinformatics
Stars: ✭ 78 (+160%)
Mutual labels:  chemistry, cheminformatics, chemoinformatics
GLaDOS
Web Interface for ChEMBL @ EMBL-EBI
Stars: ✭ 28 (-6.67%)
Mutual labels:  chemistry, cheminformatics, chemoinformatics
Cirpy
Python wrapper for the NCI Chemical Identifier Resolver (CIR)
Stars: ✭ 55 (+83.33%)
Mutual labels:  chemistry, cheminformatics
Cdk
The Chemistry Development Kit
Stars: ✭ 283 (+843.33%)
Mutual labels:  chemistry, cheminformatics
Tdc
Therapeutics Data Commons: Machine Learning Datasets and Tasks for Therapeutics
Stars: ✭ 291 (+870%)
Mutual labels:  chemistry, cheminformatics
Version3-1
Version 2020 (3.1) of Chem4Word - A Chemistry Add-In for Microsoft Word
Stars: ✭ 14 (-53.33%)
Mutual labels:  chemistry, cheminformatics
Chemfiles
Library for reading and writing chemistry files
Stars: ✭ 95 (+216.67%)
Mutual labels:  chemistry, cheminformatics
Molvs
Molecule Validation and Standardization
Stars: ✭ 76 (+153.33%)
Mutual labels:  chemistry, cheminformatics
Stk
A Python library which allows construction and manipulation of complex molecules, as well as automatic molecular design and the creation of molecular databases.
Stars: ✭ 99 (+230%)
Mutual labels:  chemistry, cheminformatics
Kekule.js
A Javascript cheminformatics toolkit.
Stars: ✭ 156 (+420%)
Mutual labels:  chemistry, cheminformatics
Thermo
Thermodynamics and Phase Equilibrium component of Chemical Engineering Design Library (ChEDL)
Stars: ✭ 279 (+830%)
Mutual labels:  chemistry, cheminformatics
Indigo
Universal cheminformatics libraries, utilities and database search tools
Stars: ✭ 146 (+386.67%)
Mutual labels:  chemistry, cheminformatics
Chembl webresource client
Official Python client for accessing ChEMBL API.
Stars: ✭ 165 (+450%)
Mutual labels:  chemistry, cheminformatics
Openbabel
Open Babel is a chemical toolbox designed to speak the many languages of chemical data.
Stars: ✭ 492 (+1540%)
Mutual labels:  chemistry, cheminformatics
molecules
chemical graph theory library for JavaScript
Stars: ✭ 83 (+176.67%)
Mutual labels:  chemistry, cheminformatics
Smiles Transformer
Original implementation of the paper "SMILES Transformer: Pre-trained Molecular Fingerprint for Low Data Drug Discovery" by Shion Honda et al.
Stars: ✭ 86 (+186.67%)
Mutual labels:  chemistry, cheminformatics
Awesome Cheminformatics
A curated list of Cheminformatics libraries and software.
Stars: ✭ 244 (+713.33%)
Mutual labels:  chemistry, cheminformatics
molml
A library to interface molecules and machine learning.
Stars: ✭ 57 (+90%)
Mutual labels:  chemistry, cheminformatics
AMPL
The ATOM Modeling PipeLine (AMPL) is an open-source, modular, extensible software pipeline for building and sharing models to advance in silico drug discovery.
Stars: ✭ 85 (+183.33%)
Mutual labels:  chemistry, cheminformatics

Organic Chemistry Reaction Prediction using NMT with Attention

The intent is to solve the forward reaction prediction problem: the reactants are known, and the goal is to generate the reaction products. The idea of treating organic chemistry as a language and applying state-of-the-art neural machine translation methods, namely sequence-to-sequence (seq2seq) models, was described in Schwaller, P., Gaudin, T., Lanyi, D., Bekas, C. and Laino, T., 2017. "Found in Translation": Predicting Outcome of Complex Organic Chemistry Reactions using Neural Sequence-to-Sequence Models. arXiv preprint arXiv:1711.04810 (https://arxiv.org/abs/1711.04810). The data used is available at https://ibm.ent.box.com/v/ReactionSeq2SeqDataset.
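To treat reactions as translation, each SMILES string has to be split into "words" (atoms, bonds, branches, ring closures). The sketch below uses a regular expression adapted from the SMILES tokenizer popularized by later Molecular Transformer work; it is an assumption for illustration, and this repository may tokenize differently (for example, character by character). The tokenized reactants form the source sequence and the tokenized product forms the target sequence.

```python
import re

# Assumed tokenization pattern: bracket atoms, two-letter halogens, aromatic
# atoms, bonds, branches, ring-closure digits and the '.' molecule separator.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into atom/bond/ring-closure tokens."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    # Fail loudly if some character was not covered by the pattern
    if "".join(tokens) != smiles:
        raise ValueError(f"Could not fully tokenize: {smiles!r}")
    return tokens


# Source and target sequences for one reaction from the results table
source = tokenize_smiles("N#C[S-].O=C(Cl)c1ccco1")
target = tokenize_smiles("O=C(N=C=S)c1ccco1")
print(source)
print(target)
```

Bracket atoms such as `[S-]` or `[N+]` stay single tokens, which keeps charges and stereo markers from being split into meaningless characters.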

The model in version 2 is loosely based on the model described in "Asynchronous Bidirectional Decoding for Neural Machine Translation" (https://arxiv.org/abs/1801.05122).

Beam search is used to decode the predicted reaction products as SMILES strings.
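Beam search keeps the `beam_width` most probable partial product sequences at each decoding step instead of committing to the single best token. The following is a minimal, framework-free sketch; `next_token_probs` is a hypothetical stand-in for the trained decoder's softmax output, and the toy distribution exists only to exercise the search.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical decoder interface: prefix of target tokens -> {next_token: probability}
NextTokenFn = Callable[[Sequence[str]], Dict[str, float]]
Hypothesis = Tuple[List[str], float]  # (tokens, total log-probability)


def beam_search(next_token_probs: NextTokenFn, beam_width: int = 3,
                max_len: int = 20, eos: str = "</s>") -> List[Hypothesis]:
    """Return up to `beam_width` hypotheses, best first."""
    beams: List[Hypothesis] = [([], 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        candidates: List[Hypothesis] = []
        for tokens, score in beams:
            for tok, p in next_token_probs(tokens).items():
                if p > 0.0:
                    candidates.append((tokens + [tok], score + math.log(p)))
        # Keep only the highest-scoring expansions
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == eos:
                finished.append((tokens[:-1], score))  # drop the end token
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses that reached max_len without an end token
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished[:beam_width]


def toy_model(prefix: Sequence[str]) -> Dict[str, float]:
    # Toy next-token distribution over SMILES tokens (illustration only)
    table = {
        (): {"C": 0.6, "O": 0.4},
        ("C",): {"O": 0.5, "</s>": 0.5},
        ("O",): {"C": 0.9, "</s>": 0.1},
        ("C", "O"): {"</s>": 1.0},
        ("O", "C"): {"</s>": 1.0},
    }
    return table.get(tuple(prefix), {"</s>": 1.0})


hyps = beam_search(toy_model, beam_width=2)
for tokens, score in hyps:
    print("".join(tokens), round(math.exp(score), 2))
```

Here greedy decoding would pick `C` first and end with probability 0.3 at best, while the beam keeps `O` alive and finds `OC` with probability 0.36, which is why beam search usually recovers better products than greedy decoding.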

Attention Mechanism

The key idea of the attention mechanism is to establish direct short-cut connections between the target and the source by paying "attention" to relevant source content as we translate. A nice byproduct of the attention mechanism is an easy-to-visualize alignment matrix between the source and target sentences.


Figure 1. Attention visualization – example of the alignments between source and target sentences.
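One decoding step of attention can be sketched as: score every encoder state against the current decoder state, normalize the scores with a softmax, and take the weighted average of the encoder states as the context vector. The scoring function used in this repository is not stated; dot-product (Luong-style) scoring is assumed here, and additive (Bahdanau-style) scoring is an equally common choice.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_attention(query: Vector, keys: List[Vector]) -> Tuple[Vector, List[float]]:
    """Dot-product attention for a single decoder step.

    query: current decoder hidden state.
    keys:  encoder hidden states, one per source token (also used as values).
    Returns (context vector, attention weights over source positions).
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(keys[0])
    context = [sum(w * key[i] for w, key in zip(weights, keys)) for i in range(dim)]
    return context, weights


# Three source tokens with 2-dimensional encoder states
context, weights = dot_attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
print([round(w, 3) for w in weights])
```

The `weights` list is one row of the alignment matrix; stacking the rows from every decoder step gives the kind of source/target alignment shown in Figure 1.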

Data

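Download the required data using the code below (Python 3). The direct link is a Box download URL and may have expired; if so, download the files manually from the dataset page linked above.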

import urllib.request
url = 'https://public.boxcloud.com/d/1/XZG91j-8-qifzIRdGyZ1OyhRNWS0cpD_MEGfgrpUjf-slwxt1hb-boJ3ZrG7CxguLlU3co4HMIlFs5FM59jHdz4x_2q80XXiPjRSTQtvcqWcHK2rPAiFyuEmIdFeg6fP2zNlttFQujmxwgeQe8C3xGBlsD73fAbEpKlMJk8fZPPbDnraqSjrz3QPiMACoR1Nwbrl9NdBhvptzzoqEzJ8dZ1vrIXRYnRVgn0Vzmh-DvhC6rAL_N95xjsJOvQx3qnA7rtxiUOum0WrnUeyOj22Pkj4PHH5TrBvHjCBcMAXSQSaPM8wyUABeypxJ5gZjqaN3IvZMVj32knzan8QpE3TDQxMmV7bC-YZp-j0zgoSZKewAjRURhIirkkGmI7tfmXS8evVu8AeRpjDyIlLVmChqdqi_UQt_J7kOjzZ5BKv9LlA5jDyhLUYkjoGnQXbr7ZgSrf1Nut_ygtrYeBkJJ9s0kTmgEDml2l2W74sf6OyuFm8BIxP8b022EgKA0bPsnBJqOigi4FN18t8YlAklpA06JMywd2Acpg0BNLAmRTGnkSC3rJrU7blVUMB6k7Gn-L6Z-c6EtLj6USo3KLU_Yf5KXRLwRE4JBhEEbB12SzJGmpZIFdZTpjOCz1nIZW_pNn7ybJ7yM1DaJkNoK4Fduh_-dM1qp6iGj-qNwQFpUfZJeI-qjYUG59H4TvBClfY-bv_Z3HHW-lj5FbAQFYVypOJsRsP18wowbq-OanmfTSLoDRV3p0wNQLXXdug_kgo0mmmDYyRg89iAilyZCvwSjRsJdIGemQSaUnaaEfahOY0gcld6YrxpEhyYECeeubEDnkWc_c3N4HeT9Co5rlrv2n709uHtNrBu4ObzzMZK3xmiqU_chySiqHIhTxfUmRTkq4v6Q-jtMKInCV69H5Hm06iJhHOH6uan1VWelRfhaPbZ11mJJzOlHDkqBRtTx8AGB2gjRcikImtfLxq3_eXmte-79KYfh3_JI8mWWwHQwY6WSVna93Necqm87a5Pmfhk2m-s7zzD50QDeKdT9yNJ7FhWiturzVtPRBKTgzDPsaSdKRFUe0YGB1RS-fvKGu5b0_0Y6t4ZS3eDBBSTVoHhbjsbvfb9oaGd-MwU3UbcEJqlguMd8gVPbgPlHHx_HtZ4uM_rr1_lDL9OqCJ0vKo4jN0bBHPrjum7vJ_0ChIoBsF_fOD7vTpp2NK5at6Z7mIBLf3Rjbg7weyXwadigp5bB3njdV6Cn2IMtNL2C2FtNbl1g6OiOTtQh4g7vqbulkq/download'
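# Read the raw bytes ("raw" avoids shadowing the standard-library csv module)
with urllib.request.urlopen(url) as response:
    raw = response.read()
# The data/ directory must already exist
with open('data/US_patents_1976-Sep2016_1product_reactions_train.csv', 'wb') as fx:
    fx.write(raw)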
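The column layout of the downloaded CSV is not described here, so the sketch below only handles a single reaction SMILES string in the standard `reactants>reagents>products` notation and turns it into a (source, target) pair. Merging reagents into the source with `.` and dropping a trailing CXSMILES extension (such as `|f:...|`) are assumptions about preprocessing, not this repository's documented behavior.

```python
from typing import Tuple


def split_reaction(reaction_smiles: str) -> Tuple[str, str]:
    """Split 'reactants>reagents>products' into a (source, target) pair."""
    # Drop any extension that follows a space, e.g. ' |f:0.1|'
    core = reaction_smiles.strip().split(" ", 1)[0]
    parts = core.split(">")
    if len(parts) != 3:
        raise ValueError(f"Not a reaction SMILES: {reaction_smiles!r}")
    reactants, reagents, products = parts
    # Assumption: reagents are treated as extra input species
    source = ".".join(s for s in (reactants, reagents) if s)
    return source, products


print(split_reaction("BrCCCCCCC1CC1.O=C=O>>O=C(O)CCCCCCC1CC1"))
```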

Model


Figure 2. Model architecture based on asynchronous bidirectional decoding for neural machine translation with attention.
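The core idea of asynchronous bidirectional decoding, as described in the cited paper, is that a backward (right-to-left) decoder runs first, and the forward (left-to-right) decoder then attends both to the encoder states and to the backward decoder's hidden states. The sketch below shows only that context computation for one forward step; it is not this repository's code, and concatenating the two context vectors is an illustrative choice.

```python
import math
from typing import List

Vector = List[float]


def attend(query: Vector, states: List[Vector]) -> Vector:
    """Dot-product attention: softmax-weighted average of `states`."""
    scores = [sum(q * s for q, s in zip(query, state)) for state in states]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * st[i] for w, st in zip(weights, states)) / total
            for i in range(len(states[0]))]


def abd_context(query: Vector, encoder_states: List[Vector],
                backward_states: List[Vector]) -> Vector:
    """Context for one step of the forward decoder.

    encoder_states:  hidden states of the source (reactant) encoder.
    backward_states: hidden states produced earlier by the right-to-left decoder.
    """
    source_context = attend(query, encoder_states)
    backward_context = attend(query, backward_states)
    return source_context + backward_context  # list concatenation


ctx = abd_context([0.5, 0.5], [[1.0, 2.0], [1.0, 2.0]], [[3.0, 4.0]])
print(ctx)
```

Because the backward states summarize the product from its right end, the forward decoder can use information about tokens it has not generated yet, which is the motivation for the asynchronous scheme.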

Prediction Result

| Reactants | Actual Product | Predicted Product | Correct |
| --- | --- | --- | --- |
| `N[C@@H](CS)C(=O)O.S=C=S` | `O=C(O)C1CSC(=S)N1` | `O=C(O)C1CSC(=S)N1` | True |
| `BrCCCCCCC1CC1.O=C=O` | `O=C(O)CCCCCCC1CC1` | `O=C(O)CCCCCCC1CC1` | True |
| `CCCCCC(=O)CCCCC.NO` | `CCCCCC(CCCCC)=NO` | `CCCCCC(CCCCC)=NO` | True |
| `N#C[S-].O=C(Cl)c1ccco1` | `O=C(N=C=S)c1ccco1` | `O=C(N=C=S)c1ccco1` | True |
| `Cc1nccn1CCCl.[N-]=[N+]=[N-]` | `Cc1nccn1CCN=[N+]=[N-]` | `Cc1nccn1CCN=[N+]=[N-]` | True |
| `CCCCCC/C=C/C(=O)Cl` | `CCCCCC/C=C/C=O` | `CCCCCC/C=C/C=O` | True |
| `CCCCCC/C=C/C=O` | `CCCCCC/C=C/CO` | `CCCCCC/C=C/CO` | True |
| `COC(=O)C(N=[N+]=[N-])OC` | `COC(N=[N+]=[N-])C(=O)O` | `COC(N=[N+]=[N-])C(=O)O` | True |
| `COC(=O)CO.COCCl` | `COCOCC(=O)OC` | `COCOCC(=O)OC` | True |
| `CNO.Nc1cccnc1.O=N[O-]` | `CN+=NNc1cccnc1` | `CN+=NNc1cccnc1` | True |
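The last column marks whether the predicted product matches the actual product. A minimal sketch of turning such results into a top-1 accuracy by exact string comparison is below; the pairs are copied from rows of the results table. Exact matching is only meaningful if both SMILES were canonicalized with the same toolkit, since one molecule can have many valid SMILES (`CCO` and `OCC` are both ethanol).

```python
from typing import List, Tuple

# (actual product, predicted product) pairs taken from the results table
results: List[Tuple[str, str]] = [
    ("O=C(O)CCCCCCC1CC1", "O=C(O)CCCCCCC1CC1"),
    ("CCCCCC(CCCCC)=NO", "CCCCCC(CCCCC)=NO"),
    ("O=C(N=C=S)c1ccco1", "O=C(N=C=S)c1ccco1"),
]


def top1_accuracy(pairs: List[Tuple[str, str]]) -> float:
    """Fraction of predictions that exactly match the actual product string."""
    if not pairs:
        raise ValueError("no predictions to score")
    correct = sum(actual == predicted for actual, predicted in pairs)
    return correct / len(pairs)


print(top1_accuracy(results))  # 1.0
```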

Graphical User Interface

A simple graphical user interface, built with PyQt, takes reactant SMILES (simplified molecular-input line-entry system) strings as input and generates the product SMILES and a depiction of the product molecule.


Figure 3. Graphical User Interface for Reaction Prediction

Binaries

Executables for 64-bit Windows and 64-bit Linux are available from SourceForge.

Source Code for GUI

Source code is available in the GUI/Scripts folder.
