
DSPsleeporg / Smiles Transformer

License: MIT
Original implementation of the paper "SMILES Transformer: Pre-trained Molecular Fingerprint for Low Data Drug Discovery" by Shion Honda et al.


Projects that are alternatives to or similar to Smiles Transformer

Tdc
Therapeutics Data Commons: Machine Learning Datasets and Tasks for Therapeutics
Stars: ✭ 291 (+238.37%)
Mutual labels:  jupyter-notebook, chemistry, cheminformatics
Pytorch Original Transformer
My implementation of the original transformer model (Vaswani et al.). I've additionally included the playground.py file for visualizing otherwise hard-to-grasp concepts. IWSLT pretrained models are currently included.
Stars: ✭ 411 (+377.91%)
Mutual labels:  jupyter-notebook, transformer
Tsai
State-of-the-art Deep Learning with Time Series and Sequences in PyTorch / fastai
Stars: ✭ 407 (+373.26%)
Mutual labels:  jupyter-notebook, transformer
Elemnet
Deep Learning the Chemistry of Materials From Only Elemental Composition for Enhancing Materials Property Prediction
Stars: ✭ 44 (-48.84%)
Mutual labels:  jupyter-notebook, chemistry
Lstm chem
Implementation of the paper - Generative Recurrent Networks for De Novo Drug Design.
Stars: ✭ 87 (+1.16%)
Mutual labels:  jupyter-notebook, cheminformatics
Bert Multitask Learning
BERT for Multitask Learning
Stars: ✭ 380 (+341.86%)
Mutual labels:  jupyter-notebook, transformer
Getting Things Done With Pytorch
Jupyter Notebook tutorials on solving real-world problems with Machine Learning & Deep Learning using PyTorch. Topics: Face detection with Detectron 2, Time Series anomaly detection with LSTM Autoencoders, Object Detection with YOLO v5, Build your first Neural Network, Time Series forecasting for Coronavirus daily cases, Sentiment Analysis with BERT.
Stars: ✭ 738 (+758.14%)
Mutual labels:  jupyter-notebook, transformer
Cdk
The Chemistry Development Kit
Stars: ✭ 283 (+229.07%)
Mutual labels:  chemistry, cheminformatics
Vietnamese Electra
Electra pre-trained model using Vietnamese corpus
Stars: ✭ 55 (-36.05%)
Mutual labels:  jupyter-notebook, transformer
Cirpy
Python wrapper for the NCI Chemical Identifier Resolver (CIR)
Stars: ✭ 55 (-36.05%)
Mutual labels:  chemistry, cheminformatics
Deeplearning Nlp Models
A small, interpretable codebase containing the re-implementation of a few "deep" NLP models in PyTorch. Colab notebooks to run with GPUs. Models: word2vec, CNNs, transformer, gpt.
Stars: ✭ 64 (-25.58%)
Mutual labels:  jupyter-notebook, transformer
Question generation
Neural question generation using transformers
Stars: ✭ 356 (+313.95%)
Mutual labels:  jupyter-notebook, transformer
Dab
Data Augmentation by Backtranslation (DAB) ヽ( •_-)ᕗ
Stars: ✭ 294 (+241.86%)
Mutual labels:  jupyter-notebook, transformer
Deepsvg
[NeurIPS 2020] Official code for the paper "DeepSVG: A Hierarchical Generative Network for Vector Graphics Animation". Includes a PyTorch library for deep learning with SVG data.
Stars: ✭ 403 (+368.6%)
Mutual labels:  jupyter-notebook, transformer
Nlp Tutorial
Natural Language Processing Tutorial for Deep Learning Researchers
Stars: ✭ 9,895 (+11405.81%)
Mutual labels:  jupyter-notebook, transformer
Openbabel
Open Babel is a chemical toolbox designed to speak the many languages of chemical data.
Stars: ✭ 492 (+472.09%)
Mutual labels:  chemistry, cheminformatics
Demo Chinese Text Binary Classification With Bert
Stars: ✭ 276 (+220.93%)
Mutual labels:  jupyter-notebook, transformer
Thermo
Thermodynamics and Phase Equilibrium component of Chemical Engineering Design Library (ChEDL)
Stars: ✭ 279 (+224.42%)
Mutual labels:  chemistry, cheminformatics
Gpt2 French
GPT-2 French demo
Stars: ✭ 47 (-45.35%)
Mutual labels:  jupyter-notebook, transformer
Indonesian Language Models
Indonesian Language Models and their usage
Stars: ✭ 64 (-25.58%)
Mutual labels:  jupyter-notebook, transformer

SMILES Transformer

SMILES Transformer extracts molecular fingerprints from string representations of chemical molecules.
Through an autoencoding task, the transformer learns latent representations that are useful for various downstream tasks.
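
To illustrate the idea (a minimal sketch only; the class name, pooling choice, and hyperparameters below are assumptions, not the repository's exact code), the model can be thought of as a sequence-to-sequence transformer trained to reconstruct its input SMILES, whose pooled encoder output serves as the fingerprint:

import torch
import torch.nn as nn

class SmilesAutoencoder(nn.Module):
    """Toy SMILES autoencoder: reconstruct the input token sequence;
    the pooled encoder memory acts as a molecular fingerprint."""
    def __init__(self, vocab_size, d_model=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(d_model=d_model, nhead=4,
                                          num_encoder_layers=4,
                                          num_decoder_layers=4)
        self.out = nn.Linear(d_model, vocab_size)

    def fingerprint(self, tokens):                # tokens: (seq_len, batch)
        memory = self.transformer.encoder(self.embed(tokens))
        return memory.mean(dim=0)                 # pool over sequence -> (batch, d_model)

    def forward(self, tokens):                    # autoencoding objective
        memory = self.transformer.encoder(self.embed(tokens))
        decoded = self.transformer.decoder(self.embed(tokens), memory)
        return self.out(decoded)                  # per-token vocabulary logits

Training minimizes cross-entropy between the logits and the input tokens; at inference time only fingerprint() is needed.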

Requirements

This project requires the following libraries.

  • NumPy
  • Pandas
  • PyTorch > 1.2
  • tqdm
  • RDKit
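
One way to set up the environment (the exact commands are an assumption; this README does not pin versions, and RDKit is commonly installed through conda):

$ pip install numpy pandas torch tqdm
$ conda install -c conda-forge rdkit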

Dataset

Canonical SMILES of 1.7 million molecules with no more than 100 characters each, taken from the ChEMBL24 dataset, were used.
These canonical SMILES were randomly transformed at every epoch with the SMILES enumeration technique by E. J. Bjerrum.
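
For illustration, this per-epoch randomization can be reproduced with RDKit along the lines of Bjerrum's method (a minimal sketch; the repository's own enumeration code may differ):

import random
from rdkit import Chem

def randomize_smiles(smiles):
    # Shuffle the atom order and emit a non-canonical SMILES for the
    # same molecule (Bjerrum-style SMILES enumeration).
    mol = Chem.MolFromSmiles(smiles)
    atom_order = list(range(mol.GetNumAtoms()))
    random.shuffle(atom_order)
    mol = Chem.RenumberAtoms(mol, atom_order)
    return Chem.MolToSmiles(mol, canonical=False)

print(randomize_smiles('CC(=O)Oc1ccccc1C(=O)O'))  # aspirin; a different string each call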

Pre-training

After preparing the SMILES corpus for pre-training, run:

$ python pretrain_trfm.py

The pre-trained model is available here.

Downstream Tasks

See experiments/ for example code.
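
As a hypothetical example of a downstream task (scikit-learn and the placeholder arrays below are assumptions for illustration, not part of this repository), fingerprints extracted for a labeled set of molecules can be fed to any standard classifier:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1024))   # placeholder for real fingerprints from the encoder
y = rng.integers(0, 2, size=500)   # placeholder binary activity labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print('test accuracy:', clf.score(X_test, y_test))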

Cite

@article{honda2019smiles,
    title={SMILES Transformer: Pre-trained Molecular Fingerprint for Low Data Drug Discovery},
    author={Shion Honda and Shoi Shi and Hiroki R. Ueda},
    year={2019},
    eprint={1911.04738},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}