brain-research / deep-molecular-massspec

License: Apache-2.0
Mass Spectrometry for Small Molecules using Deep Learning

Deep learning for Electron Ionization mass spectrometry for organic molecules

This repository accompanies

Rapid Prediction of Electron–Ionization Mass Spectrometry Using Neural Networks
Jennifer N. Wei, David Belanger, Ryan P. Adams, and D. Sculley
ACS Central Science 2019 5 (4), 700-708
DOI: 10.1021/acscentsci.9b00085

Introduction

We predict the mass spectra of molecules using deep learning techniques applied to various molecule representations. Model performance is evaluated with a custom library matching task, in which we identify a molecule by matching its spectrum against a library of labeled spectra. As a baseline, the library contains all of the molecules in the NIST main library, which mimics the procedure currently used by experimental chemists. To test our predictions, we replace portions of the library with spectra predicted by our model. This task is described in more detail below.
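The library matching idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the evaluation code from the repository: spectra are represented as fixed-length intensity vectors indexed by m/z bin, similarity is plain cosine similarity (the metric actually used may differ), and the names `cosine_similarity`, `rank_library`, `measured`, and `predicted` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def rank_library(query, library):
    """Rank library entries (name -> spectrum) by similarity to the query."""
    scored = [(name, cosine_similarity(query, spec)) for name, spec in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data (assumption): four m/z bins per spectrum, made-up intensities.
measured = {
    "mol_a": [0.0, 10.0, 0.0, 100.0],
    "mol_b": [50.0, 0.0, 50.0, 0.0],
    "mol_c": [0.0, 0.0, 100.0, 5.0],
}
# Replace the library entry for mol_a with a (made-up) model prediction.
predicted = {"mol_a": [0.0, 8.0, 2.0, 90.0]}
library = {**measured, **predicted}

# A measured query spectrum of mol_a should still rank mol_a first.
query = [0.0, 12.0, 1.0, 95.0]
ranking = rank_library(query, library)
print(ranking[0][0])
```

In this toy setup the query is still matched to `mol_a` even though its library entry is a prediction rather than a measurement, which is the behavior the task is designed to probe.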

Required Packages:

It is recommended to use Anaconda with a Python 3.6 environment to install these packages.

Most of the packages required here can be installed with:
$ conda install tensorflow=1.13.2 rdkit matplotlib
$ pip install absl-py
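After installation, a quick check like the following can confirm that the packages are importable in the active environment. It is a minimal sketch: the helper name `find_missing` is hypothetical, and the module names listed are the import names of the packages mentioned above (`absl-py` is imported as `absl`).

```python
import importlib.util

# Import names for the packages listed above.
REQUIRED_MODULES = ["tensorflow", "rdkit", "matplotlib", "absl"]


def find_missing(modules):
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
print("Missing packages:", ", ".join(missing) if missing else "none")
```

The check only looks up each module without importing it, so it does not verify that the installed versions match the ones recommended above.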

Quickstart Guide for Making Model Predictions

  1. Create a directory and download the weights for the model.
$ MODEL_WEIGHTS_DIR=/home/path/to/model
$ mkdir $MODEL_WEIGHTS_DIR
$ pushd $MODEL_WEIGHTS_DIR
$ curl -O https://storage.googleapis.com/deep-molecular-massspec/massspec_weights/massspec_weights.zip
$ unzip massspec_weights.zip
$ popd
  2. Run the model prediction on the example molecule.
$ python make_spectra_prediction.py \
--input_file=examples/pentachlorobenzene.sdf \
--output_file=/tmp/annotated.sdf \
--weights_dir=$MODEL_WEIGHTS_DIR/massspec_weights
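
To inspect the annotated output, the data fields of an SDF file can be listed with a few lines of Python. This is a rough sketch that relies only on the general SDF layout (data items introduced by a `> <TAG>` header line and ended by a blank line, records ended by `$$$$`). The tag names written by `make_spectra_prediction.py` are not stated here, so the code simply reports whatever tags it finds; `parse_sdf_data_fields` and the tag `HYPOTHETICAL_TAG` in the toy record are made up for illustration.

```python
import re

# Header line of an SDF data item, e.g. "> <SOME_TAG>".
TAG_PATTERN = re.compile(r"^>.*<([^>]+)>")


def parse_sdf_data_fields(text):
    """Return one {tag: value} dict per record in SDF-formatted text."""
    records, fields, tag = [], {}, None
    for line in text.splitlines():
        if line.startswith("$$$$"):  # end of one molecule record
            records.append({k: "\n".join(v) for k, v in fields.items()})
            fields, tag = {}, None
            continue
        match = TAG_PATTERN.match(line)
        if match:  # start of a new data item
            tag = match.group(1)
            fields[tag] = []
        elif tag is not None:
            if line.strip():
                fields[tag].append(line)
            else:  # a blank line closes the current data item
                tag = None
    return records


# Toy record (assumption): an empty molecule block with one made-up data item.
sample = """toy molecule
  sketch

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <HYPOTHETICAL_TAG>
41 10
43 100

$$$$
"""
records = parse_sdf_data_fields(sample)
print(records)
```

To apply it to the file written above, read `/tmp/annotated.sdf` with `open(...).read()` and pass the text to `parse_sdf_data_fields`. For real work, RDKit's SDF readers are a sturdier choice than this hand-rolled parser.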

Training splits for benchmarking purposes

The molecules used for the training, validation, and test sets can be found in the training_splits directory. The molecules are provided in InChIKey and SMILES formats.
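
When reusing the splits for benchmarking, it can be worth checking that the identifiers are well formed and that no molecule appears in more than one split. The sketch below assumes the InChIKeys have already been loaded into Python lists; the file names and layout inside training_splits are not described here, so loading is left out. The helper names and the toy identifiers are illustrative and are not taken from the repository's split files.

```python
import re
from itertools import combinations

# Standard InChIKey shape: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def is_inchikey(text):
    """Check whether a string has the 27-character InChIKey shape."""
    return INCHIKEY_PATTERN.fullmatch(text) is not None


def split_overlaps(splits):
    """Return {(split_a, split_b): shared_keys} for every overlapping pair."""
    overlaps = {}
    for (name_a, keys_a), (name_b, keys_b) in combinations(splits.items(), 2):
        shared = set(keys_a) & set(keys_b)
        if shared:
            overlaps[(name_a, name_b)] = shared
    return overlaps


# Toy splits (assumption): one key is deliberately duplicated across splits.
toy_splits = {
    "train": ["BSYNRYMUTXBXSQ-UHFFFAOYSA-N", "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"],
    "validation": ["UHOVQNZJYSORNB-UHFFFAOYSA-N"],
    "test": ["LFQSCWFLJHTTHZ-UHFFFAOYSA-N"],
}
malformed = [k for keys in toy_splits.values() for k in keys if not is_inchikey(k)]
overlaps = split_overlaps(toy_splits)
print("malformed:", malformed)
print("overlaps:", overlaps)
```

Comparing on InChIKeys rather than SMILES strings avoids false negatives, since one molecule can be written as several different SMILES strings.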

To cite this work:

@article{doi:10.1021/acscentsci.9b00085,
author = {Wei, Jennifer N. and Belanger, David and Adams, Ryan P. and Sculley, D.},
title = {Rapid Prediction of Electron–Ionization Mass Spectrometry Using Neural Networks},
journal = {ACS Central Science},
volume = {5},
number = {4},
pages = {700-708},
year = {2019},
doi = {10.1021/acscentsci.9b00085},
URL = {https://doi.org/10.1021/acscentsci.9b00085}
}
