keiserlab / keras-neural-graph-fingerprint

Licence: MIT
Keras implementation of Neural Graph Fingerprints as proposed by Duvenaud et al., 2015


Keras Neural Graph Fingerprint

This repository is an implementation of Convolutional Networks on Graphs for Learning Molecular Fingerprints in Keras.

It includes a preprocessing function to convert molecules in SMILES representation into molecule tensors.

In addition, it includes two custom layers for Neural Graphs in Keras, allowing for flexible Keras fingerprint models. See examples.py for examples.

Related work

There are several implementations of this paper publicly available:

The closest is the Keras implementation by GUR9000. However, this repository represents molecules in a fundamentally different way. The consequences are described in the sections below.

Molecule Representation

Atom, bond and edge tensors

This codebase uses tensors to represent molecules. Each molecule is described by a combination of the following three tensors:

  • atom matrix, size: (max_atoms, num_atom_features) This matrix defines the atom features.

    Each row in the atom matrix is the feature vector of the atom at that index.

  • edge matrix, size: (max_atoms, max_degree) This matrix defines the connectivity between atoms.

    Each row in the edge matrix lists the neighbours of the atom at that index. Neighbours are encoded as integers: the index of their feature vector in the atom matrix.

    As atoms can have a variable number of neighbours, not all entries in a row will have a neighbour index defined. These entries are filled with the masking value of -1. (This explicit masking value in the edge matrix is important for the layers to work.)

  • bond tensor, size: (max_atoms, max_degree, num_bond_features) This tensor defines the bond features.

    The first two dimensions of this tensor correspond to the bonds defined in the edge matrix: the entry in the bond tensor at the position of a bond index in the edge matrix holds the features of that bond.

    Unused bond entries are masked with 0 vectors.
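As a concrete illustration, the three tensors for a small water-like molecule could be built by hand with NumPy. All shapes and feature values below are arbitrary toy choices, not the repository's actual featurisation:

```python
import numpy as np

max_atoms, max_degree = 5, 3
num_atom_features, num_bond_features = 2, 1

# Atom matrix: row i holds the feature vector of atom i (toy features).
atoms = np.zeros((max_atoms, num_atom_features))
atoms[0] = [8.0, 2.0]  # e.g. an oxygen atom
atoms[1] = [1.0, 1.0]  # a hydrogen atom
atoms[2] = [1.0, 1.0]  # a hydrogen atom

# Edge matrix: row i lists the indices of atom i's neighbours,
# padded with the masking value -1.
edges = -np.ones((max_atoms, max_degree), dtype=int)
edges[0, :2] = [1, 2]  # atom 0 is bonded to atoms 1 and 2
edges[1, 0] = 0
edges[2, 0] = 0

# Bond tensor: entry (i, j) holds the feature vector of the bond between
# atom i and its j-th neighbour; unused entries stay 0 vectors.
bonds = np.zeros((max_atoms, max_degree, num_bond_features))
bonds[0, :2] = 1.0  # single bonds
bonds[1, 0] = 1.0
bonds[2, 0] = 1.0
```

Note how atom slots 3 and 4 are entirely padding: their edge entries stay -1 and their bond vectors stay 0.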

Batch representations

This code deals with molecules in batches. An extra dimension is added to each of the three tensors at the first index. Their respective sizes become:

  • atom matrix, size: (num_molecules, max_atoms, num_atom_features)
  • edge matrix, size: (num_molecules, max_atoms, max_degree)
  • bond tensor size: (num_molecules, max_atoms, max_degree, num_bond_features)

As molecules have different numbers of atoms, max_atoms needs to be defined for the entire dataset. Unused atom entries are masked with 0 vectors.
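The zero-padding along the atom axis can be sketched in plain NumPy. The two toy per-molecule atom matrices below are made-up inputs, not output of the repository's preprocessing:

```python
import numpy as np

# Two toy molecules with 2 and 3 atoms, 2 atom features each.
mol_a = np.array([[1.0, 0.0], [0.0, 1.0]])
mol_b = np.array([[1.0, 1.0], [0.5, 0.5], [0.0, 2.0]])

max_atoms = max(len(mol_a), len(mol_b))  # in practice: over the whole dataset
num_atom_features = mol_a.shape[1]

# Stack into (num_molecules, max_atoms, num_atom_features), padding with 0s.
batch = np.zeros((2, max_atoms, num_atom_features))
for i, mol in enumerate([mol_a, mol_b]):
    batch[i, :len(mol)] = mol
```

The shorter molecule's unused atom slot remains an all-zero row, which is exactly the masking described above.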

Strong and weak points

The obvious downside of this representation is that there is a lot of masking, resulting in a waste of computation power.

The alternative is to represent the entire dataset as a bag of atoms, as in the authors' original implementation. For larger datasets this is infeasible. [GUR9000's implementation](https://github.com/GUR9000/KerasNeuralFingerprint) uses the same approach, but pre-calculates each batch as a bag of atoms. The downside is that every epoch uses the exact same composition of batches, decreasing stochasticity. Furthermore, Keras recognises the variability in batch size and will not run; GUR9000 therefore included a modified version of Keras to correct for this.

The tensor representation used in this repository does not have these downsides, and allows for many modifications of Duvenaud's algorithm (there is a lot to explore).

Their representation may be optimised for the regular algorithm, but at first glance, the tensor implementation seems to perform reasonably fast (check out the examples).

NeuralGraph layers

The two workhorses are defined in NGF/layers.py.

NeuralGraphHidden takes a set of molecules (represented by [atoms, bonds, edges]), and returns the convolved atom feature vectors for the next layer. Only the atom features change at each iteration, so for higher layers only the atom tensor needs to be replaced by the convolved output of the previous NeuralGraphHidden.

NeuralGraphOutput takes a set of molecules (represented by [atoms, bonds, edges]), and returns the fingerprint output for that layer. According to the original paper, the fingerprints of all layers need to be summed. But these are neural nets, so feel free to play around with the architectures!
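The core of the convolution step can be sketched in plain NumPy: gather each atom's neighbour features through the edge matrix, sum them together with the atom's own features, and pass the result through a dense layer. This is a simplified illustration, not the repository's actual code — the real NeuralGraphHidden also uses the bond features and keeps a separate dense layer per atom degree, both of which are omitted here:

```python
import numpy as np

def graph_conv_step(atoms, edges, W, b):
    """One simplified neural-graph convolution step.

    atoms: (max_atoms, num_features); edges: (max_atoms, max_degree),
    with -1 marking unused neighbour slots.
    """
    # Append a zero row so the masking index -1 gathers a zero vector.
    atoms_padded = np.vstack([atoms, np.zeros((1, atoms.shape[1]))])
    neighbours = atoms_padded[edges]         # (max_atoms, max_degree, num_features)
    summed = atoms + neighbours.sum(axis=1)  # self + neighbour features
    return np.maximum(0.0, summed @ W + b)   # inner dense layer with ReLU

rng = np.random.default_rng(0)
atoms = rng.random((5, 4))                   # 5 atom slots, 4 features
edges = -np.ones((5, 3), dtype=int)
edges[0, :2] = [1, 2]
edges[1, 0] = 0
edges[2, 0] = 0
W, b = rng.random((4, 8)), np.zeros(8)       # conv_width = 8
conv = graph_conv_step(atoms, edges, W, b)   # (5, 8) convolved atom features
```

The zero-row trick is one way the -1 masking value can be made to contribute nothing to the neighbour sum.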

Initialisation

The NeuralGraph layers have an internal (Dense) layer of the output size (conv_width for NeuralGraphHidden or fp_length for NeuralGraphOutput). This inner layer accounts for the trainable parameters, activation function, etc.

There are three ways to initialise the inner layer and its parameters:

  1. Using an integer conv_width and possible kwargs (a Dense layer is used)
atoms1 = NeuralGraphHidden(conv_width, activation='relu', bias=False)([atoms0, bonds, edges])
  2. Using an initialised Dense layer
atoms1 = NeuralGraphHidden(Dense(conv_width, activation='relu', bias=False))([atoms0, bonds, edges])
  3. Using a function that returns an initialised Dense layer
atoms1 = NeuralGraphHidden(lambda: Dense(conv_width, activation='relu', bias=False))([atoms0, bonds, edges])

In the case of NeuralGraphOutput, all three methods are identical. For NeuralGraphHidden, the methods are equivalent but can behave subtly differently, because a NeuralGraphHidden holds a dense layer for each degree.

The following will not work for NeuralGraphHidden:

atoms1 = NeuralGraphHidden(conv_width, activation='relu', bias=False, W_regularizer=l2(0.01))([atoms0, bonds, edges])

The reason is that the same l2 object would be passed to each internal layer, whereas an l2 object can only be assigned to one layer.

Method 2 will work, because a new layer is instantiated based on the configuration of the passed layer.

Method 3 will work if the provided function returns a new l2 object each time it is called (as is the case for the given lambda function).
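The distinction can be illustrated without Keras at all: a factory function returns a fresh object on every call, whereas passing one pre-built object hands the same instance to every consumer. The `Reg` class below is a hypothetical stand-in for a regularizer that may only be bound to a single layer:

```python
class Reg:
    """Stand-in for an object (e.g. a regularizer) usable by one layer only."""
    pass

shared = Reg()
factory = lambda: Reg()

# Method 3: the factory is called once per degree, yielding distinct objects.
instances = [factory() for _ in range(3)]
assert len({id(r) for r in instances}) == 3  # all distinct

# Method 1 with a kwarg object: the same instance would reach every layer.
copies = [shared for _ in range(3)]
assert len({id(r) for r in copies}) == 1     # one shared object
```

This is why the lambda form is safe for per-degree inner layers while a shared keyword-argument object is not.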

NeuralGraph models

For convenience, two builder functions are included that can build a variety of Neural Graph models by specifying their parameters. See NGF/models.py.

The examples in examples.py should help you along the way. You can store and load the trained models; make sure to specify the custom classes:

model = load_model('model.h5', custom_objects={'NeuralGraphHidden':NeuralGraphHidden, 'NeuralGraphOutput':NeuralGraphOutput})

Dependencies

  • RDKit This dependency is necessary to convert molecules into tensor representations. Once this step is done, the data can be stored and RDKit is no longer a dependency.
  • Keras Keras 1.x is required for building, training and evaluating the models.
  • NumPy
