
crcollins / molml

Licence: MIT license
A library to interface molecules and machine learning.

Projects that are alternatives to or similar to molml

Pubchempy
Python wrapper for the PubChem PUG REST API.
Stars: ✭ 171 (+200%)
Mutual labels:  chemistry, cheminformatics
organic-chemistry-reaction-prediction-using-NMT
organic chemistry reaction prediction using NMT with Attention
Stars: ✭ 30 (-47.37%)
Mutual labels:  chemistry, cheminformatics
Awesome Cheminformatics
A curated list of Cheminformatics libraries and software.
Stars: ✭ 244 (+328.07%)
Mutual labels:  chemistry, cheminformatics
Indigo
Universal cheminformatics libraries, utilities and database search tools
Stars: ✭ 146 (+156.14%)
Mutual labels:  chemistry, cheminformatics
GLaDOS
Web Interface for ChEMBL @ EMBL-EBI
Stars: ✭ 28 (-50.88%)
Mutual labels:  chemistry, cheminformatics
Kekule.js
A Javascript cheminformatics toolkit.
Stars: ✭ 156 (+173.68%)
Mutual labels:  chemistry, cheminformatics
chembience
A Docker-based, cloudable platform for the development of chemoinformatics-centric web applications and microservices.
Stars: ✭ 41 (-28.07%)
Mutual labels:  chemistry, cheminformatics
Smiles Transformer
Original implementation of the paper "SMILES Transformer: Pre-trained Molecular Fingerprint for Low Data Drug Discovery" by Shion Honda et al.
Stars: ✭ 86 (+50.88%)
Mutual labels:  chemistry, cheminformatics
mongodb-chemistry
Ideas for chemical similarity searches in MongoDB.
Stars: ✭ 23 (-59.65%)
Mutual labels:  chemistry, cheminformatics
py4chemoinformatics
Python for chemoinformatics
Stars: ✭ 78 (+36.84%)
Mutual labels:  chemistry, cheminformatics
Ase ani
ANI-1 neural net potential with python interface (ASE)
Stars: ✭ 145 (+154.39%)
Mutual labels:  chemistry, cheminformatics
qp2
Quantum Package : a programming environment for wave function methods
Stars: ✭ 37 (-35.09%)
Mutual labels:  chemistry, quantum
Stk
A Python library which allows construction and manipulation of complex molecules, as well as automatic molecular design and the creation of molecular databases.
Stars: ✭ 99 (+73.68%)
Mutual labels:  chemistry, cheminformatics
Chembl webresource client
Official Python client for accessing ChEMBL API.
Stars: ✭ 165 (+189.47%)
Mutual labels:  chemistry, cheminformatics
Chemfiles
Library for reading and writing chemistry files
Stars: ✭ 95 (+66.67%)
Mutual labels:  chemistry, cheminformatics
senpai
Molecular dynamics simulation software
Stars: ✭ 124 (+117.54%)
Mutual labels:  chemistry, cheminformatics
Cirpy
Python wrapper for the NCI Chemical Identifier Resolver (CIR)
Stars: ✭ 55 (-3.51%)
Mutual labels:  chemistry, cheminformatics
Molvs
Molecule Validation and Standardization
Stars: ✭ 76 (+33.33%)
Mutual labels:  chemistry, cheminformatics
Version3
Version 3 of Chem4Word - A Chemistry Add-In for Microsoft Word
Stars: ✭ 53 (-7.02%)
Mutual labels:  chemistry, cheminformatics
MolecularGraph.jl
Graph-based molecule modeling toolkit for cheminformatics
Stars: ✭ 144 (+152.63%)
Mutual labels:  chemistry, cheminformatics

MolML


A library to interface molecules and machine learning. The goal of this library is to provide a simple way to convert molecules into vector representations for later use with libraries such as scikit-learn. To make this easy, MolML follows the same API scheme as scikit-learn transformers (fit, transform, and fit_transform).
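The snippet below is a toy illustration of that fit/transform pattern in plain Python. It is not MolML code, and the ElementCounter class is hypothetical. What it shows is why fit() exists at all: it learns something from the training set (here, the set of elements present) that fixes the length of every output vector, so molecules of different sizes map to vectors of the same length. The (elements, coordinates) tuples mirror the input format used in the example usage below.

```python
# Toy illustration (not MolML code) of the scikit-learn style fit/transform
# pattern: fit() learns the output layout, transform() produces vectors.


class ElementCounter:
    """Hypothetical featurizer: counts how many atoms of each element a molecule has."""

    def fit(self, molecules):
        # Learn the sorted set of elements seen across all training molecules.
        elements = set()
        for elems, _coords in molecules:
            elements.update(elems)
        self.elements_ = sorted(elements)
        return self  # returning self allows chaining, e.g. feat.fit(X).transform(X)

    def transform(self, molecules):
        # One count per learned element; elements not seen in fit() are ignored.
        return [[list(elems).count(e) for e in self.elements_]
                for elems, _coords in molecules]

    def fit_transform(self, molecules):
        return self.fit(molecules).transform(molecules)


H2 = (['H', 'H'], [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
HCN = (['H', 'C', 'N'], [[-1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])

feat = ElementCounter()
print(feat.fit_transform([H2, HCN]))  # [[0, 2, 0], [1, 1, 1]], columns C, H, N
```

Because the vocabulary is frozen by fit(), calling transform() later on new molecules always yields vectors of the same length, which is what downstream estimators expect.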

All of the coordinates are assumed to be in angstroms.
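If your coordinates come from a source that reports them in bohr (atomic units), they need to be rescaled before featurizing. The sketch below assumes the (elements, coordinates) tuple layout used in the example usage; bohr_to_angstrom is a hypothetical helper, not part of MolML. The conversion factor is the CODATA 2018 value of the Bohr radius.

```python
# Hypothetical helper (not part of MolML): rescale coordinates from bohr to angstroms.
BOHR_TO_ANGSTROM = 0.529177210903  # CODATA 2018 Bohr radius, in angstroms


def bohr_to_angstrom(molecule):
    """Return a copy of an (elements, coordinates) tuple with coordinates in angstroms."""
    elements, coords = molecule
    converted = [[x * BOHR_TO_ANGSTROM for x in atom] for atom in coords]
    return (list(elements), converted)


# H2 with a bond length of 1.4 bohr (roughly 0.74 angstroms)
H2_bohr = (['H', 'H'], [[0.0, 0.0, 0.0], [1.4, 0.0, 0.0]])
H2 = bohr_to_angstrom(H2_bohr)
print(H2)
```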

Features

- Simple interface to many common molecular descriptors and their variants
    - Molecule
        - Coulomb Matrix
        - Bag of Bonds
        - Encoded Bonds
        - Encoded Angles
        - Connectivity
        - Connectivity Tree
        - Autocorrelation
    - Atom
        - Shell
        - Local Encoded Bonds
        - Local Encoded Angles
        - Local Coulomb Matrix
        - Behler-Parrinello
    - Kernel
        - Atom/Summation Kernel
    - Fragment
        - FragmentMap
    - Crystal
        - Generalized Crystal
        - Ewald Sum Matrix
        - Sine Matrix
- Parallel feature generation
- Ability to save/load fit models
- Multiple input formats supported (and ability to define your own)
- Supports both Python 2 and Python 3
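For a feel of what one of the listed molecule descriptors computes, here is a dependency-free sketch of the general Bag of Bonds idea: pairwise Coulomb terms Z_i*Z_j/r_ij are grouped into "bags" by element pair, each bag is sorted in descending order, and each bag is zero-padded to the largest size seen while fitting. The helper names (bags, fit_bag_sizes, bag_of_bonds) and the Z table are hypothetical, and the sketch covers only the pairwise terms; MolML's own Bag of Bonds implementation may differ in details such as bag ordering or which terms it includes.

```python
import math
from itertools import combinations

# Nuclear charges for the few elements used here (extend as needed).
Z = {'H': 1, 'C': 6, 'N': 7, 'O': 8}


def bags(molecule):
    """Group pairwise Coulomb terms Z_i*Z_j/r_ij by (sorted) element pair."""
    elements, coords = molecule
    out = {}
    for i, j in combinations(range(len(elements)), 2):
        key = tuple(sorted((elements[i], elements[j])))
        r = math.sqrt(sum((a - b) ** 2 for a, b in zip(coords[i], coords[j])))
        out.setdefault(key, []).append(Z[elements[i]] * Z[elements[j]] / r)
    return out


def fit_bag_sizes(molecules):
    """Learn the largest size of each bag across the training set, in a fixed order."""
    sizes = {}
    for mol in molecules:
        for key, values in bags(mol).items():
            sizes[key] = max(sizes.get(key, 0), len(values))
    return sorted(sizes.items())  # list of ((elem_a, elem_b), size) pairs


def bag_of_bonds(molecule, sizes):
    """Concatenate each bag, sorted in descending order and zero-padded."""
    mol_bags = bags(molecule)
    vector = []
    for key, size in sizes:
        # Keep the largest values if a molecule has more pairs than seen in fitting.
        values = sorted(mol_bags.get(key, []), reverse=True)[:size]
        vector.extend(values + [0.0] * (size - len(values)))
    return vector


H2 = (['H', 'H'], [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
HCN = (['H', 'C', 'N'], [[-1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])

sizes = fit_bag_sizes([H2, HCN])   # bags: C-H, C-N, H-H, H-N
print(bag_of_bonds(H2, sizes))     # [0.0, 0.0, 1.0, 0.0]
print(bag_of_bonds(HCN, sizes))    # [6.0, 42.0, 0.0, 3.5]
```

Unlike a raw Coulomb matrix, each position in the vector always refers to the same kind of atom pair, which is the motivation usually given for this descriptor.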

Example Usage

    >>> from molml.features import CoulombMatrix
    >>> feat = CoulombMatrix()
    >>> H2 = (
    ...         ['H', 'H'],
    ...         [
    ...             [0.0, 0.0, 0.0],
    ...             [1.0, 0.0, 0.0],
    ...         ]
    ... )
    >>> HCN = (
    ...         ['H', 'C', 'N'],
    ...         [
    ...             [-1.0, 0.0, 0.0],
    ...             [ 0.0, 0.0, 0.0],
    ...             [ 1.0, 0.0, 0.0],
    ...         ]
    ... )
    >>> feat.fit([H2, HCN])
    CoulombMatrix(input_type='list', n_jobs=1, sort=False, eigen=False, drop_values=False, only_lower_triangle=False)
    >>> feat.transform([H2])
    array([[ 0.5,  1. ,  0. ,  1. ,  0.5,  0. ,  0. ,  0. ,  0. ]])
    >>> feat.transform([H2, HCN])
    array([[  0.5      ,   1.       ,   0.       ,   1.       ,   0.5      ,
              0.       ,   0.       ,   0.       ,   0.       ],
           [  0.5      ,   6.       ,   3.5      ,   6.       ,  36.8581052,
             42.       ,   3.5      ,  42.       ,  53.3587074]])
    >>>
    >>> # Example loading from files directly
    >>> feat2 = CoulombMatrix(input_type='filename')
    >>> paths = ['data/qm7/qm-%04d.out' % i for i in range(2)]
    >>> feat2.fit_transform(paths)
    array([[ 36.8581052 ,   5.49459021,   5.49462885,   5.4945    ,
              5.49031286,   0.        ,   0.        ,   0.        ,
              5.49459021,   0.5       ,   0.56071947,   0.56071656,
              0.56064037,   0.        ,   0.        ,   0.        ,
              5.49462885,   0.56071947,   0.5       ,   0.56071752,
              0.56064089,   0.        ,   0.        ,   0.        ,
              5.4945    ,   0.56071656,   0.56071752,   0.5       ,
              0.56063783,   0.        ,   0.        ,   0.        ,
              5.49031286,   0.56064037,   0.56064089,   0.56063783,
              0.5       ,   0.        ,   0.        ,   0.        ,
              0.        ,   0.        ,   0.        ,   0.        ,
              0.        ,   0.        ,   0.        ,   0.        ,
              0.        ,   0.        ,   0.        ,   0.        ,
              0.        ,   0.        ,   0.        ,   0.        ,
              0.        ,   0.        ,   0.        ,   0.        ,
              0.        ,   0.        ,   0.        ,   0.        ],
           [ 36.8581052 ,  23.81043959,   5.48396427,   5.48394941,
              5.4837656 ,   2.78378686,   2.78375582,   2.78376439,
              23.8104396,  36.8581052 ,   2.78378953,   2.78375777,
              2.78375823,   5.4839846 ,   5.48393324,   5.48376877,
              5.48396427,   2.78378953,   0.5       ,   0.56363019,
              0.56362464,   0.40019757,   0.39971446,   0.3261774 ,
              5.48394941,   2.78375777,   0.56363019,   0.5       ,
              0.56362305,   0.39971429,   0.32617621,   0.40019524,
              5.4837656 ,   2.78375823,   0.56362464,   0.56362305,
              0.5       ,   0.32617702,   0.40019469,   0.3997145 ,
              2.78378686,   5.4839846 ,   0.40019757,   0.39971429,
              0.32617702,   0.5       ,   0.56362996,   0.56362587,
              2.78375582,   5.48393324,   0.39971446,   0.32617621,
              0.40019469,   0.56362996,   0.5       ,   0.56362278,
              2.78376439,   5.48376877,   0.3261774 ,   0.40019524,
              0.3997145 ,   0.56362587,   0.56362278,   0.5       ]])

For more examples, look in the examples directory. Note: some of the examples require scikit-learn>=0.16.0.

For the full documentation, refer to the docs or the docstrings in the code.

Dependencies

MolML works with both Python 2 and Python 3. It has been tested with the versions listed below, but newer versions should work.

python>=2.7/3.5/3.6
numpy>=1.9.1
scipy>=0.15.1
pathos>=0.2.0
bidict>=0.17.5
future  # For python 2

NOTE: Due to an issue with multiprocess (a pathos dependency), the minimum version of Python that will work is 2.7.4. On older versions, the parallel computation of features will fail.
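
If code has to run on interpreters of unknown vintage, a guard like the one below can fall back to serial feature generation. It is a sketch: parallel_features_supported is a hypothetical helper, and the only fact it encodes is the 2.7.4 minimum stated in the note. n_jobs is the parameter shown in the CoulombMatrix repr in the example usage; the value 2 is an arbitrary choice.

```python
import sys


def parallel_features_supported(version_info=None):
    """Hypothetical helper: True if the interpreter meets the 2.7.4 minimum."""
    if version_info is None:
        version_info = sys.version_info
    # Tuples compare element-wise, so (3, 6, 0) >= (2, 7, 4) is True.
    return tuple(version_info[:3]) >= (2, 7, 4)


# Request parallel workers only when they are expected to work (2 is arbitrary).
n_jobs = 2 if parallel_features_supported() else 1
print(n_jobs)
```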

Install

Once numpy and scipy are installed, the package can be installed with pip.

$ pip install molml

Or for the bleeding edge version, you can use

$ pip install git+https://github.com/crcollins/molml

Development

To install a development version, just clone the git repo.

$ git clone https://github.com/crcollins/molml
$ cd molml  # then create and activate a virtualenv
$ pip install -r requirements-dev.txt

Pull requests and bug reports are welcome!

To build the documentation, you just need to install the documentation dependencies. These are already included in the dev install.

$ cd docs/
$ pip install -r requirements-docs.txt
$ make html

Testing

To run the tests, make sure that nose is installed and then run:

$ nosetests

To include coverage information, make sure that coverage is installed and then run:

$ nosetests --with-coverage --cover-package=molml --cover-erase

Citation

Currently, there is not a dedicated publication for MolML. Instead, feel free to cite the work that spawned this library.

@article{collins2018constant,
    title={Constant size descriptors for accurate machine learning models of molecular properties},
    author={Collins, Christopher R and Gordon, Geoffrey J and von Lilienfeld, O Anatole and Yaron, David J},
    journal={The Journal of Chemical Physics},
    volume={148},
    number={24},
    pages={241718},
    year={2018},
    publisher={AIP Publishing}
}

In addition, each feature extraction method has its own main reference listed in the docstring. These can also be accessed as follows:

    >>> from molml.features import CoulombMatrix
    >>> print(CoulombMatrix().get_citation())
    Rupp, M.; Tkatchenko, A.; Muller, K.-R.; von Lilienfeld, O. A. Fast and Accurate Modeling of Molecular Atomization Energies with Machine Learning. Phys. Rev. Lett. 2012, 108, 058301.
    Hansen, K.; Montavon, G.; Biegler, F.; Fazli, S.; Rupp, M.; Scheffler, M.; von Lilienfeld, O. A.; Tkatchenko, A.; Muller, K.-R. Assessment and Validation of Machine Learning Methods for Predicting Molecular Atomization Energies. J. Chem. Theory Comput. 2013, 9, 3404-3419.