grisoniFr / whales_descriptors

Licence: other
python code for calculating the WHALES (Weighted Holistic Atom Localization and Entity Shape) molecular descriptors

Projects that are alternatives of or similar to whales descriptors

aaai17-cdq
The implementation of AAAI-17 paper "Collective Deep Quantization of Efficient Cross-modal Retrieval"
Stars: ✭ 33 (+37.5%)
Mutual labels:  similarity-search
EfficientIR
A local image retrieval tool | An EfficientNet based image retrieval tool
Stars: ✭ 64 (+166.67%)
Mutual labels:  similarity-search
apollo
Advanced similarity and duplicate source code proof of concept for our research efforts.
Stars: ✭ 49 (+104.17%)
Mutual labels:  similarity-search
pause
🍊 PAUSE (Positive and Annealed Unlabeled Sentence Embedding), accepted by EMNLP'2021 🌴
Stars: ✭ 19 (-20.83%)
Mutual labels:  similarity-search
efficient-descriptors
🚀🚀 Revisiting Binary Local Image Description for Resource Limited Devices
Stars: ✭ 76 (+216.67%)
Mutual labels:  descriptors
local-descriptors-for-image-classification
Local Descriptors
Stars: ✭ 22 (-8.33%)
Mutual labels:  descriptors
GeobitNonrigidDescriptor ICCV 2019
C++ implementation of the nonrigid descriptor Geobit presented at ICCV 2019 "GEOBIT: A Geodesic-Based Binary Descriptor Invariant to Non-Rigid Deformations for RGB-D Images"
Stars: ✭ 11 (-54.17%)
Mutual labels:  descriptors
fastDesp-corrProp
Fast Descriptors and Correspondence Propagation for Robust Global Point Cloud Registration
Stars: ✭ 16 (-33.33%)
Mutual labels:  descriptors
lcd
[AAAI'20] LCD: Learned Cross-Domain Descriptors for 2D-3D Matching
Stars: ✭ 93 (+287.5%)
Mutual labels:  descriptors
bidd-molmap
MolMap: An Efficient Convolutional Neural Network Based Molecular Deep Learning Tool
Stars: ✭ 102 (+325%)
Mutual labels:  descriptors
awesome-vector-search
Collections of vector search related libraries, service and research papers
Stars: ✭ 460 (+1816.67%)
Mutual labels:  similarity-search
wordvector be
Web service: finds similar keywords using Tencent's 8-million-word vector model and the Spotify Annoy engine
Stars: ✭ 92 (+283.33%)
Mutual labels:  similarity-search
dhash-vips
vips-powered ruby gem to measure images similarity, implementing dHash and IDHash algorithms
Stars: ✭ 75 (+212.5%)
Mutual labels:  similarity-search
node-dvbtee
MPEG2 transport stream parser for Node.js with support for television broadcast PSIP tables and descriptors
Stars: ✭ 24 (+0%)
Mutual labels:  descriptors
visualsearch
Visual Search is a little app to find and cluster similar images using Tagbox
Stars: ✭ 31 (+29.17%)
Mutual labels:  similarity-search
padelpy
A Python wrapper for PaDEL-Descriptor software
Stars: ✭ 121 (+404.17%)
Mutual labels:  molecular-descriptors
PyVGGFace
VGG-Face CNN descriptor in PyTorch.
Stars: ✭ 21 (-12.5%)
Mutual labels:  descriptors
Milvus
An open-source vector database for embedding similarity search and AI applications.
Stars: ✭ 9,015 (+37462.5%)
Mutual labels:  similarity-search
KgCLUE
KgCLUE: large-scale open-source Chinese knowledge graph question answering
Stars: ✭ 131 (+445.83%)
Mutual labels:  similarity-search
Rcpi
Molecular informatics toolkit with a comprehensive integration of bioinformatics and cheminformatics tools for drug discovery.
Stars: ✭ 22 (-8.33%)
Mutual labels:  molecular-descriptors

NEW version!!

Check out our new version of this code (for Python 3, with improved molecule loading and optimization) here.

WHALES descriptors

This repository contains all the necessary files to compute Weighted Holistic Atom Localization and Entity Shape (WHALES) descriptors starting from an rdkit supplier file.

For more information regarding the method, have a look at:

Francesca Grisoni, Daniel Merk, Viviana Consonni, Jan A. Hiss, Sara Giani Tagliabue, Roberto Todeschini & Gisbert Schneider, "Scaffold hopping from natural products to synthetic mimetics by holistic molecular similarity", Communications Chemistry 1, 44, 2018. (Freely available at this link)

Getting Started

These instructions will get you a copy of the project up and running on your local machine.

Prerequisites

The following prerequisites are needed:

  • Python 2.7
  • RDKit
  • NumPy
  • pandas

A guide to the correct installation is provided in the following paragraph.

Preliminary steps

Install conda from the official website. Once conda is installed, it can be used to generate the environment and download RDKit. If you already have RDKit and pandas up and running, you can move to the next paragraph.

It is suggested to run all the calculations within an RDKit environment. The environment can be created with conda as follows:

conda create -n whales_env python=2.7*
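conda activate whales_env  # on older conda releases: source activate whales_env (Linux/macOS) or activate whales_env (Windows)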

RDKit can then be installed from the rdkit channel with the following command:

conda install -c rdkit rdkit

Alternatively, you can list the RDKit packages available on other channels with the following:

anaconda search -t conda rdkit

Then choose the best installation for py27 according to your platform. For instance:

conda install -c https://conda.anaconda.org/nickvandewiele rdkit

Now install the remaining prerequisites:

sudo apt-get install python-setuptools
sudo apt install git
python -m pip install --user pandas

Installing WHALES repository

The repository can be cloned as follows:

git clone https://github.com/grisoniFr/WHALES_descriptors.git

Change directory to the main WHALES folder inside your local Git repository, e.g., <git_repository\current_user>\WHALES-descriptors\

Then, install the package as follows:

sudo python setup.py install

To check whether the installation went well, type:

python 
import whales_descriptors
quit()

If no errors are displayed, the WHALES package has been successfully installed.
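The same check can be extended to all prerequisites at once. The following is a minimal sketch that uses only the standard library; the helper name check_modules and the module list are illustrative assumptions, not part of the WHALES package.

```python
import importlib

# Modules assumed to be needed, based on the prerequisites listed above.
REQUIRED = ("rdkit", "numpy", "pandas", "whales_descriptors")


def check_modules(names=REQUIRED):
    """Return a dict mapping each module name to True if it can be imported."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


if __name__ == "__main__":
    for name, ok in sorted(check_modules().items()):
        print("{:<20} {}".format(name, "OK" if ok else "MISSING"))
```

Any module reported as MISSING should be installed in the active conda environment before computing the descriptors.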

Using the package

Importing molecular files

RDKit suppliers have to be used as the input for WHALES calculation, for instance:

python # start python
from rdkit import Chem # imports package
suppl = Chem.SDMolSupplier(filename) # generates an rdkit supplier file

If there are more than approx. 10,000 molecules, it is suggested to use ForwardSDMolSupplier instead:

suppl = Chem.ForwardSDMolSupplier(filename) 

Note that geometrical coordinates have to be specified/computed in order to calculate WHALES descriptors.
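If the input only contains SMILES strings or 2D structures, 3D coordinates could be generated with RDKit before building the supplier. The sketch below embeds and MMFF-optimizes each molecule and writes the result to an SDF file. The helper names (smiles_to_3d, write_3d_sdf), the random seed, and the example SMILES are illustrative assumptions; the conformer generation settings actually used by the authors are not described here.

```python
import os
import tempfile

from rdkit import Chem
from rdkit.Chem import AllChem


def smiles_to_3d(smiles, seed=42):
    """Return a molecule with explicit hydrogens and one 3D conformer, or None."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # SMILES could not be parsed
        return None
    mol = Chem.AddHs(mol)  # hydrogens help to obtain a sensible geometry
    if AllChem.EmbedMolecule(mol, randomSeed=seed) == -1:  # embedding failed
        return None
    AllChem.MMFFOptimizeMolecule(mol)  # quick force-field relaxation
    return mol


def write_3d_sdf(smiles_list, path):
    """Write all molecules that could be embedded to an SDF file; return the count."""
    writer = Chem.SDWriter(path)
    n_written = 0
    for smiles in smiles_list:
        mol = smiles_to_3d(smiles)
        if mol is not None:
            writer.write(mol)
            n_written += 1
    writer.close()
    return n_written


# Toy input: ethanol and aspirin (example molecules only).
sdf_path = os.path.join(tempfile.mkdtemp(), "molecules_3d.sdf")
n_written = write_3d_sdf(["CCO", "CC(=O)Oc1ccccc1C(=O)O"], sdf_path)

# The resulting file can then be loaded as an RDKit supplier.
suppl = Chem.SDMolSupplier(sdf_path, removeHs=False)
```

Whether to keep explicit hydrogens (removeHs=False) is a modelling choice left to the user.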

Utilizing WHALES descriptors

The WHALES package can be imported as follows:

from whales_descriptors import do_whales

and used to calculate the descriptors for the supplier molecules

x, labels = do_whales.main(suppl, charge_threshold=0, do_charge=True, property_name='')

Specified parameters:

  • suppl: rdkit supplier
  • charge_threshold: to neglect atoms with absolute partial charges lower than the threshold (default = 0)
  • do_charge: if True, Gasteiger-Marsili partial charges are computed with rdkit
  • property_name: name of the column containing partial charges of the sdf file (mandatory if do_charge is False)

Returns:

  • x (n_mol,p): descriptor matrix, each row corresponds to a molecule
  • labels (1,p): descriptor labels

N.B. If a calculation error occurs for a given molecule (e.g., no partial charges computed), the corresponding descriptor values are set to -999.
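Before any similarity calculation, it may be useful to separate the molecules whose calculation failed. The sketch below uses a toy matrix in place of the x returned by do_whales.main; the helper name split_failed is hypothetical, and a row is treated as failed if any of its values equals -999, which is a conservative assumption.

```python
import numpy as np

MISSING = -999  # sentinel value described in the N.B. above


def split_failed(x, missing=MISSING):
    """Return (valid rows of x, indices of the rows containing the sentinel)."""
    x = np.asarray(x, dtype=float)
    failed = np.any(x == missing, axis=1)  # one boolean flag per molecule
    return x[~failed], np.flatnonzero(failed)


# Toy descriptor matrix: the second molecule failed.
x_demo = np.array([[0.1, 0.2, 0.3],
                   [-999, -999, -999],
                   [0.4, 0.5, 0.6]])

x_ok, failed_idx = split_failed(x_demo)
print("valid molecules: {}, failed rows: {}".format(x_ok.shape[0], failed_idx.tolist()))
```

The returned indices refer to the order of the molecules in the supplier, so they can be used to trace the failed structures back to the input file.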

Exporting descriptor values as a .txt file

The results can be exported as a plain txt file as follows:

import numpy as np
np.savetxt(save_name + '_whales.txt', x, delimiter=' ', newline='\n') # for descriptors
np.savetxt(save_name + '_labels.txt', labels, delimiter=' ', newline='\n',fmt='%s') # for labels

where "save_name" is a user-defined name, e.g., "WHALES_descriptors".
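The exported files can later be read back with NumPy. The following is a minimal round-trip sketch on toy data; the helper name load_whales and the labels d1 to d3 are hypothetical, and the file naming follows the two savetxt calls above.

```python
import os
import tempfile

import numpy as np


def load_whales(save_name):
    """Read back the descriptor matrix and the labels written with np.savetxt."""
    x = np.loadtxt(save_name + '_whales.txt', delimiter=' ', ndmin=2)
    labels = np.loadtxt(save_name + '_labels.txt', dtype=str, ndmin=1)
    return x, labels


# Toy data standing in for the output of do_whales.main.
x_demo = np.array([[0.5, -1.25, 3.0],
                   [2.0, 0.0, 7.5]])
labels_demo = ['d1', 'd2', 'd3']  # hypothetical descriptor labels

save_name = os.path.join(tempfile.mkdtemp(), 'WHALES_descriptors')
np.savetxt(save_name + '_whales.txt', x_demo, delimiter=' ', newline='\n')
np.savetxt(save_name + '_labels.txt', labels_demo, delimiter=' ', newline='\n', fmt='%s')

x_back, labels_back = load_whales(save_name)
```

Passing ndmin=2 keeps the matrix two-dimensional even when the file contains a single molecule.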

Authors

Contributors to the WHALES descriptors project:

  • Francesca Grisoni, University of Milano-Bicocca & ETH Zurich
  • Prof. Dr. Gisbert Schneider, ETH Zurich
  • Dr. Viviana Consonni, University of Milano-Bicocca
  • Prof. Roberto Todeschini, University of Milano-Bicocca

See also the list of contributors who participated in this project.

Publications that used WHALES descriptors to identify bioactive molecules

  • Grisoni et al. "Scaffold hopping from natural products to synthetic mimetics by holistic molecular similarity", Communications Chemistry 1, 44, 2018. (link)
  • Merk et al. "Scaffold hopping from synthetic RXR modulators by virtual screening and de novo design", Med. Chem. Commun., 2018, 9, 1289-1292. (link)
  • Merk et al. "De Novo Design of Bioactive Small Molecules by Artificial Intelligence", Mol. Inf., 2018, 1700153. (link)
  • Grisoni et al. "Scaffold-hopping from synthetic drugs by holistic molecular representation", Scientific Reports 8, 2018. (link)
  • Grisoni et al. "Design of Natural‐Product‐Inspired Multitarget Ligands by Machine Learning", ChemMedChem 14, 2019. (link)

License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. See the LICENSE.md file for additional details.
