UnixJunkie / molenc

License: BSD-3-Clause
MolEnc: a molecular encoder using rdkit and OCaml.

Introduction

MolEnc: a molecular encoder using rdkit and OCaml.

The implemented fingerprint is J.-L. Faulon's "Signature Molecular Descriptor" (SMD) [1]. This is an unfolded-counted chemical fingerprint. Such fingerprints are less lossy than popular chemical fingerprints like ECFP4. SMD encoding does not introduce feature collisions. In addition, a feature dictionary is created at encoding time. This dictionary can be used later to map a given feature index back to an atom environment. MolEnc also implements unfolded-counted atom pairs [2].
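To make "unfolded-counted" and "feature dictionary" concrete, here is a minimal Python sketch. It is not MolEnc's implementation: the feature labels are made-up placeholders for atom environments, and the fold function uses a simple modulo as a stand-in for the hashing step of folded fingerprints.

```python
from collections import Counter


def encode(features, dictionary):
    """Return a sparse {feature_index: count} fingerprint for one molecule.

    features: iterable of feature labels (hypothetical atom environments).
    dictionary: dict mapping label -> index; it grows as new labels appear,
    so every distinct feature gets its own index (no collisions).
    """
    fp = Counter()
    for feat in features:
        if feat not in dictionary:
            dictionary[feat] = len(dictionary)
        fp[dictionary[feat]] += 1
    return dict(fp)


def fold(fp, size):
    """Fold a sparse fingerprint onto `size` positions (lossy, for contrast)."""
    folded = Counter()
    for index, count in fp.items():
        folded[index % size] += count
    return dict(folded)


dictionary = {}
molecule = ["env_A", "env_A", "env_B", "env_C"]  # placeholder feature labels
fp = encode(molecule, dictionary)

# The dictionary can be inverted to map an index back to its feature.
index_to_feature = {index: feat for feat, index in dictionary.items()}

# Folding merges distinct features: env_A and env_C now share position 0.
folded = fold(fp, 2)
```

In the unfolded form, each index still identifies exactly one feature, which is what allows the lookup from index to atom environment.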

For SMD, we recommend using a radius of zero to one (molenc.sh -r 0:1 ...) or zero to two (molenc.sh -r 0:2 ...).

Currently, the atom typing scheme being used is: (number of pi electrons, element symbol, number of heavy-atom neighbors, formal charge).
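As an illustration of this four-field atom type, the sketch below builds such tuples for a tiny hand-written molecule. It is an assumption-laden toy: the AtomType name, the type_atoms helper, and the input format are hypothetical, and the pi-electron counts and formal charges are supplied by hand rather than computed by rdkit as MolEnc would do.

```python
from typing import NamedTuple


class AtomType(NamedTuple):
    """Hypothetical container mirroring the four fields of the atom type."""
    pi_electrons: int
    symbol: str
    heavy_neighbors: int
    formal_charge: int


def type_atoms(symbols, bonds, pi_electrons, charges):
    """Assign an AtomType to each atom of a molecular graph.

    symbols: element symbol per atom; bonds: list of (i, j) index pairs.
    Only non-hydrogen neighbors are counted as heavy-atom neighbors.
    """
    heavy = [0] * len(symbols)
    for i, j in bonds:
        if symbols[j] != "H":
            heavy[i] += 1
        if symbols[i] != "H":
            heavy[j] += 1
    return [
        AtomType(pi, sym, n, q)
        for pi, sym, n, q in zip(pi_electrons, symbols, heavy, charges)
    ]


# Methylammonium, heavy atoms only: a carbon bonded to a positively charged nitrogen.
types = type_atoms(
    symbols=["C", "N"],
    bonds=[(0, 1)],
    pi_electrons=[0, 0],
    charges=[0, 1],
)
```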

In the future, we might add pharmacophore feature points [3] (Donor, Acceptor, PosIonizable, NegIonizable, Aromatic, Hydrophobe), to allow a fuzzier description of molecules.

How to install the software

For beginners and non-opam users: download the latest self-installer shell script from https://github.com/UnixJunkie/molenc/releases.

Then execute it:

./molenc-5.0.1.sh ~/usr/molenc-5.0.1

This will create ~/usr/molenc-5.0.1/bin/molenc.sh, among other things inside the same directory.

For opam users:

opam install molenc

Do not hesitate to contact the author if you have problems installing or using the software, or if you have any questions.

Usage

molenc.sh -i input.smi -o output.txt
         [-d encoding.dix]: reuse existing feature dictionary
         [-r i:j]: fingerprint radius (default=0:1)
         [--pairs]: use atom pairs instead of Faulon's FP
         [-m <int>]: maximum allowed atom-pair distance
                     (default: no limit)
         [--seq]: sequential mode (disable parallelization)
         [-v]: debug mode; keep temp files
         [-n <int>]: max jobs in parallel
         [-c <int>]: chunk size
         [--no-std]: don't standardize input file molecules
                     ONLY USE IF THEY HAVE ALREADY BEEN STANDARDIZED

How to encode a database of molecules:

molenc.sh -i molecules.smi -o molecules.txt

How to encode another database of molecules, but reusing the feature dictionary from another database:

molenc.sh -i other_molecules.smi -o other_molecules.txt -d molecules.txt.dix
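When the two steps above are driven from a script, the command lines can be assembled programmatically. The sketch below is one possible way to do this in Python. The molenc_cmd and run_all helpers are hypothetical, only the -i, -o, -r and -d options listed under Usage are used, and the dictionary file name (output name plus ".dix") is taken from the example above rather than from any documented rule.

```python
import subprocess


def molenc_cmd(input_smi, output_txt, dictionary=None, radius=None,
               exe="molenc.sh"):
    """Build the argument list for one molenc.sh invocation (hypothetical helper)."""
    cmd = [exe, "-i", input_smi, "-o", output_txt]
    if radius is not None:
        cmd += ["-r", radius]  # fingerprint radius, e.g. "0:1"
    if dictionary is not None:
        cmd += ["-d", dictionary]  # reuse an existing feature dictionary
    return cmd


def run_all(commands):
    """Run each command in order; stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# First database: the feature dictionary is created during encoding.
first = molenc_cmd("molecules.smi", "molecules.txt")

# Second database: reuse the dictionary from the first run so that
# feature indexes mean the same thing in both output files.
second = molenc_cmd("other_molecules.smi", "other_molecules.txt",
                    dictionary="molecules.txt.dix")

# run_all([first, second])  # uncomment once molenc.sh is on your PATH
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.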

Bibliography

[1] Faulon, J. L., Visco, D. P., & Pophale, R. S. (2003). The signature molecular descriptor. 1. Using extended valence sequences in QSAR and QSPR studies. Journal of Chemical Information and Computer Sciences, 43(3), 707-720.

[2] Carhart, R. E., Smith, D. H., & Venkataraghavan, R. (1985). Atom pairs as molecular features in structure-activity studies: definition and applications. Journal of Chemical Information and Computer Sciences, 25(2), 64-73.

[3] Kearsley, S. K., Sallamack, S., Fluder, E. M., Andose, J. D., Mosley, R. T., & Sheridan, R. P. (1996). Chemical similarity using physiochemical property descriptors. Journal of Chemical Information and Computer Sciences, 36(1), 118-127.
