
GLambard / Molecules_Dataset_Collection

Licence: MIT license
Collection of molecular data sets for validating property inference

Projects that are alternatives to or similar to Molecules Dataset Collection

paccmann datasets
pytoda - PaccMann PyTorch Dataset Classes. Read the docs: https://paccmann.github.io/paccmann_datasets/
Stars: ✭ 15 (-78.26%)
Mutual labels:  rdkit, smiles
mlss-2016
MLSS 2016 material.
Stars: ✭ 22 (-68.12%)
Mutual labels:  inference
chembience
A Docker-based, cloudable platform for the development of chemoinformatics-centric web applications and microservices.
Stars: ✭ 41 (-40.58%)
Mutual labels:  rdkit
tiny-schema-validator
JSON schema validator
Stars: ✭ 181 (+162.32%)
Mutual labels:  inference
Jupyter Dock
Jupyter Dock is a set of Jupyter Notebooks for performing molecular docking protocols interactively, as well as visualizing, converting file formats and analyzing the results.
Stars: ✭ 179 (+159.42%)
Mutual labels:  rdkit
ansible-role-fail2ban
Install and configure fail2ban on your system.
Stars: ✭ 42 (-39.13%)
Mutual labels:  molecule
hypothesis
A Python toolkit for (simulation-based) inference and the mechanization of science.
Stars: ✭ 47 (-31.88%)
Mutual labels:  inference
ims
📚 Introduction to Modern Statistics - A college-level open-source textbook with a modern approach highlighting multivariable relationships and simulation-based inference.
Stars: ✭ 509 (+637.68%)
Mutual labels:  inference
chainer-fcis
[This project has moved to ChainerCV] Chainer Implementation of Fully Convolutional Instance-aware Semantic Segmentation
Stars: ✭ 45 (-34.78%)
Mutual labels:  inference
sagemaker-xgboost-container
This is the Docker container based on the open source framework XGBoost (https://xgboost.readthedocs.io/en/latest/) that allows customers to use their own XGBoost scripts in SageMaker.
Stars: ✭ 93 (+34.78%)
Mutual labels:  inference
monai-deploy
MONAI Deploy aims to become the de-facto standard for developing, packaging, testing, deploying and running medical AI applications in clinical production.
Stars: ✭ 56 (-18.84%)
Mutual labels:  inference
mol frame
Chemical Structure Handling for Pandas DataFrames
Stars: ✭ 26 (-62.32%)
Mutual labels:  rdkit
r2inference
RidgeRun Inference Framework
Stars: ✭ 22 (-68.12%)
Mutual labels:  inference
studio-lab-examples
Example notebooks for working with SageMaker Studio Lab. Sign up for an account at the link below!
Stars: ✭ 319 (+362.32%)
Mutual labels:  inference
object.omit
Return a copy of an object without the given keys.
Stars: ✭ 79 (+14.49%)
Mutual labels:  properties
ConnectedProperties
Dynamically attach properties to (almost) any object instance.
Stars: ✭ 38 (-44.93%)
Mutual labels:  properties
chemprop
Fast and scalable uncertainty quantification for neural molecular property prediction, accelerated optimization, and guided virtual screening.
Stars: ✭ 75 (+8.7%)
Mutual labels:  molecule
javaproperties
Python library for reading & writing Java .properties files
Stars: ✭ 20 (-71.01%)
Mutual labels:  properties
CGCF-ConfGen
🧪 Learning Neural Generative Dynamics for Molecular Conformation Generation (ICLR 2021)
Stars: ✭ 41 (-40.58%)
Mutual labels:  molecule
daikon
Common modules shared by Talend applications
Stars: ✭ 14 (-79.71%)
Mutual labels:  properties

Collection of data sets of molecules and properties 🎁 😄

What is it?

  • Inspired by Moleculenet.ai
  • Selection of data sets of molecules (SMILES) and physicochemical properties
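As a rough illustration of how one of these files might be consumed, here is a minimal sketch using only the Python standard library. It assumes a CSV with a header row, a SMILES column named `smiles` and one numeric property column named `measured`; both column names and the `load_smiles_property` helper are hypothetical, so check the header of the actual file. Rows with a blank SMILES (the repository leaves a blank where standardization failed) are skipped.

```python
import csv
import io
from typing import List, Tuple


def load_smiles_property(
    handle, smiles_col: str = "smiles", prop_col: str = "measured"
) -> Tuple[List[str], List[float], int]:
    """Read (SMILES, property) pairs from a CSV file-like object.

    The default column names are hypothetical; pass the real header names.
    Returns the kept SMILES, their property values, and the number of rows
    skipped because the SMILES or the property value was blank.
    """
    smiles, values, skipped = [], [], 0
    for row in csv.DictReader(handle):
        s = (row.get(smiles_col) or "").strip()
        v = (row.get(prop_col) or "").strip()
        if not s or not v:
            # A blank SMILES marks a molecule that failed standardization.
            skipped += 1
            continue
        smiles.append(s)
        values.append(float(v))
    return smiles, values, skipped


# Toy in-memory file; the numbers are illustrative, not real measurements.
toy = io.StringIO("smiles,measured\nCCO,0.5\n,1.2\nC1=CC=CC=C1,-1.0\n")
print(load_smiles_property(toy))
```

Counting the skipped rows, rather than silently dropping them, makes it easy to report how many molecules were lost to failed standardization.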

Aim?

  1. Standardize all SMILES in the data sets with RDKit
  2. Gather the data sets in one place. They are all here!
  3. Use them to validate molecular property inference with various machine learning models, as proposed in Z. Wu et al.
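Validating a model requires holding out molecules it never saw during training. The sketch below shows a plain, seeded random split of row indices into train/validation/test sets; the 80/10/10 fractions and the `random_split` helper are assumptions for illustration. MoleculeNet also uses other splitting schemes, such as scaffold splits, which this sketch does not implement.

```python
import random
from typing import List, Tuple


def random_split(
    n: int, frac_train: float = 0.8, frac_valid: float = 0.1, seed: int = 0
) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle row indices 0..n-1 and cut them into train/valid/test."""
    if not 0 < frac_train + frac_valid <= 1:
        raise ValueError("train and valid fractions must sum to at most 1")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # fixed seed -> reproducible split
    n_train = int(frac_train * n)
    n_valid = int(frac_valid * n)
    train = idx[:n_train]
    valid = idx[n_train:n_train + n_valid]
    test = idx[n_train + n_valid:]  # everything left over
    return train, valid, test


train, valid, test = random_split(100, seed=42)
print(len(train), len(valid), len(test))
```

Reusing the same seed for every model means all models are scored on the same held-out molecules, which keeps the comparison fair.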

Method?

  • All data sets are standardized with RDKit to output isomeric, canonical, kekulized SMILES (Daylight)
  • If a SMILES string could not be standardized, the corresponding entry is left blank instead of carrying the original SMILES
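The standardize-or-blank rule above can be sketched as follows. This is not the repository's own script: the `standardize_column` helper is hypothetical, and the standardizer is passed in as a callable so the example runs without RDKit. With RDKit installed, one plausible standardizer parses with `Chem.MolFromSmiles` (which returns `None` for unparsable input), calls `Chem.Kekulize(mol, clearAromaticFlags=True)`, and writes with `Chem.MolToSmiles(mol, isomericSmiles=True, kekuleSmiles=True)`; whether these are the exact options used here is an assumption.

```python
from typing import Callable, List, Optional


def standardize_column(
    smiles: List[str], standardizer: Callable[[str], Optional[str]]
) -> List[str]:
    """Apply `standardizer` to every SMILES, writing "" where it fails.

    The output keeps one entry per input row, so labels stored alongside
    the original SMILES stay aligned with the standardized column.
    """
    out = []
    for s in smiles:
        try:
            result = standardizer(s)
        except Exception:  # a parser error counts as a failure
            result = None
        out.append(result or "")  # None or "" both become a blank
    return out


# Hypothetical stand-in for RDKit: a lookup table from input SMILES to a
# Kekule-form output. Unknown strings map to None, i.e. "failed".
toy_table = {"c1ccccc1": "C1=CC=CC=C1", "OCC": "CCO"}
print(standardize_column(["c1ccccc1", "not_a_smiles", "OCC"], toy_table.get))
```

Writing a blank rather than deleting the row keeps the standardized column the same length as the original one.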

But what are these data sets?

  • Quantum Mechanics: QM9
  • Physical Chemistry: ESOL, FreeSolv, Lipophilicity
  • Biophysics: PCBA, HIV, BACE
  • Physiology: BBBP, Tox21, ToxCast, SIDER, ClinTox

From Moleculenet.ai, here are short descriptions of the data sets, with the inference task in square brackets (for the standardized data sets reported here):

  • QM9: Geometric, energetic, electronic and thermodynamic properties of DFT-modelled small molecules [regression]

  • ESOL: Water solubility data (log solubility in moles per litre) for common organic small molecules [regression]

  • FreeSolv: Experimental and calculated hydration free energy of small molecules in water [regression]

  • Lipophilicity: Experimental results of octanol/water distribution coefficient (logD at pH 7.4) [regression]

  • PCBA: Selected from PubChem BioAssay, consisting of measured biological activities of small molecules generated by high-throughput screening [classification]

  • HIV: Experimentally measured abilities to inhibit HIV replication [classification]

  • BACE: Quantitative (IC50) and qualitative (binary label) binding results for a set of inhibitors of human β-secretase 1 (BACE-1) [classification/regression]

  • BBBP: Binary labels of blood-brain barrier penetration (permeability) [classification]

  • Tox21: Qualitative toxicity measurements on 12 biological targets, including nuclear receptors and stress response pathways [classification]

  • ToxCast: Toxicology data for a large library of compounds based on in vitro high-throughput screening, including experiments on over 600 tasks [classification]

  • SIDER: Database of marketed drugs and adverse drug reactions (ADR), grouped into 27 system organ classes [classification]

  • ClinTox: Qualitative data of drugs approved by the FDA and those that have failed clinical trials for toxicity reasons [classification]
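When training several models across these data sets, it can help to keep the task labels above in a small lookup table, for example to choose a regression or classification head per data set. The `TASK_TYPES` table and `datasets_for` helper below are hypothetical, not part of the repository. QM9 is listed as regression because its targets are continuous quantities, and BACE carries both a binary label and an IC50 value.

```python
from typing import Dict, List

# Task types per data set, following the list above (QM9 as regression).
TASK_TYPES: Dict[str, List[str]] = {
    "QM9": ["regression"],
    "ESOL": ["regression"],
    "FreeSolv": ["regression"],
    "Lipophilicity": ["regression"],
    "PCBA": ["classification"],
    "HIV": ["classification"],
    "BACE": ["classification", "regression"],
    "BBBP": ["classification"],
    "Tox21": ["classification"],
    "ToxCast": ["classification"],
    "SIDER": ["classification"],
    "ClinTox": ["classification"],
}


def datasets_for(task: str) -> List[str]:
    """Return the data sets that support the given task type."""
    return [name for name, tasks in TASK_TYPES.items() if task in tasks]


print(datasets_for("regression"))
```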

Citation

Source: Moleculenet.ai

Paper: Zhenqin Wu, Bharath Ramsundar, Evan N. Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S. Pappu, Karl Leswing, Vijay Pande, "MoleculeNet: A Benchmark for Molecular Machine Learning", arXiv:1703.00564 [cs.LG], 2017
