GLambard / Molecules_Dataset_Collection

Licence: MIT license

Collection of molecule data sets for validating the inference of molecular properties


Collection of data sets of molecules and properties 🎁 😄

What is it?

Inspired by Moleculenet.ai
Selection of data sets of molecules (SMILES) and physicochemical properties

Aim?

All SMILES in the data sets have been regularized with RDKit
Gather the data sets in one place. They are all here!
Use them to validate the inference of molecular properties with various machine learning models, as proposed by Z. Wu et al. (see Citation)

Method?

All data sets are regularized with RDKit methods to output isomeric, canonical and kekulized SMILES (Daylight)
If a SMILES could not be regularized, a blank replaces the SMILES of the original data set
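
The two rules above can be sketched in Python as follows. This is only an illustration, not the script used to build this collection: the function names `rdkit_regularize` and `regularize_all` are hypothetical, and the exact RDKit options used for the published files are an assumption. RDKit is imported lazily, so `regularize_all` can also be driven by any other callable.

```python
from typing import Callable, Iterable, List


def rdkit_regularize(smiles: str) -> str:
    """Return an isomeric, canonical, kekulized SMILES, or "" on failure."""
    # Lazy import: RDKit is only needed when this function is actually called.
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)  # returns None when parsing fails
    if mol is None:
        return ""
    try:
        Chem.Kekulize(mol, clearAromaticFlags=True)
        return Chem.MolToSmiles(
            mol, isomericSmiles=True, canonical=True, kekuleSmiles=True
        )
    except Exception:
        # Assumption: any RDKit error counts as "not successfully regularized".
        return ""


def regularize_all(
    smiles_list: Iterable[str],
    regularize: Callable[[str], str] = rdkit_regularize,
) -> List[str]:
    """Regularize every SMILES in order; failed or empty entries become blanks."""
    return [regularize(s) if s else "" for s in smiles_list]
```

With RDKit installed, `regularize_all(["c1ccccc1", "not_a_smiles"])` would return a list of the same length, with a blank in place of the entry that cannot be parsed, so rows stay aligned with their property values.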

But what are these data sets?

Quantum Mechanics: QM9
Physical Chemistry: ESOL, FreeSolv, Lipophilicity
Biophysics: PCBA, HIV, BACE
Physiology: BBBP, Tox21, ToxCast, SIDER, ClinTox

From Moleculenet.ai, here are their short descriptions, with the inference task given in square brackets (for the regularized data sets reported here):

QM9: Geometric, energetic, electronic and thermodynamic properties of DFT-modelled small molecules [regression]
ESOL: Water solubility data (log solubility in moles per litre) for common organic small molecules [regression]
FreeSolv: Experimental and calculated hydration free energy of small molecules in water [regression]
Lipophilicity: Experimental results of octanol/water distribution coefficient (logD at pH 7.4) [regression]
PCBA: Selected from PubChem BioAssay, consisting of measured biological activities of small molecules generated by high-throughput screening [classification]
HIV: Experimentally measured abilities to inhibit HIV replication [classification]
BACE: Quantitative (IC50) and qualitative (binary label) binding results for a set of inhibitors of human β-secretase 1 (BACE-1) [classification/regression]
BBBP: Binary labels of blood-brain barrier penetration (permeability) [classification]
Tox21: Qualitative toxicity measurements on 12 biological targets, including nuclear receptors and stress response pathways [classification]
ToxCast: Toxicology data for a large library of compounds based on in vitro high-throughput screening, including experiments on over 600 tasks [classification]
SIDER: Database of marketed drugs and adverse drug reactions (ADR), grouped into 27 system organ classes [classification]
ClinTox: Qualitative data of drugs approved by the FDA and those that have failed clinical trials for toxicity reasons [classification]
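
For scripting over the collection, the categories and task types listed above can be kept in a small lookup table. The sketch below is a convenience illustration only: `DATASETS`, `datasets_for_task` and `datasets_in_category` are hypothetical names, not part of this repository, and the task types follow MoleculeNet, where QM9 is a regression benchmark.

```python
from typing import Dict, List, Tuple

# Data set name -> (category, task type), transcribed from the lists above.
DATASETS: Dict[str, Tuple[str, str]] = {
    "QM9": ("Quantum Mechanics", "regression"),
    "ESOL": ("Physical Chemistry", "regression"),
    "FreeSolv": ("Physical Chemistry", "regression"),
    "Lipophilicity": ("Physical Chemistry", "regression"),
    "PCBA": ("Biophysics", "classification"),
    "HIV": ("Biophysics", "classification"),
    "BACE": ("Biophysics", "classification/regression"),
    "BBBP": ("Physiology", "classification"),
    "Tox21": ("Physiology", "classification"),
    "ToxCast": ("Physiology", "classification"),
    "SIDER": ("Physiology", "classification"),
    "ClinTox": ("Physiology", "classification"),
}


def datasets_for_task(task: str) -> List[str]:
    """Names of the data sets that support the given task type."""
    return [
        name
        for name, (_, tasks) in DATASETS.items()
        if task in tasks.split("/")  # BACE supports both task types
    ]


def datasets_in_category(category: str) -> List[str]:
    """Names of the data sets that belong to the given category."""
    return [name for name, (cat, _) in DATASETS.items() if cat == category]
```

For example, `datasets_for_task("regression")` selects the data sets suited to a regression model, and BACE appears under both task types.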

Citation

Source: Moleculenet.ai

Paper: Zhenqin Wu, Bharath Ramsundar, Evan N. Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S. Pappu, Karl Leswing, Vijay Pande, "MoleculeNet: A Benchmark for Molecular Machine Learning", arXiv:1703.00564 [cs.LG], 2017
