
MobleyLab / GuthrieSolv

License: CC-BY-4.0
Experimental small molecule hydration free energy dataset



The Guthrie Hydration Free Energy Database of Experimental Small Molecule Hydration Free Energies

This repository provides access to the late J. Peter Guthrie's small molecule hydration free energy database, which was donated posthumously to the community. If you are interested in using the data provided here, please read the relevant background information and disclaimers below and consider contributing to curation of the dataset.

Background information and disclaimers

Death of the primary author

For some years, J. Peter Guthrie (University of Western Ontario) worked passionately on curating a massive database of experimental hydration free energies that he pulled from the literature. Some of these were used for the SAMPL series of challenges over the years, and others assisted in the curation of FreeSolv, which Peter co-authored with me (DLM). Peter was uniquely qualified for this curation effort, with a deep understanding of the experimental techniques, the extrapolations commonly employed, and so on. But the project was massive and the literature immense, and the task outlasted him. He died on September 19, 2017, at age 76, after a battle with Guillain-Barré syndrome.

Succession plans

Peter apparently expected that the task might outlast him, as he left his son, James Guthrie, instructions to contact me, Anthony Nicholls (OpenEye), and Paul Labute (CCG) in the event of his death. None of the three of us has many resources to invest in continuing the curation process at present; at the same time, we believe this data and the underlying work and references will have considerable long-term value to the community. So after discussion, we decided the best path forward was simply to make available what Peter and James provided, so that the community can use and curate it. James gave permission to post this data publicly to allow this effort to continue.

Disclaimers

We provide two different types of data

This dataset consists of two parts which are expected to become significantly different:

  1. An original Excel spreadsheet, which is provided exactly as it arrived from the Guthrie family. This is provided "as is" and you should use it at your own risk; we have no information about its contents beyond what is in the spreadsheet itself and in this GitHub repository. No changes to this spreadsheet will be made.
  2. A current database, which is initially an export of the contents of the Excel database, but is expected to become an independent entity based on community curation.

Use both versions at your own risk

We make no warranty as to the contents or usefulness of either dataset; both are provided as resources to the community but must be used with caution and with your own consultation of the literature.

Curation of the dataset

Our hope is that the community will get involved with curation of the dataset provided here -- in particular, the "current database" (the Excel spreadsheet should be left in its original form). Suggested improvements should come in via pull requests, where each pull request provides proposed modifications (including potentially supporting tools/scripts, data, references, or links to the same) and a clear explanation of these changes. Thus, over time the current, curated database is expected to move away from simply reflecting the contents of the Excel spreadsheet and become more valuable.

Some specific points of curation which will be needed include:

  • Separation of different types of data; for example, the main tab in the Excel spreadsheet (and the data in guthrie_database.csv) contains not just hydration free energies but other properties with other units, e.g. the entries for phenol include values reported in mg/L, g/m^3, etc.
  • Unit handling; values are present in both kJ/mol and kcal/mol
  • Checking of molecule names against SMILES and stereochemistry; I (DLM) previously gave Peter some tools to help with this, but I do not know whether he used them
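
As a rough illustration of the first two points, the sketch below separates energy values from other properties and converts kJ/mol to kcal/mol (1 kcal = 4.184 kJ). It works on plain (value, unit) pairs rather than on the real file, because the layout of the value and unit columns in guthrie_database.csv is not described here; the function names and the sample values are hypothetical.

```python
# 1 thermochemical kcal is defined as exactly 4.184 kJ
KJ_PER_KCAL = 4.184


def to_kcal_per_mol(value, unit):
    """Return value in kcal/mol, or None if unit is not a recognised energy unit."""
    unit = unit.strip().lower()
    if unit == "kcal/mol":
        return value
    if unit == "kj/mol":
        return value / KJ_PER_KCAL
    return None  # e.g. mg/L or g/m^3: not a free energy


def split_by_unit(records):
    """Split (value, unit) pairs into energies in kcal/mol and everything else."""
    energies, other = [], []
    for value, unit in records:
        converted = to_kcal_per_mol(value, unit)
        if converted is None:
            other.append((value, unit))
        else:
            energies.append(converted)
    return energies, other


# Made-up values for illustration only, not taken from the database
records = [(-27.7, "kJ/mol"), (-6.6, "kcal/mol"), (82000.0, "mg/L")]
energies, other = split_by_unit(records)
print(energies)
print(other)
```

Returning None for unrecognised units, rather than raising, keeps non-energy rows available for separate curation instead of discarding them.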

See also usage_notes.md for some information which relates to the contents.

Manifest

  • GuthrieDatabase_April14.zip: Guthrie database (Excel spreadsheet) as it was provided
  • guthrie_database.csv: Exported csv file of main tab of Excel spreadsheet
  • guthrie_references_and_status.csv: Additional tab of Excel spreadsheet which provides definitions of the references and reports on Peter's progress in extracting data from those references; may highlight other areas where more data is still available

There is also data/curation work in an additional tab of the spreadsheet, Sheet 2, which may be useful but is not present here as a separate file yet.

Using the dataset

The data set can be loaded easily in Python using pandas, for example as:

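import pandas

# Read with latin1 encoding, as the file is not valid UTF-8 by default
db = pandas.read_csv('guthrie_database.csv', encoding='latin1')
# Select every row whose Name column is exactly 'phenol'
data = db[db.Name == 'phenol']

This loads the database and extracts all rows for the molecule named phenol.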
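
The lookup above matches names exactly. If names in the file vary in case or surrounding whitespace (not verified here), a more forgiving lookup may help. The following is a minimal sketch using only the standard library csv module; the only column it relies on is Name, as used above, and the helper name rows_named and the sample data are hypothetical.

```python
import csv
import io


def rows_named(handle, name):
    """Return rows whose Name column matches name, ignoring case and outer whitespace."""
    wanted = name.strip().lower()
    reader = csv.DictReader(handle)
    return [row for row in reader
            if (row.get("Name") or "").strip().lower() == wanted]


# Tiny in-memory stand-in for the real file; the Value column and its
# contents are invented for this example.
sample = io.StringIO("Name,Value\nphenol,1\nPhenol ,2\nbenzene,3\n")
matches = rows_named(sample, "phenol")
print(len(matches))

# With the real file (same latin1 encoding as the pandas example):
# with open("guthrie_database.csv", encoding="latin1", newline="") as handle:
#     matches = rows_named(handle, "phenol")
```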

Maintenance

This repository has data quality assurance tests implemented in Python that can be run with tox using the following commands:

$ git clone git@github.com:MobleyLab/GuthrieSolv.git
$ cd GuthrieSolv
$ pip install tox
$ tox
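
According to the changelog, the tests check that SMILES are non-null and parseable. The sketch below shows the general shape such a check could take; it is not the repository's actual test code. The column name SMILES is an assumption, and the parser is passed in as a function so the example runs without a cheminformatics toolkit. With RDKit installed, Chem.MolFromSmiles (which returns None for strings it cannot parse) could be passed as parse; whether the repository itself uses RDKit is not stated here.

```python
def find_bad_smiles(rows, parse=None, column="SMILES"):
    """Return indices of rows whose SMILES is missing, blank, or rejected by parse.

    rows   -- iterable of dicts, e.g. from csv.DictReader
    parse  -- optional callable returning None for an unparseable string
    column -- assumed column name; adjust to match the real file
    """
    bad = []
    for index, row in enumerate(rows):
        smiles = (row.get(column) or "").strip()
        if not smiles:
            bad.append(index)
        elif parse is not None and parse(smiles) is None:
            bad.append(index)
    return bad


def toy_parse(smiles):
    """Stand-in parser for the demo: rejects unbalanced parentheses only."""
    return smiles if smiles.count("(") == smiles.count(")") else None


# Invented rows: one valid, one blank, one malformed, one missing the column
rows = [{"SMILES": "c1ccccc1O"}, {"SMILES": ""}, {"SMILES": "CC(C"}, {}]
print(find_bad_smiles(rows, parse=toy_parse))
```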

Authors

Primary author

  • J. Peter Guthrie (University of Western Ontario)

Other contributors

  • David L. Mobley, UC Irvine, who maintains this repository with help from the community
  • Chris Hoyt, who helped with CI and data integrity tests
  • Probably students and others who worked with Dr. Guthrie over the years, but I (DLM) do not have their information

Changelog

  • 2021-12-20: Added CI testing to ensure SMILES are non-null and parseable; added code quality checks; set up GitHub Actions to ensure the tests run and keep working.

Citing this work

Please cite this GitHub repository, as well as "The Guthrie Hydration Free Energy Database of Experimental Small Molecule Hydration Free Energies," J. Peter Guthrie and David L. Mobley, eScholarship, https://escholarship.org/uc/item/53n2h10t.

We maintain archival copies of this repository on eScholarship, administered by the University of California, in order to ensure long term access. New versions will also be posted there.

Acknowledgments

  • James Guthrie, who made this data available and gave permission to post it publicly; he does not want any credit for this, but he should certainly be acknowledged.

(To be updated as people contribute)
