
aqlaboratory / hsm

hsm - Biophysical prediction of protein-peptide interactions and signaling networks using machine learning.

This repository implements the hierarchical statistical mechanical (HSM) model described in the paper "Biophysical prediction of protein-peptide interactions and signaling networks using machine learning."

An associated website is available at proteinpeptide.io. It is designed for exploring the model's results, including: (1) specific domain-peptide and protein-protein interaction predictions, (2) the resulting networks, and (3) structures colored by the energy functions inferred by the model. Code for the website is available in the companion repository aqlaboratory/hsm-web.

This file documents how this package might be used, the location of associated data, and other metadata.

Usage

The model was implemented in Python (>= 3.5), primarily using TensorFlow 1.x (see Requirements for the specific versions). To work with this repository, either download the pre-processed data (see below) or supply new data. The repository contains two major directories, train/ and predict/, each accompanied by a README.md file detailing its usage.

To train or re-train models, use the train.py script in train/. To make predictions with a trained model, use predict_domains.py for domain-peptide interactions or predict_proteins.py for protein-protein interactions. All scripts provide a command-line interface and should be run from the command line:

python [SCRIPT] [OPTIONS]

Options for any script may be listed using the -h/--help flag.
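For batch jobs it can be convenient to call these scripts from Python instead of a shell. The sketch below is a hypothetical helper (build_command and run_script are not part of the repository) that reproduces python [SCRIPT] [OPTIONS] with the standard-library subprocess module. It uses the current interpreter, so the script runs in the same environment that has TensorFlow installed.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Sequence


def build_command(script: str, options: Sequence[str] = ()) -> List[str]:
    """Build the equivalent of `python [SCRIPT] [OPTIONS]` as an argument list.

    sys.executable is the interpreter running this code, so the child process
    sees the same installed packages (e.g. the virtualenv with TensorFlow).
    """
    return [sys.executable, str(Path(script)), *options]


def run_script(script: str, options: Sequence[str] = ()) -> subprocess.CompletedProcess:
    """Run a CLI script (hypothetical helper) and capture its stdout/stderr as text."""
    cmd = build_command(script, options)
    # check=True raises CalledProcessError if the script exits with a non-zero status.
    return subprocess.run(
        cmd,
        check=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
```

For example, run_script("predict/predict_domains.py", ["--help"]).stdout returns that script's help text, assuming the path is given relative to the repository root. Only -h/--help is documented here, so check each script's help output for its other options rather than guessing them.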

Pre-processed data and pre-trained models may be downloaded from figshare (doi:10.6084/m9.figshare.11520552) and should be unpacked into data/ at the root of this repository. This directory also serves as an example of how to structure input and output files and directories.
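A minimal sketch of the unpacking step, assuming the figshare download is a single zip or tar archive. The function name unpack_to_data and the file name hsm_data.zip used below are placeholders, not names taken from the repository or from figshare.

```python
import shutil
from pathlib import Path


def unpack_to_data(archive: str, repo_root: str = ".") -> Path:
    """Unpack a downloaded archive into <repo_root>/data/ and return that path.

    shutil.unpack_archive infers the format (zip, tar, gztar, ...) from the
    file extension, so no explicit format argument is passed.
    """
    data_dir = Path(repo_root) / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    shutil.unpack_archive(str(archive), str(data_dir))
    return data_dir
```

For example, unpack_to_data("hsm_data.zip") run from the repository root fills data/. If the archive already contains a top-level data/ folder, unpack it into the repository root instead to avoid ending up with data/data/.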

Alternatively, train a new model with the code in train/ and then make new predictions with it using the code in predict/.

Kinase model issue

We have identified an issue in the dataset used to train the kinase model. For the time being, we suggest not using the kinase model until further updates are provided.

Data

As reported, domain-peptide and protein-protein interactions are available via figshare (doi:10.6084/m9.figshare.10084745). In addition, we provide pre-processed data for this repository and for the website repository.

Requirements

  • Python (>= 3.5)
  • TensorFlow (1.14)
  • numpy (1.18)
  • scipy (1.4)
  • scikit-learn (0.20)
  • tqdm (4.41) (progress bars; not required by the model itself, but needed for the package to run)
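Because the list above pins specific versions, a quick check of the local environment can catch mismatches before a long training run. The following is a hypothetical helper, not part of the repository. It reads each listed version as a major.minor prefix, looks modules up by import name (scikit-learn is imported as sklearn), and reads each module's __version__ attribute, which avoids features newer than Python 3.5.

```python
import importlib
from typing import Dict, Optional

# Versions listed in the Requirements section, keyed by import name.
# Each pin is treated as a version prefix, e.g. "1.14" matches "1.14.0".
PINNED = {
    "tensorflow": "1.14",
    "numpy": "1.18",
    "scipy": "1.4",
    "sklearn": "0.20",
    "tqdm": "4.41",
}  # type: Dict[str, str]


def matches_pin(installed: str, pin: str) -> bool:
    """True if `installed` (e.g. "1.14.0") begins with the components of `pin` (e.g. "1.14")."""
    pin_parts = pin.split(".")
    return installed.split(".")[: len(pin_parts)] == pin_parts


def installed_version(module_name: str) -> Optional[str]:
    """Import a module and return its __version__, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def check_requirements() -> Dict[str, str]:
    """Map each pinned module to "ok", "missing", or a mismatch message."""
    report = {}
    for name, pin in PINNED.items():
        version = installed_version(name)
        if version is None:
            report[name] = "missing"
        elif matches_pin(version, pin):
            report[name] = "ok"
        else:
            report[name] = "found {}, expected {}".format(version, pin)
    return report
```

Calling print(check_requirements()) lists each package with its status. Comparing whole version components (rather than raw string prefixes) keeps "1.4.1" from being mistaken for "1.14".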

Reference

Please reference the associated publication:

Cunningham, J.M., Koytiger, G., Sorger, P.K., & AlQuraishi, M. "Biophysical prediction of protein-peptide interactions and signaling networks using machine learning." Nature Methods (2020). doi:10.1038/s41592-019-0687-1. (citation.bib)

See also the website at proteinpeptide.io for exploring the associated analyses (code: aqlaboratory/hsm-web).

Funding

This work was supported by the following sources (funder: grant number):

  • NIH: U54-CA225088
  • NIH: P50-GM107618
  • DARPA / DOD: W911NF-14-1-0397

License

This repository is released under the MIT License.
