hiranumn / DeepAccNet

Licence: MIT license
Pytorch/Python3 implementation of DeepAccNet, protein model accuracy evaluator.

DeepAccNet.py

Python-PyTorch implementation of DeepAccNet, described in https://www.biorxiv.org/content/10.1101/2020.07.17.209643v2

This method estimates the accuracy of your protein models using a metric called l-DDT (local distance difference test).

usage: DeepAccNet.py [-h] [--modelpath MODELPATH] [--pdb] [--csv] [--leaveTempFile] [--process PROCESS] [--featurize]
                     [--reprocess] [--verbose] [--bert] [--ensemble]
                     input ...

Error predictor network

positional arguments:
  input                 path to input folder or input pdb file
  output                path to output (folder path, npz, or csv)

optional arguments:
  -h, --help            show this help message and exit
  --pdb, -pdb           Running on a single pdb file instead of a folder (Default: False)
  --csv, -csv           Writing results to a csv file (Default: False)
  --per_res_only, -pr   Writing per-residue accuracy only (Default: False)
  --leaveTempFile, -lt  Leaving temporary files (Default: False)
  --process PROCESS, -p PROCESS
                        Specifying # of cpus to use for featurization (Default: 1)
  --featurize, -f       Running only the featurization part (Default: False)
  --reprocess, -r       Reprocessing all feature files (Default: False)
  --verbose, -v         Activating verbose flag (Default: False)
  --bert, -bert         Run with bert features. Use extractBert.py to generate them. (Default: False)
  --ensemble, -e        Running with ensembling of 4 models. This adds 4x computational time with some overheads
                        (Default: False)

v0.0.1
  • For the previous TensorFlow implementation, please see here.
  • For the MSA version of DeepAccNet, please see here.
  • For the refinement script, please see the modeling folder.

Software

  • Python > 3.5
  • PyTorch 1.3
  • PyRosetta for DeepAccNet and DeepAccNet-Bert.
  • ProtTrans and the ProtBert model (second one in the model availability table) for DeepAccNet-Bert.
  • Tested on Ubuntu 20.04 LTS

(For IPD users, please use the tensorflow conda environment)

Example usages

Running on a folder of PDB files (folder name: samples)

python DeepAccNet.py -r -v samples outputs
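The folder invocation above can also be assembled from Python when you want to script several runs. The following is a minimal sketch, not part of the repository: the helper name `build_command` and its keyword arguments are hypothetical, and it only uses flags listed in the help text (`--pdb`, `--csv`, `--ensemble`, `--process`).

```python
import sys


def build_command(input_path, output_path, single_pdb=False, csv=False,
                  ensemble=False, processes=1, script="DeepAccNet.py"):
    """Build an argument list for DeepAccNet.py (hypothetical helper).

    Only flags documented in the help text are emitted. The script path
    is an assumption; point it at your own copy of DeepAccNet.py.
    """
    cmd = [sys.executable, script]
    if single_pdb:
        cmd.append("--pdb")        # input is one pdb file, not a folder
    if csv:
        cmd.append("--csv")        # write results to a csv file
    if ensemble:
        cmd.append("--ensemble")   # ensemble of 4 models, about 4x slower
    if processes != 1:
        cmd += ["--process", str(processes)]  # cpus used for featurization
    cmd += [input_path, output_path]
    return cmd


# Example: the folder run with 4 featurization processes.
command = build_command("samples", "outputs", processes=4)
print(" ".join(command[1:]))
```

To actually launch the run, pass the list to `subprocess.run(command, check=True)` from the folder that contains `DeepAccNet.py`.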

Running on a silent file (file name: sample.silent)

python DeepAccNet-SILENT.py sample.silent output.csv

How to look at outputs

The output of the network is written to [input_file_name].npz unless the --csv flag is set. You can extract the predictions as follows.

import numpy as np

x = np.load("testoutput.npz")

lddt = x["lddt"]           # per residue lddt
estogram = x["estogram"]   # per pairwise distance e-stogram
mask = x["mask"]           # mask predicting which residue pairs are within 15 Å in the native structure

lddt is probably the easiest place to start, as it is a per-residue quality score. If you want a single global score per protein structure, simply take its average.
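As a minimal sketch of that averaging step, the snippet below computes a global score for each .npz file in an output folder and sorts the models from best to worst. The function names `global_lddt` and `rank_models` are hypothetical and not part of DeepAccNet; the only assumption about the file contents is the "lddt" key shown above.

```python
import glob
import os

import numpy as np


def global_lddt(npz_path):
    """Return the mean of the per-residue lddt array in one output file."""
    with np.load(npz_path) as data:
        return float(np.mean(data["lddt"]))


def rank_models(output_dir):
    """Return (file name, global score) pairs, highest score first."""
    paths = sorted(glob.glob(os.path.join(output_dir, "*.npz")))
    scores = [(os.path.basename(p), global_lddt(p)) for p in paths]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

For example, `rank_models("outputs")` lists every prediction in the `outputs` folder with its averaged lddt.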

If you want to do something more involved, the check.ipynb notebook is a good place to start.

Troubleshooting

  • If DeepAccNet.py returns an OOM (out of memory) error, your protein is probably too big. Try a GPU with more memory (e.g., a Titan instead of an RTX 2080), or run on CPUs if running time is not a concern, although that will be slow.
  • If you get an import error for pyErrorPred, you probably moved the script out of the DeepAccNet folder. In that case, add pyErrorPred to your Python path, either in your environment or within the script.
  • For any other problem, send an e-mail to hiranumn at cs dot washington dot edu.
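For the pyErrorPred import error mentioned above, one way to extend the Python path from within a script is sketched below. The helper name `add_to_python_path` and the example path are hypothetical; use the directory that actually contains the pyErrorPred package on your machine.

```python
import os
import sys


def add_to_python_path(directory):
    """Put a directory at the front of sys.path if it is not already there."""
    directory = os.path.abspath(directory)
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return directory


# Hypothetical usage; replace the path with your own clone location:
# add_to_python_path("/path/to/DeepAccNet")
# import pyErrorPred
```

Setting the `PYTHONPATH` environment variable to the same directory before launching the script has the same effect without editing the code.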

Resources

  • The dataset used to train this model can be accessed through here. Training splits can be accessed through the data folder.
  • DeepAccNet prediction on the test set can be downloaded here.
  • DeepAccNet prediction on the CAMEO set can be downloaded here.
  • DeepAccNet prediction on the CASP13 set can be downloaded here.

Updates

  • Repo initialized 2020.7.20
  • Transitioned to PyTorch 2020.11.3
  • Added versions that do not depend on pyRosetta, "distance with 3D" and "distance with 3D and Bert" from the paper. 2020.11.6