OATML / EVE

Licence: MIT license
Official repository for the paper "Large-scale clinical interpretation of genetic variants using evolutionary data and deep learning". Joint collaboration between the Marks lab and the OATML group.

Evolutionary model of Variant Effects (EVE)

Please note that we have migrated the official repo to the following address: https://github.com/OATML-Markslab/EVE.

Overview

EVE is a set of protein-specific models that assign to any single amino acid mutation of interest a score reflecting the propensity of the resulting protein to be pathogenic. For each protein family, a Bayesian VAE learns a distribution over amino acid sequences from evolutionary data. This distribution is used to compute an evolutionary index for each mutant, which approximates the log-likelihood ratio of the mutant versus the wild type. A global-local mixture of Gaussian Mixture Models then separates variants into benign and pathogenic clusters based on that index. EVE scores reflect the probabilistic assignment of each variant to the pathogenic cluster.
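
As a rough illustration of the evolutionary index, the sketch below assumes that the VAE's evidence lower bound (ELBO) has already been estimated for the wild-type sequence and for each mutant. It takes the index to be the wild-type ELBO minus the mutant ELBO, so that larger values mark mutants the model finds less likely. The function name, the sign convention and the numbers are illustrative assumptions, not taken from the EVE codebase.

```python
import numpy as np


def evolutionary_index(elbo_wild_type, elbo_mutants):
    """Approximate -log(p(mutant) / p(wild type)) from ELBO estimates.

    Hypothetical helper: the ELBO stands in for the log-likelihood, and
    the sign is chosen (as an assumption) so that a larger index means
    the mutant is less likely than the wild type under the model.
    """
    elbo_mutants = np.asarray(elbo_mutants, dtype=float)
    return elbo_wild_type - elbo_mutants


# Made-up ELBO values (in nats) for one wild type and three mutants.
elbo_wt = -250.0
elbo_mut = [-251.5, -249.0, -262.0]

indices = evolutionary_index(elbo_wt, elbo_mut)
```

In this toy example the third mutant has by far the lowest ELBO and therefore the largest index.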

Usage

The end-to-end process to compute EVE scores consists of three consecutive steps:

  1. Train the Bayesian VAE on a re-weighted multiple sequence alignment (MSA) for the protein of interest => train_VAE.py
  2. Compute the evolutionary indices for all single amino acid mutations => compute_evol_indices.py
  3. Train a GMM to cluster variants on the basis of the evolutionary indices, then output scores and uncertainties on the class assignments => train_GMM_and_compute_EVE_scores.py

We also provide EVE scores for all single amino acid mutations for thousands of proteins at the following address: http://evemodel.org/.
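
Step 3 can be illustrated with a simplified sketch: a single two-component Gaussian mixture fitted with scikit-learn on synthetic one-dimensional evolutionary indices. This is not the global-local mixture used by EVE. The data, the helper name, the rule that the higher-mean component is the pathogenic one, and the entropy-based uncertainty are all assumptions, chosen only to show how a score and an uncertainty can be read off the cluster assignments.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def fit_scores(evol_indices, seed=0):
    """Fit a 2-component GMM and return P(pathogenic) and its entropy."""
    x = np.asarray(evol_indices, dtype=float).reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=seed).fit(x)
    # Assumption: the component with the larger mean is the pathogenic one.
    pathogenic = int(np.argmax(gmm.means_.ravel()))
    p = gmm.predict_proba(x)[:, pathogenic]
    # Binary entropy of the assignment as a simple uncertainty measure.
    q = np.clip(p, 1e-12, 1 - 1e-12)
    uncertainty = -(q * np.log(q) + (1 - q) * np.log(1 - q))
    return p, uncertainty


rng = np.random.default_rng(0)
# Synthetic indices: a "benign-like" cluster near 1 and a
# "pathogenic-like" cluster near 9 (values are made up).
indices = np.concatenate([rng.normal(1.0, 1.0, 200), rng.normal(9.0, 1.5, 200)])
scores, uncertainty = fit_scores(indices)
```

Variants whose index falls between the two clusters receive scores near 0.5 and the highest entropy, which is the sense in which the uncertainty flags ambiguous class assignments.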

Example scripts

The "examples" folder contains sample bash scripts to obtain EVE scores for a protein of interest (using PTEN as an example). MSAs and ClinVar labels are provided for 4 proteins (P53, PTEN, RASH and SCN5A) in the data folder.

Data requirements

The only data required to train EVE models and obtain EVE scores from scratch are the multiple sequence alignments (MSAs) for the corresponding proteins.

MSA creation

We built multiple sequence alignments for each protein family by performing five search iterations of the profile HMM homology search tool Jackhmmer against the UniRef100 database of non-redundant protein sequences (downloaded on April 20th, 2020). Please refer to the supplementary notes of the EVE paper (section 3.1.1) for a detailed description of the MSA creation process. Our GitHub repo provides the MSAs for 4 proteins: P53, PTEN, RASH and SCN5A (see data/MSA). MSAs for all proteins may be accessed on our website (https://evemodel.org/).

MSA pre-processing

The EVE codebase provides basic functionalities to pre-process MSAs for modelling (see the MSA_processing class in utils/data_utils.py). By default, sequences with 50% or more gaps in the alignment and/or positions with less than 70% residue occupancy will be removed. These parameters may be adjusted as needed by the end user.
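
The default filtering rules can be sketched with NumPy as follows. This is an illustration under assumptions, not the MSA_processing implementation: the function name, the use of "-" as the only gap symbol, and the order of the two filters (sequences first, then positions) are hypothetical.

```python
import numpy as np


def filter_msa(sequences, max_seq_gap_frac=0.5, min_col_occupancy=0.7):
    """Drop gappy sequences, then drop poorly occupied alignment columns."""
    msa = np.array([list(s) for s in sequences])
    is_gap = msa == "-"
    # Keep sequences whose fraction of gaps is strictly below the threshold.
    keep_rows = is_gap.mean(axis=1) < max_seq_gap_frac
    msa, is_gap = msa[keep_rows], is_gap[keep_rows]
    # Keep columns where at least `min_col_occupancy` of residues are present.
    keep_cols = (1.0 - is_gap.mean(axis=0)) >= min_col_occupancy
    return ["".join(row) for row in msa[:, keep_cols]]


toy_msa = [
    "MKT-LV",
    "MKTALV",
    "M-TALV",
    "----LV",  # 4 of 6 positions are gaps, so this sequence is removed
    "MKT-LV",
]
filtered = filter_msa(toy_msa)
```

On this toy alignment the fourth sequence is dropped, and the fourth column is then dropped because only half of the remaining sequences have a residue there.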

ClinVar labels

The script "train_GMM_and_compute_EVE_scores.py" provides functionalities to compare EVE scores with reference labels (e.g., ClinVar). Our GitHub repo provides labels for 4 proteins: P53, PTEN, RASH and SCN5A (see data/labels). ClinVar labels for all proteins may be accessed on our website (https://evemodel.org/).
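
A comparison of this kind can be sketched with pandas and scikit-learn's roc_auc_score. The column names, mutation identifiers, label strings and values below are invented for illustration and do not reflect the file formats used by the repository.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

# Hypothetical EVE scores (probability of the pathogenic cluster) per mutation.
scores = pd.DataFrame({
    "mutation": ["A10G", "L25P", "R47H", "K80E", "V112M"],
    "eve_score": [0.08, 0.91, 0.35, 0.77, 0.12],
})
# Hypothetical reference labels; not every mutation has one.
labels = pd.DataFrame({
    "mutation": ["A10G", "L25P", "R47H", "K80E"],
    "label": ["Benign", "Pathogenic", "Benign", "Pathogenic"],
})

# Keep only mutations that have a reference label.
merged = scores.merge(labels, on="mutation", how="inner")
is_pathogenic = (merged["label"] == "Pathogenic").astype(int)

# Area under the ROC curve of the scores against the labels.
auc = roc_auc_score(is_pathogenic, merged["eve_score"])
# Accuracy when thresholding the score at 0.5 (threshold is an assumption).
accuracy = ((merged["eve_score"] > 0.5).astype(int) == is_pathogenic).mean()
```

The inner merge restricts the evaluation to labelled mutations, so unlabelled variants such as V112M here do not affect the metrics.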

Software requirements

The entire codebase is written in Python. Package requirements are as follows:

  • python=3.7
  • pytorch=1.7
  • cudatoolkit=11.0
  • scikit-learn=0.24.1
  • numpy=1.20.1
  • pandas=1.2.4
  • scipy=1.6.2
  • tqdm
  • matplotlib
  • seaborn

The corresponding environment may be created via conda and the provided protein_env.yml file as follows:

  conda env create -f protein_env.yml
  conda activate protein_env

License

This project is available under the MIT license.

Reference

If you use this code, please cite the following paper:

@article{Frazer2021DiseaseVP,
  title={Disease variant prediction with deep generative models of evolutionary data.},
  author={Jonathan Frazer and Pascal Notin and Mafalda Dias and Aidan Gomez and Joseph K Min and Kelly P. Brock and Yarin Gal and Debora S. Marks},
  journal={Nature},
  year={2021}
}