PaccMann / paccmann_kinase_binding_residues

Licence: MIT license
Comparison of active site and full kinase sequences for drug-target affinity prediction and molecular generation. Full paper: https://pubs.acs.org/doi/10.1021/acs.jcim.1c00889

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to paccmann kinase binding residues

deforestation
A machine learning exercise, using KNN to classify deforested areas
Stars: ✭ 26 (-10.34%)
Mutual labels:  knn
Breast-Cancer-Scikitlearn
simple tutorial on Machine Learning with Scikitlearn
Stars: ✭ 33 (+13.79%)
Mutual labels:  knn
Numpy Ml
Machine learning, in numpy
Stars: ✭ 11,100 (+38175.86%)
Mutual labels:  knn
Amazon-Fine-Food-Review
Machine learning algorithms such as KNN, Naive Bayes, Logistic Regression, SVM, Decision Trees, Random Forest, k-means and Truncated SVD applied to Amazon fine food reviews
Stars: ✭ 28 (-3.45%)
Mutual labels:  knn
ml
Minimalist implementations of classic machine learning algorithms
Stars: ✭ 130 (+348.28%)
Mutual labels:  knn
Awesome-Scripts
A collection of awesome scripts from developers around the globe.
Stars: ✭ 135 (+365.52%)
Mutual labels:  knn
Handwritten-Digits-Classification-Using-KNN-Multiclass Perceptron-SVM
🏆 A Comparative Study on Handwritten Digits Recognition using Classifiers like K-Nearest Neighbours (K-NN), Multiclass Perceptron/Artificial Neural Network (ANN) and Support Vector Machine (SVM), discussing the pros and cons of each algorithm and providing the comparison results in terms of accuracy and efficiency of each algorithm.
Stars: ✭ 42 (+44.83%)
Mutual labels:  knn
Trajectory-Analysis-and-Classification-in-Python-Pandas-and-Scikit-Learn
Formed trajectories of sets of points.Experimented on finding similarities between trajectories based on DTW (Dynamic Time Warping) and LCSS (Longest Common SubSequence) algorithms.Modeled trajectories as strings based on a Grid representation.Benchmarked KNN, Random Forest, Logistic Regression classification algorithms to classify efficiently t…
Stars: ✭ 41 (+41.38%)
Mutual labels:  knn
NIDS-Intrusion-Detection
Simple Implementation of Network Intrusion Detection System. KddCup'99 Data set is used for this project. kdd_cup_10_percent is used for training test. correct set is used for test. PCA is used for dimension reduction. SVM and KNN supervised algorithms are the classification algorithms of project. Accuracy : %83.5 For SVM , %80 For KNN
Stars: ✭ 45 (+55.17%)
Mutual labels:  knn
Machine Learning
⚡ Machine Learning in Action (Python3): kNN, decision trees, Bayes, logistic regression, SVM, linear regression, tree regression
Stars: ✭ 5,601 (+19213.79%)
Mutual labels:  knn
KernelKnn
Kernel k Nearest Neighbors in R
Stars: ✭ 14 (-51.72%)
Mutual labels:  knn
facenet-darknet-inference
Face recognition using facenet
Stars: ✭ 29 (+0%)
Mutual labels:  knn
kervolution
Kervolution Library in PyTorch (CVPR 2019 Oral)
Stars: ✭ 33 (+13.79%)
Mutual labels:  knn
MachineLearning
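Machine learning tutorial covering machine learning with numpy, sklearn and tensorflow, as well as using spark and flink to speed up model training. It aims to give readers a fairly complete introduction to machine learning.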
Stars: ✭ 23 (-20.69%)
Mutual labels:  knn
drowsiness-detection
Identifies driver drowsiness based on real-time camera images and image processing techniques. Drowsy driving detection system. OpenCV
Stars: ✭ 31 (+6.9%)
Mutual labels:  knn
fastknn
Fast k-Nearest Neighbors Classifier for Large Datasets
Stars: ✭ 64 (+120.69%)
Mutual labels:  knn
CS231n
My solutions for Assignments of CS231n: Convolutional Neural Networks for Visual Recognition
Stars: ✭ 30 (+3.45%)
Mutual labels:  knn
Recommender-Systems
Implementing Content based and Collaborative filtering(with KNN, Matrix Factorization and Neural Networks) in Python
Stars: ✭ 46 (+58.62%)
Mutual labels:  knn
Portrait FCN and 3D Reconstruction
This project is to convert PortraitFCN+ (by Xiaoyong Shen) from Matlab to Tensorflow, then refine the outputs from it (converted to a trimap) using KNN and ResNet, supervised by Richard Berwick.
Stars: ✭ 61 (+110.34%)
Mutual labels:  knn
keras-knn
Code for the blog post Nearest Neighbors with Keras and CoreML
Stars: ✭ 25 (-13.79%)
Mutual labels:  knn

Active sites outperform full proteins for modeling kinases

Python package License: MIT Code style: black DOI:10.1021/acs.jcim.1c00889 DOI:10.1021/acs.jcim.2c00840

Summary

This repository contains data and code for the JCIM paper Active Site Sequence Representations of Human Kinases Outperform Full Sequence Representations for Affinity Prediction and Inhibitor Generation: 3D Effects in a 1D Model. We study the impact of different protein sequence representations on modeling human kinases. We find that using active site residues yields superior performance to using full protein sequences for predicting binding affinity. We also compare active site and full sequence representations on de novo design tasks. We generate kinase inhibitors directly from protein sequences with our previously developed hybrid-VAE (PaccMannRL) but find no major differences between the two kinase representations.

Description

This repository facilitates the reproduction of the experiments conducted in the JCIM paper Active Site Sequence Representations of Human Kinases Outperform Full Sequence Representations for Affinity Prediction and Inhibitor Generation: 3D Effects in a 1D Model. We provide scripts to:

  1. Train and evaluate the BimodalMCA for drug-protein affinity prediction
  2. Evaluate the bimodal KNN affinity predictor either in a cross-validation (CV) setting or on a plain train/test split
  3. Optimize a SMILES- or SELFIES-based molecular generative model to produce molecules with high binding affinities for a protein of interest (affinity is predicted with the KNN model).
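
To make item 2 more concrete, here is a minimal sketch of what a bimodal KNN regressor can look like. It is not the code in scripts/knn_cv.py or scripts/knn_test.py. The distance used here (length-normalized Levenshtein distance between protein sequences plus a Jaccard distance on SMILES character sets, a crude stand-in for a fingerprint-based Tanimoto distance), the function names and the toy data are all assumptions made for illustration.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (dynamic programming, two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def protein_distance(p: str, q: str) -> float:
    """Levenshtein distance scaled to [0, 1] by the longer sequence length."""
    longest = max(len(p), len(q))
    return levenshtein(p, q) / longest if longest else 0.0


def ligand_distance(s: str, t: str) -> float:
    """Jaccard distance on SMILES character sets (assumed stand-in for Tanimoto)."""
    a, b = set(s), set(t)
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def knn_predict(
    train: List[Tuple[str, str, float]], ligand: str, protein: str, k: int = 3
) -> float:
    """Average the affinities of the k training pairs closest to the query pair."""
    scored = sorted(
        (ligand_distance(ligand, lig) + protein_distance(protein, prot), y)
        for lig, prot, y in train
    )
    nearest = scored[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Toy (ligand SMILES, protein sequence, affinity) triples; values are made up.
train = [
    ("CCO", "GAGKV", 6.0),
    ("CCN", "GAGKV", 7.0),
    ("c1ccccc1", "LLLLL", 4.0),
]
prediction = knn_predict(train, "CCO", "GAGKV", k=2)
```

Because both modalities contribute to one summed distance, a neighbor has to be close in ligand space and in protein space to influence the prediction.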

Data

The preprocessed BindingDB data (CV and test data for ligand split and kinase split, data used for pretraining and affinity optimization) can be accessed on this Box link. We also release the aligned active site sequences (29 residues) for all kinases. If you use the data, please cite our work.
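
For readers unfamiliar with the two split types, the sketch below shows the general idea under the usual reading of these terms: in a ligand split no test ligand appears in the training data, and in a kinase split no test kinase does. The helper name, the column names and the toy rows are hypothetical, and this is not necessarily how the released splits were generated.

```python
import random
from typing import Dict, List, Tuple

Record = Dict[str, str]


def group_split(
    records: List[Record], key: str, test_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[Record], List[Record]]:
    """Hold out whole groups (all rows sharing the same value of `key`)."""
    groups = sorted({r[key] for r in records})
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_test = max(1, round(test_fraction * len(groups)))
    held_out = set(groups[:n_test])
    train = [r for r in records if r[key] not in held_out]
    test = [r for r in records if r[key] in held_out]
    return train, test


# Toy rows; the column names are hypothetical.
records = [
    {"ligand_name": lig, "kinase_name": kin}
    for lig in ("lig_a", "lig_b", "lig_c", "lig_d", "lig_e")
    for kin in ("kin_1", "kin_2", "kin_3")
]
ligand_train, ligand_test = group_split(records, key="ligand_name")
kinase_train, kinase_test = group_split(records, key="kinase_name")
```

Splitting by group rather than by row is what makes the evaluation measure generalization to unseen ligands or unseen kinases.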

Installation

The core functionality of this repo is provided in the pkbr package, which can be installed in editable mode by running pip install -e . from the repository root. To execute the example scripts, we recommend setting up a conda environment:

conda env create -f conda.yml
conda activate pkbr

Afterwards you can execute the scripts to run the KNN, the BiMCA or the affinity optimization.

Code examples

All examples assume that you downloaded the data from this Box link and stored it in a folder data in the root of this repo.
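
Before running the commands, it can help to check that the downloaded files are where the examples expect them. The helper below is a small convenience sketch, not part of the pkbr package. The file list is taken from the example commands in this README and is not exhaustive, and the function name is made up.

```python
from pathlib import Path
from typing import Iterable, List

# Relative paths (inside the data folder) used by the example commands.
EXPECTED_FILES = [
    "ligands.smi",
    "human_kinases_active_site.smi",
    "human_kinases_sequence.smi",
    "smiles_vocab.json",
    "ligand_split/fold_0/train.csv",
    "ligand_split/fold_0/validation.csv",
    "ligand_split/fold_0/test.csv",
]


def missing_files(
    data_root: str, relative_paths: Iterable[str] = EXPECTED_FILES
) -> List[str]:
    """Return the expected files that are not present below data_root."""
    root = Path(data_root)
    return [p for p in relative_paths if not (root / p).is_file()]
```

Calling missing_files("data") from the root of the repo returns an empty list when all listed files are in place.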

Running the KNN affinity predictor in a cross validation

python3 scripts/knn_cv.py -d data/ligand_split -tr train.csv -te validation.csv \
-f 10 -lp data/ligands.smi -kp data/human_kinases_active_site.smi -r my_cv_results

Running KNN in single train/test split

python3 scripts/knn_test.py -t data/ligand_split/fold_0/validation.csv -test data/ligand_split/fold_0/test.csv \
-tlp data/ligands.smi -telp data/ligands.smi -kp data/human_kinases_active_site.smi -r my_results.csv

Training the BiMCA model

python3 scripts/bimca_train.py \
	data/ligand_split/fold_0/train.csv data/ligand_split/fold_0/validation.csv data/ligand_split/fold_0/test.csv \
	human-kinase-alignment data/human_kinases_active_site.smi data/ligands.smi data/smiles_vocab.json \
	models config/active_site.json -n my_as_model

Evaluating the BiMCA model

python3 scripts/bimca_test.py \
data/ligand_split/fold_0/validation.csv human-kinase-alignment data/human_kinases_sequence.smi data/ligands.smi \
path_to_your_trained_model

Affinity optimization with SMILES generator

To execute this part, you need the pretrained SMILES/SELFIES VAE stored under data/models.

python scripts/gp_generation_smiles_knn.py \
    data/models/smiles_vae data/affinity_optimization/bindingdb_all_kinase_active_site.csv \
    data/affinity_optimization/example_active_site.smi \
    smiles_active_site_generator/ -s 42 -t 0.85 -r 10 -n 50 -i 40 -c 80

Affinity optimization with SELFIES generator

python scripts/gp_generation_selfies_knn.py \
    data/models/selfies_vae data/affinity_optimization/bindingdb_all_kinase_sequence.csv \
    data/affinity_optimization/example_sequence.smi \
    selfies_sequence_generator/ -s 42 -t 0.85 -r 10 -n 50 -i 40 -c 80

Choosing active-site sequences

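See our follow-up letter in JCIM, On the Choice of Active Site Sequences for Kinase-Ligand Affinity Prediction (DOI: 10.1021/acs.jcim.2c00840).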

Definitions

How exactly to define an "active site" is a critical choice. We originally relied on the definition by Sheridan et al. (2009). We have now compared it to the active site definition by Martin et al. (2012) and to a Combined definition that uses a total of 35 residues drawn from both definitions. The Combined definition improves performance significantly, especially for allosteric binders.
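
One way to picture a combined definition is as the union of two sets of alignment positions, read out of an aligned kinase sequence in order. The sketch below illustrates only that idea. The positions and the sequence are invented toy values, not the residues of Sheridan et al. or Martin et al., and the function names are hypothetical.

```python
from typing import Iterable, List


def combine_definitions(positions_a: Iterable[int], positions_b: Iterable[int]) -> List[int]:
    """Union of two sets of alignment positions, in ascending order."""
    return sorted(set(positions_a) | set(positions_b))


def extract_active_site(aligned_sequence: str, positions: Iterable[int]) -> str:
    """Concatenate the residues found at the given 0-based alignment positions."""
    return "".join(aligned_sequence[i] for i in positions)


# Toy example: positions and sequence are made up for illustration.
aligned = "MKVLGAGAFGEVYKA"
definition_a = [3, 4, 5, 8]
definition_b = [4, 5, 9, 12]
combined = combine_definitions(definition_a, definition_b)
combined_site = extract_active_site(aligned, combined)
```

Positions shared by both definitions appear only once, which is why a combined definition can be shorter than the sum of its parts.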

Augmentations

We also devised novel protein sequence augmentation schemes by flipping/swapping contiguous active-site subsequences that lie discontiguously in the full protein sequence.
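
The flipping and swapping can be sketched as follows, assuming the active site is available as a list of segments, each of which is contiguous in the full protein sequence. This is an illustration of the idea only. The function name, the probabilities and the way segments are chosen are assumptions and may differ from the implementation in this repo.

```python
import random
from typing import List, Optional


def augment_active_site(
    segments: List[str],
    flip_prob: float = 0.5,
    swap_prob: float = 0.5,
    rng: Optional[random.Random] = None,
) -> str:
    """Randomly reverse segments and swap two of them, then join into one string."""
    if rng is None:
        rng = random.Random()
    # Flip: reverse each segment independently with probability flip_prob.
    out = [seg[::-1] if rng.random() < flip_prob else seg for seg in segments]
    # Swap: exchange two randomly chosen segments with probability swap_prob.
    if len(out) > 1 and rng.random() < swap_prob:
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return "".join(out)


# Toy segments (made up); a fixed seed makes the augmentation reproducible.
segments = ["GAG", "KVY", "LDE"]
augmented = augment_active_site(segments, rng=random.Random(0))
```

Both operations preserve the residue composition and the length of the active site sequence. Only the order of residues changes.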

To train a model with the new active site definition and the new augmentation, follow the installation setup and download the data from Box.

Afterwards run:

python3 scripts/bimca_train.py \
	data/ligand_split/fold_0/train.csv data/ligand_split/fold_0/validation.csv data/ligand_split/fold_0/test.csv \
	human-kinase-alignment data/human_kinases_active_site_combined.smi data/ligands.smi data/smiles_vocab.json \
	models config/active_site_augment.json -n combined_augment

You can modify config/active_site_augment.json to play with the hyperparameters and the probability of the augmentations.
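
To try several settings without editing the original file, one option is to write modified copies of the config and pass those to the training script. The helper below is a generic sketch that assumes the config is a flat JSON object. It only overrides keys that already exist in the file, so no key names are assumed here, and the keys in the commented usage line are placeholders.

```python
import json
from pathlib import Path
from typing import Any, Dict


def write_config_variant(src: str, dst: str, overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Copy a JSON config to dst with some existing top-level keys overridden."""
    config = json.loads(Path(src).read_text())
    unknown = sorted(set(overrides) - set(config))
    if unknown:
        # Refuse to add keys the training script may silently ignore.
        raise KeyError(f"keys not present in {src}: {unknown}")
    config.update(overrides)
    Path(dst).write_text(json.dumps(config, indent=2))
    return config


# Hypothetical usage; replace <existing_key> with a key found in the file:
# write_config_variant(
#     "config/active_site_augment.json",
#     "config/my_variant.json",
#     {"<existing_key>": 0.3},
# )
```

The resulting file can then be passed to scripts/bimca_train.py in place of config/active_site_augment.json.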

Citation

If you use this repo, our data, the proposed active site definitions or the sequence augmentation strategies in your projects, please cite the following:

@article{born2022active,
	author = {Born, Jannis and Huynh, Tien and Stroobants, Astrid and Cornell, Wendy D. and Manica, Matteo},
	title = {Active Site Sequence Representations of Human Kinases Outperform Full Sequence Representations for Affinity Prediction and Inhibitor Generation: 3D Effects in a 1D Model},
	journal = {Journal of Chemical Information and Modeling},
	volume = {62},
	number = {2},
	pages = {240-257},
	year = {2022},
	doi = {10.1021/acs.jcim.1c00889},
	note ={PMID: 34905358},
	URL = {https://doi.org/10.1021/acs.jcim.1c00889}
}

@article{born2022on,
	author = {Born, Jannis and Shoshan, Yoel and Huynh, Tien and Cornell, Wendy D. and Martin, Eric J. and Manica, Matteo},
	title = {On the Choice of Active Site Sequences for Kinase-Ligand Affinity Prediction},
	journal = {Journal of Chemical Information and Modeling},
	volume = {62},
	number = {18},
	pages = {4295-4299},
	year = {2022},
	doi = {10.1021/acs.jcim.2c00840},
	URL = {https://doi.org/10.1021/acs.jcim.2c00840}
}