garywei944 / FMol

Licence: MIT License
A simplified drug discovery pipeline -- generating SMILES molecules with AlphaSMILES, predicting protein structure with AlphaFold, and checking druggability with fpocket/Amber.

Programming Languages

Python, MATLAB, Shell, Makefile

Updates

08/12/2021

I noticed that people have been starring this repo recently. It might be because DeepMind released AlphaFold as an independent project last month. But this is a project I worked on when I was a sophomore, and some parts of it don't work well. So I have decided to resume working on it this summer and next semester. Updates coming soon.

FMol

A simplified drug discovery pipeline -- generating SMILES molecules with AlphaSMILES, predicting protein structure with AlphaFold, and checking druggability with fpocket/Amber.

Requirements

  • 64-bit Linux. Package management uses mamba, so the distribution does not matter.

Install python environment

Updates coming soon

Fix the issue with Theano

Keras prints "Using TensorFlow backend." or "Using Theano backend." when you launch the program. If the default backend is Theano, use the following line to switch to TensorFlow:

export KERAS_BACKEND='tensorflow'
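
The same switch can also be made from inside Python, which may be convenient when the program is launched from an IDE or a notebook rather than a shell. This is a minimal sketch, not part of FMol: select_keras_backend is a hypothetical helper name, and it relies on Keras reading the KERAS_BACKEND environment variable at import time, so it has to run before keras is imported.

```python
import os


def select_keras_backend(name="tensorflow"):
    """Set KERAS_BACKEND for the current process and return the value.

    Hypothetical helper for illustration. It only has an effect if it is
    called before the first `import keras` in the process.
    """
    os.environ["KERAS_BACKEND"] = name
    return os.environ["KERAS_BACKEND"]


select_keras_backend("tensorflow")
# import keras  # import Keras only after the variable has been set
```

Unlike the export line, this only affects the current Python process and its children.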

Third party library

  • AlphaSMILES uses the Gaussian 09 library for 3D (DFT) calculations by default. If you want this functionality to work, here are some guides on how to set up Gaussian 09 on Ubuntu.
  • I use RECONSTRUCT to reconstruct the protein tertiary structure in .pdb format from a contact map. This software does not work as expected so far: it is still a beta version and its maintainers are working on it. It is expected to provide an easy way to reconstruct protein tertiary structure. For chemistry professionals, see Recovery of protein structure from contact maps, whose authors use Tinker to reconstruct the protein tertiary structure.

Usage

Quick Start

  1. Download AlphaFold weight data from here.
  2. Install Gaussian 09 and make sure g09 works in your terminal.
  3. Extract the sample input data in AlphaSMILES/data_in, which is provided in .tar.xz and .tar.gz format.
  4. Create a new subfolder alphafold_pytorch/model and extract the weight folders into it.
  5. Modify the variable in fmol.py to match your machine.
  6. Run ./fmol.py
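
Before running fmol.py, it may help to confirm that the prerequisites from the steps above are in place. The following is a rough sketch, not part of the repository: missing_prerequisites is a hypothetical helper, and the three weight folder names are the ones listed in the alphafold_pytorch section of this README.

```python
import shutil
from pathlib import Path

# Weight folder names as listed in the alphafold_pytorch instructions.
WEIGHT_FOLDERS = ("873731", "916425", "941521")


def missing_prerequisites(repo_root=".", g09_command="g09"):
    """Return a list of problems; an empty list means everything was found.

    Hypothetical helper for illustration only.
    """
    problems = []
    # Step 2: the Gaussian 09 executable should be reachable from PATH.
    if shutil.which(g09_command) is None:
        problems.append(f"'{g09_command}' was not found on PATH")
    # Step 4: the weight folders should sit under alphafold_pytorch/model.
    model_dir = Path(repo_root) / "alphafold_pytorch" / "model"
    for name in WEIGHT_FOLDERS:
        if not (model_dir / name).is_dir():
            problems.append(f"missing weight folder: {model_dir / name}")
    return problems
```

Calling missing_prerequisites() from the repository root and printing the returned list gives a quick view of what still needs to be set up.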

If you only want to use AlphaSMILES or AlphaFold

AlphaSMILES

Please check the doc for a usage tutorial. Cyril-Grl has written excellent documentation for it. I provide some additional input data, sample configurations for rnn and mcts, and a sample output produced with the sample configurations. In case Cyril's website shuts down, there is also a local copy of the documentation in AlphaSMILES/doc/_build/html/index.html

Quick start

If you have Gaussian 09 set up, g09 works in your terminal, and you just want a quick start:

  1. Extract the sample input data in AlphaSMILES/data_in, which is provided in .tar.xz and .tar.gz format.
  2. Change the options in AlphaSMILES/main.py
  3. Simply run AlphaSMILES/main.py
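
The extraction step can be done with tar on the command line, or with Python's standard tarfile module as sketched here. extract_archives is a hypothetical helper, not something shipped with AlphaSMILES, and it assumes the archives sit directly inside the folder passed in and are trusted.

```python
import tarfile
from pathlib import Path


def extract_archives(data_dir):
    """Extract every .tar.xz and .tar.gz archive found in data_dir, in place.

    Hypothetical helper for illustration. Returns the archive names handled.
    """
    data_dir = Path(data_dir)
    extracted = []
    for archive in sorted(data_dir.iterdir()):
        if archive.name.endswith((".tar.xz", ".tar.gz")):
            # tarfile.open detects the compression type on its own.
            with tarfile.open(archive) as tar:
                tar.extractall(path=data_dir)
            extracted.append(archive.name)
    return extracted
```

For example, extract_archives("AlphaSMILES/data_in") would unpack the sample data next to the archives.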

alphafold_pytorch

  1. To run the project, first download the pre-trained weights from the DeepMind repos.
  2. Create a folder named model under alphafold_pytorch
  3. Extract the weights downloaded in step 1 and move the three folders 873731, 916425, and 941521 into the model folder.
  4. Sample inputs are provided, so simply run ./alphafold_pytorch/alphafold.sh to run the project.

Remarks
  1. Technically, we could use the original DeepMind AlphaFold rather than alphafold_pytorch. But I got too many errors and warnings when I ran their code, and they didn't provide a good way to visualize the output, so I chose alphafold_pytorch in the end.
  2. For more details, check the alphafold_pytorch readme
  3. If you encounter an out-of-GPU-memory error, uncomment line 16 of alphafold_pytorch/alphafold.sh. That runs 3 trainings at a time instead of all 8, which is the default.

molxalpha

utils.py

utils.rr_to_cm(input)

I provide a method to convert a CASP13-RR file to a contact map file that RECONSTRUCT accepts. It creates a contact map file in .cm format in the same folder as the input .rr file.

Params
  • input(string) - path to the input file
Returns
  • None
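
To give a feel for what such a conversion involves, here is a simplified sketch. It is not the code in utils.py and it does not write the .cm layout that RECONSTRUCT expects. It assumes the usual CASP RR contact lines of the form "i j d_min d_max probability" with 1-based residue indices; parse_rr_contacts, contacts_to_matrix, and the 0.5 threshold are all illustrative choices.

```python
def parse_rr_contacts(text):
    """Return (i, j, probability) tuples from the contact lines of an RR file.

    Header, sequence, and END lines are skipped because they do not have
    five fields that start with two integers.
    """
    contacts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue
        try:
            i, j = int(fields[0]), int(fields[1])
            prob = float(fields[4])
        except ValueError:
            continue  # e.g. a header line that happens to have five words
        contacts.append((i, j, prob))
    return contacts


def contacts_to_matrix(contacts, length, threshold=0.5):
    """Build a symmetric 0/1 contact matrix from 1-based contact tuples.

    The threshold is an assumption made for this sketch.
    """
    matrix = [[0] * length for _ in range(length)]
    for i, j, prob in contacts:
        if prob >= threshold:
            matrix[i - 1][j - 1] = 1
            matrix[j - 1][i - 1] = 1
    return matrix
```

A caller would read the .rr file as text, pass it to parse_rr_contacts, and then choose the sequence length and threshold for contacts_to_matrix.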

fpocket

  • Use the install_fpocket.sh shell script in the scripts folder to install fpocket on your machine.
  • For more information check their repo

Amber

Updates coming soon

What's stuck

  • The output file of alphafold comes in the CASP13-RR (.rr) format. It stores the probability that two residues on the protein chain are in contact, that is, within 8 angstroms of each other. But fpocket only accepts input files in .pdb format, which stores the 3-D coordinates of each atom. Reconstructing a reliable PDB file from a CASP13-RR file is still an unsolved problem in academic circles. RECONSTRUCT is third-party software built on the TINKER package that aims to reconstruct a PDB file from a .cm contact map file, but it does not work well. I wrote a tool to convert the CASP13-RR format into the contact map format (see utils.rr_to_cm).
  • DeepMind didn't open-source the procedure for protein tertiary structure prediction, in particular the part that trains the model from the CASP PDB dataset. However, that part is essential to the accuracy of predictions for arbitrary protein structures.
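
The difficulty is easier to see from the opposite direction. Going from 3-D coordinates to a contact map is a one-liner, as in the sketch below, but it throws information away: rotating, translating, or mirroring the structure gives exactly the same map, and all distances are collapsed to a yes/no answer. The function name and the choice of one representative point per residue are assumptions made for this illustration; the 8 angstrom cutoff is the one mentioned above.

```python
import math


def contact_map(coords, cutoff=8.0):
    """Return a 0/1 matrix marking pairs of points closer than `cutoff`.

    `coords` is a list of (x, y, z) tuples in angstroms, one per residue.
    Illustrative sketch only; requires Python 3.8+ for math.dist.
    """
    n = len(coords)
    return [
        [1 if math.dist(coords[a], coords[b]) < cutoff else 0 for b in range(n)]
        for a in range(n)
    ]
```

Recovering coordinates from such a matrix means searching for a 3-D arrangement consistent with all of these constraints, which is the job that tools such as RECONSTRUCT and Tinker attempt.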

Todo

  • Restructure the project.
  • Use Tinker to reconstruct the protein tertiary structure, which is the classical approach.

Copyright Claim

To make the project easier to deploy on the cloud, I copied and merged some repos into this project in accordance with their licences.
