All Projects → HannesStark → 3DInfomax

HannesStark / 3DInfomax

Licence: other
Making self-supervised learning work on molecules by using their 3D geometry to pre-train GNNs. Implemented in DGL and Pytorch Geometric.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to 3DInfomax

gnn-lspe
Source code for GNN-LSPE (Graph Neural Networks with Learnable Structural and Positional Representations), ICLR 2022
Stars: ✭ 165 (+54.21%)
Mutual labels:  molecules, graph-neural-networks, graph-representation-learning, gnn
GNN-Recommender-Systems
An index of recommendation algorithms that are based on Graph Neural Networks.
Stars: ✭ 505 (+371.96%)
Mutual labels:  graph-neural-networks, graph-representation-learning, gnn
awesome-efficient-gnn
Code and resources on scalable and efficient Graph Neural Networks
Stars: ✭ 498 (+365.42%)
Mutual labels:  graph-neural-networks, graph-representation-learning, gnn
LPGNN
Locally Private Graph Neural Networks (ACM CCS 2021)
Stars: ✭ 30 (-71.96%)
Mutual labels:  graph-neural-networks, pytorch-geometric
BGCN
A Tensorflow implementation of "Bayesian Graph Convolutional Neural Networks" (AAAI 2019).
Stars: ✭ 129 (+20.56%)
Mutual labels:  graph-neural-networks, gnn
awesome-graph-self-supervised-learning
Awesome Graph Self-Supervised Learning
Stars: ✭ 805 (+652.34%)
Mutual labels:  graph-neural-networks, self-supervised-learning
SelfGNN
A PyTorch implementation of "SelfGNN: Self-supervised Graph Neural Networks without explicit negative sampling" paper, which appeared in The International Workshop on Self-Supervised Learning for the Web (SSL'21) @ the Web Conference 2021 (WWW'21).
Stars: ✭ 24 (-77.57%)
Mutual labels:  graph-neural-networks, self-supervised-learning
graphchem
Graph-based machine learning for chemical property prediction
Stars: ✭ 21 (-80.37%)
Mutual labels:  graph-neural-networks, pytorch-geometric
awesome-graph-self-supervised-learning-based-recommendation
A curated list of awesome graph & self-supervised-learning-based recommendation.
Stars: ✭ 37 (-65.42%)
Mutual labels:  graph-neural-networks, self-supervised-learning
PDN
The official PyTorch implementation of "Pathfinder Discovery Networks for Neural Message Passing" (WebConf '21)
Stars: ✭ 44 (-58.88%)
Mutual labels:  molecules, gnn
gemnet pytorch
GemNet model in PyTorch, as proposed in "GemNet: Universal Directional Graph Neural Networks for Molecules" (NeurIPS 2021)
Stars: ✭ 80 (-25.23%)
Mutual labels:  graph-neural-networks, gnn
pyg autoscale
Implementation of "GNNAutoScale: Scalable and Expressive Graph Neural Networks via Historical Embeddings" in PyTorch
Stars: ✭ 136 (+27.1%)
Mutual labels:  graph-neural-networks, pytorch-geometric
SubGNN
Subgraph Neural Networks (NeurIPS 2020)
Stars: ✭ 136 (+27.1%)
Mutual labels:  graph-neural-networks, graph-representation-learning
SelfTask-GNN
Implementation of paper "Self-supervised Learning on Graphs:Deep Insights and New Directions"
Stars: ✭ 78 (-27.1%)
Mutual labels:  graph-neural-networks, self-supervised-learning
mtad-gat-pytorch
PyTorch implementation of MTAD-GAT (Multivariate Time-Series Anomaly Detection via Graph Attention Networks) by Zhao et. al (2020, https://arxiv.org/abs/2009.02040).
Stars: ✭ 85 (-20.56%)
Mutual labels:  graph-neural-networks, gnn
QGNN
Quaternion Graph Neural Networks (ACML 2021) (Pytorch and Tensorflow)
Stars: ✭ 31 (-71.03%)
Mutual labels:  graph-neural-networks, graph-representation-learning
GCA
[WWW 2021] Source code for "Graph Contrastive Learning with Adaptive Augmentation"
Stars: ✭ 69 (-35.51%)
Mutual labels:  self-supervised-learning, graph-representation-learning
GNNLens2
Visualization tool for Graph Neural Networks
Stars: ✭ 155 (+44.86%)
Mutual labels:  graph-neural-networks, graph-representation-learning
Literatures-on-GNN-Acceleration
A reading list for deep graph learning acceleration.
Stars: ✭ 50 (-53.27%)
Mutual labels:  graph-neural-networks, gnn
grail
Inductive relation prediction by subgraph reasoning, ICML'20
Stars: ✭ 83 (-22.43%)
Mutual labels:  graph-neural-networks, graph-representation-learning

3D Infomax improves GNNs for Molecular Property Prediction

Video | Paper

We pre-train GNNs to understand the geometry of molecules given only their 2D molecular graph which they can use for better molecular property predictions. Below is a 3 step guide for how to use the code and how to reproduce our results and a guide for creating molecular fingerprints. If you have questions, don't hesitate to open an issue or ask me via [email protected] or social media. I am happy to hear from you!

This repository additionally adapts different self-supervised learning methods to graphs such as "Bootstrap your own Latent", "Barlow Twins", or "VICReg".

Generating fingerprints for arbitrary SMILES

To generate fingerprints that carry 3D information, just set up the environment as in step 1 below, then place your SMILES into the file dataset/inference_smiles.txt and run

python inference.py --config=configs_clean/fingerprint_inference.yml

Your fingerprints are saved as pickle file into the dataset_directory

Step 1: Setup Environment

We will set up the environment using Anaconda. Clone the current repo

git clone https://github.com/HannesStark/3DInfomax

Create a new environment with all required packages using environment.yml (this can take a while). While in the project directory run:

conda env create

Activate the environment

conda activate 3DInfomax

Step 2: 3D Pre-train a model

Let's pre-train a GNN with 50 000 molecules and their structures from the QM9 dataset (you can also skip to Step 3 and use the pre-trained model weights provided in this repo). For other datasets see the Data section below.

python train.py --config=configs_clean/pre-train_QM9.yml

This will first create the processed data of dataset/QM9/qm9.csv with the 3D information in qm9_eV.npz. Then your model starts pre-training and all the logs are saved in the runs folder which will also contain the pre-trained model as best_checkpoint.pt that can later be loaded for fine-tuning.

You can start tensorboard and navigate to localhost:6006 in your browser to monitor the training process:

tensorboard --logdir=runs --port=6006

Explanation:

The config files in configs_clean provide additional examples and blueprints to train different models. The files always contain a model_type that should be pre-trained (2D network) and a model3d_type (3D network) where you can specify the parameters of these networks. To find out more about all the other parameters in the config file, have a look at their description by running python train.py --help.

Step 3: Fine-tune a model

During pre-training a directory is created in the runs directory that contains the pre-trained model. We provide an example of such a directory with already pre-trained weights runs/PNA_qmugs_NTXentMultiplePositives_620000_123_25-08_09-19-52 which we can fine-tune for predicting QM9's homo property as follows.

python train.py --config=configs_clean/tune_QM9_homo.yml

You can monitor the fine-tuning process on tensorboard as well and in the end the results will be printed to the console but also saved in the runs directory that was created for fine-tuning in the file evaluation_test.txt.

The model which we are fine-tuning from is specified in configs_clean/tune_QM9_homo.yml via the parameter:

pretrain_checkpoint: runs/PNA_qmugs_NTXentMultiplePositives_620000_123_25-08_09-19-52/best_checkpoint_35epochs.pt

Multiple seeds:

This is a second fine-tuning example where we predict non-quantum properties of the OGB datasets and train multiple seeds (we always use the seeds 1, 2, 3, 4, 5, 6 in our experiments):

python train.py --config=configs_clean/tune_freesolv.yml

After all runs are done, the averaged results are saved in the runs directory of each seed in the file multiple_seed_test_statistics.txt

Data

You can pre-train or fine-tune on different datasets by specifying the dataset: parameter in a .yml file such as dataset: drugs to use GEOM-Drugs.

The QM9 dataset and the OGB datasets are already provided with this repository. The QMugs and GEOM-Drugs datasets need to be downloaded and placed in the correct location.

GEOM-Drugs: Download GEOM-Drugs here ( the rdkit_folder.tar.gz file), unzip it, and place it into dataset/GEOM.

QMugs: Download QMugs here (the structures.tar and summary.csv files), unzip the structures.tar, and place the resulting structures folder and the summary.csv file into a new folder QMugs that you have to create NEXT TO the repository root. Not in the repository root (sorry for this).

Reference

📃 Paper on arXiv

@article{stark2021_3dinfomax,
  title={3D Infomax improves GNNs for Molecular Property Prediction},
  author={Hannes Stärk and Dominique Beaini and Gabriele Corso and Prudencio Tossou and Christian Dallago and Stephan Günnemann and Pietro Liò},
  journal={arXiv preprint arXiv:2110.04126},
  year={2021}
}
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].