benedekrozemberczki / SimGNN

License: GPL-3.0
A PyTorch implementation of "SimGNN: A Neural Network Approach to Fast Graph Similarity Computation" (WSDM 2019).

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives to or similar to SimGNN

resolutions-2019
A list of data mining and machine learning papers that I implemented in 2019.
Stars: ✭ 19 (-94.59%)
Mutual labels:  sklearn, attention-mechanism
Yolo Multi Backbones Attention
Model compression: YOLOv3 with multiple lightweight backbones (ShuffleNetV2, Huawei GhostNet), attention, pruning, and quantization
Stars: ✭ 317 (-9.69%)
Mutual labels:  attention-mechanism
Attentionwalk
A PyTorch Implementation of "Watch Your Step: Learning Node Embeddings via Graph Attention" (NeurIPS 2018).
Stars: ✭ 266 (-24.22%)
Mutual labels:  sklearn
Adaptiveattention
Implementation of "Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning"
Stars: ✭ 303 (-13.68%)
Mutual labels:  attention-mechanism
Transformer
A Pytorch Implementation of "Attention is All You Need" and "Weighted Transformer Network for Machine Translation"
Stars: ✭ 271 (-22.79%)
Mutual labels:  attention-mechanism
Seq2seq Summarizer
Pointer-generator reinforced seq2seq summarization in PyTorch
Stars: ✭ 306 (-12.82%)
Mutual labels:  attention-mechanism
Mirnet
Official repository for "Learning Enriched Features for Real Image Restoration and Enhancement" (ECCV 2020). SOTA results for image denoising, super-resolution, and image enhancement.
Stars: ✭ 247 (-29.63%)
Mutual labels:  attention-mechanism
Daily Stock Forecast
Daily Stock Forecasts using Machine Learning & Python
Stars: ✭ 341 (-2.85%)
Mutual labels:  sklearn
Seq2seq chatbot
A simple TensorFlow implementation of a seq2seq-based dialogue system with embedding, attention, and beam search; the dataset is Cornell Movie Dialogs
Stars: ✭ 308 (-12.25%)
Mutual labels:  attention-mechanism
Attention is all you need
Transformer from "Attention Is All You Need" (Vaswani et al. 2017), implemented in Chainer.
Stars: ✭ 303 (-13.68%)
Mutual labels:  attention-mechanism
Sklearn Evaluation
Machine learning model evaluation made easy: plots, tables, HTML reports, experiment tracking and Jupyter notebook analysis.
Stars: ✭ 294 (-16.24%)
Mutual labels:  sklearn
Timesformer Pytorch
Implementation of TimeSformer from Facebook AI, a pure attention-based solution for video classification
Stars: ✭ 225 (-35.9%)
Mutual labels:  attention-mechanism
Machine Learning For Beginner By Python3
Example-based ML algorithm programs for machine learning beginners, using sklearn, TensorFlow, and hand-written functions (AnFany).
Stars: ✭ 305 (-13.11%)
Mutual labels:  sklearn
Textmining
A text mining system in Python (Research of Text Mining System)
Stars: ✭ 268 (-23.65%)
Mutual labels:  sklearn
Keras Gat
Keras implementation of the graph attention networks (GAT) by Veličković et al. (2017; https://arxiv.org/abs/1710.10903)
Stars: ✭ 334 (-4.84%)
Mutual labels:  attention-mechanism
Da Rnn
📃 **Unofficial** PyTorch Implementation of DA-RNN (arXiv:1704.02971)
Stars: ✭ 256 (-27.07%)
Mutual labels:  attention-mechanism
Vit Pytorch
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch
Stars: ✭ 7,199 (+1951%)
Mutual labels:  attention-mechanism
Alphafold2
To eventually become an unofficial Pytorch implementation / replication of Alphafold2, as details of the architecture get released
Stars: ✭ 298 (-15.1%)
Mutual labels:  attention-mechanism
Action Recognition Visual Attention
Action recognition using soft attention based deep recurrent neural networks
Stars: ✭ 350 (-0.28%)
Mutual labels:  attention-mechanism
Transformer
A TensorFlow Implementation of the Transformer: Attention Is All You Need
Stars: ✭ 3,646 (+938.75%)
Mutual labels:  attention-mechanism

SimGNN


A PyTorch implementation of SimGNN: A Neural Network Approach to Fast Graph Similarity Computation (WSDM 2019).

Abstract

Graph similarity search is among the most important graph-based applications, e.g. finding the chemical compounds that are most similar to a query compound. Graph similarity/distance computation, such as Graph Edit Distance (GED) and Maximum Common Subgraph (MCS), is the core operation of graph similarity search and many other applications, but very costly to compute in practice. Inspired by the recent success of neural network approaches to several graph applications, such as node or graph classification, we propose a novel neural network based approach to address this classic yet challenging graph problem, aiming to alleviate the computational burden while preserving a good performance. The proposed approach, called SimGNN, combines two strategies. First, we design a learnable embedding function that maps every graph into an embedding vector, which provides a global summary of a graph. A novel attention mechanism is proposed to emphasize the important nodes with respect to a specific similarity metric. Second, we design a pairwise node comparison method to supplement the graph-level embeddings with fine-grained node-level information. Our model achieves better generalization on unseen graphs, and in the worst case runs in quadratic time with respect to the number of nodes in two graphs. Taking GED computation as an example, experimental results on three real graph datasets demonstrate the effectiveness and efficiency of our approach. Specifically, our model achieves smaller error rate and great time reduction compared against a series of baselines, including several approximation algorithms on GED computation, and many existing graph neural network based models. Our study suggests SimGNN provides a new direction for future research on graph similarity computation and graph similarity search.
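
To make the first strategy concrete, the attention pooling described above can be sketched in a few lines of PyTorch. This is an illustrative sketch rather than the repository's exact code; the class and variable names are invented for the example.

import torch

class AttentionPooling(torch.nn.Module):
    """Pool node embeddings into one graph embedding via learned attention."""
    def __init__(self, dim):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.empty(dim, dim))
        torch.nn.init.xavier_uniform_(self.weight)

    def forward(self, node_embeddings):  # shape: (num_nodes, dim)
        # Global context vector: nonlinear transform of the mean node embedding.
        context = torch.tanh(node_embeddings.mean(dim=0) @ self.weight)
        # One attention score per node, measuring agreement with the context,
        # so nodes that matter for the similarity task are emphasized.
        scores = torch.sigmoid(node_embeddings @ context)  # shape: (num_nodes,)
        # Graph-level embedding: attention-weighted sum of node embeddings.
        return (scores.unsqueeze(-1) * node_embeddings).sum(dim=0)  # shape: (dim,)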

This repository provides a PyTorch implementation of SimGNN as described in the paper:

SimGNN: A Neural Network Approach to Fast Graph Similarity Computation. Yunsheng Bai, Hao Ding, Song Bian, Ting Chen, Yizhou Sun, Wei Wang. WSDM, 2019. [Paper]

A reference TensorFlow implementation is accessible [here] and another implementation is [here].

Requirements

The codebase is implemented in Python 3.5.2. The package versions used for development are listed below.

networkx          2.4
tqdm              4.28.1
numpy             1.15.4
pandas            0.23.4
texttable         1.5.0
scipy             1.1.0
argparse          1.1.0
torch             1.1.0
torch-scatter     1.4.0
torch-sparse      0.4.3
torch-cluster     1.4.5
torch-geometric   1.3.2
torchvision       0.3.0
scikit-learn      0.20.0

Datasets

The code takes pairs of graphs for training from an input folder, where each pair of graphs is stored as a JSON file. Pairs of graphs used for testing are also stored as JSON files. Every node id and node label has to be indexed from 0. Keys of dictionaries are stored as strings in order to make JSON serialization possible.

Every JSON file has the following key-value structure:

{"graph_1": [[0, 1], [1, 2], [2, 3], [3, 4]],
 "graph_2":  [[0, 1], [1, 2], [1, 3], [3, 4], [2, 4]],
 "labels_1": [2, 2, 2, 2],
 "labels_2": [2, 3, 2, 2, 2],
 "ged": 1}

The **graph_1** and **graph_2** keys have edge list values which describe the connectivity structure. Similarly, the **labels_1** and **labels_2** keys have labels for each node, stored as lists whose positions correspond to node identifiers. The **ged** key has an integer value which is the raw graph edit distance for the pair of graphs.
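
For example, one such pair file could be loaded and turned into labeled networkx graphs as follows (the file name is illustrative):

import json
import networkx as nx

# Load one training pair (the path is illustrative).
with open("dataset/train/0.json") as f:
    pair = json.load(f)

# Rebuild the two graphs from their edge lists.
graph_1 = nx.from_edgelist(pair["graph_1"])
graph_2 = nx.from_edgelist(pair["graph_2"])

# Positions in the label lists correspond to node identifiers.
nx.set_node_attributes(graph_1, dict(enumerate(pair["labels_1"])), name="label")
nx.set_node_attributes(graph_2, dict(enumerate(pair["labels_2"])), name="label")

print(graph_1.number_of_nodes(), graph_2.number_of_nodes(), pair["ged"])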

Options

Training a SimGNN model is handled by the `src/main.py` script, which provides the following command line arguments (a sketch of the corresponding parser follows the tables below).

Input and output options

  --training-graphs   STR    Training graphs folder.      Default is `dataset/train/`.
  --testing-graphs    STR    Testing graphs folder.       Default is `dataset/test/`.

Model options

  --filters-1             INT         Number of filters in 1st GCN layer.      Default is 128.
  --filters-2             INT         Number of filters in 2nd GCN layer.      Default is 64.
  --filters-3             INT         Number of filters in 3rd GCN layer.      Default is 32.
  --tensor-neurons        INT         Neurons in tensor network layer.         Default is 16.
  --bottle-neck-neurons   INT         Bottleneck layer neurons.                Default is 16.
  --bins                  INT         Number of histogram bins.                Default is 16.
  --batch-size            INT         Number of pairs processed per batch.     Default is 128. 
  --epochs                INT         Number of SimGNN training epochs.        Default is 5.
  --dropout               FLOAT       Dropout rate.                            Default is 0.5.
  --learning-rate         FLOAT       Learning rate.                           Default is 0.001.
  --weight-decay          FLOAT       Weight decay.                            Default is 10^-5.
  --histogram             BOOL        Include histogram features.              Default is False.
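
The tables above map directly onto a standard argparse parser. Here is a minimal sketch covering a few of the flags; the remaining ones follow the same pattern, and this is not necessarily the repository's exact parser.

import argparse

def parameter_parser():
    """Declare a few of the command line options listed above (illustrative)."""
    parser = argparse.ArgumentParser(description="Run SimGNN.")
    parser.add_argument("--training-graphs", type=str, default="dataset/train/")
    parser.add_argument("--testing-graphs", type=str, default="dataset/test/")
    parser.add_argument("--filters-1", type=int, default=128)
    parser.add_argument("--bins", type=int, default=16)
    parser.add_argument("--batch-size", type=int, default=128)
    parser.add_argument("--epochs", type=int, default=5)
    parser.add_argument("--dropout", type=float, default=0.5)
    parser.add_argument("--learning-rate", type=float, default=0.001)
    parser.add_argument("--weight-decay", type=float, default=1e-5)
    # A bare flag: passing --histogram switches histogram features on.
    parser.add_argument("--histogram", action="store_true", default=False)
    return parser.parse_args()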

Examples

The following commands learn a neural network and score it on the test set. Training a SimGNN model on the default dataset.

python src/main.py

Training a SimGNN model for 100 epochs with a batch size of 512.

python src/main.py --epochs 100 --batch-size 512

Training a SimGNN with histogram features.

python src/main.py --histogram

Training a SimGNN with histogram features and a large bin number.

python src/main.py --histogram --bins 32
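
The histogram features implement the pairwise node comparison strategy from the abstract: similarity scores between every pair of node embeddings from the two graphs are binned into a fixed-length histogram. A rough sketch, with invented shapes and normalization:

import torch

def histogram_features(emb_1, emb_2, bins=16):
    """Bin pairwise node-embedding similarities into a histogram vector."""
    scores = emb_1 @ emb_2.t()                     # (n1, n2) similarity matrix
    hist = torch.histc(scores.flatten(), bins=bins)
    return hist / hist.sum()                       # normalize to sum to 1

# Example: graphs with 5 and 7 nodes, 32-dimensional node embeddings.
features = histogram_features(torch.randn(5, 32), torch.randn(7, 32), bins=16)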

Increasing the learning rate and the dropout.

python src/main.py --learning-rate 0.01 --dropout 0.9

You can save the trained model by adding the `--save-path` parameter.

python src/main.py --save-path /path/to/model-name

Then you can load a pretrained model using the `--load-path` parameter; note that the model will be used as-is, and no training will be performed.

python src/main.py --load-path /path/to/model-name
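
Saving and loading of this kind typically wraps PyTorch's state-dict serialization. A minimal sketch with a stand-in model follows; the script's actual internals may differ.

import torch

# Stand-in for the trained SimGNN network; any torch.nn.Module behaves the same.
model = torch.nn.Linear(4, 1)

# --save-path: serialize the trained parameters to disk (path is illustrative).
torch.save(model.state_dict(), "model-name.pt")

# --load-path: restore the parameters into a freshly constructed model,
# then evaluate without further training.
model.load_state_dict(torch.load("model-name.pt"))
model.eval()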

License

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].