
acbull / Pyhgt

License: MIT
Code for "Heterogeneous Graph Transformer" (WWW'20), which is based on pytorch_geometric

Programming Languages

Python
139,335 projects; #7 most used programming language

Projects that are alternatives of or similar to Pyhgt

uformer-pytorch
Implementation of Uformer, Attention-based Unet, in Pytorch
Stars: ✭ 54 (-82.75%)
Mutual labels:  transformer
Keras Transformer
Transformer implemented in Keras
Stars: ✭ 273 (-12.78%)
Mutual labels:  transformer
Dab
Data Augmentation by Backtranslation (DAB) ヽ( •_-)ᕗ
Stars: ✭ 294 (-6.07%)
Mutual labels:  transformer
AITQA
resources for the IBM Airlines Table-Question-Answering Benchmark
Stars: ✭ 12 (-96.17%)
Mutual labels:  transformer
Allrank
allRank is a framework for training learning-to-rank neural models based on PyTorch.
Stars: ✭ 269 (-14.06%)
Mutual labels:  transformer
Demo Chinese Text Binary Classification With Bert
Stars: ✭ 276 (-11.82%)
Mutual labels:  transformer
SwinIR
SwinIR: Image Restoration Using Swin Transformer (official repository)
Stars: ✭ 1,260 (+302.56%)
Mutual labels:  transformer
Cognitive Speech Tts
Microsoft Text-to-Speech API sample code in several languages, part of Cognitive Services.
Stars: ✭ 312 (-0.32%)
Mutual labels:  transformer
Remi
"Pop Music Transformer: Beat-based Modeling and Generation of Expressive Pop Piano Compositions", ACM Multimedia 2020
Stars: ✭ 273 (-12.78%)
Mutual labels:  transformer
Transformer Tensorflow
Implementation of Transformer Model in Tensorflow
Stars: ✭ 286 (-8.63%)
Mutual labels:  transformer
bert in a flask
A dockerized flask API, serving ALBERT and BERT predictions using TensorFlow 2.0.
Stars: ✭ 32 (-89.78%)
Mutual labels:  transformer
Nlp Interview Notes
This project compiles the authors' study notes and materials for natural language processing (NLP) interview preparation, based on their own interviews and experience; it currently covers accumulated interview questions from the various subfields of NLP.
Stars: ✭ 207 (-33.87%)
Mutual labels:  transformer
Transformer
Easy Attributed String Creator
Stars: ✭ 278 (-11.18%)
Mutual labels:  transformer
svgs2fonts
npm package svgs2fonts: converts SVG icons into icon-font libraries (svgs -> svg, ttf, eot, woff, woff2), built with Node.js.
Stars: ✭ 29 (-90.73%)
Mutual labels:  transformer
Vedastr
A scene text recognition toolbox based on PyTorch
Stars: ✭ 290 (-7.35%)
Mutual labels:  transformer
Swin-Transformer-Tensorflow
Unofficial implementation of "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" (https://arxiv.org/abs/2103.14030)
Stars: ✭ 45 (-85.62%)
Mutual labels:  transformer
Transformer
Implementation of Transformer model (originally from Attention is All You Need) applied to Time Series.
Stars: ✭ 273 (-12.78%)
Mutual labels:  transformer
Laravel5 Jsonapi
Laravel 5 JSON API Transformer Package
Stars: ✭ 313 (+0%)
Mutual labels:  transformer
Transformer Pointer Generator
An abstractive summarization implementation with a Transformer and pointer-generator
Stars: ✭ 297 (-5.11%)
Mutual labels:  transformer
Viewpagertransition
ViewPager with parallax pages, together with vertical sliding (or click) and activity transitions
Stars: ✭ 3,017 (+863.9%)
Mutual labels:  transformer

Heterogeneous Graph Transformer (HGT)

UPDATE: HGT achieves the current state-of-the-art (SOTA) result on the Stanford OGBN-MAG dataset. That code is also available in this repo.

An alternative reference implementation based on the Deep Graph Library (DGL) is also available.

Heterogeneous Graph Transformer is a graph neural network architecture that can deal with large-scale heterogeneous and dynamic graphs.

See our WWW 2020 paper, Heterogeneous Graph Transformer, for more details.

This implementation of HGT is based on the PyTorch Geometric API.

Overview

The most important files in this project are as follows:

  • conv.py: The core of our model; implements the transformer-like heterogeneous graph convolutional layer.
  • model.py: Wraps the different model components together.
  • data.py: The data interface and usage.
    • class Graph: The data structure for heterogeneous graphs. Node features are stored in Graph.node_feature as pandas.DataFrame objects; the adjacency structure is stored in Graph.edge_list as a dictionary (see the sketch after this list).
    • def sample_subgraph: The sampling algorithm for heterogeneous graphs. Each iteration samples a fixed number of nodes per type, drawn from the neighborhood of the already sampled nodes, with sampling probability proportional to the square of the relative degree.
  • train_*.py: The training and validation script for a specific downstream task.
    • def *_sample: The sampling function for a given task. Remember to mask out existing links within the graph to avoid information leakage.
    • def prepare_data: Conducts sampling in parallel across multiple processes, so that batch preparation coordinates seamlessly with model training.
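
To make this data layout concrete, below is a minimal sketch of inspecting a loaded Graph object. It assumes dill-based loading, a per-node-type keying of node_feature, and a target-type/source-type/relation nesting of edge_list; these details are assumptions for illustration, so treat data.py and preprocess_OAG.py as the authoritative reference.

import dill

# Load a preprocessed graph (the file name and the use of dill are assumptions;
# see the train_*.py scripts for how the graphs are actually read).
graph = dill.load(open('graph_CS.pk', 'rb'))

# Node features live in Graph.node_feature as pandas.DataFrame objects,
# here assumed to be keyed by node type.
print(graph.node_feature['paper'].head())

# The adjacency structure lives in Graph.edge_list as a dictionary; the nesting
# below (target type -> source type -> relation -> target id -> source ids) is
# an assumed layout.
for target_type, by_source in graph.edge_list.items():
    for source_type, by_relation in by_source.items():
        for relation, links in by_relation.items():
            print(target_type, source_type, relation, len(links), 'target nodes')

# sample_subgraph (defined in data.py) then draws a fixed number of nodes per
# type from the neighborhood of the already sampled nodes, with probability
# proportional to the squared relative degree; see train_*.py for concrete calls.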

Setup

This implementation is based on pytorch_geometric. To run the code, you need to install its dependencies; you can simply run pip install -r requirements.txt to install all the necessary packages.

OAG DataSet

Our current experiments are conducted on the Open Academic Graph (OAG). For ease of use, we split and preprocess the whole dataset into different granularities: all CS papers (8.1 GB), all ML papers (1.9 GB), and all NN papers (0.6 GB), spanning 1900-2020. You can download the preprocessed graphs via this link.

If you want to process the raw data directly, you can download it via this link. After downloading it, run preprocess_OAG.py to extract features and store them in our data structure.

You can also use our code to process other heterogeneous graphs, as long as you load them into our data structure class Graph in data.py. Refer to preprocess_OAG.py for a demonstration, or see the sketch below.
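
As a rough illustration of what that loading can look like, the sketch below fills in the two attributes documented in the Overview by hand. The module path, the no-argument Graph() constructor, the column names, the relation name, and the direct dictionary assignment are all hypothetical simplifications; preprocess_OAG.py shows the actual construction path.

import pandas as pd
from pyHGT.data import Graph   # assumed module path; the class is defined in data.py

graph = Graph()                # assumes a no-argument constructor

# One feature table per node type (node types and columns are hypothetical).
graph.node_feature['author'] = pd.DataFrame({'name': ['a0', 'a1'], 'emb': [[0.1] * 8, [0.2] * 8]})
graph.node_feature['paper'] = pd.DataFrame({'title': ['p0'], 'emb': [[0.3] * 8]})

# Adjacency as a nested dictionary: target type -> source type -> relation ->
# target id -> {source id: timestamp}. If data.py initializes edge_list as a
# nested defaultdict, the chained indexing below works directly; otherwise build
# the intermediate dictionaries explicitly.
graph.edge_list['paper']['author']['rev_writes'][0] = {0: 2020, 1: 2019}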

Usage

Execute the following script to train on the paper-field (L2) classification task using HGT:

python3 train_paper_field.py --data_dir PATH_OF_DATASET --model_dir PATH_OF_SAVED_MODEL --conv_name hgt

The other two tasks are conducted similarly. Some key options of these scripts are:

  • --conv_name: Chooses the model used for training. By default we use HGT.
  • --sample_depth and --sample_width: The depth and width of the sampled subgraph. If the model exceeds GPU memory, consider reducing these values; to train a deeper GNN model, consider increasing them.
  • --n_pool: The number of processes that conduct sampling in parallel. On a machine with large memory, consider increasing this number to reduce batch preparation time.
  • --repeat: The number of times a sampled batch is reused for training. If training time is much smaller than sampling time, consider increasing this number. (An example invocation combining these options follows this list.)
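
For example (the flag values here are illustrative rather than recommendations; the defaults are defined in train_*.py), a memory-constrained run that keeps batch preparation from becoming the bottleneck could combine these options as:

python3 train_paper_field.py --data_dir PATH_OF_DATASET --model_dir PATH_OF_SAVED_MODEL --conv_name hgt --sample_depth 3 --sample_width 64 --n_pool 8 --repeat 4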

The details of other optional hyperparameters can be found in train_*.py.

Citation

Please consider citing the following paper when using our code for your application.

@inproceedings{hgt,
  author    = {Ziniu Hu and
               Yuxiao Dong and
               Kuansan Wang and
               Yizhou Sun},
  title     = {Heterogeneous Graph Transformer},
  booktitle = {{WWW} '20: The Web Conference 2020, Taipei, Taiwan, April 20-24, 2020},
  pages     = {2704--2710},
  publisher = {{ACM} / {IW3C2}},
  year      = {2020},
  url       = {https://doi.org/10.1145/3366423.3380027},
  doi       = {10.1145/3366423.3380027},
  timestamp = {Wed, 06 May 2020 12:56:16 +0200},
  biburl    = {https://dblp.org/rec/conf/www/HuDWS20.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}