
galsang / Biblosa Pytorch

Re-implementation of Bi-Directional Block Self-Attention for Fast and Memory-Efficient Sequence Modeling (T. Shen et al., ICLR 2018) in PyTorch.

Programming Languages

python

Projects that are alternatives of or similar to Biblosa Pytorch

Performer Pytorch
An implementation of Performer, a linear attention-based transformer, in Pytorch
Stars: ✭ 546 (+1169.77%)
Mutual labels:  attention
Tf Rnn Attention
Tensorflow implementation of attention mechanism for text classification tasks.
Stars: ✭ 735 (+1609.3%)
Mutual labels:  attention
Banglatranslator
Bangla Machine Translator
Stars: ✭ 21 (-51.16%)
Mutual labels:  attention
Attention Is All You Need Pytorch
A PyTorch implementation of the Transformer model in "Attention is All You Need".
Stars: ✭ 6,070 (+14016.28%)
Mutual labels:  attention
Text Classification
Implementation of papers for text classification task on DBpedia
Stars: ✭ 682 (+1486.05%)
Mutual labels:  attention
Pytorch Gat
My implementation of the original GAT paper (Veličković et al.). I've additionally included the playground.py file for visualizing the Cora dataset, GAT embeddings, an attention mechanism, and entropy histograms. I've supported both Cora (transductive) and PPI (inductive) examples!
Stars: ✭ 908 (+2011.63%)
Mutual labels:  attention
Punctuator2
A bidirectional recurrent neural network model with attention mechanism for restoring missing punctuation in unsegmented text
Stars: ✭ 483 (+1023.26%)
Mutual labels:  attention
Attentioncluster
TensorFlow Implementation of "Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification"
Stars: ✭ 33 (-23.26%)
Mutual labels:  attention
Nlp paper study
In-depth study of top-conference papers and reproduction of their related code.
Stars: ✭ 691 (+1506.98%)
Mutual labels:  attention
Isab Pytorch
An implementation of (Induced) Set Attention Block, from the Set Transformers paper
Stars: ✭ 21 (-51.16%)
Mutual labels:  attention
Simplecvreproduction
Reproduce simple cv project including attention module, classification, object detection, segmentation, keypoint detection, tracking 😄 etc.
Stars: ✭ 602 (+1300%)
Mutual labels:  attention
Awesome Fast Attention
list of efficient attention modules
Stars: ✭ 627 (+1358.14%)
Mutual labels:  attention
Cell Detr
Official and maintained implementation of the paper Attention-Based Transformers for Instance Segmentation of Cells in Microstructures [BIBM 2020].
Stars: ✭ 26 (-39.53%)
Mutual labels:  attention
Speech Transformer
A PyTorch implementation of Speech Transformer, an End-to-End ASR with Transformer network on Mandarin Chinese.
Stars: ✭ 565 (+1213.95%)
Mutual labels:  attention
Defactonlp
DeFactoNLP: An Automated Fact-checking System that uses Named Entity Recognition, TF-IDF vector comparison and Decomposable Attention models.
Stars: ✭ 30 (-30.23%)
Mutual labels:  attention
Residual Attention Network
Residual Attention Network for Image Classification
Stars: ✭ 525 (+1120.93%)
Mutual labels:  attention
Spatial Transformer Network
A Tensorflow implementation of Spatial Transformer Networks.
Stars: ✭ 794 (+1746.51%)
Mutual labels:  attention
Attentions
PyTorch implementation of some attentions for Deep Learning Researchers.
Stars: ✭ 39 (-9.3%)
Mutual labels:  attention
Attentive Neural Processes
implementing "recurrent attentive neural processes" to forecast power usage (w. LSTM baseline, MCDropout)
Stars: ✭ 33 (-23.26%)
Mutual labels:  attention
Nlp tensorflow project
Uses TensorFlow to implement several NLP projects, e.g. classification, chatbot, NER, attention, QA, etc.
Stars: ✭ 27 (-37.21%)
Mutual labels:  attention

BiBloSA-pytorch

Re-implementation of Bi-Directional Block Self-Attention for Fast and Memory-Efficient Sequence Modeling (T. Shen et al., ICLR 2018) in PyTorch.

Results

Dataset: SNLI

Model                                        Accuracy (%)
Re-implementation (600D Bi-BloSAN)           84.1
Baseline from the paper (480D Bi-BloSAN)     85.7

Development Environment

  • OS: Ubuntu 16.04 LTS (64bit)
  • Language: Python 3.6.2
  • PyTorch: 0.3.0

Requirements

Please install the library requirements listed in requirements.txt first.

nltk==3.2.4
tensorboardX==1.0
torch==0.3.0
torchtext==0.2.1
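
For example, the pinned versions can usually be installed in one step with pip. Note that torch==0.3.0 is an old release which, as far as I know, was distributed from the pytorch.org download archive rather than PyPI, so it may have to be installed separately for your platform.

pip install -r requirements.txt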

Training

python train.py --help

usage: train.py [-h] [--batch-size BATCH_SIZE] [--block-size BLOCK_SIZE]
            [--data-type DATA_TYPE] [--dropout DROPOUT] [--epoch EPOCH]
            [--gpu GPU] [--learning-rate LEARNING_RATE]
            [--mSA-scalar MSA_SCALAR] [--print-freq PRINT_FREQ]
            [--weight-decay WEIGHT_DECAY] [--word-dim WORD_DIM]

optional arguments:
  -h, --help            show this help message and exit
  --batch-size BATCH_SIZE
  --block-size BLOCK_SIZE
  --data-type DATA_TYPE
  --dropout DROPOUT
  --epoch EPOCH
  --gpu GPU
  --learning-rate LEARNING_RATE
  --mSA-scalar MSA_SCALAR
  --print-freq PRINT_FREQ
  --weight-decay WEIGHT_DECAY
  --word-dim WORD_DIM 
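
For illustration, a manual training run on SNLI might look like the line below. Every flag comes from the usage message above, but the concrete values, and the assumption that the dataset is selected with --data-type SNLI, are placeholders rather than the settings behind the reported result.

python train.py --data-type SNLI --batch-size 64 --dropout 0.3 --epoch 20 --learning-rate 0.001 --word-dim 300 --block-size 3 --mSA-scalar 5 --gpu 0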

Note:

  • The two most important hyperparameters are block-size (r in the paper) and mSA-scalar (c in the paper). The paper suggests a heuristic for choosing r (in the Appendix), but says nothing about how to set c. In this implementation, r is computed by the suggested heuristic and c is set to 5, following the authors' settings; you can also assign values to both manually (see the sketch after these notes).
  • Dropout is also used in this model, but the paper does not specify exactly how it is applied. As a simple choice, dropout is therefore applied only to the SNLI-specific layers (the NN4SNLI class).
  • Furthermore, the paper gives no details about the 480D Bi-BloSAN whose result it reports, so the result reported here is based on a 600D (300D forward + 300D backward) Bi-BloSAN. Note that hyperparameter tuning has not been done thoroughly; the result could likely be improved with further tuning.
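
For reference, below is a minimal Python sketch of the block-size heuristic as I read it from the paper's Appendix: intra-block attention needs roughly n·r memory and inter-block attention roughly (n/r)^2, and minimizing their sum over r gives r ≈ (2n)^(1/3). The function name pick_block_size is mine and is not an identifier from this repository.

import math

def pick_block_size(n: int) -> int:
    # Memory of block self-attention ~ n*r (intra-block) + (n/r)**2 (inter-block).
    # Setting the derivative of n*r + n**2 / r**2 to zero gives r = (2*n) ** (1/3).
    # Round up and keep at least one token per block.
    return max(1, math.ceil((2 * n) ** (1 / 3)))

# Example: a 25-token SNLI sentence -> block size 4.
print(pick_block_size(25))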

Test

python test.py --help

usage: test.py [-h] [--batch-size BATCH_SIZE] [--block-size BLOCK_SIZE]
           [--data-type DATA_TYPE] [--dropout DROPOUT] [--epoch EPOCH]
           [--gpu GPU] [--mSA-scalar MSA_SCALAR] [--print-freq PRINT_FREQ]
           [--word-dim WORD_DIM] --model-path MODEL_PATH

optional arguments:
  -h, --help            show this help message and exit
  --batch-size BATCH_SIZE
  --block-size BLOCK_SIZE
  --data-type DATA_TYPE
  --dropout DROPOUT
  --epoch EPOCH
  --gpu GPU
  --mSA-scalar MSA_SCALAR
  --print-freq PRINT_FREQ
  --word-dim WORD_DIM
  --model-path MODEL_PATH

Note: You should execute test.py with the same hyperparameters that were used to train the model you want to evaluate.
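
For example, if a model was trained with the illustrative flags shown above, evaluation might look like the following. The path saved_models/best_model.pt is a hypothetical placeholder; substitute the checkpoint path produced by your own training run.

python test.py --data-type SNLI --batch-size 64 --dropout 0.3 --word-dim 300 --block-size 3 --mSA-scalar 5 --gpu 0 --model-path saved_models/best_model.pt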

MISC.

The original code implemented by the authors (in TensorFlow) can be found here.
