
galsang / Biblosa Pytorch

Re-implementation of Bi-Directional Block Self-Attention for Fast and Memory-Efficient Sequence Modeling (T. Shen et al., ICLR 2018) in PyTorch.

Programming Languages

python

Projects that are alternatives of or similar to Biblosa Pytorch

Performer Pytorch
An implementation of Performer, a linear attention-based transformer, in Pytorch
Stars: ✭ 546 (+1169.77%)
Mutual labels:  attention
Tf Rnn Attention
Tensorflow implementation of attention mechanism for text classification tasks.
Stars: ✭ 735 (+1609.3%)
Mutual labels:  attention
Banglatranslator
Bangla Machine Translator
Stars: ✭ 21 (-51.16%)
Mutual labels:  attention
Attention Is All You Need Pytorch
A PyTorch implementation of the Transformer model in "Attention is All You Need".
Stars: ✭ 6,070 (+14016.28%)
Mutual labels:  attention
Text Classification
Implementation of papers for text classification task on DBpedia
Stars: ✭ 682 (+1486.05%)
Mutual labels:  attention
Pytorch Gat
My implementation of the original GAT paper (Veličković et al.). I've additionally included the playground.py file for visualizing the Cora dataset, GAT embeddings, an attention mechanism, and entropy histograms. I've supported both Cora (transductive) and PPI (inductive) examples!
Stars: ✭ 908 (+2011.63%)
Mutual labels:  attention
Punctuator2
A bidirectional recurrent neural network model with attention mechanism for restoring missing punctuation in unsegmented text
Stars: ✭ 483 (+1023.26%)
Mutual labels:  attention
Attentioncluster
TensorFlow Implementation of "Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification"
Stars: ✭ 33 (-23.26%)
Mutual labels:  attention
Nlp paper study
In-depth study of top-conference papers and reproduction of their related code.
Stars: ✭ 691 (+1506.98%)
Mutual labels:  attention
Isab Pytorch
An implementation of (Induced) Set Attention Block, from the Set Transformers paper
Stars: ✭ 21 (-51.16%)
Mutual labels:  attention
Simplecvreproduction
Reproduce simple cv project including attention module, classification, object detection, segmentation, keypoint detection, tracking 😄 etc.
Stars: ✭ 602 (+1300%)
Mutual labels:  attention
Awesome Fast Attention
list of efficient attention modules
Stars: ✭ 627 (+1358.14%)
Mutual labels:  attention
Cell Detr
Official and maintained implementation of the paper Attention-Based Transformers for Instance Segmentation of Cells in Microstructures [BIBM 2020].
Stars: ✭ 26 (-39.53%)
Mutual labels:  attention
Speech Transformer
A PyTorch implementation of Speech Transformer, an End-to-End ASR with Transformer network on Mandarin Chinese.
Stars: ✭ 565 (+1213.95%)
Mutual labels:  attention
Defactonlp
DeFactoNLP: An Automated Fact-checking System that uses Named Entity Recognition, TF-IDF vector comparison and Decomposable Attention models.
Stars: ✭ 30 (-30.23%)
Mutual labels:  attention
Residual Attention Network
Residual Attention Network for Image Classification
Stars: ✭ 525 (+1120.93%)
Mutual labels:  attention
Spatial Transformer Network
A Tensorflow implementation of Spatial Transformer Networks.
Stars: ✭ 794 (+1746.51%)
Mutual labels:  attention
Attentions
PyTorch implementation of some attentions for Deep Learning Researchers.
Stars: ✭ 39 (-9.3%)
Mutual labels:  attention
Attentive Neural Processes
implementing "recurrent attentive neural processes" to forecast power usage (w. LSTM baseline, MCDropout)
Stars: ✭ 33 (-23.26%)
Mutual labels:  attention
Nlp tensorflow project
Uses TensorFlow to implement several NLP projects, e.g. classification, chatbot, NER, attention, QA, etc.
Stars: ✭ 27 (-37.21%)
Mutual labels:  attention

BiBloSA-pytorch

Re-implementation of Bi-Directional Block Self-Attention for Fast and Memory-Efficient Sequence Modeling (T. Shen et al., ICLR 2018) in PyTorch.

Results

Dataset: SNLI

Model                                        Accuracy (%)
Re-implementation (600D Bi-BloSAN)           84.1
Baseline from the paper (480D Bi-BloSAN)     85.7

Development Environment

  • OS: Ubuntu 16.04 LTS (64bit)
  • Language: Python 3.6.2
  • PyTorch: 0.3.0

Requirements

Please install the library requirements listed in requirements.txt first.

nltk==3.2.4
tensorboardX==1.0
torch==0.3.0
torchtext==0.2.1
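
For example, the pinned versions can usually be installed in one step with pip. Note that torch==0.3.0 is an old release which, as far as I know, was distributed from the pytorch.org download archive rather than PyPI, so it may have to be installed separately for your platform.

pip install -r requirements.txt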

Training

python train.py --help

usage: train.py [-h] [--batch-size BATCH_SIZE] [--block-size BLOCK_SIZE]
            [--data-type DATA_TYPE] [--dropout DROPOUT] [--epoch EPOCH]
            [--gpu GPU] [--learning-rate LEARNING_RATE]
            [--mSA-scalar MSA_SCALAR] [--print-freq PRINT_FREQ]
            [--weight-decay WEIGHT_DECAY] [--word-dim WORD_DIM]

optional arguments:
  -h, --help            show this help message and exit
  --batch-size BATCH_SIZE
  --block-size BLOCK_SIZE
  --data-type DATA_TYPE
  --dropout DROPOUT
  --epoch EPOCH
  --gpu GPU
  --learning-rate LEARNING_RATE
  --mSA-scalar MSA_SCALAR
  --print-freq PRINT_FREQ
  --weight-decay WEIGHT_DECAY
  --word-dim WORD_DIM 
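
For illustration, a manual training run on SNLI might look like the line below. Every flag comes from the usage message above, but the concrete values, and the assumption that the dataset is selected with --data-type SNLI, are placeholders rather than the settings behind the reported result.

python train.py --data-type SNLI --batch-size 64 --dropout 0.3 --epoch 20 --learning-rate 0.001 --word-dim 300 --block-size 3 --mSA-scalar 5 --gpu 0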

Note:

  • The two most important hyperparameters are block-size (r in the paper) and mSA-scalar (c in the paper). The paper suggests a heuristic for choosing r (in the Appendix), but says nothing about how to set c. In this implementation, r is computed by the suggested heuristic and c is set to 5, following the authors' settings; you can also assign values to both manually (see the sketch after these notes).
  • Dropout is also used in this model, but the paper does not specify exactly how it is applied. As a simple choice, dropout is therefore applied only to the SNLI-specific layers (the NN4SNLI class).
  • Furthermore, the paper gives no details about the 480D Bi-BloSAN whose result it reports, so the result reported here is based on a 600D (300D forward + 300D backward) Bi-BloSAN. Note that hyperparameter tuning has not been done thoroughly; the result could likely be improved with further tuning.
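
For reference, below is a minimal Python sketch of the block-size heuristic as I read it from the paper's Appendix: intra-block attention needs roughly n·r memory and inter-block attention roughly (n/r)^2, and minimizing their sum over r gives r ≈ (2n)^(1/3). The function name pick_block_size is mine and is not an identifier from this repository.

import math

def pick_block_size(n: int) -> int:
    # Memory of block self-attention ~ n*r (intra-block) + (n/r)**2 (inter-block).
    # Setting the derivative of n*r + n**2 / r**2 to zero gives r = (2*n) ** (1/3).
    # Round up and keep at least one token per block.
    return max(1, math.ceil((2 * n) ** (1 / 3)))

# Example: a 25-token SNLI sentence -> block size 4.
print(pick_block_size(25))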

Test

python test.py --help

usage: test.py [-h] [--batch-size BATCH_SIZE] [--block-size BLOCK_SIZE]
           [--data-type DATA_TYPE] [--dropout DROPOUT] [--epoch EPOCH]
           [--gpu GPU] [--mSA-scalar MSA_SCALAR] [--print-freq PRINT_FREQ]
           [--word-dim WORD_DIM] --model-path MODEL_PATH

optional arguments:
  -h, --help            show this help message and exit
  --batch-size BATCH_SIZE
  --block-size BLOCK_SIZE
  --data-type DATA_TYPE
  --dropout DROPOUT
  --epoch EPOCH
  --gpu GPU
  --mSA-scalar MSA_SCALAR
  --print-freq PRINT_FREQ
  --word-dim WORD_DIM
  --model-path MODEL_PATH

Note: You should execute test.py with the same hyperparameters that were used to train the model you want to evaluate.
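
For example, if a model was trained with the illustrative flags shown above, evaluation might look like the following. The path saved_models/best_model.pt is a hypothetical placeholder; substitute the checkpoint path produced by your own training run.

python test.py --data-type SNLI --batch-size 64 --dropout 0.3 --word-dim 300 --block-size 3 --mSA-scalar 5 --gpu 0 --model-path saved_models/best_model.pt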

MISC.

The original code implemented by the authors (in TensorFlow) can be found here.
