
husthuaan / Aoanet

License: MIT
Code for paper "Attention on Attention for Image Captioning". ICCV 2019

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Aoanet

Sca Cnn.cvpr17
Image Captions Generation with Spatial and Channel-wise Attention
Stars: ✭ 198 (-18.18%)
Mutual labels:  attention-mechanism, image-captioning
Show Attend And Tell
TensorFlow Implementation of "Show, Attend and Tell"
Stars: ✭ 869 (+259.09%)
Mutual labels:  attention-mechanism, image-captioning
Image-Caption
Using LSTM or Transformer to solve Image Captioning in Pytorch
Stars: ✭ 36 (-85.12%)
Mutual labels:  image-captioning, attention-mechanism
A Pytorch Tutorial To Image Captioning
Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning
Stars: ✭ 1,867 (+671.49%)
Mutual labels:  attention-mechanism, image-captioning
Adaptiveattention
Implementation of "Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning"
Stars: ✭ 303 (+25.21%)
Mutual labels:  attention-mechanism, image-captioning
Image Caption Generator
A neural network to generate captions for an image using CNN and RNN with BEAM Search.
Stars: ✭ 126 (-47.93%)
Mutual labels:  attention-mechanism, image-captioning
Image To Image Search
A reverse image search engine powered by elastic search and tensorflow
Stars: ✭ 200 (-17.36%)
Mutual labels:  image-captioning
Neat Vision
Neat (Neural Attention) Vision is a visualization tool for the attention mechanisms of deep-learning models for Natural Language Processing (NLP) tasks. (framework-agnostic)
Stars: ✭ 213 (-11.98%)
Mutual labels:  attention-mechanism
Sparse Structured Attention
Sparse and structured neural attention mechanisms
Stars: ✭ 198 (-18.18%)
Mutual labels:  attention-mechanism
Hnatt
Train and visualize Hierarchical Attention Networks
Stars: ✭ 192 (-20.66%)
Mutual labels:  attention-mechanism
Linformer Pytorch
My take on a practical implementation of Linformer for Pytorch.
Stars: ✭ 239 (-1.24%)
Mutual labels:  attention-mechanism
Triplet Attention
Official PyTorch Implementation for "Rotate to Attend: Convolutional Triplet Attention Module." [WACV 2021]
Stars: ✭ 222 (-8.26%)
Mutual labels:  attention-mechanism
Linear Attention Transformer
Transformer based on a variant of attention that is linear in complexity with respect to sequence length
Stars: ✭ 205 (-15.29%)
Mutual labels:  attention-mechanism
Dataturks
ML data annotations made super easy for teams. Just upload data, add your team and build training/evaluation datasets in hours.
Stars: ✭ 200 (-17.36%)
Mutual labels:  image-captioning
Lightnetplusplus
LightNet++: Boosted Light-weighted Networks for Real-time Semantic Segmentation
Stars: ✭ 218 (-9.92%)
Mutual labels:  attention-mechanism
Self Attention Cv
Implementation of various self-attention mechanisms focused on computer vision. Ongoing repository.
Stars: ✭ 209 (-13.64%)
Mutual labels:  attention-mechanism
Up Down Captioner
Automatic image captioning model based on Caffe, using features from bottom-up attention.
Stars: ✭ 195 (-19.42%)
Mutual labels:  image-captioning
Guided Attention Inference Network
Contains implementation of Guided Attention Inference Network (GAIN) presented in Tell Me Where to Look(CVPR 2018). This repository aims to apply GAIN on fcn8 architecture used for segmentation.
Stars: ✭ 204 (-15.7%)
Mutual labels:  attention-mechanism
X Transformers
A simple but complete full-attention transformer with a set of promising experimental features from various papers
Stars: ✭ 211 (-12.81%)
Mutual labels:  attention-mechanism
Attention Mechanisms
Implementations for a family of attention mechanisms, suitable for all kinds of natural language processing tasks and compatible with TensorFlow 2.0 and Keras.
Stars: ✭ 203 (-16.12%)
Mutual labels:  attention-mechanism

Attention on Attention for Image Captioning

This repository contains the implementation of the ICCV 2019 paper "Attention on Attention for Image Captioning".
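
At its core, AoA concatenates a conventional attention result with its query, derives an "information vector" and a sigmoid "attention gate" from the concatenation, and multiplies the two elementwise. Below is a minimal PyTorch sketch of this operation, written from the paper's formulation rather than taken from the repository's actual module:

import torch
import torch.nn as nn

class AoA(nn.Module):
    """Attention on Attention: gate the attended vector with its query."""
    def __init__(self, d_model):
        super().__init__()
        # Both transforms read the concatenated [query; attended] vector.
        self.info_proj = nn.Linear(2 * d_model, d_model)  # information vector i
        self.gate_proj = nn.Linear(2 * d_model, d_model)  # attention gate g

    def forward(self, query, attended):
        # query, attended: (batch, d_model)
        x = torch.cat([query, attended], dim=-1)
        i = self.info_proj(x)                 # candidate information
        g = torch.sigmoid(self.gate_proj(x))  # how much of it to let through
        return g * i                          # gated attention result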

Requirements

  • Python 3.6
  • Java 1.8.0
  • PyTorch 1.0
  • cider (already added as a submodule)
  • coco-caption (already added as a submodule)
  • tensorboardX
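
A possible setup sequence, assuming pip and git (the PyTorch install depends on your CUDA version, and the commands below are illustrative rather than taken from the repository):

$ git clone --recursive https://github.com/husthuaan/AoANet.git  # --recursive pulls the cider and coco-caption submodules
$ cd AoANet
$ pip install tensorboardX  # install PyTorch 1.0 separately for your CUDA setup; Java 1.8.0 is needed by coco-caption's scorers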

Training AoANet

Prepare data

See details in data/README.md.

(Note: set word_count_threshold in scripts/prepro_labels.py to 4 to generate a vocabulary of size 10,369.)
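
The threshold keeps only words that occur at least that many times in the training captions and maps everything rarer to an UNK token. A sketch of the idea (illustrative; not the script's actual code):

from collections import Counter

def build_vocab(captions, word_count_threshold=4):
    # Count word occurrences over all training captions.
    counts = Counter(w for cap in captions for w in cap.split())
    # Words below the threshold are dropped and will map to 'UNK'.
    vocab = [w for w, n in counts.items() if n >= word_count_threshold]
    return vocab + ['UNK']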

You should also preprocess the dataset to build the cached n-gram statistics used to compute the CIDEr score during SCST:

$ python scripts/prepro_ngrams.py --input_json data/dataset_coco.json --dict_json data/cocotalk.json --output_pkl data/coco-train --split train
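
SCST (self-critical sequence training) is a REINFORCE-style objective that uses the model's own greedy decode as its baseline: a sampled caption is reinforced if it scores higher (here by CIDEr, computed against the cached n-gram statistics) than the greedy caption, and suppressed otherwise. A minimal sketch of the loss, with hypothetical names:

import torch

def scst_loss(sample_logprobs, sample_cider, greedy_cider, mask):
    # sample_logprobs: (batch, seq_len) log-probs of the sampled captions
    # sample_cider, greedy_cider: (batch,) CIDEr rewards of sampled/greedy decodes
    # mask: (batch, seq_len) 1.0 for real tokens, 0.0 for padding
    advantage = (sample_cider - greedy_cider).unsqueeze(1)  # baseline-subtracted reward
    loss = -advantage * sample_logprobs                     # REINFORCE with baseline
    return (loss * mask).sum() / mask.sum()                 # average over real tokens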

Start training

$ CUDA_VISIBLE_DEVICES=0 sh train.sh

See opts.py for the options. (You can download the pretrained models from here.)

Evaluation

$ CUDA_VISIBLE_DEVICES=0 python eval.py --model log/log_aoanet_rl/model.pth --infos_path log/log_aoanet_rl/infos_aoanet.pkl  --dump_images 0 --dump_json 1 --num_images -1 --language_eval 1 --beam_size 2 --batch_size 100 --split test
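
--beam_size 2 decodes with beam search: at each step every partial caption is extended with every vocabulary word, and only the 2 highest-scoring continuations survive. A sketch of the core selection step (illustrative, not the repository's decoder):

import torch

def beam_step(logprobs, beam_scores, beam_size):
    # logprobs: (beam_size, vocab) next-word log-probs for each live beam
    # beam_scores: (beam_size,) cumulative log-probs of the live beams
    total = beam_scores.unsqueeze(1) + logprobs        # score of every continuation
    scores, flat_idx = total.view(-1).topk(beam_size)  # keep the best beam_size
    beam_idx = flat_idx // logprobs.size(1)            # which beam each came from
    word_idx = flat_idx % logprobs.size(1)             # which word extends it
    return scores, beam_idx, word_idx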

Performance

After training under the cross-entropy (XE) loss for 25 epochs, you should get scores close to the following:

{'Bleu_1': 0.7729384559899702, 'Bleu_2': 0.6163398035383025, 'Bleu_3': 0.4790123137715982, 'Bleu_4': 0.36944349063530374, 'METEOR': 0.2848188431924821, 'ROUGE_L': 0.5729849683867054, 'CIDEr': 1.1842173801790759, 'SPICE': 0.21650786258302354}

(Note: you can increase --max_epochs in train.sh to train the model for more epochs and improve the scores.)

After training under the SCST loss for another 15 epochs, you should get:

{'Bleu_1': 0.8054903453672397, 'Bleu_2': 0.6523038976984842, 'Bleu_3': 0.5096621263772566, 'Bleu_4': 0.39140307771618477, 'METEOR': 0.29011216375635934, 'ROUGE_L': 0.5890369750273199, 'CIDEr': 1.2892294296245852, 'SPICE': 0.22680092759866174}

Reference

If you find this repo helpful, please consider citing:

@inproceedings{huang2019attention,
  title={Attention on Attention for Image Captioning},
  author={Huang, Lun and Wang, Wenmin and Chen, Jie and Wei, Xiao-Yong},
  booktitle={International Conference on Computer Vision},
  year={2019}
}

Acknowledgements

This repository is based on self-critical.pytorch, and you may refer to it for more details about the code.
