husthuaan / Aoanet
Attention on Attention for Image Captioning
This repository includes the implementation for Attention on Attention for Image Captioning.
Requirements
- Python 3.6
- Java 1.8.0
- PyTorch 1.0
- cider (already added as a submodule)
- coco-caption (already added as a submodule)
- tensorboardX
Training AoANet
Prepare data
See details in data/README.md.
(Note: set word_count_threshold in scripts/prepro_labels.py to 4 to generate a vocabulary of size 10,369.)
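To illustrate how a count threshold shrinks the vocabulary, here is a minimal sketch. It is not the code in scripts/prepro_labels.py: the function name build_vocab, the strict "more than the threshold" comparison, and the UNK placeholder token are assumptions made for illustration, and the captions are toy data.

```python
from collections import Counter

def build_vocab(captions, word_count_threshold):
    """Keep words that occur more than `word_count_threshold` times.

    Assumption: a strict '>' comparison and a single 'UNK' placeholder
    for dropped words. The real preprocessing script may differ.
    """
    counts = Counter(word for caption in captions for word in caption)
    vocab = sorted(w for w, n in counts.items() if n > word_count_threshold)
    # Add the placeholder only if at least one word was filtered out.
    if any(n <= word_count_threshold for n in counts.values()):
        vocab.append("UNK")
    return vocab

# Toy, hypothetical tokenized captions.
captions = [
    ["a", "dog", "runs"],
    ["a", "dog", "sleeps"],
    ["a", "cat", "sleeps"],
]
vocab = build_vocab(captions, word_count_threshold=1)
print(vocab)  # ['a', 'dog', 'sleeps', 'UNK']
```

A higher threshold drops more rare words, which is why the vocabulary size depends on the value chosen.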
You should also preprocess the dataset to build the n-gram cache used to compute the CIDEr score during SCST (self-critical sequence training):
$ python scripts/prepro_ngrams.py --input_json data/dataset_coco.json --dict_json data/cocotalk.json --output_pkl data/coco-train --split train
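CIDEr weights n-grams by how many images' reference captions contain them, so this statistic can be computed once and cached. The sketch below shows that idea on toy data. It is an assumption-laden illustration, not the contents of scripts/prepro_ngrams.py: the function names and the in-memory Counter are hypothetical, and the layout of the actual output file is not described here.

```python
from collections import Counter

def ngrams(tokens, max_n=4):
    """Yield every n-gram (as a tuple) of length 1..max_n."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])

def document_frequency(images, max_n=4):
    """Count, per n-gram, how many images have it in any reference caption."""
    df = Counter()
    for refs in images:
        seen = set()  # each image contributes at most 1 per n-gram
        for ref in refs:
            seen.update(ngrams(ref, max_n))
        df.update(seen)
    return df

# Toy data: each image has a list of tokenized reference captions.
images = [
    [["a", "dog", "runs"], ["the", "dog", "runs"]],
    [["a", "cat", "sleeps"]],
]
df = document_frequency(images)
print(df[("dog", "runs")], df[("a",)])  # 1 2
```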
Start training
$ CUDA_VISIBLE_DEVICES=0 sh train.sh
See opts.py for the options. (You can download the pretrained models from here.)
Evaluation
$ CUDA_VISIBLE_DEVICES=0 python eval.py --model log/log_aoanet_rl/model.pth --infos_path log/log_aoanet_rl/infos_aoanet.pkl --dump_images 0 --dump_json 1 --num_images -1 --language_eval 1 --beam_size 2 --batch_size 100 --split test
Performance
After training under XE (cross-entropy) loss for 25 epochs, you should get scores close to the following:
{'Bleu_1': 0.7729384559899702, 'Bleu_2': 0.6163398035383025, 'Bleu_3': 0.4790123137715982, 'Bleu_4': 0.36944349063530374, 'METEOR': 0.2848188431924821, 'ROUGE_L': 0.5729849683867054, 'CIDEr': 1.1842173801790759, 'SPICE': 0.21650786258302354}
(Note: you can increase --max_epochs in train.sh to train the model for more epochs and improve the scores.)
After training under SCST loss for another 15 epochs, you should get:
{'Bleu_1': 0.8054903453672397, 'Bleu_2': 0.6523038976984842, 'Bleu_3': 0.5096621263772566, 'Bleu_4': 0.39140307771618477, 'METEOR': 0.29011216375635934, 'ROUGE_L': 0.5890369750273199, 'CIDEr': 1.2892294296245852, 'SPICE': 0.22680092759866174}
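To make the two result sets above easier to compare, the following short script computes the per-metric gain from SCST training. The numbers are copied from the dictionaries above; the variable names are only illustrative.

```python
# Scores after 25 epochs of XE training (copied from above).
xe = {'Bleu_1': 0.7729384559899702, 'Bleu_2': 0.6163398035383025,
      'Bleu_3': 0.4790123137715982, 'Bleu_4': 0.36944349063530374,
      'METEOR': 0.2848188431924821, 'ROUGE_L': 0.5729849683867054,
      'CIDEr': 1.1842173801790759, 'SPICE': 0.21650786258302354}

# Scores after 15 further epochs of SCST training (copied from above).
scst = {'Bleu_1': 0.8054903453672397, 'Bleu_2': 0.6523038976984842,
        'Bleu_3': 0.5096621263772566, 'Bleu_4': 0.39140307771618477,
        'METEOR': 0.29011216375635934, 'ROUGE_L': 0.5890369750273199,
        'CIDEr': 1.2892294296245852, 'SPICE': 0.22680092759866174}

# Absolute gain per metric.
gains = {metric: scst[metric] - xe[metric] for metric in xe}

for metric, gain in gains.items():
    print(f"{metric:8s} {xe[metric]:.4f} -> {scst[metric]:.4f} ({gain:+.4f})")
```

Every metric improves, with the largest absolute gain on CIDEr (about 0.105), the metric used as the reward in SCST.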
Reference
If you find this repo helpful, please consider citing:
@inproceedings{huang2019attention,
title={Attention on Attention for Image Captioning},
author={Huang, Lun and Wang, Wenmin and Chen, Jie and Wei, Xiao-Yong},
booktitle={International Conference on Computer Vision},
year={2019}
}
Acknowledgements
This repository is based on self-critical.pytorch, and you may refer to it for more details about the code.