
luo3300612 / image-captioning-DLCT

License: BSD-3-Clause
Official pytorch implementation of paper "Dual-Level Collaborative Transformer for Image Captioning" (AAAI 2021).

Programming Languages

Jupyter Notebook
11667 projects
Python
139335 projects - #7 most used programming language

Projects that are alternatives to or similar to image-captioning-DLCT

Image-Captioning-with-Beam-Search
Generating image captions using Xception Network and Beam Search in Keras
Stars: ✭ 18 (-86.57%)
Mutual labels:  image-captioning
Image-Captioining
The objective is to generate a textual description of an image based on the objects and actions in it, using generative models so that novel sentences are created. Pipeline-type models use two separate learning processes, one for language modelling and the other for image recognition: they first identify objects in the image and prov…
Stars: ✭ 20 (-85.07%)
Mutual labels:  image-captioning
Awesome-Captioning
A curated list of multimodal captioning research (including image captioning, video captioning, and text captioning)
Stars: ✭ 56 (-58.21%)
Mutual labels:  image-captioning
udacity-cvnd-projects
My solutions to the projects assigned for the Udacity Computer Vision Nanodegree
Stars: ✭ 36 (-73.13%)
Mutual labels:  image-captioning
Show and Tell
Show and Tell: A Neural Image Caption Generator
Stars: ✭ 74 (-44.78%)
Mutual labels:  image-captioning
MIA
Code for "Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations" (NeurIPS 2019)
Stars: ✭ 57 (-57.46%)
Mutual labels:  image-captioning
Show Control And Tell
Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions. CVPR 2019
Stars: ✭ 243 (+81.34%)
Mutual labels:  image-captioning
Image-Caption
Using LSTM or Transformer to solve Image Captioning in Pytorch
Stars: ✭ 36 (-73.13%)
Mutual labels:  image-captioning
pix2code-pytorch
PyTorch implementation of pix2code. 🔥
Stars: ✭ 24 (-82.09%)
Mutual labels:  image-captioning
Image-Captioning
Image Captioning with Keras
Stars: ✭ 60 (-55.22%)
Mutual labels:  image-captioning
Udacity
This repo includes all the projects I have finished in the Udacity Nanodegree programs
Stars: ✭ 57 (-57.46%)
Mutual labels:  image-captioning
LaBERT
A length-controllable and non-autoregressive image captioning model.
Stars: ✭ 50 (-62.69%)
Mutual labels:  image-captioning
Adaptive
Pytorch Implementation of Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning
Stars: ✭ 97 (-27.61%)
Mutual labels:  image-captioning
catr
Image Captioning Using Transformer
Stars: ✭ 206 (+53.73%)
Mutual labels:  image-captioning
RSTNet
RSTNet: Captioning with Adaptive Attention on Visual and Non-Visual Words (CVPR 2021)
Stars: ✭ 71 (-47.01%)
Mutual labels:  image-captioning
CS231n
CS231n Assignments Solutions - Spring 2020
Stars: ✭ 48 (-64.18%)
Mutual labels:  image-captioning
gramtion
Twitter bot for generating photo descriptions (alt text)
Stars: ✭ 21 (-84.33%)
Mutual labels:  image-captioning
Machine-Learning
The projects I do in Machine Learning with PyTorch, Keras, TensorFlow, scikit-learn and Python.
Stars: ✭ 54 (-59.7%)
Mutual labels:  image-captioning
localized-narratives
Localized Narratives
Stars: ✭ 60 (-55.22%)
Mutual labels:  image-captioning
Show-Attend-and-Tell
A PyTorch implementation of the paper Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Stars: ✭ 58 (-56.72%)
Mutual labels:  image-captioning

Dual-Level Collaborative Transformer for Image Captioning

This repository contains the reference code for the paper Dual-Level Collaborative Transformer for Image Captioning.

Experiment setup

Please refer to M2 Transformer.

Data preparation

  • Annotation. Download the annotation file annotation.zip, extract it, and put it in the project root directory.
  • Feature. You can download our ResNeXt-101 features (HDF5 file) here. Access code: jcj6.
  • Evaluation. Download the evaluation tools here. Access code: jcj6. Extract them and put them in the project root directory.

There are five kinds of keys in our .hdf5 file (a minimal reading sketch follows this list). They are:

  • ['%d_features' % image_id]: region features (N_regions, feature_dim)
  • ['%d_boxes' % image_id]: bounding box of region features (N_regions, 4)
  • ['%d_size' % image_id]: size of original image (for normalizing bounding box), (2,)
  • ['%d_grids' % image_id]: grid features (N_grids, feature_dim)
  • ['%d_mask' % image_id]: geometric alignment graph, (N_regions, N_grids)
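
For reference, here is a minimal sketch of reading these keys back with h5py. The feature path matches the commands below; the image id and the (width, height) ordering assumed for the '_size' entry are placeholders, not values taken from the repository.

import h5py
import numpy as np

features_path = "./data/coco_all_align.hdf5"  # same path as in the commands below
image_id = 391895                             # hypothetical COCO image id

with h5py.File(features_path, "r") as f:
    regions = f["%d_features" % image_id][()]  # (N_regions, feature_dim)
    boxes = f["%d_boxes" % image_id][()]       # (N_regions, 4)
    size = f["%d_size" % image_id][()]         # (2,) original image size
    grids = f["%d_grids" % image_id][()]       # (N_grids, feature_dim)
    mask = f["%d_mask" % image_id][()]         # (N_regions, N_grids)

# Normalize box coordinates with the stored image size; the (width, height)
# ordering of the '_size' entry is an assumption made for this sketch.
norm_boxes = boxes / np.array([size[0], size[1], size[0], size[1]], dtype=np.float32)

print(regions.shape, grids.shape, mask.shape, norm_boxes[:2])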

We extract features with the code in grid-feats-vqa.

The first three keys can be obtained when extracting region features with extract_region_feature.py. The fourth key can be obtained when extracting grid features with the code in grid-feats-vqa. The last key can be obtained with align.ipynb; a rough sketch of this alignment step follows.
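
As a rough illustration only, the sketch below shows one way a region-to-grid alignment mask could be built by marking grid cells that overlap each region box. The 7x7 grid size, the (x1, y1, x2, y2) box order, and the overlap criterion are assumptions for this sketch, not the exact logic of align.ipynb.

import numpy as np

def build_align_mask(boxes, image_size, n_grid=7):
    # boxes: (N_regions, 4) in pixel coordinates, assumed (x1, y1, x2, y2).
    # image_size: assumed (height, width) of the original image.
    # Returns a (N_regions, n_grid * n_grid) binary alignment mask.
    h, w = image_size
    cell_h, cell_w = h / n_grid, w / n_grid
    mask = np.zeros((len(boxes), n_grid * n_grid), dtype=np.float32)
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        for gy in range(n_grid):
            for gx in range(n_grid):
                cx1, cy1 = gx * cell_w, gy * cell_h      # grid cell bounds
                cx2, cy2 = cx1 + cell_w, cy1 + cell_h
                # Mark the cell if it intersects the region box.
                if x1 < cx2 and cx1 < x2 and y1 < cy2 and cy1 < y2:
                    mask[i, gy * n_grid + gx] = 1.0
    return mask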

Training

python train.py --exp_name dlct --batch_size 50 --head 8 --features_path ./data/coco_all_align.hdf5 --annotation annotation --workers 8 --rl_batch_size 100 --image_field ImageAllFieldWithMask --model DLCT --rl_at 17 --seed 118

Evaluation

python eval.py --annotation annotation --workers 4 --features_path ./data/coco_all_align.hdf5 --model_path path_of_model_to_eval --model DLCT --image_field ImageAllFieldWithMask --grid_embed --box_embed --dump_json gen_res.json --beam_size 5

Important args:

  • --features_path: path to the HDF5 feature file
  • --model_path: path to the model checkpoint to evaluate
  • --dump_json: path to dump the generated captions to (a small snippet for inspecting this file follows the list)
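
The snippet below loads the dumped file and prints a few entries for inspection; the internal structure of gen_res.json is not documented here, so the code only assumes it is valid JSON.

import json

with open("gen_res.json") as f:
    results = json.load(f)

# Print a few entries whether the file is a list or a dict of captions.
items = results[:3] if isinstance(results, list) else list(results.items())[:3]
for item in items:
    print(item)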

A pretrained model is available here. Access code: jcj6. Evaluating the pretrained model should give:

{'BLEU': [0.8136727001615207, 0.6606095421082421, 0.5167535314080227, 0.39790755018790197], 'METEOR': 0.29522868252436046, 'ROUGE': 0.5914367650104326, 'CIDEr': 1.3382047139781112, 'SPICE': 0.22953477359195887}

References

[1] M2: Meshed-Memory Transformer for Image Captioning

[2] grid-feats-vqa: In Defense of Grid Features for Visual Question Answering

[3] BUTD: Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

Acknowledgements

Thanks to the original M2 and the amazing work of grid-feats-vqa.
