
ezeli / BUTD_model

License: MIT
A PyTorch implementation of "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering" for image captioning.

Programming Languages

python

Projects that are alternatives to or similar to BUTD_model

A Pytorch Tutorial To Image Captioning
Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning
Stars: ✭ 1,867 (+6567.86%)
Mutual labels:  image-captioning
Image To Image Search
A reverse image search engine powered by elastic search and tensorflow
Stars: ✭ 200 (+614.29%)
Mutual labels:  image-captioning
CS231n
CS231n Assignments Solutions - Spring 2020
Stars: ✭ 48 (+71.43%)
Mutual labels:  image-captioning
Image Caption Generator
[DEPRECATED] A Neural Network based generative model for captioning images using Tensorflow
Stars: ✭ 141 (+403.57%)
Mutual labels:  image-captioning
Up Down Captioner
Automatic image captioning model based on Caffe, using features from bottom-up attention.
Stars: ✭ 195 (+596.43%)
Mutual labels:  image-captioning
Meshed Memory Transformer
Meshed-Memory Transformer for Image Captioning. CVPR 2020
Stars: ✭ 230 (+721.43%)
Mutual labels:  image-captioning
Gis
gis (go image server): an image service implemented in Go, supporting basic upload, download, storage, and proportional cropping
Stars: ✭ 108 (+285.71%)
Mutual labels:  image-captioning
udacity-cvnd-projects
My solutions to the projects assigned for the Udacity Computer Vision Nanodegree
Stars: ✭ 36 (+28.57%)
Mutual labels:  image-captioning
Sca Cnn.cvpr17
Image Captions Generation with Spatial and Channel-wise Attention
Stars: ✭ 198 (+607.14%)
Mutual labels:  image-captioning
Show Control And Tell
Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions. CVPR 2019
Stars: ✭ 243 (+767.86%)
Mutual labels:  image-captioning
Show Adapt And Tell
Code for "Show, Adapt and Tell: Adversarial Training of Cross-domain Image Captioner" in ICCV 2017
Stars: ✭ 146 (+421.43%)
Mutual labels:  image-captioning
Fairseq Image Captioning
Transformer-based image captioning extension for pytorch/fairseq
Stars: ✭ 180 (+542.86%)
Mutual labels:  image-captioning
Caption generator
A modular library built on top of Keras and TensorFlow to generate a caption in natural language for any input image.
Stars: ✭ 243 (+767.86%)
Mutual labels:  image-captioning
Image Caption Generator
A neural network to generate captions for an image using CNN and RNN with BEAM Search.
Stars: ✭ 126 (+350%)
Mutual labels:  image-captioning
Image-Captioning-with-Beam-Search
Generating image captions using Xception Network and Beam Search in Keras
Stars: ✭ 18 (-35.71%)
Mutual labels:  image-captioning
Sightseq
Computer vision tools for fairseq, containing PyTorch implementation of text recognition and object detection
Stars: ✭ 116 (+314.29%)
Mutual labels:  image-captioning
Dataturks
ML data annotations made super easy for teams. Just upload data, add your team and build training/evaluation dataset in hours.
Stars: ✭ 200 (+614.29%)
Mutual labels:  image-captioning
Udacity
This repo includes all the projects I have finished in the Udacity Nanodegree programs
Stars: ✭ 57 (+103.57%)
Mutual labels:  image-captioning
catr
Image Captioning Using Transformer
Stars: ✭ 206 (+635.71%)
Mutual labels:  image-captioning
Aoanet
Code for paper "Attention on Attention for Image Captioning". ICCV 2019
Stars: ✭ 242 (+764.29%)
Mutual labels:  image-captioning

BUTD_model

Environment

  • Python 3.7
  • PyTorch 1.3.1

Method

1. Architecture

(Figure: model architecture)

2. Main Process

  • Top-Down Attention LSTM Input
    x_t^1 = [h_{t-1}^2, \bar{v}, W_e \Pi_t], where \bar{v} = (1/k) \sum_i v_i is the mean-pooled image feature and W_e \Pi_t is the embedding of the previous word
  • Attend (a PyTorch sketch of this step follows the list)
    a_{i,t} = w_a^T \tanh(W_{va} v_i + W_{ha} h_t^1),   \alpha_t = \mathrm{softmax}(a_t),   \hat{v}_t = \sum_i \alpha_{i,t} v_i
  • Language LSTM Input
    x_t^2 = [\hat{v}_t, h_t^1]

Usage

1. Preprocessing

Extract image features with ResNet-101 (denoted grid-based features) and process the COCO caption data (from the Karpathy splits) through preprocess.py. The parameters need to be adjusted; resnet101_file comes from here. Image features can also be obtained from here, or extracted with the ezeli/bottom_up_features_extract repository (a fixed 36 features per image, denoted region-based features).

This project is not limited to the MSCOCO dataset, but you need to process your own data into the format expected by preprocess.py.
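For orientation, here is a rough sketch of how grid-based ResNet-101 features can be extracted with torchvision. It is an illustration only, not the repository's preprocess.py; the input size, the resulting 14x14 grid, and the file name are assumptions.

```python
# Illustrative sketch of grid-based feature extraction with ResNet-101.
# Not the repository's preprocess.py; file names and sizes are assumptions.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

resnet = models.resnet101(pretrained=True)
# Drop the average-pool and fc layers to keep the final spatial feature map.
backbone = torch.nn.Sequential(*list(resnet.children())[:-2]).eval()

preprocess = T.Compose([
    T.Resize((448, 448)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = preprocess(Image.open('image.jpg').convert('RGB')).unsqueeze(0)  # (1, 3, 448, 448)
with torch.no_grad():
    fm = backbone(img)                      # (1, 2048, 14, 14) feature map
feats = fm.flatten(2).permute(0, 2, 1)      # (1, 196, 2048) grid-based features
```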

2. Training

  • First adjust the parameters in opt.py (example settings for the two stages are sketched after this list):
    • train_mode: 'xe' for pre-training, 'rl' for fine-tuning (+SCST).
    • learning_rate: 4e-4 for xe, 4e-5 for rl.
    • resume: checkpoint to resume training from; required for rl.
    • Other parameters can be modified as needed.
  • Run:
    • python train.py
    • Checkpoints are saved in the checkpoint dir and test results in the result dir.
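A hypothetical illustration of the two-stage settings (train_mode, learning_rate, and resume are the parameter names listed above; how opt.py actually defines them and the checkpoint path shown are assumptions):

```python
# Hypothetical opt.py settings for the two stages; how opt.py actually exposes
# them (plain variables vs. argparse defaults) and the checkpoint path are assumptions.

# Stage 1: cross-entropy (XE) pre-training
train_mode = 'xe'
learning_rate = 4e-4
resume = ''                                  # train from scratch

# Stage 2: SCST fine-tuning, resuming from an XE checkpoint
# train_mode = 'rl'
# learning_rate = 4e-5
# resume = 'checkpoint/xe_best.pth'          # placeholder path
```

In both stages, training is then started with python train.py.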

3. Test

  • python test.py -t model.pth -i image.jpg
  • Only applicable to models trained on grid-based features.
  • For region-based features, first extract the image features with the ezeli/bottom_up_features_extract repository, then slightly modify test.py to use them.

Result

Evaluation metrics

Evaluation tool: ezeli/caption_eval

XE denotes cross-entropy training, and +SCST means the model is fine-tuned with reinforcement learning (self-critical sequence training, using a CIDEr reward).

features      training  Bleu-1  Bleu-2  Bleu-3  Bleu-4  METEOR  ROUGE_L  CIDEr  SPICE
grid-based    XE          75.4    59.1    45.5    34.8    26.9     55.6  109.3   20.2
grid-based    +SCST       78.7    62.5    47.6    35.7    27.2     56.7  119.1   20.7
region-based  XE          76.0    59.9    46.4    35.8    27.3     56.2  110.9   20.3
region-based  +SCST       79.5    63.6    48.8    36.9    27.8     57.6  123.1   21.4
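
For background, here is a generic sketch of the self-critical (SCST) policy-gradient loss that the +SCST rows refer to. It is not this repository's train.py; the function and argument names are placeholders.

```python
# Generic sketch of the self-critical (SCST) policy-gradient loss: the CIDEr
# score of a greedily decoded baseline caption is subtracted from the score of
# a sampled caption, and the advantage weights the sampled words' log-probs.
# Function and argument names are placeholders, not this repository's code.
import torch

def scst_loss(sample_logprobs, sample_reward, greedy_reward, mask):
    # sample_logprobs: (B, T) log-probabilities of the sampled words
    # sample_reward, greedy_reward: (B,) CIDEr scores of sampled / greedy captions
    # mask: (B, T) 1.0 for real words, 0.0 for padding
    advantage = (sample_reward - greedy_reward).unsqueeze(1)          # (B, 1)
    loss = -(advantage * sample_logprobs * mask).sum() / mask.sum()
    return loss
```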

Examples

(Image: COCO_val2014_000000386164)
Generated caption: "a bunch of wooden knives on a wooden table."