
ezeli / BUTD_model

License: MIT
A PyTorch implementation of "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering" for image captioning.

Programming Languages

python

Projects that are alternatives to or similar to BUTD_model

A Pytorch Tutorial To Image Captioning
Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning
Stars: ✭ 1,867 (+6567.86%)
Mutual labels:  image-captioning
Image To Image Search
A reverse image search engine powered by elastic search and tensorflow
Stars: ✭ 200 (+614.29%)
Mutual labels:  image-captioning
CS231n
CS231n Assignments Solutions - Spring 2020
Stars: ✭ 48 (+71.43%)
Mutual labels:  image-captioning
Image Caption Generator
[DEPRECATED] A Neural Network based generative model for captioning images using Tensorflow
Stars: ✭ 141 (+403.57%)
Mutual labels:  image-captioning
Up Down Captioner
Automatic image captioning model based on Caffe, using features from bottom-up attention.
Stars: ✭ 195 (+596.43%)
Mutual labels:  image-captioning
Meshed Memory Transformer
Meshed-Memory Transformer for Image Captioning. CVPR 2020
Stars: ✭ 230 (+721.43%)
Mutual labels:  image-captioning
Gis
gis (go image server): an image service implemented in Go, supporting basic upload, download, storage, and proportional cropping
Stars: ✭ 108 (+285.71%)
Mutual labels:  image-captioning
udacity-cvnd-projects
My solutions to the projects assigned for the Udacity Computer Vision Nanodegree
Stars: ✭ 36 (+28.57%)
Mutual labels:  image-captioning
Sca Cnn.cvpr17
Image Captions Generation with Spatial and Channel-wise Attention
Stars: ✭ 198 (+607.14%)
Mutual labels:  image-captioning
Show Control And Tell
Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions. CVPR 2019
Stars: ✭ 243 (+767.86%)
Mutual labels:  image-captioning
Show Adapt And Tell
Code for "Show, Adapt and Tell: Adversarial Training of Cross-domain Image Captioner" in ICCV 2017
Stars: ✭ 146 (+421.43%)
Mutual labels:  image-captioning
Fairseq Image Captioning
Transformer-based image captioning extension for pytorch/fairseq
Stars: ✭ 180 (+542.86%)
Mutual labels:  image-captioning
Caption generator
A modular library built on top of Keras and TensorFlow to generate a caption in natural language for any input image.
Stars: ✭ 243 (+767.86%)
Mutual labels:  image-captioning
Image Caption Generator
A neural network to generate captions for an image using CNN and RNN with BEAM Search.
Stars: ✭ 126 (+350%)
Mutual labels:  image-captioning
Image-Captioning-with-Beam-Search
Generating image captions using Xception Network and Beam Search in Keras
Stars: ✭ 18 (-35.71%)
Mutual labels:  image-captioning
Sightseq
Computer vision tools for fairseq, containing PyTorch implementation of text recognition and object detection
Stars: ✭ 116 (+314.29%)
Mutual labels:  image-captioning
Dataturks
ML data annotations made super easy for teams. Just upload data, add your team and build training/evaluation dataset in hours.
Stars: ✭ 200 (+614.29%)
Mutual labels:  image-captioning
Udacity
This repo includes all the projects I have finished in the Udacity Nanodegree programs
Stars: ✭ 57 (+103.57%)
Mutual labels:  image-captioning
catr
Image Captioning Using Transformer
Stars: ✭ 206 (+635.71%)
Mutual labels:  image-captioning
Aoanet
Code for paper "Attention on Attention for Image Captioning". ICCV 2019
Stars: ✭ 242 (+764.29%)
Mutual labels:  image-captioning

BUTD_model

Environment

  • Python 3.7
  • PyTorch 1.3.1

Method

1. Architecture

(Figure: model architecture)

2. Main Process

  • Top-Down Attention LSTM Input
    x_t^1 = [h_{t-1}^2, \bar{v}, W_e \Pi_t], where \bar{v} = (1/k) \sum_i v_i is the mean-pooled image feature and W_e \Pi_t is the embedding of the previous word
  • Attend (a PyTorch sketch of this step follows the list)
    a_{i,t} = w_a^T \tanh(W_{va} v_i + W_{ha} h_t^1),   \alpha_t = \mathrm{softmax}(a_t),   \hat{v}_t = \sum_i \alpha_{i,t} v_i
  • Language LSTM Input
    x_t^2 = [\hat{v}_t, h_t^1]

Usage

1. Preprocessing

Extract image features with ResNet-101 (denoted grid-based features) and process the COCO caption data (from the Karpathy splits) through preprocess.py. The parameters need to be adjusted; resnet101_file comes from here. Image features can also be obtained from here, or extracted with the ezeli/bottom_up_features_extract repository (a fixed 36 features per image, denoted region-based features).

This project is not limited to the MSCOCO dataset, but you need to process your own data into the format expected by preprocess.py.
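For orientation, here is a rough sketch of how grid-based ResNet-101 features can be extracted with torchvision. It is an illustration only, not the repository's preprocess.py; the input size, the resulting 14x14 grid, and the file name are assumptions.

```python
# Illustrative sketch of grid-based feature extraction with ResNet-101.
# Not the repository's preprocess.py; file names and sizes are assumptions.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

resnet = models.resnet101(pretrained=True)
# Drop the average-pool and fc layers to keep the final spatial feature map.
backbone = torch.nn.Sequential(*list(resnet.children())[:-2]).eval()

preprocess = T.Compose([
    T.Resize((448, 448)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = preprocess(Image.open('image.jpg').convert('RGB')).unsqueeze(0)  # (1, 3, 448, 448)
with torch.no_grad():
    fm = backbone(img)                      # (1, 2048, 14, 14) feature map
feats = fm.flatten(2).permute(0, 2, 1)      # (1, 196, 2048) grid-based features
```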

2. Training

  • First adjust the parameters in opt.py (example settings for the two stages are sketched after this list):
    • train_mode: 'xe' for pre-training, 'rl' for fine-tuning (+SCST).
    • learning_rate: 4e-4 for xe, 4e-5 for rl.
    • resume: checkpoint to resume training from; required for rl.
    • Other parameters can be modified as needed.
  • Run:
    • python train.py
    • Checkpoints are saved in the checkpoint dir and test results in the result dir.
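A hypothetical illustration of the two-stage settings (train_mode, learning_rate, and resume are the parameter names listed above; how opt.py actually defines them and the checkpoint path shown are assumptions):

```python
# Hypothetical opt.py settings for the two stages; how opt.py actually exposes
# them (plain variables vs. argparse defaults) and the checkpoint path are assumptions.

# Stage 1: cross-entropy (XE) pre-training
train_mode = 'xe'
learning_rate = 4e-4
resume = ''                                  # train from scratch

# Stage 2: SCST fine-tuning, resuming from an XE checkpoint
# train_mode = 'rl'
# learning_rate = 4e-5
# resume = 'checkpoint/xe_best.pth'          # placeholder path
```

In both stages, training is then started with python train.py.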

3. Test

  • python test.py -t model.pth -i image.jpg
  • Only applicable to models trained on grid-based features.
  • For region-based features, first extract the image features with the ezeli/bottom_up_features_extract repository, then slightly modify test.py to use them.

Result

Evaluation metrics

Evaluation tool: ezeli/caption_eval

XE denotes cross-entropy training, and +SCST means the model is fine-tuned with reinforcement learning (self-critical sequence training, using a CIDEr reward).

features      training  Bleu-1  Bleu-2  Bleu-3  Bleu-4  METEOR  ROUGE_L  CIDEr  SPICE
grid-based    XE          75.4    59.1    45.5    34.8    26.9     55.6  109.3   20.2
grid-based    +SCST       78.7    62.5    47.6    35.7    27.2     56.7  119.1   20.7
region-based  XE          76.0    59.9    46.4    35.8    27.3     56.2  110.9   20.3
region-based  +SCST       79.5    63.6    48.8    36.9    27.8     57.6  123.1   21.4
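
For background, here is a generic sketch of the self-critical (SCST) policy-gradient loss that the +SCST rows refer to. It is not this repository's train.py; the function and argument names are placeholders.

```python
# Generic sketch of the self-critical (SCST) policy-gradient loss: the CIDEr
# score of a greedily decoded baseline caption is subtracted from the score of
# a sampled caption, and the advantage weights the sampled words' log-probs.
# Function and argument names are placeholders, not this repository's code.
import torch

def scst_loss(sample_logprobs, sample_reward, greedy_reward, mask):
    # sample_logprobs: (B, T) log-probabilities of the sampled words
    # sample_reward, greedy_reward: (B,) CIDEr scores of sampled / greedy captions
    # mask: (B, T) 1.0 for real words, 0.0 for padding
    advantage = (sample_reward - greedy_reward).unsqueeze(1)          # (B, 1)
    loss = -(advantage * sample_logprobs * mask).sum() / mask.sum()
    return loss
```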

Examples

(Image: COCO_val2014_000000386164)
Generated caption: "a bunch of wooden knives on a wooden table."