
chenxinpeng / im2p

Licence: other
TensorFlow implementation of the paper: A Hierarchical Approach for Generating Descriptive Image Paragraphs

Programming Languages

lua
6591 projects
python
139335 projects - #7 most used programming language
shell
77523 projects

Projects that are alternatives of or similar to im2p

BUTD model
A pytorch implementation of "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering" for image captioning.
Stars: ✭ 28 (-34.88%)
Mutual labels:  image-captioning
Show-Attend-and-Tell
A PyTorch implementation of the paper Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Stars: ✭ 58 (+34.88%)
Mutual labels:  image-captioning
Machine-Learning
The projects I do in Machine Learning with PyTorch, keras, Tensorflow, scikit learn and Python.
Stars: ✭ 54 (+25.58%)
Mutual labels:  image-captioning
Show and Tell
Show and Tell : A Neural Image Caption Generator
Stars: ✭ 74 (+72.09%)
Mutual labels:  image-captioning
MIA
Code for "Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations" (NeurIPS 2019)
Stars: ✭ 57 (+32.56%)
Mutual labels:  image-captioning
Awesome-Captioning
A curated list of multimodal captioning research (including image captioning, video captioning, and text captioning)
Stars: ✭ 56 (+30.23%)
Mutual labels:  image-captioning
udacity-cvnd-projects
My solutions to the projects assigned for the Udacity Computer Vision Nanodegree
Stars: ✭ 36 (-16.28%)
Mutual labels:  image-captioning
stylenet
A PyTorch implementation of "StyleNet: Generating Attractive Visual Captions with Styles"
Stars: ✭ 58 (+34.88%)
Mutual labels:  image-captioning
Adaptive
Pytorch Implementation of Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning
Stars: ✭ 97 (+125.58%)
Mutual labels:  image-captioning
Image-Caption
Using LSTM or Transformer to solve Image Captioning in Pytorch
Stars: ✭ 36 (-16.28%)
Mutual labels:  image-captioning
pix2code-pytorch
PyTorch implementation of pix2code. 🔥
Stars: ✭ 24 (-44.19%)
Mutual labels:  image-captioning
gramtion
Twitter bot for generating photo descriptions (alt text)
Stars: ✭ 21 (-51.16%)
Mutual labels:  image-captioning
RSTNet
RSTNet: Captioning with Adaptive Attention on Visual and Non-Visual Words (CVPR 2021)
Stars: ✭ 71 (+65.12%)
Mutual labels:  image-captioning
LaBERT
A length-controllable and non-autoregressive image captioning model.
Stars: ✭ 50 (+16.28%)
Mutual labels:  image-captioning
image-captioning-DLCT
Official pytorch implementation of paper "Dual-Level Collaborative Transformer for Image Captioning" (AAAI 2021).
Stars: ✭ 134 (+211.63%)
Mutual labels:  image-captioning
Udacity
This repo includes all the projects I have finished in the Udacity Nanodegree programs
Stars: ✭ 57 (+32.56%)
Mutual labels:  image-captioning
Image-Captioning
Image Captioning with Keras
Stars: ✭ 60 (+39.53%)
Mutual labels:  image-captioning
captioning chainer
A fast implementation of Neural Image Caption by Chainer
Stars: ✭ 17 (-60.47%)
Mutual labels:  image-captioning
CS231n
My solutions for Assignments of CS231n: Convolutional Neural Networks for Visual Recognition
Stars: ✭ 30 (-30.23%)
Mutual labels:  image-captioning
localized-narratives
Localized Narratives
Stars: ✭ 60 (+39.53%)
Mutual labels:  image-captioning

im2p

Note

This repository is not being actively maintained due to lack of time and interest. My sincerest apologies to the open source community for allowing this project to stagnate. I hope it was useful for some of you as a jumping-off point.

Introduction

TensorFlow implementation of the paper: A Hierarchical Approach for Generating Descriptive Image Paragraphs

We did not fine-tune the parameters, but the model can still achieve the following scores: metric scores

Step 1

Download the Visual Genome dataset; you will get two image directories: VG_100K and VG_100K_2. Following the paper, also download the training, val, and test split JSON files. These three JSON files list the image names of the train, validation, and test data.

Running the script:

$ python split_dataset.py

This selects the images from the Visual Genome dataset that the authors used in the paper.
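
For reference, a rough sketch of what this split step can look like in Python is shown below; the split file names (train_split.json, val_split.json, test_split.json) and the per-split output folders are assumptions for illustration, not necessarily what split_dataset.py does internally.

# Hypothetical sketch: copy the Visual Genome images listed in each split
# JSON file into a per-split directory. File and folder names are assumptions.
import json
import os
import shutil

IMG_DIRS = ["VG_100K", "VG_100K_2"]   # the two Visual Genome image folders
SPLITS = ["train", "val", "test"]

for split in SPLITS:
    with open("%s_split.json" % split) as f:
        image_ids = json.load(f)       # list of image ids in this split
    os.makedirs(split, exist_ok=True)
    for image_id in image_ids:
        name = "%s.jpg" % image_id
        for img_dir in IMG_DIRS:       # each image lives in one of the two folders
            src = os.path.join(img_dir, name)
            if os.path.exists(src):
                shutil.copy(src, os.path.join(split, name))
                break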

Step 2

Run the scripts:

$ python get_imgs_train_path.py
$ python get_imgs_val_path.py
$ python get_imgs_test_path.py

We will get three txt files: imgs_train_path.txt, imgs_val_path.txt, and imgs_test_path.txt. They contain the paths of the train, val, and test images.
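
As an illustration, a minimal sketch of one of these scripts (the train case) is shown below; it assumes the Step 1 split images live in a local train/ directory and writes one absolute path per line, the format expected by densecap's -input_txt argument.

# Minimal sketch: write one absolute image path per line for the train split.
# The "train" directory name follows the Step 1 sketch and is an assumption.
import glob
import os

with open("imgs_train_path.txt", "w") as f:
    for path in sorted(glob.glob(os.path.join("train", "*.jpg"))):
        f.write(os.path.abspath(path) + "\n")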

After this, we use DenseCap to extract region features. Set up the running environment by following the densecap instructions step by step.

Run the script:

$ ./download_pretrained_model.sh
$ th extract_features.lua -boxes_per_image 50 -max_images -1 -input_txt imgs_train_path.txt \
                          -output_h5 ./data/im2p_train_output.h5 -gpu 0 -use_cudnn 1

The download_pretrained_model.sh script fetches the pre-trained model densecap-pretrained-vgg16.t7. Then, following the paper, we extract 50 boxes from each image.

Also, don't forget to extract the features of the val and test images:

$ th extract_features.lua -boxes_per_image 50 -max_images -1 -input_txt imgs_val_path.txt \
                          -output_h5 ./data/im2p_val_output.h5 -gpu 0 -use_cudnn 1
                          
$ th extract_features.lua -boxes_per_image 50 -max_images -1 -input_txt imgs_test_path.txt \
                          -output_h5 ./data/im2p_test_output.h5 -gpu 0 -use_cudnn 1

Step 3

Run the script:

$ python parse_json.py

In this step, we process the paragraphs_v1.json file for training and testing. This produces the img2paragraph file in the ./data directory. Its structure looks like this: img2paragraph
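
For context, a hedged sketch of this preprocessing is shown below: it groups the annotations in paragraphs_v1.json by image, splits each paragraph into sentences, and pickles the result to ./data/img2paragraph. The exact structure parse_json.py stores may differ; this only illustrates the idea.

# Hypothetical sketch of Step 3: map image_id -> (number of sentences, sentences).
import json
import pickle

with open("paragraphs_v1.json") as f:
    annotations = json.load(f)   # list of records with "image_id" and "paragraph"

img2paragraph = {}
for ann in annotations:
    # naive split on periods; the hierarchical RNN consumes one sentence at a time
    sentences = [s.strip() for s in ann["paragraph"].split(".") if s.strip()]
    img2paragraph[ann["image_id"]] = (len(sentences), sentences)

with open("./data/img2paragraph", "wb") as f:
    pickle.dump(img2paragraph, f)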

Step 4

Finally, we can train and test the model. In the terminal:

$ CUDA_VISIBLE_DEVICES=0 ipython
>>> import HRNN_paragraph_batch
>>> HRNN_paragraph_batch.train()

After training, we can test the model:

>>> HRNN_paragraph_batch.test()
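
If you prefer a plain script over the interactive session, the same calls can be wrapped as below (a minimal sketch, assuming HRNN_paragraph_batch.py is importable from the working directory):

# hypothetical run_im2p.py -- same train/test calls as the ipython session above
import HRNN_paragraph_batch

HRNN_paragraph_batch.train()   # train the hierarchical RNN
HRNN_paragraph_batch.test()    # then evaluate on the test split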

Results

demo
