
fenglinliu98 / MIA

Licence: other
Code for "Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations" (NeurIPS 2019)

Programming Languages

python
shell

Projects that are alternatives of or similar to MIA

udacity-cvnd-projects
My solutions to the projects assigned for the Udacity Computer Vision Nanodegree
Stars: ✭ 36 (-36.84%)
Mutual labels:  image-captioning
CBP
Official Tensorflow Implementation of the AAAI-2020 paper "Temporally Grounding Language Queries in Videos by Contextual Boundary-aware Prediction"
Stars: ✭ 52 (-8.77%)
Mutual labels:  vision-and-language
pix2code-pytorch
PyTorch implementation of pix2code. 🔥
Stars: ✭ 24 (-57.89%)
Mutual labels:  image-captioning
X-VLM
X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)
Stars: ✭ 283 (+396.49%)
Mutual labels:  vision-and-language
BUTD model
A pytorch implementation of "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering" for image captioning.
Stars: ✭ 28 (-50.88%)
Mutual labels:  image-captioning
iMIX
A framework for Multimodal Intelligence research from Inspur HSSLAB.
Stars: ✭ 21 (-63.16%)
Mutual labels:  vision-and-language
Image-Captioning-with-Beam-Search
Generating image captions using Xception Network and Beam Search in Keras
Stars: ✭ 18 (-68.42%)
Mutual labels:  image-captioning
synse-zsl
Official PyTorch code for the ICIP 2021 paper 'Syntactically Guided Generative Embeddings For Zero Shot Skeleton Action Recognition'
Stars: ✭ 14 (-75.44%)
Mutual labels:  vision-and-language
wikiHow paper list
A paper list of research conducted based on wikiHow
Stars: ✭ 25 (-56.14%)
Mutual labels:  vision-and-language
Show and Tell
Show and Tell : A Neural Image Caption Generator
Stars: ✭ 74 (+29.82%)
Mutual labels:  image-captioning
calvin
CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
Stars: ✭ 105 (+84.21%)
Mutual labels:  vision-and-language
TRAR-VQA
[ICCV 2021] TRAR: Routing the Attention Spans in Transformers for Visual Question Answering -- Official Implementation
Stars: ✭ 49 (-14.04%)
Mutual labels:  vision-and-language
LaBERT
A length-controllable and non-autoregressive image captioning model.
Stars: ✭ 50 (-12.28%)
Mutual labels:  image-captioning
pytorch violet
A PyTorch implementation of VIOLET
Stars: ✭ 119 (+108.77%)
Mutual labels:  vision-and-language
clip playground
An ever-growing playground of notebooks showcasing CLIP's impressive zero-shot capabilities
Stars: ✭ 80 (+40.35%)
Mutual labels:  vision-and-language
catr
Image Captioning Using Transformer
Stars: ✭ 206 (+261.4%)
Mutual labels:  image-captioning
VidSitu
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
Stars: ✭ 41 (-28.07%)
Mutual labels:  vision-and-language
gramtion
Twitter bot for generating photo descriptions (alt text)
Stars: ✭ 21 (-63.16%)
Mutual labels:  image-captioning
Image-Captioining
The objective is to process by generating textual description from an image – based on the objects and actions in the image. Using generative models so that it creates novel sentences. Pipeline type models uses two separate learning process, one for language modelling and other for image recognition. It first identifies objects in image and prov…
Stars: ✭ 20 (-64.91%)
Mutual labels:  image-captioning
stanford-cs231n-assignments-2020
This repository contains my solutions to the assignments for Stanford's CS231n "Convolutional Neural Networks for Visual Recognition" (Spring 2020).
Stars: ✭ 84 (+47.37%)
Mutual labels:  vision-and-language

MIA (NeurIPS 2019)

Implementation of "Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations" by Fenglin Liu, Yuanxin Liu, Xuancheng Ren, Xiaodong He, and Xu Sun. The paper can be found at [arxiv], [pdf].


  • Semantic-Grounded Image Representations (based on the Bottom-up features): Coming Soon!
  • Textual Concepts (Google Drive)
  • Pre-trained Models (Google Drive): Coming Soon!

Usage

Requirements

This code is written in Python 2.7 and requires PyTorch >= 0.4.1.

You can refer to https://github.com/s-gupta/visual-concepts to see how to extract the textual concepts of an image yourself.

Dataset Preparation

Download MSCOCO images and preprocess them

  • Download

Download the MSCOCO images from the official website. You need the 2014 training images and the 2014 validation images; put train2014/ and val2014/ in the ./data/images/ directory.

Note: We also provide a bash script to download the MSCOCO images:

cd data/images/original && bash download_mscoco_images.sh
  • Preprocess

Now run resize_images.py to resize all the images (in both the train and val folders) to 256 x 256. You can specify different input and output locations inside resize_images.py:

python resize_images.py
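In case the script needs adapting, here is a minimal sketch of what such a resizing step typically looks like. It is an assumption of what resize_images.py does, and the folder layout (raw images under data/images/original/, resized output under data/images/) is hypothetical, so adjust the paths to match your setup.

# Minimal resizing sketch (assumed behaviour of resize_images.py, hypothetical paths).
import os
from PIL import Image

def resize_folder(input_dir, output_dir, size=(256, 256)):
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)
    for name in os.listdir(input_dir):
        try:
            img = Image.open(os.path.join(input_dir, name)).convert('RGB')
            img.resize(size, Image.BICUBIC).save(os.path.join(output_dir, name))
        except IOError:
            print('Skipping non-image file: %s' % name)

# Hypothetical layout: raw images live under data/images/original/.
for split in ('train2014', 'val2014'):
    resize_folder('data/images/original/%s' % split, 'data/images/%s' % split)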

Download MSCOCO captions and preprocess them

  • Download

You can download the MSCOCO captions from the official website or use the bash script we provide:

cd data && bash download_mscoco_captions.sh
  • Preprocess

Afterwards, create the Karpathy split for training, validation, and testing:

python KarpathySplit.py
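For reference, the Karpathy split assigns each MSCOCO image to train, restval, val, or test. The sketch below is not the repo's KarpathySplit.py; it only shows how such a split file is usually read (the path to Karpathy's dataset_coco.json is hypothetical), with restval folded into the training set.

# Rough sketch of reading Karpathy's split file (path is hypothetical).
import json
from collections import defaultdict

with open('data/dataset_coco.json') as f:
    karpathy = json.load(f)

splits = defaultdict(list)
for img in karpathy['images']:
    split = 'train' if img['split'] == 'restval' else img['split']
    splits[split].append(img['filename'])

for name, files in sorted(splits.items()):
    print('%s: %d images' % (name, len(files)))   # roughly 113k train / 5k val / 5k test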

Then build the vocabulary by running the following (note: nltk_data is required to build the vocabulary):

unzip nltk_data.zip && python build_vocab.py
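As a rough illustration only (the file name and frequency threshold below are assumptions, not the repo's build_vocab.py), a caption vocabulary is typically built like this, which is also why the nltk tokenizer data shipped in nltk_data.zip is needed:

# Illustrative vocabulary builder (not the repo's build_vocab.py).
import json
import nltk
from collections import Counter

with open('data/annotations/karpathy_train.json') as f:   # hypothetical file name
    annotations = json.load(f)['annotations']

counter = Counter()
for ann in annotations:
    # word_tokenize relies on the punkt models from nltk_data.zip
    counter.update(nltk.tokenize.word_tokenize(ann['caption'].lower()))

threshold = 4                                              # drop rare words (assumed value)
word2idx = {'<pad>': 0, '<start>': 1, '<end>': 2, '<unk>': 3}
for word, count in counter.items():
    if count >= threshold:
        word2idx[word] = len(word2idx)

print('Vocabulary size: %d' % len(word2idx))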

Download image concepts

Download the textual concepts (Google Drive) and put the file in the ./data/ directory:

mv image_concepts.json ./data
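To make sure the file landed in the right place, a quick sanity check like the one below can help. The exact JSON layout (e.g., a mapping from image file name to a list of concept words) is an assumption here, so inspect one entry before training.

# Quick sanity check on the downloaded concepts file (layout is assumed).
import json

with open('data/image_concepts.json') as f:
    concepts = json.load(f)

print(type(concepts), len(concepts))
if isinstance(concepts, dict):
    key = next(iter(concepts))
    print(key, concepts[key])
else:
    print(concepts[0])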

Start Training

Now you can train the baseline models and the baseline w/ MIA models with the commands below.
(Note: The pre-trained models will also be released on [Google Drive] (Coming Soon!))
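As a toy illustration of what the --use_MIA and --iteration_times options refer to (this is not the authors' module; see the paper for the actual architecture), mutual iterative attention lets region features and concept embeddings repeatedly attend to each other:

# Toy sketch of mutual iterative attention (not the authors' implementation).
import torch
import torch.nn.functional as F

def cross_attend(query, context):
    # query: (batch, n_q, d), context: (batch, n_c, d)
    scores = torch.bmm(query, context.transpose(1, 2)) / context.size(-1) ** 0.5
    return torch.bmm(F.softmax(scores, dim=-1), context)

def mutual_iterative_attention(regions, concepts, iteration_times=2):
    # Each iteration refines one modality with the other, mirroring --iteration_times.
    for _ in range(iteration_times):
        regions, concepts = cross_attend(regions, concepts), cross_attend(concepts, regions)
    return regions, concepts

regions = torch.randn(1, 36, 512)    # e.g., 36 bottom-up region features
concepts = torch.randn(1, 10, 512)   # e.g., 10 textual concept embeddings
regions, concepts = mutual_iterative_attention(regions, concepts)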

Visual Attention

  • Baseline
CUDA_VISIBLE_DEVICES=0,1 python Train.py --basic_model=VisualAttention 
  • Baseline w/ MIA
CUDA_VISIBLE_DEVICES=0,1 python Train.py --basic_model=VisualAttention --use_MIA=True --iteration_times=2

Concept Attention

  • Baseline
CUDA_VISIBLE_DEVICES=0,1 python Train.py --basic_model=ConceptAttention
  • Baseline w/ MIA
CUDA_VISIBLE_DEVICES=0,1 python Train.py --basic_model=ConceptAttention --use_MIA=True --iteration_times=2

Visual Condition

  • Baseline
CUDA_VISIBLE_DEVICES=0,1 python Train.py --basic_model=VisualCondition
  • Baseline w/ MIA
CUDA_VISIBLE_DEVICES=0,1 python Train.py --basic_model=VisualCondition --use_MIA=True --iteration_times=2

Concept Condition

  • Baseline
CUDA_VISIBLE_DEVICES=0,1 python Train.py --basic_model=ConceptCondition
  • Baseline w/ MIA
CUDA_VISIBLE_DEVICES=0,1 python Train.py --basic_model=ConceptCondition --use_MIA=True --iteration_times=2

Visual Regional Attention (Coming Soon!)

  • Baseline
CUDA_VISIBLE_DEVICES=0,1 python Train.py --basic_model=VisualRegionalAttention
  • Baseline w/ MIA
CUDA_VISIBLE_DEVICES=0,1 python Train.py --basic_model=VisualRegionalAttention --use_MIA=True --iteration_times=2

Testing

You can test the trained models with Test.py, but don't forget to download the coco-caption code from link1 or link2 into the coco directory.

  • Baseline
CUDA_VISIBLE_DEVICES=0 python Test.py  --basic_model=basic_model_name

Note: basic_model_name = (VisualAttention, ConceptAttention, VisualCondition, ConceptCondition, VisualRegionalAttention)

  • Baseline w/ MIA
CUDA_VISIBLE_DEVICES=0 python Test.py  --basic_model=basic_model_name --use_MIA=True --iteration_times=2
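If you want to score a caption file yourself once coco-caption is in place, the standard evaluation API can be used roughly as follows; the annotation and result paths here are hypothetical.

# Minimal scoring sketch using the coco-caption API (paths are hypothetical).
from pycocotools.coco import COCO
from pycocoevalcap.eval import COCOEvalCap

coco = COCO('data/annotations/captions_val2014.json')              # ground-truth captions
coco_res = coco.loadRes('results/captions_val2014_results.json')   # generated captions

coco_eval = COCOEvalCap(coco, coco_res)
coco_eval.params['image_id'] = coco_res.getImgIds()                # score only captioned images
coco_eval.evaluate()

for metric, score in coco_eval.eval.items():
    print('%s: %.3f' % (metric, score))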

Reference

If you use this code or our extracted image concepts as part of any published research, please acknowledge the following paper:

@inproceedings{Liu2019MIA,
  author    = {Fenglin Liu and
               Yuanxin Liu and
               Xuancheng Ren and
               Xiaodong He and
               Xu Sun},
  title     = {Aligning Visual Regions and Textual Concepts for Semantic-Grounded
               Image Representations},
  booktitle = {NeurIPS},
  pages     = {6847--6857},
  year      = {2019}
}

Acknowledgements

Thanks to the PyTorch team for providing PyTorch, the COCO team for providing the dataset, Tsung-Yi Lin for providing the evaluation code for MS COCO caption generation, Yufeng Ma for providing open-source repositories, and the Torchvision ResNet implementation.

Note

If you have any questions about the code or our paper, please send an email to [email protected]
