thaolmk54 / hcrn-videoqa

License: Apache-2.0
Implementation for the paper "Hierarchical Conditional Relation Networks for Video Question Answering" (Le et al., CVPR 2020, Oral)

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives to or similar to hcrn-videoqa

just-ask
[TPAMI Special Issue on ICCV 2021 Best Papers, Oral] Just Ask: Learning to Answer Questions from Millions of Narrated Videos
Stars: ✭ 57 (-48.65%)
Mutual labels:  vqa, videoqa
VideoNavQA
An alternative EQA paradigm and informative benchmark + models (BMVC 2019, ViGIL 2019 spotlight)
Stars: ✭ 22 (-80.18%)
Mutual labels:  vqa, question-answering
iPerceive
Applying Common-Sense Reasoning to Multi-Modal Dense Video Captioning and Video Question Answering | Python3 | PyTorch | CNNs | Causality | Reasoning | LSTMs | Transformers | Multi-Head Self Attention | Published in IEEE Winter Conference on Applications of Computer Vision (WACV) 2021
Stars: ✭ 52 (-53.15%)
Mutual labels:  question-answering, videoqa
DVQA dataset
DVQA Dataset: A Bar chart question answering dataset presented at CVPR 2018
Stars: ✭ 20 (-81.98%)
Mutual labels:  vqa, question-answering
MICCAI21 MMQ
Multiple Meta-model Quantifying for Medical Visual Question Answering
Stars: ✭ 16 (-85.59%)
Mutual labels:  vqa, question-answering
Mullowbivqa
Hadamard Product for Low-rank Bilinear Pooling
Stars: ✭ 57 (-48.65%)
Mutual labels:  vqa, question-answering
Mac Network
Implementation for the paper "Compositional Attention Networks for Machine Reasoning" (Hudson and Manning, ICLR 2018)
Stars: ✭ 444 (+300%)
Mutual labels:  vqa, question-answering
Vqa Tensorflow
TensorFlow implementation of Deeper LSTM + normalized CNN for Visual Question Answering
Stars: ✭ 98 (-11.71%)
Mutual labels:  vqa, question-answering
Pytorch Vqa
Strong baseline for visual question answering
Stars: ✭ 158 (+42.34%)
Mutual labels:  vqa
FinBERT-QA
Financial Domain Question Answering with pre-trained BERT Language Model
Stars: ✭ 70 (-36.94%)
Mutual labels:  question-answering
Vqa regat
Research Code for ICCV 2019 paper "Relation-aware Graph Attention Network for Visual Question Answering"
Stars: ✭ 129 (+16.22%)
Mutual labels:  vqa
Clipbert
[CVPR 2021 Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning for image-text and video-text tasks.
Stars: ✭ 168 (+51.35%)
Mutual labels:  vqa
ZS-F-VQA
Code and Data for paper: Zero-shot Visual Question Answering using Knowledge Graph [ ISWC 2021 ]
Stars: ✭ 51 (-54.05%)
Mutual labels:  vqa
Vqa Mfb
Stars: ✭ 153 (+37.84%)
Mutual labels:  vqa
nlp qa project
Natural Language Processing Question Answering Final Project
Stars: ✭ 61 (-45.05%)
Mutual labels:  question-answering
Papers
Notes on some CV papers I have read: image captioning, weakly supervised segmentation, etc.
Stars: ✭ 99 (-10.81%)
Mutual labels:  vqa
examinee
Laravel quiz and exam system, a clone of Udemy
Stars: ✭ 151 (+36.04%)
Mutual labels:  question-answering
CPPNotes
[C++ interviews + C++ study guide] A collection covering most of the core knowledge a C++ programmer needs to master.
Stars: ✭ 557 (+401.8%)
Mutual labels:  question-answering
DrFAQ
DrFAQ is a plug-and-play question answering NLP chatbot that can be generally applied to any organisation's text corpora.
Stars: ✭ 29 (-73.87%)
Mutual labels:  question-answering
cmrc2017
The First Evaluation Workshop on Chinese Machine Reading Comprehension (CMRC 2017)
Stars: ✭ 90 (-18.92%)
Mutual labels:  question-answering

Hierarchical Conditional Relation Networks for Video Question Answering (HCRN-VideoQA)

We introduce a general-purpose reusable neural unit called the Conditional Relation Network (CRN), which encapsulates and transforms an array of tensorial objects into a new array of the same kind, conditioned on a contextual feature. The flexibility of CRN units is then examined by solving Video Question Answering, a challenging problem requiring joint comprehension of video content and natural language.
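
For intuition, here is a minimal, self-contained PyTorch sketch of the CRN idea. It is not the repo's implementation (the paper samples subsets rather than enumerating all of them, and uses richer aggregation and fusion functions); it only illustrates the array-in, array-out pattern conditioned on a contextual feature:

import itertools
import torch
import torch.nn as nn

class CRNSketch(nn.Module):
    # Toy CRN: aggregate subsets of the input array, fuse each aggregate
    # with the conditioning feature, and return a new array of tensors.
    def __init__(self, dim):
        super().__init__()
        self.fuse = nn.Sequential(nn.Linear(2 * dim, dim), nn.ELU())

    def forward(self, objects, condition):
        # objects: list of (batch, dim) tensors; condition: (batch, dim)
        outputs = []
        for k in range(2, len(objects)):                   # relations over subset sizes
            for subset in itertools.combinations(objects, k):
                agg = torch.stack(subset).mean(dim=0)      # aggregate the subset
                outputs.append(self.fuse(torch.cat([agg, condition], dim=-1)))
        return outputs                                     # an array of the same kind

crn = CRNSketch(dim=64)
clip_feats = [torch.randn(2, 64) for _ in range(4)]        # e.g. clip-level features
out = crn(clip_feats, condition=torch.randn(2, 64))        # e.g. a question embedding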

Illustrations of the CRN unit and the resulting HCRN model for VideoQA:

[Figures: CRN unit; HCRN architecture]

Check out our paper for details.

Setup

  1. Clone the repository:
 git clone https://github.com/thaolmk54/hcrn-videoqa.git
  2. Download the TGIF-QA, MSRVTT-QA, and MSVD-QA datasets and edit the absolute paths in preprocess/preprocess_features.py and preprocess/preprocess_questions.py according to where you store your data; a hypothetical illustration of this edit follows the list. The default paths follow the pattern /ceph-g/lethao/datasets/{dataset_name}/.

  3. Install dependencies:

conda create -n hcrn_videoqa python=3.6
conda activate hcrn_videoqa
conda install -c conda-forge ffmpeg
conda install -c conda-forge scikit-video
pip install -r requirements.txt
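
As a hypothetical illustration of the path edit in step 2 (not the repo's actual code; the real variable names may differ), search the two preprocessing scripts for /ceph-g/lethao/datasets and point the hard-coded root at your own data directory:

# Hypothetical example inside preprocess/preprocess_features.py:
dataset_root = '/path/to/your/datasets/tgif-qa/'   # default: /ceph-g/lethao/datasets/tgif-qa/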

Experiments with TGIF-QA

Depending on the task, choose question_type from four options: action, transition, count, or frameqa.

Preprocessing visual features

  1. To extract appearance features:
python preprocess/preprocess_features.py --gpu_id 2 --dataset tgif-qa --model resnet101 --question_type {question_type}
  2. To extract motion features:

    Download the ResNeXt-101 pretrained model (resnext-101-kinetics.pth) and place it in data/preprocess/pretrained/. The 112x112 image size below matches the input resolution this Kinetics-pretrained model expects.

python preprocess/preprocess_features.py --dataset tgif-qa --model resnext101 --image_height 112 --image_width 112 --question_type {question_type}

Note: Extracting visual features takes a long time. You can download our pre-extracted features from here and save them in data/tgif-qa/{question_type}/. Please use the following command to join the split files:

cat tgif-qa_{question_type}_appearance_feat.h5.part* > tgif-qa_{question_type}_appearance_feat.h5
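
As an optional sanity check, the joined file should open cleanly with h5py (assuming h5py is installed; the dataset key names depend on the preprocessing script and are not assumed here):

import h5py

# Replace frameqa with the question type you downloaded.
with h5py.File('data/tgif-qa/frameqa/tgif-qa_frameqa_appearance_feat.h5', 'r') as f:
    print(list(f.keys()))   # list the datasets stored in the file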

Preprocess linguistic features

  1. Download the pretrained GloVe 300d word vectors to data/glove/ and process them into a pickle file (a sketch of this conversion follows the list):
python txt2pickle.py
  2. Preprocess train/val/test questions:
python preprocess/preprocess_questions.py --dataset tgif-qa --question_type {question_type} --glove_pt data/glove/glove.840.300d.pkl --mode train

python preprocess/preprocess_questions.py --dataset tgif-qa --question_type {question_type} --mode test
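
For reference, here is a minimal sketch of the kind of conversion a script like txt2pickle.py performs; this is an assumption about its behavior, not the repo's actual code. It parses the GloVe text file (each line is a word followed by 300 floats) into a {word: vector} dict and pickles it under the name the commands above expect:

import pickle
import numpy as np

glove = {}
with open('data/glove/glove.840B.300d.txt', encoding='utf-8') as f:
    for line in f:
        parts = line.rstrip().split(' ')
        word = ' '.join(parts[:-300])                     # a few GloVe tokens contain spaces
        glove[word] = np.asarray(parts[-300:], dtype=np.float32)

with open('data/glove/glove.840.300d.pkl', 'wb') as f:
    pickle.dump(glove, f)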

Training

Choose a suitable config file in configs/{task}.yml for one of the four tasks (action, transition, count, frameqa) to train the model. For example, to train on the action task, run the following command:

python train.py --cfg configs/tgif_qa_action.yml

Evaluation

To evaluate the trained model, run the following:

python validate.py --cfg configs/tgif_qa_action.yml

Note: A pretrained model for the action task is available here. Save the file in results/expTGIF-QAAction/ckpt/ for evaluation.

Experiments with MSRVTT-QA and MSVD-QA

The following commands run experiments with the MSRVTT-QA dataset; replace msrvtt-qa with msvd-qa to run with the MSVD-QA dataset.

Preprocessing visual features

  1. To extract appearance features:
python preprocess/preprocess_features.py --gpu_id 2 --dataset msrvtt-qa --model resnet101
  2. To extract motion features:
python preprocess/preprocess_features.py --dataset msrvtt-qa --model resnext101 --image_height 112 --image_width 112

Preprocess linguistic features

Preprocess train/val/test questions:

python preprocess/preprocess_questions.py --dataset msrvtt-qa --glove_pt data/glove/glove.840.300d.pkl --mode train
    
python preprocess/preprocess_questions.py --dataset msrvtt-qa --mode val
    
python preprocess/preprocess_questions.py --dataset msrvtt-qa --mode test

Training

python train.py --cfg configs/msrvtt_qa.yml

Evaluation

To evaluate the trained model, run the following:

python validate.py --cfg configs/msrvtt_qa.yml

Citations

If you make use of this repository for your research, please cite the following paper:

@article{le2020hierarchical,
  title={Hierarchical Conditional Relation Networks for Video Question Answering},
  author={Le, Thao Minh and Le, Vuong and Venkatesh, Svetha and Tran, Truyen},
  journal={arXiv preprint arXiv:2002.10698},
  year={2020}
}

Acknowledgement

  • For motion feature extraction, we adapt the ResNeXt-101 model from this repo to our code. Thanks to @kenshohara for releasing the code and the pretrained models.
  • We refer to this repo for preprocessing.
  • Our implementation of the dataloader is based on this repo.