
catalina17 / VideoNavQA

Licence: other
An alternative EQA paradigm and informative benchmark + models (BMVC 2019, ViGIL 2019 spotlight)

Programming Languages

python
139335 projects - #7 most used programming language
shell
77523 projects

Projects that are alternatives to or similar to VideoNavQA

DVQA dataset
DVQA Dataset: a bar-chart question answering dataset presented at CVPR 2018
Stars: ✭ 20 (-9.09%)
Mutual labels:  vqa, question-answering
hcrn-videoqa
Implementation for the paper "Hierarchical Conditional Relation Networks for Video Question Answering" (Le et al., CVPR 2020, Oral)
Stars: ✭ 111 (+404.55%)
Mutual labels:  vqa, question-answering
iMIX
A framework for Multimodal Intelligence research from Inspur HSSLAB.
Stars: ✭ 21 (-4.55%)
Mutual labels:  vqa, multimodal
Vqa Tensorflow
TensorFlow implementation of the deeper LSTM + normalized CNN model for Visual Question Answering
Stars: ✭ 98 (+345.45%)
Mutual labels:  vqa, question-answering
Mullowbivqa
Hadamard Product for Low-rank Bilinear Pooling
Stars: ✭ 57 (+159.09%)
Mutual labels:  vqa, question-answering
Mac Network
Implementation for the paper "Compositional Attention Networks for Machine Reasoning" (Hudson and Manning, ICLR 2018)
Stars: ✭ 444 (+1918.18%)
Mutual labels:  vqa, question-answering
MICCAI21 MMQ
Multiple Meta-model Quantifying for Medical Visual Question Answering
Stars: ✭ 16 (-27.27%)
Mutual labels:  vqa, question-answering
Mmf
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
Stars: ✭ 4,713 (+21322.73%)
Mutual labels:  vqa, multimodal
Vizwiz Vqa Pytorch
PyTorch VQA implementation that achieved top performances in the (ECCV18) VizWiz Grand Challenge: Answering Visual Questions from Blind People
Stars: ✭ 33 (+50%)
Mutual labels:  vqa
Vqa Mfb
Stars: ✭ 153 (+595.45%)
Mutual labels:  vqa
Bottom Up Attention Vqa
An efficient PyTorch implementation of the winning entry of the 2017 VQA Challenge.
Stars: ✭ 667 (+2931.82%)
Mutual labels:  vqa
Bottom Up Attention
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
Stars: ✭ 989 (+4395.45%)
Mutual labels:  vqa
Pytorch Vqa
Strong baseline for visual question answering
Stars: ✭ 158 (+618.18%)
Mutual labels:  vqa
Visual Question Answering
📷 ❓ Visual Question Answering Demo and Algorithmia API
Stars: ✭ 18 (-18.18%)
Mutual labels:  vqa
tsflex
Flexible time series feature extraction & processing
Stars: ✭ 252 (+1045.45%)
Mutual labels:  multimodal
Vqa regat
Research Code for ICCV 2019 paper "Relation-aware Graph Attention Network for Visual Question Answering"
Stars: ✭ 129 (+486.36%)
Mutual labels:  vqa
Vqa.pytorch
Visual Question Answering in Pytorch
Stars: ✭ 602 (+2636.36%)
Mutual labels:  vqa
cmrc2017
The First Evaluation Workshop on Chinese Machine Reading Comprehension (CMRC 2017)
Stars: ✭ 90 (+309.09%)
Mutual labels:  question-answering
self critical vqa
Code for the NeurIPS 2019 paper "Self-Critical Reasoning for Robust Visual Question Answering"
Stars: ✭ 39 (+77.27%)
Mutual labels:  vqa
Papers
Some computer vision papers I have read: image captioning, weakly-supervised segmentation, etc.
Stars: ✭ 99 (+350%)
Mutual labels:  vqa

VideoNavQA

VideoNavQA: Bridging the Gap between Visual and Embodied Question Answering
BMVC 2019, spotlight talk at ViGIL NeurIPS 2019
Cătălina Cangea, Eugene Belilovsky, Pietro Liò, Aaron Courville

We introduce the VideoNavQA task: by removing the navigation and action selection requirements from Embodied QA, we increase the difficulty of the visual reasoning component via a much larger question space, tackling the sort of complex reasoning questions that make QA tasks challenging. By designing and evaluating several VQA-style models on the dataset, we establish a novel way of evaluating EQA feasibility given existing methods, while highlighting the difficulty of the problem even in the most ideal setting.

Sample videos with their associated questions: 'Where is the green rug next to the sofa?', 'Are the computer and the bed the same color?', 'What is the thing next to the tv stand located in the living room?'

Getting started

$ git clone https://github.com/catalina17/VideoNavQA
$ cd VideoNavQA
$ virtualenv -p python3 videonavqa
$ source videonavqa/bin/activate
$ pip install -r requirements.txt

Dataset

The VideoNavQA benchmark data can be found here. After expanding the archive into a directory of your choice, please update BASE_DIR (declared in eval/utils.py) to point to that path.
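
For reference, this is all the change amounts to; the following is a minimal sketch, assuming BASE_DIR is assigned at module level in eval/utils.py (the path shown is a placeholder, not a real location):

# eval/utils.py -- illustrative sketch; the path is a placeholder and should be
# replaced with the directory where you expanded the dataset archive.
BASE_DIR = '/data/videonavqa'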

Dataset statistics

Dependencies

  • Model evaluation:
    • Faster-RCNN fork (with VGG-16 pre-trained weights)
    • the pre-trained object detector used for extracting visual features (OBJ_DETECTOR_PATH in eval/utils.py) should be initialised from this checkpoint instead of the one originally provided in the dataset archive - please make sure to replace the file! (See the configuration sketch after this list.)
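
A minimal sketch of the corresponding configuration, assuming the checkpoint is a standard PyTorch file; both the path and the sanity check below are illustrative assumptions, not code from the repository:

# eval/utils.py -- illustrative sketch; the checkpoint path is a placeholder for
# the downloaded Faster-RCNN/VGG-16 weights that replace the file from the archive.
OBJ_DETECTOR_PATH = '/data/faster_rcnn_vgg16_checkpoint.pth'

# Optional sanity check that the replacement file loads (assumes a PyTorch checkpoint):
import torch
state = torch.load(OBJ_DETECTOR_PATH, map_location='cpu')
print(type(state))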

High-level approach

Running the models

The sample script eval.sh allows running (as-is) the FiLM-based models described in our paper. One epoch takes a few hours on an NVIDIA P100 (16 GB) GPU, so it is likely that you will need to resume training from the specified checkpoint every 1-3 epochs. You may then test your model using the q_and_v_test.py script, with similar command-line arguments.
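
The repository's training code is not reproduced here, but the resume workflow described above follows the usual PyTorch checkpoint pattern; the sketch below is a generic illustration under that assumption (the file name, model and optimizer are placeholders, not the actual eval.sh / q_and_v_test.py interface):

# Generic PyTorch save/resume pattern (illustrative only; names are placeholders).
import torch

CHECKPOINT = 'film_model_checkpoint.pt'  # placeholder path

def save_checkpoint(model, optimizer, epoch, path=CHECKPOINT):
    # Persist everything needed to continue training after an interruption.
    torch.save({'epoch': epoch,
                'model_state': model.state_dict(),
                'optimizer_state': optimizer.state_dict()}, path)

def resume(model, optimizer, path=CHECKPOINT):
    # Restore model/optimizer state and return the next epoch to run.
    state = torch.load(path, map_location='cpu')
    model.load_state_dict(state['model_state'])
    optimizer.load_state_dict(state['optimizer_state'])
    return state['epoch'] + 1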

Citation

Please cite us if our work inspires your research or you use our code and/or the VideoNavQA benchmark:

@article{cangea2019videonavqa,
  title={VideoNavQA: Bridging the Gap between Visual and Embodied Question Answering},
  author={Cangea, C{\u{a}}t{\u{a}}lina and Belilovsky, Eugene and Li{\`o}, Pietro and Courville, Aaron},
  journal={arXiv preprint arXiv:1908.04950},
  year={2019}
}