jayleicn / TVQAplus

License: MIT
[ACL 2020] PyTorch code for TVQA+: Spatio-Temporal Grounding for Video Question Answering

Programming Languages

  • Python
  • Shell

TVQA+: Spatio-Temporal Grounding for Video Question Answering

[Figure: QA example]

We present the task of Spatio-Temporal Video Question Answering, which requires intelligent systems to simultaneously retrieve relevant moments and detect referenced visual concepts (people and objects) to answer natural language questions about videos. We first augment the TVQA dataset with 310.8k bounding boxes, linking depicted objects to visual concepts in questions and answers. We name this augmented version TVQA+. We then propose Spatio-Temporal Answerer with Grounded Evidence (STAGE), a unified framework that grounds evidence in both the spatial and temporal domains to answer questions about videos. Comprehensive experiments and analyses demonstrate the effectiveness of our framework and how the rich annotations in our TVQA+ dataset can contribute to the question answering task. As a side product, by performing this joint task, our model is able to produce more insightful intermediate results.

In this repository, we provide a PyTorch implementation of the STAGE model, along with basic preprocessing and evaluation code for the TVQA+ dataset.

TVQA+: Spatio-Temporal Grounding for Video Question Answering
Jie Lei, Licheng Yu, Tamara L. Berg, Mohit Bansal. [PDF]

Resources

Model

  • STAGE Overview. Spatio-Temporal Answerer with Grounded Evidence (STAGE), a unified framework that grounds evidence in both the spatial and temporal domains to answer questions about videos.
    [Figure: model overview]

  • Prediction Examples
    [Figure: example predictions]

Requirements

  • Python 2.7
  • PyTorch 1.1.0 (should work for 0.4.0 - 1.2.0)
  • tensorboardX
  • tqdm
  • h5py
  • numpy
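The supported PyTorch range above (0.4.0 through 1.2.0) can be checked before launching training. A minimal sketch, assuming standard MAJOR.MINOR.PATCH version strings (the helper names here are illustrative, not part of the repo):

```python
def version_tuple(v):
    # "1.1.0" -> (1, 1, 0); drop any local suffix such as "+cu92"
    return tuple(int(p) for p in v.split("+")[0].split(".")[:3])

def torch_version_ok(v, low="0.4.0", high="1.2.0"):
    # True when low <= v <= high (inclusive), matching the range listed above
    return version_tuple(low) <= version_tuple(v) <= version_tuple(high)
```

In practice you would pass `torch.__version__` to `torch_version_ok` and warn or exit if it falls outside the range.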

Training and Evaluation

1. Download and uncompress the preprocessed features from Google Drive.

# Uncompress the file into the project root directory; you should get a
# directory `tvqa_plus_stage_features` containing all the required feature files.
cd $PROJECT_ROOT; tar -xf tvqa_plus_stage_features_new.tar.gz

gdrive is a handy tool for downloading the file. Note that the features have changed; if you have our previous version, you will need to re-download them.
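A quick sanity check that the archive unpacked where the scripts expect it can save a confusing failure later. A minimal sketch (the directory name comes from the step above; no specific feature filenames are assumed):

```python
import os

def check_features_dir(project_root="."):
    # The training scripts expect tvqa_plus_stage_features/ at the project root.
    feat_dir = os.path.join(project_root, "tvqa_plus_stage_features")
    if not os.path.isdir(feat_dir):
        raise FileNotFoundError(
            "Expected %s -- did you run tar -xf in the project root?" % feat_dir)
    return feat_dir
```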

2. Run in debug mode to test your environment and path settings:

bash run_main.sh debug

3. Train the full STAGE model:

bash run_main.sh --add_local

Note that you will need around 30 GB of memory to load the data; otherwise, additionally pass the --no_core_driver flag to stop loading all the features into memory. After training, you should get ~72.00% QA accuracy, which is comparable to the reported number. The trained model and config file are stored at ${PROJECT_ROOT}/results/${MODEL_DIR}.
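The tradeoff behind --no_core_driver is loading features on demand from disk instead of holding them all in RAM. The repo reads features with h5py; the sketch below illustrates the same idea with `numpy.memmap` on a made-up feature file, purely for intuition:

```python
import os
import tempfile
import numpy as np

# Hypothetical feature matrix: 100 clips x 16-dim features, written as raw bytes.
path = os.path.join(tempfile.mkdtemp(), "feat.bin")
feats = np.random.rand(100, 16).astype(np.float32)
feats.tofile(path)

# Memory-map the file: rows are read from disk only when indexed,
# analogous to running with --no_core_driver.
mm = np.memmap(path, dtype=np.float32, mode="r", shape=(100, 16))
one_clip = mm[3]             # touches only this row's pages
everything = np.asarray(mm)  # forces the full array into RAM (the default behavior)
```

Loading everything up front is faster per batch but needs the full ~30 GB; the on-demand variant trades speed for a much smaller footprint.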

4. Inference

bash run_inference.sh --model_dir ${MODEL_DIR} --mode ${MODE}

${MODE} can be valid or test. After inference, you will get a ${MODE}_inference_predictions.json file in ${MODEL_DIR}, similar to the sample prediction file at eval/data/val_sample_prediction.json.
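Before running evaluation, it can be useful to confirm the prediction file loaded correctly. A minimal sketch that only assumes the file is ordinary JSON (the exact schema is defined by the repo's sample file, not reproduced here):

```python
import json

def summarize_predictions(pred_path):
    # Load a *_inference_predictions.json file and report basic stats.
    with open(pred_path) as f:
        preds = json.load(f)
    return {"type": type(preds).__name__, "num_entries": len(preds)}
```

Comparing the entry count against the size of the val (or test) split is a cheap way to catch a truncated or failed inference run.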

5. Evaluation

cd eval; python eval_tvqa_plus.py --pred_path ../results/${MODEL_DIR}/valid_inference_predictions.json --gt_path data/tvqa_plus_val.json

Note that only val predictions can be evaluated here. To evaluate the test set, please follow the instructions here.
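For intuition, the QA accuracy reported above is the fraction of questions whose predicted answer matches the ground truth. The official eval_tvqa_plus.py also scores the spatio-temporal grounding; this sketch covers only the accuracy term, with a made-up qid-to-answer-index mapping:

```python
def qa_accuracy(pred_answers, gt_answers):
    # pred_answers / gt_answers: dicts mapping question id -> answer index.
    assert set(pred_answers) == set(gt_answers), "qid sets must match"
    correct = sum(pred_answers[q] == gt_answers[q] for q in gt_answers)
    return correct / len(gt_answers)
```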

Citation

@inproceedings{lei2019tvqa,
  title={TVQA+: Spatio-Temporal Grounding for Video Question Answering},
  author={Lei, Jie and Yu, Licheng and Berg, Tamara L and Bansal, Mohit},
  booktitle={Tech Report, arXiv},
  year={2019}
}

TODO

  1. Add data preprocessing scripts (provided preprocessed features)
  2. Add model and training scripts
  3. Add inference and evaluation scripts

Contact

  • Dataset: faq-tvqa-unc [at] googlegroups.com
  • Model: Jie Lei, jielei [at] cs.unc.edu