jayleicn / TVQAplus

License: MIT
[ACL 2020] PyTorch code for TVQA+: Spatio-Temporal Grounding for Video Question Answering

Programming Languages

  • Python
  • Shell

TVQA+: Spatio-Temporal Grounding for Video Question Answering

[Figure: QA example]

We present the task of Spatio-Temporal Video Question Answering, which requires intelligent systems to simultaneously retrieve relevant moments and detect referenced visual concepts (people and objects) to answer natural language questions about videos. We first augment the TVQA dataset with 310.8k bounding boxes, linking depicted objects to visual concepts in questions and answers. We name this augmented version TVQA+. We then propose Spatio-Temporal Answerer with Grounded Evidence (STAGE), a unified framework that grounds evidence in both the spatial and temporal domains to answer questions about videos. Comprehensive experiments and analyses demonstrate the effectiveness of our framework and how the rich annotations in our TVQA+ dataset can contribute to the question answering task. As a side product, by performing this joint task, our model is able to produce more insightful intermediate results.

In this repository, we provide a PyTorch implementation of the STAGE model, along with basic preprocessing and evaluation code for the TVQA+ dataset.

TVQA+: Spatio-Temporal Grounding for Video Question Answering
Jie Lei, Licheng Yu, Tamara L. Berg, Mohit Bansal. [PDF]

Resources

Model

  • STAGE Overview. Spatio-Temporal Answerer with Grounded Evidence (STAGE), a unified framework that grounds evidence in both the spatial and temporal domains to answer questions about videos.
    [Figure: model overview]

  • Prediction Examples
    [Figure: example predictions]

Requirements

  • Python 2.7
  • PyTorch 1.1.0 (should work for 0.4.0 - 1.2.0)
  • tensorboardX
  • tqdm
  • h5py
  • numpy
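The supported PyTorch range above (0.4.0 through 1.2.0) can be checked before launching training. A minimal sketch, assuming standard MAJOR.MINOR.PATCH version strings (the helper names here are illustrative, not part of the repo):

```python
def version_tuple(v):
    # "1.1.0" -> (1, 1, 0); drop any local suffix such as "+cu92"
    return tuple(int(p) for p in v.split("+")[0].split(".")[:3])

def torch_version_ok(v, low="0.4.0", high="1.2.0"):
    # True when low <= v <= high (inclusive), matching the range listed above
    return version_tuple(low) <= version_tuple(v) <= version_tuple(high)
```

In practice you would pass `torch.__version__` to `torch_version_ok` and warn or exit if it falls outside the range.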

Training and Evaluation

1. Download and uncompress the preprocessed features from Google Drive.

# Uncompress the file into the project root directory; you should get a
# directory `tvqa_plus_stage_features` containing all the required feature files.
cd $PROJECT_ROOT; tar -xf tvqa_plus_stage_features_new.tar.gz

gdrive is a handy tool for downloading the file. Note that the features have changed; if you have our previous version, you will need to re-download them.
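A quick sanity check that the archive unpacked where the scripts expect it can save a confusing failure later. A minimal sketch (the directory name comes from the step above; no specific feature filenames are assumed):

```python
import os

def check_features_dir(project_root="."):
    # The training scripts expect tvqa_plus_stage_features/ at the project root.
    feat_dir = os.path.join(project_root, "tvqa_plus_stage_features")
    if not os.path.isdir(feat_dir):
        raise FileNotFoundError(
            "Expected %s -- did you run tar -xf in the project root?" % feat_dir)
    return feat_dir
```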

2. Run in debug mode to test your environment and path settings:

bash run_main.sh debug

3. Train the full STAGE model:

bash run_main.sh --add_local

Note that you will need around 30 GB of memory to load the data; otherwise, additionally pass the --no_core_driver flag to stop loading all the features into memory. After training, you should get ~72.00% QA accuracy, which is comparable to the reported number. The trained model and config file are stored at ${PROJECT_ROOT}/results/${MODEL_DIR}.
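The tradeoff behind --no_core_driver is loading features on demand from disk instead of holding them all in RAM. The repo reads features with h5py; the sketch below illustrates the same idea with `numpy.memmap` on a made-up feature file, purely for intuition:

```python
import os
import tempfile
import numpy as np

# Hypothetical feature matrix: 100 clips x 16-dim features, written as raw bytes.
path = os.path.join(tempfile.mkdtemp(), "feat.bin")
feats = np.random.rand(100, 16).astype(np.float32)
feats.tofile(path)

# Memory-map the file: rows are read from disk only when indexed,
# analogous to running with --no_core_driver.
mm = np.memmap(path, dtype=np.float32, mode="r", shape=(100, 16))
one_clip = mm[3]             # touches only this row's pages
everything = np.asarray(mm)  # forces the full array into RAM (the default behavior)
```

Loading everything up front is faster per batch but needs the full ~30 GB; the on-demand variant trades speed for a much smaller footprint.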

4. Inference

bash run_inference.sh --model_dir ${MODEL_DIR} --mode ${MODE}

${MODE} can be valid or test. After inference, you will get a ${MODE}_inference_predictions.json file in ${MODEL_DIR}, similar to the sample prediction file at eval/data/val_sample_prediction.json.
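Before running evaluation, it can be useful to confirm the prediction file loaded correctly. A minimal sketch that only assumes the file is ordinary JSON (the exact schema is defined by the repo's sample file, not reproduced here):

```python
import json

def summarize_predictions(pred_path):
    # Load a *_inference_predictions.json file and report basic stats.
    with open(pred_path) as f:
        preds = json.load(f)
    return {"type": type(preds).__name__, "num_entries": len(preds)}
```

Comparing the entry count against the size of the val (or test) split is a cheap way to catch a truncated or failed inference run.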

5. Evaluation

cd eval; python eval_tvqa_plus.py --pred_path ../results/${MODEL_DIR}/valid_inference_predictions.json --gt_path data/tvqa_plus_val.json

Note that only val predictions can be evaluated here. To evaluate the test set, please follow the instructions here.
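For intuition, the QA accuracy reported above is the fraction of questions whose predicted answer matches the ground truth. The official eval_tvqa_plus.py also scores the spatio-temporal grounding; this sketch covers only the accuracy term, with a made-up qid-to-answer-index mapping:

```python
def qa_accuracy(pred_answers, gt_answers):
    # pred_answers / gt_answers: dicts mapping question id -> answer index.
    assert set(pred_answers) == set(gt_answers), "qid sets must match"
    correct = sum(pred_answers[q] == gt_answers[q] for q in gt_answers)
    return correct / len(gt_answers)
```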

Citation

@inproceedings{lei2019tvqa,
  title={TVQA+: Spatio-Temporal Grounding for Video Question Answering},
  author={Lei, Jie and Yu, Licheng and Berg, Tamara L and Bansal, Mohit},
  booktitle={Tech Report, arXiv},
  year={2019}
}

TODO

  1. Add data preprocessing scripts (provided preprocessed features)
  2. Add model and training scripts
  3. Add inference and evaluation scripts

Contact

  • Dataset: faq-tvqa-unc [at] googlegroups.com
  • Model: Jie Lei, jielei [at] cs.unc.edu