JaywongWang / CBP

License: other
Official TensorFlow implementation of the AAAI-2020 paper "Temporally Grounding Language Queries in Videos by Contextual Boundary-aware Prediction"

Programming Languages

python
shell

Projects that are alternatives of or similar to CBP

iMIX
A framework for Multimodal Intelligence research from Inspur HSSLAB.
Stars: ✭ 21 (-59.62%)
Mutual labels:  vision-and-language
TSP-PRL
Tree-Structured Policy based Progressive Reinforcement Learning for Temporally Language Grounding in Video (AAAI2020)
Stars: ✭ 39 (-25%)
Mutual labels:  video-analysis
calvin
CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
Stars: ✭ 105 (+101.92%)
Mutual labels:  vision-and-language
clip playground
An ever-growing playground of notebooks showcasing CLIP's impressive zero-shot capabilities
Stars: ✭ 80 (+53.85%)
Mutual labels:  vision-and-language
just-ask
[TPAMI Special Issue on ICCV 2021 Best Papers, Oral] Just Ask: Learning to Answer Questions from Millions of Narrated Videos
Stars: ✭ 57 (+9.62%)
Mutual labels:  vision-and-language
siam-mot
SiamMOT: Siamese Multi-Object Tracking
Stars: ✭ 446 (+757.69%)
Mutual labels:  video-analysis
Video-Grounding-from-Text
Source code for "Weakly-Supervised Video Object Grounding from Text by Loss Weighting and Object Interaction"
Stars: ✭ 40 (-23.08%)
Mutual labels:  video-grounding
Materials-Temporal-Action-Detection
temporal action detection: benchmark results, features download etc.
Stars: ✭ 199 (+282.69%)
Mutual labels:  action-localization
rosita
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
Stars: ✭ 36 (-30.77%)
Mutual labels:  vision-and-language
X-VLM
X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)
Stars: ✭ 283 (+444.23%)
Mutual labels:  vision-and-language
synse-zsl
Official PyTorch code for the ICIP 2021 paper 'Syntactically Guided Generative Embeddings For Zero Shot Skeleton Action Recognition'
Stars: ✭ 14 (-73.08%)
Mutual labels:  vision-and-language
robo-vln
Pytorch code for ICRA'21 paper: "Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation"
Stars: ✭ 34 (-34.62%)
Mutual labels:  vision-and-language
lang2seg
Referring Expression Object Segmentation with Caption-Aware Consistency, BMVC 2019
Stars: ✭ 30 (-42.31%)
Mutual labels:  vision-and-language
stanford-cs231n-assignments-2020
This repository contains my solutions to the assignments for Stanford's CS231n "Convolutional Neural Networks for Visual Recognition" (Spring 2020).
Stars: ✭ 84 (+61.54%)
Mutual labels:  vision-and-language
Action-Localization
Action-Localization, Atomic Visual Actions (AVA) Dataset
Stars: ✭ 22 (-57.69%)
Mutual labels:  action-localization
VidSitu
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
Stars: ✭ 41 (-21.15%)
Mutual labels:  vision-and-language
PyAnomaly
Useful Toolbox for Anomaly Detection
Stars: ✭ 95 (+82.69%)
Mutual labels:  video-analysis
wikiHow paper list
A paper list of research conducted based on wikiHow
Stars: ✭ 25 (-51.92%)
Mutual labels:  vision-and-language
TRAR-VQA
[ICCV 2021] TRAR: Routing the Attention Spans in Transformers for Visual Question Answering -- Official Implementation
Stars: ✭ 49 (-5.77%)
Mutual labels:  vision-and-language
pytorch violet
A PyTorch implementation of VIOLET
Stars: ✭ 119 (+128.85%)
Mutual labels:  vision-and-language

CBP

Official TensorFlow implementation of the AAAI-2020 paper "Temporally Grounding Language Queries in Videos by Contextual Boundary-aware Prediction" by Jingwen Wang et al.

Citation

@inproceedings{wang2020temporally,
  title={Temporally Grounding Language Queries in Videos by Contextual Boundary-aware Prediction},
  author={Wang, Jingwen and Ma, Lin and Jiang, Wenhao},
  booktitle={AAAI},
  year={2020}
}

Requirements

pip install -r requirements.txt

Data Preparation

  1. Download GloVe word embedding data.
cd download/
sh download_glove.sh
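
After the download, the vectors can be loaded into a word-to-vector map for encoding the language queries. Below is a minimal sketch; the exact GloVe file name produced by download_glove.sh is an assumption, so check what the script actually fetches.

import numpy as np

# Minimal sketch: load GloVe vectors into a {word: vector} dict.
# The file name below is an assumption, not necessarily the script's output.
def load_glove(path="download/glove.840B.300d.txt"):
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            embeddings[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return embeddings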
  2. Download dataset features.

TACoS: BaiduDrive, GoogleDrive

Charades-STA: BaiduDrive, GoogleDrive

ActivityNet-Captions: BaiduDrive, GoogleDrive

Put the feature HDF5 file in the corresponding directory ./datasets/{DATASET}/features/.

We decode TACoS/Charades videos at 16 fps and extract C3D (fc6) features for each non-overlapping 16-frame snippet, so each feature corresponds to a 1-second snippet. For ActivityNet, each feature corresponds to a 2-second snippet. To extract C3D fc6 features, I mainly refer to this code.
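
To sanity-check a downloaded feature file, a minimal sketch with h5py is shown below; the file name and the one-dataset-per-video key layout are assumptions, not the repo's actual schema.

import h5py

FEATURE_FILE = "./datasets/tacos/features/features.hdf5"  # hypothetical name
SECONDS_PER_FEATURE = 1  # 1 for TACoS/Charades, 2 for ActivityNet

with h5py.File(FEATURE_FILE, "r") as f:
    video_id = next(iter(f.keys()))  # assume one HDF5 dataset per video
    feats = f[video_id][:]           # shape: (num_snippets, feature_dim)
    print(video_id, feats.shape)
    # Snippet i covers seconds [i, i + 1) * SECONDS_PER_FEATURE of the video.
    for i in range(min(3, len(feats))):
        print(f"snippet {i}: {i * SECONDS_PER_FEATURE}s to {(i + 1) * SECONDS_PER_FEATURE}s")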

  3. Download trained models.

Download the checkpoints and put them in the corresponding directory ./checkpoints/{DATASET}/.

BaiduDrive, GoogleDrive

  4. Data preprocessing (optional).
cd datasets/tacos/
sh prepare_data.sh

Then copy the generated data into ./data/save/.

Use the corresponding scripts to prepare data for the other datasets.

You may skip this step, as the prepared data is already saved in ./datasets/{DATASET}/data/save/.

Testing and Evaluation

sh scripts/test_tacos.sh
sh scripts/eval_tacos.sh

Use the corresponding scripts to test or evaluate on the other datasets.

The predicted results are also provided in ./results/{DATASET}/.

CBP results (all numbers are percentages):

Dataset      R@1,IoU=0.7  R@1,IoU=0.5  R@5,IoU=0.7  R@5,IoU=0.5  mIoU
TACoS        18.54        23.19        24.88        35.83        20.46
Charades     17.98        36.21        50.27        70.51        35.70
ActivityNet  18.74        36.83        49.84        67.78        37.98
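
Here R@K,IoU=m is the percentage of queries for which at least one of the top-K predicted segments has a temporal IoU of at least m with the ground-truth segment, and mIoU is the mean IoU of the top-1 prediction. A minimal sketch of these metrics in Python (an illustration of the standard definitions, not the repo's evaluation code):

def temporal_iou(pred, gt):
    # IoU between two (start, end) segments in seconds.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def recall_at_k(all_preds, all_gts, k, thresh):
    # all_preds: one ranked list of (start, end) segments per query.
    hits = sum(
        any(temporal_iou(p, gt) >= thresh for p in preds[:k])
        for preds, gt in zip(all_preds, all_gts)
    )
    return 100.0 * hits / len(all_gts)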

Training

sh scripts/train_tacos.sh

Use the corresponding scripts to train on the other datasets.

Update

  1. The checkpoints for the Charades dataset have been re-uploaded.