JaywongWang / CBP

License: other
Official TensorFlow implementation of the AAAI-2020 paper "Temporally Grounding Language Queries in Videos by Contextual Boundary-aware Prediction"

Programming Languages

python
shell

Projects that are alternatives of or similar to CBP

iMIX
A framework for Multimodal Intelligence research from Inspur HSSLAB.
Stars: ✭ 21 (-59.62%)
Mutual labels:  vision-and-language
TSP-PRL
Tree-Structured Policy based Progressive Reinforcement Learning for Temporally Language Grounding in Video (AAAI2020)
Stars: ✭ 39 (-25%)
Mutual labels:  video-analysis
calvin
CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
Stars: ✭ 105 (+101.92%)
Mutual labels:  vision-and-language
clip playground
An ever-growing playground of notebooks showcasing CLIP's impressive zero-shot capabilities
Stars: ✭ 80 (+53.85%)
Mutual labels:  vision-and-language
just-ask
[TPAMI Special Issue on ICCV 2021 Best Papers, Oral] Just Ask: Learning to Answer Questions from Millions of Narrated Videos
Stars: ✭ 57 (+9.62%)
Mutual labels:  vision-and-language
siam-mot
SiamMOT: Siamese Multi-Object Tracking
Stars: ✭ 446 (+757.69%)
Mutual labels:  video-analysis
Video-Grounding-from-Text
Source code for "Weakly-Supervised Video Object Grounding from Text by Loss Weighting and Object Interaction"
Stars: ✭ 40 (-23.08%)
Mutual labels:  video-grounding
Materials-Temporal-Action-Detection
temporal action detection: benchmark results, features download etc.
Stars: ✭ 199 (+282.69%)
Mutual labels:  action-localization
rosita
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
Stars: ✭ 36 (-30.77%)
Mutual labels:  vision-and-language
X-VLM
X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)
Stars: ✭ 283 (+444.23%)
Mutual labels:  vision-and-language
synse-zsl
Official PyTorch code for the ICIP 2021 paper 'Syntactically Guided Generative Embeddings For Zero Shot Skeleton Action Recognition'
Stars: ✭ 14 (-73.08%)
Mutual labels:  vision-and-language
robo-vln
Pytorch code for ICRA'21 paper: "Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation"
Stars: ✭ 34 (-34.62%)
Mutual labels:  vision-and-language
lang2seg
Referring Expression Object Segmentation with Caption-Aware Consistency, BMVC 2019
Stars: ✭ 30 (-42.31%)
Mutual labels:  vision-and-language
stanford-cs231n-assignments-2020
This repository contains my solutions to the assignments for Stanford's CS231n "Convolutional Neural Networks for Visual Recognition" (Spring 2020).
Stars: ✭ 84 (+61.54%)
Mutual labels:  vision-and-language
Action-Localization
Action-Localization, Atomic Visual Actions (AVA) Dataset
Stars: ✭ 22 (-57.69%)
Mutual labels:  action-localization
VidSitu
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
Stars: ✭ 41 (-21.15%)
Mutual labels:  vision-and-language
PyAnomaly
Useful Toolbox for Anomaly Detection
Stars: ✭ 95 (+82.69%)
Mutual labels:  video-analysis
wikiHow paper list
A paper list of research conducted based on wikiHow
Stars: ✭ 25 (-51.92%)
Mutual labels:  vision-and-language
TRAR-VQA
[ICCV 2021] TRAR: Routing the Attention Spans in Transformers for Visual Question Answering -- Official Implementation
Stars: ✭ 49 (-5.77%)
Mutual labels:  vision-and-language
pytorch violet
A PyTorch implementation of VIOLET
Stars: ✭ 119 (+128.85%)
Mutual labels:  vision-and-language

CBP

Official TensorFlow implementation of the AAAI-2020 paper "Temporally Grounding Language Queries in Videos by Contextual Boundary-aware Prediction" by Jingwen Wang et al.

Citation

@inproceedings{wang2020temporally,
  title={Temporally Grounding Language Queries in Videos by Contextual Boundary-aware Prediction},
  author={Wang, Jingwen and Ma, Lin and Jiang, Wenhao},
  booktitle={AAAI},
  year={2020}
}

Requirements

pip install -r requirements.txt

Data Preparation

  1. Download GloVe word embedding data.
cd download/
sh download_glove.sh
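
After the download, the vectors can be loaded into a word-to-vector map for encoding the language queries. Below is a minimal sketch; the exact GloVe file name produced by download_glove.sh is an assumption, so check what the script actually fetches.

import numpy as np

# Minimal sketch: load GloVe vectors into a {word: vector} dict.
# The file name below is an assumption, not necessarily the script's output.
def load_glove(path="download/glove.840B.300d.txt"):
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            embeddings[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return embeddings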
  2. Download dataset features.

TACoS: BaiduDrive, GoogleDrive

Charades-STA: BaiduDrive, GoogleDrive

ActivityNet-Captions: BaiduDrive, GoogleDrive

Put the feature HDF5 file in the corresponding directory ./datasets/{DATASET}/features/.

We decode TACoS/Charades videos at 16 fps and extract C3D (fc6) features for each non-overlapping 16-frame snippet, so each feature corresponds to a 1-second snippet. For ActivityNet, each feature corresponds to a 2-second snippet. To extract C3D fc6 features, I mainly refer to this code.
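
To sanity-check a downloaded feature file, a minimal sketch with h5py is shown below; the file name and the one-dataset-per-video key layout are assumptions, not the repo's actual schema.

import h5py

FEATURE_FILE = "./datasets/tacos/features/features.hdf5"  # hypothetical name
SECONDS_PER_FEATURE = 1  # 1 for TACoS/Charades, 2 for ActivityNet

with h5py.File(FEATURE_FILE, "r") as f:
    video_id = next(iter(f.keys()))  # assume one HDF5 dataset per video
    feats = f[video_id][:]           # shape: (num_snippets, feature_dim)
    print(video_id, feats.shape)
    # Snippet i covers seconds [i, i + 1) * SECONDS_PER_FEATURE of the video.
    for i in range(min(3, len(feats))):
        print(f"snippet {i}: {i * SECONDS_PER_FEATURE}s to {(i + 1) * SECONDS_PER_FEATURE}s")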

  3. Download trained models.

Download the checkpoints and put them in the corresponding directory ./checkpoints/{DATASET}/.

BaiduDrive, GoogleDrive

  4. Data preprocessing (optional).
cd datasets/tacos/
sh prepare_data.sh

Then copy the generated data into ./data/save/.

Use the corresponding scripts to prepare data for the other datasets.

You may skip this step, as the prepared data is already saved in ./datasets/{DATASET}/data/save/.

Testing and Evaluation

sh scripts/test_tacos.sh
sh scripts/eval_tacos.sh

Use the corresponding scripts to test or evaluate on the other datasets.

The predicted results are also provided in ./results/{DATASET}/.

CBP results (all numbers are percentages):

Dataset      R@1,IoU=0.7  R@1,IoU=0.5  R@5,IoU=0.7  R@5,IoU=0.5  mIoU
TACoS        18.54        23.19        24.88        35.83        20.46
Charades     17.98        36.21        50.27        70.51        35.70
ActivityNet  18.74        36.83        49.84        67.78        37.98
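
Here R@K,IoU=m is the percentage of queries for which at least one of the top-K predicted segments has a temporal IoU of at least m with the ground-truth segment, and mIoU is the mean IoU of the top-1 prediction. A minimal sketch of these metrics in Python (an illustration of the standard definitions, not the repo's evaluation code):

def temporal_iou(pred, gt):
    # IoU between two (start, end) segments in seconds.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def recall_at_k(all_preds, all_gts, k, thresh):
    # all_preds: one ranked list of (start, end) segments per query.
    hits = sum(
        any(temporal_iou(p, gt) >= thresh for p in preds[:k])
        for preds, gt in zip(all_preds, all_gts)
    )
    return 100.0 * hits / len(all_gts)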

Training

sh scripts/train_tacos.sh

Use the corresponding scripts to train on the other datasets.

Update

  1. The checkpoints for the Charades dataset have been re-uploaded.