# Exploring Heterogeneous Clues for Weakly-Supervised Audio-Visual Video Parsing

Code for the CVPR 2021 paper *Exploring Heterogeneous Clues for Weakly-Supervised Audio-Visual Video Parsing*.
## The Audio-Visual Video Parsing task

We aim to identify the audible and visible events in a video and localize them in time. Note that the audio and visual events may be asynchronous.
## Prepare data

Please refer to https://github.com/YapengTian/AVVP-ECCV20 for downloading the LLP Dataset and the preprocessed audio and visual features.

Put the downloaded `r2plus1d_18`, `res152`, and `vggish` features into the `feats` folder.
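Before launching training, it can help to confirm the features landed in the layout the commands below expect. A minimal sketch (the folder names come from this README; the script itself is not part of the repo):

```python
import os

# Expected feature sub-folders, matching the --audio_dir, --video_dir,
# and --st_dir paths used by the training commands in this README.
EXPECTED = ["vggish", "res152", "r2plus1d_18"]

def check_feats(root="feats"):
    """Return the expected feature folders missing under `root`."""
    return [d for d in EXPECTED if not os.path.isdir(os.path.join(root, d))]

missing = check_feats()
if missing:
    print("Missing under feats/:", ", ".join(missing))
```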
## Training pipeline

The training consists of three stages.
### Train a base model

We first train a base model using MIL and our proposed contrastive learning.

```shell
cd step1_train_base_model
python main_avvp.py --mode train --audio_dir ../feats/vggish/ --video_dir ../feats/res152/ --st_dir ../feats/r2plus1d_18
```
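As a rough illustration of the MIL idea behind this stage: only a video-level label is available, so per-segment predictions must be pooled into one video-level prediction before computing the loss. The sketch below uses plain mean pooling; the model in `main_avvp.py` is more elaborate, and all names here are illustrative:

```python
import numpy as np

def mil_pool(segment_probs):
    """Aggregate per-segment event probabilities of shape (T, C) into a
    single video-level prediction via mean pooling, the simplest MIL
    aggregator (the actual model may use attentive pooling instead)."""
    return segment_probs.mean(axis=0)

# Three 1-second segments, two event classes; only the video-level
# label supervises the pooled output (weak supervision).
segment_probs = np.array([[0.9, 0.1],
                          [0.7, 0.2],
                          [0.8, 0.3]])
video_pred = mil_pool(segment_probs)  # ≈ [0.8, 0.2]
```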
### Generate modality-aware labels

We then freeze the trained model and evaluate each video after swapping its audio or visual track with that of an unrelated video.

```shell
cd step2_find_exchange
python main_avvp.py --mode estimate_labels --audio_dir ../feats/vggish/ --video_dir ../feats/res152/ --st_dir ../feats/r2plus1d_18 --model_save_dir ../step1_train_base_model/models/
```
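The intuition behind the swap can be sketched as follows: if replacing the audio track with an unrelated one makes an event's predicted probability collapse, the evidence for that event was in the audio, and likewise for the visual track. The decision rule and threshold below are illustrative assumptions, not the paper's exact procedure:

```python
def estimate_modality(p_orig, p_swap_audio, p_swap_video, drop=0.5):
    """Heuristic sketch: label an event as audible (resp. visible) if
    swapping the audio (resp. visual) track with an unrelated video's
    drops its predicted probability by more than `drop` relatively.
    The relative-drop rule and threshold are assumptions for illustration."""
    audible = (p_orig - p_swap_audio) / max(p_orig, 1e-8) > drop
    visible = (p_orig - p_swap_video) / max(p_orig, 1e-8) > drop
    return audible, visible

# Probability collapses only when the audio track is swapped,
# so the event is attributed to the audio modality.
print(estimate_modality(0.9, 0.1, 0.85))  # (True, False)
```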
### Re-train using modality-aware labels

Finally, we re-train the model from scratch using the modality-aware labels.

```shell
cd step3_retrain
python main_avvp.py --mode retrain --audio_dir ../feats/vggish/ --video_dir ../feats/res152/ --st_dir ../feats/r2plus1d_18
```
## Citation

Please cite the following paper in your publications if it helps your research:

```bibtex
@inproceedings{wu2021explore,
  title     = {Exploring Heterogeneous Clues for Weakly-Supervised Audio-Visual Video Parsing},
  author    = {Wu, Yu and Yang, Yi},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2021}
}
```