All Projects → Yu-Wu → Modaily-Aware-Audio-Visual-Video-Parsing

Yu-Wu / Modaily-Aware-Audio-Visual-Video-Parsing

Licence: other
Code for CVPR 2021 paper Exploring Heterogeneous Clues for Weakly-Supervised Audio-Visual Video Parsing

Programming Languages

python
139335 projects - #7 most used programming language
shell
77523 projects

Projects that are alternatives of or similar to Modaily-Aware-Audio-Visual-Video-Parsing

MetaBIN
[CVPR2021] Meta Batch-Instance Normalization for Generalizable Person Re-Identification
Stars: ✭ 58 (+205.26%)
Mutual labels:  cvpr, cvpr2021
cvpr-buzz
🐝 Explore Trending Papers at CVPR
Stars: ✭ 37 (+94.74%)
Mutual labels:  cvpr, cvpr2021
Restoring-Extremely-Dark-Images-In-Real-Time
The project is the official implementation of our CVPR 2021 paper, "Restoring Extremely Dark Images in Real Time"
Stars: ✭ 79 (+315.79%)
Mutual labels:  cvpr, cvpr2021
AODA
Official implementation of "Adversarial Open Domain Adaptation for Sketch-to-Photo Synthesis"(WACV 2022/CVPRW 2021)
Stars: ✭ 44 (+131.58%)
Mutual labels:  cvpr, cvpr2021
SGGpoint
[CVPR 2021] Exploiting Edge-Oriented Reasoning for 3D Point-based Scene Graph Analysis (official pytorch implementation)
Stars: ✭ 41 (+115.79%)
Mutual labels:  cvpr, cvpr2021
AMP-Regularizer
Code for our paper "Regularizing Neural Networks via Adversarial Model Perturbation", CVPR2021
Stars: ✭ 26 (+36.84%)
Mutual labels:  cvpr, cvpr2021
Cvpr2021 Papers With Code
CVPR 2021 论文和开源项目合集
Stars: ✭ 7,138 (+37468.42%)
Mutual labels:  cvpr, cvpr2021
CoMoGAN
CoMoGAN: continuous model-guided image-to-image translation. CVPR 2021 oral.
Stars: ✭ 139 (+631.58%)
Mutual labels:  cvpr, cvpr2021
cfvqa
[CVPR 2021] Counterfactual VQA: A Cause-Effect Look at Language Bias
Stars: ✭ 96 (+405.26%)
Mutual labels:  cvpr, cvpr2021
Scan2Cap
[CVPR 2021] Scan2Cap: Context-aware Dense Captioning in RGB-D Scans
Stars: ✭ 81 (+326.32%)
Mutual labels:  cvpr, cvpr2021
HistoGAN
Reference code for the paper HistoGAN: Controlling Colors of GAN-Generated and Real Images via Color Histograms (CVPR 2021).
Stars: ✭ 158 (+731.58%)
Mutual labels:  cvpr, cvpr2021
single-positive-multi-label
Multi-Label Learning from Single Positive Labels - CVPR 2021
Stars: ✭ 63 (+231.58%)
Mutual labels:  cvpr, cvpr2021
LBYLNet
[CVPR2021] Look before you leap: learning landmark features for one-stage visual grounding.
Stars: ✭ 46 (+142.11%)
Mutual labels:  cvpr, cvpr2021
BCNet
Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers [CVPR 2021]
Stars: ✭ 434 (+2184.21%)
Mutual labels:  cvpr, cvpr2021
CVPR2021-Papers-with-Code-Demo
收集 CVPR 最新的成果,包括论文、代码和demo视频等,欢迎大家推荐!
Stars: ✭ 752 (+3857.89%)
Mutual labels:  cvpr, cvpr2021
Guided-I2I-Translation-Papers
Guided Image-to-Image Translation Papers
Stars: ✭ 117 (+515.79%)
Mutual labels:  cvpr
SkeletonMerger
Code repository for paper `Skeleton Merger: an Unsupervised Aligned Keypoint Detector`.
Stars: ✭ 49 (+157.89%)
Mutual labels:  cvpr2021
LUVLi
[CVPR 2020] Re-hosting of the LUVLi Face Alignment codebase. Please download the codebase from the original MERL website by agreeing to all terms and conditions. By using this code, you agree to MERL's research-only licensing terms.
Stars: ✭ 24 (+26.32%)
Mutual labels:  cvpr
Im2Vec
[CVPR 2021 Oral] Im2Vec Synthesizing Vector Graphics without Vector Supervision
Stars: ✭ 229 (+1105.26%)
Mutual labels:  cvpr2021
TailCalibX
Pytorch implementation of Feature Generation for Long-Tail Classification by Rahul Vigneswaran, Marc T Law, Vineeth N Balasubramaniam and Makarand Tapaswi
Stars: ✭ 32 (+68.42%)
Mutual labels:  cvpr

Exploring Heterogeneous Clues for Weakly Supervised Audio-Visual Video Parsing

Code for CVPR 2021 paper Exploring Heterogeneous Clues for Weakly-Supervised Audio-Visual Video Parsing

The Audio-Visual Video Parsing task

We aim at identifying the audible and visible events and their temporal location in videos. Note that the visual and audio events might be asynchronous.

Prepare data

Please refer to https://github.com/YapengTian/AVVP-ECCV20 for downloading the LLP Dataset and the preprocessed audio and visual features. Put the downloaded r2plus1d_18, res152, vggish features into the feats folder.

Training pipeline

The training includes three stages.

Train a base model

We first train a base model using MIL and our proposed contrastive learning.

cd step1_train_base_model
python main_avvp.py --mode train --audio_dir ../feats/vggish/ --video_dir ../feats/res152/ --st_dir ../feats/r2plus1d_18

Generate modality-aware labels

We then freeze the trained model and evaluate each video by swapping its audio and visual tracks with other unrelated videos.

cd step2_find_exchange
python main_avvp.py --mode estimate_labels --audio_dir ../feats/vggish/ --video_dir ../feats/res152/ --st_dir ../feats/r2plus1d_18 --model_save_dir ../step1_train_base_model/models/

Re-train using modality-aware labels

We then re-train the model from scratch using modality-aware labels.

cd step3_retrain
python main_avvp.py --mode retrain --audio_dir ../feats/vggish/ --video_dir ../feats/res152/ --st_dir ../feats/r2plus1d_18

Citation

Please cite the following paper in your publications if it helps your research:

@inproceedings{wu2021explore,
    title = {Exploring Heterogeneous Clues for Weakly-Supervised Audio-Visual Video Parsing},
    author = {Wu, Yu and Yang, Yi},
    booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year = {2021}
    
}
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].