
yanbeic / CCL

License: Apache-2.0
PyTorch implementation of the paper [CVPR 2021] Distilling Audio-Visual Knowledge by Compositional Contrastive Learning


Compositional Contrastive Learning

PyTorch implementation of Distilling Audio-Visual Knowledge by Compositional Contrastive Learning.

Introduction

Distilling knowledge from pre-trained teacher models helps to learn a small student model that generalizes better. While existing works mostly focus on distilling knowledge within the same modality, we explore distilling the multi-modal knowledge available in video data (i.e. audio and vision). Specifically, we propose to transfer audio and visual knowledge from pre-trained image and audio teacher models to learn more expressive video representations.

In multi-modal distillation, there often exists a semantic gap across modalities, e.g. a video shows applying lipstick visually while its accompanying audio is music. To ensure effective multi-modal distillation in the presence of a cross-modal semantic gap, we propose compositional contrastive learning, which features learnable compositional embeddings to close the cross-modal semantic gap, and a multi-class contrastive distillation objective to align different modalities jointly in a shared latent space.
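To make the idea of multi-class contrastive distillation concrete, here is a minimal NumPy sketch, not the paper's exact objective: it treats every teacher embedding that shares a student sample's class label as a positive in an InfoNCE-style softmax over the shared latent space. The function name, temperature value, and choice of positives are illustrative assumptions.

```python
import numpy as np

def contrastive_distillation_loss(student, teacher, labels, tau=0.1):
    """Illustrative multi-class contrastive distillation loss.

    student, teacher: (N, D) embeddings projected into a shared space.
    labels: (N,) integer class labels; all teacher samples with the
    same label as a student sample count as positives.
    """
    # L2-normalise both embedding sets so similarities are cosine-based
    s = student / np.linalg.norm(student, axis=1, keepdims=True)
    t = teacher / np.linalg.norm(teacher, axis=1, keepdims=True)
    logits = s @ t.T / tau                        # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    # positive mask: teacher samples sharing the student's class label
    pos = (labels[:, None] == labels[None, :]).astype(float)
    # average probability mass assigned to positives, per student sample
    per_sample = -np.log((probs * pos).sum(axis=1) / pos.sum(axis=1))
    return per_sample.mean()
```

In words: each student embedding is pulled toward the teacher embeddings of its own class and pushed away from all others, which aligns the two modalities class-wise rather than only instance-wise.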

We demonstrate our method can distill knowledge from the audio and visual modalities to learn a stronger video model for recognition and retrieval tasks on video action recognition datasets.

Getting Started

Prerequisites:

Data Preparation on UCF101 (example):

  • audio features are extracted with the pre-trained audio model PANNs. The UCF101 audio features are provided under the directory datasets/UCF101. Please uncompress the audiocnn14embed512_features.tar.gz file.
  • video data is converted to the HDF5 format using the following command. Please specify the data directory ${UCF101_DATA_DIR}, e.g. datasets/UCF101/UCF-101. Note: the video data can be downloaded here.
python util_scripts/generate_video_hdf5.py --dir_path=${UCF101_DATA_DIR} --dst_path=datasets/UCF101/hdf5data --dataset=ucf101
  • prepare the json file for the dataloader using the following command. Note: the official data splits can be downloaded here.
python util_scripts/ucf101_json.py --dir_path=datasets/UCF101/ucfTrainTestlist --video_path=datasets/UCF101/hdf5data --audio_path=datasets/UCF101/audiocnn14embed512_features --dst_path=datasets/UCF101/ --video_type=hdf5
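After running the two preparation commands, a quick sanity check on the generated annotation file can catch label or path mismatches before training. The helper below is hypothetical: it assumes the json follows the 3D-ResNets-PyTorch annotation layout (a top-level "labels" list and a "database" dict with per-video "annotations"); adjust the field names if the actual output differs.

```python
import json

def check_annotation_json(json_path):
    """Sanity-check a dataloader annotation json.

    Returns (number of labels, number of videos, number of videos
    whose label is missing from the label list).
    """
    with open(json_path) as f:
        anno = json.load(f)
    labels = anno.get("labels", [])
    videos = anno.get("database", {})
    n_missing = sum(
        1 for v in videos.values()
        if v.get("annotations", {}).get("label") not in labels
    )
    return len(labels), len(videos), n_missing
```

For UCF101 one would expect 101 labels and zero missing entries; a non-zero missing count usually points at a mismatch between the split files and the video directory names.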

Training & Testing:

The running commands for both training and testing are written in the same script file. Experiments are conducted on 2 GPUs. Please refer to the script files in the directory scripts for details. Use the following commands to train and test on the UCF101 dataset.

  • baseline (w/o distillation)
sh scripts/run_baseline.sh
  • CCL (A): distilling audio knowledge from the pre-trained audio teacher model (audiocnn14)
sh scripts/run_ccl_audio.sh
  • CCL (I): distilling image knowledge from the pre-trained image teacher model (resnet34)
sh scripts/run_ccl_image.sh
  • CCL (AI): distilling audio and image knowledge from the pre-trained audio and image teacher models
sh scripts/run_ccl_ai.sh

Bibtex

@inproceedings{chen2021distilling,
  title={Distilling Audio-Visual Knowledge by Compositional Contrastive Learning},
  author={Chen, Yanbei and Xian, Yongqin and Koepke, Sophia and Shan, Ying and Akata, Zeynep},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition},
  year={2021},
  organization={IEEE}
}

Acknowledgement

This repository is partially built on two open-source implementations: (1) 3D-ResNets-PyTorch is used in video data preparation; (2) PANNs is used for audio feature extraction.
