
yanbeic / CCL

License: Apache-2.0
PyTorch implementation of the paper [CVPR 2021] Distilling Audio-Visual Knowledge by Compositional Contrastive Learning


Compositional Contrastive Learning

PyTorch implementation of Distilling Audio-Visual Knowledge by Compositional Contrastive Learning.

Introduction

Distilling knowledge from pre-trained teacher models helps to learn a small student model that generalizes better. While existing works mostly focus on distilling knowledge within the same modality, we explore distilling the multi-modal knowledge available in video data (i.e. audio and vision). Specifically, we propose to transfer audio and visual knowledge from pre-trained image and audio teacher models to learn more expressive video representations.

In multi-modal distillation, there often exists a semantic gap across modalities, e.g. a video shows applying lipstick visually while its accompanying audio is music. To ensure effective multi-modal distillation in the presence of a cross-modal semantic gap, we propose compositional contrastive learning, which features learnable compositional embeddings to close the cross-modal semantic gap, and a multi-class contrastive distillation objective to align different modalities jointly in a shared latent space.
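To make the idea of multi-class contrastive distillation concrete, here is a minimal NumPy sketch, not the paper's exact objective: it treats every teacher embedding that shares a student sample's class label as a positive in an InfoNCE-style softmax over the shared latent space. The function name, temperature value, and choice of positives are illustrative assumptions.

```python
import numpy as np

def contrastive_distillation_loss(student, teacher, labels, tau=0.1):
    """Illustrative multi-class contrastive distillation loss.

    student, teacher: (N, D) embeddings projected into a shared space.
    labels: (N,) integer class labels; all teacher samples with the
    same label as a student sample count as positives.
    """
    # L2-normalise both embedding sets so similarities are cosine-based
    s = student / np.linalg.norm(student, axis=1, keepdims=True)
    t = teacher / np.linalg.norm(teacher, axis=1, keepdims=True)
    logits = s @ t.T / tau                        # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    # positive mask: teacher samples sharing the student's class label
    pos = (labels[:, None] == labels[None, :]).astype(float)
    # average probability mass assigned to positives, per student sample
    per_sample = -np.log((probs * pos).sum(axis=1) / pos.sum(axis=1))
    return per_sample.mean()
```

In words: each student embedding is pulled toward the teacher embeddings of its own class and pushed away from all others, which aligns the two modalities class-wise rather than only instance-wise.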

We demonstrate our method can distill knowledge from the audio and visual modalities to learn a stronger video model for recognition and retrieval tasks on video action recognition datasets.

Getting Started

Prerequisites:

Data Preparation on UCF101 (example):

  • audio features are extracted with the pre-trained audio model PANNs. The UCF101 audio features are provided under the directory datasets/UCF101. Please uncompress the audiocnn14embed512_features.tar.gz file.
  • video data is converted to the HDF5 format using the following command. Please specify the data directory ${UCF101_DATA_DIR}, e.g. datasets/UCF101/UCF-101. Note: the video data can be downloaded here.
python util_scripts/generate_video_hdf5.py --dir_path=${UCF101_DATA_DIR} --dst_path=datasets/UCF101/hdf5data --dataset=ucf101
  • prepare the json file for the dataloader using the following command. Note: the official data splits can be downloaded here.
python util_scripts/ucf101_json.py --dir_path=datasets/UCF101/ucfTrainTestlist --video_path=datasets/UCF101/hdf5data --audio_path=datasets/UCF101/audiocnn14embed512_features --dst_path=datasets/UCF101/ --video_type=hdf5
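After running the two preparation commands, a quick sanity check on the generated annotation file can catch label or path mismatches before training. The helper below is hypothetical: it assumes the json follows the 3D-ResNets-PyTorch annotation layout (a top-level "labels" list and a "database" dict with per-video "annotations"); adjust the field names if the actual output differs.

```python
import json

def check_annotation_json(json_path):
    """Sanity-check a dataloader annotation json.

    Returns (number of labels, number of videos, number of videos
    whose label is missing from the label list).
    """
    with open(json_path) as f:
        anno = json.load(f)
    labels = anno.get("labels", [])
    videos = anno.get("database", {})
    n_missing = sum(
        1 for v in videos.values()
        if v.get("annotations", {}).get("label") not in labels
    )
    return len(labels), len(videos), n_missing
```

For UCF101 one would expect 101 labels and zero missing entries; a non-zero missing count usually points at a mismatch between the split files and the video directory names.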

Training & Testing:

The running commands for both training and testing are written in the same script file. Experiments are conducted on 2 GPUs. Please refer to the script files in the directory scripts for details. Use the following commands to train and test on the UCF101 dataset.

  • baseline (w/o distillation)
sh scripts/run_baseline.sh
  • CCL (A): distilling audio knowledge from the pre-trained audio teacher model (audiocnn14)
sh scripts/run_ccl_audio.sh
  • CCL (I): distilling image knowledge from the pre-trained image teacher model (resnet34)
sh scripts/run_ccl_image.sh
  • CCL (AI): distilling audio and image knowledge from the pre-trained audio and image teacher models
sh scripts/run_ccl_ai.sh

Bibtex

@inproceedings{chen2021distilling,
  title={Distilling Audio-Visual Knowledge by Compositional Contrastive Learning},
  author={Chen, Yanbei and Xian, Yongqin and Koepke, Sophia and Shan, Ying and Akata, Zeynep},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition},
  year={2021},
  organization={IEEE}
}

Acknowledgement

This repository is partially built on two open-source implementations: (1) 3D-ResNets-PyTorch is used in video data preparation; (2) PANNs is used for audio feature extraction.
