
martinetoering / ViCC

Licence: other
[WACV'22] Code repository for the paper "Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting", https://arxiv.org/abs/2106.10137.

Programming Languages

python
shell

Projects that are alternatives to or similar to ViCC

Simclr
SimCLRv2 - Big Self-Supervised Models are Strong Semi-Supervised Learners
Stars: ✭ 2,720 (+8142.42%)
Mutual labels:  unsupervised-learning, self-supervised-learning, contrastive-learning
TCE
This repository contains the code implementation used in the paper Temporally Coherent Embeddings for Self-Supervised Video Representation Learning (TCE).
Stars: ✭ 51 (+54.55%)
Mutual labels:  action-recognition, self-supervised-learning, contrastive-learning
temporal-ssl
Video Representation Learning by Recognizing Temporal Transformations. In ECCV, 2020.
Stars: ✭ 46 (+39.39%)
Mutual labels:  unsupervised-learning, action-recognition, self-supervised-learning
PIC
Parametric Instance Classification for Unsupervised Visual Feature Learning, NeurIPS 2020
Stars: ✭ 41 (+24.24%)
Mutual labels:  unsupervised-learning, self-supervised-learning, contrastive-learning
Revisiting-Contrastive-SSL
Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations. [NeurIPS 2021]
Stars: ✭ 81 (+145.45%)
Mutual labels:  unsupervised-learning, self-supervised-learning, contrastive-learning
CLSA
Official implementation for "Contrastive Learning with Stronger Augmentations"
Stars: ✭ 48 (+45.45%)
Mutual labels:  unsupervised-learning, self-supervised-learning, contrastive-learning
Sfmlearner
An unsupervised learning framework for depth and ego-motion estimation from monocular videos
Stars: ✭ 1,661 (+4933.33%)
Mutual labels:  unsupervised-learning, self-supervised-learning
Hidden Two Stream
Caffe implementation for "Hidden Two-Stream Convolutional Networks for Action Recognition"
Stars: ✭ 179 (+442.42%)
Mutual labels:  unsupervised-learning, action-recognition
CCL
PyTorch implementation of the paper [CVPR 2021] Distilling Audio-Visual Knowledge by Compositional Contrastive Learning
Stars: ✭ 76 (+130.3%)
Mutual labels:  video-recognition, contrastive-learning
SCL
📄 Spatial Contrastive Learning for Few-Shot Classification (ECML/PKDD 2021).
Stars: ✭ 42 (+27.27%)
Mutual labels:  self-supervised-learning, contrastive-learning
PiCIE
PiCIE: Unsupervised Semantic Segmentation using Invariance and Equivariance in clustering (CVPR2021)
Stars: ✭ 102 (+209.09%)
Mutual labels:  unsupervised-learning, self-supervised-learning
TA3N
[ICCV 2019 Oral] TA3N: https://github.com/cmhungsteve/TA3N (Most updated repo)
Stars: ✭ 45 (+36.36%)
Mutual labels:  unsupervised-learning, action-recognition
conv3d-video-action-recognition
My experimentation around action recognition in videos. Contains Keras implementation for C3D network based on original paper "Learning Spatiotemporal Features with 3D Convolutional Networks", Tran et al. and it includes video processing pipelines coded using mPyPl package. Model is being benchmarked on popular UCF101 dataset and achieves result…
Stars: ✭ 50 (+51.52%)
Mutual labels:  action-recognition, video-recognition
Transferlearning
Transfer learning / domain adaptation / domain generalization / multi-task learning, etc.: papers, code, datasets, applications, and tutorials on transfer learning.
Stars: ✭ 8,481 (+25600%)
Mutual labels:  unsupervised-learning, self-supervised-learning
adareg-monodispnet
Repository for Bilateral Cyclic Constraint and Adaptive Regularization for Unsupervised Monocular Depth Prediction (CVPR2019)
Stars: ✭ 22 (-33.33%)
Mutual labels:  unsupervised-learning, self-supervised-learning
learning-topology-synthetic-data
Tensorflow implementation of Learning Topology from Synthetic Data for Unsupervised Depth Completion (RAL 2021 & ICRA 2021)
Stars: ✭ 22 (-33.33%)
Mutual labels:  unsupervised-learning, self-supervised-learning
object-aware-contrastive
Object-aware Contrastive Learning for Debiased Scene Representation (NeurIPS 2021)
Stars: ✭ 44 (+33.33%)
Mutual labels:  self-supervised-learning, contrastive-learning
naru
Neural Relation Understanding: neural cardinality estimators for tabular data
Stars: ✭ 76 (+130.3%)
Mutual labels:  unsupervised-learning, self-supervised-learning
al-fk-self-supervision
Official PyTorch code for CVPR 2020 paper "Deep Active Learning for Biased Datasets via Fisher Kernel Self-Supervision"
Stars: ✭ 28 (-15.15%)
Mutual labels:  unsupervised-learning, self-supervised-learning
VQ-APC
Vector Quantized Autoregressive Predictive Coding (VQ-APC)
Stars: ✭ 34 (+3.03%)
Mutual labels:  unsupervised-learning, self-supervised-learning

Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting


This repository provides the implementation of the WACV 2022 paper: Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting.

Video Cross-Stream Prototypical Contrasting (ViCC)

We use both optical flow and RGB as views for contrastive learning by predicting consistent stream prototype assignments from the views during the training of each model. This effectively transfers knowledge from motion (flow) to appearance (RGB).
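The --epsilon, --sinkhorn_iterations and --nmb_prototypes flags in the pretraining commands suggest SwAV-style prototype assignments computed with the Sinkhorn-Knopp algorithm. The following is a minimal pure-Python sketch under that assumption (it is not taken from the ViCC code): it turns a batch of sample-to-prototype similarity scores into soft assignments in which the prototypes are used roughly equally across the batch.

```python
import math
from typing import List

Matrix = List[List[float]]


def sinkhorn(scores: Matrix, epsilon: float = 0.05, n_iters: int = 3) -> Matrix:
    """Soft prototype assignments via Sinkhorn-Knopp (SwAV-style sketch).

    scores: B x K similarities between B samples and K prototypes.
    Returns a B x K matrix whose rows each sum to 1.
    """
    n_samples, n_protos = len(scores), len(scores[0])
    # Sharpen the scores with epsilon; subtracting the max avoids overflow in exp()
    # and cancels out in the normalisations below.
    top = max(max(row) for row in scores)
    q = [[math.exp((s - top) / epsilon) for s in row] for row in scores]
    total = sum(sum(row) for row in q)
    q = [[v / total for v in row] for row in q]
    for _ in range(n_iters):
        # Give every prototype (column) a total mass of 1/K ...
        col_sums = [sum(q[b][k] for b in range(n_samples)) for k in range(n_protos)]
        q = [[q[b][k] / (col_sums[k] * n_protos) for k in range(n_protos)]
             for b in range(n_samples)]
        # ... then give every sample (row) a total mass of 1/B.
        row_sums = [sum(row) for row in q]
        q = [[v / (row_sums[b] * n_samples) for v in q[b]] for b in range(n_samples)]
    # Rescale so that each row is a distribution over prototypes.
    return [[v * n_samples for v in row] for row in q]


q = sinkhorn([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
print([[round(v, 3) for v in row] for row in q])
```

The small epsilon (0.05 in the commands) makes the assignments close to one-hot, while the column normalisation discourages all samples from collapsing onto a single prototype.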

Training process

The method consists of two stages. In the single-stream stage, the RGB and Flow encoders are each trained on their own features. In the cross-stream stage, both models are trained on both feature types, alternating between them: in each alternation, one model and its corresponding prototypes are optimized.
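The cross-stream commands pass --nmb_views 2 2 and --views_for_assign 0 1 2 3, which suggests two RGB views and two flow views, all used to compute assignment targets. The sketch below shows one plausible reading, assuming a SwAV-style swapped-prediction loss: each view's softmax over prototype scores is trained to predict the assignment of every other view. The function names, the view ordering (RGB first, then flow) and treating the temperature as --moco-t are illustrative assumptions, not the repository's code.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def softmax_rows(scores: Matrix, temperature: float) -> Matrix:
    """Row-wise softmax of prototype scores divided by a temperature."""
    out = []
    for row in scores:
        top = max(row)
        exps = [math.exp((s - top) / temperature) for s in row]
        z = sum(exps)
        out.append([e / z for e in exps])
    return out


def swapped_prediction_loss(view_scores: Sequence[Matrix],
                            view_assignments: Sequence[Matrix],
                            assign_ids: Sequence[int],
                            temperature: float = 0.1) -> float:
    """Cross-entropy between each assigning view's targets and every other view's prediction.

    view_scores[v]: B x K prototype scores of view v (e.g. views 0-1 RGB, 2-3 flow).
    view_assignments[v]: B x K soft assignments (targets) for view v.
    assign_ids: views whose assignments serve as targets (cf. --views_for_assign).
    """
    probs = [softmax_rows(s, temperature) for s in view_scores]
    total, terms = 0.0, 0
    for i in assign_ids:
        q = view_assignments[i]
        for v, p in enumerate(probs):
            if v == i:
                continue  # a view never predicts its own assignment
            # Batch mean of -sum_k q_k * log p_k
            ce = -sum(sum(qk * math.log(pk) for qk, pk in zip(q_row, p_row))
                      for q_row, p_row in zip(q, p)) / len(q)
            total += ce
            terms += 1
    return total / terms
```

Because flow views can serve as targets for RGB views (and vice versa), a low loss requires both streams to agree on prototype assignments, which is one way motion information could reach the appearance model.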

Results

Nearest-neighbour video retrieval results on UCF101:

Model          R@1 (%)
ViCC-RGB-2     62.1
ViCC-Flow-2    59.7
ViCC-R+F-2     65.1

Results on end-to-end finetuning for action recognition:

News

  • Pretrained models for S3D are now available (2021-08)
  • Two more pretrained models for R(2+1)D are linked below (2022-08)

References

How to run the code

Get started

Requirements

  • Python 3.6
  • PyTorch 1.4.0, torchvision 0.5.0
  • CUDA 10.1
  • Apex with CUDA extension (see also: this issue)
  • Other dependencies (see the environment file): tqdm, pandas, python-lmdb 0.98, msgpack==1.0.0, msgpack-python==0.5.6.

Preprocessing

Follow the instructions in process_data.
Optionally, see CoCLR for the datasets (last checked: 2021-07-03).

Pretrain and Evaluation

We provide several Slurm scripts for pretraining, as well as for linear probe, retrieval and finetuning experiments. Change the paths in the scripts to your own.
Distributed training is available via Slurm; the distributed initialization method must be set correctly with the dist_url parameter.
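The README does not show how dist_url is built. As a hedged sketch, assuming one process per GPU launched by Slurm, the rank and world size can be read from Slurm's SLURM_PROCID and SLURM_NTASKS variables and the URL formed as a tcp:// address of a chosen master node. The helper name, the default port and passing the master host explicitly are assumptions for illustration.

```python
import os
from typing import Mapping, NamedTuple, Optional


class DistSettings(NamedTuple):
    dist_url: str    # value for the dist_url parameter / init_method
    rank: int        # global rank of this process
    world_size: int  # total number of processes


def slurm_dist_settings(master_addr: str, port: int = 23456,
                        env: Optional[Mapping[str, str]] = None) -> DistSettings:
    """Derive distributed init settings from Slurm variables (hypothetical helper)."""
    env = os.environ if env is None else env
    rank = int(env.get("SLURM_PROCID", "0"))        # global task index
    world_size = int(env.get("SLURM_NTASKS", "1"))  # total tasks in the job step
    return DistSettings(f"tcp://{master_addr}:{port}", rank, world_size)
```

The result could then be passed to torch.distributed.init_process_group(backend="nccl", init_method=s.dist_url, rank=s.rank, world_size=s.world_size). Every process must use the same master address and port.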

How to run: pretraining

The algorithm consists of two stages (following CoCLR):

  • Single-stream: RGB model is trained on RGB data, then Flow on flow data.
  • Cross-stream: Both models are initialized with single-stream models. RGB is trained on both RGB and Flow data, then Flow is trained on RGB and Flow data. Repeat for N alternations.

Single-stream

Train ViCC-RGB-1 (Single-stream):

sbatch slurm_scripts/pretrain/single-rgb.sh

or:

cd src

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch \
--nproc_per_node=4 main_single.py --net s3d --model vicc --dataset ucf101-2clip \
--seq_len 32 --num_seq 2 --ds 1 --batch_size 48 --wd 1e-6 --cos True \
--base_lr 0.6 --final_lr 0.0006 \
--epochs 500 --save_epoch 199 --optim sgd --img_dim 128 \
--dataset_root {DATASET_PATH} --prefix {EXPERIMENT_PATH} --name_prefix "single/rgb" \
--workers 12 --moco-dim 128 --moco-k 1920 --moco-t 0.1 \
--views_for_assign 0 1 --nmb_views 2 --epsilon 0.05 --sinkhorn_iterations 3 \
--nmb_prototypes 300 --epoch_queue_starts 200 --freeze_prototypes_nepochs 100 --use_fp16 False 
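The single-stream command combines several schedule-related flags. The sketch below shows how they might interact, assuming SwAV-like semantics: --cos True means cosine decay from --base_lr to --final_lr over --epochs, prototypes stay fixed for the first --freeze_prototypes_nepochs epochs, and a feature queue of length --moco-k is used from --epoch_queue_starts onward. These semantics, and reading --batch_size as per-process, are assumptions not confirmed by the README.

```python
import math
from typing import NamedTuple


class EpochPlan(NamedTuple):
    lr: float
    freeze_prototypes: bool
    use_queue: bool


def epoch_plan(epoch: int, epochs: int = 500, base_lr: float = 0.6,
               final_lr: float = 0.0006, freeze_prototypes_nepochs: int = 100,
               epoch_queue_starts: int = 200) -> EpochPlan:
    """Per-epoch settings under assumed SwAV-like semantics (0-indexed epochs)."""
    progress = epoch / epochs
    # Cosine decay from base_lr (epoch 0) towards final_lr (last epoch).
    lr = final_lr + 0.5 * (base_lr - final_lr) * (1 + math.cos(math.pi * progress))
    return EpochPlan(lr=lr,
                     freeze_prototypes=epoch < freeze_prototypes_nepochs,
                     use_queue=epoch >= epoch_queue_starts)


def queue_fits(moco_k: int, batch_size: int, n_procs: int) -> bool:
    """SwAV trims its queue to a multiple of the global batch; check that it divides evenly."""
    return moco_k % (batch_size * n_procs) == 0


for e in (0, 100, 250, 499):
    print(e, epoch_plan(e))
```

With the values above, 1920 is a multiple of the global batch 48 × 4 = 192, and also of 24 × 4 = 96 for the cross-stream commands.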

Train ViCC-Flow-1 (Single-stream):

sbatch slurm_scripts/pretrain/single-flow.sh

or:

cd src

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch \
--nproc_per_node=4 main_single.py --net s3d --model vicc --dataset ucf101-f-2clip \
--seq_len 32 --num_seq 2 --ds 1 --batch_size 48 --wd 1e-6 --cos True \
--base_lr 0.6 --final_lr 0.0006 \
--epochs 500 --save_epoch 199 --optim sgd --img_dim 128 \
--dataset_root {DATASET_PATH} --prefix {EXPERIMENT_PATH} --name_prefix "single/flow" \
--workers 12 --moco-dim 128 --moco-k 1920 --moco-t 0.1 \
--views_for_assign 0 1 --nmb_views 2 --epsilon 0.05 --sinkhorn_iterations 3 \
--nmb_prototypes 300 --epoch_queue_starts 200 --freeze_prototypes_nepochs 100 --use_fp16 False 
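The RGB and Flow single-stream commands differ only in --dataset and --name_prefix. If you prefer launching runs from Python, a small helper can assemble both from shared arguments. This is a hypothetical convenience sketch: it reuses only the flags shown above, keeps {DATASET_PATH} and {EXPERIMENT_PATH} as placeholders, omits CUDA_VISIBLE_DEVICES, and prints the command instead of executing it.

```python
import shlex
from typing import Dict, List

# Arguments shared by both single-stream runs (copied from the commands above).
COMMON: Dict[str, str] = {
    "--net": "s3d", "--model": "vicc", "--seq_len": "32", "--num_seq": "2",
    "--ds": "1", "--batch_size": "48", "--wd": "1e-6", "--cos": "True",
    "--base_lr": "0.6", "--final_lr": "0.0006", "--epochs": "500",
    "--save_epoch": "199", "--optim": "sgd", "--img_dim": "128",
    "--dataset_root": "{DATASET_PATH}", "--prefix": "{EXPERIMENT_PATH}",
    "--workers": "12", "--moco-dim": "128", "--moco-k": "1920", "--moco-t": "0.1",
    "--views_for_assign": "0 1", "--nmb_views": "2", "--epsilon": "0.05",
    "--sinkhorn_iterations": "3", "--nmb_prototypes": "300",
    "--epoch_queue_starts": "200", "--freeze_prototypes_nepochs": "100",
    "--use_fp16": "False",
}

# Only these arguments change between the RGB and Flow runs.
STREAMS: Dict[str, Dict[str, str]] = {
    "rgb": {"--dataset": "ucf101-2clip", "--name_prefix": "single/rgb"},
    "flow": {"--dataset": "ucf101-f-2clip", "--name_prefix": "single/flow"},
}


def single_stream_argv(stream: str, nproc: int = 4) -> List[str]:
    """Build the argument list for one single-stream run (to be run from src/)."""
    argv = ["python", "-m", "torch.distributed.launch",
            f"--nproc_per_node={nproc}", "main_single.py"]
    for flag, value in {**COMMON, **STREAMS[stream]}.items():
        argv.append(flag)
        argv.extend(value.split())  # multi-value flags such as "0 1"
    return argv


if __name__ == "__main__":
    print(" ".join(shlex.quote(a) for a in single_stream_argv("flow")))
```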

Cross-stream

Train ViCC-RGB-2 and ViCC-Flow-2:

sbatch slurm_scripts/pretrain/cross.sh

or:

cd src

Cycle 1 RGB:

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch \
--nproc_per_node=4 main_cross.py --net s3d --model 'vicc2' --dataset 'ucf101-2stream-2clip' \
--seq_len 32 --num_seq 2 --ds 1 --batch_size 24 --wd 1e-6 --cos True \
--base_lr 0.6 --final_lr 0.0006 --pretrain {ViCC-RGB-1-SINGLE.pth.tar} {ViCC-Flow-1-SINGLE.pth.tar} \
--epochs 100 --save_epoch 24 --optim sgd --img_dim 128 \
--dataset_root {DATASET_PATH} --prefix {EXPERIMENT_PATH} --name_prefix "cross/c1-flow-mining" \
--workers 12 --moco-dim 128 --moco-k 1920 --moco-t 0.1 \
--views_for_assign 0 1 2 3 --nmb_views 2 2 --epsilon 0.05 --sinkhorn_iterations 3 \
--nmb_prototypes 300 --epoch_queue_starts 25 --freeze_prototypes_nepochs 0 --use_fp16 True

Cycle 1 Flow (note the --reverse argument):

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch \
--nproc_per_node=4 main_cross.py --net s3d --model 'vicc2' --dataset 'ucf101-2stream-2clip' \
--seq_len 32 --num_seq 2 --ds 1 --batch_size 24 --wd 1e-6 --cos True \
--base_lr 0.6 --final_lr 0.0006 --pretrain {ViCC-Flow-1-SINGLE.pth.tar} {ViCC-RGB-2-CYCLE-1.pth.tar}  \
--epochs 100 --save_epoch 24 --optim sgd --img_dim 128 \
--dataset_root {DATASET_PATH} --prefix {EXPERIMENT_PATH} --name_prefix "cross/c1-rgb-mining" \
--workers 12 --moco-dim 128 --moco-k 1920 --moco-t 0.1 \
--views_for_assign 0 1 2 3 --nmb_views 2 2 --epsilon 0.05 --sinkhorn_iterations 3 \
--nmb_prototypes 300 --epoch_queue_starts 25 --freeze_prototypes_nepochs 0 --use_fp16 True \
--reverse

Repeat the above two commands for the second cycle (Cycle 2 RGB, Cycle 2 Flow), using the newest checkpoints in each run.
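The checkpoint hand-off between runs follows a fixed pattern: each RGB run takes the latest RGB and Flow checkpoints, and each Flow run (with --reverse) takes the latest Flow checkpoint plus the RGB checkpoint just produced. The sketch below enumerates that chain for N cycles. The first labels mirror the placeholders in the commands above; labels for later checkpoints (e.g. ViCC-Flow-2-CYCLE-1) are hypothetical names chosen by analogy.

```python
from typing import List, NamedTuple


class CrossRun(NamedTuple):
    cycle: int
    stream: str          # model being trained in this run
    pretrain: List[str]  # values for --pretrain, in order
    reverse: bool        # whether to pass --reverse
    output: str          # label for the checkpoint this run produces


def cross_stream_schedule(n_cycles: int = 2) -> List[CrossRun]:
    """Enumerate alternating cross-stream runs and their checkpoint inputs."""
    rgb, flow = "ViCC-RGB-1-SINGLE", "ViCC-Flow-1-SINGLE"
    runs = []
    for c in range(1, n_cycles + 1):
        new_rgb = f"ViCC-RGB-2-CYCLE-{c}"
        runs.append(CrossRun(c, "rgb", [rgb, flow], False, new_rgb))
        rgb = new_rgb  # the Flow run of this cycle uses the fresh RGB model
        new_flow = f"ViCC-Flow-2-CYCLE-{c}"
        runs.append(CrossRun(c, "flow", [flow, rgb], True, new_flow))
        flow = new_flow
    return runs


for run in cross_stream_schedule():
    flags = " ".join(f"{{{p}.pth.tar}}" for p in run.pretrain)
    print(f"cycle {run.cycle} {run.stream}: --pretrain {flags}"
          + (" --reverse" if run.reverse else ""))
```

The first two printed lines reproduce the --pretrain arguments of Cycle 1 RGB and Cycle 1 Flow; the remaining lines show the second cycle.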

How to run: evaluation

Use, e.g., sbatch slurm_scripts/eval/retr-rgb-2.sh, sbatch slurm_scripts/eval/lin-rgb-2.sh or sbatch slurm_scripts/eval/ft-rgb-2.sh. The '2' in the script names refers to the cross-stream models; single-stream models can be evaluated in the same way.

or:

cd src/eval

Nearest-neighbour video retrieval

For RGB:

CUDA_VISIBLE_DEVICES=0,1 python main_classifier.py --net s3d --dataset ucf101 \
--seq_len 32 --ds 1 --retrieval \
--dirname {FEATURE_PATH} --test {TEST_PATH} --dataset_root {DATASET_PATH}

Use the --dataset 'ucf101-f' argument for flow.
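For reference, nearest-neighbour retrieval R@k is commonly computed as follows: each test clip's feature queries the training-set features, and the query counts as a hit if any of its k nearest training clips (by cosine similarity) has the same class label. This is a generic sketch of that protocol, not the code in main_classifier.py; the function name and toy features are illustrative.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def recall_at_k(test_feats: Sequence[Sequence[float]], test_labels: Sequence[int],
                train_feats: Sequence[Sequence[float]], train_labels: Sequence[int],
                k: int = 1) -> float:
    """Fraction of test clips whose top-k cosine neighbours in the train set share their class."""
    train = [_normalize(f) for f in train_feats]
    hits = 0
    for feat, label in zip(test_feats, test_labels):
        q = _normalize(feat)
        sims = [sum(a * b for a, b in zip(q, t)) for t in train]
        # Indices of the k most similar training clips.
        top_k = sorted(range(len(train)), key=lambda i: sims[i], reverse=True)[:k]
        hits += any(train_labels[i] == label for i in top_k)
    return hits / len(test_feats)


train_x, train_y = [[1.0, 0.0], [0.0, 1.0]], [0, 1]
test_x, test_y = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]], [0, 1, 1]
print(recall_at_k(test_x, test_y, train_x, train_y, k=1))  # 2 of 3 queries hit
```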

Linear probe

For RGB, e.g.:

CUDA_VISIBLE_DEVICES=0,1,2,3 python main_classifier.py --net 's3d' --dataset 'ucf101' \
--seq_len 32 --ds 1 --batch_size 32 --train_what last --optim sgd --lr 1e-1 --wd 1e-3 \
--epochs 100 --schedule 60 80 --name_prefix "lin-rgb-2" \
--prefix {EXPERIMENT_PATH} --pretrain {PRETRAIN_PATH} --dataset_root {DATASET_PATH} 

Use the --dataset 'ucf101-f' argument for flow.
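The --schedule 60 80 argument (linear probe) and --schedule 200 300 400 450 (finetuning) list the epochs at which the learning rate is reduced. A minimal sketch, assuming a step decay by a factor of 0.1 at each listed epoch; the factor is an assumption, so check main_classifier.py for the actual value. In PyTorch the same behaviour is provided by torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones, gamma=0.1).

```python
import bisect
from typing import Sequence


def step_lr(epoch: int, base_lr: float, milestones: Sequence[int], gamma: float = 0.1) -> float:
    """Learning rate after decaying by `gamma` at every milestone already reached."""
    n_decays = bisect.bisect_right(sorted(milestones), epoch)
    return base_lr * gamma ** n_decays


# Linear probe: --lr 1e-1 --schedule 60 80
print([step_lr(e, 0.1, [60, 80]) for e in (0, 60, 80)])
```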

Test the linear probe:

CUDA_VISIBLE_DEVICES=0,1,2,3 python main_classifier.py --net s3d --dataset 'ucf101' \
--batch_size 32 --seq_len 32 --ds 1 --train_what last --ten_crop \
--prefix {EXPERIMENT_PATH} --test {TEST_PATH} --dataset_root {DATASET_PATH}
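The --ten_crop flag evaluates each clip on multiple spatial crops. A common scheme, assumed here, uses the four corners and the centre plus their horizontal flips (as torchvision.transforms.TenCrop does for images) and averages the class probabilities before taking the argmax. A minimal sketch of the averaging step; the function name is hypothetical and crop extraction is left out.

```python
from typing import Sequence


def ten_crop_predict(crop_probs: Sequence[Sequence[float]]) -> int:
    """Average per-crop class probabilities and return the predicted class index."""
    n_crops = len(crop_probs)
    n_classes = len(crop_probs[0])
    mean = [sum(p[c] for p in crop_probs) / n_crops for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: mean[c])


# Six crops mildly prefer class 0, four crops strongly prefer class 1.
crops = [[0.6, 0.4]] * 6 + [[0.1, 0.9]] * 4
print(ten_crop_predict(crops))  # 1
```

Averaging can differ from a majority vote: here most crops prefer class 0, but the mean probability favours class 1 (0.6 vs 0.4).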

End-to-end finetuning

For RGB, e.g.:

CUDA_VISIBLE_DEVICES=0,1,2,3 python main_classifier.py --net 's3d' --dataset 'ucf101' \
--seq_len 32 --ds 1 --batch_size 32 --train_what ft --optim sgd --lr 0.1 --wd 0.001 \
--epochs 500 --schedule 200 300 400 450 --name_prefix "ft-rgb-2" \
--prefix {EXPERIMENT_PATH} --pretrain {PRETRAIN_PATH} --dataset_root {DATASET_PATH}

Use the --dataset 'ucf101-f' argument for flow.

Test the finetuned model:

CUDA_VISIBLE_DEVICES=0,1,2,3 python main_classifier.py --net s3d --dataset 'ucf101' \
--batch_size 32 --seq_len 32 --ds 1 --train_what ft --ten_crop \
--prefix {EXPERIMENT_PATH} --test {TEST_PATH} --dataset_root {DATASET_PATH}

Pretrained models

S3D:

R(2+1)D:

Single-stream S3D:

Citation

If you find this repository helpful in your research, please consider citing our paper:

@article{toering2022selfsupervised,
    title={Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting}, 
    author={Martine Toering and Ioannis Gatopoulos and Maarten Stol and Vincent Tao Hu},
    journal={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    year={2022}
}

Acknowledgements

This work was supported and funded by the University of Amsterdam and BrainCreators B.V.

Author
Martine Toering, 2022
