All Projects → TencentYoutuResearch → SelfSupervisedLearning-DSM

TencentYoutuResearch / SelfSupervisedLearning-DSM

Licence: other
code for AAAI21 paper "Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion“

Programming Languages

python
139335 projects - #7 most used programming language
shell
77523 projects

Projects that are alternatives of or similar to SelfSupervisedLearning-DSM

Awesome-Vision-Transformer-Collection
Variants of Vision Transformer and its downstream tasks
Stars: ✭ 124 (+376.92%)
Mutual labels:  self-supervised-learning
newt
Natural World Tasks
Stars: ✭ 24 (-7.69%)
Mutual labels:  self-supervised-learning
Self-Supervised-Embedding-Fusion-Transformer
The code for our IEEE ACCESS (2020) paper Multimodal Emotion Recognition with Transformer-Based Self Supervised Feature Fusion.
Stars: ✭ 57 (+119.23%)
Mutual labels:  self-supervised-learning
pillar-motion
Self-Supervised Pillar Motion Learning for Autonomous Driving (CVPR 2021)
Stars: ✭ 98 (+276.92%)
Mutual labels:  self-supervised-learning
GCL
List of Publications in Graph Contrastive Learning
Stars: ✭ 25 (-3.85%)
Mutual labels:  self-supervised-learning
BossNAS
(ICCV 2021) BossNAS: Exploring Hybrid CNN-transformers with Block-wisely Self-supervised Neural Architecture Search
Stars: ✭ 125 (+380.77%)
Mutual labels:  self-supervised-learning
EC-GAN
EC-GAN: Low-Sample Classification using Semi-Supervised Algorithms and GANs (AAAI 2021)
Stars: ✭ 29 (+11.54%)
Mutual labels:  aaai2021
video repres mas
code for CVPR-2019 paper: Self-supervised Spatio-temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics
Stars: ✭ 63 (+142.31%)
Mutual labels:  self-supervised-learning
SoCo
[NeurIPS 2021 Spotlight] Aligning Pretraining for Detection via Object-Level Contrastive Learning
Stars: ✭ 125 (+380.77%)
Mutual labels:  self-supervised-learning
BYOL
Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning
Stars: ✭ 102 (+292.31%)
Mutual labels:  self-supervised-learning
mae-scalable-vision-learners
A TensorFlow 2.x implementation of Masked Autoencoders Are Scalable Vision Learners
Stars: ✭ 54 (+107.69%)
Mutual labels:  self-supervised-learning
lffont
Official PyTorch implementation of LF-Font (Few-shot Font Generation with Localized Style Representations and Factorization) AAAI 2021
Stars: ✭ 110 (+323.08%)
Mutual labels:  aaai2021
MSF
Official code for "Mean Shift for Self-Supervised Learning"
Stars: ✭ 42 (+61.54%)
Mutual labels:  self-supervised-learning
AttaNet
AttaNet for real-time semantic segmentation.
Stars: ✭ 37 (+42.31%)
Mutual labels:  aaai2021
G-SimCLR
This is the code base for paper "G-SimCLR : Self-Supervised Contrastive Learning with Guided Projection via Pseudo Labelling" by Souradip Chakraborty, Aritra Roy Gosthipaty and Sayak Paul.
Stars: ✭ 69 (+165.38%)
Mutual labels:  self-supervised-learning
awesome-graph-self-supervised-learning-based-recommendation
A curated list of awesome graph & self-supervised-learning-based recommendation.
Stars: ✭ 37 (+42.31%)
Mutual labels:  self-supervised-learning
simsiam-cifar10
Code to train the SimSiam model on cifar10 using PyTorch
Stars: ✭ 33 (+26.92%)
Mutual labels:  self-supervised-learning
FKD
A Fast Knowledge Distillation Framework for Visual Recognition
Stars: ✭ 49 (+88.46%)
Mutual labels:  self-supervised-learning
MiniVox
Code for our ACML and INTERSPEECH papers: "Speaker Diarization as a Fully Online Bandit Learning Problem in MiniVox".
Stars: ✭ 15 (-42.31%)
Mutual labels:  self-supervised-learning
CVPR21 PASS
PyTorch implementation of our CVPR2021 (oral) paper "Prototype Augmentation and Self-Supervision for Incremental Learning"
Stars: ✭ 55 (+111.54%)
Mutual labels:  self-supervised-learning

DSM

The source code for paper Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion

Datasets list and some visualizations/provided weights are preparing now.

1. Introduction (scene-dominated to motion-dominated)

Video datasets are usually scene-dominated, We propose to decouple the scene and the motion (DSM) with two simple operations, so that the model attention towards the motion information is better paid.

The generated triplet is as below:

What DSM learned?

With DSM pretrain, the model learn to focus on motion region (Not necessarily actor) powerful without one label available.

2. Installation

Dataset

Please refer dataset.md for details.

Requirements

  • Python3
  • pytorch1.1+
  • PIL
  • Intel (on the fly decode)

3. Structure

  • datasets
    • list
      • hmdb51: the train/val lists of HMDB51
      • ucf101: the train/val lists of UCF101
      • kinetics-400: the train/val lists of kinetics-400
      • diving48: the train/val lists of diving48
  • experiments
    • logs: experiments record in detials
    • gradientes: grad check
    • visualization:
  • src
    • data: load data
    • loss: the loss evaluate in this paper
    • model: network architectures
    • scripts: train/eval scripts
    • augment: detail implementation of Spatio-temporal Augmentation
    • utils
    • feature_extract.py: feature extractor given pretrained model
    • main.py: the main function of finetune
    • trainer.py
    • option.py
    • pt.py: self-supervised pretrain
    • ft.py: supervised finetune

DSM(Triplet)/DSM/Random

Self-supervised Pretrain

Kinetics
bash scripts/kinetics/pt.sh
UCF101
bash scripts/ucf101/pt.sh

Supervised Finetune (Clip-level)

HMDB51
bash scripts/hmdb51/ft.sh
UCF101
bash scripts/ucf101/ft.sh
Kinetics
bash scripts/kinetics/ft.sh

Video-level Evaluation

Following common practice TSN and Non-local. The final video-level result is average by 10 temporal window sampling + corner crop, which lead to better result than clip-level. Refer test.py for details.

Pretrain And Eval In one step

bash scripts/hmdb51/pt_and_ft_hmdb51.sh

Notice: More Training Options and ablation study Can be find in scripts

Video Retrieve and other visualization

(1). Feature Extractor

As STCR can be easily extend to other video representation task, we offer the scripts to perform feature extract.

python feature_extractor.py

The feature will be saved as a single numpy file in the format [video_nums,features_dim] for further visualization.

(2). Reterival Evaluation

modify line60-line62 in reterival.py.

python reterival.py

Results

Action Recognition

UCF101 Pretrained (I3D)

Method UCF101 HMDB51
Random Initialization 47.9 29.6
MoCo Baseline 62.3 36.5
DSM(Triplet) 70.7 48.5
DSM 74.8 52.5

Kinetics Pretrained

Video Retrieve (UCF101-C3D)

Method @1 @5 @10 @20 @50
DSM 16.8 33.4 43.4 54.6 70.7

Video Retrieve (HMDB51-C3D)

Method @1 @5 @10 @20 @50
DSM 8.2 25.9 38.1 52.0 75.0

More Visualization

Acknowledgement

This work is partly based on STN, UEL and MoCo.

License

Citation

If you use our code in your research or wish to refer to the baseline results, pleasuse use the followint BibTex entry.

@article{wang2020enhancing,
  title={Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion},
  author={Wang, Jinpeng and Gao, Yuting and Li, Ke and Hu, Jianguo and Jiang, Xinyang and Guo, Xiaowei and Ji, Rongrong and Sun, Xing},
  journal={AAAI},
  year={2021}
}
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].