szq0214 / FKD

License: MIT
A Fast Knowledge Distillation Framework for Visual Recognition

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives to or similar to FKD

esvit
EsViT: Efficient self-supervised Vision Transformers
Stars: ✭ 323 (+559.18%)
Mutual labels:  self-supervised-learning
CVPR21 PASS
PyTorch implementation of our CVPR2021 (oral) paper "Prototype Augmentation and Self-Supervision for Incremental Learning"
Stars: ✭ 55 (+12.24%)
Mutual labels:  self-supervised-learning
Self-Supervised-Embedding-Fusion-Transformer
The code for our IEEE ACCESS (2020) paper Multimodal Emotion Recognition with Transformer-Based Self Supervised Feature Fusion.
Stars: ✭ 57 (+16.33%)
Mutual labels:  self-supervised-learning
SoCo
[NeurIPS 2021 Spotlight] Aligning Pretraining for Detection via Object-Level Contrastive Learning
Stars: ✭ 125 (+155.1%)
Mutual labels:  self-supervised-learning
BossNAS
(ICCV 2021) BossNAS: Exploring Hybrid CNN-transformers with Block-wisely Self-supervised Neural Architecture Search
Stars: ✭ 125 (+155.1%)
Mutual labels:  self-supervised-learning
simpleAICV-pytorch-ImageNet-COCO-training
SimpleAICV: PyTorch training examples on the ImageNet (ILSVRC2012), COCO2017, and VOC2007+2012 datasets. Includes ResNet/DarkNet/RetinaNet/FCOS/CenterNet/TTFNet/YOLOv3/YOLOv4/YOLOv5/YOLOX.
Stars: ✭ 276 (+463.27%)
Mutual labels:  distillation
Zero-shot Knowledge Distillation Pytorch
ZSKD with PyTorch
Stars: ✭ 26 (-46.94%)
Mutual labels:  knowledge-distillation
bert-AAD
Adversarial Adaptation with Distillation for BERT Unsupervised Domain Adaptation
Stars: ✭ 27 (-44.9%)
Mutual labels:  knowledge-distillation
MSF
Official code for "Mean Shift for Self-Supervised Learning"
Stars: ✭ 42 (-14.29%)
Mutual labels:  self-supervised-learning
BYOL
Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning
Stars: ✭ 102 (+108.16%)
Mutual labels:  self-supervised-learning
bert-squeeze
🛠️ Tools for Transformers compression using PyTorch Lightning ⚡
Stars: ✭ 56 (+14.29%)
Mutual labels:  distillation
simsiam-cifar10
Code to train the SimSiam model on cifar10 using PyTorch
Stars: ✭ 33 (-32.65%)
Mutual labels:  self-supervised-learning
mmrazor
OpenMMLab Model Compression Toolbox and Benchmark.
Stars: ✭ 644 (+1214.29%)
Mutual labels:  knowledge-distillation
GCL
List of Publications in Graph Contrastive Learning
Stars: ✭ 25 (-48.98%)
Mutual labels:  self-supervised-learning
G-SimCLR
This is the code base for paper "G-SimCLR : Self-Supervised Contrastive Learning with Guided Projection via Pseudo Labelling" by Souradip Chakraborty, Aritra Roy Gosthipaty and Sayak Paul.
Stars: ✭ 69 (+40.82%)
Mutual labels:  self-supervised-learning
mae-scalable-vision-learners
A TensorFlow 2.x implementation of Masked Autoencoders Are Scalable Vision Learners
Stars: ✭ 54 (+10.2%)
Mutual labels:  self-supervised-learning
EATNN
This is our implementation of EATNN: Efficient Adaptive Transfer Neural Network (SIGIR 2019)
Stars: ✭ 23 (-53.06%)
Mutual labels:  efficient-algorithm
video repres mas
code for CVPR-2019 paper: Self-supervised Spatio-temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics
Stars: ✭ 63 (+28.57%)
Mutual labels:  self-supervised-learning
MiniVox
Code for our ACML and INTERSPEECH papers: "Speaker Diarization as a Fully Online Bandit Learning Problem in MiniVox".
Stars: ✭ 15 (-69.39%)
Mutual labels:  self-supervised-learning
EfficientIR
An EfficientNet-based local image retrieval tool
Stars: ✭ 64 (+30.61%)
Mutual labels:  efficientnet-pytorch

FKD: A Fast Knowledge Distillation Framework for Visual Recognition

Official PyTorch implementation of the paper "A Fast Knowledge Distillation Framework for Visual Recognition", by Zhiqiang Shen and Eric Xing (CMU and MBZUAI).

Abstract

Knowledge Distillation (KD) has proven useful in many visual tasks, such as supervised classification and self-supervised representation learning. The main drawback of a vanilla KD framework is that most of the computational overhead is spent forwarding inputs through the large teacher network, which makes the whole learning procedure inefficient and costly. In this work, we propose a Fast Knowledge Distillation (FKD) framework that simulates the distillation training phase and generates soft labels following the multi-crop KD procedure, while training faster than ReLabel because it requires no post-processing such as RoI align or softmax operations. FKD is even more efficient than the conventional classification framework when multiple crops of the same image are used for data loading. We achieve 79.8% top-1 accuracy with ResNet-50 on ImageNet-1K, outperforming ReLabel by ~1.0% while running faster. We also demonstrate the efficiency advantage of FKD on the self-supervised learning task.
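
The central point above is that the teacher is never run during student training: each crop's soft label is generated once, offline, and a training step reduces to a soft-target cross-entropy on the student alone. A minimal sketch of that step in PyTorch (not the official training loop; the function name and the assumption that soft labels arrive as per-crop probability vectors are ours):

```python
# Minimal sketch, not the released code: the soft labels are loaded from disk
# alongside the crops, so no teacher forward pass happens here.
# `soft_targets` is assumed to be a (batch, num_classes) probability tensor.
import torch.nn.functional as F

def fkd_step(student, images, soft_targets, optimizer):
    """One hypothetical FKD training step using pre-generated soft labels."""
    optimizer.zero_grad()
    logits = student(images)                              # student forward only
    log_probs = F.log_softmax(logits, dim=1)
    loss = -(soft_targets * log_probs).sum(dim=1).mean()  # soft cross-entropy
    loss.backward()
    optimizer.step()
    return loss.item()
```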

Supervised Training

Preparation

Prepare the ImageNet-1K dataset (with the standard train and val folders) and obtain the pre-generated FKD soft labels (e.g., FKD_soft_label_500_crops_marginal_smoothing_k_5, referenced below) before training.

FKD Training on CNNs

To train a model, run train_FKD.py with the desired model architecture and the paths to the soft labels and the ImageNet dataset:

python train_FKD.py -a resnet50 --lr 0.1 --num_crops 4 -b 1024 --cos --softlabel_path [soft label path] [imagenet-folder with train and val folders]

For --softlabel_path, use a path of the form ./FKD_soft_label_500_crops_marginal_smoothing_k_5.

Multi-processing distributed training is supported; please refer to the official PyTorch ImageNet training code for details.
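
The --num_crops option reflects the multi-crop loading described in the abstract: several crops are taken from each decoded image in one iteration, and each crop is paired with its own stored soft label. Below is a rough sketch of such a dataset; the per-image file layout (crop coordinates plus teacher probabilities in a single .pt record) is an assumption for illustration, not the released soft-label format.

```python
# Hypothetical multi-crop dataset sketch: the soft-label file for each image is
# assumed to store, for a number of pre-generated crops, both the crop
# coordinates and the teacher's soft probabilities, so the exact crop regions
# can be replayed at student-training time without running the teacher.
import os
import torch
from torch.utils.data import Dataset
import torchvision.transforms.functional as TF
from torchvision import transforms
from PIL import Image

class MultiCropSoftLabelDataset(Dataset):
    def __init__(self, image_paths, softlabel_dir, num_crops=4, crop_size=224):
        self.image_paths = image_paths      # list of ImageNet training images
        self.softlabel_dir = softlabel_dir  # assumed: one .pt record per image
        self.num_crops = num_crops
        self.crop_size = crop_size
        self.to_tensor = transforms.ToTensor()

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        img = Image.open(self.image_paths[idx]).convert('RGB')
        name = os.path.splitext(os.path.basename(self.image_paths[idx]))[0]
        # Assumed record format: {'coords': (N, 4) as [top, left, h, w],
        #                         'probs':  (N, num_classes)}
        record = torch.load(os.path.join(self.softlabel_dir, name + '.pt'))
        crops, targets = [], []
        for i in torch.randperm(record['coords'].shape[0])[: self.num_crops]:
            top, left, h, w = [int(v) for v in record['coords'][i].tolist()]
            crop = TF.resized_crop(img, top, left, h, w,
                                   [self.crop_size, self.crop_size])
            crops.append(self.to_tensor(crop))
            targets.append(record['probs'][i])
        return torch.stack(crops), torch.stack(targets)
```

Because num_crops samples share a single image decode, the data-loading cost per training sample drops roughly by a factor of num_crops, which is where the efficiency advantage over a conventional classification pipeline comes from.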

Evaluation

python train_FKD.py -a resnet50 -e --resume [model path] [imagenet-folder with train and val folders]

Trained Models

Model | accuracy (Top-1) | weights | configurations
ReLabel ResNet-50 | 78.9 | -- | --
FKD ResNet-50 | 79.8 | link | Table 10 in paper
ReLabel ResNet-101 | 80.7 | -- | --
FKD ResNet-101 | 81.7 | link | Table 10 in paper
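
For ad-hoc inference with the released ResNet-50 weights outside train_FKD.py, something along the following lines should work. It assumes the checkpoint matches torchvision's resnet50 and stores its weights under a state_dict key with DistributedDataParallel's module. prefix; both are assumptions, so adjust to the actual checkpoint layout. The official evaluation path remains the -e flag shown above.

```python
# Hypothetical loading of a released FKD ResNet-50 checkpoint for inference.
# Assumed checkpoint layout: a dict with a 'state_dict' entry whose keys carry
# a 'module.' prefix from DistributedDataParallel; adapt if the file differs.
import torch
from torchvision.models import resnet50

def load_fkd_resnet50(ckpt_path: str):
    model = resnet50(num_classes=1000)
    ckpt = torch.load(ckpt_path, map_location='cpu')
    state = ckpt.get('state_dict', ckpt) if isinstance(ckpt, dict) else ckpt
    state = {k.replace('module.', '', 1): v for k, v in state.items()}
    model.load_state_dict(state)
    model.eval()
    return model
```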

FKD Training on ViT/DeiT and SReT

To train a ViT model, run train_ViT_FKD.py with the desired model architecture and the paths to the soft labels and the ImageNet dataset:

cd train_ViT
python train_ViT_FKD.py -a SReT_LT --lr 0.002 --wd 0.05 --num_crops 4 -b 1024 --cos --softlabel_path [soft label path] [imagenet-folder with train and val folders]

For instructions on the SReT_LT model, please refer to SReT.

Evaluation

python train_ViT_FKD.py -a SReT_LT -e --resume [model path] [imagenet-folder with train and val folders]

Trained Models

Model | FLOPs | #params | accuracy (Top-1) | weights | configurations
DeiT-T-distill | 1.3B | 5.7M | 74.5 | -- | --
FKD ViT/DeiT-T | 1.3B | 5.7M | 75.2 | link | Table 11 in paper
SReT-LT-distill | 1.2B | 5.0M | 77.7 | -- | --
FKD SReT-LT | 1.2B | 5.0M | 78.7 | link | Table 11 in paper

Fast MEAL V2

Please see MEAL V2 for instructions on running FKD with MEAL V2.

Self-supervised Representation Learning Using FKD

Please see FKD-SSL for instructions on running the FKD code for the SSL task.

Citation

@article{shen2021afast,
      title={A Fast Knowledge Distillation Framework for Visual Recognition}, 
      author={Zhiqiang Shen and Eric Xing},
      year={2021},
      journal={arXiv preprint arXiv:2112.01528}
}

Contact

Zhiqiang Shen (zhiqians at andrew.cmu.edu or zhiqiangshen0214 at gmail.com)
