
DushyantaDhyani / kdtf

License: MIT
Knowledge Distillation using TensorFlow

Programming Language: Python

Projects that are alternatives to or similar to kdtf

SAN
[ECCV 2020] Scale Adaptive Network: Learning to Learn Parameterized Classification Networks for Scalable Input Images
Stars: ✭ 41 (-70.5%)
Mutual labels:  knowledge-distillation
FKD
A Fast Knowledge Distillation Framework for Visual Recognition
Stars: ✭ 49 (-64.75%)
Mutual labels:  knowledge-distillation
ACCV TinyGAN
BigGAN; Knowledge Distillation; Black-Box; Fast Training; 16x compression
Stars: ✭ 62 (-55.4%)
Mutual labels:  knowledge-distillation
MutualGuide
Localize to Classify and Classify to Localize: Mutual Guidance in Object Detection
Stars: ✭ 97 (-30.22%)
Mutual labels:  knowledge-distillation
mmrazor
OpenMMLab Model Compression Toolbox and Benchmark.
Stars: ✭ 644 (+363.31%)
Mutual labels:  knowledge-distillation
SemCKD
This is the official implementation for the AAAI-2021 paper (Cross-Layer Distillation with Semantic Calibration).
Stars: ✭ 42 (-69.78%)
Mutual labels:  knowledge-distillation
MLIC-KD-WSD
Multi-Label Image Classification via Knowledge Distillation from Weakly-Supervised Detection (ACM MM 2018)
Stars: ✭ 58 (-58.27%)
Mutual labels:  knowledge-distillation
Pretrained Language Model
Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.
Stars: ✭ 2,033 (+1362.59%)
Mutual labels:  knowledge-distillation
bert-AAD
Adversarial Adaptation with Distillation for BERT Unsupervised Domain Adaptation
Stars: ✭ 27 (-80.58%)
Mutual labels:  knowledge-distillation
FGD
Focal and Global Knowledge Distillation for Detectors (CVPR 2022)
Stars: ✭ 124 (-10.79%)
Mutual labels:  knowledge-distillation
AB distillation
Knowledge Transfer via Distillation of Activation Boundaries Formed by Hidden Neurons (AAAI 2019)
Stars: ✭ 105 (-24.46%)
Mutual labels:  knowledge-distillation
Zero-shot Knowledge Distillation Pytorch
ZSKD with PyTorch
Stars: ✭ 26 (-81.29%)
Mutual labels:  knowledge-distillation
ProSelfLC-2021
noisy labels; missing labels; semi-supervised learning; entropy; uncertainty; robustness and generalisation.
Stars: ✭ 45 (-67.63%)
Mutual labels:  knowledge-distillation
neural-compressor
Intel® Neural Compressor (formerly known as Intel® Low Precision Optimization Tool), targeting to provide unified APIs for network compression technologies, such as low precision quantization, sparsity, pruning, knowledge distillation, across different deep learning frameworks to pursue optimal inference performance.
Stars: ✭ 666 (+379.14%)
Mutual labels:  knowledge-distillation
model optimizer
Model optimizer used in Adlik.
Stars: ✭ 22 (-84.17%)
Mutual labels:  knowledge-distillation
LD
Localization Distillation for Dense Object Detection (CVPR 2022)
Stars: ✭ 271 (+94.96%)
Mutual labels:  knowledge-distillation
cool-papers-in-pytorch
Reimplementing cool papers in PyTorch...
Stars: ✭ 21 (-84.89%)
Mutual labels:  knowledge-distillation
Awesome Knowledge Distillation
Awesome Knowledge Distillation
Stars: ✭ 2,634 (+1794.96%)
Mutual labels:  knowledge-distillation
Knowledge distillation via TF2.0
The codes for recent knowledge distillation algorithms and benchmark results via TF2.0 low-level API
Stars: ✭ 87 (-37.41%)
Mutual labels:  knowledge-distillation
Efficient-Computing
Efficient-Computing
Stars: ✭ 474 (+241.01%)
Mutual labels:  knowledge-distillation

Knowledge Distillation - TensorFlow

This is an implementation of the basic idea behind Hinton's knowledge distillation paper. We do not reproduce the exact results, but rather show that the idea works.

While a few other implementations are available, their code flow is not very intuitive. Here, the soft targets are generated from the teacher in an online manner while the student network is being trained.

The big and small models (with some modifications; we currently use a simple softmax regression, as in TF's tutorial) have been taken from here.

While this may or may not be the best way to implement the distillation architecture, it leads to a clear improvement in the (small) student model. If you find any bugs or have suggestions, feel free to create an issue or send in a pull request.
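For concreteness, below is a minimal sketch of this kind of distillation objective in TF 1.x. The placeholders, variable names, and the alpha weighting are illustrative assumptions rather than the exact code in main.py; only the idea of matching the student to the teacher's temperature-softened softmax follows the paper.

 import tensorflow as tf

 # Minimal sketch of the distillation objective (TF 1.x). All names below are
 # illustrative and do not necessarily match the variables used in main.py.
 T = 5.0          # temperature (corresponds to the --temperature flag)
 alpha = 0.5      # illustrative weight between the soft and hard losses
 num_classes = 10

 labels = tf.placeholder(tf.float32, [None, num_classes])          # one-hot ground truth
 teacher_logits = tf.placeholder(tf.float32, [None, num_classes])  # produced online by the teacher
 student_logits = tf.placeholder(tf.float32, [None, num_classes])  # produced by the student

 # Soft targets: the teacher's softmax softened by the temperature T.
 soft_targets = tf.nn.softmax(teacher_logits / T)

 # Cross-entropy between the softened student predictions and the soft targets.
 # The T^2 factor keeps gradient magnitudes comparable across temperatures.
 soft_loss = T ** 2 * tf.reduce_mean(
     tf.nn.softmax_cross_entropy_with_logits(labels=soft_targets,
                                             logits=student_logits / T))

 # Standard hard-label cross-entropy at temperature 1.
 hard_loss = tf.reduce_mean(
     tf.nn.softmax_cross_entropy_with_logits(labels=labels,
                                             logits=student_logits))

 # The student is trained on a weighted combination of the two terms.
 total_loss = alpha * soft_loss + (1.0 - alpha) * hard_loss

Training the student then amounts to minimizing total_loss, feeding the teacher's logits for each batch as it is trained.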

Requirements

TensorFlow 1.3 or above

Running the code

Train the Teacher Model

 python main.py --model_type teacher --checkpoint_dir teachercpt --num_steps 5000 --temperature 5

Train the Student Model (in a standalone manner for comparison)

 python main.py --model_type student --checkpoint_dir studentcpt --num_steps 5000

Train the Student Model (Using Soft Targets from the teacher model)

 python main.py --model_type student --checkpoint_dir studentcpt --load_teacher_from_checkpoint true --load_teacher_checkpoint_dir teachercpt --num_steps 5000 --temperature 5

Results (For different temperature values)

Model          Accuracy (T=2)   Accuracy (T=5)
Teacher Only   97.9             98.12
Distillation   89.14            90.77
Student Only   88.84            88.84

The small model, when trained without the soft labels, always uses temperature=1.
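For intuition, the following NumPy snippet (illustrative only, with made-up logits) shows how raising the temperature softens the distribution that the student is trained to match.

 import numpy as np

 def softened_softmax(logits, T=1.0):
     # Softmax with temperature T: higher T spreads probability mass
     # over the non-argmax classes ("dark knowledge").
     z = np.asarray(logits, dtype=np.float64) / T
     e = np.exp(z - z.max())
     return e / e.sum()

 logits = [6.0, 2.0, 1.0]                 # made-up logits for a 3-class example
 print(softened_softmax(logits, T=1))     # ~[0.976, 0.018, 0.007]  (nearly one-hot)
 print(softened_softmax(logits, T=2))     # ~[0.821, 0.111, 0.067]
 print(softened_softmax(logits, T=5))     # ~[0.550, 0.247, 0.202]  (much softer targets)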

References

Distilling the Knowledge in a Neural Network. Hinton, Vinyals, and Dean (2015), arXiv:1503.02531.
