
peterliht / Knowledge Distillation Pytorch

License: MIT
A PyTorch implementation for exploring deep and shallow knowledge distillation (KD) experiments with flexibility

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Knowledge Distillation Pytorch

Condensa
Programmable Neural Network Compression
Stars: ✭ 129 (-86.92%)
Mutual labels:  deep-neural-networks, model-compression
Channel Pruning
Channel Pruning for Accelerating Very Deep Neural Networks (ICCV'17)
Stars: ✭ 979 (-0.71%)
Mutual labels:  deep-neural-networks, model-compression
Randwire tensorflow
tensorflow implementation of Exploring Randomly Wired Neural Networks for Image Recognition
Stars: ✭ 29 (-97.06%)
Mutual labels:  deep-neural-networks, cifar10
Microexpnet
MicroExpNet: An Extremely Small and Fast Model For Expression Recognition From Frontal Face Images
Stars: ✭ 121 (-87.73%)
Mutual labels:  deep-neural-networks, model-compression
Aognet
Code for CVPR 2019 paper: " Learning Deep Compositional Grammatical Architectures for Visual Recognition"
Stars: ✭ 132 (-86.61%)
Mutual labels:  deep-neural-networks, cifar10
Model Compression Papers
Papers for deep neural network compression and acceleration
Stars: ✭ 296 (-69.98%)
Mutual labels:  deep-neural-networks, model-compression
Image classification cifar 10
Image Classification on CIFAR-10 Dataset using Multi Layer Perceptrons in Python from Scratch.
Stars: ✭ 18 (-98.17%)
Mutual labels:  deep-neural-networks, cifar10
Servenet
Service Classification based on Service Description
Stars: ✭ 21 (-97.87%)
Mutual labels:  deep-neural-networks
Densedepth
High Quality Monocular Depth Estimation via Transfer Learning
Stars: ✭ 963 (-2.33%)
Mutual labels:  deep-neural-networks
Tabnet
PyTorch implementation of TabNet paper : https://arxiv.org/pdf/1908.07442.pdf
Stars: ✭ 882 (-10.55%)
Mutual labels:  deep-neural-networks
Resnet
Tensorflow ResNet implementation on cifar10
Stars: ✭ 10 (-98.99%)
Mutual labels:  cifar10
Aletheia
Unwrapping Black Box of ReLU DNNs
Stars: ✭ 28 (-97.16%)
Mutual labels:  deep-neural-networks
Letslearnai.github.io
Lets Learn AI
Stars: ✭ 33 (-96.65%)
Mutual labels:  deep-neural-networks
Dlt
Deep Learning Toolbox for Torch
Stars: ✭ 20 (-97.97%)
Mutual labels:  deep-neural-networks
Constrained attention filter
(ECCV 2020) Tensorflow implementation of A Generic Visualization Approach for Convolutional Neural Networks
Stars: ✭ 36 (-96.35%)
Mutual labels:  deep-neural-networks
Maestro
An analytical cost model evaluating DNN mappings (dataflows and tiling).
Stars: ✭ 35 (-96.45%)
Mutual labels:  deep-neural-networks
Theano Kaldi Rnn
THEANO-KALDI-RNNs is a project implementing various Recurrent Neural Networks (RNNs) for RNN-HMM speech recognition. The Theano Code is coupled with the Kaldi decoder.
Stars: ✭ 31 (-96.86%)
Mutual labels:  deep-neural-networks
Kalasalingam
IEEE "Invited Talk on Deep Learning" 03/02/2018
Stars: ✭ 13 (-98.68%)
Mutual labels:  deep-neural-networks
Onnx R
R Interface to Open Neural Network Exchange (ONNX)
Stars: ✭ 31 (-96.86%)
Mutual labels:  deep-neural-networks
Skater
Python Library for Model Interpretation/Explanations
Stars: ✭ 973 (-1.32%)
Mutual labels:  deep-neural-networks

knowledge-distillation-pytorch

  • Exploring knowledge distillation of DNNs for efficient hardware solutions
  • Author: Haitong Li
  • Framework: PyTorch
  • Dataset: CIFAR-10

Features

  • A framework for exploring "shallow" and "deep" knowledge distillation (KD) experiments
  • All hyperparameters defined per experiment in "params.json" (avoiding long argparse command lines); see the sketch after this list
  • Hyperparameter search and result synthesis (as a table)
  • Progress bar, TensorBoard support, and checkpoint saving/loading (utils.py)
  • Pretrained teacher models available for download
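
Since every experiment carries its own params.json, setting up a new experiment mostly amounts to writing a small config file. Below is a minimal, hypothetical sketch of creating one; the field names are illustrative assumptions rather than the repo's exact schema (alpha and temperature are the usual KD loss weight and softmax temperature).

    import json
    import os

    # Hypothetical contents of an experiment's params.json (field names are
    # illustrative assumptions, not necessarily the repo's exact schema).
    params = {
        "learning_rate": 1e-3,
        "batch_size": 128,
        "num_epochs": 30,
        "dropout_rate": 0.5,
        "alpha": 0.9,        # weight on the soft-target (distillation) loss
        "temperature": 4.0,  # softmax temperature for softening the logits
    }

    # Each experiment directory holds its own params.json; train.py simply loads it,
    # so no long argparse command lines are needed.
    os.makedirs("experiments/cnn_distill", exist_ok=True)
    with open("experiments/cnn_distill/params.json", "w") as f:
        json.dump(params, f, indent=4)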

Install

  • Clone the repo

    git clone https://github.com/peterliht/knowledge-distillation-pytorch.git
    
  • Install the dependencies (including PyTorch)

    pip install -r requirements.txt
    

Organization:

  • ./train.py: main entry point for training/evaluation with or without KD on CIFAR-10
  • ./experiments/: JSON config files for each experiment; directories for hyperparameter search
  • ./model/: teacher and student DNNs, the knowledge distillation (KD) loss definition, and the data loader (a sketch of the standard KD loss follows this list)
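
For reference, below is a minimal sketch of the standard Hinton-style KD loss; it is not the repo's exact code (which lives under ./model/), but it shows the usual formulation: a KL-divergence term between temperature-softened student and teacher distributions, blended with the ordinary cross-entropy on hard labels via a weight alpha.

    import torch.nn.functional as F

    def kd_loss(student_logits, teacher_logits, labels, alpha, T):
        """Minimal sketch of the Hinton et al. distillation loss (not the repo's exact code)."""
        # KL divergence between temperature-softened student and teacher distributions.
        soft_loss = F.kl_div(
            F.log_softmax(student_logits / T, dim=1),
            F.softmax(teacher_logits / T, dim=1),
            reduction="batchmean",
        ) * (T * T)  # T^2 keeps gradient magnitudes comparable across temperatures
        # Ordinary cross-entropy against the ground-truth hard labels.
        hard_loss = F.cross_entropy(student_logits, labels)
        return alpha * soft_loss + (1.0 - alpha) * hard_loss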

Key notes on using this repo for your own experiments:

  • Download the zip file of pretrained teacher model checkpoints from this Box folder
  • Move the unzipped subfolders into 'knowledge-distillation-pytorch/experiments/' (replacing the existing ones if necessary; follow the default path naming)
  • Call train.py to train a 5-layer CNN with ResNet-18's dark knowledge, or to train ResNet-18 with knowledge distilled from state-of-the-art deeper models
  • Use search_hyperparams.py for hyperparameter search
  • Hyperparameters are defined universally in params.json files; refer to the header of search_hyperparams.py for details (a sketch of such a driver follows this list)
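
The sketch below illustrates how such a driver typically works: for each hyperparameter combination it writes a params.json into its own job directory under parent_dir and launches train.py on it. The search grid, field names, and directory layout here are assumptions for illustration, not the repo's exact behavior.

    import json
    import os
    import subprocess
    import sys
    from itertools import product

    # Hedged sketch of a hyperparameter-search driver in the spirit of
    # search_hyperparams.py; the grid and job-directory layout are assumptions.
    parent_dir = "experiments/cnn_distill_alpha_temp"
    with open(os.path.join(parent_dir, "params.json")) as f:
        base_params = json.load(f)

    for alpha, T in product([0.1, 0.5, 0.9], [1.0, 4.0, 20.0]):
        job_dir = os.path.join(parent_dir, f"alpha_{alpha}_T_{T}")
        os.makedirs(job_dir, exist_ok=True)
        params = dict(base_params, alpha=alpha, temperature=T)
        with open(os.path.join(job_dir, "params.json"), "w") as f:
            json.dump(params, f, indent=4)
        # One training run per hyperparameter combination.
        subprocess.run([sys.executable, "train.py", "--model_dir", job_dir], check=True)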

Train (dataset: CIFAR-10)

Note: all the hyperparameters can be found and modified in 'params.json' under 'model_dir'

-- Train a 5-layer CNN with knowledge distilled from a pre-trained ResNet-18 model

python train.py --model_dir experiments/cnn_distill

-- Train a ResNet-18 model with knowledge distilled from a pre-trained ResNext-29 teacher

python train.py --model_dir experiments/resnet18_distill/resnext_teacher

-- Hyperparameter search for a specified experiment ('parent_dir/params.json')

python search_hyperparams.py --parent_dir experiments/cnn_distill_alpha_temp

-- Synthesize results of the recent hypersearch experiments

python synthesize_results.py --parent_dir experiments/cnn_distill_alpha_temp
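
Result synthesis can be as simple as walking the job directories and tabulating each run's saved metrics. The sketch below assumes every job directory contains a metrics JSON file with an accuracy entry; the file and key names are illustrative assumptions, not the repo's exact output format.

    import json
    import os

    # Hedged sketch of result synthesis: collect a metrics file from every job
    # directory under parent_dir and print a small table.
    parent_dir = "experiments/cnn_distill_alpha_temp"
    rows = []
    for job in sorted(os.listdir(parent_dir)):
        metrics_path = os.path.join(parent_dir, job, "metrics_eval_best.json")  # assumed name
        if os.path.isfile(metrics_path):
            with open(metrics_path) as f:
                metrics = json.load(f)
            rows.append((job, metrics.get("accuracy")))

    print(f"{'experiment':40s} accuracy")
    for job, acc in rows:
        print(f"{job:40s} {acc}")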

Results: "Shallow" and "Deep" Distillation

Quick takeaways (more details to be added):

  • Knowledge distillation provides regularization for both shallow DNNs and state-of-the-art DNNs
  • Training with an unlabeled or partial dataset can benefit from the dark knowledge of teacher models

- Knowledge distillation from ResNet-18 to a 5-layer CNN

Model                      Test Accuracy (Dropout = 0.5)   Test Accuracy (No Dropout)
5-layer CNN                83.51%                          84.74%
5-layer CNN w/ ResNet-18   84.49%                          85.69%

- Knowledge distillation from deeper models to ResNet-18

Model                  Test Accuracy
Baseline ResNet-18     94.175%
+ KD WideResNet-28-10  94.333%
+ KD PreResNet-110     94.531%
+ KD DenseNet-100      94.729%
+ KD ResNext-29-8      94.788%

References

Li, H. "Exploring Knowledge Distillation of Deep Neural Nets for Efficient Hardware Solutions." CS230 Report, 2018.

Hinton, G., Vinyals, O., and Dean, J. "Distilling the Knowledge in a Neural Network." arXiv preprint arXiv:1503.02531, 2015.

Romero, A., Ballas, N., Kahou, S. E., Chassang, A., Gatta, C., and Bengio, Y. "FitNets: Hints for Thin Deep Nets." arXiv preprint arXiv:1412.6550, 2014.

https://github.com/cs230-stanford/cs230-stanford.github.io

https://github.com/bearpaw/pytorch-classification
