
Model Compression

License: MIT

All Contributors

Contents

  • Getting started
  • Usages
  • Experiment Results
  • Class Diagram
  • References
  • Contributors

Getting started

Prerequisites

  • This repository is implemented and verified in an Anaconda virtual environment with Python 3.7.

Installation

  1. Clone this repository.
$ git clone https://github.com/j-marple-dev/model_compression.git
$ cd model_compression
  2. Create a virtual environment.
$ conda env create -f environment.yml
$ conda activate model_compression

or

$ make install
$ conda activate model_compression
  3. (Optional, for contributors) Install the CI environment.
$ conda activate model_compression
$ make dev
  4. (Optional, for NVIDIA GPUs) Install cudatoolkit.
$ conda activate model_compression
$ conda install -c pytorch cudatoolkit=${cuda_version}

After setting up the environment, you can validate the code with the following commands.

$ make format  # for formatting
$ make test  # for linting

Docker

  1. Clone this repository.
$ git clone https://github.com/j-marple-dev/model_compression.git
$ cd model_compression
  2. Make sure you have installed Docker Engine and nvidia-docker.

  3. Run the docker image.

$ docker run -it --gpus all --ipc=host -v $PWD:/app/model_compression jmarpledev/model_compression:latest /bin/bash
$ cd model_compression

Usages

Run training

Train a model. The trainer supports the following options:

$ python train.py --help
usage: train.py [-h] [--multi-gpu] [--gpu GPU] [--finetune FINETUNE]
                [--resume RESUME] [--half] [--wlog] [--config CONFIG]

Model trainer.

optional arguments:
  -h, --help           show this help message and exit
  --multi-gpu          Multi-GPU use
  --gpu GPU            GPU id to use
  --finetune FINETUNE  Model path to finetune (.pth.tar)
  --resume RESUME      Input log directory name to resume in save/checkpoint
  --half               Use half precision
  --wlog               Turns on wandb logging
  --config CONFIG      Configuration path (.py)

$ python train.py --config path_to_config.py  # basic run
$ python train.py --config path_to_config.py  --gpu 1 --resume checkpoint_dir_name # resume training on gpu 1

Configurations for training

The following options are available:

  • Basic Settings: BATCH_SIZE, EPOCHS, SEED, MODEL_NAME(src/models), MODEL_PARAMS, DATASET
  • Stochastic Gradient Descent: MOMENTUM, WEIGHT_DECAY, LR
  • Image Augmentation: AUG_TRAIN(src/augmentation/policies.py), AUG_TRAIN_PARAMS, AUG_TEST(src/augmentation/policies.py), CUTMIX
  • Loss: CRITERION(src/criterions.py), CRITERION_PARAMS
  • Learning Rate Scheduler: LR_SCHEDULER(src/lr_schedulers.py), LR_SCHEDULER_PARAMS
# Example of a train config (config/train/cifar/densenet_121.py)
import os

config = {
    "SEED": 777,
    "AUG_TRAIN": "randaugment_train_cifar100_224",
    "AUG_TRAIN_PARAMS": dict(n_select=2, level=None),
    "AUG_TEST": "simple_augment_test_cifar100_224",
    "CUTMIX": dict(beta=1, prob=0.5),
    "DATASET": "CIFAR100",
    "MODEL_NAME": "densenet",
    "MODEL_PARAMS": dict(
        num_classes=100,
        inplanes=24,
        growthRate=32,
        compressionRate=2,
        block_configs=(6, 12, 24, 16),
        small_input=False,
        efficient=False,
    ),
    "CRITERION": "CrossEntropy", # CrossEntropy, HintonKLD
    "CRITERION_PARAMS": dict(num_classes=100, label_smoothing=0.1),
    "LR_SCHEDULER": "WarmupCosineLR", # WarmupCosineLR, Identity, MultiStepLR
    "LR_SCHEDULER_PARAMS": dict(
        warmup_epochs=5, start_lr=1e-3, min_lr=1e-5, n_rewinding=1
    ),
    "BATCH_SIZE": 128,
    "LR": 0.1,
    "MOMENTUM": 0.9,
    "WEIGHT_DECAY": 1e-4,
    "NESTEROV": True,
    "EPOCHS": 300,
    "N_WORKERS": os.cpu_count(),
}
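
Each config is a plain Python module that exposes a top-level config dict. As a rough illustration only (the loader below is an assumption; the exact code inside train.py may differ), such a module can be imported dynamically from the path given to --config:
# Hedged sketch: import a config module from its file path (illustrative, not train.py's code).
import importlib.util

def load_config(path: str) -> dict:
    spec = importlib.util.spec_from_file_location("train_config", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.config  # every config file defines a top-level `config` dict

cfg = load_config("config/train/cifar100/densenet_small.py")  # example path
print(cfg["MODEL_NAME"], cfg["BATCH_SIZE"], cfg["EPOCHS"])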

Run pruning

Pruning makes a model sparse. The pruner supports the following methods:

  1. Unstructured Pruning
  2. Structured (Channel-wise) Pruning

Usually, unstructured pruning gives more sparsity, but it doesn't support shrinking.
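
For intuition, the following minimal sketch contrasts the two styles using torch.nn.utils.prune; it is an illustration only, not the Pruner implementation in this repository:
# Minimal sketch contrasting unstructured and structured magnitude pruning
# using torch.nn.utils.prune (illustration only; not this repository's Pruner).
import torch.nn as nn
import torch.nn.utils.prune as prune

conv = nn.Conv2d(16, 32, kernel_size=3)

# Unstructured: zero the 20% smallest-magnitude individual weights.
prune.l1_unstructured(conv, name="weight", amount=0.2)
prune.remove(conv, "weight")  # bake the mask into the weight tensor

# Structured: zero entire output channels (dim=0) ranked by L2 norm; only this
# style lets the pruned channels be physically removed (shrunk) later.
prune.ln_structured(conv, name="weight", amount=0.2, n=2, dim=0)
print(f"overall sparsity: {(conv.weight == 0).float().mean().item():.2%}")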

$ python prune.py --help
usage: prune.py [-h] [--multi-gpu] [--gpu GPU] [--resume RESUME] [--wlog]
                [--config CONFIG]

Model pruner.

optional arguments:
  -h, --help       show this help message and exit
  --multi-gpu      Multi-GPU use
  --gpu GPU        GPU id to use
  --resume RESUME  Input checkpoint directory name
  --wlog           Turns on wandb logging
  --config CONFIG  Configuration path


$ python prune.py --config path_to_config.py  # basic run
$ python prune.py --config path_to_config.py --multi-gpu --wlog  # run on multi-gpu with wandb logging

Configurations for pruning

A pruning configuration extends a training configuration (recommended) with the following options:

  • Basic Training Settings: TRAIN_CONFIG
  • Pruning Settings: N_PRUNING_ITER, PRUNE_METHOD(src/runner/pruner.py), PRUNE_PARAMS
# Example of a prune config (config/prune/cifar100/densenet_small_l2mag.py)
from config.train.cifar100 import densenet_small

train_config = densenet_small.config
config = {
    "TRAIN_CONFIG": train_config,
    "N_PRUNING_ITER": 15,
    "PRUNE_METHOD": "Magnitude", # LotteryTicketHypothesis, Magnitude, NetworkSlimming, SlimMagnitude
    "PRUNE_PARAMS": dict(
        PRUNE_AMOUNT=0.2,  # iteratively prunes 20% of the network parameters at the end of each training run
        NORM=2,
        STORE_PARAM_BEFORE=train_config["EPOCHS"],  # used for weight initialization at every pruning iteration
        TRAIN_START_FROM=0,  # training starts from this epoch
        PRUNE_AT_BEST=False,  # if True, it prunes parameters at the trained network which achieves the best accuracy
                              # otherwise, it prunes the network at the end of training
    ),
}
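
To make these parameters concrete, the sketch below shows the general shape of an iterative prune-and-rewind loop (N_PRUNING_ITER rounds, pruning PRUNE_AMOUNT per round, then rewinding weights to a stored snapshot). It is a simplified, hypothetical illustration rather than the repository's Pruner:
# Simplified illustration of iterative magnitude pruning with weight rewinding.
# Not the repository's Pruner; mask bookkeeping and scheduling differ in practice.
import copy
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

def iterative_prune(model: nn.Module, train_fn, n_iter: int = 15, amount: float = 0.2):
    convs = {name: m for name, m in model.named_modules() if isinstance(m, nn.Conv2d)}
    stored = copy.deepcopy(model.state_dict())   # snapshot to rewind to (cf. STORE_PARAM_BEFORE)
    for _ in range(n_iter):                      # cf. N_PRUNING_ITER
        train_fn(model)                          # one full training run
        for module in convs.values():            # prune a further fraction (cf. PRUNE_AMOUNT)
            prune.l1_unstructured(module, name="weight", amount=amount)
        with torch.no_grad():                    # rewind surviving weights to the snapshot
            for name, module in convs.items():
                module.weight_orig.copy_(stored[name + ".weight"])
    return model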

Run shrinking (Experimental)

Shrinking reshapes a pruned model and reduces its size.

$ python shrink.py --help
usage: shrink.py [-h] [--gpu GPU] [--checkpoint CHECKPOINT] [--config CONFIG]

Model shrinker.

optional arguments:
  -h, --help            show this help message and exit
  --gpu GPU             GPU id to use
  --checkpoint CHECKPOINT
                        input checkpoint path to quantize
  --config CONFIG       Pruning configuration path

$ python shrink.py --config path_to_config.py --checkpoint path_to_checkpoint.pth.tar  # basic run
Important Notes:

The shrinker is currently experimental. It only supports:

  • channel-wise pruned models
  • networks that consist of conv-bn-activation sequences
  • network blocks that have channel concatenation followed by skip connections (e.g. DenseNet)
  • networks that have only one final fully-connected layer

On the other hand, it doesn't support:

  • network blocks that have element-wise sums followed by skip connections (e.g. ResNet, MixNet)
  • networks that have multiple fully-connected layers
  • quantization after shrinking
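
The core idea can be sketched on a single conv-bn pair: given a mask of surviving output channels (e.g. channels whose batch-norm scale was not zeroed by structured pruning), build physically smaller layers that copy only those channels. The sketch below is an illustration under simplifying assumptions (no grouped convolutions, a single layer pair), not the repository's Shrinker:
# Illustrative shrinking of one conv-bn pair (not the repository's Shrinker).
import torch
import torch.nn as nn

def shrink_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d, keep: torch.Tensor):
    """Return a smaller (conv, bn) pair keeping only output channels where `keep` is True."""
    idx = torch.nonzero(keep, as_tuple=False).flatten()
    new_conv = nn.Conv2d(conv.in_channels, len(idx), conv.kernel_size,
                         stride=conv.stride, padding=conv.padding,
                         bias=conv.bias is not None)
    new_conv.weight.data = conv.weight.data[idx].clone()
    if conv.bias is not None:
        new_conv.bias.data = conv.bias.data[idx].clone()
    new_bn = nn.BatchNorm2d(len(idx))
    for name in ("weight", "bias", "running_mean", "running_var"):
        getattr(new_bn, name).data = getattr(bn, name).data[idx].clone()
    return new_conv, new_bn

conv, bn = nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16)
keep = bn.weight.detach().abs() > 1e-8       # channels that survived pruning
small_conv, small_bn = shrink_conv_bn(conv, bn, keep)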

Run quantization

The quantizer runs one of the following 8-bit quantization methods:

  • Post-training static quantization
  • Quantization-aware training (QAT)
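
For context, the sketch below shows the generic eager-mode PyTorch workflow for post-training static quantization (fuse modules, attach a qconfig, calibrate, convert); quantization-aware training instead uses prepare_qat and fine-tunes before converting. This is an illustration of the torch.quantization API, not this repository's quantizer:
# Generic eager-mode post-training static quantization (illustration only).
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.conv, self.bn, self.relu = nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU()
        self.pool, self.fc = nn.AdaptiveAvgPool2d(1), nn.Linear(8, 10)
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        x = self.pool(self.relu(self.bn(self.conv(self.quant(x)))))
        return self.dequant(self.fc(torch.flatten(x, 1)))

model = TinyNet().eval()
torch.quantization.fuse_modules(model, [["conv", "bn", "relu"]], inplace=True)
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
torch.quantization.prepare(model, inplace=True)
with torch.no_grad():                            # calibration with representative data
    model(torch.randn(32, 3, 32, 32))
torch.quantization.convert(model, inplace=True)  # weights/activations become int8
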
$ python quantize.py --help
usage: quantize.py [-h] [--resume RESUME] [--wlog] [--config CONFIG]
                   [--checkpoint CHECKPOINT]

Model quantizer.

optional arguments:
  -h, --help            show this help message and exit
  --resume RESUME       Input log directory name to resume
  --wlog                Turns on wandb logging
  --static              Post-training static quantization
  --config CONFIG       Configuration path
  --checkpoint CHECKPOINT
                        Input checkpoint path to quantize

$ python quantize.py --config path_to_config.py --checkpoint path_to_checkpoint.pth.tar  # basic qat run
$ python quantize.py --config path_to_config.py --checkpoint path_to_checkpoint.pth.tar --static  # basic static quantization run

Experiment Results

WANDB Log

Unstructured Pruning (LTH vs Weight Rewinding vs LR Rewinding)


Structured Pruning (Slim vs L2Mag vs L2MagSlim)


Shrinking after Structured Pruning

Densenet (L=100, k=12) pruned by 19.66% (Slim & CIFAR100)


  • Accuracy: 80.37%
  • Parameters: 0.78M -> 0.51M
  • Model Size: 6.48 MB -> 4.14 MB
$ python shrink.py --config config/prune/cifar100/densenet_small_slim.py --checkpoint path_to_checkpoint.pth.tar

2020-08-26 13:50:38,442 - trainer.py:71 - INFO - Created a model densenet with 0.78M params
[Test]  100% (157 of 157) |#########################################################################################| Elapsed Time: 0:00:02 Time:  0:00:02
2020-08-26 13:50:42,719 - shrinker.py:104 - INFO - Acc: 80.37, Size: 6.476016 MB, Sparsity: 19.66 %
[Test]  100% (157 of 157) |#########################################################################################| Elapsed Time: 0:00:02 Time:  0:00:02
2020-08-26 13:50:45,781 - shrinker.py:118 - INFO - Acc: 80.37, Size: 4.141713 MB, Params: 0.51 M
Densenet (L=100, k=12) pruned by 35.57% (Network Slimming & CIFAR100)


  • Accuracy: 79.07%
  • Parameters: 0.78M -> 0.35M
  • Model Size: 6.48 MB -> 2.85 MB
$ python shrink.py --config config/prune/cifar100/densenet_small_slim.py --checkpoint path_to_checkpoint.pth.tar

2020-08-26 13:52:58,946 - trainer.py:71 - INFO - Created a model densenet with 0.78M params
[Test]  100% (157 of 157) |#########################################################################################| Elapsed Time: 0:00:02 Time:  0:00:02
2020-08-26 13:53:03,100 - shrinker.py:104 - INFO - Acc: 79.07, Size: 6.476016 MB, Sparsity: 35.57 %
[Test]  100% (157 of 157) |#########################################################################################| Elapsed Time: 0:00:02 Time:  0:00:02
2020-08-26 13:53:06,114 - shrinker.py:118 - INFO - Acc: 79.07, Size: 2.851149 MB, Params: 0.35 M

Quantization

Post-training Static Quantization
$ python quantize.py --config config/quantize/cifar100/densenet_small.py --checkpoint save/test/densenet_small/296_81_20.pth.tar --static --check-acc

2020-08-26 13:57:02,595 - trainer.py:71 - INFO - Created a model quant_densenet with 0.78M params
2020-08-26 13:57:05,275 - quantizer.py:87 - INFO - Acc: 81.2 %  Size: 3.286695 MB
2020-08-26 13:57:05,344 - quantizer.py:95 - INFO - Post Training Static Quantization: Run calibration
[Test]  100% (157 of 157) |#########################################################################################| Elapsed Time: 0:02:40 Time:  0:02:40
2020-08-26 13:59:47,555 - quantizer.py:117 - INFO - Acc: 81.03 %  Size: 0.974913 MB
Quantization-Aware Training
$ python quantize.py --config config/quantize/cifar100/densenet_small.py --checkpoint path_to_checkpoint.pth.tar --check-acc

2020-08-26 14:06:46,855 - trainer.py:71 - INFO - Created a model quant_densenet with 0.78M params
2020-08-26 14:06:49,506 - quantizer.py:87 - INFO - Acc: 81.2 %  Size: 3.286695 MB
2020-08-26 14:06:49,613 - quantizer.py:99 - INFO - Quantization Aware Training: Run training
2020-08-26 14:46:51,857 - trainer.py:209 - INFO - Epoch: [0 | 4]        train/lr: 0.0001        train/loss: 1.984219    test/loss: 1.436638     test/model_acc: 80.96%    test/best_acc: 80.96%
[Train] 100% (782 of 782) |########################################################################################| Elapsed Time: 0:38:09 Time:  0:38:09
[Test]  100% (157 of 157) |#########################################################################################| Elapsed Time: 0:02:40 Time:  0:02:40
2020-08-26 15:27:43,919 - trainer.py:209 - INFO - Epoch: [1 | 4]        train/lr: 9e-05 train/loss: 1.989543    test/loss: 1.435748     test/model_acc: 80.87%    test/best_acc: 80.96%
[Train] 100% (782 of 782) |########################################################################################| Elapsed Time: 0:38:10 Time:  0:38:10
[Test]  100% (157 of 157) |#########################################################################################| Elapsed Time: 0:02:36 Time:  0:02:36
2020-08-26 16:08:32,883 - trainer.py:209 - INFO - Epoch: [2 | 4]        train/lr: 6.5e-05       train/loss: 1.984149    test/loss: 1.436074     test/model_acc: 80.82%    test/best_acc: 80.96%
[Train] 100% (782 of 782) |########################################################################################| Elapsed Time: 0:38:14 Time:  0:38:14
[Test]  100% (157 of 157) |#########################################################################################| Elapsed Time: 0:02:39 Time:  0:02:39
2020-08-26 16:49:28,848 - trainer.py:209 - INFO - Epoch: [3 | 4]        train/lr: 3.5e-05       train/loss: 1.984537    test/loss: 1.43442      test/model_acc: 81.01%    test/best_acc: 81.01%
[Train] 100% (782 of 782) |########################################################################################| Elapsed Time: 0:38:19 Time:  0:38:19
[Test]  100% (157 of 157) |#########################################################################################| Elapsed Time: 0:02:42 Time:  0:02:42
2020-08-26 17:30:32,187 - trainer.py:209 - INFO - Epoch: [4 | 4]        train/lr: 1e-05 train/loss: 1.990936    test/loss: 1.435393     test/model_acc: 80.92%    test/best_acc: 81.01%
[Test]  100% (157 of 157) |#########################################################################################| Elapsed Time: 0:02:37 Time:  0:02:37
2020-08-26 17:33:10,689 - quantizer.py:117 - INFO - Acc: 81.01 %        Size: 0.974913 MB

Class Diagram


References

Papers

Architecture / Training
Augmentation
Pruning
Knowledge Distillation
Quantization

Implementations / Tutorials

Competition
Architecture / Training
Augmentation
Pruning
Knowledge Distillation
Quantization

Contributors

Thanks goes to these wonderful people (emoji key):


Jinwoo Park (Curt)

💻

Junghoon Kim

💻

Hyungseok Shin

💻

Juhee Lee

💻

This project follows the all-contributors specification. Contributions of any kind welcome!
