
Model Compression

License: MIT

All Contributors

Contents

  • Getting started
  • Usages
  • Experiment Results
  • Class Diagram
  • References
  • Contributors

Getting started

Prerequisites

  • This repository is implemented and verified in an Anaconda virtual environment with Python 3.7.

Installation

  1. Clone this repository.
$ git clone https://github.com/j-marple-dev/model_compression.git
$ cd model_compression
  2. Create a virtual environment.
$ conda env create -f environment.yml
$ conda activate model_compression

or

$ make install
$ conda activate model_compression
  3. (Optional, for contributors) Install the CI environment.
$ conda activate model_compression
$ make dev
  4. (Optional, for NVIDIA GPUs) Install cudatoolkit.
$ conda activate model_compression
$ conda install -c pytorch cudatoolkit=${cuda_version}

After setting up the environment, you can validate the code with the following commands.

$ make format  # for formatting
$ make test  # for linting

Docker

  1. Clone this repository.
$ git clone https://github.com/j-marple-dev/model_compression.git
$ cd model_compression
  2. Make sure you have installed Docker Engine and nvidia-docker.

  3. Run the docker image.

$ docker run -it --gpus all --ipc=host -v $PWD:/app/model_compression jmarpledev/model_compression:latest /bin/bash
$ cd model_compression

Usages

Run training

Train a model. The trainer supports the following options:

$ python train.py --help
usage: train.py [-h] [--multi-gpu] [--gpu GPU] [--finetune FINETUNE]
                [--resume RESUME] [--half] [--wlog] [--config CONFIG]

Model trainer.

optional arguments:
  -h, --help           show this help message and exit
  --multi-gpu          Multi-GPU use
  --gpu GPU            GPU id to use
  --finetune FINETUNE  Model path to finetune (.pth.tar)
  --resume RESUME      Input log directory name to resume in save/checkpoint
  --half               Use half precision
  --wlog               Turns on wandb logging
  --config CONFIG      Configuration path (.py)

$ python train.py --config path_to_config.py  # basic run
$ python train.py --config path_to_config.py  --gpu 1 --resume checkpoint_dir_name # resume training on gpu 1

Configurations for training

The following options are available:

  • Basic Settings: BATCH_SIZE, EPOCHS, SEED, MODEL_NAME(src/models), MODEL_PARAMS, DATASET
  • Stochastic Gradient Descent: MOMENTUM, WEIGHT_DECAY, LR
  • Image Augmentation: AUG_TRAIN(src/augmentation/policies.py), AUG_TRAIN_PARAMS, AUG_TEST(src/augmentation/policies.py), CUTMIX
  • Loss: CRITERION(src/criterions.py), CRITERION_PARAMS
  • Learning Rate Scheduler: LR_SCHEDULER(src/lr_schedulers.py), LR_SCHEDULER_PARAMS
# Example of a train config (config/train/cifar/densenet_121.py)
import os

config = {
    "SEED": 777,
    "AUG_TRAIN": "randaugment_train_cifar100_224",
    "AUG_TRAIN_PARAMS": dict(n_select=2, level=None),
    "AUG_TEST": "simple_augment_test_cifar100_224",
    "CUTMIX": dict(beta=1, prob=0.5),
    "DATASET": "CIFAR100",
    "MODEL_NAME": "densenet",
    "MODEL_PARAMS": dict(
        num_classes=100,
        inplanes=24,
        growthRate=32,
        compressionRate=2,
        block_configs=(6, 12, 24, 16),
        small_input=False,
        efficient=False,
    ),
    "CRITERION": "CrossEntropy", # CrossEntropy, HintonKLD
    "CRITERION_PARAMS": dict(num_classes=100, label_smoothing=0.1),
    "LR_SCHEDULER": "WarmupCosineLR", # WarmupCosineLR, Identity, MultiStepLR
    "LR_SCHEDULER_PARAMS": dict(
        warmup_epochs=5, start_lr=1e-3, min_lr=1e-5, n_rewinding=1
    ),
    "BATCH_SIZE": 128,
    "LR": 0.1,
    "MOMENTUM": 0.9,
    "WEIGHT_DECAY": 1e-4,
    "NESTEROV": True,
    "EPOCHS": 300,
    "N_WORKERS": os.cpu_count(),
}
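
Each config is a plain Python module that exposes a top-level config dict. As a rough illustration only (the loader below is an assumption; the exact code inside train.py may differ), such a module can be imported dynamically from the path given to --config:
# Hedged sketch: import a config module from its file path (illustrative, not train.py's code).
import importlib.util

def load_config(path: str) -> dict:
    spec = importlib.util.spec_from_file_location("train_config", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.config  # every config file defines a top-level `config` dict

cfg = load_config("config/train/cifar100/densenet_small.py")  # example path
print(cfg["MODEL_NAME"], cfg["BATCH_SIZE"], cfg["EPOCHS"])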

Run pruning

Pruning makes a model sparse. The pruner supports the following methods:

  1. Unstructured Pruning
  2. Structured (Channel-wise) Pruning

Usually, unstructured pruning gives more sparsity, but it doesn't support shrinking.
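
For intuition, the following minimal sketch contrasts the two styles using torch.nn.utils.prune; it is an illustration only, not the Pruner implementation in this repository:
# Minimal sketch contrasting unstructured and structured magnitude pruning
# using torch.nn.utils.prune (illustration only; not this repository's Pruner).
import torch.nn as nn
import torch.nn.utils.prune as prune

conv = nn.Conv2d(16, 32, kernel_size=3)

# Unstructured: zero the 20% smallest-magnitude individual weights.
prune.l1_unstructured(conv, name="weight", amount=0.2)
prune.remove(conv, "weight")  # bake the mask into the weight tensor

# Structured: zero entire output channels (dim=0) ranked by L2 norm; only this
# style lets the pruned channels be physically removed (shrunk) later.
prune.ln_structured(conv, name="weight", amount=0.2, n=2, dim=0)
print(f"overall sparsity: {(conv.weight == 0).float().mean().item():.2%}")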

$ python prune.py --help
usage: prune.py [-h] [--multi-gpu] [--gpu GPU] [--resume RESUME] [--wlog]
                [--config CONFIG]

Model pruner.

optional arguments:
  -h, --help       show this help message and exit
  --multi-gpu      Multi-GPU use
  --gpu GPU        GPU id to use
  --resume RESUME  Input checkpoint directory name
  --wlog           Turns on wandb logging
  --config CONFIG  Configuration path


$ python prune.py --config path_to_config.py  # basic run
$ python prune.py --config path_to_config.py --multi-gpu --wlog  # run on multi-gpu with wandb logging

Configurations for pruning

A pruning configuration extends a training configuration (recommended) with the following options:

  • Basic Training Settings: TRAIN_CONFIG
  • Pruning Settings: N_PRUNING_ITER, PRUNE_METHOD(src/runner/pruner.py), PRUNE_PARAMS
# Example of a prune config (config/prune/cifar100/densenet_small_l2mag.py)
from config.train.cifar100 import densenet_small

train_config = densenet_small.config
config = {
    "TRAIN_CONFIG": train_config,
    "N_PRUNING_ITER": 15,
    "PRUNE_METHOD": "Magnitude", # LotteryTicketHypothesis, Magnitude, NetworkSlimming, SlimMagnitude
    "PRUNE_PARAMS": dict(
        PRUNE_AMOUNT=0.2,  # iteratively prunes 20% of the network parameters at the end of each training run
        NORM=2,
        STORE_PARAM_BEFORE=train_config["EPOCHS"],  # used for weight initialization at every pruning iteration
        TRAIN_START_FROM=0,  # training starts from this epoch
        PRUNE_AT_BEST=False,  # if True, it prunes parameters at the trained network which achieves the best accuracy
                              # otherwise, it prunes the network at the end of training
    ),
}
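
To make these parameters concrete, the sketch below shows the general shape of an iterative prune-and-rewind loop (N_PRUNING_ITER rounds, pruning PRUNE_AMOUNT per round, then rewinding weights to a stored snapshot). It is a simplified, hypothetical illustration rather than the repository's Pruner:
# Simplified illustration of iterative magnitude pruning with weight rewinding.
# Not the repository's Pruner; mask bookkeeping and scheduling differ in practice.
import copy
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

def iterative_prune(model: nn.Module, train_fn, n_iter: int = 15, amount: float = 0.2):
    convs = {name: m for name, m in model.named_modules() if isinstance(m, nn.Conv2d)}
    stored = copy.deepcopy(model.state_dict())   # snapshot to rewind to (cf. STORE_PARAM_BEFORE)
    for _ in range(n_iter):                      # cf. N_PRUNING_ITER
        train_fn(model)                          # one full training run
        for module in convs.values():            # prune a further fraction (cf. PRUNE_AMOUNT)
            prune.l1_unstructured(module, name="weight", amount=amount)
        with torch.no_grad():                    # rewind surviving weights to the snapshot
            for name, module in convs.items():
                module.weight_orig.copy_(stored[name + ".weight"])
    return model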

Run shrinking (Experimental)

Shrinking reshapes a pruned model and reduces its size.

$ python shrink.py --help
usage: shrink.py [-h] [--gpu GPU] [--checkpoint CHECKPOINT] [--config CONFIG]

Model shrinker.

optional arguments:
  -h, --help            show this help message and exit
  --gpu GPU             GPU id to use
  --checkpoint CHECKPOINT
                        input checkpoint path to quantize
  --config CONFIG       Pruning configuration path

$ python shrink.py --config path_to_config.py --checkpoint path_to_checkpoint.pth.tar  # basic run
Important Notes:

The shrinker is currently experimental. It only supports:

  • channel-wise pruned models
  • networks that consist of conv-bn-activation sequences
  • network blocks that have channel concatenation followed by skip connections (e.g. DenseNet)
  • networks that have only one final fully-connected layer

On the other hand, it doesn't support:

  • network blocks that have element-wise sums followed by skip connections (e.g. ResNet, MixNet)
  • networks that have multiple fully-connected layers
  • quantization after shrinking
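
The core idea can be sketched on a single conv-bn pair: given a mask of surviving output channels (e.g. channels whose batch-norm scale was not zeroed by structured pruning), build physically smaller layers that copy only those channels. The sketch below is an illustration under simplifying assumptions (no grouped convolutions, a single layer pair), not the repository's Shrinker:
# Illustrative shrinking of one conv-bn pair (not the repository's Shrinker).
import torch
import torch.nn as nn

def shrink_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d, keep: torch.Tensor):
    """Return a smaller (conv, bn) pair keeping only output channels where `keep` is True."""
    idx = torch.nonzero(keep, as_tuple=False).flatten()
    new_conv = nn.Conv2d(conv.in_channels, len(idx), conv.kernel_size,
                         stride=conv.stride, padding=conv.padding,
                         bias=conv.bias is not None)
    new_conv.weight.data = conv.weight.data[idx].clone()
    if conv.bias is not None:
        new_conv.bias.data = conv.bias.data[idx].clone()
    new_bn = nn.BatchNorm2d(len(idx))
    for name in ("weight", "bias", "running_mean", "running_var"):
        getattr(new_bn, name).data = getattr(bn, name).data[idx].clone()
    return new_conv, new_bn

conv, bn = nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16)
keep = bn.weight.detach().abs() > 1e-8       # channels that survived pruning
small_conv, small_bn = shrink_conv_bn(conv, bn, keep)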

Run quantization

The quantizer runs one of the following 8-bit quantization methods:

  • Post-training static quantization
  • Quantization-aware training (QAT)
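
For context, the sketch below shows the generic eager-mode PyTorch workflow for post-training static quantization (fuse modules, attach a qconfig, calibrate, convert); quantization-aware training instead uses prepare_qat and fine-tunes before converting. This is an illustration of the torch.quantization API, not this repository's quantizer:
# Generic eager-mode post-training static quantization (illustration only).
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.conv, self.bn, self.relu = nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU()
        self.pool, self.fc = nn.AdaptiveAvgPool2d(1), nn.Linear(8, 10)
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        x = self.pool(self.relu(self.bn(self.conv(self.quant(x)))))
        return self.dequant(self.fc(torch.flatten(x, 1)))

model = TinyNet().eval()
torch.quantization.fuse_modules(model, [["conv", "bn", "relu"]], inplace=True)
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
torch.quantization.prepare(model, inplace=True)
with torch.no_grad():                            # calibration with representative data
    model(torch.randn(32, 3, 32, 32))
torch.quantization.convert(model, inplace=True)  # weights/activations become int8
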
$ python quantize.py --help
usage: quantize.py [-h] [--resume RESUME] [--wlog] [--config CONFIG]
                   [--checkpoint CHECKPOINT]

Model quantizer.

optional arguments:
  -h, --help            show this help message and exit
  --resume RESUME       Input log directory name to resume
  --wlog                Turns on wandb logging
  --static              Post-training static quantization
  --config CONFIG       Configuration path
  --checkpoint CHECKPOINT
                        Input checkpoint path to quantize

$ python quantize.py --config path_to_config.py --checkpoint path_to_checkpoint.pth.tar  # basic qat run
$ python quantize.py --config path_to_config.py --checkpoint path_to_checkpoint.pth.tar --static  # basic static quantization run

Experiment Results

WANDB Log

Unstructured Pruning (LTH vs Weight Rewinding vs LR Rewinding)


Structured Pruning (Slim vs L2Mag vs L2MagSlim)


Shrinking after Structured Pruning

Densenet (L=100, k=12) pruned by 19.66% (Slim & CIFAR100)


  • Accuracy: 80.37%
  • Parameters: 0.78M -> 0.51M
  • Model Size: 6.48 MB -> 4.14 MB
$ python shrink.py --config config/prune/cifar100/densenet_small_slim.py --checkpoint path_to_checkpoint.pth.tar

2020-08-26 13:50:38,442 - trainer.py:71 - INFO - Created a model densenet with 0.78M params
[Test]  100% (157 of 157) |#########################################################################################| Elapsed Time: 0:00:02 Time:  0:00:02
2020-08-26 13:50:42,719 - shrinker.py:104 - INFO - Acc: 80.37, Size: 6.476016 MB, Sparsity: 19.66 %
[Test]  100% (157 of 157) |#########################################################################################| Elapsed Time: 0:00:02 Time:  0:00:02
2020-08-26 13:50:45,781 - shrinker.py:118 - INFO - Acc: 80.37, Size: 4.141713 MB, Params: 0.51 M
Densenet (L=100, k=12) pruned by 35.57% (Network Slimming & CIFAR100)


  • Accuracy: 79.07%
  • Parameters: 0.78M -> 0.35M
  • Model Size: 6.48 MB -> 2.85 MB
$ python shrink.py --config config/prune/cifar100/densenet_small_slim.py --checkpoint path_to_checkpoint.pth.tar

2020-08-26 13:52:58,946 - trainer.py:71 - INFO - Created a model densenet with 0.78M params
[Test]  100% (157 of 157) |#########################################################################################| Elapsed Time: 0:00:02 Time:  0:00:02
2020-08-26 13:53:03,100 - shrinker.py:104 - INFO - Acc: 79.07, Size: 6.476016 MB, Sparsity: 35.57 %
[Test]  100% (157 of 157) |#########################################################################################| Elapsed Time: 0:00:02 Time:  0:00:02
2020-08-26 13:53:06,114 - shrinker.py:118 - INFO - Acc: 79.07, Size: 2.851149 MB, Params: 0.35 M

Quantization

Post-training Static Quantization
$ python quantize.py --config config/quantize/cifar100/densenet_small.py --checkpoint save/test/densenet_small/296_81_20.pth.tar --static --check-acc

2020-08-26 13:57:02,595 - trainer.py:71 - INFO - Created a model quant_densenet with 0.78M params
2020-08-26 13:57:05,275 - quantizer.py:87 - INFO - Acc: 81.2 %  Size: 3.286695 MB
2020-08-26 13:57:05,344 - quantizer.py:95 - INFO - Post Training Static Quantization: Run calibration
[Test]  100% (157 of 157) |#########################################################################################| Elapsed Time: 0:02:40 Time:  0:02:40
2020-08-26 13:59:47,555 - quantizer.py:117 - INFO - Acc: 81.03 %  Size: 0.974913 MB
Quantization-Aware Training
$ python quantize.py --config config/quantize/cifar100/densenet_small.py --checkpoint path_to_checkpoint.pth.tar --check-acc

2020-08-26 14:06:46,855 - trainer.py:71 - INFO - Created a model quant_densenet with 0.78M params
2020-08-26 14:06:49,506 - quantizer.py:87 - INFO - Acc: 81.2 %  Size: 3.286695 MB
2020-08-26 14:06:49,613 - quantizer.py:99 - INFO - Quantization Aware Training: Run training
2020-08-26 14:46:51,857 - trainer.py:209 - INFO - Epoch: [0 | 4]        train/lr: 0.0001        train/loss: 1.984219    test/loss: 1.436638     test/model_acc: 80.96%    test/best_acc: 80.96%
[Train] 100% (782 of 782) |########################################################################################| Elapsed Time: 0:38:09 Time:  0:38:09
[Test]  100% (157 of 157) |#########################################################################################| Elapsed Time: 0:02:40 Time:  0:02:40
2020-08-26 15:27:43,919 - trainer.py:209 - INFO - Epoch: [1 | 4]        train/lr: 9e-05 train/loss: 1.989543    test/loss: 1.435748     test/model_acc: 80.87%    test/best_acc: 80.96%
[Train] 100% (782 of 782) |########################################################################################| Elapsed Time: 0:38:10 Time:  0:38:10
[Test]  100% (157 of 157) |#########################################################################################| Elapsed Time: 0:02:36 Time:  0:02:36
2020-08-26 16:08:32,883 - trainer.py:209 - INFO - Epoch: [2 | 4]        train/lr: 6.5e-05       train/loss: 1.984149    test/loss: 1.436074     test/model_acc: 80.82%    test/best_acc: 80.96%
[Train] 100% (782 of 782) |########################################################################################| Elapsed Time: 0:38:14 Time:  0:38:14
[Test]  100% (157 of 157) |#########################################################################################| Elapsed Time: 0:02:39 Time:  0:02:39
2020-08-26 16:49:28,848 - trainer.py:209 - INFO - Epoch: [3 | 4]        train/lr: 3.5e-05       train/loss: 1.984537    test/loss: 1.43442      test/model_acc: 81.01%    test/best_acc: 81.01%
[Train] 100% (782 of 782) |########################################################################################| Elapsed Time: 0:38:19 Time:  0:38:19
[Test]  100% (157 of 157) |#########################################################################################| Elapsed Time: 0:02:42 Time:  0:02:42
2020-08-26 17:30:32,187 - trainer.py:209 - INFO - Epoch: [4 | 4]        train/lr: 1e-05 train/loss: 1.990936    test/loss: 1.435393     test/model_acc: 80.92%    test/best_acc: 81.01%
[Test]  100% (157 of 157) |#########################################################################################| Elapsed Time: 0:02:37 Time:  0:02:37
2020-08-26 17:33:10,689 - quantizer.py:117 - INFO - Acc: 81.01 %        Size: 0.974913 MB

Class Diagram


References

Papers

Architecture / Training
Augmentation
Pruning
Knowledge Distillation
Quantization

Implementations / Tutorials

Competition
Architecture / Training
Augmentation
Pruning
Knowledge Distillation
Quantization

Contributors

Thanks goes to these wonderful people (emoji key):


Jinwoo Park (Curt)

💻

Junghoon Kim

💻

Hyungseok Shin

💻

Juhee Lee

💻

This project follows the all-contributors specification. Contributions of any kind welcome!
