
Adlik / model_optimizer

Licence: other
Model optimizer used in Adlik.

Programming Languages

Python
Shell

Projects that are alternatives of or similar to model optimizer

Distill-BERT-Textgen
Research code for ACL 2020 paper: "Distilling Knowledge Learned in BERT for Text Generation".
Stars: ✭ 121 (+450%)
Mutual labels:  knowledge-distillation
Zero-shot Knowledge Distillation Pytorch
ZSKD with PyTorch
Stars: ✭ 26 (+18.18%)
Mutual labels:  knowledge-distillation
SemCKD
This is the official implementation for the AAAI-2021 paper (Cross-Layer Distillation with Semantic Calibration).
Stars: ✭ 42 (+90.91%)
Mutual labels:  knowledge-distillation
LD
Localization Distillation for Dense Object Detection (CVPR 2022)
Stars: ✭ 271 (+1131.82%)
Mutual labels:  knowledge-distillation
AB distillation
Knowledge Transfer via Distillation of Activation Boundaries Formed by Hidden Neurons (AAAI 2019)
Stars: ✭ 105 (+377.27%)
Mutual labels:  knowledge-distillation
bert-AAD
Adversarial Adaptation with Distillation for BERT Unsupervised Domain Adaptation
Stars: ✭ 27 (+22.73%)
Mutual labels:  knowledge-distillation
LabelRelaxation-CVPR21
Official PyTorch Implementation of Embedding Transfer with Label Relaxation for Improved Metric Learning, CVPR 2021
Stars: ✭ 37 (+68.18%)
Mutual labels:  knowledge-distillation
FGD
Focal and Global Knowledge Distillation for Detectors (CVPR 2022)
Stars: ✭ 124 (+463.64%)
Mutual labels:  knowledge-distillation
MoTIS
Mobile(iOS) Text-to-Image search powered by multimodal semantic representation models(e.g., OpenAI's CLIP). Accepted at NAACL 2022.
Stars: ✭ 60 (+172.73%)
Mutual labels:  knowledge-distillation
cool-papers-in-pytorch
Reimplementing cool papers in PyTorch...
Stars: ✭ 21 (-4.55%)
Mutual labels:  knowledge-distillation
SAN
[ECCV 2020] Scale Adaptive Network: Learning to Learn Parameterized Classification Networks for Scalable Input Images
Stars: ✭ 41 (+86.36%)
Mutual labels:  knowledge-distillation
MutualGuide
Localize to Classify and Classify to Localize: Mutual Guidance in Object Detection
Stars: ✭ 97 (+340.91%)
Mutual labels:  knowledge-distillation
FKD
A Fast Knowledge Distillation Framework for Visual Recognition
Stars: ✭ 49 (+122.73%)
Mutual labels:  knowledge-distillation
MLIC-KD-WSD
Multi-Label Image Classification via Knowledge Distillation from Weakly-Supervised Detection (ACM MM 2018)
Stars: ✭ 58 (+163.64%)
Mutual labels:  knowledge-distillation
ProSelfLC-2021
noisy labels; missing labels; semi-supervised learning; entropy; uncertainty; robustness and generalisation.
Stars: ✭ 45 (+104.55%)
Mutual labels:  knowledge-distillation
torchprune
A research library for pytorch-based neural network pruning, compression, and more.
Stars: ✭ 133 (+504.55%)
Mutual labels:  pruning-algorithms
mmrazor
OpenMMLab Model Compression Toolbox and Benchmark.
Stars: ✭ 644 (+2827.27%)
Mutual labels:  knowledge-distillation
ACCV TinyGAN
BigGAN; Knowledge Distillation; Black-Box; Fast Training; 16x compression
Stars: ✭ 62 (+181.82%)
Mutual labels:  knowledge-distillation
Efficient-Computing
Efficient-Computing
Stars: ✭ 474 (+2054.55%)
Mutual labels:  knowledge-distillation
openvino pytorch layers
How to export PyTorch models with unsupported layers to ONNX and then to Intel OpenVINO
Stars: ✭ 17 (-22.73%)
Mutual labels:  model-optimizer

Model Optimizer


Adlik model optimizer focuses on optimizations that run on specific hardware to achieve acceleration. Because sparsity pruning depends on special algorithms and hardware to achieve acceleration, its usage scenarios are limited. Adlik pruning therefore focuses on channel pruning and filter pruning, which actually reduce the number of parameters and FLOPs. For quantization, Adlik focuses on 8-bit quantization, which is easier to accelerate on specific hardware. Testing shows that calibrating with only a small batch of data yields a quantized model with little loss of accuracy, so Adlik focuses on this method. Knowledge distillation is another way to improve the performance of deep learning models: it compresses the knowledge of a large model into a smaller one.

The proposed framework mainly consists of two categories of algorithm components, i.e., the pruner and the quantizer. The pruner is composed of five modules: core, scheduler, models, dataset and learner. The core module defines the various pruning algorithms. The scheduler orchestrates the process of each pruning algorithm. The models module defines the network of each model. The dataset module handles data loading and preprocessing. The learner handles model training and fine-tuning, including the definition of each hyper-parameter. The quantizer consists of three modules: the calibration dataset, TF-Lite quantization and TF-TRT quantization.

(Figure: system structure)

After filter pruning, the model can be further quantized. The following table shows the accuracy of the pruned and quantized LeNet-5 and ResNet-50 models.

| Model | Baseline (%) | Pruned (%) | Pruned + Quantization TF-Lite (%) | Pruned + Quantization TF-TRT (%) |
| --- | --- | --- | --- | --- |
| LeNet-5 | 98.85 | 99.11 (59% pruned) | 99.05 | 99.11 |
| ResNet-50 | 76.174 | 75.456 (31.9% pruned) | 75.158 | 75.28 |

The pruner completely removes redundant parameters, which leads to a smaller model size and faster execution. The following table lists the sizes of the above model files:

| Model | Baseline (H5) | Pruned (H5) | Quantization (TF-Lite) | Quantization (TF-TRT) |
| --- | --- | --- | --- | --- |
| LeNet-5 | 1176 KB | 499 KB (59% pruned) | 120 KB | 1154 KB (pb) |
| ResNet-50 | 99 MB | 67 MB (31.9% pruned) | 18 MB | 138 MB (pb) |

Currently, the MobileNet-v1 model has only been pruned, not quantized. The table below shows the results for different pruning ratios, tested on ImageNet. The original test accuracy is 71.25% and the original model size is 17 MB.

| Pruning ratio (%) | FLOPs (%) | Params (%) | Test accuracy (%) | Size (MB) |
| --- | --- | --- | --- | --- |
| 25 | -33.12 | -38.37 | 69.658 | 11 (-35.29%) |
| 35 | -51.32 | -51.41 | 68.66 | 8.2 (-51.76%) |
| 50 | -57.21 | -67.69 | 66.87 | 5.5 (-67.65%) |

Knowledge distillation is an effective way to improve the performance of a model.

The following table shows the distillation result with ResNet-50 as the student network and ResNet-101 as the teacher network.

| Student model | Accuracy distilled from ResNet-101 | Accuracy change |
| --- | --- | --- |
| ResNet-50 | 77.14% | +0.97% |

Ensemble distillation can significantly improve the accuracy of the model. With 72.8% of the parameters pruned and senet154 and resnet152b as the teacher networks, ensemble distillation increases the accuracy by more than 4%. The details are shown in the table below, and the code can be found in examples/resnet_50_imagenet_prune_distill.py.

| Model | Accuracy (%) | Params | FLOPs | Model size |
| --- | --- | --- | --- | --- |
| ResNet-50 | 76.174 | 25610152 | 3899M | 99M |
| + pruned | 72.28 | 6954152 (72.8% pruned) | 1075M | 27M |
| + pruned + distill | 76.39 | 6954152 (72.8% pruned) | 1075M | 27M |
| + pruned + distill + quantization (TF-Lite) | 75.938 | - | - | 7.1M |

1. Pruning and quantization principle

1.1 Filter pruning

Filter pruning removes a complete filter. After a filter is cut out, the corresponding output feature map is cut out as well. As shown in the following figure, after one filter is removed, the original four output feature maps become three.

(Figure: Prune Conv2D)

If another convolution layer follows this one, its input now has fewer channels, so we must also shrink that layer's weight tensor by removing the channels that correspond to the pruned filters. As shown below, each filter of the next convolutional layer originally had four channels, which becomes three accordingly.

(Figure: Prune Related Conv2D)

Refer to the paper Pruning Filters for Efficient ConvNets.
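
As a concrete illustration of these two steps (not the Adlik implementation; shapes and values are made up), the following NumPy sketch removes one filter from a Conv2D kernel and then removes the matching input channel from the next layer's kernel, using TensorFlow's (height, width, in_channels, out_channels) weight layout.

# Illustrative sketch of filter pruning with NumPy; not the Adlik pruner itself.
import numpy as np

# Kernel of the first Conv2D layer: (kh, kw, in_channels=3, out_channels=4).
conv1_kernel = np.random.randn(3, 3, 3, 4)
# Kernel of the next Conv2D layer: its in_channels must equal conv1's out_channels.
conv2_kernel = np.random.randn(3, 3, 4, 8)

filter_to_prune = 1  # index of the filter (output channel) removed from conv1

# Removing the filter drops one of conv1's output feature maps (4 -> 3) ...
pruned_conv1 = np.delete(conv1_kernel, filter_to_prune, axis=3)  # shape (3, 3, 3, 3)
# ... so the corresponding input channel of conv2 must be removed as well (4 -> 3).
pruned_conv2 = np.delete(conv2_kernel, filter_to_prune, axis=2)  # shape (3, 3, 3, 8)

print(pruned_conv1.shape, pruned_conv2.shape)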

1.2 Quantization

Quantization during the training phase requires a description of the training model and the full dataset, and it takes a lot of computing power to quantize a large model. Small-batch quantization, in contrast, needs only the inference model and very little calibration data; its accuracy loss is very small, and for some models the accuracy even improves. Adlik needs only 100 sample images to quantize ResNet-50 in less than one minute.
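
For reference, post-training 8-bit quantization with a small calibration set can be done with the standard TensorFlow Lite converter, roughly as sketched below; the model path, input shape and random calibration data are placeholders, and this is the generic TF-Lite API rather than the Adlik quantizer.

# Generic TF-Lite post-training quantization sketch; paths and data are placeholders.
import tensorflow as tf

model = tf.keras.models.load_model("saved_lenet_model")  # placeholder path

def representative_dataset():
    # Yield about 100 calibration samples; random data stands in for real images here.
    for _ in range(100):
        yield [tf.random.normal([1, 28, 28, 1])]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
tflite_model = converter.convert()

with open("lenet_int8.tflite", "wb") as f:
    f.write(tflite_model)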

1.3 Knowledge Distillation

Knowledge distillation is a compression technique by which the knowledge of a larger model (the teacher) is transferred into a smaller one (the student). During distillation, the student learns to generalize well from the teacher by using the teacher's final softmax, computed at a raised temperature, as a set of soft targets.

(Figure: Distillation)

Refer to the paper Distilling the Knowledge in a Neural Network
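
The temperature-scaled soft targets described above can be turned into a loss in a few lines. The sketch below is a generic formulation with assumed temperature and weighting values, not the Adlik distillation code.

# Generic knowledge-distillation loss sketch; temperature and alpha are assumed values.
import tensorflow as tf

def distillation_loss(teacher_logits, student_logits, labels, temperature=4.0, alpha=0.9):
    # Soft targets: the teacher's softmax at a raised temperature.
    soft_targets = tf.nn.softmax(teacher_logits / temperature)
    # Cross-entropy between soft targets and the student's tempered predictions,
    # scaled by T^2 as suggested in the paper.
    soft_loss = -tf.reduce_mean(
        tf.reduce_sum(soft_targets * tf.nn.log_softmax(student_logits / temperature), axis=-1)
    ) * temperature ** 2
    # Hard targets: the usual cross-entropy against the ground-truth labels.
    hard_loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=student_logits))
    return alpha * soft_loss + (1.0 - alpha) * hard_loss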

2. Installation

These instructions will help you get the Adlik model optimizer up and running on your local machine.

  1. Clone Adlik model optimizer
  2. Install the package

Notes:

  • Adlik model optimizer has only been tested on Ubuntu 18.04 LTS with Python 3.6.

2.1 Clone Adlik model optimizer

Clone the Adlik model optimizer code repository from github:

git clone https://github.com/Adlik/model_optimizer.git

2.2 Install the package

2.2.1 Install Open MPI

mkdir /tmp/openmpi && \
cd /tmp/openmpi && \
curl -fSsL -O https://www.open-mpi.org/software/ompi/v4.0/downloads/openmpi-4.0.0.tar.gz && \
tar zxf openmpi-4.0.0.tar.gz && \
cd openmpi-4.0.0 && \
./configure --enable-orterun-prefix-by-default && \
make -j $(nproc) all && \
make install && \
ldconfig && \
rm -rf /tmp/openmpi

2.2.2 Install python package

pip install tensorflow-gpu==2.3.0
pip install horovod==0.19.1
pip install mpi4py
pip install networkx
pip install jsonschema
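
Optionally, a quick way to check that the packages import and that Horovod initializes (this snippet is illustrative and not part of the official steps):

# Optional sanity check of the installed packages; not part of the Adlik instructions.
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()
print("TensorFlow:", tf.__version__)
print("Visible GPUs:", tf.config.list_physical_devices("GPU"))
print("Horovod rank %d of %d" % (hvd.rank(), hvd.size()))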

3. Usage

The following uses LeNet-5 on the MNIST dataset to illustrate how to use Adlik model optimizer to achieve model training, pruning, and quantization.

3.1 Prepare data

3.1.1 Generate training and test datasets

Enter the tools directory and execute

cd tools
python export_mnist_to_tfrecord.py

By default, the train.tfrecords and test.tfrecords files will be generated in the ../examples/data/mnist directory. You can change the default storage path with the parameter --data_dir.

3.1.2 Generate small batch data sets required for int-8 quantization

Enter the tools directory and execute

cd tools
python generator_tiny_record_mnist.py

By default, the mnist_tiny_100.tfrecord file will be generated in the ../examples/data/mnist_tiny directory.

3.1.3 Training

Enter the examples directory and execute

cd examples
python lenet_mnist_train.py

After execution, the default checkpoint file will be generated in ./models_ckpt/lenet_mnist, and the inference checkpoint file will be generated in ./models_eval_ckpt/lenet_mnist. You can also modify the checkpoint_path and checkpoint_eval_path of the lenet_mnist_train.py file to change the generated file path.

3.1.4 Pruning

Enter the examples directory and execute

cd examples
python lenet_mnist_prune.py

After execution, the default checkpoint file will be generated in ./models_ckpt/lenet_mnist_pruned, and the inference checkpoint file will be generated in ./models_eval_ckpt/lenet_mnist_pruned. You can also modify the checkpoint_path and checkpoint_eval_path of the lenet_mnist_train.py file to change the generated file path.

3.1.5 Quantize and generate a TensorFlow Lite FlatBuffer file

Enter the examples directory and execute

cd examples
python lenet_mnist_quantize_tflite.py

After execution, the default checkpoint file will be generated in ./models_ckpt/lenet_mnist_pruned, and the tflite file will be generated in ./models_eval_ckpt/lenet_mnist_quantized. You can also modify the export_path of the lenet_mnist_quantize.py file to change the generated file path.

To verify the accuracy after quantization, enter the tools directory and execute

cd tools
python tflite_model_test_lenet_mnist.py
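
If you want to run the quantized model by hand, the generic TF-Lite interpreter API can be used as sketched below; the model path and the dummy input are placeholders, and the Adlik test script above is the authoritative way to measure accuracy.

# Generic TF-Lite inference sketch; model path and input are placeholders.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="lenet_int8.tflite")  # placeholder path
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# One dummy MNIST-shaped image; replace with real test data to measure accuracy.
image = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], image)
interpreter.invoke()
prediction = interpreter.get_tensor(output_details[0]["index"])
print("Predicted class:", int(np.argmax(prediction)))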

3.1.6 Quantize and generate a TensorFlow with TensorRT (TF-TRT) file

Enter the examples directory and execute

cd examples
python lenet_mnist_quantize_tftrt.py

After execution, the SavedModel file will be generated in ./models_eval_ckpt/lenet_mnist_quantized/lenet_mnist_tftrt/1 by default. You can also modify the export_path of the lenet_mnist_quantize.py file to change the generated file path.

To verify the accuracy after quantization, enter the tools directory and execute

cd tools
python tftrt_model_test_lenet_mnist.py
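
For reference, converting a SavedModel with TF-TRT INT8 calibration looks roughly like the sketch below (generic TensorFlow 2.x API, requiring a TensorRT-enabled TensorFlow build); the paths and the random calibration data are placeholders, and this is not the Adlik script.

# Generic TF-TRT INT8 conversion sketch; paths and calibration data are placeholders.
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
    precision_mode=trt.TrtPrecisionMode.INT8)
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="lenet_saved_model",  # placeholder input path
    conversion_params=params)

def calibration_input_fn():
    # Yield a few calibration batches; random data stands in for real images here.
    for _ in range(10):
        yield (tf.random.normal([1, 28, 28, 1]),)

converter.convert(calibration_input_fn=calibration_input_fn)
converter.save("lenet_tftrt_int8")  # placeholder output directory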

4. Others

If you have a GPU that can be used to accelerate training, you can test pruning and quantization for ResNet-50. The steps are the same as described above. You can get detailed instructions from here.

4.1 Use multiple GPUs

     cd examples
     horovodrun -np 8 -H localhost:8 python resnet_50_imagenet_train.py

4.2 Adjust batch size and learning rate

Batch size is an important hyper-parameter for deep learning model training. If you have more GPU memory available, you can try a larger batch size, but you have to adjust the learning rate according to the batch size; a common scaling heuristic is sketched after the table below.

| Model | Card | Batch size | Learning rate |
| --- | --- | --- | --- |
| ResNet-50 | V100 32GB | 256 | 0.1 |
| ResNet-50 | P100 16GB | 128 | 0.05 |
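
A common heuristic, and an assumption here rather than an Adlik rule, is to scale the learning rate linearly with the batch size; it reproduces the two rows above.

# Linear learning-rate scaling heuristic; reference values match the table above.
def scaled_learning_rate(batch_size, base_batch_size=256, base_lr=0.1):
    return base_lr * batch_size / base_batch_size

print(scaled_learning_rate(256))  # 0.1
print(scaled_learning_rate(128))  # 0.05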