
jakc4103 / DFQ

Licence: MIT
PyTorch implementation of Data Free Quantization Through Weight Equalization and Bias Correction.

Programming Languages

Python: 139,335 projects; #7 most used programming language

Projects that are alternatives to or similar to DFQ

Pinto model zoo
A repository that shares tuning results of trained models generated by TensorFlow / Keras. Post-training quantization (Weight Quantization, Integer Quantization, Full Integer Quantization, Float16 Quantization), Quantization-aware training. TensorFlow Lite. OpenVINO. CoreML. TensorFlow.js. TF-TRT. MediaPipe. ONNX. [.tflite,.h5,.pb,saved_model,tfjs,tftrt,mlmodel,.xml/.bin, .onnx]
Stars: ✭ 634 (+407.2%)
Mutual labels:  quantization
Jacinto Ai Devkit
Training & Quantization of embedded friendly Deep Learning / Machine Learning / Computer Vision models
Stars: ✭ 49 (-60.8%)
Mutual labels:  quantization
Frostnet
FrostNet: Towards Quantization-Aware Network Architecture Search
Stars: ✭ 85 (-32%)
Mutual labels:  quantization
Awesome Automl And Lightweight Models
A list of high-quality (newest) AutoML works and lightweight models including 1.) Neural Architecture Search, 2.) Lightweight Structures, 3.) Model Compression, Quantization and Acceleration, 4.) Hyperparameter Optimization, 5.) Automated Feature Engineering.
Stars: ✭ 691 (+452.8%)
Mutual labels:  quantization
Model Optimization
A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.
Stars: ✭ 992 (+693.6%)
Mutual labels:  quantization
Dsq
pytorch implementation of "Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks"
Stars: ✭ 70 (-44%)
Mutual labels:  quantization
Awesome Emdl
Embedded and mobile deep learning research resources
Stars: ✭ 554 (+343.2%)
Mutual labels:  quantization
Tf2
An Open Source Deep Learning Inference Engine Based on FPGA
Stars: ✭ 113 (-9.6%)
Mutual labels:  quantization
Quantization.mxnet
Simulate quantization and quantization aware training for MXNet-Gluon models.
Stars: ✭ 42 (-66.4%)
Mutual labels:  quantization
Pyepr
Powerful, automated analysis and design of quantum microwave chips & devices [Energy-Participation Ratio and more]
Stars: ✭ 81 (-35.2%)
Mutual labels:  quantization
Libimagequant Rust
libimagequant (pngquant) bindings for the Rust language
Stars: ✭ 17 (-86.4%)
Mutual labels:  quantization
Sai
SDK for TEE AI Stick (includes model training script, inference library, examples)
Stars: ✭ 28 (-77.6%)
Mutual labels:  quantization
Vectorsinsearch
Dice.com repo to accompany the dice.com 'Vectors in Search' talk by Simon Hughes, from the Activate 2018 search conference, and the 'Searching with Vectors' talk from Haystack 2019 (US). Builds upon my conceptual search and semantic search work from 2015
Stars: ✭ 71 (-43.2%)
Mutual labels:  quantization
Paddleslim
PaddleSlim is an open-source library for deep model compression and architecture search.
Stars: ✭ 677 (+441.6%)
Mutual labels:  quantization
Trained Ternary Quantization
Reducing the size of convolutional neural networks
Stars: ✭ 90 (-28%)
Mutual labels:  quantization
Paddleclas
A treasure chest for image classification powered by PaddlePaddle
Stars: ✭ 625 (+400%)
Mutual labels:  quantization
Ntagger
reference pytorch code for named entity tagging
Stars: ✭ 58 (-53.6%)
Mutual labels:  quantization
Model Quantization
Collections of model quantization algorithms
Stars: ✭ 118 (-5.6%)
Mutual labels:  quantization
Hawq
Quantization library for PyTorch. Support low-precision and mixed-precision quantization, with hardware implementation through TVM.
Stars: ✭ 108 (-13.6%)
Mutual labels:  quantization
Micronet
micronet, a model compression and deploy lib. compression: 1、quantization: quantization-aware-training(QAT), High-Bit(>2b)(DoReFa/Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference)、Low-Bit(≤2b)/Ternary and Binary(TWN/BNN/XNOR-Net); post-training-quantization(PTQ), 8-bit(tensorrt); 2、 pruning: normal、regular and group convolutional channel pruning; 3、 group convolution structure; 4、batch-normalization fuse for quantization. deploy: tensorrt, fp32/fp16/int8(ptq-calibration)、op-adapt(upsample)、dynamic_shape
Stars: ✭ 1,232 (+885.6%)
Mutual labels:  quantization

DFQ

PyTorch implementation of Data Free Quantization Through Weight Equalization and Bias Correction with some ideas from ZeroQ: A Novel Zero Shot Quantization Framework.
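At a glance, cross-layer equalization rescales pairs of consecutive layers so that their per-channel weight ranges match, which makes them far friendlier to per-tensor quantization. Below is a minimal sketch for a pair of fully connected layers following the scaling rule from the paper; it is illustrative only, and the implementation in this repo additionally handles convolutions, batch-norm folding, and traversal of the layer graph.

```python
import torch

def equalize_pair(w1, b1, w2, eps=1e-8):
    """Cross-layer equalization for y = w2 @ relu(w1 @ x + b1).

    w1: (out1, in1), b1: (out1,), w2: (out2, out1).
    Because ReLU is positively homogeneous, scaling channel i of layer 1
    by 1/s_i and the matching input channel of layer 2 by s_i leaves the
    network output unchanged.
    """
    r1 = w1.abs().max(dim=1).values                       # per-output-channel range of layer 1
    r2 = w2.abs().max(dim=0).values                       # per-input-channel range of layer 2
    s = (torch.sqrt(r1 * r2) / (r2 + eps)).clamp(min=eps) # scale that equalizes both ranges
    return w1 / s.unsqueeze(1), b1 / s, w2 * s.unsqueeze(0)
```

After the rescaling, both channels end up with range sqrt(r1 * r2), which is the equalized operating point derived in the paper.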

Results

Int8**: fake quantization; 8-bit weights, 8-bit activations, 16-bit biases
Int8*: fake quantization; 8-bit weights, 8-bit activations, 8-bit biases
Int8': fake quantization; 8-bit weights (symmetric), 8-bit activations (symmetric), 32-bit biases
Int8: Int8 inference using ncnn; 8-bit weights (symmetric), 8-bit activations (symmetric), 32-bit biases

On classification task

MobileNetV2
(FP32-69.19)

| model/precision | FP32 | Int8** | Int8* | Int8' | Int8 |
| --- | --- | --- | --- | --- | --- |
| Original | 71.81 | 0.102 | 0.1 | 0.062 | 0.082 |
| +ReLU | 71.78 | 0.102 | 0.096 | 0.094 | 0.082 |
| +ReLU+LE | 71.78 | 70.32 | 68.78 | 67.5 | 65.21 |
| +ReLU+LE +DR | -- | 70.47 | 68.87 | -- | -- |
| +BC | -- | 57.07 | 0.12 | 26.25 | 5.57 |
| +BC +clip_15 | -- | 65.37 | 0.13 | 65.96 | 45.13 |
| +ReLU+LE+BC | -- | 70.79 | 68.17 | 68.65 | 62.19 |
| +ReLU+LE+BC +DR | -- | 70.9 | 68.41 | -- | -- |

ResNet-18

| model/precision | FP32 | Int8** | Int8* |
| --- | --- | --- | --- |
| Original | 69.76 | 69.13 | 69.09 |
| +ReLU | 69.76 | 69.13 | 69.09 |
| +ReLU+LE | 69.76 | 69.2 | 69.2 |
| +ReLU+LE +DR | -- | 67.74 | 67.75 |
| +BC | -- | 69.04 | 68.56 |
| +BC +clip_15 | -- | 69.04 | 68.56 |
| +ReLU+LE+BC | -- | 69.04 | 68.56 |
| +ReLU+LE+BC +DR | -- | 67.65 | 67.62 |

On segmentation task

Pascal VOC 2012 val set (mIOU)

| model/precision | FP32 | Int8** | Int8* |
| --- | --- | --- | --- |
| Original | 70.81 | 60.03 | 59.31 |
| +ReLU | 70.72 | 60.0 | 58.98 |
| +ReLU+LE | 70.72 | 66.22 | 66.0 |
| +ReLU+LE +DR | -- | 67.04 | 67.23 |
| +ReLU+BC | -- | 69.04 | 68.42 |
| +ReLU+BC +clip_15 | -- | 66.99 | 66.39 |
| +ReLU+LE+BC | -- | 69.46 | 69.22 |
| +ReLU+LE+BC +DR | -- | 70.12 | 69.7 |

Pascal VOC 2007 test set (mIOU)

| model/precision | FP32 | Int8** | Int8* |
| --- | --- | --- | --- |
| Original | 74.54 | 62.36 | 61.21 |
| +ReLU | 74.35 | 61.66 | 61.04 |
| +ReLU+LE | 74.35 | 69.47 | 69.6 |
| +ReLU+LE +DR | -- | 70.28 | 69.93 |
| +BC | -- | 72.1 | 70.97 |
| +BC +clip_15 | -- | 70.16 | 70.76 |
| +ReLU+LE+BC | -- | 72.84 | 72.58 |
| +ReLU+LE+BC +DR | -- | 73.5 | 73.04 |

On detection task

Pascal VOC 2012 val set (mAP with 12 metric)

| model/precision | FP32 | Int8** | Int8* |
| --- | --- | --- | --- |
| Original | 78.51 | 77.71 | 77.86 |
| +ReLU | 75.42 | 75.74 | 75.58 |
| +ReLU+LE | 75.42 | 75.32 | 75.37 |
| +ReLU+LE +DR | -- | 74.65 | 74.32 |
| +BC | -- | 77.73 | 77.78 |
| +BC +clip_15 | -- | 77.73 | 77.78 |
| +ReLU+LE+BC | -- | 75.66 | 75.66 |
| +ReLU+LE+BC +DR | -- | 74.92 | 74.65 |

Pascal VOC 2007 test set (mAP with 07 metric)

| model/precision | FP32 | Int8** | Int8* |
| --- | --- | --- | --- |
| Original | 68.70 | 68.47 | 68.49 |
| +ReLU | 65.47 | 65.36 | 65.56 |
| +ReLU+LE | 65.47 | 65.36 | 65.27 |
| +ReLU+LE +DR | -- | 64.53 | 64.46 |
| +BC | -- | 68.32 | 65.33 |
| +BC +clip_15 | -- | 68.32 | 65.33 |
| +ReLU+LE+BC | -- | 65.63 | 65.58 |
| +ReLU+LE+BC +DR | -- | 64.92 | 64.42 |

Usage

There are six arguments, all defaulting to False (a sketch of a matching argparse setup follows the list):

  1. quantize: whether to quantize parameters and activations.
  2. relu: whether to replace ReLU6 with ReLU.
  3. equalize: whether to perform cross-layer equalization.
  4. correction: whether to apply bias correction.
  5. clip_weight: whether to clip weights to the range [-15, 15] (for convolution and linear layers).
  6. distill_range: whether to use distilled data to set the min/max range for activation quantization.
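For reference, these flags roughly correspond to an argparse setup like the sketch below; it is illustrative only, and the actual parser in main_cls.py may differ.

```python
import argparse

parser = argparse.ArgumentParser(description="DFQ classification demo (sketch)")
# store_true flags default to False, matching the list above
parser.add_argument("--quantize", action="store_true", help="quantize parameters and activations")
parser.add_argument("--relu", action="store_true", help="replace ReLU6 with ReLU")
parser.add_argument("--equalize", action="store_true", help="apply cross-layer equalization")
parser.add_argument("--correction", action="store_true", help="apply bias correction")
parser.add_argument("--clip_weight", action="store_true", help="clip weights to [-15, 15]")
parser.add_argument("--distill_range", action="store_true", help="use distilled data to set activation min/max")
args = parser.parse_args()
```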

run the equalized model by:

python main_cls.py --quantize --relu --equalize

run the equalized and bias-corrected model by:

python main_cls.py --quantize --relu --equalize --correction

run the equalized and bias-corrected model with distilled data by:

python main_cls.py --quantize --relu --equalize --correction --distill_range

export the equalized and bias-corrected model to ONNX and generate the calibration table file:

python convert_ncnn.py --equalize --correction --quantize --relu --ncnn_build path_to_ncnn_build_folder

Note

Distilled Data (2020/02/03 updated)

Following the recent ZeroQ paper, we can distill some fake data that matches the statistics of the batch-normalization layers, then use it to set the min/max value range for activation quantization.
This does not require every conv to be followed by a batch-norm layer, and distilled data should give better and more stable results (the range-estimation method from DFQ sometimes fails to find a good enough value range).
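Conceptually, the distillation loop optimizes random noise so that the statistics it induces at every BatchNorm layer match that layer's running mean and variance. A minimal sketch is below; the hooks, loss, and optimizer settings are illustrative assumptions, not this repo's actual implementation.

```python
import torch
import torch.nn as nn

def distill_data(model, shape=(1, 3, 224, 224), steps=500, lr=0.1):
    """Optimize noise so per-layer input statistics match each BN layer's running stats."""
    model.eval()
    acts, handles, bns = {}, [], []
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            bns.append(m)
            # record the input each BN layer sees on every forward pass
            handles.append(m.register_forward_hook(
                lambda mod, inp, out: acts.update({mod: inp[0]})))
    x = torch.randn(shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        model(x)
        loss = sum((acts[bn].mean(dim=(0, 2, 3)) - bn.running_mean).norm() +
                   (acts[bn].var(dim=(0, 2, 3)) - bn.running_var).norm()
                   for bn in bns)
        loss.backward()
        opt.step()
    for h in handles:
        h.remove()
    return x.detach()
```

The distilled batch can then be run through the network once to record per-layer activation min/max ranges for quantization.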

Here are some modifications that differ from the original ZeroQ implementation:

  1. Initialization of distilled data
  2. Early stop criterion

I also think distilled data could be used to optimize cross-layer equalization and bias correction; the results will be updated if I get it to work.
So far, using distilled data for LE or BC has not performed as well as using the estimates from the batch-norm layers, probably because of overfitting.

Fake Quantization

The 'Int8' models in this repo are actually simulations of 8-bit computation; the arithmetic is still carried out in floating point.
This is done by quantizing and then dequantizing the parameters of each layer and the activations between two consecutive layers,
which means every tensor keeps dtype 'float32' but contains at most 256 (2^8) unique values:

  Weight_quant(Int8) = Quant(Weight)
  Weight_quant(FP32) = Weight_quant(Int8*) = Dequant(Quant(Weight))
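In code, a quantize-dequantize round trip for a single tensor looks roughly like the sketch below (asymmetric 8-bit, illustrative only; the quantization modules in this repo may handle rounding and ranges differently).

```python
import torch

def fake_quantize(x, num_bits=8):
    """Quantize then dequantize: output stays float32 but holds at most 2**num_bits distinct values."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()).clamp(min=1e-8) / (qmax - qmin)
    zero_point = torch.round(qmin - x.min() / scale)
    q = torch.clamp(torch.round(x / scale + zero_point), qmin, qmax)  # Quant(Weight)
    return (q - zero_point) * scale                                   # Dequant(Quant(Weight))
```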

16-bits Quantization for Bias

I could not make bias correction work with 8-bit bias quantization in all scenarios (even with data-dependent correction).
I am not sure how the original paper managed it with 8-bit quantization; I suspect they either use some non-uniform quantization technique or, as I do, use more bits for the bias parameters.
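As a rough illustration of why the extra bits matter, quantizing the same bias vector uniformly at 8 and 16 bits gives step sizes (and thus rounding errors) that differ by a factor of about 256. This toy example assumes plain symmetric quantization and is not the exact scheme used here.

```python
import torch

def quantize_bias(bias, num_bits):
    """Symmetric uniform quantization of a bias vector (toy example)."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = bias.abs().max() / qmax
    return torch.round(bias / scale).clamp(-qmax, qmax) * scale

bias = torch.randn(256) * 0.05
for bits in (8, 16):
    err = (quantize_bias(bias, bits) - bias).abs().max()
    print(f"{bits}-bit bias: max rounding error ~ {err:.2e}")
```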

Int8 inference

Refer to ncnn, pytorch2ncnn, ncnn-quantize, and ncnn-int8-inference for more details.
You will need to install/build the following:
ncnn
onnx-simplifier

Only MobileNetV2 is implemented in inference_cls.cpp. The basic steps are:

  1. Run convert_ncnn.py to convert the PyTorch model (with layer equalization and/or bias correction) to an ncnn int8 model and generate the calibration table file. The name of out_layer will be printed to the console.
  python convert_ncnn.py --quantize --relu --equalize --correction
  2. Compile inference_cls.cpp:
  mkdir build
  cd build
  cmake ..
  make
  3. Run inference:
  ./inference_cls --images=path_to_imagenet_validation_set --param=../modeling/ncnn/model_int8.param --bin=../modeling/ncnn/model_int8.bin --out_layer=name_from_step1

TODO

  • [x] cross layer equalization
  • [ ] high bias absorption
  • [x] data-free bias correction
  • [x] test with detection model
  • [x] test with classification model
  • [x] use distilled data to set min/max activation range
  • [ ] use distilled data to find optimal scale matrix
  • [ ] use distilled data to do bias correction
  • [x] True Int8 inference
