
Training Mixed-Precision Quantized Neural Networks for microcontroller deployments

Description

This project targets quantization-aware training in PyTorch for microcontroller deployment of quantized neural networks. The featured mixed-precision quantization techniques aim at byte or sub-byte precision, i.e. INT8, INT4 and INT2. The generated deployment network supports integer arithmetic only. Optionally, the per-tensor bit precision is selected based on the memory constraints of the target device.
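
For intuition, a per-tensor bit assignment can be checked against a device budget by summing per-layer weight footprints. The sketch below is illustrative only and does not reproduce the project's selection algorithm; the layer sizes and the helper function are hypothetical:

# Illustrative only (not the project's selection algorithm): total weight
# footprint of a hypothetical mixed-precision assignment, checked against a
# FLASH budget expressed in bytes.
def weight_memory_bytes(layer_params, layer_bits):
    # layer_params: number of weights per layer; layer_bits: chosen bit-width per layer
    return sum(n * b for n, b in zip(layer_params, layer_bits)) // 8

params = [864, 8192, 16384]   # hypothetical per-layer weight counts
bits = [8, 4, 2]              # per-tensor precision (INT8 / INT4 / INT2)
assert weight_memory_bytes(params, bits) <= 2048000   # e.g. 2 MB of FLASH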

Reference

Please cite the arXiv paper below when using the code.

@article{rusci2019memory,
  title={Memory-Driven Mixed Low Precision Quantization For Enabling Deep Network Inference On Microcontrollers},
  author={Rusci, Manuele and Capotondi, Alessandro and Benini, Luca},
  journal={arXiv preprint arXiv:1905.13082},
  year={2019}
}

Questions

For any questions, just drop me an email.

Getting Started

Prerequisites

  • The code is tested with PyTorch 0.4.1 and Python 3.5
  • The TensorFlow package is needed to load pretrained TensorFlow model weights

Setup

Set the correct dataset paths inside data.py. As an example:

_IMAGENET_MAIN_PATH = '/home/user/ImagenetDataset/'
_DATASETS_MAIN_PATH = './datasets/'

To download pretrained mobilenet weights:

$ cd models/mobilenet_tf/
$ source download_pretrained_mobilenet.sh

Quickstart

For quantization-aware retraining of an 8-bit integer-only MobileNet model, type:

$ python3 main_binary.py -a mobilenet --mobilenet_width 1.0 --mobilenet_input 224 --save Imagenet/mobilenet_224_1.0_w8a8 --dataset imagenet --type_quant 'PerLayerAsymPACT' --weight_bits 8 --activ_bits 8 --activ_type learned --gpus 0,1,2,3 -j 8 --epochs 12 -b 128 --save_check --quantizer --batch_fold_delay 1 --batch_fold_type folding_weights

Quantization Options

  • quantizer: enables quantization when True
  • type_quant: type of weight quantization method to apply (see below)
  • weight_bits: number of bits for weights quantization
  • activ_bits: number of activation bits
  • activ_type: type of quantized activation layers
  • batch_fold_delay: number of epochs before freezing batch norm parameters
  • batch_fold_type: how to deal with folding of batch norm parameters (or any other scalar params). [Supported: 'folding_weights' | 'ICN']
  • quant_add_config: optional list of per-layer configurations, which override the previous settings on a per-layer basis
  • mobilenet_width: MobileNet width multiplier (default=1.0; supported: 0.25, 0.5, 0.75, 1.0)
  • mobilenet_input: MobileNet input resolution (default=224; supported: 128, 160, 192, 224)
  • mem_constraint: Memory constraints of the target device. Must be provided as a string '[ROM_SIZE,RAM_SIZE]'
  • mixed_prec_quant: Mixed Per-Layer ('MixPL') or mixed per-channel ('MixPC')

Reproducing paper results

For any given mobilenet model, run the script with:

  • memory constraints of 512 kB of RAM and 2 MB of FLASH: --mem_constraint [2048000,512000]
  • mixed-precision quantization per-layer or per-channel: --mixed_prec_quant MixPL (or MixPC)

As an example:

$ python3 main_binary.py --model mobilenet --save Imagenet_ARM/mobilenet_128_0.75_quant_auto_tt --mobilenet_width 0.75 --mobilenet_input 128 --dataset imagenet -j 32 --epochs 10 -b 128 --save_check --gpus 0,1,2,3 --type_quant PerLayerAsymPACT --activ_type learned --quantizer --batch_fold_delay 1 --batch_fold_type folding_weights --mem_constraint [2048000,512000] --mixed_prec_quant MixPL

Quantization Strategy Guide

Overview

The quantization functions are located in quantization/quantop.py. The operator QuantOp wraps the full-precision model to handle weight quantization. As a usage example:

import quantization

quantizer = quantization.QuantOp(model, type_quant, weight_bits,
                                 batch_fold_type=batch_fold_type,
                                 batch_fold_delay=batch_fold_delay,
                                 act_bits=activ_bits,
                                 add_config=quant_add_config)

After wrapping the full-precision model, the QuantOp operator:

  • generates the integer-only deployment graph quantizer.deployment_model, based on the full-precision graph model.
  • updates the quantized parameters of the deployment model from the current full-precision graph parameters via quantizer.generate_deployment_model() (see the sketch after this list)
  • provides methods to support quantization-aware retraining of the full-precision model
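
For example, after training one would typically refresh and save the integer-only graph. This is a minimal sketch, assuming quantizer.deployment_model is a regular PyTorch module; the output file name is arbitrary:

import torch

# refresh the integer-only parameters from the trained full-precision model
quantizer.generate_deployment_model()

# save the deployment graph parameters for later export to the target toolchain
torch.save(quantizer.deployment_model.state_dict(), 'mobilenet_deploy_int.pth')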

At training time, the quantizer works in combination with the optimizer:

  # weight quantization before the forward pass
  quantizer.store_and_quantize() # copy the real-value weights and quantize the actual ones
   
  # forward pass
  output = model(input) # compute output
  loss = criterion(output, target) # compute loss

  if training:
      # backward pass
      optimizer.zero_grad()
      loss.backward()

      quantizer.restore_real_value()  # restore real value parameters          
      quantizer.backprop_quant_gradients() # compute gradients w.r.t. the real-value weights

      optimizer.step() # update the values
      
  else:
      quantizer.restore_real_value() # restore real-value weights after forward pass

Weight Quantization

Currently, the following quantization schemes are supported:

  • PerLayerAsymPACT: per-layer asymmetric quantization, quantization range is learned with PACT method
  • PerChannelsAsymMinMax: per-channel asymmetric quantization, quantization range is defined by the min/max range of the weight-channel tensor (see the sketch after this list)
  • PerLayerAsymMinMax: per-layer asymmetric quantization, quantization range is defined by min/max range of the weight tensor (not fully tested)
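
As a rough illustration of the PerChannelsAsymMinMax idea, the standalone sketch below fake-quantizes a convolution weight tensor channel by channel between its own min and max; it is not the code path used by QuantOp:

import torch

def fake_quant_per_channel_minmax(w, n_bits):
    # w: weight tensor of shape [out_channels, ...]; each output channel is
    # quantized asymmetrically between its own min and max values
    w_flat = w.view(w.size(0), -1)
    w_min = w_flat.min(dim=1, keepdim=True)[0]
    w_max = w_flat.max(dim=1, keepdim=True)[0]
    scale = (w_max - w_min).clamp(min=1e-8) / (2 ** n_bits - 1)
    q = torch.round((w_flat - w_min) / scale)      # integer levels in [0, 2^n - 1]
    return (q * scale + w_min).view_as(w)          # fake-quantized weights

w = torch.randn(32, 16, 3, 3)
w_q = fake_quant_per_channel_minmax(w, 4)          # e.g. INT4 weights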

Activation Quantization

At the present stage, the quantized activation layers must be part of the model definition itself; this is why the input model is already a fake-quantized model. See 'models/mobilenet.py' as an example. This part will be improved with automatic graph analysis and parsing, so that a full-precision input model can be turned into a fake-quantized one automatically.
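
As an illustration only, a fake-quantized activation layer with a PACT-style learned clipping threshold could look roughly like the sketch below; it is not the exact layer defined in 'models/mobilenet.py':

import torch
import torch.nn as nn

class FakeQuantReLU(nn.Module):
    # Clipped ReLU with a learned clipping value (PACT-style), followed by
    # uniform fake quantization of the activations on n_bits levels.
    def __init__(self, n_bits=8, init_clip=6.0):
        super(FakeQuantReLU, self).__init__()
        self.n_bits = n_bits
        self.clip = nn.Parameter(torch.tensor(init_clip))

    def forward(self, x):
        # clip activations to [0, alpha], where alpha is learned as in PACT
        x = torch.clamp(x, min=0.0)
        x = torch.min(x, self.clip)
        # uniform fake quantization of the clipped range; a real layer would
        # route the rounding gradient through a straight-through estimator
        scale = self.clip.detach() / (2 ** self.n_bits - 1)
        return torch.round(x / scale) * scale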

Limitations

This project does not include any graph analysis tools. Hence, the graph parser (see the __init__ of the QuantOp operator) is specific to the tested model 'models/mobilenet.py', which already includes quantized activation layers. A rework of this part may be necessary to apply the implemented techniques to other models.
