cedrickchee / Awesome ML Model Compression
Awesome machine learning model compression research papers, tools, and learning material.
Stars: ✭ 166
Projects that are alternatives to or similar to Awesome ML Model Compression
Kd lib
A Pytorch Knowledge Distillation library for benchmarking and extending works in the domains of Knowledge Distillation, Pruning, and Quantization.
Stars: ✭ 173 (+4.22%)
Mutual labels: quantization, model-compression, pruning
Paddleslim
PaddleSlim is an open-source library for deep model compression and architecture search.
Stars: ✭ 677 (+307.83%)
Mutual labels: quantization, model-compression, pruning
torch-model-compression
A toolset for automated structural analysis and modification of PyTorch models, including a model compression algorithm library with automatic model-structure analysis.
Stars: ✭ 126 (-24.1%)
Mutual labels: pruning, quantization, model-compression
Model Optimization
A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.
Stars: ✭ 992 (+497.59%)
Mutual labels: quantization, model-compression, pruning
Micronet
micronet, a model compression and deployment library. Compression: 1. quantization: quantization-aware training (QAT), high-bit (>2b: DoReFa, "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference") and low-bit (≤2b) ternary/binary (TWN/BNN/XNOR-Net); post-training quantization (PTQ), 8-bit (TensorRT); 2. pruning: normal, regular, and group-convolution channel pruning; 3. group convolution structure; 4. batch-normalization fusion for quantization. Deployment: TensorRT, fp32/fp16/int8 (PTQ calibration), op adaptation (upsample), dynamic shape.
Stars: ✭ 1,232 (+642.17%)
Mutual labels: quantization, model-compression, pruning
Awesome Ai Infrastructures
Infrastructures™ for Machine Learning Training/Inference in Production.
Stars: ✭ 223 (+34.34%)
Mutual labels: quantization, model-compression, pruning
ATMC
[NeurIPS'2019] Shupeng Gui, Haotao Wang, Haichuan Yang, Chen Yu, Zhangyang Wang, Ji Liu, “Model Compression with Adversarial Robustness: A Unified Optimization Framework”
Stars: ✭ 41 (-75.3%)
Mutual labels: pruning, quantization, model-compression
Brevitas
Brevitas: quantization-aware training in PyTorch
Stars: ✭ 343 (+106.63%)
Mutual labels: neural-networks, quantization
Aimet
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
Stars: ✭ 453 (+172.89%)
Mutual labels: quantization, pruning
Awesome Emdl
Embedded and mobile deep learning research resources
Stars: ✭ 554 (+233.73%)
Mutual labels: quantization, pruning
Awesome Pruning
A curated list of neural network pruning resources.
Stars: ✭ 1,017 (+512.65%)
Mutual labels: model-compression, pruning
Distiller
Neural Network Distiller by Intel AI Lab: a Python package for neural network compression research. https://intellabs.github.io/distiller
Stars: ✭ 3,760 (+2165.06%)
Mutual labels: quantization, pruning
Filter Pruning Geometric Median
Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration (CVPR 2019 Oral)
Stars: ✭ 338 (+103.61%)
Mutual labels: model-compression, pruning
Soft Filter Pruning
Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks
Stars: ✭ 291 (+75.3%)
Mutual labels: model-compression, pruning
sparsify
Easy-to-use UI for automatically sparsifying neural networks and creating sparsification recipes for better inference performance and a smaller footprint
Stars: ✭ 138 (-16.87%)
Mutual labels: pruning, quantization
Awesome Automl And Lightweight Models
A list of high-quality (newest) AutoML works and lightweight models including 1.) Neural Architecture Search, 2.) Lightweight Structures, 3.) Model Compression, Quantization and Acceleration, 4.) Hyperparameter Optimization, 5.) Automated Feature Engineering.
Stars: ✭ 691 (+316.27%)
Mutual labels: quantization, model-compression
Ntagger
reference pytorch code for named entity tagging
Stars: ✭ 58 (-65.06%)
Mutual labels: quantization, pruning
Hawq
Quantization library for PyTorch. Support low-precision and mixed-precision quantization, with hardware implementation through TVM.
Stars: ✭ 108 (-34.94%)
Mutual labels: quantization, model-compression
Tf2
An Open Source Deep Learning Inference Engine Based on FPGA
Stars: ✭ 113 (-31.93%)
Mutual labels: quantization, model-compression
Awesome Edge Machine Learning
A curated list of awesome edge machine learning resources, including research papers, inference engines, challenges, books, meetups and others.
Stars: ✭ 139 (-16.27%)
Mutual labels: quantization, pruning
Awesome ML Model Compression
An awesome-style list that curates the best machine learning model compression and acceleration research papers, articles, tutorials, libraries, tools, and more. PRs are welcome!
Contents
Papers
General
- A Survey of Model Compression and Acceleration for Deep Neural Networks
- Model compression as constrained optimization, with application to neural nets. Part I: general framework
- Model compression as constrained optimization, with application to neural nets. Part II: quantization
Architecture
- MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
- MobileNetV2: Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation
- Xception: Deep Learning with Depthwise Separable Convolutions
- ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
- SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size
- Fast YOLO: A Fast You Only Look Once System for Real-time Embedded Object Detection in Video
- AddressNet: Shift-based Primitives for Efficient Convolutional Neural Networks
- ResNeXt: Aggregated Residual Transformations for Deep Neural Networks
- ResBinNet: Residual Binary Neural Network
- Residual Attention Network for Image Classification
- Squeezedet: Unified, small, low power fully convolutional neural networks
- SEP-Nets: Small and Effective Pattern Networks
- Dynamic Capacity Networks
- Learning Infinite Layer Networks Without the Kernel Trick
- Efficient Sparse-Winograd Convolutional Neural Networks
- DSD: Dense-Sparse-Dense Training for Deep Neural Networks
- Coordinating Filters for Faster Deep Neural Networks
- Deep Networks with Stochastic Depth
Quantization
- Quantized Convolutional Neural Networks for Mobile Devices
- Towards the Limit of Network Quantization
- Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations
- Compressing Deep Convolutional Networks using Vector Quantization
- Trained Ternary Quantization
- The ZipML Framework for Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning
- ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks
- Deep Learning with Low Precision by Half-wave Gaussian Quantization
- Loss-aware Binarization of Deep Networks
- Quantize weights and activations in Recurrent Neural Networks
- Fixed-Point Performance Analysis of Recurrent Neural Networks
- And the bit goes down: Revisiting the quantization of neural networks
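As a minimal illustration of the post-training quantization idea these papers build on, the sketch below (not taken from any listed paper; the function names and the uint8 affine scheme are illustrative choices) maps float weights to 8-bit integers with a scale and zero point, then dequantizes to measure the reconstruction error:

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_uint8(w):
    """Affine (asymmetric) 8-bit quantization: w ≈ (q - zero_point) * scale."""
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / 255.0 or 1.0          # guard against constant tensors
    zero_point = round(-lo / scale)           # integer offset so lo maps near 0
    q = np.clip(np.round(w / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover an approximate float tensor from the 8-bit codes."""
    return (q.astype(np.float32) - zero_point) * scale

w = rng.standard_normal((64, 64)).astype(np.float32)
q, scale, zp = quantize_uint8(w)
max_err = float(np.abs(w - dequantize(q, scale, zp)).max())  # bounded by ~scale
```

Storage drops from 4 bytes to 1 byte per weight, at the cost of a per-weight error no larger than roughly one quantization step.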
Binarization
- Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration
- Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1
- Local Binary Convolutional Neural Networks
- XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
- DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients
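The core trick shared by these papers can be sketched in a few lines. The example below follows the XNOR-Net weight-binarization idea (B = sign(W) with a single scaling factor alpha = mean |W|); all names are illustrative, not from any listed paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def binarize(w):
    """XNOR-Net-style weight binarization: W ≈ alpha * B, B ∈ {-1, +1}."""
    alpha = float(np.abs(w).mean())           # one float scale for the tensor
    b = np.where(w >= 0, 1.0, -1.0).astype(np.float32)
    return b, alpha

w = rng.standard_normal((16, 3, 3)).astype(np.float32)
b, alpha = binarize(w)
w_bin = alpha * b   # 1-bit weights plus a single full-precision scale
```

Each weight now needs a single bit, and convolutions against B reduce to additions and subtractions (or XNOR/popcount on packed bits in a real kernel).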
Pruning
- Faster CNNs with Direct Sparse Convolutions and Guided Pruning
- Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
- Pruning Convolutional Neural Networks for Resource Efficient Inference
- Pruning Filters for Efficient ConvNets
- Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning
- Learning to Prune: Exploring the Frontier of Fast and Accurate Parsing
- Fine-Pruning: Joint Fine-Tuning and Compression of a Convolutional Network with Bayesian Optimization
- Learning both Weights and Connections for Efficient Neural Networks
- ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression
- Data-Driven Sparse Structure Selection for Deep Neural Networks
- Soft Weight-Sharing for Neural Network Compression
- Dynamic Network Surgery for Efficient DNNs
- Channel pruning for accelerating very deep neural networks
- AMC: AutoML for model compression and acceleration on mobile devices
- ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA
Distillation
- Distilling the Knowledge in a Neural Network
- Deep Model Compression: Distilling Knowledge from Noisy Teachers
- Learning Efficient Object Detection Models with Knowledge Distillation
- Data-Free Knowledge Distillation For Deep Neural Networks
- Knowledge Projection for Effective Design of Thinner and Faster Deep Neural Networks
- Moonshine: Distilling with Cheap Convolutions
- Model Distillation with Knowledge Transfer from Face Classification to Alignment and Verification
- Like What You Like: Knowledge Distill via Neuron Selectivity Transfer
- Sequence-Level Knowledge Distillation
- Learning Loss for Knowledge Distillation with Conditional Adversarial Networks
- Dark knowledge
- DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer
- FitNets: Hints for Thin Deep Nets
- MobileID: Face Model Compression by Distilling Knowledge from Neurons
- Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer
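The common ingredient of these works is the soft-target loss from "Distilling the Knowledge in a Neural Network": the student matches the teacher's temperature-softened output distribution. A minimal sketch, with illustrative function names and an arbitrary temperature:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; higher T spreads probability mass."""
    z = z / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL divergence between softened teacher and student distributions.

    Scaled by T*T so gradients keep a comparable magnitude as T varies.
    """
    p = softmax(teacher_logits, T)            # soft targets from the teacher
    q = softmax(student_logits, T)
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T * T)

teacher = np.array([[5.0, 1.0, -2.0]])
loss_same = distillation_loss(teacher.copy(), teacher)           # perfect match
loss_off = distillation_loss(np.array([[1.0, 5.0, -2.0]]), teacher)
```

In practice this term is combined with the ordinary cross-entropy on hard labels; here only the distillation term is shown.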
Low Rank Approximation
- Speeding up convolutional neural networks with low rank expansions
- Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications
- Convolutional neural networks with low-rank regularization
- Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation
- Accelerating Very Deep Convolutional Networks for Classification and Detection
- Efficient and Accurate Approximations of Nonlinear Convolutional Networks
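These papers factor weight tensors into products of smaller ones. The simplest case, replacing an m×n matrix with a rank-r product via truncated SVD, can be sketched as follows (illustrative names, not from any listed paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def low_rank_factorize(w, rank):
    """Replace W (m x n) with A (m x r) @ B (r x n) via truncated SVD."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    a = u[:, :rank] * s[:rank]                # fold singular values into A
    b = vt[:rank]
    return a, b

# A matrix that is exactly rank 8, so rank-8 truncation is near-lossless.
w = rng.standard_normal((256, 8)) @ rng.standard_normal((8, 512))
a, b = low_rank_factorize(w, rank=8)
rel_err = float(np.linalg.norm(w - a @ b) / np.linalg.norm(w))
params_before = w.size                        # 256 * 512 = 131072
params_after = a.size + b.size                # 256*8 + 8*512 = 6144
```

A dense layer `x @ w` becomes two smaller layers `(x @ a) @ b`; real layers are only approximately low-rank, so the papers above pick r to trade accuracy against speed and size.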
Articles
Content published on the Web.
Howtos
Assorted
- Why the Future of Machine Learning is Tiny
- Deep Learning Model Compression for Image Analysis: Methods and Architectures
- A foolproof way to shrink deep learning models by MIT (Alex Renda et al.) - A pruning recipe: train the network to completion; globally prune the 20% of weights with the lowest magnitudes (the weakest connections); retrain with learning-rate rewinding, i.e. reusing the original early-training learning-rate schedule; repeat iteratively until the desired sparsity is reached.
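The train/prune/rewind loop described in the article above can be sketched as follows; `train` is a stub standing in for a full training run, and all names and values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def train(weights, lr_schedule):
    """Stand-in for a full training run; real code would run SGD here."""
    return weights

def prune_weakest(weights, fraction=0.2):
    """Globally zero the `fraction` of *remaining* weights with smallest |w|."""
    remaining = np.abs(weights[weights != 0])
    threshold = np.quantile(remaining, fraction)
    return np.where(np.abs(weights) >= threshold, weights, 0.0)

lr_schedule = [0.1, 0.01, 0.001]              # original early-training schedule
weights = rng.standard_normal(10_000)
weights = train(weights, lr_schedule)         # 1. train to completion
for _ in range(5):                            # 4. repeat until sparse enough
    weights = prune_weakest(weights, 0.2)     # 2. prune the weakest 20%
    weights = train(weights, lr_schedule)     # 3. retrain with rewound LR
sparsity = float((weights == 0).mean())       # ≈ 1 - 0.8**5 ≈ 0.67
```

Each round removes 20% of what is left, so k rounds give roughly 1 - 0.8^k sparsity.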
Reference
Blogs
- TensorFlow Model Optimization Toolkit — Pruning API
- Compressing neural networks for image classification and detection - Facebook AI researchers have developed a method for reducing the memory footprint of neural networks by quantizing their weights while maintaining short inference times. They obtain a 76.1% top-1 ResNet-50 that fits in 5 MB and also compress a Mask R-CNN to within 6 MB.
- All The Ways You Can Compress BERT - An overview of different compression methods for large NLP models (BERT), organized by their characteristics, with a comparison of their results.
Tools
Libraries
- TensorFlow Model Optimization Toolkit. Accompanying blog post: TensorFlow Model Optimization Toolkit — Pruning API
- XNNPACK is a highly optimized library of floating-point neural network inference operators for ARM, WebAssembly, and x86 (SSE2 level) platforms. It is based on the QNNPACK library but, unlike QNNPACK, focuses entirely on floating-point operators.
Frameworks
Paper Implementations
- facebookresearch/kill-the-bits - code and compressed models for the paper, "And the bit goes down: Revisiting the quantization of neural networks" by Facebook AI Research.
Videos
Talks
Training & tutorials
License
To the extent possible under law, Cedric Chee has waived all copyright and related or neighboring rights to this work.
Note that the project description data, including the texts, logos, images, and/or trademarks,
for each open source project belongs to its rightful owner.
If you wish to add or remove any projects, please contact us at [email protected].