JulesBelveze / bert-squeeze

Licence: other
🛠️ Tools for Transformers compression using PyTorch Lightning ⚡

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives to or similar to bert-squeeze

Nlp Architect
A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks
Stars: ✭ 2,768 (+4842.86%)
Mutual labels:  transformers, quantization, bert
Pytorch Sentiment Analysis
Tutorials on getting started with PyTorch and TorchText for sentiment analysis.
Stars: ✭ 3,209 (+5630.36%)
Mutual labels:  transformers, lstm, bert
Distiller
Neural Network Distiller by Intel AI Lab: a Python package for neural network compression research. https://intellabs.github.io/distiller
Stars: ✭ 3,760 (+6614.29%)
Mutual labels:  pruning, quantization, distillation
classy
classy is a simple-to-use library for building high-performance Machine Learning models in NLP.
Stars: ✭ 61 (+8.93%)
Mutual labels:  transformers, bert, pytorch-lightning
torch-model-compression
An automated toolkit for analyzing and modifying the structure of PyTorch models, including a library of model compression algorithms based on automatic model structure analysis
Stars: ✭ 126 (+125%)
Mutual labels:  pruning, quantization
sparsezoo
Neural network model repository for highly sparse and sparse-quantized models with matching sparsification recipes
Stars: ✭ 264 (+371.43%)
Mutual labels:  pruning, quantization
question generator
An NLP system for generating reading comprehension questions
Stars: ✭ 188 (+235.71%)
Mutual labels:  transformers, bert
label-studio-transformers
Label data using HuggingFace's transformers and automatically get a prediction service
Stars: ✭ 117 (+108.93%)
Mutual labels:  transformers, bert
oreilly-bert-nlp
This repository contains code for the O'Reilly Live Online Training for BERT
Stars: ✭ 19 (-66.07%)
Mutual labels:  transformers, bert
backprop
Backprop makes it simple to use, finetune, and deploy state-of-the-art ML models.
Stars: ✭ 229 (+308.93%)
Mutual labels:  transformers, bert
lightning-transformers
Flexible components pairing 🤗 Transformers with Pytorch Lightning
Stars: ✭ 551 (+883.93%)
Mutual labels:  transformers, pytorch-lightning
ZAQ-code
CVPR 2021 : Zero-shot Adversarial Quantization (ZAQ)
Stars: ✭ 59 (+5.36%)
Mutual labels:  quantization, distillation
gpl
Powerful unsupervised domain adaptation method for dense retrieval. Requires only unlabeled corpus and yields massive improvement: "GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval" https://arxiv.org/abs/2112.07577
Stars: ✭ 216 (+285.71%)
Mutual labels:  transformers, bert
Transformer-QG-on-SQuAD
Implement Question Generator with SOTA pre-trained Language Models (RoBERTa, BERT, GPT, BART, T5, etc.)
Stars: ✭ 28 (-50%)
Mutual labels:  bert, pytorch-lightning
roberta-wwm-base-distill
A distilled RoBERTa-wwm-base model, distilled from RoBERTa-wwm-large
Stars: ✭ 61 (+8.93%)
Mutual labels:  bert, distillation
ATMC
[NeurIPS'2019] Shupeng Gui, Haotao Wang, Haichuan Yang, Chen Yu, Zhangyang Wang, Ji Liu, “Model Compression with Adversarial Robustness: A Unified Optimization Framework”
Stars: ✭ 41 (-26.79%)
Mutual labels:  pruning, quantization
neural-compressor
Intel® Neural Compressor (formerly known as Intel® Low Precision Optimization Tool), targeting to provide unified APIs for network compression technologies, such as low precision quantization, sparsity, pruning, knowledge distillation, across different deep learning frameworks to pursue optimal inference performance.
Stars: ✭ 666 (+1089.29%)
Mutual labels:  pruning, quantization
anonymisation
Anonymization of legal cases (Fr) based on Flair embeddings
Stars: ✭ 85 (+51.79%)
Mutual labels:  transformers, bert
Clue
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
Stars: ✭ 2,425 (+4230.36%)
Mutual labels:  transformers, bert
Spark Nlp
State of the Art Natural Language Processing
Stars: ✭ 2,518 (+4396.43%)
Mutual labels:  transformers, bert

Bert-squeeze

Bert-squeeze is a repository that provides code to reduce the size of Transformer-based models or decrease their latency at inference time.

It gathers a non-exhaustive set of techniques such as distillation, pruning, quantization, and early exiting. The repo is built on PyTorch Lightning and Transformers.

About the project

As a heavy user of transformer-based models (which are truly amazing, from my point of view), I have always struggled to put these heavy models in production while keeping decent inference speed. There are, of course, a number of existing libraries for optimizing and compressing transformer-based models (ONNX, distiller, compressors, KD_Lib, ...).
I started this project out of the need to reduce the latency of models that integrate transformers as subcomponents. For this reason, the project aims to provide implementations to train various transformer-based models (and others) using PyTorch Lightning, but also to distill, prune, and quantize them.
I chose to build this repo on Lightning because of its growing popularity, its flexibility, and the fact that very few comparable repositories use it. It currently only handles sequence classification models, but support for other tasks and custom architectures is planned.

Installation

First, clone the repository:

git clone https://github.com/JulesBelveze/bert-squeeze.git

and then install dependencies using poetry:

poetry install

You are all set!

Quickstarts

You can find a number of ready-to-use configurations under the examples folder. Just choose the one you need and run the following:

python3 -m bert-squeeze.main -cp=examples -cn=wanted_config

Disclaimer: I have not extensively tested all procedures and thus do not guarantee the performance of every implemented method.

Concepts

Transformers

If you have never heard of it, I can only recommend reading this amazing blog post [1], and if you want to dig deeper, there is an excellent Stanford lecture available here [2].

Distillation

The idea of distillation is to train a small network to mimic a big one by replicating its outputs. The repository makes it possible to transfer knowledge from any model to any other (if you need a model that is not in the models folder, just write your own).

The repository also makes it possible to perform soft or hard distillation on an unlabeled dataset. In the soft case, the teacher's output probabilities are used as targets; in the hard case, the teacher's predictions are treated as the actual labels.

You can find these implementations under the distillation/ folder.
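
To make the soft/hard distinction concrete, below is a minimal sketch of the two losses in plain PyTorch. It is illustrative only, not this repo's actual implementation, and the student_logits, teacher_logits, and temperature names are hypothetical.

import torch.nn.functional as F

def soft_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soft distillation: match the teacher's softened probability distribution
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures
    return F.kl_div(student_log_probs, soft_targets, reduction="batchmean") * temperature ** 2

def hard_distillation_loss(student_logits, teacher_logits):
    # Hard distillation: treat the teacher's argmax predictions as pseudo-labels
    pseudo_labels = teacher_logits.argmax(dim=-1)
    return F.cross_entropy(student_logits, pseudo_labels)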

Quantization

Neural network quantization is the process of reducing the precision of a network's weights. The repo provides two callbacks: one for dynamic quantization and one for quantization-aware training (relying on the corresponding Lightning callback).

You can find those implementations under the utils/callbacks/ folder.
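
For intuition, here is a minimal sketch of dynamic quantization using PyTorch's built-in API rather than this repo's callbacks; the checkpoint name is just an example.

import torch
from transformers import AutoModelForSequenceClassification

# "bert-base-uncased" is only an example checkpoint
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
model.eval()

# Dynamic quantization: nn.Linear weights are stored as int8, while
# activations stay in floating point and are quantized on the fly
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)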

Pruning

Pruning a neural network consists of removing weights from a trained model to compress it. This repo features various pruning methods, such as head pruning, layer dropping, and weight dropping.

You can find those implementations under the utils/callbacks/ folder.
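
As a rough illustration, outside this repo's callbacks, here is a minimal sketch of head pruning via the Transformers prune_heads method and of unstructured weight dropping via torch.nn.utils.prune; the layer/head indices and the 30% sparsity are arbitrary examples.

import torch.nn.utils.prune as prune
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

# Head pruning: drop attention heads 0 and 1 of layer 0, and head 2 of layer 3
model.prune_heads({0: [0, 1], 3: [2]})

# Weight dropping: zero out the 30% smallest-magnitude weights of the classifier head
prune.l1_unstructured(model.classifier, name="weight", amount=0.3)
prune.remove(model.classifier, "weight")  # make the pruning permanent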

Contributions and questions

If you are missing a feature that could be relevant to this repo, or have noticed a bug, feel free to open a PR or an issue. As you can see in the roadmap, there are a bunch more features to come 😃

Also, if you have any questions or suggestions feel free to ask!

References

  1. Alammar, J. (2018). The Illustrated Transformer [Blog post]. Retrieved from https://jalammar.github.io/illustrated-transformer/
  2. stanfordonline (2021). Stanford CS224N: NLP with Deep Learning | Winter 2021 | Lecture 9 - Self-Attention and Transformers [Online video]. Available at: https://www.youtube.com/watch?v=ptuGllU5SQQ
  3. Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, and Jamie Brew (2019). HuggingFace's Transformers: State-of-the-art Natural Language Processing.
  4. Hassan Sajjad, Fahim Dalvi, Nadir Durrani, and Preslav Nakov (2020). Poor Man's BERT: Smaller and Faster Transformer Models.
  5. Angela Fan, Edouard Grave, and Armand Joulin (2019). Reducing Transformer Depth on Demand with Structured Dropout.
  6. Paul Michel, Omer Levy, and Graham Neubig (2019). Are Sixteen Heads Really Better than One?
  7. Fangxiaoyu Feng, Yinfei Yang, Daniel Cer, Naveen Arivazhagan, and Wei Wang (2020). Language-agnostic BERT Sentence Embedding.
  8. Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Haotang Deng, and Qi Ju (2020). FastBERT: a Self-distilling BERT with Adaptive Inference Time.
    Repository: https://github.com/BitVoyage/FastBERT
  9. Canwen Xu, Wangchunshu Zhou, Tao Ge, Furu Wei, and Ming Zhou (2020). BERT-of-Theseus: Compressing BERT by Progressive Module Replacing.