
JetRunner / BERT-of-Theseus

License: Apache-2.0
⛵️ The official PyTorch implementation of "BERT-of-Theseus: Compressing BERT by Progressive Module Replacing" (EMNLP 2020).

Programming Languages

Python

Projects that are alternatives to or similar to BERT-of-Theseus

Ghostnet
CV backbones including GhostNet, TinyNet and TNT, developed by Huawei Noah's Ark Lab.
Stars: ✭ 1,744 (+734.45%)
Mutual labels:  model-compression
Yolov3
YOLOv3 implemented in PyTorch
Stars: ✭ 142 (-32.06%)
Mutual labels:  model-compression
Awesome Ml Model Compression
Awesome machine learning model compression research papers, tools, and learning material.
Stars: ✭ 166 (-20.57%)
Mutual labels:  model-compression
Tf2
An Open Source Deep Learning Inference Engine Based on FPGA
Stars: ✭ 113 (-45.93%)
Mutual labels:  model-compression
Condensa
Programmable Neural Network Compression
Stars: ✭ 129 (-38.28%)
Mutual labels:  model-compression
Amc Models
[ECCV 2018] AMC: AutoML for Model Compression and Acceleration on Mobile Devices
Stars: ✭ 154 (-26.32%)
Mutual labels:  model-compression
Neuronblocks
NLP DNN Toolkit - Building Your NLP DNN Models Like Playing Lego
Stars: ✭ 1,356 (+548.8%)
Mutual labels:  model-compression
Jfasttext
Java interface for fastText
Stars: ✭ 193 (-7.66%)
Mutual labels:  model-compression
Collaborative Distillation
PyTorch code for our CVPR'20 paper "Collaborative Distillation for Ultra-Resolution Universal Style Transfer"
Stars: ✭ 138 (-33.97%)
Mutual labels:  model-compression
Keras compressor
Model Compression CLI Tool for Keras.
Stars: ✭ 160 (-23.44%)
Mutual labels:  model-compression
Awesome Model Compression
papers about model compression
Stars: ✭ 119 (-43.06%)
Mutual labels:  model-compression
Pretrained Language Model
Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.
Stars: ✭ 2,033 (+872.73%)
Mutual labels:  model-compression
Pytorch Weights pruning
PyTorch Implementation of Weights Pruning
Stars: ✭ 158 (-24.4%)
Mutual labels:  model-compression
Hawq
Quantization library for PyTorch. Supports low-precision and mixed-precision quantization, with hardware implementation through TVM.
Stars: ✭ 108 (-48.33%)
Mutual labels:  model-compression
Kd lib
A PyTorch knowledge distillation library for benchmarking and extending work in knowledge distillation, pruning, and quantization.
Stars: ✭ 173 (-17.22%)
Mutual labels:  model-compression
Nni
An open-source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression, and hyperparameter tuning.
Stars: ✭ 10,698 (+5018.66%)
Mutual labels:  model-compression
Ld Net
Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling
Stars: ✭ 148 (-29.19%)
Mutual labels:  model-compression
Torch Pruning
A PyTorch pruning toolkit for structured neural network pruning that maintains layer dependencies.
Stars: ✭ 193 (-7.66%)
Mutual labels:  model-compression
Mobile Id
Deep Face Model Compression
Stars: ✭ 187 (-10.53%)
Mutual labels:  model-compression
Pruning
Code for "Co-Evolutionary Compression for Unpaired Image Translation" (ICCV 2019) and "SCOP: Scientific Control for Reliable Neural Network Pruning" (NeurIPS 2020).
Stars: ✭ 159 (-23.92%)
Mutual labels:  model-compression

BERT-of-Theseus

Code for the paper "BERT-of-Theseus: Compressing BERT by Progressive Module Replacing".

BERT-of-Theseus is a new compressed BERT obtained by progressively replacing the components of the original BERT with compact successor modules.
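
To convey the idea, here is a minimal PyTorch sketch of the replacement step (the class and argument names are illustrative, not the repository's actual code): during training, each group of predecessor layers is independently swapped for its compact successor module with probability equal to the replacing rate, so the successors gradually learn to take over; at inference time only the successors are kept.

import torch
from torch import nn

class TheseusEncoderSketch(nn.Module):
    # Minimal sketch of progressive module replacing (illustrative, not the official code).
    # predecessor_groups: groups of original BERT layers, assumed frozen (requires_grad=False elsewhere)
    # successor_modules: compact modules being trained to replace them
    def __init__(self, predecessor_groups, successor_modules, replacing_rate=0.3):
        super().__init__()
        assert len(predecessor_groups) == len(successor_modules)
        self.predecessor_groups = nn.ModuleList(predecessor_groups)
        self.successor_modules = nn.ModuleList(successor_modules)
        self.replacing_rate = replacing_rate

    def forward(self, hidden_states):
        for predecessor, successor in zip(self.predecessor_groups, self.successor_modules):
            if self.training and torch.rand(1).item() >= self.replacing_rate:
                # Keep the frozen predecessor group for this step.
                hidden_states = predecessor(hidden_states)
            else:
                # Replace it with the successor module, which receives gradients.
                hidden_states = successor(hidden_states)
        return hidden_states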


Citation

If you use this code in your research, please cite our paper:

@inproceedings{xu-etal-2020-bert,
    title = "{BERT}-of-Theseus: Compressing {BERT} by Progressive Module Replacing",
    author = "Xu, Canwen  and
      Zhou, Wangchunshu  and
      Ge, Tao  and
      Wei, Furu  and
      Zhou, Ming",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.emnlp-main.633",
    pages = "7859--7869"
}

NEW: We have uploaded a script for making predictions on GLUE tasks and preparing leaderboard submissions. Check it out here!

How to run BERT-of-Theseus

Requirements

Our code is built on huggingface/transformers. To use our code, you must clone and install huggingface/transformers.
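
After installation, a quick sanity check is to confirm that the package imports from Python (the version shown will depend on your environment):

import transformers
print(transformers.__version__)  # prints the installed transformers version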

Compress a BERT

  1. If you haven't done so already, fine-tune a predecessor model following the instructions from huggingface/transformers and save it to a directory.
  2. Run compression following the examples below:
# For compression with a replacement scheduler
export GLUE_DIR=/path/to/glue_data
export TASK_NAME=MRPC

python ./run_glue.py \
  --model_name_or_path /path/to/saved_predecessor \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --do_lower_case \
  --data_dir "$GLUE_DIR/$TASK_NAME" \
  --max_seq_length 128 \
  --per_gpu_train_batch_size 32 \
  --per_gpu_eval_batch_size 32 \
  --learning_rate 2e-5 \
  --save_steps 50 \
  --num_train_epochs 15 \
  --output_dir /path/to/save_successor/ \
  --evaluate_during_training \
  --replacing_rate 0.3 \
  --scheduler_type linear \
  --scheduler_linear_k 0.0006
# For compression with a constant replacing rate
export GLUE_DIR=/path/to/glue_data
export TASK_NAME=MRPC

python ./run_glue.py \
  --model_name_or_path /path/to/saved_predecessor \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --do_lower_case \
  --data_dir "$GLUE_DIR/$TASK_NAME" \
  --max_seq_length 128 \
  --per_gpu_train_batch_size 32 \
  --per_gpu_eval_batch_size 32 \
  --learning_rate 2e-5 \
  --save_steps 50 \
  --num_train_epochs 15 \
  --output_dir /path/to/save_successor/ \
  --evaluate_during_training \
  --replacing_rate 0.5 \
  --steps_for_replacing 2500 
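
In the first example, --scheduler_type linear with --scheduler_linear_k 0.0006 makes the replacing rate grow over training instead of staying at the constant value used in the second example. As a rough illustration (the function name and clipping behavior below are assumptions, not the exact implementation), a linear curriculum can be computed like this:

def linear_replacing_rate(step, base_rate=0.3, k=0.0006, max_rate=1.0):
    # The replacing rate grows by k per step from base_rate, capped at max_rate,
    # so that eventually only the successor modules are used.
    return min(max_rate, base_rate + k * step)

# With base_rate=0.3 and k=0.0006, the rate reaches 1.0 after roughly 1,200 steps.
for step in (0, 500, 1200):
    print(step, linear_replacing_rate(step))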

For the detailed description of arguments, please refer to the source code.

Load Pretrained Model on MNLI

We provide a 6-layer pretrained model on MNLI as a general-purpose model, which can transfer to other sentence classification tasks and outperforms DistilBERT (with the same 6-layer structure) on six GLUE tasks (dev set).

Method MNLI MRPC QNLI QQP RTE SST-2 STS-B
BERT-base 83.5 89.5 91.2 89.8 71.1 91.5 88.9
DistilBERT 79.0 87.5 85.3 84.9 59.9 90.7 81.2
BERT-of-Theseus 82.1 87.5 88.8 88.8 70.1 91.8 87.8

You can easily load our general-purpose model using huggingface/transformers.

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("canwenxu/BERT-of-Theseus-MNLI")
model = AutoModel.from_pretrained("canwenxu/BERT-of-Theseus-MNLI")
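
As a quick check that the model loads correctly, you can encode a sentence and inspect the hidden states (the example sentence is arbitrary; indexing the first output works across transformers versions):

import torch

inputs = tokenizer("BERT-of-Theseus compresses BERT by replacing its modules.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
last_hidden_state = outputs[0]
print(last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)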

Bug Report and Contribution

If you'd like to contribute and add more tasks (only GLUE is available at the moment), please submit a pull request and contact me. Also, if you find any problem or bug, please report it by opening an issue. Thanks!

Third-Party Implementations

We list some third-party implementations from the community here. Please feel free to add your implementation to this list:
