
DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering

This repo contains the code for the DeFormer paper (accepted at ACL 2020).


Installation

Tested on Ubuntu 16.04, 18.04, and macOS. (Windows should also work, but has not been tested.)

You can create a separate Python environment, e.g. virtualenv -p python3.7 .env, and activate it with source .env/bin/activate.

  1. Requirements: Python>=3.5 and TensorFlow >=1.14.0,<2.0

  2. pip install "tensorflow>=1.14.0,<2.0" or pip install tensorflow-gpu==1.15.3 (for GPU)

  3. pip install -r requirements.txt
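
Putting these steps together, a minimal CPU-only setup might look like this (a sketch; use tensorflow-gpu==1.15.3 instead for GPU):

virtualenv -p python3.7 .env
source .env/bin/activate
pip install "tensorflow>=1.14.0,<2.0"
pip install -r requirements.txt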

NOTE: ebert denotes the DeFormer version of BERT, and sbert denotes the variant trained with the KD & LRS auxiliary supervision described in the paper.

For XLNet, you can check my fork for a reference implementation.

Usage

Dataset Processing

Download the datasets to data/datasets.
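
For example, SQuAD v1.1 can be fetched directly (a sketch; mnli and qqp come from GLUE, BoolQ and RACE from their respective distribution sites):

mkdir -p data/datasets/squad_v1.1
for split in train dev; do
    curl -L -o data/datasets/squad_v1.1/${split}-v1.1.json \
    https://rajpurkar.github.io/SQuAD-explorer/dataset/${split}-v1.1.json
done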

The dataset directory should then look like the following (check with tree -L 2 data/datasets):

data/datasets
├── BoolQ
│   ├── test.jsonl
│   ├── train.jsonl
│   └── val.jsonl
├── mnli
│   ├── dev_matched.tsv
│   └── train.tsv
├── qqp
│   ├── dev.tsv
│   ├── test.tsv
│   └── train.tsv
├── RACE
│   ├── dev
│   ├── test
│   └── train
└── squad_v1.1
    ├── dev-v1.1.json
    └── train-v1.1.json

Convert to the DeFormer format:

deformer_dir=data/datasets/deformer
mkdir -p ${deformer_dir}

# squad v1.1
for version in 1.1; do
    data_dir=data/datasets/squad_v${version}
    for split in dev train; do
        python tools/convert_squad.py ${data_dir}/${split}-v${version}.json \
        ${deformer_dir}/squad_v${version}-${split}.jsonl
    done
done

# mnli
data_dir=data/datasets/mnli
python tools/convert_pair_dataset.py ${data_dir}/train.tsv ${deformer_dir}/mnli-train.jsonl -t mnli
python tools/convert_pair_dataset.py ${data_dir}/dev_matched.tsv ${deformer_dir}/mnli-dev.jsonl  -t mnli

# qqp
data_dir=data/datasets/qqp
python tools/convert_pair_dataset.py ${data_dir}/train.tsv ${deformer_dir}/qqp-train.jsonl -t qqp
python tools/convert_pair_dataset.py ${data_dir}/dev.tsv ${deformer_dir}/qqp-dev.jsonl -t qqp

# boolq
data_dir=data/datasets/BoolQ
python tools/convert_pair_dataset.py ${data_dir}/train.jsonl ${deformer_dir}/boolq-train.jsonl -t boolq
python tools/convert_pair_dataset.py ${data_dir}/val.jsonl ${deformer_dir}/boolq-dev.jsonl -t boolq

# race
data_dir=data/datasets/RACE
python tools/convert_race.py ${data_dir}/train ${deformer_dir}/race-train.jsonl
python tools/convert_race.py ${data_dir}/dev ${deformer_dir}/race-dev.jsonl

Split 10% of each training set for tuning hyper-parameters:

cd ${deformer_dir}

cat squad_v1.1-train.jsonl | shuf > squad_v1.1-train-shuf.jsonl
head -n8760 squad_v1.1-train-shuf.jsonl > squad_v1.1-tune.jsonl
tail -n78839 squad_v1.1-train-shuf.jsonl > squad_v1.1-train.jsonl

cat boolq-train.jsonl | shuf > boolq-train-shuf.jsonl
head -n943 boolq-train-shuf.jsonl > boolq-tune.jsonl
tail -n8484 boolq-train-shuf.jsonl > boolq-train.jsonl

cat race-train.jsonl | shuf > race-train-shuf.jsonl
head -n8786 race-train-shuf.jsonl > race-tune.jsonl
tail -n79080 race-train-shuf.jsonl > race-train.jsonl

cat qqp-train.jsonl | shuf > qqp-train-shuf.jsonl
head -n36385 qqp-train-shuf.jsonl > qqp-tune.jsonl
tail -n327464 qqp-train-shuf.jsonl > qqp-train.jsonl

cat mnli-train.jsonl | shuf > mnli-train-shuf.jsonl
head -n39270 mnli-train-shuf.jsonl > mnli-tune.jsonl
tail -n353432 mnli-train-shuf.jsonl > mnli-train.jsonl
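
The head/tail counts above are just 10% and 90% of each training file; a generic helper that derives them from the file length would be (a sketch, not part of the repo; integer rounding may differ by one line from the counts above):

split_tune() {
    # usage: split_tune <task>, e.g. split_tune boolq
    total=$(wc -l < ${1}-train-shuf.jsonl)
    tune=$((total / 10))
    head -n ${tune} ${1}-train-shuf.jsonl > ${1}-tune.jsonl
    tail -n $((total - tune)) ${1}-train-shuf.jsonl > ${1}-train.jsonl
}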

Download the BERT vocabulary: put bert.vocab under data/res.
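
The vocabulary file was linked from the original README; assuming it is the standard uncased BERT vocabulary (an assumption), it can also be copied from Google's BERT-base release:

mkdir -p data/res
curl -LO https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip
unzip uncased_L-12_H-768_A-12.zip
cp uncased_L-12_H-768_A-12/vocab.txt data/res/bert.vocab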

Generate the training and evaluation examples:

For usage, run python prepare.py -h.

  • e.g., convert squad_v1.1 for bert:

    python prepare.py -m bert -t squad_v1.1 -s dev
    python prepare.py -m bert -t squad_v1.1 -s tune
    python prepare.py -m bert -t squad_v1.1 -s train -sm tf
  • e.g., convert squad_v1.1 for xlnet:

    model=xlnet
    task=squad_v1.1
    python prepare.py -m ${model} -t ${task} -s dev
    python prepare.py -m ${model} -t ${task} -s train -sm tf
  • convert all available tasks and all models:

    for model in bert ebert; do
      for task in squad_v1.1 mnli qqp boolq race; do
        python prepare.py -m ${model} -t ${task} -s dev
        python prepare.py -m ${model} -t ${task} -s tune
        python prepare.py -m ${model} -t ${task} -s train -sm tf
      done
    done

Training and Evaluation

SQuAD 1.1 Quickstart

Download the original fine-tuned BERT-base checkpoints from bert-base-squad_v1.1.tgz and the DeFormer fine-tuned version from ebert-base-s9-squad_v1.1.tgz.
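
For example, assuming the tarballs unpack into the data/ckpt layout used in the Handy Commands section below:

mkdir -p data/ckpt
tar xzf bert-base-squad_v1.1.tgz -C data/ckpt
tar xzf ebert-base-s9-squad_v1.1.tgz -C data/ckpt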

python eval.py -m bert -t squad_v1.1 2>&1 | tee data/bert-base-eval.log

Example output:

INFO:2020-07-01_15:36:30.339:eval.py:65: model.ckpt-8299, em=80.91769157994324, f1=88.33819502660548, metric=88.33819502660548

python eval.py -m ebert -t squad_v1.1 2>&1 | tee data/ebert-base-s9-eval.log

Example output:

INFO:2020-07-01_15:39:15.418:eval.py:65: model.ckpt-8321, em=79.12961210974456, f1=86.99636369864814, metric=86.99636369864814

Train and Eval

See config/*.ini for customizing the training and evaluation settings.

  • train: python train.py, specifying the model with -m (--model) and the task with -t (--task); eval.py works the same way. See the example commands for boolq below:

    # for running on tpu, should specify gcs bucket data_dir, and set use_tpu to yes
    # also need to set tpu_name=<some_ip_or_just_name> if not exported to environment
    base_dir=<your google cloud storage bucket>
    data_dir=${base_dir} use_tpu=yes \
    python train.py -m bert -t boolq 2>&1 | tee data/boolq-bert-train.log
    
    data_dir=${base_dir} use_tpu=yes \
    python eval.py -m bert -t boolq 2>&1 | tee data/boolq-bert-eval.log
    
    # for list of models and list of tasks
    for task in boolq mnli qqp squad_v1.1; do
      for model in bert ebert; do
        data_dir=${base_dir} use_tpu=yes \
        python train.py -m ${model} -t ${task} 2>&1 | tee data/${task}-${model}-train.log
        
        data_dir=${base_dir} use_tpu=yes \
        python eval.py -m ${model} -t ${task} 2>&1 | tee data/${task}-${model}-eval.log
      done
    done
  • BERT wwm large:

    base_dir=<your google cloud storage bucket>
    for t in boolq qqp squad_v1.1 mnli; do
      use_tpu=yes data_dir=${base_dir} \
      learning_rate=1e-5 epochs=2 keep_checkpoint_max=1 \
      init_checkpoint=${base_dir}/ckpt/init/wwm_uncased_large/bert_model.ckpt \
      checkpoint_dir=${base_dir}/ckpt/bert_large/${t} \
      hidden_size=1024 intermediate_size=4096 num_heads=16 num_hidden_layers=24 \
      python train.py -m bert -t ${t} 2>&1 | tee data/${t}-large-train.log
    
      data_dir=${base_dir} use_tpu=yes init_checkpoint="" \
      checkpoint_dir=${base_dir}/ckpt/bert_large/${t} \
      hidden_size=1024 intermediate_size=4096 num_heads=16 num_hidden_layers=24 \
      python eval.py -m bert -t ${t} 2>&1 | tee data/${t}-large-eval.log
    done || exit 1

Experimenting

Tune EBert

  • fine-tuning with separation at different layers for BERT base:

    for t in boolq qqp mnli squad_v1.1; do
      for n in `seq 1 1 11`; do
        echo "n=${n}, t=${t}"
        base_dir=${base_dir}
    
        sep_layers=${n} use_tpu=yes data_dir=${base_dir} keep_checkpoint_max=1 \
        checkpoint_dir="${base_dir}/ckpt/separation/${t}/ebert_s${n}" \
        python train.py -m ebert -t ${t} 2>&1 | tee data/${t}-base-sep${n}-train.log
    
        sep_layers=${n} use_tpu=yes data_dir=${base_dir} init_checkpoint="" \
        checkpoint_dir="${base_dir}/ckpt/separation/${t}/ebert_s${n}" \
        python eval.py -m ebert -t ${t} 2>&1 | tee data/${t}-base-sep${n}-eval.log
      done
    done
  • fine-tuning with separation at different layers for wwm large BERT:

    for t in boolq qqp mnli squad_v1.1; do
      for n in `seq 10 1 23`; do
        echo "n=${n}, t=${t}"
        base_dir=${base_dir}
      
        sep_layers=${n} use_tpu=yes data_dir=${base_dir} \
        learning_rate=1e-5 epochs=2 keep_checkpoint_max=1 \
        init_checkpoint=${base_dir}/ckpt/init/wwm_uncased_large/bert_model.ckpt \
        checkpoint_dir=${base_dir}/ckpt/separation/${t}/ebert_large_s${n} \
        hidden_size=1024 intermediate_size=4096 num_heads=16 num_hidden_layers=24 \
        python train.py -m ebert -t ${t} 2>&1 | tee data/${t}-large-sep${n}-train.log
      
        sep_layers=${n} use_tpu=yes data_dir=${base_dir} init_checkpoint="" \
        checkpoint_dir=${base_dir}/ckpt/separation/${t}/ebert_large_s${n} \
        hidden_size=1024 intermediate_size=4096 num_heads=16 num_hidden_layers=24 \
        output_file=${base_dir}/predictions/${t}-large-sep${n}-dev.json \
        python eval.py -m ebert -t ${t} 2>&1 | tee data/${t}-large-sep${n}-eval.log
      done || exit 1
    done || exit 1

Tune SBert

  • the training script needs further verification (it was migrated from an older codebase)

  • sbert procedure: first train ebert_s0, then merge the bert_base and ebert_s0 checkpoints using tools/merge_checkpoints.py to obtain the initial checkpoint for sbert, then run the training.

    base_dir=gs://xxx
    init_dir="data/ckpt/init"
    large_model="${init_dir}/wwm_uncased_large/bert_model.ckpt"
    base_model="${init_dir}/uncased_base/bert_model.ckpt"
    
    for t in squad_v1.1 boolq qqp mnli; do
      mkdir -p data/ckpt/separation/${t}
      
      # sbert large init
      large_init="data/ckpt/separation/${t}/ebert_large_s0"
      gsutil -m cp -r "${base_dir}/ckpt/separation/${t}/ebert_large_s0" data/ckpt/separation/${t}/
      
      python tools/merge_checkpoints.py -c1 "${large_init}" \
      -c2 "${large_model}" -o ${init_dir}/${t}_sbert_large.ckpt
      gsutil -m cp -r "${init_dir}/${t}_sbert_large.ckpt*" "${base_dir}/ckpt/init"
      
      # sbert large init from ebert_large_s0 all
      python tools/merge_checkpoints.py -c1 "${large_init}" -c2 "${large_model}" \
      -o ${init_dir}/${t}_sbert_large_all.ckpt -fo 
      gsutil -m cp -r "${init_dir}/${t}_sbert_large_all.ckpt*" "${base_dir}/ckpt/init"
    
      # sbert large init from ebert_large_s0 upper, e.g. 20
      python tools/merge_checkpoints.py -c1 "${large_init}" -c2 "${large_model}" \
      -o ${init_dir}/${t}_sbert_large_upper20.ckpt -fo -fou 20
      gsutil -m cp -r "${init_dir}/${t}_sbert_large_upper20.ckpt*" "${base_dir}/ckpt/init"
    
      # sbert base init
      base_init="data/ckpt/separation/${t}/ebert_s0"
    
      gsutil -m cp -r "${base_dir}/ckpt/separation/${t}/ebert_s0" data/ckpt/separation/${t}/
      python tools/merge_checkpoints.py -c1 "${base_init}" -c2 "${base_model}" \
      -o ${init_dir}/${t}_sbert_base.ckpt
      gsutil -m cp -r "${init_dir}/${t}_sbert_base.ckpt*" "${base_dir}/ckpt/init"
    
      python tools/merge_checkpoints.py -c1 "${base_init}" -c2 "${base_model}" \
      -o ${init_dir}/${t}_sbert_base_all.ckpt -fo 
      gsutil -m cp -r "${init_dir}/${t}_sbert_base_all.ckpt*" "${base_dir}/ckpt/init"
    
      python tools/merge_checkpoints.py -c1 "${base_init}" -c2 "${base_model}" \
      -o ${init_dir}/${t}_sbert_base_upper9.ckpt -fo -fou 9
      gsutil -m cp -r "${init_dir}/${t}_sbert_base_upper9.ckpt*" "${base_dir}/ckpt/init"
    done || exit 1
  • sbert fine-tuning:

    # squad_v1.1, search 50 params for bert large separated at layer 21
    python tools/explore_hp.py -p data/sbert-squad-large.json -n 50 \
    -s large -sp 1.4 0.3 0.8 -hp 5e-5,3,32 2>&1 | tee data/sbert-squad-explore-s21.log
    ./search.sh squad_v1.1 large 21 bert-tpu2
    
    # race search 50
    python tools/explore_hp.py -p data/race-sbert-s9.json -n 50 -t race 2>&1 | \
    tee data/race-sbert-explore-s9.log
    
    ./search.sh race base 9

Profiling

  • profile model flops:

    for task in race boolq qqp mnli squad_v1.1; do
      for size in base large; do
        profile_dir=data/log2-${task}-${size}-profile
        mkdir -p "${profile_dir}"
              
        if [[ "${task}" == "mnli" ]]; then
          cs=1 # cache_segment
        else
          cs=2
        fi
    
        if [[ ${size} == "base" ]] ; then
          allowed_layers="9 10" # $(seq 1 1 11)
          large_params=""
        else
          allowed_layers="20 21" #$(seq 1 1 23)
          large_params="hidden_size=1024 intermediate_size=4096 num_heads=16 num_hidden_layers=24"
        fi
    
        if [[ ${task} == "race" ]] ; then
          large_params="num_choices=4 ${large_params}"
        fi
    
        # bert 
        eval "${large_params}" python profile.py -m bert -t ${task} -pm 2>&1 | \
        tee ${profile_dir}/bert-profile.log
    
        # ebert 
        for n in "${(@s/ /)allowed_layers}"; do
          eval "${large_params}" sep_layers="${n}" \
          python profile.py -m ebert -t ${task} -pm 2>&1 | \
          tee ${profile_dir}/ebert-s${n}-profile.log
      
          eval "${large_params}" sep_layers="${n}" \
          python profile.py -m ebert -t ${task} -pm -cs ${cs} 2>&1 | \
          tee ${profile_dir}/ebert-s${n}-profile-cache.log
        done
      done
    done
  • benchmarking inference latency:

    python profile.py -npf -pt -b 32 2>&1 | tee data/batch-time-bert.log
    python profile.py -npf -pt -b 32 -m ebert -cs 2 2>&1 | tee data/batch-time-ebert.log
  • analyze bert, ebert, sbert:

    python analyze.py -o data/qa-outputs -m bert 2>&1 | tee data/ana-bert.log
    python tools/compute_rep_variance.py data/qa-outputs -n 20
    
    python tools/compare_rep.py data/qa-outputs -m sbert
    python tools/compare_rep.py data/qa-outputs -m ebert

Demo

  • run inference: python infer_qa.py -m bert (add -e for eager mode)

Tools

  • tools/get_dataset_stats.py: get dataset statistics (mainly token lengths)
  • tools/inspect_checkpoint.py: print variable info in checkpoints (supports monitoring variables during training)
  • tools/rename_checkpoint_variables.py: rename variables in a checkpoint (add -dr for a dry run), e.g. python tools/rename_checkpoint_variables.py "data/ckpt/bert/mnli/" -p "bert_mnli" "mnli" -dr
  • tools/visualize_model.py: visualize the TensorFlow model structure given an inference graph
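
Example invocations (hypothetical paths and arguments; check each script's help for its exact flags):

    python tools/get_dataset_stats.py data/datasets/deformer/squad_v1.1-train.jsonl
    python tools/inspect_checkpoint.py data/ckpt/bert/qa/model.ckpt-8299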

Handy Commands

  • redis (handy for inspecting the queue:params / queue:results lists, e.g. during the hyper-parameter search):

    redis-cli -p 60001 lrange queue:params 0 -1
    redis-cli -p 60001 lrange queue:results 0 -1
    redis-cli -p 60001 lpop queue:params
    redis-cli -p 60001 rpush queue:results 89.532
  • gcloud sdk for TPU access: pip install --upgrade google-api-python-client oauth2client

  • TPU start: ctpu up --tpu-size=v3-8 --tpu-only --name=bert-tpu --noconf (a TF version can be specified, e.g. --tf-version=1.13)

  • TPU stop: ctpu pause --tpu-only --name=bert-tpu --noconf

  • move instances: gcloud compute instances move bert-vm --zone us-central1-b --destination-zone us-central1-a

  • upload and download:

    cd data
    # upload
    gsutil -m cp -r datasets/qqp/ebert "gs://xxx/datasets/qqp/ebert"
    gsutil -m cp -r datasets/qa/ebert "gs://xxx/datasets/qa/ebert"
    gsutil -m cp -r datasets/mnli/ebert "gs://xxx/datasets/mnli/ebert"
    gsutil -m cp -r "datasets/qa/bert/hotpot-*" "gs://xxx/datasets/qa/bert"
    
    # download
    gsutil -m cp -r "gs://xxx/datasets/qqp/ebert" qqp/ebert
    
    cd data/ckpt
    # download
    gsutil -m cp -r "gs://xxx/ckpt/bert/qa/model.ckpt-8299*" bert/qa/
    gsutil -m cp -r "gs://xxx/ckpt/ebert_s9/qa/model.ckpt-8321*" ebert_s9/qa/
    gsutil -m cp -r "gs://xxx/ckpt/ebert_s9/mnli/model.ckpt-18407*" ebert_s9/mnli/
    gsutil -m cp -r "gs://xxx/ckpt/ebert_s9/qqp/model.ckpt-17055*" ebert_s9/qqp/
    
    function dl()
    {
      num=$2
      for suffix in meta index data-00000-of-00001; do
        gsutil cp gs://xxx/ckpt/$1/model.ckpt-${num}.${suffix} .
      done;
      echo model_checkpoint_path: \"model.ckpt-${num}\" > checkpoint
    }
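    # example usage: fetch checkpoint 8321 for ebert_s9/qa (cf. the downloads
    # above) into the current directory and write the local 'checkpoint' file
    dl ebert_s9/qa 8321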
    

FAQ

If you have any questions, please create an issue.

Citation

If you find our work useful to your research, please consider using the following citation:

@inproceedings{cao-etal-2020-deformer,
    title = "{D}e{F}ormer: Decomposing Pre-trained Transformers for Faster Question Answering",
    author = "Cao, Qingqing  and
      Trivedi, Harsh  and
      Balasubramanian, Aruna  and
      Balasubramanian, Niranjan",
    booktitle = "Proceedings of the 58th Annual Mdeformering of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.411",
    pages = "4487--4497",
}