pmichel31415 / are-16-heads-really-better-than-1

License: MIT
Code for the paper "Are Sixteen Heads Really Better than One?"

Programming Languages

shell

Projects that are alternatives of or similar to are-16-heads-really-better-than-1

TabFormer
Code & Data for "Tabular Transformers for Modeling Multivariate Time Series" (ICASSP, 2021)
Stars: ✭ 209 (+63.28%)
Mutual labels:  transformer, bert
sticker2
Further developed as SyntaxDot: https://github.com/tensordot/syntaxdot
Stars: ✭ 14 (-89.06%)
Mutual labels:  transformer, bert
FasterTransformer
Transformer related optimization, including BERT, GPT
Stars: ✭ 1,571 (+1127.34%)
Mutual labels:  transformer, bert
vietnamese-roberta
A Robustly Optimized BERT Pretraining Approach for Vietnamese
Stars: ✭ 22 (-82.81%)
Mutual labels:  transformer, bert
PDN
The official PyTorch implementation of "Pathfinder Discovery Networks for Neural Message Passing" (WebConf '21)
Stars: ✭ 44 (-65.62%)
Mutual labels:  transformer, bert
sister
SImple SenTence EmbeddeR
Stars: ✭ 66 (-48.44%)
Mutual labels:  transformer, bert
KitanaQA
KitanaQA: Adversarial training and data augmentation for neural question-answering models
Stars: ✭ 58 (-54.69%)
Mutual labels:  transformer, bert
Bert Pytorch
Google AI 2018 BERT pytorch implementation
Stars: ✭ 4,642 (+3526.56%)
Mutual labels:  transformer, bert
COVID-19-Tweet-Classification-using-Roberta-and-Bert-Simple-Transformers
Rank 1 / 216
Stars: ✭ 24 (-81.25%)
Mutual labels:  transformer, bert
Kevinpro-NLP-demo
All NLP you Need Here. Personal implementations of some fun NLP demos; currently includes PyTorch implementations of 13 NLP applications.
Stars: ✭ 117 (-8.59%)
Mutual labels:  transformer, bert
Bertviz
Tool for visualizing attention in the Transformer model (BERT, GPT-2, Albert, XLNet, RoBERTa, CTRL, etc.)
Stars: ✭ 3,443 (+2589.84%)
Mutual labels:  transformer, bert
transformer-models
Deep Learning Transformer models in MATLAB
Stars: ✭ 90 (-29.69%)
Mutual labels:  transformer, bert
Transformers
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Stars: ✭ 55,742 (+43448.44%)
Mutual labels:  transformer, bert
les-military-mrc-rank7
LES Cup: Rank 7 solution for the 2nd National "Military Intelligence Machine Reading" Challenge
Stars: ✭ 37 (-71.09%)
Mutual labels:  transformer, bert
Nlp Tutorial
Natural Language Processing Tutorial for Deep Learning Researchers
Stars: ✭ 9,895 (+7630.47%)
Mutual labels:  transformer, bert
tensorflow-ml-nlp-tf2
Practice materials for "Natural Language Processing with TensorFlow 2 and Machine Learning (from logistic regression to BERT and GPT-3)"
Stars: ✭ 245 (+91.41%)
Mutual labels:  transformer, bert
SIGIR2021 Conure
One Person, One Model, One World: Learning Continual User Representation without Forgetting
Stars: ✭ 23 (-82.03%)
Mutual labels:  transformer, bert
bert in a flask
A dockerized flask API, serving ALBERT and BERT predictions using TensorFlow 2.0.
Stars: ✭ 32 (-75%)
Mutual labels:  transformer, bert
NLP-paper
🎨 NLP (Natural Language Processing) tutorial 🎨 https://dataxujing.github.io/NLP-paper/
Stars: ✭ 23 (-82.03%)
Mutual labels:  transformer, bert
Xpersona
XPersona: Evaluating Multilingual Personalized Chatbot
Stars: ✭ 54 (-57.81%)
Mutual labels:  transformer, bert

Are Sixteen Heads Really Better than One?

This repository contains code to reproduce the experiments in our paper "Are Sixteen Heads Really Better than One?".

Prerequisites

First, you will need Python >= 3.6 with PyTorch >= 1.0. Then, clone our forks of fairseq (for the MT experiments) and pytorch-pretrained-BERT (for the BERT experiments):

# Fairseq
git clone https://github.com/pmichel31415/fairseq
# Pytorch pretrained BERT
git clone https://github.com/pmichel31415/pytorch-pretrained-BERT
cd pytorch-pretrained-BERT
git checkout paul
cd ..

If you are running into issues with pytorch-pretrained-BERT (for instance because you have another version installed globally), check out this workaround (thanks @insop).
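That workaround aside, one common way to avoid such clashes (not from the original README, just a standard setup) is to install the fork into a dedicated virtual environment:

# Hypothetical setup: install the fork into a fresh virtual environment
# so it does not clash with a globally installed pytorch-pretrained-BERT.
python3 -m venv venv
source venv/bin/activate
pip install -e ./pytorch-pretrained-BERT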

You will also need sacrebleu to evaluate the BLEU score (pip install sacrebleu).
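For reference, scoring a detokenized translation file against a detokenized reference by hand looks like this (the file names are hypothetical):

# sacrebleu expects detokenized text on both sides
cat translations.detok.txt | sacrebleu reference.detok.txt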

Ablation experiments

BERT

Running

bash experiments/BERT/heads_ablation.sh MNLI

will fine-tune a pretrained BERT on MNLI (stored in ./models/MNLI) and perform the individual head ablation experiment from Section 3.1 of the paper. Alternatively, you can run the experiment with CoLA, MRPC, or SST-2 in place of MNLI.
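For instance, to run the ablation on all four tasks in sequence (a simple loop, not part of the original scripts):

# Run the head ablation on each supported task in turn
for task in MNLI CoLA MRPC SST-2; do
    bash experiments/BERT/heads_ablation.sh $task
done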

MT

You can obtain the pretrained WMT model from the fairseq repo (the original download link has since moved). Use the Moses tokenizer and subword-nmt, in conjunction with the BPE codes provided with the pretrained model, to prepare any input file you want. Then run:

bash experiments/MT/wmt_ablation.sh $BPE_SEGMENTED_SRC_FILE $DETOKENIZED_REF_FILE
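For reference, producing $BPE_SEGMENTED_SRC_FILE from raw text might look like the following sketch (the Moses and BPE-code paths are hypothetical and depend on your setup):

# Hypothetical paths: tokenize with Moses, then apply the BPE codes
# shipped with the pretrained model using subword-nmt.
MOSES=path/to/mosesdecoder
cat input.src \
    | $MOSES/scripts/tokenizer/tokenizer.perl -l en \
    | subword-nmt apply-bpe -c path/to/bpecodes \
    > input.bpe.src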

Systematic Pruning Experiments

BERT

To iteratively prune heads, 10% at a time, in order of increasing importance, run

bash experiments/BERT/heads_pruning.sh MNLI --normalize_pruning_by_layer

This will reuse the fine-tuned BERT model if you have run the ablation experiment before (otherwise it will fine-tune one for you). The output is very verbose, but you can get the gist of the result by calling grep "strategy\|results" -A1 on the output.
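For example, capturing the output to a log file and then filtering it (the log file name is arbitrary):

# Capture the verbose output, then pull out the pruning strategy and results
bash experiments/BERT/heads_pruning.sh MNLI --normalize_pruning_by_layer | tee pruning_mnli.log
grep "strategy\|results" -A1 pruning_mnli.log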

WMT

Similarly, just run:

bash experiments/MT/prune_wmt.sh $BPE_SEGMENTED_SRC_FILE $DETOKENIZED_REF_FILE

You might want to change the paths in the experiment files to point to the binarized fairseq dataset on which you want to estimate importance scores.
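If you need to binarize your own evaluation set, a sketch of the fairseq preprocessing step might look like this (the language pair, paths, and dictionary files are hypothetical; older fairseq versions, including the fork above, may expose this as python preprocess.py rather than fairseq-preprocess):

# Hypothetical example: binarize a BPE-segmented test set, reusing the
# dictionaries that ship with the pretrained model.
fairseq-preprocess --source-lang en --target-lang de \
    --testpref data/test.bpe \
    --destdir data-bin/importance_estimation \
    --srcdict path/to/dict.en.txt \
    --tgtdict path/to/dict.de.txt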
