
YerevaNN / WARP

License: MIT
Code for ACL'2021 paper WARP 🌀 Word-level Adversarial ReProgramming. Outperforming `GPT-3` on SuperGLUE Few-Shot text classification. https://aclanthology.org/2021.acl-long.381/

Programming Languages

Python
Jsonnet

Projects that are alternatives of or similar to WARP

Learning-To-Compare-For-Text
Learning to Compare for Text: few-shot learning in text classification
Stars: ✭ 38 (-42.42%)
Mutual labels:  pretrained-models, few-shot-learning
finetuner
Finetuning any DNN for better embedding on neural search tasks
Stars: ✭ 442 (+569.7%)
Mutual labels:  pretrained-models, few-shot-learning
ProteinLM
Protein Language Model
Stars: ✭ 76 (+15.15%)
Mutual labels:  pretrained-models
open clip
An open source implementation of CLIP.
Stars: ✭ 1,534 (+2224.24%)
Mutual labels:  pretrained-models
motor-defect-detector-python
Predict performance issues with manufacturing equipment motors. Perform local or cloud analytics of the issues found, and then display the data on a user interface to determine when failures might arise.
Stars: ✭ 24 (-63.64%)
Mutual labels:  pretrained-models
object-size-detector-python
Monitor mechanical bolts as they move down a conveyor belt. When a bolt of an irregular size is detected, this solution emits an alert.
Stars: ✭ 26 (-60.61%)
Mutual labels:  pretrained-models
bruno
a deep recurrent model for exchangeable data
Stars: ✭ 34 (-48.48%)
Mutual labels:  few-shot-learning
Few-NERD
Code and data of ACL 2021 paper "Few-NERD: A Few-shot Named Entity Recognition Dataset"
Stars: ✭ 317 (+380.3%)
Mutual labels:  few-shot-learning
safety-gear-detector-python
Observe workers as they pass in front of a camera to determine if they have adequate safety protection.
Stars: ✭ 54 (-18.18%)
Mutual labels:  pretrained-models
roberta-wwm-base-distill
A distilled RoBERTa-wwm base model, distilled using RoBERTa-wwm-large as the teacher
Stars: ✭ 61 (-7.58%)
Mutual labels:  pretrained-models
AdversarialBinaryCoding4ReID
Codes of the paper "Adversarial Binary Coding for Efficient Person Re-identification"
Stars: ✭ 12 (-81.82%)
Mutual labels:  adversarial
PCPM
Presenting Collection of Pretrained Models. Links to pretrained models in NLP and voice.
Stars: ✭ 21 (-68.18%)
Mutual labels:  pretrained-models
pytorch-meta-dataset
A non-official 100% PyTorch implementation of META-DATASET benchmark for few-shot classification
Stars: ✭ 39 (-40.91%)
Mutual labels:  few-shot-learning
BIRADS classifier
High-resolution breast cancer screening with multi-view deep convolutional neural networks
Stars: ✭ 122 (+84.85%)
Mutual labels:  pretrained-models
ganbert
Enhancing the BERT training with Semi-supervised Generative Adversarial Networks
Stars: ✭ 205 (+210.61%)
Mutual labels:  few-shot-learning
Black-Box-Tuning
ICML'2022: Black-Box Tuning for Language-Model-as-a-Service
Stars: ✭ 99 (+50%)
Mutual labels:  few-shot-learning
intruder-detector-python
Build an application that alerts you when someone enters a restricted area. Learn how to use models for multiclass object detection.
Stars: ✭ 16 (-75.76%)
Mutual labels:  pretrained-models
mammography metarepository
Meta-repository of screening mammography classifiers
Stars: ✭ 44 (-33.33%)
Mutual labels:  pretrained-models
MNIST-adversarial-images
Create adversarial images to fool a MNIST classifier in TensorFlow
Stars: ✭ 13 (-80.3%)
Mutual labels:  adversarial
super-gradients
Easily train or fine-tune SOTA computer vision models with one open source training library
Stars: ✭ 429 (+550%)
Mutual labels:  pretrained-models

🌀 WARP: Word-level Adversarial ReProgramming

This repository contains the code for the ACL'2021 paper WARP: Word-level Adversarial ReProgramming.

WARP adds a few trainable embeddings around the input, which causes the masked language model to predict the sentiment of the sentence in the SST-2 task.

Transfer learning from pretrained language models recently became the dominant approach for solving many NLP tasks. A common approach to transfer learning for multiple tasks that maximizes parameter sharing trains one or more task-specific layers on top of the language model.

In this paper, we present an alternative approach based on adversarial reprogramming, which extends earlier work on automatic prompt generation. Adversarial reprogramming attempts to learn task-specific word embeddings that, when concatenated to the input text, instruct the language model to solve the specified task.

Using up to 25K trainable parameters per task, this approach outperforms all existing methods that use up to 25M trainable parameters on the public leaderboard of the GLUE benchmark. Our method, initialized with task-specific human-readable prompts, also works in a few-shot setting, outperforming GPT-3 on two SuperGLUE tasks after training on just 32 samples.
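
To make the idea concrete, below is a minimal sketch of prompt-style adversarial reprogramming with a frozen masked language model: a handful of trainable embeddings are prepended to the input embeddings, and only those embeddings are optimized. It is written against the Hugging Face transformers API purely for illustration; the names PromptWrapper and n_prompts are ours, and this is not the repository's allennlp-based implementation.

import torch
from transformers import RobertaForMaskedLM, RobertaTokenizer

class PromptWrapper(torch.nn.Module):
    """Frozen masked LM plus a small set of trainable prompt embeddings."""
    def __init__(self, model_name="roberta-large", n_prompts=8):
        super().__init__()
        self.lm = RobertaForMaskedLM.from_pretrained(model_name)
        for p in self.lm.parameters():
            p.requires_grad = False                      # the language model stays frozen
        dim = self.lm.get_input_embeddings().embedding_dim
        # the only trainable parameters: n_prompts embedding vectors
        self.prompts = torch.nn.Parameter(torch.randn(n_prompts, dim) * 0.02)

    def forward(self, input_ids):
        tok = self.lm.get_input_embeddings()(input_ids)                   # (B, T, H)
        pre = self.prompts.unsqueeze(0).expand(input_ids.size(0), -1, -1)
        embeds = torch.cat([pre, tok], dim=1)                             # prepend prompt embeddings
        return self.lm(inputs_embeds=embeds).logits

tokenizer = RobertaTokenizer.from_pretrained("roberta-large")
model = PromptWrapper()
ids = tokenizer("A gorgeous, witty film. It was <mask>.", return_tensors="pt").input_ids
logits = model(ids)  # read the <mask> position to score verbalizer tokens such as "great"/"terrible"

In this sketch only model.prompts receives gradients; WARP additionally learns output ("verbalizer") embeddings for the classes, which is what keeps the per-task parameter count in the tens of thousands.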

Few-Shot Results

Set    Model                CB F1   CB Acc.   RTE Acc.
dev    GPT-3 Small          26.1    42.9      52.3
dev    GPT-3 Med            40.4    58.9      48.4
dev    GPT-3                57.2    82.1      72.9
dev    PET (ALBERT)         59.4    85.1      69.8
dev    iPET (ALBERT)        92.4    92.9      74.0
dev    WARP_init (ALBERT)   84.0    87.5      71.8
test   GPT-3                52.0    75.6      69.0
test   PET (ALBERT)         60.2    87.2      67.2
test   iPET (ALBERT)        79.9    88.8      70.8
test   WARP_init (ALBERT)   70.2    82.4      69.1
Results on the SuperGLUE benchmark. The test-set results are obtained from the SuperGLUE evaluation server. We only show systems trained in a comparable few-shot setup with 32 examples.

Setup

The code requires YerevaNN's internal version of allennlp:

git clone https://github.com/YerevaNN/allennlp
cd allennlp
git checkout warp
pip install .

Training

Linear Probing

for DATASET in 'cola' 'sst2' 'mrpc' 'qqp' 'stsb' 'mnli' 'rte' 'wnli' 'qnli'
do
    export HPARAMS='{
        "dataset": "'$DATASET'",
        "lr": 0.0001,
        "num_epochs": 20,
        "prompts": [],
        "reorder_optimized": false,
        "max_batch_size": 8,
        "max_tokens_sq": 262144, "on_logits":  false, "pooling_index":  null, "seed":  1}'
    python -m allennlp train \
    -s .aim/baseline-linear-${DATASET} configs/warp.jsonnet
done
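
For context, "linear probing" here means keeping the pretrained language model frozen (no prompt tokens, "prompts": []) and training only a linear classifier on top of its features. A rough, hedged sketch of that baseline, again using the transformers API rather than the configs/warp.jsonnet setup driven by the loop above:

import torch
from transformers import RobertaModel, RobertaTokenizer

encoder = RobertaModel.from_pretrained("roberta-large")
for p in encoder.parameters():
    p.requires_grad = False                               # frozen encoder

probe = torch.nn.Linear(encoder.config.hidden_size, 2)    # the only trainable part (e.g. 2 labels for SST-2)

tokenizer = RobertaTokenizer.from_pretrained("roberta-large")
batch = tokenizer(["a gorgeous, witty film"], return_tensors="pt")
with torch.no_grad():
    feats = encoder(**batch).last_hidden_state[:, 0]      # <s> (CLS-style) representation
logits = probe(feats)                                     # train `probe` with cross-entropy as usual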

WARP_0

for DATASET in 'cola' 'sst2' 'mrpc' 'qqp' 'stsb' 'mnli' 'rte' 'wnli' 'qnli'
do
    export HPARAMS='{
        "dataset": "'$DATASET'",
        "lr": 0.0001,
        "num_epochs": 20,
        "prompts": [null, "<mask>"],
        "reorder_optimized": true,
        "max_batch_size": 8,
        "max_tokens_sq": 262144,
        "on_logits": "pre_decoder_layer_norm",
        "pooling_index": 1,
        "seed": 1
    }'
    python -m allennlp train \
    -s .aim/baseline-warp_0-${DATASET} configs/warp.jsonnet
done

Training WARP

export DATASET="rte"
export HPARAMS='{
    "benchmark":"super_glue",
    "classifier_init":null,
    "dataset":"'$DATASET'",
    "ensure_whitespace_between":false,
    "lr":0.001,
    "max_batch_size":8,
    "max_tokens_sq":262144,
    "num_epochs":30,
    "prompt_better_init":"<mask>",
    "prompts":[-10,-11,-12,-13,-14,null,-15,-16,-17,-18,-19,"<mask>",-20,-21,-22,-23,-24,null,-25,-26,-27,-28,-29],
    "seed":1,
    "transformer_model":"roberta-large"
}'
python -m allennlp train \
-s .aim/t-${DATASET} configs/warp.jsonnet

WARP_init

Few-Shot Experiments

export HPARAMS='{
    "benchmark":"super_glue",
    "classifier_init": {
        "entailment": " yes",
        "not_entailment": " instead"
    },
    "dataset":"few_rte",
    "eval_mode":false,
    "lr":0.001,
    "max_batch_size":2,
    "max_tokens_sq":131072,
    "num_epochs":100,
    "num_gradient_accumulation_steps":2,
    "prompt_better_init": "[PAD]",
    "prompts":[-10,-11,[-14,"\""],null,[-15,"\""],  [-16, "?"], "<mask>", [-20, ","], null, [-29, "!"],-30,-31],
    "seed":3,
    "str_cut_frac":0,
    "transformer_model":"albert-xxlarge-v2",
    "validation_metric": null
}'
python -m allennlp train \
-s .aim/t-${DATASET}-`date +%s` configs/warp.jsonnet
export HPARAMS='{
   "benchmark":"super_glue",
   "classifier_init":{
      "entailment":" yes",
      "not_entailment":" instead"
   },
   "dataset":"few_rte",
   "grad_norm":1,
   "lr":0.001,
   "max_batch_size":2,
   "max_tokens_sq":131072,
   "num_epochs":30,
   "num_gradient_accumulation_steps":2,
   "prompt_better_init":"[PAD]",
   "prompts":[-10,-11,[-14,"\""],null,[-15,"\""],[-16,"?"],"<mask>",[-20,","],null,[-29,"!"],-30,-31],
   "seed":1,
   "str_cut_frac":0.06,
   "transformer_model":"albert-xxlarge-v2",
   "validation_metric":"+training_val_metric"
}'
python -m allennlp train \
-s .aim/t-${DATASET}-`date +%s` configs/warp.jsonnet

Evaluation

python -m allennlp predict \
  --silent --use-dataset-reader --cuda-device 0 \
  --batch-size 50 \
  --predictor glue --output-file v0.1/AX.tsv /data/arp/.aim/H-93ae5ae9 ax/test
python -m allennlp predict \
  --silent --use-dataset-reader --cuda-device 0 \
  --batch-size 50 \
  --predictor glue --output-file v0.1/MNLI-m.tsv /data/arp/.aim/H-93ae5ae9 test_matched
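
The commands above write per-task prediction files such as v0.1/AX.tsv and v0.1/MNLI-m.tsv. If you plan to submit to the GLUE leaderboard, the evaluation server expects a single zip archive of the per-task .tsv files; a small hedged helper (the v0.1 directory layout is only an assumption based on the commands above):

import zipfile
from pathlib import Path

pred_dir = Path("v0.1")                      # directory used by the predict commands above
with zipfile.ZipFile("submission.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for tsv in sorted(pred_dir.glob("*.tsv")):
        zf.write(tsv, arcname=tsv.name)      # keep only the file name inside the archive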

Citation

If you want to refer to our work, please use the following BibTeX entry:

@inproceedings{hambardzumyan-etal-2021-warp,
    title = "{WARP}: {W}ord-level {A}dversarial {R}e{P}rogramming",
    author = "Hambardzumyan, Karen  and
      Khachatrian, Hrant  and
      May, Jonathan",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.acl-long.381",
    doi = "10.18653/v1/2021.acl-long.381",
    pages = "4921--4933"
}