MurtyShikhar / ExpBERT

Licence: other
Code for our ACL '20 paper "Representation Engineering with Natural Language Explanations"

Programming Languages

Python
Dockerfile

Projects that are alternatives to or similar to ExpBERT

wisdomify
A BERT-based reverse dictionary of Korean proverbs
Stars: ✭ 95 (+239.29%)
Mutual labels:  bert
embedding study
A study of character embeddings generated by Chinese pretrained models, testing how well BERT and ELMo perform on Chinese
Stars: ✭ 94 (+235.71%)
Mutual labels:  bert
datagrand bert
Code for the 5th-place solution to the 2019 DataGrand Cup information extraction competition
Stars: ✭ 20 (-28.57%)
Mutual labels:  bert
neuro-comma
🇷🇺 Punctuation restoration production-ready model for Russian language 🇷🇺
Stars: ✭ 46 (+64.29%)
Mutual labels:  bert
les-military-mrc-rank7
LES Cup: 2nd National "Military Intelligent Machine Reading" Challenge - Rank 7 solution
Stars: ✭ 37 (+32.14%)
Mutual labels:  bert
LAMB Optimizer TF
LAMB Optimizer for Large Batch Training (TensorFlow version)
Stars: ✭ 119 (+325%)
Mutual labels:  bert
OpenUE
OpenUE is a lightweight knowledge graph extraction toolkit (An Open Toolkit for Universal Extraction from Text published at EMNLP2020: https://aclanthology.org/2020.emnlp-demos.1.pdf)
Stars: ✭ 274 (+878.57%)
Mutual labels:  bert
rasa-bert-finetune
BERT fine-tuning with rasa-nlu support
Stars: ✭ 46 (+64.29%)
Mutual labels:  bert
ERNIE-text-classification-pytorch
This repo contains a PyTorch implementation of a pretrained ERNIE model for text classification.
Stars: ✭ 49 (+75%)
Mutual labels:  bert
beir
A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
Stars: ✭ 738 (+2535.71%)
Mutual labels:  bert
Fill-the-GAP
[ACL-WS] 4th place solution to gendered pronoun resolution challenge on Kaggle
Stars: ✭ 13 (-53.57%)
Mutual labels:  bert
sister
SImple SenTence EmbeddeR
Stars: ✭ 66 (+135.71%)
Mutual labels:  bert
BERTOverflow
A Pre-trained BERT on StackOverflow Corpus
Stars: ✭ 40 (+42.86%)
Mutual labels:  bert
TwinBert
PyTorch implementation of the TwinBert paper
Stars: ✭ 36 (+28.57%)
Mutual labels:  bert
bert extension tf
BERT Extension in TensorFlow
Stars: ✭ 29 (+3.57%)
Mutual labels:  bert
cmrc2019
A Sentence Cloze Dataset for Chinese Machine Reading Comprehension (CMRC 2019)
Stars: ✭ 118 (+321.43%)
Mutual labels:  bert
AliceMind
ALIbaba's Collection of Encoder-decoders from MinD (Machine IntelligeNce of Damo) Lab
Stars: ✭ 1,479 (+5182.14%)
Mutual labels:  bert
R-AT
Regularized Adversarial Training
Stars: ✭ 19 (-32.14%)
Mutual labels:  bert
GEANet-BioMed-Event-Extraction
Code for the paper Biomedical Event Extraction with Hierarchical Knowledge Graphs
Stars: ✭ 52 (+85.71%)
Mutual labels:  bert
bert attn viz
Visualize BERT's self-attention layers on text classification tasks
Stars: ✭ 41 (+46.43%)
Mutual labels:  bert

ExpBERT: Representation Engineering with Natural Language Explanations

Figure: Overview of the ExpBERT approach.

This repository contains code, scripts, data and checkpoints for running experiments in the following paper:

Shikhar Murty, Pang Wei Koh, Percy Liang

ExpBERT: Representation Engineering with Natural Language Explanations (ACL 2020)

The experiments use datasets and precomputed features, which can be downloaded here.

Abstract

Suppose we want to specify the inductive bias that married couples typically go on honeymoons for the task of extracting pairs of spouses from text. In this paper, we allow model developers to specify these types of inductive biases as natural language explanations. We use BERT fine-tuned on MultiNLI to "interpret" these explanations with respect to the input sentence, producing explanation-guided representations of the input. Across three relation extraction tasks, our method, ExpBERT, matches a BERT baseline but with 3-20× less labeled data and improves on the baseline by 3-10 F1 points with the same amount of labeled data.
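The mechanism described above can be sketched as follows: each explanation is paired with the input sentence, an interpreter maps the pair to a vector, and the per-explanation vectors are joined into one representation for a downstream classifier. This is a minimal sketch under stated assumptions, not the repository's implementation: the concatenation step is assumed (the feature_concat classifier type used in the commands below hints at it), and toy_interpreter is a hypothetical stand-in that scores word overlap instead of running BERT fine-tuned on MultiNLI.

```python
from typing import Callable, List, Sequence

# An interpreter maps (input sentence, explanation) to a fixed-size vector.
# In ExpBERT this role is played by BERT fine-tuned on MultiNLI; here it is
# a pluggable function so the sketch runs without any model weights.
Interpreter = Callable[[str, str], List[float]]


def expbert_features(sentence: str,
                     explanations: Sequence[str],
                     interpret: Interpreter) -> List[float]:
    """Concatenate the interpreter output for every (sentence, explanation) pair.

    Assumption: per-explanation vectors are concatenated in a fixed order.
    """
    features: List[float] = []
    for explanation in explanations:
        # Each explanation contributes its own block of the representation.
        features.extend(interpret(sentence, explanation))
    return features


def toy_interpreter(sentence: str, explanation: str) -> List[float]:
    """Hypothetical stand-in for the NLI model: overlap ratio and explanation length."""
    s_words = set(sentence.lower().split())
    e_words = set(explanation.lower().split())
    overlap = len(s_words & e_words) / max(len(e_words), 1)
    return [overlap, float(len(e_words))]


explanations = [
    "the two people went on a honeymoon",
    "one person is the wife of the other",
]
vec = expbert_features("Alice and Bob went on a honeymoon in Rome",
                       explanations, toy_interpreter)
print(len(vec))  # prints 4: 2 explanations x 2 features each
```

With k explanations and an interpreter that returns d numbers, the representation has k * d entries, so adding an explanation appends a new block of features without changing the existing ones.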

Dependencies

Install all dependencies using conda:

conda env create -f environment.yml
conda activate lang-supervision
pip install -e .

Setup

To run our code, first download the data and features into $DATA_DIR. The main entry point is run.py. The commands below train models on the Spouse dataset. For the Disease dataset, set --task_name disease; for TACRED, set --task_name tacred and --num_classes 42.

NoExp

python run.py --data_dir $DATA_DIR/spouse --train --num_train_epochs 100 --task_name spouse --classifier_type feature_concat --exp_dir input-features --num_classes 2 --train_distributed 0 --dev_distributed 0 --save_model --output_dir $outdir

Semparse (ProgExp) / Semparse (LangExp) / Patterns

python run.py --data_dir $DATA_DIR/spouse --train --num_train_epochs 100 --task_name spouse --classifier_type feature_concat --exp_dir input-features --feat_dir $feat --num_classes 2 --train_distributed 0 --dev_distributed 0 --save_model --output_dir $outdir

where $feat is semparse-progexp-features, semparse-langexp-features, or regex-features, depending on which interpreter is used.
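The NoExp and Semparse/Patterns commands above differ only in --feat_dir, and switching datasets changes --task_name (plus --num_classes for TACRED). A small helper can assemble these argument lists. This is a convenience sketch, not part of the repository: baseline_command is a hypothetical name, and the $DATA_DIR/<task_name> layout for Disease and TACRED is an assumption extrapolated from the Spouse example.

```python
import shlex
from typing import List, Optional

# Classes per task as stated in the Setup section: TACRED needs
# --num_classes 42; Spouse (and, by the README's wording, Disease) use 2.
NUM_CLASSES = {"spouse": 2, "disease": 2, "tacred": 42}

# Feature directories accepted for the Semparse / Patterns baselines.
FEAT_DIRS = ("semparse-progexp-features", "semparse-langexp-features", "regex-features")


def baseline_command(task: str, data_dir: str, output_dir: str,
                     feat: Optional[str] = None) -> List[str]:
    """Build run.py arguments for NoExp (feat=None) or a Semparse/Patterns baseline."""
    if task not in NUM_CLASSES:
        raise ValueError(f"unknown task: {task}")
    # Assumption: each task's data lives in $DATA_DIR/<task_name>.
    cmd = ["python", "run.py", "--data_dir", f"{data_dir}/{task}", "--train",
           "--num_train_epochs", "100", "--task_name", task,
           "--classifier_type", "feature_concat", "--exp_dir", "input-features"]
    if feat is not None:
        if feat not in FEAT_DIRS:
            raise ValueError(f"feat must be one of {FEAT_DIRS}")
        cmd += ["--feat_dir", feat]
    cmd += ["--num_classes", str(NUM_CLASSES[task]),
            "--train_distributed", "0", "--dev_distributed", "0",
            "--save_model", "--output_dir", output_dir]
    return cmd


# Print the NoExp command and the three interpreter variants for TACRED.
for feat in (None,) + FEAT_DIRS:
    out = f"out/tacred-{feat or 'noexp'}"
    print(shlex.join(baseline_command("tacred", "/path/to/data", out, feat)))
```

Returning a list rather than a string keeps the arguments safe to pass directly to subprocess.run; shlex.join is used only for display.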

ExpBERT

python run.py --data_dir $DATA_DIR/spouse --train --num_train_epochs 100 --task_name spouse --classifier_type feature_concat --exp_dir expbert-features --num_classes 2 --train_distributed 10 --dev_distributed 0 --save_model --output_dir $outdir

Note that --train_distributed is set to 10 here because spouse/expbert-features contains 10 files of training features. The features are sharded this way so that they can be created in parallel.
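The sharding idea can be sketched generically: split the examples into 10 contiguous shards, featurize each shard independently, and concatenate the results in the original order. This is an illustration, not the repository's code: featurize is a hypothetical placeholder for the expensive interpreter call, and threads are used only to keep the example self-contained. In practice each shard is presumably written to its own file, possibly by a separate job.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def shard(items: Sequence[T], num_shards: int) -> List[List[T]]:
    """Split items into num_shards contiguous, nearly equal chunks (order preserved)."""
    base, extra = divmod(len(items), num_shards)
    shards, start = [], 0
    for i in range(num_shards):
        size = base + (1 if i < extra else 0)  # the first `extra` shards get one more item
        shards.append(list(items[start:start + size]))
        start += size
    return shards


def featurize_sharded(items: Sequence[T], featurize: Callable[[T], R],
                      num_shards: int = 10) -> List[R]:
    """Featurize each shard independently, then concatenate the results in order."""
    def run(chunk: List[T]) -> List[R]:
        return [featurize(x) for x in chunk]

    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        # pool.map yields results in shard order, regardless of completion order.
        per_shard = list(pool.map(run, shard(items, num_shards)))
    # Flattening shard by shard restores the original example order.
    return [r for chunk in per_shard for r in chunk]


print([len(s) for s in shard(list(range(23)), 10)])
```

Keeping shards contiguous and merging them in index order is what keeps each feature row aligned with its label.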

Feature Pipeline

We also provide a feature pipeline for producing ExpBERT features for your own dataset and explanations. First, download a BERT or SciBERT model fine-tuned on the MultiNLI dataset from here into $BERT.

Then create a config.yaml file such as the following:

interpreter:
    type: bert
    path: $BERT
    use_logits: False
paths:
    data_dir: $fictional_dataset_dir
    exp_dir: explanations
    save_dir: $fictional_dataset_dir/expbert-features
data_reader: reader
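The $BERT and $fictional_dataset_dir entries are placeholders. A standard YAML loader such as PyYAML does not expand environment variables, and the README does not say whether create_features.py does, so one option is to substitute concrete paths before writing the file. A minimal sketch using Python's string.Template; render_config and the example paths are hypothetical.

```python
from string import Template

# The config from the README, with $BERT and $fictional_dataset_dir as placeholders.
CONFIG_TEMPLATE = """\
interpreter:
    type: bert
    path: $BERT
    use_logits: False
paths:
    data_dir: $fictional_dataset_dir
    exp_dir: explanations
    save_dir: $fictional_dataset_dir/expbert-features
data_reader: reader
"""


def render_config(bert_path: str, dataset_dir: str) -> str:
    """Fill in the placeholders; Template.substitute raises KeyError if one is missing."""
    return Template(CONFIG_TEMPLATE).substitute(
        BERT=bert_path, fictional_dataset_dir=dataset_dir)


if __name__ == "__main__":
    # Redirect stdout to config.yaml, e.g. `python render_config.py > config.yaml`.
    print(render_config("/models/bert-mnli", "/data/my_dataset"), end="")
```

Using substitute rather than safe_substitute makes a forgotten placeholder fail loudly instead of leaving a literal $ in the config.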

While we provide readers for the datasets used in this paper, your dataset may require a different reader; see data_utils/readers.py for details.
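To give a sense of what a custom reader involves, here is a hypothetical reader for a JSON-lines file with one candidate entity pair per line. The field names (sentence, entity1, entity2, label) and the Example record are assumptions for illustration only; the interface create_features.py actually expects is defined in data_utils/readers.py and may differ.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Example:
    """Hypothetical record: one sentence, the two candidate entities, and a label."""
    sentence: str
    entity1: str
    entity2: str
    label: int


class JsonlRelationReader:
    """Hypothetical JSON-lines reader for a relation extraction dataset.

    Not the interface from data_utils/readers.py; it only shows the kind of
    parsing a custom reader has to do before features can be computed.
    """

    def __init__(self, label_field: str = "label"):
        self.label_field = label_field

    def read(self, lines: Iterable[str]) -> Iterator[Example]:
        for line_no, line in enumerate(lines, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            try:
                yield Example(record["sentence"], record["entity1"],
                              record["entity2"], int(record[self.label_field]))
            except KeyError as err:
                raise ValueError(f"line {line_no}: missing field {err}") from None
```

Because read accepts any iterable of strings, an open file object can be passed directly.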

Finally, run the following command to produce the features:

python create_features.py --exp_config config.yaml

Model Checkpoints

We also provide checkpoints for all models in our paper. The checkpoints, along with commands for running them, can be found here.

CodaLab

https://worksheets.codalab.org/worksheets/0x609d2d6a66194592a7f44fbb67ba9f49
