utahnlp / consistency

License: Apache-2.0


Implementation of the NLI model in our EMNLP 2019 paper: A Logic-Driven Framework for Consistency of Neural Models

@inproceedings{li2019consistency,
  author    = {Li, Tao and Gupta, Vivek and Mehta, Maitrey and Srikumar, Vivek},
  title     = {A Logic-Driven Framework for Consistency of Neural Models},
  booktitle = {Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing},
  year      = {2019}
}

Heads-up

To pick up recent fixes in this repo and updates to pytorch/huggingface/apex, try the post-camera-ready branch.
For exact reproducibility, stick to this branch.


0. Prerequisites

[Hardware] All of our BERT models are based on the BERT base version. The batch size, sequence length, and data format are configured to run smoothly on a CUDA device with 8GB of memory.
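
For a quick check that the visible device actually has the ~8GB these configurations assume, something like the following works (a convenience snippet, not part of the repo):

import torch

# Report the name and total memory of the CUDA device the scripts will use.
props = torch.cuda.get_device_properties(0)
print('%s: %.1f GB' % (props.name, props.total_memory / 1024**3))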

Have the following installed:

python 3.6+
NVCC compiler 10.0
pytorch 1.0
h5py
numpy
spacy 2.0.11 (with en model)
nvidia apex
pytorch BERT by huggingface (https://github.com/huggingface/pytorch-pretrained-BERT)
	(download and put it in ../pytorch-pretrained-BERT; it does not need to be installed -- see the sketch after this list)
	(However, for exact reproducibility, use the pytorch-pretrained-BERT.zip in this repo.)
glove.840B.300d.txt (under ./data/)
	(We don't actually use it, but it is needed for preprocessing due to an old design.)
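
Since the huggingface code is used from a sibling directory rather than installed, one way to make it importable is to put that directory on the Python path. A minimal sketch, assuming the layout above (the actual scripts may wire this up differently):

import sys

# Point Python at the local, uninstalled copy of pytorch-pretrained-BERT.
sys.path.insert(0, '../pytorch-pretrained-BERT')

# Sanity-check the import and the tokenizer download.
from pytorch_pretrained_bert import BertTokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')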

[SNLI] Besides the above, make sure the snli_1.0 data is unpacked to ./data/bert_nli/, e.g. ./data/bert_nli/snli_1.0_train.txt.

[MNLI] Likewise, have the mnli_1.0 data unpacked to ./data/bert_nli/. We use mnli_dev_matched for validation and mnli_dev_mismatched for testing; for example, the validation file should be at ./data/bert_nli/multinli_1.0_dev_matched.txt.

[MSCOCO] Unpack the mscoco sample data via unzip ./data/bert_nli/mscoco.zip. The zip file contains a training split (e.g. mscoco.raw.sent1.txt) with 400k sentence triples and a test split (e.g. mscoco.test.raw.sent1.txt) with 100k sentence triples. In practice, our paper sampled 100k (i.e. 25%) from the training split and used all examples in the test split.

1. Preprocessing

[SNLI] Preprocessing of SNLI is separated into the following steps.

python3 snli_extract.py --data ./data/bert_nli/snli_1.0_train.txt --output ./data/bert_nli/train
python3 snli_extract.py --data ./data/bert_nli/snli_1.0_dev.txt --output ./data/bert_nli/val
python3 snli_extract.py --data ./data/bert_nli/snli_1.0_test.txt --output ./data/bert_nli/test

python3 preprocess.py --glove ./data/glove.840B.300d.txt --batch_size 48 --dir ./data/bert_nli/ --output snli --tokenizer_output snli
python3 get_char_idx.py --dict ./data/bert_nli/snli.allword.dict --token_l 16 --freq 5 --output char

NOTE: for exact reproducibility, we use the dev_excl_anno.raw.sent*.txt files for actual SNLI validation. These files are already included in the ./data/bert_nli/ directory and are implicitly used by the above scripts. The difference is that we reserved 1000 dev examples for preliminary manual analysis and later excluded them from experiments to avoid contamination.

[MNLI] Preprocessing of the MNLI dataset:

python3 mnli_extract.py --data ./data/bert_nli/multinli_1.0_dev_mismatched.txt --output ./data/bert_nli/mnli.test
python3 mnli_extract.py --data ./data/bert_nli/multinli_1.0_train.txt --output ./data/bert_nli/mnli.train
python3 mnli_extract.py --data ./data/bert_nli/multinli_1.0_dev_matched.txt --output ./data/bert_nli/mnli.dev

python3 preprocess.py --glove ./data/glove.840B.300d.txt --batch_size 36 --dir ./data/bert_nli/ \
	--sent1 mnli.train.raw.sent1.txt --sent2 mnli.train.raw.sent2.txt --label mnli.train.label.txt \
	--sent1_val mnli.dev.raw.sent1.txt --sent2_val mnli.dev.raw.sent2.txt --label_val mnli.dev.label.txt \
	--sent1_test mnli.test.raw.sent1.txt --sent2_test mnli.test.raw.sent2.txt --label_test mnli.test.label.txt \
	--tokenizer_output mnli --output mnli --max_seq_l 500

[MSCOCO] Preprocessing of the mscoco dataset:

python3 extra_preprocess.py --glove ./data/glove.840B.300d.txt --batch_size 48 --dir ./data/bert_nli/ --sent1 mscoco.raw.sent1.txt --sent2 mscoco.raw.sent2.txt --sent3 mscoco.raw.sent3.txt --tokenizer_output mscoco --output mscoco
python3 extra_preprocess.py --glove ./data/glove.840B.300d.txt --batch_size 48 --dir ./data/bert_nli/ --sent1 mscoco.test.raw.sent1.txt --sent2 mscoco.test.raw.sent2.txt --sent3 mscoco.test.raw.sent3.txt --tokenizer_output mscoco.test --output mscoco.test

2. BERT Baseline

[Finetuning once] on both SNLI and MNLI

mkdir models

GPUID=[GPUID]
LR=0.00003
PERC=1
for SEED in `seq 1 3`; do
	CUDA_VISIBLE_DEVICES=$GPUID python3 -u train.py --gpuid 0 --bert_gpuid 0 --dir ./data/bert_nli/ \
	--train_data mnli.train.hdf5 --val_data mnli.val.hdf5 --extra_train_data snli.train.hdf5 --extra_val_data snli.val.hdf5 \
	--learning_rate $LR --epochs 3 --warmup_epoch 3 \
	--enc bert --cls linear --hidden_size 768 --percent $PERC --dropout 0.0 \
	--fix_bert 0 --optim adam_fp16 --fp16 1 --seed ${SEED} \
	--save_file models/scratch_mnli_snli_perc${PERC//.}_seed${SEED} | tee models/scratch_mnli_snli_perc${PERC//.}_seed${SEED}.txt
done

Change [GPUID] to the desired device id; PERC specifies the percentage of training data to use (1 means 100%). The above script trains BERT baselines with three different random seeds (i.e. three runs in a row). Expect to see exactly the same accuracies as reported in our paper.

We also disable dropout in the final linear layer; however, BERT applies its default dropout of 0.1 internally during training.

[Finetuning twice] on both SNLI and MNLI

GPUID=[GPUID]
LR=0.00001
PERC=1
for SEED in `seq 1 3`; do
CUDA_VISIBLE_DEVICES=$GPUID python3 -u train.py --gpuid 0 --bert_gpuid 0 --dir ./data/bert_nli/ \
	--train_data mnli.train.hdf5 --val_data mnli.val.hdf5 --extra_train_data snli.train.hdf5 --extra_val_data snli.val.hdf5 \
	--learning_rate $LR --epochs 3 --warmup_epoch 3 \
	--enc bert --cls linear --hidden_size 768 --percent $PERC --dropout 0.0 \
	--fix_bert 0 --optim adam_fp16 --fp16 1 --seed ${SEED} \
	--load_file models/scratch_mnli_snli_perc${PERC//.}_seed${SEED} \
	--save_file models/twice_scratch_mnli_snli_perc${PERC//.}_seed${SEED} | tee models/twice_scratch_mnli_snli_perc${PERC//.}_seed${SEED}.txt
done

This loads the previously finetuned model and continues finetuning with a lowered learning rate. Expect to see exactly the same accuracies as reported in our paper.

[Evaluation] on SNLI test set

GPUID=[GPUID]
PERC=1
SEED=[SEED]
CUDA_VISIBLE_DEVICES=$GPUID python3 -u eval.py --gpuid 0 --bert_gpuid 0 --dir data/bert_nli/ --data snli.test.hdf5 \
--enc bert --cls linear --hidden_size 768 --fp16 1 --dropout 0.0 \
--load_file models/scratch_mnli_snli_perc${PERC//.}_seed${SEED} | tee models/scratch_mnli_snli_perc${PERC//.}_seed${SEED}.evallog.txt

For MNLI, use --data mnli.test.hdf5.

[Evaluation] on mirror consistency

GPUID=[GPUID]
PERC=1
for SWAP_SENT in 0 1; do
for SEED in `seq 1 3`; do
CUDA_VISIBLE_DEVICES=$GPUID python3 -u eval.py --gpuid 0 --bert_gpuid 0 --dir data/bert_nli/ --data mscoco.test.hdf5 \
	--enc bert --cls linear --hidden_size 768 --fp16 1 --dropout 0.0 --swap_sent $SWAP_SENT \
	--pred_output models/twice_scratch_mnli_snli_perc${PERC//.}_seed${SEED}_swap${SWAP_SENT} \
	--load_file models/twice_scratch_mnli_snli_perc${PERC//.}_seed${SEED} | tee models/twice_scratch_mnli_snli_perc${PERC//.}_seed${SEED}.evallog.txt
done
done
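
Mirror consistency checks the symmetry constraint from the paper: contradiction predicted for (P, H) should imply contradiction for (H, P), and vice versa. confusion_table.py computes the reported numbers; the sketch below only illustrates the check, assuming (hypothetically) that each --pred_output file holds one predicted label per line -- check eval.py for the actual I/O.

# Illustrative only: mirror (symmetry) violation rate from two prediction
# files, one per swap direction. File names and the one-label-per-line
# format are assumptions, not the repo's actual output format.

def load_preds(path):
    with open(path) as f:
        return [line.strip() for line in f]

orig = load_preds('preds_swap0.txt')     # hypothetical file name
swapped = load_preds('preds_swap1.txt')  # hypothetical file name

violations = sum((a == 'contradiction') != (b == 'contradiction')
                 for a, b in zip(orig, swapped))
print('mirror violation rate: %.4f' % (violations / len(orig)))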

[Evaluation] on transitivity consistency

GPUID=[GPUID]
PERC=1
for PAIR in alpha beta gamma; do
for SEED in `seq 1 3`; do
CUDA_VISIBLE_DEVICES=$GPUID python3 -u eval.py --gpuid 0 --bert_gpuid 0 --dir data/bert_nli/ --data mscoco.test.hdf5 \
	--enc bert --cls linear --hidden_size 768 --fp16 1 --dropout 0.0 --data_triple_mode 1 --sent_pair $PAIR --swap_sent 0 \
	--pred_output models/twice_scratch_mnli_snli_perc${PERC//.}_seed${SEED}_${PAIR} \
	--load_file models/twice_scratch_mnli_snli_perc${PERC//.}_seed${SEED} | tee models/twice_scratch_mnli_snli_perc${PERC//.}_seed${SEED}.evallog.txt
done
done
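
Transitivity consistency checks rules across the three pairs (alpha, beta, gamma) of each triple; one such rule is that entailment on two chained pairs should imply entailment on the third (see the paper for the exact rule set). triple_confusion.py computes the reported numbers; the sketch below illustrates a single rule under the same hypothetical one-label-per-line format as above.

# Illustrative only: violation rate of the E ^ E -> E rule over the alpha,
# beta, gamma prediction files. File names and label strings are assumptions.

def load_preds(path):
    with open(path) as f:
        return [line.strip() for line in f]

alpha = load_preds('preds_alpha.txt')
beta = load_preds('preds_beta.txt')
gamma = load_preds('preds_gamma.txt')

fired = violated = 0
for a, b, c in zip(alpha, beta, gamma):
    if a == 'entailment' and b == 'entailment':
        fired += 1
        violated += (c != 'entailment')
print('E^E->E violation rate: %.4f' % (violated / max(fired, 1)))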

3. BERT+M

GPUID=[GPUID]
LR=0.00001
CONSTR=6
PERC=1
LAMBD=1
for SEED in `seq 1 3`; do
	CUDA_VISIBLE_DEVICES=$GPUID python3 -u train.py --gpuid 0 --bert_gpuid 0 --dir ./data/bert_nli/ \
	--train_data mnli.train.hdf5 --val_data mnli.val.hdf5 --extra_train_data snli.train.hdf5 --extra_val_data snli.val.hdf5 \
	--learning_rate $LR --epochs 3 --warmup_epoch 3 \
	--loss transition --fwd_mode flip --lambd ${LAMBD} \
	--enc bert --cls linear --hidden_size 768 --percent $PERC --dropout 0.0 --constr ${CONSTR} \
	--fix_bert 0 --optim adam_fp16 --fp16 1 --seed ${SEED} \
	--load_file models/scratch_mnli_snli_perc${PERC//.}_seed${SEED} \
	--save_file models/both_flip${CONSTR//,}_lr${LR//.}_lambd${LAMBD//.}_perc${PERC//.}_seed${SEED} | tee models/both_flip${CONSTR//,}_lr${LR//.}_lambd${LAMBD//.}_perc${PERC//.}_seed${SEED}.txt
done

Do change PERC and LAMBD accordingly.
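
For intuition about the loss being added here: relaxing the symmetry constraint between C(P, H) and C(H, P) with the product t-norm, as in the paper, penalizes the gap between the two contradiction log-probabilities. A hedged PyTorch sketch of such a term (not the exact code in train.py):

import torch

def mirror_loss(log_p, log_p_swapped, c_idx=2):
    # Symmetry penalty: |log p_C(P,H) - log p_C(H,P)|, averaged over the
    # batch. c_idx is the contradiction label index -- an assumption; check
    # the repo's label mapping. Inputs: (batch, num_labels) log-probabilities.
    return (log_p[:, c_idx] - log_p_swapped[:, c_idx]).abs().mean()

# e.g. total = cross_entropy + LAMBD * mirror_loss(lp, lp_swap)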

[Evaluation] on mirror consistency

GPUID=[GPUID]
LR=0.00001
CONSTR=6
PERC=0.2
LAMBD=1
for SWAP_SENT in 0 1; do
for SEED in `seq 1 3`; do
	CUDA_VISIBLE_DEVICES=$GPUID python3 -u eval.py --gpuid 0 --bert_gpuid 0 --dir ./data/bert_nli/ --data mscoco.test.hdf5 \
	--enc bert --cls linear --dropout 0.0 --hidden_size 768 --fp16 1 --data_triple_mode 0 --swap_sent $SWAP_SENT \
	--pred_output models/both_flip${CONSTR//,}_lr${LR//.}_lambd${LAMBD//.}_perc${PERC//.}_seed${SEED}_swap${SWAP_SENT} \
	--load_file models/both_flip${CONSTR//,}_lr${LR//.}_lambd${LAMBD//.}_perc${PERC//.}_seed${SEED} | tee models/both_flip${CONSTR//,}_lr${LR//.}_lambd${LAMBD//.}_perc${PERC//.}_seed${SEED}.triplelog.txt
done
done

python3 confusion_table.py --log both_flip${CONSTR//,}_lr${LR//.}_lambd${LAMBD//.}_perc${PERC//.}

[Evaluation] on transitivity consistency

GPUID=[GPUID]
LR=0.00001
CONSTR=6
PERC=0.2
LAMBD=1
for PAIR in alpha beta gamma; do
for SEED in `seq 1 3`; do
	CUDA_VISIBLE_DEVICES=$GPUID python3 -u eval.py --gpuid 0 --bert_gpuid 0 --dir ./data/bert_nli/ --data mscoco.test.hdf5 \
	--enc bert --cls linear --dropout 0.0 --hidden_size 768 --fp16 1 --data_triple_mode 1 --sent_pair $PAIR \
	--pred_output models/both_flip${CONSTR//,}_lr${LR//.}_lambd${LAMBD//.}_perc${PERC//.}_seed${SEED}_${PAIR} \
	--load_file models/both_flip${CONSTR//,}_lr${LR//.}_lambd${LAMBD//.}_perc${PERC//.}_seed${SEED} | tee models/both_flip${CONSTR//,}_lr${LR//.}_lambd${LAMBD//.}_perc${PERC//.}_seed${SEED}.triplelog.txt
done
done

for SEED in `seq 1 3`; do
	python3 triple_confusion.py --log both_flip${CONSTR//,}_lr${LR//.}_lambd${LAMBD//.}_perc${PERC//.} --seed $SEED
done

4. BERT+M,U

GPUID=[GPUID]
PERC=0.01
PERC_U=0.25
CONSTR=6
LR=0.000005
LAMBD=1
LAMBD_P=0.001
for SEED in `seq 1 3`; do
CUDA_VISIBLE_DEVICES=$GPUID python3 -u train.py --gpuid 0 --bert_gpuid 0 --dir ./data/bert_nli/ \
	--train_data mnli.train.hdf5 --val_data mnli.val.hdf5 --extra_train_data snli.train.hdf5 --extra_val_data snli.val.hdf5 \
	--unlabeled_data mscoco.hdf5 --unlabeled_triple_mode 0 \
	--loss transition --fwd_mode flip_and_unlabeled --lambd ${LAMBD} \
	--learning_rate $LR --epochs 3 --warmup_epoch 3 --dropout 0.0 --constr ${CONSTR} \
	--enc bert --cls linear --hidden_size 768 --percent $PERC --unlabeled_perc ${PERC_U} --lambd_p $LAMBD_P \
	--fix_bert 0 --optim adam_fp16 --fp16 1 --seed ${SEED} \
	--load_file models/scratch_mnli_snli_perc${PERC//.}_seed${SEED} \
	--save_file models/both_mscoco_flip${CONSTR//,}_lr${LR//.}_lambd${LAMBD//.}_${LAMBD_P//.}_perc${PERC//.}_${PERC_U//.}_seed${SEED} | tee models/both_mscoco_flip${CONSTR//,}_lr${LR//.}_lambd${LAMBD//.}_${LAMBD_P//.}_perc${PERC//.}_${PERC_U//.}_seed${SEED}.txt
done

Here we set PERC_U=0.25 to sample about 100k unlabeled instance pairs (U) for training.
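
As a sanity check on the sizes: 25% of the 400k mscoco training triples is the ~100k used in the paper. Conceptually the subsampling amounts to the following (train.py does its own subsetting; this is not its code):

import random

# Conceptual illustration of --unlabeled_perc 0.25 over ~400k triples.
random.seed(1)
n_triples = 400000
subset = random.sample(range(n_triples), int(0.25 * n_triples))
print(len(subset))  # 100000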

Do change PERC, LAMBD, and LAMBD_P accordingly. For evaluation, construct the evaluation scripts as above.

5. BERT+M,U,T

GPUID=[GPUID]
PERC=0.01
PERC_U=0.25
CONSTR=1,2,3,4,6
LR=0.000005
LAMBD=1
LAMBD_P=0.00001
LAMBD_T=0.000001
for SEED in `seq 3 3`; do
CUDA_VISIBLE_DEVICES=$GPUID python3 -u train.py --gpuid 0 --bert_gpuid 0 --dir ./data/bert_nli/ \
	--train_data mnli.train.hdf5 --val_data mnli.val.hdf5 --extra_train_data snli.train.hdf5 --extra_val_data snli.val.hdf5 \
	--unlabeled_data mscoco.hdf5 --unlabeled_triple_mode 1 \
	--loss transition --fwd_mode flip_and_triple --fix_bert 0 --optim adam_fp16 --fp16 1 --weight_decay 1 \
	--learning_rate $LR --epochs 3 --warmup_epoch 3 --dropout 0.0 --constr ${CONSTR} \
	--enc bert --cls linear --hidden_size 768 --percent $PERC --unlabeled_perc ${PERC_U} --lambd ${LAMBD} --lambd_p $LAMBD_P --lambd_t $LAMBD_T \
	--seed ${SEED} \
	--load_file models/scratch_mnli_snli_perc${PERC//.}_seed${SEED} \
	--save_file models/both_mscoco_flip_triple${CONSTR//,}_lr${LR//.}_lambd${LAMBD//.}_${LAMBD_P//.}_${LAMBD_T//.}_perc${PERC//.}_${PERC_U//.}_seed${SEED} | tee models/both_mscoco_flip_triple${CONSTR//,}_lr${LR//.}_lambd${LAMBD//.}_${LAMBD_P//.}_${LAMBD_T//.}_perc${PERC//.}_${PERC_U//.}_seed${SEED}.txt
done

Here we set PERC_U=0.25 to sample about 100k unlabeled instance triples (T) for training.

Do change PERC, LAMBD, LAMBD_P, and LAMBD_T accordingly. For evaluation, construct the evaluation scripts as above.

Hyperparameters

Please refer to the appendices of our paper for the hyperparameter details. The values of --learning_rate, --lambd, --lambd_p, and --lambd_t vary with the data percentages --percent and --unlabeled_perc.

Issues & To-dos

  • Sanity check