DiagnoseRE

This repository is the official implementation of the CCKS2021 paper "On Robustness and Bias Analysis of BERT-based Relation Extraction". It is released under the MIT License.

Requirements

  • To install the basic requirements:
pip install -r requirements.txt
  • opennre==0.1 is also required to run the code; it can be installed by following the instructions in the OpenNRE repository.

Datasets

Basic training & testing

To train a model on a given dataset, run train.py:

$ python train.py -h
usage: train.py [-h] --model_path MODEL_PATH [--restore] --train_path
                TRAIN_PATH --valid_path VALID_PATH --relation_path
                RELATION_PATH [--num_epochs NUM_EPOCHS]
                [--max_seq_len MAX_SEQ_LEN] [--batch_size BATCH_SIZE]
                [--metric {micro_f1,acc}]

optional arguments:
  -h, --help            show this help message and exit
  --model_path MODEL_PATH, -m MODEL_PATH
                        Full path for saving weights during training
  --restore             Whether to restore model weights from given model path
  --train_path TRAIN_PATH, -t TRAIN_PATH
                        Full path to file containing training data
  --valid_path VALID_PATH, -v VALID_PATH
                        Full path to file containing validation data
  --relation_path RELATION_PATH, -r RELATION_PATH
                        Full path to json file containing relation to index
                        dict
  --num_epochs NUM_EPOCHS, -e NUM_EPOCHS
                        Number of training epochs
  --max_seq_len MAX_SEQ_LEN, -l MAX_SEQ_LEN
                        Maximum sequence length of bert model
  --batch_size BATCH_SIZE, -b BATCH_SIZE
                        Batch size for training and testing
  --metric {micro_f1,acc}
                        Metric chosen for evaluation
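
For example, a training run might look like the following (all paths and hyper-parameter values are illustrative placeholders, not files shipped with the repository):

$ python train.py -m ckpt/wiki80_bert.pth -t data/wiki80/train.txt \
      -v data/wiki80/val.txt -r data/wiki80/rel2id.json -e 5 -b 32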

To test a model with a given dataset, run test.py:

$ python test.py -h
usage: test.py [-h] --model_path MODEL_PATH --test_path TEST_PATH
               --relation_path RELATION_PATH [--max_seq_len MAX_SEQ_LEN]
               [--batch_size BATCH_SIZE]

optional arguments:
  -h, --help            show this help message and exit
  --model_path MODEL_PATH, -m MODEL_PATH
                        Full path for saving weights during training
  --test_path TEST_PATH, -t TEST_PATH
                        Full path to file containing testing data
  --relation_path RELATION_PATH, -r RELATION_PATH
                        Full path to json file containing relation to index
                        dict
  --max_seq_len MAX_SEQ_LEN, -l MAX_SEQ_LEN
                        Maximum sequence length of bert model
  --batch_size BATCH_SIZE, -b BATCH_SIZE
                        Batch size for training and testing
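
For example (placeholder paths again):

$ python test.py -m ckpt/wiki80_bert.pth -t data/wiki80/test.txt -r data/wiki80/rel2id.json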

Q1 Random Perturbation

This part is based on CheckList, and you can run the scripts under the Q1 folder.

The spaCy model en_core_web_sm-2.3.1 should also be placed under the Q1 folder; it can be downloaded by running python -m spacy download en_core_web_sm.

Remember to modify spacy_model_path in random_perturb.py if necessary.

  • To randomly perturb a dataset, run Q1/random_perturb.py:
$ python random_perturb.py -h
usage: random_perturb.py [-h] [--input_file INPUT_FILE]
                         [--output_file OUTPUT_FILE]
                         [--locations {sent1,sent2,sent3,ent1,ent2} [{sent1,sent2,sent3,ent1,ent2} ...]]

optional arguments:
  -h, --help            show this help message and exit
  --input_file INPUT_FILE, -i INPUT_FILE
                        Input file containing dataset
  --output_file OUTPUT_FILE, -o OUTPUT_FILE
                        Output file containing perturbed dataset
  --locations {sent1,sent2,sent3,ent1,ent2} [{sent1,sent2,sent3,ent1,ent2} ...], -l {sent1,sent2,sent3,ent1,ent2} [{sent1,sent2,sent3,ent1,ent2} ...]
                        List of positions that you want to perturb
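
For example, to perturb the sent1 position and both entity positions (paths are placeholders):

$ python random_perturb.py -i data/train.txt -o data/train_perturbed.txt -l sent1 ent1 ent2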
  • To merge datasets and filter out duplicate samples (useful when merging two perturbed datasets that both keep the original samples), run merge_dataset.py:
$ python merge_dataset.py -h
usage: merge_dataset.py [-h] [--input_files INPUT_FILES [INPUT_FILES ...]]
                        [--output_file OUTPUT_FILE]

optional arguments:
  -h, --help            show this help message and exit
  --input_files INPUT_FILES [INPUT_FILES ...], -i INPUT_FILES [INPUT_FILES ...]
                        List of input files containing datasets to merge
  --output_file OUTPUT_FILE, -o OUTPUT_FILE
                        Output file containing merged dataset
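
For example (placeholder paths):

$ python merge_dataset.py -i data/train_p1.txt data/train_p2.txt -o data/train_merged.txt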
  • After data generation, you can train / test the model using train.py / test.py.

Q2 Adversarial Perturbations

This part is based on OpenAttack.

Adversarial sample generation for RE supports all the attack models in the OpenAttack library, such as PWWS, TextFooler, and HotFlip. The code also supports multi-process execution to speed up sample generation.

  • You may first filter the samples that the model predicts correctly (adversarial examples are generated from these) by running Q2/filter_correct.py:
$ python filter_correct.py -h
usage: filter_correct.py [-h] --input_file INPUT_FILE --model_path MODEL_PATH
                         --relation_path RELATION_PATH --output_file
                         OUTPUT_FILE [--max_seq_len MAX_SEQ_LEN]

optional arguments:
  -h, --help            show this help message and exit
  --input_file INPUT_FILE, -i INPUT_FILE
                        Where the input file containing full dataset is
  --model_path MODEL_PATH, -m MODEL_PATH
                        Full path for loading weights of model to attack
  --relation_path RELATION_PATH, -r RELATION_PATH
                        Full path to json file containing relation to index
                        dict
  --output_file OUTPUT_FILE, -o OUTPUT_FILE
                        Place to save samples that model predicts correctly
  --max_seq_len MAX_SEQ_LEN, -l MAX_SEQ_LEN
                        Maximum sequence length of bert model
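
For example (placeholder paths):

$ python filter_correct.py -i data/test.txt -m ckpt/bert.pth -r data/rel2id.json -o data/test_correct.txt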
  • Using the correctly predicted samples, you can generate adversarial samples with Q2/attack.py:
$ python attack.py -h
usage: attack.py [-h] --input_file INPUT_FILE --model_path MODEL_PATH
                 --relation_path RELATION_PATH [--attacker {pw,tf,hf,uat}]
                 --output_file OUTPUT_FILE [--max_seq_len MAX_SEQ_LEN]
                 [--num_jobs NUM_JOBS] [--start_index START_INDEX]

optional arguments:
  -h, --help            show this help message and exit
  --input_file INPUT_FILE, -i INPUT_FILE
                        Where the input file containing original dataset is
  --model_path MODEL_PATH, -m MODEL_PATH
                        Full path for loading weights of model to attack
  --relation_path RELATION_PATH, -r RELATION_PATH
                        Full path to json file containing relation to index
                        dict
  --attacker {pw,tf,hf,uat}, -a {pw,tf,hf,uat}
                        Name of attacker model, pw = PWWS, tf = TextFooler, hf
                        = HotFlip
  --output_file OUTPUT_FILE, -o OUTPUT_FILE
                        Where to store adverserial dataset is
  --max_seq_len MAX_SEQ_LEN, -l MAX_SEQ_LEN
                        Maximum sequence length of bert model
  --num_jobs NUM_JOBS, -j NUM_JOBS
                        Maximum number of parallel workers in attacking
  --start_index START_INDEX, -s START_INDEX
                        Index of sample to start processing, used when you
                        want to restore progress
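
For example, a PWWS attack with four parallel workers (placeholder paths):

$ python attack.py -i data/test_correct.txt -m ckpt/bert.pth -r data/rel2id.json \
      -a pw -o data/test_adv.txt -j 4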
  • With the samples generated, you can use train.py / test.py to see the effect of adversarial perturbations.
    • FYI, the evaluation data for training / testing should include the original data; you can build the full dataset by simply concatenating the normal data file and the adversarial data file (e.g., with the cat command on UNIX machines), as shown below.
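
For instance, with placeholder file names:

$ cat data/test.txt data/test_adv.txt > data/test_full.txt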

Q3 Counterfactual Masking

This part first calculates the importance of each token in a sample using the integrated gradients method, and then generates masked samples using the gradient information as token weights.

Because the tokens here refer to tokens generated by transformers.BertTokenizer (not the tokens in the original samples), the training / testing process needs to be modified. Also, since the counterfactual samples are naturally labeled as no_relation (the wiki80 dataset is therefore excluded from this part), the metric (originally micro-F1) has to be adjusted. Use train_cf.py and test_cf.py here for training and testing.
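
For intuition, below is a minimal sketch of token attribution with integrated gradients over BERT input embeddings. It is an illustration only, not the repo's exact code, and it assumes a HuggingFace-style classification model that accepts inputs_embeds and returns an output with .logits:

import torch

def integrated_gradients(model, embeds, target, steps=20):
    # embeds: (1, seq_len, hidden) input embeddings of one tokenized sample
    baseline = torch.zeros_like(embeds)      # all-zero baseline embedding
    total_grads = torch.zeros_like(embeds)
    for k in range(1, steps + 1):
        # Interpolate between the baseline and the real embeddings
        point = (baseline + (k / steps) * (embeds - baseline)).detach()
        point.requires_grad_(True)
        logits = model(inputs_embeds=point).logits
        logits[0, target].backward()         # gradient of the target-class logit
        total_grads += point.grad
    # Average gradient times input difference, summed over the hidden dim,
    # yields one importance score per token
    return ((embeds - baseline) * total_grads / steps).sum(-1)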

Let's use the scripts step by step:

  • To generate a file containing tokens with gradient information, run Q3/integrated_gradient.py:
$ python integrated_gradient.py -h
usage: integrated_gradient.py [-h] --input_file INPUT_FILE --model_path
                              MODEL_PATH --relation_path RELATION_PATH
                              [--output_file OUTPUT_FILE]
                              [--max_seq_len MAX_SEQ_LEN]

optional arguments:
  -h, --help            show this help message and exit
  --input_file INPUT_FILE, -i INPUT_FILE
                        Where the input file containing original dataset is
  --model_path MODEL_PATH, -m MODEL_PATH
                        Full path for loading weights of model to attack
  --relation_path RELATION_PATH, -r RELATION_PATH
                        Full path to json file containing relation to index
                        dict
  --output_file OUTPUT_FILE, -o OUTPUT_FILE
                        Where to store token information with gradient
  --max_seq_len MAX_SEQ_LEN, -l MAX_SEQ_LEN
                        Maximum sequence length of bert model
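
For example (placeholder paths):

$ python integrated_gradient.py -i data/train.txt -m ckpt/bert.pth -r data/rel2id.json -o data/train_grad.txt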
  • To generate counterfactual samples based on the gradient information, run Q3/counterfactual.py:
$ python counterfactual.py -h
usage: counterfactual.py [-h] --grad_file GRAD_FILE --input_file INPUT_FILE
                         --output_file OUTPUT_FILE [--keep_origin]
                         [--num_tokens NUM_TOKENS] [--threshold THRESHOLD]

optional arguments:
  -h, --help            show this help message and exit
  --grad_file GRAD_FILE, -g GRAD_FILE
                        Where the input file containing tokens with gradient
                        is
  --input_file INPUT_FILE, -i INPUT_FILE
                        Where the input file containing original dataset is
  --output_file OUTPUT_FILE, -o OUTPUT_FILE
                        Where to store token information with gradient
  --keep_origin, -k     Whether to keep original samples(as tokenized samples)
  --num_tokens NUM_TOKENS, -n NUM_TOKENS
                        Maximum number of tokens masked(removed) in counter-
                        factual samples
  --threshold THRESHOLD, -t THRESHOLD
                        Lower bound of key words, i.e. the minimum required
                        weight of being a key word
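
For example, masking at most three key tokens per sample while keeping the original samples (placeholder paths and values):

$ python counterfactual.py -g data/train_grad.txt -i data/train.txt -o data/train_cf.txt -k -n 3 -t 0.1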
  • To perform training / testing on counterfactual datasets, run Q3/train_cf.py and Q3/test_cf.py just as you would use train.py and test.py:
$ python train_cf.py -h
usage: train_cf.py [-h] --model_path MODEL_PATH [--restore] --train_path
                   TRAIN_PATH --valid_path VALID_PATH --relation_path
                   RELATION_PATH [--num_epochs NUM_EPOCHS]
                   [--max_seq_len MAX_SEQ_LEN] [--batch_size BATCH_SIZE]
                   [--metric {micro_f1,acc}]

optional arguments: ...  # Details omitted

$ python test_cf.py -h
usage: test_cf.py [-h] --model_path MODEL_PATH --test_path TEST_PATH
                  --relation_path RELATION_PATH [--max_seq_len MAX_SEQ_LEN]
                  [--batch_size BATCH_SIZE]

optional arguments: ...  # Details omitted
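
For example (placeholder paths; the flags mirror train.py / test.py):

$ python train_cf.py -m ckpt/bert_cf.pth -t data/train_cf.txt -v data/val_cf.txt -r data/rel2id.json
$ python test_cf.py -m ckpt/bert_cf.pth -t data/test_cf.txt -r data/rel2id.json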
  • Q3/fix_length.py is a small utility script that aligns a given tokenized dataset to a fixed length:
$ python fix_length.py -h
usage: fix_length.py [-h] --input_file INPUT_FILE --output_file OUTPUT_FILE
                     [--max_seq_len MAX_SEQ_LEN]

optional arguments:
  -h, --help            show this help message and exit
  --input_file INPUT_FILE, -i INPUT_FILE
                        File containing tokenized dataset
  --output_file OUTPUT_FILE, -o OUTPUT_FILE
                        File to output fixed dataset
  --max_seq_len MAX_SEQ_LEN, -l MAX_SEQ_LEN
                        Number of tokens in a sample
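
For example (placeholder paths):

$ python fix_length.py -i data/train_cf.txt -o data/train_cf_fixed.txt -l 128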

Q4 Debiased Masking

This part is designed to find tokens (words) that are semantically unrelated to a relation yet co-occur with it frequently.

You can generate rules automatically (i.e., words that come with a relation but have little to do with it semantically) using Q4/generate_rules.py, which is based on the TF-IDF algorithm and finds the highest-scoring tokens for each relation; a rough sketch of the idea is given below.

After that, you may need to refine the rules manually to avoid mistakes: sometimes the chosen words are related to the relation's meaning (like father/mother for the parent relation), and masking those tokens would change the meaning of the sentences.

With the rules derived above, you can generate debiased samples with Q4/mask_by_rules.py.
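
As a rough illustration of the rule-mining idea (not the repo's exact implementation; the input format here is hypothetical), one can treat all sentences of a relation as one document and take its top TF-IDF tokens:

from sklearn.feature_extraction.text import TfidfVectorizer

def mine_rules(samples, num_rules=10):
    # samples: list of (sentence, relation) pairs; a hypothetical format
    docs = {}
    for sent, rel in samples:
        docs.setdefault(rel, []).append(sent)
    relations = list(docs)
    # One "document" per relation: all of its sentences concatenated
    corpus = [" ".join(docs[r]) for r in relations]
    tfidf = TfidfVectorizer()
    scores = tfidf.fit_transform(corpus)           # (num_relations, vocab_size)
    vocab = tfidf.get_feature_names_out()
    rules = {}
    for row, rel in enumerate(relations):
        weights = scores[row].toarray().ravel()
        top = weights.argsort()[::-1][:num_rules]  # highest-scoring tokens
        rules[rel] = [vocab[i] for i in top]
    return rules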

  • Use Q4/generate_rules.py to automatically generate rules (remember to check the rules manually before generating masked samples):
$ python generate_rules.py -h
usage: generate_rules.py [-h] --input_file INPUT_FILE [INPUT_FILE ...]
                         --output_file OUTPUT_FILE [--num_rules NUM_RULES]

optional arguments:
  -h, --help            show this help message and exit
  --input_file INPUT_FILE [INPUT_FILE ...], -i INPUT_FILE [INPUT_FILE ...]
                        Where the input file containing dataset is
  --output_file OUTPUT_FILE, -o OUTPUT_FILE
                        Where to store token information with gradient
  --num_rules NUM_RULES, -n NUM_RULES
                        Maximum number of key words chosen as rules for each
                        relation
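
For example, keeping at most ten rule words per relation (placeholder paths):

$ python generate_rules.py -i data/train.txt -o rules.json -n 10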
  • Use Q4/mask_by_rules.py to generate masked samples:
$ python mask_by_rules.py -h
usage: mask_by_rules.py [-h] --input_file INPUT_FILE --rule_file RULE_FILE
                        --output_file OUTPUT_FILE

optional arguments:
  -h, --help            show this help message and exit
  --input_file INPUT_FILE, -i INPUT_FILE
                        Where the input file containing dataset is
  --rule_file RULE_FILE, -r RULE_FILE
                        Where the json file containing substitution rule_dict
                        is
  --output_file OUTPUT_FILE, -o OUTPUT_FILE
                        Where to store token information with gradient
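
For example (placeholder paths):

$ python mask_by_rules.py -i data/train.txt -r rules.json -o data/train_debiased.txt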
  • After generating the rules and the masked test samples, incorporate weighted sampling into training using Q4/train_weighted.py:
$ python train_weighted.py -h
usage: train_weighted.py [-h] --model_path MODEL_PATH [--restore] --train_path
                         TRAIN_PATH --valid_path VALID_PATH --relation_path
                         RELATION_PATH --rule_path RULE_PATH
                         [--num_epochs NUM_EPOCHS] [--max_seq_len MAX_SEQ_LEN]
                         [--batch_size BATCH_SIZE] [--metric {micro_f1,acc}]

optional arguments:
  -h, --help            show this help message and exit
  --model_path MODEL_PATH, -m MODEL_PATH
                        Full path for saving weights during training
  --restore             Whether to restore model weights from given model path
  --train_path TRAIN_PATH, -t TRAIN_PATH
                        Full path to file containing training data
  --valid_path VALID_PATH, -v VALID_PATH
                        Full path to file containing validation data
  --relation_path RELATION_PATH, -r RELATION_PATH
                        Full path to json file containing relation to index
                        dict
  --rule_path RULE_PATH, --rule RULE_PATH
                        Path to file containing rules for generating weights
                        of training samples
  --num_epochs NUM_EPOCHS, -e NUM_EPOCHS
                        Number of training epochs
  --max_seq_len MAX_SEQ_LEN, -l MAX_SEQ_LEN
                        Maximum sequence length of bert model
  --batch_size BATCH_SIZE, -b BATCH_SIZE
                        Batch size for training and testing
  --metric {micro_f1,acc}
                        Metric chosen for evaluation
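
For example (placeholder paths):

$ python train_weighted.py -m ckpt/bert_w.pth -t data/train.txt -v data/val.txt \
      -r data/rel2id.json --rule rules.json -e 5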

Q5 Semantic Bias

This part tests whether the model relies on stereotypes in the data or on spurious links between certain entities and relations. The experiment is inspired by Han et al. (2020), with some differences in our OE (only-entity) and ME (mask-entity) settings. We also construct a debiased dataset to test the model's ability to learn features from the context rather than just from entity names.

  • The only-entity (OE) setting tests the model when it is given only the entity names; the dataset can be generated using Q5/only_entity.py:
$ python only_entity.py -h
usage: only_entity.py [-h] --input_file INPUT_FILE [--output_file OUTPUT_FILE]

optional arguments:
  -h, --help            show this help message and exit
  --input_file INPUT_FILE, -i INPUT_FILE
                        Where the input file containing original dataset is
  --output_file OUTPUT_FILE, -o OUTPUT_FILE
                        Where output moified dataset
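
For example (placeholder paths):

$ python only_entity.py -i data/test.txt -o data/test_oe.txt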
  • The random mask-entity (ME) setting tests the model when it is given only the context (entities removed, but positional tokens kept); you can generate the masked dataset using Q5/mask_random.py:
$ python mask_random.py -h
usage: mask_random.py [-h] --input_file INPUT_FILE [--output_file OUTPUT_FILE]
                      [--keep_origin] [--ratio RATIO]

optional arguments:
  -h, --help            show this help message and exit
  --input_file INPUT_FILE, -i INPUT_FILE
                        Where the input file containing original dataset is
  --output_file OUTPUT_FILE, -o OUTPUT_FILE
                        Where output moified dataset
  --keep_origin, -k     Whether to keep original samples in output dataset
  --ratio RATIO, -r RATIO
                        Ratio of random masking
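
For example, masking entities in half of the samples while keeping the originals (placeholder paths and ratio):

$ python mask_random.py -i data/test.txt -o data/test_me.txt -k -r 0.5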
  • The frequency-based mask-entity setting tests how entity frequency in the samples affects the model's relation classification ability. The dataset is generated with Q5/mask_frequency.py:
$ python mask_frequency.py -h
usage: mask_frequency.py [-h] --input_file INPUT_FILE
                         [--output_file OUTPUT_FILE] [--keep_origin]

optional arguments:
  -h, --help            show this help message and exit
  --input_file INPUT_FILE, -i INPUT_FILE
                        Where the input file containing original dataset is
  --output_file OUTPUT_FILE, -o OUTPUT_FILE
                        Where output moified dataset
  --keep_origin, -k     Whether to keep original samples in output dataset
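
For example (placeholder paths):

$ python mask_frequency.py -i data/test.txt -o data/test_mf.txt -k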
  • Build the debiased test set by selecting samples whose only-entity version the BERT model fails to predict correctly, using Q5/select_debiased.py:
$ python select_debiased.py -h
usage: select_debiased.py [-h] --input_file INPUT_FILE --original_data ORIGIN
                          --output_file OUTPUT_FILE --model_path MODEL_PATH
                          --relation_path RELATION_PATH
                          [--max_seq_len MAX_SEQ_LEN]

optional arguments:
  -h, --help            show this help message and exit
  --input_file INPUT_FILE, -i INPUT_FILE
                        Where the input file containing original dataset is
  --original_data ORIGIN, --origin ORIGIN
                        Where the input file containing original dataset is
  --output_file OUTPUT_FILE, -o OUTPUT_FILE
                        Where to store token information with gradient
  --model_path MODEL_PATH, -m MODEL_PATH
                        Where the model for wrong prediction is
  --relation_path RELATION_PATH, -r RELATION_PATH
                        Full path to json file containing relation to index
                        dict
  --max_seq_len MAX_SEQ_LEN, -l MAX_SEQ_LEN
                        Maximum sequence length of bert model
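
For example (placeholder paths):

$ python select_debiased.py -i data/test_oe.txt --origin data/test.txt -o data/test_debiased.txt \
      -m ckpt/bert.pth -r data/rel2id.json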
  • After data generation, you can run train.py or test.py to perform training / testing.

How to Cite

@inproceedings{li2021robustness,
  title={On Robustness and Bias Analysis of BERT-Based Relation Extraction},
  author={Li, Luoqiu and Chen, Xiang and Ye, Hongbin and Bi, Zhen and Deng, Shumin and Zhang, Ningyu and Chen, Huajun},
  booktitle={China Conference on Knowledge Graph and Semantic Computing},
  pages={43--59},
  year={2021},
  organization={Springer}
}