
HLTCHKUST / MulQG

License: MIT
Multi-hop Question Generation with Graph Convolutional Network

Programming Languages

python — 139,335 projects (#7 most used programming language)
shell — 77,523 projects

Projects that are alternatives of or similar to MulQG

RandomX OpenCL
RandomX OpenCL implementation
Stars: ✭ 26 (+30%)
Mutual labels:  gcn
MLH-Quizzet
This is a smart Quiz Generator that generates a dynamic quiz from any uploaded text/PDF document using NLP. This can be used for self-analysis, question paper generation, and evaluation, thus reducing human effort.
Stars: ✭ 23 (+15%)
Mutual labels:  question-generation
st-gcn-sl
Spatial Temporal Graph Convolutional Networks for Sign Language (ST-GCN-SL) Recognition
Stars: ✭ 18 (-10%)
Mutual labels:  gcn
Representation Learning on Graphs with Jumping Knowledge Networks
Representation Learning on Graphs with Jumping Knowledge Networks
Stars: ✭ 31 (+55%)
Mutual labels:  gcn
Tianchi2020ChineseMedicineQuestionGeneration
2020 Alibaba Cloud Tianchi Big Data Competition: Traditional Chinese Medicine Literature Question Generation Challenge
Stars: ✭ 20 (+0%)
Mutual labels:  question-generation
RL-based-Graph2Seq-for-NQG
Code & data accompanying the ICLR 2020 paper "Reinforcement Learning Based Graph-to-Sequence Model for Natural Question Generation"
Stars: ✭ 104 (+420%)
Mutual labels:  question-generation
GNN-Recommender-Systems
An index of recommendation algorithms that are based on Graph Neural Networks.
Stars: ✭ 505 (+2425%)
Mutual labels:  gcn
A- Guide -to Data Sciecne from mathematics
It is a blueprint for data science, from mathematics to algorithms. It is not yet complete.
Stars: ✭ 25 (+25%)
Mutual labels:  gcn
Zero-shot-Fact-Verification
Codes for ACL-IJCNLP 2021 Paper "Zero-shot Fact Verification by Claim Generation"
Stars: ✭ 39 (+95%)
Mutual labels:  question-generation
just-ask
[TPAMI Special Issue on ICCV 2021 Best Papers, Oral] Just Ask: Learning to Answer Questions from Millions of Narrated Videos
Stars: ✭ 57 (+185%)
Mutual labels:  question-generation
explicit memory tracker
[ACL 2020] Explicit Memory Tracker with Coarse-to-Fine Reasoning for Conversational Machine Reading
Stars: ✭ 35 (+75%)
Mutual labels:  question-generation
PDN
The official PyTorch implementation of "Pathfinder Discovery Networks for Neural Message Passing" (WebConf '21)
Stars: ✭ 44 (+120%)
Mutual labels:  gcn
Literatures-on-GNN-Acceleration
A reading list for deep graph learning acceleration.
Stars: ✭ 50 (+150%)
Mutual labels:  gcn
resolutions-2019
A list of data mining and machine learning papers that I implemented in 2019.
Stars: ✭ 19 (-5%)
Mutual labels:  gcn
text2text
Text2Text: Cross-lingual natural language processing and generation toolkit
Stars: ✭ 188 (+840%)
Mutual labels:  question-generation
TDRG
Transformer-based Dual Relation Graph for Multi-label Image Recognition. ICCV 2021
Stars: ✭ 32 (+60%)
Mutual labels:  gcn
kGCN
A graph-based deep learning framework for life science
Stars: ✭ 91 (+355%)
Mutual labels:  gcn
DeepPanoContext
Official PyTorch code of DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization (ICCV 2021 Oral).
Stars: ✭ 44 (+120%)
Mutual labels:  gcn
mvGAE
Drug Similarity Integration Through Attentive Multi-view Graph Auto-Encoders (IJCAI 2018)
Stars: ✭ 27 (+35%)
Mutual labels:  gcn
LibAUC
An End-to-End Machine Learning Library to Optimize AUC (AUROC, AUPRC).
Stars: ✭ 115 (+475%)
Mutual labels:  gcn

Multi-hop Question Generation with Graph Convolutional Network (MulQG)

License: MIT

This is the implementation of the paper:

Multi-hop Question Generation with Graph Convolutional Network. Dan Su, Yan Xu, Wenliang Dai, Ziwei Ji, Tiezheng Yu, Pascale Fung. Findings of EMNLP 2020 [PDF]

If you use any source code or datasets included in this toolkit in your work, please cite the following paper. The BibTeX is listed below:

@inproceedings{su2020multi,
  title={Multi-hop Question Generation with Graph Convolutional Network},
  author={Su, Dan and Xu, Yan and Dai, Wenliang and Ji, Ziwei and Yu, Tiezheng and Fung, Pascale},
  booktitle={Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings},
  pages={4636--4647},
  year={2020}
}

Abstract

Multi-hop Question Generation (QG) aims to generate answer-related questions by aggregating and reasoning over multiple pieces of scattered evidence from different paragraphs. It is a more challenging yet under-explored task compared to conventional single-hop QG, where the questions are generated from the sentence containing the answer or nearby sentences in the same paragraph without complex reasoning. To address the additional challenges in multi-hop QG, we propose the Multi-Hop Encoding Fusion Network for Question Generation (MulQG), which does context encoding in multiple hops with a Graph Convolutional Network and encoding fusion via an Encoder Reasoning Gate. To the best of our knowledge, we are the first to tackle the challenge of multi-hop reasoning over paragraphs without any sentence-level information. Empirical results on the HotpotQA dataset demonstrate the effectiveness of our method in comparison with baselines on automatic evaluation metrics. Moreover, in the human evaluation, our proposed model is able to generate fluent questions with high completeness, and it outperforms the strongest baseline by 20.8% in the multi-hop evaluation.

MulQG Framework:

Overview of our MulQG framework. In the encoding stage, we pass the initial context encoding C_0 and answer encoding A_0 to the Answer-aware Context Encoder to obtain the first context encoding C_1. Then C_1 and A_0 are used to update a multi-hop answer encoding A_1 via the GCN-based Entity-aware Answer Encoder, and A_1 and C_1 are fed back to the Answer-aware Context Encoder to obtain C_2. The final context encoding C_{final} is obtained from the Encoder Reasoning Gate, which operates over C_1 and C_2, and is used in the maxout-based decoding stage.

The illustration of the GCN-based Entity-aware Answer Encoder.
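To make the encoding flow concrete, below is a minimal PyTorch-style sketch of the multi-hop encoding fusion described above. The module names and the gate design are illustrative assumptions for exposition, not the repository's actual API.

# Minimal sketch of the MulQG encoding flow (assumed module names, not the repo API).
import torch
import torch.nn as nn

class EncoderReasoningGate(nn.Module):
    """Gated fusion of two context encodings (one plausible gate design)."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.gate = nn.Linear(2 * hidden_dim, hidden_dim)

    def forward(self, c1, c2):
        g = torch.sigmoid(self.gate(torch.cat([c1, c2], dim=-1)))
        return g * c1 + (1.0 - g) * c2  # C_final

def multi_hop_encode(context_encoder, answer_encoder, reasoning_gate, c0, a0):
    """One pass of the encoding stage.

    context_encoder: the Answer-aware Context Encoder
    answer_encoder:  the GCN-based Entity-aware Answer Encoder
    reasoning_gate:  the Encoder Reasoning Gate
    """
    c1 = context_encoder(c0, a0)   # C_1 from C_0 and A_0
    a1 = answer_encoder(c1, a0)    # A_1: multi-hop answer encoding via the GCN
    c2 = context_encoder(c1, a1)   # C_2 from C_1 and A_1
    return reasoning_gate(c1, c2)  # C_final, consumed by the maxout-based decoder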

Dependencies

Python 3, PyTorch, boto3

Alternatively, you can create a conda environment from the provided multi-qg.yml file by running

conda env create -f multi-qg.yml

or install the dependencies with pip:

pip install -r requirement.txt

Experiments

Download Data

HotpotQA Data

Download the HotpotQA train and test data and put them under ./hotpot/data/.

Glove Embedding

Download the GloVe embeddings, unzip glove.840B.300d.txt, and place it at ./glove/glove.840B.300d.txt.

Bert Models

We use BERT models in the paragraph selection step. You should download the pretrained BERT model and vocabulary and set their paths properly. The download links can be found in paragraph_selection/pytorch_pretrained_bert/modeling.py (lines 40-51) and paragraph_selection/pytorch_pretrained_bert/tokenization.py (lines 30-41). After downloading, replace the dictionary values with your own local paths accordingly.
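For reference, in the upstream pytorch_pretrained_bert package these links live in module-level dictionaries. Assuming the vendored copy follows the same layout, the edit looks roughly like this (paths are placeholders):

# In paragraph_selection/pytorch_pretrained_bert/modeling.py (assuming the
# upstream pytorch_pretrained_bert layout): point the archive map entries
# at your local copies instead of the original S3 URLs.
PRETRAINED_MODEL_ARCHIVE_MAP = {
    'bert-base-uncased': '/your/local/path/bert-base-uncased.tar.gz',
    # ... edit the other entries you need in the same way
}

# Likewise in paragraph_selection/pytorch_pretrained_bert/tokenization.py:
PRETRAINED_VOCAB_ARCHIVE_MAP = {
    'bert-base-uncased': '/your/local/path/bert-base-uncased-vocab.txt',
}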

Preprocessed Data

You can directly download our preprocessed train and dev data for HotpotQA from the link.

Extract all compressed files into the ./hotpot/ folder.

Alternatively, you can preprocess the data yourself following the instructions in the next section.

Preprocess

Previously we provided the intermediate data files for training MulQG; now you can also run the preprocessing yourself. The preprocessing phase consists of paragraph selection, named entity recognition, and graph construction.

  • Step 1: First, download the model checkpoints and save them in ./work_dir

  • Step 2: Run the data preprocessing (change the input and output path to your own)

sh ./run_preprocess.sh

  • Step 3: Run process_hotpot.py to obtain embedding.pkl and word2idx.pkl (a sketch of what this step produces is shown below)
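As a rough guide, producing word2idx.pkl and embedding.pkl from the GloVe file typically looks like the sketch below; the exact vocabulary handling, special tokens, and output paths in process_hotpot.py may differ.

# Sketch only: the actual logic in process_hotpot.py may differ.
import pickle
import numpy as np

def build_vocab_and_embeddings(vocab, glove_path='./glove/glove.840B.300d.txt', dim=300):
    word2idx = {'<pad>': 0, '<unk>': 1}          # assumed special tokens
    vectors = [np.zeros(dim, dtype=np.float32)] * 2
    with open(glove_path, encoding='utf-8') as f:
        for line in f:
            parts = line.rstrip().split(' ')
            word = ' '.join(parts[:-dim])        # 840B tokens may contain spaces
            if word in vocab and word not in word2idx:
                word2idx[word] = len(word2idx)
                vectors.append(np.asarray(parts[-dim:], dtype=np.float32))
    with open('./hotpot/word2idx.pkl', 'wb') as f:
        pickle.dump(word2idx, f)
    with open('./hotpot/embedding.pkl', 'wb') as f:
        pickle.dump(np.stack(vectors), f)
    return word2idx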

Released Checkpoints

We have also released our pretrained model for reproducibility.

Training

  • Step 4: Run the training

sh ./run_train.sh

which runs:

python3 -m GPG.main --use_cuda --schedule --ans_update --q_attn --is_coverage --use_copy --batch_size=36 --beam_search --gpus=0,1,2 --position_embeddings_flag

Update the configuration in GPG/config.py with the proper data paths, e.g., the log path, the output model path, and so on. If an OOM exception occurs, you may try a smaller batch size together with gradient_accumulate_step > 1 (sketched below). Checkpoints for each epoch are stored in the ./output/ directory; you can change this path in GPG/config.py.
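For context, the gradient-accumulation workaround mentioned above follows the standard pattern sketched here; the variable names are illustrative, not taken from the training code.

# Effective batch size = batch_size * gradient_accumulate_step,
# at the memory cost of batch_size.
def train_epoch(model, optimizer, train_loader, gradient_accumulate_step=4):
    optimizer.zero_grad()
    for step, batch in enumerate(train_loader):
        loss = model(batch) / gradient_accumulate_step  # average over accumulated steps
        loss.backward()                                 # gradients add up in .grad
        if (step + 1) % gradient_accumulate_step == 0:
            optimizer.step()                            # update once per accumulated batch
            optimizer.zero_grad()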

Inference

  • Step 5: Run the inference; the prediction results will be written to ./prediction/ (you may modify other configurations in the GPG/config.py file)
sh ./run_inference.sh

You can do the inference using our released model MulQG_BFS.tar.gz, with the following command:

python3 -m GPG.main --notrain --max_tgt_len=30 --min_tgt_len=8 --beam_size=10 --use_cuda --ans_update --q_attn --is_coverage --use_copy --batch_size=20 --beam_search --gpus=0 --position_embeddings_flag --restore="./output/GraphPointerGenerator/MulQG_BFS_checkpoint.pt"

Evaluation

  • Step 6: Run the evaluation. We calculate the BLEU, METEOR, and ROUGE scores via nlg-eval, and the Answerability and QBLEU metrics via Answerability-Metric. You may need to install both packages.

We also provide our prediction output in the ./prediction/ directory; you can evaluate it with the nlg-eval package via:

nlg-eval --hypothesis=./prediction/candidate.txt --references=./prediction/golden.txt

You will get nlg-eval results like:

Bleu_1: 0.401475
Bleu_2: 0.267142
Bleu_3: 0.197256
Bleu_4: 0.151990
METEOR: 0.205085
ROUGE_L: 0.352992
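If you prefer calling nlg-eval from Python instead of the CLI, the package also exposes a compute_metrics helper (per the nlg-eval README):

# Programmatic alternative to the nlg-eval CLI invocation above.
from nlgeval import compute_metrics

metrics = compute_metrics(
    hypothesis='./prediction/candidate.txt',
    references=['./prediction/golden.txt'],
)
print(metrics)  # dict with Bleu_1..Bleu_4, METEOR, ROUGE_L, ...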

Also, follow the instructions in Answerability-Metric to measure the Answerability and QBLEU metrics.

python3 answerability_score.py --data_type squad --ref_file ./prediction/golden.txt --hyp_file ./prediction/candidate.txt --ner_weight 0.6 --qt_weight 0.2 --re_weight 0.1 --delta 0.7 --ngram_metric Bleu_4

Then you will get the QBLEU-4 value (the script prints it as an answerability score, but according to the paper this number is the QBLEU-4 value, so ignore the wording):

Mean Answerability Score Across Questions: 0.540

Then run the same command with --delta 1.0:

python3 answerability_score.py --data_type squad --ref_file ./prediction/golden.txt --hyp_file ./prediction/candidate.txt --ner_weight 0.6 --qt_weight 0.2 --re_weight 0.1 --delta 1.0 --ngram_metric Bleu_4

and you will get the Answerability:

Mean Answerability Score Across Questions: 0.728
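The only difference between the two runs above is --delta. In the Answerability-Metric formulation (our reading of the underlying paper; check that repository for the exact details), the final score mixes the answerability term with the n-gram metric as

QBLEU_4 = delta * Answerability + (1 - delta) * BLEU_4

so --delta 1.0 reports pure Answerability, while --delta 0.7 reports the QBLEU-4 mixture.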