li3cmz / GRADE

Licence: other
GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems

Projects that are alternatives of or similar to GRADE

pytorch sscr: A PyTorch implementation of SSCR (Stars: 25; label: emnlp2020)
QuantiDCE: Towards Quantifiable Dialogue Coherence Evaluation (ACL 2021) (Stars: 38; label: dialogue-metric)
EMNLP2020: Official PyTorch code and datasets for the paper "Where Are the Facts? Searching for Fact-checked Information to Alleviate the Spread of Fake News", EMNLP 2020 (Stars: 55; label: emnlp2020)
OpenDialog: An open-source package for a Chinese open-domain conversational chatbot (a Chinese chit-chat dialogue system with one-click WeChat bot deployment) (Stars: 94; label: open-domain)
AODA: Official implementation of "Adversarial Open Domain Adaptation for Sketch-to-Photo Synthesis" (WACV 2022 / CVPRW 2021) (Stars: 44; label: open-domain)
task-transferability: Data and code for the paper "Exploring and Predicting Transferability across NLP Tasks", EMNLP 2020 (Stars: 35; label: emnlp2020)
SEFR CUT: Domain Adaptation of Thai Word Segmentation Models using Stacked Ensemble (EMNLP 2020) (Stars: 18; label: emnlp2020)

GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems

This repository contains the source code for the following paper:

GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems
Lishan Huang, Zheng Ye, Jinghui Qin, Xiaodan Liang; EMNLP 2020

Model Overview

[Figure: overview of the GRADE model]

Prerequisites

Create a virtual environment (recommended):

conda create -n GRADE python=3.6
source activate GRADE

Install the required packages:

pip install -r requirements.txt

Install Texar locally:

cd texar-pytorch
pip install .

Note: Make sure that CUDA 10.1 is installed in your environment.
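
If you want to double-check the environment before moving on, a minimal sketch like the one below (not part of this repo) confirms that PyTorch sees the GPU, reports the CUDA version it was built against, and that Texar imports correctly:

# check_env.py: quick environment sanity check (illustrative; not part of this repo)
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available :", torch.cuda.is_available())
print("CUDA version   :", torch.version.cuda)  # the note above expects 10.1

try:
    import texar.torch  # installed above from the bundled texar-pytorch directory
    print("Texar imported successfully")
except ImportError as err:
    print("Texar import failed:", err)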

Data Preparation

GRADE is trained on the DailyDialog dataset proposed by Li et al. (2017).

For convenience, we provide the processed DailyDialog data; download it and unzip it into the data directory. You should also download tools and unzip it into the root directory of this repo.

If you want to prepare the training data from scratch, please follow these steps:

  1. Install Lucene;
  2. Run the preprocessing script:
cd ./script
bash preprocess_training_dataset.sh

Training

To train GRADE, please run the following script:

cd ./script
bash train.sh

Note that the checkpoint of our final GRADE model is provided; you can download it and unzip it into the root directory.
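
If you want to make sure the checkpoint unzipped correctly before running evaluation, a small sketch such as the one below will do; the file name GRADE_checkpoint.pth is a hypothetical placeholder, so point it at whatever file the archive actually contains:

# Optional checkpoint sanity check (illustrative only).
import torch

ckpt_path = "./GRADE_checkpoint.pth"  # hypothetical name: use the real file from the unzipped archive
state = torch.load(ckpt_path, map_location="cpu")

# A PyTorch checkpoint is usually a dict (of tensors or of nested dicts);
# printing the top-level keys verifies the file without building the model.
if isinstance(state, dict):
    print("top-level keys:", list(state.keys())[:10])
else:
    print("loaded object of type:", type(state))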

Evaluation

We evaluate GRADE and other baseline metrics on three chit-chat datasets (DailyDialog, ConvAI2 and EmpatheticDialogues). The corresponding evaluation data in the evaluation directory has the following file structure:

.
└── evaluation
    ├── eval_data
    │   └── DIALOG_DATASET_NAME
    │       └── DIALOG_MODEL_NAME
    │           ├── human_ctx.txt
    │           └── human_hyp.txt
    └── human_score
        ├── DIALOG_DATASET_NAME
        │   └── DIALOG_MODEL_NAME
        │       └── human_score.txt
        └── human_judgement.json

Note: the entire human judgement data we proposed for metric evaluation is in human_judgement.json.
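
For reference, here is a minimal sketch (not from the repo) of how this layout could be consumed when comparing a metric against the human scores. It assumes that human_ctx.txt, human_hyp.txt and human_score.txt hold one example per line and are line-aligned, and it uses scipy only for the correlation step:

# Illustrative sketch: read one DIALOG_DATASET_NAME/DIALOG_MODEL_NAME split and
# correlate per-example metric scores with the released human scores.
# Assumption: the three files are line-aligned, one example per line.
from pathlib import Path
from scipy.stats import pearsonr, spearmanr

def read_lines(path):
    return Path(path).read_text(encoding="utf-8").splitlines()

def load_split(dataset, model, base="evaluation"):
    base = Path(base)
    contexts = read_lines(base / "eval_data" / dataset / model / "human_ctx.txt")
    responses = read_lines(base / "eval_data" / dataset / model / "human_hyp.txt")
    human = [float(s) for s in read_lines(base / "human_score" / dataset / model / "human_score.txt")]
    return contexts, responses, human

def correlate_with_human(metric_scores, human_scores):
    # metric_scores: one score per context-response pair, in file order.
    return pearsonr(metric_scores, human_scores)[0], spearmanr(metric_scores, human_scores)[0]

In practice you would call load_split with whatever DIALOG_DATASET_NAME and DIALOG_MODEL_NAME directories are actually present under evaluation/eval_data.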

To evaluate GRADE, please run the following script:

cd ./script
bash eval.sh

Using GRADE

To use GRADE on your own dialog dataset:

  1. Put the whole dataset (raw data) into ./preprocess/dataset;
  2. Update the function load_dataset in ./preprocess/extract_keywords.py for loading the dataset;
  3. Prepare the context-response data that you want to evaluate and convert it into the following format:
.
└── evaluation
    └── eval_data
        └── YOUR_DIALOG_DATASET_NAME
            └── YOUR_DIALOG_MODEL_NAME
                ├── human_ctx.txt
                └── human_hyp.txt
  4. Run the following script to evaluate the context-response data with GRADE:
cd ./script
bash inference.sh
  5. Lastly, the scores given by GRADE can be found as below (a short reading sketch follows this list):
.
└── evaluation
    └── infer_result
        └── YOUR_DIALOG_DATASET_NAME
            └── YOUR_DIALOG_MODEL_NAME
                ├── non_reduced_results.json
                └── reduced_results.json
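
To make steps 3 and 5 concrete, the sketch below (illustrative, not part of the repo) writes a toy context-response set into the expected layout and reads the scores back after inference.sh has run. The placeholder directory names come from the trees above; the JSON structure of the result files is an assumption, and multi-turn contexts should follow whatever turn delimiter the provided eval files use:

# Illustrative sketch for steps 3 and 5 of "Using GRADE".
import json
from pathlib import Path

dataset, model = "YOUR_DIALOG_DATASET_NAME", "YOUR_DIALOG_MODEL_NAME"

# Step 3: one context and one response (hypothesis) per line, line-aligned.
pairs = [
    ("hi , how are you doing today ?", "i am fine , thanks . and you ?"),
    ("do you like hiking ?", "yes , i go hiking every weekend ."),
]
eval_dir = Path("evaluation/eval_data") / dataset / model
eval_dir.mkdir(parents=True, exist_ok=True)
(eval_dir / "human_ctx.txt").write_text("\n".join(c for c, _ in pairs) + "\n", encoding="utf-8")
(eval_dir / "human_hyp.txt").write_text("\n".join(r for _, r in pairs) + "\n", encoding="utf-8")

# Step 5: after `bash inference.sh`, read the scores back (JSON layout assumed here).
result_path = Path("evaluation/infer_result") / dataset / model / "non_reduced_results.json"
if result_path.exists():
    print(json.loads(result_path.read_text(encoding="utf-8")))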