
alexa / Dialoglue

Licence: apache-2.0
DialoGLUE: A Natural Language Understanding Benchmark for Task-Oriented Dialogue

Programming Languages

python

Projects that are alternatives to or similar to Dialoglue

Ludwig
Data-centric declarative deep learning framework
Stars: ✭ 8,018 (+6581.67%)
Mutual labels:  natural-language-processing, machinelearning, natural-language-understanding
Mt Dnn
Multi-Task Deep Neural Networks for Natural Language Understanding
Stars: ✭ 72 (-40%)
Mutual labels:  natural-language-processing, natural-language-understanding
Intent classifier
Stars: ✭ 67 (-44.17%)
Mutual labels:  natural-language-processing, natural-language-understanding
Bert As Service
Mapping a variable-length sentence to a fixed-length vector using BERT model
Stars: ✭ 9,779 (+8049.17%)
Mutual labels:  natural-language-processing, natural-language-understanding
Convai Baseline
ConvAI baseline solution
Stars: ✭ 49 (-59.17%)
Mutual labels:  natural-language-processing, natural-language-understanding
Python Tutorial Notebooks
Python tutorials as Jupyter Notebooks for NLP, ML, AI
Stars: ✭ 52 (-56.67%)
Mutual labels:  natural-language-processing, natural-language-understanding
Spark Nlp Models
Models and Pipelines for the Spark NLP library
Stars: ✭ 88 (-26.67%)
Mutual labels:  natural-language-processing, natural-language-understanding
Reading comprehension tf
Machine Reading Comprehension in Tensorflow
Stars: ✭ 37 (-69.17%)
Mutual labels:  natural-language-processing, natural-language-understanding
Spokestack Python
Spokestack is a library that allows a user to easily incorporate a voice interface into any Python application.
Stars: ✭ 103 (-14.17%)
Mutual labels:  natural-language-processing, natural-language-understanding
Easy Bert
A Dead Simple BERT API for Python and Java (https://github.com/google-research/bert)
Stars: ✭ 106 (-11.67%)
Mutual labels:  natural-language-processing, natural-language-understanding
Chatbot
A Russian-language chatbot
Stars: ✭ 106 (-11.67%)
Mutual labels:  natural-language-processing, natural-language-understanding
Xlnet extension tf
XLNet Extension in TensorFlow
Stars: ✭ 109 (-9.17%)
Mutual labels:  natural-language-processing, natural-language-understanding
Blocks
Blocks World -- Simulator, Code, and Models (Misra et al. EMNLP 2017)
Stars: ✭ 39 (-67.5%)
Mutual labels:  natural-language-processing, natural-language-understanding
Bidaf Keras
Bidirectional Attention Flow for Machine Comprehension implemented in Keras 2
Stars: ✭ 60 (-50%)
Mutual labels:  natural-language-processing, natural-language-understanding
Coursera Natural Language Processing Specialization
Programming assignments from all courses in the Coursera Natural Language Processing Specialization offered by deeplearning.ai.
Stars: ✭ 39 (-67.5%)
Mutual labels:  natural-language-processing, natural-language-understanding
Dialogue Understanding
This repository contains PyTorch implementation for the baseline models from the paper Utterance-level Dialogue Understanding: An Empirical Study
Stars: ✭ 77 (-35.83%)
Mutual labels:  natural-language-processing, natural-language-understanding
Max Text Sentiment Classifier
Detect the sentiment captured in short pieces of text
Stars: ✭ 35 (-70.83%)
Mutual labels:  natural-language-processing, natural-language-understanding
Gsoc2018 3gm
💫 Automated codification of Greek Legislation with NLP
Stars: ✭ 36 (-70%)
Mutual labels:  natural-language-processing, natural-language-understanding
Chinese nlu by using rasa nlu
Use RASA NLU to build a Chinese Natural Language Understanding (NLU) system
Stars: ✭ 99 (-17.5%)
Mutual labels:  natural-language-processing, natural-language-understanding
Transformers
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Stars: ✭ 55,742 (+46351.67%)
Mutual labels:  natural-language-processing, natural-language-understanding

DialoGLUE

DialoGLUE is a conversational AI benchmark designed to encourage dialogue research in representation-based transfer, domain adaptation, and sample-efficient task learning. For a more detailed write-up of the benchmark, check out our paper.

This repository contains all code related to the benchmark, including scripts for downloading relevant datasets, preprocessing them in a consistent format for benchmark submissions, evaluating any submission outputs, and running baseline models from the original benchmark description.

Datasets

This benchmark is created by scripts that pull data from previously-published data resources. Thank you to the authors of those works for their great contributions:

| Dataset | Size | Description | License |
| --- | --- | --- | --- |
| Banking77 | 13K | online banking queries | CC-BY-4.0 |
| HWU64 | 11K | popular personal assistant queries | CC-BY-SA 3.0 |
| CLINC150 | 20K | popular personal assistant queries | CC-BY-SA 3.0 |
| Restaurant8k | 8.2K | restaurant booking domain queries | CC-BY-4.0 |
| DSTC8 SGD | 20K | multi-domain, task-oriented conversations between a human and a virtual assistant | CC-BY-SA 4.0 International |
| TOP | 45K | compositional queries for hierarchical semantic representations | CC-BY-SA |
| MultiWOZ 2.1 | 12K | multi-domain dialogues with multiple turns | MIT |

Data Download

To download and preprocess the datasets that make up the DialoGLUE benchmark, run bash download_data.sh from within the data_utils directory.

Upon completion, your DialoGLUE folder should contain the following:

dialoglue/
├── banking
│   ├── categories.json
│   ├── test.csv
│   ├── train_10.csv
│   ├── train_5.csv
│   ├── train.csv
│   └── val.csv
├── clinc
│   ├── categories.json
│   ├── test.csv
│   ├── train_10.csv
│   ├── train_5.csv
│   ├── train.csv
│   └── val.csv
├── dstc8_sgd
│   ├── stats.csv
│   ├── test.json
│   ├── train_10.json
│   ├── train.json
│   ├── val.json
│   └── vocab.txt
├── hwu
│   ├── categories.json
│   ├── test.csv
│   ├── train_10.csv
│   ├── train_5.csv
│   ├── train.csv
│   └── val.csv
├── mlm_all.txt
├── multiwoz
│   ├── MULTIWOZ2.1
│   │   ├── dialogue_acts.json
│   │   ├── README.txt
│   │   ├── test_dials.json
│   │   ├── train_dials.json
│   │   └── val_dials.json
│   └── MULTIWOZ2.1_fewshot
│       ├── dialogue_acts.json
│       ├── README.txt
│       ├── test_dials.json
│       ├── train_dials.json
│       └── val_dials.json
├── restaurant8k
│   ├── test.json
│   ├── train_10.json
│   ├── train.json
│   ├── val.json
│   └── vocab.txt
└── top
    ├── eval.txt
    ├── test.txt
    ├── train_10.txt
    ├── train.txt
    ├── vocab.intent
    └── vocab.slot

The files with a _10 suffix (e.g., banking/train_10.csv) are used for the few-shot experiments, in which models are trained on only 10% of the training data.
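The idea behind the _10 splits can be sketched as follows. This is a hypothetical illustration only: select_few_shot and the fixed seed are not the repository's actual sampling code, and the official splits produced by download_data.sh should be used as-is for submissions.

```python
import random

def select_few_shot(examples, fraction=0.1, seed=42):
    """Return a reproducible subset (~10%) of the training examples.

    Hypothetical sketch of a few-shot split; the real splits ship
    with download_data.sh.
    """
    rng = random.Random(seed)
    k = max(1, int(len(examples) * fraction))
    return rng.sample(examples, k)

# Toy training set standing in for, e.g., banking/train.csv rows.
train = [{"text": f"utterance {i}", "intent": "balance"} for i in range(1000)]
train_10 = select_few_shot(train)
print(len(train_10))  # 100 examples, i.e. 10% of the full training set
```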

EvalAI Leaderboard

The DialoGLUE benchmark is hosted on EvalAI and we invite submissions to our leaderboard. The submission should be a JSON file with keys corresponding to each of the seven DialoGLUE datasets:

{"banking": [*banking outputs*], "hwu": [*hwu outputs*], ..., "multiwoz": [*multiwoz outputs*]}

Given a set of seven model checkpoints, you can edit and run dump_outputs.py to generate a valid submission file. For the intent classification tasks (HWU64, Banking77, CLINC150), the outputs are a list of intent classes. For the slot filling tasks (Restaurant8k, DSTC8), the outputs are a list of spans. For the TOP dataset, the outputs are a list of (intent, slots) pairs wherein each slot is the path from the root to the leaf node. For MultiWOZ, the output follows the TripPy format and corresponds to the pred_res.test.final.json output file. We strongly recommend using the dump_outputs.py script to generate outputs.
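As a rough illustration of the expected structure, the sketch below assembles and sanity-checks a submission dict. It is a hedged sketch, not the benchmark's code: build_submission and the toy outputs are hypothetical, the key names beyond "banking", "hwu", and "multiwoz" are assumptions, and in practice dump_outputs.py produces this file for you.

```python
import json

# Assumed dataset keys; only banking/hwu/multiwoz are confirmed by the README.
DATASETS = ["banking", "hwu", "clinc", "restaurant8k", "dstc8", "top", "multiwoz"]

def build_submission(outputs_by_dataset):
    """Assemble a DialoGLUE submission dict, failing if any dataset is missing."""
    missing = [d for d in DATASETS if d not in outputs_by_dataset]
    if missing:
        raise ValueError(f"missing outputs for: {missing}")
    return {d: outputs_by_dataset[d] for d in DATASETS}

# Toy outputs: intent labels for classification tasks, spans for slot
# filling, (intent, slots) pairs for TOP, TripPy-format output for MultiWOZ.
toy = {d: [] for d in DATASETS}
toy["banking"] = ["card_arrival", "exchange_rate"]

submission = build_submission(toy)
with open("submission.json", "w") as f:
    json.dump(submission, f)
```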

For the few-shot experimental setting, we only train models on a subset (roughly 10%) of the training data. The specific data splits are produced by running download_data.sh. To mitigate the impact of random initialization, we ask that you train 5 models for each of the few-shot tasks and submit the output of all 5 models. The scores on the leaderboard will be the average of these five runs.

The few-shot submission file format is as follows:

{"banking": [*banking outputs from model 1*, ... *banking outputs from model 5*], ...}

You may run dump_outputs_fewshot.py to generate a valid submission file given the model paths corresponding to all of the runs.
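For instance, merging the five runs into a single few-shot submission could look like the sketch below; merge_runs and the per-run layout are hypothetical, and dump_outputs_fewshot.py remains the supported way to produce this file.

```python
def merge_runs(per_run_outputs):
    """Combine outputs from 5 independently trained models per dataset.

    per_run_outputs: list of 5 dicts, each mapping dataset -> outputs.
    Returns a dict mapping dataset -> [run-1 outputs, ..., run-5 outputs],
    matching the few-shot submission format above.
    """
    assert len(per_run_outputs) == 5, "few-shot submissions need 5 runs"
    merged = {}
    for run in per_run_outputs:
        for dataset, outputs in run.items():
            merged.setdefault(dataset, []).append(outputs)
    return merged

# Toy example: five runs, each with outputs for one dataset.
runs = [{"banking": [f"intent_from_run_{i}"]} for i in range(5)]
fewshot_submission = merge_runs(runs)
print(len(fewshot_submission["banking"]))  # 5 output lists, one per run
```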

Training

Almost all of the models can be trained/evaluated using the run.py script. MultiWOZ is the exception, and relies on the modified open-sourced TripPy implementation.

The commands for training/evaluating models are as follows. To run only inference/evaluation, simply change --num_epochs to 0.

Checkpoints

The relevant convbert and convbert-dg models can be found here.

HWU64

python run.py \
        --train_data_path data_utils/dialoglue/hwu/train.csv \
        --val_data_path data_utils/dialoglue/hwu/val.csv \
        --test_data_path data_utils/dialoglue/hwu/test.csv \
        --token_vocab_path bert-base-uncased-vocab.txt \
        --train_batch_size 64 --dropout 0.1 --num_epochs 100 --learning_rate 6e-5 \
        --model_name_or_path convbert-dg --task intent --do_lowercase --max_seq_length 50 --mlm_pre --mlm_during --dump_outputs

Banking77

python run.py \
        --train_data_path data_utils/dialoglue/banking/train.csv \
        --val_data_path data_utils/dialoglue/banking/val.csv \
        --test_data_path data_utils/dialoglue/banking/test.csv \
        --token_vocab_path bert-base-uncased-vocab.txt \
        --train_batch_size 32 --grad_accum 2 --dropout 0.1 --num_epochs 100 --learning_rate 6e-5 \
        --model_name_or_path convbert-dg --task intent --do_lowercase --max_seq_length 100 --mlm_pre --mlm_during --dump_outputs

CLINC150

python run.py \
        --train_data_path data_utils/dialoglue/clinc/train.csv \
        --val_data_path data_utils/dialoglue/clinc/val.csv \
        --test_data_path data_utils/dialoglue/clinc/test.csv \
        --token_vocab_path bert-base-uncased-vocab.txt \
        --train_batch_size 64 --dropout 0.1 --num_epochs 100 --learning_rate 6e-5 \
        --model_name_or_path convbert-dg --task intent --do_lowercase --max_seq_length 50 --mlm_pre --mlm_during --dump_outputs

Restaurant8k

python run.py \
        --train_data_path data_utils/dialoglue/restaurant8k/train.json \
        --val_data_path data_utils/dialoglue/restaurant8k/val.json \
        --test_data_path data_utils/dialoglue/restaurant8k/test.json \
        --token_vocab_path bert-base-uncased-vocab.txt \
        --train_batch_size 64 --dropout 0.1 --num_epochs 100 --learning_rate 6e-5 \
        --model_name_or_path convbert-dg --task slot --do_lowercase --max_seq_length 50 --mlm_pre --mlm_during --dump_outputs

DSTC8

python run.py \
        --train_data_path data_utils/dialoglue/dstc8/train.json \
        --val_data_path data_utils/dialoglue/dstc8/val.json \
        --test_data_path data_utils/dialoglue/dstc8/test.json \
        --token_vocab_path bert-base-uncased-vocab.txt \
        --train_batch_size 64 --dropout 0.1 --num_epochs 100 --learning_rate 6e-5 \
        --model_name_or_path convbert-dg --task slot --do_lowercase --max_seq_length 50 --mlm_pre --mlm_during --dump_outputs

TOP

python run.py \
        --train_data_path data_utils/dialoglue/top/train.txt \
        --val_data_path data_utils/dialoglue/top/eval.txt \
        --test_data_path data_utils/dialoglue/top/test.txt \
        --token_vocab_path bert-base-uncased-vocab.txt \
        --train_batch_size 64 --dropout 0.1 --num_epochs 100 --learning_rate 6e-5 \
        --model_name_or_path convbert-dg --task top --do_lowercase --max_seq_length 50 --mlm_pre --mlm_during --dump_outputs

MultiWOZ

The MultiWOZ code builds on the open-sourced TripPy implementation. To train/evaluate the model using our modifications (i.e., MLM pre-training), you can use trippy/DO.example.advanced.

Checkpoints

Checkpoints are released for (1) ConvBERT, (2) BERT-DG and (3) ConvBERT-DG. Given these pre-trained models and the code in this repo, all of our results can be reproduced.

Requirements

This project has been tested and is functional with Python 3.7. The Python dependencies are listed in requirements.txt.

License

This project is licensed under the Apache-2.0 License.

Citation

If you use these scripts or the DialoGLUE benchmark, please cite the following in any relevant work:

@article{MehriDialoGLUE2020,
  title={DialoGLUE: A Natural Language Understanding Benchmark for Task-Oriented Dialogue},
  author={S. Mehri and M. Eric and D. Hakkani-Tur},
  journal={ArXiv},
  year={2020},
  volume={abs/2009.13570}
}