budzianowski / Multiwoz

Licence: mit

Source code for end-to-end dialogue model from the MultiWOZ paper (Budzianowski et al. 2018, EMNLP)

MultiWOZ

Multi-Domain Wizard-of-Oz dataset (MultiWOZ) is a fully labeled collection of human-human written conversations spanning multiple domains and topics. With 10k dialogues, it is at least one order of magnitude larger than all previous annotated task-oriented corpora.

The newest, corrected version of the dataset is available at MultiWOZ_2.2 thanks to the Google crew.

An earlier corrected version of the dataset, MultiWOZ_2.1, is available thanks to the Amazon crew.

The dataset used in the EMNLP publication can be accessed at: MultiWOZ_2.0

The dataset used in the ACL publication can be accessed at: MultiWOZ_1.0

Data structure

There are 3,406 single-domain dialogues, which include booking if the domain allows it, and 7,032 multi-domain dialogues spanning 2 to 5 domains. To ensure reproducibility of results, the corpus was randomly split into training, development and test sets; the development and test sets contain 1k dialogues each. Although all dialogues are coherent, some of them do not complete the task described in the goal. The validation and test sets therefore contain only fully successful dialogues, which enables a fair comparison of models. There are no dialogues from the hospital and police domains in the validation and test sets.
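The training-set size follows from the counts above. A quick check, written for Python 3 (the repository itself targets Python 2) and assuming that "1k" means exactly 1,000 dialogues for each of the development and test sets:

```python
# Dialogue counts stated in this README.
SINGLE_DOMAIN = 3406
MULTI_DOMAIN = 7032

# "1k examples each", assumed here to be exactly 1,000.
TEST_SIZE = 1000
DEV_SIZE = 1000

total = SINGLE_DOMAIN + MULTI_DOMAIN
train_size = total - TEST_SIZE - DEV_SIZE

print(f"total={total}, train={train_size}, dev={DEV_SIZE}, test={TEST_SIZE}")
# prints: total=10438, train=8438, dev=1000, test=1000
```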

Each dialogue consists of a goal, multiple user and system utterances, and a belief state. It also includes the natural-language task description that was shown to the turkers playing the visitor. Dialogues with MUL in the name are multi-domain dialogues; dialogues with SNG in the name are single-domain dialogues (though a booking sub-domain is possible). If the fail_book option in the goal specification is not empty, the booking may not have been possible to complete; the turkers were not told about this.
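To find dialogues whose booking was set up to fail, one can scan the goal for non-empty fail_book entries. The sketch below assumes a hypothetical goal layout in which each domain name maps to a dictionary that may contain a fail_book key; the other keys and values in the example goal are made up, so check the real data files before relying on this structure.

```python
from typing import Any, Dict, List


def domains_with_failed_booking(goal: Dict[str, Any]) -> List[str]:
    """Return the domains whose goal has a non-empty fail_book entry.

    Assumes a hypothetical layout goal[domain]["fail_book"].
    """
    flagged = []
    for domain, spec in goal.items():
        # Skip entries that are not per-domain dictionaries (e.g. a message text).
        if not isinstance(spec, dict):
            continue
        if spec.get("fail_book"):
            flagged.append(domain)
    return sorted(flagged)


# Made-up goal for illustration only.
example_goal = {
    "hotel": {
        "info": {"area": "north"},
        "book": {"people": "2", "day": "tuesday"},
        "fail_book": {"day": "monday"},
    },
    "train": {"info": {"destination": "cambridge"}, "fail_book": {}},
    "message": ["You are looking for a place to stay."],
}

print(domains_with_failed_booking(example_goal))  # ['hotel']
```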

The belief state has three sections: semi, book and booked. The semi section holds the slots of a particular domain, the book section holds the booking slots for that domain, and booked is a sub-list of the book dictionary with information about the booked entity (once the booking has been made). The turkers sometimes did not follow the goal correctly, which may result in an incorrect belief state. The joint accuracy metric includes ALL slots.
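Joint accuracy counts a turn as correct only when every slot in the predicted belief state matches the gold state, whereas slot accuracy scores each slot separately. A minimal sketch, assuming a hypothetical nested-dictionary layout for the belief state and treating "" and "not mentioned" as unset values (both assumptions); the official numbers should come from the TRADE scoring scripts recommended in the FAQ, and slot-accuracy definitions vary between papers.

```python
from typing import Dict, List, Sequence

# Hypothetical belief-state layout, loosely following the description above:
# {domain: {"semi": {slot: value}, "book": {slot: value, "booked": [...]}}}
BeliefState = Dict[str, Dict[str, dict]]

# Assumed placeholder values for unset slots; adjust to the real data.
EMPTY_VALUES = {"", "not mentioned"}


def flatten(state: BeliefState) -> Dict[str, str]:
    """Map every filled semi/book slot to a "domain-section-slot" key.

    The booked sub-list is skipped because it describes the booked entity,
    not a tracked slot value.
    """
    flat = {}
    for domain, sections in state.items():
        for section in ("semi", "book"):
            for slot, value in sections.get(section, {}).items():
                if slot == "booked" or value in EMPTY_VALUES:
                    continue
                flat[f"{domain}-{section}-{slot}"] = value
    return flat


def joint_accuracy(gold: Sequence[BeliefState], pred: Sequence[BeliefState]) -> float:
    """Fraction of turns whose predicted state matches the gold state on ALL slots."""
    hits = sum(flatten(g) == flatten(p) for g, p in zip(gold, pred))
    return hits / len(gold)


def slot_accuracy(gold: Sequence[BeliefState], pred: Sequence[BeliefState],
                  slots: List[str]) -> float:
    """Fraction of (turn, slot) pairs whose value is predicted correctly.

    An unset slot counts as correct when it is unset in both states.
    """
    hits = 0
    for g, p in zip(gold, pred):
        flat_gold, flat_pred = flatten(g), flatten(p)
        hits += sum(flat_gold.get(s) == flat_pred.get(s) for s in slots)
    return hits / (len(gold) * len(slots))


# Made-up two-turn example for one domain.
gold = [
    {"hotel": {"semi": {"area": "north", "stars": ""}, "book": {"people": "", "booked": []}}},
    {"hotel": {"semi": {"area": "north", "stars": "4"}, "book": {"people": "2", "booked": []}}},
]
pred = [
    {"hotel": {"semi": {"area": "north", "stars": ""}, "book": {"people": "", "booked": []}}},
    {"hotel": {"semi": {"area": "north", "stars": "3"}, "book": {"people": "2", "booked": []}}},
]
slots = ["hotel-semi-area", "hotel-semi-stars", "hotel-book-people"]

print(joint_accuracy(gold, pred))                  # 0.5
print(round(slot_accuracy(gold, pred, slots), 4))  # 0.8333
```

A single wrong slot (stars) makes the whole second turn wrong under joint accuracy, while slot accuracy still credits the two correct slots in that turn.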

FAQ

File names refer to two types of dialogues. The MUL and PMUL names refer to strictly multi-domain dialogues (at least 2 main domains are involved), while the SNG, SSNG and WOZ names refer to single-domain dialogues, potentially with sub-domains such as booking.
Only system utterances are annotated with dialogue acts; there are no annotations on the user side.
There is no 1-to-1 mapping between dialogue acts and sentences.
There are no dialogue state tracking labels for the police and hospital domains, as these domains are very simple. There are also no dialogues from these domains in the validation and test sets.
For dialogue state tracking experiments, please follow the data processing and scoring scripts from the TRADE model (Wu et al. 2019).
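The file-name prefixes listed in the FAQ can be turned into a small classifier. A sketch under the assumption that each dialogue file name starts with its upper-case prefix followed by digits; the example file names are illustrative.

```python
import re

# Prefixes listed in this README's FAQ.
MULTI_DOMAIN_PREFIXES = {"MUL", "PMUL"}
SINGLE_DOMAIN_PREFIXES = {"SNG", "SSNG", "WOZ"}


def dialogue_type(file_name: str) -> str:
    """Classify a dialogue by the upper-case prefix of its file name."""
    match = re.match(r"[A-Z]+", file_name)
    prefix = match.group(0) if match else ""
    if prefix in MULTI_DOMAIN_PREFIXES:
        return "multi-domain"
    if prefix in SINGLE_DOMAIN_PREFIXES:
        return "single-domain"
    raise ValueError(f"unrecognized dialogue prefix in {file_name!r}")


for name in ["PMUL4398.json", "MUL0129.json", "SSNG0348.json", "WOZ20001.json"]:
    print(name, dialogue_type(name))
```

Matching the whole alphabetic prefix, rather than testing with startswith, keeps PMUL and SSNG from being confused with MUL and SNG.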

Benchmarks

Belief Tracking

Model                                         2.0 Joint  2.0 Slot   2.1 Joint  2.1 Slot   2.2 Joint  2.2 Slot
MDBT (Ramadan et al. 2018)                    15.57      89.53      -          -          -          -
GLAD (Zhong et al. 2018)                      35.57      95.44      -          -          -          -
GCE (Nouri and Hosseini-Asl 2018)             36.27      98.42      -          -          -          -
Neural Reading (Gao et al. 2019)              41.10      -          -          -          -          -
HyST (Goel et al. 2019)                       44.24      -          -          -          -          -
SUMBT (Lee et al. 2019)                       46.65      96.44      -          -          -          -
SGD-baseline (Rastogi et al. 2019)            -          -          43.4       -          42.0       -
TRADE (Wu et al. 2019)                        48.62      96.92      46.0       -          45.4       -
COMER (Ren et al. 2019)                       48.79      -          -          -          -          -
MERET (Huang et al. 2020)                     50.91      97.07      -          -          -          -
DSTQA (Zhou et al. 2019)                      51.44      97.24      51.17      97.21      -          -
SUMBT+LaRL (Lee et al. 2020)                  51.52      97.89      -          -          -          -
DS-DST (Zhang et al. 2019)                    -          -          51.2       -          51.7       -
LABES-S2S (Zhang et al. 2020)                 -          -          51.45      -          -          -
DST-Picklist (Zhang et al. 2019)              54.39      -          53.3       -          -          -
MinTL-BART (Lin et al. 2020)                  52.10      -          53.62      -          -          -
SST (Chen et al. 2020)                        -          -          55.23      -          -          -
TripPy (Heck et al. 2020)                     -          -          55.3       -          -          -
SimpleTOD (Hosseini-Asl et al. 2020)          -          -          56.45      -          -          -
ConvBERT-DG + Multi (Mehri et al. 2020)       -          -          58.7       -          -          -
TripPy + CoCoAug (Li and Yavuz et al. 2020)   -          -          60.53      -          -          -

Joint = joint accuracy (%), Slot = slot accuracy (%), for MultiWOZ 2.0, 2.1 and 2.2; "-" = not reported.

Policy Optimization

Combined score = (INFORM + SUCCESS) * 0.5 + BLEU

Model                                   2.0 INFORM   2.0 SUCCESS  2.0 BLEU     2.1 INFORM   2.1 SUCCESS  2.1 BLEU
TokenMoE* (Pei et al. 2019)             75.30        59.70        16.81        -            -            -
Baseline* (Budzianowski et al. 2018)    71.29        60.96        18.8         -            -            -
Structured Fusion* (Mehri et al. 2019)  82.70        72.10        16.34        -            -            -
LaRL* (Zhao et al. 2019)                82.8         79.2         12.8         -            -            -
SimpleTOD (Hosseini-Asl et al. 2020)    88.9         67.1         16.9         85.1         73.5         16.22
MoGNet (Pei et al. 2019)                85.3         73.30        20.13        -            -            -
HDSA* (Chen et al. 2019)                82.9         68.9         23.6         -            -            -
ARDM (Wu et al. 2019)                   87.4         72.8         20.6         -            -            -
DAMD (Zhang et al. 2019)                89.2         77.9         18.6         -            -            -
SOLOIST (Peng et al. 2020)              89.60        79.30        18.3         -            -            -
MarCo (Wang et al. 2020)                92.30        78.60        20.02        92.50        77.80        19.54
UBAR (Yang et al. 2020)                 94.00        83.60        17.20        92.70        81.00        16.70
HDNO (Wang et al. 2020)                 96.40        84.70        18.85        92.80        83.00        18.97
LAVA (Lubis et al. 2020)                97.50        94.80        12.10        96.39        83.57        14.02

* These results were obtained with a previous version of the evaluator, which underestimated the performance of these works.
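The combined score given in the table header can be computed directly from the three reported metrics. A minimal sketch, assuming INFORM, SUCCESS and BLEU are all on the 0-100 scale used in the tables; the example uses the Baseline row for MultiWOZ 2.0.

```python
def combined_score(inform: float, success: float, bleu: float) -> float:
    """(INFORM + SUCCESS) * 0.5 + BLEU, all three on a 0-100 scale."""
    return (inform + success) * 0.5 + bleu


# Baseline* (Budzianowski et al. 2018), MultiWOZ 2.0 columns of the table above.
print(round(combined_score(71.29, 60.96, 18.8), 3))  # 84.925
```

Because INFORM and SUCCESS are averaged while BLEU is added in full, a one-point BLEU gain weighs as much as a two-point gain in either task metric.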

Natural Language Generation

Model                                  SER      BLEU
Baseline (Budzianowski et al. 2018)    2.99     0.632
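This README does not define SER. A common definition in the SC-LSTM line of NLG work (Wen et al. 2015) is the slot error rate: the number of missing plus redundant slots in the generated sentences divided by the total number of slots in the input dialogue acts. The sketch below assumes that definition and reports it as a percentage; the counts in the example are made up.

```python
def slot_error_rate(total_slots: int, missing: int, redundant: int) -> float:
    """Slot error rate in percent: (missing + redundant) / total slots * 100."""
    if total_slots <= 0:
        raise ValueError("total_slots must be positive")
    return 100.0 * (missing + redundant) / total_slots


# Made-up counts: 4 slots omitted and 2 slots hallucinated out of 200 slots.
print(slot_error_rate(total_slots=200, missing=4, redundant=2))  # 3.0
```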

End-to-End Modelling

Combined score = (INFORM + SUCCESS) * 0.5 + BLEU

Model                                   2.0 INFORM   2.0 SUCCESS  2.0 BLEU     2.1 INFORM   2.1 SUCCESS  2.1 BLEU
DAMD (Zhang et al. 2019)                76.3         60.4         18.6         -            -            -
LABES-S2S (Zhang et al. 2020)           -            -            -            78.07        67.06        18.3
SimpleTOD (Hosseini-Asl et al. 2020)    84.4         70.1         15.01        -            -            -
SOLOIST (Peng et al. 2020)              85.50        72.90        16.54        -            -            -
MinTL-BART (Lin et al. 2020)            84.88        74.91        17.89        -            -            -
LAVA (Lubis et al. 2020)                91.80        81.80        12.03        -            -            -
UBAR (Yang et al. 2020)                 95.40        80.70        17.00        95.70        81.80        16.50
SUMBT+LaRL (Lee et al. 2020)            92.20        85.40        17.90        -            -            -

Requirements

Python 2 with pip, pytorch==0.4.1

Quick start

In repo directory:

Preprocessing

To download and preprocess the data, run:

python create_delex_data.py

Training

To train the model, run:

python train.py [--args=value]

Some of the available arguments are:

// hyperparameters for model learning
--max_epochs        : number of epochs
--batch_size        : number of turns per batch
--lr_rate           : initial learning rate
--clip              : clipping threshold
--l2_norm           : L2-regularization weight
--dropout           : dropout rate
--optim             : optimization method

// network structure
--emb_size          : word embedding size
--use_attn          : whether to use attention
--hid_size_enc      : size of the encoder RNN hidden state
--hid_size_pol      : size of the policy hidden output
--hid_size_dec      : size of the decoder RNN hidden state
--cell_type         : RNN cell type

Testing

To evaluate the trained model, run:

python test.py [--args=value]

To evaluate an external model, run:

python evaluate.py

Before running it, edit line 611 of evaluate.py so that it loads your generated predictions.
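This README does not document the format that evaluate.py expects for the predictions it loads at line 611. A minimal sketch, assuming a hypothetical JSON file that maps each dialogue file name to its list of generated system turns; check the loading code in evaluate.py and adapt the layout before relying on it. The dialogue name and responses below are made up.

```python
import json
import os
import tempfile
from typing import Dict, List


def save_predictions(predictions: Dict[str, List[str]], path: str) -> None:
    """Write {dialogue file name: [generated system turn, ...]} as JSON.

    The mapping layout is an assumption; match it to whatever evaluate.py
    expects where it loads predictions (line 611).
    """
    with open(path, "w", encoding="utf-8") as f:
        json.dump(predictions, f, indent=2, ensure_ascii=False)


# Made-up dialogue name and responses, for illustration only.
preds = {
    "SNG0073.json": [
        "there are several hotels in the north . do you have a price range ?",
        "your booking was successful .",
    ]
}
out_path = os.path.join(tempfile.gettempdir(), "generated_predictions.json")
save_predictions(preds, out_path)

# The evaluation side could then read the file back with json.load.
with open(out_path, encoding="utf-8") as f:
    loaded = json.load(f)
print(loaded == preds)  # True
```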

Benchmark results

The Baseline (Budzianowski et al. 2018) benchmark results were produced by this software. We ran a small grid search over hyperparameter settings and report the test-set performance of the best model, selected on the validation set by the criterion 0.5*match + 0.5*success + 100*BLEU. The final parameters were:

// hyperparameters for model learning
--max_epochs        : 20
--batch_size        : 64
--lr_rate           : 0.005
--clip              : 5.0
--l2_norm           : 0.00001
--dropout           : 0.0
--optim             : Adam

// network structure
--emb_size          : 50
--use_attn          : True
--hid_size_enc      : 150
--hid_size_pol      : 150
--hid_size_dec      : 150
--cell_type         : lstm
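The final parameters can be passed to train.py in the --name=value form shown under Training. A sketch in Python 3.8+ (for shlex.join) that only builds the command line; note that Python renders 0.00001 as 1e-05 and True as True, and whether train.py accepts those spellings depends on how its argument parser is defined, which is an assumption here.

```python
import shlex
from typing import Dict, List

# Final hyperparameters reported in this README.
FINAL_PARAMS: Dict[str, object] = {
    "max_epochs": 20,
    "batch_size": 64,
    "lr_rate": 0.005,
    "clip": 5.0,
    "l2_norm": 0.00001,
    "dropout": 0.0,
    "optim": "Adam",
    "emb_size": 50,
    "use_attn": True,
    "hid_size_enc": 150,
    "hid_size_pol": 150,
    "hid_size_dec": 150,
    "cell_type": "lstm",
}


def train_command(params: Dict[str, object], python: str = "python") -> List[str]:
    """Build a train.py argument list using the --name=value form."""
    return [python, "train.py"] + [f"--{name}={value}" for name, value in params.items()]


cmd = train_command(FINAL_PARAMS)
print(shlex.join(cmd))
```

To actually launch training from the repository directory, the list could be handed to subprocess.run(cmd, check=True), with python pointing at the Python 2 interpreter the repository requires.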

References

If you use any source code or datasets from this toolkit in your work, please cite the corresponding papers. The BibTeX entries are listed below:

[Budzianowski et al. 2018]
@inproceedings{budzianowski2018large,
  author={Budzianowski, Pawe{\l} and Wen, Tsung-Hsien and Tseng, Bo-Hsiang and Casanueva, I{\~n}igo and Ultes, Stefan and Ramadan, Osman and Ga{\v{s}}i\'c, Milica},
  title={MultiWOZ - A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling},
  booktitle={Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year={2018}
}

[Ramadan et al. 2018]
@inproceedings{ramadan2018large,
  title={Large-Scale Multi-Domain Belief Tracking with Knowledge Sharing},
  author={Ramadan, Osman and Budzianowski, Pawe{\l} and Gasic, Milica},
  booktitle={Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics},
  volume={2},
  pages={432--437},
  year={2018}
}

[Eric et al. 2019]
@article{eric2019multiwoz,
  title={MultiWOZ 2.1: Multi-Domain Dialogue State Corrections and State Tracking Baselines},
  author={Eric, Mihail and Goel, Rahul and Paul, Shachi and Sethi, Abhishek and Agarwal, Sanchit and Gao, Shuyang and Hakkani-Tur, Dilek},
  journal={arXiv preprint arXiv:1907.01669},
  year={2019}
}

[Zang et al. 2020]
@inproceedings{zang2020multiwoz,
  title={MultiWOZ 2.2: A Dialogue Dataset with Additional Annotation Corrections and State Tracking Baselines},
  author={Zang, Xiaoxue and Rastogi, Abhinav and Sunkara, Srinivas and Gupta, Raghav and Zhang, Jianguo and Chen, Jindong},
  booktitle={Proceedings of the 2nd Workshop on Natural Language Processing for Conversational AI, ACL 2020},
  pages={109--117},
  year={2020}
}

License

MultiWOZ is an open-source toolkit for building end-to-end trainable task-oriented dialogue models. It is released by Paweł Budzianowski from the Cambridge Dialogue Systems Group under the Apache License 2.0.

Bug Report

If you find any bugs in the code, please contact pfb30 at cam dot ac dot uk.