RNNLG

RNNLG is an open source benchmark toolkit for Natural Language Generation (NLG) in spoken dialogue system application domains. It is released by Tsung-Hsien (Shawn) Wen from Cambridge Dialogue Systems Group under Apache License 2.0.

**UPDATE**: If you are interested in learning the techniques behind this toolkit, I recently gave a tutorial at the INLG conference. The slides are available here: https://shawnwun.github.io/talks/DL4NLG_20160906.pdf

Requirements

You need the following packages to run the program:

* Theano 0.8.2 and accompanying packages such as numpy, scipy ...
* NLTK 3.0.0
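
If you use pip, an install along these lines should work (the numpy/scipy pins are left unversioned here, since the toolkit only specifies the two versions above):

    pip install numpy scipy
    pip install Theano==0.8.2 nltk==3.0.0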

Benchmark Datasets

The toolkit includes the following four benchmark datasets:

* data/original/restaurant/ : San Francisco restaurant search
* data/original/hotel/      : San Francisco hotel search
* data/original/laptop/     : Laptop sale/search
* data/original/tv/         : Television sale/search

and the counterfeited datasets introduced in Wen et al, 2016:

* data/counterfeit/r2h/     : Restaurant to hotel counterfeited dataset
* data/counterfeit/h2r/     : Hotel to restaurant counterfeited dataset
* data/counterfeit/l2t/     : Laptop to TV  counterfeited dataset
* data/counterfeit/t2l/     : TV to laptop  counterfeited dataset
* data/counterfeit/r+h2l+t/ : Restaurant/hotel to laptop/TV ...
* data/counterfeit/l+t2r+h/ : Laptop/TV to restaurant/hotel ...

as well as some unions of domains:

* data/union/r+h/
* data/union/l+t/
* data/union/r+h+l/
* data/union/r+h+l+t/

Each example in these files is represented as a 3-element list:

* [MR/Dialogue Act, Human Authored Response, HDC baseline]
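
For illustration, a single entry has the following shape. The dialogue act and responses below are invented examples, not copied from the released files:

    [
        "inform(name='seven hills';area='nob hill')",
        "seven hills is a nice place in the nob hill area .",
        "seven hills is in nob hill ."
    ]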

For more detail on how the datasets were collected, please refer to Wen et al, 2015b and Wen et al, 2016.

Toolkit Overview

The toolkit is implemented in Python. The training of the neural networks is implemented with the Theano library, while the decoding is implemented in NumPy for runtime efficiency. The toolkit supports several RNN-based generators as well as several baselines:

* Model
- (knn) kNN generator:
    k-nearest neighbor example-based generator, based on MR similarity (a rough sketch follows this list).
- (ngram) Class-based Ngram generator [Oh & Rudnicky, 2000]:
    Class-based language model generator using utterance class partitions.
- (hlstm) Heuristic Gated LSTM [Wen et al, 2015a]:
    An MR-conditioned LSTM generator with heuristic gates.
- (sclstm) Semantically Conditioned LSTM [Wen et al, 2015b]:
    An MR-conditioned LSTM generator with learned gates.
- (encdec) Attentive Encoder-Decoder LSTM [Wen et al, 2015c]:
    An encoder-decoder LSTM with slot-value level attention.
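
As a minimal sketch of the idea behind the kNN baseline (illustrative only: approximating MR similarity by Jaccard overlap of slot-value tokens is my assumption, not necessarily the toolkit's metric):

    def knn_generate(mr, examples, k=1):
        # mr: iterable of slot-value tokens from the input meaning representation
        # examples: list of (mr_tokens, human_response) training pairs
        def jaccard(a, b):
            a, b = set(a), set(b)
            return len(a & b) / float(len(a | b))
        # rank training examples by MR similarity, most similar first
        ranked = sorted(examples, key=lambda ex: jaccard(mr, ex[0]), reverse=True)
        # return the responses of the k most similar training examples
        return [resp for _, resp in ranked[:k]]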

* Training Strategy
- (ml) Maximum Likelihood Training, using token cross-entropy
- (dt) Discriminative Training (or Expected BLEU training) [Wen et al, 2016]

* Decoding Strategy (a rough sketch follows this list)
- (beam) Beam search
- (sample) Random sampling
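
As a rough, self-contained illustration of what the two strategies do at each decoding step (this is illustrative NumPy code, not the toolkit's actual implementation):

    import numpy as np

    def softmax(logits):
        # numerically stable softmax over a vocabulary-sized score vector
        e = np.exp(logits - logits.max())
        return e / e.sum()

    def sample_step(logits, rng=np.random):
        # 'sample': draw the next token id from the output distribution
        return rng.choice(len(logits), p=softmax(logits))

    def beam_step(hyps, step_logprobs, beamwidth):
        # 'beam': extend each partial hypothesis and keep only the
        # `beamwidth` highest-scoring hypotheses overall.
        # hyps: list of (token_id_sequence, cumulative_log_prob)
        # step_logprobs[i]: log-prob vector over the vocab for hyps[i]
        candidates = []
        for (seq, score), logp in zip(hyps, step_logprobs):
            for tok in np.argsort(logp)[-beamwidth:]:
                candidates.append((seq + [int(tok)], score + float(logp[tok])))
        candidates.sort(key=lambda c: c[1], reverse=True)
        return candidates[:beamwidth]

In practice decoding is paired with over-generation (see the overgen parameter in the next section): a batch of candidates is generated and then reranked, e.g. by slot error rate.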

Configuration Parameters

Below, the configuration parameters are explained section by section:

* [learn]
- lr            : learning rate of SGD.
- lr_decay      : learning rate decay.
- lr_divide     : the maximum number of times validation performance may
                  get worse before early stopping.
- beta          : regularisation parameter.
- random_seed   : random seed.
- min_impr      : the relative minimal improvement allowed.  
- debug         : debug flag
- llogp         : log prob in the last epoch

* [train_mode]
- mode          : training mode; currently only 'all' is supported
- obj           : training objective, 'ml' or 'dt'
- gamma         : hyperparameter for DT training
- batch         : batch size

* [generator]
- type          : the model type, [hlstm|sclstm|encdec]
- hidden        : hidden layer size

* [data]
- domain        : application domain
- train/valid/test : the dataset files operated on
- vocab         : vocabulary file
- percentage    : the percentage of the train/valid sets used
- wvec          : pretrained word vectors
- model         : the path of the produced model

* [gen]
- topk          : the size of the N-best list returned
- overgen       : the number of candidates to over-generate
- beamwidth     : the beam width used to decode utterances
- detectpairs   : the mapping file for calculating the slot error rate
- verbose       : verbosity level of the model (not supported yet)
- decode        : decoding strategy, 'beam' or 'sample'


Below are the knn/ngram-specific parameters:

* [ngram]
- ngram         : the order N of the N-gram model
- rho           : the number of slots considered when partitioning the dataset
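
Putting the sections together, a config for the sclstm model could look roughly like the sketch below. Only the section and option names come from the documentation above; every value and file path is an illustrative placeholder, not a shipped default:

    [learn]
    lr          = 0.1
    lr_decay    = 0.5
    lr_divide   = 3
    beta        = 0.0000001
    random_seed = 1
    min_impr    = 1.003
    debug       = True
    llogp       = 0

    [train_mode]
    mode        = all
    obj         = ml
    gamma       = 5.0
    batch       = 1

    [generator]
    type        = sclstm
    hidden      = 100

    [data]
    domain      = restaurant
    train       = data/original/restaurant/train.json
    valid       = data/original/restaurant/valid.json
    test        = data/original/restaurant/test.json
    vocab       = vocab/vocab.restaurant
    percentage  = 100
    wvec        = embeddings/wordvec.txt
    model       = model/sclstm-restaurant.model

    [gen]
    topk        = 5
    overgen     = 20
    beamwidth   = 10
    detectpairs = resource/detect.pair
    verbose     = 1
    decode      = beam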

Quick Start

To run ML training:

python main.py -config config/sclstm.cfg -mode train

To run generation:

python main.py -config config/sclstm.cfg -mode test

To run ngram/knn baselines:

python main.py -config config/ngram.cfg -mode ngram
python main.py -config config/knn.cfg   -mode knn

To run training/adaptation/DT training/fine-tuning on an existing model:

python main.py -config config/sclstm-DT.cfg -mode adapt

Note: before you run anything, make sure the configuration variables are properly set.
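
Assuming the .cfg files use standard INI syntax (an assumption here, though consistent with the sections listed above), you can inspect or derive config variants programmatically:

    # sketch: read a config, change the random seed, write a variant
    try:
        from configparser import ConfigParser    # Python 3
    except ImportError:
        from ConfigParser import ConfigParser    # Python 2

    cfg = ConfigParser()
    cfg.read('config/sclstm.cfg')
    print(cfg.get('generator', 'type'))    # e.g. sclstm
    cfg.set('learn', 'random_seed', '2')
    with open('config/sclstm-seed2.cfg', 'w') as f:
        cfg.write(f)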

Benchmark Results

The following benchmark results were produced by training each neural network model with 5 different random seeds (1-5) and selecting the model with the best validation BLEU score. Performance on both the testing and validation sets is shown:

[benchmark results figure]

Bug Report

If you have found any bugs in the code, please contact: thw28 at cam dot ac dot uk

References

If you use any source code or datasets included in this toolkit in your work, please cite the corresponding papers. The BibTeX entries are listed below:

[Wen et al, 2016]:
@inproceedings{wenmultinlg16,
    author    = {Wen, Tsung-Hsien and Ga{\v{s}}i\'c, Milica and Mrk{\v{s}}i\'c, Nikola and M. Rojas-Barahona, Lina and Su, Pei-Hao and Vandyke, David and Young, Steve},
    title     = {Multi-domain Neural Network Language Generation for Spoken Dialogue Systems},
    booktitle = {Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)},
    year      = {2016},
    month     = {June},
    publisher = {Association for Computational Linguistics},
    location  = {San Diego, USA}
}

[Wen et al, 2015a]:
@inproceedings{thwsjy15,
    author    = {Wen, Tsung-Hsien and Ga{\v{s}}i\'c, Milica and Kim, Dongho and Mrk{\v{s}}i\'c, Nikola and Su, Pei-Hao and Vandyke, David and Young, Steve},
    title     = {{Stochastic Language Generation in Dialogue using Recurrent Neural Networks with Convolutional Sentence Reranking}},
    booktitle = {Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL)},
    year      = {2015},
    month     = {September},
    publisher = {Association for Computational Linguistics},
    location  = {Prague, Czech Republic}
}

[Wen et al, 2015b]:
@inproceedings{wensclstm15,
    author    = {Wen, Tsung-Hsien and Ga{\v{s}}i\'c, Milica and Mrk{\v{s}}i\'c, Nikola and Su, Pei-Hao and Vandyke, David and Young, Steve},
    title     = {Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems},
    booktitle = {Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
    year      = {2015},
    month     = {September},
    publisher = {Association for Computational Linguistics},
    location  = {Lisbon, Portugal}
}

[Wen et al, 2015c]:
@article{wenmlsds16,
    author   = {Wen, Tsung-Hsien and Ga{\v{s}}i\'c, Milica and Mrk{\v{s}}i\'c, Nikola and M. Rojas-Barahona, Lina and Su, Pei-Hao and Vandyke, David and Young, Steve},
    title    = {Toward Multi-domain Language Generation using Recurrent Neural Networks},
    journal  = {NIPS Workshop on Machine Learning for Spoken Language Understanding and Interaction},
    year     = {2015},
    month    = {December},
    location = {Montreal, Canada}
}