Wluper / Matilda

Licence: GPL-2.0
LIDA: Lightweight Interactive Dialogue Annotator (EMNLP 2019)

Programming Languages

javascript
python

Projects that are alternatives of or similar to Matilda

Paper Reading
Paper reading list in natural language processing, including dialogue systems and text generation related topics.
Stars: ✭ 508 (+306.4%)
Mutual labels:  dialogue-systems
Moel
MoEL: Mixture of Empathetic Listeners
Stars: ✭ 38 (-69.6%)
Mutual labels:  dialogue-systems
Atis dataset
The ATIS (Airline Travel Information System) Dataset
Stars: ✭ 81 (-35.2%)
Mutual labels:  dialogue-systems
Conv Emotion
This repo contains implementations of different architectures for emotion recognition in conversations.
Stars: ✭ 646 (+416.8%)
Mutual labels:  dialogue-systems
Augmented seq2seq
Enhanced seq2seq model for open-ended dialogue generation
Stars: ✭ 29 (-76.8%)
Mutual labels:  dialogue-systems
Convai Baseline
ConvAI baseline solution
Stars: ✭ 49 (-60.8%)
Mutual labels:  dialogue-systems
Multiwoz
Source code for end-to-end dialogue model from the MultiWOZ paper (Budzianowski et al. 2018, EMNLP)
Stars: ✭ 384 (+207.2%)
Mutual labels:  dialogue-systems
Lic2019 Competition
2019 Language and Intelligence Technology Competition: knowledge-graph-based proactive conversation
Stars: ✭ 109 (-12.8%)
Mutual labels:  dialogue-systems
Conversational Ai
Conversational AI Reading Materials
Stars: ✭ 34 (-72.8%)
Mutual labels:  dialogue-systems
Dialogue Understanding
This repository contains the PyTorch implementation of the baseline models from the paper Utterance-level Dialogue Understanding: An Empirical Study
Stars: ✭ 77 (-38.4%)
Mutual labels:  dialogue-systems
Chatbot cn
A chatbot for the finance and judicial domains (with some chit-chat ability). Its main modules include information extraction, NLU, NLG, and a knowledge graph; the front-end display is integrated via Django, and RESTful interfaces for the NLP and KG components are already provided.
Stars: ✭ 791 (+532.8%)
Mutual labels:  dialogue-systems
Knowledge Graphs
A collection of research on knowledge graphs
Stars: ✭ 845 (+576%)
Mutual labels:  dialogue-systems
Convai Bot 1337
NIPS Conversational Intelligence Challenge 2017 Winner System: Skill-based Conversational Agent with Supervised Dialog Manager
Stars: ✭ 65 (-48%)
Mutual labels:  dialogue-systems
Deeppavlov
An open source library for deep learning end-to-end dialog systems and chatbots.
Stars: ✭ 5,525 (+4320%)
Mutual labels:  dialogue-systems
Crd3
The repo containing the Critical Role Dungeons and Dragons Dataset.
Stars: ✭ 83 (-33.6%)
Mutual labels:  dialogue-systems
Rnnlg
RNNLG is an open source benchmark toolkit for Natural Language Generation (NLG) in spoken dialogue system application domains. It is released by Tsung-Hsien (Shawn) Wen from Cambridge Dialogue Systems Group under Apache License 2.0.
Stars: ✭ 487 (+289.6%)
Mutual labels:  dialogue-systems
Letsgodataset
This repository makes the complete Let's Go dataset publicly available.
Stars: ✭ 41 (-67.2%)
Mutual labels:  dialogue-systems
Awesome Emotion Recognition In Conversations
A comprehensive reading list for Emotion Recognition in Conversations
Stars: ✭ 111 (-11.2%)
Mutual labels:  dialogue-systems
Cakechat
CakeChat: Emotional Generative Dialog System
Stars: ✭ 1,361 (+988.8%)
Mutual labels:  dialogue-systems
Korean restaurant reservation
A Korean restaurant reservation dialogue system based on Hybrid Code Networks.
Stars: ✭ 73 (-41.6%)
Mutual labels:  dialogue-systems

Wluper

LIDA: Lightweight Interactive Dialogue Annotator

Authors: Ed Collins, Nikolai Rozanov, Bingbing Zhang

Contact: [email protected]

Paper: https://www.aclweb.org/anthology/D19-3021 (EMNLP-IJCNLP 2019; full citation below)

LIDA is an open-source dialogue annotation system that supports the full dialogue annotation pipeline: from segmenting raw text (such as the output of a transcription service) into turns and dialogues, to labelling structured conversation data, to resolving inter-annotator disagreements. LIDA supports the integration of arbitrary machine learning (ML) models as annotation recommenders to speed up annotation, and more generally of any system that conforms to the required API.

LIDA was designed with three use cases in mind:

  1. Experimenting With Dialogue Systems: users can integrate a dialogue system into LIDA's back end and then use LIDA's front end to talk to the dialogue system and relabel the things it gets wrong. Users can then download these interactions as a JSON file to use as test cases for future versions of the system.

  2. Creating New Dialogue Datasets: users can create a blank dialogue in LIDA's front end, then enter and label queries. They can specify arbitrary ML models in the back end, and each entered query will automatically be run through all of them.

  3. Labeling Existing Dialogue Datasets: users can upload either raw .txt or .json files by dragging and dropping them onto LIDA's home screen. If the file is a .txt file, the user is taken to the turn and dialogue segmentation screen to split the text file into turns and dialogues. If the file is a .json file, it must be in the correct format (described below). Users can then label their uploaded data using LIDA's front end. Once annotations have been obtained, LIDA's inter-annotator disagreement resolution screen can be used to resolve conflicts between annotators.

[Screenshot: the annotator screen]

[Screenshot: the inter-annotator screen]

Installation

LIDA is a client-server app. The server is written in Python with the Flask web framework. The front end is written in HTML/CSS/Vue.js and communicates with the back end via a RESTful API. To run LIDA, first start the Flask server on your local machine (or wherever you want the back end to run). You will need Python 3.6 or above installed for the server to run.

Downloading & Installing Requirements

It is strongly recommended that you clone into a Python virtual environment:

$ mkdir LIDA/
$ python3 -m venv LIDA/
$ cd LIDA/ && source bin/activate
(LIDA)$ git clone https://github.com/Wluper/lida.git
(LIDA)$ cd lida/
(LIDA)$ pip3 install -r requirements.txt

Running the Main Server

Assuming you have just followed the steps in Downloading & Installing Requirements:

(LIDA)$ pwd
~/LIDA/lida
(LIDA)$ cd server/
(LIDA)$ python lida_app.py

The Flask server should now be running in the terminal on port 5000.

Running the Inter-Annotator Disagreement Resolution Server

Assuming you have just followed the steps in Downloading & Installing Requirements:

(LIDA)$ pwd
~/LIDA/lida
(LIDA)$ cd server/
(LIDA)$ python interannotator_app.py

The Flask server should now be running in the terminal on port 5000.

Running the Front End

Simply double-click gui/index.html for the main LIDA app, and gui/admin.html for the inter-annotator disagreement resolution page.

Adding Custom Labels

LIDA Main Tool

All configuration changes that you may wish to make to LIDA can be done in the file server/annotator_config.py. This script contains a configuration dictionary that describes which labels will appear in LIDA's front end.

You can currently add three different types of new labels to LIDA:

  1. multilabel_classification :: displays as checkboxes, of which one or more can be selected.

  2. multilabel_classification_string :: displays as checkboxes with values next to them, plus text input fields for a string. This kind of label is suited to slot-value pairs in dialogue state tracking, where you have a slot name (a classification) and a value (an arbitrary string).

  3. string :: will display underneath the user's utterance as a string response. This is the label field that would be used for a response to the user's query.

To add a new label, add a new entry to the configDict in server/annotator_config.py. The key should be the name of the label; the value should be a dictionary with a label_type field, a boolean required field defining whether the label is mandatory, and a labels field listing the possible label values (not applicable to labels of type string).

You can optionally add a description field, and a model field that provides a recommender for the label (see below for the API requirements). You can see examples of all label types in server/annotator_config.py; a sketch of a complete entry follows.
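As a minimal sketch, a configDict entry might look like the following. The field names (label_type, required, labels, description, model) follow the description above; the label name, values, and recommender class are hypothetical, purely for illustration:

    # server/annotator_config.py (sketch): a hypothetical "sentiment" label.
    # Field names follow the documentation above; values are illustrative.
    configDict = {
        "sentiment": {
            "description": "Overall sentiment of the user's utterance.",  # optional
            "label_type": "multilabel_classification",
            "required": True,
            "labels": ["positive", "neutral", "negative"],
            # Optional recommender; must expose transform() as described in
            # "Adding ML Models As Recommenders" below.
            # "model": KeywordSentimentRecommender(),
        },
    }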

[Screenshot: the annotator config file]

LIDA Interannotator Tool

All configuration for the inter-annotator tool is done in server/interannotator_config.py.

It currently allows you to modify the following:

  1. How disagreements between annotators are treated.

  2. How agreement scores are calculated.

Adding ML Models As Recommenders

Recommenders, like labels, are configured in server/annotator_config.py.

To add a recommender, add a field called "model" to the entry of the config dict that you want a recommender for. The value of this field must be a Python object that conforms to the interface defined below.

Any recommender you add to LIDA must conform to the following API: each recommender is a Python object that has a method called transform:

transform(sent: str) -> List[str] or List[Tuple[str, str]] or str

That is, your recommender only needs to provide a method called transform that takes a single string as input and returns predicted labels. The predictions must conform to the element's label_type (a complete, runnable sketch follows this list). Concretely:

  • If the element's label_type is multilabel_classification, then the transform() method needs to return a list of strings (i.e. a list of the labels for the string). For example, for sentiment classification this may look like:

    predictor.transform("I liked the movie") -> ["positive"]

  • If the element's label_type is multilabel_classification_string, then the transform() method needs to return a list of tuples, where each tuple consists of two strings (i.e. a list of slots and values). For example, for hotel belief state tracking this may look like:

    predictor.transform("I want a hotel for 5 people") -> [("hotel-book people", "5")]

  • If the element's label_type is string, then the transform() method needs to also return a string. For example, you could add a dialogue system to LIDA using this label type:

    dialogue_system.transform("I want a hotel") -> "What area of town?"
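To make this concrete, here is a minimal sketch of a recommender for a multilabel_classification label. The class name and the keyword heuristic are hypothetical; the only requirement LIDA imposes is the transform method described above:

    from typing import List

    class KeywordSentimentRecommender:
        """Hypothetical recommender for a multilabel_classification label."""

        def transform(self, sent: str) -> List[str]:
            # Trivial keyword heuristic, purely illustrative; a real
            # recommender would wrap an actual ML model here.
            lowered = sent.lower()
            labels = []
            if any(w in lowered for w in ("like", "love", "great")):
                labels.append("positive")
            if any(w in lowered for w in ("hate", "bad", "awful")):
                labels.append("negative")
            return labels or ["neutral"]

An instance of such a class would then be set as the "model" field of the corresponding configDict entry.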

You can see more examples of this in server/dummy_models.py, and see how they are integrated into LIDA's back end in the current server/annotator_config.py script.

[Screenshot: dummy models in server/dummy_models.py]

Uploading JSON File Format

If you upload a JSON file representing a dialogue to be labelled, then it must have the following properties:

  • The file is a dict whose keys are the names of each dialogue and whose values are lists.

  • Each value is a list of dictionaries, where each dictionary contains a number of key-value pairs which are used to display the dialogue data for annotation.

  • Some key-value pairs are compulsory for the dialogue to display correctly; these are defined in the annotator_config.py file in the server folder.

  • By default, the only required key-value pair in each turn is called usr and should be the user's query as a string.

An example of data in the correct form can be seen in server/dummy_data.json.
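For orientation, here is a minimal sketch of the expected structure. The dialogue names and utterances are invented; by default only the usr key is required in each turn:

    {
        "dialogue_1": [
            {"usr": "I want a hotel for 5 people"},
            {"usr": "Somewhere in the centre, please"}
        ],
        "dialogue_2": [
            {"usr": "I liked the movie"}
        ]
    }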

[Screenshot: JSON format example]

Citation

The official citation from EMNLP-IJCNLP 2019 in Hong Kong. Please cite this paper when using LIDA:

@inproceedings{collins-etal-2019-lida,
    title = "{LIDA}: Lightweight Interactive Dialogue Annotator",
    author = "Collins, Edward  and
      Rozanov, Nikolai  and
      Zhang, Bingbing",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/D19-3021",
    doi = "10.18653/v1/D19-3021",
    pages = "121--126",
    abstract = "Dialogue systems have the potential to change how people interact with machines but are highly dependent on the quality of the data used to train them.It is therefore important to develop good dialogue annotation tools which can improve the speed and quality of dialogue data annotation. With this in mind, we introduce LIDA, an annotation tool designed specifically for conversation data. As far as we know, LIDA is the first dialogue annotation system that handles the entire dialogue annotation pipeline from raw text, as may be the output of transcription services, to structured conversation data. Furthermore it supports the integration of arbitrary machine learning mod-els as annotation recommenders and also has a dedicated interface to resolve inter-annotator disagreements such as after crowdsourcing an-notations for a dataset. LIDA is fully open source, documented and publicly available.[https://github.com/Wluper/lida] {--}{\textgreater} Screen Cast: https://vimeo.com/329824847",
}