jfainberg / Self_dialogue_corpus

Licence: bsd-3-clause

The Self-dialogue Corpus - a collection of self-dialogues across music, movies and sports

Programming language: python

Labels: nlp, dialogue

The Self-dialogue Corpus

This is an early release of the Self-dialogue Corpus containing 24,165 conversations, or 3,653,313 words, across 23 topics. For more information on the data, please see our corpus paper or our submission to the Alexa Prize.

Statistics

Category	Count
Topics	23
Conversations	24,165
Words	3,653,313
Turns	141,945
Unique users	2,717
Conversations per user	~9
Unique tokens	117,068

Topics include movies, music, sports, and subtopics within these.
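The headline counts in the table imply a few per-conversation and per-user averages. The sketch below only derives ratios from the figures listed above; the dictionary keys and the function name are illustrative and are not part of the repository.

```python
# Figures copied from the Statistics table above.
STATS = {
    "conversations": 24_165,
    "words": 3_653_313,
    "turns": 141_945,
    "unique_users": 2_717,
}


def derived_averages(stats):
    """Return simple ratios implied by the headline counts."""
    return {
        "conversations_per_user": stats["conversations"] / stats["unique_users"],
        "turns_per_conversation": stats["turns"] / stats["conversations"],
        "words_per_conversation": stats["words"] / stats["conversations"],
        "words_per_turn": stats["words"] / stats["turns"],
    }


if __name__ == "__main__":
    for name, value in derived_averages(STATS).items():
        print(f"{name}: {value:.2f}")
```

The conversations-per-user ratio rounds to the ~9 reported in the table.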

Using the data

corpus contains the raw CSVs from Amazon Mechanical Turk, sorted by individual tasks (topics);
blocked_workers.txt lists workers who did not comply with the requirements of the tasks; these workers are omitted by default;
get_data.py is a preprocessing script that formats the CSVs into text (saved to dialogues by default) and accepts various options (see below).
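As a rough illustration of what omitting blocked workers could look like when working with the raw CSVs directly, here is a minimal sketch. It is not taken from get_data.py. It assumes that blocked_workers.txt holds one worker ID per line and that the CSVs carry a `WorkerId` column, as standard Mechanical Turk result files do; both are assumptions to check against the actual files, and the sample IDs are made up.

```python
import csv
import io


def load_blocked_ids(text):
    """Parse blocked worker IDs, assuming one ID per line (assumption)."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def drop_blocked_rows(csv_text, blocked_ids, worker_column="WorkerId"):
    """Return CSV rows whose worker is not blocked.

    worker_column defaults to "WorkerId", which is an assumption about
    the corpus CSV header rather than something the README states.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row.get(worker_column) not in blocked_ids]


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    blocked = load_blocked_ids("W_BAD_1\nW_BAD_2\n")
    sample = "WorkerId,Text\nW_OK,hello\nW_BAD_1,spam\n"
    for row in drop_blocked_rows(sample, blocked):
        print(row["WorkerId"], row["Text"])
```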

`get_data.py`

Example usage: python get_data.py. By default, this reads from corpus and writes to dialogues.

Optional arguments:

--inDir directory to read the corpus from;
--outDir directory to write the processed files to;
--output-naming whether to name output files with integers (integer) or by assignment_id (assignment_id);
--remove-punctuation removes punctuation from the output;
--set-case sets case of output to original, upper or lower;
--exclude-topic excludes any of the topics (or subdirectories of corpus), e.g. --exclude-topic music;
--include-only includes only the given topics, e.g. --include-only music.
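The options above can be combined in a single call. The following sketch assembles such a command line from Python; the helper `build_get_data_command` and its parameter names are hypothetical, and only the flags and values listed above are used. Whether --exclude-topic and --include-only accept several topics at once is not stated here, so the sketch passes a single topic to each.

```python
def build_get_data_command(in_dir=None, out_dir=None, output_naming=None,
                           remove_punctuation=False, set_case=None,
                           exclude_topic=None, include_only=None):
    """Build an argument list for get_data.py (hypothetical helper)."""
    if output_naming not in (None, "integer", "assignment_id"):
        raise ValueError("output_naming must be 'integer' or 'assignment_id'")
    if set_case not in (None, "original", "upper", "lower"):
        raise ValueError("set_case must be 'original', 'upper' or 'lower'")

    cmd = ["python", "get_data.py"]
    # Only options that were explicitly set are added, so the script's
    # own defaults (read from corpus, write to dialogues) still apply.
    if in_dir is not None:
        cmd += ["--inDir", in_dir]
    if out_dir is not None:
        cmd += ["--outDir", out_dir]
    if output_naming is not None:
        cmd += ["--output-naming", output_naming]
    if remove_punctuation:
        cmd.append("--remove-punctuation")
    if set_case is not None:
        cmd += ["--set-case", set_case]
    if exclude_topic is not None:
        cmd += ["--exclude-topic", exclude_topic]
    if include_only is not None:
        cmd += ["--include-only", include_only]
    return cmd


if __name__ == "__main__":
    # Lower-cased, punctuation-free dialogues for the music topic only.
    print(" ".join(build_get_data_command(
        remove_punctuation=True, set_case="lower", include_only="music")))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` from the repository root to run the script.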

Citation

For research using this data, please cite:

@article{fainberg2018talking,
  title={Talking to myself: self-dialogues as data for conversational agents},
  author={Fainberg, Joachim and Krause, Ben and Dobre, Mihai and Damonte, Marco and Kahembwe, Emmanuel and Duma, Daniel and Webber, Bonnie and Fancellu, Federico},
  journal={arXiv preprint arXiv:1809.06641},
  year={2018}
}
@article{krause2017edina,
  title={Edina: Building an Open Domain Socialbot with Self-dialogues},
  author={Krause, Ben and Damonte, Marco and Dobre, Mihai and Duma, Daniel and Fainberg, Joachim and Fancellu, Federico and Kahembwe, Emmanuel and Cheng, Jianpeng and Webber, Bonnie},
  journal={Alexa Prize Proceedings},
  year={2017}
}
