Nealcly / Mutual

A Dataset for Multi-Turn Dialogue Reasoning

Programming Languages

python

Projects that are alternatives to, or similar to, Mutual

Gossiping Chinese Corpus
Chinese Q&A corpus from the PTT Gossiping board
Stars: ✭ 137 (-24.31%)
Mutual labels:  chatbot, dataset
Chatito
🎯🗯 Generate datasets for AI chatbots, NLP tasks, named entity recognition or text classification models using a simple DSL!
Stars: ✭ 678 (+274.59%)
Mutual labels:  chatbot, dataset
Seq2seqchatbots
A wrapper around tensor2tensor to flexibly train, interact, and generate data for neural chatbots.
Stars: ✭ 466 (+157.46%)
Mutual labels:  chatbot, dataset
Awesome machine learning solutions
A curated list of repositories for my book Machine Learning Solutions.
Stars: ✭ 65 (-64.09%)
Mutual labels:  chatbot, dataset
Know Your Intent
State-of-the-art results in intent classification using Semantic Hashing for three datasets: AskUbuntu, Chatbot and WebApplication.
Stars: ✭ 116 (-35.91%)
Mutual labels:  chatbot, dataset
Chatgirl
ChatGirl is an AI ChatBot based on the TensorFlow Seq2Seq model. (Includes a preprocessed English Twitter dataset, plus training, running, and utility code. Stars welcome!) QQ group: 167122861
Stars: ✭ 105 (-41.99%)
Mutual labels:  chatbot, dataset
Insuranceqa Corpus Zh
🚁 Insurance industry corpus, chatbot
Stars: ✭ 821 (+353.59%)
Mutual labels:  chatbot, dataset
Dialog corpus
Corpora for training Chinese and English dialogue systems (Datasets for Training Chatbot System)
Stars: ✭ 1,662 (+818.23%)
Mutual labels:  chatbot, dataset
Conversation Tensorflow
TensorFlow implementation of Conversation Models
Stars: ✭ 143 (-20.99%)
Mutual labels:  chatbot, dataset
Chatskills
Run and debug Alexa skills on the command-line. Create bots. Run them in Slack. Run them anywhere!
Stars: ✭ 171 (-5.52%)
Mutual labels:  chatbot
Sice
Learning a Deep Single Image Contrast Enhancer from Multi-Exposure Images (TIP 2018)
Stars: ✭ 175 (-3.31%)
Mutual labels:  dataset
Data Science Resources
👨🏽‍🏫You can learn about what data science is and why it's important in today's modern world. Are you interested in data science?🔋
Stars: ✭ 171 (-5.52%)
Mutual labels:  dataset
Dingtalkchatbotsdk
DingTalk group bot (cross-platform .NET)
Stars: ✭ 172 (-4.97%)
Mutual labels:  chatbot
Tock
Tock - the open source conversational AI toolkit
Stars: ✭ 175 (-3.31%)
Mutual labels:  chatbot
Eddi
Scalable Open Source Chatbot Platform. Build multiple Chatbots with NLP, Behavior Rules, API Connector, Templating. Developed in Java, provided with Docker, orchestrated with Kubernetes or Openshift.
Stars: ✭ 171 (-5.52%)
Mutual labels:  chatbot
Deep Reinforcement Learning For Dialogue Generation In Tensorflow
Deep-Reinforcement-Learning-for-Dialogue-Generation-in-tensorflow
Stars: ✭ 178 (-1.66%)
Mutual labels:  chatbot
Transcriberbot
TranscriberBot for Telegram
Stars: ✭ 170 (-6.08%)
Mutual labels:  chatbot
Faker
Faker is a Python package that generates fake data for you.
Stars: ✭ 13,401 (+7303.87%)
Mutual labels:  dataset
Intrinsic Image Popularity
The pytorch code of the paper "Intrinsic Image Popularity Assessment"
Stars: ✭ 179 (-1.1%)
Mutual labels:  dataset
Tensorflow Ml Nlp
Getting started with natural language processing using TensorFlow and machine learning (from logistic regression to a Transformer chatbot)
Stars: ✭ 176 (-2.76%)
Mutual labels:  chatbot

MuTual

MuTual: A Dataset for Multi-Turn Dialogue Reasoning (ACL 2020)

MuTual is a retrieval-based dataset for multi-turn dialogue reasoning, built by modifying Chinese high school English listening comprehension test data. Please see our paper for more details.

We will also release several baselines to facilitate further research (coming soon).

Example

Figure: The process of modifying the listening comprehension test data.

Figure: Examples from the MuTual dataset. All choices are relevant to the context, but only one of them is logically correct. Some negative choices might be reasonable in extreme cases, but the positive one is the most appropriate. Clue words are underlined in purple.

Data statistics

Statistic                     MuTual
Context-Response Pairs        8,860
Avg. Turns per Dialogue       4.73
Avg. Words per Utterance      19.57
Vocabulary Size (Context)     8,809
Vocabulary Size (Response)    8,943
Vocabulary Size (Total)       11,343
# Original Dialogues          6,371
# Original Questions          11,323
# Response Candidates         4

Data template

data/mutual/train, data/mutual/dev and data/mutual/test hold the training, development and test sets, respectively. Each file loads as a dictionary in the following format:

{"answers": "B", 
"options": ["m : so you come to manchester just for watching a concert , do n't you ?", "m : i really want to say that your performance in manchester must will be great !", "m : you come to manchester specially for this friend , so your friendship must be very deep .", "m : is this your first performance in manchester ? i remember you never sang at a high school concert ."], 
"article": "m : hi , della . how long are you going to stay here ? f : only 4 days . i know that 's not long enough , but i have to go to london after the concert here at the weekend . m : i 'm looking forward to that concert very much . can you tell us where you sing in public for the first time ? f : hmm ... at my high school concert , my legs shook uncontrollably and i almost fell . m : i do n't believe that . della , have you been to any clubs in manchester ? f : no , i have n't . but my boyfriend and i are going out this evening . we know manchester has got some great clubs and tomorrow will go to some bars .", 
"id": "dev_1"}

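A minimal loading sketch, assuming each file under a split path holds exactly one JSON object in the format above (the one-object-per-file layout is an assumption, and `load_split` is a hypothetical helper, not part of the repository):

```python
import json
from pathlib import Path


def load_split(split_dir):
    """Return a list of instance dictionaries from one MuTual split.

    Assumption: every regular file in split_dir contains exactly one JSON
    object with the keys "answers", "options", "article" and "id".
    """
    instances = []
    # sorted() gives a deterministic, lexical filename order (dev_1, dev_10, ...).
    for path in sorted(Path(split_dir).iterdir()):
        if path.is_file():
            with path.open(encoding="utf-8") as f:
                instances.append(json.load(f))
    return instances
```

For example, `dev = load_split("data/mutual/dev")` would return a list of dictionaries. Because filenames are read in lexical order, sort by `id` if a numeric order matters.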
options is a list of the four candidate responses.

article is the dialogue context. The speaker tags f and m indicate female and male speakers, respectively.
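To work with individual turns, the context can be split on the speaker tags. This is a hypothetical sketch (`split_turns` and `dialogue_stats` are illustrative names) that assumes the tags always appear as the standalone tokens `m :` and `f :`, as in the sample above. It counts words by whitespace splitting, which may not match how the statistics table was computed.

```python
import re

# Assumption: a speaker tag is a standalone "m" or "f" token followed by " : ".
# Tokens such as "'m" (from "i 'm") do not match, because the tag must be
# preceded by whitespace or the start of the string.
SPEAKER_TAG = re.compile(r"(?:^|\s)([mf]) :\s")


def split_turns(article):
    """Split a MuTual context into a list of (speaker, utterance) pairs."""
    matches = list(SPEAKER_TAG.finditer(article))
    turns = []
    for i, match in enumerate(matches):
        start = match.end()
        # Each utterance runs until the next speaker tag (or the end of the text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(article)
        turns.append((match.group(1), article[start:end].strip()))
    return turns


def dialogue_stats(article):
    """Return (number of turns, average whitespace tokens per utterance)."""
    turns = split_turns(article)
    lengths = [len(utterance.split()) for _, utterance in turns]
    average = sum(lengths) / len(lengths) if lengths else 0.0
    return len(turns), average
```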

answers is the label of the correct option (e.g., "B"). Note that we do not release the correct answers for the test set.
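Because the answer is given as a letter, recovering the gold response text needs a letter-to-index mapping. The sketch below assumes the letters refer to the options in list order (consistent with the dev_1 sample, where "B" selects the second option) and treats a missing or empty answers field as withheld, as on the test set. `gold_option` is a hypothetical helper.

```python
def gold_option(instance):
    """Return the text of the correct option, or None when the answer is withheld.

    Assumes answer letters index the options list in order:
    "A" -> options[0], "B" -> options[1], and so on.
    """
    answer = (instance.get("answers") or "").strip().upper()
    if not answer:
        return None  # no gold label, e.g. a test-set instance
    # Anything other than a single letter is mapped to an invalid index.
    index = ord(answer) - ord("A") if len(answer) == 1 else -1
    if not 0 <= index < len(instance["options"]):
        raise ValueError(f"unexpected answer label: {instance['answers']!r}")
    return instance["options"][index]
```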

To have your test-set results evaluated, please send your predictions (decoded output), a description of your method, and your dev-set performance to [email protected]. Predictions should follow the sample format, one instance per line: id + "\t" + rank1prediction + "\t" + rank2prediction + "\t" + rank3prediction + "\t" + rank4prediction. We will evaluate your results with the Eval Script.
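One way to produce submission lines and score dev predictions locally is sketched below. It assumes each rank prediction is an option letter and that the evaluation uses ranking metrics such as recall@1, recall@2 and mean reciprocal rank (MRR). It is not the official Eval Script, and `format_prediction` and `evaluate` are hypothetical names.

```python
LETTERS = "ABCD"


def format_prediction(instance_id, scores):
    """Return one submission line: the id, then option letters ranked best-first.

    scores holds one model score per option, aligned with the options list;
    higher means more likely to be correct. Ties keep the original option order.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return "\t".join([instance_id] + [LETTERS[i] for i in ranked])


def evaluate(lines, gold):
    """Compute R@1, R@2 and MRR for submission lines.

    gold maps each instance id to its correct option letter.
    """
    hits_at_1 = hits_at_2 = 0
    reciprocal_ranks = 0.0
    for line in lines:
        instance_id, *ranking = line.rstrip("\n").split("\t")
        rank = ranking.index(gold[instance_id]) + 1  # 1-based rank of the gold letter
        if rank == 1:
            hits_at_1 += 1
        if rank <= 2:
            hits_at_2 += 1
        reciprocal_ranks += 1.0 / rank
    n = len(lines)
    return {"R@1": hits_at_1 / n, "R@2": hits_at_2 / n, "MRR": reciprocal_ranks / n}
```

Here recall@k is the fraction of instances whose correct option appears in the top k positions, and MRR averages 1/rank of the correct option.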

Reference

If you find the corpus or our analysis helpful in your research, please cite our paper:

@inproceedings{mutual,
    title = "MuTual: A Dataset for Multi-Turn Dialogue Reasoning",
    author = "Cui, Leyang and Wu, Yu and Liu, Shujie and Zhang, Yue and Zhou, Ming",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    year = "2020",
    publisher = "Association for Computational Linguistics",
}

Please feel free to contact me ([email protected]) if you need any further information.

Acknowledgement

We thank Qingkai Min for helping us to build the leaderboard.
