candlewill / Dialog_corpus

Datasets for training Chinese and English chatbot (dialogue) systems

Programming Languages

python

Projects that are alternatives to or similar to Dialog_corpus

Gossiping Chinese Corpus
Chinese question-answer corpus from the PTT Gossiping board
Stars: ✭ 137 (-91.76%)
Mutual labels:  chatbot, dataset, corpus, dialog
Seq2seqchatbots
A wrapper around tensor2tensor to flexibly train, interact, and generate data for neural chatbots.
Stars: ✭ 466 (-71.96%)
Mutual labels:  chatbot, dataset, dialog
Insuranceqa Corpus Zh
🚁 Insurance-industry corpus for chatbots
Stars: ✭ 821 (-50.6%)
Mutual labels:  chatbot, dataset, corpus
Seq2seq Chatbot
Chatbot in 200 lines of code using TensorLayer
Stars: ✭ 777 (-53.25%)
Mutual labels:  chatbot, corpus
Chatito
🎯🗯 Generate datasets for AI chatbots, NLP tasks, named entity recognition or text classification models using a simple DSL!
Stars: ✭ 678 (-59.21%)
Mutual labels:  chatbot, dataset
Nlp chinese corpus
Large Scale Chinese Corpus for NLP
Stars: ✭ 6,656 (+300.48%)
Mutual labels:  dataset, corpus
Medical-Names-Corpus
Medical corpus: names of medical institutions and national drug standard codes.
Stars: ✭ 26 (-98.44%)
Mutual labels:  corpus, dataset
Coarij
Corpus of Annual Reports in Japan
Stars: ✭ 55 (-96.69%)
Mutual labels:  dataset, corpus
Company Names Corpus
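Corpus of company and organization names: company short names, abbreviations, brand terms, and enterprise names. Useful for Chinese word segmentation and organization named-entity recognition.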
Stars: ✭ 868 (-47.77%)
Mutual labels:  dataset, corpus
Awesome Hungarian Nlp
A curated list of NLP resources for Hungarian
Stars: ✭ 121 (-92.72%)
Mutual labels:  dataset, corpus
Dataset List
lists of text corpus and more (mainly Japanese)
Stars: ✭ 84 (-94.95%)
Mutual labels:  dataset, corpus
Cluepretrainedmodels
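A collection of high-quality Chinese pre-trained models: state-of-the-art large models, the fastest small models, and models specialized for similarity.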
Stars: ✭ 493 (-70.34%)
Mutual labels:  dataset, corpus
Awesome machine learning solutions
A curated list of repositories for my book Machine Learning Solutions.
Stars: ✭ 65 (-96.09%)
Mutual labels:  chatbot, dataset
Chatgirl
ChatGirl is an AI ChatBot based on the TensorFlow Seq2Seq model. It includes a preprocessed English Twitter dataset along with training, inference, and tool code. Stars are welcome. QQ group: 167122861
Stars: ✭ 105 (-93.68%)
Mutual labels:  chatbot, dataset
Fakenewscorpus
A dataset of millions of news articles scraped from a curated list of data sources.
Stars: ✭ 255 (-84.66%)
Mutual labels:  dataset, corpus
Chatterbot Corpus
A multilingual dialog corpus
Stars: ✭ 964 (-42%)
Mutual labels:  corpus, dialog
Personalized Dialog
Code for the paper 'Personalization in Goal-oriented Dialog' (NeurIPS 2017 Conversational AI Workshop)
Stars: ✭ 109 (-93.44%)
Mutual labels:  dataset, dialog
dialogue-datasets
collect the open dialog corpus and some useful data processing utils.
Stars: ✭ 24 (-98.56%)
Mutual labels:  dialog, corpus
Species-Names-Corpus
Species-name corpus: plant names and animal names.
Stars: ✭ 23 (-98.62%)
Mutual labels:  corpus, dataset
Watbot
An Android ChatBot powered by IBM Watson Services (Assistant V1, Text-to-Speech, and Speech-to-Text with Speaker Recognition) on IBM Cloud.
Stars: ✭ 64 (-96.15%)
Mutual labels:  chatbot, dialog

Chinese and English corpora for dialogue systems

Datasets for Training Chatbot System
This project collects dialogue corpora found on the web for training Chinese (and English) chatbots.

Public corpora

The datasets collected so far are listed below. Click a link to go to the original source.

  1. dgk_shooter_min.conv.zip
    Chinese movie dialogue corpus. It is fairly noisy, and many lines are not correctly paired as question and answer.

  2. The NUS SMS Corpus
    Contains Chinese and English SMS messages. It is reportedly the largest public SMS corpus in the world.

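  3. ChatterBot basic Chinese chat corpus
    A small set of basic Chinese chat data provided by the ChatterBot chat engine. The volume is small, but the quality is relatively high.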

  4. Datasets for Natural Language Processing
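    A collection of NLP datasets gathered by someone else, covering three areas: Question Answering, Dialogue Systems, and Goal-Oriented Dialogue Systems. All of it is English text, which can be machine-translated into Chinese for use in Chinese dialogue systems.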

  5. Xiaohuangji (小黄鸡)
    Rumored to be the Xiaohuangji corpus: xiaohuangji50w_fenciA.conv.zip (word-segmented) and xiaohuangji50w_nofenci.conv.zip (unsegmented)

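  6. Egret Technology (白鹭时代) Chinese Q&A corpus
    Compiled from the 10,000+ questions in the Q&A board of the official Egret Technology forum, keeping only the records marked with a "best answer". The raw data was reviewed manually so that each question has one acceptable answer. The corpus currently contains only 2907 question-answer pairs. (backup)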

  7. Chat corpus repository
    chat corpus collection from various open sources
    Includes: open subtitles, English movie subtitles, Chinese song lyrics, and English tweets

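  8. Insurance-industry QA corpus
    A dataset produced by translating insuranceQA. train_data has 12,889 questions and 141,779 records, positive:negative = 1:10; test_data has 2,000 questions and 22,000 records, positive:negative = 1:10; valid_data has 2,000 questions and 22,000 records, positive:negative = 1:10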
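
The figures quoted for the insurance QA corpus can be sanity-checked with a little arithmetic. The sketch below assumes that every question is paired with exactly one positive answer and ten negative answers. That reading of the 1:10 ratio is an assumption, not something the dataset description confirms. The names `SPLITS`, `expected_rows`, and `check_splits` are made up for illustration.

```python
# Question and record counts quoted in the list above, per split.
SPLITS = {
    "train_data": (12_889, 141_779),
    "test_data": (2_000, 22_000),
    "valid_data": (2_000, 22_000),
}

# Assumption: each question has one positive and ten negative candidates.
POSITIVES_PER_QUESTION = 1
NEGATIVES_PER_QUESTION = 10


def expected_rows(num_questions: int) -> int:
    """Records expected if every question has 1 positive and 10 negatives."""
    return num_questions * (POSITIVES_PER_QUESTION + NEGATIVES_PER_QUESTION)


def check_splits(splits: dict) -> dict:
    """Map each split name to whether its quoted record count matches."""
    return {
        name: expected_rows(questions) == rows
        for name, (questions, rows) in splits.items()
    }


if __name__ == "__main__":
    for name, ok in check_splits(SPLITS).items():
        print(name, "consistent" if ok else "inconsistent")
```

Under this assumption, a split with N questions should hold 11 × N records, so the check only compares the quoted record counts against that product.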

Unpublished corpora

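These corpora circulate on the web, but we have not yet obtained them, either because of our own limitations or because the original authors have not released them. They are listed here so that the search can continue later.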

  1. Microsoft XiaoIce (微软小冰)

Copyright

All original corpora belong to their original authors.

Contact

Yunchao He (何云超)
weibo: @Yunchao_He