ymcui / Chinese Rc Datasets

Licence: cc-by-sa-4.0

Collections of Chinese reading comprehension datasets

Chinese Machine Reading Comprehension Datasets

Note that this repository is updated irregularly.

If you find this repository helpful, please star it. If you would like to use or repost its content, please credit the original author and include a link to the source.

Content

Section	Description
Chinese Reading Comprehension Datasets	Describe public Chinese RC datasets
State-of-the-art Systems	State-of-the-art systems and results
Chinese Reading Comprehension Evaluations and Competitions	Introductions to Chinese RC competitions

Chinese Reading Comprehension Datasets

Here I list several publicly available Chinese reading comprehension datasets, each accompanied by a technical report or paper. If I have missed something, feel free to let me know. Unless otherwise indicated, the datasets are in simplified Chinese.

Dataset	Genre	Query Type	Answer Type	Document #	Query #	Download
People Daily & Children's Fairy Tale [1]	news & tale	Cloze	word	28K	100K	link
WebQA [2]	Web	User log	entity	-	42K	link
CMRC 2017 [3]	news	Cloze & Query	word	-	364K	link
DuReader [4]	Web	User log	free form	1M	200K	link
CMRC 2018 [5]	Wiki	Query	Span	-	18K	link
DRCD [6] (traditional Chinese)	Wiki	Query	Span	-	34K	link
C^3 [7]	mixed	Query	choice	14K	24K	link
CMRC 2019 [8]	Story	cloze	Sentence	1K	100K	link
ChID [9]	varies	cloze	idiom	580K	729K	link

[1] (Cui et al., 2016) Consensus Attention-based Neural Networks for Chinese Reading Comprehension. In COLING 2016. https://aclanthology.info/papers/C16-1167/c16-1167

[2] (Li et al., 2016) Dataset and Neural Recurrent Sequence Labeling Model for Open-Domain Factoid Question Answering. In arXiv. https://arxiv.org/abs/1607.06275

[3] (Cui et al., 2018) Dataset for the First Evaluation on Chinese Machine Reading Comprehension. In LREC 2018. http://www.lrec-conf.org/proceedings/lrec2018/summaries/32.html

[4] (He et al., 2018) DuReader: a Chinese Machine Reading Comprehension Dataset from Real-world Applications. In ACL 2018 MRQA Workshop. https://aclanthology.info/papers/W18-2605/w18-2605

[5] (Cui et al., 2018) A Span-Extraction Dataset for Chinese Machine Reading Comprehension. In arXiv. https://arxiv.org/abs/1810.07366

[6] (Shao et al., 2018) DRCD: a Chinese Machine Reading Comprehension Dataset. In arXiv. https://arxiv.org/abs/1806.00920

[7] (Sun et al., 2019) Probing Prior Knowledge Needed in Challenging Chinese Machine Reading Comprehension. https://arxiv.org/abs/1904.09679

[8] (Cui et al., 2019) https://github.com/ymcui/cmrc2019

[9] (Zheng et al., 2019) ChID: A Large-scale Chinese IDiom Dataset for Cloze Test. https://aclweb.org/anthology/papers/P/P19/P19-1075/
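
When choosing a dataset for a given task, it can help to treat the dataset table above as structured records. The following is a minimal sketch: the `RCDataset` class and the two helper functions are hypothetical names introduced here for illustration, and the rows are copied from the table with sizes and download links omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RCDataset:
    """One row of the dataset table (hypothetical helper type)."""
    name: str
    genre: str
    query_type: str
    answer_type: str


# Rows copied from the dataset table; sizes and links are left out.
DATASETS = [
    RCDataset("People Daily & Children's Fairy Tale", "news & tale", "Cloze", "word"),
    RCDataset("WebQA", "Web", "User log", "entity"),
    RCDataset("CMRC 2017", "news", "Cloze & Query", "word"),
    RCDataset("DuReader", "Web", "User log", "free form"),
    RCDataset("CMRC 2018", "Wiki", "Query", "Span"),
    RCDataset("DRCD", "Wiki", "Query", "Span"),
    RCDataset("C^3", "mixed", "Query", "choice"),
    RCDataset("CMRC 2019", "Story", "cloze", "Sentence"),
    RCDataset("ChID", "varies", "cloze", "idiom"),
]


def by_answer_type(answer_type: str) -> List[str]:
    """Return names of datasets whose answer type matches, ignoring case."""
    wanted = answer_type.casefold()
    return [d.name for d in DATASETS if d.answer_type.casefold() == wanted]


def cloze_datasets() -> List[str]:
    """Return names of datasets whose query type mentions cloze."""
    return [d.name for d in DATASETS if "cloze" in d.query_type.casefold()]


if __name__ == "__main__":
    print(by_answer_type("span"))
    print(cloze_datasets())
```

Matching is case-insensitive because the table mixes "Cloze" and "cloze" in the Query Type column.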

State-of-the-art Systems

Here I list several state-of-the-art systems (published or unpublished) for these datasets. I have very likely missed some, so feel free to report new entries on the Issues tab.

People Daily & Children's Fairy Tale

System	PD-DEV	PD-TEST	CFT-TEST-AUTO	CFT-TEST-HUMAN	Note
SAW Reader (Zhang et al., 2018)	72.8	75.1	-	43.8	-
CAW Reader (Zhang et al., 2018)	69.4	70.5	-	39.7	-
CAS Reader (Cui et al., 2016)	65.2	68.1	41.3	35.0	-
AS Reader (Cui et al., 2016)	64.1	67.2	40.9	33.1	-

CMRC 2017

Leaderboard: https://hfl-rc.github.io/cmrc2017/leaderboard/

Cloze Track

System	DEV	TEST	Note
6ESTATES PTE LTD (ensemble)	81.85	81.90	-
SJTU BCMI-NLP (ensemble)	78.35	80.67	-
YunSiChuangZhi (ensemble)	79.20	80.27	-
SAW Reader (Zhang et al., 2018)	78.95	78.80	-
CAW Reader (Zhang et al., 2018)	77.95	78.50	-
Word + Char + BPE-FRQ (Zhang et al., 2018)	79.05	78.83	-

User Query Track

System	DEV	TEST	Note
ECNU (ensemble)	90.45	69.53	-
SXU-3 (single model)	47.80	49.07	-
ZZU (single model)	31.10	32.53	-

DuReader

Leaderboard: http://ai.baidu.com/broad/leaderboard?dataset=dureader

System	ROUGE-L	BLEU-4	Note
AliReader	63.48	61.54	-
NI-Reader (ensemble)	63.38	59.23	-
mrc_try_mingyan (single model)	62.20	59.72	-
(Yan et al., 2018)	50.71	49.39	-
(Li et al., 2018)	44.95	42.68	-
(Wang et al., 2018)	44.18	40.97	-
(Xu et al., 2018)	39.60	34.76	-
Match-LSTM (He et al., 2018)	39.2	31.9	-
BiDAF (He et al., 2018)	39.0	31.8	-
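
The DuReader results above are reported as ROUGE-L and BLEU-4. As a rough illustration of what ROUGE-L measures, the sketch below scores a candidate answer against a single reference using the longest common subsequence (LCS) of their characters. It is a simplified sketch, not the official DuReader scorer: treating each character as a token, using a single reference, and the default `beta = 1.0` are all assumptions made here.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str, beta: float = 1.0) -> float:
    """Character-level ROUGE-L F-measure in [0, 1].

    beta weights recall relative to precision; beta = 1.0 (an assumption
    here) gives the balanced harmonic mean.
    """
    if not candidate or not reference:
        return 0.0
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)


if __name__ == "__main__":
    print(rouge_l("abcd", "abed"))
```

The scores in the table are on a 0 to 100 scale, so a value from this function would be multiplied by 100 before comparison.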

CMRC 2018

Leaderboard: https://hfl-rc.github.io/cmrc2018/open_challenge/

System	DEV-EM	DEV-F1	TEST-EM	TEST-F1	CHALLENGE-EM	CHALLENGE-F1	Note
P-Reader (single model)	59.894	81.499	65.189	84.386	15.079	39.583	-
GM-Reader (ensemble)	58.931	80.069	64.045	83.046	15.675	37.315	-
MCA-Reader (ensemble)	66.698	85.538	71.175	88.090	15.476	37.104	-
Z-Reader (single model)	79.776	92.696	74.178	88.145	13.889	37.422	-
SRC->DS(±) (Yang et al., 2019)	49.2	65.4	-	-	-	-	-

More detailed results are available in the CMRC 2018 Overview. Note that some submissions also use the development set for training.
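
The EM and F1 columns above are exact-match and overlap-based F1 scores for span extraction. The sketch below shows one common way to compute them for Chinese text, where characters stand in for tokens. It is an illustrative approximation only: the official CMRC 2018 evaluation script may normalize text and measure overlap differently, and the function names here are hypothetical.

```python
from collections import Counter
from typing import List


def exact_match(prediction: str, references: List[str]) -> float:
    """1.0 if the prediction equals any reference after stripping whitespace."""
    pred = prediction.strip()
    return float(any(pred == ref.strip() for ref in references))


def char_f1(prediction: str, reference: str) -> float:
    """Bag-of-characters F1 between a prediction and one reference."""
    pred_chars = [c for c in prediction if not c.isspace()]
    ref_chars = [c for c in reference if not c.isspace()]
    if not pred_chars or not ref_chars:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_chars == ref_chars)
    # Multiset intersection counts each shared character at most
    # as often as it appears in both strings.
    overlap = sum((Counter(pred_chars) & Counter(ref_chars)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_chars)
    recall = overlap / len(ref_chars)
    return 2 * precision * recall / (precision + recall)


def best_f1(prediction: str, references: List[str]) -> float:
    """Score against every reference answer and keep the best F1."""
    return max(char_f1(prediction, ref) for ref in references)


if __name__ == "__main__":
    print(exact_match("北京", ["北京", "北京市"]))
    print(best_f1("北京市", ["北京"]))
```

A dataset-level score would average these per-question values and scale by 100 to match the table.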

DRCD

System	DEV-EM	DEV-F1	TEST-EM	TEST-F1	Note
SRC + DS(±) (Yang et al., 2019)	55.4	67.7	-	-	-
r-net (single model)	-	-	29.1	44.4	-

C^3

System	DEV-1A	TEST-1A	DEV-1B	TEST-1B	DEV-2A	TEST-2A	DEV-2B	TEST-2B	Note
BERT_CN (Sun et al., 2019)	63.0	62.6	62.3	62.1	36.7	26.2	34.7	31.3	-

Chinese Reading Comprehension Evaluations and Competitions

Alongside the release of these datasets, several Chinese reading comprehension evaluation workshops and competitions have further accelerated research on this topic.

The First Evaluation Workshop on Chinese Machine Reading Comprehension (CMRC 2017)
Host: CIPS-CL, Joint Laboratory of HIT and iFLYTEK Research (HFL), iFLYTEK Co. Ltd
Competition Type: Cloze-style RC, User Query RC

The Second Evaluation Workshop on Chinese Machine Reading Comprehension (CMRC 2018)
Host: CIPS-CL, Joint Laboratory of HIT and iFLYTEK Research (HFL), iFLYTEK Co. Ltd
Competition Type: Span-Extraction RC

2018 NLP Challenge on Machine Reading Comprehension
Host: CCF, CIPSC, Baidu Inc.
Competition Type: Open-Domain RC

CIPS-SOGOU QA Competition
Host: CIPSC, SOGOU
Competition Type: Factoid QA, Non-Factoid QA

The Third Evaluation Workshop on Chinese Machine Reading Comprehension (CMRC 2019)
Host: CIPS-CL, Joint Laboratory of HIT and iFLYTEK Research (HFL), iFLYTEK Co. Ltd
Competition Type: Sentence Cloze

2019 NLP Language and Intelligence Challenge
Host: CCF, CIPSC, Baidu Inc.
Competition Type: Open-Domain RC

Chinese Idiom Understanding Contest
Host: CCF, Tsinghua University
Competition Type: Cloze Test

Contact

If you run into any problems, please leave a message in GitHub Issues.

Disclaimer

Any subjective comments in this repository represent only the views of the owner (ymcui) and do not represent the claims of any organization.
