
ymcui / cmrc2019

License: CC-BY-SA-4.0
A Sentence Cloze Dataset for Chinese Machine Reading Comprehension (CMRC 2019)

Programming Languages: Python, Shell





This repository contains the data for The Third Evaluation Workshop on Chinese Machine Reading Comprehension (CMRC 2019). We will present our paper at COLING 2020:

Title: A Sentence Cloze Dataset for Chinese Machine Reading Comprehension
Authors: Yiming Cui, Ting Liu, Ziqing Yang, Zhipeng Chen, Wentao Ma, Wanxiang Che, Shijin Wang, Guoping Hu
Link: https://arxiv.org/abs/2004.03116
Venue: COLING 2020

Open Challenge Leaderboard (New!)

Keep track of the latest state-of-the-art systems on the CMRC 2019 dataset: https://ymcui.github.io/cmrc2019/

Submission Guidelines

If you would like to test your model on the hidden test and challenge sets, please follow the instructions on how to submit your model via the CodaLab worksheet: https://worksheets.codalab.org/worksheets/0xe856b40d21de45bf898cd1d3c5135afe

Directory Guide

  • baseline: a simple baseline system based on Chinese BERT

  • eval: contains the official evaluation script

  • data: contains the official evaluation data

  • sample_submission: sample submissions for the CodaLab competition platform (trial_rand_submission.zip is a randomly generated prediction file; trial_submission.zip is the BERT baseline prediction file)

Baseline System

We provide a BERT-based baseline system for participants (see the baseline directory for more information).

Results on other sets will be announced later.

QAC: Question-Level Accuracy

PAC: Passage-Level Accuracy

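| Data             | Passage # | Query # | QAC     | PAC     | Fake Candidates | Availability |
|------------------|-----------|---------|---------|---------|-----------------|--------------|
| Trial Data       | 139       | 1,504   | 71.941% | 28.776% | No              | Public       |
| Train Data       | 9,638     | 100,009 | N/A     | N/A     | No              | Public       |
| Development Data | 300       | 3,053   | 70.586% | 13.333% | Yes             | Public       |
| Qualifying Data  | 500       | 5,081   | 70.01%  | 8.20%   | Yes             | Semi-Hidden  |
| Test Data        | -         | -       | -       | -       | Yes             | Hidden       |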
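
To make the two metrics concrete, here is a minimal Python sketch of how QAC and PAC could be computed. It is not the official evaluation script (see the eval directory for that). It assumes that QAC is the share of blanks filled correctly and that PAC counts a passage as correct only when all of its blanks are correct. The function name qac_pac, the passage IDs, and the dictionary layout are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def qac_pac(
    gold: Dict[str, List[int]],
    pred: Dict[str, List[int]],
) -> Tuple[float, float]:
    """Return (QAC, PAC) as percentages.

    gold and pred map a hypothetical passage ID to the list of candidate
    indices chosen for each blank, in blank order. This layout is an
    assumption, not the official data format.
    """
    total_blanks = 0
    correct_blanks = 0
    correct_passages = 0
    for pid, gold_answers in gold.items():
        pred_answers = pred.get(pid, [])
        # A blank is correct only when a prediction exists and matches.
        hits = sum(
            1
            for i, g in enumerate(gold_answers)
            if i < len(pred_answers) and pred_answers[i] == g
        )
        total_blanks += len(gold_answers)
        correct_blanks += hits
        # Assumption: a passage counts only if all of its blanks are correct.
        if hits == len(gold_answers):
            correct_passages += 1
    qac = 100.0 * correct_blanks / total_blanks if total_blanks else 0.0
    pac = 100.0 * correct_passages / len(gold) if gold else 0.0
    return qac, pac


# Toy example: passage p1 is fully correct, p2 has one wrong blank.
gold = {"p1": [2, 0, 1], "p2": [1, 3]}
pred = {"p1": [2, 0, 1], "p2": [1, 0]}
qac, pac = qac_pac(gold, pred)
print(f"QAC: {qac:.3f}%  PAC: {pac:.3f}%")
```

In the toy example, 4 of 5 blanks are correct (QAC 80%) but only 1 of 2 passages is entirely correct (PAC 50%). Under the all-blanks assumption, this is why PAC is the stricter of the two and sits well below QAC in the results table.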

International Standard Language Resource Number (ISLRN)

ISLRN: 813-010-842-493-2

http://www.islrn.org/resources/resources_info/8624/

Reference

If you wish to use our data in your research, please cite our paper:

@inproceedings{cui-etal-2020-cmrc2019,
  title={A Sentence Cloze Dataset for Chinese Machine Reading Comprehension},
  author={Cui, Yiming and Liu, Ting and Yang, Ziqing and Chen, Zhipeng and Ma, Wentao and Che, Wanxiang and Wang, Shijin and Hu, Guoping},
  booktitle={Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020)},
  year={2020}
}

Organization Committee

Host: Chinese Information Processing Society of China (CIPS)
Organizer: Joint Laboratory of HIT and iFLYTEK Research (HFL)
Sponsor: iFLYTEK Co., Ltd. and iFLYTEK Research (Hebei)

Evaluation Co-Chairs

Ting Liu, Harbin Institute of Technology
Yiming Cui, Joint Laboratory of HIT and iFLYTEK Research

Official HFL WeChat Account

Follow Joint Laboratory of HIT and iFLYTEK Research (HFL) on WeChat.


Contact us

Any problems? Feel free to contact us.
Email: cmrc2019 [aT] 126 [DoT] com
Forum: CodaLab Competition Forum
CMRC 2019 Official Website (Chinese): https://cmrc2019.hfl-rc.com/
CMRC 2019 Official Website (English): https://cmrc2019.hfl-rc.com/english/
