liutongyang / CMID

Licence: other
Chinese Medical Intent Dataset

Chinese Medical Intent Dataset (CMID)

This dataset is intended for the Chinese medical QA intent understanding task.

More details will be added soon.

Dataset format:

All data is stored in a JSON file. Each record has 5 fields. An example record:

{
 "originalText": "间质性肺炎的症状?", 
 "entities": [{"label_type": "疾病和诊断", "start_pos": 0, "end_pos": 5}], 
 "seg_result": ["间质性肺炎", "的", "症状", "?"], 
 "label_4class": ["病症"], 
 "label_36class": ["临床表现"]
}
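
A minimal sketch of reading one record in this format with Python's standard `json` module. The record is inlined as a string so the snippet runs on its own; the dataset's file name and the top-level layout of the file (for example, a list of such records) are not stated here, so `RAW`, `EXPECTED_FIELDS`, and `parse_record` are illustrative names only.

```python
import json

# Example record copied from the README, inlined so the snippet is self-contained.
RAW = '''
{
 "originalText": "间质性肺炎的症状?",
 "entities": [{"label_type": "疾病和诊断", "start_pos": 0, "end_pos": 5}],
 "seg_result": ["间质性肺炎", "的", "症状", "?"],
 "label_4class": ["病症"],
 "label_36class": ["临床表现"]
}
'''

# The five field names documented in this README.
EXPECTED_FIELDS = {
    "originalText", "entities", "seg_result", "label_4class", "label_36class",
}


def parse_record(raw):
    """Parse one record and check that the five documented fields are present."""
    record = json.loads(raw)
    missing = EXPECTED_FIELDS - record.keys()
    if missing:
        raise ValueError(f"record is missing fields: {sorted(missing)}")
    return record


record = parse_record(RAW)
print(record["originalText"], record["label_4class"], record["label_36class"])
```

Both label fields are lists in the example, so code that consumes them should probably not assume a single label per question.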

JSON field details

The "originalText" field holds the original input question text.

The "entities" field holds the named entity recognition results produced by a deep learning model. The entity tags follow the CCKS2019 Task1 standard: https://www.biendata.com/competition/ccks_2019_1/Evaluation/.
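
As a rough sketch, the entity offsets can be turned back into surface strings by slicing "originalText". This assumes that "start_pos" is inclusive and "end_pos" is exclusive (as in Python slicing), which is inferred only from the single example record; `entity_texts` is a hypothetical helper, not part of the dataset tooling.

```python
def entity_texts(record):
    """Return (label_type, surface_text) pairs for each entity in a record.

    Assumption: start_pos is inclusive and end_pos is exclusive, which matches
    the example record but is not stated explicitly in the README.
    """
    text = record["originalText"]
    return [
        (entity["label_type"], text[entity["start_pos"]:entity["end_pos"]])
        for entity in record["entities"]
    ]


# Example record from the README (only the fields needed here).
record = {
    "originalText": "间质性肺炎的症状?",
    "entities": [{"label_type": "疾病和诊断", "start_pos": 0, "end_pos": 5}],
}

print(entity_texts(record))
```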

The "seg_result" field holds the word segmentation result for the sentence, as a list of tokens.
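
A small sketch of how character offsets for each token could be recovered, for instance to line tokens up with the entity positions. It assumes that the tokens concatenate back to "originalText" with no gaps, which holds for the example record but is not guaranteed by this README; `token_spans` is a hypothetical helper.

```python
def token_spans(tokens):
    """Return (start, end, token) triples, with end exclusive.

    Assumption: tokens are contiguous, so each token starts where the
    previous one ended.
    """
    spans = []
    pos = 0
    for token in tokens:
        spans.append((pos, pos + len(token), token))
        pos += len(token)
    return spans


original_text = "间质性肺炎的症状?"
seg_result = ["间质性肺炎", "的", "症状", "?"]

# Sanity check for the contiguity assumption before trusting the offsets.
if "".join(seg_result) != original_text:
    raise ValueError("tokens do not concatenate back to the original text")

for start, end, token in token_spans(seg_result):
    print(start, end, token)
```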

The "label_4class" field holds the manually annotated primary (coarse-grained) medical intent labels.

The "label_36class" field holds the manually annotated secondary (fine-grained) medical intent labels.

Categories in 4class and 36class

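label_4class is the primary type, with four categories:

病症 (disease or condition), 药物 (drug), 治疗方案 (treatment plan), 其他 (other)

label_36class is the secondary type, with 36 categories grouped under the primary types:

病症 (disease or condition): 定义 (definition), 病因 (cause), 临床表现 (clinical manifestations), 相关病症 (related conditions), 治疗方法 (treatment method), 推荐医院 (recommended hospital), 预防 (prevention), 所属科室 (hospital department), 禁忌 (contraindications), 传染性 (infectiousness), 治愈率 (cure rate), 严重性 (severity)
药物 (drug): 作用 (effect), 适用症 (indications), 价钱 (price), 药物禁忌 (drug contraindications), 用法 (usage), 副作用 (side effects), 成分 (ingredients)
治疗方案 (treatment plan): 方法 (method), 费用 (cost), 有效时间 (effective duration), 临床意义/检查目的 (clinical significance / purpose of examination), 治疗时间 (treatment duration), 疗效 (efficacy), 恢复时间 (recovery time), 正常指标 (normal reference values), 化验/体检方案 (lab test / physical examination plan), 恢复 (recovery)
其他 (other): 设备用法 (device usage), 多问 (multiple questions), 养生 (health maintenance), 整容 (cosmetic surgery), 两性 (sexual health), 对比 (comparison), 无法确定 (undetermined)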
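
The two-level label scheme above can be written down as a small lookup table, which is handy for checking that a record's secondary labels fall under its primary labels. This is only a sketch: `TAXONOMY` and `is_consistent` are illustrative names, the entries containing "/" are treated as single labels (which gives 36 in total), and the exact label strings in the data file, as well as whether every record satisfies this check, are assumptions.

```python
# Primary label -> secondary labels, transcribed from the lists in this README.
TAXONOMY = {
    "病症": ["定义", "病因", "临床表现", "相关病症", "治疗方法", "推荐医院",
             "预防", "所属科室", "禁忌", "传染性", "治愈率", "严重性"],
    "药物": ["作用", "适用症", "价钱", "药物禁忌", "用法", "副作用", "成分"],
    "治疗方案": ["方法", "费用", "有效时间", "临床意义/检查目的", "治疗时间",
                 "疗效", "恢复时间", "正常指标", "化验/体检方案", "恢复"],
    "其他": ["设备用法", "多问", "养生", "整容", "两性", "对比", "无法确定"],
}


def is_consistent(label_4class, label_36class):
    """True if every secondary label is listed under one of the given primary labels."""
    allowed = {sub for primary in label_4class for sub in TAXONOMY.get(primary, [])}
    return all(label in allowed for label in label_36class)


# Total number of secondary labels across the four primary types.
print(sum(len(subs) for subs in TAXONOMY.values()))
# Labels of the example record from the README.
print(is_consistent(["病症"], ["临床表现"]))
```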

Final Words

Thanks for using our corpus! Please let us know if our dataset helps advance the state of the art in your Chinese natural language processing task.

Contacts

CMID may be used for scientific research only.

Please contact us if necessary: [email protected], [email protected]
