
shijiebei2009 / CEEC-Corpus

Licence: other
📚 Chinese Environment Emergency Corpus (CEEC), Semantic Intelligence Laboratory, Shanghai University

Projects that are alternatives of or similar to CEEC-Corpus

BioMedical-NLP-corpus
Biomedical NLP Corpus or Datasets.
Stars: ✭ 44 (+7.32%)
Mutual labels:  corpus-data
DANeS
DANeS is an open-source E-newspaper dataset by collaboration between DATASET JSC (dataset.vn) and AIV Group (aivgroup.vn)
Stars: ✭ 64 (+56.1%)
Mutual labels:  corpus-data
CEC-Corpus
📚 Chinese Emergency Corpus (CEC), Semantic Intelligence Laboratory, Shanghai University
Stars: ✭ 543 (+1224.39%)
Mutual labels:  corpus-data
egret-wenda-corpus
A Public Corpus for Machine Learning
Stars: ✭ 41 (+0%)
Mutual labels:  corpus-data

Chinese Environment Emergency Corpus

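The Chinese Environment Emergency Corpus (CEEC) was built by the Semantic Intelligence Laboratory at Shanghai University. Following the classification scheme of the National Overall Emergency Response Plan for Public Emergencies issued by the State Council, news reports on six categories of environmental pollution emergencies were collected from the Internet as the raw corpus. The raw corpus then went through text preprocessing, text analysis, event annotation, and consistency checking, and the annotation results were saved to the corpus. CEEC contains 100 documents in total.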

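The annotation was carried out by Liu Wei, Wang Xu, Ding Ning, and Zhang Yujia. Wang Xu handled the formatting of the annotation results, encoding conversion, and error correction.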

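CEEC uses XML as its annotation format, built around six key data structures (tags): Event, Denoter, Time, Location, Participant, and Object. Event describes an event; Denoter, Time, Location, Participant, and Object describe the event's denoter (trigger word) and its elements. Each tag also has its own set of attributes. Compared with the ACE and TimeBank corpora, CEEC is smaller in scale, but its annotation of events and event elements is the most comprehensive.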

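For details, see the related master's and doctoral theses published by Shanghai University, as well as the journal and conference papers.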

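The research and development of this corpus was funded by the National Natural Science Foundation of China projects "Research on Key Problems of Event Reasoning Based on Description Logic" (No. 61305053) and "Event Ontology Model and Application Technology" (No. 60975033).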

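Thanks to all the master's and doctoral students of the Semantic Intelligence Laboratory at Shanghai University who contributed to the annotation of CEEC.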

Research papers:
To be added

Doctoral theses:
To be added

Master's theses:
To be added

Chinese Environment Emergency Corpus (CEEC)

The Chinese Environment Emergency Corpus (CEEC) was built by the Data Semantic Laboratory at Shanghai University. The corpus is divided into 6 categories: marine pollution, air pollution, the social effect, water pollution, soil pollution, and noise pollution. CEEC contains 100 texts in total, which were collected from the Internet and processed in several steps.

CEEC uses XML as its annotation format, with 6 tags (Denoter, Time, Location, Participant, Mean, and Object) that describe the elements of an event (Event). Each of these tags also has its own attributes. Compared with the ACE and TimeBank corpora, CEEC is smaller in scale, but it annotates events and event elements comprehensively.
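As a rough illustration of how such annotations could be read, the sketch below parses a small XML fragment with Python's standard `xml.etree.ElementTree` and collects the text of each element tag per Event. The sample fragment, its nesting, and the helper name `extract_events` are assumptions made for this example; the actual CEEC files may be laid out differently and carry attributes that are not shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the tag names come from the description above,
# but the nesting and the text are invented for illustration only.
SAMPLE = """
<Event>
  <Time>On Monday morning</Time>
  <Location>near the river</Location>
  <Participant>a chemical plant</Participant>
  <Denoter>leaked</Denoter>
  <Object>waste water</Object>
</Event>
"""

# The six element tags named in the description.
ELEMENT_TAGS = ("Denoter", "Time", "Location", "Participant", "Mean", "Object")


def extract_events(xml_text):
    """Return one dict per Event, mapping each element tag to its text values."""
    root = ET.fromstring(xml_text)
    events = []
    # iter() also yields the root itself when the root is an Event.
    for event in root.iter("Event"):
        record = {tag: [] for tag in ELEMENT_TAGS}
        for child in event.iter():
            if child.tag in record:
                # itertext() gathers text even if the tag has nested children.
                record[child.tag].append("".join(child.itertext()).strip())
        events.append(record)
    return events


if __name__ == "__main__":
    for record in extract_events(SAMPLE):
        print(record)
```

Tags that do not occur in an event (here, Mean) simply map to an empty list, which keeps every record the same shape for later processing.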

To learn more about CEEC, refer to the related dissertations and papers, such as
Research on Event-Oriented Knowledge Processing by Jianfeng Fu
A Study of Several Key Problems in Construction of Event Ontology by Xujie Zhang.

The corpus annotation was mainly done by Liu Wei, Wang Xu, Ding Ning, and others. Wang Xu handled the formatting of the annotated results, encoding conversion, error correction, and other work.

Thanks to all the postgraduates and PhD students of the Data Semantic Laboratory at Shanghai University who contributed to CEEC.
