THU-KEG / MAVEN-dataset

Licence: MIT license
Source code and dataset for EMNLP 2020 paper "MAVEN: A Massive General Domain Event Detection Dataset".

Data

The dataset (ver. 1.0) can be obtained from Tsinghua Cloud or Google Drive. The data format is introduced in this document.

We also release the document topics for data analysis and model development. The docid2topic.json file maps document ids to their EventWiki topic labels.
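
As a rough illustration of how the topic mapping might be used for data analysis, the sketch below loads docid2topic.json and counts documents per topic. It assumes the file is a single flat JSON object whose keys are document ids and whose values are single topic strings; that layout is an assumption, so check it against the released file. The helper names `load_docid2topic` and `topic_counts` are hypothetical and not part of the repository.

```python
import json
from collections import Counter


def load_docid2topic(path):
    """Load the document-id -> topic mapping.

    Assumption: the file holds one flat JSON object, e.g.
    {"<doc id>": "<topic label>", ...}.
    """
    with open(path, encoding="utf-8") as f:
        mapping = json.load(f)
    if not isinstance(mapping, dict):
        raise ValueError("expected a JSON object mapping document ids to topics")
    return mapping


def topic_counts(docid2topic):
    """Count how many documents carry each topic label.

    Assumption: every value is a single hashable label (a string).
    """
    return Counter(docid2topic.values())
```

Calling `topic_counts(load_docid2topic("docid2topic.json")).most_common(10)` would then list the ten most frequent topics, provided the file sits in the working directory.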

CodaLab

To get results on the test set, submit your predictions to our permanent CodaLab competition (the older version will be phased out soon). For the evaluation method, please refer to the evaluation script.

Code

We release the source code for the baselines: DMCNN, BiLSTM, BiLSTM+CRF, MOGANED and DMBERT.

Citation

If the data and code help you, please cite this paper.

@inproceedings{wang2020MAVEN,
  title={{MAVEN}: A Massive General Domain Event Detection Dataset},
  author={Wang, Xiaozhi and Wang, Ziqi and Han, Xu and Jiang, Wangyi and Han, Rong and Liu, Zhiyuan and Li, Juanzi and Li, Peng and Lin, Yankai and Zhou, Jie},
  booktitle={Proceedings of EMNLP 2020},
  year={2020}
}