zjunlp / OpenUE

Licence: MIT license
OpenUE is a lightweight toolkit for knowledge graph extraction (An Open Toolkit of Universal Extraction from Text, published at EMNLP 2020: https://aclanthology.org/2020.emnlp-demos.1.pdf)

Programming Languages

Python, Jupyter Notebook, Shell

Projects that are alternatives to or similar to OpenUE

Snips Nlu
Snips Python library to extract meaning from text
Stars: ✭ 3,583 (+1207.66%)
Mutual labels:  named-entity-recognition, slot-filling, intent-classification
IE Paper Notes
Paper notes for Information Extraction, including Relation Extraction (RE), Named Entity Recognition (NER), Entity Linking (EL), Event Extraction (EE), Named Entity Disambiguation (NED).
Stars: ✭ 14 (-94.89%)
Mutual labels:  named-entity-recognition, event-extraction, relation-extraction
Deeppavlov
An open source library for deep learning end-to-end dialog systems and chatbots.
Stars: ✭ 5,525 (+1916.42%)
Mutual labels:  named-entity-recognition, slot-filling, intent-classification
knowledge-graph-nlp-in-action
Hands-on knowledge graph and natural language processing (NLP), from model training to deployment. Uses Tensorflow, Bert+Bi-LSTM+CRF, Neo4j, and more; covers tasks such as Named Entity Recognition, Text Classify, Information Extraction, and Relation Extraction.
Stars: ✭ 58 (-78.83%)
Mutual labels:  named-entity-recognition, bert, relation-extraction
CogIE
CogIE: An Information Extraction Toolkit for Bridging Text and CogNet. ACL 2021
Stars: ✭ 47 (-82.85%)
Mutual labels:  named-entity-recognition, event-extraction, relation-extraction
TorchBlocks
A PyTorch-based toolkit for natural language processing
Stars: ✭ 85 (-68.98%)
Mutual labels:  named-entity-recognition, bert
bern
A neural named entity recognition and multi-type normalization tool for biomedical text mining
Stars: ✭ 151 (-44.89%)
Mutual labels:  named-entity-recognition, bert
Bert Bilstm Crf Ner
Tensorflow solution of NER task Using BiLSTM-CRF model with Google BERT Fine-tuning And private Server services
Stars: ✭ 3,838 (+1300.73%)
Mutual labels:  named-entity-recognition, bert
Mt Dnn
Multi-Task Deep Neural Networks for Natural Language Understanding
Stars: ✭ 1,871 (+582.85%)
Mutual labels:  named-entity-recognition, bert
Pytorch-NLU
Pytorch-NLU, a Chinese text classification and sequence labeling toolkit. It supports multi-class and multi-label classification of long and short Chinese texts, and sequence labeling tasks such as Chinese named entity recognition, part-of-speech tagging, and word segmentation.
Stars: ✭ 151 (-44.89%)
Mutual labels:  named-entity-recognition, bert
Gigabert
Zero-shot Transfer Learning from English to Arabic
Stars: ✭ 23 (-91.61%)
Mutual labels:  named-entity-recognition, relation-extraction
Slotfilling
Using Tensorflow to train a slot-filling & intent joint model
Stars: ✭ 14 (-94.89%)
Mutual labels:  slot-filling, intent-classification
Fox
Federated Knowledge Extraction Framework
Stars: ✭ 155 (-43.43%)
Mutual labels:  named-entity-recognition, relation-extraction
Deeplearning nlp
A natural language processing library based on deep learning
Stars: ✭ 154 (-43.8%)
Mutual labels:  named-entity-recognition, relation-extraction
Spark Nlp
State of the Art Natural Language Processing
Stars: ✭ 2,518 (+818.98%)
Mutual labels:  named-entity-recognition, bert
Jointre
End-to-end neural relation extraction using deep biaffine attention (ECIR 2019)
Stars: ✭ 41 (-85.04%)
Mutual labels:  named-entity-recognition, relation-extraction
Pytorch graph Rel
A PyTorch implementation of GraphRel
Stars: ✭ 204 (-25.55%)
Mutual labels:  named-entity-recognition, relation-extraction
FDDC
Named Entity Recognition & Relation Extraction (named entity recognition and relation classification)
Stars: ✭ 29 (-89.42%)
Mutual labels:  named-entity-recognition, relation-extraction
DeepNER
An Easy-to-use, Modular and Prolongable package of deep-learning based Named Entity Recognition Models.
Stars: ✭ 9 (-96.72%)
Mutual labels:  named-entity-recognition, bert
Information Extraction Chinese
Chinese Named Entity Recognition with IDCNN/biLSTM+CRF, and Relation Extraction with biGRU+2ATT
Stars: ✭ 1,888 (+589.05%)
Mutual labels:  named-entity-recognition, relation-extraction

OpenUE is a lightweight toolkit for knowledge graph extraction.

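Features

  • Knowledge graph extraction tasks based on pre-trained language models (compatible with BERT, Roberta, and other pre-trained models)
    • Entity and relation extraction
    • Event extraction
    • Slot filling and intent detection
    • More tasks
  • Training and testing interfaces
  • Fast deployment of NLP models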

Requirements

  • Python 3.8
  • requirements.txt

Architecture

[Figure: OpenUE framework diagram]

The toolkit is organized into three main modules: models, lit_models, and data.

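models module

This module holds the three main models: a relation classification model that works on the whole sentence, a named entity recognition model that works on a sentence whose relation is already known, and an inference and validation model that combines the two. They are mostly built on the pre-trained models defined in the transformers library.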

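lit_models module

The code here is built on PyTorch Lightning. The classes subclass pytorch_lightning.LightningModule and define training_step and validation_step; pytorch_lightning.Trainer then builds the training loop automatically for different hardware setups such as a single GPU, multiple GPUs, or TPUs.

Because the training code is hardware-agnostic, the OpenUE training module can be called in many different environments.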

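data module

The data module holds the code that handles each dataset. It first tokenizes the data with the tokenizer from the transformers library and then converts the result into the features each task needs.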
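As a rough illustration of what "features" can mean for the two training tasks, the sketch below builds a multi-hot relation vector for sentence-level relation classification and a per-character tag sequence for entity tagging under one known relation. It is not OpenUE's actual preprocessing: the character-level split stands in for the transformers tokenizer, and the RELATIONS list, the tag names, and both function names are assumptions made for this example.

```python
from typing import Dict, List

# Hypothetical relation inventory; a real dataset defines its own schema.
RELATIONS = ["出生地", "出生日期"]


def relation_features(spo_list: List[Dict[str, str]]) -> List[int]:
    """Multi-hot vector marking which relations the sentence expresses."""
    present = {spo["predicate"] for spo in spo_list}
    return [1 if rel in present else 0 for rel in RELATIONS]


def tagging_features(text: str, spo: Dict[str, str]) -> List[str]:
    """Per-character tags for the subject and object of one known relation."""
    tags = ["O"] * len(text)
    for role, prefix in (("subject", "SUB"), ("object", "OBJ")):
        start = text.find(spo[role])
        if start == -1:
            continue  # entity not found verbatim; leave the tags untouched
        end = start + len(spo[role])
        tags[start] = f"B-{prefix}"
        for i in range(start + 1, end):
            tags[i] = f"I-{prefix}"
    return tags


text = "阿兰基斯出生于圣地亚哥"
spo = {"subject": "阿兰基斯", "predicate": "出生地", "object": "圣地亚哥"}

rel_vec = relation_features([spo])
tags = tagging_features(text, spo)
```

A real pipeline would align these tags with subword tokens and add the input ids and attention mask produced by the tokenizer.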

Quick Start

Installation

Anaconda environment

conda create -n openue python=3.8
conda activate openue
pip install -r requirements.txt
conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch -c nvidia # choose the cudatoolkit version that matches your Nvidia driver
python setup.py install

Install with pip

pip install openue

Install locally for development

python setup.py develop

Usage

The data is stored as JSON files. An example record:

{
	"text": "查尔斯·阿兰基斯(Charles Aránguiz),1989年4月17日出生于智利圣地亚哥,智利职业足球运动员,司职中场,效力于德国足球甲级联赛勒沃库森足球俱乐部",
	"spo_list": [{
		"predicate": "出生地",
		"object_type": "地点",
		"subject_type": "人物",
		"object": "圣地亚哥",
		"subject": "查尔斯·阿兰基斯"
	}, {
		"predicate": "出生日期",
		"object_type": "Date",
		"subject_type": "人物",
		"object": "1989年4月17日",
		"subject": "查尔斯·阿兰基斯"
	}]
}
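To make the record layout concrete, here is a minimal sketch that parses one such record with the standard library and pulls out its (subject, predicate, object) triples. The Triple type, the parse_record function, and the check that each entity appears verbatim in the text are assumptions of this example, not part of OpenUE.

```python
import json
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


# Fields carried by every entry of "spo_list" in the example record.
REQUIRED_SPO_KEYS = {"predicate", "object_type", "subject_type", "object", "subject"}


def parse_record(raw: str) -> List[Triple]:
    """Parse one JSON record and return its (subject, predicate, object) triples."""
    record = json.loads(raw)
    text = record["text"]
    triples = []
    for spo in record["spo_list"]:
        missing = REQUIRED_SPO_KEYS - set(spo)
        if missing:
            raise ValueError(f"spo entry is missing keys: {sorted(missing)}")
        # Sanity check (an assumption of this sketch): entities occur verbatim in the text.
        for role in ("subject", "object"):
            if spo[role] not in text:
                raise ValueError(f"{role} {spo[role]!r} not found in text")
        triples.append(Triple(spo["subject"], spo["predicate"], spo["object"]))
    return triples


sample = json.dumps({
    "text": "查尔斯·阿兰基斯(Charles Aránguiz),1989年4月17日出生于智利圣地亚哥",
    "spo_list": [{
        "predicate": "出生地",
        "object_type": "地点",
        "subject_type": "人物",
        "object": "圣地亚哥",
        "subject": "查尔斯·阿兰基斯",
    }],
}, ensure_ascii=False)

triples = parse_record(sample)
```

Running a check like this over a dataset before training can surface malformed entries early.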

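Training

Place the data in the ./dataset/ directory and then start training. If the directory is empty, running the scripts below automatically downloads the dataset and the pre-trained model and then starts training; keep your network connection stable so the downloads do not fail.

# Train the NER (named entity recognition) module
./scripts/run_ner.sh
# Train the SEQ (sentence-level relation classification) module
./scripts/run_seq.sh

The short demo below illustrates the training process; it trains on a single batch to keep the demonstration fast.

[Figure: training demo]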

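Validation

Because OpenUE uses a pipeline model, the two modules cannot be trained jointly; they are trained separately and then validated together. After both training scripts have run, the output path contains two sets of model weights, output/ner/${dataset} and output/seq/${dataset}, each stored in a directory named after the dataset. Pass these two weight directories as ner_model_name_or_path and seq_model_name_or_path in run_infer.yaml or in the run_infer.sh script to run validation.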
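The control flow of such a pipeline can be sketched as follows: a relation classifier first predicts which relations a sentence expresses, then an entity tagger extracts subject and object pairs for each predicted relation. This is only an illustration of the idea, not OpenUE's inference code; extract_triples and the two toy stand-in functions are hypothetical, and trained models would take their place.

```python
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]


def extract_triples(
    text: str,
    classify_relations: Callable[[str], List[str]],
    tag_entities: Callable[[str, str], List[Tuple[str, str]]],
) -> List[Triple]:
    """Stage 1 predicts relations; stage 2 tags entities for each relation."""
    triples = []
    for relation in classify_relations(text):
        for subject, obj in tag_entities(text, relation):
            triples.append((subject, relation, obj))
    return triples


# Toy stand-ins for the two trained models (assumptions for illustration only).
def toy_classifier(text: str) -> List[str]:
    return ["born_in"] if "born in" in text else []


def toy_tagger(text: str, relation: str) -> List[Tuple[str, str]]:
    subject, _, obj = text.partition(" was born in ")
    return [(subject, obj.rstrip("."))] if obj else []


result = extract_triples("Ada was born in London.", toy_classifier, toy_tagger)
```

Because the second stage depends on the output of the first, the two models are evaluated together even though they are trained separately.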

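Notebook Quick Start

SKE dataset training notebook: uses a Chinese dataset as an example to show in detail how to use lit_models, models, and data from openue, so that users can build their own training logic.

Open in Colab: runs in the Colab cloud environment, so no local setup is needed.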

Automatic hyperparameter tuning (wandb)

# Replace the logger in the code with the wandb logger to enable wandb support
import pytorch_lightning as pl
logger = pl.loggers.WandbLogger(project="openue")

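English support

For English datasets, the only parameter that needs to change is model_name_or_path, which selects the pre-trained language model weights. Because the transformers library loads both models through the same interface, it is enough to replace the Chinese pre-trained model bert-base-chinese with the English pre-trained model bert-base-uncased.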
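A small sketch of that switch is shown below. Only the two checkpoint names and the model_name_or_path parameter come from the text above; the language codes, the mapping, and the helper function are hypothetical conveniences for this example.

```python
# Checkpoint names from the paragraph above; the mapping itself is a sketch.
PRETRAINED_BY_LANGUAGE = {
    "zh": "bert-base-chinese",
    "en": "bert-base-uncased",
}


def model_overrides(language: str) -> dict:
    """Return the single parameter override needed for the given language."""
    try:
        return {"model_name_or_path": PRETRAINED_BY_LANGUAGE[language]}
    except KeyError:
        raise ValueError(f"no pre-trained model configured for {language!r}") from None


english_overrides = model_overrides("en")
```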

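Quick Model Deployment

Download torchserve-docker

Docker download

Create the handler class for each model

The deploy folder already contains the corresponding deployment handlers, handler_seq.py and handler_ner.py.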

# Package the model files with torch-model-archiver.
# --extra-files must include the following files:
# config.json, setup_config.json: configuration for the model and for inference
# vocab.txt: the vocabulary used by the tokenizer
# model.py: the model code

torch-model-archiver --model-name BERTForNER_en  \
	--version 1.0 --serialized-file ./ner_en/pytorch_model.bin \
	--handler ./deploy/handler_ner.py \
	--extra-files "./ner_en/config.json,./ner_en/setup_config.json,./ner_en/vocab.txt,./deploy/model.py" -f

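# Copy the packaged .mar file into the model-store folder, then use curl to register it with the server running in Docker.
# This example registers the SEQ archive, assumed to have been packaged the same way as the NER archive.
sudo cp ./BERTForSEQ_en.mar /home/model-server/model-store/
curl -v -X POST "http://localhost:3001/models?initial_workers=1&synchronous=false&url=BERTForSEQ_en.mar&batch_size=1&max_batch_delay=200"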
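When several archives need to be registered, it can help to build the registration URL programmatically. The sketch below uses only the standard library to assemble the same URL as the curl command, with the host, port, and query parameters taken from that command; the registration_url helper is hypothetical, and actually sending the POST request is left out.

```python
from urllib.parse import urlencode


def registration_url(
    mar_file: str,
    host: str = "localhost",
    port: int = 3001,
    initial_workers: int = 1,
    batch_size: int = 1,
    max_batch_delay: int = 200,
) -> str:
    """Build the model registration URL used by the curl command."""
    query = urlencode({
        "initial_workers": initial_workers,
        "synchronous": "false",
        "url": mar_file,
        "batch_size": batch_size,
        "max_batch_delay": max_batch_delay,
    })
    return f"http://{host}:{port}/models?{query}"


url = registration_url("BERTForSEQ_en.mar")
```

Passing a different archive name, for example the NER archive, yields the URL for registering that model.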

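Project Members

Zhejiang University: Ningyu Zhang, Xin Xie, Zhen Bi, Zeyuan Wang, Xiang Chen, Haiyang Yu, Shumin Deng, Hongbin Ye, Xi Tian, Guozhou Zheng, Huajun Chen

DAMO Academy: Mosha Chen, Chuanqi Tan, Fei Huang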


Citation

If you use or extend our work, please cite the following paper:

@inproceedings{DBLP:conf/emnlp/ZhangDBYYCHZC20,
  author    = {Ningyu Zhang and
               Shumin Deng and
               Zhen Bi and
               Haiyang Yu and
               Jiacheng Yang and
               Mosha Chen and
               Fei Huang and
               Wei Zhang and
               Huajun Chen},
  editor    = {Qun Liu and
               David Schlangen},
  title     = {OpenUE: An Open Toolkit of Universal Extraction from Text},
  booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural
               Language Processing: System Demonstrations, {EMNLP} 2020 - Demos,
               Online, November 16-20, 2020},
  pages     = {1--8},
  publisher = {Association for Computational Linguistics},
  year      = {2020},
  url       = {https://doi.org/10.18653/v1/2020.emnlp-demos.1},
  doi       = {10.18653/v1/2020.emnlp-demos.1},
  timestamp = {Wed, 08 Sep 2021 16:17:48 +0200},
  biburl    = {https://dblp.org/rec/conf/emnlp/ZhangDBYYCHZC20.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Other open-source knowledge extraction tools
