buppt / Chinesener

Chinese named entity recognition and entity extraction with TensorFlow and PyTorch, using a BiLSTM+CRF model

Programming Languages

python

Labels

pytorch tensorflow chinese named-entity-recognition ner

Projects that are alternatives of or similar to Chinesener

Cluener2020

CLUENER2020: Chinese Fine-Grained Named Entity Recognition

Stars: ✭ 689 (-26.55%)

Mutual labels: chinese, named-entity-recognition, ner

Bert Bilstm Crf Ner

Tensorflow solution of NER task Using BiLSTM-CRF model with Google BERT Fine-tuning And private Server services

Stars: ✭ 3,838 (+309.17%)

Mutual labels: named-entity-recognition, ner

Snips Nlu

Snips Python library to extract meaning from text

Stars: ✭ 3,583 (+281.98%)

Mutual labels: named-entity-recognition, ner

Spacy Streamlit

👑 spaCy building blocks and visualizers for Streamlit apps

Stars: ✭ 360 (-61.62%)

Mutual labels: named-entity-recognition, ner

Named Entity Recognition Ner Papers

An elaborate and exhaustive paper list for Named Entity Recognition (NER)

Stars: ✭ 302 (-67.8%)

Mutual labels: named-entity-recognition, ner

Albert Chinese Ner

Chinese NER using the pretrained language model ALBERT

Stars: ✭ 302 (-67.8%)

Mutual labels: chinese, ner

Autoner

Learning Named Entity Tagger from Domain-Specific Dictionary

Stars: ✭ 357 (-61.94%)

Mutual labels: named-entity-recognition, ner

NER-Multimodal-pytorch

Pytorch Implementation of "Adaptive Co-attention Network for Named Entity Recognition in Tweets" (AAAI 2018)

Stars: ✭ 42 (-95.52%)

Mutual labels: named-entity-recognition, ner

Jionlp

A preprocessing toolkit for Chinese NLP tasks: accurate, efficient, and with no barrier to entry

Stars: ✭ 449 (-52.13%)

Mutual labels: chinese, ner

Lightkg

A deep learning framework for knowledge graphs, based on PyTorch and torchtext.

Stars: ✭ 452 (-51.81%)

Mutual labels: named-entity-recognition, ner

Bert Ner Pytorch

Chinese NER(Named Entity Recognition) using BERT(Softmax, CRF, Span)

Stars: ✭ 654 (-30.28%)

Mutual labels: chinese, ner

Bert Chinese Ner

Chinese NER using the pretrained language model BERT

Stars: ✭ 758 (-19.19%)

Mutual labels: chinese, ner

Bertweet

BERTweet: A pre-trained language model for English Tweets (EMNLP-2020)

Stars: ✭ 282 (-69.94%)

Mutual labels: named-entity-recognition, ner

Phobert

PhoBERT: Pre-trained language models for Vietnamese (EMNLP-2020 Findings)

Stars: ✭ 332 (-64.61%)

Mutual labels: named-entity-recognition, ner

Chatbot ner

chatbot_ner: Named Entity Recognition for chatbots.

Stars: ✭ 273 (-70.9%)

Mutual labels: named-entity-recognition, ner

Vncorenlp

A Vietnamese natural language processing toolkit (NAACL 2018)

Stars: ✭ 354 (-62.26%)

Mutual labels: named-entity-recognition, ner

huner

Named Entity Recognition for biomedical entities

Stars: ✭ 44 (-95.31%)

Mutual labels: named-entity-recognition, ner

chinese-nlp-ner

A BLSTM-CRF solution for Chinese entity recognition

Stars: ✭ 14 (-98.51%)

Mutual labels: chinese, ner

Bert Multitask Learning

BERT for Multitask Learning

Stars: ✭ 380 (-59.49%)

Mutual labels: named-entity-recognition, ner

Yedda

YEDDA: A Lightweight Collaborative Text Span Annotation Tool. Code for ACL 2018 Best Demo Paper Nomination.

Stars: ✭ 704 (-24.95%)

Mutual labels: named-entity-recognition, ner

ChineseNER

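This project uses:

python 2.7
tensorflow 1.7.0
pytorch 0.4.0

If you are new to named entity recognition, read this article first. Stars are appreciated.

This is about the simplest possible BiLSTM+CRF model for named entity recognition.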

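Data

The data folder contains three open datasets: the Boson data (https://bosonnlp.com), the annotated 1998 People's Daily corpus, and the open dataset from MSRA (Microsoft Research Asia). The Boson dataset has 6 entity types. For the People's Daily and MSRA corpora, usually only three entity types are extracted: person names, place names, and organization names.

Before training, run the Python scripts in the data folder to preprocess the data into the form the model uses.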
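The exact format produced by the preprocessing scripts is not described here. As a minimal sketch, assuming character-level BIO tagging (a common choice for Chinese NER), the conversion from annotated spans to per-character tags could look like this. The function name `to_bio`, the `(start, end, type)` span format, and the `PER`/`LOC` labels are hypothetical, and the sketch is written for Python 3 rather than the project's Python 2.7.

```python
def to_bio(sentence, entities):
    """Convert entity spans to character-level BIO tags.

    entities: list of (start, end, type) tuples, end exclusive.
    This span format is an assumption made for illustration.
    """
    tags = ["O"] * len(sentence)
    for start, end, etype in entities:
        tags[start] = "B-" + etype          # first character of the entity
        for i in range(start + 1, end):
            tags[i] = "I-" + etype          # remaining characters
    return tags


sentence = "张三在北京工作"
tags = to_bio(sentence, [(0, 2, "PER"), (3, 5, "LOC")])
print(tags)
```

Each character gets exactly one tag, so the model can treat NER as sequence labeling over characters without a separate word segmentation step.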

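TensorFlow version

Start training

Run python train.py to start training. Trained models are saved to the model folder.

Using pretrained word vectors

Run python train.py pretrained to train with pretrained word vectors. vec.txt is a fairly small set of pretrained word vectors found online; you can follow the code to switch to other, better pretrained vectors.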
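The layout of vec.txt is not specified here. The sketch below assumes the common word2vec text format (an optional header line with vocabulary size and dimension, then one token followed by its values per line) and shows one way to turn such a file into an initial embedding matrix. `load_vectors`, `build_embedding`, and `word2id` are hypothetical names, not the project's own code.

```python
import io
import random


def load_vectors(lines):
    """Parse word2vec-style text lines into {token: [floats]} (assumed format)."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) <= 2:      # blank line or "vocab_size dim" header
            continue
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_embedding(word2id, vectors, dim, seed=0):
    """Build a len(word2id) x dim matrix; tokens without a vector stay random."""
    rng = random.Random(seed)
    matrix = [[rng.uniform(-0.25, 0.25) for _ in range(dim)]
              for _ in range(len(word2id))]
    for word, idx in word2id.items():
        if word in vectors and len(vectors[word]) == dim:
            matrix[idx] = vectors[word]
    return matrix


# An in-memory stand-in for a vectors file; with a real file, pass the open file object.
sample = io.StringIO("2 3\n中 0.1 0.2 0.3\n国 0.4 0.5 0.6\n")
vectors = load_vectors(sample)
word2id = {"中": 0, "国": 1, "人": 2}
embedding = build_embedding(word2id, vectors, dim=3)
print(len(embedding), len(embedding[0]))
```

The resulting matrix would then be used as the initial value of the model's embedding layer. Tokens missing from the pretrained file keep a small random initialization.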

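Testing a trained model

Run python train.py test to test. It automatically loads the latest model in the model folder; type Chinese text to test it. The quality of the results depends on the accuracy of the model.

File-level entity extraction

Run python train.py input_file output_file to extract entities from a file.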
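The four invocations described so far (no argument, pretrained, test, and an input/output file pair) can be summarized with a small dispatch sketch. This is an illustration of the command-line convention only, not the project's actual train.py; `parse_mode` and the returned mode names are hypothetical.

```python
def parse_mode(argv):
    """Map command-line arguments (without the script name) to a run mode."""
    if len(argv) == 0:
        return ("train", {"pretrained": False})
    if len(argv) == 1 and argv[0] == "pretrained":
        return ("train", {"pretrained": True})
    if len(argv) == 1 and argv[0] == "test":
        return ("test", {})
    if len(argv) == 2:
        return ("extract", {"input_file": argv[0], "output_file": argv[1]})
    raise ValueError("unrecognized arguments: %r" % (argv,))


# In a real script the argument list would be sys.argv[1:].
for args in ([], ["pretrained"], ["test"], ["test1.txt", "res.txt"]):
    print(args, "->", parse_mode(args))
```

Positional keywords keep the commands short, at the cost that a single unrecognized argument is an error and a file literally named test cannot be used as a lone argument.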

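This automatically loads the latest model in the model folder, extracts the entities from input_file, and writes them to output_file. Each record is the original sentence followed by the entity types and entities (modify as needed).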
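As a rough sketch of the output step, assuming the model emits one BIO tag per character, entities can be collected from the tag sequence and written after the original sentence. `extract_entities`, `format_result`, and the "TYPE: text" line layout are hypothetical; the project's actual output layout may differ.

```python
def extract_entities(chars, tags):
    """Collect (type, text) pairs from character-level BIO tags."""
    entities = []
    current_chars, current_type = [], None
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            if current_chars:                      # close the previous entity
                entities.append((current_type, "".join(current_chars)))
            current_chars, current_type = [ch], tag[2:]
        elif tag.startswith("I-") and current_chars and tag[2:] == current_type:
            current_chars.append(ch)               # continue the open entity
        else:                                      # "O" or an inconsistent "I-" tag
            if current_chars:
                entities.append((current_type, "".join(current_chars)))
            current_chars, current_type = [], None
    if current_chars:                              # entity running to the end
        entities.append((current_type, "".join(current_chars)))
    return entities


def format_result(sentence, tags):
    """Original sentence first, then one 'TYPE: entity' line per entity."""
    lines = [sentence]
    for etype, text in extract_entities(sentence, tags):
        lines.append("%s: %s" % (etype, text))
    return "\n".join(lines)


sentence = "张三在北京工作"
tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O", "O"]
result = format_result(sentence, tags)
# result has three lines: the sentence, "PER: 张三", and "LOC: 北京"
```

An I- tag that does not continue an open entity of the same type is dropped here; whether to drop or repair such tags is a design choice.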

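For example, after running python train.py test1.txt res.txt, the contents of res.txt are as follows:

Further changes will be added from time to time.

PyTorch version

This uses the BiLSTM+CRF model from the PyTorch tutorial directly.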
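In a BiLSTM+CRF model, the BiLSTM produces a score for every tag at every position, and the CRF layer adds tag-to-tag transition scores and picks the best overall tag sequence with Viterbi decoding. The following is a plain-Python sketch of that decoding step, not the tutorial's code. It assumes `transitions[i][j]` scores moving from tag i to tag j and omits the start and stop tags for brevity.

```python
def viterbi_decode(emissions, transitions):
    """Return (best_score, best_path) over tag indices.

    emissions[t][j]:   score of tag j at position t (e.g. from a BiLSTM)
    transitions[i][j]: score of moving from tag i to tag j (assumed convention)
    """
    n_tags = len(transitions)
    scores = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        step_scores, step_back = [], []
        for j in range(n_tags):
            # Best previous tag for arriving at tag j.
            best_prev = max(range(n_tags),
                            key=lambda i: scores[i] + transitions[i][j])
            step_back.append(best_prev)
            step_scores.append(scores[best_prev] + transitions[best_prev][j] + emit[j])
        scores = step_scores
        backpointers.append(step_back)
    best_last = max(range(n_tags), key=lambda j: scores[j])
    path = [best_last]
    for step_back in reversed(backpointers):   # follow backpointers to the start
        path.append(step_back[path[-1]])
    path.reverse()
    return scores[best_last], path


# Tags: 0 = O, 1 = B, 2 = I. The transition O -> I is heavily penalized.
transitions = [
    [0.0, 0.0, -100.0],   # from O
    [0.0, 0.0, 0.0],      # from B
    [0.0, 0.0, 0.0],      # from I
]
emissions = [
    [1.0, 0.5, 0.0],      # position 0: O scores highest on its own
    [0.0, 0.2, 1.0],      # position 1: I scores highest on its own
]
score, path = viterbi_decode(emissions, transitions)
print(score, path)   # 1.5 [1, 2]
```

Picking the best tag independently at each position would give O followed by I, an invalid sequence. The transition scores steer decoding to B followed by I instead.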

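Run train.py to train. Because it runs on the CPU and does not use batching, training is extremely slow. If you only want to try the code out, train on just part of the data. The PyTorch version is not being updated for now.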
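One simple way to train on part of the data is to draw a fixed random subset of sentence/label pairs before training. This is a minimal sketch; `subsample` and its parameters are hypothetical and not part of the project.

```python
import random


def subsample(sentences, labels, fraction, seed=0):
    """Keep a random fraction of aligned (sentence, label) pairs, in original order."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not sentences:
        return [], []
    n_keep = max(1, int(len(sentences) * fraction))
    rng = random.Random(seed)                      # fixed seed for repeatable runs
    indices = sorted(rng.sample(range(len(sentences)), n_keep))
    return [sentences[i] for i in indices], [labels[i] for i in indices]


sentences = ["s%d" % i for i in range(10)]
labels = ["t%d" % i for i in range(10)]
small_x, small_y = subsample(sentences, labels, fraction=0.3)
print(len(small_x), len(small_y))   # 3 3
```

Sampling indices rather than shuffling the two lists separately keeps each sentence aligned with its tag sequence.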

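Accuracy

The hyperparameters have not been tuned carefully. The F-score is roughly 70% to 75% on the Boson dataset and roughly 85% to 90% on the People's Daily and MSRA datasets. (Boson has 6 entity types, whereas the other two have only 3.)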
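How the project computes its F-score is not stated here. A common convention for NER is the entity-level F1, where a predicted entity counts as correct only if its span and type both match a gold entity. A minimal sketch under that assumption, with the hypothetical helper `prf` and entities represented as `(sentence_id, start, end, type)` tuples:

```python
def prf(gold, predicted):
    """Entity-level precision, recall, and F1 over exact (span, type) matches."""
    gold_set, pred_set = set(gold), set(predicted)
    correct = len(gold_set & pred_set)
    precision = correct / float(len(pred_set)) if pred_set else 0.0
    recall = correct / float(len(gold_set)) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = [(0, 0, 2, "PER"), (0, 3, 5, "LOC"), (1, 0, 3, "ORG"), (1, 5, 7, "PER")]
pred = [(0, 0, 2, "PER"), (0, 3, 5, "LOC"), (1, 0, 3, "ORG"), (1, 5, 8, "PER")]
precision, recall, f1 = prf(gold, pred)
print(precision, recall, f1)   # 0.75 0.75 0.75
```

The last prediction has the right type but the wrong end offset, so it counts as both a false positive and a missed gold entity.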

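Changelog

2018-9-15 Added the TensorFlow version.

2018-9-17 Added the 1998 People's Daily dataset and the MSRA (Microsoft Research Asia) dataset.

2018-9-19 Lightly revised the code style and split the model out into its own module to make future extension easier.

2018-9-22 Added the python train.py test feature.

2018-10-6 Added a command-line argument that controls whether training uses pretrained word vectors.

2018-10-11 New feature: extract the entities from one text file and write them to another file.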

Stars: ✭ 938

Visit Git Page 🔗Visit User Page 🔗Visit Issues Page (22) 🔗