htw2012 / chinese-nlp-ner

Licence: other
A BLSTM-CRF solution for Chinese named entity recognition

Programming Languages

python
perl
shell


chinese-nlp-ner

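A BLSTM-CRF solution for Chinese named entity recognition, covering:

  1. Data processing
  2. Model construction
  3. Model training
  4. Model testing
  5. Service deployment, in two ways (thrift and flask)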

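Notes on some key points of the implementation:

  1. Granularity.

    Both character-level and word-level inputs were tried. Overall the difference is small, about 1%. In practice, when the entities are long, word-level input performs slightly better.

  2. Adding external features.

    The main goal is to enrich what the Word Embedding layer of a standard blstm+crf can learn. In practice, adding word features and part-of-speech (POS) tag information improves performance by more than 2%. Overall, the more external features were added, the better the results tended to be.

  3. Computing the entity probability.

    This provides a confidence score that can be used when applying the model. The method: take the product of the output tag probabilities and raise it to the power 1/len. For example:

    Take the three-character output entity “刘德华”, whose output tags are B-Per, M-Per, E-Per. Its probability is computed as: P(B-Per,M-Per,E-Per)=P(B-Per)*P(M-Per|B-Per)*P(E-Per|B-Per,M-Per)

    To account for the effect of entity length, the product is raised to the power 1/len: Pr(B-Per,M-Per,E-Per)=pow(P(B-Per,M-Per,E-Per), 1/len(B-Per,M-Per,E-Per))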
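
The length-normalised entity probability described in point 3 can be sketched in a few lines of Python. This is only an illustration, not the project's actual code: the function name `entity_confidence` and the example probabilities are hypothetical, and the README does not say how the per-tag (conditional) probabilities are read out of the model, so the sketch simply takes them as input.

```python
import math


def entity_confidence(tag_probs):
    """Return (product of tag probabilities) ** (1 / len(tag_probs)).

    tag_probs: one probability per output tag of the entity, e.g.
    [P(B-Per), P(M-Per|B-Per), P(E-Per|B-Per,M-Per)].
    How these values are obtained from the model is an assumption
    left outside this sketch.
    """
    if not tag_probs:
        raise ValueError("tag_probs must contain at least one probability")
    if any(p <= 0.0 or p > 1.0 for p in tag_probs):
        raise ValueError("each probability must be in the interval (0, 1]")
    # Summing logs is equivalent to multiplying the probabilities,
    # but avoids floating-point underflow for long entities.
    log_sum = sum(math.log(p) for p in tag_probs)
    return math.exp(log_sum / len(tag_probs))


if __name__ == "__main__":
    # Hypothetical probabilities for the tags B-Per, M-Per, E-Per.
    probs = [0.9, 0.8, 0.95]
    print(entity_confidence(probs))
```

Taking the 1/len power makes the score a geometric mean of the tag probabilities, so a long entity is not penalised merely for having more factors in the product, and scores for entities of different lengths can be compared against a single threshold.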
