Documents, papers, and code related to Natural Language Processing, including Topic Model, Word Embedding, Named Entity Recognition, Text Classification, Text Generation, Text Similarity, Machine Translation, etc. All code is implemented in TensorFlow 2.0.

Stars: ✭ 1,290 (+435.27%)

Mutual labels: crf, ner

Daguan 2019 rank9

datagrand 2019 information extraction competition rank9

Stars: ✭ 121 (-49.79%)

Mutual labels: crf, ner

Etagger

reference tensorflow code for named entity tagging

Stars: ✭ 100 (-58.51%)

Mutual labels: crf, ner

Ner

Named entity recognition (NER): surveys, papers, models, code (BiLSTM-CRF/BERT-CRF), and competition resources, updated continuously

Stars: ✭ 118 (-51.04%)

Mutual labels: crf, ner

Ner Slot filling

Entity extraction and intent recognition for Chinese natural language (Natural Language Understanding), with a choice of Bi-LSTM + CRF or IDCNN + CRF

Stars: ✭ 151 (-37.34%)

Mutual labels: crf, ner

Named entity recognition

Chinese named entity recognition, with implementations of several models: HMM, CRF, BiLSTM, and BiLSTM+CRF

Stars: ✭ 995 (+312.86%)

Mutual labels: crf, ner

Lm Lstm Crf

Empower Sequence Labeling with Task-Aware Language Model

Stars: ✭ 778 (+222.82%)

Mutual labels: crf, ner

Torchcrf

An Implementation of CRF (Conditional Random Fields) in PyTorch 1.0

Stars: ✭ 58 (-75.93%)

Mutual labels: crf, ner

Bert Ner Pytorch

Chinese NER(Named Entity Recognition) using BERT(Softmax, CRF, Span)

Stars: ✭ 654 (+171.37%)

Mutual labels: crf, ner

Min nlp practice

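Chinese & English CWS, POS, and NER implemented using a CNN, bi-directional LSTM, and CRF model with char embedding. The network combines CNN pooling, a BiLSTM, and a CRF on top of character embeddings, and can perform Chinese and English word segmentation, part-of-speech tagging, and entity recognition in one model. It includes raw text data, data conversion, training scripts, and pretrained models, and can be used for sequence labeling research. Note: the only logic you need to implement is converting your own data into the sequence model format. Word segmentation accuracy is about 93%, POS tagging accuracy about 90%, and entity tagging accuracy (on this sample) about 85%.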

Stars: ✭ 107 (-55.6%)

Mutual labels: crf, ner

Ncrfpp

NCRF++, a Neural Sequence Labeling Toolkit. Easy to use for any sequence labeling task (e.g. NER, POS, Segmentation). It includes character LSTM/CNN, word LSTM/CNN and softmax/CRF components.

Stars: ✭ 1,767 (+633.2%)

Mutual labels: crf, ner

Clinical Ner

Named entity recognition for Chinese electronic medical records

Stars: ✭ 151 (-37.34%)

Mutual labels: crf, ner

Pytorch Bert Crf Ner

Korean named entity recognizer built with KoBERT and CRF (BERT+CRF based Named Entity Recognition model for Korean)

Stars: ✭ 236 (-2.07%)

Mutual labels: crf, ner

Macropodus

Macropodus, a natural language processing toolkit based on an Albert+BiLSTM+CRF deep learning architecture. It covers Chinese word segmentation (CWS), part-of-speech tagging (POS), named entity recognition (NER), new word discovery (Find), keyword extraction (Keyword), text summarization (Summarize), text similarity (Sim), a scientific calculator (Calculate), conversion between Chinese numerals and Arabic (or Roman) numerals (Chi2num), traditional/simplified Chinese conversion, and pinyin conversion.

Stars: ✭ 309 (+28.22%)

Mutual labels: crf, ner


All About Chinese NER

The code is not rigorously tested; if you find a bug, PRs are welcome ^_^ ~

See requirement.txt for versions and environment setup. Download links for the data and pretrained models are in the README of each corresponding folder.

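Supported models

Character-input single-task: bilstm_crf, bert_ce, bert_crf, bert_bilstm_crf, bert_cnn_crf, bert_bilstm_crf_bigram
Lexicon enhancement: bilstm_crf_softword, bilstm_crf_ex_softword, bilstm_crf_softlexicon, bilstm_crf_bichar
Multi-task

bert_bilstm_crf_mtl: multi-task joint learning with a shared Bert
bert_bilstm_crf_adv: adversarial transfer joint learning

Transformer architecture (bichar input by default): transformer_crf_bichar, transformer_tener_crf_bichar
Data augmentation: data/people_daily_augment, supporting entity replacement, Bert MASK replacement, sentence shuffle, and synonym replacement
MRC framework + BIO tagging schema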
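Several model names above refer to character-level side features (bichar, softword). The sketch below illustrates the general idea behind those two features only; it is not taken from the repo, and the function names, the `</s>` padding token, and the pre-segmented input are assumptions made for illustration.

```python
from typing import List


def bichar_features(chars: List[str], pad: str = "</s>") -> List[str]:
    """Pair each character with its right neighbour.

    The last character is paired with a padding token (an assumed convention).
    """
    return [
        c + (chars[i + 1] if i + 1 < len(chars) else pad)
        for i, c in enumerate(chars)
    ]


def softword_tags(words: List[str]) -> List[str]:
    """Turn a segmented sentence into one B/M/E/S tag per character.

    `words` is assumed to come from any word segmenter; single-character
    words get S, longer words get B, zero or more M, then E.
    """
    tags: List[str] = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


words = ["南京市", "长江", "大桥"]          # hypothetical segmenter output
chars = [c for w in words for c in w]
print(bichar_features(chars))
print(softword_tags(words))  # ['B', 'M', 'E', 'B', 'E', 'B', 'E']
```

Both functions return one feature per character, so the results can be embedded and concatenated with the character embedding before the encoder. How the repo actually builds these inputs may differ.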

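Training & evaluation

In pretrain_model, download each pretrained model into its corresponding folder; see the README.md in each folder for details
In data, run the corresponding dataset's preprocess.py to produce the tfrecord and data_params; training uses model_name to select the tfrecord generated by one of the following tokenizers
- Bert models use the WordPiece tokenizer and depend on the vocab file of the pretrained Bert model above
- Non-Bert models, including the lexicon-enhanced models, use the pretrained Giga and ctb50 embeddings
Run a single-task NER model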

python main.py --model bert_bilstm_crf --data msra
tensorboard --logdir ./checkpoint/ner_msra_bert_bilstm_crf

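Run a multi-task NER model: depending on the input datasets, this is either an NER+NER transfer/joint task or an NER+CWS task in which word segmentation enhances NER. Both currently use Joint Train; Alternative Train is not yet supported.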

## the order of the --data values corresponds to task1, task2 and the task weight
python main.py --model bert_bilstm_crf_mtl --data msra,people_daily 
python main.py --model bert_bilstm_crf_adv --data msra,msr
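The evaluation commands refer to a multi-task run as model_name_{task1}_{task2}. A minimal sketch of deriving that name from the training arguments follows; the helper `mtl_model_name` is hypothetical and not part of the repo.

```python
def mtl_model_name(model: str, data: str) -> str:
    """Build the evaluation-time name of a multi-task run.

    `model` is the value passed to --model and `data` the comma-separated
    value passed to --data (task1 first, task2 second).
    """
    tasks = [d.strip() for d in data.split(",") if d.strip()]
    if len(tasks) != 2:
        raise ValueError("expected exactly two datasets, e.g. 'msra,msr'")
    return "_".join([model] + tasks)


print(mtl_model_name("bert_bilstm_crf_mtl", "msra,msr"))
# bert_bilstm_crf_mtl_msra_msr
```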

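Evaluation: once training finishes, each model above dumps its test-set predictions to data; ready-made predictions have already been uploaded to the repo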

## single model: prints detailed tag-level and entity-level results
python evaluation.py --model bert_bilstm_crf --data msra
python evaluation.py --model bert_bilstm_crf_mtl_msra_msr --data msra ## note: for multi-task runs, model_name=model_name_{task1}_{task2}
## multi-model comparison: prints the weighted average tag and entity results, sorted by F1
python evaluation.py --model bert_crf,bert_bilstm_crf,bert_bilstm_crf_mtl_msra_msr --data msra
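To make the difference between tag-level and entity-level results concrete, here is a small sketch that scores one BIO-tagged sentence both ways. It is an illustration under stated assumptions, not the logic of evaluation.py: the function names are hypothetical, tag-level scoring is simplified to plain accuracy, and an entity counts as correct only when its type and both boundaries match exactly.

```python
from typing import List, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, end index exclusive)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO sequence; stray I- tags are ignored."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        continues = tag.startswith("I-") and tag[2:] == etype
        if start is not None and not continues:
            spans.add((etype, start, i))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def entity_prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over entity spans."""
    g, p = bio_to_spans(gold), bio_to_spans(pred)
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


def tag_accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of positions whose predicted tag equals the gold tag."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


gold = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC"]
pred = ["B-PER", "I-PER", "O", "B-LOC", "O"]
print(tag_accuracy(gold, pred))  # 0.8
print(entity_prf(gold, pred))    # (0.5, 0.5, 0.5)
```

A single wrong tag costs one position at the tag level but a whole entity at the entity level, which is why the two sets of numbers can diverge.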

Inference

Install Docker: https://docs.docker.com/get-docker/
Pull the TF docker image

docker pull tensorflow/serving:1.14.0

warmup (optional): the 3 models provided in serving_model have already been run through warmup

python warmup.py

run server: the server reads the model_name used for inference from inference.py

bash server.sh

run client: takes input text and returns the prediction

python inference.py

Figures (not shown): inference latency without warmup, and inference latency after adding warmup.
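A rough way to reproduce this comparison is to time the first few requests with and without a warmup call. The sketch below uses a hypothetical `LazyModel` stand-in that simulates one-off initialisation cost with a sleep; in practice `predict` would be whatever function sends text to the running server.

```python
import time
from typing import Callable, List


def time_calls(predict: Callable[[str], object], texts: List[str]) -> List[float]:
    """Return the wall-clock latency in seconds of each predict(text) call."""
    latencies = []
    for text in texts:
        start = time.perf_counter()
        predict(text)
        latencies.append(time.perf_counter() - start)
    return latencies


class LazyModel:
    """Hypothetical stand-in for a served model that initialises on first use."""

    def __init__(self) -> None:
        self.ready = False

    def __call__(self, text: str) -> List[str]:
        if not self.ready:
            time.sleep(0.05)  # simulated one-off initialisation cost
            self.ready = True
        return ["O"] * len(text)


texts = ["今天天气不错"] * 3

cold = time_calls(LazyModel(), texts)   # first request pays the init cost

warm_model = LazyModel()
warm_model("warmup request")            # warmup absorbs the init cost
warm = time_calls(warm_model, texts)

print("first request, no warmup:   %.4fs" % cold[0])
print("first request, with warmup: %.4fs" % warm[0])
```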

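Blog posts

All About Chinese NER 1. Bert-Bilstm-CRF baseline model explained & code implementation

All About Chinese NER 2. Multi-task and adversarial transfer learning explained & code implementation

All About Chinese NER 3. SoftLexicon and other lexicon enhancement methods explained & code implementation

TensorFlow pitfalls collection 2. TF Serving & gRPC pitfalls

All About Chinese NER 4. Experiments with data augmentation for NER

All About Chinese NER 5. Transformer relative position encoding & TENER code implementation

All About Chinese NER 6. A new paradigm for NER: question answering with MRC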


DSXiangLi / ChineseNER
