DSXiangLi / ChineseNER

Licence: other
All About Chinese NER

The code has not been rigorously tested. If you find a bug, PRs are welcome ^_^

See requirement.txt for versions and environment setup. Download links for the data and the pretrained models are in the README of the corresponding folder.

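Supported Models

  1. Character-input single-task: bilstm_crf, bert_ce, bert_crf, bert_bilstm_crf, bert_cnn_crf, bert_bilstm_crf_bigram

  2. Lexicon enhancement: bilstm_crf_softword, bilstm_crf_ex_softword, bilstm_crf_softlexicon, bilstm_crf_bichar

  3. Multi-task

  • bert_bilstm_crf_mtl: multi-task joint learning with a shared BERT
  • bert_bilstm_crf_adv: adversarial transfer joint learning

  4. Transformer architectures (bichar input by default): transformer_crf_bichar, transformer_tener_crf_bichar

  5. Data augmentation: data/people_daily_augment, supporting entity replacement, BERT MASK replacement, sentence shuffling, and synonym replacement

  6. MRC framework + BIO tagging schema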
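
The entity-replacement augmentation listed above can be illustrated with a minimal sketch. This is not the code in data/people_daily_augment: the function names, the `lexicon` dictionary, and the example sentence are all hypothetical, and the sketch assumes character-level input with BIO tags.

```python
import random


def extract_spans(tags):
    """Return (entity_type, start, end_exclusive) spans from a BIO tag sequence."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing entity
        inside = tag.startswith("I-") and tag[2:] == etype
        if start is not None and not inside:
            spans.append((etype, start, i))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def replace_entities(chars, tags, lexicon, rng):
    """Swap each entity for a random same-type entry of `lexicon` and re-tag it in BIO.

    `lexicon` is a hypothetical dict mapping entity type -> list of non-empty strings.
    Entities whose type has no candidates are copied through unchanged.
    """
    new_chars, new_tags, prev = [], [], 0
    for etype, start, end in extract_spans(tags):
        # Copy the non-entity text between the previous entity and this one.
        new_chars.extend(chars[prev:start])
        new_tags.extend(tags[prev:start])
        candidates = lexicon.get(etype)
        entity = rng.choice(candidates) if candidates else "".join(chars[start:end])
        new_chars.extend(entity)  # one element per character
        new_tags.extend(["B-" + etype] + ["I-" + etype] * (len(entity) - 1))
        prev = end
    new_chars.extend(chars[prev:])
    new_tags.extend(tags[prev:])
    return new_chars, new_tags


chars = list("张三在北京工作")
tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O", "O"]
lexicon = {"PER": ["欧阳修"], "LOC": ["上海"]}  # illustrative entries only
aug_chars, aug_tags = replace_entities(chars, tags, lexicon, random.Random(0))
```

Because the replacement entity may differ in length from the original, the tags are regenerated rather than copied, which keeps characters and labels aligned.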

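Training & Evaluation

  1. Download the pretrained model you need into the matching folder under pretrain_model; see the README.md in each folder for details
  2. Under data, run the preprocess.py of the dataset you need to generate the tfrecord and data_params. Training picks the tfrecord produced by one of the following tokenizers according to model_name
    • BERT-based models use the WordPiece tokenizer, which depends on the vocab file of the pretrained BERT model above
    • Non-BERT models, including the lexicon-enhanced models, use the Giga and ctb50 pretrained word embeddings
  3. Run a single-task NER model
python main.py --model bert_bilstm_crf --data msra
tensorboard --logdir ./checkpoint/ner_msra_bert_bilstm_crf
  4. Run a multi-task NER model: depending on the input datasets, this is either a NER+NER transfer/joint task or a NER+CWS task in which word segmentation enhances NER. Only Joint Train is supported at the moment; Alternative Train is not
## the order of the datasets passed to --data corresponds to task1, task2 and the task weight
python main.py --model bert_bilstm_crf_mtl --data msra,people_daily
python main.py --model bert_bilstm_crf_adv --data msra,msr
  5. Evaluation: after training, the models above dump their test-set predictions into data. Ready-made predictions are already uploaded to the repo
## Single model: prints detailed tag-level and entity-level results
python evaluation.py --model bert_bilstm_crf --data msra
python evaluation.py --model bert_bilstm_crf_mtl_msra_msr --data msra ## note: for multi-task models, model_name=model_name_{task1}_{task2}
## Multi-model comparison: prints the weighted-average tag and entity results, sorted by F1
python evaluation.py --model bert_crf,bert_bilstm_crf,bert_bilstm_crf_mtl_msra_msr --data msra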
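
To make the difference between tag-level and entity-level results concrete, here is a minimal sketch. It does not reproduce evaluation.py: the function names and the toy sequences are hypothetical, and it assumes BIO tags with exact-match scoring, where a predicted entity counts only if its type and both boundaries match the gold entity.

```python
def extract_spans(tags):
    """Return (entity_type, start, end_exclusive) spans from a BIO tag sequence."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing entity
        inside = tag.startswith("I-") and tag[2:] == etype
        if start is not None and not inside:
            spans.append((etype, start, i))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def tag_accuracy(gold_seqs, pred_seqs):
    """Fraction of positions whose predicted tag equals the gold tag."""
    pairs = [(g, p) for gold, pred in zip(gold_seqs, pred_seqs)
             for g, p in zip(gold, pred)]
    return sum(g == p for g, p in pairs) / len(pairs)


def entity_prf(gold_seqs, pred_seqs):
    """Exact-match entity-level precision, recall and F1 over a corpus."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        gold_spans, pred_spans = set(extract_spans(gold)), set(extract_spans(pred))
        tp += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example: the LOC entity is predicted one character too short.
gold = [["B-PER", "I-PER", "O", "B-LOC", "I-LOC"]]
pred = [["B-PER", "I-PER", "O", "B-LOC", "O"]]
acc = tag_accuracy(gold, pred)
precision, recall, f1 = entity_prf(gold, pred)
```

In this toy case four of five tags are correct, yet only one of the two entities matches exactly, so the entity-level scores are lower than the tag-level accuracy.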

       

Inference

  1. Install Docker: https://docs.docker.com/get-docker/

  2. Pull the TF Serving docker image

docker pull tensorflow/serving:1.14.0
  3. warmup (optional): the 3 models provided in serving_model have already been warmed up
python warmup.py
  4. run server: the server reads the model_name used for inference from inference.py
bash server.sh
  5. run client: pass in text and get the prediction back
python inference.py

[Figure: inference latency without warmup]

[Figure: inference latency with warmup (img_1.png)]

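Blog

All About Chinese NER 1. Bert-Bilstm-CRF baseline model explained, with code

All About Chinese NER 2. Multi-task and adversarial transfer learning explained, with code

All About Chinese NER 3. SoftLexicon and other lexicon enhancement explained, with code

TensorFlow Pitfalls Collection 2. TF Serving & gRPC pitfalls

All About Chinese NER 4. Trying data augmentation for NER

All About Chinese NER 5. Transformer relative position encoding & TENER implementation

All About Chinese NER 6. A new paradigm for NER! Question answering with MRC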
