ProHiryu / Albert Chinese Ner

License: MIT
Chinese NER with the pre-trained language model ALBERT

Programming Languages

python

Projects that are alternatives of or similar to Albert Chinese Ner

Zh Ner Keras
Stars: ✭ 252 (-16.56%)
Mutual labels:  chinese, ner
Bert Ner Pytorch
Chinese NER(Named Entity Recognition) using BERT(Softmax, CRF, Span)
Stars: ✭ 654 (+116.56%)
Mutual labels:  chinese, ner
Jionlp
A preprocessing toolkit for Chinese NLP tasks: accurate, efficient, and with zero barrier to use
Stars: ✭ 449 (+48.68%)
Mutual labels:  chinese, ner
Bert Chinese Ner
Chinese NER with the pre-trained language model BERT
Stars: ✭ 758 (+150.99%)
Mutual labels:  chinese, ner
Uer Py
Open Source Pre-training Model Framework in PyTorch & Pre-trained Model Zoo
Stars: ✭ 1,295 (+328.81%)
Mutual labels:  chinese, ner
Chinesener
Chinese named entity recognition, entity extraction, tensorflow, pytorch, BiLSTM+CRF
Stars: ✭ 938 (+210.6%)
Mutual labels:  chinese, ner
Cluener2020
CLUENER2020: Chinese Fine-Grained Named Entity Recognition
Stars: ✭ 689 (+128.15%)
Mutual labels:  chinese, ner
Cluedatasetsearch
Search all Chinese NLP datasets, with commonly used English NLP datasets included
Stars: ✭ 2,112 (+599.34%)
Mutual labels:  chinese, ner
chinese-nlp-ner
A BLSTM-CRF solution for Chinese entity recognition
Stars: ✭ 14 (-95.36%)
Mutual labels:  chinese, ner
Swift
Essentials for getting started with app development in Swift
Stars: ✭ 257 (-14.9%)
Mutual labels:  chinese
Uber go guide cn
Chinese translation of the Uber Go Style Guide.
Stars: ✭ 4,277 (+1316.23%)
Mutual labels:  chinese
ArchLinuxTutorial
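✨ Arch Linux installation and usage tutorial, updated daily! | Covers every aspect of Arch Linux from installation to daily use, entertainment, programming, and media production, so Arch can become your everyday system! | Online web documentation available ✨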
Stars: ✭ 513 (+69.87%)
Mutual labels:  chinese
Php Best Practices Zh cn
PHP Best Practices (Chinese translation)
Stars: ✭ 261 (-13.58%)
Mutual labels:  chinese
Hscrf Pytorch
ACL 2018: Hybrid semi-Markov CRF for Neural Sequence Labeling (http://aclweb.org/anthology/P18-2038)
Stars: ✭ 284 (-5.96%)
Mutual labels:  ner
rust-course
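The Rust Language Bible (Book & Course): a comprehensive, in-depth guide to the Rust language, with vivid examples and exercises that take you through every hurdle from first steps to practical application. Our goal is to build an excellent open-source Rust tutorial (course): to learn Rust, go to course.rs.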
Stars: ✭ 2,739 (+806.95%)
Mutual labels:  chinese
Cluecorpus2020
Large-scale Pre-training Corpus for Chinese (100 GB)
Stars: ✭ 278 (-7.95%)
Mutual labels:  chinese
sqlmap-wiki-zhcn
Possibly the most complete Chinese documentation for sqlmap.
Stars: ✭ 51 (-83.11%)
Mutual labels:  chinese
awesome-hokchew
A curated list of resources about the Hokchew / Foochow language (Eastern Min, Fuzhou dialect).
Stars: ✭ 16 (-94.7%)
Mutual labels:  chinese
Borgert Cms
Borgert is a CMS Open Source created with Laravel Framework 5.6
Stars: ✭ 298 (-1.32%)
Mutual labels:  chinese
Chinese Text Classification
Chinese-Text-Classification: Chinese text classification implemented with a TensorFlow CNN (convolutional neural network). QQ group: 522785813; WeChat group QR code: http://www.tensorflownews.com/
Stars: ✭ 284 (-5.96%)
Mutual labels:  chinese

albert-chinese-ner

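Preface

In some respects ALBERT may be more significant than BERT itself. Now that a Chinese pre-trained model has been released, this project fine-tunes it for NER on the same data used previously.

PS: for the conventional BERT NER model, see the earlier BERT-based project.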

Resources

Papers

Setup

  1. Download the ALBERT Chinese model; the base version is used here.
  2. Rename the model folder to albert_base_zh and place it in the project.
  3. Run:
    python albert_ner.py --task_name ner --do_train true --do_eval true --data_dir data --vocab_file ./albert_config/vocab.txt --bert_config_file ./albert_base_zh/albert_config_base.json --max_seq_length 128 --train_batch_size 64 --learning_rate 2e-5 --num_train_epochs 3 --output_dir albert_base_ner_checkpoints
  4. TensorFlow newer than 1.13 is recommended; 1.15 was used here. TF 2.0 is not supported.
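
Before launching a long training run, it can help to confirm that the environment matches the steps above. The following is a minimal pre-flight sketch, not part of the project: the helper names `tf_version_ok` and `missing_paths` are hypothetical, the path list is copied from the command above, and reading "newer than 1.13" as "1.14 or later in the 1.x series" is an assumption.

```python
from pathlib import Path
from typing import List

# Paths referenced by the training command, relative to the project root.
REQUIRED_PATHS = [
    "albert_ner.py",
    "data",
    "albert_config/vocab.txt",
    "albert_base_zh/albert_config_base.json",
]


def tf_version_ok(version: str) -> bool:
    """Return True for TensorFlow 1.x releases newer than 1.13 (assumed: 1.14+)."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return major == 1 and minor > 13


def missing_paths(root: str) -> List[str]:
    """Return the required paths that do not exist under the given root."""
    base = Path(root)
    return [p for p in REQUIRED_PATHS if not (base / p).exists()]
```

With TensorFlow installed, `tf_version_ok(tf.__version__)` checks the version, and `missing_paths(".")` run from the project root lists anything still to be downloaded or renamed.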

Results

After training the base model for 3 epochs:

INFO:tensorflow:  eval_f = 0.9280548
INFO:tensorflow:  eval_precision = 0.923054
INFO:tensorflow:  eval_recall = 0.9331808
INFO:tensorflow:  global_step = 2374
INFO:tensorflow:  loss = 13.210413
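
As a sanity check on the log above, the F score can be recomputed from the reported precision and recall. This sketch assumes `eval_f` is an F1 score (the harmonic mean of precision and recall); how the training script actually aggregates the metric is not shown here, so a small gap between the two numbers is not surprising.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the evaluation log above.
precision = 0.923054
recall = 0.9331808
reported_f = 0.9280548

f1 = f1_score(precision, recall)
print(f"F1 from P and R: {f1:.4f}, reported eval_f: {reported_f:.4f}")
```

The recomputed value and the logged `eval_f` differ by less than 1e-4.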

Test results, likewise:

[CLS]
B-LOC
I-LOC
O
B-LOC
I-LOC
I-PER
O
O
O
O
O
O
O
O
O
[SEP]
[CLS]
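
To turn a label sequence like the one above into entity spans, the BIO tags have to be grouped. The sketch below is one lenient way to do it and is not taken from the project: `bio_to_spans` is a hypothetical helper, and it assumes that an `I-` tag with no matching open span (such as the `I-PER` directly after `I-LOC` above) should start a new span rather than be discarded.

```python
from typing import List, Tuple

SPECIAL_TOKENS = {"[CLS]", "[SEP]"}


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Group BIO tags into (entity_type, start, end) spans, end exclusive.

    Special tokens are dropped first, so indices refer to real tokens only.
    """
    tags = [t for t in tags if t not in SPECIAL_TOKENS]
    spans = []
    current = None  # [entity_type, start] of the span being built
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        # An I- tag extends the open span only if the entity type matches.
        if prefix == "I" and current is not None and current[0] == label:
            continue
        # Anything else closes the open span, if there is one.
        if current is not None:
            spans.append((current[0], current[1], i))
            current = None
        # B- always opens a span; a stray I- is treated leniently as B-.
        if prefix in ("B", "I"):
            current = [label, i]
    if current is not None:
        spans.append((current[0], current[1], len(tags)))
    return spans


# The first sequence from the output above.
sample = ["[CLS]", "B-LOC", "I-LOC", "O", "B-LOC", "I-LOC", "I-PER"] + ["O"] * 9 + ["[SEP]"]
spans = bio_to_spans(sample)
print(spans)
```

A stricter decoder could instead drop the stray `I-PER`; which policy is appropriate depends on how the predictions are scored.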

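Summary

Compared with BERT itself, the model is indeed much smaller, yet its results are roughly on par with BERT or even ahead of it, and training time is greatly reduced. The era of "big ships and big guns" in NLP may truly be coming to an end.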
