
kyzhouhzau / Bert Ner

License: MIT
Use Google's BERT for named entity recognition (CoNLL-2003 as the dataset).

Programming Languages

python

Projects that are alternatives of or similar to Bert Ner

Cluener2020
CLUENER2020: Chinese fine-grained named entity recognition (Fine Grained Named Entity Recognition)
Stars: ✭ 689 (-31.92%)
Mutual labels:  ner
Chinesener
Chinese named entity recognition and entity extraction; tensorflow, pytorch, BiLSTM+CRF
Stars: ✭ 938 (-7.31%)
Mutual labels:  ner
Meta Emb
Multilingual Meta-Embeddings for Named Entity Recognition (RepL4NLP & EMNLP 2019)
Stars: ✭ 28 (-97.23%)
Mutual labels:  ner
Bert Chinese Ner
Chinese NER using the pretrained language model BERT
Stars: ✭ 758 (-25.1%)
Mutual labels:  ner
Entity Recognition Datasets
A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.
Stars: ✭ 891 (-11.96%)
Mutual labels:  ner
Company Names Corpus
A corpus of company names and organization names: company short forms, abbreviations, brand words, and enterprise names. Useful for Chinese word segmentation and organization-name entity recognition.
Stars: ✭ 868 (-14.23%)
Mutual labels:  ner
Xmnlp
xmnlp: Chinese word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, text correction, text-to-pinyin conversion, text summarization, radical extraction, and more
Stars: ✭ 591 (-41.6%)
Mutual labels:  ner
Named entity recognition
Chinese named entity recognition (including concrete implementations of several models: HMM, CRF, BiLSTM, BiLSTM+CRF)
Stars: ✭ 995 (-1.68%)
Mutual labels:  ner
Sohu baseline
BERT-based Chinese named entity recognition (pytorch)
Stars: ✭ 19 (-98.12%)
Mutual labels:  ner
Recognizers Text
Microsoft.Recognizers.Text provides recognition and resolution of numbers, units, and date/time expressed in multiple languages (ZH, EN, FR, ES, PT, DE, IT, TR, HI. Partial support for NL, JA, KO, SV). Contributions are greatly welcome! Packages are available at https://www.nuget.org/profiles/Recognizers.Text and https://www.npmjs.com/~recognizers.text
Stars: ✭ 915 (-9.58%)
Mutual labels:  ner
Lm Lstm Crf
Empower Sequence Labeling with Task-Aware Language Model
Stars: ✭ 778 (-23.12%)
Mutual labels:  ner
Chatbot cn
A chatbot for the finance and legal domains (with some chit-chat capability). Its main modules include information extraction, NLU, NLG, and a knowledge graph; the front-end display is integrated via Django, and RESTful interfaces for nlp and kg are already provided.
Stars: ✭ 791 (-21.84%)
Mutual labels:  ner
Tf ner
Simple and Efficient Tensorflow implementations of NER models with tf.estimator and tf.data
Stars: ✭ 876 (-13.44%)
Mutual labels:  ner
Yedda
YEDDA: A Lightweight Collaborative Text Span Annotation Tool. Code for ACL 2018 Best Demo Paper Nomination.
Stars: ✭ 704 (-30.43%)
Mutual labels:  ner
Defactonlp
DeFactoNLP: An Automated Fact-checking System that uses Named Entity Recognition, TF-IDF vector comparison and Decomposable Attention models.
Stars: ✭ 30 (-97.04%)
Mutual labels:  ner
Bert Ner Pytorch
Chinese NER(Named Entity Recognition) using BERT(Softmax, CRF, Span)
Stars: ✭ 654 (-35.38%)
Mutual labels:  ner
Knowledge Graphs
A collection of research on knowledge graphs
Stars: ✭ 845 (-16.5%)
Mutual labels:  ner
Jointre
End-to-end neural relation extraction using deep biaffine attention (ECIR 2019)
Stars: ✭ 41 (-95.95%)
Mutual labels:  ner
Nlp Experiments In Pytorch
PyTorch repository for text categorization and NER experiments in Turkish and English.
Stars: ✭ 35 (-96.54%)
Mutual labels:  ner
Nlp Knowledge Graph
Research and applications of three core technologies: natural language processing, knowledge graphs, and dialogue systems.
Stars: ✭ 908 (-10.28%)
Mutual labels:  ner

For better performance, you can try NLPGNN; see NLPGNN for more details.

BERT-NER Version 2

Use Google's BERT for named entity recognition (CoNLL-2003 as the dataset).

The original version (see old_version for more detail) contains some hard-coded values and lacks comments, which makes it inconvenient to understand. This updated version adds some new ideas and tricks (in data preprocessing and layer design) that help you quickly implement the fine-tuning model (you just need to modify crf_layer or softmax_layer).
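
To illustrate where those two layers sit, here is a minimal sketch, not the repository's exact code: the variable names, tensor shapes, and the use of tf.contrib.crf are assumptions based on typical TensorFlow 1.x BERT fine-tuning code. It shows a per-token softmax head and a CRF head placed on top of BERT's sequence output.

# Minimal sketch of a softmax output layer and a CRF output layer on top of
# BERT's sequence output (TensorFlow 1.x). Names and shapes are illustrative.
import tensorflow as tf


def softmax_layer(bert_sequence_output, labels, num_labels, mask):
    """Per-token classification: project each token embedding to label logits."""
    logits = tf.layers.dense(bert_sequence_output, num_labels)      # [batch, seq_len, num_labels]
    log_probs = tf.nn.log_softmax(logits, axis=-1)
    one_hot = tf.one_hot(labels, depth=num_labels, dtype=tf.float32)
    per_token_loss = -tf.reduce_sum(one_hot * log_probs, axis=-1)   # [batch, seq_len]
    mask = tf.cast(mask, tf.float32)
    loss = tf.reduce_sum(per_token_loss * mask) / (tf.reduce_sum(mask) + 1e-9)
    predictions = tf.argmax(logits, axis=-1)
    return loss, predictions


def crf_layer(bert_sequence_output, labels, num_labels, seq_lengths):
    """CRF head: model label transitions jointly instead of scoring tokens independently."""
    logits = tf.layers.dense(bert_sequence_output, num_labels)      # [batch, seq_len, num_labels]
    log_likelihood, transition_params = tf.contrib.crf.crf_log_likelihood(
        inputs=logits, tag_indices=labels, sequence_lengths=seq_lengths)
    loss = tf.reduce_mean(-log_likelihood)
    predictions, _ = tf.contrib.crf.crf_decode(logits, transition_params, seq_lengths)
    return loss, predictions

Swapping between the two heads is what the crf=False/True flag in run_ner.sh below controls.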

Folder Description:

BERT-NER
|____ bert                          # need git from [here](https://github.com/google-research/bert)
|____ cased_L-12_H-768_A-12	    # need download from [here](https://storage.googleapis.com/bert_models/2018_10_18/cased_L-12_H-768_A-12.zip)
|____ data		            # train data
|____ middle_data	            # middle data (label id map)
|____ output			    # output (final model, predict results)
|____ BERT_NER.py		    # main code
|____ conlleval.pl		    # eval code
|____ run_ner.sh    		    # run model and eval result
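
The middle_data folder above holds the label-to-id map built during preprocessing. Purely as an illustration (the exact label set, the extra X/[CLS]/[SEP] entries, and the file name below are assumptions, not taken from the repository's code), such a map for the CoNLL-2003 BIO tags could be produced like this:

# Illustrative only: build and save a label-to-id map for CoNLL-2003 BIO tags.
import os
import pickle

# CoNLL-2003 tags plus the extra tags commonly added for BERT sub-tokens and
# special tokens ("X", "[CLS]", "[SEP]"); this exact list is an assumption.
labels = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC", "B-ORG", "I-ORG",
          "B-MISC", "I-MISC", "X", "[CLS]", "[SEP]"]
label2id = {label: idx for idx, label in enumerate(labels, start=1)}  # 0 left for padding

os.makedirs("middle_data", exist_ok=True)
with open("middle_data/label2id.pkl", "wb") as f:  # hypothetical file name
    pickle.dump(label2id, f)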

Usage:

bash run_ner.sh

What's in run_ner.sh:

python BERT_NER.py\
    --task_name="NER"  \
    --do_lower_case=False \
    --crf=False \
    --do_train=True   \
    --do_eval=True   \
    --do_predict=True \
    --data_dir=data   \
    --vocab_file=cased_L-12_H-768_A-12/vocab.txt  \
    --bert_config_file=cased_L-12_H-768_A-12/bert_config.json \
    --init_checkpoint=cased_L-12_H-768_A-12/bert_model.ckpt   \
    --max_seq_length=128   \
    --train_batch_size=32   \
    --learning_rate=2e-5   \
    --num_train_epochs=3.0   \
    --output_dir=./output/result_dir

perl conlleval.pl -d '\t' < ./output/result_dir/label_test.txt

Notice: the cased model is recommended, according to this paper. The CoNLL-2003 dataset and the perl script come from here
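
conlleval.pl scores one token per line, with the gold label in the second-to-last column and the predicted label in the last; the -d '\t' flag above tells it the columns are tab-separated. Below is a minimal sketch of writing a file in that layout; the path mirrors the command above, while the variable names and the example sentence are illustrative and not taken from the repository.

# Sketch: write token / gold-label / predicted-label triples in the
# tab-separated layout that conlleval.pl -d '\t' expects.
import os

# Hypothetical predictions for one sentence, just to show the file layout.
tokens      = ["EU", "rejects", "German", "call"]
gold_labels = ["B-ORG", "O", "B-MISC", "O"]
pred_labels = ["B-ORG", "O", "B-MISC", "O"]

os.makedirs("./output/result_dir", exist_ok=True)
with open("./output/result_dir/label_test.txt", "w", encoding="utf-8") as f:
    for token, gold, pred in zip(tokens, gold_labels, pred_labels):
        f.write(f"{token}\t{gold}\t{pred}\n")  # token <TAB> gold <TAB> predicted
    f.write("\n")  # blank line marks a sentence boundary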

RESULTS (on the test set):

Parameter setting:

  • do_lower_case=False
  • num_train_epochs=4.0
  • crf=False
accuracy:  98.15%; precision:  90.61%; recall:  88.85%; FB1:  89.72
              LOC: precision:  91.93%; recall:  91.79%; FB1:  91.86  1387
             MISC: precision:  83.83%; recall:  78.43%; FB1:  81.04  668
              ORG: precision:  87.83%; recall:  85.18%; FB1:  86.48  1191
              PER: precision:  95.19%; recall:  94.83%; FB1:  95.01  1311

Result description:

Here I just use the default parameters, but as Google's paper says, a 0.2% error is reasonable (the paper reports 92.4%). Some tricks may need to be added to the above model.

References:

[1] https://arxiv.org/abs/1810.04805

[2] https://github.com/google-research/bert
