
kyzhouhzau / Bert Ner

License: MIT
Use Google's BERT for named entity recognition (CoNLL-2003 as the dataset).

Programming Languages

python

Projects that are alternatives of or similar to Bert Ner

Cluener2020
CLUENER2020: Chinese fine-grained named entity recognition (Fine Grained Named Entity Recognition)
Stars: ✭ 689 (-31.92%)
Mutual labels:  ner
Chinesener
Chinese named entity recognition and entity extraction; tensorflow, pytorch, BiLSTM+CRF
Stars: ✭ 938 (-7.31%)
Mutual labels:  ner
Meta Emb
Multilingual Meta-Embeddings for Named Entity Recognition (RepL4NLP & EMNLP 2019)
Stars: ✭ 28 (-97.23%)
Mutual labels:  ner
Bert Chinese Ner
Chinese NER using the pretrained language model BERT
Stars: ✭ 758 (-25.1%)
Mutual labels:  ner
Entity Recognition Datasets
A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.
Stars: ✭ 891 (-11.96%)
Mutual labels:  ner
Company Names Corpus
A corpus of company names and organization names: company short forms, abbreviations, brand words, and enterprise names. Useful for Chinese word segmentation and organization-name entity recognition.
Stars: ✭ 868 (-14.23%)
Mutual labels:  ner
Xmnlp
xmnlp: Chinese word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, text correction, text-to-pinyin conversion, text summarization, radical extraction, and more
Stars: ✭ 591 (-41.6%)
Mutual labels:  ner
Named entity recognition
Chinese named entity recognition (including concrete implementations of several models: HMM, CRF, BiLSTM, BiLSTM+CRF)
Stars: ✭ 995 (-1.68%)
Mutual labels:  ner
Sohu baseline
BERT-based Chinese named entity recognition (pytorch)
Stars: ✭ 19 (-98.12%)
Mutual labels:  ner
Recognizers Text
Microsoft.Recognizers.Text provides recognition and resolution of numbers, units, and date/time expressed in multiple languages (ZH, EN, FR, ES, PT, DE, IT, TR, HI. Partial support for NL, JA, KO, SV). Contributions are greatly welcome! Packages are available at https://www.nuget.org/profiles/Recognizers.Text and https://www.npmjs.com/~recognizers.text
Stars: ✭ 915 (-9.58%)
Mutual labels:  ner
Lm Lstm Crf
Empower Sequence Labeling with Task-Aware Language Model
Stars: ✭ 778 (-23.12%)
Mutual labels:  ner
Chatbot cn
A chatbot for the finance and legal domains (with some chit-chat capability). Its main modules include information extraction, NLU, NLG, and a knowledge graph; the front-end display is integrated via Django, and RESTful interfaces for nlp and kg are already provided.
Stars: ✭ 791 (-21.84%)
Mutual labels:  ner
Tf ner
Simple and Efficient Tensorflow implementations of NER models with tf.estimator and tf.data
Stars: ✭ 876 (-13.44%)
Mutual labels:  ner
Yedda
YEDDA: A Lightweight Collaborative Text Span Annotation Tool. Code for ACL 2018 Best Demo Paper Nomination.
Stars: ✭ 704 (-30.43%)
Mutual labels:  ner
Defactonlp
DeFactoNLP: An Automated Fact-checking System that uses Named Entity Recognition, TF-IDF vector comparison and Decomposable Attention models.
Stars: ✭ 30 (-97.04%)
Mutual labels:  ner
Bert Ner Pytorch
Chinese NER(Named Entity Recognition) using BERT(Softmax, CRF, Span)
Stars: ✭ 654 (-35.38%)
Mutual labels:  ner
Knowledge Graphs
A collection of research on knowledge graphs
Stars: ✭ 845 (-16.5%)
Mutual labels:  ner
Jointre
End-to-end neural relation extraction using deep biaffine attention (ECIR 2019)
Stars: ✭ 41 (-95.95%)
Mutual labels:  ner
Nlp Experiments In Pytorch
PyTorch repository for text categorization and NER experiments in Turkish and English.
Stars: ✭ 35 (-96.54%)
Mutual labels:  ner
Nlp Knowledge Graph
Research and applications of three core technologies: natural language processing, knowledge graphs, and dialogue systems.
Stars: ✭ 908 (-10.28%)
Mutual labels:  ner

For better performance, you can try NLPGNN; see NLPGNN for more details.

BERT-NER Version 2

Use Google's BERT for named entity recognition (CoNLL-2003 as the dataset).

The original version (see old_version for more detail) contains some hard-coded values and lacks comments, which makes it inconvenient to understand. This updated version adds some new ideas and tricks (in data preprocessing and layer design) that help you quickly implement the fine-tuning model (you just need to modify crf_layer or softmax_layer).
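
To illustrate where those two layers sit, here is a minimal sketch, not the repository's exact code: the variable names, tensor shapes, and the use of tf.contrib.crf are assumptions based on typical TensorFlow 1.x BERT fine-tuning code. It shows a per-token softmax head and a CRF head placed on top of BERT's sequence output.

# Minimal sketch of a softmax output layer and a CRF output layer on top of
# BERT's sequence output (TensorFlow 1.x). Names and shapes are illustrative.
import tensorflow as tf


def softmax_layer(bert_sequence_output, labels, num_labels, mask):
    """Per-token classification: project each token embedding to label logits."""
    logits = tf.layers.dense(bert_sequence_output, num_labels)      # [batch, seq_len, num_labels]
    log_probs = tf.nn.log_softmax(logits, axis=-1)
    one_hot = tf.one_hot(labels, depth=num_labels, dtype=tf.float32)
    per_token_loss = -tf.reduce_sum(one_hot * log_probs, axis=-1)   # [batch, seq_len]
    mask = tf.cast(mask, tf.float32)
    loss = tf.reduce_sum(per_token_loss * mask) / (tf.reduce_sum(mask) + 1e-9)
    predictions = tf.argmax(logits, axis=-1)
    return loss, predictions


def crf_layer(bert_sequence_output, labels, num_labels, seq_lengths):
    """CRF head: model label transitions jointly instead of scoring tokens independently."""
    logits = tf.layers.dense(bert_sequence_output, num_labels)      # [batch, seq_len, num_labels]
    log_likelihood, transition_params = tf.contrib.crf.crf_log_likelihood(
        inputs=logits, tag_indices=labels, sequence_lengths=seq_lengths)
    loss = tf.reduce_mean(-log_likelihood)
    predictions, _ = tf.contrib.crf.crf_decode(logits, transition_params, seq_lengths)
    return loss, predictions

Swapping between the two heads is what the crf=False/True flag in run_ner.sh below controls.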

Folder Description:

BERT-NER
|____ bert                          # need git from [here](https://github.com/google-research/bert)
|____ cased_L-12_H-768_A-12	    # need download from [here](https://storage.googleapis.com/bert_models/2018_10_18/cased_L-12_H-768_A-12.zip)
|____ data		            # train data
|____ middle_data	            # middle data (label id map)
|____ output			    # output (final model, predict results)
|____ BERT_NER.py		    # main code
|____ conlleval.pl		    # eval code
|____ run_ner.sh    		    # run model and eval result
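
The middle_data folder above holds the label-to-id map built during preprocessing. Purely as an illustration (the exact label set, the extra X/[CLS]/[SEP] entries, and the file name below are assumptions, not taken from the repository's code), such a map for the CoNLL-2003 BIO tags could be produced like this:

# Illustrative only: build and save a label-to-id map for CoNLL-2003 BIO tags.
import os
import pickle

# CoNLL-2003 tags plus the extra tags commonly added for BERT sub-tokens and
# special tokens ("X", "[CLS]", "[SEP]"); this exact list is an assumption.
labels = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC", "B-ORG", "I-ORG",
          "B-MISC", "I-MISC", "X", "[CLS]", "[SEP]"]
label2id = {label: idx for idx, label in enumerate(labels, start=1)}  # 0 left for padding

os.makedirs("middle_data", exist_ok=True)
with open("middle_data/label2id.pkl", "wb") as f:  # hypothetical file name
    pickle.dump(label2id, f)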

Usage:

bash run_ner.sh

What's in run_ner.sh:

python BERT_NER.py\
    --task_name="NER"  \
    --do_lower_case=False \
    --crf=False \
    --do_train=True   \
    --do_eval=True   \
    --do_predict=True \
    --data_dir=data   \
    --vocab_file=cased_L-12_H-768_A-12/vocab.txt  \
    --bert_config_file=cased_L-12_H-768_A-12/bert_config.json \
    --init_checkpoint=cased_L-12_H-768_A-12/bert_model.ckpt   \
    --max_seq_length=128   \
    --train_batch_size=32   \
    --learning_rate=2e-5   \
    --num_train_epochs=3.0   \
    --output_dir=./output/result_dir

perl conlleval.pl -d '\t' < ./output/result_dir/label_test.txt

Notice: the cased model is recommended, according to this paper. The CoNLL-2003 dataset and the perl script come from here
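
conlleval.pl scores one token per line, with the gold label in the second-to-last column and the predicted label in the last; the -d '\t' flag above tells it the columns are tab-separated. Below is a minimal sketch of writing a file in that layout; the path mirrors the command above, while the variable names and the example sentence are illustrative and not taken from the repository.

# Sketch: write token / gold-label / predicted-label triples in the
# tab-separated layout that conlleval.pl -d '\t' expects.
import os

# Hypothetical predictions for one sentence, just to show the file layout.
tokens      = ["EU", "rejects", "German", "call"]
gold_labels = ["B-ORG", "O", "B-MISC", "O"]
pred_labels = ["B-ORG", "O", "B-MISC", "O"]

os.makedirs("./output/result_dir", exist_ok=True)
with open("./output/result_dir/label_test.txt", "w", encoding="utf-8") as f:
    for token, gold, pred in zip(tokens, gold_labels, pred_labels):
        f.write(f"{token}\t{gold}\t{pred}\n")  # token <TAB> gold <TAB> predicted
    f.write("\n")  # blank line marks a sentence boundary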

RESULTS (on the test set):

Parameter setting:

  • do_lower_case=False
  • num_train_epochs=4.0
  • crf=False
accuracy:  98.15%; precision:  90.61%; recall:  88.85%; FB1:  89.72
              LOC: precision:  91.93%; recall:  91.79%; FB1:  91.86  1387
             MISC: precision:  83.83%; recall:  78.43%; FB1:  81.04  668
              ORG: precision:  87.83%; recall:  85.18%; FB1:  86.48  1191
              PER: precision:  95.19%; recall:  94.83%; FB1:  95.01  1311

Result description:

Here I just use the default parameters, but as Google's paper says, a 0.2% error is reasonable (the paper reports 92.4%). Some tricks may need to be added to the above model.

References:

[1] https://arxiv.org/abs/1810.04805

[2] https://github.com/google-research/bert
