monologg / KoBERT-NER

License: Apache-2.0
NER Task with KoBERT (with Naver NLP Challenge dataset)

KoBERT-NER

  • Korean Named Entity Recognition task using KoBERT
  • Implemented with the 🤗 Hugging Face Transformers library

Dependencies

  • torch==1.4.0
  • transformers==2.10.0
  • seqeval>=0.0.12

Dataset

  • Uses the NER dataset from Naver NLP Challenge 2018 (GitHub link)
  • Because the dataset provides only a training set, the test set was split off from the training set. (Data link)
    • Train (81,000) / Test (9,000)
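
The README does not say how the 9,000 test sentences were chosen. As a minimal sketch, assuming a seeded random hold-out, the split sizes above can be reproduced as follows. The function name `split_train_test`, the seed, and the integer stand-in data are hypothetical and not taken from the repository.

```python
import random


def split_train_test(examples, test_size, seed=42):
    """Shuffle a copy of `examples` and hold out `test_size` items as the test set."""
    if not 0 <= test_size <= len(examples):
        raise ValueError("test_size must be between 0 and len(examples)")
    shuffled = list(examples)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    return shuffled[test_size:], shuffled[:test_size]


# Stand-in for the 90,000 labelled sentences (81,000 + 9,000); real data would be
# (tokens, tags) pairs rather than integers.
examples = list(range(90_000))
train, test = split_train_test(examples, test_size=9_000)
print(len(train), len(test))  # 81000 9000
```

Fixing the seed matters here because the scores in Results are only comparable across models when every model is evaluated on the same held-out sentences.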

How to use KoBERT on Huggingface Transformers Library

  • The original KoBERT has been adapted so that it can be used directly from the transformers library.
    • Since transformers v2.2.2, user-created models can be uploaded and downloaded directly through transformers.
  • To use the tokenizer, import KoBertTokenizer from tokenization_kobert.py.
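# BertModel comes from transformers; KoBertTokenizer comes from the
# repository's tokenization_kobert.py, which must be importable.
from transformers import BertModel
from tokenization_kobert import KoBertTokenizer

# Both objects are loaded from the 'monologg/kobert' model identifier.
model = BertModel.from_pretrained('monologg/kobert')
tokenizer = KoBertTokenizer.from_pretrained('monologg/kobert')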

Usage

$ python3 main.py --model_type kobert --do_train --do_eval
  • With the --write_pred option, the evaluation predictions are saved in the preds folder.

Prediction

$ python3 predict.py --input_file {INPUT_FILE_PATH} --output_file {OUTPUT_FILE_PATH} --model_dir {SAVED_CKPT_PATH}
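
For scripted use, the same command can be assembled from Python. This is only a convenience sketch: `build_predict_command` is a hypothetical helper, and the file and directory names are placeholders. The three flags are the ones shown in the command above.

```python
import shlex
import sys


def build_predict_command(input_file, output_file, model_dir, python=sys.executable):
    """Return the argv list for predict.py using the three flags shown above."""
    return [
        python, "predict.py",
        "--input_file", str(input_file),
        "--output_file", str(output_file),
        "--model_dir", str(model_dir),
    ]


# Placeholder paths; substitute your own input, output and checkpoint locations.
cmd = build_predict_command("sample_in.txt", "sample_out.txt", "./model", python="python3")
print(shlex.join(cmd))

# To actually run it from the repository root:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing an argv list to `subprocess.run` rather than a single shell string avoids quoting problems when a path contains spaces.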

Results

Model               Slot F1 (%)
KoBERT              86.11
DistilKoBERT        84.13
Bert-Multilingual   84.20
CNN-BiLSTM-CRF      74.57
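
The README does not define Slot F1. Assuming it is an entity-level F1 of the kind computed by seqeval (listed under Dependencies), the sketch below shows how such a score is obtained: an entity counts as correct only if its type and both boundaries match. The code is a plain-Python illustration, not the repository's evaluation code. It assumes `B-`/`I-` prefixed tags, which may differ from the dataset's actual tag format, and it ignores stray `I-` tags, which seqeval may handle differently.

```python
def extract_entities(tags):
    """Return the set of (type, start, end) spans in one BIO-tagged sentence."""
    spans, start, ent_type = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes a trailing entity
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == ent_type
        if start is not None and not continues:
            spans.add((ent_type, start, i - 1))  # close the currently open entity
            start, ent_type = None, None
        if prefix == "B":
            start, ent_type = i, label  # open a new entity
    return spans


def entity_f1(y_true, y_pred):
    """Micro-averaged entity-level F1 over a list of tagged sentences."""
    n_true = n_pred = n_correct = 0
    for true_tags, pred_tags in zip(y_true, y_pred):
        gold, pred = extract_entities(true_tags), extract_entities(pred_tags)
        n_true += len(gold)
        n_pred += len(pred)
        n_correct += len(gold & pred)  # exact match on type and both boundaries
    precision = n_correct / n_pred if n_pred else 0.0
    recall = n_correct / n_true if n_true else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: the ORG entity is mislabelled as LOC, so 2 of 3 entities match.
y_true = [["B-PER", "I-PER", "O", "B-ORG"], ["B-LOC", "O"]]
y_pred = [["B-PER", "I-PER", "O", "B-LOC"], ["B-LOC", "O"]]
print(f"{100 * entity_f1(y_true, y_pred):.2f}")  # 66.67
```

Under this metric a prediction with the right boundaries but the wrong type earns no credit, so scores are typically lower than token-level accuracy on the same output.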
