
chakki-works / namaco

Licence: other
Character Based Named Entity Recognition.

Programming Languages

python
139335 projects - #7 most used programming language
HTML
75241 projects
CSS
56736 projects
javascript
184084 projects - #8 most used programming language

Projects that are alternatives to or similar to namaco

Ner Bert Pytorch
PyTorch solution of named entity recognition task Using Google AI's pre-trained BERT model.
Stars: ✭ 249 (+507.32%)
Mutual labels:  named-entity-recognition
PhoNER COVID19
COVID-19 Named Entity Recognition for Vietnamese (NAACL 2021)
Stars: ✭ 55 (+34.15%)
Mutual labels:  named-entity-recognition
TweebankNLP
[LREC 2022] An off-the-shelf pre-trained Tweet NLP Toolkit (NER, tokenization, lemmatization, POS tagging, dependency parsing) + Tweebank-NER dataset
Stars: ✭ 84 (+104.88%)
Mutual labels:  named-entity-recognition
SLK-NER
Source code for SEKE 2020 paper "SLK-NER: Exploiting Second-order Lexicon Knowledge for Chinese NER"
Stars: ✭ 14 (-65.85%)
Mutual labels:  named-entity-recognition
lima
The Libre Multilingual Analyzer, a Natural Language Processing (NLP) C++ toolkit.
Stars: ✭ 75 (+82.93%)
Mutual labels:  named-entity-recognition
BERTOverflow
A Pre-trained BERT on StackOverflow Corpus
Stars: ✭ 40 (-2.44%)
Mutual labels:  named-entity-recognition
Bilstm Lan
Hierarchically-Refined Label Attention Network for Sequence Labeling
Stars: ✭ 241 (+487.8%)
Mutual labels:  named-entity-recognition
ckipnlp
CKIP CoreNLP Toolkits
Stars: ✭ 92 (+124.39%)
Mutual labels:  named-entity-recognition
spert
PyTorch code for SpERT: Span-based Entity and Relation Transformer
Stars: ✭ 572 (+1295.12%)
Mutual labels:  named-entity-recognition
Wisty.js
🧚‍♀️ Chatbot library turning conversations into actions, locally, in the browser.
Stars: ✭ 24 (-41.46%)
Mutual labels:  named-entity-recognition
KoBERT-NER
NER Task with KoBERT (with Naver NLP Challenge dataset)
Stars: ✭ 76 (+85.37%)
Mutual labels:  named-entity-recognition
OpenUE
OpenUE is a lightweight knowledge graph extraction tool (An Open Toolkit for Universal Extraction from Text published at EMNLP2020: https://aclanthology.org/2020.emnlp-demos.1.pdf)
Stars: ✭ 274 (+568.29%)
Mutual labels:  named-entity-recognition
BioMedical-NLP-corpus
Biomedical NLP Corpus or Datasets.
Stars: ✭ 44 (+7.32%)
Mutual labels:  named-entity-recognition
Datashare
Better analyze information, in all its forms
Stars: ✭ 254 (+519.51%)
Mutual labels:  named-entity-recognition
SynLSTM-for-NER
Code and models for the paper titled "Better Feature Integration for Named Entity Recognition", NAACL 2021.
Stars: ✭ 26 (-36.59%)
Mutual labels:  named-entity-recognition
Agriculture knowledgegraph
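Agricultural knowledge graph (AgriKG): information retrieval, named entity recognition, relation extraction, intelligent question answering, and decision support for the agricultural domain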
Stars: ✭ 2,957 (+7112.2%)
Mutual labels:  named-entity-recognition
neural name tagging
Code for "Reliability-aware Dynamic Feature Composition for Name Tagging" (ACL2019)
Stars: ✭ 39 (-4.88%)
Mutual labels:  named-entity-recognition
PersianNER
Named-Entity Recognition in Persian Language
Stars: ✭ 48 (+17.07%)
Mutual labels:  named-entity-recognition
scikitcrf NER
Python library for custom entity recognition using Sklearn CRF
Stars: ✭ 17 (-58.54%)
Mutual labels:  named-entity-recognition
NER-and-Linking-of-Ancient-and-Historic-Places
An NER tool for ancient place names based on Pleiades and Spacy.
Stars: ✭ 26 (-36.59%)
Mutual labels:  named-entity-recognition

namaco

namaco is a library for character-based Named Entity Recognition, with a particular focus on Japanese and Chinese.

Demo

The following demo shows Chinese Named Entity Recognition:

[Animated GIF: Chinese named entity recognition demo]

Feature Support

namaco provides the following features:

  • training a model on your own data.
  • tagging sentences with a trained model.

Install

To install namaco, simply run:

$ pip install namaco

Data format

The data must be a tab-separated (TSV) file with one character and its tag per line. Sentences are separated by a blank line:

安	B-PERSON
倍	E-PERSON
首	O
相	O
が	O
訪	O
米	S-LOC
し	O
た	O
 
本	B-DATE
日	E-DATE
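
To make the format concrete, here is a minimal sketch of a parser for it. It is not namaco's own load_data_and_labels implementation; the function name read_tagged_sentences is hypothetical, and the sketch assumes that any whitespace-only line marks a sentence boundary.

```python
import io


def read_tagged_sentences(lines):
    """Parse 'char<TAB>tag' lines into parallel lists of characters and tags.

    Hypothetical helper for illustration; a whitespace-only line ends a sentence.
    """
    sentences, labels = [], []
    chars, tags = [], []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            # Sentence boundary: flush whatever has been collected so far.
            if chars:
                sentences.append(chars)
                labels.append(tags)
                chars, tags = [], []
            continue
        char, tag = line.split("\t")
        chars.append(char)
        tags.append(tag)
    if chars:
        # The input may not end with a blank line.
        sentences.append(chars)
        labels.append(tags)
    return sentences, labels


sample = "安\tB-PERSON\n倍\tE-PERSON\n首\tO\n相\tO\n \n本\tB-DATE\n日\tE-DATE\n"
x, y = read_tagged_sentences(io.StringIO(sample))
print(x)
print(y)
```

Each sentence becomes a list of single characters paired with a tag list of the same length, which is the shape a character-level tagger needs.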

Get Started

Import

First, import the necessary modules:

import os
import namaco
from namaco.data.reader import load_data_and_labels
from namaco.data.preprocess import prepare_preprocessor
from namaco.config import ModelConfig, TrainingConfig
from namaco.models import CharNER

These cover data loading, preprocessing, configuration, and the model.

Then set the paths and configuration objects used later:

DATA_ROOT = 'data/ja/ner'
SAVE_ROOT = './models'  # trained model
LOG_ROOT = './logs'     # checkpoint, tensorboard
model_file = os.path.join(SAVE_ROOT, 'model.h5')
model_config = ModelConfig()
training_config = TrainingConfig()

Loading data

After importing the modules, read data for training and validation:

train_path = os.path.join(DATA_ROOT, 'train.txt')
valid_path = os.path.join(DATA_ROOT, 'valid.txt')
x_train, y_train = load_data_and_labels(train_path)
x_valid, y_valid = load_data_and_labels(valid_path)

After reading the data, prepare the preprocessor and the model:

p = prepare_preprocessor(x_train, y_train)
model = CharNER(model_config, p.vocab_size(), p.tag_size())
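
As a rough illustration of why the model needs a vocabulary size and a tag size, the sketch below builds character and tag indices from training data. This is an assumption about what character-level preprocessing typically involves, not namaco's actual preprocessor; build_index, encode, and the reserved symbols <PAD> and <UNK> are all hypothetical.

```python
def build_index(items, reserved=("<PAD>", "<UNK>")):
    """Assign consecutive integer ids, placing the reserved symbols first."""
    index = {symbol: i for i, symbol in enumerate(reserved)}
    for item in items:
        if item not in index:
            index[item] = len(index)
    return index


def encode(sequence, index):
    """Map symbols to ids; unseen symbols fall back to <UNK> when it exists."""
    unk = index.get("<UNK>")
    return [index.get(symbol, unk) for symbol in sequence]


# Toy data in the same shape as x_train / y_train (assumed: lists of lists).
x_train = [["安", "倍", "首", "相"], ["本", "日"]]
y_train = [["B-PERSON", "E-PERSON", "O", "O"], ["B-DATE", "E-DATE"]]

char_index = build_index(c for sent in x_train for c in sent)
tag_index = build_index((t for tags in y_train for t in tags), reserved=("<PAD>",))

print(len(char_index), len(tag_index))
print(encode(["安", "倍", "が"], char_index))
```

The two index sizes play the role that p.vocab_size() and p.tag_size() play above: they fix the dimensions of the model's input embedding and output layer.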

Now we are ready for training :)

Training a model

Let's train a model. Trainer manages the training process. Create a Trainer instance and pass the training and validation data to its train method:

trainer = namaco.Trainer(model,
                         model.loss,
                         training_config,
                         log_dir=LOG_ROOT,
                         save_path=model_file,
                         preprocessor=p)
trainer.train(x_train, y_train, x_valid, y_valid)

If training is progressing normally, a progress bar is displayed as follows:

...
Epoch 3/15
702/703 [============================>.] - ETA: 0s - loss: 60.0129 - f1: 89.70
703/703 [==============================] - 319s - loss: 59.9278   
Epoch 4/15
702/703 [============================>.] - ETA: 0s - loss: 59.9268 - f1: 90.03
703/703 [==============================] - 324s - loss: 59.8417   
Epoch 5/15
702/703 [============================>.] - ETA: 0s - loss: 58.9831 - f1: 90.67
703/703 [==============================] - 297s - loss: 58.8993   
...

Tagging a sentence

We can use Tagger to tag text. Create a Tagger instance, then pass it the text you want to tag:

tagger = namaco.Tagger(model_file, preprocessor=p, tokenizer=list)

Let's tag the sentence 安倍首相が訪米した ("Prime Minister Abe visited the United States"). We can do it as follows:

>>> sent = '安倍首相が訪米した'
>>> tagger.analyze(sent)
{
  "language": "jp",
  "text": "安倍首相が訪米した",
  "entities": [
    {
      "text": "安倍",
      "type": "Person",
      "score": 0.972231,
      "beginOffset": 0,
      "endOffset": 2
    },
    {
      "text": "米",
      "type": "Location",
      "score": 0.941431,
      "beginOffset": 6,
      "endOffset": 7
    }
  ]
}
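
The offsets in this output line up with the per-character tags from the data format. The sketch below shows one way to turn a B-/I-/E-/S-/O tag sequence into entity spans with an inclusive begin offset and an exclusive end offset. It is an illustration under stated assumptions, not namaco's internal decoding: tags_to_entities is a hypothetical helper, the tag sequence is written by hand rather than predicted, and the mapping from tag names such as PERSON to display names such as "Person" is not covered.

```python
def tags_to_entities(chars, tags):
    """Collect (text, type, begin, end) spans from B-/I-/E-/S-/O tags.

    Offsets are character indices and `end` is exclusive.
    """
    entities = []
    start, current_type = None, None
    for i, tag in enumerate(tags):
        prefix, _, entity_type = tag.partition("-")
        if prefix == "S":
            # Single-character entity.
            entities.append((chars[i], entity_type, i, i + 1))
            start, current_type = None, None
        elif prefix == "B":
            # Open a new multi-character span.
            start, current_type = i, entity_type
        elif prefix == "I" and start is not None and entity_type == current_type:
            # Still inside the open span.
            continue
        elif prefix == "E" and start is not None and entity_type == current_type:
            # Close the open span.
            entities.append(("".join(chars[start:i + 1]), entity_type, start, i + 1))
            start, current_type = None, None
        else:
            # "O" or an inconsistent sequence: discard any open span.
            start, current_type = None, None
    return entities


chars = list("安倍首相が訪米した")
tags = ["B-PERSON", "E-PERSON", "O", "O", "O", "O", "S-LOC", "O", "O"]
entities = tags_to_entities(chars, tags)
print(entities)
```

For the example sentence this yields 安倍 at offsets 0 to 2 and 米 at offsets 6 to 7, matching the beginOffset and endOffset values shown above.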