
asahi417 / Tner

License: MIT
Language model finetuning on NER with an easy interface, and cross-domain evaluation. We released NER models finetuned on various domains via the Hugging Face model hub.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Tner

Bert Sklearn
a sklearn wrapper for Google's BERT model
Stars: ✭ 182 (+237.04%)
Mutual labels:  named-entity-recognition, language-model
Ld Net
Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling
Stars: ✭ 148 (+174.07%)
Mutual labels:  named-entity-recognition, language-model
Phonlp
PhoNLP: A BERT-based multi-task learning toolkit for part-of-speech tagging, named entity recognition and dependency parsing (NAACL 2021)
Stars: ✭ 56 (+3.7%)
Mutual labels:  named-entity-recognition, language-model
Bertweet
BERTweet: A pre-trained language model for English Tweets (EMNLP-2020)
Stars: ✭ 282 (+422.22%)
Mutual labels:  named-entity-recognition, language-model
Pytorch Cpp
C++ Implementation of PyTorch Tutorials for Everyone
Stars: ✭ 1,014 (+1777.78%)
Mutual labels:  language-model
Named Entity Recognition
Named entity recognition with a recurrent neural network (RNN) in TensorFlow
Stars: ✭ 20 (-62.96%)
Mutual labels:  named-entity-recognition
Spago
Self-contained Machine Learning and Natural Language Processing library in Go
Stars: ✭ 854 (+1481.48%)
Mutual labels:  language-model
Bert language understanding
Pre-training of Deep Bidirectional Transformers for Language Understanding: pre-train TextCNN
Stars: ✭ 933 (+1627.78%)
Mutual labels:  language-model
Ner blstm Crf
LSTM-CRF for NER with the CoNLL-2002 dataset
Stars: ✭ 51 (-5.56%)
Mutual labels:  named-entity-recognition
Lmchallenge
A library & tools to evaluate predictive language models.
Stars: ✭ 47 (-12.96%)
Mutual labels:  language-model
Named entity recognition
Chinese named entity recognition (including concrete implementations of several models: HMM, CRF, BiLSTM, BiLSTM+CRF)
Stars: ✭ 995 (+1742.59%)
Mutual labels:  named-entity-recognition
Harvesttext
Text mining and preprocessing toolkit (text cleaning, new word discovery, sentiment analysis, entity recognition and linking, keyword extraction, knowledge extraction, syntactic parsing, etc.), using unsupervised or weakly supervised methods
Stars: ✭ 956 (+1670.37%)
Mutual labels:  named-entity-recognition
Nlp Library
Curated collection of papers for the NLP practitioner 📖👩‍🔬
Stars: ✭ 1,025 (+1798.15%)
Mutual labels:  language-model
Tf ner
Simple and Efficient Tensorflow implementations of NER models with tf.estimator and tf.data
Stars: ✭ 876 (+1522.22%)
Mutual labels:  named-entity-recognition
Corenlp
Stanford CoreNLP: A Java suite of core NLP tools.
Stars: ✭ 8,248 (+15174.07%)
Mutual labels:  named-entity-recognition
Chinesener
Chinese named entity recognition and entity extraction, in TensorFlow and PyTorch, with BiLSTM+CRF
Stars: ✭ 938 (+1637.04%)
Mutual labels:  named-entity-recognition
Boilerplate Dynet Rnn Lm
Boilerplate code for quickly getting set up to run language modeling experiments
Stars: ✭ 37 (-31.48%)
Mutual labels:  language-model
Gpt2 French
GPT-2 French demo | Démo française de GPT-2
Stars: ✭ 47 (-12.96%)
Mutual labels:  language-model
Understanding Financial Reports Using Natural Language Processing
Investigate how mutual funds leverage credit derivatives by studying their routine filings to the SEC using NLP techniques 📈🤑
Stars: ✭ 36 (-33.33%)
Mutual labels:  named-entity-recognition
Nlp Experiments In Pytorch
PyTorch repository for text categorization and NER experiments in Turkish and English.
Stars: ✭ 35 (-35.19%)
Mutual labels:  named-entity-recognition


T-NER

T-NER is a Python tool for language model finetuning on named-entity recognition (NER), available via pip. It has an easy interface to finetune models and test them on cross-domain and multilingual datasets. T-NER currently integrates 9 publicly available NER datasets and enables easy integration of custom datasets. All models finetuned with T-NER can be deployed on our web app for visualization.

Paper Accepted: Our paper demonstrating T-NER has been accepted to EACL 2021 🎉 Paper here.

PreTrained Models: We release 46 XLM-RoBERTa models finetuned on NER on the HuggingFace transformers model hub, see here for more details and model cards.
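
For example, any of the released models can be loaded by its hub name with the inference interface described below (a minimal sketch; the model name is the default used by the web app):

import tner
classifier = tner.TransformersNER('asahi417/tner-xlm-roberta-large-ontonotes5')
classifier.predict(['I live in United States, but Microsoft asks me to move to Japan.'])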

Table of Contents

  1. Setup
  2. Web API
  3. Pretrained Models
  4. Model Finetuning
  5. Model Evaluation
  6. Model Inference
  7. Datasets
  8. Reference

Google Colab Examples

Model Finetuning: Open In Colab
Model Evaluation: Open In Colab
Model Prediction: Open In Colab
Multilingual NER Workflow: Open In Colab

Get Started

Install pip package

pip install tner

or directly from the repository for the latest version.

pip install git+https://github.com/asahi417/tner

Web App

To start the web app, first clone the repository

git clone https://github.com/asahi417/tner
cd tner

then launch the server by

uvicorn app:app --reload --log-level debug --host 0.0.0.0 --port 8000

and open http://0.0.0.0:8000 in your browser once the server is ready. You can specify the model to deploy via the environment variable NER_MODEL, which defaults to asahi417/tner-xlm-roberta-large-ontonotes5. NER_MODEL can be either a path to your local model checkpoint directory or a model name on the transformers model hub.
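
For example, to serve a locally finetuned checkpoint instead of the default model, set NER_MODEL before launching the server (the checkpoint path below is illustrative):

export NER_MODEL=./ckpt_tner  # a local checkpoint directory, or a model name on the transformers model hub
uvicorn app:app --reload --log-level debug --host 0.0.0.0 --port 8000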

Acknowledgement: The app interface is heavily inspired by this repository.

Model Finetuning

Language model finetuning on NER can be done with a few lines:

import tner
trainer = tner.TrainTransformersNER(checkpoint_dir='./ckpt_tner', dataset="data-name", transformers_model="transformers-model")
trainer.train()

where transformers_model is a pre-trained model name from the transformers model hub and dataset is a dataset alias or a path to a custom dataset, explained in the dataset section. Model files will be generated at checkpoint_dir, and they can be uploaded to the transformers model hub without any changes.

To show validation accuracy at the end of each epoch,

trainer.train(monitor_validation=True)

and to tune training parameters such as batch size, number of epochs, or learning rate, please take a look at the argument description.

Train on multiple datasets: Model can be trained on a concatenation of multiple datasets by providing a list of dataset names.

trainer = tner.TrainTransformersNER(checkpoint_dir='./ckpt_merged', dataset=["ontonotes5", "conll2003"], transformers_model="xlm-roberta-base")

Custom datasets can also be added, e.g. dataset=["ontonotes5", "./examples/custom_data_sample"].

Command line tool: Finetune models with the command line (CL).

tner-train [-h] [-c CHECKPOINT_DIR] [-d DATA] [-t TRANSFORMER] [-b BATCH_SIZE] [--max-grad-norm MAX_GRAD_NORM] [--max-seq-length MAX_SEQ_LENGTH] [--random-seed RANDOM_SEED] [--lr LR] [--total-step TOTAL_STEP] [--warmup-step WARMUP_STEP] [--weight-decay WEIGHT_DECAY] [--fp16] [--monitor-validation] [--lower-case]
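
For example, a finetuning run on OntoNotes 5 with xlm-roberta-base could look like the following, using only the flags listed above (the hyperparameter values are illustrative):

tner-train -c ./ckpt_tner -d ontonotes5 -t xlm-roberta-base -b 16 --lr 1e-5 --monitor-validation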

Model Evaluation

Evaluation of NER models can be run easily in both in-domain and out-of-domain settings.

import tner
trainer = tner.TrainTransformersNER(checkpoint_dir='path-to-checkpoint', transformers_model="language-model-name")
trainer.test(test_dataset='data-name')

Entity span prediction: For a better understanding of out-of-domain accuracy, we provide an entity span prediction pipeline, which ignores the entity type and computes metrics only on the IOB entity positions.

trainer.test(test_dataset='data-name', entity_span_prediction=True)

Command line tool: Model evaluation with CL.

tner-test [-h] -c CHECKPOINT_DIR [--lower-case] [--test-data TEST_DATA] [--test-lower-case] [--test-entity-span]
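
For example, to evaluate the checkpoint produced above on WNUT 2017 (the dataset choice is illustrative):

tner-test -c ./ckpt_tner --test-data wnut2017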

Model Inference

If you just want predictions from a finetuned NER model, this is the best option for you.

import tner
classifier = tner.TransformersNER('transformers-model')
test_sentences = [
    'I live in United States, but Microsoft asks me to move to Japan.',
    'I have an Apple computer.',
    'I like to eat an apple.'
]
classifier.predict(test_sentences)

Command line tool: Model inference with CL.

tner-predict [-h] [-c CHECKPOINT]
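
For example, to run prediction with the checkpoint produced above:

tner-predict -c ./ckpt_tner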

Datasets

Public datasets that can be fetched with T-NER are summarized here. Please cite the corresponding reference if using one of these datasets.

Name (alias) Genre Language Entity types Data size (train/valid/test) Note
OntoNotes 5 (ontonotes5) News, Blog, Dialogue English 18 59,924/8,582/8,262
CoNLL 2003 (conll2003) News English 4 14,041/3,250/3,453
WNUT 2017 (wnut2017) SNS English 6 1,000/1,008/1,287
FIN (fin) Finance English 4 1,164/-/303
BioNLP 2004 (bionlp2004) Chemical English 5 18,546/-/3,856
BioCreative V CDR (bc5cdr) Medical English 2 5,228/5,330/5,865 split into sentences to reduce sequence length
WikiAnn (panx_dataset/en, panx_dataset/ja, etc) Wikipedia 282 languages 3 20,000/10,000/10,000
Japanese Wikipedia (wiki_ja) Wikipedia Japanese 8 -/-/500 test set only
Japanese WikiNews (wiki_news_ja) Wikipedia Japanese 10 -/-/1,000 test set only
MIT Restaurant (mit_restaurant) Restaurant review English 8 7,660/-/1,521 lower-cased
MIT Movie (mit_movie_trivia) Movie review English 12 7,816/-/1,953 lower-cased

To take a closer look at each dataset, you can use tner.get_dataset_ner as in

import tner
data, label_to_id, language, unseen_entity_set = tner.get_dataset_ner('data-name')

where data has the following structure.

{
    'train': {
        'data': [
            ['@paulwalk', 'It', "'s", 'the', 'view', 'from', 'where', 'I', "'m", 'living', 'for', 'two', 'weeks', '.', 'Empire', 'State', 'Building', '=', 'ESB', '.', 'Pretty', 'bad', 'storm', 'here', 'last', 'evening', '.'],
            ['From', 'Green', 'Newsfeed', ':', 'AHFA', 'extends', 'deadline', 'for', 'Sage', 'Award', 'to', 'Nov', '.', '5', 'http://tinyurl.com/24agj38'], ...
        ],
        'label': [
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], ...
        ]
    },
    'valid': ...
}

Custom Dataset

To go beyond the public datasets, users can provide their own datasets by formatting them into the IOB scheme described in the CoNLL 2003 NER shared task paper: each data file contains one word per line, with empty lines representing sentence boundaries. At the end of each line there is a tag which states whether the current word is inside a named entity or not, and which also encodes the type of named entity. Here is an example sentence:

EU B-ORG
rejects O
German B-MISC
call O
to O
boycott O
British B-MISC
lamb O
. O

Words tagged with O are outside of named entities, and the I-XXX tag is used for words inside a named entity of type XXX. Whenever two entities of type XXX are immediately next to each other, the first word of the second entity is tagged B-XXX to show that it starts another entity. The custom dataset should have train.txt and valid.txt files in the same folder. Please take a look at the sample custom data.
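
Once formatted, the folder containing the custom data can be passed to the trainer in place of a dataset alias, as in the multi-dataset example above (a minimal sketch using the bundled sample data):

import tner
trainer = tner.TrainTransformersNER(checkpoint_dir='./ckpt_custom', dataset='./examples/custom_data_sample', transformers_model='xlm-roberta-base')
trainer.train()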

Reference paper

If you use any of these resources, please cite the following paper:

@InProceedings{ushio2021tner,
  author    = "Ushio, Asahi and Camacho-Collados, Jose",
  title     = "T-NER: An All-Round Python Library for Transformer-based Named Entity Recognition",
  booktitle = "Proceedings of EACL: System Demonstrations",
  year      = "2021"
}