stevezheng23 / sequence_labeling_tf

License: Apache-2.0

Sequence Labeling in TensorFlow


Projects that are alternatives to or similar to sequence_labeling_tf

Pytorch-NLU, a Chinese text classification and sequence labeling toolkit. It supports multi-class and multi-label classification of long and short Chinese texts, as well as Chinese sequence labeling tasks such as named entity recognition, part-of-speech tagging and word segmentation.

Stars: ✭ 151 (+738.89%)

Mutual labels: named-entity-recognition, pos-tagging, sequence-labeling

Flair

A very simple framework for state-of-the-art Natural Language Processing (NLP)

Stars: ✭ 11,065 (+61372.22%)

Mutual labels: named-entity-recognition, sequence-labeling, semantic-role-labeling

Ncrfpp

NCRF++, a Neural Sequence Labeling Toolkit. Easy to use for any sequence labeling task (e.g. NER, POS, Segmentation). It includes character LSTM/CNN, word LSTM/CNN and softmax/CRF components.

Stars: ✭ 1,767 (+9716.67%)

Mutual labels: named-entity-recognition, chunking, sequence-labeling

ckipnlp

CKIP CoreNLP Toolkits

Stars: ✭ 92 (+411.11%)

Mutual labels: named-entity-recognition, coreference-resolution

Neural sequence labeling

A TensorFlow implementation of a neural sequence labeling model, able to tackle sequence labeling tasks such as POS tagging, chunking, NER, punctuation restoration, etc.

Stars: ✭ 214 (+1088.89%)

Mutual labels: named-entity-recognition, sequence-labeling

Multi Task Nlp

multi_task_NLP is a utility toolkit enabling NLP developers to easily train and infer a single model for multiple tasks.

Stars: ✭ 221 (+1127.78%)

Mutual labels: named-entity-recognition, sequence-labeling

Ld Net

Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling

Stars: ✭ 148 (+722.22%)

Mutual labels: named-entity-recognition, sequence-labeling

Ner Lstm

Named Entity Recognition using multilayered bidirectional LSTM

Stars: ✭ 532 (+2855.56%)

Mutual labels: recurrent-neural-networks, named-entity-recognition

Rnnsharp

RNNSharp is a toolkit of deep recurrent neural networks that is widely used for many kinds of tasks, such as sequence labeling and sequence-to-sequence. It is written in C# and based on .NET Framework 4.6 or above. RNNSharp supports many types of networks, such as forward and bi-directional networks and sequence-to-sequence networks, and many types of layers, such as LSTM, Softmax, sampled Softmax and others.

Stars: ✭ 277 (+1438.89%)

Mutual labels: recurrent-neural-networks, sequence-labeling

Named Entity Recognition

Named entity recognition with recurrent neural networks (RNN) in TensorFlow

Stars: ✭ 20 (+11.11%)

Mutual labels: recurrent-neural-networks, named-entity-recognition

Attention Mechanisms

Implementations for a family of attention mechanisms, suitable for all kinds of natural language processing tasks and compatible with TensorFlow 2.0 and Keras.

Stars: ✭ 203 (+1027.78%)

Mutual labels: recurrent-neural-networks, natural-language-understanding

Monpa

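MONPA (罔拍) is a multi-task model that provides word segmentation, part-of-speech tagging and named entity recognition for Traditional Chinese.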

Stars: ✭ 203 (+1027.78%)

Mutual labels: named-entity-recognition, pos-tagging

Kashgari

Kashgari is a production-level NLP Transfer learning framework built on top of tf.keras for text-labeling and text-classification, includes Word2Vec, BERT, and GPT2 Language Embedding.

Stars: ✭ 2,235 (+12316.67%)

Mutual labels: named-entity-recognition, sequence-labeling

Bilstm Lan

Hierarchically-Refined Label Attention Network for Sequence Labeling

Stars: ✭ 241 (+1238.89%)

Mutual labels: named-entity-recognition, sequence-labeling

Vntk

Vietnamese NLP Toolkit for Node

Stars: ✭ 170 (+844.44%)

Mutual labels: named-entity-recognition, pos-tagging

Rnn Nlu

A TensorFlow implementation of Recurrent Neural Networks for Sequence Classification and Sequence Labeling

Stars: ✭ 463 (+2472.22%)

Mutual labels: recurrent-neural-networks, sequence-labeling

Pytorch Pos Tagging

A tutorial on how to implement models for part-of-speech tagging using PyTorch and TorchText.

Stars: ✭ 96 (+433.33%)

Mutual labels: recurrent-neural-networks, pos-tagging

pyner

🌈 Implementation of Neural Network based Named Entity Recognizer (Lample+, 2016) using Chainer.

Stars: ✭ 45 (+150%)

Mutual labels: named-entity-recognition, sequence-labeling

TweebankNLP

[LREC 2022] An off-the-shelf pre-trained Tweet NLP Toolkit (NER, tokenization, lemmatization, POS tagging, dependency parsing) + Tweebank-NER dataset

Stars: ✭ 84 (+366.67%)

Mutual labels: named-entity-recognition, pos-tagging

Indonesian Nlp Resources

Data resources for Indonesian NLP

Stars: ✭ 143 (+694.44%)

Mutual labels: named-entity-recognition, pos-tagging


Sequence Labeling

Sequence labeling is the task of assigning a categorical label to each element of an input sequence. Many problems can be formalized as sequence labeling, including speech recognition, video analysis and various NLP tasks (e.g. POS tagging, NER, chunking). Traditionally, sequence labeling required large amounts of hand-engineered features and domain-specific knowledge, but neural approaches have recently achieved state-of-the-art performance on several sequence labeling benchmarks. A common data format for sequence labeling is IOB (Inside-Outside-Beginning), although alternative formats (e.g. IO, IOBES, BMEWO, BMEWO+, BILOU) may also be used.
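To make the relation between these formats concrete, the following Python sketch converts IOB2 tags (the IOB variant in which every entity starts with `B-`) into IOBES, which additionally marks entity-final tokens with `E-` and single-token entities with `S-`. The function name `iob2_to_iobes` is hypothetical rather than part of this repository, and the sketch assumes well-formed IOB2 input.

```python
from typing import List


def iob2_to_iobes(tags: List[str]) -> List[str]:
    """Convert a well-formed IOB2 tag sequence to IOBES.

    B-X followed by I-X stays B-X; a lone B-X becomes S-X.
    I-X followed by I-X stays I-X; the last I-X of an entity becomes E-X.
    """
    iobes = []
    for i, tag in enumerate(tags):
        if tag == "O":
            iobes.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        # The entity continues only if the next tag is I- of the same type.
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = nxt == "I-" + label
        if prefix == "B":
            iobes.append(("B-" if continues else "S-") + label)
        elif prefix == "I":
            iobes.append(("I-" if continues else "E-") + label)
        else:
            raise ValueError("unexpected tag: " + tag)
    return iobes


print(iob2_to_iobes(["B-PER", "I-PER", "O", "B-LOC", "B-ORG", "I-ORG", "I-ORG"]))
# ['B-PER', 'E-PER', 'O', 'S-LOC', 'B-ORG', 'I-ORG', 'E-ORG']
```

IOBES makes entity ends explicit, which is why some taggers prefer it; converting back to IOB2 only requires mapping `E-` to `I-` and `S-` to `B-`.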

Figure 1: An NER example in IOB format
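The IOB encoding in Figure 1 can be sketched in Python as follows: token-level entity spans are turned into one tag per token. This sketch assumes the IOB2 convention (every entity begins with `B-`); the raw CoNLL2003 files use the IOB1 variant instead, where `B-` only appears when two entities of the same type are adjacent. The helper `spans_to_iob2` and the example sentence are illustrative and not taken from this repository.

```python
from typing import List, Tuple

# (start, end, type) over token indices, with `end` exclusive.
Span = Tuple[int, int, str]


def spans_to_iob2(num_tokens: int, spans: List[Span]) -> List[str]:
    """Assign one IOB2 tag per token from non-overlapping entity spans."""
    tags = ["O"] * num_tokens
    for start, end, label in spans:
        if not 0 <= start < end <= num_tokens:
            raise ValueError(f"span out of range: {(start, end, label)}")
        tags[start] = "B-" + label          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = "I-" + label          # remaining tokens of the entity
    return tags


tokens = ["Peter", "Blackburn", "visited", "the", "U.N.", "in", "New", "York", "."]
tags = spans_to_iob2(len(tokens), [(0, 2, "PER"), (4, 5, "ORG"), (6, 8, "LOC")])
print(tags)
# ['B-PER', 'I-PER', 'O', 'O', 'B-ORG', 'O', 'B-LOC', 'I-LOC', 'O']
```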

Setting

Python 3.6.6
TensorFlow 1.12
NumPy 1.15.4

Dataset

CoNLL2003 is a multi-task dataset with three sub-tasks: POS tagging, syntactic chunking and NER. Its NER sub-task covers four types of named entities: persons, locations, organizations, and miscellaneous entities that do not belong to the previous three groups.
OntoNotes5 is a multi-task dataset with several sub-tasks, including POS tagging, word sense disambiguation, coreference, NER and others. Its NER sub-task covers 18 types of named entities, such as PERSON, LOC, ORG, DATE and MONEY. The dataset can be converted into CoNLL format using common tools.
Treebank3 is a distributed release of the Penn Treebank (PTB) project. It selected 2,499 stories from a three-year Wall Street Journal (WSJ) collection of 98,732 stories for syntactic annotation, including POS tagging and constituency parsing.
GloVe is an unsupervised learning algorithm for obtaining vector representations for words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space.
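Pre-trained GloVe vectors are distributed as plain text, one word per line followed by its vector components separated by spaces. A minimal sketch of turning such a file into an embedding matrix for a given vocabulary is shown below; the function name, the random initialization range for words missing from GloVe and the example file name in the comment are assumptions, not this repository's code.

```python
import io
from typing import Dict, Iterable, List

import numpy as np


def load_glove_matrix(lines: Iterable[str], vocab: List[str], dim: int,
                      seed: int = 0) -> np.ndarray:
    """Build a [len(vocab), dim] embedding matrix from GloVe text lines.

    Words found in GloVe get their pre-trained vector; the remaining rows keep
    a small random initialization (an assumed, commonly used choice).
    """
    index: Dict[str, int] = {word: i for i, word in enumerate(vocab)}
    rng = np.random.RandomState(seed)
    matrix = rng.uniform(-0.1, 0.1, size=(len(vocab), dim)).astype(np.float32)
    found = set()
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) <= dim:
            continue  # skip blank or malformed lines
        # The last `dim` fields are the vector; some GloVe releases contain
        # tokens with embedded spaces, so the remaining fields form the word.
        word = " ".join(parts[:-dim])
        if word in index:
            matrix[index[word]] = np.asarray(parts[-dim:], dtype=np.float32)
            found.add(word)
    print(f"{len(found)}/{len(vocab)} vocabulary words found in GloVe")
    return matrix


# In practice: open("glove.6B.100d.txt", encoding="utf-8") with dim=100.
toy_glove = io.StringIO("the 0.1 0.2 0.3\ncat 0.4 0.5 0.6\n")
emb = load_glove_matrix(toy_glove, ["<pad>", "the", "cat", "zebra"], dim=3)
print(emb.shape)  # (4, 3)
```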

Usage

Preprocess data

# preprocess train data
python conll/preprocess.py --format json --input_file data/conll2003/eng.train --output_file data/ner/train-conll2003/train-conll2003.ner.json
# preprocess dev data
python conll/preprocess.py --format json --input_file data/conll2003/eng.testa --output_file data/ner/dev-conll2003/dev-conll2003.ner.json
# preprocess test data
python conll/preprocess.py --format json --input_file data/conll2003/eng.testb --output_file data/ner/test-conll2003/test-conll2003.ner.json
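The `preprocess.py` step above reads CoNLL2003 files, in which each non-blank line holds four space-separated columns (token, POS tag, chunk tag, NER tag), blank lines separate sentences, and `-DOCSTART-` lines separate documents. The sketch below parses that layout and emits one JSON object per sentence; the JSON field names (`tokens`, `pos`, `chunk`, `ner`) are hypothetical and may not match the schema the repository's script actually writes.

```python
import io
import json
from typing import Dict, Iterable, Iterator, List

FIELDS = ("tokens", "pos", "chunk", "ner")  # hypothetical JSON field names


def read_conll2003(lines: Iterable[str]) -> Iterator[Dict[str, List[str]]]:
    """Yield one sentence at a time from CoNLL2003-formatted lines."""
    sentence = {field: [] for field in FIELDS}
    for line in lines:
        line = line.strip()
        # Blank lines end a sentence; -DOCSTART- lines mark document boundaries.
        if not line or line.startswith("-DOCSTART-"):
            if sentence["tokens"]:
                yield sentence
                sentence = {field: [] for field in FIELDS}
            continue
        columns = line.split()
        if len(columns) != len(FIELDS):
            raise ValueError(f"expected 4 columns, got: {line!r}")
        for field, value in zip(FIELDS, columns):
            sentence[field].append(value)
    if sentence["tokens"]:  # the file may not end with a blank line
        yield sentence


sample = """-DOCSTART- -X- -X- O

EU NNP I-NP I-ORG
rejects VBZ I-VP O
German JJ I-NP I-MISC
call NN I-NP O
. . O O

Peter NNP I-NP I-PER
Blackburn NNP I-NP I-PER
"""

# In practice, iterate over open("data/conll2003/eng.train", encoding="utf-8").
for sent in read_conll2003(io.StringIO(sample)):
    print(json.dumps(sent))
```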

Run experiment

# run experiment in train + eval mode
python sequence_labeling_run.py --mode train_eval --config config/config_sequence_template.xxx.json
# run experiment in train only mode
python sequence_labeling_run.py --mode train --config config/config_sequence_template.xxx.json
# run experiment in eval only mode
python sequence_labeling_run.py --mode eval --config config/config_sequence_template.xxx.json

Search hyper-parameters

# random search hyper-parameters
python hparam_search.py --base-config config/config_sequence_template.xxx.json --search-config config/config_search_template.xxx.json --num-group 10 --random-seed 100 --output-dir config/search
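The random search above draws `--num-group` configurations from a search space using a fixed `--random-seed`. A minimal Python sketch of that idea follows; the search-space format (a list of candidate values per hyper-parameter), the parameter names and the output file naming are all assumptions, since the format of `config_search_template.xxx.json` is not described here.

```python
import copy
import json
import os
import random
from typing import Any, Dict, List


def sample_configs(base: Dict[str, Any], space: Dict[str, List[Any]],
                   num_group: int, seed: int) -> List[Dict[str, Any]]:
    """Random search: each config overrides `base` with one random candidate
    per searched hyper-parameter."""
    rng = random.Random(seed)  # a fixed seed makes the search reproducible
    configs = []
    for _ in range(num_group):
        config = copy.deepcopy(base)  # never mutate the base config
        for name in sorted(space):  # fixed order, so the same seed gives the same draws
            config[name] = rng.choice(space[name])
        configs.append(config)
    return configs


def write_configs(configs: List[Dict[str, Any]], output_dir: str) -> None:
    """Write each sampled config to its own JSON file (hypothetical naming)."""
    os.makedirs(output_dir, exist_ok=True)
    for i, config in enumerate(configs):
        path = os.path.join(output_dir, f"config_search_{i}.json")
        with open(path, "w") as f:
            json.dump(config, f, indent=2)


# Hypothetical keys, loosely mirroring the settings named in the result tables.
base = {"num_layers": 2, "unit_dim": 200, "window_size": [3], "learning_rate": 0.001}
space = {"unit_dim": [100, 200, 300], "learning_rate": [0.0005, 0.001, 0.002]}
configs = sample_configs(base, space, num_group=10, seed=100)
print(configs[0])
# write_configs(configs, "config/search")
```

Sampling from discrete candidate lists keeps the sketch simple; continuous ranges (for example a log-uniform learning rate) are a common alternative.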

Visualize summary

# visualize summary via tensorboard
tensorboard --logdir=output

Export model

# export frozen model
python sequence_labeling_run.py --mode export --config config/config_sequence_template.xxx.json

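Set up service

# set up TensorFlow Serving (port 8500 is gRPC; docker -v needs an absolute host path and an absolute container path)
docker run -p 8500:8500 -v "$(pwd)/output/xxx/model:/models/ner" -e MODEL_NAME=ner -t tensorflow/serving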

Experiment

Bi-LSTM + Char-CNN + Softmax

Figure 1: Bi-LSTM + Char-CNN + Softmax architecture
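The Char-CNN component builds a word feature from its characters: characters are embedded, convolved with filters of each window size (the `window size = [3]` or `[3,5]` settings in the tables), and max-pooled over character positions; the result is typically concatenated with the word embedding before the Bi-LSTM (as in Chiu and Nichols, 2015; Ma and Hovy, 2016). Below is a NumPy sketch of that forward pass for a single word. The ReLU activation, the padding scheme, the filter count and the `char_cnn_features` name are assumptions, not details taken from this repository.

```python
import numpy as np


def char_cnn_features(char_ids, char_emb, filters):
    """Forward pass of a character CNN for one word.

    char_ids: int array [num_chars] of character indices for the word.
    char_emb: float array [char_vocab, emb_dim].
    filters:  dict mapping window size w -> weights [w, emb_dim, num_filters].
    Returns the concatenation, over window sizes, of max-pooled features.
    """
    x = char_emb[char_ids]  # [num_chars, emb_dim]
    pooled = []
    for w in sorted(filters):
        kernel = filters[w]
        # Zero-pad words shorter than the window so they still give one position.
        pad = max(0, w - len(x))
        xp = np.pad(x, ((0, pad), (0, 0)), mode="constant")
        num_pos = len(xp) - w + 1
        # Valid 1-D convolution: each output position sees w consecutive characters.
        conv = np.stack([np.tensordot(xp[i:i + w], kernel, axes=([0, 1], [0, 1]))
                         for i in range(num_pos)])  # [num_pos, num_filters]
        conv = np.maximum(conv, 0.0)  # ReLU (an assumed activation)
        pooled.append(conv.max(axis=0))  # max over character positions
    return np.concatenate(pooled)  # [total number of filters]


rng = np.random.RandomState(0)
char_emb = rng.randn(128, 16)        # hypothetical: 128 characters, 16-dim embeddings
filters = {3: rng.randn(3, 16, 30),  # window size 3, 30 filters
           5: rng.randn(5, 16, 30)}  # window size 5, 30 filters
char_ids = np.array([ord(c) % 128 for c in "Ekeus"])  # toy character indexing
features = char_cnn_features(char_ids, char_emb, filters)
print(features.shape)  # (60,)
```

Max-pooling makes the feature size independent of word length, so every word contributes a fixed-size vector to the Bi-LSTM input.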

CoNLL2003 - NER	F1 Score	Precision	Recall
Dev	94.92	94.97	94.87
Test	91.29	90.41	92.18

Table 1: The performance of Bi-LSTM + Char-CNN + Softmax on CoNLL2003 NER sub-task with setting: num layers = 2, unit dim = 200, window size = [3]
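NER precision, recall and F1 are usually computed at the entity level: a predicted entity counts as correct only if its boundaries and its type both exactly match a gold entity. F1 is the harmonic mean of precision and recall, which is consistent with the table (for the test row, 2 × 90.41 × 92.18 / (90.41 + 92.18) ≈ 91.29). The sketch below assumes IOB2 tags; it is a simplified stand-in for the official CoNLL `conlleval` script, not a reimplementation of it, and the function names are hypothetical.

```python
from typing import List, Set, Tuple


def extract_entities(tags: List[str]) -> Set[Tuple[int, int, str]]:
    """Return (start, end_exclusive, type) spans from IOB2 tags.

    An I-X that does not continue an open X entity starts a new entity
    (a lenient choice for malformed predictions).
    """
    entities = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes the last entity
        prefix, _, tag_label = tag.partition("-")
        continues = prefix == "I" and tag_label == label
        if start is not None and not continues:
            entities.add((start, i, label))
            start, label = None, None
        if prefix == "B" or (prefix == "I" and not continues):
            start, label = i, tag_label
    return entities


def entity_prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged entity-level precision, recall and F1, in percent."""
    num_correct = num_pred = num_gold = 0
    for g, p in zip(gold, pred):
        g_ents, p_ents = extract_entities(g), extract_entities(p)
        num_correct += len(g_ents & p_ents)  # exact boundary and type match
        num_pred += len(p_ents)
        num_gold += len(g_ents)
    precision = 100.0 * num_correct / num_pred if num_pred else 0.0
    recall = 100.0 * num_correct / num_gold if num_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "B-ORG"]]  # right boundaries, wrong type for the 2nd entity
print(entity_prf(gold, pred))  # (50.0, 50.0, 50.0)
```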

OntoNotes5 - NER	F1 Score	Precision	Recall
Dev	86.22	84.21	88.32
Test	85.09	82.66	87.67

Table 2: The performance of Bi-LSTM + Char-CNN + Softmax on OntoNotes5 NER sub-task with setting: num layers = 2, unit dim = 200, window size = [3,5]

Treebank3 - POS	Accuracy
Dev	97.36
Test	97.58

Table 3: The performance of Bi-LSTM + Char-CNN + Softmax on Treebank3 POS tagging sub-task with setting: num layers = 2, unit dim = 200, window size = [3]
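In the Softmax variant, each token's label is the argmax of an independent softmax over the label scores produced by the Bi-LSTM, and POS tagging is reported as token-level accuracy. A NumPy sketch of that decoding step, with padding positions masked out of the accuracy, is shown below; the toy `logits` array stands in for the network output, and the array shapes and function names are assumptions.

```python
import numpy as np


def softmax(logits, axis=-1):
    shifted = logits - logits.max(axis=axis, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def decode_and_score(logits, gold, lengths):
    """Per-token argmax decoding and masked token accuracy (in percent).

    logits:  [batch, max_len, num_labels] label scores from the encoder.
    gold:    [batch, max_len] gold label ids; padding positions are ignored.
    lengths: [batch] true sentence lengths.
    """
    probs = softmax(logits)
    pred = probs.argmax(axis=-1)  # each position is decided independently
    mask = np.arange(logits.shape[1])[None, :] < np.asarray(lengths)[:, None]
    correct = (pred == gold) & mask
    return pred, 100.0 * correct.sum() / mask.sum()


logits = np.array([[[2.0, 0.1], [0.1, 3.0], [1.0, 0.0]],
                   [[0.5, 0.2], [9.0, 9.5], [0.0, 0.0]]])  # 2 sentences, 3 slots, 2 labels
gold = np.array([[0, 1, 1], [0, 0, 0]])
pred, acc = decode_and_score(logits, gold, lengths=[3, 1])
print(pred.tolist(), acc)  # [[0, 1, 0], [0, 1, 0]] 75.0
```

Because softmax is monotonic, the argmax over probabilities equals the argmax over raw scores; the probabilities matter for the cross-entropy training loss rather than for decoding.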

Bi-LSTM + Char-CNN + CRF

Figure 2: Bi-LSTM + Char-CNN + CRF architecture
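The CRF layer scores the whole label sequence instead of each token independently: emission scores from the Bi-LSTM are combined with learned transition scores between adjacent labels (Huang et al., 2015; Lample et al., 2016). Training minimizes the negative log-likelihood of the gold sequence, computed with the forward algorithm, and decoding uses the Viterbi algorithm. The NumPy sketch below handles a single unpadded sentence and omits start/stop transitions for brevity; it illustrates the technique and is not this repository's implementation. TensorFlow 1.x provides comparable ops in `tf.contrib.crf` (for example `crf_log_likelihood` and `crf_decode`).

```python
import itertools

import numpy as np


def logsumexp(x, axis):
    m = x.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))).squeeze(axis)


def sequence_score(emissions, transitions, labels):
    """Unnormalized score: emission scores plus transitions between adjacent labels."""
    score = emissions[np.arange(len(labels)), labels].sum()
    score += sum(transitions[labels[t - 1], labels[t]] for t in range(1, len(labels)))
    return score


def log_partition(emissions, transitions):
    """Forward algorithm: log of the summed exp-scores of all label sequences."""
    alpha = emissions[0]  # [num_labels]
    for t in range(1, len(emissions)):
        # alpha_new[j] = logsumexp_i(alpha[i] + transitions[i, j]) + emissions[t, j]
        alpha = logsumexp(alpha[:, None] + transitions, axis=0) + emissions[t]
    return logsumexp(alpha, axis=0)


def crf_nll(emissions, transitions, labels):
    """Negative log-likelihood of a label sequence (the training loss)."""
    return log_partition(emissions, transitions) - sequence_score(emissions, transitions, labels)


def viterbi_decode(emissions, transitions):
    """Highest-scoring label sequence and its score."""
    score = emissions[0]
    backpointers = []
    for t in range(1, len(emissions)):
        candidates = score[:, None] + transitions  # [prev_label, next_label]
        backpointers.append(candidates.argmax(axis=0))  # best previous label per next label
        score = candidates.max(axis=0) + emissions[t]
    best = [int(score.argmax())]
    for bp in reversed(backpointers):  # follow the pointers back to the first token
        best.append(int(bp[best[-1]]))
    best.reverse()
    return best, float(score.max())


rng = np.random.RandomState(0)
emissions = rng.randn(4, 3)    # 4 tokens, 3 labels (stand-in for Bi-LSTM output)
transitions = rng.randn(3, 3)  # transitions[i, j]: score of label i followed by label j

# Sanity check against brute-force enumeration of all 3**4 label sequences.
all_paths = [list(p) for p in itertools.product(range(3), repeat=4)]
brute_logz = np.log(sum(np.exp(sequence_score(emissions, transitions, p)) for p in all_paths))
assert np.isclose(log_partition(emissions, transitions), brute_logz)
best_path, best_score = viterbi_decode(emissions, transitions)
assert best_path == max(all_paths, key=lambda p: sequence_score(emissions, transitions, p))
print(best_path, crf_nll(emissions, transitions, best_path))
```

The forward algorithm and Viterbi share the same recursion; one sums over previous labels in log space, the other takes the maximum, so both run in time linear in sentence length and quadratic in the number of labels.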

CoNLL2003 - NER	F1 Score	Precision	Recall
Dev	94.93	94.92	94.93
Test	91.30	90.47	92.15

Table 4: The performance of Bi-LSTM + Char-CNN + CRF on CoNLL2003 NER sub-task with setting: num layers = 2, unit dim = 200, window size = [3]

OntoNotes5 - NER	F1 Score	Precision	Recall
Dev	86.45	84.11	88.93
Test	85.25	82.57	88.11

Table 5: The performance of Bi-LSTM + Char-CNN + CRF on OntoNotes5 NER sub-task with setting: num layers = 2, unit dim = 200, window size = [3,5]

Treebank3 - POS	Accuracy
Dev	97.27
Test	97.51

Table 6: The performance of Bi-LSTM + Char-CNN + CRF on Treebank3 POS tagging sub-task with setting: num layers = 2, unit dim = 200, window size = [3]

Reference

Zhiheng Huang, Wei Xu, and Kai Yu. Bidirectional LSTM-CRF Models for Sequence Tagging [2015]
Jason P. C. Chiu and Eric Nichols. Named Entity Recognition with Bidirectional LSTM-CNNs [2015]
Xuezhe Ma and Eduard Hovy. End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF [2016]
Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. Neural Architectures for Named Entity Recognition [2016]
