scofield7419 / Sequence Labeling Bilstm Crf

Licence: gpl-3.0

The classic BiLSTM-CRF model implemented in TensorFlow, for sequence labeling tasks. In the Vex version, everything is configurable.

Labels

tensorflow nlp ner sequence-labeling

BiLSTM+CRF for sequential labeling tasks

🚀🚀🚀 A TensorFlow implementation of the BiLSTM+CRF model, for sequence labeling tasks.

Project Features

based on the TensorFlow API.
highly scalable; everything is configurable.
modularized with a clear structure.
very friendly to beginners.
easy to customize (DIY).

Task and Model

Sequence labeling is a typical way of modeling sequence prediction tasks in NLP. Common sequence labeling tasks include:

Part-of-Speech (POS) Tagging,
Chunking,
Named Entity Recognition (NER),
Punctuation Restoration,
Sentence Boundary Detection,
Scope Detection,
Chinese Word Segmentation (CWS),
Semantic Role Labeling (SRL),
Spoken Language Understanding,
Event Extraction,
and so forth...

Take the Named Entity Recognition (NER) task as an example:

Stanford  University  located  at  California  .
B-ORG     I-ORG       O        O   B-LOC       O

Here, two entities, Stanford University and California, are to be extracted. Each token in the text is tagged with a corresponding label, e.g., {token: Stanford, label: B-ORG}. Given a token sequence, the sequence labeling model aims to predict the label sequence.
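To make the tagging scheme concrete, here is a minimal sketch of how BIO labels could be grouped back into entities. The function name `extract_entities` is hypothetical and is not part of this project's API; it only illustrates the idea on the example sentence above.

```python
def extract_entities(tokens, labels):
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs."""
    entities = []
    current_tokens, current_type = [], None
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            # A new entity starts; flush any entity that is still open.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], label[2:]
        elif label.startswith("I-") and current_tokens and label[2:] == current_type:
            # The open entity continues with this token.
            current_tokens.append(token)
        else:
            # "O" (or an inconsistent "I-") closes the open entity.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities


tokens = ["Stanford", "University", "located", "at", "California", "."]
labels = ["B-ORG", "I-ORG", "O", "O", "B-LOC", "O"]
print(extract_entities(tokens, labels))
# [('Stanford University', 'ORG'), ('California', 'LOC')]
```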

BiLSTM+CRF, proposed by Lample et al., 2016, is to date one of the most classic and stable neural models for sequence labeling tasks.
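The CRF layer scores whole label sequences rather than single tokens, and decoding picks the best sequence with the Viterbi algorithm. The following is a pure-Python sketch of that decoding step under simplified assumptions (no start/stop transitions, hand-written scores). It is not the code used in BiLSTM_CRFs.py; the function name and the toy scores are illustrative only.

```python
def viterbi_decode(emissions, transitions):
    """Return (best label index path, its score).

    emissions[t][j]:   score of label j at position t (e.g. from the BiLSTM).
    transitions[i][j]: score of moving from label i to label j.
    """
    num_labels = len(transitions)
    scores = list(emissions[0])  # best score of any path ending in each label
    backpointers = []
    for t in range(1, len(emissions)):
        new_scores, pointers = [], []
        for j in range(num_labels):
            # Best previous label for reaching label j at position t.
            best_i = max(range(num_labels), key=lambda i: scores[i] + transitions[i][j])
            new_scores.append(scores[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        scores = new_scores
        backpointers.append(pointers)
    # Follow the back-pointers from the best final label.
    best_last = max(range(num_labels), key=lambda j: scores[j])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, scores[best_last]


label_names = ["O", "B", "I"]
# Toy scores: the transition O -> I is heavily penalised.
transitions = [[0.0, 0.0, -10.0],
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
emissions = [[1.0, 0.75, 0.0],
             [0.0, 0.0, 1.0]]
path, score = viterbi_decode(emissions, transitions)
print([label_names[i] for i in path], score)
# ['B', 'I'] 1.75
```

Picking the best label per token independently would give O followed by I here; the transition scores are what steer the decoder to the consistent sequence B, I.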

Project

Function Support

configuring all settings
- Running Mode: [train/test/interactive_predict/api_service]
- Datasets (Input/Output)
- Labeling Scheme:
  - [BIO/BIESO]
  - [PER|LOC|ORG]
  - ...
- Model Configuration:
  - encoder: Bi-GRU/Bi-LSTM, number of layers, bi-/uni-directional
  - decoder: CRF/softmax
  - embedding level: char/word
  - with/without self-attention
  - hyperparameters
  - ...
- Training Settings:
  - selectable evaluation metrics: [precision, recall, f1, accuracy]
  - optimizers: GD/Adagrad/AdaDelta/RMSprop/Adam
- Testing Settings
- API service Settings
logging everything
web app demo for easy demonstration
object-oriented: BILSTM_CRF, Datasets, Configer, utils
modularized with a clear structure, easy to DIY.

See more in the HandBook.

Requirements

python >=3.5
tensorflow >=1.8
numpy
pandas
Django==1.11.8
jieba
...

Setup

Option A:

download the repo and use it directly.

git clone https://github.com/scofield7419/sequence-labeling-BiLSTM-CRF.git
pip install -r requirements.txt

Option B: TODO

install the BiLSTM-CRF package as a module.

pip install BiLSTM-CRF

usage:

import os

# NOTE: a hyphen is not valid in a Python import path, so the package is
# written as BiLSTM_CRF here. The final package name is an assumption,
# because Option B is still marked TODO.
from BiLSTM_CRF.engines.BiLSTM_CRFs import BiLSTM_CRFs as BC
from BiLSTM_CRF.engines.DataManager import DataManager
from BiLSTM_CRF.engines.Configer import Configer
from BiLSTM_CRF.engines.utils import get_logger

...

# load all settings from the configuration file
config_file = r'/home/projects/system.config'
configs = Configer(config_file)

logger = get_logger(configs.log_dir)
configs.show_data_summary(logger)  # optional

dataManager = DataManager(configs, logger)
model = BC(configs, logger, dataManager)

# mode == 'train':
model.train()

# mode == 'test':
model.test()

# mode == 'single predicting':
# `sentence` is the raw input text to be labeled
sentence_tokens, entities, entities_type, entities_index = model.predict_single(sentence)
if configs.label_level == 1:
    print("\nExtracted entities:\n %s\n\n" % ("\n".join(entities)))
elif configs.label_level == 2:
    print("\nExtracted entities:\n %s\n\n" % ("\n".join([a + "\t(%s)" % b for a, b in zip(entities, entities_type)])))

# mode == 'api service webapp':
cmd_new = r'cd demo_webapp; python manage.py runserver %s:%s' % (configs.ip, configs.port)
res = os.system(cmd_new)

open `ip:port` in your browser.

Module Structure


├── main.py
├── system.config
├── HandBook.md
├── README.md
│
├── checkpoints
│   ├── BILSTM-CRFs-datasets1
│   │   ├── checkpoint
│   │   └── ...
│   └── ...
├── data
│   ├── example_datasets1
│   │   ├── logs
│   │   ├── vocabs
│   │   ├── test.csv
│   │   ├── train.csv
│   │   └── dev.csv
│   └── ...
├── demo_webapp
│   ├── demo_webapp
│   ├── interface
│   └── manage.py
├── engines
│   ├── BiLSTM_CRFs.py
│   ├── Configer.py
│   ├── DataManager.py
│   └── utils.py
└── tools
    ├── calcu_measure_testout.py
    └── statis.py

Folders
- engines: the core Python modules.
- data: the datasets, one subfolder per dataset.
- checkpoints: the model checkpoints, one subfolder per dataset.
- demo_webapp: a web demo of the system, which also provides the API.
- tools: some offline utilities.
Files
- main.py is the entry point of the system.
- system.config is the configuration file for all system settings.
- HandBook.md provides usage instructions.
- BiLSTM_CRFs.py is the main model.
- Configer.py parses system.config.
- DataManager.py manages the datasets and scheduling.
- utils.py provides helper tools.

Quick Start

Follow these steps:

step 1. compose your configuration in `system.config`.

configure the Datasets (Input/Output).
configure the Labeling Scheme.
configure the Model architecture.
configure the webapp settings if you plan to run the demo.

step 2. start training (required)

configure the running mode.
configure the training settings.
run main.py.

step 3. start testing (optional)

configure the running mode.
configure the testing settings.
run main.py.

step 4. start interactive prediction (optional)

configure the running mode.
run main.py.
input sentences interactively.

step 5. start the API service and web app (optional)

configure the running mode.
configure the api_service settings.
run main.py.
make predictions interactively in the browser.

Datasets

Input

The trainset, testset, and devset are all needed for full usage. However, if you only want to train the model and then use it offline, only the trainset is needed. After training, you can run inference with the saved model checkpoint files. If you want to run testing, the testset is needed as well.

For trainset, testset, devset, the common format is as follows:

word level:

(Token)         (Label)

for             O
the             O
lattice         B_TAS
QCD             I_TAS
computation     I_TAS
of              I_TAS
nucleon–nucleon I_TAS
low-energy      I_TAS
interactions    E_TAS
.               O

It              O
consists        O
in              O
simulating      B_PRO
...

char level:

(Token) (Label)

马 B-LOC
来 I-LOC
西 I-LOC
亚 I-LOC
副 O
总 O
理 O
。 O

他 O
兼 O
任 O
财 B-ORG
政 I-ORG
部 I-ORG
长 O
...

Note that:

the testset may contain only the Token column.
sentences are separated from each other by a blank line.
see the example dataset for the detailed format.
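As an illustration of the format described above, here is a minimal reader sketch. It assumes the token and label are separated by whitespace, as in the listings above; the actual delimiter used in the project's .csv files may differ, and `read_sentences` is a hypothetical helper, not the project's DataManager.

```python
def read_sentences(lines):
    """Yield (tokens, labels) for each sentence.

    Sentences are separated by blank lines. A line that holds only a
    token (as allowed for the testset) gets the label None.
    """
    tokens, labels = [], []
    for line in lines:
        parts = line.split()
        if not parts:
            # Blank line: the current sentence is complete.
            if tokens:
                yield tokens, labels
                tokens, labels = [], []
            continue
        tokens.append(parts[0])
        labels.append(parts[1] if len(parts) > 1 else None)
    if tokens:
        # Last sentence when the input does not end with a blank line.
        yield tokens, labels


sample = """for O
the O
lattice B_TAS
QCD I_TAS

It O
consists O
""".splitlines()

for sentence_tokens, sentence_labels in read_sentences(sample):
    print(sentence_tokens, sentence_labels)
```

To read a real file, the same function can be given an open file object, for example `read_sentences(open(path, encoding="utf-8"))`, where `path` points at your own dataset.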

Output (during testing phase)

During testing, the model outputs the predicted entities based on test.csv. There are two output files: test.out and test.entity.out (optional).

test.out

Same format as the input test.csv.

test.entity.out

Sentence
entity1 (Type)
entity2 (Type)
entity3 (Type)
...
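Once predictions are available, entity-level precision, recall, and F1 can be computed by comparing predicted entities against gold ones. The sketch below shows one common way to do this with exact matching. It is not necessarily what tools/calcu_measure_testout.py does, and the tuple layout (sentence index, entity text, type) is an assumption made for this example.

```python
def precision_recall_f1(gold_entities, predicted_entities):
    """Entity-level scores with exact matching of (sentence_id, text, type)."""
    gold, predicted = set(gold_entities), set(predicted_entities)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = [(0, "Stanford University", "ORG"), (0, "California", "LOC")]
predicted = [(0, "Stanford University", "ORG"), (0, "California", "ORG")]
print(precision_recall_f1(gold, predicted))
# (0.5, 0.5, 0.5)
```

Because sets are used, an entity that appears twice in the same sentence is counted once; add a position index to the tuples if that matters for your data.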

DIY

If you want to adapt this project to your own sequence labeling task, the following tips may help.

Download the repo sources.
Labeling Scheme (most important)
- label_scheme: BIO/BIESO
- label_level: with/without suffix
- hyphen, for connecting the prefix and suffix: `B_PER`, `I_LOC`
- suffix=[NR,NS,NT]
- labeling_level: word/char
Model: modify the model architecture in BiLSTM_CRFs.py into the one you want.
Dataset: adapt your dataset to the correct format.
Training
- specify all directories.
- set the training hyperparameters.
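If your data is annotated in BIO but you want to train with BIESO, the labels can be converted beforehand. The following sketch assumes that every non-O label carries a suffix (prefix, hyphen, suffix), that E marks the last token of a multi-token entity, and that S marks a single-token entity. The function `bio_to_bieso` is a hypothetical helper and is not part of the repo.

```python
def bio_to_bieso(labels, hyphen="-"):
    """Convert one sentence of BIO labels to BIESO labels."""
    converted = []
    for i, label in enumerate(labels):
        if label == "O":
            converted.append(label)
            continue
        prefix, suffix = label.split(hyphen, 1)
        next_label = labels[i + 1] if i + 1 < len(labels) else "O"
        # The entity continues only if the next label is I with the same suffix.
        continues = next_label == "I" + hyphen + suffix
        if prefix == "B":
            converted.append(("B" if continues else "S") + hyphen + suffix)
        else:  # prefix == "I"
            converted.append(("I" if continues else "E") + hyphen + suffix)
    return converted


print(bio_to_bieso(["B-ORG", "I-ORG", "O", "O", "B-LOC", "O"]))
# ['B-ORG', 'E-ORG', 'O', 'O', 'S-LOC', 'O']
print(bio_to_bieso(["B_TAS", "I_TAS", "I_TAS"], hyphen="_"))
# ['B_TAS', 'I_TAS', 'E_TAS']
```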

Others

For more usage details, please refer to the HandBook.

You're welcome to open an issue if you find anything wrong.

Updating...

2019-Jun-04, Vex version, v1.0, supporting configuration, scalable.
2018-Nov-05, support char and word level embedding.
2017-Dec-06, init version, v0.1.

License

This project is released under the GPL-3.0 license.