luozhouyang / deepseg

License: Apache-2.0

Chinese word segmentation in TensorFlow 2.x

Programming Languages

python

Projects that are alternatives of or similar to deepseg

ChineseNER

All about Chinese NER

Stars: ✭ 241 (+947.83%)

Mutual labels: crf, bilstm-crf, bert-bilstm-crf

Slot filling and intent detection of slu

slot filling, intent detection, joint training, ATIS & SNIPS datasets, the Facebook’s multilingual dataset, MIT corpus, E-commerce Shopping Assistant (ECSA) dataset, CoNLL2003 NER, ELMo, BERT, XLNet

Stars: ✭ 298 (+1195.65%)

Mutual labels: crf, sequence-labeling

Hscrf Pytorch

ACL 2018: Hybrid semi-Markov CRF for Neural Sequence Labeling (http://aclweb.org/anthology/P18-2038)

Stars: ✭ 284 (+1134.78%)

Mutual labels: crf, sequence-labeling

Lstm Crf Pytorch

LSTM-CRF in PyTorch

Stars: ✭ 364 (+1482.61%)

Mutual labels: crf, sequence-labeling

BiLSTM-CRF-NER-PyTorch

This repo contains a PyTorch implementation of a BiLSTM-CRF model for named entity recognition task.

Stars: ✭ 109 (+373.91%)

Mutual labels: crf, bilstm-crf

Ner Pytorch

LSTM+CRF NER

Stars: ✭ 260 (+1030.43%)

Mutual labels: crf, sequence-labeling

Bert Bilstm Crf Ner

Tensorflow solution of NER task Using BiLSTM-CRF model with Google BERT Fine-tuning And private Server services

Stars: ✭ 3,838 (+16586.96%)

Mutual labels: crf, bert-bilstm-crf

Sltk

Sequence labeling toolkit that implements the BLSTM-CNN-CRF model in PyTorch; F1 on the CoNLL 2003 English NER test set is 91.10% (word and char features).

Stars: ✭ 338 (+1369.57%)

Mutual labels: crf, sequence-labeling

Named entity recognition

Chinese named entity recognition (with concrete implementations of several models: HMM, CRF, BiLSTM, BiLSTM+CRF)

Stars: ✭ 995 (+4226.09%)

Mutual labels: crf, sequence-labeling

Ntagger

reference pytorch code for named entity tagging

Stars: ✭ 58 (+152.17%)

Mutual labels: crf, sequence-labeling

Ncrfpp

NCRF++, a Neural Sequence Labeling Toolkit. Easy use to any sequence labeling tasks (e.g. NER, POS, Segmentation). It includes character LSTM/CNN, word LSTM/CNN and softmax/CRF components.

Stars: ✭ 1,767 (+7582.61%)

Mutual labels: crf, sequence-labeling

A Pytorch Tutorial To Sequence Labeling

Empower Sequence Labeling with Task-Aware Neural Language Model | a PyTorch Tutorial to Sequence Labeling

Stars: ✭ 257 (+1017.39%)

Mutual labels: crf, sequence-labeling

BERT-BiLSTM-CRF

Keras implementation of BERT-BiLSTM-CRF

Stars: ✭ 40 (+73.91%)

Mutual labels: sequence-labeling, bilstm-crf

Rnnsharp

RNNSharp is a toolkit of deep recurrent neural network which is widely used for many different kinds of tasks, such as sequence labeling, sequence-to-sequence and so on. It's written by C# language and based on .NET framework 4.6 or above versions. RNNSharp supports many different types of networks, such as forward and bi-directional network, sequence-to-sequence network, and different types of layers, such as LSTM, Softmax, sampled Softmax and others.

Stars: ✭ 277 (+1104.35%)

Mutual labels: crf, sequence-labeling

Lm Lstm Crf

Empower Sequence Labeling with Task-Aware Language Model

Stars: ✭ 778 (+3282.61%)

Mutual labels: crf, sequence-labeling

Pytorch ner bilstm cnn crf

End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF implemented in PyTorch

Stars: ✭ 249 (+982.61%)

Mutual labels: crf, sequence-labeling

xinlp

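Java implementations of the algorithms from the later chapters of Li Hang's "Statistical Learning Methods": the EM algorithm for the boxes-and-balls problem, extended to GMM training; then HMM word segmentation (including parameter training for the HMM segmenter) and CRF word segmentation (using a parameter model trained with CRF++); finally BiLSTM+CRF implemented with TensorFlow, plus a XinAnalyzer wrapper for Lucene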

Stars: ✭ 21 (-8.7%)

Mutual labels: crf, bilstm-crf

ImcSegmentationPipeline

A pixel classification based multiplexed image segmentation pipeline

Stars: ✭ 62 (+169.57%)

Mutual labels: segmentation

airs

Road Segmentation in Satellite Aerial Images

Stars: ✭ 51 (+121.74%)

Mutual labels: segmentation

eta

ETA: Extensible Toolkit for Analytics

Stars: ✭ 22 (-4.35%)

Mutual labels: segmentation

deepseg

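A neural word segmentation model implemented in TensorFlow 2.x! One-command training and one-command deployment!

For the TensorFlow 1.x implementation, switch to the tf1 branch.

Two libraries used by this project are worth recommending:

luozhouyang/transformers-keras, used to load various pretrained BERT and ALBERT models.
luozhouyang/keras-crf, a CRF layer implementation built on top of tensorflow/addons.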

Development environment

conda create -n deepseg python=3.6
conda activate deepseg 
pip install -r requirements.txt

Dataset download

SIGHAN

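Training a model

You can train your model with the deepseg/run_deepseg.py script. The following arguments are required:

--model, the model to train; one of bilstm-crf, bigru-crf, bert-crf, albert-crf, bert-bilstm-crf, albert-bilstm-crf
- For BERT-based or ALBERT-based models, also provide the path to the pretrained model with the --pretrained_model_dir argument.
--model_dir, the directory where the model is saved
--vocab_file, the path to the vocabulary file; note that this is a character-level vocabulary, see testdata/vocab_small.txt
--train_input_files, the training files: text files that are already segmented into words, see testdata/train_small.txt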
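Because the vocabulary is character-level while the training files are segmented into words, each word has to be turned into per-character labels at some point. The sketch below illustrates that idea only. It assumes the words in a training line are separated by whitespace and uses the common B/M/E/S tag convention; neither assumption is confirmed by this README, and deepseg's LabelMapper may use a different tag set. The function names are hypothetical and are not part of deepseg.

```python
from typing import List, Tuple


def words_to_char_tags(words: List[str]) -> Tuple[List[str], List[str]]:
    """Convert segmented words into parallel character and tag lists.

    Assumed tag scheme (not confirmed for deepseg):
    S = single-character word, B = begin, M = middle, E = end.
    """
    chars: List[str] = []
    tags: List[str] = []
    for word in words:
        if not word:
            continue  # ignore empty tokens
        chars.extend(word)  # a str iterates character by character
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return chars, tags


def line_to_example(line: str) -> Tuple[List[str], List[str]]:
    """Split one whitespace-segmented line and label its characters."""
    return words_to_char_tags(line.split())


if __name__ == "__main__":
    chars, tags = line_to_example("我 爱 北京 天安门")
    for char, tag in zip(chars, tags):
        print(char, tag)
```

The character list is what a character-level vocabulary would map to ids, and the tag list is the target sequence a CRF layer would learn to predict.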

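For the bilstm-crf and bigru-crf models, the following arguments are also required:

--vocab_size, the size of the vocabulary
--embedding_size, the dimension of the embedding layer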
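If you train on your own corpus, you need a character-level vocabulary file and the matching value for --vocab_size. The following is a minimal sketch of one way to derive both from segmented text. It assumes whitespace-separated words and a vocabulary with one token per line; the special tokens "[PAD]" and "[UNK]" and the function name are assumptions for illustration, so compare against testdata/vocab_small.txt before relying on it.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def build_char_vocab(
    lines: Iterable[str],
    specials: Sequence[str] = ("[PAD]", "[UNK]"),  # assumed special tokens
) -> List[str]:
    """Collect every character from whitespace-segmented lines.

    Characters are ordered by descending frequency, then by code point,
    so the result is deterministic.
    """
    counter: Counter = Counter()
    for line in lines:
        for word in line.split():
            counter.update(word)  # counts each character of the word
    ordered = sorted(counter.items(), key=lambda item: (-item[1], item[0]))
    return list(specials) + [char for char, _ in ordered]


if __name__ == "__main__":
    vocab = build_char_vocab(["我 爱 北京", "我 在 北京"])
    # One token per line is the assumed vocabulary file layout.
    print("\n".join(vocab))
    # The length is the value you would pass as --vocab_size.
    print(len(vocab))
```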

An example that uses the bert-crf model:

python -m deepseg.run_deepseg \
    --model=bert-crf \
    --model_dir=models/bert-crf-model \
    --pretrained_model_dir=/home/zhouyang.lzy/pretrain-models/chinese_roberta_wwm_ext_L-12_H-768_A-12 \
    --train_input_files=testdata/train_small.txt \
    --vocab_file=/home/zhouyang.lzy/pretrain-models/chinese_roberta_wwm_ext_L-12_H-768_A-12/vocab.txt \
    --epochs=2

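What? You think my training script is terrible and want to write the training process yourself?

That's totally fine!

Writing your own training script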

import os

import tensorflow as tf

from deepseg.dataset import DatasetBuilder, LabelMapper, TokenMapper
from deepseg.models import BiGRUCRFModel, BiLSTMCRFModel
from deepseg.models import AlbertBiLSTMCRFModel, AlbertCRFModel
from deepseg.models import BertBiLSTMCRFModel, BertCRFModel

token_mapper = TokenMapper(vocab_file='testdata/vocab_small.txt')
label_mapper = LabelMapper()

builder = DatasetBuilder(token_mapper, label_mapper)
train_dataset = builder.build_train_dataset('testdata/train_small.txt', batch_size=20, buffer_size=100)
valid_dataset = None

model_dir = 'model/bilstm-crf'
tensorboard_logdir = os.path.join(model_dir, 'logs')
saved_model_dir = os.path.join(model_dir, 'export', '{epoch}')

# Swap in whichever model you want, or simply build any model you like!
model = BiLSTMCRFModel(100, 128, 3)
model.fit(
    train_dataset,
    validation_data=valid_dataset,
    epochs=10,
    callbacks=[
        tf.keras.callbacks.EarlyStopping(monitor='val_loss' if valid_dataset is not None else 'loss'),
        tf.keras.callbacks.TensorBoard(tensorboard_logdir),
        tf.keras.callbacks.ModelCheckpoint(
            saved_model_dir,
            save_best_only=False,
            save_weights_only=False)
    ]
)

Deploying the model

During the training above, a model in SavedModel format is saved at every epoch; it can be deployed directly with tensorflow-serving.
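Once a SavedModel is served, a client can call TensorFlow Serving's REST predict endpoint. The sketch below uses only the Python standard library. The host, the model name "deepseg", the REST port 8501, and the assumption that the exported signature accepts a single batch of integer token ids are all illustrative; inspect the exported signature (for example with saved_model_cli) to confirm the real input names and shapes, especially for BERT-based models.

```python
import json
import urllib.request
from typing import Any, List


def build_predict_url(host: str, model_name: str, port: int = 8501) -> str:
    """URL of the TensorFlow Serving REST predict endpoint."""
    return f"http://{host}:{port}/v1/models/{model_name}:predict"


def build_predict_body(batch_token_ids: List[List[int]]) -> bytes:
    """Encode a batch in the row-oriented {"instances": [...]} format."""
    return json.dumps({"instances": batch_token_ids}).encode("utf-8")


def predict(
    host: str,
    model_name: str,
    batch_token_ids: List[List[int]],
    timeout: float = 10.0,
) -> Any:
    """POST the batch and return the "predictions" field of the response."""
    request = urllib.request.Request(
        build_predict_url(host, model_name),
        data=build_predict_body(batch_token_ids),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["predictions"]


if __name__ == "__main__":
    # Hypothetical token ids; real ids come from the character vocabulary.
    print(build_predict_url("localhost", "deepseg"))
    print(build_predict_body([[12, 7, 33, 4]]).decode("utf-8"))
```

Mapping characters to ids before the request and mapping predicted label ids back to word boundaries afterwards is left to the caller, since it depends on the vocabulary and label set used in training.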

TODO: add deployment documentation and client usage documentation
