liu-nlper / Sltk

SLTK - Sequence Labeling Toolkit

A sequence labeling toolkit that implements the BLSTM-CNN-CRF model in PyTorch. It reaches an F1 score of 91.10% on the CoNLL 2003 English NER test set (using word and char features).

1. Quick start

1.1 Install dependencies

$ sudo pip3 install -r requirements.txt --upgrade  # for all users
$ pip3 install -r requirements.txt --upgrade --user  # for the current user

1.2 Preprocess & train

$ CUDA_VISIBLE_DEVICES=0 python3 main.py --config ./configs/word.yml -p --train

1.3 Train

If preprocessing has already been done, you can start training directly:

$ CUDA_VISIBLE_DEVICES=0 python3 main.py --config ./configs/word.yml --train

1.4 Test

$ CUDA_VISIBLE_DEVICES=0 python3 main.py --config ./configs/word.yml --test

2. Configuration file

Any change to the configuration file must follow YAML syntax.

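2.1 Training | development | test data

Data is in conllu format: columns are separated by tabs or spaces, sentences are separated by blank lines, and the label (if present) is in the last column.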
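The column format described above can be parsed in a few lines. The following is a minimal sketch, not the toolkit's own loader: the function name `read_sentences` and the sample data are hypothetical and only illustrate the layout (whitespace-separated columns, blank line between sentences, label last).

```python
from typing import Iterable, List


def read_sentences(lines: Iterable[str]) -> List[List[List[str]]]:
    """Group column-formatted lines into sentences.

    Each sentence is a list of tokens; each token is the list of its columns.
    """
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        if not line:
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        # split() with no argument splits on any run of tabs or spaces.
        current.append(line.split())
    if current:  # the input may not end with a blank line
        sentences.append(current)
    return sentences


# Hypothetical sample: first sentence tab-separated, second space-separated.
sample = "EU\tNNP\tS-ORG\nrejects\tVBZ\tO\n\nPeter NNP B-PER\nBlackburn NNP E-PER\n"
sentences = read_sentences(sample.splitlines())
words = [[token[0] for token in sent] for sent in sentences]
labels = [[token[-1] for token in sent] for sent in sentences]
```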

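Set the path_train, path_dev and path_test parameters under data_params in the configuration file. If path_dev is empty, part of the training set is split off as the development set during training, according to the model_params.dev_size parameter.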
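As an illustration of the fallback when path_dev is empty, here is a minimal sketch of a train/dev split. It assumes dev_size is a fraction between 0 and 1 and that the split is random; neither is confirmed by this README, and `split_train_dev` is a hypothetical helper, not a function of the toolkit.

```python
import random
from typing import List, Sequence, Tuple


def split_train_dev(sentences: Sequence, dev_size: float,
                    seed: int = 0) -> Tuple[List, List]:
    """Hold out a fraction of the training sentences as a development set."""
    if not 0.0 < dev_size < 1.0:
        raise ValueError("dev_size is assumed to be a fraction in (0, 1)")
    indices = list(range(len(sentences)))
    # A seeded shuffle keeps the split reproducible between runs.
    random.Random(seed).shuffle(indices)
    n_dev = int(len(sentences) * dev_size)
    dev = [sentences[i] for i in indices[:n_dev]]
    train = [sentences[i] for i in indices[n_dev:]]
    return train, dev


train, dev = split_train_dev(list(range(100)), dev_size=0.1)
```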

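2.2 Features

If the training data contains several feature columns, set data_params.feature_cols in the configuration file to select which columns to use. data_params.feature_names gives an alias for each feature and must have the same length as data_params.feature_cols.

data_params.alphabet_params.min_counts: when the vocabulary of a feature is built, values whose frequency is below this threshold are filtered out.

model_params.embed_sizes: the embedding dimension of each feature. If pretrained feature vectors are provided, the dimension of the pretrained vectors takes precedence.

model_params.require_grads: whether the embedding table of each feature is updated during training.

model_params.use_char: whether to use char-level features.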
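The effect of min_counts can be sketched as follows. This is an illustrative stand-alone version, not the toolkit's code: `build_vocab` and the `<pad>`/`<unk>` special entries are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence


def build_vocab(sentences: Iterable[Sequence[str]], min_count: int,
                specials: Sequence[str] = ("<pad>", "<unk>")) -> Dict[str, int]:
    """Map feature values to ids, dropping values seen fewer than min_count times."""
    counts = Counter(token for sent in sentences for token in sent)
    vocab = {special: idx for idx, special in enumerate(specials)}
    # most_common() orders by frequency; ties keep first-seen order.
    for token, count in counts.most_common():
        if count >= min_count:
            vocab[token] = len(vocab)
    return vocab


corpus = [["the", "cat"], ["the", "dog"], ["the", "cat"]]
vocab = build_vocab(corpus, min_count=2)
# Values filtered out of the vocabulary fall back to the <unk> id at lookup time.
dog_id = vocab.get("dog", vocab["<unk>"])
```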

2.3 Pretrained feature vectors

data_params.path_pretrain: paths to the pretrained feature vectors. The elements must follow the same order as data_params.feature_names (an element can be set to null).
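The notes in section 3 state that GloVe vectors must be converted to the word2vec text format, whose first line holds the vocabulary size and the vector dimension. A minimal sketch of that conversion using only the standard library is given below; `glove_to_word2vec` and the file names are hypothetical, and the input is assumed to contain one space-separated `word v1 v2 ...` entry per line.

```python
import os
import tempfile
from typing import Tuple


def glove_to_word2vec(src_path: str, dst_path: str) -> Tuple[int, int]:
    """Copy a GloVe text file, prepending the '<vocab size> <dim>' header line."""
    n_words, dim = 0, 0
    # First pass: count the entries and read the dimension from the first line.
    with open(src_path, encoding="utf-8") as src:
        for line in src:
            if n_words == 0:
                dim = len(line.rstrip("\n").split(" ")) - 1
            n_words += 1
    # Second pass: write the header, then stream the vectors unchanged.
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(f"{n_words} {dim}\n")
        for line in src:
            dst.write(line)
    return n_words, dim


# Tiny demonstration on a made-up two-word, three-dimensional file.
with tempfile.TemporaryDirectory() as tmp:
    glove_path = os.path.join(tmp, "glove.sample.txt")
    w2v_path = os.path.join(tmp, "word2vec.sample.txt")
    with open(glove_path, "w", encoding="utf-8") as f:
        f.write("the 0.1 0.2 0.3\ncat 0.4 0.5 0.6\n")
    shape = glove_to_word2vec(glove_path, w2v_path)
    with open(w2v_path, encoding="utf-8") as f:
        converted = f.read().splitlines()
```

The two-pass design avoids holding the whole vector file in memory, which matters for the larger GloVe files.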

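2.4 Other options

word_norm: whether to normalize digits in words (every digit is simply converted to 0);

max_len_limit: limit on the length of a batch. During training, the length of a batch is determined by its longest sentence; if that sentence exceeds the limit, the batch length is forced to this value;

all_in_memory: after preprocessing, the data is stored in an hdf5 file. By default the data object stays on disk and is loaded on demand by index; to speed up data reading, set this value to true (suitable for small datasets).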
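The word_norm and max_len_limit options can be pictured with the sketch below. Both helpers are hypothetical; in particular, truncating sentences longer than the limit is an assumption about what "forced to this value" means in practice, not something this README states.

```python
import re
from typing import List


def normalize_digits(word: str) -> str:
    """Replace every digit in a word with '0', as word_norm is described."""
    return re.sub(r"\d", "0", word)


def pad_batch(batch: List[List[int]], max_len_limit: int,
              pad_id: int = 0) -> List[List[int]]:
    """Pad a batch to its longest sentence, capped at max_len_limit.

    Sentences longer than the cap are truncated (an assumption, see above).
    """
    batch_len = min(max(len(sent) for sent in batch), max_len_limit)
    return [sent[:batch_len] + [pad_id] * (batch_len - len(sent))
            for sent in batch]


normalized = normalize_digits("1996-08-22")
capped = pad_batch([[1, 2, 3, 4, 5], [6, 7]], max_len_limit=3)
uncapped = pad_batch([[1, 2, 3, 4, 5], [6, 7]], max_len_limit=10)
```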

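3. Performance

The table below lists results on the CoNLL 2003 NER test set. Features and parameter settings follow Ma et al. (2016).

Table. Experimental results

Model	% P	% R	% F1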
Lample et al. (2016)	-	-	90.94
Ma et al. (2016)	91.35	91.06	91.21
BGRU	85.50	85.89	85.69
BLSTM	88.05	87.19	87.62
BLSTM-CNN	89.21	90.48	89.84
BLSTM-CNN-CRF	91.01	91.19	91.10
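The F1 column is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short sketch below recomputes it for the four toolkit rows of the table; the helper name `f1_score` is illustrative, and small differences can appear for other rows because the published P and R values are themselves rounded.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the table above, in percent.
rows = {
    "BGRU": (85.50, 85.89),
    "BLSTM": (88.05, 87.19),
    "BLSTM-CNN": (89.21, 90.48),
    "BLSTM-CNN-CRF": (91.01, 91.19),
}
recomputed = {name: round(f1_score(p, r), 2) for name, (p, r) in rows.items()}
```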

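Notes:

CoNLL 2003 corpus download: CoNLL 2003 NER. The tagging scheme must be converted to BIESO.
Word vector download: glove.6B.zip. The word vectors must be converted to the word2vec text format, i.e. the first line of the txt file must contain the 'vocabulary size and vector dimension' information.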
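The BIESO conversion mentioned in the first note can be sketched as below. This assumes the input tags are already in the IOB2 (BIO) scheme, where every entity starts with a B- tag; a corpus in another scheme would need to be brought to IOB2 first. `bio_to_bieso` is a hypothetical helper, not part of the toolkit.

```python
from typing import List


def bio_to_bieso(tags: List[str]) -> List[str]:
    """Convert one sentence of IOB2 tags to the BIESO scheme.

    B/I mark the beginning/inside of a multi-token entity, E its last token,
    S a single-token entity, and O a token outside any entity.
    """
    converted = []
    for i, tag in enumerate(tags):
        if tag == "O":
            converted.append(tag)
            continue
        prefix, _, label = tag.partition("-")
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        # The entity continues only if the next tag is I- with the same label.
        continues = next_tag == f"I-{label}"
        if prefix == "B":
            converted.append(tag if continues else f"S-{label}")
        elif prefix == "I":
            converted.append(tag if continues else f"E-{label}")
        else:
            raise ValueError(f"unexpected tag: {tag!r}")
    return converted


example = ["B-PER", "I-PER", "O", "B-ORG", "B-LOC", "I-LOC", "I-LOC"]
result = bio_to_bieso(example)
```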

4. Requirements

- python3
- gensim
- h5py
- numpy
- torch==0.4.0
- pyyaml

5. References

Lample G, Ballesteros M, et al. Neural Architectures for Named Entity Recognition. NAACL, 2016.
Ma X, Hovy E. End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF. ACL, 2016.

Updating...

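clip: gradient clipping for the RNN layers;
deterministic: reproducibility of the model;
one-hot encoded character vectors;
LSTM-based extraction of character-level features;
multi-GPU training on a single machine.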
