aonotas / deep-crf

License: MIT
An implementation of Conditional Random Fields (CRFs) with deep learning methods

Programming Languages

Python

Projects that are alternatives to or similar to deep-crf

Grobid
A machine learning software for extracting information from scholarly documents
Stars: ✭ 1,275 (+691.93%)
Mutual labels:  crf
Ner
A survey of named entity recognition (NER): papers, models, code (BiLSTM-CRF/BERT-CRF), and competition resources, continuously updated
Stars: ✭ 118 (-26.71%)
Mutual labels:  crf
Ner Slot filling
Entity extraction and intent detection (Natural Language Understanding) for Chinese, with a choice of Bi-LSTM + CRF or IDCNN + CRF
Stars: ✭ 151 (-6.21%)
Mutual labels:  crf
End To End Sequence Labeling Via Bi Directional Lstm Cnns Crf Tutorial
Tutorial for End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF
Stars: ✭ 87 (-45.96%)
Mutual labels:  crf
Pydensecrf
Python wrapper to Philipp Krähenbühl's dense (fully connected) CRFs with gaussian edge potentials.
Stars: ✭ 1,633 (+914.29%)
Mutual labels:  crf
Semantic Segmentation Of Remote Sensing Images
Semantic segmentation of remote sensing images based on deep learning, using tf.keras within the TensorFlow framework (TF 2.0+)
Stars: ✭ 125 (-22.36%)
Mutual labels:  crf
Bert Bilstm Crf Pytorch
bert-bilstm-crf implemented in pytorch for named entity recognition.
Stars: ✭ 71 (-55.9%)
Mutual labels:  crf
Sequence tagging
Named Entity Recognition (LSTM + CRF) - Tensorflow
Stars: ✭ 1,889 (+1073.29%)
Mutual labels:  crf
Daguan 2019 rank9
datagrand 2019 information extraction competition rank9
Stars: ✭ 121 (-24.84%)
Mutual labels:  crf
Ncrfpp
NCRF++, a Neural Sequence Labeling Toolkit. Easy to use for any sequence labeling task (e.g. NER, POS, segmentation). It includes character LSTM/CNN, word LSTM/CNN and softmax/CRF components.
Stars: ✭ 1,767 (+997.52%)
Mutual labels:  crf
Etagger
reference tensorflow code for named entity tagging
Stars: ✭ 100 (-37.89%)
Mutual labels:  crf
Crfsharp
CRFSharp is Conditional Random Fields implemented in .NET (C#), a machine learning algorithm for learning from labeled sequences of examples.
Stars: ✭ 110 (-31.68%)
Mutual labels:  crf
Mylearn
machine learning algorithm
Stars: ✭ 125 (-22.36%)
Mutual labels:  crf
Nlp Journey
Documents, papers and code related to Natural Language Processing, including Topic Model, Word Embedding, Named Entity Recognition, Text Classification, Text Generation, Text Similarity, Machine Translation, etc. All code is implemented in TensorFlow 2.0.
Stars: ✭ 1,290 (+701.24%)
Mutual labels:  crf
Clinical Ner
Named entity recognition for Chinese electronic medical records
Stars: ✭ 151 (-6.21%)
Mutual labels:  crf
Meanfield Matlab
MATLAB wrapper for Efficient Inference in Fully Connected CRF
Stars: ✭ 76 (-52.8%)
Mutual labels:  crf
Multilstm
keras attentional bi-LSTM-CRF for Joint NLU (slot-filling and intent detection) with ATIS
Stars: ✭ 122 (-24.22%)
Mutual labels:  crf
Fcn For Semantic Segmentation
Implementation of FCN-8 and FCN-16 with Keras, using CRF as post-processing
Stars: ✭ 155 (-3.73%)
Mutual labels:  crf
G2pc
g2pC: A Context-aware Grapheme-to-Phoneme Conversion module for Chinese
Stars: ✭ 155 (-3.73%)
Mutual labels:  crf
Id Cnn Cws
Source codes and corpora of paper "Iterated Dilated Convolutions for Chinese Word Segmentation"
Stars: ✭ 129 (-19.88%)
Mutual labels:  crf

DeepCRF: Neural Networks and CRFs for Sequence Labeling

An implementation of Conditional Random Fields (CRFs) with deep learning methods.

DeepCRF is a Python library for sequence labeling that combines neural networks with CRFs, built on Chainer, a flexible deep learning framework.

Which version of Python is supported?

  • Python 2.7
  • Python 3.4

Which version of Chainer is supported?

  • Chainer v1.24.0
  • Chainer v2.1.0

How to install?

# if you use Ubuntu
sudo apt install libhdf5-dev

git clone https://github.com/aonotas/deep-crf.git
cd deep-crf
python setup.py install

# if you want to use Chainer v1.24.0
pip install 'chainer==1.24.0'

# if you want to use Chainer v2.1.0
pip install 'chainer==2.1.0'
pip install cupy # if you want to use CUDA
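
To confirm which Chainer version is actually installed (and whether CuPy is available for CUDA), a quick sanity check from Python; this is not a deep-crf command, just a version check:

# Quick check of the installed Chainer / CuPy versions.
import chainer
print(chainer.__version__)

try:
    import cupy
    print(cupy.__version__)
except ImportError:
    print("CuPy is not installed (CPU-only mode)")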

How to train?

Train the Ma and Hovy (2016) model

$ deep-crf train input_file.txt --delimiter=' ' --dev_file input_file_dev.txt --save_dir save_model_dir --save_name bilstm-cnn-crf_adam --optimizer adam

Note that --dev_file specifies the path of the development file used for early stopping.

$ cat input_file.txt
Barack    B-PERSON
Hussein   I-PERSON
Obama     E-PERSON
is        O
a         O
man       O
.         O

Yuji      B-PERSON
Matsumoto E-PERSON
is        O
a         O
man       O
.         O

Each line contains a word and its gold tag, separated by a space (word [space] gold_tag). Note that sentences must be separated by an empty line (\n). This format is known as the CoNLL format.
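
For reference, a minimal sketch (not part of deep-crf itself) of how this format can be read into per-sentence word/tag lists:

# Minimal sketch: read a CoNLL-style file (word <space> gold tag per line,
# empty line between sentences) into a list of (words, tags) pairs.
def read_conll(path):
    sentences, words, tags = [], [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if not parts:                 # empty line = sentence boundary
                if words:
                    sentences.append((words, tags))
                    words, tags = [], []
                continue
            words.append(parts[0])        # first column: word
            tags.append(parts[-1])        # last column: gold tag
    if words:
        sentences.append((words, tags))
    return sentences

print(read_conll("input_file.txt")[0])
# (['Barack', 'Hussein', 'Obama', 'is', 'a', 'man', '.'],
#  ['B-PERSON', 'I-PERSON', 'E-PERSON', 'O', 'O', 'O', 'O'])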

Deep BiLSTM-CNN-CRF model (three layers)

$ deep-crf train input_file.txt --delimiter=' ' --n_layer 3  --dev_file input_file_dev.txt --save_dir save_model_dir --save_name bilstm-cnn-crf_adam --optimizer adam

Deep BiLSTM-CNN-CRF model (three layers) with Multiple Input files

If your input is split across multiple files (for example, because the data is large or has many lines), use the following command and add the argument --use_list_files 1. (A small sketch for generating the list file follows the example below.)

$ deep-crf train input_file_list.txt --delimiter=' ' --n_layer 3  --dev_file input_file_dev.txt --save_dir save_model_dir --save_name bilstm-cnn-crf_adam --optimizer adam --use_list_files 1
$ cat input_file_list.txt
./path_to_file/input_file_1.txt
./path_to_file/input_file_2.txt
./path_to_file/input_file_3.txt
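
For example, a small sketch (the glob pattern and file names are placeholders) that writes such a list file:

# Sketch: collect the paths of all split input files into one list file.
import glob

paths = sorted(glob.glob("./path_to_file/input_file_*.txt"))
with open("input_file_list.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(paths) + "\n")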

Set Pretrained Word Embeddings

$ deep-crf train input_file.txt --delimiter=' ' --n_layer 3 --word_emb_file ./glove.6B.100d.txt --word_emb_vocab_type replace_all --dev_file input_file_dev.txt

Several vocabulary modes are available (a small illustration follows the list):

  • --word_emb_vocab_type: select from [replace_all, replace_only, additional]
  • replace_all: replace the training vocabulary with the GloVe embeddings' vocabulary.
  • replace_only: keep the training vocabulary and use the GloVe vectors only for words that also appear in it.
  • additional: concatenate the training vocabulary and the GloVe embeddings' vocabulary.
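
As a rough illustration of the three modes (this is not the library's own code, just set operations on toy vocabularies):

# Toy illustration of --word_emb_vocab_type using small example vocabularies.
train_vocab = {"Barack", "Obama", "is", "a", "man", "."}
glove_vocab = {"the", "dog", "cat", "of", "to", "and", "in", "is", "a"}

# replace_all: use the GloVe vocabulary instead of the training vocabulary
vocab_replace_all = set(glove_vocab)

# replace_only: keep the training vocabulary; words also found in GloVe
# are initialized with their pretrained vectors
vocab_replace_only = set(train_vocab)
initialized_from_glove = train_vocab & glove_vocab   # {'is', 'a'}

# additional: use the union of both vocabularies
vocab_additional = train_vocab | glove_vocab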

If you want to use word2vec embeddings, please convert them to the GloVe text format (a conversion sketch follows the sample below).

$ head glove.6B.100d.txt
the -0.038194 -0.24487 0.72812 -0.39961 0.083172
dog -0.10767 0.11053 0.59812 -0.54361 0.67396
cat -0.33979 0.20941 0.46348 -0.64792 -0.38377
of -0.1529 -0.24279 0.89837 0.16996 0.53516
to -0.1897 0.050024 0.19084 -0.049184 -0.089737
and -0.071953 0.23127 0.023731 -0.50638 0.33923
in 0.085703 -0.22201 0.16569 0.13373 0.38239
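
One possible conversion for binary word2vec files is sketched below; it assumes the gensim package (version 4 or later) and uses placeholder file names:

# Sketch: convert binary word2vec embeddings into the plain
# "word v1 v2 ..." text format shown above (no header line).
from gensim.models import KeyedVectors

kv = KeyedVectors.load_word2vec_format("word2vec.bin", binary=True)

with open("word2vec_as_glove.txt", "w", encoding="utf-8") as f:
    for word in kv.index_to_key:
        vector = " ".join("%.6f" % x for x in kv[word])
        f.write("%s %s\n" % (word, vector))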

Additional Feature Support

$ deep-crf train input_file_multi.txt --delimiter=' ' --input_idx 0,1 --output_idx 2 --dev_file input_file_dev.txt --save_dir save_model_dir --save_name bilstm-cnn-crf_adam_additional --optimizer adam
$ cat input_file_multi.txt
Barack    NN  B-PERSON
Hussein   NN  I-PERSON
Obama     NN  E-PERSON
is        VBZ O
a         DT  O
man       NN  O
.         .   O

Yuji      NN  B-PERSON
Matsumoto NN  E-PERSON
is        VBZ O
a         DT  O
man       NN  O
.         .   O

Note that --input_idx specifies which columns are used as input features (the word feature must be at index 0), as in this example.
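
As an illustration of the column indexing (not deep-crf's own code), --input_idx 0,1 and --output_idx 2 select columns like this:

# Sketch: how --input_idx 0,1 --output_idx 2 map onto the columns of one line.
input_idx = [0, 1]    # the word feature must be column 0
output_idx = [2]

line = "Barack NN B-PERSON"
columns = line.split()

inputs = [columns[i] for i in input_idx]     # ['Barack', 'NN']
outputs = [columns[i] for i in output_idx]   # ['B-PERSON']
print(inputs, outputs)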

Multi-Task Learning Support

(This multi-task learning mode is still under development.)

$ deep-crf train input_file_multi.txt --delimiter=' ' --model_name bilstm-cnn-crf --input_idx 0 --output_idx 1,2

How to predict?

$ deep-crf predict input_raw_file.txt --delimiter=' ' --model_filename ./save_model_dir/bilstm-cnn-crf_adam_epoch3.model --save_dir save_model_dir --save_name bilstm-cnn-crf_adam  --predicted_output predicted.txt

Please use the following format for prediction.

$ cat input_raw_file.txt
Barack Hussein Obama is a man .
Yuji Matsumoto is a man .

Note that --model_filename specifies the path of the saved model file. Please use the same --save_name as in the training step.

How to predict? (Additional Feature)

$ deep-crf predict input_file_multi.txt --delimiter=' ' --input_idx 0,1 --output_idx 2 --model_filename ./save_model_dir/bilstm-cnn-crf_multi_epoch3.model --save_dir save_model_dir --save_name bilstm-cnn-crf_multi  --predicted_output predicted.txt

Note that you must prepare a CoNLL-format input file if you used the additional-feature mode in the training step.

$ cat input_file_multi.txt
Barack    NN  B-PERSON
Hussein   NN  I-PERSON
Obama     NN  E-PERSON
is        VBZ O
a         DT  O
man       NN  O
.         .   O

Yuji      NN  B-PERSON
Matsumoto NN  E-PERSON
is        VBZ O
a         DT  O
man       NN  O
.         .   O

How to evaluate?

$ deep-crf eval gold.txt predicted.txt
$ head gold.txt
O
O
B-LOC
O
O

B-PERSON
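
For reference, the sketch below shows how token accuracy and span-level precision/recall/F1 can be computed from two such files. It assumes predicted.txt uses the same one-tag-per-line layout as gold.txt and is only an illustration, not the exact metric code used by deep-crf eval:

# Sketch: token accuracy and span-level P/R/F1 from two files with one tag
# per line and an empty line between sentences (layout assumed for both).
def read_tags(path):
    sents, cur = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            tag = line.strip()
            if tag:
                cur.append(tag)
            elif cur:
                sents.append(cur)
                cur = []
    if cur:
        sents.append(cur)
    return sents

def spans(tags):
    # Extract (start, end, type) spans from BIO/BIOES-style tags (end exclusive).
    out, start, etype = [], None, None
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        continues = prefix in ("I", "E") and start is not None and label == etype
        if start is not None and not continues:
            out.append((start, i, etype))
            start, etype = None, None
        if prefix in ("B", "S") or (prefix in ("I", "E") and start is None):
            start, etype = i, label
        if prefix in ("E", "S"):
            out.append((start, i + 1, etype))
            start, etype = None, None
    if start is not None:
        out.append((start, len(tags), etype))
    return set(out)

gold, pred = read_tags("gold.txt"), read_tags("predicted.txt")
total = correct = n_gold = n_pred = n_hit = 0
for g, p in zip(gold, pred):
    total += len(g)
    correct += sum(gt == pt for gt, pt in zip(g, p))
    gs, ps = spans(g), spans(p)
    n_gold, n_pred, n_hit = n_gold + len(gs), n_pred + len(ps), n_hit + len(gs & ps)

acc = 100.0 * correct / total
prec = 100.0 * n_hit / n_pred if n_pred else 0.0
rec = 100.0 * n_hit / n_gold if n_gold else 0.0
f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
print("accuracy=%.2f precision=%.2f recall=%.2f F1=%.2f" % (acc, prec, rec, f1))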

How to update?

cd deep-crf
git pull
python setup.py install

Help (how to use)

deep-crf train --help

If CUDNN ERROR

If you get a cuDNN error, please let me know in the GitHub issues.

You can disable cuDNN with --use_cudnn=0.

Features

DeepCRF provides the following features.

  • Bi-LSTM / Bi-GRU / Bi-RNN
  • CNN for character-level representation
  • Pre-trained word embedding
  • Pre-trained character embedding
  • CRFs at output layer
  • CoNLL format input/output
  • Raw text data input/output
  • Training: your variable files
  • Test: raw text file at the command line
  • Evaluation: F-measure, accuracy

Experiment

POS Tagging

Model                            Accuracy
CRFsuite                         96.39
deep-crf                         97.45
dos Santos and Zadrozny (2014)   97.32
Ma and Hovy (2016)               97.55

Named Entity Recognition (NER)

Model                Prec.   Recall   F1
CRFsuite             84.43   83.60    84.01
deep-crf             90.82   91.11    90.96
Ma and Hovy (2016)   91.35   91.06    91.21

Chunking

Model                 Prec.   Recall   F1
CRFsuite              93.77   93.45    93.61
deep-crf              94.67   94.43    94.55
Huang et al. (2015)   -       -        94.46