zhifac / crf4j

Licence: other
a complete Java port of crfpp(crf++)

Projects that are alternatives of or similar to crf4j

Pytorch Bert Crf Ner
Korean named entity recognizer built with KoBERT and CRF (BERT+CRF based named entity recognition model for Korean)
Stars: ✭ 236 (+686.67%)
Mutual labels:  crf
mahjong
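Open-source Chinese word segmentation toolkit: Chinese word segmentation Web API, Lucene Chinese word segmentation, and mixed Chinese-English word segmentation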
Stars: ✭ 40 (+33.33%)
Mutual labels:  crf
korean ner tagging challenge
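KU_NERDY, Dongyeop Lee and Heuiseok Lim (gold prize, 2017 Korean Language Information Processing System Competition) - Conference on Hangul and Korean Information Processing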
Stars: ✭ 30 (+0%)
Mutual labels:  crf
video-quality-metrics
Test specified presets/CRF values for the x264 or x265 encoder. Compares VMAF/SSIM/PSNR numerically & via graphs.
Stars: ✭ 87 (+190%)
Mutual labels:  crf
xinlp
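A Java implementation of the algorithms from the later chapters of Li Hang's "Statistical Learning Methods": the EM algorithm for the boxes-and-balls problem, extended to GMM training; then HMM word segmentation (including HMM parameter training) and CRF word segmentation (using a model trained with CRF++); finally, BiLSTM+CRF implemented with TensorFlow, plus a XinAnalyzer wrapper for Lucene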
Stars: ✭ 21 (-30%)
Mutual labels:  crf
Gumbel-CRF
Implementation of NeurIPS 20 paper: Latent Template Induction with Gumbel-CRFs
Stars: ✭ 51 (+70%)
Mutual labels:  crf
Fancy Nlp
NLP for human. A fast and easy-to-use natural language processing (NLP) toolkit, satisfying your imagination about NLP.
Stars: ✭ 233 (+676.67%)
Mutual labels:  crf
keras-crf-layer
Implementation of CRF layer in Keras.
Stars: ✭ 76 (+153.33%)
Mutual labels:  crf
crfsuite-rs
Rust binding to crfsuite
Stars: ✭ 19 (-36.67%)
Mutual labels:  crf
NLP-paper
🎨 🎨 NLP (natural language processing) tutorial 🎨🎨 https://dataxujing.github.io/NLP-paper/
Stars: ✭ 23 (-23.33%)
Mutual labels:  crf
Machine Learning Code
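Explanations of the principles behind "Statistical Learning Methods" and common machine learning models (GBDT/XGBoost/lightGBM/FM/FFM), with implementations in plain Python and with libraries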
Stars: ✭ 169 (+463.33%)
Mutual labels:  crf
keras-crf-ner
keras+bi-lstm+crf, Chinese named entity recognition
Stars: ✭ 16 (-46.67%)
Mutual labels:  crf
BiLSTM-CRF-NER-PyTorch
This repo contains a PyTorch implementation of a BiLSTM-CRF model for named entity recognition task.
Stars: ✭ 109 (+263.33%)
Mutual labels:  crf
Pytorch ner bilstm cnn crf
End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF, implemented in PyTorch
Stars: ✭ 249 (+730%)
Mutual labels:  crf
Hierarchical-Word-Sense-Disambiguation-using-WordNet-Senses
Word Sense Disambiguation using Word Specific models, All word models and Hierarchical models in Tensorflow
Stars: ✭ 33 (+10%)
Mutual labels:  crf
Torchnlp
Easy to use NLP library built on PyTorch and TorchText
Stars: ✭ 233 (+676.67%)
Mutual labels:  crf
fastai sequence tagging
sequence tagging for NER for ULMFiT
Stars: ✭ 21 (-30%)
Mutual labels:  crf
crf-seg
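crf-seg: a Chinese word segmentation tool for production use, with customizable corpora and models, a clear architecture, and good segmentation quality. Written in Java.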
Stars: ✭ 13 (-56.67%)
Mutual labels:  crf
CRFasRNNLayer
Conditional Random Fields as Recurrent Neural Networks (Tensorflow)
Stars: ✭ 76 (+153.33%)
Mutual labels:  crf
deepseg
Chinese word segmentation in tensorflow 2.x
Stars: ✭ 23 (-23.33%)
Mutual labels:  crf

crf4j: CRF model training and testing for Java

This is a pure Java port of taku's crfpp (also known as CRF++), based on the crfpp-0.58 code.

Credit to komiya for his Java double-array trie implementation.

Features

  • Pure Java with minimal dependencies (commons-cli is the only runtime dependency)
  • Command-line options and template/input formats compatible with crfpp
  • Models can be loaded from the classpath
  • Text model format compatible with crfpp
  • Conversion from a text model to the crf4j binary model, and from the crf4j binary model back to a text model
  • Multi-threading support
  • Support for the CRF-L1, CRF-L2, and MIRA algorithms
  • N-best outputs
  • CRF model wrapper for API calls
  • Tests and a demo that show typical usage

Usage

Building

mvn clean package

Run tests:

mvn test

Training

java -cp crf4j-<version>-jar-with-dependencies.jar com.github.zhifac.crf4j.CrfLearn <template file> <train datafile> <model path>

For more options, run:

java -cp crf4j-<version>-jar-with-dependencies.jar com.github.zhifac.crf4j.CrfLearn -h

For details on the format of the template file and the training file, please refer to the original crfpp page.
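As a rough illustration of what a template line does, the sketch below expands crfpp-style %x[row,col] macros for a single token. It assumes the semantics described in the crfpp documentation: row is an offset relative to the current token, col is an absolute column index, and rows that fall outside the sentence produce boundary placeholders such as _B-1 and _B+1. The class and method names (TemplateSketch, expand) are hypothetical and are not part of crf4j.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: expands crfpp-style template macros; not crf4j's own code. */
final class TemplateSketch {
    // %x[row,col]: row is relative to the current token, col is an absolute column.
    private static final Pattern MACRO = Pattern.compile("%x\\[(-?\\d+),(\\d+)\\]");

    /**
     * Expands every %x[row,col] macro in one template line for the token at index pos.
     * sentence[i][j] holds column j of token i.
     */
    static String expand(String template, String[][] sentence, int pos) {
        Matcher m = MACRO.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int row = pos + Integer.parseInt(m.group(1));
            int col = Integer.parseInt(m.group(2));
            String value;
            if (row < 0) {
                value = "_B-" + (-row);                        // before the first token
            } else if (row >= sentence.length) {
                value = "_B+" + (row - sentence.length + 1);   // after the last token
            } else {
                value = sentence[row][col];
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // One sentence in the training-file layout: one token per row,
        // whitespace-separated columns, answer tag in the last column.
        String[][] sentence = {
            {"He", "PRP", "B-NP"},
            {"reckons", "VBZ", "B-VP"},
            {"the", "DT", "B-NP"},
        };
        System.out.println(expand("U05:%x[-1,0]/%x[0,0]", sentence, 2));
        System.out.println(expand("U01:%x[-1,0]", sentence, 0));
    }
}
```

Each expanded string acts as one feature identifier for the current token, which is why the template and the column layout of the training file have to agree.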

Testing

To print output to the console:

java -cp crf4j-<version>-jar-with-dependencies.jar com.github.zhifac.crf4j.CrfTest -m <model path> <test datafile>

To print output to a file:

java -cp crf4j-<version>-jar-with-dependencies.jar com.github.zhifac.crf4j.CrfTest -m <model path> <test datafile> -o <outputfile>
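Once the output is in a file, a quick token-level accuracy check can be useful. The sketch below is not part of crf4j. It assumes that the output follows the crfpp convention: each token line repeats the input columns and appends the predicted tag as the last column, sentences are separated by blank lines, and the test data already carries the gold tag in its own last column. It also assumes default output, without verbose or n-best header lines. TagAccuracy is a hypothetical helper name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: token-level accuracy over crfpp-style tagger output. */
final class TagAccuracy {

    /** Fraction of token lines whose last column (predicted) equals the one before it (gold). */
    static double accuracy(List<String> lines) {
        int total = 0;
        int correct = 0;
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // blank line marks a sentence boundary
            }
            String[] cols = trimmed.split("\\s+");
            if (cols.length < 2) {
                continue; // not enough columns to compare gold and predicted tags
            }
            total++;
            if (cols[cols.length - 1].equals(cols[cols.length - 2])) {
                correct++;
            }
        }
        return total == 0 ? 0.0 : (double) correct / total;
    }

    public static void main(String[] args) throws IOException {
        // Pass the output file path as the first argument, or run on a tiny built-in sample.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : Arrays.asList(
                        "He PRP B-NP B-NP",
                        "reckons VBZ B-VP B-VP",
                        "the DT B-NP I-NP",
                        "",
                        "account NN I-NP I-NP");
        System.out.println("token accuracy = " + accuracy(lines));
    }
}
```

Token accuracy is only a coarse measure. For chunking or named entity recognition, a span-level F1 score is usually more informative.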

API call

Please refer to CrfDemo.java.

Performance

Concurrent Access

In an example that uses a crf4j model for named entity recognition, we used JMeter to test 400 concurrent requests against the same HTTP interface. Here is the result:

#Samples  Average (ms)  Median (ms)  90% Line (ms)  Min (ms)  Max (ms)  Throughput
4000      41            4            60             0         746       1250/sec

The test environment is:

OS             CPU                             MEM
Windows 7 x64  Intel Core [email protected]  8GB

Notes

The binary model generated by CrfLearn is incompatible with crfpp, but the text model is compatible. To reuse a crfpp model with crf4j, generate a text model when you train with crfpp (add the -t option), and then run java -cp crf4j.jar com.github.zhifac.crf4j.EncoderFeatureIndex <crfpp_text_model> <output_crf4j_binarymodel> to convert the crfpp text model to a crf4j binary model. If you cannot retrain to produce the text model (e.g. the training data is missing), you can still convert an existing crfpp binary model to a text model with a modified version of crfpp from here.

TODO

  • Optimize memory usage during training (it currently consumes about 8GB of heap memory for 24224128 features, whereas crfpp uses 2GB)
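To put those figures in per-feature terms, the short calculation below divides the reported heap size by the feature count. It assumes that "GB" means GiB (2^30 bytes). Under that assumption, crf4j uses roughly 355 bytes per feature and crfpp roughly 89. HeapPerFeature is a hypothetical name used only for this illustration.

```java
/** Illustrative arithmetic only: memory footprint per feature from the figures above. */
final class HeapPerFeature {

    /** Bytes of heap per feature, treating one GB as 2^30 bytes (an assumption). */
    static double bytesPerFeature(double heapGiB, long features) {
        return heapGiB * 1024.0 * 1024.0 * 1024.0 / features;
    }

    public static void main(String[] args) {
        long features = 24224128L; // feature count quoted in the TODO item
        System.out.printf("crf4j: %.1f bytes/feature%n", bytesPerFeature(8.0, features));
        System.out.printf("crfpp: %.1f bytes/feature%n", bytesPerFeature(2.0, features));
    }
}
```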

License

LGPL & Modified BSD


Chinese version:

crf4j: a Java implementation of crfpp (CRF++)

(based on crfpp 0.58)
