vinhkhuc / jcrfsuite

License: Apache-2.0
Java interface for CRFsuite: http://www.chokkan.org/software/crfsuite/

Programming Languages

java
68154 projects - #9 most used programming language
python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to jcrfsuite

crfsuite-rs
Rust binding to crfsuite
Stars: ✭ 19 (-56.82%)
Mutual labels:  crf, crfsuite
crfs-rs
Pure Rust port of CRFsuite: a fast implementation of Conditional Random Fields (CRFs)
Stars: ✭ 22 (-50%)
Mutual labels:  crf, crfsuite
Gumbel-CRF
Implementation of NeurIPS 20 paper: Latent Template Induction with Gumbel-CRFs
Stars: ✭ 51 (+15.91%)
Mutual labels:  crf
crf-seg
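crf-seg: a Chinese word segmentation tool for production use, with customizable corpora and models, a clear architecture, and good segmentation quality. Written in Java.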
Stars: ✭ 13 (-70.45%)
Mutual labels:  crf
Hierarchical-Word-Sense-Disambiguation-using-WordNet-Senses
Word Sense Disambiguation using Word Specific models, All word models and Hierarchical models in Tensorflow
Stars: ✭ 33 (-25%)
Mutual labels:  crf
java-cpp-example
Example of using C++ classes from Java. Showcases SWIG, JNA and JNI
Stars: ✭ 135 (+206.82%)
Mutual labels:  jni
CRFasRNNLayer
Conditional Random Fields as Recurrent Neural Networks (Tensorflow)
Stars: ✭ 76 (+72.73%)
Mutual labels:  crf
wgpu-mc
Rust-based replacement for the default Minecraft renderer
Stars: ✭ 254 (+477.27%)
Mutual labels:  jni
VoiceChange
Android NDK development
Stars: ✭ 39 (-11.36%)
Mutual labels:  jni
korean ner tagging challenge
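KU_NERDY, Lee Dong-yeop and Lim Heui-seok (gold prize, 2017 Korean Language Information Processing System Competition) - Conference on Hangul and Korean Language Information Processing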
Stars: ✭ 30 (-31.82%)
Mutual labels:  crf
gdx-jnigen
jnigen is a small library that can be used with or without libGDX which allows C/C++ code to be written inline with Java source code.
Stars: ✭ 32 (-27.27%)
Mutual labels:  jni
sentencepiece-jni
Java JNI wrapper for SentencePiece: unsupervised text tokenizer for Neural Network-based text generation.
Stars: ✭ 26 (-40.91%)
Mutual labels:  jni
ffmpeg4java
FFmpeg4Java provides a JNI wrapper of FFmpeg library
Stars: ✭ 21 (-52.27%)
Mutual labels:  jni
keras-crf-layer
Implementation of CRF layer in Keras.
Stars: ✭ 76 (+72.73%)
Mutual labels:  crf
BiLSTM-CRF-NER-PyTorch
This repo contains a PyTorch implementation of a BiLSTM-CRF model for named entity recognition task.
Stars: ✭ 109 (+147.73%)
Mutual labels:  crf
ChangeVoice
Voice-changing processing for voice messages with the NDK
Stars: ✭ 33 (-25%)
Mutual labels:  jni
monero-java
A Java library for using Monero
Stars: ✭ 76 (+72.73%)
Mutual labels:  jni
NLP-paper
🎨 🎨 NLP (natural language processing) tutorial 🎨🎨 https://dataxujing.github.io/NLP-paper/
Stars: ✭ 23 (-47.73%)
Mutual labels:  crf
Libbulletjme
A JNI interface to Bullet Physics and V-HACD
Stars: ✭ 55 (+25%)
Mutual labels:  jni
gradle-native
The home of anything about Gradle support for natively compiled languages
Stars: ✭ 36 (-18.18%)
Mutual labels:  jni

This is a Java interface for crfsuite, a fast implementation of Conditional Random Fields, built with SWIG and the class injection technique (the same technique used in snappy-java). Jcrfsuite provides an API for loading a trained model into memory and performing sequential tagging in memory. Model training is done via the command-line interface.

The library is designed for building Java applications that need fast sequential tagging of text, such as Part-Of-Speech (POS) tagging, phrase chunking, and Named-Entity Recognition (NER).

Jcrfsuite can be dropped into any Java web application and runs without problems under the JVM's class loader.

Maven dependency

<dependency>
  <groupId>com.github.vinhkhuc</groupId>
  <artifactId>jcrfsuite</artifactId>
  <version>0.6.1</version>
</dependency>

Building

git clone https://github.com/vinhkhuc/jcrfsuite
cd jcrfsuite
mvn clean package

How to use

Model training

import com.github.jcrfsuite.CrfTrainer;
...
String trainFile = "data/tweet-pos/train-oct27.txt";
String modelFile = "twitter-pos.model";
CrfTrainer.train(trainFile, modelFile);
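
The layout of the training file is not described here. As a minimal sketch, assuming the file follows CRFsuite's plain-text data format (one item per line, the label first, then tab-separated attributes, with an empty line ending each sequence), the helper below renders in-memory items as such text. The class `TrainingDataSketch`, its methods, and the attribute strings are hypothetical and are not part of jcrfsuite.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of jcrfsuite: renders labeled items in the
// CRFsuite plain-text data format (assumed to be what the training file uses).
class TrainingDataSketch {

    // One item: the label followed by its tab-separated attribute strings.
    static String formatItem(String label, List<String> attributes) {
        StringBuilder sb = new StringBuilder(label);
        for (String attribute : attributes) {
            sb.append('\t').append(attribute);
        }
        return sb.toString();
    }

    // One sequence: one item per line, terminated by an empty line.
    static String formatSequence(List<String> labels, List<List<String>> attributes) {
        if (labels.size() != attributes.size()) {
            throw new IllegalArgumentException("labels and attributes differ in length");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < labels.size(); i++) {
            sb.append(formatItem(labels.get(i), attributes.get(i))).append('\n');
        }
        return sb.append('\n').toString();
    }

    public static void main(String[] args) {
        // Made-up attributes for a two-token sequence.
        List<String> labels = Arrays.asList("N", "V");
        List<List<String>> attributes = Arrays.asList(
                Arrays.asList("w=dogs", "suffix=s"),
                Arrays.asList("w=bark"));
        System.out.print(formatSequence(labels, attributes));
    }
}
```

Text produced this way could be written to a file and passed as the training file, provided the assumption about the format holds for your data.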

Sequential tagging

import java.util.List;

import com.github.jcrfsuite.CrfTagger;
import com.github.jcrfsuite.util.Pair;
...
String modelFile = "twitter-pos.model";
String testFile = "data/tweet-pos/test-daily547.txt";
// Load the trained model into memory.
CrfTagger crfTagger = new CrfTagger(modelFile);
// One inner list per sequence; each pair holds a predicted tag and its probability.
List<List<Pair<String, Double>>> tagProbLists = crfTagger.tag(testFile);
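
The value returned by `tag` is a list of sequences, each a list of (tag, probability) pairs. The sketch below walks that nested structure. To stay runnable without the native library, it uses a local stand-in `Pair` class with public `first` and `second` fields; whether `com.github.jcrfsuite.util.Pair` exposes its values the same way is an assumption, so check that class before copying the accessors. `TagOutputSketch` and `describe` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class TagOutputSketch {

    // Stand-in for com.github.jcrfsuite.util.Pair; the field names are an assumption.
    static class Pair<A, B> {
        final A first;
        final B second;

        Pair(A first, B second) {
            this.first = first;
            this.second = second;
        }
    }

    // Renders each sequence as one line of "tag(probability)" tokens.
    static List<String> describe(List<List<Pair<String, Double>>> tagProbLists) {
        List<String> lines = new ArrayList<>();
        for (List<Pair<String, Double>> sequence : tagProbLists) {
            StringBuilder sb = new StringBuilder();
            for (Pair<String, Double> tagProb : sequence) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(String.format(Locale.ROOT, "%s(%.2f)", tagProb.first, tagProb.second));
            }
            lines.add(sb.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hand-written sample standing in for the result of crfTagger.tag(testFile).
        List<List<Pair<String, Double>>> sample = Arrays.asList(
                Arrays.asList(new Pair<String, Double>("N", 0.99), new Pair<String, Double>("P", 1.00)),
                Arrays.asList(new Pair<String, Double>("U", 1.00)));
        for (String line : describe(sample)) {
            System.out.println(line);
        }
    }
}
```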

Example on Twitter Part-Of-Speech tagging

Training

To train a POS model from Twitter POS data, run

java -cp target/jcrfsuite-*.jar com.github.jcrfsuite.example.Train data/tweet-pos/train-oct27.txt twitter-pos.model

Tagging

To test the trained POS model against the test set, run

java -cp target/jcrfsuite-*.jar com.github.jcrfsuite.example.Tag twitter-pos.model data/tweet-pos/test-daily547.txt

The output should be as follows:

Gold	Predict	Probability
........................
N       N       0.99
P       P       1.00
Z       ^       0.59
$       $       0.97
N       N       1.00
P       P       0.98
A       N       0.80
$       $       1.00
N       N       0.99
U       U       1.00

Accuracy = 92.99%

Note that the accuracy you obtain may differ slightly from the output above.
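
The accuracy figure is presumably the share of tokens whose predicted tag equals the gold tag. The computation inside the example `Tag` class is not shown here, so the following is only a sketch under that assumption, with hypothetical class and method names. Applied to the ten sample rows printed above, it counts 8 matches out of 10; the 92.99% figure covers the full test set.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class AccuracySketch {

    // Token-level accuracy in percent: matching tags divided by total tags.
    static double accuracy(List<String> gold, List<String> predicted) {
        if (gold.isEmpty() || gold.size() != predicted.size()) {
            throw new IllegalArgumentException("need two non-empty lists of equal length");
        }
        int correct = 0;
        for (int i = 0; i < gold.size(); i++) {
            if (gold.get(i).equals(predicted.get(i))) {
                correct++;
            }
        }
        return 100.0 * correct / gold.size();
    }

    public static void main(String[] args) {
        // The Gold and Predict columns of the ten sample rows shown above.
        List<String> gold = Arrays.asList("N", "P", "Z", "$", "N", "P", "A", "$", "N", "U");
        List<String> predicted = Arrays.asList("N", "P", "^", "$", "N", "P", "N", "$", "N", "U");
        // Prints "Accuracy = 80.00%"
        System.out.println(String.format(Locale.ROOT, "Accuracy = %.2f%%", accuracy(gold, predicted)));
    }
}
```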

License

Jcrfsuite is released under the Apache License 2.0. The original crfsuite is distributed under the BSD License.
