Top 39 word-segmentation open source projects

Monpa
MONPA 罔拍 is a multi-task model that provides Traditional Chinese word segmentation, part-of-speech tagging, and named entity recognition
Kiwi
Kiwi (an intelligent Korean morphological analyzer)
Cws
Source code for an ACL 2016 paper on Chinese word segmentation
Han Segment
A Chinese word segmentation system based on hidden Markov models and forward maximum matching
Youtokentome
Unsupervised text tokenizer focused on computational efficiency
Ekphrasis
Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. It performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and 330 million English tweets from Twitter).
Nagisa
A Japanese tokenizer based on recurrent neural networks
cws-tensorflow
A Chinese word segmentation model based on TensorFlow
UETsegmenter
A toolkit for Vietnamese word segmentation
hanzi-tools
Converts from Chinese characters to pinyin, between simplified and traditional, and does word segmentation.
Pytorch-NLU
Pytorch-NLU, a Chinese text classification and sequence labeling toolkit. It supports multi-class and multi-label classification of long and short Chinese texts, as well as sequence labeling tasks such as Chinese named entity recognition, part-of-speech tagging, and word segmentation.
dnn-lstm-word-segment
Chinese word segmentation based on deep learning and LSTM neural networks
sylbreak
Syllable segmentation tool for the Myanmar language (Burmese), by Ye.
pytorch Joint-Word-Segmentation-and-POS-Tagging
Paper: A Simple and Effective Neural Model for Joint Word Segmentation and POS Tagging
sentencepiece-jni
Java JNI wrapper for SentencePiece: unsupervised text tokenizer for Neural Network-based text generation.
skt
Sanskrit compound segmentation using a seq2seq model
sentencepiece
R package for Byte Pair Encoding / Unigram modelling based on Sentencepiece
SynThai
Thai Word Segmentation and Part-of-Speech Tagging with Deep Learning
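
Several of the projects above rely on dictionary-based segmentation; the Han Segment entry, for instance, names forward maximum matching. As a rough illustration of that general technique only (not the code of any listed project), the sketch below scans left to right and greedily takes the longest dictionary word at each position. The function name `forward_max_match`, its parameters, and the toy vocabulary are all hypothetical.

```python
def forward_max_match(text, vocabulary, max_word_len=None):
    """Greedy left-to-right segmentation with a word dictionary.

    Illustrative sketch: `vocabulary` is any set of known words
    (a toy assumption here, not a real lexicon).
    """
    if max_word_len is None:
        # Longest dictionary entry bounds the search window.
        max_word_len = max((len(w) for w in vocabulary), default=1)
    max_word_len = max(1, max_word_len)

    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a match is found.
        for size in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += size
                break
    return tokens


if __name__ == "__main__":
    toy_vocab = {"研究", "研究生", "生命", "命", "的", "起源"}
    print(forward_max_match("研究生命的起源", toy_vocab))
```

With this toy vocabulary the greedy choice picks "研究生" first, so the result differs from the intended reading "研究 / 生命 / 的 / 起源". Ambiguity of this kind is one reason dictionary matching is often paired with a statistical model such as an HMM.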