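Monpa: MONPA (罔拍) is a multi-task model that provides Traditional Chinese word segmentation, part-of-speech tagging, and named entity recognition.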
Lac: Baidu NLP toolkit for word segmentation, part-of-speech tagging, named entity recognition, and word importance.
Symspell: SymSpell, 1 million times faster spelling correction and fuzzy search through the Symmetric Delete spelling correction algorithm
Kiwi: Kiwi (Intelligent Korean Morphological Analyzer)
Toiro: A tool for comparing Japanese tokenizers
Cws: Source code for an ACL 2016 paper on Chinese word segmentation
Youtokentome: Unsupervised text tokenizer focused on computational efficiency
Pythainlp: Thai Natural Language Processing in Python.
Sentencepiece: Unsupervised text tokenizer for Neural Network-based text generation.
Ekphrasis: Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. It performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and Twitter, with 330 million English tweets).
Vncorenlp: A Vietnamese natural language processing toolkit (NAACL 2018)
Nagisa: A Japanese tokenizer based on recurrent neural networks
Jumanpp: Juman++ (a Morphological Analyzer Toolkit)
hashformers: Hashformers is a framework for hashtag segmentation with transformers.
SymSpellCppPy: Fast SymSpell written in C++ and exposed to Python via pybind11
customized-symspell: Java port of SymSpell, 1 million times faster through the Symmetric Delete spelling correction algorithm
hanzi-tools: Converts Chinese characters to pinyin, converts between Simplified and Traditional Chinese, and performs word segmentation.
Pytorch-NLU: Pytorch-NLU, a Chinese text classification and sequence labeling toolkit. It supports multi-class and multi-label classification of both long and short Chinese texts, as well as sequence labeling tasks such as Chinese named entity recognition, part-of-speech tagging, and word segmentation.
codeprep: A toolkit for pre-processing large source code corpora
sylbreak: Syllable segmentation tool for the Myanmar (Burmese) language by Ye.
sentencepiece-jni: Java JNI wrapper for SentencePiece, an unsupervised text tokenizer for Neural Network-based text generation.
skt: Sanskrit compound segmentation using a seq2seq model
sentencepiece: R package for Byte Pair Encoding / Unigram modelling based on Sentencepiece
spell: Spelling correction and string segmentation written in Go
SynThai: Thai Word Segmentation and Part-of-Speech Tagging with Deep Learning
esapp: An unsupervised Chinese word segmentation tool.
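Several entries above (Symspell, SymSpellCppPy, customized-symspell) are built on the Symmetric Delete algorithm. The idea, sketched here under simplifying assumptions, is to precompute every string reachable from each dictionary word by deleting up to `max_distance` characters, then look up the delete-variants of the misspelled input in that index. Because only deletions are generated, no alphabet-dependent insertions or substitutions have to be enumerated. Words that share a variant are only candidates, so each one is verified with an exact edit distance. This is a from-scratch illustration, not the API of any of these libraries: `deletes`, `build_index`, `lookup`, and `levenshtein` are hypothetical names, and the verification step uses plain Levenshtein distance.

```python
def deletes(word, max_distance):
    """Return every string obtainable from `word` by deleting up to `max_distance` characters."""
    results = {word}
    frontier = {word}
    for _ in range(max_distance):
        next_frontier = set()
        for w in frontier:
            for i in range(len(w)):
                next_frontier.add(w[:i] + w[i + 1:])
        results |= next_frontier
        frontier = next_frontier
    return results


def build_index(dictionary, max_distance):
    """Map each delete-variant to the set of dictionary words that produce it."""
    index = {}
    for word in dictionary:
        for variant in deletes(word, max_distance):
            index.setdefault(variant, set()).add(word)
    return index


def levenshtein(a, b):
    """Plain Levenshtein distance (insert, delete, substitute) with a two-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = cur
    return prev[-1]


def lookup(term, index, max_distance):
    """Return (distance, word) pairs within max_distance, closest first."""
    candidates = set()
    for variant in deletes(term, max_distance):
        candidates |= index.get(variant, set())
    # A shared delete-variant only makes a word a candidate; verify the real distance.
    scored = ((levenshtein(term, word), word) for word in candidates)
    return sorted(pair for pair in scored if pair[0] <= max_distance)


index = build_index(["hello", "help", "world", "word"], max_distance=2)
print(lookup("helo", index, max_distance=2))  # [(1, 'hello'), (1, 'help')]
```

The trade-off is memory: the number of delete-variants per word grows quickly with `max_distance`, so this style of index is usually kept to small edit distances.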
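Ekphrasis and spell both mention segmenting strings (such as hashtags) using word statistics. One common way to do this, assumed here rather than taken from either project, is a dynamic program that chooses the split maximizing the sum of unigram log-probabilities, with a length-dependent penalty for unknown words so that known words win over long unknown chunks. `segment` and the toy `counts` table are hypothetical.

```python
import math


def segment(text, counts, max_word_len=20):
    """Split `text` into words maximizing the sum of unigram log-probabilities."""
    total = sum(counts.values())

    def log_prob(word):
        if word in counts:
            return math.log(counts[word] / total)
        # Unknown word: probability shrinks tenfold with each extra character.
        return math.log(10.0 / (total * 10 ** len(word)))

    n = len(text)
    best = [0.0] + [-math.inf] * n  # best[i]: best score for text[:i]
    back = [0] * (n + 1)            # back[i]: start index of the last word in text[:i]
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            score = best[start] + log_prob(text[start:end])
            if score > best[end]:
                best[end], back[end] = score, start

    # Follow the back-pointers from the end of the string to recover the words.
    words, end = [], n
    while end > 0:
        words.append(text[back[end]:end])
        end = back[end]
    return words[::-1]


counts = {"happy": 50, "new": 80, "year": 60, "ear": 10}
print(segment("happynewyear", counts))  # ['happy', 'new', 'year']
```

This toy version only looks at single-word frequencies; a larger corpus or a bigram model would be needed to resolve genuinely ambiguous splits.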
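The sentencepiece R package mentions Byte Pair Encoding. The classic BPE training loop repeatedly counts adjacent symbol pairs across a word-frequency table and merges the most frequent pair into a new symbol; new words are then tokenized by replaying the learned merges in order. The sketch here is that textbook loop, not SentencePiece's implementation (which works on raw text, treats whitespace as an ordinary symbol, and also offers a unigram language model). `learn_bpe`, `merge_pair`, and `apply_bpe` are hypothetical names, and ties between equally frequent pairs go to the pair counted first.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every non-overlapping occurrence of `pair` in `symbols`, left to right."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(word_counts, num_merges):
    """Learn up to `num_merges` merges from a {word: frequency} table."""
    # Start with each word split into single characters.
    vocab = {tuple(word): freq for word, freq in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # ties go to the pair counted first
        merges.append(best)
        vocab = {merge_pair(symbols, best): freq for symbols, freq in vocab.items()}
    return merges


def apply_bpe(word, merges):
    """Tokenize a word by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


corpus = {"hug": 10, "pug": 5, "pun": 12, "bun": 4, "hugs": 5}
merges = learn_bpe(corpus, num_merges=3)
print(merges)                     # [('u', 'g'), ('u', 'n'), ('h', 'ug')]
print(apply_bpe("hugs", merges))  # ['hug', 's']
```

Words never seen in training still tokenize, falling back to whatever merged pieces and single characters they contain (for example, `apply_bpe("bug", merges)` yields `['b', 'ug']`).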