SymSpellCppPy: Fast SymSpell written in C++ and exposed to Python via pybind11
Stars: ✭ 28 (-53.33%)
customized-symspell: Java port of SymSpell: 1 million times faster through Symmetric Delete spelling correction algorithm
Stars: ✭ 51 (-15%)
tudien: Vietnamese dictionary for Kindle
Stars: ✭ 38 (-36.67%)
number-to-words: ⚡ A library for converting numbers to Vietnamese words.
Stars: ✭ 19 (-68.33%)
hanzi-tools: Converts from Chinese characters to pinyin, between simplified and traditional, and does word segmentation.
Stars: ✭ 69 (+15%)
Pytorch-NLU: Pytorch-NLU, a Chinese text classification and sequence labeling toolkit. It supports multi-class and multi-label classification of long and short Chinese texts, and sequence labeling tasks such as Chinese named entity recognition, part-of-speech tagging, and word segmentation.
Stars: ✭ 151 (+151.67%)
community: Ông Dev Community
Stars: ✭ 64 (+6.67%)
google assistant vietnamese speaking: A project that modifies a smart speaker running Google Assistant to support multiple languages, including Vietnamese; the source code was rewritten by Nguyễn Duy from Google's original source
Stars: ✭ 19 (-68.33%)
dnn-lstm-word-segment: Chinese word segmentation based on deep learning and an LSTM neural network
Stars: ✭ 24 (-60%)
codeprep: A toolkit for pre-processing large source code corpora
Stars: ✭ 39 (-35%)
sylbreak: Syllable segmentation tool for Myanmar language (Burmese) by Ye.
Stars: ✭ 44 (-26.67%)
sentencepiece-jni: Java JNI wrapper for SentencePiece: unsupervised text tokenizer for Neural Network-based text generation.
Stars: ✭ 26 (-56.67%)
JointIDSF: BERT-based joint intent detection and slot filling with intent-slot attention mechanism (INTERSPEECH 2021)
Stars: ✭ 55 (-8.33%)
vietTTS: Vietnamese Text to Speech library
Stars: ✭ 78 (+30%)
skt: Sanskrit compound segmentation using seq2seq model
Stars: ✭ 21 (-65%)
ckipnlp: CKIP CoreNLP Toolkits
Stars: ✭ 92 (+53.33%)
sentencepiece: R package for Byte Pair Encoding / Unigram modelling based on Sentencepiece
Stars: ✭ 22 (-63.33%)
spell: Spelling correction and string segmentation written in Go
Stars: ✭ 24 (-60%)
vietnamese-roberta: A Robustly Optimized BERT Pretraining Approach for Vietnamese
Stars: ✭ 22 (-63.33%)
SynThai: Thai Word Segmentation and Part-of-Speech Tagging with Deep Learning
Stars: ✭ 41 (-31.67%)
esapp: An unsupervised Chinese word segmentation tool.
Stars: ✭ 13 (-78.33%)
TALPCo: TUFS Asian Language Parallel Corpus
Stars: ✭ 32 (-46.67%)
Userscript: Userscripts collection written by me
Stars: ✭ 92 (+53.33%)
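Monpa: MONPA 罔拍 is a multi-task model that provides Traditional Chinese word segmentation, part-of-speech tagging, and named entity recognition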
Stars: ✭ 203 (+238.33%)
Lac: Baidu NLP: word segmentation, part-of-speech tagging, named entity recognition, and word importance
Stars: ✭ 2,792 (+4553.33%)
Pycantonese: Cantonese Linguistics and NLP in Python
Stars: ✭ 147 (+145%)
Symspell: SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
Stars: ✭ 1,976 (+3193.33%)
Kiwi: Kiwi (intelligent Korean morphological analyzer)
Stars: ✭ 107 (+78.33%)
Toiro: A comparison tool of Japanese tokenizers
Stars: ✭ 95 (+58.33%)
Cws: Source code for an ACL2016 paper of Chinese word segmentation
Stars: ✭ 81 (+35%)
Han Segment: A Chinese word segmentation system based on hidden Markov models and forward maximum matching
Stars: ✭ 17 (-71.67%)
Youtokentome: Unsupervised text tokenizer focused on computational efficiency
Stars: ✭ 728 (+1113.33%)
Pythainlp: Thai Natural Language Processing in Python.
Stars: ✭ 582 (+870%)
Sentencepiece: Unsupervised text tokenizer for Neural Network-based text generation.
Stars: ✭ 5,540 (+9133.33%)
Ekphrasis: Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. It performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and Twitter, the latter with 330 million English tweets).
Stars: ✭ 433 (+621.67%)
Symspellpy: Python port of SymSpell
Stars: ✭ 420 (+600%)
Vncorenlp: A Vietnamese natural language processing toolkit (NAACL 2018)
Stars: ✭ 354 (+490%)
Nagisa: A Japanese tokenizer based on recurrent neural networks
Stars: ✭ 260 (+333.33%)
Jumanpp: Juman++ (a Morphological Analyzer Toolkit)
Stars: ✭ 254 (+323.33%)
hashformers: Hashformers is a framework for hashtag segmentation with transformers.
Stars: ✭ 18 (-70%)
youtokentome-ruby: High performance unsupervised text tokenization for Ruby
Stars: ✭ 17 (-71.67%)