Kagome
Self-contained Japanese Morphological Analyzer written in pure Go
Stars: ✭ 554 (+118.11%)
Qutuf
Qutuf (قُطُوْف): An Arabic Morphological analyzer and Part-Of-Speech tagger as an Expert System.
Stars: ✭ 84 (-66.93%)
Nagisa
A Japanese tokenizer based on recurrent neural networks
Stars: ✭ 260 (+2.36%)
Awesome Persian Nlp Ir
Curated List of Persian Natural Language Processing and Information Retrieval Tools and Resources
Stars: ✭ 460 (+81.1%)
Pytorch-NLU
Pytorch-NLU, a Chinese text classification and sequence annotation toolkit. It supports multi-class and multi-label classification of long and short Chinese texts, as well as sequence annotation tasks such as Chinese named entity recognition, part-of-speech tagging, and word segmentation.
Stars: ✭ 151 (-40.55%)
Jptdp
Neural network models for joint POS tagging and dependency parsing (CoNLL 2017-2018)
Stars: ✭ 146 (-42.52%)
KWDLC
Kyoto University Web Document Leads Corpus
Stars: ✭ 64 (-74.8%)
SynThai
Thai Word Segmentation and Part-of-Speech Tagging with Deep Learning
Stars: ✭ 41 (-83.86%)
Fugashi
A Cython MeCab wrapper for fast, pythonic Japanese tokenization and morphological analysis.
Stars: ✭ 125 (-50.79%)
Vncorenlp
A Vietnamese natural language processing toolkit (NAACL 2018)
Stars: ✭ 354 (+39.37%)
Ekphrasis
Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. It performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from two large corpora (English Wikipedia and 330 million English tweets).
Stars: ✭ 433 (+70.47%)
Lac
Baidu NLP: word segmentation, part-of-speech tagging, named entity recognition, word importance
Stars: ✭ 2,792 (+999.21%)
sinling
A collection of NLP tools for Sinhalese (සිංහල).
Stars: ✭ 38 (-85.04%)
simplemma
Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
Stars: ✭ 32 (-87.4%)
HebPipe
An NLP pipeline for Hebrew
Stars: ✭ 15 (-94.09%)
udar
UDAR Does Accented Russian: A finite-state morphological analyzer of Russian that handles stressed wordforms.
Stars: ✭ 15 (-94.09%)
Sudachipy
Python version of Sudachi, a Japanese tokenizer.
Stars: ✭ 207 (-18.5%)
Kuromoji
Kuromoji is a self-contained and very easy to use Japanese morphological analyzer designed for search
Stars: ✭ 745 (+193.31%)
Sudachi
A Japanese Tokenizer for Business
Stars: ✭ 496 (+95.28%)
Kiwi
Kiwi (Intelligent Korean Morphological Analyzer)
Stars: ✭ 107 (-57.87%)
Udpipe
R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit
Stars: ✭ 160 (-37.01%)
Rdrpostagger
A fast and accurate POS and morphological tagging toolkit (EACL 2014)
Stars: ✭ 126 (-50.39%)
grasp
Essential NLP & ML, short & fast pure Python code
Stars: ✭ 58 (-77.17%)
suika
Suika 🍉 is a Japanese morphological analyzer written in pure Ruby
Stars: ✭ 31 (-87.8%)
GrammarEngine
Grammatical Dictionary of the Russian Language (+ English, Japanese, etc.)
Stars: ✭ 68 (-73.23%)
Monpa
MONPA 罔拍 is a multi-task model that provides Traditional Chinese word segmentation, part-of-speech tagging, and named entity recognition
Stars: ✭ 203 (-20.08%)
Pytorch Pos Tagging
A tutorial on how to implement models for part-of-speech tagging using PyTorch and TorchText.
Stars: ✭ 96 (-62.2%)
Toiro
A tool for comparing Japanese tokenizers
Stars: ✭ 95 (-62.6%)
datalinguist
Stanford CoreNLP in idiomatic Clojure.
Stars: ✭ 93 (-63.39%)
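Articutapi
API of Articut Chinese word segmentation (with semantic part-of-speech tagging): word segmentation (斷詞, also called 分詞) is the foundation of Chinese information processing. Articut uses no machine learning and needs no data model; using only the grammar rules of modern vernacular Chinese, it achieves an F1-measure above 94% and recall above 96% on SIGHAN 2005.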
Stars: ✭ 252 (-0.79%)
SymSpellCppPy
A fast SymSpell implementation written in C++ and exposed to Python via pybind11
Stars: ✭ 28 (-88.98%)
Hebrew-Tokenizer
A very simple Python tokenizer for Hebrew text.
Stars: ✭ 16 (-93.7%)
customized-symspell
Java port of SymSpell: 1 million times faster through the Symmetric Delete spelling correction algorithm
Stars: ✭ 51 (-79.92%)
knp
A Japanese Parser
Stars: ✭ 16 (-93.7%)
unidic-py
Unidic packaged for installation via pip.
Stars: ✭ 17 (-93.31%)
wana kana rust
Utility library for checking and converting between Japanese characters (Hiragana, Katakana) and Romaji
Stars: ✭ 46 (-81.89%)
gazou
Japanese OCR for Linux & Windows
Stars: ✭ 32 (-87.4%)
cang-jie
Chinese tokenizer for tantivy, based on jieba-rs
Stars: ✭ 48 (-81.1%)
kanji-web-app
Angular.js kanji web application
Stars: ✭ 45 (-82.28%)
hashformers
Hashformers is a framework for hashtag segmentation with transformers.
Stars: ✭ 18 (-92.91%)
bredon
A modern CSS value compiler in JavaScript
Stars: ✭ 39 (-84.65%)
ATKSpy
This repository is a Python package that supports a SOAP interface for communicating with the Microsoft ATKS
Stars: ✭ 27 (-89.37%)
jp-ocr-prunned-cnn
Attempting feature map pruning on a CNN trained for Japanese OCR
Stars: ✭ 15 (-94.09%)
mystem-scala
Wrapper of the Russian morphological analyzer `mystem` for JVM languages
Stars: ✭ 21 (-91.73%)
textlint-ja
Japanese community for textlint / ideas for rules
Stars: ✭ 41 (-83.86%)
pascal-interpreter
A simple interpreter for a large subset of Pascal language written for educational purposes
Stars: ✭ 21 (-91.73%)
Zipangu
A library for handling Japan-specific compatibility.
Stars: ✭ 27 (-89.37%)
rippletagger
RippleTagger identifies part-of-speech tags (nouns, verbs, and so on). You give it a sentence; it gives you back a list of tags.
Stars: ✭ 12 (-95.28%)
hanzi-tools
Converts from Chinese characters to pinyin, between simplified and traditional, and does word segmentation.
Stars: ✭ 69 (-72.83%)
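The customized-symspell and SymSpellCppPy entries rely on the Symmetric Delete idea: instead of generating every insertion, substitution and transposition of a query, both dictionary words and the query generate only deletions, and any dictionary word within edit distance k shares at least one deletion string with the query. The sketch below is a minimal illustration of that idea, not the API of either library; the function names and the toy frequency table are hypothetical, and it verifies candidates with plain Levenshtein distance, whereas SymSpell itself uses a Damerau-Levenshtein variant.

```python
from collections import defaultdict
from itertools import combinations


def deletes(word, max_distance):
    """All strings reachable from `word` by deleting up to max_distance characters."""
    results = {word}
    for d in range(1, min(max_distance, len(word)) + 1):
        for idx in combinations(range(len(word)), d):
            results.add("".join(ch for i, ch in enumerate(word) if i not in idx))
    return results


def levenshtein(a, b):
    """Plain Levenshtein distance (simplification: SymSpell uses Damerau-Levenshtein)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def build_index(frequencies, max_distance):
    """Precompute: map every deletion string to the dictionary words producing it."""
    index = defaultdict(set)
    for word in frequencies:
        for d in deletes(word, max_distance):
            index[d].add(word)
    return index


def lookup(query, index, frequencies, max_distance):
    """Return the closest dictionary word (ties broken by higher frequency), or None."""
    candidates = set()
    for d in deletes(query, max_distance):
        candidates |= index.get(d, set())
    best = None
    for word in candidates:
        dist = levenshtein(query, word)
        if dist <= max_distance:
            key = (dist, -frequencies[word])
            if best is None or key < best[0]:
                best = (key, word)
    return None if best is None else best[1]


if __name__ == "__main__":
    freqs = {"hello": 100, "help": 50, "world": 80}  # hypothetical toy counts
    idx = build_index(freqs, 2)
    print(lookup("helo", idx, freqs, 2))
```

Here "helo" is one edit away from both "hello" and "help", so the frequency tie-breaker picks "hello". The trade-off is memory: the index stores every deletion of every dictionary word, in exchange for lookups that never enumerate the alphabet.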
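Several entries above (Ekphrasis, hashformers, SynThai, hanzi-tools, Lac) segment text that has no spaces, such as hashtags or Chinese and Thai script. A common statistical baseline, in the spirit of Ekphrasis's "word statistics", is dynamic programming over a unigram model: pick the split that minimizes the summed negative log-probability of the words. The sketch below is that generic technique, not code from any listed project; the `segment` function, the unknown-word penalty (which grows with word length) and the toy counts are all assumptions for illustration.

```python
import math


def segment(text, counts, max_word_len=20):
    """Split `text` into words by minimizing total unigram cost (negative log-probability)."""
    total = sum(counts.values())

    def word_cost(w):
        if w in counts:
            return -math.log(counts[w] / total)
        # Assumed penalty for unseen strings: longer unknown words cost more.
        return -math.log(10.0 / (total * 10 ** len(w)))

    n = len(text)
    best = [0.0] + [math.inf] * n  # best[i] = lowest cost of segmenting text[:i]
    back = [0] * (n + 1)           # back[i] = start index of the last word ending at i
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            cost = best[start] + word_cost(text[start:end])
            if cost < best[end]:
                best[end] = cost
                back[end] = start

    # Follow back-pointers from the end of the string to recover the words.
    words = []
    end = n
    while end > 0:
        start = back[end]
        words.append(text[start:end])
        end = start
    return list(reversed(words))


if __name__ == "__main__":
    toy_counts = {"new": 50, "york": 30, "times": 40}  # hypothetical counts
    print(segment("newyorktimes", toy_counts))
```

With real corpus counts, tools typically add bigram statistics so that context, and not only word frequency, influences the split.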
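The wana kana rust entry converts between Hiragana and Katakana. In Unicode the core of the two syllabaries is laid out in parallel: Hiragana U+3041 to U+3096 maps to Katakana U+30A1 to U+30F6 by adding 0x60. The sketch below shows only that offset trick; it is not WanaKana's API, the function names are hypothetical, and it deliberately leaves out Romaji conversion and special marks such as the iteration marks.

```python
HIRAGANA_START, HIRAGANA_END = "\u3041", "\u3096"  # ぁ .. ゖ
KATAKANA_START, KATAKANA_END = "\u30a1", "\u30f6"  # ァ .. ヶ
OFFSET = 0x60  # distance between parallel Hiragana and Katakana code points


def to_katakana(text):
    """Shift Hiragana characters into the Katakana block; leave everything else unchanged."""
    return "".join(
        chr(ord(c) + OFFSET) if HIRAGANA_START <= c <= HIRAGANA_END else c
        for c in text
    )


def to_hiragana(text):
    """Shift Katakana characters into the Hiragana block; leave everything else unchanged."""
    return "".join(
        chr(ord(c) - OFFSET) if KATAKANA_START <= c <= KATAKANA_END else c
        for c in text
    )


if __name__ == "__main__":
    print(to_katakana("ひらがな"))
    print(to_hiragana("カタカナ"))
```

Characters outside the two ranges, such as Latin letters or the long-vowel mark ー, pass through untouched, which is why mixed text is safe to convert.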