Toiro
A comparison tool for Japanese tokenizers
Stars: ✭ 95 (-63.46%)
Pytorch-NLU
Pytorch-NLU is a Chinese text classification and sequence labeling toolkit. It supports multi-class and multi-label classification of long and short Chinese texts, as well as sequence labeling tasks such as Chinese named entity recognition, part-of-speech tagging, and word segmentation.
Stars: ✭ 151 (-41.92%)
Kagome
Self-contained Japanese morphological analyzer written in pure Go
Stars: ✭ 554 (+113.08%)
Jumanpp
Juman++ (a Morphological Analyzer Toolkit)
Stars: ✭ 254 (-2.31%)
Multi Task Nlp
multi_task_NLP is a utility toolkit that lets NLP developers easily train a single model for multiple tasks and run inference with it.
Stars: ✭ 221 (-15%)
Vncorenlp
A Vietnamese natural language processing toolkit (NAACL 2018)
Stars: ✭ 354 (+36.15%)
Kuromoji
Kuromoji is a self-contained, easy-to-use Japanese morphological analyzer designed for search.
Stars: ✭ 745 (+186.54%)
Pythainlp
Thai Natural Language Processing in Python.
Stars: ✭ 582 (+123.85%)
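Nuts
A testing ground for solutions to common natural language processing tasks (mainly text classification, sequence labeling, automatic question answering, etc.)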
Stars: ✭ 21 (-91.92%)
Sudachipy
Python version of Sudachi, a Japanese tokenizer.
Stars: ✭ 207 (-20.38%)
Pytorch ner bilstm cnn crf
End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF, implemented in PyTorch
Stars: ✭ 249 (-4.23%)
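Monpa
MONPA (罔拍) is a multi-task model that provides Traditional Chinese word segmentation, part-of-speech tagging, and named entity recognition.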
Stars: ✭ 203 (-21.92%)
Ekphrasis
Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. It performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and 330 million English tweets).
Stars: ✭ 433 (+66.54%)
fairseq-tagging
A Fairseq fork for sequence tagging/labeling tasks
Stars: ✭ 26 (-90%)
SynThai
Thai Word Segmentation and Part-of-Speech Tagging with Deep Learning
Stars: ✭ 41 (-84.23%)
Sudachi
A Japanese Tokenizer for Business
Stars: ✭ 496 (+90.77%)
SymSpellCppPy
A fast SymSpell implementation written in C++ and exposed to Python via pybind11
Stars: ✭ 28 (-89.23%)
sembei
🍘 Word embeddings that bypass word segmentation 🍘
Stars: ✭ 14 (-94.62%)
gazou
Japanese OCR for Linux & Windows
Stars: ✭ 32 (-87.69%)
CrowdLayer
A neural network layer that enables training of deep neural networks directly from crowdsourced labels (e.g. from Amazon Mechanical Turk) or, more generally, labels from multiple annotators with different biases and levels of expertise.
Stars: ✭ 45 (-82.69%)
kanji-web-app
Angular.js kanji web application
Stars: ✭ 45 (-82.69%)
Giveme5W
Extraction of the five journalistic W-questions (5W) from news articles
Stars: ✭ 16 (-93.85%)
unofficial-jisho-api
Encapsulates the official Jisho.org API and also provides kanji, example, and stroke diagram search.
Stars: ✭ 88 (-66.15%)
customized-symspell
Java port of SymSpell: 1 million times faster through Symmetric Delete spelling correction algorithm
Stars: ✭ 51 (-80.38%)
scoop-for-jp
Scoop bucket for ALL Japanese users.
Stars: ✭ 17 (-93.46%)
wana kana rust
Utility library for checking and converting between Japanese characters (Hiragana, Katakana) and Romaji
Stars: ✭ 46 (-82.31%)
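Articutapi
API of Articut Chinese word segmentation (with semantic part-of-speech tagging). Word segmentation (斷詞, also called 分詞) is the foundation of Chinese information processing. Articut uses no machine learning and needs no data model: using only the grammar rules of modern vernacular Chinese, it achieves an F1-measure above 94% and a recall above 96% on SIGHAN 2005.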
Stars: ✭ 252 (-3.08%)
unidic-py
UniDic packaged for installation via pip.
Stars: ✭ 17 (-93.46%)
knp
A Japanese Parser
Stars: ✭ 16 (-93.85%)
KWDLC
Kyoto University Web Document Leads Corpus
Stars: ✭ 64 (-75.38%)
jp-ocr-prunned-cnn
Attempting feature map pruning on a CNN trained for Japanese OCR
Stars: ✭ 15 (-94.23%)
textlint-ja
Japanese community for textlint / ideas for rules
Stars: ✭ 41 (-84.23%)
ATKSpy
A Python package that supports a SOAP interface for communicating with Microsoft ATKS.
Stars: ✭ 27 (-89.62%)
hanzi-tools
Converts from Chinese characters to pinyin, between simplified and traditional, and does word segmentation.
Stars: ✭ 69 (-73.46%)
kanji
Haskell suite for determining which 級 (level) of the 漢字検定 (national Kanji exam) a given Kanji belongs to.
Stars: ✭ 19 (-92.69%)
TALPCo
TUFS Asian Language Parallel Corpus
Stars: ✭ 32 (-87.69%)
PIE
Fast + Non-Autoregressive Grammatical Error Correction using BERT. Code and pre-trained models for the paper "Parallel Iterative Edit Models for Local Sequence Transduction": www.aclweb.org/anthology/D19-1435.pdf (EMNLP-IJCNLP 2019)
Stars: ✭ 164 (-36.92%)
AlpacaTag
AlpacaTag: An Active Learning-based Crowd Annotation Framework for Sequence Tagging (ACL 2019 Demo)
Stars: ✭ 126 (-51.54%)
NLP Toolkit
Library of state-of-the-art models (PyTorch) for NLP tasks
Stars: ✭ 92 (-64.62%)
udar
UDAR Does Accented Russian: A finite-state morphological analyzer of Russian that handles stressed wordforms.
Stars: ✭ 15 (-94.23%)
Hibi
[No Active Development] An Android app for learning Japanese by keeping a journal.
Stars: ✭ 37 (-85.77%)
rippletagger
RippleTagger identifies part-of-speech tags (nouns, verbs, and so on). Give it a sentence and it returns a list of tags.
Stars: ✭ 12 (-95.38%)
sakubun
A tool that helps you improve your Japanese vocabulary and kanji skills with practice that's customized to your needs.
Stars: ✭ 20 (-92.31%)
youtokentome-ruby
High performance unsupervised text tokenization for Ruby
Stars: ✭ 17 (-93.46%)
extra-model
Code to run the ExtRA algorithm for unsupervised topic/aspect extraction on English texts.
Stars: ✭ 43 (-83.46%)
simple NER
Simple rule-based named entity recognition
Stars: ✭ 29 (-88.85%)
hashformers
Hashformers is a framework for hashtag segmentation with transformers.
Stars: ✭ 18 (-93.08%)
NLP-tools
Useful Python NLP tools (evaluation, GUI interface, tokenization)
Stars: ✭ 39 (-85%)
sample-ui-react
Material-UI + React.js + Redux (Pug / SCSS / Babel)
Stars: ✭ 15 (-94.23%)