Sejong Corpus: Korean Sejong corpus download and simple analysis
Stars: ✭ 116 (+81.25%)
BSD: The Business Scene Dialogue corpus
Stars: ✭ 51 (-20.31%)
Awesome Persian Nlp Ir: Curated List of Persian Natural Language Processing and Information Retrieval Tools and Resources
Stars: ✭ 460 (+618.75%)
Kagome: Self-contained Japanese Morphological Analyzer written in pure Go
Stars: ✭ 554 (+765.63%)
kanji-frequency: Kanji usage frequency data collected from various sources
Stars: ✭ 92 (+43.75%)
Jumanpp: Juman++ (a Morphological Analyzer Toolkit)
Stars: ✭ 254 (+296.88%)
bunkai: Sentence boundary disambiguation tool for Japanese texts
Stars: ✭ 154 (+140.63%)
jrte-corpus: Japanese Realistic Textual Entailment Corpus (NLP 2020, LREC 2020)
Stars: ✭ 66 (+3.13%)
arabic-tagger: AQMAR Arabic Tagger: Sequence tagger with cost-augmented structured perceptron training
Stars: ✭ 38 (-40.62%)
PoetryCorpus: Poetic corpus of the Russian language
Stars: ✭ 40 (-37.5%)
unihandecode: unihandecode is a transliteration library that converts all Unicode characters/words into the ASCII alphabet and is aware of language preference priorities
Stars: ✭ 71 (+10.94%)
Jotoba: A free online, self-hostable, multilang Japanese dictionary.
Stars: ✭ 87 (+35.94%)
kanji: Haskell suite for determining what 級 (level) of the 漢字検定 (national Kanji exam) a given Kanji belongs to.
Stars: ✭ 19 (-70.31%)
YuzuMarker: 🍋 [WIP] Manga Translation Tool
Stars: ✭ 76 (+18.75%)
limelight: A PHP Japanese language text analyzer and parser.
Stars: ✭ 76 (+18.75%)
FCH-TTS: A fast Text-to-Speech (TTS) model. Works well for English, Mandarin/Chinese, Japanese, Korean, Russian and Tibetan (so far).
Stars: ✭ 154 (+140.63%)
sticker2: Further developed as SyntaxDot: https://github.com/tensordot/syntaxdot
Stars: ✭ 14 (-78.12%)
cl-skkserv: An SKK dictionary server in Common Lisp and its extensions
Stars: ✭ 22 (-65.62%)
google-news-scraper: Google News Scraper for languages like Japanese, Chinese... [VPN Support]
Stars: ✭ 88 (+37.5%)
LanguageCodes: We present a list of languages with their codes, families, regions, etc. We also present a list of multilingual corpora (with URLs).
Stars: ✭ 70 (+9.38%)
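jiten: jiten - Japanese Android/CLI/web dictionary based on jmdict/kanjidic (Japanese dictionary, Japanese-English dictionary, kanji-English character dictionary, Japanese-German dictionary, Japanese-Dutch dictionary)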
Stars: ✭ 64 (+0%)
sakubun: A tool that helps you improve your Japanese vocabulary and kanji skills with practice that's customized to your needs.
Stars: ✭ 20 (-68.75%)
CBLUE: Chinese medical information processing benchmark (CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark)
Stars: ✭ 379 (+492.19%)
kotoba: A Discord bot for helping with learning Japanese.
Stars: ✭ 118 (+84.38%)
pdf-corpus: Python script to quickly create hand-crafted PDF files
Stars: ✭ 17 (-73.44%)
GrammarEngine: Grammatical Dictionary of the Russian Language (+ English, Japanese, etc.)
Stars: ✭ 68 (+6.25%)
kuzushiji-recognition: Kuzushiji Recognition Kaggle 2019. Build a DL model to transcribe ancient Kuzushiji into contemporary Japanese characters. Opening the door to a thousand years of Japanese culture.
Stars: ✭ 16 (-75%)
deepnlp: NLP practice projects from the author's early days
Stars: ✭ 11 (-82.81%)
textlint-ja: Japanese community for textlint / rule ideas
Stars: ✭ 41 (-35.94%)
thaigov-corpus: Project to collect news from Thai government websites
Stars: ✭ 19 (-70.31%)
datalinguist: Stanford CoreNLP in idiomatic Clojure.
Stars: ✭ 93 (+45.31%)
citar: Citar HMM part-of-speech tagger
Stars: ✭ 16 (-75%)
kanji poster: Poster of 2200 jōyō and WaniKani kanji
Stars: ✭ 19 (-70.31%)
nihongo: Japanese Dictionary
Stars: ✭ 77 (+20.31%)
akka-doc-ja: Akka Japanese Documentation
Stars: ✭ 25 (-60.94%)
dac: Entity linker for the newspaper collection of the National Library of the Netherlands. Links named entity mentions to DBpedia descriptions using either a binary SVM classifier or a neural net.
Stars: ✭ 14 (-78.12%)
nlp-cheat-sheet-python: NLP Cheat Sheet, Python, spacy, LexNPL, NLTK, tokenization, stemming, sentence detection, named entity recognition
Stars: ✭ 69 (+7.81%)
TV4Dialog: No description or website provided.
Stars: ✭ 33 (-48.44%)
voikko-rs: Rust bindings for the Voikko library
Stars: ✭ 16 (-75%)
When-in-Rome: A meta-corpus of functional harmonic analysis.
Stars: ✭ 35 (-45.31%)
mlmorph: Malayalam Morphological Analyzer using Finite State Transducer
Stars: ✭ 40 (-37.5%)
zmsp: The Mingled Structured Predictor
Stars: ✭ 20 (-68.75%)
textbox: Text collections made available by the CLiGS group.
Stars: ✭ 19 (-70.31%)
malay-dataset: Text corpus for Bahasa Malaysia, https://malaya.readthedocs.io/en/latest/Dataset.html
Stars: ✭ 189 (+195.31%)
libmorph: libmorph rus/ukr - fast & accurate morphological analyzer/analyses for Russian and Ukrainian
Stars: ✭ 16 (-75%)