Yomichan: Japanese pop-up dictionary extension for Chrome and Firefox.
Stars: ✭ 464 (+809.8%)
Ua Gec: UA-GEC, a Grammatical Error Correction and Fluency Corpus for the Ukrainian Language
Stars: ✭ 108 (+111.76%)
Pubmed Rct: PubMed 200k RCT dataset, a large dataset for sequential sentence classification.
Stars: ✭ 101 (+98.04%)
Kuroshiro: Japanese language library for converting Japanese sentences to Hiragana, Katakana, or Romaji, with furigana and okurigana modes supported.
Stars: ✭ 386 (+656.86%)
say-it: TTS on the command line. Pronounces the Chinese and English words you type in.
Stars: ✭ 19 (-62.75%)
Dataset List: Lists of text corpora and more (mainly Japanese)
Stars: ✭ 84 (+64.71%)
Yakuhanjp: Yakumono-Hankaku Only Web Fonts
Stars: ✭ 288 (+464.71%)
malay-dataset: Text corpus for Bahasa Malaysia, https://malaya.readthedocs.io/en/latest/Dataset.html
Stars: ✭ 189 (+270.59%)
Nematus: Open-Source Neural Machine Translation in Tensorflow
Stars: ✭ 730 (+1331.37%)
Coarij: Corpus of Annual Reports in Japan
Stars: ✭ 55 (+7.84%)
Nagisa: A Japanese tokenizer based on recurrent neural networks
Stars: ✭ 260 (+409.8%)
iris: Official repository of the Íris BOT, a robot in Portuguese, English, and Spanish for WhatsApp [With MD/Without MD]. It has hundreds of different commands, ranging from making stickers to playing chess or blackjack.
Stars: ✭ 166 (+225.49%)
Lyrics Corpora: An unofficial Python API that allows users to create a corpus of lyrical text from their favorite artists and billboard charts
Stars: ✭ 13 (-74.51%)
Nodejs Ja: Node.js Japanese localization
Stars: ✭ 98 (+92.16%)
Texar Pytorch: Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/
Stars: ✭ 636 (+1147.06%)
Naive Bayes Classifier: The Naive Bayes classifier is a classification algorithm. It uses the Bernoulli and Multinomial Naive Bayes equations to classify documents (text) as ham or spam.
Stars: ✭ 6 (-88.24%)
Cross-Language-Dataset: A multilingual, multi-style and multi-granularity dataset for cross-language textual similarity detection
Stars: ✭ 60 (+17.65%)
Seq2seq Chatbot: Chatbot in 200 lines of code using TensorLayer
Stars: ✭ 777 (+1423.53%)
scoop-for-jp: Scoop bucket for ALL Japanese users.
Stars: ✭ 17 (-66.67%)
Quanteda: An R package for the Quantitative Analysis of Textual Data
Stars: ✭ 647 (+1168.63%)
KanColle-English-Patch-KCCP: English Patch for the original KanColle browser game, to be used with KCCacheProxy. Translates most of the game into English.
Stars: ✭ 28 (-45.1%)
Zipangu: A library for compatibility about Japan.
Stars: ✭ 27 (-47.06%)
Awesome Persian Nlp Ir: Curated List of Persian Natural Language Processing and Information Retrieval Tools and Resources
Stars: ✭ 460 (+801.96%)
shell-genomics: Introduction to the Command Line for Genomics
Stars: ✭ 54 (+5.88%)
sembei: 🍘 Word embeddings without word segmentation 🍘
Stars: ✭ 14 (-72.55%)
Wordless: An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation
Stars: ✭ 378 (+641.18%)
sb-nmt: Code for Synchronous Bidirectional Neural Machine Translation (SB-NMT)
Stars: ✭ 66 (+29.41%)
Cluecorpus2020: Large-scale Pre-training Corpus for Chinese (100G Chinese pre-training corpus)
Stars: ✭ 278 (+445.1%)
Fakenewscorpus: A dataset of millions of news articles scraped from a curated list of data sources.
Stars: ✭ 255 (+400%)
opensource-voice-tools: A repo listing known open source voice tools, ordered by where they sit in the voice stack
Stars: ✭ 21 (-58.82%)
sample-ui-react: Material-UI + React.js + Redux [ Pug / Scss / Babel ]
Stars: ✭ 15 (-70.59%)
wordfish-python: Extract relationships from standardized terms from a corpus of interest with deep learning 🐟
Stars: ✭ 19 (-62.75%)
open-discourse: Open Discourse is the first fully comprehensive corpus of the plenary proceedings of the federal German Parliament (Bundestag).
Stars: ✭ 47 (-7.84%)
DeepSentiPers: Repository for the experiments described in the paper named "DeepSentiPers: Novel Deep Learning Models Trained Over Proposed Augmented Persian Sentiment Corpus"
Stars: ✭ 17 (-66.67%)
rhymes: Give me an English word and I’ll give you a list of rhymes
Stars: ✭ 34 (-33.33%)
Filipino-Text-Benchmarks: Open-source benchmark datasets and pretrained transformer models in the Filipino language.
Stars: ✭ 22 (-56.86%)
wana kana rust: Utility library for checking and converting between Japanese characters - Hiragana, Katakana - and Romaji
Stars: ✭ 46 (-9.8%)
SpiCE-Corpus: An open-access corpus of conversational bilingual speech in Cantonese and English
Stars: ✭ 33 (-35.29%)
jmdict-simplified: JMdict, JMnedict, Kanjidic, KRADFILE/RADKFILE in JSON format
Stars: ✭ 96 (+88.24%)
kanji-web-app: Angular.js kanji web application
Stars: ✭ 45 (-11.76%)
folia: FoLiA (Format for Linguistic Annotation) is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchange. Note that the actual Python library for proces…
Stars: ✭ 56 (+9.8%)
Seq2seq: A general-purpose encoder-decoder framework for Tensorflow
Stars: ✭ 5,455 (+10596.08%)
When-in-Rome: A meta-corpus of functional harmonic analysis.
Stars: ✭ 35 (-31.37%)
textbox: Text collections made available by the CLiGS group.
Stars: ✭ 19 (-62.75%)
Nihonoari-App: A small, minimalist Japanese Kana training app
Stars: ✭ 66 (+29.41%)
Pluralize.NET: 📘 Pluralize or singularize any English word.
Stars: ✭ 50 (-1.96%)
lang-ja: Manages the Japanese language files distributed with Vim.
Stars: ✭ 20 (-60.78%)
Toiro: A comparison tool of Japanese tokenizers
Stars: ✭ 95 (+86.27%)
Seq2seq: Minimal Seq2Seq model with Attention for Neural Machine Translation in PyTorch
Stars: ✭ 552 (+982.35%)