159 open source projects that are alternatives to or similar to jrte-corpus

kanji-frequency
Kanji usage frequency data collected from various sources
Stars: ✭ 92 (+39.39%)
Mutual labels:  corpus, japanese-language
OpenConvert
Text conversion tool (from e.g. Word, HTML, txt) to the corpus formats TEI or FoLiA
Stars: ✭ 20 (-69.7%)
Mutual labels:  corpus
nippon
Japanese N5-N2 grammar notes 🍻
Stars: ✭ 84 (+27.27%)
Mutual labels:  japanese-language
Nlvr
Cornell NLVR and NLVR2 are natural language grounding datasets. Each example shows a visual input and a sentence describing it, and is annotated with the truth-value of the sentence.
Stars: ✭ 192 (+190.91%)
Mutual labels:  corpus
german-nouns
A list of ~100,000 German nouns and their grammatical properties compiled from WiktionaryDE as CSV file. Plus a module to look up the data and parse compound words.
Stars: ✭ 101 (+53.03%)
Mutual labels:  corpus
gum
Repository for the Georgetown University Multilayer Corpus (GUM)
Stars: ✭ 71 (+7.58%)
Mutual labels:  corpus
python-doc-ja
Python documentation Japanese translation project
Stars: ✭ 130 (+96.97%)
Mutual labels:  japanese-language
mev-corpus
MEV Data Corpus
Stars: ✭ 77 (+16.67%)
Mutual labels:  corpus
tvsub
TVsub: DCU-Tencent Chinese-English Dialogue Corpus
Stars: ✭ 40 (-39.39%)
Mutual labels:  corpus
Wp2txt
WP2TXT extracts plain text from Wikipedia dump files (encoded in XML, compressed with Bzip2), stripping all MediaWiki markup and other metadata.
Stars: ✭ 145 (+119.7%)
Mutual labels:  corpus
Code Docstring Corpus
Preprocessed Python functions and docstrings for automated code documentation (code2doc) and automated code generation (doc2code) tasks.
Stars: ✭ 137 (+107.58%)
Mutual labels:  corpus
Probabilistic-RNN-DA-Classifier
Probabilistic Dialogue Act Classification for the Switchboard Corpus using an LSTM model
Stars: ✭ 22 (-66.67%)
Mutual labels:  corpus
Nihonoari-App
A small, minimalist Japanese kana trainer
Stars: ✭ 66 (+0%)
Mutual labels:  japanese-language
workshop-IJTA
Introduction to Japanese text analysis with R
Stars: ✭ 25 (-62.12%)
Mutual labels:  japanese-language
thaigov-corpus
A project that collects news from the Thai government website
Stars: ✭ 19 (-71.21%)
Mutual labels:  corpus
megs
A merged version of multiple open-source German speech datasets.
Stars: ✭ 21 (-68.18%)
Mutual labels:  corpus
nytwit
New York Times Word Innovation Types dataset
Stars: ✭ 21 (-68.18%)
Mutual labels:  corpus
Awesome Deeplearning Resources
Deep learning and deep reinforcement learning research papers and some code
Stars: ✭ 2,483 (+3662.12%)
Mutual labels:  corpus
text-classification-cn
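Chinese text classification in practice, based on the Sogou news corpus, using traditional machine learning methods as well as pretrained models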
Stars: ✭ 81 (+22.73%)
Mutual labels:  corpus
Nlp bahasa resources
A curated list of datasets and usable library resources for NLP in Bahasa Indonesia
Stars: ✭ 158 (+139.39%)
Mutual labels:  corpus
trafilatura
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Stars: ✭ 711 (+977.27%)
Mutual labels:  corpus
Prosody
Helsinki Prosody Corpus and A System for Predicting Prosodic Prominence from Text
Stars: ✭ 139 (+110.61%)
Mutual labels:  corpus
When-in-Rome
A meta-corpus of functional harmonic analysis.
Stars: ✭ 35 (-46.97%)
Mutual labels:  corpus
bisemantic
Text pair classification
Stars: ✭ 12 (-81.82%)
Mutual labels:  textual-entailment
Khcoder
KH Coder: for Quantitative Content Analysis or Text Mining
Stars: ✭ 126 (+90.91%)
Mutual labels:  corpus
Dialog corpus
Corpora for training Chinese and English dialogue (chatbot) systems
Stars: ✭ 1,662 (+2418.18%)
Mutual labels:  corpus
DANeS
DANeS is an open-source e-newspaper dataset created in collaboration between DATASET JSC (dataset.vn) and AIV Group (aivgroup.vn)
Stars: ✭ 64 (-3.03%)
Mutual labels:  corpus
malay-dataset
Text corpus for Bahasa Malaysia, https://malaya.readthedocs.io/en/latest/Dataset.html
Stars: ✭ 189 (+186.36%)
Mutual labels:  corpus
rclc
Rich Context leaderboard competition, including the corpus and current SOTA for required tasks.
Stars: ✭ 20 (-69.7%)
Mutual labels:  corpus
kotoba
A Discord bot for helping with learning Japanese.
Stars: ✭ 118 (+78.79%)
Mutual labels:  japanese-language
AtCoderClans
[Unofficial] A collection of links that make AtCoder more fun. It gathers unofficial services, tools, libraries, articles, and more made by volunteers.
Stars: ✭ 74 (+12.12%)
Mutual labels:  japanese-language
open2ch-dialogue-corpus
A dialogue corpus created by crawling Open2ch (おーぷん2ちゃんねる)
Stars: ✭ 65 (-1.52%)
Mutual labels:  corpus
Convert-Numbers-to-Japanese
Converts Arabic numerals, or 'western' style numbers, to a Japanese context.
Stars: ✭ 33 (-50%)
Mutual labels:  japanese-language
Domino-English-Translation
🌏 Let's translate Domino, a Japanese MIDI editor!
Stars: ✭ 29 (-56.06%)
Mutual labels:  japanese-language
jmdict-simplified
JMdict, JMnedict, Kanjidic, KRADFILE/RADKFILE in JSON format
Stars: ✭ 96 (+45.45%)
Mutual labels:  japanese-language
jaco-js
Japanese character optimizer for JavaScript
Stars: ✭ 72 (+9.09%)
Mutual labels:  japanese-language
Dialogue-Corpus
No description or website provided.
Stars: ✭ 27 (-59.09%)
Mutual labels:  corpus
google-news-scraper
Google News Scraper for languages like Japanese, Chinese... [VPN Support]
Stars: ✭ 88 (+33.33%)
Mutual labels:  japanese-language
Chinese Names Corpus
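Chinese personal names corpus and name generator. Covers Chinese full names, surnames, given names, forms of address, Japanese names, transliterated names, and English names. Useful for Chinese word segmentation and person-name entity recognition.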
Stars: ✭ 3,053 (+4525.76%)
Mutual labels:  corpus
ocr2text
Convert a PDF via OCR to a TXT file in UTF-8 encoding
Stars: ✭ 90 (+36.36%)
Mutual labels:  corpus
Weibo terminater
Final Weibo crawler: scrape anything from Weibo, including comments, post contents, and followers. The Terminator.
Stars: ✭ 2,295 (+3377.27%)
Mutual labels:  corpus
TV4Dialog
No description or website provided.
Stars: ✭ 33 (-50%)
Mutual labels:  corpus
Efaqa Corpus Zh
❤️ Emotional First Aid Dataset: a corpus for psychological counseling Q&A and chatbots
Stars: ✭ 170 (+157.58%)
Mutual labels:  corpus
opensource-voice-tools
A repo listing known open source voice tools, ordered by where they sit in the voice stack
Stars: ✭ 21 (-68.18%)
Mutual labels:  corpus
Indonesian Nlp Resources
Data resources for Indonesian-language NLP
Stars: ✭ 143 (+116.67%)
Mutual labels:  corpus
BSD
The Business Scene Dialogue corpus
Stars: ✭ 51 (-22.73%)
Mutual labels:  corpus
Clue
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
Stars: ✭ 2,425 (+3574.24%)
Mutual labels:  corpus
Chatbot-Training-Corpus
A roundup of text corpora that can be used for hands-on chatbot training, covering both Chinese and English
Stars: ✭ 117 (+77.27%)
Mutual labels:  corpus
Gossiping Chinese Corpus
Chinese question-answer corpus from the PTT Gossiping board
Stars: ✭ 137 (+107.58%)
Mutual labels:  corpus
Awesome Hungarian Nlp
A curated list of NLP resources for Hungarian
Stars: ✭ 121 (+83.33%)
Mutual labels:  corpus
Awesome Chatbot
Awesome Chatbot Projects, Corpus, Papers, Tutorials. Chinese Chatbot =>:
Stars: ✭ 1,785 (+2604.55%)
Mutual labels:  corpus
Senti4SD
An emotion-polarity classifier specifically trained on developers' communication channels
Stars: ✭ 41 (-37.88%)
Mutual labels:  sentiment-polarity
Cluedatasetsearch
Search all Chinese NLP datasets, plus commonly used English NLP datasets
Stars: ✭ 2,112 (+3100%)
Mutual labels:  corpus
textbox
Text collections made available by the CLiGS group.
Stars: ✭ 19 (-71.21%)
Mutual labels:  corpus
Speech-Corpus-Collection
A Collection of Speech Corpus for ASR and TTS
Stars: ✭ 113 (+71.21%)
Mutual labels:  corpus
limelight
A PHP Japanese-language text analyzer and parser.
Stars: ✭ 76 (+15.15%)
Mutual labels:  japanese-language
LanguageCodes
We present a list of languages with their codes, families, regions, etc. We also present a list of multilingual corpora (with URLs).
Stars: ✭ 70 (+6.06%)
Mutual labels:  corpus
ra-language-japanese
Japanese messages for react-admin
Stars: ✭ 22 (-66.67%)
Mutual labels:  japanese-language
banglabert
This repository contains the official release of the model "BanglaBERT" and associated downstream finetuning code and datasets introduced in the paper titled "BanglaBERT: Language Model Pretraining and Benchmarks for Low-Resource Language Understanding Evaluation in Bangla" accepted in Findings of the Annual Conference of the North American Chap…
Stars: ✭ 186 (+181.82%)
Mutual labels:  textual-entailment
proiel-treebank
Official releases of the PROIEL treebank of ancient Indo-European languages
Stars: ✭ 30 (-54.55%)
Mutual labels:  corpus
1-60 of 159 similar projects