ikegami-yukino / Dataset List

License: WTFPL
Lists of text corpora and more (mainly Japanese)

Projects that are alternatives to, or similar to, Dataset List

Clue
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpora, and a leaderboard
Stars: ✭ 2,425 (+2786.9%)
Mutual labels:  dataset, corpus
Species-Names-Corpus
Species-name corpus: plant names and animal names.
Stars: ✭ 23 (-72.62%)
Mutual labels:  corpus, dataset
Indonesian Nlp Resources
Data resources for Indonesian-language NLP
Stars: ✭ 143 (+70.24%)
Mutual labels:  dataset, corpus
Dialog corpus
Datasets for training Chinese and English chatbot (dialogue) systems
Stars: ✭ 1,662 (+1878.57%)
Mutual labels:  dataset, corpus
Nlp chinese corpus
Large-scale Chinese corpus for NLP
Stars: ✭ 6,656 (+7823.81%)
Mutual labels:  dataset, corpus
Gossiping Chinese Corpus
Chinese question-answer corpus from the PTT Gossiping board
Stars: ✭ 137 (+63.1%)
Mutual labels:  dataset, corpus
Chinese Names Corpus
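Chinese personal-name corpus and name generator: Chinese full names, surnames, given names, forms of address, Japanese names, transliterated foreign names, and English names. Useful for Chinese word segmentation and person-name entity recognition.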
Stars: ✭ 3,053 (+3534.52%)
Mutual labels:  dataset, corpus
Nlp bahasa resources
A Curated List of Datasets and Usable Library Resources for NLP in Bahasa Indonesia
Stars: ✭ 158 (+88.1%)
Mutual labels:  dataset, corpus
Cluepretrainedmodels
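A collection of high-quality Chinese pre-trained models: state-of-the-art large models, the fastest small models, and dedicated similarity models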
Stars: ✭ 493 (+486.9%)
Mutual labels:  dataset, corpus
Fakenewscorpus
A dataset of millions of news articles scraped from a curated list of data sources.
Stars: ✭ 255 (+203.57%)
Mutual labels:  dataset, corpus
Awesome Hungarian Nlp
A curated list of NLP resources for Hungarian
Stars: ✭ 121 (+44.05%)
Mutual labels:  dataset, corpus
Company Names Corpus
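Company-name and organization-name corpus: company short names, abbreviations, brand terms, and enterprise names. Useful for Chinese word segmentation and organization-name entity recognition.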
Stars: ✭ 868 (+933.33%)
Mutual labels:  dataset, corpus
Ua Gec
UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language
Stars: ✭ 108 (+28.57%)
Mutual labels:  dataset, corpus
Prosody
Helsinki Prosody Corpus and A System for Predicting Prosodic Prominence from Text
Stars: ✭ 139 (+65.48%)
Mutual labels:  dataset, corpus
Medical-Names-Corpus
Medical corpus: medical-institution names and drug standard codes.
Stars: ✭ 26 (-69.05%)
Mutual labels:  corpus, dataset
Insuranceqa Corpus Zh
🚁 Insurance-industry corpus for chatbots
Stars: ✭ 821 (+877.38%)
Mutual labels:  dataset, corpus
Coarij
Corpus of Annual Reports in Japan
Stars: ✭ 55 (-34.52%)
Mutual labels:  dataset, corpus
Symbolic Musical Datasets
🎹 symbolic musical datasets
Stars: ✭ 79 (-5.95%)
Mutual labels:  dataset
Google Covid19 Mobility Reports
Data extraction of Google's COVID-19 Mobility Reports
Stars: ✭ 82 (-2.38%)
Mutual labels:  dataset
Setl
A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-5.95%)
Mutual labels:  dataset

dataset-list

Lists of text corpora and more

Lists

License

WTFPL
