THUNLP-AIPoet / Datasets
Poetry-related datasets developed by the THUAIPoet (Jiuge) group.
Stars: ✭ 111
Projects that are alternatives of or similar to Datasets
Weibo terminater
Final Weibo Crawler: scrape anything from Weibo, including comments, post contents, and followers. The Terminator.
Stars: ✭ 2,295 (+1967.57%)
Mutual labels: chinese, corpus
Cluepretrainedmodels
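A collection of high-quality Chinese pre-trained models: state-of-the-art large models, the fastest small models, and specialized similarity models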
Stars: ✭ 493 (+344.14%)
Mutual labels: chinese, corpus
OpenDialog
An Open-Source Package for Chinese Open-domain Conversational Chatbot (a Chinese chit-chat dialogue system with one-click deployment of a WeChat chit-chat bot)
Stars: ✭ 94 (-15.32%)
Mutual labels: corpus, chinese
CBLUE
CBLUE, a benchmark for Chinese medical information processing: A Chinese Biomedical Language Understanding Evaluation Benchmark
Stars: ✭ 379 (+241.44%)
Mutual labels: corpus, chinese
Clue
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
Stars: ✭ 2,425 (+2084.68%)
Mutual labels: chinese, corpus
CLUEmotionAnalysis2020
CLUE Emotion Analysis Dataset: a fine-grained emotion analysis dataset
Stars: ✭ 3 (-97.3%)
Mutual labels: corpus, chinese
Cluecorpus2020
Large-scale Pre-training Corpus for Chinese (100G of Chinese pre-training text)
Stars: ✭ 278 (+150.45%)
Mutual labels: chinese, corpus
Nlp chinese corpus
Large Scale Chinese Corpus for NLP
Stars: ✭ 6,656 (+5896.4%)
Mutual labels: chinese, corpus
Uer Py
Open Source Pre-training Model Framework in PyTorch & Pre-trained Model Zoo
Stars: ✭ 1,295 (+1066.67%)
Mutual labels: chinese
Pyclue
Python toolkit for the Chinese Language Understanding Evaluation (CLUE) benchmark
Stars: ✭ 91 (-18.02%)
Mutual labels: corpus
Cn sort
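Chinese sorting: quickly sort Simplified Chinese phrases by pinyin or stroke order (on the order of millions of entries; phrases may mix Chinese and English or contain polyphonic characters). If you find it helpful, please consider giving it a star.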
Stars: ✭ 102 (-8.11%)
Mutual labels: chinese
Pubmed Rct
PubMed 200k RCT dataset: a large dataset for sequential sentence classification.
Stars: ✭ 101 (-9.01%)
Mutual labels: corpus
Romanize Names
㊙️ Node module for romanizing names in Traditional Chinese for Taiwan primarily.
Stars: ✭ 88 (-20.72%)
Mutual labels: chinese
Dataset List
lists of text corpus and more (mainly Japanese)
Stars: ✭ 84 (-24.32%)
Mutual labels: corpus
Tensorflow Yolo1
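YOLO object detection algorithm written with the TensorFlow framework, fully commented in Chinese, including testing and training code, with camera support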
Stars: ✭ 107 (-3.6%)
Mutual labels: chinese
THUAIPoet Datasets
This repository provides datasets developed by the THUAIPoet (九歌) group at the Research Center for Natural Language Processing, Computational Humanities and Social Sciences, Tsinghua University. Note that all our datasets are released for academic use only.
We will keep improving the existing datasets and release more in the future. Any suggestions are welcome!