THUNLP-AIPoet / Datasets

Poetry-related datasets developed by the THUAIPoet (Jiuge) group.

Projects that are alternatives to, or similar to, Datasets

TV4Dialog
No description or website provided.
Stars: ✭ 33 (-70.27%)
Mutual labels:  corpus, chinese
Weibo terminater
Final Weibo Crawler: scrape anything from Weibo (comments, Weibo contents, followers, anything). The Terminator
Stars: ✭ 2,295 (+1967.57%)
Mutual labels:  chinese, corpus
Cluedatasetsearch
Search all Chinese NLP datasets, with commonly used English NLP datasets included
Stars: ✭ 2,112 (+1802.7%)
Mutual labels:  chinese, corpus
Cluepretrainedmodels
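A collection of high-quality Chinese pre-trained models: state-of-the-art large models, the fastest small models, and models specialized for similarity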
Stars: ✭ 493 (+344.14%)
Mutual labels:  chinese, corpus
OpenDialog
An Open-Source Package for Chinese Open-domain Conversational Chatbot (a Chinese chit-chat dialogue system with one-click deployment of a WeChat chit-chat bot)
Stars: ✭ 94 (-15.32%)
Mutual labels:  corpus, chinese
CBLUE
CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark for Chinese medical information processing
Stars: ✭ 379 (+241.44%)
Mutual labels:  corpus, chinese
Clue
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
Stars: ✭ 2,425 (+2084.68%)
Mutual labels:  chinese, corpus
CLUEmotionAnalysis2020
CLUE Emotion Analysis Dataset: a fine-grained sentiment analysis dataset
Stars: ✭ 3 (-97.3%)
Mutual labels:  corpus, chinese
Cluecorpus2020
Large-scale Pre-training Corpus for Chinese (100G Chinese pre-training corpus)
Stars: ✭ 278 (+150.45%)
Mutual labels:  chinese, corpus
Nlp chinese corpus
Large Scale Chinese Corpus for NLP
Stars: ✭ 6,656 (+5896.4%)
Mutual labels:  chinese, corpus
Uer Py
Open Source Pre-training Model Framework in PyTorch & Pre-trained Model Zoo
Stars: ✭ 1,295 (+1066.67%)
Mutual labels:  chinese
Pyclue
Python toolkit for Chinese Language Understanding(CLUE) Evaluation benchmark
Stars: ✭ 91 (-18.02%)
Mutual labels:  corpus
Cn sort
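Chinese sorting: quickly sort Simplified Chinese phrases by pinyin or stroke order (handles millions of entries, which may mix Chinese and English and include polyphonic characters). If it helps you, a star is welcome as encouragement.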
Stars: ✭ 102 (-8.11%)
Mutual labels:  chinese
Docker Changelog Chinese
Chinese edition of the Docker changelog
Stars: ✭ 107 (-3.6%)
Mutual labels:  chinese
Alfred Parrot
📝 An Alfred Workflow that translates between multiple languages
Stars: ✭ 89 (-19.82%)
Mutual labels:  chinese
Pubmed Rct
PubMed 200k RCT dataset: a large dataset for sequential sentence classification.
Stars: ✭ 101 (-9.01%)
Mutual labels:  corpus
Omnetpp primer
A simulation manual for OMNeT++
Stars: ✭ 89 (-19.82%)
Mutual labels:  chinese
Romanize Names
㊙️ Node module for romanizing names written in Traditional Chinese, primarily for Taiwan.
Stars: ✭ 88 (-20.72%)
Mutual labels:  chinese
Dataset List
Lists of text corpora and more (mainly Japanese)
Stars: ✭ 84 (-24.32%)
Mutual labels:  corpus
Tensorflow Yolo1
YOLO object detection algorithm implemented in TensorFlow, fully commented in Chinese, with testing and training code and camera support
Stars: ✭ 107 (-3.6%)
Mutual labels:  chinese

THUAIPoet Datasets

This repository provides datasets developed by the THUAIPoet (九歌, Jiuge) group at the Research Center for Natural Language Processing, Computational Humanities and Social Sciences, Tsinghua University. Note that all our datasets are released for academic use only.

We will keep improving the existing datasets and will release more datasets in the future. Any suggestions are welcome!

Dataset | Version
THU Poetry Quality Evaluation DataSet (THU-PQED) | V0.1
THU Fine-grained Sentimental Poetry Corpus (THU-FSPC) | V1.0
THU Chinese Classical Poetry Corpus (THU-CCPC) | V1.0
THU Chinese Rhythm and Rhyme Data (THU-CRRD) | V0.1