crownpku / Small Chinese Corpus

Some useful small Chinese corpus datasets

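Small Chinese corpus data: some useful Chinese corpus datasets

  • Latitude and longitude coordinates of Chinese provinces and cities: city_location/

  • Postal codes of Chinese provinces and cities: postal_provinces/

  • National administrative division codes and urban-rural classification codes (2015): china_geo_code/

  • Collection of Chinese idioms (chengyu): chengyu/

  • Chinese personal names, plus character names from Jin Yong's novels, Romance of the Three Kingdoms, and Dream of the Red Chamber: chi_names/

  • Sample data for Chinese named entity recognition: NER_chi/

  • Sample data for Chinese relation extraction: relation_multiple_chi/

  • Sample data for Chinese reading comprehension: reading_comprehension_chi/

  • Chinese visual question answering data (based on MSCOCO): Chinese_Visual_QA_pairs/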
