crownpku / Small Chinese Corpus
Some useful small Chinese corpus datasets
Stars: ✭ 462
Projects that are alternatives to or similar to Small Chinese Corpus
Nlp chinese corpus
Large Scale Chinese Corpus for NLP
Stars: ✭ 6,656 (+1340.69%)
Mutual labels: corpus, chinese-nlp
Chinese Nlp Corpus
Collections of Chinese NLP corpus
Stars: ✭ 438 (-5.19%)
Mutual labels: corpus, chinese-nlp
Chineseaddress ocr
OCR for Chinese addresses in photographed documents, implemented using CTPN + CTC + address correction.
Stars: ✭ 309 (-33.12%)
Mutual labels: chinese-nlp
wordfish-python
extract relationships from standardized terms from corpus of interest with deep learning 🐟
Stars: ✭ 19 (-95.89%)
Mutual labels: corpus
open-discourse
Open Discourse is the first fully comprehensive corpus of the plenary proceedings of the federal German Parliament (Bundestag).
Stars: ✭ 47 (-89.83%)
Mutual labels: corpus
DeepSentiPers
Repository for the experiments described in the paper named "DeepSentiPers: Novel Deep Learning Models Trained Over Proposed Augmented Persian Sentiment Corpus"
Stars: ✭ 17 (-96.32%)
Mutual labels: corpus
Zhparser
zhparser is a PostgreSQL extension for full-text search of Chinese language
Stars: ✭ 418 (-9.52%)
Mutual labels: chinese-nlp
Cluecorpus2020
Large-scale pre-training corpus for Chinese (100 GB)
Stars: ✭ 278 (-39.83%)
Mutual labels: corpus
fastmorph
Fast corpus search engine originally made for the Corpus of Written Tatar language
Stars: ✭ 14 (-96.97%)
Mutual labels: corpus
Corpora
A collection of small corpuses of interesting data for the creation of bots and similar stuff.
Stars: ✭ 4,293 (+829.22%)
Mutual labels: corpus
Wordless
An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation
Stars: ✭ 378 (-18.18%)
Mutual labels: corpus
Fakenewscorpus
A dataset of millions of news articles scraped from a curated list of data sources.
Stars: ✭ 255 (-44.81%)
Mutual labels: corpus
Thulac Java
An Efficient Lexical Analyzer for Chinese
Stars: ✭ 285 (-38.31%)
Mutual labels: chinese-nlp
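Small Chinese Corpus: Some useful Chinese corpus datasets
- Latitude and longitude coordinates of Chinese provinces and cities: city_location/
- Postal codes of Chinese provinces and cities: postal_provinces/
- National administrative division and urban-rural classification codes (2015): china_geo_code/
- Collection of Chinese idioms (chengyu): chengyu/
- Chinese personal names, plus character names from Jin Yong's novels, Romance of the Three Kingdoms, and Dream of the Red Chamber: chi_names/
- Sample data for Chinese named entity recognition: NER_chi/
- Sample data for Chinese relation recognition: relation_multiple_chi/
- Sample data for Chinese reading comprehension: reading_comprehension_chi/
- Chinese visual question answering data (based on MSCOCO): Chinese_Visual_QA_pairs/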