wainshine / Species-Names-Corpus

License: Apache-2.0
Species names corpus: plant names and animal names.

Projects that are alternatives to or similar to Species-Names-Corpus

Company Names Corpus
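Company names corpus, also covering organization names: company short names, abbreviations, brand terms, and enterprise names. Useful for Chinese word segmentation and organization named-entity recognition.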
Stars: ✭ 868 (+3673.91%)
Mutual labels:  corpus, dataset, dict
Chinese Names Corpus
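Chinese personal names corpus and name generator. Covers Chinese full names, surnames, given names, forms of address, Japanese names, transliterated names, and English names. Useful for Chinese word segmentation and person named-entity recognition.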
Stars: ✭ 3,053 (+13173.91%)
Mutual labels:  corpus, dataset, dict
Medical-Names-Corpus
Medical corpus: names of medical institutions and drug standard codes.
Stars: ✭ 26 (+13.04%)
Mutual labels:  corpus, dataset, dict
Insuranceqa Corpus Zh
🚁 Insurance-industry corpus for chatbots
Stars: ✭ 821 (+3469.57%)
Mutual labels:  corpus, dataset
Cluepretrainedmodels
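A collection of high-quality Chinese pre-trained models: state-of-the-art large models, the fastest small models, and specialized similarity models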
Stars: ✭ 493 (+2043.48%)
Mutual labels:  corpus, dataset
Nlp chinese corpus
Large Scale Chinese Corpus for NLP
Stars: ✭ 6,656 (+28839.13%)
Mutual labels:  corpus, dataset
Ua Gec
UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language
Stars: ✭ 108 (+369.57%)
Mutual labels:  corpus, dataset
Coarij
Corpus of Annual Reports in Japan
Stars: ✭ 55 (+139.13%)
Mutual labels:  corpus, dataset
Awesome Hungarian Nlp
A curated list of NLP resources for Hungarian
Stars: ✭ 121 (+426.09%)
Mutual labels:  corpus, dataset
Prosody
Helsinki Prosody Corpus and A System for Predicting Prosodic Prominence from Text
Stars: ✭ 139 (+504.35%)
Mutual labels:  corpus, dataset
Gossiping Chinese Corpus
Chinese question-answer corpus from the PTT Gossiping board
Stars: ✭ 137 (+495.65%)
Mutual labels:  corpus, dataset
Clue
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
Stars: ✭ 2,425 (+10443.48%)
Mutual labels:  corpus, dataset
Fakenewscorpus
A dataset of millions of news articles scraped from a curated list of data sources.
Stars: ✭ 255 (+1008.7%)
Mutual labels:  corpus, dataset
Dataset List
lists of text corpus and more (mainly Japanese)
Stars: ✭ 84 (+265.22%)
Mutual labels:  corpus, dataset
Dialog corpus
Datasets for training Chinese and English chatbot (dialogue) systems
Stars: ✭ 1,662 (+7126.09%)
Mutual labels:  corpus, dataset
Indonesian Nlp Resources
Data resources for Indonesian NLP
Stars: ✭ 143 (+521.74%)
Mutual labels:  corpus, dataset
Nlp bahasa resources
A Curated List of Dataset and Usable Library Resources for NLP in Bahasa Indonesia
Stars: ✭ 158 (+586.96%)
Mutual labels:  corpus, dataset
CLUEmotionAnalysis2020
CLUE Emotion Analysis Dataset: a fine-grained sentiment analysis dataset
Stars: ✭ 3 (-86.96%)
Mutual labels:  corpus
OneStopEnglishCorpus
No description or website provided.
Stars: ✭ 38 (+65.22%)
Mutual labels:  corpus
PoetryCorpus
Poetry corpus of the Russian language
Stars: ✭ 40 (+73.91%)
Mutual labels:  corpus

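Species Names Corpus (Species-Names-Corpus)

A by-product of the hobby project 萌名NameMoe, a corpus-based naming tool.

Updated irregularly. Updates only remove entries; no new entries are added.

Useful for Chinese word segmentation and species-name recognition.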
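
One way a name list like this could be used for species-name recognition is plain dictionary lookup with forward maximum matching. The sketch below is only an illustration under stated assumptions: the corpus is assumed to be available as one name per line, the sample entries are hypothetical stand-ins rather than lines quoted from the corpus, and the function names `load_names` and `find_species_names` are invented for this example.

```python
from typing import Iterable, List, Set, Tuple


def load_names(lines: Iterable[str]) -> Set[str]:
    """Build a lookup set from lines, assuming one name per line."""
    return {line.strip() for line in lines if line.strip()}


def find_species_names(text: str, names: Set[str]) -> List[Tuple[int, str]]:
    """Forward maximum matching: at each position, prefer the longest entry."""
    max_len = max((len(n) for n in names), default=0)
    found: List[Tuple[int, str]] = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest possible candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in names:
                match = candidate
                break
        if match:
            found.append((i, match))
            i += len(match)  # skip past the matched name
        else:
            i += 1  # no entry starts here; move one character on
    return found


# Hypothetical sample entries standing in for lines of the corpus file.
sample = ["银杏", "大熊猫", "熊猫", "东北虎"]
names = load_names(sample)
print(find_species_names("大熊猫喜欢在银杏树下休息", names))
```

In practice the sample list would be replaced by the lines of the corpus file, for example `load_names(open(path, encoding="utf-8"))`, where `path` points to wherever the corpus was downloaded. Because the corpus is said to still contain bad cases, matches from a lookup like this are best treated as candidates rather than confirmed species names.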


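Species Names Corpus (Species-Names-Corpus)

Data size: 200,000 entries.

Source: compiled from multiple dictionaries.

Data cleaning: cleaned, but a large number of bad cases remain.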
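
Since bad cases remain, users may want a light filtering pass of their own before relying on the list. The source does not say what the bad cases look like, so the heuristics below (deduplication, a length bound, and a CJK-only check) are assumptions chosen purely for illustration, and `drop_suspect_entries` is a hypothetical helper name.

```python
import re
from typing import Iterable, List

# Assumption: a valid entry consists only of common CJK ideographs.
CJK_ONLY = re.compile(r"[\u4e00-\u9fff]+")


def drop_suspect_entries(
    entries: Iterable[str], min_len: int = 1, max_len: int = 15
) -> List[str]:
    """Keep the first occurrence of each entry that passes simple sanity checks."""
    seen = set()
    kept: List[str] = []
    for raw in entries:
        name = raw.strip()
        if name in seen:
            continue  # duplicate entry
        if not (min_len <= len(name) <= max_len):
            continue  # empty or implausibly long (the bounds are assumptions)
        if not CJK_ONLY.fullmatch(name):
            continue  # contains Latin letters, digits, or punctuation
        seen.add(name)
        kept.append(name)
    return kept


# Hypothetical input lines, not taken from the corpus.
print(drop_suspect_entries(["银杏", "银杏", "abc123", " 东北虎 ", ""]))
```

The CJK-only rule would also discard legitimate names that contain Latin letters or rarer characters outside that range, so the thresholds and the pattern should be adjusted after inspecting the actual data.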


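Please do not submit politically sensitive issues:

We can't afford the trouble. Thank you~

Any such entries still in the corpus will be removed gradually in later updates.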


Update history:

Removed some bad cases. -2019.07.27

Removed some bad cases. -2020.12.13


Compiled by @萌名NameMoe

2020.12.13
