Nlp chinese corpus - Large Scale Chinese Corpus for NLP
Stars: ✭ 6,656 (+1250.1%)
Clue - Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
Stars: ✭ 2,425 (+391.89%)
Musical Onset Efficient - Supplementary information and code for the paper: An efficient deep learning model for musical onset detection
Stars: ✭ 26 (-94.73%)
Chatito - 🎯🗯 Generate datasets for AI chatbots, NLP tasks, named entity recognition or text classification models using a simple DSL!
Stars: ✭ 678 (+37.53%)
Prosody - Helsinki Prosody Corpus and A System for Predicting Prosodic Prominence from Text
Stars: ✭ 139 (-71.81%)
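Chinese Names Corpus - Chinese personal name corpus and name generator. Covers Chinese full names, surnames, given names, forms of address, Japanese names, transliterated names, and English names. Useful for Chinese word segmentation and person-name entity recognition.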
Stars: ✭ 3,053 (+519.27%)
Cluener2020 - CLUENER2020: Chinese Fine-Grained Named Entity Recognition
Stars: ✭ 689 (+39.76%)
TV4Dialog - No description or website provided.
Stars: ✭ 33 (-93.31%)
Lightnlp - A deep learning framework for natural language processing based on PyTorch and torchtext.
Stars: ✭ 739 (+49.9%)
Gensim Data - Data repository for pretrained NLP models and NLP corpora.
Stars: ✭ 622 (+26.17%)
Dialog corpus - Datasets for training Chinese and English chatbot systems
Stars: ✭ 1,662 (+237.12%)
Dialogrpt - EMNLP 2020: "Dialogue Response Ranking Training with Large-Scale Human Feedback Data"
Stars: ✭ 216 (-56.19%)
Datasets - Poetry-related datasets developed by THUAIPoet (Jiuge) group.
Stars: ✭ 111 (-77.48%)
Weibo terminater - Final Weibo Crawler: scrape anything from Weibo, including comments, post contents, and followers. The Terminator
Stars: ✭ 2,295 (+365.52%)
Filipino-Text-Benchmarks - Open-source benchmark datasets and pretrained transformer models in the Filipino language.
Stars: ✭ 22 (-95.54%)
AiSpace - AiSpace: Better practices for deep learning model development and deployment for TensorFlow 2.0
Stars: ✭ 28 (-94.32%)
OpenDialog - An Open-Source Package for Chinese Open-domain Conversational Chatbot (Chinese chit-chat dialogue system with one-click deployment of a WeChat chit-chat bot)
Stars: ✭ 94 (-80.93%)
Fakenewscorpus - A dataset of millions of news articles scraped from a curated list of data sources.
Stars: ✭ 255 (-48.28%)
Nlp Recipes - Natural Language Processing Best Practices & Examples
Stars: ✭ 5,783 (+1073.02%)
Pytorch-NLU - Pytorch-NLU, a Chinese text classification and sequence labeling toolkit. It supports multi-class and multi-label classification of long and short Chinese text, as well as sequence labeling tasks such as Chinese named entity recognition, part-of-speech tagging, and word segmentation.
Stars: ✭ 151 (-69.37%)
Ua Gec - UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language
Stars: ✭ 108 (-78.09%)
Dataset List - Lists of text corpora and more (mainly Japanese)
Stars: ✭ 84 (-82.96%)
Coarij - Corpus of Annual Reports in Japan
Stars: ✭ 55 (-88.84%)
Hdltex - HDLTex: Hierarchical Deep Learning for Text Classification
Stars: ✭ 191 (-61.26%)
Nlp bahasa resources - A Curated List of Dataset and Usable Library Resources for NLP in Bahasa Indonesia
Stars: ✭ 158 (-67.95%)
Eda nlp for chinese - An implementation of the EDA paper for Chinese corpora. EDA data augmentation tool for Chinese text. NLP data augmentation. Paper reading notes.
Stars: ✭ 660 (+33.87%)
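Nlp xiaojiang - Natural language processing (NLP), Xiaojiang bot (retrieval-based chit-chat chatbot), BERT sentence embeddings and similarity (Sentence Similarity), XLNet sentence embeddings and similarity (text xlnet embedding), text classification, entity extraction (NER, BERT+BiLSTM+CRF), data augmentation (text augment, data enhance), paraphrase and synonym generation, sentence main-part extraction (mainpart), Chinese short-text similarity, text feature engineering, keras-http-service calls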
Stars: ✭ 954 (+93.51%)
Gpt2 Ml - GPT2 for Multiple Languages, including pretrained models. Includes a 1.5-billion-parameter Chinese pretrained model.
Stars: ✭ 1,066 (+116.23%)
Cluecorpus2020 - Large-scale Pre-training Corpus for Chinese (100G)
Stars: ✭ 278 (-43.61%)
CBLUE - CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark
Stars: ✭ 379 (-23.12%)
Chinese Text Classification - Chinese-Text-Classification: Chinese text classification implemented with a TensorFlow CNN (convolutional neural network). QQ group: 522785813; WeChat group QR code: http://www.tensorflownews.com/
Stars: ✭ 284 (-42.39%)
Corpora - A collection of small corpuses of interesting data for the creation of bots and similar stuff.
Stars: ✭ 4,293 (+770.79%)
Zh.javascript.info - The Modern JavaScript Tutorial
Stars: ✭ 5,656 (+1047.26%)
Squad Explorer - Visually Explore the Stanford Question Answering Dataset
Stars: ✭ 421 (-14.6%)
Seq2seqchatbots - A wrapper around tensor2tensor to flexibly train, interact, and generate data for neural chatbots.
Stars: ✭ 466 (-5.48%)
Mmf - A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
Stars: ✭ 4,713 (+855.98%)
Keras Text - Text Classification Library in Keras
Stars: ✭ 421 (-14.6%)
Zhparser - zhparser is a PostgreSQL extension for full-text search of Chinese language
Stars: ✭ 418 (-15.21%)
Wuhan 2019 Ncov - 2019-nCoV novel coronavirus: daily statistics at the national, provincial, and city levels from 2019-12-01 to the present (API access supported)
Stars: ✭ 414 (-16.02%)
Microservices - Microservices from Design to Deployment (Chinese edition)
Stars: ✭ 4,637 (+840.57%)
Lidar Bonnetal - Semantic and Instance Segmentation of LiDAR point clouds for autonomous driving
Stars: ✭ 465 (-5.68%)