Datasets: Poetry-related datasets developed by THUAIPoet (Jiuge) group.
Stars: ✭ 111 (-60.07%)
Clue: Chinese Language Understanding Evaluation Benchmark (datasets, baselines, pre-trained models, corpus, and leaderboard)
Stars: ✭ 2,425 (+772.3%)
Weibo terminater: Final Weibo Crawler. Scrape anything from Weibo: comments, Weibo contents, followers, anything. The Terminator
Stars: ✭ 2,295 (+725.54%)
TV4Dialog: No description or website provided.
Stars: ✭ 33 (-88.13%)
Nlp chinese corpus: Large Scale Chinese Corpus for NLP
Stars: ✭ 6,656 (+2294.24%)
CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark for Chinese medical information processing
Stars: ✭ 379 (+36.33%)
OpenDialog: An Open-Source Package for Chinese Open-domain Conversational Chatbot (a Chinese chit-chat dialogue system with one-click deployment of a WeChat chit-chat bot)
Stars: ✭ 94 (-66.19%)
covid-19-data-cleanup: Scripts to clean up data from https://github.com/CSSEGISandData/COVID-19
Stars: ✭ 25 (-91.01%)
hkcs: Hong Kong Character Set Project (HKCS), a community-compiled Hong Kong character set
Stars: ✭ 29 (-89.57%)
fastmorph: Fast corpus search engine originally made for the Corpus of Written Tatar
Stars: ✭ 14 (-94.96%)
Hub: Dataset format for AI. Build, manage, and visualize datasets for deep learning. Stream data in real time to PyTorch/TensorFlow and version-control it. https://activeloop.ai
Stars: ✭ 4,003 (+1339.93%)
newsletter-archive: Markdown archive & RSS/Atom feeds for Data Is Plural.
Stars: ✭ 65 (-76.62%)
open-discourse: Open Discourse is the first fully comprehensive corpus of the plenary proceedings of the federal German Parliament (Bundestag).
Stars: ✭ 47 (-83.09%)
huozi.js: A simple typography engine for CJK languages, designed especially for rich text in games.
Stars: ✭ 135 (-51.44%)
FewCLUE: FewCLUE few-shot learning evaluation benchmark, Chinese version
Stars: ✭ 251 (-9.71%)
TSForecasting: This repository contains the implementations for experiments on a set of publicly available datasets used in time series forecasting research.
Stars: ✭ 53 (-80.94%)
Korpora: Korean corpus repository
Stars: ✭ 270 (-2.88%)
Roapi: Create full-fledged APIs for static datasets without writing a single line of code.
Stars: ✭ 253 (-8.99%)
datasets: TFDS data loaders for sign language datasets.
Stars: ✭ 17 (-93.88%)
dialogue-datasets: Collects open dialog corpora and some useful data processing utilities.
Stars: ✭ 24 (-91.37%)
wordfish-python: Extract relationships from standardized terms from a corpus of interest with deep learning 🐟
Stars: ✭ 19 (-93.17%)
dbcollection: A collection of popular datasets for deep learning.
Stars: ✭ 26 (-90.65%)
Xmorse: 🌞 ~1.5Kb morse code library for all. A JavaScript library for Morse code encoding that supports Unicode Chinese.
Stars: ✭ 266 (-4.32%)
NetEmb-Datasets: A collection of real-world networks/graphs for Network Embedding
Stars: ✭ 18 (-93.53%)
awesome-hokchew: A curated list of resources about the Hokchew / Foochow language (the Fuzhou dialect of Eastern Min).
Stars: ✭ 16 (-94.24%)
Meglass: An eyeglass face dataset collected and cleaned for face recognition evaluation, CCBR 2018.
Stars: ✭ 281 (+1.08%)
recurrent-defocus-deblurring-synth-dual-pixel: Reference GitHub repository for the paper "Learning to Reduce Defocus Blur by Realistically Modeling Dual-Pixel Data". We propose a procedure to generate realistic DP data synthetically. Our synthesis approach mimics the optical image formation found on DP sensors and can be applied to virtual scenes rendered with standard computer software. Lev…
Stars: ✭ 30 (-89.21%)
podium: Podium, a framework-agnostic Python NLP library for data loading and preprocessing
Stars: ✭ 55 (-80.22%)
disent: 🧶 Modular VAE disentanglement framework for Python built with PyTorch Lightning ▸ Including metrics and datasets ▸ With strongly supervised, weakly supervised and unsupervised methods ▸ Easily configured and run with Hydra config ▸ Inspired by disentanglement_lib
Stars: ✭ 41 (-85.25%)
Indian ParallelCorpus: Curated list of publicly available parallel corpora for Indian languages
Stars: ✭ 23 (-91.73%)
DeepSentiPers: Repository for the experiments described in the paper "DeepSentiPers: Novel Deep Learning Models Trained Over Proposed Augmented Persian Sentiment Corpus"
Stars: ✭ 17 (-93.88%)
Overview: The history, current state, and outlook of programming in Chinese. Related questions are discussed in the issues.
Stars: ✭ 282 (+1.44%)
dplace-data: The data repository for the D-PLACE Project (Database of Places, Language, Culture and Environment)
Stars: ✭ 49 (-82.37%)
Filipino-Text-Benchmarks: Open-source benchmark datasets and pretrained transformer models in the Filipino language.
Stars: ✭ 22 (-92.09%)
download audioset: 📁 This repo makes it easy to download the raw audio files from AudioSet (32.45 GB, 632 classes).
Stars: ✭ 53 (-80.94%)
Swift: Essentials for getting started with Swift app development
Stars: ✭ 257 (-7.55%)
cn.jenkins.io: Chinese version of the website
Stars: ✭ 30 (-89.21%)
ml4se: A curated list of papers, theses, datasets, and tools related to the application of Machine Learning for Software Engineering
Stars: ✭ 46 (-83.45%)
opendatasets: A Python library for downloading datasets from Kaggle, Google Drive, and other online sources.
Stars: ✭ 161 (-42.09%)
Fakenewscorpus: A dataset of millions of news articles scraped from a curated list of data sources.
Stars: ✭ 255 (-8.27%)
SpiCE-Corpus: An open-access corpus of conversational bilingual speech in Cantonese and English
Stars: ✭ 33 (-88.13%)