Nlp chinese corpus - Large Scale Chinese Corpus for NLP
Stars: ✭ 6,656 (+118.02%)
Dialog corpus - Datasets for training Chinese and English chatbot systems
Stars: ✭ 1,662 (-45.56%)
Ua Gec - UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language
Stars: ✭ 108 (-96.46%)
Cluener2020 - CLUENER2020: Chinese Fine-Grained Named Entity Recognition
Stars: ✭ 689 (-77.43%)
Bond - BOND: BERT-Assisted Open-Domain Named Entity Recognition with Distant Supervision
Stars: ✭ 96 (-96.86%)
Fakenewscorpus - A dataset of millions of news articles scraped from a curated list of data sources.
Stars: ✭ 255 (-91.65%)
Coarij - Corpus of Annual Reports in Japan
Stars: ✭ 55 (-98.2%)
Prosody - Helsinki Prosody Corpus and a System for Predicting Prosodic Prominence from Text
Stars: ✭ 139 (-95.45%)
Dataset List - Lists of text corpora and more (mainly Japanese)
Stars: ✭ 84 (-97.25%)
Clue - Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
Stars: ✭ 2,425 (-20.57%)
Nlp bahasa resources - A Curated List of Datasets and Usable Library Resources for NLP in Bahasa Indonesia
Stars: ✭ 158 (-94.82%)
Collection - Collection Data for Cooper Hewitt, Smithsonian Design Museum
Stars: ✭ 214 (-92.99%)
Covid Chestxray Dataset - We are building an open database of COVID-19 cases with chest X-ray or CT images.
Stars: ✭ 2,759 (-9.63%)
Datatable - A Go in-memory table
Stars: ✭ 215 (-92.96%)
Dialogrpt - EMNLP 2020: "Dialogue Response Ranking Training with Large-Scale Human Feedback Data"
Stars: ✭ 216 (-92.92%)
Dict - Chinese and English translation tools for the command line
Stars: ✭ 243 (-92.04%)
University1652 Baseline - ACM Multimedia 2020. University-1652: A Multi-view Multi-source Benchmark for Drone-based Geo-localization 🚁. Annotates 1,652 buildings in 72 universities around the world.
Stars: ✭ 232 (-92.4%)
Ava downloader - ⏬ Download the AVA dataset (A Large-Scale Database for Aesthetic Visual Analysis)
Stars: ✭ 214 (-92.99%)
Ner Datasets - Datasets to train supervised classifiers for Named-Entity Recognition in different languages (Portuguese, German, Dutch, French, English)
Stars: ✭ 220 (-92.79%)
Covid 19 Repo Data - Data archive of identifiable COVID-19 related public projects on GitHub
Stars: ✭ 236 (-92.27%)
Bccd dataset - BCCD (Blood Cell Count and Detection) Dataset is a small-scale dataset for blood cell detection.
Stars: ✭ 216 (-92.92%)
Cocostuff10k - The official homepage of the (outdated) COCO-Stuff 10K dataset.
Stars: ✭ 248 (-91.88%)
Dataset Serialize - JSON to DataSet and DataSet to JSON converter for Delphi and Lazarus (FPC)
Stars: ✭ 213 (-93.02%)
Short Jokes Dataset - Python scripts for building the 'Short Jokes' dataset, featured on Kaggle
Stars: ✭ 215 (-92.96%)
Text - Data loaders and abstractions for text and NLP
Stars: ✭ 2,915 (-4.52%)
Datalad - Keep code, data, containers under control with git and git-annex
Stars: ✭ 234 (-92.34%)
Spacy Lookup - Named Entity Recognition based on dictionaries
Stars: ✭ 212 (-93.06%)
Pottery - Redis for humans. 🌎🌍🌏
Stars: ✭ 204 (-93.32%)
Taco - 🌮 Trash Annotations in Context Dataset Toolkit
Stars: ✭ 243 (-92.04%)
Datasets - source{d} datasets ("big code") for source code analysis and machine learning on source code
Stars: ✭ 231 (-92.43%)
Omnianomaly - KDD 2019: Robust Anomaly Detection for Multivariate Time Series through Stochastic Recurrent Neural Network
Stars: ✭ 208 (-93.19%)
Structured3d - [ECCV'20] Structured3D: A Large Photo-realistic Dataset for Structured 3D Modeling
Stars: ✭ 224 (-92.66%)
Charlatan - Create fake data in R
Stars: ✭ 209 (-93.15%)
Cities.json - Cities of the world in JSON, based on the GeoNames Gazetteer
Stars: ✭ 251 (-91.78%)
Recommendersystem Dataset - This repository contains some datasets that I have collected for recommender systems.
Stars: ✭ 249 (-91.84%)
Retriever - Quickly download, clean up, and install public datasets into a database management system
Stars: ✭ 241 (-92.11%)
Webstruct - NER toolkit for HTML data
Stars: ✭ 230 (-92.47%)
Mini Imagenet Tools - Tools for generating the mini-ImageNet dataset and processing batches
Stars: ✭ 209 (-93.15%)
Python Benedict - dict subclass with keylist/keypath support, I/O shortcuts (base64, csv, json, pickle, plist, query-string, toml, xml, yaml) and many utilities. 📘
Stars: ✭ 204 (-93.32%)
Weatherbench - A benchmark dataset for data-driven weather forecasting
Stars: ✭ 227 (-92.56%)
Covid19za - Coronavirus COVID-19 (2019-nCoV) Data Repository and Dashboard for South Africa
Stars: ✭ 208 (-93.19%)
Malaya - Natural Language Toolkit for Bahasa Malaysia, https://malaya.readthedocs.io/
Stars: ✭ 239 (-92.17%)
Stocknet Dataset - A comprehensive dataset for stock movement prediction from tweets and historical stock prices.
Stars: ✭ 228 (-92.53%)
Split Folders - 🗂 Split folders with files (e.g. images) into training, validation and test (dataset) folders
Stars: ✭ 203 (-93.35%)
Tech.ml.dataset - A high-performance data processing system for Clojure
Stars: ✭ 205 (-93.29%)