Top 499 dataset open source projects

Chinese Names Corpus
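Chinese personal names corpus. Name generator. Chinese full names, surnames, given names, forms of address, Japanese names, transliterated names, English names. Useful for Chinese word segmentation and person-name entity recognition.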
Datasets
TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
Cities.json
Cities of the world in JSON, based on GeoNames Gazetteer
Text
Data loaders and abstractions for text and NLP
Recommendersystem Dataset
This repository contains datasets that I have collected for recommender systems.
Cocostuff10k
The official homepage of the (outdated) COCO-Stuff 10K dataset.
Taco
🌮 Trash Annotations in Context Dataset Toolkit
Retriever
Quickly download, clean up, and install public datasets into a database management system
Chazutsu
The tool to make NLP datasets ready to use
Covid 19 Repo Data
Data archive of identifiable COVID-19 related public projects on GitHub
Covid Chestxray Dataset
We are building an open database of COVID-19 cases with chest X-ray or CT images.
University1652 Baseline
ACM Multimedia 2020. University-1652: A Multi-view Multi-source Benchmark for Drone-based Geo-localization 🚁. Annotates 1652 buildings in 72 universities around the world.
Datalad
Keep code, data, containers under control with git and git-annex
Datasets
source{d} datasets ("big code") for source code analysis and machine learning on source code
Structured3d
[ECCV'20] Structured3D: A Large Photo-realistic Dataset for Structured 3D Modeling
Weatherbench
A benchmark dataset for data-driven weather forecasting
Stocknet Dataset
A comprehensive dataset for stock movement prediction from tweets and historical stock prices.
Vehicle reid Collection
🚗 A collection of vehicle re-ID papers and datasets. 🚗
Torchdata
PyTorch dataset extended with map, cache, etc. (tensorflow.data-like)
Stationary
Get hourly meteorological data from one of thousands of global stations
Automated Resume Screening System
Automated Resume Screening System using Machine Learning (With Dataset)
H36m Fetch
Human 3.6M 3D human pose dataset fetcher
Collection
Collection Data for Cooper Hewitt, Smithsonian Design Museum
Bccd dataset
BCCD (Blood Cell Count and Detection) Dataset is a small-scale dataset for blood cell detection.
Dataset Serialize
JSON to DataSet and DataSet to JSON converter for Delphi and Lazarus (FPC)
Dialogrpt
EMNLP 2020: "Dialogue Response Ranking Training with Large-Scale Human Feedback Data"
Short Jokes Dataset
Python scripts for building 'Short Jokes' dataset, featured on Kaggle
Ava downloader
⏬ Download AVA dataset (A Large-Scale Database for Aesthetic Visual Analysis)
Omnianomaly
KDD 2019: Robust Anomaly Detection for Multivariate Time Series through Stochastic Recurrent Neural Network
Charlatan
Create fake data in R
Mini Imagenet Tools
Tools for generating mini-ImageNet dataset and processing batches
Covid19za
Coronavirus COVID-19 (2019-nCoV) Data Repository and Dashboard for South Africa
Split Folders
🗂 Split folders with files (i.e. images) into training, validation and test (dataset) folders
Tech.ml.dataset
A high-performance data processing system for Clojure
Trump Lies
Tutorial: Web scraping in Python with Beautiful Soup
Awesome Json Datasets
A curated list of awesome JSON datasets that don't require authentication.
Dali
DALI: a large Dataset of synchronised Audio, LyrIcs and vocal notes.
Data Set
State-driven, all-in-one data processing for data visualization
Fifa18 All Player Statistics
A complete catalog of all the players in Fifa 18 and their complete statistics.
Mutual
A Dataset for Multi-Turn Dialogue Reasoning
Intrinsic Image Popularity
PyTorch code for the paper "Intrinsic Image Popularity Assessment"
Sign Language Digits Dataset
Turkey Ankara Ayrancı Anadolu High School's Sign Language Digits Dataset
Sice
Learning a Deep Single Image Contrast Enhancer from Multi-Exposure Images (TIP 2018)
Msmarco
Utilities, Baselines, Statistics and Descriptions Related to the MSMARCO DATASET
Everypolitician Data
Data on national legislatures worldwide
Datasets For Good
List of datasets to apply stats/machine learning/technology to the world of social good.
Hand pose action
Dataset and code for the paper "First-Person Hand Action Benchmark with RGB-D Videos and 3D Hand Pose Annotations", CVPR 2018.
Data Science Resources
👨🏽‍🏫 Learn what data science is and why it matters in today's world. Are you interested in data science? 🔋
Faker
Faker is a Python package that generates fake data for you.
Mirdata
Python library to work with Music Information Retrieval datasets
1-60 of 499 dataset projects