Top 765 dataset open source projects

Pycococreator
Helper functions to create COCO datasets
Pokemon.json
Pokemon dataset in JSON.
Cdap
An open source framework for building data analytic applications.
Voice datasets
🔊 A comprehensive list of open-source datasets for voice and sound computing (50+ datasets).
Cluepretrainedmodels
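A collection of high-quality Chinese pre-trained models: state-of-the-art large models, the fastest small models, and specialized similarity models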
Tensorflow object tracking video
Object tracking in TensorFlow (localization, detection, classification), developed to participate in the ImageNet VID competition
Chinese rumor dataset
Chinese rumor data
Seq2seqchatbots
A wrapper around tensor2tensor to flexibly train, interact, and generate data for neural chatbots.
Lidar Bonnetal
Semantic and Instance Segmentation of LiDAR point clouds for autonomous driving
Mongodb Json Files
📦 A curated list of JSON / BSON datasets from the web in order to practice / use in MongoDB
Joke Dataset
A dataset of 200k English plaintext jokes.
Inat comp
iNaturalist competition details
Quickdraw Dataset
Documentation on how to access and use the Quick, Draw! Dataset.
Io
Dataset, streaming, and file system extensions maintained by TensorFlow SIG-IO
Squad Explorer
Visually Explore the Stanford Question Answering Dataset
Awesome Remote Sensing Change Detection
List of datasets, codes, and contests related to remote sensing change detection
Wuhan 2019 Ncov
2019-nCoV novel coronavirus: daily statistics at the national, provincial, and city levels from 2019-12-01 to the present (API access supported)
Imdb Face
A new large-scale noise-controlled face recognition dataset.
Free Spoken Digit Dataset
A free audio dataset of spoken digits. Think MNIST for audio.
Cmu Multimodalsdk
CMU MultimodalSDK is a machine learning platform for development of advanced multimodal models as well as easily accessing and processing multimodal datasets.
Comma2k19
A driving dataset for the development and validation of fused pose estimators and mapping algorithms
Vpgnet
VPGNet: Vanishing Point Guided Network for Lane and Road Marking Detection and Recognition (ICCV 2017)
Tfrecord
TFRecord reader for PyTorch
Trashnet
Dataset of images of trash; Torch-based CNN for garbage image classification
Data
Python related videos and metadata powering =>
Dukemtmc Reid evaluation
ICCV2017 The Person re-ID Evaluation Code for DukeMTMC-reID Dataset (Including Dataset Download)
Medmnist
[ISBI'21] MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis
Dsprites Dataset
Dataset to assess the disentanglement properties of unsupervised learning methods
Eseur Code Data
Code and data used to create the examples in "Evidence-based Software Engineering based on the publicly available data"
Pcam
The PatchCamelyon (PCam) deep learning classification benchmark.
Deeperforensics 1.0
[CVPR 2020] A Large-Scale Dataset for Real-World Face Forgery Detection
Atsd Use Cases
Axibase Time Series Database: Usage Examples and Research Articles
Whylogs
Profile and monitor your ML data pipeline end-to-end
Browser Compat Data
This repository contains compatibility data for Web technologies as displayed on MDN
Awesome Segmentation Saliency Dataset
A collection of datasets for segmentation / saliency detection. PRs are welcome... 😄
Transportationnetworks
Transportation Networks for Research
Covid19 twitter
Covid-19 Twitter dataset for non-commercial research use and pre-processing scripts - under active development
Css10
CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages
Datasets
A repository of pretty cool datasets that I collected for network science and machine learning research.
Cryptocmd
Cryptocurrency historical price data library in Python. Data from https://coinmarketcap.com.
Tape
Tasks Assessing Protein Embeddings (TAPE), a set of five biologically relevant semi-supervised learning tasks spread across different domains of protein biology.
Linusrants
Dataset of Linus Torvalds' rants classified by negativity using sentiment analysis
Surface Defect Detection
🐎📈 A continually updated summary of open-source datasets and key papers in the field of surface defect research. 🐋
Text2sql Data
A collection of datasets that pair questions with SQL queries.
Realsr
Toward Real-World Single Image Super-Resolution: A New Benchmark and A New Model (ICCV 2019)
Tehran Stocks
A Python package for accessing tsetmc data
Meglass
An eyeglass face dataset collected and cleaned for face recognition evaluation, CCBR 2018.
Fx 1 Minute Data
HISTDATA: a full dataset of 68 FX trading pairs, with a simple API to retrieve 1-minute historical FX prices (up to June 2019).
Knowage Server
Knowage is the professional open source suite for modern business analytics over traditional sources and big data systems.
Data Science Hacks
Data Science Hacks is a collection of tips and tricks to help you become a better data scientist, at any level from beginner to advanced. It covers Python, Jupyter Notebook, pandas hacks, and so on.
Exclusively Dark Image Dataset
The Exclusively Dark (ExDARK) dataset, which to the best of our knowledge is the largest collection to date of low-light images, taken in conditions ranging from very low-light environments to twilight (i.e., 10 different conditions), with image-class and object-level annotations.
Semantic Kitti Api
SemanticKITTI API for visualizing dataset, processing data, and evaluating results.
Covid19canada
Epidemiological Data from the COVID-19 Epidemic in Canada