Top 765 dataset open source projects

the-seinfeld-chronicles
A dataset for textual analysis of arguably the best-written comedy television show ever.
Combinatorial-3D-Shape-Generation
The official repository of the paper "Combinatorial 3D Shape Generation via Sequential Assembly", presented at the NeurIPS 2020 Workshop on Machine Learning for Engineering Modeling, Simulation, and Design
shrec17
Supplementary code for SHREC 2017 RGB-D Object-to-CAD Retrieval track
RamaNet
Performs de novo protein design using machine learning and PyRosetta to generate a novel protein structure
ETDataset
The Electricity Transformer dataset was collected to support further investigation of the long-sequence forecasting problem.
dataset-ssvep-exoskeleton
SSVEP-based BCI recording of 12 subjects operating an upper limb exoskeleton during a shared control task. The exoskeleton is either controlled with a touchless interface detecting hand poses or with BCI.
awesome-indoor-farming
A curated list of awesome datasets, technologies, companies, and media about indoor farming.
SpatialSense
An Adversarially Crowdsourced Benchmark for Spatial Relation Recognition
budongsan
An R package providing real datasets and functions for analyzing actual housing transaction prices
vedai
This repository is for training TensorFlow models. The dataset is based on Vehicle Detection in Aerial Imagery (VEDAI).
tape-neurips2019
Tasks Assessing Protein Embeddings (TAPE), a set of five biologically relevant semi-supervised learning tasks spread across different domains of protein biology. (DEPRECATED)
Cross-Language-Dataset
A multilingual, multi-style and multi-granularity dataset for cross-language textual similarity detection
labelme2Datasets
Python scripts to convert labelme-generated JSON files to VOC/COCO-style datasets.
cia
🐱‍💻 CIA Factbook data analysis and dataset reconstruction, modification, and tuning go here.
doccano-transformer
The official tool for transforming doccano format into common dataset formats.
conferencias matutinas amlo
CSVs of the stenographic transcripts of the morning press conferences of President Andres Manuel López Obrador (Mañaneras AMLO)
satellite-crosswalk-classification
Deep Learning Based Large-Scale Automatic Satellite Crosswalk Classification (GRSL, 2017)
pc-part-dataset
A dataset of PC parts scraped from PCPartPicker
Phishing-Dataset
Phishing dataset with more than 88,000 instances and 111 features. Web application available at https://gregavrbancic.github.io/Phishing-Dataset/
metadat
Meta-analytic datasets for R
docker-dataset
Docker database images with pre-populated data for testing and/or practice.
Neural-Scam-Artist
Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.
public sport science datasets
An ongoing compilation of publicly available datasets for sport science projects.
FacialEmotionRecognition
Uses the Extended Cohn-Kanade AU-Coded Facial Expression Database to classify basic human facial emotion expressions with an artificial neural network (ANN)
TCPD
The Turing Change Point Dataset - A collection of time series for the evaluation and development of change point detection algorithms
fashion dataset
Fashionista Dataset for training machine learning models
sidechainnet
An all-atom protein structure dataset for machine learning.
chess-openings
An aggregated data set of chess opening names
RGBDAcquisition
A uniform library wrapper for input from V4L2, Freenect, OpenNI, OpenNI2, DepthSense, Intel Realsense, OpenGL simulations, and other types of video and depth input.
SQLiteHelper
🗄 This project comes in handy when you want to write SQL statements more easily and smartly.
recsys slates dataset
FINN.no Slate Dataset for Recommender Systems. A dataset containing all interactions (viewed items + response (clicked item / no click)) for users over a longer time horizon.
MVDet
[ECCV 2020] Codes and MultiviewX dataset for "Multiview Detection with Feature Perspective Transformation".
Speech-Corpus-Collection
A collection of speech corpora for ASR and TTS
PlantDoc-Dataset
Dataset used in "PlantDoc: A Dataset for Visual Plant Disease Detection", accepted at CODS-COMAD 2020
Chinese-Medical-QA-Data
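A Chinese-language disease diagnosis dataset (on the order of a million entries) that can be used for disease analysis and diagnosis among Chinese patients.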