Top 765 dataset open source projects

JD2Skills-BERT-XMLC
Code and dataset for Bhola et al. (2020), "Retrieving Skills from Job Descriptions: A Language Model Based Extreme Multi-label Classification Framework"
OpinRank
OpinRank dataset: user reviews of two entity types, cars and hotels. Full reviews from Tripadvisor (~259,000 reviews) and Edmunds (~42,230 reviews)
psidR
R package to easily build panel data sets from the PSID
d3d
Devkit for 3D: utilities for 3D object detection based on NumPy and PyTorch
STEAD
STanford EArthquake Dataset (STEAD): A Global Data Set of Seismic Signals for AI
6DOF tracking evaluation
Code to visualize and evaluate the dataset from "A Framework for Evaluating 6-DOF Object Trackers"
FARED for Anomaly Detection
Official source code of "Fast Adaptive RNN Encoder-Decoder for Anomaly Detection in SMD Assembly Machine"
StreamCat
Landscape features for ~2.65 million streams
awesome-georgian-datasets
Useful datasets specific to Georgia
make-your-yolov5 dataset
💥 Make your YOLOv5 dataset using labelImg. I hope my work helps you build your YOLOv5 datasets more quickly.
city-codes
Brazilian city names and official codes (IBGE, LexML, and others)
LegoBrickClassification
Repository to identify Lego bricks automatically using only images
trumptweets
Download data on all of Donald Trump's (@RealDonaldTrump) tweets
ir datasets
Provides a common interface to many IR ranking datasets.
deep-learning-german-tts
Thorsten-Voice: a free-to-use, offline, high-quality German TTS voice that should be available to every project without licensing hassles.
desh-data
Sequence lineage information extracted from the RKI sequence data repository
mddatasetbuilder
A script to build reference datasets for training neural network potentials from given LAMMPS trajectories.
Speaker-Anti-Spoofing-Classifiers
Baselines and classifiers for speaker anti-spoofing detection
multi-task-defocus-deblurring-dual-pixel-nimat
Reference GitHub repository for the paper "Improving Single-Image Defocus Deblurring: How Dual-Pixel Images Help Through Multi-Task Learning". We propose a single-image deblurring network that incorporates the two sub-aperture views into a multi-task framework. Specifically, we show that jointly learning to predict the two DP views from a single …
mysql-random-data-generator
A simple MySQL random test data generator. Load the stored procedure and execute it to auto-detect column types and load data.
conp-dataset
📂 A DataLad dataset for CONP
datumaro
Dataset Management Framework, a Python library and CLI tool to build, analyze, and manage computer vision datasets.
LeetCode
Currently contains scraped data from around 1,500 problems on the site. More to follow.
NLP PEMDC
NLP Pretrained Embeddings, Models and Datasets Collections (NLP_PEMDC). The collection will continue to be updated.
caption-contest-data
Data from the caption contest.
strategyqa
Official code for the TACL 2021 paper "Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies".
soundata
Python library for downloading, loading & working with sound datasets
MSMARCO-MRC-Analysis
Analysis of the MS MARCO leaderboard for the machine reading comprehension task.
DeepCrack
DeepCrack: A Deep Hierarchical Feature Learning Architecture for Crack Segmentation (Neurocomputing)
instacart-neo4j
Playing with Instacart data in Neo4j
NHSRdatasets
NHS and healthcare-related datasets for training and learning R
opencpop
Opencpop: A High-Quality Open Source Chinese Popular Song Database for Singing Voice Synthesis
multimodal-deep-learning-for-disaster-response
Damage Identification in Social Media Posts using Multimodal Deep Learning: code and dataset
spark-java8
Java 8 and Spark learning through examples
MAD
[ICLR 2020] Haotao Wang, Tianlong Chen, Zhangyang Wang, Kede Ma, "I Am Going MAD: Maximum Discrepancy Competition for Comparing Classifiers Adaptively"