DataProfiler - What's in your data? Extract schema, statistics and entities from datasets
JD2Skills-BERT-XMLC - Code and dataset for Bhola et al. (2020), "Retrieving Skills from Job Descriptions: A Language Model Based Extreme Multi-label Classification Framework"
OpinRank - OpinRank Dataset. Dataset containing user reviews of cars and hotels. Full reviews from Tripadvisor (~259,000 reviews) and Edmunds (~42,230 reviews)
Crema - Meta data server & client tools for game development
mtdata - A tool that locates, downloads, and extracts machine translation corpora
psidR - R package to easily build panel data sets from the PSID
factedit - 🧐 Code & Data for Fact-based Text Editing (Iso et al., ACL 2020)
d3d - Devkit for 3D -- Some utils for 3D object detection based on NumPy and PyTorch
STEAD - STanford EArthquake Dataset (STEAD): A Global Data Set of Seismic Signals for AI
6DOF tracking evaluation - Code to visualize and evaluate the dataset from "A Framework for Evaluating 6-DOF Object Trackers".
NN-scratch - Coding up a Neural Network Classifier from Scratch
StreamCat - Landscape features for ~2.65 million streams
make-your-yolov5 dataset - 💥 Make your yolov5 dataset by using labelimg. I hope my work can help you make your yolov5 datasets more quickly.
city-codes - Brazilian city names and official codes, IBGE, LexML and others
multi-task-learning - Multi-task learning for smile detection, age and gender classification on the GENKI4k and IMDB-Wiki datasets.
trumptweets - Download data on all of Donald Trump's (@RealDonaldTrump) tweets
ir datasets - Provides a common interface to many IR ranking datasets.
JSON2YOLO - Convert JSON annotations into YOLO format.
deep-learning-german-tts - Thorsten-Voice: A free-to-use, offline, high-quality German TTS voice should be available for every project without any licensing struggles.
desh-data - Sequence lineage information extracted from RKI sequence data repo
mddatasetbuilder - A script to build reference datasets for training neural network potentials from given LAMMPS trajectories.
AutoSweep - The implementation for the AutoSweep (TVCG 2018)
multi-task-defocus-deblurring-dual-pixel-nimat - Reference GitHub repository for the paper "Improving Single-Image Defocus Deblurring: How Dual-Pixel Images Help Through Multi-Task Learning". We propose a single-image deblurring network that incorporates the two sub-aperture views into a multitask framework. Specifically, we show that jointly learning to predict the two DP views from a single …
mysql-random-data-generator - This is the easiest MySQL random test data generator tool. Load the procedure and execute it to auto-detect column types and load data.
datumaro - Dataset Management Framework, a Python library and a CLI tool to build, analyze and manage Computer Vision datasets.
LeetCode - At present contains scraped data from around 1500 problems on the site. More to follow...
NLP PEMDC - NLP Pretrained Embeddings, Models and Datasets Collections (NLP_PEMDC). The collection will keep updating.
3D60 - Tools accompanying the 3D60 spherical panoramas dataset
strategyqa - The official code of TACL 2021, "Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies".
soundata - Python library for downloading, loading & working with sound datasets
MSMARCO-MRC-Analysis - Analysis on the MS-MARCO leaderboard regarding the machine reading comprehension task.
DeepCrack - DeepCrack: A Deep Hierarchical Feature Learning Architecture for Crack Segmentation, Neurocomputing.
stasis - Semantic Textual Similarity in Python
NHSRdatasets - NHS and healthcare related datasets for training and learning R
opencpop - Opencpop: A High-Quality Open Source Chinese Popular Song Database for Singing Voice Synthesis
MAD - [ICLR 2020] Haotao Wang, Tianlong Chen, Zhangyang Wang, Kede Ma, "I Am Going MAD: Maximum Discrepancy Competition for Comparing Classifiers Adaptively"