Top 765 dataset open source projects

Datagear
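A data visualization and analysis platform developed in Java with a browser/server architecture, supporting multiple data sources such as SQL, CSV, Excel, HTTP APIs, and JSON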
Game Datasets
🎮 A curated list of awesome game datasets and tools for artificial intelligence in games
Dataset Api
The ApolloScape Open Dataset for Autonomous Driving and its Application.
Awesome Msr
A curated repository of software engineering repository mining data sets
Fakenewscorpus
A dataset of millions of news articles scraped from a curated list of data sources.
NLPrep
🍳 NLPrep - dataset tool for many natural language processing tasks
StrayVisualizer
Visualize Data From Stray Scanner https://keke.dev/blog/2021/03/10/Stray-Scanner.html
Medical-Names-Corpus
Medical corpus: a corpus of medical institution names and national drug standard codes.
AITQA
Resources for the IBM Airlines Table-Question-Answering Benchmark
Open-korean-corpora
Open Korean NLP Dataset Curation for Users All Around the Globe
tracing-vs-freehand
Tracing Versus Freehand for Evaluating Computer-Generated Drawings (SIGGRAPH 2021)
OTT-QA
Code and Data for ICLR2021 Paper "Open Question Answering over Tables and Text"
BugZoo
Keep your bugs contained. A platform for studying historical software bugs.
corona-virus
A dataset for epidemiological research on coronavirus pneumonia
user quality
Dataset for Software Evolution and Quality Improvement
HJDataset
A Large Dataset of Historical Japanese Documents with Complex Layouts
MaskedFaceRepresentation
Masked face recognition focuses on identifying people by their facial features while they are wearing masks. We introduce benchmarks for face verification on masked face images to support the development of COVID-safe protocols in airports.
mxmortalitydb
A data-only R package containing all injury intent deaths registered in Mexico from 2004 to 2019
squad-v1.1-pt
Portuguese translation of the SQuAD dataset
pump-and-dump-dataset
Additional material for paper: Pump and Dumps in the Bitcoin Era: Real Time Detection of Cryptocurrency Market Manipulations, ICCCN '20
Audio-Classification-using-CNN-MLP
Multi-class audio classification using deep learning (MLP, CNN): the objective of this project is to build a multi-class classifier that identifies the sound of a bee, a cricket, or noise.
snorkeling
Extracting biomedical relationships from literature with Snorkel 🏊
pull facebook data for good
[DEPRECATED] Imitate an API for downloading data from Facebook Data For Good
recurrent-defocus-deblurring-synth-dual-pixel
Reference github repository for the paper "Learning to Reduce Defocus Blur by Realistically Modeling Dual-Pixel Data". We propose a procedure to generate realistic DP data synthetically. Our synthesis approach mimics the optical image formation found on DP sensors and can be applied to virtual scenes rendered with standard computer software. Lev…
TVQAplus
[ACL 2020] PyTorch code for TVQA+: Spatio-Temporal Grounding for Video Question Answering
climateR
An R 📦 for getting point and gridded climate data by AOI
Species-Names-Corpus
Species name corpus: plant names and animal names.
hyperspectral-soilmoisture-dataset
Hyperspectral and soil-moisture data from a field campaign based on a soil sample. Karlsruhe (Germany), 2017.
bravefrontier data
Holds extracted data for the game Brave Frontier (Global/JP/KR/EU)
Complete-Blood-Cell-Count-Dataset
The complete blood count (CBC) dataset contains a total of 360 annotated blood smear images of red blood cells (RBCs), white blood cells (WBCs), and platelets.
WeFEND-AAAI20
Dataset for paper "Weak Supervision for Fake News Detection via Reinforcement Learning" published in AAAI'2020.
COVID-19-Datasets
Novel Coronavirus (COVID-19) Cases for India, provided by University of Kalyani.
holiday jp
Japanese holiday datasets
KVQA
Korean Visual Question Answering
cspan data
A repo for tracking the number of followers of Congress, the Cabinet, and Governors
ebe-dataset
Evidence-based Explanation Dataset (AACL-IJCNLP 2020)
grasp multiObject
Robotic grasp dataset for multi-object, multi-grasp evaluation with RGB-D data. This dataset is annotated using the same protocol as the Cornell Dataset and can be used as a multi-object extension of the Cornell Dataset.
drone-net
https://towardsdatascience.com/tutorial-build-an-object-detection-system-using-yolo-9a930513643a
uctf
Unsupervised Controllable Text Generation (applied to text formalization)