Top 203 datasets open source projects

bnk48 photo datasets
BNK48 Photo Datasets
panoptic parts
This repository contains code and tools for reading, processing, evaluating on, and visualizing Panoptic Parts datasets. Moreover, it contains code for reproducing our CVPR 2021 paper results.
kaggle-code
A repository for some of the code I used in Kaggle data science and machine learning tasks.
awesome-sweden-datasets
A curated list of awesome datasets to use when coding for the Swedish market.
text-classification-small-datasets
Building a text classifier with extremely small datasets
PharmacoGx
R package to analyze large-scale pharmacogenomic datasets.
systematic-review-datasets
A collection of fully labeled systematic review datasets (title-abstract screening)
AIODrive
Official Python/PyTorch Implementation for "All-In-One Drive: A Large-Scale Comprehensive Perception Dataset with High-Density Long-Range Point Clouds"
PharmacoDB
Search across publicly available datasets to find instances where a drug or cell line of interest has been profiled.
traj-pred-irl
Official implementation code for "Regularizing neural networks for future trajectory prediction via IRL framework"
HINT3
This repository contains datasets and code for the paper "HINT3: Raising the bar for Intent Detection in the Wild", accepted at EMNLP-2020's Insights Workshop (https://insights-workshop.github.io/). A preprint of the paper is available at https://arxiv.org/abs/2009.13833
Text-Summarization-Repo
A repository that organizes the main research topics in text summarization, must-read papers, and available models and data, together with recommended resources.
let-it-be
A dataset on the mental health status of China's higher-education population
Three-Filters-to-Normal
Three-Filters-to-Normal: An Accurate and Ultrafast Surface Normal Estimator (RAL+ICRA'21)
masader
The largest public catalogue of Arabic NLP and speech datasets. It lists more than 250 datasets, each annotated with more than 25 attributes.
11K-Hands
Two-stream CNN for gender classification and biometric identification using a dataset of 11K hand images.
Clustering-Datasets
This repository contains the collection of UCI (real-life) datasets and Synthetic (artificial) datasets (with cluster labels and MATLAB files) ready to use with clustering algorithms.
covid19-datasets
A list of high quality open datasets for COVID-19 data analysis
akshare
AKShare is an elegant and simple financial data interface library for Python, built for human beings! An open-source financial data interface library.
Few-Shot-Intent-Detection
Few-Shot-Intent-Detection includes popular challenging intent detection datasets with/without OOS queries and state-of-the-art baselines and results.
awesome-forests
🌳 A curated list of ground-truth forest datasets for the machine learning and forestry community.
ml-datasets
🌊 Machine learning dataset loaders for testing and example scripts
dataset
dataset is a command line tool, Go package, shared library and Python package for working with JSON objects as collections
datasets
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
parlitools
A collection of useful tools for UK politics
Dataset-Sentimen-Analisis-Bahasa-Indonesia
This repository is a collection of datasets for sentiment analysis in Indonesian. If you use the datasets in this repository for research, please cite the journal article associated with each dataset. The available datasets have been used in several studies, and the results have been published…
datasets
The primary repository for all of the CORGIS Datasets
spectrochempy
SpectroChemPy is a framework for processing, analyzing and modeling spectroscopic data for chemistry with Python
json2python-models
Generate Python model classes (pydantic, attrs, dataclasses) based on JSON datasets with typing module support
multi-task-defocus-deblurring-dual-pixel-nimat
Reference github repository for the paper "Improving Single-Image Defocus Deblurring: How Dual-Pixel Images Help Through Multi-Task Learning". We propose a single-image deblurring network that incorporates the two sub-aperture views into a multitask framework. Specifically, we show that jointly learning to predict the two DP views from a single …
datumaro
Dataset Management Framework, a Python library and a CLI tool to build, analyze and manage Computer Vision datasets.
bumblebee
🚕 A spreadsheet-like data preparation web app that works over Optimus (Pandas, Dask, cuDF, Dask-cuDF, Spark and Vaex)
NLP PEMDC
NLP Pretrained Embeddings, Models and Datasets Collections (NLP_PEMDC). The collection will be kept up to date.
kaggledatasets
Collection of Kaggle Datasets ready to use for Everyone (Looking for contributors)
big-data-exploration
[Archive] Intern project - Big Data Exploration using MongoDB - This Repository is NOT a supported MongoDB product
open2ch-dialogue-corpus
A dialogue corpus built by crawling Open2ch (おーぷん2ちゃんねる)
bugrepo
A collection of publicly available bug reports
torchgeo
TorchGeo: datasets, samplers, transforms, and pre-trained models for geospatial data
Thirukkural-Tamil-Dataset
Thirukkural (திருக்குறள்) by Thiruvalluvar (திருவள்ளுவர்).
scRNAseq cell cluster labeling
Scripts to run and benchmark scRNA-seq cell cluster labeling methods
biomechanics dataset
Information on publicly available datasets for biomechanics.
CHR
SIXray: A Large-scale Security Inspection X-ray Benchmark (CVPR 2019)