Top 175 datasets open source projects

TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
Quickly download, clean up, and install public datasets into a database management system
source{d} datasets ("big code") for source code analysis and machine learning on source code
Automated Resume Screening System
Automated Resume Screening System using Machine Learning (With Dataset)
Zr Obp
Open Bandit Pipeline: a python library for bandit algorithms and off-policy evaluation
Ner Datasets
Datasets to train supervised classifiers for Named-Entity Recognition in different languages (Portuguese, German, Dutch, French, English)
A Modular Optimization framework for Localization and mApping (MOLA)
The first-ever vast natural language processing benchmark for the Indonesian language. We provide multiple downstream tasks, pre-trained IndoBERT models, and starter code! (AACL-IJCNLP 2020)
Awesome Json Datasets
A curated list of awesome JSON datasets that don't require authentication.
Nlp datasets
My NLP datasets for the Russian language
R Package 📦 Containing the Datasaurus Dozen datasets 📊
Unify Emotion Datasets
A Survey and Experiments on Annotated Corpora for Emotion Classification in Text
Links to Russian corpora + Python functions for loading and parsing
Awesome Nlp Polish
A curated list of resources dedicated to Natural Language Processing (NLP) in Polish. Models, tools, datasets.
Robotcar Dataset Sdk
Software Development Kit for the Oxford Robotcar Dataset
The IdenProf dataset is a collection of images of identifiable professionals. It has been collected to enable the development of AI systems that can serve by identifying people and the nature of their jobs simply by looking at an image, just as humans can.
Pin, Discover and Share Resources
Gekko Datasets
Gekko Trading Bot dataset dumps. Ready to use and download history files in SQLite format.
pix2code: Generating Code from a Graphical User Interface Screenshot
Remo Python
🐰 Python lib for remo - the app for annotations and images management in Computer Vision
Multi object datasets
Multi-object image datasets with ground-truth segmentation masks and generative factors.
Bird Recognition Review
A list of useful resources for bird sound (song and call) recognition, such as datasets, papers, links to open source projects, and competitions
Aspect Based Sentiment Analysis
Aspect-Based Sentiment Analysis Experiments
Image Aesthetics Toolkit - includes Fisher Vector implementation, AVA (Image Aesthetic Visual Analysis) dataset and fast multi-threaded downloader
Tutorials, datasets, and other material associated with the textbook "A First Course in Network Science" by Menczer, Fortunato & Davis
R Package for Analyzing John Snow's 1854 Cholera Map
Awesome Public Datasets
A topic-centric list of HQ open datasets.
Language Understanding Evaluation benchmark for Chinese: datasets, baselines, pre-trained models, corpus, and leaderboard
Nlu datasets with task oriented dialogue
Datasets for natural language understanding and dialogue state tracking
Wb srgb
White balance camera-rendered sRGB images (CVPR 2019) [Matlab & Python]
Transitland Datastore
Transitland's centralized web service API for both querying and editing aggregated transit data from around the world
Exposure correction
Reference code for the paper "Learning Multi-Scale Photo Exposure Correction", CVPR 2021.
[IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions
Persian Swear Words
A dataset of inappropriate and offensive Persian words for filtering text
Nottingham Dataset
Cleaned version of the Nottingham dataset
CrossWeigh: Training Named Entity Tagger from Imperfect Annotations
Atis dataset
The ATIS (Airline Travel Information System) Dataset
Photogrammetry datasets
Collection of 250+ datasets for photogrammetry
Coco Annotator
✏️ Web-based image segmentation tool for object detection, localization, and keypoints
Colour Science for Python
Wongnai Corpus
Collection of Wongnai's datasets
Datasets for Deep learning Personas
Awesome Earth Artificial Intelligence
A curated list of Earth Science's Artificial Intelligence (AI) tutorials, notebooks, software, datasets, courses, books, video lectures and papers. Contributions most welcome.
French Sentiment Analysis Dataset
A collection of over 1.5 million tweets translated to French, with their sentiment.
Segmentation wbc
White blood cell (WBC) image datasets
Health Check ✔ is a Machine Learning web application built with Flask that can predict mainly three diseases: diabetes, heart disease, and cancer.
⛲️ Commons Marketplace client & server to explore, download, and publish open data sets in the Ocean Protocol Network.