Top 175 open source dataset projects

Datasets
TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
Retriever
Quickly download, clean up, and install public datasets into a database management system
Datasets
source{d} datasets ("big code") for source code analysis and machine learning on source code
Automated Resume Screening System
Automated Resume Screening System using Machine Learning (With Dataset)
Zr Obp
Open Bandit Pipeline: a Python library for bandit algorithms and off-policy evaluation
Ner Datasets
Datasets to train supervised classifiers for Named-Entity Recognition in different languages (Portuguese, German, Dutch, French, English)
Mola
A Modular Optimization framework for Localization and mApping (MOLA)
Indonlu
The first vast natural language processing benchmark for the Indonesian language. We provide multiple downstream tasks, pre-trained IndoBERT models, and starter code! (AACL-IJCNLP 2020)
Awesome Json Datasets
A curated list of awesome JSON datasets that don't require authentication.
Nlp datasets
My NLP datasets for Russian language
Datasaurus
R Package 📦 Containing the Datasaurus Dozen datasets 📊
Unify Emotion Datasets
A Survey and Experiments on Annotated Corpora for Emotion Classification in Text
Corus
Links to Russian corpora + Python functions for loading and parsing
Awesome Nlp Polish
A curated list of resources dedicated to Natural Language Processing (NLP) in Polish. Models, tools, datasets.
Robotcar Dataset Sdk
Software Development Kit for the Oxford Robotcar Dataset
Idenprof
The IdenProf dataset is a collection of images of identifiable professionals. It has been collected to enable the development of AI systems that can identify people and the nature of their jobs simply by looking at an image, just as humans can.
Pins
Pin, Discover and Share Resources
Gekko Datasets
Gekko Trading Bot dataset dumps. Ready to use and download history files in SQLite format.
Pix2code
pix2code: Generating Code from a Graphical User Interface Screenshot
Remo Python
🐰 Python lib for remo - the app for annotation and image management in Computer Vision
Multi object datasets
Multi-object image datasets with ground-truth segmentation masks and generative factors.
Bird Recognition Review
A list of useful resources for bird sound (song and call) recognition, such as datasets, papers, links to open source projects, and competitions
Aspect Based Sentiment Analysis
Aspect-Based Sentiment Analysis Experiments
Aesthetics
Image Aesthetics Toolkit - includes Fisher Vector implementation, AVA (Image Aesthetic Visual Analysis) dataset and fast multi-threaded downloader
Firstcoursenetworkscience
Tutorials, datasets, and other material associated with textbook "A First Course in Network Science" by Menczer, Fortunato & Davis
Cholera
R Package for Analyzing John Snow's 1854 Cholera Map
Awesome Public Datasets
A topic-centric list of HQ open datasets.
Chineseglue
Language Understanding Evaluation benchmark for Chinese: datasets, baselines, pre-trained models, corpus, and leaderboard
Nlu datasets with task oriented dialogue
Datasets for natural language understanding and dialogue state tracking
Wb srgb
White balance camera-rendered sRGB images (CVPR 2019) [Matlab & Python]
Transitland Datastore
Transitland's centralized web service API for both querying and editing aggregated transit data from around the world
Exposure correction
Reference code for the paper "Learning Multi-Scale Photo Exposure Correction", CVPR 2021.
Doppelganger
[IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions
Persian Swear Words
Dataset of inappropriate and offensive Persian words for filtering text
Nottingham Dataset
Cleaned version of the Nottingham dataset
Crossweigh
CrossWeigh: Training Named Entity Tagger from Imperfect Annotations
Gopup
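Data interfaces: Baidu, Google, Toutiao, and Weibo indices, macroeconomic data, interest rate data, currency exchange rates, Qianlima and unicorn companies, Xinwen Lianbo transcripts, film and TV box office data, university lists, epidemic data…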
Atis dataset
The ATIS (Airline Travel Information System) Dataset
Photogrammetry datasets
Collection of 250+ datasets for photogrammetry
Coco Annotator
✏️ Web-based image segmentation tool for object detection, localization, and keypoints
Colour
Colour Science for Python
Wongnai Corpus
Collection of Wongnai's datasets
Personas
Datasets for Deep learning Personas
Awesome Earth Artificial Intelligence
A curated list of Earth Science's Artificial Intelligence (AI) tutorials, notebooks, software, datasets, courses, books, video lectures and papers. Contributions most welcome.
French Sentiment Analysis Dataset
A collection of over 1.5 million tweets translated to French, with their sentiment.
Segmentation wbc
White blood cell (WBC) image datasets
Healthcheck
Health Check ✔ is a Machine Learning Web Application made using Flask that can predict three main diseases: diabetes, heart disease, and cancer.
Commons
⛲️ Commons Marketplace client & server to explore, download, and publish open data sets in the Ocean Protocol Network.