airctic / icedata

Licence: Apache-2.0 License
IceData: Datasets Hub for the *IceVision* Framework

Programming Languages

python

Datasets Hub for the IceVision Framework


Note: We need your help. If you find this work useful, please let other people know by starring and sharing it. Thank you!

Documentation

Installation

pip install icedata

For more installation options, check our extensive documentation.

Important: We currently support only Linux and macOS.
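If you want a script to fail early on an unsupported platform, a check along these lines may help. This is only a sketch: `is_supported` and `SUPPORTED_SYSTEMS` are hypothetical names introduced here, not part of IceData.

```python
import platform

# Systems the note above lists as supported.
# platform.system() reports macOS as "Darwin".
SUPPORTED_SYSTEMS = {"Linux", "Darwin"}


def is_supported(system_name: str) -> bool:
    """Return True if a platform.system() value is Linux or macOS."""
    return system_name in SUPPORTED_SYSTEMS


if __name__ == "__main__":
    current = platform.system()
    if is_supported(current):
        print(f"{current}: supported platform")
    else:
        print(f"{current}: not listed as supported by IceData")
```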

Why IceData?

  • IceData is a dataset hub for the IceVision Framework

  • It includes community-maintained datasets and parsers and has out-of-the-box support for common annotation formats (COCO, VOC, etc.)

  • It provides an overview of each included dataset with a description, an annotation example, and other helpful information

  • It makes end-to-end training straightforward thanks to IceVision's unified API

  • It enables practitioners to get started with object detection quickly

Datasets

Source

The Datasets class is designed to simplify loading and parsing a wide range of computer vision datasets.

Main Features:

  • Caches data so you don't need to download it over and over

  • Lightweight and fast

  • Transparent and pythonic API

  • Out-of-the-box parsers convert common dataset annotation formats into the unified IceVision Data Format
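The caching behaviour listed above can be pictured with a small standard-library sketch. It does not show IceData's actual implementation: `load_cached`, the `.complete` marker file, and `fake_fetch` are all assumptions made for illustration, and the fetch step is stubbed out so that no network access is needed.

```python
import tempfile
from pathlib import Path
from typing import Callable


def load_cached(name: str, fetch: Callable[[Path], None], cache_root: Path) -> Path:
    """Return the local directory for dataset `name`, fetching it at most once.

    `fetch` is a hypothetical callable that fills the target directory;
    a real one would download and extract an archive.
    """
    target = cache_root / name
    marker = target / ".complete"  # written only after a successful fetch
    if not marker.exists():
        target.mkdir(parents=True, exist_ok=True)
        fetch(target)
        marker.touch()
    return target


calls = []  # records every time the (fake) download actually runs


def fake_fetch(target: Path) -> None:
    """Stand-in for a download: create an empty images folder."""
    calls.append(target)
    (target / "images").mkdir(exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    first = load_cached("fridge", fake_fetch, Path(tmp))
    second = load_cached("fridge", fake_fetch, Path(tmp))  # served from the cache
    print(first == second, len(calls))  # prints: True 1
```

Writing the marker only after `fetch` returns means an interrupted download is retried on the next call rather than being mistaken for a complete one.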

IceData provides several ready-to-use datasets. Some use common annotation formats such as COCO and VOC; others use their own formats, such as the one handled by the WheatParser for the Kaggle Global Wheat Competition.

Usage

Object detection datasets use multiple annotation formats (COCO, VOC, and others). IceVision makes it easy to work across all of them with parsers that are easy to use and to extend.

COCO and VOC compatible datasets

For COCO- or VOC-compatible datasets, especially ones that are not included in IceData, it is easiest to use the predefined COCO or VOC parser.

Example: Raccoon - a dataset using the VOC parser

# Imports
from icevision.all import *
import icedata


# WARNING: Make sure you have already cloned the raccoon dataset into ./raccoon_dataset
# Set images and annotations directories
data_dir = Path("raccoon_dataset")
images_dir = data_dir / "images"
annotations_dir = data_dir / "annotations"

# Define the class_map
class_map = ClassMap(["raccoon"])

# Create a parser for the dataset using the predefined icevision VOC parser
parser = parsers.voc(
    annotations_dir=annotations_dir, images_dir=images_dir, class_map=class_map
)

# Parse the annotations to create the train and validation records
train_records, valid_records = parser.parse()
show_records(train_records[:3], ncols=3, class_map=class_map)

!!! info "Note" Notice how we use the predefined parsers.voc() function:

**parser = parsers.voc(
annotations_dir=annotations_dir, images_dir=images_dir, class_map=class_map
)**
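For readers unfamiliar with the VOC format, the sketch below shows roughly what a VOC parser has to do: read one XML annotation and turn it into a flat record of labels and boxes. It uses only the standard library and is not IceVision's parser. The record layout, the `parse_voc` helper, the sample annotation values, and the convention of reserving id 0 for the background are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET

# Illustrative annotation in Pascal VOC layout (values are made up).
SAMPLE_VOC = """
<annotation>
  <filename>raccoon-1.jpg</filename>
  <size><width>650</width><height>417</height><depth>3</depth></size>
  <object>
    <name>raccoon</name>
    <bndbox><xmin>81</xmin><ymin>88</ymin><xmax>522</xmax><ymax>408</ymax></bndbox>
  </object>
</annotation>
"""


def parse_voc(xml_text: str, class_names: list) -> dict:
    """Convert one VOC XML annotation into a plain dict record."""
    root = ET.fromstring(xml_text.strip())
    size = root.find("size")
    record = {
        "filename": root.findtext("filename"),
        "width": int(size.findtext("width")),
        "height": int(size.findtext("height")),
        "labels": [],
        "bboxes": [],
    }
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        # Class ids start at 1; id 0 is left for the background (sketch assumption).
        record["labels"].append(class_names.index(obj.findtext("name")) + 1)
        record["bboxes"].append(
            tuple(int(box.findtext(k)) for k in ("xmin", "ymin", "xmax", "ymax"))
        )
    return record


record = parse_voc(SAMPLE_VOC, ["raccoon"])
print(record["labels"], record["bboxes"])  # prints: [1] [(81, 88, 522, 408)]
```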

Datasets included in IceData

Datasets included in IceData always have their own parser. It can be invoked with icedata.datasetname.parser(...).

Example: The IceData Fridge dataset

Please check out the fridge folder for more information on how this dataset is structured.

# Imports
from icevision.all import *
import icedata

# Load the Fridge Objects dataset
data_dir = icedata.fridge.load()

# Get the class_map
class_map = icedata.fridge.class_map()

# Parse the annotations
parser = icedata.fridge.parser(data_dir, class_map)
train_records, valid_records = parser.parse()

# Show images with their boxes and labels
show_records(train_records[:3], ncols=3, class_map=class_map)

!!! info "Note" Notice how we use the parser associated with the fridge dataset, icedata.fridge.parser():

**parser = icedata.fridge.parser(data_dir, class_map)**
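In both examples, `parser.parse()` returns two lists: training records and validation records. Conceptually this is a split of the parsed records, which the sketch below imitates with the standard library. The 80/20 ratio, the fixed seed, and the `split_records` helper are assumptions for illustration, not IceData's documented behaviour.

```python
import random


def split_records(records, train_fraction=0.8, seed=42):
    """Shuffle records reproducibly and split them into (train, valid)."""
    shuffled = list(records)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Stand-in records; real records would carry image paths, boxes, and labels.
records = [f"image_{i}.jpg" for i in range(10)]
train, valid = split_records(records)
print(len(train), len(valid))  # prints: 8 2
```

Using a dedicated `random.Random(seed)` keeps the split stable across runs, so validation metrics stay comparable between experiments.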

Datasets with a new annotation format

Sometimes you will need to define a parser for a new annotation format for your dataset. Additional information can be found in the documentation. In this case, we strongly recommend following the file structure and naming conventions used in examples such as the Fridge dataset or the PETS dataset.
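As a rough illustration of the work a custom parser performs, the sketch below reads a hypothetical CSV format in which each row holds one box as `[x, y, width, height]`, and groups the rows into one record per image. The column names, the sample rows, and the `parse_csv_annotations` helper are invented for this example. A real IceVision parser would subclass the framework's parser classes as described in the documentation.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical annotation file: one row per bounding box.
SAMPLE_CSV = """image_id,width,height,bbox
img_001,1024,1024,"[10.0, 20.0, 30.0, 40.0]"
img_001,1024,1024,"[100.0, 200.0, 50.0, 60.0]"
img_002,1024,1024,"[5.0, 5.0, 10.0, 10.0]"
"""


def parse_csv_annotations(csv_text: str) -> dict:
    """Group CSV rows by image and convert boxes from xywh to xyxy."""
    records = defaultdict(lambda: {"bboxes": []})
    for row in csv.DictReader(io.StringIO(csv_text)):
        x, y, w, h = json.loads(row["bbox"])  # the bbox column is a JSON-style list
        rec = records[row["image_id"]]
        rec["width"] = int(row["width"])
        rec["height"] = int(row["height"])
        rec["bboxes"].append((x, y, x + w, y + h))
    return dict(records)


parsed = parse_csv_annotations(SAMPLE_CSV)
print(len(parsed), parsed["img_001"]["bboxes"][0])  # prints: 2 (10.0, 20.0, 40.0, 60.0)
```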


Disclaimer

Inspired by the excellent HuggingFace Datasets project, icedata is a utility library that downloads and prepares computer vision datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have a license to use them. It is your responsibility to determine whether you have permission to use a dataset under its license.

If you are a dataset owner and wish to update any of the information in IceData (description, citation, etc.), or do not want your dataset to be included, please get in touch through a GitHub issue. Thanks for your contribution to the ML community!

If you are interested in learning more about responsible AI practices, including fairness, please see Google AI's Responsible AI Practices.
