vahidk / Tfrecord

Licence: MIT
TFRecord reader for PyTorch

TFRecord reader

Installation

pip3 install tfrecord

Usage

It is recommended to create an index file for each TFRecord file. An index file must be provided when using multiple workers; otherwise the loader may return duplicate records.

python3 -m tfrecord.tools.tfrecord2idx <tfrecord path> <index path>
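
To make the idea of an index concrete, here is a minimal sketch of how record offsets could be collected from a TFRecord-style byte stream. It relies on the standard TFRecord framing (8-byte little-endian length, 4-byte length CRC, payload, 4-byte payload CRC). The helper names frame_record and build_index are hypothetical, the CRC fields are zero-filled placeholders, and the (offset, size) layout is an assumption about what an index holds rather than a description of the tool's actual output.

```python
import io
import struct


def frame_record(payload: bytes) -> bytes:
    """Wrap a payload in TFRecord-style framing (CRC fields are zero placeholders)."""
    return struct.pack("<Q", len(payload)) + b"\x00" * 4 + payload + b"\x00" * 4


def build_index(stream):
    """Return one (offset, size) pair per framed record in the stream."""
    index = []
    while True:
        offset = stream.tell()
        header = stream.read(8)
        if len(header) < 8:
            break  # end of stream
        (length,) = struct.unpack("<Q", header)
        # Skip the length CRC, the payload, and the payload CRC.
        stream.seek(4 + length + 4, io.SEEK_CUR)
        index.append((offset, stream.tell() - offset))
    return index


data = b"".join(frame_record(p) for p in [b"abc", b"hello world", b""])
index = build_index(io.BytesIO(data))
print(index)  # [(0, 19), (19, 27), (46, 16)]
```

With offsets like these, a reader can seek straight to any record instead of scanning the file from the start.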

Use TFRecordDataset to read TFRecord files in PyTorch.

import torch
from tfrecord.torch.dataset import TFRecordDataset

tfrecord_path = "/path/to/data.tfrecord"
index_path = None
description = {"image": "byte", "label": "float"}
dataset = TFRecordDataset(tfrecord_path, index_path, description)
loader = torch.utils.data.DataLoader(dataset, batch_size=32)

data = next(iter(loader))
print(data)
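
The requirement for an index file with multiple workers can be illustrated with a small sketch. Once the number of records is known (which an index provides), each worker can be given its own non-overlapping slice. Without that count, workers may end up reading the same records, which is one way duplicates can arise. The function shard_bounds below is hypothetical and assumes contiguous slicing; it is not the library's API.

```python
def shard_bounds(num_records, worker_id, num_workers):
    """Return the [start, end) record range assigned to one worker."""
    start = (num_records * worker_id) // num_workers
    end = (num_records * (worker_id + 1)) // num_workers
    return start, end


num_records = 10
num_workers = 3
shards = [shard_bounds(num_records, w, num_workers) for w in range(num_workers)]
print(shards)  # [(0, 3), (3, 6), (6, 10)]
```

Each record index falls into exactly one range, so no record is read twice.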

Use MultiTFRecordDataset to read multiple TFRecord files. This class samples from the given TFRecord files with the given probabilities.

import torch
from tfrecord.torch.dataset import MultiTFRecordDataset

tfrecord_pattern = "/path/to/{}.tfrecord"
index_pattern = "/path/to/{}.index"
splits = {
    "dataset1": 0.8,
    "dataset2": 0.2,
}
description = {"image": "byte", "label": "int"}
dataset = MultiTFRecordDataset(tfrecord_pattern, index_pattern, splits, description)
loader = torch.utils.data.DataLoader(dataset, batch_size=32)

data = next(iter(loader))
print(data)
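
The sampling behaviour described above can be sketched in plain Python. This is only an illustration of weighted sampling across several streams, assuming each draw first picks a stream according to its weight and then takes that stream's next item. The function sample_streams and the toy streams are hypothetical and do not reflect the library's internals.

```python
import itertools
import random


def sample_streams(streams, weights, rng):
    """Yield items by repeatedly picking a stream at random, weighted by `weights`."""
    names = list(streams)
    iterators = {name: iter(streams[name]) for name in names}
    probs = [weights[name] for name in names]
    while True:
        name = rng.choices(names, weights=probs, k=1)[0]
        try:
            yield next(iterators[name])
        except StopIteration:
            return  # stop once the chosen stream is exhausted


# Toy stand-ins for two record files.
streams = {"dataset1": itertools.repeat("a"), "dataset2": itertools.repeat("b")}
splits = {"dataset1": 0.8, "dataset2": 0.2}

rng = random.Random(0)
samples = list(itertools.islice(sample_streams(streams, splits, rng), 10000))
share = samples.count("a") / len(samples)
print(f"share of dataset1: {share:.2f}")
```

Over many draws, the share of items from each stream approaches its weight in splits.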

Creating tfrecord files:

import tfrecord

# image_bytes (bytes), label (float), and index (int) are assumed to be
# defined by the caller.
writer = tfrecord.TFRecordWriter("/path/to/data.tfrecord")
writer.write({
    "image": (image_bytes, "byte"),
    "label": (label, "float"),
    "index": (index, "int")
})
writer.close()

Note: To write tfrecord files you also need an additional dependency:

pip3 install crc32c

Reading tfrecord files in Python:

import tfrecord

loader = tfrecord.tfrecord_loader("/path/to/data.tfrecord", None, {
    "image": "byte",
    "label": "float",
    "index": "int"
})
for record in loader:
    print(record["label"])

Transforming input

You can optionally pass a function as the transform argument to post-process features before they are returned. For example, this can be used to decode images, normalize colors to a certain range, or pad variable-length sequences.

import tfrecord
import cv2

def decode_image(features):
    # get BGR image from bytes
    features["image"] = cv2.imdecode(features["image"], -1)
    return features


description = {
    "image": "byte",
}

dataset = tfrecord.torch.TFRecordDataset("/path/to/data.tfrecord",
                                         index_path=None,
                                         description=description,
                                         transform=decode_image)

data = next(iter(dataset))
print(data)
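
As a further illustration of the padding use case mentioned above, here is a minimal sketch of a transform that pads or truncates a variable-length feature to a fixed length. It assumes features arrive as a dict of NumPy arrays. The factory make_pad_transform and the feature name "tokens" are hypothetical.

```python
import numpy as np


def make_pad_transform(key, max_len, pad_value=0):
    """Build a transform that pads or truncates features[key] to max_len."""
    def transform(features):
        seq = np.asarray(features[key])[:max_len]
        padded = np.full(max_len, pad_value, dtype=seq.dtype)
        padded[:len(seq)] = seq
        features[key] = padded
        return features
    return transform


pad_tokens = make_pad_transform("tokens", max_len=5)
short = pad_tokens({"tokens": np.array([7, 8, 9], dtype=np.int64)})
long = pad_tokens({"tokens": np.arange(8, dtype=np.int64)})
print(short["tokens"].tolist())  # [7, 8, 9, 0, 0]
print(long["tokens"].tolist())   # [0, 1, 2, 3, 4]
```

A function built this way could be passed as the transform argument so that every record in a batch has the same shape.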