
szymonmaszke / Torchdata

License: MIT
PyTorch dataset extended with map, cache etc. (tensorflow.data like)

Programming Languages

python

Projects that are alternatives of or similar to Torchdata

Fungen
Replace boilerplate code with functional patterns using 'go generate'
Stars: ✭ 122 (-46.02%)
Mutual labels:  map, filter
Itiriri
A library built for ES6 iteration protocol.
Stars: ✭ 155 (-31.42%)
Mutual labels:  map, filter
Bbwebimage
A high performance Swift library for downloading, caching and editing web images asynchronously.
Stars: ✭ 128 (-43.36%)
Mutual labels:  cache, filter
Php Validate
Lightweight and feature-rich PHP validation and filtering library. Supports scene grouping, pre-filtering, array checking, custom validators, and custom messages.
Stars: ✭ 225 (-0.44%)
Mutual labels:  library, filter
Holster
A place to keep useful golang functions and small libraries
Stars: ✭ 166 (-26.55%)
Mutual labels:  cache, library
Itiriri Async
A library for asynchronous iteration.
Stars: ✭ 78 (-65.49%)
Mutual labels:  map, filter
Libcache
A Lightweight in-memory key:value cache library for Go.
Stars: ✭ 152 (-32.74%)
Mutual labels:  cache, library
Libgenerics
libgenerics is a minimalistic and generic library for C basic data structures.
Stars: ✭ 42 (-81.42%)
Mutual labels:  library, map
Libosmscout
Libosmscout is a C++ library for offline map rendering, routing and location lookup based on OpenStreetMap data
Stars: ✭ 159 (-29.65%)
Mutual labels:  library, map
Nlp bahasa resources
A Curated List of Dataset and Usable Library Resources for NLP in Bahasa Indonesia
Stars: ✭ 158 (-30.09%)
Mutual labels:  dataset, library
Easygrid
EasyGrid - VanillaJS Responsive Grid
Stars: ✭ 77 (-65.93%)
Mutual labels:  library, filter
Pottery
Redis for humans. 🌎🌍🌏
Stars: ✭ 204 (-9.73%)
Mutual labels:  cache, library
Cdcontainers
Library of data containers and data structures for C programming language.
Stars: ✭ 57 (-74.78%)
Mutual labels:  library, map
Cacache Rs
💩💵 but for your 🦀
Stars: ✭ 116 (-48.67%)
Mutual labels:  cache, library
Pgo
Go library for PHP community with convenient functions
Stars: ✭ 51 (-77.43%)
Mutual labels:  map, filter
Cachego
Golang Cache component - Multiple drivers
Stars: ✭ 148 (-34.51%)
Mutual labels:  cache, map
Lara Eye
Filter your Query\Builder using a structured query language
Stars: ✭ 39 (-82.74%)
Mutual labels:  library, filter
Depressurizer
A Steam library categorizing tool.
Stars: ✭ 1,008 (+346.02%)
Mutual labels:  library, filter
Dem.net
Digital Elevation model library in C#. 3D terrain models, line/point Elevations, intervisibility reports
Stars: ✭ 153 (-32.3%)
Mutual labels:  dataset, library
Golib
Go Library [DEPRECATED]
Stars: ✭ 194 (-14.16%)
Mutual labels:  cache, library
  • Use map, apply, reduce or filter directly on Dataset objects
  • Cache data in RAM/disk or via your own method (partial caching supported)
  • Full support for PyTorch's Dataset and IterableDataset
  • General torchdata.maps like Flatten or Select
  • Extensible interface (your own cache methods, cache modifiers, maps etc.)
  • Useful torchdata.datasets classes designed for general tasks (e.g. file reading)
  • Support for torchvision datasets (e.g. ImageFolder, MNIST, CIFAR10) via td.datasets.WrapDataset
  • Minimal overhead (single call to super().__init__())
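
To make the first two bullets more concrete, here is a minimal pure-Python sketch of what a chainable map/filter/cache interface on a dataset could look like. It only illustrates the idea and is not torchdata's implementation; the class SketchDataset and all of its methods are hypothetical names introduced for this example.

```python
class SketchDataset:
    """Hypothetical stand-in for a dataset with chainable operations."""

    def __init__(self, length, getter):
        self._length = length
        self._getter = getter  # callable: index -> sample

    @classmethod
    def from_list(cls, items):
        items = list(items)
        return cls(len(items), items.__getitem__)

    def __len__(self):
        return self._length

    def __getitem__(self, index):
        if not 0 <= index < self._length:
            raise IndexError(index)
        return self._getter(index)

    def map(self, function):
        # Lazy: the function runs each time a sample is requested.
        return SketchDataset(self._length, lambda index: function(self[index]))

    def filter(self, predicate):
        # Eager for simplicity: evaluates every sample once to find kept indices.
        kept = [index for index in range(self._length) if predicate(self[index])]
        return SketchDataset(len(kept), lambda index: self[kept[index]])

    def cache(self):
        # Memoize samples in a dict so each one is computed at most once.
        store = {}

        def cached(index):
            if index not in store:
                store[index] = self[index]
            return store[index]

        return SketchDataset(self._length, cached)


calls = []


def double(x):
    calls.append(x)  # Record every real computation
    return 2 * x


dataset = SketchDataset.from_list(range(5)).map(double).cache().filter(lambda x: x > 2)
result = [dataset[i] for i in range(len(dataset))]
```

Because cache() sits between map() and filter(), the mapped function runs once per sample even though filter() and the final read both touch the data.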

💡 Examples

Check documentation here: https://szymonmaszke.github.io/torchdata

General example

  • Create image dataset, convert it to Tensors, cache and concatenate with smoothed labels:
import pathlib

import torchdata as td
import torchvision
from PIL import Image

class Images(td.Dataset): # Different inheritance
    def __init__(self, path: str):
        super().__init__() # This is the only change
        self.files = sorted(pathlib.Path(path).glob("*"))  # Sorted for a stable index order

    def __getitem__(self, index):
        return Image.open(self.files[index])

    def __len__(self):
        return len(self.files)


images = Images("./data").map(torchvision.transforms.ToTensor()).cache()

You can concatenate the above dataset with another (say, labels) and iterate over them as usual:

for data, label in images | labels:
    # Do whatever you want with your data
    ...
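
The loop above suggests that | pairs up samples from both datasets by index. A rough pure-Python sketch of that behaviour, assuming this is the intended semantics, could look as follows. The classes Source and Paired are hypothetical and are not part of torchdata.

```python
class Paired:
    """Hypothetical helper: yields tuples of samples that share an index."""

    def __init__(self, *sources):
        self.sources = sources

    def __len__(self):
        # Stop at the shortest source, like zip().
        return min(len(source) for source in self.sources)

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        return tuple(source[index] for source in self.sources)


class Source:
    """Hypothetical list-backed dataset supporting the | operator."""

    def __init__(self, items):
        self.items = list(items)

    def __len__(self):
        return len(self.items)

    def __getitem__(self, index):
        return self.items[index]

    def __or__(self, other):
        return Paired(self, other)


images = Source(["a.png", "b.png"])
labels = Source([0, 1])
pairs = [(data, label) for data, label in images | labels]
```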
  • Cache the first 1000 samples in memory and save the rest on disk in the ./cache folder:
images = (
    Images("./data").map(torchvision.transforms.ToTensor())
    # First 1000 samples in memory
    .cache(td.modifiers.UpToIndex(1000, td.cachers.Memory()))
    # Samples from index 1000 to the end are saved with Pickle on disk
    .cache(td.modifiers.FromIndex(1000, td.cachers.Pickle("./cache")))
    # You can also define your own cachers and modifiers; see the docs
)
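
The idea behind partial caching can be sketched with the standard library alone: samples below some index go to an in-memory store, the rest are pickled to disk. The names below (MemoryCacher, PickleCacher, cached_get, and the one-file-per-index layout) are assumptions made for this illustration and do not describe how torchdata's cachers or modifiers are implemented.

```python
import pickle
import tempfile
from pathlib import Path


class MemoryCacher:
    """Hypothetical cacher keeping samples in a dict."""

    def __init__(self):
        self.store = {}

    def __contains__(self, index):
        return index in self.store

    def __getitem__(self, index):
        return self.store[index]

    def __setitem__(self, index, sample):
        self.store[index] = sample


class PickleCacher:
    """Hypothetical cacher saving each sample as <index>.pkl in a folder."""

    def __init__(self, folder):
        self.folder = Path(folder)
        self.folder.mkdir(parents=True, exist_ok=True)

    def _path(self, index):
        return self.folder / f"{index}.pkl"

    def __contains__(self, index):
        return self._path(index).exists()

    def __getitem__(self, index):
        with self._path(index).open("rb") as file:
            return pickle.load(file)

    def __setitem__(self, index, sample):
        with self._path(index).open("wb") as file:
            pickle.dump(sample, file)


def cached_get(index, compute, boundary, memory, disk):
    # Indices below `boundary` are cached in memory, the rest on disk.
    cacher = memory if index < boundary else disk
    if index not in cacher:
        cacher[index] = compute(index)
    return cacher[index]


with tempfile.TemporaryDirectory() as folder:
    memory, disk = MemoryCacher(), PickleCacher(folder)
    computed = []

    def compute(index):
        computed.append(index)  # Record every real computation
        return index * index

    # Request every sample twice; each should be computed only once.
    values = [cached_get(i, compute, 2, memory, disk) for i in [0, 1, 2, 3, 0, 1, 2, 3]]
    on_disk = sorted(path.name for path in Path(folder).iterdir())
```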

To see what else you can do, please check the torchdata documentation.

Integration with torchvision

Using torchdata, you can easily split torchvision datasets and apply augmentation only to the training part of the data:

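import torch
import torchvision

import torchdata as td

# Wrap torchvision dataset with WrapDataset
dataset = td.datasets.WrapDataset(torchvision.datasets.ImageFolder("./images"))

# Split dataset 60/20/20; the test split takes the remainder
# so that the lengths always sum to len(dataset)
train_length = int(0.6 * len(dataset))
validation_length = int(0.2 * len(dataset))
test_length = len(dataset) - train_length - validation_length
train_dataset, validation_dataset, test_dataset = torch.utils.data.random_split(
    dataset,
    (train_length, validation_length, test_length),
)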

# Apply torchvision mappings ONLY to train dataset
train_dataset.map(
    td.maps.To(
        torchvision.transforms.Compose(
            [
                torchvision.transforms.RandomResizedCrop(224),
                torchvision.transforms.RandomHorizontalFlip(),
                torchvision.transforms.ToTensor(),
                torchvision.transforms.Normalize(
                    mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
                ),
            ]
        )
    ),
    # Apply this transformation to the zeroth element of each sample (the image);
    # the first element is the label
    0,
)
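
The comments in the call above describe transforming only one element of each (image, label) sample. A minimal pure-Python sketch of that idea is shown below; apply_to_element is a hypothetical helper written for this illustration, not torchdata's td.maps.To, and plain strings stand in for images.

```python
def apply_to_element(function, position):
    """Hypothetical helper: build a mapper that transforms one element of a tuple sample."""

    def mapper(sample):
        return tuple(
            function(element) if i == position else element
            for i, element in enumerate(sample)
        )

    return mapper


# Strings stand in for images; the second element of each sample is the label.
samples = [("cat.png", 0), ("dog.png", 1)]
to_upper = apply_to_element(str.upper, 0)
transformed = [to_upper(sample) for sample in samples]
```

The labels pass through untouched, which is the property you want when augmenting inputs only.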

Note that you can use td.datasets.WrapDataset with any existing torch.utils.data.Dataset instance to give it additional caching and mapping capabilities!
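
A note on the split step in the example: torch.utils.data.random_split expects integer lengths that sum to the length of the dataset, while truncating each fraction with int() can fall a few samples short. The sketch below shows one way to compute safe lengths; split_lengths is a hypothetical helper and the dataset size of 103 is made up for illustration.

```python
def split_lengths(total, fractions):
    """Hypothetical helper: integer split sizes that always sum to `total`."""
    # Truncate all but the last fraction, then give the remainder to the last split.
    lengths = [int(fraction * total) for fraction in fractions[:-1]]
    lengths.append(total - sum(lengths))
    return lengths


total = 103  # Made-up dataset size
fractions = (0.6, 0.2, 0.2)

naive = [int(fraction * total) for fraction in fractions]  # May not sum to total
fixed = split_lengths(total, fractions)
```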

🔧 Installation

🐍 pip

Latest release:

pip install --user torchdata

Nightly:

pip install --user torchdata-nightly

🐋 Docker

A standalone CPU image and various GPU-enabled images are available on Docker Hub.

For CPU quickstart, issue:

docker pull szymonmaszke/torchdata:18.04

Nightly builds are also available; just prefix the tag with nightly_. If you are going for a GPU image, make sure you have nvidia/docker installed and its runtime set.

❓ Contributing

If you find an issue, or think some functionality may be useful to others and fits this library, please open a new Issue or create a Pull Request.

For an overview of things you can do to help this project, see the Roadmap.
