
obilaniu / Benzina

License: MIT
Benzina is an image-loader package that greatly accelerates image loading onto GPUs using their built-in hardware codecs.

Programming Languages

C - 50,402 projects (#5 most used programming language)
Python - 139,335 projects (#7 most used programming language)
Meson - 512 projects
Lua - 6,591 projects
CUDA - 1,817 projects
Assembly - 5,116 projects
Shell - 77,523 projects

Projects that are alternatives to or similar to Benzina

SciCompforChemists
Scientific Computing for Chemists text for teaching basic computing skills to chemistry students using Python, Jupyter notebooks, and the SciPy stack. This text makes use of a variety of packages including NumPy, SciPy, matplotlib, pandas, seaborn, NMRglue, SymPy, scikit-image, and scikit-learn.
Stars: ✭ 65 (+80.56%)
Mutual labels:  science, scientific-computing
Ruptures
ruptures: change point detection in Python
Stars: ✭ 654 (+1716.67%)
Mutual labels:  science, scientific-computing
source
The main source repository for the Raysect project.
Stars: ✭ 62 (+72.22%)
Mutual labels:  science, scientific-computing
PyCORN
A script to extract data from ÄKTA/UNICORN result-files (.res)
Stars: ✭ 30 (-16.67%)
Mutual labels:  science, scientific-computing
Awesome Scientific Python
A curated list of awesome scientific Python resources
Stars: ✭ 127 (+252.78%)
Mutual labels:  science, scientific-computing
OOMMFTools
OOMMFTools is a set of utilities designed to assist OOMMF postprocessing
Stars: ✭ 15 (-58.33%)
Mutual labels:  science, scientific-computing
Poliastro
poliastro - 🚀 Astrodynamics in Python
Stars: ✭ 462 (+1183.33%)
Mutual labels:  science, scientific-computing
Librmath.js
Javascript Pure Implementation of Statistical R "core" numerical libRmath.so
Stars: ✭ 425 (+1080.56%)
Mutual labels:  science, scientific-computing
Freud
Powerful, efficient particle trajectory analysis in scientific Python.
Stars: ✭ 118 (+227.78%)
Mutual labels:  science, scientific-computing
Boinc
Open-source software for volunteer computing and grid computing.
Stars: ✭ 1,320 (+3566.67%)
Mutual labels:  science, scientific-computing
adorad
Fast, Expressive, & High-Performance Programming Language for those who dare
Stars: ✭ 54 (+50%)
Mutual labels:  science, scientific-computing
Reprozip
ReproZip is a tool that simplifies the process of creating reproducible experiments from command-line executions, a frequently-used common denominator in computational science.
Stars: ✭ 231 (+541.67%)
Mutual labels:  science, scientific-computing
getfem
Mirror of GetFEM repository
Stars: ✭ 23 (-36.11%)
Mutual labels:  science, scientific-computing
OpenSkyStacker
Multi-platform stacker for deep-sky astrophotography.
Stars: ✭ 80 (+122.22%)
Mutual labels:  science, scientific-computing
Core
The core source repository for the Cherab project.
Stars: ✭ 26 (-27.78%)
Mutual labels:  science, scientific-computing
Stdlib
✨ Standard library for JavaScript and Node.js. ✨
Stars: ✭ 2,749 (+7536.11%)
Mutual labels:  science, scientific-computing
spinmob
Rapid and flexible acquisition, analysis, fitting, and plotting in Python. Designed for scientific laboratories.
Stars: ✭ 34 (-5.56%)
Mutual labels:  science, scientific-computing
chemispy
A library for using chemistry in your applications
Stars: ✭ 28 (-22.22%)
Mutual labels:  science
brian2cuda
A brian2 extension to simulate spiking neural networks on GPUs
Stars: ✭ 46 (+27.78%)
Mutual labels:  science
pytoshop
Library for reading and writing Photoshop PSD and PSB files
Stars: ✭ 100 (+177.78%)
Mutual labels:  science

PyPI | Docs

Бензина / Benzina

Description of the project

Benzina is an image-loading library that accelerates image loading and preprocessing by using the hardware decoder built into NVIDIA GPUs.

Because it minimizes the use of the CPU and of the GPU's compute units, it is easier to keep the GPU saturated with image data using few CPU resources. In our tests with a ResNet-18 model in PyTorch on the ImageNet 2012 dataset, we observed a 1.8x increase in the number of images loaded, preprocessed, and then processed by the model when using a single CPU and GPU:

Data Loader         | CPU              | CPU Workers | CPU Usage | GPU         | Batch Size | Pipeline Speed
Benzina             | Intel Xeon 2698* | 1           | 33%       | Tesla V100* | 256        | 525 img/s
PyTorch ImageFolder | Intel Xeon 2698* | 2           | 100%      | Tesla V100* | 256        | 290 img/s
PyTorch ImageFolder | Intel Xeon 2698* | 4           | 100%      | Tesla V100* | 256        | 395 img/s
PyTorch ImageFolder | Intel Xeon 2698* | 6           | 100%      | Tesla V100* | 256        | 425 img/s
DALI                | Intel Xeon 2698* | 1           | 100%      | Tesla V100* | 256        | 575 img/s

Note

  • Intel Xeon 2698 refers to the Intel Xeon E5-2698 v4 @ 2.20GHz
  • Tesla V100 refers to the Tesla V100 SXM2 16GB

While DALI currently outperforms Benzina, DALI's speedup applies only to JPEGs decoded through nvJPEG. Benzina requires the input dataset to be transcoded to H.265, but the gain then applies to all image types, and the transcoded dataset comes in a format that is easier to distribute.
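
Benzina's own tooling for converting a dataset is described in its documentation. Purely as an illustration of the transcoding step mentioned above, the sketch below shows what converting a folder of JPEGs into single-frame H.265 bitstreams with ffmpeg could look like; the function name, paths, and ffmpeg flags are assumptions for this example, and the output is raw HEVC streams, not Benzina's actual container format.

# Illustration only: Benzina's real conversion tooling is described in its docs.
# This sketch merely shows the idea of transcoding JPEGs into single-frame
# H.265 bitstreams with ffmpeg; names, paths, and flags are assumptions.
import subprocess
from pathlib import Path

def transcode_to_h265(src_dir, dst_dir):
    """Transcode every JPEG under src_dir into a raw single-frame HEVC stream."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for jpeg in Path(src_dir).rglob("*.jpg"):
        out = dst / (jpeg.stem + ".hevc")
        subprocess.run(
            ["ffmpeg", "-y", "-i", str(jpeg),
             "-frames:v", "1",        # one coded picture per image
             "-pix_fmt", "yuv420p",   # 4:2:0 chroma, as used by the HW decoder
             "-c:v", "libx265",       # software H.265/HEVC encoder
             "-f", "hevc", str(out)],
            check=True)

# Example: transcode_to_h265("path/to/imagenet/train", "path/to/hevc/train")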

The name "Benzina" is a phonetic transliteration of the Ukrainian word "Бензина", meaning "gasoline" (or "petrol").

ImageNet loading in PyTorch

As long as your dataset has been converted into Benzina's data format, you can load it to train a PyTorch model in a few lines of code. The example below shows how this can be done with an ImageNet dataset; it is based on the ImageNet example from PyTorch.

import torch
import benzina.torch as bz
import benzina.torch.operations as ops

seed = 1234
torch.manual_seed(seed)

# Dataset
train_dataset = bz.dataset.ImageNet("path/to/dataset", split="train")
val_dataset = bz.dataset.ImageNet("path/to/dataset", split="val")

# Dataloaders
bias = ops.ConstantBiasTransform(bias=(0.485 * 255, 0.456 * 255, 0.406 * 255))
std = ops.ConstantNormTransform(norm=(0.229 * 255, 0.224 * 255, 0.225 * 255))

train_loader = bz.DataLoader(
    train_dataset,
    shape=(224, 224),
    batch_size=256,
    shuffle=True,
    seed=seed,
    bias_transform=bias,
    norm_transform=std,
    warp_transform=ops.SimilarityTransform(scale=(0.08, 1.0),
                                           ratio=(3./4., 4./3.),
                                           flip_h=0.5,
                                           random_crop=True))
val_loader = bz.DataLoader(
    val_dataset,
    shape=(224, 224),
    batch_size=256,
    shuffle=False,
    seed=seed,
    bias_transform=bias,
    norm_transform=std,
    warp_transform=ops.CenterResizedCrop(224/256))

for epoch in range(1, 10):
    # train for one epoch
    train(train_loader, ...)

    # evaluate on validation set
    accuracy = validate(val_loader, ...)
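
The train and validate calls above are placeholders taken from the PyTorch ImageNet example. A minimal sketch of what they might look like, assuming the loader yields (images, targets) batches that are already on the GPU (an assumption; check Benzina's documentation) and that an ordinary PyTorch model, criterion, and optimizer are passed in place of the "...":

# Minimal sketch of the placeholder train/validate helpers, assuming the
# loader yields (images, targets) batches already resident on the GPU.
# torch is already imported at the top of the example above.
def train(loader, model, criterion, optimizer):
    model.train()
    for images, targets in loader:
        output = model(images)
        loss = criterion(output, targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

def validate(loader, model):
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, targets in loader:
            preds = model(images).argmax(dim=1)
            correct += (preds == targets).sum().item()
            total += targets.size(0)
    return correct / total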

Objectives

Known limitations and important notes

Roadmap

How to Contribute

Submitting bugs

Contributing changes
