MIC-DKFZ / Batchgenerators

Licence: apache-2.0
A framework for data augmentation for 2D and 3D image classification and segmentation

batchgenerators by MIC@DKFZ

batchgenerators is a Python package that we developed at the Division of Medical Image Computing at the German Cancer Research Center (DKFZ) to suit all our deep learning data augmentation needs. It is not (yet) perfect, but we feel it is good enough to be shared with the community. If you encounter a bug, feel free to contact us or open a GitHub issue.

If you use it, please cite the following work:

Isensee Fabian, Jäger Paul, Wasserthal Jakob, Zimmerer David, Petersen Jens, Kohl Simon, 
Schock Justus, Klein Andre, Roß Tobias, Wirkert Sebastian, Neher Peter, Dinkelacker Stefan, 
Köhler Gregor, Maier-Hein Klaus (2020). batchgenerators - a python framework for data 
augmentation. doi:10.5281/zenodo.3632567

Supported Augmentations

We support a variety of augmentations, all of which are compatible with 2D and 3D input data! (This is something that was missing in most other frameworks.)

  • Spatial Augmentations
    • mirroring
    • channel translation (to simulate registration errors)
    • elastic deformations
    • rotations
    • scaling
    • resampling
  • Color Augmentations
    • brightness (additive, multiplicative)
    • contrast
    • gamma (like gamma correction in photo editing)
  • Noise Augmentations
    • Gaussian Noise
    • Rician Noise
    • ...will be expanded in future commits
  • Cropping
    • random crop
    • center crop
    • padding

Note: Stack transforms by using batchgenerators.transforms.abstract_transforms.Compose. Finish it up by plugging the composed transform into our multithreader: batchgenerators.dataloading.multi_threaded_augmenter.MultiThreadedAugmenter
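The stacking idea in the note above can be illustrated without the library installed. The sketch below is not the library's Compose; it is a stand-in that assumes each transform is a callable which receives the batch dictionary as keyword arguments and returns it. The names AddOffset, Scale and ComposeSketch are hypothetical.

```python
class AddOffset:
    """Hypothetical transform: adds a constant to every value under 'data'."""

    def __init__(self, offset):
        self.offset = offset

    def __call__(self, **data_dict):
        data_dict["data"] = [v + self.offset for v in data_dict["data"]]
        return data_dict


class Scale:
    """Hypothetical transform: multiplies every value under 'data'."""

    def __init__(self, factor):
        self.factor = factor

    def __call__(self, **data_dict):
        data_dict["data"] = [v * self.factor for v in data_dict["data"]]
        return data_dict


class ComposeSketch:
    """Stand-in for a composed transform: applies transforms in order."""

    def __init__(self, transforms):
        self.transforms = list(transforms)

    def __call__(self, **data_dict):
        for transform in self.transforms:
            data_dict = transform(**data_dict)
        return data_dict


pipeline = ComposeSketch([AddOffset(1.0), Scale(2.0)])
# Keys other than 'data' (here: 'label') pass through untouched.
batch = pipeline(data=[0.0, 1.0, 2.0], label="example")
```

With the real package, the composed transform would be handed to MultiThreadedAugmenter together with a data loader; check the class's docstring for the exact constructor arguments.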

How to use it

The working principle is simple: Derive from the DataLoaderBase class, reimplement the generate_train_batch member function and use it to stack your augmentations! For a simple example, see batchgenerators/examples/example_ipynb.ipynb
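As a rough illustration of that pattern, the sketch below mimics the role of a DataLoaderBase subclass using only the standard library. ToyLoader, its constructor arguments and the sample layout are assumptions made for this example; in real use you would subclass the library's base class and return NumPy arrays.

```python
import random


class ToyLoader:
    """Stand-in for a DataLoaderBase subclass (hypothetical, not the library class)."""

    def __init__(self, samples, batch_size, seed=0):
        self.samples = samples
        self.batch_size = batch_size
        self.rng = random.Random(seed)

    def generate_train_batch(self):
        # Draw batch_size samples at random and pack them into one dictionary.
        chosen = [self.rng.choice(self.samples) for _ in range(self.batch_size)]
        return {
            "data": [s["image"] for s in chosen],
            "target": [s["target"] for s in chosen],
        }

    def __iter__(self):
        return self

    def __next__(self):
        # Every call to next() produces a freshly generated batch.
        return self.generate_train_batch()


samples = [{"image": [[0.0, 1.0]], "target": 0}, {"image": [[2.0, 3.0]], "target": 1}]
loader = ToyLoader(samples, batch_size=4)
first_batch = next(loader)
```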

We also now have an extensive example for BraTS2017/2018 with both 2D and 3D DataLoader and augmentations: batchgenerators/examples/brats2017/

There are also CIFAR10/100 datasets and DataLoader available at batchgenerators/datasets/cifar.py

Data Structure

The data structure that is used internally (and with which you have to comply when implementing generate_train_batch) is kept simple as well: It is just a regular Python dictionary! We did this to allow maximum flexibility in the kind of data that is passed along through the pipeline. The dictionary must have a 'data' key:value pair. It can optionally hold a 'seg' key:value pair containing a segmentation. If a 'seg' key:value pair is present, all spatial transformations will also be applied to the segmentation! Apart from 'data' and 'seg', you are free to add whatever you want (your image classification/regression target, for example). All key:value pairs other than 'data' and 'seg' will be passed through the pipeline unmodified.

The 'data' value must have shape (b, c, x, y) for 2D or shape (b, c, x, y, z) for 3D! The 'seg' value must have shape (b, c, x, y) for 2D or shape (b, c, x, y, z) for 3D! The channel dimension of 'seg' may be used to hold several segmentation maps. If you have only one segmentation, make sure it has shape (b, 1, x, y (, z)).
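A small shape check can catch layout mistakes before a batch enters the pipeline. The helper below is a sketch, not part of the library: check_batch_shapes is a hypothetical name, and the requirement that 'seg' shares the batch and spatial dimensions of 'data' is an assumption inferred from the shared symbols above.

```python
def check_batch_shapes(data_shape, seg_shape=None):
    """Return '2D' or '3D' for a valid layout, raise ValueError otherwise."""
    if len(data_shape) not in (4, 5):
        raise ValueError(
            "'data' must have shape (b, c, x, y) or (b, c, x, y, z), got %r"
            % (tuple(data_shape),)
        )
    if seg_shape is not None:
        if len(seg_shape) != len(data_shape):
            raise ValueError("'seg' must have as many dimensions as 'data'")
        # Assumption: batch size and spatial size agree; channel counts may differ.
        same_batch = seg_shape[0] == data_shape[0]
        same_spatial = tuple(seg_shape[2:]) == tuple(data_shape[2:])
        if not (same_batch and same_spatial):
            raise ValueError("'seg' must match 'data' in batch and spatial size")
    return "2D" if len(data_shape) == 4 else "3D"


kind_2d = check_batch_shapes((2, 3, 64, 64), (2, 1, 64, 64))
kind_3d = check_batch_shapes((2, 1, 32, 32, 32))
```

With NumPy arrays this would be called as check_batch_shapes(batch['data'].shape, batch['seg'].shape).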

How to install locally

Install batchgenerators

pip install --upgrade batchgenerators

Import as follows

from batchgenerators.transforms.color_transforms import ContrastAugmentationTransform

Windows Support is very experimental!

Batchgenerators makes heavy use of Python multiprocessing, and Python multiprocessing on Windows is different from Linux. To prevent the workers from freezing on Windows, you have to guard your code with if __name__ == '__main__' and use multiprocessing's freeze_support. The executed script may then look like this:

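# some imports and functions here

def main():
    # do some stuff (build the data loader, transforms and augmenter here)
    pass

if __name__ == '__main__':
    # the guard keeps worker processes from re-running the script body on Windows
    from multiprocessing import freeze_support
    freeze_support()
    main()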

This is not required on Linux.

Release Notes

(only highlights, not an exhaustive list)

  • 0.20.0: fixed an issue with MultiThreadedAugmenter not terminating properly after KeyboardInterrupt; fixed an error with the number and order of samples being returned when pin_memory=True; improved performance by always hiding the process-process communication bottleneck through threading

  • 0.19.5: fixed OMP_NUM_THREADS issue by using threadpoolctl package; dropped python 2 support (threadpoolctl is not available for python 2)

  • 0.19:

    • There is now a complete example for BraTS2017/8 available for both 2D and 3D. Use this if you would like to get some insights on how I (Fabian) do my experiments
    • Windows is now supported! Thanks @justusschock for your support!
    • new, simple parametrization of elastic deformation. Use SpatialTransform_2!
    • CIFAR10/100 DataLoader are now available for your convenience
    • a bug in MultiThreadedAugmenter that could interfere with reproducibility is now fixed
  • 0.18:

    • all augmentations (there are some exceptions though) are implemented on a per-sample basis. This should make it easier to use the augmentations outside of the Transforms of batchgenerators
    • applicable Transforms now have a keyword p_per_sample with which the user can specify a probability with which this transform is applied to a sample. Before, this was handled by RndTransform and applied to the whole batch (so either all samples were augmented or none). Now this decision is made on a per-sample basis and increases variability by a lot.
    • following the previous point, RndTransform is now deprecated
    • AlternativeMultiThreadedAugmenter is now deprecated as well (no need to have this anymore)
    • pytorch users can now transform numpy arrays to pytorch tensors within batchgenerators (NumpyToTensor). For some reason, inter-process communication is faster with tensors (~factor 4), so this is recommended!
    • if numpy arrays were converted to pytorch tensors, MultiThreadedAugmenter can now pin the memory as well (pin_memory=True). This happens in a background thread (inspired by the pytorch DataLoader). Pinned memory can be copied to the GPU much faster. My (Fabian) classification experiment with Resnet50 got a speed boost of 12% from just that.
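
The difference between the old per-batch decision and the p_per_sample behaviour described in the 0.18 notes can be sketched as follows. This is an illustration of the idea only, not the library's implementation; apply_per_batch and apply_per_sample are hypothetical helper names.

```python
import random


def apply_per_batch(batch, transform, p, rng):
    """One coin flip for the whole batch: all samples change or none do."""
    if rng.random() < p:
        return [transform(sample) for sample in batch]
    return list(batch)


def apply_per_sample(batch, transform, p, rng):
    """One coin flip per sample: each sample is transformed independently."""
    return [transform(sample) if rng.random() < p else sample for sample in batch]


def negate(value):
    # Toy "augmentation" so that changed samples are easy to spot.
    return -value


rng = random.Random(0)
batch = [1, 2, 3, 4]
whole_batch_result = apply_per_batch(batch, negate, p=0.5, rng=rng)
per_sample_result = apply_per_sample(batch, negate, p=0.5, rng=rng)
```

Per-batch application can only ever yield two outcomes for a given batch, whereas per-sample application can yield any mix of transformed and untransformed samples, which is where the added variability comes from.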