
mmalotin / pytorch-fast-neural-style-mobilenetV2

Licence: other
Fast neural style with MobileNetV2 bottleneck blocks

Programming Languages

python

Projects that are alternatives of or similar to pytorch-fast-neural-style-mobilenetV2

  • SANET: Arbitrary Style Transfer with Style-Attentional Networks. Stars: ✭ 105 (+400%). Mutual labels: fast-neural-style
  • mobilenet-v2-tensorflow: No description or website provided. Stars: ✭ 66 (+214.29%). Mutual labels: mobilenet-v2
  • MobileNet V2 Keras: No description or website provided. Stars: ✭ 29 (+38.1%). Mutual labels: mobilenet-v2
  • Eye VR Segmentation: AR/VR Eye Semantic Segmentation - Rank 5th/17 - OpenEDS 2019. Stars: ✭ 45 (+114.29%). Mutual labels: mobilenet-v2
  • MeuralPaint: TensorFlow implementation of CNN fast neural style transfer ⚡️ 🎨 🌌. Stars: ✭ 19 (-9.52%). Mutual labels: fast-neural-style
  • Neural-Zoom: Infinite Zoom For Style Transfer. Stars: ✭ 34 (+61.9%). Mutual labels: fast-neural-style
  • Pytorch Image Models: PyTorch image models, scripts, pretrained weights -- ResNet, ResNeXT, EfficientNet, EfficientNetV2, NFNet, Vision Transformer, MixNet, MobileNet-V3/V2, RegNet, DPN, CSPNet, and more. Stars: ✭ 15,232 (+72433.33%). Mutual labels: mobilenet-v2
  • mobilenet segmentation: Binary semantic segmentation with UNet based on MobileNetV2 encoder. Stars: ✭ 18 (-14.29%). Mutual labels: mobilenet-v2
  • LightNet: LightNet: Light-weight Networks for Semantic Image Segmentation (Cityscapes and Mapillary Vistas Dataset). Stars: ✭ 710 (+3280.95%). Mutual labels: mobilenet-v2

Fast neural style with MobileNetV2 bottleneck blocks

This repository contains a PyTorch implementation of an algorithm for artistic style transfer. The implementation is based on the following papers and repositories:

  • Perceptual Losses for Real-Time Style Transfer and Super-Resolution (Johnson et al.)
  • MobileNetV2: Inverted Residuals and Linear Bottlenecks (Sandler et al.)

Main Differences from Other Implementations

  • Residual blocks and convolutions are replaced with MobileNetV2 bottleneck blocks, which make use of inverted residuals and depthwise separable convolutions (a minimal sketch of such a block appears after this list).

    [Figure: the two types of MobileNetV2 bottleneck blocks used in the transformer network]

    The figure shows two types of MobileNetV2 bottleneck blocks: the left one is used instead of a residual block and the right one is used instead of a convolution layer. The purposes of this change are:

    • Decrease the number of trainable parameters of the transformer network from ~1.67M to ~0.23M, and therefore the amount of memory the transformer network uses.

    • In theory this should give a good speedup at training time and, more importantly, at inference time (fast neural style should be as fast as possible). In practice things are not so good: this transformer architecture is only slightly faster than the original one, mainly because depthwise convolutions are not implemented as efficiently on GPUs as regular convolutions are (on CPU the speedup is bigger, but still not drastic).

  • This implementation uses a feature extractor wrapper around a PyTorch module, which uses PyTorch hook methods to retrieve layer activations (see the sketch after this list). With this extractor:

    • You don't need to write a new module wrapper to extract the desired features every time you want to use a new loss network. You just pass the model and layer indexes to the feature extractor wrapper, and it handles the extraction for you. (Note: the wrapper flattens the input module/model, so you need to pass indexes into the flattened module, i.e. if the module/model is a composition of smaller modules, it is represented as a flat list of layers inside the wrapper.)

    • It makes the training process slightly faster.

  • The implementation allows you to use different weights for different style features, which leads to better visual results.
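
Below is a minimal sketch of a MobileNetV2-style bottleneck block of the kind described above. It assumes the standard inverted-residual layout from the MobileNetV2 paper (1x1 expansion, 3x3 depthwise convolution, 1x1 linear projection), an expansion factor of 6, and instance normalization (conventional in fast-neural-style transformer networks); the blocks in this repository may differ in these details.

    import torch
    import torch.nn as nn

    class InvertedResidual(nn.Module):
        """MobileNetV2-style bottleneck: expand -> depthwise -> linear project."""
        def __init__(self, in_ch, out_ch, stride=1, expansion=6):
            super().__init__()
            hidden = in_ch * expansion
            # The residual (left) variant keeps the tensor shape; the
            # non-residual (right) variant stands in for a plain convolution.
            self.use_residual = stride == 1 and in_ch == out_ch
            self.block = nn.Sequential(
                nn.Conv2d(in_ch, hidden, kernel_size=1, bias=False),   # 1x1 expand
                nn.InstanceNorm2d(hidden),
                nn.ReLU6(inplace=True),
                nn.Conv2d(hidden, hidden, kernel_size=3, stride=stride,
                          padding=1, groups=hidden, bias=False),       # 3x3 depthwise
                nn.InstanceNorm2d(hidden),
                nn.ReLU6(inplace=True),
                nn.Conv2d(hidden, out_ch, kernel_size=1, bias=False),  # 1x1 linear projection
                nn.InstanceNorm2d(out_ch),
            )

        def forward(self, x):
            out = self.block(x)
            return x + out if self.use_residual else out

The groups=hidden argument is what makes the 3x3 convolution depthwise, which is where both the parameter savings and the GPU-efficiency caveat above come from.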
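
The hook-based feature extractor can be sketched as follows. The class name, the flattening rule, and the VGG16 layer indexes in the usage example are illustrative assumptions, not the repository's actual API.

    import torch
    import torch.nn as nn
    from torchvision import models

    class FeatureExtractor(nn.Module):
        """Collects activations of chosen layers via forward hooks."""
        def __init__(self, model, layer_indexes):
            super().__init__()
            self.model = model.eval()
            self.activations = {}
            # Flatten the (possibly nested) model into its leaf layers; the
            # supplied indexes refer to positions in this flat list.
            leaves = [m for m in model.modules() if len(list(m.children())) == 0]
            for idx in layer_indexes:
                leaves[idx].register_forward_hook(self._save(idx))

        def _save(self, idx):
            def hook(module, inputs, output):
                self.activations[idx] = output
            return hook

        def forward(self, x):
            self.activations.clear()
            self.model(x)
            return self.activations  # keyed by flattened layer index

    # Example: pull four ReLU activations out of a VGG16 loss network
    # (indexes 3, 8, 15, 22 are relu1_2 .. relu4_3 in the flattened list).
    vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features
    extractor = FeatureExtractor(vgg, [3, 8, 15, 22])
    with torch.no_grad():
        feats = extractor(torch.randn(1, 3, 128, 128))

Per-layer weights for the style loss (the last point above) can then be applied when summing the Gram-matrix losses computed from these activations.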

Requirements

Usage

To train the transformer network:

python fnst.py -train

To stylize an image with a pretrained model:

python fnst.py

All configurable parameters are stored as globals at the top of fnst.py, so to configure them just edit them in fnst.py (I thought this was more convenient than adding a dozen command-line arguments).
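
For illustration, such a globals section might look like the sketch below; the names and values here are hypothetical, not the actual contents of fnst.py.

    # Hypothetical configuration globals, for illustration only; see the top
    # of fnst.py in the repository for the real names and values.
    DATASET_PATH = 'data/coco'            # training images
    STYLE_IMAGE = 'styles/mosaic.jpg'
    IMAGE_SIZE = 128                      # training resolution
    EPOCHS = 3
    CONTENT_WEIGHT = 1e5
    STYLE_WEIGHTS = (1.0, 1.0, 1.0, 1.0)  # per-style-layer weights
    LEARNING_RATE = 1e-3
    MODEL_PATH = 'models/mosaic.pth'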

There is also a webcam demo in the repo; to run it:

python webcam.py

webcam.py also has some globals at the top of the file which you can change.

Examples

All models were trained for 3 epochs on 128x128 COCO dataset images (the resolution is limited by the GTX 960M in my laptop).

  • Styles

  • Results
