yu45020 / Waifu2x

Licence: gpl-3.0

PyTorch on Super Resolution

Labels

pytorch super-resolution

Waifu2x

Re-implementation of the original waifu2x in PyTorch, with additional super-resolution models. This repo is mainly used to explore interesting super-resolution models. User-friendly tools may not be available yet ><.

Dependencies

Python 3.x
PyTorch >= 1.0 (versions newer than 0.4.1 should also work, but this is not guaranteed)
Nvidia/Apex (used for mixed precision training; you may use its Python code directly)

Optional: Nvidia GPU. Model inference (fp32 only) can run on CPU alone.

What's New

Add the CARN model (Fast, Accurate, and Lightweight Super-Resolution with Cascading Residual Network). Model code is adapted from the authors' GitHub repo. I add Spatial Channel Squeeze Excitation and replace all 1x1 convolutions with standard 3x3 convolutions. The model is trained in fp16 with Nvidia's Apex. Details and plots on model variants can be found in docs/CARN.
Dilated convolution seems less effective in super resolution (it may even make the model worse), though it brings some improvement in image segmentation, especially when the dilation rate increases and then decreases. Further investigation is needed.

How to Use

Compare the input image and upscaled image

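# explicit imports for the names used below, in case the star imports do not re-export them
import torch
from torch import nn
from PIL import Image
from torchvision.transforms.functional import to_tensor
from torchvision.utils import save_image

from utils.prepare_images import *
from Models import *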
model_cran_v2 = CARN_V2(color_channels=3, mid_channels=64, conv=nn.Conv2d,
                        single_conv_size=3, single_conv_group=1,
                        scale=2, activation=nn.LeakyReLU(0.1),
                        SEBlock=True, repeat_blocks=3, atrous=(1, 1, 1))
                        
model_cran_v2 = network_to_half(model_cran_v2)
checkpoint = "model_check_points/CRAN_V2/CARN_model_checkpoint.pt"
model_cran_v2.load_state_dict(torch.load(checkpoint, 'cpu'))
# on a GPU, comment out the next line so the model stays in fp16
model_cran_v2 = model_cran_v2.float() 

demo_img = "input_image.png"
img = Image.open(demo_img).convert("RGB")

# original image, kept as the reference
img_t = to_tensor(img).unsqueeze(0)

# downscale by 2; the model upscales this copy and the result is compared with the original
img = img.resize((img.size[0] // 2, img.size[1] // 2), Image.BICUBIC)

# overlapping split
# if the input image is too large, split it into overlapping patches
# details can be found at https://github.com/nagadomi/waifu2x/issues/238
img_splitter = ImageSplitter(seg_size=64, scale_factor=2, boarder_pad_size=3)
img_patches = img_splitter.split_img_tensor(img, scale_method=None, img_pad=0)
with torch.no_grad():
    out = [model_cran_v2(i) for i in img_patches]
img_upscale = img_splitter.merge_img_tensor(out)

final = torch.cat([img_t, img_upscale])
save_image(final, 'out.png', nrow=2)
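
The overlapping split used above can be illustrated without PyTorch. The following is only a minimal sketch of the idea, not the repo's ImageSplitter: the function names are hypothetical, the image is a plain 2D list, and nearest-neighbour upscaling stands in for the network. Each tile carries a few pixels of context on every side, and that context is cropped away (multiplied by the scale factor) after upscaling.

```python
def pad_edge(img, pad):
    """Replicate border pixels `pad` times on every side of a 2D list."""
    rows = [[row[0]] * pad + row + [row[-1]] * pad for row in img]
    top = [rows[0][:] for _ in range(pad)]
    bottom = [rows[-1][:] for _ in range(pad)]
    return top + rows + bottom


def split(img, seg, pad):
    """Return [(y, x, patch)]; each patch is a core tile plus `pad` pixels of context."""
    h, w = len(img), len(img[0])
    padded = pad_edge(img, pad)
    patches = []
    for y in range(0, h, seg):
        for x in range(0, w, seg):
            y1, x1 = min(y + seg, h), min(x + seg, w)
            # In padded coordinates the core starts at (y + pad, x + pad),
            # so slicing from y to y1 + 2 * pad keeps context on both sides.
            patch = [row[x:x1 + 2 * pad] for row in padded[y:y1 + 2 * pad]]
            patches.append((y, x, patch))
    return patches


def upscale_nearest(img, scale):
    """Stand-in for the model: nearest-neighbour upscaling of a 2D list."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(scale)]
        out.extend(wide[:] for _ in range(scale))
    return out


def merge(patches, h, w, pad, scale):
    """Crop pad * scale pixels from each upscaled patch and paste the core back."""
    out = [[0] * (w * scale) for _ in range(h * scale)]
    crop = pad * scale
    for y, x, up in patches:
        core = [row[crop:len(row) - crop] for row in up[crop:len(up) - crop]]
        for dy, row in enumerate(core):
            out[y * scale + dy][x * scale:x * scale + len(row)] = row
    return out


img = [[y * 10 + x for x in range(7)] for y in range(5)]
tiles = split(img, seg=3, pad=1)
upscaled = [(y, x, upscale_nearest(p, 2)) for y, x, p in tiles]
result = merge(upscaled, h=5, w=7, pad=1, scale=2)
```

With a purely local stand-in such as nearest-neighbour upscaling, the merged result equals upscaling the whole image at once. A real network has a larger receptive field, which is why the padding size matters.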

Training

If possible, fp16 training is preferred because it is much faster with minimal loss in quality.

A sample training script is available in train.py, but you may need to change some lines.

Image Processing

Original images are all at least 3K x 3K. I downsample them with LANCZOS so that one side is at most 2048, then randomly cut them into 256x256 patches as targets and use 128x128 patches with JPEG noise as inputs. All input patches are at least 14 KB, and they are stored in SQLite in BLOB format. SQLite seems to perform better than the file system for small objects. The H5 file format may not be optimal because of its larger size.

Although convolutions can take in images of any size, the content of the image matters. For real-life images, small patches may still preserve variation in color, brightness, etc. within small regions, but in digitally drawn images colors are applied in flat blocks. A small patch may end up showing a single color, leaving the model little to learn.

For example, the following two plots come from CARN with the same settings, including initial parameters. Both training loss and SSIM are lower for 64x64 patches, but that model performs worse at test time than the one trained on 128x128 patches.
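
As an illustration of the storage step, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and helper functions are assumptions made for this example and are not taken from the repo; only the 14 KB threshold and the use of BLOB columns come from the description above.

```python
import sqlite3

MIN_BYTES = 14 * 1024  # size threshold for input patches, from the text above


def open_patch_db(path=":memory:"):
    """Open (or create) a database with one table of patch pairs (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS patches ("
        "id INTEGER PRIMARY KEY, lr BLOB NOT NULL, hr BLOB NOT NULL)"
    )
    return conn


def add_patch(conn, lr_bytes, hr_bytes, min_bytes=MIN_BYTES):
    """Store one (input, target) pair; skip inputs smaller than `min_bytes`."""
    if len(lr_bytes) < min_bytes:
        return False
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO patches (lr, hr) VALUES (?, ?)", (lr_bytes, hr_bytes)
        )
    return True


def get_patch(conn, patch_id):
    """Return (input_bytes, target_bytes) for a row id, or None if it is missing."""
    row = conn.execute(
        "SELECT lr, hr FROM patches WHERE id = ?", (patch_id,)
    ).fetchone()
    return None if row is None else (bytes(row[0]), bytes(row[1]))


conn = open_patch_db()
large = bytes(range(256)) * 64   # 16,384 bytes, passes the size filter
small = b"\x00" * 100            # too small, rejected
kept = add_patch(conn, large, large)
skipped = add_patch(conn, small, large)
```

A size filter of this kind is a cheap proxy for content: nearly flat patches compress to very few bytes, so they are dropped before training.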

Downsampling methods are chosen uniformly from [PIL.Image.BILINEAR, PIL.Image.BICUBIC, PIL.Image.LANCZOS], so different patches from the same image may be downscaled in different ways.

Image noise comes from JPEG compression only. It is added by re-encoding PNG images into PIL's JPEG data at various quality levels. Noise level 1 means quality is drawn uniformly from [75, 95]; level 2 means quality is drawn uniformly from [50, 75].
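
The sampling described above can be sketched with the standard library alone. The function and constant names are hypothetical, the resampling filters are represented by their names rather than the PIL constants, and integer quality values are an assumption; only the three methods and the two quality ranges come from the text.

```python
import random

# Names of the PIL.Image resampling filters listed above.
RESAMPLERS = ("BILINEAR", "BICUBIC", "LANCZOS")

# JPEG quality range per noise level, both ends inclusive.
QUALITY_RANGES = {1: (75, 95), 2: (50, 75)}


def sample_degradation(noise_level, rng=random):
    """Pick a downscaling method and a JPEG quality for one patch."""
    low, high = QUALITY_RANGES[noise_level]
    method = rng.choice(RESAMPLERS)
    quality = rng.randint(low, high)  # randint includes both endpoints
    return method, quality


rng = random.Random(0)  # seeded only to make the example repeatable
method, quality = sample_degradation(1, rng)
```

With Pillow, the sampled values would typically be passed to `img.resize(size, getattr(Image, method))` and `img.save(buffer, format="JPEG", quality=quality)`.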

Models

Models are tuned and modified with extra features.

From Waifu2x

Upconv7
Vgg_7
Cascaded Residual U-Net with SEBlock (PyTorch code is not available yet; still under testing)

Models Comparison

Images are from Key: サマボケ(Summer Pocket).

The left column is the original image; the right column shows the bicubic, DCSCN, and CRAN_V2 results.

Scores

The list will be updated after I add more models.

Images are Twitter icons (PNG) from Key: サマボケ(Summer Pocket). They are cropped into non-overlapping 96x96 patches and downscaled by 2. The images are then re-encoded as JPEG with quality drawn from [75, 95]. Scores are PSNR, with MS-SSIM in parentheses.

Model	Total Parameters	BICUBIC	Random*
CRAN V2	2,149,607	34.0985 (0.9924)	34.0509 (0.9922)
DCSCN 12	1,889,974	31.5358 (0.9851)	31.1457 (0.9834)
Upconv 7	552,480	31.4566 (0.9788)	30.9492 (0.9772)

*Downscaling method selected uniformly from Image.BICUBIC, Image.BILINEAR, Image.LANCZOS.
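
For reference, PSNR can be computed as in the minimal sketch below. It works on flat sequences of 8-bit pixel values and uses the standard definition; the repo's exact evaluation settings (color space, border cropping) are not stated here, so this is not a reproduction of the scores in the table.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if not reference or len(reference) != len(estimate):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Every pixel off by one gray level: MSE is 1, so PSNR is 20 * log10(255).
score = psnr([100, 100, 100, 100], [101, 101, 101, 101])
```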

DCSCN

Fast and Accurate Image Super Resolution by Deep CNN with Skip Connection and Network in Network

DCSCN is very interesting because its forward computation is relatively quick, and both the shallow model (8 layers) and the deep model (12 layers) are quick to train. The settings differ from the paper:

I use exponential decay to decrease the number of feature filters in each layer. Here is the original filter decay method.
I also increase the reconstruction filters from 48 to 128.
All activations are replaced by SELU. Dropout and weight decay are not added either, because they significantly increase the training time.
The loss function is changed from MSE to L1. According to Loss Functions for Image Restoration with Neural Networks, L1 seems to be more robust and converges faster than MSE. But the authors find that the results from L1 and MSE are similar.
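
One way to read "exponential decay" of the filter count is a geometric interpolation between the first and last feature layers. The sketch below shows that reading only; the function name, the formula, and the final filter count of 32 are assumptions for illustration and are not taken from the repo or the paper.

```python
def decayed_filters(layers, first, last):
    """Filter counts falling geometrically from `first` to `last` over `layers` layers."""
    if layers < 2:
        return [first]
    # Constant ratio between consecutive layers, so counts shrink exponentially.
    ratio = (last / first) ** (1.0 / (layers - 1))
    return [max(last, round(first * ratio ** i)) for i in range(layers)]


# 8 layers starting at 96 filters; the end value 32 is a hypothetical choice.
schedule = decayed_filters(layers=8, first=96, last=32)
```

Compared with a linear schedule, a geometric one removes most filters in the early layers and flattens out toward the end.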

I would like to thank jiny2001 (one of the paper's authors) for testing the difference between SELU and PReLU. SELU seems more stable and has fewer parameters to train. It is a good drop-in replacement.

layers=8, filters=96 and dataset=yang91+bsd200. The details can be found here.

A pre-trained 12-layer model and its parameters are available. Its run time is around 3-5 times that of Waifu2x. The output quality is usually visually indistinguishable, but its PSNR and SSIM are a bit higher. Such a comparison is not fair, though, since the 12-layer model has 1,889,974 parameters, about 3.4 times as many as waifu2x's Upconv_7 model (552,480).

CARN

Channels are set to 64 across all blocks, so residual adds are very effective. Increasing the channels to 128 lowers the loss curve a little but more than triples the total parameters, from 0.9 million to 3 million. 32 channels perform much worse. Increasing the number of cascaded blocks from 3 to 5 doesn't lower the loss much.
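
The jump in parameter count follows from how convolution weights scale with channel width. The helper below is a hypothetical illustration using the standard count for a 2D convolution (kernel height x kernel width x input channels x output channels, plus one bias per output channel); it does not reproduce the model's exact totals.

```python
def conv2d_params(in_ch, out_ch, kernel=3, bias=True):
    """Number of trainable parameters in a standard 2D convolution layer."""
    return kernel * kernel * in_ch * out_ch + (out_ch if bias else 0)


inner_64 = conv2d_params(64, 64)      # 36,928
inner_128 = conv2d_params(128, 128)   # 147,584
first_64 = conv2d_params(3, 64)       # 1,792
first_128 = conv2d_params(3, 128)     # 3,584
```

Layers that map channels to channels grow roughly fourfold when the width doubles, while layers attached to the 3 color channels only double, so the whole model grows by somewhat less than 4x.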

SE blocks seem to bring the most obvious improvement without adding much computation. Partial based padding seems to have little effect, if it does not decrease the quality. Atrous convolution is about 10%-20% slower than normal convolution in PyTorch 1.0, with no obvious improvement.

Another more effective change is to add the upscaled input image to the output of the final convolution. A simple bilinear upscaled image seems sufficient.

More examples of model configurations can be found in the docs/CARN folder.

Waifu2x Original Models

Models can load waifu2x's pre-trained weights. The function forward_checkpoint sets the nn.LeakyReLU to compute data inplace.

Upconv_7

The original waifu2x model. The PyTorch implementation on CPU only takes around 5 times longer for large images. The output images have PSNR and SSIM scores very close to those of images generated by the caffe version, though they are not identical.

Vgg_7

Not tested yet, but it is ready to use.
