# VapourSynth-BM3D

Copyright © 2015-2016 mawen1250

BM3D denoising filter for VapourSynth
## Description
BM3D is a state-of-the-art image denoising algorithm. It can be extended to video denoising, named VBM3D, which is also implemented in this plugin.
This filter is much easier to use with the wrapper function `BM3D()` in mvsfunc.

Requirements: `libfftw3f-3.dll` from FFTW3 should be in the search path.

namespace: `bm3d`

functions: `RGB2OPP`, `OPP2RGB`, `Basic`, `Final`, `VBasic`, `VFinal`, `VAggregate`
## Supported Formats

- sample type & bps: 8-16 bit integer, 32 bit float
- color family: Gray, RGB, YUV or YCoCg
- sub-sampling: when chroma is processed, sub-sampling is not supported (YUV444 only)
## Important Note

The denoising quality is best when filtering in opponent color space (abbr. OPP, a kind of YUV color space with simple and intuitive matrix coefficients), which significantly outperforms filtering in RGB, YCbCr, YCgCo, etc. Thus RGB input is recommended; this filter will convert it to OPP internally and convert back to RGB for output.

Sub-sampling support is not implemented, both for simplicity and for the sake of quality. In fact, the computational cost of conversion between YUV4xx and RGB is negligible compared with that of the filtering kernel.

Alternatively, call bm3d.RGB2OPP first to avoid repeatedly converting between RGB and OPP, and call bm3d.OPP2RGB at the end to convert back to RGB. However, compared with the computational cost of the BM3D kernel, these conversions barely affect the speed of the entire filtering procedure, and keeping the clip in OPP may increase memory consumption (especially with sample=1, which keeps a 32 bit float clip in the VS processing chain).
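For reference, the opponent transform described in the BM3D literature can be sketched as below. This is a minimal per-pixel illustration with hypothetical helper names; the exact scaling used inside bm3d.RGB2OPP may differ.

```python
# Sketch of the opponent (OPP) color transform from the BM3D literature.
# Hypothetical helpers for illustration; the plugin performs this
# conversion internally (possibly with different normalization).

def rgb_to_opp(r, g, b):
    """Forward transform: RGB -> OPP for a single pixel in [0, 1]."""
    y = (r + g + b) / 3.0         # luma-like channel
    u = (r - b) / 2.0             # first opponent (chroma) channel
    v = (r - 2.0 * g + b) / 4.0   # second opponent (chroma) channel
    return y, u, v

def opp_to_rgb(y, u, v):
    """Inverse transform: OPP -> RGB, the exact inverse of rgb_to_opp."""
    r = y + u + 2.0 * v / 3.0
    g = y - 4.0 * v / 3.0
    b = y - u + 2.0 * v / 3.0
    return r, g, b
```

The round trip is lossless in floating point; the plugin's 16 bit integer path additionally quantizes the values.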

For VBM3D, the filtered output is always in OPP for RGB input, and you should manually call bm3d.OPP2RGB afterwards.

For VBM3D, if a specific plane is not processed (its sigma is 0), the result of that plane will be garbage, so you should manually use std.ShufflePlanes to merge the processed and unprocessed planes. For the same reason, you should always convert RGB input to OPP first if you want to keep the unprocessed planes.
## Usage

### Helper Functions
RGB color space to opponent color space.

```
bm3d.RGB2OPP(clip input[, int sample=0])
```

input:
The input clip, must be of RGB color family.
The output clip is of YUV color family, 16 bit integer or 32 bit float.

sample:
- 0 - 16 bit integer output (default)
- 1 - 32 bit float output
Opponent color space to RGB color space.

```
bm3d.OPP2RGB(clip input[, int sample=0])
```

input:
The input clip, must be of YUV color family.
The output clip is of RGB color family, 16 bit integer or 32 bit float.

sample:
- 0 - 16 bit integer output (default)
- 1 - 32 bit float output
### BM3D Functions
BM3D is a spatial domain denoising (image denoising) filter.
#### basic estimate of BM3D denoising filter
The input clip is processed in 3 steps. For each reference block:

1. Grouping.
   Use block-matching within the noisy image to find the locations of blocks similar to the reference one (alternatively, if clip "ref" is assigned, block-matching is applied on the reference clip). Then form a group by stacking together the matched blocks from the noisy image.
2. Collaborative hard-thresholding.
   Apply a 3D transform to the formed group. Attenuate the noise by hard-thresholding of the transform coefficients. Invert the 3D transform to produce estimates of all grouped blocks.
3. Aggregation.
   Compute a basic estimate by aggregating the obtained block-wise estimates in each group using a weighted average.

This basic estimate produces a decent estimate of the noise-free image, serving as a reference for the final estimate.
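The grouping and hard-thresholding steps can be illustrated with a toy 1-D sketch. The helper names are hypothetical and heavily simplified: the real filter operates on 2-D image blocks with a proper 3D transform, while this sketch only shows the matching-by-MSE and coefficient-thresholding ideas.

```python
# Toy 1-D illustration of BM3D's grouping and hard-thresholding steps.
# Hypothetical helpers; the plugin works on 2-D blocks with a 3D transform.

def mse(a, b):
    """Mean squared error between two equally sized blocks."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def group_blocks(signal, ref_pos, block_size=4, bm_range=8,
                 th_mse=1.0, group_size=8):
    """Grouping: collect blocks within bm_range of the reference position
    whose MSE against the reference block is below th_mse, most similar
    first, up to group_size blocks."""
    ref = signal[ref_pos:ref_pos + block_size]
    matches = []
    lo = max(0, ref_pos - bm_range)
    hi = min(len(signal) - block_size, ref_pos + bm_range)
    for pos in range(lo, hi + 1):
        blk = signal[pos:pos + block_size]
        d = mse(ref, blk)
        if d <= th_mse:
            matches.append((d, blk))
    matches.sort(key=lambda t: t[0])
    return [blk for _, blk in matches[:group_size]]

def hard_threshold(coeffs, thr):
    """Collaborative hard-thresholding: zero out transform coefficients
    whose magnitude falls below the threshold."""
    return [c if abs(c) >= thr else 0.0 for c in coeffs]
```

The reference block always matches itself with distance 0, so it is the first element of the returned group.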
```
bm3d.Basic(clip input[, clip ref=input, string profile="fast", float[] sigma=[10,10,10], int block_size, int block_step, int group_size, int bm_range, int bm_step, float th_mse, float hard_thr, int matrix=2])
```

input:
The input clip, the clip to be filtered with collaborative hard-thresholding.
The output clip is of the same format, width and height as the input clip.

ref:
The reference clip, used in block-matching.
If not specified, the input clip is used instead.
profile:
Preset profiles. The tables below show the default parameters for each profile.
- "fast" - Fast Profile (default)
- "lc" - Low Complexity Profile
- "np" - Normal Profile
- "high" - High Profile
- "vn" - Very Noisy Profile
sigma:
The strength of denoising, valid range [0, +inf), default [10,10,10].
This denoising algorithm is very sensitive to sigma, so adjust it carefully according to the source.
Technically, this is the standard deviation of i.i.d. zero-mean additive white Gaussian noise in 8 bit scale. The BM3D denoising filter is designed around this noise model and is best suited to attenuating it.
An array of up to 3 elements can be assigned to set a different sigma for the Y, U, V channels. If fewer than 3 elements are assigned, the last element's value is used for the undefined elements.
0 disables processing for the corresponding channel.
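The expansion rule for the sigma array can be sketched as follows. The helper name is hypothetical; the expansion happens inside the plugin.

```python
def expand_sigma(sigma, num_planes=3):
    """Replicate the last element to fill undefined planes, mirroring
    the documented behavior when fewer than 3 sigma values are given.
    (Hypothetical helper; the plugin does this internally.)"""
    s = list(sigma)
    while len(s) < num_planes:
        s.append(s[-1])
    return s[:num_planes]
```

For example, sigma=[10,7] is interpreted as [10,7,7], and sigma=5 (a single value) as [5,5,5].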
block_size:
The size of a block is block_size x block_size (the 1st and 2nd dimensions), valid range [1,64].
A block is the basic processing unit of BM3D, representing a local patch.
Generally, larger blocks are slower, especially in the DCT/IDCT part. At the same time, a larger block_size allows a larger block_step, resulting in fewer blocks to process.
8 is a well-balanced value for both quality and speed.
block_step:
Sliding step between successive reference blocks, valid range [1, block_size].
The total number of reference blocks to be processed can be estimated as (width / block_step) * (height / block_step).
A smaller step processes more reference blocks and is slower.
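As a rough illustration of that estimate (a hypothetical helper, assuming integer division):

```python
def approx_ref_blocks(width, height, block_step):
    """Approximate number of reference blocks processed per plane,
    per the (width / block_step) * (height / block_step) estimate."""
    return (width // block_step) * (height // block_step)
```

For a 1920x1080 clip, block_step=8 gives roughly 32400 reference blocks, while block_step=4 roughly quadruples that to 129600.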
group_size:
Maximum number of similar blocks in each group (the 3rd dimension), valid range [1,256].
A larger value allows more blocks in a single group; the sparsity of the transformed group rises, so the filtering will be stronger, but also slower in the DCT/IDCT part.
When set to 1, no block-matching is performed and each group consists only of the reference block; the basic estimate stage of BM3D then behaves similarly to the DFTTest denoising filter (without temporal filtering).
bm_range:
Length of the side of the search neighborhood for block-matching, valid range [1, +inf).
The size of the search window is (bm_range * 2 + 1) x (bm_range * 2 + 1).
Larger is slower, with more chances to find similar patches.
bm_step:
Step between two search locations for block-matching, valid range [1, bm_range].
The total number of search locations for each reference block is (bm_range / bm_step * 2 + 1) ^ 2.
Smaller is slower; 1 is equivalent to full-search block-matching.
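The search-location count can be sketched as follows (a hypothetical helper, assuming integer division):

```python
def bm_search_locations(bm_range, bm_step):
    """Total block-matching search locations per reference block:
    (bm_range / bm_step * 2 + 1) ** 2, with integer division."""
    per_axis = bm_range // bm_step * 2 + 1
    return per_axis ** 2
```

With the "fast" defaults (bm_range=9, bm_step=1) this is 361 locations per reference block; raising bm_step to 3 cuts it to 49.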
th_mse:
Threshold of mean squared error for block-matching, valid range [0, +inf).
The larger the value, the more blocks will be matched. Too large a value will lead to more dissimilar blocks being matched into a group, resulting in loss of fine structures and details.
Increase it if the noise in the reference clip is strong. The default value is automatically adjusted according to sigma[0].
hard_thr:
The threshold parameter for hard-thresholding in the 3D transformed domain, in 8 bit scale, valid range (0, +inf).
A larger value results in stronger hard-threshold filtering in the frequency domain.
Usually, to tweak denoising strength, it is better to adjust sigma rather than hard_thr.
matrix:
Matrix coefficients for Gray, YUV or YCoCg input, default 2.
Since YUV color spaces are unnormalized, the actual sigma used inside BM3D is normalized according to the matrix coefficients. This is important! It can significantly affect the final results.
This normalization only takes place when initializing the filter, so frame properties such as "_Matrix" cannot be used for it.
In case matrix is not properly set for BM3D with OPP input, bm3d.RGB2OPP attaches a property "BM3D_OPP=1" to its output frames. If this property is present but matrix is not set to 100, the GetFrame function of BM3D will return an error.
The numbers are as specified in ISO/IEC 14496-10, with an additional one for OPP.
- 0 - GBR
- 1 - bt709
- 2 - Unspecified, will automatically choose between smpte170m, bt709 and bt2020nc according to the width and height of the input clip
- 4 - fcc
- 5 - bt470bg
- 6 - smpte170m
- 7 - smpte240m
- 8 - YCgCo, always set when color family is YCoCg
- 9 - bt2020nc
- 10 - bt2020c
- 100 - OPP, opponent color space converted by bm3d.RGB2OPP, always set when color family is RGB
#### final estimate of BM3D denoising filter
It takes the basic estimate as a reference.
The input clip is processed in 3 steps. For each reference block:

1. Grouping.
   Use block-matching within the basic estimate to find the locations of blocks similar to the reference one. Then form two groups, one from the noisy image and one from the basic estimate.
2. Collaborative Wiener filtering.
   Apply a 3D transform to both groups. Perform empirical Wiener filtering on the noisy one, guided by the basic estimate.
3. Aggregation.
   Compute a final estimate by aggregating the obtained block-wise estimates in each group using a weighted average.

This final estimate can be regarded as a refinement. It can significantly improve denoising quality, keeping more of the details and fine structures that were removed in the basic estimate.
```
bm3d.Final(clip input, clip ref[, string profile="fast", float[] sigma=[10,10,10], int block_size, int block_step, int group_size, int bm_range, int bm_step, float th_mse, int matrix=2])
```

input:
The input clip, the clip to be filtered.
The output clip is of the same format, width and height as the input clip.

ref:
The reference clip, used in block-matching and as the reference in empirical Wiener filtering.
It must be specified. In the original BM3D algorithm, it is the basic estimate.
Alternatively, you can use any other decent denoising filter as the basic estimate, and take this final estimate as a refinement.

profile, sigma, block_size, block_step, group_size, bm_range, bm_step, th_mse, matrix:
Same as those in bm3d.Basic.
### VBM3D Functions
VBM3D extends BM3D to spatial-temporal domain denoising (video denoising).
The algorithm is much the same as BM3D, except that it applies block-matching and collaborative filtering across multiple frames of a video.
For the current frame, it still applies full-search block-matching (when bm_step=1).
For the backward and forward frames, it applies predictive-search block-matching, whose search windows are centered on several matched locations from the previously processed frame.
The obtained block-wise estimates are also aggregated into multiple frames.
However, since the estimates are returned into multiple frames, the implementation is divided into 2 functions: bm3d.VBasic or bm3d.VFinal as the first stage and bm3d.VAggregate as the second stage. The output clip of bm3d.VBasic and bm3d.VFinal is an intermediate processing buffer: it is of 32 bit float format and (radius * 2 + 1) * 2 times the height of the input.
Always call bm3d.VAggregate after bm3d.VBasic or bm3d.VFinal.
Due to the float format and multiple-height output clip, as well as the multiple frames requested by each function, the frame cache leads to very high memory consumption in this VBM3D implementation.
For RGB input, the output clip is in opponent color space with YUV color family. You should manually call bm3d.OPP2RGB after bm3d.VAggregate if you want to convert it back to RGB.
If a specific plane is not processed (its sigma is 0), the result of that plane will be garbage, so you should manually use std.ShufflePlanes to merge the processed and unprocessed planes. For the same reason, you should always convert RGB input to OPP first if you want to keep the unprocessed planes. Because the implementation of VBM3D is divided into 2 functions, it is not very convenient or efficient to pass through the unprocessed planes.
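The geometry of the intermediate buffer follows directly from the description above and can be sketched as below. The helper name is hypothetical, and the interpretation that the doubled height holds per-frame data plus aggregation weights is an assumption, not something the documentation states.

```python
def vbm3d_intermediate_height(height, radius):
    """Height of the intermediate clip produced by bm3d.VBasic /
    bm3d.VFinal: (radius * 2 + 1) * 2 times the input height.
    (Presumably stacked per-frame estimates plus aggregation
    weights -- an assumption for illustration.)"""
    return height * (radius * 2 + 1) * 2
```

For a 1080-line input with radius=1, the intermediate clip is 6480 lines tall in 32 bit float, which is why memory consumption grows quickly with radius.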
#### basic estimate of VBM3D denoising filter
```
bm3d.VBasic(clip input[, clip ref=input, string profile="fast", float[] sigma=[10,10,10], int radius, int block_size, int block_step, int group_size, int bm_range, int bm_step, int ps_num, int ps_range, int ps_step, float th_mse, float hard_thr, int matrix=2])
```

input, ref:
Same as those in bm3d.Basic.

profile, sigma, block_size, block_step, group_size, bm_range, bm_step, th_mse, matrix:
Same as those in bm3d.Basic.
radius:
The temporal radius for denoising, valid range [1, 16].
For each processed frame, (radius * 2 + 1) frames will be requested, and the filtering result will be distributed over these frames by bm3d.VAggregate.
Increasing radius adds only a small computational cost in block-matching and aggregation, and does not affect the collaborative filtering itself, but memory consumption can grow quadratically.
Thus, feel free to use a large radius as long as your RAM is large enough. :D
ps_num:
The number of matched locations used for predictive search, valid range [1, group_size].
A larger value increases the chance of matching more similar blocks, with a slight increase in computational cost. However, in the original MATLAB implementation of VBM3D it is fixed to 2 for all profiles except "lc", so perhaps a larger value is not always better for quality.
ps_range:
Length of the side of the search neighborhood for predictive-search block-matching, valid range [1, +inf).
ps_step:
Step between two search locations for predictive-search block-matching, valid range [1, ps_range].
The maximum number of predictive-search locations for each reference block in a frame is (ps_range / ps_step * 2 + 1) ^ 2 * ps_num.
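Analogous to bm_step, the maximum predictive-search location count can be sketched as follows (a hypothetical helper, assuming integer division):

```python
def max_ps_locations(ps_range, ps_step, ps_num):
    """Maximum predictive-search locations per reference block per
    frame: (ps_range / ps_step * 2 + 1) ** 2 * ps_num, with integer
    division."""
    per_axis = ps_range // ps_step * 2 + 1
    return per_axis ** 2 * ps_num
```

With the "np" VBasic defaults (ps_range=5, ps_step=1, ps_num=2) this is 242 locations per reference block per frame.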
#### final estimate of VBM3D denoising filter
```
bm3d.VFinal(clip input, clip ref[, string profile="fast", float[] sigma=[10,10,10], int radius, int block_size, int block_step, int group_size, int bm_range, int bm_step, int ps_num, int ps_range, int ps_step, float th_mse, int matrix=2])
```

input, ref:
Same as those in bm3d.Final.

profile, sigma, block_size, block_step, group_size, bm_range, bm_step, th_mse, matrix:
Same as those in bm3d.Basic.

radius, ps_num, ps_range, ps_step:
Same as those in bm3d.VBasic.
#### aggregation of VBM3D denoising filter
If your input clip to bm3d.VBasic or bm3d.VFinal is of RGB color family, you will need to manually call bm3d.OPP2RGB after bm3d.VAggregate to convert the result back to RGB.
```
bm3d.VAggregate(clip input[, int radius=1, int sample=0])
```

input:
The clip output by bm3d.VBasic or bm3d.VFinal.

radius:
Must be the same radius as used in the preceding bm3d.VBasic or bm3d.VFinal call.

sample:
- 0 - 16 bit integer output (default)
- 1 - 32 bit float output
## Profile Default
### bm3d.Basic / bm3d.Final / bm3d.VBasic / bm3d.VFinal

Each cell lists the default values in the order Basic / Final / VBasic / VFinal.

| profile | block_size | block_step | group_size | bm_range | bm_step |
|---------|------------|------------|------------|----------|---------|
| "fast" | 8/8/8/8 | 8/7/8/7 | 8/8/8/8 | 9/9/7/7 | 1/1/1/1 |
| "lc" | 8/8/8/8 | 6/5/6/5 | 16/16/8/8 | 9/9/9/9 | 1/1/1/1 |
| "np" | 8/8/8/8 | 4/3/4/3 | 16/32/8/8 | 16/16/12/12 | 1/1/1/1 |
| "high" | 8/8/8/8 | 3/2/3/2 | 16/32/8/8 | 16/16/16/16 | 1/1/1/1 |
| "vn" | 8/11/8/11 | 4/6/4/6 | 32/32/16/16 | 16/16/12/12 | 1/1/1/1 |
### bm3d.VBasic / bm3d.VFinal

Each cell lists the default values in the order VBasic / VFinal.

| profile | radius | ps_num | ps_range | ps_step |
|---------|--------|--------|----------|---------|
| "fast" | 1/1 | 2/2 | 4/5 | 1/1 |
| "lc" | 2/2 | 2/2 | 4/5 | 1/1 |
| "np" | 3/3 | 2/2 | 5/6 | 1/1 |
| "high" | 4/4 | 2/2 | 7/8 | 1/1 |
| "vn" | 4/4 | 2/2 | 5/6 | 1/1 |
### bm3d.Basic & bm3d.VBasic / bm3d.Final & bm3d.VFinal

Each cell lists the default values in the order (Basic & VBasic) / (Final & VFinal).

| profile | th_mse | hard_thr |
|---------|--------|----------|
| "fast" | `sigma[0]*80+400` / `sigma[0]*10+200` | 2.7 / NUL |
| "lc" | `sigma[0]*80+400` / `sigma[0]*10+200` | 2.7 / NUL |
| "np" | `sigma[0]*80+400` / `sigma[0]*10+200` | 2.7 / NUL |
| "high" | `sigma[0]*80+400` / `sigma[0]*10+200` | 2.7 / NUL |
| "vn" | `sigma[0]*150+1000` / `sigma[0]*40+400` | 2.8 / NUL |
## Example

### BM3D Example
```python
# using the wrapper function in mvsfunc for convenience
import mvsfunc as mvf
flt = mvf.BM3D(src, sigma=3.0, profile1="fast")

# basic estimate only, specify different sigma for Y,U,V planes
flt = core.bm3d.Basic(src, sigma=[10,6,8])

# basic estimate + final estimate, sigma=10 for Y and sigma=7 for U,V
ref = core.bm3d.Basic(src, sigma=[10,7])
flt = core.bm3d.Final(src, ref, sigma=[10,7])

# additional pre-filtered clip as the reference for block-matching of the basic estimate, sigma=10 for Y,U,V
pre = haf.sbr(src, 3)
ref = core.bm3d.Basic(src, pre, sigma=10)
flt = core.bm3d.Final(src, ref, sigma=10)

# apply the RGB<->OPP conversions explicitly
src = core.bm3d.RGB2OPP(src) # the output is 16 bit in opponent color space
ref = core.bm3d.Basic(src, matrix=100) # specify the matrix of opponent color space
flt = core.bm3d.Final(src, ref, matrix=100) # specify the matrix of opponent color space
flt = core.bm3d.OPP2RGB(flt) # the output is 16 bit RGB
```
### VBM3D Example
```python
# using the wrapper function in mvsfunc for convenience
import mvsfunc as mvf
flt = mvf.BM3D(src, sigma=3.0, radius1=1, profile1="fast")

# basic estimate + final estimate
src = core.bm3d.RGB2OPP(src)
ref = core.bm3d.VBasic(src, radius=1, matrix=100).bm3d.VAggregate(radius=1)
flt = core.bm3d.VFinal(src, ref, radius=1, matrix=100).bm3d.VAggregate(radius=1)
flt = core.bm3d.OPP2RGB(flt)

# if chroma is not processed, merge the source chroma into the result with std.ShufflePlanes
src = core.bm3d.RGB2OPP(src)
ref = core.bm3d.VBasic(src, sigma=[10,0,0], radius=1, matrix=100).bm3d.VAggregate(radius=1)
flt = core.bm3d.VFinal(src, ref, sigma=[10,0,0], radius=1, matrix=100).bm3d.VAggregate(radius=1)
flt = core.std.ShufflePlanes([flt,src,src], [0,1,2], vs.YUV)
flt = core.bm3d.OPP2RGB(flt)

# it's better to filter on GRAY color family if chroma is not processed: much less memory consumption
src = core.bm3d.RGB2OPP(src)
srcGray = core.std.ShufflePlanes(src, 0, vs.GRAY)
ref = core.bm3d.VBasic(srcGray, sigma=[10,0,0], radius=1, matrix=100).bm3d.VAggregate(radius=1)
flt = core.bm3d.VFinal(srcGray, ref, sigma=[10,0,0], radius=1, matrix=100).bm3d.VAggregate(radius=1)
flt = core.std.ShufflePlanes([flt,src,src], [0,1,2], vs.YUV)
flt = core.bm3d.OPP2RGB(flt)

# if luma is not processed, since block-matching is based on the luma of clip "ref", also merge it into the basic estimate with std.ShufflePlanes
src = core.bm3d.RGB2OPP(src)
ref = core.bm3d.VBasic(src, sigma=[0,10,10], radius=1, matrix=100).bm3d.VAggregate(radius=1)
ref = core.std.ShufflePlanes([src,ref,ref], [0,1,2], vs.YUV)
flt = core.bm3d.VFinal(src, ref, sigma=[0,10,10], radius=1, matrix=100).bm3d.VAggregate(radius=1)
flt = core.std.ShufflePlanes([src,flt,flt], [0,1,2], vs.YUV)
flt = core.bm3d.OPP2RGB(flt)

# alternatively, use bm3d.Basic instead of bm3d.VBasic: faster, with less memory consumption
src = core.bm3d.RGB2OPP(src)
ref = core.bm3d.Basic(src, matrix=100)
flt = core.bm3d.VFinal(src, ref, radius=1, matrix=100).bm3d.VAggregate(radius=1)
flt = core.bm3d.OPP2RGB(flt)

# employ a custom denoising filter as the basic estimate, refined with the VBM3D final estimate
# this may compensate for the shortcomings of both filters: SMDegrain is effective at spatial-temporal
# smoothing but can lead to blending and detail loss, while VBM3D preserves details well but is not
# very effective against large noise patterns (such as heavy grain)
src = core.bm3d.RGB2OPP(src)
ref = haf.SMDegrain(src)
flt = core.bm3d.VFinal(src, ref, radius=1, matrix=100).bm3d.VAggregate(radius=1)
flt = core.bm3d.OPP2RGB(flt)
```
## Compilation

```sh
meson build
ninja -C build
```