
wolverinn / HEVC-CU-depths-prediction-CNN

Licence: MIT license
Using convolutional neural networks to predict Coding Unit (CU) depths in HEVC intra-prediction mode, in order to reduce HEVC encoding time.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to HEVC-CU-depths-prediction-CNN

Smarterstreaming
One of the few fully self-developed cross-platform (Windows/Linux/Android/iOS) streaming media cores in the industry focused on an optimal experience. Through modular composition it supports real-time RTMP publishing, RTSP publishing, an RTMP player, an RTSP player, recording, multi-channel stream forwarding, audio/video directing, dynamic video composition, audio mixing, live interaction, a built-in lightweight RTSP server, and more. An ultra-low-latency live-streaming SDK (under 1 second; 200–400 ms in low-latency mode).
Stars: ✭ 9,835 (+17154.39%)
Mutual labels:  hevc
rtsp-video-server
RTSP video streaming server implementation based on Live555 and FFmpeg
Stars: ✭ 36 (-36.84%)
Mutual labels:  hevc
flvAnalyser
FLV v1.0 analyser
Stars: ✭ 76 (+33.33%)
Mutual labels:  hevc
Patch
Something could be public patches
Stars: ✭ 120 (+110.53%)
Mutual labels:  hevc
Swiftffmpeg
A Swift wrapper for the FFmpeg API
Stars: ✭ 243 (+326.32%)
Mutual labels:  hevc
heic2hevc
Convert an HEIC file to an H.265 bitstream (Annex B)
Stars: ✭ 25 (-56.14%)
Mutual labels:  hevc
Gpuvideoplayer
Fast video playback on Unity using GPU Decoding
Stars: ✭ 82 (+43.86%)
Mutual labels:  hevc
libheif-sharp
Provides .NET bindings for libheif.
Stars: ✭ 30 (-47.37%)
Mutual labels:  hevc
uci
Ultra Compact Image (UCI)
Stars: ✭ 79 (+38.6%)
Mutual labels:  hevc
VIDEOconvertor
A stable and Fast telegram video convertor bot which can encode into different libs and resolution, compress videos, convert video into audio and other video formats, rename with thumbnail support, generate screenshot and trim videos.
Stars: ✭ 180 (+215.79%)
Mutual labels:  hevc
H265webplayer
h265webplayer is Kingsoft Cloud's web-based H.265 video player. Its Web SDK lets you play FLV/HTTP-FLV live streams and MP4 video-on-demand in browsers that support WebAssembly.
Stars: ✭ 126 (+121.05%)
Mutual labels:  hevc
Fastflix
FastFlix is a free GUI for HEVC and AV1 encoding, GIF/WebP creation, and more!
Stars: ✭ 154 (+170.18%)
Mutual labels:  hevc
heifreader
HEIF(High Efficiency Image Format) reader for Android
Stars: ✭ 57 (+0%)
Mutual labels:  hevc
Kimera
Low-latency hardware accelerated codec based video streaming utility.
Stars: ✭ 113 (+98.25%)
Mutual labels:  hevc
colab-ffmpeg-cuda
FFmpeg build with CUDA support for Linux (especially for Google Colab)
Stars: ✭ 40 (-29.82%)
Mutual labels:  hevc-encoder
Omaf.js
HTML5 MSE Playback of MPEG 360 VR Tiled Streaming
Stars: ✭ 85 (+49.12%)
Mutual labels:  hevc
video-quality-metrics
Test specified presets/CRF values for the x264 or x265 encoder. Compares VMAF/SSIM/PSNR numerically & via graphs.
Stars: ✭ 87 (+52.63%)
Mutual labels:  hevc
rrdncnn
[TIP'2021 - MMM'2020] Down-Sampling Based Video Coding with Degradation-aware Restoration-Reconstruction Deep Neural Network
Stars: ✭ 27 (-52.63%)
Mutual labels:  hevc
xin26x
Video Encoder for Now and Next Decade
Stars: ✭ 74 (+29.82%)
Mutual labels:  hevc
demuxer
A tool for demuxing TS/MP4/FLV, written in TypeScript. Supports HEVC/AVC/AAC codecs
Stars: ✭ 108 (+89.47%)
Mutual labels:  hevc

HEVC-CU-depths-prediction-CNN

Using convolutional neural networks to predict Coding Unit (CU) depths in HEVC intra-prediction mode, achieving a trade-off between HEVC encoding time and BDBR.

Introduction

In HEVC intra-prediction mode, the encoder spends a lot of time deciding the best depths for CTUs. We therefore use a CNN to predict the CTU depth decision, aiming for a trade-off between encoding time and BDBR.

For a 64x64 CTU, the HEVC encoder produces a 16x16 matrix to represent its depth decision. We can further process this 16x16 matrix and extract 16 labels to represent the depth prediction for the CTU (because the elements in each 4x4 block of the matrix are identical). In short, we can use a 64x64 CTU (which is an image) as the input of our network and output a vector of length 16.
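A minimal sketch of this label extraction, assuming the 16x16 depth matrix is a NumPy array (the function name and example are illustrative, not the repository's actual code):

```python
import numpy as np

def extract_labels(depth_matrix):
    """Reduce a 16x16 CTU depth matrix to 16 labels.

    Each 4x4 block of the matrix holds one identical depth value
    (one value per 16x16-pixel region of the 64x64 CTU), so we keep
    only the top-left element of every 4x4 block.
    """
    assert depth_matrix.shape == (16, 16)
    return depth_matrix[::4, ::4].reshape(-1)  # 4x4 grid -> 16 labels

# Example: a CTU whose top-left 32x32 CU is split down to depth 2,
# while the rest is encoded at depth 1.
depth_matrix = np.ones((16, 16), dtype=np.int64)
depth_matrix[:8, :8] = 2
print(extract_labels(depth_matrix))  # 16 values in {0, 1, 2, 3}
```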

Directly predicting 16 labels at once is difficult for a neural network, so there are other ways to design the model. A 64x64 CTU corresponds to 16 labels; if we divide it into four 32x32 CUs, each 32x32 CU corresponds to only four labels, which makes the task easier.
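The split into four training samples could look roughly like this (again a sketch under the label layout described above, not the repository's code):

```python
def split_ctu(ctu, labels):
    """Turn one 64x64 CTU and its 16 labels into four (32x32 CU, 4 labels) pairs.

    `ctu` is a 64x64 luma array; `labels` is the length-16 vector from above,
    laid out as a 4x4 grid in which each 2x2 quadrant belongs to one 32x32 CU.
    """
    samples = []
    grid = labels.reshape(4, 4)
    for row in range(2):          # quadrant row: top / bottom
        for col in range(2):      # quadrant column: left / right
            cu = ctu[32 * row:32 * (row + 1), 32 * col:32 * (col + 1)]
            cu_labels = grid[2 * row:2 * (row + 1),
                             2 * col:2 * (col + 1)].reshape(-1)
            samples.append((cu, cu_labels))
    return samples
```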

The depths are 0, 1, 2 or 3. Depth 0 indicates that the 64x64 CU will be encoded as it is. Depth 1 indicates that the 64x64 CU will be split into four 32x32 CUs and then encoded, and so on. Here's an example of a 64x64 CU and its depth decision:

CU depths

For more information on CNN approaches to predicting the CU depths of a 64x64 CTU, you can refer to these documents:

CNN model

As mentioned above, we could directly use a 32x32 CU as input and output 4 labels. But given how depths 0/1/2/3 are decided, this model doesn't make sense for depth 0: a 32x32 CU is only part of a 64x64 CU, so it isn't sufficient to decide whether the 64x64 CU should be split or left unsplit.

So our model uses both the 64x64 CU and the current 32x32 CU as input, and outputs 4 labels indicating depths. Here's our architecture:

cnn_model
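A rough PyTorch sketch of a two-branch network in this spirit (layer sizes and names are illustrative assumptions, not the exact architecture shown in the figure):

```python
import torch
import torch.nn as nn

class DepthNet(nn.Module):
    """Two-branch CNN: one branch sees the whole 64x64 CTU, the other the
    current 32x32 CU; their features are concatenated and mapped to four
    4-way classifications (one per 16x16 block inside the 32x32 CU)."""

    def __init__(self):
        super().__init__()
        self.ctu_branch = nn.Sequential(      # 1x64x64 -> 512 features
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        )
        self.cu_branch = nn.Sequential(       # 1x32x32 -> 512 features
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        )
        self.fc = nn.Linear(1024, 256)
        # four heads, one per 16x16 block of the current 32x32 CU
        self.heads = nn.ModuleList([nn.Linear(256, 4) for _ in range(4)])

    def forward(self, ctu, cu):
        feat = torch.cat([self.ctu_branch(ctu), self.cu_branch(cu)], dim=1)
        feat = torch.relu(self.fc(feat))
        return [head(feat) for head in self.heads]  # four (N, 4) logit tensors
```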

Dataset & Loss Function

We generate our own dataset from YUV test sequences; refer to:

HEVC-CU-depths-dataset

We use cross-entropy loss as the loss function. For the four output labels, we calculate the cross-entropy loss separately and then add the four terms together.
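Under that scheme, the training loss could be computed along these lines (a sketch assuming the hypothetical DepthNet interface above and integer depth targets of shape (N, 4)):

```python
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

def depth_loss(logit_list, targets):
    """Sum of four cross-entropy terms, one per 16x16 block.

    `logit_list` is the list of four (N, 4) logit tensors returned by the model;
    `targets` is an (N, 4) long tensor of ground-truth depths in {0, 1, 2, 3}.
    """
    return sum(criterion(logits, targets[:, i])
               for i, logits in enumerate(logit_list))
```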

Validation

The loss of our trained model on the test set is 3.1049.

The per-label prediction accuracy on the test set is 66.12%.

The best way to evaluate the model is to integrate it into the HEVC encoder. I've implemented such a pipeline; see:

HEVC-deep-learning-pipeline

Using this evaluation pipeline, we can measure the change in encoding time and the BDBR at the same time.

| YUV sequence | Bitrate HEVC (kbps) | Bitrate CNN (kbps) | Y-PSNR HEVC (dB) | Y-PSNR CNN (dB) | YUV-PSNR HEVC (dB) | YUV-PSNR CNN (dB) | Time HEVC (s) | Time CNN (s) |
|---|---|---|---|---|---|---|---|---|
| Flowervase_416x240_30 | 522.3700 | 556.9627 | 34.0869 | 33.8087 | 35.2319 | 34.9543 | 1225.481 | 367.107 |
| Keiba_416x240_30 | 539.8007 | 582.1690 | 33.4852 | 33.3019 | 34.5684 | 34.3756 | 1281.129 | 356.524 |
| mobile_352x288_20 | 1853.5339 | 1955.5392 | 28.2047 | 27.9656 | 29.3071 | 29.0889 | 1629.696 | 666.098 |
| RaceHorses_832x480_30 | 2594.1504 | 2719.9232 | 32.1133 | 31.9442 | 32.9179 | 32.7702 | 5440.564 | 1513.008 |
| ParkScene_1920x1080_24 | 7361.9354 | 7702.8979 | 32.7847 | 32.5906 | 33.9392 | 33.7562 | 13238.985 | 3591.274 |

The average BD-rate (bitrate) increase is 6.60%, the average BD-PSNR loss is 0.6 dB, and the average encoding time reduction is 71.5%.
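The BD figures are the standard Bjøntegaard delta metrics computed over several QPs (the per-QP RD points are not shown in the table above). A minimal sketch of the usual BD-rate computation, in case you want to reproduce such averages (not the repository's code):

```python
import numpy as np

def bd_rate(rate_ref, psnr_ref, rate_test, psnr_test):
    """Bjøntegaard delta bitrate (%) between two RD curves.

    Each argument is a list of (typically four) values, one per QP. Fits a
    cubic polynomial of log-rate as a function of PSNR, integrates both fits
    over the overlapping PSNR range, and returns the average bitrate
    difference of the test encoder relative to the reference.
    """
    lr_ref, lr_test = np.log(rate_ref), np.log(rate_test)
    p_ref = np.polyfit(psnr_ref, lr_ref, 3)
    p_test = np.polyfit(psnr_test, lr_test, 3)

    lo = max(min(psnr_ref), min(psnr_test))   # overlapping PSNR interval
    hi = min(max(psnr_ref), max(psnr_test))

    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)

    avg_diff = (int_test - int_ref) / (hi - lo)
    return (np.exp(avg_diff) - 1) * 100       # percent bitrate change
```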

Another way to evaluate the model is to compare the RD-cost for each encoded YUV file. Since xCompressCU() in the HEVC encoder calculates the RD-cost exhaustively at each depth, we can obtain the RD-cost of every possible depth decision and thus compare the RD-cost of the original encoder with that of the CNN model. See the test RD-cost folder for the code.

The increase in RD-cost with our model is 2.1% (tested on only one YUV sequence).

To be continued...

Since each label comes from a 16x16 CU, we could simply predict one label at a time. The input could be a combination of the 64x64, 32x32 and 16x16 CUs. I think this would achieve higher accuracy... Also, pre-trained models such as ResNet could be tried...

If you find this repository useful, please give it a star! ❤️

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].