YUV-based Deep Learning Super-Resolution for Bitrate Reduction and ROI Preservation in Modern Video Codecs | ECTI Transactions on Computer and Information Technology (ECTI-CIT)

Published: Mar 7, 2026

DOI: https://doi.org/10.37936/ecti-cit.2026202.263972

Keywords:

HEVC encoding, VVC, bitrate reduction, deep learning super-resolution, YUV, ROI preservation, task-driven video coding, light-weight AI, on-edge device

Lertluck Leela-amornsin

Chulalongkorn University, Thailand

Nuttapon Vanakittistien

Chulalongkorn University, Thailand

Nattee Niparnan

Chulalongkorn University, Thailand

Pitchaya Sitthi-amorn

Chulalongkorn University, Thailand

Attawith Sudsang

Chulalongkorn University, Thailand

Abstract

High Efficiency Video Coding (HEVC) and its successors, such as Versatile Video Coding (VVC), offer substantial bitrate reductions, yet challenges remain in preserving visual fidelity under bandwidth and computational constraints. This paper proposes a deep learning-based super-resolution (SR) framework that operates natively in the YUV color space, eliminating costly RGB-YUV conversions and integrating seamlessly with modern video compression pipelines. We develop two convolutional network architectures trained on YUV-formatted video data: a full 3-channel model and a lightweight two-stream variant that separately processes luminance (Y) and chrominance (UV) channels using compact subnetworks. The proposed method enhances both full-frame and region-of-interest (ROI) quality, outperforming conventional HEVC baselines in terms of rate-distortion efficiency. Evaluations on diverse video sequences demonstrate significant bitrate savings and effective ROI preservation, with the lightweight model offering a practical solution for AI-driven applications in resource-constrained environments.
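To illustrate why a two-stream split between luminance and chrominance fits codec-native data, the following is a minimal sketch of separating one raw frame into its Y, U and V planes. It assumes 8-bit planar YUV 4:2:0 (I420) input, which the abstract does not state; the names `Plane` and `split_i420` are hypothetical and are not taken from the paper.

```python
from typing import NamedTuple, Tuple


class Plane(NamedTuple):
    """One image plane: raw 8-bit samples in row-major order plus its size."""
    data: bytes
    width: int
    height: int


def split_i420(frame: bytes, width: int, height: int) -> Tuple[Plane, Plane, Plane]:
    """Split one 8-bit planar YUV 4:2:0 (I420) frame into Y, U and V planes.

    Assumed layout (I420): the full-resolution Y plane comes first, followed
    by the U plane and then the V plane, each subsampled by 2 horizontally
    and vertically.
    """
    if width <= 0 or height <= 0 or width % 2 or height % 2:
        raise ValueError("this sketch expects positive, even frame dimensions")

    chroma_w, chroma_h = width // 2, height // 2
    y_size = width * height
    c_size = chroma_w * chroma_h

    expected = y_size + 2 * c_size
    if len(frame) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(frame)}")

    y = Plane(frame[:y_size], width, height)
    u = Plane(frame[y_size:y_size + c_size], chroma_w, chroma_h)
    v = Plane(frame[y_size + c_size:], chroma_w, chroma_h)
    return y, u, v
```

Under this layout the Y plane carries four times as many samples as each chroma plane, so a network that handles Y and UV in separate subnetworks can work on each at its native resolution without first converting the frame to RGB.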

How to Cite

[1]

L. Leela-amornsin, N. Vanakittistien, N. Niparnan, P. Sitthi-amorn, and A. Sudsang, “YUV-based Deep Learning Super-Resolution for Bitrate Reduction and ROI Preservation in Modern Video Codecs”, ECTI-CIT Transactions, vol. 20, no. 2, pp. 219–232, Mar. 2026.

Issue

Vol. 20 No. 2 (2026): ECTI Transactions on CIT (Apr 2026)

Section

Research Article

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
