Flip-Robust Neural Image Assessment (FR-NIMA) for Spatially Consistent IQA
Abstract
Neural Image Assessment (NIMA) has become a widely adopted approach for blind image quality assessment (BIQA), yet it remains sensitive to simple spatial transformations such as horizontal flips. Such variation can lead to inconsistent predictions even when the perceived visual content remains largely unchanged. To address this issue, we introduce Flip-Robust Neural Image Assessment (FR-NIMA), a training strategy that enhances the spatial robustness of BIQA models. Instead of modifying network architectures, FR-NIMA incorporates a flip-consistency regularization term that penalizes discrepancies between the predicted quality distributions of an image and its horizontally flipped counterpart. Two variants, a one-branch and a two-branch formulation, are explored; both introduce no additional model parameters. FR-NIMA is evaluated across four CNN backbones (MobileNetV2, VGG19, Xception, InceptionV3) and one Vision Transformer (ViT-Small) using the LIVE dataset and two additional test sets representing distinct scene types. Performance is assessed using complementary metrics, including the Test Loss (EMD²), Absolute Flip Gap (|FlipGap|), Flip-Consistency Win Rate (FCWR), Average Flip-Gap Delta (AFGD), and Average Flip-Gap Ratio (AFGR). Experimental results demonstrate that FR-NIMA effectively reduces flip-gap magnitude and variability while maintaining comparable test accuracy across all backbones. These findings establish FR-NIMA as a simple yet effective framework for enhancing the stability, spatial consistency, and trustworthiness of deep IQA models.
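
To make the flip-consistency objective concrete, the following minimal sketch illustrates how the one-branch and two-branch formulations and the flip-gap measurement could be implemented. It is written in PyTorch purely as an illustration; the paper's experiments reference MATLAB tooling, and the regularization weight lam, the function names, and the exact forms of the two-branch loss and of |FlipGap| are assumptions, not taken from the paper.

import torch
import torch.nn.functional as F

def emd2_loss(p, q):
    # Squared Earth Mover's Distance (r = 2) between discrete quality-score
    # distributions, the loss NIMA is trained with; p and q are
    # (batch, n_bins) probability tensors over ordered score bins.
    cdf_p = torch.cumsum(p, dim=-1)
    cdf_q = torch.cumsum(q, dim=-1)
    return ((cdf_p - cdf_q) ** 2).sum(dim=-1).mean()

def fr_nima_one_branch(model, images, target, lam=0.1):
    # One-branch variant (assumed form): EMD^2 task loss on the original
    # image plus a penalty on the discrepancy between the predicted
    # distributions for the image and its horizontal flip.
    # lam is a hypothetical regularization weight.
    p = F.softmax(model(images), dim=-1)
    p_flip = F.softmax(model(torch.flip(images, dims=[-1])), dim=-1)  # flip width axis (NCHW)
    return emd2_loss(p, target) + lam * emd2_loss(p, p_flip)

def fr_nima_two_branch(model, images, target, lam=0.1):
    # Two-branch variant (assumed form): supervise both views with the
    # ground-truth distribution and add the same consistency penalty.
    p = F.softmax(model(images), dim=-1)
    p_flip = F.softmax(model(torch.flip(images, dims=[-1])), dim=-1)
    task = 0.5 * (emd2_loss(p, target) + emd2_loss(p_flip, target))
    return task + lam * emd2_loss(p, p_flip)

def flip_gap(model, images, bin_values):
    # Assumed per-image |FlipGap|: absolute difference between the mean
    # predicted scores of an image and its flipped counterpart, where
    # bin_values holds the score of each bin (e.g. 1..10 as in NIMA).
    with torch.no_grad():
        p = F.softmax(model(images), dim=-1)
        p_flip = F.softmax(model(torch.flip(images, dims=[-1])), dim=-1)
        return ((p - p_flip) @ bin_values).abs()

Under these assumptions, the per-image gaps would be aggregated over a test set to obtain |FlipGap|, while FCWR, AFGD, and AFGR would compare the gaps of an FR-NIMA model against those of its baseline.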
Article Details

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
References
H. Talebi and P. Milanfar, “NIMA: Neural Image Assessment,” in IEEE Transactions on Image Processing, vol. 27, no. 8, pp. 3998-4011, Aug. 2018.
X. Min et al., “Exploring Rich Subjective Quality Information for Image Quality Assessment in the Wild,” in IEEE Transactions on Circuits and Systems for Video Technology, vol. 35, no. 8, pp. 7778-7791, Aug. 2025.
C. Chen et al., “TOPIQ: A Top-Down Approach From Semantics to Distortions for Image Quality Assessment,” in IEEE Transactions on Image Processing, vol. 33, pp. 2404-2418, 2024.
R. Sureddi, S. Zadtootaghaj, N. Barman and A. C. Bovik, “TRIQA: Image Quality Assessment by Contrastive Pretraining on Ordered Distortion Triplets,” arXiv preprint arXiv:2507.12687, 2025.
H. Zhou et al., “UniQA: Unified Vision-Language Pre-training for Image Quality and Aesthetic Assessment,” arXiv preprint arXiv:2406.01069, 2024.
H. Guo, K. Zheng, X. Fan, H. Yu and S. Wang, “Visual Attention Consistency Under Image Transforms for Multi-Label Image Classification,” 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, pp. 729-739, 2019.
S. Leem and H. Seo, “Attention Guided CAM: Visual Explanations of Vision Transformer Guided by Self-Attention,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, pp. 2956-2964, 2024.
S. Liu and W. Deng, “Very deep convolutional neural network based image classification using small training sample size,” 3rd IAPR Asian Conference on Pattern Recognition (ACPR), pp. 730-734, 2015.
M. Sandler, A. Howard, M. Zhu, A. Zhmoginov and L. C. Chen, “MobileNetV2: Inverted Residuals and Linear Bottlenecks,” 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4510-4520, 2018.
A. Dosovitskiy et al., “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale,” arXiv preprint arXiv:2010.11929, 2020.
LIVE In the Wild Image Quality Challenge Database, [Online] Available: https://live.ece.utexas.edu/research/ChallengeDB/index.html
D. Ghadiyaram and A. C. Bovik, “Massive Online Crowdsourced Study of Subjective and Objective Picture Quality,” in IEEE Transactions on Image Processing, vol. 25, no. 1, pp. 372-387, Jan. 2016.
Scenery dataset, [Online] Available: https://universe.roboflow.com/shivv-thtaj/out-painting-using-gans-2/dataset/1
Flowers dataset, [Online] Available: https://universe.roboflow.com/lab-wtcwr/flowers-7rvll/dataset/1
C. Ma, Z. Shi, Z. Lu, S. Xie, F. Chao and Y. Sui, “A Survey on Image Quality Assessment: Insights, Analysis, and Future Outlook,” arXiv preprint arXiv:2502.08540, 2025.
Z. You et al., “Descriptive Image Quality Assessment in the Wild,” arXiv preprint arXiv:2405.18842, 2024.
S. Lao et al., “Attentions Help CNNs See Better: Attention-based Hybrid Image Quality Assessment Network,” 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), New Orleans, LA, USA, pp. 1139-1148, 2022.
S. Yang et al., “MANIQA: Multi-dimension Attention Network for No-Reference Image Quality Assessment,” 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), New Orleans, LA, USA, pp. 1190-1199, 2022.
T. Wu, J. Zou, J. Liang, L. Zhang and K. Ma, “VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank,” arXiv preprint arXiv:2505.14460, 2025.
Z. You, Z. Li, J. Gu, Z. Yin, T. Xue and C. Dong, “Depicting Beyond Scores: Advancing Image Quality Assessment Through Multi-modal Language Models,” European Conference on Computer Vision (ECCV), pp. 259-276, 2024.
J. Liu et al., “MedQ-Bench: Evaluating and Exploring Medical Image Quality Assessment Abilities in MLLMs,” arXiv preprint arXiv:2510.01691, 2025.
D. He, H. Wang and M. Yaqub, “Advancing Fetal Ultrasound Image Quality Assessment in Low-Resource Settings,” arXiv preprint arXiv:2507.22802, 2025.
F. Boutros, M. Fang, M. Klemt, B. Fu and N. Damer, “CR-FIQA: Face Image Quality Assessment by Learning Sample Relative Classifiability,” 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, pp. 5836-5845, 2023.
T. Wu et al., “Assessor360: Multi-Sequence Network for Blind Omnidirectional Image Quality Assessment,” Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 64957-64970, 2023.
Z. Chen et al., “SEAGULL: No-reference Image Quality Assessment for Regions of Interest via Vision-Language Instruction Tuning,” arXiv preprint arXiv:2411.10161, 2024.
S. Shi et al., “Region-Adaptive Deformable Network for Image Quality Assessment,” 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Nashville, TN, USA, pp. 324-333, 2021.
A. Saha, S. Mishra and A. C. Bovik, “Re-IQA: Unsupervised Learning for Image Quality Assessment in the Wild,” 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, pp. 5846-5855, 2023.
V. Hosu, L. Agnolucci, D. Iso and D. Saupe, “Image Intrinsic Scale Assessment: Bridging the Gap Between Quality and Resolution,” Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025.
F. Chollet, “Xception: Deep Learning with Depthwise Separable Convolutions,” 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, pp. 1800-1807, 2017.
C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens and Z. Wojna, “Rethinking the Inception Architecture for Computer Vision,” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, pp. 2818-2826, 2016.
MATLAB documentation for Vision Transformer, [Online] Available: https://www.mathworks.com/help/vision/ref/visiontransformer.html
R. Tanawongsuwan, S. Phongsuphap and P. Mongkolwat, “Evaluating Trust in CNN Transfer Learning with Flower Image Classification via Heatmap-Based XAI,” ECTI Transactions on Computer and Information Technology (ECTICIT), vol. 19, no. 3, pp. 392-405, 2025.