Attention-X: Enhancing the classification of natural attraction scenes with advanced attention mechanisms


Sujitranan Mungklachaiya
Anongporn Salaiwarakul
https://orcid.org/0000-0003-4798-410X

Abstract

This paper proposes Attention-X, an attention-based framework designed to address the challenges of interclass similarity and intraclass variance in natural scene classification. The method enhances pretrained convolutional neural networks (CNNs) with an attention mechanism that selectively emphasizes salient, discriminative features. Attention-X generates attention maps aligned with the extracted features, fusing spatial representations with channel-wise relevance to overcome the limitations of the original deep features. This fusion allows the model to amplify meaningful feature activations while suppressing irrelevant or redundant information, improving its ability to distinguish between visually similar scenes and to handle variations within the same class. The proposed method was evaluated on the widely used SUN397, ADE20K, and Places365 benchmark datasets. The experimental results demonstrate that Attention-X improves classification accuracy while maintaining competitive model complexity, outperforming several state-of-the-art methods. These findings highlight the effectiveness of the proposed method in real-world scenarios where subtle interclass differences and intraclass variability pose significant challenges.
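The fusion the abstract describes, channel-wise relevance combined with a spatial attention map to re-weight CNN feature activations, can be sketched as below. This is a generic, hypothetical NumPy illustration of that style of feature refinement (in the spirit of the squeeze-and-excitation and CBAM modules cited in the references), not the authors' actual Attention-X implementation; the pooling and sigmoid-gating choices are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_spatial_attention(features):
    """Refine a (C, H, W) feature map with channel then spatial gating.

    Channel weights come from global average pooling over spatial
    dimensions; spatial weights come from averaging the gated map
    over channels. Both gates lie in (0, 1), so they selectively
    amplify or suppress activations rather than create new ones.
    """
    c, h, w = features.shape
    # Channel attention: squeeze spatial dims to one weight per channel.
    channel_w = sigmoid(features.mean(axis=(1, 2)))       # shape (C,)
    refined = features * channel_w[:, None, None]
    # Spatial attention: average over channels to one weight per location.
    spatial_w = sigmoid(refined.mean(axis=0))             # shape (H, W)
    return refined * spatial_w[None, :, :]

feat = np.random.rand(8, 4, 4).astype(np.float32)   # toy CNN feature map
out = channel_spatial_attention(feat)
```

Because each gate is a sigmoid in (0, 1), every activation in `out` is attenuated relative to `feat`; locations and channels with stronger average response are suppressed least, which is the "amplify salient, suppress redundant" behavior the abstract refers to.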

Article Details

How to Cite
Mungklachaiya, S., & Salaiwarakul, A. (2025). Attention-X: Enhancing the classification of natural attraction scenes with advanced attention mechanisms. Engineering and Applied Science Research, 52(3), 337–351. Retrieved from https://ph01.tci-thaijo.org/index.php/easr/article/view/259517
Section
ORIGINAL RESEARCH

References

Cepeda-Pacheco JC, Domingo MC. Deep learning and internet of things for tourist attraction recommendations in smart cities. Neural Comput Appl. 2022;34(10):7691-709.

Kitamura R, Itoh T. Tourist spot recommendation applying generic object recognition with travel photos. 22nd International Conference Information Visualisation (IV); 2018 Jul 10-13; Fisciano, Italy. USA: IEEE; 2018. p. 1-5.

Katsumi H, Yamada W, Ochiai K. Characterizing generic POI: a novel approach for discovering tourist attractions. J Inf Process. 2023;31:265-77.

Parikh V, Keskar M, Dharia D, Gotmare P. A tourist place recommendation and recognition system. 2018 Second International Conference on Inventive Communication and Computational Technologies; 2018 Apr 20-21; Coimbatore, India. USA: IEEE; 2018. p. 218-22.

Sun S, Gong X. Hierarchical semantic contrast for scene-aware video anomaly detection. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2023 Jun 17-24; Vancouver, Canada. USA: IEEE; 2023. p. 22846-56.

Alqasrawi Y. Natural scene image annotation using local semantic concepts and spatial bag of visual words. Int J Sens Wirel Commun Control. 2016;6(3):153-73.

Shahriari M, Bergevin R. Land-use scene classification: a comparative study on bag of visual word framework. Multimed Tools Appl. 2017;76(21):23059-75.

Zhou Z, Li S, Wu W, Guo W, Li X, Xia G, et al. NaSC-TG2: Natural scene classification with Tiangong-2 remotely sensed imagery. IEEE J Sel Top Appl Earth Obs Remote Sens. 2021;14:3228-42.

Gupta N, Khobragade P. Multi-class image classification using transfer learning. Int J Res Appl Sci Eng Technol. 2023;11(1):700-4.

Sujee R, Sesh VB. Natural scene classification. 2019 International Conference on Computer Communication and Informatics; 2019 Jan 23-25; Coimbatore, India. USA: IEEE; 2019. p. 1-7.

Xu C, Shu J, Wang Z, Wang J. A scene classification model based on global-local features and attention in lie group space. Remote Sens. 2024;16(13):2323.

Liu Y, Zhong Y, Qin Q. Scene classification based on multiscale convolutional neural network. IEEE Trans Geosci Remote Sens. 2018;56(12):7109-21.

Li J, Lin D, Wang Y, Xu G, Zhang Y, Ding C, et al. Deep discriminative representation learning with attention map for scene classification. Remote Sens. 2020;12(9):1366.

Lazebnik S, Schmid C, Ponce J. Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition; 2006 Jun 17-22; New York, USA. USA: IEEE; 2006. p. 2169-78.

Wilson J, Arif M. Scene recognition by combining local and global image descriptors [Internet]. arXiv [Preprint]. 2017 [cited 2024 Oct 30]. Available from: https://arxiv.org/abs/1702.06850.

Xie L, Lee F, Liu L, Kotani K, Chen Q. Scene recognition: a comprehensive survey. Pattern Recognit. 2020;102:107205.

Yang Y, Newsam S. Bag-of-visual-words and spatial extensions for land-use classification. Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems; 2010 Nov 2-5; San Jose, USA. New York: ACM; 2010. p. 270-9.

Singh S, Gupta A, Efros AA. Unsupervised discovery of mid-level discriminative patches [Internet]. arXiv [Preprint]. 2012 [cited 2024 Oct 30]. Available from: https://arxiv.org/abs/1205.3137.

Sadeghi F, Tappen MF. Latent pyramidal regions for recognizing scenes. In: Fitzgibbon A, Lazebnik S, Perona P, Sato Y, Schmid C, editors. Computer Vision – ECCV 2012. Lecture Notes in Computer Science. Berlin: Springer; 2012. p. 228-41.

Sitaula C, Shahi TB, Marzbanrad F, Aryal J. Recent advances in scene image representation and classification. Multimedia Tools Appl. 2024;83(3):9251-78.

Ma Y, Lei Y, Wang T. A natural scene recognition learning based on label correlation. IEEE Trans Emerg Top Comput Intell. 2022;6(1):150-8.

Yee PS, Lim KM, Lee CP. DeepScene: scene classification via convolutional neural network with spatial pyramid pooling. Expert Syst Appl. 2022;193:116382.

Bai S, Tang H, An S. Coordinate CNNs and LSTMs to categorize scene images with multi-views and multi-levels of abstraction. Expert Syst Appl. 2019;120:298-309.

Gao J, Yang J, Zhang J, Li M. Natural scene recognition based on convolutional neural networks and deep Boltzmann machines. 2015 IEEE International Conference on Mechatronics and Automation; 2015 Aug 2-5; Beijing, China. USA: IEEE; 2015. p. 2369-74.

Masood S, Ahsan U, Munawwar F, Rizvi DR, Ahmed M. Scene recognition from image using convolutional neural network. Procedia Comput Sci. 2020;167:1005-12.

Sharma V, Nagpal N, Shandilya A, Dureja A, Dureja A. A practical approach to detect indoor and outdoor scene recognition. Proceedings of the 4th International Conference on Information Management & Machine Intelligence; 2022 Dec 23-24; Jaipur, India. New York: ACM; 2023. p. 1-10.

Liu Y, Suen CY, Liu Y, Ding L. Scene classification using hierarchical Wasserstein CNN. IEEE Trans Geosci Remote Sens. 2019;57(5):2494-509.

Mungklachaiya S, Salaiwarakul A. Exploring deep learning features and bag-of-visual-words for scene classification. ICIC Express Lett B: Appl. 2024;15(10):1081-8.

Baik S, Seong H, Lee Y, Kim E. Spatial-channel transformer for scene recognition. 2022 International Joint Conference on Neural Networks; 2022 Jul 18-23; Padua, Italy. USA: IEEE; 2022. p. 1-8.

Guo MH, Xu TX, Liu JJ, Liu ZN, Jiang PT, Mu TJ, et al. Attention mechanisms in computer vision: a survey. Comput Vis Media. 2022;8(3):331-68.

Yang X. An overview of the attention mechanisms in computer vision. J Phys: Conf Ser. 2020;1693:012173.

Peng Y, Liu X, Wang C, Xiao T, Li T. Fusing attention features and contextual information for scene recognition. Int J Pattern Recognit Artif Intell. 2022;36(3):2250014.

Wang P, Qiao J, Liu N. An improved convolutional neural network-based scene image recognition method. Comput Intell Neurosci. 2022;2022(1):3464984.

Liu R, Ning X, Cai W, Li G. Multiscale dense cross-attention mechanism with covariance pooling for hyperspectral image scene classification. Mob Inf Syst. 2021;2021(1):9962057.

Zhang J, Yu X, Lei X, Wu C. A multi-feature fusion model based on denoising convolutional neural network and attention mechanism for image classification. Int J Swarm Intell Res. 2023;14(2):1-15.

Ye W, Tan R, Liu Y, Chang CC. The comparison of attention mechanisms with different embedding modes for performance improvement of fine-grained classification. IEICE Trans Inf Syst. 2023;E106.D(5):590-600.

Xiao J, Ehinger KA, Hays J, Torralba A, Oliva A. SUN Database: exploring a large collection of scene categories. Int J Comput Vis. 2016;119(1):3-22.

Zhou B, Lapedriza A, Khosla A, Oliva A, Torralba A. Places: a 10 million image database for scene recognition. IEEE Trans Pattern Anal Mach Intell. 2018;40(6):1452-64.

López-Cifuentes A, Escudero-Viñolo M, Bescós J, García-Martín Á. Semantic-aware scene recognition. Pattern Recognit. 2020;102:107256.

Woo S, Park J, Lee JY, Kweon IS. CBAM: convolutional block attention module. Computer Vision – ECCV 2018: 15th European Conference; 2018 Sep 8-14; Munich, Germany. Berlin: Springer; 2018. p. 3-19.

Hu J, Shen L, Sun G. Squeeze-and-Excitation networks. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2018 Jun 18-23; Salt Lake City, USA. USA: IEEE; 2018. p. 7132-41.

Zhou B, Zhao H, Puig X, Fidler S, Barriuso A, Torralba A. Scene parsing through ADE20K dataset. 2017 IEEE Conference on Computer Vision and Pattern Recognition; 2017 Jul 21-26; Honolulu, USA. USA: IEEE; 2017. p. 5122-30.