The Semantic-Based Image Classification Model Trained for Image Retrieval Using Natural Language


Chakkarin Santirattanaphakdi
Suphakit Niwattanakul

Abstract

The objective of this research is to develop a semantic-based image classification model using pre-trained deep learning models for text and image data, with iterative learning on custom datasets. The trained models were evaluated for image retrieval using natural language labels, and their predictions were compared against label meanings assessed by experts. Prediction performance was measured under three conditions: 1) descriptive text resembling the image label, 2) high-level conceptual text related to the image content, and 3) text describing the qualitative meaning of the image. These conditions yielded scores of 0.905, 0.830, and 0.585, respectively. The score for text describing the qualitative meaning of the image was only moderate, because natural language labels of this kind express high-level concepts. As a result, individual perceptual experience, in line with principles of human cognition, affected how such labels were evaluated. This is reflected in the closely aligned prediction results obtained for similar descriptive text across more than one option. Meaningful image retrieval should therefore focus on reducing the semantic gap in search queries and on helping users by using query terms aligned with the meaning of the image, rather than adhering strictly to grammatical language rules. This approach is proposed as a future direction for information retrieval.
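The retrieval and evaluation steps summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a CLIP-style setup in which an image encoder and a text encoder map inputs into a shared embedding space, candidate natural language labels are ranked by cosine similarity, and similarities are turned into probabilities with a temperature-scaled softmax (the temperature of 0.01 is an assumed value). The embedding vectors, image names, and labels are hypothetical toy data. The abstract does not state how the reported scores were computed, so the `agreement_score` helper, a top-1 match rate against expert-assessed labels, is likewise an assumption.

```python
import math

# Hypothetical toy embeddings standing in for the outputs of pre-trained
# image and text encoders. Illustrative values only, not from the article.
IMAGE_EMBEDDINGS = {
    "img_beach": [0.9, 0.1, 0.3],
    "img_market": [0.2, 0.8, 0.4],
}

LABEL_EMBEDDINGS = {
    "a photo of a beach": [0.85, 0.15, 0.25],   # descriptive text
    "a busy street market": [0.25, 0.75, 0.45], # descriptive text
    "a relaxing holiday": [0.7, 0.3, 0.5],      # qualitative, high-level text
}


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores, temperature=0.01):
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def rank_labels(image_vec, label_embeddings, temperature=0.01):
    """Return (label, probability) pairs, most probable label first."""
    labels = list(label_embeddings)
    sims = [cosine_similarity(image_vec, label_embeddings[lbl]) for lbl in labels]
    probs = softmax(sims, temperature)
    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)


def agreement_score(predictions, expert_labels):
    """Assumed metric: share of images whose top label matches the expert label."""
    matches = sum(
        1 for image, label in expert_labels.items() if predictions.get(image) == label
    )
    return matches / len(expert_labels)


if __name__ == "__main__":
    top_labels = {}
    for name, vec in IMAGE_EMBEDDINGS.items():
        ranking = rank_labels(vec, LABEL_EMBEDDINGS)
        top_labels[name] = ranking[0][0]
        print(name, [(lbl, round(p, 3)) for lbl, p in ranking])

    # Hypothetical expert-assessed labels for the two toy images.
    expert = {"img_beach": "a photo of a beach", "img_market": "a busy street market"}
    print("agreement:", agreement_score(top_labels, expert))
```

With these toy vectors both images receive the intended descriptive label as their top prediction. When a qualitative label lies close to a descriptive one in the embedding space, its probability rises toward the descriptive label's, which is one way the closely aligned predictions across more than one option described above can arise.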

Article Details

How to Cite
Santirattanaphakdi, C., & Niwattanakul, S. (2024). The Semantic-Based Image Classification Model Trained for Image Retrieval Using Natural Language. PKRU SciTech Journal, 8(1), 68–82. Retrieved from https://ph01.tci-thaijo.org/index.php/pkruscitech/article/view/253249
Section: Research Articles
