Developing Next-Generation Semantic Image Classification Model Through Generative Adversarial Networks (GANs)

Sooksawaddee Nattawuttisit

Abstract

This research develops an image classification model based on Generative Adversarial Networks (GANs) to improve image retrieval and interpretation through natural language processing. GANs generate new content by learning from existing data and producing outputs that resemble the original samples. The sample data is drawn from the Flickr 30K dataset, which contains 158,915 entries pairing images with natural language descriptions. A sample size of 384 entries was determined using Cochran's formula at a 95% confidence level and a 5% margin of error. The data was split into training and testing sets at a ratio of 80/20 for training and evaluating the model's image interpretation performance. Performance was evaluated by the similarity between the model's predictions and the described images, and the results were validated by AI experts. Testing showed an accuracy of 82%, a recall of 78%, and a precision of 80%, indicating that the model interprets images from natural language descriptions effectively. Potential commercial applications include automatic image categorization on social media and image retrieval in large-scale databases. Future development should focus on improving recall so that results are more complete and better meet user needs.
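The sample-size figure in the abstract can be checked with a short calculation. The following is a minimal sketch, not the author's code: it assumes the large-population form of Cochran's formula, a z-score of 1.96 for 95% confidence, and the conservative proportion p = 0.5, none of which the abstract states explicitly. The function name `cochran_sample_size` is hypothetical.

```python
def cochran_sample_size(z: float, p: float, e: float) -> float:
    """Cochran's formula for a large population: n0 = z^2 * p * (1 - p) / e^2."""
    return (z ** 2) * p * (1 - p) / (e ** 2)


# Assumed inputs: z = 1.96 (95% confidence), p = 0.5 (most conservative
# proportion, not given in the abstract), e = 0.05 (5% margin of error).
n0 = cochran_sample_size(z=1.96, p=0.5, e=0.05)
sample_size = round(n0)

# 80/20 split of the sample; rounding the training share is an assumption.
train_size = round(sample_size * 0.8)
test_size = sample_size - train_size

print(f"n0 = {n0:.2f}, sample = {sample_size}, train = {train_size}, test = {test_size}")
# prints: n0 = 384.16, sample = 384, train = 307, test = 77
```

Under these assumptions the formula reproduces the 384 entries reported above. The train and test counts are illustrative only, since the abstract gives the ratio but not the resulting counts.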

Article Details

How to Cite
Nattawuttisit, S. (2024). Developing Next-Generation Semantic Image Classification Model Through Generative Adversarial Networks (GANs). PKRU SciTech Journal, 8(2), 79–90. Retrieved from https://ph01.tci-thaijo.org/index.php/pkruscitech/article/view/257817
Section
Research Articles
