Developing a Next-Generation Semantic Image Classification Model Through Generative Adversarial Networks (GANs)
Abstract
This research aims to develop an image classification model based on Generative Adversarial Networks (GANs) to improve image retrieval and interpretation through natural language processing. GANs learn from existing data and generate new content that resembles the original samples. The sample data for the study are drawn from the Flickr30K dataset, which contains 158,915 entries pairing images with natural language descriptions. A sample size of 384 entries was determined using Cochran's formula with a 95% confidence level and a 5% margin of error. The data were split into training and testing sets at an 80/20 ratio to optimize the model's performance in image interpretation. Model performance was evaluated by the similarity between the AI-predicted outcomes and the reference images and their descriptions, and the results were validated by AI experts. Testing yielded an accuracy of 82%, a recall of 78%, and a precision of 80%, indicating that the model interprets images effectively from natural language descriptions. The research has commercial applications such as automatic image categorization on social media and image retrieval in large-scale databases. Future development should focus on improving recall, the proportion of relevant images the model retrieves, so that results are more complete and better meet user needs.
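The abstract describes GANs only in general terms. For reference, the standard minimax objective from the original GAN formulation is shown below, where the generator G maps noise z to synthetic samples and the discriminator D estimates the probability that its input is real. This is the textbook form, not necessarily the exact loss used in this study.

```latex
\min_{G}\max_{D} V(D,G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

D is trained to increase this value by separating real samples from generated ones, while G is trained to decrease it by pushing D(G(z)) toward 1, so that its outputs become hard to distinguish from the original data.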
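The sample size of 384 can be reproduced with Cochran's formula, n0 = z^2 p(1 - p) / e^2. The sketch below assumes z = 1.96 for 95% confidence and p = 0.5 (the most conservative choice, since it maximizes p(1 - p)). The abstract does not say whether the finite population correction was applied; this sketch applies it with N = 158,915 as an assumption, and the helper names are hypothetical.

```python
import math


def cochran_n0(z: float, p: float, e: float) -> float:
    """Cochran's sample size for a very large population: n0 = z^2 * p * (1 - p) / e^2."""
    return (z ** 2) * p * (1 - p) / (e ** 2)


def finite_population_correction(n0: float, population: int) -> float:
    """Adjust n0 for a finite population of size N: n = n0 / (1 + (n0 - 1) / N)."""
    return n0 / (1 + (n0 - 1) / population)


z_95 = 1.96   # z-score for a 95% confidence level
p = 0.5       # assumed proportion; 0.5 maximizes p * (1 - p), giving the largest n
e = 0.05      # 5% margin of error
N = 158_915   # number of Flickr30K entries reported in the abstract

n0 = cochran_n0(z_95, p, e)                          # uncorrected estimate
n = math.ceil(finite_population_correction(n0, N))  # round up to whole entries
print(f"n0 = {n0:.2f}, corrected n = {n}")
```

This prints n0 = 384.16 and a corrected n of 384. Without the correction, 384.16 is commonly reported as 384, although strictly rounding up would give 385.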
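The abstract gives only the 80/20 ratio. A minimal sketch of such a split, assuming a random shuffle with a fixed seed for reproducibility, is shown below; the function name, the seed, and the placeholder entries are hypothetical. Because 80% of 384 is 307.2, rounding down yields 307 training and 77 testing entries; the abstract does not state how the fraction was rounded.

```python
import random


def split_train_test(items, train_ratio=0.8, seed=42):
    """Shuffle a copy of items and split it into (train, test) by train_ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # fixed seed gives a reproducible split
    cut = int(len(shuffled) * train_ratio)  # training size, rounded down
    return shuffled[:cut], shuffled[cut:]


# Hypothetical stand-ins for the 384 sampled image-description entries.
entries = [f"entry_{i}" for i in range(384)]
train, test = split_train_test(entries)
print(len(train), len(test))
```

Shuffling before cutting avoids any ordering bias in the source data, such as several captions of the same image sitting next to each other.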
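The abstract reports accuracy, precision, and recall without stating how they were computed for retrieval. The sketch below assumes the standard binary confusion-matrix definitions; the class name and the counts are hypothetical and are not the study's data.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # relevant images the model returned
    fp: int  # irrelevant images the model returned
    fn: int  # relevant images the model missed
    tn: int  # irrelevant images the model correctly left out


def accuracy(c: ConfusionCounts) -> float:
    """Share of all decisions that were correct."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def precision(c: ConfusionCounts) -> float:
    """Share of returned images that were relevant."""
    return c.tp / (c.tp + c.fp)


def recall(c: ConfusionCounts) -> float:
    """Share of relevant images that were returned."""
    return c.tp / (c.tp + c.fn)


# Hypothetical counts for illustration only.
counts = ConfusionCounts(tp=6, fp=2, fn=4, tn=8)
print(accuracy(counts), precision(counts), recall(counts))
```

Recall below precision, as in the reported 78% versus 80%, means a larger share of relevant images is missed than the share of returned images that are wrong, which is why improving recall is the stated direction for future work.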
Article Details
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
- The original content appearing in this journal is the responsibility of the author, excluding any typographical errors.
- The copyright of manuscripts published in PKRU SciTech Journal is owned by PKRU SciTech Journal.