A Content-based Image Retrieval by High-level Features from Self-supervised Learning of Pre-trained Deep Neural Networks Model
Abstract
This research aims to develop a content-based image retrieval model that resolves the semantic gap problem, in which low-level features cannot correctly convey the meaning of images. The developed model consists of three modules: 1) an image description set module, which applies CLIP (Contrastive Language-Image Pre-training) to learn the meaning of images through self-supervised learning of the relationship between images and captions, using an image encoder and a text encoder compared by cosine similarity, before collecting the results into the image description set and creating image feature vectors; 2) a query processing module, which learns the meaning of the query text and constructs a query feature vector; and 3) a vector matching module, which computes similarity values between the image feature vectors and the query feature vector, sorts the images by relevance, and displays the results to the user. In image retrieval on the Flickr30k dataset, evaluated with an order-unaware metric, the model achieved a very high mean recall of 0.93 for the top 10 results, although image variation was found to be the main barrier to accuracy. Retrieval results on a custom image dataset showed average recall following the same trend, with no sign that the model's performance is compromised when working with previously unseen data. This demonstrates that the model can retrieve images by content effectively. It also supports natural-language search terms that are based on the meaning of the image rather than the grammar of the language. These results can serve as a guideline for future information retrieval.
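The vector matching step and the order-unaware recall metric described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`cosine_similarity`, `rank_images`, `recall_at_k`) are hypothetical, the short hand-written vectors stand in for the embeddings that CLIP's image and text encoders would produce, and recall at k is assumed to follow the common definition (the fraction of relevant images that appear anywhere in the top k results).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_images(query_vec, image_vecs, top_k=10):
    """Return up to top_k (image_id, score) pairs, most similar first."""
    scored = [
        (image_id, cosine_similarity(query_vec, vec))
        for image_id, vec in image_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def recall_at_k(ranked_ids, relevant_ids, k=10):
    """Order-unaware recall: share of relevant images found in the top k."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


# Toy data (assumption): placeholders for precomputed image feature vectors.
image_vecs = {
    "dog.jpg": [0.9, 0.1, 0.0],
    "cat.jpg": [0.1, 0.9, 0.0],
    "car.jpg": [0.0, 0.1, 0.9],
}
# Placeholder for the query feature vector of a text query about a dog.
query_vec = [1.0, 0.2, 0.0]

ranked = rank_images(query_vec, image_vecs, top_k=2)
ranked_ids = [image_id for image_id, _ in ranked]
recall = recall_at_k(ranked_ids, {"dog.jpg"}, k=2)
```

In a full system, the image vectors would be computed once when the image description set is built, so that each query requires only a single text encoding followed by the similarity ranking.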