Engineering visual–textual fusion for automatic item categorization in Indonesian e-commerce platforms

Yuliana Melita Pranoto
Anik Nur Handayani
Heru Wahyu Herwanto
Yosi Kristian

Abstract

In the modern digital era, e-commerce remains a fundamental pillar of the global economy, and the rapid growth of online ventures has reshaped how products are bought and sold. As the number of products on e-commerce platforms in Indonesia increases, the main challenge is accurately matching products to consumer preferences. Previous studies have primarily used unimodal approaches to product matching, relying on a single type of information, and little research has examined how to recognize new products on e-commerce platforms. A multimodal approach that integrates information from diverse data types, such as text and images, is therefore increasingly appealing for improving product-matching quality. This study makes three main contributions: creating vector representations of image and text features using deep learning techniques, including Convolutional Neural Networks and Doc2Vec; merging these features through a cross-modality approach using feed-forward neural networks; and determining appropriate parameters for assigning new products to existing clusters using hierarchical clustering. The results demonstrate that a multimodal approach leveraging both text and image information can enhance product-matching quality on e-commerce platforms in Indonesia, achieving a Normalized Mutual Information of 0.96.
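
For readers unfamiliar with the reported metric, the following is a minimal sketch of how a Normalized Mutual Information score between ground-truth product groups and predicted clusters can be computed. It is not the authors' evaluation code. The function names are hypothetical, and the normalization by the arithmetic mean of the two entropies is an assumption, since the abstract does not say which NMI variant was used.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(true_labels, pred_labels):
    """Mutual information (in nats) between two label assignments."""
    n = len(true_labels)
    joint = Counter(zip(true_labels, pred_labels))
    true_counts = Counter(true_labels)
    pred_counts = Counter(pred_labels)
    mi = 0.0
    for (t, p), c in joint.items():
        # p(t, p) * log(p(t, p) / (p(t) * p(p))), written with raw counts
        mi += (c / n) * log(c * n / (true_counts[t] * pred_counts[p]))
    return mi


def nmi(true_labels, pred_labels):
    """NMI normalized by the arithmetic mean of the entropies (assumed variant)."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")
    denom = (entropy(true_labels) + entropy(pred_labels)) / 2
    if denom == 0:
        # Both assignments place every item in one group: treat as identical.
        return 1.0
    return mutual_information(true_labels, pred_labels) / denom


if __name__ == "__main__":
    # Toy example: cluster ids differ from the true ids, but the grouping matches.
    truth = ["shoe", "shoe", "bag", "bag"]
    clusters = [1, 1, 0, 0]
    print(round(nmi(truth, clusters), 4))
```

Because NMI compares groupings rather than label names, a clustering that recovers the true groups under different cluster ids still scores 1, while a clustering that is statistically independent of the true groups scores 0.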

Article Details

How to Cite
Pranoto, Y. M., Handayani, A. N., Herwanto, H. W., & Kristian, Y. (2026). Engineering visual–textual fusion for automatic item categorization in Indonesian e-commerce platforms. Engineering and Applied Science Research, 53(3), 308–319. https://doi.org/10.64960/easr.2026.262497
Section
ORIGINAL RESEARCH
