Web Scraping-based System for E-commerce Price Comparison and Similar Product Segmentation
Abstract
With the rapid growth of e-commerce, finding the best deals across a multitude of online shopping websites has become a challenge. Consumers often spend considerable time manually sifting through and comparing product listings, which leads to uncertainty in decision-making. To address this issue, we propose a system that uses web scraping techniques to identify the best deals across multiple e-commerce sites. We developed Python-based web scraping scripts with a configuration file for customization, enabling users to extract product data from diverse websites. The system scrapes data and displays the results each time the user enters a query, ensuring that the data shown is up to date. Furthermore, the system enhances the user experience by incorporating product model datasets for product identification, enabling specific searches based on product specifications, and offering recommendations for similar product models. Finally, for products that remain unidentified, we introduce a feature that groups similar products using agglomerative clustering. This method uses product name and image features extracted by TF-IDF and Convolutional Neural Networks (CNNs), respectively, allowing price comparisons among similar products and enhancing the overall shopping experience. Preliminary evaluations show that, with proper customization, the system successfully extracts data from the target websites. The evaluation of similar-product clustering shows that combining product name and image features significantly improves clustering performance, outperforming product names alone by 9 percent and images alone by 18 percent.
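The configuration-driven extraction described in the abstract can be illustrated with Python's standard-library `html.parser`. This is a minimal sketch, not the authors' scripts. The `SITE_CONFIG` layout, the `(tag, class)` selector format, the site key `example-shop`, and the helper names `FieldExtractor` and `extract_products` are all hypothetical. Page fetching is omitted. In the described system, the HTML would be downloaded fresh for each user query.

```python
from html.parser import HTMLParser

# HTML elements that never have a closing tag, so they are not tracked on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

# Hypothetical per-site configuration: field name -> (tag, CSS class).
# A real deployment would load this from a config file; these selectors are made up.
SITE_CONFIG = {
    "example-shop": {
        "name": ("h2", "product-title"),
        "price": ("span", "price"),
    },
}


class FieldExtractor(HTMLParser):
    """Collect the text of every element whose tag and class match a configured field."""

    def __init__(self, fields):
        super().__init__()
        self.fields = fields
        self.values = {field: [] for field in fields}
        self._open = []  # stack of (tag, field-or-None) for currently open elements

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        field = None
        for name, (want_tag, want_class) in self.fields.items():
            if tag == want_tag and want_class in classes:
                field = name
                self.values[name].append("")  # start a new value for this field
                break
        self._open.append((tag, field))

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag (tolerates unclosed inner tags).
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i][0] == tag:
                del self._open[i:]
                break

    def handle_data(self, data):
        # Append text to the innermost enclosing configured field, if any.
        for _tag, field in reversed(self._open):
            if field is not None:
                self.values[field][-1] += data
                break


def extract_products(html, site):
    """Return a list of product dicts for `site`, pairing fields by position."""
    parser = FieldExtractor(SITE_CONFIG[site])
    parser.feed(html)
    parser.close()
    columns = {f: [v.strip() for v in vals] for f, vals in parser.values.items()}
    count = min(len(vals) for vals in columns.values())
    return [{f: columns[f][i] for f in columns} for i in range(count)]


page = """
<div class="item"><h2 class="product-title">Phone A</h2>
  <span class="price"><span class="cur">THB</span> 9,990</span></div>
<div class="item"><img src="b.jpg"><h2 class="product-title">Phone B</h2>
  <span class="price">8,490</span></div>
"""
print(extract_products(page, "example-shop"))
# [{'name': 'Phone A', 'price': 'THB 9,990'}, {'name': 'Phone B', 'price': '8,490'}]
```

Adding a new site in this sketch only requires a new entry in the configuration, not new parsing code. Pairing fields by position assumes that every product card contains every configured field. A more robust scraper would scope field lookups to each product's container element.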
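The similar-product grouping can also be sketched under several assumptions:

- Name features are TF-IDF vectors.
- Image features are precomputed embedding vectors. The described system obtains these from a CNN; here they are short hand-written stand-ins.
- The two feature sets are L2-normalized and concatenated.
- Clustering is average-linkage agglomerative clustering with a cosine-distance threshold.

The equal weighting (`text_weight=0.5`), average linkage, the threshold of 0.5, and all function names are hypothetical choices for illustration, not the paper's reported settings. The loop is a naive, dependency-free implementation. In practice a library such as scikit-learn would normally be used.

```python
import math
import re
from collections import Counter


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def tfidf_vectors(names):
    """Dense, L2-normalized TF-IDF vectors for product names (smoothed IDF)."""
    docs = [re.findall(r"[a-z0-9]+", name.lower()) for name in names]
    vocab = sorted({tok for doc in docs for tok in doc})
    df = Counter(tok for doc in docs for tok in set(doc))
    n = len(docs)
    idf = {tok: math.log((1 + n) / (1 + df[tok])) + 1 for tok in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(l2_normalize([counts[tok] * idf[tok] for tok in vocab]))
    return vectors


def combine(text_vecs, image_vecs, text_weight=0.5):
    """Concatenate name and image vectors, scaled by sqrt(w) and sqrt(1 - w) so that,
    for non-zero parts, the cosine of the result is w*cos_text + (1-w)*cos_image."""
    a, b = math.sqrt(text_weight), math.sqrt(1 - text_weight)
    return [[a * x for x in t] + [b * y for y in l2_normalize(img)]
            for t, img in zip(text_vecs, image_vecs)]


def cosine_distance(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (nu * nv) if nu and nv else 1.0


def agglomerative(vectors, threshold):
    """Average-linkage agglomerative clustering: repeatedly merge the closest pair of
    clusters until that pair is farther apart than `threshold`. Returns one label per item."""
    n = len(vectors)
    dist = [[cosine_distance(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pair = [dist[i][j] for i in clusters[a] for j in clusters[b]]
                d = sum(pair) / len(pair)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break
        merged = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
        clusters[a] = merged
    labels = [0] * n
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


names = ["Apple iPhone 13 128GB Blue", "iPhone 13 128GB (Blue) Apple",
         "Samsung Galaxy S22 Ultra", "Galaxy S22 Ultra Samsung 256GB"]
# Stand-in image embeddings; a real system would take these from a CNN.
image_feats = [[0.9, 0.1, 0.0], [0.85, 0.15, 0.05], [0.1, 0.9, 0.2], [0.05, 0.95, 0.1]]
labels = agglomerative(combine(tfidf_vectors(names), image_feats), threshold=0.5)
print(labels)  # [0, 0, 1, 1]
```

The two normalized parts are scaled by sqrt(w) and sqrt(1 - w). As a result, the cosine similarity of the combined vector equals a weighted average of the name and image similarities, so `text_weight` directly controls how much each modality contributes to the grouping.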

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
All authors must complete a copyright transfer to the Journal of Applied Informatics and Technology prior to publication. For details, see https://ph01.tci-thaijo.org/index.php/jait/copyrightlicense