Used Car Price Prediction Using Web Scraping and Machine Learning Models


Bhurisub Dejpipatpracha
Worrarat Jongkraijak
Akarachai Inthanil
Wimonnat Sukpol

Abstract

This study aimed to develop a predictive model for used car prices in Thailand using web scraping techniques and machine learning algorithms. Data were collected from Kaidee.com, One2Car.com, and Chobrod.com, totaling 55,989 records. After data cleaning, 42,823 valid records remained, each containing 13 basic attributes used for model construction. Two experiments were conducted: (1) using only the basic features and (2) adding six engineered features (car age, annual usage rate, squared mileage, squared engine size, cumulative usage load, and temporal load) to enhance the models’ learning capability. Five models, namely XGBoost, Random Forest, LightGBM, CatBoost, and Gradient Boosting, were compared using MAE, RMSE, MAPE, R², and accuracy. XGBoost achieved the best prediction performance. With the additional features, R² improved from 0.9262 to 0.9419, accuracy increased from 89.42% to 92.38%, and MAPE decreased from 10.58% to 7.62% (a relative reduction of about 28%), indicating that feature engineering meaningfully improved prediction accuracy. Feature importance analysis showed that the most influential factors affecting used car prices were fuel type, car type, engine size, brand, and squared engine size. The findings indicate that combining machine learning with feature engineering substantially improves predictive performance and that the resulting model can serve as a decision-support tool for buyers, sellers, and financial institutions, promoting transparency and fairness in Thailand’s used car market.
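The abstract names six engineered features but does not give their formulas. The Python sketch below shows one plausible way to derive four of them from a cleaned listing. The `Listing` fields, the `engineer_features` function, the reference year, and every formula are assumptions for illustration, not the authors' definitions. Cumulative usage load and temporal load are omitted because their definitions cannot be recovered from the abstract.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Listing:
    """One cleaned listing; field names are hypothetical, not the study's schema."""
    model_year: int
    mileage_km: float
    engine_cc: float


def engineer_features(listing: Listing, reference_year: int) -> dict[str, float]:
    """Derive four of the six engineered features named in the abstract.

    All formulas are assumptions. Cumulative usage load and temporal load
    are left out because the abstract does not define them.
    """
    # Car age in whole years relative to an assumed reference (e.g. scrape) year.
    age = reference_year - listing.model_year
    # Annual usage rate: mileage spread over the car's age. A car listed in its
    # model year is treated as one year old to avoid dividing by zero.
    annual_usage_rate = listing.mileage_km / max(age, 1)
    return {
        "car_age": float(age),
        "annual_usage_rate": annual_usage_rate,
        # Squared terms give the model an explicit non-linear transform
        # of mileage and engine size.
        "mileage_sq": listing.mileage_km ** 2,
        "engine_size_sq": listing.engine_cc ** 2,
    }


if __name__ == "__main__":
    car = Listing(model_year=2018, mileage_km=84_000.0, engine_cc=1_500.0)
    print(engineer_features(car, reference_year=2025))
```

Because these features are deterministic transforms of existing columns, the same function can be applied unchanged to the training data and to new listings at prediction time.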

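The evaluation metrics named in the abstract (MAE, RMSE, MAPE, and R²) have standard definitions, which the plain-Python sketch below computes. The abstract does not define "accuracy" for this regression task. In both reported pairs, accuracy and MAPE sum to 100% (89.42 + 10.58 and 92.38 + 7.62), so the sketch assumes accuracy = 100 − MAPE. That is an inference from the figures, not a definition stated by the authors, and the function name `regression_metrics` is hypothetical.

```python
from __future__ import annotations

import math
from typing import Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> dict[str, float]:
    """Compute MAE, RMSE, MAPE (%), R², and an assumed accuracy (%) for price predictions."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    # MAPE in percent; assumes no true price is zero.
    mape = 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n

    # R² = 1 - SS_res / SS_tot; undefined when all true prices are identical.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    # Assumption: accuracy = 100 - MAPE, inferred from the reported figures.
    accuracy = 100.0 - mape
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2, "accuracy": accuracy}


if __name__ == "__main__":
    prices = [100.0, 200.0, 300.0, 400.0]
    predictions = [110.0, 190.0, 330.0, 380.0]
    print(regression_metrics(prices, predictions))
```

MAE and RMSE are in the same currency unit as the prices, while MAPE, R², and the assumed accuracy are scale-free. This makes the latter three easier to compare across experiments.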
Section: Research Article
