Comparing Imputation Methods for Multivariate Time Series: A Case Study for Industrial Group Index of Thailand Stock Market

Main Article Content

Pongpon Yingpratanpron

Abstract

This study investigates and compares imputation methods for multivariate time series data to identify the most effective approach. Using secondary data of stock price indices from eight industrial sectors in the Stock Exchange of Thailand, retrieved from the SETSMART database, the study covers 4,877 days from January 1, 2004, to January 1, 2024. Missing data patterns were categorized into three types: random missing, sequential missing, and block missing, with proportions set at 5%, 10%, 20%, 30%, 40%, and 50%. The imputation methods were grouped into three categories: statistical methods (mean, median, LOCF, NOCB, and linear interpolation), machine learning methods (EM, MICE, KNN, and random forest), and deep learning methods (GP-VAE, USGAN, and SAITS). Performance was assessed using root mean squared error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). Results indicated that linear interpolation was the most efficient method for missing proportions below 50% across all patterns, while random forest was the most efficient method for a 50% missing data proportion.

Article Details

How to Cite
Yingpratanpron, P. (2025). Comparing Imputation Methods for Multivariate Time Series: A Case Study for Industrial Group Index of Thailand Stock Market. KKU Science Journal, 53(2), 134–148. https://doi.org/10.14456/kkuscij.2025.11
Section
Research Articles

References

Ahn, H., Sun, K. and Kim, K.P. (2022). Comparison of Missing Data Imputation Methods in Time Series Forecasting. Computers, Materials and Continua 70(1): 767 - 779. doi: 10.32604/cmc.2022.019369.

Almeida, A., Brás, S., Sargento, S. and Pinto, F.C. (2024). Focalize K NN: an imputation algorithm for time series datasets. Pattern Analysis and Applications 27(2): 39. doi: 10.1007/s10044-024-01262-3.

Alruhaymi, A.Z. and Kim, C.J. (2021). Why Can Multiple Imputations and How (MICE) Algorithm Work?. Open Journal of Statistics 11(5): 759 - 777. doi: 10.4236/ojs.2021.115045.

Bernstein, M.N. (2017). Random Forest. Source: https://pages.cs.wisc.edu/~matthewb/pages/notes/pdf/ ensembles/RandomForests. Retrieved date 12 September 2024.

Dong, Q., Chen, X. and Huang, B. (2024). Data Analysis in Pavement Engineering Methodologies and Applications. Cambridge: Woodhead Publishing. 181 - 196.

Du, W. (2023). PyPOTS: A Python Toolbox for Data Mining on Partially-Observed Time Series. arXiv: 2305.18911 doi: 10.48550/arXiv.2305.18811.

Du, W., Côté, D. and Liu, Y. (2023). SAITS: Self-attention-based imputation for time series. Expert Systems With Applications 219: 119619. doi: 10.1016/j.eswa.2023.119619.

Fortuin, V., Baranchuk, D., Rätsch, G. and Mandt S. (2020). GP-VAE: Deep Probabilistic Time Series Imputation. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (AISTATS) 108: 1651 - 1661.

Goldani, M. (2024). Comparative Analysis of Missing Values Imputation Methods: A Case Study in Financial Series (S&P500 and Bitcoin Value Data Sets). Iranian Journal of Finance 1(8): 47 - 70. doi: 10.61186/ijf.2024.414027.1 427.

Hua, V., Nguyen, T., Dao, M., Nguyen, H.D. and Nguyen, B.T. (2024). The impact of data imputation on air quality prediction problem. PLoS ONE 19(9): e0306303. doi: 10.1371/journal.pone.0306303.

Kotu, V. and Deshpande, B. (2019). Data Science Concepts and Practice. (2nd ed.). Cambridge: Morgan Kaufmann Publishers. 395 - 443.

Little, R. and Rubin, D. (2019). Statistical Analysis with Missing Data. (3rd ed.). Hoboken: Wiley Series in Probability and Statistics. 3 - 23.

Miao, X., Wu, Y., Wang, J., Gao, Y., Mao, X. and Yin J. (2021). Generative Semi-supervised Learning for Multivariate Time Series Imputation. Proceedings of the AAAI Conference on Artificial Intelligence 35(10): 8983 - 8991. doi: 10.1609/aaai.v35i10.17086.

Ng, S.K., Krishnan, T. and McLachlan, G.J. (2012). The EM Algorithm. In J.E. Gentle, W.K. Härdle and Y. Mori (Eds.). Handbook of Computational Statistics: Concepts and Methods. Heidelberg: Springer-Verlag. 139 - 172.