Ensemble machine learning-based PM2.5 modeling using hotspot counts (0-1000 km) reflecting Chiang Mai, Thailand’s extreme pollution

Rati Wongsathan
Apimook Sabkam

Abstract

Persistent local and transboundary smog has critically elevated PM2.5 levels in Northern Thailand over the past decade, resulting in significant health risks. The spatial distribution of hotspot counts, indicative of biomass burning and smoke dispersion, correlates strongly with PM2.5 concentration patterns, underscoring the importance of incorporating such data into air quality analyses. This study integrates hotspot data to capture both temporal dynamics and external influences in PM2.5 prediction models. The importance of lagged ground-level PM2.5 and of lagged hotspot counts within 100–1000 km of Chiang Mai, which ranked as the world’s most polluted city during the study period, is assessed using Lasso regularization. The analysis reveals that hotspots cumulatively influence air quality in Chiang Mai from distances of up to approximately 700 km. Advanced tree-based ensemble machine learning methods, including Random Forest (RF), Gradient Boosting (GB), and Extreme Gradient Boosting (XGBoost), are implemented alongside the Long Short-Term Memory (LSTM) deep learning model to evaluate their predictive performance. This approach provides a novel framework for PM2.5 modeling in Northern Thailand. Five key features with specific day lags were identified for modeling: PM2.5 at lag 1, short-range hotspots within 100 km at lags 1 to 3, mid-range hotspots at 200 and 400 km at lags 2 to 4, and long-range hotspots beyond 700 km at lag 5. Incorporating hotspot data improved model performance by approximately 20%, as evidenced by error metrics and residual analysis. Among the models tested, GB outperformed XGBoost, RF, and LSTM, achieving the highest R² (0.97) and the lowest RMSE (5.49), MAE (2.08), and MAPE (5.8%), along with near-zero MBE and minimal MdAE (0.48). Statistical validation confirmed the model’s reliability with no significant bias.
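
As a rough illustration of two ideas in the abstract, the sketch below builds day-lagged predictor rows from daily series and computes the reported error metrics (R², RMSE, MAE, MAPE, MBE, MdAE). It is not the authors' code: the function names, the feature names, and the toy values are hypothetical, and the sign convention for MBE (predicted minus observed) is an assumption.

```python
import math
import statistics


def build_lagged_rows(series_by_name, lags_by_name):
    """Build one feature row per day, using only values from earlier days.

    series_by_name: dict mapping a (hypothetical) feature name to a daily series.
    lags_by_name:   dict mapping the same names to the day lags to include.
    Rows start at the first day for which every requested lag is available.
    """
    max_lag = max(max(lags) for lags in lags_by_name.values())
    n_days = len(next(iter(series_by_name.values())))
    rows = []
    for t in range(max_lag, n_days):
        row = {}
        for name, lags in lags_by_name.items():
            for lag in lags:
                row[f"{name}_lag{lag}"] = series_by_name[name][t - lag]
        rows.append(row)
    return rows


def error_metrics(observed, predicted):
    """Return the error metrics named in the abstract for two equal-length lists."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]  # assumed sign: pred - obs
    abs_errors = [abs(e) for e in errors]
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs_errors) / n,
        # MAPE is undefined when an observed value is zero.
        "MAPE": 100.0 * sum(a / abs(o) for a, o in zip(abs_errors, observed)) / n,
        "MBE": sum(errors) / n,
        "MdAE": statistics.median(abs_errors),
    }


# Toy daily values, invented for illustration only.
pm25 = [40.0, 55.0, 70.0, 65.0, 50.0, 45.0]
hotspots_100km = [10, 30, 25, 12, 8, 5]

rows = build_lagged_rows(
    {"pm25": pm25, "hotspots_100km": hotspots_100km},
    {"pm25": [1], "hotspots_100km": [1, 2, 3]},
)

metrics = error_metrics([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 40.0])
```

Each row in `rows` could then be passed to a feature-selection step or a regressor. An MBE near zero indicates that over- and under-predictions roughly cancel, which is why it is reported alongside magnitude-based metrics such as MAE and MdAE.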

How to Cite
Wongsathan, R., & Sabkam, A. (2025). Ensemble machine learning-based PM2.5 modeling using hotspot counts (0-1000 km) reflecting Chiang Mai, Thailand’s extreme pollution. Engineering and Applied Science Research, 52(5), 473–489. Retrieved from https://ph01.tci-thaijo.org/index.php/easr/article/view/260545
Section
ORIGINAL RESEARCH
