Hybrid machine learning models: A comprehensive, data-driven evaluation with diverse data partitioning strategies for net radiation estimation

Kristian Lorenz Bajao
Kittisak Phetpan
Ponlawat Chophuk
Rattapong Suwalak

Abstract

Surface net radiation (Rn) is a key input for climate modeling and agricultural management, yet it is often not readily available, especially in regions such as Thailand. Accurate Rn prediction is essential for estimating evapotranspiration, which in turn underpins irrigation planning and agricultural productivity. This study develops a hybrid machine learning framework that combines K-Nearest Neighbors (KNN) imputation of missing data, Random Forest-Recursive Feature Elimination (RF-RFE) for feature selection, and three predictive models (Multi-layer Perceptron, K-Nearest Neighbors, and Random Forest). It evaluates several data partitioning methods, including hold-out split, K-fold cross-validation, and growing-window forward-validation (gwFV), together with grid-search (GridSearch) hyperparameter tuning to improve model robustness and limit overfitting. The objectives are (1) to develop and evaluate hybrid ML models for daily Rn estimation from basic meteorological inputs (temperature, relative humidity, and sunshine duration), (2) to assess how different input combinations affect prediction accuracy in Sawi, Chumphon, Thailand, and (3) to compare data partitioning techniques in order to identify the best-performing configuration. With Rn calculated by the FAO-56 Penman-Monteith (FAO56PM) method as the reference, the Random Forest model using average temperature and sunshine duration as inputs (M2), evaluated under gwFV, is the most stable while remaining highly accurate (R² of 0.972, RMSE of 0.457 MJ m⁻² day⁻¹, and MAPE of 3.50%). Random Forest also generalizes well, making it a reliable choice. Even models that use only sunshine duration (M3) perform adequately, offering an option when data are scarce. The study concludes that hybrid machine learning models, combined with careful data partitioning, significantly improve Rn estimation. These advances provide valuable insights for climate modeling, agricultural management, and irrigation scheduling, particularly in data-scarce regions.
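
The growing-window forward-validation scheme and the three reported error metrics can be illustrated with a short Python sketch. This is not the authors' code: the function names, the equal-sized test blocks, and the `min_train` parameter are assumptions made for illustration, and R² is taken here to be the coefficient of determination, which the abstract does not specify. The sketch uses only the standard library; the study's own pipeline (KNN imputation, RF-RFE, grid search) is not reproduced.

```python
import math
from typing import Iterator, Sequence, Tuple


def growing_window_splits(
    n_samples: int, n_folds: int, min_train: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges in time order.

    The training window always starts at index 0 and grows by one test
    block per fold, so every test block lies strictly after the data
    used for training (no look-ahead).
    """
    if n_folds < 1 or min_train < 1 or n_samples - min_train < n_folds:
        raise ValueError("not enough samples for the requested folds")
    block = (n_samples - min_train) // n_folds  # length of each test block
    for k in range(n_folds):
        train_end = min_train + k * block
        # The last fold absorbs any remainder so that all samples are used.
        test_end = n_samples if k == n_folds - 1 else train_end + block
        yield range(0, train_end), range(train_end, test_end)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, in the units of the target."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error in percent (assumes no zero targets)."""
    n = len(y_true)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination (assumed definition of R² here)."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Ten daily records, three folds, at least four records for training.
for train, test in growing_window_splits(n_samples=10, n_folds=3, min_train=4):
    print(len(train), list(test))
# prints:
# 4 [4, 5]
# 6 [6, 7]
# 8 [8, 9]
```

In practice a model would be refit on each training range and scored on the following test range, with the per-fold metrics averaged. Unlike shuffled K-fold cross-validation, this ordering never trains on observations that come after the ones being predicted.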

How to Cite
Lorenz Bajao, K., Phetpan, K., Chophuk, P., & Suwalak, R. (2025). Hybrid machine learning models: A comprehensive, data-driven evaluation with diverse data partitioning strategies for net radiation estimation. Engineering and Applied Science Research, 52(3), 240–250. Retrieved from https://ph01.tci-thaijo.org/index.php/easr/article/view/260010
Section
ORIGINAL RESEARCH
