Predicting Household Expenditure Using Machine Learning Techniques: A Case of Cambodia
This study aimed to predict household expenditure using a combination of survey and geospatial data. A web-based application running on the Google Earth Engine platform was developed specifically for this research to provide a set of satellite-based indicators. These indicators were spatially averaged at the district level and integrated with household nonfood expenditure, a proxy for socioeconomic conditions, derived from the World Bank’s 2019 Living Standards Measurement Study (LSMS). Four machine learning algorithms were applied. With root mean square error as the goodness-of-fit criterion, the random forest algorithm yielded the highest forecasting precision, followed by support vector machine, neural network, and generalized least squares. In addition, variable importance and minimal depth analyses indicated that the geospatial indicators contribute moderately to predicting socioeconomic conditions. In contrast, the predictive power of the variables derived from the LSMS was mixed: some asset-ownership variables had high explanatory power, whereas others contributed little. The results suggest that further development is needed to improve accuracy. The findings also revealed an association between economic activity density and household expenditure, which supports promoting regional development through urbanization and a transition from agriculture to other economic sectors.
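The model comparison described in the abstract ranks algorithms by root mean square error (RMSE), where a lower value means a closer fit to the observed expenditure. The following is a minimal sketch of that ranking step only, not the study's implementation: the function names `rmse` and `rank_models`, the model labels, and all numbers are hypothetical and chosen purely for illustration.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def rank_models(actual, predictions_by_model):
    """Return (model_name, rmse) pairs sorted from lowest to highest RMSE."""
    scores = {
        name: rmse(actual, preds)
        for name, preds in predictions_by_model.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical held-out expenditure values and predictions (made-up numbers,
# not taken from the study).
actual = [10.0, 12.0, 15.0, 20.0]
predictions_by_model = {
    "model_a": [11.0, 12.0, 14.0, 19.0],
    "model_b": [13.0, 9.0, 18.0, 16.0],
}

ranking = rank_models(actual, predictions_by_model)
for name, score in ranking:
    print(f"{name}: RMSE = {score:.3f}")
```

In this sketch the model listed first has the lowest RMSE and would be preferred under the criterion. In practice the predictions would come from models evaluated on held-out data rather than from fixed lists.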
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.