Establishing Optimal Machine Learning Models for Monitoring Water Quality in Vietnam’s Upper Ma River

Thanh-Son Ngo
Duc-Loc Nguyen

Abstract

This study aims to establish the optimal regression model for predicting total suspended solids (TSS) and turbidity from in situ data and the spectral regions of Sentinel-2 images. Five machine learning models were evaluated: Multilayer Perceptron Regression (MLPR), Random Forest Regression (RFR), AdaBoost Regression (ABR), Multiple Linear Regression (MLR), and K-Nearest Neighbors Regression (KNNR). Each model was applied to five band combinations of spectral regions: visible (VIS), near-infrared (NIR), shortwave-infrared (SWIR), VIS+NIR (VNIR), and VIS+NIR+SWIR (VNIR+SWIR). The results showed that the MLR model, while not the best performer during training (R² = 0.89 for TSS and R² = 0.66 for turbidity), did not exhibit overfitting, with corresponding R² values in testing of 0.80 and 0.42, respectively. Variable selection for the MLR models identified the optimal spectral bands: B3, B5, B6, B8, B11, and B12 for TSS, and B4, B8, B11, and B12 for turbidity. The final no-intercept multiple linear regression models achieved R² = 0.88 for TSS and R² = 0.62 for turbidity. The TSS model performed better than the turbidity model, with lower mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE). This study underscores the efficacy of MLR models with selected spectral bands for accurate and generalizable predictions of TSS and turbidity.
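As a rough illustration of the final modelling step described above, the sketch below fits a no-intercept multiple linear regression by ordinary least squares and reports MAE, MSE, RMSE, and R² on separate training and testing subsets. It is not the authors' code. The data are synthetic stand-ins rather than Sentinel-2 reflectances or in situ measurements, and the four predictors, the coefficient values, the 45/15 split, and all function names are assumptions made for the example. R² is computed here in its usual centered form; the abstract does not state which definition was used for the no-intercept models, and some statistical packages report an uncentered R² in that case.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_no_intercept(X, y):
    """Least-squares coefficients for y ~ X with no intercept column."""
    p = len(X[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(X, coef):
    """Predicted values: each row dotted with the coefficients, no constant."""
    return [sum(c * v for c, v in zip(coef, row)) for row in X]


def metrics(y_true, y_pred):
    """MAE, MSE, RMSE, and centered R² for one set of predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1.0 - sum(e * e for e in errors) / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "R2": r2}


# Synthetic stand-in data (hypothetical): 60 samples, 4 "band" predictors.
random.seed(0)
true_coef = [120.0, -40.0, 75.0, 30.0]  # arbitrary illustrative values
X = [[random.uniform(0.01, 0.3) for _ in true_coef] for _ in range(60)]
y = [sum(c * v for c, v in zip(true_coef, row)) + random.gauss(0, 1.0) for row in X]

# Hold out the last 15 samples so training and testing scores can be compared.
X_train, y_train = X[:45], y[:45]
X_test, y_test = X[45:], y[45:]

coef = fit_no_intercept(X_train, y_train)
train_scores = metrics(y_train, predict(X_train, coef))
test_scores = metrics(y_test, predict(X_test, coef))

print("coefficients:", [round(c, 2) for c in coef])
print("train:", {k: round(v, 3) for k, v in train_scores.items()})
print("test: ", {k: round(v, 3) for k, v in test_scores.items()})
```

Comparing the training and testing scores side by side is the same check the abstract uses to judge overfitting: a large drop in R² from training to testing suggests a model that does not generalize.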

Article Details

How to Cite
Son, N. T., & Loc, N. D. (2024). Establishing Optimal Machine Learning Models for Monitoring Water Quality in Vietnam’s Upper Ma River. Applied Environmental Research, 46(4). https://doi.org/10.35762/AER.2024053
Section
Original Article