Establishing Optimal Machine Learning Models for Monitoring Water Quality in Vietnam’s Upper Ma River
Abstract
This study aims to establish the optimal regression model for predicting total suspended solids (TSS) and turbidity from in situ data and the spectral regions of Sentinel-2 images. Five machine learning models were evaluated: Multilayer Perceptron Regression (MLPR), Random Forest Regression (RFR), AdaBoost Regression (ABR), Multiple Linear Regression (MLR), and K-Nearest Neighbors Regression (KNNR). Each model was applied to different combinations of spectral regions: visible (VIS), near-infrared (NIR), shortwave-infrared (SWIR), VIS+NIR (VNIR), and VIS+NIR+SWIR (VNIR+SWIR). The results showed that the MLR model, while not the best performer during training (R² = 0.89 for TSS and R² = 0.66 for turbidity), did not exhibit overfitting, with corresponding testing R² values of 0.80 and 0.42, respectively. Variable selection for the MLR models identified the optimal spectral bands: B3, B5, B6, B8, B11, and B12 for TSS, and B4, B8, B11, and B12 for turbidity. The final no-intercept multiple linear regression models achieved R² = 0.88 for TSS and R² = 0.62 for turbidity. Performance metrics were better for TSS than for turbidity, with lower MAE, MSE, and RMSE. This study underscores the efficacy of MLR models with selected spectral bands for accurate and generalizable predictions of TSS and turbidity.
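As an illustration of the kind of model the abstract describes, the following is a minimal sketch of a no-intercept multiple linear regression on the six bands reported for TSS, scored with R², MAE, MSE, and RMSE on a train/test split. It is not the authors' code: the reflectance values, coefficients, noise level, sample size, and split ratio are all synthetic assumptions chosen only to make the example run, and the helper names `fit_no_intercept` and `evaluate` are hypothetical. R² here is the ordinary centered definition; the abstract does not say which definition was used for the no-intercept models.

```python
import numpy as np

# Bands the abstract reports as selected for the TSS model.
TSS_BANDS = ["B3", "B5", "B6", "B8", "B11", "B12"]


def fit_no_intercept(X, y):
    """Least-squares coefficients for y ~ X with no intercept column."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def evaluate(y_true, y_pred):
    """Return R2 (centered), MAE, MSE, and RMSE for a set of predictions."""
    err = y_true - y_pred
    mse = float(np.mean(err ** 2))
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": float(np.mean(np.abs(err))),
        "MSE": mse,
        "RMSE": mse ** 0.5,
    }


# Synthetic stand-in data (assumption): 120 samples of surface reflectance
# in [0, 0.3] for each band, and made-up coefficients. None of these
# numbers come from the study.
rng = np.random.default_rng(0)
n_samples = 120
X = rng.uniform(0.0, 0.3, size=(n_samples, len(TSS_BANDS)))
assumed_coef = np.array([150.0, 220.0, -80.0, 60.0, -40.0, 30.0])
y = X @ assumed_coef + rng.normal(0.0, 2.0, size=n_samples)

# Hold out 25% of the samples for testing (split ratio is an assumption).
order = rng.permutation(n_samples)
train, test = order[:90], order[90:]

coef = fit_no_intercept(X[train], y[train])
train_metrics = evaluate(y[train], X[train] @ coef)
test_metrics = evaluate(y[test], X[test] @ coef)

for band, c in zip(TSS_BANDS, coef):
    print(f"{band}: {c:.2f}")
print("train:", train_metrics)
print("test: ", test_metrics)
```

Comparing `train_metrics` with `test_metrics` mirrors the overfitting check in the abstract: a large drop in R² from training to testing would suggest the model does not generalize.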
Article Details
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Published articles are under the copyright of Applied Environmental Research, effective when the article is accepted for publication, thus granting Applied Environmental Research all rights to the work so that both parties may be protected from the consequences of unauthorized use. Partial or total publication of an article elsewhere is possible only with the consent of the editors.