Machine Learning-Based Multiclass Classification for Predicting the Cumulative Blood Sugar Levels in Type 2 Diabetes Patient
Abstract
This research aims to classify cumulative blood sugar levels (hemoglobin A1c, HbA1c) and assess disease control in type 2 diabetes patients using machine learning models. A total of 28,431 medical records with 37 attributes were collected and prepared. Feature selection was performed using Information Gain (IG), Recursive Feature Elimination (RFE), and Random Forest Importance (RFI). Class imbalance was addressed with three oversampling techniques: the Synthetic Minority Over-sampling Technique (SMOTE, abbreviated SM), Borderline-SMOTE (BM), and Adaptive Synthetic sampling (ADA). Models were developed using five multiclass classification algorithms. The combination of IG feature selection and SM oversampling with the Random Forest algorithm yielded the highest performance, with accuracy, precision, recall, and F1 score of 81.89%, 82.13%, 81.89%, and 81.83%, respectively. These findings can support decisions about additional HbA1c testing beyond routine practice and help improve the efficiency of diabetes control within the hospital's service area.
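The abstract reports accuracy and recall with the same value (81.89%). That is what support-weighted averaging over classes would produce, because weighted recall always equals accuracy. The abstract does not state which averaging scheme was used, so the following is only a minimal sketch under that assumption. The function name `weighted_metrics` and the three class labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    n = len(y_true)
    support = Counter(y_true)    # number of true samples per class
    predicted = Counter(y_pred)  # number of predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = correct[label]
        p = tp / predicted[label] if predicted[label] else 0.0
        r = tp / count
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = count / n       # class share of the true labels
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(correct.values()) / n
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical glycemic-control labels, invented for illustration only.
y_true = ["good", "good", "fair", "fair", "poor", "poor", "poor", "good"]
y_pred = ["good", "fair", "fair", "fair", "poor", "good", "poor", "good"]
scores = weighted_metrics(y_true, y_pred)
print(scores)
```

On this toy input, weighted recall and accuracy coincide while weighted precision differs, which mirrors the pattern in the reported figures. Macro averaging would generally not show this equality.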
Article Details

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
The published articles represent the opinions of the authors only. The authors are responsible for any legal consequences that may arise from their articles.