Data Classifying to Diagnose Diabetes Risk Using Data Mining Techniques

Nopparat Nonsiri; Ratree Manassila; Krit Somkanta

Published: Mar 24, 2023

Keywords:

Data Mining, Diabetes, Diabetes Classification

Nopparat Nonsiri

Artificial Intelligence and Data Science Research Group, Faculty of Science, Udon Thani Rajabhat University, Udon Thani

Ratree Manassila

Somdej Phra Yupparat Ban Dung Hospital, Udon Thani

Krit Somkanta

Artificial Intelligence and Data Science Research Group, Faculty of Science, Udon Thani Rajabhat University, Udon Thani

Abstract

This research aims to build a classification model for diagnosing diabetes risk using four data mining techniques: Naïve Bayes, support vector machine, k-nearest neighbor, and decision tree. Data on diabetic patients from Somdej Phra Yupparat Ban Dung Hospital were used to build and test the models. The data came from a retrospective review of diabetes medical records and comprise 1,435 records with 16 attributes. Model accuracy was estimated with 10-fold cross-validation. The decision tree achieved the highest accuracy at 93.73%, followed by Naïve Bayes at 88.92%, k-nearest neighbor at 86.97%, and support vector machine at 86.13%. The decision tree was therefore the most effective of the four methods compared. This is because it is a nonparametric (distribution-free) method that does not depend on assumptions about the probability distribution of the data, and it can also handle high-dimensional data accurately. The model is suitable for developing a classification system for diagnosing diabetes risk and as a guideline to support medical decision-making in the diagnosis of diabetes risk.
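The 10-fold cross-validation procedure mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which software or parameters the authors used, so everything here is an assumption for illustration: the data are synthetic placeholders (not the hospital records), only k-nearest neighbor is shown out of the four methods, and the function names and `n_neighbors=5` are hypothetical choices.

```python
import random
from collections import Counter

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def knn_predict(train_X, train_y, x, n_neighbors=5):
    """Predict the majority label among the n_neighbors closest training points."""
    by_distance = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    votes = Counter(train_y[i] for i in by_distance[:n_neighbors])
    return votes.most_common(1)[0][0]

def cross_val_accuracy(X, y, k=10, seed=0):
    """Mean accuracy over k folds; each fold is held out once as the test set."""
    folds = k_fold_indices(len(X), k, seed)
    fold_accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(knn_predict(train_X, train_y, X[i]) == y[i] for i in test_idx)
        fold_accuracies.append(correct / len(test_idx))
    return sum(fold_accuracies) / k

def make_synthetic_data(n_samples=200, n_features=16, seed=0):
    """Placeholder data: two Gaussian clusters, NOT the hospital records."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # alternate the two classes (hypothetical risk / no-risk)
        X.append([rng.gauss(1.5 * label, 1.0) for _ in range(n_features)])
        y.append(label)
    return X, y

if __name__ == "__main__":
    X, y = make_synthetic_data()
    accuracy = cross_val_accuracy(X, y)
    print(f"10-fold CV accuracy (k-NN, synthetic data): {accuracy:.4f}")
```

Each record is used for testing exactly once and for training nine times, and the reported figure is the mean of the ten per-fold accuracies. With 1,435 records, each fold would hold 143 or 144 records.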

Issue

Vol. 33 No. 2 (2023): April - June, 2023

Section

Applied Science Research Articles

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

The articles published are the opinion of the author only. The author is responsible for any legal consequences that may arise from the article.
