DATA CLASSIFICATION FOR DIABETES RISK DIAGNOSIS USING MAJORITY VOTING ENSEMBLE METHOD AND FORWARD FEATURE SELECTION METHOD
Abstract
The purpose of this research was to find an algorithm for classifying data to diagnose diabetes risk. The data were diabetes patient information from Somdej Phrayuparaj Ban Dung Hospital, drawn from a review of the medical records of patients with diabetes from 2014 to 2018. The data were high-dimensional: there were many attributes, and some of them were not related to the classification. A preliminary feature selection step was therefore required to reduce data redundancy and improve the accuracy of class prediction.
The researcher used the Forward Selection method, together with joint decision-making across three decision tree models (Voting Tree), to select the appropriate attributes. The classifiers compared were Voting Ensemble, Gradient Boosted Trees, Decision Tree, Random Forest, Naïve Bayes, Support Vector Machine, and K-Nearest Neighbor. Model performance was measured as classification accuracy under cross-validation. The comparison found that the collaborative decision-making approach yielded better results than any single-model technique.
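The greedy search behind forward selection can be sketched as follows. This is a minimal illustration, not the study's implementation: the feature names and the `toy_score` function are hypothetical stand-ins, and in a real experiment the scoring function would return the cross-validated accuracy of the classifier (for example, a voting ensemble of decision trees) trained on the candidate attribute subset.

```python
from typing import Callable, Sequence


def forward_selection(
    features: Sequence[str],
    score: Callable[[tuple], float],
    min_gain: float = 1e-9,
) -> tuple[list[str], float]:
    """Greedy forward selection.

    Start with no features, repeatedly add the single feature that raises
    the score the most, and stop when no remaining feature helps.
    """
    selected: list[str] = []
    remaining = list(features)
    best_score = float("-inf")
    while remaining:
        # Score every subset formed by adding one more candidate feature.
        trials = [(score(tuple(selected + [f])), f) for f in remaining]
        trial_score, trial_feature = max(trials)
        if trial_score <= best_score + min_gain:
            break  # no candidate improves the score, so stop
        selected.append(trial_feature)
        remaining.remove(trial_feature)
        best_score = trial_score
    return selected, best_score


# Hypothetical scoring function (an assumption for illustration only):
# two informative attributes add to the score, any other attribute
# slightly lowers it, mimicking irrelevant attributes in the data.
USEFUL = {"glucose": 0.2, "bmi": 0.1}


def toy_score(subset: tuple) -> float:
    return 0.5 + sum(USEFUL.get(name, -0.01) for name in subset)


selected, best = forward_selection(
    ["age", "glucose", "bmi", "record_id"], toy_score
)
print(selected, round(best, 2))
```

With this toy score the search keeps the two informative attributes and stops before adding either of the irrelevant ones, which is the behaviour the abstract relies on to reduce redundancy.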
This was because combining a variety of classifiers through a majority vote helps to reduce bias, and choosing good base classifiers improves the efficiency of data classification. The study also found that selecting suitable attributes through the collaborative decision-making approach made the model more efficient in its classification. The model is therefore appropriate as a guideline for effective medical decision support in the diagnosis of diabetes.
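A majority vote over several classifiers can be illustrated with a few lines of code. The predictions below are made-up values chosen so that each model errs on a different sample; they are not results from the study, and real classifiers often make overlapping errors, in which case the gain is smaller.

```python
from collections import Counter


def majority_vote(predictions):
    """Return, for each sample, the label predicted by most classifiers."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def accuracy(predicted, actual):
    """Fraction of samples whose predicted label matches the actual label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical labels (1 = at risk, 0 = not at risk) for five patients.
truth = [1, 0, 1, 1, 0]
model_a = [1, 0, 0, 1, 0]  # wrong on the third sample
model_b = [1, 1, 1, 1, 0]  # wrong on the second sample
model_c = [0, 0, 1, 1, 0]  # wrong on the first sample

ensemble = majority_vote([model_a, model_b, model_c])
print(accuracy(model_a, truth), accuracy(ensemble, truth))
```

Using an odd number of voters on a two-class problem, as here, avoids tied votes.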