Feature Selection with Linear Discriminant Analysis to Improve the Performance of Heart Disease Classification
Abstract
Artificial intelligence (AI) is increasingly popular and widely applied across many fields. In medicine, AI has been used to support disease diagnosis. Heart disease is a common condition that affects people of all genders, ages, and races, and it remains a leading cause of death worldwide. Heart disease can now be diagnosed with AI by combining electrocardiogram (ECG) data with machine learning algorithms. In some cases, however, the dataset contains too many features, which can reduce model performance. In this research, we propose a feature selection method based on Linear Discriminant Analysis (LDA) to improve classification accuracy on a heart disease dataset. We compare the proposed method with two other feature selection techniques: correlation-based selection and information gain. We then build classification models with three algorithms: logistic regression, support vector machines (SVM), and artificial neural networks (ANN). The experimental results show that the proposed technique raises the average classification accuracy from 77.82% to 86.46%, a gain of 8.64 percentage points (an 11.10% relative increase). The highest classification accuracy, 87.39%, is achieved by combining ANN with LDA. We used this technique to develop a program for assessing the risk of coronary heart disease. The program helps screen individuals at high risk and gives users personalized information about their likelihood of developing the disease.
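The abstract does not say how feature selections are derived from LDA, so the following is only a minimal sketch under a stated assumption: each feature is ranked by its Fisher discriminant score (between-class scatter divided by within-class scatter), which is the one-dimensional form of the criterion that LDA maximizes. The feature names, the toy records, and the helper functions `fisher_score` and `rank_features` are hypothetical and are not taken from the article or its dataset.

```python
from statistics import mean


def fisher_score(values, labels):
    """Between-class scatter divided by within-class scatter for one feature."""
    overall = mean(values)
    between = 0.0
    within = 0.0
    for cls in set(labels):
        group = [v for v, y in zip(values, labels) if y == cls]
        mu = mean(group)
        between += len(group) * (mu - overall) ** 2
        within += sum((v - mu) ** 2 for v in group)
    # A feature with zero within-class scatter separates the classes perfectly.
    return between / within if within > 0 else float("inf")


def rank_features(rows, labels, names):
    """Return (name, score) pairs sorted from most to least discriminative."""
    scores = {}
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scores[name] = fisher_score(column, labels)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up records for illustration only (label 1 = disease, 0 = no disease).
names = ["resting_bp", "max_heart_rate", "noise"]
rows = [
    [150, 120, 3],
    [160, 115, 7],
    [155, 125, 5],
    [120, 170, 4],
    [125, 165, 6],
    [118, 175, 5],
]
labels = [1, 1, 1, 0, 0, 0]

ranking = rank_features(rows, labels, names)
top_two = [name for name, _ in ranking[:2]]  # keep the two highest-scoring features
print(top_two)
```

In this toy data the `noise` column has identical class means, so its score is zero and it is dropped. The retained columns would then be passed to a classifier such as logistic regression, SVM, or ANN, as the abstract describes.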
Article Details

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
All authors must complete a copyright transfer to the Journal of Applied Informatics and Technology prior to publication. For details, see https://ph01.tci-thaijo.org/index.php/jait/copyrightlicense