Feature Selection with Linear Discriminant Analysis to Improve the Performance of Heart Disease Classification

Ratiporn Chanklan
Keerachart Suksut
Kedkarn Podhijittikarn

Abstract

Artificial intelligence (AI) is now widely applied across many fields, and in medicine it is used to support disease diagnosis. Heart disease is a common condition that affects people of all genders, ages, and races, and it remains a leading cause of mortality worldwide. Heart disease can currently be diagnosed with AI by combining electrocardiogram (ECG) data with machine learning algorithms. In some cases, however, the dataset contains more features than the model needs, which can reduce model performance. In this research, we propose a feature selection method based on Linear Discriminant Analysis (LDA) to improve classification accuracy on a heart disease dataset. We compare the proposed method with two other feature selection techniques, correlation-based selection and information gain, and build classification models using three algorithms: logistic regression, support vector machines (SVM), and artificial neural networks (ANN). The experimental results show that the proposed technique raises the average classification accuracy from 77.82% to 86.46%, a gain of 8.64 percentage points, or a relative increase of 11.10%. The highest classification accuracy, 87.39%, is achieved by combining ANN with LDA. We used this technique to develop a program for assessing the risk of coronary heart disease. The program helps screen individuals at high risk and gives users personalized information about their likelihood of developing the disease.
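
The abstract does not state how LDA is turned into a feature ranking, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It scores each feature with the two-class Fisher criterion, which is the LDA objective applied to one feature at a time (squared difference of class means divided by the sum of class variances), and keeps the top-scoring features. The function names `fisher_scores` and `select_top_k`, the toy data, and the choice of `k` are all hypothetical and serve only as illustration.

```python
from statistics import mean, pvariance


def fisher_scores(rows, labels):
    """Per-feature Fisher criterion for a two-class problem (labels 0 and 1).

    score_j = (mean1_j - mean0_j)^2 / (var0_j + var1_j)
    """
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        class0 = [r[j] for r, y in zip(rows, labels) if y == 0]
        class1 = [r[j] for r, y in zip(rows, labels) if y == 1]
        between = (mean(class1) - mean(class0)) ** 2
        within = pvariance(class0) + pvariance(class1)
        # A zero within-class spread means perfect separation on this feature.
        scores.append(between / within if within > 0 else float("inf"))
    return scores


def select_top_k(scores, k):
    """Return the indices of the k highest-scoring features, best first."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order[:k]


# Synthetic toy data, NOT taken from the paper. Each row is one sample with
# three hypothetical features: column 0 separates the classes strongly,
# column 1 weakly, and column 2 not at all.
rows = [
    [1.0, 5.0, 3.0],
    [1.2, 6.0, 7.0],
    [0.8, 4.0, 5.0],
    [3.0, 6.0, 3.0],
    [3.2, 7.0, 7.0],
    [2.8, 5.0, 5.0],
]
labels = [0, 0, 0, 1, 1, 1]

scores = fisher_scores(rows, labels)
selected = select_top_k(scores, k=2)
print("Fisher scores:", scores)
print("Selected feature indices:", selected)
```

In a pipeline like the one the abstract describes, the selected columns would then be passed to a classifier such as logistic regression, SVM, or ANN, and the resulting accuracy compared with that obtained from the full feature set. On the toy data, column 0 ranks first and column 2, whose class means coincide, scores zero and is dropped.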

Article Details

How to Cite
Chanklan, R., Suksut, K., & Podhijittikarn, K. (2025). Feature Selection with Linear Discriminant Analysis to Improve the Performance of Heart Disease Classification. Journal of Applied Informatics and Technology, 7(2), 432–447. https://doi.org/10.14456/jait.2025.26

Section
Research Article
