Applications of data mining in healthcare area: A survey


Ehsan Shirzad
Ghazal Ataei
Hamid Saadatfar


Data mining is a modern approach to discovering knowledge in databases, enabling statistical analysis, pattern recognition, and prediction. Today, one of its most important applications is in healthcare, where it has driven many advances: more effective treatments, reduced risks, lower costs, better patient relationships, earlier disease diagnosis, and more. This article attempts to provide a comprehensive overview, with a new classification, of the services that data mining has created or facilitated in the healthcare field. The classification covers disease diagnosis, early detection of diseases, management of pandemic diseases, dimension reduction, health monitoring, treatment effectiveness, systems biology, management of hospital resources, hospital ranking, customer relationship management, public health policy planning, fraud and abuse detection, and control of data overload. The strengths and weaknesses of data mining in healthcare are also discussed, and future directions are outlined. The article concludes that, although data mining has abundant applications in healthcare, especially in disease diagnosis and prediction and in the healthcare business, medical data mining is still a young field and needs more attention.



How to Cite
Shirzad, E., Ataei, G., & Saadatfar, H. (2021). Applications of data mining in healthcare area: A survey. Engineering and Applied Science Research, 48(3), 314–323.

