Improving quality of breast cancer data through pre-processing

Vatinee Sukmak; Jaree Thongkam

Keywords:

Breast cancer data, Pre-processing, Data mining, Decision rules

Vatinee Sukmak

Faculty of Nursing, Mahasarakham University, Mahasarakham, Thailand, 44150

Jaree Thongkam

Faculty of Informatics, Mahasarakham University, Mahasarakham, Thailand, 44150

Abstract

Data mining has recently become a promising approach to medical prognosis. In the mining process, raw
data commonly suffer from outliers and class imbalance, which degrade the performance of the model in
predicting unseen data. Thus, choosing appropriate data mining algorithms has a direct impact on the
prediction model. The objective of this study is to investigate the use of three kinds of data
pre-processing techniques, namely outlier filtering, the Synthetic Minority Over-sampling TEchnique
(SMOTE) and attribute selection, for improving the quality of breast cancer data at Srinagarind Hospital in
Thailand. Three decision rule building techniques were employed: Decision Table with Naïve Bayes (DTNB),
Repeated Incremental Pruning to Produce Error Reduction (RIPPER) and PART Decision List.
The performance of the proposed approaches was evaluated through the Area Under the receiver operating
characteristic Curve (AUC) of the decision rules. The experimental results show that applying suitable
data pre-processing, especially outlier filtering, can significantly improve the prediction performance of
decision rule models.
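
As a rough illustration of two ideas named in the abstract, the sketch below shows how SMOTE-style over-sampling can synthesise minority-class records by interpolating between neighbouring samples, and how AUC can be computed as the fraction of positive/negative pairs that a score ranks correctly. It is a minimal sketch and not the study's implementation: the function names `smote` and `auc`, the parameters `k` and `seed`, and the toy data are all assumptions made for this example, and at least two minority samples are assumed.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Return n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours.
    Hypothetical helper for illustration; assumes len(minority) >= 2."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [m for m in minority if m is not x]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from x to the neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs in which the positive
    receives the higher score; ties count as half a correct pair."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    correct = sum(1.0 if p > n else 0.5 if p == n else 0.0
                  for p in pos for n in neg)
    return correct / (len(pos) * len(neg))


# Toy data (invented for this example, not taken from the study).
minority_samples = [[1.0, 1.0], [2.0, 1.5], [1.5, 3.0]]
new_samples = smote(minority_samples, n_new=5)
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise AUC here is quadratic in the number of records, which is fine for a toy case; a rank-based computation would be the usual choice for larger data sets.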

How to Cite

Sukmak, V., & Thongkam, J. (2014). Improving quality of breast cancer data through pre-processing. Engineering and Applied Science Research, 40(4), 493–504. Retrieved from https://ph01.tci-thaijo.org/index.php/easr/article/view/21737

Issue

Vol. 40 No. 4 (2013)

Section

ORIGINAL RESEARCH

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
