Improving quality of breast cancer data through pre-processing

Vatinee Sukmak
Jaree Thongkam

Abstract

Using data mining for medical prognosis has recently become a promising approach. In the mining process, raw data commonly suffer from outliers and class imbalance, which degrade the model's performance in predicting unseen data. Thus, the choice of appropriate data mining algorithms has a direct impact on the prediction model. The objective of this study is to investigate three kinds of data pre-processing techniques, namely outlier filtering, the Synthetic Minority Over-sampling TEchnique (SMOTE) and attribute selection, for improving the quality of breast cancer data from Srinagarind Hospital in Thailand. Three decision rule building techniques were employed: Decision Table with Naïve Bayes (DTNB), Repeated Incremental Pruning to Produce Error Reduction (RIPPER) and the PART decision list. The performance of the proposed approaches was evaluated using the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) of the decision rules. Experimental results show that applying suitable data pre-processing, especially outlier filtering, can significantly improve the prediction performance of decision rule models.
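The abstract does not state which outlier filter was applied. As a hedged illustration only, the sketch below assumes a common interquartile-range (Tukey fence) rule on all-numeric records: a record is dropped if any attribute falls outside [Q1 - k*IQR, Q3 + k*IQR]. The function names `iqr_fences` and `filter_outliers` and the default factor k = 1.5 are hypothetical choices, not the study's settings.

```python
import statistics


def iqr_fences(values, factor=1.5):
    """Return (lower, upper) Tukey fences: Q1 - factor*IQR and Q3 + factor*IQR."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr


def filter_outliers(rows, factor=1.5):
    """Keep only rows whose every attribute lies within its column's fences.

    rows: list of equal-length tuples of numbers (one tuple per record).
    Assumption: all attributes are numeric; nominal ones need other handling.
    """
    columns = list(zip(*rows))  # transpose records into attribute columns
    # Fences are computed once per column from the full, unfiltered data.
    fences = [iqr_fences(col, factor) for col in columns]
    return [
        row for row in rows
        if all(lo <= v <= hi for v, (lo, hi) in zip(row, fences))
    ]


if __name__ == "__main__":
    records = [(1, 10), (2, 11), (3, 12), (2, 10), (3, 11), (100, 12)]
    print(filter_outliers(records))
```

Here the record with a first attribute of 100 lies far above the upper fence of its column and is removed, while the other five records are kept.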
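SMOTE balances a two-class dataset by creating synthetic minority-class records instead of duplicating existing ones. Each new record is placed at a random point on the line segment between a minority record and one of its k nearest minority-class neighbours. The following is a minimal standard-library sketch under simplifying assumptions (numeric features, Euclidean distance, base records visited in round-robin order). The function name `smote` and its parameters are hypothetical, and this is not the implementation used in the study.

```python
import math
import random


def _nearest_neighbours(points, i, k):
    """Indices of the k points closest to points[i] (Euclidean), excluding i."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def smote(minority, n_synthetic, k=5, seed=None):
    """Create n_synthetic minority-class records by SMOTE-style interpolation.

    minority: list of numeric feature tuples, all from the minority class.
    Each synthetic record lies on the segment between a minority record and
    one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority records to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than records
    neighbours = [_nearest_neighbours(minority, i, k) for i in range(len(minority))]
    synthetic = []
    for s in range(n_synthetic):
        i = s % len(minority)  # visit minority records round-robin
        base = minority[i]
        other = minority[rng.choice(neighbours[i])]
        gap = rng.random()  # uniform random position in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


if __name__ == "__main__":
    minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    for row in smote(minority, n_synthetic=4, k=2, seed=42):
        print(row)
```

Oversampling is normally applied only to the training portion of each split, so that synthetic records do not leak into the data used to estimate the AUC.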
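The AUC equals the probability that a randomly chosen positive record receives a higher score than a randomly chosen negative one, with ties counting one half. The sketch below computes it directly from that pairwise definition, assuming the rule model outputs a class-probability score for each test record; the function name `auc` is illustrative.

```python
def auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) definition.

    labels: sequence of 0/1 class labels (1 = positive class).
    scores: predicted scores for the positive class, same order as labels.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # tie counts as half a correct ordering
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

With labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, three of the four positive-negative pairs are ordered correctly, giving an AUC of 0.75. The double loop is quadratic in the number of records; sorting-based rank formulas give the same value faster on large test sets.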

How to Cite
Sukmak, V., & Thongkam, J. (2014). Improving quality of breast cancer data through pre-processing. Engineering and Applied Science Research, 40(4), 493–504. Retrieved from https://ph01.tci-thaijo.org/index.php/easr/article/view/21737
Section: ORIGINAL RESEARCH