Improving quality of breast cancer data through pre-processing
Abstract
Data mining has recently become a promising approach to medical prognosis. In the mining process, raw data
commonly suffer from outliers and class imbalance, which degrade the model's performance in predicting
unseen data. Thus, choosing appropriate data mining algorithms has a direct impact on the prediction model.
The objective of this study is to investigate the use of three kinds of data pre-processing techniques, namely
outlier filtering, the Synthetic Minority Over-sampling TEchnique (SMOTE) and attribute selection, for
improving the quality of breast cancer data from Srinagarind Hospital in Thailand. Three decision rule
building techniques were employed: Decision Table with Naïve Bayes (DTNB), Repeated Incremental Pruning
to Produce Error Reduction (RIPPER) and PART Decision List. The performance of the proposed approaches
was evaluated through the Area Under the receiver operating characteristic Curve (AUC) of the decision
rules. Experimental results show that applying suitable data pre-processing, especially the outlier filtering
method, can lead to a significant improvement in the prediction performance of decision rule models.
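
For readers unfamiliar with two of the techniques named in the abstract, the following is a minimal sketch of the core idea behind SMOTE (interpolating between a minority-class sample and one of its nearest minority-class neighbours) and of a rank-based AUC calculation. It is an illustration under stated assumptions, not the authors' implementation: the function names `smote` and `auc`, the parameters `k` and `seed`, and the toy data are all hypothetical, and the abstract does not say which software the study used. Outlier filtering is omitted because the abstract does not specify the filtering method.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples (illustrative sketch).

    Each synthetic sample lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    Assumes at least two minority samples with numeric features.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for i, p in enumerate(minority) if i != base_idx]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def auc(labels, scores):
    """Rank-based AUC: the probability that a randomly chosen positive
    case is scored above a randomly chosen negative case; ties count half.
    Assumes labels are 0/1 and both classes are present."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy minority class with two features (made-up values).
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(smote(minority, n_new=2, k=2))
    print(auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))
```

In practice, over-sampling of this kind would be applied to the training data only, so that the AUC is measured on records the model has not seen in any form.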
Article Details
How to Cite
Sukmak, V., & Thongkam, J. (2014). Improving quality of breast cancer data through pre-processing. Engineering and Applied Science Research, 40(4), 493–504. Retrieved from https://ph01.tci-thaijo.org/index.php/easr/article/view/21737
Section
ORIGINAL RESEARCH
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.