Predicting Breast Cancer Patient Survival

Jaree Thongkam
Vatinee Sukmak

Abstract

The objective of this research is to develop an effective model for predicting the survival of patients with breast cancer. Breast cancer is the second most common cancer in women. Data from 2004 to 2014 were collected from the SEER database, comprising up to 115,184 records. The prediction models were built with four base techniques: Naive Bayes, PART decision list, Multilayer Perceptron, and Support Vector Machine. In addition, the research team combined the Bagging technique with each of these base techniques to improve the performance of the resulting prediction models. 10-fold cross-validation was used to divide the dataset into training and testing sets. Sensitivity, specificity, and accuracy were used to compare the performance of the models. The experimental results show that PART combined with Bagging constructs the breast cancer survival model with the highest accuracy, 98.89%.
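The evaluation protocol described in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the function names, the random seed, and the choice of label 1 as the positive class are all assumptions made for illustration. It shows how a dataset can be split into 10 folds, how Bagging draws bootstrap samples and combines base-model predictions by majority vote, and how sensitivity, specificity, and accuracy are computed from predictions.

```python
import random
from collections import Counter


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Fold i takes every k-th shuffled index starting at position i.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


def bootstrap_sample(indices, rng):
    """Draw len(indices) items with replacement, as Bagging does per base model."""
    return [rng.choice(indices) for _ in indices]


def majority_vote(predictions):
    """Combine the base models' predictions for one record by majority vote."""
    return Counter(predictions).most_common(1)[0][0]


def evaluate(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity, accuracy).

    The positive label is an assumption of this sketch, not taken from the paper.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return sensitivity, specificity, accuracy
```

In a full experiment, each training fold would be resampled several times with `bootstrap_sample`, one base classifier (for example PART) would be trained per resample, and `majority_vote` would merge their predictions on the test fold before `evaluate` is applied.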

How to Cite
J. Thongkam and V. Sukmak, “Predicting Breast Cancer Patient Survival”, RMUTI Journal, vol. 14, no. 1, pp. 44–54, Dec. 2020.
Section
Research article
