Predicting Breast Cancer Patient Survival
Abstract
The objective of this research is to develop an effective model for predicting the survival of patients with breast cancer. Breast cancer is the second most common cancer in women. Data were collected from the SEER database for the years 2004 to 2014, and the dataset contains 115,184 records. The prediction models were built with four basic techniques: Naive Bayes, PART decision list, Multilayer Perceptron and Support Vector Machine. In addition, the research team combined the Bagging technique with each of these basic techniques in order to increase the performance of the prediction models. 10-fold cross-validation was used to divide the dataset into training and testing sets. Sensitivity, specificity and accuracy were used to compare the performance of the models. The experimental results show that PART combined with the Bagging technique constructs the breast cancer survival model with the highest accuracy, 98.89%.
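The evaluation procedure described in the abstract (Bagging around a base classifier, 10-fold cross-validation, and comparison by sensitivity, specificity and accuracy) can be illustrated with a minimal sketch. This is not the authors' implementation. The base learner here is a simple single-feature threshold rule (a decision stump) that stands in for the techniques named above, the data are synthetic rather than SEER records, and all function names are hypothetical.

```python
import random
from collections import Counter


def confusion_metrics(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity, accuracy) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy


def train_stump(X, y):
    """Stand-in base learner (assumption): best single-feature threshold rule."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                preds = [1 if sign * (row[j] - thr) >= 0 else 0 for row in X]
                errors = sum(p != t for p, t in zip(preds, y))
                if best is None or errors < best[0]:
                    best = (errors, j, thr, sign)
    _, j, thr, sign = best
    return lambda row: 1 if sign * (row[j] - thr) >= 0 else 0


def train_bagged(X, y, n_estimators, rng):
    """Bagging: fit each model on a bootstrap sample, predict by majority vote."""
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(train_stump([X[i] for i in idx], [y[i] for i in idx]))

    def predict(row):
        votes = Counter(model(row) for model in models)
        return votes.most_common(1)[0][0]

    return predict


def k_fold_indices(n, k, rng):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, k=10, n_estimators=7, seed=0):
    """k-fold cross-validation; metrics are computed over all held-out predictions."""
    rng = random.Random(seed)
    y_true, y_pred = [], []
    for test_idx in k_fold_indices(len(X), k, rng):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = train_bagged([X[i] for i in train_idx],
                             [y[i] for i in train_idx], n_estimators, rng)
        y_true += [y[i] for i in test_idx]
        y_pred += [model(X[i]) for i in test_idx]
    return confusion_metrics(y_true, y_pred)


def make_toy_data(n=100, seed=1):
    """Synthetic two-feature data (not SEER): only feature 0 carries signal."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([rng.gauss(2.0 * label, 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    sens, spec, acc = cross_validate(X, y)
    print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

An odd number of estimators is used so that the majority vote over two classes cannot tie. To compare a bagged model with its unbagged counterpart, as the abstract describes, the same folds would be evaluated once with train_stump alone and once with train_bagged.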