Evaluating Machine Learning Methods for Solving Class Imbalance in Banking Customer Data: A Comparative Study

Jirakit Boonmunewai; Teerapat Chantaraksa; Benjawan Rodjanadid

doi:10.14456/kkuscij.2024.27


Published: Nov 21, 2024


Keywords:

Churn Prediction; Synthetic Minority Over-sampling Technique; Decision Tree Classifier; Naïve Bayes Classifier; Support Vector Machine

Jirakit Boonmunewai

School of Mathematical Sciences and Geoinformatics, Institute of Science, Suranaree University of Technology, Thailand

Teerapat Chantaraksa

School of Mathematical Sciences and Geoinformatics, Institute of Science, Suranaree University of Technology, Thailand

Benjawan Rodjanadid

School of Mathematical Sciences and Geoinformatics, Institute of Science, Suranaree University of Technology, Thailand

Abstract

The goal of this research was to address the class imbalance problem that affects churn prediction for bank customers. We examined two sampling techniques, the Synthetic Minority Over-sampling Technique (SMOTE) and random under-sampling, in combination with three predictive models: the decision tree classifier, the Naïve Bayes classifier, and the support vector machine classifier. The results indicated that the support vector machine classifier combined with SMOTE was the most effective, achieving a recall of 92.99%, an F-score of 91.37%, an area under the curve (AUC) of 96.4%, and a false negative rate of 7.01%.
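The two sampling techniques and the reported metrics can be illustrated with a minimal standard-library Python sketch. It is not the authors' implementation, which this page does not describe. The toy data points, the confusion counts, the function names, and the reading of "F-score" as F1 are all assumptions made for illustration. The sketch also shows why the reported recall (92.99%) and false negative rate (7.01%) sum to 100%: the false negative rate is by definition one minus recall.

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours.
    Needs at least two minority samples."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> target
        synthetic.append(tuple(b + gap * (t - b) for b, t in zip(base, target)))
    return synthetic


def random_under_sample(majority, n_keep, rng=None):
    """Keep a random subset of the majority class, without replacement."""
    rng = rng or random.Random(0)
    return rng.sample(majority, n_keep)


def churn_metrics(tp, fn, fp):
    """Recall, false negative rate, precision and F1 from confusion counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {
        "recall": recall,
        "fnr": fn / (tp + fn),  # always equals 1 - recall
        "precision": precision,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Toy data (hypothetical): 3 churners, 9 non-churners, two numeric features.
churners = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1)]
stayers = [(float(i), float(i % 3)) for i in range(3, 12)]

# Over-sampling: grow the minority class to match the majority class.
balanced_up = churners + smote(churners, n_new=6, k=2)
# Under-sampling: shrink the majority class to match the minority class.
balanced_down = random_under_sample(stayers, n_keep=len(churners))
print(len(balanced_up), len(balanced_down))

# Hypothetical confusion counts, not taken from the study.
print(churn_metrics(tp=93, fn=7, fp=10))
```

In practice, resampling of this kind is normally applied to the training split only, so that the evaluation data keeps the original class distribution.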

How to Cite

Boonmunewai, J., Chantaraksa, T., & Rodjanadid, B. (2024). Evaluating Machine Learning Methods for Solving Class Imbalance in Banking Customer Data: A Comparative Study. KKU Science Journal, 52(3), 349–362. https://doi.org/10.14456/kkuscij.2024.27

Issue

Vol. 52 No. 3 (2024): September – December 2024

Section

Research Articles

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

