Evaluating Machine Learning Methods for Solving Class Imbalance in Banking Customer Data: A Comparative Study
Abstract
The goal of this research was to address the class imbalance problem that affects churn prediction for bank customers. We examined two sampling techniques, the Synthetic Minority Over-sampling Technique (SMOTE) and random under-sampling, in combination with three predictive models: a decision tree classifier, a Naïve Bayes classifier, and a support vector machine (SVM) classifier. The results indicated that the SVM classifier combined with SMOTE was the most effective, achieving a recall of 92.99%, an F-score of 91.37%, an area under the ROC curve (AUC) of 96.4%, and a false negative rate of 7.01%.
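To make the two resampling strategies and the reported metrics concrete, the following is a minimal, dependency-free Python sketch. It is not the study's code. The function names `smote`, `random_undersample`, and `churn_metrics` are hypothetical. The SMOTE step is a simplified variant that picks the base sample at random rather than iterating over every minority sample. The churn class is assumed to be labelled `1`. The sketch also shows why the recall and false negative rate reported above are complementary: FNR = FN / (TP + FN) = 1 - recall, which is why 92.99% and 7.01% sum to 100%.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples (simplified SMOTE).

    Each synthetic point is a random interpolation between a randomly
    chosen minority sample and one of its k nearest minority neighbours
    (Euclidean distance). Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (nb - b) for b, nb in zip(base, neighbour))
        )
    return synthetic


def random_undersample(X, y, majority_label, seed=0):
    """Randomly drop majority-class rows until both classes are the same size."""
    rng = random.Random(seed)
    majority = [(x, t) for x, t in zip(X, y) if t == majority_label]
    minority = [(x, t) for x, t in zip(X, y) if t != majority_label]
    kept = rng.sample(majority, min(len(minority), len(majority))) + minority
    rng.shuffle(kept)
    return [x for x, _ in kept], [t for _, t in kept]


def churn_metrics(y_true, y_pred, positive=1):
    """Return (recall, F1 score, false negative rate) for the churn class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    fnr = fn / (tp + fn)  # equals 1 - recall (up to floating-point rounding)
    return recall, f1, fnr


if __name__ == "__main__":
    # Toy data: three minority points in the unit square.
    print(len(smote([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], n_new=4, k=2)))  # 4

    X = [(float(i), 0.0) for i in range(10)]
    y = [0] * 8 + [1] * 2  # 8 retained customers, 2 churners
    _, y_bal = random_undersample(X, y, majority_label=0)
    print(y_bal.count(0), y_bal.count(1))  # 2 2

    print(churn_metrics([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 1, 0, 0, 0]))
    # (0.75, 0.75, 0.25)
```

The trade-off between the two techniques is visible in the code: SMOTE adds interpolated minority points and keeps every original row, while random under-sampling balances the classes by discarding majority rows. In practice one would more likely use imbalanced-learn's `SMOTE` and `RandomUnderSampler` with scikit-learn classifiers. In either case, resampling is usually applied to the training split only, so that test metrics reflect the original class distribution.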
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.