Application of Binary Whale Optimization Algorithm for Solving Imbalanced Data Problems
Abstract
This research aims to develop a novel undersampling algorithm for imbalanced data problems by combining the whale optimization algorithm and its binary variant with K-nearest neighbor (KNN) classification. Twelve datasets with imbalance ratios ranging from 1.82 to 42.01 were selected from the Knowledge Extraction based on Evolutionary Learning (KEEL) repository and the imbalanced-learn repository to evaluate the proposed algorithm. Each dataset was first split into a training set and a testing set. The minority class of each training set was left unchanged, while its majority class was processed by the proposed algorithm, with the KNN parameter fixed at K = 1, to obtain an optimal representative subset of the majority class. A support vector machine (SVM) classifier was then trained on the resulting reduced training set for performance assessment. Compared with three other undersampling methods, namely random undersampling, cluster centroids, and near-miss, the proposed algorithm achieved the best overall performance, with the following average results: Accuracy = 0.8596, F1 score = 0.6255, G-mean = 0.8941, AUROC = 0.9363, AUPRC = 0.6978, Sensitivity = 0.9444, Precision = 0.5271, MCC = 0.6204, and Kappa = 0.5695.
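The abstract does not specify the encoding or the fitness function, so the following is only a minimal sketch of how a binary whale optimization search over the majority class could be coupled with a 1-NN classifier. It assumes one bit per majority sample (1 = keep), the standard WOA encircling and spiral updates of Mirjalili and Lewis with a scalar A per whale, an S-shaped (sigmoid) transfer function for binarization, and a hypothetical fitness equal to the leave-one-out 1-NN G-mean on the training set minus a small penalty on the fraction of majority samples kept. The function names, the `penalty` weight, and the label convention (majority = 0, minority = 1) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def loo_1nn_gmean(dist, y, ref_idx):
    """G-mean of leave-one-out 1-NN predictions for every training point,
    using only the rows in ref_idx as neighbours (dist has an inf diagonal)."""
    nearest = np.argmin(dist[:, ref_idx], axis=1)
    pred = y[ref_idx][nearest]
    sens = np.mean(pred[y == 1] == 1)  # recall of the minority class (label 1)
    spec = np.mean(pred[y == 0] == 0)  # recall of the majority class (label 0)
    return np.sqrt(sens * spec)


def bwoa_undersample(X, y, n_whales=10, n_iter=30, penalty=0.1, seed=0):
    """Return a boolean mask over the majority-class rows (y == 0) to keep.

    Hypothetical fitness (an assumption, not from the paper): leave-one-out
    1-NN G-mean minus penalty * (fraction of majority rows kept).
    """
    rng = np.random.default_rng(seed)
    maj = np.flatnonzero(y == 0)
    mino = np.flatnonzero(y == 1)
    # Pairwise Euclidean distances; an inf diagonal stops a point being its own neighbour.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)

    def binarize(position):
        # S-shaped (sigmoid) transfer: each bit is 1 with probability sigmoid(x).
        return rng.random(position.shape) < 1.0 / (1.0 + np.exp(-position))

    def fitness(mask):
        if not mask.any():
            return -np.inf
        ref = np.concatenate([mino, maj[mask]])
        return loo_1nn_gmean(dist, y, ref) - penalty * mask.mean()

    pos = rng.uniform(-1.0, 1.0, size=(n_whales, len(maj)))
    best_fit, best_pos, best_mask = -np.inf, pos[0].copy(), None
    for i in range(n_whales):
        mask = binarize(pos[i])
        f = fitness(mask)
        if best_mask is None or f > best_fit:
            best_fit, best_pos, best_mask = f, pos[i].copy(), mask

    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter  # decreases linearly from 2 towards 0
        for i in range(n_whales):
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            if rng.random() < 0.5:
                # Encircle the best whale (|A| < 1) or a random whale (exploration).
                target = best_pos if abs(A) < 1 else pos[rng.integers(n_whales)].copy()
                pos[i] = target - A * np.abs(C * target - pos[i])
            else:
                # Spiral bubble-net move around the best whale (spiral constant b = 1).
                l = rng.uniform(-1.0, 1.0)
                pos[i] = np.abs(best_pos - pos[i]) * np.exp(l) * np.cos(2 * np.pi * l) + best_pos
            # Bound positions so the sigmoid stays strictly inside (0, 1) without overflow.
            pos[i] = np.clip(pos[i], -6.0, 6.0)
            mask = binarize(pos[i])
            f = fitness(mask)
            if f > best_fit:
                best_fit, best_pos, best_mask = f, pos[i].copy(), mask
    return best_mask
```

The returned mask selects rows of the majority class, so the reduced training set consists of all minority rows plus `X[np.flatnonzero(y == 0)[mask]]`; that set could then be used to fit an SVM (for example, scikit-learn's `SVC`). Because the full pairwise distance matrix is held in memory, this sketch is only suitable for small training sets.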
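Most of the reported measures are simple functions of the test-set confusion matrix. The sketch below computes them, assuming the minority class is the positive class and that no row or column of the confusion matrix is empty; the function name is hypothetical. AUROC and AUPRC are omitted because they require ranked classifier scores rather than a single confusion matrix (in scikit-learn they are typically obtained with `roc_auc_score` and `average_precision_score`).

```python
from math import sqrt


def imbalance_metrics(tp, fn, fp, tn):
    """Threshold-based metrics from a binary confusion matrix (minority = positive).

    Assumes every marginal total is non-zero, otherwise a division by zero occurs.
    """
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)  # recall of the minority class
    specificity = tn / (tn + fp)  # recall of the majority class
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    g_mean = sqrt(sensitivity * specificity)
    mcc = (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    p_o = (tp + tn) / n  # observed agreement, identical to accuracy
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2  # chance agreement
    kappa = (p_o - p_e) / (1 - p_e)  # Cohen's kappa
    return {
        "accuracy": p_o,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
        "g_mean": g_mean,
        "mcc": mcc,
        "kappa": kappa,
    }
```

For example, `imbalance_metrics(tp=8, fn=2, fp=10, tn=80)` gives an accuracy of 0.88 but an F1 score of about 0.571 and a kappa of about 0.508, which illustrates why accuracy alone can be misleading on imbalanced data.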
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Article Accepting Policy
The editorial board of the Thai-Nichi Institute of Technology welcomes articles written in Thai or English by lecturers and experts in business administration, languages, engineering, and technology. Submitted work must not have been published previously and must not be under consideration by another journal. Authors who wish to disseminate their work and knowledge may submit their articles to the editorial board, which forwards them to the screening committee for publication consideration. Only research articles are accepted. Authors should prepare their articles according to the recommendations for authors.
The author(s) of each article are solely responsible for any copyright infringement. Articles published in the journal are screened and reviewed for quality by qualified experts approved by the editorial board.
The text of each article published in this research journal reflects the personal opinions of its author(s) and is not related in any way to the Thai-Nichi Institute of Technology or its faculty members. Each author is responsible for the content and accuracy of his or her own article(s), including any errors.
The editorial board reserves the right to prohibit the republication of any content, views, or comments from articles in the Journal of Thai-Nichi Institute of Technology without prior written permission from the author(s). Published work is copyrighted by the Journal of Thai-Nichi Institute of Technology.