TY  - JOUR
AU  - Nasritha, Karn
AU  - Kerdprasop, Kittisak
AU  - Kerdprasop, Nittaya
PY  - 2018/08/12
Y2  - 2024/03/29
TI  - Comparison of Sampling Techniques for Imbalanced Data Classification
JF  - Journal of Applied Informatics and Technology
JA  - J Appl Inf Tech
VL  - 1
IS  - 1
SE  - Research Article
DO  - 10.14456/jait.2018.2
UR  - https://ph01.tci-thaijo.org/index.php/jait/article/view/90569
SP  - 20-37
AB  - Imbalanced data is a problem in machine learning classification and results in poor classification performance. Random sampling techniques have been applied in several ways to address the poor performance caused by data imbalance. This research aims to compare sampling techniques for imbalanced data classification. The research was conducted on three data sets, with the synthetic minority over-sampling technique, under-sampling and resample techniques used for imbalanced data preprocessing. Decision tree, CART, random forest, support vector machine and artificial neural network algorithms were combined with AdaBoost and bagging ensemble algorithms to create models for data classification. Ten-fold cross-validation was used to measure model performance. Performance was measured with precision, recall and F-measure. The results showed that the resample technique handled the imbalanced data better than the synthetic minority over-sampling technique. In addition, the random forest model, the AdaBoost ensemble with random forest and the bagging ensemble with random forest were effective for data classification in this research.
ER  -