Performance evaluation of supervised learning algorithms with various training data sizes and missing attributes

Chaluemwut Noyunsan
Tatpong Katanyukul
Kanda Saikaew


Supervised learning is a machine learning technique used for creating a data prediction model. This article focuses on finding high-performance supervised learning algorithms under varied training data sizes, varied numbers of attributes, and time spent on prediction. This study evaluated seven algorithms, Boosting, Random Forest, Bagging, Naive Bayes, K-Nearest Neighbours (K-NN), Decision Tree, and Support Vector Machine (SVM), on seven standard benchmark data sets from the University of California, Irvine (UCI), using two evaluation metrics and experimental settings with various training data sizes and missing key attributes. Our findings reveal that Bagging, Random Forest, and SVM are overall the three most accurate algorithms. However, when key attribute values may be missing, K-NN is recommended because its performance is affected the least. Alternatively, when the training data may not be large enough, Naive Bayes is preferable since it is the algorithm least sensitive to training data size. The algorithms are characterized on a two-dimensional chart based on prediction performance and computation time. This chart is expected to guide a novice user in choosing an appropriate method for their needs. Based on this chart, Bagging and Random Forest are in general the two most recommended algorithms because of their high performance and speed.
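The experimental idea described above (score a classifier at several training data sizes, with and without a key attribute) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the study: the data are synthetic rather than UCI benchmarks, the hand-written k-NN classifier stands in for the seven algorithms, plain accuracy stands in for the two evaluation metrics, and the function names (`make_data`, `drop_attribute`, `run_experiment`) are hypothetical. It is not expected to reproduce the reported findings.

```python
import random
from collections import Counter


def make_data(n, seed=0):
    """Synthetic two-class data (assumption): attribute 0 is informative, attribute 1 is noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        key = rng.gauss(2.0 * label, 0.5)  # "key" attribute separates the classes
        noise = rng.gauss(0.0, 1.0)        # uninformative attribute
        rows.append(([key, noise], label))
    return rows


def knn_predict(train, x, k=3):
    """Majority vote among the k nearest training rows (squared Euclidean distance)."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of test rows whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


def drop_attribute(rows, index):
    """Return a copy of the rows with one attribute column removed."""
    return [([v for i, v in enumerate(x) if i != index], y) for x, y in rows]


def run_experiment(train_sizes=(10, 50, 200), n_test=200):
    """Score the classifier at each training size, with and without the key attribute."""
    pool = make_data(max(train_sizes), seed=0)
    test = make_data(n_test, seed=1)
    results = {}
    for size in train_sizes:
        train = pool[:size]
        results[size] = {
            "all_attributes": accuracy(train, test),
            "key_attribute_missing": accuracy(
                drop_attribute(train, 0), drop_attribute(test, 0)
            ),
        }
    return results


if __name__ == "__main__":
    for size, scores in run_experiment().items():
        print(size, scores)
```

Because the synthetic data have only one informative attribute, removing it leaves the classifier with noise alone; real benchmark data sets typically spread information across several attributes, so the effect of a missing key attribute would differ by algorithm and data set.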

Noyunsan, C., Katanyukul, T., & Saikaew, K. (2018). Performance evaluation of supervised learning algorithms with various training data sizes and missing attributes. Engineering and Applied Science Research, 45(3), 221–229. retrieved from
Chaluemwut Noyunsan, Department of Computer Engineering, Faculty of Engineering, Khon Kaen University, Khon Kaen 40002, Thailand

Chaluemwut Noyunsan was born in Khon Kaen, Thailand, in 1981. He received the B.S. degree in Computer Engineering from Khon Kaen University, Thailand, in 2003, and the M.S. degree in Computer Science from Chulalongkorn University, Thailand, in 2009. He is currently pursuing a Ph.D. degree in the Department of Computer Engineering, Khon Kaen University. His current research interests include social network analysis and information credibility measurement.

Tatpong Katanyukul, Department of Computer Engineering, Faculty of Engineering, Khon Kaen University, Khon Kaen 40002, Thailand

Tatpong Katanyukul. Academic background: B.Eng. (Electronics Engineering), King Mongkut Institute of Technology, Ladkrabang; M.Eng. (Computer Science), Asian Institute of Technology; Ph.D. (Mechanical Engineering), Colorado State University. Academic areas of interest: approximate dynamic programming, including reinforcement learning; machine learning applications. Contact: tatpong at kku dot ac dot th.

Kanda Saikaew, Department of Computer Engineering, Faculty of Engineering, Khon Kaen University, Khon Kaen 40002, Thailand

Kanda Saikaew was born in Chiang Rai, Thailand, in 1975. She received the B.S. degree in Electrical and Computer Engineering from Carnegie Mellon University, Pennsylvania, USA, in 1997, and the M.S. and Ph.D. degrees in Computer Science and Engineering from the University of Michigan at Ann Arbor, in 1999 and 2003, respectively. In 2003, she joined the Department of Computer Engineering, Khon Kaen University, as a Lecturer, and became an Associate Professor in 2015. Her current research interests include social network analysis and machine learning.


