A Study of Factors Affecting Learning Efficiency on Higher Education Student Performance Evaluation Dataset Using Feature Selection Techniques


Kairung Hengpraprohm
Supoj Hengpraprohm
Wannee Sudjitjoon


This research aimed to identify the features that affect learning efficiency in the Higher Education Student Performance Evaluation dataset. The data were gathered from students in the Faculty of Engineering and the Faculty of Education in academic year 2019 in order to forecast the students’ final learning performance. The dataset, obtained from the UCI Machine Learning Repository, consisted of 33 attributes and 145 records. Four feature selection techniques (Information Gain, Gain Ratio, Correlation Coefficient, and Chi-Square) were applied together with four classification methods (K-Nearest Neighbor, Random Forest, Artificial Neural Network, and Linear Regression). The findings showed that the best feature selection techniques were Information Gain and Gain Ratio. The relationship between each selected feature and the data class was then analyzed using Pearson’s correlation. Five features had a positive relationship with the data class: CUML_GPA, EXP_GPA, READ_FREQ, COURSE ID, and KIDS. That is, students with a high cumulative grade point average in the last semester, a high expected academic achievement score, a high frequency of reading non-scientific books, and divorced or deceased parents had satisfactory learning achievement. Four attributes, STUDY_HRS, AGE, SALARY, and IMPACT, had a negative relationship with the data class, meaning that low weekly study hours, young age, low income, and a positive impact of projects or activities on success were associated with satisfactory learning achievement. It could therefore be concluded that the factors affecting learning efficiency were the cumulative grade point average, the achievement expectation score, the frequency of reading non-scientific books, and low weekly study hours. These features could serve as a guideline for designing learning management that maximizes learners’ learning efficiency.
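To make the scoring step concrete, the following is a minimal sketch of how Information Gain, Gain Ratio, and Pearson's correlation can be computed for a discrete feature against a class label. It is not the authors' implementation, and the abstract does not state which software was used. The function names and the toy values are hypothetical and invented purely for illustration; they are not records from the UCI dataset.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(feature, labels):
    """Reduction in class entropy obtained by splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """Information gain divided by the entropy of the feature itself."""
    split_info = entropy(feature)
    if split_info == 0:
        return 0.0  # a constant feature carries no information
    return information_gain(feature, labels) / split_info


def pearson(xs, ys):
    """Pearson correlation coefficient between two numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


# Hypothetical toy data (NOT taken from the dataset): ordinal-coded features
# and a binary class label (1 = satisfactory achievement).
toy_features = {
    "toy_gpa": [1, 1, 2, 2, 3, 3, 4, 4],
    "toy_study_hrs": [3, 3, 2, 3, 2, 1, 1, 1],
}
toy_grade = [0, 0, 0, 0, 1, 1, 1, 1]

# Rank features by information gain, then report the sign of the correlation.
ranked = sorted(
    toy_features,
    key=lambda name: information_gain(toy_features[name], toy_grade),
    reverse=True,
)
for name in ranked:
    column = toy_features[name]
    print(
        name,
        round(information_gain(column, toy_grade), 3),
        round(gain_ratio(column, toy_grade), 3),
        round(pearson(column, toy_grade), 3),
    )
```

Information Gain and Gain Ratio only rank features by how informative they are; the sign of the Pearson coefficient is what indicates whether higher feature values go with the satisfactory class (positive) or against it (negative), which matches the two-stage reading given in the abstract.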


Research Paper

