The Study of Features Affecting the Digital Literacy Test and Comparative Efficiency of Data Classification Using Data Mining Techniques

Narin Jiwitan
Worakarn Jaidee
Wannaporn Teekeng

Abstract

This research studies the features that affect Digital Literacy test results, using feature selection techniques, and compares the efficiency of data classification models. The research process follows the CRISP-DM steps. The data comprise 12,374 records with 67 features, collected from the Office of the National Digital Economy and Society Commission's survey and assessment of the state of digital literacy. The data were cleaned and transformed into an appropriate format, and SMOTE (Synthetic Minority Oversampling Technique) was used to improve the class balance. Features were weighted using five techniques: 1) Chi-Square Statistic, 2) Gini Index, 3) Gain Ratio, 4) Information Gain, and 5) Correlation-based. For each technique, the 10 features with the highest weights were taken, and the number of techniques in which each feature appeared (its frequency) was counted. Features with a frequency higher than 2 were selected to design and build forecasting models using five techniques: Decision Tree, Gradient Boosted Trees, Random Forest, Naïve Bayes, and Deep Learning. The models were evaluated using K-Fold Cross Validation, with the data divided into 10, 20, and 30 folds, measuring precision, recall, specificity, accuracy, F1-measure, and G-mean. The results revealed 6 features with the highest frequency, a frequency of 5: the use of advertising media/product labels, the use of digital media for social media, the use of digital media to access websites, the problem of expensive internet service fees, problems accessing internet service areas, and problems with spam/advertising messages. The comparison of classification efficiency found that Random Forest was the most effective technique: with the dataset divided into 30 folds, the accuracy was 76.29%, the overall efficiency was 76.00%, and the geometric mean was 76.28%.
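
The frequency-based selection step and the evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `vote_features` and `binary_metrics`, the toy weights, and the feature names `a`, `b`, `c` are hypothetical. It also assumes a binary classification target and takes G-mean to be the square root of recall times specificity, a common definition that the abstract does not spell out.

```python
from collections import Counter
from math import sqrt


def vote_features(weights_by_technique, top_k=10, min_frequency=3):
    """Count how many techniques rank each feature in their top_k by weight.

    weights_by_technique maps a technique name to {feature: weight}.
    Returns {feature: frequency} for features whose frequency is at least
    min_frequency (3 corresponds to "higher than 2" in the abstract).
    """
    counts = Counter()
    for weights in weights_by_technique.values():
        # Rank features from highest to lowest weight, keep the top_k.
        ranked = sorted(weights, key=weights.get, reverse=True)
        counts.update(ranked[:top_k])
    return {f: n for f, n in counts.items() if n >= min_frequency}


def binary_metrics(tp, fp, tn, fn):
    """Compute the abstract's six measures from confusion-matrix counts.

    Assumes a binary target; G-mean is assumed to be
    sqrt(recall * specificity).
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    g_mean = sqrt(recall * specificity)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
        "g_mean": g_mean,
    }


# Toy example (hypothetical weights): three techniques, top 2 features each.
toy_weights = {
    "chi_square": {"a": 0.9, "b": 0.5, "c": 0.1},
    "gini_index": {"a": 0.8, "b": 0.2, "c": 0.6},
    "gain_ratio": {"a": 0.7, "b": 0.6, "c": 0.3},
}
selected = vote_features(toy_weights, top_k=2, min_frequency=3)
print(selected)

# Hypothetical confusion-matrix counts for one evaluated model.
print(binary_metrics(tp=40, fp=10, tn=40, fn=10))
```

With the study's settings, `top_k=10` and `min_frequency=3` would be applied to the weights from all five techniques, and a feature selected by every technique would have a frequency of 5.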

Article Details

Section
Research Article
