Using Data Mining Techniques to Develop a Model for Scratch Programming Assessment

Nontasak Janchum
Chalita Cheewaviriyanon


Educational settings generate large volumes of data, often at the scale of big data. However, little research in the field of educational data mining has addressed the prediction of academic achievement in programming at the secondary level. The objectives of this research were to develop a model for assessing the learning of Mathayomsuksa 1 students from their Scratch projects using data mining, and to test the model's performance. The research applied the CRISP-DM data mining framework to develop models with three classification techniques (Naïve Bayes, Decision Tree, and K-Nearest Neighbor) and validated these models using 10-fold cross-validation. The input for model development was a sample of 113 Scratch projects, which was divided into a training data set and a testing data set. The models used 9 complexity features of the Scratch projects as predictor variables and grades as the target variable.
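The 10-fold cross-validation procedure mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic (113 samples with 9 numeric features and three hypothetical grade labels), only a simple K-Nearest Neighbor classifier is shown, and the helper names `k_fold_indices`, `knn_predict`, and `cross_validate` are assumptions introduced for illustration.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def knn_predict(train_X, train_y, query, n_neighbors=3):
    """Majority vote among the n_neighbors closest training points.

    Distance is squared Euclidean; ties go to the label seen first,
    i.e. the label of the nearest neighbor among the tied ones.
    """
    ranked = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], query)),
    )
    votes = Counter(train_y[i] for i in ranked[:n_neighbors])
    return votes.most_common(1)[0][0]


def cross_validate(X, y, k=10, seed=0):
    """Return one accuracy per fold: train on k-1 folds, test on the held-out one."""
    accuracies = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(X)) if i not in held]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(knn_predict(train_X, train_y, X[i]) == y[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return accuracies


# Synthetic stand-in for the study's data (assumption): 113 projects,
# 9 complexity features each, and three hypothetical grade labels.
rng = random.Random(42)
grades = ["A", "B", "C"]
X, y = [], []
for n in range(113):
    label = grades[n % 3]
    center = grades.index(label) * 3.0
    X.append([rng.gauss(center, 1.0) for _ in range(9)])
    y.append(label)

fold_accuracies = cross_validate(X, y, k=10)
mean_accuracy = sum(fold_accuracies) / len(fold_accuracies)
print(f"mean accuracy over {len(fold_accuracies)} folds: {mean_accuracy:.4f}")
```

Each project is used for testing exactly once and for training in the other nine rounds, so the averaged score depends less on one particular split than a single train/test division would.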

The results showed that the Decision Tree technique performed best of the three models, with an accuracy of 93.67%, a precision of 93.47%, a recall of 85.71%, and an F-measure of 87.24%. When the model was tested on the testing data set, its overall prediction accuracy was 64.71%. The prediction model can be used as a learning assessment tool for teachers in computing science courses.
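The four reported measures can be computed from true and predicted grades, as in the following minimal sketch. The abstract does not say how the per-class scores were averaged, so macro-averaging (an unweighted mean over classes) is an assumption here, and the labels and the function name `macro_metrics` are hypothetical.

```python
def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F-measure."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f_measures = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)  # true positives
        fp = sum(t != label and p == label for t, p in pairs)  # false positives
        fn = sum(t == label and p != label for t, p in pairs)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f_measures.append(f_measure)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f_measure": sum(f_measures) / n,
    }


# Hypothetical grades for six projects (not data from the study).
y_true = ["A", "A", "A", "B", "B", "C"]
y_pred = ["A", "A", "B", "B", "B", "C"]
metrics = macro_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under macro-averaging, the F-measure is the mean of the per-class F-measures, so it need not equal the harmonic mean of the averaged precision and recall.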
