Comparison of imbalanced data classification efficiency for predicting road accident deaths using data mining techniques

Narin Jiwitan


This research aims to analyze the factors involved in road accident deaths and to compare the efficiency of forecasting models in classifying road fatality data. The models compared are Gradient Boosted Tree, Decision Tree, Naïve Bayes, Random Forest, and Deep Learning, data mining techniques applied following the CRISP-DM methodology. Accident data comprising 51,384 rows, spanning 1 January 2019 to 30 June 2021, were collected from the Information and Communication Technology Center, Office of the Permanent Secretary for Transport. The data were prepared and cleaned before analysis and modeling. The factors involved in road accident deaths were analyzed by weighting the relevant attributes with the Gain Ratio method, which gave the type of vehicle or road user a weight of 0.1030. The classification efficiency of the forecasting models was compared using K-Fold Cross Validation with 30 folds; the Naïve Bayes technique achieved the highest accuracy of 72.23%, F1-measure of 73.35%, and G-mean of 72.10%. The results of this research can be used as information to prevent road accidents effectively in each area, to reduce risks and impacts, and to prepare for road accidents that may occur.
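The two less familiar quantities named above, the Gain Ratio attribute weight and the G-mean score, can be illustrated with a minimal sketch. It assumes a categorical attribute, a binary outcome, and the standard textbook definitions (information gain divided by split information; square root of sensitivity times specificity). The function names and the toy data are hypothetical and are not taken from the study or from the software it used.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (base 2) of a list of categorical values."""
    total = len(items)
    return -sum((c / total) * math.log2(c / total) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of one categorical attribute with respect to the class labels."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Expected entropy of the labels after splitting on the attribute.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    # Split information is the entropy of the attribute's own value distribution.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity and specificity for a binary classifier."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


# Toy data (hypothetical): attribute values and a binary fatality label.
vehicle = ["motorcycle", "motorcycle", "car", "car"]
fatal = [1, 1, 0, 0]
weight = gain_ratio(vehicle, fatal)

# Hypothetical confusion-matrix counts: 8 TP, 2 FN, 9 TN, 1 FP.
score = g_mean(tp=8, fn=2, tn=9, fp=1)
```

Because the G-mean multiplies the recall of both classes, a model that ignores the minority class scores near zero however high its accuracy is, which is why it is a common companion to accuracy and F1-measure on imbalanced data.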

