Comparison of Missing Data Imputation Methods in Dependent Variable with Missing at Random for Multiple Linear Regression
Abstract
This research develops missing data imputation methods for the dependent variable in multiple linear regression when the dependent variable is missing at random, namely the Mean Regression Imputation method (MRI), the Expectation Maximization with Multiple Imputation method (EMMI) and the Nearest Average Regression Imputation method (NARI). The efficiency of the developed methods is compared with that of six other methods, namely the Regression Imputation method (RI), the Stochastic Regression Imputation method (SRI), the K Nearest Neighbour Imputation method (KNN), the Expectation Maximization Algorithm method (EM), the Multiple Imputation method (MI) and the Proportioned Residual Draw Imputation method (PRD). The simulation study was carried out with the R program, where the standard deviation of the error (σ) was set to 5, 10 and 15, the sample sizes (n) were 30, 50, 100 and 200, and the missing percentages were 5, 10, 15 and 20. The criterion for comparing performance is the Average Mean Square Error (AMSE). The results show that the EMMI method performs best at all sample sizes when σ equals 5 and the missing percentage equals 5. The MRI method performs better than the others at all sample sizes when σ equals 10 and the missing percentage equals 5, and it still performs best when σ equals 15 at all missing percentages and nearly all sample sizes. For the real data with n = 50, the MRI method is the most effective at all missing percentages.
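To make the simulation design concrete, the following is a minimal sketch of one cell of such a study (one n, one σ, one missing percentage) for two of the comparison methods, RI and SRI. It is written in Python with NumPy for illustration, although the study itself used R. The regression coefficients, the predictor distribution, the logistic missingness mechanism and the exact AMSE definition (squared error between imputed and true values of the dependent variable, averaged over replications) are assumptions of this sketch and are not taken from the article. The proposed MRI, EMMI and NARI methods are not implemented here, since the abstract does not specify them.

```python
import numpy as np

def simulate(n, sigma, rng):
    """Draw two predictors and y = 1 + 2*x1 + 3*x2 + e, e ~ N(0, sigma^2).
    The coefficients and predictor distribution are illustrative assumptions."""
    X = rng.normal(50.0, 10.0, size=(n, 2))
    y = 1.0 + X @ np.array([2.0, 3.0]) + rng.normal(0.0, sigma, size=n)
    return X, y

def mar_mask(X, pct, rng):
    """Flag about pct% of y as missing. The selection probability depends
    only on the observed x1, which makes the mechanism missing at random."""
    n = X.shape[0]
    k = max(1, round(n * pct / 100))
    z = (X[:, 0] - X[:, 0].mean()) / X[:, 0].std()
    p = 1.0 / (1.0 + np.exp(-z))      # larger x1 -> more likely missing
    p /= p.sum()
    idx = rng.choice(n, size=k, replace=False, p=p)
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True
    return mask

def regression_impute(X, y, miss, rng=None, stochastic=False):
    """RI: fit OLS on complete cases and fill missing y with fitted values.
    SRI (stochastic=True): add N(0, s^2) noise, s = residual standard error."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A[~miss], y[~miss], rcond=None)
    pred = A[miss] @ beta
    if stochastic:
        resid = y[~miss] - A[~miss] @ beta
        dof = (~miss).sum() - A.shape[1]
        s = np.sqrt(resid @ resid / dof)
        pred = pred + rng.normal(0.0, s, size=pred.shape)
    out = y.copy()
    out[miss] = pred
    return out

def amse(n, sigma, pct, reps=200, seed=0, stochastic=False):
    """Average, over replications, of the MSE between imputed and true y
    on the missing cases (an assumed definition of AMSE)."""
    rng = np.random.default_rng(seed)
    mses = []
    for _ in range(reps):
        X, y = simulate(n, sigma, rng)
        miss = mar_mask(X, pct, rng)
        y_imp = regression_impute(X, y, miss, rng, stochastic)
        mses.append(np.mean((y_imp[miss] - y[miss]) ** 2))
    return float(np.mean(mses))

if __name__ == "__main__":
    for name, flag in [("RI", False), ("SRI", True)]:
        print(name, amse(n=50, sigma=5, pct=10, stochastic=flag))
```

Looping `amse` over σ in (5, 10, 15), n in (30, 50, 100, 200) and missing percentages in (5, 10, 15, 20) would reproduce the shape of the design grid described above, though not the study's actual numbers.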
Article Details
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.