Stratum Boundaries Construction Using K-means Clustering Algorithm in Stratified Random Sampling

Witwisit Kesornsit
Prechaya Hasalem
Jirawan Jitthavech

Abstract

This research presents a technique for constructing stratum boundaries with the K-means clustering algorithm for estimating the population mean in stratified random sampling. Samples were selected by simple random sampling without replacement and allocated in proportion to the population size of each stratum (proportional allocation). The population was divided into 4, 5 and 6 strata, with maximum correlation coefficients between the auxiliary variable and the variable of interest of 0.50, 0.70 and 0.90, and sample sizes of 50, 100, 150, 200 and 300. The estimator was evaluated by its efficiency relative to the estimator obtained under the Dalenius & Hodges strata construction method. In all simulation cases, the estimator based on strata constructed by the K-means clustering algorithm was more efficient.
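The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the gamma-distributed auxiliary variable, the linear model for the variable of interest, the 50 histogram classes used for the cumulative square root of frequency rule, the use of midpoints between adjacent K-means centers as stratum boundaries, and all function names are assumptions made for illustration. The sketch also uses the closed-form variance of the stratified mean under proportional allocation instead of repeated sampling.

```python
import math
import random
import statistics


def kmeans_1d(x, k, iters=100):
    """Lloyd's algorithm on one variable; returns k sorted cluster centers."""
    xs = sorted(x)
    # Assumption: start from evenly spaced quantiles so the run is deterministic.
    centers = [xs[(2 * j + 1) * len(xs) // (2 * k)] for j in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in xs:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        new = [statistics.fmean(g) if g else c for g, c in zip(groups, centers)]
        if new == centers:
            break
        centers = new
    return sorted(centers)


def kmeans_boundaries(x, k):
    """Stratum boundaries taken as midpoints between adjacent cluster centers."""
    c = kmeans_1d(x, k)
    return [(a + b) / 2 for a, b in zip(c, c[1:])]


def dalenius_hodges_boundaries(x, k, bins=50):
    """Cumulative square root of frequency rule on an equal-width histogram."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins
    freq = [0] * bins
    for v in x:
        freq[min(int((v - lo) / width), bins - 1)] += 1
    cum, total = [], 0.0
    for f in freq:
        total += math.sqrt(f)
        cum.append(total)
    bounds = []
    for h in range(1, k):
        target = h * total / k
        idx = next(i for i, c in enumerate(cum) if c >= target)
        bounds.append(lo + (idx + 1) * width)  # upper edge of the class reached
    return bounds


def stratified_variance(x, y, bounds, n):
    """Variance of the stratified mean of y: proportional allocation, SRSWOR."""
    N = len(y)
    strata = [[] for _ in range(len(bounds) + 1)]
    for xv, yv in zip(x, y):
        h = sum(xv > b for b in bounds)  # index of the stratum containing xv
        strata[h].append(yv)
    within = sum(len(s) / N * statistics.variance(s) for s in strata if len(s) > 1)
    return (1 - n / N) / n * within


# Hypothetical population: skewed auxiliary variable x, correlation about rho with y.
random.seed(1)
rho, N, n, k = 0.9, 2000, 100, 4
x = [random.gammavariate(2.0, 10.0) for _ in range(N)]
noise_sd = statistics.stdev(x) * math.sqrt(1 / rho**2 - 1)
y = [xv + random.gauss(0.0, noise_sd) for xv in x]

v_km = stratified_variance(x, y, kmeans_boundaries(x, k), n)
v_dh = stratified_variance(x, y, dalenius_hodges_boundaries(x, k), n)
print("relative efficiency (DH variance / K-means variance):", v_dh / v_km)
```

A ratio above 1 would favor the K-means boundaries for this particular synthetic population. The value depends on the assumed distribution, the number of histogram classes and the seed, so it should not be read as a reproduction of the reported simulation results.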

Article Details

Section
Applied Science Research Articles

References

[1] P. Suwatthee, Sample Surveys: Sampling Designs and Analysis. Bangkok: National Institute of Development Administration, 2009.

[2] K. Silpakob and W. Chaimongkol, “Estimation of population mean with missing data in stratified random sampling,” Burapha Science Journal, vol. 22, no. 2, pp. 202–217, 2017.

[3] M. E. Thompson, Theory of Sample Surveys. London: Chapman & Hall, 1997.

[4] M. H. Hansen, W. N. Hurwitz, and W. G. Madow, Sample Survey Methods and Theory. Canada: John Wiley & Sons, Inc., 1960.

[5] P. Suwatthee, Theory of Sampling Designs. Bangkok: National Institute of Development Administration, 2011.

[6] T. Dalenius and J. L. Hodges, “Minimum variance stratification,” Journal of the American Statistical Association, vol. 54, no. 285, pp. 88–101, 1959.

[7] Y. Olufadi, “Dual to ratio-cum-product estimator in simple and stratified random sampling,” Pakistan Journal of Statistics and Operation Research, vol. 9, no. 3, pp. 305–319, 2013.

[8] A. K. Gupta and D. G. Kabe, Theory of Sample Surveys. Singapore: World Scientific Publishing, 2011.

[9] T. Dalenius, “A First Course in Survey Sampling,” in P. R. Krishnaiah and C. R. Rao (Eds.), Handbook of Statistics Volume 6: Sampling, North-Holland: Elsevier Science B.V., 1988, pp. 15–46.

[10] S. Wichaidit, “DNA microarray data analysis model using clustering algorithm for disease diagnosis,” M.S. thesis, Department of Computer Science, Faculty of Science, Prince of Songkla University, 2008.

[11] W. Chongnguluam, “Parallelize rough k-medoids clustering on multicore processor,” M.S. thesis, School of Computer Engineering, Faculty of Engineering, Suranaree University of Technology, 2012.

[12] W. Pimpaporn and P. Meesad, “A comparative efficiency of clustering using dynamic feature selection optimization of subspace clustering algorithms,” Information Technology Journal, vol. 10, no. 2, pp. 43–51, 2014.

[13] L. Kaufman and P. J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis. New York: John Wiley, 1990.

[14] G. J. Myatt and W. P. Johnson, Making Sense of Data II: A Practical Guide to Data Visualization, Advanced Data Mining Methods, and Applications. Canada: John Wiley & Sons, Inc., 2009.

[15] W. G. Cochran, Sampling Techniques, 3rd ed. New York: Wiley, 1977.

[16] X. Liang, S. Li, S. Zhang, H. Huang, and S. X. Chen, “PM2.5 data reliability, consistency, and air quality assessment in five Chinese cities,” Journal of Geophysical Research: Atmospheres, vol. 121, no. 17, 2016.