Outlier Detection in Wellness Data using Probabilistic Mapped Mean-Shift Algorithms

Siriwan Phongsasiri; Suwanna Rasmequan

doi:10.37936/ecti-cit.2021152.244971

Published: Aug 12, 2021

Keywords:

Outlier detection, k-NN, Truncated Gaussian Distribution, Probabilistic Mapped, Mean shift

Siriwan Phongsasiri

Faculty of Informatics, Burapha University, Thailand

Suwanna Rasmequan

Faculty of Informatics, Burapha University, Thailand

Abstract

This paper proposes the Probabilistic Mapped Mean-Shift Algorithm for detecting anomalous data in public datasets and in the database of a local hospital's children’s wellness clinic. The proposed framework consists of two main parts. First, the Probabilistic Mapping step comprises k-NN instance acquisition, data distribution calculation, and data point repositioning; a Truncated Gaussian Distribution (TGD) is used to control the boundary of the mapped points. Second, the Outlier Detection step comprises outlier score calculation and outlier selection. Experimental results show that the proposed algorithm outperformed existing algorithms on real-world benchmark datasets and on a Children’s Wellness Clinic dataset (CWD). The outlier detection accuracy of the proposed algorithm on the Wellness, Stamps, Arrhythmia, Pima, and Parkinson datasets was 93%, 94%, 80%, 75%, and 72%, respectively.
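
The two steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so every detail here is an assumption made for illustration. In particular, the per-dimension Gaussian fitted to the k nearest neighbours, the truncation bounds (the neighbours' minimum and maximum), the rejection sampler, the score (distance between the original and the repositioned point), and the names knn, truncated_gauss, map_points, and outlier_scores are all hypothetical.

```python
import math
import random


def knn(points, i, k):
    """Indices of the k nearest neighbours of points[i] (Euclidean, excluding i)."""
    others = (j for j in range(len(points)) if j != i)
    return sorted(others, key=lambda j: math.dist(points[i], points[j]))[:k]


def truncated_gauss(mu, sigma, lo, hi, rng):
    """Draw from N(mu, sigma^2) restricted to [lo, hi] by rejection sampling."""
    if sigma == 0.0:
        return mu  # degenerate neighbourhood: all neighbours share this value
    while True:
        x = rng.gauss(mu, sigma)
        if lo <= x <= hi:
            return x


def map_points(points, k, rng):
    """Assumed mapping step: reposition each point inside its k-NN neighbourhood."""
    mapped = []
    for i in range(len(points)):
        nbrs = [points[j] for j in knn(points, i, k)]
        new_point = []
        for dim in range(len(points[i])):
            vals = [p[dim] for p in nbrs]
            mu = sum(vals) / len(vals)
            sigma = math.sqrt(sum((v - mu) ** 2 for v in vals) / len(vals))
            # Assumption: truncate at the neighbours' own range in this dimension.
            new_point.append(truncated_gauss(mu, sigma, min(vals), max(vals), rng))
        mapped.append(tuple(new_point))
    return mapped


def outlier_scores(points, k=3, iterations=3, seed=0):
    """Assumed score: distance between each original point and its mapped position."""
    rng = random.Random(seed)
    mapped = [tuple(p) for p in points]
    for _ in range(iterations):
        mapped = map_points(mapped, k, rng)
    return [math.dist(p, q) for p, q in zip(points, mapped)]


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (5.0, 5.0)]
    scores = outlier_scores(data, k=3, iterations=2, seed=1)
    # The isolated point (5.0, 5.0) is pulled into the cluster, so it scores highest.
    print(max(range(len(data)), key=scores.__getitem__))
```

Under these assumptions, points inside a dense region move very little, while an isolated point is drawn towards its neighbours and receives a large score; outlier selection would then amount to thresholding or ranking the scores.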

How to Cite

[1]

S. Phongsasiri and S. Rasmequan, “Outlier Detection in Wellness Data using Probabilistic Mapped Mean-Shift Algorithms”, ECTI-CIT Transactions, vol. 15, no. 2, pp. 258–266, Aug. 2021.

Issue

Vol. 15 No. 2 (2021): ECTI Transactions on CIT (Aug 2021)

Section

Research Article

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
