This paper proposes the Probabilistic Mapped Mean-Shift Algorithm for detecting anomalous data in public datasets and in a local hospital's children’s wellness clinic database. The proposed framework consists of two main parts. First, the Probabilistic Mapping step comprises k-NN instance acquisition, data distribution calculation, and data point repositioning; a Truncated Gaussian Distribution (TGD) is used to bound the mapped points. Second, the Outlier Detection step comprises outlier score calculation and outlier selection. Experimental results show that the proposed algorithm outperformed existing algorithms on real-world benchmark datasets and on a Children’s Wellness Clinic dataset (CWD). The outlier detection accuracy of the proposed algorithm on the Wellness, Stamps, Arrhythmia, Pima, and Parkinson datasets was 93%, 94%, 80%, 75%, and 72%, respectively.
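The two-part structure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the truncation bounds, the score definition, or the selection rule, so everything below is an assumption made for illustration. Specifically, the function names (`knn_indices`, `truncated_normal`, `outlier_scores`), the choice to bound each mapped point by the per-feature minimum and maximum of its k nearest neighbours, the rejection-sampling approach to the truncated Gaussian, and the use of the Euclidean distance between the original and mapped point as the outlier score are all hypothetical.

```python
import numpy as np

def knn_indices(X, k):
    """Indices of the k nearest neighbours of every row (self excluded)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour
    return np.argsort(d2, axis=1)[:, :k]

def truncated_normal(rng, mean, std, lo, hi, max_tries=50):
    """Draw from N(mean, std) restricted to [lo, hi] by rejection sampling.

    Values still outside the bounds after max_tries are clipped (assumption).
    """
    sample = rng.normal(mean, std)
    for _ in range(max_tries):
        bad = (sample < lo) | (sample > hi)
        if not bad.any():
            break
        sample[bad] = rng.normal(mean[bad], std[bad])
    return np.clip(sample, lo, hi)

def outlier_scores(X, k=5, seed=0):
    """Probabilistic mapping followed by a distance-based outlier score."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    nbrs = X[knn_indices(X, k)]            # k-NN acquisition, shape (n, k, d)
    mean = nbrs.mean(axis=1)               # data distribution calculation
    std = nbrs.std(axis=1)
    lo, hi = nbrs.min(axis=1), nbrs.max(axis=1)
    mapped = truncated_normal(rng, mean, std, lo, hi)  # point repositioning
    return np.linalg.norm(X - mapped, axis=1)          # assumed outlier score

# Usage: one isolated point appended to a tight synthetic cluster.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 0.5, size=(30, 2)), [[10.0, 10.0]]])
scores = outlier_scores(data, k=5)
top = int(np.argmax(scores))  # assumed selection rule: highest score
```

In this sketch, a point far from its neighbours is pulled a long way toward them, so its score is large, while points inside a dense region move only a short distance. Selecting the highest-scoring points is one possible selection rule; a threshold on the scores would be another.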
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.