Sequential Clustering and Condensing the Meaning of Texts into Centroid Terms

  • Maytiyanin Komkhao Faculty of Science and Technology, Rajamangala University of Technology Phra Nakhon
  • Mario Kubek Faculty of Mathematics and Computer Science, FernUniversität in Hagen, Hagen, Germany
  • Wolfgang A. Halang Sino-German Technical Faculty, Qingdao University of Science and Technology, Qingdao, China
Keywords: Clustering, Number of Clusters, Distance Measures, Sequential Clustering, Single-Linkage, Reclustering, Outlier Removal, Text Analysis, Centroid Term, Centroid Distance Measure

Abstract

When run, most traditional clustering algorithms require the number of clusters sought to be specified beforehand and all items to be clustered to be present. These two shortcomings, which are very serious for practical applications, are overcome by a straightforward sequential clustering algorithm. Its most crucial constituent is a distance measure, whose suitable choice is discussed. It is shown how sequentially obtained cluster sets can be improved by reclustering, and how items considered as outliers can be removed. As a case study, the feasibility of applying the method and a centroid-based distance measure to find and group semantically similar documents in text analysis is investigated.
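
To illustrate the general idea of sequential clustering, the following minimal Python sketch processes items one at a time and never needs the number of clusters in advance. It is not the algorithm from the paper, whose details the abstract does not give. The function name sequential_cluster, the fixed threshold parameter, the Euclidean distance, and the single-linkage assignment rule (distance to the nearest cluster member) are all assumptions made for illustration only.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def sequential_cluster(items, threshold):
    """Assign items to clusters one by one (illustrative sketch only).

    Assumption: an item joins the cluster whose nearest member is closest
    (single linkage) if that distance is at most `threshold`; otherwise
    the item starts a new cluster.
    """
    clusters = []
    for item in items:
        best, best_d = None, None
        for cluster in clusters:
            # Single-linkage distance: nearest member of the cluster.
            d = min(dist(item, member) for member in cluster)
            if best_d is None or d < best_d:
                best, best_d = cluster, d
        if best is not None and best_d <= threshold:
            best.append(item)
        else:
            clusters.append([item])
    return clusters


# Hypothetical sample data: two well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (0.0, 0.2), (5.1, 5.0)]
clusters = sequential_cluster(points, threshold=1.0)
```

In this sketch the threshold alone determines how many clusters emerge, and the outcome can depend on the order in which items arrive, which is one reason a later reclustering step may improve the result.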

Published
2018-06-30