Cluster Analysis to Find Sets of High-frequency Queries for Filtering in Similarity Join

Kamolwan Kunanusont; Jaruloj Chongstitvatana

doi:10.37936/ecti-cit.2016101.58158

Keywords:

Similarity Join; Similarity Search; High-frequency Queries; Cluster Analysis

Kamolwan Kunanusont

Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University, Thailand

Jaruloj Chongstitvatana

Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University, Thailand

Abstract

Similarity search and similarity join are important operations in text databases. In some situations, similar queries, called high-frequency queries, are repeated over a period of time. A high-frequency-queries-based filter is used to facilitate this type of query. However, the performance of this method depends mostly on the chosen high-frequency queries. This paper proposes two methods, called DBRAN and DBSM, which are based on DBSCAN and an agglomerative hierarchical clustering algorithm, to find high-frequency queries for the filter. For evaluation, both DBRAN and DBSM are applied to various sets of queries to find high-frequency queries for three datasets. It is found that DBSM performs better than DBRAN when the variation among high-frequency queries is high. However, when the variation among high-frequency queries is low, the performance of DBRAN and DBSM is about the same.

How to Cite

[1]

K. Kunanusont and J. Chongstitvatana, “Cluster Analysis to Find Sets of High-frequency Queries for Filtering in Similarity Join”, ECTI-CIT Transactions, vol. 10, no. 1, pp. 53–61, Apr. 2016.

Issue

Vol. 10 No. 1 (2016): ECTI Transaction on CIT (May 2016)

Section

Artificial Intelligence and Machine Learning (AI)
