ขั้นตอนวิธีการเปรียบคู่สายอักขระแบบประมาณโดยอาศัยรายการผกผันแบบเดี่ยว

Soontaree Thumsuwan; Nuanprang Sangurai; Chouvalit Khancome

doi:10.14456/jait.2025.24

PDF

Published: Aug 6, 2025

DOI: https://doi.org/10.14456/jait.2025.24

Keywords:

String matching algorithm Multiple character inverted lists Inverted index Pattern maching Exact string matching

Soontaree Thumsuwan

Major in Information Technology Department, Faculty of Science and Technology, Rajabhat Rajanagarindra University, Chachoengsao 24000, Thailand

Nuanprang Sangurai

Major in Information Technology Department, Faculty of Science and Technology, Rajabhat Rajanagarindra University, Chachoengsao 24000, Thailand

Chouvalit Khancome

Computer Science Department, Faculty of Science, Ramkhamhaeng University, Bangkok 10240, Thailand

Abstract

Approximate string matching is a fundamental technique in data retrieval that allows for typo errors or misspellings. It is widely applied in databases, search engines, and various applications or online services. To enhance the speed and accuracy of data retrieval, the development of new algorithms remains a significant challenge in computer science research. This paper introduces a novel data structure for approximate search, called the Single Inverted List, which supports a configurable level of error tolerance. Based on this structure, a new approximate string matching algorithm is developed. Theoretical analysis shows that the proposed structure can be constructed with time complexity proportional to the length of the pattern string and requires storage space equal to the sum of the pattern length and the number of distinct characters. The proposed algorithm achieves search performance with time complexity proportional to the product of the text length and the pattern length, while also supporting error-tolerant matching. Experimental results demonstrate that the proposed structure consumes the least memory compared to well-known existing algorithms, and the developed algorithm performs approximate searches efficiently, nearly as fast as the fastest existing methods, while maintaining linear-time performance.

How to Cite

Thumsuwan, S., Sangurai, N., & Khancome, C. (2025). Approximate String Matching Algorithm using Single Inverted Lists. Journal of Applied Informatics and Technology, 7(2), 389–405. https://doi.org/10.14456/jait.2025.24

Issue

Vol. 7 No. 2 (2025): July - December

Section

Research Article

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

All authors need to complete copyright transfer to Journal of Applied Informatics and Technology prior to publication. For more details click this link: https://ph01.tci-thaijo.org/index.php/jait/copyrightlicense

References

Abraham, D., & Raj, N. S. (2014). Approximate string matching algorithm for phishing detection. 2014 International Conference on Advances in Computing, Communications and Informatics (ICACCI), 2285–2290. https://doi.org/10.1109/icacci.2014.6968578

Boguszewski, A., Szymański, J., & Draszawka, K. (2016). Towards increasing F-measure of approximate string matching in O(1) complexity. Proceedings of the 2016 Federated Conference on Computer Science and Information Systems (FedCSIS), 8, 527–532. https://doi.org/10.15439/2016f311

Dondi, R., Mauri, G., & Zoppis, I. (2022). On the complexity of approximately matching a string to a directed graph. Information and Computation, 288, 104748. https://doi.org/10.1016/j.ic.2021.104748

Faro, S., & Scafiti, S. (2022). A weak approach to suffix automata simulation for exact and approximate string matching. Theoretical Computer Science, 933, 88–103. https://doi.org/10.1016/j.tcs.2022.08.028

Gusfield, D. (1997). Algorithms on strings, trees, and sequences: Computer science and computational biology. Cambridge University Press.

Karp, R. M., & Rabin, M. O. (1987). Efficient randomized pattern-matching algorithms. IBM Journal of Research and Development, 31(2), 249–260. https://doi.org/10.1147/rd.312.0249

Khan, M. G., Halim, Z., & Baig, A. R. (2023). An efficient approach for faster matching of approximate patterns in graphs. Knowledge-Based Systems, 276, 110770. https://doi.org/10.1016/j.knosys.2023.110770

Khancome, C., & Boonjing, V. (2010). Inverted Lists String Matching Algorithms. International Journal of Computer Theory and Engineering, 352–357. https://doi.org/10.7763/ijcte.2010.v2.166

Levenshtein, V. (1965). Binary codes capable of correcting spurious insertions and deletions of ones. Problems of Information Transmission, 1(1), 8-17.

Levenshtein, V. I. (1966) Binary codes of correcting deletions, insertions, and reversals. Soviet Physics-Doklady, 10(8), 707-710.

Navarro, G., & Raffinot, M. (2002). Flexible pattern matching in strings: Practical on-line search algorithms for texts and biological sequences. Cambridge University Press.

Needleman, S. B., & Wunsch, C. D. (1970). A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 48(3), 443–453. https://doi.org/10.1016/0022-2836(70)90057-4

Smith, T. F., & Waterman, M. S. (1981). Identification of common molecular subsequences. Journal of Molecular Biology, 147(1), 195–197. https://doi.org/10.1016/0022-2836(81)90087-5

Uhlig, F., Struppek, L., Hintersdorf, D., Göbel, T., Baier, H., & Kersting, K. (2023). Combining AI and AM – Improving approximate matching through transformer networks. Forensic Science International: Digital Investigation, 45, 301570

Article Sidebar

Main Article Content

Abstract

Article Details

References