Complete Soundex for Thai Words Similarity Analysis

เฉลิมพล ทัพซ้าย
พยุง มีสัจ
ชูชาติ หฤไชยะศักดิ์

Abstract

This research designs and develops Complete Soundex (CS), a new algorithm that encodes all phonetic components of a word and compares words by similarity rather than by the exact matching used in Traditional Soundex (TS). CS encoding is performed by a converter built from newly redesigned rules, which are divided into basic rules and additional rules. All rules and code-value tables are defined according to the phonetic features of each word component. For testing and evaluation, six types of misspelled words and names (excess letters, missing letters, repeated letters, typing errors, misplaced letters, and alternative spellings) were entered into the CS system to search the dictionary for similar words. Compared with four TS systems, CS was the most effective: it retrieved similar words more precisely and reported similarity values, which the TS systems do not provide. CS also solved most of the problems caused by the complexity of consonant clusters and vowels, which the TS systems leave unresolved.
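The difference between matching comparison and similarity comparison can be illustrated with a small sketch. It is not the CS algorithm: the abstract does not give the Thai rules or code tables, so the sketch assumes the classic English Soundex as a stand-in for a TS encoder and Python's `difflib.SequenceMatcher` ratio as a stand-in similarity measure. The function names, the sample dictionary, and the 0.5 threshold are hypothetical.

```python
from difflib import SequenceMatcher

# Classic English Soundex digit table (assumed stand-in, not the paper's Thai tables).
_CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        _CODES[ch] = digit


def soundex(word):
    """Return the 4-character classic Soundex code of an ASCII word."""
    word = "".join(ch for ch in word.upper() if ch.isalpha())
    if not word:
        return ""
    digits = []
    prev = _CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # vowels reset prev, so a repeated code after a vowel is kept
    return (word[0] + "".join(digits) + "000")[:4]


def exact_matches(query, dictionary):
    """Matching comparison: keep only words whose code equals the query's code."""
    key = soundex(query)
    return [w for w in dictionary if soundex(w) == key]


def ranked_matches(query, dictionary, threshold=0.5):
    """Similarity comparison: score every word and return (word, score) pairs."""
    key = soundex(query)
    scored = [(w, SequenceMatcher(None, key, soundex(w)).ratio())
              for w in dictionary]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    words = ["Robert", "Rupert", "Ashcraft"]  # hypothetical dictionary
    print(exact_matches("Robet", words))      # misspelling with a missing letter
    print(ranked_matches("Robet", words))
```

With a missing-letter misspelling such as "Robet", the code changes, so exact matching returns nothing, while the similarity-based lookup still returns candidates together with a score. That is the kind of behavior the abstract attributes to CS, though the actual CS encoding and similarity measure may differ substantially.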

Article Details

Section
Research Article