Measurement of Word Similarity for Diabetes Question Answering System

Tanapol Chamnanhan
Ketsara Phetkrachang
Santi Sathiwantana
Pongsagorn Pongsagorn
Chaisit Choosong

Abstract

Diabetes is a chronic disease that cannot be cured and is a major public health problem in Thailand. The Department of Disease Control predicts that by 2025 more than 7.41 million people in Thailand will have diabetes. Continuous self-care is one way for people with diabetes to reduce the incidence of complications in already compromised body systems, complications that affect patients' lives. This research therefore presents a measurement of word similarity in a Thai question-answering system for diabetes, using the Cosine, Dice, and Jaccard methods and comparing their effectiveness in finding answers, for the benefit of people who want to learn about the initial symptoms of diabetes and about self-care for people with diabetes. Preliminary results comparing the answer-finding performance of the three question-answer similarity measures show that Cosine was the most effective, with a precision of 92.50%, followed by Jaccard and Dice, with precision values of 80.28% and 52.50%, respectively.
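The three measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the questions have already been segmented into word tokens (Thai text requires a separate word-segmentation step, not shown here), uses raw term counts for Cosine and token sets for Dice and Jaccard, and introduces a hypothetical helper, `best_match`, that returns the stored question most similar to a query. The English tokens are placeholders for illustration only.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity of two token lists, using raw term counts as vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def dice_similarity(a, b):
    """Dice coefficient on token sets: 2|A ∩ B| / (|A| + |B|)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def jaccard_similarity(a, b):
    """Jaccard coefficient on token sets: |A ∩ B| / |A ∪ B|."""
    sa, sb = set(a), set(b)
    union = sa | sb
    if not union:
        return 0.0
    return len(sa & sb) / len(union)


def best_match(query_tokens, stored_questions, measure):
    """Hypothetical helper: (index, score) of the most similar stored question."""
    scores = [measure(query_tokens, q) for q in stored_questions]
    idx = max(range(len(scores)), key=scores.__getitem__)
    return idx, scores[idx]


# Placeholder data: pre-tokenized stored questions and one user query.
stored = [
    ["diabetes", "self", "care"],
    ["early", "symptoms", "of", "diabetes"],
]
query = ["diabetes", "early", "symptoms"]

for name, measure in [
    ("cosine", cosine_similarity),
    ("dice", dice_similarity),
    ("jaccard", jaccard_similarity),
]:
    idx, score = best_match(query, stored, measure)
    print(f"{name}: best stored question = {idx}, score = {score:.4f}")
```

In a question-answering setting of this kind, the answer attached to the best-matching stored question would be returned to the user. Precision could then be estimated as the share of returned answers judged correct, although the exact evaluation protocol used in the study is not stated in the abstract.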

Article Details

How to Cite
Chamnanhan, T., Phetkrachang, K., Sathiwantana, S., Pongsagorn, P., & Choosong, C. (2023). Measurement of Word Similarity for Diabetes Question Answering System. Journal of Applied Informatics and Technology, 5(2), 86–99. https://doi.org/10.14456/jait.2023.7
Section
Research Article
