A Comparison of BERTopic and LDA for Classifying Depression Topics in Reddit Posts

Thanesorn Khiewboriboon; Sorawit Taochoo; Arisara Yokyorkhun; Nantapong Keandoungchun

doi:10.14416/j.kmutnb.2026.04.001


Published: Apr 2, 2026


Keywords:

Natural Language Processing (NLP), Topic Modeling, Depression, Mental Health

Thanesorn Khiewboriboon

Department of Information Technology, Faculty of Information Technology, King Mongkut's University of Technology Thonburi, Bangkok, Thailand

Sorawit Taochoo

Department of Information Technology, Faculty of Information Technology, King Mongkut's University of Technology Thonburi, Bangkok, Thailand

Arisara Yokyorkhun

Department of Information Technology, Faculty of Information Technology, King Mongkut's University of Technology Thonburi, Bangkok, Thailand

Nantapong Keandoungchun

Department of Information Technology, Faculty of Information Technology, King Mongkut's University of Technology Thonburi, Bangkok, Thailand

Abstract

Depression and other mental health problems are increasingly critical issues that significantly affect quality of life, particularly among adolescents and working-age populations. Many individuals express their emotions and symptoms on social media platforms, which can serve as valuable sources of data for computational analysis. This study compares the performance of two topic modeling algorithms, Latent Dirichlet Allocation (LDA) and BERTopic, on a dataset of 6,397 depression-related posts collected from Reddit. The evaluation used three metrics: Purity Score, Entropy Score, and Rand Index (RI). BERTopic outperformed LDA on all three, achieving a higher Purity Score (39.06%), lower Entropy (1.93%), and higher RI (66.84%), compared with LDA's 34.38%, 2.11%, and 65.47%, respectively. These findings indicate that BERTopic produces more coherent and accurate topic clusters that align more closely with the ground truth. Nevertheless, the study is limited by its use of only 10% of the total dataset for testing, which may affect the comprehensiveness of the evaluation. Future studies should therefore increase the size of the test set and incorporate Thai-language contexts to broaden the scope of practical applications in mental health research.
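The abstract names the three evaluation metrics but does not define them. As a minimal sketch, assuming the standard clustering definitions (purity as the share of documents in their cluster's majority class, entropy as the size-weighted class entropy per cluster in bits, and the Rand Index as the share of document pairs on which two labelings agree), they could be computed as follows. The function names and the toy labels are hypothetical and are not taken from the paper; in particular, the paper reports entropy as a percentage, and its exact normalization is not stated, so the values below are not directly comparable with the reported figures.

```python
from collections import Counter
from itertools import combinations
from math import log2


def _group_by_cluster(labels_true, labels_pred):
    """Collect the ground-truth labels of the documents assigned to each cluster."""
    clusters = {}
    for true_label, cluster_id in zip(labels_true, labels_pred):
        clusters.setdefault(cluster_id, []).append(true_label)
    return clusters


def purity(labels_true, labels_pred):
    """Fraction of documents that belong to the majority class of their cluster."""
    clusters = _group_by_cluster(labels_true, labels_pred)
    majority_total = sum(
        Counter(members).most_common(1)[0][1] for members in clusters.values()
    )
    return majority_total / len(labels_true)


def cluster_entropy(labels_true, labels_pred):
    """Size-weighted average of per-cluster class entropy in bits (lower is better)."""
    clusters = _group_by_cluster(labels_true, labels_pred)
    n = len(labels_true)
    total = 0.0
    for members in clusters.values():
        size = len(members)
        h = -sum((c / size) * log2(c / size) for c in Counter(members).values())
        total += (size / n) * h
    return total


def rand_index(labels_true, labels_pred):
    """Share of document pairs that both labelings treat the same way
    (both put the pair together, or both keep it apart)."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agreements = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Toy example (hypothetical labels): six posts, two true classes, two clusters.
labels_true = ["a", "a", "a", "b", "b", "b"]
labels_pred = [0, 0, 1, 1, 1, 1]

print(f"purity     = {purity(labels_true, labels_pred):.4f}")
print(f"entropy    = {cluster_entropy(labels_true, labels_pred):.4f}")
print(f"rand index = {rand_index(labels_true, labels_pred):.4f}")
```

Higher purity and Rand Index and lower entropy all indicate clusters that agree more closely with the ground-truth labels, which is the direction of the BERTopic advantage reported above.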

Issue

Vol. 36 No. 2 (2026): April - June, 2026

Section

Information Technology Research Articles

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

The articles published are the opinion of the author only. The author is responsible for any legal consequences that may arise from the article.

