BERTopic vs. LDA: A Comparative Analysis for Identifying Depression Topics in Reddit Messages
Abstract
Depression and other mental health problems have become critical issues that significantly affect quality of life, particularly among adolescents and working-age populations. Many individuals express their emotions and symptoms on social media platforms, which makes these platforms a valuable source of data for computational analysis. This study compares the performance of two topic modeling algorithms, Latent Dirichlet Allocation (LDA) and BERTopic, on a dataset of 6,397 depression-related posts collected from Reddit. The evaluation used three metrics: Purity Score, Entropy Score, and Rand Index (RI). BERTopic outperformed LDA on all three, achieving a higher Purity Score (39.06%), lower Entropy (1.93%), and higher RI (66.84%) compared to LDA's 34.38%, 2.11%, and 65.47%, respectively. These findings indicate that BERTopic produces more coherent and accurate topic clusters that align more closely with the ground truth. The study is limited, however, by the use of only 10% of the total dataset for testing, which may affect the comprehensiveness of the evaluation. Future studies should therefore increase the size of the test set and incorporate Thai-language contexts to broaden the scope of practical applications in mental health research.
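For readers unfamiliar with the three metrics named in the abstract, the following is a minimal sketch of how purity, entropy, and the Rand Index are commonly computed from predicted topic assignments and ground-truth labels. It is not the authors' implementation: the abstract does not state the exact formulas used, so the base-2 logarithm, the size-weighted entropy, and the function names (`contingency`, `purity`, `entropy`, `rand_index`) are assumptions made for illustration, and the toy data is invented.

```python
from collections import Counter
from itertools import combinations
from math import log2


def contingency(clusters, labels):
    """Count how many items of each true label fall in each cluster."""
    table = {}
    for c, t in zip(clusters, labels):
        table.setdefault(c, Counter())[t] += 1
    return table


def purity(clusters, labels):
    """Fraction of items that carry the majority label of their cluster."""
    table = contingency(clusters, labels)
    return sum(max(counts.values()) for counts in table.values()) / len(labels)


def entropy(clusters, labels):
    """Size-weighted average of per-cluster label entropy (base 2, assumed)."""
    table = contingency(clusters, labels)
    n = len(labels)
    total = 0.0
    for counts in table.values():
        size = sum(counts.values())
        h = -sum((v / size) * log2(v / size) for v in counts.values())
        total += (size / n) * h
    return total


def rand_index(clusters, labels):
    """Share of item pairs on which the clustering and the labels agree."""
    pairs = list(combinations(range(len(labels)), 2))
    agree = sum(
        (clusters[i] == clusters[j]) == (labels[i] == labels[j])
        for i, j in pairs
    )
    return agree / len(pairs)


if __name__ == "__main__":
    # Invented toy data: two predicted topics, two ground-truth classes.
    predicted = [0, 0, 0, 1, 1, 1]
    truth = ["a", "a", "b", "b", "b", "a"]
    print(f"purity     = {purity(predicted, truth):.4f}")
    print(f"entropy    = {entropy(predicted, truth):.4f}")
    print(f"rand index = {rand_index(predicted, truth):.4f}")
```

Higher purity and Rand Index, and lower entropy, indicate clusters that agree more closely with the ground truth, which is the direction of the comparison reported above. The pairwise Rand Index loop is quadratic in the number of posts, which is acceptable for a test split of a few hundred items but would need a contingency-table formulation for larger sets.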
Article Details

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
The published articles reflect the opinions of their authors only. The authors are responsible for any legal consequences that may arise from their articles.