Leveraging PyThaiNLP for Sentiment Analysis of Thai Online Text: A Comparative Study of Logistic Regression and Support Vector Machine
Abstract
This study compares the performance of sentiment analysis models for Thai online text built with the existing PyThaiNLP library. Text was extracted from online sources to create a dataset and manually labeled as positive, neutral, or negative. Data preprocessing involved removing punctuation marks, tokenizing, removing non-Thai characters, and creating a Bag of Words representation. The data was then divided into training and testing sets to build models using three algorithms: logistic regression, logistic regression with stochastic gradient descent (SGD), and support vector machine (SVM). In the comparison, the logistic regression model performed best, achieving an accuracy of 80.73% with a 90:10 train-test split, the newmm word tokenization tool, and the augmented dictionary. The accuracy was 81.10% for positive sentiment, 80.16% for neutral sentiment, and 80.97% for negative sentiment.
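The preprocessing steps named in the abstract (punctuation removal, tokenization, non-Thai filtering, Bag of Words) can be illustrated with a minimal standard-library sketch. This is not the authors' code. The greedy longest-match tokenizer below is a simplified stand-in for dictionary-based segmentation, not the newmm algorithm itself; with PyThaiNLP installed, that step would instead call word_tokenize(text, engine="newmm"). The function names, the toy dictionary, and the sample sentence are all hypothetical.

```python
import re
import string
from collections import Counter

# Assumption: ASCII punctuation only; the study does not list which marks it removed.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)

# Tokens made up entirely of code points from the Thai Unicode block (U+0E00-U+0E7F).
THAI_ONLY = re.compile(r"[\u0E00-\u0E7F]+")


def longest_match_tokenize(text, dictionary):
    """Greedy longest-match segmentation (simplified stand-in, NOT newmm).

    At each position the longest dictionary word is taken; characters that
    start no dictionary word are emitted as single-character tokens.
    """
    max_len = max(map(len, dictionary))
    tokens, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])
            i += 1
    return tokens


def bag_of_words(text, dictionary):
    """Remove punctuation, tokenize, drop non-Thai tokens, count the rest."""
    no_punct = text.translate(PUNCT_TABLE)
    tokens = longest_match_tokenize(no_punct, dictionary)
    thai_tokens = [t for t in tokens if THAI_ONLY.fullmatch(t)]
    return Counter(thai_tokens)


# Hypothetical toy dictionary and review text, for illustration only.
TOY_DICT = {"ร้าน", "นี้", "อาหาร", "อร่อย", "มาก"}
sample_counts = bag_of_words("ร้านนี้อาหารอร่อยมาก!! 555 good", TOY_DICT)
print(sample_counts)
```

Each document's token counts would then be mapped onto a shared vocabulary to form the feature vectors on which a classifier such as logistic regression or SVM is trained. Adding entries to the dictionary changes where words are split, which is presumably why the augmented dictionary affects accuracy.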
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
All authors need to complete the copyright transfer to the Journal of Applied Informatics and Technology prior to publication. For more details, see https://ph01.tci-thaijo.org/index.php/jait/copyrightlicense