Sentiment Classification Based on Term Weighting with Class-mutual Information
Abstract
Online platforms and information technology are developing quickly, which has boosted the popularity of e-commerce. Customer sentiment expressed in posted content is now vital to product makers, who use it to improve product quality and to meet or exceed customers' expectations. Sentiment analysis is a natural language processing task that determines whether users' sentiment and attitudes towards a product are positive or negative. Most research on sentiment and text classification weights terms with inverse document frequency (idf). However, assigning term weights with idf alone may not be effective enough for sentiment classification, because idf does not take into account the class information that is vital to classification. This paper presents a supervised term weighting scheme that uses class mutual information, calculated together with term frequency and inverse document frequency. Experimental results show that the proposed method performs better than term distribution weighting and term weighting that uses only inverse document frequency, as measured by accuracy, precision, recall, and F1-measure.
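The abstract does not give the exact weighting formula, so the following is only a minimal sketch of how a class-aware weight of this kind could be computed. It assumes the three factors are multiplied, that mutual information is measured between a term's presence in a document and the class label, and that documents are already tokenized. The function names and the toy reviews are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def inverse_document_frequency(docs):
    """idf(t) = log(N / df(t)) over a list of tokenized documents."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def class_mutual_information(docs, labels):
    """Mutual information (in nats) between term presence and class label."""
    n_docs = len(docs)
    class_counts = Counter(labels)
    doc_sets = [set(doc) for doc in docs]
    vocab = set().union(*doc_sets)
    mi = {}
    for term in vocab:
        # Joint counts over (is the term present?, class label).
        joint = Counter(
            (term in doc, label) for doc, label in zip(doc_sets, labels)
        )
        present_counts = Counter()
        for (present, _), count in joint.items():
            present_counts[present] += count
        total = 0.0
        # Only observed (non-zero) cells appear in `joint`, so log is defined.
        for (present, label), count in joint.items():
            p_joint = count / n_docs
            p_present = present_counts[present] / n_docs
            p_class = class_counts[label] / n_docs
            total += p_joint * math.log(p_joint / (p_present * p_class))
        mi[term] = total
    return mi


def weight_document(doc, idf, mi):
    """Assumed combination: weight(t, d) = tf(t, d) * idf(t) * MI(t)."""
    tf = Counter(doc)
    return {
        term: count * idf.get(term, 0.0) * mi.get(term, 0.0)
        for term, count in tf.items()
    }


# Hypothetical toy reviews with binary sentiment labels.
docs = [
    ["good", "phone"],
    ["good", "battery"],
    ["bad", "phone"],
    ["bad", "battery"],
]
labels = ["pos", "pos", "neg", "neg"]

idf = inverse_document_frequency(docs)
mi = class_mutual_information(docs, labels)
weights = weight_document(docs[0], idf, mi)
print(weights)
```

In this toy corpus every term appears in two of the four reviews, so idf alone gives all terms the same weight. The mutual information factor separates them: "good" occurs only in positive reviews and keeps a positive weight, while "phone" is spread evenly across both classes and is weighted to zero.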
Article Details
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
The editorial board reserves the right to prohibit republication of any content, views, or comments from articles in the Journal of Thai-Nichi Institute of Technology without prior written permission from the authorized author(s). Published work is the copyright of the Journal of Thai-Nichi Institute of Technology.