Concept-based one-class SVM classifier with supervised term weighting scheme for imbalanced sentiment classification

Khanista Namee
Jantima Polpinij

Abstract

Class imbalance is one of the key issues in sentiment classification. Many studies have proposed improvements for imbalanced sentiment classification, but it remains a major challenge. This paper proposes a method, called the “concept-based one-class SVM classifier”, that addresses imbalanced sentiment classification by combining three main techniques. First, we apply the Word2Vec and PageRank algorithms to extract “concepts” and their related terms (called “members”) embedded in texts. The resulting corpus of “concepts” is then used to prepare the dataset by replacing words with their “concepts”, which reduces both term ambiguity and the size of the word vectors. Second, supervised term weighting (STW) schemes are applied to determine the importance of a word in a document of a specific class, reflecting the class-distinguishing power of each term. Finally, the one-class support vector machine (SVM) algorithm is used to build the sentiment classifier. One-class SVM has proved useful for imbalanced data classification, especially when the minority class lacks structure and is predominantly composed of small disjuncts or outliers. By combining these techniques, the proposed method is designed to identify and distinguish the characteristics of each class, especially when the data are imbalanced. We validated the proposed method on a hotel review dataset and ran experiments with different imbalance ratios; the method returned satisfactory recall, precision, and F1 scores. We then selected the best model generated by our method and compared its results with those of a state-of-the-art method. Our method returned better results, improving the F1 score by 3.19%. Moreover, in terms of computational processing time, our proposed method is faster than the state-of-the-art method.
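
As a rough illustration of the first two steps, the sketch below replaces member terms with a concept label and then weights terms with tf.rf (relevance frequency), one common supervised term weighting scheme. The abstract does not say which STW schemes were used, so tf.rf, the tiny hand-written concept map, and the function names to_concepts, rf_weights, and tf_rf are illustrative assumptions rather than the paper's implementation. The Word2Vec/PageRank concept extraction and the one-class SVM training are not shown.

```python
import math
from collections import Counter

# Hypothetical concept map: member term -> concept label (illustrative only;
# in the paper, concepts are extracted with Word2Vec and PageRank).
CONCEPTS = {
    "filthy": "dirty", "stained": "dirty",
    "spotless": "clean", "tidy": "clean",
}

def to_concepts(tokens, concepts=CONCEPTS):
    """Replace each member term with its concept; keep other tokens as-is."""
    return [concepts.get(tok, tok) for tok in tokens]

def rf_weights(docs, labels, target):
    """Relevance frequency per term: log2(2 + a / max(1, c)).

    a = number of `target`-class documents containing the term,
    c = number of other-class documents containing the term.
    """
    in_target, in_other = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        counter = in_target if label == target else in_other
        counter.update(set(tokens))  # document frequency, not raw counts
    vocab = set(in_target) | set(in_other)
    return {t: math.log2(2 + in_target[t] / max(1, in_other[t])) for t in vocab}

def tf_rf(tokens, rf):
    """Weight one document: raw term frequency times relevance frequency."""
    tf = Counter(tokens)
    # Unseen terms get rf = 1.0, the value the formula gives when a = 0.
    return {t: tf[t] * rf.get(t, 1.0) for t in tf}

# Toy corpus with an imbalanced label distribution (2 negative, 1 positive).
docs = [
    to_concepts("room was filthy and stained".split()),
    to_concepts("filthy bathroom".split()),
    to_concepts("room was spotless".split()),
]
labels = ["neg", "neg", "pos"]

rf = rf_weights(docs, labels, target="neg")
vec = tf_rf(docs[0], rf)
```

In this toy corpus, "filthy" and "stained" collapse into the single feature "dirty", which occurs only in negative reviews and therefore receives a higher weight than terms shared by both classes. Vectors built this way could then be passed to a one-class SVM trained on a single class.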

Article Details

How to Cite
Namee, K., & Polpinij, J. (2021). Concept-based one-class SVM classifier with supervised term weighting scheme for imbalanced sentiment classification. Engineering and Applied Science Research, 48(5), 604–613. Retrieved from https://ph01.tci-thaijo.org/index.php/easr/article/view/243606
Section
ORIGINAL RESEARCH
