A Machine Learning Approach for Multi-Label Classification in Candidate Election Social Media Analysis

Herman Yuliansyah
Ricy Ardiansyah
Anton Yudhana

Abstract

Multi-label text classification of social media comments presents a significant challenge in natural language processing. Several previous studies have conducted sentiment analysis on candidate (presidential and gubernatorial) elections using machine learning approaches. However, a single opinion can carry more than one category or label simultaneously, such as sentiment, candidate, or specific issues. This study proposes a multi-label classification model to improve accuracy, addressing challenges such as complex language structure, non-standard word usage, and imbalanced data. The proposed model is compared with three popular classification algorithms for multi-label text classification: Naive Bayes (NB), Support Vector Machine (SVM), and K-Nearest Neighbours (KNN). The proposed model comprises a classification pipeline that includes data preprocessing, feature extraction using TF-IDF, and hyperparameter tuning with GridSearchCV to enhance algorithm performance and effectiveness. The evaluation uses multi-label metrics such as Precision, Recall, and F1-Score. The experimental results show that SVM with GridSearchCV provided the best performance in terms of precision and generalization on the gubernatorial election dataset. SVM + GridSearchCV yielded scores of 97.4% and 99.2% for candidate labels, and 99.2% and 99.0% for sentiment labels. While NB and KNN also improved, their gains were smaller than those of SVM. NB was the strongest in computational performance, whereas KNN performed poorly on high-dimensional data.
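The pipeline described in the abstract (TF-IDF features, a classifier, and a GridSearchCV search, scored with precision, recall, and F1) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the toy comments, the candidate names `alpha` and `beta`, the helper `fit_label_models`, the choice of `LinearSVC`, the grid over `C`, and the one-model-per-label-column design are all assumptions made for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy comments; the study's datasets are not reproduced here.
comments = [
    "candidate alpha has a great program",
    "alpha program is terrible",
    "love the alpha campaign, great ideas",
    "alpha campaign is a terrible mess",
    "candidate beta has a great vision",
    "beta vision is terrible",
    "love the beta debate, great answers",
    "beta debate was a terrible mess",
]

# Each comment carries two labels at once: a candidate and a sentiment.
labels = {
    "candidate": ["alpha"] * 4 + ["beta"] * 4,
    "sentiment": ["pos", "neg", "pos", "neg", "pos", "neg", "pos", "neg"],
}


def fit_label_models(texts, label_columns, cv=2):
    """Fit one tuned TF-IDF + linear SVM pipeline per label column (assumed design)."""
    models = {}
    for name, y in label_columns.items():
        pipeline = Pipeline([
            ("tfidf", TfidfVectorizer()),
            ("svm", LinearSVC()),
        ])
        # Assumed search space: only the SVM regularization strength C.
        search = GridSearchCV(
            pipeline,
            param_grid={"svm__C": [0.1, 1.0, 10.0]},
            cv=cv,
            scoring="f1_macro",
        )
        models[name] = search.fit(texts, y)
    return models


models = fit_label_models(comments, labels)

for name, model in models.items():
    # Scored on the training comments purely to show the metric call;
    # a real evaluation would use held-out data.
    predicted = model.predict(comments)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels[name], predicted, average="macro", zero_division=0
    )
    print(f"{name}: best C={model.best_params_['svm__C']}, "
          f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Swapping `LinearSVC()` for `MultinomialNB()` or `KNeighborsClassifier()`, with a matching parameter grid, would give a comparable sketch for the other two algorithms named in the abstract.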

How to Cite
H. Yuliansyah, R. Ardiansyah, and A. Yudhana, “A Machine Learning Approach for Multi-Label Classification in Candidate Election Social Media Analysis,” ECTI-CIT Transactions, vol. 20, no. 1, pp. 50–62, Jan. 2026.
Section
Research Article
