Thai Speech Emotion Recognition Using Artificial Neural Networks

Watchara Sothirit
Waranya Poonnawat
Nuttaporn Hencharoenlert

Abstract

This research aimed (1) to develop and evaluate an emotion recognition model for Thai speech using artificial neural networks, (2) to enable accurate classification of human emotions, and (3) to bridge communication gaps between computers and users. The dataset, obtained from AIResearch.in.th, consists of 27,854 Thai-language sentences categorized into five emotions: angry, sad, happy, frustrated, and neutral. Mel Frequency Cepstral Coefficients (MFCC) were employed for speech feature extraction. The data were pre-processed with augmentation techniques, including time stretching, pitch shifting, and noise injection. The pre-processed data were then used to train three artificial neural network models: a 1-dimensional Convolutional Neural Network (1D CNN), a Long Short-Term Memory (LSTM) network, and a hybrid model combining the 1D CNN and LSTM. The hybrid model achieved the highest accuracy (80.36%), followed by the 1D CNN model (77.52%) and the LSTM model (67.86%).
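
Of the augmentation techniques named in the abstract, noise injection is the simplest to illustrate. The following is a minimal sketch only, not the authors' implementation: the abstract does not state the noise type or level, so white Gaussian noise, the 20 dB signal-to-noise ratio, the 16 kHz sample rate, the synthetic test tone, and the function name `inject_noise` are all assumptions made for illustration.

```python
import numpy as np


def inject_noise(signal, snr_db, rng=None):
    """Return `signal` plus white Gaussian noise at the requested SNR in dB.

    Hypothetical helper: the noise model and SNR parameter are assumptions,
    not details taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=np.float64)
    # Average power of the clean signal.
    signal_power = np.mean(signal ** 2)
    # SNR_dB = 10 * log10(P_signal / P_noise), solved for P_noise.
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


# Stand-in for a speech clip: one second of a 220 Hz tone at 16 kHz (assumed).
sr = 16000
t = np.arange(sr) / sr
clean = 0.5 * np.sin(2.0 * np.pi * 220.0 * t)

noisy = inject_noise(clean, snr_db=20.0, rng=np.random.default_rng(0))

# Check the SNR that was actually achieved.
measured_snr_db = 10.0 * np.log10(
    np.mean(clean ** 2) / np.mean((noisy - clean) ** 2)
)
```

In a pipeline like the one described, an augmented copy such as `noisy` would be added to the training set alongside the original clip before MFCC extraction; the label of the clip is unchanged by the augmentation.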

Article Details

Section
Research Article

References

T. Wangvanichapan, Artificial intelligence can now read human voices "Data set and Emotional Classification Model from Thai Speech" the work of Professor Chulalongkorn University Available for free download today. Available Online at https://www.chula.ac.th/highlight/47227/, accessed on 26 June 2023.

P. Kulkasem, S. Rasameekhwan, B. Chandrakongkul, S. Rimcharoen, K. Chinsarn, P. Boonthong, and M. Chansuphap. Emotion recognition of affective speech based on hybrid classifiers, A Complete Research Report, Faculty of Information Sciences, Burapha University, 2015.

M. El Ayadi, M. S. Kamel, and F. Karray. "Survey on speech emotion recognition: Features, classification schemes, and databases." Pattern Recognition, Vol. 44, No. 3, pp. 572-587, 2011.

S. Kitthaweesinpoon and E. Rattagan. Speech Emotion Recognition of Thai Language. Master's thesis, Data Science Courses, Faculty of Applied Statistics, National Institute of Development Administration, 2021.

J. Salamon and J. P. Bello. "Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification." IEEE Signal Processing Letters, Vol. 24, No. 3, pp. 279-283, 2017.

R. Kawade, R. Konade, P. Majukar, and S. Patil. "Speech Emotion Recognition Using 1D CNN-LSTM Network on Indo-Aryan Database." International Conference on Intelligent Computing, Instrumentation and Control Technologies (ICICICT), Vol. 3, pp. 1288-1293, 2022.

E. Pacharawongsakda, Dividing the data to test the efficiency of the model. Available Online at https://www.linkedin.com/in/eakasit-pacharawongsakda-ph-d-475a8452/recent-activity/posts/, accessed on 26 June 2023.

J. Brownlee, Use Early Stopping to Halt the Training of Neural Networks At the Right Time, Machine Learning Mastery. Available Online at https://machinelearningmastery.com/how-to-stop-training-deep-neural-networks-at-the-right-time-using-early-stopping/, accessed on 26 June 2023.

P. Gatchalee, Confusion Matrix is an important tool for evaluating prediction results in machine learning. Available Online at https://medium.com/@pagongatchalee/, accessed on 17 October 2023.

S. Kanjanawattana, A. Jarat, and P. Praneetpholkrang. "Classification of Human Emotion from Speech Recognition Using Deep Learning." Science and Technology Journal Sisaket Rajabhat University, Vol. 2, No. 2, pp. 1-11, July-December, 2022.

A. Pratama and S. W. Sihwi. "Speech Emotion Recognition Model using Support Vector Machine Through MFCC Audio Feature." International Conference on Information Technology and Electrical Engineering (ICITEE), Vol. 14, pp. 303-307, 2022.

VISTEC-depa Thailand Artificial Intelligence Research Institute, Emotion classification dataset from Thai speech. Available Online at https://airesearch.in.th/releases/speech-emotion-dataset/, accessed on 20 January 2023.