Real Human Voice and Artificial Intelligence Synthetic Voice Recognition with Convolutional Neural Networks

Krittipoom Phalachun
Supasin Wonglapsuwan
Tanaphon Rumnum
Sajjaporn Waijanya
Nuttachot Promrit

Abstract

This research aims to distinguish real human voices from synthesized voices in order to prevent crimes committed through voice impersonation with deepfake voice technology. In reported cases, energy companies were defrauded of nearly £200,000 (USD 260,000) after criminals used deepfake voice technology to imitate a CEO's voice and authorize a payment. The dataset consists of voice recordings of 15 famous individuals, divided into training, validation, and testing sets at a ratio of 75:15:10. Mel-frequency cepstral coefficients (MFCCs) were extracted as voice features, and a convolutional neural network (CNN) was used to classify the voices. The model was evaluated using a confusion matrix and achieved an accuracy of 97%.
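The data split and the evaluation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the label names `real` and `synthetic`, the fixed shuffle seed, and the choice to split per clip rather than per speaker are all assumptions made for illustration. MFCC extraction and the CNN itself are omitted because the abstract does not give their parameters.

```python
import random


def split_75_15_10(items, seed=0):
    """Shuffle items and split them into train/validation/test at 75:15:10.

    Assumption: the split is done per item (e.g. per audio clip); the
    abstract does not say whether it was per clip or per speaker.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 75 // 100  # integer arithmetic avoids float rounding
    n_val = n * 15 // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder, about 10%
    return train, val, test


def confusion_matrix(y_true, y_pred, labels=("real", "synthetic")):
    """Return a matrix where rows are true labels and columns are predictions."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def accuracy(matrix):
    """Accuracy is the diagonal (correct predictions) over all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical clip identifiers, used only to show the split sizes.
    train, val, test = split_75_15_10(range(100))
    print(len(train), len(val), len(test))

    # Hypothetical labels, not results from the paper.
    y_true = ["real", "real", "synthetic", "synthetic"]
    y_pred = ["real", "synthetic", "synthetic", "synthetic"]
    print(accuracy(confusion_matrix(y_true, y_pred)))
```

If the same speaker's clips can land in both the training and testing sets, the measured accuracy may be optimistic, so a per-speaker split is a reasonable variation to consider.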

How to Cite
Phalachun, K., Wonglapsuwan, S., Rumnum, T., Waijanya, S., & Promrit, N. (2023). Real Human Voice and Artificial Intelligence Synthetic Voice Recognition with Convolutional Neural Networks. KKU Science Journal, 51(2), 170–179. Retrieved from https://ph01.tci-thaijo.org/index.php/KKUSciJ/article/view/252903
Section
Research Articles
