Real Human Voice and Artificial Intelligence Synthetic Voice Recognition with Convolutional Neural Networks

Krittipoom Phalachun; Supasin Wonglapsuwan; Tanaphon Rumnum; Sajjaporn Waijanya; Nuttachot Promrit

Published: Jul 25, 2023

DOI: https://doi.org/10.14456/kkuscij.2023.15

Keywords:

Voice Synthesis; Voice Classification; Convolutional Neural Network

Krittipoom Phalachun

Data Science Major, Faculty of Science, Silpakorn University

Supasin Wonglapsuwan

Data Science Major, Faculty of Science, Silpakorn University

Tanaphon Rumnum

Data Science Major, Faculty of Science, Silpakorn University

Sajjaporn Waijanya

Department of Computing, Faculty of Science, Silpakorn University

Nuttachot Promrit

Department of Computing, Faculty of Science, Silpakorn University

Abstract

This research aims to distinguish real human voices from synthesized voices in order to prevent crimes that rely on voice impersonation with deepfake voice technology. There have been cases where energy companies were scammed out of nearly £200,000 (USD 260,000) after criminals used deepfake voice technology to imitate the CEO's voice and approve a payment. For the dataset used in this research, the voices of 15 famous individuals were recorded and divided into three sets: a training set, a validation set, and a testing set, in a ratio of 75:15:10. Mel-frequency cepstral coefficients (MFCCs) were extracted as the voice features, and a convolutional neural network (CNN) was used to classify the voices. The performance of the model was evaluated using a confusion matrix, and the accuracy was found to be 97%.
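The split and the evaluation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names `split_dataset` and `accuracy_from_confusion`, the clip file names, the clip count of 200, the random seed, and the confusion-matrix counts are all hypothetical, and the abstract does not say whether the split was made per clip or per speaker (a per-clip split is assumed here). MFCC extraction and the CNN itself are omitted.

```python
import random


def split_dataset(items, ratios=(75, 15, 10), seed=0):
    """Shuffle items and split them into train/validation/test parts.

    The 75:15:10 default mirrors the ratio stated in the abstract;
    the shuffling strategy and seed are assumptions for illustration.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    total = sum(ratios)
    n = len(shuffled)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


def accuracy_from_confusion(matrix):
    """Accuracy = correct predictions (diagonal) / all predictions.

    Rows are true classes and columns are predicted classes.
    """
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical clip names; the real dataset is not reproduced here.
clips = [f"clip_{i:03d}.wav" for i in range(200)]
train, val, test = split_dataset(clips)
print(len(train), len(val), len(test))  # 150 30 20

# Made-up counts for a two-class problem (real vs. synthetic),
# NOT the confusion matrix reported in the paper.
example_matrix = [[45, 5],
                  [3, 47]]
print(accuracy_from_confusion(example_matrix))  # 0.92
```

With two classes, the diagonal of the matrix holds the correctly classified real and synthetic clips, so the accuracy is the diagonal sum divided by the total number of test clips.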

How to Cite

Phalachun, K., Wonglapsuwan, S., Rumnum, T., Waijanya, S., & Promrit, N. (2023). Real Human Voice and Artificial Intelligence Synthetic Voice Recognition with Convolutional Neural Networks. KKU Science Journal, 51(2), 170–179. https://doi.org/10.14456/kkuscij.2023.15

Issue

Vol. 51 No. 2 (2023): May - August 2023

Section

Research Articles

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
