Optimized intelligent speech signal verification system for identifying authorized users


Pravin Marotrao Ghate
Bhagvat D Jadhav
Prabhakar N Kota
Shankar Dattatray Chavan
Pravin Balaso Chopade

Abstract

Speech processing is a widely studied topic in the digital industry, where it is used for authentication to guard against unauthorized users. However, analyzing signal features with conventional filtering or neural models alone is insufficient because speech signals contain many noisy features. Moreover, incorporating multiple noise-elimination filters increases the algorithmic complexity of verifying a user's speech signal. The present study therefore builds a novel Chimp-based Recursive Speech Identification (CbRSI) system that verifies authenticated users from speech signal data. To simplify signal processing, a filtering function is first applied to recognize and discard noisy features. The filtered audio data is then passed to the classification phase, where feature selection is performed. Finally, authenticated users are identified by matching the analyzed signal features against the saved audio features, and the system's performance is validated. The proposed CbRSI achieved a user verification accuracy of 98.2%, the best result among the past studies used for comparison. These results suggest that the implemented solution is a suitable framework for verifying authenticated users.
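The three stages named in the abstract (noise filtering, feature extraction, and matching against saved features) can be illustrated with a minimal Python sketch. This is not the CbRSI method: the abstract does not specify the filter, the features, or the matching rule, so the moving-average filter, the DFT-magnitude features, the cosine-similarity threshold of 0.9, and all function names below are assumptions chosen only to make the pipeline concrete.

```python
import math
import random


def moving_average(signal, width=5):
    """Smooth the signal with a centered moving average (assumed stand-in noise filter)."""
    half = width // 2
    smoothed = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def spectral_features(signal, n_bins=32):
    """Return magnitudes of DFT bins 1..n_bins (assumed stand-in feature vector)."""
    n = len(signal)
    features = []
    for k in range(1, n_bins + 1):
        real = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(signal))
        imag = sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(signal))
        features.append(math.hypot(real, imag))
    return features


def extract_features(signal):
    """Filter first, then extract features, mirroring the order given in the abstract."""
    return spectral_features(moving_average(signal))


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def verify(signal, enrolled_features, threshold=0.9):
    """Accept the speaker if the signal's features match the saved (enrolled) ones."""
    return cosine_similarity(extract_features(signal), enrolled_features) >= threshold


def make_tone(cycles, n=256, noise=0.0):
    """Hypothetical synthetic 'voice': one sine tone plus optional Gaussian noise."""
    return [math.sin(2 * math.pi * cycles * t / n) + random.gauss(0.0, noise)
            for t in range(n)]


if __name__ == "__main__":
    random.seed(0)
    enrolled = extract_features(make_tone(5))             # saved features of the authorized user
    print(verify(make_tone(5, noise=0.2), enrolled))      # same "speaker", noisy recording
    print(verify(make_tone(20, noise=0.2), enrolled))     # different "speaker"
```

In a real system the synthetic tones would be replaced by recorded speech and the stand-in stages by the study's own filter, feature selection, and classifier; the sketch only shows how the stages connect.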

Article Details

How to Cite
Ghate, P. M., Jadhav, B. D., Kota, P. N., Chavan, S. D., & Chopade, P. B. (2023). Optimized intelligent speech signal verification system for identifying authorized users. Engineering and Applied Science Research, 50(6), 525–537. Retrieved from https://ph01.tci-thaijo.org/index.php/easr/article/view/252462
Section
ORIGINAL RESEARCH
