Enhancing Industrial Machine Sound Anomaly Detection Using STFT Integrated with DWT and Autoencoder-Based Neural Networks
DOI: https://doi.org/10.55003/ETH.430107

Keywords: Discrete Wavelet Transform (DWT), Autoencoder, Unsupervised Learning, Machine Sound Anomaly Detection

Abstract
This study proposes a hybrid feature extraction approach that integrates the Discrete Wavelet Transform (DWT) with the Short-Time Fourier Transform (STFT) to improve the accuracy of anomalous sound detection in industrial machines. Conventional STFT-based methods, while effective at representing time–frequency characteristics, exhibit limitations in handling non-stationary noise and transient variations, which often reduce anomaly detection performance in practical industrial environments. To address this problem, the proposed method incorporates multiresolution analysis through the DWT, enhancing the system’s capability to capture both spectral and temporal information with improved noise robustness. The MIMII dataset (valve, -6 dB, ID02) was used to evaluate the model, where the DWT–STFT feature representation was fed to an autoencoder for unsupervised anomaly detection. Experimental results demonstrate that integrating the DWT effectively enhanced noise robustness and improved classification metrics, achieving higher AUC and F1-scores than the baseline STFT-based approach. In conclusion, the proposed DWT–STFT fusion provides a more resilient and discriminative feature representation, making it a promising technique for practical industrial anomaly detection systems.
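The abstract does not specify how the DWT and STFT features are combined, so the following is only an illustrative sketch of one plausible fusion scheme: compute an STFT magnitude spectrogram of the raw signal, decompose the signal with a one-level Haar DWT, take sub-band spectrograms, and concatenate everything along the frequency axis. The function names (`stft_mag`, `haar_dwt`, `dwt_stft_features`), the Haar wavelet, the single decomposition level, and all frame parameters are assumptions for illustration, not the authors' actual configuration.

```python
import numpy as np

def stft_mag(x, n_fft=256, hop=128):
    """Magnitude STFT via a framed real FFT with a Hann window."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * win
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))   # (n_frames, n_fft // 2 + 1)

def haar_dwt(x):
    """One-level Haar DWT: approximation (low-pass) and detail (high-pass)."""
    x = x[: len(x) // 2 * 2]                     # even length for pairing
    a = (x[0::2] + x[1::2]) / np.sqrt(2)         # approximation coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2)         # detail coefficients
    return a, d

def dwt_stft_features(x, n_fft=256, hop=128):
    """Hybrid feature: STFT of the signal plus STFTs of its DWT sub-bands."""
    a, d = haar_dwt(x)
    feats = [stft_mag(x, n_fft, hop)]
    # Sub-band spectrograms use half-size frames, since the DWT halves length.
    feats += [stft_mag(b, n_fft // 2, hop // 2) for b in (a, d)]
    # Align frame counts, then concatenate along the frequency axis.
    t = min(f.shape[0] for f in feats)
    return np.concatenate([f[:t] for f in feats], axis=1)

rng = np.random.default_rng(0)
sig = rng.standard_normal(16000).astype(np.float32)  # 1 s of noise at 16 kHz
F = dwt_stft_features(sig)
print(F.shape)                                       # frames x fused freq bins
```

In an autoencoder pipeline such as the one described above, frames of `F` would be the model's input, and the per-frame reconstruction error would serve as the anomaly score at test time.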
References
H. Purohit, R. Tanabe, K. Ichige, T. Endo, Y. Nikaido, K. Suefusa, and Y. Kawaguchi, “MIMII dataset: Sound dataset for malfunctioning industrial machine investigation and inspection,” in Proc. Detection and Classification of Acoustic Scenes and Events 2019, Oct. 25–26, 2019, pp. 209–213, doi: 10.33682/m76f-d618.
T. Ye, T. Peng, and L. Yang, “Review on Sound-Based Industrial Predictive Maintenance: From Feature Engineering to Deep Learning,” Mathematics, vol. 13, no. 11, 2025, Art. no. 1724, doi: 10.3390/math13111724.
C. M. Bishop, “Introduction,” in Pattern Recognition and Machine Learning, New York, NY, USA: Springer, 2006, ch. 1, sec. 1.2, pp. 21–24.
S. Russell and P. Norvig, “Learning from Examples,” in Artificial Intelligence: A Modern Approach, 3rd ed., Upper Saddle River, NJ, USA: Pearson, 2010, ch. 18, sec. 18.1, pp. 693–695.
J. Schmidhuber, “Deep learning in neural networks: An overview,” Neural Networks, vol. 61, pp. 85–117, 2015, doi: 10.1016/j.neunet.2014.09.003.
J. Guan, Y. Liu, Q. Kong, F. Xiao, Q. Zhu, J. Tian and W. Wang, “Transformer-based autoencoder with ID constraint for unsupervised anomalous sound detection,” EURASIP Journal on Audio, Speech, and Music Processing, vol. 2023, no. 1, 2023, doi: 10.1186/s13636-023-00308-4.
S. Mallat, “Introduction to a Transient World,” in A Wavelet Tour of Signal Processing, 2nd ed., San Diego, CA, USA: Academic Press, 1999, ch. 1, sec. 1.3, pp. 28–34.
I. Daubechies, “Discrete Wavelet Transform: Frames,” in Ten Lectures on Wavelets, Philadelphia, PA, USA: SIAM, 1992, ch. 3, pp. 53–105.
A. Graps, “An introduction to wavelets,” IEEE Computational Science and Engineering, vol. 2, no. 2, pp. 50–61, 1995, doi: 10.1109/99.388960.
C. Valens, “A Really Friendly Guide to Wavelets,” The University of New Mexico, Albuquerque, NM, USA, 1999. [Online]. Available: http://agl.cs.unm.edu/~williams/cs530/arfgtw.pdf (accessed Mar. 1, 2026).
J. W. Cooley and J. W. Tukey, “An algorithm for the machine calculation of complex Fourier series,” Mathematics of Computation, vol. 19, pp. 297–301, 1965, doi: 10.2307/2003354.
J. B. Allen and L. R. Rabiner, “A unified approach to short-time Fourier analysis and synthesis,” Proceedings of the IEEE, vol. 65, no. 11, pp. 1558–1564, 1977, doi: 10.1109/proc.1977.10770.
S. S. Stevens, J. Volkmann and E. B. Newman, “A Scale for the Measurement of the Psychological Magnitude Pitch,” The Journal of the Acoustical Society of America, vol. 8, no. 3, pp. 185–190, 1937, doi: 10.1121/1.1915893.
B. Logan, “Mel frequency cepstral coefficients for music modeling,” in International Symposium on Music Information Retrieval, Plymouth, MA, USA, Oct. 23–25, 2000, pp. 1–11.
T. Ganchev, N. Fakotakis and G. Kokkinakis, “Comparative evaluation of various MFCC implementations on the speaker verification task,” in Proc. 10th Int. Conf. Speech and Computer (SPECOM), Patras, Greece, Oct. 17–19, 2005, pp. 191–194.
A. S. B. Saharom and F. Ehara, “Comparative Analysis of MFCC and Mel Spectrogram Features in Pump Fault Detection Using Autoencoder,” in 2024 2nd International Conference on Computer Graphics and Image Processing (CGIP), Kyoto, Japan, Jan. 12–14, 2024, pp. 1–6, doi: 10.1109/CGIP62525.2024.00030.
Y. Koizumi, S. Saito, H. Uematsu, Y. Kawachi, and N. Harada, “Unsupervised Detection of Anomalous Sound Based on Deep Learning and the Neyman–Pearson Lemma,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 27, no. 1, pp. 212–224, 2019, doi: 10.1109/TASLP.2018.2877258.
R. Garreta and G. Moncecchi, “Supervised Learning,” in Learning scikit-learn: Machine Learning in Python, Birmingham, U.K.: Packt Publishing, 2013, ch. 2, pp. 25–60.
Keras. “Keras: Developer guides.” keras.io. https://keras.io/guides/ (accessed Mar. 19, 2026).
B. McFee, C. Raffel, D. Liang, D. P. W. Ellis, M. McVicar, E. Battenberg and O. Nieto, “librosa: Audio and music signal analysis in Python,” in Proc. 14th Python in Science Conference, Austin, TX, USA, Jul. 6–12, 2015, pp. 18–24, doi: 10.25080/majora-7b98e3ed-003.
NumPy. “NumPy Documentation.” numpy.org. https://numpy.org/doc/ (accessed Mar. 19, 2026).
A. Sharma, “A comprehensive guide to Google Colab: Features, usage, and best practices.” analyticsvidhya.com. https://www.analyticsvidhya.com/blog/2020/03/google-colab-machine-learning-deep-learning/ (accessed Mar. 19, 2026).
Google Research, “Welcome to Colaboratory.” research.google.com. https://research.google.com/colaboratory/ (accessed Mar. 19, 2026).
License
Copyright (c) 2026 School of Engineering, King Mongkut’s Institute of Technology Ladkrabang

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
The published articles are copyrighted by the School of Engineering, King Mongkut's Institute of Technology Ladkrabang.
The statements contained in each article in this academic journal are the personal opinions of the respective authors and do not reflect the views of King Mongkut's Institute of Technology Ladkrabang or other faculty members of the institute.
Responsibility for all elements of each article rests with its authors; if there are any mistakes, each author is solely responsible for his or her own article.



