Advancing Guitar Chord Recognition: A Visual Method Based on Deep Convolutional Neural Networks and Deep Transfer Learning


Yosi Kristian
Lukman Zaman
Michael Tenoyo
Andreas Jodhinata

Abstract

Since its inception in 1999, Automatic Chord Recognition (ACR) has relied primarily on audio data, which poses challenges, especially with high-timbre sounds. Because guitar chords have distinct hand configurations, this has prompted a shift towards visual methods for recognizing them. This project explores visual guitar chord identification using fretboard features, including hand arrangements and positions. It also examines why transfer learning offers only limited advantages here, which is attributed to the absence of pertinent pre-trained weights. The developed model employs a deep convolutional neural network (DCNN) with techniques such as normalization, flattening, and dropout to identify 14 major and minor keys without fret position restrictions. The model is trained on a self-compiled dataset of over 13,000 samples from 10 contributors, enhanced by data augmentation, so that it generalizes to new data. Tested on 115 new examples, the system achieved 83% accuracy on both live and pre-recorded video data. These results demonstrate the feasibility of a DCNN-based visual approach to guitar chord identification and suggest potential for future advances in the Music Information Retrieval (MIR) field.
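
To make the operations named in the abstract concrete, the following is a minimal sketch of normalization, flattening, and dropout feeding a 14-way softmax classifier. It is an illustration only and not the authors' model: the abstract does not specify the architecture, so the 4x4 feature map (a stand-in for a convolutional output), the random weights, the dropout rate of 0.5, and all function names are assumptions made for this example.

```python
import math
import random

NUM_CLASSES = 14  # from the abstract: 14 major and minor keys


def normalize(feature_map, eps=1e-5):
    """Scale a 2-D feature map to zero mean and (approximately) unit variance."""
    values = [v for row in feature_map for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var + eps)
    return [[(v - mean) / std for v in row] for row in feature_map]


def flatten(feature_map):
    """Turn a 2-D feature map into a 1-D feature vector."""
    return [v for row in feature_map for v in row]


def dropout(vector, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate` while training."""
    if not training or rate == 0.0:
        return list(vector)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in vector]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(feature_map, weights, biases, rng, training=False, rate=0.5):
    """Normalize, flatten, apply dropout, then a dense layer with softmax."""
    x = dropout(flatten(normalize(feature_map)), rate, rng, training)
    logits = [sum(w * v for w, v in zip(row, x)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)


# Hypothetical inputs: random values stand in for learned features and weights.
rng = random.Random(0)
feature_map = [[rng.random() for _ in range(4)] for _ in range(4)]
weights = [[rng.uniform(-0.1, 0.1) for _ in range(16)] for _ in range(NUM_CLASSES)]
biases = [0.0] * NUM_CLASSES

probs = classify(feature_map, weights, biases, rng)
predicted = max(range(NUM_CLASSES), key=probs.__getitem__)
print("predicted class index:", predicted)
```

Dropout is active only when `training=True`; at inference time the vector passes through unchanged, which is why the surviving units are rescaled by `1 / (1 - rate)` during training.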

Article Details

How to Cite
Y. Kristian, L. Zaman, M. Tenoyo, and A. Jodhinata, “Advancing Guitar Chord Recognition: A Visual Method Based on Deep Convolutional Neural Networks and Deep Transfer Learning”, ECTI-CIT Transactions, vol. 18, no. 2, pp. 235–249, May 2024.
Section
Research Article
