Deep feature extraction technique based on Conv1D and LSTM network for food image recognition


Sirawan Phiphitphatphaisit
Olarik Surinta


Health awareness is increasing globally. Changing eating habits and choosing foods wisely are key factors in a healthy life. Food image recognition systems are typically built from images captured with mobile devices, but such images often include non-food objects such as people, cutlery, and even food decoration styles; these are called noise food images. This noise decreases the performance of the system. Convolutional neural network (CNN) architectures have been proposed to address this issue and obtain good performance. In this study, we propose the ResNet50+Conv1D-LSTM network to improve the efficiency of the food image recognition system. The state-of-the-art ResNet50 architecture was used to extract robust features from food images, and these features were employed as the input to a Conv1D network combined with a long short-term memory (LSTM) network, called Conv1D-LSTM. The output of the LSTM was then passed to a global average pooling layer and on to the softmax function to create a probability distribution. While training the CNN model, mixed data augmentation techniques were applied, which increased accuracy by 0.6%. The results showed that the ResNet50+Conv1D-LSTM network outperformed previous works on the Food-101 dataset, achieving a best accuracy of 90.87%.
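The final stage described in the abstract (LSTM output, then global average pooling, then softmax) can be illustrated with a minimal plain-Python sketch. It is not the authors' implementation: the function names, the toy 3-by-4 `lstm_output` values, and the choice to treat the pooled vector directly as class scores are all assumptions made for brevity. A full model would more likely use a deep learning framework and map the pooled features to the 101 Food-101 classes before the softmax.

```python
import math


def global_average_pool(sequence):
    """Average a list of per-timestep vectors into a single vector."""
    steps = len(sequence)
    width = len(sequence[0])
    return [sum(step[i] for step in sequence) / steps for i in range(width)]


def softmax(scores):
    """Turn raw scores into a probability distribution (numerically stable)."""
    peak = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical LSTM output: 3 timesteps, 4 values per timestep.
# Assumption: each value is treated as the score of one of 4 toy classes.
lstm_output = [
    [1.0, 0.0, 2.0, 1.0],
    [3.0, 0.0, 2.0, 1.0],
    [2.0, 0.0, 2.0, 1.0],
]

pooled = global_average_pool(lstm_output)  # one vector of length 4
probabilities = softmax(pooled)            # non-negative, sums to 1

print(pooled)
print(probabilities)
```

Averaging over timesteps gives a fixed-length vector regardless of sequence length, which is why a pooling layer can sit between a sequence model and a classifier.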




How to Cite
Phiphitphatphaisit, S., & Surinta, O. (2021). Deep feature extraction technique based on Conv1D and LSTM network for food image recognition. Engineering and Applied Science Research, 48(5), 581-592. Retrieved from

