
STW7088CEM: Artificial Neural Networks

Food Detection and Recognition Using a Deep Convolutional Neural Network

Submitted By: Manish Chaudhary
Submitted To: Shrawan Thakur

Table of Contents
1. Introduction
2. Background
3. Methodology
4. Experimental Section
5. Discussion of Findings
6. Conclusion
7. References
8. Appendices


1. Introduction
Food detection and recognition using Deep Convolutional Neural Networks (DCNNs) is a cutting-edge
application of artificial intelligence in computer vision. DCNNs, adept at learning hierarchical features from
images, excel in identifying and categorizing various food items based on textures, shapes, and colors. This
technology has diverse applications, from dietary monitoring and nutrition analysis for individuals to
streamlining inventory management and enhancing quality control in the food industry.
The process involves training the DCNN on extensive datasets, enabling the model to extract features indicative
of different food categories. Once trained, the system can analyze new images, making predictions about the
type of food present. Despite its potential, challenges include the need for diverse datasets, biases in training
data, and handling variations in food appearance. Researchers are actively addressing these issues to refine and
improve the capabilities of food detection systems.
The convergence of food detection and DCNNs presents a dynamic field with implications for health, nutrition,
and industry. As technology advances, these systems are poised to play a pivotal role in daily life, offering
innovative solutions to challenges in food analysis and identification.
2. Background
The intersection of food detection and recognition with Deep Convolutional Neural Networks (DCNNs) builds
upon the progress in computer vision and artificial intelligence. DCNNs, renowned for their hierarchical feature
learning from images, have revolutionized image recognition, finding applications in nuanced visual analysis.
The motivation for applying DCNNs to food lies in addressing diverse needs, from dietary monitoring to
enhancing operational efficiency in the food industry, and capitalizing on the potential of automated food item
identification from images.
This evolution is rooted in advancements in both hardware and algorithmic techniques, enabling the training of
more sophisticated models on extensive datasets. Despite notable progress, challenges persist, including biases
in training data and the demand for robust models capable of handling variations in food appearance. The
ongoing research in this field reflects a dynamic convergence of computer vision technology and practical
applications, aiming to create accurate and reliable food detection systems with broad real-world applicability.

a. Related Works
The article by Chang Liu presents a deep learning-based approach employing CNNs for food image recognition. The
study focuses on the application of this technology in computer-aided dietary assessment, emphasizing the
potential impact on individuals' nutritional tracking. The authors discuss the architecture of the CNN model, its
training process, and the evaluation results, showcasing its effectiveness in recognizing a diverse range of food
items.[1]
The publication by Atif Mahmood addresses using AI to recognize food in images for fitness applications. It
discusses the importance of diet and exercise for health and notes that competitive athletes must follow strict
diet plans. The authors propose using a convolutional neural network (CNN) to identify food items in images,
achieving an accuracy of 82.7% with this method. Such a system could be used in fitness apps to help people
track their diet. [2]
The article by Jiannan Zheng is about food image recognition for smart home applications. It discusses using
superpixels to segment images and then extracting features from those segments. These features are then used
to classify the food in the image. The authors report that their system outperforms competing methods. [3]
The article by A Chaitanya is about food image classification and data extraction using convolutional neural
networks and web crawlers. It discusses using a convolutional neural network (CNN) to identify food in images
and then using web crawlers to find information about the food. The authors propose a system that uses a pre-
trained CNN model called Inception v3 to classify food images. They then use web crawlers to find information
about the food, such as its origin, nutritional details, and recipes. The system achieved an accuracy of 97.00%
for 20 classes of food. The authors believe that the system could be improved by using a deeper neural network,
collecting more images per class, and fine-tuning the model hyperparameters. [4]
b. Problem Statement
The problem addressed in this work is the automatic identification of food from photographs. Concretely, the
task is multi-class image classification: given an image of a dish, assign it to one of the 101 categories of the
Food-101 dataset using a deep convolutional neural network trained via transfer learning.
3. Methodology
a. Dataset:
The publicly available Food 101 dataset has been used in this experiment. The "Food-101" dataset consists of
101 food categories, each containing 1,000 images, designed for training and evaluating algorithms in food
image recognition. It encompasses a wide range of cuisines and dishes, making it suitable for developing and
testing models in the domain of food detection and classification. These datasets, including high-resolution
images and corresponding labels, are crucial for training Convolutional Neural Networks (CNNs) to accurately
recognize and categorize various foods. Researchers and developers often use such datasets to benchmark their
models, assess performance, and ensure generalizability across diverse food categories. Well-curated datasets
like Food-101 play a vital role in advancing the state of the art in food image recognition and related
applications.
b. Data Loading and Preprocessing:
The code utilizes TensorFlow's ImageDataGenerator to load and preprocess images from the directory. Image
augmentation techniques such as rotation, shifting, shearing, zooming, and horizontal flipping are applied to
enhance the diversity of the training dataset. The data is rescaled to ensure pixel values fall within the range [0,
1].
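A minimal sketch of the generator described above; the specific augmentation ranges are illustrative assumptions, since the report does not state the exact values used:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation ranges below are assumptions; the report does not
# specify the exact values used.
datagen = ImageDataGenerator(
    rescale=1.0 / 255,        # map pixel values into [0, 1]
    rotation_range=30,        # random rotation
    width_shift_range=0.2,    # horizontal shifting
    height_shift_range=0.2,   # vertical shifting
    shear_range=0.2,          # shearing
    zoom_range=0.2,           # zooming
    horizontal_flip=True,     # horizontal flipping
    validation_split=0.2,     # reserves 20% for validation (next step)
)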
c. Dataset Splitting:
The dataset is split into training and validation sets by setting the validation_split parameter on the
ImageDataGenerator and requesting the corresponding subset from flow_from_directory. This reserves a
portion (20%) of the data for validation, while the rest is used for training the model.
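Continuing the sketch above, the split is realized by requesting each subset from flow_from_directory. The dataset path, batch size, and the 299x299 target size (InceptionV3's default input resolution) are assumptions:

# 'datagen' is the ImageDataGenerator from the previous sketch.
# The path below is an assumed location of the Food-101 images,
# organized as one subfolder per class.
train_gen = datagen.flow_from_directory(
    'food-101/images',
    target_size=(299, 299),   # assumed; InceptionV3's default input size
    batch_size=32,            # assumed batch size
    class_mode='categorical',
    subset='training',        # the 80% training portion
)
val_gen = datagen.flow_from_directory(
    'food-101/images',
    target_size=(299, 299),
    batch_size=32,
    class_mode='categorical',
    subset='validation',      # the reserved 20%
)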
d. Transfer Learning with InceptionV3:
The code employs transfer learning using the InceptionV3 architecture pretrained on the 'imagenet' dataset. The
top layers of InceptionV3 are excluded, and a global average pooling layer is added for feature extraction.
Additional layers are appended, including batch normalization, dropout, and dense layers with ReLU activation,
culminating in a softmax output layer.
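A sketch of the architecture just described; the dropout rate and the width of the hidden dense layer are assumptions, as the report does not state them:

from tensorflow.keras.applications import InceptionV3
from tensorflow.keras import layers, models

NUM_CLASSES = 101  # Food-101 categories

# InceptionV3 pretrained on ImageNet, top classifier excluded.
base = InceptionV3(weights='imagenet', include_top=False,
                   input_shape=(299, 299, 3))
base.trainable = False  # assumed: base frozen for feature extraction

x = layers.GlobalAveragePooling2D()(base.output)
x = layers.BatchNormalization()(x)
x = layers.Dropout(0.5)(x)                    # dropout rate assumed
x = layers.Dense(512, activation='relu')(x)   # hidden width assumed
outputs = layers.Dense(NUM_CLASSES, activation='softmax')(x)

model = models.Model(inputs=base.input, outputs=outputs)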
e. Model Selection:
The choice of the InceptionV3 model in the provided code for food recognition stems from the benefits of
transfer learning. InceptionV3, pre-trained on the ImageNet dataset, offers a strong foundation for feature
extraction, leveraging its ability to capture hierarchical visual patterns. Transfer learning with InceptionV3
allows the model to adapt to the specific task of food recognition, reducing training time and resources. The
customization of the model's top layers, including the addition of batch normalization, dropout, and dense
layers, tailors InceptionV3 to the nuances of the food dataset. The utilization of global average pooling and a
softmax output layer further refines the model for multi-class classification. Overall, the algorithm selection is
grounded in the efficiency of transfer learning and the ability of InceptionV3 to provide a robust starting point
for the intricate task of recognizing diverse food items.
f. Model Compilation:
The model is compiled using the Adam optimizer with a learning rate of 0.001, categorical crossentropy as the
loss function, and accuracy as the evaluation metric.
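These settings translate directly into the compile call:

from tensorflow.keras.optimizers import Adam

model.compile(
    optimizer=Adam(learning_rate=0.001),
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)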
g. Training the Model:
The model is trained using the fit method, specifying the training and validation datasets, the number of training
and validation steps per epoch, and the number of epochs (10 in this case); the Adam optimizer's learning rate
was already fixed at compilation. The training history, including accuracy and loss metrics for each epoch, is
stored in the history variable.
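A sketch of the training call, reusing the generators from the preprocessing step. Deriving the step counts from the generator sizes is a common convention assumed here rather than something the report specifies:

history = model.fit(
    train_gen,
    steps_per_epoch=train_gen.samples // train_gen.batch_size,
    validation_data=val_gen,
    validation_steps=val_gen.samples // val_gen.batch_size,
    epochs=10,
)
# history.history holds per-epoch 'loss', 'accuracy',
# 'val_loss', and 'val_accuracy' values.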
4. Experimental Section
5. Discussion of Findings

Epoch   Training Loss   Training Accuracy   Validation Loss   Validation Accuracy
1       2.4006          41.63%              2.2786            44.92%
2       2.3377          43.01%              2.2317            45.09%
3       2.2660          44.16%              2.1860            46.41%
4       2.2239          45.10%              2.1465            47.60%
5       2.1809          46.08%              2.1296            48.35%
6       2.1350          46.80%              2.1004            48.38%
7       2.0956          47.66%              2.1070            48.32%
8       2.0471          48.56%              2.0612            49.19%
9       2.0215          49.03%              2.0466            49.70%
10      1.9896          49.65%              2.0401            50.22%

The results indicate a gradual improvement in both training and validation accuracy over the 10 epochs.
However, a notable observation is the relatively slow convergence, suggesting potential areas for optimization,
such as training for more epochs, unfreezing and fine-tuning the upper layers of the InceptionV3 base, or
adjusting the learning-rate schedule. Further analysis would involve comparing these results against initial
hypotheses, examining the implications of the findings, and identifying avenues for future research.
6. Conclusion
The training process showed a steady decrease in both training and validation loss, accompanied by an increase
in both training and validation accuracy. By the end of the 10 epochs, the model achieved a training accuracy of
49.65% and a validation accuracy of 50.22%. These results suggest that the model is learning and improving its
performance over time.
However, it is important to note that further analysis is needed to fully understand the model's performance and
limitations. This may involve additional training with different hyperparameters, evaluating the model on unseen
data, and interpreting the model's predictions to understand potential biases or errors.
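As one illustration of the held-out evaluation suggested above, a minimal sketch follows. It assumes a separate test split organized in class subfolders; Food-101 ships its official train/test split as lists in its meta files, so the 'food-101/test' directory here is hypothetical:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Hypothetical held-out evaluation; 'food-101/test' is an assumed
# directory containing images the model has never seen.
test_datagen = ImageDataGenerator(rescale=1.0 / 255)
test_gen = test_datagen.flow_from_directory(
    'food-101/test', target_size=(299, 299),
    batch_size=32, class_mode='categorical', shuffle=False)

test_loss, test_acc = model.evaluate(test_gen)
print(f'Test accuracy: {test_acc:.2%}')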
7. References
8. Appendices
