STW7088CEM
Submitted By: Manish Chaudhary
Submitted To: Shrawan Thakur
a. Related Works
In the article by Chang Liu, a deep learning-based approach employing CNNs is proposed for food image
recognition. The study focuses on the application of this technology in computer-aided dietary assessment,
emphasizing its potential impact on individuals' nutritional tracking. The authors describe the architecture of the
CNN model and its training process, and present evaluation results showing its effectiveness in recognizing a
diverse range of food items. [1]
The publication by Atif Mahmood addresses AI-based food recognition in images for fitness applications. It
highlights the importance of diet and exercise for health and notes that competitive athletes must follow strict
diet plans. The authors propose a convolutional neural network (CNN) that scans images of food and identifies
the dish, achieving an accuracy of 82.7%. The approach could be integrated into fitness apps to help users track
their diet. [2]
The article by Jiannan Zheng addresses food image recognition for smart-home applications. The proposed
system segments images into superpixels, extracts features from the resulting segments, and uses those features
to classify the food in the image. The authors report that it outperforms competing methods. [3]
The article by A. Chaitanya covers food image classification and data extraction using convolutional neural
networks and web crawlers. The proposed system uses a pretrained CNN model, Inception v3, to classify food
images, then uses web crawlers to retrieve information about the recognized food, such as its origin, nutritional
details, and recipes. The system achieved an accuracy of 97.00% on 20 classes of food. The authors suggest that
performance could be improved by using a deeper neural network, collecting more images per class, and fine-
tuning the model hyperparameters. [4]
3. Methodology
a. Dataset:
The publicly available Food-101 dataset is used in this experiment. It consists of 101 food categories, each
containing 1,000 images, and was designed for training and evaluating food image recognition algorithms. It
covers a wide range of cuisines and dishes, making it suitable for developing and testing models for food
detection and classification. Its images and corresponding labels are well suited to training Convolutional
Neural Networks (CNNs) to recognize and categorize various foods, and well-curated datasets such as Food-101
serve as standard benchmarks for assessing model performance and generalizability across diverse food
categories.
b. Data Loading and Preprocessing:
The code utilizes TensorFlow's ImageDataGenerator to load and preprocess images from the directory. Image
augmentation techniques such as rotation, shifting, shearing, zooming, and horizontal flipping are applied to
enhance the diversity of the training dataset. The data is rescaled to ensure pixel values fall within the range [0,
1].
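The augmentation pipeline described above can be sketched as follows. The exact transformation ranges are assumptions for illustration, since the text does not state them:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Rescale pixels to [0, 1] and apply the augmentations described above.
# The specific ranges below are assumed values, not taken from the report.
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,        # pixel values into [0, 1]
    rotation_range=40,        # random rotation
    width_shift_range=0.2,    # horizontal shifting
    height_shift_range=0.2,   # vertical shifting
    shear_range=0.2,          # shearing
    zoom_range=0.2,           # zooming
    horizontal_flip=True,     # horizontal flipping
)
```

Each batch drawn from this generator is randomly transformed on the fly, so the model rarely sees the exact same image twice.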
c. Dataset Splitting:
The dataset is split into training and validation sets by setting the validation_split parameter of
ImageDataGenerator and selecting the corresponding subset ('training' or 'validation') in flow_from_directory.
This reserves a portion (20%) of the data for validation, while the rest is used for training the model.
d. Transfer Learning with InceptionV3:
The code employs transfer learning using the InceptionV3 architecture pretrained on the 'imagenet' dataset. The
top layers of InceptionV3 are excluded, and a global average pooling layer is added for feature extraction.
Additional layers are appended, including batch normalization, dropout, and dense layers with ReLU activation,
culminating in a softmax output layer.
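The architecture described above can be sketched as follows. The dropout rate and dense-layer width are assumed values, since the report does not state them:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import InceptionV3

def build_model(num_classes=101, weights="imagenet"):
    """InceptionV3 base (top excluded) plus the custom head described above.

    The dropout rate (0.3) and dense width (512) are assumptions for
    illustration; the report does not specify them.
    """
    base = InceptionV3(weights=weights, include_top=False,
                       input_shape=(299, 299, 3))
    x = layers.GlobalAveragePooling2D()(base.output)   # feature extraction
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.3)(x)                         # assumed rate
    x = layers.Dense(512, activation="relu")(x)        # assumed width
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return models.Model(inputs=base.input, outputs=outputs)
```

Passing weights="imagenet" downloads the pretrained backbone; the softmax head outputs one probability per Food-101 class.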
e. Model Selection:
The choice of the InceptionV3 model in the provided code for food recognition stems from the benefits of
transfer learning. InceptionV3, pre-trained on the ImageNet dataset, offers a strong foundation for feature
extraction, leveraging its ability to capture hierarchical visual patterns. Transfer learning with InceptionV3
allows the model to adapt to the specific task of food recognition, reducing training time and resources. The
customization of the model's top layers, including the addition of batch normalization, dropout, and dense
layers, tailors InceptionV3 to the nuances of the food dataset. The utilization of global average pooling and a
softmax output layer further refines the model for multi-class classification. Overall, the algorithm selection is
grounded in the efficiency of transfer learning and the ability of InceptionV3 to provide a robust starting point
for the intricate task of recognizing diverse food items.
f. Model Compilation:
The model is compiled using the Adam optimizer with a learning rate of 0.001, categorical crossentropy as the
loss function, and accuracy as the evaluation metric.
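The compilation step in one call; a tiny stand-in model is used here so the snippet is self-contained, but the same compile arguments apply to the InceptionV3-based model:

```python
import tensorflow as tf

# Stand-in model; in the actual pipeline this is the InceptionV3-based model.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="categorical_crossentropy",   # one-hot multi-class labels
    metrics=["accuracy"],
)
```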
g. Training the Model:
The model is trained using the fit method, specifying the training and validation generators, the number of
training and validation steps, and the number of epochs (10 in this case); the Adam learning rate is fixed earlier,
at compilation. The training history, including accuracy and loss metrics for each epoch, is stored in the history
variable.
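The shape of the training call can be sketched as below. Random stand-in data and a tiny model are used so the snippet runs on its own; in the real pipeline the InceptionV3 model and the Food-101 generators take their place:

```python
import numpy as np
import tensorflow as tf

# Stand-in model and random data; these replace the InceptionV3 model and
# the Food-101 generators so the sketch is self-contained.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(8,)),
    tf.keras.layers.Dense(5, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="categorical_crossentropy", metrics=["accuracy"])

x_train = np.random.rand(64, 8).astype("float32")
y_train = tf.keras.utils.to_categorical(np.random.randint(0, 5, 64), 5)
x_val = np.random.rand(16, 8).astype("float32")
y_val = tf.keras.utils.to_categorical(np.random.randint(0, 5, 16), 5)

# With generators, the same call takes steps_per_epoch and validation_steps
# (e.g. train_gen.samples // batch_size) instead of in-memory arrays.
history = model.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),
    epochs=10,
    verbose=0,
)
```

After training, history.history holds per-epoch lists for loss, accuracy, val_loss, and val_accuracy, which is what the discussion section plots and analyzes.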
4. Experimental section
5. Discussion of Findings
The results indicate a gradual improvement in both training and validation accuracy over the 10 epochs.
However, a notable observation is the relatively slow convergence, suggesting potential areas for optimization.
Further analysis includes comparisons with initial hypotheses, implications of the findings, and avenues for
future research.
6. Conclusion
The training process showed a steady decrease in both training and validation loss, accompanied by an increase
in both training and validation accuracy. By the end of the 10 epochs, the model achieved a training accuracy of
49.65% and a validation accuracy of 50.22%. These results suggest that the model is learning and improving its
performance over time.
However, it's important to note that further analysis is needed to fully understand the model's performance and
limitations. This may involve additional training with different hyperparameters, evaluating the model on unseen
data, and interpreting the model's predictions to understand potential biases or errors.
7. References
8. Appendices