Snap2Insight Task Explanation

Snap2Insight Task

Name: Harsha.B

Baseline Model: ResNet50

Proposed Model: ResNet50 with a U-Net attention branch (2 types)

Summary:

For this task, I have used the ResNet50 model (not pretrained) as the baseline
model, and I have proposed two architectures that add a U-Net based attention
branch on top of ResNet50. Both architectures achieve better accuracy on the
classification task than the baseline model on the test set, which is the same for
all models.

This model was partly inspired by some of my previous research in medical image
segmentation, where I came up with a U-Net architecture that gave results
comparable to the state of the art for retinal vessel segmentation. This architecture is
called U-Net+++, and it was inspired by the U-Net++ and HRNetV2 models.

For fine-grained image classification, many recent approaches use attention so that
the CNN can focus on the parts of the image that carry the features needed to
discriminate between very similar classes. Hence I decided to take a similar approach.

For attention, we essentially have to generate an attention map that highlights the
important parts of a feature map or of the input image. Attention approaches usually
use a plain CNN to generate this map. This is where I wanted to bring semantic
segmentation (U-Net) into the attention mechanism, because the two tasks are very
similar: both require a fully convolutional network and both essentially highlight
certain parts of the input. Hence, in my proposed models, the attention mechanism
is based on the U-Net architecture, and more specifically on the U-Net+++ design
that I came up with in the past.
Model Diagrams:

Model 1: (diagram in original document)

Model 2: (diagram in original document)

Model Explanation:
In these models, I have utilized the feature maps from all 5 stages of the ResNet50
architecture. The U-Net+++ models within these architectures are used for
generating the attention maps. Each attention map is linked to its respective feature
map through an attention connection: the attention map is multiplied element-wise
with the feature map, and the result is added back to the original feature map. This
was inspired by the Convolutional Block Attention Module (CBAM) paper.
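
To make the attention connection concrete, here is a minimal sketch, assuming a
TensorFlow/Keras implementation (the .h5 checkpoints listed below suggest Keras);
the function name is illustrative and not taken from the actual code.

```python
from tensorflow.keras import layers

def attention_connection(feature_map, attention_map):
    # Element-wise multiplication highlights the regions selected by the
    # attention map; the residual addition keeps the original signal:
    # out = feature_map * attention_map + feature_map
    attended = layers.Multiply()([feature_map, attention_map])
    return layers.Add()([attended, feature_map])
```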

In the first model, there is a single U-Net+++ that takes as input the feature map
from the first stage of the ResNet. The right (decoder) side of the U-Net+++ provides
the output feature maps required for generating attention maps for the rest of the
feature maps. Each attention map is generated by a single conv layer with the
number of filters equal to the number of channels in the feature map we are applying
attention on, followed by a sigmoid activation, so it essentially decides which aspects
of the feature map are more important. The last node of the U-Net, which is not
connected to anything else, undergoes global average pooling and is then
concatenated with the final fully connected layer of the model; this creates a better
connection between the U-Net and the loss function.
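
A rough sketch of how one such attention map could be generated from a U-Net
decoder output and applied to a ResNet stage, again assuming Keras. The 1x1
kernel size, the assumption that the decoder output spatially matches the ResNet
feature map, and the commented pooling/concatenation lines are mine; the write-up
only specifies the filter count and the sigmoid activation.

```python
from tensorflow.keras import layers

def apply_unet_attention(resnet_feature, unet_decoder_feature):
    # Single conv layer whose filter count matches the channels of the
    # feature map being attended to, followed by a sigmoid.
    channels = resnet_feature.shape[-1]
    attention_map = layers.Conv2D(channels, kernel_size=1,
                                  activation="sigmoid")(unet_decoder_feature)
    # Attention connection: multiply, then add back the original feature map.
    attended = layers.Multiply()([resnet_feature, attention_map])
    return layers.Add()([attended, resnet_feature])

# The otherwise unconnected last U-Net node is global-average-pooled and
# concatenated with the final fully connected features, e.g.:
# pooled = layers.GlobalAveragePooling2D()(last_unet_node)
# merged = layers.Concatenate()([pooled, backbone_features])
```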

In the second model, there are several mini U-Net+++ networks. Each feature map
has its own mini U-Net, which is responsible for performing attention on that feature
map. These mini U-Nets are also connected to each other by skip connections
through their bottom two layers, so that gradients can flow through all of them, from
the back of the network to the front.
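
A loose sketch of one mini U-Net attention branch and how consecutive branches
could share their deepest features, assuming Keras. The branch depth, widths, the
pooling used to align spatial resolutions, and the simplification of sharing only a
single bottleneck (rather than the bottom two layers) are my assumptions, not the
author's exact design.

```python
from tensorflow.keras import layers

def mini_unet_attention(feature_map, prev_bottleneck=None, width=64):
    # Small encoder over the ResNet stage output.
    e1 = layers.Conv2D(width, 3, padding="same", activation="relu")(feature_map)
    b = layers.Conv2D(width, 3, strides=2, padding="same", activation="relu")(e1)
    # Optionally fuse the previous mini U-Net's bottleneck so gradients can
    # flow between branches (assuming consecutive ResNet stages differ by a
    # factor of 2 in spatial size, hence the pooling).
    if prev_bottleneck is not None:
        prev = layers.AveragePooling2D()(prev_bottleneck)
        b = layers.Concatenate()([b, layers.Conv2D(width, 1)(prev)])
    # Small decoder with a skip connection back to the encoder.
    d = layers.UpSampling2D()(b)
    d = layers.Concatenate()([d, e1])
    d = layers.Conv2D(width, 3, padding="same", activation="relu")(d)
    # Attention map and attention connection, as in Model 1.
    attn = layers.Conv2D(feature_map.shape[-1], 1, activation="sigmoid")(d)
    out = layers.Add()([layers.Multiply()([feature_map, attn]), feature_map])
    return out, b
```
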
Training Procedure:

Data Processing:

Before beginning any experiment, I made a test set comprising 20% of the total
dataset. This split was stratified to combat the class imbalance, so that the
evaluation is not affected by an uneven distribution of classes across the splits.

Each image was normalized by dividing by 255. The entire train set and test set,
along with the labels, were stored in dictionary format and then pickled for easier
training later on. I have omitted all UPCs having fewer than two samples, because
the data cannot be split otherwise.
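
A hedged sketch of this data preparation, assuming scikit-learn for the stratified
split; variable names, the pickle path, and the random seed are illustrative rather
than taken from the actual code.

```python
import pickle
from collections import Counter
import numpy as np
from sklearn.model_selection import train_test_split

def prepare_splits(images, labels, pickle_path="data_splits.pkl"):
    images = np.asarray(images, dtype=np.float32) / 255.0  # normalize to [0, 1]
    labels = np.asarray(labels)

    # Drop UPCs (classes) with fewer than two samples; stratification needs >= 2.
    counts = Counter(labels)
    keep = np.array([counts[y] >= 2 for y in labels])
    images, labels = images[keep], labels[keep]

    # Stratified 80/20 train/test split.
    x_train, x_test, y_train, y_test = train_test_split(
        images, labels, test_size=0.2, stratify=labels, random_state=42)

    splits = {"x_train": x_train, "y_train": y_train,
              "x_test": x_test, "y_test": y_test}
    with open(pickle_path, "wb") as f:
        pickle.dump(splits, f)
    return splits
```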

Model Configuration:

At the end of the model, there is a dense layer with softmax activation, which is
subjected to a categorical cross-entropy loss. The loss function is also weighted with
class weights so that the model focuses more on the classes having fewer samples;
this allows more stable training.
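
A sketch of the class-weighted setup, assuming Keras with scikit-learn's
class-weight helper and integer-encoded labels; the optimizer and the helper usage
are my assumptions, since the write-up only specifies the softmax head, the
categorical cross-entropy loss, and the class weighting. `model` and `y_train` are
placeholders from the earlier sketch.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Per-class weights inversely proportional to class frequency, so that
# under-represented UPCs contribute more to the loss.
classes = np.unique(y_train)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_train)
class_weight = {int(c): w for c, w in zip(classes, weights)}

# Softmax head trained with (class-weighted) categorical cross-entropy.
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x_train, y_train_onehot, class_weight=class_weight, ...)
```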

Training:

Prior to training, I made a 20% validation split with stratification to continuously
evaluate model performance. I trained each experiment for 50 epochs on Google
Colab with checkpointing, which saves the model with the best validation score. I
have used the final best model from each experiment to calculate the final results.
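
A sketch of this training setup, assuming Keras; the monitored metric, the
checkpoint file pattern, and the one-hot conversion are assumptions loosely based
on the checkpoint names listed under "Best models" below.

```python
from sklearn.model_selection import train_test_split
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.utils import to_categorical

# Stratified 80/20 train/validation split of the training data
# (y_train assumed integer-encoded, as in the earlier sketches).
x_tr, x_val, y_tr, y_val = train_test_split(
    x_train, y_train, test_size=0.2, stratify=y_train, random_state=42)

# Save only the checkpoint with the best validation score over the 50 epochs.
checkpoint = ModelCheckpoint(
    "models/unet_resnet50_{epoch:02d}_{val_loss:.4f}_{val_accuracy:.4f}.h5",
    monitor="val_accuracy", save_best_only=True)

model.fit(x_tr, to_categorical(y_tr),
          validation_data=(x_val, to_categorical(y_val)),
          epochs=50, class_weight=class_weight, callbacks=[checkpoint])
```
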
Results:
These results were evaluated on the test set.

Baseline:

Accuracy: 90.46
Precision: 90.82
Recall: 90.46

Model 1:

Accuracy: 91.73
Precision: 91.98
Recall: 91.73

Model 2:

Accuracy: 91.09
Precision: 91.42
Recall: 91.09
Conclusion:
Model 1 outperforms the other models in accuracy, precision, and recall. However,
Model 2 is computationally cheaper because it has fewer parameters.

Note:

I have generated confusion matrices for each model. They can be found in the
results folder in pickled format. Intra_brand_cfm contains the NumPy array for the
intra-brand confusion matrix, and inter_brand_cfm_dict contains a dictionary with
the brand names as keys and the NumPy array containing the confusion matrix for
each brand as the values.
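
For reference, a small sketch of how these pickled matrices could be loaded; the
exact file names inside the results folder (including the .pkl extension) are
assumptions.

```python
import pickle

with open("results/intra_brand_cfm.pkl", "rb") as f:
    intra_brand_cfm = pickle.load(f)        # NumPy array

with open("results/inter_brand_cfm_dict.pkl", "rb") as f:
    inter_brand_cfm_dict = pickle.load(f)   # {brand_name: confusion matrix}

for brand, cfm in inter_brand_cfm_dict.items():
    print(brand, cfm.shape)
```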

Best models:

Baseline:
Snap2Insight/models/baseline_resnet50/baseline_resnet50_49_0.3336_0.9096.h5

Model 1:
Snap2Insight/models/unet_resnet/unet_resnet50_39_0.3683_0.9146.h5

Model 2:
Snap2Insight/models/unet_resnet/unet_resnet50_3_48_0.3456_0.9176.h5

References:

U-Net++: https://arxiv.org/abs/1807.10165

Convolutional Block Attention Module (CBAM):
https://openaccess.thecvf.com/content_ECCV_2018/papers/Sanghyun_Woo_Convolutional_Block_Attention_ECCV_2018_paper.pdf

HRNet: https://arxiv.org/abs/1908.07919
