Challenge 2


Motivation:

Image classification is one of the fundamental problems in machine learning, and it is always interesting to tackle a new image classification task. The classification problem posed in this challenge is novel and interesting to explore with pre-trained models, which motivated me to take it on.

Abstract:

In this challenge, we began by training a ResNet34 model from scratch on the provided training data. However, because of the limited number of training images, we subsequently switched to a model pretrained on ImageNet, available in torchvision. The pretrained model clearly outperformed the model trained from scratch. Standard data augmentation, however, produced no notable improvement on the validation set. To address this, we introduced a consistency-based training framework: several augmented views of the same image are fed to the model, which is forced to produce predictions consistent with those on the non-augmented image. This yielded a slight improvement on the validation set, raising validation accuracy from 96.96% to 97.27%.

Introduction:

Data augmentation has been used extensively in deep learning to improve model performance. Recent studies have shown that data augmentation can make a model more robust to test-time distribution shift [1-2]. Other work has demonstrated that enforcing consistency among the predictions on different augmented versions of the same input image is an effective form of test-time adaptation [3]. Inspired by this evidence, we explored the benefits of a consistency-based training paradigm applied to augmented data: our approach aims both to harness the power of data augmentation itself and to capitalize on the gains from keeping predictions consistent across diverse augmented views of the input. This combination of data augmentation and consistency enforcement forms the cornerstone of our training methodology. Although the improvement on the validation set was modest, the proposed method arguably improved the robustness of the model, since the validation loss was more stable during training than with naive training.

Data Pre-processing/Analysis:

First, the input image was resized to 224×224 and renormalized channel-wise to the ImageNet statistics, mean [0.485, 0.456, 0.406] and standard deviation [0.229, 0.224, 0.225]. We then applied random horizontal and vertical flips in sequence.
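This preprocessing can be sketched in NumPy as follows (the actual pipeline presumably uses torchvision.transforms; resizing to 224×224 is left to the image library, and the function name and array layout here are illustrative assumptions):

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def preprocess(img, rng):
    """Normalize an HxWx3 float image in [0, 1] to ImageNet statistics,
    then apply random horizontal and vertical flips in sequence."""
    img = (img - IMAGENET_MEAN) / IMAGENET_STD  # channel-wise renormalization
    if rng.random() < 0.5:                       # random horizontal flip
        img = img[:, ::-1, :]
    if rng.random() < 0.5:                       # random vertical flip
        img = img[::-1, :, :]
    return img
```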

Model Architecture:

The proposed training scheme is shown in Figure 1. First, the unaugmented image is forwarded through the network, and the model's prediction is compared with the ground-truth label to compute the supervised loss. Next, the same input image undergoes two different augmentations. The predictions for the two augmented views are compared with the ground-truth label, and their prediction probabilities are additionally forced to be consistent with the prediction probabilities for the original image. This makes the model robust to the kinds of shift that may occur at test time. We used a pretrained ResNet34 model as our backbone.
Figure 1: The Proposed Training Paradigm
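The combined objective described above can be sketched as follows (a NumPy illustration for a single example; in the real pipeline the logits come from the ResNet34 backbone, and the weight `lam` on the consistency term is an assumption not specified in the text):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, label):
    """Supervised loss for one example with an integer class label."""
    return -np.log(softmax(logits)[label])

def total_loss(clean_logits, aug1_logits, aug2_logits, label, lam=1.0):
    """Supervised loss on the clean image and both augmented views,
    plus an MSE consistency term pulling the augmented prediction
    probabilities toward the clean-image probabilities."""
    sup = (cross_entropy(clean_logits, label)
           + cross_entropy(aug1_logits, label)
           + cross_entropy(aug2_logits, label))
    p_clean = softmax(clean_logits)
    cons = (np.mean((softmax(aug1_logits) - p_clean) ** 2)
            + np.mean((softmax(aug2_logits) - p_clean) ** 2))
    return sup + lam * cons
```

When both augmented views produce the same logits as the clean image, the consistency term vanishes and only the supervised loss remains.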

Experimental Setting:

We used the Adam optimizer with its default hyperparameter settings. The available training data was split into two subsets: 80% for training the network and the remaining 20% for validation. A learning-rate scheduler reduced the learning rate by a factor of 2 whenever the validation loss failed to improve for 10 consecutive epochs. Training used cross-entropy as the main supervised loss and MSE as the consistency loss. The model was finetuned for 50 epochs.
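The scheduler amounts to the following reduce-on-plateau logic (a minimal pure-Python sketch of the behavior of torch.optim.lr_scheduler.ReduceLROnPlateau with factor=0.5 and patience=10; the class name is illustrative):

```python
class ReduceOnPlateau:
    """Halve the learning rate once the validation loss has not
    improved for `patience` consecutive epochs."""

    def __init__(self, lr, factor=0.5, patience=10):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss       # improvement: reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr *= self.factor  # reduce by a factor of 2
                self.bad_epochs = 0
        return self.lr
```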

Hypotheses Tried:

I tried training the model from scratch, finetuning a model pretrained on ImageNet, finetuning with augmented data, and finally the proposed method.

Result:

All the models except the final one performed similarly, achieving validation accuracies above 96.50%. Only the final model crossed the 97% mark, reaching a best validation accuracy of 97.27%.

Key Findings:

All the models had difficulty discriminating between fetal femur and thorax. Manual inspection of the test data showed that the model's predictions are most accurate for fetal brain.

Future Work:

Given sufficient time, I would like to use self-supervised training such as SimCLR to finetune the classifier on real ultrasound images; any large-scale publicly available dataset could be used for this. More sophisticated consistency objectives could also be explored, such as the KL divergence between the predictions on augmented images and the prediction on the unaugmented image. Student-teacher knowledge distillation could likewise be tried to explore its efficiency. Finally, the backbone classifier could be modified to embed a representation trained with CLIP on real ultrasound images, adding domain knowledge.
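The KL-based consistency mentioned above could be sketched like this (a NumPy illustration; using it as a drop-in replacement for the MSE term of the current method is an assumption about how it would be wired in):

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def kl_consistency(aug_logits, clean_logits, eps=1e-12):
    """KL(p_clean || p_aug): penalizes augmented-view predictions
    that diverge from the clean-image prediction distribution."""
    p = softmax(clean_logits)
    q = softmax(aug_logits)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))
```

Unlike the symmetric MSE term, this directional KL treats the clean-image prediction as the target distribution.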

References:

1. Wang, Qin, et al. "Continual test-time domain adaptation." Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition. 2022.
2. Rebuffi, Sylvestre-Alvise, et al. "Data augmentation can improve robustness." Advances in
Neural Information Processing Systems 34 (2021): 29935-29948.
3. Zhang, Marvin, Sergey Levine, and Chelsea Finn. "Memo: Test time robustness via
adaptation and augmentation." Advances in Neural Information Processing Systems 35
(2022): 38629-38642.
