Himanshu Jain1, Aayush Kumar1, Sameena Pathan2, Tanweer Ali1, Jagadeesh Chandra
R B1*, Vikas Kumar Jhunjhunwala3
1 Department of Electronics and Communication Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal 576104, India.
2 Department of Information and Communication Technology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal 576104, India.
3 Department of Electrical and Electronics Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal 576104, India.
1 Introduction
For convenience of processing, the authors of [8] resized the images to 256 × 256 pixels. Adaptive gamma correction was applied to increase image sharpness for recognition and reconstruction, and histogram equalization was used to improve image quality. Finally, they employed augmentation to increase the number of training samples. From the performance metrics reported in [6], we can infer that classification of UC is more complex than that of CD: 94% of CD patients were labelled accurately, whereas only 65% of UC patients were.
The lack of data presented a challenge to the authors of [9]. In their proposed method, they used the FastGAN few-shot generator to synthesize new data from a real dataset of UC colonoscopy images. The main objective was to produce high-resolution synthetic images and to introduce variance into the data, thereby tackling the class imbalance in their dataset.
The authors of [5] developed an automated method for diagnosing IBD (UC and CD). The patient data were pre-processed and important variables selected before being passed to an SVM, which was assessed using standard binary-classification quality metrics. The SVM maps the n-dimensional input space to a higher-dimensional space, in which a linear classifier then separates the newly created feature space.
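This kernel mapping can be sketched with scikit-learn; the toy patient-feature matrix, labels, and hyperparameters below are illustrative stand-ins rather than the data or settings of [5].

```python
# Sketch of an SVM pipeline of the kind described for [5]: features are
# implicitly mapped to a higher-dimensional space by a kernel, where a
# linear separator is fit. All data here are synthetic stand-ins.
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))              # 200 patients, 10 clinical variables
y = (X[:, 0] * X[:, 1] > 0).astype(int)     # non-linearly separable labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
scaler = StandardScaler().fit(X_tr)

# The RBF kernel performs the implicit lift to a higher-dimensional space;
# the decision boundary is linear in that lifted space.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(scaler.transform(X_tr), y_tr)
acc = accuracy_score(y_te, clf.predict(scaler.transform(X_te)))
```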
The backbone network used by the authors in [10] was DenseNet201, consisting of five dense layers with a feed-forward connection between each layer. Features extracted by the DenseNet are partitioned and fed into independently recurrent neural networks (IndRNN) and an improved attention mechanism module (EAM-Net) to generate attention maps, which are knitted together to emphasize the extracted features. To optimize the computation space, the output of EAM-Net is passed to a global average pooling layer, which lowers the number of calculations that must be performed. This also helps retain more background characteristics by reducing the errors brought on by the rise in estimation variance.
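The computational saving from global average pooling can be seen with a quick parameter count; the 7 × 7 × 128 feature-map size and 512-neuron head below are hypothetical numbers chosen only for illustration.

```python
# Illustrative comparison: flattening a feature map versus global average
# pooling it before a dense layer. All sizes here are hypothetical.
import numpy as np

fmap = np.random.rand(7, 7, 128)        # H x W x C feature map
gap = fmap.mean(axis=(0, 1))            # global average pooling -> (C,)

dense_units = 512
params_flatten = fmap.size * dense_units   # 7*7*128*512 dense weights
params_gap = gap.size * dense_units        # 128*512 dense weights
print(params_flatten // params_gap)        # 49x fewer weights
```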
Ensemble models powered by pre-trained networks through transfer learning can be used to build a practically applicable, highly accurate detector of GI diseases, which can help reduce the burden on healthcare professionals and lead to quicker diagnosis [11]. The authors of [12-13] suggested a UC-NfNet architecture. Their network accepts an RGB image of H × W resolution that has been channel-wise normalized using the mean and standard deviation obtained from ImageNet. UC-NfNet alters the main block of the original network by adding a Spatial Attention Block (SAB) after the initial convolution, using two convolutions with 16 channels, one with 32 channels and another with 64 channels, followed by two convolutions with 128 channels. The SAB was designed to improve the functionality of UC-NfNet by concentrating on regions that are more important therapeutically.
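The channel-wise input normalization mentioned above can be sketched as follows; the mean and standard deviation values are the widely used ImageNet statistics, and the image here is a random stand-in.

```python
# Channel-wise normalization of an RGB image using ImageNet statistics,
# as is standard for networks whose inputs follow ImageNet preprocessing.
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def normalize(img):
    """img: H x W x 3 array with values scaled to [0, 1]."""
    return (img - IMAGENET_MEAN) / IMAGENET_STD

img = np.random.rand(224, 224, 3)   # stand-in for a loaded endoscopy frame
out = normalize(img)
```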
In the proposed work, the imbalance in the dataset is handled through class-wise augmentation, yielding a sufficient amount of data to feed the model. The model comprises several convolutional blocks, each containing a unique set of filters to aid feature extraction. The model's performance is very competitive with the approaches mentioned above, and the architecture can be trained on any medical dataset for custom implementation. The regularization techniques used to handle overfitting protect the integrity of the model's predictions, which is of utmost importance when dealing with medical data.
2 Methodology
The framework for the proposed method is depicted in Fig. 1. Once loaded, the dataset is visualized and augmented to scale it up, leading to a more generalized model after training. The CNN architecture is constructed and trained over many iterations until the best possible combination of weights is reached. Finally, the test data are loaded and the model is evaluated via multiple sets of predictions.
The tools used include TensorFlow, OpenCV, Pandas, NumPy, and Matplotlib. The work is coded entirely in Python in the Google Colab environment, with a GPU used for training the deep neural network. First, we consider the importance of balancing the dataset. The dataset used in this work was downloaded from an open online repository. It contains four classes, ranked from Mayo 0 to Mayo 3 in increasing order of disease severity as classified by gastroenterologists: Mayo 0 has 5180 images, Mayo 1 has 2588, Mayo 2 has 1077, and Mayo 3 has 745. The dataset comprises endoscopic images from different parts of the colon. Balancing the dataset removes bias that could otherwise be introduced into the model and lead to irregular classification patterns. Because the dataset is relatively small, it is also scaled four times in size; by introducing new images into the sample space, the model has more features and variants to learn from, leading to a more generalized predictor.
We balance the dataset and scale it to 10,000 images per class through class-wise augmentation using the Albumentations library. The techniques used include horizontal and vertical flips, rotations, brightness adjustment, and color jitter. Training the same model on this new, balanced dataset yields the improved results reported below.
Initially we started with a simple CNN model. It used no regularization techniques to prevent overfitting, and with only a single Conv2D layer and a single dense layer, its learning capacity was limited. When trained for 34 epochs, the training and validation accuracy saturated at approximately 69%. So even though there was no overfitting and the fluctuations were mild, the model simply was not powerful enough to learn all the important features.

We therefore took it a step further by adding a second Conv2D block with double the number of filters (32, versus 16 in the original Conv2D), along with regularization: batch normalization after each convolutional layer and dropout on the dense layer to curtail overfitting in what is now a more powerful model. This improved model produced a training accuracy of 82% and a validation score of 76% after 34 epochs. Even with regularization, the suspected overfitting could not be handled effectively, as the model was probably still not complex enough to be generalized by the techniques implemented.

So we decided to further increase model complexity to raise accuracy and observe how the model performs on the validation data. The final model has three convolutional blocks with two Conv2D layers per block and an increasing number of filters per block, starting at 32 and going up to 128 in the last convolutional layer of the third block. A (2, 2) max pooling ends each block, the output of which is condensed by a GlobalAveragePooling2D layer and fed to a 512-neuron dense layer. A 4-neuron softmax layer performs the classification, giving a now-improved training accuracy of 90% at the 34th epoch and a validation accuracy of 86%. Here we can finally see the importance of regularization in reducing overfitting as model complexity increases: without it, the validation score would still be around the 70% mark while the training accuracy would be in the 90s.
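The final architecture described above can be sketched in Keras as follows; the input resolution (224 × 224) and the 0.5 dropout rate are assumptions, as the text does not state them.

```python
# Sketch of the final model: three convolutional blocks (two Conv2D layers
# each, 32 -> 64 -> 128 filters) with batch normalization, 2x2 max pooling,
# global average pooling, a 512-neuron dense layer with dropout, and a
# 4-way softmax. Input size and dropout rate are assumptions.
import tensorflow as tf
from tensorflow.keras import layers

def build_ibdnet(input_shape=(224, 224, 3), n_classes=4):
    inputs = tf.keras.Input(shape=input_shape)
    x = inputs
    for filters in (32, 64, 128):
        for _ in range(2):
            x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
            x = layers.BatchNormalization()(x)
        x = layers.MaxPooling2D((2, 2))(x)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(512, activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

model = build_ibdnet()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```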
The proposed CNN model reached a training accuracy of 90% and a validation score of 86%, indicating that overfitting was handled and that the model performs well on fresh, unseen data. Implementing transfer learning and fine-tuning pre-trained models yields the following results.

It can be observed that the pre-trained models did very well, but not better than the proposed implementation; in fact, the proposed approach overfit less than the pre-trained models. This is because the proposed model was trained on the dataset from scratch and is therefore more generalized, whereas the pre-trained models were trained on the ImageNet database, with their weights fine-tuned to the current dataset through a single dense layer.
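The transfer-learning baseline described above can be sketched as follows; the input resolution and the frozen-backbone setup are assumptions, and pretrained weights are omitted here (`weights=None`) only to keep the sketch self-contained.

```python
# Sketch of the transfer-learning baseline: an InceptionResNetV2 backbone
# with a single dense classification head for the 4 Mayo classes. In
# practice weights="imagenet" would be used; None avoids the download here.
import tensorflow as tf
from tensorflow.keras import layers, models

base = tf.keras.applications.InceptionResNetV2(
    include_top=False, weights=None, input_shape=(299, 299, 3))
base.trainable = False  # freeze the backbone; fine-tune only the head

tl_model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(4, activation="softmax"),  # 4 Mayo classes
])
```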
Fig. 1. Performance of the proposed model and the transfer learning model: (a) proposed model; (b) Inception_Resnet_V2.
From Fig. 1 and Fig. 2 it can be observed that the fluctuations are less frequent in the fine-tuned pretrained models. Since these models are trained over millions of images of different kinds, random predictions are few and far between. The ROC curve and confusion matrix depict the class-wise performance of the model, and the model performs very well on both metrics. The Mayo 2 class has the least accurate predictions, which makes sense: it is sandwiched between Mayo 1 (mild disease) and Mayo 3 (severe disease), so some Mayo 2 images are ambiguous, causing some confusion in the predictions.
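The class-wise evaluation above can be sketched with scikit-learn; the labels and softmax outputs below are synthetic stand-ins, not the model's actual predictions.

```python
# Sketch of the class-wise evaluation: a confusion matrix and macro
# one-vs-rest ROC AUC over the 4 Mayo classes. Data here are synthetic.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

rng = np.random.default_rng(1)
y_true = rng.integers(0, 4, size=200)          # 4 Mayo classes
probs = rng.dirichlet(np.ones(4), size=200)    # stand-in softmax outputs
y_pred = probs.argmax(axis=1)

cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2, 3])
auc = roc_auc_score(y_true, probs, multi_class="ovr")   # macro one-vs-rest
per_class_acc = cm.diagonal() / cm.sum(axis=1)          # recall per class
```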
Fig. 3. Performance of the proposed IBDNet: (a) confusion matrix; (b) ROC curve.
4 Conclusion
This research will greatly benefit doctors, who no longer need to analyze each and every image in search of an issue: the computer gives a verdict with high confidence, and the doctor can focus on deciding the treatment plan for the patient. This saves a great deal of time and enables a faster diagnosis without compromising the quality of care given to patients.
In the future we can focus on improving the performance of the proposed method by exploring
different deep learning architectures and transfer learning models. For instance, more recent
architectures such as EfficientNet and ViT (Vision Transformer) can be evaluated for their
suitability in the classification of IBD. Additionally, the proposed method can be extended to
incorporate other clinical data such as patient history, laboratory results, and endoscopy findings to improve the accuracy of the diagnosis. Finally, the proposed method can be evaluated
on larger and more diverse datasets to ensure its generalizability and applicability to different
patient populations.
Disclosure of Interests. The authors declare that they have no competing interests.
References
1. Minna, J. D., Roth, J. A., & Gazdar, A. F. (2002). Focus on lung cancer. Cancer Cell, 1(1), 49-52.
2. Torre, L. A., Siegel, R. L., & Jemal, A. (2016). Lung cancer statistics. Lung Cancer and Personalized Medicine: Current Knowledge and Therapies, 1-19.
3. Malik, P. S., & Raina, V. (2015). Lung cancer: Prevalent trends & emerging concepts. The Indian Journal of Medical Research, 141(1), 5.
4. Raza, R., Zulfiqar, F., Khan, M. O., Arif, M., Alvi, A., Iftikhar, M. A., & Alam, T. (2023). Lung-EffNet: Lung cancer classification using EfficientNet from CT-scan images. Engineering Applications of Artificial Intelligence, 126, 106902.
5. Princy Magdaline, P., & Ganesh Babu, T. R. (2023). Detection of lung cancer using novel attention gate residual U-Net model and KNN classifier from computer tomography images. Journal of Intelligent & Fuzzy Systems, (Preprint), 1-14.
6. Yan, C., & Razmjooy, N. (2023). Optimal lung cancer detection based on CNN optimized and improved Snake optimization algorithm. Biomedical Signal Processing and Control, 86, 105319.
7. Mohamed, T. I., Oyelade, O. N., & Ezugwu, A. E. (2023). Automatic detection and classification of lung cancer CT scans based on deep learning and Ebola optimization search algorithm. PLOS ONE, 18(8), e0285796.
8. Deepa, V., & Fathimal, P. M. Deep-ShrimpNet fostered lung cancer classification from CT images.
9. VR, N., & Chandra SS, V. (2023). ExtRanFS: An automated lung cancer malignancy detection system using extremely randomized feature selector. Diagnostics, 13(13), 2206.
10. Nigudgi, S., & Bhyri, C. (2023). Lung cancer CT image classification using hybrid-SVM transfer learning approach. Soft Computing, 1-15.
11. Sabzalian, M. H., Kharajinezhadian, F., Tajally, A., Reihanisaransari, R., Alkhazaleh, H. A., & Bokov, D. (2023). New bidirectional recurrent neural network optimized by improved Ebola search optimization algorithm for lung cancer diagnosis. Biomedical Signal Processing and Control, 84, 104965.
12. Al-Yasriy, H. F. (2020). The IQ-OTHNCCD lung cancer dataset. Mendeley Data. https://data.mendeley.com/datasets/bhmdr45bh2/1
13. Mirjalili, S. (2016). SCA: A sine cosine algorithm for solving optimization problems. Knowledge-Based Systems, 96, 120-133.