Major Project Report

SEMANTIC SEGMENTATION OF WIRELESS CAPSULE ENDOSCOPY IMAGES

by

Jayakrishnan T
192SP008

Declaration
I hereby declare that the report of the P.G. Project work entitled SEMANTIC SEG-
MENTATION OF WIRELESS CAPSULE ENDOSCOPY IMAGES which is being
submitted to the National Institute of Technology Karnataka, Surathkal in partial ful-
fillment of the requirements for the award of the degree of Master of Technology in
Signal Processing and Machine Learning in the department of Electronics and Com-
munication Engineering, is a bonafide report of the work carried out by me. The material
contained in the report has not been submitted to any University or Institution for the
award of any degree.
Jayakrishnan T
Reg. No.: 192473SP008
Department of Electronics and Communication Engineering
Certificate

This is to certify that the P.G. project work entitled SEMANTIC SEGMENTATION OF WIRELESS CAPSULE ENDOSCOPY IMAGES, submitted by Jayakrishnan T (Registration No.: 192473SP008) as the record of the work carried out by him, is accepted as the P.G. Project Work Report submission in partial fulfillment of the requirements for the award of the degree of Master of Technology in Signal Processing and Machine Learning of the Department of Electronics and Communication Engineering during the academic year 2020-2021.
Acknowledgement

At the very outset of this report, I would like to extend my sincere and heartfelt gratitude to all those who have helped in this endeavor. I am extremely thankful to my guide Dr. Aparna P, Assistant Professor in the Department of Electronics and Communication Engineering, for her valuable guidance and support in completing this project. I am also deeply indebted to Dr. Ashwini Chaturvedi, HOD, for the guidance and encouragement to accomplish this work.
Jayakrishnan T
192SP008
Abstract
Contents

1 Introduction
  1.1 Motivation
  1.2 Problem Statement
2 Background
3 Implementation
  3.1 Fully Convolutional Network (FCN-32s, 16s, and 8s)
  3.2 U-Net
  3.3 ResU-Net
  3.4 Attention U-Net
  3.5 Double U-Net
  3.6 Dataset
  3.7 Data Augmentation
  3.8 Training Details
  3.9 Evaluation Metrics
4 Results
  4.1 FCN-32s
  4.2 FCN-16s
  4.3 FCN-8s
  4.4 U-Net
  4.5 ResU-Net
  4.6 Attention U-Net
  4.7 Double U-Net
5 Conclusion
Bibliography
Biodata
List of Figures

1 WCE Capsule
2 FCN Architecture
3 FCN Architecture Variations
4 U-Net Architecture
5 Building blocks of neural networks. (a) Plain neural unit used in U-Net and (b) Residual unit with identity mapping used in ResU-Net
6 ResU-Net Architecture
7 Attention Gate
8 Attention U-Net
9 Atrous Convolution: 2D convolution using a 3 x 3 kernel with a dilation rate of 2 and no padding
10 Atrous Spatial Pyramid Pooling (ASPP)
11 Squeeze and Excitation Block
12 Double U-Net
13 FCN-32s Results. (a) Input image (b) Ground truth (c) Segmented output
14 FCN-16s Results. (a) Input image (b) Ground truth (c) Segmented output
15 FCN-8s Results. (a) Input image (b) Ground truth (c) Segmented output
16 U-Net Results. (a) Input image (b) Ground truth (c) Segmented output
17 ResU-Net Results. (a) Input image (b) Ground truth (c) Segmented output
18 Attention U-Net Results. (a) Input image (b) Ground truth (c) Segmented output
19 Double U-Net Results. (a) Input image (b) Ground truth (c) Segmented output
List of Tables

1 Results
List of Abbreviations
AG Attention Gate
TP True Positive
TN True Negative
FP False Positive
FN False Negative
Chapter 1
Introduction
Figure 1: WCE Capsule
1.1 Motivation
Chapter 2
Background
Image segmentation has long been an active topic in the deep learning community and is one of the most challenging tasks in computer vision. The benefits of image segmentation are numerous, especially in the biomedical field, where the data is not always clean and easy to examine. Other applications include image compression, scene understanding,
locating objects in satellite images, autonomous vehicles, augmented re-
ality, etc. Over time, many algorithms have been developed for image
segmentation but with the advent of deep learning in computer vision,
many deep learning models for image segmentation have also emerged.
Semantic segmentation and instance segmentation are the two types
of segmentation schemes in deep learning. Semantic segmentation per-
forms pixel-level labeling with a set of object categories (e.g., human,
car, tree, sky) for all image pixels. It treats multiple objects of the same
class as a single entity. It is generally a harder undertaking than image
classification, which predicts a single label for the entire image.
Instance segmentation extends the semantic segmentation scope fur-
ther by detecting and delineating each object of interest in the image (e.g.,
partitioning of individual persons). It treats multiple objects of the same class as distinct individual objects (or instances). For the WCE images dealt with in this project, semantic segmentation is sufficient.
Various deep neural network architectures exist for semantic segmentation, and they are continually being improved. The initial architectures adapted standard classification architectures by replacing the fully connected layers with convolutional layers to get images as the output. These architectures are generally called fully convolutional networks. Later, architectures were designed exclusively for segmentation.
Although several architectures exist for semantic segmentation and they seem entirely different, the basic work a segmentation architecture does is largely the same; the difference lies mainly in how this basic work is enhanced. In most segmentation schemes, the input image is fed through a series of convolutional layers that decrease the spatial dimensions of the image and increase the number of channels. From this reduced form, the segmented output image is obtained by increasing the spatial dimensions back to those of the input, again through a series of convolutional layers.
Chapter 3
Implementation
3.1 Fully Convolutional Network (FCN-32s, 16s, and 8s)

FCN [1] is one of the earliest and most widely used image segmentation algorithms in deep learning. It was proposed in 2014. VGG16 [2] is chosen as the base network for FCN, which, according to the original paper, gave better performance than other standard networks like GoogLeNet.
The architecture (Figure 2) consists of a series of convolutional and max-
pooling layers. The height and width of the input image are reduced con-
tinuously because of the convolutional and maxpooling layers. Also, the
depth is increased as the number of filters used increases in deeper layers.
From the output of the last maxpooling layer, the segmented output is ob-
tained using upsampling. The major change in FCN compared to VGG16
is the replacement of the fully connected layers with convolutional layers. The
network consists of a downsampling path, used to extract and interpret
the context, and an upsampling path, which allows for localization.
There are three variations in FCN. These are FCN-32s, FCN-16s and
FCN-8s. FCN-32s is the conventional one that does not have any skip
connections as discussed before. In FCN-16s and FCN-8s, skip connec-
tions are introduced to improve the information flow and thereby the out-
put. FCN-16s has one skip connection from the second last pooling layer
whereas FCN-8s has two skip connections from the second and third last
pooling layers. FCN-32s, FCN-16s, and FCN-8s require 32x, 16x, and 8x upsampling, respectively, to obtain the segmented output; hence the names. The
basic difference in the architecture is shown below (Figure 3).
Figure 3: FCN Architecture Variations
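To make the fusion concrete, the following is a minimal PyTorch sketch of an FCN-8s-style fusion head; the class name FCN8sHead, the channel sizes, and the use of nearest-neighbour upsampling in place of learned deconvolutions are illustrative assumptions, not the project's exact code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FCN8sHead(nn.Module):
    """Sketch of the FCN-8s fusion: class scores from pool3, pool4, and
    the final feature map are upsampled and summed before an 8x
    upsampling produces the full-resolution segmentation map."""
    def __init__(self, num_classes=1, c3=256, c4=512, c5=512):
        super().__init__()
        # 1x1 convolutions replace VGG16's fully connected layers,
        # producing per-pixel class scores at each scale.
        self.score5 = nn.Conv2d(c5, num_classes, kernel_size=1)
        self.score4 = nn.Conv2d(c4, num_classes, kernel_size=1)
        self.score3 = nn.Conv2d(c3, num_classes, kernel_size=1)

    def forward(self, pool3, pool4, pool5):
        s5 = self.score5(pool5)                                      # 1/32 resolution
        s4 = self.score4(pool4) + F.interpolate(s5, scale_factor=2)  # fuse at 1/16
        s3 = self.score3(pool3) + F.interpolate(s4, scale_factor=2)  # fuse at 1/8
        return F.interpolate(s3, scale_factor=8)                     # back to input size

FCN-32s would upsample s5 directly by 32x, and FCN-16s would stop at the s4 fusion and upsample by 16x.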
3.2 U-Net
U-Net [3] architecture was proposed in 2015. It is the most popular seg-
mentation architecture which is widely used in biomedical applications.
U-Net architecture (Figure 4) consists of a contracting path and an ex-
pansive path. The contracting path follows the typical architecture of a
convolutional network. It consists of the repeated application of two
3 x 3 convolutions (unpadded convolutions), each followed by a Recti-
fied Linear Unit (ReLU) and a 2 x 2 max pooling operation with stride
2 for downsampling. At each downsampling step, the number of fea-
ture channels is doubled. Every step in the expansive path consists of an
upsampling of the feature map followed by concatenation with the corre-
spondingly cropped feature map from the contracting path, and two
3 x 3 convolutions, each followed by a ReLU. The cropping is necessary
due to the loss of border pixels in every convolution. At the final layer, a
1 x 1 convolution is used to map each feature vector to the desired num-
ber of classes. U-Net's network and training strategy rely on the strong use of data augmentation to use the available annotated samples more efficiently.
The input images are resized to 256 x 256 x 3 and the encoder output
dimension is 16 x 16 x 128. Output size is 256 x 256 x 1.
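A minimal PyTorch sketch of one contracting step and its matching expansive step may clarify the structure; the names and the use of padded convolutions (which avoid the cropping step of the original paper) are assumptions for illustration, not the thesis implementation.

import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # Two 3 x 3 convolutions, each followed by ReLU (padding=1 keeps
    # the spatial size, unlike the unpadded convolutions of the paper).
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class UNetStep(nn.Module):
    """One contracting step and the matching expansive step."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.down = nn.Sequential(nn.MaxPool2d(2), double_conv(in_ch, out_ch))
        self.up = nn.ConvTranspose2d(out_ch, in_ch, 2, stride=2)
        self.conv = double_conv(in_ch * 2, in_ch)  # after concatenation with the skip

    def forward(self, x):
        skip = x
        x = self.down(x)                 # halve H x W, double the channels
        x = self.up(x)                   # upsample back to the skip's size
        x = torch.cat([skip, x], dim=1)  # skip connection from the contracting path
        return self.conv(x)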
3.3 ResU-Net
Figure 5: Building blocks of neural networks. (a) Plain neural unit used in U-Net and
(b) Residual unit with identity mapping used in ResU-Net
The major difference between the building blocks of U-Net and ResU-
Net is shown (Figure 5). The building block of U-Net contains a series
of convolutional layers. In ResU-Net, the building block contains a skip
connection through which input of the block is added with the output.
The residual network was proposed to avoid the issues that arise when the network is made deeper. Going deeper can improve the performance of a multi-layer neural network, but it can also hamper training and cause a degradation problem. The residual network overcomes these problems. It consists of a series of stacked residual units, each of which can be illustrated in a general form:

y_l = h(x_l) + F(x_l, W_l),
x_{l+1} = f(y_l),

where x_l and x_{l+1} are the input and output of the l-th residual unit, F(·) is the residual function, f(y_l) is an activation function, and h(x_l) is an identity mapping function, a typical one being h(x_l) = x_l.
Deep ResU-Net combines the strengths of both U-Net and the residual neural network. This combination brings two benefits: 1) the residual unit eases training of the network; 2) the skip connections within a residual unit and between low and high levels of the network facilitate information propagation without degradation, making it possible to design a neural network with much fewer parameters that nevertheless achieves better performance on semantic segmentation. The original deep
residual network paper suggested a full pre-activation design. The ResU-
Net paper also employs a full pre-activation residual unit to build the ar-
chitecture.
Figure 6: ResU-Net Architecture
The ResU-Net architecture is shown (Figure 6). It consists of an en-
coding network, decoding network, and a bridge connecting both these
networks, just like a U-Net. The U-Net uses two 3 x 3 convolutions, where
each is followed by a ReLU activation function. In the case of ResU-Net,
these layers are replaced by a pre-activated residual block.
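A minimal PyTorch sketch of such a pre-activated residual block (an illustrative assumption, not the thesis implementation) could look like the following:

import torch
import torch.nn as nn

class PreActResidualBlock(nn.Module):
    """Sketch of the full pre-activation residual unit used in ResU-Net:
    BN -> ReLU -> Conv is applied twice, and the block input is added
    to the result through an identity (or 1x1 projection) shortcut."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
        )
        # h(x): identity when shapes match, otherwise a 1x1 projection
        self.shortcut = (nn.Identity() if in_ch == out_ch and stride == 1
                         else nn.Conv2d(in_ch, out_ch, 1, stride=stride))

    def forward(self, x):
        return self.body(x) + self.shortcut(x)  # x_{l+1} = h(x_l) + F(x_l)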
The encoder takes the input image and passes it through different en-
coder blocks, which helps the network to learn an abstract representation.
The encoder consists of three encoder blocks, which are built using the
pre-activated residual block. The output of each encoder block acts as
a skip connection for the corresponding decoder block. The bridge also
consists of a pre-activated residual block.
The decoder takes the feature map from the bridge and the skip con-
nections from different encoder blocks and learns a better semantic rep-
resentation, which is used to generate a segmentation mask. The decoder
consists of three decoder blocks, and after each block, the spatial dimen-
sions of the feature map are doubled and the number of feature channels
is reduced.
Each decoder block begins with a 2 x 2 upsampling, which doubles the spatial dimensions of the feature maps. These feature maps are then concatenated with the appropriate skip connection from the encoder block. These skip connections help the decoder blocks to get the features learned by the encoder network. After this, the feature maps from the concatenation operation are passed through a pre-activated residual block.
The output of the last decoder passes through a 1×1 convolution with
sigmoid activation. The sigmoid activation function gives the segmenta-
tion mask representing the pixel-wise classification.
The input images are resized to 256 x 256 x 3 and the encoder output
dimension is 16 x 16 x 256. Output size is 256 x 256 x 1.
3.4 Attention U-Net
Attention U-Net [6], proposed in 2018, introduces a novel attention gate (AG) mechanism that allows the U-Net to focus on target structures of varying size and shape.
Attention, in the context of image segmentation, is a way to highlight
only the relevant activations during training. This reduces the computa-
tional resources wasted on irrelevant activations, providing the network
with better generalisation power. Essentially, the network can pay “atten-
tion” to certain parts of the image.
Attention comes in two forms, hard and soft. Hard attention works on
the basis of highlighting relevant regions by cropping the image or iter-
ative region proposal. Since hard attention can only choose one region
of an image at a time, this has two implications: it is non-differentiable and
requires reinforcement learning to train. Since it is non-differentiable, it
means that for a given region in an image, the network can either pay
“attention” or not, with no in-between. As a result, standard backpropa-
gation cannot be done, and Monte Carlo sampling is needed to calculate
the accuracy across various stages of backpropagation.
Soft attention works by weighting different parts of the image. Areas
of high relevance are multiplied with a larger weight and areas of low
relevance are tagged with smaller weights. As the model is trained, more
focus is given to the regions with higher weights. Unlike hard attention,
these weights can be applied to many patches in the image.
Due to the deterministic nature of soft attention, it remains differen-
tiable and can be trained with standard backpropagation. As the model
is trained, the weighting is also trained such that the model gets better at
deciding which parts to pay attention to.
During upsampling in the expanding path, spatial information recre-
ated is imprecise. To counteract this problem, the U-Net uses skip con-
nections that combine spatial information from the downsampling path
with the upsampling path. However, this brings across many redundant
low-level feature extractions, as feature representation is poor in the initial
layers. Soft attention implemented at the skip connections will actively
suppress activations in irrelevant regions, reducing the number of redun-
dant features brought across.
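A minimal PyTorch sketch of an additive attention gate along these lines is given below; the class name and the simplification that the gating signal and the skip features share the same spatial size are assumptions made for illustration.

import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Sketch of an additive attention gate: the gating signal g (coarse,
    from the decoder) and the skip features x are projected, summed,
    and turned into per-pixel weights that rescale x."""
    def __init__(self, g_ch, x_ch, inter_ch):
        super().__init__()
        self.wg = nn.Conv2d(g_ch, inter_ch, kernel_size=1)
        self.wx = nn.Conv2d(x_ch, inter_ch, kernel_size=1)
        self.psi = nn.Sequential(nn.Conv2d(inter_ch, 1, kernel_size=1), nn.Sigmoid())
        self.relu = nn.ReLU(inplace=True)

    def forward(self, g, x):
        # g and x are assumed to have the same spatial size here.
        alpha = self.psi(self.relu(self.wg(g) + self.wx(x)))  # weights in (0, 1)
        return x * alpha  # suppress activations in irrelevant regions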
Figure 8: Attention U-Net
3.5 Double U-Net

Double U-Net [7] uses a VGG-19 encoder pretrained on ImageNet, whose learned features can be transferred to another task easily. The main reasons for using the VGG-19 network are: (1) VGG-19 is a lightweight model compared to other
pre-trained models, (2) the architecture of VGG-19 is similar to U-Net,
making it easy to concatenate with U-Net, and (3) it will allow much
deeper networks for producing better output segmentation mask. To cap-
ture more semantic information efficiently, another U-Net is added at the
bottom. Atrous Spatial Pyramid Pooling (ASPP) [8] is adopted to capture
contextual information within the network. ASPP uses atrous convolution
or dilated convolution which is a special type of convolution.
In atrous (dilated) convolution, the kernel is expanded by inserting gaps between its elements (Figure 9); a 3 x 3 kernel with a dilation rate of 2 covers the same field of view as a 5 x 5 kernel while using only 9 parameters. This is similar to taking a 5 x 5 kernel and deleting every second column and row, and it delivers a wider field of view at the same computational cost. Atrous convolutions are particularly popular in the field of real-time segmentation and are used when a wide field of view is needed and multiple convolutions or larger kernels cannot be afforded.
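As a small PyTorch illustration (the channel counts are arbitrary assumptions), a dilation rate of 2 with padding 2 preserves the spatial size while widening the field of view:

import torch
import torch.nn as nn

# A 3 x 3 convolution with dilation rate 2: the 9 weights are spread
# over a 5 x 5 window, so the field of view grows at no extra cost.
atrous = nn.Conv2d(in_channels=64, out_channels=64,
                   kernel_size=3, dilation=2, padding=2)  # padding=2 keeps H x W

x = torch.randn(1, 64, 32, 32)
print(atrous(x).shape)  # torch.Size([1, 64, 32, 32])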
Double U-Net also uses Squeeze and Excitation blocks [10] (Figure 11), which improve channel interdependencies at almost no computational cost.
Convolutional Neural Networks (CNN) use their filters to extract in-
formation from images. Lower layers find trivial pieces of context like
edges or high frequencies, while upper layers can detect faces, text, or
other complex geometrical shapes. All of this works by fusing the spatial
and channel information of an image. The different filters will first find
spatial features in each input channel before adding the information across
all available output channels. The network weighs each of its channels
equally when creating the output feature maps. The squeeze-and-excitation block adds a content-aware mechanism to weight each channel adaptively. To
get a global understanding of each channel, the feature maps are squeezed
into a single numeric value. This results in a vector of size n, where n is
equal to the number of convolutional channels. Then, it is fed through a
two-layer neural network, which outputs a vector of the same size. These
n values can now be used as weights on the original feature maps, scaling
each channel based on its importance.
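This squeeze-excite-scale sequence can be sketched in PyTorch as follows; the reduction ratio of 16 is a common default, assumed here for illustration rather than taken from the project.

import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    """Sketch of a squeeze-and-excitation block: global average pooling
    squeezes each channel to one number, a two-layer bottleneck network
    produces per-channel weights, and the feature maps are rescaled."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: B x C x 1 x 1
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # excite: scale each channel by its learned importance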
The architecture of the Double U-Net is shown (Figure 12). It can be
seen as two networks connected together, Network 1 and Network 2. In
Network 1, a VGG-19 pretrained on ImageNet is used as the encoder. The
encoder output is fed to the decoder through ASPP to generate the first
segmented mask, Output 1. The decoder of Network 1 is the same as that
of U-Net.
An element-wise multiplication is performed between Output 1 and the input image. This is a kind of attention mechanism. The multiplied
output is then fed to the encoder of Network 2. The encoder and decoder
of Network 2 are similar to that of U-Net. The encoder output is again
fed through ASPP to the decoder. The second segmented mask, Output 2, is produced by the decoder of Network 2.

Figure 12: Double U-Net
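The overall data flow can be summarized in a short sketch, assuming hypothetical callables net1 and net2 that each map an image to a same-size mask through their own encoder, ASPP, and decoder:

def double_unet_forward(net1, net2, image):
    """Sketch of the Double U-Net data flow; net1 and net2 are
    hypothetical encoder -> ASPP -> decoder networks."""
    output1 = net1(image)     # first segmentation mask (same H x W as image)
    gated = image * output1   # element-wise multiplication acts as attention
    output2 = net2(gated)     # second, refined segmentation mask
    return output1, output2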
Five architectures are implemented in this project. The first one is FCN, which modifies VGG16 by introducing skip
connections and replacing fully connected layers with convolutional lay-
ers to get an image as the output. Three variants of FCN (32s, 16s, and
8s) were implemented. U-Net is the second architecture implemented and
it uses an encoder-decoder architecture with skip connections from the
encoder side to the decoder to improve the information flow. ResU-Net
is the third architecture implemented which adds deep residual learning
with basic U-Net architecture in order to attain better information flow.
The fourth architecture implemented is Attention U-Net. It introduces
attention gates to suppress irrelevant regions. Double U-Net is the final
architecture implemented. It stacks two U-Nets on top of each other. To
improve the performance, ASPP, squeeze and excitation block, and atten-
tion mechanism are introduced.
3.6 Dataset
The dataset used in this project is the Kvasir-SEG dataset [11]. This
dataset is based on the previous Kvasir dataset, which is the first multi-
class dataset for Gastro-Intestinal (GI) tract disease detection and clas-
sification. The original Kvasir dataset comprises 8,000 GI tract images
from 8 classes where each class consists of 1000 images. In the Kvasir-
SEG dataset, only the polyp class of the original Kvasir dataset is used.
It consists of 1000 polyp images captured through WCE and the corre-
sponding ground truth images. The resolution of the images contained in
the dataset varies from 332 x 487 to 1920 x 1072 pixels.
3.7 Data Augmentation

By applying transformations such as different orientation, location, scale, brightness, etc., to existing data, the robustness of the model is increased and over-fitting is reduced.
Five basic augmentation methods are used which are rotation, cropping,
horizontal flip, vertical flip, and distortion. After data augmentation, the
number of images in the dataset is increased from 1000 to 6000. 80% of the dataset (4800 images) is used for training, and 10% (600 images) each is used for validation and testing. Data augmentation is carried out using the Albumentations library.
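One possible way to express these five augmentations with the Albumentations API is sketched below; the probabilities and crop size are assumptions, as the project's exact settings are not stated.

import albumentations as A

# One candidate pipeline for the five augmentations named above.
augment = A.Compose([
    A.Rotate(limit=90, p=0.5),
    A.RandomCrop(height=224, width=224, p=0.5),
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.GridDistortion(p=0.5),  # a common choice for "distortion"
])

# image and mask are numpy arrays; the same transform is applied to both:
# augmented = augment(image=image, mask=mask)
# aug_image, aug_mask = augmented["image"], augmented["mask"]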
3.9 Evaluation Metrics

1) Dice Coefficient: Dice = 2 |X ∩ Y| / (|X| + |Y|)

2) Intersection over Union (IoU): IoU = |X ∩ Y| / |X ∪ Y|

where |X| and |Y| are the cardinalities of the two sets (the number of pixels in each binary mask image).
3) Recall and Precision: Recall and precision are calculated using the equations given below:

Recall = TP / (TP + FN)
Precision = TP / (TP + FP)

where TP and TN denote the number of true positives and true negatives, and FP and FN denote the number of false positives and false negatives. A detection is considered a true positive when the center of the predicted bounding box is located within the ground truth bounding box. In binary classification, recall is also referred to as sensitivity; it reflects the model's ability to return most of the true positive samples, for example, polyps in this project. Precision represents the model's ability to detect more true positives than false positives, for example, more real polyps than incorrectly detected normal tissue.
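For binary masks, all four metrics can be computed with a short NumPy sketch (an illustration, not the project's code; it assumes the masks are non-empty so the denominators are nonzero):

import numpy as np

def segmentation_metrics(pred, truth):
    """Dice, IoU, recall, and precision for two binary masks
    (numpy arrays of 0s and 1s with the same shape)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.sum(pred & truth)    # true positives
    fp = np.sum(pred & ~truth)   # false positives
    fn = np.sum(~pred & truth)   # false negatives
    dice = 2 * tp / (2 * tp + fp + fn)   # = 2|X ∩ Y| / (|X| + |Y|)
    iou = tp / (tp + fp + fn)            # = |X ∩ Y| / |X ∪ Y|
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return dice, iou, recall, precision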
Chapter 4
Results
4.1 FCN-32s
The metric values obtained during training are good, and there is some reduction in the metric values during testing. The dice coefficient obtained during training is 0.9664, while that during testing is 0.8307. IoU
obtained during training and testing is 0.9356 and 0.7175 respectively.
The recall and precision obtained are 0.9046 and 0.9871 respectively in
training and 0.7782 and 0.8710 in testing.
The output obtained using FCN-32s is shown (Figure 13). The seg-
mented output is similar to the ground truth.
Figure 13: FCN-32s Results. (a) Input image (b) Ground truth (c) Segmented output
4.2 FCN-16s
The dice coefficient obtained during training is 0.9653, while that during testing is 0.8529. IoU obtained during training and testing is 0.9336 and
0.7502 respectively. The recall and precision obtained are 0.9038 and
0.9848 respectively in training and 0.7915 and 0.8921 in testing. The
metric values obtained are similar to those of FCN-32s. There is a
slight improvement in the metric values during testing compared to FCN-
32s.
The output obtained using FCN-16s is shown (Figure 14). The seg-
mented output is similar to the ground truth.
Figure 14: FCN-16s Results. (a) Input image (b) Ground truth (c) Segmented output
4.3 FCN-8s
The dice coefficient obtained during training is 0.9509, while that during testing is 0.8580. IoU obtained during training and testing is 0.9076 and
0.7570 respectively. The recall and precision obtained are 0.8878 and
0.9755 respectively in training and 0.8150 and 0.8800 in testing. The
metric values obtained are similar to those of the other FCN vari-
ants. There is a slight improvement in the metric values during testing
compared to the other two.
The output obtained using FCN-8s is shown (Figure 15). The seg-
mented output is similar to the ground truth.
Figure 15: FCN-8s Results. (a) Input image (b) Ground truth (c) Segmented output
4.4 U-Net
The dice coefficient obtained during training is 0.9248, while that during testing is 0.8016. IoU obtained during training and testing is 0.8618 and
0.6766 respectively. The recall and precision obtained are 0.8604 and
0.9593 respectively in training and 0.7546 and 0.8379 in testing. The
metric values are good, but there is a small reduction compared to the
FCN variants.
The output obtained using U-Net is shown (Figure 16). The segmented
output is similar to the ground truth.
Figure 16: U-Net Results. (a) Input image (b) Ground truth (c) Segmented output
4.5 ResU-Net

The dice coefficient obtained during training is 0.9576, while that during testing is 0.7649. IoU obtained during training and testing is 0.9193 and
0.6268 respectively. The recall and precision obtained are 0.8942 and
0.9832 respectively in training and 0.7111 and 0.8071 in testing. The
training metric values are better compared to U-Net, but a bit lower compared to the FCN variants.
The output obtained using ResU-Net is shown (Figure 17). The seg-
mented output is similar to the ground truth.
Figure 17: ResU-Net Results. (a) Input image (b) Ground truth (c) Segmented output
4.6 Attention U-Net

The dice coefficient obtained during training is 0.9489, while that during testing is 0.8028. IoU obtained during training and testing is 0.9039 and
0.6778 respectively. The recall and precision obtained are 0.8870 and
0.9768 respectively in training and 0.7530 and 0.8421 in testing. The
metric values are better compared to U-Net and ResU-Net, but a bit lower
compared to the FCN variants.
The output obtained using Attention U-Net is shown (Figure 18). The
segmented output is similar to the ground truth.
Figure 18: Attention U-Net Results. (a) Input image (b) Ground truth (c) Segmented
output
4.7 Double U-Net

The dice coefficient obtained during training is 0.7538, while that during testing is 0.7313. IoU obtained during training and testing is 0.6080 and
0.5803 respectively. The recall and precision obtained are 0.8878 and
0.9315 respectively in training and 0.8582 and 0.8979 in testing.
The metric values obtained during training are lower than those of the other architectures, but the test set performance is very good. The difference between the metric values of the training and test sets is very low, which means that overfitting is largely avoided. The out-
put obtained using Double U-Net is shown (Figure 19). The segmented
output is similar to the ground truth.
Figure 19: Double U-Net Results. (a) Input image (b) Ground truth (c) Segmented output

Summary: To summarize the results, each architecture gives a good segmented output that is almost identical to the ground truth. The FCN variants, especially FCN-8s, showed better performance in both the training and testing phases. As the complexity increases, the difference between the training phase results and the testing phase results keeps decreasing, which is a sign of decreasing overfitting. The metric values of each architecture in the training and testing phases are shown in Table 1.
Table 1: Results (Dice = Dice Coefficient)

                 |           Training            |            Testing
Architecture     | Dice   IoU   Recall  Precision| Dice   IoU   Recall  Precision
FCN-32s          | 0.97   0.94  0.90    0.98     | 0.83   0.72  0.78    0.87
FCN-16s          | 0.97   0.94  0.90    0.98     | 0.85   0.75  0.79    0.89
FCN-8s           | 0.95   0.91  0.89    0.97     | 0.86   0.76  0.81    0.88
U-Net            | 0.92   0.86  0.86    0.96     | 0.80   0.68  0.75    0.84
ResU-Net         | 0.96   0.91  0.89    0.98     | 0.76   0.62  0.71    0.80
Attention U-Net  | 0.95   0.90  0.89    0.98     | 0.80   0.68  0.75    0.84
Double U-Net     | 0.75   0.61  0.89    0.93     | 0.73   0.58  0.86    0.90
Chapter 5
Conclusion
Bibliography
[1] Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3431–3440, 2015.

[2] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[3] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Con-
volutional networks for biomedical image segmentation. In Inter-
national Conference on Medical image computing and computer-
assisted intervention, pages 234–241. Springer, 2015.
[4] Zhengxin Zhang, Qingjie Liu, and Yunhong Wang. Road extrac-
tion by deep residual u-net. IEEE Geoscience and Remote Sensing
Letters, 15(5):749–753, 2018.
[5] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep
residual learning for image recognition. In Proceedings of the IEEE
conference on computer vision and pattern recognition, pages 770–
778, 2016.
[6] Ozan Oktay, Jo Schlemper, Loic Le Folgoc, Matthew Lee, Mattias Heinrich, Kazunari Misawa, Kensaku Mori, Steven McDonagh, Nils Y Hammerla, Bernhard Kainz, et al. Attention u-net: Learning where to look for the pancreas. arXiv preprint arXiv:1804.03999, 2018.
[7] Debesh Jha, Michael A Riegler, Dag Johansen, Pål Halvorsen, and
Håvard D Johansen. Doubleu-net: A deep convolutional neural net-
work for medical image segmentation. In 2020 IEEE 33rd Inter-
national Symposium on Computer-Based Medical Systems (CBMS),
pages 558–564. IEEE, 2020.
[8] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE transactions on pattern analysis and machine intelligence, 37(9):1904–1916, 2015.

[10] Jie Hu, Li Shen, and Gang Sun. Squeeze-and-excitation networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7132–7141, 2018.

[11] Debesh Jha, Pia H Smedsrud, Michael A Riegler, Pål Halvorsen, Thomas de Lange, Dag Johansen, and Håvard D Johansen. Kvasir-seg: A segmented polyp dataset. In International Conference on Multimedia Modeling, pages 451–462. Springer, 2020.
Biodata
NAME: Jayakrishnan T
DATE OF BIRTH: 01 November 1995
CONTACT NO.: 7510695597
EMAIL ID: jayan01krishnan@gmail.com
EDUCATIONAL QUALIFICATIONS
BACHELOR OF TECHNOLOGY
MASTER OF TECHNOLOGY