UNIT 3 Self Notes
Padding
• Padding is a technique used to preserve the spatial dimensions of the input after a convolution
operation on a feature map, and it can improve the performance of the model.
• Padding is simply the process of adding layers of zeros around our input images, which avoids
the shrinking of the output and the loss of information at the image borders.
• Padding involves adding extra pixels around the border of the input feature map before convolution.
• Same Padding: padding is added to the input feature map so that the output feature map has
the same size as the input feature map. This is useful when we want to preserve the spatial
dimensions of the feature maps.
• This prevents shrinking: if p = number of layers of zeros added to the border of the
image, then our (n x n) image becomes an (n + 2p) x (n + 2p) image after padding. So,
applying the convolution operation (with an (f x f) filter) outputs an (n + 2p – f + 1) x (n + 2p – f
+ 1) image. For example, adding one layer of padding (p = 1) to an (8 x 8) image and using a
(3 x 3) filter, we get an (8 x 8) output (8 + 2 – 3 + 1 = 8) after the convolution operation.
• This increases the contribution of the pixels at the border of the original image by
bringing them into the middle of the padded image. Thus, information on the borders is
preserved as well as the information in the middle of the image.
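The padding arithmetic above can be checked with a minimal sketch; `conv_output_size` is a hypothetical helper name, not a standard library function:

```python
# Output size of a convolution: an (n x n) input, p layers of zero padding,
# and an (f x f) filter give an (n + 2p - f + 1) x (n + 2p - f + 1) output.

def conv_output_size(n: int, f: int, p: int = 0) -> int:
    """Spatial size of the convolution output after padding."""
    return n + 2 * p - f + 1

# The example from the notes: an 8x8 image, one layer of padding, a 3x3 filter.
print(conv_output_size(8, 3, p=1))  # 8 -> "same" padding preserves the size
print(conv_output_size(8, 3, p=0))  # 6 -> no padding shrinks the image
```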
Edge Detection
As the name suggests, edge detection is the process of detecting the edges in an image.
Convolution operations are widely used in image processing for tasks such as edge
detection. The basic idea behind edge detection using convolutions is to apply a
convolutional kernel (also known as a filter) to an image.
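A small sketch of this idea, assuming numpy is available: a naive convolution loop applies a vertical-edge (Sobel) kernel to a synthetic image whose left half is bright and right half is dark, so the response is strong only where the intensity changes:

```python
import numpy as np

def convolve2d(image, kernel):
    """Naive 'valid' 2D convolution (really cross-correlation, as in most
    deep-learning libraries): slide the kernel and sum the products."""
    n, f = image.shape[0], kernel.shape[0]
    out = np.zeros((n - f + 1, n - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + f, j:j + f] * kernel)
    return out

# Vertical-edge (Sobel) kernel: responds where intensity changes left-to-right.
sobel_x = np.array([[1, 0, -1],
                    [2, 0, -2],
                    [1, 0, -1]])

# Synthetic 6x6 image: bright left half, dark right half -> one vertical edge.
image = np.zeros((6, 6))
image[:, :3] = 10.0

edges = convolve2d(image, sobel_x)
print(edges)  # nonzero only in the columns straddling the edge
```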
Strided Convolution
A strided convolution is another basic building block of convolution that is used in
Convolutional Neural Networks. With stride s, the filter moves s pixels at a time instead
of one, so an (n x n) input with padding p and an (f x f) filter produces an output of size
⌊(n + 2p – f) / s⌋ + 1.
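A minimal sketch of a strided convolution, assuming numpy; the filter window jumps `stride` pixels at a time, which downsamples the output:

```python
import numpy as np

def strided_convolve(image, kernel, stride=1):
    """Naive 2D convolution with a stride (no padding): the window
    advances `stride` pixels between applications of the filter."""
    n, f = image.shape[0], kernel.shape[0]
    out_size = (n - f) // stride + 1   # floor((n - f) / s) + 1
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            r, c = i * stride, j * stride
            out[i, j] = np.sum(image[r:r + f, c:c + f] * kernel)
    return out

image = np.arange(49, dtype=float).reshape(7, 7)
kernel = np.ones((3, 3))

# floor((7 - 3) / 2) + 1 = 3, so stride 2 downsamples the 7x7 input to 3x3.
out = strided_convolve(image, kernel, stride=2)
print(out.shape)  # (3, 3)
```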
1×1 Convolution
A convolutional layer with a 1×1 filter can be used at any point
in a convolutional neural network to control the number of feature maps.
As such, it is often referred to as a projection operation or projection layer,
or even a feature-map or channel pooling layer.
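Because a 1×1 filter mixes channels at each pixel independently, it amounts to a matrix multiply over the channel axis. A sketch with illustrative shapes, assuming numpy:

```python
import numpy as np

# A 1x1 convolution projects c_in channels down to c_out channels at every
# spatial position, leaving the spatial dimensions untouched.
h, w, c_in, c_out = 8, 8, 64, 16

feature_map = np.random.randn(h, w, c_in)
weights = np.random.randn(c_in, c_out)   # one 1x1 filter per output channel

projected = feature_map @ weights        # (8, 8, 64) -> (8, 8, 16)
print(projected.shape)  # spatial size kept, channel count reduced
```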
Inception Network
An inception network solves the problem of choosing a single filter size by saying,
“Why shouldn’t we apply them all?”.
This makes the network architecture more complicated, but it remarkably improves
performance as well. Let’s see how this works.
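A sketch of the inception idea, assuming numpy: several branches (different filter sizes plus pooling) run in parallel on the same input, keep the spatial size with "same" padding, and their outputs are concatenated along the channel axis. The branch outputs are faked here with random maps of the right shape; a real block would compute them with 1×1, 3×3, 5×5 convolutions and max pooling:

```python
import numpy as np

# Each branch preserves the 28x28 spatial size; only channel counts differ.
h, w = 28, 28
branch_1x1 = np.random.randn(h, w, 64)    # stand-in for a 1x1 conv branch
branch_3x3 = np.random.randn(h, w, 128)   # stand-in for a 3x3 conv branch
branch_5x5 = np.random.randn(h, w, 32)    # stand-in for a 5x5 conv branch
branch_pool = np.random.randn(h, w, 32)   # stand-in for a pooling branch

# "Apply them all": stack the branch outputs along the channel axis.
inception_out = np.concatenate(
    [branch_1x1, branch_3x3, branch_5x5, branch_pool], axis=-1)
print(inception_out.shape)  # (28, 28, 256)
```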
YOLO (You Only Look Once)
The YOLO object-detection algorithm works using three techniques:
1. Residual blocks
2. Bounding box regression
3. Intersection Over Union (IOU)
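IOU measures how much a predicted box overlaps a ground-truth box: the area of their intersection divided by the area of their union. A minimal sketch, with boxes given as (x1, y1, x2, y2) corners:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle (empty if the boxes do not intersect).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    return inter / (area_a + area_b - inter)

# Two 2x2 boxes overlapping in a 1x1 square: IOU = 1 / (4 + 4 - 1) = 1/7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```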
First, the image is divided into an S x S grid of cells.
Each grid cell detects the objects whose center falls within it.
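The cell assignment can be sketched as follows; `responsible_cell` is a hypothetical helper name, and the 448x448 image with a 7x7 grid is the classic YOLO setup:

```python
def responsible_cell(cx, cy, img_w, img_h, S):
    """Return (row, col) of the S x S grid cell containing the object
    center (cx, cy); that cell is responsible for detecting the object."""
    col = min(int(cx / img_w * S), S - 1)   # clamp points on the far edge
    row = min(int(cy / img_h * S), S - 1)
    return row, col

# An object centered at (100, 300) in a 448x448 image with a 7x7 grid:
print(responsible_cell(100, 300, 448, 448, S=7))  # (4, 1)
```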