Convolutional Neural Networks. Before Kickstarting Into CNNs We Must - by Namita - Medium
Before kickstarting into CNNs, we must first know about an image. What really is an RGB image?
It is nothing but a matrix of pixels with three planes, each corresponding to red, green, and blue. Another kind of image is a grayscale image, which is a similar matrix of pixels but with only one plane.
A tuple like (24, 24, 1) is how you represent a grayscale image of size 24x24. The first 24 in this tuple corresponds to the height of the image, the second to its width, and the third to the number of channels. Since there is just one channel, it is a grayscale image. Had the tuple been (24, 24, 3), it would be an RGB image of the same 24x24 dimensions.
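The shape convention above is easy to verify with NumPy. A minimal sketch (the arrays here are just zero-filled placeholders, not real images):

```python
import numpy as np

# Placeholder arrays illustrating the (height, width, channels) convention.
gray = np.zeros((24, 24, 1))  # one channel  -> grayscale image
rgb = np.zeros((24, 24, 3))   # three planes -> red, green, and blue

print(gray.shape)  # (24, 24, 1)
print(rgb.shape)   # (24, 24, 3)
```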
For example, a human brain can very easily differentiate between a dog and a cat when shown an image of the animal, but the same task is very difficult for a machine. It needs to be fed many images of dogs and cats and must learn the characteristics that differentiate the two animals from one another.
There are many convolutional neural network architectures out there, which we can customize and use standalone or as part of a broader model.
Let's take the example of VGG-16: it achieved top results in the ImageNet competition in 2014 and is considered a powerful model for image classification.
VGG 16 Architecture
The convolutional layer consists of multiple filters. Each filter has a smaller size
than the input image. Common filter sizes might be 3x3 or 5x5.
These filters slide over the entire input image. At each position, the filter performs element-wise multiplication between its weights and the corresponding pixel values in the receptive field of the input image, and sums the results to produce a single output value.
In this case, when the input image of size (224, 224, 3) is convolved with a set of 64 filters, the output shape becomes (224, 224, 64). Here's what each dimension represents:
(224, 224): These dimensions represent the height and width of the feature map. The spatial dimensions remain the same in this example because padding keeps them constant during convolution.
64: This represents the number of filters applied during convolution. Each filter
generates a single channel in the output. Therefore, after applying 64 filters, the
resulting feature map has 64 channels.
Essentially, whenever a convolution operation takes place without padding, the input loses dimensions along both axes, because the kernel cannot be centered on the edges of the input image; the outermost values in the matrix are left out, which results in a loss of meaningful data at the borders. However, with the use of padding we can retain the values on the edge of the input matrix and prevent the loss of dimensions.
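The shrinking-and-padding behaviour follows from the standard output-size formula, (n - k + 2p) / s + 1, which we can sketch as a small helper:

```python
def conv_output_size(n, k, padding=0, stride=1):
    """Spatial output size of a convolution along one axis:
    (input - kernel + 2*padding) // stride + 1."""
    return (n - k + 2 * padding) // stride + 1

# Without padding, a 3x3 kernel shrinks a 224-pixel axis:
print(conv_output_size(224, 3))             # 222
# With padding of 1, the size is preserved, as in the VGG example:
print(conv_output_size(224, 3, padding=1))  # 224
```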
In the next step we have the max pooling layer, and our image dimensions change in a different manner, becoming (112, 112, 128). Here's why: max pooling halves the spatial dimensions from 224 to 112, while the 128 channels come from the 128 filters used in the following convolutional block of VGG-16.
Max pooling operates by sliding a window (usually 2x2) over each feature map
generated by the convolutional layer.
At each position, the max pooling operation takes the maximum value within the window and discards the rest. This effectively reduces the spatial dimensions by half (in the case of a 2x2 window with a stride of 2). Stride means the number of steps the window moves over the pixel matrix at a time.
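The 2x2, stride-2 pooling above can be sketched with a NumPy reshape trick (illustrative only; any odd-sized border rows or columns are simply dropped):

```python
import numpy as np

def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2: keep the maximum of each
    non-overlapping 2x2 window, halving height and width."""
    h, w = fmap.shape
    trimmed = fmap[:h - h % 2, :w - w % 2]  # drop odd borders if any
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1., 2., 5., 6.],
                 [3., 4., 7., 8.],
                 [9., 1., 2., 3.],
                 [4., 5., 6., 7.]])
pooled = max_pool_2x2(fmap)
print(pooled)
# [[4. 8.]
#  [9. 7.]]
```

A 224x224 feature map pooled this way comes out 112x112, matching the VGG step above.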
Let's summarize the entire architecture of VGG-16 and discuss the final steps from the fully connected layers to the softmax activation function.
Convolutional Layers:
Feature Extraction: Convolutional layers consist of learnable filters that extract
various features from the input image using convolutions. These filters detect
patterns like edges, textures, or more complex structures.
Pooling Layers:
Spatial Reduction: Pooling layers (such as max pooling) downsample the feature
maps obtained from convolutional layers. This reduces the spatial dimensions
while retaining important information and reducing computation.
ReLU Activation:
Introducing Non-Linearity: Rectified Linear Unit (ReLU) is an activation function applied after the convolutional layers (and the fully connected layers). ReLU introduces non-linearity, helping the network learn more complex relationships in the data and alleviating the vanishing gradient problem.
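ReLU is simply f(x) = max(0, x), which zeroes out negative activations:

```python
import numpy as np

# ReLU keeps positive activations and zeroes out negative ones.
x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
y = np.maximum(0.0, x)
print(y)  # [0.  0.  0.  1.5 3. ]
```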
Softmax Activation:
Probability Distribution: Softmax is a function applied to the output layer of the
neural network in classification tasks. It normalizes the output into a probability
distribution across multiple classes, where each class gets a probability score.
This makes it suitable for multi-class classification problems, as it ensures the
sum of probabilities for all classes is 1.
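The normalization described above can be sketched in a few lines (the logits here are made-up example scores; subtracting the maximum is a standard trick for numerical stability):

```python
import numpy as np

def softmax(logits):
    """Softmax: exponentiate the scores (shifted for numerical stability)
    and normalize so they form a probability distribution summing to 1."""
    z = np.exp(logits - np.max(logits))
    return z / z.sum()

scores = np.array([2.0, 1.0, 0.1])  # hypothetical class scores
probs = softmax(scores)
print(probs.sum())  # 1.0
```

The class with the largest score also gets the largest probability, so the prediction is unchanged; softmax only rescales the scores into probabilities.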
Final Interpretation:
So, the sequence from convolutional layers through pooling and ReLU activation to the fully connected layers, concluding with softmax, represents a progression through feature extraction, spatial reduction, a transition to fully connected layers for classification, and finally a probability distribution over the output classes.
Written by Namita
Tech enthusiast, keen observer, consistent learner and a data science student.