Convolutional Neural Networks. Before Kickstarting Into CNNs We Must - by Namita - Medium
Before kickstarting into CNNs, we must first know about an image. What really is an RGB image?
It is nothing but a matrix of pixels with three planes, each corresponding to red, green, and blue. Another kind of image is a grayscale image, which is a similar matrix of pixels but with only one plane.
A tuple like (24, 24, 1) is how you represent a grayscale image of size 24x24. The first 24 in this tuple corresponds to the height of the image, the second to its width, and the third to the number of channels. Since there is just one channel, it is a grayscale image. Had the tuple been (24, 24, 3), it would be an RGB image of the same 24x24 dimensions.
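The shape convention above is easy to verify with NumPy. A minimal sketch (the arrays here are just zero-filled placeholders, not real images):

```python
import numpy as np

# Placeholder arrays illustrating the (height, width, channels) convention.
gray = np.zeros((24, 24, 1))  # one channel  -> grayscale image
rgb = np.zeros((24, 24, 3))   # three planes -> red, green, and blue

print(gray.shape)  # (24, 24, 1)
print(rgb.shape)   # (24, 24, 3)
```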
For example, a human brain can very easily differentiate between a dog and a cat when shown an image of the animal, but the same task is very difficult for a machine. It needs to be fed many images of dogs and cats and must learn the characteristics that differentiate the two animals from one another.
There are many convolutional neural network architectures out there, which we can customize and use standalone or as part of a broader model.
Let's take the example of VGG-16: it achieved top results in the ImageNet competition in 2014 and is considered a powerful model for image classification.
VGG 16 Architecture
The convolutional layer consists of multiple filters. Each filter has a smaller size
than the input image. Common filter sizes might be 3x3 or 5x5.
These filters slide over the entire input image. At each position, the filter performs element-wise multiplication between its weights and the corresponding pixel values in the receptive field of the input image, and sums the results to produce a single output value.
In this case, when the input image of size (224, 224, 3) is convolved with a set of 64 filters, the output shape becomes (224, 224, 64). Here's what each dimension represents:
(224, 224): These dimensions represent the height and width of the feature map. The spatial dimensions remain the same in this example because padding keeps them constant during convolution.
64: This represents the number of filters applied during convolution. Each filter
generates a single channel in the output. Therefore, after applying 64 filters, the
resulting feature map has 64 channels.
Essentially, whenever a convolution operation takes place without padding, the input loses dimensions along both axes, because the kernel cannot be centered on the edges of the input image; the outermost values in the matrix are left out, which results in a loss of meaningful data at the borders. However, with the use of padding we can retain the values on the edge of the input matrix and prevent the loss of dimensions.
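The shrinking-and-padding behaviour follows from the standard output-size formula, (n - k + 2p) / s + 1, which we can sketch as a small helper:

```python
def conv_output_size(n, k, padding=0, stride=1):
    """Spatial output size of a convolution along one axis:
    (input - kernel + 2*padding) // stride + 1."""
    return (n - k + 2 * padding) // stride + 1

# Without padding, a 3x3 kernel shrinks a 224-pixel axis:
print(conv_output_size(224, 3))             # 222
# With padding of 1, the size is preserved, as in the VGG example:
print(conv_output_size(224, 3, padding=1))  # 224
```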
In the next step we have the max pooling layer, and our image dimensions change in a different manner, becoming (112, 112, 128). Here's why: max pooling halves the spatial dimensions from 224 to 112, while the 128 channels come from the 128 filters used in the following convolutional block of VGG-16.
Max pooling operates by sliding a window (usually 2x2) over each feature map
generated by the convolutional layer.
At each position, the max pooling operation takes the maximum value within the window and discards the rest. This effectively reduces the spatial dimensions by half (in the case of a 2x2 window with a stride of 2). Stride means the number of steps the window moves over the pixel matrix at a time.
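The 2x2, stride-2 pooling above can be sketched with a NumPy reshape trick (illustrative only; any odd-sized border rows or columns are simply dropped):

```python
import numpy as np

def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2: keep the maximum of each
    non-overlapping 2x2 window, halving height and width."""
    h, w = fmap.shape
    trimmed = fmap[:h - h % 2, :w - w % 2]  # drop odd borders if any
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1., 2., 5., 6.],
                 [3., 4., 7., 8.],
                 [9., 1., 2., 3.],
                 [4., 5., 6., 7.]])
pooled = max_pool_2x2(fmap)
print(pooled)
# [[4. 8.]
#  [9. 7.]]
```

A 224x224 feature map pooled this way comes out 112x112, matching the VGG step above.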
Let's summarize the entire architecture of VGG-16 and discuss the final steps from the fully connected layers to the softmax activation function.
Convolutional Layers:
Feature Extraction: Convolutional layers consist of learnable filters that extract
various features from the input image using convolutions. These filters detect
patterns like edges, textures, or more complex structures.
Pooling Layers:
Spatial Reduction: Pooling layers (such as max pooling) downsample the feature
maps obtained from convolutional layers. This reduces the spatial dimensions
while retaining important information and reducing computation.
ReLU Activation:
Introducing Non-Linearity: Rectified Linear Unit (ReLU) is an activation function applied after the convolutional layers (and the fully connected layers). ReLU introduces non-linearity, helping the network learn more complex relationships in the data and alleviating the vanishing gradient problem.
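ReLU is simply f(x) = max(0, x), which zeroes out negative activations:

```python
import numpy as np

# ReLU keeps positive activations and zeroes out negative ones.
x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
y = np.maximum(0.0, x)
print(y)  # [0.  0.  0.  1.5 3. ]
```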
Softmax Activation:
Probability Distribution: Softmax is a function applied to the output layer of the
neural network in classification tasks. It normalizes the output into a probability
distribution across multiple classes, where each class gets a probability score.
This makes it suitable for multi-class classification problems, as it ensures the
sum of probabilities for all classes is 1.
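The normalization described above can be sketched in a few lines (the logits here are made-up example scores; subtracting the maximum is a standard trick for numerical stability):

```python
import numpy as np

def softmax(logits):
    """Softmax: exponentiate the scores (shifted for numerical stability)
    and normalize so they form a probability distribution summing to 1."""
    z = np.exp(logits - np.max(logits))
    return z / z.sum()

scores = np.array([2.0, 1.0, 0.1])  # hypothetical class scores
probs = softmax(scores)
print(probs.sum())  # 1.0
```

The class with the largest score also gets the largest probability, so the prediction is unchanged; softmax only rescales the scores into probabilities.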
Final Interpretation:
So, the sequence from convolutional layers through pooling and ReLU activation to the fully connected layers, concluding with softmax, represents a progression through feature extraction, spatial reduction, a transition to fully connected layers for classification, and finally a probability distribution over the output classes.
Written by Namita
Tech enthusiast, keen observer, consistent learner and a data science student.