Convolutional Neural Networks


Namita · 5 min read · Dec 11, 2023

Before kickstarting into CNNs, we must first understand what an image is. What really is an RGB image?

It is nothing but a matrix of pixels with three planes (channels), one each corresponding to red, green, and blue. Another kind of image is a grayscale image, which is a similar matrix of pixels but with only one plane.

Pixel matrix representation of Grayscale and RGB images

Consider the tuple (24,24,1), but what does it signify?



It is how you represent a grayscale image of size 24x24. The first 24 in this tuple corresponds to the height of the image, the second to its width, and the third to the number of channels. Since there is just one channel, it is a grayscale image.

Had the tuple been (24,24,3), it would be an RGB image of 24x24 dimensions.
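To make this concrete, here is a tiny NumPy sketch (my own addition, not from the article) showing the two shapes side by side:

```python
import numpy as np

# A 24x24 grayscale image: height, width, and a single channel
gray = np.zeros((24, 24, 1), dtype=np.uint8)

# A 24x24 RGB image: height, width, and three channels (red, green, blue)
rgb = np.zeros((24, 24, 3), dtype=np.uint8)

print(gray.shape)  # (24, 24, 1)
print(rgb.shape)   # (24, 24, 3)
```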

What are convolutional neural networks?


As we already know, neural networks are networks of artificial neurons that try to imitate the human brain as closely as they can; along similar lines, convolutional neural networks assist in the task of extracting information from image data.

For example, a human brain can very easily differentiate between a dog and a cat when shown an image of the animal, but the same task is very difficult for a machine. It needs to be fed many images of dogs and cats and must learn the characteristics that differentiate the two animals from one another.

What is the math behind this process? Let’s find out.


An image, which is essentially a matrix of pixels, is given as input to a CNN model. A small matrix of weights called a filter (or kernel) is slid across the input image, and at each position the overlapping values are multiplied element-wise and summed, a dot product. The result of this operation is another matrix of values called a feature map (or convolution map).
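To see what this sliding dot product looks like in code, here is a minimal NumPy sketch (my addition; CNN libraries implement this far more efficiently):

```python
import numpy as np

def convolve2d(image, kernel):
    """Valid (no-padding) convolution as used in CNNs (technically cross-correlation)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Dot product between the kernel and the current receptive field
            feature_map[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return feature_map

image = np.arange(25, dtype=float).reshape(5, 5)   # a toy 5x5 "image"
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)       # a simple vertical-edge detector
print(convolve2d(image, kernel).shape)  # (3, 3): the output shrinks without padding
```

Note how the 5x5 input shrinks to a 3x3 feature map; padding, discussed below, is what prevents this shrinkage.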


There are many convolutional neural network architectures out there, which we can customize and use standalone or as part of a larger model.

Let's take the example of VGG-16. It was among the top performers in the ImageNet (ILSVRC) 2014 competition and is considered a powerful model for image classification.

Let's have a look at its architecture:

VGG 16 Architecture

An image of size (224, 224, 3) is given as input to the network and passed to the first convolutional layer. Notice the dimensions changing to (224, 224, 64) as the convolution operation begins, due to the following reasons:

The convolutional layer consists of multiple filters. Each filter is smaller than the input image; common filter sizes are 3x3 or 5x5 (VGG-16 uses 3x3 filters).

These filters slide over the entire input image. At each position, the filter
performs element-wise multiplication between its weights and the
corresponding pixel values in the receptive field of the input image.

The result of these multiplications is summed up to produce a single value at each position where the filter convolves over the image. This produces a new array of values, forming a new feature map.


In this case, when the input image of size (224, 224, 3) is convolved with a set of 64 filters, the output shape becomes (224, 224, 64). Here's what each dimension represents:

(224, 224): These dimensions represent the height and width of the feature map. The spatial dimensions remain the same in this example because "same" padding is used, which keeps the spatial size constant during convolution.

64: This represents the number of filters applied during convolution. Each filter
generates a single channel in the output. Therefore, after applying 64 filters, the
resulting feature map has 64 channels.

Essentially, whenever a convolution operation takes place without padding, the output shrinks along both spatial axes, because the kernel cannot be centered directly on the edge pixels of the input image; the outermost values in the matrix are left out, which results in a loss of meaningful data. However, with the use of padding we can retain the values at the edge of the input matrix and prevent the dimensions from shrinking.
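As an illustration of this point (a minimal Keras sketch I've added; the article itself does not include code), a single VGG-style layer with 64 filters and "same" padding keeps the 224x224 spatial size:

```python
import tensorflow as tf

# One VGG-style convolutional layer: 64 filters of size 3x3, "same" padding
inputs = tf.keras.Input(shape=(224, 224, 3))
x = tf.keras.layers.Conv2D(64, kernel_size=3, padding="same", activation="relu")(inputs)
model = tf.keras.Model(inputs, x)

print(model.output_shape)  # (None, 224, 224, 64): spatial size kept, 64 channels out
```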

In the next step we have the max pooling layer, and our image dimensions change in a different manner: pooling halves the spatial size to (112, 112, 64), and the following convolutional block then doubles the number of filters, giving (112, 112, 128).

Max pooling is a downsampling operation commonly used in convolutional neural networks (CNNs) after convolutional layers. It reduces the spatial dimensions (width and height) of each feature map while retaining the number of channels.

Here's why the spatial dimensions are halved to (112, 112) after applying a max pooling layer:

Max pooling operates by sliding a window (usually 2x2) over each feature map
generated by the convolutional layer.

At each position, the max pooling operation takes the maximum value within the window and discards the rest. This effectively reduces the spatial dimensions by half (in the case of a 2x2 window with a stride of 2). Stride means the number of steps the window moves over the pixel matrix.

However, the number of channels (depth) remains unchanged. Each channel in the input retains its information; max pooling only affects the spatial dimensions, which is why only the height and width change from (224, 224) to (112, 112), as the short sketch below illustrates.
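Here is the corresponding pooling sketch (again my addition): a 2x2 max pooling layer with stride 2 halves the height and width while leaving the channel count alone.

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(224, 224, 64))
x = tf.keras.layers.MaxPooling2D(pool_size=2, strides=2)(inputs)
model = tf.keras.Model(inputs, x)

print(model.output_shape)  # (None, 112, 112, 64): height and width halved, channels unchanged
```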


After a series of convolution and pooling operations, we arrive at the fully connected layers plus ReLU, and finally the softmax activation function.

Let's summarize the entire architecture of VGG-16 and discuss the final steps, from the fully connected layers to the softmax activation function.

Convolutional Layers:
Feature Extraction: Convolutional layers consist of learnable filters that extract
various features from the input image using convolutions. These filters detect
patterns like edges, textures, or more complex structures.

Pooling Layers:
Spatial Reduction: Pooling layers (such as max pooling) downsample the feature
maps obtained from convolutional layers. This reduces the spatial dimensions
while retaining important information and reducing computation.

Fully Connected Layers:
Transition to Classification: Fully connected layers take the flattened output from the convolutional and pooling layers and connect every neuron from the previous layer to the next. This part of the network combines the extracted features and learns to map them to the output classes or categories.

ReLU Activation:
Introducing Non-Linearity: Rectified Linear Unit (ReLU) is an activation function applied after the convolutional and fully connected layers (all except the final output layer). ReLU introduces non-linearity, helping the network learn more complex relationships in the data and alleviating the vanishing gradient problem.
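ReLU itself is just max(0, x) applied element-wise; a short snippet makes this obvious (my addition):

```python
import numpy as np

def relu(x):
    # Negative values are clipped to zero; positive values pass through unchanged
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))  # [0.  0.  0.  1.5 3. ]
```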

Softmax Activation:
Probability Distribution: Softmax is a function applied to the output layer of the
neural network in classification tasks. It normalizes the output into a probability
distribution across multiple classes, where each class gets a probability score.
This makes it suitable for multi-class classification problems, as it ensures the
sum of probabilities for all classes is 1.
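As a quick numeric illustration (my addition), softmax turns a vector of raw scores into probabilities that sum to 1:

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs)        # approximately [0.659 0.242 0.099]
print(probs.sum())  # 1.0
```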

Final Interpretation:
So, the sequence from convolutional layers to fully connected layers with ReLU activation, concluding with softmax, represents a progression through feature extraction, spatial reduction, a transition to fully connected layers for classification, the introduction of non-linearity, and finally the production of a probability distribution over the possible output classes. This architecture is commonly used in CNNs for tasks like image classification, where the network learns hierarchical representations of the input data to make predictions about the class or category to which it belongs.
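To close the loop, here is a minimal sketch (my addition) of running the full pretrained VGG-16 in Keras. A random array stands in for a real image, so the predicted labels are meaningless, but the shapes match the architecture walked through above:

```python
import numpy as np
import tensorflow as tf

# Load VGG-16 pretrained on ImageNet (downloads the weights on first use)
model = tf.keras.applications.VGG16(weights="imagenet")

# A random stand-in for a real 224x224 RGB image, batched and preprocessed
image = np.random.rand(1, 224, 224, 3) * 255.0
image = tf.keras.applications.vgg16.preprocess_input(image)

probs = model.predict(image)  # shape (1, 1000): softmax over the ImageNet classes
print(tf.keras.applications.vgg16.decode_predictions(probs, top=3))
```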
