Convolutional Neural Network


What Is a CNN?
In deep learning, a convolutional neural network (CNN, or ConvNet) is a
class of deep neural networks most commonly applied to analysing visual
imagery. When we think of a neural network we usually think of plain
matrix multiplications, but that is not the whole story with a ConvNet: it
uses a special operation called convolution. In mathematics, convolution
is an operation on two functions that produces a third function expressing
how the shape of one is modified by the other.

The bottom line is that the role of the ConvNet is to reduce the images
into a form that is easier to process, without losing features that are
critical for getting a good prediction.

How does it work?

Before we get to the working of CNNs, let's cover the basics: what an
image is and how it is represented. An RGB image is a matrix of pixel
values with three planes (one per colour channel), whereas a grayscale
image is the same but with a single plane.

Filters (or kernels) in CNNs are used to perform convolution operations
on input data, such as images, to extract relevant features and patterns.
A filter acts as a small window that is moved across the input, scanning
for specific patterns and transforming the input into feature maps.
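As a quick sketch of this representation (using NumPy, with made-up pixel values), a grayscale image is a single 2-D plane, while an RGB image adds a channel dimension:

```python
import numpy as np

# Hypothetical 4x4 grayscale image: one plane of pixel intensities (0-255).
gray = np.array([[ 12,  50, 200,   7],
                 [ 90, 128,  64,  30],
                 [  0, 255,  18,  77],
                 [ 45,  33, 210,  99]], dtype=np.uint8)

# Hypothetical RGB image of the same size: three planes (height, width, channels).
rgb = np.zeros((4, 4, 3), dtype=np.uint8)
rgb[..., 0] = gray  # red channel, reusing the grayscale values for illustration

print(gray.shape)  # (4, 4)    -> single plane
print(rgb.shape)   # (4, 4, 3) -> three planes
```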

For simplicity, let's stick with grayscale images as we try to understand how
CNNs work.

In a convolution, we take a filter/kernel (for example, a 3×3 matrix) and
slide it over the input image: at each position, the overlapping values are
multiplied element-wise and summed to produce one entry of the convolved
feature. This convolved feature is passed on to the next layer.

In the case of an RGB image, the same operation is applied to each of the
three colour channels, and the per-channel results are summed into a single
feature map.
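One step of that sliding-window computation can be sketched as an element-wise multiply-and-sum (the patch and filter values here are purely illustrative):

```python
import numpy as np

# A 3x3 patch of the image currently under the filter (illustrative values).
patch = np.array([[2, 3, 1],
                  [0, 1, 2],
                  [3, 0, 1]])

# A 3x3 filter/kernel.
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])

# One entry of the convolved feature: element-wise product, then sum.
value = int(np.sum(patch * kernel))
print(value)  # 8
```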
What Is a Pooling Layer?

Similar to the convolutional layer, the pooling layer is responsible for
reducing the spatial size of the convolved feature. This decreases the
computational power required to process the data by reducing its
dimensions.

There are two common types of pooling: average pooling and max pooling.

In max pooling, we take the maximum pixel value from the portion of the
image covered by the kernel.

Average pooling, on the other hand, returns the average of all the values
from the portion of the image covered by the kernel. Max pooling also
discards weak activations along with noise, which is why it often performs
better than average pooling in practice.
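Both operations can be sketched on a small feature map (values invented for illustration), using 2×2 windows with a stride of 2:

```python
import numpy as np

feature_map = np.array([[1, 3, 2, 4],
                        [5, 6, 1, 2],
                        [7, 2, 9, 0],
                        [3, 4, 1, 8]])

def pool(x, size=2, mode="max"):
    """Pool non-overlapping size x size windows of a 2-D array."""
    h, w = x.shape
    out = np.empty((h // size, w // size))
    for i in range(0, h, size):
        for j in range(0, w, size):
            window = x[i:i + size, j:j + size]
            out[i // size, j // size] = window.max() if mode == "max" else window.mean()
    return out

# Max pooling keeps the strongest activation per window: [[6, 4], [7, 9]].
print(pool(feature_map, mode="max"))
# Average pooling smooths each window: [[3.75, 2.25], [4.0, 4.5]].
print(pool(feature_map, mode="avg"))
```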
CNN - REAL-TIME EXAMPLE

Face Recognition:
CNNs are widely used for face recognition tasks. They can learn to extract
discriminative features from facial images, enabling accurate
identification and verification of individuals. Here's how CNNs are
applied in face recognition systems:
 Data Collection and Preprocessing: A dataset of facial images
is collected, including images of different individuals under various
lighting conditions, angles, and facial expressions. The dataset is
labeled with corresponding identities.
 Architecture Design: A CNN architecture is designed specifically
for face recognition. It typically consists of convolutional layers,
pooling layers, and fully connected layers. The architecture can be
customized based on the complexity of the task and the available
dataset.
 Training: The CNN is trained using a supervised learning
approach. It learns to extract discriminative facial features that are
essential for distinguishing between different individuals. The
network's internal parameters are adjusted through
backpropagation and gradient descent to minimize the difference
between predicted identities and ground truth labels.
 Feature Extraction: Once the CNN is trained, it can extract
features from facial images. By applying convolutions and pooling
operations, the network identifies important facial patterns and
features that are crucial for accurate recognition.
 Classification: The extracted features are then used for
classification. The CNN maps the facial features to a latent space,
where classification algorithms are applied to determine the identity
of the individual. The network assigns a label based on the learned
patterns and similarities between the input image and the training
data.
 Evaluation and Testing: The trained CNN is evaluated on
separate test datasets to assess its performance. Metrics such as
accuracy, precision, recall, and F1 score are used to evaluate the
CNN's ability to correctly recognize individuals.
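The pipeline above can be sketched end-to-end in plain NumPy. This is not a real face-recognition system: the image, filter, dense weights, and the two-identity setup are all invented, and the parameters are untrained random values, but the flow (convolution → activation → pooling → flattening → classification) mirrors the steps described:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(image, kernel):
    """Valid 2-D convolution (no padding, stride 1)."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0)

def max_pool(x, size=2):
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical 8x8 grayscale face crop and random (untrained) parameters.
image = rng.random((8, 8))
kernel = rng.standard_normal((3, 3))   # one convolutional filter
weights = rng.standard_normal((2, 9))  # dense layer: 2 identities, 3x3=9 features
bias = np.zeros(2)

features = max_pool(relu(conv2d(image, kernel)))    # feature extraction: (6,6) -> (3,3)
probs = softmax(weights @ features.ravel() + bias)  # classification over 2 identities
print(probs.shape)  # (2,) -> one probability per identity, summing to 1
```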
CNN - NUMERICAL EXAMPLE

Let's represent the grayscale image and the convolutional filter in matrix
form to illustrate the numerical example using Convolutional Neural
Networks (CNNs).

Grayscale Image (5x5):

2 3 1 0 4

0 1 2 2 3

3 0 1 3 2

2 1 0 4 1

1 2 2 1 3

We can represent this image as a matrix I of size 5x5.

Convolutional Filter (3x3):

1 0 1

0 1 0

1 0 1

We can represent this filter as a matrix F of size 3x3.


Convolution operation

To perform the convolution operation, we slide the filter over the image
and, at each position, take the element-wise product of the filter and
the image patch and sum the results.

Let's define the matrices I and F:

I = 2 3 1 0 4        F = 1 0 1
    0 1 2 2 3            0 1 0
    3 0 1 3 2            1 0 1
    2 1 0 4 1
    1 2 2 1 3

We perform the convolution by sliding the filter F over the image I and,
at each position, multiplying element-wise and summing.

Let's perform the convolution by sliding the filter over the image:
At the top-left position:

2 3 1     1 0 1
0 1 2  ×  0 1 0  =  2 + 1 + 1 + 3 + 1 = 8
3 0 1     1 0 1

At the top-centre position:

3 1 0     1 0 1
1 2 2  ×  0 1 0  =  3 + 0 + 2 + 0 + 3 = 8
0 1 3     1 0 1

Proceeding in a similar manner, we compute the convolution at each
position, resulting in the feature map (convolved feature):

8 8 10

4 9 9

8 6 12

The resulting feature map represents the filtered output of the original
image under the convolutional filter. Each value in the feature map is
computed by convolving the filter with the corresponding image patch.
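This sliding-window computation can be verified with a few lines of NumPy (a plain-loop sketch, without any deep-learning library):

```python
import numpy as np

I = np.array([[2, 3, 1, 0, 4],
              [0, 1, 2, 2, 3],
              [3, 0, 1, 3, 2],
              [2, 1, 0, 4, 1],
              [1, 2, 2, 1, 3]])

F = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 0, 1]])

# Valid convolution: slide the 3x3 filter over the 5x5 image (stride 1).
out = np.array([[np.sum(I[i:i + 3, j:j + 3] * F) for j in range(3)]
                for i in range(3)])
print(out)  # feature map: [[8, 8, 10], [4, 9, 9], [8, 6, 12]]
```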
LIMITATIONS OF CNNS

Despite their power and computational cost, CNNs deliver impressive
results. At root, a CNN is recognizing patterns and details so minute and
inconspicuous that they go unnoticed by the human eye. But when it comes
to understanding the contents of an image in context, it falls short.

Consider an example: given a photo of an adult lifting a child, a CNN
might detect a person in their mid-30s and a child of around 10 years.
But when we look at the same image, we start imagining multiple different
scenarios: perhaps a father-and-son day out, a picnic, or a camping trip;
perhaps it is a school ground and the child has scored a goal, so his
delighted father lifts him up.
