Convolutional Neural Network
What Is a CNN?
In deep learning, a convolutional neural network (CNN or ConvNet) is a
class of deep neural networks most commonly applied to analysing visual
imagery. When we think of a neural network we usually think of plain matrix
multiplications, but that is not the whole story with a ConvNet: it uses a
special operation called convolution. In mathematics, convolution is an
operation on two functions that produces a third function expressing how
the shape of one is modified by the other.
For simplicity, let’s stick with grayscale images as we try to understand how
CNNs work.
A convolution takes a filter/kernel (for example a 3×3 matrix) and slides
it across the input image; at each position, the element-wise products of
the filter and the underlying image patch are summed to produce one value
of the convolved feature (the feature map). This feature map is passed on
to the next layer.
In the case of an RGB image, the filter has one slice per colour channel,
and the three channel-wise results are summed into a single output value.
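The sliding-filter operation described above can be sketched in a few lines of NumPy. This is a minimal illustration with made-up image and kernel values, not an optimized implementation; note that deep-learning libraries actually compute cross-correlation (no filter flip), which is what is shown here.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 'convolution' (cross-correlation, as used in CNNs)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Multiply the kernel element-wise with the image patch and sum.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A small grayscale image and a 3x3 kernel (illustrative values).
img = np.array([[1, 2, 0, 1],
                [0, 1, 3, 1],
                [2, 1, 0, 2],
                [1, 0, 1, 1]], dtype=float)
k = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 0, 1]], dtype=float)
print(conv2d(img, k))  # 2x2 feature map: [[4, 9], [6, 3]]
```

A 4×4 image convolved with a 3×3 filter shrinks to 2×2, which is why real networks often pad the input to preserve spatial size.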
What Is a Pooling Layer?
A pooling layer downsamples each feature map, reducing its spatial size
while retaining the strongest responses. There are two common types:
max pooling and average pooling. Max Pooling returns the maximum value
from the portion of the image covered by the kernel, while Average Pooling
returns the average of all the values in that portion. Max Pooling is often
preferred in practice because it keeps the most salient activations and
discards weak, noisy ones.
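Both pooling variants can be sketched with one small function. This is a minimal, non-overlapping pooling sketch (stride equal to window size) on made-up values:

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping size x size pooling (stride = size)."""
    h, w = feature_map.shape
    out = np.zeros((h // size, w // size))
    for i in range(0, h - size + 1, size):
        for j in range(0, w - size + 1, size):
            window = feature_map[i:i + size, j:j + size]
            # Max pooling keeps the strongest response; average pooling
            # keeps the mean of the window.
            out[i // size, j // size] = window.max() if mode == "max" else window.mean()
    return out

fm = np.array([[1, 3, 2, 4],
               [5, 6, 1, 2],
               [1, 2, 0, 1],
               [3, 4, 5, 6]], dtype=float)
print(pool2d(fm, mode="max"))  # [[6, 4], [4, 6]]
print(pool2d(fm, mode="avg"))  # [[3.75, 2.25], [2.5, 3.0]]
```

Note how each 2×2 window collapses to a single number, halving both spatial dimensions.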
CNN: REAL-TIME EXAMPLE
Face Recognition:
CNNs are widely used for face recognition tasks. They can learn to extract
discriminative features from facial images, enabling accurate
identification and verification of individuals. Here's how CNNs are
applied in face recognition systems:
Data Collection and Preprocessing: A dataset of facial images
is collected, including images of different individuals under various
lighting conditions, angles, and facial expressions. The dataset is
labeled with corresponding identities.
Architecture Design: A CNN architecture is designed specifically
for face recognition. It typically consists of convolutional layers,
pooling layers, and fully connected layers. The architecture can be
customized based on the complexity of the task and the available
dataset.
Training: The CNN is trained using a supervised learning
approach. It learns to extract discriminative facial features that are
essential for distinguishing between different individuals. The
network's internal parameters are adjusted through
backpropagation and gradient descent to minimize the difference
between predicted identities and ground truth labels.
Feature Extraction: Once the CNN is trained, it can extract
features from facial images. By applying convolutions and pooling
operations, the network identifies important facial patterns and
features that are crucial for accurate recognition.
Classification: The extracted features are then used for
classification. The CNN maps the facial features to a latent space,
where classification algorithms are applied to determine the identity
of the individual. The network assigns a label based on the learned
patterns and similarities between the input image and the training
data.
Evaluation and Testing: The trained CNN is evaluated on
separate test datasets to assess its performance. Metrics such as
accuracy, precision, recall, and F1 score are used to evaluate the
CNN's ability to correctly recognize individuals.
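The feature-extraction and classification steps above can be sketched end to end. In this sketch, a hypothetical `embed()` function stands in for the trained CNN (a fixed random projection is used only so the code runs); a real system would run the network's convolutional layers here, and identification is done by cosine similarity against enrolled embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
IMG_SIZE = 16 * 16
# Stand-in for a trained CNN: a fixed linear map to a 128-d feature space.
PROJ = rng.standard_normal((128, IMG_SIZE))

def embed(image):
    """Map a face image to a unit-length 128-d feature vector.
    A real system would run the trained CNN here."""
    v = PROJ @ image.ravel()
    return v / np.linalg.norm(v)

def identify(query, gallery):
    """Return the enrolled identity whose embedding has the highest
    cosine similarity with the query embedding."""
    q = embed(query)
    scores = {name: float(q @ emb) for name, emb in gallery.items()}
    return max(scores, key=scores.get)

# Enroll two synthetic "faces", then identify a slightly noisy query.
alice = rng.random((16, 16))
bob = rng.random((16, 16))
gallery = {"alice": embed(alice), "bob": embed(bob)}
query = alice + 0.05 * rng.standard_normal((16, 16))
print(identify(query, gallery))  # "alice"
```

Matching in an embedding space like this is why trained face-recognition CNNs can verify identities that were never part of the training set.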
CNN: NUMERICAL EXAMPLE
Let's represent the grayscale image and the convolutional filter in matrix
form to illustrate the numerical example using Convolutional Neural
Networks (CNNs).
I =
2 3 1 0 4
0 1 2 2 3
3 0 1 3 2
2 1 0 4 1
1 2 2 1 3

F =
1 0 1
0 1 0
1 0 1
Let's perform the convolution by sliding the filter over the image and, at
each position, summing the element-wise products of the filter and the
3×3 image patch it covers.

At the top-left position:

2 3 1     1 0 1
0 1 2  ×  0 1 0  →  2 + 1 + 1 + 3 + 1 = 8
3 0 1     1 0 1

At the top-centre position:

3 1 0     1 0 1
1 2 2  ×  0 1 0  →  3 + 0 + 2 + 0 + 3 = 8
0 1 3     1 0 1

Repeating this at all nine valid positions yields the 3×3 feature map:

8 8 10
4 9 9
8 6 12
The resulting feature map represents the filtered output of the original
image using the convolutional filter. Each value in the feature map is
computed by convolving the filter with the corresponding image patch.
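The sliding-window arithmetic for this example can be checked directly in NumPy; each output entry is the sum of element-wise products of F and the corresponding 3×3 patch of I:

```python
import numpy as np

I = np.array([[2, 3, 1, 0, 4],
              [0, 1, 2, 2, 3],
              [3, 0, 1, 3, 2],
              [2, 1, 0, 4, 1],
              [1, 2, 2, 1, 3]])
F = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 0, 1]])

# Valid-mode convolution: a 5x5 image with a 3x3 filter gives a 3x3 map.
out = np.zeros((3, 3), dtype=int)
for i in range(3):
    for j in range(3):
        out[i, j] = np.sum(I[i:i + 3, j:j + 3] * F)
print(out)  # [[8, 8, 10], [4, 9, 9], [8, 6, 12]]
```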
LIMITATIONS OF CNNS
Despite their computational cost, CNNs deliver powerful results. At the
root of it all, though, they are recognizing patterns and details so
minute and inconspicuous that they go unnoticed by the human eye. Where
they fall short is in understanding the contents of an image.
Consider a photo of a man lifting a child. A CNN detects exactly that: a
person in their mid-30s and a child of around 10 years. But when we look
at the same photo we start imagining multiple scenarios. Maybe it's a
father-and-son day out, a picnic, or a camping trip. Maybe it's a school
ground where the child scored a goal, and his dad is so happy that he
lifts him up.