Computer Vision
Mr Hew Ka Kian
hew_ka_kian@rp.edu.sg
OFFICIAL (CLOSED) \ NON-SENSITIVE
Image Convolution
• In image processing, convolution is the process of transforming an image by applying a kernel
over each pixel and its local neighbours across the entire image. The kernel is a matrix whose
size and values determine the effect of the transformation.
https://medium.com/@bdhuma/6-basic-things-to-know-about-convolution-daef5e1bc411
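The sliding-kernel process described above can be sketched in a few lines of NumPy. This is a minimal illustration (cross-correlation form, as used in deep-learning libraries), with a hypothetical 5x5 image and a 3x3 averaging kernel:

```python
import numpy as np

# Minimal sketch of image convolution: slide a 3x3 kernel over every
# pixel whose full neighbourhood fits inside the image, and sum the
# elementwise products of the kernel and that neighbourhood.
def convolve2d(image, kernel):
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)   # hypothetical 5x5 image
box_blur = np.ones((3, 3)) / 9.0                    # averaging (blur) kernel
result = convolve2d(image, box_blur)
print(result.shape)   # (3, 3): a 5x5 image shrinks to 3x3 with a 3x3 kernel
```

Note that without padding the output is smaller than the input; the later slides on padding address exactly this.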
Grid Size
• The number of pixels a kernel “sees” at once
• Typically use odd numbers so that there is a “center” pixel
• Kernel does not need to be square
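For a non-square kernel the output shrinks independently in each dimension. A quick sketch with hypothetical sizes (a 1x3 horizontal kernel on a 4x6 image, valid convolution, stride 1):

```python
# Output size for a non-square kernel (valid convolution, stride 1):
# each dimension shrinks by (kernel size - 1) in that dimension.
H, W = 4, 6            # image height and width (hypothetical)
kh, kw = 1, 3          # a 1x3 horizontal kernel: not square
out_h, out_w = H - kh + 1, W - kw + 1
print((out_h, out_w))  # (4, 4): height unchanged, width shrinks by 2
```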
Padding
• Applying kernels directly produces an “edge effect”
• Pixels near the edge are never used as “center” pixels, since they do not
have enough surrounding pixels
• Padding adds extra pixels around the frame
• With padding, every pixel of the original image becomes a center pixel as
the kernel moves across the image
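Zero padding for a 3x3 kernel can be sketched with `numpy.pad`: pad one pixel on every side so every original pixel can sit at the kernel centre (the image here is a hypothetical 5x5 of ones):

```python
import numpy as np

# "Same" zero padding for a 3x3 kernel: pad (k-1)/2 pixels on each side.
image = np.ones((5, 5))
k = 3
pad = (k - 1) // 2               # 1 pixel of padding for a 3x3 kernel
padded = np.pad(image, pad, mode='constant', constant_values=0)

print(padded.shape)              # (7, 7)
print(padded.shape[0] - k + 1)   # 5: output size equals the original image
```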
Stride
• The “step size” as the kernel moves across the image
• Can be different for vertical and horizontal steps (but is usually the
same value)
• A stride greater than 1 scales down the output dimensions
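The effect of kernel size, padding, and stride on the output dimension can be captured in the usual size formula, out = floor((n + 2*pad - k) / stride) + 1. A small sketch with assumed example sizes:

```python
import math

# Output size of a convolution along one dimension, assuming the
# standard formula: out = floor((n + 2*pad - k) / stride) + 1.
def conv_out(n, k, stride=1, pad=0):
    return math.floor((n + 2 * pad - k) / stride) + 1

print(conv_out(7, 3, stride=2, pad=0))  # 3: stride 2 roughly halves a 7-pixel input
print(conv_out(7, 3, stride=1, pad=1))  # 7: "same" padding preserves the size
```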
[Figure: Stride 2 example – no padding]
Depth
• In images, each pixel may be represented by multiple values; this is
also known as the image mode.
• The number of values per pixel is referred to as the number of “channels”
• RGB image – 3 channels
• CMYK image – 4 channels
• Each kernel has the same depth as its input, and the output of the
convolution, called a feature map, has one channel per kernel.
Basic Idea
• A convolution layer consists of n kernels. Each kernel convolves
with the input image to produce one of the n output images
(feature maps).
• The activation function of a convolution layer is typically ReLU
because
(i) there is no need to “threshold” the convolved
output image, and
(ii) an image has no negative values.
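ReLU simply clips negative responses to zero, which is why it suits convolution outputs: a (hypothetical) feature map with negative entries becomes a valid non-negative image again.

```python
# ReLU applied elementwise to a small hypothetical feature map:
# max(v, 0) keeps positive responses and zeroes out negative ones.
feature_map = [[-2.0, 0.5], [3.0, -0.1]]
relu = [[max(v, 0.0) for v in row] for row in feature_map]
print(relu)  # [[0.0, 0.5], [3.0, 0.0]]
```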
Ref: https://medium.com/analytics-vidhya/a-simple-introduction-to-dropout-regularization-with-code-5279489dda1e
Softmax
• softmax output at i = exp(output at i) / (sum of exp(output) over all outputs)
• Use Excel to calculate exp() with =EXP(value)
Student Activity
How many actual Mango are wrongly predicted as Orange?
[Confusion matrix figure: Prediction vs Actual]
Student Activity
How many predictions of Apple are actually Mango?
[Confusion matrix figure: Prediction vs Actual]
Student Activity
What is the accuracy when tested with actual Apple?
• Accuracy = (number of correct predictions) / (number of predictions)
Accuracy = 85 / (85 + 10 + 9) = 0.82
[Confusion matrix figure: Prediction vs Actual]
Student Activity
What is the overall accuracy?
• Accuracy = (number of correct predictions) / (number of predictions)
Accuracy = (85+93+85)/(85+10+9+6+93+12+7+8+85) = 0.83
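Assuming the arrangement implied by the figures quoted above (rows are the actual classes Apple, Mango, Orange and the diagonal entries 85, 93, 85 are the correct predictions), the overall accuracy is the sum of the diagonal divided by the sum of all entries:

```python
# Confusion matrix from the activity: rows = actual Apple, Mango, Orange;
# columns = predicted classes in the same order (assumed arrangement).
cm = [[85, 10,  9],
      [ 6, 93, 12],
      [ 7,  8, 85]]

correct = sum(cm[i][i] for i in range(3))       # diagonal: 85 + 93 + 85
total = sum(sum(row) for row in cm)             # all 315 predictions
accuracy = correct / total
print(round(accuracy, 2))                       # 0.83
```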
Student Activity
Calculate the test accuracy from the numbers in the confusion matrix
below and make sure you get the same value as test_acc.
• Accuracy = (number of correct predictions) / (number of predictions)
Student Activity
Complete the code below to build the network as the summary shows.
from tensorflow.keras import models, layers

network = models.Sequential()
# Output: 40 feature maps of 32x32; the 32x32 input shape is preserved because of 'same' padding
network.add(layers.Conv2D(40, (3, 3), activation="relu", padding='same', input_shape=(32, 32, 3)))
# Output: 60 feature maps of 30x30; size is reduced by 2 rows and 2 columns with no padding
network.add(layers.Conv2D(60, (3, 3), activation="relu"))
network.add(layers.MaxPooling2D(pool_size=(2, 2)))
network.add(layers.Flatten())  # Flatten 2x2x40 into 160 nodes
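The annotated sizes for the three layers shown can be checked with the usual size rules (a sketch only; the 2x2x40 flatten in the activity's summary implies further layers that are not shown here):

```python
import math

# Trace the feature-map size through the layers shown, assuming:
# 'same' padding preserves spatial size, a 3x3 kernel with no padding
# shrinks it by 2, and 2x2 max pooling halves it.
size = 32                    # input is 32x32x3
size = size                  # Conv2D(40, 3x3, padding='same') -> 32x32x40
size = size - 3 + 1          # Conv2D(60, 3x3, no padding)     -> 30x30x60
size = math.floor(size / 2)  # MaxPooling2D(2x2)               -> 15x15x60
print(size)                  # 15
```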