Computer Vision
Mr Hew Ka Kian
hew_ka_kian@rp.edu.sg
OFFICIAL (CLOSED) \ NON-SENSITIVE
Image Convolution
• In image processing, convolution is the process of transforming an image by applying a kernel
over each pixel and its local neighbours across the entire image. The kernel is a matrix whose
size and values determine the effect of the transformation.
https://medium.com/@bdhuma/6-basic-things-to-know-about-convolution-daef5e1bc411
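The sliding-kernel process described above can be sketched in a few lines of NumPy. This is a minimal illustration (cross-correlation form, as used in deep-learning libraries), with a hypothetical 5x5 image and a 3x3 averaging kernel:

```python
import numpy as np

# Minimal sketch of image convolution: slide a 3x3 kernel over every
# pixel whose full neighbourhood fits inside the image, and sum the
# elementwise products of the kernel and that neighbourhood.
def convolve2d(image, kernel):
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)   # hypothetical 5x5 image
box_blur = np.ones((3, 3)) / 9.0                    # averaging (blur) kernel
result = convolve2d(image, box_blur)
print(result.shape)   # (3, 3): a 5x5 image shrinks to 3x3 with a 3x3 kernel
```

Note that without padding the output is smaller than the input; the later slides on padding address exactly this.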
Grid Size
• The number of pixels a kernel “sees” at once
• Typically use odd numbers so that there is a “center” pixel
• Kernel does not need to be square
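For a non-square kernel the output shrinks independently in each dimension. A quick sketch with hypothetical sizes (a 1x3 horizontal kernel on a 4x6 image, valid convolution, stride 1):

```python
# Output size for a non-square kernel (valid convolution, stride 1):
# each dimension shrinks by (kernel size - 1) in that dimension.
H, W = 4, 6            # image height and width (hypothetical)
kh, kw = 1, 3          # a 1x3 horizontal kernel: not square
out_h, out_w = H - kh + 1, W - kw + 1
print((out_h, out_w))  # (4, 4): height unchanged, width shrinks by 2
```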
Padding
• Applying kernels directly produces an “edge effect”
• Pixels near the edge are never used as “center” pixels, since they do not
have enough surrounding pixels
• Padding adds extra pixels around the frame
• With padding, every pixel of the original image becomes a center pixel as
the kernel moves across the image
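Zero padding for a 3x3 kernel can be sketched with `numpy.pad`: pad one pixel on every side so every original pixel can sit at the kernel centre (the image here is a hypothetical 5x5 of ones):

```python
import numpy as np

# "Same" zero padding for a 3x3 kernel: pad (k-1)/2 pixels on each side.
image = np.ones((5, 5))
k = 3
pad = (k - 1) // 2               # 1 pixel of padding for a 3x3 kernel
padded = np.pad(image, pad, mode='constant', constant_values=0)

print(padded.shape)              # (7, 7)
print(padded.shape[0] - k + 1)   # 5: output size equals the original image
```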
Stride
• The “step size” as the kernel moves across the image
• Can be different for vertical and horizontal steps (but is usually the
same value)
• A stride greater than 1 scales down the output dimensions
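The effect of kernel size, padding, and stride on the output dimension can be captured in the usual size formula, out = floor((n + 2*pad - k) / stride) + 1. A small sketch with assumed example sizes:

```python
import math

# Output size of a convolution along one dimension, assuming the
# standard formula: out = floor((n + 2*pad - k) / stride) + 1.
def conv_out(n, k, stride=1, pad=0):
    return math.floor((n + 2 * pad - k) / stride) + 1

print(conv_out(7, 3, stride=2, pad=0))  # 3: stride 2 roughly halves a 7-pixel input
print(conv_out(7, 3, stride=1, pad=1))  # 7: "same" padding preserves the size
```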
[Figure: Stride 2 example – no padding]
Depth
• In images, each pixel may be represented by multiple values; this is
also known as the image mode.
• The number of values per pixel is referred to as the number of “channels”
• RGB image – 3 channels
• CMYK image – 4 channels
• Each kernel has the same depth as its input, and the output of the
convolution, called a feature map, has one channel per kernel.
Basic Idea
• A convolution layer consists of n kernels. Each kernel convolves
with the input image to produce one of the n output images
(feature maps).
• The activation function of a convolution layer is typically ReLU
because
(i) there is no need to “threshold” the convolved
output image, and
(ii) an image has no negative values.
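ReLU simply clips negative responses to zero, which is why it suits convolution outputs: a (hypothetical) feature map with negative entries becomes a valid non-negative image again.

```python
# ReLU applied elementwise to a small hypothetical feature map:
# max(v, 0) keeps positive responses and zeroes out negative ones.
feature_map = [[-2.0, 0.5], [3.0, -0.1]]
relu = [[max(v, 0.0) for v in row] for row in feature_map]
print(relu)  # [[0.0, 0.5], [3.0, 0.0]]
```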
Ref: https://medium.com/analytics-vidhya/a-simple-introduction-to-dropout-regularization-with-code-5279489dda1e
Softmax
• softmax output at i = exp(output at i) / (sum of exp(output) over all outputs)
• Use Excel to calculate exp() with =EXP(value)
Student Activity
How many actual Mango are wrongly predicted as Orange?
[Confusion matrix figure: Prediction vs Actual]
Student Activity
How many predictions of Apple are actually Mango?
[Confusion matrix figure: Prediction vs Actual]
Student Activity
What is the accuracy when tested with actual Apple?
• Accuracy = (number of correct predictions) / (number of predictions)
Accuracy = 85 / (85 + 10 + 9) = 0.82
[Confusion matrix figure: Prediction vs Actual]
Student Activity
What is the overall accuracy?
• Accuracy = (number of correct predictions) / (number of predictions)
Accuracy = (85+93+85)/(85+10+9+6+93+12+7+8+85) = 0.83
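Assuming the arrangement implied by the figures quoted above (rows are the actual classes Apple, Mango, Orange and the diagonal entries 85, 93, 85 are the correct predictions), the overall accuracy is the sum of the diagonal divided by the sum of all entries:

```python
# Confusion matrix from the activity: rows = actual Apple, Mango, Orange;
# columns = predicted classes in the same order (assumed arrangement).
cm = [[85, 10,  9],
      [ 6, 93, 12],
      [ 7,  8, 85]]

correct = sum(cm[i][i] for i in range(3))       # diagonal: 85 + 93 + 85
total = sum(sum(row) for row in cm)             # all 315 predictions
accuracy = correct / total
print(round(accuracy, 2))                       # 0.83
```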
Student Activity
Calculate the test accuracy from the numbers in the confusion matrix
below and make sure you get the same value as test_acc.
• Accuracy = (number of correct predictions) / (number of predictions)
Student Activity
Complete the code below to build the network as the summary shows.
from tensorflow.keras import models, layers

network = models.Sequential()
# Output: 40 feature maps of 32x32; the 32x32 input shape is preserved because of 'same' padding
network.add(layers.Conv2D(40, (3, 3), activation="relu", padding='same', input_shape=(32, 32, 3)))
# Output: 60 feature maps of 30x30; size is reduced by 2 rows and 2 columns with no padding
network.add(layers.Conv2D(60, (3, 3), activation="relu"))
network.add(layers.MaxPooling2D(pool_size=(2, 2)))
network.add(layers.Flatten())  # Flatten 2x2x40 into 160 nodes
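The annotated sizes for the three layers shown can be checked with the usual size rules (a sketch only; the 2x2x40 flatten in the activity's summary implies further layers that are not shown here):

```python
import math

# Trace the feature-map size through the layers shown, assuming:
# 'same' padding preserves spatial size, a 3x3 kernel with no padding
# shrinks it by 2, and 2x2 max pooling halves it.
size = 32                    # input is 32x32x3
size = size                  # Conv2D(40, 3x3, padding='same') -> 32x32x40
size = size - 3 + 1          # Conv2D(60, 3x3, no padding)     -> 30x30x60
size = math.floor(size / 2)  # MaxPooling2D(2x2)               -> 15x15x60
print(size)                  # 15
```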