Download as pdf or txt
Download as pdf or txt
You are on page 1of 23

Convolution

Multi-channels input and multi-filters layer


Dr. Thanh-Sach LE
LTSACH@hcmut.edu.vn

GVLab: Faculty of Computer Science and Engineering,


Graphics and Vision Laboratory HCMUT
2 Contents
❖ Multi-channels input
❖ Multi-filters layer
Convolution
Multi-channels input
Dr. Thanh-Sach LE
LTSACH@hcmut.edu.vn

GVLab: Faculty of Computer Science and Engineering,


Graphics and Vision Laboratory HCMUT
Start Convolution algorithm
✴ Rotate the kernel 1800
✴ Flatten it to a vector 1

✴ Padding zero to input


✴ Allocate output buffer 2

✴ Align the kernel window with the top-left of the input 3

✴ Extract the sub-image overlapped with the kernel


✴ Flatten the sub-image to a vector
✴ Take dot-product with the flattened kernel 4
✴ Fill the result to output

all output No ✴ Slide the kernel window to next


values assigned? position on the input

Yes 5
End
5 Multi-channels input
Input image or feature map: multiple channels

1 3 1 0
2 12 10 21 0
3 11 210 111 00 0
1 11 202 000 01 1
1 12 12 01 1
0 1 0 2

Input: 3 channels
6 Multi-channels input
Input image or feature map: multiple channels

1 3 1 0 3 1 0 1 2 2 0 1 1 3 1 0
2 12 10 21 0 1 1 2 0 1 1 1 0 1 1 2 0
3 11 210 111 00 0 1 2 2 1 1 0 0 1 2 1 0 0
1 11 202 000 01 1 0 1 0 2 1 1 0 1 2 0 0 1
1 12 12 01 1
Channel 1 Channel 2 Channel 3
0 1 0 2 (RED) (GREEN) (BLUE)

Input: 3 channels
7 Multi-channels input
Input image or feature map: multiple channels

1 3 1 0 3 1 0 1 2 2 0 1 1 3 1 0
2 12 10 21 0 1 1 2 0 1 1 1 0 1 1 2 0
3 11 210 111 00 0 1 2 2 1 1 0 0 1 2 1 0 0
1 11 202 000 01 1 0 1 0 2 1 1 0 1 2 0 0 1
1 12 12 01 1
Channel 1 Channel 2 Channel 3
0 1 0 2 (RED) (GREEN) (BLUE)

Input: 3 channels
0 0 1
1 10 11 0
1 11 000 10 0
1 00 00 1
0 0 1

Kernel: 3 channels
8 Multi-channels input
Input image or feature map: multiple channels

1 3 1 0 3 1 0 1 2 2 0 1 1 3 1 0
2 12 10 21 0 1 1 2 0 1 1 1 0 1 1 2 0
3 11 210 111 00 0 1 2 2 1 1 0 0 1 2 1 0 0
1 11 202 000 01 1 0 1 0 2 1 1 0 1 2 0 0 1
1 12 12 01 1
Channel 1 Channel 2 Channel 3
0 1 0 2 (RED) (GREEN) (BLUE)

Input: 3 channels 1 1 0 1 0 1 0 0 1
0 0 1 1 0 0 1 0 0 1 1 0
1 10 11 0 0 0 1 0 0 1 0 1 0
1 11 000 10 0 Channel 1 Channel 2 Channel 3
1 00 00 1 (RED) (GREEN) (BLUE)
0 0 1

Kernel: 3 channels
9 Multi-channels input
1 Rotate kernel 1800
1 1 0 1 0 1 0 0 1
1 0 0 1 0 0 1 1 0
0 0 1 0 0 1 0 1 0

Rotate 1800 Rotate 1800 Rotate 1800

1 0 0 1 0 0 0 1 0
0 0 1 0 0 1 0 1 1
0 1 1 1 0 1 1 0 0
10 Multi-channels input
1 Flatten the rotated kernel to a vector
1 1 0 1 0 1 0 0 1
1 0 0 1 0 0 1 1 0
0 0 1 0 0 1 0 1 0

Rotate 1800 Rotate 1800 Rotate 1800

1 0 0 1 0 0 0 1 0
0 0 1 0 0 1 0 1 1
0 1 1 1 0 1 1 0 0

After flattening:

1 0 0 0 0 1 0 1 1 1 0 0 0 0 1 1 0 1 0 1 0 0 1 1 1 0 0
11 Multi-channels input
2 Padding the input
3 1 0 1 2 2 0 1 1 3 1 0
1 1 2 0 1 1 1 0 1 1 2 0
1 2 2 1 1 0 0 1 2 1 0 0
0 1 0 2 1 1 0 1 2 0 0 1

Channel 1 Channel 2 Channel 3


(RED) (GREEN) (BLUE)

(No padding)
12 Multi-channels input
2 Allocate the output buffer
3 1 0 1 2 2 0 1 1 3 1 0
1 1 2 0 1 1 1 0 1 1 2 0
1 2 2 1 1 0 0 1 2 1 0 0
0 1 0 2 1 1 0 1 2 0 0 1

Channel 1 Channel 2 Channel 3


(RED) (GREEN) (BLUE)

input: i1 = i2 = 4 i1 − k1 + 1 = 2
kernel: k1 = k2 = 3
padding: p1 = p2 = 0 i2 − k2 + 1 = 2
strides: s1 = s2 = 1
Output
3 1 0 1 2 2 0 1 1 3 1 0
1 1 2 0 1 1 1 0 1 1 2 0
1 2 2 1 1 0 0 1 2 1 0 0
0 1 0 2 1 1 0 1 2 0 0 1

3 Starting the cross-correlation process


3 1 0 1 2 2 0 1 1 3 1 0
1 1 2 0 1 1 1 0 1 1 2 0
1 2 2 1 1 0 0 1 2 1 0 0
0 1 0 2 1 1 0 1 2 0 0 1

3 1 0 2 2 0 1 3 1

·
1 1 2 1 1 1 1 1 2
1 2 2 1 0 0 2 1 0

3 1 0 1 1 2 1 2 2 2 2 0 1 1 1 1 0 0 1 3 1 1 1 2 2 1 0

dot-product

1 0 0 0 0 1 0 1 1 1 0 0 0 0 1 1 0 1 0 1 0 0 1 1 1 0 0

21
> Slide the kernel to right
3 1 0 1 2 2 0 1 1 3 1 0
1 1 2 0 1 1 1 0 1 1 2 0
1 2 2 1 1 0 0 1 2 1 0 0
0 1 0 2 1 1 0 1 2 0 0 1

1 0 1 2 0 1 3 1 0

·
1 2 0 1 1 0 1 2 0
2 2 1 0 0 1 1 0 0

1 0 1 1 2 0 2 2 1 2 0 1 1 1 0 0 0 1 3 1 0 1 2 0 1 0 0

dot-product

1 0 0 0 0 1 0 1 1 1 0 0 0 0 1 1 0 1 0 1 0 0 1 1 1 0 0

21 11
> Slide the kernel down
3 1 0 1 2 2 0 1 1 3 1 0
1 1 2 0 1 1 1 0 1 1 2 0
1 2 2 1 1 0 0 1 2 1 0 0
0 1 0 2 1 1 0 1 2 0 0 1

1 1 2 1 1 1 1 1 2

·
1 2 2 1 0 0 2 1 0
0 1 0 1 1 0 2 0 0

1 1 2 1 2 2 0 1 0 1 1 1 1 0 0 1 1 0 1 1 2 2 1 0 2 0 0

dot-product

1 0 0 0 0 1 0 1 1 1 0 0 0 0 1 1 0 1 0 1 0 0 1 1 1 0 0

21 11
> Slide the kernel to right
10
3 1 0 1 2 2 0 1 1 3 1 0
1 1 2 0 1 1 1 0 1 1 2 0
1 2 2 1 1 0 0 1 2 1 0 0
0 1 0 2 1 1 0 1 2 0 0 1

1 2 0 1 1 0 1 2 0

·
2 2 1 0 0 1 1 0 0
1 0 2 1 0 1 0 0 1

1 2 0 2 2 1 1 0 2 1 1 0 0 0 1 1 0 1 1 2 0 1 0 0 0 0 1

dot-product

1 0 0 0 0 1 0 1 1 1 0 0 0 0 1 1 0 1 0 1 0 0 1 1 1 0 0

21 11

10 10
18 Multi-channels input

Final result

1 3 1 0
2 12 10 21 0
3 21 11
11 210 111 00 0 CONV
1 ∂b 11 202 000 01 1 10 10
1 12 12 01 1
Output: 1 channel
0 1 0 2
0 0 1
Input: 3 channels 1 10 11 0
1 11 000 10 0
1 00 00 1
0 0 1

Kernel: 3 channels
Convolution
Multi-filters layer
Dr. Thanh-Sach LE
LTSACH@hcmut.edu.vn

GVLab: Faculty of Computer Science and Engineering,


Graphics and Vision Laboratory HCMUT
20 Multi-filters layer
❖ Input: a feature map of D channels
❖ Convolution layer in Deep learning frameworks:
✤ Consists of multiple filters:
✴ Each the filters’ kernel has D channels (#channels of the
21 23 01 10 input)
3 1 0 1
11 11 12 00 ✴All the filters’ kernel are the same size, e.g., 3x3
1 1 2 0
12 01 00 10
1 2 2 1
2 0 0 1
01 11 00 21 ❖ How the convolution layer computed?
Input feature map
(D = 3 channels)
21 Multi-filters layer
21 11
CONV
10 10

0 0 1
11 10 01
11 01 00
1 0 0
1 3 1 0 0 1 0
32 12 00 11 00 00 11
11 11 12 00 Filter 1
1 1 2 0
12 01 00 10
1 2 2 1
12 10 00 11
0 1 0 2
Input feature map
(D = 3 channels)
22 Multi-filters layer
21 11
CONV
10 10

0 0 1
11 10 01
11 01 00
1 0 0
1 3 1 0 0 1 0
32 12 00 11 00 00 11
11 11 12 00 Filter 1
1 1 2 0 x x
12 01 00 10
1 2 2 1 CONV x x
12 10 00 11
0 1 0 2
x x
Input feature map CONV
(D = 3 channels) x x

10 00 11
2 1 0
11 01 00
1 1 1
00 01 10
0 0 1
Filter K
23 Multi-filters layer
21 11
CONV
10 10

0 0 1
11 10 01
11 01 00 xx xx
1 0 0 21 11
1 3 1 0 0 1 0 CONCAT xx xx
32 12 00 11 00 00 11 10 10
11 11 12 00 Filter 1
1 1 2 0 x x K channels
12 01 00 10 (input to next layer)
1 2 2 1 CONV x x
12 10 00 11
0 1 0 2
x x
Input feature map CONV
(D = 3 channels) x x

10 00 11
2 1 0
11 01 00
1 1 1
00 01 10
0 0 1
Filter K

You might also like