Session 5
Pierre Michel
pierre.michel@univ-amu.fr
M2 EBDS
2021
1. Convolutional Neural Networks
Until now, you have used classical Artificial Neural Networks (ANN), also called Multi-Layer Perceptrons (MLP).
We used the MNIST dataset, which contains simple low-resolution images of hand-written digits. More realistic datasets involve more complex images (high-resolution color images).
In an MLP, each neuron in the input layer is an input of every neuron in the hidden layer: this is a fully-connected network.
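As an illustration, a single fully-connected layer can be written directly in NumPy. This is a minimal sketch: the hidden-layer size of 10 neurons and the random weights are arbitrary choices for the example, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# A 5x5 grayscale image, as in the figure below
image = rng.random((5, 5))

# Flatten the image into a 25x1 input vector
x = image.reshape(-1)              # shape (25,)

# One fully-connected hidden layer with 10 neurons (illustrative size):
# every input pixel is connected to every hidden neuron
W = rng.standard_normal((10, 25))  # weight matrix
b = np.zeros(10)                   # bias vector

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

h = sigmoid(W @ x + b)             # hidden activations, shape (10,)
print(h.shape)                     # (10,)
```

Note that even this tiny layer already has 25 x 10 + 10 = 260 parameters; for high-resolution images, fully-connected layers quickly become very large, which motivates the convolutional layers below.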
[Figure: a 5x5 input image (pixel intensities I_ij) is flattened into a 25x1 vector feeding a fully-connected MLP: layer 1 (input), hidden layers 2 to L-1, and output layer L producing h_{W,b}(x).]
Fully-connected networks
Locally-connected networks

[Figure: a 3x3 convolution filter matrix (entries f_11, ..., f_33) slides over the flattened 5x5 input image (25x1, pixel intensities I_ij); each 3x3 window is connected to one neuron of the convolution layer (9 neurons), yielding a 3x3 convolved image with intensities Ĩ_ij.]

with Ĩ_ij = (1/9) (f_11 I_(i-1)(j-1) + f_12 I_(i-1)(j) + f_13 I_(i-1)(j+1) + f_21 I_(i)(j-1) + f_22 I_(i)(j) + f_23 I_(i)(j+1) + f_31 I_(i+1)(j-1) + f_32 I_(i+1)(j) + f_33 I_(i+1)(j+1))
Convolution

For an a x b filter f applied to a window X^(w) of the image, the convolved value is

f(X^(w)) = Σ_{i=1}^{a} Σ_{j=1}^{b} x_ij f_ij
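This windowed sum can be sketched in NumPy. Here the filter is a 3x3 averaging filter (all entries 1/9), which matches the 1/9 normalization in the Ĩ_ij formula of the slides; any other 3x3 filter could be used in its place.

```python
import numpy as np

rng = np.random.default_rng(1)
I = rng.random((5, 5))        # 5x5 input image
f = np.ones((3, 3)) / 9.0     # 3x3 averaging filter (entries 1/9)

def convolve_valid(I, f):
    """Slide the filter over the image (no padding, stride 1)."""
    a, b = f.shape
    out_h = I.shape[0] - a + 1
    out_w = I.shape[1] - b + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # f(X^(w)) = sum over the window of x_ij * f_ij
            out[i, j] = np.sum(I[i:i+a, j:j+b] * f)
    return out

I_tilde = convolve_valid(I, f)
print(I_tilde.shape)          # (3, 3): 5x5 image, 3x3 filter -> 3x3 output
```

With the averaging filter, each Ĩ_ij is simply the mean intensity of the corresponding 3x3 window, as in the slide formula.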
Pooling: illustration

[Figure: a 2x2 pooling window slides over the 3x3 convolved image (intensities Ĩ_ij), producing a 2x2 pooled output image; P_ij is the pooled value at position (i, j). The pooling layer has 4 neurons.]
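The pooling step can be sketched in NumPy as follows. A 2x2 window with stride 1 turns the 3x3 convolved image into a 2x2 pooled image, as in the figure; the choice of max pooling (rather than average pooling) is ours for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
I_tilde = rng.random((3, 3))   # convolved image from the previous step

def pool2x2(I, stride=1, mode="max"):
    """Apply a 2x2 pooling window; a 3x3 input with stride 1 -> 2x2 output."""
    out_h = (I.shape[0] - 2) // stride + 1
    out_w = (I.shape[1] - 2) // stride + 1
    P = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = I[i*stride:i*stride+2, j*stride:j*stride+2]
            P[i, j] = window.max() if mode == "max" else window.mean()
    return P

P = pool2x2(I_tilde)
print(P.shape)                 # (2, 2)
```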
Architecture
Finally, the gradients with respect to the filter matrices are given by:

∇_{W^(l)} J(W, b; x, y) = Σ_{i=1}^{n} (a_i^(l)) * δ̃_k^(l+1)

∇_{b^(l)} J(W, b; x, y) = Σ_{i=1}^{a} Σ_{j=1}^{b} (δ_k^(l+1))_ij

where (a_i^(l)) * δ̃_k^(l+1) is the transposed convolution between input i of layer l and the error of filter k.
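These gradient formulas can be checked numerically on a single-filter example. The squared-error loss and the target T below are hypothetical choices made for this sketch; the key point is that the gradient of the loss with respect to the filter is itself a (valid) convolution of the input with the output error map.

```python
import numpy as np

rng = np.random.default_rng(3)
I = rng.random((5, 5))             # input image
f = rng.standard_normal((3, 3))    # filter
T = rng.random((3, 3))             # hypothetical target output

def conv_valid(I, f):
    a, b = f.shape
    out = np.empty((I.shape[0] - a + 1, I.shape[1] - b + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(I[i:i+a, j:j+b] * f)
    return out

# Forward pass and error term delta at the output
O = conv_valid(I, f)
delta = O - T                       # dJ/dO for J = 0.5 * sum((O - T)^2)

# Analytic gradient: convolve the input with the error map
grad_f = conv_valid(I, delta)       # shape (3, 3), same as f

# Numerical check of one filter entry by finite differences
eps = 1e-6
f2 = f.copy(); f2[0, 0] += eps
J  = 0.5 * np.sum((conv_valid(I, f)  - T) ** 2)
J2 = 0.5 * np.sum((conv_valid(I, f2) - T) ** 2)
print(abs((J2 - J) / eps - grad_f[0, 0]) < 1e-4)  # prints True
```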
CNN: recap
[Figures repeated from the previous slides: the locally-connected convolution layer, the pooling layer, and the final fully-connected MLP.]
import numpy as np
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
https://keras.io/getting_started/
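Building on these imports, a minimal Keras CNN combining the three building blocks of this session (convolution, pooling, fully-connected output) could be sketched as follows. The number of filters and the layer sizes are illustrative choices, not prescribed by the slides.

```python
import tensorflow as tf
from tensorflow import keras

# Minimal CNN for 28x28 grayscale images (e.g. Fashion-MNIST):
# one convolution layer, one pooling layer, then a dense classifier.
model = keras.Sequential([
    keras.layers.Input(shape=(28, 28, 1)),           # image size
    keras.layers.Conv2D(32, kernel_size=3, activation="relu"),
    keras.layers.MaxPooling2D(pool_size=2),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation="softmax"),    # 10 classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
print(model.output_shape)   # (None, 10)
```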
Pierre Michel Prediction methods and Machine learning 26/28
2. CNN with Keras
2.1. Example of CNN using Keras: Fashion-MNIST