Neural Networks
• Cost function
  $\mathcal{C}(\boldsymbol{\theta}) = -\frac{1}{n} \sum_{i=1}^{n} \left[\, y_i \cdot \log \hat{y}_i + (1 - y_i) \cdot \log(1 - \hat{y}_i) \,\right]$
[Plot: loss as a function of the prediction $\hat{y}$ for the case $y = 0$]
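To make the cost function concrete, here is a minimal NumPy sketch of the binary cross-entropy above (the function name and the clipping constant are illustrative choices, not from the lecture):

```python
import numpy as np

def binary_cross_entropy(y, y_hat, eps=1e-12):
    """C(theta) = -1/n * sum_i [ y_i*log(y_hat_i) + (1-y_i)*log(1-y_hat_i) ]."""
    y_hat = np.clip(y_hat, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y     = np.array([1.0, 0.0, 1.0])   # ground-truth labels
y_hat = np.array([0.9, 0.2, 0.6])   # predicted probabilities
print(binary_cross_entropy(y, y_hat))  # small loss: predictions are mostly correct
```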
Optimization
Loss function
$f_i = \sum_j w_{i,j}\, x_j$
$\boldsymbol{f} = \boldsymbol{W}\boldsymbol{x}$  (matrix notation)
[Diagram: on CIFAR-10, a one-layer network $\boldsymbol{x} \to \boldsymbol{W} \to \boldsymbol{f}$ vs. a two-layer network $\boldsymbol{x} \to \boldsymbol{W}_1 \to \boldsymbol{h} \to \boldsymbol{W}_2 \to \boldsymbol{f}$]
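As a sketch, the two models from the diagram can be written directly in NumPy. The hidden size of 100 is an arbitrary illustrative choice; CIFAR-10 inputs are 32×32×3 = 3072-dimensional with 10 class scores:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(3072)          # flattened 32x32x3 CIFAR-10 image

# One-layer model: f = W x
W = rng.standard_normal((10, 3072))
f_one_layer = W @ x                    # 10 class scores

# Two-layer model: h = W1 x, f = W2 h (activation omitted, as in the diagram)
W1 = rng.standard_normal((100, 3072))
W2 = rng.standard_normal((10, 100))
h = W1 @ x
f_two_layer = W2 @ h                   # again 10 class scores
```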
[Diagram: fully-connected network with inputs $x_1, \dots, x_3$; each unit computes $f(W_{i,j}\, x + b_{i,j})$]
Source: https://towardsdatascience.com/training-deep-neural-networks-9fdb1964b964
Activation Functions
• Sigmoid: $\sigma(x) = \frac{1}{1 + e^{-x}}$
• tanh: $\tanh(x)$
• Leaky ReLU: $\max(0.1x,\, x)$
• Without activation functions, stacked linear layers collapse into a single linear map: $\boldsymbol{f} = \boldsymbol{W}_3 \cdot (\boldsymbol{W}_2 \cdot (\boldsymbol{W}_1 \cdot \boldsymbol{x})) = (\boldsymbol{W}_3 \boldsymbol{W}_2 \boldsymbol{W}_1)\, \boldsymbol{x}$
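A minimal sketch of the activations listed above, plus a numerical check of the collapse argument: three stacked linear layers give exactly the same output as the single composed matrix (shapes are arbitrary illustrative choices):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))     # sigma(x) = 1 / (1 + e^-x)

def leaky_relu(x, slope=0.1):
    return np.maximum(slope * x, x)     # max(0.1x, x)

# tanh is available directly as np.tanh

# Without activations, three linear layers are equivalent to one:
rng = np.random.default_rng(0)
W1, W2, W3 = (rng.standard_normal((4, 4)) for _ in range(3))
x = rng.standard_normal(4)
f_stacked   = W3 @ (W2 @ (W1 @ x))
f_collapsed = (W3 @ W2 @ W1) @ x
assert np.allclose(f_stacked, f_collapsed)
```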
[Computational graph: $f(x, y, z) = (x + y) \cdot z$, built from a sum node and a mult node; for $x = 1$, $y = -3$, $z = 4$: $d = x + y = -2$ and $f = d \cdot z = -8$]
– the graph is directional: values flow forward through the nodes, gradients flow backward
Source: https://www.zybuluo.com/liuhui0803/note/981434
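Using the values reconstructed above, the forward and backward pass of this graph can be worked by hand (a sketch; the variable names are mine, not from the slide):

```python
# Forward pass: f(x, y, z) = (x + y) * z
x, y, z = 1.0, -3.0, 4.0
d = x + y          # d = -2
f = d * z          # f = -8

# Backward pass (chain rule, following the gradient flow backward):
df_df = 1.0        # df/df = 1
df_dd = z          # mult node: df/dd = z = 4
df_dz = d          # mult node: df/dz = d = -2
df_dx = df_dd * 1.0  # sum node passes the gradient through: 4
df_dy = df_dd * 1.0  # likewise: 4
```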
Computational Graphs
• The computations of neural networks have further interpretations:
  – Convolutional layers extract useful features with shared weights
Source: https://medium.com/@timothy_terati/image-convolution-filtering-a54dce7c786b
Source: https://www.zybuluo.com/liuhui0803/note/981434
[Figure: the loss measures the distance between prediction and target; a bad prediction has a large distance, a good prediction a small distance]
Intuitively, ...
– a large loss indicates bad predictions/performance (→ performance needs to be improved by training the model; see the toy example below)
– the choice of the loss function depends on the concrete problem or the distribution of the target variable
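As a toy illustration of "loss as distance" (using squared error; the numbers are made up for illustration):

```python
import numpy as np

target          = np.array([1.0, 0.0, 0.0])
good_prediction = np.array([0.9, 0.05, 0.05])
bad_prediction  = np.array([0.1, 0.7, 0.2])

mse = lambda y, y_hat: np.mean((y - y_hat) ** 2)
print(mse(target, good_prediction))  # small distance -> small loss
print(mse(target, bad_prediction))   # large distance -> large loss
```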
[Example: the network predicts the probability of the input belonging to the "yes" class, e.g. yes: 0.8, no: 0.2]
$L(\boldsymbol{y}, \hat{\boldsymbol{y}}; \boldsymbol{\theta}) = -\sum_{i=1}^{n} \sum_{k=1}^{K} y_{ik} \cdot \log \hat{y}_{ik}$
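A minimal NumPy sketch of this multi-class cross-entropy, assuming `y` holds one-hot targets and `y_hat` holds predicted class probabilities (the clipping constant is an illustrative safeguard):

```python
import numpy as np

def cross_entropy(y, y_hat, eps=1e-12):
    """L = -sum_i sum_k y_ik * log(y_hat_ik) over n samples and K classes."""
    return -np.sum(y * np.log(np.clip(y_hat, eps, 1.0)))

# Two samples, three classes (one-hot targets, probability predictions):
y     = np.array([[1, 0, 0], [0, 1, 0]], dtype=float)
y_hat = np.array([[0.8, 0.1, 0.1], [0.3, 0.6, 0.1]])
print(cross_entropy(y, y_hat))
```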
[Figure: prediction vs. targets at successive training times $t_1$, $t_2$, $t_3$ — each snapshot still shows a bad prediction; loss plotted against training time]
How to Find a Better NN?
Plotting loss curves against the model parameters:
[Plot: loss $L(\boldsymbol{y}, f_{\boldsymbol{\theta}}(\boldsymbol{x}))$ on the vertical axis vs. parameters $\boldsymbol{\theta}$ on the horizontal axis]
How to Find a Better NN?
• Loss function: $L(\boldsymbol{y}, \hat{\boldsymbol{y}}) = L(\boldsymbol{y}, f_{\boldsymbol{\theta}}(\boldsymbol{x}))$
• Neural network: $f_{\boldsymbol{\theta}}(\boldsymbol{x})$
• Goal: minimize the loss w.r.t. $\boldsymbol{\theta}$
Gradient Descent
How to Find a Better NN?
• Minimize $L(\boldsymbol{y}, f_{\boldsymbol{\theta}}(\boldsymbol{x}))$ w.r.t. $\boldsymbol{\theta}$
• Gradient: $\nabla_{\boldsymbol{\theta}} L(\boldsymbol{y}, f_{\boldsymbol{\theta}}(\boldsymbol{x}))$
• Update step (sketched in code below): $\boldsymbol{\theta} \leftarrow \boldsymbol{\theta} - \alpha\, \nabla_{\boldsymbol{\theta}} L(\boldsymbol{y}, f_{\boldsymbol{\theta}}(\boldsymbol{x}))$, with learning rate $\alpha$
• Optimum: $\boldsymbol{\theta}^{*} = \arg\min_{\boldsymbol{\theta}} L(\boldsymbol{y}, f_{\boldsymbol{\theta}}(\boldsymbol{x}))$
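A minimal sketch of the update rule on a toy quadratic loss (the loss, learning rate, and step count are illustrative choices, not from the lecture):

```python
import numpy as np

def loss(theta):
    return np.sum((theta - 3.0) ** 2)      # toy loss with minimum at theta = 3

def grad(theta):
    return 2.0 * (theta - 3.0)             # its analytic gradient

theta = np.array([0.0])                     # initial parameters
alpha = 0.1                                 # learning rate
for step in range(100):
    theta = theta - alpha * grad(theta)     # theta <- theta - alpha * grad L

print(theta)  # converges toward the minimizer theta* = 3
```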
How to Find a Better NN?
• Given inputs $\boldsymbol{x}$ and targets $\boldsymbol{y}$
• Given a one-layer NN with no activation function: $f_{\boldsymbol{\theta}}(\boldsymbol{x}) = \boldsymbol{W}\boldsymbol{x}$, with $\boldsymbol{\theta} = \boldsymbol{W}$ (later $\boldsymbol{\theta} = \{\boldsymbol{W}, \boldsymbol{b}\}$)
• Given the MSE loss: $L(\boldsymbol{y}, \hat{\boldsymbol{y}}; \boldsymbol{\theta}) = \frac{1}{n} \sum_{i}^{n} \|\, y_i - \hat{y}_i \,\|_2^2$
[Computational graph: $x$ and $W$ enter a multiply node producing $\hat{y}$, which together with $y$ feeds the loss $L$; the gradient flows backward through the graph]
How to Find a Better NN?
• Given inputs $\boldsymbol{x}$ and targets $\boldsymbol{y}$
• Given a one-layer NN with no activation function: $f_{\boldsymbol{\theta}}(\boldsymbol{x}) = \boldsymbol{W}\boldsymbol{x}$, with $\boldsymbol{\theta} = \boldsymbol{W}$
• Given the MSE loss: $L(\boldsymbol{y}, \hat{\boldsymbol{y}}; \boldsymbol{\theta}) = \frac{1}{n} \sum_{i}^{n} \|\, \boldsymbol{W} x_i - y_i \,\|_2^2$
• Gradient (checked numerically in the sketch below): $\nabla_{\boldsymbol{\theta}} L(\boldsymbol{y}, f_{\boldsymbol{\theta}}(\boldsymbol{x})) = \frac{2}{n} \sum_{i}^{n} (\boldsymbol{W} x_i - y_i) \cdot x_i^{T}$
[Computational graph: $x$ is multiplied with $W_1$, passed through $\max(0, \cdot)$, multiplied with $W_2$, and fed into the loss $L$; the gradient flows backward through the graph]
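A sketch verifying the closed-form gradient $\frac{2}{n} \sum_i (\boldsymbol{W} x_i - y_i)\, x_i^T$ against a finite-difference estimate (shapes and data are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_in, d_out = 8, 5, 3
X = rng.standard_normal((n, d_in))           # rows are samples x_i
Y = rng.standard_normal((n, d_out))          # rows are targets y_i
W = rng.standard_normal((d_out, d_in))

def mse(W):
    R = X @ W.T - Y                          # residuals W x_i - y_i, stacked as rows
    return np.mean(np.sum(R ** 2, axis=1))   # 1/n * sum_i ||W x_i - y_i||^2

# Closed-form gradient: (2/n) * sum_i (W x_i - y_i) x_i^T
grad_analytic = (2.0 / n) * (X @ W.T - Y).T @ X

# Finite-difference check on one entry of W:
eps = 1e-6
W_pert = W.copy()
W_pert[0, 0] += eps
grad_numeric = (mse(W_pert) - mse(W)) / eps
assert np.isclose(grad_analytic[0, 0], grad_numeric, atol=1e-4)
```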
• Next Lecture:
– Backpropagation and optimization of Neural Networks
• General concepts:
– Pattern Recognition and Machine Learning – C. Bishop
– http://www.deeplearningbook.org/