Machine Learning: Dr. Syed Aun Irtaza Lecture 10-11
Neural Networks
Overview
What is a Neural Network?
[Figure: a two-layer neural network with inputs x_1, x_2, x_3, one hidden layer, and output \hat{y}; parameters W^{[1]}, b^{[1]}, W^{[2]}, b^{[2]}]

The computation graph for one forward pass:
z^{[1]} = W^{[1]} x + b^{[1]}
a^{[1]} = \sigma(z^{[1]})
z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}
a^{[2]} = \sigma(z^{[2]})
\mathcal{L}(a^{[2]}, y)
One hidden layer Neural Network
Neural Network Representation
[Figure: a 2-layer neural network: input layer (x_1, x_2, x_3), one hidden layer of four units, and an output layer producing \hat{y}]
One hidden layer Neural Network
Computing a Neural Network's Output
Neural Network Representation
[Figure: a single sigmoid unit: inputs x_1, x_2, x_3 feed z = w^T x + b, followed by a = \sigma(z) = \hat{y}]

A single unit performs two steps of computation:
z = w^T x + b
a = \sigma(z)
Every hidden unit of the network repeats these same two steps on the inputs x_1, x_2, x_3, and the output unit repeats them on the hidden activations to produce \hat{y}.
Neural Network Representation
With four hidden units, the first layer computes, unit by unit:
z^{[1]}_1 = w^{[1]T}_1 x + b^{[1]}_1,  a^{[1]}_1 = \sigma(z^{[1]}_1)
z^{[1]}_2 = w^{[1]T}_2 x + b^{[1]}_2,  a^{[1]}_2 = \sigma(z^{[1]}_2)
z^{[1]}_3 = w^{[1]T}_3 x + b^{[1]}_3,  a^{[1]}_3 = \sigma(z^{[1]}_3)
z^{[1]}_4 = w^{[1]T}_4 x + b^{[1]}_4,  a^{[1]}_4 = \sigma(z^{[1]}_4)
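These four dot products are just the rows of one matrix: stacking the row vectors w^{[1]T}_i gives W^{[1]} (shape 4x3), so z^{[1]} = W^{[1]} x + b^{[1]}. A minimal NumPy check of this, with random stand-in parameters:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 1))           # 3 input features, one example

# One weight vector and bias per hidden unit (4 hidden units)
w = [rng.normal(size=(3, 1)) for _ in range(4)]
b = rng.normal(size=(4, 1))

# Per-unit computation: z_i = w_i^T x + b_i
z_loop = np.array([(w[i].T @ x).item() + b[i].item()
                   for i in range(4)]).reshape(4, 1)

# Stacked computation: W1 has the w_i^T as its rows, so z = W1 x + b
W1 = np.hstack(w).T                   # shape (4, 3)
z_vec = W1 @ x + b

assert np.allclose(z_loop, z_vec)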
Neural Network Representation learning
[Figure: the network with inputs x_1, x_2, x_3, hidden activations a^{[1]}_1 through a^{[1]}_4, and output \hat{y}]

Given input x:
z^{[1]} = W^{[1]} x + b^{[1]}
a^{[1]} = \sigma(z^{[1]})
z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}
a^{[2]} = \sigma(z^{[2]}) = \hat{y}
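A minimal NumPy sketch of this forward pass for the 3-input, 4-hidden-unit, 1-output network; the layer sizes and random initialization here are illustrative assumptions:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n_x, n_h = 3, 4                       # input features, hidden units

W1 = rng.normal(size=(n_h, n_x)) * 0.01
b1 = np.zeros((n_h, 1))
W2 = rng.normal(size=(1, n_h)) * 0.01
b2 = np.zeros((1, 1))

x = rng.normal(size=(n_x, 1))         # one training example as a column

z1 = W1 @ x + b1                      # shape (4, 1)
a1 = sigmoid(z1)
z2 = W2 @ a1 + b2                     # shape (1, 1)
y_hat = sigmoid(z2)                   # a^{[2]} = \hat{y}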
One hidden layer Neural Network
Vectorizing across multiple examples
Vectorizing across multiple examples
[Figure: the same network, inputs x_1, x_2, x_3 and output \hat{y}]

For a single example:
z^{[1]} = W^{[1]} x + b^{[1]}
a^{[1]} = \sigma(z^{[1]})
z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}
a^{[2]} = \sigma(z^{[2]})
Vectorizing across multiple examples
To process all m training examples, repeat the computation in a loop:
for i = 1 to m:
    z^{[1](i)} = W^{[1]} x^{(i)} + b^{[1]}
    a^{[1](i)} = \sigma(z^{[1](i)})
    z^{[2](i)} = W^{[2]} a^{[1](i)} + b^{[2]}
    a^{[2](i)} = \sigma(z^{[2](i)})
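The loop translates directly to Python. A sketch under the same illustrative shapes as before (3 inputs, 4 hidden units, and a hypothetical data matrix X with examples as columns):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
n_x, n_h, m = 3, 4, 5
W1, b1 = rng.normal(size=(n_h, n_x)), np.zeros((n_h, 1))
W2, b2 = rng.normal(size=(1, n_h)), np.zeros((1, 1))
X = rng.normal(size=(n_x, m))         # X = [x(1) x(2) ... x(m)], examples as columns

A2 = np.zeros((1, m))
for i in range(m):
    x_i = X[:, i:i+1]                 # i-th example, kept as a column vector
    a1 = sigmoid(W1 @ x_i + b1)
    A2[:, i:i+1] = sigmoid(W2 @ a1 + b2)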
One hidden layer Neural Network
Explanation for vectorized implementation
Justification for vectorized implementation
Recap of vectorizing across multiple examples
[Figure: the network with inputs x_1, x_2, x_3 and output \hat{y}]

for i = 1 to m:
    z^{[1](i)} = W^{[1]} x^{(i)} + b^{[1]}
    a^{[1](i)} = \sigma(z^{[1](i)})
    z^{[2](i)} = W^{[2]} a^{[1](i)} + b^{[2]}
    a^{[2](i)} = \sigma(z^{[2](i)})

Stacking the examples as columns,
X = [x^{(1)} x^{(2)} ... x^{(m)}]
A^{[1]} = [a^{[1](1)} a^{[1](2)} ... a^{[1](m)}]
the loop collapses into four matrix operations:
Z^{[1]} = W^{[1]} X + b^{[1]}
A^{[1]} = \sigma(Z^{[1]})
Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}
A^{[2]} = \sigma(Z^{[2]})
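A minimal NumPy sketch of the vectorized version, checked against the explicit loop (shapes and random data are illustrative assumptions; b^{[1]} and b^{[2]} broadcast across the m columns):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
n_x, n_h, m = 3, 4, 5
W1, b1 = rng.normal(size=(n_h, n_x)), np.zeros((n_h, 1))
W2, b2 = rng.normal(size=(1, n_h)), np.zeros((1, 1))
X = rng.normal(size=(n_x, m))

# Vectorized: one matrix product per layer for all m examples
Z1 = W1 @ X + b1                      # shape (n_h, m)
A1 = sigmoid(Z1)
Z2 = W2 @ A1 + b2                     # shape (1, m)
A2 = sigmoid(Z2)

# Same result as the per-example loop
A2_loop = np.zeros((1, m))
for i in range(m):
    a1 = sigmoid(W1 @ X[:, i:i+1] + b1)
    A2_loop[:, i:i+1] = sigmoid(W2 @ a1 + b2)
assert np.allclose(A2, A2_loop)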
One hidden layer Neural Network
Activation functions
Activation functions
[Figure: the network with inputs x_1, x_2, x_3 and output \hat{y}]

Given x:
z^{[1]} = W^{[1]} x + b^{[1]}
a^{[1]} = \sigma(z^{[1]})
z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}
a^{[2]} = \sigma(z^{[2]})

So far the activation has always been the sigmoid \sigma; in general, any nonlinear function g(z) can be used in its place.
Pros and cons of activation functions
[Plots: sigmoid and tanh activations, and ReLU and Leaky ReLU activations, each showing a as a function of z]

sigmoid: a = \frac{1}{1 + e^{-z}}

For hidden units, tanh usually works better than sigmoid because its output is centered around zero; sigmoid remains the natural choice for a binary output layer. ReLU is the common default because its gradient does not saturate for positive z.
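For reference, a small NumPy sketch of the four activations:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def relu(z):
    return np.maximum(0.0, z)

def leaky_relu(z, slope=0.01):
    return np.where(z > 0, z, slope * z)

z = np.linspace(-5, 5, 11)
for g in (sigmoid, tanh, relu, leaky_relu):
    print(g.__name__, np.round(g(z), 3))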
One hidden layer Neural Network
Why do you need non-linear activation functions?
Activation function
[Figure: the network with inputs x_1, x_2, x_3 and output \hat{y}]

Given x:
z^{[1]} = W^{[1]} x + b^{[1]}
a^{[1]} = g^{[1]}(z^{[1]})
z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}
a^{[2]} = g^{[2]}(z^{[2]})

If both g^{[1]} and g^{[2]} are linear (the identity), the two layers collapse into a single linear function of x: a^{[2]} = W' x + b' for some W', b', so the hidden layer adds nothing. This is why hidden units need a non-linear activation.
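A quick numerical check of this collapse, assuming identity activations; W_prime and b_prime are the composed parameters:

import numpy as np

rng = np.random.default_rng(3)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=(4, 1))
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=(1, 1))
x = rng.normal(size=(3, 1))

# Two "layers" with identity activations...
a2 = W2 @ (W1 @ x + b1) + b2

# ...equal one linear layer with composed parameters
W_prime = W2 @ W1                     # shape (1, 3)
b_prime = W2 @ b1 + b2
assert np.allclose(a2, W_prime @ x + b_prime)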
One hidden layer Neural Network
Derivatives of activation functions
Sigmoid activation function
[Plot: the sigmoid curve, a as a function of z]

g(z) = \frac{1}{1 + e^{-z}},  g'(z) = g(z)(1 - g(z))
Tanh activation function
[Plot: the tanh curve, a as a function of z]

g(z) = \tanh(z),  g'(z) = 1 - \tanh(z)^2
ReLU and Leaky ReLU
[Plots: ReLU (left) and Leaky ReLU (right), a as a function of z]

ReLU: g(z) = \max(0, z), with g'(z) = 0 for z < 0 and 1 for z > 0
Leaky ReLU: g(z) = \max(0.01 z, z), with g'(z) = 0.01 for z < 0 and 1 for z > 0
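These derivatives in NumPy, as a minimal sketch (the value at z = 0 is a convention; returning 0 or 1 there both work in practice):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dsigmoid(z):
    s = sigmoid(z)
    return s * (1 - s)

def dtanh(z):
    return 1 - np.tanh(z) ** 2

def drelu(z):
    return (z > 0).astype(float)

def dleaky_relu(z, slope=0.01):
    return np.where(z > 0, 1.0, slope)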
One hidden layer Neural Network
Backpropagation intuition (Optional)
Computing gradients
Logistic regression
[Computation graph: x, w, b → z = w^T x + b → a = \sigma(z) → \mathcal{L}(a, y)]
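For reference, stepping backward through this graph with the cross-entropy loss \mathcal{L}(a, y) = -(y \log a + (1 - y) \log(1 - a)) gives:

da = -\frac{y}{a} + \frac{1 - y}{1 - a}
dz = da \cdot \sigma'(z) = da \cdot a(1 - a) = a - y
dw = dz \cdot x,  db = dz

These single-example gradients are what the two-layer summary below generalizes.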
Neural network gradients
[Computation graph: x and parameters W^{[1]}, b^{[1]}, W^{[2]}, b^{[2]} feed
z^{[1]} = W^{[1]} x + b^{[1]} → a^{[1]} = \sigma(z^{[1]}) → z^{[2]} = W^{[2]} a^{[1]} + b^{[2]} → a^{[2]} = \sigma(z^{[2]}) → \mathcal{L}(a^{[2]}, y)]
Summary of gradient descent
dz^{[2]} = a^{[2]} - y
dW^{[2]} = dz^{[2]} a^{[1]T}
db^{[2]} = dz^{[2]}
dz^{[1]} = W^{[2]T} dz^{[2]} * g^{[1]'}(z^{[1]})    (* is the element-wise product)
dW^{[1]} = dz^{[1]} x^T
db^{[1]} = dz^{[1]}
Summary of gradient descent
Single example:
dz^{[2]} = a^{[2]} - y
dW^{[2]} = dz^{[2]} a^{[1]T}
db^{[2]} = dz^{[2]}
dz^{[1]} = W^{[2]T} dz^{[2]} * g^{[1]'}(z^{[1]})
dW^{[1]} = dz^{[1]} x^T
db^{[1]} = dz^{[1]}

Vectorized over m examples:
dZ^{[2]} = A^{[2]} - Y
dW^{[2]} = (1/m) dZ^{[2]} A^{[1]T}
db^{[2]} = (1/m) np.sum(dZ^{[2]}, axis=1, keepdims=True)
dZ^{[1]} = W^{[2]T} dZ^{[2]} * g^{[1]'}(Z^{[1]})
dW^{[1]} = (1/m) dZ^{[1]} X^T
db^{[1]} = (1/m) np.sum(dZ^{[1]}, axis=1, keepdims=True)
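Putting the vectorized formulas into NumPy, a minimal sketch of one gradient computation and parameter update for the sigmoid-activated network (parameter shapes, random data, and the learning rate are illustrative assumptions):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(4)
n_x, n_h, m = 3, 4, 5
W1, b1 = rng.normal(size=(n_h, n_x)), np.zeros((n_h, 1))
W2, b2 = rng.normal(size=(1, n_h)), np.zeros((1, 1))
X = rng.normal(size=(n_x, m))
Y = rng.integers(0, 2, size=(1, m)).astype(float)

# Forward pass
Z1 = W1 @ X + b1
A1 = sigmoid(Z1)
Z2 = W2 @ A1 + b2
A2 = sigmoid(Z2)

# Backward pass, matching the vectorized formulas above
dZ2 = A2 - Y
dW2 = (1 / m) * dZ2 @ A1.T
db2 = (1 / m) * np.sum(dZ2, axis=1, keepdims=True)
dZ1 = (W2.T @ dZ2) * (A1 * (1 - A1))  # g^{[1]'}(Z1) for sigmoid is A1(1 - A1)
dW1 = (1 / m) * dZ1 @ X.T
db1 = (1 / m) * np.sum(dZ1, axis=1, keepdims=True)

# One gradient-descent step with a hypothetical learning rate
lr = 0.1
W1 -= lr * dW1; b1 -= lr * db1
W2 -= lr * dW2; b2 -= lr * db2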
Slides: Andrew Ng