Machine Learning: Dr. Syed Aun Irtaza Lecture 10-11
Neural Networks
Overview
What is a Neural Network?
[Figure: a two-layer neural network with inputs x_1, x_2, x_3, one hidden layer, and output \hat{y}; parameters W^{[1]}, b^{[1]}, W^{[2]}, b^{[2]}]

The computation graph for one forward pass:
z^{[1]} = W^{[1]} x + b^{[1]}
a^{[1]} = \sigma(z^{[1]})
z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}
a^{[2]} = \sigma(z^{[2]})
\mathcal{L}(a^{[2]}, y)
One hidden layer Neural Network
Neural Network Representation
[Figure: a 2-layer neural network: input layer (x_1, x_2, x_3), one hidden layer of four units, and an output layer producing \hat{y}]
One hidden layer Neural Network
Computing a Neural Network's Output
Neural Network Representation
[Figure: a single sigmoid unit: inputs x_1, x_2, x_3 feed z = w^T x + b, followed by a = \sigma(z) = \hat{y}]

A single unit performs two steps of computation:
z = w^T x + b
a = \sigma(z)
Every hidden unit of the network repeats these same two steps on the inputs x_1, x_2, x_3, and the output unit repeats them on the hidden activations to produce \hat{y}.
Neural Network Representation
With four hidden units, the first layer computes, unit by unit:
z^{[1]}_1 = w^{[1]T}_1 x + b^{[1]}_1,  a^{[1]}_1 = \sigma(z^{[1]}_1)
z^{[1]}_2 = w^{[1]T}_2 x + b^{[1]}_2,  a^{[1]}_2 = \sigma(z^{[1]}_2)
z^{[1]}_3 = w^{[1]T}_3 x + b^{[1]}_3,  a^{[1]}_3 = \sigma(z^{[1]}_3)
z^{[1]}_4 = w^{[1]T}_4 x + b^{[1]}_4,  a^{[1]}_4 = \sigma(z^{[1]}_4)
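These four dot products are just the rows of one matrix: stacking the row vectors w^{[1]T}_i gives W^{[1]} (shape 4x3), so z^{[1]} = W^{[1]} x + b^{[1]}. A minimal NumPy check of this, with random stand-in parameters:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 1))           # 3 input features, one example

# One weight vector and bias per hidden unit (4 hidden units)
w = [rng.normal(size=(3, 1)) for _ in range(4)]
b = rng.normal(size=(4, 1))

# Per-unit computation: z_i = w_i^T x + b_i
z_loop = np.array([(w[i].T @ x).item() + b[i].item()
                   for i in range(4)]).reshape(4, 1)

# Stacked computation: W1 has the w_i^T as its rows, so z = W1 x + b
W1 = np.hstack(w).T                   # shape (4, 3)
z_vec = W1 @ x + b

assert np.allclose(z_loop, z_vec)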
Neural Network Representation learning
[Figure: the network with inputs x_1, x_2, x_3, hidden activations a^{[1]}_1 through a^{[1]}_4, and output \hat{y}]

Given input x:
z^{[1]} = W^{[1]} x + b^{[1]}
a^{[1]} = \sigma(z^{[1]})
z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}
a^{[2]} = \sigma(z^{[2]}) = \hat{y}
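A minimal NumPy sketch of this forward pass for the 3-input, 4-hidden-unit, 1-output network; the layer sizes and random initialization here are illustrative assumptions:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n_x, n_h = 3, 4                       # input features, hidden units

W1 = rng.normal(size=(n_h, n_x)) * 0.01
b1 = np.zeros((n_h, 1))
W2 = rng.normal(size=(1, n_h)) * 0.01
b2 = np.zeros((1, 1))

x = rng.normal(size=(n_x, 1))         # one training example as a column

z1 = W1 @ x + b1                      # shape (4, 1)
a1 = sigmoid(z1)
z2 = W2 @ a1 + b2                     # shape (1, 1)
y_hat = sigmoid(z2)                   # a^{[2]} = \hat{y}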
One hidden layer Neural Network
Vectorizing across multiple examples
Vectorizing across multiple examples
[Figure: the same network, inputs x_1, x_2, x_3 and output \hat{y}]

For a single example:
z^{[1]} = W^{[1]} x + b^{[1]}
a^{[1]} = \sigma(z^{[1]})
z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}
a^{[2]} = \sigma(z^{[2]})
Vectorizing across multiple examples
To process all m training examples, repeat the computation in a loop:
for i = 1 to m:
    z^{[1](i)} = W^{[1]} x^{(i)} + b^{[1]}
    a^{[1](i)} = \sigma(z^{[1](i)})
    z^{[2](i)} = W^{[2]} a^{[1](i)} + b^{[2]}
    a^{[2](i)} = \sigma(z^{[2](i)})
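The loop translates directly to Python. A sketch under the same illustrative shapes as before (3 inputs, 4 hidden units, and a hypothetical data matrix X with examples as columns):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
n_x, n_h, m = 3, 4, 5
W1, b1 = rng.normal(size=(n_h, n_x)), np.zeros((n_h, 1))
W2, b2 = rng.normal(size=(1, n_h)), np.zeros((1, 1))
X = rng.normal(size=(n_x, m))         # X = [x(1) x(2) ... x(m)], examples as columns

A2 = np.zeros((1, m))
for i in range(m):
    x_i = X[:, i:i+1]                 # i-th example, kept as a column vector
    a1 = sigmoid(W1 @ x_i + b1)
    A2[:, i:i+1] = sigmoid(W2 @ a1 + b2)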
One hidden layer Neural Network
Explanation for vectorized implementation
Justification for vectorized implementation
Recap of vectorizing across multiple examples
[Figure: the network with inputs x_1, x_2, x_3 and output \hat{y}]

for i = 1 to m:
    z^{[1](i)} = W^{[1]} x^{(i)} + b^{[1]}
    a^{[1](i)} = \sigma(z^{[1](i)})
    z^{[2](i)} = W^{[2]} a^{[1](i)} + b^{[2]}
    a^{[2](i)} = \sigma(z^{[2](i)})

Stacking the examples as columns,
X = [x^{(1)} x^{(2)} ... x^{(m)}]
A^{[1]} = [a^{[1](1)} a^{[1](2)} ... a^{[1](m)}]
the loop collapses into four matrix operations:
Z^{[1]} = W^{[1]} X + b^{[1]}
A^{[1]} = \sigma(Z^{[1]})
Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}
A^{[2]} = \sigma(Z^{[2]})
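A minimal NumPy sketch of the vectorized version, checked against the explicit loop (shapes and random data are illustrative assumptions; b^{[1]} and b^{[2]} broadcast across the m columns):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
n_x, n_h, m = 3, 4, 5
W1, b1 = rng.normal(size=(n_h, n_x)), np.zeros((n_h, 1))
W2, b2 = rng.normal(size=(1, n_h)), np.zeros((1, 1))
X = rng.normal(size=(n_x, m))

# Vectorized: one matrix product per layer for all m examples
Z1 = W1 @ X + b1                      # shape (n_h, m)
A1 = sigmoid(Z1)
Z2 = W2 @ A1 + b2                     # shape (1, m)
A2 = sigmoid(Z2)

# Same result as the per-example loop
A2_loop = np.zeros((1, m))
for i in range(m):
    a1 = sigmoid(W1 @ X[:, i:i+1] + b1)
    A2_loop[:, i:i+1] = sigmoid(W2 @ a1 + b2)
assert np.allclose(A2, A2_loop)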
One hidden layer Neural Network
Activation functions
Activation functions
[Figure: the network with inputs x_1, x_2, x_3 and output \hat{y}]

Given x:
z^{[1]} = W^{[1]} x + b^{[1]}
a^{[1]} = \sigma(z^{[1]})
z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}
a^{[2]} = \sigma(z^{[2]})

So far the activation has always been the sigmoid \sigma; in general, any nonlinear function g(z) can be used in its place.
Pros and cons of activation functions
[Plots: sigmoid and tanh activations, and ReLU and Leaky ReLU activations, each showing a as a function of z]

sigmoid: a = \frac{1}{1 + e^{-z}}

For hidden units, tanh usually works better than sigmoid because its output is centered around zero; sigmoid remains the natural choice for a binary output layer. ReLU is the common default because its gradient does not saturate for positive z.
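For reference, a small NumPy sketch of the four activations:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def relu(z):
    return np.maximum(0.0, z)

def leaky_relu(z, slope=0.01):
    return np.where(z > 0, z, slope * z)

z = np.linspace(-5, 5, 11)
for g in (sigmoid, tanh, relu, leaky_relu):
    print(g.__name__, np.round(g(z), 3))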
One hidden layer Neural Network
Why do you need non-linear activation functions?
Activation function
[Figure: the network with inputs x_1, x_2, x_3 and output \hat{y}]

Given x:
z^{[1]} = W^{[1]} x + b^{[1]}
a^{[1]} = g^{[1]}(z^{[1]})
z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}
a^{[2]} = g^{[2]}(z^{[2]})

If both g^{[1]} and g^{[2]} are linear (the identity), the two layers collapse into a single linear function of x: a^{[2]} = W' x + b' for some W', b', so the hidden layer adds nothing. This is why hidden units need a non-linear activation.
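A quick numerical check of this collapse, assuming identity activations; W_prime and b_prime are the composed parameters:

import numpy as np

rng = np.random.default_rng(3)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=(4, 1))
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=(1, 1))
x = rng.normal(size=(3, 1))

# Two "layers" with identity activations...
a2 = W2 @ (W1 @ x + b1) + b2

# ...equal one linear layer with composed parameters
W_prime = W2 @ W1                     # shape (1, 3)
b_prime = W2 @ b1 + b2
assert np.allclose(a2, W_prime @ x + b_prime)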
One hidden layer Neural Network
Derivatives of activation functions
Sigmoid activation function
[Plot: the sigmoid curve, a as a function of z]

g(z) = \frac{1}{1 + e^{-z}},  g'(z) = g(z)(1 - g(z))
Tanh activation function
[Plot: the tanh curve, a as a function of z]

g(z) = \tanh(z),  g'(z) = 1 - \tanh(z)^2
ReLU and Leaky ReLU
[Plots: ReLU (left) and Leaky ReLU (right), a as a function of z]

ReLU: g(z) = \max(0, z), with g'(z) = 0 for z < 0 and 1 for z > 0
Leaky ReLU: g(z) = \max(0.01 z, z), with g'(z) = 0.01 for z < 0 and 1 for z > 0
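These derivatives in NumPy, as a minimal sketch (the value at z = 0 is a convention; returning 0 or 1 there both work in practice):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dsigmoid(z):
    s = sigmoid(z)
    return s * (1 - s)

def dtanh(z):
    return 1 - np.tanh(z) ** 2

def drelu(z):
    return (z > 0).astype(float)

def dleaky_relu(z, slope=0.01):
    return np.where(z > 0, 1.0, slope)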
One hidden layer Neural Network
Backpropagation intuition (Optional)
Computing gradients
Logistic regression
[Computation graph: x, w, b → z = w^T x + b → a = \sigma(z) → \mathcal{L}(a, y)]
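For reference, stepping backward through this graph with the cross-entropy loss \mathcal{L}(a, y) = -(y \log a + (1 - y) \log(1 - a)) gives:

da = -\frac{y}{a} + \frac{1 - y}{1 - a}
dz = da \cdot \sigma'(z) = da \cdot a(1 - a) = a - y
dw = dz \cdot x,  db = dz

These single-example gradients are what the two-layer summary below generalizes.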
Neural network gradients
[Computation graph: x and parameters W^{[1]}, b^{[1]}, W^{[2]}, b^{[2]} feed
z^{[1]} = W^{[1]} x + b^{[1]} → a^{[1]} = \sigma(z^{[1]}) → z^{[2]} = W^{[2]} a^{[1]} + b^{[2]} → a^{[2]} = \sigma(z^{[2]}) → \mathcal{L}(a^{[2]}, y)]
Summary of gradient descent
dz^{[2]} = a^{[2]} - y
dW^{[2]} = dz^{[2]} a^{[1]T}
db^{[2]} = dz^{[2]}
dz^{[1]} = W^{[2]T} dz^{[2]} * g^{[1]'}(z^{[1]})    (* is the element-wise product)
dW^{[1]} = dz^{[1]} x^T
db^{[1]} = dz^{[1]}
Summary of gradient descent
Single example:
dz^{[2]} = a^{[2]} - y
dW^{[2]} = dz^{[2]} a^{[1]T}
db^{[2]} = dz^{[2]}
dz^{[1]} = W^{[2]T} dz^{[2]} * g^{[1]'}(z^{[1]})
dW^{[1]} = dz^{[1]} x^T
db^{[1]} = dz^{[1]}

Vectorized over m examples:
dZ^{[2]} = A^{[2]} - Y
dW^{[2]} = (1/m) dZ^{[2]} A^{[1]T}
db^{[2]} = (1/m) np.sum(dZ^{[2]}, axis=1, keepdims=True)
dZ^{[1]} = W^{[2]T} dZ^{[2]} * g^{[1]'}(Z^{[1]})
dW^{[1]} = (1/m) dZ^{[1]} X^T
db^{[1]} = (1/m) np.sum(dZ^{[1]}, axis=1, keepdims=True)
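Putting the vectorized formulas into NumPy, a minimal sketch of one gradient computation and parameter update for the sigmoid-activated network (parameter shapes, random data, and the learning rate are illustrative assumptions):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(4)
n_x, n_h, m = 3, 4, 5
W1, b1 = rng.normal(size=(n_h, n_x)), np.zeros((n_h, 1))
W2, b2 = rng.normal(size=(1, n_h)), np.zeros((1, 1))
X = rng.normal(size=(n_x, m))
Y = rng.integers(0, 2, size=(1, m)).astype(float)

# Forward pass
Z1 = W1 @ X + b1
A1 = sigmoid(Z1)
Z2 = W2 @ A1 + b2
A2 = sigmoid(Z2)

# Backward pass, matching the vectorized formulas above
dZ2 = A2 - Y
dW2 = (1 / m) * dZ2 @ A1.T
db2 = (1 / m) * np.sum(dZ2, axis=1, keepdims=True)
dZ1 = (W2.T @ dZ2) * (A1 * (1 - A1))  # g^{[1]'}(Z1) for sigmoid is A1(1 - A1)
dW1 = (1 / m) * dZ1 @ X.T
db1 = (1 / m) * np.sum(dZ1, axis=1, keepdims=True)

# One gradient-descent step with a hypothetical learning rate
lr = 0.1
W1 -= lr * dW1; b1 -= lr * db1
W2 -= lr * dW2; b2 -= lr * db2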
Slides: Andrew Ng