Machine Learning

Dr. Syed Aun Irtaza


Lecture 10-11
One hidden layer
Neural Network

Neural Networks
Overview
What is a Neural Network?
[Diagram: logistic regression as a computation graph — inputs x1, x2, x3 feed a single unit that outputs ŷ]

x, w, b → z = wᵀx + b → a = σ(z) → ℒ(a, y)

[Diagram: one-hidden-layer network — inputs x1, x2, x3 feed a hidden layer, whose activations feed an output unit ŷ]

x, W[1], b[1] → z[1] = W[1]x + b[1] → a[1] = σ(z[1])
a[1], W[2], b[2] → z[2] = W[2]a[1] + b[2] → a[2] = σ(z[2]) → ℒ(a[2], y)
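A minimal NumPy sketch of the first computation graph above (a single logistic unit followed by the logistic cross-entropy loss as ℒ); the input, weight, and label values are made up for illustration:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 2.0, -1.0])     # features x1, x2, x3 (illustrative values)
w = np.array([0.1, -0.3, 0.2])     # weights
b = 0.05                           # bias
y = 1                              # true label

z = np.dot(w, x) + b               # z = wᵀx + b
a = sigmoid(z)                     # a = σ(z) = ŷ
loss = -(y * np.log(a) + (1 - y) * np.log(1 - a))   # ℒ(a, y), cross-entropy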
One hidden layer
Neural Network

Neural Network
Representation
Neural Network Representation

[Diagram: one-hidden-layer network with inputs x1, x2, x3, a hidden layer, and output ŷ]
One hidden layer
Neural Network

Computing a
Neural Network’s
Output
Neural Network Representation

[Diagram: zooming in on one unit — it takes x1, x2, x3 and computes z = wᵀx + b, then a = σ(z) = ŷ, exactly like a single logistic regression node]

z = wᵀx + b
a = σ(z)
Neural Network Representation

[Diagram: the same two-step computation repeated for each unit of the hidden layer in the network x1, x2, x3 → hidden layer → ŷ]

z = wᵀx + b
a = σ(z)
Neural Network Representation
[Diagram: inputs x1, x2, x3 feeding four hidden units a₁[1], a₂[1], a₃[1], a₄[1], which feed the output ŷ]

z₁[1] = w₁[1]ᵀx + b₁[1],  a₁[1] = σ(z₁[1])
z₂[1] = w₂[1]ᵀx + b₂[1],  a₂[1] = σ(z₂[1])
z₃[1] = w₃[1]ᵀx + b₃[1],  a₃[1] = σ(z₃[1])
z₄[1] = w₄[1]ᵀx + b₄[1],  a₄[1] = σ(z₄[1])
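A sketch of these four per-unit computations in NumPy, assuming a 3-dimensional input and stacking the four weight vectors w₁[1]ᵀ … w₄[1]ᵀ as rows of a matrix W1 (names and values here are illustrative):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x  = np.array([1.0, 2.0, -1.0])        # input features x1, x2, x3 (illustrative)
W1 = np.random.randn(4, 3) * 0.01      # row i holds wᵢ[1]ᵀ for hidden unit i
b1 = np.zeros(4)                       # biases bᵢ[1] of the four hidden units

a1 = np.empty(4)
for i in range(4):                     # one pass per hidden unit, as in the four equations
    z_i = np.dot(W1[i], x) + b1[i]     # zᵢ[1] = wᵢ[1]ᵀx + bᵢ[1]
    a1[i] = sigmoid(z_i)               # aᵢ[1] = σ(zᵢ[1])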
Neural Network Representation

[Diagram: inputs x1, x2, x3 → hidden units a₁[1] … a₄[1] → output ŷ]

Given input x:
z[1] = W[1]x + b[1]
a[1] = σ(z[1])
z[2] = W[2]a[1] + b[2]
a[2] = σ(z[2])
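The same computation with the per-unit equations stacked into matrices; a sketch assuming a network with 3 inputs, 4 hidden units, and 1 output unit (the shapes are the assumption here):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x  = np.random.randn(3, 1)            # one example, shape (3, 1)
W1 = np.random.randn(4, 3) * 0.01     # W[1]: (4, 3)
b1 = np.zeros((4, 1))                 # b[1]: (4, 1)
W2 = np.random.randn(1, 4) * 0.01     # W[2]: (1, 4)
b2 = np.zeros((1, 1))                 # b[2]: (1, 1)

z1 = W1 @ x + b1                      # z[1] = W[1]x + b[1], shape (4, 1)
a1 = sigmoid(z1)                      # a[1] = σ(z[1])
z2 = W2 @ a1 + b2                     # z[2] = W[2]a[1] + b[2], shape (1, 1)
a2 = sigmoid(z2)                      # a[2] = σ(z[2]) = ŷ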
One hidden layer
Neural Network

Vectorizing across
multiple examples
Vectorizing across multiple examples
[Diagram: the network x1, x2, x3 → hidden layer → ŷ, applied to each training example]

For a single example x:
z[1] = W[1]x + b[1]
a[1] = σ(z[1])
z[2] = W[2]a[1] + b[2]
a[2] = σ(z[2])
Vectorizing across multiple examples
for i = 1 to m:
    z[1](i) = W[1]x(i) + b[1]
    a[1](i) = σ(z[1](i))
    z[2](i) = W[2]a[1](i) + b[2]
    a[2](i) = σ(z[2](i))
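The explicit loop over m training examples as a NumPy sketch, reusing the 3-4-1 shapes assumed above, with the examples stored as columns of a data matrix X (all values illustrative):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

m = 5                                   # number of training examples (illustrative)
X = np.random.randn(3, m)               # columns are x(1), ..., x(m)
W1, b1 = np.random.randn(4, 3) * 0.01, np.zeros((4, 1))
W2, b2 = np.random.randn(1, 4) * 0.01, np.zeros((1, 1))

A2 = np.empty((1, m))
for i in range(m):                      # for i = 1 to m
    x_i = X[:, [i]]                     # x(i), kept as a (3, 1) column
    z1 = W1 @ x_i + b1                  # z[1](i) = W[1]x(i) + b[1]
    a1 = sigmoid(z1)                    # a[1](i) = σ(z[1](i))
    z2 = W2 @ a1 + b2                   # z[2](i) = W[2]a[1](i) + b[2]
    A2[:, [i]] = sigmoid(z2)            # a[2](i) = σ(z[2](i))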
One hidden layer
Neural Network

Explanation
for vectorized
implementation
Justification for vectorized implementation
Recap of vectorizing across multiple examples
[Diagram: the network x1, x2, x3 → hidden layer → ŷ]

for i = 1 to m:
    z[1](i) = W[1]x(i) + b[1]
    a[1](i) = σ(z[1](i))
    z[2](i) = W[2]a[1](i) + b[2]
    a[2](i) = σ(z[2](i))

Stacking the examples as columns:
X = [x(1) x(2) … x(m)]
A[1] = [a[1](1) a[1](2) … a[1](m)]

Z[1] = W[1]X + b[1]
A[1] = σ(Z[1])
Z[2] = W[2]A[1] + b[2]
A[2] = σ(Z[2])
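The vectorized version as a NumPy sketch under the same assumed shapes; b[1] and b[2] are broadcast across the m columns automatically:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

m = 5
X  = np.random.randn(3, m)               # X = [x(1) x(2) ... x(m)], shape (3, m)
W1, b1 = np.random.randn(4, 3) * 0.01, np.zeros((4, 1))
W2, b2 = np.random.randn(1, 4) * 0.01, np.zeros((1, 1))

Z1 = W1 @ X + b1          # Z[1] = W[1]X + b[1]; b[1] broadcasts across the m columns
A1 = sigmoid(Z1)          # A[1] = σ(Z[1]) = [a[1](1) ... a[1](m)]
Z2 = W2 @ A1 + b2         # Z[2] = W[2]A[1] + b[2]
A2 = sigmoid(Z2)          # A[2] = σ(Z[2]), one prediction ŷ(i) per column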
One hidden layer
Neural Network

Activation functions
Activation functions
[Diagram: the network x1, x2, x3 → hidden layer → ŷ]

Given x:
z[1] = W[1]x + b[1]
a[1] = σ(z[1])
z[2] = W[2]a[1] + b[2]
a[2] = σ(z[2])
Pros and cons of activation functions
[Plots: activation curves a = g(z) for the candidate activation functions (sigmoid, tanh, ReLU, Leaky ReLU)]

sigmoid: a = 1 / (1 + e⁻ᶻ)
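NumPy sketches of the activation functions compared above; the 0.01 slope in leaky_relu is the common default and is an assumption here:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))     # a = 1 / (1 + e⁻ᶻ), output in (0, 1)

def tanh(z):
    return np.tanh(z)                    # output in (-1, 1), zero-centred

def relu(z):
    return np.maximum(0, z)              # max(0, z)

def leaky_relu(z, slope=0.01):
    return np.where(z > 0, z, slope * z) # small slope for z < 0 instead of exactly 0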
One hidden layer
Neural Network

Why do you
need non-linear
activation functions?
Activation function
[Diagram: the network x1, x2, x3 → hidden layer → ŷ]

Given x:
z[1] = W[1]x + b[1]
a[1] = g[1](z[1])
z[2] = W[2]a[1] + b[2]
a[2] = g[2](z[2])
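A small sketch of why a non-linear g is needed: if both layers use the identity activation g(z) = z, the two layers collapse into a single linear function of x, so the hidden layer adds no expressive power. The shapes below are illustrative assumptions.

import numpy as np

W1, b1 = np.random.randn(4, 3), np.random.randn(4, 1)
W2, b2 = np.random.randn(1, 4), np.random.randn(1, 1)
x = np.random.randn(3, 1)

# two "layers" with linear (identity) activation
a1 = W1 @ x + b1
a2 = W2 @ a1 + b2

# the same function written as one linear layer: W' = W[2]W[1], b' = W[2]b[1] + b[2]
W_prime = W2 @ W1
b_prime = W2 @ b1 + b2
assert np.allclose(a2, W_prime @ x + b_prime)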
One hidden layer
Neural Network

Derivatives of
activation functions
Sigmoid activation function

[Plot: the sigmoid curve a = g(z)]

g(z) = 1 / (1 + e⁻ᶻ)
g′(z) = g(z)(1 − g(z)) = a(1 − a)
Tanh activation function
[Plot: the tanh curve a = g(z)]

g(z) = tanh(z)
g′(z) = 1 − (tanh(z))² = 1 − a²
ReLU and Leaky ReLU
[Plots: ReLU and Leaky ReLU curves a = g(z)]

ReLU:       g(z) = max(0, z),      g′(z) = 0 if z < 0, 1 if z > 0
Leaky ReLU: g(z) = max(0.01z, z),  g′(z) = 0.01 if z < 0, 1 if z > 0
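NumPy sketches of these derivatives, as used later in backpropagation; setting g′(0) to 0 for ReLU and keeping the 0.01 slope for Leaky ReLU are conventions assumed here:

import numpy as np

def sigmoid_grad(z):
    a = 1.0 / (1.0 + np.exp(-z))
    return a * (1 - a)                       # g′(z) = g(z)(1 - g(z))

def tanh_grad(z):
    return 1 - np.tanh(z) ** 2               # g′(z) = 1 - tanh(z)²

def relu_grad(z):
    return (z > 0).astype(float)             # 1 for z > 0, 0 otherwise (value at z = 0 is a convention)

def leaky_relu_grad(z, slope=0.01):
    return np.where(z > 0, 1.0, slope)       # 1 for z > 0, small slope otherwise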
One hidden layer
Neural Network

Gradient descent for
neural networks
Gradient descent for neural networks
Formulas for computing derivatives
One hidden layer
Neural Network

Backpropagation
intuition (Optional)
Computing gradients
Logistic regression
[Computation graph: x, w, b → z = wᵀx + b → a = σ(z) → ℒ(a, y)]

Andrew Ng
Neural network gradients
[Computation graph: x, W[1], b[1] → z[1] = W[1]x + b[1] → a[1] = σ(z[1]) → (with W[2], b[2]) z[2] = W[2]a[1] + b[2] → a[2] = σ(z[2]) → ℒ(a[2], y)]

Andrew Ng
Summary of gradient descent
dz[2] = a[2] − y
dW[2] = dz[2] a[1]ᵀ
db[2] = dz[2]
dz[1] = W[2]ᵀ dz[2] ∗ g[1]′(z[1])
dW[1] = dz[1] xᵀ
db[1] = dz[1]
Andrew Ng
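A sketch of these six per-example formulas in NumPy, assuming the same 3-4-1 shapes as before, a sigmoid output unit, and tanh as the hidden activation g[1] (so g[1]′(z[1]) = 1 − a[1]²); all values are illustrative:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# one training example and illustrative parameters (3 inputs, 4 hidden units, 1 output)
x, y = np.random.randn(3, 1), 1.0
W1, b1 = np.random.randn(4, 3) * 0.01, np.zeros((4, 1))
W2, b2 = np.random.randn(1, 4) * 0.01, np.zeros((1, 1))

# forward pass (tanh hidden layer, sigmoid output)
z1 = W1 @ x + b1
a1 = np.tanh(z1)
z2 = W2 @ a1 + b2
a2 = sigmoid(z2)

# backward pass, one line per formula above
dz2 = a2 - y                            # dz[2] = a[2] - y
dW2 = dz2 @ a1.T                        # dW[2] = dz[2] a[1]ᵀ
db2 = dz2                               # db[2] = dz[2]
dz1 = (W2.T @ dz2) * (1 - a1 ** 2)      # dz[1] = W[2]ᵀ dz[2] ∗ g[1]′(z[1]), with g[1] = tanh
dW1 = dz1 @ x.T                         # dW[1] = dz[1] xᵀ
db1 = dz1                               # db[1] = dz[1]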
Summary of gradient descent
Single example:
dz[2] = a[2] − y
dW[2] = dz[2] a[1]ᵀ
db[2] = dz[2]
dz[1] = W[2]ᵀ dz[2] ∗ g[1]′(z[1])
dW[1] = dz[1] xᵀ
db[1] = dz[1]

Vectorized over m examples:
dZ[2] = A[2] − Y
dW[2] = (1/m) dZ[2] A[1]ᵀ
db[2] = (1/m) np.sum(dZ[2], axis=1, keepdims=True)
dZ[1] = W[2]ᵀ dZ[2] ∗ g[1]′(Z[1])
dW[1] = (1/m) dZ[1] Xᵀ
db[1] = (1/m) np.sum(dZ[1], axis=1, keepdims=True)
Andrew Ng
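The vectorized column written out in NumPy, again assuming a 3-4-1 network with a tanh hidden layer and a sigmoid output, followed by a gradient descent update with an arbitrarily chosen learning rate:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

m = 100                                                  # number of examples (illustrative)
X, Y = np.random.randn(3, m), np.random.randint(0, 2, (1, m))
W1, b1 = np.random.randn(4, 3) * 0.01, np.zeros((4, 1))
W2, b2 = np.random.randn(1, 4) * 0.01, np.zeros((1, 1))

# vectorized forward pass
Z1 = W1 @ X + b1
A1 = np.tanh(Z1)
Z2 = W2 @ A1 + b2
A2 = sigmoid(Z2)

# vectorized backward pass, matching the formulas above
dZ2 = A2 - Y                                             # dZ[2] = A[2] - Y
dW2 = (1 / m) * dZ2 @ A1.T                               # dW[2] = (1/m) dZ[2] A[1]ᵀ
db2 = (1 / m) * np.sum(dZ2, axis=1, keepdims=True)       # db[2]
dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)                       # dZ[1] = W[2]ᵀ dZ[2] ∗ g[1]′(Z[1]), g[1] = tanh
dW1 = (1 / m) * dZ1 @ X.T                                # dW[1] = (1/m) dZ[1] Xᵀ
db1 = (1 / m) * np.sum(dZ1, axis=1, keepdims=True)       # db[1]

# gradient descent step (learning rate is an arbitrary choice)
alpha = 0.01
W1, b1 = W1 - alpha * dW1, b1 - alpha * db1
W2, b2 = W2 - alpha * dW2, b2 - alpha * db2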
