ECS171: Machine Learning: Lecture 10: Neural Networks
Cho-Jui Hsieh
UC Davis
Example: $h = (h_1 \text{ OR } h_2)$:
$$h(x) = \operatorname{sign}\big(1.5 + h_1(x) + h_2(x)\big), \qquad h_1(x) = \operatorname{sign}(w_1^T x), \quad h_2(x) = \operatorname{sign}(w_2^T x)$$
Since $h_1, h_2 \in \{-1, +1\}$, the sum $1.5 + h_1(x) + h_2(x)$ is negative only when both perceptrons output $-1$, so $h$ is exactly the OR of the two.
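A quick numerical sanity check of this construction (a NumPy sketch; the weight vectors `w1` and `w2` are made-up placeholders, with the bias folded in as $x_0 = 1$):

```python
import numpy as np

def sign(s):
    # sign activation mapping into {-1, +1}
    return np.where(s >= 0, 1.0, -1.0)

# hypothetical weight vectors for the two perceptrons
w1 = np.array([0.5, 1.0, -1.0])
w2 = np.array([-0.5, -1.0, 1.0])

def h(x):
    h1 = sign(w1 @ x)              # h_1(x) = sign(w_1^T x)
    h2 = sign(w2 @ x)              # h_2(x) = sign(w_2^T x)
    return sign(1.5 + h1 + h2)     # +1 unless h1 = h2 = -1, i.e. h1 OR h2

x = np.array([1.0, 0.3, -0.7])     # x[0] = 1 is the bias coordinate
print(h(x))                        # here h1 = +1, h2 = -1, so h(x) = +1
```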
Creating more layers
Feedforward network
Activation Function

Each unit applies a nonlinear activation $\theta$ to its incoming signal; a common choice (assumed in the sketches below) is $\theta(s) = \tanh(s)$, with derivative $\theta'(s) = 1 - \tanh^2(s)$.
Formal Definitions

Layers: $1 \le l \le L$
Weights: $w_{ij}^{(l)}$, where $0 \le i \le d^{(l-1)}$ indexes inputs and $1 \le j \le d^{(l)}$ indexes outputs
Signal: $s_j^{(l)} = \sum_{i=0}^{d^{(l-1)}} w_{ij}^{(l)} \, x_i^{(l-1)}$
Activation: $x_j^{(l)} = \theta\big(s_j^{(l)}\big)$, with $x_0^{(l)} = 1$ as the bias coordinate
Output: $h(x) = x_1^{(L)}$
Forward propagation

Starting from the input $x^{(0)} = x_n$, compute $s_j^{(l)}$ and then $x_j^{(l)} = \theta\big(s_j^{(l)}\big)$ for each layer $l = 1, \dots, L$; the network's prediction is $h(x) = x_1^{(L)}$.
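A minimal NumPy sketch of forward propagation under these definitions (the function name `forward`, the layer sizes, and the use of $\tanh$ as $\theta$ are illustrative assumptions, not from the slides):

```python
import numpy as np

def forward(weights, x):
    """Sketch of forward propagation (assumed tanh activation).

    weights[l-1] has shape (d^{(l-1)} + 1, d^{(l)}): row 0 holds the
    bias weights w_{0j}^{(l)}, matching 0 <= i <= d^{(l-1)}.
    Returns all signals s^{(l)} and activations x^{(l)}, which
    backpropagation reuses.
    """
    xs = [np.asarray(x, dtype=float)]            # x^{(0)} = input
    ss = []
    for W in weights:
        x_aug = np.concatenate(([1.0], xs[-1]))  # prepend x_0 = 1
        s = W.T @ x_aug                          # s_j^{(l)} = sum_i w_ij x_i^{(l-1)}
        ss.append(s)
        xs.append(np.tanh(s))                    # x_j^{(l)} = theta(s_j^{(l)})
    return ss, xs

# hypothetical 2-3-1 network with random weights
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 3)), rng.normal(size=(4, 1))]
ss, xs = forward(weights, [0.5, -1.2])
print("h(x) =", xs[-1][0])                       # output x_1^{(L)}
```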
Stochastic Gradient Descent

Pick one training example $(x_n, y_n)$ at a time and take a gradient step on the per-example error $e(h(x_n), y_n) = e(W)$:
$$w_{ij}^{(l)} \leftarrow w_{ij}^{(l)} - \eta \, \frac{\partial e(W)}{\partial w_{ij}^{(l)}}$$
This requires the gradient
$$\nabla e(W) = \left\{ \frac{\partial e(W)}{\partial w_{ij}^{(l)}} \right\} \text{ for all } i, j, l.$$
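A minimal sketch of the SGD loop, assuming a hypothetical `gradient(weights, x_n, y_n)` routine (implemented via backpropagation below) that returns $\partial e / \partial w_{ij}^{(l)}$ for every layer:

```python
import numpy as np

def sgd(weights, data, gradient, lr=0.1, epochs=100, seed=0):
    """Plain SGD sketch: one randomly chosen example per update."""
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        for n in rng.permutation(len(data)):     # visit examples in random order
            x_n, y_n = data[n]
            grads = gradient(weights, x_n, y_n)  # one array of de/dw per layer
            for W, G in zip(weights, grads):
                W -= lr * G                      # w_ij <- w_ij - eta * de/dw_ij
    return weights
```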
Computing the Gradient $\dfrac{\partial e(W)}{\partial w_{ij}^{(l)}}$

Define $\delta_j^{(l)} := \dfrac{\partial e(W)}{\partial s_j^{(l)}}$. Since $s_j^{(l)} = \sum_i w_{ij}^{(l)} x_i^{(l-1)}$, the chain rule gives
$$\frac{\partial e(W)}{\partial w_{ij}^{(l)}} = \frac{\partial e(W)}{\partial s_j^{(l)}} \times \frac{\partial s_j^{(l)}}{\partial w_{ij}^{(l)}} = \delta_j^{(l)} \, x_i^{(l-1)},$$
so the whole gradient follows once every $\delta_j^{(l)}$ is known.
Compute the $\delta_j^{(l)}$ layer by layer, from the output backward:
$$\begin{aligned}
\delta_i^{(l-1)} = \frac{\partial e(W)}{\partial s_i^{(l-1)}}
&= \sum_{j=1}^{d^{(l)}} \frac{\partial e(W)}{\partial s_j^{(l)}} \times \frac{\partial s_j^{(l)}}{\partial x_i^{(l-1)}} \times \frac{\partial x_i^{(l-1)}}{\partial s_i^{(l-1)}} \\
&= \sum_{j=1}^{d^{(l)}} \delta_j^{(l)} \times w_{ij}^{(l)} \times \theta'\big(s_i^{(l-1)}\big)
\end{aligned}$$
At the output layer, with squared error $e(W) = \big(x_1^{(L)} - y_n\big)^2$:
$$\delta_1^{(L)} = \frac{\partial e(W)}{\partial s_1^{(L)}} = \frac{\partial e(W)}{\partial x_1^{(L)}} \times \frac{\partial x_1^{(L)}}{\partial s_1^{(L)}} = 2\big(x_1^{(L)} - y_n\big) \times \theta'\big(s_1^{(L)}\big)$$
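Combining the output-layer base case with the layer-by-layer recursion gives a backpropagation sketch. It assumes the hypothetical `forward` from the earlier sketch, squared error, and $\tanh$ activation with $\theta'(s) = 1 - \tanh^2(s)$:

```python
import numpy as np

def gradient(weights, x_n, y_n):
    """Backpropagation for one example: returns de/dw for every layer."""
    ss, xs = forward(weights, x_n)               # signals and activations
    theta_prime = lambda s: 1.0 - np.tanh(s) ** 2

    # base case: delta_1^{(L)} = 2 (x_1^{(L)} - y_n) * theta'(s_1^{(L)})
    delta = 2.0 * (xs[-1] - y_n) * theta_prime(ss[-1])

    grads = [None] * len(weights)
    for l in reversed(range(len(weights))):
        x_aug = np.concatenate(([1.0], xs[l]))   # x^{(l-1)} with x_0 = 1
        # de/dw_ij = x_i^{(l-1)} * delta_j^{(l)}
        grads[l] = np.outer(x_aug, delta)
        if l > 0:
            # recursion: delta_i^{(l-1)} = theta'(s_i^{(l-1)}) * sum_j w_ij delta_j
            # (the bias row i = 0 does not propagate a delta)
            delta = theta_prime(ss[l - 1]) * (weights[l][1:] @ delta)
    return grads
```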
Backpropagation

The full algorithm for one example $(x_n, y_n)$:
1. Forward propagation: compute all signals $s_j^{(l)}$ and activations $x_j^{(l)}$.
2. Backward propagation: compute $\delta_1^{(L)}$, then $\delta^{(l)}$ for $l = L-1, \dots, 1$ via the recursion above.
3. Read off the gradient: $\frac{\partial e(W)}{\partial w_{ij}^{(l)}} = x_i^{(l-1)} \delta_j^{(l)}$.
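Tying the sketches together into a small end-to-end run (all names come from the earlier sketches; the dataset is fabricated purely for illustration):

```python
# fabricated toy regression data with targets in (-1, 1) to match tanh output
rng = np.random.default_rng(1)
data = [(x, np.tanh(x.sum())) for x in rng.normal(size=(50, 2))]

# hypothetical 2-3-1 network, trained with the SGD and gradient sketches above
weights = [rng.normal(scale=0.5, size=(3, 3)), rng.normal(scale=0.5, size=(4, 1))]
weights = sgd(weights, data, gradient, lr=0.05, epochs=200)

_, xs = forward(weights, np.array([0.2, -0.4]))
print("prediction:", xs[-1][0])
```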
Questions?