Multilayer Perceptron: R – S1 – S2 – S3 Network

(figure: a three-layer network with R inputs and S1, S2, S3 neurons in successive layers)
Example

(figure: the two shaded decision regions for the example classification problem)
Elementary Decision Boundaries

First Boundary:

    a^1_1 = hardlim( [-1 0] p + 0.5 )

Second Boundary:

    a^1_2 = hardlim( [0 -1] p + 0.75 )
First Subnetwork

(figure: inputs p1, p2 feed two individual-decision neurons with weights [-1 0] and [0 -1] and biases 0.5 and 0.75; an AND neuron with weights [1 1] and bias -1.5 combines their outputs)
Elementary Decision Boundaries

Third Boundary:

    a^1_3 = hardlim( [1 0] p - 1.5 )

Fourth Boundary:

    a^1_4 = hardlim( [0 1] p - 0.25 )
Second Subnetwork

(figure: inputs p1, p2 feed two individual-decision neurons with weights [1 0] and [0 1] and biases -1.5 and -0.25; an AND neuron with weights [1 1] and bias -1.5 combines their outputs)
Total Network

    W^1 = [-1 0; 0 -1; 1 0; 0 1],   b^1 = [0.5; 0.75; -1.5; -0.25]

    W^2 = [1 1 0 0; 0 0 1 1],       b^2 = [-1.5; -1.5]

    W^3 = [1 1],                    b^3 = [-0.5]

(figure: the 2-4-2-1 network — p is 2x1; W^1 is 4x2 with b^1 4x1; W^2 is 2x4 with b^2 2x1; W^3 is 1x2 with b^3 1x1)
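The total network can be checked numerically. A minimal NumPy sketch (taking hardlim as 1 for n >= 0, 0 otherwise), which outputs 1 inside either of the two decision regions and 0 outside:

```python
import numpy as np

def hardlim(n):
    # hardlim: 1 if n >= 0, else 0
    return (n >= 0).astype(float)

# Weights and biases from the Total Network slide
W1 = np.array([[-1, 0], [0, -1], [1, 0], [0, 1]], dtype=float)
b1 = np.array([0.5, 0.75, -1.5, -0.25])
W2 = np.array([[1, 1, 0, 0], [0, 0, 1, 1]], dtype=float)
b2 = np.array([-1.5, -1.5])
W3 = np.array([[1, 1]], dtype=float)
b3 = np.array([-0.5])

def net(p):
    a1 = hardlim(W1 @ p + b1)   # individual decision boundaries
    a2 = hardlim(W2 @ a1 + b2)  # AND of each pair of boundaries
    a3 = hardlim(W3 @ a2 + b3)  # OR of the two subnetworks
    return float(a3[0])
```

For example, (0, 0) lies inside the first region and (1, 0) outside both, so `net` returns 1 and 0 respectively.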
Function Approximation Example

(figure: 1-2-1 network — input p, log-sigmoid hidden layer with weights w^1_{1,1}, w^1_{2,1} and biases b^1_1, b^1_2, linear output layer with weights w^2_{1,1}, w^2_{1,2} and bias b^2)

    f^1(n) = 1 / (1 + e^{-n})        f^2(n) = n

Nominal second-layer parameters:

    w^2_{1,1} = 1,  w^2_{1,2} = 1,  b^2 = 0
Nominal Response

(figure: network response a^2 versus p over [-2, 2] for the nominal parameter values)
Parameter Variations

(figure: four panels showing the network response over [-2, 2] as one parameter at a time is varied)

    -1 <= w^2_{1,1} <= 1        0 <= b^1_2 <= 20
    -1 <= w^2_{1,2} <= 1        -1 <= b^2 <= 1
Multilayer Network

    a^0 = p
    a^{m+1} = f^{m+1}( W^{m+1} a^m + b^{m+1} ),   m = 0, 1, ..., M - 1
    a = a^M
Performance Index

Training Set:

    {p_1, t_1}, {p_2, t_2}, ..., {p_Q, t_Q}

Mean squared error (vector case):

    F(x) = E[ e^T e ] = E[ (t - a)^T (t - a) ]

Approximate (single-sample) performance index, used for the gradient below:

    F̂(x) = (t(k) - a(k))^T (t(k) - a(k))
Chain Rule

Example:

    f(n) = cos(n),   n = e^{2w},   so f(n(w)) = cos(e^{2w})

    d f(n(w)) / dw = (df/dn)(dn/dw) = (-sin(n))(2e^{2w}) = (-sin(e^{2w}))(2e^{2w})

Application to the gradient:

    ∂F̂/∂w^m_{i,j} = (∂F̂/∂n^m_i)(∂n^m_i/∂w^m_{i,j})

    ∂F̂/∂b^m_i = (∂F̂/∂n^m_i)(∂n^m_i/∂b^m_i)
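The chain-rule example can be verified with a finite-difference check; the sketch below uses an arbitrary evaluation point w = 0.3:

```python
import math

# f(n(w)) = cos(e^{2w}); chain rule gives
# d/dw cos(e^{2w}) = (-sin(e^{2w})) * (2 e^{2w})
def f(w):
    return math.cos(math.exp(2 * w))

w = 0.3
analytic = -math.sin(math.exp(2 * w)) * 2 * math.exp(2 * w)

# central finite difference approximation of df/dw
eps = 1e-6
numeric = (f(w + eps) - f(w - eps)) / (2 * eps)

assert abs(analytic - numeric) < 1e-4
```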
Gradient Calculation

    n^m_i = Σ_{j=1}^{S^{m-1}} w^m_{i,j} a^{m-1}_j + b^m_i

    ∂n^m_i/∂w^m_{i,j} = a^{m-1}_j        ∂n^m_i/∂b^m_i = 1

Sensitivity:

    s^m_i ≡ ∂F̂/∂n^m_i

Gradient:

    ∂F̂/∂w^m_{i,j} = s^m_i a^{m-1}_j      ∂F̂/∂b^m_i = s^m_i
Steepest Descent

    w^m_{i,j}(k+1) = w^m_{i,j}(k) - α s^m_i a^{m-1}_j        b^m_i(k+1) = b^m_i(k) - α s^m_i

Matrix form:

    W^m(k+1) = W^m(k) - α s^m (a^{m-1})^T        b^m(k+1) = b^m(k) - α s^m

where the sensitivity vector is

    s^m ≡ ∂F̂/∂n^m = [ ∂F̂/∂n^m_1; ∂F̂/∂n^m_2; ...; ∂F̂/∂n^m_{S^m} ]

Jacobian Matrix

Each element of the Jacobian ∂n^{m+1}/∂n^m is

    ∂n^{m+1}_i/∂n^m_j = w^{m+1}_{i,j} ḟ^m(n^m_j),   where   ḟ^m(n^m_j) = ∂f^m(n^m_j)/∂n^m_j

so that

    ∂n^{m+1}/∂n^m = W^{m+1} Ḟ^m(n^m)

with the diagonal matrix

    Ḟ^m(n^m) = diag( ḟ^m(n^m_1), ḟ^m(n^m_2), ..., ḟ^m(n^m_{S^m}) )
Backpropagation (Sensitivities)

    s^m = ∂F̂/∂n^m = ( ∂n^{m+1}/∂n^m )^T ∂F̂/∂n^{m+1} = Ḟ^m(n^m) (W^{m+1})^T ∂F̂/∂n^{m+1}

    s^m = Ḟ^m(n^m) (W^{m+1})^T s^{m+1}

The sensitivities are propagated backward through the network:

    s^M → s^{M-1} → ... → s^2 → s^1
Initialization (Last Layer)

    s^M_i = ∂F̂/∂n^M_i = ∂[ (t - a)^T (t - a) ]/∂n^M_i = ∂[ Σ_{j=1}^{S^M} (t_j - a_j)^2 ]/∂n^M_i = -2 (t_i - a_i) ∂a_i/∂n^M_i

    ∂a_i/∂n^M_i = ∂a^M_i/∂n^M_i = ∂f^M(n^M_i)/∂n^M_i = ḟ^M(n^M_i)

    s^M_i = -2 (t_i - a_i) ḟ^M(n^M_i)

Matrix form:

    s^M = -2 Ḟ^M(n^M) (t - a)
Summary

Forward Propagation:

    a^0 = p
    a^{m+1} = f^{m+1}( W^{m+1} a^m + b^{m+1} ),   m = 0, 1, ..., M - 1
    a = a^M

Backpropagation:

    s^M = -2 Ḟ^M(n^M) (t - a)
    s^m = Ḟ^m(n^m) (W^{m+1})^T s^{m+1},   m = M - 1, ..., 2, 1

Weight Update:

    W^m(k+1) = W^m(k) - α s^m (a^{m-1})^T        b^m(k+1) = b^m(k) - α s^m
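The summary equations can be checked against finite differences. A minimal sketch for a 1-2-1 network (log-sigmoid hidden layer, linear output); the parameter values here are illustrative, not from the text:

```python
import numpy as np

def logsig(n):
    return 1.0 / (1.0 + np.exp(-n))

def forward(W1, b1, W2, b2, p):
    a1 = logsig(W1 * p + b1)   # a^0 = p, log-sigmoid hidden layer
    a2 = float(W2 @ a1 + b2)   # linear output layer
    return a1, a2

# Illustrative parameters (assumed, not from the slides)
W1 = np.array([0.2, -0.3]); b1 = np.array([0.1, 0.4])
W2 = np.array([0.5, -0.6]); b2 = 0.05
p, t = 1.0, 1.5

a1, a2 = forward(W1, b1, W2, b2, p)

# Backpropagation: s^2 = -2 f2'(n)(t - a) with f2' = 1,
# then s^1 = F1'(n^1) (W^2)^T s^2 with f1'(n) = (1 - a)(a)
s2 = -2.0 * 1.0 * (t - a2)
s1 = a1 * (1 - a1) * W2 * s2

# Gradients: dF/dW^m = s^m (a^{m-1})^T
grad_W2 = s2 * a1
grad_W1 = s1 * p

# Finite-difference check of both weight gradients
eps = 1e-6
for j in range(2):
    Wp, Wm = W2.copy(), W2.copy()
    Wp[j] += eps; Wm[j] -= eps
    fp = (t - forward(W1, b1, Wp, b2, p)[1]) ** 2
    fm = (t - forward(W1, b1, Wm, b2, p)[1]) ** 2
    assert abs(grad_W2[j] - (fp - fm) / (2 * eps)) < 1e-5
for j in range(2):
    Vp, Vm = W1.copy(), W1.copy()
    Vp[j] += eps; Vm[j] -= eps
    fp = (t - forward(Vp, b1, W2, b2, p)[1]) ** 2
    fm = (t - forward(Vm, b1, W2, b2, p)[1]) ** 2
    assert abs(grad_W1[j] - (fp - fm) / (2 * eps)) < 1e-5
```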
Example: Function Approximation

    g(p) = 1 + sin( (π/4) p )

(figure: g(p) provides the target t; a 1-2-1 network produces a; the error is e = t - a)
Network

(figure: the 1-2-1 network — log-sigmoid hidden layer, linear output layer — and its initial response over [-2, 2])

Initial conditions:

    W^1(0) = [-0.27; -0.41],   b^1(0) = [-0.48; -0.13]
    W^2(0) = [0.09 -0.17],     b^2(0) = [0.48]
Forward Propagation

    a^0 = p = 1

    a^1 = logsig( W^1 a^0 + b^1 ) = logsig( [-0.27; -0.41](1) + [-0.48; -0.13] )

    a^1 = [ 1/(1 + e^{0.75}); 1/(1 + e^{0.54}) ] = [0.321; 0.368]

    a^2 = purelin( W^2 a^1 + b^2 ) = [0.09 -0.17][0.321; 0.368] + 0.48 = 0.446

    e = t - a = ( 1 + sin((π/4) p) ) - a^2 = ( 1 + sin(π/4) ) - 0.446 = 1.261
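A quick NumPy check of the forward pass, reproducing the numbers above:

```python
import numpy as np

# Initial weights of the 1-2-1 example network
W1 = np.array([-0.27, -0.41]); b1 = np.array([-0.48, -0.13])
W2 = np.array([0.09, -0.17]);  b2 = 0.48

p = 1.0
a0 = p
a1 = 1.0 / (1.0 + np.exp(-(W1 * a0 + b1)))  # log-sigmoid layer
a2 = float(W2 @ a1 + b2)                    # linear layer
t = 1.0 + np.sin(np.pi / 4 * p)             # target g(p)
e = t - a2
```

This yields a^1 ≈ [0.321, 0.368], a^2 ≈ 0.446, and e ≈ 1.261, matching the slide.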
Transfer Function Derivatives

    ḟ^1(n) = d/dn [ 1/(1 + e^{-n}) ] = e^{-n}/(1 + e^{-n})^2 = ( 1 - 1/(1 + e^{-n}) )( 1/(1 + e^{-n}) ) = (1 - a^1)(a^1)

    ḟ^2(n) = d/dn (n) = 1
Backpropagation

    s^2 = -2 Ḟ^2(n^2)(t - a) = -2 ḟ^2(n^2)(1.261) = -2(1)(1.261) = -2.522

    s^1 = Ḟ^1(n^1) (W^2)^T s^2
        = [ (1 - a^1_1)(a^1_1)  0;  0  (1 - a^1_2)(a^1_2) ] [0.09; -0.17] (-2.522)
        = [ (1 - 0.321)(0.321)  0;  0  (1 - 0.368)(0.368) ] [0.09; -0.17] (-2.522)
        = [ -0.0495; 0.0997 ]
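The same sensitivities in NumPy, starting from the forward-pass values:

```python
import numpy as np

a1 = np.array([0.321, 0.368])   # hidden-layer outputs from the forward pass
W2 = np.array([0.09, -0.17])
e = 1.261                        # t - a from the forward pass

s2 = -2.0 * 1.0 * e              # f2'(n) = 1 for the linear layer
s1 = (1 - a1) * a1 * W2 * s2     # diagonal F1'(n^1) times (W^2)^T s^2
```

This gives s^2 ≈ -2.522 and s^1 ≈ [-0.0495, 0.0997].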
Weight Update

    α = 0.1

    W^2(1) = W^2(0) - α s^2 (a^1)^T = [0.09 -0.17] - 0.1(-2.522)[0.321 0.368]

    W^2(1) = [0.171 -0.0772]

    b^2(1) = b^2(0) - α s^2 = 0.48 - 0.1(-2.522) = 0.732

    W^1(1) = W^1(0) - α s^1 (a^0)^T = [-0.27; -0.41] - 0.1[-0.0495; 0.0997](1) = [-0.265; -0.420]

    b^1(1) = b^1(0) - α s^1 = [-0.48; -0.13] - 0.1[-0.0495; 0.0997] = [-0.475; -0.140]
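The steepest-descent update, checked numerically with the values computed above:

```python
import numpy as np

alpha = 0.1
a0, a1 = 1.0, np.array([0.321, 0.368])
s1 = np.array([-0.0495, 0.0997])
s2 = -2.522

# W^m(k+1) = W^m(k) - alpha s^m (a^{m-1})^T,  b^m(k+1) = b^m(k) - alpha s^m
W2_new = np.array([0.09, -0.17]) - alpha * s2 * a1
b2_new = 0.48 - alpha * s2
W1_new = np.array([-0.27, -0.41]) - alpha * s1 * a0
b1_new = np.array([-0.48, -0.13]) - alpha * s1
```

This reproduces W^2(1) ≈ [0.171, -0.0772], b^2(1) ≈ 0.732, W^1(1) ≈ [-0.265, -0.420], b^1(1) ≈ [-0.475, -0.140].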
1-3-1 Network

(figure: four panels showing 1-3-1 network responses for target functions g(p) = 1 + sin( (iπ/4) p ) with i = 1, 2, 4, 8, over [-2, 2])
Choice of Network Architecture

    g(p) = 1 + sin( (6π/4) p )

(figure: four panels comparing trained 1-2-1, 1-3-1, 1-4-1, and 1-5-1 network responses to g(p) over [-2, 2])
Convergence

    g(p) = 1 + sin( π p )

(figure: two panels showing the sequence of network responses, numbered 0 through 5, during training from two different initial conditions)
Generalization

Training Set:

    {p_1, t_1}, {p_2, t_2}, ..., {p_Q, t_Q}

    g(p) = 1 + sin( (π/4) p ),   p = -2, -1.6, -1.2, ..., 1.6, 2

(figure: two panels comparing trained 1-2-1 and 1-9-1 network responses to the 11 training points over [-2, 2])