11 Multilayer Perceptron

$R - S^1 - S^2 - S^3$ Network (a three-layer network: $R$ inputs, $S^1$ neurons in the first layer, $S^2$ in the second, $S^3$ in the third).
Example

Figure: the example two-category classification problem, whose regions are defined by the four decision boundaries on the following slides.
Elementary Decision Boundaries

First Boundary:
$$a^1_1 = \text{hardlim}\left(\begin{bmatrix} -1 & 0 \end{bmatrix} p + 0.5\right)$$

Second Boundary:
$$a^1_2 = \text{hardlim}\left(\begin{bmatrix} 0 & -1 \end{bmatrix} p + 0.75\right)$$
First Subnetwork

Figure: inputs, individual decisions, AND operation. The first two boundary neurons (weights $[-1\;\;0]$, bias $0.5$; weights $[0\;\;-1]$, bias $0.75$) feed an AND neuron with weights $[1\;\;1]$ and bias $-1.5$.
Elementary Decision Boundaries

Third Boundary:
$$a^1_3 = \text{hardlim}\left(\begin{bmatrix} 1 & 0 \end{bmatrix} p - 1.5\right)$$

Fourth Boundary:
$$a^1_4 = \text{hardlim}\left(\begin{bmatrix} 0 & 1 \end{bmatrix} p - 0.25\right)$$
Second Subnetwork

Figure: inputs, individual decisions, AND operation. The third and fourth boundary neurons (weights $[1\;\;0]$, bias $-1.5$; weights $[0\;\;1]$, bias $-0.25$) feed an AND neuron with weights $[1\;\;1]$ and bias $-1.5$.
Total Network

$$W^1 = \begin{bmatrix} -1 & 0 \\ 0 & -1 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} \qquad b^1 = \begin{bmatrix} 0.5 \\ 0.75 \\ -1.5 \\ -0.25 \end{bmatrix}$$

$$W^2 = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix} \qquad b^2 = \begin{bmatrix} -1.5 \\ -1.5 \end{bmatrix}$$

$$W^3 = \begin{bmatrix} 1 & 1 \end{bmatrix} \qquad b^3 = \begin{bmatrix} -0.5 \end{bmatrix}$$

Figure: the complete network. The input $p$ ($2\times 1$) feeds the initial-decision layer ($W^1$: $4\times 2$, $b^1$: $4\times 1$, output $a^1$: $4\times 1$), then the AND layer ($W^2$: $2\times 4$, $b^2$: $2\times 1$, output $a^2$: $2\times 1$), then the OR layer ($W^3$: $1\times 2$, $b^3$: $1\times 1$, output $a^3$: $1\times 1$).
$$a^1 = \text{hardlim}(W^1 p + b^1) \qquad a^2 = \text{hardlim}(W^2 a^1 + b^2) \qquad a^3 = \text{hardlim}(W^3 a^2 + b^3)$$
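The complete classifier is small enough to check numerically. Below is a minimal NumPy sketch (not from the original slides), assuming hardlim(n) = 1 for n >= 0 and 0 otherwise:

```python
import numpy as np

# Minimal sketch of the complete three-layer hardlim network defined above.
hardlim = lambda n: (n >= 0).astype(float)

W1 = np.array([[-1., 0.], [0., -1.], [1., 0.], [0., 1.]])
b1 = np.array([[0.5], [0.75], [-1.5], [-0.25]])
W2 = np.array([[1., 1., 0., 0.], [0., 0., 1., 1.]])
b2 = np.array([[-1.5], [-1.5]])
W3 = np.array([[1., 1.]])
b3 = np.array([[-0.5]])

def classify(p):
    a1 = hardlim(W1 @ p + b1)    # four elementary decision boundaries
    a2 = hardlim(W2 @ a1 + b2)   # AND of each subnetwork's boundaries
    a3 = hardlim(W3 @ a2 + b3)   # OR of the two subnetwork outputs
    return a3.item()

print(classify(np.array([[0.0], [0.0]])))  # a point inside the first region -> 1.0
```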
Function Approximation Example

Figure: a 1-2-1 network; the hidden layer uses the log-sigmoid transfer function $f^1(n) = \dfrac{1}{1 + e^{-n}}$ and the output layer is linear, $f^2(n) = n$.

$$a^1 = \text{logsig}(W^1 p + b^1) \qquad a^2 = \text{purelin}(W^2 a^1 + b^2)$$
Nominal Parameter Values

$$w^1_{1,1} = 10 \qquad w^1_{2,1} = 10 \qquad b^1_1 = -10 \qquad b^1_2 = 10$$

$$w^2_{1,1} = 1 \qquad w^2_{1,2} = 1 \qquad b^2 = 0$$
Nominal Response

Figure: response of the 1-2-1 network with the nominal parameters, plotted for $-2 \le p \le 2$.
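The nominal response can be reproduced with a short sketch (my own code, with an arbitrary sampling grid):

```python
import numpy as np

# Sketch of the 1-2-1 network's nominal response over -2 <= p <= 2.
logsig = lambda n: 1.0 / (1.0 + np.exp(-n))

W1 = np.array([[10.0], [10.0]]);  b1 = np.array([[-10.0], [10.0]])
W2 = np.array([[1.0, 1.0]]);      b2 = np.array([[0.0]])

for p in np.linspace(-2.0, 2.0, 9):
    a1 = logsig(W1 * p + b1)       # log-sigmoid hidden layer
    a2 = (W2 @ a1 + b2).item()     # linear (purelin) output layer
    print(f"p = {p:+.1f}   a2 = {a2:.3f}")  # rises from ~0 to ~2
```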
Parameter Variations

Figure: network response for $-2 \le p \le 2$ as one parameter at a time varies about its nominal value; one panel each for $0 \le b^1_2 \le 20$, $-1 \le w^2_{1,1} \le 1$, $-1 \le w^2_{1,2} \le 1$, and $-1 \le b^2 \le 1$.
Multilayer Network

$$a^{m+1} = f^{m+1}(W^{m+1} a^m + b^{m+1}), \quad m = 0, 1, \ldots, M-1$$

$$a^0 = p \qquad a = a^M$$
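As a sketch of how this recursion might be coded (the storage of each layer as a (W, b, f) triple is an assumption of this sketch):

```python
import numpy as np

def forward(p, layers):
    """Forward propagation; layers is a list of (W, b, f) triples, m = 1..M."""
    a = p                    # a^0 = p
    for W, b, f in layers:   # a^{m+1} = f^{m+1}(W^{m+1} a^m + b^{m+1})
        a = f(W @ a + b)
    return a                 # a = a^M
```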
Performance Index

Training Set
$$\{p_1, t_1\}, \{p_2, t_2\}, \ldots, \{p_Q, t_Q\}$$

Mean Square Error
$$F(x) = E[e^2] = E[(t - a)^2]$$

Vector Case
$$F(x) = E[e^T e] = E[(t - a)^T (t - a)]$$

Approximate Mean Square Error (Single Sample)
$$\hat{F}(x) = (t(k) - a(k))^T (t(k) - a(k)) = e^T(k)\, e(k)$$

Approximate Steepest Descent
$$w^m_{i,j}(k+1) = w^m_{i,j}(k) - \alpha \frac{\partial \hat{F}}{\partial w^m_{i,j}} \qquad b^m_i(k+1) = b^m_i(k) - \alpha \frac{\partial \hat{F}}{\partial b^m_i}$$
Chain Rule

$$\frac{d f(n(w))}{dw} = \frac{d f(n)}{dn} \times \frac{d n(w)}{dw}$$
Example
$$f(n) = \cos(n) \qquad n = e^{2w} \qquad f(n(w)) = \cos(e^{2w})$$

$$\frac{d f(n(w))}{dw} = \frac{d f(n)}{dn} \times \frac{d n(w)}{dw} = (-\sin(n))(2e^{2w}) = (-\sin(e^{2w}))(2e^{2w})$$
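A quick finite-difference check (my own, not from the slides) confirms the derivative:

```python
import numpy as np

# Chain-rule example: f(n(w)) = cos(e^{2w}), so df/dw = -sin(e^{2w}) * 2e^{2w}.
f_of_w = lambda w: np.cos(np.exp(2 * w))

w, eps = 0.3, 1e-6
analytic = -np.sin(np.exp(2 * w)) * 2 * np.exp(2 * w)
numeric = (f_of_w(w + eps) - f_of_w(w - eps)) / (2 * eps)
print(analytic, numeric)  # agree to ~6 decimal places
```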

Application to Gradient Calculation

$$\frac{\partial \hat{F}}{\partial w^m_{i,j}} = \frac{\partial \hat{F}}{\partial n^m_i} \times \frac{\partial n^m_i}{\partial w^m_{i,j}} \qquad \frac{\partial \hat{F}}{\partial b^m_i} = \frac{\partial \hat{F}}{\partial n^m_i} \times \frac{\partial n^m_i}{\partial b^m_i}$$
Gradient Calculation

$$n^m_i = \sum_{j=1}^{S^{m-1}} w^m_{i,j}\, a^{m-1}_j + b^m_i$$

$$\frac{\partial n^m_i}{\partial w^m_{i,j}} = a^{m-1}_j \qquad \frac{\partial n^m_i}{\partial b^m_i} = 1$$

Sensitivity
$$s^m_i \equiv \frac{\partial \hat{F}}{\partial n^m_i}$$
Gradient
$$\frac{\partial \hat{F}}{\partial w^m_{i,j}} = s^m_i\, a^{m-1}_j \qquad \frac{\partial \hat{F}}{\partial b^m_i} = s^m_i$$
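In matrix form, the weight gradient is the outer product of the sensitivity vector and the previous layer's output; a sketch with placeholder values:

```python
import numpy as np

# The full weight gradient for layer m is the outer product s^m (a^{m-1})^T.
# The numbers below are placeholders, chosen only for illustration.
s_m = np.array([[-0.5], [0.2]])      # sensitivities, S^m x 1
a_prev = np.array([[1.0], [0.3]])    # previous layer output, S^{m-1} x 1

grad_W = s_m @ a_prev.T   # dF/dW^m, shape S^m x S^{m-1}
grad_b = s_m              # dF/db^m equals the sensitivity itself
print(grad_W, grad_b.ravel())
```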
Steepest Descent

$$w^m_{i,j}(k+1) = w^m_{i,j}(k) - \alpha\, s^m_i\, a^{m-1}_j \qquad b^m_i(k+1) = b^m_i(k) - \alpha\, s^m_i$$

$$W^m(k+1) = W^m(k) - \alpha\, s^m (a^{m-1})^T \qquad b^m(k+1) = b^m(k) - \alpha\, s^m$$

$$s^m \equiv \frac{\partial \hat{F}}{\partial n^m} = \begin{bmatrix} \partial \hat{F} / \partial n^m_1 \\ \partial \hat{F} / \partial n^m_2 \\ \vdots \\ \partial \hat{F} / \partial n^m_{S^m} \end{bmatrix}$$

Next Step: Compute the Sensitivities (Backpropagation)
Jacobian Matrix

$$\frac{\partial n^{m+1}}{\partial n^m} \equiv \begin{bmatrix} \dfrac{\partial n^{m+1}_1}{\partial n^m_1} & \dfrac{\partial n^{m+1}_1}{\partial n^m_2} & \cdots & \dfrac{\partial n^{m+1}_1}{\partial n^m_{S^m}} \\ \dfrac{\partial n^{m+1}_2}{\partial n^m_1} & \dfrac{\partial n^{m+1}_2}{\partial n^m_2} & \cdots & \dfrac{\partial n^{m+1}_2}{\partial n^m_{S^m}} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial n^{m+1}_{S^{m+1}}}{\partial n^m_1} & \dfrac{\partial n^{m+1}_{S^{m+1}}}{\partial n^m_2} & \cdots & \dfrac{\partial n^{m+1}_{S^{m+1}}}{\partial n^m_{S^m}} \end{bmatrix}$$

Element $(i, j)$:
$$\frac{\partial n^{m+1}_i}{\partial n^m_j} = \frac{\partial \left( \sum_{l=1}^{S^m} w^{m+1}_{i,l}\, a^m_l + b^{m+1}_i \right)}{\partial n^m_j} = w^{m+1}_{i,j}\, \frac{\partial a^m_j}{\partial n^m_j} = w^{m+1}_{i,j}\, \frac{\partial f^m(n^m_j)}{\partial n^m_j} = w^{m+1}_{i,j}\, \dot{f}^m(n^m_j)$$

where
$$\dot{f}^m(n^m_j) = \frac{\partial f^m(n^m_j)}{\partial n^m_j}$$

Matrix form:
$$\frac{\partial n^{m+1}}{\partial n^m} = W^{m+1}\, \dot{F}^m(n^m) \qquad \dot{F}^m(n^m) = \begin{bmatrix} \dot{f}^m(n^m_1) & 0 & \cdots & 0 \\ 0 & \dot{f}^m(n^m_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \dot{f}^m(n^m_{S^m}) \end{bmatrix}$$
Backpropagation (Sensitivities)

$$s^m = \frac{\partial \hat{F}}{\partial n^m} = \left( \frac{\partial n^{m+1}}{\partial n^m} \right)^T \frac{\partial \hat{F}}{\partial n^{m+1}} = \dot{F}^m(n^m)\, (W^{m+1})^T\, \frac{\partial \hat{F}}{\partial n^{m+1}}$$

$$s^m = \dot{F}^m(n^m)\, (W^{m+1})^T\, s^{m+1}$$

The sensitivities are computed by starting at the last layer and then propagating backwards through the network to the first layer:

$$s^M \to s^{M-1} \to \cdots \to s^2 \to s^1$$
Initialization (Last Layer)

$$s^M_i = \frac{\partial \hat{F}}{\partial n^M_i} = \frac{\partial (t - a)^T (t - a)}{\partial n^M_i} = \frac{\partial \sum_{j=1}^{S^M} (t_j - a_j)^2}{\partial n^M_i} = -2 (t_i - a_i)\, \frac{\partial a_i}{\partial n^M_i}$$

$$\frac{\partial a_i}{\partial n^M_i} = \frac{\partial a^M_i}{\partial n^M_i} = \frac{\partial f^M(n^M_i)}{\partial n^M_i} = \dot{f}^M(n^M_i)$$

$$s^M_i = -2 (t_i - a_i)\, \dot{f}^M(n^M_i)$$

$$s^M = -2\, \dot{F}^M(n^M)\, (t - a)$$
Summary

Forward Propagation
$$a^0 = p$$
$$a^{m+1} = f^{m+1}(W^{m+1} a^m + b^{m+1}), \quad m = 0, 1, \ldots, M-1$$
$$a = a^M$$

Backpropagation
$$s^M = -2\, \dot{F}^M(n^M)\, (t - a)$$
$$s^m = \dot{F}^m(n^m)\, (W^{m+1})^T\, s^{m+1}, \quad m = M-1, \ldots, 2, 1$$

Weight Update
$$W^m(k+1) = W^m(k) - \alpha\, s^m (a^{m-1})^T \qquad b^m(k+1) = b^m(k) - \alpha\, s^m$$
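The summary maps directly onto code. A minimal sketch of one training iteration, assuming each layer is stored as a (W, b, f, fdot) tuple where fdot returns the elementwise transfer-function derivative:

```python
import numpy as np

def train_step(p, t, layers, alpha):
    """One approximate-steepest-descent step; layers is a list of
    (W, b, f, fdot) tuples for m = 1..M (an assumed storage format)."""
    # Forward propagation: a^0 = p, a^{m+1} = f^{m+1}(W^{m+1} a^m + b^{m+1})
    a, n = [p], []
    for W, b, f, _ in layers:
        n.append(W @ a[-1] + b)
        a.append(f(n[-1]))

    # Backpropagation, starting at the last layer:
    # s^M = -2 Fdot^M(n^M)(t - a), then s^m = Fdot^m(n^m)(W^{m+1})^T s^{m+1}
    M = len(layers)
    s = [None] * M
    s[M - 1] = -2 * np.diag(layers[-1][3](n[-1]).ravel()) @ (t - a[-1])
    for m in range(M - 2, -1, -1):
        Fdot = np.diag(layers[m][3](n[m]).ravel())
        s[m] = Fdot @ layers[m + 1][0].T @ s[m + 1]

    # Weight update: W^m -= alpha s^m (a^{m-1})^T, b^m -= alpha s^m
    return [(W - alpha * s[m] @ a[m].T, b - alpha * s[m], f, fd)
            for m, (W, b, f, fd) in enumerate(layers)]
```

For the 1-2-1 example that follows, `layers` would hold `(W1, b1, logsig, lambda n: logsig(n) * (1 - logsig(n)))` and `(W2, b2, purelin, lambda n: np.ones_like(n))`.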
Example: Function Approximation

$$g(p) = 1 + \sin\left( \frac{\pi}{4} p \right)$$

Figure: the input $p$ drives both the function $g(p)$, which supplies the target $t$, and the 1-2-1 network, which produces $a$; the error is $e = t - a$.
Network

Figure: the 1-2-1 network (as in the Function Approximation Example slide): a log-sigmoid hidden layer followed by a linear output layer.

$$a^1 = \text{logsig}(W^1 p + b^1) \qquad a^2 = \text{purelin}(W^2 a^1 + b^2)$$
Initial Conditions

$$W^1(0) = \begin{bmatrix} -0.27 \\ -0.41 \end{bmatrix} \quad b^1(0) = \begin{bmatrix} -0.48 \\ -0.13 \end{bmatrix} \quad W^2(0) = \begin{bmatrix} 0.09 & -0.17 \end{bmatrix} \quad b^2(0) = \begin{bmatrix} 0.48 \end{bmatrix}$$

Figure: initial network response compared with the sine wave $g(p)$, for $-2 \le p \le 2$.
Forward Propagation

$$a^0 = p = 1$$

$$a^1 = f^1(W^1 a^0 + b^1) = \text{logsig}\left( \begin{bmatrix} -0.27 \\ -0.41 \end{bmatrix} 1 + \begin{bmatrix} -0.48 \\ -0.13 \end{bmatrix} \right) = \text{logsig}\left( \begin{bmatrix} -0.75 \\ -0.54 \end{bmatrix} \right)$$

$$a^1 = \begin{bmatrix} \dfrac{1}{1 + e^{0.75}} \\[2ex] \dfrac{1}{1 + e^{0.54}} \end{bmatrix} = \begin{bmatrix} 0.321 \\ 0.368 \end{bmatrix}$$

$$a^2 = f^2(W^2 a^1 + b^2) = \text{purelin}\left( \begin{bmatrix} 0.09 & -0.17 \end{bmatrix} \begin{bmatrix} 0.321 \\ 0.368 \end{bmatrix} + 0.48 \right) = 0.446$$

$$e = t - a = \left( 1 + \sin\left( \frac{\pi}{4} p \right) \right) - a^2 = \left( 1 + \sin\left( \frac{\pi}{4} \cdot 1 \right) \right) - 0.446 = 1.261$$
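These numbers are easy to verify; a sketch (my code, using the initial weights from the Initial Conditions slide):

```python
import numpy as np

# Reproducing the forward-propagation numbers above.
logsig = lambda n: 1.0 / (1.0 + np.exp(-n))

W1 = np.array([[-0.27], [-0.41]]); b1 = np.array([[-0.48], [-0.13]])
W2 = np.array([[0.09, -0.17]]);    b2 = np.array([[0.48]])

a0 = np.array([[1.0]])             # a^0 = p = 1
a1 = logsig(W1 @ a0 + b1)          # [[0.321], [0.368]]
a2 = W2 @ a1 + b2                  # purelin -> [[0.446]]
t = 1 + np.sin(np.pi / 4 * 1.0)    # target: 1.707
e = (t - a2).item()                # error: 1.261
print(a1.ravel(), a2.item(), e)
```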
Transfer Function Derivatives

$$\dot{f}^1(n) = \frac{d}{dn}\left( \frac{1}{1 + e^{-n}} \right) = \frac{e^{-n}}{(1 + e^{-n})^2} = \left( 1 - \frac{1}{1 + e^{-n}} \right)\left( \frac{1}{1 + e^{-n}} \right) = (1 - a^1)(a^1)$$

$$\dot{f}^2(n) = \frac{d}{dn}(n) = 1$$
Backpropagation

$$s^2 = -2\, \dot{F}^2(n^2)\, (t - a) = -2\, \dot{f}^2(n^2)\, (1.261) = -2\, (1)\, (1.261) = -2.522$$

$$s^1 = \dot{F}^1(n^1)\, (W^2)^T s^2 = \begin{bmatrix} (1 - a^1_1)(a^1_1) & 0 \\ 0 & (1 - a^1_2)(a^1_2) \end{bmatrix} \begin{bmatrix} 0.09 \\ -0.17 \end{bmatrix} (-2.522)$$

$$s^1 = \begin{bmatrix} (1 - 0.321)(0.321) & 0 \\ 0 & (1 - 0.368)(0.368) \end{bmatrix} \begin{bmatrix} 0.09 \\ -0.17 \end{bmatrix} (-2.522)$$

$$s^1 = \begin{bmatrix} 0.218 & 0 \\ 0 & 0.233 \end{bmatrix} \begin{bmatrix} -0.227 \\ 0.429 \end{bmatrix} = \begin{bmatrix} -0.0495 \\ 0.0997 \end{bmatrix}$$
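Continuing the numerical check with the sensitivities:

```python
import numpy as np

# Sketch checking the sensitivities, using a1 and e from the forward pass.
a1 = np.array([[0.321], [0.368]])
W2 = np.array([[0.09, -0.17]])
e = 1.261

s2 = -2 * 1 * e                            # fdot2 = 1, so s2 = -2.522
F1dot = np.diag(((1 - a1) * a1).ravel())   # diag((1 - a1_i) a1_i)
s1 = F1dot @ W2.T * s2                     # [[-0.0495], [0.0997]]
print(s2, s1.ravel())
```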
Weight Update

$$\alpha = 0.1$$

$$W^2(1) = W^2(0) - \alpha\, s^2 (a^1)^T = \begin{bmatrix} 0.09 & -0.17 \end{bmatrix} - 0.1\, (-2.522) \begin{bmatrix} 0.321 & 0.368 \end{bmatrix}$$

$$W^2(1) = \begin{bmatrix} 0.171 & -0.0772 \end{bmatrix}$$

$$b^2(1) = b^2(0) - \alpha\, s^2 = 0.48 - 0.1\, (-2.522) = 0.732$$

$$W^1(1) = W^1(0) - \alpha\, s^1 (a^0)^T = \begin{bmatrix} -0.27 \\ -0.41 \end{bmatrix} - 0.1 \begin{bmatrix} -0.0495 \\ 0.0997 \end{bmatrix} 1 = \begin{bmatrix} -0.265 \\ -0.420 \end{bmatrix}$$

$$b^1(1) = b^1(0) - \alpha\, s^1 = \begin{bmatrix} -0.48 \\ -0.13 \end{bmatrix} - 0.1 \begin{bmatrix} -0.0495 \\ 0.0997 \end{bmatrix} = \begin{bmatrix} -0.475 \\ -0.140 \end{bmatrix}$$
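And the parameter updates themselves:

```python
import numpy as np

# Sketch applying the steepest-descent updates with alpha = 0.1, using the
# values computed in the two previous steps.
alpha = 0.1
W1 = np.array([[-0.27], [-0.41]]); b1 = np.array([[-0.48], [-0.13]])
W2 = np.array([[0.09, -0.17]]);    b2 = np.array([[0.48]])
a0 = np.array([[1.0]]);            a1 = np.array([[0.321], [0.368]])
s1 = np.array([[-0.0495], [0.0997]]); s2 = np.array([[-2.522]])

W2 = W2 - alpha * s2 @ a1.T   # [[0.171, -0.0772]]
b2 = b2 - alpha * s2          # [[0.732]]
W1 = W1 - alpha * s1 @ a0.T   # [[-0.265], [-0.420]]
b1 = b1 - alpha * s1          # [[-0.475], [-0.140]]
print(W2, b2.item(), W1.ravel(), b1.ravel())
```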
Choice of Architecture

$$g(p) = 1 + \sin\left( \frac{i\pi}{4} p \right)$$

Figure: responses of a trained 1-3-1 network for $i = 1, 2, 4, 8$ (one panel each, plotted for $-2 \le p \le 2$); as $i$ increases the target function becomes more complex, and the 1-3-1 network can no longer fit it.
Choice of Network Architecture

$$g(p) = 1 + \sin\left( \frac{6\pi}{4} p \right)$$

Figure: responses of trained 1-2-1, 1-3-1, 1-4-1, and 1-5-1 networks (one panel each, for $-2 \le p \le 2$); the fit improves as the number of hidden neurons increases.
Convergence

$$g(p) = 1 + \sin(\pi p)$$

Figure: network responses at successive iterations (labeled 0 through 5) for two training runs; one run converges to the sine wave while the other becomes trapped in a local minimum.
Generalization

$$\{p_1, t_1\}, \{p_2, t_2\}, \ldots, \{p_Q, t_Q\}$$

$$g(p) = 1 + \sin\left( \frac{\pi}{4} p \right) \qquad p = -2, -1.6, -1.2, \ldots, 1.6, 2$$

Figure: responses of trained 1-2-1 and 1-9-1 networks on the 11 training points (for $-2 \le p \le 2$); the 1-2-1 network generalizes smoothly, while the 1-9-1 network overfits the training data.