
3/22/2023

Artificial Neural Networks

Lecture 4

Content

- Learning Rules for Single-Layer Perceptron Networks
  - Perceptron Learning Rule
  - Adaline Learning Rule
  - δ-Learning Rule
- Multilayer Perceptron
- Back Propagation Learning Algorithm


Delta Rule - Review (Single Neuron)

A single neuron with inputs x_1, ..., x_m, weights w_{i1}, ..., w_{im}, and sigmoid activation:

    a(net_i) = \frac{1}{1 + e^{-net_i}}, \qquad y_i^{(k)} = a(\mathbf{w}_i^T \mathbf{x}^{(k)})

Minimize the sum of squared errors over the p training patterns:

    E(\mathbf{w}) = \frac{1}{2} \sum_{k=1}^{p} (d^{(k)} - y^{(k)})^2 = \frac{1}{2} \sum_{k=1}^{p} \left( d^{(k)} - a(\mathbf{w}^T \mathbf{x}^{(k)}) \right)^2

The gradient with respect to the weights is

    \nabla_{\mathbf{w}} E(\mathbf{w}) = \left[ \frac{\partial E(\mathbf{w})}{\partial w_1}, \frac{\partial E(\mathbf{w})}{\partial w_2}, \ldots, \frac{\partial E(\mathbf{w})}{\partial w_m} \right]^T

    \frac{\partial E(\mathbf{w})}{\partial w_j} = -\sum_{k=1}^{p} (d^{(k)} - y^{(k)}) \, y^{(k)} (1 - y^{(k)}) \, x_j^{(k)} = -\sum_{k=1}^{p} \delta^{(k)} \, y^{(k)} (1 - y^{(k)}) \, x_j^{(k)}

with \delta^{(k)} = d^{(k)} - y^{(k)}. The gradient-descent update is

    \mathbf{w} \leftarrow \mathbf{w} - \eta \nabla_{\mathbf{w}} E(\mathbf{w}), \qquad \Delta w_j = \eta \sum_{k=1}^{p} \delta^{(k)} \, y^{(k)} (1 - y^{(k)}) \, x_j^{(k)}

Learning in a Multilayer Network - Supervised Learning

Training Set

    T = \{ (\mathbf{x}^{(1)}, \mathbf{d}^{(1)}), (\mathbf{x}^{(2)}, \mathbf{d}^{(2)}), \ldots, (\mathbf{x}^{(p)}, \mathbf{d}^{(p)}) \}

[Figure: a feed-forward network with an input layer (x_1, ..., x_s), a hidden layer, and an output layer whose outputs o_1, ..., o_n are compared with the desired outputs d_1, ..., d_n.]

Supervised Learning

Training Set

    T = \{ (\mathbf{x}^{(1)}, \mathbf{d}^{(1)}), (\mathbf{x}^{(2)}, \mathbf{d}^{(2)}), \ldots, (\mathbf{x}^{(p)}, \mathbf{d}^{(p)}) \}

Sum of squared errors for pattern l (n is the number of outputs):

    E^{(l)} = \frac{1}{2} \sum_{j=1}^{n} \left( d_j^{(l)} - o_j^{(l)} \right)^2

Goal: minimize the total error over all patterns (p is the number of patterns):

    E = \sum_{l=1}^{p} E^{(l)}

[Figure: the same network, with inputs x_1, ..., x_s and outputs o_1, ..., o_n compared with the desired outputs d_1, ..., d_n.]
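A minimal sketch of this cost in Python/NumPy (the array names and numbers are illustrative, not from the slides):

```python
import numpy as np

def pattern_error(d, o):
    """E^(l) = 1/2 * Σ_j (d_j - o_j)^2 for one pattern."""
    return 0.5 * np.sum((d - o) ** 2)

# Desired and actual outputs for p = 2 patterns with n = 2 outputs (made-up numbers)
D = np.array([[1.0, 0.0], [0.0, 1.0]])
O = np.array([[0.8, 0.2], [0.3, 0.6]])

E_total = sum(pattern_error(d, o) for d, o in zip(D, O))   # E = Σ_l E^(l)
print(E_total)                                             # 0.165
```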


Back Propagation Learning Algorithm

    E^{(l)} = \frac{1}{2} \sum_{j=1}^{n} \left( d_j^{(l)} - o_j^{(l)} \right)^2, \qquad E = \sum_{l=1}^{p} E^{(l)}

The weight updates are derived in two stages:

- Learning on output neurons
- Learning on hidden neurons

[Figure: the three-layer network with inputs x_1, ..., x_s, outputs o_1, ..., o_n, and desired outputs d_1, ..., d_n.]

Learning on Output Neurons

    E^{(l)} = \frac{1}{2} \sum_{j=1}^{n} \left( d_j^{(l)} - o_j^{(l)} \right)^2, \qquad E = \sum_{l=1}^{p} E^{(l)}

    o_j^{(l)} = a(net_j^{(l)}), \qquad net_j^{(l)} = \sum_i w_{ji} o_i^{(l)}

For a weight w_{ji} connecting hidden neuron i to output neuron j:

    \frac{\partial E}{\partial w_{ji}} = \frac{\partial}{\partial w_{ji}} \sum_{l=1}^{p} E^{(l)} = \sum_{l=1}^{p} \frac{\partial E^{(l)}}{\partial w_{ji}}

By the chain rule,

    \frac{\partial E^{(l)}}{\partial w_{ji}} = \frac{\partial E^{(l)}}{\partial net_j^{(l)}} \cdot \frac{\partial net_j^{(l)}}{\partial w_{ji}}

Both factors still have to be evaluated.

Learning on Output Neurons

Expanding the first factor through the output o_j^{(l)}:

    \frac{\partial E^{(l)}}{\partial net_j^{(l)}} = \frac{\partial E^{(l)}}{\partial o_j^{(l)}} \cdot \frac{\partial o_j^{(l)}}{\partial net_j^{(l)}}, \qquad \frac{\partial E^{(l)}}{\partial o_j^{(l)}} = -(d_j^{(l)} - o_j^{(l)})

The remaining factor \partial o_j^{(l)} / \partial net_j^{(l)} depends on the activation function.

Learning on Output Neurons

Using the sigmoid activation, \partial o_j^{(l)} / \partial net_j^{(l)} = o_j^{(l)} (1 - o_j^{(l)}), so

    \frac{\partial E^{(l)}}{\partial net_j^{(l)}} = -(d_j^{(l)} - o_j^{(l)}) \, o_j^{(l)} (1 - o_j^{(l)})

Learning on Output Neurons

Define the error signal of output neuron j for pattern l:

    \delta_j^{(l)} \equiv -\frac{\partial E^{(l)}}{\partial net_j^{(l)}} = (d_j^{(l)} - o_j^{(l)}) \, o_j^{(l)} (1 - o_j^{(l)})

Learning on Output Neurons

The second factor of the chain rule follows from net_j^{(l)} = \sum_i w_{ji} o_i^{(l)}:

    \frac{\partial net_j^{(l)}}{\partial w_{ji}} = o_i^{(l)}

so that

    \frac{\partial E^{(l)}}{\partial w_{ji}} = -\delta_j^{(l)} o_i^{(l)} = -(d_j^{(l)} - o_j^{(l)}) \, o_j^{(l)} (1 - o_j^{(l)}) \, o_i^{(l)}

Learning on Output Neurons


How are the weights connecting to the output neurons trained? Summing over all patterns and taking a gradient-descent step:

    \frac{\partial E}{\partial w_{ji}} = -\sum_{l=1}^{p} \delta_j^{(l)} o_i^{(l)}

    \Delta w_{ji} = \eta \sum_{l=1}^{p} \delta_j^{(l)} o_i^{(l)}

Learning on Hidden Neurons

    E^{(l)} = \frac{1}{2} \sum_{j=1}^{n} \left( d_j^{(l)} - o_j^{(l)} \right)^2, \qquad E = \sum_{l=1}^{p} E^{(l)}

For a weight w_{ik} connecting neuron k to hidden neuron i:

    \frac{\partial E}{\partial w_{ik}} = \sum_{l=1}^{p} \frac{\partial E^{(l)}}{\partial w_{ik}}

    \frac{\partial E^{(l)}}{\partial w_{ik}} = \frac{\partial E^{(l)}}{\partial net_i^{(l)}} \cdot \frac{\partial net_i^{(l)}}{\partial w_{ik}}

Both factors still have to be evaluated.

Learning on Hidden Neurons

The second factor follows directly from net_i^{(l)} = \sum_k w_{ik} o_k^{(l)}:

    \frac{\partial net_i^{(l)}}{\partial w_{ik}} = o_k^{(l)}

The first factor defines the error signal \delta_i^{(l)} of hidden neuron i, which is evaluated next.

Learning on Hidden Neurons


    o_j^{(l)} = a(net_j^{(l)}), \qquad net_j^{(l)} = \sum_i w_{ji} o_i^{(l)}

Expanding the first factor through the hidden output o_i^{(l)} (sigmoid activation):

    \frac{\partial E^{(l)}}{\partial net_i^{(l)}} = \frac{\partial E^{(l)}}{\partial o_i^{(l)}} \cdot \frac{\partial o_i^{(l)}}{\partial net_i^{(l)}}, \qquad \frac{\partial o_i^{(l)}}{\partial net_i^{(l)}} = o_i^{(l)} (1 - o_i^{(l)})

The factor \partial E^{(l)} / \partial o_i^{(l)} still has to be evaluated.

Learning on Hidden Neurons


    o_j^{(l)} = a(net_j^{(l)}), \qquad net_j^{(l)} = \sum_i w_{ji} o_i^{(l)}

Because o_i^{(l)} feeds every output neuron j, the chain rule sums over the output layer:

    \frac{\partial E^{(l)}}{\partial o_i^{(l)}} = \sum_j \frac{\partial E^{(l)}}{\partial net_j^{(l)}} \cdot \frac{\partial net_j^{(l)}}{\partial o_i^{(l)}} = -\sum_j \delta_j^{(l)} w_{ji}

This gives the error signal of hidden neuron i:

    \delta_i^{(l)} \equiv -\frac{\partial E^{(l)}}{\partial net_i^{(l)}} = o_i^{(l)} (1 - o_i^{(l)}) \sum_j w_{ji} \delta_j^{(l)}

Learning on Hidden Neurons

    \delta_i^{(l)} = -\frac{\partial E^{(l)}}{\partial net_i^{(l)}} = o_i^{(l)} (1 - o_i^{(l)}) \sum_j w_{ji} \delta_j^{(l)}

Combining both factors and summing over the patterns:

    \frac{\partial E}{\partial w_{ik}} = -\sum_{l=1}^{p} \delta_i^{(l)} o_k^{(l)}

    \Delta w_{ik} = \eta \sum_{l=1}^{p} \delta_i^{(l)} o_k^{(l)}


Back Propagation

[Figure: the three-layer network; the error signals are propagated backward from the output layer (o_1, ..., o_n compared with d_1, ..., d_n) through the hidden layer toward the inputs x_1, ..., x_s.]

Back Propagation
Output-layer error signal and weight update:

    \delta_j^{(l)} = -\frac{\partial E^{(l)}}{\partial net_j^{(l)}} = (d_j^{(l)} - o_j^{(l)}) \, o_j^{(l)} (1 - o_j^{(l)}), \qquad \Delta w_{ji} = \eta \sum_{l=1}^{p} \delta_j^{(l)} o_i^{(l)}

Back Propagation
Output layer:

    \delta_j^{(l)} = -\frac{\partial E^{(l)}}{\partial net_j^{(l)}} = (d_j^{(l)} - o_j^{(l)}) \, o_j^{(l)} (1 - o_j^{(l)}), \qquad \Delta w_{ji} = \eta \sum_{l=1}^{p} \delta_j^{(l)} o_i^{(l)}

Hidden layer:

    \delta_i^{(l)} = -\frac{\partial E^{(l)}}{\partial net_i^{(l)}} = o_i^{(l)} (1 - o_i^{(l)}) \sum_j w_{ji} \delta_j^{(l)}, \qquad \Delta w_{ik} = \eta \sum_{l=1}^{p} \delta_i^{(l)} o_k^{(l)}

Back-propagation training algorithm

Step 1: Initialisation
Set all the weights and threshold levels of the network to random numbers. In MATLAB you can use the command randn: X = randn(n,m)
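An equivalent sketch in Python/NumPy (the layer sizes and the plain standard-normal initialization are assumptions made for illustration, mirroring the randn suggestion above):

```python
import numpy as np

rng = np.random.default_rng()
n_inputs, n_hidden, n_outputs = 2, 2, 1                 # illustrative layer sizes

W_hid = rng.standard_normal((n_hidden, n_inputs))       # hidden-layer weights w_ik
theta_hid = rng.standard_normal(n_hidden)               # hidden-layer thresholds θ_i
W_out = rng.standard_normal((n_outputs, n_hidden))      # output-layer weights w_ji
theta_out = rng.standard_normal(n_outputs)              # output-layer thresholds θ_j
```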


Back-propagation training algorithm


Step 2: Activation
(a) Calculate the actual outputs of the neurons in the hidden layer:

    O_i(p) = \text{sigmoid}\left( \sum_{k=1}^{s} x_k(p) \, w_{ik}(p) - \theta_i \right)

where s is the number of inputs to neuron i in the hidden layer.

(b) Calculate the actual outputs of the neurons in the output layer:

    O_j(p) = \text{sigmoid}\left( \sum_{i=1}^{m} O_i(p) \, w_{ji}(p) - \theta_j \right)

where m is the number of inputs to neuron j in the output layer.
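A minimal sketch of this forward pass in Python/NumPy (the function and variable names such as W_hid and theta_hid are assumptions, not notation from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass for one pattern x(p), following Step 2:
def forward(x, W_hid, theta_hid, W_out, theta_out):
    O_hidden = sigmoid(W_hid @ x - theta_hid)          # O_i(p) = sigmoid(Σ_k x_k w_ik - θ_i)
    O_output = sigmoid(W_out @ O_hidden - theta_out)   # O_j(p) = sigmoid(Σ_i O_i w_ji - θ_j)
    return O_hidden, O_output
```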



Back-propagation training algorithm

Step 3: Weight training

Update the weights in the back-propagation network, propagating backward the errors associated with the output neurons.
(a) Calculate the error gradient for the neurons in the output layer:

    \delta_j(p) = o_j(p) \, [1 - o_j(p)] \, e_j(p), \qquad \text{where } e_j(p) = d_j(p) - o_j(p)

Calculate the weight corrections:

    \Delta w_{ji}(p) = \alpha \, o_i(p) \, \delta_j(p)

Update the weights at the output neurons:

    w_{ji}(p+1) = w_{ji}(p) + \Delta w_{ji}(p)
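A per-pattern sketch of step 3(a) in Python/NumPy; all names are illustrative, and the threshold update (a weight on a fixed -1 input) follows the worked XOR example later in the lecture rather than this step itself:

```python
import numpy as np

# Step 3(a) for one pattern: output-layer error gradients and weight corrections.
def train_output_layer(W_out, theta_out, O_hidden, O_output, d, alpha=0.1):
    e = d - O_output                                    # e_j(p) = d_j(p) - o_j(p)
    delta = O_output * (1.0 - O_output) * e             # δ_j(p) = o_j [1 - o_j] e_j
    W_out = W_out + alpha * np.outer(delta, O_hidden)   # w_ji(p+1) = w_ji(p) + α o_i(p) δ_j(p)
    theta_out = theta_out + alpha * (-1.0) * delta      # Δθ_j(p) = α (-1) δ_j(p)
    return W_out, theta_out, delta
```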

Back-propagation training algorithm

Step 3: Weight training

(b) Calculate the error gradient for the neurons in the hidden layer:

    \delta_i(p) = o_i(p) \, [1 - o_i(p)] \sum_{j=1}^{n} \delta_j(p) \, w_{ji}(p)

Calculate the weight corrections:

    \Delta w_{ik}(p) = \alpha \, x_k(p) \, \delta_i(p)

Update the weights at the hidden neurons:

    w_{ik}(p+1) = w_{ik}(p) + \Delta w_{ik}(p)
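And a matching per-pattern sketch of step 3(b) (illustrative names; note that the worked example evaluates δ_i with the output-layer weights as they were when δ_j was computed):

```python
import numpy as np

# Step 3(b) for one pattern: hidden-layer error gradients and weight corrections.
def train_hidden_layer(W_hid, theta_hid, x, O_hidden, W_out, delta_out, alpha=0.1):
    # δ_i(p) = o_i(p) [1 - o_i(p)] Σ_j δ_j(p) w_ji(p)
    delta_hid = O_hidden * (1.0 - O_hidden) * (W_out.T @ delta_out)
    W_hid = W_hid + alpha * np.outer(delta_hid, x)      # w_ik(p+1) = w_ik(p) + α x_k(p) δ_i(p)
    theta_hid = theta_hid + alpha * (-1.0) * delta_hid  # Δθ_i(p) = α (-1) δ_i(p)
    return W_hid, theta_hid
```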


Back-propagation training algorithm

Step 4: Iteration
Increase iteration p by one, go back to Step 2 and
repeat the process until the selected error criterion is
satisfied.
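Putting Steps 1-4 together, here is a compact sketch of the whole loop for one hidden layer, using the XOR data from the example that follows. The learning rate, initialization and epoch limit are assumptions; the epoch count will differ from the 224 epochs reported on a later slide (obtained with a toolbox implementation), and an unlucky initialization can stall in a local minimum:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR training set from the example
X = np.array([[1., 1.], [0., 1.], [1., 0.], [0., 0.]])
D = np.array([[0.], [1.], [1.], [0.]])

rng = np.random.default_rng()
W_hid, theta_hid = rng.standard_normal((2, 2)), rng.standard_normal(2)   # w_ik, θ_i
W_out, theta_out = rng.standard_normal((1, 2)), rng.standard_normal(1)   # w_ji, θ_j
alpha = 0.1                                                # learning rate from the example

for epoch in range(100000):                                # upper bound is an assumption
    E_total = 0.0
    for x, d in zip(X, D):
        O_h = sigmoid(W_hid @ x - theta_hid)               # Step 2(a)
        O_o = sigmoid(W_out @ O_h - theta_out)             # Step 2(b)
        e = d - O_o
        delta_o = O_o * (1 - O_o) * e                      # Step 3(a): δ_j
        delta_h = O_h * (1 - O_h) * (W_out.T @ delta_o)    # Step 3(b): δ_i
        W_out += alpha * np.outer(delta_o, O_h)            # weight and threshold updates
        theta_out += alpha * -1.0 * delta_o
        W_hid += alpha * np.outer(delta_h, x)
        theta_hid += alpha * -1.0 * delta_h
        E_total += np.sum(e ** 2)
    if E_total < 0.001:                                    # error criterion used in the example
        break

print(epoch, E_total)
```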


Back-propagation training algorithm


[Flowchart of the training loop:]

1. Start: initialize all weights and biases.
2. Enter the next training pattern {X(p), d_j}.
3. Calculate the hidden outputs O_i and the network outputs O_j.
4. Calculate the error gradients δ_j and δ_i.
5. Update the weights w_ji and w_ik.
6. Calculate E(p). If this is not the last pattern, return to step 2.
7. After the last pattern, calculate the total error E. If E < ε, stop; otherwise start a new epoch at step 2.

Example : XOR

Suppose that a network is required to perform the logical Exclusive-OR operation. Recall that a single-layer perceptron could not do this operation. Now we will apply the three-layer net.

[Figure: inputs x_1 and x_2 feed input neurons 1 and 2; hidden neurons 3 and 4 receive the weights w_13, w_23, w_14, w_24; output neuron 5 receives w_35 and w_45 and produces y_5. Each hidden and output neuron also has a threshold weight connected to a fixed input of -1.]

Example : XOR

- The effect of the threshold applied to a neuron in the hidden or output layer is represented by its weight, θ, connected to a fixed input equal to -1.
- The initial weights and threshold levels are set randomly as follows:
  w13 = 0.5, w14 = 0.9, w23 = 0.4, w24 = 1.0, w35 = -1.2, w45 = 1.1, θ3 = 0.8, θ4 = -0.1 and θ5 = 0.3.


Example : XOR
We consider a training set where inputs x1 and x2 are equal to 1 and the desired output y_{d,5} is 0. The actual outputs of neurons 3 and 4 in the hidden layer are calculated as

    y_3 = \text{sigmoid}(x_1 w_{13} + x_2 w_{23} - \theta_3) = 1 / \left( 1 + e^{-(1 \cdot 0.5 + 1 \cdot 0.4 - 1 \cdot 0.8)} \right) = 0.5250

    y_4 = \text{sigmoid}(x_1 w_{14} + x_2 w_{24} - \theta_4) = 1 / \left( 1 + e^{-(1 \cdot 0.9 + 1 \cdot 1.0 + 1 \cdot 0.1)} \right) = 0.8808

Now the actual output of neuron 5 in the output layer is determined as

    y_5 = \text{sigmoid}(y_3 w_{35} + y_4 w_{45} - \theta_5) = 1 / \left( 1 + e^{-(-0.5250 \cdot 1.2 + 0.8808 \cdot 1.1 - 1 \cdot 0.3)} \right) = 0.5097

Thus, the following error is obtained:

    e = y_{d,5} - y_5 = 0 - 0.5097 = -0.5097

Example : XOR

The next step is weight training. To update the weights and threshold levels in our network, we propagate the error, e, from the output layer backward to the input layer.

First, we calculate the error gradient for neuron 5 in the output layer:

    \delta_5 = y_5 (1 - y_5) \, e = 0.5097 \cdot (1 - 0.5097) \cdot (-0.5097) = -0.1274

Then we determine the weight corrections, assuming that the learning rate parameter, α, is equal to 0.1:

    \Delta w_{35} = \alpha \, y_3 \, \delta_5 = 0.1 \cdot 0.5250 \cdot (-0.1274) = -0.0067

    \Delta w_{45} = \alpha \, y_4 \, \delta_5 = 0.1 \cdot 0.8808 \cdot (-0.1274) = -0.0112

    \Delta \theta_5 = \alpha \cdot (-1) \cdot \delta_5 = 0.1 \cdot (-1) \cdot (-0.1274) = 0.0127

Example : XOR

Next we calculate the error gradients for neurons 3 and 4 in the hidden layer:

    \delta_3 = y_3 (1 - y_3) \, \delta_5 \, w_{35} = 0.5250 \cdot (1 - 0.5250) \cdot (-0.1274) \cdot (-1.2) = 0.0381

    \delta_4 = y_4 (1 - y_4) \, \delta_5 \, w_{45} = 0.8808 \cdot (1 - 0.8808) \cdot (-0.1274) \cdot 1.1 = -0.0147

We then determine the weight corrections:

    \Delta w_{13} = \alpha \, x_1 \, \delta_3 = 0.1 \cdot 1 \cdot 0.0381 = 0.0038
    \Delta w_{23} = \alpha \, x_2 \, \delta_3 = 0.1 \cdot 1 \cdot 0.0381 = 0.0038
    \Delta \theta_3 = \alpha \cdot (-1) \cdot \delta_3 = 0.1 \cdot (-1) \cdot 0.0381 = -0.0038
    \Delta w_{14} = \alpha \, x_1 \, \delta_4 = 0.1 \cdot 1 \cdot (-0.0147) = -0.0015
    \Delta w_{24} = \alpha \, x_2 \, \delta_4 = 0.1 \cdot 1 \cdot (-0.0147) = -0.0015
    \Delta \theta_4 = \alpha \cdot (-1) \cdot \delta_4 = 0.1 \cdot (-1) \cdot (-0.0147) = 0.0015


Example : XOR

At last, we update all weights and thresholds:

    w_{13} = w_{13} + \Delta w_{13} = 0.5 + 0.0038 = 0.5038
    w_{14} = w_{14} + \Delta w_{14} = 0.9 - 0.0015 = 0.8985
    w_{23} = w_{23} + \Delta w_{23} = 0.4 + 0.0038 = 0.4038
    w_{24} = w_{24} + \Delta w_{24} = 1.0 - 0.0015 = 0.9985
    w_{35} = w_{35} + \Delta w_{35} = -1.2 - 0.0067 = -1.2067
    w_{45} = w_{45} + \Delta w_{45} = 1.1 - 0.0112 = 1.0888
    \theta_3 = \theta_3 + \Delta \theta_3 = 0.8 - 0.0038 = 0.7962
    \theta_4 = \theta_4 + \Delta \theta_4 = -0.1 + 0.0015 = -0.0985
    \theta_5 = \theta_5 + \Delta \theta_5 = 0.3 + 0.0127 = 0.3127
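As a cross-check, here is a short Python sketch (not from the slides) that reproduces this first training iteration from the stated initial values:

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Initial values from the example
w13, w14, w23, w24, w35, w45 = 0.5, 0.9, 0.4, 1.0, -1.2, 1.1
t3, t4, t5 = 0.8, -0.1, 0.3
x1, x2, d5, alpha = 1.0, 1.0, 0.0, 0.1

# Forward pass
y3 = sigmoid(x1 * w13 + x2 * w23 - t3)             # 0.5250
y4 = sigmoid(x1 * w14 + x2 * w24 - t4)             # 0.8808
y5 = sigmoid(y3 * w35 + y4 * w45 - t5)             # 0.5097
e = d5 - y5                                         # -0.5097

# Backward pass
d_5 = y5 * (1 - y5) * e                             # -0.1274
d_3 = y3 * (1 - y3) * d_5 * w35                     #  0.0381
d_4 = y4 * (1 - y4) * d_5 * w45                     # -0.0147

# Weight and threshold updates
w35 += alpha * y3 * d_5;  w45 += alpha * y4 * d_5;  t5 += alpha * -1 * d_5
w13 += alpha * x1 * d_3;  w23 += alpha * x2 * d_3;  t3 += alpha * -1 * d_3
w14 += alpha * x1 * d_4;  w24 += alpha * x2 * d_4;  t4 += alpha * -1 * d_4

print(round(w13, 4), round(w35, 4), round(t5, 4))   # 0.5038 -1.2067 0.3127
```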

Example : XOR

The training process is repeated until the sum of squared errors is less than 0.001.

[Figure: sum-squared network error versus epoch on a logarithmic scale, decreasing from roughly 10^0 to below 10^-3 over 224 epochs.]


Example : XOR

Final results of three-layer network learning:

    x1   x2   Desired output y_d   Actual output y_5   Error e
    1    1    0                    0.0155              -0.0155
    0    1    1                    0.9849               0.0151
    1    0    1                    0.9849               0.0151
    0    0    0                    0.0175              -0.0175

    Sum of squared errors: 0.0010


Network represented by the McCulloch-Pitts model for solving the Exclusive-OR operation

[Figure: the same 2-2-1 network drawn with simple hand-set weights: the connections from x_1 and x_2 into hidden neurons 3 and 4 are +1.0; output neuron 5 apparently receives -2.0 from neuron 3 and +1.0 from neuron 4; the thresholds on the fixed -1 inputs are +1.5, +0.5 and +0.5.]

How to overcome Backpropagation drawbacks

Accelerated learning in multilayer neural networks

- Change the learning rate:
  - when the performance surface is very flat, it allows a large learning rate;
  - while in a high-curvature region, it would require a small learning rate.
- Add a momentum term.
- Use other numerical optimization techniques:
  - the conjugate gradient algorithm;
  - the Levenberg-Marquardt algorithm (a variation of Newton's method).


Accelerated learning

Convergence might be improved if we could smooth out the oscillations in the trajectory. We can do this with a low-pass filter, i.e. by adding a momentum term to the weight update:

    \Delta w_{jk}(p) = \beta \, \Delta w_{jk}(p-1) + \alpha \, y_j(p) \, \delta_k(p)

where β is a positive number (0 ≤ β < 1) called the momentum constant. Typically, the momentum constant is set to 0.95.
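A minimal sketch of this generalized delta rule with momentum (illustrative names; the previous weight change has to be carried between iterations):

```python
import numpy as np

# dW_prev is Δw_jk(p-1); y holds the presynaptic outputs y_j(p), delta the error signals δ_k(p).
def momentum_update(W, dW_prev, y, delta, alpha=0.1, beta=0.95):
    dW = beta * dW_prev + alpha * np.outer(delta, y)   # Δw_jk(p) = β Δw_jk(p-1) + α y_j(p) δ_k(p)
    return W + dW, dW                                  # return updated weights and Δw for next step
```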


Learning with momentum for the Exclusive-OR operation


[Figure: sum-squared network error versus epoch for the XOR network. Without the momentum term the error criterion is reached after 224 epochs; with the momentum term it is reached after 126 epochs. A lower panel labelled "Learning Rate" accompanies the plots.]

Learning with adaptive learning rate

Adapting the learning rate requires some changes in the back-propagation algorithm:

- If the sum of squared errors at the current epoch exceeds the previous value by more than a predefined ratio (typically 1.04), the learning rate parameter is decreased (typically by multiplying by 0.7) and new weights and thresholds are calculated.
- If the error is less than the previous one, the learning rate is increased (typically by multiplying by 1.05).
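A minimal sketch of this heuristic (the constants are the "typical" values quoted above; everything else is illustrative):

```python
# Adapt the learning rate from the sum-squared errors of the current and previous epochs.
def adapt_learning_rate(lr, sse_current, sse_previous,
                        ratio=1.04, decrease=0.7, increase=1.05):
    if sse_current > ratio * sse_previous:
        return lr * decrease      # error grew too much: shrink the learning rate
    if sse_current < sse_previous:
        return lr * increase      # error improved: grow the learning rate
    return lr
```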

Learning with adaptive learning rate


[Figure: training the XOR network with an adaptive learning rate. The upper panel shows the sum-squared error falling below the error criterion in 103 epochs; the lower panel shows the learning rate varying over the epochs.]


Learning with momentum and adaptive learning rate


[Figure: training the XOR network with both momentum and an adaptive learning rate. The sum-squared error falls below the error criterion in 85 epochs; the lower panel shows the learning rate varying over the epochs.]


Learning Factors

Initial Weights
Learning Constant (η)
Cost Functions
Update Rules
Training Data and Generalization
Number of Layers
Number of Hidden Nodes
