Associative Learning

• We discuss a collection of simple rules that allow
unsupervised learning.
• These rules give networks the ability to learn
associations between patterns that occur together
frequently. → pattern recognition and recall.
• How can associations be represented by a network?
• How can a network learn new associations?
• An association is any link between a system’s input
and output such that when a pattern A (stimulus) is
presented to the system it will respond with pattern B
(response).

Hebb’s Postulate

“When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.”

Simple Associative Network

 1  stim ulus
p = 
 0  no stim ulus

 1 response
a = 
 0 no r esponse

a = hardlimwp + b = hardlim wp – 0.5

The network will respond to the stimulus only if w is greater than −b (in this case 0.5).
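A minimal sketch of this neuron in Python (the hardlim helper and the test loop are our own; w = 1 and b = −0.5 are the values from the slide):

def hardlim(n):
    # hard-limit transfer function: 1 if n >= 0, else 0
    return 1 if n >= 0 else 0

w, b = 1.0, -0.5            # weight and bias from the slide
for p in (0, 1):            # no stimulus / stimulus
    a = hardlim(w * p + b)
    print(f"p = {p} -> a = {a}")   # prints a = 0, then a = 1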
Banana Associator

Unconditioned Stimulus (~dog’s food) Conditioned Stimulus (~bell)


p0 = 1 (shape detected), p0 = 0 (shape not detected)

p = 1 (smell detected), p = 0 (smell not detected)
• One set of inputs will represent the unconditioned
stimulus.
• Another set of inputs will represent the conditioned
stimulus.
• We will represent the unconditioned stimulus as p0
and the conditioned stimulus simply as p. For our
purposes we will assume that the weights associated
with p0 are fixed, but that the weights associated with
p are adjusted according to the relevant learning rule.

Unsupervised Hebb Rule
w ij q  = w ij q – 1  + a i q  p j q 
α dictates how many times a stimulus and response must
occur together before an association is made.

Vector Form:

W q  = W q – 1  + a q  pT q 

Training sequence: p(1), p(2), …, p(Q)
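As a rough illustration, the vector-form update can be written in a few lines of NumPy (the 2-output, 3-input sizes and the particular pattern and response below are arbitrary, not from the slides):

import numpy as np

def hebb_update(W, a, p, alpha=1.0):
    # unsupervised Hebb rule, vector form: W(q) = W(q-1) + alpha * a(q) * p(q)^T
    return W + alpha * np.outer(a, p)

W = np.zeros((2, 3))
p = np.array([1.0, -1.0, 1.0])
a = np.array([1.0, 0.0])            # assumed response to this pattern
W = hebb_update(W, a, p)
print(W)                            # only the row of the active output has changed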

Banana Recognition Example
Initial weights: w0 = 1, w(0) = 0

Training sequence: {p0(1) = 0, p(1) = 1}, {p0(2) = 1, p(2) = 1}, …

α = 1  →  w(q) = w(q−1) + a(q) p(q)

First Iteration (sight fails):


a(1) = hardlim(w0 p0(1) + w(0) p(1) − 0.5)
     = hardlim(1·0 + 0·1 − 0.5) = 0  (no response)

w(1) = w(0) + a(1) p(1) = 0 + 0·1 = 0
Example
Second Iteration (sight works):
a(2) = hardlim(w0 p0(2) + w(1) p(2) − 0.5)
     = hardlim(1·1 + 0·1 − 0.5) = 1  (banana)

w(2) = w(1) + a(2) p(2) = 0 + 1·1 = 1

Third Iteration (sight fails):


a(3) = hardlim(w0 p0(3) + w(2) p(3) − 0.5)
     = hardlim(1·0 + 1·1 − 0.5) = 1  (banana)

w(3) = w(2) + a(3) p(3) = 1 + 1·1 = 2

Banana will now be detected if either sensor works. nnd15uh
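The three iterations above can be reproduced with a short script (plain Python; the loop and variable names are ours, while w0 = 1, b = −0.5 and α = 1 come from the example):

def hardlim(n):
    return 1 if n >= 0 else 0

w0, b = 1.0, -0.5                     # fixed weight on sight (unconditioned), bias
w = 0.0                               # adaptive weight on smell (conditioned)
sequence = [(0, 1), (1, 1), (0, 1)]   # (p0, p): sight fails, works, fails; smell always present
for q, (p0, p) in enumerate(sequence, start=1):
    a = hardlim(w0 * p0 + w * p + b)
    w = w + a * p                     # unsupervised Hebb rule with alpha = 1
    print(f"q={q}: a={a}, w={w}")
# q=1: a=0, w=0.0   q=2: a=1, w=1.0   q=3: a=1, w=2.0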


Problems with Hebb Rule

• Weights can become arbitrarily large (in biological systems synapses cannot grow without bound).

• There is no mechanism for weights to decrease. If the inputs or outputs of a Hebb network experience any noise, every weight will grow until the network responds to any stimulus.

Hebb Rule with Decay

W q  = W  q – 1  + a q  pT q  –  W q – 1 

W q  = 1 –  W q – 1 + a q pT q

As decay rate (γ) approaches one, the learning law quickly


forgets old inputs and remembers only the most recent patterns.
This keeps the weight matrix from growing without bound,
which can be demonstrated by setting both ai and pj to 1:
m ax m ax
wi j = 1 –   wi j +  ai pj
m ax m ax
wi j = 1 –   wi j + 
m ax 
wi j = ---
 11
Example: Banana Associator
=1  = 0.1

First Iteration (sight fails):

a(1) = hardlim(w0 p0(1) + w(0) p(1) − 0.5)
     = hardlim(1·0 + 0·1 − 0.5) = 0  (no response)

w(1) = w(0) + a(1) p(1) − 0.1 w(0) = 0 + 0·1 − 0.1·0 = 0

Second Iteration (sight works):

a 2 = hardlim w0p02 + w1 p2 – 0.5


= hardlim 1 ×1 + 0 ×1 – 0.5 = 1 (banana)

w 2 = w  1 + a 2p 2 – 0.1w 1 = 0 + 1  1 – 0.1 0 = 1 12


Third Iteration (sight fails):

a(3) = hardlim(w0 p0(3) + w(2) p(3) − 0.5)
     = hardlim(1·0 + 1·1 − 0.5) = 1  (banana)

w(3) = w(2) + a(3) p(3) − 0.1 w(2) = 1 + 1·1 − 0.1·1 = 1.9

(Plots of the weight w versus iteration q.)
Problem of Hebb with Decay
• Associations will decay away if stimuli are not
occasionally presented.
If a_i = 0, then

w_ij(q) = (1 − γ) w_ij(q−1)

If γ = 0.1, this becomes

w_ij(q) = 0.9 w_ij(q−1)

Therefore the weight decays by 10% at each iteration where there is no stimulus.
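A short check of this decay (plain Python; the starting weight of 1 is an assumed learned value):

gamma = 0.1
w = 1.0
for q in range(1, 11):               # ten iterations with no stimulus (a_i = 0)
    w = (1 - gamma) * w
print(round(w, 3))                   # -> 0.349, the association has largely faded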
Instar (Recognition Network)
We considered associations between scalar inputs and
outputs. Now we examine a neuron that has a vector
input.

The instar is similar to the ADALINE, the perceptron, and the linear associator.
Instar Operation
a = hardlim(Wp + b) = hardlim(1w^T p + b)

The instar will be active when

1w^T p ≥ −b

or

1w^T p = ||1w|| ||p|| cos θ ≥ −b

For normalized vectors, the largest inner product occurs when the angle between the weight vector and the input vector is zero, i.e. the input vector is equal to the weight vector.

The rows of a weight matrix represent patterns to be recognized.
Vector Recognition
If we set
b = −||1w|| ||p||

the instar will only be active when θ = 0.

If we set

b > −||1w|| ||p||

the instar will be active for a range of angles.

As b is increased, more patterns (over a wider range of θ) will activate the instar.
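A small numerical sketch of these two points (NumPy; the stored prototype and the 0.9 factor in the bias are made-up values): the inner product 1w^T p is largest when p points in the same direction as 1w, and with b chosen slightly greater than −||1w|| ||p|| the instar fires only for sufficiently aligned inputs.

import numpy as np

def hardlim(n):
    return 1 if n >= 0 else 0

w = np.array([1.0, -1.0, -1.0]) / np.sqrt(3)        # stored, normalized prototype (||w|| = 1)
b = -0.9                                            # slightly "looser" than -||w|| ||p|| = -1
for p in (w,                                        # same direction (theta = 0)
          np.array([1.0, 1.0, -1.0]) / np.sqrt(3),  # partly aligned
          -w):                                      # opposite direction
    print(f"w'p = {w @ p:+.3f} -> a = {hardlim(w @ p + b)}")
# only the perfectly aligned input activates the instar with this bias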
Instar Rule
Original Hebb rule: w_ij(q) = w_ij(q−1) + α a_i(q) p_j(q)

With decay: w_ij(q) = w_ij(q−1) + α a_i(q) p_j(q) − γ w_ij(q−1)

Modify so that learning and forgetting will only occur when the neuron is active (the instar rule):

w_ij(q) = w_ij(q−1) + α a_i(q) p_j(q) − γ a_i(q) w_ij(q−1)

or, setting the decay rate γ equal to the learning rate α,

w_ij(q) = w_ij(q−1) + α a_i(q) (p_j(q) − w_ij(q−1))

Vector Form:

i w q  = iw  q – 1  + a i q  p  q  – iw q – 1  
18
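A minimal sketch of this update for one row of W (NumPy; the sizes and the pattern are arbitrary):

import numpy as np

def instar_update(w, p, a_i, alpha=1.0):
    # instar rule: the row moves toward p only when the neuron is active
    return w + alpha * a_i * (p - w)

w = np.zeros(3)
p = np.array([1.0, -1.0, -1.0])
print(instar_update(w, p, a_i=0))    # inactive: weights unchanged -> [0. 0. 0.]
print(instar_update(w, p, a_i=1))    # active, alpha = 1: w jumps to p -> [ 1. -1. -1.]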
Graphical Representation
For the case where the instar is active (a_i = 1):

iw(q) = iw(q−1) + α (p(q) − iw(q−1))

or

iw(q) = (1 − α) iw(q−1) + α p(q)

For the case where the instar is inactive (a_i = 0):

iw(q) = iw(q−1)
• When the instar is active, the weight vector is moved toward the input vector along a line between the old weight vector and the input vector. If α = 1, the new weight vector is equal to the input vector (maximum movement).
• One useful feature of the instar rule is that if the input
vectors are normalized, then iw will also be
normalized once it has learned a particular vector p.
• This rule not only minimizes forgetting, but results in
normalized weight vectors, if the input vectors are
normalized.

nnd15gis
Example
p0 = 1 (orange detected visually), p0 = 0 (orange not detected)

p = [shape; texture; weight]

Each measurement p_j = ±1, so ||p|| = √3. The bias is chosen as b = −2 > −||p||² = −3.
Training
W(0) = 1w^T(0) = [0 0 0]

(The network should not respond to any combination of fruit measurements, so the measurement weights start with values of 0.)

Training sequence: {p0(1) = 0, p(1) = [1; −1; −1]}, {p0(2) = 1, p(2) = [1; −1; −1]}, …

Assumption: the visual system only operates correctly on even time steps.

First Iteration (α = 1):

a(1) = hardlim(w0 p0(1) + W p(1) − 2)
     = hardlim(3·0 + [0 0 0] [1; −1; −1] − 2) = 0  (no response)

1w(1) = 1w(0) + a(1) (p(1) − 1w(0)) = [0; 0; 0] + 0·([1; −1; −1] − [0; 0; 0]) = [0; 0; 0]
Further Training

1w(2) = 1w(1) + a(2) (p(2) − 1w(1)) = [0; 0; 0] + 1·([1; −1; −1] − [0; 0; 0]) = [1; −1; −1]

1w(3) = 1w(2) + a(3) (p(3) − 1w(2)) = [1; −1; −1] + 1·([1; −1; −1] − [1; −1; −1]) = [1; −1; −1]
Orange will now be detected if either set of sensors works. nnd15is
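The orange example can be replayed with a short simulation (NumPy; w0 = 3, b = −2 and α = 1 are the slide's values, the loop and names are ours):

import numpy as np

def hardlim(n):
    return 1 if n >= 0 else 0

w0, b, alpha = 3.0, -2.0, 1.0
w = np.zeros(3)                          # adaptive weights on the measurements
orange = np.array([1.0, -1.0, -1.0])     # shape, texture, weight
for q in range(1, 5):
    p0 = 1.0 if q % 2 == 0 else 0.0      # sight works only on even time steps
    p = orange                           # measurements always available
    a = hardlim(w0 * p0 + w @ p + b)
    w = w + alpha * a * (p - w)          # instar rule
    print(f"q={q}: a={a}, w={w}")
# q=1: no response; q=2: sight triggers learning, w -> [1,-1,-1];
# q=3: the measurements alone now activate the instar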


Kohonen Rule
iw(q) = iw(q−1) + α (p(q) − iw(q−1)),   for i ∈ X(q)

● This is an associative learning rule like the instar rule, so it is suitable for recognition. Unlike the instar rule, however, learning is not proportional to the neuron's output a_i(q).

● Learning occurs when the neuron's index i is a member of the set X(q). We will see in Ch 16 that this can be used to train all neurons in a given neighborhood (e.g. when training self-organizing feature maps).

● If we define X(q) as the set of all i such that a_i(q) = 1, this rule reduces to the instar rule.
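A rough sketch of the Kohonen update in NumPy (how the winner and its neighborhood X(q) are chosen below is purely illustrative; Ch 16 defines this properly for self-organizing feature maps):

import numpy as np

def kohonen_update(W, p, winners, alpha=0.5):
    # move only the rows listed in `winners` (the set X(q)) toward p
    W = W.copy()
    for i in winners:
        W[i] = W[i] + alpha * (p - W[i])
    return W

W = np.random.default_rng(0).standard_normal((4, 2))    # 4 neurons, 2-D inputs (toy sizes)
p = np.array([1.0, 0.0])
i_star = int(np.argmax(W @ p))                          # e.g. best-matching neuron
X_q = {max(i_star - 1, 0), i_star, min(i_star + 1, 3)}  # winner plus immediate neighbors
W = kohonen_update(W, p, X_q)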
Outstar (Recall Network)
• The instar network with a vector input and a
scalar output can perform pattern recognition
by associating a particular vector stimulus with
a response.
• The outstar has a scalar input and a vector
output. It can perform pattern recall by
associating a stimulus with a vector response.

The symmetric saturating linear function (satlins) is chosen because this network is used to recall a vector containing values of −1 or +1.
Outstar Operation

Suppose we want the outstar to recall a certain pattern a* whenever the input p = 1 is presented to the network. Let

W = a*

Then, when p = 1,

a = satlins(Wp) = satlins(a*·1) = a*

and the pattern is correctly recalled.

The columns of a weight matrix represent patterns to be recalled.
Outstar Rule
For the instar rule we made the weight decay term of the Hebb
rule proportional to the output of the network. For the outstar
rule we make the weight decay term proportional to the input
of the network.

w_ij(q) = w_ij(q−1) + α a_i(q) p_j(q) − γ p_j(q) w_ij(q−1)

If we make the decay rate γ equal to the learning rate α,

w_ij(q) = w_ij(q−1) + α (a_i(q) − w_ij(q−1)) p_j(q)
The outstar rule has properties complementary to
the instar rule. Learning occurs whenever pj is
nonzero (instead of ai). When learning occurs,
column wj moves toward the output vector.

Vector Form:

w_j(q) = w_j(q−1) + α (a(q) − w_j(q−1)) p_j(q)

where wj is the jth column of the matrix W.
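A minimal sketch of this update in NumPy (the recalled pattern a* and the single-element input p are illustrative):

import numpy as np

def outstar_update(W, a, p, alpha=1.0):
    # outstar rule: column w_j moves toward a whenever its input p_j is nonzero
    W = W.copy()
    for j, p_j in enumerate(p):
        W[:, j] = W[:, j] + alpha * (a - W[:, j]) * p_j
    return W

a_star = np.array([-1.0, -1.0, 1.0])       # pattern we want the network to recall
W = np.zeros((3, 1))
W = outstar_update(W, a_star, p=np.array([1.0]))
print(W[:, 0])                             # -> [-1. -1.  1.]: the column has learned a*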

Example - Pineapple Recall

Definitions
a = satlins(W0 p0 + W p)

W0 = [1 0 0; 0 1 0; 0 0 1]

p0 = [shape; texture; weight],   p0 for a pineapple: [−1; −1; 1]

p = 1 if a pineapple can be seen, 0 otherwise
● The weight matrix for the unconditioned stimulus, W0, is set to the identity matrix, so that any set of measurements p0 (with ±1 values) will be copied to the output a.
● The weight matrix for the conditioned stimulus, W, is set to zero initially, so that a 1 on p will not generate a response.
● W will be updated with the outstar rule using α = 1:

w_j(q) = w_j(q−1) + (a(q) − w_j(q−1)) p_j(q)
Iteration 1
 0  – 1  Assumption:
 0  0 
 p  1  = 0  p  1  = 1 
 p 2  = –1  p  2  = 1   measured values are
  
 0  1  available only on
even iterations.
=1

 
 0 0  0
a 1  = s atli ns 0 + 0 1  = 0 ( no r esponse )
 
 0 0  0

 
0  0 0  0
w 1 1  = w1  0  + a 1  – w 1  0  p 1  = 0 +  0 – 0  1 = 0
 
0 0 0 0
33
Convergence
 
 – 1 0  –1
a 2  = sa tlins  – 1 + 0 1  = –1 ( me asurem ent s given)
 
 1 0  1

0  – 1 0  –1
w 1  2  = w1 1  + a  2  – w1  1   p  2  = 0 +  – 1 – 0  1 = –1
 
0  1 0 1

 0 – 1  –1
 
a 3  = s atli ns 0 + – 1 1  = – 1 ( me asurem ent s re calle d)
 
 0 1  1

34
• The network is now able to recall the measurements of the pineapple when it sees it, even though the measurement system fails.
• From now on, the weights will no longer change
values unless a pineapple is seen with different
measurements.

w_1(3) = w_1(2) + (a(3) − w_1(2)) p(3) = [−1; −1; 1] + ([−1; −1; 1] − [−1; −1; 1])·1 = [−1; −1; 1]

nnd15os
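The pineapple-recall sequence can be replayed with a few lines of NumPy (satlins is implemented here as a simple clip to [−1, 1]; the loop and names are ours, the weights and patterns are the slide's):

import numpy as np

def satlins(n):
    # symmetric saturating linear transfer function
    return np.clip(n, -1.0, 1.0)

W0 = np.eye(3)                            # fixed: copies the measurements to the output
W = np.zeros((3, 1))                      # adaptive weights on the conditioned stimulus
pineapple = np.array([-1.0, -1.0, 1.0])
for q in range(1, 4):
    p0 = pineapple if q % 2 == 0 else np.zeros(3)   # measurements only on even iterations
    p = np.array([1.0])                             # the pineapple is seen every iteration
    a = satlins(W0 @ p0 + W @ p)
    W = W + np.outer(a - W[:, 0], p)                # outstar rule with alpha = 1
    print(f"q={q}: a={a}")
# q=1: a = [0 0 0]; q=2: a = [-1 -1 1] (measurements given); q=3: a = [-1 -1 1] (recalled)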
