Introduction To Neural Networks: John Paxton Montana State University Summer 2003


Introduction to Neural Networks

John Paxton
Montana State University
Summer 2003
Chapter 4: Competition
• Force a decision (yes, no, maybe) to be made.
• Winner take all is a common approach.
• Kohonen learning
wj(new) = wj(old) + α (x – wj(old))
• wj is closest weight vector, determined by
Euclidean distance.
MaxNet
• Lippmann, 1987
• Fixed-weight competitive net.
• Activation function f(x) = x if x > 0, else 0.
• Architecture
[Figure: units a1 and a2; each self-connection has weight 1]
1. wij = 1 if i = j, otherwise –ε
2. aj(0) = sj, t = 0.
3. aj(t+1) = f[aj(t) – ε Σ_{k≠j} ak(t)]
4. go to step 3 if more than one node has a non-zero activation

Special Case: More than one node has the same maximum activation.
• s1 = .5, s2 = .1, ε = .1

• a1(0) = .5, a2(0) = .1

• a1(1) = .49, a2(1) = .05
• a1(2) = .485, a2(2) = .001
• a1(3) = .4849, a2(3) = 0
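The MaxNet iteration above can be reproduced with a short sketch. The function name `maxnet` and the `max_steps` guard are assumptions added for illustration; the guard matters in the special case where several nodes share the same maximum activation, because the tied activations shrink together and the loop would otherwise run for a very long time.

```python
def maxnet(signals, epsilon, max_steps=1000):
    """Iterate MaxNet until at most one activation is non-zero.

    Returns the list of activation vectors a(0), a(1), ...
    max_steps is a hypothetical safety limit, not part of the slides.
    """
    a = list(signals)
    history = [a[:]]
    for _ in range(max_steps):
        if sum(1 for v in a if v > 0) <= 1:
            break
        total = sum(a)
        # Each unit keeps its own activation (weight 1) and is inhibited
        # by epsilon times the sum of the other units' activations.
        a = [max(0.0, v - epsilon * (total - v)) for v in a]
        history.append(a[:])
    return history


history = maxnet([0.5, 0.1], epsilon=0.1)
for t, a in enumerate(history):
    print(t, [round(v, 4) for v in a])
```

With s1 = .5, s2 = .1 and an inhibition weight of .1, the second unit reaches zero at t = 3 and the first unit is the winner.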
Mexican Hat
• Kohonen, 1989
• Contrast enhancement
• Architecture (w0, w1, w2, w3)
• w0 (xi -> xi) , w1 (xi+1 -> xi and xi-1 ->xi)

xi-3 xi-2 xi-1 xi xi+1 xi+2 xi+3

0 - + + + - 0
1. initialize weights
2. xi(0) = si
3. for some number of steps do
4. xi(t+1) = f[ Σ_k wk xi+k(t) ]
5. xi(t+1) = max(0, xi(t))
• x1, x2, x3, x4, x5
• radius 0 weight = 1
• radius 1 weight = 1
• radius 2 weight = -.5
• all other radii weights = 0
• s = (0 .5 1 .5 0)
• f(x) = 0 if x < 0, x if 0 <= x <= 2, 2 if x > 2
• x(0) = (0 .5 1 .5 0)
• x1(1) = 1(0) + 1(.5) -.5(1) = 0
• x2(1) = 1(0) + 1(.5) + 1(1) -.5(.5) = 1.25
• x3(1) = -.5(0) + 1(.5) + 1(1) + 1(.5) - .5(0) = 2
• x4(1) = 1.25
• x5(1) = 0
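The first Mexican Hat update can be checked with a small sketch. The function name `mexican_hat_step` is hypothetical, and the sketch assumes that neighbors beyond either end of the row contribute nothing, which is how the x1(1) computation above is written.

```python
def mexican_hat_step(x, weights, x_max=2.0):
    """Compute x(t+1) from x(t); weights[r] is the weight at radius r."""
    n = len(x)
    radius = len(weights) - 1
    new_x = []
    for i in range(n):
        net = 0.0
        for k in range(-radius, radius + 1):
            if 0 <= i + k < n:  # units past the ends are ignored (assumption)
                net += weights[abs(k)] * x[i + k]
        # Ramp activation: 0 below 0, x between 0 and x_max, x_max above.
        new_x.append(min(x_max, max(0.0, net)))
    return new_x


x0 = [0.0, 0.5, 1.0, 0.5, 0.0]            # s = (0 .5 1 .5 0)
x1 = mexican_hat_step(x0, weights=[1.0, 1.0, -0.5])
print(x1)
```

The center unit grows while the outer units stay at zero, which is the contrast enhancement the net is meant to produce.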
Why the name?
• Plot x(0) vs. x(1)
[Plot: x(0) and x(1) over units x1 … x5]
Hamming Net
• Lippmann, 1987
• Maximum likelihood classifier
• The similarity of 2 vectors is taken to be
n – H(v1, v2)
where H is the Hamming distance
• Uses MaxNet with similarity metric
• Concrete example:
[Figure: network diagram; surviving labels: x2, MaxNet]

1. wij = si(j)/2
2. n is the dimensionality of a vector
3. yin.j = Σ_i xi wij + (n/2)
4. select max(yin.j) using MaxNet
• Training examples: (1 1 1), (-1 -1 -1)
• n=3
• yin.1 = 1(.5) + 1(.5) + 1(.5) + 1.5 = 3
• yin.2 = 1(-.5) + 1(-.5) + 1(-.5) + 1.5 = 0
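• These last 2 quantities are the similarities n – H between the input (1 1 1) and each training example, not the Hamming distances themselves (H = 0 and H = 3)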
• They are then fed into MaxNet.
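The net inputs above can be computed directly from the stored examples. In this sketch the function name `hamming_net_inputs` is hypothetical, and the input vector (1 1 1) is inferred from the arithmetic shown for yin.1 and yin.2.

```python
def hamming_net_inputs(x, exemplars):
    """Return y_in.j = sum_i x_i * w_ij + n/2, with w_ij = s_i(j) / 2."""
    n = len(x)
    return [sum(xi * si / 2 for xi, si in zip(x, s)) + n / 2
            for s in exemplars]


exemplars = [(1, 1, 1), (-1, -1, -1)]
y_in = hamming_net_inputs((1, 1, 1), exemplars)
print(y_in)

# A second, hypothetical input that differs from the first exemplar in one place.
y_in_mixed = hamming_net_inputs((1, 1, -1), exemplars)
print(y_in_mixed)
```

Each value counts the components on which the input and a stored example agree, so the largest value marks the closest example; MaxNet then picks it out.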
Kohonen Self-Organizing Maps
• Kohonen, 1989
• Maps inputs onto one of m clusters
• Human brains seem to be able to self-organize.

[Figure: architecture; input units x1 … xn, cluster units y1 … ym]
• Linear: 3 2 1 # 1 2 3

• Rectangular
1. initialize wij
2. select topology of yi
3. select learning rate parameters
4. while stopping criteria not reached
5. for each input vector do
6. compute D(j) = Σ_i (wij – xi)² for each j
7. select minimum D(j)
8. update neighborhood units
wij(new) = wij(old) + α[xi – wij(old)]
9. update α
10. reduce radius of neighborhood
at specified times
• Place (1 1 0 0), (0 0 0 1), (1 0 0 0), (0 0 1 1) into two clusters
• α(0) = .6
• α(t+1) = .5 * α(t)
• random initial weights
.2 .8
.6 .4
.5 .7
.9 .3
• Present (1 1 0 0)

• D(1) = (.2 – 1)² + (.6 – 1)² + (.5 – 0)² + (.9 – 0)² = 1.86
• D(2) = .98

• D(2) wins!
• wi2(new) = wi2(old) + .6[xi – wi2(old)]

.2 .92 (bigger)
.6 .76 (bigger)
.5 .28 (smaller)
.9 .12 (smaller)

• This example assumes no neighborhood
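The single presentation worked through above can be sketched as follows. The function name `som_step` is hypothetical, each inner list holds one cluster unit's weight vector (one column of the weight table), and, as in the example, only the winner is updated.

```python
def som_step(weights, x, alpha):
    """Present one input; update only the winning unit (no neighborhood).

    weights[j] is the weight vector of cluster unit j and is modified in place.
    Returns the winner's index (0-based) and the squared distances D(j).
    """
    distances = [sum((w - xi) ** 2 for w, xi in zip(wj, x)) for wj in weights]
    winner = distances.index(min(distances))
    weights[winner] = [w + alpha * (xi - w)
                       for w, xi in zip(weights[winner], x)]
    return winner, distances


weights = [[0.2, 0.6, 0.5, 0.9],   # cluster unit 1
           [0.8, 0.4, 0.7, 0.3]]   # cluster unit 2
winner, distances = som_step(weights, [1, 1, 0, 0], alpha=0.6)
print(winner + 1, [round(d, 2) for d in distances])
print([round(w, 2) for w in weights[winner]])
```

A full run would repeat this step over all four vectors for several epochs, halving the learning rate after each epoch as stated above.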

• After many epochs

0 1 (1 1 0 0) -> category 2
0 .5 (0 0 0 1) -> category 1
.5 0 (1 0 0 0) -> category 2
1 0 (0 0 1 1) -> category 1
• Grouping characters
• Travelling Salesperson Problem
– Cluster units can be represented graphically
by weight vectors
– Linear neighborhoods can be used with the
first and last cluster units connected
Learning Vector Quantization
• Kohonen, 1989
• Supervised learning
• There can be several output units per class
• Like Kohonen nets, but no topology for
output units
• Each yi represents a known class

[Figure: architecture; input units x1 … xn, output units y1 … ym]
1. Initialize the weights
(first m training examples, random)
2. choose α
3. while stopping criteria not reached do
(number of iterations, α is very small)
4. for each training vector do
5. find minimum || x – wj ||
6. if minimum is target class
wj(new) = wj(old) + α[x – wj(old)]
else
wj(new) = wj(old) – α[x – wj(old)]
7. reduce α
• (1 1 -1 -1) belongs to category 1
• (-1 -1 -1 1) belongs to category 2
• (-1 -1 1 1) belongs to category 2
• (1 -1 -1 -1) belongs to category 1
• (-1 1 1 -1) belongs to category 2

• 2 output units, y1 represents category 1

and y2 represents category 2
• Initial weights (where did these come from?)

1 -1
1 -1
-1 -1
-1 1

• α = .1
• Present training example 3, (-1 -1 1 1). It
belongs to category 2.

• D(1) = 16 = (1 + 1)² + (1 + 1)² + (-1 - 1)² + (-1 - 1)²
• D(2) = 4

• Category 2 wins. That is correct!

• w2(new) = (-1 -1 -1 1) + .1[(-1 -1 1 1) - (-1 -1 -1 1)] = (-1 -1 -.8 1)
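The LVQ presentation above can be sketched in a few lines. The function name `lvq_step` is hypothetical; the sketch compares squared distances, as the D(j) values in the example do, which selects the same winner as the norm. The winner moves toward the input when its class matches the target and away from it otherwise.

```python
def lvq_step(weights, classes, x, target, alpha):
    """Present one labeled vector and update the closest output unit in place.

    weights[j] is the weight vector of output unit j; classes[j] is its class.
    Returns the winner's index (0-based) and the squared distances.
    """
    distances = [sum((w - xi) ** 2 for w, xi in zip(wj, x)) for wj in weights]
    j = distances.index(min(distances))
    sign = 1.0 if classes[j] == target else -1.0  # attract or repel
    weights[j] = [w + sign * alpha * (xi - w) for w, xi in zip(weights[j], x)]
    return j, distances


weights = [[1, 1, -1, -1], [-1, -1, -1, 1]]   # y1 and y2
classes = [1, 2]
j, distances = lvq_step(weights, classes, [-1, -1, 1, 1], target=2, alpha=0.1)
print(j + 1, distances)
print([round(w, 2) for w in weights[j]])
```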
• How many yi should be used?

• How should we choose the class that each yi should represent?

• LVQ2, LVQ3 are enhancements to LVQ that modify the runner-up sometimes
Counterpropagation
• Hecht-Nielsen, 1987
• There are input, output, and clustering layers
• Can be used to compress data
• Can be used to approximate functions
• Can be used to associate patterns
• Stage 1: Cluster input vectors

• Stage 2: Adapt weights from cluster units to output units
Stage 1 Architecture

[Figure: x1 … xn and y1 … ym both connect to cluster units z1 … zp; labeled weights w11 (x side) and v11 (y side)]
Stage 2 Architecture

[Figure: cluster unit zj connects to x*1 … x*n (weights labeled tj1) and to y*1 … y*m (weights labeled vj1)]
Full Counterpropagation
• Stage 1 Algorithm
1. initialize weights, α, β
2. while stopping criteria is false do
3. for each training vector pair do
4. find the j that minimizes ||x – wj|| + ||y – vj||
wj(new) = wj(old) + α[x – wj(old)]
vj(new) = vj(old) + β[y – vj(old)]
5. reduce α, β
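One Stage 1 presentation can be sketched as below. The function name `full_cp_stage1_step`, the two-unit weight set, the training pair (0.1, 10.0), and both learning rates of 0.5 are illustrative assumptions; the winner is the cluster unit with the smallest combined distance to the x and y halves of the pair.

```python
import math


def full_cp_stage1_step(w, v, x, y, alpha, beta):
    """Update the cluster unit closest to the pair (x, y); returns its index.

    w[j] and v[j] are unit j's weight vectors for x and y, modified in place.
    """
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    scores = [dist(x, wj) + dist(y, vj) for wj, vj in zip(w, v)]
    j = scores.index(min(scores))
    w[j] = [wi + alpha * (xi - wi) for wi, xi in zip(w[j], x)]
    v[j] = [vi + beta * (yi - vi) for vi, yi in zip(v[j], y)]
    return j


# Two hypothetical cluster units and one pair on the curve y = 1/x.
w = [[9.0], [0.11]]
v = [[0.11], [9.0]]
j = full_cp_stage1_step(w, v, x=[0.1], y=[10.0], alpha=0.5, beta=0.5)
print(j, w[j], v[j])
```

Only the winning unit changes, so each cluster unit ends up representing one region of the joint (x, y) space.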
Stage 2 Algorithm
1. while stopping criteria is false
2. for each training vector pair do
3. perform step 4 above
4. tj(new) = tj(old) + α[x – tj(old)]
vj(new) = vj(old) + β[y – vj(old)]
Partial Example
• Approximate y = 1/x over [0.1, 10.0]

• 1 x unit
• 1 y unit
• 10 z units
• 1 x* unit
• 1 y* unit
Partial Example
• v11 = .11, w11 = 9.0
• v12 = .14, w12 = 7.0
• …
• v10,1 = 9.0, w10,1 = .11

• test .12, predict 9.0.

• In this example, the output weights will converge to the cluster weights.
Forward Only Counterpropagation
• Sometimes the function y = f(x) is not invertible

• Architecture (only 1 z unit active)

[Figure: input units x1 … xn, cluster units z1 … zp, output units y1 … ym]
Stage 1 Algorithm
1. initialize weights, α (.1), β (.6)
2. while stopping criteria is false do
3. for each input vector do
4. find minimum || x – w||
w(new) = w(old) + α[x – w(old)]
5. reduce α
Stage 2 Algorithm
1. while stopping criteria is false do
2. for each training vector pair do
3. find minimum || x – w ||
w(new) = w(old) + α[x – w(old)]
v(new) = v(old) + β[y – v(old)]
4. reduce β

Note: interpolation is possible.

• y = f(x) over [0.1, 10.0]
• 10 zi units
• After phase 1, zi = 0.5, 1.5, …, 9.5.
• After phase 2, zi = 5.5, 0.75, …, 0.1
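A minimal sketch of both forward-only stages on a one-dimensional problem follows. Several choices here are assumptions rather than details from the slides: f(x) = 1/x, 100 evenly spaced training points, initial cluster weights at 0.5, 1.5, …, 9.5, learning rates of 0.1 and 0.6, ten epochs per stage, and a decay factor of 0.9. The trained values depend on those choices and need not match the figures quoted above.

```python
def nearest(w, x):
    """Index of the cluster unit whose weight is closest to x."""
    return min(range(len(w)), key=lambda j: abs(x - w[j]))


def train_forward_only(xs, ys, w, alpha=0.1, beta=0.6, epochs=10, decay=0.9):
    """Return (w, v): cluster weights and cluster-to-output weights."""
    w = list(w)
    v = [0.0] * len(w)
    # Stage 1: cluster the inputs only.
    a = alpha
    for _ in range(epochs):
        for x in xs:
            j = nearest(w, x)
            w[j] += a * (x - w[j])
        a *= decay
    # Stage 2: keep adapting w at the reduced rate and learn v.
    b = beta
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            j = nearest(w, x)
            w[j] += a * (x - w[j])
            v[j] += b * (y - v[j])
        b *= decay
    return w, v


def predict(w, v, x):
    """Output stored on the single active cluster unit (no interpolation)."""
    return v[nearest(w, x)]


xs = [k / 10 for k in range(1, 101)]     # 0.1, 0.2, ..., 10.0
ys = [1 / x for x in xs]                 # assumed target function
w, v = train_forward_only(xs, ys, [j + 0.5 for j in range(10)])
print([round(wj, 2) for wj in w])
print([round(vj, 2) for vj in v])
```

Because only one z unit is active, the prediction is piecewise constant; interpolating between the two closest units would smooth it.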
