Introduction to Neural Networks
John Paxton
Montana State University
Summer 2003
Chapter 4: Competition
• Force a decision (yes, no, maybe) to be made.
• Winner take all is a common approach.
• Kohonen learning:
wj(new) = wj(old) + α[x – wj(old)]
• wj is the weight vector closest to the input x, as determined by Euclidean distance.
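• A minimal sketch of this winner-take-all update in Python; the input, the initial weights, and the value of α below are illustrative choices, not from the slides:

```python
import numpy as np

alpha = 0.5                           # illustrative learning rate
x = np.array([1.0, 0.0])              # illustrative input vector
W = np.array([[0.2, 0.9],             # row j = weight vector of unit j
              [0.8, 0.1]])

j = np.argmin(np.linalg.norm(W - x, axis=1))  # closest unit (Euclidean)
W[j] += alpha * (x - W[j])                    # wj(new) = wj(old) + α[x – wj(old)]
print(j, W[j])                                # the second unit (index 1) wins and moves toward x
```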
MaxNet
• Lippmann, 1987
• Fixed-weight competitive net.
• Activation function f(x) = x if x > 0, else 0.
• Architecture: two nodes a1 and a2, each with a self-connection of weight 1 and an inhibitory connection of weight –ε to the other.
Algorithm
1. wij = 1 if i = j, otherwise –ε (0 < ε < 1/m for m nodes)
2. aj(0) = sj, t = 0
3. aj(t+1) = f[aj(t) – ε Σk≠j ak(t)]
4. go to step 3 if more than one node has a non-zero activation
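• A sketch of the MaxNet competition under these definitions; ε = 0.15 is an illustrative choice satisfying 0 < ε < 1/m, and a unique maximum is assumed:

```python
import numpy as np

def maxnet(s, eps=0.15):
    """Iterate aj(t+1) = f[aj(t) - eps * (sum of the other activations)]
    until at most one node stays non-zero (assumes a unique maximum)."""
    a = np.array(s, dtype=float)
    while np.count_nonzero(a) > 1:
        a = np.maximum(0.0, a - eps * (a.sum() - a))  # f(x) = max(x, 0)
    return a

print(maxnet([0.2, 0.4, 0.6, 0.8]))  # only the largest activation survives
```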
Mexican Hat
• Interconnection pattern around each unit: 0 – + + + – 0 (positive weights for the unit itself and its near neighbors, negative weights farther out, zero beyond)
Algorithm
1. initialize weights
2. xi(0) = si
3. for some number of steps do
4. xi(t+1) = f[Σk wk xi+k(t)]
5. xi(t+1) = max(0, xi(t+1))
Example
• x1, x2, x3, x4, x5
• radius 0 weight = 1
• radius 1 weight = 1
• radius 2 weight = -.5
• all other radii weights = 0
• s = (0 .5 1 .5 0)
• f(x) = 0 if x < 0, x if 0 <= x <= 2, 2 otherwise
Example
• x(0) = (0 .5 1 .5 0)
• x1(1) = 1(0) + 1(.5) -.5(1) = 0
• x2(1) = 1(0) + 1(.5) + 1(1) -.5(.5) = 1.25
• x3(1) = -.5(0) + 1(.5) + 1(1) + 1(.5) - .5(0) = 2.0
• x4(1) = 1.25
• x5(1) = 0
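• The same computation as a short Python sketch; the weights and s follow this example, the variable names are mine:

```python
# Mexican Hat update: weights 1 within radius 1, -.5 at radius 2,
# 0 elsewhere; f clamps activations to the interval [0, 2].
w = {-2: -0.5, -1: 1.0, 0: 1.0, 1: 1.0, 2: -0.5}
x = [0.0, 0.5, 1.0, 0.5, 0.0]        # x(0) = s

def step(x):
    out = []
    for i in range(len(x)):
        total = sum(c * x[i + k] for k, c in w.items() if 0 <= i + k < len(x))
        out.append(min(2.0, max(0.0, total)))   # apply f
    return out

print(step(x))                        # -> [0, 1.25, 2.0, 1.25, 0]
```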
Why the name?
• Plot x(0) and x(1) against the unit positions x1 … x5.
• The resulting activation profile is shaped like a sombrero (a Mexican hat).
Hamming Net
• Lippmann, 1987
• Maximum likelihood classifier
• The similarity of 2 vectors is taken to be
n – H(v1, v2)
where H is the Hamming distance
• Uses MaxNet with similarity metric
Architecture
• Concrete example: three input units x1, x2, x3 feed two output units y1, y2; the outputs are then passed into a MaxNet.
Algorithm
1. wij = si(j)/2
2. n is the dimensionality of a vector
3. yin.j = Σi xi wij + (n/2)
4. select max(yin.j) using MaxNet
Example
• Training examples: (1 1 1), (-1 -1 -1)
• n=3
• yin.1 = 1(.5) + 1(.5) + 1(.5) + 1.5 = 3
• yin.2 = 1(-.5) + 1(-.5) + 1(-.5) + 1.5 = 0
• These last 2 quantities equal n – H, the number of components on which the input agrees with each stored exemplar.
• They are then fed into MaxNet.
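• A sketch of this Hamming net computation; the exemplars and input follow the example, the variable names are mine:

```python
import numpy as np

exemplars = np.array([[1, 1, 1],
                      [-1, -1, -1]], dtype=float)
n = exemplars.shape[1]
W = exemplars.T / 2                  # wij = si(j) / 2
b = n / 2                            # bias of n/2 on each output

x = np.array([1.0, 1.0, 1.0])
y_in = x @ W + b                     # yin.j = sum_i xi wij + n/2 = n - H(x, e_j)
print(y_in)                          # -> [3. 0.]; these feed into MaxNet
```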
Kohonen Self-Organizing Maps
• Kohonen, 1989
• Maps inputs onto one of m clusters
• Human brains seem to be able to self-organize.
Architecture
• n input units x1 … xn, fully connected to m cluster units y1 … ym
Neighborhoods
• Linear: 3 2 1 # 1 2 3 (# marks the winning unit; each digit is a unit's distance from the winner)
• Rectangular
22222
21112
21#12
21112
22222
Algorithm
1. initialize wij
2. select topology of yi
3. select learning rate parameters
4. while stopping criteria not reached
5. for each input vector do
6. compute D(j) = Σi (wij – xi)² for each j
Algorithm
7. select minimum D(j)
8. update neighborhood units
wij(new) = wij(old) + α[xi – wij(old)]
9. update α
10. reduce radius of neighborhood at specified times
Example
• Place (1 1 0 0), (0 0 0 1), (1 0 0 0), (0 0 1 1) into two clusters
• α(0) = .6
• α(t+1) = .5 * α(t)
• random initial weights (column j holds the weights of cluster j):
.2 .8
.6 .4
.5 .7
.9 .3
Example
• Present (1 1 0 0)
• D(1) = 1.86, D(2) = .98, so cluster 2 wins!
Example
• wi2(new) = wi2(old) + .6[xi – wi2(old)]
.2 .92 (bigger)
.6 .76 (bigger)
.5 .28 (smaller)
.9 .12 (smaller)
• After repeated presentations the weights converge to
0 1
0 .5
.5 0
1 0
• (1 1 0 0) -> category 2
• (0 0 0 1) -> category 1
• (1 0 0 0) -> category 2
• (0 0 1 1) -> category 1
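• A sketch of the whole example as code; the initial weights and the α schedule follow the slides, the neighborhood radius is 0 (only the winner is updated), and the epoch count is an illustrative choice:

```python
import numpy as np

X = np.array([[1, 1, 0, 0], [0, 0, 0, 1],
              [1, 0, 0, 0], [0, 0, 1, 1]], dtype=float)
W = np.array([[.2, .8], [.6, .4], [.5, .7], [.9, .3]])  # column j = cluster j
alpha = 0.6

for epoch in range(20):              # illustrative number of epochs
    for x in X:
        D = ((W - x[:, None]) ** 2).sum(axis=0)         # D(j) = sum_i (wij - xi)^2
        j = int(np.argmin(D))
        W[:, j] += alpha * (x - W[:, j])                 # update the winner only
    alpha *= 0.5                                         # alpha(t+1) = .5 * alpha(t)

print(np.round(W, 2))                # columns approach (0 0 .5 1) and (1 .5 0 0)
```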
Applications
• Grouping characters
• Travelling Salesperson Problem
– Cluster units can be represented graphically by weight vectors
– Linear neighborhoods can be used, with the first and last cluster units connected to form a ring
Learning Vector Quantization
• Kohonen, 1989
• Supervised learning
• There can be several output units per class
Architecture
• Like Kohonen nets, but with no topology for the output units
• Each yi represents a known class
• n input units x1 … xn, fully connected to m output units y1 … ym
Algorithm
1. initialize the weights (e.g. to the first m training examples, or randomly)
2. choose α
3. while stopping criteria not reached do
(e.g. a fixed number of iterations, or α has become very small)
4. for each training vector do
Algorithm
5. find j that minimizes || x – wj ||
6. if the winning unit's class is the target class
wj(new) = wj(old) + α[x – wj(old)]
else
wj(new) = wj(old) – α[x – wj(old)]
7. reduce α
Example
• (1 1 -1 -1) belongs to category 1
• (-1 -1 -1 1) belongs to category 2
• (-1 -1 1 1) belongs to category 2
• (1 -1 -1 -1) belongs to category 1
• (-1 1 1 -1) belongs to category 2
• initial weights (column j is the reference vector for class j, taken from the first two training examples):
1 -1
1 -1
-1 -1
-1 1
• α = .1
Example
• Present training example 3, (-1 -1 1 1). It
belongs to category 2.
(-1 -1 -.8 1)
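• A sketch of this single LVQ step; the weights, α, and the training example follow the slides, the variable names are mine:

```python
import numpy as np

W = np.array([[1., 1., -1., -1.],    # reference vector for category 1
              [-1., -1., -1., 1.]])  # reference vector for category 2
category = [1, 2]
alpha = 0.1

x, target = np.array([-1., -1., 1., 1.]), 2       # training example 3
j = int(np.argmin(np.linalg.norm(W - x, axis=1))) # nearest reference vector
if category[j] == target:
    W[j] += alpha * (x - W[j])       # reward: move the winner toward x
else:
    W[j] -= alpha * (x - W[j])       # punish: move the winner away from x
print(W[j])                          # -> [-1. -1. -0.8 1.]
```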
Issues
• How many yi should be used?
Full Counterpropagation
Stage 1 Architecture
• Inputs x1 … xn connect through weights w (w11, …) to cluster units z1 … zp, which connect through weights v (v11, …) to output units y1 … ym.
Stage 2 Architecture
• The winning cluster unit zj sends signals through weights tj1 … tjn to the x*1 … x*n units and through weights vj1 … vjm to the y*1 … y*m units.
Full Counterpropagation
• Stage 1 Algorithm
1. initialize weights, α, β
2. while stopping criteria is false do
3. for each training vector pair (x, y) do
4. find j that minimizes ||x – wj|| + ||y – vj||
wj(new) = wj(old) + α[x – wj(old)]
vj(new) = vj(old) + β[y – vj(old)]
5. reduce α, β
Stage 2 Algorithm
1. while stopping criteria is false do
2. for each training vector pair (x, y) do
3. find the winning cluster unit zj as in step 4 above
4. tj(new) = tj(old) + α[x – tj(old)]
vj(new) = vj(old) + β[y – vj(old)]
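• The two stages are normally run as separate training phases; the sketch below folds one update of each into a single function so the four weight updates can be seen together. The shapes, the names (U for the z → y* weights), and the rates α, β are illustrative choices:

```python
import numpy as np

def full_cpn_step(x, y, W, V, T, U, alpha, beta):
    """One full-counterpropagation update for a training pair (x, y).
    W: p x n (x -> z), V: p x m (y -> z),
    T: p x n (z -> x*), U: p x m (z -> y*)."""
    # stage 1: the winner minimizes ||x - wj|| + ||y - vj||
    j = int(np.argmin(np.linalg.norm(W - x, axis=1) +
                      np.linalg.norm(V - y, axis=1)))
    W[j] += alpha * (x - W[j])
    V[j] += beta * (y - V[j])
    # stage 2: train the winner's outgoing weights toward the pair
    T[j] += alpha * (x - T[j])
    U[j] += beta * (y - U[j])
    return j
```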
Partial Example
• Approximate y = 1/x [0.1, 10.0]
• 1 x unit
• 1 y unit
• 10 z units
• 1 x* unit
• 1 y* unit
Partial Example
• v11 = .11, w11 = 9.0 (z1 represents the pair x = 9.0, y ≈ 1/9)
• v12 = .14, w12 = 7.0 (z2 represents x = 7.0, y ≈ 1/7)
• …
• v10,1 = 9.0, w10,1 = .11 (z10 represents x = .11, y ≈ 9.0)
Forward Only Counterpropagation
• Architecture: inputs x1 … xn feed cluster units z1 … zp, which feed output units y1 … ym.
Stage 1 Algorithm
1. initialize weights, α (.1), β (.6)
2. while stopping criteria is false do
3. for each input vector x do
4. find the winner: minimum || x – wj ||
wj(new) = wj(old) + α[x – wj(old)]
5. reduce α
Stage 2 Algorithm
1. while stopping criteria is false do
2. for each training vector pair (x, y) do
3. find the winner: minimum || x – wj ||
wj(new) = wj(old) + α[x – wj(old)]
vj(new) = vj(old) + β[y – vj(old)]
4. reduce β
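• A sketch of forward-only counterpropagation on the earlier partial example, approximating y = 1/x on [0.1, 10] with 10 cluster units. α = .1 and β = .6 follow the slides; the iteration counts and random sampling of training inputs are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 10
w = rng.uniform(0.1, 10.0, size=p)   # x -> z weights
v = np.zeros(p)                      # z -> y weights
alpha, beta = 0.1, 0.6

for _ in range(5000):                # stage 1: cluster the inputs
    x = rng.uniform(0.1, 10.0)
    j = np.argmin(np.abs(w - x))
    w[j] += alpha * (x - w[j])

for _ in range(5000):                # stage 2: learn the output for each cluster
    x = rng.uniform(0.1, 10.0)
    j = np.argmin(np.abs(w - x))
    w[j] += alpha * (x - w[j])
    v[j] += beta * (1.0 / x - v[j])

x = 4.0                              # query: network output vs. true 1/x
print(v[np.argmin(np.abs(w - x))], 1.0 / x)
```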