
Introduction to Neural Networks
John Paxton
Montana State University
Summer 2003
Chapter 4: Competition
• Force a decision (yes, no, maybe) to be
made.
• Winner take all is a common approach.
• Kohonen learning
wj(new) = wj(old) + α [x – wj(old)]
• wj is the closest weight vector to x, determined by
Euclidean distance; α is the learning rate.
MaxNet
• Lippman, 1987
• Fixed-weight competitive net.
• Activation function f(x) = x if x > 0, else 0.
• Architecture: two nodes a1 and a2, each with a
self-connection of weight 1 and mutual inhibitory
connections of weight –ε.
Algorithm
1. wij = 1 if i = j, otherwise –ε
2. aj(0) = sj, t = 0.
3. aj(t+1) = f[aj(t) – ε Σk≠j ak(t)]
4. go to step 3 if more than one node has a
non-zero activation

Special Case: More than one node has the


same maximum activation.
Example
• s1 = .5, s2 = .1, ε = .1

• a1(0) = .5, a2(0) = .1


• a1(1) = .49, a2(1) = .05
• a1(2) = .485, a2(2) = .001
• a1(3) = .4849, a2(3) = 0
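A minimal Python sketch of this iteration (the function name maxnet and the stopping test are my own; ε = .1 as in the example). It reproduces the a1/a2 trace above.

# MaxNet sketch (assumed names; epsilon = 0.1 as in the example).
# The tie special case (two equal maxima) is not handled and would loop forever.
def maxnet(s, eps=0.1):
    f = lambda x: x if x > 0 else 0.0              # f(x) = x if x > 0, else 0
    a = list(s)                                    # a_j(0) = s_j
    while sum(1 for v in a if v > 0) > 1:          # repeat while > 1 node is active
        a = [f(a[j] - eps * (sum(a) - a[j]))       # a_j(t+1) = f[a_j(t) - eps * sum_{k != j} a_k(t)]
             for j in range(len(a))]
    return a

print(maxnet([0.5, 0.1]))   # -> approximately [0.4849, 0.0], matching the trace above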
Mexican Hat
• Kohonen, 1989
• Contrast enhancement
• Architecture (w0, w1, w2, w3)
• w0 (xi -> xi) , w1 (xi+1 -> xi and xi-1 ->xi)

xi-3 xi-2 xi-1 xi xi+1 xi+2 xi+3

0 - + + + - 0
Algorithm
1. initialize weights
2. xi(0) = si
3. for some number of steps do
4. xi(t+1) = f[ Σk wk xi+k(t) ]
5. xi(t+1) = max(0, xi(t+1))
Example
• x1, x2, x3, x4, x5
• radius 0 weight = 1
• radius 1 weight = 1
• radius 2 weight = -.5
• all other radii weights = 0
• s = (0 .5 1 .5 0)
• f(x) = 0 if x < 0, x if 0 <= x <= 2, 2
otherwise
Example
• x(0) = (0 .5 1 .5 0)
• x1(1) = 1(0) + 1(.5) -.5(1) = 0
• x2(1) = 1(0) + 1(.5) + 1(1) -.5(.5) = 1.25
• x3(1) = -.5(0) + 1(.5) + 1(1) + 1(.5) - .5(0) =
2.0
• x4(1) = 1.25
• x5(1) = 0
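A small Python sketch of one Mexican Hat update under the weights used in this example (the function name and the radius-weight dictionary are my own); it reproduces x(1) = (0 1.25 2.0 1.25 0).

# Mexican Hat sketch (assumed names); weights by radius: 0 and 1 -> 1.0, 2 -> -0.5, else 0.
def mexican_hat_step(x):
    w = {-2: -0.5, -1: 1.0, 0: 1.0, 1: 1.0, 2: -0.5}
    f = lambda v: min(max(v, 0.0), 2.0)            # f clamps the sum to [0, 2]
    n = len(x)
    return [f(sum(w[k] * x[i + k]                  # x_i(t+1) = f[ sum_k w_k * x_{i+k}(t) ]
                  for k in w if 0 <= i + k < n))
            for i in range(n)]

print(mexican_hat_step([0.0, 0.5, 1.0, 0.5, 0.0]))   # -> [0.0, 1.25, 2.0, 1.25, 0.0]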
Why the name?
• Plot x(0) and x(1) against the unit positions x1 … x5:
the enhanced activation profile is shaped like a Mexican
hat, peaked at the center and suppressed toward the edges.
Hamming Net
• Lippman, 1987
• Maximum likelihood classifier
• The similarity of 2 vectors is taken to be
n – H(v1, v2)
where H is the Hamming distance
• Uses MaxNet with similarity metric
Architecture
• Concrete example: three input units x1, x2, x3 feed two
category units y1 and y2, whose outputs are fed into a
MaxNet.
Algorithm
1. wij = si(j)/2
2. n is the dimensionality of a vector
3. yin.j = xiwij + (n/2)
4. select max(yin.j) using MaxNet
Example
• Training examples: (1 1 1), (-1 -1 -1)
• n=3
• Present x = (1 1 1)
• yin.1 = 1(.5) + 1(.5) + 1(.5) + 1.5 = 3
• yin.2 = 1(-.5) + 1(-.5) + 1(-.5) + 1.5 = 0
• These last 2 quantities are the similarities n – H to the
two exemplars (Hamming distances 0 and 3, respectively)
• They are then fed into MaxNet.
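A minimal Python sketch of this computation (names are my own); it returns the two yin values from the example, which would then be passed to MaxNet.

# Hamming net sketch (assumed names); exemplar vectors are bipolar.
def hamming_net(x, exemplars):
    n = len(x)
    # y_in.j = sum_i x_i * w_ij + n/2 with w_ij = exemplar_j[i] / 2, which equals n - H(x, exemplar_j)
    return [sum(xi * ei / 2.0 for xi, ei in zip(x, e)) + n / 2.0 for e in exemplars]

y_in = hamming_net([1, 1, 1], [(1, 1, 1), (-1, -1, -1)])
print(y_in)   # -> [3.0, 0.0]; these values are fed into MaxNet to pick the winner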
Kohonen Self-Organizing Maps
• Kohonen, 1989
• Maps inputs onto one of m clusters
• Human brains seem to be able to self-organize.
Architecture

• n input units x1 … xn are fully connected to m cluster
units y1 … ym.
Neighborhoods
• Linear: 3 2 1 # 1 2 3 (# = winning unit, digits = distance from it)

• Rectangular
22222
21112
21#12
21112
22222
Algorithm
1. initialize wij
2. select topology of yi
3. select learning rate parameters
4. while stopping criteria not reached
5. for each input vector do
6. compute D(j) = Σi (wij – xi)²
for each j
Algorithm.
7. select minimum D(j)
8. update neighborhood units
wij(new) = wij(old) + α[xi – wij(old)]
9. update α
10. reduce radius of neighborhood
at specified times
Example
• Place (1 1 0 0), (0 0 0 1), (1 0 0 0), (0 0 1
1) into two clusters
 (0) = .6
 (t+1) = .5 * (t)
• random initial weights
.2 .8
.6 .4
.5 .7
.9 .3
Example
• Present (1 1 0 0)

• D(1) = (.2 – 1)² + (.6 – 1)² + (.5 – 0)² + (.9 – 0)² = 1.86
• D(2) = .98

• D(2) wins!
Example
• wi2(new) = wi2(old) + .6[xi – wi2(old)]

.2 .92 (bigger)
.6 .76 (bigger)
.5 .28 (smaller)
.9 .12 (smaller)

• This example assumes no neighborhood


Example
• After many epochs, the weight matrix converges to
approximately
0 1
0 .5
.5 0
1 0
• (1 1 0 0) -> category 2
• (0 0 0 1) -> category 1
• (1 0 0 0) -> category 2
• (0 0 1 1) -> category 1
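A Python sketch of this clustering example, assuming no neighborhood and the α schedule given earlier (the variable names and the fixed 100-epoch loop are my own choices).

# Kohonen SOM sketch for the clustering example above (assumed names; no neighborhood).
vectors = [(1, 1, 0, 0), (0, 0, 0, 1), (1, 0, 0, 0), (0, 0, 1, 1)]
w = [[0.2, 0.6, 0.5, 0.9],   # cluster 1 weights (first column of the matrix above)
     [0.8, 0.4, 0.7, 0.3]]   # cluster 2 weights (second column)
alpha = 0.6

for _ in range(100):                                                    # "many epochs"
    for x in vectors:
        d = [sum((wj[i] - x[i]) ** 2 for i in range(4)) for wj in w]    # D(j)
        j = d.index(min(d))                                             # winning cluster
        w[j] = [w[j][i] + alpha * (x[i] - w[j][i]) for i in range(4)]   # update the winner only
    alpha *= 0.5                                                        # alpha(t+1) = .5 * alpha(t)

print(w)   # drifts toward the converged weights shown above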
Applications
• Grouping characters
• Travelling Salesperson Problem
– Cluster units can be represented graphically
by weight vectors
– Linear neighborhoods can be used with the
first and last cluster units connected
Learning Vector Quantization
• Kohonen, 1989
• Supervised learning
• There can be several output units per
class
Architecture
• Like Kohonen nets, but no topology for
output units
• Each yi represents a known class

• n input units x1 … xn are fully connected to m output
units y1 … ym.
Algorithm
1. Initialize the weights
(first m training examples, random)
2. choose α
3. while stopping criteria not reached do
(number of iterations, or α is very small)
4. for each training vector do
Algorithm
5. find minimum || x – wj ||
6. if minimum is target class
wj(new) = wj(old) + α[x – wj(old)]
else
wj(new) = wj(old) – α[x – wj(old)]
7. reduce α
Example
• (1 1 -1 -1) belongs to category 1
• (-1 -1 -1 1) belongs to category 2
• (-1 -1 1 1) belongs to category 2
• (1 -1 -1 -1) belongs to category 1
• (-1 1 1 -1) belongs to category 2

• 2 output units, y1 represents category 1


and y2 represents category 2
Example
• Initial weights (where did these come from?)

1 -1
1 -1
-1 -1
-1 1

  = .1
Example
• Present training example 3, (-1 -1 1 1). It
belongs to category 2.

• D(1) = (1 + 1)² + (1 + 1)² + (-1 – 1)² + (-1 – 1)² = 16
• D(2) = 4

• Category 2 wins. That is correct!


Example
• w2(new) = (-1 -1 -1 1)
+ .1[(-1 -1 1 1) - (-1 -1 -1 1)] =

(-1 -1 -.8 1)
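A Python sketch of one LVQ step (the function and variable names are my own); it reproduces the update of w2 shown above.

# LVQ step sketch (assumed names); reproduces the single update shown above.
def lvq_step(x, target, w, classes, alpha=0.1):
    d = [sum((xi - wi) ** 2 for xi, wi in zip(x, wj)) for wj in w]   # squared distances
    j = d.index(min(d))                                              # winning output unit
    sign = 1.0 if classes[j] == target else -1.0                     # move toward or away
    w[j] = [wi + sign * alpha * (xi - wi) for xi, wi in zip(x, w[j])]
    return j

w = [[1, 1, -1, -1], [-1, -1, -1, 1]]     # initial weights: the first two training examples
classes = [1, 2]
lvq_step([-1, -1, 1, 1], 2, w, classes)   # category 2 wins, which is correct
print(w[1])                               # -> [-1.0, -1.0, -0.8, 1.0]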
Issues
• How many yi should be used?

• How should we choose the class that each


yi should represent?

• LVQ2 and LVQ3 are enhancements to LVQ that sometimes
also modify the runner-up unit.
Counterpropagation
• Hecht-Nielsen, 1987
• There are input, output, and clustering
layers
• Can be used to compress data
• Can be used to approximate functions
• Can be used to associate patterns
Stages
• Stage 1: Cluster input vectors

• Stage 2: Adapt weights from cluster units


to output units
Stage 1 Architecture

• Input units x1 … xn connect to cluster units z1 … zp
with weights wij; output-layer units y1 … ym also connect
to the cluster units, with weights vkj.
Stage 2 Architecture

• The winning cluster unit zj connects to the x* units
x*1 … x*n with weights tji and to the y* units y*1 … y*m
with weights vjk.
Full Counterpropagation
• Stage 1 Algorithm
1. initialize weights, α, β
2. while stopping criteria is false do
3. for each training vector pair do
4. find the cluster unit j that minimizes ||x – wj|| + ||y – vj||
wj(new) = wj(old) + α[x – wj(old)]
vj(new) = vj(old) + β[y – vj(old)]
5. reduce α, β
Stage 2 Algorithm
1. while stopping criteria is false
2. for each training vector pair do
3. perform step 4 above
4. tj(new) = tj(old) + α[x – tj(old)]
vj(new) = vj(old) + β[y – vj(old)]
Partial Example
• Approximate y = 1/x on [0.1, 10.0]

• 1 x unit
• 1 y unit
• 10 z units
• 1 x* unit
• 1 y* unit
Partial Example
• v11 = .11, w11 = 9.0
• v12 = .14, w12 = 7.0
• …
• v10,1 = 9.0, w10,1 = .11

• Test x = .12: the nearest cluster weight is w = .11, so predict y = 9.0.

• In this example, the output weights will converge


to the cluster weights.
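A Python sketch of how the trained net above would be looked up (names are my own): the winner is the cluster whose x-side weight is closest to the input, and that cluster's y-side weight is returned as the prediction.

# Lookup sketch for a trained counterpropagation net (assumed names).
def predict(x, w, v):
    j = min(range(len(w)), key=lambda i: abs(x - w[i]))   # closest cluster unit
    return v[j]

w = [9.0, 7.0, 0.11]        # x-side cluster weights listed above (elided ones omitted)
v = [0.11, 0.14, 9.0]       # corresponding y-side weights
print(predict(0.12, w, v))  # -> 9.0, as in "test .12, predict 9.0"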
Forward Only Counterpropagation
• Sometimes the function y = f(x) is not
invertible.

• Architecture (only 1 z unit active): input units
x1 … xn feed cluster units z1 … zp, which feed output
units y1 … ym.
Stage 1 Algorithm
1. initialize weights, α (.1), β (.6)
2. while stopping criteria is false do
3. for each input vector do
4. find minimum || x – w||
w(new) = w(old) + α[x – w(old)]
5. reduce α
Stage 2 Algorithm
1. while stopping criteria is false do
2. for each training vector pair do
3. find minimum || x – w ||
w(new) = w(old) + α[x – w(old)]
v(new) = v(old) + β[y – v(old)]
4. reduce α, β

Note: interpolation is possible.


Example
• y = f(x) over [0.1, 10.0]
• 10 zi units
• After phase 1, the cluster (input-side) weights of the zi
converge to roughly 0.5, 1.5, …, 9.5.
• After phase 2, the corresponding output weights converge
to roughly 5.5, 0.75, …, 0.1.
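A Python sketch of forward-only counterpropagation for a 1-D function (the names, the random initialization, and the 0.9-per-epoch decay are my own assumptions); stage 1 clusters the inputs, stage 2 learns one output weight per cluster.

import random

# Forward-only counterpropagation sketch (assumed names).
def train(samples, p=10, epochs=50, alpha=0.1, beta=0.6):
    w = [random.uniform(0.1, 10.0) for _ in range(p)]    # cluster (x-side) weights
    v = [0.0] * p                                        # output (y-side) weights
    for _ in range(epochs):                              # stage 1: cluster the inputs
        for x, _y in samples:
            j = min(range(p), key=lambda i: abs(x - w[i]))
            w[j] += alpha * (x - w[j])
        alpha *= 0.9                                      # reduce alpha
    for _ in range(epochs):                              # stage 2: learn outputs per cluster
        for x, y in samples:
            j = min(range(p), key=lambda i: abs(x - w[i]))
            w[j] += alpha * (x - w[j])                    # w continues to be fine-tuned
            v[j] += beta * (y - v[j])
        beta *= 0.9                                       # reduce beta
    return w, v

samples = [(k / 10.0, 1.0 / (k / 10.0)) for k in range(1, 101)]   # e.g. y = 1/x on [0.1, 10.0]
w, v = train(samples)
print([round(wi, 2) for wi in w])   # cluster weights after training (some may stay unused with random init)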
