Biological Cybernetics: Self-Organized Formation of Topologically Correct Feature Maps
© Springer-Verlag 1982
1. Introduction
The present work has evolved from a recent discovery
by the author (Kohonen, 1981), i.e. that topologically
correct maps of structured distributions of signals can
be formed in, say, a one- or two-dimensional array of
processing units which did not have this structure
initially. This principle is a generalization of the formation of direct topographic projections between two
laminar structures known as retinotectal mapping
(Willshaw and Malsburg, 1976, 1979; Malsburg and
Willshaw, 1977; Amari, 1980). It will be introduced
here in a general form in which signals of any modality
may be used. There are no restrictions on the automatic formation of maps of completely abstract or
conceptual items provided their signal representations
or feature values are expressible in a metric or topological space which allows their ordering. In other
nuclei. If the ability to form maps were ubiquitous in
the brain, then one could easily explain its power to
operate on semantic items: some areas of the brain
could simply create and order specialized cells or cell
groups in conformity with high-level features and their
combinations.
The possibility of constructing spatial maps for
attributes and features in fact revives the old question
of how symbolic representations for concepts could be
formed automatically; most of the models of automatic
problem solving and representation of knowledge have
simply skipped this question.
2. Preliminary Simulations
In order to elucidate the self-organizing processes discussed in this paper, their operation is first demonstrated by means of ultimately simplified system models. The essential constituents of these systems are:

1. An array of processing units which receive coherent inputs from an event space and form simple discriminant functions of their input signals.
2. A mechanism which compares the discriminant functions and selects the unit with the greatest function value.
3. Some kind of local interaction which simultaneously activates the selected unit and its nearest neighbours.
4. An adaptive process which makes the parameters of the activated units increase their discriminant function values relating to the present input.
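These four constituents can be collected into a minimal sketch (an illustrative reconstruction, not code from the paper; the array size, gain value, and one-dimensional neighbourhood used here are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, dim = 10, 3

# 1. An array of units, each holding a weight (parameter) vector;
#    the random unit-norm initialization is an assumption.
m = rng.normal(size=(n_units, dim))
m /= np.linalg.norm(m, axis=1, keepdims=True)

def present(m, x, alpha=0.1):
    # 2. Compare the discriminant functions eta_i = m_i . x and
    #    select the unit with the greatest value.
    eta = m @ x
    c = int(np.argmax(eta))
    # 3. Local interaction: the selected unit and its nearest
    #    neighbours (here a one-dimensional neighbourhood) are activated.
    active = [i for i in (c - 1, c, c + 1) if 0 <= i < len(m)]
    # 4. Adaptive process: the activated units increase their
    #    discriminant-function values for the present input.
    for i in active:
        v = m[i] + alpha * x
        m[i] = v / np.linalg.norm(v)
    return c

x = rng.normal(size=dim)
x /= np.linalg.norm(x)
winner = present(m, x)
```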
(Fig. 1: array of processing units fed through a relaying network)

η_{i3}(A_3) = max {η_j(A_3) | j = 1, 2, ..., n},

etc.
The above definition is readily generalizable to
two- and higher-dimensional arrays of processing
units; in this case some topological order must be definable for the events Ak, induced by more than one
ordering relation with respect to different attributes.
On the other hand, the topology of the array is simply
defined by the definition of neighbours to each unit. If
the unit with the maximum response to a particular
event is regarded as the image of the latter, then the
Fig. 2. Two-dimensional array of processing units
Fig. 3. a Distribution of training vectors (front view of the surface of a unit sphere in R^3). The distribution had edges, each of which contained as many vectors as the inside. b Test vectors which are mapped into the outputs of the processing unit array. c Images of the test vectors at the outputs
signals {ξ_1, ξ_2, ..., ξ_n} was connected to all units. In accordance with notations used in mathematical system theory, this set of signals is expressed as a column vector x = [ξ_1, ξ_2, ..., ξ_n]^T ∈ R^n, where T denotes the transpose. Unit i shall have input weights or parameters μ_{i1}, μ_{i2}, ..., μ_{in}, which are expressible as another column vector m_i = [μ_{i1}, μ_{i2}, ..., μ_{in}]^T ∈ R^n. The unit shall form the discriminant function

η_i = Σ_{j=1}^{n} μ_{ij} ξ_j = m_i^T x.   (1)
The unit with the greatest response is then selected:

η_k = max {η_i | i = 1, 2, ..., n}.   (2)

The weights of the activated units are updated according to
m_i(t+1) = [m_i(t) + α x(t)] / ‖m_i(t) + α x(t)‖_E,   (3)
where the variables have been labelled by a discrete-time index t (an integer), α is a "gain parameter" in adaptation, and the denominator is the Euclidean norm of the numerator. Equation (3) otherwise resembles the well-known teaching rule of the Perceptron, except that the direction of the corrections is always the same as that of x (no decision process or supervision is involved), and the weight vectors are normalized. Normalization improves selectivity in discrimination, and it is also beneficial in maintaining the "memory resources" at a certain level. Notice that the process of Eq. (3) does not change the length of m_i but only rotates m_i towards x. Nonetheless it is not always
(Fig. 4: development of the weight-vector lattice, drawn on the unit sphere, at t = 100, 500, and 2000 training steps)
Fig. 5. Same as Fig. 4 except that a longer interaction range was used
Simulation 2. A clear conception of the ordering process is obtainable if the sequence of the weight vectors
is illustrated using computer graphics. For this purpose, the vectors were assumed to be three-dimensional.
Obviously the distribution of the weight vectors tends
to imitate that of the training vectors x(t). Since the
vectors are normalized, they lie on the surface of a unit sphere in R^3. The order of the weight vectors in this distribution can be indicated simply by a lattice of lines which conforms with the topology of the processing unit array. A line connecting two weight vectors m_i and m_j is used only to indicate that the two corresponding units i and j are adjacent in the array. Figure 4 now
shows a typical development of the vectors mi(t) in
time; the illustration may be self-explanatory.
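The behaviour of the teaching rule of Eq. (3) used in these simulations is easy to verify numerically: each update leaves the weight vector on the unit sphere while turning it towards the input (a minimal sketch; the particular vectors and gain value below are arbitrary):

```python
import numpy as np

def teach(m_i, x, alpha=0.2):
    # Eq. (3): add a correction in the direction of x, then renormalize,
    # so the length of m_i stays 1 and only its direction rotates.
    v = m_i + alpha * x
    return v / np.linalg.norm(v)

m_i = np.array([1.0, 0.0, 0.0])   # initial weight vector
x = np.array([0.0, 1.0, 0.0])     # repeatedly presented input
for _ in range(50):
    m_i = teach(m_i, x)
```

After repeated presentations m_i has rotated almost completely into the direction of x while keeping unit length.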
Simulation 3. This experiment was made in order to
find out whether the ordering of the weight vectors
would proceed faster if Eq. (3) were applied to more
than the eight nearest neighbours of the selected unit.
A number of experiments were made, and one of the
best methods found was to apply Eq. (3) as such to the
selected unit and its nearest eight neighbours while
using an adaptation gain value of α/4 for those 16 units
which surrounded the previous ones. A result relating
to the previous training vectors is given in Fig. 5. In
this case ordering seems to proceed more quickly.
Comments. Something should be said here about simulations 1 through 3. There are eight equally probable
symmetrical alternatives in which the map may be
realized in the ordering process. One way to break the
symmetry and to define a particular orientation for the
map is to define "seeds", i.e. units with fixed, predetermined input weights. Another possibility is to use
nonsymmetrical distributions and arrays which might
have the same effect. We shall not take up this question
in more detail.
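The graded neighbourhood gain of Simulation 3 can be sketched as a mask over the two-dimensional array: full gain α for the selected unit and its 8 nearest neighbours, α/4 for the 16 units surrounding these (the array size and the α value below are arbitrary choices):

```python
import numpy as np

def gain_mask(shape, centre, alpha=0.1):
    # Gain alpha for the selected unit and its 8 nearest neighbours
    # (Chebyshev distance <= 1), alpha/4 for the 16 surrounding units
    # (Chebyshev distance == 2), zero elsewhere.
    g = np.zeros(shape)
    ci, cj = centre
    for i in range(shape[0]):
        for j in range(shape[1]):
            d = max(abs(i - ci), abs(j - cj))
            if d <= 1:
                g[i, j] = alpha
            elif d == 2:
                g[i, j] = alpha / 4
    return g

g = gain_mask((7, 7), (3, 3))
```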
2.3. Formation of Feature Maps in a One-Dimensional Array with Non-Identical but Coherent Inputs to the Units (Frequency Map)
The primary purpose of this experiment was to show
that for self-organization non-identical but coherent
inputs are sufficient.
Simulation 4. Consider Fig. 6, which depicts a one-dimensional array of processing units governed by
system equations (1) through (3). In this case each unit
except the outermost ones has two nearest neighbours.
Fig. 6. Illustration of the one-dimensional system used in the self-organized formation of a frequency map
Table 1. Formation of frequency maps in Simulation 4. The resonators (20 in number) corresponded to second-order filters with quality factor Q = 2.5 and resonant frequencies selected at random from the range [1, 2]. The training frequencies were selected at random from the range [0.5, 1]. This table shows two different ordering results. The numbers in the table indicate those test frequencies to which each processing unit became most sensitive
Unit i                                      1     2     3     4     5     6     7     8     9     10
Frequency map in Experiment 1, 2000 steps   0.55  0.60  0.67  0.70  0.77  0.82  0.83  0.94  0.98  0.83
Frequency map in Experiment 2, 3500 steps   0.99  0.98  0.98  0.97  0.90  0.81  0.73  0.69  0.62  0.59
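Whether such a map is ordered can be checked mechanically: the test frequencies should vary monotonically along the array. A small check (data copied from Table 1) shows that Experiment 2 is completely ordered, while in Experiment 1 only the last unit falls outside the ordering:

```python
def is_monotone(seq):
    """True if seq is entirely non-decreasing or entirely non-increasing."""
    pairs = list(zip(seq, seq[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)

# Best-response frequencies per unit, from Table 1.
exp1 = [0.55, 0.60, 0.67, 0.70, 0.77, 0.82, 0.83, 0.94, 0.98, 0.83]
exp2 = [0.99, 0.98, 0.98, 0.97, 0.90, 0.81, 0.73, 0.69, 0.62, 0.59]
```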
3. A Possible Embodiment of Self-Organization in a Neural Structure
(Fig. 7: a laminar network structure, with afferent axons entering and efferent axons leaving the array; panels a and b)
η_i(t+1) = σ[φ_i(t) + Σ_{k=-L}^{+L} γ_k η_{i+k}(t)],   (5)

where σ(·) saturates at zero and at a fixed upper limit, and the lateral coupling γ_k (parametrized by a and b) is excitatory near k = 0 and inhibitory at larger |k|.   (6)
Fig. 9a and b. Development of activity in time over a one-dimensional interconnected array, vs. unit position. Input excitation: φ_i = 2 sin(πi/50). a The lateral feedback was below a certain critical value (the parameters relating to Fig. 8 were: a = 5, b = 0.024, and the saturation limit was 10). b Same as in a except that the lateral feedback exceeded the critical value (b = 0.039)
A simple, although not quite equivalent, way is to introduce a threshold by defining a floating bias function δ common to all units, and then by putting the system equations into a form in which the solution is determined implicitly:

(8)

δ = max_i {φ_i} − ε,   (9)

η_i = Σ_{k∈S_i} γ_k σ_k.   (10)

μ_ij(t+1) = [μ_ij(t) + α η_i(t)(ξ_j(t) − ξ_b)] / Σ_j [μ_ij(t) + α η_i(t)(ξ_j(t) − ξ_b)].   (11)

Notice that the factor η_i(t) in Eqs. (10) and (11) in fact corresponds to the selection rule relating to Eq. (3); the input weights of only the activated units change. Proportionality to η_i(t) further means that the correction becomes a function of distance from the maximum response. However, in addition to rotating the m_i vectors, this process still affects their lengths.

It has been pointed out (Oja, 1981) that the factor which is assumed to be redistributed in the "memory" process actually need not be directly proportional to the input weight; for instance, if the input efficacy were proportional to the square root of this factor (a weaker function than the linear one!), then the denominator of Eq. (11) would already become similar to that applied in the teaching rule Eq. (3). Another interesting fact is that the Euclidean norm follows from a simple forgetting law. One may note further that a particular norm and a particular form of the discrimination function should be related to each other; the most important requirement is to achieve good discrimination between neighbouring responses in one way or another.

Many simulations with physical process models of the above type were carried out; to make them comparable to those performed on the more fictive models of Sect. 2, three-dimensional vectors alone were used. The following adaptive law was then applied:

μ_ij(t+1) = [μ_ij(t) + α η_i(t)(ξ_j(t) − ξ_b)] / {Σ_j [μ_ij(t) + α η_i(t)(ξ_j(t) − ξ_b)]²}^{1/2}.   (12)
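The essential difference from Eq. (3), the activity-proportional correction, can be sketched as follows (an illustrative reconstruction in vector form, omitting the background term; the activity profile and gain value are arbitrary assumptions):

```python
import numpy as np

def adapt(m, eta, x, alpha=0.1):
    # Each unit's correction is proportional to its own response eta_i,
    # so units with eta_i = 0 do not change; the normalization is
    # Euclidean, in the style of the Eq. (12) law.
    m_new = m + alpha * eta[:, None] * x[None, :]
    return m_new / np.linalg.norm(m_new, axis=1, keepdims=True)

rng = np.random.default_rng(1)
m = rng.normal(size=(5, 3))
m /= np.linalg.norm(m, axis=1, keepdims=True)
x = np.array([1.0, 0.0, 0.0])
eta = np.array([0.0, 0.0, 1.0, 0.5, 0.0])   # an assumed activity "bubble"
m2 = adapt(m, eta, x)
```

Only the two active units are rotated towards x, and the correction of the weakly active unit is smaller, so the update indeed grades with distance from the maximum response.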
Fig. 11. Diagram showing the fraction of the processing unit array occupied by pattern vectors of one half of the distribution, vs. their relative frequency (in per cent). Middle curve: optimal third-degree weighted least-squares fit. The other curves: standard deviation
Ordering within each subarea may thus occur independently and be only slightly affected at the demarcation zones.