Kohonen, T. and Hari, R. - Where the abstract feature maps of the brain might come from

Trends in Neurosciences

Where the abstract feature maps of the brain might come from

Teuvo Kohonen and Riitta Hari

Teuvo Kohonen is at the Neural Networks Research Centre, and Riitta Hari at the Brain Research Unit, Helsinki University of Technology, 02015 HUT, Espoo, Finland.

Three types of neuronal organization can be called 'brain maps': sets of feature-sensitive cells, ordered projections between neuronal layers and ordered maps of abstract features. The latter are most intriguing as they reflect the central properties of an organism's experiences and environment. It is proposed that such feature maps are learned in a process that involves parallel input to neurons in a brain area and adaptation of neurons in the neighborhood of the cells that respond most strongly to this input. This article presents a new mathematical formulation for such adaptation and relates it to physiological functions.

Trends Neurosci. (1999) 22, 135-139

ONE OF THE MOST INTRIGUING problems encountered in neurophysiology has been the elucidation of the origin of feature-sensitive neurons and, in particular, their spatial order, as observed in numerous parts of the CNS (early theoretical works on feature-sensitive cells are reported in Refs 1-4). The emergence of elementary spatial order in sets of simulated feature-sensitive cells was first demonstrated by von der Malsburg in 1973 (Ref. 5, see also Refs 6-9). On the basis of the approaches of von der Malsburg and others, Kohonen was able to define some of the most fundamental conditions of self-organization and to express them in the form of an effective algorithmic scheme. This algorithm has emerged as being very powerful in producing many types of, even abstract, ordered 'maps' based on various sets of input data. More recent work has looked for the physiological counterparts that are linked to this principle.

Three types of neuronal organization can be called 'brain maps'. The first is a multitude of single neurons or neuronal groups that respond to some distinct sensory inputs. However, what most people would refer to as maps are topographically ordered projections from a receptive surface to a brain area or between two neuronal layers. These brain maps can extend over several centimeters, such as, for example, the retinotopic map of the visual field or the somatotopic 'homunculus' in the sensorimotor cortex. A less well-known class of spatially ordered maps is that of more abstract features and must be distinguished from the others. Examples of such feature maps include the directional-hearing map in the owl midbrain and the target-range map in the bat auditory cortex. These maps are usually rather small (2-3 mm in diameter). As no receptive surface exists for such abstract features, the spatial order of representations must be produced by some self-organizing process, which occurs mainly postnatally. We propose that the various feature maps are more common in the brain than is thought. This article focuses on the abstract feature maps and argues that the process in which they are formed can be described by the 'self-organizing map' (SOM) algorithm.

In addition to serving as models for brain maps, the SOMs can be used as mathematical tools and they have practical applications that range from speech recognition, image analysis and industrial process control, to the automatic ordering of document libraries, the visualization of financial records and the categorization of the electric signals in the brain; Ref. 18 (http://www.icsi.berkeley.edu/~jagota/NCS/vol1.html) lists over 3000 examples based on the SOM algorithm.

Fig. 1. A self-organizing model set. An input message X is broadcast to a set of models M_i, of which M_c best matches X. All models that lie in the vicinity of M_c (larger circle) improve their matching with X. Note that M_c varies from one message to another.

The computed SOMs are very similar to many brain maps and they also behave dynamically in the same way; for example, their magnification is adjusted in proportion to the occurrences of the stimuli. Although the SOM principle is well known among brain theorists, many experimental neuroscientists seem to be ignorant of the specific requirements of this self-organizing process and the implications of the theoretical results for their work. This article, therefore, aims to provide a simple introduction to SOMs, which is useful for any scientist wondering whether a certain brain map could have emerged according to the SOM principle. The basic requirements for self-organization in an abstract set of elements and in a mathematical model will be described, and two SOM simulations will be presented, together with a final discussion of the possible physiological implementations of the SOM principle in real brain structures.

Requirements for self-organization

The self-organizing process can be realized in any set of elements, illustrated schematically in Fig. 1, where only a few basic operational conditions are assumed. For simplicity, let the elements (for example, single neurons or groups of closely cooperating neurons) form a regular planar array and let each element represent a set of numerical values, M_i (a 'model'). These values can correspond to some parameters of the neuronal system, and in contemporary computational neuroscience it is customary to relate them to synaptic efficacies.
It is also assumed that each model is modified by the messages the element receives. Let us assume that some mechanism exists by which an incoming message, X (a set of parallel signal values), can be compared with all models M_i. In brain theory it is customary to refer to 'competition' between the elements when they are stimulated by common input, and the element whose parameters are closest to this input is activated the most. This element is called the 'winner' if it succeeds, while remaining active itself, in suppressing the activity in the neighboring neurons by, for example, lateral inhibition. The winner model is denoted by M_c. Another requirement for self-organization is that the models will be modified only in the local vicinity of the winner(s) and that all the modified models will then resemble the prevailing message more accurately than before.

When the models in the neighborhood of the winner start simultaneously to resemble the prevailing message (X) more accurately, they also tend to become more similar mutually; that is, the differences between all the models in the neighborhood of M_c are smoothed. Different messages at different times affect separate parts of the set of models, and thus, after many learning steps, the models (M_i) start to acquire values that relate to each other smoothly over the whole array, in the same way as the original messages (X) in the 'signal space' do; in such a situation, maps related topologically to the sensory occurrences start to emerge. These three subprocesses - the broadcasting of the input, the selection of the winner and the adaptation of the models in the spatial neighborhood of the winner - seem to be sufficient, in general, to define a self-organization process that then results in the emergence of topographically organized maps.

Simulations with idealized networks

Self-organizing processes can be implemented using many mathematical rules.
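The three subprocesses can be sketched in a few lines of code. This is a minimal illustration, not the article's simulation: the one-dimensional array, the Gaussian neighborhood function and all constants are assumptions chosen for brevity.

```python
# Minimal sketch of the three SOM subprocesses on a 1-D array of models:
# broadcast the input, select the winner, adapt the winner's neighborhood.
# Array size, learning rate and neighborhood shape are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
n_models = 10
models = rng.uniform(0.0, 1.0, n_models)   # one scalar model M_i per element

for t in range(2000):
    x = rng.uniform(0.0, 1.0)               # 1) broadcast one input message
    c = int(np.argmin(np.abs(models - x)))  # 2) 'winner': best-matching model
    sigma = max(0.5, 3.0 * (1 - t / 2000))  # neighborhood radius shrinks over time
    d = np.arange(n_models) - c
    h = 0.2 * np.exp(-d**2 / (2 * sigma**2))  # neighborhood function h_ci
    models += h * (x - models)              # 3) neighbors move toward the message

# Each update is a convex combination (0 < h <= 0.2), so the models stay
# inside the input range while spreading out over it.
assert models.min() >= 0.0 and models.max() <= 1.0
```

Because neighboring elements share each adaptation step, their models end up varying smoothly along the array, which is the topographic ordering described above.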
The particular rules selected for this article constitute a compromise between the effectiveness of self-organization and biological realization. The following equations, which are different from the old SOM algorithms, have been introduced in order to relate the adaptation processes to physiological facts.

A simple representation of a message (X) is the data vector, x = (x_1, ..., x_n), that is, a list of numbers that represent the signal values. The model M_i is similarly represented by the 'model vector' m_i = (m_i1, m_i2, ..., m_in) of the same dimensionality as x. Let the m_i be normalized in length such that the sum of squares of the m_ij is the same for each vector. Then the abbreviated notation m_i · x for vectors m_i and x, that is, m_i1 x_1 + m_i2 x_2 + ... + m_in x_n, means the correlation of their elements, which is used as the measure of the degree of matching between x and m_i. The winner, or the best-fitting model (m_c), will then be defined by Eqn 1, which describes the comparison process. The winning model at a given time t is always the one that has the highest correlation with the input message:

    m_c(t) · x(t) = max_i {m_i(t) · x(t)}    (1)

A message x(t) at time t will modify the values of models m_i(t) to new values m_i(t+1) one time step later. The SOM algorithm applied in this article modifies the models in the neighborhood of the winner so that the correlation between them and the input message increases, while the normalized lengths of the m_i are preserved approximately.

Let the neighborhood of the winner be described by the neighborhood function h_ci, which reaches its maximum for the winner (when i = c). The value of h_ci decreases as the distance of neuron i from the winner c in the array of neurons increases; h_ci(t) can also change with time. In simulations, h_ci can be given different mathematical forms.
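The comparison process of Eqn 1 amounts to a dot-product search over normalized model vectors. A brief sketch, with illustrative vectors (the function name and values are not from the article):

```python
# Winner search by maximal correlation (dot product), as in Eqn 1;
# the model vectors are assumed to be normalized to equal length.
import numpy as np

def find_winner(models, x):
    """Return index c of the model with the highest correlation m_i . x."""
    return int(np.argmax(models @ x))

# Three unit-length model vectors (illustrative values).
models = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])
x = np.array([0.9, 0.1, 0.0])   # input message
c = find_winner(models, x)      # index 0: the first model correlates best
```

The normalization matters: without it, a long model vector could win every comparison regardless of how well its direction matches the input.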
Equation 2 shows how the modified values of the winning model and its neighboring models are assumed to depend on the messages they receive: the larger the difference between the message at time t and the (weighted) model m_i(t), the larger is the change towards the values of the message; however, owing to h_ci, the changes occur only in the vicinity of the winner, namely:

    m_i(t+1) = m_i(t) + h_ci(t) [x(t) − (m_i(t) · x(t)) m_i(t)]    (2)

Owing to the factor m_i(t) · x(t), this particular updating law keeps the lengths of the m_i at about a constant value.

This modeling approach also applies to the case where the signals x_j and the synaptic efficacies m_ij attain all-or-none values. In a large population of neurons their correlation (m_i · x) then represents the coincidence of active x_j values with the existing m_ij connections.

The SOM algorithms have been used successfully to simulate various brain-like feature maps, such as tonotopic maps, maps of the peripersonal space, color maps and many others. The following two computational examples illustrate what other types of maps may emerge in idealized neural networks. For details of these simulations see Refs 12,20,21.

Simulation 1: phonemotopic maps

To produce phonemotopic or phoneme maps, Kohonen analyzed continuous natural Finnish speech and formed the input vectors [x(t)] for Eqns 1 and 2 from the signal powers at 15 frequency bands sampled regularly (every 20 ms). The original simulations were based on slightly different algorithms, but the new experiment, illustrated in Fig. 2, has been carried out using the new Eqns 1 and 2. After a few tens of thousands of spectral samples had been presented to the network, the values m_i began to represent models of different phonemes in an orderly fashion.
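The approximate length preservation claimed for Eqn 2 can be checked numerically: the subtracted term removes the component of x along m_i, so the length of m_i changes only at second order in h_ci. A sketch with illustrative numbers:

```python
# One adaptation step of Eqn 2 for a single model in the winner's
# neighborhood; h is the value of the neighborhood function h_ci at this
# unit. All numerical values are illustrative.
import numpy as np

def som_step(m, x, h):
    """m(t+1) = m(t) + h * (x - (m . x) m); keeps |m| approximately 1."""
    return m + h * (x - (m @ x) * m)

m = np.array([1.0, 0.0])                    # unit-length model vector
x = np.array([np.sqrt(0.5), np.sqrt(0.5)])  # unit-length input message
m_new = som_step(m, x, 0.1)

# The term -(m . x) m cancels the growth along m, so the length drifts
# only at order h^2, while the correlation with the message increases.
assert abs(np.linalg.norm(m_new) - 1.0) < 0.01
assert m_new @ x > m @ x
```

In longer simulations the small second-order drift is usually absorbed by occasional explicit renormalization of the model vectors.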
The circles depict neuronal units that form a regular grid and the symbols denote the phoneme (or, more accurately, the short-term phonemic spectrum) to which each unit is best tuned.

Fig. 2. A phonemotopic map. A map of Finnish phonemes was self-organized on the basis of short-time spectra of speech; # means /k/, /p/ or /t/ taken as one broad phonemic class. [The labeled grid of units and the pronunciation guide of the original figure are not reproduced here.]

So far, pure phoneme maps have not been found in the human brain. However, indirect evidence suggests that such ordered representations exist. For example, phonemic representations of human children are modified strongly during the first two years by the native language, which would agree with the result obtained from the simulation that the map is organized according to the sensory input it receives. Moreover, magnetoencephalographic evoked-response studies indicate that the auditory cortex reacts differentially to stimuli exceeding the vowel-category boundaries and that these boundaries depend on the language exposure of the subject.

Simulation 2: word-category maps

The purpose of this example, adapted from Ref. 20, is to demonstrate that the structure of a language on a low level, already in very narrow word contexts, reflects the semantic content of the words. Therefore, it becomes possible for even a simple artificial system, such as the SOM, to learn to identify semantic aspects of words from texts in an unsupervised way.

The words in written text are symbolic items that were converted into numerical input vectors x in the following way: each word in the text was replaced by a random vector (r_i) that consisted of seven random-number components; i is the word position in the text. The random vectors were different, but unique for each word in the vocabulary. The purpose of this randomness is to eliminate the effect of the word appearance on the results. Later simulations, with very large vocabularies, have used even 90-component vectors. To take into account the co-occurrences of neighboring words, the input to the SOM consisted of a pair of code vectors of adjacent words of the type:

    x = (r_{i-1}, r_i)    (3)

The SOM was then computed using the x vectors picked up from the whole text. When the model-vector values became stationary, the SOM models were labeled by those words whose x vectors best resembled the respective computed models. In this way the 'semantic map' of Fig. 3 was formed by using three-word phrases constructed of 30 words. One should note the automatic segregation of the words into word classes and the fine structure that distinguishes, for example, the animate objects from the inanimate ones. Later, Honkela et al. constructed similar but more detailed maps for large collections of natural text, the Grimm brothers' fairytales (English translation), by using triplets of words in the context.

Fig. 3. A semantic map. Self-organization of words in the SOM on the basis of their co-occurrence with previous words in the text. The round symbols of neuronal elements have been left out and the symbols are written into the best-matching locations (rectangular grid). The curves that separate the word classes have been drawn manually, but similar classes can also be found by automated clustering analysis. Adapted, with permission, from Ref. 20. [The word grid of the original figure is not reproduced here.]

This same principle has also materialized in large document maps, in which a second-level SOM organizes documents on the basis of their word-category histograms.
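The context encoding of Eqn 3 is straightforward to sketch: assign each vocabulary word a fixed random code vector and concatenate the codes of adjacent words. The tiny vocabulary, function name and seed below are illustrative assumptions, not the article's data:

```python
# Context encoding for the word-category map: each vocabulary word gets a
# fixed 7-component random code vector, and the SOM input for a word is
# the concatenated pair (r_previous, r_current) of Eqn 3.
import numpy as np

rng = np.random.default_rng(42)
vocab = ["Mary", "buys", "meat", "dog", "runs"]   # illustrative vocabulary
codes = {w: rng.standard_normal(7) for w in vocab}  # unique, fixed r per word

def context_vector(prev_word, word):
    """x = (r_{i-1}, r_i): the 14-component input vector for the SOM."""
    return np.concatenate([codes[prev_word], codes[word]])

x = context_vector("Mary", "buys")  # 14-component vector for 'buys' after 'Mary'
```

Because the codes are random, two words can end up near each other on the map only through the contexts they share, which is exactly the point of the experiment.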
The largest SOM so far realized has consisted of over 100 000 models ('neurons') and organized over one million documents.

Conclusions drawn from category-specific recognition deficits in patients and from recent functional imaging studies suggest that the brain contains selective and local representations of concrete and abstract concepts and categories of objects (animals, tools, etc.), and even contains distinct representations for animate versus inanimate objects. The word-category maps imply that the context of entities can be the driving force for the organization of their representations. This context need not be stated verbally as above, but can be formed directly from sensory experiences.

Implementation of SOMs in the real brain

It is, of course, unrealistic to expect that maps as clearly organized as those observed in various idealized simulation studies would emerge in the cerebral cortex. Preprocessing in neural realms is already much more complex than that used in simulations. However, the mammalian brain is known to support several feature maps, either orderly representations of the receptor surfaces on skin, retina or cochlea, or purely abstract maps, which are discussed in this article. The acoustic maps are assumed to be associated with sounds to which the organism is most frequently exposed. Tonotopic maps, which are known to exist both in auditory pathways and in several cortical auditory fields, are often thought to result from ordered projections that originate from the basilar membrane, but they have also been produced successfully by the SOM algorithm directly.

Essentially every level of the nervous system exhibits plasticity under certain circumstances and, thus, feature maps can also be expected at all levels. Furthermore, even the 'hard-wired' maps are known to depend on the sensory experience and they evidently are, at least to some extent, the result of self-organization.
Materialization of the SOM principle in the nervous system would first require a mechanism that distributes essentially the same, or strongly correlated, information to the neurons or neuron groups of a certain region. Spatial diameters of the maps are then determined by the longest distances reachable by common (or highly correlated) input in the neuronal layer. The SOM principle might work well in the thalamocortical system, where the diameter of the map would be determined primarily by the size of the thalamocortical axonal arbors, which reach 1-2 mm in length (maximally), and by the ramifications of the apical dendritic trees of the receiving cortical cells, which have diameters of a few millimeters. Thus, a typical cortical map that describes a single type of feature would be (maximally) 5 mm wide. Second, one needs a mechanism that 'selects' the 'winner' neurons, that is, the center around which adaptation will take place. Lateral interconnectivity with excitatory and inhibitory connections could play a central role in this selection, which results in enhanced discharge rates at a place where the original excitation was high, and suppression of activity elsewhere. Third, the restriction of learning to the neighborhood of the winner(s) might simply follow the clustering of triggering activity, but one can also assume that some type of chemical 'learning factor', which controls modification locally but does not activate the neurons, emanates from the active neurons. This idea comes close to the concept of metaplasticity, that is, active control of synaptic plasticity.

Hebbian learning versus SOM

As explained above, the formation of the SOM requires modification of synaptic strengths in the winners' neighborhood.
In computational neuroscience such changes are usually assumed to obey the law of Hebb, which suggests that the synaptic efficacy increases in proportion to the conjunction of presynaptic and postsynaptic activities of the neurons. Kohonen modified Hebb's law because chemical retrograde messengers, such as nitric oxide, have been discovered that make it possible for a neuron to modify not only its own synapses but also those of nearby neurons.

Consider neuron i that has output activity y_i and presynaptic activity at a synapse j equal to x_j. Hebb's law for the change of the efficacy m_ij of synapse j is traditionally written as:

    m_ij(t+1) = m_ij(t) + a y_i(t) x_j(t)    (4)

where a is a proportionality factor.

In theoretical examinations, the models of neuronal networks are often simplified by assuming that one processing unit represents, for example, one pyramidal neuron. However, for the physiological correlates of the SOM maps it is more expedient to select as the processing unit a neuronal group consisting of, say, a few hundred neurons of different types. The dense intracortical interconnections spread the activity to nearby neurons, and the learning rate in Hebb's law can be regarded as being proportional to the mean activity level of the whole neuronal group.

Although some general evidence for the Hebb-type synaptic modifiability exists, the inaccuracy of available measurements does not justify the use of a more detailed mathematical law. The following expression must therefore only be regarded as a hypothesis of what effects might account for the adaptation law expressed by Eqn 2. First, as y_i and x_j are non-negative, m_ij would, according to Eqn 4, grow without limits. This is unrealistic, and a necessary modification is, therefore, 'active forgetting': the increase of the synaptic efficacy by the presynaptic activation (x_j) must be opposed by some sort of active corruption of the efficacy of the synapse.
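The unbounded growth implied by Eqn 4 is easy to verify numerically; the constants below are arbitrary illustrative values:

```python
# Plain Hebbian update of Eqn 4: m(t+1) = m(t) + a * y(t) * x(t).
# With non-negative activities the efficacy can only grow, which
# illustrates why a forgetting term is needed.
a = 0.5          # proportionality (learning-rate) factor, illustrative
m = 1.0          # synaptic efficacy m_ij
y, x = 1.0, 1.0  # constant non-negative post- and presynaptic activities

for _ in range(100):
    m = m + a * y * x   # each step adds a*y*x = 0.5; nothing ever subtracts

# After 100 identical steps the efficacy has grown by 100 * 0.5.
assert m == 51.0
```

Any bound on m must therefore come from an extra term that actively decreases the efficacy, which is what the 'active forgetting' hypothesis supplies.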
If any active neuronal group, denoted by the subscript c, controls the modifiability or metaplasticity of synapses in its neighborhood (i) by any means, a y_i can be replaced in Eqn 4 by the 'neighborhood function' h_ci. Equation 4 can then be replaced with Eqn 5, provided that (1) the net depolarization of the cell, owing to all presynaptic activations, is proportional to the correlation of the inputs with the synaptic efficacies, that is, to m_i · x; (2) the corruption of m_ij is proportional to the net depolarization and further to the value of m_ij itself; and (3) the neighborhood function is taken into account:

    m_ij(t+1) = m_ij(t) + h_ci(t) x_j(t) − β [m_i(t) · x(t)] m_ij(t)    (5)

where β is another proportionality factor. If the 'active' corruption also needs the control action from the neighborhood, then β = h_ci, and Eqn 5, written in vector form, becomes exactly the same as Eqn 2. These hypotheses, therefore, lead to the formulation of the SOM process described above and are thus consistent with the basic functional principle of the SOM. This particular equation (Eqn 5) is new in computational neuroscience.

Predictions from the simulated feature maps

Theoretically, some sort of competition on common input between spatially neighboring cells is always necessary for the emergence of feature-specific cells. Similarly, the condition for the production of ordered feature maps is that the 'winner' cells force their spatial neighbors to learn the same input. Such conditions might be ubiquitous in the CNS.

Magnification

In the simulated feature maps, the area occupied by a feature value is proportional to some power of the probability of occurrence of the feature. Theoretically, the exponent can assume values between one third and one. The simulated phoneme map of Fig. 2 illustrates this 'magnification' phenomenon: for example, the most common Finnish phoneme /a/ is mapped onto 18 of the total of 96 'neurons'.
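The claimed equivalence of Eqn 5 (with β = h_ci) and Eqn 2 can be checked component by component; the vectors and the value of h below are illustrative:

```python
# Numerical check that Eqn 5 with beta = h_ci coincides, component by
# component, with the vector-form update of Eqn 2 (illustrative values).
import numpy as np

h = 0.1                   # neighborhood function value h_ci at this unit
m = np.array([0.8, 0.6])  # model (synaptic-efficacy) vector m_i
x = np.array([0.6, 0.8])  # input message

# Eqn 5 per synapse j, with beta = h: m_ij + h*x_j - h*(m_i . x)*m_ij
eqn5 = m + h * x - h * (m @ x) * m
# Eqn 2 in vector form: m_i + h*(x - (m_i . x)*m_i)
eqn2 = m + h * (x - (m @ x) * m)

assert np.allclose(eqn5, eqn2)  # identical updates, as stated in the text
```

The check is just distributivity: pulling h out of both terms of Eqn 5 yields the bracketed expression of Eqn 2.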
Cortical sensory representations are also dynamic and have a capacity for reorganization after lesions or altered experience.

Imperfect maps

The simulated SOMs sometimes contain typical unintended features. For example, if self-organization starts from different random initial states, the orientation of the resulting maps can vary or form a mirror image. The 'initial values' of brain maps are most probably determined genetically, but reverse ordering is still possible. If the self-organization process is not perfect, artificial networks can be fragmented into two or several parts with opposite order.

Overlapping maps

The neurons of the model network might be organized to receive afferents from different modalities. If only one modality is used at a time, several independent maps can be superimposed on the same network. Similarly, if, for example, two types of input are used simultaneously, a common map is formed, with representations for both modalities. The map is oriented according to the strongest features in either input and the representation is redundant: cells respond similarly to inputs from either modality.

The abstract maps might be more common in the brain than is thought. One difficulty encountered in finding them, especially in humans by means of non-invasive functional brain-imaging tools, is their very small extent. Therefore, simulated SOMs and real brain maps have not been compared frequently. However, their similarities suggest that the same simple principle might underlie, at least in an abstract form, the emergence of feature maps in the living brain at scales of less than a few millimeters. It is not yet known whether larger-scale maps could be formed according to the same principle or perhaps by a multi-level organization.

Much work, therefore, remains to be undertaken in order to find new feature maps and to understand the detailed mechanisms that underlie reorganization of cortical representations.
Experimentalists who discover new feature-sensitive neurons are urged to probe the surrounding area in detail to find related detectors. It is hoped that predictions based on SOM simulations will stimulate neurophysiological research to explore further the neuronal implementation of the models, the synaptic plasticity laws, the spatial scales of the maps and, in general, the factors that control map formation.

Acknowledgements
The authors' research is supported by the Academy of Finland. The authors thank their colleagues, especially Academician Olli V. Lounasmaa, for valuable comments on the manuscript.

Selected references
1 Nass, M.M. and Cooper, L.N. (1975) Biol. Cybern. 19, 1-18
2 Perez, R., Glass, L. and Shlaer, R. (1975) J. Math. Biol. 1, 275-288
3 Didday, R.L. (1976) Math. Biosci. 30, 169-180
4 Grossberg, S. (1976) Biol. Cybern. 23, 187-202
5 von der Malsburg, C. (1973) Kybernetik 14, 85-100
6 Amari, S. and Arbib, M.A. (1977) in Systems Neuroscience (Metzler, J., ed.), pp. 119-165, Academic Press
7 Swindale, N.V. (1980) Proc. R. Soc. London Ser. B 208, 243-264
8 von der Malsburg, C. and Willshaw, D.J. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 5176-5178
9 Takeuchi, A. and Amari, S. (1979) Biol. Cybern. 35, 63-72
10 Kohonen, T. (1982) Biol. Cybern. 43, 59-69
11 Kohonen, T. (1984) Self-Organization and Associative Memory, Springer-Verlag
12 Kohonen, T. (1995) Self-Organizing Maps, Springer-Verlag
13 Kohonen, T. (1993) Neural Networks 6, 895-905
14 Kaski, S. and Kohonen, T. (1994) Neural Networks 7, 973-984
15 Knudsen, E.I. and Konishi, M. (1978) Science 200, 795-797
16 Suga, N. and O'Neill, W.E. (1979) Science 206, 351-353
17 Kohonen, T. et al. (1996) Proc. Inst. Electr. Electron. Eng. 84, 1358-1384
18 Kaski, S., Kangas, J. and Kohonen, T. (1998) Neural Comput. Surv. 1, 102-350
19 Kohonen, T. (1985) Perception 14, 711-719
20 Ritter, H. and Kohonen, T. (1989) Biol. Cybern. 61, 241-254
21 Honkela, T., Pulkki, V. and Kohonen, T. (1995) in Proceedings of the International Conference on Artificial Neural Networks (Vol. 2), pp. 3-7
22 Kuhl, P.K. (1994) Curr. Opin. Neurobiol. 4, 812-822
23 Lounasmaa, O.V. et al. (1996) Proc. Natl. Acad. Sci. U. S. A. 93, 8809-8815
24 Näätänen, R. et al. (1997) Nature 385, 432-434
25 Damasio, A.R. (1990) Trends Neurosci. 13, 95-98
26 Martin, A. et al. (1996) Nature 379, 649-652
27 Perani, D. et al. (1995) NeuroReport 6, 1637-1641
28 Tunturi, A.R. (1952) Am. J. Physiol. 168, 712-727
29 Merzenich, M.M., Knight, P.L. and Roth, G.L. (1975) J. Neurophysiol. 38, 231-249
30 Buonomano, D.V. and Merzenich, M.M. (1998) Annu. Rev. Neurosci. 21, 149-186
31 Jones, E.G. and Peters, A., eds (1984) Cerebral Cortex (Vol. 2), Plenum Press
32 Gilbert, C.D. (1992) Neuron 9, 1-13
33 Abraham, W.C. and Bear, M.F. (1996) Trends Neurosci. 19, 126-130
34 Hebb, D.O. (1949) The Organization of Behavior, John Wiley & Sons
35 Montague, P.R., Gally, J.A. and Edelman, G.M. (1991) Cereb. Cortex 1, 199-220
36 Pape, H.C. (1995) Semin. Neurosci. 7, 329-340
37 Hölscher, C. (1997) Trends Neurosci. 20, 298-303
38 Szentágothai, J. (1975) Brain Res. 95, 475-496
39 Rockel, A.J., Hiorns, R.W. and Powell, T.P.S. (1980) Brain 103, 221-244
40 Levy, W.B. and Desmond, N.L. (1985) in Synaptic Modification, Neuron Selectivity, and Nervous System Organization (Levy, W.B., Anderson, J.A. and Lehmkuhle, S., eds), pp. 105-121, Lawrence Erlbaum
41 Rauschecker, J.P. (1991) Physiol. Rev. 71, 587-615
42 Kaas, J.H. (1991) Annu. Rev. Neurosci. 14, 137-167
43 Elbert, T. et al. (1995) Science 270, 305-307
44 Frackowiak, R.S.J. et al., eds (1997) Human Brain Function, Academic Press
