Professional Documents
Culture Documents
Publication 2
Publication 2
Publication 2
Jarkko Venna and Samuel Kaski. Neighborhood preservation in non-
linear projection methods: An experimental study. In Georg Dorffner,
Horst Bischof, and Kurt Hornik, editors, Artificial Neural Networks -
ICANN 2001, Vienna, Austria, August 21–25, pp. 485–491. Springer,
Berlin, 2001.
c 2001 Springer. With kind permission of Springer Science and Busi-
ness Media.
Neighborhood preservation in nonlinear
1 Introdu
tion
Nonlinear proje
tion methods map a set of multivariate data samples into a
lower-dimensional spa
e, usually two-dimensional, in whi
h the samples
an be
visualized. Su
h visualizations are useful espe
ially in exploratory analyses: They
provide overviews of the similarity relationships in high-dimensional data sets
that would be hard to a
quire without the visualizations.
The methods dier in what properties of the data set they try to preserve.
The simplest methods, su
h as the prin
ipal
omponent analysis (PCA) [3℄, are
based on linear proje
tion. A more
omplex set of traditional methods, that are
based on multidimensional s
aling (MDS) [10℄, try to preserve the pairwise dis-
tan
es of the data samples as well as possible. That is, the pairwise distan
es
after the proje
tion approximate the original distan
es. In a variant of nonlinear
MDS, nonmetri
MDS [8℄, only the rank order of the distan
es is to be pre-
served. Another variant, Sammon mapping [9℄, emphasizes the preservation of
lo
al (short) distan
es relative to the larger ones.
The Self-Organizing Map (SOM)[6, 7℄ is a dierent kind of a method that ts
a dis
retized nonlinear surfa
e to the distribution of the input data. The data
samples are proje
ted to the map display by nding the
losest lo
ation on the
dis
retized surfa
e. In the end of the tting pro
edure ea
h dis
rete grid point
on the surfa
e is lo
ated in the
entroid of all data proje
ted onto it and its
neighbors on the surfa
e. The pro
edure impli
itly denes a mapping from the
input spa
e onto the map grid that aims at keeping
lose-by grid points
lose-by
in the original data spa
e.
There exist several related algorithms and variants. In this study we will only
onsider the generative topographi
mapping (GTM) [1℄, a probabilisti
variant
of the SOM. The GTM is a generative model of the probability distribution of
the data, dened by postulating a latent spa
e, a mapping from the latent spa
e
to the input spa
e, and a noise model. The latent spa
e
orresponds to the SOM
grid, and samples
an be proje
ted to it by nding their maximum a posterior
lo
ations.
In literature the methods have been
ompared using several measures. Sin
e
the methods are obviously good for dierent purposes it is not sensible to sear
h
for the overall best method but to use the \goodness measures" to
hara
terize
in whi
h kinds of tasks ea
h method is good at.
Most of the methods try to preserve the pairwise distan
es of the data samples
[2℄, even though the emphasis of the methods varies. The SOM does not belong
to this group, however. Several measures have been used to
ompare the SOM
to other nonlinear proje
tion methods and to measure the properties of the
SOM, but a
ording to our knowledge one key property has not been measured
thus far: Whether the data points mapped
lose-by on the displays are generally
lose-by in the input spa
e as well. The goal of this study is to test empiri
ally
whi
h methods are best with respe
t to this key property of trustworthiness of
the neighborhood relationships in the resulting displays.
2 Neighborhood Preservation
Preservation of neighborhoods or proximity relationships has often been men-
tioned as the key property of SOMs, and several dierent kinds of more or less
heuristi
measures of neighborhood preservation have been proposed. In this pa-
per the main purpose is not to
ompare these methods or to develop yet another
method. Our goal is to point out that, a
ording to our knowledge, a key aspe
t
related to neighborhood preservation in nonlinear proje
tion methods has not
been measured so far, and to
ompare the methods with respe
t to that aspe
t
that we
all the trustworthiness of the neighborhoods in the visualization.
For dis
rete data we dene neighborhoods of data ve
tors as sets
onsisting
of the k
losest data ve
tors, for relatively small k . Topologi
al
on
epts have
been dened in terms of \arbitrarily small" neighborhoods but for dis
rete data
we have to resort to nite neighborhoods. When the data are proje
ted, the
neighborhood is preserved if the set of the k nearest neighbors does not
hange.
Two kinds of errors are possible. Either new data may enter the neighborhood
of a data ve
tor in the proje
tion, or some of the data ve
tors originally within
the neighborhood may be proje
ted further away on the 2D graphi
al display.
The latter kinds of errors result from dis
ontinuities in the mapping, and they
have been measured extensively when quantifying the neighborhood preservation
of SOMs (see e.g. [5℄). As a result of dis
ontinuities not all of the proximities in
the original data are visible after the proje
tions.
We argue that the former kinds of errors are even more harmful sin
e they
redu
e the trustworthiness of the proximities or neighborhood relationships that
are visible on the display after the proje
tion: Some data points that seem to be
lose to ea
h other may a
tually be quite dissimilar. A
ording to our knowledge
this property of the proje
tion methods has not been measured previously; for
SOMs the a
ura
y has been measured as \quantization errors" in the input
spa
e, but not in the output spa
e.
If the data manifold is higher-dimensional than the display, then both kinds
of errors
annot be avoided and all proje
tion methods must make a tradeo. In
this paper we will
on
entrate on measuring the new property of trustworthiness.
The dis
ontinuities in the mapping will be measured by the preservation of
neighborhoods to show the tradeos.
In prin
iple, the errors
ould be measured simply as the average number of
data items that enter or leave the neighborhoods in the proje
tion. We have
used slightly more informative measures: Trustworthiness of the neighborhoods
is quantied by measuring how far from the original neighborhood the new
data points entering a neighborhood
ome. The distan
es are measured as rank
orders; similar results have been obtained with Eu
lidean distan
es as well (un-
published). Using the notation in Table 1 the measure for trustworthiness of the
proje
ted result is dened as
M1 (k ) = 1
2 XN X (r(xi ; xj ) k) ; (1)
Nk (2N 3k 1) i=1 x 2U (x )
j k i
where the term before the summation s
ales the values of the measure between
zero and one1 .
Preservation of the original neighborhoods is measured by
M2 (k ) = 1
2 XN X (^r (xi ; xj ) k) : (2)
Nk (2N 3k 1) i=1 x 2V (x )
j k i
3 Test Setting
Three datasets were used: UCI2 9-dimensional Glass identi
ation database
(Glass) having 214 data ve
tors; UCI 34-dimensional ionosphere database (Iono-
sphere) having 350 data ve
tors; and 20-dimensional Phoneti
3 dataset having
1962 data ve
tors.
The resolution and
exibility of SOM and GTM
an be freely sele
ted, and
need be xed for the
omparison. The number of grid points on the map or latent
1
For
larity we have only in
luded the s
aling for neighborhoods of size k < N=2
2
http://www.i
s.u
i.edu/mlearn/MLRepository.html
3
In
luded in LVQ PAK, available at http://www.
is.hut./resear
h/software.shtml
Table 1. Symbols used in dening the error measures
xi 2 Rn ; i = 1; : : : ; N data ve
tor
Ck (xi ) the set of those k data ve
tors that are
losest to xi in the original data spa
e
C^k (xi ) the set of those k data ve
tors that are
losest to xi after proje
tion
Uk (xi ) the set of data ve
tors xj for whi
h xj 2 C^k (xi ) ^ xj 62 Ck (xi ) holds
Vk (xi ) the set of data ve
tors xj for whi
h xj 62 C^k (xi ) ^ xj 2 Ck (xi ) holds
r(xi ; xj ); i 6= j the rank of xj when the data ve
tors are ordered based on their Eu-
lidean distan
e from the data ve
tor xi in the original data spa
e
r^(xi ; xj ); i 6= j the rank of xj when the data ve
tors are ordered based on their distan
e
from the data ve
tor xi after proje
tion
spa
e, whi
h governs the resolution of the proje
tion, was set about equal to the
number of data points. The
exibility or stiness of the SOM is determined by
the nal radius of the neighborhood fun
tion that governs adaptation during
the learning pro
ess. The radius was sele
ted a
ording to the goodness measure
given in [4℄.4 We used the neighborhood fun
tion h(d) = 1 d2 = 2 , for d <=
and h(d) = 0 otherwise. The argument d refers to the distan
e on the map grid,
the unit being the distan
e between neighboring map nodes.
The
exibility of the GTM is governed by the number and width of the basis
fun
tions that map the latent grid to the input spa
e. For ea
h data set the
values that gave the best log-likelihood on a validation set were
hosen.
Note that sin
e the map grids of SOM and GTM are dis
rete, several data
points may be proje
ted onto the same points and hen
e there will be ties in
the rank ordering of the distan
es between the proje
ted data points. To make
these methods
omparable to the rest, we assume that in the
ase of ties all rank
orders are equally likely, and
ompute averages of the error measures.
4 Results
Trustworthiness of the visualized neighborhood relationships. When interpret-
ing the visualizations it is parti
ularly important that the small neighborhoods,
having a small value of k , are trustworthy. They are the most salient relation-
ships between the data items. In Fig. 1a the trustworthiness of the proje
ted
proximities is measured for the three data sets. The SOM and GTM preserve
proximities (at small values of k up to 5-10% of the size of the datasets tested)
better than the other methods, whereas the MDS-based methods are mostly
better at preserving the global ordering (large values of k ). NMDS is the best of
the traditional methods, being
onsistently better than Sammon mapping and
PCA. The relative order of the SOM and the GTM diers on dierent datasets;
4
Be
ause the measure works better if there is more data per grid point, the number
of grid points on the map was set to about 1=10 of the number data points (but
to at least 50) for the sele
tion of the stiness. The results were s
aled ba
k to the
larger map.
: Trustworthiness of the
a b: Preservation of the
neighborhoods original neighborhoods
1 1
Ionosphere dataset
0.95 0.95
M1 0.9
M2 0.9
0.85 0.85
PCA PCA
0.8 Sammon 0.8 Sammon
NMDS NMDS
SOM SOM
GTM GTM
0.75 0.75
10 20 30 40 50 60 10 20 30 40 50 60
k k
1 1
Phoneti
dataset
0.95 0.95
M1 0.9 M2 0.9
0.85 0.85
PCA PCA
0.8 Sammon 0.8 Sammon
NMDS NMDS
SOM SOM
GTM GTM
0.75 0.75
50 100 150 200 250 300 50 100 150 200 250 300
k k
Fig. 1. Trustworthiness of the neighborhoods after proje
tion (a) and preservation of
the original neighborhoods (b) for two data sets as a fun
tion of the neighborhood size
k. The results on the third data set (Glass) were similar.
the explanation is probably related to the dierent methods used in sele
ting
their stiness (
f. the dis
ussion about stiness and trustworthiness below).
: Trustworthiness of the
a b: Preservation of the
visualization original neighborhoods
1 1
Phoneti
dataset
0.95 0.95
M1 0.9 M2 0.9
0.85 0.85
NMDS NMDS
GTM GTM
0.8 0.8
SOM σ=1.1 SOM σ=1.1
SOM σ=10 SOM σ=10
SOM σ=20 SOM σ=20
0.75 0.75
50 100 150 200 250 300 50 100 150 200 250 300
k k
Fig. 2. The ee
t of the stiness of the SOM on trustworthiness of the visualized
neighborhoods (a) and preservation of the original neighborhoods (b), as a fun
tion
of the neighborhood size k. The stiness was varied by varying the nal neighborhood
radius on a SOM of a xed size. The stiness of the GTM shown for
omparison was
sele
ted by maximum likelihood on a validation set. Data set: Phoneti
.
5 Con
lusions
The experimental results showed that the neighborhood relationships on the
SOM and GTM displays are more trustworthy than on MDS-based displays.
That is, if two data samples are
lose to ea
h other on the visualizations, they
are more likely to be
lose to ea
h other in the original high-dimensional spa
e
as well. Moreover, the
apa
ity of the SOM in preserving the original neighbor-
hoods, the other aspe
t of neighborhood preservation, seems to be
omparable
to the other methods.
The trustworthiness of the lo
al neighborhoods
an still be improved by in-
reasing the
exibility of the mapping, by redu
ing the radius of the neigh-
borhood fun
tion of the SOM. The
ost is that larger neighborhoods (longer
distan
es) are not preserved as well.
Referen
es
[1℄ C. M. Bishop, M. Svensen, and C. K. I. Williams. GTM: The generative topographi
mapping. Neural Computation, 10:215{234, 1998.
[2℄ G. J. Goodhill and T. J. Sejnowski. A unifying obje
tive fun
tion for topographi
mappings. Neural Computation, 9:1291{1303, 1997.
[3℄ H. Hotelling. Analysis of a
omplex of statisti
al variables into prin
ipal
ompo-
nents. Journal of Edu
ational Psy
hology, 24:417{441,498{520, 1933.
[4℄ S. Kaski and K. Lagus. Comparing self-organizing maps. In C. von der Mals-
burg, W. von Seelen, J. C. Vorbruggen, and B. Sendho, editors, Pro
eedings of
ICANN'96, International Conferen
e on Neural Networks, pages 809{814, Berlin,
1997. Springer.
[5℄ K. Kiviluoto. Topology preservation in self-organizing maps. Pro
eedings of IEEE
International Conferen
e on Neural Networks., volume 1, pages 294{299, 1996.
[6℄ T. Kohonen. Self-organized formation of topologi
ally
orre
t feature maps. Bio-
logi
al Cyberneti
s, 43:59{69, 1982.
[7℄ T. Kohonen. Self-Organizing Maps. Springer-Verlag, Berlin, 1995 (third, extended
edition 2001).
[8℄ J. B. Kruskal. Multidimensional s
aling by optimizing goodness of t to a nonmetri
hypothesis. Psy
hometri
a, 29(1):1{26, Mar 1964.
[9℄ J. W. Sammon, Jr. A nonlinear mapping for data stru
ture analysis. IEEE Trans-
a
tions on Computers, C-18(5):401{409, May 1969.
[10℄ W. S. Torgerson. Multidimensional s
aling I|theory and methods. Psy
homet-
ri
a, 17:401{419, 1952.