Download as pdf or txt
Download as pdf or txt
You are on page 1of 8

2

Publication 2
Jarkko Venna and Samuel Kaski. Neighborhood preservation in non-
linear projection methods: An experimental study. In Georg Dorffner,
Horst Bischof, and Kurt Hornik, editors, Artificial Neural Networks -
ICANN 2001, Vienna, Austria, August 21–25, pp. 485–491. Springer,
Berlin, 2001.
c 2001 Springer. With kind permission of Springer Science and Busi-
ness Media.
Neighborhood preservation in nonlinear

proje tion methods: An experimental study

Jarkko Venna and Samuel Kaski


Helsinki University of Te hnology
Neural Networks Resear h Centre
P.O. Box 5400, FIN-02015 HUT, Finland
fjarkko.venna,samuel.kaskighut.fi

Abstra t. Several measures have been proposed for omparing non-


linear proje tion methods but so far no omparisons have taken into
a ount one of their most important properties, the trustworthiness of
the resulting neighborhood or proximity relationships. One of the main
uses of nonlinear mapping methods is to visualize multivariate data, and
in su h visualizations it is ru ial that the visualized proximities an be
trusted upon: If two data samples are lose to ea h other on the display
they should be lose-by in the original spa e as well. A lo al measure
of trustworthiness is proposed and it is shown for three data sets that
neighborhood relationships visualized by the Self-Organizing Map and
its variant, the Generative Topographi Mapping, are more trustworthy
than visualizations produ ed by traditional multidimensional s aling-
based nonlinear proje tion methods.

1 Introdu tion
Nonlinear proje tion methods map a set of multivariate data samples into a
lower-dimensional spa e, usually two-dimensional, in whi h the samples an be
visualized. Su h visualizations are useful espe ially in exploratory analyses: They
provide overviews of the similarity relationships in high-dimensional data sets
that would be hard to a quire without the visualizations.
The methods di er in what properties of the data set they try to preserve.
The simplest methods, su h as the prin ipal omponent analysis (PCA) [3℄, are
based on linear proje tion. A more omplex set of traditional methods, that are
based on multidimensional s aling (MDS) [10℄, try to preserve the pairwise dis-
tan es of the data samples as well as possible. That is, the pairwise distan es
after the proje tion approximate the original distan es. In a variant of nonlinear
MDS, nonmetri MDS [8℄, only the rank order of the distan es is to be pre-
served. Another variant, Sammon mapping [9℄, emphasizes the preservation of
lo al (short) distan es relative to the larger ones.
The Self-Organizing Map (SOM)[6, 7℄ is a di erent kind of a method that ts
a dis retized nonlinear surfa e to the distribution of the input data. The data
samples are proje ted to the map display by nding the losest lo ation on the
dis retized surfa e. In the end of the tting pro edure ea h dis rete grid point
on the surfa e is lo ated in the entroid of all data proje ted onto it and its
neighbors on the surfa e. The pro edure impli itly de nes a mapping from the
input spa e onto the map grid that aims at keeping lose-by grid points lose-by
in the original data spa e.
There exist several related algorithms and variants. In this study we will only
onsider the generative topographi mapping (GTM) [1℄, a probabilisti variant
of the SOM. The GTM is a generative model of the probability distribution of
the data, de ned by postulating a latent spa e, a mapping from the latent spa e
to the input spa e, and a noise model. The latent spa e orresponds to the SOM
grid, and samples an be proje ted to it by nding their maximum a posterior
lo ations.
In literature the methods have been ompared using several measures. Sin e
the methods are obviously good for di erent purposes it is not sensible to sear h
for the overall best method but to use the \goodness measures" to hara terize
in whi h kinds of tasks ea h method is good at.
Most of the methods try to preserve the pairwise distan es of the data samples
[2℄, even though the emphasis of the methods varies. The SOM does not belong
to this group, however. Several measures have been used to ompare the SOM
to other nonlinear proje tion methods and to measure the properties of the
SOM, but a ording to our knowledge one key property has not been measured
thus far: Whether the data points mapped lose-by on the displays are generally
lose-by in the input spa e as well. The goal of this study is to test empiri ally
whi h methods are best with respe t to this key property of trustworthiness of
the neighborhood relationships in the resulting displays.

2 Neighborhood Preservation
Preservation of neighborhoods or proximity relationships has often been men-
tioned as the key property of SOMs, and several di erent kinds of more or less
heuristi measures of neighborhood preservation have been proposed. In this pa-
per the main purpose is not to ompare these methods or to develop yet another
method. Our goal is to point out that, a ording to our knowledge, a key aspe t
related to neighborhood preservation in nonlinear proje tion methods has not
been measured so far, and to ompare the methods with respe t to that aspe t
that we all the trustworthiness of the neighborhoods in the visualization.
For dis rete data we de ne neighborhoods of data ve tors as sets onsisting
of the k losest data ve tors, for relatively small k . Topologi al on epts have
been de ned in terms of \arbitrarily small" neighborhoods but for dis rete data
we have to resort to nite neighborhoods. When the data are proje ted, the
neighborhood is preserved if the set of the k nearest neighbors does not hange.
Two kinds of errors are possible. Either new data may enter the neighborhood
of a data ve tor in the proje tion, or some of the data ve tors originally within
the neighborhood may be proje ted further away on the 2D graphi al display.
The latter kinds of errors result from dis ontinuities in the mapping, and they
have been measured extensively when quantifying the neighborhood preservation
of SOMs (see e.g. [5℄). As a result of dis ontinuities not all of the proximities in
the original data are visible after the proje tions.
We argue that the former kinds of errors are even more harmful sin e they
redu e the trustworthiness of the proximities or neighborhood relationships that
are visible on the display after the proje tion: Some data points that seem to be
lose to ea h other may a tually be quite dissimilar. A ording to our knowledge
this property of the proje tion methods has not been measured previously; for
SOMs the a ura y has been measured as \quantization errors" in the input
spa e, but not in the output spa e.
If the data manifold is higher-dimensional than the display, then both kinds
of errors annot be avoided and all proje tion methods must make a tradeo . In
this paper we will on entrate on measuring the new property of trustworthiness.
The dis ontinuities in the mapping will be measured by the preservation of
neighborhoods to show the tradeo s.
In prin iple, the errors ould be measured simply as the average number of
data items that enter or leave the neighborhoods in the proje tion. We have
used slightly more informative measures: Trustworthiness of the neighborhoods
is quanti ed by measuring how far from the original neighborhood the new
data points entering a neighborhood ome. The distan es are measured as rank
orders; similar results have been obtained with Eu lidean distan es as well (un-
published). Using the notation in Table 1 the measure for trustworthiness of the
proje ted result is de ned as

M1 (k ) = 1
2 XN X (r(xi ; xj ) k) ; (1)
Nk (2N 3k 1) i=1 x 2U (x )
j k i

where the term before the summation s ales the values of the measure between
zero and one1 .
Preservation of the original neighborhoods is measured by

M2 (k ) = 1
2 XN X (^r (xi ; xj ) k) : (2)
Nk (2N 3k 1) i=1 x 2V (x )
j k i

3 Test Setting
Three datasets were used: UCI2 9-dimensional Glass identi ation database
(Glass) having 214 data ve tors; UCI 34-dimensional ionosphere database (Iono-
sphere) having 350 data ve tors; and 20-dimensional Phoneti 3 dataset having
1962 data ve tors.
The resolution and exibility of SOM and GTM an be freely sele ted, and
need be xed for the omparison. The number of grid points on the map or latent
1
For larity we have only in luded the s aling for neighborhoods of size k < N=2
2
http://www.i s.u i.edu/mlearn/MLRepository.html
3
In luded in LVQ PAK, available at http://www. is.hut. /resear h/software.shtml
Table 1. Symbols used in de ning the error measures
xi 2 Rn ; i = 1; : : : ; N data ve tor
Ck (xi ) the set of those k data ve tors that are losest to xi in the original data spa e
C^k (xi ) the set of those k data ve tors that are losest to xi after proje tion
Uk (xi ) the set of data ve tors xj for whi h xj 2 C^k (xi ) ^ xj 62 Ck (xi ) holds
Vk (xi ) the set of data ve tors xj for whi h xj 62 C^k (xi ) ^ xj 2 Ck (xi ) holds
r(xi ; xj ); i 6= j the rank of xj when the data ve tors are ordered based on their Eu-
lidean distan e from the data ve tor xi in the original data spa e
r^(xi ; xj ); i 6= j the rank of xj when the data ve tors are ordered based on their distan e
from the data ve tor xi after proje tion

spa e, whi h governs the resolution of the proje tion, was set about equal to the
number of data points. The exibility or sti ness of the SOM is determined by
the nal radius  of the neighborhood fun tion that governs adaptation during
the learning pro ess. The radius was sele ted a ording to the goodness measure
given in [4℄.4 We used the neighborhood fun tion h(d) = 1 d2 = 2 , for d <= 
and h(d) = 0 otherwise. The argument d refers to the distan e on the map grid,
the unit being the distan e between neighboring map nodes.
The exibility of the GTM is governed by the number and width of the basis
fun tions that map the latent grid to the input spa e. For ea h data set the
values that gave the best log-likelihood on a validation set were hosen.
Note that sin e the map grids of SOM and GTM are dis rete, several data
points may be proje ted onto the same points and hen e there will be ties in
the rank ordering of the distan es between the proje ted data points. To make
these methods omparable to the rest, we assume that in the ase of ties all rank
orders are equally likely, and ompute averages of the error measures.

4 Results
Trustworthiness of the visualized neighborhood relationships. When interpret-
ing the visualizations it is parti ularly important that the small neighborhoods,
having a small value of k , are trustworthy. They are the most salient relation-
ships between the data items. In Fig. 1a the trustworthiness of the proje ted
proximities is measured for the three data sets. The SOM and GTM preserve
proximities (at small values of k up to 5-10% of the size of the datasets tested)
better than the other methods, whereas the MDS-based methods are mostly
better at preserving the global ordering (large values of k ). NMDS is the best of
the traditional methods, being onsistently better than Sammon mapping and
PCA. The relative order of the SOM and the GTM di ers on di erent datasets;
4
Be ause the measure works better if there is more data per grid point, the number
of grid points on the map was set to about 1=10 of the number data points (but
to at least 50) for the sele tion of the sti ness. The results were s aled ba k to the
larger map.
: Trustworthiness of the
a b: Preservation of the
neighborhoods original neighborhoods
1 1
Ionosphere dataset

0.95 0.95

M1 0.9
M2 0.9

0.85 0.85

PCA PCA
0.8 Sammon 0.8 Sammon
NMDS NMDS
SOM SOM
GTM GTM
0.75 0.75
10 20 30 40 50 60 10 20 30 40 50 60

k k
1 1
Phoneti dataset

0.95 0.95

M1 0.9 M2 0.9

0.85 0.85

PCA PCA
0.8 Sammon 0.8 Sammon
NMDS NMDS
SOM SOM
GTM GTM
0.75 0.75
50 100 150 200 250 300 50 100 150 200 250 300

k k
Fig. 1. Trustworthiness of the neighborhoods after proje tion (a) and preservation of
the original neighborhoods (b) for two data sets as a fun tion of the neighborhood size
k. The results on the third data set (Glass) were similar.

the explanation is probably related to the di erent methods used in sele ting
their sti ness ( f. the dis ussion about sti ness and trustworthiness below).

Preservation of the original neighborhoods. While preservation of the original


neighborhoods, measured in Fig. 1b, is not as important as the trustworthiness
of the proje tion, it gives insight to the tradeo s done in the proje tion. PCA
and NMDS are the overall best methods: They perform similarly on all datasets
and are onsistently among the best methods. For small neighborhoods (small
k ) the SOM is among the best methods as well.

Flexibility of mapping in reases trustworthiness of small neighborhoods. It seems


plausible that there is a onne tion between the sti ness of the mappings and the
di erent kinds of errors. If the mapping is sti (the linear PCA is the extreme
example), then lose-by points will be proje ted lose to ea h other. Unless
the data lies within a low-dimensional (linear) subspa e of the original spa e,
however, data points originally far away may \ ollapse" into the same lo ation.
We next investigated how the sti ness of the SOM a e ts the kinds of errors
made.
The more exible the SOM is (the smaller the  ) the more trustworthy the
small neighborhoods are (Fig. 2a), as expe ted. There is a tradeo in that the
trustworthiness of the larger neighborhoods de reases.
The relation between the sti ness and the preservation of the original neigh-
borhoods is not as lear, however, and requires more investigation. For larger
neighborhoods it holds that the sti er the map the better the neighborhoods are
preserved (Fig. 2b), but for very small neighborhoods the sti est map ( = 20)
makes most errors.

: Trustworthiness of the
a b: Preservation of the
visualization original neighborhoods
1 1
Phoneti dataset

0.95 0.95

M1 0.9 M2 0.9

0.85 0.85

NMDS NMDS
GTM GTM
0.8 0.8
SOM σ=1.1 SOM σ=1.1
SOM σ=10 SOM σ=10
SOM σ=20 SOM σ=20
0.75 0.75
50 100 150 200 250 300 50 100 150 200 250 300

k k
Fig. 2. The e e t of the sti ness of the SOM on trustworthiness of the visualized
neighborhoods (a) and preservation of the original neighborhoods (b), as a fun tion
of the neighborhood size k. The sti ness was varied by varying the nal neighborhood
radius  on a SOM of a xed size. The sti ness of the GTM shown for omparison was
sele ted by maximum likelihood on a validation set. Data set: Phoneti .

5 Con lusions
The experimental results showed that the neighborhood relationships on the
SOM and GTM displays are more trustworthy than on MDS-based displays.
That is, if two data samples are lose to ea h other on the visualizations, they
are more likely to be lose to ea h other in the original high-dimensional spa e
as well. Moreover, the apa ity of the SOM in preserving the original neighbor-
hoods, the other aspe t of neighborhood preservation, seems to be omparable
to the other methods.
The trustworthiness of the lo al neighborhoods an still be improved by in-
reasing the exibility of the mapping, by redu ing the radius of the neigh-
borhood fun tion of the SOM. The ost is that larger neighborhoods (longer
distan es) are not preserved as well.
Referen es
[1℄ C. M. Bishop, M. Svensen, and C. K. I. Williams. GTM: The generative topographi
mapping. Neural Computation, 10:215{234, 1998.
[2℄ G. J. Goodhill and T. J. Sejnowski. A unifying obje tive fun tion for topographi
mappings. Neural Computation, 9:1291{1303, 1997.
[3℄ H. Hotelling. Analysis of a omplex of statisti al variables into prin ipal ompo-
nents. Journal of Edu ational Psy hology, 24:417{441,498{520, 1933.
[4℄ S. Kaski and K. Lagus. Comparing self-organizing maps. In C. von der Mals-
burg, W. von Seelen, J. C. Vorbruggen, and B. Sendho , editors, Pro eedings of
ICANN'96, International Conferen e on Neural Networks, pages 809{814, Berlin,
1997. Springer.
[5℄ K. Kiviluoto. Topology preservation in self-organizing maps. Pro eedings of IEEE
International Conferen e on Neural Networks., volume 1, pages 294{299, 1996.
[6℄ T. Kohonen. Self-organized formation of topologi ally orre t feature maps. Bio-
logi al Cyberneti s, 43:59{69, 1982.
[7℄ T. Kohonen. Self-Organizing Maps. Springer-Verlag, Berlin, 1995 (third, extended
edition 2001).
[8℄ J. B. Kruskal. Multidimensional s aling by optimizing goodness of t to a nonmetri
hypothesis. Psy hometri a, 29(1):1{26, Mar 1964.
[9℄ J. W. Sammon, Jr. A nonlinear mapping for data stru ture analysis. IEEE Trans-
a tions on Computers, C-18(5):401{409, May 1969.
[10℄ W. S. Torgerson. Multidimensional s aling I|theory and methods. Psy homet-
ri a, 17:401{419, 1952.

You might also like