Introduction
For effective data analysis, it is important to include a human in the data exploration process and to combine human flexibility, creativity, and general knowledge with the enormous storage capacity and computational power of today's computers. Visual data mining aims at integrating the human into the data analysis process, applying human perceptual abilities to the analysis of large data sets available in today's computer systems [1].
Objects from the real world are frequently described by an array of parameters (variables, features) x_1, x_2, ..., x_n. The term "object" can cover various things: people, equipment, products of manufacturing, etc. Any parameter may take some numerical values. A combination of the values of all parameters characterizes a particular object X_j = (x_{j1}, x_{j2}, ..., x_{jn}) from the whole set X_1, X_2, ..., X_m, where n is the number of parameters and m is the number of analysed objects. X_1, X_2, ..., X_m are n-dimensional vectors (data). Often they are interpreted as points in the n-dimensional space R^n, where n defines the dimensionality of the space. In fact, we have a table of numerical data for the analysis: {x_{ji}, j = 1, ..., m, i = 1, ..., n}. A natural idea arises to present the multidimensional data stored in such a table in some visual form. This is a complicated problem, accompanied by extensive research, but its solution allows the human to gain a deeper insight into the data, draw conclusions, and directly interact with the data. In Figure 1, we present an example of a visual presentation of a data table (n=6, m=20) using the multidimensional scaling method discussed below. The dimensionality of the data is reduced from 6 to 2. Here vectors X_4, X_6, X_8, and X_19
_____________________________
¹ Corresponding Author: Institute of Mathematics and Informatics, Akademijos Str. 4, LT-08663 Vilnius, Lithuania; E-mail: Dzemyda@ktl.mii.lt.
form a separate cluster that can be clearly observed visually on a plane and that cannot
be recognized directly from the table without a special analysis.
This chapter is organized as follows. In Section 1, an overview of methods for
multidimensional data visualization via dimension reduction is presented. The methods
based on neural networks are discussed, too. Section 2 deals with a problem of
combining the SOM and Sammon mapping. Investigations of the neural network for
Sammon’s mapping SAMANN are discussed in Section 3. Applications of the
dimension reduction and data visualization using neural networks are presented in
Section 4. They cover the visual analysis of correlations: environmental data analysis
and statistical analysis of curricula; multidimensional data visualization applications:
comparison of schools and analysis of the economic and social conditions of Central
European countries; visual analysis of medical data: analysis of data on the fundus of
eyes and analysis of physiological data on men’s health. Conclusions generalize the
results.
$$E_{MDS} = \sum_{\substack{i,j=1 \\ i<j}}^{m} w_{ij}\,(d_{ij}^{*} - d_{ij})^{2}. \qquad (1)$$

The following weights w_ij are frequently used:

$$w_{ij} = \frac{1}{\sum_{k<l} (d_{kl}^{*})^{2}}, \qquad w_{ij} = \frac{1}{d_{ij}^{*} \sum_{k<l} d_{kl}^{*}}, \qquad w_{ij} = \frac{1}{m\, d_{ij}^{*}}.$$
Usually, Euclidean distances are used for d_ij and d*_ij. However, a perfect reproduction of Euclidean distances may not always be the best possible goal, especially if the components of the data vectors are expressed on an ordinal scale. Then only the rank order of the distances between the vectors is meaningful, not the exact values. To solve this problem, nonmetric MDS [11] may be used. Although nonmetric MDS was motivated by the need to treat ordinal-scale data, it can, of course, also be used if the inputs are presented as pattern vectors in a Euclidean space. The projection then only tries to preserve the order of the distances between the data vectors, not their actual values.
A particular case of the metric MDS is Sammon’s mapping [14]. It tries to
optimize a cost function that describes how well the pairwise distances in a data set are
preserved:
$$E_{S} = \frac{1}{\sum_{i<j} d_{ij}^{*}} \sum_{\substack{i,j=1 \\ i<j}}^{m} \frac{(d_{ij}^{*} - d_{ij})^{2}}{d_{ij}^{*}}. \qquad (2)$$
The projected coordinates are updated iteratively:

$$y_{ik}(t+1) = y_{ik}(t) - \alpha\, \frac{\partial E_{S}(t) / \partial y_{ik}(t)}{\partial^{2} E_{S}(t) / \partial y_{ik}^{2}(t)}, \qquad (3)$$

where t denotes the iteration number, and α is a parameter that influences the optimization step.
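To make the procedure concrete, the stress of Eq. (2) and a projection update can be sketched as follows. This is a minimal illustration using plain gradient descent on E_S (Sammon's step in Eq. (3) additionally divides by the second derivative, which is omitted here); the step size and the random initialization are arbitrary choices:

```python
import numpy as np

def pairwise_dist(P):
    """Euclidean distance matrix between the rows of P."""
    diff = P[:, None, :] - P[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def sammon_stress(Dstar, Y):
    """Sammon's error E_S for target distances Dstar and projections Y."""
    D = pairwise_dist(Y)
    iu = np.triu_indices_from(Dstar, k=1)      # pairs with i < j
    c = Dstar[iu].sum()
    return ((Dstar[iu] - D[iu]) ** 2 / Dstar[iu]).sum() / c

def sammon_step(Dstar, Y, alpha=0.3):
    """One plain gradient-descent step on E_S."""
    D = pairwise_dist(Y)
    np.fill_diagonal(D, 1.0)                   # avoid 0/0 on the diagonal
    Ds = Dstar.copy()
    np.fill_diagonal(Ds, 1.0)
    c = np.triu(Dstar, k=1).sum()
    W = (Ds - D) / (Ds * D)                    # (d* - d) / (d* d) per pair
    np.fill_diagonal(W, 0.0)
    grad = -(2.0 / c) * (W[:, :, None] * (Y[:, None, :] - Y[None, :, :])).sum(axis=1)
    return Y - alpha * grad

rng = np.random.default_rng(0)
X = rng.random((20, 6))                        # m = 20 objects, n = 6 parameters
Dstar = pairwise_dist(X)
Y = rng.random((20, 2))                        # random initial 2-D projections
s0 = sammon_stress(Dstar, Y)
for _ in range(100):
    Y = sammon_step(Dstar, Y)
s1 = sammon_stress(Dstar, Y)
```

After the iterations, s1 is smaller than the initial stress s0, i.e. the interpoint distances on the plane approximate the original ones better.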
Analyses of the relative performance of different algorithms for reducing the dimensionality of multidimensional vectors, starting from the paper [15], indicate that Sammon's projection is still one of the best methods of this class (see also [16]).
There are some other nonlinear methods for data visualization: principal curves,
generative topographic mapping, locally linear embedding, etc. Such a variety of
methods is determined by the human desire to see the data from a different perspective.
PCA can be generalized to form nonlinear curves. While PCA constructs a good projection of a data set onto a linear manifold, the goal in constructing a principal curve is to project the set onto a nonlinear manifold. Principal curves [17] are smooth curves defined by the property that each point of the curve is the average of all the data points projected onto it, i.e., of all the points for which it is the closest point on the curve. Principal curves are generalizations of principal components, extracted using PCA, in the sense that a linear principal curve is a principal component.
Generative topographic mapping (GTM) is a technique in which a topographic mapping function between the input data and the visualization space is found [18]. The idea is to use a function that maps a density distribution in the visualization space, in combination with a Gaussian noise model, into the original data space.
Locally linear embedding (LLE) is an unsupervised learning algorithm that
computes low-dimensional, neighbourhood preserving embeddings of high-
dimensional data. LLE assumes that the high-dimensional data lie on (or near to) a
smooth low-dimensional (nonlinear) manifold [19].
Artificial neural networks may be used for dimension reduction and data visualization. A feed-forward neural network can be utilised to effect a topographic, structure-preserving, dimension-reducing transformation of the data, with an additional facility to incorporate different degrees of associated subjective information. MDS has received some attention from neural network researchers [20], [21], [22].
There also exists a neural network architecture designed specifically for
topographic mapping, and that is the self-organising map (SOM) [5], which exploits
implicit lateral connectivity in the output layer of neurons. The SOM is a method used
for both clustering and visualization (dimension reduction) of multidimensional data.
Two algorithms in particular, k-means clustering [23] and principal curves, are very closely related to the SOM. An important practical difference between SOM and
MDS (in terms of data visualization) is that SOM offers a possibility to visualize new
points (that were not used during learning), whereas MDS and Sammon’s mapping do
not.
In [24], a visualization method, called ViSOM, is proposed. It constrains the lateral
contraction force between the neurons in the SOM and hence regularises the inter-
neuron distances with respect to a scaleable parameter that defines and controls the
resolution of the map.
Curvilinear component analysis (CCA) [25] has been proposed as an improvement on self-organizing maps; its output space is continuous and automatically takes the relevant shape. The CCA network consists of two layers: the first performs vector quantization on the dataset, and the second, called the projection layer, performs a topographic mapping of the structure obtained by the vector quantization layer. The projection layer is a free space, which takes the shape of
a submanifold of the data. The CCA is also claimed to allow an inverse projection, that
is, from the two-dimensional space to the n-dimensional space, by a permutation of the
input and output layers.
An autoassociative feed-forward neural network [26], [27] allows dimension
reduction by extracting the activity of d neurons of the internal “bottleneck” layer
(containing fewer nodes than input or output layers), where d is the dimensionality of
the visualization space. The network is trained to reproduce the data space, i.e. training
data are presented to both input and output layers while obtaining a reduced
representation in the inner layer.
The specific neural network model NeuroScale [28] uses a radial basis function (RBF) neural network [29] to transform the n-dimensional input vector into the d-dimensional output vector, where, in general, n > d. Another capability of NeuroScale is to exploit additional knowledge available on the data and to allow this knowledge to influence the mapping. This allows the incorporation of supervisory information into a totally unsupervised technique.
The autoencoder network for dimension reduction is proposed in [30]. It is a nonlinear generalization of PCA that uses an adaptive, multilayer "encoder" network to transform the high-dimensional data into a low-dimensional code and a similar "decoder" network to recover the data from the code. It is shown in [30] that nonlinear autoencoders work considerably better than widely used methods such as PCA or LLE.
The combination and integrated use of data visualization methods of different natures are under rapid development. Combinations of different methods can be applied to data analysis while minimizing the shortcomings of the individual methods. We develop and examine some possibilities: a combination of the SOM with Sammon's mapping (see Section 2), and the SAMANN algorithm, a feed-forward neural network for Sammon's mapping [20] (see Section 3). As an alternative to the SAMANN unsupervised learning rule, one could also train a standard feed-forward neural network, using supervised backpropagation, on a previously calculated Sammon's mapping. Although this requires much more computation, as it involves two learning phases (one for Sammon's mapping, one for the neural network), it should perform at least as well as SAMANN [31].
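This supervised alternative can be sketched as follows. The one-hidden-layer architecture, batch backpropagation, learning rate, and epoch count are illustrative choices, and the targets Y stand for previously calculated Sammon coordinates:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_projection_net(X, Y, hidden=20, lr=0.1, epochs=2000, seed=0):
    """Fit a feed-forward net to reproduce precomputed Sammon coordinates Y
    for the inputs X; returns a projection function that can then map
    previously unseen vectors without recomputing the whole configuration."""
    rng = np.random.default_rng(seed)
    n, d = X.shape[1], Y.shape[1]
    W1 = rng.normal(0.0, 0.5, (n, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)            # hidden activations
        P = H @ W2 + b2                     # predicted low-dimensional points
        err = (P - Y) / len(X)              # gradient of the mean squared error
        gW2, gb2 = H.T @ err, err.sum(axis=0)
        dH = (err @ W2.T) * H * (1.0 - H)   # backpropagate through the sigmoid
        gW1, gb1 = X.T @ dH, dH.sum(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    return lambda Z: sigmoid(Z @ W1 + b1) @ W2 + b2
```

After training, `project = train_projection_net(X, Y_sammon)` yields a mapping that can be applied to new vectors directly, which is exactly the generalization ability that plain Sammon's mapping lacks.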
The self-organizing map (SOM) [5] is a class of neural networks that are trained in an unsupervised manner using competitive learning. It is a well-known method for mapping a high-dimensional space onto a low-dimensional one. We consider here the mapping onto a two-dimensional grid of neurons. Let X_1, ..., X_m ∈ R^n be a set of n-dimensional vectors for mapping. Usually, the neurons are connected to each other via a rectangular or hexagonal topology. The rectangular SOM is a two-dimensional array of neurons M = {m_ij, i = 1, ..., k_x, j = 1, ..., k_y}. Here k_x is the number of rows and k_y is the number of columns. Each component of the input vector is connected to every individual neuron. Any neuron is entirely defined by its location on the grid (the row number i and column number j) and by its codebook vector, i.e. we can consider a neuron as an n-dimensional vector m_ij = (m_ij^1, m_ij^2, ..., m_ij^n) ∈ R^n.
The learning starts from the vectors m_ij initialized at random. At each learning step, an input vector X is passed to the neural network. The Euclidean distance from this input vector to each vector m_ij is calculated, and the vector (neuron) m_c with the minimal Euclidean distance to X is designated as the winner: c = arg min_{i,j} {||X − m_ij||}. The components of the vectors m_ij are adapted according to the general rule: m_ij(t+1) = m_ij(t) + h_ij^c(t)(X − m_ij(t)), where t is the iteration (learning step) number and h_ij^c(t) is the neighbourhood function.
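A minimal sketch of this competitive learning loop is given below. The Gaussian neighbourhood function and the linearly decaying learning rate and radius are illustrative assumptions, since the text fixes only the winner search and the general update rule:

```python
import numpy as np

def train_som(X, kx, ky, epochs=50, seed=0):
    """Minimal rectangular SOM trained by competitive learning.
    Gaussian neighbourhood and linear decay are illustrative choices."""
    rng = np.random.default_rng(seed)
    m_count, n = X.shape
    M = rng.random((kx, ky, n))        # codebook vectors m_ij, random init
    grid = np.dstack(np.meshgrid(np.arange(kx), np.arange(ky), indexing="ij"))
    total = epochs * m_count
    step = 0
    for _ in range(epochs):
        for idx in rng.permutation(m_count):
            x = X[idx]
            # winner: the neuron with minimal Euclidean distance to x
            dist = np.linalg.norm(M - x, axis=2)
            c = np.unravel_index(dist.argmin(), dist.shape)
            # decaying learning rate and neighbourhood radius
            frac = 1.0 - step / total
            lr = 0.5 * frac
            sigma = max(0.5, 0.5 * max(kx, ky) * frac)
            g = np.linalg.norm(grid - np.array(c), axis=2)
            h = lr * np.exp(-(g ** 2) / (2.0 * sigma ** 2))
            # general rule: m_ij(t+1) = m_ij(t) + h_ij^c(t) (x - m_ij(t))
            M += h[:, :, None] * (x - M)
            step += 1
    return M
```

After training, the codebook vectors quantize the data, and the grid position of each winner gives the two-dimensional mapping of the corresponding input vector.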
result of the (q−1)-st block. Note that r_q ≠ r_{q−1} in general. The way of selecting the initial coordinates is presented below. We must determine the initial coordinates of each two-dimensional vector Y_i^q corresponding to the neuron-winner Z_i^q, i = 1, ..., r_q. The sequence of steps is as follows. Determine the vectors from {X_1, X_2, ..., X_m} that are related with Z_i^q. Denote these vectors by X_{i1}, X_{i2}, ... (X_{i1}, X_{i2}, ... ∈ {X_1, X_2, ..., X_m}). Determine the neurons-winners of the (q−1)-st block that were related with X_{i1}, X_{i2}, ... . Denote these neurons-winners by Z_{j1}^{q−1}, Z_{j2}^{q−1}, ... . The initial coordinates of Y_i^q are set equal to the mean value of the set of vectors {Y_{j1}^{q−1}, Y_{j2}^{q−1}, ...}. Afterwards, the two-dimensional projections Y_1^q, Y_2^q, ..., Y_{r_q}^q (Y_i^q = (y_{i1}^q, y_{i2}^q), i = 1, ..., r_q) of the vectors-winners are calculated using Sammon's algorithm.
5. The training of the neural network is continued until q = γ. After the γ-th block, we get two-dimensional projections Y_1^γ, Y_2^γ, ..., Y_{r_γ}^γ of the n-dimensional vectors-winners Z_1^γ, Z_2^γ, ..., Z_{r_γ}^γ that are uniquely related with the vectors X_1, X_2, ..., X_m.
Figure 2. Scheme of the integrated combination of the SOM and Sammon mapping
Denote the consecutive combination of the SOM and Sammon mapping by SOMSam(a), and the integrated one by SOMSam(b). The combined mappings have been examined by analysing the data on coastal dunes and their vegetation in Finland [35] (see Section 4.1.1).
Cases with various parameters of the algorithms and their constituent parts have been analysed:
· size of the SOM (2x2, 3x3, 4x4, 5x5, 6x6);
· number of training epochs e (100, 200, 300);
· number of training blocks γ and number of epochs ν per training block (e = νγ);
· values of the parameter α in Sammon's mapping (0.1; 0.11; …; 0.99; 1) (Eq. (3)).
Under the same initial conditions, the projection errors E_S (Eq. (2)) have been calculated. The experiments have been repeated 200 times with different (random) initial values of the components of the neuron vectors m_ij = (m_ij^1, m_ij^2, ..., m_ij^n) ∈ R^n.
Table 1. The ratios between the projection errors of SOMSam(a) and SOMSam(b) (columns grouped by e = 100, 200, and 300 training epochs)
ν     50   25   20   10   5    50   40   25   20   10   5    50   25   20   10   5
γ     2    4    5    10   20   4    5    8    10   20   40   6    12   15   30   60
2x2   2.30 2.85 2.84 2.85 2.86 3.07 3.09 3.12 3.11 3.10 3.15 3.25 3.30 3.29 3.23 3.34
3x3   1.11 1.14 1.14 1.20 1.26 1.12 1.14 1.18 1.20 1.25 1.31 1.15 1.20 1.22 1.27 1.30
4x4   1.49 1.66 1.67 1.75 1.79 2.85 2.96 3.06 3.10 3.18 3.23 4.60 4.83 4.92 4.95 5.05
5x5   1.03 1.04 1.05 1.06 1.06 1.04 1.05 1.06 1.06 1.07 1.07 1.06 1.07 1.07 1.07 1.08
6x6   1.07 1.08 1.11 1.13 1.13 1.11 1.20 1.20 1.22 1.23 1.24 1.23 1.24 1.25 1.26 1.27
The ratios between the mean projection errors obtained by SOMSam(a) and SOMSam(b) (see Table 1) have been calculated. It is apparent from Table 1 and Figure 3 that these ratios are always greater than one, i.e. the projection error obtained by SOMSam(b) is smaller than that obtained by SOMSam(a). The experiments in the papers [6], [34] show that even slight differences in the error value cause essential differences in the distribution of the points projected on a plane.
Figure 3. Mean projection errors of SOMSam(a) and SOMSam(b) depending on the parameter α (200 training epochs, 40 training blocks, 3x3 SOM)
Figure 4. Distribution of points, obtained: (a) by Sammon’s projection; (b) combined mapping
In Sammon's algorithm, there is no mechanism to project one or more new data points without expensively regenerating the entire configuration from the augmented data set. To project new data, one has to run the program again on all the data (old and new). To solve this problem, some methods (a triangulation algorithm [37], [15], a standard feed-forward neural network [31]) have been proposed in the literature.
A specific backpropagation-like learning rule has been developed to allow a normal
feed-forward artificial neural network to learn Sammon’s mapping in an unsupervised
way. The neural network training rule of this type was called SAMANN [20]. The
network is able to project new multidimensional vectors (points) after training.
Mao and Jain [20], who evaluated different network types on eight data sets using three numerical criteria and visual inspection, conclude that the SAMANN network preserves the data structure, cluster shape, and interpattern distances better than a number of other feature extraction or data projection methods.
The architecture of the SAMANN network is a multilayer perceptron in which the number of inputs is set to the input space dimension, n, and the number of outputs to the projected space dimension, d (Figure 5).
$$\Delta w_{jk}^{(l)} = -\eta\, \frac{\partial E_{S}(\mu,\nu)}{\partial w_{jk}^{(l)}} = -\eta\, \big(\Delta_{jk}^{(l)}(\mu)\, y_{j}^{(l-1)}(\mu) - \Delta_{jk}^{(l)}(\nu)\, y_{j}^{(l-1)}(\nu)\big), \qquad (4)$$

where w_jk^(l) is the weight between unit j in layer l−1 and unit k in layer l, η is the learning rate, y_j^(l) is the output of the jth unit in layer l, and μ and ν are two vectors from the analysed data set {X_1, X_2, ..., X_m}. The Δ_jk^(l) are the errors accumulated in each layer and backpropagated to the preceding layer, similarly to standard backpropagation. The sigmoid activation function with range (0,1) is used for each unit.
During training, the network takes a pair of input vectors at a time. The outputs of each neuron are stored for both points. The distance between the neural network output vectors can be calculated, and an error measure can be defined in terms of this distance and the distance between the points in the input space. From this error measure, a weight update rule has been derived in [20]. After training the network, one is able to use it to generalise to previously unseen data.
The SAMANN unsupervised backpropagation algorithm is as follows:
1. Initialize the weights in the SAMANN network randomly.
2. Select a pair of vectors randomly, present them to the network one at a time,
and evaluate the network in a feed-forward fashion.
3. Update the weights in the backpropagation fashion starting from the output
layer.
4. Repeat steps 2-3 a number of times.
5. Present all the vectors and evaluate the outputs of the network; compute the
projection error ES ; if the value of Sammon’s error is below a predefined
threshold or the number of iterations (from steps 2-5) exceeds the predefined
maximum number, then stop; otherwise, go to step 2.
In our realization of the SAMANN training, one iteration means showing all possible pairs of the vectors X_1, X_2, ..., X_m to the neural network once.
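This pair-wise iteration structure can be expressed as a small sketch, where `update_pair` stands in for the full feed-forward evaluation and weight update of Eq. (4), which is not reproduced here:

```python
from itertools import combinations

def samann_iteration(m, update_pair):
    """One SAMANN iteration: show every pair (mu, nu) of the m training
    vectors to the network once, updating the weights after each pair."""
    for mu, nu in combinations(range(m), 2):
        update_pair(mu, nu)
```

With m vectors this performs m(m−1)/2 weight updates per iteration, which is one reason why SAMANN training is slow for large datasets.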
A drawback of using SAMANN is that the original dataset has to be scaled for the
artificial neural network to be able to find a correct mapping, since the neural network
can only map to points in the sigmoid’s output interval (0,1) . This scaling is dependent
on the maximum distance in the original dataset. It is therefore possible that a new
vector, shown to the neural network, will be mapped incorrectly, when its distance to a
vector in the original dataset is larger than any of the original interpattern distances.
Another drawback of using SAMANN is that it is rather difficult to train and it is
extremely slow.
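One common way to implement such scaling (an assumption for illustration; the exact normalization used in [20] may differ) is to divide the data by the maximum interpattern distance:

```python
import numpy as np

def scale_for_samann(X):
    """Scale X so that the largest interpattern Euclidean distance is 1,
    so the pairwise distances fit the (0,1) range of the sigmoid outputs."""
    diff = X[:, None, :] - X[None, :, :]
    dmax = np.sqrt((diff ** 2).sum(axis=-1)).max()
    return X / dmax, dmax
```

As noted above, a new vector whose distance to the training data exceeds dmax will still be mapped incorrectly, because the scaling is fixed by the original dataset.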
Control Parameters of the SAMANN Network Training. The rate at which an artificial neural network learns depends upon several controllable factors. Obviously, a slower rate means that much more time is spent in accomplishing the learning to produce an adequately trained system. At faster learning rates, however, the network may not be able to make the fine discriminations possible with a system that learns more slowly. Thus, if η is small when the algorithm is initialized, the network will probably take an unacceptably long time to converge. The experiments done in [38] have shown how the SAMANN network training depends on the learning rate.
The training of the SAMANN network is a very time-consuming operation. In the investigation of the SAMANN network, it has been observed that the projection error depends on several parameters. The latest investigations have revealed that, in order to achieve good results, one needs to select the learning rate η properly. It has been stated so far that the projection yields the best results if the η value is taken from the interval (0,1). In that case, however, the network training is very slow. One of the possible reasons is that, in the case of the SAMANN network, the interval (0,1) is not the best one. Thus, it is reasonable to look for an optimal value of the learning rate that may not necessarily be within the interval (0,1).
The experiments [38] demonstrate that, with an increase in the learning rate value, a better projection error is obtained. That is why experiments have been done with higher values of the learning rate, beyond the limits of the interval (0,1). In all the experiments a two-layer (one hidden layer) network with 20 hidden units is used, and the projected space dimension is d=2. The results are presented in Figure 6. The following datasets were used in the experiments: the Iris dataset (Fisher's iris dataset) [39], the Salinity dataset [40], and the HBK dataset (an artificial dataset) [41]. It has been noticed that the best results are obtained at η > 1.
The experiments allow us to conclude that the optimal value of the learning rate for the datasets considered is within the interval [10,30]. By selecting such values of the learning rate, significant savings of computing time are possible. For a fixed number of iterations, good projection results are obtained in a shorter time than when taking the learning rate values from the interval (0,1). However, with an increase in the value of the learning rate, the error variations also increase, which can cause certain network training problems. Lower values of the learning rate, within the interval (0,1), guarantee a more stable convergence to the minimum of the projection error.
Figure 6. The dependence of the data projection accuracy on the learning rate η, η ∈ [1,100]: (a) Iris dataset; (b) Salinity dataset; (c) HBK dataset
Retraining of the SAMANN Network. After training the SAMANN network, the set of weights of the neural network is fixed. A new vector shown to the network is mapped onto the plane very quickly and quite exactly, without any additional calculations. However, when working with large amounts of data, many new vectors may appear, which entails retraining the SAMANN network after some time.
Let us call the set of vectors that have been used to train the network the primary set, and the set of new vectors that have not yet been used for training the new set. Denote the number of vectors in the primary dataset by N_1, and the number of vectors in the new dataset by N_2.
Two strategies for retraining the neural network that visualizes multidimensional
data have been proposed and investigated [38]. The first strategy uses all possible pairs
of data vectors (both from primary and new datasets) for retraining. The second
strategy uses a restricted number of pairs of the vectors (a vector from the primary set –
a vector from the new set).
The strategies of neural network retraining are as follows:
1. The SAMANN network is trained with the N_1 vectors from the primary dataset, the weights W_1 are obtained, the projection error E_S(N_1) is calculated, and the vector projections are localized on the plane. After the emergence of N_2 new vectors, the neural network is retrained with all the N_1 + N_2 vectors, using the set of weights W_1 as the initial one. The new weights W_2 of the SAMANN network are found.
2. The SAMANN network is trained with the N_1 vectors from the primary dataset, the weights W_1 are obtained, and the projection error E_S(N_1) is calculated. Then, in order to renew the weights W_1, a pair of vectors is provided to the neural network simultaneously: one vector from the primary dataset and one from the new one. The neural network is retrained with 2N_2 vectors at each iteration. Here one iteration involves the computations in which all vectors from the new dataset are provided to the neural network once. The new set of network weights W_2 is found.
The experiments [38] have shown that the first strategy yields good results; however, retraining of the network is slow. In the analysis of the strategies for network retraining, a particular case of the SAMANN network was considered: a feed-forward artificial neural network with one hidden layer (20 hidden units) and two outputs (d=2). The set of initial weights was fixed in advance. To visualize the primary dataset, the following parameters were employed: the number of iterations M=10000 and the learning rate η = 10; to visualize the set of new vectors: the learning rate was η = 1, and the number of iterations depended on the strategy chosen.
The best visualization results are obtained by taking points for network retraining from both the primary dataset and the new dataset (the second strategy). Figures 7 and 8 demonstrate the results of the calculations. The results of retraining the SAMANN network only with the new vectors (points) are indicated in the figures. After each iteration, the projection error E_S(N_1 + N_2) is calculated and the computing time is measured. The Iris dataset and 300 randomly generated vectors (three spherical clusters with 100 5-dimensional vectors each) have been used in the experiments. The second strategy enables us to attain good visualization results in a very short time, to get smaller projection errors, and to improve the accuracy of the projection.
Figure 7. Dependence of the projection error on the computing time for the Iris dataset
Figure 8. Dependence of the projection error on the computing time for randomly generated vectors
In some experiments, we observe a fast stabilization of the projection error in the case of the first strategy, while the projection error of the second strategy keeps decreasing. This is an advantage of the second strategy. The experiments lead to the idea of a possibility to minimize the SAMANN neural network training time by dividing the training process into two subprocesses: (1) training the network with a part of the data vectors; (2) retraining the network with the remaining part of the dataset. In this case, the training set consists of two subsets of {X_1, X_2, ..., X_m}. A smaller number of pairs of vectors is used when training the network with the vectors of the subsets. This allows us to obtain a similar visualization quality much faster.
While working with high-dimensional datasets, it is important to have a way to speed up the network training process. In [42], an algorithm (a parallel realization) that meets this requirement has been proposed. The algorithm divides the SAMANN training dataset into two (or more) parts. Two (or more) identical neural networks are created, and a weight set for each network is calculated. Of all the weight sets, the one with the smallest projection error is retained. Using the obtained weight set, the neural network is trained with all the dataset vectors.
4. Applications
The values obtained by any parameter x_i can depend on the values of the other parameters x_j, j = 1, ..., n, j ≠ i, i.e. the parameters can be correlated. The problem is to discover knowledge about the interlocation of the parameters, and about groups (clusters) of parameters, from the values of the elements of the correlation matrix. The approach reviewed in this section is oriented to the analysis of correlation matrices via the visual presentation of a set of variables on a plane. More details on the approach can be found in [32].
The correlation matrix R = {r_{x_i x_j}, i, j = 1, ..., n} of the parameters can be calculated by analysing the objects that compose the set. Here r_{x_i x_j} is the correlation coefficient of the parameters x_i and x_j. The specific character of the correlation matrix analysis problem lies in the fact that the parameters x_i and x_j are related more strongly if the absolute value of the correlation coefficient |r_{x_i x_j}| is higher, and less strongly if the value of |r_{x_i x_j}| is lower. The minimal relationship between the parameters is equal to 0. The maximal relationship is equal to 1 or −1.
Let S^n be a subset of the n-dimensional Euclidean space R^n containing vectors of unit length, i.e. S^n is a unit sphere: ||Y|| = 1 if Y ∈ S^n. It is necessary to determine a system of vectors Y_1, ..., Y_n ∈ S^n corresponding to the system of parameters x_1, ..., x_n so that cos(Y_i, Y_j) = |r_{x_i x_j}| or cos(Y_i, Y_j) = r_{x_i x_j}². If only the matrix of cosines K = {cos(Y_i, Y_j), i, j = 1, ..., n} is known, it is possible to restore the system of vectors Y_s = (y_{s1}, ..., y_{sn}) ∈ S^n, s = 1, ..., n, as follows: y_{sk} = √λ_k α_{sk}, k = 1, ..., n, where λ_k is the k-th eigenvalue of the matrix K and (α_{1k}, ..., α_{nk}) is the normalized eigenvector corresponding to the eigenvalue λ_k. The system of vectors Y_1, ..., Y_n ∈ S^n exists if the matrix of their scalar products is non-negative definite. The correlation matrix R = {r_{x_i x_j}, i, j = 1, ..., n} is non-negative definite. It has been proven in [32] that the
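The restoration y_sk = √λ_k α_sk can be sketched with a symmetric eigendecomposition. Here the cosine matrix K is taken as the elementwise square of R (the r² choice mentioned above, which is guaranteed non-negative definite); clipping tiny negative eigenvalues is only a numerical safeguard:

```python
import numpy as np

def vectors_from_cosines(K):
    """Restore vectors Y_1..Y_n with pairwise cosines K:
    y_sk = sqrt(lambda_k) * alpha_sk, with eigenpairs of K."""
    lam, A = np.linalg.eigh(K)          # columns of A: normalized eigenvectors
    lam = np.clip(lam, 0.0, None)       # guard against round-off negatives
    return A * np.sqrt(lam)             # row s is the vector Y_s

rng = np.random.default_rng(0)
data = rng.random((50, 6))              # 50 objects, 6 parameters
R = np.corrcoef(data, rowvar=False)     # correlation matrix of the parameters
K = R ** 2                              # cos(Y_i, Y_j) = r^2 choice
Y = vectors_from_cosines(K)
```

By construction Y Yᵀ = K, so the rows of Y are unit vectors (since the diagonal of K is 1) whose pairwise cosines reproduce the prescribed matrix.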
Figure 9. Combined mapping of: (a) the meteorological parameter set; (b) environmental parameter set
Environmental parameters are as follows: x1 is the distance from the water line;
x2 is the height above the sea level; x3 is the soil pH; x4 , x5 , x6 , and x7 are the
contents of calcium (Ca), phosphorous (P), potassium (K), and magnesium (Mg); x8
and x9 are the mean diameter and sorting of sand; x10 is the northernness in the
Finnish coordinate system; x11 is the rate of land uplift; x12 is the sea level
fluctuation; x13 is the soil moisture content; x14 is the slope tangent; x15 is the
proportion of bare sand surface; x16 is the tree cover. The correlation matrix is
presented in [44]. In Table 2, the distribution of the environmental parameters on the 4×4 SOM is presented. Here the cells corresponding to the neurons-winners are filled with the numbers of the vectors Y_1, Y_2, ..., Y_n ∈ S^n, n = 16; some cells remain empty. One can visually judge the distribution of the vectors in the n-dimensional space from their distribution among the cells of the table. However, the table does not answer the question of how close the vectors of the neighbouring cells are in the n-dimensional space. It is expedient to apply Sammon's projection method for an additional mapping of the neurons-winners of the SOM. The results of the combined mapping (4×4 SOM and Sammon's mapping) are presented in Figure 9b.
Table 2. Distribution of the environmental parameters on the 4x4 SOM
4, 6 5 8, 9
7 15 14
3
1, 2, 16 13 10, 11, 12
4.1.2. Statistical Analysis of Curricula
The main proposition in the analysis [49] is that, in most cases, a student gets similar
marks in the related subjects. If a student is gifted for the humanities, he will be
successful in most of the humanities. Mathematical aptitude yields good marks in
mathematical subjects. When analysing the curriculum of studies, we know in advance
which subjects are mathematical and which of the humanities. However, there are
subjects that cannot be assigned to any well-defined class of subjects. Computer
science subjects may serve as an example of such subjects. The analysis made it
possible to evaluate the level of mathematization of different computer science subjects
or their proximity to the humanities. This level is never quantified, but it may be
estimated considering the whole complex of subjects that compose the curriculum. The only data necessary were the examination results. The correlation matrix of 25 subjects, obtained on the basis of the examination results at the Faculty of Mathematics and Informatics of Vilnius Pedagogical University, has been analysed.
The list of subjects: x1 – Probability theory, x2 – Methods of teaching
mathematics, x3 – Geometry 1, x4 – Pedagogy and psychology, x5 – Geometry 2, x6
– Mathematical analysis 1, x7 – Pedagogy, x8 – Geometry 3, x9 – Psychology 1, x10
– Algebra, x11 – Mathematical analysis 2, x12 – Foreign language, x13 – Geometry 4,
x14 – Psychology 2, x15 – Algebra and number theory 1, x16 – Mathematical analysis
3, x17 – Informatics 1, x18 – Algebra and number theory 2, x19 – Mathematical
analysis 4, x20 – Informatics 2, x21 – Development of training computer programs,
x22 – Methods of teaching informatics, x23 – Packages for solving mathematical
problems, x24 – Algorithm theory, x25 – Programming methods. The correlation
matrix is presented in [49].
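Analyses of this kind start from a correlation matrix rather than from raw feature vectors, so the correlations must first be turned into dissimilarities before any projection method can be applied. The sketch below, on an invented 4×4 correlation matrix, uses the transform 1 − |r| and classical MDS purely as stand-ins; the actual transform and projection used in [49] may differ.

```python
import numpy as np

def corr_to_dissimilarity(R):
    """Map correlations to dissimilarities: |r| near 1 -> close, near 0 -> far.
    The transform 1 - |r| is one common choice, assumed here for illustration."""
    return 1.0 - np.abs(np.asarray(R, float))

def classical_mds(D, d=2):
    """Classical (Torgerson) MDS of a dissimilarity matrix into d dimensions."""
    m = len(D)
    J = np.eye(m) - np.ones((m, m)) / m        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(B)             # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:d]           # keep the top-d eigenpairs
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

# Invented correlation matrix for 4 "subjects": 1-2 and 3-4 are correlated pairs.
R = np.array([[1.0, 0.9, 0.1, 0.0],
              [0.9, 1.0, 0.0, 0.1],
              [0.1, 0.0, 1.0, 0.8],
              [0.0, 0.1, 0.8, 1.0]])
P = classical_mds(corr_to_dissimilarity(R))
print(P.shape)   # one 2-D point per subject
```

In the resulting plot, strongly correlated subjects land close together, which is exactly the property exploited when judging whether a computer science subject sits nearer the mathematical group or the humanities.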
[Flattened SOM table: distribution of the 25 subjects among the map cells — occupied cells: {7, 14, 22, 23}; {13}; {9, 12}; {4, 17, 20, 21}; {8, 16, 18, 19, 25}; {15}; {1, 2, 6, 11}; {10, 24}; {3, 5}]
The results of the combined mapping (4×4 SOM and Sammon's mapping) are
presented in Figure 10. In Figure 10, mathematical subjects are set in bold,
computer science subjects are underlined, and the humanities are set in italics.
This notation makes the results easier to perceive. The visual analysis of the
distribution of points in Figure 10 leads to the conclusion that computer science subjects
range from those of a purely mathematical nature to those close to the humanities. However,
these subjects do not separate clearly from the groups of mathematical subjects or the humanities.
[Figure 11 plot: points labelled with the school numbers 1–28; occupied positions include {1}; {5, 12}; {7, 10}; {4, 9, 26}; {6, 11, 15}; {2, 16, 17}; {8, 14}; {22, 24}; {13}; {20}; {23}; {3, 28}; {18}; {19}; {21, 25, 27}]
Figure 11. Distribution of the Panevėžys town and district schools in the 1999/2000 school year
4.2.2. Analysis of the Economic and Social Conditions of Central European Countries
Ten countries were compared in [33]: (1) Hungary, (2) Czech Republic, (3) Lithuania,
(4) Latvia, (5) Slovakia, (6) Poland, (7) Romania, (8) Estonia, (9) Bulgaria, (10)
Slovenia. The data on the economic and social conditions of the Central European
countries were taken from the USA CIA World Factbook 1999 database [52] and the
World Bank Country Data database [53]. Widely used essential economic and social
parameters were selected: the infant mortality rate (deaths per 1000 live births); the Gross
Domestic Product (GDP) per capita in US dollars; the percentage of GDP produced in
industry and services (not in agriculture); the export per capita in thousands of
US dollars; and the number of telephones per capita. The results of the combined mapping (4×4
SOM and Sammon's mapping) are presented in Figure 12. The average values of the
parameters compose the point AVE in Figure 12, the worst values compose
MIN, and the best values compose MAX. From the results presented visually
in Figure 12, comparative conclusions on the economic and social conditions of the
analysed countries can be drawn easily.
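The AVE, MIN, and MAX reference points are simply synthetic vectors appended to the data before mapping; forming them requires knowing, for each parameter, whether larger or smaller values are "better". A rough sketch with invented numbers follows; the per-parameter directions and the [0, 1] scaling are assumptions for illustration, not details taken from [33].

```python
import numpy as np

# Toy stand-in for the five socio-economic parameters of ten countries
# (columns: infant mortality, GDP per capita, % GDP outside agriculture,
#  export per capita, telephones per capita).  All values are invented.
rng = np.random.default_rng(1)
X = rng.uniform([3, 1000, 60, 0.2, 0.1],
                [20, 15000, 95, 3.0, 0.6], size=(10, 5))

# For the first parameter a smaller value is better (mortality);
# for the remaining four a larger value is better.
larger_is_better = np.array([False, True, True, True, True])

best = np.where(larger_is_better, X.max(0), X.min(0))   # the MAX point
worst = np.where(larger_is_better, X.min(0), X.max(0))  # the MIN point
ave = X.mean(0)                                         # the AVE point

# Append the three reference vectors, then scale each parameter to [0, 1]
# so that no single unit dominates the distances used by the mapping.
A = np.vstack([X, ave, worst, best])
A = (A - A.min(0)) / (A.max(0) - A.min(0))
print(A.shape)  # (13, 5): 10 countries + AVE, MIN, MAX
```

After the combined mapping, a country's position relative to the AVE, MIN, and MAX points gives an immediate visual ranking, which is how Figure 12 is read.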
[Figure 12 plot: countries 1–10 together with the reference points AVE, MIN, and MAX; visible groupings: {1}; {4, 8}; {6}; {5}; {AVE}; {3}; {2}; {MAX, 10}; {9}; {MIN, 7}]
Figure 16. Projection of the physiological data, obtained by SAMANN: (a) Groups 1 and 3; (b) all three
groups
5. Conclusions
With the massive data sets existing in practice, we have experienced the difficulty and
complexity of constructing computational tools that extend human analysis capabilities
to higher dimensions. A well-performed visualization extends human perceptual
capabilities, providing a deeper visual insight into the data. One of the effective ways of
visualizing multidimensional data is dimensionality reduction. Dimensionality
reduction methods based on neural networks have been discussed here, with emphasis
on combining the SOM with Sammon's mapping and on SAMANN, the neural network for
Sammon's mapping.
Large-scale applications have been discussed: environmental data analysis, statistical
analysis of curricula, comparison of schools, analysis of the economic and social
conditions of countries, and analysis of data on the eye fundus and of physiological
data on men's health. In all these cases, the visualization enabled us to display the
properties of data with complex relations.
References
[1] D.A. Keim, M. Ward, Visualization. In: M. Berthold, D. J. Hand (eds.) Intelligent Data Analysis: an
Introduction, Springer-Verlag, (2003), 403-427.
[2] P.E. Hoffman, G.G. Grinstein. A Survey of Visualizations for High-Dimensional Data Mining. In:
U.Fayyad, G.G. Grinstein, A. Wierse (eds.) Information Visualization in Data Mining and Knowledge
Discovery, Morgan Kaufmann Publishers, San Francisco, 2002.
[3] G. Grinstein, M. Trutschl, U. Cvek, High-Dimensional Visualizations, Proceedings of Workshop on
Visual data Mining, ACM Conference on Knowledge Discovery and data Mining (2001), 1-14.
[4] S. Kaski, Data Exploration Using Self-Organizing Maps. In: Acta Polytechnica Scandinavica,
Mathematics, Computing and Management in Engineering Series 82, Espoo: Finnish Academy of
Technology, 1997, http://www.cis.hut.fi/~sami/thesis/thesis_tohtml.html.
[5] T. Kohonen, Self-Organizing Maps (3rd ed.). Springer Series in Inform. Sciences 30, Springer, 2001.
[6] O. Kurasova, Visual Analysis of Multidimensional Data Using Self-Organizing Maps (SOM),
Dissertation, Vilnius: MII, 2005.
[7] D.N. Lawley, A.E. Maxwell, Factor Analysis as a Statistical Method. London: Butterworths, 1963.
[8] E. Oja, Principal Components, Minor Components, and Linear Neural Networks, Neural Networks 5
(1992), 927-935.
[9] J. Rubner, P. Tavan, A Self-Organizing Network for Principal Component Analysis, Europhysics
Letters 10 (1989), 693-698.
[10] C. Brunsdon, A.S. Fotheringham, M.E. Charlton, An Investigation of Methods for Visualising Highly
Multivariate Datasets in Case Studies of Visualization in the Social Sciences, In: D. Unwin and P.
Fisher (eds.) Joint Information Systems Committee, ESRC, Technical Report Series 43 (1998), 55-80.
[11] I. Borg, P. Groenen, Modern Multidimensional Scaling: Theory and Applications, Springer, 1997.
[12] J. de Leeuw, W. Heiser, Theory of Multidimensional Scaling. In: P.R.Krishnaiah and L.N.Kanal (eds.),
Handbook of Statistics 2, Amsterdam: North-Holland (1982), 285-316.
[13] M. Wish, J.D. Carroll, Multidimensional Scaling and Its Applications. In: P.R.Krishnaiah and
L.N.Kanal (eds.), Handbook of Statistics 2, Amsterdam: North-Holland (1982), 317-345.
[14] J.W. Sammon, A Nonlinear Mapping for Data Structure Analysis, IEEE Transactions on Computers 18
(1969), 401-409.
[15] G. Biswas, A.K. Jain, R.C. Dubes, Evaluation of Projection Algorithms, IEEE Transactions on Pattern
Analysis and Machine Intelligence 3(6) (1981), 701-708.
[16] A. Flexer, Limitations of Self-Organizing Maps for Vector Quantization and Multidimensional Scaling.
In: M.C.Mozer, M.I.Jordan and T.Petsche (eds.), Advances in Neural Information Processing Systems 9.
Cambridge, MA: MIT Press/Bradford Books (1997), 445-451.
[17] T. Hastie, W. Stuetzle, Principal Curves, Journal of the American Statist. Associat. 84 (1989), 502-516.
[18] C.M. Bishop, M. Svensén, C.K.I. Williams, GTM: A Principled Alternative to the Self-Organizing Map,
Advances in Neural Information Processing Systems 9 (1997), 354-363.
[19] S. T. Roweis, L. Saul, Nonlinear Dimensionality Reduction by Locally Linear Embedding, Science 290
(2000), 2323-2326.
[20] J. Mao, A.K. Jain, Artificial Neural Networks for Feature Extraction and Multivariate Data Projection,
IEEE Trans. Neural Networks 6 (1995), 296-317.
[21] D. Lowe, M.E. Tipping, Feed-forward Neural Networks and Topographic Mappings for Exploratory
Data Analysis, Neural Computing and Applications 4 (1996), 83-95.
[22] M.C. van Wezel, W.A. Kosters, Nonmetric Multidimensional Scaling: Neural networks Versus
Traditional Techniques. Intelligent Data Analysis 8(6) (2004), 601-613.
[23] J. Hartigan, Clustering Algorithms. New York: Wiley, 1975
[24] H. Yin, ViSOM - A Novel Method for Multivariate Data Projection and Structure Visualisation, IEEE
Trans. on Neural Networks 13 (2002), 237-243.
[25] P. Demartines, J. Hérault, Curvilinear Component Analysis: A Self-Organizing Neural Network for
Nonlinear Mapping of Data Sets, IEEE Transactions on Neural Networks 8(1) (1997), 148-154.
[26] P. Baldi, K. Hornik, Neural Networks and Principal Component Analysis: Learning from Examples
Without Local Minima, Neural Networks 2 (1989), 53-58.
[27] M.A. Kramer. Nonlinear Principal Component Analysis Using Autoassociative Neural Networks,
AIChE Journal 37(2) (1991), 233–243.
[28] M.E. Tipping, Topographic Mappings And Feed-Forward Neural Networks, Dissertation, Aston
University, Birmingham, 1996.
[29] D. Lowe, Radial Basis Function Networks, In: M. A. Arbib (Ed.), The Handbook of Brain Theory and
Neural Networks, Cambridge, Mass: MIT Press, (1995) 779–782.
[30] G.E. Hinton, R.R. Salakhutdinov, Reducing The Dimensionality Of Data With Neural Networks,
Science 313 (2006), 504 – 507.
[31] D. de Ridder, R.P.W. Duin, Sammon’s Mapping Using Neural Networks: A comparison, Pattern
Recognition Letters 18 (1997), 1307-1316.
[32] G. Dzemyda, Visualization of a Set of Parameters Characterized by Their Correlation Matrix,
Computational Statistics and Data Analysis 36(1) (2001), 15-30.
[33] G. Dzemyda, V. Tiešis, Visualization of Multidimensional Objects and the Socio-Economical Impact to
Activity in EC RTD Databases, Informatica 12(2) (2001), 239-262.
[34] G. Dzemyda, O. Kurasova, Heuristic Approach for Minimizing the Projection Error in the Integrated
Mapping, European Journal of Operational Research 171(3) (2006), 859-878.
[35] P. Hellemaa, The Development of Coastal Dunes and their Vegetation in Finland, Dissertation, Fennia
176:1, Helsinki, 1998 (downloadable from
http://ethesis.helsinki.fi/julkaisut/mat/maant/vk/hellemaa/index.html).
[36] G. Dzemyda, O. Kurasova, Dimensionality Problem in the Visualization of Correlation-Based Data,
Proceedings of International Conference on Adaptive and Natural Computing Algorithms – ICANNGA
2007, Lecture Notes in Computer Science 4432 (2007), 544-553.
[37] R.C.T. Lee, J.R. Slagle, H. Blum, A Triangulation Method for the Sequential Mapping of Points from N-
Space to Two-Space, IEEE Transactions on Computers 27 (1977), 288-292.
[38] V. Medvedev, G. Dzemyda, Optimization of the Local Search in the Training for SAMANN Neural
Network, Journal of Global Optimization, Springer 35 (2006), 607-623.
[39] R.A. Fisher, The Use of Multiple Measurements in Taxonomic Problems, Annals of Eugenics 7(II) (1936),
179-188.
[40] D. Ruppert, R.J. Carroll, Trimmed Least Squares Estimation in the Linear Model, Journal of the
American Statistical Association 75 (1980), 828-838.
[41] D.M. Hawkins, D. Bradu, G.V. Kass, Location of Several Outliers in Multiple Regression Data Using
Elemental Sets, Technometrics 26, (1984), 197–208.
[42] S. Ivanikovas, V. Medvedev, G. Dzemyda, Parallel Realizations of the SAMANN Algorithm,
Proceedings of International Conference on Adaptive and Natural Computing Algorithms – ICANNGA
2007, Lecture Notes in Computer Science 4432 (2007), 179-188.
[43] A. El-Shaarawi, W. Piegorsch, Encyclopedia of Environmetrics, Wiley, 2001.
[44] G. Dzemyda, Visualization of the Correlation-Based Environmental Data, Environmetrics 15(8) (2004),
827-836.
[45] Water Research Commission, Linking Groundwater Chemistry to atypical Lymphocytes. SA
Waterbulletin 25(2), 1999, http://www.wrc.org.za/wrcpublications/wrcbulletin/v25_2/lympho.htm
[46] R.L. Smith, CBMS course in environmental statistics, 2001,
http://www.stat.unc.edu/postscript/rs/envstat/env.html
[47] D.G. Fautin, R.W. Buddemeier, Biogeoinformatics of Hexacorallia (Corals, Sea Anemones, and Their
Allies): Interfacing Geospatial, Taxonomic, and Environmental Data for a Group of Marine
Invertebrates, 2001, http://www.kgs.ukans.edu/Hexacoral/Envirodata/Correlations/correl1.htm
[48] E.N. Ieno, Las Comunidades Bentónicas de Fondos Blandos del Norte de la Provincia de Buenos Aires:
Su rol Ecológico en el Ecosistema Costero. Tesis Doctoral Universidad Nacional de Mar del Plata,
Zoobenthic species measured in an intertidal area in Argentina, 2000,
http://www.brodgar.com/benthos.htm
[49] G. Dzemyda, Application of Mapping Methods to the Analysis of Curricula of Studies, Computational
Statistics & Data Analysis 49 (2005), 265-281.
[50] V. Šaltenis, G. Dzemyda, V. Tiešis, Quantitative Forecasting and Assessment Models in the State
Education System, Informatica 13(4) (2002), 485-500.
[51] G. Dzemyda, V. Šaltenis, V. Tiešis, Forecasting Models in the State Education System, Informatics in
Education 2(1) (2003) 3-14.
[52] The World Factbook, 1999, http://www.odci.gov/cia/publications/factbook/country.html.
[53] The World Bank Group-Development Data, 1999, http://www.worldbank.org/data/.
[54] A. Paunksnis, V. Barzdžiukas, D. Jegelevičius, S. Kurapkienė, G. Dzemyda, The Use of Information
Technologies for Diagnosis in Ophthalmology, Journal of Telemedicine and Telecare 12 (2006), 37-40.
[55] J. Bernataviciene, G. Dzemyda, O. Kurasova, V. Marcinkevicius, The Problem of Visual Analysis of
Multidimensional Medical Data. In: Torn A, Žilinskas J (eds) Models and Algorithms for Global
Optimization. Springer Optimization and its Applications 4 (2007), 277-298.
[56] J. Bernataviciene, G. Dzemyda, O. Kurasova, A. Vainoras, Integration of classification and
visualization for diagnosis decisions, International Journal of Information Technology and Intelligent
Computing 1(1) (2006), 57-68.
[57] J. Bernataviciene, G. Dzemyda, O. Kurasova, V. Marcinkevicius, Decision Support for Preliminary
Medical Diagnosis Integrating the Data Mining Methods. In: Pranevičius H, Vaarmann O, Zavadskas E
(eds.) Proceedings of 5th International Conference on Simulation and Optimisation in Business and
Industry (2006), 155-160.