
SELF-ORGANISING MAPPING NETWORKS (SOM) WITH SAS E-MINER
C. Sarada, K. Alivelu and Lakshmi Prayaga
Directorate of Oilseeds Research, Rajendranagar, Hyderabad
saradac@yahoo.com
Self-Organising Mapping networks (SOM) (Kohonen, 2001) are a specific family of neural
networks that use unsupervised training. In unsupervised training no target output is provided and
the network evolves until it stabilises. SOM can be used for data visualisation, clustering,
estimation, vector projection and a variety of other purposes. It is an effective modelling tool for
the visualisation of high-dimensional data. Non-linear statistical relationships between high-dimensional
data are converted into simple geometric relationships of their image points on a
low-dimensional display, usually a two-dimensional grid of nodes. The SOM is inspired by the way
in which various human sensory impressions are neurologically mapped into the brain, such that the
spatial or other relationships between stimuli correspond to spatial relationships among the
neurons.
A general architecture of SOM consists of a set of input nodes, output nodes and weight
parameters. Each input node is fully connected to every output node via a variable connection. A
weight parameter is associated with each of these connections. The weights between the input
nodes and output nodes are iteratively changed during the learning phase until a termination
criterion is satisfied. For each input vector, there is one associated winner node on the output
map.
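As a rough illustration of this architecture (a minimal NumPy sketch, not the implementation used by SAS E-Miner; the sizes are only chosen to echo the safflower illustration later in this note, with nine input variables and a 2 x 4 output grid), the weights can be stored as one vector per output node, and the winner node for an input vector is the node whose weight vector lies closest to it:

import numpy as np

# Hypothetical sizes: 9 input variables mapped onto a 2 x 4 output grid (8 nodes).
n_inputs, grid_rows, grid_cols = 9, 2, 4
n_outputs = grid_rows * grid_cols

# One weight vector per output node; each has as many components as there are inputs.
rng = np.random.default_rng(0)
weights = rng.random((n_outputs, n_inputs))

def winner_node(x, weights):
    """Return the index of the output node whose weight vector is closest to x."""
    distances = np.linalg.norm(weights - x, axis=1)
    return int(np.argmin(distances))

x = rng.random(n_inputs)          # one input vector
print(winner_node(x, weights))    # index of the associated winner node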

A simple SOM Algorithm


Each item in the data set recognises itself by competing for representation. SOM mapping
starts by initialising the weight vectors. From there a sample vector is selected at random
and the map of weight vectors is searched to find which weight best represents that sample. Each
weight vector has neighbouring weights that are close to it. The chosen weight is rewarded
by being allowed to become more like the randomly selected sample vector. The neighbours of that
weight are also rewarded by being allowed to become more like the chosen sample vector. As training
proceeds, the number of neighbours and how much each weight can learn decrease over time. This
whole process is repeated a large number of times, usually more than 1000 times.


In sum, learning occurs in several steps and over many iterations:


1. Each node's weights are initialised.
2. A vector is chosen at random from the set of training data.
3. Every node is examined to find which one's weights are most like the input vector.
The winning node is commonly known as the Best Matching Unit (BMU).
4. Then the neighbourhood of the BMU is calculated. The number of neighbours decreases
over time.
5. The winning weight is rewarded by becoming more like the sample vector. The
neighbours also become more like the sample vector. The closer a node is to the BMU, the
more its weights get altered; the farther away a neighbour is from the BMU, the less
it learns.
6. Repeat from step 2 for N iterations. A minimal sketch of these steps is given below.
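The following sketch implements these six steps in NumPy. It is only illustrative: the grid size, learning rate, neighbourhood radius and their linear decay are assumptions made for this example, and the SOM/Kohonen node in E-Miner uses its own internal defaults.

import numpy as np

def train_som(data, grid_rows=2, grid_cols=4, n_iter=2000,
              lr0=0.5, sigma0=None, seed=0):
    """Train a simple SOM on `data` (n_samples x n_features)."""
    rng = np.random.default_rng(seed)
    n_features = data.shape[1]
    if sigma0 is None:
        sigma0 = max(grid_rows, grid_cols) / 2.0   # initial neighbourhood radius

    # Step 1: initialise one weight vector per grid node.
    weights = rng.random((grid_rows, grid_cols, n_features))
    # Grid coordinates of every node, used for neighbourhood distances.
    coords = np.stack(np.meshgrid(np.arange(grid_rows),
                                  np.arange(grid_cols), indexing="ij"), axis=-1)

    for t in range(n_iter):
        # Step 2: pick a training vector at random.
        x = data[rng.integers(len(data))]

        # Step 3: find the Best Matching Unit (BMU).
        dists = np.linalg.norm(weights - x, axis=2)
        bmu = np.unravel_index(np.argmin(dists), dists.shape)

        # Step 4: neighbourhood radius and learning rate shrink over time.
        frac = t / n_iter
        sigma = sigma0 * (1.0 - frac) + 1e-3
        lr = lr0 * (1.0 - frac)

        # Step 5: move the BMU and its neighbours towards the sample;
        # nodes far from the BMU on the grid learn less.
        grid_dist = np.linalg.norm(coords - np.array(bmu), axis=2)
        influence = np.exp(-(grid_dist ** 2) / (2 * sigma ** 2))
        weights += lr * influence[..., None] * (x - weights)

        # Step 6: the for-loop repeats from step 2 for n_iter iterations.
    return weights

# Usage on standardised placeholder data; each row is later assigned to its BMU cluster.
data = np.random.default_rng(1).normal(size=(29, 9))
w = train_som(data)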
SOM vs. Classical Clustering methods
Many studies have compared the SOM with classical clustering methods (Chen et al., 1995;
Mangiameli et al., 1996; Waller et al., 1998). Chen et al. (1995) investigated the performance of
SOM and hierarchical clustering methods and found that hierarchical methods are influenced by
the relative dispersion of the data. Mangiameli et al. (1996) tested the performance of the SOM
neural network and seven hierarchical clustering methods on 252 data sets with various
levels of imperfections, including data dispersion, outliers, irrelevant variables and
non-uniform cluster densities. Their study revealed that SOM is superior in accuracy and robustness
compared to the other clustering methods.
SOMs are conceptually easy to understand and are more efficient for grouping large datasets than
smaller ones, for example microarray experiments for gene expression studies where thousands of
genes/observations are involved, or grouping of customers in large business/banking sectors. In
SAS Enterprise Miner, the profiling portion is very similar to the clustering technique. However,
there are limitations: 1. SOM networks can be prone to issues with missing data, as with all other
neural network algorithms and regressions. 2. SOMs can produce differing results, as they
produce maps from sampled data, so it may take a number of trials to obtain a map that is
consistent with the same training data. 3. They are rather computationally intensive.
Illustration
Data: A lab experiment was conducted at the Directorate of Oilseeds Research, Hyderabad to study
the response of 29 safflower genotypes to water stress induced by PEG and to delineate the
tolerant genotypes from the susceptible ones. Observations on germination percentage, days to
minimum germination and seedling vigour were recorded for different stress levels. The genotypes
that germinated under high stress conditions were also recorded. Thus the main aim of the experiment is to
classify the genotypes into different groups based on these parameters.
A dataset Stress.xls has been created with the variables sno and genotype; the interval variables g3, g4, g5
(germination percentage at 3 different stress levels) and s3, s4, s5 (corresponding seedling vigour);
the ordinal variables sd3, sd4, sd5 (days to maximum germination); and the binary variable
highstress (genotypes germinated at high stress conditions). Make a SAS
dataset file named stress in the SASUSER library.
Analysis of data with SOM using Enterprise Miner 6.1 - A step-wise Procedure:
Create the Diagram SOM
Create the input file stress, assign the roles and levels for the variables, drag the input file to
the diagram area and name the input file stress.

Go to the Explore tab, then click and drag the SOM/Kohonen node to the diagram and connect the
input file named stress to the SOM/Kohonen node.

Highlight the SOM/Kohonen node; the property sheet can be observed in the left panel.


The property sheet shows the set of tables imported by this node, the set of tables exported by
this node, information about the analysis, the variable properties, and the option to select which
SOM/Kohonen method to use.

The options available with the SOM/Kohonen node are presented in the left panel.
Change the following options:
Internal Standardization to the Standardization option (if required for the data),
Row to 2 and Column to 4 (a grid size of 2 x 4 = 8 clusters).
Go to the SOM/Kohonen node, then right-click and select the Run option; this gives the following
window.
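For illustration, these two settings correspond roughly to the following preprocessing and grid choice (a sketch with placeholder data, not what the SOM/Kohonen node executes internally):

import numpy as np

rng = np.random.default_rng(2)

# Placeholder stand-in for the interval inputs (29 genotypes x 6 interval variables).
raw = rng.normal(loc=50.0, scale=10.0, size=(29, 6))

# Internal Standardization -> Standardization: rescale every input variable
# to mean 0 and standard deviation 1 before distances are computed.
standardised = (raw - raw.mean(axis=0)) / raw.std(axis=0)

# Row = 2 and Column = 4 give a 2 x 4 map, i.e. 2 * 4 = 8 output nodes,
# so the genotypes can fall into at most 8 clusters/segments.
grid_rows, grid_cols = 2, 4
n_segments = grid_rows * grid_cols
print(n_segments)  # 8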


Click on the Results tab; the following results can be seen in the Results view.

Only the main result windows are discussed here.


The Map window gives a topological mapping of all the input attributes to the clusters. The
following figure gives the different attributes for viewing the topological map.

Selecting the Nearest Cluster option gives the following map. To view the table, click the View
tab and select Table.

We can see that the SOM Segment ID gives the cluster number, for example SOM ID 1:1 = cluster 1 and
2:1 = cluster 5. From the above figure it can be observed that cluster 1 and cluster 3 are distinct from the others.
The Mean Statistics window gives the cluster-wise means of the variables.

The summary statistics of the clusters (min, max, standard deviation) can be seen in the Analysis
Statistics window.
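As an illustration of what these two windows report (the segment assignments and variable values below are made up for the example; the real ones come from the trained map), cluster-wise means and summary statistics can be computed as follows:

import numpy as np
import pandas as pd

rng = np.random.default_rng(3)

# Hypothetical stand-in: 29 genotypes with two interval variables and a SOM segment ID.
df = pd.DataFrame({
    "g5": rng.uniform(0, 100, size=29),
    "s5": rng.uniform(0, 30, size=29),
    "segment": rng.integers(1, 9, size=29),   # cluster numbers 1..8 from the 2 x 4 map
})

# Mean Statistics window: cluster-wise means of the input variables.
print(df.groupby("segment").mean())

# Analysis Statistics window: further summaries such as min, max and standard deviation.
print(df.groupby("segment").agg(["min", "max", "std"]))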

To study each cluster's properties in detail, we can use the Segment Profile node.

Click Assess, drag the Segment Profile icon to the diagram area, connect the node with the
SOM/Kohonen node, then right-click and run.
The Segment Profile node results output is presented below

The segment profile gives the frequency of each cluster as a pie chart. The Profile window
displays a lattice, or grid, of plots comparing the distribution for the identified and report
variables for both the segment and the total number of observations. Each row represents a single
cluster. The far left margin identifies the cluster/segment, its count, and percentage of the total
observations. By default, the rows are sorted in ascending size order from top to bottom. You can
also sort rows alphanumerically by segment name by right-clicking to get the edit menu. Select
Sort Segments. We can also change the response variable format to the count or the percent of
the entire data and expand a graphic by using the edit menu. The representation of class and interval
variables is as follows.
A class variable is displayed as two nested pie charts consisting of two concentric rings. The
inner ring represents the distribution of the total observations. The outer ring represents the
distribution for the given segment.
An interval variable is displayed as a histogram. The blue shaded region represents the within-segment
distribution. The red outline represents the population distribution. The height of the
histogram bars can be scaled by count or by percentage of the segment population. When you are
using percentage, the view shows the relative difference between the segment and the
population. When you are using count, the view shows the absolute difference between the
segment and the total observations.
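A small sketch of the two scalings (the variable, bins and segment membership below are hypothetical):

import numpy as np

g5_all = np.linspace(5, 95, 29)    # evenly spread population values of an interval variable
in_segment = g5_all < 40           # hypothetical membership flags for one segment
bins = np.linspace(0, 100, 6)

seg_counts, _ = np.histogram(g5_all[in_segment], bins=bins)
pop_counts, _ = np.histogram(g5_all, bins=bins)

# Count scaling: absolute difference between the segment and all observations.
print(seg_counts, pop_counts)

# Percentage scaling: each histogram relative to its own total,
# showing the relative difference between the segment and the population.
print(seg_counts / seg_counts.sum(), pop_counts / pop_counts.sum())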
The output window contains the variable summary and frequency information for each cluster, and
the Decision Tree Importance Profiles display the logworth or importance statistics for the variables
that have been identified as factors that distinguish the segment from the total. If you scroll
through the Segment Profile node's output window, each set of variables by cluster/segment,
with the worth statistic and rank for each variable, is provided. In the above figure it can
be seen that the variable g5 contributed most to the formation of cluster/segment 7. The same
is represented as a bar diagram in the Variable Worth window.
References
Chen, S.K., Mangiameli, P. and West, D. (1995). The comparative ability of self-organizing neural
networks to define cluster structure. Omega, Int. J. Manage. Sci., 23, 271-279.
Mangiameli, P., Chen, S.K. and West, D. (1996). A comparison of SOM neural network and
hierarchical clustering methods. European Journal of Operational Research, 93, 402-417.
Collica, R.S. (2007). CRM Segmentation and Clustering Using SAS Enterprise Miner. SAS
Publishing.
SAS Enterprise Miner 6.1 Help Documentation.
