8-Som With E-Miner
SAS E-MINER
C.Sarada, K.Alivelu and Lakshmi Prayaga
Directorate of Oilseeds Research, Rajendranagar, Hyderabad
saradac@yahoo.com
Self-organising maps (SOM) (Kohonen, 2001) are a specific family of neural
networks that use unsupervised training. In unsupervised training no target output is provided and
the network evolves until it stabilises. SOM can be used for data visualisation, clustering,
estimation, vector projection and a variety of other purposes. It is an effective modelling tool for
the visualisation of high-dimensional data. Non-linear statistical relationships between
high-dimensional data are converted into simple geometric relationships of their image points on a
low-dimensional display, usually a two-dimensional grid of nodes. The SOM is inspired by the way
in which various human sensory impressions are neurologically mapped into the brain, such that
spatial or other relationships between stimuli correspond to spatial relationships among the
neurons.
A general architecture of SOM consists of a set of input nodes, output nodes and weight
parameters. Each input node is fully connected to every output node via a variable connection. A
weight parameter is associated with each of these connections. The weights between the input
nodes and output nodes are iteratively changed during the learning phase until a termination
criterion is satisfied. For each input vector, there is one associated winner node on the output
map.
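The learning phase described above can be sketched in a few lines of Python. This is only a minimal illustration and not the algorithm implemented inside Enterprise Miner: the function names train_som and winner, the fixed number of passes used as the termination criterion, the linearly decaying learning rate and the Gaussian neighbourhood are all assumptions made for the example.

```python
import math
import random


def winner(weights, x):
    """Return the grid position (row, col) of the node whose weights are closest to x."""
    return min(weights, key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)))


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a small SOM and return its weights keyed by (row, col) grid position."""
    rng = random.Random(seed)
    # One weight vector per output node, initialised from randomly chosen input vectors.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):  # assumed termination criterion: a fixed number of passes
        decay = 1.0 - epoch / epochs
        lr = lr0 * decay                    # learning rate shrinks over time
        radius = max(radius0 * decay, 0.5)  # neighbourhood shrinks over time
        for x in rng.sample(data, len(data)):  # visit the input vectors in random order
            win = winner(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - win[0]) ** 2 + (node[1] - win[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * radius ** 2))  # Gaussian neighbourhood
                for i, xi in enumerate(x):
                    w[i] += lr * h * (xi - w[i])  # pull the weights towards the input
    return weights


# Made-up two-dimensional inputs, for illustration only.
data = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
som = train_som(data, rows=2, cols=4)
print(winner(som, [5.0, 5.0]))  # grid position of the winner node for this input
```

Each input vector pulls its winner node, and to a lesser extent the neighbouring nodes on the grid, towards itself, which is how nearby nodes come to represent similar inputs.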
highstress (genotypes germinated at high stress conditions) has been created. Make a SAS
dataset file named stress in the SASUSER library.
Analysis of data with SOM with Enterprise Miner 6.1 - A step-wise Procedure:
Create the diagram SOM.
Create the input file stress and assign the roles and levels for the variables. Drag the input
file to the diagram area and name it stress.
Go to the Explore tab, click and drag the SOM/Kohonen node to the diagram, and connect the
input file named stress to the SOM/Kohonen node.
Highlight the SOM/Kohonen node; its property sheet can be seen in the left panel.
[Figure: SOM/Kohonen node property sheet, with callouts: set of tables imported by this node; set of tables exported by this node; information about the analysis; variable properties; select the SOM/Kohonen method to use.]
Change the options available for the SOM/Kohonen node in the left panel.
Change the following options:
Internal standardization to the standardisation option (if required for the data),
Row size to 2 and column size to 4 (a grid size of 2 x 4 = 8 clusters).
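To see what the two settings above amount to, the following Python sketch standardises a variable and computes the number of clusters implied by the grid size. It assumes standardisation means the usual z-score (subtract the mean, divide by the sample standard deviation); the exact formula applied internally by the node is not stated here, and the variable name germination_pct and its values are hypothetical.

```python
import statistics


def standardise(values):
    """Return z-scores: (value - mean) / sample standard deviation (assumed form)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Grid size chosen in the property sheet: 2 rows x 4 columns.
rows, cols = 2, 4
n_clusters = rows * cols  # 8 output nodes, i.e. 8 clusters

# Hypothetical values of one input variable, for illustration only.
germination_pct = [20.0, 35.0, 50.0, 65.0, 80.0]
z_scores = standardise(germination_pct)
print(n_clusters, z_scores)
```

After standardisation every variable has mean 0 and standard deviation 1, so no single variable dominates the distance calculation merely because it is measured on a larger scale.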
Right-click the SOM/Kohonen node and select the Run option; this gives the following
window.
Click the Results tab. The following results can be viewed from the results view.
Selecting the Nearest Cluster option gives the following map. To view the table, click the View tab
and then Table.
The SOM segment ID gives the cluster number; for example, SOM segment ID 1:1 is cluster 1 and
2:1 is cluster 5. From the above figure it can be observed that cluster 1 and cluster 3 are distinct from the others.
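The two examples above (1:1 = cluster 1 and 2:1 = cluster 5 on the 2 x 4 grid) are consistent with numbering the grid row by row. A small Python sketch of that mapping follows; the row-by-row rule is inferred from these two examples only, and the function name segment_to_cluster is hypothetical.

```python
def segment_to_cluster(row, col, n_cols):
    """Map a 1-based SOM segment ID row:col to a cluster number, counting row by row."""
    if row < 1 or not 1 <= col <= n_cols:
        raise ValueError("row and col must be 1-based positions on the grid")
    return (row - 1) * n_cols + col


# The 2 x 4 grid used in this example has 4 columns.
print(segment_to_cluster(1, 1, n_cols=4))  # segment 1:1
print(segment_to_cluster(2, 1, n_cols=4))  # segment 2:1
```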
The Mean Statistics window gives the cluster-wise means of the variables.
The summary statistics of the clusters (min, max, standard deviation) can be seen in the Analysis
Statistics window.
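The statistics shown in these windows are ordinary per-cluster summaries. As a rough sketch of the calculation (not of how Enterprise Miner computes or lays out its tables), the Python function below groups observations by cluster number and reports the mean, minimum, maximum and sample standard deviation of each variable. The function name cluster_summary and the small data set are made up for illustration.

```python
import statistics
from collections import defaultdict


def cluster_summary(rows, labels):
    """Return {cluster: [per-variable dict of mean, min, max, std]}."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    summary = {}
    for label, members in groups.items():
        columns = list(zip(*members))  # one tuple of values per variable
        summary[label] = [
            {
                "mean": statistics.fmean(col),
                "min": min(col),
                "max": max(col),
                # The sample standard deviation needs at least two observations.
                "std": statistics.stdev(col) if len(col) > 1 else 0.0,
            }
            for col in columns
        ]
    return summary


# Three made-up observations on two variables; the first two fall in cluster 1.
rows = [[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]]
labels = [1, 1, 2]
for cluster, stats in cluster_summary(rows, labels).items():
    print(cluster, stats)
```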
To study the properties of each cluster in detail, we can use the Segment Profile node.
Click Assess, drag the Segment Profile icon to the diagram area, connect it to the
SOM/Kohonen node, then right-click and run.
The results output of the Segment Profile node is presented below.
The segment profile gives the frequency of each cluster as a pie chart. The Profile window
displays a lattice, or grid, of plots comparing the distribution for the identified and report
variables for both the segment and the total number of observations. Each row represents a single
cluster. The far left margin identifies the cluster/segment, its count, and percentage of the total
observations. By default, the rows are sorted in ascending size order from top to bottom. You can
also sort rows alphanumerically by segment name by right-clicking to get the edit menu. Select
Sort Segments. We can also change the response variable format to the count or the percent of
the entire data and expand a graphic by using the edit menu. Class and interval variables are
represented as follows.
Class variable: displayed as two nested pie charts that consist of two concentric rings. The
inner ring represents the distribution of the total observations. The outer ring represents the
distribution for the given segment.
Interval variable: displayed as a histogram. The blue shaded region represents the
within-segment distribution. The red outline represents the population distribution. The height of the
histogram bars can be scaled by count or by percentage of the segment population. When you are
using the percentage, the view shows the relative difference between the segment and the
population. When you are using count, the view shows the absolute difference between the
segment and total observations.
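The difference between the count and percentage scales can be illustrated with a small Python sketch. The helper names histogram and as_percent, the bin edges and the data values are all hypothetical; the sketch only mimics the comparison made in the Profile window and does not reproduce the plots.

```python
def histogram(values, edges):
    """Count values per bin [edges[i], edges[i+1]); the last bin also includes its upper edge."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    return counts


def as_percent(counts):
    """Rescale bin counts to percentages of their own total."""
    total = sum(counts)
    return [100.0 * c / total for c in counts]


# Made-up interval variable: all observations, and the subset falling in one segment.
population = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
segment = [4, 4, 5, 5]
edges = [1, 3, 5]

pop_counts = histogram(population, edges)
seg_counts = histogram(segment, edges)
print("count scale:  ", seg_counts, "vs", pop_counts)
print("percent scale:", as_percent(seg_counts), "vs", as_percent(pop_counts))
```

On the count scale the segment bars can never exceed the population bars, so the view shows absolute differences; on the percentage scale each distribution sums to 100, which makes the shapes directly comparable.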
The output window contains the variable summary, frequency information for each cluster, and
Decision Tree Importance Profiles, which display the logworth or importance statistics for the variables
that have been identified as factors that distinguish the segment from the total. If you scroll
through the Segment Profile node's output window, each set of variables is listed cluster/segment-wise
with the worth statistic and rank of each variable. In the above figure it can
be seen that the variable g5 contributed most to the formation of cluster/segment 7. The same
is represented as a bar diagram in the Variable Worth window.
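Ranking variables by worth can be sketched as follows, assuming the usual definition of logworth as the negative base-10 logarithm of a p-value. The p-values below are invented purely for illustration (they are not results from the stress data), the variable names other than g5 are hypothetical, and the exact statistic reported by the Segment Profile node may be computed differently.

```python
import math


def logworth(p_value):
    """Return -log10(p): smaller p-values give larger logworth."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must lie in (0, 1]")
    return -math.log10(p_value)


# Invented p-values for three variables within one segment (illustration only).
p_values = {"g1": 0.2, "g3": 0.03, "g5": 0.0001}

# Rank 1 goes to the variable with the largest logworth.
ranked = sorted(p_values, key=lambda name: logworth(p_values[name]), reverse=True)
for rank, name in enumerate(ranked, start=1):
    print(rank, name, round(logworth(p_values[name]), 2))
```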
References
Chen, S.K., Mangiameli, P. and West, D. (1995). The comparative ability of self-organizing neural
networks to define cluster structure. Omega, Int. J. Manage. Sci., 23, 271-279.
Mangiameli, P., Chen, S.K. and West, D. (1996). A comparison of SOM neural network and
hierarchical clustering methods. European Journal of Operational Research, 93, 402-417.
Collica, R.S. (2007). CRM Segmentation and Clustering Using SAS Enterprise Miner. SAS
Publishing.
SAS Enterprise Miner 6.1 Help Documentation.