Lesson 16 ANN Unsupervised


Artificial Neural Networks:

Unsupervised models

Lesson 16

20538 – Predictive Analytics for data-driven decision making

Daniele Tonini
daniele.tonini@unibocconi.it
Reminder
Supervised vs. Unsupervised learning

An important aspect of Artificial Neural Network models is whether they need supervision in learning or not:

• In supervised learning, a desired output target for each input vector is required when the network is trained. A supervised ANN, such as the Multi-Layer Perceptron, uses the target value to guide the formation of the neural weights: it is thus possible to make the neural network learn the behavior of the process under study

• In unsupervised learning, the training of the network is entirely data-driven and no target results for the input data vectors are provided. An unsupervised ANN, such as the Self-Organizing Map, can be used for clustering the input data and for finding features inherent to the problem

Self Organising Maps
Introduction

One interesting class of unsupervised ANNs is based on competitive learning, in which the output neurons compete among themselves to be activated, with the result that, in each single iteration of the learning process, only one unit is activated. The neurons are thus forced to organize themselves

This type of network is called a Self-Organizing Map (SOM)

• They were introduced by the Finnish professor Teuvo Kohonen in the 1980s, so they are sometimes called Kohonen maps

• They are unsupervised ANNs, useful for data clustering: the multi-dimensional relationships between the data are commonly represented on a two-dimensional output layer, enabling the representation of the data as clusters

• They are characterized by two neuronal layers: one input layer and one output layer (usually named the “Kohonen layer”); each input layer neuron is connected to all output layer neurons (the two layers are fully connected)

• In the output layer, for each input pattern that is supplied to the network, exactly one neuron “wins” (the one with the maximum activation value)

Self Organising Maps
Structure representation

Net output (winning neuron)

Kohonen layer / output layer: a two-dimensional grid with lateral connections

Intra-layer “lateral” connection properties:
• Defined according to some network topology
• These connections carry no weights, but are used by the algorithm when updating the weights

Connection weights* (adjusted by network training)

Input layer: receives the input signals

*For clarity only 3 connections are shown, but the input units are fully connected, with weights, to the output units
Self Organising Maps
Process

The self-organization process in a Kohonen Map involves four major phases:

1. Initialization: all the connection weights are initialized with small random values (usually between 0 and 1)

2. Competition: for each input pattern, the neurons compute their respective values of a similarity function, which provides the basis for competition. The neuron with the smallest value of the similarity function is declared the winner

3. Collaboration: the winning neuron determines the spatial location of a topological neighborhood of excited neurons, thereby providing the basis for collaboration among neighboring neurons

4. Learning: the excited neurons adjust their weights using a specific delta rule, such that the response of the winning neuron to a subsequent application of a similar input pattern is enhanced

Self Organising Maps
Competition phase

Consider a D-dimensional input space (i.e. there are D input variables): we can write the input patterns as x = {x_i : i = 1, …, D}, and the connection weights between the input units i and the neurons j in the Kohonen layer can be written w_j = {w_ij : j = 1, …, N; i = 1, …, D}, where N is the total number of neurons

We can use as similarity function the Squared Euclidean Distance between the input vector 𝒙 and the
weight vector 𝒘𝒋 for each neuron 𝑗

In this phase, the result of each iteration is the following: the neuron whose weight vector comes closest to the input vector (i.e. is most similar to it, meaning it has the lowest distance) is declared the winner (sometimes called the BMU, or Best Matching Unit)

By doing so, the continuous input space is mapped to the discrete output space of neurons by a simple
process of competition between the neurons
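As a minimal sketch (the function and variable names below are my own, not from the slides), the competition step can be written in a few lines of Python:

```python
import numpy as np

def find_bmu(x, W):
    """Return the index of the Best Matching Unit.

    x : input vector, shape (D,)
    W : weight matrix, shape (N, D) -- one row per output neuron
    The similarity function is the squared Euclidean distance;
    the neuron with the smallest distance wins.
    """
    d2 = np.sum((W - x) ** 2, axis=1)  # squared distances, shape (N,)
    return int(np.argmin(d2))

# Example with two output neurons and four inputs
W = np.array([[.2, .6, .5, .9],
              [.8, .4, .7, .3]])
x = np.array([1., 1., 0., 0.])
find_bmu(x, W)  # unit 2 (index 1) wins, since 0.98 < 1.86
```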

Self Organising Maps
Collaborative phase

In biology we find that there is lateral interaction within a set of excited neurons: when one neuron is
activated, its closest neighbors tend to get excited more than those further away
There is a topological neighborhood that decays with distance: we want to define a similar topological
neighborhood for the neurons in our SOM

If S_j,I(x) is the lateral distance between neuron j and the winning neuron on the grid of neurons, a Gaussian function T is commonly used as the neighborhood function:

T_j,I(x) = exp( −S²_j,I(x) / (2σ²) )

Where:
• I(x) is the index that identifies the winning neuron (BMU)
• σ is the standard deviation of the Gaussian function, which here can be interpreted as the “neighborhood radius” determining to what degree neighboring nodes adapt depending on their distance to the winning node
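A sketch of this neighborhood function, assuming a precomputed matrix S of lateral grid distances between neurons (the names are illustrative):

```python
import numpy as np

def neighborhood(S, bmu, sigma):
    """Gaussian neighborhood T_j = exp(-S_j^2 / (2 sigma^2)) for every
    neuron j, given its lateral distance S[j, bmu] to the winning neuron."""
    return np.exp(-S[:, bmu] ** 2 / (2 * sigma ** 2))

# Toy example: 3 neurons on a 1-D grid (lateral distances 0, 1, 2 apart)
S = np.array([[0., 1., 2.],
              [1., 0., 1.],
              [2., 1., 0.]])
T = neighborhood(S, bmu=0, sigma=1.0)
# T[0] equals 1 at the winner, and T decreases monotonically with distance
```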
Self Organising Maps
Collaborative phase

This neighborhood function has several important properties:


1. it is maximal at the winning neuron (where it equals 1)
2. it is symmetric about that neuron
3. it decreases monotonically to zero as the distance goes to infinity
4. it is independent of the location of the winning neuron

A special feature of the SOM is that the neighborhood radius needs to decrease with time, in order to optimize the learning process (bigger influence of the BMU on its neighbors in the initial steps of the learning process, smaller in the last ones); a popular time dependence is an exponential decay of the radius, such as:

σ(t) = σ₀ exp(−t / τ_σ)

Where:
• t is the time, measured as the number of processed training observations (i.e. iterations)
• σ₀ is the initial neighborhood radius (to be set in advance)
• τ_σ is the decay constant (to be set in advance)
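In code this decay is a one-liner; σ₀ = 3 and τ_σ = 1000 below are illustrative values, not values prescribed by the slides:

```python
import math

def radius(t, sigma0=3.0, tau_sigma=1000.0):
    """Neighborhood radius sigma(t) = sigma0 * exp(-t / tau_sigma)."""
    return sigma0 * math.exp(-t / tau_sigma)

radius(0)     # starts at sigma0 = 3.0 ...
radius(5000)  # ... and shrinks towards zero as training proceeds
```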

Self Organising Maps
Learning phase

Clearly our SOM must involve some kind of learning process by which the outputs become self-
organised and the feature map between inputs and outputs is formed

The point of the topographic neighborhood is that not only the winning neuron gets its weights updated: its neighbors have their weights updated as well, although by not as much as the winner itself. The weight update equation is based on the following delta rule:

updated weight → w′_ji = w_ji + Δw_ji , with Δw_ji = η(t) · T_j,I(x)(t) · (x_i − w_ji)

Like the radius, we commonly have a time-dependent learning rate η(t) = η₀ exp(−t / τ_η)

The effect of each learning weight update is to move the weight vectors w_j of the winning neuron and its neighbors towards the input vector x
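Combining the delta rule with the decaying learning rate, one learning step might look like this (a sketch; T holds the neighborhood values from the collaborative phase, and the decay constants are illustrative):

```python
import numpy as np

def learning_rate(t, eta0=0.5, tau_eta=1000.0):
    """Time-dependent learning rate eta(t) = eta0 * exp(-t / tau_eta)."""
    return eta0 * np.exp(-t / tau_eta)

def update_weights(W, x, T, t):
    """Delta rule w'_ji = w_ji + eta(t) * T_j * (x_i - w_ji):
    every neuron moves towards x in proportion to its neighborhood
    value T_j (1 for the winner, smaller for its neighbors)."""
    return W + learning_rate(t) * T[:, None] * (x - W)

W = np.array([[.2, .6, .5, .9],
              [.8, .4, .7, .3]])
x = np.array([1., 1., 0., 0.])
T = np.array([0.1, 1.0])          # here neuron 2 (index 1) is the winner
W_new = update_weights(W, x, T, t=0)
# the winner's row moves much closer to x than the neighbor's row
```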

Self Organising Maps
Summary and clustering

The stages of the SOM algorithm can be summarized as follows:

1. Choose random values for the initial weight vectors w_j
2. Draw a sample training input vector x from the input space
3. Find the winning neuron I(x), the one whose weight vector is closest to the input vector (the Best Matching Unit)
4. Apply the weight update equation (w′_ji = w_ji + Δw_ji) both to the winning neuron and to its neighbors, according to a specific neighborhood function
5. Keep returning to step 2 until the feature map stops changing (or until a pre-set number of iterations is reached)

CLUSTERING
Once the algorithm stops, it is easy to use the final map configuration to create clusters of input observations: each node of the output layer can be considered an elementary cluster, and every input observation is assigned to the closest node (according to the distance between the observation’s input vector and each node’s final weights)
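The five steps, plus the final cluster assignment, can be sketched end-to-end. This is a minimal 1-D SOM; the grid layout, decay constants and iteration count are illustrative choices, not values prescribed by the slides:

```python
import numpy as np

def train_som(X, n_units, n_iter=500, eta0=0.5, sigma0=1.0, tau=250.0, seed=0):
    rng = np.random.default_rng(seed)
    D = X.shape[1]
    W = rng.random((n_units, D))                  # 1. random initial weights
    grid = np.arange(n_units)                     # 1-D grid of output neurons
    for t in range(n_iter):
        x = X[rng.integers(len(X))]               # 2. draw a sample input
        bmu = np.argmin(np.sum((W - x) ** 2, 1))  # 3. find the winning neuron
        sigma = sigma0 * np.exp(-t / tau)         # decaying radius ...
        eta = eta0 * np.exp(-t / tau)             # ... and learning rate
        T = np.exp(-(grid - bmu) ** 2 / (2 * sigma ** 2))
        W += eta * T[:, None] * (x - W)           # 4. delta rule, all neurons
    return W                                      # 5. fixed iteration budget

def assign_clusters(X, W):
    """Each output node is an elementary cluster: assign every
    observation to the node with the closest final weight vector."""
    return np.argmin(((X[:, None, :] - W[None, :, :]) ** 2).sum(2), axis=1)

X = np.array([[1., 1., 0., 0.], [0., 0., 0., 1.],
              [1., 0., 0., 0.], [0., 0., 1., 1.]])
W = train_som(X, n_units=2)
labels = assign_clusters(X, W)
# with well-separated inputs, the two units typically split the samples
# into two clusters (here: i1/i3 versus i2/i4)
```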

Self Organising Maps
Other uses

Other uses of SOM, beyond clustering:

• Familiarity – the net learns how similar a given new input is to the typical (average) pattern it has seen before

• Dimension reduction – the net finds principal components in the data

• Encoding – the output represents the input using a smaller number of bits (useful as a pre-processing step)

• Feature mapping – the net forms a topographic map of the input

Self Organising Maps
Example

Example from Fausett (1994)

Settings:
• Input units: n = 4
• Output units: m = 2
• Topology: one-dimensional Kohonen layer
• Neighborhood radius: for simplicity not used here; only the weights of the winning neuron are updated
• Learning rate:
  η(t) = 0.6, for 1 ≤ t ≤ 4
  η(t) = 0.5 · η(1), for 5 ≤ t ≤ 8
  η(t) = 0.5 · η(5), for 9 ≤ t ≤ 12
  etc.

Training samples (4 observations):
i1: (1, 1, 0, 0)
i2: (0, 0, 0, 1)
i3: (1, 0, 0, 0)
i4: (0, 0, 1, 1)

What should we expect as outputs?

Self Organising Maps
Example

Initial weight matrix (random values between 0 and 1):

Unit 1: [.2 .6 .5 .9]
Unit 2: [.8 .4 .7 .3]

Formulas for the calculations:

d² = (Euclidean distance)² = Σ_{k=1..n} (x_{l,k} − w_{j,k}(t))²

Weight update: w_j(t+1) = w_j(t) + η(t) · (x_l − w_j(t))

Self Organising Maps
Example: first iteration

Training sample: i1 = (1, 1, 0, 0)

• Unit 1 weights:
  d² = (.2−1)² + (.6−1)² + (.5−0)² + (.9−0)² = 1.86
• Unit 2 weights:
  d² = (.8−1)² + (.4−1)² + (.7−0)² + (.3−0)² = .98

→ Unit 2 wins

• Weights on the winning unit are updated:
  new unit 2 weights = [.8 .4 .7 .3] + 0.6 · ([1 1 0 0] − [.8 .4 .7 .3]) = [.92 .76 .28 .12]

• Giving an updated weight matrix:
  Unit 1: [.2 .6 .5 .9]
  Unit 2: [.92 .76 .28 .12]
Self Organising Maps
Example: second iteration

Training sample: i2 = (0, 0, 0, 1)

• Unit 1 weights:
  d² = (.2−0)² + (.6−0)² + (.5−0)² + (.9−1)² = .66
• Unit 2 weights:
  d² = (.92−0)² + (.76−0)² + (.28−0)² + (.12−1)² = 2.28

→ Unit 1 wins

• Weights on the winning unit are updated:
  new unit 1 weights = [.2 .6 .5 .9] + 0.6 · ([0 0 0 1] − [.2 .6 .5 .9]) = [.08 .24 .20 .96]

• Giving an updated weight matrix:
  Unit 1: [.08 .24 .20 .96]
  Unit 2: [.92 .76 .28 .12]
Self Organising Maps
Example: third iteration

Training sample: i3 = (1, 0, 0, 0)

• Unit 1 weights:
  d² = (.08−1)² + (.24−0)² + (.2−0)² + (.96−0)² = 1.87
• Unit 2 weights:
  d² = (.92−1)² + (.76−0)² + (.28−0)² + (.12−0)² = .68

→ Unit 2 wins

• Weights on the winning unit are updated:
  new unit 2 weights = [.92 .76 .28 .12] + 0.6 · ([1 0 0 0] − [.92 .76 .28 .12]) = [.97 .30 .11 .05]

• Giving an updated weight matrix:
  Unit 1: [.08 .24 .20 .96]
  Unit 2: [.97 .30 .11 .05]
Self Organising Maps
Example: fourth iteration

Training sample: i4 = (0, 0, 1, 1)

• Unit 1 weights:
  d² = (.08−0)² + (.24−0)² + (.2−1)² + (.96−1)² = .71
• Unit 2 weights:
  d² = (.97−0)² + (.30−0)² + (.11−1)² + (.05−1)² = 2.74

→ Unit 1 wins

• Weights on the winning unit are updated:
  new unit 1 weights = [.08 .24 .20 .96] + 0.6 · ([0 0 1 1] − [.08 .24 .20 .96]) = [.03 .10 .68 .98]

• Giving an updated weight matrix:
  Unit 1: [.03 .10 .68 .98]
  Unit 2: [.97 .30 .11 .05]
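The four iterations above can be checked with a short script that replays the first epoch of the Fausett example (winner-only updates, fixed η = 0.6):

```python
import numpy as np

W = np.array([[.2, .6, .5, .9],     # unit 1 initial weights
              [.8, .4, .7, .3]])    # unit 2 initial weights
X = np.array([[1., 1., 0., 0.],     # i1
              [0., 0., 0., 1.],     # i2
              [1., 0., 0., 0.],     # i3
              [0., 0., 1., 1.]])    # i4
eta = 0.6
winners = []
for x in X:
    d2 = np.sum((W - x) ** 2, axis=1)   # squared Euclidean distances
    j = int(np.argmin(d2))              # winning unit (0-based index)
    W[j] += eta * (x - W[j])            # update the winner only
    winners.append(j + 1)

winners         # [2, 1, 2, 1] -- matches the four slides above
np.round(W, 2)  # matches the final weight matrix of the fourth iteration
```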
Self Organising Maps
Example: final weights

First epoch results:

time (t) | data sample utilized | ‘winning’ output unit | σ(t) | η(t)
1        | i1                   | Unit 2                | 0    | 0.6
2        | i2                   | Unit 1                | 0    | 0.6
3        | i3                   | Unit 2                | 0    | 0.6
4        | i4                   | Unit 1                | 0    | 0.6

Weight matrix after many epochs through the data set:
Unit 1: [0 0 .5 1.0]
Unit 2: [1.0 .5 0 0]
Self Organising Maps
Example: cluster membership

Input data:
i1: (1, 1, 0, 0)
i2: (0, 0, 0, 1)
i3: (1, 0, 0, 0)
i4: (0, 0, 1, 1)

Final weights:
Unit 1: [0 0 .5 1.0]
Unit 2: [1.0 .5 0 0]

Sample: i1
• Distance from unit 1 weights: (1−0)² + (1−0)² + (0−.5)² + (0−1.0)² = 1 + 1 + .25 + 1 = 3.25
• Distance from unit 2 weights: (1−1)² + (1−.5)² + (0−0)² + (0−0)² = 0 + .25 + 0 + 0 = .25 (winner)

Sample: i2
• Distance from unit 1 weights: (0−0)² + (0−0)² + (0−.5)² + (1−1.0)² = 0 + 0 + .25 + 0 = .25 (winner)
• Distance from unit 2 weights: (0−1)² + (0−.5)² + (0−0)² + (1−0)² = 1 + .25 + 0 + 1 = 2.25

Sample: i3
• Distance from unit 1 weights: (1−0)² + (0−0)² + (0−.5)² + (0−1.0)² = 1 + 0 + .25 + 1 = 2.25
• Distance from unit 2 weights: (1−1)² + (0−.5)² + (0−0)² + (0−0)² = 0 + .25 + 0 + 0 = .25 (winner)

Sample: i4
• Distance from unit 1 weights: (0−0)² + (0−0)² + (1−.5)² + (1−1.0)² = 0 + 0 + .25 + 0 = .25 (winner)
• Distance from unit 2 weights: (0−1)² + (0−.5)² + (1−0)² + (1−0)² = 1 + .25 + 1 + 1 = 3.25

CONCLUSION:
Samples i1 and i3 cluster with Unit 2
Samples i2 and i4 cluster with Unit 1

Two other “interactive” examples here: http://jjguy.com/som/
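The four distance computations above reduce to a couple of NumPy lines (using the converged weights from the previous slide):

```python
import numpy as np

W_final = np.array([[0., 0., .5, 1.],    # unit 1
                    [1., .5, 0., 0.]])   # unit 2
X = np.array([[1., 1., 0., 0.],          # i1
              [0., 0., 0., 1.],          # i2
              [1., 0., 0., 0.],          # i3
              [0., 0., 1., 1.]])         # i4

# Squared distance from every sample to every unit, shape (4, 2)
d2 = ((X[:, None, :] - W_final[None, :, :]) ** 2).sum(axis=2)
clusters = d2.argmin(axis=1) + 1   # 1-based unit index

clusters  # [2, 1, 2, 1]: i1 and i3 join unit 2; i2 and i4 join unit 1
```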

