Lesson 16 ANN Unsupervised


Artificial Neural Networks:

Unsupervised models

Lesson 16

20538 – Predictive Analytics for data-driven decision making

Daniele Tonini
daniele.tonini@unibocconi.it
Reminder
Supervised vs. Unsupervised learning

An important aspect of Artificial Neural Network models is whether they need supervision in learning or not:

• In supervised learning, a desired output target for each input vector is required when the network is trained. A supervised ANN, such as the Multi-Layer Perceptron, uses the target value to guide the formation of the neural weights: it is thus possible to make the neural network learn the behavior of the process under study

• In unsupervised learning, the training of the network is entirely data-driven and no target results for the input data vectors are provided. An unsupervised ANN, such as the Self-Organizing Map, can be used for clustering the input data and for finding features inherent to the problem

Self Organising Maps
Introduction

One interesting class of unsupervised ANNs is based on competitive learning, in which the output neurons compete among themselves to be activated, with the result that, in each single iteration of the learning process, only one unit is activated. The neurons are thus forced to organize themselves

This type of network is called a Self-Organizing Map (SOM)

• They were introduced by the Finnish professor Teuvo Kohonen in the 1980s, so they are sometimes called Kohonen maps

• They are unsupervised ANNs, useful for data clustering: the multi-dimensional relationships between the data are commonly represented on a two-dimensional output layer, enabling the representation of the data as clusters

• They are characterized by two neuronal layers: one input layer and one output layer (usually named the “Kohonen layer”); each input layer neuron is connected to all output layer neurons (the two layers are fully connected)

• In the output layer, for each input pattern that is supplied to the network, exactly one neuron “wins” (the one with the maximum activation value)

Self Organising Maps
Structure representation

Net output (winning neuron)

Kohonen layer / output layer: a two-dimensional grid with lateral connections

Intra-layer “lateral” connection properties:
• Defined according to some network topology
• These connections carry no weights, but are used by the algorithm when updating the weights

Connection weights* (adjusted by network training)

Input layer: receives the input signals

*For clarity only 3 connections are shown, but the input units are fully connected, with weights, to the output units
Self Organising Maps
Process

The self-organization process in a Kohonen Map involves four major phases:

1. Initialization: all the connection weights are initialized with small random values (usually between 0 and 1)

2. Competition: for each input pattern, the neurons compute their respective values of a similarity function, which provides the basis for competition. The neuron with the smallest value of the similarity function is declared the winner

3. Collaboration: the winning neuron determines the spatial location of a topological neighborhood of excited neurons, thereby providing the basis for collaboration among neighboring neurons

4. Learning: the excited neurons adjust their weights using a specific delta rule, such that the response of the winning neuron to a subsequent application of a similar input pattern is enhanced

Self Organising Maps
Competition phase

Consider a D-dimensional input space (i.e. there are D input variables): we can write the input patterns as x = {x_i : i = 1, …, D}, and the connection weights between the input units i and the neurons j in the Kohonen layer can be written w_j = {w_ij : j = 1, …, N; i = 1, …, D}, where N is the total number of neurons

We can use as similarity function the Squared Euclidean Distance between the input vector 𝒙 and the
weight vector 𝒘𝒋 for each neuron 𝑗

In this phase, the result of each iteration is the following: the neuron whose weight vector comes closest to the input vector (i.e. is most similar to it, meaning it has the lowest distance) is declared the winner (sometimes called the BMU, or Best Matching Unit)

By doing so, the continuous input space is mapped to the discrete output space of neurons by a simple
process of competition between the neurons
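As a minimal sketch (the function and variable names below are my own, not from the slides), the competition step can be written in a few lines of Python:

```python
import numpy as np

def find_bmu(x, W):
    """Return the index of the Best Matching Unit.

    x : input vector, shape (D,)
    W : weight matrix, shape (N, D) -- one row per output neuron
    The similarity function is the squared Euclidean distance;
    the neuron with the smallest distance wins.
    """
    d2 = np.sum((W - x) ** 2, axis=1)  # squared distances, shape (N,)
    return int(np.argmin(d2))

# Example with two output neurons and four inputs
W = np.array([[.2, .6, .5, .9],
              [.8, .4, .7, .3]])
x = np.array([1., 1., 0., 0.])
find_bmu(x, W)  # unit 2 (index 1) wins, since 0.98 < 1.86
```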

Self Organising Maps
Collaborative phase

In biology we find that there is lateral interaction within a set of excited neurons: when one neuron is
activated, its closest neighbors tend to get excited more than those further away
There is a topological neighborhood that decays with distance: we want to define a similar topological
neighborhood for the neurons in our SOM

If S_j,I(x) is the lateral distance between neuron j and the winning neuron on the grid of neurons, a Gaussian function T is commonly used as the neighborhood function:

T_j,I(x) = exp( −S²_j,I(x) / (2σ²) )

Where:
• I(x) is the index that identifies the winning neuron (BMU)
• σ is the standard deviation of the Gaussian function, which here can be interpreted as the “neighborhood radius” determining to what degree neighboring nodes adapt depending on their distance to the winning node
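A sketch of this neighborhood function, assuming a precomputed matrix S of lateral grid distances between neurons (the names are illustrative):

```python
import numpy as np

def neighborhood(S, bmu, sigma):
    """Gaussian neighborhood T_j = exp(-S_j^2 / (2 sigma^2)) for every
    neuron j, given its lateral distance S[j, bmu] to the winning neuron."""
    return np.exp(-S[:, bmu] ** 2 / (2 * sigma ** 2))

# Toy example: 3 neurons on a 1-D grid (lateral distances 0, 1, 2 apart)
S = np.array([[0., 1., 2.],
              [1., 0., 1.],
              [2., 1., 0.]])
T = neighborhood(S, bmu=0, sigma=1.0)
# T[0] equals 1 at the winner, and T decreases monotonically with distance
```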
Self Organising Maps
Collaborative phase

This neighborhood function has several important properties:


1. it is maximal at the winning neuron (where it equals 1)
2. it is symmetric about that neuron
3. it decreases monotonically to zero as the distance goes to infinity
4. it is independent of the location of the winning neuron

A special feature of the SOM is that the neighborhood radius needs to decrease with time, in order to optimize the learning process (bigger influence of the BMU on its neighbors in the initial steps of the learning process, smaller in the last ones); a popular time dependence is an exponential decay of the radius, such as:

σ(t) = σ₀ exp(−t / τ_σ)

Where:
• t is the time, measured as the number of processed training observations (i.e. iterations)
• σ₀ is the initial neighborhood radius (to be set in advance)
• τ_σ is the decay constant (to be set in advance)
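In code this decay is a one-liner; σ₀ = 3 and τ_σ = 1000 below are illustrative values, not values prescribed by the slides:

```python
import math

def radius(t, sigma0=3.0, tau_sigma=1000.0):
    """Neighborhood radius sigma(t) = sigma0 * exp(-t / tau_sigma)."""
    return sigma0 * math.exp(-t / tau_sigma)

radius(0)     # starts at sigma0 = 3.0 ...
radius(5000)  # ... and shrinks towards zero as training proceeds
```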

Self Organising Maps
Learning phase

Clearly our SOM must involve some kind of learning process by which the outputs become self-
organised and the feature map between inputs and outputs is formed

The point of the topographic neighborhood is that not only the winning neuron gets its weights updated: its neighbors have their weights updated as well, although by not as much as the winner itself. The weight update equation is based on the following delta rule:

updated weight → w′_ji = w_ji + Δw_ji , with Δw_ji = η(t) · T_j,I(x)(t) · (x_i − w_ji)

Like the radius, we commonly have a time-dependent learning rate η(t) = η₀ exp(−t / τ_η)

The effect of each learning weight update is to move the weight vectors w_j of the winning neuron and its neighbors towards the input vector x
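Combining the delta rule with the decaying learning rate, one learning step might look like this (a sketch; T holds the neighborhood values from the collaborative phase, and the decay constants are illustrative):

```python
import numpy as np

def learning_rate(t, eta0=0.5, tau_eta=1000.0):
    """Time-dependent learning rate eta(t) = eta0 * exp(-t / tau_eta)."""
    return eta0 * np.exp(-t / tau_eta)

def update_weights(W, x, T, t):
    """Delta rule w'_ji = w_ji + eta(t) * T_j * (x_i - w_ji):
    every neuron moves towards x in proportion to its neighborhood
    value T_j (1 for the winner, smaller for its neighbors)."""
    return W + learning_rate(t) * T[:, None] * (x - W)

W = np.array([[.2, .6, .5, .9],
              [.8, .4, .7, .3]])
x = np.array([1., 1., 0., 0.])
T = np.array([0.1, 1.0])          # here neuron 2 (index 1) is the winner
W_new = update_weights(W, x, T, t=0)
# the winner's row moves much closer to x than the neighbor's row
```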

Self Organising Maps
Summary and clustering

The stages of the SOM algorithm can be summarized as follows:

1. Choose random values for the initial weight vectors w_j
2. Draw a sample training input vector x from the input space
3. Find the winning neuron I(x), the one whose weight vector is closest to the input vector (the Best Matching Unit)
4. Apply the weight update equation (w′_ji = w_ji + Δw_ji) both to the winning neuron and to its neighbors, according to a specific neighborhood function
5. Keep returning to step 2 until the feature map stops changing (or until a pre-set number of iterations is reached)

CLUSTERING
Once the algorithm stops, it is easy to use the final map configuration to create clusters of input observations: each node of the output layer can be considered an elementary cluster, and every input observation is assigned to the closest node (according to the distance between the observation’s input vector and each node’s final weights)
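The five steps, plus the final cluster assignment, can be sketched end-to-end. This is a minimal 1-D SOM; the grid layout, decay constants and iteration count are illustrative choices, not values prescribed by the slides:

```python
import numpy as np

def train_som(X, n_units, n_iter=500, eta0=0.5, sigma0=1.0, tau=250.0, seed=0):
    rng = np.random.default_rng(seed)
    D = X.shape[1]
    W = rng.random((n_units, D))                  # 1. random initial weights
    grid = np.arange(n_units)                     # 1-D grid of output neurons
    for t in range(n_iter):
        x = X[rng.integers(len(X))]               # 2. draw a sample input
        bmu = np.argmin(np.sum((W - x) ** 2, 1))  # 3. find the winning neuron
        sigma = sigma0 * np.exp(-t / tau)         # decaying radius ...
        eta = eta0 * np.exp(-t / tau)             # ... and learning rate
        T = np.exp(-(grid - bmu) ** 2 / (2 * sigma ** 2))
        W += eta * T[:, None] * (x - W)           # 4. delta rule, all neurons
    return W                                      # 5. fixed iteration budget

def assign_clusters(X, W):
    """Each output node is an elementary cluster: assign every
    observation to the node with the closest final weight vector."""
    return np.argmin(((X[:, None, :] - W[None, :, :]) ** 2).sum(2), axis=1)

X = np.array([[1., 1., 0., 0.], [0., 0., 0., 1.],
              [1., 0., 0., 0.], [0., 0., 1., 1.]])
W = train_som(X, n_units=2)
labels = assign_clusters(X, W)
# with well-separated inputs, the two units typically split the samples
# into two clusters (here: i1/i3 versus i2/i4)
```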

Self Organising Maps
Other uses

Other uses of SOM, beyond clustering:

• Familiarity – the net learns how similar a given new input is to the typical (average) pattern it has seen before

• Dimension reduction – the net finds principal components in the data

• Encoding – the output represents the input using a smaller number of bits (useful as a pre-processing step)

• Feature mapping – the net forms a topographic map of the input

Self Organising Maps
Example

Example from Fausett (1994)

Settings:
• Input units: n = 4
• Output units: m = 2
• Topology: one-dimensional Kohonen layer
• Neighborhood radius: for simplicity not used here; only the weights of the winning neuron are updated
• Learning rate:
  η(t) = 0.6, for 1 ≤ t ≤ 4
  η(t) = 0.5 · η(1), for 5 ≤ t ≤ 8
  η(t) = 0.5 · η(5), for 9 ≤ t ≤ 12
  etc.

Training samples (4 observations):
i1: (1, 1, 0, 0)
i2: (0, 0, 0, 1)
i3: (1, 0, 0, 0)
i4: (0, 0, 1, 1)

What should we expect as outputs?

Self Organising Maps
Example

Initial weight matrix (random values between 0 and 1):

Unit 1: [.2 .6 .5 .9]
Unit 2: [.8 .4 .7 .3]

Formulas for the calculations:

d² = (Euclidean distance)² = Σ_{k=1..n} (x_{l,k} − w_{j,k}(t))²

Weight update: w_j(t+1) = w_j(t) + η(t) · (x_l − w_j(t))

Self Organising Maps
Example: first iteration

Training sample: i1 = (1, 1, 0, 0)

• Unit 1 weights:
  d² = (.2−1)² + (.6−1)² + (.5−0)² + (.9−0)² = 1.86
• Unit 2 weights:
  d² = (.8−1)² + (.4−1)² + (.7−0)² + (.3−0)² = .98

→ Unit 2 wins

• Weights on the winning unit are updated:
  new unit 2 weights = [.8 .4 .7 .3] + 0.6 · ([1 1 0 0] − [.8 .4 .7 .3]) = [.92 .76 .28 .12]

• Giving an updated weight matrix:
  Unit 1: [.2 .6 .5 .9]
  Unit 2: [.92 .76 .28 .12]
Self Organising Maps
Example: second iteration

Training sample: i2 = (0, 0, 0, 1)

• Unit 1 weights:
  d² = (.2−0)² + (.6−0)² + (.5−0)² + (.9−1)² = .66
• Unit 2 weights:
  d² = (.92−0)² + (.76−0)² + (.28−0)² + (.12−1)² = 2.28

→ Unit 1 wins

• Weights on the winning unit are updated:
  new unit 1 weights = [.2 .6 .5 .9] + 0.6 · ([0 0 0 1] − [.2 .6 .5 .9]) = [.08 .24 .20 .96]

• Giving an updated weight matrix:
  Unit 1: [.08 .24 .20 .96]
  Unit 2: [.92 .76 .28 .12]
Self Organising Maps
Example: third iteration

Training sample: i3 = (1, 0, 0, 0)

• Unit 1 weights:
  d² = (.08−1)² + (.24−0)² + (.2−0)² + (.96−0)² = 1.87
• Unit 2 weights:
  d² = (.92−1)² + (.76−0)² + (.28−0)² + (.12−0)² = .68

→ Unit 2 wins

• Weights on the winning unit are updated:
  new unit 2 weights = [.92 .76 .28 .12] + 0.6 · ([1 0 0 0] − [.92 .76 .28 .12]) = [.97 .30 .11 .05]

• Giving an updated weight matrix:
  Unit 1: [.08 .24 .20 .96]
  Unit 2: [.97 .30 .11 .05]
Self Organising Maps
Example: fourth iteration

Training sample: i4 = (0, 0, 1, 1)

• Unit 1 weights:
  d² = (.08−0)² + (.24−0)² + (.2−1)² + (.96−1)² = .71
• Unit 2 weights:
  d² = (.97−0)² + (.30−0)² + (.11−1)² + (.05−1)² = 2.74

→ Unit 1 wins

• Weights on the winning unit are updated:
  new unit 1 weights = [.08 .24 .20 .96] + 0.6 · ([0 0 1 1] − [.08 .24 .20 .96]) = [.03 .10 .68 .98]

• Giving an updated weight matrix:
  Unit 1: [.03 .10 .68 .98]
  Unit 2: [.97 .30 .11 .05]
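The four iterations above can be checked with a short script that replays the first epoch of the Fausett example (winner-only updates, fixed η = 0.6):

```python
import numpy as np

W = np.array([[.2, .6, .5, .9],     # unit 1 initial weights
              [.8, .4, .7, .3]])    # unit 2 initial weights
X = np.array([[1., 1., 0., 0.],     # i1
              [0., 0., 0., 1.],     # i2
              [1., 0., 0., 0.],     # i3
              [0., 0., 1., 1.]])    # i4
eta = 0.6
winners = []
for x in X:
    d2 = np.sum((W - x) ** 2, axis=1)   # squared Euclidean distances
    j = int(np.argmin(d2))              # winning unit (0-based index)
    W[j] += eta * (x - W[j])            # update the winner only
    winners.append(j + 1)

winners         # [2, 1, 2, 1] -- matches the four slides above
np.round(W, 2)  # matches the final weight matrix of the fourth iteration
```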
Self Organising Maps
Example: final weights

First epoch results:

time (t) | data sample utilized | ‘winning’ output unit | σ(t) | η(t)
1        | i1                   | Unit 2                | 0    | 0.6
2        | i2                   | Unit 1                | 0    | 0.6
3        | i3                   | Unit 2                | 0    | 0.6
4        | i4                   | Unit 1                | 0    | 0.6

Weight matrix after many epochs through the data set:
Unit 1: [0 0 .5 1.0]
Unit 2: [1.0 .5 0 0]
Self Organising Maps
Example: cluster membership

Input data:
i1: (1, 1, 0, 0)
i2: (0, 0, 0, 1)
i3: (1, 0, 0, 0)
i4: (0, 0, 1, 1)

Final weights:
Unit 1: [0 0 .5 1.0]
Unit 2: [1.0 .5 0 0]

Sample: i1
• Distance from unit 1 weights: (1−0)² + (1−0)² + (0−.5)² + (0−1.0)² = 1 + 1 + .25 + 1 = 3.25
• Distance from unit 2 weights: (1−1)² + (1−.5)² + (0−0)² + (0−0)² = 0 + .25 + 0 + 0 = .25 (winner)

Sample: i2
• Distance from unit 1 weights: (0−0)² + (0−0)² + (0−.5)² + (1−1.0)² = 0 + 0 + .25 + 0 = .25 (winner)
• Distance from unit 2 weights: (0−1)² + (0−.5)² + (0−0)² + (1−0)² = 1 + .25 + 0 + 1 = 2.25

Sample: i3
• Distance from unit 1 weights: (1−0)² + (0−0)² + (0−.5)² + (0−1.0)² = 1 + 0 + .25 + 1 = 2.25
• Distance from unit 2 weights: (1−1)² + (0−.5)² + (0−0)² + (0−0)² = 0 + .25 + 0 + 0 = .25 (winner)

Sample: i4
• Distance from unit 1 weights: (0−0)² + (0−0)² + (1−.5)² + (1−1.0)² = 0 + 0 + .25 + 0 = .25 (winner)
• Distance from unit 2 weights: (0−1)² + (0−.5)² + (1−0)² + (1−0)² = 1 + .25 + 1 + 1 = 3.25

CONCLUSION:
Samples i1 and i3 cluster with Unit 2
Samples i2 and i4 cluster with Unit 1

Two other “interactive” examples here: http://jjguy.com/som/
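The four distance computations above reduce to a couple of NumPy lines (using the converged weights from the previous slide):

```python
import numpy as np

W_final = np.array([[0., 0., .5, 1.],    # unit 1
                    [1., .5, 0., 0.]])   # unit 2
X = np.array([[1., 1., 0., 0.],          # i1
              [0., 0., 0., 1.],          # i2
              [1., 0., 0., 0.],          # i3
              [0., 0., 1., 1.]])         # i4

# Squared distance from every sample to every unit, shape (4, 2)
d2 = ((X[:, None, :] - W_final[None, :, :]) ** 2).sum(axis=2)
clusters = d2.argmin(axis=1) + 1   # 1-based unit index

clusters  # [2, 1, 2, 1]: i1 and i3 join unit 2; i2 and i4 join unit 1
```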

