MLNC Unsupervised
[Figure: example binary input patterns, e.g. 001100001, 001110000, 110001111, ...]
Clusters in input space
[Figure: two clusters of input points in input space, labelled "dogs" and "cats"]
Want one output unit that responds preferentially (or only) to dogs
and another output unit that responds preferentially to cats.
The weighted sum of inputs can also be represented using the inner product (or dot product) of the input vector and the weight vector: $\sum_i w_i x_i = \mathbf{w} \cdot \mathbf{x}$
Geometric interpretation: $\mathbf{w} \cdot \mathbf{x} = |\mathbf{w}|\,|\mathbf{x}|\cos\theta$, where $\theta$ is the angle between the two vectors.
[Figure: vectors $\mathbf{w} = (0, 1)$ and $\mathbf{x} = (1, 1)$ in the $(x_1, x_2)$ plane]
Example: $\mathbf{w} \cdot \mathbf{x} = 0 \cdot 1 + 1 \cdot 1 = 1$
The inner product increases with increasing similarity between the two vectors.
This means that an output unit is maximally activated by inputs that are similar to its weight vector.
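As a quick numerical check of this claim (the vectors below are invented for illustration and normalised to unit length):

```python
import numpy as np

# Unit-length weight vector of a hypothetical "dog" output unit.
w_dog = np.array([0.8, 0.6])

# Two unit-length inputs: one close to w_dog, one nearly orthogonal to it.
x_similar    = np.array([0.6, 0.8])
x_dissimilar = np.array([-0.6, 0.8])

print(np.dot(w_dog, x_similar))      # 0.96 -> high activation
print(np.dot(w_dog, x_dissimilar))   # 0.0  -> low activation
```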
Specialising on clusters in input space
[Figure: weight vectors $\mathbf{w}_{dog}$ and $\mathbf{w}_{cat}$ pointing at the "dog" and "cat" clusters in input space]
The weight vector of the “dog” unit should point to the centre of the “dog” cluster,
and the weight vector of the “cat” unit should point to the centre of the “cat” cluster.
Most biologically realistic, but least efficient, way of choosing the winner: add long-range (ideally global) lateral inhibition and short-range (or self-) excitation to the network. A similar (but more local) connectivity pattern is often found in the brain and is described by a Mexican hat function.
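A Mexican hat profile can be sketched as a difference of Gaussians; the widths and amplitudes below are illustrative assumptions, not values from the lecture:

```python
import numpy as np

def mexican_hat(d, sigma_exc=1.0, sigma_inh=3.0, a_exc=1.0, a_inh=0.5):
    """Difference-of-Gaussians lateral interaction as a function of
    distance d: short-range excitation minus longer-range inhibition."""
    return (a_exc * np.exp(-d**2 / (2 * sigma_exc**2))
            - a_inh * np.exp(-d**2 / (2 * sigma_inh**2)))

d = np.linspace(-10, 10, 201)
k = mexican_hat(d)   # positive near d = 0, negative flanks further out
```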
A more efficient way of finding the winner
More efficient: calculate the activation of the output units as a function of the inputs $x_j$ and weights $w_{ij}$, and hand-pick the output unit with the largest activation $y_i$:
Output activation: $y_i = x_1 w_{i1} + x_2 w_{i2} + \ldots = \sum_j x_j w_{ij} = \mathbf{x} \cdot \mathbf{w}_i$
Winner: $y_{i^*} = \max_i (y_i)$
Weight update only for the winner: $\mathbf{w}_{i^*} \rightarrow \mathbf{w}_{i^*} + \alpha (\mathbf{x} - \mathbf{w}_{i^*})$
where $\alpha$ is the learning rate. This moves the weight vector $\mathbf{w}_{i^*}$ of the winning output unit towards the current input vector $\mathbf{x}$.
This mechanism requires normalised weight vectors and benefits from normalised input vectors. The vectors can be normalised by enforcing that they all have the same length, e.g.:
$\sum_j x_j^2 = 1$ and $\sum_j w_{ij}^2 = 1$
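A minimal sketch of one winner-take-all learning step under these assumptions (unit-length rows of W and a unit-length input x; re-normalising the winner after the update is one simple way to maintain the length constraint):

```python
import numpy as np

def competitive_step(W, x, lr=0.1):
    """One winner-take-all update.
    W: (n_outputs, n_inputs) weight matrix, rows normalised to unit length.
    x: (n_inputs,) input vector, normalised to unit length."""
    y = W @ x                                  # activations y_i = x . w_i
    i_star = np.argmax(y)                      # winner
    W[i_star] += lr * (x - W[i_star])          # move winner towards x
    W[i_star] /= np.linalg.norm(W[i_star])     # re-normalise the winner
    return i_star

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 2))
W /= np.linalg.norm(W, axis=1, keepdims=True)  # unit-length rows
x = np.array([1.0, 1.0]) / np.sqrt(2)
competitive_step(W, x)
```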
Competitive learning moves weight vectors in input space
[Figure: the winning weight vector $(w_1, w_2, w_3)^*$ is moved by $\alpha((x_1, x_2, x_3) - (w_1, w_2, w_3)^*)$ towards the input vector $(x_1, x_2, x_3)$, shown both as a network of output units and as vectors in input space]
Self-organising (Kohonen) maps
Usually the output units form a regular 2D grid, but other arrangements are possible.
The aim is for output units that are close to each other in physical space to respond to
inputs that are similar (i.e. close together in input space).
[Figure: mapping from the input space $(x_1, x_2)$ to the (physical) output space of the unit grid]
This resembles the topographic maps that are found in the brain.
Topographic maps in the brain
[Figure: the visual field maps onto the visual cortex]
The retinotopic map in the visual cortex
• Neighbouring areas in the visual field are processed by neighbouring areas of cortex.
• A large area of cortex is dedicated to the area surrounding the fovea (centre of gaze,
contains highest density of receptors).
The orientation map in the visual cortex
(A) Areas in the visual cortex respond preferentially to bars with different orientations.
The orientation preference varies gradually around discontinuities called pinwheels.
(B) Ocular dominance columns: alternating areas of cortex respond to input from the
left or right eye.
Somatotopic maps in sensory and motor cortex
The tonotopic map in the auditory cortex
Mustached bats use sonar signals with frequencies around 61 kHz for echolocation.
These frequencies are over-represented in the auditory cortex.
Modelling the formation of an orientation map
Von der Malsburg (1973): cortical layer with 169 units receives input from retina
with 19 units. Output units in the cortical layer excite their nearest neighbours and
inhibit (mediated by a separate set of units) a slightly larger local region.
Activate the retina with binary patterns that represent light bars. Calculate the resulting cortical activity and update the weight vectors by Hebbian learning so that they become more similar to the input vectors.
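A sketch of the Hebbian step just described, heavily simplified: the computation of the cortical activity y from the lateral interactions is omitted, and unit-length row normalisation stands in for von der Malsburg's constraint that the total synaptic strength onto each cortical unit stays constant:

```python
import numpy as np

def hebbian_step(W, x, y, lr=0.01):
    """Hebbian learning with post-hoc normalisation.
    W: (n_cortex, n_retina) weights, x: retinal activity pattern,
    y: resulting cortical activity (after lateral interactions)."""
    W += lr * np.outer(y, x)                       # dw_ij ~ y_i * x_j
    W /= np.linalg.norm(W, axis=1, keepdims=True)  # keep rows unit length
    return W
```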
The algorithm:
1. Start with small random weights.
2. Present a randomly chosen input vector x.
3. Find the output unit $i^*$ whose weight vector $\mathbf{w}_{i^*}$ is closest to the input vector $\mathbf{x}$:
$\|\mathbf{x} - \mathbf{w}_{i^*}\| \leq \|\mathbf{x} - \mathbf{w}_i\|$ for all $i$
4. Update the weight vector of the winning output unit and its neighbours so that they become more similar to the input vector (with learning rate $\alpha$):
$\mathbf{w}_i \rightarrow \mathbf{w}_i + \alpha\, L(i, i^*)\, (\mathbf{x} - \mathbf{w}_i)$
where the neighbourhood function $L(i, i^*)$ is largest for the winner $i^*$ and decreases with the distance between units $i$ and $i^*$ on the output grid.
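A compact numpy sketch of the four steps above. The Gaussian form of L(i, i*), the grid size, the learning rate, and the neighbourhood width are illustrative choices; the inputs are drawn from the unit square, matching the 2D → 2D example below:

```python
import numpy as np

def train_som(n_grid=10, n_steps=5000, lr=0.1, sigma=2.0, seed=0):
    """Kohonen self-organising map: an (n_grid x n_grid) grid of units,
    each with a 2D weight vector, trained on inputs drawn uniformly
    from the unit square (0 < x1 < 1, 0 < x2 < 1)."""
    rng = np.random.default_rng(seed)

    # Grid coordinates of each unit (used for the neighbourhood distance).
    gy, gx = np.mgrid[0:n_grid, 0:n_grid]
    grid = np.column_stack([gy.ravel(), gx.ravel()]).astype(float)

    # 1. Start with weight vectors at random locations near (0.5, 0.5).
    W = 0.5 + 0.05 * rng.standard_normal((n_grid * n_grid, 2))

    for t in range(n_steps):
        # 2. Present a randomly chosen input vector x.
        x = rng.random(2)
        # 3. Find the unit whose weight vector is closest to x.
        i_star = np.argmin(np.sum((W - x) ** 2, axis=1))
        # 4. Update winner and neighbours; L(i, i*) is a Gaussian in the
        #    grid distance between unit i and the winner i*.
        d2 = np.sum((grid - grid[i_star]) ** 2, axis=1)
        L = np.exp(-d2 / (2 * sigma ** 2))
        W += lr * L[:, None] * (x - W)
    return W

W = train_som()   # weight vectors spread out to cover the unit square
```

In practice the learning rate and the neighbourhood width are usually reduced (annealed) during learning; this sketch keeps them fixed for brevity.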
1D → 2D Kohonen example: modelling the auditory cortex of a bat
Input: a single ultrasound frequency that is drawn from a probability distribution with a peak at 61 kHz (used by mustached bats for echolocation).
Formation of a 1D → 2D map in bat auditory cortex
[Figure: learning the map (output space)]
A 2D → 2D example. Input space: input vectors $(x_1, x_2)$ that are drawn randomly from a square region of a plane ($0 < x_1 < 1$, $0 < x_2 < 1$).
The weight vectors start off at random locations close to (0.5, 0.5).
Learning pulls the weight vectors apart so that they cover the whole input space:
2D → 2D maps
2D → 1D maps (these result in space-filling Peano curves)
Modelling the formation of a somatosensory map
Output space: grid of 30 × 30 units that represent somatosensory cortex.
Inputs: vectors drawn randomly from five regions (D, L, M, R, T) of a hand-shaped surface.
Start with random weight vectors and apply Kohonen’s algorithm (Ritter & Schulten 1986).
[Figure: input space (hand-shaped surface) and output space with the initial weight vectors]
Development of the somatosensory map
Reorganization of the map in response to loss of a digit
Loss of input from one digit means that inputs from the neighbouring digits take over
the inactive areas of simulated cortex.
[Figure: reorganisation of the map after removal of a digit]
A self-organising semantic map
Ritter & Kohonen (1986): Output space: 2D grid with 10 × 10 output units.
Input space: binary vectors that encode features of animals.
A phoneme map
Preprocessing: divide the speech waveform into short segments (9.83 ms) and perform an FFT (Fast Fourier Transform) to generate 15-dimensional input vectors.
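A sketch of this preprocessing step; the slide only specifies the segment length and the output dimensionality, so the reduction of each spectrum to 15 values by band-averaging is an assumption:

```python
import numpy as np

def speech_to_vectors(waveform, fs, seg_ms=9.83, n_dims=15):
    """Cut a waveform into short segments and reduce each segment's
    FFT magnitude spectrum to an n_dims-dimensional input vector.
    The binning into n_dims frequency bands is an illustrative choice."""
    seg_len = int(fs * seg_ms / 1000)
    n_segs = len(waveform) // seg_len
    vectors = []
    for k in range(n_segs):
        seg = waveform[k * seg_len:(k + 1) * seg_len]
        spectrum = np.abs(np.fft.rfft(seg))
        # Average the magnitude spectrum into n_dims frequency bands.
        bands = np.array_split(spectrum, n_dims)
        vectors.append([b.mean() for b in bands])
    return np.asarray(vectors)
```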
Problems: some units respond to more than one phoneme (for example /p, t/, /t, k/).
Moreover, the pronunciation of phonemes is context-dependent and depends on the
preceding and following phonemes (co-articulation effects).
Solution: use an auxiliary map to distinguish the phonemes /k, p, t/ and use a rule-based
grammar processor to correct for co-articulation effects.
The travelling salesman problem
The Elastic Net learning rule contains three terms: one for the attraction between the units (beads) on the ring $\mathbf{w}_i$ and the positions of the cities $\mathbf{x}_k$, and two others for the attraction between neighbouring units ($\mathbf{w}_{i-1}$, $\mathbf{w}_i$, $\mathbf{w}_{i+1}$):
$\mathbf{w}_i \rightarrow \mathbf{w}_i + \alpha \left( \sum_k L_{ik} (\mathbf{x}_k - \mathbf{w}_i) + \beta (\mathbf{w}_{i+1} - \mathbf{w}_i) + \beta (\mathbf{w}_{i-1} - \mathbf{w}_i) \right)$
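A sketch of one such update in numpy. The slide does not specify $L_{ik}$; here it is taken as a Gaussian in the distance between city $\mathbf{x}_k$ and unit $\mathbf{w}_i$, normalised over units for each city (a standard elastic-net choice), with an illustrative width:

```python
import numpy as np

def elastic_net_step(W, X, alpha=0.2, beta=2.0, k_width=0.1):
    """One update of the elastic net for the travelling salesman problem.
    W: (n_units, 2) positions of the beads on the ring.
    X: (n_cities, 2) positions of the cities.
    L[k, i] weights the pull of city k on unit i; here a Gaussian in
    distance, normalised so each city's influence sums to 1 over units."""
    diff = X[:, None, :] - W[None, :, :]        # (n_cities, n_units, 2)
    d2 = np.sum(diff ** 2, axis=2)              # squared distances
    L = np.exp(-d2 / (2 * k_width ** 2))
    L /= L.sum(axis=1, keepdims=True)           # normalise over units
    # Attraction of each unit towards the cities: sum_k L[k,i] (x_k - w_i).
    city_pull = np.sum(L[:, :, None] * diff, axis=0)
    # Attraction towards ring neighbours: (w_{i+1} - w_i) + (w_{i-1} - w_i).
    ring_pull = np.roll(W, 1, axis=0) + np.roll(W, -1, axis=0) - 2 * W
    return W + alpha * (city_pull + beta * ring_pull)
```

In the full elastic-net algorithm the Gaussian width is gradually reduced during learning, so the ring first approximates the tour coarsely and then snaps onto the individual cities.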