Intelligent Systems
Lecture 1: Summary
Many design tasks require a machine learning approach:
• Classification, clustering, regression, reinforcement
• Application areas such as face detection (classification)
Experience E is a data set of measurements, D = {X, y}.
Model M is parameterized by the unknown vector θ.
Performance P compares M's predictions ŷ against the desired targets y.
Prediction: $\hat{y} = m(\mathbf{x}, \boldsymbol{\theta})$
Lectures 2 & 3: Linear Machine Learning Algorithms
(1.2) Learning systems
• Linear artificial neural models
• Parameter estimation
In these two lectures we introduce the basic Perceptron classifier and
the LMS algorithm. Both are instantaneous/on-line learning algorithms.
What is Classification?
Classification is also known as (statistical) pattern recognition.
• How do we encode qualitative targets and input features?
The Perceptron Learning Rule
Error-driven update (applied when sample k is misclassified):
$$\hat{\boldsymbol{\theta}}_{k+1} = \hat{\boldsymbol{\theta}}_k + \eta\, y_k \mathbf{x}_k$$
The parameters are updated to make them more similar to the
misclassified feature vector. After updating:
$$\mathbf{x}_k^T \hat{\boldsymbol{\theta}}_{k+1} = \mathbf{x}_k^T \hat{\boldsymbol{\theta}}_k + \eta\, y_k \mathbf{x}_k^T \mathbf{x}_k = \mathbf{x}_k^T \hat{\boldsymbol{\theta}}_k + \eta\, y_k \|\mathbf{x}_k\|_2^2$$
so the decision value moves by $\eta\, y_k \|\mathbf{x}_k\|_2^2$ towards the
correct side: the updated parameters are closer to a correct decision.
[Figure: in (θ1, θ2) parameter space, adding the vector η y_k x_k moves
θ̂_k to θ̂_{k+1}; a second panel shows x^T θ̂ shifting across the ±1
decision levels.]
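To make the rule concrete, here is a minimal sketch of one error-driven Perceptron step in Python with NumPy; the function and variable names (perceptron_update, theta_hat, eta) are our own, not from the slides.

```python
import numpy as np

def perceptron_update(theta_hat, x_k, y_k, eta=0.1):
    """One error-driven Perceptron step.

    theta_hat : current parameter estimate (bias weight included)
    x_k       : feature vector, with a leading 1 for the bias
    y_k       : target label, +1 or -1
    eta       : learning rate
    """
    if y_k * (x_k @ theta_hat) <= 0:             # sample k is misclassified
        theta_hat = theta_hat + eta * y_k * x_k  # move theta towards y_k * x_k
    return theta_hat
```

A call changes θ̂ only when sample k is misclassified, matching the update rule above.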
Convergence Analysis of the Perceptron (ii)
Consider the squared distance between the optimal parameter vector θ*
and the estimate after an update:
$$\|\boldsymbol{\theta}^* - \hat{\boldsymbol{\theta}}_{k+1}\|_2^2 = \|\boldsymbol{\theta}^* - \hat{\boldsymbol{\theta}}_k - \eta\, y_k \mathbf{x}_k\|_2^2$$
$$= \|\boldsymbol{\theta}^* - \hat{\boldsymbol{\theta}}_k\|_2^2 + \eta^2 \|\mathbf{x}_k\|_2^2 - 2\eta\, y_k \mathbf{x}_k^T (\boldsymbol{\theta}^* - \hat{\boldsymbol{\theta}}_k)$$
$$< \|\boldsymbol{\theta}^* - \hat{\boldsymbol{\theta}}_k\|_2^2 + \eta^2 \|\mathbf{x}_k\|_2^2 - 2\eta\, y_k \mathbf{x}_k^T \boldsymbol{\theta}^*$$
$$\le \|\boldsymbol{\theta}^* - \hat{\boldsymbol{\theta}}_k\|_2^2 + \eta^2 R^2 - 2\eta\alpha\gamma$$
The first inequality holds because $y_k \mathbf{x}_k^T \hat{\boldsymbol{\theta}}_k \le 0$
whenever sample k triggers an update; the second uses the bounds
$\|\mathbf{x}_k\|_2 \le R$ and $y_k \mathbf{x}_k^T \boldsymbol{\theta}^* \ge \alpha\gamma$.
To finish the proof, select
$$\alpha > \frac{\eta R^2}{2\gamma}$$
Convergence Analysis of the Perceptron (iii)
To show that this terminates in a finite number of iterations, simply
note that:
$$\varepsilon(\alpha) = \eta^2 R^2 - 2\eta\alpha\gamma < 0$$
is independent of the current training sample, so the squared parameter
error must decrease by at least |ε(α)| at each update iteration. As the
initial error is finite (taking θ̂₀ = 0, say) and the squared error
cannot become negative, only a finite number of update steps can occur
before every training sample is correctly classified.
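The finite-termination argument can also be checked numerically. The following is a small self-contained experiment (the toy data, seed, and learning rate are our own assumptions, not from the slides) that runs the error-driven update until no sample is misclassified:

```python
import numpy as np

rng = np.random.default_rng(0)
theta_star = np.array([-0.5, 1.0, 1.0])   # a known separating vector
X = np.hstack([np.ones((200, 1)), rng.uniform(-1, 1, (200, 2))])
y = np.sign(X @ theta_star)               # labels generated by theta_star

theta_hat = np.zeros(3)                   # theta_hat_0 = 0, as in the proof
eta, updates = 0.1, 0
misclassified = True
while misclassified:
    misclassified = False
    for x_k, y_k in zip(X, y):
        if y_k * (x_k @ theta_hat) <= 0:  # error-driven update
            theta_hat += eta * y_k * x_k
            updates += 1
            misclassified = True
print(f"terminated after {updates} updates")
```

Since the data is linearly separable by construction, the loop always terminates after finitely many updates, as the analysis predicts.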
Classification Margin
In this proof, we assumed that there exists a single, optimal parameter
vector. In practice, when the data is linearly separable, there are an
infinite number of separating vectors: simply requiring correct
classification results in an ill-posed problem.
The classification margin can be defined as the minimum distance of the
decision boundary to a point in that class.
– Used in deriving Support Vector Machines
[Figure: two linearly separable classes in the (x1, x2) plane with the
decision boundary x^T θ = 0 and the margin levels x^T θ = ±1 drawn on
either side.]
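As a concrete reading of this definition, the sketch below (our own, not from the slides) computes the margin of a labelled data set with respect to a given hyperplane w^T x + b = 0:

```python
import numpy as np

def classification_margin(w, b, X, y):
    """Minimum signed distance of the labelled points (X, y) to the
    hyperplane w^T x + b = 0; positive only if every sample is
    correctly classified, zero or negative otherwise."""
    # (x @ w + b) / ||w|| is the signed distance of x to the boundary;
    # multiplying by the label y makes correctly classified points positive
    return np.min(y * (X @ w + b) / np.linalg.norm(w))
```

Maximizing this quantity over (w, b), rather than merely requiring it to be positive, removes the ill-posedness noted above and is the idea behind the Support Vector Machine.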
Example: Perceptron & Logical AND (i)
Consider modelling the logical AND data using a Perceptron:
$$X = \begin{bmatrix} 0 & 0 \\ 0 & 1 \\ 1 & 0 \\ 1 & 1 \end{bmatrix}, \qquad y = \begin{bmatrix} -1 \\ -1 \\ -1 \\ 1 \end{bmatrix}$$
Is the data linearly separable?
Parameter estimates during training (bias first):
k = 0: θ̂ = [0.01, 0.1, 0.006];  k = 5: θ̂ = [-0.98, 1.11, 1.01];  k = 18: θ̂ = [-2.98, 2.11, 1.01]
[Figure: three snapshots of the decision boundary in the (x1, x2) plane
at k = 0, 5 and 18; by k = 18 the boundary correctly separates (1, 1)
from the other three points.]
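A runnable version of this example is sketched below. The initial parameters match the slide's k = 0 values, but the learning rate and the order in which samples are visited are our assumptions, so the intermediate estimates need not match the slide's snapshots exactly.

```python
import numpy as np

# Logical AND data with a leading bias column; targets in {-1, +1}
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1], dtype=float)

theta_hat = np.array([0.01, 0.1, 0.006])   # k = 0 values from the slide
eta, k = 1.0, 0                            # learning rate is our assumption
while np.any(y * (X @ theta_hat) <= 0):    # repeat until all samples correct
    for x_k, y_k in zip(X, y):
        if y_k * (x_k @ theta_hat) <= 0:   # misclassified: apply update
            theta_hat += eta * y_k * x_k
            k += 1
print(k, theta_hat)
```

The loop terminates, confirming that the AND data is linearly separable.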
[Figure: linear model diagram, with inputs x_i weighted by θ̂_{i,k} plus
a bias weight θ̂_{0,k}, summed to form the prediction ŷ, which is
compared against the target y.]
Note:
1. Assume there is no measurement noise in the target data.
2. Assume the data is generated from a linear relationship.
3. Parameter estimation will take an infinite time to converge to the
   optimal values.
4. The rate of convergence and stability depend on the learning rate.
[Figure: the parameter estimates θ̂0 and θ̂1 plotted against iteration k,
converging towards their optimal values.]
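As with the Perceptron, the LMS rule θ̂_{k+1} = θ̂_k + η(y_k − ŷ_k)x_k (restated in the summary below) is easy to sketch in code. The toy data and learning rate here are our own, chosen to satisfy notes 1 and 2 above:

```python
import numpy as np

rng = np.random.default_rng(1)
theta_true = np.array([0.5, -1.0, 2.0])   # linear, noise-free data (notes 1-2)
X = np.hstack([np.ones((500, 1)), rng.uniform(-1, 1, (500, 2))])
y = X @ theta_true

theta_hat = np.zeros(3)
eta = 0.1                                 # too large a value is unstable (note 4)
for x_k, y_k in zip(X, y):
    y_pred = x_k @ theta_hat              # prediction y_hat_k
    theta_hat += eta * (y_k - y_pred) * x_k   # LMS update
print(theta_hat)   # approaches theta_true but never reaches it exactly (note 3)
```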
Lectures 2 & 3: Summary
This lecture has looked at basic linear classification (Perceptron) and
regression (LMS) techniques:
– Investigated the basic linear model structure
– Proposed simple, "on-line" error-based learning rules:
$$\hat{\boldsymbol{\theta}}_{k+1} = \hat{\boldsymbol{\theta}}_k + \eta\,(y_k - \hat{y}_k)\,\mathbf{x}_k$$
– Proved convergence for simple environments