
Perceptron Convergence Theorem
Perceptron: Limitations

• The perceptron can only model linearly separable functions.
• The perceptron can be used to model the following Boolean functions:
  • AND
  • OR
  • COMPLEMENT
• But it cannot model the XOR. Why?
Perceptron: Limitations

• The XOR is not linearly separable.
• It is impossible to separate the classes C1 and C2 with only one line
  (see the code sketch below).
[Figure: the four XOR input points in the (x1, x2) plane, labeled with their classes; (0,1) and (1,0) carry the opposite label from (0,0) and (1,1), so no single straight line separates C1 from C2.]
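To make the contrast concrete, here is a minimal sketch (not from the slides; the brute-force grid search and the helper name perceptron_output are assumptions made for illustration). It finds a single threshold unit that realizes AND but fails to find one for XOR:

```python
# Brute-force search over weights and bias: a single threshold unit can
# realize AND, but no setting of (w1, w2, b) realizes XOR.
import itertools

import numpy as np

def perceptron_output(w, b, x):
    """Threshold unit: +1 if w.x + b > 0, else -1."""
    return 1 if np.dot(w, x) + b > 0 else -1

inputs = [np.array(p) for p in itertools.product([0, 1], repeat=2)]
targets = {
    "AND": [1 if x[0] and x[1] else -1 for x in inputs],
    "XOR": [1 if x[0] != x[1] else -1 for x in inputs],
}

grid = np.linspace(-2, 2, 21)          # candidate weight/bias values
for name, d in targets.items():
    separable = any(
        all(perceptron_output((w1, w2), b, x) == t for x, t in zip(inputs, d))
        for w1, w2, b in itertools.product(grid, grid, grid)
    )
    print(name, "separable by one unit:", separable)   # AND: True, XOR: False
```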
Convergence of the learning algorithm

Suppose the datasets C1 and C2 are linearly separable. The perceptron
convergence algorithm converges after n0 iterations, with n0 ≤ nmax,
on the training set C1 ∪ C2.

Proof:
• Suppose x ∈ C1 ⇒ output = 1 and x ∈ C2 ⇒ output = -1.
• For simplicity assume w(1) = 0, η = 1.
• Suppose the perceptron incorrectly classifies x(1), …, x(n) ∈ C1.
  Then wT(k) x(k) ≤ 0 for k = 1, …, n.
• Error correction rule (sketched in code below): w(k+1) = w(k) + x(k), so
  w(2) = w(1) + x(1)
  w(3) = w(2) + x(2)
  …
  w(n+1) = w(n) + x(n)  ⇒  w(n+1) = x(1) + … + x(n).
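A minimal sketch of this error-correction procedure, assuming a generic training set X with desired outputs d in {+1, -1}; the helper name train_perceptron and the epoch cap are illustrative choices, not code from the slides:

```python
import numpy as np

def train_perceptron(X, d, max_epochs=100):
    """X: (n_samples, n_features) inputs, d: desired outputs in {+1, -1}."""
    w = np.zeros(X.shape[1])           # w(1) = 0
    for _ in range(max_epochs):
        errors = 0
        for x, target in zip(X, d):
            y = 1 if np.dot(w, x) > 0 else -1
            if y != target:            # misclassified sample: apply the correction
                w = w + target * x     # eta = 1; adds x for C1 (as in the proof), subtracts x for C2
                errors += 1
        if errors == 0:                # converged: every sample is classified correctly
            break
    return w
```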
Convergence theorem (proof)

• Let w0 be such that w0T x(n) > 0 for all x(n) ∈ C1.
• w0 exists because C1 and C2 are linearly separable.
• Let α = min { w0T x(n) | x(n) ∈ C1 }.
• Then w0T w(n+1) = w0T x(1) + … + w0T x(n) ≥ n α.
• Cauchy-Schwarz inequality: ||w0||² ||w(n+1)||² ≥ [w0T w(n+1)]²
  ⇒ ||w(n+1)||² ≥ n² α² / ||w0||²   (A)
Convergence theorem (proof)
• Now we consider another route:
  w(k+1) = w(k) + x(k)
  ||w(k+1)||² = ||w(k)||² + ||x(k)||² + 2 wT(k) x(k)   (Euclidean norm)
• The last term is ≤ 0 because x(k) is misclassified, so
  ||w(k+1)||² ≤ ||w(k)||² + ||x(k)||²,   k = 1, …, n
• Summing these inequalities, with ||w(1)||² = 0:
  ||w(2)||² ≤ ||w(1)||² + ||x(1)||²
  ||w(3)||² ≤ ||w(2)||² + ||x(2)||²
  …
  ⇒ ||w(n+1)||² ≤ Σ_{k=1}^{n} ||x(k)||²
Convergence theorem (proof)
• Let β = max { ||x(n)||² | x(n) ∈ C1 }.
• Then ||w(n+1)||² ≤ n β   (B)
• For sufficiently large n, (B) comes into conflict with (A). Therefore n cannot
  grow beyond the value nmax at which (A) and (B) are satisfied with the equality
  sign:
  nmax² α² / ||w0||² = nmax β   ⇒   nmax = β ||w0||² / α²
• The perceptron convergence algorithm therefore terminates in at most
  nmax = β ||w0||² / α² iterations.
Adaline: Adaptive Linear Element

• The output y is a linear combination of the inputs x:

[Figure: a single linear neuron with inputs x1, x2, …, xm, weights w1, w2, …, wm, and output y.]

  y = Σ_{j=0}^{m} x_j(n) w_j(n)
Adaline: Adaptive Linear Element
• Adaline uses a linear neuron model and the Least-Mean-Square (LMS)
  learning algorithm.
• The idea: try to minimize the square error, which is a function of the
  weights:

  E(w(n)) = ½ e²(n)
  e(n) = d(n) − Σ_{j=0}^{m} x_j(n) w_j(n)

• We can find the minimum of the error function E by means of the
  steepest descent method.
Steepest Descent Method

• Start with an arbitrary point.
• Find the direction in which E is decreasing most rapidly:

  gradient of E(w) = ( ∂E/∂w1 , … , ∂E/∂wm )

• Make a small step in that direction:

  w(n+1) = w(n) − η (gradient of E(w(n)))
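A minimal steepest-descent sketch; the quadratic error surface E, its gradient, and the step size are assumptions chosen only to illustrate the update w(n+1) = w(n) − η ∇E(w(n)):

```python
import numpy as np

def E(w):
    """Assumed quadratic error surface with minimum at (3, -1)."""
    return 0.5 * ((w[0] - 3.0) ** 2 + 2.0 * (w[1] + 1.0) ** 2)

def grad_E(w):
    return np.array([w[0] - 3.0, 2.0 * (w[1] + 1.0)])   # (dE/dw1, dE/dw2)

w = np.zeros(2)                     # arbitrary starting point
eta = 0.1                           # small step size
for n in range(200):
    w = w - eta * grad_E(w)         # step opposite to the gradient

print(w)                            # approaches the minimum at (3, -1)
```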

Least-Mean-Square algorithm
(Widrow-Hoff algorithm)

• Approximation of gradient(E):

  ∂E(w(n)) / ∂w(n) = e(n) ∂e(n)/∂w(n) = − e(n) x(n)T

• The update rule for the weights becomes:

  w(n+1) = w(n) + η x(n) e(n)


Summary of LMS algorithm

Training sample: input signal vector x(n)
                 desired response d(n)
User-selected parameter: η > 0
Initialization: set ŵ(1) = 0
Computation: for n = 1, 2, … compute
  e(n) = d(n) − ŵT(n) x(n)
  ŵ(n+1) = ŵ(n) + η x(n) e(n)
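A compact sketch of the summarized LMS procedure, assuming a small synthetic regression task for illustration; the function name lms and the data generation are not from the slides:

```python
import numpy as np

def lms(X, d, eta=0.05, epochs=50):
    """X: (n_samples, m) inputs (bias column appended by the caller),
    d: desired responses. Returns the learned weight estimate w_hat."""
    w_hat = np.zeros(X.shape[1])                 # initialization: w_hat(1) = 0
    for _ in range(epochs):
        for x_n, d_n in zip(X, d):
            e_n = d_n - np.dot(w_hat, x_n)       # e(n) = d(n) - w_hat(n)^T x(n)
            w_hat = w_hat + eta * x_n * e_n      # w_hat(n+1) = w_hat(n) + eta x(n) e(n)
    return w_hat

# Usage on noisy samples of a known linear target (assumed data):
rng = np.random.default_rng(1)
X = np.hstack([np.ones((100, 1)), rng.uniform(-1, 1, (100, 2))])   # bias + 2 inputs
true_w = np.array([0.5, 2.0, -1.0])
d = X @ true_w + 0.01 * rng.standard_normal(100)
print(lms(X, d))                                 # approaches true_w
```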
Thank You
