
Machine Learning

Lecture # 3
Single & Multilayer Perceptron

Artificial Neural Network -
Perceptron
A (Linear) Decision Boundary

Represented by one artificial neuron, called a “Perceptron”

• Low space complexity
• Low time complexity

• Input signals are sent from other neurons
• If sufficient signals accumulate, the neuron fires a signal
• Connection strengths determine how the signals are accumulated
What is ANN?
• The nucleus sums all these weighted input values, which gives us the activation
• For n inputs and n weights, each weight is multiplied by its input and the products are summed:

a = x1w1 + x2w2 + x3w3 + ... + xnwn

Perceptron
• input signals ‘x’ and weights ‘w’ are multiplied
• weights correspond to connection strengths
• signals are added up – if they are enough, FIRE!

[Diagram: incoming signals x1, x2, x3 arrive on connections with strengths w1, w2, w3 and are added up by the neuron]

a = Σ_{i=1..M} xi wi        (the activation level)

if (a > t) output signal = 1
else       output signal = 0
Calculation…

a = Σ_{i=1..M} xi wi

Sum notation (just like a loop from 1 to M): multiply corresponding elements and add them up.

double[] x = ... ;               // inputs
double[] w = ... ;               // weights
double a = 0.0;                  // a (activation)
for (int i = 0; i < x.length; i++)
    a += x[i] * w[i];

if (activation > threshold) FIRE !
Perceptron Decision Rule

if ( Σ_{i=1..M} xi wi > t ) then output = 1, else output = 0

[Plot: the boundary splits the data plane into an “output = 0” region and an “output = 1” region]
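
The same rule as a small code sketch (the method and variable names are illustrative, not from the slides):

static int output(double[] x, double[] w, double t) {
    double a = 0.0;                  // activation
    for (int i = 0; i < x.length; i++)
        a += x[i] * w[i];            // weighted sum of inputs
    return (a > t) ? 1 : 0;          // fire only above the threshold
}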
Is this a good decision boundary?

if ( Σ_{i=1..M} xi wi > t ) then output = 1, else output = 0

Example 1:  w1 = 1.0,   w2 = 0.2,   t = 0.05
Example 2:  w1 = 2.1,   w2 = 0.2,   t = 0.05
Example 3:  w1 = 1.9,   w2 = 0.02,  t = 0.05
Example 4:  w1 = -0.8,  w2 = 0.03,  t = 0.05

[Each example is plotted against the same data, producing a different boundary]
Changing the weights/threshold makes the decision boundary move.
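
A short check of why (not on the original slides): in two dimensions the boundary is exactly the set of points where the activation equals the threshold,

w1·x1 + w2·x2 = t,   i.e.   x2 = (t − w1·x1) / w2

so changing w1, w2 or t changes the slope and intercept of this line.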
x = [ 1.0, 0.5, 2.0 ]
w = [ 0.2, 0.5, 0.5 ]
t = 1.0

a = Σ_{i=1..M} xi wi

Q1. What is the activation, a, of the neuron?
Q2. Does the neuron fire?
Q3. What if we set the threshold at 0.5 and weight #3 to zero?
x = [ 1.0, 0.5, 2.0 ]
w = [ 0.2, 0.5, 0.5 ]
t = 1.0

Q1. What is the activation, a, of the neuron?

a = Σ_{i=1..M} xi wi = (1.0 × 0.2) + (0.5 × 0.5) + (2.0 × 0.5) = 1.45

Q2. Does the neuron fire?

if (activation > threshold) output = 1 else output = 0

1.45 > 1.0, so yes, it fires.
x = [ 1.0, 0.5, 2.0 ]
w = [ 0.2, 0.5, 0.5 ]
t = 1.0

Q3. What if we set the threshold at 0.5 and weight #3 to zero?

a = Σ_{i=1..M} xi wi = (1.0 × 0.2) + (0.5 × 0.5) + (2.0 × 0.0) = 0.45

if (activation > threshold) output = 1 else output = 0

0.45 < 0.5 …. so no, it does not fire.
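
Both answers are easy to reproduce with the decision-rule sketch above; a minimal, self-contained check (class and method names are illustrative):

public class PerceptronExample {
    static double activation(double[] x, double[] w) {
        double a = 0.0;
        for (int i = 0; i < x.length; i++)
            a += x[i] * w[i];
        return a;
    }
    public static void main(String[] args) {
        double[] x = { 1.0, 0.5, 2.0 };
        double[] w = { 0.2, 0.5, 0.5 };
        System.out.println(activation(x, w) > 1.0);  // Q2: 1.45 > 1.0 -> true (fires)
        w[2] = 0.0;                                  // Q3: weight #3 set to zero
        System.out.println(activation(x, w) > 0.5);  // 0.45 > 0.5 -> false (does not fire)
    }
}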
We can rearrange the decision rule….

if ( Σ_{i=1..M} xi wi > t )              then output = 1, else output = 0

if ( Σ_{i=1..M} xi wi − t > 0 )          then output = 1, else output = 0

if ( Σ_{i=1..M} xi wi + (−1 × t) > 0 )   then output = 1, else output = 0

if ( Σ_{i=1..M} xi wi + (x0 × w0) > 0 )  then output = 1, else output = 0,   with x0 = −1 and w0 = t

if ( Σ_{i=0..M} xi wi > 0 )              then output = 1, else output = 0

We now treat the threshold like any other weight, with a permanent input of −1: the bias (a “false input”).
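
In code the rearrangement is just a change of loop start; a sketch assuming x[0] = −1 and w[0] = t (the array layout is my assumption):

// x[0] is the permanent "false input" (-1), w[0] holds the old threshold t,
// so the neuron now fires whenever the sum exceeds zero.
static int outputWithBias(double[] x, double[] w) {
    double a = 0.0;
    for (int i = 0; i < x.length; i++)   // i starts at 0, including the bias term
        a += x[i] * w[i];
    return (a > 0) ? 1 : 0;
}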
Perceptron Learning Algorithm

initialise weights (w)
repeat until all points are correctly classified
    repeat for each point i
        calculate the margin (yi·w·Xi) for point i
        if margin > 0, point i is correctly classified
        else change the weights to increase the margin, such that
            Δw = η·yi·Xi  and  w_new = w_old + Δw
    end
end

(a runnable sketch of this loop follows below)
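
A minimal Java version of the loop above, assuming labels yi ∈ {−1, +1}, inputs that already contain the −1 bias entry, and an added maxEpochs guard (the slides' loop runs until everything is classified, which is only guaranteed to terminate for linearly separable data):

static double[] train(double[][] X, int[] y, double eta, int maxEpochs) {
    double[] w = new double[X[0].length];           // initialise weights (here: zeros)
    for (int epoch = 0; epoch < maxEpochs; epoch++) {
        boolean allCorrect = true;
        for (int i = 0; i < X.length; i++) {        // repeat for each point
            double margin = 0.0;
            for (int j = 0; j < w.length; j++)
                margin += w[j] * X[i][j];
            margin *= y[i];                         // margin = y_i * (w . X_i)
            if (margin <= 0) {                      // misclassified
                for (int j = 0; j < w.length; j++)
                    w[j] += eta * y[i] * X[i][j];   // w_new = w_old + eta * y_i * X_i
                allCorrect = false;
            }
        }
        if (allCorrect) break;                      // all points correctly classified
    }
    return w;
}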

Perceptron convergence theorem:

If the data is linearly separable, then application of the Perceptron learning rule will find a separating decision boundary within a finite number of iterations.
Decision Boundary Using Perceptron
Multiple Outputs
Criticism of the Perceptron
• Minsky and Papert criticised the perceptron (1969)
• Minsky and Papert's criticism was partly right
• We have no a priori reason to think that problems should be linearly separable

However, it turns out that the world is full of problems that are at least close to linearly separable.
Can a Perceptron solve this problem?
Can a Perceptron solve this problem? ….. NO.

Perceptrons only solve LINEARLY SEPARABLE problems.
With a perceptron, the decision boundary is LINEAR.

[Plot: a non-linearly-separable (XOR-style) problem on the unit square, with classes A and B at opposite corners; no single line separates them]
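
One quick way to convince yourself (not on the slides): brute-force a single unit over a grid of weights and thresholds; AND admits a solution, XOR does not.

static boolean linearlySolvable(int[][] X, int[] y) {
    // coarse grid search over w1, w2, t; enough for 0/1 inputs
    for (double w1 = -2; w1 <= 2; w1 += 0.25)
        for (double w2 = -2; w2 <= 2; w2 += 0.25)
            for (double t = -2; t <= 2; t += 0.25) {
                boolean ok = true;
                for (int i = 0; i < X.length; i++) {
                    int out = (X[i][0] * w1 + X[i][1] * w2 > t) ? 1 : 0;
                    if (out != y[i]) { ok = false; break; }
                }
                if (ok) return true;                 // found a separating line
            }
    return false;                                    // no line on the grid works
}

public static void main(String[] args) {
    int[][] X = { {0,0}, {0,1}, {1,0}, {1,1} };
    System.out.println(linearlySolvable(X, new int[]{0,0,0,1}));  // AND -> true
    System.out.println(linearlySolvable(X, new int[]{0,1,1,0}));  // XOR -> false
}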
Overview of SVM w.r.t. Perceptron

Perceptron VS SVM
• The Perceptron does not try to optimize the separation “distance”. As long as it finds a hyperplane that separates the two sets, it is good. The SVM, on the other hand, tries to maximize the margin, i.e., the distance between the two closest opposite sample points (the “support vectors”); the standard formulation is sketched after this list.
• The SVM typically uses a “kernel function” to project the sample points into a higher-dimensional space to make them linearly separable, while the perceptron assumes the sample points are already linearly separable.
• The SVM requires more parameter choices than the perceptron:
– choice of kernel
– selection of kernel parameters
– selection of the value of the margin parameter
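
For reference, a sketch of the standard hard-margin formulation behind “maximize the margin” (textbook material, not from these slides): for labels yi ∈ {−1, +1},

minimize (1/2)·||w||²   subject to   yi·(w·xi + b) ≥ 1 for all i

which maximizes the margin 2/||w||; only the closest points (the support vectors) make the constraints tight.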
SVM and Margins
SVM for Nonlinear Data
Acknowledgements
Material in these slides has been taken from the following resources:
 Introduction to Machine Learning – E. Alpaydin
 Statistical Pattern Recognition: A Review – A.K. Jain et al., PAMI (22), 2000
 Pattern Recognition and Analysis Course – A.K. Jain, MSU
 Pattern Classification – Duda et al., John Wiley & Sons
