
Machine Learning

Lecture # 3
Single & Multilayer Perceptron

Artificial Neural Network -
Perceptron
A (Linear) Decision Boundary

Represented by one artificial neuron, called a “Perceptron”

• Low space complexity
• Low time complexity

• Input signals are sent from other neurons
• If sufficient signals accumulate, the neuron fires a signal
• Connection strengths determine how the signals are accumulated
What is ANN?
• The nucleus sums all these weighted input values, which gives us the activation
• For n inputs and n weights, each weight is multiplied by its input and the products are summed:

a = x1w1 + x2w2 + x3w3 + ... + xnwn

Perceptron
• input signals ‘x’ and weights ‘w’ are multiplied
• weights correspond to connection strengths
• signals are added up – if they are enough, FIRE!

[Diagram: incoming signals x1, x2, x3 arrive on connections with strengths w1, w2, w3 and are added up by the neuron]

a = Σ_{i=1..M} xi wi        (the activation level)

if (a > t) output signal = 1
else       output signal = 0
Calculation…

a = Σ_{i=1..M} xi wi

Sum notation (just like a loop from 1 to M): multiply corresponding elements and add them up.

double[] x = ... ;               // inputs
double[] w = ... ;               // weights
double a = 0.0;                  // a (activation)
for (int i = 0; i < x.length; i++)
    a += x[i] * w[i];

if (activation > threshold) FIRE !
Perceptron Decision Rule

if ( Σ_{i=1..M} xi wi > t ) then output = 1, else output = 0

[Plot: the boundary splits the data plane into an “output = 0” region and an “output = 1” region]
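
The same rule as a small code sketch (the method and variable names are illustrative, not from the slides):

static int output(double[] x, double[] w, double t) {
    double a = 0.0;                  // activation
    for (int i = 0; i < x.length; i++)
        a += x[i] * w[i];            // weighted sum of inputs
    return (a > t) ? 1 : 0;          // fire only above the threshold
}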
Is this a good decision boundary?

if ( Σ_{i=1..M} xi wi > t ) then output = 1, else output = 0

Example 1:  w1 = 1.0,   w2 = 0.2,   t = 0.05
Example 2:  w1 = 2.1,   w2 = 0.2,   t = 0.05
Example 3:  w1 = 1.9,   w2 = 0.02,  t = 0.05
Example 4:  w1 = -0.8,  w2 = 0.03,  t = 0.05

[Each example is plotted against the same data, producing a different boundary]
Changing the weights/threshold makes the decision boundary move.
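
A short check of why (not on the original slides): in two dimensions the boundary is exactly the set of points where the activation equals the threshold,

w1·x1 + w2·x2 = t,   i.e.   x2 = (t − w1·x1) / w2

so changing w1, w2 or t changes the slope and intercept of this line.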
x = [ 1.0, 0.5, 2.0 ]
w = [ 0.2, 0.5, 0.5 ]
t = 1.0

a = Σ_{i=1..M} xi wi

Q1. What is the activation, a, of the neuron?
Q2. Does the neuron fire?
Q3. What if we set the threshold at 0.5 and weight #3 to zero?
x = [ 1.0, 0.5, 2.0 ]
w = [ 0.2, 0.5, 0.5 ]
t = 1.0

Q1. What is the activation, a, of the neuron?

a = Σ_{i=1..M} xi wi = (1.0 × 0.2) + (0.5 × 0.5) + (2.0 × 0.5) = 1.45

Q2. Does the neuron fire?

if (activation > threshold) output = 1 else output = 0

1.45 > 1.0, so yes, it fires.
x = [ 1.0, 0.5, 2.0 ]
w = [ 0.2, 0.5, 0.5 ]
t = 1.0

Q3. What if we set the threshold at 0.5 and weight #3 to zero?

a = Σ_{i=1..M} xi wi = (1.0 × 0.2) + (0.5 × 0.5) + (2.0 × 0.0) = 0.45

if (activation > threshold) output = 1 else output = 0

0.45 < 0.5 …. so no, it does not fire.
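
Both answers are easy to reproduce with the decision-rule sketch above; a minimal, self-contained check (class and method names are illustrative):

public class PerceptronExample {
    static double activation(double[] x, double[] w) {
        double a = 0.0;
        for (int i = 0; i < x.length; i++)
            a += x[i] * w[i];
        return a;
    }
    public static void main(String[] args) {
        double[] x = { 1.0, 0.5, 2.0 };
        double[] w = { 0.2, 0.5, 0.5 };
        System.out.println(activation(x, w) > 1.0);  // Q2: 1.45 > 1.0 -> true (fires)
        w[2] = 0.0;                                  // Q3: weight #3 set to zero
        System.out.println(activation(x, w) > 0.5);  // 0.45 > 0.5 -> false (does not fire)
    }
}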
We can rearrange the decision rule….

if ( Σ_{i=1..M} xi wi > t )              then output = 1, else output = 0

if ( Σ_{i=1..M} xi wi − t > 0 )          then output = 1, else output = 0

if ( Σ_{i=1..M} xi wi + (−1 × t) > 0 )   then output = 1, else output = 0

if ( Σ_{i=1..M} xi wi + (x0 × w0) > 0 )  then output = 1, else output = 0,   with x0 = −1 and w0 = t

if ( Σ_{i=0..M} xi wi > 0 )              then output = 1, else output = 0

We now treat the threshold like any other weight, with a permanent input of −1: the bias (a “false input”).
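
In code the rearrangement is just a change of loop start; a sketch assuming x[0] = −1 and w[0] = t (the array layout is my assumption):

// x[0] is the permanent "false input" (-1), w[0] holds the old threshold t,
// so the neuron now fires whenever the sum exceeds zero.
static int outputWithBias(double[] x, double[] w) {
    double a = 0.0;
    for (int i = 0; i < x.length; i++)   // i starts at 0, including the bias term
        a += x[i] * w[i];
    return (a > 0) ? 1 : 0;
}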
Perceptron Learning Algorithm

initialise weights (w)
repeat until all points are correctly classified
    repeat for each point i
        calculate the margin (yi·w·Xi) for point i
        if margin > 0, point i is correctly classified
        else change the weights to increase the margin, such that
            Δw = η·yi·Xi  and  w_new = w_old + Δw
    end
end

(a runnable sketch of this loop follows below)
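
A minimal Java version of the loop above, assuming labels yi ∈ {−1, +1}, inputs that already contain the −1 bias entry, and an added maxEpochs guard (the slides' loop runs until everything is classified, which is only guaranteed to terminate for linearly separable data):

static double[] train(double[][] X, int[] y, double eta, int maxEpochs) {
    double[] w = new double[X[0].length];           // initialise weights (here: zeros)
    for (int epoch = 0; epoch < maxEpochs; epoch++) {
        boolean allCorrect = true;
        for (int i = 0; i < X.length; i++) {        // repeat for each point
            double margin = 0.0;
            for (int j = 0; j < w.length; j++)
                margin += w[j] * X[i][j];
            margin *= y[i];                         // margin = y_i * (w . X_i)
            if (margin <= 0) {                      // misclassified
                for (int j = 0; j < w.length; j++)
                    w[j] += eta * y[i] * X[i][j];   // w_new = w_old + eta * y_i * X_i
                allCorrect = false;
            }
        }
        if (allCorrect) break;                      // all points correctly classified
    }
    return w;
}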

Perceptron convergence theorem:

If the data is linearly separable, then application of the Perceptron learning rule will find a separating decision boundary within a finite number of iterations.
Decision Boundary Using Perceptron
Multiple Outputs
Criticism of the Perceptron
• Minsky and Papert criticised the perceptron (1969)
• Minsky and Papert's criticism was partly right
• We have no a priori reason to think that problems should be linearly separable

However, it turns out that the world is full of problems that are at least close to linearly separable.
Can a Perceptron solve this problem?
Can a Perceptron solve this problem? ….. NO.

Perceptrons only solve LINEARLY SEPARABLE problems.
With a perceptron, the decision boundary is LINEAR.

[Plot: a non-linearly-separable (XOR-style) problem on the unit square, with classes A and B at opposite corners; no single line separates them]
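
One quick way to convince yourself (not on the slides): brute-force a single unit over a grid of weights and thresholds; AND admits a solution, XOR does not.

static boolean linearlySolvable(int[][] X, int[] y) {
    // coarse grid search over w1, w2, t; enough for 0/1 inputs
    for (double w1 = -2; w1 <= 2; w1 += 0.25)
        for (double w2 = -2; w2 <= 2; w2 += 0.25)
            for (double t = -2; t <= 2; t += 0.25) {
                boolean ok = true;
                for (int i = 0; i < X.length; i++) {
                    int out = (X[i][0] * w1 + X[i][1] * w2 > t) ? 1 : 0;
                    if (out != y[i]) { ok = false; break; }
                }
                if (ok) return true;                 // found a separating line
            }
    return false;                                    // no line on the grid works
}

public static void main(String[] args) {
    int[][] X = { {0,0}, {0,1}, {1,0}, {1,1} };
    System.out.println(linearlySolvable(X, new int[]{0,0,0,1}));  // AND -> true
    System.out.println(linearlySolvable(X, new int[]{0,1,1,0}));  // XOR -> false
}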
Overview of SVM w.r.t. Perceptron

Perceptron VS SVM
• The Perceptron does not try to optimize the separation “distance”. As long as it finds a hyperplane that separates the two sets, it is good. The SVM, on the other hand, tries to maximize the margin, i.e., the distance between the two closest opposite sample points (the “support vectors”); the standard formulation is sketched after this list.
• The SVM typically uses a “kernel function” to project the sample points into a higher-dimensional space to make them linearly separable, while the perceptron assumes the sample points are already linearly separable.
• The SVM requires more parameter choices than the perceptron:
– choice of kernel
– selection of kernel parameters
– selection of the value of the margin parameter
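
For reference, a sketch of the standard hard-margin formulation behind “maximize the margin” (textbook material, not from these slides): for labels yi ∈ {−1, +1},

minimize (1/2)·||w||²   subject to   yi·(w·xi + b) ≥ 1 for all i

which maximizes the margin 2/||w||; only the closest points (the support vectors) make the constraints tight.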
SVM and Margins
SVM for Nonlinear Data
Acknowledgements
Material in these slides has been taken from the following resources:
 Introduction to Machine Learning – E. Alpaydin
 Statistical Pattern Recognition: A Review – A.K. Jain et al., PAMI (22), 2000
 Pattern Recognition and Analysis Course – A.K. Jain, MSU
 Pattern Classification – Duda et al., John Wiley & Sons
