Lecture 16 - Hyperplane Classifiers - Perceptron
Equation of the hyperplane: 𝒘⊤𝒙 + 𝑏 = 0
Assumption: the hyperplane passes through the origin (𝑏 = 0). If not, add a bias term 𝑏.
Distance of a point from a hyperplane: the signed distance (𝒘⊤𝒙 + 𝑏)/‖𝒘‖ can be positive or negative, depending on which side of the hyperplane the point lies.
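The signed distance can be sketched in plain Python (a minimal illustration; the function name and example points are my own, not from the slides):

```python
import math

def signed_distance(w, x, b=0.0):
    """Signed distance of point x from the hyperplane w^T x + b = 0.

    Positive on the side that w points toward, negative on the other side.
    """
    dot = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm = math.sqrt(sum(wi * wi for wi in w))
    return dot / norm

# Two points on opposite sides of the hyperplane with w = (1, 1), b = 0:
print(signed_distance([1.0, 1.0], [1.0, 1.0]))    # positive (≈ 1.414)
print(signed_distance([1.0, 1.0], [-1.0, -1.0]))  # negative (≈ -1.414)
```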
CS771: Intro to ML
Hyperplane based (binary) classification
Basic idea: Learn to separate two classes by a hyperplane
Prediction rule: 𝑦∗ = sign(𝒘⊤𝒙∗ + 𝑏)
For multi-class classification with hyperplanes, there will be multiple hyperplanes (e.g., one for each pair of classes); more on this later.
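The prediction rule above can be written as a short sketch (plain Python, labels in {+1, −1}; treating sign(0) as +1 is a common convention, not stated in the slides):

```python
def predict(w, x, b=0.0):
    """Hyperplane classifier prediction: y* = sign(w^T x* + b)."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1   # convention: sign(0) = +1

# With w = (1, -1), points where x1 > x2 get label +1:
print(predict([1.0, -1.0], [2.0, 0.5]))   # +1
print(predict([1.0, -1.0], [0.0, 3.0]))   # -1
```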
These measure the difference between the true output and the model's prediction. What about loss functions for classification, where the label 𝑦 ∈ {−1, +1}?
Perhaps the most natural classification loss function would be a "0-1 Loss":
Loss = 1 if 𝑦 ≠ ŷ, and Loss = 0 if 𝑦 = ŷ.
Assuming labels as +1/−1, this means
ℓ(𝑦, ŷ) = 1 if 𝑦𝒘⊤𝒙 < 0, and ℓ(𝑦, ŷ) = 0 if 𝑦𝒘⊤𝒙 ≥ 0
i.e., the loss depends only on the sign of 𝑦𝒘⊤𝒙 (same as checking 𝑦 ≠ sign(𝒘⊤𝒙)).
The 0-1 loss is non-convex, non-differentiable, and NP-Hard to optimize (it also gives no useful gradient info for the most part).
[Figure: the 0-1 loss plotted against 𝑦𝒘⊤𝒙 — a step from 1 down to 0 at the origin.]
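A direct sketch of the 0-1 loss for ±1 labels (plain Python; the function name and example weights are illustrative only):

```python
def zero_one_loss(y, w, x, b=0.0):
    """0-1 loss for labels in {+1, -1}: 1 iff y disagrees with sign(w^T x + b)."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if y * score < 0 else 0

# With w = (1, 0): the loss only checks agreement, not how large the margin is.
print(zero_one_loss(+1, [1.0, 0.0], [5.0, 2.0]))   # 0 (correct side)
print(zero_one_loss(+1, [1.0, 0.0], [-0.1, 2.0]))  # 1 (wrong side)
```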
Interlude: Loss Functions for Classification
An ideal loss function for classification should be such that:
Loss is small/zero if 𝑦 and ŷ match.
Loss is large/non-zero if 𝑦 and ŷ do not match.
Equivalently, in terms of 𝑦𝒘⊤𝒙:
Large positive 𝑦𝒘⊤𝒙 ⇒ small/zero loss.
Large negative 𝑦𝒘⊤𝒙 ⇒ large/non-zero loss.
The "Perceptron" loss, ℓ(𝑦, 𝒘⊤𝒙) = max{0, −𝑦𝒘⊤𝒙}, behaves this way and is convex but non-differentiable.
[Figure: several classification losses plotted against 𝑦𝒘⊤𝒙 — some convex and differentiable, some convex and non-differentiable (e.g., the perceptron loss).]
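A sketch of the perceptron loss, assuming the standard form max{0, −𝑦𝒘⊤𝒙}, together with the usual subgradient update step (the update shown is the classic perceptron rule, not taken verbatim from the slides):

```python
def perceptron_loss(y, w, x):
    """Perceptron loss: max(0, -y * w^T x) for labels y in {+1, -1}."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    return max(0.0, -y * score)

def perceptron_update(w, y, x, lr=1.0):
    """One (sub)gradient step: on a mistake, move w toward y * x."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    if y * score <= 0:  # mistake (or on the hyperplane): subgradient is -y*x
        return [wi + lr * y * xi for wi, xi in zip(w, x)]
    return w  # correctly classified with positive margin: zero loss, no change

w = [0.0, 0.0]
w = perceptron_update(w, +1, [1.0, 2.0])   # mistake at w = 0, so w becomes [1, 2]
print(perceptron_loss(+1, w, [1.0, 2.0]))  # 0.0 — now correctly classified
```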