IE 643: Deep Learning - Theory and Practice
Lecture 3 (August 21, 2020)
1 Recap
Supervised Machine Learning
Perceptron and Learning
2 Perceptron Convergence
Binary Classification
Recall: e-mail Spam Classification
Binary Classification
Training
Input: Training data $D = \{(x_i, y_i)\}_{i=1}^{n}$.
Aim: Learn a model $h : \mathcal{X} \to \mathcal{Y}$.
Testing
Given $\hat{x}$, predict $\hat{y} = h(\hat{x})$.
Recap: Perceptron and Learning
Perceptron
Hyperplane in $\mathbb{R}^d$
A hyperplane is a set of the form $H = \{x \in \mathbb{R}^d : \langle w, x \rangle = b\}$ for some $w \neq 0$ and $b \in \mathbb{R}$.
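For instance (a concrete example, not from the slides): in $\mathbb{R}^2$, taking $w = (1, 1)$ and $b = 1$ gives
$$H = \{(x_1, x_2) \in \mathbb{R}^2 : x_1 + x_2 = 1\},$$
the line through $(1, 0)$ and $(0, 1)$; the vector $w$ is normal to $H$.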
Prediction Rule
$\langle w, x \rangle \ge \theta \implies$ predict $1$.
$\langle w, x \rangle < \theta \implies$ predict $-1$.
Geometric Idea: Find a separating hyperplane $(w, \theta)$ such that samples with class labels $1$ and $-1$ lie on opposite sides of the hyperplane.
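A minimal Python sketch of this prediction rule (illustrative only; the function name is mine):

```python
import numpy as np

def perceptron_predict(w, theta, x):
    """Perceptron prediction rule: 1 if <w, x> >= theta, else -1."""
    return 1 if np.dot(w, x) >= theta else -1
```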
Perceptron - Training
First assumption: At a minimum, the data should be such that the samples with label $1$ can be separated by a hyperplane from the samples with label $-1$.
Is this assumption sufficient?
For the convergence proof we assume something slightly stronger: there exist $w^*$ and a margin $\gamma > 0$ such that for every round $t$:
$\langle w^*, x^t \rangle > \gamma$ when $y^t = 1$,
$\langle w^*, x^t \rangle < -\gamma$ when $y^t = -1$.
Compactly, for all $t$:
$y^t \langle w^*, x^t \rangle > \gamma$.
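For reference, a minimal sketch of the training loop whose update $w^{t+1} = w^t + y^t x^t$ is used in the derivation below (a simplification assuming $\theta = 0$ and $w^1 = 0$; names are mine):

```python
import numpy as np

def perceptron_train(X, y, epochs=10):
    """Perceptron updates over the data; returns final weights and mistake count.

    X: (n, d) array of samples; y: (n,) array of labels in {-1, +1}.
    """
    w = np.zeros(X.shape[1])                # w^1 = 0
    mistakes = 0
    for _ in range(epochs):
        for x_t, y_t in zip(X, y):
            if y_t * np.dot(w, x_t) <= 0:   # mistake at round t
                w = w + y_t * x_t           # update: w^{t+1} = w^t + y^t x^t
                mistakes += 1
    return w, mistakes
```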
Suppose a mistake is made at round $t$, so the update $w^{t+1} = w^t + y^t x^t$ is applied. Then:
$\langle w^*, w^{t+1} \rangle - \langle w^*, w^t \rangle = \langle w^*, w^t + y^t x^t \rangle - \langle w^*, w^t \rangle$
$\quad = \langle w^*, w^t \rangle + \langle w^*, y^t x^t \rangle - \langle w^*, w^t \rangle$ (how? additivity of the inner product)
$\quad = \langle w^*, y^t x^t \rangle$
$\quad = y^t \langle w^*, x^t \rangle$ (how? scalars factor out of the inner product)
$\quad > \gamma$ (by the margin assumption).
Hence, on every mistake round, $\langle w^*, w^{t+1} \rangle - \langle w^*, w^t \rangle > \gamma$.
On a round with no mistake, the weights are unchanged ($w^{t+1} = w^t$), hence $\langle w^*, w^{t+1} \rangle - \langle w^*, w^t \rangle = 0$.
Summing over all rounds and splitting according to whether a mistake was made:
$\sum_{t=1}^{T} \big( \langle w^*, w^{t+1} \rangle - \langle w^*, w^t \rangle \big) = \sum_{\substack{t \in \{1,\dots,T\}: \\ \text{mistake at round } t}} \big( \langle w^*, w^{t+1} \rangle - \langle w^*, w^t \rangle \big) + \sum_{\substack{t \in \{1,\dots,T\}: \\ \text{no mistake at round } t}} \big( \langle w^*, w^{t+1} \rangle - \langle w^*, w^t \rangle \big)$
$= \sum_{\substack{t \in \{1,\dots,T\}: \\ \text{mistake at round } t}} \big( \langle w^*, w^{t+1} \rangle - \langle w^*, w^t \rangle \big) + 0$ (how? each no-mistake term is $0$)
$> M\gamma$ (how? each mistake-round term exceeds $\gamma$, and $M$ is the number of mistake rounds in $\{1, \dots, T\}$)
Also note:
$\sum_{t=1}^{T} \big( \langle w^*, w^{t+1} \rangle - \langle w^*, w^t \rangle \big) = \langle w^*, w^{T+1} \rangle$ (homework!)
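For the homework, the relevant pattern is the telescoping identity (a standard fact, stated here for convenience): with $a_t = \langle w^*, w^t \rangle$,
$$\sum_{t=1}^{T} (a_{t+1} - a_t) = a_{T+1} - a_1,$$
which equals $\langle w^*, w^{T+1} \rangle$ under the usual initialization $w^1 = 0$ (an assumption; the slides do not restate it here).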
Hence we have:
$\sum_{t=1}^{T} \big( \langle w^*, w^{t+1} \rangle - \langle w^*, w^t \rangle \big) > M\gamma$
$\implies \langle w^*, w^{T+1} \rangle > M\gamma.$
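A quick numerical sanity check of this bound on a toy separable dataset (entirely illustrative; `w_star`, `gamma`, and the data are assumptions of mine, and `perceptron_train` is the sketch given earlier):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data labeled by a unit-norm separator w_star, filtered to margin gamma.
w_star = np.array([1.0, -1.0]) / np.sqrt(2.0)
gamma = 0.1
X = rng.normal(size=(200, 2))
keep = np.abs(X @ w_star) > gamma          # enforce y^t <w*, x^t> > gamma
X = X[keep]
y = np.sign(X @ w_star)

w, M = perceptron_train(X, y)              # sketch from the training slide above
print(np.dot(w_star, w), ">", M * gamma)   # expect <w*, w^{T+1}> > M * gamma
```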