Lec5 Class Margin
Yufei Tao
Department of Computer Science and Engineering
Chinese University of Hong Kong
Definition 2.
Let P be a linearly separable dataset in R^d. The goal of the large margin separation problem is to find a separation plane with the maximum margin.
(Figure: a separation plane and its margin.)
Next, we will discuss two methods to approach the problem. The first one gives the optimal solution, but is quite complicated and (often) computationally expensive. The second method, on the other hand, is much simpler and (often) much faster, but gives an approximate solution that is nearly optimal.
(Figures: two copies of the separation plane are translated in opposite directions until each of them touches a point of P.)
Now, focus on the two copies of the plane in their final positions. If one copy has equation c1 x1 + c2 x2 + ... + cd xd = c_{d+1}, then the other copy must have equation c1 x1 + c2 x2 + ... + cd xd = −c_{d+1}. Here c_{d+1} is a strictly positive value. Let p(x1, ..., xd) be a point in P. We must have (think: why?):

if p is red, then c1 x1 + c2 x2 + ... + cd xd ≥ c_{d+1};
if p is blue, then c1 x1 + c2 x2 + ... + cd xd ≤ −c_{d+1}.

Dividing both sides by c_{d+1}, the two copies become the planes

ℓ1: w1 x1 + w2 x2 + ... + wd xd = 1
ℓ2: w1 x1 + w2 x2 + ... + wd xd = −1

where wi = ci / c_{d+1} for each i ∈ [1, d].

The margin of the original separation plane is exactly half of the distance between ℓ1 and ℓ2:

(Figure: ℓ1, ℓ2, and the margin between them.)
Lemma 3.
Define w = [w1, w2, ..., wd]. The distance between ℓ1 and ℓ2 is

2 / √(w1² + w2² + ... + wd²) = 2 / |w|.
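As a quick numeric illustration of Lemma 3 (a sketch, not part of the slides; the vector w below is an arbitrary choice), the distance between the planes w·x = 1 and w·x = −1 can be measured by projecting onto the unit normal:

```python
import numpy as np

# Numeric check of Lemma 3: the planes l1: w.x = 1 and l2: w.x = -1
# lie at distance 2/|w| from each other.
w = np.array([3.0, 4.0])            # arbitrary normal vector, |w| = 5
p1 = w / np.dot(w, w)               # a point on l1: w.p1 = 1
p2 = -w / np.dot(w, w)              # a point on l2: w.p2 = -1
# distance = projection of (p1 - p2) onto the unit normal w/|w|
dist = np.dot(w / np.linalg.norm(w), p1 - p2)
print(dist, 2 / np.linalg.norm(w))  # both equal 0.4
```

The two printed values coincide, matching the lemma.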
Proof of Lemma 3
Take an arbitrary point p1 on ℓ1, and an arbitrary point p2 on ℓ2. Hence, w·p1 = 1 and w·p2 = −1. It follows that w·(p1 − p2) = 2.

(Figure: ℓ1, ℓ2, the normal vector w, and the points p1, p2.)
The distance between ℓ1 and ℓ2 is the length of the projection of p1 − p2 onto the direction of w, namely:

(w / |w|) · (p1 − p2) = 2 / |w|.

This completes the proof of Lemma 3.

By Lemma 3, maximizing the margin amounts to minimizing |w|, i.e., finding w1, ..., wd that minimize w1² + w2² + ... + wd² subject to the following constraints:

for every red point p ∈ P: w·p ≥ 1;
for every blue point p ∈ P: w·p ≤ −1.
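The large-margin problem can thus be written as minimizing |w|² subject to w·p ≥ 1 for red points and w·p ≤ −1 for blue points. As an illustration (not from the slides), the sketch below feeds this program to a generic solver; the toy dataset and the choice of scipy.optimize.minimize are assumptions:

```python
import numpy as np
from scipy.optimize import minimize

# Toy 2-d dataset (illustrative): red points carry label +1, blue points -1.
red  = np.array([[2.0, 2.0], [3.0, 1.0]])
blue = np.array([[-2.0, -2.0], [-1.0, -3.0]])
points = np.vstack([red, blue])
labels = np.array([1.0, 1.0, -1.0, -1.0])

# Minimize |w|^2 subject to w.p >= 1 (red) and w.p <= -1 (blue);
# both constraints can be encoded uniformly as label * (w.p) - 1 >= 0.
cons = [{"type": "ineq", "fun": lambda w, p=p, y=y: y * np.dot(w, p) - 1.0}
        for p, y in zip(points, labels)]

res = minimize(lambda w: np.dot(w, w), x0=np.array([1.0, 1.0]), constraints=cons)
w = res.x
margin = 1.0 / np.linalg.norm(w)   # by Lemma 3, the margin is half of 2/|w|
print(w, margin)
```

For this symmetric dataset the solver lands on w ≈ [0.25, 0.25], giving a margin of about 2.83.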
Let us first assume that we know a value γ satisfying γ ≤ γ_opt, where γ_opt is the optimal (i.e., maximum possible) margin (we will clarify how to find γ later). Recall that a separation plane has the equation c1 x1 + c2 x2 + ... + cd xd = 0. Define vector c = [c1, c2, ..., cd], and refer to the plane as the plane determined by c. The goal is to find a good c.

Our weapon is once again Perceptron. The difference from before is that we will now correct our c not only when a point falls on the wrong side of the plane determined by c, but also when the point is too close to the plane. Specifically, we say that a point p causes a violation in any of the following situations:

its distance to the plane determined by c is less than or equal to γ/2, regardless of the color;
p is red but c·p < 0;
p is blue but c·p > 0.
Margin Perceptron
The algorithm starts with c = [0, 0, ..., 0], and then runs in iterations. In each iteration, it simply checks whether any point p ∈ P causes a violation. If so, the algorithm adjusts c as follows:

If p is red, then c ← c + p.
If p is blue, then c ← c − p.

As soon as c has been adjusted, the current iteration finishes, and a new iteration starts.

The algorithm finishes if no point causes any violation in the current iteration.
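The description above translates directly into code; the sketch below is an illustrative rendering (the toy dataset, the parameter value, and the safety cap on iterations are assumptions, not part of the slides):

```python
import numpy as np

def margin_perceptron(points, labels, gamma, max_iters=100000):
    """points: (n, d) array; labels: +1 for red, -1 for blue; gamma: the
    parameter of the algorithm (assumed at most the optimal margin)."""
    c = np.zeros(points.shape[1])
    for _ in range(max_iters):
        for p, y in zip(points, labels):
            norm = np.linalg.norm(c)
            # violation: too close to the plane, or on the wrong side of it
            dist = abs(np.dot(c, p)) / norm if norm > 0 else 0.0
            if dist <= gamma / 2 or y * np.dot(c, p) < 0:
                c = c + y * p      # c <- c + p for red, c <- c - p for blue
                break              # the iteration ends after one adjustment
        else:
            return c               # no violation in this iteration: finished
    raise RuntimeError("no convergence; is gamma <= the optimal margin?")

# Toy usage (illustrative data):
red = np.array([[2.0, 2.0], [3.0, 1.0]])
points = np.vstack([red, -red])
labels = np.array([1.0, 1.0, -1.0, -1.0])
c = margin_perceptron(points, labels, gamma=0.5)
print(c)   # every point now lies on the correct side, farther than gamma/2
```

On termination, every point sits on its correct side at distance greater than γ/2 from the plane determined by c.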
Define:

R = max_{p∈P} |p|.

Theorem 4.
Suppose that Margin Perceptron is run with a parameter γ ≤ γ_opt. Then the algorithm terminates within 1 + 8R²/γ_opt² iterations. (The proof can be found in the appendix.)
An Incremental Algorithm

1. run Perceptron to obtain a separation plane π
2. let γ_0 be the margin of π
3. γ_1 ← 4γ_0; i ← 1
4. run Margin Perceptron with parameter γ_i, terminating it manually if it does not finish within 1 + 8R²/γ_i² iterations
5. if manual termination occurred at Line 4, return the plane obtained from the previous call (the plane π if i = 1)
6. otherwise, let γ_{i+1} be 4 times the margin of the plane just obtained
7. i ← i + 1; Go to Line 4.
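As an illustration (not from the slides), such an incremental scheme can be sketched as follows; the toy dataset, the way the starting value γ_0 is bootstrapped, and the exact iteration cap are assumptions of this sketch:

```python
import numpy as np

def margin_perceptron(points, labels, gamma, max_iters):
    """One call to Margin Perceptron; returns c on natural termination,
    or None when the iteration cap forces manual termination."""
    c = np.zeros(points.shape[1])
    for _ in range(max_iters):
        for p, y in zip(points, labels):
            norm = np.linalg.norm(c)
            dist = abs(np.dot(c, p)) / norm if norm > 0 else 0.0
            if dist <= gamma / 2 or y * np.dot(c, p) < 0:
                c = c + y * p
                break
        else:
            return c
    return None                     # manual termination

def smallest_margin(c, points, labels):
    return min(y * np.dot(c, p) for p, y in zip(points, labels)) / np.linalg.norm(c)

def incremental(points, labels):
    R = max(np.linalg.norm(p) for p in points)
    # Bootstrap a first plane and gamma_1 (the tiny bootstrap parameter
    # below is an assumption of this sketch).
    plane = margin_perceptron(points, labels, 1e-6, 10**6)
    gamma = 4 * smallest_margin(plane, points, labels)
    while True:
        cap = int(1 + 8 * R**2 / gamma**2) + 1
        result = margin_perceptron(points, labels, gamma, cap)
        if result is None:          # manual termination: gamma too ambitious
            return plane            # the plane from the previous call
        plane = result
        gamma = 4 * smallest_margin(plane, points, labels)

# Toy usage (illustrative data):
red = np.array([[2.0, 2.0], [3.0, 1.0]])
points = np.vstack([red, -red])
labels = np.array([1.0, 1.0, -1.0, -1.0])
best = incremental(points, labels)
print(best, smallest_margin(best, points, labels))
```

On this toy input the returned plane attains a margin well above a quarter of the optimum, matching the guarantee of Theorem 7 below.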
Lemma 5.
Consider the i-th call to Margin Perceptron. If manual termination occurs at Line 4, then γ_i > γ_opt. Otherwise, γ_{i+1} ≥ 2γ_i.

Proof.
First consider the case of manual termination. If γ_i ≤ γ_opt, then by Theorem 4, we know that Margin Perceptron should have terminated in at most 1 + 8R²/γ_opt² ≤ 1 + 8R²/γ_i² iterations. Contradiction.

Now consider the other case. Since the i-th call terminated with no violation, every point is at distance greater than γ_i/2 from the plane obtained, i.e., the plane's margin exceeds γ_i/2. Hence γ_{i+1}, which is 4 times that margin, is greater than 2γ_i.
Corollary 6.
The algorithm terminates after at most 1 + log2(γ_opt/γ_0) calls to Margin Perceptron.

Proof.
From the previous lemma, we know that γ_i ≥ 2^i · γ_0. Suppose that the algorithm makes k calls to Margin Perceptron. Then, γ_opt ≥ γ_{k−1} ≥ 2^{k−1} · γ_0. Solving k from the inequality gives the corollary.
Theorem 7.
Our incremental algorithm returns a separation plane with margin at least γ_opt/4.

Proof.
Suppose that the algorithm terminates after k calls to Margin Perceptron. Since the k-th call was terminated manually, Lemma 5 gives γ_k > γ_opt. The plane returned is the one obtained from the (k−1)-th call, whose margin equals γ_k/4, which is at least γ_opt/4.
Appendix
Proof of Theorem 4.
Let u·x = 0 be the optimal separation plane with margin γ_opt. Without loss of generality, suppose that |u| = 1. Hence:

γ_opt = min_{p∈P} |p·u|.

Recall that the perceptron algorithm adjusts c in each iteration. Let c_i (i ≥ 1) be the c after the i-th iteration. Also, let c_0 = [0, ..., 0] be the initial c before the first iteration. Also, let k be the total number of adjustments.
Proof (cont.).
We claim that, for any i ≥ 0, c_{i+1}·u ≥ c_i·u + γ_opt. Due to symmetry, we prove this only for the case where c_{i+1} was adjusted from c_i because of the violation of a red point p.

In this case, c_{i+1} = c_i + p; and hence, c_{i+1}·u = c_i·u + p·u. From the definition of γ_opt, we know that p·u ≥ γ_opt. Therefore, c_{i+1}·u ≥ c_i·u + γ_opt.

It follows that

|c_k| ≥ c_k·u ≥ k · γ_opt.    (1)
Proof (cont.).
We also claim that, for any i ≥ 0, |c_{i+1}| ≤ |c_i| + R²/(2|c_i|) + γ_opt/2. Due to symmetry, we will prove this only for the case where c_{i+1} was adjusted from c_i due to the violation of a red point p.

(Figure: at the origin O, the point p is decomposed into p1, perpendicular to c_i, and p2, parallel to c_i.)

As shown above, p = p1 + p2, where p1 is perpendicular to c_i, and p2 is parallel to c_i (and hence, perpendicular to the plane determined by c_i). Therefore, c_{i+1} = c_i + p = c_i + p1 + p2. The claim is true due to:

By the definition of violation, p2 either points to the opposite direction of c_i or has a norm of |p2| ≤ γ/2 ≤ γ_opt/2.

Notice that |p1| ≤ |p| ≤ R. Hence, |c_i + p1|² = |c_i|² + |p1|² ≤ |c_i|² + R² ≤ (|c_i| + R²/(2|c_i|))². It thus follows that |c_i + p1| ≤ |c_i| + R²/(2|c_i|).
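The claim can be spot-checked numerically (an illustration, not part of the slides; the dimension, the values of R and γ, and the sampling scheme are assumptions, with γ standing in for γ_opt):

```python
import numpy as np

rng = np.random.default_rng(0)

# Spot-check of the claim: whenever a red point p causes a violation
# against c, the adjusted vector satisfies
#   |c + p| <= |c| + R^2/(2|c|) + gamma/2.
gamma = 1.0
R = 5.0
checked = 0
for _ in range(1000):
    c = rng.normal(size=3) * rng.uniform(0.5, 10.0)
    p = rng.normal(size=3)
    p = p / np.linalg.norm(p) * rng.uniform(0.0, R)   # ensure |p| <= R
    dist = abs(np.dot(c, p)) / np.linalg.norm(c)
    if dist <= gamma / 2 or np.dot(c, p) < 0:         # p violates as a red point
        lhs = np.linalg.norm(c + p)
        rhs = np.linalg.norm(c) + R**2 / (2 * np.linalg.norm(c)) + gamma / 2
        assert lhs <= rhs + 1e-9, (lhs, rhs)
        checked += 1
print("checked", checked, "violating adjustments")
```

Pairs (c, p) that cause no violation are skipped, since the claim only concerns adjustments.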
Proof (cont.).
The claim on the previous slide implies that, once |c_i| ≥ 2R²/γ_opt, an adjustment increases the norm by at most R²/(2|c_i|) + γ_opt/2 ≤ γ_opt/4 + γ_opt/2 = (3/4)·γ_opt. Therefore:

|c_k| ≤ 2R²/γ_opt + (3k/4)·γ_opt.

Combining this with (1) gives:

k·γ_opt ≤ 2R²/γ_opt + (3k/4)·γ_opt,

which solves to k ≤ 8R²/γ_opt². This yields the iteration bound in Theorem 4.