
Korea Advanced Institute of Science and Technology

School of Electrical Engineering


EE331 Introduction to Machine Learning, Spring 2019

Assignment 2

Issued: Apr 5, 2019
Due: Apr 12, 2019

Policy

Group study is encouraged; however, the assignment you hand in must be your own work. Anyone suspected of copying others' work will be penalized. The homework will take a considerable amount of time, so start early.

1. (k-NN) This problem concerns the proof of the theorem that bounds the performance of the 1-NN classifier. Given the conditional probability $p(c_i|x)$ of the $i$th class $c_i$ for input $x$, $i = 1, \ldots, L$, the Bayes classifier predicts the class $c^*$ that maximizes the conditional probability, i.e., $c^* = \arg\max_i p(c_i|x)$. The conditional probability of misclassification $e$ using the Bayes classifier is $P_e^* = p(e|x) = 1 - p(c^*|x)$. The probability of error of the 1-NN classifier, $P_e$, is bounded as follows:
$$P_e^* \le P_e \le 2P_e^*. \quad (1)$$
For input $x$ with true class label $\theta = c_t$, assume its nearest neighbor is $x_n$ with class label $\theta_n = c_n$.
(i) Using the fact that $c_t$ depends on $x$ but is independent of $c_n$, the conditional probability that $c_t = c_n = c_i$ given $x$ and $x_n$ is $p(\theta = c_i, \theta_n = c_i | x, x_n) = p(\theta = c_i|x)\,p(\theta_n = c_i|x_n)$. Express the conditional probability of error given $x$ and $x_n$, $P(e|x, x_n)$, in terms of $p(c_i|x)$ and $p(c_i|x_n)$.
(ii) As the number of training samples approaches infinity, $x_n \to x$, so $P(e|x, x_n) \to P(e|x)$. Express $P(e|x)$ in terms of $p(c_i|x)$.
(iii) Use $\sum_{i=1}^{L} p^2(c_i|x) = p^2(c^*|x) + \sum_{c_i \neq c^*} p^2(c_i|x)$ and the fact that, subject to $\sum_{c_i \neq c^*} p(c_i|x) = P_e^*$, the sum $\sum_{c_i \neq c^*} p^2(c_i|x)$ is minimized when $p(c_i|x) = \frac{P_e^*}{L-1}$ for $c_i \neq c^*$ (while $p(c^*|x) = 1 - P_e^*$). Here $P_e^*$ is defined above. Derive the upper bound of Eqn. (1).
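A quick numerical sanity check of Eqn. (1), not a proof, may help when working through (ii) and (iii). It assumes the standard asymptotic result that the 1-NN conditional error tends to $1 - \sum_i p^2(c_i|x)$, and it also checks the tighter Cover–Hart form $P_e \le P_e^*\left(2 - \frac{L}{L-1}P_e^*\right)$, which is what the minimization in (iii) leads to. The function names (`bayes_error`, `nn_asymptotic_error`, `check_bounds`) are hypothetical helpers for this sketch; the posteriors are random draws, not data.

```python
import random

def bayes_error(post):
    """P_e^* = 1 - max_i p(c_i|x) for one posterior vector."""
    return 1.0 - max(post)

def nn_asymptotic_error(post):
    """Assumed asymptotic 1-NN conditional error: 1 - sum_i p(c_i|x)^2."""
    return 1.0 - sum(p * p for p in post)

def random_posterior(num_classes, rng):
    """A random probability vector over num_classes classes."""
    w = [rng.random() + 1e-12 for _ in range(num_classes)]
    s = sum(w)
    return [v / s for v in w]

def check_bounds(trials=10000, max_classes=6, seed=0, eps=1e-12):
    """True if P_e^* <= P_e <= P_e^*(2 - L/(L-1) P_e^*) <= 2 P_e^* on every draw."""
    rng = random.Random(seed)
    for _ in range(trials):
        L = rng.randint(2, max_classes)
        post = random_posterior(L, rng)
        pe_star, pe = bayes_error(post), nn_asymptotic_error(post)
        tight = pe_star * (2.0 - L / (L - 1) * pe_star)
        if not (pe_star - eps <= pe <= tight + eps and tight <= 2 * pe_star + eps):
            return False
    return True

print(check_bounds())  # True
```

The posterior $[0.7, 0.15, 0.15]$ has exactly the shape assumed in (iii), so it meets the tighter bound with equality: $P_e^* = 0.3$ and $P_e = 0.465 = 0.3\,(2 - 1.5 \cdot 0.3)$.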
(iv) (Programming) Write a simple k-nearest-neighbor implementation. Run the implemented k-NN algorithm on the Glass dataset (http://archive.ics.uci.edu/ml/datasets/Glass+Identification). Estimate the performance of the k-NN algorithm with and without feature normalization across a range of values of k (from 1 to 25). Plot the accuracy, measured using 10-fold cross-validation, as a function of k (with and without normalization of features).
10-fold cross-validation: split the data into 10 equal parts, use 9 parts for training and the remaining part for testing, and repeat so that each part serves as the test set exactly once; report the average test accuracy over the 10 folds.
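The assignment asks for MATLAB. The following is only a Python sketch of the same logic, under stated assumptions: Euclidean distance, majority vote with ties broken by the closest tied neighbor, z-score normalization whose statistics are fitted on the training folds only, and a single shuffle before splitting (a design choice; without it, a file ordered by class would give unrepresentative folds). All function names are hypothetical.

```python
import random
from collections import Counter

def zscore_fit(X):
    """Per-feature mean and (population) standard deviation of the rows in X."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    stds = []
    for j in range(d):
        var = sum((row[j] - means[j]) ** 2 for row in X) / n
        stds.append(var ** 0.5 or 1.0)  # constant feature: avoid division by zero
    return means, stds

def zscore_apply(X, means, stds):
    """Standardize each feature with statistics taken from the training fold."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]

def knn_predict(X_train, y_train, x, k):
    """Majority vote among the k training points closest to x (Euclidean)."""
    neighbours = sorted(
        ((sum((a - b) ** 2 for a, b in zip(row, x)), label)
         for row, label in zip(X_train, y_train)),
        key=lambda pair: pair[0],
    )[:k]
    votes = Counter(label for _, label in neighbours)
    top = max(votes.values())
    # Tie-break: the tied label that appears first (closest) among the neighbours
    for _, label in neighbours:
        if votes[label] == top:
            return label

def cross_val_accuracy(X, y, k, folds=10, normalize=False, seed=0):
    """Mean test accuracy of k-NN over `folds` roughly equal, disjoint parts."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)  # shuffle once so folds mix the classes
    parts = [idx[f::folds] for f in range(folds)]
    accs = []
    for f in range(folds):
        train = [i for g in range(folds) if g != f for i in parts[g]]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        X_te, y_te = [X[i] for i in parts[f]], [y[i] for i in parts[f]]
        if normalize:
            means, stds = zscore_fit(X_tr)  # fit on the training fold only
            X_tr = zscore_apply(X_tr, means, stds)
            X_te = zscore_apply(X_te, means, stds)
        correct = sum(knn_predict(X_tr, y_tr, x, k) == t for x, t in zip(X_te, y_te))
        accs.append(correct / len(X_te))
    return sum(accs) / folds
```

For the plot, one would evaluate `cross_val_accuracy(X, y, k)` and `cross_val_accuracy(X, y, k, normalize=True)` for k = 1, ..., 25, with `X` holding the numeric features and `y` the glass type (assuming the Id column of the UCI file is dropped). Fitting the normalization on the training fold keeps test-fold statistics out of training.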
2. (Perceptron) Please refer to the attached programming files.

3. (Constrained Optimization, Lagrange) Principal component analysis requires determining k principal components such that their projection variances are the top k leading variances, subject to the constraint that the principal components are orthonormal. This can be formulated as a constrained optimization problem and solved using the Lagrangian method. To better understand the method, consider the following simple constrained optimization problems.
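To connect the method to PCA, here is a sketch for a single component, assuming $S$ denotes the (symmetric) sample covariance matrix of the centered data, a symbol the assignment does not define. With the same sign convention $L = f - \lambda g$ used in the solutions:

```latex
\begin{aligned}
&\max_{w}\; w^T S w \quad \text{subject to} \quad w^T w = 1 \\
&L(w, \lambda) = w^T S w - \lambda\,(w^T w - 1) \\
&\nabla_w L(w, \lambda) = 2 S w - 2 \lambda w = 0 \;\Longrightarrow\; S w = \lambda w \\
&w^T S w = \lambda\, w^T w = \lambda
\end{aligned}
```

So every stationary point is a unit eigenvector of $S$ whose projection variance equals its eigenvalue, and the first principal component is the eigenvector with the largest eigenvalue; adding one multiplier per orthonormality constraint extends this to the top k eigenvectors.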

(i) Find the extrema of the function $f(x, y) = x^2 + y^2 + 4x - 2y$ subject to the constraint $2x^2 + y^2 = 4$. Reformulate the problem in matrix form in terms of $w = [x\ y]^T$.
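Sol) Let $g(x, y) = 2x^2 + y^2 - 4 = 0$ denote the constraint. For $w = [x\ y]^T$, both functions can be written in matrix form as follows.

$$f(x, y) = x^2 + y^2 + 4x - 2y = \begin{bmatrix} x & y \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} 4 & -2 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}$$
$$f(w) = w^T w + \begin{bmatrix} 4 & -2 \end{bmatrix} w$$

$$g(x, y) = 2x^2 + y^2 - 4 = \begin{bmatrix} x & y \end{bmatrix}\begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} - 4$$
$$g(w) = w^T \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix} w - 4$$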

Define the Lagrangian

$$L(w, \lambda) = f(w) - \lambda g(w).$$

The constrained extrema of $f$ occur at stationary points of $L$, i.e., where $\nabla L(w, \lambda) = 0$:

$$\nabla_w L(w, \lambda) = \nabla_w f(w) - \lambda \nabla_w g(w) = 0 \quad \cdots (a)$$

$$\nabla_\lambda L(w, \lambda) = -g(w) = 0 \quad \cdots (b)$$

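Therefore, (a) can be rewritten as follows.

$$\nabla_w f(w) = \lambda \nabla_w g(w)$$
$$2w + \begin{bmatrix} 4 \\ -2 \end{bmatrix} = \lambda\left(\begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix} + \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}\right) w$$
$$\begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} w + \begin{bmatrix} 4 \\ -2 \end{bmatrix} = \lambda \begin{bmatrix} 4 & 0 \\ 0 & 2 \end{bmatrix} w$$
$$\begin{bmatrix} 4 \\ -2 \end{bmatrix} = \begin{bmatrix} 4\lambda - 2 & 0 \\ 0 & 2\lambda - 2 \end{bmatrix} w$$
$$\begin{bmatrix} 2\lambda - 1 & 0 \\ 0 & \lambda - 1 \end{bmatrix} w = \begin{bmatrix} 2 \\ -1 \end{bmatrix}$$
$$w = \begin{bmatrix} 2\lambda - 1 & 0 \\ 0 & \lambda - 1 \end{bmatrix}^{-1} \begin{bmatrix} 2 \\ -1 \end{bmatrix} \quad (\text{if } \lambda \neq \tfrac{1}{2} \text{ and } \lambda \neq 1)$$
$$w = \begin{bmatrix} \frac{1}{2\lambda - 1} & 0 \\ 0 & \frac{1}{\lambda - 1} \end{bmatrix} \begin{bmatrix} 2 \\ -1 \end{bmatrix}$$
$$w = \begin{bmatrix} \frac{2}{2\lambda - 1} \\ -\frac{1}{\lambda - 1} \end{bmatrix} \quad \cdots (c)$$

For $\lambda = \tfrac{1}{2}$ or $\lambda = 1$ the system has no solution (its first or second row would read $0 = 2$ or $0 = -1$), so no stationary points are lost by excluding these values.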
λ−1

Also, (b) can be rewritten as follows.

$$g(w) = 0 \;\Longleftrightarrow\; w^T \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix} w - 4 = 0 \quad \cdots (d)$$

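By substituting (c) into (d), we can obtain $\lambda$.

$$\begin{bmatrix} \frac{2}{2\lambda-1} & -\frac{1}{\lambda-1} \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \frac{2}{2\lambda-1} \\ -\frac{1}{\lambda-1} \end{bmatrix} - 4 = 0$$
$$\begin{bmatrix} \frac{4}{2\lambda-1} & -\frac{1}{\lambda-1} \end{bmatrix} \begin{bmatrix} \frac{2}{2\lambda-1} \\ -\frac{1}{\lambda-1} \end{bmatrix} - 4 = 0$$
$$\frac{8}{(2\lambda-1)^2} + \frac{1}{(\lambda-1)^2} = 4$$
$$\lambda = -0.26931 \ \text{or}\ \lambda = 1.6381 \quad (\lambda \neq \tfrac{1}{2},\ \lambda \neq 1) \quad \cdots (e)$$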
By substituting (e) into (c), we can obtain the solutions for $w$.

$$w = \begin{bmatrix} -1.2999 \\ 0.7878 \end{bmatrix} \ \text{or}\ w = \begin{bmatrix} 0.8787 \\ -1.5672 \end{bmatrix} \quad \cdots (f)$$
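The roots in (e) can be reproduced numerically. The sketch below, with hypothetical helper names, applies plain bisection to $h(\lambda) = \frac{8}{(2\lambda-1)^2} + \frac{1}{(\lambda-1)^2} - 4$ and then evaluates (c) and $f$. The brackets rely on $h \to +\infty$ at the poles $\lambda = \tfrac12$ and $\lambda = 1$, $h \to -4$ as $|\lambda| \to \infty$, and $h > 0$ between the poles (for example $h(0.75) = 44$), so there is exactly one root on each outer side.

```python
def h(lam):
    """Condition obtained by substituting (c) into (d); roots give lambda."""
    return 8.0 / (2 * lam - 1) ** 2 + 1.0 / (lam - 1) ** 2 - 4.0

def bisect(fn, lo, hi, iters=200):
    """Plain bisection; assumes fn(lo) and fn(hi) have opposite signs."""
    flo = fn(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        fm = fn(mid)
        if (fm > 0) == (flo > 0):
            lo, flo = mid, fm
        else:
            hi = mid
    return 0.5 * (lo + hi)

def w_of(lam):
    """Equation (c): w = [2/(2*lam - 1), -1/(lam - 1)]."""
    return (2.0 / (2 * lam - 1), -1.0 / (lam - 1))

def f(w):
    """f(w) = w^T w + [4 -2] w."""
    x, y = w
    return x * x + y * y + 4 * x - 2 * y

# One root left of the pole at 1/2, one right of the pole at 1
lam_a = bisect(h, -10.0, 0.4999)
lam_b = bisect(h, 1.0001, 10.0)
for lam in (lam_a, lam_b):
    w = w_of(lam)
    print(f"lambda={lam:.5f}  w=({w[0]:.4f}, {w[1]:.4f})  f={f(w):.4f}")
```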

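Then, the extrema of $f(w)$ can be obtained as follows.

$$\text{When } w = \begin{bmatrix} -1.2999 \\ 0.7878 \end{bmatrix},\quad f(w) = w^T w + \begin{bmatrix} 4 & -2 \end{bmatrix} w = -4.4648$$
$$\text{When } w = \begin{bmatrix} 0.8787 \\ -1.5672 \end{bmatrix},\quad f(w) = w^T w + \begin{bmatrix} 4 & -2 \end{bmatrix} w = 9.8769$$

Therefore, the extrema of $f$ on the constraint are the minimum $-4.4648$ and the maximum $9.8769$.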
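An independent check that does not use the Lagrangian at all: the ellipse $2x^2 + y^2 = 4$ can be parametrized as $x = \sqrt{2}\cos t$, $y = 2\sin t$, so a dense sweep over $t$ should reproduce the minimum and maximum above. This is a brute-force sketch with a hypothetical helper name, useful only for confirming the numbers.

```python
import math

def f(x, y):
    """Objective of part (i)."""
    return x * x + y * y + 4 * x - 2 * y

def sweep_extrema(n=200000):
    """Min and max of f over the ellipse 2x^2 + y^2 = 4, sampled at n angles."""
    vals = []
    for i in range(n):
        t = 2 * math.pi * i / n
        # x = sqrt(2) cos t, y = 2 sin t satisfies 2x^2 + y^2 = 4 exactly
        vals.append(f(math.sqrt(2) * math.cos(t), 2 * math.sin(t)))
    return min(vals), max(vals)

lo, hi = sweep_extrema()
print(f"min ~ {lo:.4f}, max ~ {hi:.4f}")
```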

(ii) Maximize the function $f(x, y) = y^2 - x$ subject to $g(x, y) = 2x^2 + 2xy + y^2 - 1 = 0$.


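Sol) For $w = [x\ y]^T$, we can reformulate the functions as follows.

$$f(x, y) = y^2 - x = \begin{bmatrix} x & y \end{bmatrix}\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} -1 & 0 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}$$
$$f(w) = w^T \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} w + \begin{bmatrix} -1 & 0 \end{bmatrix} w$$

$$g(x, y) = 2x^2 + 2xy + y^2 - 1 = \begin{bmatrix} x & y \end{bmatrix}\begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} x & y \end{bmatrix}\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} - 1$$
$$g(w) = w^T \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix} w + w^T \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} w - 1 = w^T \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix} w - 1$$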

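Define the Lagrangian

$$L(w, \lambda) = f(w) - \lambda g(w).$$

The constrained extrema of $f$ occur at stationary points of $L$, i.e., where $\nabla L(w, \lambda) = 0$:

$$\nabla_w L(w, \lambda) = \nabla_w f(w) - \lambda \nabla_w g(w) = 0 \quad \cdots (a)$$

$$\nabla_\lambda L(w, \lambda) = -g(w) = 0 \quad \cdots (b)$$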

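Therefore, (a) can be rewritten as follows.

$$\nabla_w f(w) = \lambda \nabla_w g(w)$$
$$2\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} w + \begin{bmatrix} -1 \\ 0 \end{bmatrix} = 2\lambda \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix} w$$
$$\begin{bmatrix} 4\lambda & 2\lambda \\ 2\lambda & 2\lambda - 2 \end{bmatrix} w = \begin{bmatrix} -1 \\ 0 \end{bmatrix}$$

The determinant of this matrix is $4\lambda(2\lambda - 2) - 4\lambda^2 = 4\lambda(\lambda - 2)$, so it is invertible when $\lambda \neq 0$ and $\lambda \neq 2$ (for $\lambda = 0$ or $\lambda = 2$ the system is inconsistent, so no stationary points are lost).

$$w = \begin{bmatrix} 4\lambda & 2\lambda \\ 2\lambda & 2\lambda - 2 \end{bmatrix}^{-1} \begin{bmatrix} -1 \\ 0 \end{bmatrix} \quad (\text{if } \lambda \neq 0 \text{ and } \lambda \neq 2)$$
$$w = \begin{bmatrix} \frac{2\lambda-2}{4\lambda(\lambda-2)} & \frac{-2\lambda}{4\lambda(\lambda-2)} \\ \frac{-2\lambda}{4\lambda(\lambda-2)} & \frac{4\lambda}{4\lambda(\lambda-2)} \end{bmatrix} \begin{bmatrix} -1 \\ 0 \end{bmatrix}$$
$$w = \begin{bmatrix} \frac{-2\lambda+2}{4\lambda(\lambda-2)} \\ \frac{2\lambda}{4\lambda(\lambda-2)} \end{bmatrix} = \begin{bmatrix} \frac{-\lambda+1}{2\lambda(\lambda-2)} \\ \frac{1}{2(\lambda-2)} \end{bmatrix} \quad \cdots (c)$$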
4λ(λ−2) 2(λ−2)

Also, (b) can be rewritten as follows.

$$g(w) = 0 \;\Longleftrightarrow\; w^T \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix} w - 1 = 0 \quad \cdots (d)$$
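By substituting (c) into (d), we can obtain $\lambda$.

$$\begin{bmatrix} \frac{-\lambda+1}{2\lambda(\lambda-2)} & \frac{1}{2(\lambda-2)} \end{bmatrix} \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} \frac{-\lambda+1}{2\lambda(\lambda-2)} \\ \frac{1}{2(\lambda-2)} \end{bmatrix} - 1 = 0$$
$$\begin{bmatrix} \frac{-\lambda+2}{2\lambda(\lambda-2)} & \frac{1}{2\lambda(\lambda-2)} \end{bmatrix} \begin{bmatrix} \frac{-\lambda+1}{2\lambda(\lambda-2)} \\ \frac{1}{2(\lambda-2)} \end{bmatrix} - 1 = 0$$
$$\frac{(-\lambda+2)(-\lambda+1) + \lambda}{4\lambda^2(\lambda-2)^2} = 1$$
$$\lambda = 2.3576 \ \text{or}\ \lambda = 0.3621 \ \text{or}\ \lambda = 1.6379 \ \text{or}\ \lambda = -0.3576 \quad (\lambda \neq 0,\ \lambda \neq 2) \quad \cdots (e)$$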
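By substituting (e) into (c), we can obtain the solutions for $w$.

$$w = \begin{bmatrix} -0.8052 \\ 1.3982 \end{bmatrix} \ \text{or}\ w = \begin{bmatrix} -0.5378 \\ -0.3053 \end{bmatrix} \ \text{or}\ w = \begin{bmatrix} 0.5378 \\ -1.3809 \end{bmatrix} \ \text{or}\ w = \begin{bmatrix} 0.8052 \\ -0.2121 \end{bmatrix} \quad \cdots (f)$$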
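The four values in (e) can be obtained in closed form. Since $(-\lambda+2)(-\lambda+1) + \lambda = \lambda^2 - 2\lambda + 2$, the substitution $u = \lambda(\lambda - 2)$ (an algebraic shortcut, not part of the original solution) turns the condition in (e) into the quadratic $4u^2 - u - 2 = 0$, and each root $u$ gives $\lambda = 1 \pm \sqrt{1 + u}$. The sketch below, with a hypothetical `candidates` helper, computes all four stationary points via (c) and prints the constraint residual as a check.

```python
import math

def candidates():
    """All (lambda, x, y) stationary points of part (ii)."""
    sols = []
    # u = lambda*(lambda - 2) satisfies 4u^2 - u - 2 = 0
    for u in ((1 + math.sqrt(33)) / 8, (1 - math.sqrt(33)) / 8):
        # lambda^2 - 2*lambda - u = 0  =>  lambda = 1 +/- sqrt(1 + u); 1 + u > 0 here
        for sign in (1, -1):
            lam = 1 + sign * math.sqrt(1 + u)
            # Equation (c): x = (1 - lam)/(2 lam (lam - 2)), y = 1/(2 (lam - 2))
            x = (1 - lam) / (2 * lam * (lam - 2))
            y = 1 / (2 * (lam - 2))
            sols.append((lam, x, y))
    return sols

for lam, x, y in candidates():
    g = 2 * x * x + 2 * x * y + y * y - 1  # constraint residual, should be ~0
    print(f"lambda={lam:.4f}  w=({x:.4f}, {y:.4f})  f={y * y - x:.4f}  g={g:.1e}")
```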
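Then, the values of $f(w)$ at these stationary points are as follows.

$$\text{When } w = \begin{bmatrix} -0.8052 \\ 1.3982 \end{bmatrix},\quad f(w) = w^T \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} w + \begin{bmatrix} -1 & 0 \end{bmatrix} w = 2.7602$$
$$\text{When } w = \begin{bmatrix} -0.5378 \\ -0.3053 \end{bmatrix},\quad f(w) = w^T \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} w + \begin{bmatrix} -1 & 0 \end{bmatrix} w = 0.6310$$
$$\text{When } w = \begin{bmatrix} 0.5378 \\ -1.3809 \end{bmatrix},\quad f(w) = w^T \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} w + \begin{bmatrix} -1 & 0 \end{bmatrix} w = 1.3690$$
$$\text{When } w = \begin{bmatrix} 0.8052 \\ -0.2121 \end{bmatrix},\quad f(w) = w^T \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} w + \begin{bmatrix} -1 & 0 \end{bmatrix} w = -0.7602$$

Since the constraint $2x^2 + 2xy + y^2 = x^2 + (x+y)^2 = 1$ describes a closed ellipse, $f$ attains its maximum at one of these stationary points. Therefore, the maximum of $f$ is $2.7602$, attained at $w = [-0.8052\ \ 1.3982]^T$.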
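As a brute-force cross-check, write the constraint as $x^2 + (x+y)^2 = 1$ and parametrize it with $x = \cos t$, $y = \sin t - \cos t$; a dense sweep over $t$ should then land on the same maximizer. The helper name `sweep` is hypothetical and the sampling density is an arbitrary choice.

```python
import math

def sweep(n=200000):
    """Approximate max of f = y^2 - x on 2x^2 + 2xy + y^2 = 1 and its location."""
    best_t, best_f = 0.0, -math.inf
    for i in range(n):
        t = 2 * math.pi * i / n
        # x = cos t, x + y = sin t satisfies x^2 + (x + y)^2 = 1
        x, y = math.cos(t), math.sin(t) - math.cos(t)
        val = y * y - x
        if val > best_f:
            best_t, best_f = t, val
    return best_f, math.cos(best_t), math.sin(best_t) - math.cos(best_t)

fmax, x, y = sweep()
print(f"max f ~ {fmax:.4f} at w ~ ({x:.4f}, {y:.4f})")
```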

Submit Instructions for Programming Assignment
• Please submit a .zip file to KLMS named ee331_assignment2_studentID.zip, for example, "ee331_assignment2_20191234.zip".

• Your MATLAB code must include comments explaining it; otherwise you will not receive full credit even if the code works. Include in the zip file all files required to run the code. Do not change the name of the folder, and write all comments in English. Code that does not execute will receive no points.
