EE 615: Pattern Recognition & Machine Learning

Fall 2016

Quiz 1: September 3rd


Lecturer: Prof. Dinesh Garg

Scribes: Q1, Q2: Arvind Roshaan S & Q3: Anikesh Kamath

Note: LaTeX template courtesy of UC Berkeley EECS dept.

Question 1:
Part (a): Explain what you understand by the multicollinearity effect.
Solution: Let us consider the data matrix
X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1d} \\ x_{21} & x_{22} & \cdots & x_{2d} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N1} & x_{N2} & \cdots & x_{Nd} \end{bmatrix}
and the target vector
Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}

Multicollinearity effect: the columns of X are not all linearly independent, i.e., some column of X can be written as a linear combination of the other columns.
Part (b): When do we say that the given data has the multicollinearity effect? Explain your answer using the example of the linear regression problem.
Solution: Now consider the linear regression problem:
L(W) = \| XW - Y \|_2^2
To get the optimal solution W^*, we set \nabla_W L(W) = 0, which gives us
W^* = (X^T X)^{-1} X^T Y
We say that the data has the multicollinearity effect if the matrix X^T X is rank deficient, so that the inverse above does not exist. This follows from the fact that X does not have all of its columns linearly independent and is therefore rank deficient. It can be proved as follows: since X is rank deficient, its null space is non-trivial, which implies Xv = 0 for some non-zero v. Therefore X^T X v = 0 holds for the same v, and hence X^T X also has a non-trivial null space and is not invertible.
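The rank argument above can also be checked numerically. Below is a minimal sketch (not part of the original notes; it assumes NumPy is available): the third column of X is deliberately built as a linear combination of the first two, so both X and X^T X lose rank and the closed-form inverse is unavailable.

    # Sketch: a data matrix with a linearly dependent column makes X^T X singular.
    import numpy as np

    rng = np.random.default_rng(0)
    N = 100
    X = rng.normal(size=(N, 2))
    X = np.column_stack([X, 2.0 * X[:, 0] - X[:, 1]])   # dependent third column

    G = X.T @ X
    print("rank of X:", np.linalg.matrix_rank(X))            # 2 instead of 3
    print("rank of X^T X:", np.linalg.matrix_rank(G))        # also 2, so G is singular
    print("condition number of X^T X:", np.linalg.cond(G))   # huge => no reliable inverse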
Question 2: For least squares regression, show that the following properties hold
true:
Part (a): The sum of residual errors is zero.
Solution: For least squares regression,
L_D(W) = \sum_{i=1}^{n} ( x_i^T W - y_i )^2

Writing the bias term W_0 explicitly,
L_D(W) = \sum_{i=1}^{n} \Big( W_0 + \sum_{j=1}^{d} x_{ij} W_j - y_i \Big)^2

Now, to get the optimal solution, we set the gradient to zero. Therefore,


\frac{\partial L_D(W)}{\partial W_0} = 0
\Rightarrow \sum_{i=1}^{n} 2 \Big( W_0 + \sum_{j=1}^{d} x_{ij} W_j - y_i \Big) = 0
\Rightarrow \sum_{i=1}^{n} ( x_i^T W - y_i ) = 0

Therefore, with the predicted value \hat{y}_i = x_i^T W,
\sum_{i=1}^{n} ( \hat{y}_i - y_i ) = 0        (1)

and with the residual e_i = ( \hat{y}_i - y_i ),
\sum_{i=1}^{n} e_i = 0
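This property is easy to verify numerically. A minimal sketch, assuming NumPy and a model that includes the intercept W_0 (the column of ones below); the data is synthetic and only for illustration.

    # Sketch: with an intercept, least-squares residuals sum to (numerically) zero.
    import numpy as np

    rng = np.random.default_rng(1)
    n, d = 50, 3
    X = rng.normal(size=(n, d))
    y = X @ rng.normal(size=d) + 0.5 + rng.normal(scale=0.1, size=n)

    A = np.column_stack([np.ones(n), X])           # prepend ones for W_0
    W, *_ = np.linalg.lstsq(A, y, rcond=None)      # least-squares solution
    e = A @ W - y                                  # residuals e_i = yhat_i - y_i
    print(e.sum())                                 # ~0 up to floating-point error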

Part (b): Total Variance = Explained Variance + Unexplained Variance.


Note: This part could not be completed. Sorry.
Solution: Let
\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i
and
\bar{\hat{y}} = \frac{1}{n} \sum_{i=1}^{n} \hat{y}_i

From equation (1),
\frac{1}{n} \sum_{i=1}^{n} ( \hat{y}_i - y_i ) = 0
\Rightarrow \bar{\hat{y}} = \bar{y}

Var(y) = \frac{1}{n} \sum_{i=1}^{n} ( y_i - \bar{y} )^2

Var(\hat{y}) = \frac{1}{n} \sum_{i=1}^{n} ( \hat{y}_i - \bar{y} )^2
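Although the proof was not completed here, the claimed decomposition can still be checked numerically. A minimal sketch, assuming NumPy, a least-squares fit with an intercept, and the 1/n variance convention used above; "explained" variance is Var(\hat{y}) and "unexplained" variance is the variance of the residuals.

    # Sketch: Var(y) = Var(yhat) + Var(e) for a least-squares fit with an intercept.
    import numpy as np

    rng = np.random.default_rng(2)
    n, d = 200, 4
    X = rng.normal(size=(n, d))
    y = X @ rng.normal(size=d) + 1.0 + rng.normal(scale=0.3, size=n)

    A = np.column_stack([np.ones(n), X])
    W, *_ = np.linalg.lstsq(A, y, rcond=None)
    yhat = A @ W
    e = yhat - y

    total = y.var()            # np.var uses the 1/n convention, as in the notes
    explained = yhat.var()
    unexplained = e.var()      # equals (1/n) * sum(e**2) since e has zero mean
    print(total, explained + unexplained)   # the two numbers agree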


Question 3 Solution:
x is drawn uniformly at random from the space X. Toss a fair coin:
If heads, then y = w^T x.
If tails, then y = v^T x.
Now, h_{Bayes} is the hypothesis which minimizes the generalization error:
h_{Bayes} = \arg\min_h \int_x \int_y \ell(h, (x, y)) \, p(y|x) \, p(x) \, dx \, dy        (2)

Here p(x) is a uniform probability distribution, and after getting x we find y by tossing a fair coin. Hence p(y|x) is a Bernoulli distribution with
p(y = w^T x \mid x) = \frac{1}{2}
and
p(y = v^T x \mid x) = \frac{1}{2}

From equation (2), substituting this p(y|x),
h_{Bayes} = \arg\min_h \Big[ \int_x \ell(h, (x, w^T x)) \, \tfrac{1}{2} \, p(x) \, dx + \int_x \ell(h, (x, v^T x)) \, \tfrac{1}{2} \, p(x) \, dx \Big]
h_{Bayes} = \arg\min_h \int_x \tfrac{1}{2} \big( \ell(h, (x, w^T x)) + \ell(h, (x, v^T x)) \big) \, p(x) \, dx

a) Now for the loss function
\ell(h, (x, y)) = ( h(x) - y )^2
we get
h_{Bayes} = \arg\min_h \int_x \tfrac{1}{2} \big( ( h(x) - w^T x )^2 + ( h(x) - v^T x )^2 \big) \, p(x) \, dx

Now we can minimize the inner term pointwise: for a given x, the factor p(x) is a constant, and the only free quantity is the output value h(x).

Let h(x) = t. Then
\frac{d}{dt} \, \frac{1}{2} \big[ ( t - w^T x )^2 + ( t - v^T x )^2 \big] = 0
\Rightarrow \frac{2}{2} \big[ ( t - w^T x ) + ( t - v^T x ) \big] = 0
\Rightarrow 2t - w^T x - v^T x = 0
\Rightarrow t = \frac{1}{2} ( w^T x + v^T x )
\Rightarrow t = \mathbb{E}[ y \mid x ]

h_{Bayes}(x) = \mathbb{E}[ y \mid x ] = \frac{1}{2} ( w^T x + v^T x )
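The pointwise minimization can be sanity-checked numerically: for a fixed x, the inner squared-loss term should be minimized at the midpoint of w^T x and v^T x. A minimal sketch, assuming NumPy; the vectors w, v and the point x are arbitrary illustrative choices, not values from the quiz.

    # Sketch: grid-minimize 0.5*[(t - w.x)^2 + (t - v.x)^2]; compare with (w.x + v.x)/2.
    import numpy as np

    w = np.array([1.0, -2.0, 0.5])
    v = np.array([0.3,  1.0, 2.0])
    x = np.array([2.0,  1.0, -1.0])

    a, b = w @ x, v @ x
    t_grid = np.linspace(min(a, b) - 5.0, max(a, b) + 5.0, 100001)
    loss = 0.5 * ((t_grid - a) ** 2 + (t_grid - b) ** 2)

    print(t_grid[np.argmin(loss)], (a + b) / 2)   # both are (approximately) the midpoint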


b) Now for the loss function
\ell(h, (x, y)) = | h(x) - y |
we get
h_{Bayes} = \arg\min_h \int_x \int_y | h(x) - y | \, p(y|x) \, p(x) \, dx \, dy
h_{Bayes} = \arg\min_h \int_x \tfrac{1}{2} \big( | h(x) - w^T x | + | h(x) - v^T x | \big) \, p(x) \, dx

Let us consider the inner term. Let h(x) = t, so that
h_{Bayes} = \arg\min_h \int_x \tfrac{1}{2} \big( | t - w^T x | + | t - v^T x | \big) \, p(x) \, dx
Any value t \in S is a global minimizer of the inner cost above, where
S = [ \min( w^T x, v^T x ), \max( w^T x, v^T x ) ]
Hence,
h_{Bayes}(x) \in [ \min( w^T x, v^T x ), \max( w^T x, v^T x ) ]
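As a final sanity check, the flat region of the absolute-loss objective can be seen numerically: every t between w^T x and v^T x attains the same minimal cost, while points outside that interval cost strictly more. Again a minimal sketch, assuming NumPy, with the same illustrative w, v, and x as in the sketch for part (a).

    # Sketch: 0.5*(|t - w.x| + |t - v.x|) is constant on [min(w.x, v.x), max(w.x, v.x)].
    import numpy as np

    w = np.array([1.0, -2.0, 0.5])
    v = np.array([0.3,  1.0, 2.0])
    x = np.array([2.0,  1.0, -1.0])

    a, b = w @ x, v @ x
    lo, hi = min(a, b), max(a, b)
    cost = lambda t: 0.5 * (abs(t - a) + abs(t - b))

    inside = [cost(t) for t in np.linspace(lo, hi, 5)]   # all equal to (hi - lo) / 2
    outside = [cost(lo - 1.0), cost(hi + 1.0)]           # strictly larger
    print(inside, outside)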
