SML_Assignment_Homework_1


CSE 575: Homework #1

Puru Lokendra Singh

26th September, 2022

Problem 1

a) Solution: As X and Y are independent events and P(Y) > 0, P(X|Y) = P(X).

b) Solution: As X and Y are disjoint events (P(X ∩ Y) = 0) and P(Y) > 0,

P(X|Y) = P(X ∩ Y) / P(Y) = 0

c) Solution: Given P(C1 = Head) = 0.6 and P(C2 = Head) = 0.4, we have

P(C1 = Tail) = 1 − P(C1 = Head) = 0.4 and P(C2 = Tail) = 1 − P(C2 = Head) = 0.6

Since the two coin tosses are independent, P(C1, C2) = P(C1) · P(C2), so
P(C1 = Head, C2 = Tail) = P(HT) = P(C1 = Head) · P(C2 = Tail) = 0.6 · 0.6 = 0.36
Similarly, P(C1 = Tail, C2 = Tail) = P(TT) = P(C1 = Tail) · P(C2 = Tail) = 0.4 · 0.6 = 0.24
So the probability of the sequence (HT, HT, TT, TT) is:
P(HT) · P(HT) · P(TT) · P(TT) = 0.36 · 0.36 · 0.24 · 0.24 = 0.00746496
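The arithmetic above can be checked with a short Python snippet:

```python
# Probabilities of each single toss (from the problem statement)
p_h1, p_h2 = 0.6, 0.4              # P(C1 = Head), P(C2 = Head)
p_t1, p_t2 = 1 - p_h1, 1 - p_h2    # P(C1 = Tail), P(C2 = Tail)

# Joint probabilities for one (C1, C2) toss, by independence
p_ht = p_h1 * p_t2                 # P(HT) = 0.6 * 0.6 = 0.36
p_tt = p_t1 * p_t2                 # P(TT) = 0.4 * 0.6 = 0.24

# Probability of the whole sequence (HT, HT, TT, TT)
p_seq = p_ht ** 2 * p_tt ** 2      # ≈ 0.00746496
print(p_seq)
```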

d) Solution:

P(Head) = (Number of times head occurs) / (Total number of tosses) = 15/20 = 0.75


Problem 2

Solution: (a) The least-squares error function for a linear classifier is

E(w) = (1/2) Σ_{n=1}^{N} (w^T x_n − t_n)^2

(b) Taking the derivative of this error function and setting it to 0, we get

w = (X^T X)^{-1} (X^T t)

x with dummy feature (x0 = 1) and target vector t are

x = [[1, 1, 0],
     [1, 2, 1],
     [1, 2, 3],
     [1, 3, 3]],   t = [1, 1, 0, 0]

Solving this with the numpy library:

>>> import numpy as np
>>> x = np.array([[1,1,0],[1,2,1],[1,2,3],[1,3,3]])
>>> t = np.array([1,1,0,0])
>>> w = np.dot(np.linalg.inv(x.T@x),(x.T@t))
>>> w
array([ 0.94444444, 0.16666667, -0.44444444])

a) So after solving the above equation using numpy we get w0 = 0.9444, w1 = 0.1667 and w2 = −0.4444
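As a sanity check, the normal-equations solution can be compared against numpy's dedicated least-squares solver, which solves the same problem without explicitly forming the inverse:

```python
import numpy as np

# Design matrix with dummy feature x0 = 1, and targets t (from the problem)
X = np.array([[1, 1, 0], [1, 2, 1], [1, 2, 3], [1, 3, 3]], dtype=float)
t = np.array([1, 1, 0, 0], dtype=float)

# Normal equations: w = (X^T X)^{-1} X^T t
w_normal = np.linalg.inv(X.T @ X) @ (X.T @ t)

# np.linalg.lstsq solves the same least-squares problem more stably
w_lstsq, *_ = np.linalg.lstsq(X, t, rcond=None)

print(w_normal)                        # ≈ [ 0.9444  0.1667 -0.4444]
print(np.allclose(w_normal, w_lstsq))  # the two solutions agree
```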


Solution: a) The criterion for Fisher's linear discriminant can be given as

J(w) = (m_2 − m_1)^2 / (s_1^2 + s_2^2)

or, in matrix form,

J(w) = (w^T S_B w) / (w^T S_W w)

b) S_W = S_1 + S_2, where

S_i = Σ_{n ∈ C_i} (x_n − m_i)(x_n − m_i)^T

As the first two points are labelled 1 and the other two are labelled 0,

m_1 = (x_1 + x_2)/2 and m_2 = (x_3 + x_4)/2

After solving these we get

S_1 = [[0.5, 0.5], [0.5, 0.5]],   S_2 = [[0.5, 0], [0, 0]]

Calculating the inverse of S_W:

>>> s1 = np.array([[0.5, 0.5],[0.5, 0.5]])


>>> s2 = np.array([[0.5, 0],[0, 0]])
>>> sw = s1 + s2
>>> swinv = np.linalg.inv(sw)
>>> swinv
array([[ 2., -2.],[-2., 4.]])

So, S_W^{-1} = [[2, −2], [−2, 4]]

As w ∝ S_W^{-1} (m_2 − m_1), substituting the values gives w ∝ [−3, 8]^T.
If we assume some constant α, we get w_0 = −3α and w_1 = 8α.
Now as ||w|| = sqrt(w_0^2 + w_1^2) = 1, solving this gives
w_0 = −0.3511 and w_1 = 0.936
Assuming the threshold as λ, we solve

|S_W^{-1} S_B − λI| = 0

with S_B = (m_2 − m_1)(m_2 − m_1)^T. Substituting the values gives S_B = [[1, 2.5], [2.5, 6.25]]. Putting this in
b) |S_W^{-1} S_B − λI| = 0, we get λ = 17
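The whole Fisher-discriminant computation above (class means, scatter matrices, direction w, and the eigenvalue λ) can be reproduced in a few lines of numpy:

```python
import numpy as np

# Data points without the dummy feature; the first two belong to class 1,
# the last two to class 2 (from Problem 2's data)
X1 = np.array([[1.0, 0.0], [2.0, 1.0]])
X2 = np.array([[2.0, 3.0], [3.0, 3.0]])
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)

# Within-class scatter S_W = S_1 + S_2
S1 = sum(np.outer(x - m1, x - m1) for x in X1)
S2 = sum(np.outer(x - m2, x - m2) for x in X2)
Sw = S1 + S2

# Fisher direction w ∝ Sw^{-1}(m2 - m1), then normalize to unit length
w = np.linalg.inv(Sw) @ (m2 - m1)
w_unit = w / np.linalg.norm(w)
print(w)       # ≈ [-3.  8.]
print(w_unit)  # ≈ [-0.3511  0.9363]

# Largest eigenvalue of Sw^{-1} S_B gives the threshold lambda
SB = np.outer(m2 - m1, m2 - m1)
lams = np.linalg.eigvals(np.linalg.inv(Sw) @ SB)
print(max(lams.real))  # ≈ 17
```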


Solution: w^{(k+1)} = w^{(k)} + η x_n t_n for each misclassified point n.

A data point is misclassified if and only if (w^T x_n) t_n < 0.

For the first iteration, with the given weights, we first check whether each data point is misclassified
using the above condition.
For the first and second data points the value of (w^T x_n) t_n is greater than 0, so they are classified
correctly; for the third and fourth points the value is −1.5, which is less than 0, so they are
misclassified.
Using the update equation above, the new weight vector is [−0.5, −5, −6].
After the second iteration, the first and second points are misclassified and the updated weight vector is [1.5, −2, −5].
In the third iteration, the first and second data points are again misclassified. The updated weight
vector after this is [3.5, 1, −4].
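The three iterations can be sketched in numpy; the initial weight vector w = [1.5, 0, 0] and learning rate η = 1 are assumptions inferred from the −1.5 margin quoted above, not stated explicitly here:

```python
import numpy as np

# Data with dummy feature x0 = 1; targets in {+1, -1} (class 1 -> +1, class 0 -> -1)
X = np.array([[1, 1, 0], [1, 2, 1], [1, 2, 3], [1, 3, 3]], dtype=float)
t = np.array([1, 1, -1, -1], dtype=float)

# Assumed initial weights and learning rate (eta = 1)
w = np.array([1.5, 0.0, 0.0])
eta = 1.0

for it in range(3):
    # A point is misclassified iff (w . x_n) * t_n < 0
    mis = (X @ w) * t < 0
    # Batch update: add eta * x_n * t_n over all misclassified n
    w = w + eta * (t[mis] @ X[mis])
    print(f"iteration {it + 1}: w = {w}")
```

Each printed weight vector matches the hand computation: [−0.5, −5, −6], then [1.5, −2, −5], then [3.5, 1, −4].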

Problem 3

a) Solution: Given p(y = 1) = 0.4, the prior for the class label y = 2 is
p(y = 2) = 1 − p(y = 1) = 1 − 0.4 = 0.6

b) Solution: Given p(x|y = 1) = 0.5 for 0 ≤ x ≤ 2 (0 otherwise) and p(x|y = 2) = 0.125 for 0 ≤ x ≤ 8
(0 otherwise). Using Bayes' theorem, for 0 ≤ x ≤ 2:

p(y = 1|x) = p(x|y = 1) p(y = 1) / Σ_{n=1}^{2} p(x|y = n) p(y = n)
           = p(x|y = 1) p(y = 1) / [p(x|y = 1) p(y = 1) + p(x|y = 2) p(y = 2)]
           = (0.5 · 0.4) / (0.5 · 0.4 + 0.125 · 0.6)
           = 0.2 / 0.275
           = 0.727


c) Solution: Given p(x|y = 1) = 0.5 for 0 ≤ x ≤ 2 (0 otherwise) and p(x|y = 2) = 0.125 for 0 ≤ x ≤ 8
(0 otherwise). Let's calculate p(y = 1|x = 1) and p(y = 2|x = 1):

p(y = 1|x) = p(x|y = 1) p(y = 1) / [p(x|y = 1) p(y = 1) + p(x|y = 2) p(y = 2)]
           = 0.727 (as calculated in (b))

So p(y = 1|x) = 0.727 for 0 ≤ x ≤ 2 and 0 otherwise.

p(y = 2|x) = p(x|y = 2) p(y = 2) / [p(x|y = 1) p(y = 1) + p(x|y = 2) p(y = 2)]
           = (0.125 · 0.6) / 0.275
           = 0.273

So p(y = 2|x) = 0.273 for 0 ≤ x ≤ 2, 1 for 2 < x ≤ 8, and 0 otherwise.

Now for x = 1, as p(y = 1|x = 1) > p(y = 2|x = 1), the class label assigned is y = 1 and
the probability of misclassification is 0.273.

d) Solution: From (c), we can define the decision function of the Bayes classifier as:
y = 1 for 0 ≤ x ≤ 2 (as p(y = 1|x) > p(y = 2|x) on this interval),
y = 2 for 2 < x ≤ 8 (as p(y = 1|x) < p(y = 2|x) on this interval),
and for x ∉ [0, 8] both densities are 0, so y can be assigned either label.
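The decision function can be sketched in Python; the density and prior values come from the problem statement:

```python
# Class-conditional densities and priors from the problem statement
def p_x_given_y1(x):
    return 0.5 if 0 <= x <= 2 else 0.0

def p_x_given_y2(x):
    return 0.125 if 0 <= x <= 8 else 0.0

p_y1, p_y2 = 0.4, 0.6

def posterior_y1(x):
    """p(y=1|x) by Bayes' theorem; undefined where the evidence p(x) is 0."""
    num = p_x_given_y1(x) * p_y1
    den = num + p_x_given_y2(x) * p_y2
    return num / den if den > 0 else None

def classify(x):
    post = posterior_y1(x)
    if post is None:
        return None           # outside [0, 8]: either label is acceptable
    return 1 if post >= 0.5 else 2

print(posterior_y1(1))        # ≈ 0.727
print(classify(1), classify(5))
```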

Problem 4

a) Solution: Given p(y = 1) = 0.6, the prior for the class label y = 2 is
p(y = 2) = 1 − p(y = 1) = 1 − 0.6 = 0.4


b) Solution: To obtain p(y = 1|x), we compute it for all pairs (x1, x2), namely (0,0), (0,1), (1,0) and (1,1):

p(y = 1|x1 = 0, x2 = 0) = p(x1 = 0, x2 = 0|y = 1) p(y = 1) / [p(x1 = 0, x2 = 0|y = 1) p(y = 1) + p(x1 = 0, x2 = 0|y = 2) p(y = 2)]
                        = (0.4 · 0.6) / (0.4 · 0.6 + 0.3 · 0.4) = 2/3

p(y = 1|x1 = 0, x2 = 1) = (0.3 · 0.6) / (0.3 · 0.6 + 0.1 · 0.4) = 9/11

p(y = 1|x1 = 1, x2 = 0) = (0.2 · 0.6) / (0.2 · 0.6 + 0.4 · 0.4) = 3/7

p(y = 1|x1 = 1, x2 = 1) = (0.1 · 0.6) / (0.1 · 0.6 + 0.2 · 0.4) = 3/7
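The four posteriors can be verified with exact fractions:

```python
from fractions import Fraction as F

# Class-conditional table p(x1, x2 | y) and priors from the problem statement
p_x_given_y = {
    1: {(0, 0): F(4, 10), (0, 1): F(3, 10), (1, 0): F(2, 10), (1, 1): F(1, 10)},
    2: {(0, 0): F(3, 10), (0, 1): F(1, 10), (1, 0): F(4, 10), (1, 1): F(2, 10)},
}
prior = {1: F(6, 10), 2: F(4, 10)}

def posterior(y, x):
    """p(y | x1, x2) by Bayes' theorem, computed exactly."""
    den = sum(p_x_given_y[c][x] * prior[c] for c in (1, 2))
    return p_x_given_y[y][x] * prior[y] / den

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, posterior(1, x))   # 2/3, 9/11, 3/7, 3/7
```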

c) Solution: For x1 = 0 and x2 = 1,

p(y = 1|x1 = 0, x2 = 1) = 9/11

p(y = 2|x1 = 0, x2 = 1) = p(x1 = 0, x2 = 1|y = 2) p(y = 2) / [p(x1 = 0, x2 = 1|y = 1) p(y = 1) + p(x1 = 0, x2 = 1|y = 2) p(y = 2)]
                        = (0.1 · 0.4) / (0.3 · 0.6 + 0.1 · 0.4) = 2/11 = 1 − p(y = 1|x1 = 0, x2 = 1)

and,

p(y = 2|x1 = 0, x2 = 0) = 1/3,  p(y = 2|x1 = 1, x2 = 0) = 4/7,  p(y = 2|x1 = 1, x2 = 1) = 4/7

As p(y = 1|x1 = 0, x2 = 1) > p(y = 2|x1 = 0, x2 = 1), the class label assigned is y = 1 and
the probability of misclassification is 2/11 ≈ 0.182.


d) Solution: Using the probabilities from (b) and (c), the decision function of the Bayes classifier is:

y = 1 for x1 = 0, x2 = 0, as p(y = 1|x1 = 0, x2 = 0) > p(y = 2|x1 = 0, x2 = 0),
y = 1 for x1 = 0, x2 = 1, as p(y = 1|x1 = 0, x2 = 1) > p(y = 2|x1 = 0, x2 = 1),
y = 2 for x1 = 1, x2 = 0, as p(y = 1|x1 = 1, x2 = 0) < p(y = 2|x1 = 1, x2 = 0),
y = 2 for x1 = 1, x2 = 1, as p(y = 1|x1 = 1, x2 = 1) < p(y = 2|x1 = 1, x2 = 1)

Problem 5

a) Solution: The number of independent parameters in the Naive Bayes classifier is 10 (2 for Sky, 3 for
Temp, 2 for Humid, 2 for Wind, and 1 for the class label Play Sport).

b) Solution:

p(x1 = sunny|y = yes) = 3/4,    p(x1 = sunny|y = no) = 3/6
p(x1 = rainy|y = yes) = 1/4,    p(x1 = rainy|y = no) = 3/6
p(x2 = mild|y = yes) = 2/4,     p(x2 = mild|y = no) = 2/6
p(x2 = cold|y = yes) = 1/4,     p(x2 = cold|y = no) = 2/6
p(x2 = hot|y = yes) = 1/4,      p(x2 = hot|y = no) = 2/6
p(x3 = normal|y = yes) = 4/4,   p(x3 = normal|y = no) = 2/6
p(x3 = high|y = yes) = 0/4,     p(x3 = high|y = no) = 4/6
p(x4 = strong|y = yes) = 2/4,   p(x4 = strong|y = no) = 4/6
p(x4 = mild|y = yes) = 2/4,     p(x4 = mild|y = no) = 2/6


c) Solution: The new input vector is x = (sunny, cold, normal, strong). Under the Naive Bayes assumption,

p(y|x) ∝ p(y) · p(x1|y) · p(x2|y) · p(x3|y) · p(x4|y),

where the prior p(y) enters the product exactly once. The denominator p(x) is omitted since it is the same
for both p(y = yes|x) and p(y = no|x). Using the conditional probabilities from (b) and the priors
p(y = yes) = 4/10 and p(y = no) = 6/10:

p(y = yes|x) ∝ (4/10) · (3/4) · (1/4) · (4/4) · (2/4) = 0.0375

p(y = no|x) ∝ (6/10) · (3/6) · (2/6) · (2/6) · (4/6) ≈ 0.0222

As p(y = yes|x) > p(y = no|x), the Naive Bayes classifier assigns the label y = yes, which means
Play Sport (class) = yes.


d) Solution: Because the features are conditionally independent given the class, we can simply drop the
factor for the missing attribute. With the remaining features (sunny, cold, normal):

p(y = yes|x) ∝ (4/10) · (3/4) · (1/4) · (4/4) = 0.075

p(y = no|x) ∝ (6/10) · (3/6) · (2/6) · (2/6) ≈ 0.0333

Hence, as p(y = yes|x) > p(y = no|x), the classifier assigns the label y = yes.
