SML_Assignment_Homework_1


CSE 575: Homework #1

Puru Lokendra Singh

26th September, 2022

Problem 1

a) Solution: As X and Y are independent events and P(Y) > 0, P(X|Y) = P(X).

b) Solution: As X and Y are disjoint events (P(X ∩ Y) = 0) and P(Y) > 0,

P(X|Y) = P(X ∩ Y) / P(Y) = 0

c) Solution: Given P(C1 = Head) = 0.6 and P(C2 = Head) = 0.4, we have

P(C1 = Tail) = 1 − P(C1 = Head) = 0.4 and P(C2 = Tail) = 1 − P(C2 = Head) = 0.6

Since the two coin tosses are independent, P(C1, C2) = P(C1) · P(C2), so
P(C1 = Head, C2 = Tail) = P(HT) = P(C1 = Head) · P(C2 = Tail) = 0.6 · 0.6 = 0.36
Similarly, P(C1 = Tail, C2 = Tail) = P(TT) = P(C1 = Tail) · P(C2 = Tail) = 0.4 · 0.6 = 0.24
So the probability of the sequence (HT, HT, TT, TT) is:
P(HT) · P(HT) · P(TT) · P(TT) = 0.36 · 0.36 · 0.24 · 0.24 = 0.00746496
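The arithmetic above can be checked with a short Python snippet:

```python
# Probabilities of each single toss (from the problem statement)
p_h1, p_h2 = 0.6, 0.4              # P(C1 = Head), P(C2 = Head)
p_t1, p_t2 = 1 - p_h1, 1 - p_h2    # P(C1 = Tail), P(C2 = Tail)

# Joint probabilities for one (C1, C2) toss, by independence
p_ht = p_h1 * p_t2                 # P(HT) = 0.6 * 0.6 = 0.36
p_tt = p_t1 * p_t2                 # P(TT) = 0.4 * 0.6 = 0.24

# Probability of the whole sequence (HT, HT, TT, TT)
p_seq = p_ht ** 2 * p_tt ** 2      # ≈ 0.00746496
print(p_seq)
```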

d) Solution:

P(Head) = (Number of times head occurs) / (Total number of tosses) = 15/20 = 0.75


Problem 2

Solution: (a) The least-squares error function for a linear classifier is

E(w) = (1/2) Σ_{n=1}^{N} (w^T x_n − t_n)^2

(b) Taking the derivative of this error function and setting it to 0, we get

w = (X^T X)^{-1} (X^T t)

x with dummy feature (x0 = 1) and target vector t are

x = [[1, 1, 0],
     [1, 2, 1],
     [1, 2, 3],
     [1, 3, 3]],   t = [1, 1, 0, 0]

Solving this with the numpy library:

>>> import numpy as np
>>> x = np.array([[1,1,0],[1,2,1],[1,2,3],[1,3,3]])
>>> t = np.array([1,1,0,0])
>>> w = np.dot(np.linalg.inv(x.T@x),(x.T@t))
>>> w
array([ 0.94444444, 0.16666667, -0.44444444])

a) So after solving the above equation using numpy we get w0 = 0.9444, w1 = 0.1667 and w2 = −0.4444
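As a sanity check, the normal-equations solution can be compared against numpy's dedicated least-squares solver, which solves the same problem without explicitly forming the inverse:

```python
import numpy as np

# Design matrix with dummy feature x0 = 1, and targets t (from the problem)
X = np.array([[1, 1, 0], [1, 2, 1], [1, 2, 3], [1, 3, 3]], dtype=float)
t = np.array([1, 1, 0, 0], dtype=float)

# Normal equations: w = (X^T X)^{-1} X^T t
w_normal = np.linalg.inv(X.T @ X) @ (X.T @ t)

# np.linalg.lstsq solves the same least-squares problem more stably
w_lstsq, *_ = np.linalg.lstsq(X, t, rcond=None)

print(w_normal)                        # ≈ [ 0.9444  0.1667 -0.4444]
print(np.allclose(w_normal, w_lstsq))  # the two solutions agree
```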


Solution: a) The criterion for Fisher's linear discriminant can be given as

J(w) = (m_2 − m_1)^2 / (s_1^2 + s_2^2)

or, in matrix form,

J(w) = (w^T S_B w) / (w^T S_W w)

b) S_W = S_1 + S_2, where

S_i = Σ_{n ∈ C_i} (x_n − m_i)(x_n − m_i)^T

As the first two points are labelled 1 and the other two are labelled 0,

m_1 = (x_1 + x_2)/2 and m_2 = (x_3 + x_4)/2

After solving these we get

S_1 = [[0.5, 0.5], [0.5, 0.5]],   S_2 = [[0.5, 0], [0, 0]]

Calculating the inverse of S_W:

>>> s1 = np.array([[0.5, 0.5],[0.5, 0.5]])


>>> s2 = np.array([[0.5, 0],[0, 0]])
>>> sw = s1 + s2
>>> swinv = np.linalg.inv(sw)
>>> swinv
array([[ 2., -2.],[-2., 4.]])

So, S_W^{-1} = [[2, −2], [−2, 4]]

As w ∝ S_W^{-1} (m_2 − m_1), substituting the values gives w ∝ [−3, 8]^T.
If we assume some constant α, we get w_0 = −3α and w_1 = 8α.
Now as ||w|| = sqrt(w_0^2 + w_1^2) = 1, solving this gives
w_0 = −0.3511 and w_1 = 0.936
Assuming the threshold as λ, we solve

|S_W^{-1} S_B − λI| = 0

with S_B = (m_2 − m_1)(m_2 − m_1)^T. Substituting the values gives S_B = [[1, 2.5], [2.5, 6.25]]. Putting this in
b) |S_W^{-1} S_B − λI| = 0, we get λ = 17
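The whole Fisher-discriminant computation above (class means, scatter matrices, direction w, and the eigenvalue λ) can be reproduced in a few lines of numpy:

```python
import numpy as np

# Data points without the dummy feature; the first two belong to class 1,
# the last two to class 2 (from Problem 2's data)
X1 = np.array([[1.0, 0.0], [2.0, 1.0]])
X2 = np.array([[2.0, 3.0], [3.0, 3.0]])
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)

# Within-class scatter S_W = S_1 + S_2
S1 = sum(np.outer(x - m1, x - m1) for x in X1)
S2 = sum(np.outer(x - m2, x - m2) for x in X2)
Sw = S1 + S2

# Fisher direction w ∝ Sw^{-1}(m2 - m1), then normalize to unit length
w = np.linalg.inv(Sw) @ (m2 - m1)
w_unit = w / np.linalg.norm(w)
print(w)       # ≈ [-3.  8.]
print(w_unit)  # ≈ [-0.3511  0.9363]

# Largest eigenvalue of Sw^{-1} S_B gives the threshold lambda
SB = np.outer(m2 - m1, m2 - m1)
lams = np.linalg.eigvals(np.linalg.inv(Sw) @ SB)
print(max(lams.real))  # ≈ 17
```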


Solution: w^{(k+1)} = w^{(k)} + η x_n t_n for each misclassified point n.

A data point is misclassified if and only if (w^T x_n) t_n < 0.

For the first iteration, with the given weights, we first check whether each data point is misclassified
using the above condition.
For the first and second data points the value of (w^T x_n) t_n is greater than 0, so they are classified
correctly; for the third and fourth points the value is −1.5, which is less than 0, so they are
misclassified.
Using the update equation above, the new weight vector is [−0.5, −5, −6].
After the second iteration, the first and second points are misclassified and the updated weight vector is [1.5, −2, −5].
In the third iteration, the first and second data points are again misclassified. The updated weight
vector after this is [3.5, 1, −4].
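The three iterations can be sketched in numpy; the initial weight vector w = [1.5, 0, 0] and learning rate η = 1 are assumptions inferred from the −1.5 margin quoted above, not stated explicitly here:

```python
import numpy as np

# Data with dummy feature x0 = 1; targets in {+1, -1} (class 1 -> +1, class 0 -> -1)
X = np.array([[1, 1, 0], [1, 2, 1], [1, 2, 3], [1, 3, 3]], dtype=float)
t = np.array([1, 1, -1, -1], dtype=float)

# Assumed initial weights and learning rate (eta = 1)
w = np.array([1.5, 0.0, 0.0])
eta = 1.0

for it in range(3):
    # A point is misclassified iff (w . x_n) * t_n < 0
    mis = (X @ w) * t < 0
    # Batch update: add eta * x_n * t_n over all misclassified n
    w = w + eta * (t[mis] @ X[mis])
    print(f"iteration {it + 1}: w = {w}")
```

Each printed weight vector matches the hand computation: [−0.5, −5, −6], then [1.5, −2, −5], then [3.5, 1, −4].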

Problem 3

a) Solution: Given p(y = 1) = 0.4, the prior for the class label y = 2 is
p(y = 2) = 1 − p(y = 1) = 1 − 0.4 = 0.6

b) Solution: Given p(x|y = 1) = 0.5 for 0 ≤ x ≤ 2 (0 otherwise) and p(x|y = 2) = 0.125 for 0 ≤ x ≤ 8
(0 otherwise). Using Bayes' theorem, for 0 ≤ x ≤ 2:

p(y = 1|x) = p(x|y = 1) p(y = 1) / Σ_{n=1}^{2} p(x|y = n) p(y = n)
           = p(x|y = 1) p(y = 1) / [p(x|y = 1) p(y = 1) + p(x|y = 2) p(y = 2)]
           = (0.5 · 0.4) / (0.5 · 0.4 + 0.125 · 0.6)
           = 0.2 / 0.275
           = 0.727


c) Solution: Given p(x|y = 1) = 0.5 for 0 ≤ x ≤ 2 (0 otherwise) and p(x|y = 2) = 0.125 for 0 ≤ x ≤ 8
(0 otherwise). Let's calculate p(y = 1|x = 1) and p(y = 2|x = 1):

p(y = 1|x) = p(x|y = 1) p(y = 1) / [p(x|y = 1) p(y = 1) + p(x|y = 2) p(y = 2)]
           = 0.727 (as calculated in (b))

So p(y = 1|x) = 0.727 for 0 ≤ x ≤ 2 and 0 otherwise.

p(y = 2|x) = p(x|y = 2) p(y = 2) / [p(x|y = 1) p(y = 1) + p(x|y = 2) p(y = 2)]
           = (0.125 · 0.6) / 0.275
           = 0.273

So p(y = 2|x) = 0.273 for 0 ≤ x ≤ 2, 1 for 2 < x ≤ 8, and 0 otherwise.

Now for x = 1, as p(y = 1|x = 1) > p(y = 2|x = 1), the class label assigned is y = 1 and
the probability of misclassification is 0.273.

d) Solution: From (c), we can define the decision function of the Bayes classifier as:
y = 1 for 0 ≤ x ≤ 2 (as p(y = 1|x) > p(y = 2|x) on this interval),
y = 2 for 2 < x ≤ 8 (as p(y = 1|x) < p(y = 2|x) on this interval),
and for x ∉ [0, 8] both densities are 0, so y can be assigned either label.
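The decision function can be sketched in Python; the density and prior values come from the problem statement:

```python
# Class-conditional densities and priors from the problem statement
def p_x_given_y1(x):
    return 0.5 if 0 <= x <= 2 else 0.0

def p_x_given_y2(x):
    return 0.125 if 0 <= x <= 8 else 0.0

p_y1, p_y2 = 0.4, 0.6

def posterior_y1(x):
    """p(y=1|x) by Bayes' theorem; undefined where the evidence p(x) is 0."""
    num = p_x_given_y1(x) * p_y1
    den = num + p_x_given_y2(x) * p_y2
    return num / den if den > 0 else None

def classify(x):
    post = posterior_y1(x)
    if post is None:
        return None           # outside [0, 8]: either label is acceptable
    return 1 if post >= 0.5 else 2

print(posterior_y1(1))        # ≈ 0.727
print(classify(1), classify(5))
```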

Problem 4

a) Solution: Given p(y = 1) = 0.6, the prior for the class label y = 2 is
p(y = 2) = 1 − p(y = 1) = 1 − 0.6 = 0.4


b) Solution: To obtain p(y = 1|x), we compute it for all pairs (x1, x2), namely (0,0), (0,1), (1,0) and (1,1):

p(y = 1|x1 = 0, x2 = 0) = p(x1 = 0, x2 = 0|y = 1) p(y = 1) / [p(x1 = 0, x2 = 0|y = 1) p(y = 1) + p(x1 = 0, x2 = 0|y = 2) p(y = 2)]
                        = (0.4 · 0.6) / (0.4 · 0.6 + 0.3 · 0.4) = 2/3

p(y = 1|x1 = 0, x2 = 1) = (0.3 · 0.6) / (0.3 · 0.6 + 0.1 · 0.4) = 9/11

p(y = 1|x1 = 1, x2 = 0) = (0.2 · 0.6) / (0.2 · 0.6 + 0.4 · 0.4) = 3/7

p(y = 1|x1 = 1, x2 = 1) = (0.1 · 0.6) / (0.1 · 0.6 + 0.2 · 0.4) = 3/7
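The four posteriors can be verified with exact fractions:

```python
from fractions import Fraction as F

# Class-conditional table p(x1, x2 | y) and priors from the problem statement
p_x_given_y = {
    1: {(0, 0): F(4, 10), (0, 1): F(3, 10), (1, 0): F(2, 10), (1, 1): F(1, 10)},
    2: {(0, 0): F(3, 10), (0, 1): F(1, 10), (1, 0): F(4, 10), (1, 1): F(2, 10)},
}
prior = {1: F(6, 10), 2: F(4, 10)}

def posterior(y, x):
    """p(y | x1, x2) by Bayes' theorem, computed exactly."""
    den = sum(p_x_given_y[c][x] * prior[c] for c in (1, 2))
    return p_x_given_y[y][x] * prior[y] / den

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, posterior(1, x))   # 2/3, 9/11, 3/7, 3/7
```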

c) Solution: For x1 = 0 and x2 = 1,

p(y = 1|x1 = 0, x2 = 1) = 9/11

p(y = 2|x1 = 0, x2 = 1) = p(x1 = 0, x2 = 1|y = 2) p(y = 2) / [p(x1 = 0, x2 = 1|y = 1) p(y = 1) + p(x1 = 0, x2 = 1|y = 2) p(y = 2)]
                        = (0.1 · 0.4) / (0.3 · 0.6 + 0.1 · 0.4) = 2/11 = 1 − p(y = 1|x1 = 0, x2 = 1)

and,

p(y = 2|x1 = 0, x2 = 0) = 1/3,  p(y = 2|x1 = 1, x2 = 0) = 4/7,  p(y = 2|x1 = 1, x2 = 1) = 4/7

As p(y = 1|x1 = 0, x2 = 1) > p(y = 2|x1 = 0, x2 = 1), the class label assigned is y = 1 and
the probability of misclassification is 2/11 ≈ 0.182.


d) Solution: Using the probabilities from (b) and (c), the decision function of the Bayes classifier is:

y = 1 for x1 = 0, x2 = 0, as p(y = 1|x1 = 0, x2 = 0) > p(y = 2|x1 = 0, x2 = 0),
y = 1 for x1 = 0, x2 = 1, as p(y = 1|x1 = 0, x2 = 1) > p(y = 2|x1 = 0, x2 = 1),
y = 2 for x1 = 1, x2 = 0, as p(y = 1|x1 = 1, x2 = 0) < p(y = 2|x1 = 1, x2 = 0),
y = 2 for x1 = 1, x2 = 1, as p(y = 1|x1 = 1, x2 = 1) < p(y = 2|x1 = 1, x2 = 1)

Problem 5

a) Solution: The number of independent parameters in the Naive Bayes classifier is 10 (2 for Sky, 3 for
Temp, 2 for Humid, 2 for Wind, and 1 for the class label Play Sport).

b) Solution:

p(x1 = sunny|y = yes) = 3/4,    p(x1 = sunny|y = no) = 3/6
p(x1 = rainy|y = yes) = 1/4,    p(x1 = rainy|y = no) = 3/6
p(x2 = mild|y = yes) = 2/4,     p(x2 = mild|y = no) = 2/6
p(x2 = cold|y = yes) = 1/4,     p(x2 = cold|y = no) = 2/6
p(x2 = hot|y = yes) = 1/4,      p(x2 = hot|y = no) = 2/6
p(x3 = normal|y = yes) = 4/4,   p(x3 = normal|y = no) = 2/6
p(x3 = high|y = yes) = 0/4,     p(x3 = high|y = no) = 4/6
p(x4 = strong|y = yes) = 2/4,   p(x4 = strong|y = no) = 4/6
p(x4 = mild|y = yes) = 2/4,     p(x4 = mild|y = no) = 2/6


c) Solution: The new input vector is x = (sunny, cold, normal, strong). Under the Naive Bayes assumption,

p(y|x) ∝ p(y) · p(x1|y) · p(x2|y) · p(x3|y) · p(x4|y),

where the prior p(y) enters the product exactly once. The denominator p(x) is omitted since it is the same
for both p(y = yes|x) and p(y = no|x). Using the conditional probabilities from (b) and the priors
p(y = yes) = 4/10 and p(y = no) = 6/10:

p(y = yes|x) ∝ (4/10) · (3/4) · (1/4) · (4/4) · (2/4) = 0.0375

p(y = no|x) ∝ (6/10) · (3/6) · (2/6) · (2/6) · (4/6) ≈ 0.0222

As p(y = yes|x) > p(y = no|x), the Naive Bayes classifier assigns the label y = yes, which means
Play Sport (class) = yes.


d) Solution: Because the features are conditionally independent given the class, we can simply drop the
factor for the missing attribute. With the remaining features (sunny, cold, normal):

p(y = yes|x) ∝ (4/10) · (3/4) · (1/4) · (4/4) = 0.075

p(y = no|x) ∝ (6/10) · (3/6) · (2/6) · (2/6) ≈ 0.0333

Hence, as p(y = yes|x) > p(y = no|x), the classifier assigns the label y = yes.
