Solutions Problem Set 1
[Figure: dataset panels plotted with x1 on the horizontal axis and x2 on the vertical axis]
Answer: B, D, and F allow zero training error for a linear regression classifier.
g(x | w) = w0 + w1 x1 + w2 x2 + . . . + wn xn (1)
The optimal values of the parameters are obtained by minimizing the total or
average square error:
E = (1/2) Σ_{i=1}^N (g(x_i | w) − y_i)^2    (2)
Suppose that we use L1 and L2 regularization for the parameters. The final loss to be optimized for the L2 case is

E_2 = E + λ Σ_{j=1}^n w_j^2    (3)

and for the L1 case is

E_1 = E + λ Σ_{j=1}^n |w_j|    (4)

Comparing Eqs. (3) and (4), which one will enforce better sparsity on the parameter vector w (i.e., relatively more zeros in w)? Briefly explain using the L2 and L1 losses above.
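The contrast between the two penalties shows up already in one dimension. For a single weight, minimizing 0.5(w − a)^2 plus each penalty has a closed form: the L2 solution only rescales a and is nonzero whenever a is, while the L1 solution is a soft threshold that snaps small weights exactly to zero. The function names and scalar setup below are illustrative, not from the problem set.

```python
import math

def l2_min(a, lam):
    # argmin_w 0.5*(w - a)**2 + lam*w**2  =  a / (1 + 2*lam)
    # nonzero whenever a != 0: L2 shrinks but never zeroes a weight
    return a / (1 + 2 * lam)

def l1_min(a, lam):
    # argmin_w 0.5*(w - a)**2 + lam*abs(w)  =  soft threshold
    # exactly zero whenever |a| <= lam: L1 enforces sparsity
    return math.copysign(max(abs(a) - lam, 0.0), a)

print(l2_min(0.8, 1.0))  # shrunk, still nonzero
print(l1_min(0.8, 1.0))  # 0.0, clipped exactly to zero
```

This is why L1 (Eq. 4) produces relatively more zeros in w than L2 (Eq. 3).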
[Figure 3: dataset panels plotted with x1 on the horizontal axis and x2 on the vertical axis]
Consider again the datasets in Figure 3 and the same linear logistic regression
model
P (y = 1 | x, w) = g(w0 + w1 x1 + w2 x2 ) (8)
As we increase the regularization parameter C, which of the following scenarios do you expect to observe? (Choose only one.) Briefly explain your choice:
1. First w1 will become 0, then w2 may become smaller.
2. First w2 will become 0, then w1 may become smaller.
3. w1 and w2 will become zero simultaneously.
4. None of the weights will become exactly zero, only smaller as C increases.
Answer: (A) 1; (B) 1; (C) 4.
6. A random sample of eight drivers insured with a company and having similar
auto insurance policies was selected. The following table lists their driving
experiences (in years) and monthly auto insurance premiums.
dL/dθ = 5/θ − 5/(1 − θ) = 0

θ̂ = 0.5
8. Let {X_i}_{i=1}^N be iid observations drawn from the following probability density function:

p(x | θ) = (1/(2θ)) exp(−|x|/θ).

Find the MLE of θ.
Answer:

L(θ | X) = Σ_{i=1}^N [ −log 2 − log θ − |X_i|/θ ]

dL/dθ = Σ_{i=1}^N [ −1/θ + |X_i|/θ^2 ] = 0

θ̂ = (Σ_{i=1}^N |X_i|) / N
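A quick numerical sanity check of this result: the closed-form estimate θ̂ = (Σ|X_i|)/N should beat nearby values of θ under the log-likelihood above. The sample values below are made up for illustration.

```python
import math

def log_likelihood(theta, data):
    # L(theta | X) = sum_i [ -log 2 - log theta - |x_i|/theta ]
    return sum(-math.log(2) - math.log(theta) - abs(x) / theta for x in data)

data = [1.2, -0.7, 3.4, -2.1, 0.5]                 # hypothetical sample
theta_hat = sum(abs(x) for x in data) / len(data)  # closed-form MLE, here 1.58

# theta_hat should maximize the log-likelihood over a small grid around it
grid = [theta_hat + d for d in (-0.5, -0.1, 0.0, 0.1, 0.5)]
best = max(grid, key=lambda t: log_likelihood(t, data))
print(best == theta_hat)  # True
```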
The likelihood:

L(θ | X) = 1/θ^N if 0 ≤ X_i ≤ θ for all i, and 0 otherwise.

The MLE must satisfy θ̂ ≥ X_i for every i. Among all such values, 1/θ^N is maximized by choosing θ as small as possible, so

θ̂ = max(X_1, X_2, . . . , X_N)
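The same argument can be checked numerically: any θ below the sample maximum makes the likelihood zero, and above it 1/θ^N only decreases. The data below are made up for illustration.

```python
def likelihood(theta, data):
    # L(theta | X) = 1/theta^N if all 0 <= x_i <= theta, else 0
    n = len(data)
    if any(x < 0 or x > theta for x in data):
        return 0.0
    return theta ** (-n)

data = [0.9, 2.3, 1.7, 0.4]   # hypothetical sample from Uniform(0, theta)
theta_hat = max(data)         # MLE, here 2.3

print(likelihood(2.2, data))                                # 0.0: theta below max(X) is infeasible
print(likelihood(theta_hat, data) > likelihood(3.0, data))  # True: smallest feasible theta wins
```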
Q10: Suppose that we are doing binary sentiment classification on movie review
text, and we would like to know whether to assign the sentiment class positive
(+) or negative (-) to a review document. We'll represent each input observation by the 6 features {x1, x2, . . . , x6} of the input shown in the following text and table (Figs. 4 and 5).
Figure 4: A movie review text.
Suppose that we have trained a logistic regression classifier, and the 6 weights
corresponding to the 6 features are {2.5, −5.0, −1.2, 0.5, 2.0, 0.7}, while w0 = 0.1.
1. How do you interpret the values of w1 = 2.5 and w2 = −5.0?
2. Calculate P (+ | x) and P (− | x) for the above example.
Answer: (1) The weight w1 indicates how important the count of positive lexicon words (great, nice, enjoyable, etc.) is to a positive sentiment decision, while w2 tells us the corresponding importance of negative lexicon words.

(2) P(+ | x) = σ(w^T x + b) = 0.70

P(− | x) = 1 − σ(w^T x + b) = 0.30
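The computation in (2) is just a dot product pushed through a sigmoid. Only the weights and bias below come from the problem; the feature values are hypothetical stand-ins, since the table of feature counts (Fig. 5) is not reproduced here, chosen so that the result lands near the 0.70 stated above.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w = [2.5, -5.0, -1.2, 0.5, 2.0, 0.7]   # weights from the problem
b = 0.1                                 # bias w0 from the problem
x = [3, 2, 1, 3, 0, 4.19]               # hypothetical feature values

z = sum(wi * xi for wi, xi in zip(w, x)) + b
p_pos = sigmoid(z)        # P(+ | x), roughly 0.70 with these features
p_neg = 1.0 - p_pos       # P(- | x), roughly 0.30
print(round(p_pos, 2), round(p_neg, 2))
```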
Answer:

∇_{w,b} L = ( (σ(w^T x + w0) − y) x1, (σ(w^T x + w0) − y) x2, σ(w^T x + w0) − y )^T
          = ( (σ(0) − 1) x1, (σ(0) − 1) x2, σ(0) − 1 )^T = (−1.5, −1, −0.5)^T    (9)

θ1 = θ0 − µ ∇_{w,b} L = (0.15, 0.1, 0.05)^T    (10)
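The update in Eqs. (9) and (10) can be replayed in a few lines. The input x = (3, 2) with label y = 1 and learning rate µ = 0.1 are assumptions inferred from the gradient components −1.5 = (σ(0) − 1)·x1 and −1 = (σ(0) − 1)·x2; the parameters start at zero.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x, y = [3.0, 2.0], 1      # assumed example, consistent with the gradient values above
w, b = [0.0, 0.0], 0.0    # theta_0 = 0
mu = 0.1                  # assumed learning rate

err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y   # sigma(0) - 1 = -0.5
grad = [err * x[0], err * x[1], err]               # (-1.5, -1.0, -0.5)
w = [w[0] - mu * grad[0], w[1] - mu * grad[1]]     # subtract mu * gradient
b = b - mu * grad[2]
print(w, b)   # approximately [0.15, 0.1] and 0.05
```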