AID M AdvancedMachine WS20212022
Points: Grade:
Good Luck!
Matriculation Number:
Question 1 (8 Points):
Sunny strong no
Question 2 (5 Points):
$$f_1 = \exp\left(-\frac{\lVert x - l^{(1)} \rVert^2}{2\sigma^2}\right)$$
Explain this by sketching three plots with a small, medium, and large σ². You can
consider a 1-D example with one feature x1.
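As a rough numerical companion to the sketches, the snippet below evaluates the similarity f1 for a single 1-D feature at a few distances from the landmark. The landmark position, the sample points, and the three σ² values are illustrative assumptions and are not given in the exam.

```python
import math

def gaussian_similarity(x, landmark, sigma_sq):
    """f1 = exp(-(x - l)^2 / (2 * sigma^2)) for a single 1-D feature."""
    return math.exp(-((x - landmark) ** 2) / (2.0 * sigma_sq))

landmark = 0.0                      # assumed landmark l^(1)
for sigma_sq in (0.1, 1.0, 10.0):   # assumed small, medium, large sigma^2
    row = [round(gaussian_similarity(x, landmark, sigma_sq), 3) for x in (0.0, 1.0, 2.0)]
    print(sigma_sq, row)
```

For every σ² the similarity is 1 at the landmark itself; away from the landmark it decays more slowly as σ² grows.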
Question 3 (3 Points):
Question 4 (4 Points):
x y
1 3
2 6
4 12
0 0
Consider the cost function $J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$. What is the value of J(θ)?
What does that mean for our model?
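A minimal sketch of how this cost could be evaluated on the table above. The linear hypothesis h_θ(x) = θ0 + θ1·x and the two parameter settings tried here are assumptions, since the surviving exam text does not state them.

```python
def cost(theta0, theta1, xs, ys):
    """Squared-error cost J = 1/(2m) * sum((h(x_i) - y_i)^2) for h(x) = theta0 + theta1 * x."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

xs = [1, 2, 4, 0]
ys = [3, 6, 12, 0]

# Assumed parameter settings, chosen only for illustration.
print(cost(0.0, 3.0, xs, ys))   # line y = 3x passes through every point
print(cost(0.0, 0.0, xs, ys))   # flat line y = 0
```

A cost of zero means the hypothesis reproduces every training example exactly; any larger value measures the average squared miss.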
Question 5 (4 Points):
x y
1 2
2 3
0 1
What are the values of θ0 and θ1 that you would expect to obtain upon
running gradient descent on this model? Explain your answer by sketching
hθ(x).
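One way to check an expectation here is to run batch gradient descent on the three points directly. The sketch assumes a linear hypothesis h_θ(x) = θ0 + θ1·x with the squared-error cost; the learning rate, step count, and zero initialization are arbitrary illustrative choices.

```python
def gradient_descent(xs, ys, alpha=0.1, steps=5000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x with squared-error cost."""
    theta0, theta1 = 0.0, 0.0          # assumed starting point
    m = len(xs)
    for _ in range(steps):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update of both parameters.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

theta0, theta1 = gradient_descent([1, 2, 0], [2, 3, 1])
print(round(theta0, 4), round(theta1, 4))
```

Because the three points lie on a single straight line, the fitted parameters settle on that line's intercept and slope.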
Question 6 (3 Points):
What happens if we start from point D using gradient descent? Explain this
by also explaining what values the cost function J(θ0,θ1) has at Point A, B,
C, and D.
Question 7 (3 Points):
In this context, what is the main purpose of the derivative term $\frac{d}{d\theta_j} J(\theta_1)$?
Question 8 (5 Points):
Suppose you ran gradient descent three times, with α=0.001, α=0.1, and α=3, and
got the three plots (labeled A, B, and C).
Which plot corresponds to which value of α? Explain why each plot corresponds to
its value.
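The effect of the learning rate can be reproduced on a toy problem. The sketch below uses the assumed cost J(θ) = θ², which is not the cost behind the exam plots, and records the cost after each step for the three values of α.

```python
def cost_history(alpha, theta=1.0, steps=20):
    """Run gradient descent on the assumed toy cost J(theta) = theta^2."""
    history = [theta ** 2]
    for _ in range(steps):
        theta -= alpha * 2.0 * theta   # dJ/dtheta = 2 * theta
        history.append(theta ** 2)
    return history

for alpha in (0.001, 0.1, 3.0):
    h = cost_history(alpha)
    print(alpha, h[0], h[-1])
```

On this toy cost the smallest α barely reduces the cost in 20 steps, the middle α reduces it quickly, and the largest α overshoots the minimum so that the cost grows with every step.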
Question 9 (3 Points):
Suppose you have m=10 training examples with n=4 features (excluding
the additional all-ones feature for the intercept term, which you should
add). The normal equation is $\theta = (X^T X)^{-1} X^T y$.
For the given values of m and n, what are the dimensions of θ, X, and y in
this equation?
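The shapes involved can be traced with a few list-based helpers. This is only a sketch: the feature and target values are placeholders, the helper names are made up for this example, and no matrix inversion is performed because only the dimensions are of interest.

```python
def shape(matrix):
    """Return (rows, columns) of a list-of-lists matrix."""
    return (len(matrix), len(matrix[0]))

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

m, n = 10, 4
# Placeholder feature values; only the shapes matter here.
features = [[float(i + j) for j in range(n)] for i in range(m)]
X = [[1.0] + row for row in features]   # prepend the all-ones intercept column
y = [[float(i)] for i in range(m)]      # column vector of targets

Xt = transpose(X)
print(shape(X), shape(y))               # design matrix and target vector
print(shape(matmul(Xt, X)))             # X^T X, the matrix that gets inverted
print(shape(matmul(Xt, y)))             # X^T y, which has the same shape as theta
```

Adding the intercept column turns the n features into n + 1 columns, and θ takes the same shape as the product X^T y.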
Question 10 (5 Points):
Question 11 (2 Points):
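This is the cost function used by the Support Vector Machine:
$$\min_\theta \; C \sum_{i=1}^{m}\left[y^{(i)}\,\mathrm{cost}_1(\theta^T x^{(i)}) + (1-y^{(i)})\,\mathrm{cost}_0(\theta^T x^{(i)})\right] + \frac{1}{2}\sum_{j=1}^{n}\theta_j^2$$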
𝐶 can be chosen very large or very small. Explain what happens in these two
cases.
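To see how C weights the two parts of this objective, it can be evaluated for a fixed θ. The sketch assumes hinge-style definitions for cost1 and cost0, which the exam does not spell out, and uses made-up examples and parameters.

```python
def cost1(z):
    # Assumed hinge-style cost for y = 1: zero once z >= 1.
    return max(0.0, 1.0 - z)

def cost0(z):
    # Assumed hinge-style cost for y = 0: zero once z <= -1.
    return max(0.0, 1.0 + z)

def svm_objective(theta, examples, C):
    """C * (sum of per-example costs) + 0.5 * sum of theta_j^2 for j >= 1."""
    data_term = 0.0
    for x, y in examples:
        z = sum(t * xi for t, xi in zip(theta, x))          # theta^T x
        data_term += y * cost1(z) + (1 - y) * cost0(z)
    regularization = 0.5 * sum(t ** 2 for t in theta[1:])   # skips theta_0
    return C * data_term + regularization

# Hypothetical examples (x_0 = 1 is the intercept feature) and parameters.
examples = [([1.0, 2.0], 1), ([1.0, -1.0], 0), ([1.0, 0.5], 1)]
theta = [0.0, 1.0]
for C in (0.01, 100.0):
    print(C, svm_objective(theta, examples, C))
```

With a small C the regularization term dominates the total; with a large C the per-example costs dominate.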
Question 12 (4 Points):
Question 13 (4 Points):
X1 X2 Y
89 267 37
72 216 22
94 282 31
You plan to use both feature scaling (dividing by the "max-min", or range,
of a feature) and mean normalization.
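The combined transformation can be sketched as subtracting the feature mean and dividing by the feature range. The function name is an illustrative choice; the values are the X1 and X2 columns from the table.

```python
def scale_and_normalize(values):
    """Apply mean normalization and range scaling: (v - mean) / (max - min)."""
    mean = sum(values) / len(values)
    value_range = max(values) - min(values)
    return [(v - mean) / value_range for v in values]

x1 = [89, 72, 94]
x2 = [267, 216, 282]
print([round(v, 4) for v in scale_and_normalize(x1)])
print([round(v, 4) for v in scale_and_normalize(x2)])
```

Each X2 value is three times the matching X1 value, so both columns map to the same normalized values.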
Question 14 (5 Points):
Draw a sketch that illustrates how a hypothesis may underfit the training set
and explain your sketch. Why is underfitting a problem? What can you do to
avoid underfitting?
Question 15 (5 Points):
x1,x2 ∈ {1,0}
Question 16 (3 Points):
Question 17 (6 Points):
Question 18 (3 Points):
Let J(θ)=θ³. Furthermore, let θ=4 and ϵ=0.5. You use the formula
$$\frac{J(\theta + \epsilon) - J(\theta - \epsilon)}{2\epsilon}$$
to approximate the derivative. What value do you get using this approximation?
When and why is it useful to approximate the derivative?
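The two-sided difference can be evaluated directly and compared with the analytic derivative 3θ². The function name below is an illustrative choice.

```python
def numerical_derivative(J, theta, eps):
    """Two-sided difference (J(theta + eps) - J(theta - eps)) / (2 * eps)."""
    return (J(theta + eps) - J(theta - eps)) / (2 * eps)

def J(theta):
    return theta ** 3

approx = numerical_derivative(J, 4.0, 0.5)
exact = 3 * 4.0 ** 2        # analytic derivative of theta^3 at theta = 4
print(approx, exact)
```

Comparing the two numbers in this way is the idea behind gradient checking: a numerical estimate serves as a reference for an analytically derived gradient.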
Question 19 (3 Points):
You train a learning algorithm and find that it has unacceptably high error
on the test set. You plot the learning curve and obtain the figure below. What
kind of problem is the algorithm suffering from? Explain your answer. What
could you do to improve the performance of the algorithm?
Question 20 (5 Points):
Explain why it is useful to break down a given dataset into the three sets “Training
set”, “Cross validation set”, and “Test set”.
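A minimal sketch of such a three-way split. The 60/20/20 proportions, the fixed seed, and the function name are assumptions made for illustration; the exam does not prescribe them.

```python
import random

def split_dataset(examples, train_frac=0.6, cv_frac=0.2, seed=0):
    """Shuffle the examples and cut them into training, cross-validation, and test sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)       # seeded for reproducibility
    n_train = round(len(shuffled) * train_frac)
    n_cv = round(len(shuffled) * cv_frac)
    train = shuffled[:n_train]
    cv = shuffled[n_train:n_train + n_cv]
    test = shuffled[n_train + n_cv:]            # whatever remains
    return train, cv, test

train, cv, test = split_dataset(range(10))
print(len(train), len(cv), len(test))
```

The three parts are disjoint: parameters are fitted on the training part, model choices are compared on the cross-validation part, and the test part is held back for a final error estimate.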
Question 21 (4 Points):
Suppose you run k-means and after the algorithm converges, you have (m = 10):
c(1)=1, c(2)=1, c(3)=1, c(4)=2. What do you know about the data and the clusters
given this result?
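The values c(i) come from the cluster-assignment step of k-means, which labels each example with the index of its closest centroid. The sketch below shows that step on made-up 1-D points and centroids; these numbers are assumptions and are not the exam's data.

```python
def assign_clusters(points, centroids):
    """Return, for each point, the 1-based index of its closest centroid."""
    assignments = []
    for x in points:
        distances = [(x - mu) ** 2 for mu in centroids]
        assignments.append(distances.index(min(distances)) + 1)
    return assignments

points = [1.0, 1.2, 0.8, 5.0]     # assumed 1-D examples x^(1)..x^(4)
centroids = [1.0, 5.0]            # assumed centroids mu_1, mu_2
print(assign_clusters(points, centroids))
```

With these assumed values the first three points are assigned to cluster 1 and the fourth to cluster 2.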
Question 22 (3 Points):