AID M AdvancedMachine WS20212022

Matriculation Number:
Points: Grade:
Faculty of Computer Science
Course: Advanced Machine Learning Semester: WS 2021/2022

Course of Study: Artificial Intelligence and Data Science
Exam Duration: 90 Min.
Examiner: Prof. Dr. Christina Bauer    Exam Date:
Auxiliary Means: Open book    Time:
Exam Type: Take-Home-Exam    Number of sheets: 23
Before you start:
• Check whether the exercise sheet is complete
• Include your matriculation number on each page
• Go through all questions and get an overview of their scope relative to the achievable points
• Do not write beyond the limit lines
• Handwriting must be legible
Good Luck!

Question 1 (8 Points):
We want to apply a Naive Bayes Classifier. Consider this training set:
Outlook   Wind     Decision
Snow      weak     yes
Sunny     strong   no
Sunny     weak     yes
Given the following data, which decision should we make?
X’ = (Outlook = Snow; Wind = weak)
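A minimal sketch of the Naive Bayes computation for this training set, assuming plain relative-frequency estimates without Laplace smoothing (the question does not specify smoothing). The helper name `naive_bayes_scores` is hypothetical; it multiplies the prior P(c) by the per-feature likelihoods P(x_j | c) and picks the class with the largest product.

```python
from collections import Counter
from fractions import Fraction

# Training set of Question 1: ((Outlook, Wind), Decision)
TRAIN = [
    (("Snow", "weak"), "yes"),
    (("Sunny", "strong"), "no"),
    (("Sunny", "weak"), "yes"),
]

def naive_bayes_scores(train, x):
    """Unnormalised posterior P(c) * prod_j P(x_j | c) per class, from raw counts (no smoothing)."""
    class_counts = Counter(label for _, label in train)
    scores = {}
    for c, n_c in class_counts.items():
        score = Fraction(n_c, len(train))  # prior P(c)
        for j, value in enumerate(x):
            # number of class-c examples whose feature j equals the queried value
            matches = sum(1 for feats, label in train if label == c and feats[j] == value)
            score *= Fraction(matches, n_c)  # likelihood P(x_j | c)
        scores[c] = score
    return scores

scores = naive_bayes_scores(TRAIN, ("Snow", "weak"))
decision = max(scores, key=scores.get)
print(scores["yes"], scores["no"], decision)  # 1/3 0 yes
```

Here P(yes) · P(Snow | yes) · P(weak | yes) = 2/3 · 1/2 · 1 = 1/3, while P(Snow | no) = 0 makes the "no" score vanish.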

If we use an SVM, we can compute new features based on the proximity to landmarks, e.g. using the Gaussian kernel function:
$$f_1 = \exp\left(-\frac{\lVert x - l^{(1)} \rVert^2}{2\sigma^2}\right)$$
In this case, what is the influence of σ²?
Explain this by sketching three plots with a small, a medium, and a large σ². You can
consider a 1-D example with one feature x1.
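A small sketch that tabulates the Gaussian feature f1 for a 1-D input, assuming a hypothetical landmark at 0 and three illustrative values of σ² (0.25, 1, 4); these numbers are not from the exam and only stand in for "small", "medium", and "large".

```python
import math

def gaussian_feature(x, landmark, sigma_sq):
    """Gaussian kernel feature f = exp(-(x - l)^2 / (2 * sigma^2)) for a single feature x1."""
    return math.exp(-((x - landmark) ** 2) / (2 * sigma_sq))

landmark = 0.0  # hypothetical landmark l(1)
for sigma_sq in (0.25, 1.0, 4.0):  # illustrative small, medium, large sigma^2
    row = [round(gaussian_feature(x, landmark, sigma_sq), 3) for x in (0.0, 1.0, 2.0, 3.0)]
    print(f"sigma^2 = {sigma_sq}: f1 at x1 = 0, 1, 2, 3 -> {row}")
```

For every σ², f1 equals 1 at the landmark; a small σ² makes f1 fall to almost 0 within about one unit (a narrow bump), whereas a large σ² makes it decay slowly (a wide, flat bump).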

Explain the model of a supervised learning algorithm.

Consider the linear regression model hθ(x) = 3x.
Consider the following set of training examples:
x   y
1   3
2   6
4   12
0   0
Consider the cost function
$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2.$$
What is the value of J(θ)?
What does that mean for our model?
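A minimal sketch of the squared-error cost J(θ) from the formula above, evaluated on the four training examples; the helper name `cost` is illustrative.

```python
def cost(h, xs, ys):
    """J = 1/(2m) * sum_i (h(x_i) - y_i)^2"""
    m = len(xs)
    return sum((h(x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

xs = [1, 2, 4, 0]
ys = [3, 6, 12, 0]
print(cost(lambda x: 3 * x, xs, ys))  # 0.0
```

Every prediction 3x matches y exactly, so all residuals are zero and J(θ) = 0.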

Consider the following set of training examples:
x   y
1   2
2   3
0   1
Consider the linear regression model hθ(x) = θ0 + θ1x.
What are the values of θ0 and θ1 that you would expect to obtain upon
running gradient descent on this model? Explain your answer by sketching
hθ(x).
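A sketch of batch gradient descent on this training set, assuming zero initialisation, α = 0.1, and 5000 iterations (all chosen for illustration, not given in the question). Both parameters are updated simultaneously from the gradient of J(θ0, θ1).

```python
def gradient_descent(xs, ys, alpha=0.1, iters=5000):
    """Batch gradient descent for h(x) = t0 + t1 * x with squared-error cost."""
    m = len(xs)
    t0, t1 = 0.0, 0.0  # assumed starting point
    for _ in range(iters):
        errors = [t0 + t1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                           # dJ/dt0
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m  # dJ/dt1
        t0, t1 = t0 - alpha * grad0, t1 - alpha * grad1   # simultaneous update
    return t0, t1

t0, t1 = gradient_descent([1, 2, 0], [2, 3, 1])
print(round(t0, 4), round(t1, 4))  # 1.0 1.0
```

All three points lie on y = x + 1, so the fitted line passes through them exactly.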

Consider the following contour plot:
What happens if we start gradient descent from point D? Explain your answer by
also describing the values the cost function J(θ0, θ1) takes at points A, B,
C, and D.

Consider the following update rule for gradient descent:

$$\theta_1 := \theta_1 - \alpha \frac{d}{d\theta_1} J(\theta_1)$$
In this context, what is the main purpose of the derivative term $\frac{d}{d\theta_1} J(\theta_1)$?
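A minimal sketch of a single update step, assuming a hypothetical cost J(θ1) = (θ1 − 2)² with its minimum at θ1 = 2 and α = 0.25. The sign of the derivative decides the direction of the step.

```python
ALPHA = 0.25  # assumed learning rate

def dJ(theta1):
    """Derivative of the hypothetical cost J(theta1) = (theta1 - 2)^2."""
    return 2 * (theta1 - 2)

def step(theta1, alpha=ALPHA):
    """One gradient descent update: theta1 := theta1 - alpha * dJ/dtheta1."""
    return theta1 - alpha * dJ(theta1)

print(step(6.0))   # 4.0: positive slope, so theta1 moves left toward the minimum
print(step(-2.0))  # 0.0: negative slope, so theta1 moves right toward the minimum
```

Near the minimum the derivative shrinks, so the steps become smaller even with a fixed α.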

Suppose you ran gradient descent three times, with α = 0.001, α = 0.1, and α = 3, and
obtained the three plots (labeled A, B, and C).
Which plot corresponds to which value of α? Explain why.

Suppose you have m=10 training examples with n=4 features (excluding
the additional all-ones feature for the intercept term, which you should
add). The normal equation is $\theta = (X^T X)^{-1} X^T y$.
For the given values of m and n, what are the dimensions of θ, X, and y in
this equation?
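A sketch with NumPy that builds random data of the stated size (m = 10, n = 4; the values themselves are hypothetical) and prints the shapes produced by the normal equation, with the intercept column of ones prepended to X.

```python
import numpy as np

m, n = 10, 4
rng = np.random.default_rng(0)
features = rng.normal(size=(m, n))          # hypothetical feature values
X = np.hstack([np.ones((m, 1)), features])  # add x0 = 1 -> shape (m, n + 1)
y = rng.normal(size=(m, 1))                 # hypothetical targets

theta = np.linalg.inv(X.T @ X) @ X.T @ y    # normal equation
print(X.shape, y.shape, theta.shape)        # (10, 5) (10, 1) (5, 1)
```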

Consider logistic regression with two features x1 and x2.
Suppose θ0 = 10, θ1 = -2, θ2 = 0. Calculate the equation for the decision
boundary and draw a sketch that illustrates the boundary.
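A minimal sketch of the logistic hypothesis with these parameters. The decision boundary is where θ0 + θ1x1 + θ2x2 = 0, i.e. 10 − 2x1 = 0; since θ2 = 0, x2 plays no role.

```python
import math

theta0, theta1, theta2 = 10.0, -2.0, 0.0

def h(x1, x2):
    """Logistic hypothesis sigmoid(theta0 + theta1 * x1 + theta2 * x2)."""
    z = theta0 + theta1 * x1 + theta2 * x2
    return 1 / (1 + math.exp(-z))

boundary_x1 = -theta0 / theta1  # solve 10 - 2 * x1 = 0
print(boundary_x1)                     # 5.0
print(h(4, 0) >= 0.5, h(6, 0) >= 0.5)  # True False
```

The boundary is the vertical line x1 = 5; points with x1 < 5 are predicted as y = 1, points with x1 > 5 as y = 0.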

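Consider the cost function of the Support Vector Machine:
$$\min_{\theta}\; C\sum_{i=1}^{m}\left[y^{(i)}\,\mathrm{cost}_1\left(\theta^T x^{(i)}\right) + \left(1 - y^{(i)}\right)\mathrm{cost}_0\left(\theta^T x^{(i)}\right)\right] + \frac{1}{2}\sum_{j=1}^{n}\theta_j^2$$
C can be chosen very large or very small. Explain what happens in each of these two
cases.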

Explain the One-vs-All approach for a classification problem with 5 classes.
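A sketch of the two ingredients of One-vs-All: relabelling the data into one binary problem per class, and predicting with the class whose classifier outputs the highest value. The "trained" classifiers below are hypothetical stand-ins on a 1-D input, not the result of actual training.

```python
import math

CLASSES = [1, 2, 3, 4, 5]

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def one_vs_all_labels(y, k):
    """Relabel for the k-th binary problem: class k becomes 1, every other class 0."""
    return [1 if label == k else 0 for label in y]

def predict(classifiers, x):
    """Evaluate all five binary classifiers and return the class with the largest h_k(x)."""
    return max(CLASSES, key=lambda k: classifiers[k](x))

y = [1, 3, 5, 2, 3]
print(one_vs_all_labels(y, 3))  # [0, 1, 0, 0, 1]

# Hypothetical classifiers: h_k(x) is largest when x is close to k.
classifiers = {k: (lambda x, k=k: sigmoid(-(x - k) ** 2)) for k in CLASSES}
print(predict(classifiers, 3.2))  # 3
```

With 5 classes this means training 5 separate binary classifiers, each on the full training set with its own relabelling.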

Suppose you have the following data set:
X1   X2    Y
89   267   37
72   216   22
94   282   31
You plan to use both feature scaling (dividing by the "max-min", or range,
of a feature) and mean normalization.
What is the normalized feature x2(3)?
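A minimal sketch of mean normalization combined with range scaling, x := (x − μ) / (max − min), applied to the X2 column; exact fractions are used so the result is not obscured by rounding.

```python
from fractions import Fraction

x2 = [267, 216, 282]
mean = Fraction(sum(x2), len(x2))    # 255
value_range = max(x2) - min(x2)      # 66
x2_3 = (x2[2] - mean) / value_range  # third training example
print(x2_3, float(x2_3))             # 9/22 ≈ 0.409
```

So x2(3) = (282 − 255) / 66 = 27/66 = 9/22 ≈ 0.41.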

Draw a sketch that illustrates how a hypothesis may underfit the training set
and explain your sketch. Why is underfitting a problem? What can you do to
avoid underfitting?

Consider the following neuron:
x1, x2 ∈ {0, 1}
Which logical function does the neuron represent (e.g. and/or/xor/etc.)?
Justify your answer by calculating the possible values of hθ(x).

Suppose you have a multi-class classification problem with 3 classes. Your
neural network has 5 layers, and each hidden layer has 10 units. Using the
One-vs-All method, how many elements does Θ(4) have?
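A sketch of the shape rule Θ(j) ∈ ℝ^(s_{j+1} × (s_j + 1)), where s_j is the number of units in layer j and the extra column is the bias unit. The input-layer size is not given in the question, so the 4 below is a placeholder; it does not affect Θ(4).

```python
def theta_shape(layer_sizes, j):
    """Shape of Theta^(j) (layers 1-indexed): s_{j+1} rows x (s_j + 1) columns; +1 for the bias unit."""
    return layer_sizes[j], layer_sizes[j - 1] + 1

# 5 layers: input (placeholder size 4), three hidden layers of 10 units,
# and an output layer with one unit per class (3 classes, one-vs-all).
layer_sizes = [4, 10, 10, 10, 3]
rows, cols = theta_shape(layer_sizes, 4)
print(rows, cols, rows * cols)  # 3 11 33
```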

Consider the following neural network:
a) How is a21 calculated? (4 Points)
b) Using backpropagation, how is δ23 calculated? (2 Points)

Let J(θ) = θ³. Furthermore, let θ = 4 and ε = 0.5. You use the formula
$$\frac{J(\theta + \epsilon) - J(\theta - \epsilon)}{2\epsilon}$$
to approximate the derivative. What value do you get using this approximation?
When and why is it useful to approximate the derivative?
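A minimal sketch of the two-sided difference quotient for J(θ) = θ³ at θ = 4 with ε = 0.5, compared with the analytic derivative 3θ²; the helper name `two_sided_difference` is illustrative.

```python
def J(theta):
    return theta ** 3

def two_sided_difference(f, theta, eps):
    """(f(theta + eps) - f(theta - eps)) / (2 * eps)"""
    return (f(theta + eps) - f(theta - eps)) / (2 * eps)

approx = two_sided_difference(J, 4.0, 0.5)
exact = 3 * 4.0 ** 2  # analytic derivative 3 * theta^2
print(approx, exact)  # 48.25 48.0
```

The approximation (91.125 − 42.875) / 1 = 48.25 is close to the exact 48; this kind of check is what gradient checking uses to verify a backpropagation implementation.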

You train a learning algorithm and find that it has unacceptably high error
on the test set. You plot the learning curve and obtain the figure below. What
kind of problem is the algorithm suffering from? Explain your answer. What
could you do to improve the performance of the algorithm?

Explain why it is useful to split a given dataset into the three sets “Training
set”, “Cross validation set”, and “Test set”.
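A sketch of such a split, assuming a 60/20/20 ratio and a fixed shuffle seed; both are illustrative choices, not part of the question. The training set fits the parameters, the cross-validation set selects among models or hyperparameters, and the untouched test set estimates generalization error.

```python
import random

def split_dataset(examples, train_frac=0.6, cv_frac=0.2, seed=0):
    """Shuffle, then split into training / cross-validation / test sets (ratio is an assumption)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(train_frac * len(shuffled))
    n_cv = int(cv_frac * len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_cv],
            shuffled[n_train + n_cv:])

train, cv, test = split_dataset(range(10))
print(len(train), len(cv), len(test))  # 6 2 2
```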

Suppose you run k-means and after the algorithm converges, you have (m = 10):
c(1)=1, c(2)=1, c(3)=1, c(4)=2. What do you know about the data and the clusters
given this result?

Suppose we have two cluster centroids μ1 = [1; 2] and μ2 = [-29; -24]. Furthermore, we
have a training example x(i) = [-50; -24]. After a cluster assignment step, what will
c(i) be?
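A minimal sketch of the k-means cluster assignment step: c(i) is the index of the centroid with the smallest squared Euclidean distance to x(i). The helper name `assign` is illustrative.

```python
def assign(x, centroids):
    """Return the 1-based index of the closest centroid and all squared distances."""
    dists = [sum((xi - mi) ** 2 for xi, mi in zip(x, mu)) for mu in centroids]
    return dists.index(min(dists)) + 1, dists

c_i, dists = assign([-50, -24], [[1, 2], [-29, -24]])
print(dists, c_i)  # [3277, 441] 2
```

The distances are (−51)² + (−26)² = 3277 to μ1 and (−21)² + 0² = 441 to μ2, so x(i) is assigned to cluster 2.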