General instructions. Read and follow all instructions carefully. The use of unfair means during this examination is unacceptable and will not be tolerated. By appearing for this examination, you certify that you have neither received nor given help during the examination. All inter-student communication is strictly prohibited during the examination. Unless specified otherwise, the use of mobile phones, smart devices (watches, tablets, communication devices, etc.), calculators, computers, and the internet is prohibited during the examination. A wrong/irrelevant answer is not necessarily entitled to marks.
Specific instruction/s. The use of a standalone “dumb” calculator is allowed during this
examination. This examination consists of MCQ and written parts.
1. MCQ:
(a) Mark your choices on this question paper itself.
(b) Marking multiple answers will disqualify the answer entirely.
(c) The option “NotOOPiC” stands for “None of the Other Options Provided is Correct”.
(d) Return this question paper along with your answerbook.
2. Written: Write answers in the answerbook provided.
Q1 (Maximum Marks 30) MCQ. Mark your choices on this question paper.
B
2. When a set of measurements X1 , . . . , Xn is independent of another set of measurements Y1 , . . . , Yn ,
their scatterplot shows
a) NotOOPiC
b) a random scatter around the line of slope = 1 and intercept = 0.
c) a random scatter over the rectangle defined by the extremes of X and Y .
C
3. For a hypothesis test, the smaller the p-value,
B
SCMS-SPPU • M.Tech. Programme in M&S • 2023-24/S2 (CA) • CA Exam 1
Curriculum 2022 • 22-ML2 • 4CR • 50 Marks • 15/2/2024 • 11:00–13:00 • Page 1/7
M.Tech. Programme in M&S Scientific Computing, Modeling & Simulation
2023-24/S2 (CA) • CA Exam 1 Savitribai Phule Pune University
22-ML2 Machine Learning 2
2022 Curriculum • 4CR • 50 Marks • 7 Page/s
15/2/2024 • 11:00–13:00
A
5. When a set of measurements X1 , . . . , Xn is (linearly) correlated with another set of measurements
Y1 , . . . , Yn , their scatterplot shows
B
Note: A student pointed out that the wording of this question is somewhat ambiguous. As compensation, all students have been given full credit for this question.
6. If [a, b] is a 95% confidence interval on β0 , then
a) P (a ≤ β0 ≤ b) = 0. b) P (a ≤ β0 ≤ b) = 1.
c) P (a ≤ β0 ≤ b) ≥ 0.05. d) P (a ≤ β0 ≤ b) ≥ 0.95.
D
7. (Choose the incorrect one) The ordinary least-squares fitting method assumes that noise in the data
C
8. In regression analysis, response Y at some X which does not occur in the data
A
9. (Choose the most correct one) In the context of simple linear regression, the ordinary least-squares
method provides
C
10. What is the approximate coverage of the confidence interval β̂0 ± 2 × SE(β̂0)?
C
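(Illustration, not part of the exam paper: the approximately 95% coverage of a ±2-standard-error interval can be checked with a small simulation. All parameter values below are made up for the demonstration.)

```python
# Empirical coverage of the interval beta0_hat +/- 2*SE(beta0_hat)
# in simple linear regression, using synthetic data.
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, sigma, n = 1.5, 2.0, 1.0, 50
x = np.linspace(0, 10, n)
trials = 2000
covered = 0
for _ in range(trials):
    y = beta0 + beta1 * x + rng.normal(0, sigma, n)
    xbar, ybar = x.mean(), y.mean()
    sxx = np.sum((x - xbar) ** 2)
    b1 = np.sum((x - xbar) * (y - ybar)) / sxx       # slope estimate
    b0 = ybar - b1 * xbar                            # intercept estimate
    resid = y - (b0 + b1 * x)
    s2 = np.sum(resid ** 2) / (n - 2)                # estimated noise variance
    se_b0 = np.sqrt(s2 * (1.0 / n + xbar ** 2 / sxx))
    if b0 - 2 * se_b0 <= beta0 <= b0 + 2 * se_b0:
        covered += 1
print(covered / trials)  # close to 0.95
```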
11. The null hypothesis “There is no relationship between X and Y ” translates to the data model
a) Y = β0 + ϵ b) Y = β0 + β1 X + ϵ c) Y = β1 X + ϵ d) Y = ϵ
A
12. Linear regression is often used to predict response which is
A
13. In simple linear regression analysis, a small p-value on β1 suggests a potentially
C
14. Log-transforming the data may
C
15. Suppose that the true relationship between X and Y is Y = f(X) + ϵ, where f(x) is the population regression function and ϵ is mean-zero noise. The fit f̂ depends on, apart from X,
B
16. In regression, non-constant noise variance may show up as a non-constant
a) “envelope” of the data scatter when residuals are plotted against their respective fitted values.
b) “envelope” of the data scatter when residuals are plotted against their respective data index.
c) trend when residuals are plotted against their respective fitted values.
d) trend when residuals are plotted against their respective data index.
A
17. A 95% confidence interval for β1 turns out to be [−0.21, +0.41]. Which of the following statements
is correct?
a) The true slope is zero with probability greater than or equal to 95%.
b) The true slope is zero with probability less than 5%.
c) The true slope is +0.1 with probability greater than or equal to 95%.
d) The true slope is +0.1 with probability less than 5%.
A
18. Which of the following does not represent residual sum of squares?
A
19. What is the approximate coverage of the confidence interval β̂1 ± 1 × SE(β̂1)?
B
20. (Choose the incorrect one) Linear regression can be used to
B
21. In linear regression, any nonlinearity in the data may show up as a non-constant
a) “envelope” of the data scatter when residuals are plotted against their respective fitted values.
b) “envelope” of the data scatter when residuals are plotted against their respective data index.
c) trend when residuals are plotted against their respective data index.
d) trend when residuals are plotted against their respective fitted values.
D
22. Linear regression is an approach which is
D
23. The null hypothesis “There is no relationship between X and Y ” translates to
a) β0 = 0 b) β̂1 = 0 c) β̂0 = 0 d) β1 = 0
D
24. (Choose the most correct one) In the context of simple linear regression, the maximum likelihood
method provides
a) NotOOPiC
b) parameter estimates, estimated standard errors, estimated correlation between parameters
c) parameter estimates, estimated standard errors
d) parameter estimates
B
25. (Choose the incorrect one) The maximum likelihood and ordinary least-squares methods are equivalent when noise in the data
a) has zero mean. b) has constant variance. c) is normal IID. d) is white noise.
D
26. The ith residual may be defined as
C
27. Least-squares fitting can be thought of as a procedure that minimizes the sum of squared distances between the measured responses and the predicted responses. Which distance?
B
28. In multiple linear regression with p variables, data of size n and known noise variance, the number
of parameters is
a) n b) p c) p + 1 d) p + 2
C
29. The question “How well does the model fit the data?” can be answered using
D
30. In linear regression, the least squares estimator is also the maximum likelihood estimator under the
assumption of normality of
Q2 (Maximum Marks 20) Written. Sufficiently detailed, to-the-point, relevant answers, please.
1. Suppose that we have paired data of the form (x1, z1), ..., (xn, zn). Suppose that the data are modeled as zi = α0 exp(β1 xi) δi, where δi represents noise in the ith observation. Can you transform this model to the simple linear regression model? If so, how, and under which assumptions? What would be the estimators for the parameters α0 and β1?
Take logarithm of both sides of the data model:
log(zi ) = log(α0 ) + β1 xi + log(δi ).
If we define yi = log(zi ), β0 = log(α0 ) and ϵi = log(δi ), we get the usual simple linear regression
model
yi = β0 + β1 xi + ϵi .
For this transformation to work, the assumptions that need to be made are zi > 0, α0 > 0, δi > 0.
β0 and β1 can be estimated using the method of least-squares:
β̂1 = Σⁿᵢ₌₁ (xi − x̄n)(yi − ȳn) / Σⁿᵢ₌₁ (xi − x̄n)²,
β̂0 = ȳn − β̂1 x̄n,
where x̄n = (1/n) Σⁿᵢ₌₁ xi and ȳn = (1/n) Σⁿᵢ₌₁ yi. In this approach, we estimate
α̂0 = exp(β̂0).
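(Illustration, not required in the answer: the transformation and estimators above can be sketched numerically on synthetic data. All values below are made up.)

```python
# Fitting z = alpha0 * exp(beta1 * x) * delta by log-transforming to
# simple linear regression, as derived above (synthetic data).
import numpy as np

rng = np.random.default_rng(1)
alpha0, beta1 = 2.0, 0.3
x = np.linspace(0.1, 5, 40)
z = alpha0 * np.exp(beta1 * x) * rng.lognormal(0, 0.1, x.size)  # delta > 0

y = np.log(z)                       # y_i = log(z_i)
xbar, ybar = x.mean(), y.mean()
b1_hat = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b0_hat = ybar - b1_hat * xbar       # estimates beta0 = log(alpha0)
alpha0_hat = np.exp(b0_hat)         # back-transform to alpha0
print(alpha0_hat, b1_hat)           # close to 2.0 and 0.3
```

Note that log(δi) has mean zero exactly when δi is lognormal with log-mean zero, which is the assumption made in the synthetic data above.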
2. Suppose we have multiple predictors X1 , . . . , Xp and a single response Y . Derive the estimators for
the unknown parameters in the multiple linear regression model.
Refer to your class notes; we did this in the class on at least two occasions. Below is the “hack” way
to “derive” the estimators of β:
Xβ = Y
XᵀXβ = XᵀY
(XᵀX)⁻¹ XᵀXβ = (XᵀX)⁻¹ XᵀY
β = (XᵀX)⁻¹ XᵀY
You need to add all the justifications and explanations. A mathematically rigorous way of arriving at the same result is to start with the residual sum of squares
RSS = ||Y − Xβ||² = (Y − Xβ)ᵀ(Y − Xβ) = YᵀY − βᵀXᵀY − YᵀXβ + βᵀXᵀXβ,
and show that its gradient (i.e., the vector of partial derivatives with respect to the components of β) is
∇RSS = 2(XᵀXβ − XᵀY).
Setting ∇RSS to zero, we get
XᵀXβ = XᵀY
(XᵀX)⁻¹ XᵀXβ = (XᵀX)⁻¹ XᵀY
β = (XᵀX)⁻¹ XᵀY.
Both approaches assume that XT X is an invertible matrix. (Try to think about the conditions under
which it is not invertible!)
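(Illustration, not required in the answer: the closed-form solution β = (XᵀX)⁻¹XᵀY can be checked against a library least-squares solver on synthetic data. All values below are made up.)

```python
# Closed-form OLS via the normal equations, compared with numpy's
# least-squares solver (synthetic data).
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # intercept column
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
Y = X @ beta_true + rng.normal(0, 0.1, n)

# Normal equations: solve (X^T X) beta = X^T Y.
# Assumes X^T X is invertible; numerically, lstsq is preferred in practice.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))  # True
```

XᵀX fails to be invertible exactly when the columns of X are linearly dependent, e.g. when one predictor is a linear combination of the others or when n < p + 1.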
3. Suppose we have multiple predictors X1 , . . . , Xp and a single response Y . Using proper notation,
explain the mathematical form of the multiple linear regression model.
Refer to your class notes; we did this in the class on at least two occasions. You need to explain that the data are of the form
yi = β0 + β1 xi,1 + . . . + βp xi,p + ϵi ;
and explain how all this can be converted to a compact matrix-vector form
y = Xβ + ϵ.
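(Illustration, not required in the answer: the compact form is obtained by stacking a column of ones and the predictor columns into the design matrix X. The data below are made up, and ϵ is set to zero purely for checking the arithmetic.)

```python
# Assembling the design matrix for y = X beta + eps from raw predictor
# columns x1, ..., xp; the leading column of ones carries the intercept beta0.
import numpy as np

x1 = np.array([0.5, 1.0, 1.5, 2.0])
x2 = np.array([1.0, 0.0, 1.0, 0.0])
X = np.column_stack([np.ones(x1.size), x1, x2])  # shape (n, p+1)
beta = np.array([2.0, 3.0, -1.0])                # (beta0, beta1, beta2)
eps = np.zeros(x1.size)                          # noiseless for illustration
y = X @ beta + eps
print(y)  # [2.5, 5.0, 5.5, 8.0]
```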