
M.Tech. Programme in M&S • Scientific Computing, Modeling & Simulation
Savitribai Phule Pune University
2023-24/S2 (CA) • CA Exam 1
22-ML2 Machine Learning 2
2022 Curriculum • 4CR • 50 Marks • 7 Page/s
15/2/2024 • 11:00–13:00

THIS RANDOMIZED QUESTION PAPER IS FOR ROLL NUMBER MS2301

General instructions. Read and follow all instructions carefully. The use of unfair means during
this examination is unacceptable and will not be tolerated. By appearing for this examination, you
certify that you have neither received nor given help during the examination. All inter-student
communication is strictly prohibited during the examination. Unless specified otherwise, the use
of mobile phones, smart devices (watches, tablets, communication devices, etc.), calculators,
computers, and the internet is prohibited during the examination. A wrong or irrelevant answer
is not necessarily entitled to marks.

Specific instruction/s. The use of a standalone “dumb” calculator is allowed during this
examination. This examination consists of MCQ and written parts.
1. MCQ:
(a) Mark your choices on this question paper itself.
(b) Marking multiple answers will disqualify the answer entirely.
(c) The option “NotOOPiC” stands for “None of the Other Options Provided is Correct”.
(d) Return this question paper along with your answerbook.
2. Written: Write answers in the answerbook provided.

Q1 (Maximum Marks 30) MCQ. Mark your choices on this question paper.

1. The primary purpose of regression diagnostics is to

a) check any failures of the fitting algorithm.
b) check any failures of assumptions of the analysis.
c) check any problems with the data.
d) NotOOPiC

B
2. When a set of measurements X1, ..., Xn is independent of another set of measurements Y1, ..., Yn,
their scatterplot shows

a) NotOOPiC
b) a random scatter around the line of slope = 1 and intercept = 0.
c) a random scatter over the rectangle defined by the extremes of X and Y .

C
3. For a hypothesis test, the smaller the p-value,

a) the smaller the evidence against the null hypothesis.
b) the larger the evidence against the null hypothesis.
c) the smaller the evidence against the alternate hypothesis.
d) the larger the evidence against the alternate hypothesis.

B

4. Linear dependence of response on the predictors is

a) usually an assumption of convenience and is almost never true in real life.
b) almost always true.
c) NotOOPiC

A
5. When a set of measurements X1, ..., Xn is (linearly) correlated with another set of measurements
Y1, ..., Yn, their scatterplot shows

a) a random scatter over the rectangle defined by the extremes of X and Y.
b) a random scatter around the line of slope = 1 and intercept = 0.
c) NotOOPiC

B
Note: A student pointed out that the wording of this question is somewhat ambiguous. As
compensation, all students have been given full credit for this question.
6. If [a, b] is a 95% confidence interval on β0 , then

a) P(a ≤ β0 ≤ b) = 0. b) P(a ≤ β0 ≤ b) = 1.
c) P(a ≤ β0 ≤ b) ≥ 0.05. d) P(a ≤ β0 ≤ b) ≥ 0.95.

D
7. (Choose the incorrect one) The ordinary least-squares fitting method assumes that noise in the data

a) has zero mean. b) has constant variance. c) is normal IID.

C
8. In regression analysis, the response Y at some X which does not occur in the data

a) can be predicted using fitted parameters of the model.
b) can be predicted using parameters of the model.
c) cannot be predicted.
d) can be predicted using random numbers.

A
9. (Choose the most correct one) In the context of simple linear regression, the ordinary least-squares
method provides

a) parameter estimates, estimated standard errors, estimated correlation between parameters
b) parameter estimates, estimated standard errors
c) parameter estimates
d) NotOOPiC

C

10. What is the approximate coverage of the confidence interval $\hat{\beta}_0 \pm 2 \times \widehat{\mathrm{SE}}(\hat{\beta}_0)$?

a) 50%. b) 67%. c) 95%. d) 99%.

C
11. The null hypothesis “There is no relationship between X and Y ” translates to the data model

a) Y = β0 + ϵ b) Y = β0 + β1 X + ϵ c) Y = β1 X + ϵ d) Y = ϵ

A
12. Linear regression is often used to predict response which is

a) quantitative. b) qualitative. c) mixed. d) vector.

A
13. In simple linear regression analysis, a small p-value on β1 suggests a potentially

a) weak predictor-response association.
b) strong predictor-response association.
c) nonzero intercept in the predictor-response association.
d) zero intercept in the predictor-response association.

B
14. Log-transforming the data may

a) decrease the noise variance. b) increase the noise variance.
c) make the noise variance nearly constant. d) make the noise variance non-constant.

C
15. Suppose that the true relationship between X and Y is $Y = f(X) + \epsilon$ where $f(x)$ is the population
regression function and $\epsilon$ is mean-zero noise. The fit $\hat{f}$ depends on, apart from X,

a) Y only. b) Y and ϵ both. c) ϵ only. d) neither Y nor ϵ.

B
16. In regression, non-constant noise variance may show up as a non-constant

a) “envelope” of the data scatter when residuals are plotted against their respective fitted values.
b) “envelope” of the data scatter when residuals are plotted against their respective data index.
c) trend when residuals are plotted against their respective fitted values.
d) trend when residuals are plotted against their respective data index.

A
17. A 95% confidence interval for β1 turns out to be [−0.21, +0.41]. Which of the following statements
is correct?


a) The true slope is zero with probability greater than or equal to 95%.
b) The true slope is zero with probability less than 5%.
c) The true slope is +0.1 with probability greater than or equal to 95%.
d) The true slope is +0.1 with probability less than 5%.

A
18. Which of the following does not represent residual sum of squares?

a) $(y_1 - \beta_0 - \beta_1 x_1)^2 + (y_2 - \beta_0 - \beta_1 x_2)^2 + \ldots + (y_n - \beta_0 - \beta_1 x_n)^2$
b) $(y_1 - \hat{\beta}_0 - \hat{\beta}_1 x_1)^2 + (y_2 - \hat{\beta}_0 - \hat{\beta}_1 x_2)^2 + \ldots + (y_n - \hat{\beta}_0 - \hat{\beta}_1 x_n)^2$
c) $(y_1 - \hat{\beta}_1 x_1)^2 + (y_2 - \hat{\beta}_1 x_2)^2 + \ldots + (y_n - \hat{\beta}_1 x_n)^2$
d) $(y_1 - \hat{\beta}_0)^2 + (y_2 - \hat{\beta}_0)^2 + \ldots + (y_n - \hat{\beta}_0)^2$

A
19. What is the approximate coverage of the confidence interval $\hat{\beta}_1 \pm 1 \times \widehat{\mathrm{SE}}(\hat{\beta}_1)$?

a) 50%. b) 67%. c) 95%. d) 99%.

B
20. (Choose the incorrect one) Linear regression can be used to

a) guess the relationship between response and predictor variables.
b) cluster the data.
c) make predictions about response given the predictor variables.
d) decide which of the predictor variables are associated with the response.

B
21. In linear regression, any nonlinearity in the data may show up as a non-constant

a) “envelope” of the data scatter when residuals are plotted against their respective fitted values.
b) “envelope” of the data scatter when residuals are plotted against their respective data index.
c) trend when residuals are plotted against their respective data index.
d) trend when residuals are plotted against their respective fitted values.

D
22. Linear regression is an approach which is

a) NotOOPiC b) unsupervised. c) mixed. d) supervised.

D
23. The null hypothesis “There is no relationship between X and Y ” translates to

a) $\beta_0 = 0$ b) $\hat{\beta}_1 = 0$ c) $\hat{\beta}_0 = 0$ d) $\beta_1 = 0$


D
24. (Choose the most correct one) In the context of simple linear regression, the maximum likelihood
method provides

a) NotOOPiC
b) parameter estimates, estimated standard errors, estimated correlation between parameters
c) parameter estimates, estimated standard errors
d) parameter estimates

B
25. (Choose the incorrect one) The maximum likelihood and ordinary least-squares methods are equiva-
lent when noise in the data

a) has zero mean. b) has constant variance. c) is normal IID. d) is white noise.

D
26. The ith residual may be defined as

a) $(y_i - \bar{y})$ b) $(y_i - \hat{y}_i)^2$ c) $(y_i - \hat{y}_i)$ d) $(y_i - \bar{y})^2$

C
27. Least-squares fitting can be thought of as a procedure that minimizes the sum of the squared
distances between the measured and predicted responses. Which distance?

a) Perpendicular to the plane of the fitted model. b) Vertical.
c) Horizontal. d) NotOOPiC

B
28. In multiple linear regression with p variables, data of size n and known noise variance, the number
of parameters is

a) n b) p c) p + 1 d) p + 2

C
29. The question “How well does the model fit the data?” can be answered using

a) standard errors of the fitted parameters b) appropriate confidence intervals
c) appropriate hypothesis tests d) residuals and quantities derived from them

D
30. In linear regression, the least squares estimator is also the maximum likelihood estimator under the
assumption of normality of

a) noise. b) covariates. c) parameters. d) response.


Q2 (Maximum Marks 20) Written. Sufficiently detailed, to-the-point, relevant answers, please.
1. Suppose that we have paired data of the form $(x_1, z_1), \ldots, (x_n, z_n)$. Suppose that the data are modeled
as $z_i = \alpha_0 e^{\beta_1 x_i} \delta_i$ where $\delta_i$ represents noise in the ith observation. Can you transform this model
to the simple linear regression model? If so, how and under which assumptions? What would be the
estimators for the parameters $\alpha_0$ and $\beta_1$?
Take the logarithm of both sides of the data model:
$$\log(z_i) = \log(\alpha_0) + \beta_1 x_i + \log(\delta_i).$$
If we define $y_i = \log(z_i)$, $\beta_0 = \log(\alpha_0)$, and $\epsilon_i = \log(\delta_i)$, we get the usual simple linear regression
model
$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i.$$
For this transformation to work, the assumptions that need to be made are $z_i > 0$, $\alpha_0 > 0$, and $\delta_i > 0$
(so that the logarithms exist).
$\beta_0$ and $\beta_1$ can be estimated using the method of least squares:
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x}_n)(y_i - \bar{y}_n)}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2}, \qquad \hat{\beta}_0 = \bar{y}_n - \hat{\beta}_1 \bar{x}_n,$$
where $\bar{x}_n = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\bar{y}_n = \frac{1}{n}\sum_{i=1}^{n} y_i$. In this approach, we estimate
$$\hat{\alpha}_0 = \exp(\hat{\beta}_0).$$
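A minimal numerical sketch of this procedure, using Python with numpy on hypothetical simulated data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data from the model z_i = alpha0 * exp(beta1 * x_i) * delta_i
alpha0_true, beta1_true = 2.0, 0.5
x = np.linspace(0.0, 4.0, 50)
delta = np.exp(rng.normal(0.0, 0.1, size=x.size))  # positive multiplicative noise
z = alpha0_true * np.exp(beta1_true * x) * delta

# Transform: y_i = log(z_i) follows y_i = beta0 + beta1 * x_i + eps_i
y = np.log(z)
xbar, ybar = x.mean(), y.mean()
beta1_hat = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
beta0_hat = ybar - beta1_hat * xbar
alpha0_hat = np.exp(beta0_hat)  # back-transform the intercept estimate

print(alpha0_hat, beta1_hat)  # should be close to 2.0 and 0.5
```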

2. Suppose we have multiple predictors X1, ..., Xp and a single response Y. Derive the estimators for
the unknown parameters in the multiple linear regression model.
Refer to your class notes; we did this in class on at least two occasions. Below is the “hack” way
to “derive” the estimators of $\beta$:
$$X\beta = Y$$
$$X^T X\beta = X^T Y$$
$$(X^T X)^{-1} (X^T X)\,\beta = (X^T X)^{-1} X^T Y$$
$$\beta = (X^T X)^{-1} X^T Y$$
You need to add all the justifications and explanations. A mathematically rigorous way of arriving
at the same result is to start with the
$$\mathrm{RSS} = \lVert Y - X\beta \rVert^2 = (Y - X\beta)^T (Y - X\beta) = Y^T Y - \beta^T X^T Y - Y^T X\beta + \beta^T X^T X\beta,$$
and show that its gradient (i.e., the vector of partial derivatives with respect to the components of $\beta$) is
$$\nabla \mathrm{RSS} = 2\,(X^T X\beta - X^T Y).$$
Setting $\nabla \mathrm{RSS}$ to zero, we get
$$X^T X\beta = X^T Y,$$
$$(X^T X)^{-1} (X^T X)\,\beta = (X^T X)^{-1} X^T Y,$$
$$\beta = (X^T X)^{-1} X^T Y.$$



Both approaches assume that $X^T X$ is an invertible matrix. (Try to think about the conditions under
which it is not invertible!)
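As a brief numerical sketch (hypothetical simulated data; numpy assumed), the closed-form estimator
can be checked by solving the normal equations directly, which avoids forming the inverse of $X^T X$
explicitly:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical design: n observations, p predictors, plus an intercept column
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
Y = X @ beta_true + rng.normal(0.0, 0.3, size=n)

# Solve the normal equations X^T X beta = X^T Y; this requires X^T X to be
# invertible, i.e., X must have full column rank.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

print(beta_hat)  # should be close to beta_true
```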
3. Suppose we have multiple predictors X1, ..., Xp and a single response Y. Using proper notation,
explain the mathematical form of the multiple linear regression model.
Refer to your class notes; we did this in class on at least two occasions. You need to explain that
the data are of the form
$$(x_{1,1}, x_{1,2}, \ldots, x_{1,p}, y_1),$$
$$(x_{2,1}, x_{2,2}, \ldots, x_{2,p}, y_2),$$
$$\vdots$$
$$(x_{n,1}, x_{n,2}, \ldots, x_{n,p}, y_n);$$

then explain the form of the model

$$y_i = \beta_0 + \beta_1 x_{i,1} + \ldots + \beta_p x_{i,p} + \epsilon_i;$$

and explain how all this can be converted to a compact matrix-vector form

$$y = X\beta + \epsilon.$$

Provide the correct explanations and justifications.
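A minimal sketch of assembling this compact form in Python (hypothetical values; numpy assumed):
the design matrix X gets a leading column of ones so that $\beta_0$ acts as the intercept.

```python
import numpy as np

# Hypothetical raw data: n = 4 observations of p = 2 predictors and a response
x1 = np.array([1.0, 2.0, 3.0, 4.0])  # values of predictor X1
x2 = np.array([0.5, 0.1, 0.9, 0.3])  # values of predictor X2
y = np.array([2.1, 3.9, 6.2, 7.8])   # responses y_1, ..., y_n

# Compact form y = X beta + eps: stack a column of ones with the predictors
X = np.column_stack([np.ones(x1.size), x1, x2])  # shape (n, p + 1)

print(X.shape)  # (4, 3); beta is the (p + 1)-vector (beta_0, beta_1, beta_2)
```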


4. Suppose that we have paired data of the form $(x_1, y_1), \ldots, (x_n, y_n)$. Consider the regression model
$y_i = \beta x_i^2 + \epsilon_i$ where $\epsilon_i$ represents noise in the ith observation. Derive an estimator for parameter $\beta$.
This is a simple linear regression model without an intercept, with Y related to X through $X^2$. The
residual sum of squares is
$$\mathrm{RSS} = \sum_{i=1}^{n} \left(y_i - \beta x_i^2\right)^2.$$
Minimize RSS with respect to $\beta$; i.e., set
$$\frac{d}{d\beta}\,\mathrm{RSS} = -2\sum_{i=1}^{n} \left(y_i - \beta x_i^2\right) x_i^2 = 0,$$
to obtain the least-squares estimator
$$\hat{\beta} = \frac{\sum_{i=1}^{n} y_i x_i^2}{\sum_{i=1}^{n} x_i^4}.$$
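A minimal numerical check of this estimator on hypothetical simulated data (numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data from the model y_i = beta * x_i^2 + eps_i
beta_true = 1.5
x = np.linspace(-2.0, 2.0, 40)
y = beta_true * x**2 + rng.normal(0.0, 0.2, size=x.size)

# Closed-form least-squares estimator derived above
beta_hat = np.sum(y * x**2) / np.sum(x**4)

print(beta_hat)  # should be close to 1.5
```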
