IE 506: Quiz 1 Sample Questions

1. Consider a data set $D = \{(x^i, y^i)\}$, $x^i \in \mathbb{R}^2$, $y^i \in \mathbb{R}$, $i \in \{1, 2, \ldots, n\}$. Suppose the 2 features of $x^i$ are denoted by $x^i_1, x^i_2$. The adjusted $R^2$ value is computed using the formula $1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}$, where $R^2$ is the usual $R^2$ value obtained using the model parameters found by performing (unregularized) ordinary least squares on the data set, $n$ denotes the number of samples in the data set and $k$ denotes the number of predictor attributes in the data set (note that $k = 2$ here).
Discuss the following scenarios:
(a) You perform least squares regression on the data set $D_1 = \{(x^i_1, y^i)\}_{i=1}^{n}$ first, and obtain the $R^2$ value $R_1$ and the adjusted $R^2$ value $Q_1$. Then you perform least squares regression on the full data set $D = \{(x^i, y^i)\}_{i=1}^{n}$ and obtain the $R^2$ value $R^\star$ and the adjusted $R^2$ value $Q^\star$. Can $R^\star$ become larger than $R_1$? Can $Q^\star$ become larger than $Q_1$? Justify with appropriate examples and reasons.
(b) You perform least squares regression on the data set $D_2 = \{(x^i_2, y^i)\}_{i=1}^{n}$ first, and obtain the $R^2$ value $R_2$ and the adjusted $R^2$ value $Q_2$. Then you perform least squares regression on the full data set $D = \{(x^i, y^i)\}_{i=1}^{n}$ and obtain the $R^2$ value $R^\star$ and the adjusted $R^2$ value $Q^\star$. Can $R^\star$ become larger than $R_2$? Can $Q^\star$ become larger than $Q_2$? Justify with appropriate examples and reasons.
(c) You perform least squares regression on the data set $D_1 = \{(x^i_1, y^i)\}_{i=1}^{n}$ first, and obtain the $R^2$ value $R_1$ and the adjusted $R^2$ value $Q_1$. Then you perform least squares regression on the data set $D_2 = \{(x^i_2, y^i)\}_{i=1}^{n}$ and obtain the $R^2$ value $R_2$ and the adjusted $R^2$ value $Q_2$. Can $R_2$ be larger than $R_1$? Can $Q_2$ be larger than $Q_1$? Justify with appropriate examples and reasons.
2. During the discussion in class, there was a claim that the observed total variance in the response variable, given by $\sum_{i=1}^{n} (y^i - \bar{y})^2$ (where $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y^i$ denotes the mean of the response variable observations), will be larger than or equal to the unexplained variance in the response variable given the predictor variable, denoted by $\sum_{i=1}^{n} (y^i - \hat{y}^i)^2$ (where $\hat{y}^i = \beta^\top x^i$ denotes the model-based prediction). Discuss why such a claim is valid. If you find reasons to the contrary, discuss them.
3. Discuss at least four different ways of finding outliers in a data set used
for linear regression.
4. Discuss a few practical applications where data standardization might not
be helpful.
5. In the discussion of linear regression on a data set with a single predictor variable, recall that we standardized both the response and predictor variables. Discuss reasons why standardization of the response variable was also done. Suppose we do not standardize the response variable; discuss the implications for the resulting model in such scenarios.
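
For Question 1, the adjusted $R^2$ formula can be evaluated directly. The following is a minimal Python sketch. The function name `adjusted_r2` is hypothetical, and the $R^2$ values in the usage lines are made up purely to show how the factor $(n-1)/(n-k-1)$ penalizes an extra predictor. They are not results from any data set.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1) / (n - k - 1).

    r2 -- ordinary R^2 of the least squares fit
    n  -- number of samples
    k  -- number of predictor attributes (the intercept is not counted)
    """
    if n - k - 1 <= 0:
        # The denominator must be positive for the formula to make sense.
        raise ValueError("adjusted R^2 requires n > k + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical values: one predictor (k = 1) versus two predictors (k = 2), n = 20.
print(adjusted_r2(0.50, n=20, k=1))  # about 0.4722
print(adjusted_r2(0.52, n=20, k=2))  # about 0.4635
```

With these made-up numbers, the plain $R^2$ rises from 0.50 to 0.52, while the adjusted value falls from about 0.472 to about 0.464. This is the kind of trade-off that parts (a) and (b) ask you to reason about.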

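For Question 2, the two sums can be computed side by side. The following is a minimal pure-Python sketch that assumes a single predictor. The name `tss_rss` and the three toy data points are hypothetical. The `fit_intercept` flag switches between $\hat{y}^i = \beta_0 + \beta_1 x^i$ and a fit through the origin, $\hat{y}^i = \beta_1 x^i$. The second form matches $\beta^\top x^i$ when $x^i$ carries no constant feature.

```python
def tss_rss(x, y, fit_intercept=True):
    """Return (TSS, RSS) for a single-predictor least squares fit.

    TSS = sum_i (y_i - ybar)^2  (total variation around the mean)
    RSS = sum_i (y_i - yhat_i)^2 (unexplained variation)
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    if fit_intercept:
        # Closed-form OLS with an intercept: slope = Sxy / Sxx.
        sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
        sxx = sum((xi - xbar) ** 2 for xi in x)
        b1 = sxy / sxx
        b0 = ybar - b1 * xbar
    else:
        # Regression through the origin: yhat = b1 * x, no intercept term.
        b1 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
        b0 = 0.0
    yhat = [b0 + b1 * xi for xi in x]
    tss = sum((yi - ybar) ** 2 for yi in y)
    rss = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
    return tss, rss


x = [1.0, -1.0, 0.0]
y = [10.0, 11.0, 12.0]
print(tss_rss(x, y, fit_intercept=True))
print(tss_rss(x, y, fit_intercept=False))
```

On this toy data, the call with an intercept returns (2.0, 1.5), and the call without one returns (2.0, 364.5). Comparing the two may be a useful starting point for discussing when the claim holds.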
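
For Question 5, you can see the effect of standardizing the response by fitting the same single-predictor model twice: once with both variables standardized, and once with only the predictor standardized. The following minimal sketch uses Python's standard `statistics` module. The helpers `zscore` and `ols_line` and the five data points are hypothetical. Using the sample standard deviation (`statistics.stdev`) for standardization is an assumed convention.

```python
import statistics


def zscore(values):
    """Standardize to mean 0 and sample standard deviation 1."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]


def ols_line(x, y):
    """Closed-form least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    xbar, ybar = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Both variables standardized: the intercept is (numerically) 0 and the
# slope equals the Pearson correlation between x and y.
print(ols_line(zscore(x), zscore(y)))

# Only the predictor standardized: the intercept is mean(y) and the slope
# is expressed in units of y per standard deviation of x.
print(ols_line(zscore(x), y))
```

For these points:

- The raw slope is 0.6.
- The fully standardized slope is the correlation $6/\sqrt{60} \approx 0.775$.
- Standardizing only the predictor gives slope $0.6 \cdot s_x \approx 0.949$ with intercept 4, the mean of $y$.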