IE 506: Quiz Sample Questions

1. Consider a data set $D = \{(x^i, y^i)\}$, $x^i \in \mathbb{R}^2$, $y^i \in \mathbb{R}$, $i \in \{1, 2, \ldots, n\}$.
Suppose the 2 features of $x^i$ are denoted by $x^i_1, x^i_2$. The adjusted $R^2$ value
is computed using the formula
$$1 - \frac{(1 - R^2)(n - 1)}{n - k - 1},$$
where $R^2$ is the usual $R^2$ value obtained using the model parameters found by
performing (unregularized) ordinary least squares on the data set, $n$ denotes the
number of samples in the data set and $k$ denotes the number of predictor attributes
in the data set (note that $k = 2$ here).
Discuss the following scenarios:
(a) You perform least squares regression on the data set $D_1 = \{(x^i_1, y^i)\}_{i=1}^n$
first and obtain $R^2$ value as $R_1$ and adjusted $R^2$ value as $Q_1$. Then
you perform least squares regression on the full data set $D = \{(x^i, y^i)\}_{i=1}^n$
and obtain $R^2$ value as $R^\star$ and adjusted $R^2$ value as $Q^\star$. Can $R^\star$
become larger than $R_1$? Can $Q^\star$ become larger than $Q_1$? Justify
with appropriate examples and reasons.
(b) You perform least squares regression on the data set $D_2 = \{(x^i_2, y^i)\}_{i=1}^n$
first and obtain $R^2$ value as $R_2$ and adjusted $R^2$ value as $Q_2$. Then
you perform least squares regression on the full data set $D = \{(x^i, y^i)\}_{i=1}^n$
and obtain $R^2$ value as $R^\star$ and adjusted $R^2$ value as $Q^\star$. Can $R^\star$
become larger than $R_2$? Can $Q^\star$ become larger than $Q_2$? Justify
with appropriate examples and reasons.
(c) You perform least squares regression on the data set $D_1 = \{(x^i_1, y^i)\}_{i=1}^n$
first and obtain $R^2$ value as $R_1$ and adjusted $R^2$ value as $Q_1$. Then
you perform least squares regression on the data set $D_2 = \{(x^i_2, y^i)\}_{i=1}^n$
and obtain $R^2$ value as $R_2$ and adjusted $R^2$ value as $Q_2$. Can $R_2$ be
larger than $R_1$? Can $Q_2$ be larger than $Q_1$? Justify with appropriate
examples and reasons.
2. During the discussion in class, there was a claim that the observed total
variance in the response variable given by $\sum_{i=1}^n (y^i - \bar{y})^2$ (where
$\bar{y} = \frac{1}{n} \sum_{i=1}^n y^i$ denotes the mean of the response variable
observations) will be larger than or equal to the unexplained variance in the
response variable given the predictor variable, denoted by
$\sum_{i=1}^n (y^i - \hat{y}^i)^2$ (where $\hat{y}^i = \beta^\top x^i$ denotes the
model based prediction). Discuss why such a claim is valid.
If you find reasons for the contrary, discuss them.
3. Discuss at least four different ways of finding outliers in a data set used
for linear regression.
4. Discuss a few practical applications where data standardization might not
be helpful.
5. In the discussion of linear regression on a dataset with a single predictor
variable, recall that we standardized both the response and predictor
variables. Discuss reasons why standardization of the response variable was
also done. Suppose we do not standardize the response variable; discuss the
implications for the resultant model in such scenarios.
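
The quantities in questions 1 and 2 can be explored numerically. The following is a minimal Python sketch, not a model answer: it assumes NumPy is available, uses synthetic data, and the helper names `fit_ols` and `r2_and_adjusted` are hypothetical. It computes $R^2$, adjusted $R^2$, the total variation and the unexplained variation for a one-feature and a two-feature fit with an intercept, and for a one-feature fit without an intercept.

```python
import numpy as np


def fit_ols(X, y, intercept=True):
    """Fit unregularized least squares and return the fitted values."""
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    if intercept:
        # Prepend a column of ones so the model has a bias term.
        X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta


def r2_and_adjusted(y, y_hat, k):
    """Return (R^2, adjusted R^2, total variation, unexplained variation)."""
    n = len(y)
    tss = np.sum((y - y.mean()) ** 2)   # sum_i (y^i - ybar)^2
    rss = np.sum((y - y_hat) ** 2)      # sum_i (y^i - yhat^i)^2
    r2 = 1.0 - rss / tss
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return r2, adj, tss, rss


# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
n = 30
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)  # generated independently of y
y = 2.0 * x1 + 5.0 + rng.normal(scale=0.5, size=n)

# One feature (k = 1) versus both features (k = 2), each with an intercept.
r1, q1, tss, rss1 = r2_and_adjusted(y, fit_ols(x1, y), k=1)
r_star, q_star, _, rss_star = r2_and_adjusted(
    y, fit_ols(np.column_stack([x1, x2]), y), k=2
)

# One feature, no intercept: prediction is beta * x1 only.
_, _, _, rss_no_int = r2_and_adjusted(y, fit_ols(x1, y, intercept=False), k=1)

print(f"R1 = {r1:.4f}, Q1 = {q1:.4f}")
print(f"R* = {r_star:.4f}, Q* = {q_star:.4f}")
print(f"total = {tss:.2f}, unexplained (with intercept) = {rss1:.2f}")
print(f"unexplained (no intercept) = {rss_no_int:.2f}")
```

When the model includes an intercept column, adding a feature cannot lower $R^2$, whereas adjusted $R^2$ may move in either direction. The fit without an intercept is included because question 2 writes the prediction as $\beta^\top x^i$, with no explicit intercept term.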
