ISOM2500 Regression Practice Solutions
Last update:
where $e_i = Y_i - \hat{Y}_i$.
If (2,7) is added to the model, then the new standard error of residuals is given by
$$s'_e = \sqrt{\frac{1}{(n+1)-2}\left(\sum_{i=1}^{n} e_i^2 + e_{n+1}^2\right)};$$
Consider $P(b_1 > 2) = P\left(Z > \frac{2-0}{s_{b_1}}\right) = 0.225$. Hence, $\frac{2}{s_{b_1}} = 0.7554$, or $s_{b_1} = 2.65 > 2$.
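The quantile step above can be checked numerically. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative and not part of the original solution.

```python
from statistics import NormalDist

# Upper-tail probability from the solution: P(Z > 2 / s_b1) = 0.225
upper_tail = 0.225

# z such that P(Z > z) = 0.225, i.e. the 0.775 quantile of N(0, 1)
z = NormalDist().inv_cdf(1 - upper_tail)

# Solve 2 / s_b1 = z for the standard error of the slope
s_b1 = 2 / z

print(round(z, 4), round(s_b1, 2))
```

The printed values should agree with the 0.7554 and 2.65 quoted in the solution, up to rounding.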
$$\begin{cases} H_0: \mu = \mu_0 \\ H_a: \mu > \mu_0 \end{cases}$$
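As $\alpha$ decreases, the critical value $z_\alpha$ of this one-sided test increases. For example, take $\alpha = 0.05$ ($z_{0.05} = 1.645$) and let the new level be $\alpha' = 0.025$ ($z_{0.025} = 1.96$).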
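$$P(\text{Type II error at } \alpha' = 0.025) = P(\text{Retain } H_0 \mid H_a \text{ is true}) = P\left(\bar{x} \le \mu_0 + 1.96\frac{\sigma}{\sqrt{n}} \,\middle|\, \mu = \mu_1\right)$$
$$= P\left(Z \le \frac{\mu_0 + 1.96\frac{\sigma}{\sqrt{n}} - \mu_1}{\sigma/\sqrt{n}}\right) > P\left(Z \le \frac{\mu_0 + 1.645\frac{\sigma}{\sqrt{n}} - \mu_1}{\sigma/\sqrt{n}}\right) = P(\text{Type II error at } \alpha = 0.05).$$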
In words, since the sample size is fixed, reducing the probability of committing a Type I error comes at the cost of a higher probability of committing a Type II error.
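The trade-off can be illustrated numerically. The sketch below evaluates the Type II error probability of the upper-tailed z-test at both significance levels. The values of `mu0`, `mu1`, `sigma`, and `n` are hypothetical and chosen only for illustration, and `type_ii_error` is a helper name introduced here.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(alpha, mu0, mu1, sigma, n):
    """P(retain H0 | mu = mu1) for the upper-tailed z-test of H0: mu = mu0."""
    z = NormalDist()
    se = sigma / sqrt(n)
    # H0 is rejected when the sample mean exceeds this cutoff
    cutoff = mu0 + z.inv_cdf(1 - alpha) * se
    # Under mu = mu1, the sample mean is N(mu1, se^2)
    return z.cdf((cutoff - mu1) / se)


# Hypothetical values, for illustration only
mu0, mu1, sigma, n = 100.0, 103.0, 10.0, 25

beta_05 = type_ii_error(0.05, mu0, mu1, sigma, n)
beta_025 = type_ii_error(0.025, mu0, mu1, sigma, n)
print(beta_05, beta_025)
```

For any `mu1 > mu0`, the value at the smaller significance level is the larger of the two, because the rejection cutoff moves further to the right.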
32. (c).
(a) is wrong because linearity is checked with the residual plot, not with the normal quantile plot of the residuals.
(b) is wrong because we cannot get the value of R-squared from the plot.
(d) is wrong because it is very hard to know whether the data in the sample are dependent.
33. (c). Consider the regression model $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$. Rearranging the terms gives
$$x_i = -\frac{\beta_0}{\beta_1} + \frac{1}{\beta_1} y_i - \frac{\varepsilon_i}{\beta_1}.$$
Thus, the slope and intercept are different.
However, the correlation between two variables does not depend on which variable is treated as the response (remember that cov(𝑋, 𝑌) = cov(𝑌, 𝑋)), and thus it remains unchanged.
34. (c).
35. (a). R-squared takes values in [0, 1] only.
36. (b). 𝑒𝑖 = 𝑦𝑖 − 𝑦̂𝑖 .
37. (c).
38. (a). Let the regression line be $\hat{y}_i = \beta_0 + \beta_1 x_i$; putting $x_i = 0$ gives $\hat{y}_i = \beta_0$. Also note that $\hat{y}_i$ is an estimator of $\mu_{Y|X=x} = E(Y \mid X = x)$, the average $Y$.
39. (d). $R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2} = 1 - \frac{SSE}{SST} = \frac{SSR}{SST}$.
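The identity $1 - SSE/SST = SSR/SST$ for a least-squares line with an intercept can be verified on a small example. This is a sketch with made-up data; `fit_line` is a helper name introduced here.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for a simple linear regression."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
          / sum((a - xbar) ** 2 for a in x))
    b0 = ybar - b1 * xbar
    return b0, b1


# Made-up data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.8, 4.1, 5.9, 8.3, 9.7, 12.4]

b0, b1 = fit_line(x, y)
y_hat = [b0 + b1 * a for a in x]
ybar = sum(y) / len(y)

sse = sum((b - f) ** 2 for b, f in zip(y, y_hat))  # unexplained variation
sst = sum((b - ybar) ** 2 for b in y)              # total variation
ssr = sum((f - ybar) ** 2 for f in y_hat)          # explained variation

r2_from_sse = 1 - sse / sst
r2_from_ssr = ssr / sst
print(r2_from_sse, r2_from_ssr)
```

The two values agree up to floating-point error because SST = SSR + SSE for a least-squares fit that includes an intercept.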
40. (c). The variance of the random errors is assumed to be constant.
41. (c). 𝑟 = 1 implies 𝑅 2 = 1, i.e., the model explains all of the variation in 𝑌.
42. (b).
43. (c). There is curvature in the residual plot, so the relationship does not appear to be linear.
44. (b).
(c) is wrong because the responses of the models in Q43 and Q44 are different, so we cannot compare the R-squared values directly.
(d) is wrong because the model is not a log-log model, so the slope is not an elasticity.
45. (a). The predicted value is given by 𝑦̂𝑖 = 0.5681 + 0.1021 × 20 = 2.6101.
46. (b). Since $b_1 = \frac{s_{xy}}{s_x^2}$ and $r = \frac{s_{xy}}{s_x s_y}$, both take the sign of $s_{xy}$.
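The shared sign of the slope and the correlation coefficient can be seen by computing both from the same sample covariance. This is a sketch with made-up data; `slope_and_r` is a helper name introduced here.

```python
from math import sqrt


def slope_and_r(x, y):
    """Return (b1, r) computed from the sample covariance and standard deviations."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    s_xy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (n - 1)
    s_x = sqrt(sum((a - xbar) ** 2 for a in x) / (n - 1))
    s_y = sqrt(sum((b - ybar) ** 2 for b in y) / (n - 1))
    # s_x and s_y are positive, so both ratios inherit the sign of s_xy
    return s_xy / s_x ** 2, s_xy / (s_x * s_y)


# Made-up data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
increasing = [1.2, 1.9, 3.2, 3.8, 5.1]
decreasing = [5.1, 3.8, 3.2, 1.9, 1.2]

b1_up, r_up = slope_and_r(x, increasing)
b1_down, r_down = slope_and_r(x, decreasing)
print(b1_up, r_up)
print(b1_down, r_down)
```

Both quantities are positive for the increasing series and negative for the decreasing one.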
47. (c).
48. (a). Please refer to the answer of question 46 about the sign of slope and correlation coefficient.
49. (d).
50. (b). For intermediate 𝑥’s, the residual 𝑦𝑖 − 𝑦̂𝑖 < 0 and thus 𝑦𝑖 < 𝑦̂𝑖 , i.e., the model overestimated the values.