ISOM2500 Regression Practice Solutions


ISOM2500 Practice question for final examination: Regression SOLUTION

Last update:

Lecturer: Prof. Du, Lilun

1. (b). Since the expected change in Y per unit change in X is given by

Δ𝑦̂ = [𝑏0 + 𝑏1 (𝑥 + 1)] − (𝑏0 + 𝑏1 𝑥) = 𝑏1 , which estimates 𝛽1 .
2. (c). Since we expect around 95% of the data to fall within 2 RMSEs of the fitted line.
3. (a). 𝑆𝐷(𝑦𝑖 ) = √𝑉𝑎𝑟(𝜀𝑖 ) = 9.
4. (a). 𝑦̂|𝑥=5 − 𝑦̂|𝑥=8 = (5 + 0.1(5)) − (5 + 0.1(8)) = 0.1(5 − 8) = −0.3.
5. (d). 𝑃(𝑦 > 7|𝑥 = 10) = 𝑃(5 + 0.1(10) + 𝜀𝑖 > 7) = 𝑃(𝜀𝑖 > 1) = 𝑃(𝑍 > (1 − 0)/9) ≈ 0.45.
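As a quick numerical check (not part of the original solution), the probability can be reproduced in Python with scipy, assuming as above that the error term has mean 0 and standard deviation 9:

    # Reproduce P(y > 7 | x = 10) for the model y = 5 + 0.1x + eps, eps ~ N(0, 9^2)
    from scipy.stats import norm

    p = norm.sf((7 - (5 + 0.1 * 10)) / 9)  # P(Z > 1/9)
    print(round(p, 4))                     # about 0.4558, i.e. roughly 0.45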
6. (d). Note that we expect no systematic pattern in the residual plot.
7. (c). Put 𝑋 = 2 into the equation of the regression line: 𝑌̂ = 3 + 2(2) = 7, i.e., the point (2,7) is on the
regression line. Adding a point that lies on the fitted line leaves the least squares estimate of the slope unchanged.
8. (b). The standard error of residuals is given by
𝑠𝑒 = √( (1/(𝑛 − 2)) ∑ 𝑒𝑖² ),

where the sum runs over 𝑖 = 1, …, 𝑛 and 𝑒𝑖 = 𝑌𝑖 − 𝑌̂𝑖 .

If (2,7) is added to the data, then the new standard error of residuals is given by

𝑠′𝑒 = √( (1/((𝑛 + 1) − 2)) (∑ 𝑒𝑖² + (𝑒𝑛+1 )²) );

but 𝑒𝑛+1 = 0 as (2,7) is on the regression line, hence

𝑠′𝑒 = √( (1/(𝑛 − 1)) ∑ 𝑒𝑖² ) < √( (1/(𝑛 − 2)) ∑ 𝑒𝑖² ) = 𝑠𝑒 ,

i.e., the standard error of residuals decreases.
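A small numerical illustration (not from the original solution; the data below are simulated purely for this purpose) confirms the argument: adding a point that lies exactly on the fitted line leaves the fit and the residuals unchanged, while the degrees of freedom increase, so 𝑠𝑒 decreases.

    # Simulated example: s_e decreases after adding a point on the fitted line
    import numpy as np

    rng = np.random.default_rng(0)
    x = np.arange(10, dtype=float)
    y = 3 + 2 * x + rng.normal(0, 1, size=10)

    b1, b0 = np.polyfit(x, y, 1)                       # least squares fit
    e = y - (b0 + b1 * x)
    se_old = np.sqrt(np.sum(e**2) / (len(x) - 2))

    # add a point that sits exactly on the fitted line
    x_new = np.append(x, 2.0)
    y_new = np.append(y, b0 + b1 * 2.0)
    b1n, b0n = np.polyfit(x_new, y_new, 1)             # slope and intercept unchanged
    e_new = y_new - (b0n + b1n * x_new)
    se_new = np.sqrt(np.sum(e_new**2) / (len(x_new) - 2))

    print(se_old, se_new)                              # se_new < se_old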

9. (d). Under the null hypothesis, 𝛽1 = 0, and the two-sided p-value satisfies

2 × 𝑃(𝑏1 > 2) = 0.45.

Consider 𝑃(𝑏1 > 2) = 𝑃(𝑍 > (2 − 0)/𝑠𝑏1 ) = 0.225. Hence (2 − 0)/𝑠𝑏1 = 0.7554, i.e., 𝑠𝑏1 = 2.65 > 2.

The slope is 2, which is less than 1 s.e. (2.65) from zero.
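A quick Python check (not part of the original solution) of the normal quantile behind the calculation above:

    # Back out s_b1 from the two-sided p-value of 0.45
    from scipy.stats import norm

    z = norm.isf(0.45 / 2)   # P(Z > z) = 0.225  ->  z is about 0.7554
    s_b1 = 2 / z             # since (2 - 0) / s_b1 = z
    print(round(z, 4), round(s_b1, 2))   # 0.7554, 2.65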

10. (a). The required quantity is Δ𝑦 = 𝑏0 + 𝑏1 (𝑥 + 1) − 𝑏0 − 𝑏1 𝑥 = 𝑏1 = −1.2024.


11. (c). The dispersion of the residuals shows an increasing trend, i.e., the error variance is increasing rather than
constant. Thus (c) is violated.
12. (a), since the slope of the log-log model is the elasticity of 𝑦 with respect to 𝑥; as 𝑥 increases by 1%, 𝑦 changes
by −1.75% on average.
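A one-line check (not from the original solution) of the elasticity interpretation, using the slope −1.75:

    # In a log-log model, a 1% increase in x multiplies y by 1.01**(-1.75)
    print((1.01 ** -1.75 - 1) * 100)   # about -1.73%, i.e. roughly -1.75%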
13. (d). Since extrapolation is dangerous.
14. (c). Since the error terms are assumed to be normally distributed with mean 0 and a standard deviation that
can be estimated by the RMSE ($65 in this case), approximately 95% of the errors fall within 2 standard
deviations, 2 × $65 = $130, of 0.
15. (a). Since the p-value is smaller than 0.05, 𝛽1 is statistically significantly different from 0.
16. (c). Since for every $1 million increase in 𝑥, 𝑦 will increase by 𝛽1 on average, and the confidence interval for
𝛽1 is [15,30].
17. (a). The slope of the equation is the elasticity.
18. (a). % increase ≈ 0.5% ∗ 2.2 = 1.1%
19. (a).
20. (d).
21. (d). The C.I. required is equivalent to the 95% C.I. for 𝛽1 , which is given by [0.9795 ± 𝑡50,0.025 (0.0733)] ≈
[0.9795 ± 2(0.0733)] = [0.8329, 1.1261].
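The interval can be checked in Python with the exact 𝑡 quantile (not part of the original solution; the slope 0.9795, its standard error 0.0733 and the 50 degrees of freedom are taken from the solution above):

    # CI for the slope using the exact t quantile instead of the approximation 2
    from scipy.stats import t

    tcrit = t.ppf(0.975, df=50)                 # about 2.009
    lo, hi = 0.9795 - tcrit * 0.0733, 0.9795 + tcrit * 0.0733
    print(round(tcrit, 3), round(lo, 4), round(hi, 4))   # roughly [0.8323, 1.1267]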
22. (a). The expected number of absent days is 𝑦̂ = −4.28 + 0.254(30) = 3.34.
23. (c). The p-value of the test is very close to 0 and we reject the null hypothesis.
24. The required CI is given by [0.25379 ± 𝑡39,0.025 (0.02850)] = [0.25379 ± 2.023(0.02850)] =
[0.1961,0.3114].
25. (e). 𝑠. 𝑒. (𝑦̂) = 𝑠𝑒 √(1 + 1/𝑛 + (𝑥𝑛𝑒𝑤 − 𝑥̅ )²/((𝑛 − 1)𝑠𝑥²)) = 1.10807√(1 + 1/40 + (30 − 37.87)²/((40 − 1)(10.39²))) = 1.129857.
The prediction is 𝑦̂ = −4.28 + 0.254(30) = 3.34 and the 95% prediction interval is [3.34 ± 2.023(1.129857)] = [1.054, 5.625].
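A short Python check of the prediction interval (not from the original solution; 𝑠𝑒 = 1.10807, 𝑛 = 40, 𝑥̅ = 37.87, 𝑠𝑥 = 10.39 and the 𝑡 value 2.023 are taken from the solution above):

    # Recompute s.e.(y-hat) and the 95% prediction interval at x_new = 30
    import numpy as np

    se, n, xbar, sx, xnew = 1.10807, 40, 37.87, 10.39, 30.0
    se_pred = se * np.sqrt(1 + 1 / n + (xnew - xbar) ** 2 / ((n - 1) * sx ** 2))
    yhat = -4.28 + 0.254 * xnew
    tcrit = 2.023                                # t value used in the solution
    print(round(se_pred, 6))                     # about 1.129857
    print(round(yhat - tcrit * se_pred, 3), round(yhat + tcrit * se_pred, 3))
    # roughly [1.054, 5.626]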
26. (a).
27. (b). Since the slope of the regression line is negative, the correlation coefficient should also be negative.
28. (a).
29. (b). The probability of committing a type I error is defined as 𝑃(reject 𝐻0 |𝐻0 is true). Hence, the answer is
(b).
30. (d). Using the equation, we have

100/𝑚𝑝𝑔̂ = 0.95 + 1.25 × 5 = 7.2,

and hence 𝑚𝑝𝑔̂ = 100/7.2 = 13.89.
31. (a).
Mathematically: consider the one-sided test

𝐻0 : 𝜇 = 𝜇0 vs. 𝐻𝑎 : 𝜇 > 𝜇0 ,

and assume that 𝜇 = 𝜇1 > 𝜇0 under the alternative hypothesis.

As 𝛼 decreases, 𝑧𝛼 increases. For example, take 𝛼 = 0.05 and let the new level be 𝛼 ′ = 0.025.

P(Type II error at 𝛼 ′ = 0.025) = P(retain 𝐻0 |𝐻1 is true) = 𝑃(𝑥̅ ≤ 𝜇0 + 1.96 𝜎/√𝑛 | 𝜇 = 𝜇1 )

= 𝑃(𝑍 ≤ (𝜇0 + 1.96 𝜎/√𝑛 − 𝜇1 )/(𝜎/√𝑛)) > 𝑃(𝑍 ≤ (𝜇0 + 1.645 𝜎/√𝑛 − 𝜇1 )/(𝜎/√𝑛)) = 𝑃(Type II error at 𝛼 = 0.05).

Verbally: since the sample size is fixed, reducing the probability of committing a type I error can only come at the
cost of a larger probability of committing a type II error.
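A numerical illustration (not part of the original solution; the values 𝜇0 = 0, 𝜇1 = 1, 𝜎 = 2 and 𝑛 = 16 are arbitrary choices for illustration) shows the trade-off: halving 𝛼 raises the probability of a type II error.

    # Type II error of a one-sided z-test at two significance levels
    import numpy as np
    from scipy.stats import norm

    mu0, mu1, sigma, n = 0.0, 1.0, 2.0, 16

    def type2(alpha):
        # retain H0 when xbar <= mu0 + z_alpha * sigma / sqrt(n)
        cutoff = mu0 + norm.isf(alpha) * sigma / np.sqrt(n)
        return norm.cdf((cutoff - mu1) / (sigma / np.sqrt(n)))

    print(type2(0.05), type2(0.025))   # about 0.36 vs 0.48: smaller alpha, larger type II error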

32. (c).

(a) is wrong because linearity is checked with the residual plot, not the normal quantile plot.
(b) is wrong because we cannot read the value of R-squared from the plot.
(d) is wrong because the plot does not tell us whether the observations in the sample are dependent.
33. (c). Consider a regression line 𝑦𝑖 = 𝛽0 + 𝛽1 𝑥𝑖 + 𝜀𝑖 ; rearranging the terms, we have

𝑥𝑖 = −𝛽0 /𝛽1 + (1/𝛽1 )𝑦𝑖 − 𝜀𝑖 /𝛽1 ,

so the slope and intercept are different.

However, the correlation between two variables does not depend on which one is treated as the response
(remember that cov(𝑋, 𝑌) = cov(𝑌, 𝑋)), and thus it will remain unchanged.
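A quick simulated check (not from the original solution; the data are generated only for illustration): swapping the roles of 𝑥 and 𝑦 changes the fitted slope and intercept but not the correlation.

    # Regressions y on x and x on y give different coefficients; correlation is symmetric
    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(size=100)
    y = 1 + 2 * x + rng.normal(size=100)

    print(np.polyfit(x, y, 1))   # slope, intercept of y on x
    print(np.polyfit(y, x, 1))   # slope, intercept of x on y (different)
    print(np.corrcoef(x, y)[0, 1], np.corrcoef(y, x)[0, 1])   # identical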

34. (c).
35. (a). R-squared is defined on [0,1] only.
36. (b). 𝑒𝑖 = 𝑦𝑖 − 𝑦̂𝑖 .
37. (c).
38. (a). Let the regression line be 𝑦̂𝑖 = 𝛽0 + 𝛽1 𝑥𝑖 ; putting 𝑥𝑖 = 0 gives 𝑦̂𝑖 = 𝛽0 . Also note that 𝑦̂𝑖 is an estimator of
𝜇𝑌|𝑋=𝑥 = 𝐸(𝑌|𝑋 = 𝑥), the average 𝑌.
39. (d). 𝑅² = 1 − ∑(𝑦𝑖 − 𝑦̂𝑖 )²/∑(𝑦𝑖 − 𝑦̅)² = 1 − 𝑆𝑆𝐸/𝑆𝑆𝑇 = 𝑆𝑆𝑅/𝑆𝑆𝑇, where the sums run over 𝑖 = 1, …, 𝑛.
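A simulated check of the identity (not from the original solution; the data are generated only for illustration):

    # 1 - SSE/SST equals SSR/SST for a least squares fit with an intercept
    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.normal(size=50)
    y = 0.5 + 1.5 * x + rng.normal(size=50)

    b1, b0 = np.polyfit(x, y, 1)
    yhat = b0 + b1 * x
    sse = np.sum((y - yhat) ** 2)
    ssr = np.sum((yhat - y.mean()) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    print(1 - sse / sst, ssr / sst)   # the two values agree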
40. (c). The variance of the random errors is assumed to be constant.
41. (c). 𝑟 = 1 implies 𝑅² = 1, i.e., the model explains all of the variation in 𝑌.
42. (b).
43. (c). There is curvature in the residual plot, so the relationship does not appear to be linear.
44. (b).
(c) is wrong because the responses of the models in Q43 and Q44 are different, so we cannot compare the R-
squared values directly.
(d) is wrong because the model is not a log-log model, so the slope is not an elasticity.
45. (a). The predicted value is given by 𝑦̂𝑖 = 0.5681 + 0.1021 × 20 = 2.6101.
46. (b). Since 𝑏1 = 𝑠𝑥𝑦 /𝑠𝑥² and 𝑟 = 𝑠𝑥𝑦 /(𝑠𝑥 𝑠𝑦 ), they both take the sign of 𝑠𝑥𝑦 .
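A simulated check (not from the original solution; the negatively related data are generated only for illustration) that 𝑏1 and 𝑟 share the sign of 𝑠𝑥𝑦 :

    # b1 = s_xy / s_x^2 and r = s_xy / (s_x * s_y) have the same sign as s_xy
    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.normal(size=200)
    y = -0.8 * x + rng.normal(size=200)          # negatively related data

    sxy = np.cov(x, y, ddof=1)[0, 1]
    b1 = sxy / np.var(x, ddof=1)
    r = sxy / (np.std(x, ddof=1) * np.std(y, ddof=1))
    print(np.sign(sxy), np.sign(b1), np.sign(r))  # all three signs agree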
47. (c).
48. (a). Please refer to the answer to question 46 regarding the sign of the slope and the correlation coefficient.
49. (d).
50. (b). For intermediate 𝑥’s, the residual 𝑦𝑖 − 𝑦̂𝑖 < 0 and thus 𝑦𝑖 < 𝑦̂𝑖 , i.e., the model overestimated the values.
