Chapter 13
Multiple Regression
Chapter Goals
After completing this chapter, you should be able to:
The population multiple regression model:

Y = β0 + β1X1 + β2X2 + … + βKXK + ε

where β0 is the population intercept, β1, …, βK are the population slope coefficients, and ε is the random error.

The estimated multiple regression model:

ŷi = b0 + b1x1i + b2x2i + … + bKxKi

where b0 is the estimated intercept and b1, …, bK are the estimated slope coefficients.
In this chapter we will always use a computer to obtain the
regression slope coefficients and other regression
summary measures.
Graph of the two-variable model: the estimated values lie on a plane

ŷ = b0 + b1x1 + b2x2

where b1 is the slope for variable x1 and b2 is the slope for variable x2.
Standard multiple regression assumptions:
The error terms εi (i = 1, …, n) have mean zero and constant variance.
The errors are uncorrelated: Cov(εi, εj) = 0 for all i ≠ j.
It is not possible to find nonzero numbers c0, c1, …, cK such that c0 + c1x1i + c2x2i + … + cKxKi = 0 (this is the property of no linear relation for the Xj's).
Example: 2 Independent Variables

Dependent variable: Pie sales (units per week)
Independent variables: Price (in $), Advertising ($100s)
Data are collected for 15 weeks:

Week   Pie Sales   Price ($)   Advertising ($100s)
  1       350        5.50             3.3
  2       460        7.50             3.3
  3       350        8.00             3.0
  4       430        8.00             4.5
  5       350        6.80             3.0
  6       380        7.50             4.0
  7       430        4.50             3.0
  8       470        6.40             3.7
  9       450        7.00             3.5
 10       490        5.00             4.0
 11       340        7.20             3.5
 12       300        7.90             3.2
 13       440        5.90             4.0
 14       450        5.00             3.5
 15       300        7.00             2.7
Multiple regression model: Sales = b0 + b1(Price) + b2(Advertising)

The model is estimated with Excel or PHStat; the output follows.
Regression Statistics
Multiple R             0.72213
R Square               0.52148
Adjusted R Square      0.44172
Standard Error         47.46341
Observations           15

ANOVA
             df    SS          MS          F         Significance F
Regression    2    29460.027   14730.013   6.53861   0.01201
Residual     12    27033.306    2252.776
Total        14    56493.333

              Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept      306.52619       114.25389       2.68285   0.01993    57.58835   555.46404
Price          -24.97509        10.83213      -2.30565   0.03979   -48.57626    -1.37392
Advertising     74.13096        25.96732       2.85478   0.01449    17.55303   130.70888

Estimated regression equation: Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
b1 = -24.975: sales will decrease, on average, by 24.975 pies per week for each $1 increase in selling price, net of the effects of changes due to advertising.
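Since the chapter leaves the estimation to software, the sketch below shows one way the coefficients in the output above can be obtained: it solves the least-squares normal equations (X'X)b = X'y by Gaussian elimination, using the 15 weeks of pie-sales data. The `ols` helper is written here purely for illustration, not taken from the textbook's software.

```python
# Minimal sketch (illustrative, not the textbook's software): solve the
# least-squares normal equations (X'X)b = X'y for the pie-sales data.
sales = [350, 460, 350, 430, 350, 380, 430, 470, 450, 490, 340, 300, 440, 450, 300]
price = [5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40, 7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00]
adv   = [3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7]

def ols(y, xcols):
    """Least-squares coefficients [b0, b1, ..., bk] for y = b0 + b1*x1 + ..."""
    n, k = len(y), len(xcols) + 1
    X = [[1.0] + [col[i] for col in xcols] for i in range(n)]
    # Build the normal equations A b = c, where A = X'X and c = X'y
    A = [[sum(X[i][r] * X[i][s] for i in range(n)) for s in range(k)] for r in range(k)]
    c = [sum(X[i][r] * y[i] for i in range(n)) for r in range(k)]
    # Forward elimination with partial pivoting
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv], c[col], c[piv] = A[piv], A[col], c[piv], c[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            A[r] = [a - f * p for a, p in zip(A[r], A[col])]
            c[r] -= f * c[col]
    # Back substitution on the upper-triangular system
    b = [0.0] * k
    for r in range(k - 1, -1, -1):
        b[r] = (c[r] - sum(A[r][s] * b[s] for s in range(r + 1, k))) / A[r][r]
    return b

b = ols(sales, [price, adv])
print([round(v, 5) for v in b])   # matches the Excel coefficients above
```

In practice one would use a library routine rather than hand-rolled elimination; the point is only that the output's coefficients are a deterministic function of the data.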
Coefficient of Determination, R²

R² = SSR / SST = regression sum of squares / total sum of squares

R² is the proportion of the total variation in y explained by the regression.
Coefficient of Determination, R² (continued)

From the regression output:

R² = SSR / SST = 29460.0 / 56493.3 = .52148

52.1% of the variation in pie sales is explained by the variation in price and advertising.
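The ratio can be checked directly from the ANOVA sums of squares in the output:

```python
# R-squared recomputed from the ANOVA table values
SSR = 29460.027   # regression sum of squares
SST = 56493.333   # total sum of squares
r_squared = SSR / SST
print(round(r_squared, 5))   # 0.52148
```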
Population model: Yi = β0 + β1x1i + β2x2i + … + βKxKi + εi

The error variance is estimated by

s²e = Σ e²i / (n - K - 1) = SSE / (n - K - 1)

(summing over i = 1, …, n), where ei = yi - ŷi.
Standard Error, se

From the regression output:

se = √(SSE / (n - K - 1)) = √(27033.306 / 12) = 47.463
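A two-line check of that arithmetic, using the SSE and degrees of freedom from the ANOVA table:

```python
import math

# Regression standard error from SSE and its degrees of freedom
SSE, n, K = 27033.306, 15, 2
s_e = math.sqrt(SSE / (n - K - 1))
print(round(s_e, 5))   # 47.46341
```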
Adjusted Coefficient of Determination

Adjusted R² = 1 - (1 - R²)(n - 1) / (n - K - 1)

(shows the proportion of variation in y explained by the x variables, adjusted for the number of x variables used)

Adjusted Coefficient of Determination (continued)
From the regression output:

Adjusted R² = .44172 = 1 - (1 - .52148)(14 / 12)
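The adjustment formula can be verified from the R² in the output (the last digit differs only because R² itself is rounded):

```python
# Adjusted R-squared from R-squared and the degrees of freedom
r2, n, K = 0.52148, 15, 2
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - K - 1)
print(adj_r2)   # ≈ 0.4417, matching the output up to rounding of r2
```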
Coefficient of Multiple Correlation

R = r(ŷ, y) = √R²

This is the correlation between the predicted and observed values of y.
Evaluating Individual Regression Coefficients

Use t tests for the individual slope coefficients.

Hypotheses: H0: βj = 0 (no linear relationship between xj and y); H1: βj ≠ 0

Test statistic: t = (bj - 0) / Sbj    (df = n - K - 1)
Evaluating Individual Regression Coefficients (continued)

H0: βj = 0    H1: βj ≠ 0

From the Excel output, for Price and Advertising (d.f. = 15 - 2 - 1 = 12):

              Coefficients   Standard Error   t Stat     P-value
Price          -24.97509        10.83213      -2.30565   0.03979
Advertising     74.13096        25.96732       2.85478   0.01449
α = .05, so the critical values are ±t12,.025 = ±2.1788.

Decision: reject H0 if t < -2.1788 or t > 2.1788. Both t statistics fall in the rejection region.

Conclusion: there is evidence that both Price and Advertising affect pie sales at α = .05.
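A short check of both t statistics against the critical value quoted above:

```python
# t statistics for the individual slopes, compared with t(12, .025) = 2.1788
t_crit = 2.1788
for name, b_j, sb_j in [("Price", -24.97509, 10.83213),
                        ("Advertising", 74.13096, 25.96732)]:
    t = b_j / sb_j
    print(name, round(t, 5), abs(t) > t_crit)   # both exceed the critical value
```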
Confidence interval for an individual slope βj:

bj ± t(n-K-1, α/2) Sbj    where t has (n - K - 1) d.f.

Here, t has (15 - 2 - 1) = 12 d.f.

Example: the 95% confidence interval for the Price slope is
-24.97509 ± (2.1788)(10.83213), i.e. (-48.576, -1.374)

              Coefficients   Standard Error   Lower 95%   Upper 95%
Intercept      306.52619       114.25389       57.58835   555.46404
Price          -24.97509        10.83213      -48.57626    -1.37392
Advertising     74.13096        25.96732       17.55303   130.70888
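The Lower/Upper 95% columns for Price can be reproduced from the coefficient, its standard error, and the critical t value:

```python
# 95% confidence interval for the Price slope
b1, sb1, t_crit = -24.97509, 10.83213, 2.1788
lower = b1 - t_crit * sb1
upper = b1 + t_crit * sb1
print(round(lower, 3), round(upper, 3))   # -48.576 -1.374
```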
F Test for Overall Significance

Hypotheses:
H0: β1 = β2 = … = βK = 0 (no linear relationship)
H1: at least one βi ≠ 0 (at least one independent variable affects Y)

Test statistic:

F = MSR / s²e = (SSR / K) / (SSE / (n - K - 1))

where F has K (numerator) and (n - K - 1) (denominator) degrees of freedom.
F Test for Overall Significance (continued)

From the regression output:

F = MSR / MSE = 14730.0 / 2252.8 = 6.5386, with 2 and 12 degrees of freedom

The p-value for the F test is Significance F = 0.01201.
H0: β1 = β2 = 0
H1: β1 and β2 not both zero
α = .05, df1 = 2, df2 = 12

Critical value: F.05 = 3.885

Test statistic: F = MSR / MSE = 6.5386

Decision: since F = 6.5386 > 3.885, reject H0 at α = .05.

Conclusion: there is evidence that at least one independent variable affects Y.
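The overall F statistic follows directly from the two mean squares in the ANOVA table:

```python
# Overall F statistic from the ANOVA mean squares, vs. F(2, 12, .05) = 3.885
MSR, MSE = 14730.013, 2252.776
F = MSR / MSE
print(round(F, 4), F > 3.885)   # F exceeds the critical value, so reject H0
```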
Tests on a Subset of Regression Coefficients

Goal: test whether a subset of r of the explanatory variables can be dropped from the model.

First run a regression for the complete model and obtain SSE. Then run a restricted regression that excludes the r variables under test and obtain SSE(r).

Reject H0 if

F = [ (SSE(r) - SSE) / r ] / s²e > F(r, n-K-r-1, α)
Prediction

Use the estimated model ŷ = b0 + b1x1 + b2x2 + … + bKxK to predict y for given values of the x variables.

It is risky to forecast for new X values outside the range of the data used to estimate the model coefficients, because we do not have data to support that the linear model extends beyond the observed range.

Example: predict sales for a week in which the selling price is $5.50 and advertising is $350 (x2 = 3.5, in $100s):

Sales = 306.526 - 24.975(5.50) + 74.131(3.5) = 428.62

Predicted sales is 428.62 pies.
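The chapter's predicted value of 428.62 pies corresponds to plugging a $5.50 price and 3.5 (i.e. $350) of advertising into the fitted equation; the arithmetic is:

```python
# Point prediction from the fitted pie-sales equation
b0, b1, b2 = 306.52619, -24.97509, 74.13096
pred = b0 + b1 * 5.50 + b2 * 3.5   # price $5.50, advertising $350 ($100s units)
print(round(pred, 2))   # 428.62
```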
Predictions in PHStat

Check the "confidence and prediction interval estimates" box.

(continued) Enter the input values; the output reports the predicted y value, ŷ = b0 + b1x1 + b2x2, along with the interval estimates.
Residuals in multiple regression: the residual for observation i is

ei = (yi - ŷi)

the vertical distance between the observed yi and the fitted value ŷi on the regression plane ŷ = b0 + b1x1 + b2x2.
The quadratic regression model:

Yi = β0 + β1X1i + β2X²1i + εi

where:
β0 = Y intercept
β1 = regression coefficient for linear effect of X on Y
β2 = regression coefficient for quadratic effect on Y
εi = random error in Y for observation i
[Graphs: a linear fit to curved data does not give random residuals; a quadratic fit does.]

The signs of β1 and β2 determine the shape of the quadratic relationship:
β1 < 0, β2 > 0: initially decreasing, curving upward
β1 > 0, β2 > 0: increasing, curving upward
β1 < 0, β2 < 0: decreasing, curving downward
β1 > 0, β2 < 0: initially increasing, curving downward

Linear model: ŷ = b0 + b1x1
Quadratic model: ŷ = b0 + b1x1 + b2x²1
Testing for the significance of the quadratic term:

Hypotheses: H0: β2 = 0    H1: β2 ≠ 0

t = (b2 - β2) / sb2    d.f. = n - 3

where:
b2 = squared-term slope coefficient
β2 = hypothesized slope (zero)
sb2 = standard error of the slope
Example: a response y measured at increasing filter times (n = 14):

   y    Filter Time
   3         1
   7         2
   8         3
  15         5
  22         7
  33         8
  40        10
  54        12
  67        13
  70        14
  78        15
  85        15
  87        16
  99        17
(continued)

Simple (linear) model results: ŷ = -11.283 + 5.985(Time)

            Coefficients   Standard Error   t Stat     P-value
Intercept    -11.28267        3.46805       -3.25332   0.00691
Time           5.98520        0.30966       19.32819   2.078E-10

Regression statistics: R Square 0.96888, Adjusted R Square 0.96628, Standard Error 6.15997, F 373.57904, Significance F 2.0778E-10
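The linear-model coefficients above can be checked with the closed-form least-squares formulas, using the data from the table:

```python
# Simple linear regression of y on filter time via the least-squares formulas
time = [1, 2, 3, 5, 7, 8, 10, 12, 13, 14, 15, 15, 16, 17]
y    = [3, 7, 8, 15, 22, 33, 40, 54, 67, 70, 78, 85, 87, 99]
n = len(y)
mx, my = sum(time) / n, sum(y) / n
# slope = Sxy / Sxx, intercept = y-bar - slope * x-bar
b1 = sum((x - mx) * (v - my) for x, v in zip(time, y)) / sum((x - mx) ** 2 for x in time)
b0 = my - b1 * mx
print(round(b0, 5), round(b1, 5))   # matches the Excel coefficients above
```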
Quadratic model results: ŷ = 1.539 + 1.565(Time) + 0.245(Time²)

               Coefficients   Standard Error   t Stat    P-value
Intercept       1.53870         2.24465        0.68550   0.50722
Time            1.56496         0.60179        2.60052   0.02467
Time-squared    0.24516         0.03258        7.52406   1.165E-05

Regression statistics: R Square 0.99494, Adjusted R Square 0.99402, Standard Error 2.59513, F 1080.7330, Significance F 2.368E-13

The quadratic term is significant (t = 7.524), and the quadratic model fits better than the linear one (higher adjusted R², smaller standard error).
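The quadratic fit is just a multiple regression on Time and Time²; a sketch solving its 3×3 normal equations directly (data as in the table above):

```python
# Fit y = b0 + b1*time + b2*time^2 by solving the 3x3 normal equations
time = [1, 2, 3, 5, 7, 8, 10, 12, 13, 14, 15, 15, 16, 17]
y    = [3, 7, 8, 15, 22, 33, 40, 54, 67, 70, 78, 85, 87, 99]
cols = [[1.0] * len(time), time, [t * t for t in time]]   # design columns

# Normal equations A b = c, with A = X'X and c = X'y
A = [[sum(u * v for u, v in zip(r, s)) for s in cols] for r in cols]
c = [sum(u * v for u, v in zip(r, y)) for r in cols]

# Gaussian elimination (the system is symmetric positive definite)
for i in range(3):
    for r in range(i + 1, 3):
        f = A[r][i] / A[i][i]
        A[r] = [a - f * p for a, p in zip(A[r], A[i])]
        c[r] -= f * c[i]
b = [0.0, 0.0, 0.0]
for i in (2, 1, 0):
    b[i] = (c[i] - sum(A[i][j] * b[j] for j in range(i + 1, 3))) / A[i][i]
print([round(v, 5) for v in b])   # matches the quadratic-model coefficients above
```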
(continued)

The multiplicative model:

Y = β0 X1^β1 X2^β2 ε

Interpretation of coefficients. Taking logs makes the multiplicative model linear:

log Yi = log β0 + β1 log X1i + β2 log X2i + log εi

so each βj is interpreted as the elasticity of Y with respect to Xj.
Dummy Variables

Example: pie sales with a holiday indicator x2 (x2 = 1 in a holiday week, x2 = 0 otherwise):

Holiday (x2 = 1):     ŷ = (b0 + b2) + b1x1
No Holiday (x2 = 0):  ŷ = b0 + b1x1

Plotted against x1 (Price), the two lines have the same slope b1 but different intercepts, b0 + b2 vs. b0. If H0: β2 = 0 is rejected, then Holiday has a significant effect on pie sales.
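A tiny numerical sketch of the same point, using illustrative coefficient values (not taken from the chapter's output): the dummy shifts the intercept by b2 but leaves the price slope unchanged.

```python
# Hypothetical coefficients, for illustration only
b0, b1, b2 = 300.0, -30.0, 15.0

def sales(price, holiday):
    """Fitted pie sales for a given price and 0/1 holiday indicator."""
    return b0 + b1 * price + b2 * holiday

# Same price, holiday vs. non-holiday week: the difference is exactly b2
print(sales(5.0, 1) - sales(5.0, 0))   # 15.0
# A $1 price increase changes sales by b1 regardless of the dummy (same slope)
print(sales(6.0, 1) - sales(5.0, 1))   # -30.0
```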
Interpreting the
Dummy Variable Coefficient
Example:
Interaction Between Explanatory Variables

Add a cross-product term x3 = x1x2 to the model:

ŷ = b0 + b1x1 + b2x2 + b3x3 = b0 + b1x1 + b2x2 + b3(x1x2)

Effect of Interaction

Given: Y = β0 + β1X1 + β2X2 + β3X1X2 + ε = β0 + β1X1 + (β2 + β3X1)X2 + ε

the effect of X2 on Y is measured by (β2 + β3X1), so it depends on the level of X1.
Interaction Example

Suppose x2 is a dummy variable and the estimated regression equation is

ŷ = 1 + 2x1 + 3x2 + 4x1x2

x2 = 1: ŷ = 1 + 2x1 + 3(1) + 4x1(1) = 4 + 6x1
x2 = 0: ŷ = 1 + 2x1 + 3(0) + 4x1(0) = 1 + 2x1

The slopes differ (6 vs. 2): because of the interaction term, the effect of x1 on ŷ depends on the value of x2.

Hypotheses for testing the interaction term:

H0: β3 = 0 | β1 ≠ 0, β2 ≠ 0
H1: β3 ≠ 0 | β1 ≠ 0, β2 ≠ 0
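The two slopes in the example above can be read off numerically by taking unit steps in x1 at each value of the dummy:

```python
# The interaction example: slope in x1 depends on x2
def yhat(x1, x2):
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2

# Slope in x1 when x2 = 0 (unit-step difference):
print(yhat(1, 0) - yhat(0, 0))   # 2
# Slope in x1 when x2 = 1:
print(yhat(1, 1) - yhat(0, 1))   # 6
```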
Analysis of Residuals in Multiple Regression

The residuals are ei = (yi - ŷi).

Assumptions:
The errors are normally distributed
Errors have a constant variance
The model errors are independent

Check these assumptions by examining plots of the residuals vs. ŷi.
Chapter Summary