Statistics For Managers Using Microsoft® Excel 5th Edition: Simple Linear Regression
Chapter 13
Simple Linear Regression
Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 13-1
Learning Objectives
Types of Relationships
[Figure: scatter plots of Y against X showing strong relationships and weak relationships]
Types of Relationships
[Figure: scatter plots of Y against X showing no relationship]
The Linear Regression Model
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
where
Yi = dependent variable
β0 = population Y intercept
β1 = population slope coefficient
Xi = independent variable
εi = random error term
β0 + β1Xi is the linear component; εi is the random error component.
[Figure: population regression line Yi = β0 + β1Xi + εi, with intercept β0 and slope β1. For a given Xi, the random error εi is the gap between the observed value of Y and the predicted value of Y on the line.]
Linear Regression Equation
The simple linear regression equation provides an
estimate of the population regression line
$$\hat{Y}_i = b_0 + b_1 X_i$$
where
Ŷi = estimated (or predicted) Y value for observation i
b0 = estimate of the regression intercept
b1 = estimate of the regression slope
Xi = value of X for observation i
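As a rough illustration of how b0 and b1 can be obtained from a sample, the sketch below applies the standard least-squares formulas b1 = SSXY / SSX and b0 = Ȳ - b1·X̄. The function name `fit_line` and the five-point toy data set are hypothetical and are not taken from the slides.

```python
from statistics import mean

def fit_line(x, y):
    """Return (b0, b1) for the least-squares line Y-hat = b0 + b1 * X."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-products and sum of squares of X around their means
    ssxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ssxy / ssx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical toy data lying exactly on Y = 1 + 2X
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
b0, b1 = fit_line(x, y)
```

Because the toy points sit exactly on a line, the fitted intercept and slope reproduce that line; with real data the line only minimizes the squared vertical distances.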
[Figure: scatter plot of the sample data against square feet]
In Excel: Tools → Data Analysis → Regression
ANOVA
             df   SS           MS           F         Significance F
Regression    1   18934.9348   18934.9348   11.0848   0.01039
Residual      8   13665.5652    1708.1957
Total         9   32600.5000
[Figure: scatter plot of house price ($1000s) against square feet with the fitted regression line; slope = 0.10977, intercept = 98.248]
Predicted price of a 2,000 square-foot house (in $1,000s):
house price = 98.25 + 0.1098(2000)
= 317.85
Do not try to extrapolate beyond the range of observed X's.
[Figure: scatter plot of house price ($1000s) against square feet with the fitted regression line]
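A minimal sketch of the prediction step, with an optional guard against extrapolation. The function name `predict_price` and the `x_range` parameter are hypothetical; the slides as extracted do not list the observed range of X, so the range used in the second call is only an illustrative assumption.

```python
def predict_price(square_feet, b0=98.25, b1=0.1098, x_range=None):
    """Predicted house price in $1000s from the fitted line b0 + b1 * X.

    x_range is an optional (min, max) of the observed X values. The default
    None skips the check, since the observed range is not given in the slides.
    """
    if x_range is not None:
        lo, hi = x_range
        if not lo <= square_feet <= hi:
            raise ValueError("X is outside the observed range; do not extrapolate")
    return b0 + b1 * square_feet

price = predict_price(2000)  # about 317.85 ($1000s), matching the slide

# Hypothetical observed range, used only to show the guard
try:
    predict_price(5000, x_range=(1000, 2500))
    extrapolation_blocked = False
except ValueError:
    extrapolation_blocked = True
```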
Measures of Variation
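Total variation is made up of two parts, the variation explained by the regression and the unexplained (error) variation:
$$SST = SSR + SSE$$
where SST = total sum of squares, SSR = regression sum of squares, SSE = error sum of squares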
Coefficient of Determination, r²
The coefficient of determination is the portion of
the total variation in the dependent variable that
is explained by variation in the independent
variable.
The coefficient of determination is also called
r-squared and is denoted r².
$$r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}, \qquad 0 \le r^2 \le 1$$
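As a quick check of the ratio, the sketch below plugs in SSR and SST from the house-price ANOVA table. The variable names are illustrative only.

```python
import math

ssr = 18934.9348  # regression sum of squares (ANOVA table)
sst = 32600.5000  # total sum of squares (ANOVA table)

r_squared = ssr / sst        # share of the variation in Y explained by X
r = math.sqrt(r_squared)     # positive root, because the slope b1 = 0.10977 > 0
```

With these inputs r² comes out near 0.58, so roughly 58% of the variation in house prices is explained by variation in square feet.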
[Figure: scatter plots of Y against X with r² = 1]
Coefficient of Determination, r²
r² = 0: no linear relationship between X and Y
[Figure: scatter plot of Y against X with r² = 0]
$$S_{YX} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n-2}}$$
where
SSE = error sum of squares
n = sample size
ANOVA
             df   SS           MS           F         Significance F
Regression    1   18934.9348   18934.9348   11.0848   0.01039
Residual      8   13665.5652    1708.1957
Total         9   32600.5000
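The standard error of the estimate can be read off the ANOVA table above. This short sketch takes SSE from the Residual row and infers n = 10 from the total degrees of freedom (9 + 1). The variable names are illustrative.

```python
import math

sse = 13665.5652  # residual (error) sum of squares from the ANOVA table
n = 10            # sample size: total d.f. of 9, plus 1

mse = sse / (n - 2)     # mean square error, the Residual MS entry
s_yx = math.sqrt(mse)   # standard error of the estimate, in $1000s
```

The result is about 41.33, meaning observed prices typically deviate from the fitted line by roughly $41,330.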
[Figure: scatter of points around the regression line for a small S_YX and for a large S_YX]
[Figure: plots of residuals against X]
Residuals
Observation   Predicted Y   Residual
...
3             284.85348      -5.853484
4             304.06284       3.937162
5             218.99284     -19.99284
...
9             254.6674       64.33264
10            284.85348     -29.85348
[Figure: plot of residuals against X]
Does not appear to violate any regression assumptions
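Since a residual is the observed value minus the predicted value, each surviving row of the residual output can be turned back into an observed price. The sketch below assumes the three columns are observation number, predicted Y, and residual, which is consistent with the arithmetic but is not labelled in the slides as extracted.

```python
# (observation, predicted Y, residual) for the rows that survive in the output
rows = [
    (3, 284.85348, -5.853484),
    (4, 304.06284, 3.937162),
    (5, 218.99284, -19.99284),
    (9, 254.6674, 64.33264),
    (10, 284.85348, -29.85348),
]

# residual = observed - predicted, so observed = predicted + residual
observed = {obs: round(pred + resid, 2) for obs, pred, resid in rows}
```

Each recovered value lands on a whole number of thousands of dollars, which supports the assumed column reading.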
Here, residuals suggest a cyclic pattern, not random
[Figure: residuals plotted against time (t)]
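$$D = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}$$
D less than 2 may signal positive autocorrelation; D greater than 2 may
signal negative autocorrelation.
[Figure: Durbin-Watson decision scale marked 0, dL, dU, 2]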
The Durbin-Watson Statistic
Example with n = 25:
Excel output:
Sum of Squared Difference of Residuals   3296.18
Sum of Squared Residuals                 3279.98
Durbin-Watson Statistic                  1.00494
[Figure: sales plotted against time with fitted line y = 30.65 + 4.7038x, R² = 0.8976]
$$D = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2} = \frac{3296.18}{3279.98} = 1.00494$$
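The statistic is simple to compute from a list of time-ordered residuals. The sketch below is one possible implementation. The function name `durbin_watson` and the two short residual series are hypothetical, while the last line reuses the two sums from the Excel output.

```python
def durbin_watson(residuals):
    """D = sum_{i=2..n} (e_i - e_{i-1})^2 / sum_{i=1..n} e_i^2."""
    num = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical residual series
alternating = durbin_watson([1, -1, 1, -1])  # sign flips every period, D above 2
constant = durbin_watson([1, 1, 1, 1])       # identical neighbours, D equals 0

# Ratio of the two sums reported in the Excel output for the n = 25 example
d_example = 3296.18 / 3279.98
```

The alternating series pushes D above 2 (negative autocorrelation) and the constant series pushes it to 0 (positive autocorrelation), which matches the rule of thumb stated earlier.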
$$t = \frac{b_1 - \beta_1}{S_{b_1}} = \frac{0.10977 - 0}{0.03297} = 3.32938$$
d.f. = 10 − 2 = 8
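The test statistic can be reproduced from the rounded slope and standard error. In this sketch the critical value 2.3060 for 8 degrees of freedom is the one quoted elsewhere in these slides for α/2 = .025, and the variable names are illustrative.

```python
b1 = 0.10977     # estimated slope
beta1_h0 = 0.0   # slope under H0: beta1 = 0
s_b1 = 0.03297   # standard error of the slope
t_crit = 2.3060  # two-tail critical value, alpha = .05, 8 d.f.

t_stat = (b1 - beta1_h0) / s_b1
reject_h0 = abs(t_stat) > t_crit
```

Using the rounded inputs gives a value within rounding error of 3.32938, well beyond the critical value.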
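[Figure: t distribution with rejection regions of area α/2 = .025 in each tail]
F test at α = .05: critical value F.05 = 5.32
Decision: Reject H0, since F = 11.0848 > 5.32
Conclusion: There is sufficient evidence that house size affects selling price
[Figure: F distribution with the "do not reject H0" region below F.05 = 5.32 and the "reject H0" region above it]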
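The F statistic is the ratio of the two mean squares in the ANOVA table. A small sketch using the table's sums of squares and degrees of freedom, with the critical value 5.32 taken from the slide; the variable names are illustrative.

```python
ssr, df_regression = 18934.9348, 1  # Regression row of the ANOVA table
sse, df_residual = 13665.5652, 8    # Residual row of the ANOVA table

msr = ssr / df_regression  # mean square regression
mse = sse / df_residual    # mean square error
f_stat = msr / mse

f_crit = 5.32              # F.05 with 1 and 8 d.f., as given on the slide
reject_h0 = f_stat > f_crit
```

In simple linear regression the F statistic equals the square of the slope's t statistic, so both tests lead to the same decision.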
Confidence Interval Estimate for the Slope
Confidence interval estimate of the slope:
$$b_1 \pm t_{n-2} S_{b_1}$$
d.f. = n − 2
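Applied to the house-price example, the interval can be sketched as follows, using the slope 0.10977, its standard error 0.03297, and the critical value 2.3060 for 8 degrees of freedom that appear in these slides. The variable names are illustrative.

```python
b1 = 0.10977     # estimated slope
s_b1 = 0.03297   # standard error of the slope
t_crit = 2.3060  # t with n - 2 = 8 d.f., alpha/2 = .025

margin = t_crit * s_b1
ci = (b1 - margin, b1 + margin)  # 95% confidence interval for the slope
```

The interval runs from roughly 0.0337 to 0.1858 and excludes 0, which agrees with rejecting H0: β1 = 0 at the 5% level.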
Test statistic (with n − 2 degrees of freedom):
$$t = \frac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$
where
$$r = +\sqrt{r^2} \quad \text{if } b_1 > 0$$
$$r = -\sqrt{r^2} \quad \text{if } b_1 < 0$$
$$t = \frac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{.762 - 0}{\sqrt{\dfrac{1 - .762^2}{10 - 2}}} = 3.329$$
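The same calculation in code, as a sketch with illustrative variable names. Starting from r rounded to three decimals gives about 3.328, so the slide's 3.329 presumably comes from an unrounded r.

```python
import math

r = 0.762     # sample correlation coefficient, rounded
rho_h0 = 0.0  # population correlation under H0
n = 10        # sample size

t_stat = (r - rho_h0) / math.sqrt((1 - r ** 2) / (n - 2))
```

Up to rounding this matches the t statistic for the slope, as expected, because the two tests are equivalent in simple linear regression.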
t Test for a Correlation Coefficient
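d.f. = 10 − 2 = 8
Critical values: ±tα/2 = ±2.3060 (α/2 = .025 in each tail)
Test statistic: t = 3.329
Decision: Reject H0
Conclusion: There is evidence of a linear association at the 5% level of significance
[Figure: t distribution with rejection regions below −2.3060 and above 2.3060; the test statistic 3.329 falls in the upper rejection region]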
$$\hat{Y} \pm t_{n-2} S_{YX} \sqrt{h_i}$$
Size of interval varies according to
distance away from mean, X̄
$$h_i = \frac{1}{n} + \frac{(X_i - \bar{X})^2}{SSX} = \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2}$$
Find the 95% confidence interval for the mean price of 2,000
square-foot houses
Predicted price: Ŷi = 317.85 ($1,000s)
$$\hat{Y} \pm t_{n-2} S_{YX} \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2}} = 317.85 \pm 37.12$$
$$\hat{Y} \pm t_{n-2} S_{YX} \sqrt{1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2}} = 317.85 \pm 102.28$$
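The two intervals differ only by the extra 1 under the square root. The slides as extracted do not list X̄ or SSX, so the sketch below backs h out of the stated half-width 37.12 rather than computing it from data, then checks that the wider interval's half-width follows from it. The function name `interval_half_widths` is hypothetical.

```python
import math

def interval_half_widths(t_crit, s_yx, h):
    """Half-widths of the interval for the mean of Y and for an individual Y."""
    mean_half = t_crit * s_yx * math.sqrt(h)
    individual_half = t_crit * s_yx * math.sqrt(1 + h)
    return mean_half, individual_half

t_crit = 2.3060                   # t with n - 2 = 8 d.f., alpha/2 = .025
s_yx = math.sqrt(13665.5652 / 8)  # standard error of the estimate (SSE / (n - 2))

# Assumption: recover h at X = 2000 from the stated half-width of 37.12
h = (37.12 / (t_crit * s_yx)) ** 2
mean_half, individual_half = interval_half_widths(t_crit, s_yx, h)
```

The individual half-width reproduces the stated 102.28 only when the critical value has n − 2 = 8 degrees of freedom, so both intervals use t with n − 2 degrees of freedom.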