Professional Documents
Culture Documents
Business Statistics: A First Course: Simple Linear Regression
Business Statistics: A First Course: Simple Linear Regression
A First Course
(3rd Edition)
Chapter 10
Simple Linear Regression
Chap 10-2
© 2003 Prentice-Hall, Inc.
Chapter Topics
(continued)
Chap 10-3
© 2003 Prentice-Hall, Inc.
Purpose of Regression Analysis
Regression Analysis is Used Primarily to Model
Causality and Provide Prediction
Predict the values of a dependent (response)
variable based on values of at least one
independent (explanatory) variable
Explain the effect of the independent variables on
the dependent variable
Chap 10-4
© 2003 Prentice-Hall, Inc.
Types of Regression Models
Positive Linear Relationship Relationship NOT Linear
Chap 10-5
© 2003 Prentice-Hall, Inc.
Simple Linear Regression Model
Chap 10-6
© 2003 Prentice-Hall, Inc.
Simple Linear Regression Model
(continued)
Yi X i i
Dependent
Population Independent
(Response)
Variable Regression Y |X (Explanatory)
Line Variable
(conditional mean) Chap 10-7
© 2003 Prentice-Hall, Inc.
Simple Linear Regression Model
(continued)
Y (Observed Value of Y) = Yi X i i
i = Random Error
Y | X X i
(Conditional Mean)
X
Observed Value of Y
Chap 10-8
© 2003 Prentice-Hall, Inc.
Linear Regression Equation
Sample regression line provides an estimate of
the population regression line as well as a
predicted value of Y
Sample
Sample Slope
Y Intercept Coefficient
Yi b0 b1 X i ei Residual
Yˆ b0 b1 X (Fitted
Simple Regression Equation
Regression Line, Predicted Value)
Chap 10-9
© 2003 Prentice-Hall, Inc.
Linear Regression Equation
(continued)
e
n 2 n
Yi Yˆi 2
i
i 1 i 1
b0 provides an estimate of
b1 provides and estimate of
Chap 10-10
© 2003 Prentice-Hall, Inc.
Linear Regression Equation
(continued)
Yi b0 b1 X i ei Yi X i i
Y b1
i
ei
Y | X X i
Yˆi b0 b1 X i
b0 X
Observed Value
Chap 10-11
© 2003 Prentice-Hall, Inc.
Interpretation of the Slope and
Intercept
Y |X
1 measures the change in the
X
average value of Y as a result of a one-unit
change in X.
Chap 10-12
© 2003 Prentice-Hall, Inc.
Interpretation of the Slope and
Intercept (continued)
ˆY | X
b1 is the estimated change in the
X
average value of Y as a result of a one-unit
change in X.
Chap 10-13
© 2003 Prentice-Hall, Inc.
Simple Linear Regression:
Example
You wish to examine Annual
Store Square Sales
the linear dependency Feet ($1000)
of the annual sales of 1 1,726 3,681
produce stores on their 2 1,542 3,395
sizes in square footage. 3 2,816 6,653
Sample data for 7
4 5,555 9,543
stores were obtained.
5 1,292 3,318
Find the equation of
6 2,208 5,563
the straight line that
7 1,313 3,760
fits the data best.
Chap 10-14
© 2003 Prentice-Hall, Inc.
Scatter Diagram: Example
12000
Annual Sales ($000)
10000
8000
6000
4000
2000
0
0 1000 2000 3000 4000 5000 6000
Square Feet
Excel Output
Chap 10-15
© 2003 Prentice-Hall, Inc.
Simple Linear Regression
Equation: Example
Yˆi b0 b1 X i
1636.415 1.487 X i
Chap 10-16
© 2003 Prentice-Hall, Inc.
Graph of the Simple Linear
Regression Equation: Example
12000
Annual Sales ($000)
10000
8000
8 7 Xi
6000 + 1.4
. 415
4000 6 36
1
2000
Yi =
0
0 1000 2000 3000 4000 5000 6000
Square Feet
Chap 10-17
© 2003 Prentice-Hall, Inc.
Interpretation of Results:
Example
Chap 10-18
© 2003 Prentice-Hall, Inc.
Simple Linear Regression in
PHStat
In Excel, use PHStat | Regression | Simple
Linear Regression …
EXCEL Spreadsheet of Regression Sales on
Footage
Microsoft Excel
Worksheet
Chap 10-19
© 2003 Prentice-Hall, Inc.
Measures of Variation:
The Sum of Squares
Chap 10-20
© 2003 Prentice-Hall, Inc.
Measures of Variation:
The Sum of Squares
(continued)
Chap 10-21
© 2003 Prentice-Hall, Inc.
Measures of Variation:
The Sum of Squares
(continued)
Y
SSE =(Yi - Yi )2
_
SST = (Yi - Y)2
_
SSR = (Yi - Y)2
_
Y
X
Xi
Chap 10-22
© 2003 Prentice-Hall, Inc.
The ANOVA Table in Excel
ANOVA
Significance
df SS MS F
F
MSR P-value of
Regression k SSR MSR/MSE
=SSR/k the F Test
MSE
Residuals n-k-1 SSE
=SSE/(n-k-1)
Chap 10-23
© 2003 Prentice-Hall, Inc.
Measures of Variation
The Sum of Squares: Example
Excel Output for Produce Stores
Degrees of freedom
ANOVA
df SS MS F Significance F
Regression 1 30380456.12 30380456 81.17909 0.000281201
Residual 5 1871199.595 374239.92
Total 6 32251655.71
Chap 10-25
© 2003 Prentice-Hall, Inc.
Coefficients of Determination (r 2)
and Correlation (r)
Y r2 = 1, r = +1 Y r2 = 1, r = -1
^=b +b X
Yi 0 1 i
^=b +b X
Yi 0 1 i
X X
Y = .81, r = +0.9
r r2 = 0, r = 0
2
Y
^=b +b X
Y ^=b +b X
Y
i 0 1 i i 0 1 i
X X
Chap 10-26
© 2003 Prentice-Hall, Inc.
Standard Error of Estimate
n 2
Y Yˆi
SSE
SYX i 1
n2 n2
Chap 10-27
© 2003 Prentice-Hall, Inc.
Measures of Variation:
Produce Store Example
Excel Output for Produce Stores
Regression Statistics
Multiple R 0.9705572
R Square 0.94198129
Adjusted R Square 0.93037754
Standard Error 611.751517
Observations 7
r2 = .94 Syx
94% of the variation in annual sales can be
explained by the variability in the size of the
store as measured by square footage
Chap 10-28
© 2003 Prentice-Hall, Inc.
Linear Regression Assumptions
Normality
Y values are normally distributed for each X
Probability distribution of error is normal
2. Homoscedasticity (Constant Variance)
3. Independence of Errors
Chap 10-29
© 2003 Prentice-Hall, Inc.
Variation of Errors Around the
Regression Line
• Y values are normally distributed
f() around the regression line.
• For each X value, the “spread” or
variance around the regression line is
the same.
Y
X2
X1
X
Sample Regression Line
Chap 10-30
© 2003 Prentice-Hall, Inc.
Residual Analysis
Purposes
Examine linearity
Evaluate violations of assumptions
Graphical Analysis of Residuals
Plot residuals vs. X and time
Chap 10-31
© 2003 Prentice-Hall, Inc.
Residual Analysis for Linearity
Y Y
X X
e e
X
X
Y Y
X X
SR SR
X X
Heteroscedasticity Homoscedasticity
Chap 10-33
© 2003 Prentice-Hall, Inc.
Residual Analysis:Excel Output
for Produce Stores Example
Observation Predicted Y Residuals
1 4202.344417 -521.3444173
2 3928.803824 -533.8038245
3 5822.775103 830.2248971
Excel Output 4 9894.664688 -351.6646882
5 3557.14541 -239.1454103
6 4918.90184 644.0981603
7 3588.364717 171.6352829
Residual Plot
Square Feet
Chap 10-34
© 2003 Prentice-Hall, Inc.
Residual Analysis for
Independence
The Durbin-Watson Statistic
Used when data is collected over time to detect
autocorrelation (residuals in one time period are
related to residuals in another period)
Measures violation of independence assumption
n
i i 1
( e e ) 2
Should be close to 2.
D i 2
n If not, examine the model
e
i 1
2
i for autocorrelation.
Chap 10-35
© 2003 Prentice-Hall, Inc.
Durbin-Watson Statistic in
PHStat
PHStat | Regression | Simple Linear
Regression …
Check the box for Durbin-Watson Statistic
Chap 10-36
© 2003 Prentice-Hall, Inc.
Obtaining the Critical Values of
Durbin-Watson Statistic
Table 13.4 Finding critical values of Durbin-Watson Statistic
5
k=1 k=2
n dL dU dL dU
15 1 .0 8 1 .3 6 .9 5 1 .5 4
16 1 .1 0 1 .3 7 .9 8 1 .5 4
Chap 10-37
© 2003 Prentice-Hall, Inc.
Using the Durbin-Watson
Statistic
H0 : No autocorrelation (error terms are independent)
H1 : There is autocorrelation (error terms are not
independent)
0 dL dU 2 4-dU 4-dL 4
Chap 10-38
© 2003 Prentice-Hall, Inc.
Residual Analysis for
Independence
Graphical Approach
e
Not Independent
Independent
Time Time
b1 1 SYX
t where Sb1
Sb1 n
(X
i 1
i X) 2
d. f . n 2
Chap 10-40
© 2003 Prentice-Hall, Inc.
Example: Produce Store
Data for 7 Stores:
Annual Estimated Regression
Store Square Sales Equation:
Feet ($000)
1 1,726 3,681 Yˆ 1636.415 1.487 X i
2 1,542 3,395
3 2,816 6,653 The slope of this
4 5,555 9,543 model is 1.487.
5 1,292 3,318
Does Square Footage
6 2,208 5,563
Affect Annual Sales?
7 1,313 3,760
Chap 10-41
© 2003 Prentice-Hall, Inc.
Inferences about the Slope:
t Test Example
Test Statistic:
H0: 1 = 0
H1: 1 0
From Excel Printout b1 Sb1 t
.05 Coefficients Standard Error t Stat P-value
Intercept 1636.4147 451.4953 3.6244 0.01515
df 7 - 2 = 5 Footage 1.4866 0.1650 9.0099 0.00028
Critical Value(s):
Decision:
Reject Reject Reject H0
.025 .025 Conclusion:
There is evidence that
-2.5706 0 2.5706 t square footage affects
annual sales.
Chap 10-42
© 2003 Prentice-Hall, Inc.
Inferences about the Slope:
Confidence Interval Example
Confidence Interval Estimate of the Slope:
b1 tn 2 Sb1
Excel Printout for Produce Stores
Lower 95% Upper 95%
Intercept 475.810926 2797.01853
X Variable 11.06249037 1.91077694
At 95% level of confidence the confidence interval
for the slope is (1.062, 1.911). Does not include 0.
Conclusion: There is a significant linear dependency
of annual sales on the size of the store.
Chap 10-43
© 2003 Prentice-Hall, Inc.
Inferences about the Slope:
F Test
F Test for a Population Slope
Is there a linear dependency of Y on X ?
Null and Alternative Hypotheses
H 0 : 1 = 0 (No Linear Dependency)
H1: 1 0 (Linear Dependency)
Test Statistic
SSR
F 1
SSE
nd.f.=1
Numerator 2 , denominator d.f.=n-2
Chap 10-44
© 2003 Prentice-Hall, Inc.
Relationship between a t Test
and an F Test
Null and Alternative Hypotheses
H0: 1 = 0 (No Linear Dependency)
H1: 1 0 (Linear Dependency)
t
2
n2 F1,n 2
Chap 10-45
© 2003 Prentice-Hall, Inc.
Inferences about the Slope:
F Test Example
Test Statistic:
H0: 1 = 0 From Excel Printout
ANOVA
H1: 1 0
df SS MS F Significance F
.05 Regression 1 30380456.12 30380456.12 81.179 0.000281
numerator Residual 5 1871199.595 374239.919
df = 1 Total 6 32251655.71
denominator
df 7 - 2 = 5
Reject H0
Reject Decision:
Conclusion:
= .05 There is evidence that
square footage affects
0 6.61 F1,n 2 annual sales.
Chap 10-46
© 2003 Prentice-Hall, Inc.
Purpose of Correlation Analysis
Correlation Analysis is Used to Measure
Strength of Association (Linear Relationship)
Between 2 Numerical Variables
Only Strength of the Relationship is Concerned
No Causal Effect is Implied
Chap 10-47
© 2003 Prentice-Hall, Inc.
Purpose of Correlation Analysis
(continued)
Chap 10-48
© 2003 Prentice-Hall, Inc.
Sample of Observations from
Various r Values
Y Y Y
X X X
r = -1 r = -.6 r=0
Y Y
X X
© 2003 Prentice-Hall, Inc.
r = .6 r=1 Chap 10-49
Features of and r
Unit Free
Range between -1 and 1
The Closer to -1, the Stronger the Negative
Linear Relationship
The Closer to 1, the Stronger the Positive
Linear Relationship
The Closer to 0, the Weaker the Linear
Relationship
Chap 10-50
© 2003 Prentice-Hall, Inc.
t Test for Correlation
Hypotheses
H0: = 0 (No Correlation)
H1: 0 (Correlation)
Test Statistic
r
t where
r 2
n2
n
X i X Yi Y
r r2 i 1
n n
X X Y Y
2 2
i i
i 1 i 1 Chap 10-51
© 2003 Prentice-Hall, Inc.
Example: Produce Stores
From Excel Printout r
Is there any Regression Statistics
Multiple R 0.9705572
evidence of linear
R Square 0.94198129
relationship between
Adjusted R Square 0.93037754
Annual Sales of a Standard Error 611.751517
store and its Square Observations 7
Footage at .05 level
of significance? H0: = 0 (No association)
H1: 0 (Association)
.05
© 2003 Prentice-Hall, Inc.
df 7 - 2 = 5 Chap 10-52
Example: Produce Stores
Solution
Decision:
r
.9706
t 9.0099 Reject H0
r 2
1 .9420
5 Conclusion:
n2
There is evidence of a
Critical Value(s): linear relationship at 5%
level of significance
Reject Reject
The value of the t statistic is
.025 .025 exactly the same as the t
statistic value for test on the
-2.5706 0 2.5706 slope coefficient
Chap 10-53
© 2003 Prentice-Hall, Inc.
Estimation of Mean Values
Confidence Interval Estimate for Y | X X i :
The Mean of Y given a particular Xi
Size of interval vary according to
Standard error distance away from mean, X
of the estimate
1 (Xi X ) 2
Yˆi tn 2 SYX n
n
t value from table (Xi X )
i 1
2
with df=n-2
Chap 10-54
© 2003 Prentice-Hall, Inc.
Prediction of Individual Values
Prediction Interval for Individual
Response Yi at a Particular Xi
1 (Xi X ) 2
Yˆi tn 2 SYX 1 n
n
(Xi X )
i 1
2
Chap 10-55
© 2003 Prentice-Hall, Inc.
Interval Estimates for Different
Values of X
Prediction Interval Confidence
for a individual Yi Interval for the
Y mean of Y
+ b X
1 i
Yi = b0
X
X A given X
Chap 10-56
© 2003 Prentice-Hall, Inc.
Example: Produce Stores
Data for 7 Stores:
Annual
Store Square Sales Consider a store
Feet ($000)
with 2000 square
1 1,726 3,681 feet.
2 1,542 3,395
3 2,816 6,653 Regression Equation
4 5,555 9,543 Obtained:
5 1,292 3,318
6 2,208 5,563 Yˆ 1636.415 1.487 X i
7 1,313 3,760
Chap 10-57
© 2003 Prentice-Hall, Inc.
Estimation of Mean Values:
Example
Confidence Interval Estimate for Y | X X i
Find the 95% confidence interval for the average annual
sales for stores of 2,000 square feet
1 ( X i X )2
Yˆi tn 2 SYX n 4610.45 612.66
n
i (
i 1
X X ) 2
Chap 10-58
© 2003 Prentice-Hall, Inc.
Prediction Interval for Y :
Example
Prediction Interval for Individual
YX X i
1 ( X i X )2
Yˆi tn 2 SYX 1 n 4610.45 1687.68
n
i( X
i 1
X ) 2
Chap 10-59
© 2003 Prentice-Hall, Inc.
Estimation of Mean Values and
Prediction of Individual Values in PHStat
Microsoft Excel
Worksheet
Chap 10-60
© 2003 Prentice-Hall, Inc.
Pitfalls of Regression Analysis
Chap 10-61
© 2003 Prentice-Hall, Inc.
Strategy for Avoiding the Pitfalls
of Regression
Chap 10-62
© 2003 Prentice-Hall, Inc.
Strategy for Avoiding the Pitfalls
of Regression
(continued)
If there is violation of any assumption, use
alternative methods (e.g., least absolute
deviation regression or least median of squares
regression) to least-squares regression or
alternative least-squares models (e.g.,
curvilinear or multiple regression)
If there is no evidence of assumption violation,
then test for the significance of the regression
coefficients and construct confidence intervals
and prediction intervals
Chap 10-63
© 2003 Prentice-Hall, Inc.
Chapter Summary
Introduced Types of Regression Models
Discussed Determining the Simple Linear
Regression Equation
Described Measures of Variation
Addressed Assumptions of Regression and
Correlation
Discussed Residual Analysis
Addressed Measuring Autocorrelation
Chap 10-64
© 2003 Prentice-Hall, Inc.
Chapter Summary
(continued)
Chap 10-65
© 2003 Prentice-Hall, Inc.