
Chapter 12

Simple Linear Regression


Simple Linear Regression Model
Least Squares Method
Coefficient of Determination
Model Assumptions
Testing for Significance
Using the Estimated Regression Equation
for Estimation and Prediction
Computer Solution
Residual Analysis: Validating Model Assumptions

Simple Linear Regression Model

The equation that describes how y is related to x and
an error term is called the regression model.
The simple linear regression model is:

y = β0 + β1x + ε

– β0 and β1 are called parameters of the model.
– ε is a random variable called the error term.

Simple Linear Regression Equation

The simple linear regression equation is:

E(y) = β0 + β1x

• The graph of the regression equation is a straight line.
• β0 is the y intercept of the regression line.
• β1 is the slope of the regression line.
• E(y) is the expected value of y for a given x value.

Simple Linear Regression Equation
Positive Linear Relationship

[Figure: regression line of E(y) with intercept β0; the slope β1 is positive]

Simple Linear Regression Equation

Negative Linear Relationship

[Figure: regression line of E(y) with intercept β0; the slope β1 is negative]

Simple Linear Regression Equation

No Relationship

[Figure: regression line of E(y) with intercept β0; the slope β1 is 0]

Estimated Simple Linear Regression Equation

The estimated simple linear regression equation is:

ŷ = b0 + b1x

• The graph is called the estimated regression line.
• b0 is the y intercept of the line.
• b1 is the slope of the line.
• ŷ is the estimated value of y for a given x value.

Estimation Process

1. Regression model: y = β0 + β1x + ε
   Regression equation: E(y) = β0 + β1x
   Unknown parameters: β0, β1
2. Sample data: pairs (x1, y1), ..., (xn, yn)
3. Estimated regression equation: ŷ = b0 + b1x
   Sample statistics: b0, b1
4. b0 and b1 provide estimates of β0 and β1

Least Squares Method

Least Squares Criterion


min Σ (yi − ŷi)²

where:
yi = observed value of the dependent variable
for the ith observation
ŷi = estimated value of the dependent variable
for the ith observation

The Least Squares Method

Slope for the Estimated Regression Equation

b1 = [Σxiyi − (Σxi)(Σyi)/n] / [Σxi² − (Σxi)²/n]

The Least Squares Method

y-Intercept for the Estimated Regression Equation

b0 = ȳ − b1x̄

where:
xi = value of independent variable for ith observation
yi = value of dependent variable for ith observation
x̄ = mean value for independent variable
ȳ = mean value for dependent variable
n = total number of observations

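The slope and intercept formulas above can be sketched as a small Python function. This is a minimal illustration, not part of the original slides: the function name `least_squares_fit` and the four sample points are hypothetical, chosen so that the points lie exactly on y = 1 + 2x.

```python
def least_squares_fit(x, y):
    """Return (b0, b1) for y-hat = b0 + b1*x using the sums-based formulas."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, with at least two points")
    sum_x = sum(x)                                   # sum of xi
    sum_y = sum(y)                                   # sum of yi
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))    # sum of xi*yi
    sum_x2 = sum(xi * xi for xi in x)                # sum of xi squared
    denom = sum_x2 - sum_x ** 2 / n
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b1 = (sum_xy - sum_x * sum_y / n) / denom        # slope
    b0 = sum_y / n - b1 * (sum_x / n)                # intercept: y-bar - b1 * x-bar
    return b0, b1


# Hypothetical data lying exactly on y = 1 + 2x
b0, b1 = least_squares_fit([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)
```

Because the hypothetical points fall exactly on a line, the fit should recover that line's intercept and slope.
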
Example: Reed Auto Sales

Simple Linear Regression


Reed Auto periodically has a special week-long
sale. As part of the advertising campaign Reed runs
one or more television commercials during the
weekend preceding the sale. Data from a sample of 5
previous sales are shown on the next slide.

Example: Reed Auto Sales

Simple Linear Regression

Number of TV Ads    Number of Cars Sold
1                   14
3                   24
2                   18
1                   17
3                   27

Example: Reed Auto Sales

Slope for the Estimated Regression Equation

b1 = [220 − (10)(100)/5] / [24 − (10)²/5] = 20/4 = 5

y-Intercept for the Estimated Regression Equation

b0 = 20 − 5(2) = 10

Estimated Regression Equation

ŷ = 10 + 5x

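The arithmetic for the Reed Auto Sales example can be reproduced with a few lines of Python. This sketch uses only the five data pairs from the slides; the variable names are illustrative.

```python
# Reed Auto Sales sample from the slides: (TV ads, cars sold) for 5 sales
ads = [1, 3, 2, 1, 3]
cars = [14, 24, 18, 17, 27]

n = len(ads)
sum_x = sum(ads)                                   # sum of xi
sum_y = sum(cars)                                  # sum of yi
sum_xy = sum(x * y for x, y in zip(ads, cars))     # sum of xi*yi
sum_x2 = sum(x * x for x in ads)                   # sum of xi squared

# Slope and intercept from the least squares formulas
b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
b0 = sum_y / n - b1 * (sum_x / n)

print(sum_x, sum_y, sum_xy, sum_x2)  # the four sums that appear in the slope formula
print(b0, b1)
```

The sums come out to 10, 100, 220 and 24, which are the numbers substituted into the slope formula on the slide.
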
Example: Reed Auto Sales

Scatter Diagram

[Figure: scatter diagram of Cars Sold (vertical axis, 0 to 30) against TV Ads (horizontal axis, 0 to 4), with the estimated regression line ŷ = 10 + 5x]

The Coefficient of Determination

Relationship Among SST, SSR, SSE

SST = SSR + SSE

Σ(yi − ȳ)² = Σ(ŷi − ȳ)² + Σ(yi − ŷi)²

where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error

The Coefficient of Determination

The coefficient of determination is:

r² = SSR/SST

where:
SST = total sum of squares
SSR = sum of squares due to regression

Example: Reed Auto Sales

Coefficient of Determination

r² = SSR/SST = 100/114 = .8772
The regression relationship is very strong
because 88% of the variation in number of cars sold
can be explained by the linear relationship between
the number of TV ads and the number of cars sold.

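The values SSR = 100 and SST = 114 can be checked directly from the data. The following sketch assumes the estimated equation ŷ = 10 + 5x from the earlier slide and also confirms that SST = SSR + SSE holds for this sample; variable names are illustrative.

```python
ads = [1, 3, 2, 1, 3]
cars = [14, 24, 18, 17, 27]
b0, b1 = 10, 5  # estimated regression equation from the slides

y_bar = sum(cars) / len(cars)
y_hat = [b0 + b1 * x for x in ads]

sst = sum((y - y_bar) ** 2 for y in cars)                 # total sum of squares
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)              # sum of squares due to regression
sse = sum((y - yh) ** 2 for y, yh in zip(cars, y_hat))    # sum of squares due to error

r2 = ssr / sst
print(sst, ssr, sse, round(r2, 4))
```
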
The Correlation Coefficient

Sample Correlation Coefficient

rxy = (sign of b1) √(Coefficient of Determination)

rxy = (sign of b1) √r²

where:
b1 = the slope of the estimated regression
equation ŷ = b0 + b1x

Example: Reed Auto Sales

Sample Correlation Coefficient

rxy = (sign of b1) √r²
The sign of b1 in the equation ŷ = 10 + 5x is "+".

rxy = +√.8772

rxy = +.9366

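As a rough check, the correlation obtained from r² can be compared with the sample correlation computed directly from the data. The direct formula used here, Σ(xi − x̄)(yi − ȳ) / √(Σ(xi − x̄)² Σ(yi − ȳ)²), is the standard definition and is not shown on the slides; variable names are illustrative.

```python
import math

ads = [1, 3, 2, 1, 3]
cars = [14, 24, 18, 17, 27]
b1 = 5            # slope from the slides
r2 = 100 / 114    # coefficient of determination from the slides

# Route 1: sign of b1 times the square root of r squared
r_from_r2 = math.copysign(math.sqrt(r2), b1)

# Route 2 (assumed standard definition): direct sample correlation
n = len(ads)
x_bar = sum(ads) / n
y_bar = sum(cars) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ads, cars))
sxx = sum((x - x_bar) ** 2 for x in ads)
syy = sum((y - y_bar) ** 2 for y in cars)
r_direct = sxy / math.sqrt(sxx * syy)

print(round(r_from_r2, 4), round(r_direct, 4))
```
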
Model Assumptions

Assumptions About the Error Term ε

1. The error ε is a random variable with mean of
zero.
2. The variance of ε, denoted by σ², is the same for
all values of the independent variable.
3. The values of ε are independent.
4. The error ε is a normally distributed random
variable.

Testing for Significance

To test for a significant regression relationship, we
must conduct a hypothesis test to determine whether
the value of β1 is zero.
Two tests are commonly used:
– t Test
– F Test
Both tests require an estimate of σ², the variance of ε
in the regression model.

Testing for Significance

An Estimate of σ²
The mean square error (MSE) provides the estimate
of σ², and the notation s² is also used.

s² = MSE = SSE/(n − 2)

where:
SSE = Σ(yi − ŷi)² = Σ(yi − b0 − b1xi)²

Testing for Significance

An Estimate of σ
– To estimate σ we take the square root of s².
– The resulting s is called the standard error of the
estimate.

s = √MSE = √(SSE/(n − 2))

Testing for Significance: t Test

Hypotheses

H0: β1 = 0
Ha: β1 ≠ 0

Test Statistic

t = b1/s_b1

where s_b1 = s / √(Σ(xi − x̄)²)

Testing for Significance: t Test

Rejection Rule

Reject H0 if t < −tα/2 or t > tα/2

where: tα/2 is based on a t distribution
with n − 2 degrees of freedom

Example: Reed Auto Sales

t Test
– Hypotheses
H0: β1 = 0
Ha: β1 ≠ 0

– Rejection Rule
For α = .05 and d.f. = 3, t.025 = 3.182
Reject H0 if t < −3.182 or t > 3.182

Example: Reed Auto Sales

t Test
• Test Statistic
t = b1/s_b1 = 5/1.08 = 4.63
• Conclusion
t = 4.63 > 3.182, so reject H0

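The t statistic can be traced from the raw data through MSE, s and s_b1. This is a minimal sketch under the slide's setup; the critical value 3.182 is copied from the slides rather than computed, and the variable names are illustrative.

```python
import math

ads = [1, 3, 2, 1, 3]
cars = [14, 24, 18, 17, 27]
b0, b1 = 10, 5                       # estimated regression equation from the slides
n = len(ads)

x_bar = sum(ads) / n
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(ads, cars))
mse = sse / (n - 2)                  # s squared, the estimate of the error variance
s = math.sqrt(mse)                   # standard error of the estimate
sxx = sum((x - x_bar) ** 2 for x in ads)
s_b1 = s / math.sqrt(sxx)            # estimated standard deviation of b1

t_stat = b1 / s_b1
t_crit = 3.182                       # t.025 with 3 d.f., taken from the slides
reject_h0 = abs(t_stat) > t_crit

print(round(s_b1, 2), round(t_stat, 2), reject_h0)
```
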
Confidence Interval for β1

We can use a 95% confidence interval for β1 to test the
hypotheses just used in the t test.
H0 is rejected if the hypothesized value of β1 is not
included in the confidence interval for β1.

Confidence Interval for β1

The form of a confidence interval for β1 is:

b1 ± tα/2 s_b1

where b1 is the point estimate,
tα/2 s_b1 is the margin of error, and
tα/2 is the t value providing an area
of α/2 in the upper tail of a
t distribution with n − 2 degrees
of freedom

Example: Reed Auto Sales

Rejection Rule
Reject H0 if 0 is not included in
the confidence interval for β1.
95% Confidence Interval for β1
b1 ± tα/2 s_b1
= 5 ± 3.182(1.08) = 5 ± 3.44

or 1.56 to 8.44
Conclusion
0 is not included in the confidence interval.
Reject H0

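The interval endpoints follow from the three numbers on the slide. A short sketch, with the rounded values b1 = 5, s_b1 = 1.08 and t.025 = 3.182 taken from the slides as given:

```python
b1 = 5          # point estimate of the slope
s_b1 = 1.08     # estimated standard deviation of b1 (rounded, from the slides)
t_crit = 3.182  # t.025 with 3 d.f., from the slides

margin = t_crit * s_b1
lower, upper = b1 - margin, b1 + margin
contains_zero = lower <= 0 <= upper   # H0 is rejected when this is False

print(round(margin, 2), round(lower, 2), round(upper, 2), contains_zero)
```
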
Testing for Significance: F Test

Hypotheses

H0: β1 = 0
Ha: β1 ≠ 0

Test Statistic

F = MSR/MSE

Testing for Significance: F Test

Rejection Rule

Reject H0 if F > Fα

where: Fα is based on an F distribution
with 1 d.f. in the numerator and
n − 2 d.f. in the denominator

Example: Reed Auto Sales

F Test
• Hypotheses
H0: β1 = 0
Ha: β1 ≠ 0
• Rejection Rule
For α = .05 and d.f. = 1, 3: F.05 = 10.13
Reject H0 if F > F.05 = 10.13.

Example: Reed Auto Sales

F Test
• Test Statistic
F = MSR/MSE = 100/4.667 = 21.43
• Conclusion
F = 21.43 > 10.13, so we reject H0.

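The F statistic can be recomputed from the sums of squares. This sketch assumes MSR = SSR divided by its 1 degree of freedom, which matches the numerator d.f. stated in the rejection rule; the critical value 10.13 is copied from the slides, and the variable names are illustrative.

```python
ads = [1, 3, 2, 1, 3]
cars = [14, 24, 18, 17, 27]
b0, b1 = 10, 5                      # estimated regression equation from the slides
n = len(ads)

y_bar = sum(cars) / n
y_hat = [b0 + b1 * x for x in ads]
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((y - yh) ** 2 for y, yh in zip(cars, y_hat))

msr = ssr / 1                       # one independent variable, so 1 d.f.
mse = sse / (n - 2)
f_stat = msr / mse
f_crit = 10.13                      # F.05 with (1, 3) d.f., from the slides
reject_h0 = f_stat > f_crit

print(round(mse, 3), round(f_stat, 2), reject_h0)
```

With a single independent variable, the F statistic equals the square of the t statistic for the slope, so the two tests lead to the same conclusion here.
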
Some Cautions about the
Interpretation of Significance Tests

Rejecting H0: β1 = 0 and concluding that the
relationship between x and y is significant does not
enable us to conclude that a cause-and-effect
relationship is present between x and y.
Just because we are able to reject H0: β1 = 0 and
demonstrate statistical significance does not enable us
to conclude that there is a linear relationship between
x and y.

Using the Estimated Regression Equation
for Estimation and Prediction
Confidence Interval Estimate of E(yp)

ŷp ± tα/2 s_ŷp

Prediction Interval Estimate of yp

ŷp ± tα/2 s_ind

where: confidence coefficient is 1 − α and
tα/2 is based on a t distribution
with n − 2 degrees of freedom

Example: Reed Auto Sales

Point Estimation
If 3 TV ads are run prior to a sale, we expect the mean
number of cars sold to be:

ŷ = 10 + 5(3) = 25 cars

Example: Reed Auto Sales

Confidence Interval for E(yp)

95% confidence interval estimate of the mean number
of cars sold when 3 TV ads are run is:

25 ± 4.61 = 20.39 to 29.61 cars

Example: Reed Auto Sales

Prediction Interval for yp

95% prediction interval estimate of the number of
cars sold in one particular week when 3 TV ads are
run is:

25 ± 8.28 = 16.72 to 33.28 cars

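The margins 4.61 and 8.28 can be reproduced once s_ŷp and s_ind are written out. The slides do not give these two formulas; the sketch below assumes the standard ones, s_ŷp = s√(1/n + (xp − x̄)²/Σ(xi − x̄)²) and s_ind = s√(1 + 1/n + (xp − x̄)²/Σ(xi − x̄)²), with t.025 = 3.182 taken from the slides. Variable names are illustrative.

```python
import math

ads = [1, 3, 2, 1, 3]
cars = [14, 24, 18, 17, 27]
b0, b1 = 10, 5        # estimated regression equation from the slides
x_p = 3               # number of TV ads for which we estimate and predict
t_crit = 3.182        # t.025 with 3 d.f., from the slides
n = len(ads)

x_bar = sum(ads) / n
sxx = sum((x - x_bar) ** 2 for x in ads)
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(ads, cars))
s = math.sqrt(sse / (n - 2))

y_p = b0 + b1 * x_p   # point estimate

# Assumed standard formulas (not shown on the slides)
s_yp = s * math.sqrt(1 / n + (x_p - x_bar) ** 2 / sxx)        # for the mean E(yp)
s_ind = s * math.sqrt(1 + 1 / n + (x_p - x_bar) ** 2 / sxx)   # for an individual yp

ci_margin = t_crit * s_yp
pi_margin = t_crit * s_ind

print(y_p, round(ci_margin, 2), round(pi_margin, 2))
```

The prediction interval is wider because it accounts for the variability of an individual week around the mean, in addition to the uncertainty in the estimated mean itself.
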
Residual Analysis

Residual for Observation i

yi − ŷi

Standardized Residual for Observation i

(yi − ŷi) / s_(yi − ŷi)

where: s_(yi − ŷi) = s√(1 − hi)

and hi = 1/n + (xi − x̄)² / Σ(xi − x̄)²

Example: Reed Auto Sales

Residuals

Observation    Predicted Cars Sold    Residuals
1              15                     -1
2              25                     -1
3              20                     -2
4              15                      2
5              25                      2

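The residuals in the table, and the standardized residuals defined on the previous slide, can be computed as follows. This is a sketch using the slide formulas for hi and s_(yi − ŷi); the variable names (`leverage`, `std_residuals`) are illustrative.

```python
import math

ads = [1, 3, 2, 1, 3]
cars = [14, 24, 18, 17, 27]
b0, b1 = 10, 5                      # estimated regression equation from the slides
n = len(ads)

x_bar = sum(ads) / n
sxx = sum((x - x_bar) ** 2 for x in ads)

predicted = [b0 + b1 * x for x in ads]
residuals = [y - yh for y, yh in zip(cars, predicted)]

# Standard error of the estimate: s = sqrt(SSE / (n - 2))
s = math.sqrt(sum(e * e for e in residuals) / (n - 2))

# hi = 1/n + (xi - x_bar)^2 / sum of (xi - x_bar)^2
leverage = [1 / n + (x - x_bar) ** 2 / sxx for x in ads]

# Standardized residual: residual divided by s * sqrt(1 - hi)
std_residuals = [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]

for i, (yh, e, z) in enumerate(zip(predicted, residuals, std_residuals), start=1):
    print(i, yh, e, round(z, 3))
```
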
Example: Reed Auto Sales

Residual Plot

[Figure: TV Ads Residual Plot. Residuals (vertical axis, -3 to 3) against TV Ads (horizontal axis, 0 to 4)]

Residual Analysis

Residual Plot

[Figure: residual plot of y − ŷ labeled "Good Pattern"]

Residual Analysis

Residual Plot

[Figure: residual plot of y − ŷ labeled "Nonconstant Variance"]

Residual Analysis

Residual Plot

[Figure: residual plot of y − ŷ labeled "Model Form Not Adequate"]

End of Chapter 12
