


Prem Mann, Introductory Statistics, 7/E

Copyright © 2010 John Wiley & Sons. All rights reserved

• Simple Regression
• Linear Regression

Simple Regression
A regression model is a mathematical equation
that describes the relationship between two or
more variables. A simple regression model
includes only two variables: one independent and
one dependent. The dependent variable is the
one being explained, and the independent variable
is the one used to explain the variation in the
dependent variable.

Linear Regression

A (simple) regression model that gives a
straight-line relationship between two
variables is called a linear regression
model.

Figure 13.1 Relationship between food expenditure and
income. (a) Linear relationship. (b) Nonlinear
relationship.

Figure 13.2 Plotting a linear equation.

Figure 13.3 y-intercept and slope of a line.

• Scatter Diagram
• Least Squares Line
• Interpretation of a and b
• Assumptions of the Regression Model


In the regression model y = A + Bx + ε,
A is called the y-intercept or constant term,
B is the slope, and ε is the random error
term. The dependent and independent
variables are y and x, respectively.


In the model ŷ = a + bx, a and b, which are
calculated using sample data, are called
the estimates of A and B, respectively.

Table 13.1 Incomes (in hundreds of dollars) and
Food Expenditures of Seven Households

Scatter Diagram

A plot of paired observations is called a
scatter diagram.

Figure 13.4 Scatter diagram.

Figure 13.5 Scatter diagram and straight lines.

Figure 13.6 Regression Line and random errors.

Error Sum of Squares (SSE)
The error sum of squares, denoted SSE, is

$$\mathrm{SSE} = \sum e^2 = \sum (y - \hat{y})^2$$

The values of a and b that give the minimum SSE
are called the least squares estimates of A and B,
and the regression line obtained with these
estimates is called the least squares line.

The Least Squares Line

For the least squares regression line

ŷ = a + bx,

$$b = \frac{SS_{xy}}{SS_{xx}} \quad \text{and} \quad a = \bar{y} - b\bar{x}$$

The Least Squares Line

where

$$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} \quad \text{and} \quad SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$$

and SS stands for “sum of squares”. The
least squares regression line ŷ = a + bx is
also called the regression of y on x.
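
The slides state the formulas for a and b without showing where they come from. A brief sketch of the standard calculus argument (not taken from the slides) is to set the partial derivatives of SSE with respect to a and b equal to zero:

```latex
\mathrm{SSE}(a, b) = \sum (y - a - bx)^2

\frac{\partial\,\mathrm{SSE}}{\partial a} = -2 \sum (y - a - bx) = 0
\quad\Longrightarrow\quad a = \bar{y} - b\bar{x}

\frac{\partial\,\mathrm{SSE}}{\partial b} = -2 \sum x\,(y - a - bx) = 0
\quad\Longrightarrow\quad
b = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}
  = \frac{SS_{xy}}{SS_{xx}}
```

The second line follows by substituting a = ȳ − b x̄ into the second condition and collecting the terms in b.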

Example 13-1

Find the least squares regression line for
the data on incomes and food expenditures
of the seven households given in Table
13.1. Use income as the independent
variable and food expenditure as the
dependent variable.

Table 13.2

Example 13-1: Solution

$$\sum x = 386 \qquad \sum y = 108$$
$$\bar{x} = \sum x / n = 386 / 7 = 55.1429$$
$$\bar{y} = \sum y / n = 108 / 7 = 15.4286$$

Example 13-1: Solution

$$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 6403 - \frac{(386)(108)}{7} = 447.5714$$
$$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 23{,}058 - \frac{(386)^2}{7} = 1772.8571$$

Example 13-1: Solution

$$b = \frac{SS_{xy}}{SS_{xx}} = \frac{447.5714}{1772.8571} = .2525$$
$$a = \bar{y} - b\bar{x} = 15.4286 - (.2525)(55.1429) = 1.5050$$

Thus, our estimated regression model is

ŷ = 1.5050 + .2525 x
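
The arithmetic in Example 13-1 can be checked with a short script. This is a minimal sketch that works only from the sums quoted in the solution (the raw table is not reproduced here); the function name `least_squares_from_sums` is illustrative, not something from the textbook.

```python
def least_squares_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return (a, b, ss_xy, ss_xx) for the line y-hat = a + b*x."""
    ss_xy = sum_xy - sum_x * sum_y / n   # SSxy
    ss_xx = sum_x2 - sum_x ** 2 / n      # SSxx
    b = ss_xy / ss_xx                    # slope
    a = sum_y / n - b * (sum_x / n)      # intercept: y-bar minus b times x-bar
    return a, b, ss_xy, ss_xx


# Sums quoted in the Example 13-1 solution (n = 7 households).
a, b, ss_xy, ss_xx = least_squares_from_sums(
    n=7, sum_x=386, sum_y=108, sum_xy=6403, sum_x2=23058
)
print(f"SSxy = {ss_xy:.4f}, SSxx = {ss_xx:.4f}")
print(f"b = {b:.4f}, a = {a:.4f}")
```

Because the script carries b at full precision, the intercept it prints (about 1.507) differs slightly from the slide's 1.5050, which is computed from the rounded slope .2525.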

Figure 13.7 Error of prediction.

Interpretation of a and b
Interpretation of a
• Consider a household with zero income.
Using the estimated regression line
obtained in Example 13-1,
• ŷ = 1.5050 + .2525(0) = $1.5050 hundred
• Thus, we can state that a household with
no income is expected to spend $150.50
per month on food.
• The regression line is valid only for
values of x between 33 and 83, so this
estimate for x = 0 should be treated with caution.
Interpretation of a and b

Interpretation of b
• The value of b in the regression model
gives the change in y (dependent variable)
due to a change of one unit in x
(independent variable).
• We can state that, on average, a $100 (or
$1) increase in the income of a household will
increase the food expenditure by $25.25
(or $.2525).

Figure 13.8 Positive and negative linear
relationships between x and y.

Assumptions of the Regression Model
Assumption 1:
The random error term ε has a mean
equal to zero for each x.

Assumption 2:
The errors associated with different
observations are independent

Assumptions of the Regression Model
Assumption 3:
For any given x, the distribution of errors is
normal.

Assumption 4:
The distribution of population errors for
each x has the same (constant) standard
deviation, which is denoted σ_ε.

Figure 13.11 (a) Errors for households with an
income of $4000 per month.

Figure 13.11 (b) Errors for households with an
income of $7500 per month.

Figure 13.12 Distribution of errors around the
population regression line.

Figure 13.13 Nonlinear relations between x and y.

Degrees of Freedom for a Simple Linear
Regression Model
The degrees of freedom for a simple
linear regression model are
df = n – 2

Figure 13.14 Spread of errors for x = 40 and x = 75.

The standard deviation of errors is
calculated as

$$s_e = \sqrt{\frac{SS_{yy} - b\,SS_{xy}}{n - 2}}$$

where

$$SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n}$$

Example 13-2
Compute the standard deviation of errors
se for the data on monthly incomes and
food expenditures of the seven households
given in Table 13.1.

Table 13.3

Example 13-2: Solution

$$SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 1792 - \frac{(108)^2}{7} = 125.7143$$
$$s_e = \sqrt{\frac{SS_{yy} - b\,SS_{xy}}{n - 2}} = \sqrt{\frac{125.7143 - .2525(447.5714)}{7 - 2}} = 1.5939$$


Total Sum of Squares (SST)

The total sum of squares, denoted by
SST, is calculated as

$$SST = \sum y^2 - \frac{(\sum y)^2}{n}$$

Note that this is the same formula that we
used to calculate SSyy.
Figure 13.15 Total errors.

Table 13.4

Figure 13.16 Errors of prediction when
regression model is used.


Regression Sum of Squares (SSR)

The regression sum of squares, denoted
by SSR, is

$$SSR = \sum (\hat{y} - \bar{y})^2 = SST - SSE$$


Coefficient of Determination
The coefficient of determination, denoted
by r², represents the proportion of SST that is
explained by the use of the regression model.
The computational formula for r² is

$$r^2 = \frac{b\,SS_{xy}}{SS_{yy}}$$

and 0 ≤ r² ≤ 1

Example 13-3
For the data of Table 13.1 on monthly
incomes and food expenditures of seven
households, calculate the coefficient of
determination.

Example 13-3: Solution
• From earlier calculations made in Examples 13-1
and 13-2,
• b = .2525, SSxy = 447.5714, SSyy = 125.7143

$$r^2 = \frac{b\,SS_{xy}}{SS_{yy}} = \frac{(.2525)(447.5714)}{125.7143} = .90$$
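
To make the "proportion of SST explained" reading concrete, here is a small sketch that splits the total sum of squares into explained and unexplained parts. It uses the rounded values from Examples 13-1 and 13-2 and relies on the identity SSR = b·SSxy, which follows from the computational formula for r²; the variable names are illustrative.

```python
# Rounded values taken from Examples 13-1 and 13-2.
b, ss_xy, ss_yy = 0.2525, 447.5714, 125.7143

sst = ss_yy        # total sum of squares (same formula as SSyy)
ssr = b * ss_xy    # regression (explained) sum of squares
sse = sst - ssr    # error (unexplained) sum of squares
r_squared = ssr / sst

print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}")
print(f"r^2 = {r_squared:.2f}")
```

Roughly 90% of the variation in food expenditure is accounted for by income in this sample, and the remaining 10% or so is left in SSE.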

• Sampling Distribution of b
• Estimation of B
• Hypothesis Testing About B

Sampling Distribution of b
Mean, Standard Deviation, and Sampling
Distribution of b
Because of the assumption of normally
distributed random errors, the sampling
distribution of b is normal. The mean and
standard deviation of b, denoted by μ_b and
σ_b, respectively, are

$$\mu_b = B \quad \text{and} \quad \sigma_b = \frac{\sigma_\varepsilon}{\sqrt{SS_{xx}}}$$
Estimation of B
Confidence Interval for B
The (1 – α)100% confidence interval for B
is given by

$$b \pm t\,s_b \quad \text{where} \quad s_b = \frac{s_e}{\sqrt{SS_{xx}}}$$

and the value of t is obtained from the t
distribution table for α/2 area in the right tail
of the t distribution and n – 2 degrees of
freedom.

Example 13-4
Construct a 95% confidence interval for B
for the data on incomes and food
expenditures of seven households given in
Table 13.1.

Example 13-4: Solution

$$s_b = \frac{s_e}{\sqrt{SS_{xx}}} = \frac{1.5939}{\sqrt{1772.8571}} = .0379$$
$$df = n - 2 = 7 - 2 = 5$$
$$\alpha / 2 = (1 - .95) / 2 = .025$$
$$t = 2.571$$
$$b \pm t\,s_b = .2525 \pm 2.571(.0379) = .2525 \pm .0974 = .155 \text{ to } .350$$

Hypothesis Testing About B

Test Statistic for b

The value of the test statistic t for b is
calculated as

$$t = \frac{b - B}{s_b}$$

The value of B is substituted from the null
hypothesis.

Example 13-5

Test at the 1% significance level whether
the slope of the regression line for the
example on incomes and food expenditures
of seven households is positive.

Example 13-5: Solution
Step 1:
• H0: B = 0 (The slope is zero)
• H1: B > 0 (The slope is positive)

Step 2:
• σ_ε is not known
• Hence, we will use the t distribution to
make the test about B

Example 13-5: Solution
Step 3:
• α = .01
• Area in the right tail = α = .01
• df = n – 2 = 7 – 2 = 5
• The critical value of t is 3.365

Figure 13.17

Example 13-5: Solution
Step 4:

$$t = \frac{b - B}{s_b} = \frac{.2525 - 0}{.0379} = 6.662$$

where B = 0 is taken from H0.

Example 13-5: Solution

Step 5:
• The value of the test statistic t = 6.662
• It is greater than the critical value of t = 3.365
• It falls in the rejection region
• Hence, we reject the null hypothesis
• We conclude that x (income) determines y
(food expenditure) positively.
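
Steps 3 to 5 of Example 13-5 amount to computing one ratio and comparing it with a table value. A minimal sketch, assuming the rounded s_b = .0379 and the critical value 3.365 quoted in the slides (the function name is illustrative):

```python
def right_tailed_slope_test(b, s_b, t_crit, b_null=0.0):
    """Return (t_stat, reject) for H0: B = b_null against H1: B > b_null."""
    t_stat = (b - b_null) / s_b
    return t_stat, t_stat > t_crit


t_stat, reject = right_tailed_slope_test(b=0.2525, s_b=0.0379, t_crit=3.365)
print(f"t = {t_stat:.3f}, reject H0: {reject}")
```

For a left-tailed test (H1: B < 0) the comparison would instead be `t_stat < -t_crit`.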

• Linear Correlation Coefficient
• Hypothesis Testing About the Linear
Correlation Coefficient

Linear Correlation Coefficient

Value of the Correlation Coefficient

The value of the correlation coefficient
always lies in the range of –1 to 1; that is,
-1 ≤ ρ ≤ 1 and -1 ≤ r ≤ 1

Figure 13.18 Linear correlation between two variables.
(a) Perfect positive linear correlation, r = 1

(b) Perfect negative linear correlation, r = -1

(c) No linear correlation, r ≈ 0

Figure 13.19 Linear correlation between variables.

Linear Correlation Coefficient
The simple linear correlation coefficient, denoted by
r, measures the strength of the linear
relationship between two variables for a
sample and is calculated as

$$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}}$$

Example 13-6

Calculate the correlation coefficient for the
example on incomes and food expenditures
of seven households.

Example 13-6: Solution

$$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}} = \frac{447.5714}{\sqrt{(1772.8571)(125.7143)}} = .95$$

Hypothesis Testing About the Linear Correlation Coefficient
Test Statistic for r
If both variables are normally distributed
and the null hypothesis is H0: ρ = 0, then
the value of the test statistic t is calculated as

$$t = r\sqrt{\frac{n - 2}{1 - r^2}}$$

Here n – 2 are the degrees of freedom.

Example 13-7

Using the 1% level of significance and the
data from Example 13-1, test whether the
linear correlation coefficient between
incomes and food expenditures is positive.
Assume that the populations of both
variables are normally distributed.

Example 13-7: Solution
Step 1:
• H0: ρ = 0 (The linear correlation coefficient
is zero)
• H1: ρ > 0 (The linear correlation coefficient
is positive)

Step 2: The populations of both
variables are normally distributed. Hence
we can use the t distribution to perform this
test about the linear correlation coefficient.
Example 13-7: Solution

Step 3:
• Area in the right tail = .01
• df = n – 2 = 7 – 2 = 5
• The critical value of t = 3.365

Figure 13.20

Example 13-7: Solution
Step 4:

$$t = r\sqrt{\frac{n - 2}{1 - r^2}} = .95\sqrt{\frac{7 - 2}{1 - (.95)^2}} = 6.803$$

Example 13-7: Solution
Step 5:
• The value of the test statistic t = 6.803
• It is greater than the critical value of t = 3.365
• It falls in the rejection region
• Hence, we reject the null hypothesis
• We conclude that there is a positive
relationship between incomes and food
expenditures.
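
The test statistic for r in Example 13-7 can be checked the same way. This sketch assumes the rounded r = .95 and the table value 3.365 from the slides; `t_stat_for_r` is an illustrative name.

```python
import math


def t_stat_for_r(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


t_stat = t_stat_for_r(r=0.95, n=7)
reject = t_stat > 3.365   # right-tailed test at the 1% level, df = 5
print(f"t = {t_stat:.3f}, reject H0: {reject}")
```

The formula breaks down when r is exactly ±1, since the denominator becomes zero; the sketch does not guard against that case.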

Example 13-8
A random sample of eight drivers insured
with a company and having similar auto
insurance policies was selected. The
following table lists their driving experience
(in years) and monthly auto insurance
premiums.

Example 13-8
a) Does the insurance premium depend on the
driving experience or does the driving experience
depend on the insurance premium? Do you
expect a positive or a negative relationship
between these two variables?
b) Compute SSxx, SSyy, and SSxy.
c) Find the least squares regression line by
choosing appropriate dependent and
independent variables based on your answer in
part a.
d) Interpret the meaning of the values of a and b
calculated in part c.
Example 13-8
e) Plot the scatter diagram and the regression line.
f) Calculate r and r2 and explain what they mean.
g) Predict the monthly auto insurance for a driver
with 10 years of driving experience.
h) Compute the standard deviation of errors.
i) Construct a 90% confidence interval for B.
j) Test at the 5% significance level whether B is
negative.
k) Using α = .05, test whether ρ is different from
zero.

Example 13-8: Solution
a) Based on theory and intuition, we
expect the insurance premium to
depend on driving experience.
• The insurance premium is the dependent
variable.
• The driving experience is the independent
variable.

Table 13.5

Example 13-8: Solution

b)
$$\bar{x} = \sum x / n = 90 / 8 = 11.25$$
$$\bar{y} = \sum y / n = 474 / 8 = 59.25$$
$$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 4739 - \frac{(90)(474)}{8} = -593.5000$$
$$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 1396 - \frac{(90)^2}{8} = 383.5000$$
$$SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 29{,}642 - \frac{(474)^2}{8} = 1557.5000$$

Example 13-8: Solution

c)
$$b = \frac{SS_{xy}}{SS_{xx}} = \frac{-593.5000}{383.5000} = -1.5476$$
$$a = \bar{y} - b\bar{x} = 59.25 - (-1.5476)(11.25) = 76.6605$$

$$\hat{y} = 76.6605 - 1.5476x$$

Example 13-8: Solution

d) The value of a = 76.6605 gives the
value of ŷ for x = 0; that is, it gives the
monthly auto insurance premium for a
driver with no driving experience.
The value of b = -1.5476 indicates that,
on average, for every extra year of
driving experience, the monthly auto
insurance premium decreases by $1.55.

Figure 13.21 Scatter diagram and the regression line.
e) The regression line slopes downward from
left to right.

Example 13-8: Solution

$$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}} = \frac{-593.5000}{\sqrt{(383.5000)(1557.5000)}} = -.77$$
$$r^2 = \frac{b\,SS_{xy}}{SS_{yy}} = \frac{(-1.5476)(-593.5000)}{1557.5000} = .59$$

Example 13-8: Solution
f) The value of r = -0.77 indicates that the
driving experience and the monthly auto
insurance premium are negatively related.
The (linear) relationship is strong but not
very strong.
The value of r² = 0.59 states that 59% of
the total variation in insurance premiums is
explained by years of driving experience
and 41% is not.

Example 13-8: Solution

g) Using the estimated regression line, we
find the predicted value of y for x = 10 is

ŷ = 76.6605 – 1.5476(10) = $61.18

Thus, we expect the monthly auto
insurance premium of a driver with 10
years of driving experience to be $61.18.

Example 13-8: Solution

h)
$$s_e = \sqrt{\frac{SS_{yy} - b\,SS_{xy}}{n - 2}} = \sqrt{\frac{1557.5000 - (-1.5476)(-593.5000)}{8 - 2}} = 10.3199$$

Example 13-8: Solution

i)
$$s_b = \frac{s_e}{\sqrt{SS_{xx}}} = \frac{10.3199}{\sqrt{383.5000}} = .5270$$
$$\alpha / 2 = .5 - (.90 / 2) = .05$$
$$df = n - 2 = 8 - 2 = 6$$
$$t = 1.943$$
$$b \pm t\,s_b = -1.5476 \pm 1.943(.5270) = -1.5476 \pm 1.0240 = -2.57 \text{ to } -.52$$

Example 13-8: Solution

j) Step 1:
• H0: B = 0 (B is not negative)
• H1: B < 0 (B is negative)

Example 13-8: Solution
Step 2: Because the standard deviation of
the error is not known, we use the t
distribution to make the hypothesis test.

Step 3:
• Area in the left tail = α = .05
• df = n – 2 = 8 – 2 = 6
• The critical value of t is -1.943

Figure 13.22

Example 13-8: Solution
Step 4:

$$t = \frac{b - B}{s_b} = \frac{-1.5476 - 0}{.5270} = -2.937$$

where B = 0 is taken from H0.

Example 13-8: Solution

Step 5:
• The value of the test statistic t = -2.937
• It falls in the rejection region
• Hence, we reject the null hypothesis and
conclude that B is negative
• The monthly auto insurance premium
decreases with an increase in years of
driving experience.

Example 13-8: Solution

k) Step 1:
• H0: ρ = 0 (The linear correlation coefficient
is zero)
• H1: ρ ≠ 0 (The linear correlation coefficient
is different from zero)

Example 13-8: Solution
Step 2: Assuming that variables x and y
are normally distributed, we will use the t
distribution to perform this test about the
linear correlation coefficient.

Step 3:
• Area in each tail = .05/2 = .025
• df = n – 2 = 8 – 2 = 6
• The critical values of t are -2.447 and
2.447
Figure 13.23

Example 13-8: Solution
Step 4:

$$t = r\sqrt{\frac{n - 2}{1 - r^2}} = (-.77)\sqrt{\frac{8 - 2}{1 - (-.77)^2}} = -2.956$$

Example 13-8: Solution

Step 5:
• The value of the test statistic t = -2.956
• It falls in the rejection region
• Hence, we reject the null hypothesis
• We conclude that the linear correlation
coefficient between driving experience and
auto insurance premium is different from
zero.
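
The numerical parts of Example 13-8 can be traced end to end from the five sums quoted in part (b). The sketch below is only a check on the arithmetic: the critical value 1.943 is the table value from the solution, and all variable names are illustrative.

```python
import math

# Sums quoted in the part (b) solution (n = 8 drivers).
n, sum_x, sum_y = 8, 90, 474
sum_xy, sum_x2, sum_y2 = 4739, 1396, 29642

# Part (b): sums of squares.
ss_xy = sum_xy - sum_x * sum_y / n
ss_xx = sum_x2 - sum_x ** 2 / n
ss_yy = sum_y2 - sum_y ** 2 / n

# Part (c): least squares line y-hat = a + b*x.
b = ss_xy / ss_xx
a = sum_y / n - b * sum_x / n

# Part (f): correlation and coefficient of determination.
r = ss_xy / math.sqrt(ss_xx * ss_yy)
r_squared = r ** 2

# Part (g): predicted premium for 10 years of experience.
y_hat_10 = a + b * 10

# Parts (h) and (i): standard deviation of errors and 90% CI for B.
s_e = math.sqrt((ss_yy - b * ss_xy) / (n - 2))
s_b = s_e / math.sqrt(ss_xx)
ci_90 = (b - 1.943 * s_b, b + 1.943 * s_b)

# Parts (j) and (k): test statistics for the slope and for rho.
t_slope = (b - 0) / s_b
t_corr = r * math.sqrt((n - 2) / (1 - r ** 2))

print(f"b = {b:.4f}, a = {a:.4f}, r = {r:.2f}, r^2 = {r_squared:.2f}")
print(f"y-hat(10) = {y_hat_10:.2f}, s_e = {s_e:.4f}, s_b = {s_b:.4f}")
print(f"90% CI for B: {ci_90[0]:.2f} to {ci_90[1]:.2f}")
print(f"t (slope) = {t_slope:.3f}, t (correlation) = {t_corr:.3f}")
```

At full precision the two test statistics coincide, because t = b / s_b and t = r√((n − 2)/(1 − r²)) are algebraically the same quantity in simple linear regression. The slides report -2.937 and -2.956 only because part (k) uses r rounded to -.77.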

• Using the Regression Model for Estimating
the Mean Value of y
• Using the Regression Model for Predicting
a Particular Value of y

Figure 13.24 Population and sample regression lines.

Using the Regression Model for Estimating the
Mean Value of y

Confidence Interval for μy|x

The (1 – α)100% confidence interval for
μy|x for x = x0 is

$$\hat{y} \pm t\,s_{\hat{y}_m}$$

where the value of t is obtained from the t
distribution table for α/2 area in the right
tail of the t distribution curve and df = n – 2.

The value of s_ŷm is calculated as follows:

$$s_{\hat{y}_m} = s_e\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_{xx}}}$$

Example 13-9

Refer to Example 13-1 on incomes and food
expenditures. Find a 99% confidence
interval for the mean food expenditure for all
households with a monthly income of $5500.

Example 13-9: Solution
• Using the regression line estimated in
Example 13-1, we find the point estimate
of the mean food expenditure for x = 55
• ŷ = 1.5050 + .2525(55) = $15.3925 hundred
• Area in each tail = α/2 = (1 – .99)/2
= .005
• df = n – 2 = 7 – 2 = 5
• t = 4.032

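Example 13-9: Solution

$$s_e = 1.5939, \quad \bar{x} = 55.1429, \quad \text{and} \quad SS_{xx} = 1772.8571$$
$$s_{\hat{y}_m} = s_e\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_{xx}}} = (1.5939)\sqrt{\frac{1}{7} + \frac{(55 - 55.1429)^2}{1772.8571}} = .6025$$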

Example 13-9: Solution

Hence, the 99% confidence interval for μy|55 is

$$\hat{y} \pm t\,s_{\hat{y}_m} = 15.3925 \pm 4.032(.6025) = 15.3925 \pm 2.4293 = 12.9632 \text{ to } 17.8218$$

Using the Regression Model for Predicting a
Particular Value of y
Prediction Interval for yp
The (1 – α)100% prediction interval for
the predicted value of y, denoted by yp, for
x = x0 is

$$\hat{y} \pm t\,s_{\hat{y}_p}$$

where the value of t is obtained from the t
distribution table for α/2 area in the right tail
of the t distribution curve and df = n – 2.
The value of s_ŷp is calculated as follows:

$$s_{\hat{y}_p} = s_e\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_{xx}}}$$
Example 13-10
Refer to Example 13-1 on incomes and
food expenditures. Find a 99% prediction
interval for the predicted food expenditure
for a randomly selected household with a
monthly income of $5500.

Example 13-10: Solution

• Using the regression line estimated in
Example 13-1, we find the point estimate of
the predicted food expenditure for x = 55
• ŷ = 1.5050 + .2525(55) = $15.3925 hundred
• Area in each tail = α/2 = (1 – .99)/2 = .005
• df = n – 2 = 7 – 2 = 5
• t = 4.032

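Example 13-10: Solution

$$s_e = 1.5939, \quad \bar{x} = 55.1429, \quad \text{and} \quad SS_{xx} = 1772.8571$$
$$s_{\hat{y}_p} = s_e\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_{xx}}} = (1.5939)\sqrt{1 + \frac{1}{7} + \frac{(55 - 55.1429)^2}{1772.8571}} = 1.7040$$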

Example 13-10: Solution

Hence, the 99% prediction interval for yp for x = 55 is

$$\hat{y} \pm t\,s_{\hat{y}_p} = 15.3925 \pm 4.032(1.7040) = 15.3925 \pm 6.8705 = 8.5220 \text{ to } 22.2630$$

• Extrapolation: The regression line estimated
for the sample data is reliable only for the
range of x values observed in the sample.

• Causality: The regression line does not
prove causality between two variables; that
is, it does not predict that a change in y is
caused by a change in x.
