
Applied Statistics II

Chapter 7: The Relationship Between Two Variables

Jian Zou

WPI Mathematical Sciences

1 / 73
Displaying and Summarizing Association in Bivariate Quantitative
Data

Correlation

Simple Linear Regression

The Relationship Between Two Categorical Variables

2 / 73
One Variable

In the first 6 chapters, we dealt with just one variable at a time. This
variable, when measured on many different subjects or objects,
took the form of a list of numbers.

e.g.: Height of a student in the class (variable).

data set: 64”, 62”, 69”, 72”, 73”, 65”, 64”, 67”.

3 / 73
Two or more Variables

Often, when collecting or analyzing data, there are two or more
variables of interest. In many cases it is important to know
whether two or more variables are related, and if they are, what
the nature of that relationship is.

e.g.: What is the relationship between height (variable 1) and
weight (variable 2) of a student in the class?
data set:
height 64” 62” 69” 72” 73” 65” 64” 67”
weight 148 152 170 200 175 169 155 162

4 / 73
Two or more Variables

Data sets which consist of more than one variable are called
multivariate; those which consist of two variables are called
bivariate. For now, and most of this course, we will stick to
relations between only two variables.

5 / 73
Summarizing Association in Bivariate Quantitative Data

To parallel our discussion of the one-variable case, we discuss
Graphical Summaries and Numerical Summaries.

6 / 73
Graphical Summaries (sec 7.1)

The basic plot for showing the relation between two quantitative
variables is the scatterplot, which plots the points (x, y).

It summarizes the relationship in a bivariate (or
X-Y) plot that plots one variable against the other.

7 / 73
Scatterplots

Here are the things to look for:

1. Is there any relationship?
Here we look for the presence of association (linear or nonlinear
(quadratic or something else)).

2. Type of the association
Is the relationship linear, quadratic, or something else?
(After this point we discuss only linear associations.)

3. Direction of the association
Is the association positive (Y increases when X is increased) or
negative (Y decreases when X is increased).

8 / 73
Scatterplots

9 / 73
Scatterplots
4. Strength of the association
Is the association strong (clear pattern) or weak (fuzzy pattern).

10 / 73
Independent and Dependent Variables

When dealing with a relation, the X and Y variables have
particular roles to play. The X (horizontal axis) variable represents
the variable that explains what we see in Y (the vertical axis).

Often, X is a quantity which we can change or have control over.
Or at least, we want to see how Y reacts to different values of X.

So X is usually called the independent or explanatory variable,
and Y the dependent or response variable.

11 / 73
Numerical Summaries (sec 7.2)

Here we focus on measuring linear association in bivariate


quantitative data. The Pearson correlation coefficient is a measure
of the strength of linear association between two quantitative
variables.

12 / 73
Summary Measures
Suppose there are n pairs of random measurements (Xi, Yi),
i = 1, 2, 3, · · · , n.
e.g.: the height and arm span of n individuals.

We can summarize the location (or center) of each variable by the


sample means:
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \quad \text{and} \quad \bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$$

We also can summarize the spread of each variable by the sample
standard deviations:

$$S_X = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2} \quad \text{and} \quad S_Y = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(Y_i - \bar{Y})^2}$$

(however, none of these summary measures says anything about


the relationship between the two variables).
13 / 73
Pearson Correlation Coefficient

Then the Pearson correlation between X and Y computed from
these data is

$$r = \frac{1}{n-1}\sum_{i=1}^{n} X_i' Y_i',$$

where
$$X_i' = \frac{X_i - \bar{X}}{S_X} \quad \text{and} \quad Y_i' = \frac{Y_i - \bar{Y}}{S_Y}$$
are the standardized data.

14 / 73
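Note (added illustration, not part of the original slides): a minimal Python/NumPy sketch of this computation, standardizing each variable and averaging the products; it uses the tool-wear data that appear later in EX. 2.

```python
import numpy as np

# Tool-wear data from EX. 2: X = duration of use, Y = wear
x = np.array([130, 120, 100, 75, 70, 55, 42, 30, 19], dtype=float)
y = np.array([0.09, 0.09, 0.07, 0.07, 0.06, 0.05, 0.04, 0.03, 0.02])
n = len(x)

# Standardize each variable with its sample mean and sample SD (ddof=1 gives the n-1 divisor)
x_std = (x - x.mean()) / x.std(ddof=1)
y_std = (y - y.mean()) / y.std(ddof=1)

# Pearson correlation: sum of products of the standardized pairs, divided by n - 1
r = np.sum(x_std * y_std) / (n - 1)
print(r)  # should agree with np.corrcoef(x, y)[0, 1]
```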
Pearson Correlation Coefficient
The following illustrates what the Pearson correlation measures.

15 / 73
Pearson Correlation Coefficient

Things to know about Pearson correlation


• Pearson correlation is always between -1 and 1. Values near 1
signify strong positive linear association. Values near -1 signify
strong negative linear association. Values near 0 signify weak linear
association.
• Correlation between X and Y is the same as the correlation
between Y and X.
• If SX = 0 and/or SY = 0, we define r to be 0. This is because
  – the formula doesn’t work in this case (division of 0 by 0);
  – the standard deviation equals 0 if and only if all data values are
    equal, so
  – there can be no association since there is no variation.

16 / 73
Inference using Pearson correlation

If the data are a random sample from a large population, we can


consider inference about the population correlation, ρ, which would
be the Pearson correlation computed for all units in the population.

If n is the sample size,

$$t = (r - \rho)\sqrt{\frac{n-2}{(1-r^2)(1-\rho^2)}}$$

has approximately a $t_{n-2}$ distribution. We can use this fact to
obtain a confidence interval for ρ.

17 / 73
Hypothesis testing for correlation

1. Null and Alternative Hypotheses:

$$H_0: \rho = \rho_0 \quad \text{vs} \quad \begin{cases} H_a^{+}: \rho > \rho_0 \\ H_a^{-}: \rho < \rho_0 \\ H_a^{\pm}: \rho \neq \rho_0 \end{cases}$$

2. Test Statistic

$$t^* = (r - \rho_0)\sqrt{\frac{n-2}{(1-r^2)(1-\rho_0^2)}}.$$

Here t ∗ is the observed value of t.

18 / 73
Hypothesis testing for correlation

3. P-value
• For the test of H0 versus Ha+

p-value = p + = P(T > t ∗ ), where T ∼ tn−2 .

• For the test of H0 versus Ha−

p-value = p − = P(T < t ∗ ), where T ∼ tn−2 .

• For the test of H0 versus Ha±

p-value = 2 min (p − , p + ).

19 / 73
Example
EX. 2. Consider the following data relating wear on a certain kind of
carbide-steel cutting tool to the duration of use.
Y (wear): 0.09 0.09 0.07 0.07 0.06 0.05 0.04 0.03 0.02
X (time): 130 120 100 75 70 55 42 30 19
If the Pearson correlation coefficient of the data is 0.9797, perform a
hypothesis test to test whether the population correlation exceeds
0.9.

20 / 73
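Note (added sketch, not the original worked solution): the test for EX. 2 can be carried out numerically as below, assuming SciPy is available for the t distribution.

```python
import numpy as np
from scipy import stats

r, rho0, n = 0.9797, 0.9, 9   # sample correlation, null value rho_0, sample size from EX. 2

# t* = (r - rho_0) * sqrt( (n - 2) / ((1 - r^2)(1 - rho_0^2)) )
t_star = (r - rho0) * np.sqrt((n - 2) / ((1 - r**2) * (1 - rho0**2)))

# Ha+: rho > 0.9, so the p-value is P(T > t*) with T ~ t_{n-2}
p_value = stats.t.sf(t_star, df=n - 2)
print(t_star, p_value)
```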
Example

21 / 73
Simple linear regression (SLR) model (sec 7.3)

The SLR model attempts to quantify the relationship between a
single predictor variable Z and a response variable Y. This
reasonably flexible yet simple model has the form

Y = β0 + β1 X(Z) + ε

where ε is a random error term having mean 0 and variance σ²
(often assumed to have a N(0, σ²) distribution) and X(Z) is a
function of Z, such as Z, Z², or ln(Z).

22 / 73
SLR Model

The function X is called the regressor. Often, we omit specifying
the dependence of the regressor X on the predictor Z, and just
write the model as

Y = β0 + β1 X + ε

The terms “simple” and “linear” mean:

• Simple:

• Linear:

23 / 73
Transformation
Consider an example of equivalence ratio data. Here is a plot of
fuel consumption versus equivalence ratio (left) and versus the fifth
power of equivalence ratio (right).

[Figure: two scatterplots of fuel consumption (F_CONS); left panel versus equivalence ratio (E_RATIO), right panel versus the fifth power of equivalence ratio (E5).]

24 / 73
Transformation

The plot on the right seems to show a nearly linear association. In


terms of the SLR model, we have

$$Y = \beta_0 + \beta_1 X(Z) + \epsilon = \beta_0 + \beta_1 Z^5 + \epsilon$$

where the response Y is fuel consumption, the predictor Z is
equivalence ratio, and the regressor X(Z) is Z⁵, the fifth power of
equivalence ratio.

25 / 73
SLR Model

When fitting regression models, we’ll use least squares. Using


calculus, we find the least squares estimators of β0 and β1 to be

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$$

and

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$

26 / 73
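Note (added sketch): the least squares formulas can be evaluated directly with NumPy; here they are applied to the EX. 2 tool-wear data used in the examples that follow.

```python
import numpy as np

x = np.array([130, 120, 100, 75, 70, 55, 42, 30, 19], dtype=float)   # time
y = np.array([0.09, 0.09, 0.07, 0.07, 0.06, 0.05, 0.04, 0.03, 0.02])  # wear

sxy = np.sum((x - x.mean()) * (y - y.mean()))  # sum of cross-products about the means
sxx = np.sum((x - x.mean()) ** 2)              # sum of squares of X about its mean

beta1_hat = sxy / sxx                          # slope estimate
beta0_hat = y.mean() - beta1_hat * x.mean()    # intercept estimate
print(beta0_hat, beta1_hat)
```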
Example
EX. 3. Consider the data in EX. 2, fit a simple linear regression
model for data.
X̄ =

Ȳ =

$\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})$ =

$\sum_{i=1}^{n}(X_i - \bar{X})^2$ =

27 / 73
Example

β̂1 =

β̂0 =

28 / 73
Residuals, Predicted and Fitted Values
• The predicted value of Y at X is

Ŷ = β̂0 + β̂1 X .

• For X = Xi , one of the values in the data set, the predicted


value is called a fitted value and is written

Ŷi = β̂0 + β̂1 Xi .

• The residuals, ei , i = 1, 2, · · · , n are the differences between the


observed and fitted values for each data value:

ei = Yi − Ŷi = Yi − (β̂0 + β̂1 Xi ).

29 / 73
Example
EX. 3. Consider the data in EX. 2,

Yi (wear): 0.09 0.09 0.07 0.07 0.06


Xi (time): 130 120 100 75 70
Ŷi (fitted): 0.0944 0.0882 0.0757 0.0601 0.0570
ei (res): -0.0044 0.0018 -0.0057 0.0099 0.0030

Yi (wear): 0.05 0.04 0.03 0.02


Xi (time): 55 42 30 19
Ŷi (fitted): 0.0477 0.0395 0.0321 0.0252
ei (res): 0.0023 0.0005 -0.0021 -0.0052

30 / 73
Estimator of εi

Recall that we have assumed the responses are produced from the
model

Yi = β0 + β1 Xi + εi

After fitting the model, we have

Yi = β̂0 + β̂1 Xi + ei

β̂0 is the best guess of β0, β̂1 is the best guess of β1, and therefore
ei is the best guess of εi.

31 / 73
Assumptions in SLR

In the simple linear regression model we assume the random errors are
independent and identically distributed from a normal distribution
with mean 0 and variance σ², where σ > 0 (εi ∼ iid N(0, σ²)).

32 / 73
Estimating the error Variance

Here, to estimate the variance of the random error (σ²), we use the
analogous sample quantity, the mean square error of the residuals:

$$S^2 = \text{MSE} = \frac{1}{n-2}\sum_{i=1}^{n} e_i^2$$

33 / 73
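Note (added sketch, assuming the EX. 2 data and the least squares fit above): the residuals and the estimate S² = MSE can be computed as follows.

```python
import numpy as np

x = np.array([130, 120, 100, 75, 70, 55, 42, 30, 19], dtype=float)
y = np.array([0.09, 0.09, 0.07, 0.07, 0.06, 0.05, 0.04, 0.03, 0.02])
n = len(x)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

e = y - (b0 + b1 * x)            # residuals e_i = Y_i - (b0 + b1*X_i)
mse = np.sum(e ** 2) / (n - 2)   # S^2 = MSE, the estimate of sigma^2
print(e.round(4), mse)
```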
Tools to Assess the Quality of the Fit

Residual plots

If the model fits well, the residuals (the ei's) should behave as the
random errors (the εi's) would:

• Residuals should exhibit no patterns when plotted versus the Xi or
the Ŷi. If the model does not fit well, a pattern should show up in the
distribution of the residuals.

• Studentized residuals should be plotted on a normal quantile
plot, or even better, a quantile plot for the $t_{n-3}$ distribution.

34 / 73
Residual plots

35 / 73
Tools to Assess the Quality of the Fit
Coefficient of Determination.

The coefficient of determination, r², a numerical measure of the
quality of the fit, is a measure of the proportion of variation in the
response ”explained” by the regression. The coefficient of
determination is computed as

$$r^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO}$$

where

$$SSTO = \sum_{i=1}^{n}(Y_i - \bar{Y})^2, \qquad SSE = \sum_{i=1}^{n} e_i^2, \qquad SSR = SSTO - SSE$$

are the total sum of squares, error sum of squares, and regression
sum of squares.

36 / 73
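Note (added sketch, not the original example solution): the sums of squares and r² for the EX. 2 tool-wear data.

```python
import numpy as np

x = np.array([130, 120, 100, 75, 70, 55, 42, 30, 19], dtype=float)
y = np.array([0.09, 0.09, 0.07, 0.07, 0.06, 0.05, 0.04, 0.03, 0.02])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
e = y - (b0 + b1 * x)

sse = np.sum(e ** 2)                 # error sum of squares
ssto = np.sum((y - y.mean()) ** 2)   # total sum of squares
ssr = ssto - sse                     # regression sum of squares
r2 = 1 - sse / ssto                  # coefficient of determination
print(sse, ssto, ssr, r2)
```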
Example
EX. 5 Consider the data and the calculated values in EX. 4. Find
an estimate for the error variance. Calculate the coefficient of
determination.
SSE = $\sum_{i=1}^{n} e_i^2$ =

S² =

37 / 73
Example
SSTO = $\sum_{i=1}^{n}(Y_i - \bar{Y})^2$ =

r² =

38 / 73
Inferences of Slope and Intercept
1. Confidence Intervals
Level L confidence intervals for β0 and β1 are

$$\left(\hat{\beta}_0 - \hat{\sigma}(\hat{\beta}_0)\, t_{n-2,\frac{1+L}{2}},\ \ \hat{\beta}_0 + \hat{\sigma}(\hat{\beta}_0)\, t_{n-2,\frac{1+L}{2}}\right),$$

and

$$\left(\hat{\beta}_1 - \hat{\sigma}(\hat{\beta}_1)\, t_{n-2,\frac{1+L}{2}},\ \ \hat{\beta}_1 + \hat{\sigma}(\hat{\beta}_1)\, t_{n-2,\frac{1+L}{2}}\right),$$

respectively, where

$$\hat{\sigma}(\hat{\beta}_0) = \sqrt{\text{MSE}\left[\frac{1}{n} + \frac{\bar{X}^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}\right]},$$

and

$$\hat{\sigma}(\hat{\beta}_1) = \sqrt{\text{MSE}\Big/\sum_{i=1}^{n}(X_i - \bar{X})^2}$$

39 / 73
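Note (added sketch): 95% confidence intervals for β0 and β1 computed from the EX. 2 data, assuming SciPy for the t quantile; this is an illustration, not the slides' worked answer.

```python
import numpy as np
from scipy import stats

x = np.array([130, 120, 100, 75, 70, 55, 42, 30, 19], dtype=float)
y = np.array([0.09, 0.09, 0.07, 0.07, 0.06, 0.05, 0.04, 0.03, 0.02])
n = len(x)

sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()
mse = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)

se_b0 = np.sqrt(mse * (1 / n + x.mean() ** 2 / sxx))  # sigma_hat(beta0_hat)
se_b1 = np.sqrt(mse / sxx)                            # sigma_hat(beta1_hat)

L = 0.95
t_crit = stats.t.ppf((1 + L) / 2, df=n - 2)           # t_{n-2,(1+L)/2}
print((b0 - t_crit * se_b0, b0 + t_crit * se_b0))     # CI for beta0
print((b1 - t_crit * se_b1, b1 + t_crit * se_b1))     # CI for beta1
```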
Example
EX. 5 Consider the data in Ex 4. Find 95% confidence interval for
β0 and β1 .
β̂0 =          β̂1 =

$\hat{\sigma}(\hat{\beta}_0) = \sqrt{\text{MSE}\left[\frac{1}{n} + \frac{\bar{X}^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}\right]}$ =

$\hat{\sigma}(\hat{\beta}_1) = \sqrt{\text{MSE}\Big/\sum_{i=1}^{n}(X_i - \bar{X})^2}$ =

40 / 73
Example
$t_{n-2,\frac{1+L}{2}}$ =

41 / 73
Note:

Whether the interval for β1 contains 0 is of particular interest. If it
does, it means that we cannot statistically distinguish β1 from 0.
This means we must consider as plausible the model for which
β1 = 0:
Y = β0 + ε
This model implies that there is no linear association between Y
and X.

42 / 73
Inferences of Slope and Intercept

2. Hypothesis tests for Slope and Intercept


• Hypotheses:
For i = 0 (intercept) and i = 1 (slope):

$$H_0: \beta_i = \beta_{i0} \quad \text{vs} \quad \begin{cases} H_a^{+}: \beta_i > \beta_{i0} \\ H_a^{-}: \beta_i < \beta_{i0} \\ H_a^{\pm}: \beta_i \neq \beta_{i0} \end{cases}$$

43 / 73
Inferences of Slope and Intercept

• Test Statistic:
Let t* be the observed value of the test statistic. Then

$$t^* = \frac{\hat{\beta}_i - \beta_{i0}}{\hat{\sigma}(\hat{\beta}_i)}.$$

• p-value
• For the test of H0 versus Ha+, p-value = p+ = P(T > t*),
• For the test of H0 versus Ha−, p-value = p− = P(T < t*),
• For the test of H0 versus Ha±, p-value = p± = 2 min(p−, p+),

where T ∼ tn−2 .

44 / 73
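Note (added sketch, in the setting of EX. 6 below): the test of H0: β1 = 0 with a two-sided alternative, on the EX. 2 data, assuming SciPy.

```python
import numpy as np
from scipy import stats

x = np.array([130, 120, 100, 75, 70, 55, 42, 30, 19], dtype=float)
y = np.array([0.09, 0.09, 0.07, 0.07, 0.06, 0.05, 0.04, 0.03, 0.02])
n = len(x)

sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()
mse = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)
se_b1 = np.sqrt(mse / sxx)

t_star = (b1 - 0) / se_b1                    # test statistic for H0: beta1 = 0
p_plus = stats.t.sf(t_star, df=n - 2)        # P(T > t*)
p_minus = stats.t.cdf(t_star, df=n - 2)      # P(T < t*)
p_two_sided = 2 * min(p_plus, p_minus)       # two-sided p-value
print(t_star, p_two_sided)
```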
Example
EX. 6. For the data in Ex. 3., parameter estimates are

β̂0 =

β̂1 =

, and their estimated standard errors are

σ̂(β̂0 ) =

σ̂(β̂1 ) =

45 / 73
Example
a) Perform a hypothesis test to test whether β0 = 0 or not.

46 / 73
Example
b) Perform a hypothesis test to test whether β1 = 0 or not.

47 / 73
Estimation of The Mean Response

• The mean response at X = x0 is µ0 = β0 + β1 x0 .


• The point estimator of µ0 is Ŷ0 = β̂0 + β̂1 x0 .
• A level L confidence interval for µ0 is

$$\left(\hat{Y}_0 - \hat{\sigma}(\hat{Y}_0)\, t_{n-2,\frac{1+L}{2}},\ \ \hat{Y}_0 + \hat{\sigma}(\hat{Y}_0)\, t_{n-2,\frac{1+L}{2}}\right),$$

where

$$\hat{\sigma}(\hat{Y}_0) = \sqrt{\text{MSE}\left[\frac{1}{n} + \frac{(x_0 - \bar{X})^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}\right]}$$

48 / 73
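Note (added sketch, in the setting of EX. 7 on the next slide, x0 = 100): a confidence interval for the mean response, again on the EX. 2 data with SciPy assumed.

```python
import numpy as np
from scipy import stats

x = np.array([130, 120, 100, 75, 70, 55, 42, 30, 19], dtype=float)
y = np.array([0.09, 0.09, 0.07, 0.07, 0.06, 0.05, 0.04, 0.03, 0.02])
n = len(x)

sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()
mse = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)

x0 = 100
y0_hat = b0 + b1 * x0                                        # point estimate of mu_0
se_y0 = np.sqrt(mse * (1 / n + (x0 - x.mean()) ** 2 / sxx))  # sigma_hat(Y0_hat)
t_crit = stats.t.ppf(0.975, df=n - 2)                        # 95% level
print((y0_hat - t_crit * se_y0, y0_hat + t_crit * se_y0))
```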
Example
EX. 7. Consider the data in EX. 3. Compute a 95% confidence
interval for the mean wear of tools at 100 minutes.
Point estimate =

Estimated standard error=

t- value=

49 / 73
Prediction of a Future Observation

A level L prediction interval for a future observation at X = x0 is

$$\left(\hat{Y}_{new} - \hat{\sigma}(Y_{new} - \hat{Y}_{new})\, t_{n-2,\frac{1+L}{2}},\ \ \hat{Y}_{new} + \hat{\sigma}(Y_{new} - \hat{Y}_{new})\, t_{n-2,\frac{1+L}{2}}\right)$$

where

$$\hat{\sigma}(Y_{new} - \hat{Y}_{new}) = \sqrt{\text{MSE}\left[1 + \frac{1}{n} + \frac{(x_0 - \bar{X})^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}\right]}$$

50 / 73
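Note (added sketch, in the setting of EX. 8 on the next slide, x0 = 110): a prediction interval for a single future observation; the only change from the mean-response interval is the extra "1 +" inside the square root.

```python
import numpy as np
from scipy import stats

x = np.array([130, 120, 100, 75, 70, 55, 42, 30, 19], dtype=float)
y = np.array([0.09, 0.09, 0.07, 0.07, 0.06, 0.05, 0.04, 0.03, 0.02])
n = len(x)

sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()
mse = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)

x0 = 110
y_new_hat = b0 + b1 * x0                                           # point predictor
se_pred = np.sqrt(mse * (1 + 1 / n + (x0 - x.mean()) ** 2 / sxx))  # sigma_hat(Y_new - Y_new_hat)
t_crit = stats.t.ppf(0.975, df=n - 2)
print((y_new_hat - t_crit * se_pred, y_new_hat + t_crit * se_pred))
```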
Example
EX. 8. Consider the data in EX. 3. Compute a 95% prediction
interval for the wear of a tool used for 110 minutes.
Point predictor = ŷnew =

Estimated standard error of prediction is

t- value=

51 / 73
The Relation Between Correlation and Regression

If the standardized responses and regressors are

$$Y_i' = \frac{Y_i - \bar{Y}}{S_Y} \quad \text{and} \quad X_i' = \frac{X_i - \bar{X}}{S_X},$$

then the regression equation fitted by least squares can be written
as
$$\hat{Y}' = r\, X',$$
where X' is any value of a regressor variable standardized as
described above, and r is the Pearson correlation between X and Y.

52 / 73
The Relation Between Correlation and Regression
Since the standardized responses and regressors are

$$\hat{Y}_i' = \frac{\hat{Y}_i - \bar{Y}}{S_Y} \quad \text{and} \quad X_i' = \frac{X_i - \bar{X}}{S_X},$$

we get
$$\frac{\hat{Y}_i - \bar{Y}}{S_Y} = r\, \frac{X_i - \bar{X}}{S_X}.$$

This gives
$$\hat{Y} = \left(\bar{Y} - r\frac{S_Y}{S_X}\bar{X}\right) + r\frac{S_Y}{S_X}\, X.$$

This implies
$$\hat{\beta}_1 = r\frac{S_Y}{S_X}, \qquad \hat{\beta}_0 = \bar{Y} - r\frac{S_Y}{S_X}\bar{X} = \bar{Y} - \hat{\beta}_1\bar{X}.$$

53 / 73
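Note (added check, not part of the slides): the identity β̂1 = r·SY/SX is easy to verify numerically on the EX. 2 data.

```python
import numpy as np

x = np.array([130, 120, 100, 75, 70, 55, 42, 30, 19], dtype=float)
y = np.array([0.09, 0.09, 0.07, 0.07, 0.06, 0.05, 0.04, 0.03, 0.02])

r = np.corrcoef(x, y)[0, 1]
b1_from_r = r * y.std(ddof=1) / x.std(ddof=1)   # beta1_hat = r * S_Y / S_X
b1_direct = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
print(b1_from_r, b1_direct)                     # the two values should agree
```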
Example
EX. 9. For the data in EX.4,
X̄ = , SX = ,
Ȳ = , SY = ,
r= ,

so
β̂1 =

β̂0 =

54 / 73
The Relationship Between Two Categorical Variables (sec
7.4)

Recall that categorical variables define categories such as religion


or gender.

Analysis of categorical data is based on counts, proportions or


percentages of data that fall into the various categories defined by
the variables.

We will consider two tools used to analyze bivariate categorical
data:
1. Mosaic Plots 2. Two-Way Tables

55 / 73
Example

e.g.: A survey on academic dishonesty was conducted among WPI
students in 1993 and again in 1996. One question asked students
to respond to the statement ”Under some circumstances academic
dishonesty is justified.” Possible responses were ”Agree”,
”Disagree” and ”Strongly disagree”.

56 / 73
Example
Here is a mosaic plot of the 1993 results:

57 / 73
Two-Way Tables

Here is a two-way table of the same data:


1993 Survey
Dishonesty Sometimes Justified

Frequency
Percent
Row Pct. Strongly
Col Pct. Agree Disagree Disagree Total

Male 31 71 30 132
17.32 39.66 16.76 73.74
23.48 53.79 22.73
65.96 79.78 69.77
Gender

Female 16 18 13 47
8.94 10.06 7.26 26.26
34.04 38.30 27.66
34.04 20.22 30.23

Total 47 89 43 179
26.26 49.72 24.02 100.00

58 / 73
Inference for Categorical Data with Two Categories

Methods for comparing two proportions can be used (estimation


from Chapter 5 and hypothesis tests from Chapter 6).

59 / 73
Example 3, Continued

Suppose we want to compare pm, the population proportion of all
male college students who disagree or strongly disagree with the
statement “Under some circumstances academic dishonesty is
justified,” with pf, the proportion of all female college students
who do.

Assuming the WPI sample is a random sample of all students (a
big if), we can do so using a confidence interval for pf − pm
(Chapter 5) or a hypothesis test of H0: pf − pm = 0 versus
Ha: pf − pm ≠ 0 (Chapter 6).

60 / 73
Inference for Categorical Data with More Than Two
Categories: One-Way Tables

Suppose the categorical variable has c categories, and that the
population proportion in category i is pi. To test

$$H_0: p_i = p_i^{(0)},\ i = 1, 2, \ldots, c$$
$$H_a: p_i \neq p_i^{(0)} \ \text{for at least one } i$$

for pre-specified values $p_i^{(0)}$, i = 1, 2, . . . , c, use the Pearson χ² statistic

$$X^2 = \sum_{i=1}^{c} \frac{(Y_i - n p_i^{(0)})^2}{n p_i^{(0)}},$$

where Yi is the observed frequency in category i, and n is the total
number of observations.

61 / 73
Inference for Categorical Data with More Than Two
Categories: One-Way Tables

Note that for each category the Pearson statistic computes
(observed − expected)²/expected (noting that we assume H0 is true,
and under this assumption the expected number in category i is
$n p_i^{(0)}$), and sums over all categories.

If there are sufficient data (guideline: the expected number in each
category is at least 5), then under H0, $X^2 \sim \chi^2_{c-1}$. Therefore, if
x²* is the observed value of X², the p-value of the test is
$P(\chi^2_{c-1} \geq x^{2*})$.

62 / 73
Example 4

Defects in parts produced by an injection molding process are


monitored and recorded for quality control purposes. Historically,
parts have been rejected for five reasons. To investigate if recent
production mirrors the historical pattern, 200 rejected parts are
randomly sampled.
Reason for Rejection

                                   Black Spots  Scratches  Splay  Sinks  Flow Lines  Total
Historical proportion p_i^(0)             0.50       0.20   0.15   0.10        0.05   1.00
Expected (n p_i^(0))                       100         40     30     20          10   n = 200
Observed (y_i)                              88         47     31     29           5   n = 200
(y_i − n p_i^(0))² / (n p_i^(0))         1.440      1.225  0.033  4.050       2.500   x²* = 9.248
Pearson residual (y_i − n p_i^(0))/√(n p_i^(0))   −1.200   1.107   0.182   2.012   −1.581

The p-value is P(χ²₄ ≥ 9.248) = 0.0552.

63 / 73
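Note (added sketch, not part of the original example): SciPy's goodness-of-fit routine reproduces this Pearson χ² computation from the observed counts and the historical proportions.

```python
from scipy import stats

observed = [88, 47, 31, 29, 5]               # observed rejects by reason (Example 4)
historical = [0.50, 0.20, 0.15, 0.10, 0.05]  # historical proportions p_i^(0)
n = sum(observed)
expected = [n * p for p in historical]       # expected counts n * p_i^(0)

x2, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
print(x2, p_value)   # should reproduce X^2 = 9.248 and p-value ~ 0.055
```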
Inference for Categorical Data with More Than Two
Categories: Two-Way Tables

Suppose a population is partitioned into r × c categories,


determined by r levels of variable 1 and c levels of variable 2.

The population proportion for level i of variable 1 and level j of


variable 2 is pij . This information can be displayed in the following
r × c table:

64 / 73
Inference for Two-Way Tables

                              Column
row          1      2     ...     c      Marginals
1           p11    p12    ...    p1c       p1·
2           p21    p22    ...    p2c       p2·
...         ...    ...    ...    ...       ...
r           pr1    pr2    ...    prc       pr·
Marginals   p·1    p·2    ...    p·c        1

65 / 73
Inference for Two-Way Tables

Suppose we want to test

H0 : row and column variables are independent


Ha : row and column variables are not independent

To do so, we select a random sample of size n from the population.

66 / 73
Inference for Two-Way Tables

Suppose the table of observed frequencies is

                              Column
row          1      2     ...     c      Totals
1           Y11    Y12    ...    Y1c       Y1·
2           Y21    Y22    ...    Y2c       Y2·
...         ...    ...    ...    ...       ...
r           Yr1    Yr2    ...    Yrc       Yr·
Totals      Y·1    Y·2    ...    Y·c        n

67 / 73
Inference for Two-Way Tables

Under H0 the expected cell frequency for the ij cell is given by

$$E_{ij} = \frac{\text{row } i \text{ total} \times \text{column } j \text{ total}}{\text{sample size}} = \frac{Y_{i\cdot}\, Y_{\cdot j}}{n} = n\,\hat{p}_{i\cdot}\,\hat{p}_{\cdot j}$$

where $\hat{p}_{i\cdot} = Y_{i\cdot}/n$ and $\hat{p}_{\cdot j} = Y_{\cdot j}/n$.

68 / 73
Inference for Two-Way Tables

To measure the deviations of the observed frequencies from the
expected frequencies under the assumption of independence, we
construct the Pearson χ² statistic

$$X^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(Y_{ij} - E_{ij})^2}{E_{ij}}$$

If H0 is true, X² has (approximately) a $\chi^2_{(r-1)(c-1)}$ distribution.

69 / 73
Example

EX. 10. A polling firm surveyed 269 American adults concerning


how leisure time is spent in the home. One question asked them to
select which of five leisure activities they were most likely to
partake in on a weeknight. The results are broken down by age
group in the following table:

70 / 73
Example

Frequency
Expected
Residual
Activity
Age Group     Watch TV   Read   Listen to Music   Surf the Web   Play Computer Game   Totals
18-25 21 3 9 10 19 62
(19.13) (9.22) (11.52) (11.75) (10.37)
(+0.43) (−2.05) (−0.74) (−0.51) (+2.68)
26-35 17 5 6 8 13 49
(15.12) (7.29) (9.11) (9.29) (8.20)
(0.48) (−0.85) (−1.03) (−0.42) (1.68)
36-50 14 8 8 12 9 51
(15.74) (7.58) (9.48) (9.67) (8.53)
(−0.44) (0.15) (−0.48) (0.75) (0.16)
51-65 18 10 11 10 3 52
(16.04) (7.73) (9.67) (9.86) (8.70)
(0.49) (0.82) (0.43) (0.04) (−1.93)
Over 65 13 14 16 11 1 55
(16.97) (8.18) (10.22) (10.43) (9.20)
(−0.96) (2.04) (1.81) (0.18) (−2.70)
Totals 83 40 50 51 45 269

71 / 73
Example

As an example of the computation of table entries, consider the


entries in the (1,2) cell (age: 18-25, activity: Read), in which the
observed frequency is 3. The marginal number in the 18-25 bracket
is 62, while the marginal number in the Read bracket is 40, so
p̂1· = [ ], while p̂·2 = [ ], so the
expected number in the (1,2) cell is [ ].
The Pearson residual is [ ].

The value of the χ² statistic is 38.91, which is computed as the
sum of the squares of the Pearson residuals. Comparing this with
the χ²₁₆ distribution, we get a p-value of 0.0011.

72 / 73
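Note (added sketch, not part of the original example): the same expected counts and the χ² statistic for the EX. 10 table can be obtained with SciPy's chi2_contingency.

```python
import numpy as np
from scipy import stats

# Observed frequencies from EX. 10 (rows: age groups, columns: activities)
table = np.array([[21,  3,  9, 10, 19],
                  [17,  5,  6,  8, 13],
                  [14,  8,  8, 12,  9],
                  [18, 10, 11, 10,  3],
                  [13, 14, 16, 11,  1]])

x2, p_value, df, expected = stats.chi2_contingency(table, correction=False)
print(x2, df, p_value)     # should be close to 38.91 with 16 df and p ~ 0.0011
print(expected.round(2))   # expected counts E_ij = (row total)(column total)/n
```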
Association is NOT Cause and Effect

Two variables may be associated due to a number of reasons, such


as:
1. X could cause Y .
2. Y could cause X.
3. X and Y could cause each other.
4. X and Y could be caused by a third (lurking) variable Z.
5. X and Y could be related by chance.

73 / 73
