Linear Regression

Lesson 15

Lesson 15 Outline
Review correlation analysis
Dependent and Independent variables
Least Squares Regression line
Calculating the Intercept
Residuals and Residual Plots
Identifying a significant relationship: t-test of the slope
R²: coefficient of determination
Using the regression line for prediction of Y from X
Relationship between the correlation coefficient and linear regression

PubH 6414 Lesson 15

Linear Regression and Correlation
Linear regression and correlation analysis can both be used to explore the linear relationship between two continuous (quantitative) random variables.
Correlation analysis is used when the interest is in identifying whether a relationship exists and quantifying the strength of the relationship.
Regression analysis is used to identify a relationship AND to predict the value of one variable given a value of the other variable(s).

Review: Correlation Analysis

1. Plot the data using a scatter plot to get a
visual idea of the relationship
2. Calculate the correlation coefficient
1. Use Pearson's correlation coefficient if both
variables are continuous
2. Use Spearman rank correlation coefficient if
both variables are ordinal or one is ordinal
and the other continuous.

Review: Scatter Plots
The pattern of the dots in the plot indicates the statistical relationship between the variables.
Positive relationship: the pattern goes from lower left to upper right.
Negative relationship: the pattern goes from upper left to lower right.
The more closely the dots cluster around a straight line with a positive or negative direction, the stronger the linear relationship.

Review: Correlation Coefficient

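$$ r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\left[\sum_{i=1}^{n}(x_i-\bar{x})^2\right]\left[\sum_{i=1}^{n}(y_i-\bar{y})^2\right]}} $$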

The statistic r is called the Correlation Coefficient

r estimates the population correlation coefficient ρ (the Greek letter rho).
The correlation coefficient provides a measure of the linear association between two variables.
r is always between -1 and 1.
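A minimal Python sketch of the formula above, applied as an illustration to the body weight / plasma volume data used later in this lesson. The function name `pearson_r` is hypothetical; it simply codes the three sums in the formula.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient computed directly from its definition."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator: sum of cross-products of deviations from the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator: square root of the product of the two sums of squared deviations
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

weight = [58.0, 70.0, 74.0, 63.5, 62.0, 70.5, 71.0, 66.0]  # body weight (kg)
plasma = [2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12]  # plasma volume (liters)
r = pearson_r(weight, plasma)
print(round(r, 4))  # 0.7591
```

This should agree with Excel's =CORREL on the same two columns.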

Correlation Coefficient in Excel
Use the CORREL function to find the correlation coefficient.
If data for one variable are in cells A1:A12 and data for the other variable are in cells B1:B12, =CORREL(A1:A12,B1:B12) will return the Pearson correlation coefficient.
Correlation coefficients closer to -1 or 1 indicate a stronger linear relationship.
Correlation coefficients close to 0 indicate a weak linear relationship. However, there could be a nonlinear relationship when the correlation coefficient is close to 0.
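To see why r close to 0 does not rule out a relationship, here is a small Python sketch using hypothetical data (not from the lesson) in which y is an exact but U-shaped function of x. The helper `pearson_r` is illustrative only.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient from its definition."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y depends perfectly on x, but the relationship is quadratic
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]

r = pearson_r(x, y)
print(r)  # 0.0: no linear association, even though y is completely determined by x
```

A scatter plot of these points would show the curve immediately, which is why plotting the data comes before computing r.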

Simple Linear Regression
Like correlation analysis, linear regression analysis is a statistical technique that is used to explore the relationship between two continuous random variables that have a linear relationship.
Regression analysis allows us to investigate the change in one variable that corresponds to a given change in the other variable.
If only ONE variable is used to predict the value of the other variable, the analysis is called simple linear regression.
If more than one variable is used to predict the value of the other variable, the analysis is called multiple linear regression (not covered in this course).

Linear Regression: Background
Regression is from a Latin root meaning "going back".
Linear regression as a statistical method was first described by Sir Francis Galton in his paper "Regression Towards Mediocrity in Hereditary Stature", published in The Journal of the Anthropological Institute in 1886.
Galton described the relationship between mid-parent height (mid-parent height = the average of the two parents' heights) and the height of their offspring.
Taller mid-parent heights had children with heights closer to the average height.
Shorter mid-parent heights had children with heights closer to the average height.
Galton called this phenomenon "regression towards mediocrity".

Sir Francis Galton: Regression

When mid-parents are taller than mediocrity, their children tend to be shorter than they.
When mid-parents are shorter than mediocrity, their children tend to be taller than they.

Variables in Simple Linear Regression
Dependent or response variable: the variable to be predicted from, or explained by, the independent variable.
The response variable is typically labeled Y.
Y is a continuous variable in simple linear regression.
Independent or explanatory variable: the variable used to predict the dependent variable.
The independent variable is typically labeled X.
X can also be called the predictive variable.
For simple linear regression, X is a continuous variable.
For multiple linear regression, X can be continuous or categorical.

Identifying independent and
dependent variables
In regression analysis, it's important to correctly identify the dependent and independent variables.
The study description should provide you with information about which is the dependent variable and which is the independent variable.
If the study description states that the goal is to predict variable 1 from variable 2, then variable 1 is the dependent variable (Y) and variable 2 is the independent variable (X).
Typically, if the variables are separated in time, the variable collected first is the independent variable (X) and the variable collected later is the dependent variable (Y).
In Galton's regression analysis, the mid-parent height was the independent variable and the offspring height was the dependent variable.
Linear Regression Overview
Look at a scatter plot of the data
Plot Y on the y-axis and X on the x-axis
Does the relationship appear to be linear?
Estimate the regression line equation
Find the slope and intercept of the regression line
Check residuals
Is the relationship statistically significant?
Use a t-test of the slope to determine significance
How well does the estimated regression line equation fit the data?
Calculate R² - the coefficient of determination
Use the estimated regression line equation to predict values of the dependent variable (Y) for specified values of the independent variable (X).
Simple Linear Regression:
An Example
Is there a linear relationship between body weight and plasma
volume that can be used to predict plasma volume from weight?
Plasma volume is the dependent variable Y since we are
interested in predicting this from body weight, the independent
variable X.
Subject   Body Weight (kg)   Plasma Volume (liters)
1         58.0               2.75
2         70.0               2.86
3         74.0               3.37
4         63.5               2.76
5         62.0               2.62
6         70.5               3.49
7         71.0               3.05
8         66.0               3.12
Scatter plot of the Data
There is a positive relationship between plasma volume and body weight. It isn't a perfect linear relationship, but there is a general linear trend to the data.
We want to identify a line that has a good fit to the data. This isn't a deterministic relationship, so the points won't fall perfectly on the line.
[Figure: scatter plot of Plasma Volume (liters) versus Body Weight (kg)]

Estimate the Regression Line
A few of the many possible lines through the data points are shown below.
[Figure: several candidate lines drawn through the scatter plot of Plasma Volume (liters) versus Body Weight (kg)]

Least Squares Regression Line
The linear regression line is the line that, overall, gets closest to all of the points. This is called the least squares regression line.
The least squares regression line minimizes the sum of the squares of the vertical distances between each observed data point (y_i) and the corresponding point on the line (ŷ_i):
$$ \text{minimize} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 $$
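A minimal Python sketch of the least squares criterion: it computes the sum of squared vertical distances for the least squares line (using the slope and intercept formulas given later in this lesson) and for a few hypothetical alternative lines chosen by eye. The helper `sse` and the alternative lines are illustrative assumptions, not part of the lesson.

```python
def sse(a, b, x, y):
    """Sum of squared vertical distances between each point and the line y = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

weight = [58.0, 70.0, 74.0, 63.5, 62.0, 70.5, 71.0, 66.0]  # X, body weight (kg)
plasma = [2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12]  # Y, plasma volume (liters)

# Least squares slope and intercept
n = len(weight)
x_bar, y_bar = sum(weight) / n, sum(plasma) / n
b_ls = sum((x - x_bar) * (y - y_bar) for x, y in zip(weight, plasma)) / sum(
    (x - x_bar) ** 2 for x in weight
)
a_ls = y_bar - b_ls * x_bar
best = sse(a_ls, b_ls, weight, plasma)

# Hypothetical alternative lines (intercept, slope)
others = [(0.0, 0.045), (0.5, 0.037), (-0.5, 0.052), (a_ls + 0.05, b_ls)]
for a, b in others:
    print(f"a = {a:7.4f}, b = {b:.4f}: sum of squares = {sse(a, b, weight, plasma):.4f}")
print(f"least squares line: sum of squares = {best:.4f}")
```

Every alternative line gives a larger sum of squares than the least squares line.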

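Vertical distances between each observed Y (y_i) and the line are shown in red. The sum of these distances squared is minimized by the least squares regression line.
[Figure: scatter plot of Plasma Volume (L) versus Body Weight with the least squares line; the vertical distance from each point to the line is drawn in red]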
Least Squares Regression Line
The equation for a line requires a slope and an intercept.
In regression analysis, we estimate the population regression line with the least squares regression line.
The notation for the slope and intercept in the population regression line uses Greek letters:
β₀ for the intercept
β₁ for the slope
The notation for the slope and intercept in the sample regression line uses Roman letters:
a for the intercept
b for the slope
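The Population Regression Line
$$ Y = \beta_0 + \beta_1 X + \varepsilon $$
β₀ is the y-intercept of the line
β₁ is the slope of the regression line
ε is the error term - the difference between the observed value of Y and the value on the population regression line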
Sample Regression Line
β₀ and β₁ are population parameters.
Sample estimates for the regression parameters are:
a is the estimate for β₀
b is the estimate for β₁
$$ \hat{Y} = a + bX $$
is the regression line calculated from the sample data.
Ŷ is the predicted value of Y.

Least Squares Regression Line
a and b are estimates of the regression coefficients β₀ and β₁.
The regression coefficients are estimated from the sample data by the least squares method.
The intercept a is the estimated expected value of Y when X = 0.
The slope b is the estimated expected change in Y corresponding to a 1-unit increase in X.
Ŷ is the expected (or predicted) value of y, the point on the regression line. It is called the fitted value of y.
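The Equation of a Regression Line
$$ \hat{Y} = a + bX $$
[Figure: a regression line showing the intercept a and the slope b, the change in Ŷ for a one-unit change in X]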
Interpretation of Predicted Values
The predicted value of y is the expected y-value for a given x-value.
Since not all observed data points are exactly on the regression line, there is a range of possible y-values (a distribution) for each x-value. In regression analysis, the distribution of y-values for each x-value is assumed to be a normal distribution.
The predicted values of y represent the mean values of the distributions of y for each specified value of x.
The following slide illustrates this for 3 values of X: notice that the mean of each distribution is the value given by the regression line equation (the predicted value of y), and that the distributions of y-values are normal distributions.
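A small Python simulation can make this concrete. It assumes, for illustration only, that Y at each X is normal with mean a + bX and a common standard deviation sigma; the numbers are borrowed from the plasma volume example (sigma is the "Standard Error" reported later in the Excel output), not estimated here. The simulated mean at each X should land on the predicted value.

```python
import random
import statistics

# Assumed model: Y ~ Normal(a + b*X, sigma) at every X (equal variance)
a, b, sigma = 0.0857, 0.0436, 0.2188

random.seed(1)  # fixed seed so the simulation is reproducible
results = {}
for x in (60.0, 66.0, 72.0):
    y_hat = a + b * x  # predicted value on the regression line
    ys = [random.gauss(y_hat, sigma) for _ in range(20_000)]  # simulated Y values at this X
    results[x] = (y_hat, statistics.fmean(ys), statistics.stdev(ys))
    print(f"X = {x}: predicted {y_hat:.3f}, simulated mean {results[x][1]:.3f}, "
          f"simulated SD {results[x][2]:.3f}")
```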

Simple Linear Regression
[Figure: normal distributions of Y centered on the regression line at three values of X]
Assumptions for Regression
There are several assumptions that should be met for regression analysis:
For each value of X, the Y variable is assumed to have a normal distribution; the mean of the normal distribution is the predicted value, Ŷ.
The normal distributions are assumed to have equal variance across the entire range of X values. This assumption is called homogeneity of variance or homoscedasticity.
The means of these distributions (the predicted values of Y) fall on a straight line, the regression line.
The Y observations are assumed to be independent.
The observations are from a random sample.

Interpretation of the Slope of the Regression Line
The slope b is the expected change in Y corresponding to a 1-unit increase in X.
b = 0: There is no linear association between Y and X.
b > 0: There is a positive linear association between Y and X (as X increases, the expected value of Y increases).
b < 0: There is a negative linear association between Y and X (as X increases, the expected value of Y decreases).
The following slide illustrates a positive, negative and 0 slope.

Illustration of Negative, Positive and Zero Slopes
[Figure: three regression lines with b > 0, b = 0 and b < 0]

Calculating the Slope of the Regression Line
The formula to calculate the slope of the least squares regression line is given below:
$$ b = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2} $$
Notice that the numerator is the same as the numerator in the formula for the correlation coefficient.

b for the plasma volume (Y) and body weight (X) example
X      Y      (X-Xbar)   (Y-Ybar)   (X-Xbar)(Y-Ybar)   (X-Xbar)²
58.0   2.75   -8.9       -0.3        2.24              78.8
70.0   2.86    3.1       -0.1       -0.45               9.8
74.0   3.37    7.1        0.4        2.62              50.8
63.5   2.76   -3.4       -0.2        0.82              11.4
62.0   2.62   -4.9       -0.4        1.86              23.8
70.5   3.49    3.6        0.5        1.77              13.1
71.0   3.05    4.1        0.0        0.20              17.0
66.0   3.12   -0.9        0.1       -0.10               0.8
Mean: Xbar = 66.875, Ybar = 3.0025
Sum: (X-Xbar)(Y-Ybar) = 8.9575, (X-Xbar)² = 205.375

Slope of regression line
From the previous slide, the sum of (X - Xbar)(Y - Ybar) = 8.9575.
The sum of (X - Xbar)² = 205.375.
b = 8.9575 / 205.375 = 0.043615
Interpretation of the slope: for every one-unit increase in X, the expected increase in Y is 0.0436 units (rounded to 4 decimal places).
Plasma volume is expected to increase by 0.0436 liters for every one-kilogram increase in body weight.
The slope is positive, indicating that as body weight (X) increases, plasma volume (Y) also increases.
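The table-and-division procedure above can be reproduced in a few lines of Python. This is a sketch that computes the two column sums at full precision (the table rounds the individual deviations) and then divides them.

```python
weight = [58.0, 70.0, 74.0, 63.5, 62.0, 70.5, 71.0, 66.0]  # X, body weight (kg)
plasma = [2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12]  # Y, plasma volume (liters)

n = len(weight)
x_bar = sum(weight) / n  # 66.875
y_bar = sum(plasma) / n  # 3.0025

# Column sums from the table: sum of (X-Xbar)(Y-Ybar) and sum of (X-Xbar)^2
sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(weight, plasma))
sum_xx = sum((x - x_bar) ** 2 for x in weight)
b = sum_xy / sum_xx  # slope of the least squares line

print(round(sum_xy, 4), round(sum_xx, 3), round(b, 6))  # 8.9575 205.375 0.043615
```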

Calculating the Intercept of the Regression Line
The intercept a of the regression line is the estimated value of Y when X = 0.
a is calculated from the average value of Y, the average value of X and the slope b, using the following formula:
$$ a = \bar{Y} - b\bar{X} $$

Intercept for Plasma Volume

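$$ \bar{X} = 66.875, \quad \bar{Y} = 3.0025, \quad b = 0.043615 $$
$$ a = \bar{Y} - b\bar{X} = 3.0025 - 0.043615 \times 66.875 \approx 0.0857 $$
The intercept is the estimated expected value of Y when X = 0. Intercepts do not always have realistic interpretations.
In this example, plasma volume is predicted to be 0.0857 liters for a body weight of 0 kg, which is not a meaningful value.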
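A minimal Python sketch of the intercept formula. It keeps the slope at full precision; plugging in a slope already rounded to 0.0436 would shift the intercept in the third decimal place, which is why the slide carries b = 0.043615.

```python
weight = [58.0, 70.0, 74.0, 63.5, 62.0, 70.5, 71.0, 66.0]  # X, body weight (kg)
plasma = [2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12]  # Y, plasma volume (liters)

n = len(weight)
x_bar = sum(weight) / n
y_bar = sum(plasma) / n
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(weight, plasma)) / sum(
    (x - x_bar) ** 2 for x in weight
)

a = y_bar - b * x_bar  # intercept: a = Ybar - b * Xbar
print(round(a, 4))  # 0.0857
```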
Regression Line Equation
Once the slope and the intercept have been calculated, the regression line equation can be written:
$$ \hat{Y} = a + bX $$
$$ \hat{Y} = 0.0857 + 0.0436X $$
This is the equation that will be used to predict plasma volume from body weight.
The regression equation calculated from sample data is an estimate of the true population regression equation.

Regression Line Equation and Interpretation of the Slope
A 1-unit increase in X for this data = 1 kg, so the interpretation of the slope in this regression line is:
For each 1 kg increase in body weight, the expected increase in plasma volume is 0.0436 liters.
What is the expected plasma volume increase for a 10 kg increase in body weight?
For a 10 kilogram increase in body weight, the expected increase in plasma volume = 10 × 0.0436 = 0.436 liters.

What if the Slope of the Regression Line Is Negative?
If the slope of the regression line is negative, we would expect a decrease in Y with each unit increase in X.
The slope is a measure of the expected change in Y for each 1-unit increase in X:
If the slope is positive, the expected change in Y is an increase.
If the slope is negative, the expected change in Y is a decrease.

Regression Coefficients in Excel
Excel has functions to calculate the slope and the intercept of the least squares regression line:
The SLOPE function returns b, the slope: =SLOPE(y-range, x-range)
The INTERCEPT function returns a, the intercept: =INTERCEPT(y-range, x-range)
For both of these functions, enter the y-range first, followed by the x-range.
Plasma Volume Example in Excel
Plasma volume / body weight regression:
Create a scatterplot of the data.
Work through the calculations of the slope and intercept of the regression line.
Use the Excel SLOPE and INTERCEPT functions.
After you've worked through the calculations once, use the Excel functions to find the slope and intercept for future regression problems.
Residuals
A residual is the difference between the observed (Y) and the expected (Ŷ) value of Y:
Residual = Y - Ŷ
Y is the observed Y for any X
Ŷ is the Y-value on the regression line for that X
The residual is the component of Y that is not predicted by X.
The least squares regression line is the line that minimizes the sum of the squared residuals.

Residuals for Plasma Volume
X      Y      Ŷ      Residual
58.0   2.75   2.62    0.13
70.0   2.86   3.14   -0.28
74.0   3.37   3.31    0.06
63.5   2.76   2.86   -0.10
62.0   2.62   2.79   -0.17
70.5   3.49   3.16    0.33
71.0   3.05   3.18   -0.13
66.0   3.12   2.96    0.16
Ŷ is the expected value of Y calculated using the regression line. The residual is the difference between Y and Ŷ.
Which point is closest to the regression line? (74, 3.37) has the smallest residual (in absolute value).
Which point is furthest from the regression line? (70.5, 3.49) has the largest residual (in absolute value).
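A short Python sketch that rebuilds the residual table and finds the closest and furthest points. The intercept and slope are taken from the Excel output for this example; everything else is computed.

```python
weight = [58.0, 70.0, 74.0, 63.5, 62.0, 70.5, 71.0, 66.0]  # X, body weight (kg)
plasma = [2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12]  # Y, plasma volume (liters)
a, b = 0.085724285, 0.043615338  # intercept and slope from the Excel output

print(f"{'X':>5} {'Y':>5} {'Yhat':>5} {'Residual':>8}")
residuals = []
for x, y in zip(weight, plasma):
    y_hat = a + b * x     # fitted value on the regression line
    residual = y - y_hat  # observed minus fitted
    residuals.append(residual)
    print(f"{x:5.1f} {y:5.2f} {y_hat:5.2f} {residual:8.2f}")

# Closest / furthest points are judged by the absolute size of the residual
closest = min(zip(weight, plasma, residuals), key=lambda t: abs(t[2]))
furthest = max(zip(weight, plasma, residuals), key=lambda t: abs(t[2]))
print("closest:", closest[:2], "furthest:", furthest[:2])
# closest: (74.0, 3.37) furthest: (70.5, 3.49)
```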
Regression Line and Residuals
[Figure: scatter plot of Plasma Volume (L) versus Body Weight (kg) with the regression line; the largest residual is labeled]

Analysis of Residuals
A residual plot is a plot of the residual values on the Y-axis and the x-values on the X-axis.
For a least squares line, the correlation between X and the residuals is always 0. If there is a linear relationship between X and Y, the residual plot will be a random scatter of points with no evident pattern.
A nonlinear relationship between X and Y will be more evident in the residual plot of the (X, residual) data than in the scatterplot of the original (X, Y) data.
The Excel Data Analysis Tool produces a residual plot when the Residual Plots option is selected. The residual plot for the plasma volume example is on the following slide.
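A minimal Python sketch of two properties of any least squares line (with an intercept): the residuals sum to 0, and they are uncorrelated with X. Because this holds by construction, a residual plot is useful for spotting other patterns, such as curvature, rather than a leftover linear trend.

```python
import math

weight = [58.0, 70.0, 74.0, 63.5, 62.0, 70.5, 71.0, 66.0]  # X, body weight (kg)
plasma = [2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12]  # Y, plasma volume (liters)

n = len(weight)
x_bar, y_bar = sum(weight) / n, sum(plasma) / n
sxx = sum((x - x_bar) ** 2 for x in weight)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(weight, plasma)) / sxx
a = y_bar - b * x_bar
residuals = [y - (a + b * x) for x, y in zip(weight, plasma)]  # the (X, residual) data

# Pearson correlation between X and the residuals
e_bar = sum(residuals) / n
cross = sum((x - x_bar) * (e - e_bar) for x, e in zip(weight, residuals))
corr_x_resid = cross / math.sqrt(sxx * sum((e - e_bar) ** 2 for e in residuals))
print(f"sum of residuals = {sum(residuals):.1e}, corr(X, residual) = {corr_x_resid:.1e}")
```

Both numbers are 0 apart from floating-point rounding.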

Residual Plot for Plasma Volume
[Figure: body weight (kg) Residual Plot: residuals versus body weight (kg)]
No evidence of nonlinearity: the points are equally distributed around the value 0 with no evident positive or negative slope.
(X, Y) Scatterplot for a Nonlinear Relationship
[Figure: scatter plot of Y versus X (X from 0 to 50) showing a curvilinear relationship]
When there is a curvilinear relationship between X and Y, the least squares regression line does not represent the relationship.

Residual Plot for Curvilinear Relationship
[Figure: X Residual Plot: residuals versus X (X from 0 to 50)]
This is the residual plot for the relationship on the previous slide.
It illustrates that the relationship is not linear: the residual plot points aren't evenly distributed around the value 0.
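A Python sketch of the pattern a curvilinear relationship leaves in the residuals. The data are hypothetical (a concave curve, not the data in the figure): after fitting a straight line, the residuals are negative at both ends and positive in the middle, instead of being scattered randomly around 0.

```python
# Hypothetical curvilinear data (not from the lesson): y rises, then levels off and falls
x = [0, 5, 10, 15, 20, 25, 30, 35, 40]
y = [20 + 2 * xi - 0.04 * xi ** 2 for xi in x]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
a = y_bar - b * x_bar
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# A text "residual plot": the sign of each residual, in order of increasing x
signs = ["+" if e > 0 else "-" for e in residuals]
print("".join(signs))  # --+++++--
```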
Regression Analysis for Curvilinear Relationships
Simple linear regression analysis should not be used when the relationship between X and Y is curvilinear.
There are several strategies for dealing with a curvilinear relationship between X and Y:
One option is to try a logarithmic transformation of the data to see if this improves the linear relationship.
Another option is to use piecewise regression: fit one regression line to the increasing portion of the curve and a second regression line to the decreasing portion of the curve.
A third option is to fit a polynomial regression equation (covered in PubH 6415 with multiple regression models).

Linear Regression Procedure
Look at a scatter plot of the data
Plot Y on the y-axis and X on the x-axis
Add the trend line to the plot
Estimate the regression line equation
Find the slope and intercept of the regression line
Check residuals
Is the relationship between X and Y statistically significant?
How well does the estimated regression line equation fit the data?
Calculate R² - the coefficient of determination
Use the estimated regression line equation to predict values of the dependent variable (Y) for specified values of the independent variable (X).
Is the relationship between X and
Y significant?
If the slope of the regression line = 0, there is no linear relationship between X and Y; the variables are considered to be independent.
Null hypothesis: slope = 0 (independence between the X and Y variables)
Alternative hypothesis: slope ≠ 0
The alternative hypothesis is that there is a significant relationship between the variables.
If the t-test of the slope result is significant (p-value < α), reject the null hypothesis and conclude that there is a statistically significant relationship between the two variables.
Notation for Population Slope and Intercept
As in any hypothesis test, the null and alternative hypotheses are stated about the population parameters, not about the estimates.
The population parameters for the slope and intercept of the regression line are the Greek letters β₁ and β₀:
β₁ is the population parameter for the slope
β₀ is the population parameter for the intercept
The statistic for the t-test of the slope will use the estimated value of the slope (b) that is calculated from the data.

t-test of the Slope
1. State the hypotheses:
Null hypothesis: β₁ = 0
Alternative hypothesis: β₁ ≠ 0
2. A t-test will be used to test the hypothesis.
3. Significance level α = 0.05
4. The degrees of freedom for a t-test of the slope are n - 2, where n = sample size.
The critical values of the t-test are found using TINV(0.05, df). For the plasma volume example, n = 8, so the critical values = TINV(0.05, 6) = 2.447 and -2.447.
t-test of the slope
5. Calculate the test statistic: the slope estimate divided by the standard error of the slope:
$$ t = \frac{b}{SE(b)} $$
The formula for the SE of the slope is complicated, so we will use the Excel Data Analysis Tool to do this t-test. The Data Analysis Tool provides the t-statistic and the p-value of the t-test of the slope.
6. State the conclusion. If the test statistic is more extreme than the critical values, reject the null hypothesis and conclude that there is a significant relationship between the variables.
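For readers who want to see what the Data Analysis Tool computes, here is a Python sketch using the standard textbook formula SE(b) = s / sqrt(sum of (X - Xbar)²), where s is the residual standard error with n - 2 degrees of freedom. The critical value 2.447 is copied from the slide, since Python's standard library has no t-distribution.

```python
import math

weight = [58.0, 70.0, 74.0, 63.5, 62.0, 70.5, 71.0, 66.0]  # X, body weight (kg)
plasma = [2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12]  # Y, plasma volume (liters)

n = len(weight)
x_bar, y_bar = sum(weight) / n, sum(plasma) / n
sxx = sum((x - x_bar) ** 2 for x in weight)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(weight, plasma)) / sxx
a = y_bar - b * x_bar

sse = sum((y - (a + b * x)) ** 2 for x, y in zip(weight, plasma))  # sum of squared residuals
s = math.sqrt(sse / (n - 2))  # residual standard error, n - 2 df
se_b = s / math.sqrt(sxx)     # standard error of the slope
t_stat = b / se_b

t_crit = 2.447  # TINV(0.05, 6) from the slide
print(f"SE(b) = {se_b:.6f}, t = {t_stat:.4f}, reject H0: {abs(t_stat) > t_crit}")
# SE(b) = 0.015268, t = 2.8566, reject H0: True
```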
T-test of the Slope in Excel
Data Analysis Tool output for the weight / plasma volume example:
The t-statistic (t Stat) and p-value for the t-test of the slope are in the Body weight row of the coefficients table.

Regression Statistics
Multiple R 0.759126577
R Square 0.576273159
Standard Error 0.218809511
Observations 8

df SS MS F Significance F
Regression 1 0.390684388 0.390684388 8.160066 0.028930913
Residual 6 0.287265612 0.047877602
Total 7 0.67795

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%

Intercept 0.085724285 1.023998015 0.083715284 0.936006 -2.419910427 2.591358996
Body weight 0.043615338 0.015268361 2.856582911 0.028931 0.006254978 0.080975697

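The t-statistic (2.857) is more extreme than the critical value (2.447) and the p-value (0.0289) < 0.05, so reject the null hypothesis: there is a significant relationship between weight and plasma volume.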
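A Python sketch that reproduces the ANOVA rows and the "Standard Error" of the Excel output from the sums of squares, using the standard simple-regression identity Regression SS = (sum of (X-Xbar)(Y-Ybar))² / sum of (X-Xbar)². In simple linear regression F equals the square of the slope's t-statistic, which is why "Significance F" matches the slope's p-value.

```python
import math

weight = [58.0, 70.0, 74.0, 63.5, 62.0, 70.5, 71.0, 66.0]  # X, body weight (kg)
plasma = [2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12]  # Y, plasma volume (liters)

n = len(weight)
x_bar, y_bar = sum(weight) / n, sum(plasma) / n
sxx = sum((x - x_bar) ** 2 for x in weight)
syy = sum((y - y_bar) ** 2 for y in plasma)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(weight, plasma))

ss_total = syy                           # Total SS, df = n - 1
ss_regression = sxy ** 2 / sxx           # Regression SS, df = 1
ss_residual = ss_total - ss_regression   # Residual SS, df = n - 2
ms_regression = ss_regression / 1
ms_residual = ss_residual / (n - 2)
f_stat = ms_regression / ms_residual
standard_error = math.sqrt(ms_residual)  # "Standard Error" under Regression Statistics

print(f"Regression  1  SS={ss_regression:.6f}  MS={ms_regression:.6f}  F={f_stat:.4f}")
print(f"Residual    {n - 2}  SS={ss_residual:.6f}  MS={ms_residual:.6f}")
print(f"Total       {n - 1}  SS={ss_total:.5f}")
print(f"Standard Error = {standard_error:.6f}")
```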
In Excel Module 15, use the Data Analysis Tool to obtain the regression analysis results:
Enter the plasma volume data for the Y-range and the body weight data for the X-range.
Check Labels if you highlight the column headers.
Also check Residuals and Residual Plots.
Also identify the slope and the intercept on the output. These are under the Coefficients column.
95% confidence intervals for the coefficients are also provided if the Confidence Level box is checked.
t-test of the Intercept
The Data Analysis Tool also provides results of a t-test of the intercept.
The null hypothesis of this test is that the intercept = 0: β₀ = 0
The alternative hypothesis is that the intercept ≠ 0: β₀ ≠ 0
Usually there is not much interest in the t-test of the intercept, because testing whether the intercept = 0 does not provide information about the relationship between the two variables.
From the regression table, you can see that the null hypothesis that the intercept = 0 is not rejected, because the p-value = 0.936. This result does not affect the significant result of the t-test of the slope.
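A Python sketch of the intercept's t-statistic, using the standard formula SE(a) = s × sqrt(1/n + Xbar² / sum of (X - Xbar)²). It reproduces the Intercept row of the Excel output; the p-value itself would still come from Excel (TDIST), since Python's standard library has no t-distribution.

```python
import math

weight = [58.0, 70.0, 74.0, 63.5, 62.0, 70.5, 71.0, 66.0]  # X, body weight (kg)
plasma = [2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12]  # Y, plasma volume (liters)

n = len(weight)
x_bar, y_bar = sum(weight) / n, sum(plasma) / n
sxx = sum((x - x_bar) ** 2 for x in weight)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(weight, plasma)) / sxx
a = y_bar - b * x_bar

sse = sum((y - (a + b * x)) ** 2 for x, y in zip(weight, plasma))
s = math.sqrt(sse / (n - 2))                    # residual standard error
se_a = s * math.sqrt(1 / n + x_bar ** 2 / sxx)  # standard error of the intercept
t_a = a / se_a
print(f"a = {a:.6f}, SE(a) = {se_a:.4f}, t = {t_a:.4f}")
# a = 0.085724, SE(a) = 1.0240, t = 0.0837
```

The large SE(a) reflects how far X = 0 lies from the observed weights (58 to 74 kg).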
Linear Regression Procedure
Look at a scatter plot of the data
Plot Y on the y-axis and X on the x-axis
Add the trend line to the plot
Estimate the regression line equation
Find the slope and intercept of the regression line
Is the relationship statistically significant?
Use a t-test of the slope to determine significance
Calculate R² - the coefficient of determination
Use the estimated regression line equation to predict values of the dependent variable (Y) for specified values of the independent variable (X).

How Well Does the Regression Line Fit the Data?
r² is equal to the correlation coefficient (r) squared. It can range from 0 to 1.
Interpretation of r²: the proportion of the variation in the dependent variable (Y) that is explained by the estimated least squares regression equation.
Larger values of r² indicate a better fit of the regression line to the data.
Calculating r2
In Excel, you can use the CORREL function to find the correlation coefficient and square this value to find the coefficient of determination.
For the plasma / weight data, r = 0.759, so r² = 0.759² = 0.576.
Or you can find r² on the Data Analysis Tool output:
Regression Statistics
Multiple R          0.759126577   (Multiple R = the correlation coefficient)
R Square            0.576273159   (R Square = coefficient of determination, r²)
Adjusted R Square   0.505652019
Standard Error      0.218809511
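A Python sketch showing that the two routes to r² agree: squaring r, and dividing the regression sum of squares by the total sum of squares (the "proportion of variation explained" reading).

```python
import math

weight = [58.0, 70.0, 74.0, 63.5, 62.0, 70.5, 71.0, 66.0]  # X, body weight (kg)
plasma = [2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12]  # Y, plasma volume (liters)

n = len(weight)
x_bar, y_bar = sum(weight) / n, sum(plasma) / n
sxx = sum((x - x_bar) ** 2 for x in weight)
syy = sum((y - y_bar) ** 2 for y in plasma)  # total variation in Y
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(weight, plasma))

r = sxy / math.sqrt(sxx * syy)
r_squared_from_r = r ** 2
r_squared_from_ss = (sxy ** 2 / sxx) / syy  # regression SS / total SS
print(f"{r_squared_from_r:.6f} {r_squared_from_ss:.6f}")  # 0.576273 0.576273
```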
Interpretation of r²
For the plasma volume example, r² = 0.576.
57.6% of the variation in plasma volume is explained by the regression line equation with weight as the explanatory variable.
Since only 57.6% of the variation in plasma volume is explained by body weight, there are likely other variables that explain some of the variation in plasma volume.
Multiple regression analysis uses more than one explanatory variable to predict the dependent variable. This is covered in PubH 6415.
If there are other explanatory variables significantly related to plasma volume in a multiple regression model, r² will increase.

Linear Regression Procedure
Look at a scatter plot of the data - we have done this
Plot Y on the y-axis and X on the x-axis
Does the relationship appear to be linear?
Estimate the regression line equation - we have done this
Find the slope and intercept of the regression line
Is the relationship statistically significant?
Use a t-test of the slope to determine significance
How well does the estimated regression line equation fit the data? - we have done this
Calculate R² - the coefficient of determination
Use the estimated regression line equation to predict values of the dependent variable (Y) for specified values of the independent variable (X).

Using the Regression Line
The regression line equation for the weight and plasma volume data is:
$$ \hat{Y} = 0.0857 + 0.0436X $$
For a given value of weight (X), the plasma volume (Y) can be predicted.
What is the expected plasma volume for an individual who weighs 60 kg?
Insert 60 in the equation in place of X and solve for Ŷ:
$$ \hat{Y} = 0.0857 + 0.0436 \times 60 \approx 2.7 \text{ liters} $$
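The same prediction as a small Python function. The name `predict_plasma_volume` is hypothetical, and the coefficients are the rounded values from the regression line equation above.

```python
def predict_plasma_volume(weight_kg, a=0.0857, b=0.0436):
    """Predicted plasma volume (liters) from the fitted line Yhat = a + b * X."""
    return a + b * weight_kg

print(round(predict_plasma_volume(60), 2))  # 2.7

# The slope also gives the expected change for a 10 kg increase in weight
change_10kg = predict_plasma_volume(70) - predict_plasma_volume(60)
print(round(change_10kg, 3))  # 0.436
```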
Predicting Plasma Volume for a Body Weight of 60 kg
[Figure: scatter plot of Plasma Volume (liters) versus Body Weight (kg) with the regression line; the predicted value at 60 kg is marked]
The predicted plasma volume for weight = 60 kg is the point on the regression line corresponding to x = 60. This point is 2.7 liters.
Appropriate Applications of the Regression Line
Predictions using regression line equations are only valid within the range of the X values in the data.
For the example data, the range of weight is from 58 to 74 kg.
It would not be appropriate to use this regression line equation to predict plasma volume for an individual weighing 100 kg or an individual weighing 25 kg.
There may be a different relationship between weight and plasma volume beyond the values of the collected data, so the relationship identified by the regression line equation should not be extrapolated much beyond the range of the X values.
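One way to build this caution into a calculation is to refuse predictions outside the observed range of X. This Python sketch is a hypothetical helper (`predict_within_range`), with the 58 to 74 kg range taken from the example data; a strict cutoff is one design choice among several (a warning would be another).

```python
def predict_within_range(weight_kg, x_min=58.0, x_max=74.0, a=0.0857, b=0.0436):
    """Predict plasma volume (liters), refusing to extrapolate outside the observed X range."""
    if not x_min <= weight_kg <= x_max:
        raise ValueError(f"{weight_kg} kg is outside the observed range {x_min}-{x_max} kg")
    return a + b * weight_kg

print(round(predict_within_range(60), 2))  # 2.7
for w in (25, 100):
    try:
        predict_within_range(w)
    except ValueError as err:
        print(err)  # the 25 kg and 100 kg examples are rejected
```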

More cautions about application
of Regression line predictions
Predictions using regression line equations are only valid for the population represented by the sample data.
For example, if data for a regression analysis are collected for girls aged 10-18, predictions using the equation are not necessarily valid for boys, adults or girls younger than 10.
You can't assume that the relationship between two variables in one population is the same in other populations.
Read the study description carefully to identify the population that was sampled. Regression analysis results apply to the population that was sampled, not necessarily to other populations.
What if there isn't a significant relationship between the variables?
If regression analysis reveals that there is NOT a significant relationship between the two variables (that is, if the p-value for the t-test of the slope > α), the regression equation is not useful for predicting values of the dependent variable from the independent variable.
If the t-test of the slope is NOT significant, end the regression analysis procedure and do not use the regression line equation for prediction.
Prediction using the regression line equation is only useful if the null hypothesis of independence between the variables is rejected.

Relationship between the Correlation Coefficient and the Slope
The correlation coefficient and the slope of the regression line are related. For a given set of data:
They will both have the same sign, indicating the direction of the relationship (positive or negative).
There is a mathematical relationship between the slope and the correlation coefficient: the slope of the regression line is equal to the correlation coefficient times the standard deviation of y divided by the standard deviation of x:
$$ b = r\frac{s_y}{s_x} $$

Hypothesis Test of Population Correlation
We can set up a hypothesis test of independence for the population correlation:
Null hypothesis: ρ = 0 (no significant linear association between the variables)
Alternative hypothesis: ρ ≠ 0 (significant linear association between the variables)
The test statistic is a t-statistic with n - 2 df:
$$ t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} $$
After finding the t-statistic, you can use Excel to find the p-value = TDIST(t, n-2, 2). Enter the absolute value of t, since TDIST does not accept negative values.
t-test of the Correlation Coefficient
For a given set of sample data, the t-test for ρ and the t-test for β₁ give the same result.
For the plasma volume data, the t-statistic for the test of the population correlation coefficient = 2.85658, which is the same as the t-statistic for the t-test of the slope.
You can work through the equation in Excel to confirm this.
P-value = TDIST(2.85658, 6, 2) = 0.02893
The same conclusion is reached from either hypothesis test.
The p-value < 0.05, so the null hypothesis of independence is rejected.
Linear Regression and
Correlation: which to use?
Both linear regression and correlation analysis can be used to explore the linear relationship between two continuous (quantitative) random variables.
Use correlation analysis when the interest is primarily in identifying whether a relationship exists.
Use the t-test of the correlation coefficient to determine if the relationship is significant.
Use regression analysis when the interest is also to predict the value of one variable given a value of the other variable.
Use the t-test of the slope to determine if the relationship is significant.
Regression analysis is most useful when there is an identified interest in predicting one variable from the other(s). If prediction doesn't make sense, use correlation analysis.
Readings and Assignments
Chapter 8 pgs. 192-194, 202
Complete the Lesson 15 Practice Exercises
Lesson 15 Excel Modules
Excel Module 15: Plasma Volume works through the example in this lesson
Excel Module 15: BMI works through the example in the text (pages 205-206, 208-…)
Complete OPTIONAL Homework 11: Use the Data Analysis Tool for the Linear Regression