
https://onlinecourses.science.psu.edu/stat502/print/book/export...

Published on STAT 502 - Analysis of Variance and Design of Experiments


(https://onlinecourses.science.psu.edu/stat502)

Lesson 10: Analysis of Covariance (ANCOVA)


Key Learning Goals for this Lesson:

Introduce the General Linear Model (GLM)
Learn how to include a continuous covariate in ANOVA
Understand testing sequences for ANCOVA
Understand equal and unequal slopes model development

See Textbook: Chapter 22

A Few Comments About ANCOVA


In the next two units we are going to build on concepts that we have learned so far in this course, and these units will also remind us of the principles and foundations of regression that you learned in STAT 501. They expand on the idea of the general linear model and how it can handle both quantitative and qualitative predictors. When we talk about the analysis of covariance, the general linear model can be thought of as the larger picture, an 'umbrella' procedure, if you will. If you have a model with no continuous predictors, you simply have an ANOVA. If you have a model with no categorical predictors, you simply have a regression. If you have a model with both continuous and categorical predictors, you have a general linear model, and you can use ANCOVA to include both types of predictors.
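To make the 'umbrella' idea concrete, here is a minimal Python sketch of how a single general linear model design matrix can hold a categorical predictor and a continuous predictor side by side. The 0/1 reference coding for gender and the function name `design_row` are assumptions for illustration; SAS and Minitab use their own internal codings. Deleting the years column would leave a one-way ANOVA design, and deleting the male indicator would leave a simple regression.

```python
# Hypothetical illustration: one design matrix for a general linear model
# that mixes a categorical predictor (gender) and a continuous one (years).
# The 0/1 reference coding for gender is an assumption of this sketch.


def design_row(gender: str, years: float) -> list[float]:
    """Return [intercept, male indicator, years] for one observation."""
    male = 1.0 if gender == "male" else 0.0
    return [1.0, male, float(years)]


# A few (gender, years) pairs taken from the salary table in this lesson
observations = [("female", 5), ("female", 3), ("male", 3), ("male", 1)]
X = [design_row(g, yrs) for g, yrs in observations]

for row in X:
    print(row)
```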

You might find it interesting that, historically, when SAS first came out it had PROC ANOVA and PROC REG (for regression) and that was it. Then people asked, "What about the case when you have categorical factors and you want to do an ANOVA, but you also have another variable, a continuous variable, that you can use as a covariate to account for extraneous variability in the response?" So SAS came out with PROC GLM, the general linear model procedure. With PROC GLM you could take the continuous regression variable, put it into the ANOVA model, and it runs. Or, conversely, if you are running a regression and you have a categorical predictor like gender, you could include it in the regression model and it runs. The general linear model handles both the regression and the categorical variables in the same model. There is no PROC ANCOVA in SAS, but there is PROC MIXED. PROC GLM had problems when it came to random effects and was effectively replaced by PROC MIXED. The same sort of process can be seen in Minitab and accounts for the multiple tabs under Stat > ANOVA and Stat > Regression. In SAS PROC MIXED or in Minitab's General Linear Model, you have the capacity to include covariates and to work correctly with random effects. But enough about history; let's get to this lesson.

In the first lesson we will address the classic case of ANCOVA, where the ANOVA is potentially improved by adjusting for the presence of a linear covariate. In the second we will deal with a little more complexity by considering functions of the covariate that are not linear. We will generalize the treatment of the continuous factors to include polynomials, with linear, quadratic, and cubic components that can interact with categorical treatment levels.

We find this idea of ANCOVA interesting not only because it merges these two statistical concepts, but also because it can be a very powerful 'Aha!' moment for students studying statistics.

1 of 26 5/28/14 10:38 AM

Introduction to Analysis of Covariance (ANCOVA)


A ‘classic’ ANOVA tests for differences in mean responses to categorical factor (treatment) levels. When there is heterogeneity among experimental units, restrictions on the randomization (blocking) can sometimes improve the test for treatment effects. In some cases we don’t have the opportunity to construct blocks, but we can recognize and measure a continuous variable that contributes to the heterogeneity in the experimental units.

These sources of extraneous variability historically have been referred to as ‘nuisance’ or ‘concomitant’
variables. More recently, these variables are referred to as ‘covariates’.

When a continuous covariate is included in an ANOVA we have the analysis of covariance (ANCOVA).
The continuous covariates enter the model as regression variables, and we have to be careful to go
through several steps to employ the ANCOVA method.

Including a covariate in the model can make the difference between concluding that there are, and concluding that there are not, significant differences among treatment means.

10.1 - Role of the Covariate


To illustrate the role the covariate has in the ANCOVA, let’s look at a hypothetical situation wherein
investigators are comparing salaries of male vs. female college graduates. A random sample of 5
individuals for each gender is compiled, and a simple one-way ANOVA is performed:

Males Females
78 80
43 50
103 30
48 20
80 60

$H_0: \mu_{Males} = \mu_{Females}$

SAS code for the one-way ANOVA (ancova_example_01.txt [2]):


Here is the output we get:

To perform the one-way ANOVA in Minitab, first open the data in the Minitab project file salary.MPJ [3].

Go to Stat > ANOVA > One Way…

In the pop-up window that appears, select salary as the Response and gender as the Factor, as shown below.

Click OK, and then here is the Minitab output that you get.


Because the p-value is greater than α = 0.05, the investigators cannot reject $H_0$.
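As a cross-check on this conclusion, the following Python sketch computes the one-way ANOVA F statistic directly from the salary table using only the standard library. It does not compute a p-value; instead it compares F with the 5% critical value of F(1, 8), about 5.32, which is typed in from a standard F table and is an assumption of the sketch.

```python
# Sketch: one-way ANOVA F statistic for the salary data (no covariate).
males = [78, 43, 103, 48, 80]
females = [80, 50, 30, 20, 60]


def mean(xs):
    return sum(xs) / len(xs)


def one_way_anova_f(*groups):
    """Return (F, df_treatment, df_error) for a one-way ANOVA."""
    all_obs = [y for g in groups for y in g]
    grand = mean(all_obs)
    ss_trt = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_err = sum((y - mean(g)) ** 2 for g in groups for y in g)
    df_trt = len(groups) - 1
    df_err = len(all_obs) - len(groups)
    return (ss_trt / df_trt) / (ss_err / df_err), df_trt, df_err


f_stat, df1, df2 = one_way_anova_f(males, females)
F_CRIT_1_8 = 5.32  # upper 5% point of F(1, 8), from a standard F table
print(f"F = {f_stat:.3f} on ({df1}, {df2}) df; reject H0 at 5%? {f_stat > F_CRIT_1_8}")
```

Here the treatment SS is 1254.4 and the error SS is 4745.2, so F is about 2.11, well below the critical value.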

A plot of the data shows the situation:

However, the investigators recognize that the length of time someone has been out of college is likely to influence how much money they are making. So they also include a question asking how many years each person has been out of college (ranging from 1 to 5 years for this sample):

Females Males
Salary years Salary years
80 5 78 3
50 3 43 1
30 2 103 5
20 1 48 2
60 4 80 4


We can see that there is indeed a general trend for people to earn more the longer they are out of college. The fundamental idea of including a covariate is to take this trend into account and effectively ‘control for’ the number of years out of college. In other words, we hope to include the covariate in the ANOVA so that males and females can be compared without the complicating factor of years out of college.

10.2 - The Covariate as a Regression Variable


ANCOVA by definition is a general linear model that includes both ANOVA (categorical) predictors and
Regression (continuous) predictors. The simple linear regression model is:

$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$$

where $\beta_0$ is the intercept and $\beta_1$ is the slope of the line. The significance of a regression is tested by calculating the sum of squares due to the regression variable, SS(Regr), then the mean square for regression, MS(Regr), and using an F-test with F = MS(Regr)/MSE. In simple linear regression, this test is equivalent to the t-test of $H_0: \beta_1 = 0$, since F = t².
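The equivalence of the F-test and the t-test can be checked numerically. This Python sketch uses the male salary and years data from the table in Section 10.1, fits the least squares line, forms SS(Regr), MSE and F, and compares F with the square of the t statistic for the slope.

```python
import math

# Sketch: simple linear regression of salary on years for the male group,
# showing that the regression F-test equals the squared t-test for the slope.
years = [3, 1, 5, 2, 4]
salary = [78, 43, 103, 48, 80]

n = len(years)
x_bar = sum(years) / n
y_bar = sum(salary) / n
sxx = sum((x - x_bar) ** 2 for x in years)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, salary))

b1 = sxy / sxx                 # slope estimate
b0 = y_bar - b1 * x_bar        # intercept estimate
ss_regr = b1 * sxy             # SS(Regr), 1 df
ss_error = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(years, salary))
mse = ss_error / (n - 2)

f_stat = (ss_regr / 1) / mse           # F = MS(Regr) / MSE
t_stat = b1 / math.sqrt(mse / sxx)     # t-test of H0: beta1 = 0
print(f"b0 = {b0:.1f}, b1 = {b1:.1f}, F = {f_stat:.2f}, t^2 = {t_stat ** 2:.2f}")
```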

However, in adding the regression variable to our one-way ANOVA model, we run into a notational problem. In the balanced one-way ANOVA we have the grand mean ($\mu$), but now we also have the intercept $\beta_0$. To get around this, we can center the covariate at its overall mean,

$$X^*_{ij} = X_{ij} - \bar{X}_{..}$$

and get the following expression of our covariance model, where $\gamma$ is the regression slope on the covariate:

$$Y_{ij} = \mu + \tau_i + \gamma X^*_{ij} + \epsilon_{ij}$$


The Type III (model fit) sums of squares for the treatment levels in this model are being corrected (or
adjusted) for the regression relationship. This has the effect of evaluating the treatment levels ‘on the
same playing field’, that is, comparing the means of the treatment levels at the mean value of the
covariate. This process effectively removes variation that was originally seen in the treatment level
means due to the covariate.
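The 'same playing field' comparison can be written as an adjusted treatment mean, $\bar{Y}_{i.} - \hat{\gamma}(\bar{X}_{i.} - \bar{X}_{..})$, where $\hat{\gamma}$ is the pooled within-group slope. The Python sketch below computes it for the salary data. Because both genders happen to have a mean of 3 years out of college, the adjustment term is zero and the adjusted means equal the raw group means in this particular example; with unequal covariate means the two would differ.

```python
# Sketch: adjusted treatment means from the common-slope ANCOVA model,
# i.e. each group mean moved to the overall covariate mean.
groups = {
    "female": ([5, 3, 2, 1, 4], [80, 50, 30, 20, 60]),  # (years, salary)
    "male": ([3, 1, 5, 2, 4], [78, 43, 103, 48, 80]),
}


def mean(xs):
    return sum(xs) / len(xs)


# Pooled within-group slope: sum of within-group Sxy over sum of Sxx
sxy = sxx = 0.0
for xs, ys in groups.values():
    mx, my = mean(xs), mean(ys)
    sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx += sum((x - mx) ** 2 for x in xs)
gamma = sxy / sxx

x_grand = mean([x for xs, _ in groups.values() for x in xs])
adjusted = {
    g: mean(ys) - gamma * (mean(xs) - x_grand) for g, (xs, ys) in groups.items()
}

print(f"gamma = {gamma:.1f}")
for g, m in adjusted.items():
    print(f"{g}: adjusted mean = {m:.1f}")
```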

10.3 - Steps in ANCOVA


To use a covariate in ANCOVA, we have to go through several steps. First, we need to establish that for at least one of the treatment groups there is a significant regression relationship with the covariate. Otherwise, including the covariate in the model won’t improve the estimation of treatment means.

Secondly, we have to be sure that the regression relationship of the response with the covariate has the
same slope for each treatment group. This is an extremely important point. In our example, we need to
be sure that the lines for Males and Females are parallel (have equal slope).

Depending on the outcome of the test for equal slopes, we have two alternative ways to finish the ANCOVA: 1) if the slopes are equal, fit a common slope model and adjust the treatment SS for the presence of the covariate; or 2) if the slopes are unequal, evaluate the differences in treatment means at three or more levels of the covariate.

These steps are diagrammed below:

Note: The figure above is presented as a guideline, and does require some subjective judgement.
Small sample sizes, for example, may result in none of the individual regressions in step 1 being
statistically significant, yet the inclusion of the covariate in the model may still be advantageous.
Exploratory data analysis and working with the regression diagnostics is an important aspect of
ANCOVA.
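The decision sequence can also be summarized as a small function. This is only a rough Python sketch: the name `ancova_next_step`, its inputs (p-values read from your software's output) and the default α = 0.05 are assumptions, and, as the note says, the Step 1 check in particular calls for judgement rather than a hard cutoff.

```python
# Sketch of the ANCOVA decision sequence. The function name, the inputs
# (p-values read from your software's output) and alpha = 0.05 are assumptions.


def ancova_next_step(slope_p_values, interaction_p, alpha=0.05):
    """Suggest how to finish the ANCOVA from the Step 1 and Step 2 p-values."""
    # Step 1: is there a significant regression in at least one group?
    if not any(p < alpha for p in slope_p_values):
        return "no clear covariate effect: use judgement before including it"
    # Step 2: do the slopes differ among treatment levels?
    if interaction_p < alpha:
        return "unequal slopes: compare treatment means at 3+ covariate levels"
    # Step 3: slopes are homogeneous, so fit the common-slope model
    return "equal slopes: fit common-slope model and test adjusted treatment SS"


# Hypothetical p-values, not taken from any output in this lesson
print(ancova_next_step([0.01, 0.03], interaction_p=0.60))
```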

10.4 - Equal Slopes Model - SAS


Using our salary example and the data in the table below, we can run through the steps for the ANCOVA.

Females Males
Salary years Salary years
80 5 78 3
50 3 43 1
30 2 103 5
20 1 48 2
60 4 80 4

Step 1: Are all regression slopes equal to 0?

A simple linear regression can be run for each treatment group, Males and Females. (Note: To perform a regression analysis on each gender group in Minitab, we have to subdivide the salary data manually and save the subsets separately. See Section 10.4a for the Minitab example.)

Running these procedures using statistical software we get the following:

Males

Use the following SAS code (ancova_example_02.txt [4])

And here is the output that you get:

Females


Use the following SAS code (ancova_example_03.txt [5])

And here is the output for this run:

In both cases the simple linear regressions are significant, so the slopes are not equal to 0.
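The Step 1 regressions can be reproduced with a few lines of Python as a check on the SAS output. This sketch fits salary on years within each gender and compares each regression F with the 5% critical value of F(1, 3), about 10.13, which is typed in from a standard F table (an assumption of the sketch, not computed).

```python
# Sketch: Step 1 for the equal-slopes salary data, a separate simple linear
# regression of salary on years within each gender.
data = {
    "female": ([5, 3, 2, 1, 4], [80, 50, 30, 20, 60]),  # (years, salary)
    "male": ([3, 1, 5, 2, 4], [78, 43, 103, 48, 80]),
}
F_CRIT_1_3 = 10.13  # upper 5% point of F(1, 3), from a standard F table


def slr(xs, ys):
    """Return (slope, intercept, F) for a simple linear regression."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    ss_regr = slope * sxy
    mse = (syy - ss_regr) / (n - 2)
    return slope, my - slope * mx, ss_regr / mse


results = {g: slr(xs, ys) for g, (xs, ys) in data.items()}
for g, (b1, b0, f) in results.items():
    print(f"{g}: y = {b0:.1f} + {b1:.1f}x, F = {f:.1f}, significant: {f > F_CRIT_1_3}")
```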

Step 2: Are the slopes equal?

We can test for this using our statistical software.

In SAS we now use PROC MIXED and include the covariate in the model (ancova_example_04.txt [6]).

We will also include a ‘treatment × covariate’ interaction term and the significance of this term answers
our question. If the slopes differ significantly among treatment levels, the interaction p-value will be <
0.05.


Note: In SAS, we specify the treatment in the class statement, indicating that these are
categorical levels. By NOT including the covariate in the class statement, it will be treated as a
continuous variable for regression in the model statement.

Here the interaction term is not significant, so there is no evidence that the slopes differ, and a plot of the regressions shows that the lines are essentially parallel.
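The interaction test can be understood as an extra-sum-of-squares comparison: fit separate slopes (full model) and a single pooled slope (reduced model), then ask whether the separate slopes reduce the residual SS by more than chance would. The following Python sketch does this by hand for the equal-slopes data; the helper name `centered_sums` is hypothetical. With 1 interaction df and 6 error df, the F statistic it produces is far below the 5% critical value of F(1, 6), about 5.99 from a standard F table.

```python
# Sketch: Step 2, the gender x years interaction as an extra-sum-of-squares
# F-test (separate slopes vs. one pooled slope).
data = {
    "female": ([5, 3, 2, 1, 4], [80, 50, 30, 20, 60]),  # (years, salary)
    "male": ([3, 1, 5, 2, 4], [78, 43, 103, 48, 80]),
}


def centered_sums(xs, ys):
    """Return (Sxx, Sxy, Syy) computed about the group means."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return sxx, sxy, syy


sums = [centered_sums(xs, ys) for xs, ys in data.values()]

# Full model: each gender keeps its own slope
sse_separate = sum(syy - sxy ** 2 / sxx for sxx, sxy, syy in sums)

# Reduced model: separate intercepts but one pooled slope
pooled_sxx = sum(s[0] for s in sums)
pooled_sxy = sum(s[1] for s in sums)
sse_common = sum(s[2] for s in sums) - pooled_sxy ** 2 / pooled_sxx

n = sum(len(xs) for xs, _ in data.values())
k = len(data)
df_interaction = k - 1   # extra slope parameters in the full model
df_error = n - 2 * k     # k intercepts + k slopes
f_interaction = ((sse_common - sse_separate) / df_interaction) / (sse_separate / df_error)
print(f"SSE common = {sse_common:.1f}, SSE separate = {sse_separate:.1f}, F = {f_interaction:.4f}")
```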

Step 3: Fit an Equal Slopes Model

We can now proceed to fit an equal slopes model by removing the interaction term. Again, we use our statistical software, SAS (see ancova_example_04a.txt [7]), and obtain the following final results:

Please Note: In SAS, the model statement automatically creates an intercept, so the ANCOVA model is technically over-parameterized. To get the intercept for each gender and the slope for the covariate directly, we have to re-parameterize the model. This entails suppressing the intercept (noint) and then specifying that we want the solutions (solution) to the model:

Here is what the SAS code looks like for this (ancova_example_05.txt [8]):


Here is the output:

The first section of the output above reports a separate intercept for each gender (the ‘Estimate’ value for each gender) and a common slope for both genders, labeled ‘Years’.

Thus, the regression equation for females is $\hat{y} = 2.7 + 15.1(\text{Years})$, and for males it is $\hat{y} = 25.1 + 15.1(\text{Years})$.
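The noint parameterization can be mimicked directly. In the Python sketch below, the design matrix has one indicator column per gender plus the years column and no intercept column, and the least squares normal equations $X^\top X \beta = X^\top y$ are solved by plain Gaussian elimination (chosen for transparency, not numerical robustness; the helper name `solve` is hypothetical). Adding an intercept column would make it equal to the sum of the two indicator columns, so $X^\top X$ would be singular; that is the over-parameterization mentioned in the note above.

```python
# Sketch: the "noint" (cell-means) form of the equal-slopes model.
# Columns: [female indicator, male indicator, years]; no intercept column.
rows = [  # (gender, years, salary)
    ("female", 5, 80), ("female", 3, 50), ("female", 2, 30),
    ("female", 1, 20), ("female", 4, 60),
    ("male", 3, 78), ("male", 1, 43), ("male", 5, 103),
    ("male", 2, 48), ("male", 4, 80),
]
X = [[1.0 if g == "female" else 0.0, 1.0 if g == "male" else 0.0, float(x)]
     for g, x, _ in rows]
y = [float(s) for _, _, s in rows]


def solve(a, b):
    """Solve the square system a @ beta = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        beta[r] = (m[r][n] - sum(m[r][c] * beta[c] for c in range(r + 1, n))) / m[r][r]
    return beta


p = len(X[0])
xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
a_female, a_male, slope = solve(xtx, xty)
print(f"female: y = {a_female:.1f} + {slope:.1f}x; male: y = {a_male:.1f} + {slope:.1f}x")
```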


At this point in the analysis we can see that 'gender' is now significant. By removing the impact of the covariate, the test for gender went from not significant (without covariate consideration) to significant (adjusting for the covariate).
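The adjusted test for gender can likewise be sketched as an extra-sum-of-squares F-test: compare a single regression line fitted to all ten people (reduced model) with the common-slope model that adds a separate intercept for each gender (full model). This Python sketch assumes the equal-slopes data above; the resulting F has 1 and 7 df, and the 5% critical value of F(1, 7) is about 5.59 (from a standard F table, not computed here).

```python
# Sketch: adjusted test for gender in the equal-slopes model, as an
# extra-sum-of-squares F-test (years only vs. years + gender, common slope).
females = ([5, 3, 2, 1, 4], [80, 50, 30, 20, 60])  # (years, salary)
males = ([3, 1, 5, 2, 4], [78, 43, 103, 48, 80])


def sums(xs, ys):
    """Centered sums of squares and cross-products: (Sxx, Sxy, Syy)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) ** 2 for x in xs),
            sum((x - mx) * (y - my) for x, y in zip(xs, ys)),
            sum((y - my) ** 2 for y in ys))


def sse_one_line(sxx, sxy, syy):
    """Residual SS after fitting a single slope to centered sums."""
    return syy - sxy ** 2 / sxx


# Reduced model: one regression line through all 10 observations
sse_reduced = sse_one_line(*sums(females[0] + males[0], females[1] + males[1]))

# Full model: separate intercepts, pooled (common) slope
within = [sums(*females), sums(*males)]
sse_full = sse_one_line(*(sum(s[i] for s in within) for i in range(3)))

df_full = 10 - 3  # n minus (2 intercepts + 1 slope)
f_gender = (sse_reduced - sse_full) / 1 / (sse_full / df_full)
print(f"SSE reduced = {sse_reduced:.1f}, SSE full = {sse_full:.1f}, F(gender) = {f_gender:.1f}")
```

Compare this F of roughly 47 with the one-way ANOVA F of roughly 2.1 for the same gender difference: the covariate removes most of the error variability.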

10.4a - Equal Slopes Model - Minitab


Using our Salary example and the data in the table below, we can run through the steps for the
ANCOVA. On this page we will go through the steps using Minitab.

Females Males
Salary years Salary years
80 5 78 3
50 3 43 1
30 2 103 5
20 1 48 2
60 4 80 4

Step 1: Are all regression slopes equal to 0?

A simple linear regression can be run for each treatment group, Males and Females. (Note: To perform a regression analysis on each gender group in Minitab, we have to subdivide the salary data manually, saving the male data in Male-salary.MPJ [9] and the female data in Female-salary.MPJ [10].)

Running these procedures using statistical software we get the following:

Males

Open the Male dataset in the Minitab project file Male-salary.MPJ [9].

From the menu bar, select Stat > Regression > Regression.

In the pop-up window, select salary into Response and years into Predictors as shown below.


Click OK, and here is the output that Minitab displays:

Females

Open the Minitab project file Female-salary.MPJ [10].

From the menu bar select Stat > Regression > Regression.

In the pop-up window, select salary into Response and years into Predictors as shown below.


Click OK, and here is the output that Minitab displays:

In both cases the simple linear regressions are significant, so the slopes are not equal to 0.

Step 2: Are the slopes equal?

We can test for this using our statistical software.

In Minitab we must now use GLM (general linear model) and be sure to include the covariate in the model. We will also include a ‘treatment × covariate’ interaction term; the significance of this term answers our question. If the slopes differ significantly among treatment levels, the interaction p-value will be < 0.05.

First, open the dataset in the Minitab project file salary.MPJ [3].

Then, from the menu select Stat > ANOVA > GLM (general linear model).

In the dialog box, select salary into Responses and gender into Model, and type gender*years as well.


Then, in this dialog box, click on the button "Covariates..." under the text boxes. Select years as
Covariates.

Click OK, then OK again, and here is the output that Minitab will display:


Here the interaction term is not significant, so there is no evidence that the slopes differ, and a plot of the regressions shows that the lines are essentially parallel.

Step 3: Fit an Equal Slopes Model

We can now proceed to fit an Equal Slopes model by removing the interaction term. Again, we will use
our statistical software.

We will still be using salary.MPJ [3].

Go to Stat > ANOVA > GLM (general linear model).

In the dialog box, select salary into Responses and gender into Model.


Click on the button "Covariates" under the text boxes. Select years as Covariates as before. Click OK to
go back to the GLM dialog box.

Now click on the "Comparisons" button. In the dialog box that appears, be sure to add gender to the Terms box and check that the Tukey method is selected. Click OK to return to the main GLM dialog box.

Now, click on the button "Results" under the text boxes.


Click the radio button "In addition,…" and select gender into the text window at the bottom (as shown above). This will give us the least squares means of females and males.

Click OK to exit the dialog box and then OK to run the model. This is the output that Minitab displays:


We can write the regression equations for this example based on the output above. For females the regression equation would be:

$$\hat{y}_f = 48 + 15.1(x - 3) = 2.7 + 15.1x$$

For males, the regression equation is:

$$\hat{y}_m = 70.4 + 15.1(x - 3) = 25.1 + 15.1x$$

And the comparison results are given as well:

In this analysis we can see that gender is now significant. By removing the impact of the covariate, the test for gender went from not significant (without covariate consideration) to significant (adjusting for the covariate).


10.5 - Unequal Slopes Model - SAS


If the data collected in the example study were instead as follows:

Females Males
Salary years Salary years
80 5 42 1
50 3 112 4
30 2 92 3
20 1 62 2
60 4 142 5

We would then see in Step 2 that there is a significant treatment × covariate interaction, using the SAS program below with the new data in it (ancova_example_06.txt [11]).

We get the following output:
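For these alternative data, the same extra-sum-of-squares logic gives the interaction test. This Python sketch fits a separate line for each gender, pools the slopes for the common-slope model, and forms F for the gender × years term with 1 and 6 df; the helper name `sums` is hypothetical. It also keeps each gender's intercept and slope, which should agree with the unequal slopes estimates reported in this section.

```python
# Sketch: Step 2 for the alternative (unequal slopes) salary data.
data = {
    "female": ([5, 3, 2, 1, 4], [80, 50, 30, 20, 60]),  # (years, salary)
    "male": ([1, 4, 3, 2, 5], [42, 112, 92, 62, 142]),
}


def sums(xs, ys):
    """Return group means and centered (Sxx, Sxy, Syy)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return mx, my, sxx, sxy, syy


stats = {g: sums(xs, ys) for g, (xs, ys) in data.items()}

# Separate-slopes (full) model: one line per gender
lines = {}
sse_separate = 0.0
for g, (mx, my, sxx, sxy, syy) in stats.items():
    slope = sxy / sxx
    lines[g] = (my - slope * mx, slope)  # (intercept, slope)
    sse_separate += syy - sxy * slope

# Common-slope (reduced) model: pooled within-gender slope
pooled_sxx = sum(s[2] for s in stats.values())
pooled_sxy = sum(s[3] for s in stats.values())
sse_common = sum(s[4] for s in stats.values()) - pooled_sxy ** 2 / pooled_sxx

df_error = 10 - 4  # 10 observations, 2 intercepts + 2 slopes
f_interaction = (sse_common - sse_separate) / 1 / (sse_separate / df_error)

for g, (b0, b1) in lines.items():
    print(f"{g}: y = {b0:.1f} + {b1:.1f}x")
print(f"F(gender x years) = {f_interaction:.1f} on (1, {df_error}) df")
```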

Generating Covariate Regression Slopes and Intercepts

As with the equal slopes model, we can re-parameterize the unequal slopes model to generate individual slopes and intercepts for each 'gender' in SAS (ancova_example_07.txt [12]):


Output:

Here the intercepts are the Estimates for effects labeled 'gender' and the slopes are the Estimates for
the effect labeled 'years*gender'. Thus, the regression equations for this unequal slopes model are:

Females: $\hat{y} = 3.0 + 15(\text{Years})$
Males: $\hat{y} = 15 + 25(\text{Years})$

The slopes of the regression lines differ significantly, and the lines are not parallel:


In Step 3, then, we can’t simply remove the interaction term and compare the treatment means at the mean level of the covariate (3 years out of college). Instead, we specify mean comparisons at 3 (or more) levels of the covariate using ancova_example_08.txt [13]:

And here is the output:

In this case, we see a significant difference at each level of the covariate specified in the lsmeans
statement. The magnitude of the difference between males and females differs (giving rise to the
interaction significance). In more realistic situations, a significant treatment × covariate interaction often
results in significant treatment level differences at certain points along the covariate axis.
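A comparison at a fixed covariate value x is the difference between the two fitted lines at x, with a standard error built from the pooled MSE of the separate-slopes model. The Python sketch below evaluates it at 1, 3 and 5 years; these levels are an assumption, since the levels actually used in ancova_example_08.txt are not shown here. The two-sided 5% t critical value for 6 df, 2.447, is typed in from a standard t table. The widening gap across the covariate range is exactly what the interaction term captures.

```python
import math

# Sketch: male vs. female comparisons at several covariate levels in the
# unequal-slopes model. Years 1, 3 and 5 are assumed for illustration.
data = {
    "female": ([5, 3, 2, 1, 4], [80, 50, 30, 20, 60]),  # (years, salary)
    "male": ([1, 4, 3, 2, 5], [42, 112, 92, 62, 142]),
}
T_CRIT_6 = 2.447  # two-sided 5% t critical value, 6 df, from a t table

fits, sse = {}, 0.0
for g, (xs, ys) in data.items():
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    b0 = my - b1 * mx
    sse += sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    fits[g] = (b0, b1, n, mx, sxx)

mse = sse / (10 - 4)  # pooled error from the separate-slopes model, 6 df


def difference_at(x):
    """Male minus female predicted salary at covariate value x, with its SE."""
    est, var = 0.0, 0.0
    for g, sign in (("male", 1), ("female", -1)):
        b0, b1, n, mx, sxx = fits[g]
        est += sign * (b0 + b1 * x)
        var += mse * (1 / n + (x - mx) ** 2 / sxx)
    return est, math.sqrt(var)


for x in (1, 3, 5):
    d, se = difference_at(x)
    print(f"years = {x}: difference = {d:.1f}, SE = {se:.2f}, "
          f"t = {d / se:.1f}, significant: {abs(d / se) > T_CRIT_6}")
```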

10.5a - Unequal Slopes Model - Minitab


With a new data file, salary-new.MPJ [14], we re-run the analysis and find a significant interaction between gender and years.

To do this, open the Minitab dataset salary-new.MPJ [14].

Go to Stat > ANOVA > GLM (general linear model).

In the GLM dialog box, choose salary into Responses and gender into Model, and type 'gender*years' as well.


Click on the button "Covariates" under the text boxes. Select 'years' as Covariates.

Click OK to go back to the GLM window. Then click on the button "Results" under the text boxes. Click the radio button "In addition, …" and select 'gender' into the text window at the bottom, which will give us the least squares means of females and males.


Click OK.

Here is the output that Minitab will display:


We see in Step 2 that we do have a significant interaction between 'gender' and 'years'.

In Step 3, then, we can’t simply remove the interaction term and compare the treatment means at the
mean level of the covariate (3 years out of college). The magnitude of the difference between males and
females differs (giving rise to the interaction significance). In more realistic situations, a significant
treatment × covariate interaction often results in significant treatment level differences at certain points
along the covariate axis.

As before, we can write the regression equations for both males and females based on the above output. Note that the two regression lines have different slopes this time.

For females:

$$\hat{\beta}_f = 20 + (-5) = 15$$
$$\hat{y}_f = 48 + 15(x - 3) = 15x + 3$$

For males:

$$\hat{\beta}_m = 20 - (-5) = 25$$
$$\hat{y}_m = 90 + 25(x - 3) = 25x + 15$$
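The slope and intercept arithmetic can be reproduced in a short Python sketch. It assumes Minitab's default effect coding (the two gender levels coded −1 and +1), under which the years coefficient is the average of the two group slopes and each gender × years coefficient is a group's deviation from that average. It also converts each centered equation, fitted value at 3 years plus slope times (x − 3), into intercept form.

```python
# Values read from the Minitab equations in this section: fitted salary at
# the mean covariate value (3 years) and each gender's slope.
group_fits = {"female": (48.0, 15.0), "male": (90.0, 25.0)}
X_CENTER = 3.0  # mean years out of college

# Under -1/+1 effect coding (assumed), the years coefficient is the average slope
overall_slope = sum(slope for _, slope in group_fits.values()) / len(group_fits)

intercept_form = {}
for g, (at_center, slope) in group_fits.items():
    deviation = slope - overall_slope          # the gender x years effect
    intercept = at_center - slope * X_CENTER   # y(3) + b(x - 3)  ->  a + b x
    intercept_form[g] = (intercept, slope)
    print(f"{g}: slope = {overall_slope:g} + ({deviation:+g}), y = {intercept:g} + {slope:g}x")
```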


© 2014 The Pennsylvania State University. All rights reserved.

Source URL: https://onlinecourses.science.psu.edu/stat502/node/183

Links:
[1] http://www.dynamicdrive.com
[2] https://onlinecourses.science.psu.edu/stat502/sites/onlinecourses.science.psu.edu.stat502/files/lesson10/ancova_example_01.txt
[3] https://onlinecourses.science.psu.edu/stat502/sites/onlinecourses.science.psu.edu.stat502/files/lesson10/salary.MPJ
[4] https://onlinecourses.science.psu.edu/stat502/sites/onlinecourses.science.psu.edu.stat502/files/lesson10/ancova_example_02.txt
[5] https://onlinecourses.science.psu.edu/stat502/sites/onlinecourses.science.psu.edu.stat502/files/lesson10/ancova_example_03.txt
[6] https://onlinecourses.science.psu.edu/stat502/sites/onlinecourses.science.psu.edu.stat502/files/lesson10/ancova_example_04.txt
[7] https://onlinecourses.science.psu.edu/stat502/sites/onlinecourses.science.psu.edu.stat502/files/lesson10/ancova_example_04a.txt
[8] https://onlinecourses.science.psu.edu/stat502/sites/onlinecourses.science.psu.edu.stat502/files/lesson10/ancova_example_05.txt
[9] https://onlinecourses.science.psu.edu/stat502/sites/onlinecourses.science.psu.edu.stat502/files/lesson10/Male-salary.MPJ
[10] https://onlinecourses.science.psu.edu/stat502/sites/onlinecourses.science.psu.edu.stat502/files/lesson10/Female-salary.MPJ
[11] https://onlinecourses.science.psu.edu/stat502/sites/onlinecourses.science.psu.edu.stat502/files/lesson10/ancova_example_06.txt
[12] https://onlinecourses.science.psu.edu/stat502/sites/onlinecourses.science.psu.edu.stat502/files/lesson10/ancova_example_07.txt
[13] https://onlinecourses.science.psu.edu/stat502/sites/onlinecourses.science.psu.edu.stat502/files/lesson10/ancova_example_08.txt
[14] https://onlinecourses.science.psu.edu/stat502/sites/onlinecourses.science.psu.edu.stat502/files/lesson10/salary-new.MPJ
