
STA408: Statistics for Science and Engineering

Chapter 4: Analysis of Variance

4.1 Analysis of Variance (ANOVA)


ANOVA is a procedure that is used to test the null hypothesis that the means of three or more populations
are equal.

Example 1
Suppose that lecturers at a college have devised three different methods to teach statistics. They want to
find out if these three methods produce different mean scores. Let $\mu_1$, $\mu_2$ and $\mu_3$ be the mean scores of all
students who will be taught by Methods I, II and III, respectively. State the null and alternative
hypotheses to test if the three teaching methods produce the same mean.
$H_0: \mu_1 = \mu_2 = \mu_3$ (All three population means are equal.)
$H_1$: Not all three population means are equal (or at least two population means are not equal).

Definitions of an Experimental Design


Treatment
A condition (or a set of conditions) that is imposed on a group of elements by the experimenter.
Level
A level is the amount or magnitude of the treatments.
Factor
A factor is a general type or category of treatments.
Block
The division of the experimental units into homogeneous pairs.
Randomisation
The procedure in which elements are assigned to different groups at random.
Designed Experiment
The experimenter controls the (random) assignment of elements to different treatment groups.
Observational Study
The assignment of elements to different treatments is voluntary and the experimenter simply
observes the results of the study.
Treatment Group
The group of elements that receives a treatment.
Control Group
The group of elements that does not receive a treatment.
Completely Randomised Design
Objects or subjects are assigned to groups completely at random.
Randomised Block Design
In a block design, experimental subjects are first divided into homogeneous blocks before they are
randomly assigned to a treatment group.

Example 2
Three different groups of runners are subjected to different training methods.
Experimental units:
Treatments:
Levels of the factor:
STA408 Chapter 4: Analysis of Variance

Example 3 (randomised block design)

If an experimenter has reason to believe that age might be a significant factor in the effect of a given
medication, he or she might first divide the experimental subjects into age groups, such as under 30
years old, 30-60 years old, and over 60 years old. Then, within each age group, individuals would be
assigned to treatment groups using a completely randomised design. In a block design,
both control and randomisation are considered.

One-way ANOVA
A one-way ANOVA test analyses one factor or one independent variable.
Assumptions of one-way ANOVA
The populations from which the samples are drawn are (approximately) normally distributed.
The populations from which the samples are drawn have the same variances (or standard
deviations).
The samples drawn from different populations are random and independent.

The ANOVA test is applied by calculating two estimates of the variance, $\sigma^2$, of the population distributions:
- Between-group variance (mean square treatment, MSA), which is found from the variance of the
  sample means.
- Within-group variance (mean square error, MSE), which is found by computing the variance using
  all the data and is not affected by differences in the means.

In general, for a test of the difference among three or more means, the hypotheses are:
$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ (All the population means are equal.)
$H_1$: At least one mean is different from the others (or at least two population means are not equal).
The degrees of freedom for this test are:
degrees of freedom for numerator (d.f.N.) $= k - 1$,
where $k$ is the number of groups;
degrees of freedom for denominator (d.f.D.) $= n - k$,
where $n$ is the sum of the sample sizes of the groups, $n = n_1 + n_2 + \cdots + n_k$.

Test statistic, $F$, for a one-way ANOVA test:
$$F = \frac{\text{between-group variance}}{\text{within-group variance}} = \frac{\text{MSA}}{\text{MSE}}$$

Note:
- The one-way ANOVA test is always right-tailed, with the rejection region in the right tail of the F
  distribution curve.
- If there is no difference in the means, then
  between-group variance estimate $\approx$ within-group variance estimate
  and the test statistic, $F$, will be approximately 1.


Example 4
Fifteen students were randomly assigned to three groups to try out the three different methods of
teaching statistics. At the end of the semester, the same test was given to all 15 students. The following
table gives the scores of the students in the three groups.

Method I    Method II    Method III
48          55           84
73          85           68
51          70           95
65          69           74
87          90           67

Assume that all the assumptions of the one-way ANOVA procedure hold true.
(a) Calculate the value of the test statistic F.
(b) Test at the 5% significance level the claim that there is no difference among the population mean scores.
In ANOVA terminology, the three methods used in teaching statistics are called treatments.
Let
$x$ = the score of a student,
$k$ = the number of different groups (or treatments),
$n_i$ = the size of sample $i$,
$T_i$ = the sum of the values in sample $i$.

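To calculate MSA and MSE, we first need to compute the treatment sum of squares (SSA) and the error
sum of squares (SSE). The total sum of squares (SST) is the sum of SSA and SSE, i.e.,
$$\text{SST} = \text{SSA} + \text{SSE}.$$
Writing $\sum x$ for the sum of all the values, $\sum x^2$ for the sum of their squares and $n$ for the total number
of values, the SST, SSA, SSE, MSA and MSE are calculated as follows, respectively:
$$\text{SST} = \sum x^2 - \frac{(\sum x)^2}{n}$$
$$\text{SSA} = \left(\frac{T_1^2}{n_1} + \frac{T_2^2}{n_2} + \cdots + \frac{T_k^2}{n_k}\right) - \frac{(\sum x)^2}{n}$$
$$\text{SSE} = \sum x^2 - \left(\frac{T_1^2}{n_1} + \frac{T_2^2}{n_2} + \cdots + \frac{T_k^2}{n_k}\right)$$
$$\text{MSA} = \frac{\text{SSA}}{k - 1}, \qquad \text{MSE} = \frac{\text{SSE}}{n - k}$$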

(a)
Method I    Method II    Method III
48          55           84
73          85           68
51          70           95
65          69           74
87          90           67
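
The arithmetic for part (a) can be checked numerically. The following is a minimal sketch in plain Python, assuming the usual shortcut formulas (SST as the sum of squared values minus the correction term, SSA from the sample totals, SSE as the remainder). The function name `one_way_anova` is an illustrative choice and not part of the course material.

```python
def one_way_anova(groups):
    """Return SSA, SSE, SST, MSA, MSE and F for a list of samples."""
    k = len(groups)                                       # number of treatments
    n = sum(len(g) for g in groups)                       # total number of values
    sum_x = sum(sum(g) for g in groups)                   # sum of all values
    sum_x2 = sum(x * x for g in groups for x in g)        # sum of squared values
    correction = sum_x ** 2 / n                           # (sum of x)^2 / n
    between = sum(sum(g) ** 2 / len(g) for g in groups)   # sum of T_i^2 / n_i
    ssa = between - correction
    sse = sum_x2 - between
    sst = sum_x2 - correction
    msa = ssa / (k - 1)
    mse = sse / (n - k)
    return {"SSA": ssa, "SSE": sse, "SST": sst,
            "MSA": msa, "MSE": mse, "F": msa / mse}


# Scores from Example 4, one list per teaching method.
scores = [
    [48, 73, 51, 65, 87],   # Method I
    [55, 85, 70, 69, 90],   # Method II
    [84, 68, 95, 74, 67],   # Method III
]

result = one_way_anova(scores)
for name, value in result.items():
    print(f"{name} = {value:.2f}")
```

The same function accepts samples of unequal size, because each sample total is divided by its own sample size.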


Therefore,

For convenience, all the calculations above are often recorded in a table called the ANOVA table, as follows:
ANOVA table
Source of     Degrees of   Sum of     Mean                  Value of the
Variation     freedom      Squares    Square                Test Statistic

Treatments    k - 1        SSA        MSA = SSA/(k - 1)     F = MSA/MSE
Error         n - k        SSE        MSE = SSE/(n - k)
Total         n - 1        SST

Substituting the values for (a), the ANOVA table is:


ANOVA table
Source of     Degrees of   Sum of     Mean      Value of the
Variation     freedom      Squares    Square    Test Statistic

Treatments
Error
Total


Example 5
Below is the Minitab output for the data of the three methods used in teaching Statistics of Example 4.
Test at 5% significance level the claim that there is no difference among the population mean scores.

One-way ANOVA: Method I, Method II, Method III

Method

Null hypothesis All means are equal


Alternative hypothesis At least one mean is different

Equal variances were assumed for the analysis.

Factor Information

Factor Levels Values


Factor 3 Method I, Method II, Method III

Analysis of Variance

Source DF Adj SS Adj MS F-Value P-Value


Factor 2 432.1 216.1 1.09 0.366
Error 12 2372.8 197.7
Total 14 2804.9

Model Summary

S R-sq R-sq(adj) R-sq(pred)


14.0618 15.41% 1.31% 0.00%

Means

Factor N Mean StDev 95% CI


Method I 5 64.80 16.07 (51.10, 78.50)
Method II 5 73.80 13.95 (60.10, 87.50)
Method III 5 77.60 11.84 (63.90, 91.30)

Pooled StDev = 14.0618
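
With the p-value approach, the null hypothesis is rejected when the p-value is at most the significance level. As a rough cross-check of the Minitab figure, the sketch below uses a closed form for the right-tail area of the F distribution that holds only when the numerator degrees of freedom equal 2, which is the case here because there are three methods. The helper name `f_right_tail_df1_2` is hypothetical, and the F value is recomputed from the rounded Adj MS column of the output.

```python
def f_right_tail_df1_2(f, df2):
    """P(F > f) for an F distribution with 2 and df2 degrees of freedom.

    This closed form is valid only when the numerator df is exactly 2.
    """
    return (1.0 + 2.0 * f / df2) ** (-df2 / 2.0)


alpha = 0.05
f_value = 216.1 / 197.7            # Adj MS (Factor) / Adj MS (Error)
p_value = f_right_tail_df1_2(f_value, 12)
reject_h0 = p_value <= alpha

print(f"F = {f_value:.2f}, p-value = {p_value:.3f}, reject H0: {reject_h0}")
```

Since the p-value exceeds 0.05, the null hypothesis is not rejected, so the data do not show a difference among the three mean scores.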


Example 6
A consumer agency wanted to find out if the mean time taken by each of three brands of medicines to
provide relief from a headache is the same. The following table gives the time (in minutes) taken by each
patient to get relief from a headache after taking the medicine.
Drug I 25 38 42 65 41 52
Drug II 15 21 19 28
Drug III 44 64 58 73

At a 2.5% significance level, will you conclude that the mean time taken to provide relief from a headache
is the same for each of the three drugs?

Drug I 25 38 42 65 41 52

Drug II 15 21 19 28

Drug III 44 64 58 73

ANOVA Table
Source of     Degrees of   Sum of     Mean      Value of the
Variation     freedom      Squares    Square    Test Statistic

Treatments
Error
Total
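
The samples in Example 6 have unequal sizes, which the definitions of the two variance estimates handle directly. The sketch below is one way to check the table entries: it computes SSA and SSE from deviations about the means, then compares F with a critical value. The critical value uses a closed form that is valid only for 2 numerator degrees of freedom, and both function names are illustrative assumptions.

```python
def one_way_f(groups):
    """Return (F, df numerator, df denominator) using deviations from means."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: sample size times squared deviation of each mean.
    ssa = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: squared deviations from each sample's own mean.
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ssa / (k - 1)) / (sse / (n - k)), k - 1, n - k


def f_critical_df1_2(alpha, df2):
    """Right-tail critical value of F(2, df2); valid only for numerator df = 2."""
    return (df2 / 2.0) * (alpha ** (-2.0 / df2) - 1.0)


# Relief times (minutes) from Example 6.
times = [
    [25, 38, 42, 65, 41, 52],   # Drug I
    [15, 21, 19, 28],           # Drug II
    [44, 64, 58, 73],           # Drug III
]

f_value, df1, df2 = one_way_f(times)
critical = f_critical_df1_2(0.025, df2)
reject_h0 = f_value > critical

print(f"F = {f_value:.2f} with ({df1}, {df2}) df, critical value = {critical:.2f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```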


Example 7
Below is the Minitab output for the data of Example 6. Test using the $p$-value. Use $\alpha = 0.025$, as in Example 6.
One-way ANOVA: Drug I, Drug II, Drug III

Analysis of Variance

Source DF Adj SS Adj MS F-Value P-Value


Factor 2 3086 1543.0 11.72 0.002
Error 11 1448 131.7
Total 13 4534

Example 8
The following ANOVA table, based on information obtained for samples selected from four independent
populations that are normally distributed with equal variances, has a few missing values.
ANOVA table
Source of     Degrees of   Sum of     Mean      Value of the
Variation     freedom      Squares    Square    Test Statistic

Treatments
Error 9.2154
Total

(a) Find the missing values and complete the ANOVA table.
(b) Using , what is your conclusion for the test with the null hypothesis that the means of the
four populations are all equal against the alternative hypothesis that the means of the four
populations are not all equal?


4.2 Randomised Complete Block Designs

Comparing a Set of Treatments in Blocks


The idea of blocking is to isolate sets of experimental units that are reasonably homogeneous and to randomly
assign treatments to these units. It is done to reduce experimental error (units in a block have more
characteristics in common than units in different blocks).
Note:
- Blocks should not be viewed as a second factor.
- A block variable is regarded as a confounding variable, i.e., it is not of interest by itself
  but has an influence on the response variable and should for this reason be included.

If we have $k$ treatments and $b$ blocks, then the total sample size is $n = kb$.

Example 9

-till management. Soil series represented in


the experimental areas varied across fields and were among typical agricultural soil series of Iowa and
neighbouring states. (Source: Yield and Early Growth Responses to Starter Fertilizer in No-Till Corn
Assessed with Precision Agriculture Technologies, Manuel Bermudez and Antonio P. Mallarino (2002).)
To illustrate, assume three different starters were used. In order to limit the number of fields that needed
to be planted, 10 locations were chosen and each field was divided into three parts; each part was then
treated with a different starter (randomly assigning the treatment to each part). Besides the fertilizer, all
other agricultural practices were to be the same.

Response variable:
Treatment (factor) variable:
Block (confounding) variable:

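ANOVA table for a Randomised Block Design

ANOVA table
Source of     Degrees of        Sum of     Mean                            Value of the
Variation     freedom           Squares    Square                          Test Statistic

Treatments    k - 1             SSA        MSA = SSA/(k - 1)               F = MSA/MSE
Blocks        b - 1             SSB        MSB = SSB/(b - 1)               F = MSB/MSE
Error         (k - 1)(b - 1)    SSE        MSE = SSE/[(k - 1)(b - 1)]
Total         kb - 1            SST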


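The ANOVA F-Test (Randomised Block Design)

The hypotheses are:
$H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_k = 0$ (No treatment effects)
$H_1$: At least one of the $\alpha_i$ is not equal to 0. (At least one of the $\alpha_i$ values differs from the others.)
or
$H_0: \beta_1 = \beta_2 = \cdots = \beta_b = 0$ (No block effects)
$H_1$: At least one of the $\beta_j$ is not equal to 0. (At least one of the $\beta_j$ values differs from the others.)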

Assumptions:
- The populations follow a normal distribution with means , .
- Equal variances for all combinations of treatments and blocks.
- The samples are independent random samples in independent blocks from each population.

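Test statistics (Computed F values):
$F = \text{MSA}/\text{MSE}$ for the treatment effects and $F = \text{MSB}/\text{MSE}$ for the block effects.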

Example 10
The cutting speeds of four types of tools are being compared in an experiment. Five cutting materials of
varying degrees of hardness are to be used. The data, giving the cutting time in seconds,
appear in the table below.

         Materials
Tools    1     2     3     4     5
1        12    2     8     1     7
2        20    14    17    12    17
3        13    7     13    8     14
4        11    5     10    3     6

(a) Identify the type of experimental design used in this study.


(b) Identify the response, factor and block variables.
(c) State the assumptions required for the ANOVA test.
(d) Construct an ANOVA table for the data above.
(e) Test at 5% significance level whether there is sufficient evidence to indicate that the means of
cutting speed are different when using different types of tools.
(f) Is there sufficient evidence at 5% significance level to conclude that the means of cutting speed
vary from material to material?


(d)
Tools           Materials (blocks)
(Treatments)    1     2     3     4     5     Total
1               12    2     8     1     7
2               20    14    17    12    17
3               13    7     13    8     14
4               11    5     10    3     6
Total
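
The sums of squares for part (d) follow from the row (tool) totals, the column (material) totals and the grand total. The following is a minimal sketch, assuming the usual shortcut formulas for a randomised block design with one observation per treatment-block combination. The function name `rbd_anova` is an illustrative choice.

```python
def rbd_anova(table):
    """ANOVA quantities for a randomised block design.

    table[i][j] is the single observation for treatment i in block j.
    """
    k = len(table)              # number of treatments (rows)
    b = len(table[0])           # number of blocks (columns)
    n = k * b
    grand_total = sum(sum(row) for row in table)
    correction = grand_total ** 2 / n
    sst = sum(x * x for row in table for x in row) - correction
    ssa = sum(sum(row) ** 2 for row in table) / b - correction
    block_totals = [sum(row[j] for row in table) for j in range(b)]
    ssb = sum(t ** 2 for t in block_totals) / k - correction
    sse = sst - ssa - ssb
    mse = sse / ((k - 1) * (b - 1))
    return {
        "SSA": ssa, "SSB": ssb, "SSE": sse, "SST": sst,
        "F_treatments": (ssa / (k - 1)) / mse,
        "F_blocks": (ssb / (b - 1)) / mse,
    }


# Cutting times (seconds) from Example 10: rows are tools, columns are materials.
cutting_times = [
    [12, 2, 8, 1, 7],
    [20, 14, 17, 12, 17],
    [13, 7, 13, 8, 14],
    [11, 5, 10, 3, 6],
]

anova = rbd_anova(cutting_times)
for name, value in anova.items():
    print(f"{name} = {value:.2f}")
```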


Example 11
Below is the output for the data given in Example 10.
Analysis of Variance

Source DF Adj SS Adj MS F-Value P-Value


Tool 3 310.00 103.333 51.67 0.000
Material 4 184.00 46.000 23.00 0.000
Error 12 24.00 2.000
Total 19 518.00

By using the $p$-value, we can show that the null hypotheses for the tool effects and material effects (parts (e)
and (f) of Example 10, respectively) are rejected, since both $p$-values (0.000) are less than 0.05.

Example 12
Four different machines, $M_1$, $M_2$, $M_3$ and $M_4$, are being considered for the assembling of a particular
product. It was decided that six different operators would be used in a randomised block experiment to
compare the machines. The machines were assigned in a random order to each operator. The operation
of the machines requires physical dexterity, and it was anticipated that there would be a difference among
the operators in the speed with which they operated the machines. The amounts of time (in seconds)
required to assemble the product are shown in the table below.

           Operator
Machine    1       2       3       4       5       6
1          42.5    39.3    39.6    39.9    42.9    43.6
2          39.8    40.1    40.5    42.3    42.5    43.1
3          40.2    40.5    41.3    43.4    44.9    45.1
4          41.3    42.2    43.5    44.2    45.9    42.3
The Minitab output of the data in the table above is given as follows.
Analysis of Variance

Source DF Adj SS Adj MS F-Value P-Value


Machine 3 15.92 5.308 3.34 0.048
Operator 5 42.09 8.417 5.29 0.005
Error 15 23.85 1.590
Total 23 81.86

(a) Show that the total sum of squares is 81.86.


(b) Test the null hypothesis, at 0.025 level of significance, that the machines perform at the same mean
rate of speed.
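
Part (a) can be checked from the raw data, and part (b) from the reported p-value. The short sketch below assumes the shortcut formula for the total sum of squares (sum of squared values minus the squared grand total divided by the number of values); the variable names are illustrative.

```python
# Assembly times (seconds) from Example 12: rows are machines, columns are operators.
times = [
    [42.5, 39.3, 39.6, 39.9, 42.9, 43.6],
    [39.8, 40.1, 40.5, 42.3, 42.5, 43.1],
    [40.2, 40.5, 41.3, 43.4, 44.9, 45.1],
    [41.3, 42.2, 43.5, 44.2, 45.9, 42.3],
]

values = [x for row in times for x in row]
n = len(values)

# (a) Total sum of squares from the raw data.
sst = sum(x * x for x in values) - sum(values) ** 2 / n

# The three components in the Minitab output should add up to the same total.
components = 15.92 + 42.09 + 23.85

# (b) Decision for the machine effect using the reported p-value.
alpha = 0.025
p_machine = 0.048
reject_h0 = p_machine <= alpha

print(f"SST from data = {sst:.2f}, sum of components = {components:.2f}")
print("Reject H0 for machines" if reject_h0 else "Do not reject H0 for machines")
```

Because 0.048 is greater than 0.025, the null hypothesis of equal mean assembly times is not rejected at this level, although it would be rejected at the 5% level.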


Example 13
Given a randomized block experiment with five groups and eight blocks, in the following ANOVA
summary table, fill in all the missing results.

Source of     Degrees of   Sum of     Mean      Value of the
Variation     freedom      Squares    Square    Test Statistic

Factors

Blocks 540

Error

Total

(a) At the 0.05 level of significance, is there evidence of a difference among the five group means?
(b) At the 0.05 level of significance, is there evidence of an effect due to blocks?
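
The degrees-of-freedom column of this table depends only on the number of groups and blocks, so it can be filled in before any sums of squares are known. A minimal sketch follows; the function name `rbd_degrees_of_freedom` is hypothetical.

```python
def rbd_degrees_of_freedom(k, b):
    """Degrees of freedom for a randomised block design with k groups and b blocks."""
    return {
        "treatments": k - 1,
        "blocks": b - 1,
        "error": (k - 1) * (b - 1),
        "total": k * b - 1,
    }


df = rbd_degrees_of_freedom(k=5, b=8)
for source, value in df.items():
    print(f"{source}: {value}")
```

The error and total entries are consistent with each other: the treatment, block and error degrees of freedom add up to the total.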


4.3 Two-way ANOVA


A two-way ANOVA test analyses two factors or two independent variables. It is able to test
- the effects of two independent variables or factors on one dependent variable,
- the interaction effect of the two variables.

Example 14
Suppose a researcher wishes to test the effects of two different types of plant food and two different
types of soil on the growth of certain plants.
Two independent variables: - the type of plant food
- the type of soil
Dependent variable: - the plant growth
Other factors like water, temperature and sunlight are held constant.

To conduct this experiment, the researcher sets up four groups of plants as shown in Figure 1.

Figure 1: Treatment Groups for the Plant Food-Soil Type Experiment

Group 1: Plant food type I, soil type I,


Group 2: Plant food type I, soil type II,
Group 3: Plant food type II, soil type I,
Group 4: Plant food type II, soil type II.

These groups are called treatment groups.
The plants are assigned to the groups at random.
The design is called a $2 \times 2$ design because each variable consists of two levels, i.e., two different
treatments.

The two-way ANOVA enables the researcher to
- test the effects of the plant food and the soil type in a single experiment,
- test the effect of the interaction of the two variables.

The two-way ANOVA design has several null hypotheses:
- one for each independent variable (main effects),
- one for the interaction.


(I) Interaction effect
$H_0$: There is no interaction effect between plant food type and soil type on plant growth.
$H_1$: There is an interaction effect between plant food type and soil type on plant growth.

(II) Independent variable effect (plant food)
$H_0$: There is no difference between the means of heights of plants grown using different foods.
$H_1$: There is a difference between the means of heights of plants grown using different foods.

(III) Independent variable effect (soil)
$H_0$: There is no difference between the means of heights of plants grown using different soil types.
$H_1$: There is a difference between the means of heights of plants grown using different soil types.

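A two-way ANOVA summary table

Source of     Degrees of        Sum of     Mean                               Value of the
Variation     freedom           Squares    Square                             Test Statistic

Factor A      a - 1             SSA        MSA = SSA/(a - 1)                  F = MSA/MSE
Factor B      b - 1             SSB        MSB = SSB/(b - 1)                  F = MSB/MSE
A x B         (a - 1)(b - 1)    SSAB       MSAB = SSAB/[(a - 1)(b - 1)]       F = MSAB/MSE
Error         ab(n - 1)         SSE        MSE = SSE/[ab(n - 1)]
Total         abn - 1           SST

where $a$ = number of levels of factor A,
$b$ = number of levels of factor B,
$n$ = number of subjects in each group.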
Assumptions for the two-way ANOVA
The populations from which the samples are drawn are (approximately) normally distributed.
The populations from which the samples are drawn have the same variances (or standard
deviations).
The samples drawn from different populations are random and independent.
The groups must be equal in sample size.

Figure 2 shows some examples of different types of Two-way ANOVA Designs

Figure 2: Some types of Two-way ANOVA Designs


Example 15
A researcher wishes to see whether the type of gasoline used and the type of automobile driven have any
effect on the gasoline consumption. Two types of gasoline, regular and high-octane, will be used, and two
types of automobile, two-wheel- and four-wheel-drive, will be used in each group. There will be two
automobiles in each group, for a total of eight automobiles used. Using a two-way analysis of variance,
the researcher will perform the following steps.
Step 1: State the hypotheses.
Step 2: Find the critical value for each F test, using $\alpha = 0.05$.
Step 3: Complete the summary table to get the test value.
Step 4: Make the decision.
Step 5: Conclude the results.

The data (in miles per gallon) are given in the table below.
               Type of automobile
Gas            Two-wheel-drive    Four-wheel-drive
Regular        26.7               28.6
               25.2               29.3
High-octane    32.3               26.1
               32.8               24.2

               Type of automobile
Gas            Two-wheel-drive       Four-wheel-drive      Total
Regular        26.7, 25.2 (51.9)     28.6, 29.3 (57.9)
High-octane    32.3, 32.8 (65.1)     26.1, 24.2 (50.3)
Total

The value in parentheses is the sum of the two observations in that cell.


Step 1: State the hypotheses


(I) Interaction effect

(II) Independent variable effect (Gasoline effect)

(III) Independent variable effect (Automobile effect)

Step 2: Find the critical value.

Step 3: Complete the summary table


A two-way ANOVA summary table
Source of     Degrees of   Sum of     Mean      Value of the
Variation     freedom      Squares    Square    Test Statistic

Error
Total
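
The entries of the summary table in Step 3 can be computed from the cell totals, the row totals and the column totals. The following is a minimal sketch for a balanced two-way design, assuming the usual shortcut formulas; the function name `two_way_anova` and the dictionary keys are illustrative, with gasoline taken as factor A and automobile type as factor B.

```python
def two_way_anova(cells):
    """Balanced two-way ANOVA.

    cells[i][j] is the list of n observations for level i of factor A
    and level j of factor B.
    """
    a = len(cells)
    b = len(cells[0])
    n = len(cells[0][0])
    values = [x for row in cells for cell in row for x in cell]
    correction = sum(values) ** 2 / (a * b * n)
    sst = sum(x * x for x in values) - correction
    # Factor A: totals over each row of cells.
    ssa = sum(sum(sum(cell) for cell in row) ** 2 for row in cells) / (b * n) - correction
    # Factor B: totals over each column of cells.
    ssb = sum(sum(sum(cells[i][j]) for i in range(a)) ** 2
              for j in range(b)) / (a * n) - correction
    # Variation among the cell totals, split into A, B and interaction.
    ss_cells = sum(sum(cell) ** 2 for row in cells for cell in row) / n - correction
    ssab = ss_cells - ssa - ssb
    sse = sst - ss_cells
    mse = sse / (a * b * (n - 1))
    return {
        "SSA": ssa, "SSB": ssb, "SSAB": ssab, "SSE": sse, "SST": sst,
        "F_A": (ssa / (a - 1)) / mse,
        "F_B": (ssb / (b - 1)) / mse,
        "F_AB": (ssab / ((a - 1) * (b - 1))) / mse,
    }


# Miles per gallon from Example 15: rows are gasoline types, columns are automobile types.
mpg = [
    [[26.7, 25.2], [28.6, 29.3]],   # regular
    [[32.3, 32.8], [26.1, 24.2]],   # high-octane
]

table = two_way_anova(mpg)
critical_value = 7.71               # F critical value with (1, 4) df at the 5% level
for name, value in table.items():
    print(f"{name} = {value:.3f}")
print("Significant effects:",
      [name for name in ("F_A", "F_B", "F_AB") if table[name] > critical_value])
```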

Step 4: Make the decision


Since the F values for the type of automobile driven (11.73) and for the interaction effect (65.55) are greater
than the critical value, 7.71, reject $H_0$ for the type of automobile driven and the interaction effect.
Since the interaction effect is statistically significant, no decision should be made about automobile type.

Step 5: Conclusion.
Since $H_0$ for the interaction effect is rejected, it can be concluded that there is an interaction effect
between type of gasoline used and type of automobile driven on the gasoline consumption at 5%
significance level.


Example 16
Below is the Minitab output for Example 15.
Analysis of Variance

Source DF Adj SS Adj MS F-Value P-Value


Gas 1 3.920 3.9200 4.75 0.095
Automobile 1 9.680 9.6800 11.73 0.027
Gas*Automobile 1 54.080 54.0800 65.55 0.001
Error 4 3.300 0.8250
Total 7 70.980

Note:
In the example above, the effect of the type of gasoline used and the effect of the type of automobile driven
are called the main effects.
If the interaction effect is not significant, the strategy of analysis is to proceed with the
analysis of the main effects. Otherwise, it is not necessary to test the main effects, because knowledge
of a significant interaction is more useful than knowledge of the main effects.

Example 17
A contractor wishes to see whether there is a difference in the time (in days) it takes two subcontractors
to build three different types of homes. The data are as follows.

                 Home type
Subcontractor    I                     II                    III
A                25, 28, 26, 30, 31    30, 32, 35, 29, 31    43, 40, 42, 49, 48
B                15, 18, 22, 21, 17    21, 27, 18, 15, 19    23, 25, 24, 17, 13

The Minitab output for the above data is as shown below.


Analysis of Variance

Source DF Adj SS Adj MS F-Value P-Value


Subcontractor 1 1672.5 1672.53 (iv) 0.000
Home (i) (ii) 222.43 16.24 0.000
Subcontractor*Home 2 313.3 156.63 (v) 0.000
Error 24 328.8 (iii)
Total 29 2759.5

(a) Determine the experimental design used in this analysis.


(b) What are the assumptions required for the ANOVA analysis?
(c) How many observations were involved in this study?
(d) Find the values of (i), (ii), (iii), (iv) and (v) in the ANOVA table above.
(e) Do the data provide sufficient evidence to indicate that there is an interaction effect between the
assignment of subcontractors and types of homes built? Test using .
(f) Based on the result in (e), should a further test be conducted on the main effects?
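
For parts (c) and (d), the missing entries follow from the relationships within an ANOVA table: degrees of freedom and sums of squares add up to their totals, each mean square is a sum of squares divided by its degrees of freedom, and each F value is a mean square divided by the error mean square. The sketch below applies these relationships to the figures printed in the output; the variable names are illustrative, and small differences can arise because the printed values are rounded.

```python
# Known entries from the Minitab output of Example 17.
total_df, total_ss = 29, 2759.5
sub_df, sub_ss, sub_ms = 1, 1672.5, 1672.53
home_ms = 222.43
inter_df, inter_ss, inter_ms = 2, 313.3, 156.63
error_df, error_ss = 24, 328.8

home_df = total_df - sub_df - inter_df - error_df   # (i)
home_ss = home_ms * home_df                         # (ii)
mse = error_ss / error_df                           # (iii)
f_sub = sub_ms / mse                                # (iv)
f_inter = inter_ms / mse                            # (v)
n_observations = total_df + 1                       # part (c)

print(f"(i) {home_df}  (ii) {home_ss:.2f}  (iii) {mse:.2f}  "
      f"(iv) {f_sub:.2f}  (v) {f_inter:.2f}")
print(f"Number of observations: {n_observations}")
```

As a consistency check, the F value for Home recomputed as 222.43 divided by the error mean square agrees with the printed 16.24.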
