Experimental Design: Some Basic Design Concepts

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 23

2

Experimental Design
R og e r E . Kirk

SOME BASIC DESIGN CONCEPTS to occur. Carefully designed and executed


experiments continue to be one of science’s
Sir Ronald Fisher, the statistician, eugenicist, most powerful methods for establishing
evolutionary biologist, geneticist, and father causal relationships.
of modern experimental design, observed
that experiments are ‘only experience care-
fully planned in advance, and designed to Experimental design
form a secure basis of new knowledge’ An experimental design is a plan for assigning
(Fisher, 1935: 8). Experiments are charac- experimental units to treatment levels and the
terized by the: (1) manipulation of one statistical analysis associated with the plan
or more independent variables; (2) use (Kirk, 1995: 1). The design of an experiment
of controls such as randomly assigning involves a number of inter-related activities.
participants or experimental units to one or
more independent variables; and (3) careful
1. Formulation of statistical hypotheses that are
observation or measurement of one or more germane to the scientific hypothesis. A statistical
dependent variables. The first and second hypothesis is a statement about: (a) one or more
characteristics—manipulation of an indepen- parameters of a population or (b) the functional
dent variable and the use of controls such form of a population. Statistical hypotheses are
as randomization—distinguish experiments rarely identical to scientific hypotheses—they are
from other research strategies. testable formulations of scientific hypotheses.
The emphasis on experimentation in the 2. Determination of the treatment levels (indepen-
sixteenth and seventeenth centuries as a way dent variable) to be manipulated, the measure-
of establishing causal relationships marked ment to be recorded (dependent variable), and
the emergence of modern science from its the extraneous conditions (nuisance variables)
that must be controlled.
roots in natural philosophy (Hacking, 1983).
3. Specification of the number of experimental units
According to nineteenth-century philoso- required and the population from which they will
phers, a causal relationship exists: (1) if be sampled.
the cause precedes the effect; (2) whenever 4. Specification of the randomization procedure for
the cause is present, the effect occurs; and assigning the experimental units to the treatment
(3) the cause must be present for the effect levels.
24 DESIGN AND INFERENCE

5. Determination of the statistical analysis that will cases it may be possible to find preexisting
be performed (Kirk, 1995: 1–2). or naturally occurring experimental units
who have been exposed to the disease. If
In summary, an experimental design identifies the research has all of the features of an
the independent, dependent, and nuisance experiment except random assignment, it is
variables and indicates the way in which called a quasi-experiment. Unfortunately, the
the randomization and statistical aspects of interpretation of quasi-experiments is often
an experiment are to be carried out. The ambiguous. In the absence of random assign-
primary goal of an experimental design is ment, it is difficult to rule out all variables
to establish a causal connection between other than the independent variable as expla-
the independent and dependent variables. nations for an observed result. In general, the
A secondary goal is to extract the maximum difficulty of unambiguously interpreting the
amount of information with the minimum outcome of research varies inversely with
expenditure of resources. the degree of control that a researcher is able
to exercise over randomization.
Randomization
Replication and local control
The seminal ideas for experimental design
can be traced to Sir Ronald Fisher. The Fisher popularized two other principles of
publication of Fisher’s Statistical methods good experimentation: replication and local
for research workers in 1925 and The design control or blocking. Replication is the obser-
of experiments in 1935 gradually led to the vation of two or more experimental units
acceptance of what today is considered the under the same conditions. Replication
cornerstone of good experimental design: enables a researcher to estimate error effects
randomization. Prior to Fisher’s pioneering and obtain a more precise estimate of
work, most researchers used systematic treatment effects. Blocking, on the other hand,
schemes rather than randomization to assign is an experimental procedure for isolating
participants to the levels of a treatment. variation attributable to a nuisance variable.
Random assignment has three purposes. Nuisance variables are undesired sources
It helps to distribute the idiosyncratic of variation that can affect the dependent
characteristics of participants over the variable. Three experimental approaches are
treatment levels so that they do not selectively used to deal with nuisance variables:
bias the outcome of the experiment. Also,
random assignment permits the computation 1. Hold the variable constant.
2. Assign experimental units randomly to the
of an unbiased estimate of error effects—those
treatment levels so that known and unsuspected
effects not attributable to the manipulation sources of variation among the units are
of the independent variable—and it helps to distributed over the entire experiment and do not
ensure that the error effects are statistically affect just one or a limited number of treatment
independent. Through random assignment, levels.
a researcher creates two or more groups of 3. Include the nuisance variable as one of the factors
participants that at the time of assignment are in the experiment.
probabilistically similar on the average.
The latter approach is called local control or
blocking. Many statistical tests can be thought
Quasi-experimental design of as a ratio of error effects and treatment
Sometimes, for practical or ethical reasons, effects as follows:
participants cannot be randomly assigned to
Test statistic =
treatment levels. For example, it would be
unethical to expose people to a disease to f (error effects) + f (treatment effects)
(1)
evaluate the efficacy of a treatment. In such f (error effects)
EXPERIMENTAL DESIGN 25

where f ( ) denotes a function of the effects


Treat. Dep.
in parentheses. Local control, or blocking,
Level Var.
isolates variation attributable to the nuisance  Participant
1 a1 Y 11
variable so that it does not appear in estimates 

Participant a1 Y 21

of error effects. By removing a nuisance 2




variable from the numerator and denominator Group1 Participant3 a1 Y 31
.. .. ..

of the test statistic, a researcher is rewarded




 . . .
with a more powerful test of a false null

Participantn a1 Y n1
...........................
hypothesis.
Y¯ .1

Analysis of covariance Figure 2.1 Layout for a one-group posttest-


only design. The i = 1, …, n participants
A second general approach can be used to con- receive the treatment level (Treat. Level)
trol nuisance variables. The approach, which denoted by a1 after which the dependent
variable (Dep. Var.) denoted by Yi 1 is mea-
is called analysis of covariance, combines sured. The mean of the dependent variable is
regression analysis with analysis of variance. denoted by Ȳ .1 .
Analysis of covariance involves measuring
one or more concomitant variables in addition
to the dependent variable. The concomitant dependent-variable mean, Ȳ .1 , with what the
variable represents a source of variation that researcher thinks would happen in the absence
has not been controlled in the experiment and of the treatment.
one that is believed to affect the dependent It is difficult to draw unambiguous con-
variable. Through analysis of covariance, the clusions from the design because of serious
dependent variable is statistically adjusted to threats to internal validity. Internal validity is
remove the effects of the uncontrolled source concerned with correctly concluding that an
of variation. independent variable is, in fact, responsible
The three principles that Fisher vigor- for variation in the dependent variable. One
ously championed—randomization, replica- threat to internal validity is history: events
tion, and local control—remain the foundation other than the treatment that occur between
of good experimental design. Next, I describe the time the treatment is presented and the time
some threats to internal validity in simple that the dependent variable is measured. Such
experimental designs. events, called rival hypotheses, become more
plausible the longer the interval between the
treatment and the measurement of the depen-
THREATS TO INTERNAL VALIDITY IN dent variable. Another threat is maturation.
SIMPLE EXPERIMENTAL DESIGNS The dependent variable may reflect processes
unrelated to the treatment that occur simply
as a function of the passage of time: growing
One-group posttest-only design
older, stronger, larger, more experienced, and
One of the simplest experimental designs is so on. Selection is another serious threat
the one-group posttest-only design (Shadish, to the internal validity of the design. It is
et al., 2002: 106–107).Adiagram of the design possible that the participants in the experiment
is shown in Figure 2.1. In the design, a sample are different from those in the hypothesized
of participants is exposed to a treatment after comparison sample. The one-group posttest-
which the dependent variable is measured. only design should only be used in situations
I use the terms ‘treatment,’ ‘factor,’ and where the researcher is able to accurately
‘independent variable’ interchangeably. The specify the value of the mean that would be
treatment has one level denoted by a1 . The observed in the absence of the treatment. Such
design does not have a control group or a situations are rare in the behavioral sciences
pretest. Hence, it is necessary to compare the and education.
26 DESIGN AND INFERENCE

difference between Ȳ .1 and Ȳ .2 the longer


Dep. Treat. Dep.
Var. Level Var.
the time interval between the pretest and
 posttest. Selection, which is a problem with
 Block1 Y 11 a1 Y 12

 the one-group posttest-only design, is not a
 Block2 Y 21 a1 Y 22



Group1 Block3 Y 31 a1 Y 32 problem. However, as I describe next, the use


 .
..
.
..
.
..
.
.. of a pretest introduces new threats to internal


 validity: testing, statistical regression, and
Blockn Y n1 a1 Y n1

.................................. instrumentation.
Y¯ .1 Y¯ .2 Testing is a threat to internal validity
because the pretest can result in familiarity
Figure 2.2 Layout for a one-group with the testing situation, acquisition of
pretest-posttest design. The dependent
variable (Dep. Var.) for each of the n blocks
information that can affect the dependent vari-
of participants is measured prior to the able, or sensitization to information or issues
presentation of the treatment level, a1 , and that can affect how the dependent variable
again after the treatment level (Treat. Level) is perceived. A pretest, for example, may
has been presented.The means of the sensitize participants to a topic, and, as a result
pretest and posttest are denoted by Ȳ .1 of focusing attention on the topic, enhance
and Ȳ .2 , respectively. the effectiveness of a treatment. The opposite
effect also can occur. A pretest may diminish
One-group pretest-posttest design participants’ sensitivity to a topic and thereby
reduce the effectiveness of a treatment.
The one-group pretest-posttest design shown Statistical regression is a threat when the
in Figure 2.2 also has one treatment level, a1 . mean-pretest scores are unusually high or low
However, the dependent variable is measured and measurement of the dependent variable
before and after the treatment level is is not perfectly reliable. Statistical regression
presented. The design enables a researcher to operates to increase the scores of participants
compute a contrast between means in which on the posttest if the mean-pretest score
the pretest and posttest means are measured is unusually low and decrease the scores
with the same precision. Each block in the of participants if the mean pretest score is
design can contain one participant who unusually high. The amount of statistical
is observed two times or two participants regression is inversely related to the reliability
who are matched on a relevant variable. of the measuring instrument.
Alternatively, each block can contain identical Another threat is instrumentation: changes
twins or participants with similar genetic char- in the calibration of measuring instruments
acteristics. The essential requirement is that between the pretest and posttest, shifts in
the variable on which participants are matched the criteria used by observers or scorers,
is correlated with the dependent variable. The or unequal intervals in different ranges of
null and alternative hypotheses are measuring instruments.
The one-group posttest-only design is most
H0 : µ1 − µ2 = δ0 (2) useful in laboratory settings where the time
H1 : µ1 − µ2 $ = δ0 , (3) interval between the pretest and posttest is
short. The internal validity of the design can be
where δ0 is usually equal to 0. A t statistic for improved by the addition of a second pretest
dependent samples typically is used to test as I show next.
the null hypothesis.
The design is subject to several of the One-group double-pretest posttest
same threats to internal validity as the one-
design
group posttest-only design, namely, history
and maturation. These two threats become The plausibility of maturation and statistical
more plausible explanations for an observed regression as threats to internal validity of
EXPERIMENTAL DESIGN 27

Dep. Dep. Treat. Dep. Treat. Dep.


Var. Var. Level Var. Level Var.

Block1 Y 11 Y 12 a1 Y 13 Participant1 a1 Y 11


 

 

Block2 Y 21 Y 22 a1 Y 23  Participant2 a1 Y 21

 



Group1 Block3 Y 31 Y 32 a1 Y 33 Group1  .. .. ..
. . .


.. .. .. .. ..

 


Participantn1 a1 Y n1 1
 


 . . . . . ............................

Blockn Y n1 Y n2 a1 Y n3
............................................. Y¯ .1
Y¯ .1 Y¯ .2 Y¯ .3 

 Participant1 a2 Y 12


 Participant2
 a2 Y 22
Figure 2.3 Layout for a one-group
double-pretest-posttest design. The Group2 . . .


 .. .. ..
dependent variable (Dep. Var.) for each of


Participantn2 a2 Y n2 2

the n blocks of participants is measured ............................
twice prior to the presentation of the Y¯ .2
treatment level (Treat. Level), a1 , and again
after the treatment level has been Dep. Var., dependent variable; Treat. Level, treatment level.
presented. The means of the pretests are Figure 2.4 Layout for an independent
denoted by Ȳ .1 and Ȳ .2 ; the mean of the samples t -statistic design. Twenty
posttest is denoted by Ȳ .3 . participants are randomly assigned to
treatment levels a1 and a2 with n1 = n2 = 10
in the respective levels. The means of the
an experiment can be reduced by having two treatment levels are denoted by Ȳ .1 and Ȳ .2 ,
pretests as shown in Figure 2.3. The time respectively.
intervals between the three measurements
of the dependent variable should be the
same. The null and alternative hypotheses are, validity of a design. One such design is the
respectively, randomization and analysis plan that is used
H0 : µ2 = µ3 (4) with a t statistic for independent samples.
H1 : µ2 $ = µ3 (5) The design is appropriate for experiments in
which N participants are randomly assigned
The data are analyzed by means of a ran- to treatment levels a1 and a2 with n1 and n2
domized block analysis of covariance design participants in the respective levels.Adiagram
where the difference score, Di = Yi1 – Yi2 , of the design is shown in Figure 2.4.
for the two pretests is used as the covariate. Consider an experiment to evaluate the
The analysis statistically adjusts the contrast effectiveness of a medication for helping
Ȳ .2 – Ȳ .3 for the threats of maturation and smokers break the habit. The treatment levels
statistical regression. Unfortunately, the use are a1 = a medication delivered by a patch
of two pretests increases the threat of another that is applied to a smoker’s back and a2 = a
rival hypothesis: testing. History and instru- placebo, a patch without the medication. The
mentation also are threats to internal validity. dependent variable is a participant’s rating
on a ten-point scale of his or her desire for
a cigarette after six months of treatment.
SIMPLE-EXPERIMENTAL DESIGNS
The null and alternative hypotheses for the
WITH ONE OR MORE CONTROL experiment are, respectively,
GROUPS
H0 : µ1 − µ2 = δ0 (6)
Independent samples t-statistic H1 : µ1 − µ2 $ = δ0 (7)
design
The inclusion of one or more control groups where µ1 and µ2 denote the means of
in an experiment greatly increases the internal the respective populations and δ0 usually
28 DESIGN AND INFERENCE

is equal to 0. A t statistic for independent


Dep. Dep. Treat. Dep.
samples is used to test the null hypothesis.
Var. Var. Level Var.
Assume that N = 20 smokers are available
Block1 a1 Y 11 a2 Y 12
to participate in the experiment. The resear-
cher assigns n = 10 smokers to each treatment Block2 a1 Y 21 a2 Y 22
level so that each of the (np)!/(n!)p = 184,756 Block3 a1 Y 31 a2 Y 32
.. .. .. .. ..
possible assignments has the same probability. . . . . .
This is accomplished by numbering the Block10 a1 Y 10,1 a2 Y 10,2
...............................................
smokers from 1 to 20 and drawing numbers
from a random numbers table. The smokers Y¯ .1 Y¯ .2
corresponding to the first 10 unique numbers Dep. Var., dependent variable; Treat. Level, treatment level.
drawn between 1 and 20 are assigned to Figure 2.5 Layout for a dependent samples
treatment level a1 ; the remaining 10 smokers t -statistic design. Each block in the smoking
are assigned to treatment level a2 . experiment contains two matched
Random assignment is essential for the participants. The participants in each block
are randomly assigned to the treatment
internal validity of the experiment. Random
levels. The means of the treatments levels
assignment helps to distribute the idiosyn- are denoted by Ȳ .1 and Ȳ .2 .
cratic characteristics of the participants over
the two treatment levels so that the character-
istics do not selectively bias the outcome of the rank them in terms of the number of cigarettes
experiment. If, for example, a disproportion- they smoke. The participants ranked 1 and
ately large number of very heavy smokers was 2 are assigned to block one, those ranked
assigned to either treatment level, the evalua- 3 and 4 are assigned to block two, and so
tion of the treatment could be compromised. on. In this example, 10 blocks of matched
Also, it is essential to measure the dependent participants can be formed. The participants
variable at the same time and under the same in each block are then randomly assigned to
conditions. If, for example, the dependent treatment level a1 or a2 . The layout for the
variable is measured in different testing room, experiment is shown in Figure 2.5. The null
irrelevant events in one of the rooms— and alternative hypotheses for the experiment
distracting outside noises, poor room air cir- are, respectively:
culation, and so on—become rival hypotheses
for the difference between Ȳ .1 and Ȳ .2 . H0 : µ1 − µ2 = δ0 (8)
H1 : µ1 − µ2 $= δ0 (9)
Dependent samples t-statistic
where µ1 and µ2 denote the means of the
design
respective populations; δ0 usually is equal
Let us reconsider the cigarette smoking to 0. If our hunch is correct, that difficulty
experiment. It is reasonable to assume that in breaking the smoking habit is related to
the difficulty in breaking the smoking habit is the number of cigarettes smoked per day,
related to the number of cigarettes that a per- the t test for the dependent-samples design
son smokes per day. The design of the smoking should result in a more powerful test of a
experiment can be improved by isolating this false null hypothesis than the t test for the
nuisance variable. The nuisance variable can independent-samples design. The increased
be isolated by using the experimental design power results from isolating the nuisance
for a dependent samples t statistic. Instead of variable of number of cigarettes smoked per
randomly assigning 20 participants to the two day and thereby obtaining a more precise
treatment levels, a researcher can form pairs estimate of treatment effects and a reduction
of participants who are matched with respect in the size of the error effects.
to the number of cigarettes smoked per day. In this example, dependent samples were
A simple way to match the participants is to obtained by forming pairs of smokers who
EXPERIMENTAL DESIGN 29

were similar with respect to the number


Dep. Treat. Dep.
of cigarettes smoked per day—a nuisance
Var. Level Var.
variable that is positively correlated with the 
Block1 Y 11 a1 Y 12
dependent variable. This procedure is called 


 Block Y 21 a1 Y 22

participant matching. Dependent samples 2
.. .. .. ..
also can be obtained by: (1) observing each Group1 

 . . . .


participant under all the treatment levels— Blockn1 Y n1 1 a1 Y n1 2
...................................
that is, obtaining repeated measures on the
Y¯ .1 Y¯ .2
participants; (2) using identical twins or 
Block1 Y 13 Y 14

litter mates in which case the participants 

 Block Y 23 Y 24

2
have similar genetic characteristics; and .. .. ..
(3) obtaining participants who are matched by Group2 

 . . .


mutual selection, for example, husband and Blockn2 Y n2 3 Y n2 4
...................................
wife pairs or business partners. ¯
Y .3 Y¯ .4

I have described four ways of obtaining 
 Block1 a1 Y 15

dependent samples. The use of repeated  Block a1 Y 25

2
measures on the participants usually results in .. .. ..
Group3 

 . . .
the best within-block homogeneity. However,


Blockn3 a1 Y n3 5
...................................
if repeated measures are obtained, the effects
Y¯ .5
of one treatment level should dissipate 
Block1 Y 16

before the participant is observed under the 

 Block Y 26

2
other treatment level. Otherwise the second .. ..
observation will reflect the effects of both Group4 

 . .


treatment levels. There is no such restriction, Blockn4 Y n4 6
...................................
of course, if carryover effects such as learning Y¯ .6
or fatigue are the principal interest of the
researcher. If blocks are composed of identical Figure 2.6 Layout for a Solomon four-group
twins or litter mates, it is assumed that the design; n participants are randomly assigned
performance of participants having identical to four groups with n1 , n2 , n3 , and n4
or similar heredities will be more homoge- participants in the groups. The pretest is
given at the same time to groups 1 and 2.
neous than the performance of participants
The treatment is administered at the same
having dissimilar heredities. If blocks are time to groups 1 and 3. The posttest is given
composed of participants who are matched by at the same time to all four groups.
mutual selection, for example, husband and
wife pairs, a researcher must ascertain that the
participants in a block are in fact more homo- settings. The layout for the Solomon four-
geneous with respect to the dependent variable group design is shown in Figure 2.6.
than are unmatched participants. A husband The data from the design can be analyzed
and wife often have similar interests and in a variety of ways. For example, the effects
socioeconomic levels; the couple is less likely of treatment a1 can be evaluated using a
to have similar mechanical aptitudes. dependent samples t test of H0 : µ1 − µ2 and
independent samples t tests of H0 : µ2 − µ4 ,
H0 : µ3 − µ5 , and H0 : µ5 − µ6 . Unfortunately,
Solomon four-group design
no statistical procedure simultaneously uses
The Solomon four-group design enables a all of the data. A two-treatment, completely
researcher to control all threats to internal randomized, factorial design that is described
validity as well as some threats to external later can be used to evaluate the effects
validity. External validity is concerned with of the treatment, pretesting, and the testing-
the generalizability of research findings to treatment interaction. The latter effect is a
and across populations of participants and threat to external validity. The layout for the
30 DESIGN AND INFERENCE

Treatment B
b1 b2
No Pretest Pretest
Y 15 Y 12
Y 25 Y 22
.. ..
a 1 Treatment . .
Y n3 5 Y n1 2
.....................................
Y¯ .5 Y¯ .2 Y¯ Treatment

Treatment A Y 16 Y 14
Y 26 Y 24
.. ..
a 2 No Treatment . .
Y n4 6 Y n2 4
.....................................
Y¯ .6 Y¯ .4 Y¯ No Treatment

Y¯ No Pretest Y¯ Pretest

Figure 2.7 Layout for analyzing the Solomon four-group design. Three contrasts can be
tested: effects of treatment A (Ȳ Treatment versus Ȳ No Treatment ), effects of treatment B
(Ȳ Pretest versus Ȳ No pretest ), and the interaction of treatments A and B (Ȳ .5 + Ȳ .4 ) versus
(Ȳ .2 + Ȳ .6 ).

factorial design is shown in Figure 2.7. The sample contrasts of Ȳ .1 versus Ȳ .6 and Ȳ .3
null hypotheses are as follows: versus Ȳ .6 .
The inclusion of one or more control groups
H0 : µTreatment = µNo Treatment (treatment A in a design helps to ameliorate the threats to
population means are equal) internal validity mentioned earlier. However,
H0 : µPretest = µNo Pretest (treatment B as I describe next, a researcher must still deal
with other threats to internal validity when the
population means are equal) experimental units are people.
H0 : A × B interaction = 0 (treatments A
and B do not interact)
THREATS TO INTERNAL VALIDITY
The null hypotheses are tested with the WHEN THE EXPERIMENTAL UNITS
following F statistics: ARE PEOPLE
MSA MSB Demand characteristics
F= ,F = , and
MSWCELL MSWCELL
MSA × B Doing research with people poses special
F= (10) threats to the internal validity of a design.
MSWCELL
Because of space restrictions, I will men-
where MSWCELL denotes the within-cell- tion only three threats: demand character-
mean square. istics, participant-predisposition effects, and
Insight into the effects of administering a experimenter-expectancy effects. According
pretest is obtained by examining the sample to Orne (1962), demand characteristics result
contrasts of Ȳ .2 versus Ȳ .5 and Ȳ .4 versus Ȳ .6 . from cues in the experimental environment
Similarly, insight into the effects of maturation or procedure that lead participants to make
and history is obtained by examining the inferences about the purpose of an experiment
EXPERIMENTAL DESIGN 31

and to respond in accordance with (or in some aren’t interested in the experimenter’s hypoth-
cases, contrary to) the perceived purpose. esis, much less in sabotaging the experiment.
People are inveterate problem solvers. When Instead, their primary concern is in gaining a
they are told to perform a task, the majority positive evaluation from the researcher. The
will try to figure out what is expected data they provide are colored by a desire
of them and perform accordingly. Demand to appear intelligent, well adjusted, and so
characteristics can result from rumors about on, and to avoid revealing characteristics
an experiment, what participants are told that they consider undesirable. Clearly, these
when they sign up for an experiment, participant-predisposition effects can affect
the laboratory environment, or the com- the internal validity of an experiment.
munication that occurs during the course
of an experiment. Demand characteristics
Experimenter-expectancy effects
influence a participant’s perceptions of what
is appropriate or expected and, hence, their Experimenter-expectancy effects can also
behavior. affect the internal validity of an experiment.
Experiments with human participants are
social situations in which one person behaves
Participant-predisposition effects
under the scrutiny of another. The researcher
Participant-predisposition effects can also requests a behavior and the participant
affect the interval validity of an experiment. behaves. The researcher’s overt request may
Because of past experiences, personality, and be accompanied by other more subtle requests
so on, participants come to experiments with a and messages. For example, body language,
predisposition to respond in a particular way. tone of voice, and facial expressions can
I will mention three kinds of participants. communicate the researcher’s expectations
The first group of participants is mainly and desires concerning the outcome of an
concerned with pleasing the researcher and experiment. Such communications can affect
being ‘good subjects.’ They try, consciously a subject’s performance.
or unconsciously, to provide data that support Rosenthal (1963) has documented
the researcher’s hypothesis. This participant numerous examples of how experimenter-
predisposition is called the cooperative- expectancy effects can affect the internal
participant effect. validity of an experiment. He found that
A second group of participants tends to be researchers tend to obtain from their subjects,
uncooperative and may even try to sabotage whether human or animal, the data they
the experiment. Masling (1966) has called want or expect to obtain. A researcher’s
this predisposition the ‘screw you effect.’ The expectations and desires also can influence
effect is often seen when research participa- the way he or she records, analyses, and
tion is part of a college course requirement. interprets data. According to Rosenthal (1969,
The predisposition can result from resentment 1978), observational or recording errors are
over being required to participate in an usually small and unintentional. However,
experiment, from a bad experience in a when such errors occur, more often than
previous experiment such as being deceived not they are in the direction of supporting
or made to feel inadequate, or from a dislike the researcher’s hypothesis. Sheridan (1976)
for the course or the professor associated with reported that researchers are much more
the course. Uncooperative participants may likely to recompute and double-check results
try, consciously or unconsciously, to provide that conflict with their hypotheses than results
data that do not support the researcher’s that support their hypotheses.
hypothesis. Kirk (1995: 22–24) described a variety of
A third group of participants are apprehen- ways of minimizing these kinds of threats to
sive about being evaluated. Participants with internal validity. It is common practice, for
evaluation apprehension (Rosenberg, 1965) example, to use a single-blind procedure in
32 DESIGN AND INFERENCE

which participants are not informed about the delivered by a patch, cognitive-behavioral
nature of their treatment and, when feasible, therapy, and a patch without medication
the purpose of the experiment. A single-blind ( placebo). In this example and those that
procedure helps to minimize the effects of follow, a fixed-effects model is assumed, that
demand characteristics. is, the experiment contains all of the treatment
A double-blind procedure, in which neither levels of interest to the researcher. The null
the participant nor the researcher knows and alternative hypotheses for the smoking
which treatment level is administered, is experiment are, respectively:
even more effective. For example, in the
smoking experiment the patch with the H 0 : µ1 = µ2 = µ3 (11)
medication and the placebo can be coded so H1 : µj $= µj& for some j and j& (12)
that those administering the treatment cannot
identify the condition that is administered. Assume that 30 smokers are available to
A double-blind procedure helps to minimize participate in the experiment. The smokers
both experimenter-expectancy effects and are randomly assigned to the three treatment
demand characteristics. Often, the nature of levels with 10 smokers assigned to each
the treatment levels is easily identified. In level. The layout for the experiment is shown
this case, a partial-blind procedure can be in Figure 2.8. Comparison of the layout
used in which the researcher does not know in this figure with that in Figure 2.4 for
until just before administering the treatment an independent samples t-statistic design
level which level will be administered. In reveals that they are the same except that
this way, experimenter-expectancy effects are the completely randomized design has three
minimized up until the administration of the treatment levels.
treatment level. I have identified the null hypothesis that the
The relatively simple designs described so researcher wants to test, µ1 = µ2 = µ3 , and
far provide varying degrees of control of described the manner in which the participants
threats to internal validity. For a description of are assigned to the three treatment levels.
other simple designs such as the longitudinal- Space limitations prevent me from describing
overlapping design, time-lag design, time- the computational procedures or the assump-
series design, and single-subject design, the tions associated with the design. For this
reader is referred to Kirk (1995: 12–16). The information, the reader is referred to the
designs described next are all analyzed using many excellent books on experimental design
the analysis of variance and provide excellent (Anderson, 2001; Dean and Voss, 1999;
control of threats to internal validity. Giesbrecht and Gumpertz, 2004; Kirk, 1995;
Maxwell and Delaney, 2004; Ryan, 2007).
The partition of the total sum of squares,
ANALYSIS OF VARIANCE DESIGNS SSTOTAL, and total degrees of freedom,
np − 1, for a completely randomized design
WITH ONE TREATMENT
are as follows:
Completely randomized design SSTOTAL = SSBG + SSWG (13)
The simplest analysis of variance (ANOVA) np − 1 = (p − 1) + p(n − 1) (14)
design (a completely randomized design)
involves randomly assigning participants to a where SSBG denotes the between-groups sum
treatment with two or more levels. The design of squares and SSWG denotes the within-
shares many of the features of an independent- groups sum of squares. The null hypothesis
samples t-statistic design. Again, consider the is tested with the following F statistic:
smoking experiment and suppose that the
researcher wants to evaluate the effective- SSBG/( p − 1) MSBG
F= = (15)
ness of three treatment levels: medication SSWG/[ p(n − 1)] MSWG
EXPERIMENTAL DESIGN 33

powerful than the completely randomized


Treat. Dep.
Level Var.
design.


 Participant1 a1 Y 11
Randomized block design


 Participant2
 a1 Y 21
Group1 .. .. ..



 . . . The randomized block design can be thought
of as an extension of a dependent samples


Participantn1 a1 Y n1 1
............................ t-statistic design for the case in which the
Y¯ .1 treatment has two or more levels. The layout

 Participant1 a2 Y 12 for a randomized block design with p = 3
levels of treatment A and n = 10 blocks



 Participant2
 a2 Y 22
.. .. ..
is shown in Figure 2.9. Comparison of the
Group2
layout in this figure with that in Figure 2.5

. . .




for a dependent samples t-statistic design

Participantn2 a2 Y n2 2
............................ reveals that they are the same except that the
Y¯ .2 randomized block design has three treatment


 Participant1 a3 Y 13 levels. A block can contain a single participant


 Participant2
 a3 Y 23 who is observed under all p treatment levels
Group3 
 .
..
.
..
.
.. or p participants who are similar with respect



 to a variable that is positively correlated with
Participantn3 a3 Y n3 3
............................ the dependent variable. If each block contains
one participant, the order in which the treat-
Y¯ .3
ment levels are administered is randomized
Dep. Var., dependent variable.
independently for each block, assuming that
the nature of the treatment and the research
Figure 2.8 Layout for a completely
randomized design with p = 3 treatment hypothesis permit this. If a block contains
levels (Treat. Level). Thirty participants are p matched participants, the participants in
randomly assigned to the treatment levels each block are randomly assigned to the
with n1 = n2 = n3 = 10. The means of groups treatment levels. The use of repeated measures
one, two, and three are denoted by, Ȳ .1 , Ȳ .2 , or matched participants does not affect the
and Ȳ .3 , respectively statistical analysis. However, the alternative
procedures do affect the interpretation of
The advantages of a completely randomized the results. For example, the results of an
design are: (1) simplicity in the randomization experiment with repeated measures generalize
and statistical analysis; and (2) flexibility with to a population of participants who have
respect to having an equal or unequal number been exposed to all of the treatment levels.
of participants in the treatment levels. A The results of an experiment with matched
disadvantage is that nuisance variables such as participants generalize to a population of
differences among the participants prior to the participants who have been exposed to only
administration of the treatment are controlled one treatment level. Some writers reserve
by random assignment. For this control to be the designation randomized block design for
effective, the participants should be relatively this latter case. They refer to a design
homogeneous or a relatively large number with repeated measurements in which the
of participants should be used. The design order of administration of the treatment
described next, a randomized block design, levels is randomized independently for each
enables a researcher to isolate and remove participant as a subjects-by-treatments design.
one source of variation among participants A design with repeated measurements in
that ordinarily would be included in the error which the order of administration of the
effects of the F statistic. As a result, the treatment levels is the same for all participants
randomized block design is usually more is referred to as a subject-by-trials design.
34 DESIGN AND INFERENCE

Treat. Dep. Treat. Dep. Treat. Dep.


Level Var. Level Var. Level Var.
Block1 a1 Y 11 a2 Y 12 a3 Y 13 Y¯ 1.
Block2 a1 Y 21 a2 Y 22 a3 Y 23 Y¯ 2.
Block3 a1 Y 31 a2 Y 32 a3 Y 33 Y¯ 3.
.. .. .. .. .. .. .. ..
. . . . . . . .
Block10 a1 Y 10,1 a2 Y 10,2 a3 Y 10,3 Y¯ 10.
.........................................................................
Y¯ .1 Y¯ .2 Y¯ .3
Dep. Var., dependent variable.

Figure 2.9 Layout for a randomized block design with p = 3 treatment levels (Treat. Level)
and n = 10 blocks. In the smoking experiment, the p participants in each block were randomly
assigned to the treatment levels. The means of treatment A are denoted by Ȳ .1 , Ȳ .2 , and Ȳ .3 ;
the means of the blocks are denoted by Ȳ 1. , … , Ȳ 10. .

I prefer to use the designation randomized The test of the block null hypothesis is of
block design for all three cases. little interest because the blocks represent
The total sum of squares and total degrees a nuisance variable that a researcher wants
of freedom are partitioned as follows: to isolate so that it does not appear in the
SSTOTAL = SSA + SSBLOCKS estimators of the error effects.
The advantages of this design are: (1) sim-
+ SSRESIDUAL (16)
plicity in the statistical analysis; and (2) the
np − 1 = ( p − 1) + (n − 1) ability to isolate a nuisance variable so as
+ (n − 1)( p − 1) (17) to obtain greater power to reject a false null
hypothesis. The disadvantages of the design
where SSA denotes the treatment A sum of include: (1) the difficulty of forming homo-
squares and SSBLOCKS denotes the block geneous blocks or observing participants p
sum of squares. The SSRESIDUAL is the times when p is large; and (2) the restrictive
interaction between treatment A and blocks; assumptions (sphericity and additivity) of the
it is used to estimate error effects. Two null design. For a description of these assumptions,
hypotheses can be tested: see Kirk (1995: 271–282).
H0 : µ.1 = µ.2 = µ.3 (treatment A
population means are equal) (18)
H0 : µ1. = µ2. = . . . = µ.15 (block, BL, Latin square design
population means are equal) (19) The Latin square design is appropriate to
experiments that have one treatment, denoted
where µij denotes the population mean for the by A, with p ≥ 2 levels and two nuisance
ith block and the jth level of treatment A. The variables, denoted by B and C, each with p lev-
F statistics are: els. The design gets its name from an ancient
SSA/( p − 1) puzzle that was concerned with the number of
F= ways that Latin letters can be arranged in a
SSRESIDUAL/[(n − 1)( p − 1)]
square matrix so that each letter appears once
MSA
= (20) in each row and once in each column. A 3 × 3
MSRESIDUAL Latin square is shown in Figure 2.10.
SSBL/(n − 1) The randomized block design enables a
F=
SSRESIDUAL/[(n − 1)( p − 1)] researcher to isolate one nuisance variable:
MSBL variation among blocks. A Latin square
= (21)
MSRESIDUAL design extends this procedure to two nuisance
EXPERIMENTAL DESIGN 35

c1 c2 c3
b1 a1 a2 a3 Treat. Dep.
Comb. Var.
b2 a2 a3 a1 
 Participant1 a1b1c 1 Y 111
b3 a3 a1 a2 


. . .
Group1 .. .. ..

Figure 2.10 Three-by-three Latin square,



Participantn1 a1b1c 1 Y 111
where aj denotes one of the j = 1, …, p
..............................
levels of treatment A, bk denotes one of the
k = 1, …, p levels of nuisance variable B, Y¯ .111
and cl denotes one of the l = 1, … , p levels

 Participant1 a1b2c 3 Y 123
of nuisance variable C . Each level of



.. .. ..
treatment A appears once in each row and Group2 . . .
once in each column as required for a




Latin square. Participantn2 a1b2c 3 Y 123
..............................
Y¯ .123
variables: variation associated with the rows 
 Participant1 a1b3c 2 Y 132
(B) of the Latin square and variation associ-



.. .. ..
ated with the columns of the square (C). As Group3 
 . . .
a result, the Latin square design is generally


Participantn3 a1b3c 2 Y 132
more powerful than the randomized block ..............................
design. The layout for a Latin square design Y¯ .132
with three levels of treatment A is shown 
 Participant1 a2b1c 2 Y 212

in Figure 2.11 and is based on the aj bk cl 

.. .. ..
combinations in Figure 2.10. The total sum Group4 
 . . .

of squares and total degrees of freedom are 
Participantn4 a2b1c 2 Y 212
partitioned as follows: ..............................
Y¯ .212
SSTOTAL = SSA + SSB + SSC .. .. ..
. . .
+ SSRESIDUAL + SSWCELL 
Participant1 a3b3c 1 Y 331
np2 − 1 = (p − 1) + (p − 1) + (p − 1)




.. .. ..
+ ( p − 1)( p − 2) + p2 (n − 1) Group9 


. . .

Participantn9 a3b3c 1 Y 331
where SSA denotes the treatment sum of ..............................
squares, SSB denotes the row sum of squares, Y¯ .331
and SSC denotes the column sum of squares. Dep. Var., dependent variable; Treat. Comb., treatment
SSWCELL denotes the within cell sum of combination.
squares and estimates error effects. Four null Figure 2.11 Layout for a Latin square
hypotheses can be tested: design that is based on the Latin square
in Figure 2.10. Treatment A and the two
H0 : µ1.. = µ2.. = µ3.. (treatment A nuisance variables, B and C , each have
p = 3 levels.
population means are equal)
H0 : µ.1. = µ.2. = µ.3. (row, B, where µjkl denotes a population mean for the
population means are equal) jth treatment level, kth row, and lth column.
H0 : µ..1 = µ..2 = µ..3 (column, C, The F statistics are:
population means are equal)
SSA/( p − 1) MSA
H0 : interaction components = 0 (selected F= =
SSWCELL/[ p2 (n − 1)] MSWCELL
A × B, A × C, B × C, and A × B × C SSB/( p − 1) MSB
F= =
interaction components equal zero) 2
SSWCELL/[ p (n − 1)] MSWCELL
36 DESIGN AND INFERENCE

SSC/( p − 1) MSC w groups each containing np homogeneous


F= = participants. The total number of participants
SSWCELL/[ p2 (n − 1)] MSWCELL
in the design is N = npw. The np participants
SSRESIDUAL/( p − 1)( p − 2)
F= in each group are randomly assigned to the
SSWCELL/[ p2 (n − 1)] p treatment levels with the restriction that
MSRESIDUAL n participants are assigned to each level.
= .
MSWCELL The layout for the design is shown in
Figure 2.12.
The advantage of the Latin square design In the smoking experiment, suppose that
is the ability to isolate two nuisance vari- 30 smokers are available to participate.
ables to obtain greater power to reject a The 30 smokers are ranked with respect to
false null hypothesis. The disadvantages are: the length of time that they have smoked. The
(1) the number of treatment levels, rows, and np = (2)(3) = 6 smokers who have smoked
columns of the Latin square must be equal, for the shortest length of time are assigned to
a balance that may be difficult to achieve; group 1, the next six smokers are assigned to
(2) if there are any interactions among the group 2, and so on. The six smokers in each
treatment levels, rows, and columns, the test group are then randomly assigned to the three
of treatment A is positively biased; and (3) the treatment levels with the restriction that n = 2
randomization is relatively complex. smokers are assigned to each level.
I have described three simple ANOVA The total sum of squares and total degrees
designs: completely randomized design, ran- of freedom are partitioned as follows:
domized block design, and Latin square
design. I call these three designs building- SSTOTAL = SSA + SSG
block designs because all complex ANOVA + SSA × G + SSWCELL
designs can be constructed by modifying
npw − 1 = (p − 1) + (w − 1)
or combining these simple designs (Kirk,
2005: 69). Furthermore, the randomization + ( p − 1)(w − 1)
procedures, data-analysis procedures, and + pw(n − 1)
assumptions for complex ANOVA designs
are extensions of those for the three where SSG denotes the groups sum of
building-block designs. The generalized ran- squares and SSA × G denotes the interaction
domized block design that is described next of treatment A and groups. The within cells
represents a modification of the randomized sum of squares, SSWCELL, is used to estimate
block design. error effects. Three null hypotheses can be
tested:

Generalized randomized block H0 : µ1. = µ2. = µ3. (treatment A


design population means are equal)
A generalized randomized block design is H0 : µ.1 = µ.2 = · · · = µ.5 (group
a variation of a randomized block design. population means are equal)
Instead of having n blocks of homogeneous H0 : A × G interaction = 0 (treatment A
participants, the generalized randomized and groups do not interact),
block design has w groups of np homogeneous
participants. The z = 1, . . ., w groups, like where µjz denotes a population mean for the
the blocks in a randomized block design, ith block, jth treatment level, and zth group.
represent a nuisance variable that a researcher The three null hypotheses are tested using the
wants to remove from the error effects. following F statistics:
The generalized randomized block design is
appropriate for experiments that have one SSA/(p−1) MSA
F= =
treatment with p ≥ 2 treatment levels and SSWCELL/[pw(n−1)] MSWCELL
EXPERIMENTAL DESIGN 37

Treat. Dep. Treat. Dep. Treat. Dep.


Level Var. Level Var. Level Var.
$
1 a1 Y 111 3 a2 Y 321 5 a3 Y 531
Group1
2 a1 Y 211 4 a2 Y 421 6 a3 Y 631
......................... ......................... .........................
Y¯ .11 Y¯ .21 Y¯ .31
$
7 a1 Y 712 9 a2 Y 922 11 a3 Y 11,32
Group2
8 a1 Y 812 10 a2 Y 10,22 12 a3 Y 12,32
......................... ......................... .........................
Y¯ .12 Y¯ .22 Y¯ .32
$
13 a1 Y 13,13 15 a2 Y 15,23 17 a3 Y 17,33
Group3
14 a1 Y 14,13 16 a2 Y 16,23 18 a3 Y 18,33
......................... ......................... .........................
Y¯ .13 Y¯ .23 Y¯ .33
$
19 a1 Y 19,14 21 a2 Y 21,24 23 a3 Y 23,34
Group4
20 a1 Y 20,14 22 a2 Y 22,24 24 a3 Y 24,34
......................... ......................... .........................
Y¯ .14 Y¯ .24 Y¯ .34
$
25 a1 Y 25,15 27 a2 Y 27,25 29 a3 Y 29,35
Group5
26 a1 Y 26,15 28 a2 Y 28,25 30 a3 Y 30,35
......................... ......................... .........................
Y¯ .15 Y¯ .25 Y¯ .35

Dep. Var., dependent variable.

Figure 2.12 Generalized randomized block design with n = 30 participants, p = 3 treatment


levels, and w = 5 groups of np = (2)(3) = 6 homogeneous participants. The six participants in
each group are randomly assigned to the three treatment levels (Treat. Level) with the
restriction that two participants are assigned to each level.

SSG/(w−1) MSG ANALYSIS OF VARIANCE DESIGNS


F= =
SSWCELL/[pw(n−1)] MSWCELL WITH TWO OR MORE TREATMENTS
SSA×G/(p−1)(w−1) MSA×G
F= = The ANOVA designs described thus far all
SSWCELL/[pw(n−1)] MSWCELL
have one treatment with p ≥ 2 levels. The
The generalized randomized block design designs described next have two or more
enables a researcher to isolate one nuisance treatments denoted by the letters A, B, C, and
variable, an advantage that it shares with the so on. If all of the treatments are completely
randomized block design. Furthermore, the crossed, the design is called a factorial design.
design uses the within cell variation in Two treatments are completely crossed if each
the pw = 15 cells to estimate error effects level of one treatment appears in combination
rather than an interaction as in the randomized with each level of the other treatment and
block design. Hence, the restrictive sphericity vice se versa. Alternatively, a treatment can
and additivity assumptions of the randomized be nested within another treatment. If, for
block design are replaced with the assumption example, each level of treatment B appears
of homogeneity of within cell population with only one level of treatment A, treatment
variances. B is nested within treatment A. The distinction
38 DESIGN AND INFERENCE

(a) Factorial design (b) Hierarchical design


a1 a2 a1 a2

b1 b2 b1 b2 b1 b2 b3 b4

Figure 2.13 (a) illustrates crossed treatments. In (b),


treatment B(A) is nested in treatment A.

between crossed and nested treatments is


Treat. Dep.
illustrated in Figure 2.13. The use of crossed Comb. Var.
treatments is a distinguishing characteristic 
Participant1 a1b1 Y 111
of all factorial designs. The use of at




. . .
least one nested treatment in a design is a Group1 
.. .. ..
distinguishing characteristic of hierarchical



Participantn1 a1b1 Y n1 11
designs.
Y¯ .11


 Participant1 a1b2 Y 112


. . .
Completely randomized factorial Group2 

.. .. ..
design


Participantn2 a1b2 Y n2 12
The simplest factorial design from the Y¯ .12
standpoint of randomization procedures and


 Participant1 a2b1 Y 121
data analysis is the completely randomized


.. .. ..
factorial design with p levels of treatment Group3 
 . . .

A and q levels of treatment B. The design

Participantn3 a2b1 Y n3 21
is constructed by crossing the p levels of Y¯ .21
one completely randomized design with the 
Participant1 a2b2 Y 122
q levels of a second completely randomized




.. .. ..
design. The design has p × q treatment Group4  . . .
combinations, a1 b1 , a1 b2 , … , ap bq .



Participantn4 a2b2 Y n4 22
The layout for a two-treatment completely
randomized factorial design with p = 2 Y¯ .22
levels of treatment A and q = 2 levels Dep. Var., dependent variable; Treat. Comb., treatment
of treatment B is shown in Figure 2.14. In combination.
this example, 40 participants are randomly Figure 2.14 Layout for a two-treatment,
assigned to the 2 × 2 = 4 treatment completely randomized factorial design
combinations with the restriction that n = where n = 40 participants were randomly
10 participants are assigned to each com- assigned to four combinations of treatments
A and B, with the restriction that n1 = … =
bination. This design enables a researcher n4 = 10 participants were assigned to each
to simultaneously evaluate two treatments, combination.
A and B, and, in addition, the interaction
between the treatments, denoted by A×B. Two
as follows:
treatments are said to interact if differences in
the dependent variable for the levels of one SSTOTAL = SSA + SSB
treatment are different at two or more levels + SSA × B + SSWCELL
of the other treatment.
The total sum of squares and total degrees npq − 1 = ( p − 1) + (q − 1)
of freedom for the design are partitioned + ( p − 1)(q − 1) + pq(n − 1),
EXPERIMENTAL DESIGN 39

where SSA × B denotes the interaction sum Randomized block factorial design
of squares for treatments A and B. Three null
hypotheses can be tested: A two-treatment, randomized block factorial
design is constructed by crossing p levels of
one randomized block design with q levels
H0 : µ1. = µ2. = · · · = µp. (treatment A
of a second randomized block design. This
population means are equal) procedure produces p × q treatment combina-
H0 : µ.1 = µ.2 = · · · = µ.q (treatment B tions: a1 b1 , a1 b2 , …, ap bq . The design uses the
population means are equal) blocking technique described in connection
with a randomized block design to isolate
H0 : A × B interaction = 0 (treatments A
variation attributable to a nuisance variable
and B do not interact), while simultaneously evaluating two or more
treatments and associated interactions.
where µjk denotes a population mean for the A two-treatment, randomized block fac-
jth level of treatment A and the kth level of torial design has blocks of size pq. If a
treatment B. The F statistics for testing the block consists of matched participants, n
null hypotheses are as follows: blocks of pq matched participants must be
formed. The participants in each block are
SSA/( p − 1) MSA randomly assigned to the pq treatment com-
F= = binations. Alternatively, if repeated measures
SSWCELL/[ pq(n − 1)] MSWCELL
are obtained, each participant is observed
SSB/(q − 1) MSB
F= = pq times. For this case, the order in which
SSWCELL/[ pq(n − 1)] MSWCELL the treatment combinations is administered
SSA × B/( p − 1)(q − 1) MSA × B
F= = . is randomized independently for each block,
SSWCELL/[ pq(n − 1)] MSWCELL assuming that the nature of the treatments and
research hypotheses permit this. The layout
The advantages of a completely random- for the design with p = 2 and q = 2 treatment
ized factorial design are as follows: (1) levels is shown in Figure 2.15.
All participants are used in simultaneously The total sum of squares and total degrees
evaluating the effects of two or more of freedom for a randomized block factorial
treatments. The effects of each treatment are design are partitioned as follows:
evaluated with the same precision as if the SSTOTAL = SSBL + SSA + SSB
entire experiment had been devoted to that
+ SSA × B + SSRESIDUAL
treatment alone. Thus, the design permits
efficient use of resources. (2) A researcher npq − 1 = (n − 1) + (p − 1) + (q − 1)
can determine whether the treatments interact. + ( p − 1)(q − 1) + (n − 1)( pq−1).
The disadvantages of the design are as Four null hypotheses can be tested:
follows: (1) If numerous treatments are
included in the experiment, the number H0 : µ1.. = µ2.. = . . . = µn.. (block
of participants required can be prohibitive. population means are equal)
(2) A factorial design lacks simplicity in H0 : µ.1. = µ.2. = . . . = µ.p. (treatment A
the interpretation of results if interaction population means are equal)
effects are present. Unfortunately, interactions
among variables in the behavioral sciences H 0 : µ..1 = µ..2 = . . . = µ..q (treatment B
and education are common. (3) The use of population means are equal)
a factorial design commits a researcher to H0 : A × B interaction = 0 (treatments A
a relatively large experiment. A series of and B do not interact),
small experiments permit greater freedom in
pursuiting unanticipated promising lines of where µijk denotes a population mean for the
investigation. ith block, jth level of treatment A, and kth level
40 DESIGN AND INFERENCE

Treat. Dep. Treat. Dep. Treat. Dep. Treat. Dep.


Comb. Var. Comb. Var. Comb. Var. Comb. Var.
Block1 a1b1 Y 111 a1b2 Y 112 a2b1 Y 121 a2b2 Y 122 Y¯ 1..
Block2 a1b1 Y 211 a1b2 Y 212 a2b1 Y 221 a2b2 Y 222 Y¯ 2..
Block3 a1b1 Y 311 a1b2 Y 312 a2b1 Y 321 a2b2 Y 322 Y¯ 3..
. . . . . . . . . .
.. .. .. .. .. .. .. .. .. ..
Block10 a1b1 Y 10,11 a1b2 Y 10,12 a2b1 Y 10,21 a2b2 Y 10,22 Y¯ 10,..
.........................................................................................................
Y¯ .11 Y¯ .12 Y¯ .21 Y¯ .22
Dep. Var., dependent variable; Treat. Comb., treatment combination.

Figure 2.15 Layout for a two-treatment, randomized block factorial design where four
matched participants are randomly assigned to the pq = 2 × 2 = 4 treatments combinations
in each block.

of treatment B. The F statistics for testing the power. However, if either p or q in a


null hypotheses are as follows: two-treatment, randomized factorial design
is moderately large, the number of treatment
SSBL/(n − 1) combinations in each block can be pro-
F=
SSRESIDUAL/[(n − 1)( pq − 1)] hibitively large. For example, if p = 3 and q =
MSBL 4, the design has blocks of size 3 × 4 = 12.
=
MSRESIDUAL Obtaining n blocks with twelve matched
SSA/( p − 1) participants or observing n participants on
F=
SSRESIDUAL/[(n − 1)( pq − 1)] twelve occasions is generally not feasible. In
MSA the late 1920s, Ronald A. Fisher and Frank
= Yates addressed the problem of prohibitively
MSRESIDUAL
SSB/(q − 1) large block sizes by developing confounding
F= schemes in which only a portion of the
SSRESIDUAL/[(n − 1)( pq − 1)]
treatment combinations in an experiment
MSB
= are assigned to each block (Yates, 1937).
MSRESIDUAL Their work was extended in the 1940s by
SSA × B/( p − 1)(q − 1)
F= David J. Finney (1945, 1946) and Oscar
SSRESIDUAL/[(n − 1)( pq − 1)] Kempthorne (1947).
MSA × B The split-plot factorial design that is
= .
MSRESIDUAL described next achieves a reduction in the
The design shares the advantages and disad- block size by confounding one or more treat-
vantages of the randomized block design. It ments with groups of blocks. Group-treatment
has an additional disadvantage: if treatment A confounding occurs when the effects of, say,
or B has numerous levels, say four or five, treatment A with p levels are indistinguishable
the block size becomes prohibitively large. from the effects of p groups of blocks.
Designs that reduce the size of the blocks are This form of confounding is characteristic
described next. of all split-plot factorial designs. A second
form of confounding, group-interaction con-
founding, also reduces the size of blocks.
ANALYSIS OF VARIANCE DESIGNS This form of confounding is characteristic
WITH CONFOUNDING of all confounded factorial designs. A third
form of confounding, treatment-interaction
An important advantage of a randomized confounding reduces the number of treatment
block factorial design relative to a comple- combinations that must be included in a
tely randomized factorial design is superior design. Treatment-interaction confounding
EXPERIMENTAL DESIGN 41

is characteristic of all fractional factorial and treatment B as the within-blocks treatment


designs. is shown in Figure 2.16.
Reducing the size of blocks and the number Comparison of Figure 2.16 for the split-
of treatment combinations that must be plot factorial design with Figure 2.15 for
included in a design are attractive. However, a randomized block factorial design reveals
in the design of experiments, researchers do that the designs contain the same treatment
not get something for nothing. In the case combinations: a1 b1 , a1 b2 , a2 b1 , and a2 b2 .
of a split-plot factorial design, the effects of However, the block size in the split-plot facto-
the confounded treatment are evaluated with rial design is half as large. The smaller block
less precision than in a randomized block size is achieved by confounding two groups of
factorial design. blocks with treatment A. Consider the sample
means for treatment A in Figure 2.16. The
difference between Ȳ .1. and Ȳ .2. reflects the
Split-plot factorial design difference between the two groups of blocks
as well as the difference between the two
The split-plot factorial design is appropriate levels of treatment A. To put it another way,
for experiments with two or more treatments you cannot tell how much of the difference
where the number of treatment combinations between Ȳ .1. and Ȳ .2. is attributable to the
exceeds the desired block size. The term split- difference between Group1 and Group2 and
plot comes from agricultural experimentation how much is attributable to the difference
where the levels of, say, treatment A are between treatment levels a1 and a2 . For this
applied to relatively large plots of land—the reason, the groups of blocks and treatment A
whole plots. The whole plots are then split or are said to be confounded.
subdivided, and the levels of treatment B are The total sum of squares and total degrees
applied to the subplots within each whole plot. of freedom for a split-plot factorial design are
A two-treatment, split-plot factorial design partitioned as follows:
is constructed by combining features of two
different building-block designs; a completely SSTOTAL = SSA+SSBL(A)+SSB
randomized design with p levels and a +SSA×B+SSRESIDUAL
randomized block design with q levels. The
layout for a split-plot factorial design with npq−1 = (p−1)+p(n−1)+(q−1)
treatment A as the between-block treatment +(p−1)(q−1)+p(n−1)(q−1),

Treat. Dep. Treat. Dep.


Comb. Var. Comb. Var.



 Block1 a1b1 Y 111 a1b1 Y 112

 Block

a1b1 Y 211 a1b2 Y 212
2
. . . . .
a1 Group1 

 .
. .. .. .. .. Y¯ .1.


 Block a1b1 Y n1 11 a1b2 Y n1 12
n1


 Block1 a2b1 Y 121 a2b2 Y 122

 Block

a2b1 Y 221 a2b2 Y 222
2
.. .. .. .. ..
a2 Group2 

 . . . . . Y¯ .2.


Blockn2 a2b1 Y n2 21 a2b2 Y n2 22
...................................................
Y¯ ..1 Y¯ ..2
Dep. Var., dependent variable; Treat. Comb., treatment combination.

Figure 2.16 Layout for a two-treatment, split-plot factorial design. Treatment A, a


between-blocks treatment, is confounded with groups. Treatment B is a within-blocks
treatment.
42 DESIGN AND INFERENCE

where SSBL(A) denotes the sum of squares acceptable and a researcher is more interested
for blocks within treatment A. Three null in treatment B and the A × B interaction than
hypotheses can be tested: in treatment A, a split-plot factorial design is
a good design choice.
H0 : µ1. = µ2. = . . . = µp. (treatment A Alternatively, if a large block size is
population means are equal) not acceptable and the researcher is pri-
H0 : µ.1 = µ.2 = . . . = µ.q (treatment B marily interested in treatments A and B, a
confounded-factorial design is a good choice.
population means are equal) This design, which is described next, achieves
H0 : A × B interaction = 0 (treatments A a reduction in block size by confounding
and B do not interact), groups of blocks with the A×B interaction. As
a result, tests of treatments A and B are more
where µijk denotes the ith block, jth level of powerful than the test of the A × B interaction.
treatment A, and kth level of treatment B. The
F statistics are:
Confounded factorial designs
SSA/( p − 1) MSA
F= = Confounded factorial designs are constructed
SSBL(A)/[ p(n − 1)] MSBL(A) from either randomized block designs or
SSB/(q − 1) Latin square designs. One of the simplest
F=
SSRESIDUAL/[ p(n − 1)(q − 1)] completely confounded factorial designs is
MSB constructed from two randomized block
= designs. The layout for the design with
MSRESIDUAL
SSA × B/( p − 1)(q − 1) p = q = 2 is shown in Figure 2.17. The design
F= confounds the A × B interaction with groups
SSRESIDUAL/[ p(n − 1)(q − 1)]
MSA × B of blocks and thereby reduces the block size
= to two. The power of the tests of the two
MSRESIDUAL
treatments is usually much greater than the
Notice that the split-plot factorial design power of the test of the A × B interaction.
uses two error terms: MSBL(A) is used to Hence, the completely confounded factorial
test treatment A; a different and usually design is a good design choice if a small block
much smaller error term, MSRESIDUAL, is size is required and the researcher is primarily
used to test treatment B and the A × B interested in tests of treatments A and B.
interaction. The statistic F = MSA/MSBL(A)
is like the F statistic in a completely ran- Fractional factorial design
domized design. Similarly, the statistic F =
MSB/MSRESIDUAL is like the F statistic for Confounded factorial designs reduce the
treatment B in a randomized block design. number of treatment combinations that appear
Because MSRESIDUAL is generally smaller in each block. Fractional-factorial designs use
than MSBL(A), the power of the tests of treatment-interaction confounding to reduce
treatment B and the A×B interaction is greater the number of treatment combinations that
than that for treatment A. A randomized appear in an experiment. For example, the
block factorial design, on the other hand, number of treatment combinations in an
uses MSRESIDUAL to test all three null experiment can be reduced to some fraction—
hypotheses. As a result, the power of the test 1% , 1% , 1% , 1% , 1% , and so on—of the
2 3 4 8 9
of treatment A, for example, is the same as total number of treatment combinations in
that for treatment B. When tests of treatments an unconfounded factorial design. Unfortu-
A and B and the A × B interaction are of equal nately, a researcher pays a price for using
interest, a randomized block factorial design is only a fraction of the treatment combinations:
a better design choice than a split-plot factorial ambiguity in interpreting the outcome of the
design. However, if the larger block size is not experiment. Each sum of squares has two or
EXPERIMENTAL DESIGN 43

Treat. Dep. Treat. Dep.


Comb. Var. Comb. Var.



 Block1 a1b1 Y 111 a2b2 Y 122

 Block

a1b1 Y 211 a2b2 Y 222
2
.. .. .. .. ..
aj bk Group1 

 . . . . .


 Block a1b1 Y n1 11 a2b2 Y n1 22
n1
...................................................
Y¯ .11 Y¯ .22



 Block1 a1b2 Y 112 a2b1 Y 121

 Block

a1b2 Y 212 a2b1 Y 221
2
. . . . .
aj bk Group2 

 .
. .. .. .. ..


 Block a1b2 Y n2 12 a2b1 Y n2 21
n2
...................................................
Y¯ .12 Y¯ .21

Dep. Var., dependent variable; Treat. Comb., treatment combination.

Figure 2.17 Layout for a two-treatment, randomized block, confounded factorial design.
The A × B interaction is confounded with groups. Treatments A and B are within-blocks
treatments.

more labels called aliases. For example, two of treatment B appears with only one level
labels for the same sum of squares might be of treatment A. A hierarchical design has
treatment A and the B × C × D interaction. at least one nested treatment; the remaining
You may wonder why anyone would use treatments are either nested or crossed.
such a design—after all, experiments are sup- Hierarchical designs are constructed from
posed to help us resolve ambiguity not create two or more or a combination of completely
it. Fractional factorial designs are typically randomized and randomized block designs.
used in exploratory research situations where For example, a researcher who is interested in
a researcher is interested in six-or- more treat- the effects two types of radiation on learning in
ments and can perform follow-up experiments rats could assign 30 rats randomly to six cages.
if necessary. The design enables a researcher The six cages are then randomly assigned to
to efficiently investigate a large number of the two types of radiation. The cages restrict
treatments in an initial experiment, with the movement of the rats and insure that all rats
subsequent experiments designed to focus on in a cage receive the same amount of radiation.
the most promising lines of investigation or In this example, the cages are nested in the two
to clarify the interpretation of the original types of radiation. The layout for the design is
analysis. The designs are used infrequently in shown in Figure 2.18.
the behavioral sciences and education. There is a wide array of hierarchical
designs. For a description of these designs, the
reader is referred to the extensive treatment in
HIERARCHICAL DESIGNS Chapter 11 of Kirk (1995).

The multitreatment designs that I have dis-


cussed up to now all have crossed treatments. ANALYSIS OF COVARIANCE
Often researchers in the behavioral sciences
and education design experiments in which The discussion so far has focused on designs
one of more treatments is nested. Treatment that use experimental control to reduce error
B is nested in treatment A if each level variance and minimize the effects of nuisance
44 DESIGN AND INFERENCE

measuring one or more concomitant vari-


Treat. Dep.
Comb. Var.
ables, also called covariates, in addition to the
 dependent variable. The concomitant variable
Animal1 a1b1 Y 111
represents a source of variation that was not




. . .
Group1 
.
. .. .. controlled in the experiment and a source that
is believed to affect the dependent variable.



Animaln1 a1b1 Y n1 11
................................ Analysis of covariance enables a researcher
Y¯ .11
to: (1) remove that portion of the dependent-
 variable error variance that is predictable

 Animal1 a1b2 Y 112 from a knowledge of the concomitant variable


.. .. .. thereby increasing power; and (2) adjust the
Group2  . . .


 dependent-variable means so that they are
Animaln2 a1b2 Y n2 12
................................
free of the linear effects attributable to the
concomitant variable thereby reducing bias.
Y¯ .12
 Analysis of covariance is often used in three

 Animal1 a1b3 Y 113 kinds of research situations. One situation


.. .. .. involves the use of intact groups with unequal
Group3 . . .



 concomitant-variable means and is common
Animaln3 a1b3 Y n3 13 in educational research. The procedure sta-
................................
tistically equates the intact groups so that
Y¯ .21 their concomitant-variable means are equal.
.. .. .. Unfortunately, a researcher can never be sure
. . .
 that the concomitant-variable means that are

 Animal1 a 2 b 10 Y 12,10 adjusted represent the only nuisance variable


.. .. .. or the most important nuisance variable
Group10 . . .
on which the intact groups differ. Random




Animaln10 a 2 b 10 Y n10 2,10
assignment is the best safeguard against
................................
unanticipated nuisance variables.
Y¯ .22
Analysis of covariance also can be used to
Dep. Var., dependent variable; Treat. Comb., treatment adjust the concomitant-variable means when
combination. it becomes apparent that although the partici-
Figure 2.18 Layout for a two-treatment, pants were randomly assigned to the treatment
completely randomized, hierarchical design levels, the participants at the beginning of the
in which q = 10 levels of treatment B(A) are experiment were not equivalent on a nuisance
nested in p = 2 levels of treatment A. Fifty variable. Finally, analysis of covariance can
animals are randomly assigned to ten
combinations of treatments A and B(A) with
be used to adjust the concomitant-variable
the restriction that five animals are assigned means for differences in a nuisance variable
to each treatment combination. that develop during an experiment.
Statistical control and experimental control
are not mutually exclusive approaches to
variables. Experimental control can take reducing error variance and minimizing the
different forms such as random assignment of effects of nuisance variables. It may be conve-
participants to treatment levels, stratification nient to control some variables by experimen-
of participants into homogeneous blocks, and tal control and others by statistical control. In
refinement of techniques for measuring a general, experimental control involves fewer
dependent variable. Analysis of covariance assumptions than statistical control. However,
is an alternative approach to reducing error experimental control requires more informa-
variance and minimizing the effects of tion about the participants before beginning
nuisance variables. The approach combines an experiment. Once data collection has
regression analysis withANOVAand involves begun, it is too late to randomly assign
EXPERIMENTAL DESIGN 45

participants to treatment levels or to form Statistics in Behavioral Science, 1: 66–83. New York:
blocks of matched participants. The advantage Wiley.
of statistical control is that it can be used after Masling, J. (1966) ‘Role-related behavior of the subject
data collection has begun. Its disadvantage is and psychologist and its effects upon psychological
that it involves a number of assumptions such data’, in D. Levine (ed.), Nebraska Symposium
on Motivation. Volume 14. Lincoln: University of
as a linear relationship between the dependent
Nebraska Press.
variable and the concomitant variable that Maxwell, S.E. and Delaney, H.D. (2004) Designing Exper-
may prove untenable. iments and Analyzing Data (2nd edn.). Mahwah, NJ:
Lawrence Erlbaum Associates.
Orne, M.T. (1962) ‘On the social psychology of the
REFERENCES psychological experiment: With particular reference
to demand characteristics and their implications’.
Anderson, N.H. (2001) Empirical Direction in Design and American Psychologist, 17, 358–372.
Analysis. Mahwah, NJ: Lawrence Erlbaum Associates. Rosenberg, M.J. (1965) ‘When dissonance fails: on
Dean, A. and Voss, D. (1999) Design and Analysis of eliminating evaluation apprehension from attitude
Experiments. New York: Springer-Verlag. measurement’, Journal of Personality and Social
Finney, D.J. (1945) ‘The fractional replication of factorial Psychology, 1: 18–42.
arrangements’, Annals of Eugenics, 12: 291–301. Rosenthal, R. (1963) ‘On the social psychology of
Finney, D.J. (1946) ‘Recent developments in the design the psychological experiment: The experimenter’s
of field experiments. III. Fractional replication’, hypothesis as an unintended determinant of exper-
Journal of Agricultural Science, 36: 184–191. imental results’, American Scientist, 51: 268–283.
Fisher, R.A. (1925) Statistical Methods for Research Rosenthal, R. (1969) ‘Interpersonal expectations:
Workers. Edinburgh: Oliver and Boyd. Effects of the experimenter’s hypothesis’, in
Fisher, R.A. (1935) The Design of Experiments. R. Rosenthal and R.L. Rosnow (eds.), Artifact in
Edinburgh and London: Oliver and Boyd. Behavioral Research. New York: Academic Press.
Giesbrecht, F.G., and Gumpertz, M.L. (2004) Planning, pp. 181–277.
Construction, and Statistical Analysis of Comparative Rosenthal, R. (1978) ‘How often are our numbers
Experiments. Hoboken, NJ: Wiley. wrong?’ American Psychologist, 33: 1005–1008.
Hacking, I. (1983) Representing and Intervening: Ryan, T.P. (2007) Modern Experimental Design.
Introductory Topics in the Philosophy of Natural Hoboken, NJ: Wiley.
Science. Cambridge, England: Cambridge University Shadish, W.R., Cook, T.D., and Campbell, D.T. (2002)
Press. Experimental and Quasi-experimental Designs for
Kempthorne, O. (1947) ‘A simple approach to Generalized Causal Inference. Boston: Houghton
confounding and fractional replication in factorial Mifflin.
experiments’, Biometrika, 34: 255–272. Sheridan, C.L. (1976) Fundamentals of Experimental
Kirk, R.E. (1995) Experimental Design: Procedures for Psychology (2nd edn.). Fort Worth, TX: Holt, Rinehart
the Behavioral Sciences (3rd edn.). Pacific Grove, CA: and Winston.
Brooks/Cole. Yates, F. (1937) ‘The design and analysis of factorial
Kirk, R.E. (2005) ‘Analysis of variance: Classification’, experiments’, Imperial Bureau of Soil Science
in B. Everitt and D. Howell (eds.), Encyclopedia of Technical Communication No. 35, Harpenden, UK.

You might also like