CHAPTER 8: Experiments

Chapter Overview

An experiment is a mode of observation that enables researchers to probe causal relationships. Many experiments in social research are conducted under the controlled conditions of a laboratory, but experimenters can also take advantage of natural occurrences to study the effects of events in the social world.

Introduction
Topics Appropriate to Experiments
The Classical Experiment
    Independent and Dependent Variables
    Pretesting and Posttesting
    Experimental and Control Groups
    The Double-Blind Experiment
Selecting Subjects
    Probability Sampling
    Randomization
    Matching
    Matching or Randomization?
Variations on Experimental Design
    Preexperimental Research Designs
    Validity Issues in Experimental Research
An Illustration of Experimentation
"Natural" Experiments
Strengths and Weaknesses of the Experimental Method
MAIN POINTS
KEY TERMS
REVIEW QUESTIONS AND EXERCISES

Introduction

This chapter addresses the research method most commonly associated with structured science in general: the experiment. Here we'll focus on the experiment as a mode of scientific observation in social research. At base, experiments involve (1) taking action and (2) observing the consequences of that action. Social researchers typically select a group of subjects, do something to them, and observe the effect of what was done. In this chapter, we'll examine the logic and some of the techniques of social scientific experiments.

It's worth noting at the outset that we often use experiments in nonscientific inquiry. In preparing a stew, for example, we add salt, taste, add more salt, and taste again. In defusing a bomb, we clip the red wire, observe whether the bomb explodes, clip another, and ....

We also experiment copiously in our attempts to develop generalized understandings about the world we live in. All skills are learned through experimentation: eating, walking, talking, riding a bicycle, swimming, and so forth. Through experimentation, students discover how much studying is required for academic success. Through experimentation, professors learn how much preparation is required for successful lectures. This chapter discusses how social researchers use experiments to develop generalized understandings. We'll see that, like other methods available to the social researcher, experimenting has its special strengths and weaknesses.

Topics Appropriate to Experiments

Experiments are more appropriate for some topics and research purposes than others. Experiments are especially well suited to research projects involving relatively limited and well-defined concepts and propositions. In terms of the traditional image of science, discussed earlier in this book, the experimental model is especially appropriate for hypothesis testing. Because experiments focus on determining causation, they're also better suited to explanatory than to descriptive purposes.

Let's assume, for example, that we want to discover ways of reducing prejudice against African Americans. We hypothesize that learning about the contribution of African Americans to U.S. history will reduce prejudice, and we decide to test this hypothesis experimentally. To begin, we might test a group of experimental subjects to determine their levels of prejudice against African Americans. Next, we might show them a documentary film depicting the many important ways African Americans have contributed to the scientific, literary, political, and social development of the nation. Finally, we would measure our subjects' levels of prejudice against African Americans to determine whether the film has actually reduced prejudice.

Experimentation has also been successful in the study of small group interaction. Thus, we might bring together a small group of experimental subjects and assign them a task, such as making recommendations for popularizing car pools. We observe, then, how the group organizes itself and deals with the problem. Over the course of several such experiments, we might systematically vary the nature of the task or the rewards for handling the task successfully. By observing differences in the way groups organize themselves and operate under these varying conditions, we can learn a great deal about the nature of small group interaction and the factors that influence it. For example, attorneys sometimes present evidence in different ways to different mock juries, to see which method is the most effective.

We typically think of experiments as being conducted in laboratories. Indeed, most of the examples in this chapter involve such a setting. This need not be the case, however. Social researchers often study what are called natural experiments: "experiments" that occur in the regular course of social events. The latter portion of this chapter deals with such research.

The Classical Experiment

In both the natural and the social sciences, the most conventional type of experiment involves three major pairs of components: (1) independent and dependent variables, (2) pretesting and posttesting,
and (3) experimental and control groups. This section looks at each of these components and the way they're put together in the execution of the experiment.

Independent and Dependent Variables

Essentially, an experiment examines the effect of an independent variable on a dependent variable. Typically, the independent variable takes the form of an experimental stimulus, which is either present or absent. That is, the stimulus is a dichotomous variable, having two attributes, present or not present. In this typical model, the experimenter compares what happens when the stimulus is present to what happens when it is not.

In the example concerning prejudice against African Americans, prejudice is the dependent variable and exposure to African-American history is the independent variable. The researcher's hypothesis suggests that prejudice depends, in part, on a lack of knowledge of African-American history. The purpose of the experiment is to test the validity of this hypothesis by presenting some subjects with an appropriate stimulus, such as a documentary film. In other terms, the independent variable is the cause and the dependent variable is the effect. Thus, we might say that watching the film caused a change in prejudice or that reduced prejudice was an effect of watching the film.

The independent and dependent variables appropriate to experimentation are nearly limitless. Moreover, a given variable might serve as an independent variable in one experiment and as a dependent variable in another. For example, prejudice is the dependent variable in our example, but it might be the independent variable in an experiment examining the effect of prejudice on voting behavior.

To be used in an experiment, both independent and dependent variables must be operationally defined. Such operational definitions might involve a variety of observation methods. Responses to a questionnaire, for example, might be the basis for defining prejudice. Speaking to or ignoring African Americans, or agreeing or disagreeing with them, might be elements in the operational definition of interaction with African Americans in a small group setting.

Conventionally, in the experimental model, dependent and independent variables must be operationally defined before the experiment begins. However, as you'll see in connection with survey research and other methods, it's sometimes appropriate to make a wide variety of observations during data collection and then determine the most useful operational definitions of variables during later analyses. Ultimately, however, experimentation, like other quantitative methods, requires specific and standardized measurements and observations.

Pretesting and Posttesting

In the simplest experimental design, subjects are measured in terms of a dependent variable (pretesting), exposed to a stimulus representing an independent variable, and then remeasured in terms of the dependent variable (posttesting). Any differences between the first and last measurements on the dependent variable are then attributed to the independent variable.

In the example of prejudice and exposure to African-American history, we'd begin by pretesting the extent of prejudice among our experimental subjects. Using a questionnaire asking about attitudes toward African Americans, for example, we could measure both the extent of prejudice exhibited by each individual subject and the average prejudice level of the whole group. After exposing the subjects to the African-American history film, we could administer the same questionnaire again. Responses given in this posttest would permit us to measure the later extent of prejudice for each subject and the average prejudice level of the group as a whole. If we discovered a lower level of prejudice during the second administration of the questionnaire, we might conclude that the film had indeed reduced prejudice.

In the experimental examination of attitudes such as prejudice, we face a special practical problem relating to validity. As you may already have imagined, the subjects may respond differently to the questionnaires the second time even if their attitudes remain unchanged. During the first administration of the questionnaire, the subjects may be unaware of its purpose. By the second measurement, they may have figured out that we're interested in measuring their prejudice. Because no one wishes to seem prejudiced, the subjects may "clean up" their answers the second time around. Thus, the film will seem to have reduced prejudice although, in fact, it has not.

This is an example of a more general problem that plagues many forms of social research: The very act of studying something may change it. The techniques for dealing with this problem in the context of experimentation will be discussed in various places throughout the chapter. The first technique involves the use of control groups.

pretesting: The measurement of a dependent variable among subjects.

posttesting: The remeasurement of a dependent variable among subjects after they've been exposed to an independent variable.

Experimental and Control Groups

Laboratory experiments seldom, if ever, involve only the observation of an experimental group to which a stimulus has been administered. In addition, the researchers also observe a control group, which does not receive the experimental stimulus.

experimental group: In experimentation, a group of subjects to whom an experimental stimulus is administered.

control group: In experimentation, a group of subjects to whom no experimental stimulus is administered and who should resemble the experimental group in all other respects. The comparison of the control group and the experimental group at the end of the experiment points to the effect of the experimental stimulus.

In the example of prejudice and African-American history, we might examine two groups of subjects. To begin, we give each group a questionnaire designed to measure their prejudice against African Americans. Then we show the film only to the experimental group. Finally, we administer a posttest of prejudice to both groups. Figure 8-1 illustrates this basic experimental design.

Using a control group allows the researcher to detect any effects of the experiment itself. If the posttest shows that the overall level of prejudice exhibited by the control group has dropped as much as that of the experimental group, then the apparent reduction in prejudice must be a function of the experiment or of some external factor rather than a function of the film. If, on the other hand, prejudice is reduced only in the experimental group, this reduction would seem to be a consequence of exposure to the film, because that's the only difference between the two groups. Alternatively, if prejudice is reduced in both groups but to a greater degree in the experimental group than in the control group, that, too, would be grounds for assuming that the film reduced prejudice.

The need for control groups in social research became clear in connection with a series of studies of employee satisfaction conducted by F. J. Roethlisberger and W. J. Dickson (1939) in the late 1920s and early 1930s. These two researchers were interested in discovering what changes in working conditions would improve employee satisfaction and productivity. To pursue this objective, they studied working conditions in the telephone "bank wiring room" of the Western Electric Works in the Chicago suburb of Hawthorne, Illinois.

To the researchers' great satisfaction, they discovered that making working conditions better increased satisfaction and productivity consistently. As the workroom was brightened up through better lighting, for example, productivity went up. When lighting was further improved, productivity went up again.

To further substantiate their scientific conclusion, the researchers then dimmed the lights. Whoops, productivity improved again!

At this point it became evident that the wiring-room workers were responding more to the attention given them by the researchers than to improved working conditions. As a result of this phenomenon, often called the Hawthorne effect, social researchers have become more sensitive to and cautious about the possible effects of experiments themselves. In
the wiring-room study, the use of a proper control group (one that was studied intensively without any other changes in the working conditions) would have pointed to the existence of this effect.

FIGURE 8-1
Diagram of Basic Experimental Design

Experimental Group: measure dependent variable; administer experimental stimulus (film); remeasure dependent variable.
Control Group: measure dependent variable; remeasure dependent variable.
Compare the two groups' first measurements (Same?) and their remeasurements (Different?).

The need for control groups in experimentation has been nowhere more evident than in medical research. Time and again, patients who participate in medical experiments have appeared to improve, and it has been unclear how much of the improvement has come from the experimental treatment and how much from the experiment. In testing the effects of new drugs, then, medical researchers frequently administer a placebo (a "drug" with no relevant effect, such as sugar pills) to a control group. Thus, the control-group patients believe that they, like the experimental group, are receiving an experimental drug. Often, they improve. If the new drug is effective, however, those receiving the actual drug will improve more than those receiving the placebo.

In social scientific experiments, control groups guard against not only the effects of the experiments themselves but also the effects of any events outside the laboratory during the experiments. In the example of the study of prejudice, suppose that a popular African-American leader is assassinated in the middle of, say, a week-long experiment. Such an event may very well horrify the experimental subjects, requiring them to examine their own attitudes toward African Americans, with the result of reduced prejudice. Because such an effect should happen about equally for members of the control and experimental groups, a greater reduction of prejudice among the experimental group would, again, point to the impact of the experimental stimulus: the documentary film.

Sometimes an experimental design requires more than one experimental or control group. In the case of the documentary film, for example, we might also want to examine the impact of reading a book on African-American history. In that case, we might have one group see the film and read the book, another group only see the movie, still another group only read the book, and the control group do neither. With this kind of design, we could determine the impact of each stimulus separately, as well as their combined effect.

The Double-Blind Experiment

Like patients who improve when they merely think they're receiving a new drug, sometimes experimenters tend to prejudge results. In medical research, the experimenters may be more likely to "observe" improvements among patients receiving the experimental drug than among those receiving the placebo. (This would be most likely, perhaps, for the researcher who developed the drug.) A double-blind experiment eliminates this possibility, because in this design neither the subjects nor the experimenters know which is the experimental group and which is the control. In the medical case, those researchers who were responsible for administering the drug and for noting improvements would not be told which subjects were receiving the drug and which the placebo. Conversely, the researcher who knew which subjects were in which group would not administer the experiment.

In social scientific experiments, as in medical experiments, the danger of experimenter bias is further reduced to the extent that the operational definitions of the dependent variables are clear and precise. Thus, medical researchers would be less likely to unconsciously bias their reading of a patient's temperature than they would be to bias their assessment of how lethargic the patient was. For the same reason, the small group researcher would be less likely to misperceive which subject spoke, or to whom he or she spoke, than whether the subject's comments sounded cooperative or competitive, a more subjective judgment that's difficult to define in precise behavioral terms.

As I've indicated several times, seldom can we devise operational definitions and measurements that are wholly precise and unambiguous. This is another reason why it can be appropriate to employ a double-blind design in social research experiments.

double-blind experiment: An experimental design in which neither the subjects nor the experimenters know which is the experimental group and which is the control.

Selecting Subjects

In Chapter 7 we discussed the logic of sampling, which involves selecting a sample that is representative of some population. Similar considerations apply to experiments. Because most social researchers work in colleges and universities, it seems likely that research laboratory experiments would be conducted with college undergraduates as subjects. Typically, the experimenter asks students enrolled in his or her classes to participate in experiments or advertises for subjects in a college newspaper. Subjects may or may not be paid for participating in such experiments (recall also from Chapter 3 the ethical issues involved in asking students to participate in such studies).

In relation to the norm of generalizability in science, this tendency clearly represents a potential defect in social research. Simply put, college undergraduates are not typical of the public at large. There is a danger, therefore, that we may learn much about the attitudes and actions of college undergraduates but not about social attitudes and actions in general.

However, this potential defect is less significant in explanatory research than in descriptive research. True, having noted the level of prejudice among a group of college undergraduates in our pretesting, we would have little confidence that the same level existed among the public at large. On the other hand, if we found that a documentary film reduced whatever level of prejudice existed among those undergraduates, we would have more confidence, without being certain, that it would have a comparable effect in the community at large. Social processes and patterns of causal relationships appear to be more generalizable and more stable than specific characteristics such as an individual's level of prejudice.

Aside from the question of generalizability, the cardinal rule of subject selection in experimentation concerns the comparability of experimental and control groups. Ideally, the control group represents what the experimental group would be like if it had not been exposed to the experimental stimulus. The logic of experiments requires, therefore, that experimental and control groups be as similar as possible. There are several ways to accomplish this.
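
The comparison logic of the basic experimental design in Figure 8-1 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names `mean_change` and `stimulus_effect` are hypothetical, and the prejudice scores are invented, not data from any study. The sketch simply subtracts the control group's average pretest-to-posttest change from the experimental group's change, so that whatever change both groups share (the effect of being studied, or of outside events) is set aside.

```python
from statistics import mean


def mean_change(pretest, posttest):
    """Average posttest-minus-pretest change for one group of subjects."""
    return mean(post - pre for pre, post in zip(pretest, posttest))


def stimulus_effect(exp_pre, exp_post, ctl_pre, ctl_post):
    """Change in the experimental group beyond the change in the control group."""
    return mean_change(exp_pre, exp_post) - mean_change(ctl_pre, ctl_post)


# Invented prejudice scores (higher = more prejudiced), for illustration only.
exp_pre, exp_post = [60, 55, 70, 65], [50, 45, 62, 55]   # saw the film
ctl_pre, ctl_post = [62, 54, 68, 66], [60, 52, 65, 65]   # did not see the film

print("Experimental group change:", mean_change(exp_pre, exp_post))
print("Control group change:", mean_change(ctl_pre, ctl_post))
print("Change attributable to the stimulus:",
      stimulus_effect(exp_pre, exp_post, ctl_pre, ctl_post))
```

In this made-up case both groups show lower scores at the posttest, but the experimental group drops further, which corresponds to the situation described above in which prejudice is reduced in both groups but to a greater degree in the experimental group.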

Probability Sampling

The discussions of the logic and techniques of probability sampling in Chapter 7 provide one method for selecting two groups of people that are similar to each other. Beginning with a sampling frame composed of all the people in the population under study, the researcher might select two probability samples. If these samples each resemble the total population from which they're selected, they'll also resemble each other.

Recall also, however, that the degree of resemblance (representativeness) achieved by probability sampling is largely a function of the sample size. As a general guideline, probability samples of less than 100 are not likely to be terribly representative, and social scientific experiments seldom involve that many subjects in either experimental or control groups. As a result, then, probability sampling is seldom used in experiments to select subjects from a larger population. Researchers do, however, use the logic of random selection when they assign subjects to groups.

Randomization

Having recruited, by whatever means, a total group of subjects, the experimenter may randomly assign those subjects to either the experimental or the control group. The researcher might accomplish such randomization by numbering all of the subjects serially and selecting numbers by means of a random number table. Alternatively, the experimenter might assign the odd-numbered subjects to the experimental group and the even-numbered subjects to the control group.

Let's return again to the basic concept of probability sampling. If we recruit 40 subjects all together, in response to a newspaper advertisement, for example, there's no reason to believe that the 40 subjects represent the entire population from which they have been drawn. Nor can we assume that the 20 subjects randomly assigned to the experimental group represent that larger population. We can have greater confidence, however, that the 20 subjects randomly assigned to the experimental group will be reasonably similar to the 20 assigned to the control group.

Following the logic of our earlier discussions of sampling, we can see our 40 subjects as a population from which we select two probability samples, each consisting of half the population. Because each sample reflects the characteristics of the total population, the two samples will mirror each other.

As we saw in Chapter 7, our assumption of similarity in the two groups depends in part on the number of subjects involved. In the extreme case, if we recruited only two subjects and assigned, by the flip of a coin, one as the experimental subject and one as the control, there would be no reason to assume that the two subjects are similar to each other. With larger numbers of subjects, however, randomization makes good sense.

Matching

Another way to achieve comparability between the experimental and control groups is through matching. This process is similar to the quota sampling methods discussed in Chapter 7. If 12 of our subjects are young white men, we might assign 6 of them at random to the experimental group and the other 6 to the control group. If 14 are middle-aged African-American women, we might assign 7 to each group. We repeat this process for every relevant grouping of subjects.

The overall matching process could be most efficiently achieved through the creation of a quota matrix constructed of all the most relevant characteristics. Figure 8-2 provides a simplified illustration of such a matrix. In this example, the experimenter has decided that the relevant characteristics are race, age, and gender. Ideally, the quota matrix is constructed to result in an even number of subjects in each cell of the matrix. Then, half the subjects in each cell go into the experimental group and half into the control group.

FIGURE 8-2
Quota Matrix Illustration

                   Men                           Women
                   African American    White     African American    White
Under 30 years     8                   12        10                  16
30 to 50 years     18                  30        14                  28
Over 50 years      12                  20        12                  22

Alternatively, we might recruit more subjects than our experimental design requires. We might then examine many characteristics of the large initial group of subjects. Whenever we discover a pair of quite similar subjects, we might assign one at random to the experimental group and the other to the control group. Potential subjects who are unlike anyone else in the initial group might be left out of the experiment altogether.

Whatever method we employ, the desired result is the same. The overall average description of the experimental group should be the same as that of the control group. For example, on average both groups should have about the same ages, the same gender composition, the same racial composition, and so forth. This test of comparability should be used whether the two groups are created through probability sampling or through randomization.

Thus far I have referred to the "relevant" variables without saying clearly what those variables are. Of course, these variables cannot be specified in any definite way, any more than I could specify in Chapter 7 which variables should be used in stratified sampling. Which variables are relevant ultimately depends on the nature and purpose of an experiment. As a general rule, however, the control and experimental groups should be comparable in terms of those variables that are most likely to be related to the dependent variable under study. In a study of prejudice, for example, the two groups should be alike in terms of education, ethnicity, and age, among other characteristics. In some cases, moreover, we may delay assigning subjects to experimental and control groups until we have initially measured the dependent variable. Thus, for example, we might administer a questionnaire measuring subjects' prejudice and then match the experimental and control groups on this variable to assure ourselves that the two groups exhibit the same overall level of prejudice.

randomization: A technique for assigning experimental subjects to experimental and control groups randomly.

matching: In connection with experiments, the procedure whereby pairs of subjects are matched on the basis of their similarities on one or more variables, and one member of the pair is assigned to the experimental group and the other to the control group.

Matching or Randomization?

When assigning subjects to the experimental and control groups, you should be aware of two arguments in favor of randomization over matching.
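
Before weighing the two procedures against each other, it may help to see both side by side. The following Python sketch is only an illustration under stated assumptions: the function names `randomize` and `match_by_quota`, the dictionary layout of a subject, and the pool of 26 subjects are all hypothetical, chosen to mirror the example of 12 young white men and 14 middle-aged African-American women. A seeded pseudorandom generator stands in for the random number table.

```python
import random
from collections import defaultdict


def randomize(subjects, rng):
    """Randomly split subjects into two groups of (nearly) equal size."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def match_by_quota(subjects, key, rng):
    """Within each quota-matrix cell, assign half the subjects at random to each group.

    If a cell holds an odd number of subjects, the extra one goes to the control group.
    """
    cells = defaultdict(list)
    for subject in subjects:
        cells[key(subject)].append(subject)
    experimental, control = [], []
    for members in cells.values():
        exp, ctl = randomize(members, rng)
        experimental.extend(exp)
        control.extend(ctl)
    return experimental, control


# Hypothetical pool: 12 young white men and 14 middle-aged African-American women.
pool = (
    [{"id": i, "race": "White", "age": "Under 30", "gender": "Man"} for i in range(12)]
    + [{"id": 12 + i, "race": "African American", "age": "30 to 50", "gender": "Woman"}
       for i in range(14)]
)

rng = random.Random(0)  # seeded only so the illustration is repeatable
experimental, control = match_by_quota(
    pool, key=lambda s: (s["race"], s["age"], s["gender"]), rng=rng
)
print(len(experimental), len(control))
```

With simple randomization the composition of each group is left to chance, whereas the quota version guarantees that each cell is split evenly (6 and 6, 7 and 7 here). The cost is that the cell-defining characteristics must be chosen in advance.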

First, you may not be in a position to know in advance which variables will be relevant for the matching process. Second, most of the statistics used to analyze the results of experiments assume randomization. Failure to design your experiment that way, then, makes your later use of those statistics less meaningful.

On the other hand, randomization only makes sense if you have a fairly large pool of subjects, so that the laws of probability sampling apply. With only a few subjects, matching would be a better procedure.

Sometimes researchers can combine matching and randomization. When conducting an experiment on the educational enrichment of young adolescents, for example, J. Milton Yinger and his colleagues (1977) needed to assign a large number of students, aged 13 and 14, to several different experimental and control groups to ensure the comparability of students composing each of the groups. They achieved this goal by the following method.

Beginning with a pool of subjects, the researchers first created strata of students nearly identical to one another in terms of some 15 variables. From each of the strata, students were randomly assigned to the different experimental and control groups. In this fashion, the researchers actually improved on conventional randomization. Essentially, they had used a stratified sampling procedure (Chapter 7), except that they had employed far more stratification variables than are typically used in, say, survey sampling.

Thus far I've described the classical experiment, the experimental design that best represents the logic of causal analysis in the laboratory. In practice, however, social researchers use a great variety of experimental designs. Let's look at some now.

Variations on Experimental Design

Donald Campbell and Julian Stanley (1963), in a classic book on research design, describe some 16 different experimental and quasi-experimental designs. This section describes some of these variations to better show the potential for experimentation in social research.

Preexperimental Research Designs

To begin, Campbell and Stanley discuss three "preexperimental" designs, not to recommend them but because they're frequently used in less-than-professional research. These designs are called "preexperimental" to indicate that they do not meet the scientific standards of experimental designs. In the first such design, the one-shot case study, the researcher measures a single group of subjects on a dependent variable following the administration of some experimental stimulus. Suppose, for example, that we show the African-American history film mentioned earlier to a group of people and then administer a questionnaire that seems to measure prejudice against African Americans. Suppose further that the answers given to the questionnaire seem to represent a low level of prejudice. We might be tempted to conclude that the film reduced prejudice. Lacking a pretest, however, we can't be sure. Perhaps the questionnaire doesn't really represent a very sensitive measure of prejudice, or perhaps the group we're studying was low in prejudice to begin with. In either case, the film might have made no difference, though our experimental results might have misled us into thinking it did.

The second preexperimental design discussed by Campbell and Stanley adds a pretest for the experimental group but lacks a control group. This design, which the authors call the one-group pretest-posttest design, suffers from the possibility that some factor other than the independent variable might cause a change between the pretest and posttest results, such as the assassination of a respected African-American leader. Thus, although we can see that prejudice has been reduced, we can't be sure that it was the film that caused that reduction.

To round out the possibilities for preexperimental designs, Campbell and Stanley point out that some research is based on experimental and control groups but has no pretests. They call this design the static-group comparison. For example, we might show the African-American history film to one group and not to another and then measure prejudice in both groups. If the experimental group had less prejudice at the conclusion of the experiment, we might assume the film was responsible. But unless we had randomized our subjects, we would have no way of knowing that the two groups had the same degree of prejudice initially; perhaps the experimental group started out with less.

Figure 8-3 graphically illustrates these three preexperimental research designs by using a different research question: Does exercise cause weight reduction? To make the several designs clearer, the figure shows individuals rather than groups, but the same logic pertains to group comparisons. Let's review the three preexperimental designs in this new example.

FIGURE 8-3
Three Preexperimental Research Designs (each shown along a timeline: Time 1, Time 2, Time 3)

One-Shot Case Study: A man who exercises is observed to be in trim shape. Comparison: some intuitive standard of what constitutes a trim shape.
One-Group Pretest-Posttest Design: An overweight man who exercises is later observed to be in trim shape.
Static-Group Comparison: A man who exercises is observed to be in trim shape while one who doesn't is observed to be overweight.

The one-shot case study represents a common form of logical reasoning in everyday life. Asked whether exercise causes weight reduction, we may bring to mind an example that would seem to support the proposition: someone who exercises and is thin. There are problems with this reasoning, however. Perhaps the person was thin long before beginning to exercise. Or perhaps he became thin for some other reason, like eating less or getting sick. The observations shown in the diagram do not guard against these other possibilities. Moreover, the observation that the man in the diagram is in trim shape depends on our intuitive idea of what constitutes trim and overweight body shapes. All

gone on in the experiment itself. The threat of internal invalidity is present whenever anything other than the experimental stimulus can affect the dependent variable.

Campbell and Stanley (1963:5-6) and Cook and Campbell (1979:51-55) point to several sources of internal invalidity. Here are twelve:

standards or their abilities may change over the course of the experiment.

5. Statistical regression. Sometimes it's appropriate to conduct experiments on subjects who start out with extreme scores on the dependent variable. If you were testing a new method for teaching math to hardcore failures in math, you'd want to conduct

tal stimulus and the dependent variable can arise. Whenever this occurs, the research conclusion that the stimulus caused the dependent variable can be challenged with the explanation that the "dependent" variable actually caused changes in the stimulus.

9. Diffusion or imitation of treatments. When experi-
told, this is very weak evidence for testing the rela- your experiment on people who previously have mental and control-group subjects can communi-
1. History. During the course of the experiment,
tionship between exercise and weight loss. done extremely poorly in math. But consider for a cate with each other, experimental subjects may
historical events may occur that will confound
The one-group pretest-posttest design offers minute what's likely to happen to the math achieve- pass on some elements of the experimental stimu-
the experimental results. The assassination of an
somewhat better evidence that exercise produces ment of such people over time without any experi- lus to the control group. For example, suppose
African-American leader during the course of an
weight loss. Specifically, we have ruled out the pos- mental interference. They're starting out so low that there's a lapse of time between our showing of the
experiment on reducing anti-African-American
sibility that the man was thin before beginning to they can only stay at the bottom or improve: They African-American history film and the posttest ad-
prejudice is one example; the arrest of an African-
exercise. However, we still have no assurance that can't get worse. Even without any experimental ministration of the questionnaire. Members of the
American leader for some heinous crime, which
it was his exercising that caused him to lose weight. stimulus, then, the group as a whole is likely to show experimental group might tell control-group sub-
might increase prejudice, is another.
Finally, the static-group comparison eliminates some improvement over time. Referring to a regres- jects about the film. In that case, the control group
2. Maturation. People are continually growing and
the problem of our questionable definition of what sion to the mean, statisticians often point out that ex- becomes affected by the stimulus and is not a real
changing, and such changes can affect the results of control. Sometimes we speak of the control group
constitutes trim or overweight body shapes. In this tremely tall people as a group are likely to have
the experiment. In a long-term experiment, the fact as having been "contaminated."
case, we can compare the shapes of the man who children shorter than themselves, and extremely
that the subjects grow older (and wiser?) may have
exercises and the one who does not. This design, short people as a group are likely to have children 10. Compensation. As you'll see in Chapter 12, in
an effect. In shorter experiments, they may grow
however, reopens the possibility that the man who taller than themselves. There is a danger, then, that experiments in real-life situations-such as a spe-
tired, sleepy, bored, or hungry, or change in other
exercises was thin to begin with. changes occurring by virtue of subjects starting out cial educational program-subjects in the control
ways that affect their behavior in the experiment.
in extreme positions will be attributed erroneously group are often deprived of something considered
3. Testing. As we have seen, often the process of to the effects of the experimental stimulus. to be of value. In such cases, there may be pres-
Validity Issues in Experimental Research testing and retesting influences people's behavior,
6. Selection biases. We discussed selection bias ear- sures to offer some form of compensation. For ex-
thereby confounding the experimental results. Sup- ample, hospital staff might feel sorry for control-
lier when we examined different ways of selecting
At this point I want to present in a more systematic
pose we administer a questionnaire to a group as a group patients and give them extra "tender loving
way the factors that affect the validity of experimen- subjects for experiments and assigning them to ex-
way of measuring their prejudice. Then we admin- care." In such a situation, the control group is no
perimental and control groups. Comparisons don't
tal research. First we'll look at what Campbell and ister an experimental stimulus and remeasure their
have any meaning unless the groups are compa- longer a genuine control group.
Stanley call the sources of internal invalidity, reviewed
prejudice. By the time we conduct the posttest, the
rable at the start of an experiment. 11. Compensatory rivalry. In real-life experiments,
and expanded in a follow-up book by Thomas Cook
subjects will probably have become more sensitive
and Donald Campbell (1979). Then we'll consider 7. Experimental mortality. Although some social ex- the subjects deprived of the experimental stimulus
to the issue of prejudice and will be more thought-
may try to compensate for the missing stimulus by
the problem of generalizing experimental results to periments could, I suppose, kill subjects, experimen-
ful in their answers. In fact, they may have figured
working harder. Suppose an experimental math
the "real" world, referred to as external invalidity. tal mortality refers to a more general and less ex-
out that we're trying to find out how prejudiced
program is the experimental stimulus; the control
Having examined these, we'll be in a position to treme problem. Often, experimental subjects will
they are, and, because few people like to appear
group may work harder than before on their math
appreciate the advantages of some of the more so- drop out of the experiment before it's completed,
prejudiced, they may give answers that they think
in an attempt to beat the "special" experimental
phisticated experimental and quasi-experimental and this can affect statistical comparisons and con-
we want or that will make them look good.
designs social science researchers sometimes use. clusions. In the classical experiment involving an
4. Instrumentation. The process of measurement in
experimental and a control group, each with a 12. Demoralization. On the other hand, feelings of
pretesting and posttesting brings in some of the is-
Sources of Internal Invalidity sues of conceptualization and operationalization dis-
pretest and posttest, suppose that the bigots in the deprivation within the control group may result in
experimental group are so offended by the African- their giving up. In educational experiments, de-
The problem of internal invalidity refers to the cussed earlier in the book. If we use different mea-
American history film that they tell the experi- moralized control-group subjects may stop study-
possibility that the conclusions drawn from experi- sures of the dependent variable in the pretest and
menter to forget it, and they leave. Those subjects ing, act up, or get angry.
mental results may not accurately reflect what has posttest (say, different questionnaires about preju-
sticking around for the posttest will have been less These, then, are some of the sources of internal
dice), how can we be sure they're comparable to
prejudiced to start with, so the group results will invalidity in experiments. Aware of these, experi-
each other? Perhaps prejudice will seem to decrease
Internal invalidity Refers to the possibility that the reflect a substantial "decrease" in prejudice. menters have devised designs aimed at handling
simply because the pretest measure was more sen-
conclusions drawn from experimental results may not
8. Causal time order. Though rare in social research, them. The classical experiment, if coupled with
accurately reflect what went onn in the experiment itself. sitive than the posttest measure. Or if the measure-
ments are being made by the experimenters, their ambiguity about the ti me order of the experimen- proper subject selection and assignment, addresses
Variations on Experimental Design . 233


The Classical Experiment: Using an African-American History Film to Reduce Prejudice The Solomon Four-Group Design

Group 1 Pretest Stimulus Posttest

Group 2 Pretest Posttest

Group 3 Stimulus Posttest

Group 4 Posttest


Ot Posttest should show less prejudice than the pretest.

O2 Posttest and pretest should show the same amount of prejudice.
3O Group 1 should show less prejudice than Group 2.
® Group 3 should show less prejudice than Group 4.

of the control group. We can be confident that the

each of these problems. Let's look again at that equally, even if people with extreme scores on prej- Sources of External Invalidity
study design, presented graphically in Figure 8-4. udice are being studied. Selection bias is ruled out film actually reduced prejudice among our experi-
internal invalidity accounts for only some of the mental subjects. But would it have the same effect
If we use the experimental design shown in by the random assignment of subjects. Experimen-
complications faced by experimenters. In addition, if the film were shown in theaters or on television?
Figure 8-4, we should expect two findings. For tal mortality is more complicated to handle, but the
there are problems of what Campbell and Stanley We cant be sure, because the film might be effec-
the experimental group, the level of prejudice data provided in this study design offer several ways
call external invalidity, which relates to the gen- tive only when people have been sensitized to the
measured in their posttest should be less than was to deal with it. Slight modifications to the design-
eralizability of experimental findings to the "real" issue of prejudice, as the subjects may have been in
found in their pretest. In addition, when the two administering a placebo (such as a film having
world. Even if the results of an experiment are an taking the pretest. This is an example of interaction
posttests are compared, less prejudice should be nothing to do with African Americans) to the con-
found in the experimental group than in the con- trol group, for example-can make the problem accurate gauge of what happened during that ex- between the testing and the stimulus. The classical

trol group. even easier to manage. periment, do they really tell us anything about life experimental design cannot control for that possi-
in the wilds of society? bility. Fortunately, experimenters have devised
This design also guards against the problem of The remaining five problems of internal inva-
history in that anything occurring outside the ex- lidity are avoided through the careful administra- Campbell and Stanley describe four forms of other designs that can.
this problem; I'll present one as an illustration. The Solomon four group design ( D. Campbell and
periment that might affect the experimental group tion of a controlled experimental design. The ex-
The generalizability of experimental findings is Stanley 1963:24-25) addresses the problem of test-
should also affect the control group. Consequently, perimental design we've been discussing facilitates
ing interaction with the stimulus. As the name sug-
there should still be a difference in the two posttest the dear specification of independent and depen- jeopardized, as the authors point out, if there's an
interaction between the testing situation and the gests, it involves four groups of subjects, assigned
results. The same comparison guards against prob- dent variables. Experimental and control subjects
experimental stimulus (1963:18). Here's an ex- randomly from a pool. Figure 8-5 presents this de-
lems of maturation as long as the subjects have can be kept separate, reducing the possibility of
been randomly assigned to the two groups. Testing diffusion or imitation of treatments. Administra- ample of what they mean. sign graphically.

and instrumentation can't be problems, because tive controls can avoid compensations given to the Staying with the study of prejudice and the Af-
both the experimental and control groups are sub- control group, and compensatory rivalry can be rican-American history film, let's suppose that our
external invalidity Refers to the possibility that cnn-
ject to the same tests and experimenter effects. If watched for and taken into account in evaluating experimental group-in the classical experiment-
dusions drawn from experimental results may not be
the subjects have been assigned to the two groups the results of the experiment, as can the problem has less prejudice in its posttest than in its pretest generalizable to the "real" world.
randomly, statistical regression should affect both of demoralization. and that its posttest shows less prejudice than that
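To make the logic of the four comparisons in the Solomon four-group design (Figure 8-5) concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the scores are invented (higher means more prejudice), the function name solomon_comparisons is hypothetical, and the last quantity, a simple difference of differences, is only one possible way of gauging the interaction between testing and stimulus, not a procedure Campbell and Stanley prescribe.

```python
from statistics import mean


def solomon_comparisons(g1_pre, g1_post, g2_pre, g2_post, g3_post, g4_post):
    """Compute the four Solomon four-group comparisons from lists of scores.

    Hypothetical helper for illustration only. Higher scores mean more
    prejudice, so a negative difference means less prejudice in the
    first-named measurement.
    """
    m1_pre, m1_post = mean(g1_pre), mean(g1_post)
    m2_pre, m2_post = mean(g2_pre), mean(g2_post)
    m3_post, m4_post = mean(g3_post), mean(g4_post)

    pretested_effect = m1_post - m2_post      # finding 3: Group 1 vs. Group 2
    unpretested_effect = m3_post - m4_post    # finding 4: Group 3 vs. Group 4

    return {
        "finding_1": m1_post - m1_pre,        # Group 1 posttest vs. pretest
        "finding_2": m2_post - m2_pre,        # Group 2 posttest vs. pretest
        "finding_3": pretested_effect,
        "finding_4": unpretested_effect,
        # Assumed gauge of testing-stimulus interaction: how much larger the
        # apparent film effect is among subjects who took a pretest.
        "testing_x_stimulus": pretested_effect - unpretested_effect,
    }


# Invented scores, three subjects per group.
results = solomon_comparisons(
    g1_pre=[60, 70, 80], g1_post=[40, 50, 60],
    g2_pre=[62, 70, 78], g2_post=[60, 70, 80],
    g3_post=[50, 55, 60],
    g4_post=[65, 70, 75],
)
for name, value in results.items():
    print(f"{name}: {value}")
```

With these made-up numbers, findings 1, 3, and 4 come out negative and finding 2 is zero, which is the pattern we would expect if the film works. The nonzero final value would hint that the film looked somewhat more effective among pretested subjects; whether such a difference is more than chance is a question for statistical tests.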

Notice that Groups 1 and 2 in Figure 8-5 compose the classical experiment, with Group 2 being the control group. Group 3 is administered the experimental stimulus without a pretest, and Group 4 is only posttested. This experimental design permits four meaningful comparisons, which are described in the figure. If the African-American history film really reduces prejudice--unaccounted for by the problem of internal validity and unaccounted for by an interaction between the testing and the stimulus--we should expect four findings:

1. In Group 1, posttest prejudice should be less than pretest prejudice.
2. The Group 2 pretest and posttest should show the same degree of prejudice.
3. There should be less prejudice evident in the Group 1 posttest than in the Group 2 posttest.
4. The Group 3 posttest should show less prejudice than the Group 4 posttest.

Notice that findings (3) and (4) rule out any interaction between the testing and the stimulus. And remember that these comparisons are meaningful only if subjects have been assigned randomly to the different groups, thereby providing groups of equal prejudice initially, even though their preexperimental prejudice is only measured in Groups 1 and 2.

There is a side benefit to this research design, as the authors point out. Not only does the Solomon four-group design rule out interactions between testing and the stimulus, it also provides data for comparisons that will reveal how much of this interaction has occurred in a classical experiment. This knowledge allows a researcher to review and evaluate the value of any prior research that used the simpler design.

The last experimental design I'll mention here is what Campbell and Stanley (1963:25-26) call the posttest-only control group design; it consists of the second half--Groups 3 and 4--of the Solomon design. As the authors argue persuasively, with proper randomization, only Groups 3 and 4 are needed for a true experiment that controls for the problems of internal invalidity as well as for the interaction between testing and stimulus. With randomized assignment to experimental and control groups (which distinguishes this design from the static-group comparison discussed earlier), the subjects will be initially comparable on the dependent variable--comparable enough to satisfy the conventional statistical tests used to evaluate the results--so it's not necessary to measure them. Indeed, Campbell and Stanley suggest that the only justification for pretesting in this situation is tradition. Experimenters have simply grown accustomed to pretesting and feel more secure with research designs that include it. Be clear, however, that this point applies only to experiments in which subjects have been assigned to experimental and control groups randomly, because that's what justifies the assumption that the groups are equivalent without actually measuring them to find out.

This discussion has introduced the intricacies of experimental design, its problems, and some solutions. There are, of course, a great many other experimental designs in use. Some involve more than one stimulus and combinations of stimuli. Others involve several tests of the dependent variable over time and the administration of the stimulus at different times for different groups. If you're interested in pursuing this topic, you might look at the Campbell and Stanley book.

An Illustration of Experimentation

Experiments have been used to study a wide variety of topics in the social sciences. Some experiments have been conducted within laboratory situations; others occur out in the "real world." The following discussion provides a glimpse of both.

Let's begin with a "real world" example. In George Bernard Shaw's well-loved play, Pygmalion--the basis of the long-running Broadway musical, My Fair Lady--Eliza Doolittle speaks of the powers others have in determining our social identity. Here's how she distinguishes the way she's treated by her tutor, Professor Higgins, and by Higgins's friend, Colonel Pickering:

  You see, really and truly, apart from the things anyone can pick up (the dressing and the proper way of speaking, and so on), the difference between a lady and a flower girl is not how she behaves, but how she's treated. I shall always be a flower girl to Professor Higgins, because he always treats me as a flower girl, and always will, but I know I can be a lady to you, because you always treat me as a lady, and always will. (Act V)

The sentiment Eliza expresses here is basic social science, addressed more formally by sociologists such as Charles Horton Cooley (the "looking-glass self") and George Herbert Mead ("the generalized other"). The basic point is that who we think we are--our self-concept--and how we behave are largely a function of how others see and treat us. Related to this, the way others perceive us is largely conditioned by expectations they have in advance. If they've been told we're stupid, for example, they're likely to see us that way--and we may come to see ourselves that way and actually act stupidly. This phenomenon has generally been called the Pygmalion effect, and it's nicely suited to controlled experiments.

In one of the best-known experimental investigations of the Pygmalion effect, Robert Rosenthal and Lenore Jacobson (1968) administered what they called a "Harvard Test of Inflected Acquisition" to students in a West Coast school. Subsequently, they met with the students' teachers to present the results of the test. In particular, Rosenthal and Jacobson identified certain students as very likely to exhibit a sudden spurt in academic abilities during the coming year, based on the results of the test.

When IQ test scores were compared later, the researchers' predictions proved accurate. The students identified as "spurters" far exceeded their classmates during the following year, suggesting that the predictive test was a powerful one. In fact, the test was a hoax! The researchers had made their predictions randomly among both good and poor students. What they told the teachers did not really reflect students' test scores at all. The progress made by the "spurters" was simply a result of the teachers expecting the improvement and paying more attention to those students, encouraging them, and rewarding them for achievements. (Notice the similarity between this situation and the Hawthorne effect discussed earlier in this chapter.)

The Rosenthal-Jacobson study attracted a great deal of popular as well as scientific attention. Subsequent experiments have focused on specific aspects of what has become known as the attribution process, or the expectations communication model. This research, largely conducted by psychologists, parallels research primarily by sociologists, which takes a slightly different focus and is often gathered under the label expectations-states theory. Psychological studies focus on situations in which the expectations of a dominant individual affect the performance of subordinates--as in the case of a teacher and students, or a boss and employees. The sociological research has tended to focus more on the role of expectations among equals in small, task-oriented groups. In a jury, for example, how do jurors initially evaluate each other, and how do those initial assessments affect their later interactions? (You can learn more about this phenomenon, including attempts to find practical applications, by searching the Web for "The Pygmalion Effect.")

Here's an example of an experiment conducted to examine the way our perceptions of our abilities and those of others affect our willingness to accept the other person's ideas. Martha Foschi, G. Keith Warriner, and Stephen Hart (1985) were particularly interested in the role "standards" play in that respect:

  In general terms, by "standards" we mean how well or how poorly a person has to perform in order for an ability to be attributed or denied him/her. In our view, standards are a key variable affecting how evaluations are processed and what expectations result. For example, depending on the standards used, the same level of success may be interpreted as a major accomplishment or dismissed as unimportant.

To begin examining the role of standards, the researchers designed an experiment involving four experimental groups and a control. Subjects were told that the experiment involved something called "pattern recognition ability," which was an innate ability some people had and others didn't. The researchers said subjects would be working in pairs on pattern recognition problems.

In fact, of course, there's no such thing as pattern recognition ability. The object of the experiment

was to determine how information about this supposed ability affected subjects' subsequent behavior.

The first stage of the experiment was to "test" each subject's pattern recognition abilities. If you had been a subject in the experiment, you would have been shown a geometrical pattern for 8 seconds, followed by two more patterns, each of which was similar to but not the same as the first one. Your task would be to choose which of the subsequent set had a pattern closest to the first one you saw. You would be asked to do this 20 times, and a computer would print out your "score." Half the subjects would be told that they had gotten 14 correct; the other half would be told that they had gotten only 6 correct--regardless of which patterns they matched with which. Depending on the luck of the draw, you would think you had done either quite well or quite badly. Notice, however, that you wouldn't really have any standard for judging your performance--maybe getting 4 correct would be considered a great performance.

At the same time you were given your score, however, you would also be given your "partner's score," although both the "partners" and their "scores" would also be computerized fictions. (Subjects were told they would be communicating with their partners via computer terminals but would not be allowed to see each other.) If you were assigned a score of 14, you'd be told your partner had a score of 6; if you were assigned 6, you'd be told your partner had 14.

This procedure meant that you would enter the teamwork phase of the experiment believing either (1) you had done better than your partner or (2) you had done worse than your partner. This information constituted part of the "standard" you would be operating under in the experiment. In addition, half of each group was told that a score of between 12 and 20 meant the subject definitely had pattern recognition ability; the other subjects were told that a score of 14 wasn't really high enough to prove anything definite. Thus, you would emerge from this with one of the following beliefs:

1. You are definitely better at pattern recognition than your partner.
2. You are possibly better than your partner.
3. You are possibly worse than your partner.
4. You are definitely worse than your partner.

The control group for this experiment was told nothing about their own abilities or their partners'. In other words, they had no expectations.

The final step in the experiment was to set the "teams" to work. As before, you and your partner would be given an initial pattern, followed by a comparison pair to choose from. When you entered your choice in this round, however, you would be told what your partner had answered; then you would be asked to choose again. In your final choice, you could either stick with your original choice or switch. The "partner's" choice was, of course, created by the computer, and as you can guess, there were often disagreements in the teams: 16 out of 20 times, in fact.

The dependent variable in this experiment was the extent to which subjects would switch their choices to match those of their partners. The researchers hypothesized that the definitely better group would switch least often, followed by the possibly better group, followed by the control group, followed by the possibly worse group, followed by the definitely worse group, who would switch most often.

The number of times subjects in the five groups switched their answers follows. Realize that each had 16 opportunities to do so. These data indicate that each of the researchers' expectations was correct--with the exception of the comparison between the possibly worse and definitely worse groups. Although the latter group was in fact the more likely to switch, the difference was too small to be taken as a confirmation of the hypothesis. (Chapter 16 will discuss the statistical tests that let researchers make decisions like this.)

  Group               Mean Number of Switches
  Definitely better    5.05
  Possibly better      6.23
  Control group        7.95
  Possibly worse       9.23
  Definitely worse     9.28

In more detailed analyses, it was found that the same basic pattern held for both men and women, though it was somewhat clearer for women than for men. Here are the actual data:

                      Mean Number of Switches
  Group               Women     Men
  Definitely better    4.50     5.66
  Possibly better      6.34     6.10
  Control group        7.68     8.34
  Possibly worse       9.36     9.09
  Definitely worse    10.00     8.70

Because specific research efforts like this one sometimes seem extremely focused in their scope, you might wonder about their relevance to anything. As part of a larger research effort, however, studies like this one add concrete pieces to our understanding of more general social processes.

It's worth taking a minute or so to consider some of the life situations where "expectation states" might have very real and important consequences. I've mentioned the case of jury deliberations. How about all forms of prejudice and discrimination? Or, consider how expectation states figure into job interviews or meeting your heartthrob's parents. If you think about it, you'll undoubtedly see other situations where these laboratory concepts apply in real life.

"Natural" Experiments

Although we tend to equate the terms experiment and laboratory experiment, many important social scientific experiments occur outside controlled settings, often in the course of normal social events. Sometimes nature designs and executes experiments that we can observe and analyze; sometimes social and political decision makers serve this natural function.

Imagine, for example, that a hurricane has struck a particular town. Some residents of the town suffer severe financial damages, and others escape relatively lightly. What, we might ask, are the behavioral consequences of suffering a natural disaster? Are those who suffer most more likely to take precautions against future disasters than are those who suffer least? To answer these questions, we might interview residents of the town some time after the hurricane. We might question them regarding their precautions before the hurricane and the ones they're currently taking, comparing the people who suffered greatly from the hurricane with those who suffered relatively little. In this fashion, we might take advantage of a natural experiment, which we could not have arranged even if we'd been perversely willing to do so.

A similar example comes from the annals of social research concerning World War II. After the war ended, social researchers undertook retrospective surveys of wartime morale among civilians in several German cities. Among other things, they wanted to determine the effect of mass bombing on the morale of civilians. They compared the reports of wartime morale of residents in heavily bombed cities with reports from cities that received relatively little bombing. (Bombing did not reduce morale.)

Because the researcher must take things pretty much as they occur, natural experiments raise many of the validity problems discussed earlier. Thus when Stanislav Kasl, Rupert Chisolm, and Brenda Eskenazi (1981) chose to study the impact that the Three Mile Island (TMI) nuclear accident in Pennsylvania had on plant workers, they had to be especially careful in the study design:

  Disaster research is necessarily opportunistic, quasi-experimental, and after-the-fact. In the terminology of Campbell and Stanley's classical analysis of research designs, our study falls into the "static-group comparison" category, considered one of the weak research designs. However, the weaknesses are potential and their actual presence depends on the unique circumstances of each study.

The foundation of this study was a survey of the people who had been working at Three Mile Island on March 28, 1979, when the cooling system failed in the number 2 reactor and began melting


the uranium core. The survey was conducted five to six months after the accident. Among other things, the survey questionnaire measured workers' attitudes toward working at nuclear power plants. If they had measured only the TMI workers' attitudes after the accident, the researchers would have had no idea whether attitudes had changed as a consequence of the accident. But they improved their study design by selecting another nearby, seemingly comparable nuclear power plant (abbreviated as PB) and surveyed workers there as a control group: hence their reference to a static-group comparison.

Even with an experimental and a control group, the authors were wary of potential problems in their design. In particular, their design was based on the idea that the two sets of workers were equivalent to one another, except for the single fact of the accident. The researchers could have assumed this if they had been able to assign workers to the two plants randomly, but of course that was not the case. Instead, they needed to compare characteristics of the two groups and infer whether they were equivalent. Ultimately, the researchers concluded that the two sets of workers were very much alike, and the plant the employees worked at was merely a function of where they lived.

Even granting that the two sets of workers were equivalent, the researchers faced another problem of comparability. They could not contact all the workers who had been employed at TMI at the time of the accident. The researchers discussed the problem as follows:

    One special attrition problem in this study was the possibility that some of the no-contact nonrespondents among the TMI subjects, but not PB subjects, had permanently left the area because of the accident. This biased attrition would, most likely, attenuate the estimated extent of the impact. Using the evidence of disconnected or "not in service" telephone numbers, we estimate this bias to be negligible (1 percent).

    (Kasl et al. 1981:475)

The TMI example points to both the special problems involved in natural experiments and the possibility for taking those problems into account. Social research generally requires ingenuity and insight; natural experiments call for a little more than the average.

Earlier in this chapter, we used a hypothetical example of studying whether an African-American history film reduced prejudice. Sandra Ball-Rokeach, Joel Grube, and Milton Rokeach (1981) were able to address that topic in real life through a natural experiment. In 1977, the television dramatization of Alex Haley's Roots, a historical saga about African Americans, was presented by ABC on eight consecutive nights. It garnered the largest audiences in television history up to that time. Ball-Rokeach and her colleagues wanted to know whether Roots changed white Americans' attitudes toward African Americans. Their opportunity arose in 1979, when a sequel, Roots: The Next Generation, was televised. Although it would have been nice (from a researcher's point of view) to assign random samples of Americans either to watch or not to watch the show, that wasn't possible. Instead, the researchers selected four samples in Washington State and mailed questionnaires that measured attitudes toward African Americans. Following the last episode of the show, respondents were called and asked how many, if any, episodes they had watched. Subsequently, questionnaires were sent to respondents, remeasuring their attitudes toward African Americans.

By comparing attitudes before and after for both those who watched the show and those who didn't, the researchers reached several conclusions. For example, they found that people with already egalitarian attitudes were much more likely to watch the show than were those who were more prejudiced toward African Americans: a self-selection phenomenon. Comparing the before and after attitudes of those who watched the show, moreover, suggested the show itself had little or no effect. Those who watched it were no more egalitarian afterward than they had been before.

This example anticipates the subject of Chapter 12, evaluation research, which can be seen as a special type of natural experiment. As you'll see, evaluation research involves taking the logic of experimentation into the field to observe and evaluate the effects of stimuli in real life. Because this is an increasingly important form of social research, an entire chapter is devoted to it.

Strengths and Weaknesses of the Experimental Method

Experiments are the primary tool for studying causal relationships. However, like all research methods, experiments have both strengths and weaknesses.

The chief advantage of a controlled experiment lies in the isolation of the experimental variable and its impact over time. This is seen most clearly in terms of the basic experimental model. A group of experimental subjects are found, at the outset of the experiment, to have a certain characteristic; following the administration of an experimental stimulus, they are found to have a different characteristic. To the extent that subjects have experienced no other stimuli, we may conclude that the change of characteristics is attributable to the experimental stimulus.

Further, since individual experiments are often rather limited in scope, requiring relatively little time and money and relatively few subjects, we often can replicate a given experiment several times using several different groups of subjects. (This isn't always the case, of course, but it's usually easier to repeat experiments than, say, surveys.) As in all other forms of scientific research, replication of research findings strengthens our confidence in the validity and generalizability of those findings.

The greatest weakness of laboratory experiments lies in their artificiality. Social processes that occur in a laboratory setting might not necessarily occur in more natural social settings. For example, an African-American history film might genuinely reduce prejudice among a group of experimental subjects. This would not necessarily mean, however, that the same film shown in neighborhood movie theaters throughout the country would reduce prejudice among the general public. Artificiality is not as much of a problem, of course, for natural experiments as for those conducted in the laboratory.

In discussing several of the sources of internal and external invalidity mentioned by Campbell, Stanley, and Cook, we saw that we can create experimental designs that logically control such problems. This possibility points to one of the great advantages of experiments: They lend themselves to a logical rigor that is often much more difficult to achieve in other modes of observation.

MAIN POINTS

• Experiments are an excellent vehicle for the controlled testing of causal processes.

• The classical experiment tests the effect of an experimental stimulus (the independent variable) on a dependent variable through the pretesting and posttesting of experimental and control groups.

• It is generally less important that a group of experimental subjects be representative of some larger population than that experimental and control groups be similar to each other.

• A double-blind experiment guards against experimenter bias because neither the experimenter nor the subject knows which subjects are in the control and experimental groups.

• Probability sampling, randomization, and matching are all methods of achieving comparability in the experimental and control groups. Randomization is the generally preferred method. In some designs, it can be combined with matching.

• Campbell and Stanley describe three forms of preexperiments: the one-shot case study, the one-group pretest-posttest design, and the static-group comparison. None of these designs features all the controls available in a true experiment.

• Campbell and Stanley list, among others, 12 sources of internal invalidity in experimental design. The classical experiment with random assignment of subjects guards against each of these 12 sources of internal invalidity.


• Experiments also face problems of external invalidity: Experimental findings may not reflect real life.

• The interaction of testing and stimulus is an example of external invalidity that the classical experiment does not guard against.

• The Solomon four-group design and other variations on the classical experiment can safeguard against external invalidity.

• Campbell and Stanley suggest that, given proper randomization in the assignment of subjects to the experimental and control groups, there is no need for pretesting in experiments.

• Natural experiments often occur in the course of social life in the real world, and social researchers can implement them in somewhat the same way they would design and conduct

2. Pick 6 of the 12 sources of internal invalidity discussed in this chapter and make up examples (not discussed in the chapter) to illustrate each.

3. Create a hypothetical experimental design that illustrates one of the problems of external invalidity.

4. Think of a recent natural disaster you've witnessed or read about. Frame a research question that might be studied by treating that disaster as a natural experiment. In two or three paragraphs, outline how the study might be done.

5. In this chapter, we looked briefly at the problem of "placebo effects." On the Web, find a study in which the placebo effect figured importantly. Write a brief report on the study, including the source of your information. (Hint: you might want to do a search on "placebo.")

emphasis on experimentation. This book is especially strong in the philosophy of science.

RESOURCES ON THE INTERNET

VIRTUAL SOCIETY'S COMPANION WEB SITE FOR THE PRACTICE OF SOCIAL RESEARCH, 10TH EDITION

Once at the Virtual Society, click on "Find Companion Sites" from the left navigation bar, click on "Research Methods and Statistics," and then click on your book cover. On the companion site, you will find useful learning resources for your course. Some of those resources include Tutorial Quizzes with feedback, Internet Exercises, Flashcards, and Chapter Tutorials for every chapter, as well as Extended Projects, Social Research in Cyberspace, and Primers for using various

invite you to participate, either as a subject or as an experimenter.

William D. Hacker Social Science Experimental Laboratory

This laboratory at the California Institute of Technology gives students the opportunity to participate as subjects in experiments online, for pay.

Stanford Prison Experiment

This Web site provides a slide show relating a famous social science experiment that reveals some of the problems that can occur in this type of study.

INFOTRAC COLLEGE EDITION

access.html

Access the latest news and journal articles with Info-