
Learning Guide: True & Quasi-Experiments

If you have forgotten key ideas about sampling, particularly the ideas in the Bernard reading, review them
before you tackle the readings for this module. You should also review the Green and Glasgow article
from Module 1. Their concepts grow directly out of the challenges that Henry and Shadish et al. discuss.

View the slide shows about True Experiment Requirements and Quasi-Experiments and my document
Types of Experiments. Outside medicine and, to a lesser degree, education, we have very little
information about the effectiveness of many interventions, because we fail to use true experiments to
evaluate most intervention programs. We simply do not know if the intervention actually produces an
effect. True experiments rely heavily on random assignment to ensure that the variance (different
characteristics) of study participants does not affect the outcome of an experiment. We know people
differ in many ways. We know we cannot screen for all of these differences. We know that some
differences may affect the outcome of the experiment, even when we screen for differences that we
know or suspect could affect the outcome. We use random assignment to try to “randomly distribute”
these unknown differences between treatment and control groups. As a result, many consider the true
experiment to be the only “gold standard” for showing that the interventions we develop really work.
Others argue that other kinds of designs are as good as or better than true experiments for evaluating
the effectiveness of interventions.
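The idea of “randomly distributing” unknown differences can be illustrated with a small simulation. The sketch below is a hypothetical example, not part of any assigned reading: it gives 10,000 simulated participants an unmeasured characteristic (labeled “motivation” purely for illustration, with an invented mean of 50 and SD of 10), shuffles them into two groups without looking at that characteristic, and compares the group averages.

```python
import random
import statistics


def randomize(participants, seed=None):
    """Shuffle participants and split them into two groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def compare_unmeasured_trait(n=10_000, seed=1):
    """Return the mean of a hypothetical unmeasured trait in each randomized group."""
    rng = random.Random(seed)
    # Hypothetical unmeasured characteristic (e.g., motivation): mean 50, SD 10.
    motivation = [rng.gauss(50, 10) for _ in range(n)]
    # Assignment never looks at motivation; it only shuffles the list.
    treatment, control = randomize(motivation, seed=seed + 1)
    return statistics.mean(treatment), statistics.mean(control)


if __name__ == "__main__":
    t_mean, c_mean = compare_unmeasured_trait()
    print(f"treatment: {t_mean:.2f}  control: {c_mean:.2f}")
```

Rerunning with different seeds changes the two means slightly but does not systematically favor either group. With much smaller samples (try `n=20`) the gap between groups can be noticeably larger.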

Henry Reading

1. Henry begins (first paragraph) by describing two quite different kinds of contributions that evaluation
research can make to practice. State them in your own terms.

2. To whom and why is it important that evaluation evidence is credible? Consider Gorard’s comments
in the very first chapter of his book as you respond to this question.

3. Henry makes the point that some argue that the kind of evidence provided by RCTs is the most
credible evidence that an intervention “really works.” What characteristics of the true experiment
support this perspective? (Hint: look at threats to validity)

4. Henry says that proponents of the RCT in evaluation stress the importance of average program
effects whereas proponents of other approaches stress other kinds of evaluation questions. What
are some of the other kinds of evaluation questions that can be posed?

5. On pp. 223-229, Henry provides a lengthy discussion of Version 1 of the argument against RCTs: that
they are not generally better than other cause-probing methods. This includes a discussion
of the importance of expected effect size (big fish or little fish) and choice of evaluation approach.
Henry states that “… the more appropriate question has to do with the expected strength of
inferences that will arise, in the expected causal context, if the question of a program’s effect is of
interest.” Think about interventions familiar to you. Do you anticipate large or small effects? How
would this expectation influence your choice of the kind of evaluation evidence needed?

6. Version 2 of the argument against RCTs argues that “RCTs and related cause-probing methods
lack value in light of the context, contingencies, and particularities involved.” Why would a scientific
realist reject this argument as a generally valid argument against RCTs as a form of evidence?

7. Refer to the Green & Glasgow article from Module 4 that discussed the relationship between
evidence-based practice and practice-based research. Those authors say: “Starting with the
proposition that ‘if we want more evidence-based practice, we need more practice-based evidence,’
this article (a) offers questions and guides that practitioners, program planners, and policy makers
can use to determine the applicability of evidence to situations and populations other than those in
which the evidence was produced (generalizability), (b) suggests criteria that reviewers can use to
evaluate external validity and potential for generalization, and (c) recommends procedures that
practitioners and program planners can use to adapt evidence based interventions and integrate
them with evidence on the population and setting characteristics, theory, and experience into locally
appropriate programs. The development and application in tandem of such questions, guides,
criteria, and procedures can be a step toward increasing the relevance of research for decision
making and should support the creation and reporting of more practice-based research having high
external validity.” On p. 231, Henry addresses the need for a “theory of context.” How could Green &
Glasgow’s suggestions for evidence-based practice and practice-based evidence contribute to this
theory building?

8. Henry ends with a list of five questions that evaluators should ask themselves before they develop
an evaluation design (he refers to design as “methods”). Explain how you can use what you have
learned in this course to respond to each of Henry’s five questions.

Jones Lager & Torssander

1. Why is this a quasi-experiment rather than a true experiment? (What makes it a quasi-experiment?)

2. What was the treatment in this experiment?

3. It is often said that we cannot carry out experiments to inform public policy; the general impression
is that, for some reason, this is impossible. What does this study tell us about our ability to
perform experiments at the general societal scale?

4. The authors say that most previous studies about the relationships between long-term health
outcomes and education were “natural experiments.” What is a “natural experiment”? Do you think
these “natural experiments” meet the standards to qualify them as experimental designs? Why or
why not?

5. What are some of the potential confounding effects that were studied in this quasi-experiment?
(Hint: things that could interfere with ability to determine cause and effect.)

6. On page 8464 (left column, paragraph starting “Two important characteristics…”), the authors say
that “First, the exposure was manipulated for the express purpose of evaluating its effect…. This
means that self-selection into the experiment was limited. Second, the units that were allocated
were municipalities within a country, not whole states or countries, which tend to differ more from
each other at baseline.” Why are these important features of the design from the perspective of
internal validity?

7. They also say “The allocation of the reform was not random but we take this into consideration by
controlling for municipality effects.” What does this mean? Why is it important?

8. The authors identify several processes that could have affected the results, such as
misclassification due to normal changes of residence over the years. In each case, they conclude
that these processes would “attenuate the effect of the reform.” This actually strengthens their
conclusions. Why?

9. The results of this study were somewhat inconclusive. However, I would argue that a sophisticated
reader who understands this report will conclude that the case for education having positive
effects on lifetime health outcomes is fairly strong. Why do you think I would argue this?

10. This was a massive “social experiment” on 1.2 million people. It took time and money. The
alternative would be simply to impose a massive policy change (perhaps like No Child Left Behind)
without data. Which do you think is the better way to proceed, “better” meaning less expensive and
more efficacious over a 20- to 30-year period?
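The attenuation argument in question 8 can be made concrete with a simulation. The sketch below is hypothetical and greatly simplified (one binary exposure, a normally distributed outcome, invented parameter values); it is not the authors’ analysis. A random share of people is recorded in the wrong exposure group for reasons unrelated to the outcome, much as a change of residence might place someone in the wrong municipality, and the function reports the difference in mean outcome between the recorded groups.

```python
import random
import statistics


def estimated_effect(misclass_rate, n=20_000, true_effect=2.0, seed=7):
    """Difference in mean outcome between the *recorded* exposed and unexposed
    groups when a share `misclass_rate` of people is recorded in the wrong group."""
    rng = random.Random(seed)
    recorded_exposed, recorded_unexposed = [], []
    for _ in range(n):
        exposed = rng.random() < 0.5  # true exposure (e.g., to the reform)
        outcome = 10.0 + true_effect * exposed + rng.gauss(0, 3)
        # Misclassification happens independently of the outcome
        # (hypothetically, a move that records the wrong municipality).
        flipped = rng.random() < misclass_rate
        recorded = (not exposed) if flipped else exposed
        (recorded_exposed if recorded else recorded_unexposed).append(outcome)
    return statistics.mean(recorded_exposed) - statistics.mean(recorded_unexposed)


if __name__ == "__main__":
    for rate in (0.0, 0.1, 0.2, 0.3):
        print(f"misclassification {rate:.0%}: estimate {estimated_effect(rate):.2f}")
```

In this particular setup (equal-sized true groups, symmetric misclassification at rate p), the expected estimate is the true effect multiplied by (1 - 2p), so a 20% error rate shrinks an effect of 2.0 to about 1.2.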

Shadish & Cook

Shadish and Cook do not address the general limitations of quasi-experiments as a group. Rather, they
are concerned with the limitations of two specific, rather common types of quasi-experiment: those that
lack a control group and those that have no pre-test for the outcome variable. True experiments have
no design without a control group, because by definition a true experiment requires random
assignment of participants to treatment and control or comparison groups. There is, however, a
post-test only design for true experiments (see Types of True Experiments from Module 10), and many of
the weaknesses that Shadish and Cook discuss for this design in quasi-experiments also apply to true
experiments. Shadish & Cook point out several reasons why researchers sometimes use these designs.
One that you are very likely to encounter as a professional is a perennial problem for evaluators: an
intervention is designed and implemented before any pre-intervention data are collected, often
without any consideration of how its effects will be assessed. Then, of course,
the day comes when the funding body wants an evaluation of project outcomes. As an evaluator, at
best you’re stuck with a post-test only quasi-experiment for the evaluation. In many cases, you are
also stuck with a “no control” quasi-experiment, because there is no plausible way, “after the fact,” to
identify any group of people similar enough to the program participants to serve as a
reasonable comparison group.

1. There are three absolute requirements for establishing any kind of causal effect. (1) Cause must
precede effect in time. (2) Cause must covary with effect: the stronger the cause, the stronger the
response (or vice versa for negative covariation). For example, if we have three levels of a treatment
(low, medium and high), we should expect the biggest effect from the “high” treatment, and all
treatments (causes) should produce a greater effect than the control. (3) The researcher must
logically and empirically eliminate alternative causes for the observed effect. In reality, eliminating
every alternative cause that one could conceivably imagine is very difficult, if not
impossible (although positivists claim this is the goal). I agree with Shadish & Cook that
“alternative explanations for the effect should be highly implausible.” Shadish & Cook point out that
only one of these three requirements, the third, is necessarily more problematic for
quasi-experiments than for experiments. Why does the failure to randomly assign participants to
treatment and control groups make it so hard to meet this requirement?

2. State the three principles that Shadish & Cook indicate will help the researcher meet the
requirement of eliminating alternative causes.

3. All of the designs listed in Table 4.1 (p. 106) exist in true experiments, although, of course, there
they involve a randomly assigned control group. The post-test only design, for example, is described
briefly in the document Types of Experiments. Explain in your own words why a one-group post-test
only quasi-experimental design is weaker than a post-test only true experimental design in terms of
the researcher’s ability to draw conclusions about direct cause and effect.

4. Table 4.2 on p. 116 provides a list of quasi-experimental designs that use a control group, but do
not have a pre-test. Just as before, all of these designs are available for true experiments as well as
quasi-experiments. Explain in your own words why a post-test only with non-equivalent groups
quasi-experimental design is weaker than a post-test only with non-equivalent groups true
experimental design in terms of the researcher’s ability to draw conclusions about direct cause and
effect.

5. Assess the degree to which each of these two types (true experiment and quasi-experiment) of
non-equivalent group designs reduces threats to internal and external validity.

6. Examine either of the two alternatives to the non-equivalent group post-test only quasi-experimental
design listed in Table 4.2. Explain how each of the three principles that Shadish & Cook offer to help
the researcher meet the requirement of eliminating alternative causes is or is not applied in the non-
equivalent group post-test only design and the alternative that you select to compare to it.

7. Whatever the design (true experiment, quasi-experiment, or others, as we will see), what does it
mean to “match” samples? Why is this hard to do, or at least to do well enough to reduce major
threats to internal validity?
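Why matching is hard to do well can be illustrated with a hypothetical simulation. The sketch below matches each program participant to the non-participant with the closest value of one observed characteristic, using a simple nearest-neighbor rule with replacement (real studies usually match on several variables or on a propensity score). In this invented example, participation also depends on an unobserved characteristic; the names `x` (observed) and `u` (unobserved) and all parameter values are assumptions for illustration only.

```python
import bisect
import random
import statistics


def simulate(n=4000, true_effect=1.0, seed=5):
    """Hypothetical population: (x, u, joined, outcome) for each person."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        x = rng.gauss(0, 1)  # observed covariate (e.g., a prior test score)
        u = rng.gauss(0, 1)  # unobserved covariate (e.g., motivation)
        # Self-selection depends on BOTH the observed and the unobserved trait.
        joined = rng.random() < (0.7 if x + u > 0 else 0.3)
        outcome = x + u + true_effect * joined + rng.gauss(0, 1)
        people.append((x, u, joined, outcome))
    return people


def nearest_neighbor_match(people):
    """Pair each participant with the non-participant whose observed x is
    closest (matching with replacement). Returns (participant, match) pairs."""
    controls = sorted((p for p in people if not p[2]), key=lambda p: p[0])
    control_x = [c[0] for c in controls]
    pairs = []
    for person in people:
        if not person[2]:
            continue
        i = bisect.bisect_left(control_x, person[0])
        # The nearest control sits just left or just right of the insertion point.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(controls)]
        best = min(candidates, key=lambda j: abs(control_x[j] - person[0]))
        pairs.append((person, controls[best]))
    return pairs


def summarize(pairs):
    """Average within-pair gaps in x, in u, and in the outcome."""
    x_gap = statistics.mean(t[0] - c[0] for t, c in pairs)
    u_gap = statistics.mean(t[1] - c[1] for t, c in pairs)
    effect = statistics.mean(t[3] - c[3] for t, c in pairs)
    return x_gap, u_gap, effect


if __name__ == "__main__":
    x_gap, u_gap, effect = summarize(nearest_neighbor_match(simulate()))
    print(f"x gap {x_gap:.3f}, u gap {u_gap:.3f}, estimate {effect:.2f} (true 1.0)")
```

In runs of this sketch, the matched pairs are nearly identical on `x`, yet participants still differ from their matches on `u`, and the estimated effect comes out above the true value of 1.0 that was built into the simulation.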

Thayer

1. Thayer (p. 7) cites Macdonald (1952, p. 136): “The first essential, then for evaluative research on
practice is to make explicitly, specific, and concrete objectives towards which practice is directed….
The essence of research is that the findings relate to that which is observed and not to the individual
observer…” How does the use of experimental designs (including both true and quasi-experiments)
help practitioners make sure that they do not draw biased conclusions about the impacts of their
interventions, that is, that their results are not “a matter of personal whim”?

2. What kinds of questions can quasi-experimental studies answer? Give an example from your own
area of interest of each of the five kinds of questions that Thayer poses.

3. Thayer says: “Do small-scale studies first as a screen, and only pursue more ambitious ones if the
intervention passes the preliminary trials presented by the simpler studies and show at least some
promise as useful.” Why does he make these recommendations?

4. We have talked in this class about the three main tasks of research with regard to the body of
knowledge: (1) build the empirical evidence base, (2) build and test theory, and (3) improve
understanding and explanation of phenomena of interest. Thayer (pp. 19-22) discusses ways in
which quasi-experimental designs can help build the body of knowledge. How do Thayer’s ideas
coincide – or NOT – with the concepts of building the body of knowledge?

5. Thayer also discusses the epistemological and ontological bases that underlie the use of
experiments in general, and specifically quasi-experiments. Review our discussion of epistemology
and design (week 1 of the semester) and compare what Thayer offers with my scientific realist
perspective.

White

1. What is impact evaluation? What are factual evidence and counterfactual evidence? Why do you
need both for a true evaluation of the impacts of interventions?

2. Give an example of why selection bias is a major problem in impact evaluation. Why does having a
control group reduce (maybe even eliminate) this problem?

3. Impact evaluation often depends on quasi-experimental designs because random assignment to
treatment and control groups is not possible and there is no “reasonably matched” control
group. White makes this statement (p. 35, second paragraph): “If a RCT is not possible, then a large
n impact evaluation can instead be based on a quasi-experimental design which uses statistical
means to construct a comparison group, which, like the control group in an RCT, has the same
characteristics as the treatment group.” What does this statement mean? Explain it in your own
words.

4. Explain the basic characteristics of propensity score matching (PSM) and regression discontinuity
design (RDD). What are the weaknesses of these designs?

5. What are the five questions that you must answer before setting up an RCT (see p. 356)? Identify
the key issue involved in the decision-making for each question, e.g., what a researcher must
consider to arrive at a satisfactory answer for each question.

6. What do you consider the most credible objection to using RCTs for impact evaluation?

7. What do you consider the biggest challenge in using RCTs for impact evaluation?
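For question 4, the core logic of a sharp regression discontinuity design can be sketched with a hypothetical program that enrolls everyone whose eligibility score is at or above a cutoff. The sketch fits a separate straight line to the outcomes on each side of the cutoff, using only observations within a chosen bandwidth, and reads off the jump between the two lines at the cutoff as the estimated program effect. The cutoff, bandwidth, linear functional form, and simulated data are all illustrative assumptions, not a procedure taken from White.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rdd_estimate(scores, outcomes, cutoff, bandwidth):
    """Sharp RDD: fit a separate line on each side of the cutoff (within the
    bandwidth) and return the jump between the two fitted values at the cutoff."""
    left = [(s, y) for s, y in zip(scores, outcomes) if cutoff - bandwidth <= s < cutoff]
    right = [(s, y) for s, y in zip(scores, outcomes) if cutoff <= s <= cutoff + bandwidth]
    a_left, b_left = fit_line(*zip(*left))
    a_right, b_right = fit_line(*zip(*right))
    return (a_right + b_right * cutoff) - (a_left + b_left * cutoff)


def simulate(n=10_000, cutoff=50.0, true_effect=3.0, seed=2):
    """Hypothetical program: everyone with an eligibility score >= cutoff is enrolled."""
    rng = random.Random(seed)
    scores = [rng.uniform(0, 100) for _ in range(n)]
    # The outcome rises smoothly with the score, plus a jump for enrolled people.
    outcomes = [0.05 * s + true_effect * (s >= cutoff) + rng.gauss(0, 1) for s in scores]
    return scores, outcomes


if __name__ == "__main__":
    scores, outcomes = simulate()
    print(round(rdd_estimate(scores, outcomes, cutoff=50.0, bandwidth=10.0), 2))
```

Varying the `bandwidth` argument is a simple way to see how sensitive the estimate is to that choice.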

Extension Questions

1. Henry (p. 217) says that “[Some] … see connections between support for RCTs [randomized
controlled trials or true experiments] and the more general evidence-based practice movement and
trends in public management.” Henry cites the example of the U.S. Department of Education’s
Institute of Education Sciences (IES). Similarly, Extension (public education) programs are
under growing pressure to show that they produce a measurable outcome, often as a
condition for continued state and federal support. Can you think of an example
of increased emphasis on evidence-based practice in your area of interest or discipline?

2. Think about an intervention program with which you are familiar. Do you think this program has
over-relied or under-relied on the true experiment in establishing internal validity, external validity
and explanatory power?

3. Assess the degree to which one-group quasi-experimental designs, including those with the
kinds of adjustments suggested by Shadish & Cook, can provide reliable information. Much of our
accumulated evaluation evidence about interventions rests on this particular type of design, often
virtually all of the evidence. How confident should we be in relying on an intervention to produce a
desired outcome when there is little or no other type of evidence for it?

4. You work in a nonprofit organization that provides interventions for some group (at-risk youth, new
parents, people with money management problems, or whatever). The organization’s program has
been operating for three years. Now the donor demands an evaluation of the effectiveness of the
program as a condition for further funding. No outcome data have been collected. Rather, the NPO
has concentrated on “process evaluation” that assesses things like whether participants completed
the program, their perception of the value of the program, and so on. In other words, three years into
the program, with new funding needed to continue, you have NO results showing actual change in
behavior (better money management, less risky behavior, etc.). From a research design perspective,
what would you do?
