THE SCIENCE OF ASKING QUESTIONS

Nora Cate Schaeffer and Stanley Presser
INTRODUCTION
Research on the wording of survey questions flourished in the first two decades
after the modern sample survey was invented, culminating in Stanley Payne's 1951
classic, The Art of Asking Questions. With the notable exception of research on
acquiescence, attention to wording then waned over the next quarter of a century.
In the past two decades, there has been a revival of interest by survey methodologists, who have drawn on and contributed to work by cognitive psychologists,
conversation analysts, and others to lay a foundation for the science of asking
survey questions.
The standardized survey interview is a distinct genre of interaction with unique
rules, but it shares many features with ordinary interaction because social and
conversational norms as well as processes of comprehension, memory, and the like
are imported into the interview from the situations in which they were learned and
practiced. As a result, contributions to the science of asking survey questions also
enhance our understanding of other types of interviews and of social interaction
in general; many processes can be studied in surveys as in life (Schuman &
Ludwig 1983).
Methodologists have applied information-processing models from cognitive
psychology to explain how questions are answered in survey interviews
(Sirken et al. 1999, Sudman et al. 1996, Tourangeau et al. 2000), and these models have influenced much of the research that we review here. At the same time,
there has been renewed attention to how the interviewer and respondent interact
(Schaeffer & Maynard 1996, Maynard et al. 2002). There is an intricate relationship among the survey question as it appears in the questionnaire, the rules the
interviewer is trained to follow, the cognitive processing of the participants, the
interaction between the interviewer and respondent, and the quality of the resulting data. In an interviewer-administered survey, the question that appears on the
screen or the page may be modified in the interaction that ultimately produces
an answer, and in a self-administered survey, conventions learned in other social
contexts may influence how a respondent interprets the questions presented (e.g.,
Schwarz 1994). Nevertheless, we proceed here as though the text of the question
the respondent answers is the one that appears in the questionnaire, although in
one section we review recent experiments that modify the traditional practices
associated with standardization.
Researchers must make a series of decisions when writing a survey question,
and those decisions depend on what the question is about. Our review is structured
around the decisions that must be made for two common types of survey questions:
questions about events or behaviors and questions that ask for evaluations or attitudes. Although there are several other types of questions (e.g., about knowledge
and sociodemographic characteristics), many survey questions are of one of these
two types.
In some cases, research suggests an approach that should increase the reliability
or validity of the resulting data, for example, labeling all the categories in a rating
scale. In other cases, the literature only suggests how different design alternatives,
such as using a checklist instead of an open question, lead to different results
without clearly showing which approach is best or even clearly specifying what
best means.
Researchers who compare different ways of asking standardized questions use
various methods to evaluate the results. The traditional approach involves split-sample experiments, which sometimes include measures of reliability (split-half or
over time) and validity (construct, convergent, or discriminant). Other approaches
that have increasingly been used include cognitive evaluation or expert review,
feedback obtained from respondents during cognitive interviews or debriefing
questions, the results of coding the interaction between interviewers and respondents, and feedback from interviewers in debriefing sessions (see Testing and
Evaluating Questions, below).
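To make the split-half idea concrete, here is a minimal sketch (our construction, with invented scores, not code from this literature); it correlates scores summed over two halves of a multi-item scale and applies the Spearman-Brown correction:

```python
import numpy as np

def split_half_reliability(items: np.ndarray) -> float:
    """Estimate split-half reliability for a respondents-by-items score matrix.

    Correlates scores summed over odd- and even-numbered items, then applies
    the Spearman-Brown correction to estimate full-length reliability.
    """
    odd = items[:, 0::2].sum(axis=1)
    even = items[:, 1::2].sum(axis=1)
    r = np.corrcoef(odd, even)[0, 1]  # correlation between the two halves
    return 2 * r / (1 + r)            # Spearman-Brown step-up

# Invented data: 6 respondents answering a 4-item rating scale (1-5)
scores = np.array([
    [4, 5, 4, 5],
    [2, 1, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 1],
    [3, 4, 3, 4],
])
print(round(split_half_reliability(scores), 2))
```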
The interactional mode (interviewer or self-administered) and technological
mode (computer or paper) influence the nature of both a survey's questions and
the processes used to answer them. For example, a household roster may look
different and be administered differently depending on whether it is implemented
using a grid on paper or a more linear implementation on a computer (Moore
& Moyer 1998, Fuchs 2002). Seeing the questions in a self-administered form
rather than hearing them read by an interviewer, to take another example, may
mitigate the effects of question order or make it easier for respondents to use the
full range of categories in rating scales (Bishop et al. 1988, Ayidiya & McClendon
1990, Dillman & Mason 1984). Nevertheless, investigators face similar decisions
regardless of the mode, and many of the topics we discuss have been examined in
several modes.
THEORETICAL APPROACHES
Models of the process of answering a survey question vary in the number of
stages they present, but most models include understanding a question, retrieving
or constructing an answer, and reporting the answer using the specified format
(Cannell et al. 1981, Tourangeau 1984, Turner & Martin 1984, Sudman et al. 1996).
Research that draws on these models has generally focused on the first two stages.
Respondents may use somewhat different methods of processing for questions
about events or behaviors than they do for questions that request evaluations or
attitudes.
As respondents hear or read a survey question, they construct a pragmatic
meaning that incorporates their interpretation of the gist of the question, why it is
being asked, and what constitutes an acceptable answer. Using an early version of
what we would now call cognitive interviews, Belson (1981) described how respondents may expand or restrict the meaning of the concepts in a question. Research
based on information-processing models has identified some of the mechanisms
by which this occurs. For example, conversational norms specify that each contribution to a conversation should convey new information. Thus, if respondents
are asked to rate first their marriages and then their lives as a whole, they may
interpret the second question as being about their lives other than their marriages
(Schwarz et al. 1991, Tourangeau et al. 1991). Because they have already conveyed
information about their marriages, they provide only new information when rating
their lives as a whole. Respondents may also expand or restrict the meaning of
concepts because the wording of a question evokes prototypes or exemplars that
then dominate the definition of the concept. Thus, even though a researcher words a
question to ask about health practitioners, a respondent may still think the question
is about physicians because that prototypical health practitioner is so salient. The
respondent's interpretation of the question can also be influenced by information
in response categories. In a 1986 study, for example, many respondents selected
the invention of the computer as an important social change when the category
was offered in a closed question, but it was seldom mentioned in response to an
open question (Schuman & Scott 1987).
Most cognitive psychologists believe that memories and autobiographical reports of events are as much constructed as retrieved. Thus, if respondents can easily
think of an instance of an event, they may infer that the event was frequent, an
instance of the availability heuristic (Tversky & Kahneman 1973). Similarly, if
a memory is vivid, the event may be reported as occurring more recently than it
did (Brown et al. 1985, Bradburn et al. 1987). Respondents who believe they have
changed (or stayed the same) may be guided by that belief when recalling the past
(Ross & Conway 1986). Efforts to remember may also focus on salient prototypes
or exemplars, such as physicians instead of health practitioners, and less-salient
incidents may be omitted.
In producing answers about the frequency of events, respondents use a variety of
methods, including counting episodes, rate-based estimation (either for a class of
events as a whole or decomposing and estimating for members of the class), relying
on cues conveyed by the response categories, guessing, and various combinations
of these (Blair & Burton 1987). Estimation strategies lead to heaping at common
numbers, such as multiples of 5 or 10 (Huttenlocher et al. 1990). Many of these
strategies can be considered techniques for satisficing, that is, for conserving
time and energy and yet producing an answer that seems good enough for the
purposes at hand (Krosnick 1991). These examples also illustrate the important
point that comprehension and the retrieval and construction of an answer are not
completely sequential or independent processes.
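Heaping of this kind is easy to detect in reported frequencies. The following sketch is our illustration, with made-up reports; it compares the share of answers falling on multiples of 5 with the roughly one-in-five share expected if answers were spread evenly:

```python
from collections import Counter

def heaping_index(reports: list[int], base: int = 5) -> float:
    """Share of reports landing on multiples of `base`.

    If answers were spread evenly across values, roughly 1/base of reports
    would be multiples of base; a much larger share suggests estimation
    strategies such as rounding ("heaping").
    """
    on_multiple = sum(1 for r in reports if r % base == 0)
    return on_multiple / len(reports)

# Made-up frequency reports (e.g., answers to "How many times did you ...?")
reports = [2, 5, 10, 10, 3, 20, 5, 7, 10, 15, 30, 5, 4, 10, 25]
print(f"multiples of 5: {heaping_index(reports):.0%} (vs ~20% if even)")
print(Counter(reports).most_common(2))  # the heaped values 10 and 5 dominate
```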
As powerful as information-processing models have been in helping us understand how survey questions are answered, they can usefully be supplemented by
paying attention to the social aspects of surveys. Some of what we might otherwise
label cognitive processing (if we did not look at the behavior of the participants) actually occurs in the interaction between the interviewer and respondent (Schaeffer
& Maynard 1996, Maynard et al. 2002). For example, when respondents hesitate or
provide answers that do not match the format of the question, interviewers may use
this information to diagnose that the respondent does not understand something
about the question and thus intervene (Schaeffer & Maynard 2002). Moreover, the
reporting task a respondent confronts may be affected by the respondent's value on
the characteristic being reported, which is usually bound up with (if not determined
by) social factors. For example, a respondent with a complicated employment history will find it difficult to report beginning and ending dates of jobs, whereas
this task will be simpler for someone who has held the same job since completing
school. Information about the distribution of employment experiences will assist
researchers in anticipating the different response strategies respondents are apt
to adopt and therefore the different errors they will make in answering questions
about employment history (Mathiowetz & Duncan 1988, Schaeffer 1994, Dykema
& Schaeffer 2000, Menon 1993). The fact that true values and the errors made
in measuring those values are functions of the same social processes also means
that the assumptions of many statistical models may often be violated (Presser &
Traugott 1992).
69
For example, a question about the total amount of wages received in the last
month implicitly refers to the events of working and being paid, and household
composition is a function of who stays, eats, or receives mail in a place.
EVENTS AND BEHAVIORS

The first consideration in asking about an event or behavior is whether members
of the target population are likely to have encoded the information. For example,
researchers may want to ask parents about their childrens immunization history,
but some parents will be unaware of what, if any, immunizations their children
have received (Lee et al. 1999). For events that respondents do encode, two major
types of errors affect self-reports. Omissions result when individual events are
forgotten because of dating errors (e.g., when events are telescoped backward in
time and so are incorrectly excluded from the reference period), because similar
events become conflated in a generic memory (Means et al. 1989), or because the
wording of a question leads the respondent to search some areas of memory (e.g.,
visits to physicians) while neglecting others (e.g., visits to nurse practitioners).
By contrast, intrusions result when events are telescoped forward in time (and
are incorrectly included in the reference period) or when memories are altered by
scripts, schemata, or embellishments from retellings over time.
Researchers must determine the level of accuracy they will try to achieve, given
the analytic goals and resources at hand. Many techniques that hold promise for
improving the accuracy of self-reports require additional interviewing time or additional resources for questionnaire development, testing, and interviewer training.
Interviewers must also be able to implement the methods, and respondents must be
willing to tolerate them.
The next version uses a checklist to implement the definition and asks about automobiles last to prevent misclassification due to respondents thinking of vehicles
in other categories as automobiles:
I'm going to read you a list of different types of vehicles. As I read each one,
please tell me how many vehicles of that type you own. How many trucks do
you own? Motor scooters? Motorcycles? Automobiles?
The third example allows respondents to answer using their own concept and
then asks about vehicles they might have omitted:
How many vehicles do you own? IF ONE: Is that an automobile, truck, motor scooter, or motorcycle? IF MORE THAN ONE: How many of them are
trucks? Motor scooters? Motorcycles? Automobiles? In addition to the vehicle(s) you just told me about, do you own any (LIST TYPES OF VEHICLES
NOT MENTIONED)?
Providing a definition is probably most appropriate for events that can be defined
simply (or for questions that ask respondents to classify themselves with respect
to some social category with which they are familiar and that has a well-known
name). When a definition is provided, it should precede the actual question. If the
definition follows the question, interviewers will frequently be interrupted before
the definition is read, which will lead both to an increase in interviewer variance (as
interviewers handle the interruption differently) and to not all respondents hearing
the definition (Cannell et al. 1989, Collins 1980).
Investigators sometimes include examples as part of the definition to clarify the
concept. Respondents will focus on those examples when they search their memories. For a complex and heterogeneous class of events (for example, arts events the
respondent has participated in or attended), a checklist that asks separately about
each member of the class is often used. Checklists appear to reduce omissions,
probably because recognition tasks are easier than free recall tasks and because
the list structure requires that respondents take more time to process each item.
On the other hand, checklists may increase reporting of events that took place before the reference period, and they may lead to overestimates for small categories
if the event class is decomposed inappropriately (Belli et al. 2000, Menon 1997).
Thus, a checklist is apt to yield higher overall levels of reporting for a class of
events than a single question about the class that includes examples.
The definition of a complex event can often be unpackaged into a series of
simpler items, each of which asks about a component of the definition. Consider
the following item:
During the past 12 months since July 1st 1987, how many times have you seen
or talked with a doctor or a medical assistant about your health? Do not count
any times you might have seen a doctor while you were a patient in a hospital,
but count all the other times you actually saw or talked to a medical doctor of
any kind about your health.
71
During the past 12 months since July 1st 1987, were there any times when you
didn't actually see the doctor but saw a nurse or other medical assistant working
for the doctor?
During the past 12 months since July 1st 1987, did you get any medical advice,
prescriptions, or results of tests over the telephone from a medical doctor,
nurse, or medical assistant working for a doctor? (Cannell et al. 1989, appendix
A, p. 1).
Reference Periods
The choice of reference period is usually determined by the periodicity of the target
events, how memorable or patterned the events are likely to be, and the analytic
goals of the survey. Thus, investigators may ask about religious practices over the
previous year to obtain information about respondents who attend services only on
their religion's (annual) holy days. By contrast, questions about purchases of candy
bars usually use a much shorter reference period. Although more recent events are
generally remembered better than more distant events, the influence of the length of
the reference period is probably smaller for frequent and highly patterned events,
presumably because respondents use information about patterning to construct
their answers (Schaeffer 1994, Dykema & Schaeffer 2000).
Researchers must decide how (and how often during a series of questions)
to specify the reference period they have selected. Schaeffer & Guzman (1999)
found only weak evidence in support of their prediction that using more specific
boundaries (e.g., specifying the start and end of the reference period) would reduce
telescoping and lead to lower levels of reporting. The reference period may also
influence how a question is interpreted. For example, a question about how often
the respondent has been angry in the past year is interpreted as referring to more
intense episodes than a question that refers to the past week, but the informational
value of the reference period is attenuated when it is repeated for many questions
(Winkielman et al. 1998, Igou et al. 2002). Experiments with anchoring techniques, in which respondents are shown a calendar that has the reference period
marked on it and asked to think of events that occurred within that time frame, have
sometimes resulted in higher levels of reports of threatening behaviors as well as
improved internal consistency of the reports (Turner et al. 1992, Czaja et al. 1994).
We suspect that the reference period should usually be given at the beginning
of a question (so that respondents do not construct their own before hearing the
investigator's) and that it should be fully specified at the beginning of a line of questioning, for example:
In 2001, how many payments for child support did you receive?
The repetition communicates that the reference period has stayed the same; using the abbreviated form and the parallel structure conserves cognitive processing.
Overreporting errors due to forward telescoping can be reduced using a panel
survey with bounded interviews (Neter & Waksberg 1964). In an initial bounding
interview, respondents are asked to report events; the list of events reported at time
1 is then consulted during the time 2 interview. If an item is reported at time 2, the
interviewer verifies that it is a new item and not a duplicate of the item reported at
time 1.
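The duplicate check at the heart of bounding can be sketched in a few lines. This is our illustration, not code from any survey system; the Event fields stand in for whatever identifying details the interviewer would actually verify:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    """A reported event, reduced to the fields used for duplicate checks."""
    kind: str   # e.g., "doctor visit" (hypothetical field)
    month: str  # e.g., "2001-03" (hypothetical field)

def bounded_reports(time1_events: list[Event], time2_events: list[Event]) -> list[Event]:
    """Keep only time-2 reports that do not duplicate the time-1 bounding list.

    Mirrors the logic of a bounded interview: the time-1 list is consulted,
    and an event is counted at time 2 only if it is new.
    """
    bounding_list = set(time1_events)
    return [e for e in time2_events if e not in bounding_list]

# Example: one of the two time-2 reports duplicates a bounded (time-1) report
t1 = [Event("doctor visit", "2001-03")]
t2 = [Event("doctor visit", "2001-03"), Event("doctor visit", "2001-05")]
print(bounded_reports(t1, t2))  # only the new 2001-05 visit is counted
```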
It may also be possible to obtain the effects of bounded recall with a single
interview by asking about two reference periods, one of which includes the other.
Reductions in reporting that are consistent with those observed for true bounded
recall have been observed by asking first about the past 6 months and then about
the past 2 months (Loftus et al. 1992, Sudman et al. 1984).
When collecting information about related events over long periods of time,
event history calendars are useful (Freedman et al. 1988, Means & Loftus 1991).
In a new implementation of the methodology that draws on recent advances in
theories of memory, the calendar appeared to improve reporting about several
variables, such as moves, income, and weeks unemployed, although it increased
overreporting of other variables (Belli et al. 2001).
during the interview. As far as we know, however, these methods have yet to be
evaluated or implemented on a large scale.
Schober & Conrad (1997) and Conrad & Schober (2000) have experimented with
a less-radical approach. In experiments with short questionnaires administered
by telephone, they found that allowing interviewers to depart from the question wording in an attempt to ensure that respondents correctly understood the questions
improved the reporting of consumer purchases (although at the cost of lengthier
interviews). The feasibility of this approach for other kinds of surveys is uncertain. For surveys covering a wide selection of topics, the range of concepts the
interviewer must understand will make it more challenging to provide accurate
clarifications. In longer interviews, it may be harder to sustain respondent motivation to request clarification when it is needed. For in-person surveys, decentralized
administration may make it impractical to monitor interviewer clarifications, and
for large-scale surveys, the size of the interviewing staff will make it more difficult
to ensure uniform interviewer understanding of question intent.
measures of number of sex partners. Likewise, when the response categories suggested that being really annoyed was rare, respondents generated more extreme
examples of being really annoyed than they did when the response categories suggested that the event might be common, which implies that their interpretation of
the question had been affected by the response categories (Schwarz et al. 1988).
Although Burton & Blair (1991) did not find any difference in the accuracy of open
and closed questions about writing checks and using automatic teller machines,
and open questions do not always obtain higher frequencies for threatening behaviors (e.g., Tourangeau & Smith 1996), the potential hazards of closed questions
mean that open questions are usually preferable (Schaeffer & Charng 1991).
Questions about relative frequencies use vague quantifiers, such as "very often,"
"pretty often," "not too often," "seldom," and "never" (Bradburn & Miles 1979; for a recommended set, see Pohl 1981). Relative frequencies are not simple translations of
absolute frequencies; they incorporate evaluative information. As a result, conclusions about group differences may vary depending on whether one examines absolute or relative frequencies (Schaeffer 1991). This is nicely illustrated in Woody
Allen's Annie Hall. Both Annie and Alvy Singer report that they have sex three
times a week, but she characterizes this as "constantly," whereas his description
is "hardly ever." In addition to conveying information about preferences or expectations, relative frequencies may express how respondents compare themselves
with similar others. Relative frequencies are probably most appropriate when the
investigator wants to give weight to the evaluative component in the respondent's
perception of the frequency, when group comparisons are not a central analytic
goal, or when the frequencies are too difficult to report in an absolute metric.
Even absolute frequencies (which, like all self-reports, contain errors) may
include evaluative information similar to that in relative frequencies. Which frequency, absolute or relative, is cognitively prior probably differs for different events
and for different patterns of events. In some cases, a respondent who is offered
response categories that express relative frequency may retrieve an absolute frequency from memory that must then be translated into the relative metric, whereas
in other cases, a respondent who is offered an absolute reporting format may retrieve a relative frequency and then translate it to the absolute metric (Conrad et al.
1998).
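Such a translation can be sketched as a simple mapping. The cutoffs and the expected value below are invented for illustration, since in practice each respondent supplies his or her own standard of comparison:

```python
def relative_label(times_per_week: float, expected: float = 2.0) -> str:
    """Translate an absolute frequency into a vague quantifier.

    The cutoffs are illustrative assumptions: respondents are modeled as
    judging frequency relative to what they consider typical (`expected`),
    which is why the same absolute value can yield different labels.
    """
    ratio = times_per_week / expected
    if ratio >= 1.5:
        return "very often"
    if ratio >= 1.0:
        return "pretty often"
    if ratio > 0.25:
        return "not too often"
    if ratio > 0:
        return "seldom"
    return "never"

# Annie Hall: the same three times a week, judged against different expectations
print(relative_label(3, expected=1.0))   # "very often" (her "constantly")
print(relative_label(3, expected=15.0))  # "seldom"     (his "hardly ever")
```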
Issues of Relevance
Questions are usually written in a way that presupposes they are relevant to the
respondent. Consider the following question: "In the week beginning last Sunday
and ending yesterday, how many hours did you work outside in your garden?" Because not all respondents have a garden, a category labeled "IF VOLUNTEERED:
Respondent does not have a garden" must be provided for interviewers. With the
exception of behaviors known to be underreported (and those engaged in by almost
everyone), it is better to avoid this kind of question. Respondents may be annoyed
at being asked a question that does not apply to them, and interviewer variability may be higher for items with "if volunteered" categories (Collins 1980). Even
Threatening Behaviors
In the past decade, there has been considerable experimentation with methods for improving the accuracy of reports of socially undesirable behaviors. These studies
focused on drug use, abortions, sexual behavior, and (non)voting, and some of
them used checks of external records to evaluate the experimental methods (see
reviews in Schaeffer 2000 and Tourangeau et al. 2000). The results have been
mixed. For example, wording changes that tried to make respondents more comfortable admitting that they did not vote did not reduce reports of voting (Presser
1990, Abelson et al. 1992), and similar attempts to increase the reporting of abortions have also failed (e.g., Jobe et al. 1997). Belli et al. (1999), however, were able
to reduce voting claims with a question that asked the respondent to remember
details about the vote, presented multiple response categories for "didn't vote"
(e.g., "I thought about voting this time but didn't"), instead of the usual single
one, and phrased the "did vote" category in definite terms ("I am sure I voted
in the November 5 election"). Further work is needed to identify which of these
changes is key. The most consistent finding in this literature is that more private
(e.g., self-administered) modes of administration produce both higher reports of
socially undesirable behaviors (Tourangeau & Smith 1996, Turner et al. 1998) and
lower reports of socially desirable ones (Presser & Stinson 1998).
only slightly greater than the reliability of the valence component and modestly
greater than the intensity component. The reliability of the composite was substantially greater than the reliability of the other 7-point scales in that analysis,
but, as in Krosnick & Berent, the items compared differed in extent of labeling
(as well as in content). Although the evidence that fully labeled unfolded items
increase reliability is not clear-cut, such items have the advantage of providing a
large number of categories without a showcard, which means they can be implemented in the same way in face-to-face and telephone surveys, making them very
useful for mixed-mode designs and comparisons across modes.
Researchers have long known that when a middle category is offered it will be
chosen by more respondents than will volunteer that answer when it is not offered
(Schuman & Presser 1981). O'Muircheartaigh et al. (1999) concluded that offering
a middle alternative in rating scales reduces the amount of random measurement
error and does not affect validity. For some constructs, the label used for the middle
category may affect how often it is chosen; e.g., when they rated capital punishment,
more subjects chose the middle category when it was labeled "ambivalent" than
when it was labeled "neutral" (Klopfer & Madden 1980). This appears to be true
whether the scale uses unipolar or bipolar numeric labels (Schaeffer & Barker
1995).
Number of Categories
The choice of the number of categories represents a compromise between the
increasing discrimination potentially available with more categories and the limited capacity of respondents to make finer distinctions reliably and in similar
ways. Based largely on psychophysical studies, the standard advice has been to
use five to nine categories (Miller 1956, Cox 1980), although even that number
of categories can be difficult to administer in telephone interviews. Both Alwin
& Krosnick (1991) and Alwin (1992) found evidence that the reliability of individual rating scales appeared to increase as the number of categories grew, up
to approximately seven or nine categories, with the exception that reliability was
greater with two than three categories. Their results must be interpreted cautiously,
however, because the questions that were compared differed not only in the number of categories, but also in a large variety of other ways. In a comparison that
controlled item content, 11-point feeling thermometers showed higher reliabilities
than 7-point scales (Alwin 1997), but the results may have been due to order of
presentation, as respondents always answered the feeling thermometers after the
rating scales.
A few response methods, such as magnitude scaling and feeling thermometers,
offer a very large number of numerical options, but respondents usually choose
answers that are multiples of 5, 10, or 25 (at the low end of the continuum) and
50, 100, or 1000 (at the higher end), so that the number of categories used is less
than one might expect (Tourangeau et al. 2000). Because most of the categories
are unlabeled, respondents interpretations of them may vary, although assigning
labels to a subset of the categories (as is often done with feeling thermometers)
probably causes further clustering of answers (Groves & Kahn 1979; Alwin &
Krosnick 1991, p. 175, footnote 11). In addition, some respondents find these
response tasks difficult, so the proportion who refuse to answer or say they do not
know is substantially higher than with other rating scales (Schaeffer & Bradburn
1989, Dominitz & Manski 1997).
and then drawing on relevant predispositions in deciding how to answer (see also
Strack et al. 1991, Tourangeau & Rasinski 1988).
This same process applies to questions about ordinary issues; even on familiar
matters, individuals often cannot retrieve an answer to attitude questions and instead have to construct an answer from accessible predispositions (Sudman et al.
1996, Tourangeau et al. 2000). To the extent that respondents satisfice, this suggests
that filters may reduce opinion giving by discouraging people from undertaking the
cognitive effort needed to formulate an answer based on their preexisting attitudes.
How then does filtering reduce opinion giving: by eliciting fewer true attitudes
or fewer nonattitudes? If filtering reduces nonattitudes, not true opinions, then
indicators of data quality (e.g., temporal stability) should be higher with filtered
items than with standard versions of the same questions. In three experiments,
Krosnick et al. (2002) found no support for this hypothesis. There was essentially
no difference in data quality between filtered and standard versions in any of their
experiments. McClendon & Alwin (1993) also found no evidence that filtered
questions improve reliability. If further research confirms this, then, as a general
rule, it may be best to avoid filters and instead supplement direction-of-opinion
measures with follow-up items on other attitudinal dimensions, such as salience
and intensity.
these items and in favor of forced-choice questions (Converse & Presser 1986). For
example, "Do you agree or disagree that most men are better suited emotionally
for politics than are most women?" could be replaced by "Would you say that most
men are better suited emotionally for politics than are most women, that men and
women are equally suited, or that women are better suited than men in this area?"
Similarly, the agree-disagree statement "Management lets employees know how
their work contributes to the agency's goals" could be replaced by "Some people
think management lets employees know how their work contributes to the agency's
goals. Other people think management does not let employees know how their work
contributes to the agency's goals. Which comes closer to how you feel?"
Forced-choice items may have advantages not only over agree-disagree items
but over true-false items and yes-no items as well. Some evidence suggests that
the reliability and validity of forced-choice questions is generally higher than that
for either true-false or yes-no items, possibly because the latter approaches invite
acquiescence by stating only one side of an issue (Krosnick & Fabrigar 2003).
Forgoing agree-disagree items may be problematic when investigators aim to
replicate, extend, or otherwise make comparisons to previous studies that used such
questions. In these instances, a useful approach involves a split sample: administering the original agree-disagree item to a random subset of cases and a forced-choice
version to the remainder. This allows for comparisons holding wording constant
and also provides a gauge of acquiescence's impact on the results.
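Such a split-sample assignment can be sketched in a few lines. This is our illustration, not code from any survey package; the case IDs and version labels are hypothetical:

```python
import random

def assign_versions(case_ids: list[str], seed: int = 20030525) -> dict[str, str]:
    """Randomly split cases between the original and the forced-choice version.

    Shuffling and splitting in half yields a balanced assignment; a fixed
    seed keeps the assignment reproducible across interviewing runs.
    """
    rng = random.Random(seed)
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    assignment = {cid: "agree-disagree" for cid in shuffled[:half]}
    assignment.update({cid: "forced-choice" for cid in shuffled[half:]})
    return assignment

# Hypothetical case IDs; in practice these would come from the sample file
cases = [f"case{i:03d}" for i in range(1, 9)]
for cid, version in sorted(assign_versions(cases).items()):
    print(cid, "->", version)
```

After fieldwork, the two versions can be compared directly, since random assignment holds respondent characteristics constant in expectation.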
CONCLUSION
For many years, there was little basis for quarreling with the title of Stanley Payne's
1951 classic. Asking questions was an art. Now, however, a body of work has
accumulated that lays a foundation for a science of asking questions. Researchers
can make decisions about some aspects of question wording informed by the results
LITERATURE CITED
Abelson RP, Loftus EF, Greenwald AG. 1992. Attempts to improve the accuracy of self-reports of voting. See Tanur 1992, pp. 138–53
Alwin DF. 1992. Information transmission in the survey interview: number of response categories and the reliability of attitude measurement. In Sociological Methodology, ed. PV Marsden, 22:83–118. Washington, DC: Am. Sociol. Assoc.
Alwin DF. 1997. Feeling thermometers versus 7-point scales: Which are better? Sociol. Methods Res. 25:318–40
Alwin DF, Krosnick JA. 1991. The reliability of survey attitude measurement: the influence of question and respondent attributes. Sociol. Methods Res. 20:139–81
Ayidiya SA, McClendon MJ. 1990. Response effects in mail surveys. Public Opin. Q. 54:229–47
Belli RF, Schwarz N, Singer E, Talarico J. 2000. Decomposition can harm the accuracy of behavioral frequency reports. Appl. Cogn. Psychol. 14:295–308
Belli RF, Shay WL, Stafford FP. 2001. Event history calendars and question list surveys: a direct comparison of interviewing methods. Public Opin. Q. 65:45–74
Belli RF, Traugott MW, Young M, McGonagle KA. 1999. Reducing vote overreporting in surveys: social desirability, memory failure, and source monitoring. Public Opin. Q. 63:90–108
Schwarz N, Strack F, Mai HP. 1991. Assimilation and contrast effects in part-whole question sequences: a conversational logic analysis. Public Opin. Q. 55:3–23
Schwarz N, Strack F, Muller G, Chassein B. 1988. The range of response alternatives may determine the meaning of the question: further evidence on informative functions of response alternatives. Soc. Cogn. 6:107–17
Schwarz N, Sudman S, eds. 1996. Answering Questions: Methodology for Determining Cognitive and Communicative Processes in Survey Research. San Francisco: Jossey-Bass
Sirken MG, Herrmann DJ, Schechter S, Schwarz N, Tanur JM, Tourangeau R, eds. 1999. Cognition and Survey Research. New York: Wiley
Strack F, Schwarz N, Wanke M. 1991. Semantic and pragmatic aspects of context effects in social and psychological research. Soc. Cogn. 9:111–25
Suchman L, Jordan B. 1990. Interactional troubles in face-to-face survey interviews. J. Am. Stat. Assoc. 85:232–53
Sudman S, Bradburn NM, Schwarz N. 1996. Thinking About Answers: The Application of Cognitive Processes to Survey Methodology. San Francisco: Jossey-Bass
Sudman S, Finn A, Lannom L. 1984. The use of bounded recall procedures in single interviews. Public Opin. Q. 48:520–24
Tanur JM, ed. 1992. Questions About Questions: Inquiries into the Cognitive Bases of Surveys. New York: Russell Sage Found.
Tourangeau R. 1984. Cognitive sciences and survey methods. In Cognitive Aspects of Survey Methodology: Building a Bridge Between Disciplines, ed. T Jabine, M Straf, J Tanur, R Tourangeau, pp. 73–100. Washington, DC: Natl. Acad. Press
Tourangeau R, Rasinski K. 1988. Cognitive processes underlying context effects in attitude measurement. Psychol. Bull. 103:299–314
Tourangeau R, Rasinski K, Bradburn N. 1991. Measuring happiness in surveys: a test of the subtraction hypothesis. Public Opin. Q. 55:255–66