JOSEPH W. SAKSHAUG is a Ph.D. candidate in the Program in Survey Methodology at the Institute for Social Research at the University of Michigan, Ann Arbor, MI, USA. TING YAN is a Senior Survey Methodologist with NORC at the University of Chicago, Chicago, IL, USA. ROGER TOURANGEAU is a Research Professor at the Institute for Social Research at the University of Michigan, Ann Arbor, MI, USA, and the Director of the Joint Program in Survey Methodology at the University of Maryland, College Park, MD, USA. We thank Paul Biemer and three anonymous reviewers for critical comments and helpful suggestions. *Address correspondence to Joseph W. Sakshaug, Institute for Social Research, University of Michigan, 426 Thompson Street, Room 4050, Ann Arbor, MI 48104, USA; e-mail: joesaks@umich.edu.
doi: 10.1093/poq/nfq057
© The Author 2011. Published by Oxford University Press on behalf of the American Association for Public Opinion Research.
All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
between giving inconsistent answers across two waves of a Web survey and the
probability of responding to the second wave was nonmonotonic. Olson (2006)
investigated the relationship between nonresponse and response accuracy and
found no simple relationship between the two (for similar findings, see
Willimack, Schuman, Pennell, and Lepkowski 1995). In summary, there is
Methods
The data analyzed here are from a study carried out by the Joint Program in
Survey Methodology (JPSM) at the University of Maryland as part of one
of its graduate classes. In 2005, students in the Practicum class designed a survey of University of Maryland alumni, with data collection for the main survey conducted by Schulman, Ronca, and Bucuvalas, Inc. (The students conducted pretest interviews.) Members of the sample were contacted initially by telephone and asked a brief set of screening questions about their personal and household characteristics.
The methods used in the study are described in more detail by Kreuter,
Presser, and Tourangeau (2008). Here, we summarize the relevant features
of the sample design, data collection, and questionnaires.
Sampling and data collection: The sample was a random sample (proportionately stratified by graduation year) consisting of 20,000 graduates drawn from
a population of 55,320 alumni who, according to university records, received
undergraduate degrees from the University of Maryland from 1989 to 2002.
After sample cases were matched with Alumni Association records (to obtain
telephone numbers) and various ineligible cases were dropped (e.g., those used
in pretesting and those living abroad), the survey fielded 7,535 telephone
numbers.1 More than a third of these telephone numbers turned out to be invalid
(e.g., the number was disconnected), and the status of about another quarter
could not be determined. A total of 1,501 alumni completed the screener
and were randomly assigned to a mode of data collection. There were 37 sample members who reported they did not have Internet access, and these were randomly assigned to either CATI or IVR data collection. The response rate (AAPOR Response Rate 1; AAPOR 2009) for the screener was 31.9 percent. Most of the nonresponse was due to difficulties in contacting the alumni rather than their unwillingness to cooperate. The refusal rate was about ten percent of the fielded phone numbers (excluding ineligibles).

1. This number differs slightly from the corresponding figure in Kreuter, Presser, and Tourangeau (2008), because we excluded certain cases (sample members found to be deceased) that Kreuter et al. included.
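The screener response rate cited above follows the AAPOR Response Rate 1 (RR1) definition. The sketch below shows that calculation; because the article does not report the full set of case dispositions, the individual counts are hypothetical placeholders chosen only so that the resulting rate matches the reported 31.9 percent.

```python
# Minimal sketch of the AAPOR Response Rate 1 (RR1) calculation for the
# screener. The disposition counts below are hypothetical placeholders; the
# text gives only the fielded numbers (7,535), the completed screeners
# (1,501), and rough shares of invalid and unknown-status numbers.

def aapor_rr1(completes, partials, refusals, noncontacts, other,
              unknown_eligibility):
    """RR1 = I / (I + P + R + NC + O + U), where U is unknown eligibility."""
    return completes / (completes + partials + refusals + noncontacts
                        + other + unknown_eligibility)

rr1 = aapor_rr1(completes=1501, partials=0, refusals=750,
                noncontacts=1200, other=50, unknown_eligibility=1200)
print(f"Screener RR1: {rr1:.1%}")  # 31.9% with these placeholder counts
```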
The year-of-birth question was part of the telephone screener. The rest of the
items were included in the main questionnaire. These questions came early in
the questionnaire, but our numbering of the items above does not correspond to
the item numbers in the questionnaire.
We constructed three estimates based on the GPA item: the proportion of cases
with a GPA less than 2.5, the proportion with a GPA higher than 3.5, and the
mean GPA. We thought GPAs lower than 2.5 would be seen as socially undesirable (a GPA of 2.0 or less triggers academic warning at the University of Maryland) and that GPAs higher than 3.5 would be seen as socially desirable (a GPA that high or higher in a given term qualifies the student for the Dean's List). We
calculated proportions based on items 2 through 7 above and the mean years
since birth and since graduation based on the last two items.
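As an illustration of how these estimates are formed from the records data, here is a minimal sketch; the data frame, its column names, and the toy values are our own assumptions rather than the study's actual file layout.

```python
import pandas as pd

# Illustrative sketch of the estimates described above, computed from a toy
# records file. Column names and values are assumptions, not the study's own.
records = pd.DataFrame({
    "gpa":                [3.10, 2.30, 3.70, 3.40, 2.80],
    "any_d_or_f":         [0, 1, 0, 0, 1],   # 0/1 indicators, as for items 2-7
    "dropped_class":      [1, 1, 0, 0, 0],
    "years_since_birth":  [30.2, 41.5, 27.8, 35.0, 33.1],
    "years_since_degree": [7.1, 15.3, 4.8, 10.2, 9.0],
})

estimates = {
    "pct_gpa_below_2_5": 100 * (records["gpa"] < 2.5).mean(),
    "pct_gpa_above_3_5": 100 * (records["gpa"] > 3.5).mean(),
    "mean_gpa":          records["gpa"].mean(),
    "pct_any_d_or_f":    100 * records["any_d_or_f"].mean(),
    "pct_dropped_class": 100 * records["dropped_class"].mean(),
    "mean_years_since_birth":  records["years_since_birth"].mean(),
    "mean_years_since_degree": records["years_since_degree"].mean(),
}
print(estimates)
```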
Results
We present the results in three parts. First, we assess the relative contributions of
nonresponse and measurement error to the overall error in the survey estimates
for the sensitive and non-sensitive items. Next, we examine the impact of
different types of nonresponse (noncontact, refusal to complete the screening
interview, and dropout and item nonresponse after the screener was completed)
and their relation to response accuracy. One aim of this part of the analysis is to
determine whether the expected reduction in measurement error due to self-
administration offsets any increase in nonresponse bias due to the cases that
drop out during the switch from one mode of data collection to another.
Our final set of analyses sheds further light on the tradeoffs between
nonresponse and measurement error by examining how the level of effort
needed to contact the sample members and get them to complete the screener
relates to the level of accuracy in their answers.
Relative contribution of nonresponse and measurement error: Table 1 shows the distribution of the true statuses (according to university records) for the full sample and for the various subgroups of the sample at each stage of the recruitment process.
Table 1. Continued

                          Frame data, before mode switch                Frame data, after mode switch           Survey data
                          Sample         Contacts       Screener Rs     Mode Switch Rs    Item Responders       Item Responders
                          (n = 7,535)    (n = 3,497)    (n = 1,501)     (n = 1,107)       (n's vary)            (n's vary)

Neutral characteristics
  GPA                     3.02 (0.01)    3.03 (0.01)    3.06 (0.01)     3.07 (0.01)       3.08 (969; 0.02)      3.18 (969; 0.02)
  Years since birth       33.44 (0.07)   33.98 (0.12)   34.63 (0.19)    34.57 (0.22)      34.52 (1,090; 0.22)   34.69 (1,090; 0.23)
    (screener item)
  Years since degree      9.27 (0.05)    9.51 (0.07)    9.82 (0.11)     9.86 (0.13)       9.86 (1,076; 0.13)    9.92 (1,076; 0.15)

NOTE.—Parenthetical entries in the first four columns of figures are standard errors; in the last two columns, the parenthetical entries are sample sizes followed by the standard errors.
Table 2. Nonresponse and Measurement Error Bias Estimates, by Survey Statistic (standard errors in parentheses)

                              Nonresponse bias                                                           Measurement
                              Noncontact     Refusal        Mode switch     Item             Total NR     bias
                                                            dropout         nonresponse

Undesirable Characteristics
  GPA < 2.5                   0.4 (0.4)      1.2 (0.7)      1.5 (0.5)       0.1 (0.4)        3.2 (1.0)    7.9 (1.1)
  At least one D/F            0.2 (0.8)      1.4 (1.0)      0.1 (0.9)       0.4 (0.2)        1.9 (1.9)    15.2 (1.2)†
  Dropped a class             1.2 (0.5)      0.4 (0.9)      0.6 (0.6)       0.6 (0.2)        2.8 (1.5)    20.2 (1.3)†

Desirable Characteristics
  GPA > 3.5                   0.9 (0.6)      1.4 (0.7)      0.5 (0.7)       0.6 (0.5)        3.4 (1.5)    1.1 (0.9)

Neutral Characteristics
  GPA                         0.01 (0.01)    0.03 (0.01)    0.01 (0.01)     0.01 (0.01)      0.06 (0.02)  0.10 (0.01)
  Years since birth           0.54 (0.09)    0.65 (0.11)    0.06 (0.10)     0.05 (0.03)      1.08 (0.17)† 0.17 (0.06)
    (screener item)
  Years since degree          0.24 (0.05)    0.31 (0.07)    0.04 (0.07)     0.00 (0.02)      0.59 (0.10)† 0.06 (0.07)

NOTE.—Noncontact bias is computed as the difference between the contacted-sample and full-sample estimates in table 1; refusal bias is the difference between the screener-respondent and contacted-sample estimates; and so on.
† indicates that the difference between the nonresponse and measurement error biases is statistically significant, p < 0.05.
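To make the decomposition described in the note to table 2 concrete, the sketch below applies it to the GPA row of table 1; the stage labels and variable names are ours, but the sequential differences reproduce the GPA entries in table 2.

```python
# Sketch of the decomposition in the note to table 2, using the frame-data
# means for GPA from table 1. Each nonresponse component is the change in the
# records-based mean from one recruitment stage to the next; measurement bias
# is the gap between respondents' reports and their records.
stage_means = {                     # frame-data means for GPA, table 1
    "full_sample": 3.02,
    "contacts": 3.03,
    "screener_respondents": 3.06,
    "mode_switch_respondents": 3.07,
    "item_respondents": 3.08,
}
survey_report_mean = 3.18           # item respondents' reported GPA, table 1

noncontact = stage_means["contacts"] - stage_means["full_sample"]
refusal = stage_means["screener_respondents"] - stage_means["contacts"]
dropout = (stage_means["mode_switch_respondents"]
           - stage_means["screener_respondents"])
item_nr = (stage_means["item_respondents"]
           - stage_means["mode_switch_respondents"])
total_nr = stage_means["item_respondents"] - stage_means["full_sample"]
measurement = survey_report_mean - stage_means["item_respondents"]

print([round(b, 2) for b in (noncontact, refusal, dropout, item_nr)])
# [0.01, 0.03, 0.01, 0.01] -- the GPA row of table 2
print(f"{total_nr:.2f} {measurement:.2f}")   # 0.06 0.10
```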
$$\mathrm{Relbias} = \frac{\bar{y}_r - \bar{y}_n}{\bar{y}_n},$$

in which $\bar{y}_r$ is the estimate based on the relevant group of respondents and $\bar{y}_n$ is
the estimate for the full sample. The results comparing the relative biases from
each error source confirm the main findings already apparent from table 2. For
example, measurement error dominates the overall error in the estimates based on the socially undesirable characteristics, whereas nonresponse introduces more overall error for the estimates based on the desirable and neutral characteristics. Similarly, the analysis of the relative biases confirms that the overall nonresponse bias appears to be driven mostly by screener nonresponse rather than
2. The years-since-birth item was asked in the CATI mode only as part of the screener and is
removed from the remaining analysis of the impact of the switch in modes.
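A minimal sketch of the relative bias calculation defined above, using the mean GPA values from table 1 (the function name is ours):

```python
def relbias(y_r, y_n):
    """Relative bias of a respondent-group estimate y_r against the
    full-sample estimate y_n: (y_r - y_n) / y_n."""
    return (y_r - y_n) / y_n

# Example using the mean GPA from table 1: 3.02 for the full sample and
# 3.08 for the item respondents (frame data).
print(f"{relbias(3.08, 3.02):.3f}")   # 0.020, i.e., about a 2 percent relative bias
```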
Table 3. Bias Estimates for Mode Switch Nonresponse, Measurement Error, and Total Bias after Switch, by Survey Statistic and Mode of Data Collection (standard errors in parentheses)

                          Mode switch nonresponse bias           Measurement bias                          Total bias after mode switch
                          CATI         IVR          Web          CATI          IVR           Web           CATI         IVR           Web

Undesirable Characteristics
  GPA < 2.5               0.2 (0.3)    1.8 (0.8)†   1.4 (1.3)    8.5 (1.9)†    6.7 (1.3)†    8.4 (1.4)†    8.7 (1.9)    8.5 (1.6)     9.8 (1.3)*
  At least one D/F        0.4 (0.5)    0.7 (1.4)    0.2 (1.7)    19.0 (2.0)†   15.3 (2.5)†   12.0 (1.9)†   19.4 (2.0)   14.6 (2.5)    11.8 (2.1)
  Dropped a class         0.2 (0.4)    1.3 (0.8)    0.2 (1.6)    21.1 (3.4)†   19.4 (1.9)†   20.5 (2.4)†   21.3 (3.5)   20.7 (2.0)    20.3 (2.9)

Desirable Characteristics
  GPA > 3.5               0.6          0.1          0.5          1.1           1.3           3.3           1.7          1.2           3.8

Neutral Characteristics
  GPA                     0.01 (0.04)  0.01 (0.01)  0.02 (0.02)  0.10 (0.03)†  0.08 (0.02)†  0.10 (0.01)†  0.11 (0.03)  0.09 (0.02)   0.12 (0.02)*
  Years since degree      0.04 (0.03)  0.07 (0.12)  0.12 (0.10)  0.00 (0.10)   0.23 (0.15)   0.08 (0.11)   0.04 (0.10)  0.30 (0.19)*  0.04 (0.15)

† indicates that the nonresponse or measurement error bias estimate is significantly different from zero, p < 0.05.
* indicates that the overall error introduced after the switch in data collection mode was greater in Web or IVR than in CATI.
Table 4. Bias Estimates, by Survey Statistic and Level of Effort (standard errors in parentheses)

                          Noncontact bias                         Nonresponse bias                        Measurement bias                          Total bias
                          1–2 calls    3–5 calls    6+ calls      1–2 calls    3–5 calls    6+ calls      1–2 calls    3–5 calls    6+ calls       1–2 calls     3–5 calls     6+ calls

Undesirable Characteristics
  GPA < 2.5               3.2 (0.05)   1.6 (0.03)   0.4 (0.03)    0.9 (0.07)   2.1 (0.06)   2.8 (0.04)    7.6 (0.06)   7.2 (0.05)   7.8 (0.04)     11.7 (0.03)   10.9 (0.03)   11.0 (0.02)
  At least one D/F        3.7 (0.07)   2.0 (0.05)   0.2 (0.04)    1.3 (0.11)   0.3 (0.08)   1.7 (0.07)    15.2 (0.06)  15.7 (0.04)  15.2 (0.04)    17.6 (0.09)   17.4 (0.06)   17.1 (0.06)
  Dropped a class         5.3 (0.05)   3.2 (0.04)   1.2 (0.03)    2.4 (0.09)   1.7 (0.07)   1.5 (0.06)    18.5 (0.06)  19.2 (0.05)  20.2 (0.04)    26.2 (0.10)   24.1 (0.08)   22.9 (0.07)

Desirable Characteristics
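As a rough sketch of how the level-of-effort groups behind table 4 can be formed, the code below treats the call categories as cumulative effort thresholds, a reading consistent with the 6+ column reproducing the overall noncontact biases in table 2; the data frame, column names, and toy values are our own assumptions.

```python
import pandas as pd

# Sketch of the level-of-effort comparison summarized in table 4, read as
# cumulative thresholds: cases reached within 2 calls, within 5 calls, and
# with any number of calls. Toy data; column names are assumptions.
frame = pd.DataFrame({
    "gpa_below_2_5":    [1, 0, 0, 1, 0, 0, 1, 0],        # records-based indicator
    "calls_to_contact": [1, 2, 4, 7, 3, None, None, 2],   # None = never contacted
})

full_sample_pct = 100 * frame["gpa_below_2_5"].mean()

for label, max_calls in [("1-2 calls", 2), ("3-5 calls", 5), ("6+ calls", float("inf"))]:
    contacted = frame[frame["calls_to_contact"] <= max_calls]   # NaN rows drop out
    noncontact_bias = 100 * contacted["gpa_below_2_5"].mean() - full_sample_pct
    print(label, round(noncontact_bias, 1))
```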
characteristics (as Kreuter, Presser, and Tourangeau 2008 also found in looking
at these data).
The final three columns in table 3 show estimates of the overall bias introduced after the assignment to the mode of data collection for the main questionnaire and reflect the combined effects of mode switch nonresponse

3. For brevity, we omit item nonresponse from our assessment of the overall error after the assignment of mode. Our conclusions remain the same whether or not we include this source of error.
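As the preceding paragraph notes, the final three columns of table 3 reflect the combined effects of mode switch nonresponse and measurement error. The sketch below assumes the "total bias after mode switch" entries are the simple sum of those two components, which matches the printed values; the neutral GPA row is used as the example.

```python
# Combining the two error components left after the mode assignment, using
# the GPA row of table 3. The additivity assumption is ours, but it matches
# the printed totals.
components = {                 # table 3, GPA row: (nonresponse bias, measurement bias)
    "CATI": (0.01, 0.10),
    "IVR":  (0.01, 0.08),
    "Web":  (0.02, 0.10),
}
for mode, (nr_bias, me_bias) in components.items():
    print(mode, f"{nr_bias + me_bias:.2f}")   # 0.11, 0.09, 0.12, as in table 3
```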
Overall, then, table 4 suggests that additional efforts most clearly affect noncontact bias (which is hardly surprising) but have smaller, less consistent effects on the overall level of nonresponse and measurement error. Any increase in the inaccuracy of the survey answers produced by additional recruitment efforts is small, so that overall there seems to be a net gain from additional callbacks. For
Discussion
Our results support five conclusions. First, all of the different forms of nonresponse had a consistent relationship to the survey estimates, so that the effects of one form of nonresponse reinforced rather than canceled the effects of other forms. Second, breaking nonresponse into its various components was still useful, since the relative importance of the different components varied from one estimate to the next. Third, as some prior investigations of the relative contributions of different nonsampling errors have found, measurement error tended to be the largest source of error, but in our study this was
true only for the estimates regarding the prevalence of socially undesirable
characteristics; the estimates involving socially desirable characteristics
tended to be dominated by nonresponse error. Fourth, the results show that
switching respondents to a self-administered mode (like IVR or the Web) can
reduce measurement error but may increase overall error because of dropouts
during or after the mode switch. And finally, additional callbacks appeared to
reduce one form of nonresponse error (the bias due to noncontacts) but had
a less consistent relation to other forms of nonresponse error or to measurement error.
Nonresponse error: Our first conclusion is that the various nonresponse biases
we distinguished in our analysis all tended to push the survey estimates in the
same direction. This is apparent from table 2, where, within any given row,
the bias estimates in the first four columns tend to be all negative or all positive.
The alumni who had greater difficulties during their undergraduate years were
harder to contact, more difficult to screen, more likely to drop out during
the switch to the main interview, and less inclined to answer the questions
in the main questionnaire than those who had more successful undergraduate
careers. Although there are a few reversals of sign in table 2, they tend to be
quite small (see, for example, the row with the estimated biases for the mean
years since birth). Some earlier studies have found offsetting effects of
noncontact and refusal (e.g., Kalsbeek, Yang, and Agans 2002), raising the
possibility that these two forms of nonresponse error might sometimes cancel
each other out, but in our study the various forms of nonresponse almost always
reinforce each other. And, in general, the biasing effects of measurement error
also worsen (rather than offset) the biasing effects of nonresponse.
Despite the fact that the different types of nonresponse tended to push the
estimates in the same direction, their relative importance varied from one
estimate to the next. For example, for one of the estimates, dropouts after
the mode switch introduced the largest bias, but for most of the other estimates,
screener refusal seemed to introduce the most nonresponse bias. In general,
Measurement error: Our third conclusion is that measurement error can produce very large biases, especially for sensitive questions about socially undesirable characteristics, like flunking or withdrawing from a class. The
measurement biases for the estimates about such undesirable characteristics
range from almost eight to more than 20 percentage points (see the top three
rows of the final column of table 2). For the most part, the measurement errors
are smaller for the estimates based on positive characteristics (like having a high
GPA) and smaller still for the estimates regarding neutral characteristics (years
since graduation). Tourangeau, Groves, and Redline (2010) reach similar
conclusions about the importance of measurement bias in their study of reports
about voting; in that study, the measurement biases were about twice as large as
the nonresponse biases. Beginning with Horvitz (1952), methodological
researchers have demonstrated that measurement error can be a large contributor to the overall error in survey estimates; that may be especially true when the
Conclusions
The 2005 JPSM Practicum survey affords an unusual opportunity to examine
the effects of nonresponse and measurement on a range of survey estimates,
because high-quality records data are available for both the respondents and
References

American Association for Public Opinion Research (AAPOR). 2009. Standard Definitions: Final Dispositions of Case Codes and Outcome Rates for Surveys. 6th ed. Lenexa, KS: AAPOR.

Biemer, Paul P. 2001. "Nonresponse Bias and Measurement Bias in a Comparison of Face-to-face and Telephone Interviewing." Journal of Official Statistics 17(2):295–320.

Bollinger, Christopher R., and Martin David. 2001. "Estimation with Response Error and Nonresponse: Food Stamp Participation in the SIPP." Journal of Business and Economic Statistics 19(2):129–42.

Brick, J. Michael, and Douglas Williams. 2009. "Reasons for Increasing Nonresponse in U.S. Household Surveys." Paper presented at the Workshop of the Committee on National Statistics, Washington, DC, December 14.

Triplett, Timothy, Johnny Blair, Teresa Hamilton, and Yun Chiao Kang. 1996. "Initial Cooperators vs. Converted Refusers: Are There Response Behavior Differences?" Proceedings of the Survey Research Methods Section of the American Statistical Association (pp. 1038–41). Alexandria, VA: American Statistical Association.

Willimack, Diane K., Howard Schuman, Beth-Ellen Pennell, and James M. Lepkowski. 1995. "Effects of a Prepaid Nonmonetary Incentive on Response Rates and Response Quality in