CARINA CORNESSE*
ANNELIES G. BLOM
CARINA CORNESSE is a postdoctoral researcher; she is with the Collaborative Research Center SFB 884 “Political Economy of Reforms” at the University of Mannheim, 68131 Mannheim, Germany.
ANNELIES G. BLOM is Professor for Data Science at the Department of Political Science, School of
Social Sciences, and Principal Investigator of the German Internet Panel (GIP) at the
Collaborative Research Center (SFB 884) “Political Economy of Reforms” at the University of
Mannheim, 68131 Mannheim, Germany. DAVID DUTWIN is Senior Vice President; he is with
NORC at the University of Chicago, 55 East Monroe Street, 30th Floor, Chicago, IL 60603,
USA. JON A. KROSNICK is Frederic O. Glover Professor in Humanities and Social Sciences,
Professor of Communication, and Professor of Political Science at Stanford University, 432
McClatchy Hall, 450 Serra Mall, Stanford, CA 94305, USA. EDITH D. DE LEEUW is Professor
Emerita of Survey Methodology at the Department of Methodology and Statistics at Utrecht
University, Sjoerd Groenmangebouw, Padualaan 14, 3584 CH Utrecht, Netherlands. STÉPHANE LEGLEYE is Head of the Team “Drug Use in the General Population” at CESP, Inserm, Faculté de Médecine UVSQ, Faculté de Médecine, Université Paris-Saclay, Université Paris-Sud, and Head of the “Household Living Conditions” Unit at the Institut National de la Statistique et des Études
Économiques, 88, avenue Verdier, CS 70058, 92541 Montrouge cedex, France. JOSH PASEK is Associate Professor, Department of Communication and Media, and Faculty Associate, Center for Political Studies, Institute for Social Research, at the University of Michigan, 105 S. State Street, 5413 North Quad, Ann Arbor, MI 48109, USA. DARREN PENNAY is Founder, Board Member, and Executive Director Research, Methods & Strategy at the Social Research Centre, Level 9, 277 William St, Melbourne, Victoria, Australia 3000, Campus Visitor at the Australian National University Centre for Social Research and Methods, and Adjunct Professor with the Institute for Social Science Research (ISSR), University of Queensland. BENJAMIN PHILLIPS is Senior Research Director, Survey Methodology, at the Social Research Centre, Level 9, 277 William St, Melbourne, Victoria, Australia 3000, and Campus Visitor at the Centre for Social Research & Methods, Australian National University. JOSEPH W. SAKSHAUG is Distinguished Researcher, Head of the Data Collection and Data Integration Unit, and Acting Head of the Statistical Methods Research Department at the Institute for Employment Research (IAB), Regensburger Str. 104, 90478 Nuremberg, Germany, University Professor of Statistics in the Department of Statistics at the Ludwig Maximilian University of Munich, 80539 Munich, Germany, and Honorary Full Professor in the School of Social Sciences at the University of Mannheim, 68131 Mannheim, Germany. BELLA STRUMINSKAYA is Assistant Professor with the Department of Methodology and Statistics at Utrecht University, Sjoerd Groenmangebouw, Padualaan 14, 3584 CH Utrecht, Netherlands. ALEXANDER WENZ is a postdoctoral researcher with the Collaborative Research Center SFB 884 “Political Economy of Reforms” at the University of Mannheim, 68131 Mannheim, Germany, and the Institute for Social and Economic Research at the University of Essex. The authors would like to thank the Collaborative Research Center (SFB) 884 “Political Economy of Reforms” (projects A8 and Z1), funded by the German Research Foundation (DFG), for organizing the Workshop on Probability-based and Nonprobability Survey Research in July 2018 at the University of Mannheim. In addition, the authors would like to thank Melvin John, Sabrina Seidl, and Nourhan Elsayed for their assistance with manuscript preparation. The authors are grateful to Andrew Mercer for valuable input and feedback on earlier versions of this manuscript. The order of authors is alphabetical after the lead author.
*Address correspondence to Carina Cornesse, Collaborative Research Center SFB 884 “Political Economy of Reforms,” University of Mannheim, 68131 Mannheim, Germany, E-mail: carina.cornesse@uni-mannheim.de

1. INTRODUCTION

In recent years, several cases of mispredicted election outcomes have made the news across the world. Prominent examples include the 2015 Israeli parliamentary election where the majority of polls predicted that Benjamin Netanyahu
would lose his premiership,1 the 2016 Brexit referendum where the majority of
polls predicted that Britain would vote to remain in the European Union,2 and
the 2016 US presidential election where the majority of polls predicted that
Hillary Clinton would defeat Donald Trump.3 When investigating potential
reasons for these and other polling failures, researchers have pointed toward
the fact that election polls based on probability samples usually reached more
1. https://edition.cnn.com/2015/03/18/middleeast/israel-election-polls/index.html, accessed on November 30, 2019.
2. https://www.theguardian.com/politics/2016/jun/24/how-eu-referendum-pollsters-wrong-opinion-predict-close, accessed on September 30, 2019.
3. https://www.forbes.com/sites/startswithabang/2016/11/09/the-science-of-error-how-polling-botched-the-2016-election/#babf86437959, accessed on September 30, 2019.
Review of Probability and Nonprobability Research 7
2. CONCEPTUALIZATION OF NONPROBABILITY
SAMPLING APPROACHES
The Total Survey Error (TSE; Groves and Lyberg 2010) framework that forms
the bedrock of quality assessments for probability samples does not cleanly
translate to the world of nonprobability sample data collection. Nonprobability
samples do not involve a series of well-controlled and well-understood departures

distance metric (e.g., Euclidean distance) to identify the closest match between
pairs of units based on the set of common auxiliary characteristics.
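The matching step just described can be sketched as follows. This is a minimal illustration, not any specific vendor's implementation: the auxiliary variables (age, income) and all values are hypothetical, and in practice the auxiliary variables would typically be standardized before distances are computed.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two auxiliary-variable vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match(nonprob_units, reference_units):
    """For each reference unit, find the closest nonprobability unit
    on the shared auxiliary characteristics (one-to-one, with replacement)."""
    matches = {}
    for ref_id, ref_x in reference_units.items():
        best_id = min(nonprob_units,
                      key=lambda nid: euclidean(nonprob_units[nid], ref_x))
        matches[ref_id] = best_id
    return matches

# Hypothetical auxiliary data: (age, income in units of 10k).
reference = {"r1": (34, 4.2), "r2": (61, 2.8)}
nonprob = {"n1": (30, 4.0), "n2": (65, 3.0), "n3": (45, 7.5)}
print(match(nonprob, reference))  # -> {'r1': 'n1', 'r2': 'n2'}
```

Each reference unit is paired with its nearest nonprobability respondent, so the matched subsample inherits the auxiliary-variable distribution of the reference sample.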
Another approach that uses unit-level reference data is propensity score
weighting. This approach is performed after the survey data have been collected from units in a nonprobability sample. The basic procedure is to vertically concatenate the nonprobability sample survey data with a reference
regression coefficient) for Y in the total population (Elliott and Valliant 2017).
The key assumptions of this approach are that the analysis variable, Y, is
explained through a common model for the sampled and nonsampled units and
that all parameters governing the superpopulation model are controlled for in
the analysis model. The approach also requires the availability of auxiliary
data on the population to make predictions for the nonsampled units. Variance
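A minimal sketch of this model-based prediction approach, assuming a single auxiliary variable x and a simple linear superpopulation model; all numeric values are illustrative only:

```python
def fit_ols(x, y):
    """Ordinary least squares for y = a + b*x (simple linear regression)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b

# Observed nonprobability sample: auxiliary x and analysis variable y.
x_s = [1.0, 2.0, 3.0, 4.0]
y_s = [2.1, 3.9, 6.1, 8.0]

# Auxiliary x for the nonsampled population units (assumed available).
x_r = [1.5, 2.5, 5.0]

# Fit the model on the sampled units, predict y for the nonsampled units.
a, b = fit_ols(x_s, y_s)
y_pred = [a + b * xi for xi in x_r]

# Population-mean estimate: observed y for sampled units plus model
# predictions for the nonsampled units.
est_mean = (sum(y_s) + sum(y_pred)) / (len(y_s) + len(x_r))
print(round(est_mean, 2))  # -> 5.45
```

The key assumption is visible in the code: the same regression fitted on the sampled units is taken to hold for the nonsampled units, which cannot be verified from the sample itself.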
Table 1. Studies comparing the accuracy of probability sample (PS) and nonprobability sample (NPS) surveys

Study | Country | Benchmark | PS modes studied | PS more accurate than NPS? (a) | Did (re)weighting sufficiently reduce NPS bias? (b)
Blom et al. (2018) | Germany | Census data; election data | F2F, web | Yes (univariate) | No (raking)
MacInnis et al. (2018) | USA | High-quality PS | Phone, web | Yes (univariate) | No (poststratification)
Dassonneville et al. (2018) | Belgium | Census data; election outcome | F2F | Yes (univariate); No (multivariate) | No (unspecified approach), univariate; N/A, multivariate
Legleye et al. (2018) | France | Census data | Phone | Yes (univariate) | N/A
Pennay et al. (2018) | Australia | High-quality PS | Phone | Yes (univariate) | No (poststratification)
Sturgis et al. (2018) | UK | Election outcome | F2F | Yes (univariate) | No (raking, propensity weighting, matching)
Dutwin and Buskirk (2017) | USA | High-quality PS | F2F, phone | Yes (univariate) | No (propensity weighting, matching, raking)
Sohlberg et al. (2017) | Sweden | Election outcome | Phone | Yes (univariate) | N/A
Brüggen et al. (2016) | Netherlands | Population register | F2F, web | Yes (univariate) | No (Generalized Regression
a. The study results reported here are based on initial comparisons of accuracy as reported by the authors. Some studies use raw (unweighted) data in their initial comparisons, while others use data that are already weighted using either weights that the authors calculated themselves or the weights that a survey vendor delivered with the data.
b. We report whether (re)weighting has sufficiently reduced bias based on the authors’ own judgments as reported in their conclusions. This column includes studies that compare raw (unweighted) data in their initial comparison of accuracy with the data after a weighting procedure was performed, and studies that used weighted data in their initial comparison of accuracy and reweighted the data in subsequent comparisons. Studies that only reported raw data or only weighted data, without using any reweighting approaches, are labelled “not applicable” (N/A) in this column.
c. As reported by Callegaro et al. (2014a, 2014b).
and Krosnick 2009; Yeager et al. 2011; Szolnoki and Hoffmann 2013; Erens
et al. 2014; Dutwin and Buskirk 2017; MacInnis, Krosnick, Ho, and Cho
2018). In addition, the higher accuracy of probability sample surveys has been
found across a number of countries, such as Australia (Pennay, Neiger,
Lavrakas, and Borg 2018), France (Legleye et al. 2018), Germany (Szolnoki
and Hoffmann 2013; Blom, Ackermann-Piek, Helmschrott, Cornesse, Bruch,
inaccuracy due to nonprobability sample selection likely had the biggest impact on the mispredictions.
Although most studies have demonstrated that accuracy is higher in probability sample surveys than in nonprobability sample surveys, two studies have yielded mixed findings depending on the type of estimate examined (Pasek 2016; Dassonneville, Blais, Hooghe, and Deschouwer 2018). Both studies
The majority of these studies found that weighting did not sufficiently reduce bias in nonprobability sample surveys (see table 1). Generally speaking, probability sample surveys were found to be more accurate than nonprobability sample surveys even after (re)weighting.
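Raking, one of the (re)weighting procedures referenced in table 1, iteratively rescales each respondent's weight until the weighted sample margins match known population margins. A minimal sketch follows; the raking variables (sex, education) and the target shares are hypothetical:

```python
def rake(rows, targets, n_iter=50):
    """Iterative proportional fitting: adjust unit weights until the weighted
    margin of every raking variable matches the population targets.
    rows: list of dicts of category labels per unit;
    targets: {variable: {category: population share}} (shares sum to 1)."""
    w = [1.0] * len(rows)
    for _ in range(n_iter):
        for var, shares in targets.items():
            tot = sum(w)
            # Current weighted total of each category of this variable.
            cur = {c: 0.0 for c in shares}
            for wi, r in zip(w, rows):
                cur[r[var]] += wi
            # Rescale weights so this variable's margin hits the target.
            for i, r in enumerate(rows):
                w[i] *= shares[r[var]] * tot / cur[r[var]]
    return w

sample = [
    {"sex": "f", "edu": "hi"}, {"sex": "f", "edu": "lo"},
    {"sex": "m", "edu": "hi"}, {"sex": "m", "edu": "lo"},
    {"sex": "f", "edu": "hi"},
]
targets = {"sex": {"f": 0.51, "m": 0.49}, "edu": {"hi": 0.30, "lo": 0.70}}
weights = rake(sample, targets)

# After raking, the weighted margins match the targets, e.g. for women:
share_f = sum(w for w, r in zip(weights, sample) if r["sex"] == "f") / sum(weights)
print(round(share_f, 2))  # -> 0.51
```

Note what raking can and cannot do: it aligns the sample with the population on the raking variables only; bias related to characteristics not included in the margins, which table 1 suggests is common in nonprobability samples, is untouched.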
Although the majority of studies found that weighting did not reduce the
bias in nonprobability sample surveys sufficiently (table 1), some studies
4. CLOSING REMARKS
In this article, we have reviewed conceptual approaches and empirical evidence on probability and nonprobability sample surveys. Probability sampling theory is well established and based on sound mathematical principles, whereas nonprobability sampling is not. Although there are potential justifications for drawing inferences from nonprobability samples, the rationale for many studies remains unarticulated, and the inferences from nonprobability sample surveys generally require stronger modeling assumptions than are necessary for probability samples. The basic problem with these modeling assumptions remains that they cannot be tested. We have therefore proposed a conceptual framework for nonprobability sample surveys to explicate these modeling assumptions, including practical suggestions about when it might be justified to make such assumptions (section 2).
In addition, we have summarized the empirical evidence on the accuracy of probability and nonprobability sample surveys (section 3). Our literature overview shows that, even in the age of declining response rates, accuracy in probability sample surveys is generally higher than in nonprobability sample surveys. There is no empirical support for the claim that switching to nonprobability sample surveys is advisable because steadily declining response rates across the globe compromise probability sample survey data quality. Based on the accumulated empirical evidence, our key recommendation is to continue to rely on probability sample surveys.
In the case that only nonprobability sample survey data are available, we recommend carefully choosing among the various modeling approaches based on their underlying assumptions. In order for researchers to be able to justify the modeling approaches used, we recommend obtaining as much information as possible about the data-generating process (see the transparency recommendations in the appendix).
Apart from these key recommendations, this report also shows that there are gaps in the existing literature. To be able to evaluate if and when nonprobability sample surveys can be used as an alternative to probability sample surveys,
Appendix: Transparency Recommendations

Universe: What universe does the panel represent? How well does the panel cover that universe?
Sources: American Association for Public Opinion Research (2015, §III.A.3); ESOMAR (2014, §5.2); ISO (2009, §4.4.1, §4.7); ISO (2012, §4.5.1.4, §7.2.e, §7.2.f.2)

Sampling frame: What method or methods are used to recruit panel members? We recommend that the percentage of the sample recruited from different sources be included (e.g., online advertisement, email list, aggregator website).
Sources: American Association for Public Opinion Research (2015, §III.A.5-8); ESOMAR (2012, §2, §5); ESOMAR (2014, §5.2); ESOMAR (2015a, §3.7); ESOMAR (2015b, §5.2, §6, §6.1); IMRO (2015, §4); ISO (2012, §4.5.1.4, §7.2.f)

Panel size: How large is the panel? What is the percentage of active members? How is “active” defined?
Sources: IMRO (2015, §14, §17); ISO (2009, §4.4.2)

Replenishment and retirement: How frequently are new panel members added? Are panel members retired from the panel by the vendor? If so, what criteria are used to determine retirement? What is the panel’s annual “churn rate” (the percentage of all panelists who are voluntarily or involuntarily retired each year)?
Sources: IMRO (2015, §18, §19); ISO (2009, §4.5.1)

Aggregator sites: Does the panel draw on aggregator sites, which are sites that allow respondents to select from multiple surveys for which they may qualify (see IMRO 2015)?
Sources: IMRO (2015, §6)

Sample from other panels: Was sample from other providers used in this study?
Sources: ESOMAR (2012, §4)

Blending: If there is more than one sample source, how are these blended together?
Sources: ESOMAR (2012, §3); ESOMAR (2015b, §6.1)

Category exclusion: How are panel members who have recently completed a survey on the same topic treated? If they are excluded, for how long or using what rules?
Sources: IMRO (2015, §15); ISO (2009, §4.6.2)

Participation limits: How often can a panel member be contacted to take part in a survey in a specified period?
Sources: ESOMAR (2012, §19, §20); ESOMAR (2015a, §3.8); IMRO (2015, §16); ISO (2009, §4.5.1, §4.6.2)

Other data quality checks: What checks are made for satisficing and fraud? These might include checks for speeding and patterned responses, trap questions, counts of missing or nonsubstantive responses (e.g., don’t know), device fingerprinting, or logic checks.
Sources: American Association for Public Opinion Research (2015, §III.A.17); ESOMAR (2012, §18); ESOMAR (2015a, §3.3, §3.8); ESOMAR (2015b, §6, §6.1, §6.4); IMRO (2015, §30); ISO (2009, §4.6.6, §4.7); ISO (2012, §7.2.k)

Panel profile: How often are profiles updated?
Sources: IMRO (2015, §21)

Fieldwork dates: When was the survey fielded? If there were multiple sample sources, did the field period vary between sources?
Sources: American Association for Public Opinion Research (2015, §III.A.4); ESOMAR (2014, §5.2); ESOMAR (2015a, §3.8); ESOMAR (2015b, §6.3); ISO (2009, §4.7); ISO (2012, §4.8.3, §7.2.g)

Invitations and reminders: What text is used for invitations and any reminders? We recommend disclosing the maximum number of reminders sent.
Sources: American Association for Public Opinion Research (2015, §III.A.16); ESOMAR (2012, §13); ISO (2009, §4.7)

Incentives: What incentives, if any, are offered to panel members?
Sources: American Association for Public Opinion Research (2015, §III.A.16); ESOMAR (2012, §14); ESOMAR (2015a, §3.7); ESOMAR (2015b, §6.1-2); IMRO (2015, §20); ISO (2009, §4.5.2); ISO (2012, §7.2.i)

Questionnaire: In the event that the researcher did not create the questionnaire, the questionnaire should be available, including any screening questions. As web surveys are an intensely visual medium (e.g., Couper, Tourangeau, and Kenyon 2004; Couper, Tourangeau, Conrad, and Crawford 2004), we recommend that screenshots be provided as part of reporting.
Sources: American Association for Public Opinion Research (2015, §III.A.2, §III.A.14-15); ESOMAR (2014, §5.2); ESOMAR (2015a, §3.8); ESOMAR (2015b, §5.2, §6.3); ISO (2009, §4.7); ISO (2012, §7.2.l)
References
American Association for Public Opinion Research (2013), “Report of the AAPOR Task Force on Nonprobability Sampling,” available at https://www.aapor.org/AAPOR_Main/media/MainSiteFiles/NPS_TF_Report_Final_7_revised_FNL_6_22_13.pdf (accessed September 30, 2019).
American Association for Public Opinion Research (2015), “AAPOR Code of Ethics,” available at https://www.aapor.org/Standards-Ethics/AAPOR-Code-of-Ethics.aspx (accessed September 30, 2019).
Chang, L., and J. A. Krosnick (2009), “National Surveys via RDD Telephone Interviewing versus
the Internet,” Public Opinion Quarterly, 73, 641–678.
Chen, J. K. T., R. L. Valliant, and M. R. Elliott (2018), “Calibrating Non-Probability Surveys to
Estimated Control Totals Using LASSO, with an Application to Political Polling,” Journal of
the Royal Statistical Society: Series C (Applied Statistics), 41, 657–681.
Chrzan, K., and T. Saunders (2012), “Improving the Mobile Survey Experience: A Closer Look at Scale Orientation, Grids and Modality Effects,” Maritz Research white paper, available at http://www.websm.org/uploadi/editor/doc/1470813336Chrzan_Saunders_2012_Improving_the_
Elliott, M. R., and R. Valliant (2017), “Inference for Nonprobability Samples,” Statistical Science,
32, 249–264.
Erens, B., S. Burkill, M. P. Couper, F. Conrad, S. Clifton, C. Tanton, A. Phelps, et al. (2014),
“Nonprobability Web Surveys to Measure Sexual Behaviors and Attitudes in the General
Population: A Comparison with a Probability Sample Interview Survey,” Journal of Medical
Internet Research, 16, e276.
ESOMAR (2012), “28 Questions to Help Buyers of Online Samples,” available at https://www.esomar.org/uploads/public/knowledge-and-standards/documents/ESOMAR-28-Questions-to-
Kennedy, C., A. Mercer, S. Keeter, N. Hatley, K. McGeeney, and A. Gimenez (2016), “Evaluating Online Nonprobability Surveys: Vendor Choice Matters; Widespread Errors Found for Estimates Based on Blacks and Hispanics,” available at http://assets.pewresearch.org/wp-content/uploads/sites/12/2016/04/Nonprobability-report-May-2016-FINAL.pdf (accessed September 30, 2019).
Kiaer, A. N. (1895), “Observations et Expériences Concernant les Dénombrements Représentatifs,” Bulletin of the International Statistical Institute, 9, 176–183.
Kish, L. (1965), Survey Sampling, New York: John Wiley and Sons.
Park, D. K., A. Gelman, and J. Bafumi (2004), “Bayesian Multilevel Estimation with
Poststratification: State-Level Estimates from National Polls,” Political Analysis, 12, 375–385.
Pasek, J. (2016), “When Will Nonprobability Surveys Mirror Probability Surveys? Considering
Types of Inference and Weighting Strategies as Criteria for Correspondence,” International
Journal of Public Opinion Research, 28, 269–291.
Pennay, D. W., D. Neiger, P. J. Lavrakas, and K. Borg (2018), “The Online Panels Benchmarking Study: A Total Survey Error Comparison of Findings from Probability-Based Surveys and Nonprobability Online Panel Surveys in Australia,” available at https://www.srcentre.com.au/