Advances in Business Statistics, Methods and Data Collection
Edited by
Ger Snijkers
Statistics Netherlands
Mojca Bavdaž
University of Ljubljana
Stefan Bender
Deutsche Bundesbank and University of Mannheim
Jacqui Jones
Australian Bureau of Statistics
Steve MacFeely
World Health Organization and University College Cork
Joseph W. Sakshaug
Institute for Employment Research and Ludwig Maximilian University of Munich
Katherine J. Thompson
U.S. Census Bureau
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted,
in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted
by law. Advice on how to obtain permission to reuse material from this title is available at
http://www.wiley.com/go/permissions.
The right of Ger Snijkers, Mojca Bavdaž, Stefan Bender, Jacqui Jones, Steve MacFeely, Joseph W. Sakshaug,
Katherine J. Thompson and Arnout van Delden to be identified as the authors of this work has been asserted in
accordance with law.
Registered Office
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
Editorial Office
111 River Street, Hoboken, NJ 07030, USA
For details of our global editorial offices, customer services, and more information about Wiley products visit us at
www.wiley.com.
Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears
in standard print versions of this book may not be available in other formats.
Trademarks: Wiley and the Wiley logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or
its affiliates in the United States and other countries and may not be used without written permission. All other
trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product
or vendor mentioned in this book.
Contents
4.3 Co-operation Between National Statistical Offices and National Central Bank
Statistics Functions Tackling Globalization Problems
4.3.1 Foreign Direct Investment Network as an Example of Co-operation
4.3.2 Early-Warning System (EWS)
4.3.3 A Roadmap for Solving the Globalization-Related Issues in Monetary, Financial,
and Balance of Payments – Statistics
4.4 Bridging the Gap Between Business and Economic Statistics Through Global Data
Sharing
4.4.1 Product Innovation – One-Off or Regular Data Sharing for Better Quality
4.4.2 Service Innovation – Improving Respondent Service for MNEs
4.4.3 Process Innovation to Statistical Production by Data Sharing
4.4.4 Innovating User Experience – Better Relevance and Consistency for Users
4.4.5 Organizational Innovation – Changing the Business Model of Official Statistics
4.4.6 Cultural Innovation – Key to Making it Happen
4.4.7 Innovation in Other Industries to Learn From
References
7.6.3 Informal Sector Statistics from National Population Census 2011
7.6.4 Informal Sector Statistics from National Economic Census 2018
7.6.5 Status of Keeping Accounting Record
7.6.6 Informality in Micro Small and Medium Establishments (MSME)
7.6.7 Street Business Situation
7.7 Annual Revenues/Sales, Operating Expenses in Not-Registered Establishments
7.8 Need of Regular Measurement Informal Sector
7.9 Conclusion
References
26 Alternative Data Sources in the Census Bureau’s Monthly State Retail Sales
Data Product
Rebecca Hutchinson, Scott Scheleur, and Deanna Weidenhamer
26.1 Introduction/Overview
26.2 History of State-Level Retail Sales at Census
26.3 Overview of the MSRS
26.4 Methodology
26.4.1 Directly Collected Data Inputs
26.4.2 Frame Creation
26.4.3 Estimation and Imputation
26.4.3.1 Composite Estimator
26.4.3.2 Synthetic Estimator
26.4.3.3 Hybrid Estimator
26.4.4 Quality Metrics
26.5 Use of Alternative Data Sources in MSRS
26.5.1 Input to MSRS Model
26.5.2 Validation
26.6 Conclusion
Disclaimer
References
31 Variance Estimation Under Nearest Neighbor Ratio Hot Deck Imputation for
Multinomial Data: Two Approaches Applied to the Service Annual Survey
(SAS)
Rebecca Andridge, Jae Kwang Kim, and Katherine J. Thompson
31.1 Introduction
31.2 Basic Setup
31.3 Single Imputation Variance Estimation
Index
Downloaded from https://onlinelibrary.wiley.com/doi/ by Lao People's Dem Rep Hinari access, Wiley Online Library on [09/02/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
List of Contributors
Ryan Janicki
Center for Statistical Research and
Methodology
U.S. Census Bureau
Washington
DC
USA
1
Section 1
In 2021, when ICES-VI took place, almost 30 years had passed since ICES-I, which yielded the
edited volume “Business Survey Methods” (Cox et al. 1995), the first reference book covering this
topic. During these three decades much has changed with regard to establishment statistics,
methods, and data collection. Clearly, it is time for a new volume, discussing these advances.
At the beginning of the 1990s, a group of practitioners working on establishment surveys
recognized that the “lack of published methods and communication among researchers was a stumbling
block for progress in solving business surveys’ unique problems” (Cox et al. 1995: xiii) and
concluded that “an international conference was needed to (1) provide a forum to describe methods in
current use, (2) present new or improved technologies, and (3) promote international interchange
of ideas” (ibid: xiii). In June 1993, practitioners, researchers, and methodologists from around the
world met in Buffalo, New York (USA), at the International Conference on Establishment Surveys
(ICES) to develop a community based entirely around establishment statistics. Rather than
narrowly focus on business surveys, the conference organizers broadened the scope to include all
establishment surveys, whose target populations include businesses (establishments, firms), farms, and
institutions (schools, jails, governments) – basically, any unit other than a household or a person.
This conference created a recurring forum for networking and innovation in an often-overlooked
area of survey research methods and became the first in a series of international conferences.
As this community solidified, and the pace of developments in establishment surveys quickened,
ICES progressed from a seven-year conference cycle to a four-year cycle, with conferences in 2000
(II), 2007 (III), 2012 (IV), and 2016 (V), with the latest conference (ICES-VI) held virtually in
2021, and ICES-VII being scheduled for 2024. Along the way, the conference title changed, from
“establishment surveys” to “establishment statistics,” reflecting the increased prevalence of other
data sources, such as censuses, registers, big data, and blended data applications in economic
measurements. This series of conferences has become a well-respected international platform
whose participants include practitioners, researchers, and methodologists in government, central
banks, academia, international organizations, and the private sector from around the globe.
1 Advances in Business Statistics, Methods and Data Collection: Introduction
Advances in Business Statistics, Methods and Data Collection. Edited by Ger Snijkers, Mojca Bavdaž, Stefan Bender,
Jacqui Jones, Steve MacFeely, Joseph W. Sakshaug, Katherine J. Thompson, and Arnout van Delden.
© 2023 John Wiley & Sons, Inc. Published 2023 by John Wiley & Sons, Inc.
The ICES-I edited volume “Business Survey Methods” (Cox et al. 1995) stood for almost
20 years as the only comprehensive overview reference book dedicated to establishment programs.
Featuring chapters drawn from conference presentations, this volume laid the groundwork for
other important references. “Designing and Conducting Business Surveys” (Snijkers et al. 2013) was
published in 2013. This textbook provides detailed guidelines for the entire process of designing
and conducting a business survey, ranging from constructs to be measured using a survey, through
collection and post-collection processing to dissemination of the results. “The Unit Problem
and Other Current Topics in Business Survey Methodology” (Lorenc et al. 2018) was published in
2018. In addition to these monographs, the Journal of Official Statistics published a selection
of 2007 ICES-III papers in a special section (JOS 2010), and subsequently published dedicated
special issues featuring papers from the 2012 ICES-IV (Smith and Phipps 2014) and 2016 ICES-V
(Thompson et al. 2018) conferences.
While the topics covered in that first edited volume remain relevant, much has changed over
the passing decades. As economies change and sciences evolve, methodologists around the world
have worked actively on the development, conduct, and evaluation of modern establishment
statistics programs. Furthermore, emerging data sources and new technologies are used in modern
applications; some of these were not discussed in the monographs noted above, simply because
they did not yet exist or were used on a small scale. These topics include new developments
in establishment surveys (such as nonprobability sampling, sophisticated developments in
web surveys, applications of adaptive and responsive data collection designs), improvements
in statistical process control applications, advanced data visualization methods and software,
widespread use of alternative/secondary data sources like registers and big data along with
improved methodologies and increased production of multi-source statistics, Internet of Things
possibilities (smart farming data, smart industries data), new computer technologies (such as web
scraping, text mining), and new indicators for the economy (like Sustainable Development Goals).
It is time for a new monograph on establishment statistics methodology! The ICES-VI conference
provides an excellent source of material for such a new volume.
ICES-VI was originally planned for June 2020 in New Orleans, but was postponed because of the
COVID-19 pandemic and held as an online conference from 14 to 17 June 2021. Close to 425
participants from 32 countries, representing over 160 organizations, met online to listen to and
discuss nearly 100 presentations, dealing with a wide variety of topics germane to establishment
statistics. In addition, 17 invited pre-recorded Introductory Overview Lectures (IOL) provided
a comprehensive overview of all key topics in establishment statistics (see Appendix). Book
contributions from conference authors were solicited via an open call to presenters; in some cases,
chapters present consolidated session content. Taken collectively, this edited volume presents
materials and reflects discussions on every aspect of the establishment statistics production
life cycle.
This volume is entirely new; it is neither an update of the Cox et al. book (1995) nor an update
of the Snijkers et al. (2013) textbook. Its scope is broader, containing comprehensive review papers
drawn from IOLs and cutting-edge applications of methods and data collection as applied to
establishment statistics programs, and thoughtful discussions on both previously existing and
new economic measures for economies that are increasingly global. The next section of this
introductory chapter discusses the importance of establishment statistics (Section 1.2), followed by
trends in establishment statistics research based on the ICES conference programs (Section 1.3),
the organization of this volume (Section 1.4), and a conclusion (Section 1.5).
1.2 The Importance of Establishment Statistics
Taken together, these establishment statistics provide a detailed and comprehensive picture of
the economy. They are essential for the compilation of the National Accounts, Input–Output, and
Supply–Use tables. The System of National Accounts (SNA) framework, which is frequently used by
policymakers, business communities, and economists, is discussed in Sections 1 and 2 of this book.
Advances in the measurement of phenomena like globalization, climate change, e-commerce, and the
informal sector of the economy are needed. These topics are discussed in Section 1. While traditional establishment statistics are
produced using surveys and administrative data, resulting in multisource statistics (as discussed in
Sections 2–4, 6, and 7), the production of these new statistics may profit from new data sources and
new technologies, like big data, web scraping, and text mining (Bender and Sakshaug 2022; Hill
et al. 2021), as we will see in Sections 5 and 7.
Whatever data sources are used, statistics that target organizations must deal with unique
features that present entirely different challenges from their household statistics counterparts
(Cox and Chinnappa 1995; Snijkers et al. 2013; Snijkers 2016). Even though establishment
survey methodology borrows from social survey methodology, blindly applying the same survey
methods, collection, and statistical models to establishment statistics can yield disastrous results.
A typical – and longstanding – problem in business statistics is the mapping between the unit types
found in secondary data sources (like administrative data sources) and the statistical units
that are economically meaningful and comparable over time and across countries, as is shown in
Figure 1.1. The ICES-I edited volume (Cox et al. 1995) devoted a chapter to the construction of
statistical unit types held in a statistical business register (SBR) (Nijhowne 1995), and a chapter
on changes of those unit types over time (Struijs and Willeboordse 1995).
To reduce respondent burden on surveyed establishments or to validate survey collections,
establishment programs often utilize administrative data collected from the same (or nearly the
same) organizations about their operations, such as Value Added Tax (VAT) or tax data. Linkages
[Figure 1.1: diagram not reproduced. Unit types shown: reporting unit, enterprise group, enterprise, local unit, legal unit, and tax unit.]
Figure 1.1 The unit problem in establishment statistics: A schematic representation of reporting,
statistical, and administrative unit types. Lines represent relations between the unit types: (1 : n), (n : 1), and
(m : n), with m, n ≥ 1. The exact situation and terminology may differ between countries. Source: Inspired by
Figure 13.1 in Chapter 13 by Cox et al. 2023.
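The relation types named in the caption can be made concrete with a small sketch. The unit identifiers and the links between them below are invented purely for illustration; they are not taken from Figure 1.1 or from any real register, and, as the caption notes, the actual situation differs between countries. The sketch assumes that links are stored as simple pairs and shows how chaining tax unit to legal unit links with legal unit to enterprise links yields the relation between administrative and statistical units.

```python
from collections import defaultdict

# Hypothetical links between unit types; every identifier is invented.
TAX_TO_LEGAL = [("T1", "L1"), ("T1", "L2"), ("T2", "L2"), ("T3", "L3")]
LEGAL_TO_ENTERPRISE = [("L1", "E1"), ("L2", "E1"), ("L3", "E2")]


def compose(first, second):
    """Chain two link lists (a -> b, b -> c) into a set of a -> c links."""
    by_source = defaultdict(set)
    for b, c in second:
        by_source[b].add(c)
    return {(a, c) for a, b in first for c in by_source[b]}


def cardinality(links):
    """Classify a collection of (left, right) links as 1:1, 1:n, n:1, or m:n."""
    left_to_right = defaultdict(set)
    right_to_left = defaultdict(set)
    for left, right in links:
        left_to_right[left].add(right)
        right_to_left[right].add(left)
    # A left unit linked to several right units "fans out";
    # a right unit linked to several left units "fans in".
    left_fans_out = any(len(v) > 1 for v in left_to_right.values())
    right_fans_in = any(len(v) > 1 for v in right_to_left.values())
    if left_fans_out and right_fans_in:
        return "m:n"
    if left_fans_out:
        return "1:n"
    if right_fans_in:
        return "n:1"
    return "1:1"


tax_to_enterprise = compose(TAX_TO_LEGAL, LEGAL_TO_ENTERPRISE)
print(cardinality(TAX_TO_LEGAL))
print(cardinality(LEGAL_TO_ENTERPRISE))
print(sorted(tax_to_enterprise))
```

In this toy data the tax-to-legal relation is many-to-many, so tax data could not be assigned to a single legal unit without some allocation rule; choosing such a rule is outside the scope of the sketch.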
accelerated since ICES-II. While the conference series deserves some credit for providing a forum
for collaboration, the drivers for these changes are, as ever, reducing production costs, reducing
response burden (i.e. reporting costs), improving data quality, and – above all – meeting
emerging and increasing user demands for more timely, detailed, and coherent statistics (Linacre 2000;
Edwards 2007; Jones and O’Byrne 2023 [Chapter 8]).
An important integration trend is the need for harmonized and coordinated statistics programs
and integrated processes. At ICES-II, Smith (2000: 23) stated that “the most effective road to
improved statistical quality and minimum response burden lies through greater unification,
harmonization, and integration of statistical programs.” Indeed, many NSIs have been integrating
their statistics programs and production processes with the objective of maximizing production
efficiency, as well as addressing new user demands, both with regard to established economic
statistics and new statistics that adequately measure new economic developments like
globalization, e-commerce, or the informal economy. Since ICES-II, many of them have presented their
plans and experiences at subsequent ICES conferences.
This push toward integrated business statistics production processes certainly accelerated the
development of standardized business survey methods. The ICES-III, IV, V, and VI programs
feature presentations on coordinated sampling designs, integrated business registers, coordinated
data collection, questionnaire design tailored to the business context, electronic questionnaires,
enhanced survey communication tools, generalized data editing and imputation programs, canned
estimation programs, and standardized platforms developed for dissemination of statistics.
Fundamental to this progress is the integration of the establishment perspective into the survey
design, yielding a deeper understanding of the business response process. This was first discussed
at ICES-I and elaborated upon at every subsequent ICES conference. Opening this “black box” has
proved to be instrumental in questionnaire design and pre-testing, in improving survey
participation, implementing electronic data collection instruments, and executing mixed-mode designs,
thus reducing costs and response burden, and improving data quality.
The use of alternative data sources and data integration is another ICES integration trend.
From ICES-I onward, the expanded use of administrative data in establishment statistics has
been discussed. As Linacre (2000: 3) mentioned at ICES-II: “This area is receiving a lot of
attention by methodologists and good advances are being made in a number of countries.”
Historically, administrative data were used to construct sampling frames and business registers.
However, methodologists and NSIs systematically promoted exploitation of administrative data
to supplement or even replace survey collection, with an on-going objective to reduce response
burden.
By ICES-V, the data integration presentations expanded to include big data and other
non-traditional (organic) data, with an IOL as well as presentations featuring creative small area
estimation applications and investigations of satellite photography data and purchased third-party
registers. ICES-VI featured an even larger percentage of presentations on data integration, with
applications often involving sophisticated machine learning methods, emphasizing the production
of multisource statistics with linked data.
A final integration trend to which ICES has contributed concerns the increased networking and
collaboration between experts from different areas, yielding synergy between these areas. Every
ICES program retains “traditional” topics like respondent contact strategies, questionnaire design
practices, business frames and sampling, weighting, outlier detection, data editing and imputation,
data analysis, estimation, and variance estimation. By blending standard survey methodology
and survey statistics topics – applied exclusively to establishment data – with emerging methods
and technologies, the ICES conference series has expanded what was once a small pool of
specialists into a large and international group of multidisciplinary scientists. After six conferences,
we can conclude that the goal of the founding group (Cox et al. 1995) has been achieved.
These trends in integration at various levels, which are also seen in other international platforms
discussing establishment statistics (see e.g. SJIAOS 2020), will surely continue to be seen at the
next ICES conferences, since the basic challenges remain the same. Over the ICES years, the user
quality criteria for official statistics have been fully developed, as described by Marker (2017; Jones
and O’Byrne 2023 [Chapter 8]). In the future, criteria like timeliness, relevance, accuracy, and
coherence will remain critical drivers for innovating production procedures and methods. In
addition, production efficiency considerations, like costs, time, capacity, and response burden, as
well as data quality issues as defined by the total survey error framework (Haraldsen 2013; Snijkers
2016) and other data quality frameworks (for administrative data and big data: van Delden and
Lewis 2023 [Chapter 12], Biemer and Amaya 2021; Amaya et al. 2020) will define the production
conditions (Jones and O’Byrne 2023 [Chapter 8], Bender et al. 2023 [Chapter 22]).
At ICES-I Ryten (1995: 706) concluded, when discussing “Business Surveys in Ten Years from
now,” that “national statistical agencies must adjust today’s structure of surveys, censuses, and
administrative registers as well as today’s capabilities.” We have seen this trend during the ICES
years. However, moving away from siloed statistics production took time, and even today many
NSIs are still organized along traditional statistics outputs. The 2020 COVID-19 pandemic
demonstrated the fluidity of procedures and methods in NSIs under pressure, producing concurrent
measures of generally good quality and yielding new and important statistics at the same time
(see Chapter 11 by Jones et al. 2023). We expect this pandemic to be a tipping point, and that in
the future NSIs will be more flexible in producing more timely and relevant statistics, using and
integrating various data sources, and applying a number of statistical and data collection methods.
(For additional discussions on the impact of the COVID-19 pandemic on official statistics, see the
Statistical Journal of the IAOS special issues on this topic [SJIAOS 2021a].)
The advantage of today, as compared to the times of ICES-I, is that our methods have developed,
more data sources can be used and better exploited, and information technology has improved, as we
have seen above. Now we need to make sure that all these methods, technologies, and data sources come
together. Integrating new information technologies into our methods is the next step we need to take
(Bavdaž et al. 2020).
Looking to the future we expect surveys to be better integrated and coordinated, using more
sophisticated sample designs and sophisticated estimation methods that benefit from AI (artificial
intelligence) and machine learning technologies. Instead of questionnaires, data will to a large
extent be gathered using System-to-System data communication methods (or Electronic Data
Interchange, EDI methods; Buiten et al. 2018), applying for example eXtensible Business Reporting
Language (XBRL) protocols or Application Programming Interfaces (API). In the next decade, we
expect businesses themselves to be ready for this technology, having implemented smart industry
technologies (or Industry 4.0; Chakravarti 2021; Haverkort and Zimmermann 2017) using sensor
technology and the Internet of Things yielding an integrated business information chain (Bharosa
et al. 2015), and having developed harmonized data definitions. As Snijkers et al. (2021) concluded
from an exploratory study with precision farming data: “The fruit is not hanging as low as we
thought”: wide-scale adoption of smart industry technologies and data harmonization between
various platforms is inevitable, even if not widely adopted in June 2021. As with administrative
registers, NSIs should work closely together with other parties like software developers of business
systems to implement EDI systems, making system-to-system data communication possible, and
having the required data stored in the business systems. Also, web scraping and text mining may be
used to collect data directly from the internet. An example of the use of web scraping in the field of
1.4 Organization of This Book 11
A POLAR GLACIER.
Having said thus much of the structure, causes, characteristics,
and movement of glaciers, we proceed to consider some of the more
remarkable of those which are situated in the Arctic World.
The glaciers of the Polar Regions do not differ in structure or
mode of formation from those of other countries. Yet they possess
some peculiar features, and to a superficial observer might seem
independent of the physical laws we have attempted to explain. That
this is not the case has been shown by Charles Martins, who
carefully studied the glaciers of Spitzbergen on the occasion of the
exploring voyage of the Recherche to that island, and has
demonstrated that their differences are but a particular case of the
general phenomenon.
As special characters he points out, first, the rarity of needles and
prisms of ice, which he attributes to the slight inclination and the
uniformity of the slopes, as well as to the diminution of the solar
heat, which, even in the long summer days, does not melt the
surface. There are no rills or streams capable of hollowing out
crevasses and moulding protuberances or projections. But
transversal crevasses produced by the movement of the glaciers are
numerous, and these are often very wide and very deep.