Jean-Michel Josselin
Benoît Le Maux

Statistical Tools for Program Evaluation
Methods and Applications to Economic Policy, Public Health, and Education

Jean-Michel Josselin
Faculty of Economics, University of Rennes 1, Rennes, France

Benoît Le Maux
Faculty of Economics, University of Rennes 1, Rennes, France

ISBN 978-3-319-52826-7 ISBN 978-3-319-52827-4 (eBook)


DOI 10.1007/978-3-319-52827-4

Library of Congress Control Number: 2017940041

© Springer International Publishing AG 2017


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or
dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt
from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, express or implied, with respect to the material contained
herein or for any errors or omissions that may have been made. The publisher remains neutral with
regard to jurisdictional claims in published maps and institutional affiliations.

Printed on acid-free paper

This Springer imprint is published by Springer Nature


The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Acknowledgments

We would like to express our gratitude to those who helped us and made the
completion of this book possible.
First of all, we are deeply indebted to the Springer editorial team and particularly
Martina BIHN whose support and encouragement allowed us to finalize this
project.
Furthermore, we have benefited from helpful comments by colleagues and we
would like to acknowledge the help of Maurice BASLÉ, Arthur CHARPENTIER,
Pauline CHAUVIN, Salah GHABRI, and Christophe TAVÉRA. Of course, any
mistake that may remain is our entire responsibility.
In addition, we are grateful to our students who have been testing and
experimenting with our lectures for so many years. Parts of the material provided
here have been taught at the Bachelor and Master levels, in France and abroad.
Several students and former students have helped us improve the book. We really
appreciated their efforts and are very grateful to them: Erwan AUTIN, Benoît
CARRÉ, Aude DAILLÈRE, Kristýna DOSTÁLOVÁ, and Adrien VEZIE.
Finally, we would like to express our sincere gratefulness to our families for
their continuous support and encouragement.

Contents

1 Statistical Tools for Program Evaluation: Introduction and Overview . . . . . 1
1.1 The Challenge of Program Evaluation . . . . . . . . . . . . . . . . . . . 1
1.2 Identifying the Context of the Program . . . . . . . . . . . . . . . . . . 4
1.3 Ex ante Evaluation Methods . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 Ex post Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.5 How to Use the Book? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

Part I Identifying the Context of the Program


2 Sampling and Construction of Variables . . . . . . . . . . . . . . . . . . . . . 15
2.1 A Step Not to Be Taken Lightly . . . . . . . . . . . . . . . . . . . . . . . 15
2.2 Choice of Sample . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.3 Conception of the Questionnaire . . . . . . . . . . . . . . . . . . . . . . . 22
2.4 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.5 Coding of Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3 Descriptive Statistics and Interval Estimation . . . . . . . . . . . . . . . . . 45
3.1 Types of Variables and Methods . . . . . . . . . . . . . . . . . . . . . . . 45
3.2 Tabular Displays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.3 Graphical Representations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.4 Measures of Central Tendency and Variability . . . . . . . . . . . . . 64
3.5 Describing the Shape of Distributions . . . . . . . . . . . . . . . . . . . 69
3.6 Computing Confidence Intervals . . . . . . . . . . . . . . . . . . . . . . . 77
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4 Measuring and Visualizing Associations . . . . . . . . . . . . . . . . . . . . . 89
4.1 Identifying Relationships Between Variables . . . . . . . . . . . . . . 89
4.2 Testing for Correlation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.3 Chi-Square Test of Independence . . . . . . . . . . . . . . . . . . . . . . 99


4.4 Tests of Difference Between Means . . . . . . . . . . . . . . . . . . . . . 105


4.5 Principal Component Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 113
4.6 Multiple Correspondence Analysis . . . . . . . . . . . . . . . . . . . . . . 126
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
5 Econometric Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
5.1 Understanding the Basic Regression Model . . . . . . . . . . . . . . . 137
5.2 Multiple Regression Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 147
5.3 Assumptions Underlying the Method of OLS . . . . . . . . . . . . . . 153
5.4 Choice of Relevant Variables . . . . . . . . . . . . . . . . . . . . . . . . . 156
5.5 Functional Forms of Regression Models . . . . . . . . . . . . . . . . . 164
5.6 Detection and Correction of Estimation Biases . . . . . . . . . . . . . 167
5.7 Model Selection and Analysis of Regression Results . . . . . . . . 174
5.8 Models for Binary Outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . 180
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
6 Estimation of Welfare Changes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
6.1 Valuing the Consequences of a Project . . . . . . . . . . . . . . . . . . 189
6.2 Contingent Valuation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
6.3 Discrete Choice Experiment . . . . . . . . . . . . . . . . . . . . . . . . . . 200
6.4 Hedonic Pricing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
6.5 Travel Cost Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
6.6 Health-Related Quality of Life . . . . . . . . . . . . . . . . . . . . . . . . 221
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230

Part II Ex ante Evaluation


7 Financial Appraisal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
7.1 Methodology of Financial Appraisal . . . . . . . . . . . . . . . . . . . . 235
7.2 Time Value of Money . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
7.3 Cash Flows and Sustainability . . . . . . . . . . . . . . . . . . . . . . . . . 244
7.4 Profitability Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
7.5 Real Versus Nominal Values . . . . . . . . . . . . . . . . . . . . . . . . . . 255
7.6 Ranking Investment Strategies . . . . . . . . . . . . . . . . . . . . . . . . . 257
7.7 Sensitivity Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
8 Budget Impact Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
8.1 Introducing a New Intervention Amongst Existing Ones . . . . . . 269
8.2 Analytical Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
8.3 Budget Impact in a Multiple-Supply Setting . . . . . . . . . . . . . . . 275
8.4 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
8.5 Sensitivity Analysis with Visual Basic . . . . . . . . . . . . . . . . . . . 281
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288

9 Cost Benefit Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291


9.1 Rationale for Cost Benefit Analysis . . . . . . . . . . . . . . . . . . . . . 291
9.2 Conceptual Foundations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
9.3 Discount of Benefits and Costs . . . . . . . . . . . . . . . . . . . . . . . . 299
9.4 Accounting for Market Distortions . . . . . . . . . . . . . . . . . . . . . . 306
9.5 Deterministic Sensitivity Analysis . . . . . . . . . . . . . . . . . . . . . . 311
9.6 Probabilistic Sensitivity Analysis . . . . . . . . . . . . . . . . . . . . . . . 313
9.7 Mean-Variance Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
10 Cost Effectiveness Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
10.1 Appraisal of Projects with Non-monetary Outcomes . . . . . . . . . 325
10.2 Cost Effectiveness Indicators . . . . . . . . . . . . . . . . . . . . . . . . . . 328
10.3 The Efficiency Frontier Approach . . . . . . . . . . . . . . . . . . . . . . 336
10.4 Decision Analytic Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . 342
10.5 Numerical Implementation in R-CRAN . . . . . . . . . . . . . . . . . . 351
10.6 Extension to QALYs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357
10.7 Uncertainty and Probabilistic Sensitivity Analysis . . . . . . . . . . 358
10.8 Analyzing Simulation Outputs . . . . . . . . . . . . . . . . . . . . . . . . . 371
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 382
11 Multi-criteria Decision Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 385
11.1 Key Concepts and Steps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385
11.2 Problem Structuring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 388
11.3 Assessing Performance Levels with Scoring . . . . . . . . . . . . . . . 390
11.4 Criteria Weighting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395
11.5 Construction of a Composite Indicator . . . . . . . . . . . . . . . . . . . 398
11.6 Non-Compensatory Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 401
11.7 Examination of Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 410
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 416

Part III Ex post Evaluation


12 Project Follow-Up by Benchmarking . . . . . . . . . . . . . . . . . . . . . . . 419
12.1 Cost Comparisons to a Reference . . . . . . . . . . . . . . . . . . . . . . 419
12.2 Cost Accounting Framework . . . . . . . . . . . . . . . . . . . . . . . . . . 423
12.3 Effects of Demand Structure and Production Structure
on Cost . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 426
12.4 Production Structure Effect: Service-Oriented Approach . . . . . . 433
12.5 Production Structure Effect: Input-Oriented Approach . . . . . . . 436
12.6 Ranking Through Benchmarking . . . . . . . . . . . . . . . . . . . . . . . 440
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 441

13 Randomized Controlled Experiments . . . . . . . . . . . . . . . . . . . . . . . 443


13.1 From Clinical Trials to Field Experiments . . . . . . . . . . . . . . . . 443
13.2 Random Allocation of Subjects . . . . . . . . . . . . . . . . . . . . . . . . 448
13.3 Statistical Significance of a Treatment Effect . . . . . . . . . . . . . . 453
13.4 Clinical Significance and Statistical Power . . . . . . . . . . . . . . . . 463
13.5 Sample Size Calculations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471
13.6 Indicators of Policy Effects . . . . . . . . . . . . . . . . . . . . . . . . . . . 474
13.7 Survival Analysis with Censoring: The Kaplan-Meier
Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 480
13.8 Mantel-Haenszel Test for Conditional Independence . . . . . . . . . 483
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 487
14 Quasi-experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489
14.1 The Rationale for Counterfactual Analysis . . . . . . . . . . . . . . . . 489
14.2 Difference-in-Differences . . . . . . . . . . . . . . . . . . . . . . . . . . . . 492
14.3 Propensity Score Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . 498
14.4 Regression Discontinuity Design . . . . . . . . . . . . . . . . . . . . . . . 512
14.5 Instrumental Variable Estimation . . . . . . . . . . . . . . . . . . . . . . . 519
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 530
1 Statistical Tools for Program Evaluation: Introduction and Overview

1.1 The Challenge of Program Evaluation

The past 30 years have seen a convergence of management methods and practices
between the public sector and the private sector, not only at the central government
level (in particular in Western countries) but also at upper levels (European
Commission, OECD, IMF, World Bank) and local levels (municipalities, cantons,
regions). This “new public management” intends to rationalize public spending,
boost the performance of services, get closer to citizens’ expectations, and contain
deficits. A key feature of this evolution is that program evaluation is nowadays part
of the policy-making process or, at least, on its way of becoming an important step
in the design of public policies. Public programs must show evidence of their
relevance, financial sustainability and operationality. Although not yet systemati-
cally enacted, program evaluation intends to grasp the impact of public projects on
citizens, as comprehensively as possible, from economic to social and environmen-
tal consequences on individual and collective welfare. As can be deduced, the task
is highly challenging as it is not so easy to put a value on items such as welfare,
health, education or changes in the environment. The task is all the more demanding
as a significant level of expertise is required for measuring those impacts or for
comparing different policy options.
The present chapter offers an introduction to the main concepts that will be used
throughout the book. First, we shall start with defining the concept of program
evaluation itself. Although there is no consensus in this respect, we may refer to the
OECD glossary which states that evaluation is the “process whereby the activities
undertaken by ministries and agencies are assessed against a set of objectives or
criteria.” According to Michael Quinn Patton, former President of the American
Evaluation Association, program evaluation can also be defined as “the systematic
collection of information about the activities, characteristics, and outcomes of
programs, for use by people to reduce uncertainties, improve effectiveness, and
make decisions.” We may also propose our own definition of the concept: program
evaluation is a process that consists in collecting, analyzing, and using information


Fig. 1.1 Program evaluation frame. Needs (indicators of context), design (objectives/targets), inputs (indicators of means), outputs (indicators of realization), and short- and long-term outcomes (result and impact indicators), assessed along three dimensions: relevance, efficiency and effectiveness

to assess the relevance of a public program, its effectiveness and its efficiency.
Those concepts are further detailed below. Note that a distinction will be made
throughout the book between a program and its alternative and competing strategies
of implementation. By strategies, we mean the range of policy options or public
projects that are considered within the framework of the program. The term
program, on the other hand, has a broader scope and relates to the whole range of
steps that are carried out in order to attain the desired goal.
As shown in Fig. 1.1, a program can be described in terms of needs, design,
inputs and outputs, short and long-term outcomes. Needs can be defined as a desire
to improve current outcomes or to correct them if they do not reach the required
standard. Policy design is about the definition of a course of action intended to meet
the needs. The inputs represent the resources or means (human, financial, and
material) used by the program to carry out its activities. The outputs stand for
what comes out directly from those activities (the intervention) and which are under
direct control of the authority concerned. The short-term and long-term outcomes
stand for effects that are induced by the program but not directly under the control
of the authority. Those include changes in social, economic, environmental and
other indicators.
Broadly speaking, the evaluation process can be represented through a linear
sequence of four phases (Fig. 1.1). First, a context analysis must gather information
and determine needs. For instance, it may evidence a high rate of school dropout
among young people in a given area. A program may help teachers, families and
children and contribute to prevent or contain dropout. If the authority feels that the
consequences on individual and collective welfare are great enough to justify the
design of a program, and if such a program falls within their range of competences,
then they may wish to put it forward. Context analysis relies on descriptive and
inferential statistical tools to point out issues that must be addressed. Then, the
assessment of the likely welfare changes that the program would bring to citizens
is a crucial task that uses various techniques of preference revelation and
measurement.
Second, ex-ante evaluation is interested in setting up objectives and solutions to
address the needs in question. Ensuring the relevance of the program is an essential
part of the analysis. Does it make sense within the context of its environment?
Coming back to our previous example, the program can for instance consist of
alternative educational strategies of follow-up for targeted schoolchildren, with
various projects involving their teachers, families and community. Are those
strategies consistent with the overall goal of the program? It is also part of this
stage to define the direction of the desired outcome (e.g., dropout reduction) and,
sometimes, the desired outcome that should be arrived at, namely the target (e.g., a
reduction by half over the project time horizon). Another crucial issue is to select a
particular strategy among the competing ones. In this respect, methods of ex-ante
evaluation include financial appraisal, budget impact analysis, cost benefit analysis,
cost effectiveness analysis and multi-criteria decision analysis. The main concern is
to find the most efficient strategy. Efficiency can be defined as the ability of the
program to achieve the expected outcomes at reasonable costs (e.g., is the budget
burden sustainable? Is the strategy financially and economically profitable? Is it
cost-effective?).
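The efficiency question can be made concrete with a toy computation. The figures and strategy names below are hypothetical (and written in Python, whereas the book's later implementations use R-CRAN); they simply illustrate comparing competing strategies by their cost per unit of outcome:

```python
# Hypothetical strategies of a dropout-prevention program:
# (name, total cost in euros, dropouts prevented)
strategies = [
    ("tutoring", 120_000.0, 80),
    ("family follow-up", 90_000.0, 50),
    ("mentoring", 200_000.0, 110),
]

# Average cost-effectiveness ratio: cost per dropout prevented
ratios = {name: cost / effect for name, cost, effect in strategies}

# Under this single criterion, the most efficient strategy is the
# one with the lowest cost per outcome unit
best = min(ratios, key=ratios.get)
```

Here tutoring costs 1500 euros per dropout prevented, the lowest of the three. Chapter 10 develops this logic properly, including incremental comparisons and the efficiency frontier.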
Third, during the implementation phase, it is generally advised to design a
monitoring system to help the managers follow the implementation and delivery
of the program. Typical questions are the following. Are potential beneficiaries
aware of the program? Do they have access to it? Is the application and selection
procedure appropriate? Indicators of means (operating expenditures, grants
received, number of agents) and indicators of realization (number of beneficiaries
or users) can be used to measure the inputs and the outputs, respectively. Addition-
ally, a set of management and accounting indicators can be constructed and
collected to relate the inputs to the outputs (e.g., operating expenditures per user,
number of agents per user). Building a well documented data management system
is crucial for two reasons. First, those performance indicators can be used to report
progress and alert managers to problems. Second, they can be used subsequently for
ex-post evaluation purposes.
Last, the main focus of ex post evaluation is on effectiveness, i.e. the extent to
which planned outcomes are achieved as a result of the program, ceteris paribus.
Among others, methods include benchmarking, randomized controlled experiments
and quasi-experiments. One difficulty is the time frame. For instance, the informa-
tion needed to assess the program’s outcomes is sometimes fully available only
several years after the end of the program. For this reason, one generally
distinguishes the short-term outcomes, i.e. the immediate effects on individuals’
status as measured by a result indicator (e.g., rate of dropout during mandatory
school time) from the longer term outcomes, i.e. the environmental, social and
economic changes as measured by impact indicators (e.g., the impact of dropout on
unemployment). In practice, ex post evaluation focuses mainly on short-term
outcomes, with the aim to measure what has happened as a direct consequence of
the intervention. The analysis also assesses what the main factors behind success or
failure are.
We should come back to this distinction that we already pointed out between
efficiency and effectiveness. Effectiveness is about the level of outcome per se and
whether the intervention was successful or not in reaching a desired target.
Depending on the policy field, the outcome in question may differ greatly. In
health, for instance, the outcome can relate to survival. In education, it can be
school completion. Should an environmental program aim at protecting and restoring
watersheds, then the outcome would be water quality. An efficiency analysis on
the other hand has a broader scope as it relates the outcomes of the intervention to
its cost.
Note also that evaluation should not be mistaken for monitoring. Roughly
speaking, monitoring refers to the implementation phase and aims to measure
progress and achievement all along the program’s lifespan by comparing the inputs
with the achieved outputs. The approach consists in defining performance
indicators, routinely collecting data, and examining progress through time in order to
reduce the likelihood of facing major delays or cost overruns. While it constitutes
an important step of the intervention logic of a program, monitoring is not about
evaluating outcomes per se and, as such, will be disregarded in the present work.
The remainder of the chapter is organized as follows. Section 1.2 offers a description of the
tools that can be used to assess the context of a public program. Sections 1.3 and 1.4
are about ex-ante and ex-post evaluations respectively. Section 1.5 explains how to
use the book.

1.2 Identifying the Context of the Program

The first step of the intervention logic is to describe the social, economic and
institutional context in which the program is to be implemented. Identifying
needs, determining their extent, and accurately defining the target population are
the key issues. The concept of “needs” can be defined as the difference, or gap,
between a current situation and a reasonably desired situation. Needs assessment
can be based on a cross-sectional study (comparison of several jurisdictions at one
specific point in time), a longitudinal study (repeated observations over several
periods of time), or a panel data study (both time and individual dimensions are
taken into account). Statistical tools which are relevant in this respect are numerous.
Figure 1.2 offers an illustration.
First, a distinction is made between descriptive statistics and inferential statis-
tics. Descriptive statistics summarizes data numerically, graphically or with tables.
The main goal is the identification of patterns that might emerge in a sample. A
sample is a subset of the general population. The process of sampling is far from
straightforward and it requires an accurate methodology if the sample is to ade-
quately represent the population of interest. Descriptive statistical tools include
measures of central tendency (mean, mode, median) to describe the central position
of observations in a group of data, and measures of variability (variance, standard
deviation) to summarize how spread out the observations are. Descriptive statistics
does not claim to generalize the results to the general population. Inferential
statistics on the other hand relies on the concept of confidence interval, a range of
values which is likely to include an unknown characteristic of a population. This
population parameter and the related confidence interval are estimated from the
sample data. The method can also be used to test statistical hypotheses, e.g.,
whether the population parameter is equal to some given value or not.
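These two steps can be sketched with hypothetical data (in Python, whereas the book's implementations use R-CRAN): descriptive summaries of a sample, then a confidence interval for the population mean.

```python
import math
import statistics

# Hypothetical sample: dropout rates (%) observed in ten districts
sample = [8.2, 5.1, 9.4, 7.0, 6.3, 10.1, 4.8, 7.7, 6.9, 8.5]

# Descriptive statistics: central tendency and variability
mean = statistics.mean(sample)        # 7.4
median = statistics.median(sample)    # 7.35
sd = statistics.stdev(sample)         # sample standard deviation (n - 1)

# Inferential statistics: 95% confidence interval for the population
# mean, using the normal approximation (z = 1.96; with n = 10 a
# Student t quantile would be more accurate, but the logic is the same)
half_width = 1.96 * sd / math.sqrt(len(sample))
ci = (mean - half_width, mean + half_width)
```

The interval `ci` is the range of values likely to contain the unknown population mean; Chapter 3 details its construction and use for hypothesis testing.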
Fig. 1.2 Statistical methods at a glance. A sample drawn from the population is summarized by descriptive statistics, through univariate, bivariate or multivariate analysis; inferential statistics then generalizes from the sample back to the population

Second, depending on the number of variables that are examined, a distinction is
made between univariate, bivariate and multivariate analyses. Univariate analysis is
the simplest form and it examines one single variable at a time. Bivariate analysis
focuses on two variables per observation simultaneously with the goal of
identifying and quantifying their relationship using measures of association and
making inferences about the population. Last, multivariate analyses are based on
more than two variables per observation. More advanced tools, e.g., econometric
analysis, must be employed in that context. Broadly speaking, the approach consists
in estimating one or several equations that the evaluator thinks are relevant to
explain a phenomenon. A dependent variable (explained or endogenous variable)
is then expressed as a function of several independent variables (explanatory or
exogenous variables, or regressors).
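A minimal version of this idea is the simple regression with one regressor. The data below are hypothetical, and the sketch (plain Python, no econometric library) merely shows a dependent variable expressed as a function of one explanatory variable by ordinary least squares, which Chapter 5 develops in full:

```python
# Hypothetical data: dropout rate (y, %) explained by the local
# unemployment rate (x, %)
x = [4.0, 6.5, 8.0, 5.5, 9.0, 7.0]
y = [5.0, 7.2, 9.5, 6.4, 10.8, 8.1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# OLS estimates for y = a + b*x + error:
# slope b = cov(x, y) / var(x), intercept a = mean(y) - b * mean(x)
b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
     / sum((xi - mean_x) ** 2 for xi in x))
a = mean_y - b * mean_x

fitted = [a + b * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]
```

With an intercept in the model, the residuals sum to zero by construction; a positive slope `b` would here indicate that dropout rises with unemployment in this toy sample.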
Third, program evaluation aims at identifying how the population would fare if
the identified needs were met. To do so, the evaluator has to assess the indirect costs
(negative externalities) as well as benefits (direct utility, positive externalities) to
society. When possible, these items are expressed in terms of equivalent money-
values and referred to as the willingness to pay for the benefits of the program or the
willingness to accept its drawbacks. In other cases, especially in the context of
health programs, those items must be expressed in terms of utility levels (e.g.,
quality-adjusted life years, also known as QALYs). Several methods exist
with their pros and cons (see Fig. 1.3). For instance, stated preference methods
(contingent valuation and discrete choice experiment) exploit specially constructed
questionnaires to elicit willingness to pay. Their main shortcoming is the failure to
properly account for the cognitive constraints and strategic behavior of the agents
participating in the experiment, so that stated preferences may
not fully reflect genuine preferences. Revealed preference methods use
information from related markets and examine how agents behave in the face of
real choices (hedonic-pricing and travel-cost methods). The main advantage of
those methods is that they imply real money transactions and, as such, avoid the
Fig. 1.3 Estimation of welfare changes. Monetized outcomes are valued through stated preference methods (contingent valuation, discrete choice experiment), where costs and benefits are inferred from specially constructed questionnaires, or through revealed preference methods (hedonic pricing, travel cost method), where they are inferred from what is observed on existing markets. Non-monetized outcomes are valued in QALYs (standard gamble, time trade-off, discrete choice experiment), via the construction of multiattribute utility functions

potential problems associated with hypothetical responses. They require, however,
large datasets and rest on sets of assumptions that can be controversial. Last,
health technology assessment has developed an ambitious framework for
evaluating personal perceptions of the health states individuals are in or may fall
into. Contrary to revealed or stated preferences, this valuation does not involve any
monetization of the consequences of a health program on individual welfare.
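The QALY outcome just mentioned simply weights time spent in each health state by a utility index between 0 (death) and 1 (full health). A minimal sketch in Python, with purely illustrative utility weights (real weights would come from standard gamble or time trade-off exercises):

```python
def qalys(states):
    """Sum of (duration in years) x (utility weight) over successive health states.

    `states` is a list of (years, utility) pairs; utilities lie on a 0-1 scale
    where 1 is full health. The profile below is hypothetical.
    """
    return sum(years * utility for years, utility in states)

# Hypothetical patient: 2 years in full health, then 3 years at utility 0.7
print(round(qalys([(2, 1.0), (3, 0.7)]), 2))  # -> 4.1
```

Five calendar years thus count as 4.1 quality-adjusted years under this profile.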
Building a reliable and relevant database is a key aspect of context analysis.
Often one cannot rely on pre-existing sources of data and a survey must be
implemented to collect information from some units of a population. The design
of the survey has its importance. It is critical to be clear on the type of information
one needs (individuals and organizations involved, time period, geographical area),
and on how the results will be used and by whom. The study must not only concern
the socio-economic conditions of the population (e.g., demographic dynamics, GDP
growth, unemployment rate) but must also account for the policy and institutional
aspects, the current infrastructure endowment and service provision, the existence
of environmental issues, etc. A good description of the context and reliable data are
essential, especially if one wants to forecast future trends (e.g., projections on users,
benefits and costs) and motivate the assumptions that will be made in the
subsequent steps of the program evaluation.

1.3 Ex ante Evaluation Methods

Making decisions in a non-market environment does not mean the absence of a
budget constraint. In the context of decisions on public projects, there are usually
fixed sectoral (healthcare, education, etc.) budgets from which to pick the resources
required to fund interventions. Ex ante evaluation is concerned with designing
public programs that achieve some effectiveness, given those budget constraints.
Different forms of evaluation can take place depending on the type of outcome that
is analyzed. It is therefore crucial to clearly determine the program’s goals and
objectives before carrying out an evaluation. The goal can be defined as a statement
of the desired effect of the program. The objectives on the other hand stand for
specific statements that support the accomplishment of the goal.
Different strategies/options can be envisaged to address the objectives of the
program. It is important that those alternative strategies are compared on the basis
of all relevant dimensions, be it technological, institutional, environmental, finan-
cial, social and economic. Among others, most popular methods of comparison
include financial analysis, budget impact analysis, cost benefit analysis, cost effec-
tiveness analysis and multi-criteria decision analysis. Each of these methods has its
specificities. The key elements of a financial analysis are the cost and revenue
forecasts of the program. The development of the financial model must consider
how those items interact with each other to ensure both the sustainability (capacity
of the project revenues to cover the costs on an annual basis) and profitability
(capacity of the project to achieve a satisfactory rate of return) of the program.
Budget impact analysis examines the extent to which the introduction of a new
strategy in an existing program affects the authority’s budget as well as the level
and allocation of outcomes amongst the interventions (including the new one). Cost
benefit analysis aims to compare cost forecasts with all social, economic and
environmental benefits, expressed in monetary terms. Cost effectiveness analysis
on the other hand focuses on one single measure of effectiveness and compares the
relative costs and outcomes of two or more competing strategies. Last, multi-
criteria decision analysis is concerned with the analysis of multiple outcomes that
are not monetized but reflect the several dimensions of the pursued objective.
Financial flows may be included directly in monetary terms (e.g., a cost, an average
wage) but other outcomes are expressed in their natural unit (e.g., success rate,
casualty frequency, utility level).
Figure 1.4 underlines roughly the differences between the ex ante evaluation
techniques. All approaches account for cost considerations. Their main difference is
with respect to the outcome they examine.

Financial Analysis Versus Cost Benefit Analysis A financial appraisal examines
the projected revenues with the aim of assessing whether they are sufficient to cover
expenditures and to make the investment sufficiently profitable. Cost benefit analy-
sis goes further by considering also the satisfaction derived from the consumption
of public services. All effects of the project are taken into account, including social,
economic and environmental consequences. The approaches are thereby different,
but also complementary, as a project that is financially viable is not necessarily
economically relevant and vice versa. In both approaches, discounting can be used
to compare flows occurring at different time periods. The idea is based on the
principle that, in most cases, citizens prefer to receive goods and services now
rather than later.
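Discounting can be sketched as a net present value computation. The cash flows and the 5% discount rate below are purely illustrative, not recommended values:

```python
def npv(rate, flows):
    """Net present value of `flows`, where flows[t] occurs at the end of year t
    (flows[0] is the initial, undiscounted amount)."""
    return sum(f / (1 + rate) ** t for t, f in enumerate(flows))

# Illustrative project: outlay of 100 now, then inflows of 60 in years 1 and 2,
# discounted at 5% per year
print(round(npv(0.05, [-100, 60, 60]), 2))  # -> 11.56
```

A positive net present value indicates that, at the chosen rate, discounted revenues exceed discounted costs; the result is sensitive to the rate, which is why its choice is discussed at length in financial analysis.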

Fig. 1.4 Ex ante evaluation techniques. [Diagram: ex ante evaluation divides into financial evaluation and economic evaluation. Under financial evaluation (monetized outcomes), a single strategy is appraised by financial analysis and multiple strategies by budget impact analysis. Under economic evaluation, multiple monetized outcomes call for cost benefit analysis, a single non-monetized outcome for cost effectiveness analysis, and multiple non-monetized outcomes for multi-criteria decision analysis.]

Budget Impact Versus Cost Effectiveness Analysis Cost effectiveness analysis
selects the set of most efficient strategies by comparing their costs and their
outcomes. By definition, a strategy is said to be efficient if no other strategy or
combination of strategies is as effective at a lower cost. Yet, even when efficient, the
adoption of a strategy not only modifies the way demand is addressed but may also
divert demand from other types of intervention. The purpose of budget impact
analysis is to analyze this change and to evaluate the budget and outcome changes
initiated by the introduction of the new strategy. A budget impact analysis measures
the evolution of the number of users or patients through time and multiplies this
number by the unit cost of the interventions. The aim is to provide the decision-
maker with a better understanding of the total budget required to fund the
interventions. It is usually performed in parallel to a cost effectiveness analysis.
The two approaches are thus complementary.
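The users-times-unit-cost logic of budget impact analysis can be sketched as follows. All figures and intervention names are hypothetical:

```python
def budget(users_per_intervention, unit_costs):
    """Total budget = sum over interventions of (number of users x unit cost)."""
    return sum(users_per_intervention[i] * unit_costs[i]
               for i in users_per_intervention)

unit_costs = {"old": 100, "new": 150}   # hypothetical unit costs per user
before = {"old": 1000, "new": 0}        # users before the new strategy
after = {"old": 600, "new": 500}        # some demand diverted, some newly created
impact = budget(after, unit_costs) - budget(before, unit_costs)
print(impact)  # -> 35000
```

The budget impact here is the difference between the with-strategy and without-strategy budgets; in practice the user counts would be projected year by year.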

Cost Benefit Versus Cost Effectiveness Analysis Cost benefit analysis compares
strategies based on the net welfare each strategy brings to society. The approach
rests on monetary measures to assess those impacts. Cost effectiveness analysis on
the other hand is a tool applicable to strategies where benefits can be identified but
where it is not possible or relevant to value them in monetary terms (e.g., a survival
rate). The approach does not sum the cost with the benefits but, instead, relies on
pairwise comparisons by valuing cost and effectiveness differences. A key feature
of the approach is that only one benefit can be used as a measure of effectiveness.

For instance, quality adjusted life years (QALYs) are a frequently used measure of
outcome. While cost effectiveness analysis has become a common instrument for
the assessment of public health decisions, it is far from widely used in other fields of
collective decisions (transport, environment, education, security) unlike cost
benefit analysis.
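The pairwise comparison described above is commonly summarized by an incremental cost-effectiveness ratio (ICER). A minimal sketch with hypothetical numbers:

```python
def icer(cost_new, eff_new, cost_old, eff_old):
    """Incremental cost-effectiveness ratio: extra cost per extra unit of
    effectiveness (e.g., per QALY gained) of the new strategy over the old."""
    return (cost_new - cost_old) / (eff_new - eff_old)

# Hypothetical strategies, effectiveness measured in QALYs
print(icer(12000, 6.0, 9000, 5.5))  # -> 6000.0 (cost per QALY gained)
```

The ratio would then be compared with a cost-effectiveness threshold chosen by the decision-maker.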

Cost Benefit Versus Multi-criteria Decision Analysis Multi-criteria decision
analysis is used whenever several outcomes have to be taken into account yet
cannot be easily expressed in monetary terms. For instance, a project may have
major environmental impacts, but it may prove difficult to estimate the willingness to
pay of agents to avoid ecological and health risks. In that context, it becomes
impossible to incorporate these elements into a conventional cost benefit analysis.
Multi-criteria decision analysis overcomes this issue by measuring those
consequences on numerical scales or by including qualitative descriptions of the
effects. In its simplest form, the approach aims to construct a composite indicator
that encompasses all those different measurements and allows the stakeholders’
opinions to be accounted for. Weights are assigned on the different dimensions by
the decision-maker. Cost benefit analysis on the other hand does not need to assign
weights. Using a common monetary metric, all effects are summed into a single
value, the net benefit of the strategy.
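In its simplest form, the composite indicator is a weighted sum of criterion scores measured on a common scale. The weights and scores below are hypothetical:

```python
def composite(scores, weights):
    """Weighted-sum composite indicator over criteria measured on a common
    (e.g., 0-100) scale; the weights should sum to 1."""
    return sum(weights[c] * scores[c] for c in weights)

weights = {"cost": 0.5, "environment": 0.3, "safety": 0.2}  # decision-maker's weights
option_a = {"cost": 80, "environment": 40, "safety": 70}    # hypothetical scores
option_b = {"cost": 60, "environment": 90, "safety": 50}
print(round(composite(option_a, weights), 1),
      round(composite(option_b, weights), 1))  # -> 66.0 67.0
```

With these weights option B ranks slightly ahead; a different weighting of the environmental criterion could reverse the ranking, which is why weight elicitation is a central step of the method.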

1.4 Ex post Evaluation

Demonstrating that a particular intervention has induced a change in the level of
effectiveness is often made difficult by the presence of confounding variables that
connect with both the intervention and the outcome variable. It is important to keep
in mind that there is a distinction between causation and association. Imagine for
instance that we would like to measure the effect of a specific training program,
(e.g., evening lectures) on academic success among students at risk of school
failure. The characteristics of the students, in particular their motivation and
abilities, are likely to affect their grades but also their participation in the program.
It is thereby the task of the evaluator to control for those confounding factors and
sources of potential bias. As shown in Fig. 1.5, one can distinguish three types of
evaluation techniques in this matter: randomized controlled experiment,
benchmarking analysis and quasi-experiment.
In essence, a controlled experiment aims to reduce the differences
among users before the intervention has taken place by comparing groups of similar
characteristics. The subjects are randomly separated into one or more control
groups and treatment groups, which allows the effects of the treatment to be
isolated. For example, in a clinical trial, one group may receive a drug while
another group may receive a placebo. The experimenter then can test whether the
differences observed between the groups on average (e.g., health condition) are
caused by the intervention or due to other factors. A quasi-experiment on the other
hand controls for the differences among units after the intervention has taken place.

Fig. 1.5 Ex-post evaluation techniques. [Diagram: ex post evaluation divides into random assignment and observational study. Random assignment, with an observable outcome, leads to the controlled experiment. An observational study leads to benchmarking analysis when inputs are observable and to the quasi-experiment when the outcome is observable.]

It does not attempt to manipulate or influence the environment. Data are only
observed and collected (observational study). The evaluator then must account
for the fact that multiple factors may explain the variations observed in the variable
of interest. In both types of study, descriptive and inferential statistics play a
determinant role. They can be used to show evidence of a selection bias, for
instance when some members of the population are inadequately represented in
the sample, or when some individuals select themselves into a group.
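The comparison of group averages in a randomized experiment can be sketched as a difference in means together with its standard error, the basis of a two-sample test. The outcome data below are illustrative:

```python
import math

def mean_diff_and_se(treated, control):
    """Difference in group means and its (unpooled) standard error --
    the basis of a two-sample comparison in a randomized experiment."""
    def mean(xs):
        return sum(xs) / len(xs)
    def var(xs):  # sample variance with n-1 denominator
        m = mean(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    diff = mean(treated) - mean(control)
    se = math.sqrt(var(treated) / len(treated) + var(control) / len(control))
    return diff, se

# Hypothetical outcomes (e.g., test scores) in two randomized groups
treated = [72, 75, 78, 71, 74]
control = [68, 70, 69, 72, 66]
diff, se = mean_diff_and_se(treated, control)
print(round(diff, 2), round(se, 2))  # -> 5.0 1.58
```

A difference of about three standard errors, as here, would usually be taken as evidence that the treatment, rather than chance, explains the gap, provided randomization was properly carried out.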
The main goal of ex post evaluation is to answer the question of whether the
outcome is the result of the intervention or of some other factors. The true challenge
here is to obtain a measure of what would have happened if the intervention did not
take place, the so-called counterfactual. Different evaluation techniques can be put
in place to achieve this goal. As stated above, one way is through a randomized
controlled experiment. Other ways include difference-in-differences, propensity
score matching, regression discontinuity design, and instrumental variables. All
those quasi-experimental techniques aim to prove causality by using an adequate
identification strategy to approach a randomized experiment. The idea is to estimate
the counterfactual by constructing a control group that is as close as possible to the
treatment group.
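Among the quasi-experimental techniques just listed, difference-in-differences has the simplest arithmetic: the control group's before-after change proxies the counterfactual trend for the treated. A sketch with hypothetical group averages:

```python
def did(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Difference-in-differences estimate: the treated group's change minus
    the control group's change (the latter proxies the counterfactual trend)."""
    return (treat_post - treat_pre) - (ctrl_post - ctrl_pre)

# Hypothetical average outcomes before/after the intervention
print(did(60.0, 70.0, 58.0, 62.0))  # -> 6.0
```

Of the ten-point improvement observed in the treated group, four points are attributed to the common trend and six to the intervention, under the parallel-trends assumption.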
Another important aspect to account for is whether the program has been
operated in the most effective way in terms of input combination and use. Often,
for projects of magnitude, there are several facilities that operate independently in
their geographical area. Examples include schools, hospitals, prisons, social
centers, fire departments. It is the task of the evaluator to assess whether the
provision of services meets management standards. Yet, the facilities involved
in the implementation process may face different constraints, specific demand
settings and may have chosen different organizational patterns. To overcome those
issues, one may rely on a benchmarking analysis to compare the cost structure of
the facilities with that of a given reference, the benchmark.
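In its most rudimentary form, such a benchmarking exercise compares each facility's unit cost with the best observed one, which serves as the benchmark. Figures and facility names below are hypothetical (actual benchmarking typically uses richer frontier methods):

```python
def efficiency_scores(unit_costs):
    """Ratio of the lowest observed unit cost (the benchmark) to each
    facility's unit cost; the benchmark facility scores 1.0."""
    best = min(unit_costs.values())
    return {name: best / cost for name, cost in unit_costs.items()}

# Hypothetical cost per pupil in three schools
scores = efficiency_scores({"school_a": 4000, "school_b": 5000, "school_c": 4400})
print(scores["school_a"], scores["school_b"],
      round(scores["school_c"], 2))  # -> 1.0 0.8 0.91
```

A score below one flags a facility whose unit cost exceeds the benchmark, though differences in constraints and demand settings must be kept in mind before drawing conclusions.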
Choosing which method to use mainly depends on the context of analysis. For
instance, random assignment is not always possible legally, technically or ethically.
Another problem with random assignment is that it can demotivate those who have
been randomized out, or generate noncompliance among those who have been
randomized in. In those cases, running a quasi-experiment is preferable. In other
cases, the outcome in question is not easily observable and one may rely instead on
a simpler comparison of outputs, and implement a benchmarking analysis. The time
horizon and data availability thus also determine the choice of the method.

1.5 How to Use the Book?

The goal of the book is to provide the readers with a practical guide that covers the
broad array of methods previously mentioned. The brief description of the method-
ology, the step-by-step approach, and the systematic use of numerical illustrations
allow readers to become fully operational in handling the statistics of public project evaluation.
The first part of the book is devoted to context analysis. It develops statistical
tools that can be used to get a better understanding of problems and needs: Chap. 2
is about sampling methods and the construction of variables; Chap. 3 introduces the
basic methods of descriptive statistics and confidence interval estimation; Chap. 4
explains how to measure and visualize associations among variables; Chap. 5
describes the econometric approach and Chap. 6 is about the estimation of welfare
changes.
The second part of the book then presents ex ante evaluation methods: Chap. 7
develops the methodology of financial analysis and details several concepts such as
the interest rate, the time value of money or discounting; Chap. 8 includes a detailed
description of budget impact analysis and extends the financial methodology to a
multiple demand structure; Chaps. 9, 10 and 11 relate to the economic evaluation of
the interventions and successively describe the methodologies of cost benefit
analysis, cost-effectiveness analysis, and multi-criteria decision analysis.
Those economic approaches offer a way to compare alternative courses of action in
terms of both their costs and their overall consequences and not on their financial
flows only.
Last but not least, the third part of this book is about ex post evaluation, i.e. the
assessment of the effects of a strategy after its implementation. The key issue here is
to control for all those extra factors that may affect or bias the conclusion of the
study. Chapter 12 introduces follow up by benchmarking. Chapter 13 explains the
experimental approach. Chapter 14 details the different quasi-experimental
techniques (difference-in-differences, propensity score matching, regression dis-
continuity design, and instrumental variables) that can be used when faced with
observational data.

We have tried to make each chapter as independent of the others as possible. The
book may therefore be read in any order. Readers can simply refer to the table of
contents and select the method they are interested in. Moreover, each chapter
contains bibliographical guidelines for readers who wish to explore a statistical
tool more deeply. Note that this book assumes at least a basic knowledge of
economics, mathematics and statistics. If you are unfamiliar with the concept of
inferential statistics, we strongly recommend you to read the first chapters of
the book.
Most of the information that is needed to understand a particular technique is
contained in the book. Each chapter includes its own material, in particular numeri-
cal examples that can be easily reproduced. When possible, formulas in Excel are
provided. When Excel is no longer suitable for addressing specific statistical issues,
we rely instead on R-CRAN, a free software environment for statistical computing
and graphics. The software can be easily downloaded from the internet. Code is
provided throughout the book with dedicated comments and descriptions. If you have
questions about R-CRAN like how to download and install the software, or what the
license terms are, please go to https://www.r-project.org/.

Bibliographical Guideline
The book provides a self-contained introduction to the statistical tools required for
conducting evaluations of public programs, which are advocated by the World
Bank, the European Union, the Organization for Economic Cooperation and Devel-
opment, as well as many governments. Many other guides exist, most of them being
provided by those institutions. We may name in particular the Magenta Book and
the Green Book, both published by HM Treasury in the UK. Moreover, the reader
can refer to the guidance document on monitoring and evaluation of the European
Commission as well as its guide to cost benefit analysis and to the evaluation of
socio-economic development. The World Bank also offers an accessible introduc-
tion to the topic of impact evaluation and its practice in development. All those
guides present the general concepts of program evaluation as well as
recommendations. Note that the definition of “program evaluation” used in this
book is from Patton (2008, p. 39).

Bibliography
European Commission. (2013). The resource for the evaluation of socio-economic development.
European Commission. (2014). Guide to cost-benefit analysis of investment projects.
European Commission. (2015). Guidance document on monitoring and evaluation.
HM Treasury. (2011a). The green book. Appraisal and evaluation in Central Government.
HM Treasury. (2011b). The magenta book. Guidance for evaluation.
Patton, M. Q. (2008). Utilization focused evaluation (4th ed.). Saint Paul, MN: Sage.
World Bank. (2011). Impact evaluation in practice.
Part I
Identifying the Context of the Program
2 Sampling and Construction of Variables

2.1 A Step Not to Be Taken Lightly

Building a reliable and relevant database is a key aspect of any statistical study. Not
only can misleading information create bias and mistakes, but it can also seriously
affect public decisions if the study is used for guiding policy-makers. The first role
of the analyst is therefore to provide a database of good quality. Dealing with this
can be a real struggle, and the amount of resources (time, budget, personnel)
dedicated to this activity should not be underestimated.
There are two types of sources from which the data can be gathered. On the one
hand, one may rely on pre-existing sources such as data on privately held com-
panies (employee records, production records, etc.), data from government agencies
(ministries, central banks, national institutes of statistics), from international insti-
tutions (World Bank, International Monetary Fund, Organization for Economic
Co-operation and Development, World Health Organization) or from
non-governmental organizations. When such databases are not available, or if
information is insufficient or doubtful, the analyst has to rely instead on what we
might call a homemade database. In that case, a survey is implemented to collect
information from some or all units of a population and to compile the information
into a useful summary form. The aim of this chapter is to provide a critical review
and analysis of good practices for building such a database.
The primary purpose of a statistical study is to provide an accurate description of
a population through the analysis of one or several variables. A variable is a charac-
teristic to be measured for each unit of interest (e.g., individuals, households, local
governments, countries). There are two types of design to collect information about
those variables: census and sample survey. A census is a study that obtains data
from every member of a population of interest. A sample survey is a study that
focuses on a subset of a population and estimates population attributes through
statistical inference. In both cases, the collected information is used to calculate
indicators for the population as a whole.

© Springer International Publishing AG 2017
J.-M. Josselin, B. Le Maux, Statistical Tools for Program Evaluation,
DOI 10.1007/978-3-319-52827-4_2

Since the design of information collection may strongly affect the cost of survey
administration, as well as the quality of the study, knowing whether the study
should be on every member or only on a sample of the population is of high impor-
tance. In this respect, the quality of a study can be thought of in terms of two types
of error: sampling and non-sampling errors. Sampling errors are inherent to all
sample surveys and occur because only a share of the population is examined.
Evidently, a census has no sampling error since the whole population is examined.
Non-sampling errors consist of a wide variety of inaccuracies or miscalculations
that are not related to the sampling process, such as coverage errors, measurement
and nonresponse errors, or processing errors. A coverage error arises when there is
non-concordance between the study population and the survey frame. Measurement
and nonresponse errors occur when the response provided differs from the real
value. Such errors may be caused by the respondent, the interviewer, the format of
the questionnaire, or the data collection method. Last, a processing error is an error
arising from data coding, editing or imputation.
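Sampling error, the first of the two error types above, can be made concrete with a toy simulation: repeated samples drawn from the same population yield different estimates of the population mean. The population and sample size are purely illustrative:

```python
import random

def sample_mean(population, n, seed):
    """Mean of a simple random sample of size n drawn without replacement.
    Different draws (seeds) give different estimates: this spread is the
    sampling error."""
    rng = random.Random(seed)
    return sum(rng.sample(population, n)) / n

# Toy population of 100 units whose true mean is 50.5
population = list(range(1, 101))
estimates = [sample_mean(population, 20, seed) for seed in range(5)]
print(estimates)  # five sample means scattered around the true mean of 50.5
```

A census of this population would return exactly 50.5 every time; the sample estimates fluctuate around it, and the fluctuation shrinks as the sample size grows.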
Before deciding to collect information, it is important to know whether studies
on a similar topic have been implemented before. If this is the case, then it
may be efficient to review the existing literature and methodologies. It is also
critical to be clear on the objectives, especially on the type of information one
needs (individuals and organizations involved, time period, geographical area), and
on how the results will be used and by whom. Once the process of data collection
has been initiated or a fortiori completed, it is usually extremely costly to try and
add new variables that were initially overlooked.
The construction of a database includes several steps that can be summarized as
follows. Section 2.2 describes how to choose a sample and its size when a census is
not carried out. Section 2.3 deals with the various ways of conceiving a question-
naire through different types of questions. Section 2.4 is dedicated to the process of
data collection as it details the different types of responding units and the corre-
sponding response rates. Section 2.5 shows how to code data for subsequent
statistical analysis.

2.2 Choice of Sample

First of all, it is very important to distinguish between the target population, the
sampling frame, the theoretical sample, and the final sample. Figure 2.1 provides a
summary description of how these concepts interact and how the sampling process
may generate errors.
The target population is the population for which information is desired; it
represents the scope of the survey. To identify the target population precisely,
there are three main questions that should be answered: who, where and when?
The analyst should specify precisely the type of units that is the main focus of the
study, their geographical location and the time period of reference. For instance, if
the survey aims at evaluating the impact of environmental pollution, the target
population would represent those who live within the geographical area over which

Fig. 2.1 From the target population to the final sample. [Diagram: the target population contains the survey population, their mismatch being the coverage error; a theoretical sample is drawn from the survey population, and nonresponse and processing errors separate it from the final sample.]

the pollution has an effect or those who may be using the contaminated resource. If
the survey is about the provision of a local public good, then the target population
may be the local residents or the taxpayers. As to a recreational site, or a better
access to that site, the target population consists of all potential users. Even at this
stage, care is required. For instance, a local public good may generate spillover
effects in neighboring jurisdictions, in which case it may be debated whether
the target population should reach beyond local boundaries.
Once the target population has been identified, a sample that best represents it
must be obtained. The starting point in defining an appropriate sample is to deter-
mine what is called a survey frame, which defines the population to be surveyed
(also referred to as the survey population or study population). It is a
list of all sampling units (list frame), e.g., the members of a population, which is
used as a basis for sampling. A distinction is made between identification data (e.g.,
name, exact address, identification number) and contact data (e.g., mailing address
or telephone number). Possible sampling frames include for instance a telephone
directory, an electoral register, employment records, school class lists, patient files
in a hospital, etc. Since the survey frame is not necessarily under the control of the
evaluator, the survey population may end up being quite different from the target
population (coverage errors), although ideally the two populations should coincide.
For large populations, because of the costs required for collecting data, a census
is not necessarily the most efficient design. In that case, an appropriate sample must
be obtained to save the time and, especially, the expense that would otherwise be
required to survey the entire population. In practice, if the survey is well-designed, a
sample can provide very precise estimates of population parameters. Yet, despite all
the efforts made, several errors may remain, in particular nonresponse, if the survey
fails to collect complete information on all units in the targeted sample. Thus,
depending on survey compliance, there might be a large difference between the
theoretical sample that was originally planned and the final sample. In addition to
these considerations, several processing errors may finally affect the quality of the
database.
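The coverage error described above can be sketched with simple set operations on unit identifiers: units of the target population missing from the frame constitute undercoverage, while frame units outside the target constitute overcoverage. The identifiers below are hypothetical:

```python
def coverage_gap(target_ids, frame_ids):
    """Units of the target population missing from the survey frame
    (undercoverage) and frame units outside the target (overcoverage)."""
    target, frame = set(target_ids), set(frame_ids)
    return sorted(target - frame), sorted(frame - target)

# Hypothetical unit identifiers for the target population and the frame
under, over = coverage_gap([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(under, over)  # -> [1] [6]
```

Unit 1 can never be sampled (undercoverage) and unit 6 does not belong to the target (overcoverage); ideally both lists would be empty.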
Another random document with
no related content on Scribd:
Head of the Great Comet of 1861
From a drawing by Warren De La Rue.
Halley's Comet, May 5, 1910
Photographed at the Yerkes Observatory by E E. Barnard, with the ten-inch Bruce
telescope.

This was shortly before the passage of the comet between the earth and the sun,
when some think its tail was thrown over us.

Let us begin with that time of the year when the sun arrives at the
vernal equinox. This occurs about the 21st of March. The sun is then
perpendicular over the equator, daylight extends, uninterrupted, from
pole to pole, and day and night (neglecting the effects of twilight and
refraction) are of equal length all over the earth. Everywhere there
are about twelve hours of daylight and twelve hours of darkness.
This is the beginning of the astronomical spring. As time goes on,
the motion of the sun in the ecliptic carries it eastward from the
vernal equinox, and, at the same time, owing to the inclination of the
ecliptic, it rises gradually higher above the equator, increasing its
northern declination slowly, day after day. Immediately the equality of
day and night ceases, and in the northern hemisphere the day
becomes gradually longer in duration than the night, while in the
southern hemisphere it becomes shorter. Moreover, because the sun
is now north of the equator, daylight no longer extends from pole to
pole on the earth, but the south pole is in continual darkness, while
the north pole is illuminated.
You can illustrate this, and explain to yourself why the relative
length of day and night changes, and why the sun leaves one pole in
darkness while rising higher over the other, by suspending a small
terrestrial globe with its axis inclined about 23½° from the
perpendicular, and passing a lamp around it in a horizontal plane. At
two points only in its circuit will the lamp be directly over the equator
of the globe. Call one of these points the vernal equinox. You will
then see that, when the lamp is directly over this point, its light
illuminates the globe from pole to pole, but when it has passed round
so as to be at a point higher than the equator, its light no longer
reaches the lower pole, although it passes over the upper one.
Fig. 11. The Seasons.

The earth is represented at four successive points in its orbit about the sun. Since
the axis of the earth is virtually unchangeable in its direction in space (leaving out
of account the slow effects of the precession of the equinoxes), it results that at
one time of the year, the north pole is inclined toward the sun and at the opposite
time of the year away from it. It attains its greatest inclination sunward at the
summer solstice, then the line between day and night lies 23½° beyond the north
pole, so that the whole area within the arctic circle is in perpetual daylight. The
days are longer than the nights throughout the northern hemisphere, but the day
becomes longer in proportion to the night as the arctic circle is approached, and
beyond that the sun is continually above the horizon. In the southern hemisphere
exactly the reverse occurs. When the earth has advanced to the autumn equinox,
the axis is inclined neither toward nor away from the sun. The latter is then
perpendicular over the equator and day and night are of equal length all over the
earth. When the earth reaches the winter solstice the north pole is inclined away
from the sun, and now it is summer in the southern hemisphere. At the vernal
equinox again there is no inclination of the axis either toward or away from the
sun, and once more day and night are everywhere equal. A little study of this
diagram will show why on the equator day and night are always of equal length.
Now, with the lamp thus elevated above the equator, set the globe
in rotation about its axis. You will perceive that all points in the upper
hemisphere are longer in light than in darkness, because the plane
dividing the illuminated and the unilluminated halves of the globe is
inclined to the globe's axis in such a way that it lies beyond the
upper pole as seen from the direction of the lamp. Consequently, the
upper half of the globe above the equator, as it goes round, has
more of its surface illuminated than unilluminated, and, as it turns on
its axis, any point in that upper half, moving round parallel to the
equator, is longer in light than in darkness. You will also observe that
the ratio of length of the light to the darkness is greater the nearer
the point lies to the pole, and that when it is within a certain distance
of the pole, corresponding with the elevation of the lamp above the
equator, it lies in continual light—in other words, within that distance
from the pole night vanishes and daylight is unceasing. At the same
time you will perceive that round the lower pole there is a similar
space within which day has vanished and night is unceasing, and
that in the whole of the lower hemisphere night is longer than day.
Exactly on the equator, day and night are always of equal length.
Endeavour to represent all this clearly to your imagination, before
actually trying the experiment, or consulting a diagram. If you try the
experiment you may, instead of setting the axis of the globe at a
slant, place it upright, and then gradually raise and lower the lamp as
it is carried round the globe, now above and now below the equator.
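The globe-and-lamp behaviour just described can be put into figures. The sketch below uses the standard sunrise equation, cos H = −tan(latitude) × tan(declination), where H is the half-day arc; this formula is assumed here, since the text itself gives only the qualitative picture:

```python
import math

def day_length_hours(latitude_deg, declination_deg):
    """Length of daylight in hours from the sunrise equation
    cos(H) = -tan(latitude) * tan(declination), where H is the
    half-day arc in degrees (15 degrees of arc = 1 hour)."""
    lat = math.radians(latitude_deg)
    dec = math.radians(declination_deg)
    x = -math.tan(lat) * math.tan(dec)
    if x <= -1.0:
        return 24.0   # perpetual day (within the polar circle in summer)
    if x >= 1.0:
        return 0.0    # perpetual night (within the polar circle in winter)
    half_arc_deg = math.degrees(math.acos(x))
    return 2.0 * half_arc_deg / 15.0

# At the June solstice (declination +23.5 degrees):
print(round(day_length_hours(0.0, 23.5), 1))   # on the equator: 12 hours
print(round(day_length_hours(50.0, 23.5), 1))  # mid-latitude: longer than 12 hours
print(day_length_hours(80.0, 23.5))            # inside the arctic circle: 24.0
```

The three cases reproduce the text's observations: equal day and night on the equator, days lengthening toward the pole, and unbroken daylight within 23½° of it.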
We return to our description of the actual movements of the sun.
As it rises higher from the equator, not only does the day increase in
length relatively to the night, but the rays of sunlight descend more
nearly perpendicular upon the northern hemisphere. The
consequence is that their heating effect upon the ground and the
atmosphere increases and the temperature rises until, when the sun
reaches its greatest northern declination, about the 22d of June
(when it is 23½° north of the equator), the astronomical summer
begins. This point in the sun's course through the circle of the ecliptic
is called the summer solstice (see Part I, Sect. 8). Having passed the
solstice, the sun begins to decline again toward the equator. For a
short time the declination diminishes slowly because the course of
the ecliptic close to the solstice is nearly parallel to the equator, and
in the meantime the temperature in the northern hemisphere
continues to increase, the amount of heat radiated away during the
night being less than that received from the sun during the day. This
condition continues for about six weeks, the greatest heats of
summer falling at the end of July or the beginning of August, when
the sun has already declined far toward the equator, and the nights
have begun notably to lengthen. But the accumulation of heat during
the earlier part of the summer is sufficient to counterbalance the loss
caused by the declension of the sun.
About the 23d of September the sun again crosses the equator,
this time at the autumnal equinox, the beginning of the astronomical
autumn, and after that it sinks lower and lower (while appearing to
rise in the southern hemisphere), until about the 22d of December,
when it reaches its greatest southern declination, 23½°, at the winter
solstice, which marks the beginning of the astronomical winter. It is
hardly necessary to point out that the southern winter corresponds in
time with the northern summer, and vice versa. From the winter
solstice the sun turns northward once more, reaching the vernal
equinox again on the 21st of March.
Thus we see that we owe the succession of the seasons entirely
to the inclination of the earth's axis out of a perpendicular to the
plane of the ecliptic. If there were no such inclination there would be
climate but no seasons. There would be no summer heat, except in
the neighbourhood of the equator, while the middle latitudes would
have a moderate temperature the year round. Owing to the effects of
refraction, perpetual day would prevail within a small region round
each of the poles. The sun would be always perpendicular over the
equator.
Two things remain to be pointed out with regard to the effect of the
sun's annual motion in the ecliptic. One of these is the circles called
the tropics. These are drawn round the earth parallel to the equator
and at a distance of 23½° from it, one in the northern and the other
in the southern hemisphere. The northern one is called the tropic of
Cancer, because its corresponding circle on the celestial sphere runs
through the zodiacal sign Cancer, and the southern one is called the
tropic of Capricorn for a similar reason. The tropics run through the
two solstices, and mark the apparent daily track of the sun in the sky
when it is at either its greatest northern or its greatest southern
declination. The sun is then perpendicular over one or the other of
the tropics. That part of the earth lying between the tropics is called
the torrid zone, because the sun is always not far from perpendicular
over it, and the heat is very great.

The Six-Tailed Comet of 1744


From a contemporary drawing.

The other thing to be mentioned is the polar circles. These are
situated 23½° from each pole, just as the tropics are situated a
similar distance on each side of the equator. The northern is called
the arctic, and the southern the antarctic circle. Those parts of the
earth which lie between the tropics and the polar circles are called
respectively the northern and the southern temperate zone. The
polar circles mark the limits of the region round each pole where the
sun shines continuously when it is at one or the other of the
solstices. If the reader will recall the experiment with the globe and
the lamp, he will perceive that these circles correspond with the
borders of the circular spaces at each pole of the globe which are
alternately carried into and out of the full light as the lamp is elevated
to its greatest height above the equator or depressed to its greatest
distance below it. At each pole, in turn, there are six months of
continual day followed by six months of continual night, and when
the sun is at one of the solstices it just touches the horizon on the
corresponding polar circle at the hour that marks midnight on the
parts of the earth which lie outside the polar circles. This is the
celebrated phenomenon of the “midnight sun.” At any point within the
polar circle concerned, the sun, at the hour of midnight, approaches
the horizon but does not touch it, its midnight elevation increasing
with nearness to the pole, while exactly at the pole itself the sun
simply moves round the sky once in twenty-four hours in a circle
practically parallel to the horizon. It is by observations on the daily
movement of the sun that an explorer seeking one of the earth's
poles during the long polar day is able to determine when he has
actually reached his goal.
The reader will have remarked in these descriptions how
frequently the angle of 23½° turns up, and he should remember that
it is, in every case, due to the same cause, viz., the inclination of the
earth's axis from a perpendicular to the ecliptic.
A very remarkable fact must now be referred to. Although the
angular distance that the sun has to travel in passing first from the
vernal equinox to the autumnal equinox, on the northern side of the
equator, and then back again from the autumnal equinox to the
vernal equinox, on the southern side of the equator, is the same, the
time that it occupies in making these two half stages in its annual
journey is not the same. Beginning from the 21st of March and
counting the number of days to the 23d of September, and then
beginning from the 23d of September and counting the number of
days to the next 21st of March, you will find that in an ordinary year
the first period is seven days longer than the second. In other words,
the sun is a week longer above the equator than below it. The
reason for this difference is found in the fact that the orbit of the
earth about the sun is not a perfect circle, but is a slightly elongated
ellipse, and the sun, instead of being situated in the centre, is
situated in one of the two foci of the ellipse, 3,000,000 miles nearer
to one end of it than to the other. Now this elliptical orbit of the earth
is so situated that the earth is nearest to the focus occupied by the
sun, or in perihelion, about December 31st, only a few days after the
winter solstice, and farthest from the sun, or in aphelion, about July
1st, only a few days after the summer solstice. Thus the earth is
nearer the sun during the winter half of the year, when the sun
appears south of the equator, than during the summer half of the
year, when the sun appears north of the equator. Now the law of
gravitation teaches that when the earth is nearer the sun it must
move more rapidly in its orbit than when it is more distant, from
which it follows that the time occupied by the sun in its apparent
passage from the vernal equinox to the autumnal equinox is longer
than that occupied in the passage back from the autumnal to the
vernal equinox.
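The week's difference mentioned above can be verified by simply counting the days between the equinox dates the text gives, taking an ordinary (non-leap) year:

```python
from datetime import date

# Equinox dates from the text, placed in an ordinary year (1910-11 chosen
# here only as an example of a year whose second half contains no leap day):
vernal      = date(1910, 3, 21)   # vernal equinox
autumnal    = date(1910, 9, 23)   # autumnal equinox
next_vernal = date(1911, 3, 21)

north_half = (autumnal - vernal).days        # sun north of the equator
south_half = (next_vernal - autumnal).days   # sun south of the equator

print(north_half, south_half)    # 186 179
print(north_half - south_half)   # 7 -- the week mentioned in the text
```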
But while the summer half of the year is longer than the winter half
in the northern hemisphere, the reverse is the case in the southern
hemisphere. There the winter is longer than the summer. Moreover,
the winter of the southern hemisphere occurs when the earth is
farthest from the sun, which accentuates the disadvantage. It has
been thought that the greater quantity of ice about the south pole
may be due to this increased length and severity of the southern
winter. It is true that the southern summer, although shorter, is hotter
than the northern, but while, theoretically, this should restore the
balance as a whole, yet it would appear that the short hot summer
does not, in fact, suffice to arrest the accumulation of ice.
However, the present condition of things as between the two
hemispheres will not continue, but in the course of time will be
reversed. The reader will recall that the precession of the equinoxes
causes the axis of the earth to turn slowly round in space. At present
the northern end of the earth's axis is inclined away from the
aphelion and in the direction of the perihelion point of the orbit, so
that the northern summer occurs when the earth is in the more
distant part of its orbit, and the winter when it is in the nearer part.
But the precession swings the axis round westward from its present
position at the rate of 50″.2 per year, while at the same time the
position of the orbit itself is shifted (by the effects of the attraction of
the planets) in such a manner that the aphelion and perihelion
points, which are called the apsides, move round eastward at the
rate of 11″.25 per year. The combination of the precession with the
motion of the apsides produces a revolution at the rate of 61″.45 per
year, which in the course of 10,500 years will completely reverse the
existing inclination of the axis with regard to the major diameter of
the orbit, so that then the northern hemisphere will have its summer
when the earth is near perihelion and its winter when it is near
aphelion. The winter, then, will, for us, be long and severe and the
summer short though hot.
It has been thought possible that such a state of things may
cause, in our hemisphere, a partial renewal of what is known in
geology as a glacial period. A glacial period in the southern
hemisphere would probably always be less severe than in the
northern, because of the great preponderance of sea over land in the
southern half of the globe. An ocean climate is more equable than a
land climate.
10. The Year, the Calendar, and the Month. A year is the period
of time required for the earth to make one revolution in its orbit about
the sun. But, as there are three kinds, or measures, of time, so there
are three kinds, or measures, of the year. The first of these is called
the sidereal year, but although, like sidereal time, it measures the
true length of the period in question, it is not suitable for ordinary
use. To understand what is meant by a sidereal year, imagine
yourself to be looking at the earth from the sun, and suppose that at
some instant you should see the earth exactly in conjunction with a
star. When, having gone round the sun, it had come back again to
conjunction with the same star, precisely one revolution would have
been performed in its orbit, and the period elapsed would be a
sidereal year. Practically, the length of the sidereal year is
determined by observing when the sun, in its apparent annual
journey round the sky, has come back to conjunction with some
given star.
The second kind of year is called the tropical year, and it is
measured by the period taken by the sun to pass round the sky from
one conjunction with the vernal equinox to the next. This period
differs slightly from the first, because, owing to the precession of the
equinoxes, the vernal equinox is slowly shifting westward, as if to
meet the sun in its annual course, from which it results that the sun
overtakes the equinox a little before it has completed a sidereal year.
The tropical year is about twenty minutes shorter than the sidereal
year. It is, however, more convenient for ordinary purposes, because
we naturally refer the progress of the year to that of the seasons,
and, as we have seen, the seasons depend upon the equinoxes.
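The twenty-minute shortfall follows from the precession rate quoted earlier in the text, 50″.2 per year; the sidereal-year length used below is an assumed modern value, not stated in the text:

```python
# The equinox shifts westward 50.2 arc-seconds per year, so the sun meets
# it that much short of a full sidereal circuit.  At the sun's mean rate
# of 360 degrees per sidereal year, that small arc takes:
precession_arcsec  = 50.2
sidereal_year_days = 365.2564            # assumed value, in days

fraction = precession_arcsec / (360 * 3600)   # fraction of a full circuit
minutes  = fraction * sidereal_year_days * 24 * 60
print(round(minutes, 1))   # about twenty minutes, as the text states
```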
But yet the tropical year is not entirely satisfactory as a measure of
time, because the number of days contained in it is not an even one.
Its length is 365 days, 5 hours, 48 minutes, 46 seconds. Accordingly,
as the irregularities of apparent solar time were avoided by the
invention of mean solar time, so the difficulty presented by the
tropical year is gotten rid of, as far as possible, by means of what is
called the civil year, or the calendar year, the average length of
which is almost exactly equal to that of the tropical year. This brings
us to the consideration of the calendar, which is as full of
compromises as a political treaty—but there is no help for it since
nature did not see fit to make the day an exact fraction of the year,
or, in other words, to make the day and the year commensurable
quantities of time.
Without going into a history of the reforms that the calendar has
undergone, which would demand a great deal of space, we may
simply say that the basis of the calendar we use to-day was
established by Julius Cæsar, with the aid of the Greek astronomer
Sosigenes. This is the Julian calendar, and the reformed shape in
which it exists at present is called the Gregorian calendar. Cæsar
assumed 365¼ days as the true length of the year, and, in order to
get rid of the quarter day, ordered that it should be left out of account
for three years out of every four. In the fourth year the four quarter
days were added together to make one additional day, which was
added to that particular year. Thus the ordinary years were each 365
days long and every fourth year was 366 days long. This fourth year
was called the bissextile year. It was identical with our leap year. The
days of both the ordinary and the leap years were distributed among
the twelve months very much as we distribute them now.
But Cæsar's assumption of 365¼ days as the length of the year
was erroneous, being about 11 min. 14 sec. longer than the real
tropical year. In the sixteenth century this error had accumulated to
such a degree that the months were becoming seriously disjointed
from the seasons with which they had been customarily associated.
In consequence, Pope Gregory XIII, assisted by the astronomer
Clavius, introduced a slight reform of the Julian calendar. The
accumulated days were dropped, and a new start taken, and the rule
for leap year was changed so as to read that “all years, whose date-
number is divisible by four without a remainder are leap years,
unless they are century years (such as 1800, 1900, etc.). The
century years are not leap years, unless their date number is
divisible by 400, in which case they are.” And this is the rule as it
prevails to-day, although there is now (1912) serious talk of
undertaking a new revision. But the Gregorian calendar is so nearly
correct that more than 3000 years must elapse before the length of
the year as determined by it will differ by one day from the true
tropical year.
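The Gregorian rule just quoted translates directly into a short test:

```python
def is_leap(year):
    """Gregorian rule quoted in the text: years divisible by four are leap
    years, except century years, which are leap only when divisible by 400."""
    if year % 400 == 0:
        return True
    if year % 100 == 0:
        return False
    return year % 4 == 0

# The century years named in the text (1800, 1900) are not leap years:
print([y for y in (1800, 1900, 2000, 1912, 1913) if is_leap(y)])  # [2000, 1912]
```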
The subject of the reform of the calendar is a very interesting one,
but, together with that of the rules for determining the date of Easter,
its discussion must be sought in more extensive works.
There is one other measure of time, depending upon the motion of
a heavenly body, which must be mentioned. This is the month, or the
period required for the moon to make a revolution round the earth.
Here we encounter again the same difficulty, for the month also is
incommensurable with the year. Then, too, the length of the month
varies according to the way in which it is reckoned. We have, first, a
sidereal revolution of the moon, which is measured by the time taken
to pass round the earth from one conjunction with a star to the next.
This is, on the average, 27 days, 7 hours, 43 minutes, 12 seconds.
Next we have a synodical revolution of the moon, which is measured
by the time it takes in passing from the phase of new moon round to
the same phase again. This seems the most natural measure of a
month, because the changing phases of the moon are its most
conspicuous peculiarity. (These will be explained in Part III.) The
length of the month, as thus measured, is, on the average, 29 days,
12 hours, 44 minutes, 3 seconds. The reason why the synodical
month is so much longer than the sidereal month is that new
moon can occur only when the moon is in conjunction with the sun,
i.e. exactly between the earth and the sun, and in the interval
between two new moons the sun moves onward, so that for the
second conjunction the moon must go farther to overtake the sun. It
will be observed that both of the month measures are given in
average figures. This is because the moon's motion is not quite
regular, owing partly to the eccentricity of its orbit and partly to the
disturbing effects of the sun's attraction. The length of the sidereal
revolution varies to the extent of three hours, and that of the
synodical revolution to the extent of thirteen hours.
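The relation between the two month lengths can be expressed in rates: while the moon makes one circuit against the stars, the sun has moved on, so 1/synodic = 1/sidereal − 1/year. The sketch below uses the sidereal month from the text and an assumed sidereal-year length:

```python
# Sidereal month from the text: 27 d 7 h 43 m 12 s, converted to days.
sidereal_month = 27 + 7/24 + 43/1440 + 12/86400
sidereal_year  = 365.2564   # assumed value, in days

# While the moon circles once against the stars, the sun advances,
# so the moon needs extra time to overtake it:
synodic_month = 1 / (1/sidereal_month - 1/sidereal_year)
print(round(synodic_month, 4))   # close to the 29 d 12 h 44 m of the text
```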
But, whichever measure of the month we take, it is
incommensurate with the year, i.e. there is not an even number of
months in a year. In ancient times ceaseless efforts were made to
adjust the months to the measure of the year, but we have practically
given up the attempt, and in our calendar the lunar months shift
along as they will, while the ordinary months are periods of a certain
number of days, having no relation to the movements of the moon.
It has been thought that the period called a week, which has been
used from time immemorial, may have originated from the fact that
the interval from new moon to the first quarter and from first quarter
to full moon, etc., is very nearly seven days. But the week is as
incorrigible as all its sisters in the discordant family of time, and there
is no more difficult problem for human ingenuity than that of
inventing a system of reckoning, in which the days, the weeks, the
months, and the years shall be adjusted to the closest possible
harmony.

Spiral Nebula in Ursa Major (M 101)


Photographed at the Lick Observatory by J. E. Keeler, with the Crossley reflector.
Exposure four hours.
Note the appearance of swift revolution, as if the nebula were throwing itself to
pieces like a spinning pin-wheel.

The Whirlpool Nebula in Canes Venatici


Photographed at the Lick Observatory by J. E. Keeler, with the Crossley reflector.
Exposure four hours.

Note the “beading” of the arms of the whirling nebula.


PART III.

THE SOLAR SYSTEM.



1. The Sun. By the term solar system is meant the sun together
with the system of bodies (planets, asteroids, comets and meteors)
revolving round it. The sun being a star, every other star, for all that
we can tell, may be the ruler of a similar system. In fact, we know
that a few stars have huge dark bodies revolving round them, which
may be likened to gigantic planets. The reason why the sun is the
common centre round which the other members of the solar system
move, is because it vastly exceeds all of them put together in mass,
or quantity of matter, and the power of any body to set another body
in motion by its attractive force depends upon mass. If a great body
and a small body attract each other, both will move, but the motion of
the small body will be so much more than that of the great one that
the latter will seem, relatively, to stand fast while the small one
moves. Then, if the small body had originally a motion across the
direction in which the great body attracts it, the result of the
combination will be to cause the small body to revolve in an orbit
(more or less elliptical according to the direction and velocity of its
original motion) about the great body. If the difference of mass is
very great, the large body will remain virtually immovable. This is the
case with the sun and its planets. The sun has 332,000 times as
much mass (or, we may say, is 332,000 times as heavy) as the
earth. It has a little more than a thousand times as much mass as its
largest planet, Jupiter. It has millions of times as much as the
greatest comet. The consequence is that all of these bodies revolve
around the sun, in orbits of various degrees of eccentricity, while the
sun itself remains practically immovable, or just swaying a little this
way and that, like a huntsman holding his dogs in leash.
The distance of the sun from the earth—about 93,000,000 miles—
has been determined by methods which will be briefly explained in
the next section. Knowing its distance, it is easy to calculate its size,
since the true diameter of an object of a given apparent size varies
directly with its distance. The diameter of the sun is thus found to be about 866,400
miles, or nearly 110 times that of the earth. In bulk it exceeds the
earth about 1,300,000 times, but its mass, or quantity of matter, is
only 332,000 times the earth's, because its average density is but
one quarter that of the earth. This arises from the fact that the earth
is a solid, compact body, while the sun is a body composed of gases
and vapours (though in a very compressed state). It is the high
temperature of the sun which maintains it in this state. Its
temperature has been calculated at about 16,000° Fahrenheit, but
various estimates differ rather widely. At any rate, it is so hot that the
most refractory substances known to us would be reduced to the
state of vapour, if removed to the sun. The quantity of heat received
upon the earth from the sun can only be expressed in terms of the
mechanical equivalent of heat. The unit of heat in engineering is the
calorie, which means the amount of heat required to raise the
temperature of one kilogram of water (2.2 pounds) one degree
Centigrade (1°.8 Fahrenheit). Now observation shows that the sun
furnishes 30 of these calories per minute upon every square metre
(about 1.2 square yard) of the earth's surface. Perhaps there is no
better illustration of what this means than Prof. Young's statement,
that “the heat annually received on each square foot of the earth's
surface, if employed in a perfect heat engine, would be able to hoist
about a hundred tons to the height of a mile.” Or take Prof. Todd's
illustration of the mechanical power of the sunbeams: “If we measure
off a space five feet square, the energy of the sun's rays when falling
vertically upon it is equivalent to one horse power.” Astronomers
ordinarily reckon the solar constant in “small calories,” which are but
the thousandth part of the engineer's calorie, and the latest results of
the Smithsonian Institution observations indicate that the solar
constant is about 1.95 of these small calories per square centimeter
per minute. About 30 per cent. must be deducted for atmospheric
absorption.
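The figures quoted in this paragraph check against one another. The angular diameter of the sun used below, about 32 arc-minutes, is an assumed round value not given in the text:

```python
import math

# Physical diameter from distance and apparent (angular) diameter:
distance_miles       = 93_000_000
angular_diameter_rad = math.radians(32 / 60)   # assumed: about 32 arc-minutes
diameter = distance_miles * angular_diameter_rad
print(round(diameter, -3))       # near the 866,400 miles quoted

# A bulk 1,300,000 times the earth's but a mass only 332,000 times
# implies a mean density about a quarter of the earth's, as stated:
density_ratio = 332_000 / 1_300_000
print(round(density_ratio, 2))   # about one quarter
```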
Heat, like gravitation and like light, varies inversely in intensity with
the square of the distance; hence, if the earth were twice as near as
it is to the sun it would receive four times as much heat and four
times as much light, and if it were twice as far away it would receive
only one quarter as much. This shows how important it is for a planet
not to be too near, or too far from, the sun. The earth would be
vapourised if it were carried within a quarter of a million miles of the
sun.
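The inverse-square rule stated above is a one-line computation:

```python
def relative_intensity(distance_ratio):
    """Heat and light received vary inversely as the square of the distance,
    taking the earth's present distance as 1."""
    return 1.0 / distance_ratio ** 2

print(relative_intensity(0.5))   # twice as near: four times the heat
print(relative_intensity(2.0))   # twice as far: one quarter as much
```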
The sun rotates on an axis inclined about 7½° from a
perpendicular to the plane of the ecliptic. The average period of its
rotation is about 25⅓ days—we say “average” because, not being a
solid body, different parts of its surface turn at different rates. It
rotates faster at the equator than at latitudes north and south of the
equator, the velocity decreasing toward the poles. The period of
rotation at the equator is about 25 days, and at 40° north or south of
the equator it is about 27 days. The direction of rotation is the same
as that of the earth's.
The surface of the sun, when viewed with a telescope, is often
seen more or less spotted. The spots are black, or dusky, and
frequently of very irregular shapes, although many of them are
nearly circular. Generally they appear in groups drawn out in the
direction of the solar rotation. Some of these groups cover areas of
many millions of square miles, although the sun is so immense that
even then they appear to the naked eye (guarded by a dark glass)
only as small dark spots on its surface. The centres of sun-spots are
the darkest parts. Generally around the borders of the spots the
surface seems to be more or less heaped up. Often, in large sun-
spots, immense promontories, very brilliant, project over the dark
interior, and many of these are prolonged into bridges of light,
apparently traversing the chasms beneath. Constant changes of
shape and arrangement take place, and there are few more
astonishing telescopic objects than a great sun-spot.
