2
Subjective and Objective Measures of Organizational Performance: An Empirical Exploration
Rhys Andrews, George A. Boyne and Richard M. Walker
INTRODUCTION
Governments around the globe now seek to judge the performance of their public
services. This has given rise to the introduction of a range of complex and sophisticated
cities (China Daily 2004), the Comprehensive Performance Assessment in English local
government (Audit Commission 2002), the Government Performance and Results Act 1993 in
the USA, the Service Improvement Initiative in Canada, the Putting Service First scheme
Netherlands (Pollitt and Bouckaert 2004). Researchers have increasingly turned their
attention to public service performance (see for example the Symposium edition of
Journal of Public Administration Research and Theory 2005 (Boyne and Walker, 2005),
persistent problem for public management researchers and practitioners has been the
Previous research has shown that organizational performance is multifaceted
(Boyne 2003; Carter et al. 1992; Quinn and Rohrbaugh 1983; Venkatraman and
Ramanujam 1986). This is because public organizations are required to address a range of
goals, some of which may be in conflict. Consequently, public organizations are obliged
responsiveness and democratic outcomes. Outputs include the quantity and quality of
services; efficiency is concerned with the cost per unit of outputs; effectiveness refers to
as judged by direct service users, wider citizens and staff; democratic outcomes are
concerned with accountability, probity and participation. These categories are found in a
variety of academic studies, and in the performance measures used by governments and
public organizations1.
If some clarity has been brought to the criteria of performance, uncertainty still
exists over the best way to measure and operationalize these criteria. For example,
to judge the progress of public agencies. Aggregate measures draw together performance
data from a number of sources to produce an overall score for an organization or service.
The results of aggregate measures are often presented in a scorecard (Weimer and
one service area and are reported as specific performance indicators. However, using
1. The dimensions of organizational performance typically used in public sector studies are usually drawn from the '3Es' model of economy, efficiency and effectiveness or the IOO model of inputs, outputs and outcomes.
In this chapter we assess the relative merits of different methods for measuring
should they be based on the perceptions of stakeholders who are internal or external to
the organization? What is the relationship between objective and subjective measures,
and between internal and external perceptions of performance? Academics have debated
for over three decades the merits of subjective and objective measures of performance
(Ostrom 1973; Parks 1984; Carter et al. 1992; Kelly and Swindell 2002). While studies of
the measurement of the performance of firms have explored this issue, little work has
complex concept in the private sector because most stakeholders agree that strong
performance is as paramount in the public sector, and different stakeholders may have
2. The terms objective and subjective measures are widely used in both the social science
measurement and the management literatures. While their use might be provocative, they
are used in this chapter as shorthand to distinguish between different ways of measuring
performance. As becomes clear in our discussion there are problems associated with the
use of both types of measure. All measures are ultimately ‘perceptual’ rather than
‘factual’.
organizations. To this end we initially define objective and subjective measures of
performance across a range of services and dimensions of performance and through time.
Our conclusions reflect on the relative merits of subjective and objective performance
measures.
PERFORMANCE
Both objective and subjective measures of performance have been used in studies of the
2004; Pandey and Moynihan, this volume; Andrews et al. in press; Meier and O'Toole
2003). However, public sector studies have rarely used objective and subjective measures
in combination (though see Walker and Boyne in press; Kelly and Swindell 2002) and
much of the literature has treated these measures as equivalent (Wall et al. 2004). In this
section of the chapter we define objective and subjective measures and deal with a
number of technical issues of measurement which suggest that the objective and
Objective measures have been viewed as the gold standard in public management
research. They are typically regarded as the optimum indicators of public sector
performance because they are believed to reflect the ‘real’ world accurately and
‘minimize discretion’ (Meier and Brudney 2002: 19). An objective indicator should,
therefore, be impartial, independent and detached from the unit of analysis. To reduce
discretion and be objective, a measure of performance must, first, involve the precise
accuracy. Many measures meet these criteria. For example school exam results are a
schools, and students’ achievements are validated through the marking of their work by
scrutiny by the community. Such data have been used by Meier and O’Toole in their
measures have been available for local governments since 1992 (Boyne 2002). However,
the Audit Commission Performance Indicators did not always measure performance (for
example, many early indicators simply measure spending), and while they were published
by the Commission only some of them were checked for accuracy. They were, therefore,
of dubious value and unreliable; indeed, anecdotal evidence suggests that data were often
‘made up’ by local service managers. This position has now changed. The new Best
Value Performance Indicators (BVPIs) more closely reflect core dimensions of public
A subjective measure may be biased or prejudiced in some way and is not distant
from the unit of analysis. Subjective measures, like objective ones, must refer to a dimension
of performance that is relevant to the organization, that is, subjective judgments may be
are internal, typically obtained from surveys of managers—or they may be based on
3. This may include aggregate measures of overall organizational effectiveness or a
scrutiny.
measures; for example the Executive Branch Management Scorecard used by the US
government (Government Executive 2004), and the inspection system used in many
British public services (Boyne et al. 2002). In the latter case, government inspectors
data, subjective internal measures of performance from agency staff, subjective external
perceptual measures from users and their own impressions during a site visit.
A number of validity issues exist in relation to all types of measures of performance. The
interesting questions about the use of objective measures, the most difficult of which
relates to the extensive number of dimensions of performance in the public sector. Boyne
(2002) identified eighteen dimensions of performance and concluded that only six of
these were captured in recent BVPIs. Areas omitted included issues of fundamental
objective measures are not able to deal with the complexity of performance in public
organizations, serious attention must be paid to the use of subjective measures. Indeed the
use of subjective measures is often defended in the literature because objective measures
are simply not available, or do not tap all of the appropriate dimensions of performance.
Subjective measures are seen to be limited because they suffer from a number of
flaws, of which common-method bias is believed to be the most serious (Wall et al.
lower points on a response scale). This is particularly a problem when survey respondents
make judgments on both management and performance, and these variables are then used
in a statistical model. Furthermore, reliance upon recall together with uncertainty about
questions have also been posed about the accuracy of objective measures, following
major accounting scandals in private firms such as Enron and WorldCom and evidence of
‘cheating’ on indicators in the public sector (Bohte and Meier 2000). Furthermore, the
objective measures often used in management research are financial, and can be
questioned because organizations may make decisions about capital and revenue
measures of performance in the public sector because scorecards are collated by officials
However, when studies adopt both types of measure their limitations are not always
systematically addressed and they are sometimes seen as interchangeable (see for
example Selden and Sowa 2004). It could be argued that the latter approach is not
are positive and statistically significant correlations between objective and subjective
Delaney and Huselid 1996; Dess and Robinson 1984; Dollinger and Golden 1992; Powell
1992; Robinson and Pearce 1999). Such findings, however, are only achieved when
measures of the same dimensions of performance are used (Guest et al. 2003; Voss and
likely to be greater in the private than in the public sector. No single dimension of
performance is as important for public organizations as financial results are for private
organizations.
influenced by scores on performance indicators. The evidence as it exists suggests that the
best course of action for public management researchers and practitioners is to use both
expectations of the level of achievement against the indicators are the same.
If any one of these conditions is not met, the relationship between subjective and objective measures will
become weaker. Eventually, as all the conditions are relaxed, these relationships may fall
close to zero.
MEASURES OF PERFORMANCE
In this section we compare and contrast objective and subjective performance measures
through an analysis of services in Welsh local authorities. The services we examine cover
all main local government functions in the UK: education, social services, housing, waste
management, highways maintenance, planning, public protection, leisure and culture, and
benefits and revenues. Subjective and objective measures of the following dimensions of
performance are available for these services: effectiveness, output quality, output quantity
and equity. We compare objective performance data and internal and external subjective
perceptions of performance in 2002 and 2003. The data sources, measures and methods
of data manipulation necessary to permit comparison are now discussed for each type of
performance measure. Full data on the performance indicators used in our analysis and
Objective performance
Objective performance for all major services in Welsh local authorities is measured every
year through the production and publication of performance indicators set by their most
powerful stakeholder: the National Assembly for Wales, which provides over 80% of
local government’s funding. The National Assembly for Wales Performance Indicators
(NAWPIs) are based on common definitions and data which are obtained by councils for
the same time period with uniform collection procedures (National Assembly for Wales
2001). All local authorities in Wales and England are expected to collect and collate these
data in accordance with the Chartered Institute of Public Finance and Accountancy ‘Best
Value Accounting – Code of Practice’. The figures are then independently checked for
accuracy, and the Audit Commission assesses whether ‘the management systems in place
are adequate for producing accurate information’ (National Assembly for Wales 2001,
14).
The dimensions of performance covered by each NAWPI are shown in table 2.1.
Examples include:
Output quality – the number of days that traffic controls were in place on major
roads, the percentage of planning applications processed in eight weeks and the
Output quantity – the percentage of excluded school pupils attending more than
25 hours tuition a week, the proportion of elderly people helped to live at home and
score, the percentage of rent arrears and the percentage of welfare benefit claims
processed correctly;
authority’s score on the Commission for Racial Equality’s Housing Code of Practice
and the percentage of pedestrian crossings with facilities for disabled people.
For our analysis we used 72 of the 100 service delivery NAWPIs available for
2002 and 2003. Although the service delivery NAWPIs cover a number of different
dimensions of performance, our analysis focuses only on those dimensions for which
there were indicators in at least two service areas. Analyzing dimensions of performance
using data from only one service area would have excessively restricted the sample size,
because the maximum number of cases for which we have subjective data in a single
service area is fourteen. We are also forced to omit some dimensions of performance
from our analysis, such as staff satisfaction and efficiency, because objective indicators
on these dimensions are not set by the National Assembly for Wales. To make the
indicators suitable for comparative analysis across different service areas, we divided
each of them by the mean score for Welsh authorities, inverting some (e.g. the percentage
and equity for each service by first adding groups of relevant indicators together. A single
measure of each performance dimension was then constructed for all services by
standardizing the total score for each service area. This was done by dividing each
aggregated group of service area indicators by the mean score for that aggregated
measure. So, for instance, to create the standardized output quantity measure we first
added together the scores for three indicators of quantity of education delivered
(percentage of excluded pupils attending ten hours tuition a week (inverted), percentage
of excluded pupils attending 10-25 hours of tuition a week, and percentage of excluded
pupils attending more than 25 hours a week) and divided the aggregate score in each local
4. We inverted indicators by subtracting the given scores from 100.
authority service by the mean score. We then repeated this method for output quantity
indicators in social services and leisure and culture, thereby deriving a single measure of
effectiveness, quality, quantity and equity for each local authority service in Wales.
Individual scores were then used as measures of objective performance suitable for
also meant that each indicator was weighted equally, ensuring that our analysis was not
unduly influenced by particular indicators. Factor analysis was not used to create proxies
for each performance dimension because the number of cases per service area is too small
to create reliable factors (for a concise explanation of this problem see Kline 1994).
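The inversion and mean-standardization steps described above can be sketched in a few lines of Python. The indicator values below are invented for illustration and are not NAWPI data; the logic follows the procedure in the text (invert 'bad is high' indicators, sum related indicators, then divide by the all-Wales mean).

```python
# Sketch of the standardization procedure described in the text.
# All figures are hypothetical.

def invert(score, ceiling=100):
    # 'Bad is high' indicators are inverted by subtracting from a ceiling
    # (100 for percentages; 1000 was used for missed bin collections).
    return ceiling - score

def standardize(scores):
    # Divide each authority's score by the mean across authorities,
    # so every aggregated indicator is centred on 1 and weighted equally.
    mean = sum(scores) / len(scores)
    return [s / mean for s in scores]

# Three hypothetical output quantity indicators for four authorities.
excluded_under_10h = [20.0, 10.0, 30.0, 40.0]   # inverted before summing
excluded_10_to_25h = [50.0, 60.0, 40.0, 30.0]
excluded_over_25h = [30.0, 30.0, 30.0, 30.0]

aggregate = [
    invert(a) + b + c
    for a, b, c in zip(excluded_under_10h, excluded_10_to_25h, excluded_over_25h)
]
output_quantity = standardize(aggregate)  # e.g. [1.067, 1.2, 0.933, 0.8]
```

By construction the standardized scores average 1, so values above 1 indicate above-average performance on that dimension.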
Internal subjective performance measures for our analysis were derived from an
measures capturing four dimensions of organizational performance were used for our
analysis: effectiveness; output quality; output quantity; and equity. We also asked survey
respondents about efficiency, value for money, consumer satisfaction, staff satisfaction,
and social, economic and environmental well-being, but NAWPIs do not cover these and
respondents were asked to assess how well their service was currently performing in
placed the performance of their service on a seven-point Likert scale ranging from 1
5. A copy of the full questionnaire is available on request from the authors.
(disagree that the service is performing well) to 7 (agree that the service is performing
well). Multiple informant data were collected from corporate and service level staff in
each organization. Two echelons were used to overcome the sample bias problem faced
officers and service managers were selected because attitudes have been found to differ
between these positions (Aiken and Hage 1968; Payne and Mansfield 1973; Walker and
informants, and up to ten managers in each of the nine service areas covered by objective
managers, because the objective performance indicators and external subjective measures
The number of service informants and respondents differed for each authority
because the extent of contact information provided varied across services. The number of
eighty-nine questionnaires per authority were sent to service managers, and the maximum
The survey data were collected in the autumn of 2002 and the autumn of 2003
from Welsh local authorities (for an account of electronic data collection procedures see
Enticott 2003). The sample consisted of 22 authorities and 830 informants in 2002, and
22 authorities and 860 informants in 2003. 77 per cent of authorities replied (17) and a 29
per cent response rate was achieved from individual informants (237) in 2002. In 2003,
73 per cent of authorities replied (16) and a 25 per cent response rate was achieved from
councils in 2002 and on 84 of those services in 2003. Most of the services we analyzed
responded to both years of the survey, but a small number responded only in either 2002
or 2003. The remaining services in this sub-set did not respond to the survey.
To generate service level data suitable for our analysis informants’ responses for
each service area were aggregated. The average score of these was then taken as
representative of performance in that service area. So, for instance, if in one authority
there were two informants from the education department, one from school improvement
and another from special education needs, then the average of their responses was taken as representative of education performance in that authority.
councils immediately raises the possibility that their perceptions of service performance
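The aggregation of multiple informants' responses to a single service-level score can be sketched as follows; the authorities, departments and seven-point ratings are hypothetical.

```python
# Sketch: average informants' ratings within each authority-service pair.
# Respondent data are invented.

responses = [
    ("Authority A", "education", 6),  # e.g. school improvement informant
    ("Authority A", "education", 4),  # e.g. special education needs informant
    ("Authority A", "housing", 5),
    ("Authority B", "education", 7),
]

ratings_by_service = {}
for authority, service, rating in responses:
    ratings_by_service.setdefault((authority, service), []).append(rating)

# One internal subjective score per authority-service.
service_scores = {
    key: sum(r) / len(r) for key, r in ratings_by_service.items()
}
# ("Authority A", "education") -> 5.0
```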
External subjective performance measures for our analysis were derived from inspection
reports for local services in 2002 and 2003. Inspectors are agents of central government,
but they are also accountable to their direct employer (usually the Audit Commission)
and bring a range of professional values to their task. No explicit statement is available
that reveals the dimensions of performance that are included in their assessments or the
relative weights that inspection teams attach to them. Their judgments are based on
range of ‘direct inspection methods’ (Audit Commission 2000: 15). Judgments on service
performance range from 0 stars (poor) to 3 stars (excellent), and were converted into
categorical data (1-4) for the purposes of our analysis. The inspection reports also
categorize services on their capacity to improve; however, these judgments refer to
organizational characteristics rather than service performance, and so are not included in
our analysis. Even for service performance alone, however, it is clear that inspectors are
drawing not only on NAWPIs, but also on other sources of information and, possibly,
overall mean objective performance score for each inspected service. This was done by
first adding together the performance indicators in each service area used in our analysis.
We then standardized the aggregated performance indicator scores for each service area
by dividing them by the mean score for those areas. These standardized scores were
authority services, by dividing them by the mean score for inspected services.
an overall mean subjective performance score for each inspected service. This was done
by aggregating the effectiveness, quality, quantity and equity scores of inspected services
and dividing the aggregated results by the mean aggregated score. This standardized
score was then treated as an internal subjective performance measure for inspected
services.
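The conversion of star ratings to categories and the comparison of mean scores by inspection judgment can be illustrated with a short sketch; the services, star ratings and standardized scores below are invented.

```python
# Sketch: convert 0-3 star inspection judgments to 1-4 categories,
# then compare mean standardized objective scores per category.
# All data are hypothetical.

stars = {"service_1": 0, "service_2": 2, "service_3": 3, "service_4": 2}
objective_score = {"service_1": 0.8, "service_2": 1.0,
                   "service_3": 1.3, "service_4": 1.1}

# 0 stars (poor) -> 1 ... 3 stars (excellent) -> 4
categories = {svc: s + 1 for svc, s in stars.items()}

scores_by_category = {}
for svc, cat in categories.items():
    scores_by_category.setdefault(cat, []).append(objective_score[svc])

mean_by_category = {
    cat: sum(v) / len(v) for cat, v in scores_by_category.items()
}
# Higher inspection categories should, loosely, show higher means.
```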
The program of external inspection covered only around 20% of services in each
year. To generate a larger sample size for our comparison of inspection judgments and
performance we averaged the performance scores for 2002 and 2003. These scores were
then standardized by dividing them by the mean objective and subjective performance
RESULTS
The results for our comparison of objective and subjective performance are presented in
Tables 2.2 to 2.13. The tables show our findings on the relationship between objective
and internal subjective performance, objective and external subjective performance, and
this method is the most suitable for the small size of our sample (Kendall, 1970). Tables
2.2 to 2.5 show the correlations by performance dimension for the two years analyzed.
Tables 2.6 to 2.9 show the correlations between performance dimensions within each
year for objective and subjective measures. The number of cases for each set of
correlations varies due to the different response rates for each year of the electronic
survey of Welsh councils, changes in the size of the NAWPI data set caused by missing
or unreliable data (for example, some returns for the proportion of special school pupils
excluded in 2002 were identified as potentially incorrect by auditors), and the smaller
number of indicators available for certain performance dimensions in some service areas.
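The rank-order correlation attributed to Kendall can be computed directly for samples this small; the sketch below implements the simple tau-a version (no adjustment for ties, which tau-b provides) on invented objective and subjective scores.

```python
# Minimal Kendall tau-a: count concordant and discordant pairs.
# The two score lists are hypothetical.

def kendall_tau(x, y):
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

objective_scores = [0.8, 0.9, 1.0, 1.1, 1.3]
subjective_scores = [4.0, 5.5, 5.0, 6.0, 6.5]

tau = kendall_tau(objective_scores, subjective_scores)  # 0.8
```

A positive tau such as this would indicate agreement between the two rankings; for real data a library routine that handles tied ranks (e.g. tau-b) is preferable.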
Each set of correlations comprises those services where objective and subjective
performance scores could be derived for 2002 and 2003. The units of observation for all
correlated across the two years. The correlation of .418, however, suggests that objective
performance on this dimension exhibits a moderate degree of stability. Likewise, our
subjective measure of effectiveness is positively correlated for the two years analyzed,
but the size of the correlation (.288) implies only a low degree of stability. Our measures of objective and subjective effectiveness are positively correlated in
2002, but uncorrelated in 2003. This finding implies that the two aspects of effectiveness
are not equivalent, and that our measures may be tapping different components and
criteria of effectiveness which are unrelated to each other. For example, our measure of
Table 2.3 shows that our objective measure of output quality is positively
correlated across the two years studied, as is our subjective measure. The moderate
correlations (.444 and .319) suggest that our separate measures of output quality exhibit a
degree of stability through time. Objective and subjective quality are positively and
weakly correlated only in 2003, with no relationship apparent in 2002. The modest
longitudinal relationships for objective and subjective measures taken separately are
therefore not repeated across those measures. Table 2.4 indicates that our objective
measure of output quantity is positively correlated for the two years analyzed, with the
measure exhibits no stability over time. This may be a consequence of the lower
reliability associated with the small dataset that is used for our analysis of this dimension
of performance. In addition, some turnover in the survey respondents within each service
year on year may make our internal subjective performance measures less stable than
those for objective performance. There is no support for a relationship between our
objective and subjective measures of service quantity in the same years. Nevertheless,
Table 2.5 shows that the only positive correlation for performance on equity is for
the objective measure across each of the two years considered. The strength of this
The other correlations indicate that there is virtually no equivalence between perceptions
of equity through time. It could be the case that the subjective measures are simply not
related to existing objective measures of equity, which omit major issues such as
distribution of services by ethnicity, gender, social class, age and geographical area. This
clearly illustrates the difficulty of comparing objective scores on a few narrow criteria of
equity with subjective scores that take a much broader view of this dimension of performance.
Table 2.6 shows that only objective equity is correlated with other dimensions of
output quality and equity (-.499), which suggests that organizations are faced with a trade-off between equity and quality. By contrast, the moderate correlation (.409) between
equity and output quantity implies that improvements in one may be accompanied by
improvement in the other. The weak negative correlation between effectiveness and
performance. Table 2.7 shows that in 2003 there is only a weak positive correlation
between objective quality and equity. This suggests both that the relationship between
different objective measures is unstable through time and that, overall, they are tapping
instability in objective performance, which is often found to display high levels of auto-
correlation through time. This may be attributable to the use of aggregated rather than
government.
Table 2.8 shows that there are strong positive correlations between most of the
subjective performance dimensions. Effectiveness, quality and quantity are all positively correlated at about .7 or more. However, equity is weakly
correlated with the other performance dimensions. The high correlations for
effectiveness, quality and quantity imply that the survey respondents have convergent
views on these performance dimensions. The evidence may also indicate the presence of
common source bias within the data set. It is not obvious, however, why common source
bias would be present in the measures of service outputs and effectiveness, but not in the
measures of equity. Table 2.9 shows that the pattern of relationships for subjective
performance dimensions remains stable through time, with the correlations for equity and
We used descriptive statistics to analyze the relationship between external subjective,
internal subjective and objective performance because this method is the most suitable for
the small number of inspected services. Table 2.10 shows the range of values and their
standard deviations for each of these aspects of performance in 2002, 2003 and for both
years aggregated. Tables 2.11-2.13 show the mean subjective and objective performance
scores for those authorities judged by inspectors as 1 (poor), 2 (fair), 1 and 2 (poor or
fair), and 3 (good) for the same time periods. More Welsh services provided objective
performance data for audit and inspection than responded to our survey. As a result, the
n for the objective performance mean is higher than for the subjective performance mean.
The tables indicate that, in general, as inspectors rate services more highly, the
mean subjective and objective performance scores for these services also rise. This
suggests that inspectors’ judgments are positively, if loosely, related to the substantive
and perceived achievements of local service providers. However, there are two
exceptions to the pattern. The mean subjective performance score for good services in
2003 is lower than that for fair services, which contrasts with the higher mean objective
performance score for services receiving the better inspection judgment in the same year.
This finding is repeated for the aggregated figures shown in table 2.13. These results may
be caused by the lower data reliability associated with the small number of subjective
performance scores in 2003, or could reflect more serious divergence between internal and external assessments of performance.
CONCLUSION
Organizational performance in the public sector is complex and multi-dimensional. It is, therefore, necessary to assess performance against a range of criteria and measures. Different stakeholders are likely to have the same view of organizational performance only if a stringent set of conditions (concerning criteria, weights, indicators, data reliability and expectations) is met. Prior literature tends to
contain the assumption that objective measures are better than subjective measures of performance. Yet there are no truly objective measures of public service performance, any more than there are objective measures of art and music. Public service beauty is in
the eye of the stakeholder. The supposedly objective measures that are available are
and desirable. Furthermore, such objective indicators are also swayed by data availability
and ease of measurement, so inevitably present a partial (in every sense) picture of
performance.
indicators that service providers are required to collect and report. Thus some positive correlation can be expected between subjective and objective measures of performance, but this is unlikely to be high unless both types of measure focus on exactly the same criteria of performance. Our comparison of different sets of performance information in Welsh local government has shown that
these conditions for the comparability of different views of performance are rarely met.
criteria such as effectiveness and equity; and although the subjective indicators of these
concepts were broader, their exact content was difficult to discern. The performance
criteria and weights that entered the performance judgments of service managers and
The major lesson from our analysis is that objective and subjective measures of
performance provide different pieces of the performance jigsaw. They are rarely
performance are ‘wrong’ simply because they are weakly related to objective measures.
Except when either subjective or objective measures are distorted by low reliability or
REFERENCES
Ashworth, R., Boyne, G. A. and Walker, R. M. 2002. ‘Regulatory problems in the public
Commission.
Bohte, J. and Meier, K. J. 2000. ‘Goal displacement: assessing the motivation for
evaluation of the statutory framework in England and Wales’, Public Money and
221-228.
Boyne, G. A., Day, P. and Walker, R. M. 2002. ‘The evaluation of public service
Boyne, G. A., and Walker, R. M. (eds.) 2005. Journal of Public Administration Research
Brewer, G. A. 2004. ‘In the eye of the storm: frontline supervisors and federal agency
Carter, N., Klein, R. and Day, P. 1992. How organisations measure success: The use of
absence of objective measures: the case of the privately-held firm and conglomerate
695-715.
Golden, B. R. 1992. ‘Is the past the past—or is it? The use of retrospective accounts as
attitude: the views of public sector workers’, Public Administration, 82: 63-81.
Guest, D. E., Michie, J., Conway, N. and Sheehan, M. 2003. 'Human resource
Kelly, J. M. and Swindell, D. 2002. ‘A multiple-indicator approach to municipal service
National Assembly for Wales. 2001. Circular 8/2001, Local Government Act: guidance
National Assembly for Wales. 2003. National Assembly for Wales Performance
National Assembly for Wales. 2004. National Assembly for Wales Performance
Ostrom, E. 1973. 'The need for multiple indicators of measuring the output of public
Paper presented at the Determinants of Performance in Public Agencies Conference
363-77
Robinson, R. B. and Pearce II, J. A. 1988. ‘Planned patterns of strategic behavior and
43-60
11: 801-14
Walker, R. M. and Boyne, G. A. in press. ‘Public management reform and organizational
administration: revisiting the managerial values and actions debate’, Journal of Public
Wall, T. D., Michie, J., Patterson, M., Wood, S. J., Sheehan, M., Clegg, C. W. and West,
Table 2.1 Objective Performance Measures 2002-2003
Education: % unqualified school-leavers; average General Certificate score; % GCSE C+ in English/Welsh, Maths/Science; % Reception-Year 2 classes; % Year 3-6 classes with 30+ (inverted); % excluded pupils attending tuition; primary and secondary school exclusions
Social services: % care leavers with 1+ GCSE; % looked after children with needs statement reviewed;1 older people helped to live at home
Housing: % rent collection; % repairs in time (inverted); % homelessness applications decided in time (inverted); Commission for Racial Equality Housing Code of Practice score
Waste management: missed bin collections (inverted);2 % population served by kerbside recycling
Highways: pedestrians killed or seriously injured (inverted); % faulty street lamps; days of traffic controls in place for road works (inverted); % pedestrian crossings with facilities for disabled people
Planning: % planning applications processed in 8 weeks
Benefits and revenues: % benefit claims paid out in 10 days; % notifications of changed circumstances processed (inverted)
1. This performance indicator was not collected in 2003. Thus the effectiveness
2. For this indicator the large maximum scores (874 in 2002, 507 in 2003) meant that we subtracted all the given missed bins scores from 1000 to create the inverted indicator.
Table 2.2 Effectiveness correlations 2002-2003
Table 2.3 Output quality correlations 2002-2003
Table 2.4 Output quantity correlations 2002-2003
Table 2.5 Equity correlations 2002-2003
Table 2.6 Objective performance correlations 2002
Table 2.7 Objective performance correlations 2003
Table 2.8 Subjective performance correlations 2002
Table 2.9 Subjective performance correlations 2003
Table 2.10 Descriptive statistics for subjective and objective performance in inspected services
Table 2.11 Inspection results and mean subjective and objective performance 2002
Table 2.12 Inspection results and mean subjective and objective performance 2003
Table 2.13 Inspection results and mean subjective and objective performance 2002-2003