WSS Process Evaluation Peer Review


IN-CONFIDENCE

External peer review of the Martin Jenkins process evaluation of the COVID-19 wage
subsidy scheme (WSS) for the Ministry of Social Development

Dr Simon Chapple.

Director, Institute for Governance and Policy Studies

School of Government

Victoria University of Wellington

18 August 2022

Background

This report provides a peer review of the draft process evaluation conducted by consultancy
firm Martin Jenkins (MJ) of the COVID-19 Wage Subsidy Scheme (WSS).

The draft evaluation document by MJ of 113 pages was received from the Ministry of Social
Development (MSD) on 12 August 2022. I was originally told by MSD to expect around 80
pages, so the draft came in at 40% longer than anticipated. The finalised requirement was to
deliver a peer review by 22 August 2022, extended from 18 August in response to the longer
than anticipated draft, for an amount of work originally estimated at between 16 and 20
hours. The contract fee was not changed in response to the growth in draft size. Other
material, including the MSD RFP and the MJ evaluation plan, was made available by MSD
officials a couple of weeks earlier. In the end, I delivered this peer review on 18 August, as I
had a pre-commitment to annual leave from that date. These contract constraints are reflected
in the degree of polish of this peer review and in some repetition as I worked my way through
the entirety of the 113-page document (I have tried to eliminate as much repetition as
possible). Nevertheless, I have written more than 10,000 words of peer review addressing a
draft document of roughly 40,000 words.

MSD indicated that the key purpose of the external review “is to ensure that the contents of
the report…sufficiently covers the expectations of the Ministry of Social Development” as set
out in the RFP. They make reference to two specific questions that need to be adequately
addressed by MJ:

1. How well did the WSS policy development process work given the crisis context,
time, and resource constraints?
2. How well was the WSS implemented over time and how well were the risks managed
during implementation?

In this context, they added a need to assess:

1. Whether MJ have adopted a sound methodological approach
2. Whether the overall contents are presented in a logical and considered manner.

My primary structural approach to assessing the MJ evaluation is to use the UK Medical
Research Council (MRC) guidelines for a process evaluation as an external benchmarking
tool. The MRC framework was developed for process evaluations of complex interventions and
is summarised in Moore et al. (2015). The framework has three legs:

Implementation: Addressing what is implemented and how. There are three dimensions of
implementation:

1. Fidelity – whether the intervention was delivered as intended.
2. Dose – the quantity of intervention implemented.
3. Reach of the intervention – whether the intended treatment group(s) come into contact
with the intervention and how.

Mechanisms of impact: How does the delivered intervention produce change? This involves
evidence to explore the validity of the causal logic of the intervention.

Context: Context is anything external to the intervention which may act as a barrier or
facilitator to its implementation or to its effects.

A process evaluation is typically undertaken during a pilot or alongside a full trial. In this
context, I observe that the MJ process evaluation is somewhat unusual as it is an ex post
process evaluation after the WSS has been wrapped up.

The key MRC recommendations for a high-quality process evaluation are shown in Table 1
(Moore et al. 2015, Box 1).

Table 1: Framework for process evaluation quality assurance (from Moore et al. 2015)
Planning
- Carefully define the parameters of relationships with intervention developers or
implementers. Balance the need for sufficiently good working relationships to allow close
observation against the need to remain credible as independent evaluators. Agree whether
evaluators will take an active role in communicating findings as they emerge (and helping
correct implementation challenges) or have a more passive role.
- Ensure that the research team has the correct expertise. This may require expertise in
qualitative and quantitative research methods as well as appropriate interdisciplinary
theoretical expertise.
- Decide the degree of separation or integration between process and outcome evaluation
teams. Ensure effective oversight by a principal investigator who values all evaluation
components. Develop good communication systems to minimise duplication and conflict
between process and outcomes evaluations. Ensure that plans for integration of process and
outcome data are agreed from the outset.

Design and conduct
- Clearly describe the intervention and clarify causal assumptions (in relation to how it will
be implemented, and the mechanisms through which it will produce change, in a specific
context).
- Identify key uncertainties and systematically select the most important questions to
address. Identify potential questions by considering the assumptions represented by the
intervention. Agree scientific and policy priority questions by considering the evidence for
intervention assumptions and consulting the evaluation team and policy or practice
stakeholders. Identify previous process evaluations of similar interventions and consider
whether it is appropriate to replicate aspects of them and build on their findings.
- Select a combination of methods appropriate to the research questions. Use quantitative
methods to measure key process variables and allow testing of pre-hypothesised
mechanisms of impact and contextual moderators. Use qualitative methods to capture
emerging changes in implementation, experiences of the intervention and unanticipated or
complex causal pathways, and to generate new theory. Balance collection of data on key
process variables from all sites or participants with detailed data from smaller, purposively
selected samples. Consider data collection at multiple time points to capture changes to the
intervention over time.

Analysis
- Provide descriptive quantitative information on fidelity, dose, and reach.
- Consider more detailed modelling of variations between participants or sites in terms of
factors such as fidelity or reach (e.g., are there socioeconomic biases in who received the
intervention?).
- Integrate quantitative process data into outcomes datasets to examine whether effects differ
by implementation or prespecified contextual moderators, and test hypothesised mediators.
- Collect and analyse qualitative data iteratively so that themes that emerge in early
interviews can be explored in later ones.
- Ensure that quantitative and qualitative analyses build upon one another (e.g., qualitative
data used to explain quantitative findings, or quantitative data used to test hypotheses
generated by qualitative data).
- Where possible, initially analyse and report process data before trial outcomes are known,
to avoid biased interpretation.
- Transparently report whether process data are being used to generate hypotheses (analysis
blind to trial outcomes) or for post-hoc explanation (analysis after trial outcomes are
known).

Reporting
- Identify existing reporting guidance specific to the methods adopted.
- Report the logic model or intervention theory and clarify how it was used to guide
selection of research questions and methods.
- Disseminate findings to policy and practice stakeholders.
- If multiple journal articles are published from the same process evaluation, ensure that each
article makes clear its context within the evaluation as a whole.
- Publish a full report comprising all evaluation components, or a protocol paper describing
the whole evaluation, to which reference should be made in all articles. Emphasise
contributions to intervention theory or methods development to enhance interest to a
readership beyond the specific intervention in question.

Intervention logic
The MRC framework suggests that the intervention logic, via Mechanisms of Impact, should
be a central, integrated component of the process evaluation. In the MJ report, the
intervention logic appears to play no such central or integrated role in the evaluation.

The section on “Purpose and objectives” describes the objectives of the WSS (p. 21). Table
4 (p. 23) details some shifts over time. It is unclear in this context how the “objectives” of the
intervention differ from the “outcomes”. They don’t quite appear to be synonyms, but I may
be wrong. Clarity would be of value here.

The objectives are:

- Protecting the health system and individual health
- Protect the economy
- Protect business
- Protect workers
- Protect the welfare system

It seems logically problematic to me to perceive the economy as an entity different from the
people who make their incomes directly from it – workers and employers. It also seems
logically problematic to assert that an objective of the intervention is to protect the welfare
system – the purpose of the welfare system is to protect people who are in need, including
people who lose their jobs and become eligible, not vice versa.

Appendix 2 contains a short summary of an intervention logic. It refers to an A3, available on
request, which apparently provides the full intervention logic. The intervention logic is clearly
decentred into an appendix that few will read and into an A3 that even fewer readers will
access.

Within the summary intervention logic, there is no explicit mention of the direct fiscal stimulus –
measured in tens of billions of dollars – of the wage subsidy income transfer as an objective
(“maintain incomes and spending” – four words in the entire document – is as close as we
get). Nor is there any mention of the indirect impacts of the intervention on consumer
confidence, and hence spending, via workers feeling confident of retaining their jobs, or on
business confidence, and hence the likelihood of businesses maintaining their investment
expenditure.

The lacuna in terms of consideration of macroeconomic factors in the intervention logic is
surprising, and a considerable failing. The IGPS macro group, which meets fortnightly to
discuss macro issues and comprises a mix of staff and experienced outside economists,
certainly discussed the macro impacts of the wage subsidy at the time. It would be
exceedingly surprising to me if Treasury were not well aware of the potential for substantial
macro policy impacts. It would have been deeply remiss of them not to have considered these
issues in the policy part of the process. The macro fiscal injection intervention logic should
have emerged either from the competence of the MJ team in this area or out of the structured
conversations with officials.

A second lacuna in the intervention logic is its apparent failure to elucidate the programmatic
role of the assumed underlying “high trust” approach, despite the model being mentioned at
several points in the MJ draft. It is even unclear who is assumed to be able to operate
under conditions of high trust – this is never explicitly stated – although one must assume
the presumption is that employers are to be trusted to honestly apply the programme rules (or
is it the spirit/intent of the programme?) to themselves in making their applications.

A third lacuna in the intervention logic is any apparent consideration of market failure. A
temporary hit to business earnings should lead many rational firms and workers to continue
their employment connections as long as they can finance them, until the long-term norm is
re-established. If there are capital market imperfections meaning that either firms or workers
cannot borrow to tide themselves over a short-term shock, or if firms or workers do not
self-insure by holding sufficient financial buffers, government intervention may be needed
in the form of something like the WSS. Successful intervention is, therefore, predicated on
(1) a lack of assets for either workers or firms to draw down on to compensate for short-term
cash flow problems, and (2) a failure of the financial system to lend to firms or workers for the
usual market failure reasons of moral hazard and adverse selection. These are, in theory at
least, testable assumptions in a process evaluation.

Finally, there is no clarification at any point of how the de-centred intervention logic was
used to guide the selection of research questions and methods, as the MRC suggests it should be.

Fidelity, dose, and reach

According to the MRC guidelines, a core part of a process evaluation is to provide descriptive
quantitative information on fidelity, dose, and reach. The parts of the evaluation explicitly
addressing these dimensions were not obvious, although some could be identified scattered
unsystematically throughout the document.

For example, the most basic information related to dose and reach – a table containing very
simple numbers of applications, spend and numbers of jobs – is provided, by Phase (p. 10).
Extraordinarily, the table is apparently so unimportant that it is not numbered or indexed.

Table 5 is an “initial analysis of uptake”, also related to dose and reach. It contains no
numbers, no indication of Phase for take-up, and no indication of the data sources and
methods which underlie the table. “Motu, Table X” is not a sufficient source.

It seems from various places in the report that the take-up component, central to a process
evaluation (according to the MRC), has been out-sourced to Motu, who are doing the outcomes
evaluation. Consequently, it appears, also contrary to the MRC guidelines, that the process
evaluators do not have the requisite quantitative skill sets for core elements of the process
evaluation exercise. Indeed, this lack of quantitative expertise seems glaringly evident where
they use other quantitative data in this report. It must be said that the situation in terms of
applying qualitative methods is just as dire.

Despite this outsourcing, and the fact that this work apparently has not yet been fully
completed, MJ nevertheless manage to draw a considerable number of process evaluation
conclusions. This approach, in my view, does not meet acceptable minimum evidential
standards. It suggests a tendency to jump to predetermined conclusions in advance of the
evidence.

Data and methods used to answer the evaluation question

Table 1 of the document lists six different data sources. MJ state that they use a mixed-
methods approach to draw together data from multiple sources. It is unclear exactly what
these mixed methods are, or why they were chosen to apply to the research questions of the
process evaluation. MJ claim that this approach allows them to “triangulate”. It is unclear
exactly what MJ mean by “triangulate” in this context and how they believe it will aid in
answering the questions. The draft is an almost complete epistemological black box.

The data sources mentioned are:

- Review of existing data and documents
- Interviews with workers (n=56; 10 Māori, 8 Pacific)
- Interviews with employers (n=63; 21 Māori-led businesses, 12 Pacific)
- Interviews with sector representatives (n=18)
- Workshops and interviews with agencies and officials (n=40; officials from Treasury,
MBIE, MSD, IR, TPK, Te Arawhiti)
- Online surveys of workers (n=1013), employers (n=1535), and agency officials (n=40)

A large number of very important issues arise here at a foundational level which need to be
addressed before the evaluation can be taken as being of anything near a minimum acceptable
standard.

It is unclear whether the n’s indicated are intended or achieved numbers. Nor is it ever
made clear why those numbers were chosen, let alone how they relate to the research questions.

The review of existing data and documents could be very usefully split into its component
data and documents parts. The data part refers to “administrative data on scheme uptake,
complaints and processing”. What exactly is this data? Is it only MSD data? What about
integrated administrative data? Much more detail needs to be provided.

As with all the other forms of data set out, an explicit and detailed discussion of how it is to
be used and its strengths and weaknesses in terms of addressing the research questions,
individually and in concert with the other data sources and methods, is absolutely essential to
a credible, high-quality evaluation.

A further necessary starting point for this review should be a comprehensive listing of the
“More than 200 [existing relevant] documents”: the dates they were written, to whom they
were written, for what purpose, and who (which agency) wrote them. The process or processes
by which this list was compiled by the evaluator needs to be set out, including the extent to
which document discovery was an iterative process, and how documents were deemed
relevant or not to the evaluation.

Interestingly, at virtually no point (if any) in the evaluation is there any quotation from this
official documentation, despite copious use of quotation from the qualitative interviews
and (what I presume are – we aren’t told) non-transparent open field questions in the two
online questionnaires. Exactly why interview and questionnaire quotations are so enormously
privileged by MJ is not made clear to a reader. However, in this context it is worth
mentioning that the documents are real-time information arising out of significant collective
thought, for which the organisations that authored them are at least notionally publicly
accountable. Interview and questionnaire opinions are retrospective, often off the cuff, and
carry no attribution, responsibility or accountability. You’d think these issues might
be mentioned and the appropriate qualifiers made.

Further, very few of these 200+ documents are apparently referred to in the evaluation which
follows Table 1, and the referencing even of these few leaves a great deal to be desired (more on
the extremely poor referencing below). For example, footnote 17 refers to a Cabinet paper dated
December 2020. It is unclear which agency wrote the Cabinet paper, what it was entitled, or
what its purpose was. Footnote 20 references a Treasury Report, with no documentation of
whom the report was to, what its intention was, or the date on which it was written.
Footnote 22 refers to a Joint Report. It is not clear who collaborated on the report, for whom
it was written, or for what purpose. Page 86 refers only to a “March Cabinet
paper”. Footnote 26 mentions “Wage Subsidy Scheme: quarterly update on our on-going
approach to audits and integrity”, Ministerial Briefing, February 2021, and gives no indication
of agency or Minister (although these might reasonably be inferred, it requires some
unnecessary thought from a reader). Footnote 29 appears to refer to a document by Treasury,
but I can’t find any corresponding footnote on p. 93. Footnotes 9 and 10 reference
documentation with no agency authorship.

There is no consideration here of existing sample survey data which might be able to be used,
perhaps after integration with administrative data, to address the process evaluation
questions. The HLFS immediately springs to mind as a possible source. In addition, the IGPS
collected data on WSS take-up from workers in our lock-down survey, for which MBIE provided
some financial support and with which they are familiar. Good evaluation practice should
have uncovered this data.

In terms of the interviews, in addition to it being unclear how they were intended to be
used in pursuing the evaluation questions, it is unclear how the interviewees were selected,
what questions they were to be asked, by whom, how the questioning/discussion process was
structured, how information provided – including on the various dimensions of the
background of the interviewee – was recorded and coded, and what systematic, replicable
criteria, if indeed any, were used to select quotations and report them in the context of the
over-riding questions.

In terms of the agency face-to-face interviews, the cross-agency balance of interviewees was
unclear, as were the extent and depth of their role in policy development or operational
delivery, their technical expertise and job experience, and their level of responsibility within
their organisation – all dimensions which seem a priori relevant to the quality and
quantity of useful information they could supply pertinent to the evaluation questions.

In terms of the online survey of agency officials, it was unclear whether this was the same set of
officials who had participated in the face-to-face interviews. Was it simply coincidence that
the number in each vehicle was 40 officials? Unclear.

In terms of the online surveys of employers and employees, there was no information about
when the data were collected (barring the fact that the surveys were online), how respondents
were recruited, whether there was any over-sampling of priority groups and, if so, how that
was done, what questions were asked, in what order, and how the data were cleaned and coded.
The limited apparent reporting of these quantitative results provides no standard errors with
which to assess the statistical significance or otherwise of the information. It is unclear to this
reader whether these sample surveys can be relied upon to generate information which can be
used to generalise about populations and sub-populations.
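
To illustrate the kind of uncertainty reporting that is absent, the minimal sketch below (Python, using entirely hypothetical figures rather than anything from the MJ surveys) shows the sort of standard error and 95% confidence interval that should accompany any reported survey proportion – and which is only defensible at all if the sample is a probability sample.

    import math

    def proportion_ci(successes: int, n: int, z: float = 1.96):
        """Normal-approximation standard error and 95% CI for a survey proportion."""
        p = successes / n
        se = math.sqrt(p * (1 - p) / n)
        return p, se, (p - z * se, p + z * se)

    # Hypothetical example: 400 of 1,535 employer respondents agreeing with a statement.
    p, se, (lo, hi) = proportion_ci(400, 1535)
    print(f"p = {p:.3f}, se = {se:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
    # Roughly 0.26 with a margin of error of about two percentage points,
    # assuming simple random sampling - an assumption the draft never establishes.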

It is unclear whether any piloting was involved for either interviews or questionnaires.

There was no evidence that the quantitative and qualitative analyses were in any way
designed to progressively build upon one another. There was no evidence that qualitative data
was used to explain quantitative findings or quantitative data was used to test hypotheses
generated by qualitative data.

Judging by the way the qualitative data was actually employed by MJ, I fear it was simply
used to construct a narrative, with all the issues of potential bias – including confirmation
bias and conclusion-direction by non-transparent evaluator priors – which, in my experience,
this sort of narrative use in public sector evaluations too frequently entails.

It is unclear who owns the IP to any of this data and whether the data are available to outside
researchers to replicate, expand or extend any of the evaluation work undertaken. If they are
not easily available for reasons of privacy and confidentiality, it behoves the evaluator to
present as much detail as they possibly can in their report, conditional on these constraints.

There was no indication in the loose and brief discussion of methodology of an appropriate
and informed sequence of information gathering and analysis, as already mentioned above.
An obvious sensible iterative sequence in outline might have looked something like this:

• Develop and document a transparent process to comprehensively identify documents
and already existing data sets relevant to casting light on the process evaluation
research questions.
• Comprehensively list identified existing documents and data in an appropriate
scholarly manner.
• Assess and analyse their strengths and limitations in terms of answering the research
questions.
• Draw weak interim conclusions in a working paper, identify and assess information
gaps, and develop questions for further research from the analysis of the existing
documentation and existing data.
• Use this preliminary work to inform sensible questions and engagement with
officials, employers, and workers, with the aim of generating process evaluation
hypotheses which could be tested on sample survey data, enabling coherent and high
quality population and sub-population generalisations.
• Run those questions in the sample survey(s).
• Write up the draft report, drawing conclusions based on this sequential gathering of
evidence. Carefully weigh the strengths and limitations of data and methods in
addressing the research questions when drawing those conclusions. Make limitations
explicit and clear to readers. Ensure the work is as independently replicable as
possible by others, including both insiders and outsiders, by making as much data as
possible publicly available, referencing it in a fashion which meets strong academic
standards, and detailing a clear, robust methodology.

Comments on section titled “Consistency with the Treaty of Waitangi and Māori
experience of the Wage subsidy”

MJ claim that “[e]ven if it is valid that the Crown should place more weight on article 1 (and
Māori would probably contest this)”, there was space for more Treaty consideration after the
initial policy urgency had subsided.

More correctly, “some Māori and some non-Māori” or, better, “some people” “would contest
this”. Māori, like any other ethnic group or other macro group, should never be treated as a
homogeneous group with a common opinion or world-view. A similar claim is made on p. 95,
where Māori are treated as an implied homogeneous mass rather than a highly diverse group
with highly diverse views. The meaning of article 1, and its relationship to the other articles
in this context, is not made clear by the evaluators.

The issue of weight placed on various Treaty articles in meeting Treaty requirements is, of
course, ultimately a subjective, values-driven one, as is the appropriate resourcing into Treaty
dimensions, compared to catching up on other core work neglected in the most urgent phases
of developing and implementing the WSS (core work like, say, the 2020 and 2021 Budgets!)
or new work programme items in other urgent areas arising out of COVID-19.

Any conclusion about meeting or not meeting Treaty obligations – including that by MJ – is
therefore fundamentally a subjective, values-driven one, on several levels. One might
imagine that some explicit acknowledgement of this fact would be made by MJ and the
evaluators would avoid imposing their subjectivities on their audience.

It seems to me that the only appropriate and intellectually defensible objective conclusion is
something along the lines of: “Drawing any conclusion here depends on different people’s
subjective weights on the importance of the Treaty and its constituent parts, relative to
the value of other calls on scarce public sector policy advice resources. These weights can
and do legitimately differ between people. Hence no definitive conclusions can be drawn in
this space by objective evaluators”.

MJ claim in a sub-heading that the intent of the wage subsidy was not well understood by
Māori audiences (p. 30). This truth-claim is evidently a population generalisation about
Māori. If it is an absolute one – x% of Māori did not well understand the intent of the subsidy
– we are not provided with the percentage. If it is a relative one, compared to another social
group, we are not provided with the percentage difference or told what that other social group
is. Nor do we have any error margin. The only evidence offered in support of the strong
assertion is two qualitative quotes, one from an agency official and the other from a
self-employed Māori person. It is self-evident that this data is utterly insufficient to establish
the strong conclusion drawn by MJ. The bow is ludicrously over-drawn. The further claim
that WSS messaging did not resonate with “Māori values and ideals” additionally assumes
that there is some unity of values and ideals amongst Māori which, additionally, differs
significantly from those of other New Zealanders. Again, such a strong assertion about
in-group uniformity and out-group difference, as opposed to between-people diversity,
requires strong evidence, which we are once again not offered.

The following sub-section is entitled “Use of Māori business networks may have better
addressed barriers to access”. Or not. Nowhere have MJ established the implied premises –
that the Māori business population faced significant barriers to accessing the WSS, nor, if
they did, that the causal pathway arose in some way out of their Māori-ness and that the
solution was the use of Māori business networks. Assertion without evidence is not good
practice evaluation.

The next sub-section title asserts “Māori experience of the scheme became more varied as the
design of the scheme changed”. Again, this is a population assertion, in this case about the
second moments of a distribution over time. Establishing this descriptive fact requires
population data at a minimum of two points in time, a measure of variation, and a statistical
test of change in that variation. Again, perhaps unsurprisingly at this point, none of these is
supplied. The only evidence offered is from one Māori small business owner. When n=1,
definitionally, no variation can ever be observed.

MJ claim: “We were advised that Māori tourism businesses were surveyed, and the results of
their experiences were mixed” (p. 29). Advised by whom? Surveyed by whom? What were
they asked? What does “experiences were mixed” mean? How does that differ from other
New Zealanders? What sort of evidential standards does this report adhere to?

MJ claim (p. 29) that “Both Māori businesses and employees did not know where and to
whom they should make any formal complaints”. Really? All Māori businesses and
employees? Or only some? How many? How do they differ from other groups? Again, where
does this population generalisation come from?

There is no attention given in this section to assessing Māori demands and needs and the
context which Māori faced as WSS policy and delivery were being developed. Time was
clearly of the essence in implementing the WSS, and there are obvious trade-offs with time in
this area which the evaluators appear to wish away on the basis of a fairly bald, unevidenced
assertion about the policy production function (p. 14: “Treaty considerations and the potential
impacts for Māori were not analysed or investigated to an extent consistent with good
practice, even allowing for a need to develop policy quickly”).

First, a significant number of Māori – or their non-Māori spouses, on whom they may in whole
or in part depend for material support – were likely in sections of the labour market where
timeliness may have mattered a lot and Treaty issues perhaps not so much.

Second, many Māori, like other New Zealanders, faced significant extra calls on their time
during the COVID-19 crisis, both in terms of jobs and family or wider whānau. Some
(many? most?) may not have prioritised engaging with Government regarding the Treaty
relationship in the context of the WSS over other more pressing demands on their time.

Last, a disproportionate number of Māori were coping with pre-existing health conditions
like obesity and diabetes which rendered them vulnerable in the complex and uncertain
COVID-19 environment. In some families/whānau, the 1918 flu epidemic, in which Māori died
at six times the rate of non-Māori, is well within oral memory. Under such circumstances,
many Māori may have been understandably far less interested in being drawn into public policy
processes under the banner of the Treaty, involving considerable effort on their part, simply to
tick public officials’ Treaty obligations boxes.

We don’t actually know the extent of such constraints on Māori, but MJ don’t acknowledge
these potential issues, let alone investigate them. It’s all about the “Crown” to MJ. Māori are
decentred from the analysis, which does not seem consistent with a quality Kaupapa Māori
approach.

Overall, this section appears to make a number of population generalisations based, at best,
on a couple of anecdotes – anecdotes which one fears were selected on the basis of strong
priors to illustrate a pre-determined narrative. Strong truth claims such as that made by MJ –
that the process and delivery of the WSS were inconsistent with the Treaty – require robust and
methodologically sound evidence. We simply don’t have anything like evidence of the
standard required. Intellectual rigour is almost entirely absent in this section and the
conclusions which are drawn are evidentially unsupported. We will find that, having
established such an unfortunate pattern early on, MJ are loath to depart from it throughout the
remainder of the document.

Comments on section titled “Equity and consideration of potential impacts for sub-
groups of employers and workers”

Equity here is defined as “impacts on” (actually, as this is a process evaluation, this should be
“treatments for”) vulnerable population groups that are the same as for non-vulnerable
groups. Presumably the definition of equity might be expanded to include better treatments
for more vulnerable groups. In fact, that is exactly what the WSS offers.

It is unclear what criteria are used by MJ to identify vulnerable population groups. MJ
identify vulnerable groups as those over-represented in low-paid, low-skill jobs (p. 32 – but
also apparently numbered as p. 30 in the document!). The list provided is:

- Young people
- Older people
- Māori
- Pacific
- Women
- Disabled people

It is unclear from MJ whether this vulnerability is identified in a bivariate or multivariate context.

It is unclear why (for example) people from sole parent families, from poor families, with
mental health difficulties, migrants (permanent or temporary, economic or refugees) or
people who have low educational attainment are not identified as vulnerable groups. These
dimensions – or others – may be far more predictive of precarious employment or low wage
outcomes than the chosen groups.

MJ claim that there was little or no focus on equity in the WSS because it was a broad-based
scheme. They do not acknowledge the trade-off between the simplicity and rapidity of roll-out
and the costs of addressing group-based equity issues by engaging in some form of targeting
along these dimensions.

In addition, and most importantly, MJ do not acknowledge that the capped and wage-targeted
nature of the scheme, at $585.50 per full-time worker, meant that a far higher proportion of the
wages of vulnerable workers was covered – vulnerability here identified by an actual outcome
of low pay, not by a socio-demographic characteristic which is typically only weakly correlated
with a poor individual pay outcome. In 2020 the minimum weekly wage was $756. The WSS
covered 77% of the wages of a vulnerable full-time minimum wage worker. Working up the
wage distribution, for a worker on the economy-wide median weekly wage of $1040 only 56%
of their wage was covered, and for a worker on the average wage of $1197 the figure fell to
49%. Thus, the policy-related incentives on self-interested employers to retain a minimum
wage worker were far stronger than for a worker on the median or average wage.

Taking Māori to illustrate, 59% of the Māori median wage and 54% of the average Māori
wage were covered by the WSS, the somewhat higher proportion of wage coverage reflecting
their modestly lower median and average weekly wages ($999 median; $1084 mean). Hence,
contrary to MJ’s conclusion, I suggest the WSS has a clear equity component, and one
well-targeted at the root of individual disadvantage – the actual wage outcome – rather than
targeted at a very diverse-outcome group of people united only by common membership of a
macro group, be that macro group defined by age, gender, disability, or ethnicity.
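
For transparency, the coverage ratios quoted above can be reproduced with the following minimal sketch (Python, using the weekly dollar figures as stated in the text):

    # Weekly WSS full-time rate and the 2020 weekly wage figures quoted above.
    subsidy = 585.50
    weekly_wages = {
        "minimum wage": 756,
        "economy-wide median": 1040,
        "economy-wide average": 1197,
        "Māori median": 999,
        "Māori average": 1084,
    }
    for label, wage in weekly_wages.items():
        # Share of the weekly wage replaced by the subsidy, e.g. 585.50 / 756 ≈ 77%.
        print(f"{label}: {subsidy / wage:.0%} of the weekly wage covered")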

Nevertheless, it is also true that MJ’s vulnerable groups – and any other vulnerable groups
not considered by MJ – who on average receive a lower wage get, again on average, a higher
portion of their wage replaced. In other words, the policy is designed to ensure that a higher
proportion of vulnerable groups receive the policy treatment, without ruling out treatment for
other individual Kiwis who have a poor outcome but are sufficiently unfortunate not to be
allocated to membership of a vulnerable group by policy makers or evaluators. Arguably,
simple, well-focussed, and equitable.

In addition, at least some token analysis of the labour market seems necessary here. In the
absence of the WSS, the intervention logic suggests that numbers of workers would have
been laid off in the short term. The aim of the WSS was to prevent such layoffs. What are the
likely characteristics of such workers? These lay-offs would disproportionately be workers
with low fire and low hire costs. These vulnerable workers would be likely to:

• Have generic rather than specific (especially firm-specific) skills
• Be in labour markets more generally characterised by excess supply rather than excess
demand
• Be in jobs where the employment relationship is focussed on short-run profit
maximisation rather than enduring long-term relationships
• Be non-unionised
• Be in low-paid, low-skilled work
• Have little or nothing in the way of redundancy provision in their employment
agreements.

In other words, when looking at those workers at the margin for policy-induced behavioural
change, this intervention has a clear equity focus.

Clearly, however, the design of any policy to prevent layoffs under COVID-19 is likely to
have a positive individual equity component, both in terms of seeking the equity outcome that
actually matters (reducing job loss for people with low pay) and in having a greater treatment
effect on macro groups whose membership generally – but typically weakly – correlates with
the undesired labour market outcome.

There needs to be an acknowledgement that there is likely a considerable wedge between
individual measures of apparent inequity, such as low pay, and actual individual lived
disadvantage. People live in families, and we know, for example, from minimum wage
research that many people who have apparently disadvantaged individual circumstances live
in well-off families and have high standards of living. The classic example, of course, is
university students – a considerable chunk of the “Young People” category – from middle
class families, living at home with double-income, property-owning, university-educated
mums and dads, while working part-time as a waiter in a restaurant and getting paid the
minimum wage. Similar situations, where individual low pay does not map perfectly onto
people’s family situation, hold of course for many women, Māori, older people, and disabled
people as well.

The initial analysis of programme take-up (p. 32) provides a table (already discussed above)
which gives (1) only qualitative information on take-up, with no indication of size and no
indication of the default group, (2) no indication of statistical significance, (3) no explicit
indication of whether it is bivariate or multivariate (although by implication it is the former),
(4) no indication of data source or WSS iteration, and (5) announces that take-up is, in any
case, the business of Motu and the outcome evaluation, when it is clearly the business
of the process evaluation.

Comments on section titled “Policy Design and Development Process”

This section starts by stating that “[i]t likely that following the orthodox policy process and
timeframes may have led to undue delays to introducing financial assistance to business”.
On the face of it, this (unevidenced) claim sits rather awkwardly against the claim that there
should have been more meaningful engagement with Māori under the auspices of the Treaty.
More assessment is needed here.

The section also briefly notes the fact that the WSS was based on a previous scheme used for
the Christchurch earthquake (p. 36). What was that scheme? Did it have an intervention
logic? How did the scheme differ from and how similar was it to the WSS? It is unclear
whether any evaluations were made of use of this previous scheme, whether they were
process or outcome evaluations and what was learned by officials from this previous
experience (but see OAG 2012). How exactly was it drawn on?

The section also refers to “wider interested parties” for consultation but (apart from Māori
and Pacific people), does not identify who they are. Citizens, perhaps? It is unclear whether
MJ simply define interested parties as ethnic groups or whether interested parties actually
exist beyond ethnic categories. Would those wider parties also include people outside
government who have an academic background and who specialise in labour market research
and analysis? Again, the answer is unclear.

p. 37 refers to “available data and evidence on the uptake of the scheme to date”. What was
this data? Were reports written? Were these reports used in the process evaluation? Where’s
the referencing?

MJ mention the high trust assumption at several points in their evaluation, including in this
section (p. 48). High trust is clearly a central assumption of the intervention. But MJ never
discuss the theorised role of high trust in the causal process; it is not integrated into the
intervention logic, nor do they even start to empirically explore the validity or otherwise of
the high trust assumption. Surely this should be a core part of the process evaluation, since it
is clearly a key assumption of the intervention. Data useful at least at a contextual level
exist in NZ on interpersonal trust, trust in fellow New Zealanders, trust in big business and
small business, and trust in government at various levels. None of this data is explored. It is
also unclear whether trust questions were asked in the two larger questionnaires of employers
and employees.

MJ baldly assert that “Many opportunities to reconsider delivery system [sic] or invest in
more flexible systems were not taken up throughout the life of the wage subsidy” (p. 48).
But resources are scarce and demands many – indeed, MJ go on to make that very point. That
being the case, these were not really free opportunities to take up!

I suggest that Figures 5 and 6 would be much better presented as cumulative totals (y axis)
versus days (x axis).
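
A minimal sketch of the suggested presentation (Python, with an entirely hypothetical daily application series standing in for the actual data behind Figures 5 and 6, which I have not seen):

    import pandas as pd
    import matplotlib.pyplot as plt

    # Hypothetical daily application counts - replace with the actual WSS series.
    daily = pd.Series(
        [12000, 30000, 25000, 18000, 9000, 6000, 4000],
        index=pd.date_range("2020-03-23", periods=7, freq="D"),
        name="applications",
    )
    cumulative = daily.cumsum()  # running total of applications to date

    ax = cumulative.plot()
    ax.set_xlabel("Day")
    ax.set_ylabel("Cumulative applications")
    plt.show()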

Figure 8 (and those that follow reporting the questionnaire evidence) would benefit from being
presented as tables rather than charts, especially as charts are very difficult to read when a
report is printed out in black and white, as it typically will be. Additionally, categorical
responses (say, “disagree” and “strongly disagree”) should not be aggregated in those tables.
In particular, conflating “neutral” with “don’t know” seems seriously questionable to me –
they are very different things.

MJ state “three types of employers…were less likely to report a positive experience” (p. 53).
How much less likely? What’s the confidence interval? How many of them were there? How
many people did they employ? What types of employers were likely to report a positive
experience?

On p. 54, conclusions again appear to be drawn where population validity is implied
from qualitative evidence – in terms like “most people” and “a minority of employers”. This
is not legitimate.

MJ write “Under the high-trust model, MSD worked with IRD to verify applications” (p. 54).
This is a confusing claim, as under a high trust model verification is unimportant. No data
are provided at this point on verification or rejection by IRD based on the information they
held, or on the reasons for rejection.

Figure 9 presents data which appear to show that more than 100% of calls to the IRD original
wage subsidy line were answered over significant time periods. We know from post-match
interviews that rugby league players are capable of 110% effort, but I was not aware that
public officials shared a similar capability. Are the figures wrong?

This section could have benefited greatly from using and analysing data on gross dollars paid
out under the WSS and gross dollars paid back over time due to over-payment.

The communications discussion asserts that agencies should have made “better use of
targeted messages” (p. 61). The evidence provided by the evaluators is, in my view, not of
anywhere near sufficient quality to support that conclusion. The evaluators imply that there
were ethno-cultural and linguistic barriers to WSS take-up and, by further implication, that
these were significant. This is a hypothesis, not evidence for a strong conclusion. Hypotheses
should be tested, not asserted as fact.

MJ assert sector representatives took the initiative to support their constituencies because
“supporting the government’s crisis response is part of their core function” (p. 63). I would
be surprised if this was their primary reason, which seems more likely to be about supporting
their constituencies to get the support to which they were entitled.

MJ report on understandability of the subsidy (p. 63). Do workers need to understand the
subsidy at all, especially if the high trust assumption is valid? Why? Along what dimensions?

The question, apparently in the online surveys, of whether communications were in the first
language of the worker is an odd one (p. 64). Surely the language barrier issue is whether
communications were in a language in which the person was fluent? A similar issue
holds for the language question asked of employers, but that question is even odder: it does
not ask employers whether communications were in a language in which they were fluent; it
asks for employers’ perceptions of the first language of their workforces. Yet active take-up of
the WSS turns on employers’ positive actions, not workers’, and therefore the primary
information need relates to employers!

MJ again repeat their regular failure to understand the limitations of qualitative research
where they write “a quarter of employers that responded to our survey may not have found it
easy to assess their eligibility…As noted earlier, we found that qualitative interviews suggest
these rates were higher” (p. 70). Really? I find it difficult to believe anyone would dare
generalise from non-random samples of data where n=61.

MJ (p. 71) indicate that there were variations across sub-groups in their employer survey
regarding understanding, but no data are presented to assess the veracity, size, and robustness
of these claims.

MJ write “The policy design required businesses to declare they had taken active steps to
access other supports before accessing the wage subsidy”. It is unclear whether Figure 18, on
actions taken by organisations following COVID-19 (p. 73), covers the universe of these
necessary active steps, is entirely orthogonal to them, or overlaps with them only partially.
My point goes to the assessment of fidelity. The labelling of the chart requires further
thought, as “Negotiated new terms with your…” makes no sense. In this context, inclusion of
the “not applicable” responses is appropriate and I believe their exclusion is incorrect. In
addition, the fidelity focus should be on firms who actually took up the WSS, not on what all
firms did, and it does not appear that the data were divided here to focus on organisations who
had taken up the WSS.

Comments on section “Support functions that cut across design and delivery”

The discussion of cross-agency working needs to be contextualised in terms of the State
Sector Act and the new Public Service Act.

The section on cross-agency working is, like much of the rest of the document, virtually
bereft of references, and it is almost always unclear from where any of the factual assertions
derive. There are very good intellectual reasons for copious, high quality referencing – it
demonstrates that the work is thoughtful, careful, and rigorous, and it allows third parties (at
least in theory) to independently check and replicate (or not) any of the conclusions drawn
in the original document (along, of course, with a clear statement of methods). A
commitment to good referencing practice also creates an intellectual hostage to fortune on
paper and is thereby a means of keeping intellectual endeavour honest and of the requisite
high quality. Ideally, an outside researcher should have enough information on data and
method from a credible evaluation that they would and could draw exactly the same
conclusions. Our distance from that ideal in this document is vast.

On page 85 an MSD survey of wage subsidy recipients is mentioned. Why was this survey
not discussed in the earlier data discussion table? Did it provide no information germane to a
process evaluation?

Some interim agency consultation with “key stakeholders” between phases 1 and 2 is
mentioned by MJ (p. 85). Who were these key stakeholders? How were they identified? Were
Māori included (see the section on meeting Treaty obligations) and, if so, who? If not, why not?
What exactly were stakeholders consulted on? How? By whom? What were the conclusions?
Where’s the documentation? Where’s the referencing? Key themes emerging out of this
consultation were, apparently, “not pursued”, but we are not even told what these key themes
were or why they were not pursued.

A Deloitte risk assessment of the WSS for MSD is mentioned on p. 86, but no documentation
is referenced. Were documents produced as a consequence of this risk assessment? Did MJ
see them? What did they show? Were they relevant to the evaluation? Unclear.

Page 87 refers to both random and targeted reviews of payments by MSD and IRD acting
jointly. This information would appear to go directly to core issues of intervention fidelity.
Where is this data? What did it find? Why isn’t it reported on and discussed in this section?
Why isn’t it discussed in Table 1, which covers data sources? To what extent does this data
validate (or not) the high trust assumption of the intervention?

Care and precision in the use of language and definitions

MJ refer to tax aversion (p. 102). I believe this is wrong and they actually mean tax evasion.

It is unclear what ethnic definitions and collection methods are used for Māori, Pacific and
Asian groups and whether these definitions were consistently applied across the different data
collections. It was also unclear who the default group included.

It is unclear how a “Māori-led” business differs from a “Māori” business. Additionally, is a
doctor’s surgery with one Māori and one non-Māori doctor operating in partnership a Māori-
led business? Or is it a Māori business? Or just a business, devoid of ethnic attribution?

In considering the impact of the WSS on Māori, MJ fall into a common public sector binary
trap. They tacitly presume that the impact comes via a Māori business and employment pathway
(ignoring any definitional issues of Māori business for the moment). But our social reality is
that more than half of Māori have a non-Māori spouse, and a large majority – over two thirds
– of Māori children have a non-Māori parent (in both cases a plurality NZ European), and the
impacts of the WSS on them will happen via these non-Māori people – through both non-
Māori business ownership and non-Māori employment. MJ (and the public service more
generally) need to step away from the crude binary dichotomous world in which they operate
and start seriously questioning their own over-simplified priors.

The text uses the ugly and (to me) unfamiliar word “uptake”. The more usually employed
term “take up” is also frequently used. I’m presuming they’re being used as synonyms, but
the elegant variation seems unnecessary.

The document could use a spell check (at least two spelling mistakes were identified, and I
wasn’t looking for them) and a grammar check (several failures to use the possessive
apostrophe were observed, for example). Where problems like this – low cost to identify and
low cost to remedy – are present, a lack of care and attention to detail is demonstrated. It
provides an indicator of the degree of care and attention paid where issues are less visible to
the external eye. If the waiter’s hands are a bit grubby, it’s often an indication that the
restaurant kitchen is filthy.

In literally dozens of cases at various points in the MJ draft the evaluators use the phrase “We
heard…” and present what follows as tacitly conclusive information. Frequently it is unclear
from whom they heard it and from what information source, and there is no assessment of
what weight (if any) a reader should place on the information. References are, again, absent.

Table 2’s bracketed generic ratings reverse from EXCELLENT (Always) to POOR
(Never...). This reversal from an implied “always excellent” to “never poor” is unnecessarily
confusing. Get rid of the material in brackets.

There are copious quotations from qualitative data sources in the body of the report. I have
already commented on the black boxes of how this information was gathered, the methods
through which it was selected (or not) for use, and the lack of quotations from other
documentary sources. These quotations carry attributions like Employer, Large employer,
Medium sized employer, Self-employed interviewee, Worker, Part Time worker, Wellington
Region, Māori Worker, Migrant Worker, Migrant Casual Worker. The definitions are not clear,
self-evident, or even consistent – how many people are employed by a medium sized
employer? How does an Employer differ from a Large employer? Additionally, why is region
important information for workers who do not have a Māori ethnicity? Why is region
unimportant for Māori workers? Why is ethnicity only important for Māori? What is the
overall motivation for telling us something – anything – about the provider of the quote, given
that the information cannot be used for any population generalisation (though it might, of
course, also lull an innocent or incautious reader into the belief that it does in some way offer
a generalisation – perhaps that is exactly its intention)?

MJ’s generic rating definitions for the process rubric

MJ develop and use a five-point, judgementally based qualitative scale for process:
Excellent/Very Good/Good/Fair/Poor. They rank various sub-dimensions of the process in
terms of this scale (Table 3, p. 12). There appears to be a high degree of subjectivity, a lack of
transparency, and a lack of evidence in the allocation of these categories to score
sub-dimensions of the process.

But there is a further fundamental problem with these categories that needs to be addressed.
Allocation of a dimension at a phase to these categories necessarily involves error of various
sorts. Think of the category as the qualitative analogue of a point estimate in a quantitative
study. Any point estimate comes with error. That error comprises random sampling error and
various forms of measurement error, as well as (in this case) various forms of evaluator
subjectivity and cognitive bias. MJ present their categorical assessments as if they contain no
possibility of error and no bias. This failing is a fundamental issue. I expect far greater
intellectual sophistication, self-examination, and self-criticality.

Questionable policy assertions

MJ conclude that “Improvements could have improved targeting, simplified the delivery of
the scheme and mitigated inequity” (p. 15). This is highly questionable. There is no clarity on
how targeting could have been practically improved, conditional on the constraints, and no
acknowledgement of the fact that greater targeting inevitably increases systemic complexity –
i.e., it comes at a considerable policy, delivery, and communications cost. As MJ correctly
note, the intervention was a relatively simple one grafted onto a very complex system of
labour market and other regulation, which itself created some not inconsiderable challenges.
Making the intervention more complex in terms of targeting means that a complex system
would have been interacting with a complex system. The system complexity under such
circumstances could increase multiplicatively, not simply additively. Design trade-offs
always and everywhere exist and are not specified in an objective and quantitative form using
a consistent common metric for purposes of policy optimisation. Yet conclusions that imply
such optimisation was not achieved are drawn.

In addition, MJ are critical of the public sector for not taking opportunities to re-jig the
scheme through time. Yet they fail to acknowledge that scheme changes across phases would
have increased the cognitive loads on businesses, workers and delivery agents, and created
further communications and credibility problems. Nor do they genuinely acknowledge the
opportunity costs of public servants’ attention in any re-jigging. Nothing comes
without a cost, as already pointed out in the immediately preceding paragraph.

Context

Context is defined by the MRC as “anything external to the intervention which may act as a
barrier or facilitator to its implementation or to its effects”. There is no systematic scan of the
environment by MJ to identify, list and analyse context. Some of this contextual failure has
already been mentioned above in relation to Māori and the Treaty analysis.

There is some acknowledgement of the importance of context issues, tucked away in an appendix. Appendix 2 states that “Other support for businesses and workers provides important context for evaluating design, delivery, and outcomes of the Wage subsidy”. So, if context is important, where is the discussion of it? Why is it mentioned only in an appendix?

I cannot provide such a comprehensive scan of relevant context here, but off the top of my
head I can mention several other dimensions of context which are likely to be germane to a
proper evaluation.

The Public Service Act 2020 came into force in August 2020. That Act was intended to address what some influential senior officials considered to be chronic problems in the New Zealand public service: a lack of collaboration and a lack of nimbleness. The only mention of the Act in the evaluation is a footnote in the context of the Treaty.

There is no consideration of (1) the state of the economy and (2) the state of the labour market on the cusp of COVID-19 and through the WSS period. Equally, there is no consideration of the forecast state of the economy and the labour market. This context is incredibly important to the policy decision-making process.

There is no consideration of contextual elements of the Reserve Bank’s monetary policy, including actions to set the official cash rate, the Large Scale Asset Purchase Programme, and the Funding for Lending Programme.

The issue of degrees of trust is also germane to context, as is the Christchurch business support package. These issues have already been raised.

Attribution and responsibility

The MRC framework recommends that evaluators “[e]nsure that the research team has the correct expertise” in qualitative and quantitative methods and “[e]nsure effective oversight by a principal investigator who values all evaluation components.”

The draft contains a section at the end, “About the authors of this report” (p. 97). That section is primarily information about what Martin Jenkins does and what its business structure is. “[S]upport” from Te Paetawhiti and Associates and ConnectEd is mentioned. Exactly what “support” the non-MJ teams provided is unclear, as is exactly who they are and what they do as collectives.

There is no mention of any of the individual authors involved, their expertise, what they had responsibility for, who wrote what, or who the principal investigator with oversight responsibilities is.

It may be that MSD as contract managers have access to this information. However, this
information should also be provided to the audience in order that they can independently
verify expertise and oversight skills and responsibilities.

There are other good, quality-related reasons for acknowledging individuals and their inputs and responsibilities. Individual attribution fosters a sense of pride and responsibility in the individual worker and thus pushes quality up, as does the demonstration effect of an individual in the team striving to raise the benchmarks. Additionally, indicating authorship, responsibility, and workload is effective in improving quality where people are concerned about their future professional reputations: these individual reputations matter in terms of future earnings capacity.

If there is no individual attribution, the risk is “all care and no responsibility”, since without it the beneficial and adverse consequences of individuals’ actions are dispersed across the many. Attribution of names and responsibilities is thus a means of better aligning incentives to improve evaluation quality.

Conclusion

MJ is a large, well-known, and longstanding private consulting organisation with a strong reputation. It has experienced staff working in the evaluation area.

Therefore, I am both surprised and deeply disappointed in this draft. The tacit methodological
approach – tacit as it is never clearly and comprehensively articulated – is not sound. The
methods loosely chosen are not applied to a minimum acceptable standard. Consequently, the
conclusions are not clear and logical. The document lacks depth, nuance, subtlety, and self-
reflection. MJ’s claim (Appendix 1, p. 99) that “Information has been presented
transparently. We have taken care to ensure the evaluation does not overstate the extent of
engagement and the representativeness of samples” makes me alternately laugh and weep. I
will merely further state that this is the only use of the word “representativeness” in the
report. Many of what I would consider the basics of good science are missing, including
assessment of the strengths and weaknesses of the evidence base, a clear and coherent
methodology to apply the evidence base to the questions, appropriate referencing, and an
assessment of the limitations of knowledge arising from the exercise, including uncertainty
regarding conclusions. Many of the conclusions derived lack a valid evidential base of
support. The obvious risk is that conclusions are a hot mish-mash of evaluator cognitive and
political biases and shibboleths peculiar to the Wellington public policy milieu. Surely, we
can do much better.

In terms of the MJ assessment rubric developed for the WSS, I would judge this draft as “Poor”. In terms of a qualitative degree of confidence in my assessment, on a personally developed three-point “High”, “Medium”, and “Low” scale, I would judge my confidence as “High”.

In drawing these conclusions, I have asked myself how I would judge this work in two other roles. As an academic journal referee addressing a submission, my decision would be to reject, with no option to revise and resubmit, and I would not bother to provide anything more substantial in terms of feedback than “significantly fails to meet minimum academic standards in numerous critical areas”. As an academic supervisor receiving this from a graduate student as a thesis draft, my response would be to advise the student to go away and do a lot more work, specified as weekly reporting, since the student clearly would need detailed, high-frequency outside guidance to ensure they do not go so badly off the rails again; as a product, I would deem the MJ draft a failure at Masters, let alone Doctoral, level, and would start to question the teaching capabilities of my colleagues. I believe this is a fair assessment. It may be that a great deal of work which supports the conclusions drawn has been undertaken but is not on show here. However, if that is the case, it needs to be included in the final report, not hidden away behind the scenes.

In effect, MJ will need to write a completely new evaluation in order to produce a deliverable which meets minimum acceptable standards. It should probably already be clear to MSD that this is my advice arising out of this peer review, but I state it baldly here in case of misunderstanding. Beyond that, MJ seem to me to face at least one other major problem going forward.
They have asserted certain uncaveated and strong answers to the process evaluation questions
based on inadequate and incomplete evidence. Consequently, there is a considerable risk that the evaluators have now locked in high degrees of confirmation bias. Confirmation bias on the part of Martin Jenkins is now also a substantial issue for the proposed synthesis report. The funding agency needs to be aware of, and consider how to manage, these larger confirmation bias risks. One obvious way forward would be to have a completely different team do the remaining necessary work, possibly blind to this draft.

There is a further issue related to bias. The report states on its title page that it is an independent evaluation. However, MJ has close links to the New Zealand public service via ongoing, large-scale, and regular contracting out of a variety of policy advice and evaluation functions. A recent media article on MJ noted: ‘There is a saying floating around Wellington which, like most jokes, hides an uncomfortable truth: “There are three branches of government: the legislature, the judiciary and MartinJenkins.”’ (https://www.stuff.co.nz/business/129379423/dileepa-fonseka-the-consultancy-machine-needs-repairing--but-who-will-do-it). It is unclear to me how MJ have successfully managed the risks to their independence in running this evaluation that arise out of their very close and longstanding relationship with the New Zealand public sector as a major client. Conscious bias here, both on the part of the evaluator and of the evaluated (public servants), seems to me far less likely than unconscious bias (however, I would certainly not rule the former out a priori). This bias issue should also be explicitly acknowledged, potential mitigation strategies outlined, and a clear mitigation strategy implemented.

It might be thought from my comments above that I am dismissing this evaluation as having low or even no value. This is not the case; it is actually worse than that. This evaluation should be about setting standards for other evaluations in the New Zealand public sector. First, by setting the bar so low for such a high-profile and important evaluation, it creates negative spillovers across the public service in terms of what is acceptable in future internal or external evaluations. Second, if the public service responds to information where there is a strong possibility it is misleading (in this or any other evaluation influenced by it), significant harm may be done and substantial net costs incurred.

My final comment, phrased as a question, goes somewhat beyond my brief, but I like delivering value for public money. My understanding is that, together, about $1 million was allocated overall for the combined process and outcome evaluation. Roughly $18 billion was cumulatively spent by government on the WSS, so the evaluation budget amounts to roughly 0.006 per cent of programme spend, or about one dollar of evaluation for every $18,000 of subsidy. Was such a comparatively low evaluation budget, and such a short time frame for delivery of the evaluation, appropriate to the extremely large size of the intervention? Possibly. Possibly not.

References

Ministry of Social Development (no date), High level evaluation approach for the COVID-19
Wage Subsidy Scheme, internal document supplied by MSD.

Martin Jenkins (2022a), COVID-19 Wage Subsidy Scheme Evaluation Plan, March.

Martin Jenkins (2022b), Process Evaluation of the COVID-19 Wage Subsidy, draft, 12 August.

Moore, Graham F., Suzanne Audrey, Mary Barker, Lyndal Bond, Chris Bonell, Wendy
Hardeman, Laurence Moore, et al. (2015), “Process evaluation of complex interventions:
Medical Research Council guidance.” British Medical Journal, 350.
https://www.bmj.com/content/bmj/350/bmj.h1258.full.pdf

Office of the Auditor-General (2012), Realising benefits from six public sector technology projects, Wellington, June.
