The Future of Skills: Employment in 2030
Hasan Bakhshi
Jonathan M. Downing
Michael A. Osborne
Philippe Schneider
Research partners
Creative Commons
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/ or send a letter
to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.
Suggested reference
Bakhshi, H., Downing, J., Osborne, M. and Schneider, P. (2017).
The Future of Skills: Employment in 2030. London: Pearson and Nesta.
Copyright 2017
ISBN: 978-0-992-42595-1
THE FUTURE OF SKILLS: EMPLOYMENT IN 2030
ABOUT PEARSON
Pearson is the world’s learning company. We’re experts in educational courseware and assessment, and provide teaching
and learning services powered by technology. We believe that learning opens up opportunities, creating fulfilling careers
and better lives. So it’s our mission to help people make progress in their lives through learning.
ABOUT NESTA
Nesta is a global innovation foundation. We back new ideas to tackle the big challenges of our time, making use of our
knowledge, networks, funding and skills. We work in partnership with others, including governments, businesses and
charities. We are a UK charity that works all over the world, supported by a financial endowment.
www.oxfordmartin.ox.ac.uk
Twitter: @oxmartinschool
Facebook: /oxfordmartinschool
CONTENTS
EXECUTIVE SUMMARY
1 INTRODUCTION
2 LITERATURE REVIEW
2.1 Anticipating occupations and skills
2.2 Changing skills needs
2.3 Labour markets and structural change
Overview of the trends
3 APPROACH
4 DATA
4.1 O*NET
4.2 Employment microdata
4.3 Workshop-generated data
5 METHODOLOGY
5.1 Gaussian processes
5.2 Heteroskedastic ordinal regression
5.3 Active learning
5.4 Assessing feature importance
5.4.1 Pearson correlation
5.4.2 Average derivative and feature complementarity
5.5 New occupations
5.6 Trend extrapolation
6 RESULTS
6.1 Occupations
6.1.1 US
6.1.2 UK
6.2 Sensitivity analysis
6.2.1 US
6.2.2 UK
6.3 Skills
6.3.1 US
6.3.2 UK
6.4 Relative importance of skills, abilities and knowledge areas
6.5 Skill complementarities
6.5.1 US
6.5.2 UK
6.6 New occupations
6.6.1 US
6.6.2 UK
7 LIMITATIONS OF THE ANALYSIS
8 CONCLUSIONS
Appendix A: Sensitivity analysis
Appendix B: Overall rankings of minor occupation groups, US and UK
Appendix C: Skills ranking by average derivative, US and UK
Appendix D: UK–US occupation crosswalk
REFERENCES
ENDNOTES
ABOUT THE AUTHORS
Hasan Bakhshi leads Nesta's creative economy policy and research and data analytics. His work includes co-authoring
the Next Gen skills review of the video games and visual effects industries, which led to wholesale reforms of the school
ICT and computing curriculum in England, and the Manifesto for the Creative Economy, which sets out ten recommendations
by which governments can help the creative economy grow. Bakhshi is a member of the Royal Economic Society
Council, the Senior Management Team at the Economic Statistics Centre of Excellence (ESCoE) supported by the Office
for National Statistics (ONS) and the Expert Peer Review Group for Evaluation at the Department for Business, Energy and
Industrial Strategy.
Jonathan M. Downing is a DPhil (PhD) student in the Machine Learning Research Group at the University of Oxford, jointly
supervised by Dr. Michael Osborne and Prof. Stephen Roberts. His academic research focuses on Bayesian non-parametric
methods, the future of work and machine learning interpretability. Prior to starting his DPhil he completed an MEng in
Engineering Science at Keble College, University of Oxford.
Michael A. Osborne is the Dyson Associate Professor in Machine Learning, a co-director of the Oxford Martin
programme on Technology and Employment, an Official Fellow of Exeter College, and a co-director of the EPSRC Centre
for Doctoral Training in Autonomous Intelligent Machines and Systems, all at the University of Oxford. He works to
develop machine intelligence in sympathy with societal needs. His work in Machine Learning has been successfully
applied in diverse contexts, from aiding the detection of planets in distant solar systems to enabling self-driving cars to
determine when their maps may have changed due to roadworks. Osborne has deep interests in the broader societal
consequences of machine learning and robotics, and has analysed how intelligent algorithms might soon substitute for
human workers.
Philippe Schneider is an independent researcher who has collaborated extensively with Nesta as a Visiting Research
Fellow and consultant since 2007. He has carried out technical and policy work for a range of private and public sector
organisations, including the World Bank, BlackRock, the African Development Bank, HM Treasury, BP, CISCO, British
Academy and what is now the Department for Business, Energy and Industrial Strategy where he was Advisor to the
Secretary of State on innovation policy. He speaks and reads Chinese.
ACKNOWLEDGEMENTS
We thank Logan Graham for his research assistance on coding and preparation of the O*NET and employment data
and Justin Bewsher for his contributions to the sensitivity analysis. We are indebted to Harry Armstrong and Wendy L.
Schultz, and to those who participated in the two foresight workshops held in Boston, Massachusetts, and in London in
October 2016. We are also grateful to the attendees of a dry run workshop in London who helped us refine the workshop
exercises. Thanks also to Antonio Lima, George Windsor, Juan Mateos-Garcia, Geoff Mulgan and Cath Sleeman at Nesta
and Laurie Forcier, Amar Kumar, Janine Mathó, Tom Steiner, Vikki Weston and other colleagues at Pearson for their
encouragement, support and valuable comments on the findings. Mark Griffiths played an important role in the inception
of the study and provided valuable support throughout its implementation. Sir Michael Barber, Paolo Falco, Joshua Fleming
and Carl Frey provided invaluable comments on an earlier draft of the report. We are also grateful to Rainy Kang for
helping us track down data and Jessica Bland for her contributions in the early stages of the research.
EXECUTIVE SUMMARY
OUR CONTRIBUTION
Job Features
The skills, abilities and knowledge areas that comprise occupations
are collectively called features.
Structural Change
In economics, structural change is a shift or change in the basic
ways a market or economy functions or operates.
OUR METHODOLOGY
Here’s how our methodology works:
We start by reviewing the drivers of change and the interactions that are expected to shape industry structures and labour markets in 2030. We also assemble detailed information about occupations (key tasks, related industries and historical growth patterns). This material is used to contextualise and inform discussions at foresight workshops in the US and UK, our countries of analysis.

FORESIGHT WORKSHOPS
At the workshops, panels of experts are presented with three sets of ten individual occupations and invited to debate the future prospects of each in light of the trends. The first set of ten occupations is chosen randomly. Participants then assign labels to the occupations according to their view of each occupation's future demand prospects (grow, stay the same, shrink), as well as their level of confidence in their responses. To sharpen prediction, an active learning method is implemented: the subsequent sets of occupations to be labelled are chosen by the algorithm. Specifically, the algorithm chooses occupations in areas of the skills space about which it is least certain, based on the previously labelled occupations. This process is repeated twice to generate a training set of 30 occupations.

We subsequently use this information to train a machine learning classifier to generate predictions for all occupations. This relies on a detailed data set of 120 skills, abilities and knowledge features against which the U.S. Department of Labor’s O*NET service 'scores' occupations. (We also map this data to the closest comparable UK occupations using a 'cross-walk'.) Together with the predictions about changes in occupational demand, this permits us to estimate the skills that will, by extension, most likely experience growth or decline.

ANALYSIS
We interpret the machine learning results with particular attention to the discussions from our foresight workshops, and highlight findings that are most relevant for employers, educators and policymakers.
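As an illustration of the active learning step described under Foresight Workshops, the following Python sketch picks each new batch of ten occupations from the regions of feature space where a model is least certain. It is a simplified stand-in, not the study's implementation: it uses a plain Gaussian process with an RBF kernel and a fixed length scale, synthetic random numbers in place of the 120 O*NET features, and ranks candidates by posterior variance. The function names, kernel settings and the figure of 200 occupations are all assumptions made for the example.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=10.0):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def posterior_variance(features, labelled_idx, noise=1e-2):
    """GP posterior variance at every occupation, given the labelled ones."""
    x_lab = features[labelled_idx]
    k_ll = rbf_kernel(x_lab, x_lab) + noise * np.eye(len(labelled_idx))
    k_al = rbf_kernel(features, x_lab)
    solved = np.linalg.solve(k_ll, k_al.T)  # shape: (n_labelled, n_all)
    # The prior variance of an RBF kernel is 1 at every point.
    return 1.0 - np.sum(k_al * solved.T, axis=1)


def choose_next_batch(features, labelled_idx, batch_size=10):
    """Return the unlabelled occupations with the highest posterior variance."""
    variance = posterior_variance(features, labelled_idx)
    variance[labelled_idx] = -np.inf  # never re-select a labelled occupation
    ranked = np.argsort(variance)[::-1]
    return ranked[:batch_size].tolist()


# Synthetic stand-in data: 200 hypothetical occupations x 120 features.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 120))

# First set of ten is chosen at random, as in the workshops.
labelled = rng.choice(200, size=10, replace=False).tolist()

# Two further rounds, each choosing the ten least-certain occupations.
for _ in range(2):
    labelled += choose_next_batch(features, labelled)

print(len(labelled))
```

With fixed kernel settings, the posterior variance of a Gaussian process depends only on which occupations have been labelled, not on the labels themselves, so the expert labels do not appear in this sketch. A fuller treatment would also refit the model to the labels and confidence ratings between rounds.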
O*NET
O*NET is the US Department of Labor’s Occupational Information Network
(O*NET), a free online database that contains hundreds of occupational
definitions to help students, job seekers, businesses and workforce
development professionals to understand today’s world of work in the
United States. Data from the 2016 O*NET survey was used in this study
to understand the skills, abilities and knowledge areas that make up each
occupation group. onetonline.org
[Figure: Methodology. Labels: Trends (increasing inequality, political uncertainty, urbanisation, technological change, environmental sustainability, demographic change, globalisation); Expert insight; Machine learning; Skills in demand; O*NET]
KEY TRENDS
The future of work isn’t only influenced by automation. Our model includes an analysis of the following key trends to determine the bigger picture of work.

ENVIRONMENTAL SUSTAINABILITY
— Climate change consensus largely intact, but with notable cracks.
— Structural changes resulting from emerging 'green economy sector' and 'green jobs', but vulnerable to political reversals.

URBANISATION
— More than half of the world's population lives in cities, projected to reach 70 percent by 2050. Cities attract high-value, knowledge-intensive industries and offer more varied employment and consumption opportunities.
— Uncertainties include fiscal policy, infrastructure investments, high public debt ratios.

INCREASING INEQUALITY
— Rise in income and wealth inequality, middle class squeeze.
— Disparities in education, healthcare, social services, consumption.

POLITICAL UNCERTAINTY

TECHNOLOGICAL CHANGE
— Perennial fears about the impact of automation on employment.
— Estimates of future automation impact range from 47% of US employment at risk to only 9%.
— Conversely, technology amplifies human performance in some occupations and gives rise to entirely new occupations and sectors.

GLOBALISATION
— Global labour markets increasingly integrated.
— Benefits (e.g., advanced manufacturing, knowledge-intensive services) and costs (e.g., employment and wage impacts, trade deficits, legacy manufacturing).
— Post-financial crisis headwinds (e.g., sluggish world trade growth, rising protectionism).

DEMOGRAPHIC CHANGE
— Pressures to control age-related entitlements vs. investments in education, R&D, infrastructure.
— Ripple effects through healthcare, finance, housing, education, recreation.
— Rising Millennial generation, with divergent consumption and work behaviours.
OUR FINDINGS
THE FUTURE DEMAND FOR OCCUPATIONS
We predict that around one-tenth of the workforce are in occupations that are likely to grow as a percentage of the workforce. Around one-fifth are in occupations that will likely shrink. This latter figure is much lower than recent studies of automation have suggested.

This means that roughly seven in ten people are currently in jobs where we simply cannot know for certain what will happen. However, our findings about skills suggest that occupation redesign coupled with workforce retraining could promote growth in these occupations.

A key element of the study is quantifying the extent of uncertainty about likely future trends. These uncertainties reflect the challenging task of balancing all the macro trends that might influence the future of work. Further uncertainties stem from the distinction between occupations that are expected to grow in demand (reflecting wider occupation growth) and those that will grow relative to other occupations. This distinction turns out to be important because our US and UK expert groups predict as a whole that the workforce will continue to grow through 2030.

The uncertainty in our findings also reflects our use of the richest possible data set of occupation-related 'features' – that is, the skills, abilities and knowledge areas required for each occupation. (This use of all 120 O*NET features is an important differentiator of our study. For example, the most recent study by Frey and Osborne (2017) uses only nine skills categories.) This detailed characterisation of occupations renders them less similar to one another, thereby limiting the confidence of our model in making predictions for one occupation based on what has been labelled for another. In exchange, however, we are able to develop a far more nuanced understanding of future skills demand, as noted below.

We find that many of the jobs likely to experience a fall in employment are, unsurprisingly, low- or medium-skilled in nature. However, in challenge to some other studies, not all low- and medium-skilled jobs are likely to face the same fate.

Technological change and globalisation may explain why many low- or middle-skilled occupations (e.g., manufacturing production) are expected to become less important in the workforce. The predicted decline in administrative, secretarial and some sales occupations is also consistent with these trends. Agriculture, skilled trades and construction occupations, however, exhibit more heterogeneous patterns, suggesting that there may be pockets of opportunity throughout the skills ladder.

The results also suggest that non-tradable services, like food preparation, elementary services and hospitality, will all likely grow in importance. Many of these occupations, again, have lower skills requirements. However, they are associated with differentiated products, which consumers increasingly value.

This indicates that these occupations may be ripe for job redesign and employee skills upgrading to emphasise further product variety, a development heralded by the re-emergence of artisanal employment in occupations like barbering, brewing and textiles.

In general, public sector occupations – with some exceptions – feature prominently and are predicted to see growth.

In the UK, education, healthcare and wider public sector occupations are, with some confidence, predicted to see growth. These findings are consistent with population ageing and a greater appetite for lifelong learning. They are also consistent with the labour-intensive nature of these sectors, and their traditionally lower potential for productivity growth (Baumol and Bowen, 1966). They are further consistent with the view that public sector roles are more resistant to automation (Acemoglu and Restrepo, 2017a).
Skilled Trades
In the UK, skilled trades include jobs in agriculture, metalwork, construction,
textiles, food preparation, hospitality and woodworking, among others.
Elementary Services
In the UK, elementary occupations consist of simple and routine tasks which
mainly require the use of hand-held tools and often some physical effort (e.g.,
farm workers, street cleaners, shelf fillers).
We also expect buoyant demand for some – but not all – professional occupations, reflecting the continued growth of service industries.

Creative, digital, design and engineering occupations have bright outlooks and are strongly complemented by digital technology. Furthermore, architectural and green occupations are expected to benefit from greater urbanisation and a greater interest in environmental sustainability.

Interestingly, demand prospects can vary considerably for some 'white collar' occupations that otherwise appear very similar. In the US, roles such as management analysts, training and development specialists and labour relations specialists – occupations which should benefit from the reorganisation of work – are projected to grow in the workforce, whereas financial specialists are expected to fall. The latter is consistent with automation having an impact on cognitively-advanced occupations as well as more routine roles.

Additionally, although there is a predicted decline in many sales occupations, consistent with an expansion in digital commerce, niche roles like sales engineers and real estate agents may buck this trend.

THE FUTURE DEMAND FOR SKILLS*
Our results provide broad support for policy and practitioner interest in so-called 21st century skills in both the US and the UK.

We find a strong emphasis on interpersonal skills, higher-order cognitive skills and systems skills in both the US and the UK.

In the US, there is particularly strong emphasis on interpersonal skills. These skills include teaching, social perceptiveness and coordination, as well as related knowledge, such as psychology and anthropology. This is consistent with the literature on the growing importance of social skills in the labour market (Deming, 2015). There are good reasons to believe that interpersonal skills will continue to grow in importance – not only as organisations seek to reduce the costs of coordination but also as they negotiate the cultural context in which globalisation and the spread of digital technology are taking place (Tett, 2017).

Our findings also confirm the importance of higher-order cognitive skills such as originality, fluency of ideas and active learning.

A similar picture emerges for the UK. The results point to a particularly strong relationship between higher-order cognitive skills and future occupational demand. Skills related to system thinking – the ability to recognise, understand and act on interconnections and feedback loops in sociotechnical systems – such as judgement and decision making, systems analysis and systems evaluation also feature prominently.

We show that the future workforce will need broad-based knowledge in addition to the more specialised features that will be needed for specific occupations.
Sales Engineers
A sales engineer is a salesperson with technical knowledge
of the goods and their market.
Active Learning
Understanding the implications of new information for both current and future problem-solving and decision-making.
Fluency of Ideas
The ability to come up with a number of ideas about a topic (the number of ideas is important, not their quality, correctness, or creativity).
Sociotechnical Systems
The term sociotechnical system refers to interaction between society's complex infrastructures and human behaviour. It can also be used to describe the relationship between humans and technology in the workplace.
Systems Analysis
Determining how a system should work and how changes in conditions, operations and the environment will affect outcomes.
Systems Evaluation
Identifying measures or indicators of system performance and the actions needed to improve or correct performance, relative to the goals of the system.
*A Glossary of Skills laying out the precise definitions of all 120 O*NET skills, knowledge areas and abilities can be accessed at futureskills.pearson.com.
Other knowledge features like foreign languages are especially valuable as complements: that is, they find use in specialised occupations when other features have a large value. We find a similar pattern for a number of STEM-related features like science, technology design and operations analysis. Interestingly, these features are found to be complementary to conventional STEM occupations, as well as to some non-STEM occupations such as secretarial and administrative jobs.

Occupations and their skills requirements are not set in stone: they are also capable of adjusting to shifts in the economic environment (Becker and Muendler, 2015). Our model identifies how the skill content of occupations can be varied to improve the odds that they will be in higher demand. As noted above, we define these as 'complementary skills' in so far as their impact on demand is conditional on the other skills that make up the occupation. The notion of complementarity can be used to determine priorities for skills investment and assist thinking on how jobs might be redesigned to put these skills to work.

Complementary skills that are most frequently associated with higher demand are customer and personal service, judgement and decision making, technology design, fluency of ideas, science and operations analysis.

In the US, customer and personal service, technology design and science skills are, according to our analysis, the job performance requirements seen as most likely to boost an occupation’s demand beyond what is currently predicted, notwithstanding notable differences across occupation groups.

In the UK, it turns out that strengthening judgement and decision-making skills, fluency of ideas and operations analysis are important demand complements for many occupations. The literature underlines the need to match these skills with changes in organisational design, such as enhanced delegation, employee involvement in decision making and other related high-performance work practices in order to maximise their impact (Ben-Ner and Jones, 1995; Kruse et al., 2004; Lazear and Shaw, 2007).

An attraction of our approach is that it can be used to consider occupations that do not yet exist, but may emerge in the future in response to the identified drivers of change. The model allows us to identify hypothetical occupations, dissimilar to existing occupations, that are 'almost certain' to see future growth. In particular, we can identify the combinations of skills, knowledge areas and abilities that are most associated with such new occupations.

For the US, the model finds four such hypothetical occupations, along with their top five ranking features. We are able to further understand something about these hypothetical occupations by looking at the existing occupations that are 'closest to them' and inspecting their historical growth.

For the UK, two new hypothetical occupations are discovered by the model, along with their top five ranking features. We again consider the occupations that are closest to these and confirm their past growth.
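The idea that a skill's impact on demand is "conditional on the other skills that make up the occupation" can be illustrated with a small numerical sketch. The Python below is not the report's model: `toy_demand` is a hypothetical demand score invented for the example, and measuring complementarity as a mixed second derivative (how the payoff to one feature changes as another rises) is one common formalisation, assumed here rather than taken from the report's methodology.

```python
import numpy as np


def toy_demand(features):
    """Hypothetical demand score for illustration only (not the report's model)."""
    science, service, routine = features
    # The science * service term makes the two features complements:
    # each is worth more when the other is high.
    return 0.2 * science + 0.3 * service + 0.5 * science * service - 0.4 * routine


def cross_effect(f, x, i, j, h=1e-3):
    """Central finite-difference estimate of the mixed derivative d2f / (dx_i dx_j)."""
    x = np.asarray(x, dtype=float)

    def shifted(step_i, step_j):
        y = x.copy()
        y[i] += step_i * h
        y[j] += step_j * h
        return f(y)

    return (
        shifted(1, 1) - shifted(1, -1) - shifted(-1, 1) + shifted(-1, -1)
    ) / (4.0 * h * h)


# A hypothetical occupation scored on three features in [0, 1].
occupation = [0.5, 0.5, 0.5]

science_service = cross_effect(toy_demand, occupation, 0, 1)
science_routine = cross_effect(toy_demand, occupation, 0, 2)

print(f"science x service: {science_service:.3f}")
print(f"science x routine: {science_routine:.3f}")
```

A positive cross effect, as for science and service in this toy function, means that raising one feature increases the return to raising the other. A value near zero, as for science and routine, means the two features contribute independently.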
Operations Analysis
Analysing needs and product requirements to create a design.
Production Occupations
The production occupations category is used in the U.S. and covers machinists,
operators, assemblers, and the like across a wide variety of industries
(e.g., nuclear power, gas and oil, food preparation, textiles).
CONCLUSION
1. INTRODUCTION
Governing is the art of planning and predicting. Developing a picture of long-term jobs and skills requirements is critical for policymakers as they navigate rapid, complex and uncertain shifts in the economy and society. A wide range of areas – from curriculum development and careers guidance through apprenticeships and workplace training to occupational standards, migration and social insurance – rely on the availability of accurate labour market information (LMI). It is a basic precondition for the system resilience of modern economies – the collective ability of individuals, education and labour market institutions to adapt to change without breaking down or requiring excessively costly intervention to remedy.

However, there is also an awareness of the divergence between the pace of change and the inertia of our institutions. Andreas Schleicher, Director for Education and Skills at the Organisation for Economic Co-operation and Development (OECD), has pointed out that throughout history, education has always taken time to catch up with technological progress (Schleicher, 2015). In the UK, the Education Act of 1902, which marked the consolidation of a national education system and the creation of a publicly supported secondary school system, arrived a century after the Industrial Revolution and the growing complementarity between human and physical capital (Galor and Moav, 2006; Becker et al., 2009).

Today, educationalists speak about a ‘40-year gap’ between experts who are exploring where the world of work and the state of learning will need to be in 15 years’ time, practitioners in the trenches and parents, whose conception of ‘good’ education is framed by their own earlier experiences. The result is a structure that resembles sedimentary rock: each layer has its own assumptions and expectations. But there is little holding the layers together, and once in place, they can limit policy change and future choices.

Structural change is affecting labour markets, as it is all markets, upsetting the balance of supply and demand for skills. While misalignment is normal over the business cycle, the costs of persistent mismatches can be considerable if left unaddressed: they limit the ability of firms to innovate and adopt new technologies, while impeding the reallocation of labour from less productive activities to more productive ones (Adalet McGowan and Andrews, 2015). They also lead to increased labour costs, lost production associated with vacancies remaining unfilled and all the direct and indirect costs of higher unemployment (Şahin et al., 2014; OECD, 2016b). Individuals likewise pay a heavy price. They benefit from economic growth mainly through jobs. Not only are jobs typically the most important determinant of earnings and living standards; they also critically shape how individuals view themselves, interact with others and perceive their stake in society, including their sense of control over the future (Banerjee and Duflo, 2008; World Bank, 2013).

The jury is out on the scale of long-term skills shortages in the labour markets of advanced economies. Much of the evidence comes from employers, typically from surveys. ManpowerGroup, the human resources company that publishes arguably the most authoritative survey on skills shortages, finds that globally 40% of employers have difficulty filling jobs. This figure has been largely stable over the past decade, although considerable differences exist across countries: while shortage levels in the US have tracked the global average, they are significantly lower in the UK but appear to be growing (ManpowerGroup, 2016).

The reliability and validity of employer surveys, however, are open to question (Cappelli, 2015). The empirical fingerprints for skills shortages are not where we would expect them to be – namely in wage inflation not linked to productivity growth. On the contrary, labour’s share in national income has trended downwards in most economies since the 1990s (International Monetary Fund (IMF), 2017). Academic studies examining the issue have also failed to uncover significant shortages (Weaver and Osterman, 2017). Where they exist, they are often attributed to the unwillingness of employers to offer attractive remuneration to workers, suggesting that interventions that treat the problem as an educational one are likely to be poorly targeted (Van Rens, 2015).

Saying that skills shortages are overstated is not the same as saying that they are unfounded. They are notoriously difficult to measure, hidden from view as companies work around problems by increasing the workload of existing employees, outsourcing work to other organisations or even adapting their product market strategies so that they are less dependent on a highly skilled workforce.
More prosaically, prediction is better suited to some areas of human activity than others. There tends to be a high degree of persistence, and thus predictability, in the occupational and skills make-up of the workforce. This reflects the fact that the labour market is a social institution embedded in a dense web of rules, habits and conventions and that there are substantial employment adjustment costs, even in the face of major changes such as the arrival of new and disruptive technologies (Pierson, 2004; Granovetter, 2017).

This pattern is not unique or even unusual historically. Thus some observers point to the fitful progress of electrification in the US, where labour productivity grew slowly between 1890 and 1915, then saw a decade-long acceleration, only to slow down in the mid-1920s, before experiencing a second boom in the 1930s (Syverson, 2013). The unprofitability of replacing still viable manufacturing plants adapted to water and steam power; the slow gestation of complementary

A more refined way of reaching the same conclusion is to compare the performance of long-term occupational projections against outcomes. One evaluation of US Bureau of Labor Statistics (BLS) 10-year projections at the one-digit level for the period 1988–2008 finds that they do a good job of anticipating the size of broad occupations (absolute error < 6%), with the exception of service and farm workers (Handel, 2016). In addition to path dependence in occupational structure, this performance is attributable to the fact that errors at a more detailed occupational level cancel out one another. Consistent with this, forecast errors are found to be inversely related to employment size: occupations with more than 600,000 workers have an average absolute error of 14.8% compared with 32.7% for occupations with between 25,000 and 49,000 workers. This suggests that, even in the presence of significant errors with respect to the size of individual occupations, this should not be an obstacle to prediction where the goal is to make more
general statements about occupational and skills demand.
The BLS’s projections are not exempt from criticism – most
of which is focussed on their tendency to underestimate
changes in the size of occupations (Alpert and Auyer, 2003;
Carnevale et al., 2010; Wyatt, 2010). This largely reflects
the challenges of applying fine-tuned rules and making
point estimates when there are high levels of uncertainty.
By extension, anticipating the direction of change –
whether an occupation will grow or decline in relative or
absolute terms – appears to pose fewer issues for the
same reasons. As a coarser assessment, it requires less
information about future states of the world and thus is
more robust to ignorance.4
Studies have similarly shown that ignoring some
information can make not only for cheaper but also
for more accurate decisions in such circumstances
(Gigerenzer, 2010). We share this perspective and
accordingly produce directional forecasts in this study.
Other sources of error are more problematic. They are
embodied and variously popularised in ‘weak signals’
and ‘black swans’ – shifts that are difficult to observe
amid the noise, yet whose consequences are potentially
transformative. They resemble George Shackle’s
description of the future “which waits, not for its contents
to be discovered, but for that content to be originated”
(Shackle, 1972). Quantitative approaches which assume
that past patterns of behaviour will continue over the
longer term struggle badly with these shifts. Even when
their significance is acknowledged, in many cases they are
ignored on the grounds that they are too unruly to analyse.
Whether this is an adequate defence is debatable.
Policymakers have no alternative but to grapple with all
possible discontinuities and plan accordingly. If planning
is silent about discontinuities, its value is reduced. For
this reason, many organisations have found qualitative
foresight processes linked to strategic dialogue a useful
lens through which to interrogate these matters. However,
they are no panacea to the shortcomings of prediction:
subjective judgments often lack external validity and
transparency, meaning that decision-makers are not
always sure if and how they should act on them.
Still, foresight processes have the potential to broaden
thinking about alternatives beyond business-as-usual and
their implications. By enabling deliberation and challenging
individually held beliefs, these processes can also combat
the types of bias that may creep into long-term, expert-led
planning (Tichy, 2004; Goodwin and Wright, 2010; Kahneman,
2011; Ecken et al., 2011; Nemet et al., 2016): overconfidence
(overweighting private information and underweighting public
information); optimism (exaggerating the rate of change,
especially for new technologies); familiarity (relating new
experiences to previously seen ones); and narrative fallacy
(creating explanations for phenomena which are essentially
unconnected). Accordingly, we adopt an integrated approach
which combines and builds on both quantitative and
qualitative approaches.
2.2. CHANGING SKILLS NEEDS
This report relates to the literature on the changing
demand for skills. Conventional wisdom views such change
as a product of the complementarity between technology
and high-skilled labour. That is, technological progress
raises the demand for skills, and investment in skills, in
turn, satiates that demand. This framework has proven a
workhorse for economists and can successfully explain
many salient changes over time in the distribution of
earnings and employment across advanced economies
(Goldin and Katz, 2009).
Nonetheless, its implementation rests on a highly
aggregated and conceptually vague measure of skill,
typically years of schooling. Recent accounts have sought
to put more meat on its bones by mapping skills to the
tasks performed by labour (Acemoglu and Autor, 2011).
Influential work by Autor and Murnane (2003), for example,
distinguishes between cognitive and manual tasks on the
one hand, and routine and non-routine tasks on the other.
Comparing tasks over time, from 1960 to 1998, they find
that routine cognitive and manual tasks declined while
non-routine cognitive and manual tasks grew in importance.
Extending this study, Levy and Murnane (2004) attribute
the growth of non-routine cognitive tasks to jobs requiring
skills in expert thinking and complex communication.
Similar frameworks have been used in the trade literature,
especially in the context of outsourcing and offshoring and
also to understand the emergence of new occupations
such as green jobs (Consoli et al., 2016).
A growing body of work also underscores the role of
‘non-cognitive’ skills, including social skills and leadership skills.
This derives from the pivotal insight of Heckman (1995) that
labour market outcomes such as earnings are likely shaped
by an array of skills insofar as measured cognitive ability
accounts for only a small portion of the variation in such
outcomes (Heckman and Kautz, 2012). Deming (2015) finds
that, in the US, nearly all job growth since 1980 has been
in occupations that are relatively social-skill intensive.
Strikingly, occupations with high analytical but low social
skill requirements shrank over the same period. One
possible explanation is that social skills provide the tools
for the rich and versatile coordination which underpins a
productive workplace – the subtleties of which computers
have yet to master. This matters for organisations in
complex environments where the classic gains from
specialisation are eclipsed by the need to adapt flexibly to
changing circumstances (Dessein and Santos, 2006).
Measures of social intelligence have also been validated by
psychologists and neuroscientists (Poropat, 2009; Woolley et
al., 2010). Indeed, more recent thinking rejects the contrast
between cognitive and non-cognitive skills. Whereas reason
is widely viewed as a path to greater knowledge and better
decision-making, some argue that it is much more diverse
and opportunistic – that it has evolved primarily to help
humans justify themselves and influence others, which is
indispensable for communication and cooperation. Patterns
of thinking that appear ‘irrational’ from a purely cognitive
perspective turn out to be advantageous when seen as
adaptive responses to the dilemmas of social interaction
(Mercier and Sperber, 2017).
Unencumbered by analytical tractability, policymakers
have embraced a still wider understanding of skills. Over the
past two decades there has been considerable thinking and
advocacy – both nationally and internationally – focussed
on embedding so-called ‘21st century skills’ into education
systems. The policy literature uses a range of overlapping
concepts, taxonomies, definitions and technical language,
but at their core, skills are viewed as encompassing the
full panoply of cognitive, intrapersonal and interpersonal
competencies (National Research Council, 2012; Reimers
and Chung, 2016).
We are only aware of a handful of academic studies that
view skills in these broad terms: for example, Liu and Grusky
(2013) develop an eight-factor representation of workplace
skills, though they focus on returns to skills that reflect
changes in relative supply as much as demand. MacCrory et
al. (2014) perform principal component analysis on abilities,
work activities and skills in O*NET to identify five to seven
distinct skills categories that have discriminatory power in
terms of explaining changes in the skills content
of occupations over the past decade.
We extend this body of work – in part by also drawing
on the knowledge features in O*NET which provide
information on the specific academic subjects and domain
knowledge required by occupations. One advantage of
the O*NET knowledge features is that they are expressed
in relatively natural units which make them easier to
understand and address through policy. They also touch
on the knowledge versus skills debate in education
circles between proponents who argue that curriculum
and pedagogy should teach transferable skills – the
ability to work in teams, to create and think critically –
and those who contend that skills need to be rigorously
grounded in a base of knowledge in order to be mastered
(Christodoulou, 2014; Hirsch, 2016).
2.3. LABOUR MARKETS AND STRUCTURAL CHANGE
This report also addresses work on the employment
effects of automation and structural change more
generally. The rise of robots, artificial intelligence, big data
and the internet of things has raised concerns about the
widespread substitution of machines for labour. Evidence
linking automation of many low-skilled and medium-skilled
occupations to wage inequality, labour market polarisation
and the ongoing decline in manufacturing jobs is interpreted
as support for the claim that workers are falling behind in
the race against machines (Autor et al., 2006, 2008; Black
and Spitz-Oener, 2010; Dustmann et al., 2009; Goos and
Manning, 2007; Michaels et al., 2009; Spitz-Oener, 2006).
Technological anxiety is not a new phenomenon (Keynes,
1930; Bix, 2000; Mokyr et al., 2015). Similar fears have been
expressed before: during the Industrial Revolution, the
latter part of the 1930s, and again immediately after World
War II. Each time adjustment was disruptively painful for
some workers and industries; but in the long run, such fears
were not realised.5 The employment-to-population ratio
grew during most of the 19th and 20th centuries in the UK
and US, even as the economy experienced the effects of
mechanisation, the taming of electricity, the invention of the
automobile and the spread of mass communication.
History cannot settle whether this time is different: what is
striking about the perspectives of earlier observers is how
narrowly they defined the scope of what technology could
accomplish. Earlier generations of machines were limited
to manual and cognitive routine activities, based on
well-defined, repetitive procedures. The newest technology,
by contrast, is mimicking the human body and mind in
increasingly subtle ways, encroaching on many non-routine
activities, from legal writing and truck driving to medical
diagnoses and security guarding.6
The case of driverless cars illustrates the slippery and shifting
definitional boundary around what it means for work to
be ‘routine’. In their seminal 2004 book The New Division of
Labor: How Computers Are Creating the Next Job Market, Levy
and Murnane (2004) argued that driving in traffic, insofar
as it is reliant on human perception, fundamentally resisted
automation: “Executing a left turn against oncoming traffic
involves so many factors that it is hard to imagine discovering
the set of rules that can replicate a driver’s behaviour [. . .]”.
Formidable technical challenges lie ahead: the prospect of
fleets of cars that can roam across cities or countries in all
conditions without human input remains remote (Simonite,
2016; Mims, 2016). Nonetheless, elements of this problem
are now satisfactorily understood and can be specified
in computer code and automated. For example, Google’s
driverless cars have driven over 2 million miles in the past six
years, and have been involved in 16 minor accidents, none
of which caused injury or was the car’s fault (Bank of America
Merrill Lynch, 2017).
A more forward-looking approach can help guard against these
pitfalls. This is exemplified by Frey and Osborne (2017), who
assess the feasibility of automating existing jobs assuming that
new technologies are implemented across industries on a larger
scale. In this study, a sample of occupations was hand-labelled
by machine-learning experts as strictly automatable or not
automatable. Using a standardised set of nine O*NET features
of an occupation that measure three bottlenecks to automation
– perception and manipulation, creative intelligence, and social
intelligence – they then ran a classifier algorithm to generate a
‘probability of computerisation’ across all jobs, estimating that
over the next two decades, 47% of US workers' jobs are at
a high risk of automation.
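The label-then-classify procedure described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the method used in the study: the text says only that a classifier algorithm was run, so a plain logistic regression is swapped in here; three scores per occupation stand in for the nine O*NET features; and every value and name (TRAIN, fit_logistic, automation_probability) is hypothetical.

```python
import math

# Hypothetical bottleneck scores in [0, 1] for a few hand-labelled occupations:
# (perception_manipulation, creative_intelligence, social_intelligence).
# Label 1 = judged automatable, 0 = judged not automatable. Toy values only.
TRAIN = [
    ((0.1, 0.1, 0.2), 1),
    ((0.2, 0.1, 0.1), 1),
    ((0.3, 0.2, 0.2), 1),
    ((0.7, 0.8, 0.9), 0),
    ((0.8, 0.9, 0.7), 0),
    ((0.6, 0.7, 0.8), 0),
]


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(data, lr=0.5, epochs=2000):
    """Fit logistic regression weights by batch gradient descent."""
    n_features = len(data[0][0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in data:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b


def automation_probability(x, w, b):
    """Estimated probability that an unlabelled occupation is automatable."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


w, b = fit_logistic(TRAIN)
# Score two unlabelled, hypothetical occupations.
low_bottleneck = automation_probability((0.15, 0.1, 0.15), w, b)
high_bottleneck = automation_probability((0.75, 0.8, 0.8), w, b)
```

In this toy setting an occupation with low bottleneck scores receives a higher estimated probability than one with high scores. The point of the design is that a small hand-labelled sample can be extended to every occupation that has the same features recorded.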
This finding has not gone unchallenged. MacCrory et al.
(2014) point out that a handful of variables cannot capture
the diverse economic impact of technological change on
skills, especially across the whole gamut of occupations in
the labour market. Arntz et al. (2016) observe that, within
an occupation, many workers specialise in tasks that cannot
be automated. Using the automation probabilities from
the Frey and Osborne study and drawing on the Survey
of Adult Skills by the Programme for the International
Assessment of Adult Competencies (PIAAC) that examines
task structures for individuals across more than 20 OECD
countries, they argue that once task variation is taken
into account, a much smaller proportion of jobs (~9%)
are at risk of being completely displaced. They also find
important differences across countries that are attributed
to variations in workplace organisation, adoption of new
technologies and educational levels.7
The McKinsey Global Institute (2017) disaggregates
occupations into 2,000 constituent activities, rating each
against 18 human capabilities and the extent to which they
can be substituted by machines. It estimates that 49% of
work activities globally have the potential to be automated,
though very few occupations – less than 5% – are candidates
for full automation (see also Brandes and Wattenhofer, 2016).
While these studies confirm the importance of considering
automatability at the task level, this approach raises
its own challenges. In principle, there is nothing in an
occupation-based approach that prevents analysts from
considering its constituent tasks when evaluating the
potential for automation. There are also drawbacks with a
strictly bottom-up approach in the context of anticipating
occupational and skills demand. In isolation, one might
reasonably infer that similar tasks, such as sales, have
similar levels of demand. But as part of an occupation,
they also belong to different industries with different
growth prospects and require different knowledge
connected to the product, or the buyers of the product
(for instance, consider an insurance sales agent vs. a solar
equipment sales representative). By emphasising discrete
tasks, there is a risk of losing important coordinating
information which gives occupations their coherence –
the fabric which distinguishes the whole from the parts.
Practically, unbundling occupations may come at the
expense of quality. Autor (2013) counters the simple view
– popular in some parts of the automation debate – that
jobs can be ‘redefined’ as machines perform routine tasks
and workers perform the rest: “Consider the commonplace
frustration calling a software firm for technical support
only to discover that the support technician knows nothing
more than what is on his or her computer screen – that is,
the technician is a mouthpiece, not a problem solver.
This example captures one feasible division of labor:
machines performing routine technical tasks, such as
looking up known issues in a support database, and
workers performing the manual task of making polite
conversation while reading aloud from a script. But this
is not generally a productive form of work organization
because it fails to harness the complementarities between
technical and interpersonal skills”.8
A limitation of all these studies is that they only estimate
which occupations are potentially automatable – not how
many will actually be automated. As discussed earlier,
the journey from technical feasibility to full adoption can
take decades, involving many steps and missteps. Just
as significantly, they do not assess the potential for job
creation in tasks and occupations complemented by
automation or the adjustments that are triggered in other
parts of the economy through relative wage changes and
other market forces (Shah et al., 2011; Davenport and
Kirby, 2016; Kasparov, 2017).
The effect of fleshing out these dynamics is to substantially
muddy and possibly reverse more pessimistic conclusions.
Gregory et al. (2016) develop a task-based framework,
estimating that automation boosted net labour demand
across Europe by up to 11.6 million jobs over the period
1990–2010. They identify a number of channels that
potentially compensate for the job-destroying effects of
automation, including first, that automation may lead to
lower unit costs and prices which stimulate higher demand
for products, and, second, that surplus income from
innovation can be converted into additional spending, so
generating demand for extra jobs in more
automation-resistant sectors (see also Goos et al., 2015).9
However, a number of strong assumptions are necessary
for this result – notably that additional firm profits are
spent locally in Europe when in fact they may accrue to
non-European shareholders. Relaxing this assumption
results in significantly lower estimates, although they are
still positive (1.9 million jobs). This finding has particular
relevance to debates about ‘who owns the capital’ and
the case for spreading ownership of robot capital through
profit-sharing programmes and employee stock-ownership
plans (Freeman, 2015).
Acemoglu and Restrepo (2017a) report contrasting results.
They explore the impact of the increase in industrial robot
usage between 1990 and 2007 on US local labour markets.
To identify causality, they use industry-level spread of
robots in other advanced economies as an instrument
for robot usage in the US. This strategy helps isolate the
change in exposure to robots from other organisational or
industry developments that may also be correlated with
robot usage and influence subsequent labour demand.
They find that each additional robot reduces employment
by about seven workers, with limited evidence of offsetting
employment gains in other industries.10
This finding is robust to a range of different specifications,
tests and controls, such as demographic and industry
characteristics, share of routine jobs, import competition
from China and overall capital utilisation. The results also
hold when the automobile industry, the heaviest user of
robots, is excluded, and the localities most affected by
robots had similar employment and wage levels to other
localities – which is to say they were not on a downhill
path before robotisation.
Although Acemoglu and Restrepo's study (2017a) is
an important step for understanding the dynamic
employment effects of automation, it is far from the final
word: in the main, it does not address the global effects
of automation which are important given rich patterns of
trade, migration and specialisation across local markets.
Also, with the robot revolution still in its infancy,
short-term consequences may differ from the long-term ones,
once relative prices and investment have had time to
fully adjust. Evidence of diminishing marginal returns to
robot usage documented by Graetz and Michaels (2015) is
consistent with such a view.
The need to recognise the interactions embedded in
trends carries across into other spheres. Parallel to
automation is a set of broader demographic, economic
and geopolitical trends which not only have profound
implications for labour markets, but are raising challenges
for policy in their own right. In some cases, trends are
reinforcing one another; in others, they are producing
second-order effects which may be missed when viewed
in isolation. Consider, for example, the implications of an
ageing population. While much of the automation debate
has focussed on the potential for mass unemployment,
it overlooks the fact that robots may be required to
maintain economic growth in response to lower labour
force participation. The risk, in other words, is not that
there will be too few jobs but that there will be too few
people to fill them – bidding up wages in the process –
which may explain why countries undergoing more rapid
population ageing tend to adopt more robots (Acemoglu
and Restrepo, 2017b; Abeliansky and Prettner, 2017). We
provide an overview of the trends in the following display, and
a description of how the trends analysis fits in with our wider
approach in Section 3.
The future of work – understood in its widest sense – has
been climbing the policy agenda. The topic has featured
on the covers of the popular press; major organisations,
think tanks and consultancies have hosted conferences
on the subject and it has generated a flurry of reports and
studies (PwC, 2016; UK Commission for Employment and
Skills, 2014; Beblavý et al., 2015; World Economic Forum
(WEF), 2016; Hajkowicz et al., 2016; Canadian Scholarship
Trust (CST), 2017). These efforts mirror our approach
insofar as they assess the multiple trends affecting the
employment and skills landscape. However, there is an
important respect in which their treatment of the trends
differs: in addition to being qualitative in nature, many use
scenario techniques to weave trends into a set of internally
coherent and distinct narratives about the future. This
approach brings order and depth to conversations, though
it also has drawbacks. Because scenarios are typically
taken as given, assumptions about how change takes
place and trends interact are often opaque. This makes
it difficult to explain why one set of scenarios has been
selected from among the infinite number that are possible,
which has the impact of shutting off outcomes which
might otherwise emerge if these dynamics were explicitly
accounted for (Miller, 2006).
TECHNOLOGICAL CHANGE
Greater connectivity and improvements in computing power and artificial intelligence are enabling ‘intelligence’
to be embedded more cheaply and readily in physical systems – from entire cities right down to the individual
human body. With peer-to-peer platforms, activities are amenable to decentralised production, unlocking previously
unused or underused assets – in the process muddying traditional definitions of ownership and employment.
Additive manufacturing and 3D printing could alter the economics of many industries, cutting the costs of
on-demand production.
Material and life sciences have seen major breakthroughs in areas such as graphene and gene-editing with potentially
radical applications. However, many must contend with a long road to commercialisation and significant barriers to
adoption, especially ethical and safety concerns.
There remains an unresolved tension between the seeming ubiquity of digital technology and a downshift in
measured productivity growth. Evidence from a wide range of industries, products and firms also suggests that
research effort is rising steeply while research productivity is falling sharply – that is, more and more resources are
being allocated to R&D in order to maintain constant growth.
History shows that technology optimism can slide into determinism, though there is a mirror image of this logic: people
tend to underestimate the huge effects of technology over the long term. The general pattern of technological progress
has been one of multiple lulls, followed by subsequent surges of creativity. For instance, new materials and processes,
leveraging digital tools that allow improved real-time measurement, experimentation and replication, are inherently
complementary such that advances in one domain may feed back into new technologies in a virtuous cycle.
GLOBALISATION
Over the past three decades, labour markets around the world have become increasingly integrated.
The emergence of countries like China and India, for centuries economic underperformers, has delivered an
immense supply shock to traditional patterns of trade.
Globalisation has not only had benefits but also costs. Employment and wages have typically fallen in industries
more exposed to import competition, exacerbated by labour market frictions and social and financial commitments
such as home ownership, which limit workers’ ability to relocate and take advantage of employment opportunities.
US and UK exports to other countries have not grown as much as imports, though large trade deficits have not
reduced jobs so much as redistributed them towards non-tradables, particularly construction.
The manufacturing sector has been a lightning rod for these changes, though the experience has not been
uniform, containing pockets of activity that have thrived – whether because the gains from keeping production at
home remain critical or head-to-head competition with emerging economies is limited. Combined with eroding
cost advantages among competitors and new technologies, including breakthroughs in shale oil extraction, these
conditions could support modest forms of reshoring.
As the economic centre of gravity shifts towards the emerging world, supported by a burgeoning middle class,
so opportunities may open up in areas such as knowledge-intensive services and advanced manufacturing where UK
and US exporters enjoy a comparative advantage.
However, a number of developments may frustrate this trend. Services are still substantially less likely to be
traded than manufactured products due to the prevalence of non-tariff barriers. Emerging markets face various
obstacles in sustaining their historic growth rates, ranging from the prospect of premature deindustrialisation
to the task of building high-quality institutions.
Sluggish world trade growth since the financial crisis and stiffening protectionist sentiment have challenged
the decades-old rule of thumb that trade grows faster than GDP, raising concerns that globalisation has
structurally ‘peaked’.
DEMOGRAPHIC CHANGE
There is an obvious appeal to basing predictions on population growth and changes to the composition of the
population: given long-term historical trends, it is possible to make grounded assessments about where they are
likely to go, and at what speed.
The global economy has passed an important demographic threshold: dependency ratios – the ratio of non-working
age population to working age population – have begun to rise after nearly half a century of declines. With labour inputs
slowing in advanced economies, the importance of productivity in driving overall growth and policy in boosting labour
force participation has increased. This is especially true in the US, where prime-aged women and particularly men have been
withdrawing from the labour market over a long period.
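The dependency ratio defined above can be written as a short calculation. This is only a sketch: the function name, the split of the non-working-age population into younger and older groups, and the figures in the usage line are illustrative assumptions, not data from the report.

```python
def dependency_ratio(young, old, working_age):
    """Ratio of non-working-age population to working-age population.

    `young` and `old` together make up the non-working-age population;
    all three arguments must be counts in the same units.
    """
    if working_age <= 0:
        raise ValueError("working_age must be positive")
    return (young + old) / working_age


# Hypothetical population counts, in millions, purely for illustration.
example_ratio = dependency_ratio(young=20, old=15, working_age=65)
```

A rising value means each person of working age supports more people outside that age band, which is the shift the paragraph above describes.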
Countries are coming under fiscal pressure to control age-related entitlements which could draw resources away
from education, R&D and infrastructure, especially as older households vote more actively than younger cohorts.
How an ageing population chooses to put its purchasing power to work will have a significant impact on the fortunes
of different industries and occupations. This is likely to benefit not only healthcare, finance and housing but also
recreation and education which have traditionally catered to the young.
Millennials – the cohort born between 1980 and 2000 – are poised to grow in influence as they inherit the assets of their
parents. They are the first group to come of age after the arrival of digital technology, bringing with them heightened
expectations of immediacy, participation and transparency. At the same time many became economically active in the
shadow of the Great Recession which may have tempered attitudes to risk and confidence in major institutions.
As a result, this group exhibits quite different consumption and work behaviours compared with previous generations.
ENVIRONMENTAL SUSTAINABILITY
A striking development over the past decade has been the growing consensus around man-made global warming.
The scale of the challenge is enormous: to keep the rise in average global temperatures to below 2°C – the de facto
target for global policy – cumulative CO2 emissions need to be capped at one trillion metric tonnes above the levels
of the late 1800s. The global economy has already produced half of that amount.
Climate change has wide-ranging consequences for many industries. Agriculture, tourism, insurance, forestry, water,
infrastructure and energy will all be directly affected, though linkages with socio-economic and technological systems mean
that risks can accumulate, propagate and culminate in even larger impacts. For example, climate change could threaten food
and resource security in parts of the world which may in turn make poverty and conflict more likely.
Meeting emissions reduction targets requires investment in green technologies, including LED lighting, electric vehicles,
solar photovoltaic systems and onshore wind and more sophisticated forms of energy efficiency – which will also create
opportunities for green finance. Despite plummeting costs for ‘clean’ solutions, there are a number of reasons why ‘dirty’
technologies are likely to hold sway, including political lock-in and high switching costs. Views differ as to the optimal dynamic
strategy to be followed in such a scenario. Evidence for the broader jobs potential of the green economy is also ambiguous.
Structural changes associated with the green economy are fundamentally dependent on government policy. The number of
supportive regulations has grown across the world, although initiatives are likely to be set nationally rather than multilaterally,
remain tied to specific sectors and technologies and are vulnerable to political reversals.
URBANISATION
Today, over half of the world’s population live in cities, a share that is expected to grow to 70% by 2050.
This concentration of humanity illustrates the basic unevenness of economic development – the tendency for
places close to large markets to grow more rapidly than places more distant.
Cities are magnets for high-value, knowledge-intensive industries, where physical proximity enables collaboration
and firms and workers benefit from enhanced labour pooling and matching. Urban planners increasingly build
these features into the fabric of cities through the establishment of innovation districts that integrate work,
housing and recreation.
Cities also offer more varied consumption and employment opportunities, though medical conditions such as obesity,
diabetes and depression have been linked to aspects of the urban environment. Pressures on affordable housing may
lock some households out of these opportunities, forcing them to live in older suburbs or low-income areas tethered
to declining industries.
There has been an increasing push for authorities to make cities ‘smarter’ – leveraging the information generated by
infrastructure to optimise performance. This agenda has also come to focus on sustainability, resilience and making
cities more age-friendly. However, in many advanced economies infrastructure investment as a proportion of total
government expenditure has been trending downwards for decades, with serious questions around the quality of
existing infrastructure. A major uncertainty is how debates about the role of fiscal policy and government activism
will resolve themselves against a backdrop of high public debt ratios.
INCREASING INEQUALITY
The sharp rise in income and wealth inequality has been described as the “defining challenge of our times”.
This has been accompanied by a squeeze on the middle class, as the distribution of income has shifted towards
the higher and lower ends of the scale. Countries with higher levels of income inequality tend to have lower levels
of mobility between generations.
Economic distress and the erosion in opportunities for people with low education have, in turn, created a web of social
issues, including rising mortality and morbidity among segments of the US population. They have also fuelled resentment
of elites and the appeal of populist ideas.
The macroeconomic relationship between inequality and growth remains contested – though recent studies have
tended to highlight the costs of rising inequality, particularly over longer time horizons. Income inequality and
associated phenomena also have sectoral consequences. They contribute to greater health and social problems,
raising the demand for healthcare and social services. Employment in occupations dedicated to protecting property
rights and managing conflict is typically larger in countries with unequal distribution of income. Finally they translate
into disparities in consumption, particularly of non-durable goods and services such as education.
A number of factors have driven higher inequality, including the impact of technology and globalisation, failings
of the educational system, anticompetitive practices, weaknesses in corporate governance, the decline in union
membership and the progressivity of the tax system. Some of these trends may potentially reverse in the future –
for instance as ageing reduces labour supply, pushing up wages, or as calls for redistribution grow louder, although
these forces are most likely to operate at the margins. Past experience suggests that current levels of inequality
are likely to persist in the medium term, absent an extreme shock of some kind.
POLITICAL UNCERTAINTY
The geopolitical landscape is characterised by a greater distribution of power that has challenged the capacity of
the international system to respond effectively to a host of regional and global challenges – from the spread of nuclear
weapons, authoritarianism and terrorism, through historical rivalries in the Middle East and Asia-Pacific to a growing
rejection of free trade and immigration. Indices of geopolitical uncertainty have persisted at higher levels after they
spiked on 9/11. This has been mirrored by elevated policy uncertainty – a weakening of the institutional structures
which enable policymakers to act credibly and consistently.
Increases in uncertainty are found to have significant negative impacts on economic activity, raising the user cost
of capital, increasing the option value of deferring investments where there are sunk costs, and hindering the efficient
reallocation of resources from low to high productivity firms. The impacts are felt most strongly in sectors like defence,
finance, construction, engineering and healthcare which require extensive investment commitments and/or are
exposed to uncertain government programmes.
The trend towards policy uncertainty is a function of structural changes in political systems – above all, the rise
in partisanship which has impeded compromise and effective negotiation, reinforced by the growth in the scale
and complexity of government regulation. Even in systems designed to produce moderation, institutions may have
had the effect of marginalising important conflicts over policy rather than resolving them, increasing public apathy
and dissatisfaction.
3. APPROACH
The dynamic interdependencies of the trends have implications for our research design. In an earlier piece of
analysis, we reviewed the drivers of change that are expected to shape industry and occupational structures in the US
and UK workforces in 2030 (Schneider et al, 2017). Drivers were selected on the basis that they are relatively stable
with clear possible directions. Where the available evidence offered contrasting views of a trend and its implications or
identified possible disruptions, the analysis aimed to capture this uncertainty – rather than reach a verdict on how things
would play out.

This analysis was used to contextualise and guide discussions at two foresight workshops that we convened
between small groups of thought leaders with domain expertise in at least one of the seven trends identified.
These foresight workshops were held in Boston on 20 October 2016 with 12 participants and in London on 28
October 2016 with 13 participants.

In the second part of the US and UK workshops, the domain experts were presented with three sets of 10
individual occupations at the six-digit and four-digit Standard Occupation Classification (SOC) levels, and were
invited to debate their future prospects. They then assigned labels to the occupations (individually) according to whether
they thought they would experience rising, unchanged or declining demand by 2030.[11] The experts were also asked
to record how certain/uncertain they were in making their predictions. Factsheets presenting information on each of
the 30 occupations (containing a list of related job titles, related industries, key skills and tasks) and their historical
growth patterns were made available to the experts when making their predictions.

We used these labels to train a machine-learning classifier to generate predictions for all occupations, making use
of a detailed data set of 120 skills, abilities and knowledge features against which the US Department of Labor’s
O*NET service 'scores' all four-digit occupations in the US SOC on a consistent basis (we used a cross-walk to
also apply this data set to the UK SOC). To maximise the performance of the algorithm we used an active
learning method whereby the second and third sets of 10 occupations to be labelled were selected by the algorithm
itself (intuitively, these occupations were selected to cover that part of the skills/abilities/knowledge space where the
algorithm exhibited highest levels of uncertainty based on the previously labelled occupations).[12] From the model we
determined which skills, abilities and knowledge features were most associated (on their own and together) with
rising or declining occupations.

Our mixed methodology approach – making use of structured foresight and supervised machine-learning
techniques – was crafted to tackle the limitations in traditional qualitative and quantitative exercises. In
particular, qualitative approaches based purely on eliciting the judgements of experts are likely to be subject to
human biases, while quantitative approaches based purely on trend extrapolation are likely to miss structural breaks
in past trends and behaviours. By combining a machine-learning algorithm with structured expert judgment we
hope to have the best of both worlds.

Our research design is elaborate, matching the ambitious nature of our research question, but it is important to
note that, as a consequence, our findings could reflect any number of assumptions. For example, the subjective
judgments of one group of domain experts could be very different to another, or some parts of the O*NET data set
could be more accurate characterisations of occupations than others. The provision of historical data on occupations
and the main trends designed to establish a common frame of reference among experts mitigates some of these risks.
However, it remains the case that predictions generated might have differed if a different group of experts had
participated in the workshops or if we had used different selections of O*NET features in our model.

A separate, though related, challenge is how to evaluate our findings. As a forward-looking exercise, we might simply
compare our predictions with labour market outcomes in 2030. Notwithstanding the fact that this is 13 years away,
a concern is that because our predictions are conditional (see above), we cannot in any straightforward way identify
the source of prediction errors.

We try and partly tackle this by investigating the sensitivity of our findings to key features of our research design.
In particular, we present predictions that used (non-parametric) trend extrapolation of employment in
an occupation to label the 30 occupations in place of the experts’ judgments. These data-driven labels give
a baseline against which those built on our foresight exercises can be compared.

It is also important to note that our UK and US findings are not directly comparable. While the common use of O*NET
data means that it is tempting to compare the O*NET features that are most and least associated with predicted
higher demand occupations in the two countries, we would actually have made significant changes to the
research design if our objective had been to undertake a cross-country study. For example, we might have asked
one common group of domain experts to label occupation prospects for both the US and UK using a set of the most
similar occupation groups (crosswalked) across the two countries. Differences in the SOC structures in the two
countries also complicate comparison of the occupation predictions. As such, while our use of standardised
reporting in the US and UK results might invite comparison and contrast, any such inference would not be
valid. Our focus on obtaining the best possible results for each country compromises our ability to compare the two.
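The data-driven baseline mentioned in this section (labelling occupations from an extrapolation of their past employment rather than from expert judgment) can be illustrated with a minimal sketch. It is not the report's procedure: the report uses a non-parametric extrapolation, whereas this sketch swaps in an ordinary least-squares straight line for brevity. The function name `trend_label`, the `tolerance` band and the label strings are all assumptions introduced for illustration.

```python
import numpy as np


def trend_label(years, shares, target_year=2030, tolerance=0.05):
    """Label an occupation from a straight-line extrapolation of its employment share.

    Assumptions (not from the report): a linear trend stands in for the
    non-parametric extrapolation, and `tolerance` is a hypothetical relative
    band within which the projected share counts as unchanged.
    """
    years = np.asarray(years, dtype=float)
    shares = np.asarray(shares, dtype=float)
    latest = shares[-1]
    if latest <= 0:
        raise ValueError("latest share must be positive to compute a relative change")

    # Fit share = slope * (year - first_year) + intercept; shifting the years
    # keeps the least-squares problem well conditioned.
    offset = years - years[0]
    slope, intercept = np.polyfit(offset, shares, deg=1)

    # Extrapolate to the target year; a share cannot fall below zero.
    projected = max(slope * (target_year - years[0]) + intercept, 0.0)

    # Compare the projection with the most recent observation.
    change = (projected - latest) / latest
    if change > tolerance:
        return "rising"
    if change < -tolerance:
        return "declining"
    return "unchanged"
```

Labels produced this way depend only on historical data, which is why they can serve as a point of comparison for labels that rest on expert judgment.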
4. DATA
4.1. O*NET
To derive the demand for skills, abilities and knowledge from our occupational projections, we rely on data from
the Occupational Information Network (O*NET), a survey produced for the US Department of Labor (Occupational
Information Network (O*NET), 2017). The O*NET survey contains information on more [...] In the remainder of this
report, we represent these variables as a vector of length 120. In our workshop factsheets, we also include the
occupation description and five common job title examples for the occupation, also taken from O*NET.[15]
4.2. EMPLOYMENT MICRODATA
To form yearly estimates of employment by occupation and industry for our workshops, we used US and UK
employment microdata. For the US, we used 1983 – 2015 data from the Current Population Survey’s (CPS)
Annual Social and Economic Supplement (ASEC) from the Integrated Public Use Microdata Series (IPUMS) by
the Minnesota Population Center (King et al., 2010). We used an IPUMS-provided best-guess harmonisation of
occupation codes over time to 1990 Census occupation codes. We then crosswalked these codes to six-digit US
SOC 2010 codes.

For industry, we presented CPS-derived estimates of occupation employment by industry in 2015, harmonised
in the same way. We used the most granular common level of North American Industry Classification System
(NAICS) 2012 code available for that occupation (either the four-, three-, or two-digit level).

The comparable high-resolution microdata is only readily available in the UK over a shorter time period due to
challenges in matching the SOC codes across changes in the classification over time. We used yearly occupational
employment estimates based on the Labour Force Survey and provided by the Office for National Statistics
(ONS) for 2001 to 2016 inclusive (ONS, 2017a). These were provided at the four-digit ONS SOC 2010 level, the
equivalent level of granularity as the six-digit US SOC 2010 occupation code. We further generated estimates
of occupation employment by UK Standard Industrial Classification (SIC) 2007 industry class and subclass in
2015 using the Labour Force Survey provided by ONS (2017b).

The US employment results from our machine-learning classifier were weighted using the May 2015 Occupational
Employment Statistics from the Bureau of Labor Statistics (US Bureau of Labor Statistics, 2015). The corresponding
UK employment results are weighted using the August 2016 Labour Force Survey from the Office for National
Statistics (ONS, 2017a).

4.3. WORKSHOP-GENERATED DATA
To collect labels to train our machine-learning classifier of future demand for occupations, we held two expert
foresight workshops in Boston and London in October 2016. Each workshop brought together a diverse group
of 12 to 13 experts from industry, government, academia, and the social sector. Our experts were instructed to
consider the net impact on the workforce occupation composition of all the trends discussed above, guided by
our trends analysis. Figure 1 features a sample page of the trends analysis, which was shared with our experts in
advance, and presented at the workshops.

Over the course of the workshop, the group participated in three prediction sessions. In each session, the participants
viewed the information described above for 10 occupations, displayed on two slides ('factsheets') per occupation. (See
Figures 2a and 2b for examples of these factsheets.)

Note that the factsheets presented time-series plots of the occupation to participants, such that they could form their
predictions with proper historical context. After viewing the occupation descriptions the group was directed to an online
form to answer two questions:

1. What will happen to the share of total employment held by this occupation?
{Higher share, Same share, Lower share}
2. What will happen to the number of people employed in this occupation?
{Grow, No change, Decline}

With only a three-point scale, it was important to consider both employment share and absolute employment levels
so as to allow a fuller expression of judgments of future demand. For example, only knowing that an occupation
will grow slower than the workforce as a whole says nothing about whether it will add or shed jobs.[16] In the
event, both US and UK workshops were of the view – albeit with significant differences in opinion across individuals
– that total employment would grow over the prediction horizon, consistent with the historical pattern.

Given the inherent difficulties in making long-term predictions, our workshop participants were also asked to provide a
0-9 ranking of how certain they were in their answer, with 0 representing not certain at all, and 9 representing completely
certain. They were also given a space to provide freeform thoughts they felt were necessary to qualify their answers.

After the group submitted their answers using the online form, an experienced foresight workshop facilitator
reviewed the responses with the group. After the group debated their perspectives during a half-hour session,
the group was then allowed to change their answers, after which the workshop moved on to the next set
of 10 occupations.

The first 10 occupations presented to participants were selected randomly. For the second and third rounds, the
respective batches of 10 occupations were selected so as to be maximally informative for the machine-learning
model in light of the answers previously gathered from participants. In a way, the participants could be seen as
teaching the machine-learning algorithm throughout the course of the day, with the algorithm able to respond
to the information from participants by proposing a prioritised list of further questions. This process is
described formally in Section 5.3.
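The idea of choosing the next batch of occupations to be "maximally informative" can be illustrated with a small uncertainty-sampling sketch. It is only a simplified stand-in for the procedure the report describes in Section 5.3: it assumes a plain Gaussian-process regression with a squared-exponential kernel and fixed hyperparameters, and it picks occupations greedily by predictive variance. The function names (`rbf_kernel`, `posterior_variance`, `select_batch`) and the `noise` and `length_scale` values are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential covariance: similar feature vectors get covariance near 1."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dists / length_scale**2)


def posterior_variance(x_labelled, x_pool, noise=0.1, length_scale=1.0):
    """Predictive variance of a zero-mean GP at each pool point, given labelled inputs."""
    k_ll = rbf_kernel(x_labelled, x_labelled, length_scale)
    k_ll = k_ll + noise * np.eye(len(x_labelled))   # observation noise on the diagonal
    k_pl = rbf_kernel(x_pool, x_labelled, length_scale)
    solved = np.linalg.solve(k_ll, k_pl.T)          # shape (n_labelled, n_pool)
    # Prior variance is 1 for this kernel; subtract what the labelled points explain.
    return 1.0 - np.einsum("ij,ji->i", k_pl, solved)


def select_batch(x_all, labelled_idx, batch_size=10, **kernel_args):
    """Greedily pick the unlabelled occupations with the highest predictive variance."""
    labelled = list(labelled_idx)
    chosen = []
    for _ in range(batch_size):
        pool = [i for i in range(len(x_all)) if i not in labelled]
        if not pool:
            break
        variance = posterior_variance(x_all[labelled], x_all[pool], **kernel_args)
        pick = pool[int(np.argmax(variance))]
        chosen.append(pick)
        # In plain GP regression the variance does not depend on the label value,
        # so a chosen point can be treated as labelled before its label arrives.
        labelled.append(pick)
    return chosen
```

With each row of `x_all` holding one occupation's feature vector, `select_batch(x_all, first_round_indices)` returns indices of occupations that lie far, in feature space, from those already labelled. That is the intuition behind covering the part of the skills/abilities/knowledge space where the model is least certain.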
Figure 1: Sample page of the trends analysis ("Peak globalisation? Rapid expansion of global trade may have run its course"), including a chart of world trade as a percentage of GDP. Source: World Bank (2016).
5. METHODOLOGY
Our methodology uses the foresight exercises described in Section 4.3 as training data for a machine learning
model. The primary goal of the model is to learn a function $f(x)$ that maps from 120 O*NET variables $x$ (capturing
skills, knowledge and abilities) to future occupational demand. In this framework, the $i$th occupation is considered
a point, $x^{(i)}$, in 120-dimensional skills-/knowledge-/ability-space, whose associated demand is $f(x^{(i)})$. Our
approach is built on an expectation that demand should vary smoothly as a function of skills, knowledge and
abilities: that is, if two occupations $i$ and $j$ have similar O*NET variables, $x^{(i)} \approx x^{(j)}$, we expect their
associated demands to be similar, $f(x^{(i)}) \approx f(x^{(j)})$.[34]

We choose to model $f$ with a Gaussian process, to be described in Section 5.1. The Gaussian process is a Bayesian
non-parametric model (Ghahramani, 2013), meaning that its expressivity will naturally adapt to that inherent in
the data. This gives us an in-built resistance to over-fitting (learning patterns that do not generalise to unseen
data): the model will not induce the flexibility required to give a near-perfect fit on training data unless the quality
and quantity of data suggests that this fit will extend equally well to unobserved data. This desirable property is
induced by the ‘Occam’s Razor’ implicit within Bayesian reasoning (MacKay, 2003), and is suggested empirically by
the cross-validation tests presented in Section 6.2. The Gaussian process gives a flexible, non-linear, function class
suitable [...] to manage participant labels that are both ordinal and of varying uncertainty. Our model is put to work
on a variety of tasks. In particular, it is the basis of our means of selecting occupations to be labelled through active
learning (Section 5.3). Interpreting the patterns discovered by the model is the basis of our assessment of the
importance of O*NET features to future demand (Section 5.4). Finally, we use the model to predict future occupations,
defined as hotspots of high demand that are not associated with an existing occupation (Section 5.5).

To benchmark the efficacy of our foresight exercise, we also use Gaussian processes to perform extrapolation out
to the year 2030 of past employment trends (Section 5.6). These extrapolations provide alternatives to the labels
produced in the workshops, and provide results that do not rely on the subjective judgments of the experts.

[17] Consider engineers and metal workers and plastic workers – two occupation groups which the BLS projects will
decline in share between 2014 and 2024. Over this period, metal workers and plastic workers are projected to lose
99,000 jobs whereas engineers are projected to add 65,000 jobs.

5.1. GAUSSIAN PROCESSES
Section
that
for that
the
bels do
presented6.2.
arenot
complex
produced
The
generalise
notthe in Gaussian
associated
Section
interac-
inworkshops,
the interac- trends
6.2.
workshops,
process
an
The (SectionIn
existing
byGaussian
and this5.6).section,
occupation
hotspots
provide These
process
of
results high that extrapolations
we
(Section
demand
do not 5.5).
that
rely briefprovide
are
on not
thereviewassociated with to an theexisting occupation
within
gives agives
flexible, Bayesian
non-linear,
a flexible, reasoning
function
non-linear,
(Section class bels
(MacKay,
function
5.5). suitable
class
produced
2003), and
for
suitable isinfor
the suggested
complex
the bels
complex empirically
produced
bels
and
interac-
provide
in
produced thehotspots
trends results
workshops,
in the
ofthat
(Section high
workshops,and
dodemand
5.6). not
hotspots
provide
and
To
rely
These
hotspots that
ofon are
results
provide
benchmark
the
high not
extrapolations
high
of demand
that associated
demand
results
the do that
efficacythat
not provide
thatrely
do
with
are are
ofnot not
on
our
an
notthe
rely
In
existing
alternatives
associated
associated
on the
this
foresight section,
occupation
to
with
exercise,
the
with la-
weangivean
we existing
existing
also useoccupation
a brief occupation
review
Gaussian of G
tions (for instance, complementarities) gives a flexible,
that wewill non-linear,
expect
subjective between
subjective functionvariables
judgments class
judgments ofandsuitable
the ofexperts.
the for the
experts. complex
(Section
(Section
processes. 5.1. interac-
5.5).5.5).
GaussianFormally, processes a (Rasmussen and Williams, 2006)
tionsto the
unseen
(for cross-validation
instance, data):
tions (for instance,tions the
complementarities)tests model presented
complementarities) that in
not
we Section
induce
expect
that we expect 6.2.
between the The Gaussian
flexibility
variables
between subjective and
variables process
judgments
subjective and bels
5.4).
judgments 5.1.
produced
Finally,
of the Gaussian
experts.
of thein the(Section
weexperts.use
5.1. processes
workshops,
the
(Section 5.5).
Gaussian gp
model
5.5). andto provide
predict
processes results that
future occupations, do
(Rasmussen not rely on
defined
and theas
demand. gives a flexible,
To (for
non-linear,
benchmark
instance,
function
the efficacy
complementarities)
class suitable
of our foresight
for that
the we
complex
exercise,
expect we also
between
interac-
use
To
subjective
Gaussian
variables
To
benchmark and
benchmark
judgments
processes
the the
of the
to
efficacy
efficacy perform
of
experts. of
our our extrapolation
foresight
foresight exercise,
exercise, out we to
we the
also also year
use use Gaussian 2006)
2030
GaussianWilliams,
of past employmentis a pro
demand. demand. to giveprocesses hotspots
is a of
probability high demand To To
distribution benchmark
that a are
benchmark not
overthe efficacy
associated
theThese efficacy of
with our
of ouran foresight
existing
foresight exercise,
,occupation
exercise, we
thatwe also alsouseuseGaussian
to Gaussian
required

This model is trained on a dataset containing labels for each occupation from each individual expert, rather than on the group consensus. This approach permits the diversity of views within the group to be captured within the model. The participant labels are modelled as conditionally independent given the latent function $f(x)$. The dependence amongst the group's labels induced by discussion is modelled through this shared latent function.

Resilience to uncertainty is crucial to our exercise. Not only did our workshops gather observations from individual participants with explicit representations of their uncertainty, but the model must also try to fuse observations from the diverse range of opinions produced by our domain experts. Our model choice is informed by the probabilistic foundations of the Gaussian process, which give it a coherent way to reason about uncertainty. As such, we expect our model to give an honest representation of the trends that can be inferred from noisy participant labels.

Beyond the state-of-the-art in Gaussian processes, we introduce a novel heteroskedastic (that is, with location-varying noise variance) ordinal regression model, described in Section 5.2. This development is necessary to manage participant labels that are both ordinal and of varying uncertainty. Our model is put to work on a variety of tasks. In particular, it is the basis of our means of selecting occupations to be labeled through active learning (Section 5.3), and interpreting the patterns discovered by the model is the basis of our assessment of the importance of O*NET variables to future demand (Section 5.4). Finally, we use the model to predict future occupations, defined as hotspots of high demand that are not associated with an existing occupation (Section 5.5).

To benchmark the efficacy of our foresight exercise, we also use Gaussian processes to perform extrapolation out to the year 2030 of past employment trends (Section 5.6). These extrapolations provide alternatives to the labels produced in the workshops, and provide results that do not rely on the subjective judgments of the experts.

5.1. Gaussian processes

In this section, we give a brief review of Gaussian processes. Formally, a GP (Rasmussen and Williams, 2006) is a probability distribution over functions $f : \mathcal{X} \to \mathbb{R}$, such that the marginal distribution over the function values on any finite subset of $\mathcal{X}$ (such as $X$) is multivariate Gaussian. For a function $f(x)$, the prior distribution over its values $\mathbf{f}$ on a subset $\mathbf{x} \subset \mathcal{X}$ is completely specified by a mean vector $\mathbf{m}$ and covariance matrix $K$:

$$p(\mathbf{f} \mid K) := \mathcal{N}(\mathbf{f}; \mathbf{m}, K) := \frac{1}{\sqrt{\det 2\pi K}} \exp\left(-\frac{1}{2} (\mathbf{f} - \mathbf{m})^{\top} K^{-1} (\mathbf{f} - \mathbf{m})\right). \tag{1}$$

The covariance matrix is generated by a covariance function $\kappa : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$; that is, $K = \kappa(X, X)$.
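
As a rough illustration of Eq. (1) and of $K = \kappa(X, X)$, the sketch below builds a covariance matrix from a covariance function and evaluates the log of the prior density. The squared-exponential form of $\kappa$, its hyperparameters, the zero default mean, the small diagonal jitter and every function name here are assumptions made for illustration; the text does not specify them at this point.

```python
import numpy as np


def sq_exp_kernel(A, B, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance function kappa (an assumed choice).

    A has shape (n, d) and B has shape (m, d); the result has shape (n, m).
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return variance * np.exp(-0.5 * sq_dists / length_scale**2)


def gp_prior_log_density(f, K, m=None):
    """Log of the Gaussian density N(f; m, K); m defaults to zero (assumption)."""
    f = np.asarray(f, dtype=float)
    m = np.zeros_like(f) if m is None else np.asarray(m, dtype=float)
    L = np.linalg.cholesky(K)                     # K = L L^T
    z = np.linalg.solve(L, f - m)                 # z = L^{-1} (f - m)
    log_det_K = 2.0 * np.sum(np.log(np.diag(L)))  # log det K
    n = f.shape[0]
    return -0.5 * z @ z - 0.5 * log_det_K - 0.5 * n * np.log(2.0 * np.pi)


# Five hypothetical one-dimensional inputs; K = kappa(X, X) plus a small jitter.
X = np.linspace(0.0, 1.0, 5).reshape(-1, 1)
K = sq_exp_kernel(X, X, length_scale=0.3) + 1e-8 * np.eye(len(X))

# One draw of function values from the zero-mean prior, and its log density.
rng = np.random.default_rng(0)
f_sample = rng.multivariate_normal(np.zeros(len(X)), K)
print(gp_prior_log_density(f_sample, K))
```

The Cholesky factor is used instead of an explicit inverse and determinant because it gives both $K^{-1}(\mathbf{f} - \mathbf{m})$ and $\log \det K$ in a numerically stable way.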

Given training data $\mathcal{D}$, we use the GP to make predictions about the function values $f_*$ at input $x_*$. With this information, we have the predictive equations

$$p(f_* \mid x_*, \mathcal{D}) = \mathcal{N}\bigl(f_*;\, m(f_* \mid x_*, \mathcal{D}),\, V(f_* \mid x_*, \mathcal{D})\bigr), \tag{2}$$

where, taking the prior mean to be zero and writing $\mathbf{y}$ for the vector of training observations in $\mathcal{D}$,

$$m(f_* \mid x_*, \mathcal{D}) = K(x_*, X)\, K(X, X)^{-1}\, \mathbf{y}, \tag{3}$$

$$V(f_* \mid x_*, \mathcal{D}) = K(x_*, x_*) - K(x_*, X)\, K(X, X)^{-1}\, K(X, x_*). \tag{4}$$
proach permits the diversity of 5.3),
viewsandwithin
interpreting
the groupthe to patterns
be captured discovered
Inferring Inferring
within by the
the themodel
label label
posterior is thep(y
posterior basis xof, Given
our equations
| x∗is, D) complicated
training Thedata
is complicated
V ∗covariance
| x
by∗D, ,
the D) bywe the matrix
K(x
non-Gaussian non-Gaussian
, xis∗ gp generated ∗
∗ by predictions
, X)K(X, a covarianceabout
−1 ∗K(X,
function∗x ∗ the κ : X ×
. (2)∗
is∗use
∗the tobymake
where ∗ |p(y D) X → R;
aslabelsconditionally of theindependent 35 given the Inferring the∗∗label
Inferring the posterior
wherewhere
label posterior ,complicated is complicated the by non-Gaussian
the non-Gaussian
thelatent function Gaussian
assessment importance of O*NET variables to ordinal
future demand p(y∗ | xp(y ∗ , D) process ordinal regression cons
to (Section that is, =
where |x D) = (3)
the model. The participant are modelled as conditionally 35 35 form
form
independent of ofordinal
the likelihood likelihood be to
function be
introduced introduced
values below. below.
at input where. With this information, we have the predictive
−1
5.2. Heteroskedastic ordinal regression
∗ κ(X,
35 form of the of theInferring
ordinal likelihood 5.2.
f theto Heteroskedastic
label
p(fbe
K posterior
x
∗ |introduced
X).
m(f
= p(y Nbelow. ordinal | x
∗f|∗ ;xm(f
, D) regression
∗ |isxordinal
K(x
complicated , X)K(X,
(f∗ |is,x by X) the y
D) non-Gaussian (2) scores) observation
y formequations ordinal likelihood to ∗∗,be introduced ∗ ,below.
∗
5.2. D)Heteroskedastic V regression
∗ D) ∗
x ∗ , D), ∗ ,ordered , numerical
∗
given the latent function f (x).. The The dependence amongst m(f∗the , D) = K(x
| xgroup’s labels
−1
Given(3)|∗x| ,xtraining == data(f , weD) = use the
−1 gp
y, x∗ ) − to K(x make predictions about
(3) the x∗ ) . (4)
dependence among ∗the group’s ∗ , X)K(X,labelsX) induced by −1 form of the
Gaussian
m(f
Gaussian
ordinal
m(f ∗
process∗ D)
process
likelihood
∗ , D) K(x
ordinal
VK(x
ordinal
m(f ∗to,∗X)K(X, D,
∗|,be
regression
| ∗ ,introduced
X)K(X,
x
regression
x D) X)=
=
X)−1
K(x
consists
K(x y ∗below.
consists , of
X)K(X, ofcombining
combining
X) ∗ , −1X)K(X,
−1
y an an
X)−1
ordinal
ordinal
(3) K(X,
(that
(that (3)(3
35=5.2. x∗ ) . function values f∗ at process input | D)
∗ x ∗. With this ∗information, −1 weof have ) . the−1 predictive
where m(f x , K(x , X)K(X, X) y
induced by discussion is modelled through this shared Vlatent (f∗ | xfunction. 5.2.
∗ , D) K(x∗Heteroskedastic
, x∗ ) − K(x∗ordinal
Heteroskedastic , X)K(X, ordinal X)regression
regression K(X, (f (4)
(fD) xGaussian = =
∗
∗|,∗ )
∗
∗ordinal
x∗−,)observations regression
∗ −1 consists x.∗combining
) (4) (4)an ordinal (that
discussion 5.2. Heteroskedastic
5.2. Heteroskedastic ordinal
is, orderedV regression
ordinal ∗x|numerical
= , D)
regression ; K(x
scores) − K(x (f ∗ , X)K(X, with X)
a Gaussian K(X, (2) process prior.
is crucial to is ourmodelled through
only did this shared latent function. is, V ordered | , D)
numerical
Inferring K(x
∗ V (f the scores)
V (f , x
label observations
K(x
posterior =scores)
= , X)K(X, with
) ∗− X)
)∗ − a K(X, Gaussian
is x
complicated process by prior.
the non-Gaussian
x∗x) ∗. ) . prior.
(4)(4
36
equations
p(f | x , ∗N f m(f x D), V | x , D) , −1
Resilience to uncertainty exercise. Not our work-
∗ ∗
is, ordered
∗
∗ |∗xD)
∗numerical ∗ , D)
∗ K(x ∗K(xp(y
∗ , ∗x | x
∗observations
∗x ,K(xD) K(x ∗ , X)K(X, with
∗
, X)K(X, a X) X) K(X,
Gaussian K(X, process
Gaussian where
5.2.
∗
ofHeteroskedastic
∗
=ordinal regression
∗ ∗
(3)
Inferringwith the label Gaussian
posterior ∗ | process
process D) isordinal
x∗ , ordinal regression
complicated regression consists
by the consists
non-Gaussian of the
combining combining an
∗ ,form an
ordinal
ofK(x ordinal
the ∗(that
,ordinal (that
,xlikelihood to be(that introduced below.
−1
Inferring m(f | xlabel D)posterior X)K(X, X) ycomplicated
isordinal
shops gather observations from individual participants explicit represen- p(y Gaussian process Inferring
ordinal the∗ label
regression posterior
consists ofp(y combining is complicated byby isthe
the non-Gaussian
non-Gaussian
Gaussian process ordinal aregression consists f∗ ;an
|∗of
Ncombining |an ordinal (that
where Inferring
p(y x|∗posterior ∗ , D)
D)
form is, is, ordered
ordered be numerical
numerical scores) scores)
observations observations with with
athe Gaussian Gaussian Inferring
process xthe
|process the
∗∗,prior.
label
∗ label
=prior. posterior p(yp(y ∗x∗|∗,xD), |∗x, −1D) (f complicated
∗is| complicated
byby the thenon-Gaussian
non-Gaussian
(2)
try of to thefuseordinal likelihood to introduced below. ∗36
(f p(f = ,D) )be ,m(f 36 ∗ ,VD) xx∗ ,)D) , (4)
tations of their uncertainty,Resilience
the model must to uncertainty
also isobservations
crucial tofromour exercise. is, ordered is, ordered numerical form
Gaussian
form
numerical of of
scores) the
Vprocess ∗ |ordinal
ordinal x
observations
scores) ∗ ,ordinal
D) likelihood
likelihood
observations
∗K(x
regression
with to
xa∗withto −be
Gaussian consistsintroduced
introduced
K(x a Gaussian of∗ below.
X)K(X,
process below.
combining X)
prior.
process 36 K(X, an
prior. ordinal
∗ . (that
form form of of the the ordinal ordinal likelihood
likelihood toto bebe introduced
introduced below.below. (3)
the diverse range of opinions produced by our domain experts. Our model is,m(f ordered
∗ | x where
∗ , D)numerical= K(x 5.2.∗ ,scores) Heteroskedastic
X)K(X, X)−1 y ordinal
observations with regression
a Gaussian process (3) prior.
Not only did our5.2. workshops
Heteroskedastic gather observations
ordinal regression from 36 Inferring
365.2. Heteroskedasticthe label posterior p(y∗ | x∗ , D) is complicated by the non-Gaussian
choice is informed by the probabilistic foundations of the Gaussian process, V (f∗of
5.2. Heteroskedastic = K(x 36
Gaussian )ordinal
ordinal
∗ 36 −process toregression regression
ordinal X)−1
regression x∗ ) .
consists of(4) combining
form (4)an ordinal (that
| xthe ∗ , D) ordinal 5.2. likelihood
∗, x Heteroskedastic K(x be∗ , X)K(X,introduced ordinal below.
K(X,
regression
5.2. Heteroskedastic
∗ | x∗ , D) = K(x36
ordinal regression
−1
(3)
which give it a coherent way individual
to reason about participants
uncertainty. with As explicit
such, we representations
expect
Gaussian process ordinal regression consists of combining of their an ordinal (that
m(f
is, ordered numerical
∗ , X)K(X, X) y
scores) observations with a Gaussian process prior.
Inferring Gaussian
Gaussian
the label process
process
posterior ordinal
(f ordinal regression
regression = is consists
complicated consists
) of of
by combining
combining
the non-Gaussian an an ordinal
ordinal (that (that
) (4)
our model to give an honest representation of is, the ordered
trends that can
numerical be inferred V Gaussian p(y
Gaussian| |
∗x∗ , D) x ∗process
, D)
process K(x ordinal
, ordinal
x − regression
regression
K(x , X)K(X, consists
consists X) of
−1
ofcombining
K(X, combining
x . an an ordinal
ordinal (that
(tha
uncertainty, the model must also scores) try to fuse observations observations with a Gaussian form of 5.2. is,process
the is,ordered
ordered
ordinal prior.
Heteroskedastic numerical
numerical
likelihood
∗
is,is,ordered to ordinal
scores)scores)
be introduced regression
observations
observations
numerical
∗ ∗
below.
scores) with witha aGaussian
∗
observations
Gaussianprocess
with
processprior.
∗
prior.
from noisy participant labels. ordered numerical scores) observations36 witha aGaussian Gaussianprocess processprior.prior
Inferring Inferringthe the label
label posterior posterior p(y∗ | x∗ , D) is is complicated
complicated by the by non-Gaussian
frominthe
Beyond the state-of-the-art diverse
Gaussian range we
processes, of introduce
opinionsa produced novel 36by our Gaussian process ordinal regression consists of combining an ordinal (that
5.2. Heteroskedastic form ofordinal the ordinal regression likelihood36 to36be introduced below.
heteroskedastic (that is, with location-varying noise variance) ordinal regres- is, ordered
the non-Gaussian numerical scores) form of observations
the ordinal withlikelihooda Gaussian 3636 to process
be prior.
domain experts. Our
sion model, described in Section 5.2. This development is necessary to man-
model choice is informed by the
Gaussian process
introduced 5.2. ordinal below.
Heteroskedastic regression consists ordinal ofregression
combining an ordinal (that
age participant labels thatprobabilistic
are both ordinal foundations
and of varyingof the Gaussian
uncertainty. Our process, which is, ordered numerical scores) observations with 36 a Gaussian process prior.
model is put to work on a variety of tasks. In particular, it is the basis of our Gaussian process ordinal regression consists of combining an ordinal (that
give it a coherent way
means of selecting occupations to be labeled through active learning (Section
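As an illustration only, the predictive mean and variance in equations (3) and (4) can be computed directly with NumPy. The sketch below is not the authors' implementation: the squared-exponential covariance function, its hyperparameters (`length_scale`, `variance`), the small `jitter` term added for numerical stability, and the toy data are all assumptions introduced here.

```python
import numpy as np


def sq_exp_kernel(A, B, length_scale=1.0, variance=1.0):
    """Assumed covariance function (squared exponential); the source does not name one."""
    # Pairwise squared Euclidean distances between rows of A and rows of B.
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(d2, 0.0) / length_scale**2)


def gp_predict(X, y, X_star, kernel=sq_exp_kernel, jitter=1e-8):
    """Return the predictive mean (eq. 3) and covariance (eq. 4) at X_star."""
    K_XX = kernel(X, X) + jitter * np.eye(len(X))  # K(X, X), stabilised
    K_sX = kernel(X_star, X)                       # K(x_*, X)
    K_ss = kernel(X_star, X_star)                  # K(x_*, x_*)
    mean = K_sX @ np.linalg.solve(K_XX, y)               # K(x_*, X) K(X, X)^{-1} y
    cov = K_ss - K_sX @ np.linalg.solve(K_XX, K_sX.T)    # eq. (4)
    return mean, cov


# Toy one-dimensional data, purely illustrative.
X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 0.5])
mean_at_train, cov_at_train = gp_predict(X, y, X)
mean_far, cov_far = gp_predict(X, y, np.array([[100.0]]))
```

Because this sketch is noise-free, the predictive mean reproduces the training targets at the training inputs, while far from the data the prediction reverts to the zero prior mean and the prior variance.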
5.2. Heteroskedastic ordinal regression

Gaussian process ordinal regression consists of combining ordinal (that is, ordered numerical scores) observations with a Gaussian process prior. Our approach differs from the state-of-the-art (Chu and Ghahramani, 2005) in incorporating a heteroskedastic noise model. Recall that participants describe how confident they are using a choice from an ordinal list $\{0, \ldots, 9\}$. We assume that the noise standard deviation associated with each observation is an affine transformation of the chosen value: $\sigma_{\text{noise}} = \alpha + \beta i$ for $i \in \{0, \ldots, 9\}$, with hyperparameters $\alpha \in \mathbb{R}^{+}$ and $\beta \in \mathbb{R}^{+}$. Since the value of noise is an ordinal variable as well, we build a secondary ordinal regression model to predict the ordinal value of noise at different points in feature space.
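The affine noise model can be written down in a few lines. The following is a minimal sketch under stated assumptions: the function name `noise_std` and the hyperparameter values are hypothetical, chosen only to show how a choice $i \in \{0, \ldots, 9\}$ maps to a per-observation noise standard deviation.

```python
import numpy as np


def noise_std(choice, alpha, beta):
    """Noise standard deviation sigma_noise = alpha + beta * i for a choice i in {0, ..., 9}."""
    i = np.asarray(choice)
    if np.any((i < 0) | (i > 9)):
        raise ValueError("choice must lie in {0, ..., 9}")
    return alpha + beta * i


# Hypothetical hyperparameter values, chosen for illustration only.
alpha, beta = 0.5, 0.25
choices = np.array([0, 4, 9])
sigma = noise_std(choices, alpha, beta)  # one standard deviation per observation
```

With a positive $\beta$, the assumed noise grows linearly with the chosen value, so each observation carries its own noise level rather than sharing a single global one.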
As there are two questions being asked, one relating to share change and the other to absolute change, two different types of observations are made. We build a model capable of acknowledging and fusing these two different observation types by employing and extending GPflow, a package for building Gaussian process models in Python using TensorFlow (Matthews et al., 2016). We use a Gaussian process to model a latent function $f(x)$, representing the change in demand in absolute employment. The relative change in share can be represented by dividing the absolute change in demand by the total size $T$ of the workforce in 2030. That is, the second of the two workshop questions provides an observation $f/T$, where $T$ is an unknown positive value to be inferred for each participant.

For each question, the model ingests ternary-valued labels from one of the sets {Higher share, Same share, Lower share} and {Grow, No change, Decline}, for relative share of employment and absolute change in employment, respectively. From these ternary-valued observations the model produces a posterior distribution over binary-valued labels, namely {Increasing demand, Decreasing demand}, which are used as the foundation of the analysis. The derivative of 'Increasing demand of an occupation' with respect to the occupation features is made possible by the use of automatic differentiation, a feature of Google's TensorFlow Python package (Abadi et al., 2015), which provides the framework to GPflow.

5.3. Active learning

As described in Section 4.3, our foresight workshops required choosing which occupations were to be presented to participants. We introduced the use of a machine learning model to automate this choice. The machine learning model used was a reduced form of the model described in Section 5.2 to ensure

5.4. Assessing feature importance

One of the core goals of this work is to assess the significance of the 120 O*NET features to future demand, and thereby inform skills policy decisions. First, however, our research question needs further clarification: what exactly does it mean for a feature to be important to demand? We propose two primary criteria for a scheme to measure importance:

1. An important feature must be clearly predictive of demand.
2. An increase in an important feature must lead to a strong increase in demand.

We also propose two secondary criteria for a scheme to measure importance:

1. It must be able to uncover non-linear interactions between features.
2. It must be able to capture complementarities between features: we wish to discover features whose importance is contingent on the values of other features.
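To make the distinction between the two primary criteria concrete, the toy sketch below constructs a feature that is strongly predictive of demand (criterion 1) yet whose increase lowers demand, so that it fails criterion 2. The synthetic data, the use of absolute Pearson correlation as a stand-in for predictiveness, and the least-squares slope as a stand-in for the effect of an increase are all assumptions made for illustration; they are not presented as the measures used in this work.

```python
import numpy as np


def abs_correlation(feature, demand):
    """Criterion 1 proxy: strength of association, ignoring its direction."""
    return abs(np.corrcoef(feature, demand)[0, 1])


def mean_slope(feature, demand):
    """Criterion 2 proxy: least-squares slope of demand against the feature."""
    slope, _intercept = np.polyfit(feature, demand, deg=1)
    return slope


# Synthetic data: one feature, two hypothetical demand signals.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200)
helpful = 3.0 * x + rng.normal(scale=0.1, size=200)   # demand rises with the feature
harmful = -3.0 * x + rng.normal(scale=0.1, size=200)  # demand falls with the feature

predictive_helpful = abs_correlation(x, helpful)
predictive_harmful = abs_correlation(x, harmful)
slope_helpful = mean_slope(x, helpful)
slope_harmful = mean_slope(x, harmful)
```

Both signals are about equally predictable from the feature, but only the first has a positive slope. A ranking based on predictiveness alone cannot tell the two cases apart.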
One approach to assessing feature importance is feature selection (Guyon and Elisseeff, 2003). Feature selection is a broad and well-studied topic, and aims to choose those features that are most informative of the function. In our context, it might be thought that the ranking of O*NET features selected through such a scheme is a means of ranking their importance to demand. We suggest that the bulk of methods of feature selection are, at best, an insufficient guide to feature importance and, at worst, actively misleading: feature selection does not address the second of our primary criteria. That is, determining that a feature
their
importance topic,
importance to to to
In our context, it might be thoughtand that aims thetoranking choose of those O*NET features features that are selected most and selected
throughselected
informative Elisseeff, through
suchthrough of 2003).
athe scheme suchFeature
such
function. aaisscheme scheme
a means selection
isisaof ameans is a of
means
ranking broad
oftheir
ranking
rankingand
importan well-s
their
theiri
One approach to assessing One One and
demand.
approach
itapproach
aims
feature
demand. is tohighly
demand.
We to choose
suggest
toimportance
We assessing
assessing We
suggest informative
thosethat
suggest
isfeature
feature
demand.
features
theimportance
feature
that that
bulk
the
and
importance
demand.
gives
that
selection
demand.
We
the
bulkof methods
aimssuggest
are
bulk
ofis
oftoWe
no
(Guyonmost
methods
We choose
isfeature
sense
offeature
suggest
that
ofinformative
methods
suggest the
feature of
ofselection
those
selection feature
that
that
bulk
of the
features the
ofthe
ofsign
selection
feature the
selection
(Guyon
(Guyonbulk
bulk
methods that of
function.
selection
give,
ofofare
the
give,
methods
of
at
most
methods
feature
give, at
at informative
ofof feature
featuresele
selection of
se
giv
selected through such a scheme is a In meansour context, of ranking One might
their approach be
importance thought to assessing tothat the feature ranking importance O*NET is feature features selection (Guyon
5.3. ACTIVE LEARNING

As described in Section 4.3, our foresight workshops required choosing which occupations were to be presented to participants. We introduced the use of a machine learning model to automate this choice. The machine learning model used was a reduced form of the model described in Section 5.2 to ensure ‘real-time’ performance. In particular, given that our goal is to predict demand, and that our model is able to provide estimates of the uncertainty of demand for all occupations, we took the natural option of uncertainty sampling. Uncertainty sampling ranks the set of all occupations from high to low uncertainty, and chooses the highest-ranked for labelling by participants. The motivation for the approach is the expectation these labels should be

One approach to assessing feature importance is feature selection (Guyon and Elisseeff, 2003). Feature selection is a broad and well-studied topic, and aims to choose those features that are most informative of the function. In our context, it might be thought that the ranking of O*NET features selected through such a scheme is a means of ranking their importance to demand. We suggest that the bulk of methods of feature selection give, at best, an insufficient guide to importance, and are, at worst, actively misleading: feature selection does not address the second of our primary desiderata. That is, determining that a feature is highly informative gives no sense of the sign of the relationship between feature and demand. Most feature selection adopts an information-theoretic approach that would not distinguish between a feature $x_1$, for which $f(x) = \alpha - x_1$, and a feature $x_2$, for which $f(x) = \beta + x_2$ ($\alpha$ and $\beta$ being some parameters). Both $x_1$ and $x_2$ are highly informative of demand. However, a skills policy that results in broad increases in $x_1$ would lead to harmful outcomes for occupations; $x_2$, the converse. Note also that, for our complex, non-linear function $f$, relationships with $x_i$ are unlikely to be as simple as that described above for $x_1$ and $x_2$. Another feature, $x_3$, may give $f(x) = \cos(x_3)$: while, again, $x_3$ is highly informative of demand, it is unclear whether it is important: for some occupations (values of $x_3$), $x_3$ will have very different significance from that for other occupations. To be explicit, the Automatic Relevance Determination approach (Rasmussen and Williams, 2006) often used for embedded feature selection for Gaussian processes is inappropriate for our ends. It provides only a description of the informativeness of features, rather than their importance.

A second approach to managing features is dimensionality reduction, which would involve projecting the data into a lower-dimensional space. To give an example, dimensionality reduction can be achieved by the ubiquitous technique of Principal Component Analysis (Pearson, 1901). Dimensionality reduction on $x$ can certainly be used to discover that certain features co-vary.
andAWilliams second approach give
This, anof informativeness
example,
course,
This, of dimensionality
isexplicit,
course,
not the of features,
isfeature
same
not the as rather
reduction
same
discovering as than
can
discovering be
that their achieved
the importance.
that
tworeduction, by
are
the the
twoubiquitous
similarly are similarly im- im- im-
most informative reduction about on xdemand can certainly overall:be used
and theirWilliams
technique
to
(2006))
observation
discover
(2006))
ofthat A Ato
often
Principal
managing
often
used
second
second
A second
certain
This, used
for Toof
informativeness
approach
approach
Component
features
features
embedded
for be
course,
approach
embedded
toAnalysis
co-vary.
ismanaging
toismanaging
ofnot
feature
toThis,
the
features, the
dimensionality
ofsame
selection
(Pearson,
managing Automatic
give rather
selection
features
features
This,
course, This, as
an for
of
1901).
features of
reduction,
discovering
isthan
example,
is Gaussian
for
is
course,
course,
not Relevance
their
Gaussian
dimensionality
dimensionality
the
Dimensionality
is is that
importance.
isdimensionality
not
same notthe
dimensionality
the
thediscovering
as Determination
two
same are
reduction
reduction,
same similarly
asasdiscovering
reduction, thatcan
discovering the be that achieved
that
two the
are thesimilarl
two
twobyareth
ar
which
processes would
processes involve
is inappropriate
is inappropriateprojecting technique
portant
for the
portant
ourfor portant
to
A
ends.
ourdataA
of to
second second
Principal
demand.
ends.
It into to
demand.
provides
It a approach
demand.
approach MoreComponent
lower-dimensional
provides More
only to only
a to
sophisticated
More sophisticated
managing managing
description
aAnalysis
sophisticated
descriptionspace. uses features
of features
(Pearson,
usesTo
theof of dimensionality
uses
of
the is is
1901).
of
dimensionality dimensionality
dimensionality Dimensionality
dimensionality reduction, reduction, reduction,
reduction,
reduction,
reductionwhich on which can
x which would
would certainly
would involve
involve involve projecting
beprojecting
used the
portant
to (Rasmussen
projecting discover thedata data
the technique
to
that into
portant
portant
data into
demand.
certain ainto toof
to Principal
alower-dimensional
lower-dimensional
demand.
featuresdemand.
aWilliamsMore
lower-dimensional Component
More
sophisticated
co-vary. More space.space.
sophisticated
sophisticated Analysis
To
uses
space. To of To (Pearson,
usesusesfor
dimensionality 1901).
ofofdimensional
dimensiona reduc D
should lead toThis, a large of course, reduction is not the insame total
give
asuncertainty.
an
informativeness
discovering
example,of
informativeness
that The
dimensionality
features,
of
the
features,
two
reduction
that that
rather
are
includewhich
that
reduction
which rather
similarly
approach
on
include
than would would
the
include
than
x theircan
canvalues
the
im-
involvebe
their involve
certainly
the
values
importance. of
achievedvalues of
projecting
importance.
f projecting
(x)be used
itself,
of
by
(x) fthe to
(x)and
itself,
the canthe
discover
itself,
ubiquitous
data candata
be used
intocan
be into
that used
a be
to (2006))
a discover
certainlower-dimensional
used
to
lower-dimensional tofeatures
discover often
discover
relationships
relationships used
co-vary. space.
relationships
space. To To
portant to demand. More sophisticated This, uses give
of course, give
of an an
give example,
isexample,
not
dimensionality the same
an example, dimensionality
dimensionality as discovering
dimensionality
reduction, that reduction
reduction includef
that reduction
that
reduction that can
the can
the be
include
include
two be on
achieved
values
can are achieved
xthe
be can
the of values
similarly
achieved certainly
by
fvalues
(x) bythe the
itself,
im-
by be
fthe canused
ubiquitous
ofofubiquitousf(x)(x) betoused
itself,
itself,
ubiquitous discover
can
canto be that
bediscover
used
usedto certain
torelation
discover
discovefea
technique A ofsecondPrincipal Component This,
tobetween between
give of give an an
course,
between
O*NET
Analysis O*NET
example,example,
isfeatures
O*NET notfeatures
features
(Pearson, the issame
features
and
isdimensionality 1901). demand.
and as anddiscovering
Dimensionality
demand. reduction
demand.
reduction However, However, that
can can
However, the be
dimensionality
be two achieved
dimensionality
achieved areas
dimensionality
by by
thethe
similarly
reduction, ubiquitous
reduction, im-
ubiquitous reduction,
ofembedded feature selection for Gaussian andprocesses isthat dimensional
active learningthat approach is values
one that aims A second to approach
interactively approach
technique
technique managing
ofto of managing
Principal
Principal features Component
Component dimensionality
between dimensionality
This,
Analysis
Analysis between
O*NETbetween of(Pearson,
(Pearson, reduction,
course,
O*NET
O*NET
features reduction,is
1901).1901).not
features
and features the same
Dimensionality
Dimensionality
demand. and demand. discovering
demand.
However, However,
However,
dimensionality the tworeduc
dimensiona are
include the of f (x) itself, portant
can beto used demand.technique
to discover More sophisticated
Principal
relationships Component uses of dimensionality
Analysis (Pearson, reduction,
1901). Dimensionality
reduction
which whichwould onwould caninvolve
xinvolve certainly
projecting inportant
considering
be
projecting in used technique
into
considering
technique
the to
data
the demand.
considering
discover
of
data
into of
combinations Principal
Principal
into
a that
combinations More
combinations
certain
lower-dimensional
a of
ComponentComponent
sophisticated
features,
lower-dimensional of features of
features, features,
will
Analysis
space. Analysis
uses
co-vary. fail
will
space. Toof
to
will
fail
(Pearson,(Pearson,
dimensionality
satisfy
to
To fail satisfyto our
1901). 1901).
satisfy first
our our
first Dimensionality
reduction,
desidera-
Dimensionality first
desidera- desidera-
between thatHowever,include reduction
reduction the ononx xof
values
reduction can can (x)
oninappropriate certainly
fxcertainly
can itself,
certainly be canbe
in used used
considering
befor to to
used
used portant
discover
inin discover
toconsidering
toconsidering to
that
combinations
discover
discover that demand.
Itcertain
that certain
combinations
combinations
relationships of
certain More
features
features
features, sophisticated
features ofco-vary.
co-vary.
of features,
features,
will fail to
co-vary. uses
will
will
satisfy fail oftoour
fail dimensionali
to satisfy
satisfy our
first desi our
acquire data so as toO*NET provide features
the greatestand demand.
This,
give an confidence
of example,
give course,
an example, isdimensionality
not the
dimensionality
insame that
tum.
dimensionality tum. as reduction
include
That
reduction tum.
discovering
That
reduction
reduction,
is,
reductionthe
That
in is,can
on on
values
mixing
xin is,
that
xmixing
can
becan can
in of
the certainly
features
mixing
certainly
achieved
be (x)that
two
features
fachieved our by itself,
features
are
be be
together, ends. used
similarly
together,
used can to to
be
together,
dimensionality
discover provides
discover
used
im-
dimensionality to that
discover
dimensionality
that onlycertain
freduction
certain afeatures
reduction description
features
relationships
reduction
will fail
will
co-vary.co-vary.
will
usedfail
fail
in considering combinations of features, between will fail O*NETThis,
This, to This, ofof
satisfy course,
course,
featuresofour course,isandisnot
first not the
demand.
is not
desidera- the same same
the tum. as asdiscovering
However,
same discovering
as tum.
That tum. is,the by
include
That
in
dimensionality
discovering
ubiquitous
that
That the
that
mixing ubiquitous
the
the
is, is,
that the in two
in values
two
mixing
features mixing
reduction,
the aretwoare of features
are (x)
similarly
similarly features
together, itself,
similarly im- im-
together,
together, can
im- be
dimensionality dimensionality
dimensionality to discover
reduction redu
red
wi
portant
techniquetechniqueto of demand.
Principal
of Principal More
Componenttobetween
uncover
sophisticated
to
Component This, This,
to
uncover O*NET
of
Analysis
of theof
uncover
clear course,
clear
course,
Analysis and
uses features
clear
(Pearson,and
is
informativeness isdimensionality
interpretable
of not not
and and
interpretable
(Pearson, the the
1901). demand.
interpretable
same same
1901). relationships
as
Dimensionality
of as
reduction,
relationships discovering
However,
discovering relationships
Dimensionality
features, to increasing
to
that that
dimensionality to
increasing
rather the the
increasing
two two
demand.
than are
reduction,
demand.
are similarly
demand.
similarly
their im- im-
resulting predictions. tum. That is, in mixing features together, in considering portant
portant
dimensionality totodemand.
combinations
portant demand.
to demand.
reduction More
of features,
willMore More
fail sophisticated
sophisticated
to uncover
willsophisticated
fail to between uses
totosatisfy
clear usesand
uncover
uncover O*NET
of
uses ofdimensionality
our clear dimensionality
clear
interpretable
offirst features
and and
desidera-
dimensionality interpretable
interpretableand
relationshipsdemand.
reduction,
reduction, relationships
reduction, However,
relationships
to increasing dimensionali
totoincreasing
increasing
demand. d
that
reduction include
reduction on the x on can values
certainly
can of in
certainly
f (x)
be considering
We portant
used portant
make
itself,
We
be We
make
used
to cantwo
to
discover
to to
combinations
make
be
two
demand. demand.
complementary
discoverused two
complementary
that tocomplementary
More
that
certain More
of
discover features,
certain sophisticated
proposals
sophisticated
features relationships
proposals
features will
proposals
for
co-vary. fail assessing
for
uses
co-vary. touses satisfy
for
assessing
of of
assessing
featuredimensionality
dimensionality ourfeature first
importance,
feature desidera-
importance, reduction,
importance,
reduction,
to uncover clear and interpretable relationships tum. That that
xthat
ininclude
include
is, increasing
to that mixing include the the values
features
demand. thevalues
importance. valuesofoff (x)
together, fof (x)fitself, itself,
We itself,
dimensionality
(x) in
can
make canbe considering
WeWebeused
two
can makeused
make
complementary
be reduction totwo
used combinations
to discover
discover
two tocomplementary
complementary
will
discover proposals
fail of
relationships
relationships features,proposals
relationships proposals
for assessing will fail
for to
forassessing satisfy
assessing
feature featufi
our
featur
import
between
This,This, O*NET
of course, of course, isfeatures
not notand
is the tum.
same
the and that
same as that
That
ultimately
demand. and
ultimately
include
discoveringinclude
is,
asfeatures in
ultimately
present
However,
discovering mixing
the the
present
that values values
results features
present
dimensionality
that
the results of
two
the of
fromfresults
(x)
aref (x)
together,
from
two each.
itself,
similarly
are itself,
from
reduction,
each. can
similarly each. can
dimensionality
im- be be im-used
used to
toreduction,
discoverdiscover
reduction willrelationships
relationships fail
We make two complementary proposals to uncover forbetween between
clear
assessing and
between O*NETO*NET
interpretable
feature O*NET features
importance, features and and
relationships and anddemand.
demand. ultimately
demand. tum.
to and However,
and That
However,
ultimately
increasing ultimately
present
However, is, inresults
dimensionality
dimensionality
demand. mixing
present
present
dimensionality from features
results
results reduction,
each. from together,
from
reduction, each. dimensionality redu
each.
in considering
portant portant to demand. tocombinations
demand. More toof
More uncover
features,
between
sophisticated between
sophisticated clear will
O*NET O*NET
and
uses failof uses
of features features
interpretable
to satisfy
dimensionality
of dimensionality and our and first
demand. demand.
relationships desidera-
reduction, However,
reduction, However,
to increasing dimensionality
dimensionality demand. reduction,
reduction,
and ultimately present results from each.We make ininconsidering
considering
two complementary
in considering combinations
combinations combinations proposals offeatures,
features,
offorfeatures, to
assessing will uncover
will fail fail
will tofail
feature clear
tosatisfy satisfy and
toimportance,
satisfyour interpretable
our first firstdesidera-
our desidera-
first relationships
desidera- to increasing d
tum.
that that Thatinclude
include is,the in mixing
values
the valuesfeatures
of f (x)ofis,Wein in
make
together,
itself,
(x) considering
considering
itself,
cantwodimensionality
be
cancomplementarycombinations
combinations
usedbe used
to toreduction
discover discover of
ofproposals
features,features,
relationships
39 39 will for
relationships39fail
will will
assessing
fail fail to to satisfy satisfy
feature our our
importance,
first first desidera-
desidera-
and ultimately tum.tum. That
That
present
tum. That
fis,in in
resultsis, mixing
mixing infrom mixing features
features
each. features together,
together, together, We make
dimensionality
dimensionality dimensionality two complementary reduction
reduction reduction 39 will will proposals
fail fail
will 3939fail for assessing feature
to uncover clear and interpretable and demand. tum. tum.
ultimatelyrelationships
That That is, is,
present in to in
mixing mixing
results
increasing from
features features each.
demand. together, together, dimensionality
dimensionality reduction
reduction will fail
between between O*NET
39
O*NET features
toto features
uncover
uncover and
to uncover
and
clearclear demand.
and
clear
However,
andinterpretable
and interpretableHowever,
interpretable
dimensionality dimensionality
and
relationships
relationships ultimately
relationships
reduction,
totoincreasing reduction,
topresent
increasing increasing results
demand.
demand. demand. from each. will fail
We in make
in considering considering two complementary
combinations
combinations of to to
proposals
uncover
features,
of uncover
features, will for
clear clear
fail
will assessing
and tofailandsatisfy
to interpretable
interpretable feature
satisfy our for first
ourimportance, relationships
relationships
first
desidera- desidera- to to increasing increasing demand. demand.
37 We Wemake make
We two
make two complementary
complementary
two complementary 39 proposals
proposals proposals for assessing
assessing
for assessing feature
feature feature importance,
importance, importance,
and
tum.ultimately
tum.
ThatThat is, in present
is,mixing
inand results
mixing features from
features We
together, We
each. make
together,make twotwo
dimensionality complementary
complementary
dimensionality reduction 39 proposals
reduction proposals
will fail willforfor assessing
fail assessing feature
feature importance,
importance,
and andultimately
ultimately ultimately present
present present results
results from
results fromfrom each.each. each. 39
to uncover
to uncover clearclear and interpretable
and interpretable andand ultimately
ultimately
relationshipsrelationships present
to present toresults
increasing results
increasing from
demand. from each.
demand. each.
We make We make two complementary
two complementary proposals39proposals for assessing for assessing feature feature importance, importance,
3939 39
and ultimately
and ultimately present present results resultsfromfrom each.each. 39 39
feature selection does not address the second of our primary desiderata. That is, determining
that a feature is highly informative gives no sense of the sign of the relationship between
feature and demand. Most feature selection adopts an information-theoretic approach that would
not distinguish between a feature $x_1$, for which $f(x) = \alpha - x_1$, and a feature $x_2$,
for which $f(x) = \beta + x_2$ ($\alpha$ and $\beta$ being some parameters). Both $x_1$ and $x_2$
are highly informative of demand. However, a skills policy that resulted in broad increases in
$x_1$ would lead to harmful outcomes for occupations; the converse holds for $x_2$. Note also
that, for our complex, non-linear function $f$, relationships with $x_i$ are unlikely to be as
simple as those described above for $x_1$ and $x_2$. Another feature, $x_3$, may give
$f(x) = \cos(x_3)$: while, again, $x_3$ is highly informative of demand, it is unclear whether
it is important: for some occupations (values of $x_3$), $x_3$ will have very different
significance from that for other occupations. To be explicit, the Automatic Relevance
Determination approach (Rasmussen and Williams (2006)) often used for embedded feature
selection for Gaussian processes is inappropriate for our ends. It provides only a description
of the informativeness of features, rather than their importance.
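The distinction between informativeness and signed importance can be illustrated with a small numerical sketch. It is not the report's method: squared Pearson correlation stands in here for a generic sign-blind informativeness score (an assumption made for illustration, in place of an information-theoretic measure), and the values of `alpha` and `beta` and the grid for `x` are arbitrary.

```python
import numpy as np

# Illustrative parameter values and feature grid (assumptions, not from the report).
alpha, beta = 2.0, 1.0
x = np.linspace(0.0, 2.0 * np.pi, 201)

f1 = alpha - x   # demand falls as x_1 rises
f2 = beta + x    # demand rises as x_2 rises
f3 = np.cos(x)   # demand depends on x_3, but the direction varies with x_3

def r_squared(u, v):
    """Squared Pearson correlation: a sign-blind stand-in for informativeness."""
    return np.corrcoef(u, v)[0, 1] ** 2

def fitted_slope(u, v):
    """Slope of the least-squares line of v on u: carries the sign."""
    return np.polyfit(u, v, 1)[0]

info_1, info_2 = r_squared(x, f1), r_squared(x, f2)
slope_1, slope_2 = fitted_slope(x, f1), fitted_slope(x, f2)

# For x_3 the local slope changes sign across the range of the feature.
local_slopes_3 = np.gradient(f3, x)

print(f"x1: informativeness={info_1:.3f}, slope={slope_1:+.3f}")
print(f"x2: informativeness={info_2:.3f}, slope={slope_2:+.3f}")
print(f"x3: local slope ranges over [{local_slopes_3.min():+.3f}, {local_slopes_3.max():+.3f}]")
```

The sign-blind score cannot separate the first two features, while the slopes have opposite signs; for the third feature no single sign describes the relationship.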
A second approach to managing features is dimensionality reduction, which would involve
projecting the data into a lower-dimensional space. To give an example, dimensionality
reduction can be achieved by the ubiquitous technique of Principal Component Analysis
(Pearson, 1901). Dimensionality reduction on $x$ can certainly be used to discover that certain
features co-vary. This, of course, is not the same as discovering that the two are similarly
important to demand. More sophisticated uses of dimensionality reduction, that include the
values of $f(x)$ itself, can be used to discover relationships between O*NET features and
demand. However, dimensionality reduction, in considering combinations of features, will fail
to satisfy our first desideratum. That is, in mixing features together, dimensionality
reduction will fail to uncover clear and interpretable relationships to increasing demand.

We make two complementary proposals for assessing feature importance, and ultimately present
results from each.

5.4.1. Pearson correlation

We first consider a direct means of achieving our primary two desiderata, while ignoring the
secondary desiderata. This metric of the importance of a feature, which we abbreviate as
Pearson correlation, is the employment-weighted Pearson correlation coefficient¹⁹ between our
model's predictive mean for demand and the feature. More precisely, let the posterior mean for
the latent demand for the $i$th occupation be $m(x^{(i)})$. We can then define the Pearson
correlation value for the $n$th O*NET feature as

$$
\mathrm{PC}(n) := \frac{\sum_{i=1}^{I} w^{(i)} \left( m(x^{(i)}) - \mathbb{E}[m(x)] \right) \left( x_n^{(i)} - \mathbb{E}[x_n] \right)}{\sigma[m(x)] \, \sigma[x_n]}, \tag{5}
$$

5.4.2. Average derivative and feature complementarity

Our second proposal gives a means of satisfying our secondary desiderata, while perhaps
weakening the case for the first of the primary desiderata. The average derivative, as
described in Baehrens et al. (2010), is for the $n$th feature simply

$$
\mathrm{AG}(n) := \mathbb{E}\left[ \frac{\partial m(x)}{\partial x_n} \right], \tag{8}
$$

using the employment-weighted expectation defined in (6).

By way of interpretation, the derivative measures the expected increase in demand for a unit
increase in a particular feature (for instance, as a result of a policy intervention). By
averaging over all occupations, we get a sense of the aggregate increase in demand as a result
of this increase in a feature. The average derivative gives an interpretable notion of sign: it
can clearly distinguish positive from negative relationships with demand.

The first advantage of this metric relative to the marginal correlation is that it is sensitive
to non-linearities in the data, addressing the first of our secondary desiderata. While the
derivative gives a linear approximation to demand, it is only a locally linear approximation.
By considering the approximation at all points (occupations) in
skills-/knowledge-/abilities-space, we are able to better measure relationships that have
different slopes at different regions of the space.

This ability to manage non-linearity also enables the average derivative to capture the
importance of features whose significance is conditional on the values of other features. For
instance, Fine Arts is very important to Artists, but less important for occupations with
differing skills profiles. This is achieved through reporting the derivative averaged over
subsets of occupations: for instance, those that fall within a major occupational grouping.²⁰
For a non-linear function, the average derivative for a feature may be substantively different
over one region of space (occupational grouping) than for another. For each subset of
occupations, we will highlight those features with both large positive and large negative
average derivatives. Those with large positive derivatives we say are complementary to the
occupational group (in-

²⁰ To be precise, this entails only redefining the expectations above as employment-weighted
sums over that subset of occupations, rather than over all occupations.
wherewhere(for any (fordemand
anyPC(n)
function function g(x))
:=the
we
g(x)) feature.
we define
define the
i=1
More
for
w(i)
the
demand
precisely,
m(x
employment-weighted
employment-weighted and
−let Ethe
the
m(x)
feature.
posterior
expectation n −
expectation
More
mean , xfor
E
precisely, n (5) the
(5) let the
For posterior
instance, mean fine for the
arts is very important to artists, but less
latent demand PC(n)
latentfeature where
demand I for :=feature
(for
the m(x any
ithfor function
σ the
occupation xnwe
σoccupation
be define
(i) ).be
We the employment-weighted
(i)
). Wedefine can , then (5) define expectation
xcan then
m(x) ithg(x)) (i) m(x
and variance (i)
)demand m(x
and variance w(i) latent − E m(x) feature xforn x− the E ith n occupation be m(x ). We can then define
(i)
the Pearson the Pearson
correlation andi=1 variance
correlation
value for the value for σ the
O*NET m(x) nth σ O*NET
feature n as feature as important for occupations with differing skills profiles. This
PC(n) := nth (5)
Ithe Pearson correlation value for the, nth O*NET feature as
where (for any function g(x))
I
we
define σ m(x) the employment-weighted
σ xn expectation
where
and variance (for
where any
E (for
g(x) Eany function
:=Ifunction
g(x) :=w(i)g(x
I (i)
g(x))
w(i)g(x )
(i)we
we
))−E anddefine
define
(i)
) and the
g(x)) −
(i) the
:=EI xm(x)
I
employment-
employment-weighted
(i)
E xnx(i) (6)(i)
n(i))− E
(6) expectation
is achieved through
reporting the derivative averaged over
i=1 w(i)
) −and xn(6)
i=1 w(i) m(x m(x
E m(x) n − x
and variance i=1 := w(i)
w(i)g(x
m(xexpectation E nm(x) x(i)
(5)− E of
(for PC(n) :=2 PC(n) (5) , subsets
i=1
where
weighted any function
expectation 2 and
g(x)) we
define I
2σ m(x)the employment-weighted
variance: PC(n)
2 :=
i=1
2 i=1 ,
n
occupations:
, (5) for instance, those that fall within
and variance σ g(x) σ g(x) :=E Eg(x) :=
g(x) E := 2g(x)
− Ew(i)g(x − E
g(x)
g(x)
,σ σ)xnm(x)
(i) 2 ,
and
σ xn (7) (7) 2
m(x)(6)
σ I g(x) := E g(x)2 − Eσ g(x) ,σ xn a major occupational (7)
E g(x) i=1 := w(i)g(x(i) ) and (6) grouping.19 For a non-linear function,
I is theI istotal where
the total (fornumber
number where
any (for
of function any
of occupations,
occupations,
g(x))function
2we define
and and the
Ig(x))
iswe the define
is theemployment-weighted
employment-weighted
the
fraction fraction
2
of total of total
em- expectation
em- (6) expectation
E
σ is the
g(x) where
total
:=:= w(i)
E (for
number
g(x)
w(i) 2any
i=1
of
− (i)function
occupations,
E) and
g(x) g(x))
, we
and define the
is the employment-weighted
(7)
(6) fraction the of average
total em- expectation
derivative for a feature may be substantively
the and variance
and variance I g(x) w(i)g(x
2 2 w(i)
ploymentploymentwithin within iththeoccupation.
ith occupation. and := ith
variance E g(x) 2
− E g(x) , (7)
ployment σwithin
g(x) the occupation.
different over one region of space (occupational grouping)
i=1
Pearson Pearson correlation
correlation
I is the total number measures measuresthe linear
of
the
occupations,
2 linear
relationship
relationship
and
between
is the between
2 demand
fraction demand and and
ofI total em- (7)between demand and
Pearson correlation
:=
I 2 measures I
the linear relationship
neither of(7)
E w(i) E )w(i)g(x
a feature.
a feature. As such,
ployment AsIsuch, itis theit total
gives
within gives
the
theasign
σthe
Eg(x)
number sign
of clear
occupation.
g(x) of:=E
of clear g(x)
relationships,
occupations,
g(x) w(i)g(x :=− (i)
relationships, and
g(x)but
butand , (i)
satisfies is)clear
and
satisfies
neither
the fraction
(6)
total em- (6)
feature.
ith As such, it gives the w(i)
signE of
g(x) := relationships,
w(i)g(x (i)
but
) than
satisfies
and for another.
neither (6)
of ourofsecondary
our secondary desiderata.
ployment desiderata. One
within One
consequence
the consequence of
i=1 linearity
occupation. of linearity
i=1
is that is that
the the Pearson
Pearson
I is Pearson
correlation themay total correlation
place numberlow
of our measures
of
weight
ith
secondary
occupations,
on 2 the
features
linear
and
2 relationship
desiderata.
2:=w(i)
that
One
is the between
consequence
2 2fraction
demand
2 of
of
i=1 total and is that the Pearson
linearity
em- and2(7)
correlation a may
feature. placeAs low
Pearson
such,weight it σ on
correlation
gives
correlation
g(x) features
the measures
:=
sign
σ E
may that
ofplace
g(x) clear
g(x) are
the Eare
linked
linear linked
relationships,
low− Eg(x)
weight
g(x) to on−,to
high
relationship Ebut
high
demand
2
g(x)
features
demand
between
satisfies
, neither
that are2demand
(7)
linked to high demand
ployment within the ith occupation. σtheg(x) := Ethat g(x) − E g(x) For ,each subset of(7) occupations, we will highlight those
only only
for aof for
small
our a small
number number of
a feature.desiderata.
secondary Asonly of occupations.
occupations.
such,forit One agives Nonetheless,
small
Nonetheless,
thenumber
consequencesign of clear the features features
relationships,
of linearity
of occupations. that it
but
is thatNonetheless, it
satisfies the
the Pearson neitherfeatures that it
Pearson correlation measures the linear relationship between demand and
does does
highlighthighlight
is the
will will
totalis unquestionably
the
unquestionably numbertotal number
be ofsignbe
important: important:
occupations,
of occupations,
if a if
strong a
thatand strong
and positive positive
issatisfies
istothe
the
linear linear
fraction of total em-
isfeatures with bothem- large positive and large negative
is
correlation the total
of our may number
secondary
place of occupations,
lowdesiderata.
weight One andconsequence is the of fraction
linearity of total
is that if theem- Pearson
I isonof features are of linked high demand
I I w(i) w(i)
a feature. As such, itdoes
gives highlight
the will
the clearunquestionably
total number
relationships, bebut important:
occupations, and
neither a w(i)
strong positive
the fraction linear of total
interaction
interaction only exists,
ployment forexists,
aitwithin
correlation
small it should
ployment
should the
may
number within
certainly certainly
ithplace of the influence
occupation.
low influence
ithweight
occupations. occupation.
our on our resulting
resulting
features
Nonetheless, skills
that theareskills
policy.
linked
features policy. toour high
that itdemand
offraction
ourwe secondary of total interaction
employment
desiderata. One exists,
ployment
consequence it
within should
within thethecertainly influence
occupation.
ith occupation. resulting skills policy.
As such,As such,
we would
Pearson would
only consider
for consider
Pearson
a Pearson
correlation small Pearson
correlation
measures
numbercorrelation correlation
measures
the
of to
linear
occupations. givetoof
the
linearity
agive
linear
relationship ifaalinked
sufficient
Nonetheless,
is that
sufficient
relationship
butpositive
between not
the Pearson
but
demand
the not
between
features and demand average
itand but not
derivatives. Those with large positive derivatives
does
correlation highlight may will
place unquestionably
Aslow such,weight we would be
Pearson
on important:
featuresconsider
correlation
that Pearson
are strong
measures correlation
to the linear
high tolinear
demand give athat
relationshipsufficient between demand and
necessary
necessary condition
a
interaction condition
feature. does a
for
As for
feature.
importance.
such,
highlight
exists,number importance.
it Asgivessuch,
will
itnecessary
should the it gives
sign
unquestionably
certainly of the
clear sign
influence of clear
relationships,
be important:
our relationships,
resultingbut
if a satisfies
strong
skills but satisfies
neither
positive
policy. neither
we say
linear are complementary to the occupational
only
Pearson for a small of condition
a feature.
occupations. for
As importance.
such,
Nonetheless, it gives thethe sign
features of clear
that relationships,
it but satisfies neither
Asof our such, secondary ofcorrelation
interaction
we our secondary
would desiderata.
exists, measures
consideritdesiderata.One
should
Pearson consequence the consequence
One
certainly
correlation
linear
of linearity
influence
to
relationship
ofissufficient
aour linearity
that
resultingthe is that
Pearson notthe
skills Pearson
policy.
does highlight will unquestionably of ourbesecondary important: if give
desiderata. a strong One consequence
positive butlinear of linearity
group is(increasing
that the Pearson such a feature increases demand),
correlation
between
necessary As correlation
may
such,
demand
thecondition place
we may
wouldlow
and place
weight
consider low
on
acorrelation
feature. weight
features
Pearson As on thatfeatures
correlation
such, are that
linkedto
itselection
gives giveare
to the alinked
high to
demand
sufficient highbut demand
not
itfor importance.
19
Note thatPearson Pearson correlation coefficient is influence
oftenmay usedplacefor feature
low weight selection on (aspolicy.
a features a that are selectionlinked to(ashigh
interaction exists, should certainly our resulting isskills a demand
19
Note that the correlation coefficient is often used for feature (as
filter),filter), only for
an exception
an exception to a only
necessary
the small
to thefor a 19
number
condition
inappropriate
inappropriate
Noteoffor
small that
number the Pearson
occupations.
importance. of occupations.
information-theoretic
information-theoretic
correlation
Nonetheless,
methods methods
ofgive
coefficient
Nonetheless,
featureofthefeature features
selection
often theused
selection that for
features it featurethat
whereas it those with large negative derivatives we say
Assign such, we
ofdescribe
clear would consider
relationships,
filter), an only
Pearson
exception butfor a small
correlation
to thesatisfies
inappropriatenumber
to neither of
a occupations.
sufficient
ofifpositive
our but Nonetheless,
not the features that it
does
we generically
we generically highlight
describe doesabove.
above. highlight
will will
unquestionably unquestionablybe important: if ainformation-theoretic
be important: strong a stronglinear methods
positiveoflinear feature selection
necessary
19
Note that condition
the Pearson for importance.
we generically
correlation does describehighlight
coefficient above. will used
is often unquestionably
for feature be important:
selection (as skills
a if a strong
are positive linear
anti-complementary (increasing such a feature
interaction
secondary interaction
exists, it exists,
should it should
certainly certainly
influence influence
our resulting our resulting
skills policy. policy.
filter), 19
an exception Notecriteria.
that
to the One interaction
theinappropriate
Pearson consequence
correlation exists, of
coefficient
information-theoretic itis linearity
should
often
methods used offorisfeature
certainly
feature influence
selection (as
selection oura resulting skills policy.
Asgenerically
such, weAsdescribe
such,
would we would
consider to thePearson considercorrelation Pearson correlation
to give a sufficient to give aofbut sufficient
notselection but not
we
that Notethe
filter), an
Pearson
exception above.
correlation
inappropriate
Ascoefficient
such, may weinformation-theoretic
would
isplace
consider
lowforweight
methods
Pearson feature
oncorrelation todecreases
give a sufficient demand).but not It is also of interest to speak of
necessary
19
that
we necessary
condition
the Pearson
generically condition
for importance.
correlation
describe above. for importance. often used feature selection (as a
filter), an exception to the inappropriate necessary condition formethods
information-theoretic importance. of feature selection complementarities between features, rather than simply
wefeatures
generically that describe are above. linked 40 to 40high demand only for a small
19
Note that 19 Note
the that the
Pearson Pearson coefficient
correlation correlationiscoefficient often usedisfor 40
oftenfeature used selection
for feature (asselection
a of(as the a complementarity of a feature to an occupational
number of occupations.
filter),
filter), an exception an
toexception to theNonetheless,
19
the inappropriateinappropriate the correlation
featuresmethods
Note that theinformation-theoretic
Pearson thatselection
coefficient
information-theoretic methods of feature ofisfeature
often selection
used for feature selection (as a
we we describe
generically generically describefilter),
above. above.an 40exceptionbe
to the inappropriate information-theoretic group.
methodsTo do so,
of feature we must find some way of singling out the
selection
it does highlight will unquestionably we generically describe
40
important:
above. if a
features associated with the occupational group. Those
strong positive linear interaction 40 exists, it should certainly
features that are large on average for an occupational
influence our resulting skills policy. As such, we would
group are, speaking roughly, those that are most
consider Pearson correlation40 to give a40sufficient but not 40
exceptionally significant, and hence most characteristic,
necessary condition for importance.
38
by the5.5. average
5.5.New New derivative:
occupations
occupations the recommendations
other targeted
facet of our ofmodelling,
this metric
occupational as cannot
in producing
groups. be
However, occupation-
there is a
derivative varying over x, thereby increasing the standard deviation. Note
trusted for these features. The excluded predictionsfeatures
the of are
demand. usedofasexclusion.
normal in Any everynon-linearity in d
that features excluded underWe this
We scheme
define
define a a are
potential excluded
potential newnew only
occupation forofconsideration
occupation as
threshold
asa combination
a combination of skills,
of skills, knowledge
knowledge
other facet of our modelling, as in producing derivativeoccupation-level
varying over and x, aggregate
thereby increasing the st
by the average derivative: and the recommendations
andabilities thatthatis islikely of this totometric
seeseehigh cannotfuture bedemand,
demand,but butis isnot notassoci-
predictions ofabilities
demand. likely
5.5.
high
that features
New
future
occupations excluded under this scheme are exclud
associ-
trusted for these features. The ated atedexcluded
withwith an anfeatures
existing
existing are used
occupation.
occupation. as normal
To To in every
forecast
forecast wherewhere new new occupations
occupations mightmight
T HE FUT URE OF SKI L L S: EM PL OYM ENT I N 2 0 3 0 by the average derivative: the recommendations o
other facet of our modelling,emerge, as in producing
emerge, wewesimply simply occupation-level
optimise
optimise the
We the and
posterior
posterior
define aaggregatemean
meanfornew
potential forthe the latent
latentdemand
occupation as a vari-
demand vari-
combination o
predictions of demand. 5.5. New occupations trusted for these features. The excluded features are
able, able, m(x)m(x) asas a function
a function of of
x.and x.MoreMore precisely,
abilities precisely,thatweiswe start
start
likely bybytorandomly
see highselecting
randomly selecting
future demand,
other facet of our modelling, as in producing occupa
We define 5050currentacurrent
potential occupations
occupations
new occupation asasour ourstarting
ated starting
aswith anpoints
a combination pointsof ofhigh
existing ofhigh demand
skills,
occupation. demand occupations.
knowledge
To occupations.
forecast where new
predictions of demand.
5.5. New occupations We
and abilities Werun run local
that local isoptimisers
optimisers
likely to (limited-memory
see(limited-memory
high future
emerge, we simply BFGS,
BFGS,
demand, observing
observing
but
optimise isthe box-constraints,
not box-constraints,
associ- mean for the la
posterior
creasing such a feature increases demand), whereas those with large negative ated with asas per
anperByrd Byrd
existing etet al. (1995))
al. (1995))
occupation. initialised
initialised
To
able, forecast
m(x) atat each
where
as each of of
a function the
new the 50 50occupations.
occupations.
occupations
of x. More might This
precisely,This will
willstart by
we
We willdefine de- a potentialemerge, new occupation as aoptimisers
combination of skills, knowledge
derivatives weofsay theareoccupational
anti-complementary grouping. (increasing
creasing As such,
creasing
such awe
suchsuch
feature
a feature
a feature define
increases
increases demand),demand), whereas50
returnwelocal
return
whereas 5050
simply
those those optimisers
local local
withoptimise
withoptimisers
large large theof of of
posterior
50
negative
negative m(x): :5.5.
points
current
m(x): points
mean New
points † occupations
xforxin
occupations inskills,
the the as skills,
† skills, knowledge and abilities
in latent knowledge
demand
our abilities
starting and
vari- abilities
points of high de
creases demand). It is also of interest to speak of complementaritiesand betweenabilities that is likely to
space see
asthat
high are future
associated demand, with but
high is not
demand. associ-
Many by of these optimisers
selectingwill
creasing suchcreasing
a featuresuch increases demand), whereas
features those with withlarge large negative
positive derivatives
average derivatives we we say
say
derivatives are are beexistingable,
toanti-complementary
anti-complementary space
(increasing
m(x)
and
athat
(increasing function
knowledge such are suchassociated
a offeature
features
More
ax.feature with
de- We high
precisely,
de-run
spaceWe demand.
localwe optimisers
define
that
start Many
aispotential of
randomly
associated
these
new optimisers
(limited-memory
occupation
with high
BFGS,asbeabecombina
will observi
creasing such aa feature
feature
features, increases
increases
rather demand),
demand),
than simply whereas
whereas
of thethose those with large
with
complementarity large negative
negative
of a feature toatedan occu-with an occupation. To
(near-)identical, forecast and where others new willoccupations
be might
(near-)identical to existing
derivatives we say are anti-complementary
derivatives we saysay are anti-complementary (increasing
anti-complementary creasing such sucha feature
(increasing a feature
such de-increases
a creases
feature
creases demand).demand).
demand),
de- It isItalso
whereas
emerge,
is alsoof
those
we
of interest
interest
with
simply to
large
50 tocurrent
speak
optimise
speak
negative
(near-)identical,
ofthe occupations
of complementarities
complementarities
posterior mean
asand our others
starting
between
for
between
the
as will
per
latent
be
points
Byrd
anddemand
(near-)identical
of
et high
abilities al. (1995))
vari-
demand
that isto existing
initialised
likely tooccupations.
occupations. occupations.
at
see eachhighof future
the 50 dem
occ
derivatives we are
pational group. To
complementarydo so, we (increasing
must find
to those such
some a
way feature
of
characteristic singling de- out the features
(large-valued) We rundemand.
creases demand). It is also of interest to of speak of complementarities
toderivatives we between
say are features, features, rather ratherthan than
simply
(increasing
simply of the of the Beginning
complementarity
complementarity Beginning
local a of with
ofoptimisers Many
with a single
a feature
feature to ofanto these
such
a(limited-memory
single such
an
occu- occu-optimisers
optimiser,
optimiser,
return 50
atedBFGS,local
with
†,0 †,0
,will
x x ,optimisers
we
an we add
observing be
add
existingeach(near-)identical,
each successive
successive
box-constraints,
ofoccupation.
m(x): pointsoptimiser
optimiser
To in skills,wher
x forecast
†
know
creases
creases demand).
demand). It isis also
It
associated also ofwithinterest
interestthe to speak of
speak
occupational of complementarities
complementarities
group. Those anti-complementary
between
between
features that areable, large m(x) on as such a function a feature
as per
of x. unless
Byrd
de-
More
unless etit
precisely,
itissingling
iscloser,
we start
ininitialised
inthe
by
the2-norm
randomly
sense, selecting
than asimply
preset threshold (will
== 0.1) toto
pational group. Toso, do we so, must
we must findfind some way ofal. closer,
(1995)) 2-norm at each
space sense,
that ofare the than50
associated aoccupations.
preset with threshold
highThis (
demand. 0.1)
Many of thes
features, rather than simply
features,
features, rather than
rather
of
thanthesimply
average
complementarity
simplyfor an offeatures
of
of20a. feature
creases
the complementarity
the complementarity
occupational demand).
group
toofan
are,
occu-
It
a feature
of aspeaking is also
feature pational
toofan
to
roughly,an interest group.
occu-
occu- those to thatTo50
speak do
are ofmost
complementarities
current occupations some as
return
way
and
between
our
either:
of singling
others
starting
50either:
local an an
willout
out
points
existing
optimisers existing
thebem(x):
of
the
high
occupation,
ofoccupation,
features
features
(near-)identical
demand or;
points
emerge,
occupations.
axpreviously
we
toand each
included
optimise
other
others optimiser.
tothe posterior
existing
This pro-
mean for
pational group. To dogroup.
so, we must associated withwith the occupational group. Those features that are large onor; a† in
(near-)identical, previously
skills,as aincluded
knowledge optimiser.
and
will x.abilities
of be This
(near-)identical pro- we to sta
ex
pational
pational group. To To dofind
do so, we
so,
exceptionally
some
we must
must
wayfind of singling
find some
creasing
significant, features,
some such
and way
way
outofthe
a rather
hence singling
of singling
feature most
features
than simply
out
out
increases
associated
the
the
characteristic, of
demand), theofcomplementarity
features
features whereas
the
theWe occupational
those
occupational run with of alarge
local group.
feature
optimisers to Those
negative an occu- features
(limited-memory
cedure will
that
return BFGS,are
awith
list
large
observing
ofmost
on
hotspots
able, m(x)
box-constraints, function More precisely,
associated with the occupational group. Thosegroup. features thatfeaturesare large doon
average foroccupational
an occupational group space
are, occupations.
that
speaking cedure
are
roughly,will
associated return
thoseBeginning
athatlist
highof
are with
hotspots
demand.
Beginning
most 50ofacurrent
demand
ofsingle
demand
Many
with (local
such(local
aofoccupations
these
single optimisers)
optimiser,
optimisers)
optimisers
such as ourwill
optimiser, x; ∀i},
be ofadd
;, ∀i}, of each
wepoints of his
†,i †,0
average foron an group are, speaking roughly, those that are starting
{x †,i
{x
associated
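The two importance metrics discussed in this section can be illustrated numerically. The following is a minimal sketch under stated assumptions, not the authors' implementation: `toy_mean` is a hypothetical stand-in for the posterior mean $m(x)$, the four one-feature "occupations" and their equal employment weights are invented for illustration, derivatives are estimated by central finite differences, and the weights are renormalised within each subset (one possible reading of "employment-weighted sums over that subset").

```python
import math


def weighted_mean(values, weights):
    """Employment-weighted expectation, following equation (6)."""
    return sum(w * v for w, v in zip(weights, values))


def weighted_std(values, weights):
    """Square root of the employment-weighted variance, following equation (7)."""
    mean = weighted_mean(values, weights)
    mean_sq = weighted_mean([v * v for v in values], weights)
    return math.sqrt(max(mean_sq - mean * mean, 0.0))


def pearson_importance(demand, feature, weights):
    """Employment-weighted Pearson correlation PC(n), following equation (5)."""
    m_bar = weighted_mean(demand, weights)
    x_bar = weighted_mean(feature, weights)
    cov = sum(w * (m - m_bar) * (x - x_bar)
              for w, m, x in zip(weights, demand, feature))
    return cov / (weighted_std(demand, weights) * weighted_std(feature, weights))


def average_derivative(mean_fn, points, weights, n, step=1e-5):
    """Weighted average of d(mean_fn)/d(x_n) over a subset of occupations.

    Assumption: weights are renormalised to sum to one within the subset,
    and the derivative is estimated by central finite differences.
    """
    total = sum(weights)
    acc = 0.0
    for x, w in zip(points, weights):
        up = list(x)
        up[n] += step
        down = list(x)
        down[n] -= step
        acc += (w / total) * (mean_fn(up) - mean_fn(down)) / (2 * step)
    return acc


def toy_mean(x):
    """Hypothetical non-linear demand: lowest in the middle of the feature range."""
    return (x[0] - 0.5) ** 2


# Four invented occupations described by a single feature, equal employment.
points = [[0.1], [0.3], [0.7], [0.9]]
weights = [0.25, 0.25, 0.25, 0.25]
demand = [toy_mean(p) for p in points]
feature = [p[0] for p in points]

pc = pearson_importance(demand, feature, weights)                    # close to zero
ad_low = average_derivative(toy_mean, points[:2], weights[:2], n=0)  # negative
ad_high = average_derivative(toy_mean, points[2:], weights[2:], n=0)  # positive
print(pc, ad_low, ad_high)
```

In this toy case the linear metric sees no relationship at all, while the group-wise average derivative reports the feature as anti-complementary to the first group and complementary to the second, which is the kind of region-dependent slope the average derivative is intended to capture.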
The key drawback of the average derivative approach is that the averaging itself may obstruct accurate interpretation. As an example, if $f(x) = 10^{10}\cos(x_n)$, there are many points for which the average derivative would be very large: for instance, all those points immediately to the left of a peak, $\{x_n = -\epsilon + 2\pi n;\ \forall \text{ small positive } \epsilon \text{ and integer } n\}$. This $x_n$ is a feature that is unlikely to be useful in policy: it has an equal number of points (occupations) with very large negative derivative. Increasing $x_n$ would be very harmful to all such occupations. Nonetheless, if the chosen samples (occupations) contain even one more point of large positive derivative […]

[…] targeted occupational groups. However, there is a […] the threshold of exclusion. Any non-linearity in d[…] derivative varying over $x$, thereby increasing the standard deviation. Note that features excluded under this scheme are excluded only for consideration by the average derivative: the recommendations of this metric cannot be trusted for these features. The excluded features are used as normal in every other facet of our modelling, as in producing occupation-level and aggregate predictions of demand.

5.5. New occupations

We define a potential new occupation as a combination of skills, knowledge and abilities that is likely to see high future demand, but is not associated with an existing occupation. To forecast where new occupations might emerge, we simply optimise the posterior mean for the latent demand variable, $m(x)$, as a function of $x$. More precisely, we start by randomly selecting 50 current occupations as our starting points of high demand occupations. We run local optimisers (limited-memory BFGS, observing box-constraints, as per Byrd et al. (1995)) initialised at each of the 50 occupations. This will return 50 local optimisers of $m(x)$: points $x^{\dagger}$ in skills, knowledge and abilities space that are associated with high demand. Many of these optimisers will be (near-)identical, and others will be (near-)identical to existing occupations. Beginning with a single such optimiser, $x^{\dagger,0}$, we add each successive optimiser unless it is closer, in the 2-norm sense, than a preset threshold (= 0.1) to either: an existing occupation, or; a previously included optimiser. This procedure will return a list of hotspots of demand (local optimisers) $\{x^{\dagger,i};\ \forall i\}$, of length that may vary for different mean functions. Each hotspot can be interpreted through returning its vector of skills, knowledge and abilities values $x^{\dagger,i}$, as well as the list of occupations which are closest to it.
than example,
negative
large
n}. This
values if fx(x)
negative
derivative.
n , asa well
feature Increasing
as the
1010 cos(x that list
n ), un-x would
isinterpreted
of noccupations be verywhichharmful
through arereturning
closest to it.its vector of skills, abilities
with very large withnegative derivative.
negative Increasing in there
policy: would
xnIncreasing
likelyitmany
has bebe
toaveragexan
very equal
harmful number
to all to such allofsuch points
occupations. (occupations)
occupations. Nonetheless,
Nonetheless, if the if the
chosen chosen samples samples (occupations)
(occupations) 43 be interpreted through returning its vector of skills
with very large
very large negative
derivative, derivative.
derivative.
xn may haveIncreasing
are high nnuseful
xpoints would
would forinbe
derivative.bepolicy:
which very
very the
This it
harmfulhas
harmful
average an
drawback equal
derivative
leadsnumber usof points
towould be very (occupations)
large:
to all such occupations.
to all
all such Nonetheless,
such occupations.
occupations. if the
Nonetheless,
the with
chosen
for very
samples
with
if the
the very
chosen (occupations)
large negative
samples contain contain
derivative.
(occupations) even even
one one
more
Increasing
morepoint point
leftxnofwould
of of large
large positive
positive
be very{xn harmful
and
derivative knowledge
derivative than than
large features
large negative
negative values
values x†,i, as as well wellasas thethe list list of
of occupations which a
to
contain even contain
one more point
regardingNonetheless,
of large positive
average
derivative
if
instance, large
derivative
than
allnegative
chosen as
large
asamples
those points
necessary
negative
derivative.
(occupations)
immediately
but
derivative,
Increasing
not sufficient
derivative, x
to the
xn may
may
condition
have havehigh
would
a peak,
high
average average
=
derivative.
derivative. This This
drawback drawback leads leads
to us to us
contain even even one onefor more
more point of
point
importance; of large
large
− +positive
rendering to all
positive
2πn; such
derivative
derivative
small
it∀ complementary occupations.
positive thanto
than large
large
and theNonetheless,
negative
negative
integer
marginal if thexnchosen
This
n}. correlation.
n
is a feature samples is un- occupations
that(occupations) which are closest to it.
be very regarding the average derivative as aas a necessary but but not43 not sufficient condition
toharmful to all such occupations. Nonetheless,
derivative, xnderivative,
may have high average derivative. This drawback
contain even leads to uspoint regarding thepositive
average derivative necessary sufficient condition
derivative, xxnn may may have
have high
high
To slightly average
averagelikely
ameliorate derivative.
derivative.
be theuseful This
This
problems inone more
drawback
drawback
policy:
of averaging, leads
leads
it has of to
an large
to us
us
equal
we number
calculate derivative
theof points than
em- (occupations) large negative
regarding theregarding
average derivative as derivative
a necessary for importance; rendering it complementary to the marginal correlation.
regarding the the average
average derivative if asas but
the
with aachosen
not sufficient
derivative,
necessary
necessary
very large but
but
samples xnnot
negative
condition
may
not have
sufficient
sufficient
for
high
(occupations)
derivative.
importance;
average
condition
condition Increasing contain
rendering
derivative.xn would even
it complementary
This drawback
one
be very harmful leadsto5.6. tothe us marginal
TREND
correlation.
EXTRAPOLATION 43
for importance; rendering it complementary to the marginal correlation. To slightly
To slightly ameliorate
ameliorate the problems of averaging, we calculate the theem- em-
for importance;
for importance; rendering
rendering to all regarding
itit complementary
complementary such to to the
the the
occupations. average
marginal
marginal derivative
correlation.
correlation.
Nonetheless, as
if thea necessary
chosen samples butthe notproblems
sufficientofcondition
(occupations)
averaging, we calculate
To slightly ameliorate the problems
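The occlusion effect described above can be illustrated with a small numerical sketch. The sample points, the value of epsilon and the 4-versus-3 split below are hypothetical choices made only for illustration; the function is the one-dimensional case of the cosine example in the text.

```python
import math

SCALE = 1e10  # amplitude of the example function f(x) = 1e10 * cos(x)


def derivative(x):
    """Analytic derivative of f(x) = 1e10 * cos(x)."""
    return -SCALE * math.sin(x)


eps = 0.01  # hypothetical small offset from each peak

# Four hypothetical occupations sit just left of a peak (large positive slope),
# three sit just right of a peak (large negative slope).
left_of_peak = [-eps + 2 * math.pi * n for n in range(1, 5)]
right_of_peak = [eps + 2 * math.pi * n for n in range(1, 4)]

slopes = [derivative(x) for x in left_of_peak + right_of_peak]
average_derivative = sum(slopes) / len(slopes)
n_harmed = sum(1 for s in slopes if s < 0)

# The average is large and positive even though three of the seven
# occupations would be badly harmed by increasing the feature.
print(average_derivative, n_harmed)
```

A single surplus point on the rising side is enough to make the average look strongly positive, which is why the average derivative is treated only as a necessary condition.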
To slightly ameliorate the problems of averaging, we calculate the empirical distribution of the derivative for each feature, particularly its mean and standard deviation. We then exclude features whose derivative has a standard deviation that is in the top 97th percentile of the distribution. The rationale is that the influence of these features is very 'noisy' (e.g. within a particular occupational grouping, demand might strongly increase for some occupations and strongly decrease for other occupations) and hence unlikely to be a reliable basis upon which to design policy. Here we have implicitly taken a conservative view of the potential of skills policy to precisely affect targeted occupational groups. However, there is a tradeoff in the selection of the threshold of exclusion. Any non-linearity in demand will result in the derivative varying over $x$, thereby increasing the standard deviation. Note that features excluded under this scheme are excluded only for consideration by the average derivative: the recommendations of this metric cannot be trusted for these features. The excluded features are used as normal in every other facet of our modelling, as in producing occupation-level and aggregate predictions of demand.
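The exclusion rule for noisy features can be sketched as follows. This is an illustration rather than the authors' implementation: it assumes that "in the top 97th percentile" means a standard deviation strictly above the 97th percentile of all per-feature standard deviations, it uses a nearest-rank percentile, and the feature names and derivative values are hypothetical.

```python
import math
import statistics


def nearest_rank_percentile(values, q):
    """Nearest-rank percentile (q between 0 and 100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def filter_noisy_features(derivatives, q=97):
    """Split features into kept and excluded by derivative spread.

    derivatives maps a feature name to its derivative values,
    one value per occupation in the grouping.
    """
    summary = {name: (statistics.mean(vals), statistics.pstdev(vals))
               for name, vals in derivatives.items()}
    cutoff = nearest_rank_percentile([sd for _, sd in summary.values()], q)
    # Assumption: a feature is excluded when its spread exceeds the cutoff.
    kept = {name: mean for name, (mean, sd) in summary.items() if sd <= cutoff}
    excluded = sorted(name for name, (_, sd) in summary.items() if sd > cutoff)
    return kept, excluded


# Hypothetical data: 100 features, each evaluated at two occupations,
# with the spread of the derivative growing with the feature index.
derivatives = {f"feature_{i:03d}": [1.0 - i / 100, 1.0 + i / 100]
               for i in range(1, 101)}
kept, excluded = filter_noisy_features(derivatives)
print(excluded)
```

Excluded features drop out of the average derivative ranking only; as the text notes, they remain in use everywhere else in the modelling.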
21 Our definition of complementarity is loosely related to that used by economists: two features are complementary if the marginal value product of one is increasing in the level of the second. This definition fails to meet our needs. The first problem is that the definition makes no accommodation for location in feature space. The economics definition is a statement about the second-order derivative of the function with respect to the two features being positive; for an arbitrary function, as may be learned by our flexible non-parametric model, the second-order derivatives may be very different in different regions of space. The second, related, problem with the definition is that it may lead to highlighting feature combinations which, even if the second-order derivatives are positive and constant across space, are actively harmful. As a simple example, for a bivariate quadratic function with positive-definite Hessian (a convex bowl), an occupation on the wrong side of the critical point (the minimiser, or the location of the bottom of the bowl) would see its demand decreased with increases in either or both of the features. Our means of assessing complementarity, however, will more correctly identify the differing importances of feature combinations at any point in feature space.
5.5. New occupations

We define a potential new occupation as a combination of skills, knowledge and abilities that is likely to see high future demand, but is not associated with an existing occupation. To forecast where new occupations might emerge, we optimise the posterior mean for the latent demand variable, $m(x)$, as a function of $x$. More precisely, we start by randomly selecting 50 current occupations as our starting points of high demand occupations. We run local optimisers (limited-memory BFGS, observing box-constraints, as per Byrd et al. (1995)) initialised at each of the 50 occupations. This will return 50 local optimisers of $m(x)$: points $x_{\dagger}$ in skills, knowledge and abilities space that are associated with high demand. Many of these optimisers will be (near-)identical, and others will be (near-)identical to existing occupations. Beginning with a single such optimiser, $x_{\dagger,0}$, we add each successive optimiser unless it is closer, in the 2-norm sense, than a preset threshold ($\epsilon = 0.1$) to either an existing occupation or a previously included optimiser. This procedure will return a list of hotspots of demand (local optimisers) $\{x_{\dagger,i};\ \forall i\}$, of length that may be different for different mean functions. Each hotspot can be interpreted through returning its vector of skills, knowledge and abilities feature values $x_{\dagger,i}$, as well as the list of occupations which are closest to it.
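The filtering step that turns local optimisers into a list of hotspots can be sketched as below. The sketch assumes the local optimisation (limited-memory BFGS with box constraints) has already been run elsewhere and only covers the distance-based selection. The function name, the two-dimensional points and the single existing occupation are hypothetical. Following the wording of the text, the first optimiser is taken as the starting hotspot; whether that seed was itself screened against existing occupations is not stated.

```python
import math


def select_hotspots(optimisers, occupations, threshold=0.1):
    """Keep optimisers that are not within `threshold` (2-norm) of any
    existing occupation or of any previously kept optimiser."""
    if not optimisers:
        return []
    hotspots = [optimisers[0]]  # the starting optimiser
    for candidate in optimisers[1:]:
        reference_points = list(occupations) + hotspots
        if all(math.dist(candidate, p) >= threshold for p in reference_points):
            hotspots.append(candidate)
    return hotspots


# Hypothetical feature vectors in a two-dimensional feature space.
existing_occupations = [(0.9, 0.15)]
local_optimisers = [
    (0.5, 0.5),   # seed hotspot
    (0.52, 0.5),  # near-duplicate of the seed, dropped
    (0.9, 0.1),   # near an existing occupation, dropped
    (0.2, 0.2),   # distinct, kept
]
hotspots = select_hotspots(local_optimisers, existing_occupations)
print(hotspots)
```

Because the result depends on which optimisers the mean function produces, the number of hotspots can differ between mean functions, as the text notes.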
5.6. Trend extrapolation

As an alternative to the labels from our foresight workshop participants, we use non-parametric extrapolation of historical employment trends to give a more data-driven means of predicting demand in the year 2030. Specifically, we use a Gaussian process to regress UK and US employment (for the years we have) as a function of years, for both absolute employment values and share of employment values, projecting forward to 2030.

Each occupation is modelled separately using a Gaussian Process (GP). A Matérn covariance with parameter $\nu = 3/2$ is used for each GP; this ensures sufficiently smooth extrapolation without losing the structure in the trends. To control the characteristic scale on which the Gaussian Process varies (a length scale), we assign a prior that centres the value of the length scale such that data points from the last year in our employment series will influence employment numbers in 2030. The GP is modelled over the log of the employment numbers.

Absolute employment numbers are modelled directly from historical data using a Gaussian Process. The share values are slightly more involved. First, we model the total workforce, $T$, as a function of time and use our model to extrapolate that forward to 2030. We then divide the extrapolated absolute values by the extrapolated total workforce values to give the share value predictions.
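A minimal sketch of Gaussian process extrapolation with a Matérn ($\nu = 3/2$) covariance on log employment is given below. It is an illustration under several assumptions and not the authors' code: the hyperparameters are fixed by hand rather than given a prior, a constant mean equal to the sample mean is subtracted, the linear algebra is written out in plain Python, and the employment series is synthetic.

```python
import math


def matern32(t1, t2, length_scale, signal_sd):
    """Matern covariance with nu = 3/2 between two time points."""
    r = math.sqrt(3.0) * abs(t1 - t2) / length_scale
    return signal_sd ** 2 * (1.0 + r) * math.exp(-r)


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def solve_lower(low, b):
    """Solve low @ y = b by forward substitution."""
    y = []
    for i, row in enumerate(low):
        y.append((b[i] - sum(row[k] * y[k] for k in range(i))) / row[i])
    return y


def solve_upper(low, y):
    """Solve low.T @ x = y by back substitution."""
    n = len(low)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(low[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - s) / low[i][i]
    return x


def gp_extrapolate(years, employment, target_year,
                   length_scale=15.0, signal_sd=0.5, noise_sd=0.05):
    """Posterior mean and standard deviation of log employment at target_year.

    The hyperparameter defaults are hypothetical illustration values.
    """
    logs = [math.log(e) for e in employment]
    centre = sum(logs) / len(logs)  # constant mean (an assumption)
    resid = [v - centre for v in logs]
    n = len(years)
    k = [[matern32(years[i], years[j], length_scale, signal_sd)
          + (noise_sd ** 2 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    low = cholesky(k)
    alpha = solve_upper(low, solve_lower(low, resid))
    k_star = [matern32(target_year, t, length_scale, signal_sd) for t in years]
    mean = centre + sum(ks * a for ks, a in zip(k_star, alpha))
    v = solve_lower(low, k_star)
    prior_var = matern32(target_year, target_year, length_scale, signal_sd)
    var = prior_var - sum(x * x for x in v)
    return mean, math.sqrt(max(var, 0.0))


# Synthetic series: employment growing 2% a year from 2005 to 2015.
years = list(range(2005, 2016))
employment = [100000 * 1.02 ** i for i in range(len(years))]
mean_2030, sd_2030 = gp_extrapolate(years, employment, 2030)
mean_2015, sd_2015 = gp_extrapolate(years, employment, 2015)
print(math.exp(mean_2030), sd_2030)
```

The length scale of 15 years is chosen here so that the 2015 observations still carry weight at 2030, which mirrors the intent of the length-scale prior described in the text. Uncertainty grows with distance from the last observed year.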
We use these trend extrapolations to compute the probability of the trend being higher demand, the same demand or lower demand between 2015 and 2030. The probability of being high is taken as the total positive difference above 2 standard deviations, low is taken as the total negative difference below 2 standard deviations and the same otherwise.

To generate a data set equivalent to our workshop response we sample from these extrapolated probabilities. A three-parameter multinomial distribution is sampled using the probabilities of higher, same and lower. For each occupation we sample 12 values and use these as the participant responses. A noise value of 0 is assigned for every data point.

These values are then fed into the Ordinal Regression GP model and the corresponding probabilities are computed as described in Section 5.2.
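The sampling step that produces synthetic participant responses can be sketched as follows. The probabilities in the example are hypothetical, and the function name, label strings and fixed seed are illustrative choices; only the three categories and the 12 draws per occupation come from the text.

```python
import random

LABELS = ["higher", "same", "lower"]


def sample_responses(p_higher, p_same, p_lower, n_responses=12, seed=0):
    """Draw synthetic participant labels for one occupation from a
    three-category multinomial distribution."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return rng.choices(LABELS, weights=[p_higher, p_same, p_lower],
                       k=n_responses)


# Hypothetical extrapolated probabilities for a single occupation.
responses = sample_responses(0.6, 0.3, 0.1)
counts = {label: responses.count(label) for label in LABELS}
print(counts)
```

Each occupation then has 12 labels in the same form as the workshop data, so they can be passed to the same ordinal regression model.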
6. RESULTS

We present below our main results, and their interpretation, for both the US and the UK economies. In presenting results for both countries, we caution
emerge,
emerge, we simply
posterior
simply mean
optimise
optimise theforposterior
the posterior
themean latent
Wemean present
for for
the below To
thepro-
latent ourgenerate
latent
demandmain
demand
vari- a data
results,
vari- setinterpretation,
and their equivalent tofor our
both workshop
the US
either: an existing occupation, or; a previously included optimiser. This
demandcedurevariable,able, able,
will return as
as
a listm(x)
m(x) asfunction
a function
afunction
of ahotspots of
x.of
ofofdemand More
x.. (local
More More
precisely, and
precisely,
precisely, the
we start
optimisers) we{xUK
start
by economies.
by of
randomly
response
†,i randomly
; ∀i}, In selecting
presenting
selectingwe sampledresults from
for both countries,
these we caution
extrapolated
length that may 50 different
50 current
be current occupations
occupations as ouras
for different our functions.
starting
mean starting
pointspoints
of high
Each of high
demand
hotspot demand occupations.
canoccupations.
we start by randomly selecting
Welocal
run local 50 current
optimisers occupations probabilities. A three-parameter multinomial distribution is
be interpreted Wethrough
run optimisers
returning vector(limited-memory
its (limited-memory BFGS,
of skills, knowledge BFGS, andobserving
observing box-constraints,
box-constraints,
abilities 44
as our starting
values x†,i , as perasByrd
aspoints
well asperofByrd
the ethigh et
al.of
list al. (1995))
demand
(1995)) initialised
occupations.
initialised
occupations which at are
eachatclosest
each
of We
the of50the
to 50 occupations.
occupations.
it. This This
sampled using
will will the probabilities of higher, same and lower.
returnreturn 50 local
50 local optimisers
optimisers of m(x):
of m(x): pointspoints
x† in x in skills,
skills, knowledge
knowledge and abilities
and abilities
†
run local optimisation
spacespace
algorithms
thatassociated
that are
(limited-memory
are associated with demand.
with high high demand. BFGS,
ManyMany of theseof theseFor each
optimisers
optimisers occupation
will be
will be we sample 12 values and use these
observing box-constraints, (near-)identical,
(near-)identical, as
andper andByrd
others others
will etwill
be be1995)
al., (near-)identical
(near-)identical to existing
to existing as theoccupations.
occupations.
participant responses. A noise value of 0 is assigned
43
Beginning
Beginning with awithsinglea single such optimiser,
such optimiser, x†,0 , we
x add
†,0
, weeach
add successive
each successive optimiser
optimiser
initialised at each unlessofitthe
unless it50 occupations.
is closer,
is closer, in thein2-norm
the 2-normThissense,
sense, will return
than than
a preset a preset for( every
threshold
threshold data
0.1) topoint.
( = to
= 0.1)
either:
either: an existing
an existing occupation,
occupation, or; a previously
or; a previously included
included optimiser.
optimiser. This pro-
This pro-
cedurecedure will return
will return a list aoflist of hotspots
hotspots of demand
of demand (local(local These
optimisers)
optimisers) values
{x†,i ; {x
†,i
∀i}, ;of are then fed into the ordinal regression gp
∀i}, of
lengthlength
that maythat be may be different
different for different
for different meanmean functions.
functions. Each Each
hotspot
model hotspot
andcan the can
corresponding probabilities are computed
be interpreted
be interpreted throughthrough returning
returning its vector
its vector of skills,
of skills, knowledge
knowledge and abilities
and abilities
valuesvalues
x†,i , as , as as
x†,iwell well
theaslist
theoflist of occupations
occupations whichwhich asit.described
are closest
are closest to to it. in Section 5.2.
39
43 43
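The sampling of pseudo-responses from the extrapolated probabilities can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the fixed seed and the example probabilities are hypothetical, and only the Python standard library is assumed.

```python
import random

LABELS = ("higher", "same", "lower")


def sample_pseudo_responses(probs, n_samples=12, seed=0):
    """Draw n_samples ternary labels from (p_higher, p_same, p_lower).

    Each label is paired with a noise value of 0, mirroring the text.
    The seed is an assumption made only to keep the sketch reproducible.
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    rng = random.Random(seed)
    labels = rng.choices(LABELS, weights=probs, k=n_samples)
    return [(label, 0.0) for label in labels]


# Hypothetical extrapolated probabilities for a single occupation.
responses = sample_pseudo_responses((0.6, 0.3, 0.1))
```

Each returned pair stands in for one participant response; in the procedure described above, these would then be passed to the ordinal regression model in place of the workshop labels.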
6. RESULTS
Here we present our main results and analysis for both the
US and the UK economies. In presenting results for both countries,
we caution that, as discussed in Section 3, cross-country comparison
is difficult, and is not the focus of this research. The first analysis we
present relates to the share of employment in 2030 by occupation,
and our model’s outputs are the probabilities of that share
being greater than at present. (Findings for the absolute level of
employment are available on request from the authors.) Below, we
informally use ‘increased demand’, ‘future demand’ (and, at times,
simply ‘demand’) as a shorthand for ‘increased share of employment
in 2030’.
6.1. OCCUPATIONS
The primary outputs of our model are the probabilities of each occupation experiencing a rise in workforce share (that is,
increased demand). These occupation-level results can then be aggregated to give the figures below. We distinguish the
percentage of the workforce in occupations predicted to see a rise in workforce share in 2030 with a ‘low probability’ (less
than 30%), ‘medium probability’ (> 50%) and ‘high probability’ (> 70%). (These probability thresholds are the ones used in
Frey and Osborne (2017).) That is, we calculate the total employment whose probability of future increased demand lies
above or below these three thresholds.
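The aggregation described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function name, the dictionary keys and the employment figures are hypothetical, and the thresholds are those quoted in the text.

```python
def workforce_shares(occupations, low=0.3, medium=0.5, high=0.7):
    """Percentage of total employment on each side of the thresholds.

    occupations: list of (employment, probability_of_increased_demand).
    """
    total = sum(emp for emp, _ in occupations)
    if total <= 0:
        raise ValueError("total employment must be positive")

    def share(predicate):
        matched = sum(emp for emp, p in occupations if predicate(p))
        return 100.0 * matched / total

    return {
        "below_low": share(lambda p: p < low),
        "above_medium": share(lambda p: p > medium),
        "above_high": share(lambda p: p > high),
    }


# Hypothetical occupations: (employment, probability).
occupations = [(100, 0.8), (300, 0.55), (400, 0.45), (200, 0.2)]
shares = workforce_shares(occupations)
```

Note that the categories are not mutually exclusive: employment above the high threshold is also counted above the medium threshold.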
6.1.1. US
As per Section 4.2, our calculations are made at the finest level available for US occupations; that is, the six-digit US SOC
2010. The percentage of the US workforce as partitioned by the thresholds above is provided in Table 2.
Table 2: The fraction of the US workforce above and below varying thresholds for the probability
of increasing demand
Note: employment does not equal total US employment as it excludes any 6-digit SOC occupations for which O*NET data is missing. This consists of “All Other”
titles in the BLS data, representing occupations with a wide range of characteristics which do not fit into one of the detailed O*NET-SOC occupations. It also
consists of occupations for which O*NET is either in the process of collecting data (e.g. underwriters) or has decided not to collect data (e.g. legislators). Hence our
analysis excludes occupations equivalent to roughly 2 percent of total US employment.
In Figure 3, we plot (following Frey and Osborne, 2017) the distribution of current US employment over its probability
of future increased demand. We additionally distinguish this employment by an intermediate aggregation of the Major
Groups, as specified by the BLS 2010 SOC user guide (US Bureau of Labor Statistics, 2010).
Figure 3: The distribution of US employment according to its probability of future increased demand.
Note that the total area under all curves is equal to total US employment.
Legend (partially recovered): Management, Business and Financial Occupations; Office and Administrative Support Occupations; Computer, Engineering and Science Occupations; Farming, Fishing and Forestry Occupations.
Figure 3 reveals a large mass of the workforce in employment with highly uncertain demand prospects (that is, a
probability of experiencing higher workforce share of around 0.5). Note that this contrasts sharply with the U-shaped
distribution by probability of automation in Frey and Osborne (2017), where the workforce is overwhelmingly in
occupations at either a very high probability or a very low probability of automation. That our predictions are more
uncertain is a direct result of the differences between our methodology and previous work. Firstly, the expert labels we gather
in our foresight exercises (see Section 4.3) are explicitly clothed in uncertainty, whereas Frey and Osborne (2017) assume
that participants are certain about their labels. This humility is partially motivated by the difficulty of the task assigned
to our experts: balancing all the macro trends that might influence the future of work. Our allowance for our experts’
self-assessed degrees of confidence also recognises that many of the macro trends act at cross-purposes, leading to
uncertainty about which will dominate in the case of any one occupation. Secondly, we use 120 O*NET features, against
the nine used in Frey and Osborne (2017). This more detailed characterisation of occupations renders occupations less
similar to one another, and hence limits the confidence of our model in predicting for one occupation based on what has
been labelled for another.
Table 3 lists the minor occupation groups about which our model is most optimistic.
Table 3: For the US, the minor occupation groups with the greatest probabilities of future
increased demand
For these occupations, we characterise the fraction of their current employment that has a probability of
increased demand above two thresholds.
We derive a number of insights from Table 3, informed in part by our workshop discussions.
— Education and personal care occupations feature prominently in the rankings; however, healthcare occupations rank
lower than trends such as ageing would suggest, potentially reflecting uncertainty over the trajectory of healthcare
policy and spending in the US or technical issues related to the composition of the training set (which, in practice,
underrepresented healthcare occupations).
— Construction trade work, as a larger employer, is another beneficiary. It is supported by a number of trends, including
urbanisation, ageing and globalisation and is expected to be an important source of medium-skilled jobs in the future.
— Demand prospects can vary considerably for occupations that are otherwise very similar. For example, business
operations specialists – which typically need information management expertise – are set to grow as a share of the
workforce while neighbouring minor occupation groups in the SOC such as financial specialists (see Table 4) are
predicted to fall in share. Looking at the detailed occupation level, the results for business operations specialists
are driven by management analysts, training and development specialists, labour relations specialists, logisticians
and meeting, convention and event planners in particular – occupations that will conceivably benefit from the
reorganisation of work and the workplace.
— Another niche anticipated to grow in workforce share is other sales and related workers and, within that in particular,
sales engineers and real estate agents, notwithstanding the predicted decline in general sales occupations.
Table 4: For the US, the minor occupation groups with the lowest probabilities of future
increased demand
We characterise for these occupations the fraction of their current employment that has a probability of increased
demand below two thresholds.
— These results support the importance of future routine-biased technological change. Notably they anticipate the impact
of automation encroaching on more cognitively advanced and complex occupations such as financial specialists.
— The predicted fall in retail sales workers and entertainment attendants, which between them account for a large
volume of employment, is consistent with an expansion in digitally provided goods and services.
— The transportation occupations represented may reflect a belief that driverless cars will disrupt the future workforce.
The rise of the sharing economy might reasonably be expected to lead to an increased demand for installation and
repair jobs, especially in areas such as transport, as cars and other assets are used more intensively, but this
hypothesis is not supported here.
6.1.2. UK
Again, as per Section 4.2, our calculations are made at the finest level available for UK occupations; that is, the four-digit
UK SOC 2010. The percentage of the UK workforce as partitioned by the thresholds above is provided in Table 5.
Table 5: The fraction of the UK workforce above and below varying thresholds for the probability of increasing demand
Note: employment does not equal total UK employment as it excludes 4-digit SOC 2010 occupations corresponding to the 6-digit SOC occupations for which
O*NET data is missing. The latter consists of “All Other” titles in the BLS data, representing occupations with a wide range of characteristics which do not fit into
one of the detailed O*NET-SOC occupations. It also consists of occupations for which O*NET is either in the process of collecting data (e.g. underwriters) or has
decided not to collect data (e.g. legislators). Hence our analysis excludes occupations equivalent to roughly 1 percent of total UK employment.
In Figure 4, we describe (following Frey and Osborne, 2014) the distribution of current UK employment over its probability
of future increased demand. We additionally disaggregate this employment by Major Group (that is, at one-digit level)
in the UK SOC.
As with the US results, Figure 4 reveals that the model is notably more uncertain than in Frey and Osborne (2014). Note
from Table 5 that, relative to the US, a larger fraction of the UK workforce is predicted to grow in share; however, a larger
fraction of the UK workforce also faces a high risk of declining in share.
Figure 4: The distribution of UK employment according to its probability of future increased demand.
Note that the total area under all curves is equal to total UK employment.
Legend (partially recovered): Professional Occupations; Elementary Occupations.
Table 6: For the UK, the minor occupation groups, or three-digit occupations, with the greatest
probabilities of future increased demand
For these occupations, we characterise the fraction of their current employment that has a probability of
increased demand above two thresholds.
— The presence of health and education-related and other service occupations is consistent with Baumol’s cost disease
hypothesis – that is, lower productivity growth sectors should be expected to experience increasing workforce shares
for a given increase in demand (Baumol et al., 2012).
— The results also highlight the resilience of public sector occupations beyond health and education.21 It is unclear
whether this is because workshop participants believe these occupations will grow faster than others, or because they
believe that any change will be less volatile due to institutional factors, such as higher job security and unionisation,
enabling participants to have a higher degree of confidence in them. However understood, the finding is consistent with
research by Acemoglu and Restrepo (2017a) showing that the public sector is among the few segments of the labour
market that holds up in areas affected by automation. Another possibility is that it reflects consumer concerns about
ethical, privacy and safety issues which, among other things, may affect the demand for regulation (World Economic
Forum (WEF), 2016; Jones, 2016).
— Activities that are not subject to international trade feature heavily in the list of occupation groups: food preparation,
elementary services, hospitality and leisure services and sports and fitness, and electrical and electronic trades.
This is also consistent with the high-tech multiplier effects identified by Moretti (2012) and Gregory et al. (2016).
— Some of these occupations (food preparation and hospitality trades and other elementary service occupations) have
low skills requirements, but are associated with differentiated products which consumers value (Autor and Dorn, 2013).
They may therefore be ripe for job redesign to emphasise further product variety. Signs of this can be seen in the return
of artisanal employment: the remaking of goods and services such as barbering, coffee roasting, butchery, bartending,
carpentry, textiles and ceramics, incorporating elements of craft-based technical skill which are higher-end – and more
expensive – than in the past. Workers also draw on deep cultural knowledge about what makes a good or service
‘authentic’ and are able to communicate these values of ‘good’ taste to consumers. These markets benefit from the
blurring of tastes between high- and low-brow culture and trends such as reshoring and the importance of localness
in production (Ocejo, 2017).
— The craft phenomenon has been attributed, in part, to the consumption preferences of millennials. They may also
explain the support we find for occupations such as sports and fitness and therapy. Arguably, these activities represent
a broader definition of health – one which seeks to maintain people’s wellness through proper nutrition, exercise and
mental health rather than simply to respond to illness through the provision of acute, episodic care.
— Creative, digital, design and engineering occupations generally have a bright outlook. This can be seen from Table 6
but also Table B2 in the Appendix which shows the average probability of increased workforce share across minor
occupation groups. These activities are strongly complemented by digital technology but are also ones in which the
UK has a comparative advantage and benefits from trends such as urbanisation.
— Although ‘green’ occupations22 fall slightly below the 0.7 threshold, Table B2 shows that they are projected to grow
in share (with a mean probability of 0.62).
A number of other occupation groups have a high probability of experiencing a fall in workforce share, as detailed
in Table 7.
Table 7: For the UK, the minor occupation groups, or three-digit occupations, with the lowest
probabilities of future increased demand
We characterise for these occupations the fraction of their current employment that has a probability of increased
demand below two thresholds.
— Technological advancements and globalisation may account for why many manufacturing production
occupations are predicted to see a fall in workforce share.
— The predicted decline in the workforce share of administrative, secretarial and, to some extent, sales
occupations is also consistent with routine-biased technological change. Customer service jobs which entail
social interaction are perhaps harder to reconcile with this framework.
— Skilled trades occupations (SOC 2010 major group 5) exhibit a more heterogeneous pattern. Metal forming,
textiles, vehicle trades and, to an extent, construction and building trades are projected to fall in share; by
contrast, electrical and electronic trades, food preparation, building finishing and other skilled trades (for
example, glass and ceramics makers, decorators and finishers) fare much better (see Table B2). This suggests
that there are likely to be pockets of opportunity for those lower down the skills ladder, depending on which
choices are made.
6.2. SENSITIVITY ANALYSIS
Firstly, to test for the generalisability of our results, we perform a cross-validation exercise. In particular, we randomly
select a reduced training set of half the available data (corresponding to the labels for 15 occupations); the remaining
data form a test set. On the test set, we evaluate the receiver operating characteristic (ROC) curve (Murphy, 2012).
Given that we observe ternary, as opposed to binary, labels, the ROC curve is a surface (Waegeman et al., 2008).
An approximation is made to the volume under the surface (VUS), whereby the area under the curve (AUC) is calculated
separately for each ternary component. The volume under the ROC surface is taken to be the mean of the separate AUC
slices. As the workshop data or model prediction for an occupation is either an empirical or posterior distribution over
ternary labels respectively, samples are drawn and the mean VUS is calculated. Due to uncertainty in the distribution,
a normalised VUS is calculated by dividing the test VUS by the ground truth VUS. We repeat this experiment for 50
random splits of the 30 occupations from the workshop set. Each experiment is composed of 15 training occupations
and 15 test occupations.
A high AUC/VUS (which ranges from 0.5 to 1) indicates that our model is able to reliably predict 15 occupations given a
distinct set of 15 occupations. This would suggest that the model is effective and that the training set is self-consistent.
It would also imply a degree of robustness of our results to the inclusion (or exclusion) of a small number of occupations
in that training set.
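The approximation of the volume under the surface as a mean of per-class AUC slices can be sketched as below. This is a simplified illustration, not the authors' implementation: it assumes a single hard label per test occupation (the procedure above instead draws samples from distributions over labels and normalises by a ground-truth VUS), and the function names and example data are hypothetical.

```python
def auc(scores, is_positive):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def approx_vus(prob_rows, labels, classes=("higher", "same", "lower")):
    """Mean of the one-vs-rest AUCs, one slice per ternary class."""
    slices = []
    for k, cls in enumerate(classes):
        scores = [row[k] for row in prob_rows]
        slices.append(auc(scores, [label == cls for label in labels]))
    return sum(slices) / len(slices)


# Hypothetical predicted probabilities (higher, same, lower) and labels.
prob_rows = [(0.7, 0.2, 0.1), (0.2, 0.6, 0.2), (0.1, 0.2, 0.7), (0.6, 0.3, 0.1)]
labels = ["higher", "same", "lower", "higher"]
vus = approx_vus(prob_rows, labels)
```

In this toy case every occupation's true label receives the highest predicted probability within its class slice, so each AUC slice, and hence the approximate VUS, equals 1.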
We also investigate how breaks in long-term trends as perceived by the workshop experts contribute to the findings above.
We do this by re-running the predictive model but using non-parametric extrapolation (as described in Section 5.6) of
occupational employment to label the occupations in place of the experts’ judgments.
As examples of extrapolated trends, Figure 5 plots the share of employment extrapolations for three UK occupations,
one each for higher, same and lower probability. Figure 6 shows the absolute employment extrapolations for three
US occupations, one each for higher, same and lower probability. The shaded areas give the 90% credibility interval
around the mean.
Note that, in what follows, all rankings of occupation groups from our model use the employment-weighted average
of probabilities of occupations within the group.
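As a brief illustration of the employment-weighted average, consider the following sketch. The function name and the figures are hypothetical and serve only to show the arithmetic.

```python
def group_probability(occupations):
    """Employment-weighted mean probability for one occupation group.

    occupations: list of (employment, probability_of_increased_demand).
    """
    total = sum(emp for emp, _ in occupations)
    if total <= 0:
        raise ValueError("total employment must be positive")
    return sum(emp * p for emp, p in occupations) / total


# Hypothetical group of two occupations: (employment, probability).
group = [(3000, 0.8), (1000, 0.4)]
weighted = group_probability(group)  # (3000*0.8 + 1000*0.4) / 4000
```

The larger occupation dominates the result, which is why a group's ranking can differ from the simple average of its members' probabilities.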
6.2.1. US
Firstly, Table 8 presents the results of our cross-validation exercise on US workshop data. The high VUS results
suggest that our model is robust, and that our results are not sensitive to small changes to the training set.
Table 8: The volume under the receiver operating surface from cross-validation of model on
US workshop data
Table 9 compares the probability of increased demand generated by trend extrapolation against that obtained from the
judgment of experts.
Table 9: The percentage point (PP) differences (of trend extrapolation from our workshop-trained
predictions) in the probabilities of future demand at minor occupation group level
That is, each row is the probability produced by trend extrapolation minus the probability produced by the
workshop, multiplied by 100.
TITLE   PP DIFFERENCE IN PROBABILITY OF DEMAND
Other Office and Administrative Support Workers -52.2
Legal Support Workers -47.8
Financial Clerks -47.0
Communications Equipment Operators -42.2
Financial Specialists -39.6
Information and Record Clerks -39.2
Secretaries and Administrative Assistants -38.7
Sales Representatives, Services -35.6
Entertainment Attendants and Related Workers -34.4
Retail Sales Workers -32.7
Rail Transportation Workers -27.7
Motor Vehicle Operators -27.5
Other Protective Service Workers -27.4
Material Recording, Scheduling, Dispatching and Distributing Workers -27.4
Sales Representatives, Wholesale and Manufacturing -26.3
Librarians, Curators and Archivists -26.2
Supervisors of Transportation and Material Moving Workers -26.1
Other Healthcare Support Occupations -25.7
Other Transportation Workers -25.4
Assemblers and Fabricators -24.9
Other Production Occupations -23.6
Lawyers, Judges and Related Workers -23.1
Baggage Porters, Bellhops and Concierges -22.8
Food and Beverage Serving Workers -22.8
Supervisors of Office and Administrative Support Workers -22.4
Printing Workers -22.1
Forest, Conservation and Logging Workers -21.6
Health Technologists and Technicians -21.2
Food Processing Workers -21.1
Supervisors of Sales Workers -20.7
Operations Specialties Managers -20.4
Supervisors of Personal Care and Service Workers -20.3
Cooks and Food Preparation Workers -20.2
Top Executives -19.7
Supervisors of Food Preparation and Serving Workers -19.6
Religious Workers -19.3
Other Sales and Related Workers -19.2
Other Construction and Related Workers -19.2
Media and Communication Workers -18.9
Plant and System Operators -18.7
Law Enforcement Workers -18.5
Business Operations Specialists -18.1
Textile, Apparel and Furnishings Workers -18.1
Counselors, Social Workers and Other Community and Social Service Specialists -18.0
Extraction Workers -17.9
Material Moving Workers -17.2
Other Management Occupations -17.1
Metal Workers and Plastic Workers -17.1
Supervisors of Production Workers -16.8
Water Transportation Workers -16.3
Fishing and Hunting Workers -15.9
Tour and Travel Guides -15.8
Nursing, Psychiatric and Home Health Aides -15.6
Other Teachers and Instructors -15.4
Health Diagnosing and Treating Practitioners -15.4
Social Scientists and Related Workers -15.2
Supervisors of Protective Service Workers -15.2
Woodworkers -15.1
Vehicle and Mobile Equipment Mechanics, Installers and Repairers -14.8
Media and Communication Equipment Workers -14.5
Agricultural Workers -14.4
Other Food Preparation and Serving Related Workers -14.4
Supervisors of Construction and Extraction Workers -14.2
Occupational Therapy and Physical Therapist Assistants and Aides -14.1
Supervisors of Building and Grounds Cleaning and Maintenance Workers -13.9
Advertising, Marketing, Promotions, Public Relations and Sales Managers -13.8
Other Education, Training and Library Occupations -13.7
Life, Physical and Social Science Technicians -13.6
Supervisors of Farming, Fishing and Forestry Workers -13.5
Air Transportation Workers -13.4
Other Healthcare Practitioners and Technical Occupations -13.3
Preschool, Primary, Secondary and Special Education School Teachers -12.2
Other Personal Care and Service Workers -11.9
Fire Fighting and Prevention Workers -10.9
Entertainers and Performers, Sports and Related Workers -10.8
Funeral Service Workers -10.6
Other Installation, Maintenance and Repair Occupations -10.4
Construction Trades Workers -9.9
Drafters, Engineering Technicians and Mapping Technicians -8.4
Electrical and Electronic Equipment Mechanics, Installers and Repairers -7.8
Life Scientists -7.3
Art and Design Workers -6.8
Helpers, Construction Trades -6.5
Supervisors of Installation, Maintenance and Repair Workers -5.9
Architects, Surveyors and Cartographers -5.8
Mathematical Science Occupations -4.5
Computer Occupations -4.2
Grounds Maintenance Workers -3.5
Post-secondary Teachers -3.3
Physical Scientists -1.9
Personal Appearance Workers -1.0
Building Cleaning and Pest Control Workers 4.2
Animal Care and Service Workers 4.4
Engineers 8.8
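The entries of Table 9 follow the definition given in its caption, which can be written as a one-line calculation. The sketch below is illustrative only; the function name and the input probabilities are hypothetical and do not correspond to any row of the table.

```python
def pp_difference(p_trend, p_workshop):
    """Percentage point difference: trend extrapolation minus workshop."""
    return round((p_trend - p_workshop) * 100, 1)


# Hypothetical probabilities of increased demand for one occupation group.
diff = pp_difference(0.40, 0.65)
```

Under this definition, a negative entry means that trend extrapolation assigns a lower probability of increased demand than the workshop-trained model.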
Table 10 further ranks occupation groups, using the same intermediate aggregation of major groups as described in
Section 6.1, by their probability of rising demand using our expert judgment training set. It compares these against rankings
based on independent quantitative forecasts for the year 2024 from the US BLS. (See Table A1 in the Appendix for major
group rankings).
Table 10: Relative rankings of intermediate aggregation of major occupation groups in the US by our
model, and by independent forecasts from the BLS
RANKING FROM EXPERT JUDGEMENT | RANKING FROM BLS PROJECTIONS 2014–2024
Education, Legal, Community Service, Arts and Media Occupations | Healthcare Practitioners and Technical Occupations
Management, Business and Financial Occupations | Computer, Engineering and Science Occupations
Construction and Extraction Occupations | Education, Legal, Community Service, Arts and Media Occupations
Installation, Maintenance and Repair Occupations | Transportation and Material Moving Occupations
Office and Administrative Support Occupations | Office and Administrative Support Occupations
— Firstly, Table 9 makes it clear that expert judgment is considerably more pessimistic than trend extrapolation for
most minor occupation groups. (This can also be seen by inspecting the distribution of current employment over its
probability of future demand under trend extrapolation in Figure A1 in the Appendix and comparing it with Figure 3).
The divergence is largest for routine cognitive, as opposed to routine manual, occupations.
— We see considerable divergence across the three outlooks (our workshop-informed model, our trend extrapolation,
and the BLS projection). This is arguably most marked for occupations in legal, architecture and engineering and arts,
design, entertainment, and sports and media. There is seemingly greater agreement, however, over occupations
projected to decline in workforce share.
6.2.2. UK
We begin by detailing in Table 11 the VUS scores from our cross-validation exercise on UK workshop data.
Again, the high results give some reassurance both that the model is robust and that the results are not sensitive
to minor alterations to the choice of occupations in the training set.
Table 11: The volume under the receiver operating surface from cross-validation of model on UK
workshop data
Table 12 compares the probability of rising workforce share generated by trend extrapolation versus the judgment
of experts.
As with the US, although to a lesser degree, it shows that expert judgement is more pessimistic than trend extrapolation for
most minor occupation groups. (This can also be seen by inspecting the distribution of current employment over its probability
of future demand under trend extrapolation in Figure A2 in the Appendix and comparing it with Figure 4). Expert judgement
is particularly more pessimistic about management and supervision roles. There are few cases where the experts are more
optimistic than trend extrapolation – construction and related occupations (not supervisors) are the main exception.
Table 12: The percentage point (PP) differences (of trend extrapolation from our workshop-trained
predictions) in the probabilities of future demand at minor occupation group level.
Each row is the probability produced by trend extrapolation minus the probability produced by the workshop,
multiplied by 100.
OCCUPATION TITLE   PP DIFFERENCE IN PROBABILITY OF DEMAND
Construction and Building Trades Supervisors -38.2
Skilled Metal, Electrical and Electronic Trades Supervisors -31.9
Production Managers and Directors -31.6
Managers and Directors in Transport and Logistics -27.8
Mobile Machine Drivers and Operatives -25.1
Sales Supervisors -24.6
Managers and Directors in Retail and Wholesale -24.6
Senior Officers in Protective Services -24.5
Administrative Occupations: Office Managers and Supervisors -24.0
Elementary Storage Occupations -23.1
Conservation and Environmental Associate Professionals -21.4
Agricultural and Related Trades -20.9
Other Drivers and Transport Operatives -20.9
Chief Executives and Senior Officials -20.4
Managers and Proprietors in Agriculture-related Services -20.4
Elementary Sales Occupations -19.9
Financial institution Managers and Directors -19.1
Sales, Marketing and Related Associate Professionals -19.0
Public Services and Other Associate Professionals -19.0
Cleaning and Housekeeping Managers and Supervisors -18.4
Functional Managers and Directors -17.9
Administrative Occupations: Finance -17.3
Managers and Proprietors in Hospitality and Leisure Services -17.3
Elementary Administration Occupations -16.9
Conservation and Environment Professionals -16.6
Food Preparation and Hospitality Trades -16.4
Legal Professionals -16.2
Process Operatives -16.2
Managers and Proprietors in Health and Care Services -15.6
Managers and Proprietors in Other Services -15.4
Architects, Town Planners and Surveyors -15.0
Transport Associate Professionals -15.0
Protective Service Occupations -15.0
Elementary Cleaning Occupations -14.8
Sales Assistants and Retail Cashiers -14.7
Health and Social Services Managers and Directors -14.6
Business, Research and Administrative Professionals -14.2
Administrative Occupations: Records -13.7
Other Administrative Occupations -13.4
Elementary Agricultural Occupations -12.7
Elementary Process Plant Occupations -12.6
Hairdressers and Related Services -12.0
Sales-related Occupations -11.7
Business, Finance and Related Associate Professionals -11.6
Engineering Professionals -10.4
Assemblers and Routine Operatives -10.2
Animal Care and Control Services -9.9
Quality and Regulatory Professionals -9.7
Housekeeping and Related Services -9.6
Road Transport Drivers -9.6
Other Elementary Services Occupations -9.5
Customer Service Occupations -9.5
Administrative Occupations: Government and Related Organisations -8.9
Elementary Security Occupations -8.1
Secretarial and Related Occupations -7.9
Research and Development Managers -7.5
Customer Service Managers and Supervisors -7.2
Leisure and Travel Services -5.9
Natural and Social Science Professionals -5.8
Legal Associate Professionals -5.6
Librarians and Related Professionals -4.8
Welfare and Housing Associate Professionals -4.6
Sports and Fitness Occupations -4.3
Information Technology and Telecommunications Professionals -3.8
Textiles and Garments Trades -2.7
Caring Personal Services -2.2
Printing Trades -1.9
Welfare Professionals -1.8
Artistic, Literary and Media Occupations -1.7
Media Professionals -1.5
Plant and Machine Operatives -1.0
Information Technology Technicians -1.0
Teaching and Educational Professionals -0.7
Health Associate Professionals 0.0
Nursing and Midwifery Professionals 0.0
Design Occupations 0.4
Science, Engineering and Production Technicians 0.7
Other Skilled Trades 0.9
Health Professionals 2.0
Construction Operatives 3.2
Vehicle Trades 4.4
Childcare and Related Personal Services 5.2
Metal Machining, Fitting and Instrument-making Trades 5.6
Metal Forming, Welding and Related Trades 7.2
Building Finishing Trades 9.2
Therapy Professionals 10.8
Elementary Construction Occupations 14.6
Construction and Building Trades 15.6
Draughtspersons and Related Architectural Technicians 20.3
Electrical and Electronic Trades 22.6
Table 13 ranks major occupation groups by their probability of rising demand using our expert judgement training set,
and compares against rankings based on independent quantitative forecasts for the year 2024 from the former UK
Commission for Employment and Skills (UKCES).
We comment on the rankings provided by our model and the UKCES, as described in Table 13.
— The rankings are broadly consistent across the two outlooks. However, there are important exceptions when
we drill down and examine minor group or three-digit occupations (see Table A2 in the Appendix for sub-major
group rankings).
— One difference is that the experts are a little less confident about rising demand for customer service occupations.
This places them closer to the results found in Frey and Osborne (2017).
— Our approach also reflects less optimism about occupations relating to personal care. While this may seem difficult to
reconcile with an ageing population (care workers and home carers being the largest occupation within this group),
it may also reflect uncertainty over the fact that people are working longer and leading healthier, more independent
lives – a theme and tension which ran through the UK workshop discussions. Another difference is that our expert-informed
model is more optimistic than UKCES on the prospects for certain medium-skilled jobs, for example skilled
metal, electrical and electronic trades and textiles, printing and other skilled trades – areas where apprenticeships
have traditionally been an important route to entry.
— Our model is also more optimistic about culture, media and sports, teaching and educational professionals and
science, research, engineering and technology occupations than UKCES.
Table 13: Relative rankings of major occupation groups in the UK by our model, trained on expert
judgement, and by independent forecasts from the UKCES
OUR MODEL | UKCES
Associate Professional and Technical Occupations | Caring, Leisure and Other Service Occupations
Caring, Leisure and Other Service Occupations | Associate Professional and Technical Occupations
Sales and Customer Service Occupations | Process, Plant and Machine Operatives
6.3. SKILLS
We now describe the findings of our study on the relationships between O*NET variables (which we refer
to as ‘features’ and occasionally informally refer to as ‘skills’) and future demand. Note that our methodology
(see Section 5.4) provides employment-weighted results: as such, all the results that follow are robust to outlying
occupations with small employment.
We use two measures of the importance of features to future demand: the Pearson correlation coefficient
(Section 5.4.1) and the average derivative (Section 5.4.2). In interpreting the values of these measures, note firstly
that the correlation coefficient lies between −1 and 1. The average derivative is calculated by considering the
derivative of an unobservable real-valued function linked to demand. It is dimensionless; an average derivative’s
magnitude is significant only relative to that of another average derivative. For either measure, positive values are
associated with features whose increase is expected to increase demand, and negative values with features
whose increase is expected to decrease demand.
As described in Section 5.4.2, we exclude especially noisy features from consideration under the average
derivative measure.
Note that the excluded variables are predominantly knowledge features. The significance of knowledge,
perhaps more than that of other features, differs considerably across occupations. It is hence not surprising that
these features are more likely than others to be unreliable under the average derivative metric.
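
As a rough illustration of the first measure, the sketch below computes an employment-weighted Pearson correlation between one feature and a demand score. It is only a sketch under stated assumptions: the function name `weighted_pearson`, the toy numbers and the use of NumPy are all hypothetical, and the exact weighting scheme of Section 5.4 is not reproduced here.

```python
import numpy as np

def weighted_pearson(x, y, w):
    """Pearson correlation of x and y under non-negative observation weights w."""
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    w = w / w.sum()                      # normalise weights to sum to 1
    mx, my = np.sum(w * x), np.sum(w * y)
    cov = np.sum(w * (x - mx) * (y - my))
    var_x = np.sum(w * (x - mx) ** 2)
    var_y = np.sum(w * (y - my) ** 2)
    return cov / np.sqrt(var_x * var_y)

# Hypothetical data: one feature score and one demand score for five occupations.
# The last occupation is an outlier with very small employment.
feature = [1.0, 2.0, 3.0, 4.0, 10.0]
demand = [0.2, 0.4, 0.6, 0.8, 0.0]
employment = [100, 100, 100, 100, 1]

r_weighted = weighted_pearson(feature, demand, employment)
r_unweighted = weighted_pearson(feature, demand, [1, 1, 1, 1, 1])
print(r_weighted, r_unweighted)
```

In this toy case the small outlying occupation flips the sign of the unweighted correlation but barely moves the employment-weighted one, which is the sense in which weighted results are robust to outlying occupations with small employment.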
6.3.1. US
Table 14: A ranking, by Pearson correlation, of the importance of O*NET variables to future demand
for US occupations
Figure 7: The 10 most important O*NET features as ranked by Pearson correlation for the US
Figure 8: The 10 least important O*NET features as ranked by Pearson correlation for the US
Table 14 ranks, by Pearson correlation coefficient, all 120 variables according to their association with a rising occupation
workforce share (in declining order of strength). Figures 7 and 8 plot, respectively, the top and bottom ten O*NET
variables as ranked by Pearson correlation. Table C1 in the Appendix also provides aggregate rankings
for the average derivative.
— The results confirm the importance of 21st century skills in the US, with a particularly strong emphasis on
interpersonal competencies. This is underscored by the presence of skills such as Instructing, Social Perceptiveness
and Coordination, and related knowledge domains such as Psychology and Sociology and Anthropology.
— This is consistent with the literature on the increasing importance of social skills – recall the fact that between 1980
and 2012, jobs with high social skills requirements grew by nearly 10 percentage points as a share of the US labour
force (Deming, 2015). There are good reasons to think that these trends will continue – not only as organisations
seek to reduce the costs of coordination but also as they negotiate the cultural context in which globalisation and
the spread of digital technology are taking place (Tett, 2017). A variety of interventions targeted at different
stages of the life cycle have proven successful in fostering social skills. The evidence base is largest on the success of
early programmes. Workplace-based internships and apprenticeships, however, also have a good track record due to
the need to learn informal or tacit knowledge and skills, and the bonds of attachment that can be formed between a
supervisor and an apprentice (Kautz et al., 2014).
— The results also emphasise the importance of higher-order cognitive skills such as Originality and Fluency of Ideas.
Learning Strategies and Active Learning – the ability of students to set goals, ask relevant questions, get feedback
as they learn and apply that knowledge meaningfully in a variety of contexts – also feature prominently.
— Progress towards developing these skills as part of the formal education system has been slow due to difficulties
in understanding how they arise and develop over time and how they can be embedded in the curriculum and formal
assessments. Nonetheless, a number of initiatives have shown promise and are beginning to shape domestic and
international policy dialogue (Schunk and Zimmerman, 2007; Lucas et al., 2013; OECD, 2016a). Strengthening the
affective aspects of education and a lifelong learning habit, especially among boys and students from disadvantaged
backgrounds who tend to have lower levels of motivation, is a further area of interest for policymakers. The research
literature shows that teachers can play an important role – both in raising student expectations and in rewarding the
process of learning – for instance, in giving students opportunities to share the results of their work with others or
explain why what they learned was valuable to them, though they are unlikely to be sufficient in the absence of other
policies to promote educational excellence and equity (Covington and Müeller, 2001; Diamond et al., 2004; Weinstein,
2002; Hampden-Thompson and Bennett, 2013; OECD, 2017).
— In addition to knowledge fields related to social skills, English language, History and Archeology, Administration
and Management and Biology are all associated strongly with occupations predicted to see a rise in workforce
share, reminding us that the future workforce will have generic knowledge as well as skills requirements.
— Psychomotor and physical abilities are strongly associated with occupations with a falling workforce share.
Interestingly, this includes abilities such as Finger Dexterity and Manual Dexterity, which Frey and Osborne (2017)
identified as key bottlenecks to automation. Trade and offshoring offer a potential explanation for why these skills
might fall in demand – consistent with workshop participants having considered a broad range of trends. What
makes a job potentially offshorable or vulnerable to import competition hinges less on a task’s routineness
or non-routineness than on the cost advantages of producing overseas and the marginal importance of face-to-face
interactions in the production process.
— The correlations for variables associated with a rising occupation workforce share are, in general, stronger than those
associated with a falling occupation workforce share. This is perhaps not surprising: all things being equal, an increase
in the value of any O*NET variable for an occupation makes it more skilled, and might broadly be expected to result
in greater demand (even if there are other reasons why the occupation will experience a fall in demand). It is also
fortunate: our core emphasis is on informing skills policy, which has a natural focus on those skills most strongly
linked to growing demand.
6.3.2. UK
Table 15: A ranking, by Pearson correlation, of the importance of O*NET variables to future
demand for UK occupations
Figure 9: The 10 most important O*NET features as ranked by Pearson correlation for the UK
Figure 10: The 10 least important O*NET features as ranked by Pearson correlation for the UK
Table 15 ranks, by Pearson correlation coefficient, all 120 variables according to their association with a rising
occupation workforce share (in declining order of strength). Figures 9 and 10 plot the top and bottom 10 O*NET
variables as ranked by Pearson correlation. Table C2 in the Appendix also provides aggregate rankings for the
average derivative.
— As in the US, the results confirm the importance of 21st century skills, though now with a particularly strong
emphasis on cognitive competencies and learning strategies.
— Interestingly, systems skills, relatively underexplored in the literature, all feature in the top 10. Systems
thinking emphasises the ability to recognise and understand socio-technical systems – their interconnections
and feedback effects – and choose appropriate actions in light of them. It marks a shift from more reductionist
and mechanistic forms of analysis and lends itself to pedagogical approaches such as game design and case
method with evidence that it can contribute to interdisciplinary learning (Tekinbas et al., 2014; Capra and Luisi,
2014; Arnold and Wade, 2015).
— The combined importance of these skills and interpersonal skills supports the view that the demand
for collaborative problem-solving skills may experience higher growth in the future (Nesta, 2017).
— Knowledge fields such as English language, Administration and Management, Sociology and Anthropology and
Education and Training are all associated strongly with occupations predicted to see a rise in workforce share,
again highlighting the importance of generic knowledge requirements.
— As in the US results, psychomotor and physical abilities, including Manual Dexterity and Finger Dexterity, are
strongly associated with occupations with a falling workforce share.
— Also as in the US results, correlations for skills, knowledge areas and abilities associated with a rising occupation
workforce share are stronger than those associated with a falling occupation workforce share.
We now provide a comparison of the overall relative importance of skills, abilities and knowledge areas as captured in
O*NET. Figure 11 shows the results for a) the US and b) the UK. All figures feature on the horizontal axis the rank of all
O*NET features: the further to the right, the less important it is for demand. This importance is assessed using linear
(Pearson correlation coefficient) and non-linear (average derivative) metrics.
Figure 11: The relative importance of skills, abilities and knowledge as assessed by Pearson correlation
coefficient
Panels: (a) US Results; (b) UK Results. Horizontal axis: O*NET Ranking. Vertical axes: Proportion; Correlation Coefficient (weighted).
Series: Abilities; Knowledge; Skills; Average Correlation Coefficient (weighted).
Figure 12: The relative importance of skills, abilities and knowledge as assessed by average derivative.
Panels: (a) US Results; (b) UK Results. Horizontal axis: O*NET Ranking. Vertical axes: Proportion; Gradient (weighted).
Series: Abilities; Knowledge; Skills; Average Gradient (weighted).
In all the plots, it can be seen that abilities are broadly less important (weighted to the right). Perhaps the most interesting
insight, however, is that the non-linear metric (the average derivative) gives knowledge features more weight to the left.
That is, a non-linear measure ranks knowledge features more highly. Recall that the benefit of a non-linear metric is that it
allows us to discover complementarities: skills that are only important if other skills take high values. As such, this result is
compatible with the intuition that knowledge features (such as Psychology and Foreign Language) are mostly valuable as
complements. We find a similar pattern in a large number of Science, Technology, Engineering, and Mathematics (STEM)
features (such as Science, Technology Design and Operations Analysis). They are not equally useful to all occupations (as
would be required to be assessed as important for our linear metric), but find use only for some specialised occupations
that have high values for other skills.
Recall from Section 5.4.2 that we say that an O*NET feature a is complementary to an O*NET feature b if increasing
a increases demand for occupations with large values of b. Conversely, a is anti-complementary to an O*NET
feature b if increasing a decreases demand for occupations with large values of b. For each sub-major occupation
group, for both the US and UK, we rank the three features that would most drive: (i) a rising workforce share for
a unit increase in the feature (complementary features), and (ii) a falling workforce share for a unit increase in the
feature (anti-complementary features). As such, we are also able to establish which features are most important
for different regions of the skills space. We represent the position of an occupation group in skills space by listing
its highest ranked features, which we term its current features.
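
The definitions above can be made concrete with a small numerical sketch. Everything in it is an assumption made for illustration: the toy `demand` function, the three columns (a, c, b) and the finite-difference estimator are hypothetical stand-ins, not the fitted model or the estimator of Section 5.4.2. The sketch restricts attention to occupations with large values of b and averages the derivative of demand with respect to another feature over that region.

```python
import numpy as np

def demand(X):
    """Hypothetical latent demand over feature columns (a, c, b)."""
    a, c, b = X[:, 0], X[:, 1], X[:, 2]
    return b * (a - c)  # a helps and c hurts, and both effects grow with b

def average_derivative(f, X, j, eps=1e-4):
    """Mean central finite-difference derivative of f with respect to column j."""
    step = np.zeros(X.shape[1])
    step[j] = eps
    return np.mean((f(X + step) - f(X - step)) / (2 * eps))

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(1000, 3))   # hypothetical occupations in feature space
high_b = X[X[:, 2] > 0.75]              # region of skills space with large b

grad_a = average_derivative(demand, high_b, 0)  # positive: a is complementary to b
grad_c = average_derivative(demand, high_b, 1)  # negative: c is anti-complementary to b
print(grad_a, grad_c)
```

Ranking features by such regional averages, largest first and smallest first, gives lists analogous in spirit to the complementary and anti-complementary features reported for each occupation group.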
6.5.1. US
We describe in Table 16 complementary and anti-complementary O*NET variables for US sub-major occupation groups.
Take Production Occupations, for example, which Figure 3 shows are predicted to see a fall in workforce share.
According to the O*NET data, Production and Processing, Near Vision and Problem Sensitivity are the three
most important or emblematic features for this occupation group. Our model predicts that increasing Customer
and Personal Service, Technology Design and Installation in the presence of these features will have the greatest
positive impact on future demand, while increasing Rate Control, Operation and Control and Quality Control
Analysis will have the greatest negative impact. Looking across all occupation groups, Customer and Personal
Service and Technology Design (along with Science) appear to be the O*NET features most likely to appear
as positive complementary variables.
Of course, any reconfiguration of skills, abilities and knowledge requirements entails an evolution of the occupation.
Or, put differently, occupations may need to be redesigned in order to make effective use of skills and
knowledge complements – and the results presented in Table 16 could be a useful guide in this exercise.
Table 16: For US major occupation groups, ranked lists (the highest ranked, top, and lowest ranked,
bottom) of O*NET features that are: currently high-valued, complementary and anti-complementary
11-0000 Management Occupations
Currently high-valued: Administration and Management; Oral Expression; Oral Comprehension
Complementary: Science; Philosophy and Theology; Sociology and Anthropology
Anti-complementary: Economics and Accounting; Medicine and Dentistry; Mathematics – Knowledge
31-0000 Healthcare Support Occupations
Currently high-valued: Customer and Personal Service; Oral Comprehension; English Language
Complementary: Customer and Personal Service; Technology Design; Science
Anti-complementary: Rate Control; Mathematics – Knowledge; Computers and Electronics
33-0000 Protective Service Occupations
Currently high-valued: Public Safety and Security; Problem Sensitivity; English Language
Complementary: Customer and Personal Service; Technology Design; Science
Anti-complementary: Rate Control; Operation and Control; Quality Control Analysis
49-0000 Installation, Maintenance and Repair Occupations
Currently high-valued: Mechanical; Near Vision; Repairing
Complementary: Installation; Customer and Personal Service; Technology Design
Anti-complementary: Operation and Control; Rate Control; Quality Control Analysis
6.5.2. UK
Table 17 provides the equivalent complementary and anti-complementary O*NET features for UK sub-major
occupation groups.
Here, take Customer Service and Sales Occupations, which according to the model are also likely to see a fall in future
demand. According to the O*NET data, Customer and Personal Service, Oral Comprehension and Oral Expression are
the three most important features for this group. Our model predicts that increasing Judgment and Decision-Making,
Fluency of Ideas and Originality in the presence of these features will have the greatest positive impact on future
demand, while increasing Public Safety and Security, Law and Government, Operation and Control, Engineering and
Technology and Reading Comprehension will have the greatest negative impact.
Judgment and Decision-Making, Fluency of Ideas, Originality and Operations Analysis appear regularly across all
occupation groups and present an illustrative case where changes in organisational design may be required to take
advantage of them. Without enhanced delegation of formal authority and employee involvement in decision-making
and the generation of ideas, the productivity gains from investing in these skills are likely to be modest. This is
supported by a large body of evidence on the role and complementarity of high-performance work practices (Ben-
Ner and Jones, 1995; Kruse et al., 2004; Lazear and Shaw, 2007). Decentralisation and the organisational structures
and skills which support them appear particularly important for firms closer to the technological frontier, firms in
more varied environments and younger firms (Acemoglu et al., 2007). The UK ranks around the OECD average in
terms of the share of jobs which employ these practices, though the level of task discretion, defined as employees’
immediate control over their work tasks, has fallen sharply since the 1990s (Inanc et al., 2013; OECD, 2016d).
Science – defined here as the capacity to use scientific rules and methods to solve problems – is another cross-
cutting complement. We find it to be a complementary feature not only among prototypical high-skill occupations
but also Secretarial and Administrative occupations. Mastery of medium-skill science is already indispensable to
a number of paraprofessional positions – from radiology technicians to electricians (Rothwell, 2013; Grinis, 2017).
Our results suggest that clerical occupations may be ripe for a similar transformation. One can envisage scenarios
where this is possible – for example, the credit controller occupation that increasingly applies aspects of data
science to help investigate the credit worthiness of borrowers and collect arrears of payment.
Table 17: For UK sub-major occupation groups, ranked lists (the highest ranked, top, and
lowest ranked, bottom) of O*NET features that are: currently high-valued, complementary and
anti-complementary
1100 Corporate Managers and Directors
Currently high-valued: Administration and Management; Oral Expression; Oral Comprehension
Complementary: Science; Operations Analysis; Originality
Anti-complementary: Public Safety and Security; Law and Government; Sound Localization
3300 Protective Service Occupations
Currently high-valued: Public Safety and Security; Law and Government; English Language
Complementary: Gross Body Equilibrium; Science; Gross Body Coordination
Anti-complementary: Public Safety and Security; Law and Government; Engineering and Technology
4100 Administrative Occupations
Currently high-valued: Customer and Personal Service; Oral Expression; Clerical
Complementary: Judgment and Decision-Making; Science; Fluency of Ideas
Anti-complementary: Public Safety and Security; Engineering and Technology; Mechanical
6200 Leisure, Travel and Related Personal Service Occupations
Currently high-valued: Customer and Personal Service; Oral Expression; Oral Comprehension
Complementary: Fluency of Ideas; Judgment and Decision-Making; Originality
Anti-complementary: Public Safety and Security; Engineering and Technology; Reading Comprehension
7100 Sales Occupations
Currently high-valued: Customer and Personal Service; Oral Comprehension; Oral Expression
Complementary: Judgment and Decision-Making; Fluency of Ideas; Originality
Anti-complementary: Public Safety and Security; Engineering and Technology; Reading Comprehension
7200 Customer Service Occupations
Currently high-valued: Customer and Personal Service; Oral Expression; Oral Comprehension
Complementary: Judgment and Decision-Making; Fluency of Ideas; Originality
Anti-complementary: Public Safety and Security; Law and Government; Operation and Control
It is also useful to think about the occupations which may emerge in the future in response to the drivers of labour
market change we consider in our study. These occupations correspond to high-demand locations in the feature
space and are not associated with existing occupations. The model allows us to identify a hypothetical occupation
which is ‘almost certain’ (see Section 5.5 for a formal interpretation) to experience an increase in workforce share
and the combination of skills, abilities and knowledge features most associated with it.
6.6.1. US
For the US, the model identifies four hypothetical occupations which would almost certainly experience a rise in demand.
Table 18 ranks the top five O*NET features in declining order of feature value for each hypothetical occupation. (S) denotes
that the variable is an O*NET skills feature, (K) is an O*NET knowledge feature and (A) is an O*NET abilities feature.
We can understand something about these hypothetical occupations by looking at existing occupations that are
closest to them (in declining order of proximity), as described in Figure 12. Of the 20 occupations presented here, 11 are
defined by O*NET as enjoying a Bright Outlook and/or are expected to benefit from the growth of the green economy.23
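
The idea of ranking existing occupations by proximity to a hypothetical one can be sketched as a nearest-neighbour lookup in feature space. This is a minimal illustration only: the choice of Euclidean distance, the function name `closest_occupations` and the two-feature toy data are assumptions made here, not the distance measure or data used in the report.

```python
import numpy as np

def closest_occupations(target, features, names, k=5):
    """Return the k occupation names nearest to target, closest first (Euclidean distance)."""
    diffs = np.asarray(features, dtype=float) - np.asarray(target, dtype=float)
    distances = np.linalg.norm(diffs, axis=1)
    order = np.argsort(distances)[:k]
    return [names[i] for i in order]

# Hypothetical feature vectors for three existing occupations and one hypothetical occupation.
names = ["Occupation A", "Occupation B", "Occupation C"]
features = [[0.9, 0.8], [0.2, 0.1], [0.7, 0.9]]
hypothetical = [1.0, 1.0]

nearest = closest_occupations(hypothetical, features, names, k=2)
print(nearest)
```

With real O*NET data each vector would hold many more features, and a different distance or feature scaling could change which occupations appear closest.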
Table 18: The four new occupations found by our model for the US, as described by their top
five O*NET features.
4: Education and Training (K); Oral Comprehension (S); Social Perceptiveness (S); Written Comprehension (S); Reading Comprehension (S)
The employment time-series for four of these closest occupations is also plotted in Figure 13 for historical context.
Figure 12: ‘Closest’ occupations to hypothetical new high demand occupations for the US
Non-farm Animal Caretakers; Massage Therapists; Personal Care Aides; Maids and Housekeeping Cleaners; Home Health Aides
Construction Laborers; Painters, Construction and Maintenance; Helpers - Pipelayers, Plumbers, Pipefitters and Steamfitters; Roofers; Drywall and Ceiling Tile Installers
Aerospace Engineers; Electrical Engineers; Chemical Engineers; Nuclear Engineers; Engineering Teachers, Post-secondary
Directors, Religious Activities and Education; Training and Development Specialists; Communications Teachers, Post-secondary; Home Economics Teachers, Post-secondary; Library Science Teachers, Post-secondary
Figure 13: Time-series of employment for four of the ‘closest’ occupations to new US occupations,
as tabulated in Figure 12
These results provide another rejoinder to the view that jobs in the middle of the education and earnings distributions
will disappear in the future. Two of the four occupations can be plausibly viewed as middle-skill jobs. Our first hypothetical
occupation – which has similarities to social care work – is particularly interesting. On the one hand, it is a textbook
example of a sector where the availability of low-skilled employees, the budgetary squeeze on government programmes
– Medicare and Medicaid account for roughly 70% of all long-term care dollars – and the legacy of the politics of race
and gender have combined to create low-paid jobs with low status and precarious employment conditions (Institute of
Medicine, 2008; Duffy et al., 2015). However, the model points to bright demand prospects for care work, which requires
a mixture of tasks from across the skill spectrum, including formal knowledge and training which, in principle, would
support wage growth and job quality. Finally, it is worth noting the extent to which interpersonal competencies feature
across these hypothetical occupations.
6.6.2. UK
For the UK, two new occupations are identified by the model. Table 19 shows the top five features of these occupations,
in declining order of importance.
Table 19: The two new occupations found by our model for the UK, as described by their top five
O*NET features.
1: Fine Arts (K); Originality (A); Design (K); Fluency of Ideas (A); Visualization (K)
Again, we can learn something about these occupations by looking at existing occupations that are closest to them
(in declining order of proximity). These closest occupations are described in Figure 14, and historical employment for two
of these is plotted in Figure 15. One of the occupations has high levels of creativity and combines traditional craft and
tech-based skills; the other fits hospitality and sales occupations and requires originality, flexibility and management skills.
Figure 14: The ‘closest’ occupations to hypothetical new high demand occupations for the UK.
Artists; Merchandisers and Window Dressers; Florists; Tailors and Dressmakers; Photographers, audio-visual and broadcasting equipment operators
Catering and bar managers; Publicans and managers of licensed premises; Managers and directors in retail and wholesale; Sales Supervisors; Chefs
Figure 15: Time-series of employment for two of the ‘closest’ occupations to new UK occupations,
as tabulated in Figure 14
8. CONCLUSIONS
In this report, we have presented a novel mixed-methods approach for predicting the demand for skills, which we have applied to the US and UK economies in 2030. Specifically, we have: generated directional predictions for occupation growth for groups in the Standard Occupation Classification of the US and UK; identified which skills, knowledge types and abilities will, by association, most likely experience growth and decline; and determined, at the occupational level, which human capital investments will most likely boost future demand in 2030. We have grounded our analysis explicitly in the many sources of structural change likely to impact on US and UK labour markets over this horizon.
Although there has been an explosion of reports looking into the future of employment, we believe ours is the most comprehensive and methodologically ambitious and has results that are actionable. It also contains the most sophisticated treatment of uncertainty; this is important as our finding that most jobs are associated with high levels of uncertainty about future demand reminds us that the future for most occupations is far from inevitable. Lastly, we make great efforts to benchmark our predictions – to tease out our specific contributions to this important conversation – by comparing them with alternative forecasts. While this necessarily falls short of evaluation, we believe it further separates our study from other recent exercises of this nature.
Our approach takes the labour market judgments, gleaned at foresight workshops, of experts in a wide range of domain areas where structural change is expected to impact on employment, and combines them with a state-of-the-art machine-learning algorithm. The model follows earlier studies in making use of the US Department of Labor’s O*NET survey of more than 1,000 occupations, which asks detailed questions of every occupation on skills, knowledge and abilities and the tasks and activities which make up jobs. However, we depart from these studies in making use of all 120 skills, knowledge and abilities features in the database.
We find that 9.6% (8.0%) of the current US (UK) workforce is in an occupation that will very likely experience an increase in workforce share and 18.7% (21.2%) in an occupation that will very likely experience a fall. These estimates imply that a large mass of the workforce in both the US and UK has highly uncertain demand prospects (that is, a probability of experiencing a higher workforce share of close to 50:50). This finding is significant. It contrasts sharply, for example, with the U-shaped distribution in the studies of future automation of Frey and Osborne (2014 and 2017), with their implication that the overwhelming majority of US and UK workers are employed in jobs with either very high or very low probability of automation. That our predictions are more uncertain is a result of the differences between our methodology and previous work: in particular, our foresight workshops force domain experts to confront the uncertainties arising from structural trends acting in complex and possibly offsetting ways. The experts’ stated uncertainties reflecting this – and other sources – are explicitly factored into our machine-learning model.
Our skills results confirm the future importance of 21st century skills – the combination of interpersonal and cognitive skills that has been an increasing preoccupation of policymakers in recent years. In our US findings, there is a particularly strong emphasis on interpersonal competencies, consistent with the literature on the increasing importance of social skills. In addition, a number of knowledge fields, such as English Language, Administration and Management, and Biology, are associated strongly with occupations predicted to see rising demand – a reminder that the future workforce will have generic knowledge as well as skills requirements. In the UK, the findings support the importance of 21st century skills too, though with an even stronger emphasis on cognitive competencies and learning strategies. System skills – Judgment and Decision-making, Systems Analysis and Systems Evaluation – feature prominently.
Our study makes contributions to all three literatures surveyed in Section 2. First, the foresight workshops employ a novel data collection methodology using active machine-learning algorithms, which intelligently query participants to maximise the informativeness of the data collected. Second, the study employs an innovative approach to generating predictions about the future of skills, combining expert human judgment with machine-learning techniques that can flexibly respond to natural patterns in the data. Our approach thus permits richer, more complex, non-linear interactions between variables – a feature that we exploit to assess complementarities between skills and the implications for new occupations. Third, the research is grounded in an explicit consideration of the diverse sources of structural change, any one of which can be expected to have major impacts on future employer skills needs. By making use of the detailed characterisation of occupations provided by the O*NET database, we are able to provide a higher resolution treatment of skills, knowledge types and abilities than is usually found in the skills literature. Finally, our research serves as a potentially important counterweight to the dominance of future automation in policy debates on employment.
This final point merits emphasis: it is tempting to focus on the risks and dangers of the period ahead rather than the opportunities it offers. This is both dangerous and misleading. It is dangerous because popular narratives matter for economic outcomes and a storyline of relentless technological displacement of labour markets risks chilling growth and innovation (Atkinson and Wu, 2017; Shiller, 2017). A backlash against technology would be particularly dangerous at a time when willingness to embrace risk is needed more than ever to improve flagging productivity (Phelps, 2013; Erixon and Weigel, 2016). It is misleading because our analysis points to the opportunities for boosting growth, though with one important caveat – that our education and training systems are agile enough to respond appropriately. History is a reminder that investments in skills must be at the centre of any long-term strategy for adjusting to structural change. A precondition for this is access to good information on skills needs – without which policymakers risk flying blind. We hope this report is a step towards improving understanding of this vital agenda.
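The workforce shares quoted above can be thought of as employment-weighted counts over per-occupation probabilities. The sketch below shows the arithmetic under stated assumptions: the cut-offs of 0.7 and 0.3 for "very likely" rising or falling demand, the function name and the sample figures are all hypothetical and are not taken from the model's output.

```python
def workforce_shares(occupations, high=0.7, low=0.3):
    """Split total employment into rising, falling and uncertain shares.

    occupations: list of (employment, probability_of_increased_demand) pairs.
    high, low: assumed probability cut-offs for "very likely" rise or fall.
    """
    total = sum(employment for employment, _ in occupations)
    if total == 0:
        raise ValueError("total employment must be positive")
    rising = sum(employment for employment, p in occupations if p > high)
    falling = sum(employment for employment, p in occupations if p < low)
    return {
        "rising": rising / total,
        "falling": falling / total,
        "uncertain": (total - rising - falling) / total,
    }


# Invented example: three occupations with employment counts and probabilities.
sample = [(100, 0.80), (200, 0.25), (700, 0.50)]
print(workforce_shares(sample))
```

In this invented example most employment falls in the uncertain band, which mirrors the qualitative pattern described in the text without reproducing its figures.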
Figure A1: Using trend extrapolation, the distribution of US employment according to its probability
of future increased demand
Note that the total area under all curves is equal to total US employment.
Panels shown include: Service Occupations; Production Occupations.
UK – Extrapolations
Figure A2: Using trend extrapolation, the distribution of UK employment according to its probability
of future increased demand
Note that the total area under all curves is equal to total UK employment.
Panels shown include: Professional Occupations; Elementary Occupations.
US – Relative rankings
Table A1: Relative rankings of major occupation groups in the US by our model,
trained on expert judgment, and by independent forecasts from the BLS.
UK – Relative rankings
Table A2: Relative rankings of sub-major occupation groups in the UK by our model,
trained on expert judgment, and by independent forecasts from the UKCES.
Table B1: Probabilities of future increased demand for minor occupation groups
STANDARD OCCUPATION CLASSIFICATION (SOC) | OCCUPATION TITLE | AVERAGE PROBABILITY (EMPLOYMENT-WEIGHTED)
39-2000 Animal Care and Service Workers 0.796
21-2000 Religious Workers 0.754
25-2000 Preschool, Primary, Secondary and Special Education School Teachers 0.743
23-1000 Lawyers, Judges and Related Workers 0.739
25-1000 Post-secondary Teachers 0.734
17-2000 Engineers 0.718
21-1000 Counselors, Social Workers and Other Community and Social Service Specialists 0.707
25-3000 Other Teachers and Instructors 0.683
39-9000 Other Personal Care and Service Workers 0.680
19-3000 Social Scientists and Related Workers 0.676
11-2000 Advertising, Marketing, Promotions, Public Relations and Sales Managers 0.672
39-5000 Personal Appearance Workers 0.672
25-9000 Other Education, Training and Library Occupations 0.665
27-2000 Entertainers and Performers, Sports and Related Workers 0.661
11-9000 Other Management Occupations 0.658
39-1000 Supervisors Of Personal Care and Service Workers 0.651
31-2000 Occupational Therapy and Physical Therapist Assistants and Aides 0.637
31-1000 Nursing, Psychiatric and Home Health Aides 0.636
43-1000 Supervisors of Office and Administrative Support Workers 0.635
29-9000 Other Healthcare Practitioners and Technical Occupations 0.629
29-1000 Health Diagnosing and Treating Practitioners 0.627
19-1000 Life Scientists 0.625
19-2000 Physical Scientists 0.613
39-7000 Tour and Travel Guides 0.611
17-1000 Architects, Surveyors and Cartographers 0.611
39-4000 Funeral Service Workers 0.605
27-3000 Media and Communication Workers 0.600
39-6000 Baggage Porters, Bellhops and Concierges 0.593
11-3000 Operations Specialties Managers 0.580
11-1000 Top Executives 0.579
13-1000 Business Operations Specialists 0.579
27-1000 Art and Design Workers 0.579
33-1000 Supervisors of Protective Service Workers 0.566
15-1000 Computer Occupations 0.556
37-1000 Supervisors Of Building and Grounds Cleaning and Maintenance Workers 0.551
15-2000 Mathematical Science Occupations 0.549
47-2000 Construction Trades Workers 0.548
25-4000 Librarians, Curators and Archivists 0.545
37-2000 Building Cleaning and Pest Control Workers 0.542
49-1000 Supervisors Of Installation, Maintenance and Repair Workers 0.535
47-3000 Helpers, Construction Trades 0.530
33-3000 Law Enforcement Workers 0.517
37-3000 Grounds Maintenance Workers 0.515
47-1000 Supervisors Of Construction and Extraction Workers 0.515
53-2000 Air Transportation Workers 0.511
53-1000 Supervisors Of Transportation and Material Moving Workers 0.499
41-1000 Supervisors Of Sales Workers 0.496
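The table's probability column is described as an employment-weighted average. A minimal sketch of that kind of calculation for one minor group is given below, assuming the group's value is the mean of its constituent occupations' probabilities weighted by their employment; the function name and the figures are hypothetical.

```python
def employment_weighted_probability(occupations):
    """Average the occupation-level probabilities, weighted by employment.

    occupations: list of (employment, probability) pairs for one group.
    """
    total_employment = sum(employment for employment, _ in occupations)
    if total_employment == 0:
        raise ValueError("group has no employment to weight by")
    weighted_sum = sum(employment * p for employment, p in occupations)
    return weighted_sum / total_employment


# Invented example: a group made up of two occupations.
group = [(3000, 0.80), (1000, 0.60)]
print(round(employment_weighted_probability(group), 3))
```

Weighting by employment means that large occupations dominate the group figure, so a small occupation with an extreme probability moves the average only slightly.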
Table B2: Probabilities of future increased demand for minor occupation groups
UK Skills Ranking
Table C2: A ranking, by average derivative, of the importance of O*NET variables to future
demand for UK occupations
Table D1: The UK-to-US Occupation Crosswalk based on LMI for All data
REFERENCES
Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G. S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viegas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., Zheng, X., 2015. TensorFlow: Large-scale machine learning on heterogeneous systems. Software available from tensorflow.org
Abeliansky, A., Prettner, K., 2017. Automation and demographic change. Tech. Rep. 05-2017, Hohenheim Discussion Papers in Business, Economics and Social Sciences Discussion Paper.
Acemoglu, D., Aghion, P., Lelarge, C., Van Reenen, J., Zilibotti, F., 2007. Technology, information and the decentralization of the firm. The Quarterly Journal of Economics 122 (4), 1759. URL http://dx.doi.org/10.1162/qjec.2007.122.4.1759
Acemoglu, D., Autor, D., 2011. Skills, tasks and technologies: Implications for employment and earnings. In: Handbook of Labor Economics. Vol. 4b.
Acemoglu, D., Restrepo, P., 2017a. Robots and jobs: Evidence from US labor markets.
Acemoglu, D., Restrepo, P., 2017b. Secular stagnation? The effect of aging on economic growth in the age of automation.
Adalet McGowan, M., Andrews, D., 2015. Labour market mismatch and labour productivity: Evidence from PIAAC data. Tech. Rep. 1209, OECD Economics Department Working Papers.
Alpert, A., Auyer, J., October 2003. Evaluating the BLS 1988–2000 employment projections. BLS Monthly Labor Review.
Armstrong, S., Sotala, K., Ó hÉigeartaigh, S. S., Jul. 2014. The errors, insights and lessons of famous AI predictions – and what they mean for the future. Journal of Experimental & Theoretical Artificial Intelligence 26 (3), 317–342.
Arnold, R. D., Wade, J. P., 2015. A definition of systems thinking: a systems approach. Procedia Computer Science 44, 669–678.
Arntz, M., Gregory, T., Zierahn, U., 2016. The risk of automation for jobs in OECD countries: A comparative analysis. Tech. Rep. 189, OECD Social, Employment and Migration Working Papers.
Atkinson, R. D., Wu, J., 2017. False alarmism: technological disruption and the U.S. labor market, 1850–2015. Tech. rep., Information Technology and Innovation Foundation.
Autor, D., Katz, L., Kearney, M., 2006. The polarization of the US labor market.
Autor, D., Katz, L., Kearney, M., 2008. Trends in US wage inequality: Reassessing the revisionists. Review of Economics and Statistics 90 (2), 300–323.
Autor, D. H., 2013. The “task approach” to labor markets: an overview. Tech. Rep. 7178, IZA discussion paper.
Autor, D. H., Dorn, D., 2013. The growth of low-skill service jobs and the polarization of the US labor market. American Economic Review 103 (5), 1553–97. URL http://www.aeaweb.org/articles?id=10.1257/aer.103.5.1553
Autor, D., Levy, F., Murnane, R., 2003. The skill content of recent technological change: an empirical exploration. Quarterly Journal of Economics 116 (4), 1279–1333.
Baehrens, D., Schroeter, T., Harmeling, S., Kawanabe, M., Hansen, K., Muller, K.-R., 2010. How to Explain Individual Classification Decisions. Journal of Machine Learning Research 11, 1803–1831. URL https://is.tuebingen.mpg.de/fileadmin/user_upload/files/publications/baehrens10a_[0].pdf
Banerjee, A., Duflo, E., 2008. What is middle class about the middle classes around the world? Journal of Economic Perspectives 22 (2), 3–28.
Bank of America Merrill Lynch, 2017. Overdrive – global future mobility primer.
Barany, Z. L., Siegel, C., Mar. 2017. Job polarization and structural change. American Economic Journal: Macroeconomics. URL https://kar.kent.ac.uk/61271/
Baumol, W. J., Bowen, W. G., 1966. Performing Arts: The Economic Dilemma. New York: The Twentieth Century Fund.
Baumol, W. J., de Ferranti, D., Malach, M., Pablos-Méndez, A., Tabish, H., Wu, L. G., 2012. The cost disease: why computers get cheaper and health care doesn’t. Yale University Press. URL http://www.jstor.org/stable/j.ctt32bhj9
Beblavý, M., Maselli, I., Veselkova, M., 2015. Green, Pink & Silver? The Future of Labour in Europe. SSRN Scholarly Paper ID 2577743, Social Science Research Network, Rochester, NY.
Becker, S., Hornung, E., Woessmann, L., 2009. Catch me if you can – education and catch-up in the industrial revolution. Tech. Rep. 2816, CESifo Working Paper Series.
Becker, S. O., Muendler, M., 2015. Trade and tasks: an exploration over three decades in Germany. Economic Policy 30 (84), 589–641.
Ben-Ner, A., Jones, D. C., 1995. Employee participation, ownership and productivity: A theoretical framework. Industrial Relations: A Journal of Economy and Society 34 (4), 532–554.
Beramendi, P., Häusermann, S., Kitschelt, H., Kriesi, H., Apr. 2015. The Politics of Advanced Capitalism. Cambridge University Press.
Bix, A., 2000. Inventing ourselves out of jobs?: America’s debate over technological unemployment. Johns Hopkins University Press.
Black, S., Spitz-Oener, A., 2010. Explaining women’s success: Technological change and the skill content of women. Review of Economics and Statistics 92, 187–194.
Brandes, P., Wattenhofer, R., 2016. Opening the Frey/Osborne black box: Which tasks of a job are susceptible to computerization? arXiv:1604.08823 [cs.CY].
Broadbent, B., 2013. Forecast errors. Speech given at the Mile End Group of Queen Mary, University of London, Wednesday 1 May 2013.
Byrd, R. H., Lu, P., Nocedal, J., Zhu, C., 1995. A limited memory algorithm for bound constrained optimization. SIAM Journal on Scientific Computing 16 (5), 1190–1208.
Cappelli, P., 2015. Skill gaps, skill shortages and skill mismatches: Evidence and arguments for the United States. ILR Review 68 (2), 251–290.
Capra, F., Luisi, P. L., 2014. The systems view of life: a unifying vision. Cambridge University Press.
Caprettini, B., Voth, H.-J., 2017. Rage against the machines: labour-saving technology and unrest in England, 1830–32. SSRN Scholarly Paper ID 2904322, Social Science Research Network, Rochester, NY.
Carnevale, A., Smith, N., Strohl, J., 2010. Help wanted: Projections of jobs and education requirements through 2018. Tech. rep., Georgetown Center on Education and the Workforce.
CEDEFOP (European Centre for the Development of Vocational Training), 2008. Systems for anticipation of skills needs in the EU member states. Tech. Rep. 1, CEDEFOP Working Paper.
Chadha, J., 2017. Why forecast. Tech. Rep. 239, National Institute Economic Review.
Christodoulou, D., 2014. Seven myths about education. Routledge Press.
Chu, W., Ghahramani, Z., 2005. Gaussian processes for ordinal regression. Journal of Machine Learning Research 6 (Jul), 1019–1041.
Clark, B., Joubert, C., Maurel, A., 2014. The career prospects of overeducated Americans. Tech. Rep. 20167, National Bureau of Economic Research Working Paper.
Consoli, D., Marin, G., Marzucchi, A., Vona, F., 2016. Do green jobs differ from non-green jobs in terms of skills and human capital? Research Policy 45 (5), 1046–60.
Covington, M. V., Müeller, K. J., 2001. Intrinsic Versus Extrinsic Motivation: An Approach/Avoidance Reformulation. Educational Psychology Review 13 (2), 157–176.
Canadian Scholarship Trust (CST), 2017. CST Careers 2030. URL http://careers2030.cst.org.
Davenport, T. H., Kirby, J., 2016. Only humans need apply: winners and losers in the age of smart machines. Harper Business.
David, P., 1990. The dynamo and the computer: An historical perspective on the modern productivity paradox. American Economic Review 80 (2), 355–361.
David, P., Wright, G., 1999. General purpose technologies and surges in productivity: Historical reflections on the future of the ICT revolution. Tech. Rep. 31, Oxford University Discussion Papers in Economic and Social History.
Davis, S., Haltiwanger, J., 2014. Labor market fluidity and economic performance. Tech. Rep. 20479, NBER Working Paper.
Deming, D., 2015. The growing importance of social skills in the labor market. Tech. Rep. 21473, NBER Working Paper.
Dessein, W., Santos, T., 2006. Adaptive organizations. Journal of Political Economy 114 (5), 956–995.
Diamond, J. B., Randolph, A., Spillane, J. P., 2004. Teachers’ expectations and sense of responsibility for student learning: the importance of race, class and organizational habitus. Anthropology & Education Quarterly 35 (1), 75–98.
Duffy, M., Armenia, A., Stacey, C. L., 2015. Caring on the clock: the complexities and contradictions of paid care work. Rutgers University Press.
Dustmann, C., Ludsteck, J., Schonberg, U., 2009. Revisiting the German wage structure. Quarterly Journal of Economics 124 (2), 809–842.
Ecken, P., Von Der Gracht, H., Gnatzy, T., 2011. Desirability bias in foresight: Consequences for decision quality based on Delphi results. Technological Forecasting and Social Change 78 (9), 1654–1670.
Erixon, F., Weigel, B., 2016. The innovation illusion: how so little is created by so many working so hard. Yale University Press.
Freeman, R. B., 2015. Who owns the robots rules the world. IZA World of Labor 5.
Frey, C., Berger, T., 2016. Industrial renewal in the 21st century: Evidence from US cities. Regional Studies, forthcoming.
Frey, C. B., Osborne, M. A., Oct. 2014. Agiletown: the relentless march of technology and London’s response. Tech. rep., Deloitte. URL http://www.deloitte.com/view/en_GB/uk/market-insights/uk-futures/london-futures/index.htm
Frey, C. B., Osborne, M. A., 2017. The future of employment: how susceptible are jobs to computerisation? Technological Forecasting and Social Change 114, 254–280.
Galor, O., Moav, O., 2006. Das human-kapital: A theory of the demise of the class structure. Review of Economic Studies 73 (1), 85–117.
Gambin, L., Hogarth, T., Murphy, L., Spreadbury, K., Warhurst, C., Winterbotham, M., 2016. Research to understand the extent, nature and impact of skills mismatches in the economy. Tech. Rep. 265, BIS Research Paper.
Gathmann, C., Schonberg, U., 2010. How general is human capital? A task-based approach. Journal of Labor Economics 28 (1), 1–49.
Ghahramani, Z., 2013. Bayesian non-parametrics and the probabilistic approach to modelling. Phil. Trans. R. Soc. A 371 (1984), 20110553.
Gigerenzer, G., 2010. Moral satisficing: Rethinking moral behavior as bounded rationality. Topics in Cognitive Science 2 (3), 528–554.
Goldin, C., Katz, L., 2009. The race between education and technology. Harvard University Press.
Goodwin, P., Wright, G., 2010. The limits of forecasting methods in anticipating rare events. Technological Forecasting and Social Change 77 (3), 355–368.
Goos, M., Konings, J., Vandeweyer, M., 2015. Employment growth in Europe: The roles of innovation, local job multipliers and institutions. Tech. Rep. 10, Utrecht School of Economics Discussion Paper Series.
Goos, M., Manning, A., 2007. Lousy and lovely jobs: the rising polarization of work in Britain. Review of Economics and Statistics 89 (1).
Graetz, G., Michaels, G., 2015. Robots at work. Tech. Rep. 10477, CEPR Discussion Paper.
Granovetter, M., 2017. Society and Economy: Framework and Principles. Harvard University Press.
Gregory, T., Salomons, A., Zierahn, U., 2016. Racing with or against the machine? Evidence from Europe. Tech. Rep. 16-053.
Grinis, I., 2017. The STEM requirements of ‘Non-STEM’ jobs: evidence from UK online vacancy postings and implications for skills & knowledge shortages. Mimeo. URL https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2864225
Guyon, I., Elisseeff, A., 2003. An introduction to variable and feature selection. Journal of Machine Learning Research 3 (Mar), 1157–1182.
Hajkowicz, S. A., Reeson, A., Rudd, L., Bratanova, A., Hodgers, L., Mason, C., Boughen, N., 2016. Tomorrow’s digitally enabled workforce: Megatrends and scenarios for jobs and employment in Australia over the coming twenty years. Australian Policy Online.
Hampden-Thompson, G., Bennett, J., 2013. Science teaching and learning activities and students’ engagement in science. International Journal of Science Education 35 (8), 1325–1343.
Handel, M., 2012. Trends in job skill demands in OECD countries. Tech. Rep. 143, OECD Social, Employment and Migration Working Papers.
Handel, M., 2016. Dynamics of occupational change: Implications for the occupational requirements survey. Mimeo.
Hausmann, R., Hidalgo, C. A., Bustos, S., Coscia, M., Simoes, A., Yildirim, M. A., 2014. The atlas of economic complexity: mapping paths to prosperity. MIT Press.
Heckman, J., 1995. Lessons from the bell curve. Journal of Political Economy 103 (5), 1091–1120.
Heckman, J., Kautz, T., 2012. Hard evidence on soft skills. Labour Economics 19 (4), 451–464.
Hirsch, E. D., 2016. Why knowledge matters: rescuing our children from failed educational theories. Harvard Education Press.
Hyatt, H., Spletzer, J., 2013. The recent decline in employment dynamics. IZA Journal of Labor Economics 2 (5).
Inanc, H., Felstead, A., Gallie, D., Green, F., 2013. Job control in Britain: First findings from the Skills and Employment Survey 2012.
Institute of Medicine, 2008. Retooling for an aging America: building the health care workforce. The National Academies Press, Washington, DC.
International Monetary Fund (IMF), 2017. World economic outlook, April 2017: Gaining momentum?
Jones, C. I., 2016. Life and growth. Journal of Political Economy 124 (2), 539–578.
Kahneman, D., 2011. Thinking, fast and slow. Penguin Press.
Kambourov, G., Manovskii, I., 2009. Occupational specificity of human capital. International Economic Review 50 (1), 63–115.
Kasparov, G., 2017. Deep thinking: where machine intelligence ends and human creativity begins. John Murray.
Kautz, T., Heckman, J. J., Diris, R., ter Weel, B., Borghans, L., Dec. 2014. Fostering and measuring skills: improving cognitive and non-cognitive skills to promote lifetime success. Working Paper 20749, National Bureau of Economic Research.
Keynes, J. M., 1930. Economic possibilities for our grandchildren. Essays in Persuasion (1963), pp. 358–73. W.W. Norton.
King, M., Ruggles, S., Alexander, J. T., Flood, S., Genadek, K., Schroeder, M. B., Trampe, B., Vick, R., 2010. Integrated public use microdata series, current population survey: Version 3.0. [machine-readable database]. Minneapolis: University of Minnesota 20. URL http://doi.org/10.18128/D030.V4.0
Kruse, D., Freeman, R., Blasi, J., Buchele, R., Scharf, A., Rodgers, L., Mackin, C., 2004. Motivating employee-owners in ESOP firms: Human resource policies and company performance. In: Employee Participation, Firm Performance and Survival. Emerald Group Publishing Limited, pp. 101–127.
Lazear, E. P., Shaw, K. L., 2007. Personnel economics: The economist’s view of human resources. The Journal of Economic Perspectives 21 (4), 91–114.
Levy, F., Murnane, R., 2004. The new division of labor: how computers are creating the next job market. Princeton University Press.
Lin, J., 2011. Technological adaptation, cities and new work. Review of Economics and Statistics 93 (2), 554–574.
Liu, Y., Grusky, D., 2013. The payoff to skill in the third industrial revolution. American Journal of Sociology 118 (5), 1330–74.
Lucas, B., Claxton, G., Spencer, E., 2013. Progression in student creativity in school. OECD education working papers, Organisation for Economic Co-operation and Development, Paris.
MacCrory, F., Westerman, G., Alhammadi, Y., Brynjolfsson, E., 2014. Racing with and against the machine: Changes in occupational skill composition in an era of rapid technological advance. Thirty-fifth International Conference on Information Systems, Auckland.
MacKay, D. J., 2003. Information theory, inference and learning algorithms. Cambridge University Press.
ManpowerGroup, 2016. The talent shortage survey. URL http://manpowergroup.com/talent-shortage-2016
Matthews, A. G. d. G., van der Wilk, M., Nickson, T., Fujii, K., Boukouvalas, A., León Villagrá, P., Ghahramani, Z., Hensman, J., 2016. GPflow: A Gaussian process library using TensorFlow. arXiv preprint 1610.08733.
McKinsey Global Institute, 2017. A future that works: Automation, employment and productivity.
Mercier, H., Sperber, D., 2017. The enigma of reason. Harvard University Press.
Michaels, G., Natraj, A., Reenen, J. V., 2009. Has ICT polarized skill demand? Evidence from eleven countries over 25 years. Review of Economics and Statistics 96 (1), 60–77.
Miller, R., 2006. Futures studies, scenarios and the “possibility-space” approach. Schooling for Tomorrow, 93–105.
Mims, C., 2016 September 25. Self-driving hype doesn’t reflect reality. The Wall Street Journal.
Mitchell, T., Brynjolfsson, E., 2017. Track how technology is transforming work. Nature 544, 290–292.
Mokyr, J., Vickers, C., Ziebarth, N., 2015. The history of technological anxiety and the future of economic growth: Is this time different? Journal of Economic Perspectives 29 (3), 31–50.
Montt, G., 2015. The causes and consequences of field-of-study mismatch: An analysis using PIAAC. Tech. Rep. 167, OECD Social, Employment and Migration Working Papers.
Moretti, E., 2012. The new geography of jobs. Houghton Mifflin Harcourt.
Mosca, I., Wright, R., 2013. Is graduate under-employment persistent? Evidence from the United Kingdom. Tech. Rep. 6177, IZA Discussion Paper.
Murphy, K. P., 2012. Machine learning: a probabilistic perspective. MIT Press.
National Research Council, 2012. Education for life and work: Developing transferable knowledge and skills in the 21st century. The National Academies Press. https://doi.org/10.17226/13398.
Nemet, G., Anadon, L., Verdolini, E., 2016. Quantifying the effects of expert selection and elicitation design on experts’ confidence in their judgments about future energy technologies. Risk Analysis 7 (2), 315–330.
Nesta, 2017. Solved! Making the case for collaborative problem-solving.
Occupational Information Network (O*NET), 2017. O*NET online. URL https://www.onetonline.org
Ocejo, R. E., 2017. Masters of craft: old jobs in the new urban economy. Princeton University Press.
Organisation for Economic Co-operation and Development (OECD), 2012. Better Skills, Better Jobs, Better Lives: A Strategic Approach to Skills Policies.
OECD, 2016a. Fostering and assessing students’ creativity and critical thinking in higher education. In: Workshop Summary Report.
OECD, 2016b. Getting skills right: assessing and anticipating changing skill needs.
OECD, 2016c. How good is your job? Measuring and assessing job quality. In: OECD Employment Outlook. pp. 79–139.
OECD, 2016d. OECD Employment Outlook 2016.
OECD, 2017. PISA 2015 Results (Volume 1): Excellence and Equity.
Office for National Statistics (ONS), 2017a. emp04: employment by occupation. URL https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/datasets/employmentbyoccupationemp04
ONS, 2017b. Labour force survey 2017. URL https://discover.ukdataservice.ac.uk/series/?sn=2000026
Pearson, K., 1901. On lines and planes of closest fit to systems of points in space. Philosophical Magazine Series 6 2 (11), 559–572. URL http://dx.doi.org/10.1080/14786440109462720
Phelps, E. S., 2013. Mass Flourishing: How Grassroots Innovation Created Jobs, Challenge and Change. Princeton University Press.
Pierson, P., 2004. Politics in time: history, institutions and social analysis. Princeton University Press.
Poletaev, M., Robinson, C., 2008. Human capital specificity: Evidence from the dictionary of occupational titles and displaced worker surveys, 1984–2000. Journal of Labor Economics 26 (2), 387–420.
Poropat, A., 2009. Meta-analysis of the five-factor model of personality and academic performance. Psychological Bulletin 135 (3), 322–338.
PwC, 2016. The future of work – A journey to 2022.
PwC, 2017. Consumer spending prospects and the impact of automation on jobs. UK Economic Outlook, March.
Rasmussen, C., Williams, C., 2006. Gaussian processes for machine learning. MIT Press.
Reimers, F., Chung, C. (Eds.), 2016. Teaching and learning for the twenty-first century: educational goals, policies and curricula from six nations. Harvard Education Press.
Robinson, C., 2011. Occupational mobility, occupation distance and basic skills: Evidence from job-based skill measures. Tech. Rep. 2011–5, CIBC Working Paper Series.
Rosen, S., 1983. Specialization and human capital. Journal of Labor Economics 1 (1), 43–49.
Rothwell, J., 2013. The hidden STEM economy. Metropolitan Policy Program at Brookings. URL http://kstp.com/kstpImages/repository/cs/files/TheHiddenSTEMEconomy610.pdf
Şahin, A., Song, J., Topa, G., Violante, G., 2014. Mismatch unemployment. American Economic Review 104 (11), 3529–64.
Schleicher, A., 2015 December 17. How can we equip the future workforce for technological change? World Economic Forum Agenda (blog).
Schneider, P., Armstrong, H., Bakhshi, H., 2017. The Future of US Work and Skills: trends impacting on employment in 2030. Tech. rep., Nesta/Pearson.
Schunk, D. H., Zimmerman, B. J., 2007. Influencing children’s self-efficacy and self-regulation of reading and writing through modeling. Reading & Writing Quarterly 23 (1), 7–25.
Shackle, G., 1972. Epistemics and economics: a critique of economic doctrines. Transaction Publishers.
UK Commission for Employment and Skills, 2014. The future of work: Jobs and skills in 2030. Tech. Rep. Evidence Report 84.
UK Commission for Employment and Skills, 2017a. LMI for all. URL http://www.lmiforall.org.uk/about-lmi-for-all/
UK Commission for Employment and Skills, 2017b. LMI for all. URL http://api.lmiforall.org.uk
US Bureau of Labor Statistics, 2010. 2010 SOC user guide. URL https://www.bls.gov/soc/soc_2010_user_guide.pdf
US Bureau of Labor Statistics, 2015. May 2015 occupational employment statistics. URL https://www.bls.gov/oes/2015/may/oes_nat.htm
US Bureau of Labor Statistics, 2016. Replacement Needs. URL https://www.bls.gov/emp/ep_table_110.htm.
Van Rens, T., 2015. The skills gap: Is it a myth? Global Perspectives Series: Paper 5, Social Market Foundation.
Waegeman, W., De Baets, B., Boullart, L., 2008. ROC analysis in ordinal regression learning. Pattern Recognition Letters 29 (1), 1–9.
Weaver, A., Osterman, P., 2017. Skill demands and mismatch in US manufacturing. ILR Review 70 (2), 275–307.
Weinstein, R. S., 2002. Reaching Higher. Harvard University Press.
Shah, J., Wiken, J., Williams, B., Breazeal, C., 2011. Improved
humanrobot team performance using chaski, a human- Woolley, A., Chabris, C., Pentland, A., Hashmi, N., Malone,
inspired plan execution system. In: Proceedings of the 6th T., 2010. Evidence for a collective intelligence factor in
International Conference on Human–Robot Interaction. HRI the performance of human groups. Science 330 (6004),
’11. ACM, New York, NY, USA, pp. 29–36. 686–688.
Shiller, R. J., 2017. Understanding Today’s Stagnation. URL World Bank, 2013. World Development Report 2013: Jobs.
https://www.project-syndicate.org/commentary/ secular- World Bank Publishing.
stagnation-future-of-work-fears-by-robert-j--shiller-2017-05.
World Economic Forum (WEF), 2016. The future of jobs:
Simonite, T., 2016. Prepare to be underwhelmed by 2021’s Employment, skills and workforce strategy for the fourth
autonomous cars. MIT Technology Review. industrial revolution. World Economic Forum.
Spitz-Oener, A., 2006. Technical change, job tasks and rising Wyatt, I., September 2010. Evaluating the 1996-2006
educational demands: Looking outside the wage structure. employment projections. BLS Monthly Labor Review.
Journal of Labor Economics 24 (2), 235–270.
Multinomial Distribution
A distribution that shows the likelihood of the possible
results of an experiment with repeated trials in which each
trial can result in a specified number of outcomes that is
greater than two.
Noisy
Statistical noise is a term that refers to the unexplained
variation or randomness that is found within a given data
sample.
Non-Parametric Extrapolation
Extrapolation is the action of estimating or concluding
something by assuming that existing trends will continue.
When using a non-parametric method to do this, the
boundaries of what is possible come from the data (or the
training set), rather than the statistical model.
Occam’s Razor
A scientific and philosophical rule that entities should
not be multiplied unnecessarily, which is interpreted as
requiring that the simplest of competing theories be
preferred to the more complex, or that explanations of
unknown phenomena be sought first in terms of known
quantities.
Over-Fitting
A modelling error that occurs when a function is fitted
too closely to a specific set of data points, limiting its
generalisability.
Python Library
A Python library is a collection of functions and methods
that allows you to perform lots of actions without writing
your own code.
Time-Series
A time series is a sequence of numerical data points in
successive order.
Uncertainty Sampling
An active learning approach in which the machine
learning algorithm selects the documents about whose
relevance it is least certain, for coding by the subject
matter expert(s) and addition to the training set.
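
As a rough illustration of the Uncertainty Sampling entry, the sketch below picks the item a learner would send to an expert next. It assumes that uncertainty is measured by the Shannon entropy of each item's predicted class probabilities, which is one common choice and not necessarily the measure used in this report. The function names and the probabilities are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy of one predictive distribution (zero terms are skipped)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def most_uncertain(predictions):
    """Index of the item whose predicted class probabilities have the highest entropy."""
    return max(range(len(predictions)), key=lambda i: entropy(predictions[i]))


# Hypothetical predicted probabilities for four unlabelled items over three classes.
predicted = [
    [0.90, 0.05, 0.05],  # confident
    [0.34, 0.33, 0.33],  # close to uniform, so the least certain
    [0.10, 0.80, 0.10],
    [0.60, 0.30, 0.10],
]

# The selected item would be labelled by the expert and added to the training set.
next_item = most_uncertain(predicted)
```

Other uncertainty measures, such as the margin between the two most probable classes, would fit the same glossary definition.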
ENDNOTES
1
Emblematic of this interest was the publication of the BLS Occupational Outlook Handbook and the OECD’s Mediterranean Regional Project, which popularised the use of manpower planning in developed and developing countries.
2
Ireland, for example, uses foresight in its various sector studies, while Germany’s BIBB-IAB Qualification and Occupational Fields programme produces qualitative scenarios to contrast with its baseline quantitative projections.
3
Making predictions about technological progress is also notoriously difficult (Armstrong et al., 2014).
4
The direction of employment change – in both absolute and relative terms – was projected accurately for 70% of detailed occupations included in Handel’s evaluation.
5
An interesting study on the short-run dislocations of labour-saving technology is Caprettini and Voth (2017). It examines the diffusion of threshing machines through the English countryside during the 1830s and its impact on social unrest, the so-called ‘Captain Swing’ riots. To measure diffusion, the study uses farm advertisements in local newspapers, which provide detail on the location and use intensity of new threshing machines. The take-up of new technology was not exogenous, making causal assignment difficult. For example, landlords, alarmed by the outbreak of violence, might have introduced fewer machines, which would bias estimates downwards. To identify causality, the authors use soil suitability for wheat as an instrument for the adoption of threshing technology. The reason for this is that wheat was the only grain that could be threshed profitably by early threshing technology. This instrument is valid insofar as it does not affect rural workers’ propensity to riot other than through its effect on the adoption of technology. Thus, among other things, wheat-growing areas were not poorer than other areas. The authors find that areas more suited to wheat cultivation exhibit both greater adoption of threshing machines and significantly higher incidence of riots.
6
By mimic, we refer to the ability of a machine to replicate or surpass the results of a human rather than achieve those results in the same way.
7
This poses a puzzle – if Arntz et al. (2016) proceed from a similar assessment of occupation-level automatability as Frey and Osborne (2017), why do they arrive at such sharply different results? While the task-based approach may explain some of the difference, it is arguably overstated by aspects of their research design. This can be understood on three levels. First, the PIAAC data is available only at a two-digit International Standard Classification of Occupations (ISCO) level, in contrast to data on detailed occupations used by Frey and Osborne. Studying occupations in aggregate is likely to push employment towards the medium risk category insofar as it washes out variation between occupations’ automatability at a more granular level. Second, differences may arise from the classification method used by Arntz et al. (2016). Their modified logistic regression implies a linear relationship between features and whether a job is automatable. This is a simpler and less flexible model than Frey and Osborne’s and again is likely to default towards predictive probabilities in the middle. Third, the authors include a number of variables such as gender, education, income, sector and firm size as predictors of automatability, even though these are not obviously supported or interpreted in terms of economic theory. In addition, PwC (2017) finds that some of these results are an artefact of which variables from the PIAAC database are used. Its own estimates, using a different set of occupational features, are still lower than those of Frey and Osborne, but still much closer to them than to the estimates of Arntz et al.
8
See also Rosen (1983) on the indivisibility of occupations.
9
In the 19th century, 98% of the labour required to weave cloth was automated but employment in the weaving industry still increased due to increased demand for cheaper clothes.
10
Exceptions include the public sector and non-automated manufacturing industries such as recycling, basic metals, textiles, paper, furniture and transportation equipment.
11
The occupations labelled come from the US Bureau of Labor Statistics (BLS) 2010 SOC and the UK Office for National Statistics (ONS) SOC 2010.
12
The selection of the first 10 occupations presented to the experts was random: specifically, 10 occupations were randomly selected, but individual occupations were replaced with another randomly selected occupation when historical time-series were not available at least back to 1983 for the US and 2001 for the UK. This constraint meant that, in the US, we drew the occupations from 125 of the total 840 six-digit SOC codes and from 163 of the total 369 four-digit SOC codes in the UK.
13
As explained in 4.2, we adopt six-digit US SOC codes for US data. When we use US O*NET data for similar UK occupations, we present UK occupations at the four-digit UK SOC code level. The US and UK systems are at the same level of detail, in that they refer to detailed occupations.
14
According to O*NET, skills represent developed capacities which facilitate learning or the more rapid acquisition of knowledge; abilities are enduring attributes of the individual which influence performance; and knowledge refers to organised sets of principles and facts applying in general domains.
15
No dataset equivalent to O*NET exists for UK occupations. Instead, we perform a bespoke crosswalk from UK occupations to the closest-matching US occupation, as determined by using the 'LMI for All' database (UK Commission for Employment and Skills, 2017a) with some custom changes that we explain.
16
Consider engineers and metal workers and plastic workers – two occupation groups which the BLS predicts will decline in share between 2014 and 2024. Over this period, metal workers and plastic workers are anticipated to lose 99,000 jobs, whereas engineers are expected to add 65,000 jobs.
17
Observations consisted only of the ternary-valued labels and participant uncertainty was not incorporated. Reflecting the dataset used in Frey and Osborne (2017), only nine features were used, namely: Originality, Systems Evaluation, Hearing Sensitivity, Arm-Hand Steadiness, Learning Strategies, Oral Comprehension, Social Perceptiveness, Manual Dexterity, Problem Sensitivity.
18
Note that the Pearson correlation coefficient is often used for feature selection (as a filter), an exception to the inappropriate information-theoretic methods of feature selection we generically describe above.
19
To be precise, this entails only redefining the expectations above as employment-weighted sums over that subset of occupations, rather than over all occupations.
20
Our definition of complementarity is loosely related to that used by economists: two features are complementary if the marginal value product of one is increasing in the level of the second. The economists' definition fails to meet our needs. The first problem is that the definition makes no accommodation for location in feature space. The economics definition is a statement about the second-order derivative of the function with respect to the two features being positive; for an arbitrary function, as may be learned by our flexible non-parametric model, the second-order derivatives may be very different in different regions of space. The second related problem with the definition is that it may lead to highlighting feature combinations which, even if the second-order derivatives are positive and constant across space, are actively harmful. As a simple example, for a bivariate quadratic function with positive-definite Hessian (a convex bowl), an occupation on the wrong side of the critical point (the minimiser, or the location of the bottom of the bowl) would see its demand decreased with increases in either or both of the features. Our means of assessing complementarity, however, will more correctly identify the differing importance of feature combinations at any point in feature space.
21
This is clearest in the case of ‘Public services and other associate professionals’ and ‘Welfare professionals’. However, all other three-digit public sector occupations, i.e. ‘Senior officers in protective services’, ‘Protective services’, ‘Quality and regulatory professionals’, ‘Health associate professionals’ and ‘Welfare and housing associate professionals’, have an average probability of increased workforce share > 0.5.
22
Green occupations are defined here as ‘Waste disposal and environmental services’, ‘Conservation professionals’, ‘Environment professionals’, ‘Environmental health professionals’ and ‘Conservation and environmental associate professionals’ (four-digit level).
23
Specifically, Bright Outlook occupations are ones that: are projected to grow much faster than average (employment increase of 14% or more) over the period 2014–2024; have 100,000 or more job openings over the period 2014–2024; or are new and emerging occupations in a high-growth industry.
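
The convex-bowl example in note 20 can be checked numerically. The sketch below is only an illustration under assumed values: the quadratic `demand` function, its coefficients and the evaluation point are hypothetical and are not taken from the report's model. It shows a function whose cross-partial derivative is positive everywhere (so the two features count as complementary under the economists' definition), yet at a point on the wrong side of the minimiser, raising either feature, or both, lowers demand.

```python
def demand(x, y):
    """Hypothetical convex bowl: f(x, y) = x^2 + x*y + y^2, minimised at (0, 0)."""
    return x**2 + x * y + y**2


def gradient(x, y):
    """Analytic first derivatives (df/dx, df/dy) of the bowl."""
    return (2 * x + y, x + 2 * y)


# Analytic second derivatives are constant: the Hessian is [[2, 1], [1, 2]].
HESSIAN = ((2.0, 1.0), (1.0, 2.0))
CROSS_PARTIAL = HESSIAN[0][1]  # d2f/dxdy = 1 > 0 everywhere

# Positive definite: both leading principal minors are positive.
is_positive_definite = (
    HESSIAN[0][0] > 0
    and HESSIAN[0][0] * HESSIAN[1][1] - HESSIAN[0][1] * HESSIAN[1][0] > 0
)

# A hypothetical occupation located on the wrong side of the minimiser.
x0, y0 = -2.0, -1.0
step = 0.1

base = demand(x0, y0)
more_x = demand(x0 + step, y0)            # increase the first feature only
more_y = demand(x0, y0 + step)            # increase the second feature only
more_both = demand(x0 + step, y0 + step)  # increase both features together
```

Here `more_x`, `more_y` and `more_both` are all below `base`, even though `CROSS_PARTIAL` is positive, which is the situation the note describes.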