F. Petropoulos, D. Apiletti, V. Assimakopoulos et al.

International Journal of Forecasting 38 (2022) 705–871

e.g., Pearl (2009) for a recent extensive coverage). While causality has been a key topic of interest to forecasters for a long time already, new approaches and concepts are being pushed forward for identification of and inference in causal models (Peters, Janzing, & Schölkopf, 2017), which may have a significant impact on the theory of forecasting.

Eventually, the key question of what a good forecast is will continue to steer new developments in the theory of forecasting in the foreseeable future. The nature of goodness of forecasts (seen from the meteorological application angle) was theorised a few decades ago already (Murphy, 1993), based on consistency, quality and value. We still see the need to work further on that question – possibly considering these 3 pillars, but possibly also finding other ways to define desirable properties of forecasts. This will, in all cases, translate to further developing frameworks for forecast verification, focusing on the interplay between forecast quality and value, but also better linking to psychology and behavioural economics. In terms of forecast verification, some of the most pressing areas most likely relate to (multivariate) probabilistic forecasting and to the forecasting of extreme events. When it comes to forecast quality and value, we need to go beyond simply plugging forecasts into decision problems to assess whether this yields better decisions, or not. Instead, we ought to propose suitable theoretical frameworks that allow assessing whether certain forecasts are fundamentally better (than others) for given classes of decision problems. Finally, the link to psychology and behavioural economics should ensure a better appraisal of how forecasts are to be communicated, how they are perceived and acted upon.

Most of the advances in the science of forecasting have come from the complementarity between theoretical developments and applications. We can then only be optimistic for the future since more and more application areas are relying heavily on forecasting. Their specific needs and challenges will continue fuelling upcoming developments in the theory of forecasting.

3. Practice

3.1. Introduction to forecasting practice91

91 This subsection was written by Michael Gilliland.

The purpose of forecasting is to improve decision making in the face of uncertainty. To achieve this, forecasts should provide an unbiased guess at what is most likely to happen (the point forecast), along with a measure of uncertainty, such as a prediction interval (PI). Such information will facilitate appropriate decisions and actions.

Forecasting should be an objective, dispassionate exercise, one that is built upon facts, sound reasoning, and sound methods. But since forecasts are created in social settings, they are influenced by organisational politics and personal agendas. As a consequence, forecasts will often reflect aspirations rather than unbiased projections.

In organisations, forecasts are created through processes that can involve multiple steps and participants. The process can be as simple as executive fiat (also known as evangelical forecasting), unencumbered by what the data show. More commonly, the process begins with a statistical forecast (generated by forecasting software), which is then subject to review and adjustment, as illustrated in Fig. 5.

In concept, such an elaborate multi-stage process allows "management intelligence" to improve forecast quality, incorporating information not accounted for in the statistical model. In reality, however, benefits are not assured. Lawrence, Goodwin, O'Connor, and Önkal (2006) reviewed more than 200 studies, concluding that human judgment can be of significant benefit but is also subject to significant biases. Among the many papers on this subject, there is general agreement on the need to track and review overrides, and the need to better understand the psychological issues around judgmental adjustments.

The underlying problem is that each human touch point subjects the forecast to the interests of the reviewers – and these interests may not align with creating an accurate, unbiased forecast. To identify where such problems are occurring, Forecast Value Added (FVA) analysis is an increasingly popular approach among practitioners. FVA is defined as the change in a forecasting performance metric that can be attributed to a particular step or participant in the forecasting process (Gilliland, 2002). Any activity that fails to deliver positive FVA (i.e., fails to improve forecast quality) is considered process waste.

Starting with a naive forecast, FVA analysis seeks to determine whether each subsequent step in the process improves upon the prior steps. The "stairstep report" of Table 1 is a familiar way of summarising results, as in this example from Newell Rubbermaid (Schubert & Rickard, 2011).

Here, averaged across all products, the naive (random walk) forecast achieved forecast accuracy of 60%. The company's statistical forecast delivered five percentage points of improvement, but management review and adjustment delivered negative value. Such findings – not uncommon – urge further investigation into causes and possible process corrections (such as training reviewers or limiting adjustments). Alternatively, the management review step could be eliminated, providing the dual benefits of freeing up management time spent on forecasting and, on average, more accurate forecasts.

Morlidge (2014c) expanded upon FVA analysis to present a strategy for prioritising judgmental adjustments, finding the greatest opportunity for error reduction in products with high volume and high relative absolute error. Chase (2021) described a machine learning (ML) method to guide forecast review, identifying which forecasts are most likely to benefit from adjustment along with a suggested adjustment range. Baker (2021) used ML classification models to identify characteristics of non-value adding overrides, proposing the behavioural economics notion of a "nudge" to prompt desired forecaster behaviour. Further, Goodwin, Petropoulos, and Hyndman (2017) derived upper bounds for FVA relative to naive forecasts. And de Kok (2017) created a Stochastic Value Added (SVA) metric to assess the difference between actual and forecasted distributions, knowledge of which is valuable for inventory management.

Fig. 5. Multi-stage forecasting process.

Table 1
Stairstep report showing FVA results.

Process step           Forecast accuracy    FVA vs. Naive    FVA vs. Statistical
                       (100% − MAPE)
Naive forecast         60%                  –                –
Statistical forecast   65%                  5%               –
Adjusted forecast      62%                  2%               −3%
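To make the FVA calculation concrete, the following is a minimal sketch (not code from the article; the function names and illustrative data are our own) of how the stairstep numbers in Table 1 arise: accuracy is measured as 100% − MAPE, and each step's FVA is its accuracy minus that of the comparison step.

```python
def accuracy(actual, forecast):
    """Forecast accuracy measured as 100 - MAPE, the metric used in Table 1."""
    n = len(actual)
    mape = 100 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / n
    return 100 - mape

def fva_stairstep(actual, steps):
    """Build stairstep rows from an ordered list of (name, forecast) pairs,
    with the naive forecast first. Each row holds (name, accuracy,
    FVA vs. naive, FVA vs. previous step); the FVA entries are None
    for the naive baseline itself."""
    rows, naive_acc, prev_acc = [], None, None
    for name, forecast in steps:
        acc = accuracy(actual, forecast)
        if naive_acc is None:
            rows.append((name, acc, None, None))
            naive_acc = acc
        else:
            rows.append((name, acc, acc - naive_acc, acc - prev_acc))
        prev_acc = acc
    return rows
```

With illustrative demand of 100 units per period, forecasts of 60, 65 and 62 units reproduce the 60%/65%/62% accuracies and the 5%, 2% and −3% FVA figures of Table 1; any step with negative FVA is a candidate for elimination as process waste.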

Including an indication of uncertainty around the point forecast remains an uncommon practice. Prediction intervals in software generally underestimate uncertainty, often dramatically, leading to unrealistic confidence in the forecast. And even when provided, PIs largely go unused by practitioners. Goodwin (2014) summarised the psychological issues, noting that the generally poor calibration of the PIs may not explain the reluctance to utilise them. Rather, "an interval forecast may accurately reflect the uncertainty, but it is likely to be spurned by decision makers if it is too wide and judged to be uninformative" (Goodwin, 2014, page 5).

It has long been recognised (Chatfield, 1986; Lawrence, 2000) that the practice of forecasting falls well short of the potential exhibited in academic research, and revealed by the M forecasting competitions. In the M4, a simple benchmark combination method (the average of Single, Holt, and Damped exponential smoothing) reduced the overall weighted average (OWA) error by 17.9% compared to naive. The top six performing methods in M4 further reduced OWA by over 5% compared to the combination benchmark (Makridakis, Spiliotis et al., 2020). But in forecasting practice, just bettering the accuracy of naive has proven to be a surprising challenge. Morlidge's (2014b) study of eight consumer and industrial businesses found 52% of their forecasts failed to do so. And, as shown, Newell Rubbermaid beat naive by just two percentage points after management adjustments.

Ultimately, forecast accuracy is limited by the nature of the behaviour being forecast. But even a highly accurate forecast is of little consequence if overridden by management and not used to enhance decision making and improve organisational performance.

Practitioners need to recognise limits to forecastability and be willing to consider alternative (non-forecasting) approaches when the desired level of accuracy is not achievable (Gilliland, 2010). Alternatives include supply chain re-engineering – to better react to unforeseen variations in demand – and demand smoothing – leveraging pricing and promotional practices to shape more favourable demand patterns.

Despite measurable advances in our statistical forecasting capabilities (Makridakis, Hyndman & Petropoulos, 2020), it is questionable whether forecasting practice has similarly progressed. The solution, perhaps, is what Morlidge (2014a, page 39) suggests: that "users should focus less on trying to optimise their forecasting process than on detecting where their process is severely suboptimal and taking measures to redress the problem". This is where FVA can help.

For now, the challenge for researchers remains: to prompt practitioners to adopt sound methods based on the objective assessment of available information, and to avoid the "worst practices" that squander resources and fail to improve the forecast.

3.2. Operations and supply chain management

3.2.1. Demand management92

92 This subsection was written by Yanfei Kang.

Demand management is one of the dominant components of supply chain management (Fildes, Goodwin, & Lawrence, 2006). An accurate demand estimate of the present and future is a first vital step for almost all aspects of supply chain optimisation, such as inventory management, vehicle scheduling, workforce planning, and distribution and marketing strategies (Kolassa & Siemsen, 2016). Simply speaking, better demand forecasts can yield significantly better supply chain management, including improved inventory management and increased service levels. Classic demand forecasts mainly rely on qualitative techniques, based on expert judgment and past experience (e.g., Weaver, 1971), and quantitative techniques, based on statistical and machine learning modelling (e.g., Bacha & Meyer, 1992; Taylor, 2003b). A combination of qualitative and quantitative methods is also popular and has proven to be beneficial in practice through, e.g., judgmental adjustments (Önkal & Gönül, 2005; Syntetos, Kholidasari & Naim, 2016; Turner, 1990, and Section 2.11.2), judgmental forecast model selection (Han et al., 2019; Petropoulos et al., 2018b, and Section 2.11.3), and other advanced forecasting support systems (Arvan, Fahimnia, Reisi, & Siemsen, 2019; Baecke, De Baets, & Vanderheyden, 2017, see also Section 3.7.1).

The key challenges that demand forecasting faces vary from domain to domain. They include:

1. The existence of intermittent demands, e.g., irregular demand patterns of fashion products. According to Nikolopoulos (2020), limited literature has focused on intermittent demand. The seminal work by Croston (1972) was followed by other representative methods such as the SBA method by Syntetos


and Boylan (2001), the aggregate–disaggregate intermittent demand approach (ADIDA) by Nikolopoulos et al. (2011), the multiple temporal aggregation by Petropoulos and Kourentzes (2015), and the k nearest neighbour (kNN) based approach by Nikolopoulos et al. (2016). See Section 2.8 for more details on intermittent demand forecasting and Section 2.10.2 for a discussion on temporal aggregation.

2. The emergence of new products. Recent studies on new product demand forecasting are based on finding analogies (Hu, Acimovic, Erize, Thomas, & Van Mieghem, 2019; Wright & Stern, 2015), leveraging comparable products (Baardman, Levin, Perakis, & Singhvi, 2018), and using external information like web search trends (Kulkarni, Kannan, & Moe, 2012). See Section 3.2.6 for more details on new product demand forecasting.

3. The existence of short-life-cycle products, e.g., smartphone demand (e.g., Chung, Niu, & Sriskandarajah, 2012; Shi, Yin, Cai, Cichocki, Yokota, Chen, Yuan, & Zeng, 2020; Szozda, 2010).

4. The hierarchical structure of the data, such as electricity demand mapped to a geographical hierarchy (e.g., Athanasopoulos et al., 2009; Hong et al., 2019; Hyndman et al., 2011, but also Section 2.10.1).

With the advent of the big data era, a couple of co-existing new challenges have drawn the attention of researchers and practitioners in the forecasting community: the need to forecast a large volume of related time series (e.g., thousands or millions of products from one large retailer: Salinas, Michael et al., 2019), and the increasing number of external variables that have a significant influence on future demand (e.g., massive amounts of keyword search indices that could impact future tourism demand; Law, Li, Fong, & Han, 2019). Recently, to deal with these new challenges, numerous empirical studies have identified the potential of deep learning based global models, in both point and probabilistic demand forecasting (e.g., Bandara, Bergmeir, & Smyl, 2020b; Rangapuram et al., 2018; Salinas, Michael et al., 2019; Wen et al., 2017). With the merits of cross-learning, global models have been shown to be able to learn long memory patterns and related effects (Montero-Manso & Hyndman, 2020) and latent correlation across multiple series (Smyl, 2020), to handle complex real-world forecasting situations such as data sparsity and cold-starts (Chen, Kang, Chen, & Wang, 2020), to include exogenous covariates such as promotional information and keyword search indices (Law et al., 2019), and to allow for different choices of distributional assumptions (Salinas, Michael et al., 2019).

3.2.2. Forecasting in the supply chain93

93 This subsection was written by Paul Goodwin.

A supply chain is 'a network of stakeholders (e.g., retailers, manufacturers, suppliers) who collaborate to satisfy customer demand' (Perera et al., 2019). Forecasts inform many supply chain decisions, including those relating to inventory control, production planning, cash flow management, logistics and human resources (also see Section 3.2.1). Typically, forecasts are based on an amalgam of statistical methods and management judgment (Fildes & Goodwin, 2007). Hofmann and Rutschmann (2018) have investigated the potential for using big data analytics in supply chain forecasting but indicate more research is needed to establish its usefulness.

In many organisations forecasts are a crucial element of Sales and Operations Planning (S&OP), a tool that brings together different business plans, such as those relating to sales, marketing, manufacturing and finance, into one integrated set of plans (Thomé, Scavarda, Fernandez, & Scavarda, 2012). The purposes of S&OP are to balance supply and demand and to link an organisation's operational and strategic plans. This requires collaboration between individuals and functional areas at different levels because it involves data sharing and achieving a consensus on forecasts and common objectives (Mello, 2010). Successful implementations of S&OP are therefore associated with forecasts that are both aligned with an organisation's needs and able to draw on information from across the organisation. This can be contrasted with the 'silo culture' identified in a survey of companies by Moon, Mentzer, and Smith (2003), where separate forecasts were prepared by different departments in 'islands of analysis'.

Methods for reconciling forecasts at different levels in both cross-sectional hierarchies (e.g., national, regional and local forecasts) and temporal hierarchies (e.g., annual, monthly and daily forecasts) are also emerging as an approach to break through information silos in organisations (see Section 2.10.1, Section 2.10.2, and Section 2.10.3). Cross-temporal reconciliation provides a data-driven approach that allows information to be drawn from different sources and levels of the hierarchy and enables this to be blended into coherent forecasts (Kourentzes & Athanasopoulos, 2019).

In some supply chains, companies have agreed to share data and jointly manage planning processes in an initiative known as Collaborative Planning, Forecasting, and Replenishment (CPFR) (Seifert, 2003, also see Section 3.2.3). CPFR involves pooling information on inventory levels and on forthcoming events, like sales promotions. Demand forecasts can be shared, in real time via the Internet, and discrepancies between them reconciled. In theory, information sharing should reduce forecast errors. This should mitigate the 'bullwhip effect', where forecast errors at the retail end of supply chains cause upstream suppliers to experience increasingly volatile demand, forcing them to hold high safety stock levels (Lee et al., 2007). Much research demonstrating the benefits of collaboration has involved simulated supply chains (Fildes, 2017). Studies of real companies have also found improved performance through collaboration (e.g., Boone & Ganeshan, 2008; Eksoz, Mansouri, Bourlakis, & Önkal, 2019; Hill, Zhang, & Miller, 2018), but case study evidence is still scarce (Syntetos, Babai, Boylan, Kolassa, & Nikolopoulos, 2016). The implementation of collaborative schemes has been slow, with many not progressing beyond the pilot stage (Galbreth, Kurtuluş, & Shor, 2015; Panahifar, Byrne, & Heavey, 2015). Barriers to successful implementation include a lack of trust between organisations, reward systems that foster a silo mentality, fragmented forecasting systems within companies, incompatible systems, a lack of relevant training and the absence of top management support (Fliedner, 2003; Thomé, Hollmann, & Scavarda do Carmo, 2014).

Initiatives to improve supply chain forecasting can be undermined by political manipulation of forecasts and gaming. Examples include 'enforcing': requiring inflated forecasts to align them with sales or financial goals; 'sandbagging': underestimating sales so staff are rewarded for exceeding forecasts; and 'spinning': manipulating forecasts to garner favourable reactions from colleagues (Mello, 2009). Pennings, van Dalen, and Rook (2019) discuss schemes for correcting such intentional biases.

For a discussion of the forecasting of returned items in supply chains, see Section 3.2.9, while Section 3.9 offers a discussion of possible future developments in supply chain forecasting.

3.2.3. Forecasting for inventories94

94 This subsection was written by John E. Boylan.

Three aspects of the interaction between forecasting and inventory management have been studied in some depth and are the subject of this review: the bullwhip effect, forecast aggregation, and performance measurement.

The 'bullwhip effect' occurs whenever there is amplification of demand variability through the supply chain (Lee, Padmanabhan, & Whang, 2004), leading to excess inventories. This can be addressed by supply chain members sharing downstream demand information, at stock keeping unit level, to take advantage of less noisy data. Analytical results on the translation of ARIMA (see Section 2.3.4) demand processes have been established for order-up-to inventory systems (Gilbert, 2005). There would be no value in information sharing if the wholesaler can use such relationships to deduce the retailer's demand process from their orders (see, for example, Graves, 1999). Such deductions assume that the retailer's demand process and demand parameters are common knowledge to supply chain members. Ali and Boylan (2011) showed that, if such common knowledge is lacking, there is value in sharing the demand data itself, and Ali, Boylan, and Syntetos (2012) established relationships between accuracy gains and inventory savings. Analytical research has tended to assume that demand parameters are known. Pastore, Alfieri, Zotteri, and Boylan (2020) investigated the impact of demand parameter uncertainty, showing how it exacerbates the bullwhip effect.

Forecasting approaches have been developed that are particularly suitable in an inventory context, even if not originally proposed to support inventory decisions. For example, Nikolopoulos et al. (2011) proposed that forecasts could be improved by aggregating higher frequency data into lower frequency data (see also Section 2.10.2; other approaches are reviewed in Section 3.2.1). Following this approach, forecasts are generated at the lower frequency level and then disaggregated, if required, to the higher frequency level. For inventory replenishment decisions, the level of aggregation may conveniently be chosen to be the lead time, thereby taking advantage of the greater stability of data at the lower frequency level, with no need for disaggregation.

The variance of forecast errors over the lead time is required to determine safety stock requirements for continuous review systems. The conventional approach is to take the variance of one-step-ahead errors and multiply it by the lead time. However, this estimator is unsound, even if demand is independent and identically distributed, as explained by Prak, Teunter, and Syntetos (2017). A more direct approach is to smooth the mean square errors over the lead time (Syntetos & Boylan, 2006).

Strijbosch and Moors (2005) showed that unbiased forecasts will not necessarily lead to achievement, on average, of target cycle service levels or fill rates. Wallström and Segerstedt (2010) proposed a 'Periods in Stock' measure, which may be interpreted, based on a 'fictitious stock', as the number of periods a unit of the forecasted item has been in stock or out of stock. Such measures may be complemented by a detailed examination of error-implication metrics (Boylan & Syntetos, 2006). For inventory management, these metrics will typically include inventory holdings and service level implications (e.g., cycle service level, fill rate). Comparisons may be based on total costs or via 'exchange curves', showing the trade-offs between service and inventory holding costs. Comparisons such as these are now regarded as standard in the literature on forecasting for inventories and align well with practice in industry.

3.2.4. Forecasting in retail95

95 This subsection was written by Stephan Kolassa & Patrícia Ramos.

Retail companies depend crucially on accurate demand forecasting to manage their supply chain and make decisions concerning planning, marketing, purchasing, distribution and labour force. Inaccurate forecasts lead to unnecessary costs and poor customer satisfaction. Inventories should be neither too high (to avoid waste and extra costs of storage and labour force), nor too low (to prevent stock-outs and lost sales; Ma & Fildes, 2017).

Forecasting retail demand happens in a three-dimensional space (Syntetos et al., 2016): the position in the supply chain hierarchy (store, distribution centre, or chain), the level in the product hierarchy (SKU, brand, category, or total) and the time granularity (day, week, month, quarter, or year). In general, the higher the position in the supply chain, the lower the time granularity required; e.g., retailers need daily forecasts for store replenishment and weekly forecasts for DC distribution/logistics activities at the SKU level (Fildes, Ma, & Kolassa, 2019b). Hierarchical forecasting (see Section 2.10.1) is a promising tool to generate coherent demand forecasts on multiple levels over different dimensions (Oliveira & Ramos, 2019).

Several factors affect retail sales, which often increase substantially during holidays, festivals, and other special events. Price reductions and promotions on own and competitors' products, as well as weather conditions or pandemics, can also change sales considerably (Huang, Fildes, & Soopramanien, 2019).
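The two lead-time variance estimators discussed under forecasting for inventories above can be sketched as follows. This is a simplified illustration, not code from the cited papers; the function names and the smoothing constant are our own assumptions.

```python
def conventional_lead_time_variance(one_step_errors, lead_time):
    """Conventional estimator: sample variance of one-step-ahead forecast
    errors, multiplied by the lead time. As Prak, Teunter, and Syntetos
    (2017) explain, this scaling is unsound even for i.i.d. demand."""
    n = len(one_step_errors)
    mean = sum(one_step_errors) / n
    variance = sum((e - mean) ** 2 for e in one_step_errors) / (n - 1)
    return lead_time * variance

def smoothed_lead_time_mse(lead_time_errors, alpha=0.25):
    """More direct estimator in the spirit of Syntetos and Boylan (2006):
    exponentially smooth the squared errors of forecasts made over the
    whole lead time. alpha is an assumed smoothing constant."""
    mse = lead_time_errors[0] ** 2  # initialise with the first squared error
    for e in lead_time_errors[1:]:
        mse = alpha * e ** 2 + (1 - alpha) * mse
    return mse
```

The smoothed mean square error can then feed directly into a safety stock calculation for a continuous review system, avoiding the unsound lead-time scaling of one-step-ahead error variance.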


Zero sales due to stock-outs or low demand occur very often at the SKU × store level, both at weekly and daily granularity. The most appropriate forecasting approaches for intermittent demand are Croston's method (Croston, 1972), the Syntetos-Boylan approximation (SBA; Syntetos & Boylan, 2005), and the TSB method (Teunter, Syntetos, & Zied Babai, 2011), all introduced in Section 2.8.1. These methods have been used to forecast sales of spare parts in automotive and aerospace industries but have not yet been evaluated in the retail context.

Univariate forecasting models are the most basic methods retailers may use to forecast demand. They range from simple methods such as simple moving averages or exponential smoothing to ARIMA and ETS models (discussed in Section 2.3). These are particularly appropriate to forecast demand at higher aggregation levels (Ramos & Oliveira, 2016; Ramos, Santos, & Rebelo, 2015). The main advantage of linear causal methods such as multiple linear regression is to allow the inclusion of the external effects discussed above. There is no clear evidence yet that nonlinear models and novel machine learning methods can improve forecast accuracy (Fildes et al., 2019b).

To be effective, point estimates should be combined with quantile predictions or prediction intervals for determining safety stock amounts needed for replenishment. However, to the best of our knowledge this is an under-investigated aspect of retail forecasting (Kolassa, 2016; Taylor, 2007).

The online channel accounts for an ever-increasing proportion of retail sales and poses unique challenges to forecasting, beyond the characteristics of brick and mortar (B&M) retail stores. First, there are multiple drivers or predictors of demand that could be leveraged in online retail, but not in B&M:

• Online retailers can fine-tune customer interactions, e.g., through the landing page, product recommendations, or personalised promotions, leveraging the customer's purchasing, browsing or returns history, current shopping cart contents, or the retailer's stock position, in order to tailor the message to one specific customer in a way that is impossible in B&M.

• Conversely, product reviews are a type of interaction between the customer and the retailer and other customers which drives future demand.

Next, there are differences in forecast use:

• Forecast use strongly depends on the retailer's omnichannel strategy (Armstrong, 2017; Melacini, Perotti, Rasini, & Tappia, 2018; Sopadjieva, Dholakia, & Benjamin, 2017): e.g., for "order online, pick up in store" or "ship from store" fulfillment, we need separate but related forecasts for both total online demand and for the demand fulfilled at each separate store.

• Online retailers, especially in fashion, have a much bigger problem with product returns. They may need to forecast how many products are returned overall (e.g., Shang, McKie, Ferguson, & Galbreth, 2020), or whether a specific customer will return a specific product.

Finally, there are differences in the forecasting process:

• B&M retailers decouple pricing/promotion decisions and optimisation from the customer interaction, and therefore from forecasting. Online, this is not possible, because the customer has total transparency to competitors' offerings. Thus, online pricing needs to react much more quickly to competitive pressures – faster than the forecasting cycle.

• Thus, the specific value of predictors is often not known at the time of forecasting: we don't know yet which customer will log on, so we don't know yet how many people will see a particular product displayed on their personalised landing page. (Nor do we know today what remaining stock will be displayed.) Thus, changes in drivers need to be "baked into" the forecasting algorithm.

• Feedback loops between forecasting and other processes are thus even more important online: yesterday's forecasts drive today's stock position, driving today's personalised recommendations, driving demand, driving today's forecasts for tomorrow. Overall, online retail forecasting needs to be more agile and responsive to the latest interactional decisions taken in the web store, and more tightly integrated into the retailer's interactional tactics and omnichannel strategy.

Systematic research on demand forecasting in an online or omnichannel context is only starting to appear (e.g., Omar, Klibi, Babai, & Ducq, 2021, who use basket data from online sales to improve omnichannel retail forecasts).

3.2.5. Promotional forecasting96

96 This subsection was written by Nikolaos Kourentzes.

Promotional forecasting is central for retailing (see Section 3.2.4), but also relevant for many manufacturers, particularly of Fast Moving Consumer Goods (FMCG). In principle, the objective is to forecast sales, as in most business forecasting cases. However, what sets promotional forecasting apart is that we also make use of information about promotional plans, pricing, and sales of complementary and substitute products (Bandyopadhyay, 2009; Zhang, Chen, & Lee, 2008). Other relevant variables may include store location and format, variables that capture the presentation and location of a product in a store, proxies that characterise the competition, and so on (Andrews, Currim, Leeflang, & Lim, 2008; Van Heerde, Leeflang, & Wittink, 2002).

Three modelling considerations guide us in the choice of models. First, promotional (and associated) effects are proportional. For instance, we do not want to model the increase in sales as an absolute number of units, but instead as a percentage uplift. We do this not only to make the model applicable to both smaller and larger applications, for example, small and large stores in a retailing chain, but also to gain a clearer insight into the behaviour of our customers. Second, it is common that there are synergy effects. For example, a promotion for a product


may be offset by promotions for substitute products. Both these considerations are easily resolved if we use multiplicative regression models. However, instead of working with the multiplicative models, we rely on the logarithmic transformation of the data (see Section 2.2.1) and proceed to construct the promotional model using the less cumbersome additive formulation (see Section 2.3.2). Third, the objective of promotional models does not end with providing accurate predictions. We are also interested in the effect of the various predictors: their elasticity. This can in turn provide the users with valuable information about the customers, but also be an input for constructing optimal promotional and pricing strategies (Zhang et al., 2008).

Promotional models have been widely used on brand-level data (for example, Divakar, Ratchford, & Shankar, 2005). However, they are increasingly used on Stock Keeping Unit (SKU) level data (Ma, Fildes, & Huang, 2016; Trapero, Kourentzes, & Fildes, 2015), given advances in modelling techniques. Especially at that level, limited sales history and potentially non-existing examples of past promotions can be a challenge. Trapero et al. (2015) consider this problem and propose a promotional model with two parts that are jointly estimated. The first part focuses on the time series dynamics and is modelled locally for each SKU. The second part tackles the promotional effects, pooling examples of promotions across SKUs to provide reasonable estimates of uplifts even for new SKUs. To capture the expected heterogeneity in the promotional effects, the model is provided with product group information. Another recent innovation is modelling promotional effects at both the aggregate brand or total sales level and the disaggregate SKU level, relying on temporal aggregation (Kourentzes & Petropoulos, 2016, and Section 2.10.2). Ma et al. (2016) concern themselves with intra- and inter-category promotional information. The challenge now is the number of variables to be considered for the promotional model, which they address by using sequential LASSO (see also Section 2.5.3). Although the aforementioned models have shown very promising results, one has to recognise that in practice promotions are often forecasted using judgmental adjustments, with inconsistent performance (Trapero, Pedregal, Fildes, & Kourentzes, 2013); see also Section 2.11.2 and Section 3.7.3.

3.2.6. New product forecasting97
Accurately forecasting the demand for a new product has even greater consequences for the well-being of a company than forecasting for a product already in the market. However, it is one of the most difficult tasks managers must deal with, simply because of the non-availability of past data (Wind, 1981). Much work has been done in this field over the last five decades. Despite a Herculean attempt to collate the methods reported, Assmus (1984) could not list them all even at that time. The methods used before and since can be categorised into three broad approaches (Goodwin, Dyussekeneva, & Meeran, 2013): management judgment, consumer judgment, and diffusion/formal mathematical models. In general, hybrid methods combining different approaches have been found to be more useful (Hyndman & Athanasopoulos, 2018; Peres et al., 2010). Most attempts in New Product Forecasting (NPF) have been about forecasting 'adoption' (i.e., enumerating the customers who bought at least once) rather than 'sales', which also accounts for repeat purchases. In general, these attempts have dealt with point forecasts, although there have been some attempts at interval and density forecasting (Meade & Islam, 2001).

Out of the three approaches in NPF, management judgment is the most used (Gartner & Thomas, 1993; Kahn, 2002; Lynn, Schnaars, & Skov, 1999), carried out either by individual managers or by groups of managers. Ozer (2011) and Surowiecki (2005) articulated their contrasting benefits and deficits. The Delphi method (see Section 2.11.4) combines the benefits of these two modes of operation (Rowe & Wright, 1999) and has been effective in NPF. Prediction markets have recently offered an alternative way to aggregate forecasts from a group of managers (Meeran, Dyussekeneva, & Goodwin, 2013; Wolfers & Zitzewitz, 2004), and some successful applications of prediction markets for NPF have been reported by Karniouchina (2011) and Plott and Chen (2002).

In the second category, customer surveys, among other methods, are used to ask customers directly about the likelihood of them purchasing the product. Such surveys are found to be not very reliable (Morwitz, 1997). An alternative method that avoids the implicit bias associated with such surveys in extracting inherent customer preferences is conjoint analysis, which makes explicit the implicit trade-offs customers make between features, by analysing customers' preferences for different variants of the product. One analysis technique that attempts to mirror real-life experience more closely is Choice-Based Conjoint (CBC) analysis, in which customers choose the most preferred product among available choices. Such CBC models, used together with analysis tools such as the Logit (McFadden, 1977), have been successful in different NPF applications (Meeran, Jahanbin, Goodwin, & Quariguasi Frota Neto, 2017).

In the third approach, mathematical/formal models known as growth or diffusion curves (see Section 2.3.18 and Section 2.3.19) have been used successfully for NPF (Hu et al., 2019). Growth curves mitigate the non-availability of past data by capturing the generic pattern of the demand growth of a class of products, which can be defined by a limited number of parameters such as the saturation level, inflexion point, etc. For a new product, a growth curve can be constituted from well-estimated parameters using analogous products, market intelligence or regression methods. The most extensively used family of growth curves for NPF started with the Bass model (Bass, 1969), which has been extended extensively (Bass, Gordon, Ferguson, & Githens, 2001; Easingwood, Mahajan, & Muller, 1983; Islam & Meade, 2000; Peres et al., 2010; Simon & Sebastian, 1987). A recent application of NPF focused on consumer electronic goods using analogous products (Goodwin, Dyussekeneva et al., 2013).

97 This subsection was written by Sheik Meeran.
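The Bass model mentioned above can be illustrated with a short simulation. The following is a minimal sketch of the discrete-time Bass recursion, in which per-period adoptions depend on a coefficient of innovation (p), a coefficient of imitation (q) and the market potential (m); the parameter values used here are illustrative assumptions, not estimates from any of the studies cited.

```python
# Discrete-time sketch of the Bass (1969) diffusion model.
# The parameter values (p, q, m) are illustrative assumptions only.

def bass_adoptions(p, q, m, periods):
    """Return a list of per-period adoptions under the Bass model.

    p: coefficient of innovation, q: coefficient of imitation,
    m: market potential (eventual number of adopters).
    """
    cumulative = 0.0
    adoptions = []
    for _ in range(periods):
        # The adoption hazard rises with the installed base (imitation),
        # and applies to the remaining pool of potential adopters.
        new = (p + q * cumulative / m) * (m - cumulative)
        adoptions.append(new)
        cumulative += new
    return adoptions

sales = bass_adoptions(p=0.03, q=0.38, m=10_000, periods=20)
# Index of the period with peak adoptions (the classic bell shape).
peak_period = max(range(len(sales)), key=sales.__getitem__)
```

In the analogical use described in the text, p, q and m would be estimated from analogous products or market intelligence rather than assumed, and the fitted curve would supply the new product's demand profile.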

3.2.7. Spare parts forecasting98
Spare parts are ubiquitous in modern societies. Their demand arises whenever a component fails or requires replacement. Demand for spare parts is typically intermittent, which means that it can be forecasted using the plethora of parametric and non-parametric methods presented in Section 2.8. In addition to the intermittence of demand, spare parts have two additional characteristics that make them different from Work-In-Progress and final products, namely: (i) they are generated by maintenance policies and part breakdowns, and (ii) they are subject to obsolescence (Bacchetti & Saccani, 2012; Kennedy, Wayne Patterson, & Fredendall, 2002).

The majority of forecasting methods do not link the demand to the generating factors, which are often related to maintenance activities. The demand for spare parts originates from the replacement of parts in the installed base of machines (i.e., the location and number of products in use), either preventively or upon breakdown of the part (Kim, Dekker, & Heij, 2017). Fortuin (1984) claims that using installed base information to forecast spare part demand can lead to stock reductions of up to 25%. An overview of the literature that deals with spare parts forecasting with installed base information is given by Van der Auweraer and Boute (2019). Spare parts demand can also be driven by the results of maintenance inspections, in which case a maintenance-based forecasting model should be considered. Such forecasting models include the Delay Time (DT) model analysed in Wang and Syntetos (2011). Using the fitted values of the distribution parameters of a data set of hospital pumps, Wang and Syntetos (2011) have shown that when the failure and fault arrival characteristics of the items can be captured, the DT model is recommended for forecasting spare part demand with higher forecast accuracy. However, when such information is not available, time series forecasting methods, such as those presented in Section 2.8.1, are recommended. Maintenance-based forecasting is further discussed in Section 3.2.8.

Given the life cycle of products, spare parts are associated with a risk of obsolescence. Molenaers, Baets, Pintelon, and Waeyenbergh (2012) discussed a case study where 54% of the parts stocked at a large petrochemical company had seen no demand for the last 5 years. Hinton (1999) reported that the US Department of Defence was holding 60% excess spare parts, with 18% of the parts (with a total value of $1.5 billion) having no demand at all. To take the issue of obsolescence into account in spare parts demand forecasting, Teunter et al. (2011) proposed the TSB method, which deals with linearly decreasing demand and sudden obsolescence cases. By means of an empirical investigation based on the individual demand histories of 8000 spare parts SKUs from the automotive industry and the Royal Air Force (RAF, UK), Babai, Syntetos, and Teunter (2014) have demonstrated the high forecast accuracy and inventory performance of the TSB method. Other variants of Croston's method developed to deal with the risk of obsolescence in forecasting spare parts demand include the Hyperbolic-Exponential Smoothing method proposed by Prestwich, Tarim, Rossi, and Hnich (2014) and the modified Croston's method developed by Babai, Dallery, Boubaker, and Kalai (2019).

3.2.8. Predictive maintenance99
A common classification of industrial maintenance includes three types of maintenance (Montero Jimenez, Schwartz, Vingerhoeds, Grabot, & Salaün, 2020). Corrective maintenance refers to maintenance actions that occur after the failure of a component. Preventive maintenance consists of maintenance actions that are triggered after a scheduled number of usage units, such as cycles, kilometers or flights. To schedule the fixed time between two preventive maintenance actions, the Weibull distribution is commonly used (Baptista, Sankararaman, de Medeiros, Nascimento, Prendinger, & Henriques, 2018). The drawbacks of preventive maintenance relate to the replacement of components that still have remaining useful life: early interventions waste resources, while actions taken too late can lead to catastrophic failures. Additionally, the preventive intervention itself can be a source of failures. Finally, predictive maintenance (PdM) complements the previous types and, essentially, uses predictive tools to determine when actions are necessary (Carvalho, Soares, Vita, Francisco, Basto, & Alcalá, 2019). Within this predictive maintenance group, other terms are usually found in the literature, such as Condition-Based Maintenance and Prognostics and Health Management (Montero Jimenez et al., 2020).

The role of forecasting in industrial maintenance is of paramount importance. One application is to forecast spare parts (see Section 3.2.7), whose demands are typically intermittent and usually required to carry out corrective and preventive maintenance (Van der Auweraer & Boute, 2019; Wang & Syntetos, 2011). On the other hand, the forecast of the remaining useful life, which is the useful life left on an asset at a particular time of operation (Si, Wang, Hu, & Zhou, 2011), is crucial for PdM. This subsection focuses on the latter, which is usually found under the prognostic stage (Jardine, Lin, & Banjevic, 2006).

The range of forecasting techniques employed is very wide. Montero Jimenez et al. (2020) classify them into three groups: physics-based models, knowledge-based models, and data-driven models. Physics-based models require strong skills in the underlying physics of the application. Knowledge-based models are based on facts or cases collected over the years of operation and maintenance. Although they are useful for diagnostics and provide explicative results, their performance on prognostics is more limited. In this sense, data-driven models are gaining popularity thanks to developments in computational power, data acquisition, and big data platforms. In this case, data coming from vibration analysis, lubricant analysis, thermography, ultrasound, etc. are usually employed. Here, well-known forecasting models such as VARIMAX/GARCH (see also Section 2.3) are successfully used (Baptista et al., 2018; Cheng, Yu, & Chen, 2012;

98 This subsection was written by Mohamed Zied Babai.
99 This subsection was written by Juan Ramón Trapero Arenas.


García, Pedregal, & Roberts, 2010; Gomez Munoz, De la Hermosa Gonzalez-Carrato, Trapero Arenas, & Garcia Marquez, 2014). State Space models based on the Kalman Filter are also employed (Pedregal & Carmen Carnero, 2006; Pedregal, García, & Roberts, 2009, and Section 2.3.6). Recently, with the irruption of Industry 4.0, physical and digital systems are becoming more integrated, and Machine Learning/Artificial Intelligence is drawing the attention of practitioners and academics alike (Carvalho et al., 2019). In that same reference, it is found that the most frequently used Machine Learning methods in PdM applications were Random Forests, Artificial Neural Networks, Support Vector Machines and K-means.

3.2.9. Reverse logistics100
Just as logistics and supply chain operations rely upon accurate demand forecasts (see also Section 3.2.2), reverse logistics and closed loop supply chain operations rely upon accurate forecasts of returned items. Such items (usually referred to as cores) can be anything from reusable shipping or product containers to used laptops, mobile phones or car engines. If some (re)manufacturing activity is involved in supply chains, both demand and returned items forecasts are needed, since it is net demand requirements (demand minus returns) that drive remanufacturing operations.

Forecasting methods that are known to work well when applied to demand forecasting, such as SES for example (see Section 2.3.1), do not perform well when applied to time series of returns, because they assume returns to be a process independent of sales. There are some cases when this independence might hold, such as when a recycler receives items sold by various companies and supply chains (Goltsos & Syntetos, 2020). In these cases, simple methods like SES applied to the time series of returns might prove sufficient. Typically though, returns are strongly correlated with past sales and the installed base (the number of products with customers). After all, there cannot be a product return if a product has not first been sold. This lagged relationship between sales and returns is key to the effective characterisation of the returns process.

Despite the increasing importance of the circular economy and research on closed loop supply chains, returns forecasting has not received sufficient attention in the academic literature (notable contributions in this area include Clottey, Benton Jr., & Srivastava, 2012; de Brito & van der Laan, 2009; Goh & Varaprasad, 1986; Toktay, 2003; Toktay, Wein, & Zenios, 2000). The seminal work by Kelle and Silver (1989) offers a useful forecasting framework based on the degree of available information about the relationship between demand and returns. Product level (PL) information consists of the time series of sales and returns, alongside information on the time each product spends with a customer. The question then is how to derive this time-to-return distribution. This can be done through managerial experience, by investigating the correlation of the demand and returns time series, or by serialising and tracking a subset (sample) of items. Past sales can then be used in conjunction with this distribution to create forecasts of returns. Serial number level (SL) information is more detailed and consists of the time-matching of an individual unit's issues and returns, and thus exactly the time each individual unit, on a serial number basis, spent with the customer. Serialisation allows for a complete characterisation of the time-to-return distribution. Very importantly, it also enables tracking exactly how many items previously sold remain with customers, providing time series of unreturned past sales. Unreturned past sales can then be extrapolated, along with a time-to-return distribution, to create forecasts of returns.

Goltsos, Syntetos, and van der Laan (2019) offered empirical evidence in the area of returns forecasting by analysing a serialised data set from a remanufacturing company in North Wales. They found the Beta probability distribution to best fit times-to-return. Their research suggests that serialisation is worthwhile pursuing for low volume products, especially if they are expensive. This makes a lot of sense from an investment perspective, since the relevant serial numbers are very few. However, they also provided evidence that such benefits expand in the case of high volume items. Importantly, the benefits of serialisation not only enable the implementation of the more complex SL method, but also the accurate characterisation of the returns process, thus also benefiting the PL method (which has been shown to be very robust).

3.3. Economics and finance

3.3.1. Macroeconomic survey expectations101
Macroeconomic survey expectations allow tests of theories of how agents form their expectations. Expectations play a central role in modern macroeconomic research (Gali, 2008). Survey expectations have been used to test theories of expectations formation for the last 50 years. Initially the Livingston survey data on inflationary expectations were used to test the extrapolative or adaptive hypotheses, but the focus soon turned to testing whether expectations are formed rationally (see Turnovsky and Wachter, 1972, for an early contribution). According to Muth (1961, p. 316), rational expectations is the hypothesis that: 'expectations, since they are informed predictions of future events, are essentially the same as the predictions of the relevant economic theory.' This assumes all agents have access to all relevant information. Instead, one can test whether agents make efficient use of the information they possess. This is the notion of forecast efficiency (Mincer & Zarnowitz, 1969), and it can be tested by regressing the outturns on a constant and the forecasts of those outturns. Under forecast efficiency, the constant should be zero and the coefficient on the forecasts should be one. When the slope coefficient is not equal to one, the forecast errors will be systematically related to information available at the forecast origin, namely the forecasts, and cannot be optimal. The exchange between Figlewski and Wachtel (1981, 1983) and Dietrich and Joines (1983)

100 This subsection was written by Aris A. Syntetos.
101 This subsection was written by Michael P. Clements.


clarifies the role of partial information in testing forecast efficiency (that is, full information is not necessary), and shows that the use of the aggregate or consensus forecast in the individual realisation-forecast regression outlined above will give rise to a slope parameter less than one when forecasters are efficient but possess partial information. Zarnowitz (1985), Keane and Runkle (1990) and Bonham and Cohen (2001) consider pooling across individuals in the realisation-forecast regression, and the role of correlated shocks across individuals.

Recently, researchers have considered why forecasters might not possess full information, stressing informational rigidities: sticky information (see, inter alia, Mankiw & Reis, 2002; Mankiw, Reis, & Wolfers, 2003), and noisy information (see, inter alia, Sims, 2003; Woodford, 2002). Coibion and Gorodnichenko (2012, 2015) test these models using aggregate quantities, such as mean errors and revisions.

Forecaster behaviour can be characterised by the response to new information (see also Section 2.11.1). Over- or under-reaction would constitute inefficiency. Broer and Kohlhas (2018) and Bordalo, Gennaioli, Ma, and Shleifer (2018) find that agents over-react, generating a negative correlation between their forecast revision and error. The forecast is revised by more than is warranted by the new information (over-confidence regarding the value of the new information). Bordalo et al. (2018) explain the over-reaction with a model of 'diagnostic' expectations, whereas Fuhrer (2018) finds 'intrinsic inflation persistence': individuals under-react to new information, smoothing their responses to news.

The empirical evidence is often equivocal, and might reflect: the vintage of data assumed for the outturns; whether allowance is made for 'instabilities' such as alternating over- and under-prediction (Rossi & Sekhposyan, 2016); and the assumption of squared-error loss (see, for example, Clements, 2014b; Patton & Timmermann, 2007).

Research has also focused on the histogram forecasts produced by a number of macro-surveys. Density forecast evaluation techniques such as the probability integral transform102 have been applied to histogram forecasts, and survey histograms have been compared to benchmark forecasts (see, for example, Bao, Lee, & Saltoglu, 2007; Clements, 2018; Hall & Mitchell, 2009). Research has also considered uncertainty measures based on the histograms (Clements, 2014a). Sections 2.12.4 and 2.12.5 also discuss the evaluation and reliability of probabilistic forecasts.

Clements (2009, 2010) and Engelberg, Manski, and Williams (2009) considered the consistency between point predictions and histogram forecasts. Reporting practices such as 'rounding' have also been considered (Binder, 2017; Clements, 2011; Manski & Molinari, 2010). Clements (2019) reviews macroeconomic survey expectations.

3.3.2. Forecasting GDP and inflation103
As soon as Bayesian estimation of DSGEs became popular, these models were employed in forecasting horseraces to predict key macro variables, for example, Gross Domestic Product (GDP) and inflation, as discussed in Del Negro and Schorfheide (2013). The forecasting performance is evaluated using rolling or recursive (expanding) prediction windows (for a discussion, see Cardani, Paccagnini, & Villa, 2015). DSGEs are usually estimated using revised data, but several studies report better results from estimating the models using real-time data (see, for example, Cardani et al., 2019; Del Negro & Schorfheide, 2013; Kolasa & Rubaszek, 2015b; Wolters, 2015).

The current DSGE forecasting literature compares DSGE models to competitors (see Section 2.3.15 for an introduction to DSGE models). Among them, we can include the Random Walk (the naive model which assumes a stochastic trend), Bayesian VAR models (the Minnesota Prior à la Doan, Litterman, and Sims, 1984, and the Large Bayesian VAR à la Bańbura, Giannone, and Reichlin, 2010), Hybrid-Models (the DSGE-VAR à la Del Negro and Schorfheide, 2004, and the DSGE-Factor Augmented VAR à la Consolo, Favero, and Paccagnini, 2009), and institutional forecasts (the Greenbook, the Survey of Professional Forecasters, and the Blue Chip, as illustrated in Edge and Gürkaynak, 2010).

Table 2 summarises the current DSGE forecasting literature, mainly for the US and the Euro Area, as provided by estimating medium-scale models. As a general finding, DSGEs can outperform other competitors, with the exception of the Hybrid-Models, in the medium and long run when forecasting GDP and inflation. In particular, Smets and Wouters (2007) provided the first empirical evidence of how DSGEs can be competitive with forecasts from Bayesian VARs, convincing researchers and policymakers to adopt DSGEs for prediction evaluations. As discussed in Del Negro and Schorfheide (2013), the accuracy of DSGE forecasts depends on how well the model captures low-frequency trends in the data. To explain the macro-finance linkages during the Great Recession, the Smets and Wouters model was also compared to other DSGE specifications including the financial sector. For example, Cardani et al. (2019), Del Negro and Schorfheide (2013), Galvão, Giraitis, Kapetanios, and Petrova (2016), and Kolasa and Rubaszek (2015a) provide forecasting performance for DSGEs with financial frictions. This strand of the literature shows how this feature can improve the baseline Smets and Wouters predictions for the business cycle, in particular during the recent Great Recession.

However, the Hybrid-Models always outperform the DSGEs thanks to the combination of the theory-based model (DSGE) and the statistical representation (VAR or Factor Augmented VAR), as illustrated by Del Negro and Schorfheide (2004) and Consolo et al. (2009).

Moreover, several studies discuss how prediction performance can depend on the parameters' estimation. Kolasa and Rubaszek (2015b) suggest that updating DSGE model parameters only once a year is enough to obtain accurate and efficient predictions of the main macro variables.
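The recursive (expanding-window) evaluation used in these horseraces can be sketched in a few lines. In this minimal illustration, an AR(1) stands in for "a model" and is compared with the naive random walk on a simulated persistent series; the series, window sizes and the AR(1) stand-in are all illustrative assumptions, not any study's actual setup.

```python
# Sketch of a recursive (expanding-window) one-step-ahead horserace
# between a simple model (here an AR(1), as a stand-in) and the naive
# random walk. All data and settings are illustrative assumptions.
import math
import random

random.seed(1)
# Simulate a persistent series, e.g. demeaned inflation: y_t = 0.8 y_{t-1} + e_t
y = [0.0]
for _ in range(119):
    y.append(0.8 * y[-1] + random.gauss(0, 1))

def ols_ar1(series):
    """Least-squares slope of y_t on y_{t-1} (no intercept, for brevity)."""
    num = sum(a * b for a, b in zip(series[1:], series[:-1]))
    den = sum(a * a for a in series[:-1])
    return num / den

start = 80  # first forecast origin; the estimation window expands from here
errs_ar, errs_rw = [], []
for origin in range(start, len(y) - 1):
    phi = ols_ar1(y[: origin + 1])                   # re-estimate on expanded window
    errs_ar.append(y[origin + 1] - phi * y[origin])  # AR(1) forecast error
    errs_rw.append(y[origin + 1] - y[origin])        # random-walk (naive) error

rmse_ar = math.sqrt(sum(e * e for e in errs_ar) / len(errs_ar))
rmse_rw = math.sqrt(sum(e * e for e in errs_rw) / len(errs_rw))
```

In an actual horserace the AR(1) would be replaced by the DSGE, BVAR or hybrid model under evaluation, forecasts would be produced at several horizons, and real-time rather than revised data could be used, as discussed above.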
102 See, for example, Rosenblatt (1952), Shephard (1994), Kim, Shephard, and Chib (1998), Diebold et al. (1998) and Berkowitz (2001).
103 This subsection was written by Alessia Paccagnini.


Table 2
Alternative competitors to DSGE models.

Competitor | Reference
Hybrid Models | US: Del Negro and Schorfheide (2004), Consolo et al. (2009)
Random Walk | US: Gürkaynak, Kısacıkoğlu, and Rossi (2013); Euro Area: Warne et al. (2010), Smets, Warne, and Wouters (2014)
Bayesian VAR | US: Smets and Wouters (2007), Gürkaynak et al. (2013), Wolters (2015), Bekiros and Paccagnini (2014, 2015a, 2015b); Euro Area: Warne et al. (2010)
Time-Varying VAR and Markov-Switching | US: Bekiros, Cardani, Paccagnini, and Villa (2016); Euro Area: Bekiros and Paccagnini (2016)
Institutional Forecasts | US: Edge and Gürkaynak (2010), Kolasa et al. (2012), Del Negro and Schorfheide (2013), Wolters (2015)
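The realisation-forecast (Mincer-Zarnowitz) efficiency regression described in Section 3.3.1 can also be sketched briefly: regress the outturns on a constant and the forecasts, and check that the intercept is near zero and the slope near one. The synthetic forecasts and outturns below are illustrative assumptions constructed to be efficient by design.

```python
# Sketch of the Mincer-Zarnowitz forecast-efficiency regression:
# outturn_t = a + b * forecast_t + error_t, with a = 0, b = 1 under
# efficiency. The synthetic data are illustrative assumptions.
import random

random.seed(0)
forecasts = [random.gauss(2.0, 1.0) for _ in range(200)]
# Outturn = forecast + unpredictable noise, i.e. an efficient forecast.
outturns = [f + random.gauss(0, 0.5) for f in forecasts]

def ols(x, y):
    """OLS of y on a constant and x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

intercept, slope = ols(forecasts, outturns)
# For an efficient forecast, intercept is near 0 and slope near 1,
# up to sampling error.
```

With real survey data, the forecasts would come from, e.g., the Livingston survey and the outturns from subsequent data releases; as noted in the text, the choice of data vintage for the outturns can matter for the verdict.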

3.3.3. Forecasting unemployment104
Unemployment has significant implications at both the micro and macro levels, influencing individual living standards, health and well-being, as well as imposing direct costs on the economy. Given its importance, policymakers put unemployment at the heart of their economic plans, and as such require accurate forecasts to feed into economic policy decisions. Unemployment is described as a lagging indicator of the economy, with characteristics including business cycles and persistence. Despite this, forecasting the unemployment rate is difficult, because the data are highly non-stationary with abrupt distributional shifts, but persistence within regimes. In this section we focus on methods used to forecast the aggregate unemployment rate.

Unemployment is the outcome of supply and demand for labour, aggregated across all prospective workers, with labour demand derived from demand for goods and services. This implies a highly complex data generating process. Empirical forecasting models tend to simplify this relationship, with two approaches dominating the literature. The first is based on the Phillips (1958) curve capturing a non-linear relationship between nominal wage inflation and the unemployment rate, or on the relation between unemployment and output described by Okun's (1962) Law. The second uses the time-series properties of the data to produce statistical forecasts, such as univariate linear models (for example, ARIMA or unobserved component models; see Sections 2.3.4 and 2.3.6), multivariate linear models (for example, VARMA or CVAR; see Section 2.3.9), various threshold autoregressive models (see Section 2.3.13), Markov Switching models (see Section 2.3.12) and Artificial Neural Networks (see Section 2.7.8).

The empirical literature is inconclusive as to the 'best' forecasting models for unemployment, which varies by country, time period and forecast horizon. There is some evidence that non-linear statistical models tend to outperform within business cycle contractions or expansions, but perform worse across business cycles (see, for example, Koop & Potter, 1999; Montgomery, Zarnowitz, Tsay, & Tiao, 1998; Rothman, 1998), whereas Proietti (2003) finds that linear models characterised by higher persistence perform significantly better. Evidence of non-linearities is found by Johnes (1999), Milas and Rothman (2008) and Peel and Speight (2000), and Gil-Alana (2001) finds evidence of long-memory. Barnichon and Garda (2016) apply a flow approach to unemployment forecasting and find improvements, as does Smith (2011).

One approach that does yield accurate forecasts is to use a measure of profitability as the explanatory variable, assuming that unemployment will fall when hiring is profitable. Hendry (2001) proxies profitability (π) by the gap between the real interest rate (reflecting costs) and the real growth rate (reflecting the demand side), such that the unemployment rate rises when the real interest rate exceeds the real growth rate, and vice versa:

π_t = (R_L − Δp − Δy)_t

where R_L is the long-term interest rate, Δp is a measure of inflation and Δy is a measure of output growth. This is then embedded within a dynamic equilibrium correction model, using impulse indicator saturation (IIS: Hendry, Johansen, & Santos, 2008b; Johansen & Nielsen, 2009) and step indicator saturation (SIS: Castle et al., 2015c) to capture outliers, breaks and regime shifts, as well as allowing for any non-linearities using Taylor expansions for the regressors. The resulting forecasts perform well over the business cycle relative to alternative statistical models (also see Hendry, 2015, and Castle, Hendry, & Martinez, 2020c).

Forecasts from models of unemployment could be improved with either better economic theories of aggregate unemployment,105 or more general empirical models that tackle stochastic trends, breaks, dynamics, non-linearities and interdependence,106 or better still, both. The COVID-19 pandemic and subsequent lockdown policies highlight just how important forecasts of unemployment are (Castle, Doornik, & Hendry, 2021).

3.3.4. Forecasting productivity107
The growth of labour productivity, measured by the percent change in output per hour worked, has varied dramatically over the last 260 years. In the UK it ranged from −5.8% at the onset of the 1920 Depression to just over 7% in 1971; see panel A in Fig. 6. Productivity growth is very volatile and has undergone large historical shifts, with productivity growth averaging around 1% between

104 This subsection was written by Jennifer L. Castle.
105 There are many relevant theories based on microfoundations, including search and matching, loss of skills, efficiency wages, and insider-outsider models; see Layard, Nickell, and Jackman (1991) for a summary.
106 See Hendry and Doornik (2014) for an approach to jointly tackling all of these issues.
107 This subsection was written by Andrew B. Martinez.
Fig. 6. Productivity Growth (Output per total hours worked). Source: Bank of England and Penn World Table Version 10.0.
1800–1950, followed by an increase in the average annual growth to 3% between 1950–1975. Since the mid-1970s productivity growth has gradually declined in many developed economies; see panel B of Fig. 6. In the decade since 2009, 2% annual productivity growth was an upper bound for most G7 countries.

The most common approach for forecasting productivity is to estimate the trend growth in productivity using aggregate data. For example, Gordon (2003) considers three separate approaches for calculating trend labor productivity in the United States based on (i) average historical growth rates outside of the business cycle, (ii) filtering the data using the HP filter (Hodrick & Prescott, 1997), and (iii) filtering the data using the Kalman filter (see Kalman, 1960). The Office for Budget Responsibility (OBR) in the UK and the Congressional Budget Office (CBO) in the US follow similar approaches for generating their forecasts of productivity, based on average historical growth rates as well as judgments about factors that may cause productivity to deviate from its historical trend in the short-term.108 Alternative approaches include forecasting aggregate productivity using disaggregated firm-level data (see Bartelsman, Kurz, & Wolf, 2011; Bartelsman & Wolf, 2014, and Section 2.10.1) and using time-series models (see Žmuk, Dumičić, & Palić, 2018, and Section 2.3.4).

In the last few decades there have been several attempts to test for time-varying trends in productivity and to allow for them. However, the focus of these approaches has been primarily on the United States (Hansen, 2001; Roberts, 2001), which saw a sharp rise in productivity growth in the 1990s that was not mirrored in other countries (Basu, Fernald, Oulton, & Srinivasan, 2003). Tests for shifts in productivity growth rates in other advanced economies did not find evidence of changes in productivity growth until well after the financial crisis in 2007 (Benati, 2007; Glocker & Wegmüller, 2018; Turner & Boulhol, 2011).

A more recent approach by Martinez et al. (2021) allows for a time-varying long-run trend in UK productivity. They show that they are able to broadly replicate the OBR's forecasts using a quasi-transformed autoregressive model with one lag, a constant, and a trend. The estimated long-run trend is just over 2% per year through 2007 Q4, which is consistent with the OBR's assumptions about the long-run growth rate of productivity (OBR, 2019). However, it is possible to dramatically improve upon the OBR's forecasts in real time by allowing the long-term trend forecast to adjust based on more recent historical patterns. By taking a local average of the last four years of growth rates, Martinez et al. (2021) generate productivity forecasts whose RMSE is on average more than 75% smaller than that of the OBR's forecasts extending five years ahead, and is 84% smaller at the longest forecast horizon.

3.3.5. Fiscal forecasting for government budget surveillance109
Recent economic recessions have led to a renewed interest in fiscal forecasting, mainly for deficit and debt surveillance. This was certainly true in the case of the 2008 recession, and looks to become even more important in the current economic crisis brought on by the COVID-19 pandemic. This is particularly important in Europe, where countries are subject to strong fiscal monitoring mechanisms. Two main themes can be detected in the fiscal forecasting literature (Leal, Pérez, Tujula, & Vidal, 2008): first, investigating the properties of forecasts in terms of bias, efficiency and accuracy; and second, checking the adequacy of forecasting procedures.

The first topic has long been of interest in its own right, mainly within international institutions (Artis & Marcellino, 2001). Part of the literature, however, argues that fiscal forecasts are politically biased, mainly because there is usually no clear distinction between political targets and rigorous forecasts (Frankel & Schreger, 2013; Strauch, Hallerberg, & Hagen, 2004). In this sense, the availability of forecasts from independent sources is of great value (Jonung & Larch, 2006). But it is not as easy as saying that independent forecasters would improve forecasts due to the absence of political bias, because forecasting accuracy is compromised by complexities of data,

108 See https://obr.uk/forecasts-in-depth/the-economy-forecast/potential-output-and-the-output-gap. (Accessed: 2020-09-05)
109 This subsection was written by Diego J. Pedregal.

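The local-average trend idea attributed to Martinez et al. (2021) in Section 3.3.4 above — extrapolating the mean of the last four years of growth rates — can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' exact specification:

```python
import numpy as np

def local_trend_forecast(productivity, window=4, horizon=5):
    """Extrapolate log-productivity using the average of the last
    `window` annual log growth rates (a local, time-varying trend)."""
    log_p = np.log(np.asarray(productivity, dtype=float))
    growth = np.diff(log_p)            # annual log growth rates
    trend = growth[-window:].mean()    # local average of recent growth
    steps = np.arange(1, horizon + 1)
    return np.exp(log_p[-1] + trend * steps)

# toy series: growth slows from 2% to 0.5%; the local window tracks
# the recent, slower trend rather than the full-sample average
p = [100.0]
for g in [0.02] * 6 + [0.005] * 4:
    p.append(p[-1] * np.exp(g))
f = local_trend_forecast(p, window=4, horizon=3)
```

The design choice mirrors the text: a short rolling window lets the long-run trend forecast adapt to recent historical patterns instead of anchoring on a fixed full-sample growth rate.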

country-specific factors, outliers, changes in the definition of fiscal variables, etc. Very often some of these issues are known by the staff of the organisations in charge of making the official statistics and forecasts long before the general public, and some information never leaves such institutions. So this insider information is actually a valuable asset to improve forecasting accuracy (Leal et al., 2008).

As for the second issue, namely the accuracy of forecasting methods, the literature can be divided into two parts: one based on macroeconomic models with specific fiscal modules that allow analysing the effects of fiscal policy on macro variables and vice versa (see Favero and Marcellino (2005) and references therein), and the other based on pure forecasting methods and comparisons among them. This last stream of research closely resembles what is seen in other forecasting areas: (i) there is no single method that generally outperforms the rest, (ii) judgmental forecasting is especially important due to data problems (see Section 2.11), and (iii) combinations of methods tend to outperform individual ones; see Leal et al. (2008) and Section 2.6.

Part of the recent literature has focused on the generation of very short-term public finance monitoring systems using models that combine annual information with intra-annual fiscal data (Pedregal & Pérez, 2010) by time aggregation techniques (see Section 2.10.2), often set up in a SS framework (see Section 2.3.6). The idea is to produce global annual end-of-year forecasts of budgetary variables based on the most frequently available fiscal indicators, so that changes throughout the year in the indicators can be used as early warnings to infer changes in the annual forecasts and deviations from fiscal targets (Pedregal, Pérez, & Sánchez, 2014).

The level of disaggregation of the indicator variables is established according to the information available and the particular objectives. The simplest options are the accrual National Accounts annual or quarterly fiscal balances running on their cash monthly counterparts. A somewhat more complex version is the previous one with all the variables broken down into revenues and expenditures. Other disaggregation schemes have been applied, namely by region, by administrative level (regional, municipal, social security, etc.), or by items within revenue and/or expenditure (VAT, income taxes, etc.; see Asimakopoulos, Paredes, & Warmedinger, 2020; Paredes, Pedregal, & Pérez, 2014).

Unfortunately, what is missing is a comprehensive and transparent forecasting system, independent of Member States, capable of producing consistent forecasts over time and across countries. This is certainly a challenge that no one has yet dared to take up.

3.3.6. Interest rate prediction110
The (spot) rate on a (riskless) bond represents the ex-ante return (yield) to maturity which equates its market price to a theoretical valuation. Modelling and predicting default-free, short-term interest rates are crucial tasks in asset pricing and risk management. Indeed, the value of interest rate–sensitive securities depends on the value of the riskless rate. Besides, the short interest rate is a fundamental ingredient in the formulation and transmission of monetary policy (see, for example, Section 2.3.15). However, many popular models of the short rate (for instance, continuous-time diffusion models) fail to deliver accurate out-of-sample forecasts. Their poor predictive performance may depend on the fact that the stochastic behaviour of short interest rates may be time-varying (for instance, it may depend on the business cycle and on the stance of monetary policy).

Notably, the presence of nonlinearities in the conditional mean and variance of the short-term yield influences the behaviour of the entire term structure of spot rates implicit in riskless bond prices. For instance, the level of the short-term rate directly affects the slope of the yield curve. More generally, nonlinear rate dynamics imply a nonlinear equilibrium relationship between short and long-term yields. Accordingly, recent research has reported that dynamic econometric models with regime shifts in parameters, such as Markov switching (MS; see Section 2.3.12) and threshold models (see Section 2.3.13), are useful at forecasting rates.

The usefulness of MS VAR models with term structure data has been established since Hamilton (1988) and Garcia and Perron (1996): single-state VARMA models are overwhelmingly rejected in favour of multi-state models. Subsequently, a literature has emerged that has documented that MS models are required to successfully forecast the yield curve. Lanne and Saikkonen (2003) showed that a mixture of autoregressions with two regimes improves the predictions of US T-bill rates. Ang and Bekaert (2002) found support for MS dynamics in the short-term rates for the US, the UK, and Germany. Cai (1994) developed a MS ARCH model to examine volatility persistence, reflecting a concern that it may be inflated by regimes. Gray (1996) generalised this attempt to MS GARCH and reported improvements in pseudo out-of-sample predictions. Further advances in the methods and applications of MS GARCH are in Haas, Mittnik, and Paolella (2004) and Smith (2002). A number of papers have also investigated the presence of regimes in the typical factors (level, slope, and convexity) that characterise the no-arbitrage dynamics of the term structure, showing the predictive benefits of incorporating MS (see, for example, Guidolin & Pedio, 2019; Hevia, Gonzalez-Rozada, Sola, & Spagnolo, 2015).

Alternatively, a few studies have tried to capture the time-varying, nonlinear dynamics of interest rates using threshold models. As discussed by Pai and Pedersen (1999), threshold models have an advantage compared to MS ones: the regimes are not determined by an unobserved latent variable, thus fostering interpretability. In most of the applications to interest rates, the regimes are determined by the lagged level of the short rate itself, in a self-exciting fashion. For instance, Pfann, Schotman, and Tschernig (1996) explored nonlinear dynamics of the US short-term interest rate using a (self-exciting) threshold autoregressive model augmented by conditional heteroskedasticity (namely, a TAR-GARCH model) and found

110 This subsection was written by Massimo Guidolin & Manuela Pedio.

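The self-exciting threshold autoregression discussed in Section 3.3.6 above switches its AR parameters according to the lagged level of the rate itself. A minimal two-regime sketch, with purely hypothetical coefficients chosen for illustration:

```python
def setar_forecast(rates, threshold, lo=(0.1, 0.9), hi=(0.5, 0.8)):
    """One-step forecast from a two-regime SETAR model: pick the
    (intercept, slope) pair according to whether the last observed
    rate is at or below the threshold (self-exciting regime choice)."""
    last = rates[-1]
    const, phi = lo if last <= threshold else hi
    return const + phi * last

# the regime is selected by the lagged rate, not a latent state
f_low = setar_forecast([1.0, 1.5, 2.0], threshold=3.0)   # low regime
f_high = setar_forecast([4.0, 4.5, 5.0], threshold=3.0)  # high regime
```

As the surrounding text notes, this interpretability — the regime is observable from the data — is the main advantage of threshold models over Markov-switching alternatives.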
strong evidence of the presence of two regimes. More recently, also Gospodinov (2005) used a TAR-GARCH to predict the short-term rate and showed that this model can capture some well-documented features of the data, such as high persistence and conditional heteroskedasticity.

Another advantage of nonlinear models is that they can reproduce the empirical puzzles that plague the expectations hypothesis of interest rates (EH), according to which longer-term rates are driven by a weighted average of (expected) short-term rates (see, for example, Bansal, Tauchen, & Zhou, 2004; Dai, Singleton, & Yang, 2007). For instance, while Bekaert, Hodrick, and Marshall (2001) show single-state VARs cannot generate distributions consistent with the EH, Guidolin and Timmermann (2009) find that the optimal combinations of lagged short and forward rates depend on regimes so that the EH holds only in some states.

As widely documented (see, for instance, Guidolin & Thornton, 2018), the predictable component in mean rates is hardly significant. As a result, the random walk remains a hard benchmark to outperform as far as the prediction of the mean is concerned. However, density forecasts reflect all moments, and the models that capture the dynamics of higher-order moments tend to perform best. MS models appear at the forefront of a class of non-linear models that produce accurate density predictions (see, for example, Hong, Li, & Zhao, 2004; Maheu & Yang, 2016). Alternatively, Pfann et al. (1996) and more recently Dellaportas, Denison, and Holmes (2007) estimated TAR models to also forecast conditional higher-order moments, and all report reasonable accuracy.

Finally, a literature has strived to fit rates not only under the physical measure, i.e., in time series, but to predict rates when MS enters the pricing kernel, the fundamental pricing operator. A few papers have assumed that regimes represent a new risk factor (see, for instance, Dai & Singleton, 2003). This literature reports that MS models lead to a range of shapes for nominal and real term structures (see, for instance, Veronesi & Yared, 1999). Often the model specifications that are not rejected by formal tests include regimes (Ang, Bekaert, & Wei, 2008; Bansal & Zhou, 2002).

To conclude, it is worth noting that, while threshold models are more interpretable, MS models remain a more popular alternative for the prediction of interest rates. This is mainly due to the fact that statistical inference for threshold regime switching models poses some challenges, because the likelihood function is discontinuous with respect to the threshold parameters.

3.3.7. House price forecasting111
The boom and bust in housing markets in the early and mid 2000s and its decisive role in the Great Recession has generated a vast interest in the dynamics of house prices and emphasised the importance of accurately forecasting property price movements during turbulent times. International organisations, central banks and research institutes have become increasingly engaged in monitoring property price developments across the world.112 At the same time, a substantial empirical literature has developed that deals with predicting future house price movements (for a comprehensive survey see Ghysels, Plazzi, Valkanov, & Torous, 2013). Although this literature concentrates almost entirely on the US (see, for example, Bork & Møller, 2015; Rapach & Strauss, 2009), there are many other countries, such as the UK, where house price forecastability is of prime importance. Similarly to the US, in the UK housing activities account for a large fraction of GDP and of households' expenditures; real estate property comprises a significant component of private wealth, and mortgage debt constitutes a main liability of households (Office for National Statistics, 2019).

The appropriate forecasting model has to reflect the dynamics of the specific real estate market and take into account its particular characteristics. In the UK, for instance, there is a substantial empirical literature that documents the existence of strong spatial linkages between regional markets, whereby house price shocks emanating from southern regions of the country, and in particular Greater London, have a tendency to spread out and affect neighbouring regions with a time lag (see, for example, Antonakakis, Chatziantoniou, Floros, & Gabauer, 2018; Cook & Thomas, 2003; Holly, Pesaran, & Yamagata, 2010, inter alia); see also Section 2.3.10 on forecasting functional data.

Recent evidence also suggests that the relationship between real estate valuations and conditioning macro and financial variables has displayed complex time-varying patterns over the previous decades (Aizenman & Jinjarak, 2013). Hence, predictive methods that do not allow for time-variation in both predictors and their marginal effects may not be able to capture the complex house price dynamics in the UK (see Yusupova, Pavlidis, & Pavlidis, 2019, for a comparison of the forecasting accuracy of a battery of static and dynamic econometric methods).

An important recent trend is to attempt to incorporate information from novel data sources (such as newspaper articles, social media, etc.) in forecasting models as a measure of the expectations and perceptions of economic agents (see also Section 2.9.3). It has been shown that changes in uncertainty about house prices impact on housing investment and real estate construction decisions (Banks, Blundell, Oldfield, & Smith, 2015; Cunningham, 2006; Oh & Yoon, 2020), and thus incorporating a measure of uncertainty in the forecasting model can improve the forecastability of real estate prices. For instance, in the UK, the House Price Uncertainty (HPU) index (Yusupova, Pavlidis, Paya, & Peel, 2020), constructed using the methodology outlined in Baker, Bloom, and Davis (2016),113 was found to be important in predicting

111 This subsection was written by Alisa Yusupova.
112 For instance, the International Monetary Fund recently established the Global Housing Watch, the Globalisation and Monetary Policy Institute of the Federal Reserve Bank of Dallas initiated a project on monitoring international property price dynamics, and the UK Housing Observatory initiated a similar project for the UK national and regional housing markets.
113 For a comparison of alternative text-based measures of economic uncertainty see Kalamara, Turrell, Redl, Kapetanios, and Kapadia (2020).


property price inflation ahead of the house price collapse of the third quarter of 2008 and during the bust phase (Yusupova et al., 2019). Along with capturing the two recent recessions (in the early 1990s and middle 2000s), this index also reflects the uncertainty related to the EU Referendum, the Brexit negotiations and the COVID-19 pandemic.

3.3.8. Exchange rate forecasting114
Exchange rates have long fascinated and puzzled researchers in international finance. The reason is that, following the seminal paper of Meese and Rogoff (1983), the common wisdom is that macroeconomic models cannot outperform the random walk in exchange rate forecasting (see Rossi, 2013, for a survey). This view is difficult to reconcile with the strong belief that exchange rates are driven by fundamentals, such as relative productivity, external imbalances, terms of trade, fiscal policy or interest rate disparity (Couharde, Delatte, Grekou, Mignon, & Morvillier, 2018; Lee, Milesi-Ferretti, & Ricci, 2013; MacDonald, 1998). These two contradictory assertions of the academic literature are jointly referred to as the ‘‘exchange rate disconnect puzzle’’.

The literature provides several explanations for this puzzle. First, it can be related to the forecast estimation error (see Section 2.5.2). The studies in which models are estimated with large panels of data (Engel, Mark, & West, 2008; Ince, 2014; Mark & Sul, 2001), long time series (Lothian & Taylor, 1996) or calibrated (Ca’ Zorzi & Rubaszek, 2020) deliver positive results on exchange rate forecastability. Second, there is ample evidence that the adjustment of exchange rates to equilibrium is nonlinear (Curran & Velic, 2019; Taylor & Peel, 2000), which might diminish the out-of-sample performance of macroeconomic models (Kilian & Taylor, 2003; Lopez-Suarez & Rodriguez-Lopez, 2011). Third, a few economists argue that the role of macroeconomic fundamentals may be varying over time, and this should be accounted for in a forecasting setting (Beckmann & Schussler, 2016; Byrne, Korobilis, & Ribeiro, 2016).

The dominant part of the exchange rate forecasting literature investigates which macroeconomic model performs best out-of-sample. The initial studies explored the role of monetary fundamentals, finding that these models deliver inaccurate short-term and not-so-bad long-term predictions in comparison to the random walk (Mark, 1995; Meese & Rogoff, 1983). In a comprehensive study from the mid-2000s, Cheung, Chinn, and Pascual (2005) showed that neither the monetary, uncovered interest parity (UIP) nor behavioural equilibrium exchange rate (BEER) model is able to outperform the no-change forecast. A step forward was made by Molodtsova and Papell (2009), who proposed a model combining the UIP and Taylor rule equations and showed that it delivers competitive exchange rate forecasts. This result, however, has not been confirmed by more recent studies (Cheung, Chinn, Pascual, & Zhang, 2019; Engel, Lee, Liu, Liu, & Wu, 2019). In turn, Ca’ Zorzi and Rubaszek (2020) argue that a simple method assuming gradual adjustment of the exchange rate towards the level implied by Purchasing Power Parity (PPP) performs well over shorter as well as longer horizons. This result is consistent with the results of Ca’ Zorzi et al. (2017) and Eichenbaum, Johannsen, and Rebelo (2017), who showed that exchange rates are predictable within a general equilibrium DSGE framework (see Section 2.3.15) which encompasses an adjustment of the exchange rate to a PPP equilibrium. Finally, Ca’ Zorzi, Cap, Mijakovic, and Rubaszek (2020) discuss how extending the PPP framework to other fundamentals within the BEER framework does not help in exchange rate forecasting. Overall, at the current juncture it might be claimed that the ‘‘exchange rate disconnect puzzle’’ is still puzzling, with some evidence that methods based on PPP, and controlling for the forecast estimation error, can deliver more accurate forecasts than the random walk benchmark. A way forward to account for macroeconomic variables in exchange rate forecasting could be to use variable selection methods that allow controlling for the estimation error (see Section 2.5.3).

3.3.9. Financial time series forecasting with range-based volatility models115
Range-based (RB) volatility models is the general term for models constructed with high and low prices, and most often with their difference, i.e., the price range. A short review and classification of such models is contained in Section 2.3.14. From a practical point of view, it is important that low and high prices are almost always available alongside daily closing prices for financial series. The price range (or its logarithm) is a significantly more efficient estimator of volatility than the estimator based on closing prices (Alizadeh et al., 2002). Similarly, the co-range (the covariance based on price ranges) is a significantly more efficient estimator of the covariance of returns than the estimator based on closing prices (Brunetti & Lildholdt, 2002). For these reasons, models based on the price range and the co-range describe the variances and covariances of financial returns better than the ones based on closing prices.

Forecasts of volatility from simple models like moving average, EWMA, AR and ARMA based on the RB variance estimators are more accurate than the forecasts from the same models based on squared returns of closing prices (Rajvanshi, 2015; Vipul & Jacob, 2007). Forecasts of volatility from the AR model based on the Parkinson estimator are more precise even than the forecasts from the standard GARCH models (see Section 2.3.11) based on closing prices (Li & Hong, 2011).

In plenty of studies it has been shown that forecasts of the volatility of financial returns from univariate RB models are more accurate than the forecasts from standard GARCH models based on closing prices (see, for example, Mapa, 2003 for the GARCH-PARK-R model; Chou, 2005 for the CARR model; Fiszeder, 2005 for the GARCH-TR model; Brandt and Jones, 2006 for the REGARCH model; Chen et al., 2008 for the TARR model; Lin et al., 2012 for the STARR model; Fiszeder and Perczak, 2016

114 This subsection was written by Michał Rubaszek.
115 This subsection was written by Piotr Fiszeder.


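The Parkinson estimator referred to in Section 3.3.9 turns the daily high–low range into a variance estimate; its closing-price competitor is the squared log return. A minimal sketch of the two estimators side by side (toy prices, for illustration only):

```python
import math

def parkinson_var(high, low):
    """Parkinson daily variance estimate from the high-low price range:
    (ln(H/L))^2 / (4 ln 2)."""
    return math.log(high / low) ** 2 / (4.0 * math.log(2.0))

def close_to_close_var(close_today, close_yesterday):
    """Squared daily log return: the closing-price-based alternative."""
    return math.log(close_today / close_yesterday) ** 2

# toy day: traded between 99 and 102, closed at 101 after 100
range_based = parkinson_var(high=102.0, low=99.0)
close_based = close_to_close_var(101.0, 100.0)
```

Because the range uses intraday information that two closing prices discard, the range-based estimate is the more efficient of the two, which is the basis for the forecasting gains surveyed above.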
for the GARCH model estimated with low, high and closing prices during crisis periods; Molnár, 2016 for the RGARCH model).

The use of daily low and high prices in multivariate volatility models leads to more accurate forecasts of the covariance or covariance matrix of financial returns than the forecasts from models based on closing prices (see, for example, Chou et al., 2009 for the RB DCC model; Harris and Yilmaz, 2010 for the hybrid EWMA model; Fiszeder, 2018 for the BEKK-HL model; Fiszeder and Fałdziński, 2019 for the co-range DCC model; Fiszeder et al., 2019 for the DCC-RGARCH model).

The RB models have been used in many financial applications. They lead, for example, to more precise forecasts of value-at-risk measures in comparison to the application of only closing prices (see, for example, Chen, Gerlach, Hwang, and McAleer, 2012 for the threshold CAViaR model; Asai and Brugal, 2013 for the HVAR model; Fiszeder et al., 2019 for the DCC-RGARCH model; Meng and Taylor, 2020 for scoring functions). The application of multivariate RB models also increases the efficiency of hedging strategies (see, for example, Chou et al., 2009 for the RB DCC model; Harris and Yilmaz, 2010 for the hybrid EWMA model; Su and Wu, 2014 for the RB-MS-DCC model). Moreover, the RB volatility models have more significant economic value than the return-based ones in portfolio construction (Chou and Liu, 2010 for the RB DCC model; Wu and Liang, 2011 for the RB-copula model). Some studies show that, based on the forecasts from volatility models with low and high prices, it is possible to construct profitable investment strategies (He, Kwok, and Wan, 2010 for the VECM model; Kumar, 2015 for the CARRS model).

3.3.10. Copula forecasting with multivariate dependent financial time series116
In this section, we focus on the practical advances in jointly forecasting multivariate financial time series with copulas. In the copula framework (see Section 2.4.3), because marginal models and copula models are separable, point forecasts are straightforward with marginal models, but dependence information is ignored. A joint probabilistic forecast with copulas involves estimating both the copula distribution and the marginal models.

In financial time series, an emerging interest is to model and forecast asymmetric dependence. A typical asymmetric dependence phenomenon is that two stock returns exhibit greater correlation during market downturns than market upturns. Patton (2006) models the asymmetric dependence between exchange rates with a time-varying copula construction with AR and GARCH margins. A similar study for measuring financial contagion with copulas allows the parameters of the copula to change with the states of the variance to identify shifts in the dependence structure in times of crisis (Rodriguez, 2007).

In stock forecasting, Almeida and Czado (2012) employ a stochastic copula autoregressive model to model the DJI and Nasdaq, and the dependence at the time is modelled by a real-valued latent variable, which corresponds to the Fisher transformation of Kendall's τ. Li and Kang (2018) use a covariate-dependent copula framework to forecast the time-varying dependence, which improves both the probabilistic forecasting performance and the forecasting interpretability. Liquidity risk is another focus in finance. Weiß and Supper (2013) forecast three types of liquidity-adjusted intraday Value-at-Risk (L-IVaR) with a vine copula structure. The liquidity-adjusted intraday VaR is based on simulated portfolio values, and the results are compared with the realised portfolio profits and losses.

In macroeconomic forecasting, most existing reduced-form models for multivariate time series produce symmetric forecast densities. Gaussian copulas with skew Student's-t margins depict asymmetries in the predictive distributions of GDP growth and inflation (Smith & Vahey, 2016). Real-time macroeconomic variables are forecasted with heteroscedastic inversion copulas (Smith & Maneesoonthorn, 2018) that allow for asymmetry in the density forecasts, and both serial and cross-sectional dependence can be captured by the copula function (Loaiza-Maya & Smith, 2020).

Copulas are also widely used to detect and forecast default correlation, where a random variable called time-until-default denotes the survival time of each defaultable entity or financial instrument (Li, 2000). Copulas are then used in modelling dependent defaults (Li, 2000), forecasting credit risk (Bielecki & Rutkowski, 2013), and credit derivatives market forecasting (Schönbucher, 2003). A much larger volume of literature is available for this specific area; see the aforementioned references therein. For particular applications in default risk and credit default swap (CDS) forecasting, see Li and He (2019) and Oh and Patton (2018), respectively.

In energy economics, Aloui, Hammoudeh, and Nguyen (2013) employ the time-varying copula approach, where the marginal models are ARMA(p, q)–GARCH(1,1), to investigate the conditional dependence between the Brent crude oil price and stock markets in the Central and Eastern European transition economies. Bessa, Miranda, Botterud, Zhou, and Wang (2012) propose a time-adaptive quantile-copula where the copula density is estimated with a kernel density forecast method. The method is applied to wind power probabilistic forecasting (see also Section 3.4.6) and shows its advantages for both system operators and wind power producers. Vine copula models are also used to forecast wind power farms' uncertainty in power system operation scheduling. Wang, Wang, Liu, Wang, and Hou (2017) show that vine copulas have the advantage of providing reliable and sharp forecast intervals, especially in the case with limited observations available.

3.3.11. Financial forecasting with neural networks117
Neural Networks (NNs; see Section 2.7.8) are capable of successfully modelling non-stationary and non-linear series. This property has made them one of the most popular (if not the most popular) non-linear specifications used by practitioners and academics in Finance. For example, 89% of European banks use NNs in their operations (European

116 This subsection was written by Feng Li.
117 This subsection was written by Georgios Sermpinis.

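The separation of marginals and dependence described in Section 3.3.10 can be illustrated with a Gaussian copula: simulate correlated standard normals, map them to uniforms with the normal CDF, then push each uniform through a marginal quantile function. This is a toy sketch of the copula mechanism under stated assumptions, not any of the surveyed forecasting models:

```python
from math import erf, sqrt

import numpy as np

def gaussian_copula_sample(n, rho, marginal_ppfs, seed=0):
    """Draw joint samples whose dependence is a bivariate Gaussian
    copula with correlation `rho`, and whose marginals are given by
    the quantile (inverse-CDF) functions in `marginal_ppfs`."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    # copula step: standard normal CDF maps each margin to uniforms
    u = 0.5 * (1.0 + np.vectorize(erf)(z / sqrt(2.0)))
    # marginal step: apply each marginal's inverse CDF
    return np.column_stack([ppf(u[:, j]) for j, ppf in enumerate(marginal_ppfs)])

# toy marginals: an exponential and a uniform quantile function
samples = gaussian_copula_sample(
    1000, rho=0.8,
    marginal_ppfs=[lambda q: -np.log(1.0 - q), lambda q: q],
)
```

The point mirrored from the text: the marginals can be changed freely (here exponential and uniform) without touching the dependence model, which is what makes the copula framework convenient for joint probabilistic forecasting.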

Banking Federation, 2019), while 25.4% of all NN applications are in Finance (Wong, Bodnovich, & Selvi, 1995).

The first application of NNs in Finance, and currently the most widespread, is in financial trading. In the mid-80s, when computational power became cheaper and more accessible, hedge fund managers started to experiment with NNs in trading. Their initial success led even more practitioners to apply NNs, and nowadays 67% of hedge fund managers use NNs to generate trading ideas (BarclayHedge, 2018). A broad measure of the success of NNs in financial trading is provided by the Eurekahedge AI Hedge Fund Index,118 where the 13.02% annualised return of the selected AI hedge funds over the last 10 years is noteworthy.

In academia, financial trading with NNs is the focus of numerous papers. Notable applications of NNs in trading financial series were provided by Dunis, Laws, and Sermpinis (2010), Kaastra and Boyd (1996), Panda and Narasimhan (2007), Tenti (1996), and Zhang and Ming (2008). The aim of these studies is to forecast the sign or the return of financial trading series and, based on these forecasts, to generate profitable trading strategies. These studies are closely related to the ones presented in Section 3.3.13, but the focus is now on profitability.

The second major field of application of NNs in Finance is in derivatives pricing and financial risk management. The growth of the financial industry and of the financial services provided has made NNs and other machine learning algorithms a necessity for tasks such as fraud detection, information extraction and credit risk assessment (Buchanan, 2019). In derivatives pricing, NNs try to overcome the limitations of the Black–Scholes model and are being used in options pricing and hedging. In academia, notable applications of NNs in risk management are provided by Liu (2005) and Locarek-Junge and Prinzler (1998), and in derivatives by Bennell and Sutcliffe (2004) and Psaradellis and Sermpinis (2016).

As discussed before, financial series, due to their non-linear nature and their wide applications in practice, seem the perfect forecasting data set for researchers who want to test their NN topologies. As a result, there are thou-

3.3.12. Forecasting returns to investment style119
Investment style or factor portfolios are constructed from constituent securities on the basis of a variety of a-priori observable characteristics thought to affect future returns. For example, a ‘Momentum’ portfolio might be constructed with positive (‘long’) exposures to stocks with positive trailing 12-month returns, and negative (‘short’) exposure to stocks with negative trailing 12-month returns (for full background and context, see, for example, Bernstein, 1995; Haugen, 2010).120 Explanations as to why such characteristics seem to predict returns fall into two main camps: firstly, that the returns represent a risk premium, earned by the investor in return for taking on some kind of (undiversifiable) risk; and secondly, that such returns are the result of behavioural biases on the part of investors. In practice, both explanations are likely to drive style returns to a greater or lesser extent. Several such strategies have generated reasonably consistent positive risk-adjusted returns over many decades, but as with many financial return series, return volatility is large relative to the mean, and there can be periods of months or even years when returns deviate significantly from their long-run averages. The idea of timing exposure to styles is therefore at least superficially attractive, although the feasibility of doing so is a matter of some debate (Arnott, Beck, Kalesnik, & West, 2016; Asness, 2016; Bender, Sun, Thomas, & Zdorovtsov, 2018). Overconfidence in timing ability has a direct cost in terms of trading frictions and an opportunity cost in terms of potential expected returns and diversification forgone.

A number of authors write on the general topic of style timing (recent examples include Dichtl, Drobetz, Lohre, Rother, & Vosskamp, 2019; Hodges, Hogan, Peterson, & Ang, 2017), and several forecasting methodologies have been suggested, falling into three main camps:

1. Serial Correlation: Perhaps the most promising approach is exploiting serial correlation in style returns. Babu, Levine, Ooi, Pedersen, and Stamelos (2020) and Tarun and Bryan (2019) outline two such approaches, and Ehsani and Linnainmaa (2020) explore the relationship between momentum in factor portfolios and momentum in un-
derlying stock returns. As with valuation spreads
sands of forecasting papers in the field of NNs in financial
mentioned below, there is a risk that using momen-
forecasting. However, caution is needed in interpreta-
tum signals to time exposure to momentum factor
tion of their results. NNs are sensitive to the choice of
portfolios risks unwittingly compounding exposure.
their hyperparameters. For a simple MLP, a practitioner
A related strand of research relates (own) factor
needs to set (among others) the number and type of
volatility to future returns, in particular for mo-
inputs, the number of hidden nodes, the momentum, the
mentum factors (Barroso, 2015; Daniel & Moskowitz,
learning rate, the number of epochs and the batch size.
2016).
This complexity in NN modelling leads inadvertently to
2. Valuation Spreads: Using value signals (aggregated
the data snooping bias (see also Section 2.12.6). In other
from individual stock value exposures) to time ex-
words, a researcher that experiments long enough with posure to various fundamental-based strategies is
the parameters of a NN topology can have excellent in- a popular and intuitively appealing approach (As-
sample and out-of-sample results for a series. However, ness, 2016); however evidence of value added from
this does not mean that the results of his NN can be doing so is mixed, and the technique seems to
generalised. This issue has led the related literature to compound risk exposure to value factors.
be stained by studies cannot be extended in different
samples. 119 This subsection was written by Ross Hollyman.
120 The website of Kenneth French is an excellent source of data on
118 https://www.eurekahedge.com/Indices/IndexView/Eurekahedge/ investment style factor data and research. http://mba.tuck.dartmouth.
683/Eurekahedge-AI-Hedge-fund-Index (Accessed: 2020-09-01) edu/pages/faculty/ken.french/data_library.html
3. Economic & Financial Conditions: Polk, Haghbin, and de Longis (2020) explore how economic and financial conditions affect style returns (an idea that dates back at least to Bernstein (1995) and references therein).

Style returns exhibit distinctly non-normal distributions. On a univariate basis, most styles display returns which are highly negatively skewed and demonstrate significant kurtosis. The long-run low correlation between investment styles is often put forward as a benefit of style-based strategies, but more careful analysis reveals that non-normality extends to the co-movements of investment style returns; factors exhibit significant tail dependence. Christoffersen and Langlois (2013) explore this issue, also giving details of the skew and kurtosis of weekly style returns. These features of the data mean that focusing solely on forecasting the mean may not be sufficient, and building distributional forecasts becomes important for proper risk management. Jondeau (2007) writes extensively on modelling non-Gaussian distributions.

3.3.13. Forecasting stock returns121

121 This subsection was written by David E. Rapach.

Theory and intuition suggest a plethora of potentially relevant predictors of stock returns. Financial statement data (e.g., Chan & Genovese, 2001; Yan & Zheng, 2017) provide a wealth of information, and variables relating to liquidity, price trends, and sentiment, among numerous other concepts, have been used extensively by academics and practitioners alike to predict stock returns. The era of big data further increases the data available for forecasting returns. When forecasting with large numbers of predictors, conventional ordinary least squares (OLS) estimation is highly susceptible to overfitting, which is exacerbated by the substantial noise in stock return data (reflecting the intrinsically large unpredictable component in returns); see Section 2.7.11.

Over the last decade or so, researchers have explored methods for forecasting returns with large numbers of predictors. Principal component regression extracts the first few principal components (or factors) from the set of predictors; the factors then serve as predictors in a low-dimensional predictive regression, which is estimated via OLS (see Section 2.7.1). Intuitively, the factors combine the information in the individual predictors to reduce the dimension of the regression, which helps to guard against overfitting. Ludvigson and Ng (2007) find that a few factors extracted from hundreds of macroeconomic and financial variables improve out-of-sample forecasts of the US market return. Kelly and Pruitt (2013) and Huang, Jiang, Tu and Zhou (2015) use partial least squares (Wold, 1966) to construct target-relevant factors from a cross section of valuation ratios and a variety of sentiment measures, respectively, to improve market return forecasts.

Since Bates and Granger (1969), it has been known that combinations of individual forecasts often perform better than the individual forecasts themselves (Timmermann, 2006, and Section 2.6.1). Rapach, Strauss, and Zhou (2010) show that forecast combination can significantly improve out-of-sample market return forecasts. They first construct return forecasts via individual univariate predictive regressions based on numerous popular predictors from the literature (Goyal & Welch, 2008). They then compute a simple combination forecast by taking the average of the individual forecasts. Rapach et al. (2010) demonstrate that forecast combination exerts a strong shrinkage effect, thereby helping to guard against overfitting.

An emerging literature uses machine-learning techniques to construct forecasts of stock returns based on large sets of predictors. In an investigation of lead–lag relationships among developed equity markets, Rapach, Strauss, and Zhou (2013) appear to be the first to employ machine-learning tools to predict market returns. They use the elastic net (ENet: Zou & Hastie, 2005), a generalisation of the popular least absolute shrinkage and selection operator (LASSO: Tibshirani, 1996). The LASSO and ENet employ penalised regression to guard against overfitting in high-dimensional settings by shrinking the parameter estimates toward zero. Chinco, Clark-Joseph, and Ye (2019) use the LASSO to forecast high-frequency (one-minute-ahead) individual stock returns and report improvements in out-of-sample fit, while Rapach et al. (2019) use the LASSO to improve monthly forecasts of industry returns.

Incorporating insights from Diebold and Shin (2019), Han, He, Rapach, and Zhou (2021) use the LASSO to form combination forecasts of cross-sectional stock returns based on a large number of firm characteristics from the cross-sectional literature (e.g., Harvey, Liu, & Zhu, 2016; Hou, Xue, & Zhang, 2020; McLean & Pontiff, 2016), extending the conventional OLS approach of Green, Hand, and Zhang (2017), Haugen and Baker (1996), and Lewellen (2015). Dong, Li, Rapach, and Zhou (2021) and Rapach and Zhou (2020) use the ENet to compute combination forecasts of the market return based on popular predictors from the time-series literature and numerous anomalies from the cross-sectional literature, respectively. Forecasting individual stock returns on the basis of firm characteristics in a panel framework, Freyberger, Neuhierl, and Weber (2020) and Gu, Kelly, and Xiu (2020) employ machine-learning techniques – such as the nonparametric additive LASSO (Huang, Horowitz, & Wei, 2010), random forests (Breiman, 2001), and artificial neural networks – that allow for nonlinear predictive relationships.

3.3.14. Forecasting crashes in stock markets122

122 This subsection was written by Philip Hans Franses.

Time series data on financial asset returns have special features. Returns themselves are hard to forecast, while it seems that volatility of returns can be predicted. Empirical distributions of asset returns show occasional clusters of large positive and large negative returns. Large negative returns, that is, crashes, seem to occur more frequently than large positive returns. Forecasting upcoming increases or decreases in volatility can be achieved by using variants of the Autoregressive Conditional Heteroskedasticity (ARCH) model (Bollerslev, 1986; Engle, 1982, and Section 2.3.11) or realized volatility models
(Taylor, 1986a). These models take (functions of) past volatility and past returns as volatility predictors, although other explanatory variables can also be incorporated in the regression.

An important challenge that remains is to predict crashes. Sornette (2003) summarises potential causes for crashes: computer trading, increased trading in derivatives, illiquidity, trade and budget deficits, and, especially, herding behaviour of investors. Forecasting the exact timing of crashes may seem impossible, but it may be possible to forecast the probability that a crash will occur within a foreseeable future. Given the herding behaviour, any model used for prediction should include some self-exciting behaviour. For that purpose, Aït-Sahalia, Cacho-Diaz, and Laeven (2015) propose mutually exciting jump processes, where jumps can excite new jumps, also across assets or markets (see also Chavez-Demoulin, Davison, & McNeil, 2005). Another successful approach is the Autoregressive Conditional Duration (ACD) model (Engle & Russell, 1997, 1998), which refers to a time series model for durations between (negative) events.

An alternative view on returns' volatility and the potential occurrence of crashes draws upon the earthquake literature (Ogata, 1978, 1988). The idea is that tensions in and across tectonic plates build up until an eruption, and after that, tension starts to build up again until the next eruption. By modelling the tension-building-up process using so-called Hawkes processes (Hawkes, 1971, 2018; Hawkes & Oakes, 1974; Ozaki, 1979), one can exploit the similarities between earthquakes and financial crashes (see also Section 2.8.4). Gresnigt, Kole, and Franses (2015) apply Hawkes processes to daily S&P 500 data and show that it is possible to create reliable probability predictions of a crash occurrence within the next five days. Gresnigt, Kole, and Franses (2017a, 2017b) further develop a specification strategy for any type of asset returns, and document that there are spillovers across assets and markets.

Given investor behaviour, past crashes can ignite future crashes. Hawkes processes are particularly useful to describe this feature and can usefully be implemented to predict the probability of nearby crashes. These processes can also be useful for predicting social conflicts, as there too one may discern earthquake-like patterns. van den Hengel and Franses (2020) document their forecasting power for social conflicts in Africa.

3.4. Energy

3.4.1. Building energy consumption forecasting and optimisation123

123 This subsection was written by Christoph Bergmeir & Evangelos Spiliotis.

In Europe, buildings account for 40% of total energy consumed and 36% of total CO2 emissions (Patti, Acquaviva, Jahn, Pramudianto, Tomasi, Rabourdin, Virgone, & Macii, 2016). Given that energy consumption of buildings is expected to increase in the coming years, forecasting electricity consumption becomes critical for improving energy management and planning by supporting a large variety of optimisation procedures.

The main challenge in electricity consumption forecasting is that building energy systems are complex in nature, with their behaviour depending on various factors related to the type (e.g., residential, office, entertainment, business, and industrial) and the end-uses (e.g., heating, cooling, hot water, and lighting) of the building, its construction, its occupancy, the occupants' behaviour and schedule, the efficiency of the installed equipment, and the weather conditions (Zhao & Magoulès, 2012). Special events, holidays, and calendar effects can also affect the behaviour of the systems and further complicate the consumption patterns, especially when forecasting at an hourly or daily level (see Section 2.3.5). As a result, producing accurate forecasts typically requires developing tailored, building-specific methods.

To deal with this task, the literature focuses on three main classes of forecasting methods, namely engineering, statistical, and ML (Mat Daut, Hassan, Abdullah, Rahman, Abdullah, & Hussin, 2017). Engineering methods, typically utilised through software tools such as DOE-2, EnergyPlus, BLAST, and ESP-r, build on physical models that forecast consumption through detailed equations which account for the particularities of the building (Al-Homoud, 2001; Foucquier, Robert, Suard, Stéphan, & Jay, 2013; Zhao & Magoulès, 2012). Statistical methods usually involve linear regression (see Section 2.3.2), ARIMA/ARIMAX (see Section 2.3.4), and exponential smoothing (see Section 2.3.1) models that forecast consumption using past consumption data or additional explanatory variables, such as weather or occupancy and calendar related information (Deb, Zhang, Yang, Lee, & Shah, 2017). Finally, ML methods (see Section 2.7.10) typically involve neural networks (see Section 2.7.8), support vector machines, and grey models that account for multiple non-linear dependencies between the electricity consumed and the factors influencing its value (Ahmad, Hassan, Abdullah, Rahman, Hussin, Abdullah, & Saidur, 2014). To date, the literature has been inconclusive about which class of methods is the most appropriate, with the conclusions drawn being subject to the examined building type, the data set used, the forecasting horizon considered, and the data frequency at which the forecasts are produced (Wei, Li, Peng, Zeng, & Lu, 2019). To mitigate this problem, combinations of methods (see Section 2.6) and hybrids (see Section 2.7.13) have been proposed, reporting encouraging results (Mohandes, Zhang, & Mahdiyar, 2019; Zhao & Magoulès, 2012).

Other practical issues refer to data pre-processing. Electricity consumption data are typically collected at high frequencies through smart meters and therefore display noise and missing or extreme values due to monitoring issues (see Section 2.7.11). As a result, verifying the quality of the input data through diagnostics and data cleansing techniques (see Section 2.2.3 and Section 2.2.4), as well as optimising the selected time frames, are important for improving forecasting performance (Bourdeau, qiang Zhai, Nefzaoui, Guo, & Chatellier, 2019). Similarly, it is critical to engineer (see Section 2.2.5) and select (see Section 2.5.3) appropriate regressor variables which
are of high quality and possible to accurately predict, to assist electricity consumption forecasting. Finally, it must be carefully decided whether the bottom-up, the top-down, or a combination method (see Section 2.10.1) will be used for producing reconciled forecasts at both the building and end-use levels (Kuster, Rezgui, & Mourshed, 2017), possibly also mixed with temporal aggregation approaches (Spiliotis, Petropoulos et al., 2020, but also Section 2.10.3).

Provided that accurate forecasts are available, effective energy optimisation can take place at a building level or across blocks of buildings (see Section 3.4.10) to reduce energy cost, improve network stability, and support efforts towards a carbon-free future, by exploiting smart grid, internet of things (IoT), and big data technologies along with recommendation systems (Marinakis et al., 2020).

An example of a typical application in this area is the optimisation of heating, ventilation, and air conditioning (HVAC) systems. The goal is to minimise the energy use of the HVAC system under the constraints of maintaining certain comfort levels in the building (Marinakis, Doukas, Spiliotis, & Papastamatiou, 2017). Though this is predominantly an optimisation exercise, forecasting comes in at different points of the system as input into the optimisation, and many problems in this space involve forecasting as a sub-problem, including energy consumption forecasting, room occupancy forecasting, inside temperature forecasting, (hyper-local) forecasts of outside temperature, and air pressure forecasting for ventilation, among others. For instance, Krüger and Givoni (2004) use a linear regression approach to predict inside temperatures in 3 houses in Brazil, and Ruano, Crispim, Conceiçao, and Lúcio (2006) propose the use of a neural network to predict temperatures in a school building. Madaus, McDermott, Hacker, and Pullen (2020) predict hyper-local extreme heat events, combining global climate models and machine learning models. Jing, Cai, Chen, Zhai, Cui, and Yin (2018) predict air pressure to tackle the air balancing problem in ventilation systems, using a support vector machine.

Predicting energy demand on a building/household level from smart meter data is an important research topic, not only for energy savings. In the building space, Ahmad, Mourshed, and Rezgui (2017), Touzani, Granderson, and Fernandes (2018), and Wang, Wang, Zeng, Srinivasan, and Ahrentzen (2018) predict building energy consumption of residential and commercial buildings using decision tree-based algorithms (random forests and gradient boosted trees) and neural networks to improve energy efficiency.

A recent trend in forecasting is global forecasting models, built across sets of time series (Januschowski et al., 2020). (Recurrent) neural networks (Bandara et al., 2020a; Hewamalage et al., 2021) are particularly suitable for this type of processing due to their capabilities to deal with external inputs and cold-start problems. Such capabilities are necessary if there are different regimes in the simulations under which to predict; an example of such a system for HVAC optimisation is presented by Godahewa, Deng, Prouzeau, and Bergmeir (2020).

More generally, many challenges in the space of building energy optimisation are classical examples of so-called ''predict then optimise'' problems (Demirovic et al., 2019; Elmachtoub & Grigas, 2017). Here, different possible scenario predictions are obtained from different assumptions in the form of input parameters. These input parameters are then optimised to achieve a desired predicted outcome. As both prediction and optimisation are difficult problems, they are usually treated separately (Elmachtoub & Grigas, 2017), though there are now recent works where they are considered together (Demirovic et al., 2019; El Balghiti, Elmachtoub, Grigas, & Tewari, 2019), and this will certainly be an interesting avenue for future research.

3.4.2. Electricity price forecasting124

124 This subsection was written by Luigi Grossi & Florian Ziel.

Forecasting electricity prices involves various challenges that are highlighted in the detailed review paper by Weron (2014). Even though there are economically well-motivated fundamental electricity price models, forecasting models based on evaluating historic price data dominate the academic literature. In recent years, the focus on probabilistic forecasting has grown rapidly, as such forecasts are highly relevant for many applications in energy trading and risk management, storage optimisation, and predictive maintenance (Nowotarski & Weron, 2018; Ziel & Steinert, 2018). Electricity price data is highly complex and is influenced by regulation. Electricity is traded both through auctions and through continuous trading. Many markets, like the US and European markets, organise day-ahead auctions for electricity prices, see Fig. 7. Thus, we have to predict multivariate time series type data (Ziel & Weron, 2018). In contrast, intraday markets usually apply continuous trading to manage short-term variations due to changes in forecasts of renewable energy and demand, and outages (Kiesel & Paraschiv, 2017).

The key challenge in electricity price forecasting is to address all potential characteristics of the considered market, most notably (some of them visible in Fig. 7):

1. (time-varying) autoregressive effects and (in)stationarity,
2. calendar effects (daily, weekly and annual seasonality, holiday effects, clock-change),
3. (time-varying) volatility and higher moment effects,
4. price spikes (positive and negative), and
5. price clustering.

Some of those impacts can be explained by external inputs, which partially have to be predicted in advance:

1. load/demand/consumption (see Section 3.4.3),
2. power generation, especially from the renewable energy sources (RES) of wind and solar (see Sections 3.4.6 and 3.4.8),
3. relevant fuel prices (especially oil, coal, natural gas; see also Section 3.4.4),
4. prices of emission allowances (CO2e costs),
5. related power market prices (futures, balancing, and neighboring markets),
6. availabilities of power plants and interconnectors,
7. import/export flow related data, and
8. weather effects (e.g., temperature due to cooling and heating and combined heat and power (CHP) effects; see also Section 3.5.2).

Note that other weather effects might be relevant as well, but should be covered from the fundamental point of view by the listed external inputs. Obvious examples are wind speed for the wind power prediction, cloud cover for the solar power production, and illumination effects in the electricity consumption.

Fig. 7. Hourly German day-ahead electricity price data resulting from a two-sided auction (top left) with corresponding 24 sale/supply and purchase/demand curves for 24 May 2020 and highlighted curves for 17:00 (top right), power generation and consumption time series (bottom left), and bid structure of 24 May 2020 17:00 (bottom right).

Many of those external effects may be explained by standard economic theory from fundamental electricity price models (Cludius, Hermann, Matthes, & Graichen, 2014; Kulakov & Ziel, 2021). Even the simple supply stack model (merit order model), see Fig. 8, explains many features and should be kept in mind when designing an appropriate electricity price forecasting model.

In recent years, statistical and machine learning methods have gained a lot of attention in day-ahead electricity price forecasting. Even though the majority of effects are linear, there are specific non-linear dependencies that can be explored by using non-linear models, especially neural networks (Dudek, 2016; Lago, De Ridder, & De Schutter, 2018; Marcjasz, Uniejewski, & Weron, 2019; Ugurlu, Oksuz, & Tas, 2018). Of course, this comes along with higher computational costs compared to linear models. Fezzi and Mosetti (2020) illustrate that even simple linear models can give highly accurate forecasts, if correctly calibrated. However, there seems to be consensus that forecast combination is appropriate, particularly for models that have different structures or different calibration window lengths (Gaillard, Goude, & Nedellec, 2016; Hubicka, Marcjasz, & Weron, 2018; Mirakyan, Meyer-Renschhausen, & Koch, 2017).

Another increasing stream of electricity price forecasting models does not focus on the electricity price itself, but on the bid/sale/sell/supply and ask/sell/purchase/demand curves of the underlying auctions (see Fig. 7, but also
Kulakov, 2020; Mestre, Portela, San Roque, & Alonso, 2020; Shah & Lisi, 2020; Ziel & Steinert, 2016). This sophisticated forecasting problem allows more insights for trading applications and the capturing of price clusters.

Fig. 8. Illustrative example of a supply stack model with inelastic demand for different power plant types, roughly covering the situation in Germany in 2020.

The literature on forecasting intraday markets has just started to grow quickly, as the aforementioned market characteristics become less distinct if information from day-ahead markets is taken into account appropriately. However, intraday prices are usually more volatile and exhibit stronger price spikes. Thus, probabilistic forecasting is even more relevant (Janke & Steinke, 2019; Narajewski & Ziel, 2020b). Recent studies showed that European markets are close to weak-form efficiency, and thus naive point forecasting benchmarks perform remarkably well (Marcjasz, Uniejewski, & Weron, 2020; Narajewski & Ziel, 2020a; Oksuz & Ugurlu, 2019).

As pointed out above, predicting price spikes is particularly important in practice, due to their high impact on decision making problems, which usually occur in extreme situations, see Fig. 8. Very high electricity prices are usually observed in connection with high demand and low renewable energy generation, sometimes together with sudden power plant failures. In contrast, negative price spikes occur in oversupply situations, when there is low demand but high penetration from wind and solar power. The presence of spikes is explored in two main streams of the literature: spike forecasting, and the prediction of prices under the normal regime through robust estimators.

Within the first set of papers, spikes are often modelled as one regime of non-linear models for time series. This approach is followed by Mount, Ning, and Cai (2006), focusing on regime-switching models with parameters driven by time-varying variables, and by Becker, Hurn, and Pavlov (2008), who adopt Markov switching models for spike prediction. Christensen, Hurn, and Lindsay (2009, 2012) suggest treating and forecasting price spikes through Poisson autoregressive and discrete-time processes, respectively. Herrera and González (2014) use a Hawkes model combined with extreme events theory. Interregional links among different electricity markets are used by Clements, Herrera, and Hurn (2015) and Manner, Türk, and Eichler (2016) to forecast electricity price spikes. A new procedure for the simulation of electricity spikes has been recently proposed by Muniain and Ziel (2020), utilising bivariate jump components in a mean reverting jump diffusion model in the residuals.

The second stream of the literature includes papers developing outlier detection methods or robust estimators to improve the forecasting performance of the models. Martínez-Álvarez, Troncoso, Riquelme, and Aguilar-Ruiz (2011) tackle the issue of outlier detection and prediction by defining ''motifs'', that is, patches of units preceding observations marked as anomalous in a training set. Janczura, Trück, Weron, and Wolff (2013) focus on the detection and treatment of outliers in electricity prices. A very similar approach, based on seasonal autoregressive models and outlier filtering, is followed by Afanasyev and Fedorova (2019). Grossi and Nan (2019) introduced a procedure for the robust statistical prediction of electricity prices; the econometric framework is the robust estimation of non-linear SETAR processes. A similar approach has been followed by Wang, Yang, Du, and Niu (2020) using an outlier-robust machine learning algorithm.
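As a minimal illustration of this second, robust-filtering stream (a generic sketch, not the specific method of any paper cited above), spikes can be flagged by their distance from a rolling median measured in robust, MAD-based units; the window length and threshold below are illustrative choices:

```python
import statistics

def flag_spikes(prices, window=48, k=5.0):
    """Flag price spikes as points far from a rolling median.

    A point is a spike if it deviates from the median of its
    surrounding window by more than k robust standard deviations
    (median absolute deviation scaled by 1.4826). The window length
    and threshold k are illustrative choices, not calibrated values.
    """
    flags = []
    half = window // 2
    for i, p in enumerate(prices):
        lo, hi = max(0, i - half), min(len(prices), i + half + 1)
        # exclude the point itself from its own neighbourhood
        neighbourhood = prices[lo:i] + prices[i + 1:hi]
        med = statistics.median(neighbourhood)
        mad = statistics.median([abs(x - med) for x in neighbourhood])
        scale = 1.4826 * mad or 1.0  # guard against a zero MAD
        flags.append(abs(p - med) > k * scale)
    return flags
```

Flagged observations can then be down-weighted, imputed, or handed to a separate spike model, in the spirit of the outlier-filtering approaches cited above.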

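The supply stack (merit order) logic behind such spikes (see Fig. 8) can also be sketched in a few lines: offers are sorted by marginal cost and dispatched until inelastic demand is covered, and the last dispatched offer sets the price. The stack below is purely hypothetical and only illustrates the mechanism:

```python
def merit_order_price(stack, demand):
    """Market clearing under inelastic demand: dispatch offers in
    order of increasing marginal cost until demand is covered; the
    last dispatched offer sets the price. Returns None if demand
    exceeds total offered capacity."""
    remaining = demand
    for marginal_cost, capacity in sorted(stack):
        remaining -= capacity
        if remaining <= 0:
            return marginal_cost
    return None

# Hypothetical stack: (marginal cost in EUR/MWh, capacity in GW)
stack = [(0, 30), (5, 8), (20, 10), (40, 12), (60, 8), (90, 5)]
```

With this stack, demand of 45 GW clears at 20 EUR/MWh while 72 GW already clears at 90 EUR/MWh: once cheap capacity is exhausted, small shifts in demand or renewable supply produce large price moves, which is one mechanism behind the spikes discussed above.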
3.4.3. Load forecasting125

Load forecasting forms the basis upon which power system operation and planning are built. Based on the time horizon of the forecasts, load forecasting can be classified into very short-term (VSTLF), which refers to horizons from several minutes ahead up to 1 h; short-term (STLF), which spans from 1 h to 168 h ahead; medium-term (MTLF), which spans from 168 h to 1 year ahead; and, finally, long-term (LTLF), which concerns predictions from 1 year to several years ahead. In VSTLF and STLF applications, the focus is on the sub-hourly or hourly load. In MTLF and LTLF, the variables of interest can be either the monthly electricity peak load or the total demand for energy.

Inputs differ across the various horizons. In VSTLF and STLF, apart from meteorological data, day type identification codes are used. In LTLF, macroeconomic data are used, since the total demand for energy is influenced by long-term modifications of the social and economic environments. Among the horizons, special attention is placed on STLF. This is reflected by the research momentum that has been placed on STLF in the load forecasting literature (Hong & Fan, 2016). Processes like unit commitment and optimal power flow rely on STLF (Bo & Li, 2012; Saksornchai, Lee, Methaprayoon, Liao, & Ross, 2005). Additionally, since competitive energy markets continually evolve, STLF becomes vital for new market entities such as retailers, aggregators, and prosumers for applications such as strategic bidding, portfolio optimisation, and tariff design (Ahmad, Javaid, Mateen, Awais, & Khan, 2019; Danti & Magnani, 2017).

The models that can be found in the load forecasting literature can in general be categorised into three types: time-series, machine learning, and hybrid. Time-series models historically precede the others. Typical examples of this family are ARMA, ARIMA, and others (see also Section 2.3.4). In machine learning models, the structure is usually determined via the training process. NNs are commonly used. Once a NN is sufficiently trained, it can provide forecasts for all types of forecasting horizons (Hippert, Pedreira, & Souza, 2001). The third category refers to the integration of two or more individual forecasting approaches (see also Section 2.7.13). Li, Tan, and Zhou (2018) combined empirical mode decomposition (EMD), ARIMA, and wavelet neural networks (WNN) optimised by the fruit fly algorithm on Australian market data and New York City data. Their approach was to separate the linear and nonlinear components of the original electricity load; ARIMA is used for the linear part while the WNN handles the non-linear one.

Sideratos, Ikonomopoulos, and Hatziargyriou (2020) proposed that a radial basis network that performs the initial forecasting could serve as input to a convolutional neural network that performs the final forecasting. The proposed model led to lower errors compared to the persistence model, NN, and SVM. Semero, Zhang, and Zheng (2020) focused on the energy management of a microgrid located in China, using EMD to decompose the load, an adaptive neuro-fuzzy inference system (ANFIS) for forecasting, and particle swarm optimisation (PSO) to optimise the ANFIS parameters. The results show that the proposed approach yielded superior performance over four other methods. Faraji, Ketabi, Hashemi-Dezaki, Shafie-Khah, and Catalão (2020) proposed a hybrid system for the scheduling of a prosumer microgrid in Iran. Various machine learning algorithms provided load and weather forecasts. Through an optimisation routine, the best individual forecast is selected. The hybrid system displayed better accuracy than the sole application of the individual forecasters.

3.4.4. Crude oil price forecasting126

Crude oil, one of the leading energy resources, has contributed to over one-third of the world's energy consumption (Alvarez-Ramirez, Soriano, Cisneros, & Suarez, 2003). The fluctuations of the crude oil price have a significant impact on industries and governments as well as individuals, with substantial ups and downs of the crude oil price bringing dramatic uncertainty for economic and political development (Cunado & De Gracia, 2005; Kaboudan, 2001). Thus, it is critical to develop reliable methods to accurately forecast crude oil price movement, so as to guard against extreme crude oil market risks and improve macroeconomic policy responses. However, the crude oil price movement suffers from complex fea-
For instance, a NN can be combined with time series tures such as nonlinearity, irregularities, dynamics and
methods, with unsupervised machine learning algorithms, high volatility (Alquist, Kilian, & Vigfusson, 2013; Herrera,
data transformation, and with meta-heuristics algorithms Hu, & Pastor, 2018; Kang, Kang, & Yoon, 2009, and also
(Bozkurt, Biricik, & Tayşi, 2017; El-Hendawi & Wang, Section 2.3.11), making the crude oil price forecasting still
2020; López, Zhong, & Zheng, 2017; Lu, Azimi, & Iseley, one of the most challenging forecasting problems.
2019). Some prior studies have suggested that the crude oil
Hybrid systems has been tested on validation data price movement is inherently unpredictable, and it would
(through forecasting competitions), power system aggre- be pointless and futile to attempt to forecast future prices,
gated load, and application oriented tasks. Ma (2021) see Miao, Ramchander, Wang, and Yang (2017) for a de-
proposed an ensemble method based on a combination tailed summary. These agnostics consider the naive no-
of various single forecasters on GEFCom2012 forecasting change forecast as the best available forecast value of
competition data that outperformed benchmark forecast- future prices. In recent years, however, numerous stud-
ers such as Theta method, NN, ARIMA, and others (see ies result in forecasts that are more accurate than naive
Section 2.12.7 for further discussions on forecasting com- no-change forecasts, making the forecasting activities of
petitions). For aggregated load cases, researchers focus crude oil prices promising (Alquist et al., 2013; Baumeis-
on different countries and energy markets. Zhang, Wei, ter, Guérin, & Kilian, 2015).

125 This subsection was written by Ioannis Panapakidis. 126 This subsection was written by Xiaoqian Wang.
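The hybrid idea discussed above for load forecasting, a linear time-series model for the regular component plus a machine-learning model for the nonlinear residual, can be sketched as follows. This is a minimal illustration, not any cited author's implementation: the AR(1)-plus-nearest-neighbour combination, the synthetic load series, and all parameter values are assumptions made for the example.

```python
import math

def fit_ar1(y):
    """Least-squares AR(1): y[t] ~ a + b*y[t-1] (the 'linear part')."""
    x, z = y[:-1], y[1:]
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    b = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z)) / \
        sum((xi - mx) ** 2 for xi in x)
    a = mz - b * mx
    return a, b

def knn_residual(train_resid, last_resid, k=3):
    """Predict the next residual by averaging the successors of the k
    historical residuals closest to the current one (the 'nonlinear part')."""
    pairs = sorted(
        (abs(train_resid[i] - last_resid), train_resid[i + 1])
        for i in range(len(train_resid) - 1)
    )[:k]
    return sum(nxt for _, nxt in pairs) / k

# Synthetic hourly 'load' with a daily cycle plus a nonlinear component.
load = [100 + 20 * math.sin(2 * math.pi * t / 24) + 5 * math.sin(t) ** 2
        for t in range(200)]

a, b = fit_ar1(load)
resid = [load[t] - (a + b * load[t - 1]) for t in range(1, len(load))]
linear_part = a + b * load[-1]
forecast = linear_part + knn_residual(resid[:-1], resid[-1])
```

The linear model captures persistence; the residual learner stands in for the WNN or NN component of the hybrid schemes cited above.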

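A common first check in the crude oil literature above is whether a candidate model beats the naive no-change forecast. The comparison can be sketched as follows; the random-walk test series and the drift "model" are invented purely for illustration.

```python
import math
import random

def rmse(actual, predicted):
    """Root mean squared error of a forecast sequence."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(actual))

random.seed(1)
# Synthetic price path: a random walk, the hardest case for any model.
price = [60.0]
for _ in range(500):
    price.append(price[-1] + random.gauss(0, 1))

train, test = price[:400], price[400:]

# Naive no-change: tomorrow's forecast is simply today's price.
naive = test[:-1]
# Candidate 'model': no-change plus the average historical drift.
drift = (train[-1] - train[0]) / (len(train) - 1)
candidate = [p + drift for p in test[:-1]]

actual = test[1:]
skill = rmse(actual, naive) / rmse(actual, candidate)  # > 1: candidate wins
```

On a pure random walk the skill ratio hovers around 1, which is exactly why the no-change benchmark is so hard to beat for oil prices.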

Extensive research on crude oil price forecasting has focused predominantly on econometric models, such as VAR, ARCH-type, ARIMA, and Markov models (see, for example, Agnolucci, 2009; e Silva, Legey, & e Silva, 2010; Mirmirani & Li, 2004; Mohammadi & Su, 2010, and Section 2.3). In the forecasting literature, unit root tests (see Section 2.3.4) are commonly applied to examine the stationarity of crude oil prices prior to econometric modelling (Rahman & Serletis, 2012; Serletis & Rangel-Ruiz, 2004; Silvapulle & Moosa, 1999). It is well-documented that crude oil prices are driven by a large set of external components, which are themselves hard to predict, including supply and demand forces, stock market activities, oil-related events (e.g., war, weather conditions), political factors, etc. In this context, researchers have frequently considered structural models (see Section 2.3.9), which relate the oil price movements to a set of economic factors. With so many econometric models, is there an optimal one? Recently, de Albuquerquemello, de Medeiros, da Nóbrega Besarria, and Maia (2018) proposed a SETAR model, allowing for predictive regimes changing after a detected threshold, and achieved performance improvements over six widely used econometric models. Despite their high computational efficiency, the econometric models are generally limited in their ability to model nonlinear time series.

On the other hand, artificial intelligence and machine learning techniques, such as belief networks, support vector machines (SVMs), recurrent neural networks (RNNs), and extreme gradient boosting (XGBoost), have provided powerful solutions for recognising the nonlinear and irregular patterns of crude oil price movements with high automation (see, for example, Abramson & Finizza, 1991; Gumus & Kiran, 2017; Mingming & Jinliang, 2012; Xie, Yu, Xu, & Wang, 2006). However, challenges also exist in these techniques, such as computational cost and overfitting. In addition, a large number of studies have increasingly focused on hybrid forecasting models (see also Section 2.7.13) based on econometric models and machine learning techniques (Baumeister & Kilian, 2015; Chiroma, Abdulkareem, & Herawan, 2015; He, Yu, & Lai, 2012; Jammazi & Aloui, 2012), achieving improved performance. Notably, the vast majority of the literature has focused primarily on deterministic prediction, with much less attention paid to probabilistic prediction and uncertainty analysis. However, the high volatility of crude oil prices makes probabilistic prediction more crucial for reducing risk in decision-making (Abramson & Finizza, 1995; Sun, Sun, Wang, & Wei, 2018).

3.4.5. Forecasting renewable energy technologies127
The widespread adoption of renewable energy technologies (RETs) plays a driving role in the transition to low-carbon energy systems, a key challenge in facing climate change and energy security problems. Forecasting the diffusion of RETs is critical for planning a suitable energy agenda and setting achievable targets in terms of electricity generation, although the available time series are often very short and pose difficulties in modelling. According to Rao and Kishore (2010), renewables' typical characteristics, such as low load factor, need for energy storage, small size, and high upfront costs, create a competitive disadvantage, while Meade and Islam (2015b) suggested that renewable technologies are different from other industrial technological innovations because, in the absence of focused support, they are not convenient from a financial point of view. In this sense, policy measures and incentive mechanisms, such as feed-in tariffs, have been used to stimulate the market. As highlighted in Lee and Huh (2017b), forecasting RETs requires capturing different socio-economic aspects, such as policy choices by governments, carbon emissions, macroeconomic factors, economic and financial development of a country, and the competitive strength of traditional energy technologies.

The complex and uncertain environment concerning RETs deployment has been faced in the literature in several ways, in order to account for the various determinants of the transition process. A first stream of research employed a bottom-up approach, where forecasts at a lower level are aggregated to higher levels within the forecasting hierarchy. For instance, Park, Yun, Yun, Lee, and Choi (2016) realised a bottom-up analysis to study the optimum renewable energy portfolio, while Lee and Huh (2017a) performed a three-step forecasting analysis, to reflect the specificities of renewable sources, by using different forecasting methods for each of the sources considered. A similar bottom-up perspective was adopted in Zhang, Bauer, Yin, and Xie (2020), by conducting a multi-region study, to understand how multi-level learning may affect RETs dynamics, with the regionalised model of investment and technological development, a general equilibrium model linking macro-economic growth with a bottom-up engineering-based energy system model.

The relative newness of RETs has posed the challenge of forecasting with a limited amount of data: in this perspective, several contributions applied 'Grey System' theory, a popular methodology for dealing with systems with partially unknown parameters (Kayacan, Ulutas, & Kaynak, 2010). Grey prediction models for RETs forecasting were proposed in Liu and Wu (2021), Lu (2019), Moonchai and Chutsagulprom (2020), Tsai, Xue, Zhang, Chen, Liu, Zhou, and Dong (2017) and Wu, Ma, Zeng, Wang, and Cai (2019).

Other studies developed forecasting procedures based on growth curves and innovation diffusion models (see Sections 2.3.18–2.3.20): from the seminal work by Marchetti and Nakicenovic (1979), contributions on the diffusion of RETs were proposed by Bunea, Della Posta, Guidolin, and Manfredi (2020), Dalla Valle and Furlan (2011), Guidolin and Mortarino (2010), Lee and Huh (2017b) and Meade and Islam (2015b). Forecasting the diffusion of renewable energy technologies was also considered within a competitive environment in Furlan and Mortarino (2018), Guidolin and Alpcan (2019), Guidolin and Guseo (2016) and Huh and Lee (2014).

3.4.6. Wind power forecasting128
Wind energy is a leading source of renewable energy, meeting 4.8% of global electricity demand in 2018, more

127 This subsection was written by Mariangela Guidolin.
128 This subsection was written by Jethro Browell.


than twice that of solar energy (IEA, 2020). Kinetic energy in the wind is converted into electrical energy by wind turbines according to a characteristic 'power curve'. Power production is proportional to the cube of the wind speed at low-to-moderate speeds, and above this is constant at the turbine's rated power. At very high or low wind speeds no power is generated. Furthermore, the power curve is influenced by additional factors including air density, icing, and degradation of the turbine's blades. Forecasts of wind energy production are required from minutes to days ahead to inform the operation of wind farms, participation in energy markets, and power systems operations. However, the limited predictability of the weather (see also Section 3.5.2) and the complexity of the power curve make this challenging. For this reason, probabilistic forecasts are increasingly used in practice (Bessa et al., 2017). Their value for energy trading is clear (Pinson, Chevallier, & Kariniotakis, 2007), but quantifying their value for power system operation is extremely complex. Wind power forecasting may be considered a mature technology, as many competing commercial offerings exist, but research and development efforts to produce novel and enhanced products are ongoing (see also Section 3.4.5).

Short-term forecasts (hours to days ahead) of wind power production are generally produced by combining numerical weather predictions (NWP) with a model of the wind turbine, farm, or even regional power curve, depending on the objective. The power curve may be modelled using physical information, e.g. provided by the turbine manufacturer, in which case it is also necessary to post-process NWP wind speeds to match the same height-above-ground as the turbine's rotor. More accurate forecasts can be produced by learning the NWP-to-energy relationship from historic data when it is available. State-of-the-art methods for producing wind power forecasts leverage large quantities of NWP data to produce a single forecast (Andrade, Filipe, Reis, & Bessa, 2017) and detailed information about the target wind farm (Gilbert, Browell, & McMillan, 2020a). A number of practical aspects may also need to be considered by users, such as maintenance outages and requirements to reduce output for other reasons, such as noise control or electricity network issues.

Very short-term forecasts (minutes to a few hours ahead) are also of value, and on these time scales recent observations are the most significant input to forecasting models and more relevant than NWP. Classical time series methods perform well (see Section 2.3), and those which are able to capture spatial dependency between multiple wind farms are state-of-the-art, notably vector autoregressive models and variants (Cavalcante, Bessa, Reis, & Browell, 2016; Messner & Pinson, 2018). Care must be taken when implementing these models, as wind power time series are bounded by zero and the wind farm's rated power, meaning that errors may not be assumed to be normally distributed. The use of transformations is recommended (see also Section 2.2.1), though the choice of transformation depends on the nature of the individual time series (Pinson, 2012).

Wind power forecasting is reviewed in detail in Giebel and Kariniotakis (2017), Hong, Pinson, Wang, Weron, Yang, and Zareipour (2020) and Zhang, Wang, and Wang (2014), and research is ongoing in a range of directions including: improving accuracy and reducing uncertainty in short-term forecasting, extending forecast horizons to weeks and months ahead, and improving very short-term forecasts with remote sensing and data sharing (Sweeney, Bessa, Browell, & Pinson, 2019, and Section 3.4.10).

3.4.7. Wave forecasting129
Ocean waves are primarily generated by persistent winds in one direction. The energy thus propagated by the wind is referred to as wave energy flux and follows a linear function of wave height squared and wave period. Wave height is typically measured as significant wave height, the average height of the highest third of the waves. The mean wave period, typically measured in seconds, is the average time between the arrival of consecutive crests, whereas the peak wave period is the wave period at which the highest energy occurs at a specific point.

The benefit of wave energy is that it requires significantly less reserve compared to wind (see Section 3.4.6) and solar (see Section 3.4.8) renewable energy sources (Hong et al., 2016). For example, the forecast error at one hour ahead for simulated wave farms is typically in the range of 5%–7%, while the forecast errors for solar and wind are 17% and 22% respectively (Reikard, Pinson, & Bidlot, 2011). Solar power is dominated by diurnal and annual cycles but also exhibits nonlinear variability due to factors such as cloud cover, temperature, and precipitation. Wind power is dominated by large ramp events such as irregular transitions between states of high and low power. Wave energy exhibits annual cycles and is generally smoother, although there are still some large transitions, particularly during the winter months. In the first few hours of forecasting wave energy, time series models are known to be more accurate than numerical wave prediction. Beyond these forecast horizons, numerical wave prediction models such as SWAN (Simulating WAves Nearshore; Booij, Ris, & Holthuijsen, 1999) and WAVEWATCH III® (Tolman, 2008) are widely used. As there is as yet no consensus on the most efficient model for harnessing wave energy, potential wave energy is primarily measured with energy flux, but the wave energy harnessed typically follows non-linear functions of wave height and wave period in the observations of the six different types of wave energy converters (Reikard, Robertson, Buckham, Bidlot, & Hiles, 2015).

To model the dependencies of wind speed, wave height, wave period and their lags, Reikard et al. (2011) use linear regressions, which were then converted to forecasts of energy flux. Pinson, Reikard, and Bidlot (2012) use Reikard et al.'s (2011) regression model and log-normal distribution assumptions to produce probabilistic forecasts. López-Ruiz, Bergillos, and Ortega-Sánchez (2016) model the temporal dependencies of significant wave heights, peak wave periods and mean wave direction using a vector autoregressive model, and used them to produce medium- to long-term wave energy

129 This subsection was written by Jooyoung Jeon.


forecasts. Jeon and Taylor (2016) model the temporal dependencies of significant wave heights and peak wave periods using a bivariate VARMA-GARCH model (see also Section 2.3.11) to convert the two probabilistic forecasts into a probabilistic forecast of wave energy flux, finding this approach worked better than either univariate modelling of wave energy flux or bivariate modelling of wave energy flux and wind speed. Taylor and Jeon (2018) produce probabilistic forecasts for wave heights using a bivariate VARMA-GARCH model of wave heights and wind speeds, and use the forecasts to optimise decision making for scheduling offshore wind farm maintenance vessels dispatched under stochastic uncertainty. On the same subject, Gilbert, Browell, and McMillan (2020b) use statistical post-processing of numerical wave predictions to produce probabilistic forecasts of wave heights, wave periods and wave direction, and a logistic regression to determine the regime of the variables. They further applied the Gaussian copula to model temporal dependency, but this did not improve their probabilistic forecasts of wave heights and periods.

3.4.8. Solar power forecasting130
Over the past few years, a number of forecasting techniques for photovoltaic (PV) power systems have been developed and presented in the literature. In general, the quantitative comparison among different forecast techniques is challenging, as the factors influencing the performance are numerous: the historical data, the weather forecast, the temporal horizon and resolution, and the installation conditions. A recent review by Sobri, Koohi-Kamali, and Rahim (2018) presents a comparative analysis of previous works, also including statistical errors. However, since the conditions and metrics used in each work were different, the comparison is not very meaningful. Dolara, Grimaccia, Leva, Mussetta, and Ogliari (2018) present relevant evaluation metrics for PV forecasting accuracy, while Leva, Mussetta, and Ogliari (2019) compare their effectiveness and immediate comprehension. In terms of forecast horizon for PV power systems, intraday (Nespoli et al., 2019) and the 24 h of the next day (Mellit et al., 2020) are considered the most important.

Nespoli et al. (2019) compared two of the most widely used and effective methods for the forecasting of PV production: a method based on a Multi-Layer Perceptron (MLP) and a hybrid method using an artificial neural network combined with clear-sky solar radiation (see also Sections 2.7.8 and 2.7.13). In the second case, the simulations are based on a feed-forward neural network (FFNN) but, among the inputs, the irradiation in clear-sky conditions is provided. This method is called Physical Hybrid Artificial Neural Network (PHANN) and is graphically depicted in Fig. 9 (Dolara, Grimaccia, Leva, Mussetta, & Ogliari, 2015). The PHANN method demonstrates better performance than classical NN methods. Fig. 10 shows a comparison between the measured and forecasted hourly output power of the PV plant for both sunny and cloudy days. The PHANN method shows good forecasting performance, especially for sunny days.

Ogliari, Dolara, Manzolini, and Leva (2017) compared the PV output power day-ahead forecasts performed by deterministic (based on three- and five-parameter electric equivalent circuits) and stochastic hybrid (based on artificial neural network models) methods, aiming to find the best performance conditions. In general, there is no significant difference between the two deterministic models, with the three-parameter approach being slightly more accurate. Fig. 11 shows the daily value of normalised mean absolute error (NMAE%) for 216 days, evaluated by using PHANN and the three-parameter electric circuit. The PHANN hybrid method achieves the best forecasting results, and only a few days of training can provide accurate forecasts.

Dolara et al. (2018) analysed the effect of different approaches in the composition of a training data-set for the day-ahead forecasting of PV power production based on NN. In particular, the influence of different data-set compositions on the forecast outcome has been investigated by increasing the size of the training set and by varying the lengths of the training and validation sets, in order to assess the most effective training method of this machine learning approach. As a general comment on the reported results, it can be stated that a method that employs the same chronologically consecutive samples for training is best suited when the availability of historical data is limited (for example, in a newly deployed PV plant), while training based on randomly mixed samples appears to be most effective in the case of greater data availability. Generally speaking, ensembles composed of independent trials are most effective.

3.4.9. Long-term simulation for large electrical power systems131
In large electrical power systems with renewable energy dependence, the power generators need to be scheduled to supply the system demand (de Queiroz, 2016). In general, for modelling the long-term future behaviour of renewables, such as hydro, wind and solar photovoltaics (PV), stochastic scenarios should be included in the scheduling, usually in a dispatch optimisation problem under uncertainty, as described, for small systems, in Section 3.4.1 and, for wave forecasting, in Section 3.4.7. Due to the complexity and uncertainty associated, this problem is, in general, modelled with time series scenarios and multi-stage stochastic approaches. de Queiroz (2016) presented a review for hydrothermal systems, with a focus on the optimisation algorithms. Sections 3.4.6 and 3.4.8 explore the up-to-date methods for wind and PV solar power forecasting.

Here, we emphasise the importance of forecasting with simulation in long-term renewable energy planning, especially in hydroelectric systems. In this context, due to the spatial and temporal dependence structure of the data, time series models are useful for the generation of future scenarios. Although the proposal could be forecasting for short-term planning and scheduling (as described in Sections 3.4.6, 3.4.7, and 3.4.8), simulation strategies

130 This subsection was written by Sonia Leva.
131 This subsection was written by Fernando Luiz Cyrino Oliveira.


Fig. 9. Physical Hybrid Artificial Neural Network (PHANN) for PV power forecasting.

Fig. 10. Measured versus forecasted output power by MLP and PHANN methods.

Fig. 11. Daily NMAE% of the PHANN method trained with 10 days (left) and with 215 days (right) compared with the three-parameters model.
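The NMAE% reported in Fig. 11 is, in its common formulation, the mean absolute error normalised by the plant's rated capacity. The sketch below assumes that formulation; the hourly sample data and the 100 kW plant are invented for illustration.

```python
def nmae_percent(measured, forecast, rated_capacity):
    """Normalised mean absolute error: MAE as a percentage of capacity."""
    mae = sum(abs(m - f) for m, f in zip(measured, forecast)) / len(measured)
    return 100.0 * mae / rated_capacity

# Daylight hours of PV output (kW) for a hypothetical 100 kW plant.
measured = [0, 0, 5, 20, 45, 70, 85, 90, 80, 60, 30, 10]
forecast = [0, 0, 8, 25, 40, 72, 80, 95, 78, 55, 35, 12]
daily_nmae = nmae_percent(measured, forecast, rated_capacity=100.0)  # 3.25
```

Normalising by capacity (rather than by the measured power) keeps the metric finite at night, when PV output is zero.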

are explored for considering and estimating uncertainty in medium and/or long-term horizons.

According to Hipel and McLeod (1994), stochastic processes of natural phenomena, such as the renewable ones, are, in general, stationary. One of the main features of hydroelectric generation systems is the strong dependence on hydrological regimes. To deal with this task, the literature focuses on two main classes for forecasting/

simulation streamflow data: physical and data-driven models (Zhang, Peng, Zhang, & Wang, 2015). Water resources management for hydropower generation and energy planning is one of the main challenges for decision-makers. At large, the hydrological data are transformed into the so-called affluent natural energy, which is used for scenario simulation and serves as input for the optimisation algorithms (Oliveira, Souza, & Marcato, 2015). The current state-of-the-art models for this purpose are the periodic ones. Hipel and McLeod (1994) presented a wide range of possibilities, but the univariate periodic autoregressive model (PAR, a periodic extension of the models presented in Section 2.3.4) is still the benchmark, with several enhanced versions. The approach fits a model to each period of the historical data, and the residuals are simulated to generate new future versions of the time series, considered stationary. Among many others, important variations and alternatives to PAR with bootstrap procedures (see bootstrap details in Section 2.7.5), Bayesian dynamic linear models, spatial information, and copula versions (for copula references, see Section 2.4.3) are detailed in Souza, Marcato, Dias, and Oliveira (2012), Marangon Lima, Popova, and Damien (2014), Lohmann, Hering, and Rebennack (2016), and de Almeida Pereira and Veiga (2019), respectively.

It is worth considering the need for renewables portfolio simulation. This led Pinheiro Neto, Domingues, Coimbra, de Almeida, Alves, and Calixto (2017) to propose a model to integrate hydro, wind and solar power scenarios for Brazilian data. For the Eastern United States, Shahriari and Blumsack (2018) add to the literature on wind, solar and blended portfolios over several spatial and temporal scales. For China, Liu et al. (2020) proposed a multi-variable model, with a unified framework, to simulate wind and PV scenarios to compensate hydropower generation. However, in light of the aforementioned, one of the key challenges and trends for renewable electrical power systems portfolio simulation is still the inclusion of exogenous variables, such as climate, meteorological, calendar and economic ones, as mentioned in Section 3.4.2.

3.4.10. Collaborative forecasting in the energy sector132
As mentioned in Section 3.4.6, the combination of geographically distributed time series data, in a collaborative forecasting (or data sharing) framework, can deliver significant improvements in the forecasting accuracy of each individual renewable energy power plant. The same is valid for hierarchical load forecasting (Hong et al., 2019) and energy price forecasting (see Section 3.4.2). A review of multivariate time series forecasting methods can be found in Sections 2.3.9, 2.3.11 and 2.4.3. However, this data might have different owners, who are unwilling to share their data for the following reasons: (i) personal or business sensitive information, (ii) lack of understanding about which data can and cannot be shared, and (iii) lack of information about the economic (and technical) benefits of data sharing.

In order to tackle these limitations, recent research in energy time series forecasting is exploring two alternative (and potentially complementary) pathways: (i) privacy-preserving analytics, and (ii) data markets.

The role of privacy-preserving techniques applied to collaborative forecasting is to combine time series data from multiple data owners in order to improve forecasting accuracy and keep data private at the same time. For solar energy forecasting, Berdugo, Chaussin, Dubus, Hebrail, and Leboucher (2011) described a method based on local and global analog-search that uses solar power time series from neighbouring sites, where only the timestamps and normalised weights (based on similarity) are exchanged and not the time series data. Zhang and Wang (2018) proposed, for wind energy forecasting with spatio-temporal data, a combination of ridge linear quantile regression and the Alternating Direction Method of Multipliers (ADMM) that enables each data owner to autonomously solve its forecasting problem, while collaborating with the others to improve forecasting accuracy. However, as demonstrated by Gonçalves, Bessa, and Pinson (2021a), the mathematical properties of these algorithms should be carefully analysed in order to avoid privacy breaches (i.e., when a third party recovers the original data without consent).

An alternative approach is to design a market (or auction) mechanism for time series or forecasting data, where the data owners are willing to sell their private (or confidential) data in exchange for an economic compensation (Agarwal, Dahleh, & Sarkar, 2019). The basic concept consists in pricing data as a function of privacy loss, but it can also be pricing data as a function of tangible benefits such as electricity market profit maximisation. Gonçalves, Pinson, and Bessa (2021b) adapted the model described in Agarwal et al. (2019) for renewable energy forecasting, by considering the temporal nature of the data and relating the data price to the extra revenue obtained in the electricity market due to forecasting accuracy improvement. The results showed a benefit in terms of higher revenue resulting from the combination of electricity and data markets. With the advent of peer-to-peer energy markets at the domestic consumer level (Parag & Sovacool, 2016), smart meter data exchange between peers is also expected to increase and enable collaborative forecasting schemes. For this scenario, Yassine, Shirehjini, and Shirmohammadi (2015) proposed a game theory mechanism where an energy consumer maximises its reward by sharing consumption data and a data aggregator can trade this data with a data analyst (which seeks data at the lowest possible price).

Finally, promoting data sharing via privacy-preserving techniques or data monetisation can also solve data scarcity problems in some use cases of the energy sector, such as forecasting the condition of electrical grid assets (Fan, Nowaczyk, & Rögnvaldsson, 2020). Moreover, the combination of heterogeneous data sources (e.g., numerical, textual, categorical) is a challenging and promising avenue of future research in collaborative forecasting (Obst, Ghattas, Claudel, Cugliari, Goude, & Oppenheim, 2019).

132 This subsection was written by Ricardo Bessa.
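The PAR scenario-generation scheme described in Section 3.4.9, fit one autoregression per period and then resample residuals to simulate new trajectories, can be sketched as follows. The monthly structure, the AR(1)-per-month choice, and the synthetic inflow series are all assumptions made for the example, not the cited benchmark implementations.

```python
import random

def fit_periodic_ar1(series, n_periods=12):
    """One least-squares AR(1) per period: y[t] = a_p + b_p*y[t-1] + e."""
    models, residuals = [], []
    for p in range(n_periods):
        xs = [series[t - 1] for t in range(1, len(series)) if t % n_periods == p]
        ys = [series[t] for t in range(1, len(series)) if t % n_periods == p]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
        a = my - b * mx
        models.append((a, b))
        residuals.append([y - (a + b * x) for x, y in zip(xs, ys)])
    return models, residuals

def simulate(models, residuals, start, horizon, n_periods=12):
    """Generate one synthetic trajectory by resampling period residuals."""
    path, y = [], start
    for t in range(1, horizon + 1):
        p = t % n_periods
        a, b = models[p]
        y = a + b * y + random.choice(residuals[p])
        path.append(y)
    return path

random.seed(7)
# Ten years of synthetic monthly inflows with a wet/dry seasonal mean.
inflow = [50 + 30 * ((t % 12) < 6) + random.gauss(0, 5) for t in range(120)]
models, residuals = fit_periodic_ar1(inflow)
scenarios = [simulate(models, residuals, start=inflow[-1], horizon=24)
             for _ in range(100)]
```

An ensemble of such trajectories is what the dispatch optimisation described above would consume as stochastic input scenarios.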
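The data-market idea in Section 3.4.10, tying the price of shared data to the tangible market benefit it produces, can be illustrated with a toy settlement calculation. The prices, volumes, and the 50/50 revenue split below are invented for the example and are not the mechanism of the cited papers.

```python
def data_payment(revenue_without, revenue_with, owner_share=0.5):
    """Pay the data owner a share of the extra market revenue their data
    generated; zero if the shared data did not improve revenue."""
    gain = max(0.0, revenue_with - revenue_without)
    return owner_share * gain

def market_revenue(actual_mwh, forecast_mwh, price=40.0, imbalance_price=55.0):
    """Simplified day-ahead settlement: sell the forecast volume, then pay
    a penalty on the absolute imbalance between forecast and delivery."""
    return forecast_mwh * price - abs(actual_mwh - forecast_mwh) * imbalance_price

own = market_revenue(actual_mwh=100.0, forecast_mwh=90.0)     # own data only
shared = market_revenue(actual_mwh=100.0, forecast_mwh=97.0)  # with peers' data
payment = data_payment(own, shared)  # half of the 665 EUR revenue gain
```

Because the payment is a function of realised market gain rather than of privacy loss, a data owner is paid only when its data actually sharpened the buyer's forecast.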

Fig. 12. (a) UK emissions, (b) energy sources in megatonnes (Mt) and megatonnes of oil equivalent (Mtoe), (c) economic variables, and (d) multi-step forecasts of CO2 emissions in Mt.

3.5. Environmental applications

3.5.1. Forecasting two aspects of climate change133
First into the Industrial Revolution, the UK is one of the first out: in 2013 its per capita CO2 emissions dropped below their 1860 level, despite per capita real incomes being around 7-fold higher (Hendry, 2020). The model for forecasting UK CO2 emissions was selected from annual data 1860–2011 on CO2 emissions, coal and oil usage, capital and GDP, their lags and non-linearities (see Section 3.5.2 for higher frequency weather forecasts). Figs. 12(a) to 12(c) show the non-stationary time series, with strong upward then downward trends, punctuated by large outliers from world wars and miners' strikes, plus shifts from legislation and technological change (Castle & Hendry, 2020a). Saturation estimation at 0.1% using Autometrics (Doornik, 2018), retaining all other regressors, detected four step shifts coinciding with major policy interventions, such as the 2008 Climate Change Act, plus numerous outliers, revealing a cointegrated relation. The multi-step forecasts over 2012–2017 from a VAR in panel (d) of Fig. 12 show the advantage of using step-indicator saturation (SIS: Castle et al., 2015b).

We formulated a 3-equation simultaneous model of atmospheric CO2, Antarctic Temperature, and Ice volume over 800,000 years of Ice Ages at 1000-year frequency (Kaufmann & Juselius, 2013; Paillard, 2001). Driven by non-linear functions of eccentricity, obliquity, and precession (see panels (a), (b), and (c) of Fig. 13, respectively), the model was selected with saturation estimation. Earth's orbital path is calculable into the future (Croll, 1875; Milankovitch, 1969), allowing 100,000 years of multi-step forecasts at endogenous emissions. Humanity has affected climate since 10 thousand years ago (kya: Ruddiman, 2005), so we commence forecasts there. Forecasts over −10 to 100 with time series from 400 kya in panels (d) to (f) of Fig. 13 show paths within the ranges of past data ±2.2SE (Pretis & Kaufmann, 2018).

Atmospheric CO2 already exceeds 400 ppm (parts per million), dramatically outside the Ice-Age range (Sundquist & Keeling, 2009). Consequently, we conditionally forecast the next 100,000 years, simulating the potential climate for anthropogenic CO2 (Castle & Hendry, 2020b), noting that the 'greenhouse' temperature is proportional to the logarithm of CO2 (Arrhenius, 1896). The orbital drivers will continue to influence all three variables, but that relation is switched off in the scenario for 'exogenised' CO2. The 110 dynamic forecasts conditional on 400 ppm and 560 ppm, with ±2SE bands, are shown in Fig. 14, panels (a) and (b), for Ice and Temperature respectively. The resulting global temperature rises inferred from these Antarctic temperatures would be dangerous, at more than 5 °C, with Antarctic temperatures positive for thousands of years (Pretis & Kaufmann, 2020; Vaks, Mason, Breitenbach, et al., 2019).

3.5.2. Weather forecasting134
The weather has a huge impact on our lives, affecting health, transport, agriculture (see also Section 3.8.10), energy use (see also Section 3.4), and leisure. Since Bjerknes (1904) introduced hydrodynamics and thermodynamics

133 This section was written by David F. Hendry.
134 This subsection was written by Thordis Thorarinsdottir.


Fig. 13. Ice-Age data, model fits, and forecasts with endogenous CO2 .

Fig. 14. Ice-Age simulations with exogenous CO2 .

into meteorology, weather prediction has been based on merging physical principles and observational information. Modern weather forecasting is based on numerical weather prediction (NWP) models that rely on accurate estimates of the current state of the climate system, including ocean, atmosphere and land surface. Uncertainty in these estimates is propagated through the NWP model by running the model for an ensemble of perturbed initial states, creating a weather forecast ensemble (Buizza, 2018; Toth & Buizza, 2019).

One principal concern in NWP modelling is that small-scale phenomena such as clouds and convective precipitation are on too small a scale to be represented directly in the models and must, instead, be represented by approximations known as parameterisations. Current NWP model development aims at improving both the grid resolution and the observational information that enters the models (Bannister, Chipilski, & Martinez-Alvarado, 2020; Leuenberger et al., 2020). However, for fixed computational resources, there is a trade-off between grid resolution and ensemble size, with a larger ensemble generally providing a better estimate of the prediction uncertainty. Recent advances furthermore include machine learning approaches (see Section 2.7.10) to directly model the small-scale processes, in particular cloud processes (see, for example, Gentine, Pritchard, Rasp, Reinaudi, & Yacalis, 2018; Rasp, Pritchard, & Gentine, 2018).

Despite rapid progress in NWP modelling, the raw ensemble forecasts exhibit systematic errors in both magnitude and spread (Buizza, 2018). Statistical post-processing is thus routinely used to correct systematic errors in calibration and accuracy before a weather forecast is issued; see Vannitsem, Wilks, and Messner (2018) for a recent review, but also Sections 2.12.4 and 2.12.5. A fundamental challenge here is to preserve physical consistency

across space, time and variables (see, for example, Heinrich, Hellton, Lenkoski, & Thorarinsdottir, 2020; Möller, Lenkoski, & Thorarinsdottir, 2013; Schefzik, Thorarinsdottir, & Gneiting, 2013). This is particularly important when the weather forecast is used as input for further prediction modelling, e.g., in hydrometeorology (Hemri, 2018; Hemri, Lisniak, & Klein, 2015).

At time scales beyond two weeks, the weather noise that arises from the growth of the initial uncertainty becomes large (Royer, 1993). Sources of long-range predictability are usually associated with the existence of slowly evolving components of the earth system, including the El Niño Southern Oscillation (ENSO), monsoon rains, the Madden Julian Oscillation (MJO), the Indian Ocean dipole, and the North Atlantic Oscillation (NAO), spanning a wide range of time scales from months to decades (Hoskins, 2013; Vitart, Robertson, & Anderson, 2012). It is expected that, if a forecasting system is capable of reproducing these slowly evolving components, it may also be able to forecast them (Van Schaeybroeck & Vannitsem, 2018). The next step is then to find relationships between modes of low-frequency variability and the information needed by forecast users, such as predictions of surface temperature and precipitation (Roulin & Vannitsem, 2019; Smith, Scaife, Eade, Athanasiadis, Bellucci, Bethke, Bilbao, Borchert, Caron, Counillon, Danabasoglu, Delworth, Doblas-Reyes, Dunstone, Estella-Perez, Flavoni, Hermanson, Keenlyside, Kharin, Kimoto, Merryfield, Mignot, Mochizuki, Modali, Moneri, Müller, Nicolí, Ortega, Pankatz, Pohlmann, Robson, Ruggieri, Sospedra-Alfonso, Swingedouw, Wang, Wild, Yeager, Yang, & Zhang, 2020).

3.5.3. Air quality forecasting135
To preserve human health, the European Commission stated in Directive 2008/50/EC that member states have to promptly inform the population when the particulate matter (PM) daily mean value exceeds (or is expected to exceed) the threshold of 50 µg/m3. Therefore, systems have been designed in order to produce forecasts for up to three days in advance, using as input the measured values of concentration and meteorological conditions. These systems can be classified into (i) data-driven models (Carnevale, Finzi, Pisoni, & Volta, 2016; Corani, 2005; Stadlober, Hormann, & Pfeiler, 2018, and Section 2.7), and (ii) deterministic chemical and transport models (Honoré, Menut, Bessagnet, Meleux, & Rou, 2007; Manders, Schaap, & Hoogerbrugge, 2009). In this section, a brief overview of the application of these systems to the highly polluted area of the Lombardy region, in Italy, will be presented.

Carnevale, Finzi, Pederzoli, Turrini, and Volta (2018) compared the results of three different forecasting systems based on neural networks, lazy learning models, and regression trees, respectively. A single model was identified for each monitoring station. In the initial configuration, only the last three PM measurements available were used to produce the forecast. In this configuration, the systems offered reasonable performance, with correlation coefficients ranging from 0.6 (lazy learning method) to 0.75 (neural network). The work also demonstrated that the performance of the ensemble of the three systems was better than that of the best model for each monitoring station (see also Section 2.6 for further discussions on forecast combinations).

Starting from the results of this work, a second configuration was implemented, using as an additional input the wind speed measured at the meteorological monitoring station closest to the measurement point of PM. The researchers observed an improvement in all performance indices, with the median of the correlation for the best model (neural networks) increasing from 0.75 to 0.82 and the RMSE dropping from 15 µg/m3 to 7 µg/m3.

One of the main drawbacks of data-driven models for air quality is that they provide information only at the points where measurements are available. To overcome this limitation, recent literature has presented mixed deterministic and data-driven approaches (see, for example, Carnevale, Angelis, Finzi, Turrini, & Volta, 2020) which use the data assimilation procedure and offer promising forecasting performance.

From a practical point of view, critical issues regarding forecasting air quality include:
• Information collection and data access: even if regional authorities have to publicly provide data and information related to air quality and meteorology, the measured data are not usually available in real-time and the interfaces are sometimes not automated;
• Data quantity: the amount of information required by air quality forecasting systems is usually large, in particular towards the definition of the training and validation sets;
• Non-linear relationships: the phenomenon of accumulation of pollutants in the atmosphere is usually affected by strong nonlinearities, which significantly impact the selection of the models and their performance;
• Unknown factors: it is a matter of fact that the dynamics of pollutants in the atmosphere are affected by a large number of non-measurable variables (such as meteorological variables or the interaction with other non-measurable pollutants), largely affecting the capability of the models to reproduce the state of the atmosphere.

3.5.4. Forecasting and decision making for floods and water resources management136
In Water Resources and Flood Risk Management, decision makers are frequently confronted with the need to take the most appropriate decisions without knowing what will occur in the future. To support their decision-making under uncertainty, decision theory (Berger, 1985; Bernardo, 1994; DeGroot, 2004) invokes Bayesian informed decision approaches, which find the most appropriate

135 This subsection was written by Claudio Carnevale.
136 This subsection was written by Ezio Todini.


decision by maximising (or minimising) the expected value of a ''utility function'', thus requiring its definition, together with the estimation of a ''predictive probability'' density (Berger, 1985), due to the fact that utility functions are rarely linear or continuous. Consequently, their expected value does not coincide with the value assumed at the predicted ''deterministic'' expected value. Accordingly, overcoming the classical 18th-century ''mechanistic'' view by resorting to probabilistic forecasting approaches becomes essential (see also Section 2.6.2).

The failure of decision-making based on deterministic forecasts in the case of Flood Risk Management is easily shown through a simple example. At a river section, the future water level provided by a forecast is uncertain and can be described by a Normal distribution with a mean of 10 metres and a standard deviation of 5 m. Given a dike elevation of 10.5 m, damages may be expected to be zero if the water level falls below the dike elevation, and to grow linearly with the water level, with a factor of 10^6 dollars, when the level exceeds it. If one takes the expected value of the forecast as the deterministic prediction to compute the damage, the latter will result equal to zero, since the forecast mean of 10 m lies below the dike elevation; if instead one correctly integrates the damage function times the predictive density, the estimated expected damage results in 6.59 million dollars, and educated decisions on whether or not to alert the population, or to evacuate a flood-prone area, can be appropriately taken (see also Section 3.6).

Water resources management, and in particular reservoir management, aims at deriving appropriate operating rules via long-term expected benefits maximisation. Nonetheless, during flood events decision makers must decide how much to preventively release from multi-purpose reservoirs in order to reduce dam failure and downstream flooding risks, the optimal choice descending from a trade-off between losing future water resources and the reduction of short-term expected losses.

This is obtained by setting up an objective function based on the linear combination of long- and short-term ''expected losses'', once again based on the available probabilistic forecast. This Bayesian adaptive reservoir management approach, incorporating into the decision mechanism the forecasting information described by the short-term predictive probability density, has been implemented on lake Como since 1997 (Todini, 1999, 2017) as an extension of an earlier original idea (Todini, 1991). This resulted in:
• a reduction of over 30% in the frequency of flooding of the city of Como;
• an average reduction of 12% of the water deficit;
• an increase of 3% in the electricity production.

The lake Como example clearly shows that, instead of basing decisions on the deterministic prediction, the use of a Bayesian decision scheme, in which model forecasts describe the predictive probability density, increases the reliability of the management scheme by essentially reducing the probability of wrong decisions (Todini, 2017, 2018).

3.6. Social good and demographic forecasting

3.6.1. Healthcare137
There are many decisions that depend on the quality of forecasts in the health care system, from capacity planning to layout decisions to daily schedules. In general, the role of forecasting in health care is to inform both clinical and non-clinical decisions. While the former concerns decisions related to patients and their treatments (Makridakis, Kirkham, Wakefield, Papadaki, Kirkham, & Long, 2019), the latter involves policy/management and supply chain decisions that support the delivery of high-quality care for patients.

A number of studies refer to the use of forecasting methods to inform clinical decision making. These methods are used to screen high-risk patients for preventative health care (Chen, Wang, & Hung, 2015; Santos, Abreu, García-Laencina, Simão, & Carvalho, 2015; Uematsu, Kunisawa, Sasaki, Ikai, & Imanaka, 2014; van der Mark, van Wonderen, Mohrs, van Aalderen, ter Riet, & Bindels, 2014), to predict mental health issues (Shen, Jia, Nie, Feng, Zhang, Hu, Chua, & Zhu, 2017; Tran, Phung, Luo, Harvey, Berk, & Venkatesh, 2013), to assist diagnosis and disease progression (Ghassemi et al., 2015; Ma, Chitta, Zhou, You, Sun, & Gao, 2017; Pierce, Hess, Kline, Shah, Breslin, Branda, Pencille, Asplin, Nestler, Sadosty, Stiell, Ting, & Montori, 2010; Qiao, Wu, Ge, & Fan, 2019), to determine prognosis (Dietzel et al., 2010; Ng, Stein, Ning, & Black-Schaffer, 2007), and to recommend treatments for patients (Kedia & Williams, 2003; Scerri, De Goumoens, Fritsch, Van Melle, Stiefel, & So, 2006; Shang, Ma, Xiao, & Sun, 2019). Common forecasting methods to inform clinical decisions include time series (see Sections 2.3.1, 2.3.4 and 2.3.5), regression models (see Section 2.3.2), classification trees (see Section 2.7.12), neural networks (see Section 2.7.8), Markov models (see Section 2.3.12), and Bayesian networks. These models utilise structured and unstructured data, including clinician notes (Austin & Kusumoto, 2016; Labarere, Bertrand, & Fine, 2014), which makes data pre-processing a crucial part of the forecasting process in clinical health care.

One of the aspects of non-clinical forecasting that has received the most attention in both research and application is policy and management. Demand forecasting is regularly used in Emergency Departments (Arora, Taylor, & Mak, 2020; Choudhury & Urena, 2020; Khaldi, El Afia, & Chiheb, 2019; Rostami-Tabar & Ziel, 2020), ambulance services (Al-Azzani, Davari, & England, 2020; Setzler, Saydam, & Park, 2009; Vile, Gillard, Harper, & Knight, 2012; Zhou & Matteson, 2016), and hospitals with several different specialities (McCoy, Pellegrini, & Perlis, 2018; Ordu, Demir, & Tofallis, 2019; Zhou, Zhao, Wu, Cheng, & Huang, 2018) to inform operational, tactical and strategic planning. The common methods used for this purpose include classical ARIMA and exponential smoothing methods, regression, singular spectrum analysis, Prophet, Double-Seasonal Holt-Winters, TBATS and Neural Networks. In public health, forecasting can guide policy and planning. Although it has a wider definition,

137 This section was written by Bahman Rostami-Tabar.


the most attention is given to epidemic forecasting (see also Section 3.6.2).

Forecasting is also used in both national and global health care supply chains, not only to ensure the availability of medical products for the population but also to avoid excessive inventory. Additionally, the lack of accurate demand forecasts in a health supply chain may cost lives (Baicker, Chandra, & Skinner, 2012) and has exacerbated risks for suppliers (Levine, Pickett, Sekhri, & Yadav, 2008). Classical exponential smoothing, ARIMA, regression and Neural Network models have been applied to estimate drug utilisation and expenditures (Dolgin, 2010; Linnér, Eriksson, Persson, & Wettermark, 2020), blood demand (Fortsch & Khapalova, 2016), hospital supplies (Gebicki, Mooney, Chen, & Mazur, 2014; Riahi, Hosseini-Motlagh, & Teimourpour, 2013), and demand for global medical items (Amarasinghe, Wichmann, Margolis, & Mahoney, 2010; Hecht & Gandhi, 2008; van der Laan, van Dalen, Rohrmoser, & Simpson, 2016). It is important to note that, while demand in a health care supply chain often has grouped and hierarchical structures (Mircetica, Rostami-Tabar, Nikolicica, & Maslarica, 2020, see also Section 2.10.1), this has not been well investigated and needs more attention.

3.6.2. Epidemics and pandemics138
Pandemics and epidemics both refer to disease outbreaks. An epidemic is a disease outbreak that spreads across a particular region; a pandemic is defined as the spread of a disease worldwide. Forecasting the evolution of a pandemic or an epidemic, the growth of cases and fatalities for various horizons and levels of granularity, is a complex task with raw and limited data, as each disease outbreak type has unique features, with several factors affecting the severity and the contagiousness. Be that as it may, forecasting becomes a paramount task for countries to prepare and plan their response (Nikolopoulos, 2020), both in healthcare and in the supply chains (Beliën & Forcé, 2012, see also Section 3.6.1 and Section 3.2.2).

Successful forecasting methods for the task include time-series methods (see Section 2.3), epidemiological and agent-based models (see Section 2.7.3), metapopulation models, approaches in metrology (Nsoesie, Mararthe, & Brownstein, 2013), and machine and deep learning methods (Yang, Zeng, Wang, Wong, Liang, Zanin, Liu, Cao, Gao, Mai, Liang, Liu, Li, Li, Ye, Guan, Yang, Li, Luo, Xie, Liu, Wang, Zhang, Wang, Zhong, & He, 2020). Andersson, Kühlmann-Berenzon, Linde, Schiöler, Rubinova, and Frisén (2008) used regression models for the prediction of the peak time and volume of cases for a pandemic, with evidence from seven outbreaks in Sweden. Yaffee, Nikolopoulos, Reilly, Crone, Wagoner, Douglas, Amman, Ksiazek, and Mills (2011) forecasted the evolution of the Hantavirus epidemic in the USA, compared causal and machine-learning methods with time-series methods, and found that univariate methods were quite successful. Soebiyanto, Adimi, and Kiang (2010) used ARIMA models for successful short-term forecasting of weekly influenza cases. Shaman and Karspeck (2012) used Kalman filter based SIR epidemiological models to forecast the peak time of influenza 6–7 weeks ahead.

For COVID-19, Petropoulos and Makridakis (2020) applied a multiplicative exponential smoothing model (see also Section 2.3.1) for predicting the global number of confirmed cases, with very successful results both for point forecasts and prediction intervals. This article got serious traction, with 100,000 views and 300 citations in the first twelve months since its publication, thus evidencing the importance of such empirical investigations. There has been a series of studies focusing on predicting deaths in the USA and European countries for the first wave of the COVID-19 pandemic (IHME COVID-19 health service utilization forecasting team & Murray, 2020a, 2020b). Furthermore, Petropoulos, Makridakis, and Stylianou (2020) expanded their investigation to capture the continuation of both cases and deaths as well as their uncertainty, achieving high levels of forecasting accuracy for ten-days-ahead forecasts over a period of four months. Along the same lines, Doornik et al. (2020b) have been publishing real-time accurate forecasts of confirmed cases and deaths from mid-March 2020 onwards. Their approach is based on the extraction of trends from the data using machine learning.

Pinson and Makridakis (2020) organised a debate between Taleb and Ioannidis on forecasting pandemics. Ioannidis, Cripps, and Tanner (2020) claim that forecasting for COVID-19 has by and large failed; however, they give recommendations on how this can be averted. They suggest that the focus should be on predictive distributions and that models should be continuously evaluated. Moreover, they emphasise the importance of multiple dimensions of the problem (and its impact). Taleb et al. (2020) discuss the dangers of using naive, empirical approaches for fat-tailed variables and tail risk management. They also reiterate the inefficiency of point forecasts for such phenomena.

Finally, Nikolopoulos, Punia, Schäfers, Tsinopoulos, and Vasilakis (2020) focused on forecast-driven planning, predicting the growth of COVID-19 cases and the respective disruptions across the supply chain at country level with data from the USA, India, the UK, Germany, and Singapore. Their findings confirmed the excess demand for groceries and electronics, and the reduced demand for automotive, but the model also proved that the earlier a lock-down is imposed, the higher the excess demand for groceries will be. Therefore, governments would need to secure high volumes of key products before imposing lock-downs; and, when this is not possible, seriously consider more radical interventions such as rationing.

Dengue is one of the most common epidemic diseases in tropical and sub-tropical regions of the world. Estimates of the World Health Organisation reveal that about half of the world's population is now at risk of Dengue infection (Romero, Olivero, Real, & Guerrero, 2019). Aedes aegypti and Aedes albopictus are the principal vectors of dengue transmission, and they are highly domesticated mosquitoes. Rainfall, temperature and relative humidity are thought of as important factors contributing to the growth and dispersion of mosquito vectors and the potential of dengue outbreaks (Banu, Hu, Hurst, & Tong, 2011).

138 This subsection was written by Konstantinos Nikolopoulos & Thiyanga S. Talagala.
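A common way of linking weather covariates such as those above to case counts is a Poisson regression with a log link. The sketch below fits such a model by iteratively reweighted least squares (IRLS), the standard GLM fitting algorithm; all data are synthetic, and the covariates and coefficients are illustrative only, not estimates from any cited study.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic weekly data: case counts driven by lagged rainfall and
# temperature (illustrative covariates and coefficients only).
n = 200
rain = rng.gamma(2.0, 10.0, size=n)             # lagged rainfall, mm
temp = rng.normal(28.0, 2.0, size=n)            # lagged temperature, deg C
X = np.column_stack([np.ones(n), rain / 100.0, (temp - 28.0) / 10.0])
true_beta = np.array([2.0, 1.0, 0.5])
cases = rng.poisson(np.exp(X @ true_beta))      # observed counts

# Poisson regression (log link) fitted by IRLS.
beta = np.linalg.lstsq(X, np.log(cases + 1.0), rcond=None)[0]  # rough start
for _ in range(25):
    eta = X @ beta
    mu = np.exp(eta)                   # fitted means
    z = eta + (cases - mu) / mu        # working response
    XtW = X.T * mu                     # working weights are mu for Poisson
    beta = np.linalg.solve(XtW @ X, XtW @ z)    # weighted least-squares step

print(np.round(beta, 2))               # should be close to true_beta
```

Quasi-Poisson and negative binomial variants, mentioned in the literature for over-dispersed counts, modify the variance assumption but keep the same mean structure.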

In reviewing the existing literature, two data types have been used to forecast dengue incidence: (i) spatio-temporal data: the incidence of laboratory-confirmed dengue cases among the clinically suspected patients (Naish, Dale, Mackenzie, McBride, Mengersen, & Tong, 2014); and (ii) web-based data: Google trends and tweets associated with Dengue cases (de Almeida Marques-Toledo et al., 2017).

SARIMA models (see also Section 2.3.4) have been quite popular in forecasting laboratory-confirmed dengue cases (Gharbi et al., 2011; Martinez & Silva, 2011; Promprou, Jaroensutasinee, & Jaroensutasinee, 2006). Chakraborty, Chattopadhyay, and Ghosh (2019) used a hybrid model combining ARIMA and neural network autoregressive (NNAR) models to forecast dengue cases. In light of the biological relationships between climate and the transmission of Aedes mosquitoes, several studies have used additional covariates, such as rainfall, temperature, wind speed, and humidity, to forecast dengue incidence (Banu et al., 2011; Naish et al., 2014; Talagala, 2015). The Poisson regression model has been widely used to forecast dengue incidence using climatic factors and the lagged time between dengue incidence and weather variables (Hii, Zhu, Ng, Ng, & Rocklöv, 2012; Koh, Spindler, Sandgren, & Jiang, 2018). Several researchers looked at the use of Quasi-Poisson and negative binomial regression models to accommodate overdispersion in the counts (Lowe et al., 2011; Wang, Jiang, Fan, Wang, & Liu, 2014). Cazelles, Chavez, McMichael, and Hales (2005) used wavelet analysis to explore the dynamics of dengue incidence, and wavelet coherence analyses were used to identify time- and frequency-specific associations with climatic variables. de Almeida Marques-Toledo et al. (2017) took a different perspective and looked at weekly tweets to forecast Dengue cases. Rangarajan, Mody, and Marathe (2019) used Google trend data to forecast Dengue cases. The authors hypothesised that web query searches related to dengue disease correlate with the current level of dengue cases and thus may be helpful in forecasting dengue cases.

A direction for future research in this field is to explore the use of spatio-temporal hierarchical forecasting (see Section 2.10).

3.6.3. Forecasting mortality139
Actuarial, Demographic, and Health studies are some examples where mortality data are commonly used. A valuable source of mortality information is the Human Mortality Database (HMD), a database that provides mortality and population data for 41 mainly developed countries. Additionally, at least five country-specific databases are devoted to subnational data series: the Australian, Canadian, and French Human Mortality Databases, and the United States and Japan Mortality Databases. In some situations, the lack of reliable mortality data can be a problem, especially in developing countries, due to delays in registering or miscounting deaths (Checchi & Roberts, 2005). The Analysis of National Causes of Death for Action (ANACONDA) is a valuable tool that assesses the accuracy and completeness of data for mortality and cause of death by checking for potential errors and inconsistencies (Mikkelsen, Moesgaard, Hegnauer, & Lopez, 2020).

The analysis of mortality data is fundamental to public health authorities and policymakers to make decisions or evaluate the effectiveness of prevention and response strategies. When facing a new pandemic, mortality surveillance is essential for monitoring the overall impact on public health in terms of disease severity and mortality (Setel, AbouZahr, Atuheire, Bratschi, Cercone, Chinganya, Clapham, Clark, Congdon, de Savigny, Karpati, Nichols, Jakob, Mwanza, Muhwava, Nahmias, Ortiza, & Tshangelab, 2020; Vestergaard, Nielsen, Richter, Schmid, Bustos, Braeye, Denissov, Veideman, Luomala, Möttönen, Fouillet, Caserio-Schönemann, an der Heiden, Uphoff, Lytras, Gkolfinopoulou, Paldy, Domegan, O'Donnell, de' Donato, Noccioli, Hoffmann, Velez, England, van Asten, White, Tønnessen, da Silva, Rodrigues, Larrauri, Delgado-Sanz, Farah, Galanis, Junker, Perisa, Sinnathamby, Andrews, O'Doherty, Marquess, Kennedy, Olsen, Pebody, ECDC Public Health Emergency Team for COVID-19, Krause, & Mølbak, 2020). A useful metric is excess mortality, which is the difference between the observed number of deaths and the expected number of deaths under ''normal'' conditions (Aron & Muellbauer, 2020; Checchi & Roberts, 2005). Thus, it can only be estimated with accurate and high-quality data from previous years. Excess mortality has been used to measure the impact of heat events (Limaye, Vargo, Harkey, Holloway, & Patz, 2018; Matte, Lane, & Ito, 2016), pandemic influenza (Nielsen, Mazick, Andrews, Detsis, Fenech, Flores, Foulliet, Gergonne, Green, Junker, Nunes, O'Donnell, Oza, Paldy, Pebody, Reynolds, Sideroglou, E, Simon-Sofia, Uphoff, Van Asten, Virtanen, Wuillaume, & Mølbak, 2013; Nunes, Viboud, Machado, Ringholz, Rebelo-de Andrade, Nogueira, & Miller, 2011), and nowadays COVID-19 (Nogueira, de Araújo Nobre, Nicola, Furtado, & Carneiro, 2020; Ritchie, Ortiz-Ospina, Beltekian, Mathieu, Hasell, Macdonald, Giattino, & Roser, 2020; Shang & Xu, 2021; Sinnathamby, Whitaker, Coughlan, Bernal, Ramsay, & Andrews, 2020, and Section 3.6.2), among others. Excess mortality data have been made available by the media publications The Economist, The New York Times and The Financial Times. Moreover, a monitoring system of the weekly excess mortality in Europe has been operated by the EuroMOMO project (Vestergaard et al., 2020).

An essential use of mortality data for individuals aged over 60 is in the pension and insurance industries, whose profitability and solvency crucially rely on accurate mortality forecasts to adequately hedge longevity risks (see, e.g., Shang & Haberman, 2020a, 2020b). Longevity risk is a potential systematic risk attached to the increasing life expectancy of annuitants, and it is an important factor to be considered when determining a sustainable government pension age (see, e.g., Hyndman, Zeng, & Shang, 2021, for Australia). The price of a fixed-term or lifelong annuity is a random variable, as it depends on the value of the zero-coupon bond price and on mortality forecasts. The zero-coupon bond price is a function of the interest rate (see Section 3.3.6) and is comparably more stable than the retirees' mortality forecasts.

Several methodologies were developed for mortality modelling and forecasting (Booth & Tickle, 2008; Janssen,

139 This subsection was written by Clara Cordeiro & Han Lin Shang.

2018). These methods can be grouped into three categories: expectation, explanation, and extrapolation (Booth & Tickle, 2008).

The expectation approach is based on the subjective opinion of experts (see also Section 2.11.4), who set a long-run mortality target. Methods based on expectation make use of experts' opinions concerning future mortality or life expectancy, with a specified path and speed of progression towards the assumed value (Continuous Mortality Investigation, 2020). The advantage of this approach is that demographic, epidemiological, medical, and other relevant information may be incorporated into the forecasts. The disadvantages are that such information is subjective and biased towards experts' opinions, and that it only produces scenario-based (see Section 2.11.5) deterministic forecasts (Ahlburg & Vaupel, 1990; Wong-Fupuy & Haberman, 2004).

The explanation approach captures the correlation between mortality and the underlying cause of death. Methods based on the explanation approach incorporate medical, social, environmental, and behavioural factors into mortality modelling. Examples include smoking- and disease-related mortality models. The benefit of this approach is that mortality change can be understood from changes in related explanatory variables; thus, it is attractive in terms of interpretability (Gutterman & Vanderhoof, 1998).

The extrapolative approach is considered more objective, easy to use, and more likely to obtain better forecast accuracy than the other two approaches (Janssen, 2018). The extrapolation approach identifies age patterns and trends in time, which can then be forecasted via univariate and multivariate time series models (see Section 2.3). In the extrapolation approach, many parametric and nonparametric methods have been proposed (see, e.g., Alho & Spencer, 2005; Hyndman & Ullah, 2007;

behaviour is influenced by numerous factors acting at different levels, from individual characteristics to societal change (Balbo, Billari, & Mills, 2013). An important methodological challenge for many low- and middle-income countries is fertility estimation, due to deficiencies in vital statistics caused by inadequate birth registration systems (AbouZahr, de Savigny, Mikkelsen, Setel, Lozano, & Lopez, 2015; Moultrie, Dorrington, Hill, Hill, Timæus, & Zaba, 2013; Phillips, Adair, & Lopez, 2018). Such countries are often also in the process of transitioning from high to low fertility, which induces greater forecast uncertainty compared to low-fertility countries (Programme, 2019).

A range of statistical models have been proposed to forecast fertility – see Booth (2006), Bohk-Ewald, Li, and Myrskylä (2018), and Shang and Booth (2020) for reviews. The Lee–Carter model (Lee & Carter, 1992), originally developed for mortality forecasting (see Section 3.6.3), has been applied to fertility (Lee, 1993), with extensions in functional data (Hyndman & Ullah, 2007) and Bayesian (Wiśniowski, Smith, Bijak, Raymer, & Forster, 2015) contexts. Other notable extrapolative methods include the cohort-ARIMA model of De Beer (1985, 1990) – see Section 2.3.4 – and the linear extrapolation method of Myrskylä, Goldstein, and Cheng (2013). Many parametric models have been specified to describe the shapes of fertility curves (Brass, 1974; Evans, 1986; Hoem, Madsen, Nielsen, Ohlsen, Hansen, & Rennermalm, 1981; Schmertmann, 2003), with forecasts obtained through time series extrapolations of the parameters (Congdon, 1990; De Iaco & Maggio, 2016; Knudsen, McNown, & Rogers, 1993). Bayesian methods have been used to borrow strength across countries (for example, Alkema et al., 2011; Schmertmann, Zagheni, Goldstein, & Myrskylä, 2014), with El-
Shang, Booth, & Hyndman, 2011). Among the parametric lison, Dodd, and Forster (2020) developing a hierarchi-
methods, the method of Heligman and Pollard (1980) cal model in the spirit of the latter. The top-down ap-
is well-known. Among the nonparametric methods, the proach (see Section 2.10.1) of the former, which is used by
Lee–Carter model (Lee & Carter, 1992), Cairns-Blake-Dowd the United Nations, projects the aggregate Total Fertility
model (Cairns et al., 2009; Dowd, Cairns, Blake, Cough- Rate (TFR) measure probabilistically (also see Tuljapurkar
lan, Epstein, & Khalaf-Allah, 2010), and functional data & Boe, 1999) before decomposing it by age. Hyppöla,
model (Hyndman & Ullah, 2007, and Section 2.3.10), as Tunkelo, and Törnqvist (1949) provide one of the earli-
well as their extensions and generalisations are dominant. est known examples of probabilistic fertility forecasting
The time-series extrapolation approach has the advan- (Alho & Spencer, 2005).
tage of obtaining a forecast probability distribution rather Little work has been done to compare forecast per-
than a deterministic point forecast and, also, enable the formance across this broad spectrum of approaches. The
determination of forecast intervals (Booth & Tickle, 2008). study of Bohk-Ewald et al. (2018) is the most comprehen-
Janssen (2018) presents a review of the advances in sive to date. Most striking is their finding that few meth-
mortality forecasting and possible future research chal- ods can better the naive freezing of age-specific rates, and
lenges. those that can differ greatly in method complexity (see
also Section 2.5.2). A recent survey of fertility forecast-
3.6.4. Forecasting fertility140 ing practice in European statistical offices (Gleditsch &
Aside from being a driver of population forecasts (see Syse, 2020) found that forecasts tend to be deterministic
Section 2.3.7), fertility forecasts are vital for planning and make use of expert panels (see Section 2.11.4). Ex-
maternity services and anticipating demand for school pert elicitation techniques are gaining in sophistication,
places. The key challenge relates to the existence of, highlighted by the protocol of Statistics Canada (Dion,
and interaction between, the quantum (how many?) and Galbraith, & Sirag, 2020) which requests a full probability
tempo (when?) components (Booth, 2006). This intrinsic distribution of the TFR.
dependence on human decisions means that childbearing A promising avenue is the development of forecasting
methods that incorporate birth order (parity) information,
140 This subsection was written by Joanne Ellison. supported by evidence from individual-level analyses (for
797
example, Fiori, Graham, & Feng, 2014). Another underexplored area is the integration of survey data into fertility forecasting models, which tend to use vital statistics alone when they are of sufficient quality (see Rendall, Handcock, & Jonsson, 2009; Zhang & Bryant, 2019, for Bayesian fertility estimation with imperfect census data). Alternative data sources also have great potential. For example, Wilde, Chen, and Lohmann (2020) use Google data to predict the effect of COVID-19 on US fertility in the absence of vital statistics. Lastly, investigation of the possible long-term impacts of delayed motherhood in high-income countries, alongside developments in assisted reproduction technology such as egg freezing, is required (see, for example, Sobotka & Beaujouan, 2018).

3.6.5. Forecasting migration141
141 This subsection was written by Jakub Bijak.

Migration forecasts are needed both as crucial input into population projections (see Section 2.3.7) and as standalone predictions, made for a range of users, chiefly in the areas of policy and planning. At the same time, migration processes are highly uncertain and complex, with many underlying and interacting drivers, which evade precise conceptualisation, definitions, measurement, and theoretical description (Bijak & Czaika, 2020). Given the high level of the predictive uncertainty, and the non-stationary character of many migration processes (Bijak & Wiśniowski, 2010), the current state of the art of forward-looking migration studies therefore reflects a shift from prediction to the use of forecasts as contingency planning tools (idem).

Reviews of migration forecasting methods are available in Bijak (2010) and Sohst, Tjaden, de Valk, and Melde (2020). The applications in official statistics, with a few exceptions, are typically based on various forms of scenario-based forecasting with judgment (see Section 2.11.5), based on pre-set assumptions (for an example, see Abel, 2018). Such methods are particularly used for longer time horizons, of a decade or more, so typically in the context of applications in population projections, although even for such long time horizons calibrated probabilistic methods have been used as well (Azose et al., 2016).

The mainstream developments in migration forecasting methodology, however, include statistical and econometric methods discussed in Section 2.3, such as time series models, both uni- and multivariate (for example, Bijak, 2010; Bijak, Disney, Findlay, Forster, Smith, & Wiśniowski, 2019; Gorbey, James, & Poot, 1999), econometric models (for example, Brücker & Siliverstovs, 2006; Cappelen, Skjerpen, & Tønnessen, 2015), Bayesian hierarchical models (Azose & Raftery, 2015), and dedicated methods, for example for forecasting data structured by age (Raymer & Wiśniowski, 2018). In some cases, the methods additionally involve selecting and combining forecasts through Bayesian model selection and averaging (Bijak, 2010, see also Section 2.5 and Section 2.6). Such models can be expected to produce reasonable forecasts (and errors) for up to a decade ahead (Bijak & Wiśniowski, 2010), although this depends on the migration flows being forecast, with some processes (e.g., family migration) more predictable than others (e.g., asylum). Another recognised problem with models using covariates is that those can be endogenous to migration (e.g., population) and also need predicting, which necessitates applying structured models to prevent uncertainty from exploding.

The methodological gaps and current work in migration forecasting concentrate in a few key areas, notably including causal (mechanistic) forecasting based on the process of migrant decision making (Willekens, 2018), as well as early warnings and 'nowcasting' of rapidly changing trends, for example in asylum migration (Napierała, Hilton, Forster, Carammia, & Bijak, 2021). In the context of early warnings, forays into data-driven methods for changepoint detection, possibly coupled with digital trace and other high-frequency 'Big data', bear particular promise. At the same time, coherent uncertainty description across a range of time horizons, especially in the long range (Azose & Raftery, 2015), remains a challenge, which needs addressing for the sake of proper calibration of errors in the population forecasts, to which these migration components contribute.

3.6.6. Forecasting risk for violence and wars142
142 This subsection was written by Pasquale Cirillo.

Can we predict the occurrence of WW3 in the next 20 years? Is there any trend in the severity of wars?

The study of armed conflicts and atrocities, both in terms of frequency over time and the number of casualties, has received quite some attention in the scientific literature and the media (e.g., Cederman, 2003; Friedman, 2015; Hayes, 2002; Norton-Taylor, 2015; Richardson, 1948, 1960), falling within the broader discussion about violence (Berlinski, 2009; Goldstein, 2011; Spagat, Mack, Cooper, & Kreutz, 2009), with the final goal of understanding whether humanity is becoming less belligerent (Pinker, 2011), or not (Braumoeller, 2019).

Regarding wars and atrocities, the public debate has focused its attention on the so-called Long Peace Theory (Gaddis, 1989), according to which, after WW2, humanity has experienced the most peaceful period in history, with a decline in the number and in the severity of bloody events. Scholars like Mueller (2009a, 2009b) and Pinker (2011, 2018) claim that sociological arguments and all statistics suggest we live in better times, while others like Gray (2015a, 2015b) and Mann (2018) maintain that those statistics are often partial and misused, the derived theories weak, and that war and violence are not declining but only being transformed. For Mann, the Long Peace proves to be ad hoc, as it only deals with Western Europe and North America, neglecting the rest of the world, and the fact that countries like the US have been involved in many conflicts outside of their territories after WW2.

Recent statistical analyses confirm Gray's and Mann's views: empirical data do not support the idea of a decline in human belligerence (no clear trend appears), nor in its severity. Armed conflicts show long inter-arrival times, therefore a relative peace of a few decades means nothing statistically (Cirillo & Taleb, 2016b). Moreover, the distribution of war casualties is extremely fat-tailed (Clauset,
2018; Clauset & Gleditsch, 2018), often with a tail exponent ξ = 1/α > 1 (Cirillo & Taleb, 2016b), indicating a possibly infinite mean, i.e., a tremendously erratic and unforeseeable phenomenon (see Section 2.3.22). It is only an apparently infinite-mean phenomenon, though (Cirillo & Taleb, 2019), because no single war can kill more than the entire world population; therefore a finite upper bound exists, and all moments are necessarily finite, even if difficult to estimate. Extreme value theory (Embrechts et al., 2013) can thus be used to correctly model tail risk and make prudential forecasts (with many caveats, as in Scharpf, Schneider, Nöh, & Clauset, 2014), while avoiding naive extrapolations (Taleb et al., 2020).

As history teaches (Nye, 1990), humanity has already experienced periods of relative regional peace, like the famous Paces Romana and Sinica. The present Pax Americana is not enough to claim that we are structurally living in a more peaceful era. The Long Peace risks being another apophenia, another example of the Texan sharpshooter fallacy (Carroll, 2003).

Similar mistakes have been made in the past. Buckle (1858) wrote: ''that [war] is, in the progress of society, steadily declining, must be evident, even to the most hasty reader of European history. If we compare one country with another, we shall find that for a very long period wars have been becoming less frequent; and now so clearly is the movement marked, that, until the late commencement of hostilities, we had remained at peace for nearly forty years: a circumstance unparalleled [...] in the affairs of the world''. Sadly, Buckle was a victim of the illusion coming from the Pax Britannica (Johnston, 2008): the century following his prose turned out to be the most murderous in human history.

3.7. Systems and humans

3.7.1. Support systems143
143 This subsection was written by Vassilios Assimakopoulos.

Forecasting in businesses is a complicated procedure, especially when predicting numerous, diverse series (see Section 2.7.4), dealing with unstructured data from multiple sources (see Section 2.7.1), and incorporating human judgment (Lim & O'Connor, 1996a, but also Section 2.11). In this respect, since the early 1980s, various Forecasting Support Systems (FSSs) have been developed to facilitate forecasting and support decision making (Kusters, McCullough, & Bell, 2006). Rycroft (1993) provides an early comparative review of such systems, while many studies strongly support their utilisation over other forecasting alternatives (Sanders & Manrodt, 2003; Tashman & Leach, 1991).

In a typical use-case scenario, the FSS will retrieve the data required for producing the forecasts, provide some visualisations and summary statistics to the user, allow for data pre-processing, and then produce forecasts that may be adjusted according to the preferences of the user. However, according to Ord and Fildes (2013), an effective FSS should be able to produce forecasts by combining relevant information, analytical models, judgment, visualisations, and feedback. To that end, FSSs must (i) elaborate accurate, efficient, and automatic statistical forecasting methods, (ii) enable users to effectively incorporate their judgment, (iii) allow the users to track and interact with the whole forecasting procedure, and (iv) be easily customised based on the context of the company.

Indeed, nowadays, most off-the-shelf solutions, such as SAP, SAS, JDEdwards, and ForecastPro, offer a variety of both standard and advanced statistical forecasting methods (see Section 2.3), as well as data pre-processing (see Section 2.2) and performance evaluation algorithms (see Section 2.12). On the other hand, many of them still struggle to incorporate state-of-the-art methods that can further improve forecasting accuracy, such as automatic model selection algorithms and temporal aggregation (see also Section 2.10.2), thus limiting the options of the users (Petropoulos, 2015). Similarly, although many FSSs support judgmental forecasts (see Section 2.11.1) and judgmental adjustments of statistical forecasts (see Section 2.11.2), this is not done as suggested by the literature, i.e., in a guided way under a well-organised framework. As a result, the capabilities of the users are restrained, and methods that could be used to mitigate biases, overshooting, anchoring, and unreasonable or insignificant changes that do not justify the time spent on them, are largely ignored (Fildes & Goodwin, 2013; Fildes et al., 2006).

Other practical issues of FSSs relate to their engine and interfaces, which are typically designed to be generic and capable of serving different companies and organisations of diverse needs (Kusters et al., 2006). From a development and economic perspective, this is a reasonable choice. However, the lack of flexibility and customisability can lead to interfaces with needless options, models, tools, and features that may confuse inexperienced users and undermine their performance (Fildes et al., 2006). Thus, simple, yet exhaustive interfaces should be designed in the future to better serve the needs of each company and fit its particular requirements (Spiliotis, Raptis, & Assimakopoulos, 2015). Ideally, the interfaces should be adapted to the strengths and weaknesses of the user, providing useful feedback when possible (Goodwin, Fildes, Lawrence, & Nikolopoulos, 2007). Finally, web-based FSSs could replace windows-based ones that are locally installed and therefore of limited accessibility, availability, and compatibility (Asimakopoulos & Dix, 2013). Cloud computing and web services could be exploited in that direction.

3.7.2. Cloud resource capacity forecasting144
144 This subsection was written by Tim Januschowski.

One of the central promises in cloud computing is that of elasticity. Customers of cloud computing services can add compute resources in real time to meet and satisfy increasing demand and, when demand for a cloud-hosted application goes down, it is possible for cloud computing customers to down-scale. The benefit of the latter is particularly economically interesting during the
current pandemic. Popular recent cloud computing offerings take this elasticity concept one step further. They abstract away the computational resources completely from developers, so that developers can build serverless applications. In order for this to work, the cloud provider handles the addition and removal of compute resources ''behind the scenes''.

To keep the promise of elasticity, a cloud provider must address a number of forecasting problems at varying scales along the operational, tactical, and strategic problem dimensions (Januschowski & Kolassa, 2019). As an example of a strategic forecasting problem: where should data centres be placed? In what region of a country and in what geographic region? As an example of tactical forecasting problems, these must take into account energy prices (see Section 3.4.2) and also classic supply chain problems (Larson, Simchi-Levi, Kaminsky, & Simchi-Levi, 2001). After all, physical servers and data centres are what enables the cloud, and these must be ordered and have a lead time. The careful incorporation of life cycles of compute types is important (e.g., both the popularity of certain machine types and the duration of a hard disk). Analogous to the retail sector, cloud resource providers have tactical cold-start forecasting problems. For example, while GPU or TPU instances are still relatively recent but already well established, the demand for quantum computing is still to be decided. In the class of operational forecasting problems, a cloud provider can choose to address short-term resource forecasting problems for applications, such as adding resources to applications predictively, and make this available to customers (Barr, 2018). The forecasting of the customer's spend for cloud computing is another example. For serverless infrastructure, a number of servers is often maintained in a ready state (Gias & Casale, 2020), and the forecasting of the size of this 'warmpool' is another example. We note that cloud computing customers have forecasting problems that mirror the forecasting challenges of the cloud providers. Interestingly, forecasting itself has become a software service that cloud computing companies offer (Januschowski, Arpin, Salinas, Flunkert, Gasthaus & Lorenzo, 2018; Liberty et al., 2020; Poccia, 2019).

Many challenges in this application area are not unique to cloud computing. Cold-start problems exist elsewhere, for example. What potentially stands out in cloud computing forecasting problems may be the scale (e.g., there are a lot of physical servers available), the demands on the response time and granularity of a forecast, and the degree of automation. Consider the operational forecasting problem of predictive scaling. Unlike in retail demand forecasting, no human operator will be able to control this, and response times to forecasts are in seconds. It will be interesting to see whether approaches based on reinforcement learning (Dempster, Payne, Romahi, & Thompson, 2001; Gamble & Gao, 2018) can partially replace the need to have forecasting models (Januschowski, Gasthaus, Wang, Rangapuram & Callot, 2018).

3.7.3. Judgmental forecasting in practice145
145 This subsection was written by Shari De Baets, M. Sinan Gönül, & Nigel Harvey.

Surveys of forecasting practice (De Baets, 2019) have shown that the use of pure judgmental forecasting by practitioners has become less common. About 40 years ago, Sparkes and McHugh (1984) found that company action was more likely to be influenced by judgmental forecasts than by any other type of forecast. In contrast, Fildes and Petropoulos (2015) found that only 15.6% of forecasts in the surveyed companies were made by judgment alone. The majority of forecasts (55.6%) were made using a combination of statistical and judgmental methods. In this section, we discuss forecasting using unaided judgment (pure judgmental forecasting; see also Section 2.11.1), judgmental adjustments (judgment in combination with statistical models; see also Section 2.11.2), and the role of judgment in forecasting support systems.

On the first theme, the survey results discussed above beg the question of whether pure judgmental forecasting is still relevant and reliable. Answers here depend on the type of information on which the judgmental forecasts are based (Harvey, 2007, see also Section 2.11.1). For instance, people have difficulty making cross-series forecasts, as they have difficulty learning the correlation between variables and using it to make their forecasts (Harvey, Bolger, & McClelland, 1994; Lim & O'Connor, 1996b, 1996c). Additionally, they appear to take account of the noise as well as the pattern when learning the relation between variables; hence, when later using one of the variables to forecast the other, they add noise to their forecasts (Gray, Barnes, & Wilkinson, 1965). Judgmental extrapolation from a single time series is subject to various effects. First, people are influenced by optimism. For example, they over-forecast time series labelled as 'profits' but under-forecast the same series labelled as 'losses' (Harvey & Reimers, 2013). Second, they add noise to their forecasts so that a sequence of forecasts looks similar to ('represents') the data series (Harvey, 1995). Third, they damp trends in the data (Eggleton, 1982; Harvey & Reimers, 2013; Lawrence & Makridakis, 1989). Fourth, forecasts from un-trended independent series do not lie on the series mean but between the last data point and the mean; this is what we would expect if people perceived a positive autocorrelation in the series (Reimers & Harvey, 2011). These last two effects can be explained in terms of the under-adjustment that characterises use of the anchor-and-adjust heuristic: forecasters anchor on the last data point and adjust towards the trend line or mean – but do so insufficiently. However, in practice, this under-adjustment may be appropriate because real linear trends do become damped and real series are more likely to contain a modest autocorrelation than be independent (Harvey, 2011). We should therefore be reluctant to characterise these last two effects as biases.

Given these inherent flaws in people's decision making, practitioners might be hesitant to base their predictions on judgment. However, the reality is that companies persist in incorporating judgment into their forecasting.
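The anchor-and-adjust account above lends itself to a small numerical illustration (a hypothetical sketch with made-up numbers, not a reproduction of any cited study): a forecaster who anchors on the last observation and moves only a fraction of the way towards a fitted trend-line prediction necessarily produces a forecast lying between the anchor and the trend line, i.e., a damped trend.

```python
import numpy as np

rng = np.random.default_rng(42)

# Linearly trending series with noise (hypothetical data).
t = np.arange(24)
series = 100 + 2.0 * t + rng.normal(0, 3, t.size)

# Statistical benchmark: extrapolate the fitted linear trend one step ahead.
slope, intercept = np.polyfit(t, series, 1)
trend_forecast = intercept + slope * t.size

# Stylised anchor-and-adjust judge: anchor on the last data point and
# move only a fraction (here 60%) of the way towards the trend forecast.
adjustment_rate = 0.6
anchor = series[-1]
judgmental_forecast = anchor + adjustment_rate * (trend_forecast - anchor)

# Under-adjustment leaves the judgmental forecast strictly between the
# anchor and the trend-line prediction: the trend is damped.
print(f"anchor (last obs):   {anchor:.1f}")
print(f"trend forecast:      {trend_forecast:.1f}")
print(f"judgmental forecast: {judgmental_forecast:.1f}")
```

The adjustment rate of 0.6 is arbitrary; any rate below 1 reproduces the qualitative damping effect described in the studies cited above.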
Assertions that they are wrong to do so represent an oversimplified view of the reality in which businesses operate. Statistical models are generally not able to account for external events, events with low frequency, or a patchy and insufficient data history (Armstrong & Collopy, 1998; Goodwin, 2002; Hughes, 2001). Hence, a balance may be found in the combination of statistical models and judgment (see Section 2.11.2).

In this respect, judgmental adjustments to statistical model outputs are the most frequent form of judgmental forecasting in practice (Arvan et al., 2019; Eksoz et al., 2019; Lawrence et al., 2006; Petropoulos et al., 2016). Judgmental adjustments give practitioners a quick and convenient way to incorporate their insights, their experience, and the additional information that they possess into a set of statistical baseline forecasts. Interestingly, Fildes et al. (2009) examined the judgmental adjustment applications in four large supply-chain companies and found evidence that adjustments in a 'negative' direction improved the accuracy more than adjustments in a 'positive' direction. This effect may be attributable to wishful thinking or optimism that may underlie positive adjustments. Adjustments that were 'larger' in magnitude were also more beneficial in terms of the final forecast accuracy than 'smaller' adjustments (Fildes et al., 2009). This may simply be because smaller adjustments are merely a sign of tweaking the numbers, whereas large adjustments are carried out when there is a highly valid reason to make them. These findings have been confirmed in other studies (see, for example, Franses & Legerstee, 2009b; Syntetos, Nikolopoulos, Boylan, Fildes, & Goodwin, 2009).

What are the main reasons behind judgmental adjustments? Önkal and Gönül (2005) conducted a series of interviews and a survey on forecasting practitioners (Gönül, Önkal, & Goodwin, 2009) to explore these. The main reasons given were (i) to incorporate the practitioners' intuition and experience about the predictions generated externally, (ii) to accommodate sporadic events and exceptional occasions, (iii) to integrate confidential/insider information that may not have been captured in the forecasts, (iv) to hold responsibility and to gain control of the forecasting process, (v) to incorporate the expectations and viewpoints of the practitioners, and (vi) to compensate for various judgmental biases that are believed to exist in the predictions. These studies also revealed that forecasting practitioners are very fond of judgmental adjustments and perceive them as a prominent way of 'completing' and 'owning' the predictions that are generated by others.

While the first three reasons represent the integration of an un-modelled component into the forecast, potentially improving accuracy, the other reasons tend to harm accuracy rather than improve it. In such cases, the forecast would be better off if left unadjusted. Önkal and Gönül (2005) and Gönül et al. (2009) report that the occasions when forecasters refrain from adjustments are (i) when the practitioners are adequately informed and knowledgeable about the forecasting method(s) that are used to generate the baseline forecasts, (ii) when there are accompanying explanations and convincing communications that provide the rationale behind forecast method selection, (iii) when baseline predictions are supplemented by additional supportive materials such as scenarios and alternative forecasts, (iv) when the forecasting source is believed to be trustworthy and reliable, and (v) when organisational policy or culture prohibits judgmental adjustments. In these circumstances, the baseline forecasts are more easily accepted by practitioners, and their adjustments tend to be less frequent.

Ideally, a Forecast Support System (FSS; see Section 3.7.1) should be designed to ensure that it encourages adjustment or non-adjustment, whichever is appropriate (Fildes et al., 2006). But how can this be achieved? The perceived quality and accessibility of an FSS can be influenced by its design. More on this can be found in the literature on the Technology Acceptance Model (Davis, Bagozzi, & Warshaw, 1989) and decision making (for instance, by means of framing, visual presentation or nudging; e.g., Gigerenzer, 1996; Kahneman & Tversky, 1996; Payne, 1982; Thaler & Sunstein, 2009). A number of studies have investigated the design aspects of FSS, with varying success. One of the more straightforward approaches is to change the look and feel of the FSS as well as its presentation style. Harvey and Bolger (1996) found that trends were more easily discernible when the data were displayed in graphical rather than tabular format. Additionally, simple variations in presentation such as line graphs versus point graphs can alter accuracy (Theocharis, Smith, & Harvey, 2018). The functionalities of the FSS can also be modified (see Section 2.11.2). Goodwin (2000b) investigated three ways of improving judgmental adjustment via changes in the FSS: a 'no adjustment' default, requesting that forecasters specify the size of an adjustment rather than give a revised forecast, and requiring a mandatory explanation for the adjustment. Only the default option and the explanation feature were successful in increasing the acceptance of the statistical forecast and so improving forecast accuracy.

Goodwin et al. (2011) reported an experiment that investigated the effects of (i) 'guidance' in the form of providing information about when to make adjustments and (ii) 'restriction' of what the forecaster could do (e.g., prohibiting small adjustments). They found that neither restrictiveness nor guidance was successful in improving accuracy, and both were met with resistance by the forecasters. While these studies focused on voluntary integration, Goodwin (2000a, 2002) examined the effectiveness of various methods of mechanical integration and concluded that automatic correction for judgmental biases by the FSS was more effective than combining judgmental and statistical inputs automatically with equal or varying weights. Another approach to mechanical integration was investigated by Baecke et al. (2017). They compared ordinary judgmental adjustment with what they termed ''integrative judgment''. This takes the judgmental information into account as a predictive variable in the forecasting model and generates a new forecast. This approach improved accuracy. It also had the advantage that forecasters still had their input into the forecasting process, and so the resistance found by Goodwin et al. (2011) should not occur. Finally, it is worth emphasising that an effective FSS should not only improve forecast accuracy but should also be easy to use, understandable, and acceptable (Fildes et al., 2006, see also Section 2.11.6 and Section 3.7.1).
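The contrast between simple mechanical combination and using judgment as a predictive variable can be sketched in a few lines (a hypothetical illustration with simulated data; the variable names and the data-generating process are our own assumptions, not the designs of the studies cited above): the judgmental input is treated either as a forecast to be averaged with the statistical baseline, or as one more regressor in a model fitted to past actuals.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical history: statistical baseline, judgmental forecast, actuals.
stat = rng.normal(100, 10, n)                    # statistical baseline forecasts
info = rng.normal(0, 5, n)                       # information only the judge observes
actual = stat + info + rng.normal(0, 2, n)       # realised outcomes
judge = stat + 0.8 * info + rng.normal(0, 4, n)  # judge uses the info partially, noisily

# (a) Equal-weight mechanical combination of the two inputs.
combined = 0.5 * stat + 0.5 * judge

# (b) Judgment as a predictive variable: regress actuals on both inputs
# and use the fitted model's predictions as the new forecast.
X = np.column_stack([np.ones(n), stat, judge])
beta, *_ = np.linalg.lstsq(X, actual, rcond=None)
integrative = X @ beta

def mae(forecast):
    return np.mean(np.abs(actual - forecast))

print(f"statistical alone: {mae(stat):.2f}")
print(f"50/50 combination: {mae(combined):.2f}")
print(f"regression model:  {mae(integrative):.2f}")
```

In this setup both treatments of the judgmental input beat the statistical baseline alone, because the judge carries genuine information the model lacks; the regression variant additionally lets the data decide how much weight the judgmental input deserves.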
F. Petropoulos, D. Apiletti, V. Assimakopoulos et al. International Journal of Forecasting 38 (2022) 705–871

3.7.4. Trust in forecasts146

Regardless of how much effort is poured into training forecasters and developing elaborate forecast support systems, decision-makers will either modify or discard the predictions if they do not trust them (see also Section 2.11.2, Section 2.11.6, Section 3.7.1, and Section 3.7.3). Hence, trust is essential for forecasts to be actually used in making decisions (Alvarado-Valencia & Barrero, 2014; Önkal et al., 2019).

Given that trust appears to be the most important attribute that promotes a forecast, what does it mean to practitioners? Past work suggests that trusting a forecast is often equated with trusting the forecaster, their expertise and skills, so that predictions can be used without adjustment to make decisions (Önkal et al., 2019). It is argued that trust entails relying on credible forecasters who make the best use of available information while using correctly applied methods and realistic assumptions (Gönül et al., 2009) with no hidden agendas (Gönül, Önkal, & Goodwin, 2012). Research suggests that trust is not only about trusting a forecaster's competence; users also need to be convinced that no manipulations are made for personal gain and/or to mislead decisions (Twyman, Harvey, & Harries, 2008).

Surveys with practitioners show that key determinants of trust revolve around (i) forecast support features and tools (e.g., graphical illustrations, rationale for forecasts), (ii) forecaster competence/credibility, (iii) forecast combinations (from multiple forecasters/methods), and (iv) the forecast user's knowledge of forecasting methods (Önkal et al., 2019).

What can be done to enhance trust? If trust translates into accepting guidance for the future while acknowledging and tolerating potential forecast errors, then both the providers and users of forecasts need to work as partners towards shared goals and expectations. Important pathways to accomplish this include (i) honest communication of the forecaster's track record and relevant accuracy targets (Önkal et al., 2019), (ii) knowledge sharing (Özer et al., 2011; Renzl, 2008) and transparency of forecasting methods, assumptions and data (Önkal et al., 2019), (iii) communicating forecasts in the correct tone and in jargon-free language to appeal to the user audience (Taylor & Thomas, 1982), (iv) supporting users with forecasting training (Merrick, Hardin, & Walker, 2006), (v) providing explanations/rationale behind forecasts (Gönül, Önkal, & Lawrence, 2006; Önkal, Gönül, & Lawrence, 2008), (vi) presenting alternative forecasts under different scenarios (see Section 2.11.5), and (vii) giving combined forecasts as benchmarks (Önkal et al., 2019).

Trust must be earned and deserved (Maister et al., 2012) and is based on building a relationship that benefits both the providers and users of forecasts. Take-aways for those who make forecasts and those who use them converge around clarity of communication as well as perceptions of competence and integrity. Key challenges for forecasters are to successfully engage with users throughout the forecasting process (rather than relying on a forecast statement at the end) and to convince them of their objectivity and expertise. In parallel, forecast users face challenges in openly communicating their expectations from forecasts (Gönül et al., 2009), as well as their needs for explanations and other informational addenda to gauge the uncertainties surrounding the forecasts. Organisational challenges include investing in forecast management and designing resilient systems for collaborative forecasting.

3.7.5. Communicating forecast uncertainty147

Communicating forecast uncertainty is a critical issue in forecasting practice. Effective communication allows forecasters to influence end-users to respond appropriately to forecast uncertainties. Some frameworks for effective communication have been proposed by decomposing the communication process into its elements: the communicator, the object of uncertainty, the expression format, the audience, and its effect (National Research Council, 2006; van der Bles, van der Linden, Freeman, Mitchell, Galvao, Zaval, & Spiegelhalter, 2019).

Forecasters have long studied part of this problem, focusing mostly on the manner in which forecast uncertainties are expressed. Gneiting and Katzfuss (2014) provide a review of recent probabilistic forecasting methods (see also Section 2.12.4 and Section 2.12.5). Forecasting practice, however, has revealed that numeracy skills and cognitive load can often inhibit end-users from correctly interpreting these uncertainties (Joslyn & Nichols, 2009; Raftery, 2016). Attempts to improve understanding through the use of less technical vocabulary also create new challenges. Research in psychology shows that wording and verbal representation play important roles in disseminating uncertainty (Joslyn, Nadav-Greenberg, Taing, & Nichols, 2009). Generally, forecasters are found to be consistent in their use of terminology, but forecast end-users often have inconsistent interpretations of these terms, even those commonly used (Budescu & Wallsten, 1985; Clark, 1990; Ülkümen, Fox, & Malle, 2016). Pretesting verbal expressions and avoiding commonly misinterpreted terms are some easy ways to significantly reduce biases and improve comprehension.

Visualisations can also be powerful in communicating uncertainty. Johnson and Slovic (1995) and Spiegelhalter, Pearson, and Short (2011) propose several suggestions for effective communication (e.g., multiple-format use, avoiding framing bias, and acknowledging limitations), but also recognise the limited amount of existing empirical evidence. Some domain-specific studies do exist. For example, Riveiro, Helldin, Falkman, and Lebram (2014) showed that uncertainty visualisation helped forecast comprehension in a homeland security context.

With respect to the forecaster and her audience, issues such as competence, trust, respect, and optimism have recently been examined as means to improve uncertainty communication. Fiske and Dupree (2014) discuss how forecast recipients often infer apparent intent and competence from the uncertainty provided and use these to judge trust and respect (see also Sections 2.11.6 and 3.7.4 for discussion on trust and forecasting). This

146 This subsection was written by Dilek Önkal.
147 This subsection was written by Victor Richmond R. Jose.


suggests that the amount of uncertainty information provided should be audience dependent (Han, Klein, Lehman, Massett, Lee, & Freedman, 2009; Politi, Han, & Col, 2007). Raftery (2016) acknowledges this by using strategies that depend on the audience type (e.g., low-stakes user, risk avoider, etc.). Fischhoff and Davis (2014) suggest a similar approach by examining how people are likely to use the information (e.g., finding a signal, generating new options, etc.).

When dealing with the public, experts assert that communicating uncertainty helps users understand forecasts better and avoid a false sense of certainty (Morss, Demuth, & Lazo, 2008). Research, however, shows that hesitation to include forecast uncertainty exists among experts because it provides an opening for criticism and the possibility of misinterpretation by the public (Fischhoff, 2012). This is more challenging when the public has prior beliefs on a topic or trust has not been established. Uncertainty can be used by individuals to reinforce a motivated-reasoning bias that allows them to ''see what they want to see'' (Dieckmann, Gregory, Peters, & Hartman, 2017). Recent work, however, suggests that increasing transparency about uncertainty does not necessarily affect trust in some settings. van der Bles, van der Linden, Freeman, and Spiegelhalter (2020) recently showed in a series of experiments that people recognise greater uncertainty with more information but expressed only a small decrease in trust in the report and the trustworthiness of the source.

3.8. Other applications

3.8.1. Tourism demand forecasting148

As seen throughout 2020, (leisure) tourism demand is very sensitive to external shocks such as natural and human-made disasters, making tourism products and services extremely perishable (Frechtling, 2001). As the majority of business decisions in the tourism industry require reliable demand forecasts (Song, Witt, & Li, 2008), improving their accuracy has continuously been on the agenda of tourism researchers and practitioners alike. This continuous interest has resulted in two tourism demand forecasting competitions to date (Athanasopoulos et al., 2011; Song & Li, 2021), the current one with a particular focus on tourism demand forecasting during the COVID-19 pandemic (for forecasting competitions, see Section 2.12.7). Depending on data availability, as well as on the geographical aggregation level, tourism demand is typically measured in terms of arrivals, bed-nights, visitors, export receipts, import expenditures, etc.

Since there are no specific tourism demand forecast models, standard univariate and multivariate statistical models, including common aggregation and combination techniques, have been used in quantitative tourism demand forecasting (see, for example, Jiao & Chen, 2019; Song, Qiu, & Park, 2019, for recent reviews). Machine learning and other artificial intelligence methods, as well as hybrids of statistical and machine learning models, have recently been employed more frequently.

Traditionally, typical micro-economic demand drivers (own price, competitors' prices, and income) and some more tourism-specific demand drivers (source-market population, marketing expenditures, consumer tastes, habit persistence, and dummy variables capturing one-off events or qualitative characteristics) have been employed as predictors in tourism demand forecasting (Song et al., 2008). One caveat of some of these economic demand drivers is their publication lag and their low frequency, for instance, when real GDP (per capita) is employed as a proxy for travellers' income.

The use of leading indicators, such as industrial production as a leading indicator for real GDP (see also Section 3.3.2), has been proposed for short-term tourism demand forecasting and nowcasting (Chatziantoniou, Degiannakis, Eeckels, & Filis, 2016). During the past couple of years, web-based leading indicators have also been employed in tourism demand forecasting and have, in general, shown improvements in terms of forecast accuracy. However, this has not happened in every case, thereby confirming the received wisdom that there is no single best tourism demand forecasting approach (Li, Song, & Witt, 2005). Examples of such web-based leading indicators include Google Trends indices (Bangwayo-Skeete & Skeete, 2015), Google Analytics indicators (Gunter & Önder, 2016), as well as Facebook 'likes' (Gunter, Önder, & Gindl, 2019).

The reason why these expressions of users' interaction with the Internet have proven worthwhile as predictors in a large number of cases is that it is sensible to assume that potential travellers gather information about their destination of interest prior to the actual trip, with the Internet being characterised by comparably low search costs, thus allowing potential travellers to forage for information (Pirolli & Card, 1999) with little effort (Zipf, 2016). A forecaster should include this information in their own set of relevant information at the forecast origin (Lütkepohl, 2005) if taking it into account results in improved forecast accuracy, with web-based leading indicators thus effectively Granger-causing (Granger, 1969) actual tourism demand (see Section 2.5.1).

Naturally, tourism demand forecasting is closely related to aviation forecasting (see Section 3.8.2), as well as traffic flow forecasting (see Section 3.8.3). A sub-discipline of tourism demand forecasting can be found in hotel room demand forecasting. The aforementioned perishability of tourism products and services is particularly evident for hotels, as a hotel room not sold is lost revenue that cannot be regenerated. Accurate hotel room demand forecasts are crucial for successful hotel revenue management (Pereira, 2016) and are relevant for planning purposes such as adequate staffing during MICE (i.e., Meetings, Incentives, Conventions, and Exhibitions/Events) times, scheduling of renovation periods during low seasons, or balancing out overbookings and ''no shows'' given constrained hotel room supply (Ivanov & Zhechev, 2012). Particularly since the onset of the COVID-19 pandemic in 2020, which has been characterised by global travel restrictions and tourism businesses being locked down to varying extents, scenario forecasting and other forms of hybrid and judgmental forecasting played an important

148 This subsection was written by Ulrich Gunter.
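The notion that a web-based indicator is useful only if it Granger-causes demand can be illustrated with a minimal in-sample comparison: adding the lagged indicator to an autoregression of demand should reduce the residual sum of squares (a full Granger test adds an F-statistic on top of this comparison). The series below are synthetic stand-ins, not real search-index or arrivals data:

```python
# Minimal illustration of the Granger-causality idea behind web-based
# leading indicators: does a lagged indicator explain variation in demand
# that the autoregression alone cannot? Synthetic data only.

def ols_rss(X, y):
    """Residual sum of squares of an OLS fit of y on X (intercept column included in X)."""
    k = len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination
    A = []
    for i in range(k):
        row = [sum(r[i] * r[j] for r in X) for j in range(k)]
        row.append(sum(r[i] * yi for r, yi in zip(X, y)))
        A.append(row)
    for i in range(k):
        p = max(range(i, k), key=lambda m: abs(A[m][i]))  # partial pivoting
        A[i], A[p] = A[p], A[i]
        for m in range(k):
            if m != i:
                f = A[m][i] / A[i][i]
                A[m] = [a - f * b for a, b in zip(A[m], A[i])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((yi - sum(b * xi for b, xi in zip(beta, r))) ** 2
               for r, yi in zip(X, y))

# Synthetic example: demand in month t is driven by last month's search index
search = [50, 52, 60, 58, 65, 70, 68, 75, 80, 78, 85, 90]
demand = [200] + [100 + 2 * s for s in search[:-1]]

y = demand[2:]
ar_only = [[1.0, demand[t - 1]] for t in range(2, len(demand))]
with_indicator = [[1.0, demand[t - 1], search[t - 1]]
                  for t in range(2, len(demand))]

rss_restricted = ols_rss(ar_only, y)        # autoregression alone
rss_unrestricted = ols_rss(with_indicator, y)  # plus lagged indicator
```

In practice one would use a dedicated implementation (for example, `grangercausalitytests` in statsmodels) and out-of-sample accuracy comparisons rather than this in-sample sketch.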

role (Zhang, Song, Wen, & Liu, 2021, see Section 2.11.5), thereby highlighting an important limitation of quantitative tourism demand forecasting as currently practised. Based on the rapid development of information technology and artificial intelligence, Li and Jiao (2020), however, envisage a ''super-smart tourism forecasting system'' (Li & Jiao, 2020, p. 264) for the upcoming 75 years of tourism demand forecasting. According to these authors, this system will be able to automatically produce forecasts at the micro level (i.e., for the individual traveller and tourism business) in real time while drawing on a multitude of data sources and integrating multiple (self-developing) forecast models.

3.8.2. Forecasting for aviation149

Airports and airlines have long invested in forecasting arrivals and departures of aircraft. These forecasts are important in measuring airspace and airport congestion, designing flight schedules, and planning for the assignment of stands and gates (Barnhart & Cohn, 2004). Various techniques have been applied to forecast aircraft arrivals and departures. For instance, Rebollo and Balakrishnan (2014) apply random forests to predict air traffic delays of the National Airspace System using both temporal and network delay states as covariates. Manna, Biswas, Kundu, Rakshit, Gupta, and Barman (2017) develop a statistical model based on a gradient boosting decision tree to predict arrival and departure delays, using data taken from the United States Department of Transportation (Bureau of Transportation Statistics, 2020). Rodríguez-Sanz, Comendador, Valdés, Pérez-Castán, Montes, and Serrano (2019) develop a Bayesian network model to predict flight arrivals and delays using radar data, aircraft historical performance and local environmental data. There are also a few studies that have focused on generating probabilistic forecasts of arrivals and departures, moving beyond point estimates. For example, Tu, Ball, and Jank (2008) develop a predictive system for estimating flight departure delay distributions using flight data from Denver International Airport. The system employs the smoothing spline method to model seasonal trends and daily propagation patterns. It also uses mixture distributions to estimate the residual errors for predicting the entire distribution.

In the airline industry, accurate forecasts of demand and booking cancellations are crucial to revenue management, a concept that was mainly inspired by the airline and hotel industries (Lee, 1990; McGill & Van Ryzin, 1999, see also Section 3.8.1 for a discussion of hotel occupancy forecasting). Proposals of forecasting models for flight demand can be traced back to Beckmann and Bobkoski (1958), where these authors demonstrate that Poisson and Gamma models can be applied to fit airline data. The use of similar flights' short-term booking information in forecasting potential future bookings has since been discussed by airline practitioners such as Adams and Michael (1987) at Qantas as well as Smith, Leimkuhler, and Darrow (1992) at American Airlines. Regression models (see Section 2.3.2) and time series models such as exponential smoothing (see Section 2.3.1) and ARIMA (see Section 2.3.4) have been discussed in Botimer (1997), Sa (1987), and Wickham (1995). There are also studies focusing on disaggregate airline demand forecasting. For example, Martinez and Sanchez (1970) apply empirical probability distributions to predict bookings and cancellations of individual passengers travelling with Iberia Airlines. Carson, Cenesizoglu, and Parker (2011) show that aggregating the forecasts of individual airports using airport-specific data could provide better forecasts at a national level. More recently, machine learning methods have also been introduced to generate forecasts for airlines. This can be seen in Weatherford, Gentry, and Wilamowski (2003), where they apply neural networks to forecast the time series of the number of reservations. Moreover, Hopman, Koole, and van der Mei (2021) show that an extreme gradient boosting model which forecasts itinerary-based bookings using ticket prices, social media posts and airline reviews outperforms traditional time series forecasts.

Forecasting passenger arrivals and delays at airports has also received some attention in the literature, particularly in the past decade. Wei and Hansen (2006) build an aggregate demand model for air passenger traffic in a hub-and-spoke network. The model is a log-linear regression that uses airline service variables such as aircraft size and flight distance as predictors. Barnhart, Fearing, and Vaze (2014) develop a multinomial logit regression model designed to predict delays of US domestic passengers. Their study also uses data from the US Department of Transportation (Bureau of Transportation Statistics, 2020). Guo, Grushka-Cockayne, and De Reyck (2020) recently developed a predictive system that generates distributional forecasts of connection times for transfer passengers at an airport, as well as passenger flows at the immigration and security areas. Their approach is based on the application of regression trees combined with copula-based simulations. This predictive system has been implemented at Heathrow airport since 2017.

With an increasing amount of available data associated with activities in the aviation industry, predictive analyses and forecasting methods face new challenges as well as opportunities, especially in regard to updating forecasts in real time. The predictive system developed by Guo et al. (2020) is able to generate accurate forecasts using real-time flight and passenger information on a rolling basis. The parameters of their model, however, do not update over time. Therefore, a key challenge in this area is for future studies to identify an efficient way to dynamically update model parameters in real time.

3.8.3. Traffic flow forecasting150

Traffic flow forecasting is an important task for traffic management bodies to reduce traffic congestion, perform planning and allocation tasks, as well as for travelling individuals to plan their trips. Traffic flow is complex spatial and time-series data exhibiting multiple seasonalities and affected by spatial exogenous influences such

149 This subsection was written by Xiaojia Guo.
150 This subsection was written by Alexander Dokumentov.


as social and economic activities and events, various government regulations, planned road works, weather, traffic accidents, etc. (Polson & Sokolov, 2017).

Methods to solve traffic flow forecasting problems fall roughly into three categories. The first uses parametric statistical methods such as ARIMA, seasonal ARIMA, space–time ARIMA, Kalman filters, etc. (see, for example, Kamarianakis & Prastacos, 2005; Vlahogianni, Golias, & Karlaftis, 2004; Vlahogianni, Karlaftis, & Golias, 2014; Whittaker, Garside, & Lindveld, 1997). The second set of approaches uses purely neural networks (Mena-Oreja & Gozalvez, 2020). The third group of methods uses various machine learning and statistical non-parametric techniques, or mixtures of them (see, for example, Hong, 2011; Zhang, Qi, Henrickson, Tang, & Wang, 2017; Zhang, Zou, Tang, Ash, & Wang, 2016, but also Section 2.7.8 and Section 2.7.10 for an overview of NN and ML methods).

Although neural networks are probably the most promising technique for traffic flow forecasting (see, for example, Do, Vu, Vo, Liu, & Phung, 2019; Polson & Sokolov, 2017), statistical techniques, such as Seasonal-Trend decomposition based on Regression (STR, see Section 2.2.2), can outperform them when little data is available, or they can be used for imputation, de-noising, and other pre-processing before feeding data into neural networks, which often become less powerful when working with missing or very noisy data.

Traffic flow forecasting is illustrated below using vehicle flow rate data from road camera A1.GT.24538 on the A1 highway in Luxembourg (des Mobilités, 2020) from 2019-11-19 06:44:00 UTC to 2019-12-23 06:44:00 UTC. Most of the data points are separated by 5 min intervals. Discarding points which do not follow this schedule leads to a data set where all data points are separated by 5 min intervals, although values at some points are missing. The data is split into training and test sets by setting aside the last 7 days of data. As Hou, Edara, and Sun (2014) and Polson and Sokolov (2017) suggest, spatial factors are less important for long-term traffic flow forecasting, and therefore they are not taken into account and only temporal data is used. Application of STR (Dokumentov, 2017) as a forecasting technique to the log-transformed data leads to a forecast with Mean Squared Error 102.4, Mean Absolute Error 62.8, and Mean Absolute Percentage Error (MAPE) 14.3% over the test set, outperforming Double-Seasonal Holt-Winters by 44% in terms of MAPE. The decomposition and the forecast obtained by STR are shown in Fig. 15, and the magnified forecast and the forecasting errors are shown in Fig. 16.

3.8.4. Call arrival forecasting151

Forecasting of inbound call arrivals for call centres supports a number of key decisions, primarily around staffing (Aksin, Armony, & Mehrotra, 2007). This typically involves matching staffing level requirements to service demand, as summarised in Fig. 17. To achieve service level objectives, an understanding of the call load is required in terms of the call arrivals (Gans, Koole, & Mandelbaum, 2003). As such, forecasting of future call volume or call arrival rates is an important part of call centre management.

There are several properties of call arrival data. Depending on the level of aggregation and the frequency with which data is collected, e.g., hourly, call arrival data may exhibit intraday (within-day), intraweek, and intrayear multiple seasonal patterns (Avramidis, Deslauriers, & L'Ecuyer, 2004; Brown et al., 2005a, and Section 2.3.5). In addition, arrival data may also exhibit interday and intraday dependencies, with different time periods within the same day, or across days within the same week, showing strong levels of autocorrelation (Brown et al., 2005a; Shen & Huang, 2005; Tanir & Booth, 1999). Call arrivals may also be heteroscedastic, with variance at least proportional to arrival counts (Taylor, 2008), and overdispersed under a Poisson assumption, having variance per time period typically much larger than its expected value (Avramidis et al., 2004; Jongbloed & Koole, 2001; Steckley, Henderson, & Mehrotra, 2005). These properties have implications for various approaches to modelling and forecasting call arrivals.

The first family of methods consists of time series methods requiring no distributional assumptions. Early studies employed autoregressive moving average (ARMA; see Section 2.3.4) models (Andrews & Cunningham, 1995; Antipov & Meade, 2002; Tandberg, Easom, & Qualls, 1995; Xu, 1999), exponential smoothing (Bianchi, Jarrett, & Hanumara, 1993, 1998, see Section 2.3.1), fast Fourier transforms (Lewis, Herbert, & Bell, 2003), and regression (Tych, Pedregal, Young, & Davies, 2002, see Section 2.3.2). The first methods capable of capturing multiple seasonality were evaluated by Taylor (2008) and included double seasonal exponential smoothing (Taylor, 2003b) and multiplicative double seasonal ARMA (SARMA). Since then, several advanced time series methods have been developed and evaluated (De Livera et al., 2011; Taylor, 2010; Taylor & Snyder, 2012), including artificial neural networks (Li, Huang, & Gong, 2011; Millán-Ruiz & Hidalgo, 2013; Pacheco, Millán-Ruiz, & Vélez, 2009) and models for density forecasting (Taylor, 2012).

Another family of models relies on the assumption of a time-inhomogeneous Poisson process, adopting fixed (Brown et al., 2005a; Jongbloed & Koole, 2001; Shen & Huang, 2008a; Taylor, 2012) and mixed modelling (Aldor-Noiman, Feigin, & Mandelbaum, 2009; Avramidis et al., 2004; Ibrahim & L'Ecuyer, 2013) approaches to account for the overdispersed nature of the data and, in some cases, interday and intraday dependence.

The works by Soyer and Tarimcilar (2008) and Weinberg, Brown, and Stroud (2007) model call volumes from a Bayesian point of view. Other Bayesian-inspired approaches have been adopted mainly for estimating various model parameters, but also allowing for intraday updates of forecasts (Aktekin & Soyer, 2011; Landon, Ruggeri, Soyer, & Tarimcilar, 2010).

A further class of approaches addresses the dimensionality challenge related to high-frequency call data using Singular Value Decomposition (SVD). Shen and Huang

151 This subsection was written by Devon K. Barrow.


Fig. 15. STR decomposition of the log transformed training data and the forecasts for the traffic flow data.

Fig. 16. Left: forecast (red) and the test data (black); Right: the prediction error over time for the traffic flow data. (For interpretation of the
references to colour in this figure legend, the reader is referred to the web version of this article.)
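The point-accuracy measures quoted for the traffic-flow example (Mean Squared Error, Mean Absolute Error, and MAPE) are computed from the test-set errors in the standard way; a minimal implementation with illustrative numbers (not the Luxembourg camera data):

```python
# Standard point-forecast accuracy measures; illustrative numbers only.

def mse(actual, forecast):
    """Mean Squared Error."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

def mae(actual, forecast):
    """Mean Absolute Error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    """Mean Absolute Percentage Error (assumes no zero actuals)."""
    return 100.0 * sum(abs(a - f) / abs(a)
                       for a, f in zip(actual, forecast)) / len(actual)

actual = [420.0, 460.0, 500.0, 480.0]    # vehicles per 5-min interval
forecast = [400.0, 470.0, 490.0, 495.0]

scores = mse(actual, forecast), mae(actual, forecast), mape(actual, forecast)
```

MAPE is scale-free, which is one reason comparisons such as the one against Double-Seasonal Holt-Winters are quoted in MAPE terms.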

Fig. 17. The staffing decision process in call centres.
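The staffing step summarised in Fig. 17, turning a forecast call load into an agent requirement, is commonly handled with the Erlang-C queueing model; this model is a standard tool in the call centre literature but is not prescribed by the text, and it assumes Poisson arrivals and exponential service times. A compact sketch:

```python
import math

# Sketch of the staffing decision in Fig. 17: convert a forecast arrival
# rate into an agent requirement with the Erlang-C model (Poisson arrivals,
# exponential service times -- a simplifying assumption).

def erlang_c(n, a):
    """Probability that an arriving call must wait, with n agents and offered load a."""
    if a >= n:
        return 1.0
    x = (a ** n / math.factorial(n)) * n / (n - a)
    s = sum(a ** k / math.factorial(k) for k in range(n))
    return x / (s + x)

def agents_needed(calls_per_min, avg_handle_min, wait_sec, target):
    """Smallest n achieving P(wait <= wait_sec) >= target."""
    a = calls_per_min * avg_handle_min   # offered load in Erlangs
    mu = 1.0 / avg_handle_min            # service rate per agent, per minute
    n = max(1, math.ceil(a))
    while True:
        sl = 1 - erlang_c(n, a) * math.exp(-(n - a) * mu * wait_sec / 60.0)
        if sl >= target:
            return n
        n += 1

# e.g., 3 calls/min forecast, 4-min handle time, 80% answered within 20 s
staff = agents_needed(3.0, 4.0, 20.0, 0.80)
```

Because call arrivals are typically overdispersed relative to Poisson, as discussed in Section 3.8.4, such staffing numbers are best read as a rough sketch rather than a definitive requirement.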

(2005) and Shen and Huang (2008a) use the same technique to achieve dimensionality reduction of arrival data, and to create a forecasting model that provides both interday forecasts of call volume and an intraday updating mechanism. Several further studies have extended the basic SVD approach to realise further modelling innovations, for example, to forecast call arrival rate profiles and generate smooth arrival rate curves (Shen, 2009; Shen &

Huang, 2008b; Shen, Huang, & Lee, 2007). A more comprehensive coverage of different forecasting approaches for call arrival rate and volume can be found in a recent review paper by Ibrahim, Ye, L'Ecuyer, and Shen (2016).

3.8.5. Elections forecasting152

With the exception of weather forecasts, there are few forecasts which have as much public exposure as election forecasts. They are frequently published by mass media, with their number and disclosure reaching a frenzy as Election Day approaches. This explains the significant number of methods, approaches and procedures proposed, and the paramount role these forecasts play in shaping people's confidence in (soft/social) methods of forecasting.

The problem escalates because, regardless of whether the goal of the election forecast is an attempt to ascertain the winner in two-choice elections (e.g., a referendum or a Presidential election) or to reach estimates within the margins of error in Parliamentary systems, knowledge of the forecasts influences electors' choices (Pavía, Gil-Carceller, Rubio-Mataix, Coll, Alvarez-Jareño, Aybar, & Carrasco-Arroyo, 2019). Election forecasts not only affect voters but also political parties, campaign organizations and (international) investors, who are also watchful of their evolution.

Scientific approaches to election forecasting include polls, information (stock) markets and statistical models. They can also be sorted by when they are performed, and new methods, such as social media surveillance (see also Section 2.9.3), are also emerging (Ceron, Curini, & Iacus, 2016; Huberty, 2015). Probabilistic (representative) polls are the most commonly used instrument to gauge public opinion. The progressively higher impact of non-sampling errors (coverage issues, non-response bias, measurement error: Biemer, 2010) is, however, severely testing this approach. Despite this, as Kennedy, Wojcik, and Lazer (2017) show in a recent study covering 86 countries and more than 500 elections, polls are still powerful and robust predictors of election outcomes after adjustments (see also Jennings, Lewis-Beck, & Wlezien, 2020). The increasing need for post-sampling adjustments of probabilistic samples has led to a resurgence of interest in non-probabilistic polls (Elliott & Valliant, 2017; Pavía & Larraz, 2012; Wang, Rothschild, Goel, & Gelman, 2015), abandoned in favour of probabilistic sampling in 1936, when Gallup forecasted Roosevelt's triumph over Landon using a small representative sample despite the Literary Digest failing to do so with a sample of nearly 2.5 million responses (Squire, 1988).

A person knows far more than just her/his voting intention (Rothschild, 2009), and when s/he makes a bet, the rationality of her/his prediction is reinforced because s/he wants to win. Expectation polls try to exploit the first issue (Graefe, 2014), while prediction markets, as efficient aggregators of information, exploit both these issues to yield election forecasts (see also Sections 2.6.4 and 2.11.4). Several studies have proven the performance of these approaches (Berg, Nelson, & Rietz, 2008; Erikson & Wlezien, 2012; Williams & Reade, 2016; Wolfers & Zitzewitz, 2004), even studying their links with opinion polls (Brown, Reade, & Vaughan Williams, 2019). Practice has also developed econometric models (Fair, 1978) that exploit structural information available months before the election (e.g., the evolution of the economy or the incumbent's popularity). Lewis-Beck has had great success in publishing dozens of papers using this approach (see, e.g., Lewis-Beck, 2005).

Special mention also goes to Election-Day forecasting strategies, which have been systematically commissioned since the 1950s (Mitofsky, 1991). Exit (and entrance) polls (Klofstad & Bishin, 2012; Pavía, 2010), quick-counts (Pavía-Miralles & Larraz-Iribas, 2008), and statistical models (Bernardo, 1984; Moshman, 1964; Pavía-Miralles, 2005) have been used to anticipate outcomes on Election Day. Some of these strategies (mainly random quick-counts) can also be employed as auditing tools to disclose manipulation and fraud in weak democracies (Scheuren & Alvey, 2008).

3.8.6. Sports forecasting153

Forecasting is inherent to sport. Strategies employed by participants in sporting contests rely on forecasts, and the decision by promoters to promote, and by consumers to attend, such events are conditioned on forecasts: predictions of how interesting the event will be. First in this section we look at forecast competitions in sport, and following this we consider the role forecasts play in sporting outcomes.

Forecast competitions are common; see Section 2.12.7. Sport provides a range of forecast competitions, perhaps most notably the competition between bookmakers and their customers – betting. A bet is a contingent contract, a contract whose payout is conditional on specified future events occurring. Bets occur fundamentally because two agents disagree about the likelihood of that event occurring, and hence it is a forecast.

Bookmakers have been extensively analysed as forecasters; Forrest, Goddard, and Simmons (2005) evaluated biases in the forecasts implied by bookmaker odds over a period when the betting industry became more competitive, and found that, relative to expert forecasts, bookmaker forecasts improved.

With the internet age, prediction markets have emerged: financial exchanges where willing participants can buy and sell contingent contracts. In theory, such decentralised market structures ought to provide the most efficient prices and hence efficient forecasts (Nordhaus, 1987). A range of papers have tested this in the sporting context (Angelini & De Angelis, 2019; Croxson & Reade, 2014; Gil & Levitt, 2007), with conclusions tending towards a lack of efficiency.

Judgmental forecasts by experts are commonplace too (see also Section 2.11); traditionally in newspapers, but more recently on television and online. Reade, Singleton, and Brown (2020) evaluate forecasts of scorelines from two such experts against bookmaker prices, a statistical

152 This subsection was written by Jose M. Pavía.
153 This subsection was written by J. James Reade.

807
F. Petropoulos, D. Apiletti, V. Assimakopoulos et al. International Journal of Forecasting 38 (2022) 705–871

model, and the forecasts from users of an online forecasting competition. Singleton, Reade, and Brown (2019) find that when forecasters in the same competition revise their forecasts, their forecast performance worsens. This forecasting competition is also analysed by Butler, Butler, and Eakins (2020) and Reade et al. (2020).

Sport is a spectacle, and its commercial success is conditioned on this fact. Hundreds of millions of people globally watch events like the Olympics and the FIFA World Cup – but such interest is conditioned on anticipation, a forecast that something interesting will happen: a superstar is going to be performing, the match will be a close encounter, or it will matter a lot for a bigger outcome (the championship, say). These are the central tenets of sport economics, dating back to Neale (1964) and Rottenberg (1956), most fundamentally the 'uncertainty of outcome hypothesis'. A multitude of sport attendance prediction studies investigate this (see, for example, Coates & Humphreys, 2010; Forrest & Simmons, 2006; Hart, Hutton, & Sharot, 1975; Sacheti, Gregory-Smith, & Paton, 2014; van Ours, 2021), and Van Reeth (2019) considers this for forecasting TV audiences for the Tour de France.

Cities and countries bid to host large events like the World Cup based on forecasts regarding the impact of hosting such events, forecasts that are often inflated for political reasons (Baade & Matheson, 2016). Equally, franchise-based sports like many North American sports attract forecasts regarding the impact of a team locating in a city, usually resulting in public subsidies for the construction of venues for teams to play at (Coates & Humphreys, 1999). Governments invest in sporting development, primarily to achieve better performances at global events, most notably the Olympics (Bernard & Busse, 2004).

Many sporting events themselves rely on forecasts to function: high jumpers predict what height they will be able to jump over, and free-diving contestants must state the depth they will dive to. Less formally, teams will set themselves goals: to win matches, to win competitions, to avoid the 'wooden spoon'. Here, forecast outcomes are influenced by the teams and competitors taking part in competitions and, as such, are perhaps less commonly thought of as genuine forecasts. Important works predicting outcomes range from Dixon and Coles (1997) in soccer to Kovalchik and Reid (2019) for tennis, while the increasing abundance of data means that machine learning and deep learning methods are beginning to dominate the landscape; see, for example, Hubáček, Šourek, and Železnỳ (2019) and Maymin (2019) for basketball, and Mulholland and Jensen (2019) for the NFL.
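The scoreline models just cited build on a simple probabilistic core. As an illustration only, the sketch below computes match outcome probabilities from the independent-Poisson baseline that Dixon and Coles (1997) extend with a low-score dependence correction and time-varying team strengths; the scoring rates used here are invented:

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Probability of exactly k goals under a Poisson rate lam."""
    return lam ** k * exp(-lam) / factorial(k)

def scoreline_probs(home_rate, away_rate, max_goals=10):
    """Win/draw/loss probabilities from independent Poisson scorelines,
    truncated at max_goals per team (the tail mass is negligible here)."""
    home = draw = away = 0.0
    for i in range(max_goals + 1):
        for j in range(max_goals + 1):
            p = poisson_pmf(i, home_rate) * poisson_pmf(j, away_rate)
            if i > j:
                home += p
            elif i == j:
                draw += p
            else:
                away += p
    return home, draw, away

# Hypothetical attack/defence-adjusted scoring rates for one match
h, d, a = scoreline_probs(1.6, 1.1)
print(round(h, 3), round(d, 3), round(a, 3))  # win/draw/loss probabilities
```

Such model probabilities are directly comparable to the probabilities implied by bookmaker odds, which is how the evaluation studies above proceed.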
3.8.7. Forecasting for megaprojects154

154 This subsection was written by Konstantia Litsiou.

Megaprojects are significant activities characterised by a multi-organisation structure which produces highly visible infrastructure or assets with very crucial social impacts (Aaltonen, 2011). Megaprojects are complex, require huge capital investment, involve several stakeholders, and, usually, a vast number of communities and the public are the receivers of the project's benefits. There is a need for megaprojects, especially those that deliver social and economic goods and create economic growth (Flyvbjerg, Bruzelius, & Rothengatter, 2003). Typical features of megaprojects include some or all of the following: (i) delivering a substantial piece of physical infrastructure with a life expectancy that spans across decades, (ii) the main contractor or group of contractors are privately owned and financed, and (iii) the contractor could retain an ownership stake in the project, while the client is often a government or public sector organisation (Sanderson, 2012).

However, megaprojects are heavily laced with extreme human and technical complexities, making their delivery and implementation difficult and often unsuccessful (Merrow, McDonnell, & Arguden, 1988; The R.F.E. Working Group Report, 2015). This is largely due to the challenge of managing megaprojects, including extreme complexity, increased risk, tight budgets and deadlines, and lofty ideals (Fiori & Kovaka, 2005). Due to the possibility and consequences of megaproject failure (Mišić & Radujković, 2015), forecasting the outcomes of megaprojects is of growing importance. In particular, it is crucial to identify and assess the risks and uncertainties, as well as other factors that contribute to disappointing outcomes of megaprojects, in order to mitigate them (Flyvbjerg et al., 2003; Miller & Lessard, 2007).

The literature on forecasting in megaprojects is scarce. However, a few themes have emerged in the extant literature as characteristics of megaprojects that should be skilfully managed to provide a guideline for the successful planning and construction of megaprojects (Fiori & Kovaka, 2005; Flyvbjerg, 2007; Sanderson, 2012). Turner and Zolin (2012) even claim that we cannot properly define what success is. They argue that we need reliable scales in order to predict how multiple stakeholders will perceive success from multiple perspectives over multiple time frames — definitely a very difficult long-term problem. This could be done via a set of leading performance indicators that would enable managers of megaprojects to forecast, during project execution, how various stakeholders will perceive success months or even years into the operation. At the very early stages of a project's lifecycle, a number of decisions must be taken, and they are of great importance for the performance and successful deliverables/outcomes. Flyvbjerg (2007) stresses the importance of front-end considerations, particularly for megaprojects: failure to account for unforeseen events frequently leads to cost overruns.

Litsiou et al. (2019) suggest that forecasting the success of megaprojects is a particularly challenging and critical task due to the characteristics of such projects. Megaproject stakeholders typically implement impact assessment and/or cost-benefit analysis tools (Litsiou et al., 2019). As Makridakis, Hogarth, and Gaba (2010) suggested, judgmental forecasting is suitable where quantitative data are limited and the level of uncertainty is very high; elements that we find in megaprojects. By comparing the performance of three judgmental methods – unaided judgment, semi-structured analogies (sSA), and interaction groups (IG) – used by a group of 69 semi-experts, Litsiou et al. (2019) found that the use of sSA outperforms unaided judgment in forecasting performance (see also Section 2.11.4). The difference is amplified further when pooling of analogies through IG is introduced.
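The pooling of analogies that drives this result can be illustrated with a similarity-weighted average. This is a minimal sketch: the weighting scheme and all the numbers below are our own illustration, not the protocol used by Litsiou et al. (2019):

```python
def pooled_analogy_forecast(analogies):
    """Similarity-weighted average of outcomes from analogous projects.
    Each analogy is a pair (similarity score in (0, 1], observed outcome)."""
    total = sum(sim for sim, _ in analogies)
    return sum(sim * outcome for sim, outcome in analogies) / total

# Hypothetical analogies for a new megaproject: similarity to the target
# project and the observed cost overrun (fraction of budget) of each.
analogies = [(0.9, 0.45), (0.6, 0.80), (0.3, 0.10)]
print(round(pooled_analogy_forecast(analogies), 3))  # → 0.508
```

The point of structured pooling is that close analogies dominate the forecast while weak ones are down-weighted rather than discarded, which is one reason pooled approaches outperform unaided judgment.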
3.8.8. Competing products155

155 This subsection was written by Renato Guseo.

Competition among products or technologies affects prediction due to local systematic deviations and saturating effects related to policies, and evolving interactions. The corresponding sales time series must be jointly modelled, including the time-varying reciprocal influence. Following the guidelines in Section 2.3.20, some examples are reported below.

Based on the IMS-Health quarterly numbers of cimetidine and ranitidine packages sold in Italy, the CRCD model (Guseo & Mortarino, 2012) was tested to evaluate a diachronic competition that produced substitution. Cimetidine is a histamine antagonist that inhibits the production of stomach acid; it was introduced by Smith, Kline & French in 1976. Ranitidine is an alternative active principle introduced by Glaxo in 1981, and was found to have far-improved tolerability and a longer-lasting action. The main effect in delayed competition is that the first compound spread fast but was suddenly outperformed by the new principle, which modified its stand-alone regime. Guseo and Mortarino (2012) give some statistical and forecasting comparisons with the restricted Krishnan–Bass–Kummar Diachronic model (KBKD) by Krishnan et al. (2000). Previous results were improved with the UCRCD model in Guseo and Mortarino (2014) by considering a decomposition of word-of-mouth (WOM) effects into two parts: within-brand and cross-brand contributions. The new active compound exploited a large cross-brand WOM and a positive within-brand effect. After the start of competition, cimetidine experienced a negative WOM effect from its own adopters and benefited from the increase of the category's market potential driven by the antagonist. Forecasting is more realistic with the UCRCD approach, and it avoids mistakes in long-term prediction.

Restricted and unrestricted UCRCD models were applied in Germany by Guidolin and Guseo (2016) to the competition between nuclear power technologies and renewable energy technologies (wind and solar; see also Sections 3.4.5, 3.4.6 and 3.4.8) in electricity production. Due to the 'Energiewende' policy started around 2000, the substitution effect, induced by competition, is confirmed by the electricity production data provided by BP.156

156 https://www.bp.com/en/global/corporate/energy-economics/statistical-review-of-world-energy.html (Accessed: 2020-09-01).

An advance is proposed in Furlan, Mortarino, and Zahangir (2020) with three competitors (nuclear power, wind, and solar technologies) and exogenous control functions, obtaining direct inferences that provide a deeper analysis and forecasting improvements in the energy transition context.

The previously mentioned intersections between the Lotka–Volterra approach and diffusion-of-innovations competition models suggested a more modulated access to the residual carrying capacity. The Lotka–Volterra with churn model (LVch) by Guidolin and Guseo (2015) represents 'churn effects', preserving within- and cross-brand effects in a synchronic context.

An application of the LVch model is discussed with reference to the competition/substitution between compact cassettes and compact discs for pre-recorded music in the US market. The results obtained with LVch outperform restricted and unrestricted UCRCD analyses. In this context the residual market is not perfectly accessible to both competitors, and this fact, combined with WOM components, allows for better interpretation and forecasting, especially over medium- and long-term horizons.

A further application of the LVch model, the Lotka–Volterra with asymmetric churn (LVac), is proposed in Guidolin and Guseo (2020). It is based on a statistical reduction: the late entrant behaves as a standard Bass (1969) model that modifies the dynamics and the evolution of the first entrant in a partially overlapped market. The case study is offered by a special form of competition where the iPhone produced an inverse cannibalisation of the iPad. The former suffered a local negative interaction with some benefits: a long-lasting life cycle and a larger market size induced by the iPad.

One limitation of models for diachronic competition relates to a high number of rivals, implying complex parametric representations with respect to the observed information. A second limitation, but also an opportunity, is the conditional nature of forecasting if the processes partially depend upon exogenous control functions (new policy regulations, new radical innovations, regular and promotional prices, etc.). These tools may be used to simulate the effect of strategic interventions, but a lack of knowledge of such future policies may affect prediction.

3.8.9. Forecasting under data integrity attacks157

157 This subsection was written by Priyanga Dilini Talagala.

Data integrity attacks, where unauthorized parties access protected or confidential data and inject false information using various attack templates such as ramping, scaling, random attacks, pulse, and smooth-curve, have become a major concern in data integrity control in forecasting (Giani, Bitar, Garcia, McQueen, Khargonekar, & Poolla, 2013; Singer & Friedman, 2014; Sridhar & Govindarasu, 2014; Yue, 2017).

Several previous studies have given attention to the anomaly detection pre-processing step in the forecasting workflow, with varying degrees of emphasis. However, according to Yue (2017), the detection of data integrity attacks is very challenging, as such attacks are carried out by highly skilled adversaries in a coordinated manner, without notable variations in the historical data patterns (Liang, He, & Chen, 2019). These attacks can cause over-forecasts that demand unnecessary expenses for upgrades and maintenance, and can eventually lead to poor planning and business decisions (Luo et al., 2018a; Luo, Hong, & Fang, 2018b; Wu, Yu, Cui, & Lu, 2020).

Short-term load forecasting (see Section 3.4.3) is one major field that is vulnerable to malicious data integrity attacks, as many power industry functions, such as economic dispatch, unit commitment, and automatic generation control, heavily depend on accurate load forecasts (Liang et al., 2019). The cyberattack on the U.S. power grid in 2018 is one major incident related to this topic. According to the study conducted by Luo et al. (2018a), the widely used load forecasting models fail to
produce reliable load forecasts in the presence of such large malicious data integrity attacks. A submission to the Global Energy Forecasting Competition 2014 (GEFCom2014) incorporated an anomaly detection pre-processing step with a fixed anomalous threshold in its load forecasting framework (Xie & Hong, 2016). The method was later improved by Luo, Hong, and Yue (2018) by replacing the fixed threshold with a data-driven anomalous threshold. Sridhar and Govindarasu (2014) also proposed a general framework to detect scaling and ramp attacks in power systems. Akouemo and Povinelli (2016) investigated the impact on gas load forecasting using a hybrid approach based on a Bayesian maximum likelihood classifier and a forecasting model. In contrast to the previous model-based attempts, Yue, Hong, and Wang (2019) proposed a descriptive analytics-based approach to detect cyberattacks, including long anomalous sub-sequences (see Section 2.2.3) that are difficult to detect by conventional anomaly detection methods.

The problem of data integrity attacks is not limited to load forecasting. Forecasting fields such as election forecasting (see Section 3.8.5), retail forecasting (see Section 3.2.4), airline flight demand forecasting (see Section 3.8.2), and stock price forecasting (see Section 3.3.13) are also vulnerable to data integrity attacks (Luo et al., 2018a; Seaman, 2018). For instance, Wu et al. (2020) explored the vulnerability of traffic modelling and forecasting in the presence of data integrity attacks, with the aim of providing useful guidance for constrained network resource planning and scheduling.

However, despite the increasing attention toward the topic, advancements in cyberattacks on critical infrastructure raise further data challenges. Fooling existing anomaly detection algorithms via novel cyberattack templates is one major concern. In response to this concern, Liang et al. (2019) proposed a data poisoning algorithm that can fool existing load forecasting approaches that include an anomaly detection component, while demanding further investigation into advanced anomaly detection methods. Further, adversaries can also manipulate other related input data without damaging the target data series. Therefore, further research similar to Sobhani et al. (2020) is required to handle such data challenges.
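The attack templates named in this subsection are straightforward to emulate, which is how detection studies build their test cases. The sketch below injects a ramping attack into a synthetic load series and flags it with a simple data-driven threshold (a trailing-window robust z-score); this is our own toy illustration, far cruder than the detection methods cited above:

```python
import math

def ramp_attack(series, start, length, slope):
    """Inject a ramping data-integrity attack: values drift upward by
    `slope` per step over `length` points beginning at `start`."""
    out = list(series)
    for i in range(length):
        out[start + i] += slope * (i + 1)
    return out

def flag_anomalies(series, window=48, z=4.0):
    """Flag points deviating from the trailing-window median by more
    than `z` robust standard deviations (MAD-based, data-driven)."""
    flagged = []
    for t in range(window, len(series)):
        hist = sorted(series[t - window:t])
        med = hist[window // 2]
        mad = sorted(abs(x - med) for x in hist)[window // 2]
        scale = 1.4826 * mad or 1e-9  # MAD -> std. dev. under normality
        if abs(series[t] - med) / scale > z:
            flagged.append(t)
    return flagged

# Synthetic load with a daily cycle, then a ramping attack on points 120-139
clean = [100 + 10 * math.sin(2 * math.pi * t / 24) for t in range(200)]
attacked = ramp_attack(clean, 120, 20, 5.0)
print(flag_anomalies(attacked)[:3])  # earliest flags fall inside the attack window
```

Note the weakness the literature emphasises: a slow ramp contaminates the detector's own reference window, so the earliest attacked points pass unflagged, which is exactly why fixed or naive thresholds are insufficient against skilled adversaries.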
3.8.10. The forecastability of agricultural time series158

158 This subsection was written by Dimitrios Thomakos.

The forecasting of agricultural time series falls under the broader group of forecasting commodities, of which agricultural and related products are a critical subset. While there has been considerable work in the econometrics and forecasting literature on common factor models in general, there is surprisingly little work so far on the application of such models to commodities and agricultural time series, and this despite the considerable literature on the linkage between energy and commodities, including agricultural products, their prices and futures prices, their returns and volatilities. Furthermore, a significant number of papers are fairly recent, which indicates that there are many open avenues of future research on these topics, in particular for applied forecasting. The literature on the latter connection can consider many different aspects in modelling, as we illustrate below. We can identify two literature strands: a much larger one on the various connections of energy with commodities and the agricultural sector (and in this strand we include forecasting agricultural series), and a smaller one that explores the issue of common factors.

An early reference on the impact of energy on the agricultural sector is Tewari (1990), and then after a decade we find Cohin and Chantret (2010) on the long-run impact of energy prices on global agricultural markets. Byrne, Fazio, and Fiess (2013) is an early reference for co-movement of commodity prices, followed by Daskalaki, Kostakis, and Skiadopoulos (2014) on common factors of commodity futures returns, and then a very recent paper from Alquist, Bhattarai, and Coibion (2020), who link global economic activity with commodity price co-movement. The impact of energy shocks on US agricultural productivity was investigated by Wang and McPhail (2014), while Koirala, Mishra, D'Antoni, and Mehlhorn (2015) explore the non-linear correlations of energy and agricultural prices, with Albulescu, Tiwari, and Ji (2020) exploring the latter issue further, the last two papers using copulas. Xiong, Li, Bao, Hu, and Zhang (2015) is an early reference on forecasting agricultural commodity prices, while Kyriazi et al. (2019), Li, Li, Liu, Zhu and Wei (2020), and Wang, Wang, Li, and Zhou (2019) consider three novel and completely different approaches to forecasting agricultural prices and agricultural futures returns. López Cabrera and Schulz (2016) explore volatility linkages between energy and agricultural commodity prices, and then Tian, Yang, and Chen (2017) start a mini-stream on volatility forecasting for agricultural series, followed among others by the work of Luo, Klein, Ji, and Hou (2019) and of Degiannakis, Filis, Klein, and Walther (2020). de Nicola, De Pace, and Hernandez (2016) examine the co-movement of energy and agricultural returns, while Kagraoka (2016) and Lübbers and Posch (2016) examine common factors in commodity prices. Pal and Mitra (2019) and Wei Su, Wang, Tao, and Oana-Ramona (2019) both investigate the linkages of crude oil and agricultural prices. Finally, Tiwari, Nasreen, Shahbaz, and Hammoudeh (2020) examine the time-frequency causality between various commodities, including agricultural commodities and metals.

There is clearly room for a number of applications in the context of this recent research, such as along the lines of further identifying and then using common factors in constructing forecasting models, or exploring the impact of the COVID-19 crisis on agricultural production or that of climate change on agricultural prices.

3.8.11. Forecasting in the food and beverage industry159

159 This subsection was written by Daniele Apiletti.

Reducing the ecological impact and waste, and increasing the efficiency of the food and beverage industry, are currently major worldwide issues. In this direction, the efficient and sustainable management of perishable food and the control of beverage quality are of paramount importance. A particular focus on this topic is placed on supply
chain forecasting (see Section 3.2.2), with advanced monitoring technologies able to track the events impacting and affecting the food and beverage processes (La Scalia, Micale, Miglietta, & Toma, 2019). Such technologies are typically deployed inside manufacturing plants, yielding Industry 4.0 solutions (Ojo, Shah, Coutroubis, Jiménez, & Ocana, 2018) that are enabled by state-of-the-art forecasting applications in smart factories. The transition from plain agriculture techniques to smart solutions for food processing is a trend that fosters emerging data-driven forecasting solutions in many parts of the world, with special attention to the sustainability aspects (Zailani, Jeyaraman, Vengadasan, & Premkumar, 2012).

Various forecasting approaches have been successfully applied in the context of the food and beverage industry, from Monte Carlo simulations based on a shelf-life model (La Scalia et al., 2019), to association rule mining (see Section 2.9.2) applied to sensor-based equipment monitoring measurements (Apiletti & Pastor, 2020), multi-objective mathematical models for perishable supply chain configurations, forecasting costs, delivery time, and emissions (Wang, Nhieu, Chung, & Pham, 2021), and intelligent agent technologies for network optimisation in food and beverage logistics management (Mangina & Vlachos, 2005).

We now focus on the case of forecasting the quality of beverages, and particularly coffee. Espresso coffee is among the most popular beverages, and its quality is one of the most discussed and investigated issues. Besides human-expert panels, electronic noses, and chemical techniques, forecasting the quality of espresso by means of data-driven approaches, such as association rule mining, is an emerging research topic (Apiletti & Pastor, 2020; Apiletti, Pastor, Callà, & Baralis, 2020; Kittichotsatsawat, Jangkrajarng, & Tippayawong, 2021).

The forecasting model of espresso quality is built from a real-world dataset of espresso brewing by professional coffee-making machines. Coffee ground size, coffee ground amount, and water pressure have been selected among the most influential external variables. The ground-truth quality evaluation has been performed for each shot of coffee based on three well-known quality variables selected by domain experts and measured by specific sensors: the extraction time, the average flow rate, and the espresso volume. An exhaustive set of more than a thousand coffees has been produced to train a model able to forecast the effect of non-optimal values on the espresso quality.

For each variable considered, different categorical values are considered: ground size can be coarse, optimal, or fine; ground amount can be high, optimal, or low; brewing water pressure can be high, optimal, or low. The experimental setting of categorical variables enables the application of association rule mining (see Section 2.9.2), a powerful data-driven, exhaustive, and explainable approach (Han et al., 2011; Tan et al., 2005), successfully exploited in different application contexts (Acquaviva et al., 2015; Di Corso et al., 2018).

Several interesting findings emerged. If the water pressure is low, the amount of coffee ground is too high, and the grinding is fine, then we can forecast with confidence a low-quality coffee due to excessive percolation time. If the amount of coffee ground is low, the ground is coarse, and the pressure is high, then we can forecast a low-quality coffee due to excessive flow rate. Furthermore, the coarseness of the coffee ground generates an excessive flow-rate forecast, despite optimal values of dosage and pressure, with very high confidence.

3.8.12. Dealing with logistic forecasts in practice160

160 This subsection was written by Theodore Modis.

The forecaster faces three major difficulties when using the logistic equation (S curve); see also Section 2.3.19. A first dilemma is whether he or she should fit an S curve to the cumulative number or to the number per unit of time. Here the forecaster must exercise wise judgment. What is the "species" and what is the niche that is being filled? To the frustration of business people, there is no universal answer. When forecasting the sales of a new product it is often clear that one should fit the cumulative sales, because the product's market niche is expected to eventually fill up. But if we are dealing with something that is going to stay with us for a long time (for example, the Internet or a smoking habit), then one should not fit cumulative numbers. At times this distinction may not be so obvious. For example, when COVID-19 first appeared many people (often amateurs) began fitting S curves to the cumulative number of infections (for other attempts on forecasting COVID-19, see Section 3.6.2). Some of them were rewarded because indeed the diffusion of the virus in some countries behaved accordingly (Debecker & Modis, 1994). But many were frustrated and tried to "fix" the logistic equation by introducing more parameters, or simply gave up on trying to use logistics with COVID-19. And yet, many cases (e.g., the US) can be illuminated by logistic fits, but on the daily number of infections, not on the cumulative number. As of August 1, 2020, leaving out the three eastern states that had gotten things under control, the rest of the US displayed two classic S-curve steps followed by plateaus (see Fig. 18). The two plateaus reflect the number of infections that American society was willing to tolerate at the time, as the price to pay for not applying measures to restrict the virus diffusion.

Fig. 18. Two logistic-growth steps during the early diffusion of COVID-19 in America (March to July, 2020).

The second difficulty in using the logistic equation has to do with its ability to predict the final ceiling from relatively early measurements. The crucial question is how early the final ceiling can be determined, and with what accuracy. Some people claim that before the midpoint no determination of a final level is trustworthy (Marinakis & Walsh, 2021). Forecasters usually abstain from assigning quantitative uncertainties to the parameters of their S-curve forecasts, mostly because there is no theory behind it. However, there is a unique study by Debecker and Modis (2021) that quantifies the uncertainties on the parameters determined by logistic fits. The study was based on 35,000 S-curve fits on simulated data, smeared by random noise and covering a variety of conditions. The fits were carried out via a χ² minimisation technique. The
study produced lookup tables and graphs for determining the uncertainties expected on the three parameters of the logistic equation as a function of the range of the S curve populated by data, the error per data point, and the confidence level required.

The third difficulty in using the logistic equation comes from the fact that, no matter what fitting program one uses, the fitted S curve will flatten toward a ceiling as early and as low as it is allowed by the constraints of the procedure. As a consequence, fitting programs may yield logistic fits that are often biased toward a low ceiling. Bigger errors on the data points accentuate this bias by permitting larger margins for the determination of the S-curve parameters. To compensate for this bias the user must explore several fits with different weights on the data points during the calculation of the χ². He or she should then favour the answer that gives the highest ceiling for the S curve (most often obtained by weighting more heavily the recent historical data points). Of course, this must be done with good justification; here again the forecaster must exercise wise judgment.
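The midpoint caveat is easy to reproduce numerically: fit the same noisy S curve once on its full history and once on data stopping at the midpoint, and compare the estimated ceilings. The sketch below is entirely synthetic, with a crude grid search standing in for a proper χ² minimisation; the full-history fit recovers the true ceiling of 1000, while the pre-midpoint fit is far less constrained by the data:

```python
import math, random

def logistic(t, ceiling, k, t0):
    """Three-parameter logistic (S curve)."""
    return ceiling / (1.0 + math.exp(-k * (t - t0)))

def fit_logistic(ts, ys, ceilings, ks, t0s):
    """Crude least-squares fit by grid search over candidate parameters
    (a stand-in for the chi-square minimisation used in the studies above)."""
    best, best_err = None, float("inf")
    for M in ceilings:
        for k in ks:
            for t0 in t0s:
                err = sum((y - logistic(t, M, k, t0)) ** 2
                          for t, y in zip(ts, ys))
                if err < best_err:
                    best, best_err = (M, k, t0), err
    return best

random.seed(1)
ts = list(range(80))
ys = [logistic(t, 1000.0, 0.25, 40.0) + random.gauss(0, 10) for t in ts]

grid_M = [800 + 20 * i for i in range(21)]     # candidate ceilings 800..1200
grid_k = [0.15 + 0.02 * i for i in range(11)]  # growth rates 0.15..0.35
grid_t0 = [30 + 2 * i for i in range(11)]      # midpoints 30..50

M_full, _, _ = fit_logistic(ts, ys, grid_M, grid_k, grid_t0)
M_half, _, _ = fit_logistic(ts[:40], ys[:40], grid_M, grid_k, grid_t0)
print(M_full, M_half)  # ceiling from the full curve vs. pre-midpoint data only
```

Re-running the pre-midpoint fit with different noise seeds, or with heavier weights on recent points as suggested above, shows how wide the plausible range of ceilings is before the inflection point has been observed.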
3.9. The future of forecasting practice161

161 This subsection was written by Len Tashman.

Plus ça change, plus c'est la même chose.

Jean-Baptiste Karr (1849)

It would be a more straightforward task to make predictions about the future of forecasting practice if we had a better grasp of the present state of forecasting practice. For that matter, we lack even a common definition of forecasting practice. In a recent article, Makridakis, Bonneli et al. (2020) lamented the failure of truly notable advances in forecasting methodologies, systems, and processes during the past decades to convince many businesses to adopt systematic forecasting procedures, leaving a wide swath of commerce under the guidance of ad hoc judgment and intuition. At the other extreme, we see companies with implementations that combine state-of-the-art methodology with sophisticated accommodations of computing time and costs, as well as consideration of the requirements and capabilities of a diverse group of stakeholders (Yelland, Baz, & Serafini, 2019). So, it is not hyperbole to state that business forecasting practices are all over the place. What surely is hyperbole, however, are the ubiquitous claims of software providers about their products accurately forecasting sales, reducing costs, integrating functions, and elevating the bottom line (Makridakis, Bonneli et al., 2020; Sorensen, 2020).

For this section, we grilled a dozen practitioners and thought leaders ("the Group") about developments playing out in the next decade of forecasting practice, and have categorised their responses:

• Nature of forecasting challenges;
• Changes in the forecasting toolbox;
• Evolution in forecasting processes, such as integration of planning functions;
• Expectations of forecasters; and
• Scepticism about real change.

Forecasting Challenges: Focusing on operations, the Group sees demand forecasting becoming ever more difficult due to product/channel proliferation, shorter lead times, shorter product histories, and spikes in major disruptions.

• Operational forecasts will have shorter forecast horizons to increase the strategic agility required by business to compete, sustain, and survive.
• New models will need to incorporate supply-chain disruption. Demand chains will need to be restarted, shortening historical data sets and making traditional models less viable due to limited history.
• Lead times will decrease as companies see the problems in having distant suppliers. Longer lead times make accurate forecasting more difficult.

Forecasting Tool Box: Unsurprisingly, this category received most of the Group's attention. All predict greater reliance on AI/ML for automating supply-and-demand planning tasks and for reconciling discrepancies in hierarchical forecasting. Longer-horizon causal forecasting models will be facilitated by big data, social media, and algorithmic improvements by quantum computing. Post-COVID, we will see a greater focus on risk management/mitigation. The Cloud will end the era of desktop solutions.

• Quantum computers will improve algorithms used in areas like financial forecasting (e.g., Monte Carlo simulations), and will change our thinking about forecasting and uncertainty.
• Although social media is a tool for ‘‘what’s trending now’’, new models will be developed to use social-media data to predict longer-term behaviour. Step aside Brown (exponential smoothing) and Bass (diffusion).
• Greater automation of routine tasks (data loading, scrubbing, forecast generation and tuning, etc.) through AI/ML-powered workflows, configurable limits, and active alerts. More black box under the hood, but more clarity on the dashboard.
• Greater focus on risk management/mitigation through what-if scenarios, simulations, and probabilistic forecasting.

Forecasting Processes and Functional Integration: Systems will become more integrated, promoting greater collaboration across functional areas and coordination between forecast teams and those who rely upon them. Achieving supply-chain resilience will become as important as production efficiency, and new technology such as Alert and Root Cause Analysis systems will mitigate disruptions.

• S&OP will expand from its home in operations to more fully integrate with other functions such as finance and performance management, especially in larger multinationals.
• The pandemic has forced firms to consider upping supply-chain resilience. Firms are building capacity, inventory, and redundancy into operations—somewhat antithetical to the efficiency plays that forecasting brings to the table.
• Forecasting will be more closely tied to Alert and Root Cause Analysis systems, which identify breakdowns in processes/systems contributing to adverse events, and prevent their recurrence.

Expectations of Forecasters: Agreement was universal that the forecaster’s job description will broaden and become more demanding, but that technology will allow some redirection of effort from producing forecasts to communicating forecasting insights.

• The interest around disease models increases our awareness of the strengths and weaknesses of mathematical models. Forecasters may need to become more measured in their claims, or do more to resist their models being exploited.
• We will see a transformation from demand planner to demand analyst, requiring additional skill sets including advanced decision making, data and risk analysis, communication, and negotiation.
• Professional forecasters will be rare except in companies where this expertise is valued. Fewer students are now educated in or interested in statistical modelling, and time is not generally available for training.
• Forecasters will learn the same lesson as optimisation folks in the 1990s and 2000s: the importance of understanding the application area—community intelligence.

Scepticism: Many were sceptical about the current enthusiasm for AI/ML methods; disappointed about the slow adoption of promising new methods into software systems and, in turn, by companies that use these systems; and pessimistic about the respect given to and influence of forecasters in the company’s decision making.

• While AI/ML are important additions to the forecaster’s toolbox, they will not automatically solve forecasting issues. Problems include data hunger, capacity brittleness, dubious input data, fickle trust by users (Kolassa, 2020c), and model bias.
• Practices in the next decade will look very similar to the present. Not that much has changed in the last decade, and academic developments are slow to be translated into practice.
• Politics, gaming, and the low priority given to forecasting are the prime drivers of practice, thus limiting interest in adopting new methodologies.
• None of the topical items (AI/ML, big data, demand sensing, new forecasting applications) will have much of an impact on forecasting practice. Forecasting departments hop from one trend to another without making much progress towards better forecast accuracy.
• Software companies will struggle, despite good offerings. Most companies do not want to invest in excellent forecasting engines; whatever came with their ERP system is ‘‘good enough’’.
• Forecasting will continue to suffer from neglect by higher levels of management, particularly when forecasts are inconveniently contrary to the messages management hopes to convey.

Note finally that the COVID-19 pandemic has elevated practitioner concerns about disruptions to normal patterns, as well as the fear of an increasingly volatile environment in which forecasts must be made. There are indications that companies will place more stress on judgmental scenarios, likely in conjunction with statistical/ML methods.
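The ‘‘reconciling discrepancies in hierarchical forecasting’’ mentioned under the Forecasting Tool Box can be made concrete with a minimal sketch. The two-level hierarchy and the base forecasts below are hypothetical, and the method shown is plain OLS reconciliation (the simplest member of the trace-minimisation family), not any particular software vendor’s implementation.

```python
import numpy as np

# Two-level hierarchy: Total = A + B.
# Summing matrix S maps the bottom-level series (A, B) to every
# node in the hierarchy, ordered [Total, A, B].
S = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])

# Base forecasts produced independently at each level (hypothetical
# numbers); they are incoherent: 100 != 55 + 40.
y_hat = np.array([100.0, 55.0, 40.0])

# OLS reconciliation: least-squares projection of the base forecasts
# onto the subspace of forecasts satisfying the aggregation constraint.
P = np.linalg.inv(S.T @ S) @ S.T   # all-levels forecasts -> bottom level
y_tilde = S @ (P @ y_hat)          # coherent forecasts at every level

print(np.round(y_tilde, 2))        # -> [98.33 56.67 41.67]
assert np.isclose(y_tilde[0], y_tilde[1] + y_tilde[2])
```

Each base forecast is adjusted a little, and the reconciled total now equals the sum of the reconciled children by construction.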
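Likewise, the predicted shift toward probabilistic forecasting through simulation can be sketched in a few lines. The demand history, smoothing constant, and interval level below are hypothetical; the model is Brown’s simple exponential smoothing with Gaussian one-step errors, used only to illustrate how Monte Carlo paths turn a point forecast into an interval.

```python
import random
import statistics

def ses_probabilistic_forecast(history, alpha=0.3, horizon=6,
                               n_paths=2000, seed=42):
    """Fit simple exponential smoothing, then Monte Carlo-simulate
    future paths so the output is a distribution, not a point."""
    # Fit: run the smoother through the history, collecting 1-step errors.
    level, errors = history[0], []
    for y in history[1:]:
        errors.append(y - level)
        level += alpha * (y - level)
    sigma = statistics.stdev(errors)

    # Simulate: propagate the same recursion forward with Gaussian shocks.
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        lvl, path = level, []
        for _ in range(horizon):
            e = rng.gauss(0.0, sigma)
            path.append(lvl + e)
            lvl += alpha * e        # each shock also moves the level
        paths.append(path)
    return paths

demand = [102, 98, 105, 110, 104, 108, 115, 111, 117, 120]  # hypothetical
paths = ses_probabilistic_forecast(demand)
step1 = sorted(p[0] for p in paths)
lo, hi = step1[200], step1[1799]    # roughly an 80% interval (2000 paths)
print(f"1-step-ahead ~80% interval: [{lo:.1f}, {hi:.1f}]")
```

Reading off empirical quantiles of the simulated paths is exactly the kind of what-if/scenario output practitioners cite above: the same machinery yields any interval level, and scenario assumptions can be injected as shocks.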