
Expert Systems With Applications 200 (2022) 116936


On forecasting non-renewable energy production with uncertainty quantification: A case study of the Italian energy market

Sergio Flesca a, Francesco Scala a, Eugenio Vocaturo a,c,∗, Francesco Zumpano b

a DIMES - Università della Calabria, Rende, Italy
b Infopower Research, Rende, Italy
c CNR-Nanotec, Rende, Italy

ARTICLE INFO

Keywords:
Energy production forecasting
Machine learning applications

ABSTRACT

Nowadays, the introduction of energy marketplaces in several countries has pushed the development of machine learning approaches for devising effective predictions of both energy needs and energy production. In this paper we address the problem of predicting the amount of electrical power produced using non-renewable sources, as getting an estimate of the amount of electrical power produced using the various kinds of non-renewable sources yields a significant competitive advantage for energy market investors. Specifically, we devise a forecasting technique, obtained by trying and combining various machine learning techniques, which is able to provide energy production estimates with a remarkably low error. Finally, since the input data available for predictions are in general not sufficient to determine the amounts of produced energy for the various source types, we provide an estimate of the impact of unknown latent variables on the amounts of produced energy, by devising a prediction model capable of estimating the prediction error for the specific data at hand. This information can be exploited by investors to get an idea of the risk levels of their investments.

1. Introduction

Population expansion, social advances, and technical breakthroughs have all contributed to the increased demand for energy and commodities in recent decades. The demographic increase in specific areas of the world, combined with increased production activities, implies an ever greater consumption of energy. In light of this growing demand, an adequate amount of energy is necessary to meet national demands while also protecting the natural environment. Utility firms are responsible for supporting better plans and maintaining a database of energy use in order to continually enhance their services. In this situation, energy analysis has developed as a key field of research in recent years due to its considerable influence on a country’s socioeconomic growth. Consumer segmentation, profile characterization, demand pattern analysis, and prediction from real-world data have all been studied in depth.

The realization of forecast models for the energy market needs to take into account the close interconnection between the EU Member States (in this study, therefore, between Italy and its neighboring countries), achieved through the common trans-European network, which is unique in the world and is one of the tools on which Brussels counts to contain energy costs for consumers. In fact, the regulation called ‘‘Regulation of the European Parliament and of the Council on the Internal Market for Electricity’’, denoted as the regulation of the electricity market hereinafter, establishes that the transmission system operators and the electricity market operators jointly organize the management of the ‘‘day-ahead’’ and ‘‘intraday’’ markets in a manner that provides that national markets are integrated using ‘‘market coupling’’. The electricity market regulation provides that operators must cooperate at the level of the Union or, where more appropriate, on a regional basis, in order to maximize the efficiency and effectiveness of trade in electricity. Furthermore, the electricity market regulation provides that operators must supply products that can be traded on the ‘‘day-ahead’’ and ‘‘intraday’’ markets that are sufficiently small (for example, consisting of 1 Megawatt offers) to allow effective trading by all market players (regardless of their size). It should be noted that the EU plans to introduce (by 1 January 2025) an additional market known as the ‘‘imbalance’’ market, characterized by 15-minute slots for unbalancing negotiations.

A mandatory aspect of the energy market is linked to the fact that the sale or purchase of energy in the various markets is characterized by different costs and earnings: an adequate predictive

∗ Corresponding author at: DIMES - Università della Calabria, Rende, Italy.


E-mail addresses: sergio.flesca@unical.it (S. Flesca), francesco.scala@unical.it (F. Scala), eugenio.vocaturo@unical.it, eugenio.vocaturo@cnr.it (E. Vocaturo),
f.zumpano@infopowerresearch.it, l.granata@infopowerresearch.it (F. Zumpano).

https://doi.org/10.1016/j.eswa.2022.116936
Received 22 October 2021; Received in revised form 6 March 2022; Accepted 17 March 2022
Available online 26 March 2022
0957-4174/© 2022 Elsevier Ltd. All rights reserved.

scenario becomes necessary to allow operators to move effectively inside the various markets.

In this complex scenario, the present work aims to design forecast models for the Italian electricity market, focusing specifically on the prediction of production from non-renewable sources. This specific area appears unexplored in the literature, while reliable models already exist for the prediction of renewable production (Wang & Wang, 2016). In fact, with the integration of markets, the availability of reliable forecasts of the fundamentals of the integrated electricity market opens up new revenue opportunities by reducing the risk (or, in any case, providing a measure of the risk) of operating in the integrated electricity market. The creation of forecast models for the electricity market requires analyzing the correlations between data sources, assessing their statistical importance, analyzing clusters, and performing data mining activities for each data source.
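As an illustration of the correlation-analysis step just mentioned, the sketch below computes pairwise correlations between synthetic stand-ins for the data sources; all series and variable names here are hypothetical placeholders, not the paper’s actual data:

```python
import numpy as np

# Illustrative sketch of cross-source correlation analysis.
# The three series stand in for distinct data sources (names are hypothetical).
rng = np.random.default_rng(0)
n = 24 * 30                                   # one month of hourly samples
wind_prod = rng.gamma(2.0, 500.0, n)          # renewable-production proxy
gas_price = 25 + rng.normal(0, 2, n)          # raw-material-price proxy
# Assume non-renewable production reacts negatively to renewable production:
fossil_gas_prod = 8000 - 0.5 * wind_prod + rng.normal(0, 300, n)

# Pairwise Pearson correlation matrix between candidate input sources
corr = np.corrcoef([wind_prod, gas_price, fossil_gas_prod])
print(np.round(corr, 2))
```

On such synthetic data, the wind/fossil-gas entry comes out clearly negative, which is the kind of dependency the dataset analysis in the later sections exploits.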

1.1. Main contributions

The main contributions of this work are the following:

1. We defined a forecasting model that, using mostly publicly available information sources, allows predicting the amount of non-renewable energy produced in Italy with a very fine granularity (day, hour). This prediction model has been devised by both performing an analysis of the available information sources and experimentally investigating the possibility of using state-of-the-art prediction models for this specific task. The experiments allow us to assess that Gradient Boosted Regression Trees are the best model for this task.
2. We investigate experimentally the possibility of identifying ‘‘hard’’ prediction cases. Knowing which cases cause the prediction model to perform badly is of extremely high importance for the decision maker. Quite surprisingly, the experimental validation suggests that the only technique that exhibits a good level of performance is KNN prediction.

2. Energy production forecasting in a nutshell

In this work we propose a framework for predicting the quantity of electricity produced from non-renewable energy sources within the Italian national territory. Many models for achieving effective estimates of the production of renewable energy have been proposed in the literature (e.g., see Ceci et al. (2015), Ferruzzi et al. (2016), Xie et al. (2015) and Basurto et al. (2019)), while to the best of our knowledge there is no model for predicting the production amounts obtained from the various non-renewable energy sources. Although the two problems may appear similar at first glance, there are many differences between them: energy production from renewable sources mainly depends on ‘‘natural events’’, while energy production from non-renewable sources depends on different factors such as the amount of renewable energy generated, the prices of raw materials and many others.

The proposed energy production forecasting architecture is composed of three main modules: the dataset builder module, the learning module and the online predictor module. The dataset builder module extracts data from various sources available on the web, specifically from quandl (carbon prices), entsoe (actual total load, aggregated generation per type, cross-border physical flow) and macrotrends (oil and gas prices), as well as from the output of a prediction model made available by Infopower Research. All the extracted data are stored in the internal dataset.

The learning module uses the internal dataset to yield a prediction model for the quantity of electricity produced from non-renewable energy sources and an error estimation model for the devised predictor.

Fig. 1. Electricity production forecasting architecture.

Finally, the prediction model and the error estimation module are applied to the current data to yield predictions and error estimates. Current data consists of a tuple of values that will be described in detail in the next sections (see Fig. 1).

3. Preliminaries

In this paper we applied various machine learning techniques to provide effective estimates of the electricity production amounts for the various non-renewable sources. Among the tested techniques, some were found able to yield effective estimates. In the following we report a brief description of each tested technique.

• The GBRT (Gradient Boosted Regression Tree) (Friedman, 2002) is a machine learning technique based on (regression) tree averaging. Specifically, GBRT sequentially adds small trees (whose maximum depth is limited by a parameter 𝑑), each with high bias. In each step, a new tree is added that focuses on the samples responsible for the current remaining regression error. That is, in each iteration a new tree 𝑡(⋅) is added to the current predictor, chosen in order to minimize ℒ(𝑇 + 𝑡), where 𝑇 is the current predictor and ℒ is the adopted loss function. Intuitively, GBRT performs gradient descent in the instance space. In the case where ℒ is the squared loss, the gradient for a sample 𝑥𝑖 becomes the residual from the previous iteration, i.e. 𝑟𝑖 = 𝑦𝑖 − 𝑇(𝑥𝑖), where 𝑦𝑖 is the output value for sample 𝑥𝑖. GBRT mainly depends on three parameters: the learning rate 𝛼 > 0, the maximum tree depth 𝑑, and the number of iterations (number of estimators), plus other algorithmic parameters such as the split criterion, the loss function, and other parameters related to individual tree construction;
• The ETR (Extra Tree Regressor) (Geurts et al., 2006) is an ensemble meta-estimator that trains a number of randomized decision trees (also called extra-trees) on the training set and uses the average of their predictions to improve predictive accuracy and control overfitting. Specifically, each single tree in the ensemble is grown on the whole training sample set and, for each split, a subset of the features is randomly chosen. For each of the considered features a threshold is randomly drawn and a split minimizing the prediction error is made for it. A node is split as


long as the remaining number of training samples associated with the node is higher than the minimum sample size for splitting a node. After all the trees are (independently) grown, the final ETR prediction is the arithmetic average of the single predictions of each tree;
• The KNN (K-Nearest Neighbors) (Ban et al., 2013) is one of the simplest machine learning algorithms. Usually, k is a small, odd number — sometimes only 1. The larger k is, the more accurate the classification will be, but the longer it takes to perform. To classify an object into one of several classes, this technique looks at the k elements of the training set that are closest to the sample to be classified, and lets them vote by majority on what that object’s class should be. ‘‘Closest’’ here refers to the literal distance in n-dimensional space, i.e. the Euclidean distance;
• The NN (Neural Network) (Mehlig, 2019) is a computational model composed of artificial ‘‘neurons’’, inspired by a simplification of biological neural networks;
• The SVR (Support Vector Regression) (Awad & Khanna, 2015): this type of model is an extension of SVM (Support Vector Machines) for regression rather than for simple classification;
• The KRR (Kernel Ridge Regression) (Murphy, 2012) combines Ridge regression (linear least squares with l2-norm regularization) with the kernel trick. Thus it learns a linear function in the space induced by the respective kernel and the data. For non-linear kernels, this corresponds to a non-linear function in the original space;
• The RT (Regression Tree) (Loh, 2011) maximizes the information 𝐼[𝐶, 𝑌], where 𝑌 is the dependent variable, and 𝐶 is the variable that indicates the leaf of the current tree. A direct maximization cannot be performed, so a greedy search is done. We start with the single binary question that maximizes the information obtained on 𝑌; this gives one root node and two child nodes. At each child node, the initial procedure is repeated, asking which question would give the most information about 𝑌, given where we already are in the tree, all this recursively;
• The RFR (Random Forest Regression) (Breiman, 2001): a random forest is a meta-estimator that fits a number of decision trees on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting;
• The BR (Bagging Regressor) (Breiman, 1996): a Bagging regressor is an ensemble meta-estimator that fits base regressors, each on random subsets of the original dataset, and then aggregates their individual predictions to form a final prediction. It can be used as a way to reduce the variance of a black-box estimator, by introducing randomization into its construction procedure and then making an ensemble out of it. When random subsets of the dataset are drawn as random subsets of the samples, this algorithm is known as Pasting (Breiman, 1999). If samples are drawn with replacement, the method is known as Bagging (Breiman, 1996). When random subsets of the dataset are drawn as random subsets of the features, the method is known as Random Subspaces (Ho, 1998). Finally, when base estimators are built on subsets of both samples and features, the method is known as Random Patches (Louppe & Geurts, 2012);
• The ABR (Ada Boost Regressor) (Freund & Schapire, 1995) is a meta-estimator that begins by fitting a regressor on the original dataset and then fits additional copies of the regressor on the same dataset, but where the weights of instances are adjusted according to the error of the current prediction. As such, subsequent regressors focus more on difficult cases;
• The GPML (Gaussian Processes for Machine Learning) (Carl Edward Rasmussen, 2010) provides a wide range of functionality for Gaussian process inference and prediction. These processes are specified by mean and covariance functions. Several likelihood functions are supported, including Gaussian and heavy-tailed for regression, as well as others suitable for classification;
• The LCV (Lasso CV) (Obuchi & Kabashima, 2016) is a linear model with iterative fitting along a regularization path; the best model is selected by cross-validation. The optimization objective for Lasso is:

min_𝜔 ( 1/(2 ⋅ 𝑛_samples) ⋅ ‖𝑋 ⋅ 𝜔 − 𝑦‖²₂ + 𝛼 ⋅ ‖𝜔‖₁ )

where 𝜔 is the vector of parameters, 𝑋 is the matrix of samples, 𝑦 is the vector of actual labels and 𝛼 is a regularization parameter;
• The ENCV (Elastic Net CV) (Hui Zou, 2004) is a model with iterative fitting along a regularization path; the best model is selected by cross-validation. The elastic net (EN) method (Zou & Hastie, 2005) is based on a compromise between the lasso and ridge regression penalties:

(𝛽̂₀, 𝛽̂) = argmin_{𝛽₀,𝛽} { Σᵢ₌₁ⁿ ( 𝑦ᵢ − 𝛽₀ − Σⱼ₌₁ᵖ 𝛽ⱼ 𝑋ᵢⱼ )² + 𝜆 Σⱼ₌₁ᵖ [ ½ (1 − 𝛼) 𝛽ⱼ² + 𝛼 |𝛽ⱼ| ] }

where 0 ≤ 𝛼 ≤ 1 is a penalty weight, 𝑦 is a vector of length 𝑛 containing the response variable, 𝑋 = (𝑥ᵢ₁, …, 𝑥ᵢₚ) is an 𝑛 × 𝑝 matrix holding the predictor variables, 𝛽₀ is the intercept, 𝛽 = (𝛽₁, …, 𝛽ₚ) is a column vector containing the regression coefficients, and 𝑒 is a vector of error terms assumed normally distributed, 𝑒 ∼ 𝑁(0, 𝜔²ₑ). For models where 𝑛 > 𝑝, the values of the unknown parameters 𝛽₀ and 𝛽 can be uniquely estimated by minimizing the residual sum of squares. The EN with 𝛼 = 1 is identical to the lasso, whereas it turns out to be ridge regression with 𝛼 = 0 (Friedman et al., 2010). Setting 𝛼 close to 1 makes the EN behave similarly to the lasso, but eliminates problematic behavior caused by high correlations. When 𝛼 increases from 0 to 1, for a given 𝜆 the sparsity of the minimizer (i.e., the number of coefficients equal to zero) increases monotonically from 0 to the sparsity of the lasso estimate. The elastic net can select more variables than observations;
• The PLSR (Partial Least Squares Regression) (Rosipal & Krämer, 2005) is a quick, efficient and optimal regression method based on covariance. The idea behind this technique is to create, starting from a table with n observations described by p variables, a set of h components with the PLS1 and PLS2 algorithms.

4. Datasets

The datasets used for training the models were downloaded from the web sites of the regulatory bodies of the electricity market, with the exception of the data about renewable sources production — the predictions for the next day — which were provided to us by an already consolidated model. We do not report the details of this model in this paper as (1) the model is used as a black box and is orthogonal to the proposed approach, and (2) it is a proprietary model provided to us by Infopower Research. The granularity of the data is hourly. The amount of data is relatively small (about 3 years of hourly data, therefore about 60k samples, divided into training and test sets of 59k and 1k samples respectively); in fact there is no history large enough to justify using models such as neural networks (of any architecture), so we opted for the models seen in the previous section.

The data used as inputs for the predictors have been selected according to several standpoints: the possibility of retrieving them with no license fees and the coverage of the variables that may influence the energy market. Hence, each data source that has been selected for this purpose contains specific information about some particular feature, such as:

• the prices of raw materials used for a specific source of electricity,


• the trend of the production of non-renewable sources (assuming that the trend of the previous days could be an indicator of today’s production),
• the quantity of electricity production from renewable sources (as the non-renewable production level depends on the amount of production from renewable sources),
• the amount of energy imported from neighboring states, and
• the information on holidays, as energy consumption undergoes high variations mainly related to holidays and seasonality.

In the following we provide a more detailed description of the data that will be given as input to the prediction model.

• price of raw materials: useful since the price can influence whether to make better use of one or another non-renewable source. Specifically, the dataset contains the prices of gas (daily gas price - dgp), coal (dcp), and petroleum (dpp); these data are extracted from the macrotrends and quandl platforms, and their granularity is daily;
• renewable sources production: necessary because the amount of production of non-renewable sources depends on what renewable sources have managed to generate: wind (rew), solar (res), hydro water (reh), other renewable (reo). These data are extracted from the entsoe platform, and their granularity is hourly; for prediction, instead, we use the data supplied by the prediction models of Infopower, because entsoe provides only historical data and not predictions for the future;
• non-renewable sources production: necessary as the history of the amount of production of non-renewable sources in the Italian territory: fossil coal (nrfc), fossil gas (nrfg), hard coal (nrhc), fossil oil (nrfo), waste (nrwa), other (nrot). These data are extracted from the entsoe platform, and their granularity is hourly; we will add the suffix (-w) to refer to the previous week’s data and (-d) to refer to the previous day’s data;
• transits: the amount of energy exchanged between nations is relevant, so transits can be seen as an extra energy source (trans). These data are extracted from the entsoe platform, and their granularity is hourly; for prediction, instead, we use the data supplied by the prediction models of Infopower, because entsoe provides only historical data and not predictions for the future;
• temporal informations: useful for building a sequence of data that will hopefully repeat over time. Temporal information also serves to better define how the environment operates – the consumption of individuals – based on the current time (hour, day, month or year). The granularity of these data is hourly: the number of the day in the year (we assume that history is ‘repeated’ annually) (tind), modeled as an integer starting from 1 for Sunday until 7 for Saturday; weekday/holiday (tiwh), modeled as an integer (boolean) where 1 means a holiday and 0 otherwise;
• energy consumption: the estimated amount of citizens’ consumption and possible export. These data are extracted from entsoe and their granularity is hourly.

The following are the sources of the training data used to build our dataset:

• Macrotrends: a research platform for long-term investors which provides several historical data series about prices of utilities and materials. We used this platform to grab the gas and oil prices. The data are organized in a csv file with two columns, one with the date and the other with the relative price;
• Quandl: a search engine for numerical data. The site offers access to various free (and also paid) socio-economic and financial datasets. Quandl indexes data from different sources, allowing users to find and download them in different formats. We used this platform to grab the carbon price. The data are organized in a csv file containing the following columns: Date, Open price, High price for the day, Low price for the day, Settle price for the day, Change price for the day, Wave, Volume, Prev. Day Open Interest, EFP Volume, EFS Volume, Block Volume. The only datum used for our dataset was the Settle price for the day;
• Entsoe: the European association for the cooperation of transmission system operators (TSOs) for electricity; it supplies a web platform providing numerous pieces of information on electricity-related data. We used this platform to grab the following data:

Wind production;
Solar production;
Water production;
Cross-border flows;
Consumptions.

The data are organized in multiple csv files. Those containing the production data have the following columns: year, month, day, Date Time, Resolution Code, areacode, Area Type Code, Area Name, Map Code, Production Type, Actual Generation Output, Actual Consumption, Update Time. Those containing the consumption data have the following columns: year, month, day, Date Time, Resolution Code, areacode, Area Type Code, Area Name, Map Code, Total Load Value, Update Time. All the data grabbed from this source need to be pre-processed before being used for training the predictors. In particular, the data have been aggregated using their timestamps; in fact the production data available in this source are not cumulative for each day. The same pre-processing step was adopted for the consumptions. Finally, redundant and inconsistent information, as well as information unrelated to the energy market, was removed in the pre-processing phase;
• Holidays library: a Python library for generating on-the-fly holiday information for a given country, province and state. This library was used to get holiday information.

Starting from these data we created four different datasets for each type of non-renewable production (called 𝑥 for simplicity):

• rf (ReFined): this dataset has as input all the data just described. An example of tuple: ⟨tind, tiwh, hour, rew, res, reh, reo, trans, dgp, dcp, dpp⟩;
• wp (With Production): this dataset has the same input as the rf dataset, with the addition of the amount of production of 𝑥 in the previous day at the same hour. An example of tuple: ⟨tind, tiwh, hour, rew, res, reh, reo, trans, dgp, dcp, dpp, 𝐧𝐫𝐟𝐜-𝐝, 𝐧𝐫𝐟𝐠-𝐝, 𝐧𝐫𝐡𝐜-𝐝, 𝐧𝐫𝐟𝐨-𝐝, 𝐧𝐫𝐰𝐚-𝐝, 𝐧𝐫𝐨𝐭-𝐝⟩;
• wpwl (With Production Week Later): this dataset has the same input as the rf dataset, with the addition of the amount of production of 𝑥 at the same hour on the same day of the previous week. An example of tuple: ⟨tind, tiwh, hour, rew, res, reh, reo, trans, dgp, dcp, dpp, 𝐧𝐫𝐟𝐜-𝐰, 𝐧𝐫𝐟𝐠-𝐰, 𝐧𝐫𝐡𝐜-𝐰, 𝐧𝐫𝐟𝐨-𝐰, 𝐧𝐫𝐰𝐚-𝐰, 𝐧𝐫𝐨𝐭-𝐰⟩;
• wpaw (With Production All Week): this dataset has the same input as the rf dataset, with the addition of the amount of production of 𝑥 at the same hour for every day of the last week (until the current day). An example of tuple: ⟨tind, tiwh, hour, rew, res, reh, reo, trans, dgp, dcp, dpp, nrfc-d1 ⋯ nrfc-d7, nrfg-d1 ⋯ nrfg-d7, nrhc-d1 ⋯ nrhc-d7, nrfo-d1 ⋯ nrfo-d7, nrwa-d1 ⋯ nrwa-d7, nrot-d1 ⋯ nrot-d7⟩.

5. Forecasting energy productions

In this section the techniques used for the prediction of electricity production and the related error estimation will be described. In order to obtain a good prediction, we first tried all the techniques seen before with all the created datasets, and then selected the method that returned


the best results to do the prediction. In the next sections we will de- alpha: 0.9, the value of the alpha-quantile of the huber
scribe only the configurations used for the more performing techniques, loss function and the quantile loss function, is selected after
and subsequently we report the experimental results describing before some test.
how we made the forecasting and how we reached the results. Then
• ETR:
we describe the way that we used to estimate the prediction’s error,
pointing out the used techniques for the prediction are not good enough criterion: mse, this is the function to measure the quality
for this scope and for this we passed to other techniques. of a split. The selected value is equal to variance reduction
as feature selection criterion;
5.1. Usage of prediction techniques for energy forecasting number estimators: 10; this is the number of trees in the
forest, this value is selected after some test;
To carry out the prediction in question, various models were tested, min samples split: 2; this is the minimum number of sam-
among which the best performing were GBRT and ETR. The basic ples required to split an internal node, this value is selected
configuration used for each of these models and the changes made to after some test;
obtain a better prediction for each renewable source will be described min samples leaf: 1; this is the minimum number of sam-
below (in the appendix the other models’ configurations resulting in ples required to be at a leaf node. A split point at any depth
the more effective predictions are reported). The numerical values that will only be considered if it leaves at least min samples leaf
we present together with a brief description of the quantities and training samples in each of the left and right branches. This
fundamental parameters relating to the GBRT and ETR techniques, may have the effect of smoothing the model, especially in
have been chosen according to specific tuning activities related to our regression, this value is selected after some test;
numerical experimentation’s. min weight fraction leaf: 0; this is the minimum weighted
fraction of the sum total of weights (of all the input samples)
• GBRT: required to be at a leaf node. Samples have equal weight
criterion: friedman mse (Friedman, 2001) was chosen as a when sample weight is not provided, this value is selected
split criterion because it allows making split decisions not after some test;
only on how close we are to the desired outcome (which is bootstrap: false, whether bootstrap samples are used when
what MSE does), but also based on the number (in case of building trees. We used this value in order to the whole
unweighted samples) of samples that will fall in the left (𝑙) dataset is used to build each tree.
of right (𝑟) child of the splitted node;
loss: lad (least absolute deviation), was chosen this be- Below there are dataset’s configuration used in order to train the
cause is a highly robust loss function solely based on order models:
information of the input variables;
learning rate: 0.1; learning rate shrinks the contribution • with regard to the production of energy fossil coal, the best
of each tree by learning rate. There is a trade-off between prediction was obtained by GBRT using the wp dataset;
learning rate and number estimators; • as regards the production of energy fossil gas, the same type of
number estimators: the number of boosting stages to per- energy fossil coal data was applied but the best model was the
form has been set at 100. Gradient boosting is fairly robust one using the ETR technique;
to over-fitting so a large number usually results in better • as regards the production of energy hard coal, GBRT was used
performance, this value is selected after some tests;
– subsample: 1, this is the fraction of samples to be used for fitting the individual base learners. If smaller than 1.0, this results in Stochastic Gradient Boosting. Subsample interacts with the parameter number estimators: choosing subsample < 1.0 leads to a reduction of variance and an increase in bias;
– min samples split: 2, this is the minimum number of samples required to split an internal node. The value was selected after some tests;
– min samples leaf: 1, this is the minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min samples leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression. The reported value was selected after some tests;
– min weight fraction leaf: 0, this is the minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample weight is not provided. The proposed value was selected after some tests;
– max depth: 3, this is the maximum depth of the individual regression estimators. The maximum depth limits the number of nodes in the tree. Tuning this parameter is important in order to obtain the best performance, and the best setting depends on the interaction of the input variables. The proposed value was chosen after some tests;
as a model, and as a dataset we used wpwl, increasing the number of estimators to 800;
• as regards the production of energy fossil oil, GBRT was used with a configuration equal to that of the prediction of energy hard coal but with the number of estimators equal to 200. In order to improve the prediction we tried to obtain it as the difference of the others, but the results were poor;
• as regards the production of energy waste, we used the same configuration as that for energy fossil oil;
• as regards the production of energy other, the same accuracy result was obtained with both GBRT and ETR; in this case the input of the model was the wpaw dataset.

5.2. Experimental results

In this section we present the experimental results of the proposed energy load forecasting method. We first describe the accuracy measures employed, then go deeper into the experimental results, describing them for each technique used. Finally, we describe the first approach to the estimation of the prediction errors (made through the prediction models themselves) and then, given its poor performance, the second approach, based on the KNN algorithm, which gives very good results. We use the datasets just described, divided into training sets of 59k examples and test sets of 1k examples. The same subdivision was made for the training set used for error estimation.
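Tying the pieces together, the GBRT configuration described in the list above can be sketched in code. This is a hedged sketch that assumes the hyperparameter names map onto scikit-learn's GradientBoostingRegressor (the names in the text match that API); the dataset loading and train/test split are not shown.

```python
# Sketch of the GBRT configuration described in the text, assuming the
# scikit-learn GradientBoostingRegressor API (an assumption: the paper does
# not name its library, but the hyperparameter names match scikit-learn).
from sklearn.ensemble import GradientBoostingRegressor

gbrt = GradientBoostingRegressor(
    n_estimators=800,             # raised to 800 for the hard coal predictor
    subsample=1.0,                # < 1.0 would switch to stochastic gradient boosting
    min_samples_split=2,          # minimum samples to split an internal node
    min_samples_leaf=1,           # minimum samples required at a leaf node
    min_weight_fraction_leaf=0.0,
    max_depth=3,                  # depth of the individual regression estimators
)
# Typical use: gbrt.fit(X_train, y_train); predictions = gbrt.predict(X_test)
```

For the fossil oil and waste predictors, the same construction with `n_estimators=200` would apply, per the configuration notes above.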

S. Flesca et al. Expert Systems With Applications 200 (2022) 116936

5.2.1. Accuracy measures

To measure the quality of the results we used the relative error of the productions' forecasting, which corresponds to the error as a percentage w.r.t. the real value. The accuracy is calculated over the whole test set, as:

Relative Error = (estimated − real) / real

Average Relative Error = (1/n) Σᵢ₌₁ⁿ Relative Errorᵢ

where n is the number of test set samples.
For the error estimation instead, we used the absolute error, which corresponds to the difference between the estimated value and the real value. The error estimation is calculated over the whole test set, as:

Absolute Error = estimated − real

Average Absolute Error = (1/n) Σᵢ₌₁ⁿ Absolute Errorᵢ

where n again indicates the number of test set samples.

5.2.2. Effectiveness of energy production forecasting: experimental results

In Table 1 we report the best results obtained using all the techniques recalled in Section 3 on all the datasets created and described in Section 4. According to the results, the techniques can be grouped into three quality ranges. The first class contains GBRT and ETR, which provided the best results; then we have RT, RFR, BR, ABR and GPML, which generally provided results that are acceptable in some cases and less so in others; and finally LCV, ENCV and PLSR, which return relatively acceptable results only for three types of sources, while for the other cases (hard coal, fossil oil and waste) their results are of poor quality. In the latter range we also include SVR and KRR, which provided the worst results by far, failing to achieve a conceivable degree of precision for any of the possible combinations.

Table 1
Best forecasting results reached, crossing all techniques with all energy sources and datasets.

Technique        Fossil coal  Fossil gas  Hard coal  Fossil oil  Waste   Other
GBRT   Error     1.4%         8.16%       6.43%      39.78%      4.78%   6.55%
       Dataset   wp           wpwl        wpwl       wpwl        wpwl    wpaw
ETR    Error     3.12%        7.46%       7.95%      54.26%      7.42%   6.55%
       Dataset   wpwl         wp          wp         wp          wp      wpaw
RT     Error     4%           12.4%       11.2%      57.3%       11.4%   9.98%
       Dataset   wp           wp          wp         rf          wp      wpaw
RFR    Error     2%           8.9%        6.56%      45.45%      9.45%   6.79%
       Dataset   wpwl         wp          wp         wpaw        wpaw    wpaw
BR     Error     2.46%        8.52%       6.69%      51.05%      9.73%   6.81%
       Dataset   wp           wp          wp         wpaw        wp      wpaw
ABR    Error     6.43%        9.79%       12.38%     60.99%      8.99%   8.79%
       Dataset   wpwl         wpwl        wpwl       rf          wp      wpaw
GPML   Error     5.8%         8.74%       27.31%     53.76%      7.7%    6.81%
       Dataset   rf           wpaw        wpaw       wpwl        wp      rf
LCV    Error     4.06%        13.25%      99.97%     100%        100%    15.76%
       Dataset   wp           rf          wp         rf          rf      rf
ENCV   Error     4.56%        14.14%      99.97%     100%        100%    19.64%
       Dataset   wpaw         rf          wp         rf          rf      rf
PLSR   Error     3.28%        7.07%       99.98%     100%        100%    8.52%
       Dataset   wpaw         wpwl        wpwl       rf          rf      wpaw
SVR    Error     > 100% (for every source and dataset)
KRR    Error     > 100% (for every source and dataset)

In Table 2 we have summarized the best result for each energy source in Table 1. It can be appreciated that very good accuracy results were reached for almost all non-renewable energy sources; in fact, the percentage error of the predictions ranges from 1% to 7% (relative error) depending on the energy source taken into consideration. An exception to this behavior is the case of fossil oil: this is due to the great volatility of its price, which is affected by various non-measurable factors such as political and speculative finance issues.

Table 2
Summary of the best forecasting results reached.

Production type     Best model                 Percentage error  Dataset
Energy fossil coal  GBRT                       1.4%              wp
Energy fossil gas   Extra Tree Regressor       7.46%             wp
Energy hard coal    GBRT                       6.43%             wpwl
Energy fossil oil   GBRT                       39.78%            wpwl
Energy waste        GBRT                       4.78%             wpwl
Energy other        GBRT/Extra Tree Regressor  6.55%             wpaw

5.2.3. Results on training efficiency

In Table 3 the training times w.r.t. the various techniques and production sources are reported. The experiments were run on a workstation with an Intel(R) Core(TM) i7-7500U, 16 GB of RAM and an NVidia GeForce GTX 950M. It is worth pointing out that training times are less than 4 minutes for all the considered techniques. As the prediction task has to be performed daily, these training times meet the requirements of this specific task. Indeed, most of the source data are updated according to a daily schedule, thus making it meaningful to retrain the prediction models once per day.

Table 3
Training time for each technique and energy source (in minutes).

Technique  Fossil coal  Fossil gas  Hard coal  Fossil oil  Waste   Other
GBRT       ∼2.81        ∼2.78       ∼2.82      ∼2.80       ∼2.77   ∼2.77
ETR        ∼2.98        ∼2.90       ∼2.88      ∼2.91       ∼3.01   ∼2.98
RT         ∼2.51        ∼2.56       ∼2.55      ∼2.49       ∼2.53   ∼2.48
RFR        ∼2.61        ∼2.59       ∼2.62      ∼2.58       ∼2.58   ∼2.61
BR         ∼2.32        ∼2.32       ∼2.33      ∼2.29       ∼2.31   ∼2.32
ABR        ∼2.31        ∼2.31       ∼2.29      ∼2.33       ∼2.31   ∼2.30
GPML       ∼2.95        ∼2.96       ∼2.94      ∼2.92       ∼2.92   ∼2.93
LCV        ∼2.85        ∼2.84       ∼2.83      ∼2.86       ∼2.85   ∼2.84
ENCV       ∼3.01        ∼3.02       ∼2.99      ∼3.01       ∼3.02   ∼2.98
PLSR       ∼2.91        ∼2.92       ∼2.93      ∼2.91       ∼2.93   ∼2.91
SVR        ∼3.11        ∼3.10       ∼3.08      ∼3.09       ∼3.11   ∼3.10
KRR        ∼2.83        ∼2.81       ∼2.82      ∼2.81       ∼2.80   ∼2.82

5.3. Estimating the prediction error

In this section, we report the results obtained for the estimation of the prediction error. More in detail, given a data point x characterized by an actual production value h and a predictor h(⋅), we consider the relative error of the prediction ((h(x) − h) / h) and we aim at providing an estimate of such an error. To this end, once the various predictors described in the previous section have been generated, we build a new dataset for each predictor, where each data point is associated with its relative error, and for each of these new datasets we devise several predictors of the relative error, built using the above mentioned techniques adopted for forecasting energy production. Once the predictors have been generated, we measure their effectiveness by considering the average absolute error of their predictions.
Unfortunately, all the predictors devised this way exhibit very poor performance, i.e. they are characterized by an average absolute error of more than 100%, and thus cannot be profitably used for providing the analyst with an estimate of the quality of the prediction for the specific data at hand. Therefore, we tried a different estimation technique based on k-nearest neighbors, that is described in the following.
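The accuracy measures of Section 5.2.1, used both for the forecasting results and for evaluating the error predictors above, translate directly into code. A minimal sketch follows (NumPy is an assumption; the formulas are transcribed as written, so both errors are signed, while the percentage figures reported in the tables are presumably magnitudes):

```python
import numpy as np

def average_relative_error(estimated, real):
    # Mean of (estimated - real) / real over the test set (Section 5.2.1).
    estimated = np.asarray(estimated, dtype=float)
    real = np.asarray(real, dtype=float)
    return float(np.mean((estimated - real) / real))

def average_absolute_error(estimated, real):
    # Mean of (estimated - real) over the test set, used for error estimation.
    estimated = np.asarray(estimated, dtype=float)
    real = np.asarray(real, dtype=float)
    return float(np.mean(estimated - real))
```

For instance, a prediction of 103 MWh against a real value of 100 MWh yields a relative error of 0.03, i.e. 3%.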

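The k-nearest-neighbor error estimator just announced, and detailed in Section 5.3.1, admits a compact sketch. It is a hedged sketch assuming scikit-learn's KNeighborsRegressor (the paper does not name its implementation): uniform weights give the plain average of the neighbors' errors, and 1/distance weighting gives the weighted variant.

```python
# Sketch of a k-NN estimator of the per-point prediction error: it is fit on
# (input features, observed prediction error) pairs and queried on new points.
from sklearn.neighbors import KNeighborsRegressor

def make_error_estimator(X_train, errors_train, k, weighted=False):
    # weighted=False: plain average of the k nearest errors.
    # weighted=True: each neighbor weighted by 1/distance.
    knn = KNeighborsRegressor(n_neighbors=k,
                              weights="distance" if weighted else "uniform")
    knn.fit(X_train, errors_train)
    return knn

# Hypothetical use, with k chosen per production type (e.g. k=11 for fossil
# coal, per Table 5): est = make_error_estimator(X, errs, k=11, weighted=True)
# est.predict(X_new) then estimates the prediction error on new data points.
```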

5.3.1. A KNN based approach for estimating the prediction error

The k-nearest neighbors algorithm (k-NN) is a non-parametric approach that may be used for both classification and regression, by comparing a new sample with its k closest neighbors in the sample space. Depending on whether k-NN is used for classification or regression, the k closest neighbors of a sample are used in two different ways:

• The result of k-NN classification is a class membership. A sample is categorized by a majority vote of its neighbors, so that the sample is assigned the most common class among its k closest neighbors.
• The result of k-NN regression of a sample is the value obtained by averaging the values of the k closest neighbors.

In most cases, the above described classification/regression strategy is slightly modified: the closer neighbors contribute more to the average than the farther ones, by adopting a distance-based weighting scheme. A typical weighting scheme, for example, is to assign a weight of 1/d to each neighbor u, where d is the distance between u and the input sample. Neighbors are chosen from a collection of objects for which the class (in k-NN classification) or the object property value (in k-NN regression) is known. Although no explicit training phase is necessary, this collection can be regarded as the algorithm's training set. The k-NN method is unique in that it is sensitive to the local structure of the data.

In our case, k-NN was used for regression. Our approach is based on k-NN but differs from Sensoy et al. (2018), as we calculated the error using the k points closest to the input point in two different ways:

• absolute: using the k-NN technique, calculating the average of the errors of the k points closest to the point for which the error is to be estimated;
• relative: to further improve the results obtained, we carried out the same tests but weighing the contribution of the points based on their distance.

Significant results were obtained on the estimation of the error of the forecasting model using k-NN (see Table 4). The obtained values are remarkably low and vary between 0.76% and 6.51% in terms of average absolute error, where the worst result is obtained for the production of fossil oil due to the above mentioned estimation issues characterizing this kind of production.
We point out that we carried out a sensitivity analysis in order to find the optimal k value, whose results are reported in Tables 4 and 5. The error estimation algorithm was run for values of k ranging in the interval [1..50].

Table 4
Summary of the best error estimation results reached.

Production type     Absolute error  Distance used
Energy fossil coal  0.76            Relative
Energy fossil gas   6.51            Absolute
Energy hard coal    4.12            Absolute
Energy fossil oil   22.36           Absolute
Energy waste        4.13            Absolute
Energy other        4.47            Absolute

Table 5
Best k value for each production type.

Production type     Best k value
Energy fossil coal  11
Energy fossil gas   7
Energy hard coal    38
Energy fossil oil   71
Energy waste        49
Energy other        10

6. Related work

In this section we discuss previous works on similar topics published in the literature. The intent is not to provide an exhaustive overview of the works relating to the topics of interest, but rather to frame, through some recent publications, the multiple facets of the addressed problem. Specifically, we grouped the discussion of related works by topic.

• Error estimation:

Sensoy et al. (2018) proposed a technique that works by placing a Dirichlet distribution on class probabilities. The authors treat the predictions of a neural network as subjective opinions; in this way the function that collects the evidence leading to these opinions from a deterministic neural network is learned from data. The resulting predictor for a multi-class classification problem is another Dirichlet distribution whose parameters are set by the continuous output of a neural network. The presented approach achieves interesting performance on detection of out-of-distribution queries and endurance against adversarial perturbations.

Deep neural networks (NNs) are powerful black box predictors that have recently achieved impressive performance on a wide spectrum of tasks. Quantifying predictive uncertainty in NNs is a challenging and yet unsolved problem. Bayesian NNs, which learn a distribution over weights, are currently the state-of-the-art for estimating predictive uncertainty; however, they require significant modifications to the training procedure and are computationally expensive compared to standard (non-Bayesian) NNs. In Lakshminarayanan et al. (2017), the authors propose an alternative to Bayesian NNs that is simple to implement, readily parallelizable, requires very little hyperparameter tuning, and is able to yield high quality predictive uncertainty estimates. Through a series of tests on classification and regression benchmarks, the presented solution yields well-calibrated uncertainty estimates that are as good as or better than approximate Bayesian NNs. To assess resilience to dataset shift, the prediction uncertainty on test cases from known and unknown distributions was evaluated, showing that the method can express higher uncertainty in out-of-distribution situations.

• Production from renewable energy sources forecasting:

In Basurto et al. (2019), to increase the variety of solar energy and power grid combinations that might be employed, a Hybrid Intelligent System (HIS) was proposed. The presented solution was designed to predict how much energy a solar thermal system will generate. HIS does this by employing local models (artificial neural networks) that perform both supervised and unsupervised learning (clustering). These techniques are combined and evaluated in a real-world setting in Spain. Data from a complete year is utilized to analyze and test various models in this case study. Using an optimal parameter fit, the proposed system estimated the solar energy output of the panel with low error in 86 percent of cases.

Xie et al. (2015) suggested several new models for forecasting China's energy production and consumption patterns under the impact of the country's energy conservation strategy. To predict the overall quantity of energy production and consumption, an optimized single-variable discrete grey forecasting model is used, while a unique Markov technique based on quadratic programming is presented to forecast the trends of energy production and consumption structures. The suggested models are used to replicate


China's energy production and consumption from 2006 to 2011, as well as to estimate trends for 2015 and 2020. The results show that the suggested models can accurately simulate and anticipate total energy production and consumption quantities and structures. When compared to the regression model, the findings demonstrate that the suggested model performs somewhat better in simulating and forecasting the situation. Despite the fact that China's energy consumption growth rate has slowed as a result of its energy conservation strategy, overall energy consumption and the proportions of natural gas and other energies continue to rise, while crude oil and natural gas self-sufficiency rates continue to fall.

In Li et al. (2016), artificial neural networks (ANN) and support vector regression (SVR) are evaluated for forecasting energy output from a solar photovoltaic (PV) system in Florida 15 min, 1 h, and 24 h ahead of time. Based on the machine learning techniques, a hierarchical approach is presented. The production data utilized in this study came from 15-minute averaged power measurements taken in 2014. Error statistics such as mean bias error (MBE), mean absolute error (MAE), root mean square error (RMSE), relative MBE (rMBE), mean percentage error (MPE), and relative RMSE (rRMSE) are used to assess the model's correctness. This research reveals how individual inverter projections might enhance the forecast of the PV system's total solar power output.

• Energy market indices:

In Wang and Wang (2016), a novel neural network architecture was established that combines a Multilayer perceptron and ERNN (Elman recurrent neural networks) with a stochastic time effective function, in an attempt to enhance the predicting accuracy of crude oil price variations. ERNN is a time-varying predictive control system that was designed to remember recent occurrences in order to forecast future output. According to the stochastic time effective function, current information has a greater impact on investors than older information. The empirical research performed well in assessing the prediction impact on four time series indices using the developed model. In comparison to previous models, this one can assess data from the 1990s to 2016 with notable precision and speed. The applied CID (complexity invariant distance) analysis and multiscale CID analysis are presented as new helpful methods to assess whether the suggested model has a superior forecasting capacity compared to other standard models.

Predicting crude oil prices is a difficult challenge. The study presented in Wang et al. (2018) offers the DFN-AI model, a unique hybrid technique that combines an integrated data fluctuation network (DFN) with multiple artificial intelligence (AI) methods to enhance forecasting. In the proposed DFN-AI model, a complex network time series analysis technique is used as a preprocessor for the original data to extract the fluctuation features and reconstruct the data, and then an artificial intelligence tool, such as BPNN, RBFNN, or ELM, is used to model the reconstructed data and predict the future data. The authors investigated daily, weekly, and monthly pricing data from the crude oil trading hub in Cushing, Oklahoma, to confirm these findings. The suggested DFN-AI models (i.e., DFN-BP, DFN-RBF, and DFN-ELM) outperform their equivalent single AI models in both the direction and level of prediction, according to empirical data. This demonstrates the efficacy of the suggested modeling of crude oil prices' nonlinear tendencies. Furthermore, the presented DFN-AI techniques are resilient and trustworthy, and are unaffected by random sample selection, sample frequency, or sample structure breakdowns.

• Electric load forecasting:

Short-term load forecasting, long-term load forecasting, geographical load forecasting, electricity price forecasting, demand response forecasting, and renewable generation forecasting are all examples of energy forecasting in the utility business. Because of storage limitations and the societal demand for power, energy forecasting has numerous intriguing aspects, such as complicated seasonal patterns, 24/7 data collection across the system, and the requirement to be highly precise. In Hong (2014), an overview of effective electric load forecasting was provided, this being in fact a crucial issue in the electric power industry. The energy forecasting sector, including demand response forecasting and renewable generation forecasting, faces new problems as a result of smart grid investment and technology. In the smart grid age, century-old energy forecasting gets a fresh lease on life. Electric load forecasting aims at forecasting the expected electricity demand at aggregated levels. Traditionally, electric load forecasting techniques yield as a result the expected value of the electric load, but recently the field of probabilistic electric load forecasting (PLF), which aims at yielding predictions in terms of quantiles, intervals, or density functions, has gained relevance. Since the beginning of the electric power sector, load forecasting has been a basic commercial concern. Both research and commercial practices in this field have largely concentrated on point load forecasting during the last 100 years or more. However, in the last decade, rising market competition, aging infrastructure, and renewable integration requirements have made probabilistic load forecasting an increasingly crucial part of energy system planning and management. In Hong and Fan (2016), an overview of probabilistic electric load forecasting, covering key approaches, methodologies, and assessment methods, as well as frequent misconceptions, was provided. The authors stressed the importance of investing in additional research, such as repeatable case studies, the evaluation and valuation of probabilistic load predictions, and probabilistic load forecasting methodologies that take into account new technologies and energy regulations.

7. Conclusions and future work

In this work we addressed the problem of devising effective predictions of the amount of electrical power produced using non-renewable sources, together with a point-wise estimate of the quality of the predictions. The forecasts obtained with the various models considered in this work are of good quality (except for the production using fossil oil) and would thus be very useful for giving indications to energy market traders in order to predict energy consumption or prices in advance.
In the future, it will be interesting to consider the possibility of improving the prediction for the case of fossil oil by augmenting the input data used for devising predictions in this work with new kinds of data, such as data extracted from financial newspapers, in order to identify political and non-political trends that typically have a strong influence on such production.

CRediT authorship contribution statement

Sergio Flesca: Conceptualization, Methodology, Software. Francesco Scala: Conceptualization, Methodology, Software. Eugenio Vocaturo: Conceptualization, Methodology, Software. Francesco Zumpano: Conceptualization, Methodology, Software.


Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix

Below are the best configurations for the models that gave worse results:

• The NN: we tried some types of feed-forward, convolutional and recursive NNs, but they performed poorly due to the scarcity of data;
• The SVR:
kernel: rbf;
gamma: 0.1;
param grid:
c: [0.001, 0.01, 0.1, 1, 10];
gamma: [0.001, 0.01, 0.1, 1].
• The KRR:
kernel: rbf;
gamma: 0.1;
• The RT:
criterion: mse;
min samples split: 2;
min samples leaf: 1;
min weight fraction leaf: 0;
min impurity decrease: 0.
• The RFR:
n estimators: 20;
criterion: mse;
min samples split: 4;
min samples leaf: 1;
min weight fraction leaf: 0.01;
min impurity decrease: 0;
bootstrap: true;
oob score: false;
• The BR:
n estimators: 10;
max samples: 1;
max features: 1;
bootstrap: true;
warm start: false;
oob score: true;
• The ABR:
n estimators: 50;
learning rate: 1;
loss: linear;
• The GPML:
kernel: DotProduct() + WhiteKernel();
optimizer: fmin_l_bfgs_b;
• The LCV:
alpha: 0.1;
• The ENCV:
cv: 5;
eps: 0.001;
l1 ratio: 0.5;
max iter: 1000;
n alphas: 100;
normalize: False;
positive: False;
selection: 'cyclic';
tol: 0.0001;
• The PLSR:
n components: 2.

References

Awad, M., & Khanna, R. (2015). Support vector regression. In Efficient learning machines: Theories, concepts, and applications for engineers and system designers (pp. 67–80). Berkeley, CA: Apress.
Ban, T., Zhang, R., Pang, S., Sarrafzadeh, A., & Inoue, D. (2013). Referential kNN regression for financial time series forecasting. In Neural information processing (pp. 601–608). Springer Berlin Heidelberg.
Basurto, N., Arroyo, A., Vega, R., Quintián, H., Calvo-Rolle, J. L., & Herrero, A. (2019). A hybrid intelligent system to forecast solar energy production. Computers and Electrical Engineering, 78, 373–387.
Breiman, L. (1996). Bagging predictors. Machine Learning, 123–140.
Breiman, L. (1999). Pasting small votes for classification in large databases and on-line. Machine Learning, 85–103.
Breiman, L. (2001). Random forests. Machine Learning, 5–32.
Ceci, M., Corizzo, R., Fumarola, F., Ianni, M., Malerba, D., Maria, G., Masciari, E., Oliverio, M., & Rashkovska, A. (2015). Big data techniques for supporting accurate predictions of energy production from renewable sources. In Proceedings of the 19th international database engineering & applications symposium (pp. 62–71).
Ferruzzi, G., Cervone, G., Delle Monache, L., Graditi, G., & Jacobone, F. (2016). Optimal bidding in a day-ahead energy market for micro grid under uncertainty in renewable energy production. Energy, 106, 194–202.
Freund, Y., & Schapire, R. E. (1995). A decision-theoretic generalization of on-line learning and an application to boosting. In Lecture Notes in Computer Science: vol. 904, Computational learning theory, second European conference, EuroCOLT '95, Barcelona, Spain, March 13–15, 1995, proceedings (pp. 23–37). Springer.
Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29(5), 1189–1232.
Friedman, J. H. (2002). Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4), 367–378.
Friedman, J. H., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1–22.
Geurts, P., Ernst, D., & Wehenkel, L. (2006). Extremely randomized trees. Machine Learning, 63(4), 3–42.
Ho, T. K. (1998). The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8), 832–844.
Hong, T. (2014). Energy forecasting: Past, present, and future. Foresight: The International Journal of Applied Forecasting, (32), 43–48.
Hong, T., & Fan, S. (2016). Probabilistic electric load forecasting: A tutorial review. International Journal of Forecasting, 32(3), 914–938.
Lakshminarayanan, B., Pritzel, A., & Blundell, C. (2017). Simple and scalable predictive uncertainty estimation using deep ensembles. In Advances in neural information processing systems 30 (pp. 6402–6413).
Li, Z., Rahman, S., Vega, R., & Dong, B. (2016). A hierarchical approach using machine learning methods in solar photovoltaic energy production forecasting. Energies, 220.
Loh, W.-Y. (2011). Classification and regression trees. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1, 14–23.
Louppe, G., & Geurts, P. (2012). Ensembles on random patches. In Machine learning and knowledge discovery in databases (pp. 346–361). Springer Berlin Heidelberg.
Mehlig, B. (2019). Machine learning with neural networks. arXiv e-prints.
Murphy, K. P. (2012). Machine learning: A probabilistic perspective. The MIT Press.
Obuchi, T., & Kabashima, Y. (2016). Cross validation in LASSO and its acceleration. Journal of Statistical Mechanics: Theory and Experiment, 2016(5), Article 053304.
Rasmussen, C. E., & Nickisch, H. (2010). Gaussian processes for machine learning (GPML) toolbox. Journal of Machine Learning Research, 11, 3011–3015.
Rosipal, R., & Krämer, N. (2005). Overview and recent advances in partial least squares. Lecture Notes in Computer Science, 34–51.
Sensoy, M., Kaplan, L. M., & Kandemir, M. (2018). Evidential deep learning to quantify classification uncertainty. In Advances in neural information processing systems 31: Annual conference on neural information processing systems 2018 (pp. 3183–3193).
Wang, J., & Wang, J. (2016). Forecasting energy market indices with recurrent neural networks: Case study of crude oil price fluctuations. Energy, 102, 365–374.
Wang, M., Zhao, L., Du, R., Wang, C., Chen, L., Tian, L., & Eugene Stanley, H. (2018). A novel hybrid method of forecasting crude oil prices using complex network science and artificial intelligence algorithms. Applied Energy, 220, 480–495.
Xie, N.-M., qing Yuan, C., & jie Yang, Y. (2015). Forecasting China's energy demand and self-sufficiency rate by grey forecasting model and Markov model. International Journal of Electrical Power & Energy Systems, 66, 1–8.
Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society. Series B. Statistical Methodology, 67(2), 301–320.