Keywords: Energy production forecasting; Machine learning applications

Nowadays, the introduction of energy marketplaces in several countries has pushed the development of machine learning approaches for devising effective predictions of both energy needs and energy production. In this paper we address the problem of predicting the amount of electrical power produced from non-renewable sources, as an estimate of the amount of electrical power produced from the various kinds of non-renewable sources yields a big competitive advantage for energy market investors. Specifically, we devise a forecasting technique, obtained by trying and combining various machine learning techniques, which provides energy production estimates with a remarkably low error. Finally, since the input data available for prediction are in general not sufficient to determine the amounts of energy produced from the various source types, we provide an estimate of the impact of unknown latent variables on the amounts of produced energy, by devising a prediction model capable of estimating the prediction error for the specific data at hand. This information can be exploited by investors to gauge the risk levels of their investments.
https://doi.org/10.1016/j.eswa.2022.116936
Received 22 October 2021; Received in revised form 6 March 2022; Accepted 17 March 2022
Available online 26 March 2022
0957-4174/© 2022 Elsevier Ltd. All rights reserved.
S. Flesca et al. Expert Systems With Applications 200 (2022) 116936
long as the remaining number of training samples associated to the node is higher than the minimum sample size for splitting a node. After all the trees are (independently) grown, the final ETR prediction is the arithmetic average of the single predictions of each tree.

• The KNN (K-Nearest Neighbors) (Ban et al., 2013) is one of the simplest machine learning algorithms. Usually, k is a small, odd number, sometimes only 1. The larger k is, the more accurate the classification will be, but the longer it takes to perform. To classify an object into one of several classes, this technique looks at the k elements of the training set that are closest to the sample to be classified, and lets them vote by majority on what that object's class should be. ''Closest'' here refers to literal distance in n-dimensional space, i.e., the Euclidean distance;

• The NN (Neural Network) (Mehlig, 2019) is a computational model composed of artificial ''neurons'', inspired by a simplification of a biological neural network;

• The SVR (Support Vector Regression) (Awad & Khanna, 2015): this type of model is an extension of SVM (Support Vector Machines) for regression rather than simple classification;

• The KRR (Kernel Ridge Regression) (Murphy, 2012) combines Ridge regression (linear least squares with l2-norm regularization) with the kernel trick. It thus learns a linear function in the space induced by the respective kernel and the data; for non-linear kernels, this corresponds to a non-linear function in the original space;

• The RT (Regression Tree) (Loh, 2011) maximizes the information 𝐼[𝐶, 𝑌 ], where 𝑌 is the dependent variable and 𝐶 is the variable that indicates the height of the current tree. A direct maximization cannot be performed, so a greedy search is done: we start with the single binary question that maximizes the information obtained on 𝑌 ; this gives one root node and two child nodes. At each child node the procedure is repeated, asking which question would give the most information about 𝑌 given where we already are in the tree, all this recursively;

• The RFR (Random Forest Regression) (Breiman, 2001): a random forest is a meta-estimator that fits a number of decision trees on various sub-samples of the dataset and uses averaging to improve predictive accuracy and control over-fitting;

• The BR (Bagging Regressor) (Breiman, 1996): a Bagging regressor is an ensemble meta-estimator that fits base regressors on random subsets of the original dataset and then aggregates their individual predictions into a final prediction. It can be used to reduce the variance of a black-box estimator, by introducing randomization into its construction procedure and then making an ensemble out of it. When random subsets of the dataset are drawn as random subsets of the samples, the algorithm is known as Pasting (Breiman, 1999). If samples are drawn with replacement, the method is known as Bagging (Breiman, 1996). When random subsets of the dataset are drawn as random subsets of the features, the method is known as Random Subspaces (Ho, 1998). Finally, when base estimators are built on subsets of both samples and features, the method is known as Random Patches (Louppe & Geurts, 2012);

• The ABR (Ada Boost Regressor) (Freund & Schapire, 1995) is a meta-estimator that begins by fitting a regressor on the original dataset and then fits additional copies of the regressor on the same dataset, with the weights of instances adjusted according to the error of the current prediction. As such, subsequent regressors focus more on difficult cases;

• The GPML (Gaussian Processes for Machine Learning) (Carl Edward Rasmussen, 2010) provides a wide range of functionality for Gaussian process inference and prediction. These processes are specified by mean and covariance functions. Several likelihood functions are supported, including Gaussian and heavy-tailed ones for regression, as well as others suitable for classification;

• The LCV (Lasso CV) (Obuchi & Kabashima, 2016) is a linear model with iterative fitting along a regularization path. The best model is selected by cross-validation. The optimization objective for Lasso is:

\[
\min_{\omega}\; \frac{1}{2\,n_{\mathrm{samples}}}\,\lVert X\omega - y\rVert_2^2 + \alpha\,\lVert\omega\rVert_1
\]

where 𝜔 is the vector of parameters, 𝑋 is the matrix of samples, 𝑦 is the vector of actual labels and 𝛼 is a regularization parameter;

• The ENCV (Elastic Net CV) (Hui Zou, 2004) is a model with iterative fitting along a regularization path. The best model is selected by cross-validation. The underlying elastic net (EN) method (Zou & Hastie, 2005) is based on a compromise between the lasso and ridge regression penalties:

\[
(\hat\beta_0, \hat\beta) = \operatorname*{argmin}_{\beta_0,\beta} \left\{ \sum_{i=1}^{n}\left(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j X_{ij}\right)^2 + \lambda \sum_{j=1}^{p}\left[\frac{1}{2}(1-\alpha)\beta_j^2 + \alpha\,\lvert\beta_j\rvert\right] \right\}
\]

where 0 ≤ 𝛼 ≤ 1 is a penalty weight, 𝑦 is a vector of length 𝑛 containing the response variable, 𝑋 = (𝑥𝑖1 , … , 𝑥𝑖𝑝 ) is an 𝑛 × 𝑝 matrix holding the predictor variables, 𝛽0 is the intercept, 𝛽 = (𝛽1 , … , 𝛽𝑝 ) is a column vector containing the regression coefficients, and 𝑒 is a vector of error terms with normal distribution 𝑒 ∼ 𝑁(0, 𝜔²𝑒 ). For models where 𝑛 > 𝑝, the values of the unknown parameters 𝛽0 and 𝛽 can be uniquely estimated by minimizing the residual sum of squares. The EN with 𝛼 = 1 is identical to the lasso, whereas it reduces to ridge regression with 𝛼 = 0 (Friedman et al., 2010). Setting 𝛼 close to 1 makes the EN behave similarly to the lasso, but eliminates problematic behavior caused by high correlations. As 𝛼 increases from 0 to 1, for a given 𝜆 the sparsity of the minimizer (i.e., the number of coefficients equal to zero) increases monotonically from 0 to the sparsity of the lasso estimate. The elastic net can select more variables than observations;

• The PLSR (Partial Least Squares Regression) (Rosipal & Krämer, 2005) is a quick, efficient regression method based on covariance. The idea behind this technique is to create, starting from a table with n observations described by p variables, a set of h components with the PLS1 and PLS2 algorithms.

4. Datasets

The datasets used for training the models were downloaded from the web sites of the regulatory bodies of the electrical market, with the exception of the data about renewable-source production, which are the predictions for the next day and were provided to us by an already consolidated model. We do not report the details of this model in this paper as (1) the model is used as a black box and is orthogonal to the proposed approach, and (2) it is a proprietary model provided to us by Infopower Research. The granularity of the data is hourly. The amount of data is relatively small (about 3 years of hourly data, hence about 60k samples, divided into training and test sets of 59k and 1k respectively); in fact, there is no history large enough to justify the use of neural networks (of any architecture), so we opted for the models presented in the previous section.

The data used as inputs for the predictors have been selected according to several standpoints: the possibility to retrieve them with no license fees, and the coverage of the variables that may influence the energy market. Hence, each data source selected for this purpose contains specific information about some particular feature, such as:

• the prices of raw materials used for a specific source of electricity,
• the trend of the production of non-renewable sources (assuming that the trend of the previous days could be an indicator of today's production),
• the quantity of electricity production from renewable sources (as the non-renewable production level depends on the amount of production from renewable sources),
• the amount of energy imported from neighbor states, and
• the information on holidays, as energy consumption undergoes high variations mainly related to holidays and on a seasonal basis.

In the following we provide a more detailed description of the data that will be placed as input to the prediction model.

• price of raw materials: useful since the price can influence whether to make better use of one or the other non-renewable source. Specifically, the dataset contains the prices of gas (daily gas price - dgp), coal (dcp), and petroleum (dpp); these data are extracted from the macrotrends and quandl platforms, and their granularity is daily;
• renewable sources production: necessary because the amount of production of non-renewable sources depends on what the renewable sources have managed to generate: wind (rew), solar (res), hydro water (reh), other renewable (reo). These data are extracted from the entsoe platform, and their granularity is hourly; for prediction, instead, we use the data supplied by the prediction models by infopower, because entsoe offers only historical data and no predictions for the future;
• non-renewable sources production: necessary as history of the amount of production of non-renewable sources in the Italian territory: fossil coal (nrfc), fossil gas (nrfg), hard coal (nrhc), fossil oil (nrfo), waste (nrwa), other (nrot). These data are extracted from the entsoe platform, and their granularity is hourly; we will add the suffix (-w) to refer to the previous week's data and (-d) to refer to the previous day's data;
• transits: the amount of energy exchanged between nations is relevant, so transits can be seen as an extra energy source (trans). These data are extracted from the entsoe platform, and their granularity is hourly; for prediction, instead, we use the data supplied by the prediction models by infopower, because entsoe offers only historical data and no predictions for the future;
• temporal informations: useful for building a sequence of data that will hopefully repeat over time. Temporal information also serves to better define how the environment operates (the consumption of individuals) based on the current time (hour, day, month or year). Obviously the granularity of these data is hourly: the number of the day in the year (we assume that history is 'repeated' annually) (tind), modeled as an integer starting from 1 for Sunday until 7 for Saturday; weekday/holiday (tiwh), modeled as a boolean integer where 1 means holiday and 0 otherwise;
• energy consumption: the estimated amount of citizens' consumption and possible export. These data are extracted from entsoe and their granularity is hourly.

The following are the sources for the training data used to build our dataset:

• Macrotrends: this is a research platform for long-term investors which provides several historical data about prices of utilities and materials. We used this platform to grab the gas and oil prices. The data are organized in a csv file with two columns, one with the date and the other with the relative price;
• Quandl: a search engine for numerical data. The site offers access to various free (and also paid) socio-economic and financial datasets. Quandl indexes data from different sources, allowing users to find and download them in different formats. We used this platform to grab the carbon price. The data are organized in a csv file containing the following columns: Date, Open price, High price for the day, Low price for the day, Settle price for the day, Change price for the day, Wave, Volume, Prev. Day Open Interest, EFP Volume, EFS Volume, Block Volume. The only datum used for our dataset was the Settle price for the day;
• Entsoe: the European association for the cooperation of transmission system operators (TSOs) for electricity; it supplies a web platform providing numerous pieces of information on electricity-related data. We used this platform to grab the following data: wind production; solar production; water production; cross-border flows; consumptions. The data are organized in multiple csv files. Those containing the production data have the following columns: year, month, day, Date Time, Resolution Code, areacode, Area Type Code, Area Name, Map Code, Production Type, Actual Generation Output, Actual Consumption, Update Time. Those containing the consumption data have the following columns: year, month, day, Date Time, Resolution Code, areacode, Area Type Code, Area Name, Map Code, Total Load Value, Update Time. All the data grabbed from this source need to be pre-processed before being used for training the predictors. In particular, the data have been aggregated using their timestamps, since the production data available in this source are not cumulative for each day. The same pre-processing step was adopted for the consumptions. Finally, redundant and inconsistent information, and information unrelated to the energy market, were removed in the pre-processing phase;
• Holidays library: a Python library for generating holiday information on the fly, given country, province and state. This library was used to get holiday information.

Starting from these data we created four different datasets for each type of non-renewable production (called 𝑥 for simplicity):

• rf (ReFined): this dataset has as input all the data just described. An example tuple: ⟨tind, tiwh, hour, rew, res, reh, reo, trans, dgp, dcp, dpp⟩;
• wp (With Production): this dataset has the same input as the rf dataset, with the addition of the amount of production of 𝑥 in the previous day at the same hour. An example tuple: ⟨tind, tiwh, hour, rew, res, reh, reo, trans, dgp, dcp, dpp, 𝐧𝐫𝐟 𝐜 − 𝐝, 𝐧𝐫𝐟 𝐠 − 𝐝, 𝐧𝐫𝐡𝐜 − 𝐝, 𝐧𝐫𝐟 𝐨 − 𝐝, 𝐧𝐫𝐰𝐚 − 𝐝, 𝐧𝐫𝐨𝐭 − 𝐝⟩;
• wpwl (With Production Week Later): this dataset has the same input as the wp dataset, but adding the amount of production of 𝑥 at the same hour on the same day of the previous week. An example tuple: ⟨tind, tiwh, hour, rew, res, reh, reo, trans, dgp, dcp, dpp, 𝐧𝐫𝐟 𝐜 − 𝐰, 𝐧𝐫𝐟 𝐠 − 𝐰, 𝐧𝐫𝐡𝐜 − 𝐰, 𝐧𝐫𝐟 𝐨 − 𝐰, 𝐧𝐫𝐰𝐚 − 𝐰, 𝐧𝐫𝐨𝐭 − 𝐰⟩;
• wpaw (With Production All Week): this dataset has the same input as the wp dataset, but adding the amount of production of 𝑥 at the same hour for each day of the last week (up to the current day). An example tuple: ⟨tind, tiwh, hour, rew, res, reh, reo, trans, dgp, dcp, dpp, nrfc-d1 ⋯ nrfc-d7, nrfg-d1 ⋯ nrfg-d7, nrhc-d1 ⋯ nrhc-d7, nrfo-d1 ⋯ nrfo-d7, nrwa-d1 ⋯ nrwa-d7, nrot-d1 ⋯ nrot-d7⟩.

5. Forecasting energy productions

In this section the techniques used for the prediction of electricity production, and the estimation of the relative error, are described. In order to obtain a good prediction, we first tried all the techniques seen before with all the created datasets, then selected the method that returned
the best results to carry out the prediction. In the next sections we describe only the configurations used for the best-performing techniques; subsequently we report the experimental results, describing first how we made the forecasts and how we reached the results. Then we describe the way we estimated the prediction error, pointing out that the techniques used for the prediction are not good enough for this purpose, which is why we moved to other techniques.

5.1. Usage of prediction techniques for energy forecasting

To carry out the prediction in question, various models were tested, among which the best performing were GBRT and ETR. The basic configuration used for each of these models, and the changes made to obtain a better prediction for each source, are described below (the other models' configurations resulting in the most effective predictions are reported in the appendix). The numerical values that we present, together with a brief description of the quantities and fundamental parameters of the GBRT and ETR techniques, have been chosen according to specific tuning activities related to our numerical experimentation.

• GBRT:
  criterion: friedman mse (Friedman, 2001) was chosen as the split criterion because it allows making split decisions not only based on how close we are to the desired outcome (which is what MSE does), but also based on the number (in the case of unweighted samples) of samples that will fall in the left (𝑙) or right (𝑟) child of the split node;
  loss: lad (least absolute deviation) was chosen because it is a highly robust loss function, solely based on the order information of the input variables;
  learning rate: 0.1; the learning rate shrinks the contribution of each tree. There is a trade-off between the learning rate and the number of estimators;
  number of estimators: the number of boosting stages to perform has been set to 100. Gradient boosting is fairly robust to over-fitting, so a large number usually results in better performance; this value was selected after some tests;
  subsample: 1; the fraction of samples to be used for fitting the individual base learners. If smaller than 1.0 this results in Stochastic Gradient Boosting. Subsample interacts with the number of estimators: choosing subsample < 1.0 leads to a reduction of variance and an increase in bias;
  min samples split: the minimum number of samples required to split an internal node. The value was selected after some tests and is set to 2;
  min samples leaf: 1; the minimum number of samples required at a leaf node. A split point at any depth will only be considered if it leaves at least min samples leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression. The reported value was selected after some tests;
  min weight fraction leaf: 0; the minimum weighted fraction of the sum of weights (over all the input samples) required at a leaf node. Samples have equal weight when sample weight is not provided. The proposed value was selected after some tests;
  max depth: 3; the maximum depth of the individual regression estimators. The maximum depth limits the number of nodes in the tree. Tuning this parameter is important for best performance, and the best value depends on the interaction of the input variables. The proposed value was chosen after some tests;
  alpha: 0.9; the value of the alpha-quantile of the huber loss function and the quantile loss function, selected after some tests.

• ETR:
  criterion: mse; the function measuring the quality of a split. The selected value corresponds to variance reduction as feature selection criterion;
  number of estimators: 10; the number of trees in the forest, selected after some tests;
  min samples split: 2; the minimum number of samples required to split an internal node, selected after some tests;
  min samples leaf: 1; the minimum number of samples required at a leaf node. A split point at any depth will only be considered if it leaves at least min samples leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression. This value was selected after some tests;
  min weight fraction leaf: 0; the minimum weighted fraction of the sum of weights (over all the input samples) required at a leaf node. Samples have equal weight when sample weight is not provided. This value was selected after some tests;
  bootstrap: false; whether bootstrap samples are used when building trees. We used this value so that the whole dataset is used to build each tree.

The dataset configurations used to train the models are the following:

• for the production of energy fossil coal, the best prediction was obtained by GBRT using the wp dataset;
• for the production of energy fossil gas, the same kind of data as for energy fossil coal was used, but the best model was the one based on the ETR technique;
• for the production of energy hard coal, GBRT was used as model with the wpwl dataset, increasing the number of estimators to 800;
• for the production of energy fossil oil, GBRT was used with a configuration equal to that of energy hard coal but with the number of estimators set to 200. To improve the prediction we also tried to obtain it as the difference of the others, but the results were poor;
• for the production of energy waste, the same configuration as for energy fossil oil was used;
• for the production of energy other, the same accuracy was obtained with both GBRT and ETR; in this case the input of the model was the wpaw dataset.

5.2. Experimental results

In this section we present the experimental results of the proposed energy production forecasting method. Initially we describe the accuracy measures employed; subsequently we go deeper into the description of the experimental results for each technique used. In the end we describe the first approach to the estimation of the errors made by the prediction models (carried out through these same models) and subsequently, given its poor performance, the second approach, based on the 𝐾𝑁𝑁 algorithm, which gives very good results. We use the datasets just described, divided into training sets of 59k examples and test sets of 1k. The same subdivision was made for the training set used for error estimation.
where 𝑛 indicates again the number of test set samples.

Table 1 (continued)
Technique        Fossil coal  Fossil gas  Hard coal  Fossil oil  Waste  Other
LCV     Error    4.06%        13.25%      99.97%     100%        100%   15.76%
        Dataset  wp           rf          wp         rf          rf     rf
ENCV    Error    4.56%        14.14%      99.97%     100%        100%   19.64%
        Dataset  wpaw         rf          wp         rf          rf     rf
PLSR    Error    3.28%        7.07%       99.98%     100%        100%   8.52%
        Dataset  wpaw         wpwl        wpwl       rf          rf     wpaw
SVR     Error    > 100% (all sources)
KRR     Error    > 100% (all sources)

5.2.2. Effectiveness of energy production forecasting: experimental results

In Table 1 we report the best results obtained using all the techniques recalled in Section 3 on all the datasets created and described in Section 4. According to the results, the techniques can be grouped into three quality ranges. The first class contains GBRT and ETR, which provided the best results; then we have RT, RFR, BR, ABR and GPML, which generally provided results acceptable in some cases and less so in others; finally, LCV, ENCV and PLSR return relatively acceptable results only for three types of sources, while in the other cases (hard coal, fossil oil and waste) they are of poor quality. In the latter range we also include SVR and KRR, which provided the worst results by far, failing to achieve a conceivable degree of precision for any of the possible combinations.

In Table 2 we summarize the best result for each energy source from Table 1. It can be appreciated that very good accuracy results were reached for almost all non-renewable energy sources; in fact, the percentage error of the predictions ranges from 1% to 7% (relative error) depending on the energy source taken into consideration. An exception to this behavior is the case of fossil oil: this is due to the great volatility of its price, which is affected by various non-measurable factors such as political and speculative finance issues.

Table 2
Summary of the best forecasting results reached.
Production type     Best model                 Percentage error  Dataset
Energy fossil coal  GBRT                       1.4%              wp
Energy fossil gas   Extra Tree Regressor       7.46%             wp
Energy hard coal    GBRT                       6.43%             wpwl
Energy fossil oil   GBRT                       39.78%            wpwl
Energy waste        GBRT                       4.78%             wpwl
Energy other        GBRT/Extra Tree Regressor  6.55%             wpaw

5.2.3. Results on training efficiency

In Table 3 the training times w.r.t. the various techniques and production sources are reported. The experiments were run on a workstation with an Intel(R) Core(TM) i7-7500U, 16 GB RAM and an NVidia GeForce GTX 950M. It is worth pointing out that training times are less than 4 minutes for all the considered techniques. As the prediction task has to be performed daily, these training times meet the requirements for this specific task. Indeed, most of the source data are updated according to a daily schedule, thus making it meaningful to retrain the prediction models once per day.

Table 3
Training time for each technique and energy source (in minutes).
Technique  Fossil coal  Fossil gas  Hard coal  Fossil oil  Waste  Other
GBRT       ∼2.81        ∼2.78       ∼2.82      ∼2.80       ∼2.77  ∼2.77
ETR        ∼2.98        ∼2.90       ∼2.88      ∼2.91       ∼3.01  ∼2.98
RT         ∼2.51        ∼2.56       ∼2.55      ∼2.49       ∼2.53  ∼2.48
RFR        ∼2.61        ∼2.59       ∼2.62      ∼2.58       ∼2.58  ∼2.61
BR         ∼2.32        ∼2.32       ∼2.33      ∼2.29       ∼2.31  ∼2.32
ABR        ∼2.31        ∼2.31       ∼2.29      ∼2.33       ∼2.31  ∼2.30
GPML       ∼2.95        ∼2.96       ∼2.94      ∼2.92       ∼2.92  ∼2.93
LCV        ∼2.85        ∼2.84       ∼2.83      ∼2.86       ∼2.85  ∼2.84
ENCV       ∼3.01        ∼3.02       ∼2.99      ∼3.01       ∼3.02  ∼2.98
PLSR       ∼2.91        ∼2.92       ∼2.93      ∼2.91       ∼2.93  ∼2.91
SVR        ∼3.11        ∼3.10       ∼3.08      ∼3.09       ∼3.11  ∼3.10
KRR        ∼2.83        ∼2.81       ∼2.82      ∼2.81       ∼2.80  ∼2.82

In this section, we report the results obtained for the estimation of the prediction error. More in detail, given a data point 𝑥 characterized by an actual production value ℎ and a predictor ℎ(⋅), we consider the relative error of the prediction, (ℎ(𝑥) − ℎ)∕ℎ, and we aim at providing an estimate of such an error. To this end, once the various predictors described in the previous section have been generated, we build a new dataset for each predictor, where each data point is associated with its relative error; for each of these new datasets we then devise several predictors for the relative error, built using the above-mentioned techniques adopted for forecasting energy production. Once the predictors have been generated, we measure their effectiveness by considering the average absolute error of their predictions.

Unfortunately, all the predictors devised this way exhibit very poor performance, i.e., they are characterized by an average absolute error of more than 100%, and thus cannot be profitably used to provide the analyst with an estimate of the quality of the prediction for the specific data at hand. Therefore, we tried a different estimation technique, based on k-nearest neighbors, that is described in the following.
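The construction just described, pairing each sample with the relative error of an already-trained predictor and then fitting a second regressor on those pairs, can be sketched as follows. The function name and the toy data are ours (hypothetical), and `GradientBoostingRegressor` merely stands in for any of the regressors of Section 3:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def build_error_dataset(base_model, X, y):
    """Pair each sample with the relative error (h(x) - h) / h of the base
    predictor, as defined above; assumes the actual values y are non-zero."""
    rel_err = (base_model.predict(X) - y) / y
    return X, rel_err

# Hypothetical toy data standing in for one energy-source dataset;
# productions are kept strictly positive so the relative error is defined.
rng = np.random.default_rng(1)
X = rng.random((300, 8))
y = 1.0 + X @ rng.random(8)

# Train the base production predictor on one part of the data.
base = GradientBoostingRegressor(n_estimators=50).fit(X[:250], y[:250])

# New dataset: same inputs, target = relative error of the base predictor.
X_err, rel_err = build_error_dataset(base, X[250:], y[250:])

# A second regressor is then trained on (X_err, rel_err) to predict the
# error itself; its quality is judged by the mean absolute error.
err_model = GradientBoostingRegressor(n_estimators=50).fit(X_err, rel_err)
print(np.mean(np.abs(err_model.predict(X_err) - rel_err)))
```

On the real data this second-stage regressor is exactly what performs poorly (average absolute error above 100%), which motivates the k-NN alternative below.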
Table 5
Best 𝑘 value per production type.
Production type     Best 𝑘 value
Energy fossil coal  11
Energy fossil gas   7
Energy hard coal    38
Energy fossil oil   71
Energy waste        49
Energy other        10

5.3.1. A KNN based approach for estimating the prediction error

The k-nearest neighbors algorithm (k-NN) is a non-parametric approach that may be used for both classification and regression, by comparing a new sample with its 𝑘 closest neighbors in the sample space. Depending on whether k-NN is used for classification or regression, the 𝑘 closest neighbors of a sample are used in two different ways:

• The result of 𝑘-NN classification is a class membership. A sample is categorized by a majority vote of its neighbors, i.e., the sample is assigned the most common class among its 𝑘 closest neighbors.
• The result of 𝑘-NN regression of a sample is the value obtained by averaging the values of the 𝑘 closest neighbors.

In most cases, the above classification/regression strategy is slightly modified so that closer neighbors contribute more to the average than farther ones, by adopting a distance-based weighting scheme. A typical weighting scheme, for example, assigns a weight of 1∕𝑑 to each neighbor 𝑢, where 𝑑 is the distance between 𝑢 and the input sample. Neighbors are chosen from a collection of objects for which the class (in 𝑘-NN classification) or the property value (in 𝑘-NN regression) is known. Although no explicit training phase is necessary, this collection can be regarded as the algorithm's training set. The 𝑘-NN method is unique in that it is sensitive to the local structure of the data.

In our case, 𝑘-NN was used for regression. Our approach is based on 𝑘-NN but differs from Sensoy et al. (2018), as we calculated the error using the 𝑘 points closest to the query point in two different ways:

• absolute: using the 𝑘-NN technique, calculating the plain average of the errors of the 𝑘 points closest to the point for which the error is to be estimated;
• relative: to further improve the results obtained, the same tests were carried out but weighting the contribution of the points based on their distance.

Significant results were obtained on the estimation of the error of the forecasting model using k-NN (see Table 4). The obtained values are remarkably low and vary between 0.76% and 6.51% in terms of average absolute error, where the worst result is obtained for the production of fossil oil, due to the above-mentioned estimation issues characterizing this kind of production.

We point out that we devised a sensitivity analysis in order to find the optimal 𝑘 value, whose results are reported in Tables 4 and 5. The error estimation algorithm was run for values of 𝑘 ranging in the interval [1..50].

• Error estimation:

Sensoy et al. (2018) proposed a technique that works by placing a Dirichlet distribution on class probabilities. The authors treat the predictions of a neural network as subjective opinions; in this way the function that collects the evidence leading to these opinions from a deterministic neural network is learned from data. The resulting predictor for a multi-class classification problem is another Dirichlet distribution, whose parameters are set by the continuous output of a neural network. The presented approach achieves interesting performance on the detection of out-of-distribution queries and endurance against adversarial perturbations.

Deep neural networks (NNs) are powerful black-box predictors that have recently achieved impressive performance on a wide spectrum of tasks. Quantifying predictive uncertainty in NNs is a challenging and yet unsolved problem. Bayesian NNs, which learn a distribution over weights, are currently the state of the art for estimating predictive uncertainty; however, they require significant modifications to the training procedure and are computationally expensive compared to standard (non-Bayesian) NNs. In Lakshminarayanan et al. (2017), the authors propose an alternative to Bayesian NNs that is simple to implement, readily parallelizable, requires very little hyperparameter tuning, and yields high-quality predictive uncertainty estimates. Through a series of tests on classification and regression benchmarks, the presented solution yields well-calibrated uncertainty estimates that are as good as or better than approximate Bayesian NNs. To assess resilience to dataset shift, the prediction uncertainty on test cases from known and unknown distributions was evaluated, showing that the method expresses higher uncertainty in out-of-distribution situations.

• Production from renewable energy sources forecasting:

In Basurto et al. (2019), to increase the variety of solar energy and power grid combinations that might be employed, a Hybrid Intelligent System (HIS) was proposed. The presented solution was designed to predict how much energy a solar thermal system will generate. The HIS does this by employing local models (artificial neural networks) that perform both supervised and unsupervised learning (clustering). These techniques are combined and evaluated in a real-world setting in Spain; data from a complete year is used to analyze and test the various models in this case study. Using an optimal parameter fit, the proposed system estimated the solar energy output of the panel with low error in 86 percent of cases.

Xie et al. (2015) suggested several new models for forecasting China's energy production and consumption patterns under the impact of the country's energy conservation strategy. To predict the overall quantity of energy production and consumption, an optimized single-variable discrete gray forecasting model is used, while a unique Markov technique based on quadratic programming is presented to forecast the trends of energy production and consumption structures. The suggested models are used to replicate
7
S. Flesca et al. Expert Systems With Applications 200 (2022) 116936
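The absolute and relative error-estimation variants described above can be sketched as follows. This is a minimal illustration, not the authors' exact implementation: the function name, the Euclidean metric, and the 1/𝑑 weights are our assumptions.

```python
import numpy as np

def estimate_error(X_train, train_errors, x, k=5, mode="absolute", eps=1e-12):
    """Estimate the forecasting error expected at point x from the known
    errors of the k training points closest to x.

    mode="absolute": plain average of the k nearest errors.
    mode="relative": distance-weighted (1/d) average of the k nearest errors.
    """
    d = np.linalg.norm(X_train - x, axis=1)   # distances to all training points
    nn = np.argsort(d)[:k]                    # the k nearest neighbors of x
    if mode == "relative":
        w = 1.0 / (d[nn] + eps)               # closer points weigh more
        return float(np.sum(w * train_errors[nn]) / np.sum(w))
    return float(np.mean(train_errors[nn]))   # "absolute": plain average

# Toy data: per-point errors of a fitted forecasting model.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
errs = np.array([0.1, 0.2, 0.4, 0.8])
print(estimate_error(X, errs, np.array([1.2]), k=2))                   # ≈ 0.3
print(estimate_error(X, errs, np.array([1.2]), k=2, mode="relative"))  # ≈ 0.24
```

The relative variant pulls the estimate toward the error of the nearer neighbor, which is exactly the refinement the paper reports as improving the results.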
that is simple to implement, readily parallelizable, requires very little hyperparameter tuning, and is able to yield high quality predictive uncertainty estimates. Through a series of tests on classification and regression benchmarks, the presented solution yields well-calibrated uncertainty estimates that are as good as or better than those of approximate Bayesian NNs. To assess resilience to dataset shift, the prediction uncertainty on test cases from known and unknown distributions was evaluated, showing that the approach can express higher uncertainty in out-of-distribution situations.

• Production from renewable energy sources forecasting:
In Basurto et al. (2019), a Hybrid Intelligent System (HIS) was proposed to increase the variety of solar energy and power grid combinations that might be employed. The presented solution was designed to predict how much energy a solar thermal system will generate. HIS does this by employing local models (artificial neural networks) that perform both supervised and unsupervised learning (clustering). These techniques are combined and evaluated in a real-world setting in Spain: data from a complete year is used to analyze and test the various models in this case study. Using an optimal parameter fit, the proposed system estimated the solar energy output of the panel with low error in 86 percent of cases.
Xie et al. (2015) suggested several new models for forecasting China's energy production and consumption patterns under the impact of the country's energy conservation strategy. To predict the overall quantity of energy production and consumption, an optimized single-variable discrete gray forecasting model is used, while a unique Markov technique based on quadratic programming is presented to forecast the trends of energy production and consumption structures. The suggested models are used to replicate China's energy production and consumption from 2006 to 2011, as well as to estimate trends for 2015 and 2020. The results show that the suggested models can accurately simulate and anticipate total energy production and consumption quantities and structures. When compared to the regression model, the findings demonstrate that the suggested model performs somewhat better in simulating and forecasting the situation. Although China's energy consumption growth rate has slowed as a result of its energy conservation strategy, overall energy consumption and the proportions of natural gas and other energies continue to rise, while crude oil and natural gas self-sufficiency rates continue to fall.
In Li et al. (2016), artificial neural networks (ANN) and support vector regression (SVR) are evaluated for forecasting the energy output of a solar photovoltaic (PV) system in Florida 15 min, 1 h, and 24 h ahead of time. A hierarchical approach based on the machine learning techniques is presented. The production data used in this study came from 15-minute averaged power measurements taken in 2014. Error statistics such as mean bias error (MBE), mean absolute error (MAE), root mean square error (RMSE), relative MBE (rMBE), mean percentage error (MPE), and relative RMSE (rRMSE) are computed to assess the model's correctness. This research reveals how individual inverter projections might enhance the forecast of the PV system's total solar power output.

• Energy market indices:
In Wang and Wang (2016), a novel neural network architecture was established that combines a multilayer perceptron and an ERNN (Elman recurrent neural network) with a stochastic time effective function, in an attempt to enhance the prediction accuracy of crude oil price variations. ERNN is a time-varying predictive control system that was designed to remember recent occurrences in order to forecast future output. According to the stochastic time effective function, current information has a greater impact on investors than older information. The empirical research performed well in assessing the prediction impact on four time series indices using the developed model. In comparison to previous models, this one can assess data from the 1990s to 2016 with notable precision and speed. The applied CID (complexity invariant distance) analysis and multiscale CID analysis are presented as new helpful methods to assess whether the suggested model has a superior forecasting capacity compared to other standard models.
Predicting crude oil prices is a difficult challenge. The study presented in Wang et al. (2018) offers the DFN-AI model, a unique hybrid technique that combines an integrated data fluctuation network (DFN) with multiple artificial intelligence (AI) methods to enhance forecasting. A complex network time series analysis technique is used as a preprocessor for the original data to extract the fluctuation features and reconstruct the data; then an artificial intelligence tool, such as a BPNN, RBFNN, or ELM, is used to model the reconstructed data and predict future data in the proposed DFN-AI model. The authors investigated daily, weekly, and monthly pricing data from the crude oil trading hub in Cushing, Oklahoma, to confirm these findings. The suggested DFN-AI models (i.e., DFN-BP, DFN-RBF, and DFN-ELM) outperform their equivalent single AI models in both the direction and level of prediction, according to empirical data. This demonstrates the efficacy of the suggested modeling of crude oil prices' nonlinear tendencies. Furthermore, the presented DFN-AI techniques are resilient and trustworthy, and are unaffected by random sample selection, sample frequency, or sample structure breakdowns.

• Electric load forecasting:
Short-term load forecasting, long-term load forecasting, geographical load forecasting, electricity price forecasting, demand response forecasting, and renewable generation forecasting are all examples of energy forecasting in the utility business. Because of storage limitations and the societal demand for power, energy forecasting has numerous intriguing aspects, such as complicated seasonal patterns, 24/7 data collection across the system, and the requirement to be highly precise. In Hong (2014), an effective forecast of electric load was provided, this being in fact a crucial issue in the electric power industry. The energy forecasting sector, including demand response forecasting and renewable generation forecasting, faces new problems as a result of smart grid investment and technology; in the smart grid age, century-old energy forecasting gets a fresh lease on life. Electric load forecasting aims at predicting the expected electricity demand at aggregated levels. Traditionally, electric load forecasting techniques yield as their result the expected value of the electric load, but recently the field of probabilistic electric load forecasting (PLF) has gained relevance, which aims at yielding predictions in terms of quantiles, intervals, or density functions. Since the beginning of the electric power sector, load forecasting has been a basic commercial concern, and both research and commercial practices in this field have largely concentrated on point load forecasting during the last 100 years or more. However, in the last decade, rising market competition, aging infrastructure, and renewable integration requirements have made probabilistic load forecasting an increasingly crucial part of energy system planning and management. In Hong and Fan (2016), an overview of probabilistic electric load forecasting, covering key approaches, methodologies, and assessment methods, as well as frequent misconceptions, was provided. The authors stressed the importance of investing in additional research, such as repeatable case studies, probabilistic load prediction evaluation and valuation, and a probabilistic load forecasting methodology that takes into account new technologies and energy regulations.
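To make the quantile-based outputs of probabilistic load forecasting concrete, a single quantile forecast is commonly scored with the pinball (quantile) loss. The sketch below is a generic illustration of that loss, not a method drawn from the surveyed papers:

```python
def pinball_loss(y_true, y_pred, q):
    """Pinball (quantile) loss for one observation: under-prediction is
    penalized with weight q, over-prediction with weight 1 - q."""
    diff = y_true - y_pred
    return q * diff if diff >= 0 else (q - 1.0) * diff

# Scoring a 0.9-quantile load forecast: under-forecasting by 10 units
# costs nine times more than over-forecasting by the same amount.
print(pinball_loss(100.0, 90.0, 0.9))   # -> 9.0 (under-forecast by 10)
print(pinball_loss(100.0, 110.0, 0.9))  # ≈ 1.0 (over-forecast by 10)
```

Averaging this loss over many observations and quantiles gives the pinball score widely used to compare probabilistic load forecasts.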
7. Conclusions and future work

In this work we addressed the problem of devising effective predictions of the amount of electrical power produced using non-renewable sources, together with a point-wise estimate of the quality of the predictions. The forecasts obtained with the various models considered in this work are of good quality (except for the production using fossil oil) and would thus be very useful for giving indications to energy market traders in order to predict energy consumption or prices in advance.
In the future, it will be interesting to consider the possibility of improving the prediction for the case of fossil oil by augmenting the input data used for devising predictions in this work with new kinds of data, such as data extracted from financial newspapers, in order to identify the political and non-political trends that typically have a strong influence on such a production.

CRediT authorship contribution statement

Sergio Flesca: Conceptualization, Methodology, Software. Francesco Scala: Conceptualization, Methodology, Software. Eugenio Vocaturo: Conceptualization, Methodology, Software. Francesco Zumpano: Conceptualization, Methodology, Software.
Sensoy, M., Kaplan, L. M., & Kandemir, M. (2018). Evidential deep learning to quantify classification uncertainty. In Advances in neural information processing systems 31: Annual conference on neural information processing systems 2018 (pp. 3183–3193).
Wang, J., & Wang, J. (2016). Forecasting energy market indices with recurrent neural networks: Case study of crude oil price fluctuations. Energy, 102, 365–374.
Wang, M., Zhao, L., Du, R., Wang, C., Chen, L., Tian, L., & Eugene Stanley, H. (2018). A novel hybrid method of forecasting crude oil prices using complex network science and artificial intelligence algorithms. Applied Energy, 220, 480–495.
Xie, N.-M., qing Yuan, C., & jie Yang, Y. (2015). Forecasting China's energy demand and self-sufficiency rate by grey forecasting model and Markov model. International Journal of Electrical Power & Energy Systems, 66, 1–8.
Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society. Series B. Statistical Methodology, 67(2), 301–320.