Professional Documents
Culture Documents
1 s2.0 S2352484722005078 Main
1 s2.0 S2352484722005078 Main
1 s2.0 S2352484722005078 Main
Energy Reports
journal homepage: www.elsevier.com/locate/egyr
Research paper
article info a b s t r a c t
Article history: This paper presents a novel approach to forecast day-ahead electricity consumption for residential
Received 8 October 2021 households where highly irregular human behaviour plays a significant role. The methodology requires
Received in revised form 11 February 2022 data from fiscal smart meters, which makes it applicable to real scenarios where personal data gather-
Accepted 21 February 2022
ing is not feasible. These data are rarely complete; therefore, a robust combination of machine-learning
Available online xxxx
techniques is used to handle missing data and outliers. The novelty of this method relies on identifying
Keywords: and predicting user electricity consumption behaviour as a procedure to improve the forecasting of the
Electricity consumption overall electricity consumption of each individual customer. The methodology uses Gaussian mixture
One-day-ahead forecast clustering to identify behaviour clusters and an eXtreme Gradient Boosting classification (XGBoost)
Behaviour pattern model to predict the day-ahead behaviour pattern. This predicted user behaviour cluster is fed into
Smart meter
an Artificial Neural Network (ANN) to enable an improved capturing of the highly unpredictable user
Big data
conduct for the forecast of electricity consumption. A novel metric, namely the Euclidean Distance-based
Accuracy (EDA), is finally proposed to enable a more thorough evaluation of time series classification
algorithms. The whole development is tested over 500 residential users placed in a southeastern region
of Spain. The results showed that, when the novel approach was used, the MAPEd and NRMSEd were
reduced by 7% and 9% respectively, increasing to a 20% and 17% respective reduction for the best cases
according to EDA. This methodology sets the basis for massive user-centred analyses, very profitable
to any electricity company.
© 2022 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license
(http://creativecommons.org/licenses/by/4.0/).
https://doi.org/10.1016/j.egyr.2022.02.260
2352-4847/© 2022 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
F. Lazzari, G. Mor, J. Cipriano et al. Energy Reports 8 (2022) 3680–3691
on the process of people’s decision making, which has an inherent 2. Literature review
stochastic nature (Guo et al., 2021). In forecasting electricity
consumption at the aggregated level, the variability of each cus- Various probabilistic data-driven models have been developed
tomer is balanced, and the user behaviour effect is attenuated. to deal with the energy behaviour of household appliances. Zhao
However, when developing individual or disaggregated models, et al. (2014) used office appliances consumption data from an of-
the behaviour component of each customer gains importance, fice building to obtain insights into occupant behaviour. Sancho-
leading to more significant variability in the energy usage pattern Tomás et al. (2017) proposed a stochastic modelling of the energy
of the customers. Consequently, at the aggregated level, the at- use of small appliances and categorised them in four groups:
tenuation of the user behaviour effect entails significantly smaller audio–visual, computing, kitchen and other appliances. Zhang and
prediction errors than at the individual level. Guo (2020) presented an ensemble method to forecast the hourly
Another challenge when dealing with individual electricity electricity consumption using data from manually-operated de-
consumption forecasting is related to data gathering and data ac- vices (such as dishwashers, heat pumps, televisions). In Lan et al.
cess. Many previous research works used sub-metered data from (2014) thousands of electricity consumption profiles were gen-
appliance consumption or outcomes from customers’ surveys to erated from combinations of activation moments from different
forecast individual electricity consumption. However, this kind of domestic appliances. Ferrandez-Pastor et al. (2017) presented the
data is difficult to obtain when working in real business scenarios design of embedded hardware components able to disaggregate
with a large number of residential households (Jiang et al., 2020). household power consumption. Chou and Tran (2018) used data
For example, gathering daily usage schedules filled in by users is collected from equipment installed in an experimental build-
completely infeasible when the number of customers is signifi- ing to evaluate the effectiveness of different machine learning
cant. Moreover, obtaining access to this type of personal data is techniques. And Haq et al. (2020) used electrical appliance con-
hardly possible in practice due to data privacy concerns. When sumption data (fridge, oven, hair dryer, lighting, air conditioner,
data comes from domestic appliances or home energy systems, laptop, water heater, television, iron, cloth dryer) of 10 household
although their market penetration is rapidly increasing, deploy- to forecast energy consumption.
ing and installing hardware components is still a complex task Other studies used the outcomes of occupants’ surveys to
when conducting a massive implementation. These are the main strengthen their models. Aerts et al. (2014) developed a prob-
reasons why our research is oriented towards using data from abilistic model to identify typical occupancy patterns based on
fiscal smart meters, enhanced with local weather data, which can a time-of-use survey. Liisberg et al. (2016) used Hidden Markov
be obtained through bilateral agreements with electricity trader Models combined with surveys to characterise occupant be-
companies and weather-forecast web services.
haviour in a residential building block. Palacios-Garcia et al.
Analysing data coming from fiscal smart meters requires ro-
(2018) presented a bottom-up stochastic model for electricity
bust methods to handle missing data and outliers since they are
demand of heating and cooling appliances. They stated that the
frequently present in the data sets. In previous research works,
occupancy profile is the cornerstone of the model, emphasising
this issue was not addressed in detail since most of them did not
that energy consumption in the residential sector is strongly
deal with real data coming from the electricity system. One of
related to people’s activity. They fed their model with information
the novelties of this paper is a contribution to fill this gap by
gathered in a diary, which had to be completed by people with
developing an innovative combination of clustering techniques,
a frequency of 10 min and had to include the daily activities
classification procedures and machine-learning-based forecasting
performed.
models, which are capable of achieving a smooth management of
All these mentioned authors used data from sub-metering
these data inconsistencies.
systems able to measure the electricity consumption of domestic
The newly developed approach combines a behaviour model
appliances in residential households. Some of them also used the
prediction with an electricity consumption forecasting model for
individual users. The objective is to forecast hourly, one-day outcomes from customers’ surveys to improve their prediction
ahead electricity consumption for a large number of residen- models. However, these kind of data are difficult to obtain when
tial households (without aggregation), taking into account the working under real scenarios where data from sub-metering sys-
significant influence of the stochastic part of user behaviour. tems are unavailable. In this case, fiscal smart meters become
It starts with Gaussian mixture clustering to identify behaviour a valuable data source; here there is only a single measure of
clusters, followed by the implementation of an XGBoost classi- the overall household electricity consumption. Many researchers
fication model to predict the day-ahead behaviour cluster. This focused their activity on obtaining electricity consumption pat-
predicted user behaviour pattern becomes the main input for terns and perform customers’ segmentation by using these data
an individual electricity consumption prediction model, based on in clustering algorithms (Al-Otaibi et al., 2016; Al Khafaf et al.,
Artificial Neural Network (ANN) techniques, and enables a more 2021; López et al., 2011). Extracting typical consumption pat-
precise capturing of the stochastic component of the behaviour. terns helped electricity Distribution System Operators (DSOs) or
The use of a behaviour model, based on time series classifi- electricity retailers to develop innovative tariff structures aim-
cation methods, tends to make the interpretation of its accuracy ing at efficient consumption profiling, as stated by Faria et al.
a difficult task. The usual binary metrics for time series accuracy (2016), Chicco et al. (2004), Melzi et al. (2017). In Aghabozorgi
assessment do not provide full understanding when dealing with et al. (2015), a comprehensive overview of available time series
daily consumption patterns, which is the basic element of classifi- clustering methods and their applications was performed.
cation in our case. Therefore, a new metric, namely the Euclidean Beyond energy behaviour characterisation, day-ahead electric-
Distance-based Accuracy (EDA), is proposed in this research to ity consumption forecasting of residential users has also been
enable a more thorough evaluation and quantification of time intensively analysed in recent years. Detailed reviews of (Yildiz
series classification algorithms. This constitutes and important et al., 2017) and van der Meer et al. (2018) showed that when
contribution of the paper. data comes exclusively from fiscal smart meters, probabilistic
The combined approach, supported by the new developed data-driven models are the most common approach and that
metric aims at establishing the basis for massive user-centred ANNs have become predominant among these techniques. Chou
analyses in the electricity sector. These kind of enriched data- and Tran (2018) did a comprehensive study of the use of dif-
driven methods are expected to be very profitable for electricity ferent data-driven consumption forecasting models based on en-
network operators, demand response aggregators, or electricity- ergy consumption time series of residential households, con-
trading companies in their end-user-based service portfolios. cluding again that the best performing data-driven model were
3681
F. Lazzari, G. Mor, J. Cipriano et al. Energy Reports 8 (2022) 3680–3691
the ANNs. The reason for the good performance of ANNs, when The whole methodology was successfully tested over an ex-
being used to predict electricity consumption, is their ability to tensive set of households in a southeastern region of Spain. Al-
capture non-linear relationships between input data and target though several preceding studies present forecasting models, few
parameters (Mena et al., 2014; Li et al., 2015). of them test their procedures under real massive scenarios. For
Nevertheless, all the indicated data-driven models, includ- this case, despite the significant problem generated by the great
ing the ANNs, have some limitations in capturing the erratic, amount of missing data and outliers (since smart meter readings
but highly periodic, energy habits of residential customers when are rarely complete), the presented methodology proved to work
the only available data are the electricity consumption, the cal- correctly. It is important to remark that these good results were
endar information and the external weather data. These data obtained even when 27.5% of the testing data set were incomplete
sets contain information of intra day user performance, climate days for the presented case study. This percentage is representa-
dependencies and also weekly, daily and hourly dependencies. tive of a real data corpus; which guarantees that the methodology
However, there is no way to directly infer the user behaviour can be used in real cases. Moreover, the development was im-
which is not related to these seasonal dependencies. If a customer plemented with a focus on avoiding costly computational tasks,
decides to turn on the washing machine twice a week when making it work within a reasonable processing time, even for
they get home, an ANN or regression based model cannot infer large amounts of users. Thus, supporting the fact that it is very
this periodic behaviour with these data inputs. It is necessary to suitable for real business scenarios.
include, at least one new exogenous variable which may have In addition, the behaviour model allows for a better under-
a direct or indirect link with this user behaviour. In previous standing of each customer, because an importance feature anal-
research work (Mor et al., 2021; Grillone et al., 2021; Yildiz et al., ysis can be done to gain knowledge of which are the most
2018), it was found that a good procedure to define this new significant features that determine each user’s daily behaviour.
input is to find the electricity consumption daily profile that is This could be a very useful tool for electricity trader companies.
most likely to occur in the 24 h forecasting horizon, by using a Finally, analysing the literature, a knowledge gap was found
combination of clustering techniques, daily feature extraction and regarding metrics to evaluate the prediction accuracy of time
machine learning methods. This is the procedure followed in this series forecasting models when using classification techniques.
research paper and constitutes a significant improvement in rela- Therefore, a novel performance metric is presented, namely the
tion to previously mentioned electricity consumption forecasting Euclidean Distance-based Accuracy (EDA). This new metric is ca-
methodologies. This strategy is built on the foundations that pable of quantifying the amount by which incorrect predictions
human behaviour repeatedly obeys certain daily routines, which have failed, which could be very helpful for researchers look-
should be evidenced in similarities in their electrical consumption ing to evaluate the accuracy of new classification models ap-
curves (Hsiao, 2015). plied to electricity consumption time series or those with similar
structures.
3. Main contributions
4. Case study
The main contribution of this research work relies on de-
The case study comprises smart meter and weather readings
veloping and testing a combination of behaviour and electricity
from 500 households. The electricity data set comes from hourly
consumption modelling to enrich the residential electricity con-
fiscal meter readings and was pseudo anonymised to avoid data
sumption forecasting and its accuracy assessment within large-
privacy issues. The data sets contain information generated for a
scale data-based computing procedures based on smart metering
year (2018). All the households are from a southeastern area of
data.
Spain.
The research develops a day-ahead forecasting workflow that
27.5% of the electricity consumption corpus were incomplete
inserts information captured by a daily behaviour model into an
days. To evaluate the whole procedure, the first 75% of the data
hourly-based electricity consumption model. The methodology
corpus were selected as historical data to feed a training stage.
starts by clustering the electricity consumption days, identifying
The 25% remaining days were used as new data to be injected into
a set of characteristic electricity consumption patterns that will
a day-ahead forecasting stage. This splitting of the data corpus
determine the typical behaviours for each user. Then, the XGBoost
was done for each user. Each entry consists of:
framework is used to forecast which one of the set of charac-
teristic behaviour patterns will each user perform the following • Electricity consumption (denoted as y(d,h) ):
day. The XGBoost uses, as inputs, extracted features of the present
and previous days. Both the extracted features of the present day – h: hour (0 ≤ h ≤ 23).
and the predicted day-ahead pattern are finally included as inputs – d: date (d = dd/mm/yyyy).
into the hourly electricity consumption forecasting model (ANN).
• External weather conditions:
This novel approach has proven to improve the accuracy of
ANN models to predict electricity consumption of residential – outdoor temperature. Obtained from a weather web
customers at hourly data granularity, overcoming the barriers service of the company Dark Sky (Anon, 0000).
imposed by the highly erratic human behaviour. Although ANN
techniques have been proven to achieve very good results, they 5. Methodology
are not capable of capturing user behaviour without including
it as an explicit variable. The novel strategy provides an alter- 5.1. Overview
native approach to precisely determine the electricity load of a
large amount of residential users at individual level. An accurate To evaluate the improvement introduced, this study compares
comparison has been carried out, in which the most widely used the accuracy results of an hourly consumption forecasting with
hourly consumption forecasting model (ANN) is evaluated with and without including the novel proposed behaviour feature. For
and without including the behaviour model. The results show this purpose, the analysis is done comparing the results of two
a considerable improvement introduced when using the novel models. The first model consists of the ANN trained to forecast
behaviour model. hourly electricity consumption using the 24 hourly lags from the
3682
F. Lazzari, G. Mor, J. Cipriano et al. Energy Reports 8 (2022) 3680–3691
5.3. Clustering as the classification criteria due to its simplicity. Eq. (5) shows its
mathematical expression.
The outcome of the clustering technique is a set of groups
23
containing similar daily consumption patterns which are identi- s′
∑ ( ′ )2
fied from the training data set. Clustering is applied to each user l(yd , c ) = √ y(d,h) − chs (5)
separately. Therefore, households can differ from each other in h=0
′
the number and shape of the daily consumption clusters. s = arg min l(yd , cs ), (6)
A distribution-based clustering technique is used, in particular, s′ ∈{1...G}
the Gaussian mixture clustering is chosen. To understand how where yd is the vector containing the 24 consumption readings
this technique works, let A = {y1 , . . . , yd , . . . , yn } be a sample of ′
for the day d and cs is the vector containing the 24 obtained
n independent identically distributed daily electrical consump- values for the characteristic curve of each cluster s′ . The new
tion measurements. In this case, for example, the element y1 is day is classified into the cluster s that minimises the Euclidean
the electrical consumption for day 1 with its hourly values in distance l.
the 24 corresponding variables (each hour h is a variable). That Classification was chosen to replace clustering in the forecast-
is to say, the element y1 is a vector of the following type, y1 = ing stage because it is faster. However, to avoid losing the new
(y(1,0) , . . . , y(1,h) , . . . , y(1,23) ), where each component represents a information added every day to the process, such as observed in
reading of the smart meter. Assuming that the elements follow similar methodologies (Yildiz et al., 2018), this data is considered
a finite mixture of a number G of Gaussian probability densities, to redefine the characteristic curves of the clusters. This every
the function f representing the distribution in the 24-dimensional day redefinition of the characteristic patterns allows the model
space (length of variable h) is defined in Eq. (3). to cope with changes in typical consumption habits. In this way,
G the model will take only some days to adapt to the new pattern,
which will depend on how big the difference in consumption is,
∑
f (yd ; Ψ) = πk Nk (yd ; µk , Σk ), (3)
but it will gradually reshape the clusters without the need of clus-
k=1
tering again. Many causes can generate changes in consumption
where Ψ = {π1 , . . . , πG , µ1 , . . . , µG , Σ1 , . . . , ΣG } are the param- patterns (these could be: Covid, having a new baby, retirement,
eters of the mixture model, Nk (yd ; µk , Σk ) is the kth Gaussian etc.), so it is important that the methodology is prepared to deal
density component with parameter vector (µk , Σk ), evaluated at with this aspect.
element yd , and (π∑ 1 , . . . , πG ) are the mixing weights or probabil-
G
ities (with πk > 0, k=1 πk = 1), and G is the number of clusters. 5.5. Feature extraction
Assuming G is fixed, the Ψ parameters must be estimated. Since
the component densities are Gaussian, the clusters are ellipsoidal This step consists of selecting and modifying the time series to
and centred at the mean vector µk . They have geometric fea- give rise to some characteristic features that will later serve as in-
tures such as volume, shape and orientation, determined by the put variables for the prediction model. Feature extraction reduces
covariance matrix Σk . dimensionality and highlights aspects of the time series, allowing
In order to find the parameters of the mixture model (Ψ) and a better understanding of the dependence of user behaviour with
the number of clusters (G) which best fit the model described by different features obtained from the time series.
Eq. (3), the R package Mclust was used. This package provides Daylight imposes a natural rhythm on human behaviour, there-
libraries to estimate the parameters of the multi-dimensional fore, to extract daily related features is an obvious choice. Apart
Gaussian probability distributions which best fit the input data from the consumption and weather data which is extracted from
set. The way of achieving this is via the expectation–maximisation the data corpus, calendar features were added; including calendar
algorithm. information has proven to enhance electricity forecasting mod-
In this study, the boundaries for the number of clusters were els (Ramos et al., 2020). Therefore, every day is characterised
set to 2 ≤ G ≤ 10. The optimum total amount of clusters was by:
selected by using the Integrated Completed Likelihood (ICL) crite-
• consumption daily statistics. Made up of quantiles 10 and
rion and the model fit was done using the Bayesian Information
90, standard deviation, mean daily consumption, mean con-
Criterion (BIC).
sumption during the night (00:00–05:00), mean consump-
The daily characteristic curve chs of each cluster s at hour
tion during the morning (06:00–12:00), mean consumption
h, is defined in Eq. (4). Fig. 4 shows an example of the set of during the afternoon (13:00–19:00) and mean consumption
characteristic curves obtained for a particular user. during the evening (20:00–23:00).
ns • weather daily statistics. Made up of: mean temperature,
1 ∑
chs = ys(d,h) , (4) maximum temperature and minimum temperature.
ns
d=1 • calendar. Made up of: weekday or holiday; summer season,
where ns is the number of days belonging to cluster s and ys(d,h) winter season or midseason (merges autumn and spring).
is the daily normalised consumption data of day d at hour h ∈ Once the features are extracted, they are transformed using
{0, . . . , 23}, clustered in cluster s. When applied to this case base splines as a strategy to make them dimensionless. Creat-
study, the mean number of clusters obtained over all the users ing dimensionless features is an important task when the input
was 8, with standard, minimum and maximum deviation of 2, 4 variables to the model represent different physical magnitudes
and 10 respectively. that range between significantly different limits. Each identified
feature is transformed using base splines B(x) of 3 knots that
5.4. Classification satisfy the recursive expression of Cox–de Boor (Eq. (7)).
{
Once the characteristic electricity daily consumption clusters 1 for ti ≤ x < ti+1
Bi,0 (x) =
of each customer are identified, our aim is to classify a given 0 otherwise
(7)
day in a pre-defined cluster. Since the present research deals
x−ti ti+p+1 −x
with large data volumes, the Euclidean distance l was selected Bi,p (x) = ti+p −ti
Bi,p−1 (x) + ti+p+1 −ti+1
Bi+1,p−1 (x)
3684
F. Lazzari, G. Mor, J. Cipriano et al. Energy Reports 8 (2022) 3680–3691
Fig. 4. Characteristic curves for each of the 6 clusters obtained (in red). Normalised curves (in black). (For interpretation of the references to colour in this figure
legend, the reader is referred to the web version of this article.)
where i = {1, 2, 3} are the knots, p the polynomial order and ti prediction model is trained only with complete days. The XGBoost
the specified boundary and interior knots. The boundary knots are package is used since its implementation supports automatic
chosen as the minimum and maximum values of the feature, and handling of missing data (Chen and Guestrin, 2016). Considering
the interior knot corresponds to its mode. This way of creating the high amount of incomplete days (27.5%), this aspect was of
dimensionless inputs works better than simply normalising each great importance when choosing among different algorithms.
feature because, in this case (by choosing the mode as the interior XGBoost is an efficient implementation of gradient boosting,
knot), the most recurrent values are put together. typically used with decision trees. The boosting algorithm is an
Base splines of order p = 1 were chosen. In this way, piece- ensemble learning method that combines weak classifiers (such
wise linear polynomials are obtained. They take values within as the decision trees chosen in this study) to achieve a strong final
the interval [0, 1] in the range defined by the knots, and zero classifier. This is done by iteratively training weak classifiers and
values out of this range. The transformation of the variables into adding them. When they are added, they are weighted in a way
base splines is a way of stratifying the range of values. Once that is related to the weak learners’ accuracy. Each decision tree
all the variables have been transformed into base splines, the learns from the previous one and affects the next tree to improve
characterisation becomes dimensionless. the model (Friedman, 2001). Gradient boosting models have been
Fig. 5 shows an example for the feature night. In Fig. 5(a), compared with other machine learning algorithms (such as K-
the time series of the variable is plotted, Fig. 5(b) shows its Nearest Neighbour and Support Vector Machines), and it has been
histogram and Fig. 5(c) shows the B-splines. In all the graphs, found that the boosting technique outperforms these in terms of
the mode, which is used as the interior knot for the B-spline accuracy and robustness (Bennett et al., 2007; Yu et al., 2010).
calculation, is represented by a dotted line. B-spline1 represents Sepulveda et al. (Sepulveda et al., 2021) show a methodology in
the values below the mode, B-spline2 the values near the mode, which gradient boosting is used to predict energy consumption
and B-spline3 the values above the mode. in a massive scenario.
Finally, all the features are lagged so that each day entry Eq. (8) describes a tree ensemble model which predicts the
contains the information from the previous ones. Lags are defined dependent variable sˆd using the independent variables xd and J
as the values the features and their base spline transformation additive functions.
took the previous 1, 2, 3 and 7 days. J
∑
sˆd = fj (xd ) (8)
5.6. Daily behaviour model
j=1
In this step, we are interested in forecasting the day-ahead In this research, sˆd is the predicted behaviour cluster for day
behavioural cluster for each user. Depending on the data avail- d, fj represents an independent tree structure with leaf scores
able regarding the previous days, two day-states are defined: in the space of trees, and xd is the vector of independent vari-
complete and incomplete. In a training stage, the behavioural ables, which, in this study, is composed of the lagged features
3685
F. Lazzari, G. Mor, J. Cipriano et al. Energy Reports 8 (2022) 3680–3691
Table 1
ANN characteristics.
Parameter Default value
Number of hidden layers 1
Number of neurons in hidden layer 100
3686
F. Lazzari, G. Mor, J. Cipriano et al. Energy Reports 8 (2022) 3680–3691
√ Table 2
∑23
1 ϵ
h=0 ( (d,h) )
2 Comparison of performance metrics.
NRMSEd = 100 × %, (13) Model MAPEd (%) NRMSEd (%)
ȳ(d,h) n
Consumption model 63.4 56.4
where n = 24, ϵ(d,h) = ŷ(d,h) − y(d,h) , ŷ(d,h) is the forecasted Consumption model + Behaviour model 56.2 47.0
consumption and ȳ(d,h) is the mean daily consumption.
Table 5
Table 3 Incorrectly assigned (average EDA = 2.11).
Perfectly assigned (EDA = 0). Model (for incorrectly assigned) MAPEd (%) NRMSEd (%)
Model (for perfectly assigned) MAPEd (%) NRMSEd (%) Consumption model 78.0 59.9
Consumption model 66.0 54.8 Consumption model + Behaviour model 63.8 59.1
Consumption model + Behaviour model 46.1 38.1
Fig. 9. Aggregated feature importance of the models obtained for 217 selected
users which obtained EDA ≤ 1.
7. Discussion
that the model will not be able to predict unexpected scheduling The comparison was performed between an XGBoost plus ANN,
changes. This is the greatest limitation of the presented model. and one considering only an ANN. The results showed that the
Once again, the reason is that human behaviour has an inherent averaged MAPEd and NRMSEd were reduced by 7% and 9% respec-
stochastic nature which makes it highly unpredictable. A possible tively. Moreover, for the perfect days (EDA = 0), the averaged
solution to this problem would be including human scheduling MAPEd was reduced by 20% and the NRMSEd was reduced by 17%.
through internet communication, which would introduce an alert In addition, the behaviour model allows for an importance
in the model when changes are expected to occur. However, im- analysis using SHAP. Therefore, this analysis was carried out for
plementing this type of communication framework in a massive the models corresponding to selected users. The calendar vari-
scenario would add considerable complexity to the methodology ables (week and season) showed to be the most important ones,
and would require total commitment from users. followed by the daily maximum and minimum temperatures and
Direct comparisons should not be taken as strict benchmarks the afternoon (13:00–19:00) mean consumption. Regarding the
due to possible differences in electricity consumption data sets, day lags, the decreasing order of importance was: lag 3, lag 2, lag
climate conditions, etc. However, they can serve as one more 7 and lag 1. This shows a non-ordered component of behaviour
point of comparison. In this sense, when comparing with sim- memory which is better modelled by daily-features extraction
ilar studies in residential households (Yildiz et al., 2018), it is than by Markov chain models. This kind of analysis could be very
observed that the mean NRMSEd is around 60%, showing that useful for electricity trader companies, because it would allow
NRMSEd = 38% for the perfectly assigned group represents a them a better understanding of each customer.
significant improvement. This reinforces the good performance The presented methodology is promising for practical applica-
of the presented methodology. Another interesting result is that tions, and electricity companies could use this tool for a detailed
MAPEd and NRMSEd increased with EDA in the Consumption + understanding of each separate household to provide additional
Behaviour model. This shows that the model presented improves and ad-hoc services. Furthermore, a future trend is to develop
when the user behaviour is correctly predicted. Which reinforces R packages implementing the novel methods, thus providing a
the importance of taking into account the behaviour aspect. means to spread this research across the industrial and private
Finally, analysing Fig. 7(a), it is evident that both curves fail electricity community.
in predicting the magnitude of the real consumption’s peaks.
The Consumption + Behaviour model overestimates the first daily
CRediT authorship contribution statement
peak and underestimates the second. On the other hand, the
Consumption model is not even able to predict that two daily
Florencia Lazzari: Conception and design of study, Analysis
peaks will occur. Fig. 7(b) shows that even though both models
and/or interpretation of data, Writing – original draft, Writing –
could predict the time of the day when the significant peaks
review & editing. Gerard Mor: Conception and design of study,
occurred, the Consumption + Behaviour model replicated best.
Analysis and/or interpretation of data, Writing – original draft.
On the whole, from observing both figures, it can be appreciated
Jordi Cipriano: Acquisition of data, Analysis and/or interpretation
that the overall daily trend (meaning the moment when the peaks
of data, Writing – original draft, Writing – review & editing. Eloi
and the valleys occur) is better reproduced by the Consumption
Gabaldon: Acquisition of data. Benedetto Grillone: Acquisition of
+ Behaviour model.
data. Daniel Chemisana: Acquisition of data. Francesc Solsona:
To conclude, this methodology sets the basis for massive
Analysis and/or interpretation of data, Writing – original draft,
user-centred analyses, which may be very useful for electricity
network operators, demand-response aggregators or electricity- Writing – review & editing.
trading companies. Providing them with better disaggregated
predicting tools will allow the development of individualised Declaration of competing interest
control strategies. This, added to the active participation of resi-
dential consumers, will lead to significant energy savings. The authors declare that they have no known competing finan-
cial interests or personal relationships that could have appeared
8. Conclusions and future work to influence the work reported in this paper.
3690
F. Lazzari, G. Mor, J. Cipriano et al. Energy Reports 8 (2022) 3680–3691
Al Khafaf, N., Jalili, M., Sokolowski, P., 2021. A novel clustering index to find opti- Liisberg, J., Müller, J.K., Bloem, H., Cipriano, J., Mor, G., Madsen, H., 2016. Hidden
mal clusters size with application to segmentation of energy consumers. IEEE Markov models for indirect classification of occupant behaviour. Sustainable
Trans. Ind. Inf. 17 (1), 346–355. http://dx.doi.org/10.1109/TII.2020.2987320, Cities Soc. 27, 83–98.
[Online]. Available: https://ieeexplore.ieee.org/document/9072418/. López, J.J., Aguado, J.A., Martín, F., Muñoz, F., Rodríguez, A., Ruiz, J.E., 2011.
Al-Otaibi, R., Jin, N., Wilcox, T., Flach, P., 2016. Feature construction and Hopfield–K-Means clustering algorithm: A proposal for the segmentation
calibration for clustering daily load curves from smart-meter data. IEEE of electricity customers. Electr. Power Syst. Res. 81 (2), 716–724. http://dx.
Trans. Ind. Inf. 12 (2), 645–654. http://dx.doi.org/10.1109/TII.2016.2528819, doi.org/10.1016/j.epsr.2010.10.036, 00026. [Online]. Available: http://www.
00005. sciencedirect.com/science/article/pii/S0378779610002713.
Aman, S., Frincu, M., Chelmis, C., Noor, M., Simmhan, Y., Prasanna, V.K., Lundberg, S.M., Lee, S.-I., 2017. A unified approach to interpreting model
2015. Prediction models for dynamic demand response: Requirements, chal- predictions. In: Advances in Neural Information Processing Systems. pp.
lenges, and insights. In: 2015 IEEE International Conference on Smart Grid 4765–4774.
Communications (SmartGridComm). IEEE, pp. 338–343. Melzi, F.N., Same, A., Zayani, M.H., Oukhellou, L., 2017. A dedicated mixture
Anon, 0000. Dark Sky, [Online]. Available: https://darksky.net/dev. model for clustering smart meter data: Identification and analysis of electric-
Bennett, J., Lanning, S., et al., 2007. The netflix prize. In: Proceedings of KDD ity consumption behaviors. Energies 10 (10), 1446. http://dx.doi.org/10.3390/
Cup and Workshop, vol. 2007. New York, p. 35. en10101446, [Online]. Available: https://www.mdpi.com/1996-1073/10/10/
Bian, H., Zhong, Y., Sun, J., Shi, F., 2020. Study on power consumption load 1446.
forecast based on k-means clustering and fcm–bp model. Energy Rep. 6, Mena, R., Rodríguez, F., Castilla, M., Arahal, M., 2014. A prediction model
693–700. based on neural networks for the energy consumption of a bioclimatic
Bischl, B., Lang, M., Kotthoff, L., Schiffner, J., Richter, J., Studerus, E., Casalic- building. Energy Build. 82, 142–155. http://dx.doi.org/10.1016/j.enbuild.
chio, G., Jones, Z.M., 2016. mlr: MAchine learning in R. J. Mach. Learn. Res. 2014.06.052, [Online]. Available: https://linkinghub.elsevier.com/retrieve/pii/
17, 1–5. S0378778814005349.
Chen, T., Guestrin, C., 2016. XGBoost: A Scalable tree boosting system. Pro- Mor, G., Cipriano, J., Martirano, G., Pignatelli, F., Lodi, C., Lazzari, F., Grillone, B.,
ceedings of the 22nd ACM SIGKDD International Conference on Knowledge Chemisana, D., 2021. A data-driven method for unsupervised electricity
Discovery and Data Mining - KDD ’16 785–794. http://dx.doi.org/10.1145/ consumption characterisation at the district level and beyond. Energy Rep.
2939672.2939785, [Online]. Available: http://arxiv.org/abs/1603.02754. 7, 5667–5684.
Chicco, G., Napoli, R., Piglione, F., Postolache, P., Scutariu, M., Toader, C., Palacios-Garcia, E., Moreno-Munoz, A., Santiago, I., Flores-Arias, J., Bellido-
2004. Load pattern-based classification of electricity customers. IEEE Outeirino, F., Moreno-Garcia, I., 2018. A stochastic modelling and simulation
Trans. Power Syst. 19 (2), 1232–1239. http://dx.doi.org/10.1109/TPWRS.2004. approach to heating and cooling electricity consumption in the residential
826810, 00117. sector. Energy 144, 1080–1091.
Chou, J.-S., Tran, D.-S., 2018. Forecasting energy consumption time series Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O.,
using machine learning techniques based on usage patterns of residen- Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A.,
tial householders. Energy 165, 709–726. http://dx.doi.org/10.1016/j.energy. Cournapeau, D., 2011. Scikit-learn: Machine learning in Python. J. Mach.
2018.09.144, [Online]. Available: https://linkinghub.elsevier.com/retrieve/pii/ Learn. Res. 12, 2825–2830.
S0360544218319145. Ramos, D., Teixeira, B., Faria, P., Gomes, L., Abrishambaf, O., Vale, Z., 2020. Using
Estudios IDAE, 2011. SECH-SPAHOUSEC-analysis of the energy consumption in
diverse sensors in load forecasting in an office building to support energy
the spanish households. final report. [Online]. Available: https://www.idae.
management. Energy Rep. 6, 182–187.
es/estudios-informes-y-estadisticas.
Sancho-Tomás, A., Sumner, M., Robinson, D., 2017. A generalised model of
Faria, P., Spinola, J., Vale, Z., 2016. Aggregation and remuneration of electricity
electrical energy demand from small household appliances. Energy Build.
consumers and producers for the definition of demand-response programs.
135, 350–366.
IEEE Trans. Ind. Inf. 12 (3), 952–961. http://dx.doi.org/10.1109/TII.2016.
Sepulveda, L.F., Diniz, P.S., Diniz, J.a.O., Netto, S.M., Cipriano, C.L., Araújo, A.C.,
2541542, [Online]. Available: http://ieeexplore.ieee.org/document/7442137/.
Lemos, V.H., Pessoa, A.C., Quintanilha, D.B., Almeida, J.a.D., et al., 2021.
Ferrandez-Pastor, F.-J., Mora-Mora, H., Sánchez-Romero, J.-L., Nieto-Hidalgo, M.,
Forecasting of individual electricity consumption using Optimized Gradient
Garcí a Chamizo, J.M., 2017. Interpreting human activity from electrical con-
Boosting Regression with Modified Particle Swarm Optimization. Eng. Appl.
sumption data using reconfigurable hardware and hidden Markov models. J.
Artif. Intell. 105, 104440.
Ambient Intell. Humaniz. Comput. 8 (4), 469–483.
Tharwat, A., 2018. Classification assessment methods. Appl. Comput. Inf. http:
Friedman, J.H., 2001. Greedy function approximation: A gradient boosting ma-
//dx.doi.org/10.1016/j.aci.2018.08.003.
chine. Ann. Statist. 29 (5), 1189–1232, [Online]. Available: http://www.jstor.
van der Meer, D., Widén, J., Munkhammar, J., 2018. Review on probabilistic
org/stable/2699986.
forecasting of photovoltaic power production and electricity consumption.
Grillone, B., Mor, G., Danov, S., Cipriano, J., Sumper, A., 2021. A data-
Renew. Sustain. Energy Rev. 81, 1484–1512. http://dx.doi.org/10.1016/j.rser.
driven methodology for enhanced measurement and verification of energy
2017.05.212, [Online]. Available: https://linkinghub.elsevier.com/retrieve/pii/
efficiency savings in commercial buildings. Appl. Energy 301, 117502.
S1364032117308523.
Guo, X., Gao, Y., Li, Y., Zheng, D., Shan, D., 2021. Short-term household load
Yildiz, B., Bilbao, J.I., Dore, J., Sproul, A., 2018. Household electricity load fore-
forecasting based on Long-and Short-term Time-series network. Energy Rep.
casting using historical smart meter data with clustering and classification
7, 58–64.
Haq, E.U., Lyu, X., Jia, Y., Hua, M., Ahmad, F., 2020. Forecasting household electric techniques. In: 2018 IEEE Innovative Smart Grid Technologies-Asia (ISGT
appliances consumption and peak demand based on hybrid machine learning Asia). IEEE, pp. 873–879.
approach. Energy Rep. 6, 1099–1105. Yildiz, B., Bilbao, J., Sproul, A., 2017. A review and analysis of regression and
Hong, T., et al., 2014. Energy forecasting: Past, present, and future. Foresight: machine learning models on commercial building electricity load forecasting.
Int. J. Appl. Forecast. (32), 43–48. Renew. Sustain. Energy Rev. 73, 1104–1122. http://dx.doi.org/10.1016/j.rser.
Hsiao, Y.-H., 2015. Household electricity demand forecast based on context 2017.02.023, [Online]. Available: https://linkinghub.elsevier.com/retrieve/pii/
information and user daily schedule analysis from meter data. IEEE Trans. S1364032117302265.
Ind. Inf. 11 (1), 33–43. http://dx.doi.org/10.1109/TII.2014.2363584, [Online]. Yu, H.-F., Lo, H.-Y., Hsieh, H.-P., Lou, J.-K., McKenzie, T.G., Chou, J.-W., Chung, P.-
Available: http://ieeexplore.ieee.org/document/6926785/. H., Ho, C.-H., Chang, C.-F., Wei, Y.-H., et al., 2010. Feature engineering and
Jiang, W., Wu, X., Gong, Y., Yu, W., Zhong, X., 2020. Holt–Winters smoothing classifier ensemble for KDD cup 2010. KDD Cup 11.
enhanced by fruit fly optimization algorithm to forecast monthly elec- Zhang, G., Guo, J., 2020. A novel ensemble method for hourly residential
tricity consumption. Energy 193, 116779. http://dx.doi.org/10.1016/j.energy. electricity consumption forecasting by imaging time series. Energy 203,
2019.116779, [Online]. Available: https://linkinghub.elsevier.com/retrieve/ 117858. http://dx.doi.org/10.1016/j.energy.2020.117858, [Online]. Available:
pii/S0360544219324740. https://linkinghub.elsevier.com/retrieve/pii/S0360544220309658.
Lan, Y., Wu, J., Tang, Z., 2014. Generation of domestic load profiles using Zhao, J., Lasternas, B., Lam, K.P., Yun, R., Loftness, V., 2014. Occupant behavior and
appliances’ activating moments. In: ISGT 2014. IEEE, pp. 1–5. schedule modeling for building energy simulation through office appliance
Li, K., Hu, C., Liu, G., Xue, W., 2015. Building’s electricity consumption predic- power consumption data mining. Energy Build. 82, 341–355.
tion using optimized artificial neural networks and principal component
analysis. Energy Build. 108, 106–113. http://dx.doi.org/10.1016/j.enbuild.
2015.09.002, [Online]. Available: https://linkinghub.elsevier.com/retrieve/pii/
S0378778815302437.
3691