
Energy 268 (2023) 126660
Contents lists available at ScienceDirect
Energy
journal homepage: www.elsevier.com/locate/energy
Robust framework based on hybrid deep learning approach for short term
load forecasting of building electricity demand
Charan Sekhar *, Ratna Dahiya
Department of Electrical Engineering, National Institute of Technology Kurukshetra, Kurukshetra, Haryana, 136119, India
ARTICLE INFO

Handling Editor: Henrik Lund

Keywords: Short-term load forecast; Bilateral long short-term memory; Convolution neural networks; Time series prediction; Grey wolf optimization

ABSTRACT

Buildings consume about half of the global electrical energy, and an accurate prediction of their electricity consumption is crucial for building microgrids’ efficient and reliable functioning, leading to profitability for users and utilities. This paper proposes a novel optimal hybrid strategy for building load prediction that combines bilateral long short-term memory (BiLSTM), convolution neural networks (CNN), and grey wolf optimization (GWO). The GWO obtains the optimal set of parameters of the CNN and BiLSTM algorithms. One-dimensional CNN is applied to extract the time series data feature effectively. The proposed strategy performance is investigated using four buildings having distinct characteristics with hourly resolution data. Results justify that the same technique can be applied effectively to different structures. The work compares and examines their performance with other cutting-edge technologies for the forecast for one day, two days, and a week. The findings demonstrate that the suggested GWO–CNN–BiLSTM technique performs more accurately than standard CNN-LSTM, CNN-BiLSTM, optimized BiLSTM, and traditional LSTM and BiLSTM techniques.
1. Introduction

The power system resilience necessitates a balance between supply and demand, which becomes challenging as electrical energy consumption rises continuously due to economic and population growth. Further imbalance is brought on by the requirement for a sustainable grid incorporating renewable energy sources and electric vehicle technologies to reduce pollution emissions. Accurate demand estimation is unavoidable to lessen imbalance and enhance system reliability and security. Integrating renewable energy sources into the distribution system or single entity has created a new paradigm, such as microgrids. Many microgrids are designed for diverse purposes, including campus microgrids, industrial microgrids, military microgrids, residential microgrids, and complex commercial microgrids [1]. In all types of microgrids, buildings are major electricity users.

Building energy consumption in India is around 33% and increasing at a rate of 8% [2], and around the world is one-third of total energy consumption and 55% of total electricity consumption [3]. Demand forecasts of buildings ranging from a few hours to a few days assist building managers and operators in more intelligent resource utilization, cutting emissions, operating the grid safely, and increasing cost savings [4]. However, the electrical energy consumption in buildings is unpredictable and highly fluctuating with nonlinear characteristics because consumption depends on the scheduling, occupancy, and working culture of a particular building. Over decades, researchers, engineers, and mathematicians have developed several prediction methods broadly categorized as physical, statistical, and artificial intelligence. Physical models require professionals and more equipment rating details, and they also suffer from inefficient learning and are time-consuming and cumbersome. Statistical approaches are quick, easy, and straightforward, depending on current and past data. They were the most applied methods for forecast applications in the last century. Despite that, these approaches are poor at interpreting massive amounts of data and have shortfalls in accuracy for irregular load swings. Due to deregulation, the complexity of power usage patterns has also grown. The above limitations are overcome by artificial intelligence (AI) methods, which have gained popularity for forecast applications due to their nonlinear learning capability, good computation facilities, and data-handling techniques [5,6]. Accordingly, the present work developed a hybrid framework based on deep learning (DL) techniques: a 1-dimensional convolution neural network (CNN) and a bilateral long short-term memory (BiLSTM) network.
* Corresponding author.
E-mail addresses: charansekhar@gmail.com, charan_61900135@nitkkr.ac.in (C. Sekhar), ratna_dahiya@nitkkr.ac.in (R. Dahiya).
https://doi.org/10.1016/j.energy.2023.126660
Received 9 September 2022; Received in revised form 29 November 2022; Accepted 8 January 2023
Available online 16 January 2023
0360-5442/© 2023 Elsevier Ltd. All rights reserved.
1.1. Literature survey

The work in [7–14] shows an upsurge in the use of AI techniques to estimate electricity demand at the grid or region level and indicates that ample literature is available on grid demand forecasts. However, less work has explored the consumption pattern of buildings than that of the power grid. Research [15–22] has proposed various structures of artificial neural networks (ANN), machine learning (ML) techniques, or a combination of multiple ML methods for short-term load forecast (STLF) for one building or for several buildings with characteristics very close to that single building. The comparative analysis of methodologies offered there is based solely on those types of building loads (residential, commercial, or administrative). Single DL techniques are developed in Refs. [23,24] to estimate the electricity demand. The work in Ref. [23] considered residential demand prediction using conditional restricted Boltzmann machines and factored dependent restricted Boltzmann machines (CRBM), while the research of Cai et al. deals with the data of three buildings for the next-day forecast using long short-term memory (LSTM) and gated CNN approaches [24].

To overcome the overfitting limitations of a single technique, a few researchers have created hybrid methods to boost the forecast accuracy for buildings. Forecasting with three ensemble approaches using multilayer perceptron (MLP), LSTM, XGBOOST, support vector regression (SVR), and linear regression (LR) methods is proposed in Ref. [25] for a net zero building. Kong et al. developed an LSTM neural network-based framework for forecasting a group of individual consumers where the output of the LSTM technique is fed to a feed-forward neural network (FFNN) [26]. A hybrid algorithm of regression trees and CNN is created to predict three industrial demands [27]. Imani Maryam combined CNN with SVR for residential load forecasting, where CNN extracts the features from the data cubes and SVR performs the forecasting [28]. A framework based on nine different DL techniques is developed for hour-ahead and day-ahead forecasts of five buildings from diversified geographical locations. The demand variance from lowest to highest is relatively low even though the work took buildings from various applications into consideration, and the parameters were selected manually from multiple sets [29]. The residential consumption forecast problem is addressed in Refs. [30,31] by combining CNN with LSTM and gated recurrent unit (GRU), respectively. Somu et al. constructed a CNN-LSTM-based hybrid technique by including a K-nearest neighbor to estimate the energy of an academic building [32].

The main limitation of the above-described techniques is the manual selection of the network settings: the number of hidden layers, hidden units, learn rate, and drop factor. Selecting a set of parameters for modeling the forecast technique is crucial because it increases the model’s stability. Frequently employed techniques for hyper-parameter optimization include manual and automatic searches. Manual search mainly relies on the knowledge and experience of individuals, which is difficult in all cases. To overcome the optimal parameter selection problem, researchers combined the STLF techniques with optimization methods. The studies in Refs. [33–38] employed the particle swarm optimization (PSO) technique and its variations to determine the optimal values for parameters of the various ML techniques used in the STLF of different types of buildings. These include weights of the neural networks, hybrid model coefficient weights, and parameters of extreme learning machines with kernels and ANFIS. Although the techniques achieved considerable accuracy, the PSO method still has delayed convergence and an inertia weight and learning rate selection problem, and the local optima problem is a primary constraint. Massana J et al. employed the grid search algorithm to ascertain the parameters of multilinear regression, multilayer perceptron, and support vector regression. They analyzed its performance on the consumption of a university office building [39]. The drawback of the grid search method is that it does not remember the best value in its process.

Tuning of the ANN parameters, such as hidden layer nodes and train-test ratio, is addressed with the grasshopper optimization algorithm (GHOA) for forecasting smart grids in New South Wales and Victoria urban areas [40]. Talaat et al. also applied GHOA to determine the optimal number of neurons for hidden layers of ANN for the demand forecast of the youth power station from the city of Salhiya, where the demand range is around tens of MW [41]. In contrast, work [42] chose Harris hawks optimization (HHO) for tuning weights of ANN for the forecast of the Queensland electricity market. Gao et al. in Ref. [43] developed an improved whale optimization algorithm (WOA) to find hidden nodes and weights for cooling load forecasts in buildings. Although the WOA presented considerable improvement, it suffers from slow convergence. Tran T. N. chose the grid search algorithm to optimize the number of filters, kernel size, epochs, and inputs of the 1D CNN structure for the STLF of Queensland state of Australia and Ho Chi Minh city of Vietnam [44]. The error is still high even after optimization of the CNN network. The efforts in Refs. [45,46] tried to optimize LSTM network parameters: the number of hidden units, hidden layers, dropout rate, and learning rate using a genetic algorithm, and hidden layers, hidden units, and dropout rate using improved sine cosine optimization (SCO), respectively. The case studies in the former are two campus buildings, and the latter considers an academic building. Three regional loads in Madhya Pradesh, India, are considered for forecasting using grid search optimized multiple BiLSTM networks to learn the transitional period between seasons [47]. A Bayesian optimization-based parallel CNN-LSTM network was built for buildings’ heat load prediction [48]. In the Assam region of India, the electricity demand projection on special days, such as regional festivals, is done utilizing a hybrid framework of GWO-SVR, in which the SVR parameters (insensitive loss function, trade-off, and kernel function) are optimized [49]. The literature shows that an optimal set of parameters for a single forecast technique improves the forecast accuracy and stability. Despite the fact that a single forecast method’s ideal network achieved better outcomes, it still has its limitations.

1.2. Research gaps and contributions

• According to the literature review, CRBM and CNN-SVR techniques are used for residential demand forecasting. The methods CNN-LSTM, MLR-MLP-SVR, GA-LSTM, and SCOA-LSTM are confined to forecasting work on academic buildings, while CNN networks predict industrial demands. It indicates that most forecast work is done on particular types of buildings. Hence there is scope for a generalized model that applies to diversified buildings whose electrical consumption patterns are entirely different.
• Among the DL techniques, LSTM approaches are common in the forecast work. In contrast, BiLSTM, an improved version of the LSTM, is not used much. The LSTM technique learns the temporal dependencies of sequence data in one direction, which may be subject to an overfitting problem. BiLSTM includes an additional layer for the reverse flow of a sequence of information and allows the sequence data to flow in both forward and backward directions. Bidirectional learning helps to overcome the overfitting problem [50].
• Another important strategy that is used in association with BiLSTM is 1D-CNN. The superior ability of 1D-CNN to extract features through internal linkages increases the proposed framework’s overall performance and provides improved predictability.
• Some researchers worked on forecasting the electrical energy consumption of different buildings with the manual selection of a set of parameters. This increases the burden on humans, who must simulate different combinations to get better accuracy. This is avoided in the present work by implementing the GWO technique to achieve an optimal set of parameters that enhances the 1D-CNN and BiLSTM network accuracy. The GWO technique is easy to implement, successfully exploits and explores the search space, and is more reliable, quick, and efficient than other evolutionary algorithms.
• The work considered ten parameters of two techniques for optimization, which overcomes the limitations of single-technique parameter optimization. Another strength of the proposed work is fewer iterations in comparison with the literature.

The current work offers a better solution to reduce the errors in the building forecast using a hybrid approach while keeping in mind the above research gaps.

1.3. Paper organization

The paper is organized into six sections. The first section comprises the introduction of the work and the motivation behind it. Section II describes the details of the techniques BiLSTM, CNN, and GWO developed for simulation work. Section III explains the methodology developed from the methods mentioned earlier. Various case studies of the forecast work are given in section IV. The tabular and graphical results with discussion are produced in section V, and overall conclusions of the work and future scope are given in section VI.

2. Methods

Basic concepts of the techniques are given as follows.

2.1. Bilateral long short term memory

The LSTM technique consists of an input gate, forget gate, and output gate, along with a memory cell. Fig. 1 shows the LSTM network functioning [51].

The forget gate and the input gate values depend on the previous hidden state and current input.

$$F_t = \sigma\left(W_{fh} h_{t-1} + W_{fx} x_t + b_f\right) \tag{1}$$

$$I_t = \sigma\left(W_{ih} h_{t-1} + W_{ix} x_t + b_i\right) \tag{2}$$

The cell state activation

$$G_t = \tanh\left(W_{gh} h_{t-1} + W_{gx} x_t + b_g\right) \tag{3}$$

The cell state function

$$C_t = C_{t-1} \odot F_t + I_t \odot G_t \tag{4}$$

The output gate value

$$O_t = \sigma\left(W_{oh} h_{t-1} + W_{ox} x_t + b_o\right) \tag{5}$$

Final hidden state

$$H_t = \tanh(C_t) \odot O_t \tag{6}$$

Notations $\sigma$ (sigmoid) and $\tanh$ (hyperbolic tangent) are the nonlinear activation functions and keep the values in the [0,1] and [−1,1] range, respectively. The weights in the gates are represented as $W_{pq}$, where the suffix p represents the corresponding gate and the suffix q denotes whether the weight applies to the hidden state or the input value. The gate biases are indicated with the symbol $b_p$.

BiLSTM is an upgraded LSTM technique that learns from past and future terms of the sequence data. For time series data-based forecasting, understanding the model from past and future frames is required, which is facilitated by the BiLSTM network composed of two LSTM-based layers. One is in the forward and the second is in the backward direction. The architecture of the BiLSTM is shown in Fig. 1.

The information of the forward LSTM and backward LSTM units is stored as hidden states $\overrightarrow{h_t}$ and $\overleftarrow{h_t}$, respectively, at a time ‘t’. The final hidden state is computed by concatenating the two hidden states [52].

$$h_t = \partial\left(\overrightarrow{h_t}, \overleftarrow{h_t}\right) \tag{7}$$

2.2. Convolution neural networks

Convolution neural networks are a unique class of deep learning techniques that use convolutional operations. The features in the time series consumption data of the buildings were extracted using 1-D CNN in the current work [27], which applies a sliding convolution operation along the 1-dimensional time series data sequence. The proposed 1D-CNN comprises three fundamental layers: the convolution layer, the detection layer, and the pooling layer. The performance of the CNN network depends on the parameters of these layers. They include filter size, number of filters in the layers, stride, padding, and batch size. The CNN layers’ orientation and the BiLSTM network for forecast work are shown in Fig. 2.

2.3. Grey wolf optimization

The GWO reflects the hunting pattern of wolves. Grey wolves organize their leadership at alpha, beta, delta, and omega levels. The pack leader is the alpha, which leads the group and is considered the best position in optimization. The beta, the second level of the pack’s leadership, coordinates with the alpha and the other wolves. The delta and omega have consecutively decreasing commanding authority in the pack [53,54].

The grey wolves identify and encircle the prey, which is the first step of hunting. Its mathematical model at the current time step is

$$D = \left| C \times Y_p(t) - Y(t) \right| \tag{8}$$

The positions are updated using

$$Y(t+1) = Y_p(t) - A \times D \tag{9}$$

The vectors A and C are determined using

$$A = (2 \times a) \cdot r_1 - a; \quad C = 2 \cdot r_2; \quad a = 2 \times \left(1 - \frac{t}{\text{max iteration}}\right) \tag{10}$$

where A and C are coefficient vectors, and the components of vector ‘a’ are decreased linearly from 2 to 0 over the iterations to attain the optimum. The random vectors $r_1$ and $r_2$ lie between 0 and 1. $Y_p$ and $Y$ indicate the position vectors of the prey and the grey wolves.

Alpha, the pack’s leader, directs the hunt. In the mathematical model, the alpha, beta, and delta are assumed to hold the better positions, and the other wolves update their positions according to them.

$$D_\alpha = \left| C_1 \cdot Y_\alpha - Y \right|; \quad D_\beta = \left| C_2 \cdot Y_\beta - Y \right|; \quad D_\delta = \left| C_3 \cdot Y_\delta - Y \right| \tag{11}$$
Fig. 1. BiLSTM architecture.
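As an illustration of Eqs. (1)–(7), the following is a minimal pure-Python sketch of one LSTM step and of the forward/backward concatenation performed by a BiLSTM layer. It is not the implementation used in this work (the experiments were run in MATLAB); the helper names, the random weight initialisation, and the sine-wave input are assumptions made only for the example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gate(params, name, h_prev, x_t, activation):
    """activation(W_h h_prev + W_x x_t + b) for one gate, as in Eqs. (1)-(3), (5)."""
    W_h, W_x, b = params[name]
    return [
        activation(
            sum(w * h for w, h in zip(row_h, h_prev))
            + sum(w * x for w, x in zip(row_x, x_t))
            + b_i
        )
        for row_h, row_x, b_i in zip(W_h, W_x, b)
    ]


def lstm_step(x_t, h_prev, c_prev, params):
    f_t = gate(params, "f", h_prev, x_t, sigmoid)       # Eq. (1): forget gate
    i_t = gate(params, "i", h_prev, x_t, sigmoid)       # Eq. (2): input gate
    g_t = gate(params, "g", h_prev, x_t, math.tanh)     # Eq. (3): cell activation
    c_t = [c * f + i * g for c, f, i, g in zip(c_prev, f_t, i_t, g_t)]  # Eq. (4)
    o_t = gate(params, "o", h_prev, x_t, sigmoid)       # Eq. (5): output gate
    h_t = [math.tanh(c) * o for c, o in zip(c_t, o_t)]  # Eq. (6): hidden state
    return h_t, c_t


def run_lstm(sequence, params, hidden):
    h, c = [0.0] * hidden, [0.0] * hidden
    states = []
    for x_t in sequence:
        h, c = lstm_step(x_t, h, c, params)
        states.append(h)
    return states


def bilstm(sequence, fwd_params, bwd_params, hidden):
    forward = run_lstm(sequence, fwd_params, hidden)
    # The backward layer reads the sequence in reverse; its states are
    # re-reversed so that index t lines up with the forward states.
    backward = run_lstm(sequence[::-1], bwd_params, hidden)[::-1]
    return [hf + hb for hf, hb in zip(forward, backward)]  # Eq. (7): concatenation


def random_params(n_in, hidden, rng):
    """Hypothetical initialisation: small uniform weights, zero biases."""
    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {g: (matrix(hidden, hidden), matrix(hidden, n_in), [0.0] * hidden)
            for g in "figo"}


rng = random.Random(0)
hourly_load = [[math.sin(2 * math.pi * t / 24)] for t in range(24)]  # toy input
states = bilstm(hourly_load, random_params(1, 4, rng), random_params(1, 4, rng), hidden=4)
print(len(states), len(states[0]))  # 24 time steps, 8 features (4 forward + 4 backward)
```

Each time step yields a vector twice the size of a single LSTM hidden state, which is what allows the following layers to use context from both directions.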
Fig. 2. CNN - BiLSTM structure.
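The three 1D-CNN stages named in Section 2.2 (convolution, detection, pooling) can be sketched on a short load sequence as follows. This is only an illustrative sketch: taking ReLU as the detection stage follows the activation mentioned in Section 3, while the kernel values, the absence of padding, and the pooling size are assumptions chosen for the example.

```python
def conv1d(series, kernel, stride=1, bias=0.0):
    """Slide `kernel` along `series` (no padding) and take the dot product at each stop."""
    k = len(kernel)
    return [
        sum(w * series[start + j] for j, w in enumerate(kernel)) + bias
        for start in range(0, len(series) - k + 1, stride)
    ]


def relu(values):
    """Detection stage: keep positive responses, zero out the rest."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, size=2):
    """Non-overlapping max pooling; a trailing remainder shorter than `size` is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


load = [3.0, 4.0, 6.0, 5.0, 2.0, 1.0, 4.0, 7.0]  # toy hourly readings
rise_kernel = [-1.0, 1.0]                        # responds to hour-to-hour increases

features = max_pool1d(relu(conv1d(load, rise_kernel)), size=2)
print(features)  # [2.0, 0.0, 3.0]
```

In a trained network the kernel weights are learned rather than fixed, and several filters run in parallel; the filter size and number of filters are among the parameters tuned by GWO in this work.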
$$Y_1 = Y_\alpha - A_1 \cdot D_\alpha; \quad Y_2 = Y_\beta - A_2 \cdot D_\beta; \quad Y_3 = Y_\delta - A_3 \cdot D_\delta \tag{12}$$

$$Y(t+1) = \frac{Y_1 + Y_2 + Y_3}{3} \tag{13}$$

The updating of the locations of the search agents is depicted in Fig. 3. The updated positions of the search agents are within the circle. The final stage in the hunting of grey wolves is attacking the prey. This stage is modeled mathematically by decreasing the variable ‘a’ to zero.

3. Methodology

The research proposed a novel hybrid forecast framework for STLF consisting of four phases, depicted in Fig. 4.

The proposed work starts with the data processing phase. This phase deals with importing and processing raw data. Missing data and outliers in the raw data from the electric meter may affect the forecast’s accuracy. The missing data is due to a power supply issue. An algorithm is created in the current work to eliminate empty cells from the data.

Outliers, brought on by faulty or broken instruments, are another kind of information that must be avoided. Their influence is reduced in the current work by Z-score normalization, which also accelerates the training process. The Z-score normalization of data using Eq. (14) rescales the data to a mean of zero and a standard deviation of one.

$$X = \frac{x - \bar{x}}{\sigma_x} \tag{14}$$

Standardized data is split into training and testing data. After processing the data, the time series data is sent to the second phase.

The optimization phase handles the execution of GWO to determine the optimal parameters. The GWO algorithm initializes the positions using the upper and lower boundaries of the parameters. It then evaluates the fitness of the suggested locations with the CNN-BiLSTM forecast framework in the training phase, as shown in the diagram. The CNN-BiLSTM network’s training mean square error (MSE) is taken as the GWO objective function, which must be minimized. Based on the fitness value, the best population is picked for the position update. The GWO runs iteratively to optimize the parameters of CNN-BiLSTM and stops when the maximum number of iterations is reached.

The training and forecasting phases combine the 1DCNN and BiLSTM techniques to train and forecast. In the training stage, the network is initialized by GWO to obtain optimal parameters for forecasting. A total of ten parameters of the CNN and BiLSTM techniques are optimally determined and shown with boundaries in Appendix C. The framework consists of a sequence input layer to feed the input to CNN layers. The layers of a one-dimensional CNN network extract the sequence data’s characteristics by utilizing the convolution layers with the relu activation function and pooling layers to get better values. The output of the CNN layers is fed to BiLSTM layers and a fully connected layer. Finally, the trained result is taken from the regression layer.

After GWO meets the criteria for optimality, the parameters with the least MSE are considered the potential solution. The final best parameters are given to the 1DCNN-BiLSTM network to estimate accurate electricity consumption. The performance evaluation phase, where the performance measures are computed, receives the predicted data.

3.1. Case study example

A small example is presented to explain the working of the proposed GWO–CNN–BiLSTM. Consider a sample of 500 consumption points of hourly resolution of the college building in this example. 90% of the data was used for training, and the remaining 10% for testing. Let Np = 3 and Iterations = 1. Initial alpha, beta, and omega positions are taken as infinite. The random population for the optimization is initialized, and its values are in Table 1.

The fitness values of the three positions are evaluated to determine the alpha value. It is 3.7354E+03, and the best position is 3. The remaining beta and delta positions are the same.

In the second iteration, the new positions are evaluated and shown in Table 2. If their fitness is less than the previous value, it is taken as alpha. Otherwise, the value is considered for the beta. Here the beta value is 4.15E+04 for position-1.

For the new positions of iteration-1 from Table 3, fitness values are calculated. For position-3, the fitness value is less than the previous alpha value. So, the alpha value is updated with the new value. Here it is 136.37. For position-2, the fitness value is higher than beta, so it is considered for the delta value, which is 9.053E+05. In this manner, the GWO algorithm determines the optimal parameters of the CNN-BiLSTM network.
Fig. 3. Grey Wolves hunting pattern.
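The leader-guided position update of Eqs. (10)–(13) can be sketched in a few lines of pure Python. This is a minimal sketch under stated assumptions, not the authors' implementation: in this work the fitness is the training MSE of the CNN-BiLSTM network, whereas the example below substitutes a cheap quadratic toy function so that it runs instantly, and clipping to the parameter bounds is an assumed way of keeping wolves inside the search box.

```python
import random


def gwo_minimize(fitness, lower, upper, n_wolves=6, max_iter=40, seed=0):
    """Minimise `fitness` over the box [lower, upper]; requires n_wolves >= 3."""
    rng = random.Random(seed)
    dim = len(lower)
    # Initial positions are drawn between the lower and upper parameter bounds.
    wolves = [[rng.uniform(lower[d], upper[d]) for d in range(dim)]
              for _ in range(n_wolves)]
    best_pos, best_fit = None, float("inf")

    for t in range(max_iter):
        ranked = sorted(wolves, key=fitness)
        alpha, beta, delta = ranked[0], ranked[1], ranked[2]
        if fitness(alpha) < best_fit:
            best_pos, best_fit = list(alpha), fitness(alpha)

        a = 2.0 * (1.0 - t / max_iter)  # Eq. (10): shrinks linearly from 2 towards 0
        updated = []
        for y in wolves:
            position = []
            for d in range(dim):
                guided = []
                for leader in (alpha, beta, delta):
                    A = 2.0 * a * rng.random() - a      # Eq. (10)
                    C = 2.0 * rng.random()              # Eq. (10)
                    D = abs(C * leader[d] - y[d])       # Eq. (11)
                    guided.append(leader[d] - A * D)    # Eq. (12)
                mean = sum(guided) / 3.0                # Eq. (13)
                position.append(min(max(mean, lower[d]), upper[d]))  # assumed clipping
            updated.append(position)
        wolves = updated

    final = min(wolves, key=fitness)
    if fitness(final) < best_fit:
        best_pos, best_fit = list(final), fitness(final)
    return best_pos, best_fit


def toy_fitness(position):
    """Stand-in objective with its minimum at (1, 1); NOT the network training MSE."""
    return sum((p - 1.0) ** 2 for p in position)


best_position, best_value = gwo_minimize(toy_fitness, [-5.0, -5.0], [5.0, 5.0])
print(best_position, best_value)
```

For hyper-parameter tuning, each position vector would hold the ten CNN and BiLSTM parameters, and integer-valued entries such as the number of layers would need rounding before the network is built.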
Fig. 4. Proposed Forecast framework.
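The data processing phase of the framework (removal of empty cells, Z-score normalization by Eq. (14), and the 90%/10% split) might be sketched as below. The function names are hypothetical, and two details are assumptions not stated in the text: the population standard deviation is used for $\sigma_x$, and the split is chronological.

```python
import math
import statistics


def drop_missing(readings):
    """Remove empty cells, represented here as None or NaN."""
    return [r for r in readings if r is not None and not math.isnan(r)]


def zscore(values):
    """Eq. (14): subtract the mean and divide by the standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumption: population standard deviation
    return [(v - mean) / std for v in values], mean, std


def chronological_split(values, train_fraction=0.9):
    """Keep time order: the earliest part trains, the latest part tests."""
    cut = int(len(values) * train_fraction)
    return values[:cut], values[cut:]


raw = [10.0, None, 12.0, 14.0, float("nan"), 16.0, 18.0,
       20.0, 22.0, 24.0, 26.0, 28.0]          # toy meter readings
clean = drop_missing(raw)
standardized, mean, std = zscore(clean)
train, test = chronological_split(standardized)
print(len(clean), len(train), len(test))  # 10 9 1
```

Keeping `mean` and `std` allows a forecast made on the standardized scale to be converted back to kW with `x = X * std + mean`.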
Table 1
GWO Iteration 0: parameters values.

| Position | Layer-1 hidden units | Layer-2 hidden units | Layer-3 hidden units | Number of layers | Epochs | Learning rate | Minibatch | Drop factor | Filter size | No. of filters |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 213 | 97 | 67 | 2 | 183 | 10^(-4.49) | 2^8 | 0.2248 | 3 | 5 |
| 2 | 203 | 122 | 71 | 1 | 124 | 10^(-3.24) | 2^8 | 0.1102 | 6 | 2 |
| 3 | 179 | 76 | 74 | 2 | 133 | 10^(-3.63) | 2^7 | 0.3135 | 8 | 5 |
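Table 2
GWO Iteration 1: parameters values.

| Position | Layer-1 hidden units | Layer-2 hidden units | Layer-3 hidden units | Number of layers | Epochs | Learning rate | Minibatch | Drop factor | Filter size | No. of filters |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 175 | 75 | 46 | 1 | 124 | 10^(-5) | 2^4 | 0.1190 | 2 | 6 |
| 2 | 175 | 75 | 75 | 5 | 100 | 10^(-3) | 2^8 | 0.1563 | 8 | 3 |
| 3 | 130 | 112 | 27 | 1 | 100 | 10^(-3) | 2^2 | 0.119 | 3 | 1 |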
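Table 3
GWO Iteration 1: parameters after position update.

| Position | Layer-1 hidden units | Layer-2 hidden units | Layer-3 hidden units | Number of layers | Epochs | Learning rate | Minibatch | Drop factor | Filter size | No. of filters |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 175 | 75 | 46 | 1 | 123 | 10^(-5) | 2^4 | 0.1190 | 2 | 6 |
| 2 | 175 | 75 | 75 | 2 | 100 | 10^(-3) | 2^8 | 0.1563 | 8 | 1 |
| 3 | 175 | 75 | 75 | 1 | 100 | 10^(-3.2) | 2^4 | 0.100 | 3 | 1 |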
4. Data description

The following four buildings from Mendeley data and Kaggle have been considered as case studies in the present work. The characteristics of the buildings vary from one another.

• The first one is a campus, the second is a hospital, the third is residential, and the fourth is an industrial building. According to the college’s power data, annual consumption averages 12958.66 kW for an hourly load resolution. The standard deviation of the demand is 2453.61 kW. From March to April, there is a fall in electricity consumption due to the vacation period and the resulting drop in heating load.
• The second data set depicts the hospital building’s hourly electricity use. Two hundred seventy-five people could be accommodated at the hospital when it was built. The hospital demand profile has an average and standard deviation of 1058.368 kW and 150.72 kW, respectively. The lowest consumption is between February and April and in mid-September, and there is an increase in energy use throughout the summer due to switching on air conditioning [55].
• A third case study relies on the domestic electricity usage of a two-floor home in Houston, Texas, in the United States. The data includes three years of consumption data at hourly resolution from Jun. 01, 2016, to August 2020. The significant consumption is due to two refrigerators, two water heaters of 50 gallons, an air conditioning system, washing machines, a TV, a security DVR, and a dryer. The average building consumption over a year is 0.89163 kW, and the standard deviation is 0.90813 [56].
• The final data set comprises one year of pharmaceutical industrial building consumption at hourly resolution with a mean consumption of 1587.18 kW and a deviation of 241.07 kW. The peak demand is 2387.7 kW during May, and the minimum is 1150 kW during the nights of November and December [55].

The proposed work considered 90% of data from each dataset for training and the remaining 10% for validation.

5. Performance evaluation

5.1. Evaluation metrics

Five measures are used to assess the proposed model’s performance. They are mean absolute error (MAE), mean square error (MSE), mean absolute percentage error (MAPE), root mean square error (RMSE), and normalized root mean square error (NRMSE).

$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left| Y_{a,i} - Y_{p,i} \right|; \quad \text{MSE} = \frac{\sum_{i=1}^{n}\left(Y_{a,i} - Y_{p,i}\right)^2}{n} \tag{15}$$

$$\text{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left| \frac{Y_{a,i} - Y_{p,i}}{Y_{a,i}} \right| \times 100 \tag{16}$$

$$\text{RMSE} = \sqrt{\frac{\sum_{i=1}^{n}\left(Y_{a,i} - Y_{p,i}\right)^2}{n}}; \quad \text{NRMSE} = \frac{\sqrt{\dfrac{\sum_{i=1}^{n}\left(Y_{a,i} - Y_{p,i}\right)^2}{n}}}{\dfrac{1}{n}\sum_{i=1}^{n} Y_{a,i}} \tag{17}$$

where the actual and forecasted loads are represented by $Y_{a,i}$ and $Y_{p,i}$, respectively, and ‘n’ denotes the number of test samples.

5.2. Results and analysis

The LSTM and BiLSTM are examined under three circumstances, and the better one is chosen for comparison with the proposed technique. Case 1 is a network with a single layer and 100 hidden units (HU) per layer, Case 2 is again a single-layer network, yet with 200 HU per layer instead of 100, and Case 3 is a network with two layers with 200 HU in the first layer and 100 HU in the second layer. Stacking LSTM with two layers performs better than single-layer LSTM technologies for both data sets, while the BiLSTM has different performances for each data set. The remaining forecasting method parameters are 300 epochs, 64 minibatch size, 0.005 learn rate, and 0.1 drop factor. All tests are performed using the MATLAB environment on a laptop with an Intel core-i5 processor and 16 GB of RAM.

The significant performance variables determining the model’s predictive abilities include prediction accuracy and generalization. Four end users of electricity are employed to assess the performance of the proposed model, and their consumption patterns are very different, allowing us to study the generalization ability. This section reports the outcomes of the suggested approach, along with comparisons to other cutting-edge techniques like single and hybrid DL techniques and optimized BiLSTM methods. Because DL-based methods are better than ML techniques at acquiring knowledge from time series data and have seen a recent upsurge in the forecast field, only DL-based methods are included in the present work. For effective scheduling of generating plants, unit commitment, demand response, and fuel planning, the model has been used to anticipate one day, two days, and one week in the future.

5.2.1. Model performance for one week forecast

This section presents the one-week forecast performance of the proposed model against the single, hybrid, and optimized models. The performance analysis is carried out on the four building data sets, and results in terms of performance indices are presented in Table 4.

Figs. 5–12 show the comparison of weekly forecast results of the proposed technique against single, hybrid and optimized models. Fig. 5, Fig. 7, Fig. 9, and Fig. 11 indicate that the proposed GWO–CNN–BiLSTM closely follows the actual demand and fits better than the other models for a weekly forecast for all four types of buildings with considerable improvement in accuracy.

• The minimum errors by the GWO–CNN–BiLSTM prediction are 0.00293 kW, 0.000559 kW, −0.000655 kW, and 0.000146 kW for college, hospital, residential and industrial building loads. The maximum error magnitudes are 1.23 kW, 0.3785 kW, −0.002005 kW, and 1.45 kW in college, hospital, residential, and industrial demand estimation.
• The error at any instant is near zero and not more than 0.5 kW for hospital demand, which can be accepted in emergency environments. Compared to college demand prediction, all considered methods are good at predicting the remaining three buildings.
• For a one-week projection, the GWO-BiLSTM outperforms the APSO-BiLSTM in RMSE by 49.56%, 20.53%, 74.7%, and 34.64% for college, hospital, residential and industrial buildings. In the same way, it outperforms EJAYA-BiLSTM in RMSE for the forecast of one week by 46.26%, 25.46%, and 21.64% for college, hospital, and industrial buildings. Yet it shows slightly lower performance for the residential data set than EJAYA-BiLSTM by 38.1%.
• The APSO-BiLSTM shows a considerable improvement in the whole week’s prediction compared to the one-day and two-day ahead forecasts. For the one-week forecast condition, the BiLSTM is superior to LSTM in projection.

5.2.2. Model performance for one day and two days forecast

5.2.2.1. College building dataset. The BiLSTM approach, however, behaves entirely differently for each dataset. For the college demand, a single-layer BiLSTM network with 200 hidden units is good in accuracy. All forecast models closely follow the variations in the load curve, especially the forecast by the proposed GWO–CNN–BiLSTM technique
Table 4
One-Week Forecast comparison of College, Hospital, Residential, and Industrial Building Consumption.

| Data Set | Technique | MAE (kW) | MSE | MAPE (%) | RMSE (kW) | NRMSE |
|---|---|---|---|---|---|---|
| College Building | LSTM | 5.6039 | 33.7704 | 0.0570 | 5.8112 | 5.80E-04 |
| | BiLSTM | 4.2954 | 23.4651 | 0.0422 | 4.8440 | 4.84E-04 |
| | CNN-LSTM | 3.9268 | 19.2608 | 0.0405 | 4.3887 | 4.38E-04 |
| | CNN-BiLSTM | 2.4398 | 7.7501 | 0.0251 | 2.7839 | 2.78E-04 |
| | APSO-BiLSTM | 5.7042 | 38.8380 | 0.0558 | 6.2320 | 6.23E-04 |
| | EJAYA-BiLSTM | 5.1837 | 34.2073 | 0.0503 | 5.8487 | 5.84E-04 |
| | GWO-BiLSTM | 2.8671 | 9.8798 | 0.0283 | 3.1432 | 3.14E-04 |
| | GWO–CNN–BiLSTM | 0.6171 | 0.4648 | 0.0062 | 0.6818 | 6.81E-05 |
| Hospital Building | LSTM | 0.3322 | 0.1342 | 0.0329 | 0.3664 | 3.63E-04 |
| | BiLSTM | 0.3186 | 0.1500 | 0.0316 | 0.3873 | 3.83E-04 |
| | CNN-LSTM | 0.1970 | 0.0539 | 0.0190 | 0.2322 | 2.29E-04 |
| | CNN-BiLSTM | 0.1472 | 0.0355 | 0.0141 | 0.1884 | 1.86E-04 |
| | APSO-BiLSTM | 0.2532 | 0.0950 | 0.0243 | 0.3083 | 3.05E-04 |
| | EJAYA-BiLSTM | 0.2646 | 0.1080 | 0.0264 | 0.3287 | 3.25E-04 |
| | GWO-BiLSTM | 0.2100 | 0.0600 | 0.0204 | 0.2450 | 2.43E-04 |
| | GWO–CNN–BiLSTM | 0.1056 | 0.0195 | 0.0104 | 0.1397 | 1.38E-04 |
| Residential Building | LSTM | 0.0021 | 5.52E-06 | 0.4810 | 0.0023 | 0.0052 |
| | BiLSTM | 0.0016 | 4.20E-06 | 0.3385 | 0.0020 | 0.0045 |
| | CNN-LSTM | 0.0014 | 3.88E-06 | 0.4215 | 0.0019 | 0.0043 |
| | CNN-BiLSTM | 9.72E-04 | 1.30E-06 | 0.2361 | 0.0011 | 0.0025 |
| | APSO-BiLSTM | 0.0058 | 6.90E-05 | 0.3509 | 0.0083 | 0.0057 |
| | EJAYA-BiLSTM | 0.0011 | 1.80E-06 | 0.3149 | 0.0013 | 0.0029 |
| | GWO-BiLSTM | 0.0018 | 4.64E-06 | 0.4990 | 0.0021 | 0.0048 |
| | GWO–CNN–BiLSTM | 9.36E-04 | 9.37E-07 | 0.2453 | 9.68E-04 | 0.0021 |
| Industrial Building | LSTM | 0.7344 | 0.6832 | 0.0484 | 0.8265 | 0.00054 |
| | BiLSTM | 0.6914 | 0.5564 | 0.0471 | 0.7459 | 0.00049 |
| | CNN-LSTM | 0.4344 | 0.3212 | 0.0299 | 0.5668 | 0.00037 |
| | CNN-BiLSTM | 0.3944 | 0.2223 | 0.0271 | 0.4715 | 0.0003 |
| | APSO-BiLSTM | 1.0688 | 1.3960 | 0.0727 | 1.1815 | 7.83E-04 |
| | EJAYA-BiLSTM | 0.9057 | 0.9712 | 0.0607 | 0.9855 | 6.53E-04 |
| | GWO-BiLSTM | 0.6336 | 0.5963 | 0.0405 | 0.7722 | 5.12E-04 |
| | GWO–CNN–BiLSTM | 0.1636 | 0.0506 | 0.0112 | 0.2249 | 1.49E-04 |
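For readers who wish to reproduce indices of the kind reported in Table 4, the five measures of Eqs. (15)–(17) can be computed as in the sketch below. MAE is computed on absolute errors, as its name implies. The helper `improvement_pct` is a hypothetical name; its formula is an assumption inferred from the fact that it reproduces the 49.56% RMSE gain of GWO-BiLSTM over APSO-BiLSTM quoted for the college building.

```python
import math


def forecast_metrics(actual, predicted):
    """Return MAE, MSE, MAPE (%), RMSE and NRMSE for paired load samples."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    return {
        "MAE": sum(abs(e) for e in errors) / n,                              # Eq. (15)
        "MSE": mse,                                                          # Eq. (15)
        "MAPE": 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,  # Eq. (16)
        "RMSE": rmse,                                                        # Eq. (17)
        "NRMSE": rmse / (sum(actual) / n),                                   # Eq. (17)
    }


def improvement_pct(baseline, candidate):
    """Relative reduction of an error index with respect to a baseline, in percent."""
    return 100.0 * (baseline - candidate) / baseline


# Toy actual and forecasted loads in kW (illustrative values only).
metrics = forecast_metrics([100.0, 200.0, 400.0], [110.0, 190.0, 400.0])
print(metrics)

# College building one-week RMSE from Table 4: APSO-BiLSTM 6.2320, GWO-BiLSTM 3.1432.
print(round(improvement_pct(6.2320, 3.1432), 2))  # 49.56
```

MAPE is undefined when an actual load sample is zero, so such samples would need separate handling in practice.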
Fig. 5. One-week forecast comparison in terms of MAE, MAPE, and RMSE for the college building data set.
overlaps the actual demand more completely than those of the other models.
For the college data set, the performance indices, the error at each instant, and the one-day-ahead prediction are illustrated in Fig. 13, Fig. 14, and Fig. 15, and the performance indices are summarized in Table 5.

5.2.2.2. Hospital building data set. The hospital data set projection results for the next day and two days are presented in this subsection. The 1-layer BiLSTM with 200 hidden units, which performs better in the college demand projection, is less accurate for the hospital demand. Instead, the 2-layer BiLSTM network performs better for this case study.
The same trend in the forecasting performance of all models is repeated for hospital building consumption, as in college building forecasting. The recommended method outperforms the optimized BiLSTM approaches and the state-of-the-art DL approaches in estimating the hospital data set. The one-day forecast performance indices, the error at each instant, and the results of hospital demand are illustrated in Fig. 16, Fig. 17, and Fig. 18. Table 6 compares the models' performance. Compared to college demand prediction, all considered methods predict the hospital building load well, and their MAE is less than 0.5 kW for the one-day and two-day forecasts. The proposed GWO–CNN–BiLSTM technique improves the accuracy in MAPE by 59.64% and 31.21% over 1DCNN-BiLSTM for the one-day and two-day forecasts, respectively.

5.2.2.3. Residential building data set. Fig. 19, Fig. 20, and Fig. 21 compare the proposed technique against other state-of-the-art methods in terms of performance indices, error magnitude at each instant in kW, and one-day forecast.

5.2.2.4. Industrial building data set. For the industrial consumption forecast, the single layer with 200 hidden units performs better and is considered for comparison. Like the case of other datasets, the proposed
Fig. 6. Comparison of one-week forecast error at each instant for college building data set.
Fig. 7. One-week forecast comparison in terms of MAE, MAPE, and RMSE for hospital building data set.
Fig. 8. Comparison of one-week forecast error at each instant for hospital building data set.
Fig. 9. One-week forecast comparison in terms of MAE, MAPE, and RMSE for Residential building data set.
Fig. 10. Comparison of one-week forecast error at each instant for residential building data set.
Fig. 11. One-week forecast comparison in terms of MAE, MAPE, and RMSE for Industrial Building data set.
Fig. 12. Comparison of one-week forecast error at each instant for industrial building data set.
Fig. 13. Comparison of one-day performance in terms of MAE, MAPE, and RMSE for college building.
Fig. 14. One-day forecast each instant error comparison for the college building data set.
Fig. 15. One-day forecast of the College Building.
Table 5
One day and Two days ahead forecast performance comparison of the college building.
Technique One day forecast Two days forecast
MAE (kW) MSE MAPE (%) RMSE (kW) NRMSE MAE (kW) MSE MAPE (%) RMSE (kW) NRMSE
LSTM 5.7781 35.3763 0.0586 5.9478 5.96E-04 5.7334 34.2056 0.0587 5.8486 5.94E-04
BiLSTM 4.3981 24.2246 0.0435 4.9218 4.94E-04 4.6717 26.3633 0.0472 5.1345 5.22E-04
CNN-LSTM 3.9645 18.6148 0.0407 4.3145 4.32E-04 4.5597 25.8831 0.0478 5.0875 5.17E-04
CNN-BiLSTM 2.4042 7.5327 0.0248 2.7445 2.75E-04 2.7372 9.9583 0.0287 3.1556 3.21E-04
APSO-BiLSTM 5.9482 41.7688 0.0583 6.4628 6.48E-04 5.4227 37.6024 0.0534 6.1320 6.23E-04
EJAYA-BiLSTM 5.4324 36.2988 0.0530 6.0248 6.04E-04 4.9104 32.7622 0.0481 5.7238 5.82E-04
GWO-BiLSTM 3.0398 10.8169 0.0301 3.2889 3.30E-04 2.8083 9.99327 0.0279 3.1612 3.21E-04
GWO–CNN–BiLSTM 0.5416 0.3824 0.0055 0.6184 6.20E-05 0.5223 0.3643 0.0054 0.6036 6.13E-05
Fig. 16. Comparison of one-day performance in terms of MAE, MAPE, and RMSE for hospital building.
technique outperforms the considered state-of-the-art techniques, as reported in Table 8.
The comparison of the proposed technique against hybrid and single techniques in terms of the performance indices, the error at each hour, and the one-day forecast is shown in Fig. 22, Fig. 23, and Fig. 24.

5.3. Comparison of BiLSTM with LSTM

The BiLSTM technique outperforms LSTM for the one-day and two-day forecasts in the college, residential, and industrial case studies. For the college building data set, the BiLSTM performance is superior to LSTM by 25.77% in MAPE for the day-ahead forecast and, for the two-day forecast, by
Fig. 17. One-day forecast each instant error comparison models for Hospital building data set.
Fig. 18. One-day forecast of the hospital building dataset.
Table 6
One day and Two days forecast performance comparison for the hospital data set.
Technique One day ahead forecast Two days ahead forecast
MAE (kW) MSE MAPE (%) RMSE (kW) NRMSE MAE (kW) MSE MAPE (%) RMSE (kW) NRMSE
LSTM 0.4284 0.1936 0.0370 0.4400 3.84E-04 0.3325 0.1359 0.0298 0.3686 3.40E-04
BiLSTM 0.4490 0.2892 0.0380 0.5378 4.6958e-04 0.3678 0.1934 0.0332 0.4398 4.0546e-04
CNN-LSTM 0.3205 0.1086 0.0277 0.3295 2.8768e-04 0.2626 0.0817 0.0238 0.2859 2.6355e-04
CNN-BiLSTM 0.2562 0.0737 0.0223 0.2714 2.3694e-04 0.1931 0.0524 0.0173 0.2289 2.1103e-04
APSO-BiLSTM 0.4874 0.3324 0.0408 0.5765 5.03E-04 0.3289 0.1851 0.0288 0.4302 3.97E-04
EJAYA-BiLSTM 0.3968 0.2185 0.0333 0.4674 4.08E-04 0.2660 0.1217 0.0231 0.3489 3.22E-04
GWO-BiLSTM 0.3529 0.1721 0.0296 0.4149 3.62E-04 0.2575 0.1024 0.0228 0.3200 2.95E-04
GWO–CNN–BiLSTM 0.1036 0.0160 0.0090 0.1268 1.11E-04 0.1281 0.0265 0.0119 0.1628 1.50E-04
Fig. 19. Comparison of one-day performance in terms of MAE, MAPE, and RMSE for residential building.
19.59%. Although the BiLSTM performs worse than the LSTM for the hospital demand prediction at the one-day and 48 h ahead horizons, the maximum error at any instant does not exceed 1 kW. For residential and industrial loads, the BiLSTM prediction is better than LSTM by 42.29% and 49.4% in MAPE, respectively, in the day-ahead forecast. The improvement has also been observed for the two-day forecast.
For the one-week forecast, BiLSTM performs better than LSTM, indicating that BiLSTM learns the weekend behaviour better owing to its bidirectional learning capability and produces less error than LSTM. Unlike the one-day forecast case, the BiLSTM technique performs better than LSTM for all four case studies in the one-week prediction. It outperforms LSTM by 25.96%, 3.95%, and 2.76% in MAPE for the one-week forecast of the college, hospital, and industrial buildings, respectively. Adding CNN layers to LSTM or BiLSTM improves accuracy for all cases. Here also, the table results indicate that CNN-BiLSTM performs better than LSTM and BiLSTM and is superior to CNN-LSTM for all case studies, whether for the one-day, two-day, or one-week prediction. Another observation is that BiLSTM also avoids the overfitting problem.
Fig. 20. One-day forecast each instant error comparison for the residential building data set.
Fig. 21. One-day forecast of the Residential demand.
Fig. 22. Comparison of one-day performance in terms of MAE, MAPE, and RMSE for industrial building.
Fig. 23. One-day forecast of each instant error comparison for Industrial Building.
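The percentage improvements quoted in Section 5.3 appear to be relative reductions in MAPE with respect to the baseline model. A minimal sketch, assuming that definition (the function name `relative_improvement` is hypothetical), reproduces the 25.77% day-ahead figure for the college building from the LSTM and BiLSTM MAPE values in Table 5:

```python
def relative_improvement(baseline, candidate):
    """Percentage reduction of an error metric relative to a baseline model."""
    if baseline <= 0:
        raise ValueError("baseline metric must be positive")
    return 100.0 * (baseline - candidate) / baseline


# One-day MAPE (%) for the college building, copied from Table 5.
lstm_mape, bilstm_mape = 0.0586, 0.0435
print(round(relative_improvement(lstm_mape, bilstm_mape), 2))  # 25.77
```

The same calculation with the two-day values from Table 5 (0.0587 for LSTM and 0.0472 for BiLSTM) gives 19.59%, matching the text.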
5.4. Effect of the optimal CNN-BiLSTM technique on the performance

The performance of the BiLSTM technique is improved in two ways. The first strategy is parameter tuning, while the second is a hybrid approach that combines the two methods.

• Automatic parameter tuning of the BiLSTM network improves accuracy compared with manual setting. Three well-known metaheuristic-tuned BiLSTM networks are put up for comparison. They are GWO, improved JAYA, and adaptive PSO. The BiLSTM, CNN, and training option parameters from Appendix C are considered for optimization.
Fig. 24. One-day forecast of the Industrial building.
• In contrast to APSO-BiLSTM and EJAYA-BiLSTM, the GWO-based BiLSTM network performs better. The results demonstrate the GWO's ability to explore and exploit the solution space.
• All the considered optimization techniques produced better results for single-layer BiLSTM.
• For the hospital demand, the drop factors of the EJAYA-BiLSTM and GWO-BiLSTM techniques are 0.21 and 0.118, improving the accuracy over BiLSTM. The GWO-BiLSTM also takes advantage of the hidden units in the first layer (225), resulting in better performance than EJAYA-BiLSTM. The drop factor of APSO-BiLSTM is only 0.1, resulting in poor performance. The GWO–CNN–BiLSTM drop factor and first-layer hidden units are 0.18 and 225, respectively. Along with these parameters, CNN layers also assist in improving accuracy.
• Similarly, for the case of residential demand, the drop factors for APSO-BiLSTM, EJAYA-BiLSTM, GWO-BiLSTM, and GWO–CNN–BiLSTM are 0.1, 0.4, 0.26, and 0.16, and the learning rates are 1.00E-03, 1.43E-04, 4.78E-04, and 6.11E-04. These values indicate that a higher drop factor and a lower learning rate improve the accuracy of EJAYA-BiLSTM, GWO-BiLSTM, and GWO–CNN–BiLSTM. The hidden units selected by all optimization algorithms are almost equal.
• A similar pattern is observed for industrial demand, except for GWO-BiLSTM, whose learning rate is high. This decreases its performance for the one-day forecast, although it remains consistent across the one-day, two-day, and weekly forecasts.
• For the one-day and two-day forecasts, the GWO-BiLSTM outperforms the APSO-BiLSTM and EJAYA-BiLSTM in MAPE by 48.37% and 47.75%, and 43.27% and 42%, for college and hospital buildings. For the residential building, the GWO-BiLSTM produced better predictions than APSO-BiLSTM for the one-day and two-day forecasts. In contrast, for industrial demand, its performance is superior to APSO-BiLSTM for the two-day forecast.
• The error metrics of the traditional LSTM, BiLSTM, and GWO-BiLSTM are reduced in value with the inclusion of CNN. Two hybrid techniques, CNN-LSTM and CNN-BiLSTM, are used for comparison, which are created by inserting 1D-CNN layers before the actual LSTM and BiLSTM layers, respectively. Including the CNN layer in BiLSTM enhances its capacity to forecast and outshines CNN-LSTM because the improved data capture capabilities of the CNN enable BiLSTM to increase the accuracy of its forecasts.
• The results indicate that the GWO successfully obtains the best-quality parameter values. Moreover, a hybrid strategy gave more accurate results than a single method because it incorporated the two methods' best qualities. The current study made use of both approaches, which significantly increased accuracy. The study therefore suggests that the CNN-BiLSTM framework parameters be tuned using GWO, since it is more accurate in estimation than optimized BiLSTM networks and un-optimized hybrid techniques.
• Optimizing the parameters of the hybrid technique, as in the GWO–CNN–BiLSTM strategy, outperforms the single and hybrid methods. The relative error of the proposed technique

5.5. Comparison of optimization capability and performance

Compared to BiLSTM parameter optimization, the optimal parameter selection of the CNN-BiLSTM network shows considerable improvement in the forecast accuracy from the start of the optimization. Even though the APSO-BiLSTM technique exhibits a good prediction for the residential demand, it is far less accurate than the GWO and EJAYA-based forecasting approaches. The addition of the CNN layer is reflected in the reduction of epochs for all case studies without losing accuracy.
The results of all case studies illustrate that GWO-BiLSTM has consistent and more accurate performance than APSO-BiLSTM and EJAYA-BiLSTM. Fig. 25 shows the convergence of the optimization techniques and indicates that the convergence of GWO is faster than that of the other two optimization methods, which is reflected in fewer epochs. The proposed GWO–CNN–BiLSTM has a flatter convergence curve, indicating greater stability. One common observation is that all optimization methods produced better results for a single layer of the forecast network. The generalization capability of the proposed technique is assessed on four buildings of different characteristics, with varied training and testing sample sizes and for different forecast horizons. Table 4, Table 5, Table 6, Table 7, and Table 8 illustrate its superior performance.

6. Conclusions

The study proposed a novel hybrid GWO–CNN–BiLSTM approach to resolve the STLF issue of buildings. The GWO algorithm is integrated to obtain the optimal parameters from the training of CNN and BiLSTM networks, thereby utilizing them to improve the forecast accuracy. The 1DCNN extracts the features from the sequence data, and the BiLSTM learns the data in both forward and backward directions. The model performance is analyzed for one-day, two-day, and one-week projections of four case studies with different characteristics: college, hospital, residential, and industrial. Different training sample sizes are also considered to analyze the generalization capability of the proposed model.
The conclusions are: 1) Compared to the individual techniques LSTM and BiLSTM, the hybrid approaches CNN-LSTM and CNN-BiLSTM produced better predictions to a certain extent. 2) The GWO converged faster, with a flatter curve, for the CNN-BiLSTM network, indicating its stability. Even the error at the starting iteration is much lower. 3) The proposed hybrid optimal strategy GWO–CNN–BiLSTM outperforms all considered techniques: LSTM and BiLSTM, hybrid approaches like CNN-LSTM and CNN-BiLSTM, and optimized BiLSTM approaches like APSO-BiLSTM, EJAYA-BiLSTM, and GWO-BiLSTM. 4) The suggested approach increased precision and maintained the MAE and MAPE below 1 kW and 0.25%, respectively, and the relative error is less than 0.001 for all four data sets.
Fig. 25. Optimization Convergence comparison for hospital, residential and industrial buildings.
Table 7
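One day and Two days forecast performance comparison for the Residential data set.
Technique One day forecast Two days forecast
MAE (kW) MSE MAPE (%) RMSE (kW) NRMSE MAE (kW) MSE MAPE (%) RMSE (kW) NRMSE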
LSTM 0.0021 5.47E-06 0.5081 0.0023 0.0054 0.00213 5.31E-06 0.4984 0.0023 0.0053
BiLSTM 0.0013 3.20E-06 0.2932 0.0017 0.0041 0.00142 3.37E-06 0.2981 0.0018 0.0042
CNN-LSTM 0.0013 3.03E-06 0.3967 0.0017 0.0040 0.00142 3.37E-06 0.4145 0.0018 0.0042
CNN-BiLSTM 0.0010 1.32E-06 0.2497 0.0011 0.0026 9.44E-04 1.21E-06 0.2365 0.0011 0.0025
APSO-BiLSTM 0.0081 1.16E-04 0.3651 0.0107 0.0056 0.0064 8.02E-05 0.3529 0.0089 0.0057
EJAYA-BiLSTM 0.0011 1.65E-06 0.3124 0.0012 0.003 0.0011 1.73E-06 0.3220 0.0013 0.0031
GWO-BiLSTM 0.0018 4.16E-06 0.5018 0.0020 0.0048 0.0018 4.41E-06 0.5153 0.0021 0.0049
GWO–CNN–BiLSTM 9.20E-04 8.86E-07 0.2464 9.41E-04 0.0022 9.33E-04 9.15E-07 0.2500 9.57E-04 0.0022
Table 8
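One day and Two days forecast performance comparison for the Industrial data set.
Technique One day forecast Two days forecast
MAE (kW) MSE MAPE (%) RMSE (kW) NRMSE MAE (kW) MSE MAPE (%) RMSE (kW) NRMSE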
LSTM 0.7071 0.6837 0.0419 0.8269 0.00048 0.7139 0.6441 0.0451 0.8025 0.00050
BiLSTM 0.3651 0.1663 0.0212 0.4078 0.000238 0.5517 0.3879 0.0357 0.6228 0.00038
CNN-LSTM 0.3482 0.2056 0.0194 0.4534 0.000264 0.3406 0.1744 0.0210 0.4177 0.00026
CNN-BiLSTM 0.2764 0.0871 0.0166 0.2952 0.00017 0.3526 0.1395 0.0229 0.3735 0.00023
APSO-BiLSTM 0.5699 0.4562 0.0333 0.6754 3.94E-04 0.8114 0.8499 0.0524 0.9219 5.75E-04
EJAYA-BiLSTM 0.5214 0.3992 0.0299 0.6318 3.69E-04 0.7253 0.6707 0.0465 0.8190 5.10E-04
GWO-BiLSTM 0.6971 0.7062 0.0392 0.8403 4.56E-04 0.6202 0.5361 0.0376 0.7322 4.56E-04
GWO–CNN–BiLSTM 0.2564 0.1511 0.0155 0.3887 2.27E-04 0.1959 0.0861 0.0123 0.2935 1.83E-04
In conclusion, the proposed GWO–CNN–BiLSTM prediction model can effectively assist in smarter utilization of building resources and cost savings by providing high prediction accuracy with consistency and greater generalization ability. This approach can also assist in planning the proper generation units to manage the electrical energy necessary for net-zero buildings. The proposed strategy has the potential to predict the distributed generation of power in the future. Future work can also be developed on an improved version of the metaheuristic technique.

Credit author statement

Charan Sekhar: Conceptualization, Methodology, Software, Data curation, Investigation, Writing – original draft. Ratna Dahiya: Supervision, Visualization, Writing – Reviewing and Editing, Software, Validation.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The data considered in the article are publicly available and mentioned in reference
Appendix A
Fig. 26. Zoom plot for college demand forecast.
Fig. 27. Zoom plot for Hospital demand forecast.
Fig. 28. Zoom plot for Residential demand forecast.
Fig. 29. Zoom plot for Industrial demand forecast.
Appendix B. Relative Error of the Proposed Method for all case studies
Fig. 30. Relative error of proposed model for college building.
Fig. 31. Relative error of proposed model for hospital building.
Fig. 32. Relative error of the proposed model for residential building.
Fig. 33. Relative error of the proposed model for industrial building.
Appendix C: List of Hyperparameters for optimization
Technique            Hyperparameter                          Lower Limit   Upper Limit
CNN                  1D CNN filter size                      2             8
CNN                  1D CNN number of filters                2             256
BiLSTM               Hidden units (HU) in the first layer    175           225
BiLSTM               HU in the second layer                  75            125
BiLSTM               HU in the third layer                   25            75
Common Parameters    No. of layers                           1             3
Common Parameters    Epochs                                  100           200
Common Parameters    Learn rate                              10^(-5)       10^(-3)
Common Parameters    Drop factor                             0.1           0.4
Common Parameters    Minibatch size                          16            256
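To illustrate how a grey wolf optimizer can search a bounded hyperparameter space such as the one in Appendix C, the following minimal sketch applies the standard GWO position update (three leaders alpha, beta, and delta, with the coefficient a decreasing linearly from 2 towards 0). It is not the authors' implementation. Several choices are assumptions made for illustration: only a subset of the Appendix C bounds is used, the learning rate is searched on a log10 scale, the wolves move in a normalised unit cube, the leaders are kept across iterations, and `toy_objective` is a hypothetical stand-in for the validation error of a trained CNN-BiLSTM network.

```python
import random

# Search bounds copied from Appendix C (a subset, for brevity).
# Searching the learning rate on a log10 scale is an assumption.
BOUNDS = {
    "hidden_units_layer1": (175.0, 225.0),
    "epochs": (100.0, 200.0),
    "log10_learn_rate": (-5.0, -3.0),
    "drop_factor": (0.1, 0.4),
}


def decode(unit_position, bounds):
    """Map a point in the unit cube [0, 1]^d to real hyperparameter values."""
    return {
        name: low + u * (high - low)
        for u, (name, (low, high)) in zip(unit_position, bounds.items())
    }


def grey_wolf_optimize(objective, bounds, n_wolves=8, n_iter=60, seed=0):
    """Minimise objective(params_dict) with the standard GWO update."""
    rng = random.Random(seed)
    dim = len(bounds)
    wolves = [[rng.random() for _ in range(dim)] for _ in range(n_wolves)]
    leaders = []  # best three (cost, position) pairs seen so far

    for it in range(n_iter):
        scored = [(objective(decode(w, bounds)), w) for w in wolves]
        leaders = sorted(leaders + scored, key=lambda cw: cw[0])[:3]
        a = 2.0 * (1.0 - it / n_iter)  # decreases linearly from 2 towards 0
        new_wolves = []
        for w in wolves:
            position = []
            for d in range(dim):
                estimate = 0.0
                for _, leader in leaders:  # alpha, beta, delta
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[d] - w[d])
                    estimate += leader[d] - A * D
                # Average the three estimates and clip to the unit cube.
                position.append(min(1.0, max(0.0, estimate / 3.0)))
            new_wolves.append(position)
        wolves = new_wolves

    final = [(objective(decode(w, bounds)), w) for w in wolves]
    best_cost, best_unit = min(leaders + final, key=lambda cw: cw[0])
    return decode(best_unit, bounds), best_cost


def toy_objective(params):
    """Hypothetical stand-in for the validation error of a trained network."""
    target = {
        "hidden_units_layer1": 200.0,
        "epochs": 150.0,
        "log10_learn_rate": -4.0,
        "drop_factor": 0.25,
    }
    return sum(
        ((params[k] - target[k]) / (BOUNDS[k][1] - BOUNDS[k][0])) ** 2
        for k in target
    )


best_params, best_cost = grey_wolf_optimize(toy_objective, BOUNDS)
```

In practice, the objective would train the forecasting network with the decoded hyperparameters and return a validation error, and integer-valued parameters such as hidden units and epochs would be rounded before use.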
References [6] Bourdeau M, qiang Zhai X, Nefzaoui E, Guo X, Chatellier P. Modeling and
forecasting building energy consumption: a review of data-driven techniques.
Sustain Cities Soc 2019;48:101533. https://doi.org/10.1016/j.scs.2019.101533.
[1] Hossain E, Kabalci E, Bayindir R, Perez R. Microgrid testbeds around the world:
[7] Jha N, Prashar D, Rashid M, Gupta SK, Saket RK. Electricity load forecasting and
state of art. Energy Convers Manag Oct. 2014;86:132–53. https://doi.org/
feature extraction in smart grid using neural networks. Comput Electr Eng 2021;96:
10.1016/j.enconman.2014.05.012.
107479. https://doi.org/10.1016/j.compeleceng.2021.107479.
[2] Reilly MB William K. ClimateWorks-Annual-Report-2010. ClimateWorks; 2010.
[8] Yazici I, Faruk O, Delen D. Deep-learning-based short-term electricity load
[3] Hamilton I, Rapf O. Executive summary of the 2020 global status report for
forecasting : a real case application. Eng Appl Artif Intell 2022;109:104645.
buildings and construction. 2020.
https://doi.org/10.1016/j.engappai.2021.104645. January.
[4] Ahmad A, Khan JY. Optimal sizing and management of distributed energy
[9] Eskandari H, Imani M, Moghaddam MP. Convolutional and recurrent neural
resources in smart buildings. Energy 2022;244. https://doi.org/10.1016/j.
network based model for short-term load forecasting. Elec Power Syst Res 2021;
energy.2022.123110.
195:107173. https://doi.org/10.1016/j.epsr.2021.107173. October 2020.
[5] Raza MQ, Khosravi A. A review on artificial intelligence based load demand
[10] Rafi SH, Nahid-Al-Masood, Deeba SR, Hossain E. A short-term load forecasting
forecasting techniques for smart grid and buildings. Renew Sustain Energy Rev
method using integrated CNN and LSTM network. IEEE Access 2021;9:32436–48.
2015;50:1352–1372, Oct. https://doi.org/10.1016/j.rser.2015.04.065.
https://doi.org/10.1109/ACCESS.2021.3060654.
16
[11] Farsi B, Amayri M, Bouguila N, Eicker U. On short-term load forecasting using [34] Li K, Hu C, Liu G, Xue W. Building’s electricity consumption prediction using
machine learning techniques and a novel parallel deep LSTM-CNN approach. IEEE optimized artificial neural networks and principal component analysis. Energy
Access 2021;9:31191–212. https://doi.org/10.1109/ACCESS.2021.3060290. Build 2015;108:106–13. https://doi.org/10.1016/j.enbuild.2015.09.002.
[12] Wang S, Wang X, Wang S, Wang D. Bi-directional long short-term memory method [35] Wang J, Zhu S, Zhang W, Lu H. Combined modeling for electric load forecasting
based on attention mechanism and rolling update for short-term load forecasting. with adaptive particle swarm optimization. Energy 2010;35(4):1671–1678, Apr.
Int J Electr Power Energy Syst Jul. 2019;109:470–9. https://doi.org/10.1016/j. https://doi.org/10.1016/j.energy.2009.12.015.
ijepes.2019.02.022. February. [36] Semero YK, Zhang J, Zheng D. EMD–PSO–ANFIS-based hybrid approach for short-
[13] Mughees N, Mohsin SA, Mughees A, Mughees A. Deep sequence to sequence Bi- term load forecasting in microgrids. IET Gener, Transm Distrib Feb. 2020;14(3):
LSTM neural networks for day-ahead peak load forecasting. Expert Syst Appl 2021; 470–5. https://doi.org/10.1049/iet-gtd.2019.0869.
175:114844. https://doi.org/10.1016/j.eswa.2021.114844. December 2020. [37] Raza MQ, Nadarajah M, Hung DQ, Baharudin Z. An intelligent hybrid short-term
[14] Zhang N, Li Z, Zou X, Quiring SM. Comparison of three short-term load forecast load forecasting model for smart power grids. Sustain Cities Soc 2017;31:264–75.
models in Southern California. Energy Dec. 2019;189:116358. https://doi.org/ https://doi.org/10.1016/j.scs.2016.12.006.
10.1016/j.energy.2019.116358. [38] Ofori-Ntow Jnr E, Ziggah YY, Relvas S. Hybrid ensemble intelligent model based on
[15] Bagnasco A, Fresi F, Saviozzi M, Silvestro F, Vinci A. Electrical consumption wavelet transform, swarm intelligence and artificial neural network for electricity
forecasting in hospital facilities: an application case. Energy Build 2015;103: demand forecasting. Sustain Cities Soc 2021;66:102679. https://doi.org/10.1016/
261–70. https://doi.org/10.1016/j.enbuild.2015.05.056. j.scs.2020.102679.
[16] Lusis P, Khalilpour KR, Andrew L, Liebman A. Short-term residential load [39] Massana J, Pous C, Burgas L, Melendez J, Colomer J. Short-term load forecasting in
forecasting: impact of calendar effects and forecast granularity. Appl Energy 2017; a non-residential building contrasting models and attributes. Energy Build Apr.
205:654–69. https://doi.org/10.1016/j.apenergy.2017.07.114. July. 2015;92:322–30. https://doi.org/10.1016/j.enbuild.2015.02.007.
[17] Deb C, Eang LS, Yang J, Santamouris M. Forecasting diurnal cooling energy load [40] Li C. Designing a short-term load forecasting model in the urban smart grid system.
for institutional buildings using Artificial Neural Networks. Energy Build 2016; Appl Energy May 2020;266:114850. https://doi.org/10.1016/j.
121:284–97. https://doi.org/10.1016/j.enbuild.2015.12.050. apenergy.2020.114850. January.
[18] Dagdougui H, Bagheri F, Le H, Dessaint L. Neural network model for short-term [41] Talaat M, Farahat MA, Mansour N, Hatata AY. Load forecasting based on
and very-short-term load forecasting in district buildings. Energy Build Nov. 2019; grasshopper optimization and a multilayer feed-forward neural network using
203:109408. https://doi.org/10.1016/j.enbuild.2019.109408. regressive approach. Energy 2020;196:117087. https://doi.org/10.1016/j.
[19] Jetcheva JG, Majidpour M, Chen W. Neural network model ensembles for building- energy.2020.117087.
level electricity load forecasts. Energy Build Dec. 2014;84:214–23. https://doi.org/ [42] Tayab UB, Zia A, Yang F, Lu J, Kashif M. Short-term load forecasting for microgrid
10.1016/j.enbuild.2014.08.004. energy management system using hybrid HHO-FNN model with best-basis
[20] Biswas MAR, Robinson MD, Fumo N. Prediction of residential building energy stationary wavelet packet transform. Energy Jul. 2020;203:117857. https://doi.
consumption: a neural network approach. Energy 2016;117:84–92. https://doi. org/10.1016/j.energy.2020.117857.
org/10.1016/j.energy.2016.10.066. [43] Gao Z, Yu J, Zhao A, Hu Q, Yang S. A hybrid method of cooling load forecasting for
[21] Amber KP, Aslam MW, Hussain SK. Electricity consumption forecasting models for large commercial building based on extreme learning machine. Energy Jan. 2022;
administration buildings of the UK higher education sector. Energy Build 2015;90: 238:122073. https://doi.org/10.1016/j.energy.2021.122073.
127–36. https://doi.org/10.1016/j.enbuild.2015.01.008. [44] Thanh NGOC Tran. Grid search of convolutional neural network model in the case
[22] Ghenai C, et al. Short-term building electrical load forecasting using adaptive of load forecasting. Arch Electr Eng 2021;70(1):25–36. https://doi.org/10.24425/
neuro-fuzzy inference system (ANFIS). J Build Eng 2022;52:104323. https://doi. aee.2021.136050.
org/10.1016/j.jobe.2022.104323. February. [45] Luo XJ, Oyedele LO. Forecasting building energy consumption: adaptive long-short
[23] Mocanu E, Nguyen PH, Gibescu M, Kling WL. Deep learning for estimating building term memory neural networks driven by genetic algorithm. Adv Eng Inf Oct. 2021;
energy consumption. Sustain. Energy, Grids Networks 2016;6:91–9. https://doi. 50:101357. https://doi.org/10.1016/j.aei.2021.101357. July.
org/10.1016/j.segan.2016.02.005. [46] Somu N, M R GR, Ramamritham K. A hybrid model for building energy
[24] Cai M, Pipattanasomporn M, Rahman S. Day-ahead building-level load forecasts consumption forecasting using long short term memory networks. Appl Energy
using deep learning vs. traditional time-series techniques. Appl Energy 2019;236: Mar. 2020;261:114131. https://doi.org/10.1016/j.apenergy.2019.114131. July
1078–88. https://doi.org/10.1016/j.apenergy.2018.12.042. December 2018. 2019.
[25] Koukaras P, Bezas N, Gkaidatzis P, Ioannidis D, Tzovaras D, Tjortjis C. Introducing [47] Sharma A, Jain SK. A novel seasonal segmentation approach for day-ahead load
a novel approach in one-step ahead energy load forecasting. Sustain. Comput. forecasting. Energy Oct. 2022;257:124752. https://doi.org/10.1016/j.
Informatics Syst. 2021;32:100616. https://doi.org/10.1016/j. energy.2022.124752.
suscom.2021.100616. July 2020. [48] Chung WH, Gu YH, Yoo SJ. District heater load forecasting based on machine
[26] Kong W, Dong ZY, Jia Y, Hill DJ, Xu Y, Zhang Y. Short-term residential load learning and parallel CNN-LSTM attention. Energy May 2022;246:123350. https://
forecasting based on LSTM recurrent neural network. IEEE Trans Smart Grid Jan. doi.org/10.1016/j.energy.2022.123350.
2019;10(1):841–51. https://doi.org/10.1109/TSG.2017.2753802. [49] Barman M, Dev Choudhury NB. A similarity based hybrid GWO-SVM method of
[27] Walser T, Sauer A. Typical load profile-supported convolutional neural network for power system load forecasting for regional special event days in anomalous load
short-term load forecasting in the industrial sector. Energy AI 2021;5. https://doi. situations in Assam, India. Sustain Cities Soc Oct. 2020;61:102311. https://doi.
org/10.1016/j.egyai.2021.100104. July, p. 100104, Sep. org/10.1016/j.scs.2020.102311. January.
[28] Imani M. Electrical load-temperature CNN for residential load forecasting. Energy [50] Siami-Namini S, Tavakoli N, Namin AS. The performance of LSTM and BiLSTM in
Jul. 2021;227:120480. https://doi.org/10.1016/j.energy.2021.120480. forecasting time series. In: IEEE international conference on big data (big data);
[29] Chitalia G, Pipattanasomporn M, Garg V, Rahman S. Robust short-term electrical 2019. p. 3285–92. https://doi.org/10.1109/BigData47090.2019.9005997. Dec.
load forecasting framework for commercial buildings using deep recurrent neural 2019.
networks. Appl Energy Nov. 2020;278:115410. https://doi.org/10.1016/j. [51] Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput Nov. 1997;
apenergy.2020.115410. January. 9(8):1735–80. https://doi.org/10.1162/neco.1997.9.8.1735.
[30] Kim TY, Cho SB. Predicting residential energy consumption using CNN-LSTM [52] Schuster M, Paliwal KK. Bidirectional Recurrent Neural Networks 1997;45(11):
neural networks. Energy 2019;182:72–81. https://doi.org/10.1016/j. 2673–81.
energy.2019.05.230. [53] Mirjalili S, Mirjalili SM, Lewis A. Grey wolf optimizer. Adv Eng Software 2014;69:
[31] Sajjad M, et al. A novel CNN-GRU-Based hybrid approach for short-term residential 46–61. https://doi.org/10.1016/j.advengsoft.2013.12.007.
load forecasting. IEEE Access 2020;8:143759–68. https://doi.org/10.1109/ [54] Jayabarathi T, Raghunathan T, Adarsh BR, Suganthan PN. Economic dispatch
ACCESS.2020.3009537. using hybrid grey wolf optimizer. Energy 2016;111:630–41. https://doi.org/
[32] Somu N, Raman M R G, Ramamritham K. A deep learning framework for building 10.1016/j.energy.2016.05.105.
energy consumption forecast. Renew Sustain Energy Rev 2021;137:110591. [55] Jafari F, Mohsen A, Ghofrani ALI, Angizeh. EnergyPlus data. 2020.
https://doi.org/10.1016/j.rser.2020.110591. April 2020. [56] Polu Sri. Residential power usage 3 years data - Timeseries. Kaggle; 2020. https://
[33] Liu N, Tang Q, Zhang J, Fan W, Liu J. A hybrid forecasting model with parameter www.kaggle.com/datasets/srinuti/residential-power-usage-3years-data-timeserie
optimization for short-term load forecasting of micro-grids. Appl Energy 2014;129: s. accessed Nov. 01, 2022).
336–45. https://doi.org/10.1016/j.apenergy.2014.05.023.
17