
Electric Power Systems Research 222 (2023) 109507
A CNN and LSTM-based multi-task learning architecture for short and medium-term electricity load forecasting
Shiyun Zhang, Runhuan Chen, Jiacheng Cao, Jian Tan *
School of Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
Keywords: Convolutional neural network; Long short-term memory; Multi-task learning; Deep learning framework; Electricity load forecast

Abstract

Electricity load forecasting is the forecast of power load in the future period based on historical load and its related factors. It is of great importance for power system planning, operation, and decision making. There is still an opportunity to improve the precision of forecasting using machine learning and deep learning models, according to previous research findings. In this research a deep learning framework based on multi-task learning (MTL), convolutional neural networks (CNN), and long short-term memory (LSTM) is suggested. By weighing the pertinent training data for the main task and the auxiliary task, the proposed hybrid deep learning network model MTMV-CNN-LSTM addresses the issues of too much repetitive data and poor convolution effect and significantly enhances the generalizability of the model. It uses the CNN layer to extract features from the input data and the LSTM layer for sequence learning. To forecast the short and medium-term electrical load, we created two different trials. The results show that our proposed method is a highly competitive new model with high priority compared to other models for both 10-day short-term and 3-month medium-term power forecasting tasks.
* Corresponding author.
E-mail address: tj@njupt.edu.cn (J. Tan).
https://doi.org/10.1016/j.epsr.2023.109507
Received 22 October 2022; Received in revised form 7 May 2023; Accepted 20 May 2023
Available online 27 May 2023
0378-7796/© 2023 Elsevier B.V. All rights reserved.

1. Introduction

The forecast of the electric system load is a combination of historical power load [1], date factors, weather conditions, economic factors, and other factors that influence the forecast of the power system load in the future period [2]. It serves as the cornerstone for the economic operation of the power system [3] and is crucial to power system planning and operation [4]. Accurate forecasting of the electrical load helps ensure the financial benefits of the power infrastructure and improves social stability, especially during the COVID-19 pandemic [5]. The forecast of electricity loads can be divided into three categories: short-term, medium-term, and long-term, covering three forecast times that range from 24 h to a week, a week to a year, and more than a year, respectively [6]. Different types of power forecast serve different purposes: short-term power load forecasts are used to provide a theoretical basis for hydroelectricity scheduling, unit start-up, and shutdown [7]; medium-term power load forecasts provide a theoretical basis for rationalizing the maintenance of grid equipment. Forecasts of the electricity consumption situation in the next few years are mainly used for grid renovation and expansion plans in the grid planning department. In this paper, we study the effect of the hybrid model constructed on the forecasting of short and medium-term power loads.

The most widely used electric load forecasting models can be classified into three categories [4]: statistical models, machine learning models, and hybrid models [8]. Statistical models and machine learning methods have been widely used in electric load forecasting in the past. However, due to the numerous factors affecting the load and the inability to take into account the latent nonlinear interactions among the data, traditional statistical approaches find it challenging to produce superior prediction results. Even if machine learning solves the issue of non-linear interactions not being taken into account and produces better prediction outcomes, overfitting still occurs when the data dimension is large. In recent years, various types of deep learning models and their hybrid models have also been developed for electric load forecasting, and these deep learning models also have better forecasting results than other classical machine learning methods.

Long short-term memory (LSTM) and convolutional neural networks (CNN) are the most widely used techniques in deep learning [9], and have also been widely used in power load forecasting in recent years. CNN can effectively extract depth features and filter noise [10]; in [11], a deep convolutional neural network model based on multi-scale convolutions is proposed for multi-step short-term load forecasting. LSTM can capture information on sequence patterns and effectively solve the problem of vanishing or exploding gradients in RNNs [12]. Marino [13] and Kong [14] et al. used LSTM networks to predict electrical service of individual houses and achieved good results. Hybrid models frequently combine the benefits of other models to produce more accurate prediction results. According to the experimental findings in [15] and [6], the hybrid model of either one can significantly increase predictability.

The complex and changing weather conditions can have an impact on the load of the power system. Because it is difficult to monitor the changes of weather in a very short period of time, the time interval between weather data and power load data is often unequal, which limits traditional load forecasting models. In this paper, we implement a convolutional LSTM multi-task learning model [16], putting data such as weather data at day intervals into the auxiliary task for training and total active power every 15 min into the main task for training. The CNN layer is used to extract the depth features of the data as input of the LSTM layer, and the LSTM layer decodes the extracted depth features and obtains the prediction results through the fully connected layer. Our proposed MTMV-CNN-LSTM model significantly outperforms the other baseline models in the 10-day short-term forecasting and 3-month medium-term power load forecasting tasks constructed in this paper. In the 10-day load forecasting task, the multi-task model improves by 66.67%, 40.35%, 61.54%, and 6.21% in the four evaluation metrics of MSE, RMSE, MAE, and R², respectively, compared with the single-task CNN-LSTM model. In the 3-month load prediction task, the multi-task model performs 5.89% worse than the single-task CNN-LSTM in the R² metric, but improves by 62.5%, 39.78%, and 31.25% in the three metrics of MSE, RMSE, and MAE, respectively. In general, the introduction of multi-task learning leads to effective improvement in prediction.

The main contributions of this paper are as follows. (1) We propose a new hybrid deep learning network based on data features by drawing on the multi-task learning approach commonly used in various fields of Artificial Intelligence (AI), which solves the problem of poor convolutional effects with too many repeated values. (2) Our model performs different tasks simultaneously and can share parameters to a certain extent, which improves the generalizability of the model. (3) We are the first to introduce a multi-task learning approach to the task of power load forecasting.

The remainder of the paper is organized as follows. Section 2 provides an analysis of the literature review and compares the proposed model with existing methods. Section 3 describes the data used in this paper in detail. Section 4 presents the methods used in this paper and the specific structure of the proposed MTMV-CNN-LSTM (multiple task, multiple value) model. Section 5 presents the evaluation metrics used to evaluate the model performance, the experimental setup, and the analysis of the experimental results. Section 6 provides conclusions and a discussion of future research directions.

2. Related work

In recent decades, researchers have conducted a great deal of research on electric load forecasting, and improvements and innovations have been made in many techniques. Some of the newer methods and techniques are discussed below.

Power load prediction is also a kind of sequential modeling task, which is mostly studied on the basis of recurrent neural networks, and LSTM is one of the most commonly used models. When training lengthy sequences, LSTM performs better than RNN, has higher generalizability and can handle the vanishing and exploding gradient problems. Kong et al. [17] proposed an LSTM-based deep learning framework for forecasting meter-level load. Incorporating residential living patterns into the meter-scale prediction significantly improves the prediction accuracy by inputting energy sequences of major household devices. Shi et al. [18] proposed a pooled deep recurrent neural network (PDRNN) to address the high volatility and uncertainty in home load forecasting, with a 6.5% reduction in RMSE compared to the classical RNN forecasting model. Hua et al. [19] developed a hybrid modeling approach combining RNN and Ornstein-Uhlenbeck process to solve energy management problems as well as to predict PV panel and load power by dynamic programming methods. Lin et al. [20] used an attention mechanism to assign weight coefficients to input sequence data and then calculated the electrical load output value for each LSTM cell based on the forward propagation method, which effectively improved the model accuracy. Shi [21] proposed a two-layer LSTM framework that relaxes the fixed strong assumption of mutation points and detects uncertain mutation points point by point. The long and short-term sensor dependencies within each sensor are characterized by an LSTM network to achieve high-precision real-time RUL prediction. Dat et al. [22] used a hybrid online model to process real-time data. The original load sequence is decomposed into trend, seasonal and residual terms and predicted using online ARIMA, Fourier transform and online RNN, respectively. The model can dynamically modify its parameters to adapt to changes in real-time data and gives better results than a single online model. Dua et al. [23] focused on the application of SVM and DNN in short-term electricity load forecasting. The prediction results of the different models were compared by modifying the structure of the DNN model and changing the kernel function of the SVM.

CNN is also one of the most widely used models in deep learning, which has the property of reducing the complexity of the model and improving its ability to extract abstract features through local connectivity and shared weights. CNN is also often used for electricity load forecasting, or in combination with recurrent neural networks, because of the large number of factors influencing electricity load, such as weather factors, temporal characteristics, economic factors, consumer types, unexpected events, electricity prices, etc. Kim [24] proposed a CNN-LSTM network that combines convolutional neural networks and long and short-term memory neural networks to extract complex energy consumption characteristics for time series components. The temporal information of irregular trends in the modeling solves the problem of electrical energy consumption, which was previously difficult to predict. Khan et al. [25] proposed a hybrid model combining CNN and LSTM-AE for power prediction in residential and commercial buildings, and the experimental results showed that the proposed hybrid model works well. Singh et al. [26] constructed a multi-step model to predict short-term electrical loads based on 2D convolutional networks, which is smaller in size and training time compared to other deep learning methods with similar effects and greatly reduces training cost.

Reinforcement learning can learn better policy planning and control in exploring and utilizing data to maximize long-term gains in sequential decision-making tasks, and has also been widely used in grid energy dispatching. Feng et al. [27] proposed a new reinforcement learning-based dynamic model selection (DMS) method for short-term power load forecasting, which uses an optimal DMS strategy to select the optimal model from a pool of forecasting models at each time step, greatly improving the prediction accuracy. Furthermore, deterministic and probabilistic load forecasting (DLF and PLF) are also important for power systems, so Feng [28] also combined reinforcement learning with multi-step forecasting to construct a two-step short-term load forecasting model based on Q-learning for dynamic model selection, first selecting the locally optimal DLF model from a pool of deterministic forecasting models and then selecting the best PLF model from a pool of probabilistic forecasting models. The method improves the accuracy of DLF and PLF prediction by 50% and 60%, respectively.

In addition to the above methods, other deep learning techniques such as generative adversarial networks (GAN) and transfer learning have also been developed for power load forecasting and have shown excellent prediction performance. The GAN solves the problem of low data volume by generating virtual data. Moon et al. [29] proposed a
two-stage data generation scheme to generate temperature data and electricity load data using GAN and regression models, respectively, to increase the sample size. In addition, GAN-generated data curves can be used to capture data features. Wang et al. [30] showed the real electric load and the unknown probability distribution using GAN, selecting the optimal number of clusters for clustering using K-Means and Davies-Bouldin indices, and then generating load curves for each cluster using GAN, and the results showed that GAN-generated load curves can not only capture the general trend, but also detect its random variation. For regions with limited historical data, transfer learning can be chosen to learn the model from the historical data in the source domain and transfer the model from the source domain to the target region by fine-tuning the model and parameters. The experimental results of Yang [31] show that the prediction error can be significantly reduced using knowledge learned from other regions.

The research and review of the existing literature provide a theoretical basis for this study. Various machine learning and deep learning algorithms are commonly used for electric load forecasting, and methods such as GAN can be chosen to generate virtual data when the amount of data is small; when there are many variables, feature screening or a CNN is often used to extract high-dimensional features. In this paper, the amount of data is sufficient and there are many feature variables, so CNN is selected for feature screening and improved based on the LSTM. After the weather features are encoded, the data dimension increases exponentially, and because the time interval between the load data and the rest of the feature data is different, a large amount of data duplication will be caused when the data are stitched together. Previous studies have not considered the problem and have trained each feature variable directly by putting them into the convolutional layer together or by feeding the data directly into the model for training. These two operations will make the convolution poor and fail to extract useful high-dimensional features, which leads to a poor fit of the final model. In this paper, we choose a multi-task learning approach and construct a hybrid deep learning network MTMV-CNN-LSTM, in which the variable characteristics with 'days' as the time interval, such as weather characteristics and time series characteristics, are placed in the auxiliary task for training, and the power load data are put into the main task for training, which effectively solves the above problem. The prediction accuracy of the model is also effectively improved by weighing the relevant information from the two tasks.

3. Data introduction and processing

The original dataset was taken from question B of the 10th Teddy Cup Data Mining Challenge for a period of 44 months from 2018/1/1 to 2021/8/31. The dataset provides load data at 15-minute intervals and weather data including temperature, wind speed, and wind direction at daily intervals for a regional grid. Table 1 shows five items of the weather data.

Table 1
Weather data examples.

| Date | Weather conditions | Highest temperature | Lowest temperature | Wind direction during the day | The wind direction at night |
| --- | --- | --- | --- | --- | --- |
| 2018/1/1 | Cloudy/Cloudy | 22 °C | 12 °C | No sustained wind direction < level 3 | No sustained wind direction < level 3 |
| 2018/1/2 | Cloudy/Cloudy | 22 °C | 15 °C | No sustained wind direction < level 3 | No sustained wind direction < level 3 |
| 2018/1/3 | Cloudy/Overcast | 23 °C | 15 °C | No wind direction < level 3 sustained | No sustained wind direction < level 3 |
| 2018/1/4 | Cloudy/Sprinkle | 21 °C | 16 °C | No sustained wind direction < level 3 | No sustained wind direction < level 3 |
| …… | …… | …… | …… | …… | …… |
| 2021/8/31 | Thunder shower/Shower | 32 °C | 26 °C | North wind level 1–2 | North wind level 1–2 |

In problems related to time series, it is often necessary to extract time series features, which helps to enhance the comprehensiveness of knowledge of feature variables and improve the accuracy of classification. Referring to the holiday calendar for the years 2018 through 2021, we also incorporated the 'Whether holiday', 'The average daily electricity load', and other characteristics. Table 2 lists the 12 temporal features we extracted based on the dataset and the 4 additional variables we added.

Table 2
The description of features.

| No. | Variable | Description |
| --- | --- | --- |
| 1 | Month | An integer value between 1 and 12 |
| 2 | Day | An integer value between 1 and 31 |
| 3 | Hour | An integer value between 1 and 24 |
| 4 | Season | An integer value between 1 and 4 |
| 5 | Day of the year | An integer value between 1 and 366 |
| 6 | Week of the year | An integer value between 1 and 53 |
| 7 | Day of the week | An integer value between 1 and 7 |
| 8 | Whether weekend | A categorical value of 0 or 1 |
| 9 | Whether the beginning of the month | A categorical value of 0 or 1 |
| 10 | Whether the end of the month | A categorical value of 0 or 1 |
| 11 | Whether the beginning of the season | A categorical value of 0 or 1 |
| 12 | Whether the end of the season | A categorical value of 0 or 1 |
| 13 | Whether holiday | A categorical value of 0 or 1 |
| 14 | Mean | Average of daily electricity load |
| 15 | Max | Maximum of daily electricity load |
| 16 | Min | Minimum of daily electricity load |

Fig. 1. Outlier test chart (red dots are outlier points at identification).
Fig. 2. Time distribution of total active power (part of 2018).
Fig. 3. Time distribution of total active power (averaged by year).
Fig. 4. Single-task learning versus multi-task learning.

We preprocessed the raw data in four primary steps: missing value processing, outlier processing, normalization processing, and classification data coding to enhance the quality of the data. We choose One Class SVM [32] as our outlier testing method, which is an unsupervised outlier detection method with the basic idea of SVDD (support vector data description), using a hypersphere to do the division, expecting to minimize the volume of the hypersphere and thus minimize the impact of outlier data.

Assume that the center of the hypersphere is $o$, the radius is $r$, and the volume $V(r)$ is minimized. Similar to the traditional SVM approach, the distance from all training data points $x_i$ to the center $o$ is required to be no greater than $r$, up to slack variables $\zeta_i$ that are weighted by a penalty factor $C$. The optimization problem can then be expressed as:

$$\min_{r,\,o} \; V(r) + C \sum_{i=1}^{m} \zeta_i \tag{1}$$

$$\| x_i - o \|_2 \le r + \zeta_i, \quad i = 1, 2, \cdots, m \tag{2}$$

$$\zeta_i \ge 0, \quad i = 1, 2, \cdots, m \tag{3}$$

After using Lagrange dual solution, if the distance from the new data point to the center point $o$ is less than or equal to the radius $r$, then it is not an outlier; if it is greater than $r$, outside the hypersphere, then it is considered an outlier.
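The decision rule and constraints (2)-(3) can be illustrated with a minimal sketch. It assumes the center and radius have already been obtained; the function names and the numeric values are hypothetical and only serve the illustration, and no fitting of the hypersphere is attempted here.

```python
import math


def distance(x, center):
    """Euclidean distance between a data point and the hypersphere center o."""
    return math.dist(x, center)


def slack(x, center, radius):
    """Smallest slack zeta_i >= 0 that lets x satisfy constraint (2)."""
    return max(0.0, distance(x, center) - radius)


def is_outlier(x, center, radius):
    """A point is an outlier only if it lies strictly outside the hypersphere."""
    return distance(x, center) > radius


# Illustrative (not fitted) center and radius.
center, radius = (0.0, 0.0), 5.0
print(is_outlier((3.0, 4.0), center, radius))  # point on the boundary
print(is_outlier((6.0, 8.0), center, radius))  # point outside the sphere
print(slack((6.0, 8.0), center, radius))       # slack needed for that point
```

Points inside or on the sphere need zero slack, so only points outside it contribute to the penalty term in Eq. (1).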
Fig. 5. Illustration of the MTMV-CNN-LSTM framework.
The identified outliers are shown in Fig. 1. If a run of consecutive outliers is longer than 12 points (i.e., 3 h), the run is considered normal: because of holidays, peak hours, etc., a longer continuous period of time when the electricity load is different from usual should not be considered as anomalous. The remaining abnormal data are replaced with the average value of the total active power 1 day before and 1 day after the time point, which completes the abnormal value processing.

Figs. 2 and 3 show the daily time distribution of total power for 2018 and the average monthly total power distribution for 2018–2021, respectively. As can be seen in Fig. 2, the load is highest mainly from 9 to 17 h each day. One of the reasons may be that 9–17 o'clock is the main time period for manufacturing in various industries, so the total active power is higher during that period. Fig. 3 shows that February of each year has the lowest total active power, which may be due to the Chinese New Year period, when industries shut down their production, causing the total active power to drop sharply. The total active power in 2018 is generally much higher than that of the three years 2019–2021, which may be influenced by the COVID-19 epidemic and the policy of that year.

4. Model construction

Table 3
Detailed configuration information of the proposed deep model.

| Layer (type) | Output Shape | Param # | Connected to |
| --- | --- | --- | --- |
| aux_input (InputLayer) | [(None, 40, 44)] | 0 | [] |
| conv1d_18 (Conv1D) | (None, 38, 32) | 4256 | ['aux_input'] |
| max_pooling1d_9 (MaxPooling1D) | (None, 9, 32) | 0 | ['conv1d_18'] |
| lstmaux1 (CuDNNLSTM) | (None, 9, 20) | 4320 | ['max_pooling1d_9'] |
| main_input (InputLayer) | [(None, 3840, 1)] | 0 | [] |
| lstmaux2 (CuDNNLSTM) | (None, 60) | 19,680 | ['lstmaux1'] |
| conv1d_16 (Conv1D) | (None, 3838, 8) | 32 | ['main_input'] |
| tf.reshape_25 (TFOpLambda) | (None, 10, 6) | 0 | ['lstmaux2'] |
| max_pooling1d_8 (MaxPooling1D) | (None, 959, 8) | 0 | ['conv1d_16'] |
| dropout_12 (Dropout) | (None, 10, 6) | 0 | ['tf.reshape_25'] |
| m1 (CuDNNLSTM) | (None, 480) | 940,800 | ['max_pooling1d_8'] |
| time_distributed_2 (TimeDistributed) | (None, 10, 3) | 21 | ['dropout_12'] |
| dense_14 (Dense) | (None, 960) | 461,760 | ['m1'] |
| tf.repeat_6 (TFOpLambda) | (None,) | 0 | ['time_distributed_2'] |
| tf.reshape_24 (TFOpLambda) | (None, 960, 1) | 0 | ['dense_14'] |
| tf.reshape_27 (TFOpLambda) | (None, 960, 3) | 0 | ['tf.repeat_6'] |
| concatenate_6 (Concatenate) | (None, 960, 4) | 0 | ['tf.reshape_24', 'tf.reshape_27'] |
| conv1d_19 (Conv1D) | (None, 958, 32) | 416 | ['concatenate_6'] |
| mian1 (CuDNNLSTM) | (None, 958, 192) | 173,568 | ['conv1d_19'] |
| mian2 (CuDNNLSTM) | (None, 384) | 887,808 | ['mian1'] |
| dropout_13 (Dropout) | (None, 384) | 0 | ['mian2'] |
| dense_16 (Dense) | (None, 960) | 369,600 | ['dropout_13'] |

4.1. Multi-task learning

Most machine learning tasks are single-task learning, but an excessive focus on a single model may overlook some potential information in related tasks that can help improve the target task. The method of performing a number of different tasks simultaneously and sharing parameters to some extent is known as multi-task learning [33]. In other words, a model that learns multiple objective function losses at the same time is considered multi-task learning. For a complex problem, we can decompose it into several related but not identical subtasks, and then combine the results of each subtask to get the results of the initial complex problem. Fig. 4 shows the comparison between single-task learning and multi-task learning, and it can be seen that the models between each task of single-task learning are independent of each other, while the model space (trained model) between each task of multi-task learning is shared. In addition, compared with single-task learning, multi-task learning has the following features:

(1) Multi-task learning occupies less memory because multiple tasks share the same model;
(2) Multiple tasks can produce results through a single forward calculation, which greatly increases the inference speed;
(3) Multiple associated tasks can improve each other's performance and model generalization by sharing information and complementing each other.

Overall, multi-task learning usually achieves better results than single-task learning.

There are two models for the application of multi-task learning in deep learning. The first is hard parameter sharing, where all tasks share a hidden layer to capture the intrinsic joint features. The other is soft parameter sharing, where each task has a specific model and parameters, and the parameters are not shared, using distance regularization to constrain the parameters between models and guarantee the similarity of the parameter space. The model constructed in this paper uses hard parameter sharing, which is also a common way of parameter sharing in practice [34]. In addition, multi-task learning improves model training through data amplification, eavesdropping, attribute selection, representation bias, and regularization [35]. In general, we can significantly improve the performance of the model through multi-task learning techniques.

Fig. 6. LSTM network structure.
Fig. 7. Dropout VS. normal network.

4.2. Proposed deep model

Since we use power load data at 15-minute intervals and weather data at day intervals, when performing data merging, weather data will have a large number of repetitions within a day. Putting each feature variable into the convolution layer together for training will cause the problem of poor convolution, which leads to difficulty in feature extraction and poor model fitting. Based on the characteristics of the dataset, we choose to use multi-task learning and propose the MTMV-CNN-LSTM hybrid deep learning network to construct an auxiliary task in which the weather conditions, as well as the time series features and daily power features shown in Table 2 are placed for training, and the power load data every 15 min are placed in the main task for training. The generalizability of the model is improved by weighing the relevant training information for the main task and the auxiliary task.

The structure diagram of the hybrid deep learning network MTMV-CNN-LSTM constructed in this paper is shown in Fig. 5, and its detailed configuration is shown in Table 3.

To better illustrate our model, we give the pseudocode of the MTMV-CNN-LSTM model and specify the internal structure of the model in Sections 4.2.1-4.2.6. w is the loss weight of the auxiliary task.

Table 4
Parameter settings.

| Task | Type | Filter | Kernel size | Pool size | Activation |
| --- | --- | --- | --- | --- | --- |
| Main Task | Conv1D | 8 | 3 | – | LeakyReLU |
| | Maxpooling1D | – | – | 4 | – |
| | Dense(960) | – | – | – | LeakyReLU |
| Auxiliary Task | Conv1D | 32 | 3 | – | LeakyReLU |
| | Maxpooling1D | – | – | 4 | – |
| | Dense(3) | – | – | – | LeakyReLU |

4.2.1. Input layer

The maximum, minimum and average values of the daily electric load (40 × 3 order matrix), meteorological data extracted by data engineering with meteorological characteristics (40 × 28 order matrix), and temporal feature data (40 × 13 order matrix) are used as input to the auxiliary task. Power load data is used as input to the main task.

Fig. 8. Performance comparison of different weights.
Fig. 9. Comparison of the loss curve (forecast 10 days).

4.2.2. Convolutional and pooling layer

The CNN convolutional layer is comparable to 'filtering' the input of the data, extracting its features using convolutional computing, and mining the correlation of the feature vectors of the input data in a high-dimensional space. The computational equations for these operations are as follows.

$$F \otimes w = \sum_{k=1}^{M} \sum_{j=1}^{W_f} \sum_{i=1}^{H_f} \left( F^{k}(i, j)\, w^{k}(i, j) \right) \tag{4}$$

where $F$ denotes the input data, $\otimes$ denotes the convolution calculation, and $w$ denotes the weight parameter of the convolution kernel; $M$, $H_f$ and $W_f$ are the number of channels, height and width of the convolution kernel, respectively.

The pooling layer serves two purposes: feature invariance and feature dimensionality reduction. We extract the depth features that affect the daily electric load by convolutional and pooling layers.

4.2.3. LSTM layer

In the sequence modeling task, the recurrent neural network represented by LSTM [12] has a strong ability to extract time-series type data features, and its variant BiLSTM, which consists of a bidirectional LSTM, can acquire features from both directions of the sequence, which improves the ability of the decoding layer to decode the input features and the performance of the model on sequence problems.

The LSTM adds an additional internal state $c_t$ specifically for linear cyclic information transmission, which can be calculated using the following equations. This is in contrast to the RNN, which only has one transfer state $h_t$.

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \tag{5}$$

$$h_t = o_t \odot \tanh(c_t) \tag{6}$$

where $f_t \in [0, 1]^D$, $i_t \in [0, 1]^D$ and $o_t \in [0, 1]^D$ are three gates that control the path of information transfer; $\odot$ denotes the vector element product; and $\tilde{c}_t$ denotes the candidate state obtained by the nonlinear function.

The LSTM network introduces a gating mechanism to regulate the direction of information transfer. The three 'gates', input gate $i_t$, forget gate $f_t$, and output gate $o_t$ are determined using Eqs. (7) to (9).
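One time step of the LSTM described by Eqs. (5)-(9) can be sketched in plain Python as follows. This is an illustration only, not the CuDNNLSTM layers listed in Table 3. The helper names and the parameter layout are hypothetical, and the candidate state is assumed to take the standard form tanh(W_c x_t + U_c h_{t-1} + b_c), which the text refers to only as "the nonlinear function".

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def affine(W, U, b, x, h):
    """Compute W x + U h + b element by element."""
    return [a + c + d for a, c, d in zip(matvec(W, x), matvec(U, h), b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step; params maps 'i', 'f', 'o', 'c' to a (W, U, b) triple."""
    i_t = [sigmoid(z) for z in affine(*params["i"], x_t, h_prev)]  # Eq. (7)
    f_t = [sigmoid(z) for z in affine(*params["f"], x_t, h_prev)]  # Eq. (8)
    o_t = [sigmoid(z) for z in affine(*params["o"], x_t, h_prev)]  # Eq. (9)
    # Candidate state: assumed standard tanh form (not given in the text).
    c_tilde = [math.tanh(z) for z in affine(*params["c"], x_t, h_prev)]
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]  # Eq. (5)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]  # Eq. (6)
    return h_t, c_t


# Toy usage: one input feature, one hidden unit, all-zero parameters.
zero = ([[0.0]], [[0.0]], [0.0])
h_t, c_t = lstm_step([1.0], [0.0], [2.0], {k: zero for k in "ifoc"})
print(h_t, c_t)
```

With all-zero parameters every gate equals 0.5 and the candidate state is 0, so the cell state is simply halved; this makes the roles of the forget and input gates in Eq. (5) easy to check by hand.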
Fig. 10. Fitting results (forecast 10 days).
8
Table 5 it = σ (Wi xt + Ui ht− 1 + bi ) (7)

Performance comparison of machine learning methods in case of experiment 1.
( )
Items Model MSE RMSE MAE R2 ft = σ Wf xt + Uf ht− 1 + bf (8)
Electricity load forecast for Prophet 0.182 0.427 0.277 0.433
the next 10 days Catboost 0.004 0.062 0.048 0.211
ot = σ (Wo xt + Uo ht− 1 + bo ) (9)
LSTM 0.001 0.035 0.026 0.774
GRU 0.008 0.092 0.063 0.572 where σ (⋅) denotes the sigmoid function and ht− 1 denotes the external
GAN 0.036 0.190 0.146 0.785 state at the previous moment. Fig. 6
CNN-LSTM 0.003 0.057 0.042 0.805
MTMV-CNN- 0.001 0.034 0.026 0.855
4.2.4. Dropout layer and fully connected layer
LSTM
Models with too many parameters are more likely to overfit in deep
Note: MSE, RMSE, and MAE are standardized. learning. In 2012, Hinton et al. [36] proposed the Dropout algorithm,
which can be more effective in alleviating the occurrence of overfitting
phenomenon and improving the performance of neural networks,
playing a similar effect to regularization. These are very important in
training CNNs; if the dropout layer is not present, the first training
samples will affect learning in a disproportionate way. In other words,
this will prevent the learning of features that will only appear in later
samples or batches.
Fig. 7 illustrates the fundamental concept of dropout. During
training, each neuron is discarded with probability p, and the neurons
retained by each forward propagation are different, which reduces the
Fig. 11. Results of the robustness test.
Fig. 12. Illustration of the MTMV-CNN-LSTM framework (forecast 3 months).
9
Fig. 13. Comparison of the loss curve (forecast 3 months).
network’s dependence on specific neurons and thus gives the neural network more generalization ability, allowing it to better adapt to the noise and variability of real data. The dropout mechanism plays an averaging role and reduces the complex co-adaptive relationships between neurons [37]. Dropping out different hidden neurons changes the network structure, which is similar to training different networks, so the dropout process is equivalent to averaging over different neural networks. At the same time, dropout makes the update of weights no longer depend on the joint action of hidden nodes with fixed relationships.
In this paper, we set the probability of neuron disconnection in the dropout layer to 0.3. After decoding the deep features extracted by the encoder, the prediction results are finally obtained by the fully connected layer.

4.2.5. Repeat and concatenate
The forecast results of Auxiliary Output, i.e., the maximum, minimum, and average values of power load in the next 10 days, are repeated and expanded, and Decoder_1 Output is then concatenated with Auxiliary Output.

4.2.6. Output layer and loss function
In multi-task learning, the model is usually trained on a weighted sum of several loss functions. We set the loss weight of the main task to 1 and the loss weight of the auxiliary task to 0.3, as shown in Eqs. (10)-(11). We will explain why the weight of 0.3 was chosen for the auxiliary task in Section 5.2.

$$\mathrm{Loss} = 1 \cdot \mathrm{Loss}_1 + 0.3 \cdot \mathrm{Loss}_2 \tag{10}$$

$$\mathrm{Loss}_i = (\hat{y}_i - y_i)^2, \quad i = 1, 2 \tag{11}$$

where $\mathrm{Loss}_1$ and $\mathrm{Loss}_2$ denote the losses of the main task and the auxiliary task, respectively.

5. Experiments and results

5.1. Performance evaluation
To measure the accuracy of the forecasting performance, the forecasted values of the electrical loads obtained from the testing process were compared with the actual values using the mean square error (MSE), root mean square error (RMSE), mean absolute error (MAE) and coefficient of determination (R²). The formulas of these four metrics are given in Eqs. (12)-(15).

$$\mathrm{MSE} = \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{N} \tag{12}$$

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{N}} \tag{13}$$

$$\mathrm{MAE} = \frac{1}{N} \times \sum_{i=1}^{N} |y_i - \hat{y}_i| \tag{14}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2} \tag{15}$$

where $y_i$ and $\hat{y}_i$ are the actual and predicted load values, respectively, $\bar{y}$ denotes the average of the true values, and $N$ is the total number of observations.

5.2. Experimental settings
The electricity load was first predicted every 15 min for the next 10 days. The encoder of the auxiliary task is made up of two layers: a double-layer convolutional layer and a pooling layer. Its decoder is made up of three layers: a Bi-LSTM layer, a dropout layer, and a fully connected layer. The final output is the maximum, minimum and average values of power load for the next 10 days, i.e., the output of the auxiliary task is a 10 × 3 matrix. The main task consists of an encoder and two decoders, and the encoder consists of convolutional and pooling layers. Its input is a 3840 × 1 matrix. Decoder 1 consists of an LSTM layer and a fully connected layer, and decoder 2 consists of a BiLSTM layer, a dropout layer, and a fully connected layer. The final output is the forecast of the power load every 15 min for the next 10 days, that is, a 960 × 1 matrix. In the training process, the dataset is split into a training set and a testing set of 90% and 10%.
By adjusting the parameters, we can change the way the network is designed. The number of filters, kernel size, activation function, and other parameters of each layer affect the model’s performance and learning ability. In this paper, we combine random search with a coarse-to-fine tuning strategy to adjust the parameters of the model, and the parameters of each layer of our designed MTMV-CNN-LSTM model are shown in Table 4. The table shows the number of filters for the convolutional layer in each task, the size of the convolutional layer, the kernel of the pooling layer, the type of the activation function, and the number of parameters for the whole layer including the LSTM layer; the output dimensions of the remaining layers can be found in Fig. 5.
In this research, the learning goal assumed for the auxiliary task is to make each evaluation index of the model relatively optimal.
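As a worked illustration of Eqs. (10)-(15), the sketch below computes the four evaluation metrics and the weighted multi-task loss with NumPy. The function names, the toy arrays, and the use of a mean over samples for each task loss are assumptions made for this example; they are not taken from the authors' code.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean square error, Eq. (12)."""
    return float(np.mean((y_true - y_pred) ** 2))

def rmse(y_true, y_pred):
    """Root mean square error, Eq. (13)."""
    return float(np.sqrt(mse(y_true, y_pred)))

def mae(y_true, y_pred):
    """Mean absolute error, Eq. (14)."""
    return float(np.mean(np.abs(y_true - y_pred)))

def r2(y_true, y_pred):
    """Coefficient of determination, Eq. (15)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

def weighted_loss(loss_main, loss_aux, aux_weight=0.3):
    """Weighted sum of task losses, Eq. (10); main-task weight is fixed to 1."""
    return 1.0 * loss_main + aux_weight * loss_aux

# Toy values chosen only to make the arithmetic easy to follow.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.0, 2.0, 3.0, 2.0])

print(mse(y_true, y_pred), rmse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
print(weighted_loss(loss_main=1.0, loss_aux=2.0))
```

Note that Eq. (15) is not bounded below: a model that fits worse than the mean of the true values yields a negative R², which is how a value such as the −9.269 reported for GAN in Table 6 can arise.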
Fig. 14. Fitting results (forecast 3 months).
Fig. 8 shows the comparison of the evaluation indices under different weights of the auxiliary task. The evaluation indices of the model are good when the weights are 0.2, 0.3 and 0.8, but in a comprehensive comparison the model predicts best when the weight is 0.3.

Algorithm MTMV-CNN-LSTM.
Input_aux: daily power feature, weather feature, data feature
Input_main: total active power
Output_aux: daily power feature for the next ten days
Output_main: total active power for the next ten days
for epoch in range(epochs) do
    for epoch in auxiliary task do
        Output_aux = Aux_task(Input_aux)
        calculate the loss of the auxiliary task Loss2:
        Loss2 = criterion(Output_aux, weather feature)
    end
    return Output_aux
    for epoch in main task do
        Output_main = Main_task(Input_main)
        Concatenate(Input_main, Output_aux)
        calculate the loss of the main task Loss1:
        Loss1 = criterion(Output_main, daily power feature)
    end
    return Output_main
    weighting to get the loss of the MTMV-CNN-LSTM model:
    Loss = 1⋅Loss1 + ω⋅Loss2
End

Table 6
Performance comparison of machine learning methods in case of experiment 2.
Items: Forecast of maximum and minimum electric load for the next three months
Model          MSE    RMSE   MAE    R²
Prophet        0.191  0.437  0.331  0.483
Catboost       0.004  0.065  0.055  0.263
LSTM           0.009  0.097  0.074  0.519
GRU            0.010  0.098  0.071  0.490
GAN            0.043  0.208  0.159  −9.269
CNN-LSTM       0.008  0.088  0.064  0.680
MTMV-CNN-LSTM  0.003  0.053  0.044  0.640

5.3. Experimental results
Figs. 9 and 10 show the fitting results for forecasting the load every 15 min for the following 10 days. As Fig. 9 shows, the model converges quickly, the loss functions of the training and validation sets continue to decline throughout the fitting process, and the performance of the training and validation sets converges after 25 epochs. This demonstrates that the model predicts the power load every 15 min for the following 10 days well, without overfitting and with good accuracy. Fig. 10 also shows that the predicted values fit the actual values to a high degree, so the prediction results are satisfactory.
Prophet, Catboost, LSTM, GRU, GAN, and CNN-LSTM are classical methods in electric load forecasting, and we choose them as baselines to compare with our proposed model. Bidirectional LSTM is selected as the first layer of our LSTM model; the GRU model is constructed using two layers of GRU. The GAN uses GRU as the generator and CNN as the discriminator. The generator consists of three GRU layers and two Dense layers, and the discriminator consists of three convolutional layers and three Dense layers. CNN-LSTM is the last baseline model, where the number of convolutional units in CNN is 10 and the number of hidden layer units in LSTM is 100.
We evaluated the models with four metrics, MSE, RMSE, MAE and R², and the results are shown in Table 5.
The experimental results show that the proposed MTMV-CNN-LSTM model performs better than traditional machine learning methods in power load forecasting and is a competitive power load forecasting method. LSTM, CNN-LSTM and our proposed MTMV-CNN-LSTM perform better on the MSE, RMSE and MAE metrics. However, MTMV-CNN-LSTM significantly outperforms the six baseline models in R², with a 6.21% improvement compared to second-ranked CNN-LSTM.

5.4. Robustness test
Robustness refers to the performance of the model against input perturbations and adversarial samples. In this investigation, we test the robustness of the proposed model by adding small perturbations to the data. We add Gaussian noise with mean 0 and variance 1 and uniform noise on [0,1] to the original data, respectively, and verify the robustness of the model after a
Fig. 15. Time distribution.
large number of repeated experiments. Fig. 11 shows the results of the robustness test. The results show that after adding Gaussian noise and uniform noise to the original data, the MSE, RMSE and MAE do not change significantly, and the R² of the model fit decreases slightly. Compared to the original data, the R² of the model decreased by 3.27% and 5.73% after adding Gaussian noise and uniform noise, respectively, but it is still above 0.8 in both cases. This suggests that the dropout layer, by forcing the network to learn more robust features, keeps the proposed MTMV-CNN-LSTM model effective under perturbation.

5.5. Medium-term forecast test
Daily power high and low peak load forecasts are critical for stable and efficient power system operations, daily generator outage planning, and supply and demand management [38]. Therefore, we predict the maximum and minimum values of daily load for the next 3 months to test the prediction accuracy of the MTMV-CNN-LSTM model and to find the times at which the maximum and minimum load values are reached. Because the purpose differs, we made a simple modification to the model, shown in Fig. 12.
As can be seen in Fig. 12, we use only a single convolutional and pooling layer in Aux_Encoder_1 to extract the deep features of the daily electric load. The maximum and minimum values of the daily electric load in the forecast results of Main Output are extracted for the Loss2 calculation, and the maximum and minimum values of the daily load for the next 3 months are predicted. Therefore, in the auxiliary task, only the maximum and minimum values need to be fitted. Each sliding window of the main task is 3 months, and the dimensions of the rest of the output matrix are set as in Fig. 12.
For the prediction of the maximum and minimum values of daily load for the next 3 months, the last 10% of the data were taken as the validation set, and the obtained model fitting results are shown in Fig. 13 and Fig. 14.
It can be seen from Fig. 13 that the model converges quickly, the loss functions of the training and validation sets keep decreasing during the fitting process, and the performance of the training and validation sets tends to overlap after 100 epochs, indicating that the model predicts the power load for the next three months well, without overfitting and with good accuracy.
Fig. 14 also shows that the predicted values fit the actual values to a high degree, so the prediction results are satisfactory.
As with the task of predicting the future 10-day electricity load, we still chose Prophet, Catboost, LSTM, GRU, GAN, CNN-LSTM and the proposed model for comparison. The results in Table 6 show that our proposed MTMV-CNN-LSTM achieves the lowest MSE, RMSE and MAE of all models. The only metric on which it does not rank first is R², where CNN-LSTM is 6.25% higher (0.680 versus 0.640). Compared to medium-term forecasting, the MTMV-CNN-LSTM model performs better in short-term forecasting. Relative to the predictions from other models, however, it still improves considerably in the medium-term prediction test.
We predict the maximum and minimum daily load values for the next 3 months and the corresponding times when the maximum and minimum load values will be reached, and then count the number of days. Fig. 15 shows that the maximum daily electric load for the next three months is concentrated at 10:00 and 16:00, which is the main time of work and manufacturing for all industries. The minimum value of the daily electric load is concentrated mainly at 4:30, accounting for 97.8%.

6. Conclusion
Power load forecasting is an important part of power system planning, and with the expanding scale of power grids and the increasing energy demand, ever higher accuracy is required of it. Temperature, wind direction, rain, and many other factors affect the power load. Complex and variable weather conditions limit the applicability of traditional load forecasting models, especially when the time interval between weather data and load data is not equal. Since the dataset selected in this paper has more feature variables and a larger amount of data, placing the data directly into the convolution layer for training would lead to a poor convolution effect. Therefore, we introduce the idea of multi-task learning into power load prediction and construct the MTMV-CNN-LSTM model. Weather features and time series features with ‘days’ as the time interval are put into the auxiliary task, and the power load data are put into the main task, effectively solving the problem of poor convolution caused by too many repeated values. In addition, the model can predict the power load and other feature variables at the same time and share parameters to some extent, which improves the generalizability of the task.
We designed two experiments to predict the short and medium-term power loads and compared the results with those of other models to confirm that the MTMV-CNN-LSTM model gives better predictions for power load forecasting. In addition, we conducted robustness tests on the proposed model and confirmed that it still gives better predictions in the presence of data perturbations.
Our proposed model is also applicable to forecasting tasks with a large number of feature variables or with feature variables at different time intervals. It is not limited to electric load forecasting and can also be used for forecasting tasks in other domains. The feature variables that may be unfavorable to the prediction target but helpful to the training of the model are put into the auxiliary task for training, and the obtained results are used as a part of the main task to improve the prediction accuracy of the final model. In the future, we plan to further explore the features affecting the electric load and apply these data to the load forecasting model, or extend the model proposed in this paper to other sequential modeling tasks to effectively handle multimodal data and thus improve the convolution effect. If data from other neighboring sites are available, the model proposed in this paper can also be combined with other deep learning methods, such as transfer learning, to further improve the prediction effect and generalization ability of the model.

CRediT authorship contribution statement
Shiyun Zhang: Conceptualization, Methodology, Writing – original draft, Writing – review & editing. Runhuan Chen: Software, Validation, Visualization. Jiacheng Cao: Analysis and Interpretation of results, Validation, Visualization. Jian Tan: Conceptualization, Writing – review & editing.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability
Data will be made available on request.

Acknowledgement
This work was supported in part by the National Natural Science Foundation of China (Grant No. 11901309).
References

[1] J. Liu, Y. Yin, Power Load Forecasting Considering Climate Factors Based on IPSO-Elman Method in China, Energies 15 (3) (2022) 1236.
[2] X. Zhang, J. Wang, K. Zhang, Short-term electric load forecasting based on singular spectrum analysis and support vector machine optimized by Cuckoo search algorithm, Electr. Power Syst. Res. 146 (2017), 270-28.
[3] Y. Yang, S. Li, W. Li, et al., Power load probability density forecasting using Gaussian process quantile regression, Appl. Energy 213 (2018) 499–509.
[4] Y. Lin, H. Luo, D. Wang, et al., An ensemble model based on machine learning methods and data preprocessing for short-term electric load forecasting, Energies 10 (8) (2017) 1186.
[5] X. Li, Y. Wang, G. Ma, et al., Electric load forecasting based on Long-Short-Term-Memory network via simplex optimizer during COVID-19, Energy Rep. 8 (2022) 1–12.
[6] X. Shao, C.S. Kim, P. Sontakke, Accurate deep model for electricity consumption forecasting using multi-channel and multi-scale feature fusion CNN–LSTM, Energies 13 (8) (2020) 1881.
[7] J.R. Zhang, Research on power load forecasting based on the improved elman neural network, Chem. Eng. Trans. 51 (2016) 589–594.
[8] M.A. Hammad, B. Jereb, B. Rosi, et al., Methods and models for electric load forecasting: a comprehensive review, Logist., Supply Chain, Sustain. Glob. Challenges 11 (1) (2020) 51–76.
[9] S. Atef, K. Nakata, A.B. Eltawil, A deep bi-directional long-short term memory neural network-based methodology to enhance short-term electricity load forecasting for residential applications, Comput. Ind. Eng. 170 (2022), 108364.
[10] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015) 436–444.
[11] Z. Deng, B. Wang, Y. Xu, et al., Multi-scale convolutional neural network with time-cognition for multi-step short-term load forecasting, IEEE Access 7 (2019) 88058–88071.
[12] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Comput. 9 (8) (1997) 1735–1780.
[13] D.L. Marino, K. Amarasinghe, M. Manic, Building energy load forecasting using deep neural networks, in: IECON 2016-42nd Annual Conference of the IEEE Industrial Electronics Society, IEEE, 2016, pp. 7046–7051.
[14] W. Kong, Z.Y. Dong, Y. Jia, et al., Short-term residential load forecasting based on LSTM recurrent neural network, IEEE Trans. Smart Grid 10 (1) (2017) 841–851.
[15] A. Agga, A. Abbou, M. Labbadi, et al., CNN-LSTM: an efficient hybrid deep learning architecture for predicting short-term photovoltaic power production, Electr. Power Syst. Res. 208 (2022), 107908.
[16] S. Ruder, An overview of multi-task learning in deep neural networks, arXiv preprint arXiv:1706.05098, 2017.
[17] W. Kong, Z.Y. Dong, D.J. Hill, et al., Short-term residential load forecasting based on resident behaviour learning, IEEE Trans. Power Syst. 33 (1) (2017) 1087–1088.
[18] H. Shi, M. Xu, R. Li, Deep learning for household load forecasting: a novel pooling deep RNN, IEEE Trans. Smart Grid 9 (5) (2017) 5271–5280.
[19] H. Hua, Y. Qin, C. Hao, et al., Stochastic optimal control for energy Internet: a bottom-up energy management approach, IEEE Trans. Ind. Inf. 15 (3) (2018) 1788–1797.
[20] Z. Lin, L. Cheng, G. Huang, Electricity consumption prediction based on LSTM with attention mechanism, IEEJ Trans. Electr. Electron. Eng. 15 (4) (2020) 556–562.
[21] Z. Shi, A. Chehade, A dual-LSTM framework combining change point detection and remaining useful life prediction, Reliab. Eng. Syst. Saf. 205 (2021), 107257.
[22] N.Q. Dat, N.T. Ngoc Anh, N. Nhat Anh, et al., Hybrid online model based multi seasonal decompose for short-term electricity load forecasting using ARIMA and online RNN, J. Intell. Fuzzy Syst. 41 (5) (2021) 5639–5652.
[23] S. Dua, S. Gautam, M. Garg, et al., Short Term Load Forecasting using Machine Learning Techniques, in: 2022 2nd International Conference on Intelligent Technologies (CONIT), IEEE, 2022, pp. 1–6.
[24] T.Y. Kim, S.B. Cho, Predicting residential energy consumption using CNN-LSTM neural networks, Energy 182 (2019) 72–81.
[25] Z.A. Khan, T. Hussain, A. Ullah, et al., Towards efficient electricity forecasting in residential and commercial buildings: a novel hybrid CNN with a LSTM-AE based framework, Sensors 20 (5) (2020) 1399.
[26] N. Singh, C. Vyjayanthi, C. Modi, Multi-step short-term electric load forecasting using 2D convolutional neural networks, in: 2020 IEEE-HYDCON, IEEE, 2020, pp. 1–5.
[27] C. Feng, J. Zhang, Reinforcement learning based dynamic model selection for short-term load forecasting, in: 2019 IEEE Power & Energy Society Innovative Smart Grid Technologies Conference (ISGT), IEEE, 2019, pp. 1–5.
[28] C. Feng, M. Sun, J. Zhang, Reinforced deterministic and probabilistic load forecasting via Q-learning dynamic model selection, IEEE Trans. Smart Grid 11 (2) (2019) 1377–1386.
[29] J. Moon, S. Jung, S. Park, et al., Conditional tabular GAN-based two-stage data generation scheme for short-term load forecasting, IEEE Access 8 (2020) 205327–205339.
[30] Z. Wang, T. Hong, Generating realistic building electrical load profiles through the Generative Adversarial Network (GAN), Energy Build. 224 (2020), 110299.
[31] Q. Yang, S. Kuang, D. Wang, A Novel Short-Term Load Forecasting Approach for Data-Poor Areas Based on K-Mifs-Xgboost and Transfer-Learning, Available at SSRN 4266672.
[32] Y. Chen, X.S. Zhou, T.S. Huang, One-class SVM for learning in image retrieval, in: Proceedings 2001 international conference on image processing (Cat. No. 01CH37205), IEEE 1, 2001, pp. 34–37.
[33] S. Vandenhende, S. Georgoulis, W. Van Gansbeke, et al., Multi-task learning for dense prediction tasks: a survey, in: IEEE transactions on pattern analysis and machine intelligence, 2021.
[34] Learning to learn, Springer Science & Business Media, 2012.
[35] R. Caruana, Multitask learning, Mach. Learn. 28 (1) (1997) 41–75.
[36] G.E. Hinton, N. Srivastava, A. Krizhevsky, et al., Improving neural networks by preventing co-adaptation of feature detectors, arXiv preprint arXiv:1207.0580, 2012.
[37] N. Srivastava, G. Hinton, A. Krizhevsky, et al., Dropout: a simple way to prevent neural networks from overfitting, J. Mach. Learn. Res. 15 (1) (2014) 1929–1958.
[38] J. Lee, Y. Cho, National-scale electricity peak load forecasting: traditional, machine learning, or hybrid model?, Energy 239 (2022), 122366.