Day-Ahead Load Probabilistic Forecasting Based On Space-Time Correction

360 Recent Advances in Electrical & Electronic Engineering, 2021, 14, 360-374
RESEARCH ARTICLE
Fei Jin1, Xiaoliang Liu1, Fangfang Xing1, Guoqiang Wen1, Shuangkun Wang2, Hui He2,* and Runhai Jiao2
1 Computer Application and Software, Shandong Electrical Power Company, Weifang Power Supply Company, State Grid, Weifang, China; 2 School of Control and Computer Engineering, North China Electric Power University, Beijing, China
Abstract: Background: Day-ahead load forecasting is an essential guide for power generation and is of considerable significance in power dispatch.
Objective: Most existing probabilistic load forecasting methods use historical data to predict the load of a single area and rarely exploit the temporal and spatial correlation of load to improve prediction accuracy.
Methods: This paper presents a day-ahead probabilistic load forecasting method based on space-time correction. First, kernel density estimation (KDE) is employed to model the prediction error of a long short-term memory (LSTM) model, yielding the residual distribution. Correlation values are then used to correct, in the time and space dimensions, the predicted values for selected periods of the test set.
Results: The experiment used three years of load data from 10 areas of a city in northern China. Compared with the uncorrected results, the two correction models reduce the MAPE on their respective test sets by an average of 10.2% and 6.1%, and increase the interval coverage of the probabilistic prediction by an average of 4.2% and 1.8%.
Conclusion: The test results show that the proposed correction schemes are feasible.
Keywords: Day-ahead load probabilistic forecasting, long short-term memory, space-time correction, kernel density estimation, short-term load forecast, residual modeling.
ARTICLE HISTORY
Received: July 22, 2020
Revised: October 12, 2020
Accepted: October 15, 2020
DOI: 10.2174/2352096513666201208103431
*Address correspondence to this author at the School of Control and Computer Engineering, North China Electric Power University, Beijing, China; E-mail: huihe@ncepu.edu.cn

1. INTRODUCTION

1.1. Background
Power load forecasting has become one of the most active research topics in power systems. Building a highly accurate and reliable forecasting model is crucial for grid dispatching by power companies and for the stable operation of the power system [1]. According to their purpose, load forecasts can be divided into three categories: short-term, medium-term, and long-term [2]. Short-term load forecasting, especially day-ahead load forecasting, is an essential part of economic grid dispatch and the basis on which the power system formulates unit generation plans, so it places higher demands on prediction accuracy.

1.2. Literature Review
Previous studies on short-term power load forecasting have mostly focused on deterministic load forecasting, which consists mainly of traditional statistical methods and machine learning approaches. Statistical methods include autoregressive integrated moving average (ARIMA) models, multiple regression models, exponential smoothing models, grey prediction models, and so on. Machine learning approaches include neural networks (NN), support vector machines (SVM), random forests (RF), and so forth. Nepal et al. [3] proposed a hybrid model combining clustering and ARIMA to predict the electricity load of university buildings. Traditional statistical models such as ARIMA do not depend on explanatory variables. However, load is often affected by many factors, including meteorological and calendar factors. Fan et al. [4] constructed a semiparametric additive model whose input variables included calendar factors and temperature, and their experimental results verified the effectiveness of the method. On the one hand, therefore, statistical models can no longer meet the required prediction accuracy; on the other hand, in an increasingly complex power system, the prediction model should consider the multiple factors that influence electricity load. Traditional statistical models thus cannot satisfy current application scenarios.
2352-0965/21 $65.00+.00 © 2021 Bentham Science Publishers

In recent years, with the continuous development of machine learning and deep learning, load forecasting applications have achieved satisfactory results. In [5], Xie et al. proposed a fuzzy neural network model for short-term power load forecasting based on an improved decision tree; their simulation results show that the model improves forecast accuracy and reduces relative errors. Ning et al. [6] analyzed the correlation between power load and its influencing factors and then built a short-term load forecasting model with a BP neural network; their simulation experiments demonstrated the superiority of the method. Liao et al. [7] used the particle swarm optimization (PSO) algorithm to optimize the parameters of an SVM model and built a PSO-SVM load prediction model whose kernel parameters were chosen according to the normalized root mean square error. Liu et al. proposed a short-term load forecasting method based on combined optimization of gradient boosting decision trees, which produced more accurate forecasts than an SVR model and a BP neural network. Jiang et al. employed long short-term memory (LSTM) networks for load forecasting [8, 9]; compared with SVR, the LSTM network extracts load features better and gives better predictions. The LSTM network is a variant of the recurrent neural network (RNN) that addresses the RNN's long-term dependency and vanishing gradient problems. Because it captures the nonlinear relationships in the data well and thereby improves prediction accuracy, LSTM has been widely used for prediction problems.
However, traditional deterministic forecasting methods cannot describe the high degree of uncertainty in power load. Probabilistic load forecasting, which provides more comprehensive information for decision-making, has therefore attracted increasing attention [10]. Research on probabilistic load prediction includes estimating prediction intervals and probability density distributions. Creighton et al. [11] estimated the bounds of the prediction interval with a neural network that has two outputs. The network's loss function is based on CWC, an evaluation index for probabilistic prediction, and is minimized with simulated annealing. However, the method yields only an interval estimate and cannot obtain the probability density distribution of the load, so it gives power system decision-makers less information about the load. He et al. [12] employed kernel-based support vector quantile regression to achieve short-term load probability density prediction. He et al. [13] proposed an approach based on quantile regression and an artificial neural network that obtains predicted values at different quantiles and then uses kernel density estimation to obtain the predicted probability density distribution. Although these methods provide both interval estimates and probability density distributions, SVR and ANN are both shallow learning methods that do not generalize well to complex problems and sometimes fail to achieve good predictions.
This paper presents a short-term probabilistic load density prediction method based on space-time correction. First, we build a cascaded LSTM model for deterministic prediction, which can model long-term and short-term dependencies in the time series [14] and obtain more accurate predictions. Then, we use kernel density estimation (KDE) to model the residuals. Finally, we build correction models in the time dimension and the space dimension to further correct the predicted values. The KDE model fits the actual distribution of the data effectively, and the correction models further improve prediction accuracy. We verify the method experimentally on the load data sets of 10 different regions in northern China. Compared with the uncorrected results, the two correction models reduce the MAPE on their respective test sets by an average of 10.2% and 6.1%, and increase the interval coverage of the probabilistic prediction by an average of 4.2% and 1.8%, which shows that the proposed correction methods are feasible.

1.3. Contributions
The main contributions are as follows:
(1) A cascaded dual-input LSTM model is proposed, and calendar and meteorological factors are used in feature construction.
(2) A correction model based on the time dimension is constructed, which takes holidays and non-holidays into account.
(3) A correction model based on the space dimension is introduced, which comprehensively considers the correlation between the loads of 10 areas in a city.

1.4. Structure of the Paper
The rest of the paper is organized as follows. Section 2 introduces data preprocessing, the correlation analysis of the load data, and the selection of influencing factors. Section 3 introduces the models and their principles. Section 4 presents the evaluation metrics used in this paper. Section 5 presents the experimental results and a comparative analysis of the methods. Section 6 concludes the paper and outlines future work.

2. CORRELATION ANALYSIS

2.1. Data Preprocessing
The data set comprises daily load data and external factor data for 10 county-level cities in a city in northern China from November 24, 2016, to November 26, 2019. The load data have a time granularity of 2 hours, so 12 points are sampled each day. The external factor data include daily temperature, wind speed, weather, other meteorological factors, and calendar factors, with a time granularity of one day.

Table 1. Correlation results.
Area | Day-related r | Day-related p | Day-related accepted | Week-related r | Week-related p | Week-related accepted
Area1 | 0.79 | 0.002 | H1 | 0.77 | 0.002 | H1
Area2 | 0.78 | 0.002 | H1 | 0.65 | 0.002 | H1
Area3 | 0.91 | 0.002 | H1 | 0.7 | 0.002 | H1
Area4 | 0.89 | 0.002 | H1 | 0.91 | 0.002 | H1
Area5 | 0.93 | 0.002 | H1 | 0.87 | 0.002 | H1
Area6 | 0.93 | 0.002 | H1 | 0.92 | 0.002 | H1
Area7 | 0.88 | 0.002 | H1 | 0.86 | 0.002 | H1
Area8 | 0.94 | 0.002 | H1 | 0.93 | 0.002 | H1
Area9 | 0.73 | 0.002 | H1 | 0.78 | 0.002 | H1
Area10 | 0.76 | 0.002 | H1 | 0.48 | 0.002 | H1

Because of faulty data collection equipment and other causes, the original data set contains two main kinds of problems: anomalous data and missing data. The load data show clear daily and weekly periodicity. Therefore, for missing data, the method of filling with the load value at the same time on the adjacent
day or in the adjacent week is adopted. Anomalous data mainly refer to points where the load curve shows a large vertical jump; such points are smoothed using the normal load data before and after them.
To accelerate the convergence of the prediction model during training, the data are normalized as in Eq. (1):
$$p' = \frac{p - p_{\min}}{p_{\max} - p_{\min}} \quad (1)$$
where $p$ is the data to be processed, $p_{\min}$ is the minimum, $p_{\max}$ is the maximum, and $p'$ is the normalized value, which lies in [0, 1] for subsequent prediction.
To keep model training and prediction consistent, the load data of all regions use the same normalization parameters, which ensures the validity and accuracy of the test.

2.2. Time Series Correlation Analysis
Since the load data are not normally distributed, we use the Spearman correlation coefficient to measure their time-series correlation [15]. For $X = (x_1, x_2, \dots, x_n)^T$ and $Y = (y_1, y_2, \dots, y_n)^T$, the Spearman correlation coefficient between X and Y is defined in Eq. (2) [16]:
$$r = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} \quad (2)$$
where $n$ is the number of samples and $d_i$ is the difference between the ranks of $x_i$ and $y_i$. The rank of an element is its position in the sequence after sorting from smallest to largest.
In parametric statistics, hypothesis testing can be used to verify the correlation of time series. The hypotheses to be tested are
H0: ρ = 0, X is not correlated with Y; H1: ρ ≠ 0, X is correlated with Y; α = 0.05,
where α is the chosen significance level. Since the number of test samples is greater than 30, a large-sample test is used, with the statistic constructed as in Eq. (3):
$$Z = r\sqrt{n - 1}, \qquad p = 2\left(1 - \Phi(|Z|)\right) \quad (3)$$
where $Z$ approximately follows the standard normal distribution under H0, $\Phi$ is the cumulative distribution function of the standard normal distribution, and $p$ is the p-value of the test. If $p$ is greater than 0.05, the null hypothesis cannot be rejected, i.e., there is no significant evidence that X and Y are correlated. If $p$ is less than 0.05, the null hypothesis is rejected, and X and Y are significantly correlated.
In this paper, we use the Spearman correlation coefficient to measure the daily and weekly correlations of the load time series. We randomly selected the load data of four consecutive days and four consecutive weeks in the 10 regions for the correlation test; Table 1 shows the results. The load time series of all 10 regions have relatively high Spearman coefficients for both daily and weekly correlation, and every test rejects H0 at α = 0.05. Fig. (1) shows the 2019 load curves of the ten areas.
Fig. (1). Load curve of the ten areas.
Fig. (2). Pearson correlation coefficient map among the ten areas.
Fig. (3). Influencing factors selection process.

2.3. Inter-regional Load Correlation Analysis
We can see from Fig. (1) that the electricity load curves of the ten areas are relatively similar, and the load curve
shows roughly the same trend. During the Spring Festival and other long holidays, the electricity load drops significantly, and during the summer peak it rises. The loads of the ten areas also appear to be linearly correlated. To measure the degree of this linear correlation, we use the Pearson correlation coefficient [17].
Fig. (2) is the heat map of the Pearson correlation coefficient matrix of the ten areas. The Pearson correlation coefficients between most areas are relatively high, indicating that the simultaneous loads of most regions are linearly correlated.

2.4. Analysis and Selection of Influencing Factors
We divide the influencing factors into two groups: load characteristics and external factors. The load characteristics include time-series characteristics and periodic characteristics. External factors mainly include meteorological factors (such as weather, temperature, and wind) and calendar factors (such as holidays).
In this paper, the meteorological information is divided into 12 categories according to weather, temperature, and wind force and then encoded as a 4-dimensional binary vector. Similarly, the calendar factors are classified and encoded as a 4-dimensional binary vector.
Considering the strong daily and weekly correlations of the load time series, the influencing factors based on load values are selected by the process shown in Fig. (3), which consists of the following three parts:
(1) The load values of the previous day at the 8 points preceding the current time point (each taken at the same moment of the previous day); these are the sequential factors.
(2) The load values at the same time on each of the four days before the current point, and the load values at the same time in each of the four weeks before the current point; these are the daily and weekly periodic factors.
(3) External factors, including weather and calendar information, formed from the 4 weather components and the 4 calendar components of the binary encodings described above.

3. MATERIALS AND METHODS

3.1. The Framework
Fig. (4) shows the overall framework of this paper. The framework is divided into three parts according to the data set: the data are split by year into three subsets, D1, D2, and D3.
D1 is the training set, from which the point prediction model is obtained. D2 is the residual training set: the point prediction model gives the residuals between the predicted and real load values, kernel density estimation is applied to these residuals, and the residual probability density distribution is obtained. D3 is the test set: two correction models, in the time and space dimensions, are built to further correct the predicted values. Finally, the corrected predicted values are combined with the residual probability density distribution to obtain the final probability density prediction.
Fig. (4). Overall framework of dataset.

3.2. Point Prediction Model
Fig. (5) shows the framework of the point prediction model. Because the feature matrices are constructed along different dimensions, we employ a cascaded LSTM model, which helps the network learn the temporal and periodic characteristics of the load data and improves the accuracy of the point prediction model. The model has two input feature matrices, E1 and E2: E1 consists of the sequential factors and the external factors, and E2 consists of the daily periodic factors, the weekly periodic factors, and the external factors.
Fig. (5). Point prediction model.

3.3. Modified Model of the Time Dimension
Because load during holidays is uncertain and atypical, the model's predictions for these periods are inaccurate. We therefore build a correction model to capture the relationship between the real load value
and the predicted value. In this paper, the time-dimension correction model is used to further correct the predicted values for major holidays, namely the Spring Festival holiday, the May Day holiday, and the National Day holiday.
We denote the holiday data in the three data sets as N1, N2, and N3, the predicted values for N2 and N3 as P2 and P3, and the revised predicted value for N3 as F1. First, the N1 data and the predicted value P2 for the N2 data are used as features, with the N2 data as the label; training on these yields the time-dimension correction model. Then the average of the N1 and N2 data, together with the predicted value P3, is fed into the model as features, which gives the revised predicted value F1 for the N3 data (Fig. 6).
Fig. (6). The modified model of the time dimension.

3.4. Modified Model of Space Dimension
The analysis in Section 2.3 shows a strong linear correlation among the ten areas. We can therefore improve prediction accuracy by building a correction model in the spatial dimension, so that the predicted values of different regions correct one another. Fig. (7) shows the modified model based on the spatial dimension.
Fig. (7). The modified model of the space dimension.
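The time-dimension correction described in Section 3.3 can be sketched as follows. The paper does not say which regressor sits inside the correction model, so this sketch assumes an ordinary least-squares linear model with an intercept (a stand-in, not necessarily the authors' choice). It follows the stated feature and label layout: train on (N1 load, P2) with N2 as the label, then feed (mean of N1 and N2, P3) to obtain F1. The function names are hypothetical, and the holiday series are assumed to be aligned point by point.

```python
import numpy as np


def _design(*columns):
    """Stack feature columns next to an intercept column."""
    cols = [np.asarray(c, dtype=float) for c in columns]
    return np.column_stack([np.ones(len(cols[0]))] + cols)


def time_dimension_correction(n1, n2, p2, p3):
    """Hypothetical helper for the Section 3.3 correction (OLS is an assumption).

    n1, n2: real holiday loads from D1 and D2, aligned point by point.
    p2, p3: point predictions for the D2 and D3 holidays.
    Returns F1, the corrected prediction for the D3 holiday (N3).
    """
    n1 = np.asarray(n1, dtype=float)
    n2 = np.asarray(n2, dtype=float)
    # Training: features (N1, P2) -> label N2.
    coef, *_ = np.linalg.lstsq(_design(n1, p2), n2, rcond=None)
    # Inference: features (mean of N1 and N2, P3) -> F1.
    history = (n1 + n2) / 2.0
    return _design(history, p3) @ coef
```

Averaging N1 and N2 at inference time gives the model a "historical holiday load" feature on the same scale as the single-year N1 feature used in training, which is how the text describes the test-time input.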
First, we compute the Pearson correlation matrix of the ten areas, which gives the degree of correlation between them. Second, on the D2 data set, the predicted value of the area to be corrected and the predicted values of the two areas most correlated with it are selected as the features of the correction model, denoted M0, M1, and M2; the real load value of the area to be corrected is the label, denoted M. Finally, on the test set D3, the features are selected in the same way and fed into the trained model, which gives the revised forecast value F2.

4. EVALUATION METRICS
This paper uses two kinds of metrics to evaluate load forecasting performance: metrics for deterministic forecasting and metrics for probabilistic forecasting.

4.1. Evaluation Metrics for Deterministic Forecasting
The mean absolute percentage error (MAPE) is a common evaluation metric for regression problems. It measures the error between the model prediction and the real value relative to the real value, and it is the performance metric of the point prediction model in this paper. It is defined in Eq. (4):
$$\mathrm{MAPE} = \frac{1}{N}\sum_{t=1}^{N}\left|\frac{P_{ft} - P_{rt}}{P_{rt}}\right| \times 100\% \quad (4)$$
where $P_{ft}$ is the predicted value at time t, $P_{rt}$ is the real value at time t, and N is the number of samples.
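Eq. (4) translates directly into code. The sketch below is a minimal NumPy version with hypothetical function names; it also includes a relative-reduction helper, which matches how the reduction row of Table 3 appears to be computed: (10.10 − 9.80)/10.10 ≈ 2.97% for Area1 and (7.5 − 5.24)/7.5 ≈ 30.1% for Area4, and averaging the nine improved areas gives the reported 10.2%.

```python
import numpy as np


def mape(predicted, real):
    """Mean absolute percentage error of Eq. (4), returned in percent."""
    predicted = np.asarray(predicted, dtype=float)
    real = np.asarray(real, dtype=float)
    if np.any(real == 0):
        raise ValueError("MAPE is undefined when a real value is zero")
    return float(np.mean(np.abs((predicted - real) / real)) * 100.0)


def relative_reduction(mape_before, mape_after):
    """Relative MAPE reduction in percent: (before - after) / before * 100."""
    return (mape_before - mape_after) / mape_before * 100.0
```

Because load values are strictly positive in practice, the zero-denominator guard should rarely trigger, but it avoids silently returning `inf`.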
4.2. Evaluation Metrics for Probabilistic Forecasting
The probabilistic prediction part of this paper uses three common metrics to evaluate the model.
The prediction interval coverage probability (PICP) evaluates the accuracy of interval prediction; it measures how often the real values are covered by the prediction interval and is defined in Eq. (5):
$$\mathrm{PICP} = \frac{1}{N}\sum_{t=1}^{N}\delta_t \quad (5)$$
where N is the number of samples and $\delta_t$ indicates whether the prediction interval covers the real value $y_t$, as defined in Eq. (6):
$$\delta_t = \begin{cases} 1, & y_t \in [L_t, U_t] \\ 0, & y_t \notin [L_t, U_t] \end{cases} \quad (6)$$
where $L_t$ and $U_t$ are the lower and upper bounds of the prediction interval at time t. PICP directly reflects the confidence level of the prediction interval: the larger its value, the more likely the real value falls within the interval, and the more reliable the prediction.
The prediction interval normalized average width (PINAW) evaluates the sharpness of the prediction interval. A high-quality interval combines a high coverage probability with a narrow normalized average width. PINAW is defined in Eq. (7):
$$\mathrm{PINAW} = \frac{1}{NR}\sum_{t=1}^{N}(U_t - L_t) \quad (7)$$
where R is the range of the predicted values (the maximum minus the minimum), and $U_t$ and $L_t$ are the upper and lower bounds of the prediction interval, respectively. PINAW expresses the average interval width as a fraction of this range.
To measure interval quality comprehensively, the coverage width-based criterion (CWC) combines the previous two criteria, considering both the interval coverage probability and the average interval width. CWC is defined in Eq. (8):
$$\mathrm{CWC} = \mathrm{PINAW}\left(1 + \gamma(\mathrm{PICP})\, e^{-\eta(\mathrm{PICP} - \mu)}\right), \qquad \gamma(\mathrm{PICP}) = \begin{cases} 0, & \mathrm{PICP} \ge \mu \\ 1, & \mathrm{PICP} < \mu \end{cases} \quad (8)$$
where μ and η are control parameters. μ is the target coverage probability, which we set to the confidence level (1 − α), and η is a penalty factor that makes CWC grow exponentially when the interval coverage falls below the confidence level.

5. EXPERIMENT AND ANALYSIS
The experiments were run on an Intel Core i5-4210U processor.

5.1. Comparison of Point Prediction Results
To verify the effectiveness of our cascaded LSTM model, we compared its predictions with those of linear models, random forest models, and support vector machine models, training on D1 and testing on D3. Fig. (8) shows the results on the test set, where the orange bars represent the prediction accuracy of our model.
As Fig. (8) shows, in every region except area 9, the point prediction model used in this paper achieves the highest accuracy among the compared models. In area 9, the random forest model, the linear model, and our model achieve essentially the same accuracy. These comparative results show that the proposed prediction method is effective.

5.2. Residual Analysis
Applying the point prediction model to D2 gives the predicted values and the residuals on D2. Taking the data set of area 8 as an example, Fig. (9) shows the distribution of predicted and real load values on D2 at 6:00 and 12:00.
The predicted and real values are distributed roughly evenly on both sides of the line x = y. The scatterplot is spindle-shaped: most points are concentrated in the middle, and the upper and lower tails are similarly sparse. Therefore, the error between the real value and the predicted value at each moment should follow a particular distribution.
Density estimation addresses the problem of estimating a probability density from known samples, and the most widely used method is kernel density estimation. KDE is a non-parametric estimation method [18] that uses no prior knowledge of the sample distribution and makes no assumptions about its form; it studies the distribution of the samples from the data themselves.
Again taking area 8 as an example, the residuals on D2 are divided into 12 groups according to their time points, and kernel density estimation is used to obtain a residual probability density curve for each of the 12 time points. Table 2 shows the best kernel density estimation parameters at the various time points.
Table 2 shows that the best-fitting kernel function at every time point is the Gaussian kernel, suggesting that the residual at each time point approximately follows a Gaussian distribution.

5.3. Results of the Modified Model in the Time Dimension
The data set for the time-dimension correction model is a part of the test set. Table 3 compares the load point prediction results for the 10 areas on this selected part of the test set before and after correction.
Fig. (8). Comparison of point prediction results.
Fig. (9). The scattered distribution of predicted value and real value at different time points.
Table 2. The best parameters for kernel density estimation.
Time | Best Bandwidth | Best Kernel Function
0:00 | 40 | gaussian
12:00 | 40 | gaussian
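The per-time-point residual modeling of Section 5.2 can be sketched as below. Table 2 reports a Gaussian kernel with a best bandwidth of 40, but the paper does not say how the bandwidth was searched. This sketch therefore assumes a Gaussian kernel, picks the bandwidth by maximizing the leave-one-out log-likelihood over a candidate grid (a commonly used criterion swapped in here), and turns the fitted residual density into a (1 − α) interval by inverting its CDF, assuming residual = real − predicted. All function names are hypothetical, and only NumPy and the standard library are used.

```python
import math

import numpy as np

# Vectorised error function built from the standard library's scalar math.erf.
_erf = np.vectorize(math.erf)


def group_by_slot(residuals, slots_per_day=12):
    """Rearrange a day-major residual series into one row per time-of-day slot."""
    r = np.asarray(residuals, dtype=float)
    return r.reshape(-1, slots_per_day).T  # shape: (slots_per_day, n_days)


def loo_log_likelihood(residuals, bandwidth):
    """Leave-one-out log-likelihood of a Gaussian KDE with the given bandwidth."""
    r = np.asarray(residuals, dtype=float)
    z = (r[:, None] - r[None, :]) / bandwidth
    k = np.exp(-0.5 * z**2) / (bandwidth * math.sqrt(2.0 * math.pi))
    np.fill_diagonal(k, 0.0)  # exclude each point from its own density estimate
    density = k.sum(axis=1) / (r.size - 1)
    with np.errstate(divide="ignore"):
        return float(np.log(density).sum())  # -inf if a point gets zero density


def select_bandwidth(residuals, candidates):
    """Pick the candidate bandwidth with the highest leave-one-out log-likelihood."""
    scores = [loo_log_likelihood(residuals, h) for h in candidates]
    return float(candidates[int(np.argmax(scores))])


def kde_cdf(x, residuals, bandwidth):
    """CDF of a Gaussian KDE: the mean of normal CDFs centred on each residual."""
    z = (x - np.asarray(residuals, dtype=float)) / (bandwidth * math.sqrt(2.0))
    return float(np.mean(0.5 * (1.0 + _erf(z))))


def kde_quantile(q, residuals, bandwidth, iterations=200):
    """Invert the monotone KDE CDF by bisection."""
    r = np.asarray(residuals, dtype=float)
    lo, hi = r.min() - 10 * bandwidth, r.max() + 10 * bandwidth
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if kde_cdf(mid, r, bandwidth) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def prediction_interval(point_forecast, residuals, bandwidth, alpha=0.05):
    """(1 - alpha) interval, assuming residual = real - predicted."""
    lower = kde_quantile(alpha / 2, residuals, bandwidth)
    upper = kde_quantile(1 - alpha / 2, residuals, bandwidth)
    return point_forecast + lower, point_forecast + upper
```

In use, each row of `group_by_slot(d2_residuals)` would get its own bandwidth from `select_bandwidth`, and a corrected point forecast for that time slot would be widened into an interval with `prediction_interval`, mirroring the framework of Section 3.1.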

Table 3. Comparison of point prediction results of the modified model in the time dimension (MAPE).
MAPE | Area1 | Area2 | Area3 | Area4 | Area5 | Area6 | Area7 | Area8 | Area9 | Area10
Modified result | 9.80% | 7.45% | 6.79% | 5.24% | 6.50% | 8.75% | 3.61% | 8.63% | 11.80% | 7.24%
Results before correction | 10.10% | 8.08% | 7.42% | 7.5% | 6.60% | 9.35% | 3.84% | 10.65% | 4.97% | 8.00%
Relative reduction | 2.97% | 7.80% | 8.49% | 30.1% | 1.52% | 6.42% | 5.99% | 18.97% | ╳ | 9.50%
Table 4. Comparison of probabilistic prediction results of the modified model in the time dimension.
- | Metric | Area1 | Area3 | Area4 | Area6 | Area7 | Area8 | Area10
Results before correction | PICP | 0.9086 | 0.859 | 0.8878 | 0.8511 | 0.953 | 0.8721 | 0.9138
Results before correction | PINAW | 0.4 | 0.2 | 0.23 | 0.28 | 0.29 | 0.28 | 0.35
Results before correction | CWC | 3.56 | 19.1 | 5.45 | 39.88 | 0.29 | 14.27 | 2.47
Modified result | PICP | 0.9243 | 0.9034 | 0.9973 | 0.8956 | 0.9582 | 0.9138 | 0.9478
Modified result | PINAW | 0.4 | 0.2 | 0.23 | 0.28 | 0.29 | 0.28 | 0.35
Modified result | CWC | 1.84 | 2.26 | 0.23 | 4.59 | 0.29 | 2.02 | 0.74
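The interval metrics reported in Table 4 follow Eqs. (5) to (8), and a minimal NumPy sketch of them is given below (function names are hypothetical). The paper does not state η; with μ = 0.95 and η = 50, Eq. (8) reproduces the CWC values in Table 4 from the listed PICP and PINAW to within rounding, for example 0.2 × (1 + e^(50 × 0.091)) ≈ 19.1 for Area3 before correction, so those values are used as defaults here as an assumption. The range R of Eq. (7) is passed in by the caller.

```python
import numpy as np


def picp(real, lower, upper):
    """Eqs. (5)-(6): fraction of real values inside [lower, upper]."""
    real, lower, upper = (np.asarray(a, dtype=float) for a in (real, lower, upper))
    covered = (real >= lower) & (real <= upper)  # delta_t for every t
    return float(covered.mean())


def pinaw(lower, upper, value_range):
    """Eq. (7): mean interval width divided by the range R."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    return float(np.mean(upper - lower) / value_range)


def cwc(picp_value, pinaw_value, mu=0.95, eta=50.0):
    """Eq. (8): penalise PINAW exponentially when PICP falls below mu."""
    gamma = 1.0 if picp_value < mu else 0.0
    return float(pinaw_value * (1.0 + gamma * np.exp(-eta * (picp_value - mu))))
```

When PICP reaches μ, the penalty switches off and CWC equals PINAW, which is why Area7 (PICP 0.953) and the corrected Area4 (PICP 0.9973) have CWC equal to their PINAW in Table 4.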
Fig. (10). Comparison of the modified model results in the time dimension.
The correction model further improves the prediction accuracy in 9 of the ten regions, reducing the MAPE by an average of about 10.2% relative to the uncorrected results. Table 4 reports the probabilistic prediction results at the 95% confidence level for the 7 areas whose interval coverage improved; on average, their coverage increases by 4.2%. Although, on this selected part of the test set, the interval coverage of the corrected model still falls short of the confidence level in most areas, Table 4 shows that it improves considerably over the uncorrected results. Judged on the entire test set, the interval coverage exceeds the confidence level.
Taking area 8 as an example, Fig. (10) shows the real load value, the predicted load value, and the revised predicted value for area 8 from the first day of the Chinese New Year to the Lantern Festival. The grey line represents the revised predicted value.
Fig. (11). Comparison of load correlation in (a) Area 1 and (b) Area 9.
In the time-dimension correction model, based on the correlation among the three data sets, we used the load values of N1, N2, and N3 in the same period to correct the predicted value of N3. Fig. (11) shows the load values of N1, N2, and N3 for area 1 and area 9, respectively. It can be
concluded that the correlation between N3 data and N1 and judge from the test results of the entire test set, the interval
N2 data in area 9 is not high compared with that in area 1. coverage is beyond the confidence.
Therefore, our modified model is not applicable to area 9
Taking the data set of area1 as an example, Fig. (12)
data.
shows the load value of area 1 from July 16, 2019, to July
23, 2019. The dotted line in the figure represents the cor-
5.4. Results of the Modified Model in the Space Dimen- rected predicted value, and the blue figure represents the
sion predicted value. We can see from the picture that the ad-
The data set used in the spatial dimension correction justed predicted value is significantly closer to the real load
model also selects a part of the test set, including summer value.
load data from July 16 to August 31. Table 5 shows the com- In the modified model of spatial dimensions, based on
parison of the load point prediction results revised by the mod- the correlation between different areas, we used the load
ified model of 10 areas on the selected part of the test set. values between areas to modify the predicted value of are-
The results show that the modified model has further re- as. In Tables 7 and 8, we showed the correlation coefficient
vised 6 of the ten areas, and the prediction accuracy has been between area data. Among them, Feature 1 is the correla-
improved. The corrected MAPE has decreased by an average tion coefficient between the predicted value and the true
of 6.1% compared with before the revision. The results in value of a certain area to be revised. Feature 2 and Feature
Table 6 are at 95% confidence, the probability prediction 3 are the 2 highest correlation coefficients between the pre-
results of 7 areas with improved area coverage for probabil- dicted value of the certain area to be corrected and the pre-
ity prediction, area coverage can be increased by an average dicted value of other areas. Taking area 1 as an example,
of 1.8% on the original basis. Although the revised model's for which the modified model is suitable, we can conclude
prediction interval coverage on this part of the selected test that Feature 2 and Feature 3 of Area 1 is significantly high-
set does not exceed the confidence level, we can see from the er than that of the other four areas. Therefore, comparative-
results in Table 6 that the interval coverage rate has been ly speaking, the proposed model is not suitable for the other
greatly improved compared to before the correction. If we four areas data.
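The Feature 1 to Feature 3 definitions above can be made concrete with Pearson correlation. The following is a minimal sketch, assuming each area's predicted and true loads are aligned NumPy arrays covering the same period; the function names `pearson` and `correlation_features` and the toy series are hypothetical illustrations, not code from the paper.

```python
from __future__ import annotations

import numpy as np


def pearson(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    return float(np.corrcoef(a, b)[0, 1])


def correlation_features(predicted: dict[str, np.ndarray],
                         true_target: np.ndarray,
                         target: str) -> tuple[float, float, float]:
    """Return (Feature 1, Feature 2, Feature 3) for the area `target`.

    Feature 1: correlation between the target area's predicted and true load.
    Features 2 and 3: the two highest correlations between the target area's
    predicted load and the predicted loads of the other areas
    (so at least two other areas are required).
    """
    feature1 = pearson(predicted[target], true_target)
    cross = sorted(
        (pearson(predicted[target], series)
         for name, series in predicted.items() if name != target),
        reverse=True,
    )
    return feature1, cross[0], cross[1]


# Hypothetical toy data: four areas over five time steps.
true_a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
predicted = {
    "A": 2.0 * true_a,                          # perfectly correlated with the truth
    "B": np.array([1.0, 2.0, 3.0, 5.0, 4.0]),
    "C": np.array([5.0, 4.0, 3.0, 2.0, 1.0]),   # anti-correlated with A
    "D": np.array([3.0, 1.0, 2.0, 5.0, 4.0]),
}
features = correlation_features(predicted, true_a, "A")
print(features)
```

Comparing this tuple across areas, as Table 8 does, indicates where the spatial correction is likely to help; any fixed cut-off for "high" correlation would be an extra assumption on top of this sketch.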
Table 5. Comparison of point prediction results of the modified model in space dimension (MAPE).
− Area1 Area2 Area3 Area4 Area5 Area6 Area7 Area8 Area9 Area10
Modified result 7.39% 5.67% 4.22% 6.25% 5.89% 4.64% 3.93% 5.28% 4.91% 5.41%
Results before correction 8.10% 6.20% 4.30% 4.17% 5.78% 4.76% 4.00% 5.15% 4.80% 6.23%
Reduction 8.77% 8.55% 1.86% ╳ ╳ 2.52% 1.75% ╳ ╳ 13.16%
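The reduction row and the 6.1% average quoted in the text can be reproduced from the two MAPE rows of Table 5. The reduction is the relative decrease (before minus after) divided by before; this sketch assumes that areas whose MAPE rose (marked ╳) are excluded from the average, which is what makes the numbers match. The helper name `relative_reduction` is hypothetical.

```python
from typing import Dict, Optional

# MAPE values (%) copied from Table 5.
before = {"Area1": 8.10, "Area2": 6.20, "Area3": 4.30, "Area4": 4.17, "Area5": 5.78,
          "Area6": 4.76, "Area7": 4.00, "Area8": 5.15, "Area9": 4.80, "Area10": 6.23}
after = {"Area1": 7.39, "Area2": 5.67, "Area3": 4.22, "Area4": 6.25, "Area5": 5.89,
         "Area6": 4.64, "Area7": 3.93, "Area8": 5.28, "Area9": 4.91, "Area10": 5.41}


def relative_reduction(mape_before: float, mape_after: float) -> Optional[float]:
    """Relative MAPE reduction in percent, or None if the correction made MAPE worse."""
    if mape_after >= mape_before:
        return None  # reported as the cross mark in Table 5
    return 100.0 * (mape_before - mape_after) / mape_before


reductions: Dict[str, Optional[float]] = {
    area: relative_reduction(before[area], after[area]) for area in before
}
# Average only over the areas where the correction helped.
improved = {area: r for area, r in reductions.items() if r is not None}
mean_reduction = sum(improved.values()) / len(improved)

for area, r in reductions.items():
    print(area, "not reduced" if r is None else f"{r:.2f}%")
print(f"mean over {len(improved)} improved areas: {mean_reduction:.2f}%")
```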
Table 6. Comparison of probabilistic prediction results of the modified model in the space dimension.
- Area1 Area2 Area4 Area5 Area7 Area8
Results before correction, PICP 0.873 0.92 0.9289 0.9147 0.8845 0.9183
Results before correction, PINAW 0.33 0.4 0.33 0.27 0.22 0.29
Results before correction, CWC 16.1 2.2 1.31 1.84 5.93 1.68
Modified result, PICP 0.9026 0.94 0.9645 0.9342 0.8881 0.9218
Modified result, PINAW 0.33 0.4 0.33 0.26 0.22 0.29
Modified result, CWC 3.92 1.09 0.33 0.84 5 1.45
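The three metrics in Table 6 follow the usual prediction-interval definitions: PICP is the share of observations inside the interval, PINAW is the mean interval width normalised by the range of the observed load, and CWC penalises PINAW exponentially when PICP falls below the nominal coverage μ. The sketch below assumes μ = 0.95 and a penalty factor η = 50; η is our assumption, chosen because it reproduces the tabulated CWC values to within rounding (PICP = 0.92 with PINAW = 0.4 gives about 2.19, against 2.2 for Area2).

```python
import math

import numpy as np


def picp(y, lower, upper) -> float:
    """Prediction interval coverage probability: share of observations inside [lower, upper]."""
    y, lower, upper = (np.asarray(a, dtype=float) for a in (y, lower, upper))
    return float(np.mean((y >= lower) & (y <= upper)))


def pinaw(y, lower, upper) -> float:
    """Prediction interval normalised average width: mean width / range of observations."""
    y, lower, upper = (np.asarray(a, dtype=float) for a in (y, lower, upper))
    return float(np.mean(upper - lower) / (y.max() - y.min()))


def cwc(picp_value: float, pinaw_value: float, mu: float = 0.95, eta: float = 50.0) -> float:
    """Coverage width-based criterion; the exponential penalty applies only when PICP < mu.

    mu = 0.95 matches the 95% intervals; eta = 50 is an assumed penalty factor.
    """
    gamma = 1.0 if picp_value < mu else 0.0
    return pinaw_value * (1.0 + gamma * math.exp(-eta * (picp_value - mu)))


# Toy check on four hypothetical observations: 3 of 4 covered, mean width 0.875, range 3.
y = [1.0, 2.0, 3.0, 4.0]
lo = [0.5, 2.5, 2.5, 3.5]
hi = [1.5, 3.0, 3.5, 4.5]
print(picp(y, lo, hi), pinaw(y, lo, hi))

# Values from Table 6: Area2 before correction, Area4 after correction.
print(round(cwc(0.92, 0.4), 2), cwc(0.9645, 0.33))  # -> 2.19 0.33
```

Because the penalty vanishes once PICP reaches μ, CWC equals PINAW for every area whose coverage meets the confidence level, which is the pattern visible in the modified results.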
Fig. (12). Comparison of the modified model results in the space dimension. (A higher resolution / colour version of this figure is available in
the electronic copy of the article).
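The probabilistic results in Section 5.5 are obtained by estimating the density of the point-prediction residuals at each time point with kernel density estimation and shifting that density by the point forecast. A minimal, dependency-free sketch of this step for a single time point follows, assuming a Gaussian kernel with a fixed bandwidth and a central interval read from the 2.5% and 97.5% quantiles of the shifted density; the function names, bandwidth and residual values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def kde_cdf(x: float, residuals: Sequence[float], bandwidth: float) -> float:
    """CDF at x of a Gaussian-kernel density estimate centred on the residuals."""
    scale = bandwidth * math.sqrt(2.0)
    return sum(0.5 * (1.0 + math.erf((x - r) / scale)) for r in residuals) / len(residuals)


def kde_quantile(q: float, residuals: Sequence[float], bandwidth: float,
                 tol: float = 1e-9) -> float:
    """Invert the monotone KDE CDF by bisection."""
    lo = min(residuals) - 10.0 * bandwidth  # CDF is essentially 0 here
    hi = max(residuals) + 10.0 * bandwidth  # CDF is essentially 1 here
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kde_cdf(mid, residuals, bandwidth) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def prediction_interval(point_forecast: float, residuals: Sequence[float],
                        bandwidth: float, confidence: float = 0.95) -> Tuple[float, float]:
    """Shift the residual KDE by the point forecast and read off a central interval."""
    alpha = 1.0 - confidence
    lower = point_forecast + kde_quantile(alpha / 2.0, residuals, bandwidth)
    upper = point_forecast + kde_quantile(1.0 - alpha / 2.0, residuals, bandwidth)
    return lower, upper


# Hypothetical residuals (true minus predicted load) at one time of day on validation
# data, and a hypothetical point forecast for the same time of day on a test day.
residuals_0800 = [-35.0, -12.0, -4.0, 3.0, 9.0, 18.0, 41.0]
print(prediction_interval(point_forecast=620.0, residuals=residuals_0800, bandwidth=30.0))
```

Repeating this for every time point, each with its own residual set and possibly its own bandwidth, yields daily interval bands of the kind plotted in Fig. (13). In this construction the median of the shifted density equals the point forecast plus the residual median, so it coincides with the point forecast when the residuals are centred on zero.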
5.5. The Probability Prediction Results

After obtaining the residuals of the point prediction model on the D2 data set, we use kernel density estimation to model the residuals at each time point and obtain the residual probability density curve for each time point. Adding the point prediction of the test set to the residual probability density curve gives the probability prediction result of the test set.

Taking the dataset of area 5 as an example, Fig. (13) shows the load probability prediction results of area 5 from March 1, 2019, to March 7, 2019, at 95% confidence, where the yellow and grey dotted segments represent the upper and lower limits of the confidence interval, respectively. At 95% confidence, the real value falls within the confidence interval, and the predicted value fits the real value closely.

Fig. (14) shows the probability density distribution of the predicted value at selected times on March 1 in the dataset of area 5, where the blue dotted lines mark the upper and lower limits of the confidence interval at the 95% confidence level and the yellow dotted line represents the actual load value at the current time point. As the figure shows, the probability density curve is relatively smooth and has a
Gaussian distribution. The real value falls entirely within the 95% confidence interval and is close to the predicted value. Since the predicted-value probability density curve is obtained by adding the predicted value to the residual probability density, the median of the distribution coincides with the predicted value.

We substituted the predictions produced by the time-dimension and space-dimension correction models for the two selected parts of the test set back into the overall test-set predictions. Table 7 compares the resulting probabilistic prediction results with the uncorrected results.

Fig. (13). The forecast results of Area 5 in a particular week. (A higher resolution / colour version of this figure is available in the electronic copy of the article).

Fig. (14). Probability density distribution of predicted load value at different time points.

Table 7. Comparison of probability prediction results.
- Area1 Area2 Area3 Area4 Area5 Area6 Area7 Area8 Area9 Area10
MAPE of the test set 7.3% 5.3% 3.8% 3.9% 4.3% 4.4% 2.9% 5.0% 4.5% 5.3%
Results before correction, PICP 0.943 0.957 0.962 0.955 0.96 0.96 0.953 0.955 0.958 0.967
Results before correction, PINAW 0.26 0.23 0.18 0.17 0.22 0.18 0.19 0.2 0.21 0.26
Results before correction, CWC 0.63 0.23 0.18 0.17 0.22 0.18 0.19 0.2 0.21 0.26
Modified result, PICP 0.96 0.959 0.966 0.97 0.963 0.963 0.954 0.96 0.953 0.971
Modified result, PINAW 0.26 0.23 0.18 0.17 0.22 0.18 0.19 0.2 0.21 0.26
Modified result, CWC 0.26 0.23 0.18 0.17 0.22 0.18 0.19 0.2 0.21 0.26

Table 8. The correlation values of different areas.
- Feature1 Feature2 Feature3
Area1 0.86 0.82 0.68
Area4 0.82 0.57 0.56
Area5 0.89 0.50 0.63
Area8 0.80 0.59 0.60
Area9 0.87 0.26 0.36

CONCLUSION AND OUTLOOK

This paper proposes modified models based on the time and space dimensions to correct the prediction results of the cascade LSTM model, and uses kernel density estimation to model and analyze the prediction residuals. Combining the point prediction results with the residual probability density distribution yields the final load probability density distribution. We compared the two modified models on selected test sets in 10 areas of a city in northern China. The test results show that the MAPE of the two modified models on their respective test sets is reduced by an average of 10.2% and 6.1% compared with before modification, and that the interval coverage of the probability prediction is increased by an average of 4.2% and 1.8%, indicating that the correction is effective. The revised models therefore improve both the prediction accuracy and the interval coverage of the probability prediction; however, the proposed correction method is only effective for datasets with high correlation. The results of this paper are of considerable significance to the operation planning and scheduling of the power grid.

Although the revised model improves the accuracy of the prediction results to a certain extent, its probability prediction interval coverage on some of the selected test sets still falls short of the confidence level. In the next step, we will further improve the modified model to achieve a better correction effect.

CONSENT FOR PUBLICATION

Not applicable.

AVAILABILITY OF DATA AND MATERIALS

The dataset used in this article cannot be made public due to its confidentiality.

FUNDING

This work was supported by the Fundamental Research Funds for the Central Universities (Grant no. 2020MS012) and the State Grid Shandong Electric Power Company Project (52060418006B).

CONFLICT OF INTEREST

The authors declare that they have no conflicts of interest in this work.

ACKNOWLEDGEMENTS

Declared none.
DISCLAIMER: The above article has been published in Epub (ahead of print) on the basis of the materials provided by the author. The Edito-
rial Department reserves the right to make minor modifications for further improvement of the manuscript.

2352-0965/21 $65.00+.00 © 2021 Bentham Science Publishers

