Energy: Xuguang Wang, Xiao Li, Jie Su

Energy 273 (2023) 127209
Distribution drift-adaptive short-term wind speed forecasting

Xuguang Wang a,b,∗, Xiao Li a,b, Jie Su a,b
a School of Control and Computer Engineering, North China Electric Power University, Baoding 071003, PR China
b Hebei Technology Innovation Center of Simulation and Optimized Control for Power Generation, Baoding 071002, PR China
ARTICLE INFO

Keywords: Distribution drift; BED rule; Optimal mode number; Short-term wind speed forecasting

ABSTRACT

Accurate short-term wind speed forecasting is essential for wind power system scheduling optimization and profit maximization. However, the distribution of wind speed evolves over time. Stable and reliable forecasting results require that the wind speed forecasting methods be adaptive to the wind speed distribution drift. Thus, how to make the forecasting method adaptive to the wind speed distribution drift becomes a challenge. In this study, the distribution of future wind speed is predicted using a tiled convolutional neural network (TCNN)-based model. The distribution deviation between historical and future wind speed is minimized via weighting the loss contribution of historical data. A branch accumulation error decreasing (BED) rule is introduced to adaptively determine the optimal mode number for the variational mode decomposition (VMD) method. Two hybrid models which employ both the distribution drift correction process and BED rule-based decomposition process are proposed. The effectiveness of the proposed models is verified using data from two different wind farms in China. Compared with the traditional short-term wind speed forecasting models, the proposed models show considerably better robustness to the distribution drift of the wind speed and achieve significantly higher forecasting accuracy in both the one-step ahead and multistep ahead wind speed forecasting scenarios.
1. Introduction

Accurate short-term wind speed forecasting is a prerequisite for the daily scheduling of wind power systems, and it plays a crucial role in the efficient use of wind energy. However, the volatility, randomness and time-varying distribution of the wind speed turn the forecasting task into a challenging research hotspot [1–3].

Over the past few years, a wide variety of wind speed forecasting models have been reported in the literature, such as Refs. [4–8]. These models can be roughly categorized into single and hybrid models according to their structures.

The commonly used single models include the deep belief network (DBN) [9] model, the autoregressive integrated moving average (ARIMA) model [10], the least squares support vector machine (LS-SVM) [11] model, the long–short term memory (LSTM) [12] model, and the gated recurrent unit (GRU) [13] model. Due to the volatile and stochastic nature of wind speed, these single models cannot meet the accuracy requirement of the wind speed forecasting task. Instead, they are usually utilized as a prediction module in hybrid models.

Compared with the single models, hybrid models can produce better forecasting results. Hybrid models can be further divided into two subcategories.

The first subcategory simply assembles several single models, where the final forecasting result corresponds to the weighted sum of the outputs of the selected single models. For instance, in Ref. [6], artificial intelligence models and mixed-frequency models were used as forecasting submodels, and the outputs of the submodels were integrated with a kernel-based extreme learning machine. The prediction results from four neural networks were weighted and aggregated using a multiobjective bat algorithm in Ref. [14]. To improve the accuracy of wind speed prediction, Li et al. [15] integrated three hybrid models based on a modified SVM.

Wind speed usually fluctuates in multiple patterns. Therefore, the second subcategory of hybrid models first decomposes the wind speed series into subsignals and then obtains the forecasted wind speed measurements by separately predicting the measurements of these subsignals or directly approximating the functional relation between the subsignals and future wind speed measurements. The commonly used signal decomposition methods include the variational mode decomposition (VMD) [16], empirical mode decomposition (EMD) [17], wavelet transform (WT) [18] and their variants.

The hybrid models mentioned below all fall into the second subcategory. In Ref. [4], a quaternion convolutional neural network combined with a bidirectional LSTM was proposed to forecast the subsignals decomposed with VMD, and the hyperparameters of the hybrid model were tuned using the arithmetic optimization algorithm. A hybrid
∗ Corresponding author at: School of Control and Computer Engineering, North China Electric Power University, Baoding 071003, PR China.
E-mail addresses: wangxg@ncepu.edu.cn (X. Wang), lixiao_ncepu@163.com (X. Li), sjzjn@126.com (J. Su).
https://doi.org/10.1016/j.energy.2023.127209
Received 30 November 2022; Received in revised form 30 January 2023; Accepted 12 March 2023
Available online 20 March 2023
0360-5442/© 2023 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Table 1
Categories of the mentioned wind speed forecasting models.

Single model                        | Hybrid model, first subcategory | Hybrid model, second subcategory
------------------------------------|---------------------------------|----------------------------------
Hinton et al. [9] (DBN, 2006)       | Yang et al. [6] (2022)          | Emeksiz et al. [1] (2022)
Kavasseri et al. [10] (ARIMA, 2009) | Qu et al. [14] (2017)           | Wu et al. [7] (2022)
Suykens et al. [11] (LS-SVM, 1999)  | Li et al. [15] (2018)           | Li et al. [8] (2022)
Hochreiter [12] (LSTM, 1997)        |                                 | Hua et al. [19] (2022)
Chung et al. [13] (GRU, 2014)       |                                 | Santhosh et al. [20] (WEE, 2018)
                                    |                                 | Liu et al. [21] (EAW, 2018)
                                    |                                 | Singh et al. [22] (RWA, 2019)
                                    |                                 | Zhang et al. [23] (FVAD, 2020)
                                    |                                 | Wang et al. [24] (VMD-RF, 2022)
model combining VMD, partial least squares (PLS) and an extreme learning machine (ELM) was proposed in Ref. [19], and the improved atom search optimization was employed to optimize an ELM. Santhosh et al. [20] decomposed the wind speed data using the ensemble EMD (EEMD) and then forecasted the components using an adaptive wavelet neural network to implement multistep wind speed prediction. Liu et al. [21] proposed three hybrid models using EMD, wavelet packet decomposition, and ELM. In Ref. [22], wavelet decomposition was used to obtain the wind speed components and the components were predicted by the proposed repeated WT-based ARIMA prediction model. Zhang et al. [23] decomposed the wind speed data using VMD and separately forecasted the modes according to their center frequencies. Wang et al. [24] combined the VMD and the Reformer model to implement a short-term wind speed forecasting task. In the sequel, the hybrid models introduced in Refs. [20–24] are referred to as WEE, EAW, RWA, FVAD, and VMD-RF, respectively, for the sake of presentation.

For easy reference, the categories of the abovementioned wind speed forecasting models are summarized in Table 1.

With the ability to learn complex quantitative relationships hidden in wind speed data, machine learning or deep learning-based models are usually adopted as the prediction module in a hybrid model. According to the machine learning theory, the forecasting accuracy of this prediction module cannot be guaranteed if the distributions of the training data and the data to be predicted are inconsistent. However, the distribution of the wind speed evolves over time due to its volatile and stochastic nature, i.e., distribution drift always occurs in the wind speed series. Therefore, stable and reliable forecasting results require that the wind speed forecasting models be adaptive to the distribution drift of the wind speed series.

VMD is a favorable and widely used signal decomposition method because it can sidestep the endpoint effect and the mode aliasing. As a manually set key parameter of VMD, the mode number is crucial to the wind speed forecasting results. Underdecomposition of the wind speed series leads to overlapping fluctuation patterns and therefore causes low-accuracy forecasting results. In the overdecomposed wind speed series scenario, trivial modes increase the computation load and compromise, or even outweigh, the benefit of the stable future trend of the subsignals. Therefore, effective determination of the optimal mode number for VMD becomes a major issue.

To address the above two issues, we construct a tiled convolutional neural network (TCNN)-based network to predict the distribution of future wind speed and correct the drift between the distribution of the historical and future wind speeds. Meanwhile, we design a VMD-based recursive decomposition process in this paper.

The main contributions are summarized as follows:

• The distribution of the future wind speed measurements is predicted using a TCNN-based network. Then, the distribution deviation between the historical and future wind speeds is minimized by weighting the loss contribution of the historical data with the predicted distribution.
• The branch accumulation error decreasing (BED) rule is designed to adaptively determine the optimal mode numbers for VMD and a VMD-based recursive decomposition process is presented.
• Two hybrid models that employ both the distribution drift correction process and the BED rule-based decomposition process are proposed to implement the wind speed forecasting task.
• Comparison and ablation experiments on actual wind speed data collected from two wind farms are conducted to validate the effectiveness of the proposed models.

The remainder of this paper is organized as follows. Section 2 formulates the wind speed forecasting problem. Section 3 describes the distribution prediction for the future data and distribution drift correction process. Section 4 introduces the BED rule-based recursive decomposition process of the wind speed series. Section 5 elucidates the structure of the proposed hybrid models. Section 6 validates the proposed model using comparison and ablation experiments. Finally, key findings of this study are summarized in Section 7.

2. Problem formulation

By assuming that future short-term wind speed measurements can be forecasted by the historical wind speed data, trainable models are designed to approximate the functional relationship between historical and future wind speed measurements. Therefore, the functional relationship is described as

$$\mathbf{x}_i^f = \mathcal{F}(\mathbf{x}_i^h) \tag{1}$$

where $\mathbf{x}_i^h$ and $\mathbf{x}_i^f$ denote the historical and forecasted wind speed segments, respectively; $\mathcal{F}(\cdot)$ represents the functional relationship between $\mathbf{x}_i^h$ and $\mathbf{x}_i^f$; and $i$ is the index. The lag order $P = |\mathbf{x}_i^h|$ is usually set empirically, and $L = |\mathbf{x}_i^f|$ is the step size of this forecasting task.

Due to the volatile and stochastic nature of the wind speed, the decomposition method is always employed to capture the fluctuation patterns. The subsignals decomposed from the wind speed series are then utilized in two different schemes. One is to separately predict the future measurements for each subsignal and integrate the predicted measurements to form the future wind speed measurements. This process can be formulated as

$$\begin{cases} \hat{\mathbf{x}}_i^f = \sum_k \mathcal{F}_w^{(k)}(\mathbf{x}_i^{(k)}) \\ \mathbf{x}_i^h = \sum_k \mathbf{x}_i^{(k)} \end{cases} \tag{2}$$

where $\mathbf{x}_i^{(k)}$ denotes the subsignal decomposed from $\mathbf{x}_i^h$, $\hat{\mathbf{x}}_i^f$ represents the forecasted wind speed measurements, and $\mathcal{F}_w^{(k)}$ is the trainable model designed for predicting the future measurements of the $k$th subsignal.

Another scheme takes the intercorrelations of the subsignals into consideration. It directly simulates the relationship between subsignals and the future wind speed measurements:

$$\hat{\mathbf{x}}_i^f = \mathcal{F}_w(\mathbf{x}_i^{(1)}, \ldots, \mathbf{x}_i^{(K)}) \tag{3}$$

where $\mathcal{F}_w$ is the trainable model that fulfills the wind speed forecasting task.

According to the machine learning theory, the forecasting accuracy of the abovementioned trainable models cannot be guaranteed if the distributions of the training data and the data to be predicted are
inconsistent. Nevertheless, wind speed presents complex and volatile fluctuation characteristics, and its distribution evolves over time. That is to say, the distribution drift always occurs in the wind speed series.

The distribution drift between two end-to-end time segments is usually measured by the Jensen–Shannon divergence (abbreviated as JS divergence), whose formula is

$$JS[P \,\|\, Q] = \frac{1}{2} KL\left[P \,\Big\|\, \frac{P+Q}{2}\right] + \frac{1}{2} KL\left[Q \,\Big\|\, \frac{P+Q}{2}\right] \tag{4}$$

where $P$ and $Q$ are the distributions of time segments, and

$$KL[P \,\|\, Q] = \sum_x P(x) \log \frac{P(x)}{Q(x)} \tag{5}$$

is the Kullback–Leibler divergence between $P$ and $Q$. As a commonly used distribution divergence metric, JS divergence satisfies the properties $JS[P \,\|\, Q] \in [0, 1]$ and $JS[P \,\|\, Q] = JS[Q \,\|\, P]$.

For example, Fig. 1 demonstrates the distribution drift of two wind speed datasets (i.e., the Inland data and the Coastal data, detailed in Section 6) used in this paper: (a) shows the JS divergence between the wind speeds measured in January and other months, and (b) presents the JS divergence of the wind speed sampled between two adjacent months. Apparently, distribution drift occurs in every month of the two wind speed datasets, and the degree of the distribution drift gently fluctuates in most of the months except in the summertime, i.e., from June to August.

Fig. 1. Wind speed distribution evolves over time.

Therefore, stable and reliable forecasting results require that wind speed forecasting models adapt to the distribution drift of the wind speed series. The key concern of this study is to determine how to make the wind speed forecasting methods adaptive to the distribution drift. This issue will be addressed in Section 3.

As a widely adopted signal decomposition method, VMD is favorable because it can sidestep the endpoint effect and mode aliasing. However, before utilizing VMD to decompose the wind speed series, the mode number must be manually set. The mode number is crucial to the wind speed forecasting results. Underdecomposition of the wind speed series leads to overlapping fluctuation patterns and therefore causes low-accuracy forecasting results. In the overdecomposed wind speed series scenario, trivial modes increase the computation load and compromise, or even outweigh, the benefit of the stable future trend of the subsignals. Therefore, another concern of this study is to determine how to obtain the optimal mode number during the wind speed series decomposition process with VMD. To remedy this issue, a VMD-based recursive decomposition process is proposed in Section 4.

3. Distribution drift correction

Under the condition that the data are independent and identically distributed (i.e., the i.i.d. condition), machine learning or deep learning-based models have been successfully applied to time series prediction. Nonetheless, this condition cannot be satisfied in the wind speed forecasting scenario. Therefore, stable and reliable forecasting results require that machine learning or deep learning-based wind speed forecasting models adapt to the distribution drift of the wind speed series.

The distribution drift can be measured [25] and categorized based on different criteria [26]. The distribution drift of the wind speed can be either gradual or abrupt according to its change rate. Usually, it evolves gradually (see the example in Fig. 1).

To mitigate the negative effects of the distribution drift, previous works assumed that the minimum distribution deviation occurs between the latest measurements and the upcoming future data, and fine-tuned the current model using the latest measurements [27,28]. However, the tuned model still lags the upcoming future data by at least one step. This makes the previous works sensitive to the distribution drift. Minku et al. [29] defined distribution drift and studied predictable distribution drift. Khamassi et al. [30] and Li et al. [31] showed that the distribution drift is predictable under certain assumptions.

By assuming that the distribution drift of the wind speed series is predictable, we make an attempt to predict the distribution for the upcoming future wind speed measurements and weight the loss contribution of the historical data. Our purpose is to minimize the distribution deviation between historical and future data for wind speed forecasting models.

3.1. Distribution prediction

A trainable model $\mathcal{F}_p$ should be designed to implement distribution prediction. Here, the training and testing sets of model $\mathcal{F}_p$ are illustrated in Fig. 2.

Fig. 2. Training and testing sets of model $\mathcal{F}_p$.

As shown in Fig. 2, $Task_{train} := \{task^{(1)}, \ldots, task^{(t)}\}$ and $Task_{test} := \{task^{(t+1)}, \ldots, task^{(T)}\}$ are the training and testing sets of model $\mathcal{F}_p$, respectively. Each set consists of tasks arranged in chronological order. Two wind speed segments that are seamlessly connected on the timeline form a task and adjacent tasks overlap.

The loss on $task^{(t)}$ is defined as

$$loss_t = KL[\mathcal{F}_p(D_{train}^{(t)}) \,\|\, P_{test}^{(t)}] = KL[P_{resam}^{(t)} \,\|\, P_{test}^{(t)}] \tag{6}$$
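As a reading aid, the divergences in Eqs. (4) and (5) can be sketched for histogram estimates of two wind speed segments. This is a minimal sketch rather than the authors' code: the function names, the histogram binning, the synthetic Weibull samples standing in for measured wind speeds, and the base-2 logarithm (chosen so that the JS divergence stays within [0, 1]) are all assumptions.

```python
import numpy as np


def kl_divergence(p, q, eps=1e-12):
    """KL[P || Q] = sum_x P(x) * log2(P(x) / Q(x)) for discrete distributions, Eq. (5).

    The base-2 logarithm is an assumption made so that JS lies in [0, 1].
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # bins with P(x) = 0 contribute nothing to the sum
    return float(np.sum(p[mask] * np.log2(p[mask] / np.maximum(q[mask], eps))))


def js_divergence(p, q):
    """JS[P || Q] = 0.5 * KL[P || M] + 0.5 * KL[Q || M] with M = (P + Q) / 2, Eq. (4)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def histogram_distribution(segment, bins):
    """Estimate a discrete distribution of a wind speed segment (hypothetical helper)."""
    counts, _ = np.histogram(segment, bins=bins)
    return counts / counts.sum()


# Synthetic stand-ins for two monthly wind speed segments (not the paper's data).
rng = np.random.default_rng(0)
bins = np.linspace(0.0, 30.0, 31)  # assumed 1 m/s bins
month_a = 6.0 * rng.weibull(2.0, size=720)
month_b = 4.0 * rng.weibull(1.6, size=720)

drift = js_divergence(histogram_distribution(month_a, bins),
                      histogram_distribution(month_b, bins))
print(f"JS divergence between the two segments: {drift:.4f}")
```

With this convention the divergence is 0 for identical histograms and 1 for histograms with disjoint support, which matches the stated range of the JS divergence. The same `kl_divergence` helper has the form of the loss in Eq. (6) when its first argument is a predicted distribution.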
In Eq. (6), $P_{test}^{(t)}$ is the distribution of $D_{test}^{(t)}$ and $P_{resam}^{(t)} = \mathcal{F}_p(D_{train}^{(t)})$ denotes the predicted distribution of $D_{test}^{(t)}$. Thus, the training process of model $\mathcal{F}_p$ can be formulated as

$$\hat{\Theta} = \arg\min_{\Theta} \sum_t KL[\mathcal{F}_p(D_{train}^{(t)}) \,\|\, P_{test}^{(t)}] \tag{7}$$

where $\Theta$ denotes the parameters of $\mathcal{F}_p$.

In this paper, $\mathcal{F}_p$ adopts a TCNN-based structure [32], as shown in Fig. 3. Since CNN-based neural networks are suited for image-oriented tasks, the wind speed segment is encoded as images before predicting the distribution $P_{resam}^{(t)}$. Thus, $D_{train}^{(t)}$ is encoded as images first, the images are then processed by the major network of two TCNNs stacked together, and the fully connected layer transforms the output of the major network into the distribution $P_{resam}^{(t)}$.

Fig. 3. Structure of model $\mathcal{F}_p$.

In model $\mathcal{F}_p$, the tiling size is set to 2, and 4 filters are adopted to implement the convolution operator, i.e., the three encoded images are transformed into 12 feature maps. The receptive fields for the first and second convolutional layers are set to 8 × 8 and 3 × 3, respectively. Both topographic independent component analysis (TICA) pooling layers pool a 3 × 3 patch in the previous layer without wraparound borders.

Here, three types of images are adopted to encode the wind speed series, i.e., the Markov transition field (MTF) [33], the Gramian angular summation field (GASF) [34], and the Gramian angular difference field (GADF) [34]. MTF encodes the multispan transition probabilities of a time series. GASF and GADF can describe the temporal dependency of a time series on different timescales, but the information about the fluctuation range of the time series is missing. Therefore, GASF and GADF are multiplied by a factor

$$\alpha = \max(\mathbf{x}) - \min(\mathbf{x}) \tag{8}$$

in this paper and become scaled GASF and scaled GADF, respectively. Here, $\mathbf{x}$ denotes a time series, and $\max(\mathbf{x})$ and $\min(\mathbf{x})$ are the maximum and minimum measurements of $\mathbf{x}$, respectively.

The MTF, scaled GASF and scaled GADF computed from the Coastal data measured in July are illustrated in Fig. 4. The figure shows that the three types of images indeed encode different information about the wind speed series.

Fig. 4. Wind speed series imaging.

3.2. Drift correction

The wind speed forecasting model is usually trained by minimizing the following loss

$$\hat{\Phi} = \arg\min_{\Phi} \frac{1}{N} \sum_{i=1}^{N} \ell(\mathcal{F}(\mathbf{x}_i^h), \mathbf{x}_i^f) \tag{9}$$

where $\mathcal{F}$ can be the model $\mathcal{F}_w$ in Eq. (3) or the combination of models $\mathcal{F}_w^{(k)}$ in Eq. (2), $\Phi$ denotes the parameters of $\mathcal{F}$ and $\ell(\cdot)$ is the loss function. In a real-world scenario, however, the distribution of the wind speed series drifts over time, and a correction factor must be added to Eq. (9) to counteract the distribution bias between the training and testing data [35,36], i.e.,

$$\hat{\Phi} = \arg\min_{\Phi} \frac{1}{N} \sum_{i=1}^{N} v_i \, \ell(\mathcal{F}(\mathbf{x}_i^h), \mathbf{x}_i^f) \tag{10}$$

where

$$v_i = \frac{P_{test}(\mathbf{x}_i^h, \mathbf{x}_i^f)}{P_{train}(\mathbf{x}_i^h, \mathbf{x}_i^f)} \tag{11}$$

Here, $P_{test}$ and $P_{train}$ are the wind speed distributions of the testing and training data, respectively. According to Eq. (1), there is a functional relation between $\mathbf{x}_i^h$ and $\mathbf{x}_i^f$; thus,

$$\begin{cases} P_{test}(\mathbf{x}_i^h, \mathbf{x}_i^f) = P_{test}(\mathbf{x}_i^h, \mathcal{F}(\mathbf{x}_i^h)) = P_{test}(\mathbf{x}_i^f) \\ P_{train}(\mathbf{x}_i^h, \mathbf{x}_i^f) = P_{train}(\mathbf{x}_i^h, \mathcal{F}(\mathbf{x}_i^h)) = P_{train}(\mathbf{x}_i^f) \end{cases} \tag{12}$$

Therefore,

$$v_i = \frac{P_{test}(\mathbf{x}_i^f)}{P_{train}(\mathbf{x}_i^f)} \tag{13}$$

$P_{train}(\mathbf{x}_i^f)$ can be estimated from the training data, while $P_{test}(\mathbf{x}_i^f)$ can be predicted using the model $\mathcal{F}_p$ shown in Fig. 3.

4. BED rule-based recursive decomposition

VMD is a widely used signal decomposition method. Manually setting the mode number, however, makes this method worrisome when employed in wind speed forecasting tasks. Since the wind speed
forecasting accuracy is sensitive to the mode number, both underdecomposed and overdecomposed wind speed series lead to forecasting results that require further improvement.

For example, Fig. 5 presents the wind speed forecasting accuracies of several popular models (including VMD-RF, FVAD, LS-SVM, DBN, and ARIMA) under different mode numbers. Panels (a) and (b) display the wind speed forecasting results calculated on the Inland data, where wind speed samples collected in the first three weeks and the final week of June are used as the training and testing data, respectively. Panels (c) and (d) present the forecasting results calculated on the Coastal data, where wind speed samples measured in the first three weeks and the final week of September are utilized as the training and testing data, respectively. The wind speed data used in this example will be described in detail in Section 6.

Fig. 5. Forecasting accuracy under different mode numbers.

The forecasting accuracies of these models fluctuate with and are sensitive to the mode number. Each model achieves the best forecasting accuracy under a specific mode number, and these optimal mode numbers vary with models. This motivates us to explore an adaptive way to set the mode number when utilizing VMD.

Based on the BED rule, we propose recursive decomposition in this section. Here, we adopt the mean absolute error (MAE) as the error index. The BED rule is used to determine whether a signal needs further decomposition; that is, the decomposition is executed if the sum of the subsignals' testing errors is less than the testing error of the decomposed signal.

Without ambiguity, here we let $\mathbf{x} = [x_1, \ldots, x_M]$ be the testing data of a model, $\mathbf{x}^{(k)}$ be the $k$th subsignal of $\mathbf{x}$ and $\mathbf{x}^{(k,r)}$ be the $r$th subsignal of $\mathbf{x}^{(k)}$. We further let $e_k^1$ and $e_{k,r}^2$ denote the testing errors on $\mathbf{x}^{(k)}$ and $\mathbf{x}^{(k,r)}$, respectively. Then the BED rule can be formulated as

$$\sum_r e_{k,r}^2 < e_k^1 \tag{14}$$

It can be proven that if the BED rule holds for the $k_0$th subsignal, the testing error $e_{tot}$ of the model decreases after $\mathbf{x}^{(k_0)}$ is further decomposed. After $\mathbf{x}^{(k_0)}$ is decomposed, the total testing error becomes

$$\begin{aligned} e_{tot} &= \frac{1}{M} \Big( \sum_{k \neq k_0,\, i} |\hat{x}_i^{(k)} - x_i^{(k)}| + \sum_{r,\, i} |\hat{x}_i^{(k_0,r)} - x_i^{(k_0,r)}| \Big) \\ &= \sum_{k \neq k_0} e_k^1 + \sum_r e_{k_0,r}^2 < \sum_{k \neq k_0} e_k^1 + e_{k_0}^1 = \sum_k e_k^1 \end{aligned} \tag{15}$$

where $x_i^{(k)}$ and $x_i^{(k_0,r)}$ are the measurements of $\mathbf{x}^{(k)}$ and $\mathbf{x}^{(k_0,r)}$, respectively, and $\hat{x}_i^{(k)}$ and $\hat{x}_i^{(k_0,r)}$ are the forecasted values of $x_i^{(k)}$ and $x_i^{(k_0,r)}$, respectively.

Based on the BED rule, the wind speed decomposition process can be recursively executed with the following steps:

Step 1. The wind speed series is decomposed with VMD;

Step 2. Check each subsignal of the wind speed series. If it satisfies the BED rule, it is further decomposed with VMD;

Step 3. Repeat Step 2 until no subsignal satisfies the BED rule.

In this part, the mode number is set to 4 for the first decomposition level and 2 for other decomposition levels. Typically, the decomposition process stops within the third level. When a signal is decomposed with VMD, a residual component is always generated. Therefore, the permutation entropy is used as a signal randomness metric to filter the residual component.

5. Forecasting model

The historical wind speed series is always decomposed into subsignals in advance and then utilized in two different schemes. The first scheme is to separately predict the future measurements for each subsignal and integrate the forecasted measurements to form the future wind speed, as formulated in Eq. (2). The second scheme considers the intercorrelations of the subsignals and then directly simulates the
relation between the subsignals and the future wind speed measurements, as formulated in Eq. (3). In this paper, the two schemes are both implemented to explore the effectiveness of the proposed processes (i.e., the distribution drift correction process and the BED rule-based recursive decomposition process).

When implementing the first wind speed forecasting scheme, the correction factor $v_i$ is estimated using the distribution drift correction process. It is used to weight the loss contribution of each training sample for the forecasting model $\mathcal{F}_w^{(k)}$ (see Eq. (10)). The purpose is to counteract the distribution bias between the historical and future wind speed data. The historical wind speed series is then decomposed into subsignals after the BED rule-based recursive decomposition process. The future wind speed measurements are finally integrated from the future measurements of the subsignals. In this paper, the modified Reformer reported in [24] is employed as the forecasting module of the first proposed hybrid model. We refer to this implementation as the Reformer-based distribution drift-adaptive model (RDA in short) for narrative purposes.

To implement the second wind speed forecasting scheme, this study selects the Transformer model [37] as the forecasting module. The structure of the second proposed wind speed forecasting model is shown in Fig. 6(a).

Fig. 6. Implementation of the second wind speed forecasting scheme.

The original Transformer model adopts the encoder–decoder structure and cannot be directly applied to the wind speed forecasting task. The reason is that the word embedding module in the input layer is designed for receiving a sentence but not for a numerical series. Thus, the module should be modified. The positional encoding module should be retained because it can be used to encode the position information of the historical wind speed series. The output layer of the Transformer should also be adjusted because the softmax module transforms the output vector into a probability vector. The modified Transformer model is depicted in the bottom half of Fig. 6(a). It shows that the softmax module in the output layer is replaced by a fully connected neural network. The modified input layer contains two parallelly arranged input modules. To accommodate the modifications of the input layer, the self-attention mechanism used in the original multihead attention module of the lowermost encoder is adjusted to a mechanism termed the correlation-attention mechanism in the modified Transformer.

Fig. 6(a) shows that the historical wind speed series is decomposed into subsignals after the BED rule-based recursive decomposition process at first. The decomposition residue is filtered based on its permutation entropy to obtain the denoised wind speed series. Both the subsignals and denoised wind speed series are then received by the modified Transformer model to forecast future wind speed measurements.

Fig. 6(b) shows both the self-attention mechanism and the correlation-attention mechanism for illustration purposes. Here, $\mathbf{Q}$, $\mathbf{K}$, and $\mathbf{V}$ denote the query, key, and value matrices, respectively. In the self-attention mechanism, $\mathbf{Q}$, $\mathbf{K}$ and $\mathbf{V}$ are calculated from the same input. In the correlation-attention mechanism, $\mathbf{K}$ and $\mathbf{V}$ are calculated from the denoised historical wind speed series, while $\mathbf{Q}$ is calculated from the subsignals decomposed from the historical wind speed series. The correlation-attention mechanism can describe the inner-correlation between the fluctuation patterns and the wind speed data.

The second implementation is referred to as the Transformer-based distribution drift-adaptive model (TDA in short) in this paper.

6. Numerical results

In this section, two real wind speed datasets collected from typical areas of China are utilized. Comparison and ablation experiments are carried out to validate the superiority of the proposed models.

6.1. Wind speed datasets

The first dataset¹ (one-hour resolution) is obtained from an inland wind farm [38] in Hebei Province, and the second dataset² (15-min resolution) is acquired from a coastal wind farm in Hainan Province. Both datasets covered a one-year period. In the sequel, these datasets are referred to as the Inland data and the Coastal data, respectively. The wind speed measurements of the two datasets sampled in February, May, August, and November are shown in Fig. 7.

¹ This dataset is also used in Refs. [23,24].
² A one-hour-resolution version of this dataset is used in Ref. [24].
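Returning briefly to the correlation-attention mechanism of Section 5, the difference from self-attention can be illustrated with a single-head, scaled dot-product sketch in which the queries come from the subsignal input and the keys and values come from the denoised series input. This is only an illustration under stated assumptions, not the authors' implementation: the embedding shapes, the random projection matrices `w_q`, `w_k`, `w_v`, the segment length, and all function names are hypothetical, and the multihead structure is omitted.

```python
import numpy as np


def softmax(scores, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(q, k, v):
    """Return softmax(Q K^T / sqrt(d_k)) V and the attention weights."""
    d_k = q.shape[-1]
    weights = softmax(q @ k.T / np.sqrt(d_k))
    return weights @ v, weights


def self_attention(x, w_q, w_k, w_v):
    """Q, K and V are all computed from the same input."""
    return scaled_dot_product_attention(x @ w_q, x @ w_k, x @ w_v)


def correlation_attention(subsignal_emb, denoised_emb, w_q, w_k, w_v):
    """Q comes from the subsignal embedding; K and V come from the denoised series embedding."""
    q = subsignal_emb @ w_q
    k = denoised_emb @ w_k
    v = denoised_emb @ w_v
    return scaled_dot_product_attention(q, k, v)


# Hypothetical sizes: 10 time steps, 8-dimensional embeddings.
rng = np.random.default_rng(0)
seg_len, d_model = 10, 8
subsignal_emb = rng.normal(size=(seg_len, d_model))  # stand-in for embedded subsignals
denoised_emb = rng.normal(size=(seg_len, d_model))   # stand-in for embedded denoised series
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

out, attn = correlation_attention(subsignal_emb, denoised_emb, w_q, w_k, w_v)
print(out.shape, attn.shape)
```

In this sketch each row of `attn` weights the time steps of the denoised series for one query position derived from the subsignals, and feeding the same array to both inputs reduces `correlation_attention` to `self_attention`.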
Fig. 7. Wind speed measurements of the Inland data and Coastal data collected in February, May, August and November.

Fig. 8 presents some seasonal statistics of the Inland data and Coastal data. These datasets clearly demonstrate seasonal variation characteristics, and they reach the maximum average wind speeds and the kurtosis values during April and June. In each season, the Coastal data present a higher average wind speed than the Inland data. The Coastal data are also more volatile during most months of the year. The Inland data fluctuate the most during April and June, while the Coastal data fluctuate the most during January and March. The wind speed data in both the datasets fluctuate gently in July and September.

Fig. 8. Statistics of the wind speed datasets.

6.2. Assessment indices

Three error indices, including MAE, root mean square error (RMSE), and mean absolute percent error (MAPE), are selected to evaluate the performance of the forecasting models.

To quantify the improvement of the relative forecasting accuracy achieved by our models, improvement indices are defined based on the above three error indices, i.e.,

$$P_{MAE} = \frac{MAE_c - MAE_p}{MAE_c} \times 100\% \tag{16}$$

$$P_{RMSE} = \frac{RMSE_c - RMSE_p}{RMSE_c} \times 100\% \tag{17}$$

$$P_{MAPE} = \frac{MAPE_c - MAPE_p}{MAPE_c} \times 100\% \tag{18}$$

where $MAE_c$, $RMSE_c$ and $MAPE_c$ are error indices obtained from a comparative model, while $MAE_p$, $RMSE_p$ and $MAPE_p$ are the error indices obtained from the proposed models. The higher the improvement index value is, the better the improvement of the proposed models.

Two stability indices, i.e., the index of agreement (IA) and the variance of absolute error (VAE), are also employed to evaluate the forecasting stability of the models:

$$IA = 1 - \frac{\sum_{i}^{N} (x_i - \hat{x}_i)^2}{\sum_{i}^{N} (|\hat{x}_i - \bar{x}| + |x_i - \bar{x}|)^2} \tag{19}$$

$$VAE = Var(|x_i - \hat{x}_i|) \tag{20}$$

where $x_i$ is an actual value of the wind speed measurement, $\hat{x}_i$ denotes the predicted measurement, $\bar{x}$ represents the average wind speed, and $N$ is the observation size. The larger IA or the smaller VAE is, the better the forecasting stability of a model.

6.3. Comparison experiments

Comparison experiments are implemented to verify the performance of the proposed models. The single models LS-SVM and GRU, and the
hybrid models WEE, EAW, RWA, FVAD, and VMD-RF are selected as the comparative models.

LS-SVM adopts the Gaussian kernel and is executed with LS-SVMlab [39]. GRU is established with three hidden layers whose numbers of hidden neurons are 12, 14 and 16. The parameter settings of other comparative models are consistent with those in the corresponding references. For the proposed RDA model, the length of the historical wind speed segment is set to 10. Its Reformer module contains 4 encoders and 4 decoders, and the repeated hashing number and the bucket size are set to 8 and 2, respectively. During the training process of RDA, the dropout rate, learning rate, and number of epochs are set to 0.1, 0.0001, and 5000, respectively. For the proposed TDA model, the length of the historical wind speed segment is also set to 10. Its Transformer module consists of 6 encoders and 6 decoders, and the head number of the multihead mechanism is set to 16. During the training process of TDA, the dropout rate, learning rate, and number of epochs are set to 0.1, 0.002, and 10 000, respectively.

For both wind speed datasets, wind speeds measured in the final week of February, May, August, and November are utilized as the testing data. The wind speeds measured four weeks before the testing data are used as the corresponding training data.

In this section, the direct multistep-ahead forecasting scheme (formulated in Eq. (1)) is adopted and the multistep-ahead horizons are utilized to evaluate the performance of the proposed models and the comparative models.

Fig. 9 shows the average error indices of each model. Here, the average error index refers to the mean error index calculated on the above four testing parts of a dataset.

Fig. 9. Average error indices of the multistep-ahead forecasting.
Fig. 10. 1-step-ahead forecasting measurements for the Inland data.

³ The time domain length for a forecasting step in a one-step-ahead forecasting task is equal to the time-scale resolution of the wind speed dataset.

As shown in Fig. 9, the forecasting accuracy of each model decreases with a larger forecasting step³ size on both datasets. For each model,
Fig. 11. Average improvement indices achieved by TDA.
Fig. 12. Average IA and VAE values calculated based on the Coastal data.
the increments in testing error between different forecasting step sizes are smaller on the Coastal data. This is attributed to the finer time-scale resolution of the Coastal data. The hybrid models perform significantly better than the single models in each forecasting step. The proposed RDA and TDA models are the two best-performing models. In most of the scenarios, the TDA model performs slightly better than the RDA model.
Tables 2 and A.6 list the specific error indices of each model based on the Inland data and Coastal data, respectively. The bold values in the two tables represent the optimal error indices obtained in each forecasting step. The performance of all the models varies with season. In the 1-step-ahead forecasting task, the proposed TDA model outperforms other models in all cases. In the 2-step-ahead and 3-step-ahead forecasting tasks, the TDA model still performs the best under most circumstances. TDA and RDA are the top two performing models in multistep forecasting. This indicates the effectiveness of the distribution drift correction process and the BED rule-based recursive decomposition process.
For illustration purposes, the 1-step-ahead forecasting wind speed measurements for the Inland data and Coastal data are plotted in Figs. 10 and A.15, respectively. The forecasted measurements of the two proposed models are in good agreement with the actual wind speed data. Nevertheless, the differences between the actual and forecasted wind speed measurements of the single models are obvious.
Improvement indices are calculated based on the error indices obtained by TDA and the comparative models. The specific improvement indices are shown in Tables 3 and A.7. The error indices obtained by each comparative model are markedly improved by TDA on both the Inland data and Coastal data. For each comparative model, the improvement index values vary with season and dataset. All the improvement indices calculated for the single models are beyond 60%, and those for other comparative models range from approximately 5% to 75%. Taking the improvement index $P_{MAE}$ as an example, based on the Inland data, the minimum values of $P_{MAE}$ in different seasons range from 15.31% to 26.94%, 12.17% to 21.03%, and 6.59% to 24.45% for the 1-step-ahead, 2-step-ahead, and 3-step-ahead forecasting, respectively. Based on the Coastal data, the minimum values of $P_{MAE}$ in different seasons range from 14.22% to 33.82%, 11.07% to 29.08%, and 10.68% to 31.43% for the 1-step-ahead, 2-step-ahead, and 3-step-ahead forecasting, respectively. The $P_{MAE}$ calculated for a fixed comparative model does not fluctuate much with forecasting step.
The average improvement indices at each forecasting step are plotted in Fig. 11. The figure shows that the average improvement indices for most of the comparative models are beyond 20% (except VMD-RF). For single models, all the average improvement indices are high (greater than 60%). The average values of $P_{MAE}$ and $P_{RMSE}$ for LS-SVM, GRU, WEE, EAW, and RWA are also above 60%. However, they barely fluctuate with forecasting steps on both datasets. Even for VMD-RF, all the average improvement indices are beyond 15% and 5% in each forecasting step of the Coastal data and Inland data, respectively. All these results confirm the superiority and validity of the proposed TDA model.
To investigate the forecasting stability of the proposed models, we conduct comparison experiments between our models and the comparative models on the stability indices. Due to limited space, we only list the specific stability indices calculated based on the Coastal data in Table 4. Here, values in bold are the optimal stability indices. We also show the charts of the average stability index vs. forecasting step in Fig. 12.
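The error indices (MAE, RMSE, MAPE) and the improvement indices of Eqs. (16) and (17) used in the comparison above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names are hypothetical, the subscripts c and p are read as "comparative" and "proposed" (which matches how Table 3 is built from Table 2), and MAPE is assumed to be expressed in percent with nonzero actual values.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percent error, in percent (assumes no zero actual values)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def improvement_index(err_comparative, err_proposed):
    """Relative error reduction in percent, following the form of Eqs. (16)-(17)."""
    return (err_comparative - err_proposed) / err_comparative * 100.0


# Worked check with Table 2 (Inland data, 1-step-ahead, Feb):
# MAE of LS-SVM is 0.6041 and MAE of TDA is 0.1269.
p_mae = improvement_index(0.6041, 0.1269)
print(f"P_MAE = {p_mae:.2f}%")  # prints P_MAE = 78.99%
```

The printed value agrees with the LS-SVM entry for Feb in Table 3, which suggests the tables were produced with this definition.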
Table 2
Specific error indices obtained based on the Inland data.
Model Feb May Aug Nov
MAE RMSE MAPE MAE RMSE MAPE MAE RMSE MAPE MAE RMSE MAPE
1-step-ahead forecasting
LS-SVM 0.6041 0.7709 12.04% 0.7933 0.9300 13.31% 0.6625 0.8444 14.63% 0.7401 0.9120 13.24%
GRU 0.5915 0.7435 11.32% 0.7359 0.8946 11.27% 0.6097 0.7670 12.39% 0.7075 0.8862 13.02%
WEE 0.5082 0.5963 6.54% 0.5508 0.6304 6.43% 0.5448 0.6356 8.38% 0.4936 0.7243 7.73%
EAW 0.5266 0.6125 6.44% 0.6161 0.7553 6.34% 0.5811 0.7715 9.46% 0.5328 0.8256 8.66%
RWA 0.3820 0.4677 5.74% 0.4804 0.5310 5.86% 0.4460 0.5623 7.31% 0.3677 0.6109 6.89%
FVAD 0.1856 0.2291 4.40% 0.3662 0.4645 5.44% 0.2355 0.2959 6.86% 0.2783 0.3423 5.66%
VMD-RF 0.1737 0.2080 4.22% 0.1970 0.2199 3.39% 0.2136 0.2734 5.41% 0.2258 0.2650 5.49%
RDA 0.1540 0.1937 4.15% 0.1855 0.2032 3.23% 0.1922 0.2462 5.27% 0.2003 0.2434 5.37%
TDA 0.1269 0.1748 4.03% 0.1651 0.1865 3.16% 0.1809 0.2271 5.11% 0.1850 0.2215 5.08%
2-step-ahead forecasting
LS-SVM 1.1359 1.4120 19.36% 1.5234 1.7663 22.76% 1.1990 1.4979 26.74% 1.2447 1.6013 20.03%
GRU 1.0859 1.3640 18.77% 1.2588 1.5029 19.39% 1.0252 1.3467 18.74% 1.2286 1.5249 19.97%
WEE 0.8691 1.0017 10.50% 1.1154 1.2913 12.88% 0.9457 1.0941 12.93% 0.9143 1.3775 12.48%
EAW 0.9918 1.1175 11.87% 1.1814 1.3395 13.16% 0.9800 1.3304 13.99% 0.9643 1.4697 14.22%
RWA 0.6916 0.8300 9.89% 0.8079 0.9328 8.88% 0.8073 1.0009 12.00% 0.7049 1.1554 11.99%
FVAD 0.3123 0.4025 7.31% 0.6358 0.7997 8.40% 0.4517 0.5594 11.83% 0.4709 0.5746 8.88%
VMD-RF 0.3015 0.3579 6.77% 0.3650 0.4181 5.58% 0.3607 0.4587 8.49% 0.4091 0.4778 9.45%
RDA 0.2646 0.3411 6.52% 0.3358 0.3618 5.44% 0.3482 0.4441 8.30% 0.3572 0.4212 8.61%
TDA 0.2381 0.3113 6.35% 0.3167 0.3527 5.29% 0.3165 0.3867 8.20% 0.3593 0.4218 8.68%
3-step-ahead forecasting
LS-SVM 1.4625 1.7158 27.97% 1.7530 2.0453 25.88% 1.4779 1.8882 34.03% 1.6121 2.0340 27.55%
GRU 1.3261 1.7029 25.22% 1.6151 1.9694 23.63% 1.3275 1.7119 25.89% 1.5306 1.9034 26.50%
WEE 1.1331 1.3339 13.39% 1.3611 1.4553 14.75% 1.1788 1.3652 17.07% 1.0449 1.5554 16.58%
EAW 1.1462 1.3668 15.21% 1.3738 1.6894 15.89% 1.2303 1.5118 18.57% 1.1716 1.8735 18.63%
RWA 0.8265 1.0040 11.70% 1.0459 1.1847 12.19% 1.1031 1.3709 16.91% 0.8238 1.3993 13.93%
FVAD 0.4297 0.5091 9.77% 0.7925 0.9970 11.09% 0.6737 0.6778 13.86% 0.5548 0.6904 12.88%
VMD-RF 0.3930 0.4920 8.64% 0.4171 0.4721 8.05% 0.4296 0.5449 12.59% 0.5046 0.5865 12.50%
RDA 0.3452 0.4438 8.38% 0.3780 0.4348 6.62% 0.4249 0.5412 11.56% 0.4282 0.5037 11.32%
TDA 0.2969 0.4038 7.98% 0.3700 0.4273 6.38% 0.4013 0.4990 10.78% 0.4284 0.5039 11.16%
Table 3
Specific improvement indices achieved by TDA based on the Inland data.
Model Feb May Aug Nov
P_MAE P_RMSE P_MAPE P_MAE P_RMSE P_MAPE P_MAE P_RMSE P_MAPE P_MAE P_RMSE P_MAPE
1-step-ahead forecasting
LS-SVM 78.99% 77.33% 66.53% 79.19% 79.95% 76.26% 72.69% 73.11% 65.07% 75.00% 75.71% 61.63%
GRU 78.55% 76.49% 64.40% 77.56% 79.15% 71.96% 70.33% 70.39% 58.76% 73.85% 75.01% 60.98%
WEE 75.03% 70.69% 38.38% 70.03% 70.42% 50.84% 66.80% 64.27% 39.02% 62.52% 69.42% 34.28%
EAW 75.90% 71.46% 37.42% 73.20% 75.31% 50.15% 68.87% 70.56% 45.98% 65.28% 73.17% 41.34%
RWA 66.78% 62.63% 29.79% 65.63% 64.88% 46.08% 59.44% 59.61% 30.09% 49.69% 63.74% 26.27%
FVAD 31.63% 23.70% 8.41% 54.92% 59.85% 41.91% 23.18% 23.25% 25.51% 33.53% 35.29% 10.25%
VMD-RF 26.94% 15.96% 4.50% 16.19% 15.19% 6.78% 15.31% 16.93% 5.55% 18.07% 16.42% 7.47%
2-step-ahead forecasting
LS-SVM 79.04% 77.95% 67.17% 79.21% 80.03% 76.76% 73.60% 74.18% 69.34% 71.13% 73.66% 56.67%
GRU 78.07% 77.18% 66.14% 74.84% 76.53% 72.73% 69.13% 71.29% 56.28% 70.76% 72.34% 56.52%
WEE 72.60% 68.92% 39.50% 71.61% 72.69% 58.94% 66.53% 64.66% 36.60% 60.70% 69.38% 30.44%
EAW 75.99% 72.14% 46.46% 73.19% 73.67% 59.80% 67.70% 70.93% 41.38% 62.74% 71.30% 38.97%
RWA 65.57% 62.49% 35.77% 60.80% 62.19% 40.44% 60.80% 61.36% 31.67% 49.03% 63.49% 27.58%
FVAD 23.76% 22.66% 13.01% 50.19% 55.90% 37.01% 29.93% 30.87% 30.68% 23.69% 26.59% 2.22%
VMD-RF 21.03% 13.02% 6.17% 13.23% 15.64% 5.15% 12.25% 15.70% 3.39% 12.17% 11.72% 8.14%
3-step-ahead forecasting
LS-SVM 79.70% 76.47% 71.46% 78.89% 79.11% 75.33% 72.85% 73.57% 68.32% 73.43% 75.23% 59.49%
GRU 77.61% 76.29% 68.36% 77.09% 78.30% 72.97% 69.77% 70.85% 58.36% 72.01% 73.53% 57.88%
WEE 73.79% 69.73% 40.38% 72.82% 70.64% 56.70% 65.96% 63.45% 36.84% 59.00% 67.60% 32.69%
EAW 74.10% 70.46% 47.54% 73.07% 74.71% 59.81% 67.38% 66.99% 41.95% 63.43% 73.10% 40.09%
RWA 64.08% 59.78% 31.80% 64.62% 63.93% 47.63% 63.62% 63.60% 36.24% 47.99% 63.99% 19.87%
FVAD 30.91% 20.68% 18.32% 53.31% 57.14% 42.41% 40.43% 26.38% 22.20% 22.78% 27.01% 13.35%
VMD-RF 24.45% 17.92% 7.61% 11.29% 9.49% 20.69% 6.59% 8.42% 14.35% 15.10% 14.08% 10.74%
The IA and VAE values of a model vary from month to month at the same forecasting step size. For each model, the IA value drops and the VAE value becomes larger as the forecasting step size increases. The IA values of the single models are significantly lower than those of the hybrid models at each forecasting step, while the VAE values of the single models are larger than those of the hybrid models. This indicates that the hybrid models have better forecasting stability than the single models. Among the hybrid models, the proposed RDA and TDA models can achieve the optimal IA and VAE values at each forecasting step size, thus they show the best forecasting stability. The TDA model performs slightly better than the RDA model across almost all scenarios.
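For readers who want to reproduce the stability indices discussed above, the following sketch evaluates IA (Eq. (19)) and VAE (Eq. (20)) on a small made-up series. It is an illustration under stated assumptions: the average wind speed in Eq. (19) is taken as the mean of the actual measurements, Var is taken as the population variance, and the function names and sample values are hypothetical.

```python
def index_of_agreement(actual, predicted):
    """IA as in Eq. (19); the mean is taken over the actual measurements (assumption)."""
    mean_actual = sum(actual) / len(actual)
    numerator = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    denominator = sum(
        (abs(p - mean_actual) + abs(a - mean_actual)) ** 2
        for a, p in zip(actual, predicted)
    )
    return 1.0 - numerator / denominator


def variance_of_absolute_error(actual, predicted):
    """VAE as in Eq. (20), using the population variance (assumption)."""
    abs_err = [abs(a - p) for a, p in zip(actual, predicted)]
    mean_err = sum(abs_err) / len(abs_err)
    return sum((e - mean_err) ** 2 for e in abs_err) / len(abs_err)


# Made-up wind speeds (m/s), for illustration only.
actual = [5.0, 6.0, 7.0, 8.0]
predicted = [5.2, 5.9, 7.1, 7.7]

ia = index_of_agreement(actual, predicted)
vae = variance_of_absolute_error(actual, predicted)
print(f"IA = {ia:.4f}, VAE = {vae:.6f}")
```

A perfect forecast gives IA = 1 and VAE = 0, so a larger IA and a smaller VAE both point to a more stable model, as stated in Section 6.2.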
Fig. 13. Average error indices of the ablation experiment.
6.4. Ablation experiments

In this subsection, ablation experiments are designed to validate the effectiveness of the distribution drift correction process and the BED rule-based decomposition process integrated into the proposed RDA model and TDA model.
For narrative convenience, the nomenclature of the referred models is listed below.
• VMD-RF: A hybrid model combining VMD and Reformer [24].
• RDA-𝛼: The improved VMD-RF model, in which the distribution drift correction module is employed.
• RDA-𝛽: The improved VMD-RF model, in which the VMD module is replaced by the BED rule-based decomposition module.
• RDA: The improved VMD-RF model, in which the VMD module is replaced by the BED rule-based decomposition module and the distribution drift correction module is employed.
• VMD-MT: A hybrid model combining VMD and the modified Transformer.
• TDA-𝛼: The improved VMD-MT model, in which the distribution drift correction module is employed.
• TDA-𝛽: The improved VMD-MT model, in which the VMD module is replaced by the BED rule-based decomposition module.
• TDA: The improved VMD-MT model, in which the VMD module is replaced by the BED rule-based decomposition module and the distribution drift correction module is employed.
During the ablation experiments, the hyperparameters of the VMD module (used in RDA-𝛼, TDA-𝛼, and VMD-MT) are set to be consistent with those of VMD-RF. The testing and training data of these models are consistent with those in the comparison experiments.
The specific error indices at each forecasting step on the Inland data and Coastal data are listed in Tables 5 and A.8, respectively. The average error indices at each forecasting step are also shown in Fig. 13 for ease of illustration. Apparently, both the distribution drift correction process and the BED rule-based decomposition process can separately improve the performances of the proposed RDA and TDA models. The experimental results also indicate the positive synergy of the two processes to the performances of the proposed models. That
Fig. 14. Radar charts of the average improvement indices and stability indices.
is, the proposed models can benefit from utilizing the two processes together.
To quantify the improvements achieved by the distribution drift correction process and the BED rule-based decomposition process, the average improvement indices of RDA-𝛼, RDA-𝛽, and RDA over VMD-RF, and those of TDA-𝛼, TDA-𝛽, and TDA over VMD-MT are calculated. Meanwhile, the average stability indices of these models are also computed. Constrained by limited space, we merely present the average improvement indices and average stability indices calculated based on the Coastal data in Fig. 14 (in this figure, the average IA and 0.1 minus the average VAE are adopted as the average stability indices).
As displayed in Fig. 14, each kind of average improvement index caused by the distribution drift correction process is almost always slightly greater than that of the BED rule-based decomposition process for different forecasting step sizes. Moreover, the largest average improvement indices of each kind and the best stability indices in different forecasting step sizes all benefit from the synergy of the two proposed processes. All these results validate the effectiveness of the distribution drift correction process and the BED rule-based decomposition process.

6.5. Summary

The implications of the above comparison and ablation experiments are as follows:
• The distribution drift correction process can effectively mitigate the negative impact on the wind speed forecasting accuracy caused by the distribution deviation between the historical and future data.
• Compared to VMD, the BED rule-based decomposition process can obtain more appropriate fluctuation patterns for wind speed forecasting tasks.
• Both the TDA model and RDA model benefit from the synergy of the distribution drift correction process and BED rule-based decomposition process.
• The correlation-attention mechanism employed in the modified Transformer module contributes to the superior performance of TDA.
In contrast with the comparative models, the advantages of the proposed TDA model and RDA model are as follows:
• The proposed models achieve significantly higher forecasting accuracy in both the one-step ahead and multistep ahead wind speed forecasting scenarios. For example, all the TDA-based av-
Table 4
Stability indices calculated on the Coastal data.
Model Feb May Aug Nov
IA VAE IA VAE IA VAE IA VAE
1-step-ahead forecasting
LS-SVM 0.9827 0.0453 0.9778 0.0488 0.9806 0.0471 0.9757 0.0480
GRU 0.9840 0.0466 0.9782 0.0492 0.9769 0.0466 0.9756 0.0477
WEE 0.9889 0.0414 0.9879 0.0423 0.9865 0.0402 0.9833 0.0413
EAW 0.9873 0.0426 0.9863 0.0422 0.9868 0.0420 0.9841 0.0425
RWA 0.9905 0.0407 0.9881 0.0396 0.9899 0.0393 0.9882 0.0401
FVAD 0.9952 0.0382 0.9934 0.0371 0.9942 0.0386 0.9898 0.0393
VMD-RF 0.9971 0.0378 0.9967 0.0370 0.9960 0.0351 0.9920 0.0356
RDA 0.9986 0.0315 0.9989 0.0307 0.9987 0.0311 0.9974 0.0308
TDA 0.9990 0.0259 0.9994 0.0228 0.9993 0.0240 0.9988 0.0251
2-step-ahead forecasting
LS-SVM 0.9765 0.0753 0.9707 0.0800 0.9771 0.0776 0.9738 0.1052
GRU 0.9712 0.0746 0.9749 0.0797 0.9762 0.0772 0.9733 0.1048
WEE 0.9831 0.0630 0.9859 0.0636 0.9852 0.0605 0.9822 0.0633
EAW 0.9829 0.0645 0.9834 0.0641 0.9840 0.0634 0.9834 0.0635
RWA 0.9863 0.0624 0.9872 0.0621 0.9880 0.0570 0.9816 0.0564
FVAD 0.9918 0.0619 0.9929 0.0611 0.9923 0.0546 0.9828 0.0557
VMD-RF 0.9961 0.0572 0.9960 0.0562 0.9936 0.0500 0.9870 0.0568
RDA 0.9977 0.0496 0.9976 0.0489 0.9976 0.0477 0.9942 0.0469
TDA 0.9981 0.0437 0.9979 0.0429 0.9986 0.0364 0.9942 0.0445
3-step-ahead forecasting
LS-SVM 0.9632 0.1010 0.9691 0.1069 0.9793 0.1033 0.9730 0.1173
GRU 0.9625 0.0981 0.9715 0.1098 0.9758 0.1047 0.9729 0.1191
WEE 0.9762 0.0958 0.9858 0.0949 0.9860 0.0947 0.9807 0.0960
EAW 0.9781 0.0901 0.9845 0.0944 0.9818 0.0959 0.9823 0.0963
RWA 0.9817 0.0892 0.9870 0.0930 0.9865 0.0837 0.9799 0.0845
FVAD 0.9884 0.0810 0.9921 0.0805 0.9914 0.0804 0.9828 0.0809
VMD-RF 0.9950 0.0727 0.9936 0.0770 0.9932 0.0758 0.9854 0.0767
RDA 0.9968 0.0640 0.9964 0.0671 0.9961 0.0636 0.9914 0.0633
TDA 0.9973 0.0595 0.9972 0.0542 0.9975 0.0549 0.9932 0.0634
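As a worked check of how the average stability indices relate to the per-month values in Table 4, the sketch below averages the four monthly IA and VAE values of TDA at each forecasting step. The numbers are copied from the TDA rows of the table; the grouping of rows into 1-, 2-, and 3-step-ahead blocks and the variable names are assumptions made for illustration.

```python
from statistics import fmean

# TDA rows of Table 4 (Coastal data): four monthly values per forecasting step.
tda_ia = {
    1: [0.9990, 0.9994, 0.9993, 0.9988],
    2: [0.9981, 0.9979, 0.9986, 0.9942],
    3: [0.9973, 0.9972, 0.9975, 0.9932],
}
tda_vae = {
    1: [0.0259, 0.0228, 0.0240, 0.0251],
    2: [0.0437, 0.0429, 0.0364, 0.0445],
    3: [0.0595, 0.0542, 0.0549, 0.0634],
}

# Average each index over the four testing months.
average_ia = {step: fmean(values) for step, values in tda_ia.items()}
average_vae = {step: fmean(values) for step, values in tda_vae.items()}

for step in (1, 2, 3):
    print(f"{step}-step-ahead: average IA = {average_ia[step]:.6f}, "
          f"average VAE = {average_vae[step]:.6f}")
```

The 2-step and 3-step means come out at 0.9972 and 0.9963, and the 1-step mean at 0.999125, which is consistent with the average IA values of 0.9991, 0.9972, and 0.9963 quoted for TDA in Section 6.5.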
Table 5
Specific error indices of the ablation experiment based on the Inland data.
Model Feb May Aug Nov
MAE RMSE MAPE MAE RMSE MAPE MAE RMSE MAPE MAE RMSE MAPE
1-step-ahead forecasting
VMD-RF 0.1737 0.2080 4.22% 0.1970 0.2199 3.39% 0.2136 0.2734 5.41% 0.2258 0.2650 5.49%
RDA-𝛼 0.1687 0.2039 4.18% 0.1943 0.2136 3.34% 0.2025 0.2612 5.34% 0.2116 0.2629 5.40%
RDA-𝛽 0.1703 0.2006 4.21% 0.1947 0.2104 3.32% 0.1962 0.2607 5.37% 0.2131 0.2597 5.44%
RDA 0.1540 0.1937 4.15% 0.1855 0.2032 3.23% 0.1922 0.2462 5.27% 0.2003 0.2434 5.37%
VMD-MT 0.1586 0.2001 4.19% 0.1882 0.2046 3.30% 0.1944 0.2575 5.28% 0.2124 0.2451 5.42%
TDA-𝛼 0.1445 0.1939 4.12% 0.1774 0.1889 3.27% 0.1918 0.2355 5.16% 0.1887 0.2276 5.20%
TDA-𝛽 0.1520 0.1766 4.16% 0.1808 0.1957 3.25% 0.1919 0.2347 5.23% 0.1899 0.2308 5.24%
TDA 0.1269 0.1748 4.03% 0.1651 0.1865 3.16% 0.1809 0.2271 5.11% 0.1850 0.2215 5.08%
2-step-ahead forecasting
VMD-RF 0.3015 0.3579 6.77% 0.3650 0.4181 5.58% 0.3607 0.4587 8.49% 0.4091 0.4778 9.45%
RDA-𝛼 0.2668 0.3509 6.60% 0.3386 0.3707 5.49% 0.3520 0.4479 8.41% 0.3608 0.4265 8.78%
RDA-𝛽 0.2727 0.3507 6.59% 0.3411 0.3720 5.53% 0.3522 0.4492 8.46% 0.3610 0.4293 8.90%
RDA 0.2646 0.3411 6.52% 0.3358 0.3618 5.44% 0.3482 0.4441 8.30% 0.3572 0.4142 8.61%
VMD-MT 0.2689 0.3438 6.62% 0.3423 0.3727 5.47% 0.3441 0.4429 8.39% 0.3746 0.4465 8.99%
TDA-𝛼 0.2416 0.3303 6.51% 0.3249 0.3603 5.37% 0.3199 0.3995 8.33% 0.3604 0.4279 8.80%
TDA-𝛽 0.2425 0.3257 6.43% 0.3254 0.3619 5.39% 0.3215 0.4007 8.28% 0.3606 0.4287 8.95%
TDA 0.2381 0.3113 6.35% 0.3167 0.3527 5.29% 0.3165 0.3867 8.20% 0.3593 0.4218 8.68%
3-step-ahead forecasting
VMD-RF 0.3930 0.4920 8.64% 0.4171 0.4721 8.05% 0.4296 0.5449 12.59% 0.5046 0.5865 12.50%
RDA-𝛼 0.3661 0.4650 8.51% 0.3824 0.4403 7.08% 0.4276 0.5440 12.20% 0.4295 0.5120 11.74%
RDA-𝛽 0.3665 0.4635 8.47% 0.3901 0.4600 6.91% 0.4255 0.5426 12.11% 0.4301 0.5151 11.98%
RDA 0.3452 0.4438 8.38% 0.3780 0.4348 6.62% 0.4249 0.5412 11.56% 0.4282 0.5037 11.32%
VMD-MT 0.3492 0.4455 8.44% 0.3832 0.4471 6.95% 0.4216 0.5329 11.59% 0.4406 0.5165 11.90%
TDA-𝛼 0.3063 0.4321 8.14% 0.3726 0.4315 6.50% 0.4149 0.5152 11.02% 0.4297 0.5031 11.22%
TDA-𝛽 0.3077 0.4317 8.28% 0.3737 0.4317 6.66% 0.4205 0.5107 11.10% 0.4299 0.5024 11.41%
TDA 0.2969 0.4038 7.98% 0.3700 0.4273 6.38% 0.4013 0.4990 10.78% 0.4284 0.5000 11.16%
erage improvement indices are greater than 60% and 15% for the single comparative models and hybrid comparative models respectively on the Coastal data.
• The proposed models show markedly better robustness to the distribution drift of the wind speed. For example, the TDA model achieves average IA values of 0.9991, 0.9972, and 0.9963 in the
Fig. A.15. 1-step-ahead forecasting measurements for the Coastal data.
one-step ahead, two-step ahead, and three-step ahead forecasting tasks for the Coastal data respectively.
• The optimal mode number of the decomposition module is adaptively set in the proposed models, which effectively avoids the low forecasting accuracy caused by the underdecomposition or overdecomposition of the wind speed.
The limitations of our models are as follows:
• The distribution drift correction process is efficient on the wind speed series with gradual distribution drift, but it does not apply to the case when the distribution of the wind speed series drifts abruptly.
• The proposed models do not take other meteorological factors (e.g., temperature, air pressure, humidity) into consideration.

7. Conclusions

Machine learning or deep learning-based models are usually adopted as the prediction module in a hybrid wind speed forecasting model. Thus, the forecasting accuracy of this prediction module cannot be guaranteed if the distribution of the wind speed drifts over time. Due to its volatile and stochastic nature, the distribution of wind speed inevitably evolves. Therefore, the forecasting model must adapt to the wind speed distribution drift. In this paper, we propose two distribution drift-adaptive hybrid models for short-term wind speed forecasting. The proposed models combine a distribution drift correction module, a BED rule-based decomposition module and variants of the Transformer to fulfill the wind speed forecasting task. The outperformance of the two proposed models is attributed to three aspects. First, the distribution drift correction process effectively corrects the distribution deviation between historical and future wind speed data, thus allowing the proposed models to adapt to the wind speed distribution drift. Second, the BED rule-based decomposition process effectively extracts the fluctuation patterns of the wind speed, which greatly improves the performance of the proposed models. Third, the Transformer variants, used as the forecasting module in the proposed models, can accurately learn the functional relationship between historical wind speed subsignals and future wind speed measurements.
The proposed models could be further refined in future research. First, the TDA and RDA are designed to be adaptive to a gradual distribution drift of the wind speed, and the adaptability to an abrupt distribution drift should be considered in the improved models. Second, only historical wind speed is utilized in our models, and other meteorological factors, such as temperature, air pressure, and humidity, could be incorporated. Finally, the effect of the data scale on the multistep ahead forecasting accuracy could also be investigated.

CRediT authorship contribution statement

Xuguang Wang: Conceptualization, Methodology, Data curation, Writing – original draft. Xiao Li: Data curation, Software. Jie Su: Visualization, Investigation, Validation, Writing – review & editing, Formal analysis.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

Acknowledgments

This work was fully supported by the National Natural Science Foundation of China (No. 62076093) and the Key R&D program of science and technology foundation of Hebei Province, China (No. 19210310D).

Appendix. Additional experimental results

See Tables A.6–A.8 and Fig. A.15.
Table A.6
Specific error indices obtained based on the Coastal data.
Model Feb May Aug Nov
MAE RMSE MAPE MAE RMSE MAPE MAE RMSE MAPE MAE RMSE MAPE
1-step-ahead forecasting
LS-SVM 0.6163 0.6515 14.20% 0.6960 0.8503 14.22% 0.6711 0.8405 14.23% 0.7550 0.8818 14.14%
GRU 0.5854 0.6233 14.12% 0.6654 0.8035 14.05% 0.6346 0.8100 13.21% 0.7057 0.8235 13.30%
WEE 0.5037 0.5770 6.51% 0.5013 0.6330 7.12% 0.5454 0.6262 9.56% 0.4832 0.7062 8.31%
EAW 0.5622 0.6459 7.37% 0.5746 0.7471 8.16% 0.5625 0.6408 10.32% 0.5325 0.7909 9.00%
RWA 0.4365 0.5166 5.87% 0.4498 0.5362 6.70% 0.4578 0.4615 9.12% 0.3678 0.5610 7.23%
FVAD 0.2918 0.3037 5.10% 0.2655 0.3598 5.11% 0.2162 0.2477 5.88% 0.2375 0.3186 5.78%
VMD-RF 0.2738 0.2678 4.35% 0.2197 0.2464 3.46% 0.1877 0.2109 4.73% 0.2089 0.2434 4.74%
RDA 0.2326 0.2432 4.24% 0.1785 0.2066 3.09% 0.1653 0.1918 4.38% 0.1858 0.2201 4.45%
TDA 0.1967 0.2219 4.01% 0.1454 0.1757 2.06% 0.1610 0.1843 4.37% 0.1764 0.2026 4.27%
2-step-ahead forecasting
LS-SVM 0.7810 0.8353 16.53% 0.9481 1.0900 16.55% 0.8179 1.0776 16.57% 0.9201 1.0305 17.46%
GRU 0.7363 0.8230 16.76% 0.8770 1.0086 16.58% 0.7983 1.0182 15.69% 0.8876 1.0301 16.79%
WEE 0.6762 0.7512 8.09% 0.6734 0.8787 8.85% 0.7322 0.8694 11.88% 0.6489 0.9807 10.33%
EAW 0.6904 0.7787 9.31% 0.7538 0.9705 10.31% 0.7399 0.8801 13.03% 0.6985 1.0078 11.67%
RWA 0.6065 0.7131 7.78% 0.6250 0.7401 8.72% 0.6461 0.6368 11.10% 0.5110 0.7743 9.60%
FVAD 0.3571 0.3720 6.16% 0.3249 0.4418 6.17% 0.2645 0.3134 7.18% 0.2906 0.3903 6.98%
VMD-RF 0.3506 0.3526 5.74% 0.2885 0.3244 4.58% 0.2466 0.2777 6.27% 0.2744 0.3214 6.25%
RDA 0.2947 0.3185 5.23% 0.2262 0.2568 3.81% 0.2195 0.2383 5.41% 0.2355 0.2735 5.49%
TDA 0.2786 0.2922 5.11% 0.2046 0.2521 3.13% 0.2193 0.2389 6.00% 0.2332 0.2708 5.36%
3-step-ahead forecasting
LS-SVM 0.9189 1.0024 21.09% 1.0380 1.3085 20.12% 1.0006 1.2934 20.14% 1.1258 1.3570 21.00%
GRU 0.8977 0.9830 20.56% 0.9861 1.1897 20.46% 0.9406 1.1093 19.23% 1.0471 1.2193 19.37%
WEE 0.7304 0.8545 11.13% 0.7270 0.9373 9.98% 0.7909 0.9271 13.42% 0.8209 1.0459 13.65%
EAW 0.7769 0.9029 12.18% 0.8335 0.9754 11.50% 0.8232 1.0019 17.06% 0.9022 1.1054 14.88%
RWA 0.6698 0.8161 8.48% 0.6902 0.8472 9.77% 0.7026 0.7291 13.17% 0.5645 0.8862 10.44%
FVAD 0.3986 0.4144 8.07% 0.3626 0.4909 8.51% 0.3153 0.3580 9.78% 0.3244 0.4347 9.61%
VMD-RF 0.3492 0.4088 7.08% 0.3363 0.3761 5.63% 0.2876 0.3419 7.80% 0.3198 0.3715 7.72%
RDA 0.3195 0.3471 6.39% 0.2614 0.2949 4.66% 0.2420 0.2737 6.79% 0.2721 0.3141 6.91%
TDA 0.3119 0.3454 6.15% 0.2306 0.2814 4.10% 0.2420 0.2752 6.77% 0.2707 0.3145 6.85%
Table A.7
Specific improvement indices achieved by TDA based on the Coastal data.
Model Feb May Aug Nov
P_MAE P_RMSE P_MAPE P_MAE P_RMSE P_MAPE P_MAE P_RMSE P_MAPE P_MAE P_RMSE P_MAPE
1-step-ahead forecasting
LS-SVM 68.08% 65.94% 71.76% 79.11% 79.34% 85.51% 76.01% 78.07% 69.29% 76.64% 77.02% 69.80%
GRU 66.40% 64.40% 71.60% 78.15% 78.13% 85.34% 74.63% 77.25% 66.92% 75.00% 75.40% 67.90%
WEE 60.95% 61.54% 38.40% 70.99% 72.24% 71.07% 70.48% 70.57% 54.29% 63.49% 71.31% 48.62%
EAW 65.01% 65.64% 45.59% 74.70% 76.48% 74.75% 71.38% 71.24% 57.66% 66.87% 74.38% 52.56%
RWA 54.94% 57.05% 31.69% 67.67% 67.23% 69.25% 64.83% 60.07% 52.08% 52.04% 63.89% 40.94%
FVAD 32.59% 26.93% 21.37% 45.24% 51.17% 59.69% 25.53% 25.60% 25.68% 25.73% 36.41% 26.13%
VMD-RF 28.16% 17.14% 7.82% 33.82% 28.69% 40.46% 14.22% 12.61% 7.61% 15.56% 16.76% 9.92%
2-step-ahead forecasting
LS-SVM 64.33% 65.02% 69.09% 78.42% 76.87% 81.09% 73.19% 77.83% 63.79% 74.65% 73.72% 69.30%
GRU 62.16% 64.50% 69.51% 76.67% 75.01% 81.12% 72.53% 76.54% 61.76% 73.73% 73.71% 68.08%
WEE 58.80% 61.10% 36.84% 69.62% 71.30% 64.63% 70.05% 72.52% 49.50% 64.06% 72.39% 48.11%
EAW 59.65% 62.48% 45.11% 72.86% 74.02% 69.64% 70.36% 72.86% 53.95% 66.61% 73.13% 54.07%
RWA 54.06% 59.02% 34.32% 67.26% 65.94% 64.11% 66.05% 62.48% 45.95% 54.36% 65.03% 44.17%
FVAD 21.98% 21.45% 17.05% 37.03% 42.94% 49.27% 17.09% 23.77% 16.43% 19.75% 30.62% 23.21%
VMD-RF 20.54% 17.13% 10.98% 29.08% 22.28% 31.66% 11.07% 13.97% 4.31% 15.01% 15.74% 14.24%
3-step-ahead forecasting
LS-SVM 66.06% 65.54% 70.84% 77.78% 78.49% 79.62% 75.81% 78.72% 66.39% 75.95% 76.82% 67.38%
GRU 65.26% 64.86% 70.09% 76.61% 76.35% 79.96% 74.27% 75.19% 64.79% 74.15% 74.21% 64.64%
WEE 57.30% 59.58% 44.74% 68.28% 69.98% 58.92% 69.40% 70.32% 49.55% 67.02% 69.93% 49.82%
EAW 59.85% 61.75% 49.51% 72.33% 71.15% 64.35% 70.60% 72.53% 60.32% 69.99% 71.55% 53.97%
RWA 53.43% 57.68% 27.48% 66.59% 66.78% 58.03% 65.56% 62.25% 48.60% 52.05% 64.51% 34.39%
FVAD 21.75% 16.65% 23.79% 36.40% 42.67% 51.82% 23.25% 23.13% 30.78% 16.55% 27.65% 28.72%
VMD-RF 10.68% 15.51% 13.14% 31.43% 25.18% 27.18% 15.86% 19.51% 13.21% 15.35% 15.34% 11.27%
Table A.8
Specific error indices of the ablation experiment based on the Coastal data.
Model Feb May Aug Nov
MAE RMSE MAPE MAE RMSE MAPE MAE RMSE MAPE MAE RMSE MAPE
1-step-ahead forecasting
VMD-RF 0.2738 0.2678 4.35% 0.2197 0.2464 3.46% 0.1877 0.2109 4.73% 0.2089 0.2434 4.74%
RDA-𝛼 0.2460 0.2562 4.30% 0.1821 0.2230 3.15% 0.1723 0.2020 4.50% 0.2003 0.2308 4.53%
RDA-𝛽 0.2502 0.2567 4.31% 0.1836 0.2232 3.22% 0.1786 0.2026 4.62% 0.1964 0.2320 4.58%
RDA 0.2326 0.2432 4.24% 0.1785 0.2066 3.09% 0.1653 0.1918 4.38% 0.1858 0.2201 4.45%
VMD-MT 0.2353 0.2480 4.31% 0.1681 0.1992 3.25% 0.1708 0.2023 4.69% 0.1955 0.2315 4.50%
TDA-𝛼 0.2039 0.2325 4.20% 0.1521 0.1867 3.04% 0.1660 0.1905 4.55% 0.1877 0.2114 4.39%
TDA-𝛽 0.2123 0.2288 4.19% 0.1517 0.1870 2.87% 0.1667 0.1911 4.62% 0.1858 0.2205 4.84%
TDA 0.1967 0.2219 4.01% 0.1454 0.1757 2.06% 0.1610 0.1843 4.37% 0.1764 0.2026 4.27%
2-step-ahead forecasting
VMD-RF 0.3506 0.3526 5.74% 0.2885 0.3244 4.58% 0.2466 0.2777 6.27% 0.2744 0.3214 6.25%
RDA-𝛼 0.3135 0.3232 5.41% 0.2435 0.2603 4.22% 0.2210 0.2501 5.95% 0.2399 0.2890 5.75%
RDA-𝛽 0.3163 0.3234 5.47% 0.2452 0.2621 4.24% 0.2211 0.2505 6.23% 0.2420 0.2965 5.77%
RDA 0.2947 0.3185 5.23% 0.2262 0.2568 3.81% 0.2195 0.2383 5.41% 0.2355 0.2735 5.49%
VMD-MT 0.3230 0.3310 5.44% 0.2375 0.2736 4.07% 0.2438 0.2782 6.23% 0.2739 0.3164 6.11%
TDA-𝛼 0.2921 0.3118 5.29% 0.2089 0.2555 3.65% 0.2208 0.2504 5.98% 0.2361 0.2826 5.50%
TDA-𝛽 0.2880 0.3121 5.32% 0.2236 0.2560 4.02% 0.2207 0.2506 6.21% 0.2424 0.2828 5.56%
TDA 0.2786 0.2922 5.11% 0.2046 0.2521 3.13% 0.2193 0.2389 5.50% 0.2332 0.2708 5.36%
3-step-ahead forecasting
VMD-RF 0.3492 0.4088 7.08% 0.3363 0.3761 5.63% 0.2876 0.3419 7.80% 0.3198 0.3715 7.72%
RDA-𝛼 0.3298 0.3598 6.84% 0.2901 0.3010 4.94% 0.2625 0.2819 7.28% 0.2793 0.3169 7.30%
RDA-𝛽 0.3416 0.3606 6.93% 0.2889 0.2996 5.13% 0.2628 0.2914 7.29% 0.2904 0.3186 7.37%
RDA 0.3195 0.3471 6.39% 0.2614 0.2949 4.66% 0.2420 0.2737 6.79% 0.2721 0.3141 6.91%
VMD-MT 0.3432 0.3947 6.84% 0.2879 0.3325 4.80% 0.2881 0.3428 7.79% 0.3054 0.3716 7.65%
TDA-𝛼 0.3270 0.3585 6.45% 0.2424 0.2999 4.57% 0.2626 0.2847 7.26% 0.2767 0.3189 7.19%
TDA-𝛽 0.3403 0.3573 6.56% 0.2492 0.2905 4.68% 0.2625 0.2918 7.27% 0.2855 0.3170 7.25%
TDA 0.3119 0.3454 6.15% 0.2306 0.2814 4.10% 0.2420 0.2752 6.77% 0.2707 0.3143 6.85%
References

[1] Emeksiz C, Tan M. Multi-step wind speed forecasting and Hurst analysis using novel hybrid secondary decomposition approach. Energy 2022;238:121764.
[2] Dhiman HS, Deb D. A review of wind speed and wind power forecasting techniques. 2020, arXiv preprint arXiv:2009.02279.
[3] Tawn R, Browell J. A review of very short-term wind and solar power forecasting. Renew Sustain Energy Rev 2022;153:111758.
[4] Neshat M, Nezhad MM, Mirjalili S, et al. Quaternion convolutional long short-term memory neural model with an adaptive decomposition method for wind speed forecasting: North Aegean islands case studies. Energy Convers Manage 2022;259:115590.
[5] Chen X, Yu R, Ullah S, et al. A novel loss function of deep learning in wind speed forecasting. Energy 2022;238:121808.
[6] Yang W, Tian Z, Hao Y. A novel ensemble model based on artificial intelligence and mixed-frequency techniques for wind speed forecasting. Energy Convers Manage 2022;252:115086.
[7] Wu H, Meng K, Fan D, et al. Multistep short-term wind speed forecasting using transformer. Energy 2022;261:125231.
[8] Li D, Jiang F, Chen M, et al. Multi-step-ahead wind speed forecasting based on a hybrid decomposition method and temporal convolutional networks. Energy 2022;238:121981.
[9] Hinton GE, Osindero S, Teh YW. A fast learning algorithm for deep belief nets. Neural Comput 2006;18(7):1527–54.
[10] Kavasseri RG, Seetharaman K. Day-ahead wind speed forecasting using f-ARIMA models. Renew Energy 2009;34(5):1388–93.
[11] Suykens JAK, Vandewalle J. Least squares support vector machine classifiers. Neural Process Lett 1999;9(3):293–300.
[12] Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput 1997;9(8):1735–80.
[13] Chung J, Gulcehre C, Cho K, Bengio Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. 2014, arXiv preprint arXiv:1412.3555.
[14] Qu Z, Zhang K, Mao W, et al. Research and application of ensemble forecasting based on a novel multiobjective optimization algorithm for wind-speed forecasting. Energy Convers Manage 2017;154:440–54.
[18] Mallat S. A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans Pattern Anal 1989;11:674–93.
[19] Hua L, Zhang C, Peng T, et al. Integrated framework of extreme learning machine (ELM) based on improved atom search optimization for short-term wind speed prediction. Energy Convers Manage 2022;252:115102.
[20] Santhosh M, Venkaiah C, Vinod Kumar DM. Ensemble empirical mode decomposition based adaptive wavelet neural network method for wind speed prediction. Energy Convers Manage 2018;168:482–93.
[21] Liu H, Mi WX, Li YF. An experimental investigation of three new hybrid wind speed forecasting models using multi-decomposing strategy and ELM algorithm. Renew Energy 2018;123:694–705.
[22] Singh SN, Mohapatra A. Repeated wavelet transform based ARIMA model for very short-term wind speed forecasting. Renew Energy 2019;136:758–68.
[23] Zhang JL, Wei YM, Tan ZF. An adaptive hybrid model for short term wind speed forecasting. Energy 2020;190:115615.
[24] Wang X, Ren H, Zhai J, et al. Adaptive support segment based short-term wind speed forecasting. Energy 2022;249:123644.
[25] Webb GI, Hyde R, Cao H, et al. Characterizing concept drift. Data Min Knowl Disc 2016;30(4):964–94.
[26] Ren S, Liao B, Zhu W, et al. Knowledge-maximized ensemble algorithm for different types of concept drift. Inform Sci 2018;430:261–81.
[27] Zhao P, Cai LW, Zhou ZH. Handling concept drift via model reuse. Mach Learn 2020;109(3):533–68.
[28] Alippi C, Roveri M. Just-in-time adaptive classifiers – part II: Designing the classifier. IEEE Trans Neural Netw 2008;19(12):2053–64.
[29] Minku LL, White AP, Yao X. The impact of diversity on online ensemble learning in the presence of concept drift. IEEE Trans Knowl Data Eng 2009;22(5):730–42.
[30] Khamassi I, Sayed-Mouchaweh M, Hammami M, et al. Discussion and review on evolving data streams and concept drift adapting. Evol Syst 2018;9(1):1–23.
[31] Li W, Yang X, Liu W, et al. DDG-DA: Data distribution generation for predictable concept drift adaptation. 2022, arXiv preprint arXiv:2201.04038.
[32] Ngiam J, Chen Z, Chia D, et al. Tiled convolutional neural networks. In: NIPS. 2010.
[33] Wang Z, Oates T. Encoding time series as images for visual inspection and classification using tiled convolutional neural networks. In: Workshops at the twenty-ninth AAAI conference on artificial intelligence. 2015.
[34] Wang Z, Oates T. Imaging time-series to improve classification and imputation. In: Twenty-fourth international joint conference on artificial intelligence. 2015.
[15] Li H, Wang J, Lu H, et al. Research and application of a combined model [35] Zhang C, Bengio S, Hardt M, et al. Understanding deep learning requires
based on variable weight for short term wind speed forecasting. Renew Energy rethinking generalization. 2017, arXiv preprint arXiv:1611.03530.
2018;116:669–84. [36] Azizzadenesheli K, Liu A, Yang F, et al. Regularized learning for domain
[16] Dragomiretskiy K, Zosso D. Variational mode decomposition. IEEE Trans Signal adaptation under label shifts. 2019, arXiv preprint arXiv:1903.09734.
Proces 2014;62(3):531–43. [37] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. In: NIPS, Vol.
[17] Huang NE, Shen Z, Long SR, et al. The empirical mode decomposition and 30. 2017, p. 5998–6008.
the Hilbert spectrum for nonlinear and non-stationary time series analysis. [38] https://ars.els-cdn.com/content/image/1-s2.0-S0360544219312642-mmc1.xlsx.
roceedings of the royal society of London. Series a: mathematical. Phys Eng [39] http://www.esat.kuleuven.be/sista/lssvmlab/.
Sci 1998;454(1971):903–95.