Download as pdf or txt
Download as pdf or txt
You are on page 1of 18

Temporal fusion transformer using variational

mode decomposition for wind power


forecasting
Meiyu Jiang1 , Xuetao Jiang1 , and Qingguo Zhou*2
1 School of Information Science and Engineering, Lanzhou University, Lanzhou, 730000, China.
Email: jiangmy21@lzu.edu.cn Email: jiangxt21@lzu.edu.cn
2 School of Information Science and Engineering, Lanzhou University, Lanzhou, 730000, China.
Email: zhouqg@lzu.edu.cn
arXiv:2302.01222v2 [cs.LG] 5 Feb 2023

Abstract
The power output of a wind turbine depends on a variety of factors, including wind speed at different heights,
wind direction, temperature and turbine properties. Wind speed and direction, in particular, have complex
cycles and fluctuate dramatically, leading to large uncertainties in wind power output. This study uses vari-
ational mode decomposition (VMD) to decompose the wind power series and Temporal fusion transformer
(TFT) to forecast wind power for the next 1h, 3h and 6h. The experimental results show that VMD outperforms
other decomposition algorithms and the TFT model outperforms other decomposition models.

Keywords: Wind power forecasting, Deep Learning, Transformer.

1 Introduction
Energy is an important basis for economic development. With the development of economy and
technology, global energy demand is growing rapidly. Traditional energy sources such as coal,
crude oil and natural gas are burned to generate electricity and emit more than 75% of the world’s
greenhouse gases, with approximately 90% of the world’s carbon dioxide emissions. This is the
main cause of global climate change. The reserves of traditional energy sources are limited and
if overexploited will lead to energy depletion [1]. Due to the non-renewable nature of traditional
energy sources, global energy shortages are becoming more and more severe, and the prices of
traditional energy sources are thus maintained at relatively high levels and are gradually increasing.
In order to alleviate the energy crisis, more and more countries are supporting the use of renewable
energy as an alternative to traditional energy sources. Renewable energy includes wind, solar, tidal
and biomass energy, which has the advantages of being widely distributed, recyclable and less
polluting to the environment.With the rapid development of wind power technology, the conversion
of wind energy into electricity is becoming more efficient and less costly. Wind power has an
important role to play in the growth of global electricity generation and has received widespread
attention in countries around the world.
According to the Global Wind Report 2022, the total installed capacity of wind power installations
worldwide will be 837 GW in 2021, an increase of 12% compared to 2020. Newly installed wind
installations increased by 93.6 GW, slightly more than the 93 GW increase in 2020. China has led
the global wind energy industry since 2010, with new installed wind power capacity increasing
Pre-printed in Arxiv (2023)
significantly over the last 11 years and remaining at the top of the world. According to the National
DOI: 10.1017/pan.xxxx.xx Energy Administration (NEA), the total installed capacity of wind power in China is 52GW in 2021,
Corresponding author
with 47.5GW of new grid-connected capacity.
Qingguo Zhou The principle of wind power generation is to convert wind energy into mechanical energy and
© The Author(s) 2023. Published
then into electricity. Because wind energy is characterised by high randomness, discontinuity and
for Arxiv pre-print. frequent fluctuations, wind power generation is less stable and less predictable. As a result, wind

1
turbine power generation is highly unstable, which affects the scheduling and planning of power
systems and leads to imbalances in power supply and demand. In order to alleviate these problems,
accurate forecasts of future wind power generation are needed, which helps to reduce the cost
and increase the efficiency of power generation. Wind power technology is therefore becoming
increasingly important in the field of renewable energy.
Wind power forecasting models can be classified as very short-term, short-term, medium-term
and long-term depending on the time horizon [2, 3]. The very short-term forecasting models can
predict wind power generation values up to 30 minutes ahead, facilitating real-time access to wind
power generation and enhancing staff control of wind turbines. The short-term forecasting model
focuses on predicting wind power generation values from 30 minutes to 6 hours ahead, which
helps power companies to develop load dispatch plans and mitigate the impact of wind power
grid connection on the entire grid. The medium-term forecasting model has a time horizon of 6
hours to 1 day ahead and it facilitates renewable energy trading, optimisation of daily generation
schedules and circuit maintenance. Long-term forecasting models can predict the value of wind
power generation from 1 day to 1 week or more in the future, facilitating the siting of wind farms and
the development of annual generation plans. However, some literature suggests that the accuracy
of predictions decreases as the time horizon increases [4, 5, 6]. Also given the high volatility of wind
power, the difficulty of prediction increases dramatically if the time horizon chosen is too long.
Therefore, this paper focuses on short-term wind power forecasting.

1.1 Literature review


According to modelling theory, wind power models can be classified into four groups: physical
models, traditional statistical models, artificial intelligence-based models and hybrid models [7].
The physical approach does not require historical wind generation data, but relies on the
physical characteristics of the wind farm. Physical models can obtain predicted values for wind
power generation by predicting meteorological variables [8]. Meteorological variables in this context
usually refer to wind speed, which can be obtained from numerical weather prediction (N W P ) or
from forecasts of meteorological factors such as temperature, humidity and atmospheric pressure.
The conversion between wind speed and wind power values can be obtained from wind power
curves. Wind power curves is a graph created from field measurements made by researchers at
wind farms. It shows the power output of the wind turbine at different wind speeds. Ackermann et
al. [9] show that in the wind turbine power in wind power curves is proportional to the third power
of the wind speed, if there is a 10% error in the predicted wind speed, the error in the predicted wind
turbine power will be 30%. Accurate prediction of wind speed is therefore important for wind power
forecasting. In the literature, many physical model-based solutions for wind power forecasting
have been proposed. For example, Lazić et al. [10] used the regional NWP Eta model for wind farms
to obtain predicted wind speed values and then predicted wind power values based on wind power
curves. This paper is the first to evaluate the application of the Eta model to wind power prediction
problems and to validate it using data from the Gotland island power plant. The results show that
the Eta model can be used as a meteorological driver for wind power prediction. Focken et al. [11]
propose a wind power forecasting system based on a physical approach, which takes its inputs
from a large weather forecasting model in Germany. The physical approach performs better in
long-term wind power forecasting, while the statistical approach has advantages in short-term
forecasting [12].
Statistical models require large amounts of data such as historical power values of wind turbines,
historical weather data and weather information provided by the NWP to train a fitted model
between input and output power. Depending on whether parameters are used or not statistical
models can be classified into two types, parametric models like time series models and non-
parametric models based on artificial intelligence, such as artificial neural network (ANN) models

Jiang et al. | Pre-printed in Arxiv 2


and other data driven models.
Time series models are advantageous for short-term or ultra-short-term wind power forecast-
ing, describing fluctuating variations in output power. autoregressive integrated moving average
(ARIMA) and autoregressive model (AR) are commonly used univariate time series forecasting meth-
ods, which have They have a relatively short computation time and are suitable for smooth time
series, which can lead to inaccurate predictions when the input data is more volatile. Wind speed
can be converted to wind power through wind power curves, so many methods based on traditional
statistical models have been proposed to improve the accuracy of wind speed prediction. Yatiyana
et al. [13] proposed a statistical method for predicting wind speed and direction based on an ARIMA
model and validated it using data from a specific site in Western Australia with a time duration of 7
days and a resolution of one minute. The experimental results show that the error in predicting
wind speed is less than 5% and the error in predicting wind direction is less than 16%. Firat et
al. [14] propose a new statistical method for wind speed prediction based on second order blind
identification (SOBI) and autoregressive (AR) model. SOBI is an ICA method that uses temporal
structure to find hidden information or independent components in the input data, which provides
better prediction results than direct prediction of wind speed.
Physical models require more detailed meteorological data and physical characteristics, which
are more demanding on the data set, and the complex structure of wind turbines makes physical
processes difficult to explain. Traditional statistical models mainly deal with linear relationship
problems, making it difficult to capture non-linear time-series wind power signals. With the de-
velopment of artificial intelligence techniques, a large number of ANN-based methods have been
proposed in the literature. Bilal et al. [15] proposed an ANN-based model for predicting wind tur-
bine power. The input features are meteorological factors such as wind speed, wind direction, solar
irradiance, temperature and humidity. The experimental results showed that using meteorological
factors as inputs to the ANN had an impact on the performance of the model, with the greatest
impact when wind direction and wind speed were used as feature inputs, and the accuracy of the
prediction results was significantly improved. xu et al. [16] used Elman neural network to predict
short-term wind power, which used input features from the NWP, and the optimisation algorithm
was Particle Jyothi et al. [17] used Adaptive Wavelet Neural Network (AWNN) for wind power gen-
eration. The input features include wind density in addition to wind speed, wind direction and
temperature, and the authors use Morlet wavelets as motor wavelets. The AWNN was shown to be
more effective than ANNs and the Adaptive Neuro-Fuzzy Inference System (ANFIS) for wind power
prediction problems.
A hybrid model is a combination of different kinds of models that can improve the accuracy
of forecasting by retaining the strengths of each model and characterising different aspects of
data fluctuations when forecasting wind power. In the literature, a hybrid model is considered to
have good performance in time series forecasting tasks if it can capture both linear and non-linear
features of the time series data [18]. For example, Shi et al. [19] proposed that hybrid methods
generally use ARIMA models to extract linear features of the data and ANN or SVM models to
extract non-linear features of the data, producing two hybrid models, ARIMA-ANN and ARIMA-SVM,
respectively, and comparing them with ARIMA, ANN or SVM models. The results show that the
hybrid models do not necessarily perform better than a single model. However, hybrid models will
outperform single prediction models in most scenarios [3]. Therefore hybrid models are widely used
for wind power forecasting. Wang et al. [20] proposed a clustered hybrid wind power forecasting
model called PSO-SVM-ARMA. Shetty et al. [21] proposed a RBF neural network model for wind
power prediction and used Particle Swarm Optimization (PSO) to optimise the performance of
the model and Extreme Learning Machine (ELM) to improve the learning speed during training.
Zameer et al. [22] proposed a GPeANN model consisting of ANN and Genetic Programming for short-
term wind power forecasting. Due to the high volatility of meteorological features such as wind

Jiang et al. | Pre-printed in Arxiv 3


speed and wind direction, the prediction results of individual models vary greatly. Experiments
demonstrate that the proposed GPeANN model can generate a collective and stable decision space,
which can avoid the above problems and improve the accuracy of wind power generation. Liu et
al. [23] proposed a new wind power prediction method using the adaptive neuro-fuzzy inference
system (ANFIS) by mixing three models, which are the backpropagation neural network (BPNN),
radial basis function neural network (RBFNN), and least squares support vector machine (LSSVM).
To improve the accuracy of the predictions, a method based on Pearson correlation coefficient
(PCC) was used in the data pre-processing. The results show that the proposed hybrid method
works better compared to the individual models, regardless of the season.
In recent years, deep neural networks (DNNs) have been widely used for wind power prediction,
such as convolutional neural network (CNN), long short-term memory (LSTM) and gated recurrent
unit (GRU), etc. Some studies have shown that combining two or more DNNs can effectively improve
the prediction results. Yu et al. [24] proposed a non-parametric probabilistic prediction method
based on QR and Hybrid neural network (HNN) for predicting the output power of regional wind
power generation. HNN is a special kind of neural network obtained by mixing CNN and LSTM. It
combines the advantages of both CNN and LSTM models, using CNN to extract spatio-temporal
features of time series data, and then feeding the features into the LSTM model. Niu et al. [25]
proposed a sequence-to-sequence model based on the Attention-based Gated Recurrent Unit
(AGRU) for multi-step prediction of wind power generation using multiple input and multiple
output strategies. The feature selection approach evaluates the importance of each input variable
by combining the attention mechanism with the GRU model, and the results show that wind speed
and wind direction have the greatest impact on wind power forecasting, followed by barometric
pressure and seasonal variability. A hybrid wind power forecasting method called EEMD-BA-RGRU-
CSO was proposed by Meng et al [26]. The hybrid model consists of a combination of four different
models that are used at different stages of the experiment and serve different purposes. Ensemble
empirical mode decomposition (EEMD) is used to decompose the wind turbine power data during
data pre-processing, and then bi-attention mechanism (BA) is used for feature selection. For
prediction, an RGRU model using a combination of the residual network and GRU is proposed to
further extract the static and dynamic relationships between features. The performance of the
RGRU model is optimised at training time using the crisscross optimization algorithm (CSO). The
experimental results show that the proposed hybrid method has better prediction results and is
more stable than other more advanced models covered in the literature.
To improve the accuracy of wind power prediction models, many studies have used data pre-
processing techniques to reduce the volatility of the raw input data and thus better extract features.
For example, combining signal decomposition algorithms such as wavelet decomposition (WD),
empirical mode decomposition (EMD), ensemble empirical mode decomposition (EEMD) and
variational mode decomposition (VMD) with machine learning models. Rayi et al. [27] used the VMD
algorithm to decompose historical wind power data with non-linear and non-smooth characteristics,
and then built different forecasting models for each IMF obtained from the decomposition, which
effectively improved the forecasting effect. After decomposing the wind power time series by the
vmd algorithm, Duan et al. were able to better extract local features and then construct a hybrid
model for prediction using LSTM and deep belief networks based on PSO [28]. Therefore, this paper
will continue to use the VMD algorithm to decompose the time series associated with wind power
forecasting.

Jiang et al. | Pre-printed in Arxiv 4


2 Methodology
2.1 LSTM
Traditional neural networks cannot link the current input to historical information, while RNNs can
link contextual information and are more suitable for processing data with strong backward and
forward correlation such as time series. However, as the number of layers increases, the activation
function will increase exponentially, causing the gradient to fail to propagate forward and the
gradient to disappear. In addition to this, RNNs can suffer from gradient explosion when trained
with too many network parameters causing the network to crash. Gradient explosion causes the
parameters to overflow in value and is therefore easier to detect than gradient disappearance, but
gradient disappearance causes the RNN model to fail to learn long-term dependency information
of the time series data. To alleviate the above problem, Hochreiter and Schmidhuber proposed the
LSTM model, which is a special type of RNN model [29]. It not only learns contextual information
stored in time series data, but also overcomes the gradient disappearance problem of RNN models
and is able to learn long-term dependencies of the data. The modular unit of the LSTM consists
of three different types of gates, namely input gates, output gates and forgetting gates. The three
gates have different roles, with the input gate controlling the information input module cell and
determining which of the current stream of information can be added to the internal state of the
storage cell, which then updates the cell state. The forgetting gate determines which information
will be discarded in favour of preserving new information. The output gate acts in the hidden
layer and controls whether the information is used as the output of the current LSTM. The gating
mechanism can selectively discard unwanted information and leave useful information behind,
thus solving the problem of gradient dispersion that exists in traditional RNNs.

2.2 XGBRegressor
Unlike ANN, LSTM and CNN, which contain only one strong learner, the XGBoost algorithm proposed
by Chen et al. is a Boosting algorithm with multiple learners [30]. This type of algorithm is an
important branch of integration learning, where a strong learner is obtained by assigning different
weights to multiple weak learners according to their accuracy. An important application of Boosting
algorithms to solve regression problems is GBDT, which obtains a target regressor by integrating
multiple decision trees. In GBDT, each decision tree is built in the direction of the decreasing model
loss function, i.e. the loss value of each round of adding trees is fitted by the negative gradient of
the loss function, which is the same principle as Adaboost in the Boosting algorithm. In order for
such algorithms to satisfy both classification tasks and regression tasks, their subtrees are generally
CART trees, which are built recursively to build binary trees that are optimal by definition of the
algorithm.
XGBoost is an improvement and excellent practice of the GBDT algorithm, which adds a regular-
isation term related to the node partitioning difficulty factor and tree size to the GBDT objective
function, controlling the tree size to reduce the possibility of overfitting and speeding up the con-
vergence of the algorithm. In addition, the use of second-order Taylor expansions can make the loss
function more accurate and allow the function to converge in a more precise direction. As one of
the few algorithms in integrated learning that can compete with strong learners, many researchers
have also added methods such as missing value processing and feature importance analysis to the
XGBoost algorithm, making it also relatively useful in practical problems. XGBoost can be used to
model regression and classification problems and refers to the regressor as the XGBRegressor.

2.3 TFT
TFT is a deep neural network architecture based on the attention mechanism proposed by Lim et
al. and is widely used in time series data prediction [31]. Compared to other artificial intelligence-

Jiang et al. | Pre-printed in Arxiv 5


based models, the TFT model has a much improved performance with greater interpretability. The
TFT model classifies input features into different types, including static covariates, future inputs
that can be obtained by speculation and time series data that are known in the past but not known
in the future. It also understands the interactions between different types of input features in Multi-
horizon forecasting and estimates the importance of these features to the forecasting outcome.
Designing models based on different data sources has better forecasting results.
The TFT model consists of several modules with different functions, which are able to construct
features for different types of input data. (1) Gating mechanisms function by forgetting unneeded
components, not only simplifying the structure of the model, but also improving its performance
on different tasks. (2) Variable selection networks identify important input features at each time
step, which can alleviate the problem of traditional DNNs over-fitting and predicting features with
irrelevant targets, and improve the ability of the model to adapt to different samples. (3) Static
covariate encoders Moderate the modelling of temporal dynamics by encoding static features, as
static information may be important for predicting the target, which helps to improve the prediction
results. (4) Temporal processing enables the model to learn long and short term time dependencies
from future inputs obtained through speculation and from time series data that are known in the
past but not in the future. This part consists of two modules, the sequence-sequence layer and
the multi-headed attention layer, whose functions are to perform local processing and to learn
long-term dependencies, respectively. (5) Prediction intervals show quantile predictions, which
are useful for understanding the output distribution.

3 Data description and evaluation etrics


3.1 Data description
The dataset is from a publicly available dataset provided by Chen and Xu and is a selection of real
data collected from four wind farms in China [32]. The wind farms are climatically and geographically
diverse and are located in terrain that includes deserts, plains and mountains. The dataset records
wind power generation and meteorological characteristics for 2019-2020 at a resolution of 15
minutes. Table 1 shows the statistics for the four wind farms, including the mean, minimum and
maximum for the two variables of power output and wind speed at the height of wheel hub.

Table 1. Statistics of the wind farms

Power output(MW) Wind speed at the height of wheel hub (m/s)


Wind farm
Mean Minimum Maximum Mean Minimum Maximum
Site 1 23.4 0 98.1 6.4 0 30.2
Site 2 72.7 0 201.2 7.5 0 28.8
Site 3 18.1 0 94.3 4 0 36.9
Site 4 28.8 0 114.4 8.1 0 23.8

3.2 Signal decomposition algorithms


Empirical mode decomposition (EMD) is a method proposed by Huang et al. for the decomposition
of many types of signals [33]. EMD can decompose the original signal, which reflects the time-scale
characteristics of the data, into a number of intrinsic mode functions (IMFs), but the frequencies of
these IMF components may differ. It is often the case that an IMF may contain feature components
at different time scales, leading to the problem of mode mixing in EMD. To avoid the mode mixing
problem between IMFs, Wu and Huang developed an improved method for EMD, the ensemble
empirical mode decomposition (EEMD) algorithm [34]. EEMD takes advantage of the fact that
white noise has a zero average value, and changes the extreme value point of the signal by adding

Jiang et al. | Pre-printed in Arxiv 6


Gaussian white noise several times during the EMD decomposition process, and after several
averaging processes, the noises cancel each other out. The more times of averaging, the less noise
is introduced into the decomposition, and the less impact on the decomposition result [35]. Wind
power generation data has a relatively strong volatility and intermittent time series data. EEMD
is used to decompose wind power generation to obtain two mutually independent IMFs and a
representation of the trend residual, as shown in Figure 1. The decomposed wind power generation
data is less volatile and random, and the IMFs have some regularity, which helps to improve the
accuracy of wind power forecasting.

Figure 1. EEMD algorithm decomposes wind power data

The VMD algorithm is a signal processing algorithm based on wiener filtering, Hilbert transform
and frequency mixing [36]. The VMD algorithm is adaptive and non-recursive in nature. Adaptive
is the algorithm’s ability to determine the number of modal decompositions based on the actual
problem and to match the optimum centre frequency and bandwidth of each modal component
during the operation. Compared to the EMD algorithm, it has a better mathematical theoretical
basis and is able to separate signals accurately and efficiently. The VMD algorithm is applied to
highly complex, strongly non-linear time series to reduce its non-smoothness. The decomposition
results in several relatively smooth subseries, effectively eliminating the effect of noise and allowing
the capture of important features of the original signal.
The VMD decomposition algorithm is one of the effective methods to solve the problem of
modal conflation that occurs in EEMD. It is used to decompose the wind power generation data and
the decomposition results are shown in Figure 2. The decomposition results in three IMFs around
the central frequency and a residual, which adaptively separates the IMFs in different frequency
bands and effectively reduces the influence of noise on the prediction results.
The EEMD-VMD algorithm is a secondary decomposition technique, where the EEMD algorithm

Jiang et al. | Pre-printed in Arxiv 7


Figure 2. VMD algorithm decomposes wind power data

is first used to decompose the original signal, and then the VMD algorithm is used to decompose
the first IMF1 obtained after the EEMD decomposition, and the decomposition results are shown in
Figure 3.

3.3 Evaluation Metrics


In order to evaluate the prediction results of our proposed method, the following two metrics
of statistical prediction and true value error were used: normalized mean absolute error (nMAE),
normalized rootq mean square error (nRMSE). nMAE and nRMSE are defined as shown below.
1 2
nRMSE = T y 2
ÍT
max
i =1 (y i − ŷ i )
nMAE = T y1max Ti=1 |yi − ŷi |
Í

where T denotes the forecast length, yi denotes the actual wind power at point i, hat yi is
the predicted value of wind power at point i, and y max denotes the maximum value of the actual
wind power. In wind power forecasting studies, the units of RMSE and MAE can be kW, kWH and
percentage, depending on the object of the forecast. In this paper, the units of each evaluation
metric are in percentages to allow comparison with other wind power forecasting models.

Jiang et al. | Pre-printed in Arxiv 8


Figure 3. EEMD-VMD algorithm decomposes wind power data

4 Results and analysis


4.1 Comparison of results
In this subsection, the wind power data are pre-processed using the EEMD, VMD and EEMD-VMD
decomposition methods, and then the TFT model is used to predict the time series data obtained
from the decomposition, which facilitates the decomposition technique that yields the best pre-

Jiang et al. | Pre-printed in Arxiv 9


diction results. In order to compare the prediction results of the models more visually, Figures
4-15 show the prediction results of the TFT model, EEMD-TFT model, VMD-TFT model and EEMD-
VMD-TFT model for 1-h, 3-h and 6-h ahead at the four sites. As can be seen from the graphs, the
data pre-processing methods are beneficial in improving the prediction performance of the TFT
models. After using different decomposition methods, higher accuracy than the original model
was obtained and the prediction curves were closer to the original wind generation series.
Table 2 shows the prediction results of the TFT model for wind power data from the four wind
power sites with different pre-processing methods for 6-h, 3-h and 1-h ahead respectively. As can
be seen from Table 2, there are differences in the prediction results of the three decomposition
methods due to the different climatic and geographical locations of the four different sites. The
prediction performance of the EEMD-TFT model, VMD-TFT model and EEMD-VMD-TFT model de-
creases significantly as the prediction range increases. Compared to EEMD and EEMD-VMD, VMD
decomposed better, achieving lower MAE and RMSE values in the 6-h, 3-h and 1-h ahead forecasts of
the TFT model. For example, in the 6-h ahead prediction, the VMD-TFT model achieved the lowest
average MAE values of 3.53%, 3.05%, 2.90% and 2.99% and average RMSE of 5.07%, 4.3%, 4.12%
and 3.95% respectively for the wind data from all four sites.

Figure 4. Station 1 1-h ahead wind power forecast results

5 Conclusion
The study used variational mode decomposition (VMD) for decomposition of wind power series
and Temporal fusion transformer (TFT) for future 1h, 3h and 6h wind power forecasts. The experi-
mental results show that VMD outperforms other decomposition algorithms and the TFT model
outperforms other decomposition models.

Jiang et al. | Pre-printed in Arxiv 10


Figure 5. Station 2 1-h ahead wind power forecast results

Figure 6. Station 3 1-h ahead wind power forecast results

Jiang et al. | Pre-printed in Arxiv 11


Figure 7. Station 4 1-h ahead wind power forecast results

Figure 8. Station 1 3-h ahead wind power forecast results

Jiang et al. | Pre-printed in Arxiv 12


Figure 9. Station 2 3-h ahead wind power forecast results

Figure 10. Station 3 3-h ahead wind power forecast results

Jiang et al. | Pre-printed in Arxiv 13


Figure 11. Station 4 3-h ahead wind power forecast results

Figure 12. Station 1 6-h ahead wind power forecast results

Jiang et al. | Pre-printed in Arxiv 14


Figure 13. Station 2 6-h ahead wind power forecast results

Figure 14. Station 3 6-h ahead wind power forecast results

Jiang et al. | Pre-printed in Arxiv 15


Figure 15. Station 4 6-h ahead wind power forecast results

Table 2. Prediction results of TFT prediction models based on different decomposition methods

1-h ahead 3-h ahead 6-h ahead


Forecasting models
nMAE nRMSE nMAE nRMSE nMAE nRMSE
site 1
EEMD-TFT 2.84% 3.71% 3.66% 5.24% 6.00% 9.65%
VMD-TFT 1.63% 2.40% 2.26% 3.13% 3.05% 4.30%
EEMD-VMD-TFT 3.17% 4.03% 3.82% 5.38% 6.05% 9.68%
site 2
EEMD-TFT 2.73% 3.68% 3.63% 5.35% 5.52% 8.85%
VMD-TFT 1.50% 2.39% 2.11% 3.11% 2.90% 4.12%
EEMD-VMD-TFT 2.94% 3.89% 4.29% 6.06% 6.11% 9.42%
site 3
EEMD-TFT 2.81% 3.66% 3.96% 5.60% 5.97% 9.96%
VMD-TFT 1.94% 2.41% 3.19% 3.79% 3.44% 4.39%
EEMD-VMD-TFT 3.03% 3.91% 4.12% 5.85% 6.27% 9.96%
site 4
EEMD-TFT 2.45% 3.17% 2.63% 3.60% 4.28% 6.36%
VMD-TFT 1.32% 1.79% 1.40% 2.08% 2.99% 3.95%
EEMD-VMD-TFT 2.55% 3.26% 2.81% 3.75% 3.87% 5.89%

Acknowledgments. TBD

Jiang et al. | Pre-printed in Arxiv 16


Data Availability Statement. TBD

References
[1] Yun Wang et al. “Approaches to wind power curve modeling: A review and discussion”. In: Renewable
and Sustainable Energy Reviews 116 (2019), p. 109422.
[2] Hui Liu and Chao Chen. “Data processing strategies in wind energy forecasting models and applica-
tions: A comprehensive review”. In: Applied Energy 249 (2019), pp. 392–408.
[3] Jaesung Jung and Robert P Broadwater. “Current status and future advances for wind speed and power
forecasting”. In: Renewable and Sustainable Energy Reviews 31 (2014), pp. 762–777.
[4] Shahram Hanifi et al. “A critical review of wind power forecasting methods—past, present and future”.
In: Energies 13.15 (2020), p. 3764.
[5] Wen-Yeau Chang et al. “A literature review of wind forecasting methods”. In: Journal of Power and En-
ergy Engineering 2.04 (2014), p. 161.
[6] Saurabh S Soman et al. “A review of wind power and wind speed forecasting methods with different
time horizons”. In: North American power symposium 2010. IEEE. 2010, pp. 1–8.
[7] Yun Wang et al. “A review of wind speed and wind power forecasting with deep neural networks”. In:
Applied Energy 304 (2021), p. 117766.
[8] Jing Yan and Tinghui Ouyang. “Advanced wind power prediction based on data-driven error correc-
tion”. In: Energy conversion and management 180 (2019), pp. 302–311.
[9] Thomas Ackermann and Lennart Söder. “Wind energy technology and current status: a review”. In:
Renewable and sustainable energy reviews 4.4 (2000), pp. 315–374.
[10] Lazar Lazić, Goran Pejanović, and Momčilo Živković. “Wind forecasts for wind power generation using
the Eta model”. In: Renewable Energy 35.6 (2010), pp. 1236–1243.
[11] Ulrich Focken, Matthias Lange, and Hans-Peter Waldl. “Previento-a wind power prediction system with
an innovative upscaling algorithm”. In: Proceedings of the European Wind Energy Conference, Copen-
hagen, Denmark. Vol. 276. 2001.
[12] Ma Lei et al. “A review on the forecasting of wind speed and generated power”. In: Renewable and
sustainable energy reviews 13.4 (2009), pp. 915–920.
[13] Eddie Yatiyana, Sumedha Rajakaruna, and Arindam Ghosh. “Wind speed and direction forecasting for
wind power generation using ARIMA model”. In: 2017 Australasian Universities Power Engineering Con-
ference (AUPEC). IEEE. 2017, pp. 1–6.
[14] Umut Firat et al. “Wind speed forecasting based on second order blind identification and autoregres-
sive model”. In: 2010 Ninth International Conference on Machine Learning and Applications. IEEE. 2010,
pp. 686–691.
[15] Boudy Bilal et al. “Wind turbine power output prediction model design based on artificial neural net-
works and climatic spatiotemporal data”. In: 2018 IEEE International Conference on Industrial Technol-
ogy (ICIT). IEEE. 2018, pp. 1085–1092.
[16] Lei Xu and Jiandong Mao. “Short-term wind power forecasting based on Elman neural network with
particle swarm optimization”. In: 2016 Chinese Control and Decision Conference (CCDC). IEEE. 2016,
pp. 2678–2681.
[17] M Nandana Jyothi and PV Ramana Rao. “Very-short term wind power forecasting through adaptive
wavelet neural network”. In: 2016 Biennial International Conference on Power and Energy Systems: To-
wards Sustainable Energy (PESTSE). IEEE. 2016, pp. 1–6.
[18] G Peter Zhang and Min Qi. “Neural network forecasting for seasonal and trend time series”. In: Euro-
pean journal of operational research 160.2 (2005), pp. 501–514.
[19] Jing Shi, Jinmei Guo, and Songtao Zheng. “Evaluation of hybrid forecasting approaches for wind speed
and power generation time series”. In: Renewable and Sustainable Energy Reviews 16.5 (2012), pp. 3471–
3480.

Jiang et al. | Pre-printed in Arxiv 17


[20] Yurong Wang, Dongchuan Wang, and Yi Tang. “Clustered hybrid wind power prediction model based
on ARMA, PSO-SVM, and clustering methods”. In: IEEE Access 8 (2020), pp. 17071–17079.
[21] Rashmi P Shetty, APSP Sathyabhama, A Adarsh Rai, et al. “Optimized radial basis function neural net-
work model for wind power prediction”. In: 2016 second international conference on cognitive comput-
ing and information processing (CCIP). IEEE. 2016, pp. 1–6.
[22] Aneela Zameer et al. “Intelligent and robust prediction of short term wind power using genetic pro-
gramming based ensemble of neural networks”. In: Energy conversion and management 134 (2017),
pp. 361–372.
[23] Jinqiang Liu, Xiaoru Wang, and Yun Lu. “A novel hybrid methodology for short-term wind power fore-
casting based on adaptive neuro-fuzzy inference system”. In: Renewable energy 103 (2017), pp. 620–
629.
[24] Yixiao Yu et al. “Probabilistic prediction of regional wind power based on spatiotemporal quantile re-
gression”. In: 2019 IEEE Industry Applications Society Annual Meeting. IEEE. 2019, pp. 1–16.
[25] Zhewen Niu et al. “Wind power forecasting using attention-based gated recurrent unit network”. In:
Energy 196 (2020), p. 117081.
[26] Anbo Meng et al. “A hybrid deep learning architecture for wind power prediction based on bi-attention
mechanism and crisscross optimization”. In: Energy 238 (2022), p. 121795.
[27] Vijaya Krishna Rayi et al. “Adaptive VMD based optimized deep learning mixed kernel ELM autoencoder
for single and multistep wind power forecasting”. In: Energy 244 (2022), p. 122585.
[28] Jiandong Duan et al. “A novel hybrid model based on nonlinear weighted combination for short-term
wind power forecasting”. In: International Journal of Electrical Power & Energy Systems 134 (2022),
p. 107452.
[29] Sepp Hochreiter and Jürgen Schmidhuber. “Long short-term memory”. In: Neural computation 9.8
(1997), pp. 1735–1780.
[30] Tianqi Chen and Carlos Guestrin. “Xgboost: A scalable tree boosting system”. In: Proceedings of the
22nd acm sigkdd international conference on knowledge discovery and data mining. 2016, pp. 785–794.
[31] Bryan Lim et al. “Temporal fusion transformers for interpretable multi-horizon time series forecasting”.
In: International Journal of Forecasting 37.4 (2021), pp. 1748–1764.
[32] Yongbao Chen and Junjie Xu. “Solar and wind power data from the Chinese State Grid Renewable En-
ergy Generation Forecasting Competition”. In: Scientific Data 9.1 (2022), p. 577.
[33] Norden E Huang et al. “The empirical mode decomposition and the Hilbert spectrum for nonlinear and
non-stationary time series analysis”. In: Proceedings of the Royal Society of London. Series A: mathemat-
ical, physical and engineering sciences 454.1971 (1998), pp. 903–995.
[34] Zhaohua Wu and Norden E Huang. “Ensemble empirical mode decomposition: a noise-assisted data
analysis method”. In: Advances in adaptive data analysis 1.01 (2009), pp. 1–41.
[35] Zengshun Chen et al. “An Improved Method Based on EEMD-LSTM to Predict Missing Measured Data of
Structural Sensors”. In: Applied Sciences 12.18 (2022), p. 9027.
[36] Konstantin Dragomiretskiy and Dominique Zosso. “Variational mode decomposition”. In: IEEE transac-
tions on signal processing 62.3 (2013), pp. 531–544.

Jiang et al. | Pre-printed in Arxiv 18

You might also like