Energy 221 (2021) 119759

Contents lists available at ScienceDirect

Energy
journal homepage: www.elsevier.com/locate/energy

A robust deep learning framework for short-term wind power forecast of a full-scale wind farm using atmospheric variables

Rajitha Meka, Adel Alaeddini*, Kiran Bhaganagar
Department of Mechanical Engineering, University of Texas, San Antonio, TX, 78249, USA
* Corresponding author. E-mail address: adel.alaeddini@utsa.edu (A. Alaeddini).
https://doi.org/10.1016/j.energy.2021.119759

ARTICLE INFO

Article history: Received 24 May 2020; Received in revised form 11 November 2020; Accepted 30 December 2020; Available online 8 January 2021.

Keywords: Temporal convolutional network (TCN); Neural networks; Orthogonal array tuning method (OATM); Short-term prediction; Wind farm power prediction; Wind turbine

ABSTRACT

Short-term (less than 1 h) forecast of the power generated by wind turbines in a wind farm is extremely challenging due to the lack of reliable data from meteorological towers and numerical weather model outputs at these timescales. A robust deep learning model is developed for short-term forecasts of wind turbine generated power in a wind farm, using state-of-the-art temporal convolutional networks (TCN) to simultaneously capture the temporal dynamics of the wind turbine power and the relationships among the local meteorological variables. An orthogonal array tuning method based on the Taguchi design of experiments is utilized to optimize the hyperparameters of the proposed TCN model. The proposed TCN model is validated using twelve months of data from a 130 MW utility-scale wind farm with 86 wind turbines, in comparison with some of the existing methods in the literature. The power curves obtained from the proposed TCN model show consistent improvements over existing methods at all wind speeds.

© 2021 Elsevier Ltd. All rights reserved.

1. Introduction

Wind energy is one of the major renewable energy sources and has been growing rapidly in recent years. It is known for its environmental friendliness, fossil fuel savings and pollutant reduction [1]. However, while installed wind capacity increases significantly each year, this comes with its own set of challenges. Unlike conventional power plants, wind energy is highly dependent on meteorological conditions, which makes the power output difficult to predict [2]. As the integration of renewable energies such as wind into the current electric supply system increases, predictive algorithms that provide reliable forecasts are required to reduce the technical and financial risks of all participants in the market [3].

Predictive algorithms for forecasting wind turbine power can be divided into four categories based on the prediction horizon: very short-term (a few seconds to 30 min ahead), short-term (30 min to 48/72 h ahead), medium-term (48/72 h to 1 week ahead) and long-term (1 week to 1+ years ahead) [4]. Wind power forecasting methods can be further classified, based on the type and availability of data and the purpose of the forecast, into physical methods, statistical methods, and hybrid methods that combine both [5]. The mesoscale features of the weather are obtained from numerical weather prediction (NWP) models. As the weather predictions are only updated a few times per day, physical methods provide satisfactory forecasts for longer time horizons [6]. Statistical methods use historical data to make predictions and are known to work well for short-term power [7]. The conventional statistical models are divided into autoregressive (AR), moving average (MA), autoregressive moving average (ARMA), and autoregressive integrated moving average (ARIMA) models [8]. Hybrid methods combine the physical and statistical methods, using weather forecast data together with time series analysis, to provide robust wind power predictions [9].

Recently, artificial intelligence techniques based on multi-layer perceptrons (MLP), radial basis functions (RBF), recurrent neural networks (RNN) and fuzzy logic systems have gained popularity and proven successful in improving the accuracy of wind power forecasts [10]. Methaprayoon et al. [11] discuss the development of an ANN-based wind power forecasting method that uses the history of wind speed and wind power as inputs to predict future wind power. Since then, quite a few models have been developed based on MLP [12,13], RBF neural networks [5,14,15], support vector machines [16,17] and ensemble methods [18,19] to improve the wind power forecast. There are also wind power forecasting models based on numerical weather prediction model data using fuzzy rules [20].
Hong and Rioflorido [21] propose an RBF neural network with a double Gaussian activation function that utilizes features extracted from the data by a convolutional neural network. Chen et al. [22] propose an architecture that includes stacked encoders to jointly extract salient patterns and heterogeneous features of the data. Recurrent neural networks (RNN) are another type of neural network known to work well with time series data [23]. Barbounis et al. [24] and Senjyu et al. [25] discuss using RNNs for long-term wind speed and power forecasting. Felder et al. [26] discuss using RNNs to predict short-term wind power using the last 24 h of averaged power measurements and NWP data as model inputs. However, RNN models have the problem of vanishing gradients, in which the contribution of previous states vanishes after a few timesteps [27].

LSTM is a special type of RNN that is known to solve the vanishing gradient problem of RNNs at a small extra computational cost [28,29]. Since its inception in 1997 [30], a wide variety of LSTM architectures have been used for different problems [31-35]. Xiaoyun et al. [36] discuss the usage of an LSTM model based on NWP data for short-term wind power prediction. Wu et al. [37] propose a CNN + LSTM model that uses convolutional layers to extract features, which are passed through LSTM layers to capture the time dynamics. López et al. [38] propose using LSTM blocks as the hidden layer units of an echo state network to accurately predict wind power using the history of wind power and NWP data. Shi et al. [39] also find the LSTM model to perform better than traditional neural networks, and propose recursive and direct variational mode decomposition long short-term memory networks for hourly day-ahead wind power prediction. Cali and Sharma [40] also use historical power and NWP data to predict wind power with an LSTM model, along with a sensitivity analysis to identify the most informative input parameters from the NWP data. Later, Yu et al. [41] propose an LSTM enhanced forget-gate network model in which two peepholes are added to the standard LSTM forget and output gates to improve the performance of the model. LSTM is still one of the most extensively used deep learning models and can be found in many recent papers [42-48].

While LSTM provides promising results, Greff et al. [49] benchmark eight different variants of LSTM on different complex prediction tasks, concluding that none of the variants improve the predictions significantly compared to the standard LSTM. Recently, temporal convolutional networks (TCN) have outperformed several recurrent architectures like LSTM and provide significant performance improvements for sequence modeling tasks like action segmentation [50,51], speech analysis [52], image classification [53], and medical time series prediction [54-58]. Recent works [59] demonstrate that the TCN's hierarchical temporal convolutional filters help capture long-range patterns, overcoming the previous shortcomings of recurrent network architectures. Bai et al. [59] compare TCN with RNNs, gated recurrent units and LSTM on several benchmark problems to show that TCN outperforms the RNN architectures. They also discuss the advantages and disadvantages of the TCN model. Apart from benefits like the lowest memory requirement for training and gradient stability, one of the main advantages of the TCN model is not having to backpropagate in the temporal direction, resulting in easily trainable models of high capacity despite moderate complexity in terms of parameters. Some of the distinguishing characteristics of TCN include the capability of looking far enough into the past using deep networks augmented with residual layers, mapping an input sequence of any length to an output sequence of the same length, and using causal convolutions that do not allow information to leak from the future to the past [59].

This paper proposes a temporal convolutional network (TCN) to predict the total wind turbine power in a 130 MW utility-scale wind farm for the short term using atmospheric variables, which to the best of our knowledge has not been attempted before. The proposed model provides a multi-step forecast of the wind power for up to 50 min ahead. It utilizes an orthogonal array tuning method (OATM) to optimize the hyperparameters of the proposed model, including the size of the input history. As wind energy is highly dependent on meteorological conditions, atmospheric data collected through two meteorological (MET) towers located near the wind farm is used. The contributions of the proposed study include:

• Proposing a deep learning model based on state-of-the-art TCN to predict the total power produced in a full-scale wind farm using atmospheric variables
• Providing multi-step ahead prediction of wind power for 0, 10, 20, 30, 40 and 50 min ahead
• Using an orthogonal array tuning method (OATM) to optimize the hyperparameters of the deep learning models, including the size of the input history, to improve the prediction performance
• Evaluating the predictive performance of the proposed multi-step ahead TCN model against long short-term memory (LSTM), convolutional long short-term memory (CNN + LSTM) and multi linear regression (MLR) models

Due to the proprietary nature of the results, only the scaled total power is reported in this study. The rest of the paper is organized as follows: the data collection, preprocessing, model construction of all the prediction models, performance metrics and hyperparameter optimization are provided in Section 2. The predictive performance of the TCN model along with the other models is discussed in Section 3. Finally, the conclusions are discussed in Section 4.

2. Materials and methods

Let $X \in \mathbb{R}^{T \times p}$, $X = (x_{t=1}, x_{t=2}, \ldots, x_{t=T})$, be the time series of input variable dynamics for the time periods $t = 1, \ldots, T$, where $x_i = (x_{i1}, x_{i2}, \ldots, x_{i,15})$ represents the set of values of the individual input variables $p = 1, \ldots, 15$ at time point $i$ in the dataset. The response (output) variable is denoted as $y = f(X)$. All the models in this study use the inputs $X$ at times $t - i_h, t - i_h + 1, \ldots, t$, where $i_h$ is the size of the input history, to predict $y$ at $t, t+1, t+2, t+3, t+4, t+5$ simultaneously.
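To make this windowing concrete, the following minimal numpy sketch (our illustration, not the authors' code; the array names are hypothetical) builds the supervised samples from a scaled input matrix and target series:

```python
import numpy as np

def make_windows(X, y, i_h=5, n_ahead=6):
    """Slice a (T, p) input array and a length-T target series into
    supervised samples: inputs at t-i_h..t predict targets at t..t+n_ahead-1."""
    X_w, Y_w = [], []
    for t in range(i_h, len(X) - n_ahead + 1):
        X_w.append(X[t - i_h:t + 1])   # input history: i_h past steps plus t
        Y_w.append(y[t:t + n_ahead])   # 6 simultaneous targets: t .. t+5
    return np.asarray(X_w), np.asarray(Y_w)

# Example with random stand-ins for the 15 scaled MET variables:
X = np.random.rand(1000, 15)
y = np.random.rand(1000)
X_w, Y_w = make_windows(X, y)  # shapes: (990, 6, 15) and (990, 6)
```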
2.1. Wind farm dataset

The proposed study uses the dataset from a wind farm that covers an area of 120 km². The wind farm has 86 wind turbines, each with a rating of around 1-1.5 MW. The wind turbines have a 3.5 m/s cut-in wind speed, a 14 m/s rated wind speed, and a 25 m/s cut-out wind speed. The data is collected at 10 min intervals from nearby meteorological (MET) towers. The dataset consists of 12 months of data from the year 2013, collected from MET towers at two heights (at hub height and 1.5 m above ground), together with the wind power from all 86 wind turbines. In addition to the 12 original variables collected, three additional variables are calculated. Table 1 presents all 15 input variables, along with their notations and descriptive statistics. The total power from all 86 wind turbines is the output in this study.

Fig. 1 shows the autocorrelation of the total power over 100 timesteps. The autocorrelation at lag $k$ is calculated using Equation (1), where $T$ is the number of observations in the time series $y_1, y_2, \ldots, y_T$ and $\bar{y}$ is the average of the observations [60]. As the coefficient of correlation plateaus over time, it indicates that data far away from the most current timestep (lag 0) might not be helpful for the model's predictions.

$$A_k = \frac{\frac{1}{T}\sum_{t=1}^{T-|k|}\left(y_{t+|k|}-\bar{y}\right)\left(y_t-\bar{y}\right)}{\frac{1}{T}\sum_{t=1}^{T}\left(y_t-\bar{y}\right)^2} \tag{1}$$

Table 1
Descriptive statistics and related notations of input variables.

Input - MET tower variables | Notation | Mean | Std | Min | Max
Air temperature at 1.5 m (K) | T_1.5m | 290.5 | 10.5 | 246.8 | 313.4
Solar radiance | SR | 211.3 | 302.5 | 0.0 | 1145.0
Surface temperature (K) | T_s | 293.6 | 13.6 | 265.1 | 334.7
Relative humidity (%) | RH | 54.7 | 22.0 | 5.0 | 96.0
Soil moisture | SM | 0.1 | 0.0 | 0.0 | 0.2
Soil temperature (K) | T_So | 294.7 | 10.7 | 273.3 | 319.4
Wetness | W | 1082.6 | 259.7 | 29.0 | 1321.5
Wind speed at 1.5 m (m/s) | U_1.5m | 2.0 | 1.5 | 0.0 | 10.7
Pressure (Pa) | P | 92609.9 | 670.4 | 80439.0 | 94546.0
Air temperature at hub height (K) | T | 290.2 | 10.2 | 261.9 | 312.7
Wind direction (°) | WD | 156.1 | 78.2 | 0.0 | 358.7
Wind speed at hub height (m/s) | U | 7.5 | 3.3 | 0.0 | 23.6

Input - Calculated variables | Notation | Mean | Std | Min | Max
Air density (kg/m³) | Rho | 0.9 | 0.1 | 0.7 | 1.1
Heat flux (W/m²) | HFX | 45.7 | 90.7 | −114.4 | 1192.7
Change in surface temperature (K) | ∂T_s/∂t | 0.0 | 0.6 | −6.0 | 6.9

Fig. 1. The autocorrelation of total power for the 100 timesteps in the time series, where each timestep involves a 10 min interval.
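A minimal numpy sketch of Equation (1), assuming the scaled total power is available as a 1D array (the function name is ours):

```python
import numpy as np

def autocorr(y, k):
    """Sample autocorrelation at lag k, as in Equation (1)."""
    y = np.asarray(y, dtype=float)
    T, y_bar = len(y), y.mean()
    k = abs(k)
    num = np.sum((y[k:] - y_bar) * (y[:T - k] - y_bar)) / T
    den = np.sum((y - y_bar) ** 2) / T
    return num / den

# e.g. the first 100 lags of the total power series:
# acf = [autocorr(total_power, k) for k in range(100)]
```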
The matrix of Pearson's correlation coefficients (see Equation (2)) between all the variables is shown in Fig. 2. Of all the input variables, the wind speed at hub height has the highest positive correlation with total power, followed by the wind speed collected at 1.5 m, with correlation coefficients of 0.88 and 0.39 respectively. Pressure has the highest negative correlation with the total power, followed by solar radiance, with correlation coefficients of −0.29 and −0.15 respectively.

$$C = \frac{\sum_{t=1}^{T}(x_t-\bar{x})(y_t-\bar{y})}{\sqrt{\sum_{t=1}^{T}(x_t-\bar{x})^2\,\sum_{t=1}^{T}(y_t-\bar{y})^2}} \tag{2}$$

As shown in the correlation matrix, some of the input variables might not be directly correlated with the total power, but they are related to the ones that are highly correlated with it. For example, heat flux has a very small (negative) correlation with total power, but it has a significant correlation of 0.53 with the wind speed collected at 1.5 m, which has the second highest positive correlation with the total power. In this paper, all of the 12 MET variables as well as the 3 calculated variables are used as inputs to the model, because deep learning models are known to have the ability to extract complex nonlinear features from the inputs, which go beyond Pearson correlation.
Fig. 2. The color coded correlation matrix of the 15 input and 1 output variables based on the Pearson correlation coefficient. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

2.2. Data cleaning

This study uses instantaneous rapid refresh (RAP) data from NWP to understand the bounds of the MET tower variables. RAP data is not used as an input, as it is only available at hourly intervals. The data is pre-processed to replace all missing observations with valid previous-timestep observations. Before cleaning the outliers, the data distributions of all input and output variables over time are analyzed. The analysis shows that the total power from all wind turbines and two MET tower variables, air temperature at 1.5 m and pressure, have outliers. Using the air temperature and pressure bounds of the RAP data, values less than 250 K and 80,000 Pa are considered outliers for the air temperature at 1.5 m and the pressure, respectively. All the outliers of air temperature at 1.5 m and pressure are replaced with valid previous-timestep observations. For cleaning the total power data, the power from the individual wind turbines is used. As negative power values are physically impossible, values less than 0 are considered outliers. While there are a few power values of very high magnitude, after referring to the mean and maximum power values of each wind turbine, power values greater than 1.8 MW are considered outliers. The data distributions of air temperature at 1.5 m, pressure and scaled total power, before and after outlier treatment, are shown in Fig. 3.

Fig. 3. Air temperature at 1.5 m, pressure and scaled total power data distribution over time - with and without outliers.
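The following pandas sketch illustrates the outlier treatment described above, under the assumption that the data is held in a DataFrame with hypothetical column names; the thresholds are the ones quoted in the text:

```python
import numpy as np
import pandas as pd

def clean(df):
    """Mark out-of-bound values and replace them with the previous
    valid timestep observation (column names are hypothetical)."""
    df = df.copy()
    df.loc[df["T_1.5m"] < 250.0, "T_1.5m"] = np.nan   # air temperature bound (K)
    df.loc[df["P"] < 80000.0, "P"] = np.nan           # pressure bound (Pa)
    # per-turbine power: negative values are physically impossible,
    # values above 1.8 MW are treated as outliers
    for col in [c for c in df.columns if c.startswith("turbine_")]:
        df.loc[(df[col] < 0.0) | (df[col] > 1.8), col] = np.nan
    return df.ffill()  # forward-fill with the previous valid observation
```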


2.3. Data scaling

The dataset is scaled to bring the variables within the predefined range of [0, 1]. The MinMaxScaler from the scikit-learn (version 0.21.1) [61] library is used to scale each variable using Equation (3):

$$x_{\text{scaled}} = \frac{s_{\max}-s_{\min}}{x_{\max}-x_{\min}}\,x + s_{\min} - \frac{s_{\max}-s_{\min}}{x_{\max}-x_{\min}}\,x_{\min} \tag{3}$$

where $s_{\min} = 0$, $s_{\max} = 1$, and $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the variable of interest in the training set, with $x$ being the actual value of the variable.
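In code, this corresponds to the standard scikit-learn usage below; fitting on the training split only, so that the test data reuses the training-set bounds as the text specifies, is the important detail (array names are hypothetical):

```python
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0, 1))
X_train_scaled = scaler.fit_transform(X_train)  # learns x_min, x_max per column
X_test_scaled = scaler.transform(X_test)        # reuses the training-set bounds
```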
FðxÞ represents the residual mapping to be learned. The weight
2.4. Proposed model: temporal convolutional networks (TCN)

Temporal convolutional networks (TCN) have been successfully applied to several applications in recent years, with small tweaks in the architecture [50,51,59,62]. The TCN model considered in this study is based on the version discussed by Bai et al. [59]. The TCN library used here is provided by Philippe Remy and is based on the Keras library; the original locuslab implementation of TCN can be found in Ref. [63].

The TCN architecture can be defined as TCN = 1D FCN + causal convolutions. A 1D FCN is a one-dimensional fully convolutional network [64] that helps accomplish one of the TCN principles, namely keeping the network output the same length as the input. Causal convolutions are used to accomplish the other principle of no leakage from the future to the past: the output at time $t$ is convolved only with the elements at time $t$ and earlier in the previous layer. To take advantage of a (possibly) long history, dilated convolutions are used instead of simple causal convolutions, which might otherwise require a very deep network, resulting in a complicated structure and heavy computation. The dilation factor $d$ is the fixed step between every two adjacent filter taps. Dilated convolutions help increase the size of the receptive field while the number of tunable parameters grows only linearly [65]. For a 1D sequence input $x \in \mathbb{R}^n$ and a filter $f: \{0, \ldots, k_s - 1\} \rightarrow \mathbb{R}$, the dilated convolution operation $F$ at time $t$ is defined as:

$$F(t) = (x *_d f)(t) = \sum_{i=0}^{k_s-1} f(i)\, x_{t-d\cdot i} \tag{4}$$

where $k_s$ is the filter size and $t - d\cdot i$ indicates the direction of the past; $d$ is typically tied to the level of the network. The receptive field of the TCN can be increased using a larger filter size $k_s$ and dilation factor $d$. The history covered by one such layer has length $(k_s - 1)d$. A larger dilation helps the output at the top level represent a wider range of inputs, thus increasing the receptive field. With $d = 1$, a dilated convolution reduces to a regular convolution. The value of $d$ is generally increased exponentially with the depth of the network.

The residual block is the other main component of TCN, defined as in Equation (5), where $o$ is the output of the layer and $F(x)$ represents the residual mapping to be learned. Weight normalization is applied to the convolutional filters inside the residual block. The activation function applied can be the rectified linear unit (ReLU). After the weight layer, a dropout layer can be added to avoid overfitting [66]. Dropout is a regularization technique used to drop some random outputs of the network layer [67]. The number of neurons to drop is given by the dropout rate, ranging from 0 to 1, which is the probability at which the outputs of the layer are dropped.

$$o = \text{Activation}(x + F(x)) \tag{5}$$
The TCN receptive field also depends on the number of stacks of residual blocks. For example, the size of the receptive field with kernel size $k_s = 3$, dilation factors $d = 1, 2, 4$ and $n_s = 1$ stack of residual blocks will be $3 \cdot 4 \cdot 1 = 12$. Following the illustration in Ref. [59], the TCN architecture is reproduced and shown in Fig. 4.

Fig. 4. Architectural elements of TCN.
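A sketch of how such a network can be assembled with the keras-tcn package [63] is shown below, using the hyperparameter values later selected by the OATM (Table 4); the layer stack around the TCN block is our assumption rather than the authors' exact code:

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Dense
from tcn import TCN  # keras-tcn package [63]

inp = Input(shape=(6, 15))   # i_h = 5 past steps plus t, 15 input variables
z = TCN(nb_filters=10, kernel_size=5, nb_stacks=2,
        dilations=[1, 2, 4, 8, 16], padding="causal")(inp)
out = Dense(6)(z)            # simultaneous forecasts for t .. t+5
model = Model(inp, out)
model.compile(optimizer="adam", loss="mse")
# model.fit(X_w, Y_w, epochs=100, batch_size=1000)  # settings from Section 3.1
```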
2.5. Comparing method I: LSTM

LSTM networks use self-loops that produce paths through which gradient information can flow for long durations [68]. In this paper, the Keras [69] library with a TensorFlow [70] backend is used for implementing the LSTM model. A typical LSTM cell consists of three gates that protect and control the cell state. The forget gate uses the output from the previous state and the input $x_t$ to decide which information to forget from the previous state. It uses a sigmoid activation function to output values between 0 and 1, with 0 representing completely forgetting the information and 1 completely keeping it. Next, an input gate helps decide which new information needs to be stored in the cell state: it first uses a sigmoid layer to decide on the values to update, and then a tanh layer to create new candidate values. Next, the old cell state is updated using the information from the forget gate, the input gate, and the new candidate values. Finally, the output gate decides the output based on the cell state. It runs through a sigmoid layer to decide on the parts of the cell state to output, and then through a tanh layer that uses the output of the sigmoid layer to provide the final output.
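A minimal Keras sketch of this comparator, using the OATM-selected values from Table 4 (one layer, 32 neurons, learning rate 0.001); the output head and loss are our assumptions:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.optimizers import Adam

model = Sequential([
    LSTM(32, input_shape=(6, 15)),  # i_h = 5 history steps plus t, 15 variables
    Dense(6),                       # predictions for t .. t+5
])
model.compile(optimizer=Adam(learning_rate=0.001), loss="mse")
```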
2.6. Comparing method II: CNN + LSTM

Convolutional neural networks (CNN) are also multi-layer supervised networks which can learn features automatically from the data, and are used intensively in the field of image and pattern recognition [37,71,72]. The CNN architecture consists of several convolutional and pooling layers stacked to extract higher-level features, which are finally passed through a fully connected layer that uses all the information to provide the final predictions. The convolutional layer uses several filters that convolve over the data, producing the activation maps of the filters. The pooling layers help speed up the calculation by reducing the number of parameters. In this study, a CNN + LSTM model is used, where the data is passed through one convolution and one pooling layer to extract the features, and then through LSTM layers and a final dense layer to get the predictions. For implementing the CNN + LSTM model, the Keras library with a TensorFlow backend is used as well.
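A hedged Keras sketch of the described pipeline (one convolution and one pooling layer, followed by LSTM layers and a dense output); the filter count and layer counts follow Table 4 where available, while the kernel size and other details are our assumptions:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, LSTM, Dense

model = Sequential([
    Conv1D(128, kernel_size=3, padding="causal", activation="relu",
           input_shape=(11, 15)),   # i_h = 10 history steps plus t
    MaxPooling1D(pool_size=2),      # reduces the temporal resolution
    LSTM(32, return_sequences=True),
    LSTM(32, return_sequences=True),
    LSTM(32),                       # n_l = 3 LSTM layers of n_n = 32 neurons
    Dense(6),                       # predictions for t .. t+5
])
model.compile(optimizer="adam", loss="mse")
```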
2.7. Comparing method III: MLR

The multi linear regression (MLR) model assumes a linear relationship between the input data (MET data + calculated variables) and the total power. It also assumes all observations are normally and independently distributed [73]. The MLR model is formulated as $y_t = \beta_0 + \beta_1 x_{t1} + \ldots + \beta_p x_{tp} + \varepsilon$, where $y_t$ is the output variable at timestep $t$, $x_{t1}, \ldots, x_{tp}$ are the $p$ input variables at timestep $t$, $\beta_0, \ldots, \beta_p$ are the $p+1$ parameters, and $\varepsilon$ is the random error of the model. The MultiOutputRegressor and LinearRegression functions provided by scikit-learn [61] are used to build the MLR model. If the model uses the input history, all the inputs in the past are added as features. For example, the model that uses an input history of 5 timesteps will have $5p$ features, with $p$ features at each of the timesteps $t-4$, $t-3$, $t-2$, $t-1$, and $t$.
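A small scikit-learn sketch of this setup, reusing the hypothetical windowed arrays from Section 2 and flattening the history so that each past timestep contributes $p$ features:

```python
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import MultiOutputRegressor

X_flat = X_w.reshape(len(X_w), -1)         # (samples, timesteps * p) features
mlr = MultiOutputRegressor(LinearRegression())
mlr.fit(X_flat, Y_w)                       # Y_w holds the 6 target timesteps
Y_hat = mlr.predict(X_flat)
```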
2.8. Performance metrics

Normalized root mean square error (NRMSE) and normalized mean absolute error (NMAE) are selected as the performance metrics to evaluate all the models in this study. The calculation of the RMSE and MAE performance metrics is shown in Equations (6) and (7), with $y^{\text{Tru}}$ being the actual total power produced, $y^{\text{Est}}$ the estimated total power, and $T$ the total number of samples in the test set.

$$\text{RMSE} = \sqrt{\frac{\sum_{i=1}^{T}\left(y^{\text{Tru}}_i - y^{\text{Est}}_i\right)^2}{T}} \tag{6}$$

$$\text{MAE} = \frac{\sum_{i=1}^{T}\left|y^{\text{Tru}}_i - y^{\text{Est}}_i\right|}{T} \tag{7}$$

Then, the RMSE and MAE are normalized and reported as the normalized root mean square error ($\text{NRMSE} = \text{RMSE}/\bar{O}$) and the normalized mean absolute error ($\text{NMAE} = \text{MAE}/\bar{O}$), where $\bar{O}$ is the average of all actual total power observations in the test set.
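The two metrics in a short numpy sketch (our helper, mirroring Equations (6) and (7)):

```python
import numpy as np

def nrmse_nmae(y_true, y_est):
    """NRMSE and NMAE per Equations (6)-(7), normalized by the mean
    of the actual observations in the test set."""
    y_true, y_est = np.asarray(y_true), np.asarray(y_est)
    rmse = np.sqrt(np.mean((y_true - y_est) ** 2))
    mae = np.mean(np.abs(y_true - y_est))
    o_bar = y_true.mean()
    return rmse / o_bar, mae / o_bar
```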
 Step 4: Conduct range analysis to analyze the experiment results
2.9. Orthogonal array tuning method (OATM)

This section describes the proposed methodology for optimizing the (predictive) performance of the TCN, LSTM and CNN + LSTM models. The design parameters of deep learning models that define the network architecture are known as hyperparameters. One of the most common approaches for optimizing the hyperparameters is grid search. However, grid search suffers from the curse of dimensionality when the number of hyperparameters is very large [74]. Random search is another approach, which randomly samples and optimizes the hyperparameters [75]. It also has disadvantages, as it does not take the previous evaluation outcomes into consideration [76]. Later, Bayesian optimization techniques [77] have been successfully used to optimize hyperparameters [78]. There is also an emerging research area on the automatic selection of models based on the data [79]. One of the major limitations of Bayesian optimization is its dependence on the quality of the learned model [80]. Recently, Zhang et al. [81] proposed the Orthogonal Array Tuning Method (OATM), a highly fractional orthogonal design based on the Taguchi [82] approach, which trades off tuning time against competitive performance. In this study, the OATM approach is used to optimize the hyperparameters of the TCN, LSTM, and CNN + LSTM models.

In the OATM approach, the hyperparameters are considered as factors and the different values of the hyperparameters are considered as levels. The major steps of OATM, described in Ref. [81], are as follows (a small code sketch of the range analysis is given at the end of this section):

• Step 1: Determine the number of factors to be optimized and the number of levels for each factor, and build the factor-level (F-L) table.
• Step 2: Construct the orthogonal array tuning table with k factors, h levels and M rows, following the basic composition principles.
• Step 3: Run M models using the hyperparameters corresponding to each row.
• Step 4: Conduct range analysis to analyze the experiment results and identify the optimal levels of the hyperparameters.
• Step 5: Run the model using the hyperparameters suggested by the range analysis.

As an additional step in this study, after running the model with the optimal set of hyperparameters suggested by the OATM approach, the final set of hyperparameters is selected as the one providing the best performance out of all the M + 1 runs tested (M combinations from the orthogonal array table plus the one combination suggested by the OATM approach).
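The range analysis of Step 4 reduces to a group-by average over the orthogonal array results. The sketch below reproduces it with pandas on the L9 results reported for the TCN model in Table 3 (the code is illustrative, not the authors'):

```python
import pandas as pd

# L9 orthogonal array results for the TCN model (values from Table 3)
l9 = pd.DataFrame({
    "nf": [5, 5, 5, 10, 10, 10, 15, 15, 15],
    "ks": [5, 8, 10, 5, 8, 10, 5, 8, 10],
    "ns": [2, 3, 5, 3, 5, 2, 5, 2, 3],
    "ih": [5, 10, 15, 15, 5, 10, 10, 15, 5],
    "nrmse": [0.379, 0.477, 0.545, 0.416, 0.384, 0.395, 0.426, 0.439, 0.406],
})
for factor in ["nf", "ks", "ns", "ih"]:
    e_level = l9.groupby(factor)["nrmse"].mean()  # E_level = R_level / h
    print(factor, dict(e_level.round(3)), "-> best level value:", e_level.idxmin())
# The ranges (max - min of E_level) order the factors by sensitivity:
# ih > nf > ns > ks, matching the importance row of Table 3.
```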
2.10. Factor-level table of hyperparameters

Initially, a few trials are run to understand the hyperparameter ranges to be tested. Based on the results of the initial runs and the literature [81,83], four factors are picked with three levels each for the TCN, LSTM and CNN + LSTM models. The four factors selected for TCN are the number of filters ($n_f$), kernel size ($k_s$), number of stacks ($n_s$) and amount of input history ($i_h$). The four factors selected for CNN + LSTM are the number of filters ($n_f$), number of layers ($n_l$), number of neurons ($n_n$) and amount of input history ($i_h$). The four factors selected for LSTM are the learning rate of the ADAM optimizer ($l_r$), number of layers ($n_l$), number of neurons ($n_n$) and amount of input history ($i_h$). The factor-level (F-L) table of the TCN, LSTM and CNN + LSTM is shown in Table 2.

Table 2
Factor-level table for the TCN, LSTM and CNN + LSTM models.

Model | Factors | Level 1 | Level 2 | Level 3
TCN | $n_f$, $k_s$, $n_s$, $i_h$ | 5, 5, 2, 5 | 10, 8, 3, 10 | 15, 10, 5, 15
LSTM | $l_r$, $n_l$, $n_n$, $i_h$ | 0.001, 1, 32, 5 | 0.005, 2, 64, 10 | 0.01, 3, 128, 15
CNN + LSTM | $n_f$, $n_l$, $n_n$, $i_h$ | 64, 1, 32, 5 | 128, 2, 64, 10 | 256, 3, 128, 15

3. Results and discussion

This section investigates the performance of the proposed TCN model along with those of the LSTM, CNN + LSTM, and MLR models, based on the wind farm dataset described in Section 2. The data is split 80:20 for training and testing the models. All the models use the 15 input variables to predict the total power up to 6 timesteps ahead, represented as t, t+1, t+2, t+3, t+4 and t+5.

3.1. Result of hyperparameter optimization

Here, the optimization methodology discussed in Section 2.9 is applied to optimize the prediction performance of the LSTM, CNN + LSTM and TCN models in terms of NRMSE. All the experiments are run for 100 epochs with a batch size of 1000 and no dropout. The default learning rate values are used for the CNN + LSTM and TCN models. For the TCN runs, a dilation depth of 5 is used, which generates the list of dilations {1, 2, 4, 8, 16}. As described in Section 2.10, the F-L table is built with k = 4 factors and h = 3 levels and shown in Table 2 (Step 1). The Taguchi method [82] is used to build the orthogonal array tuning table; the JMP software [84] is used to create the OATM table with the L9 configuration, constructing a table with M = 9 rows (Step 2). Each model is run with the 9 different combinations of hyperparameters provided in the OATM table (Step 3). To conduct the range analysis, all the NRMSE results are recorded as shown in Table 3, where $R_{level\,i}$ denotes the sum of the error under level $i$, and $E_{level\,i}$ is the average error of level $i$, calculated as $E_{level\,i} = R_{level\,i}/h$ (Step 4). Then, the model is run using the hyperparameters suggested by the range analysis of the OATM approach (Step 5).

The range analysis of the TCN model is shown in Table 3. For the TCN model, the optimal set of hyperparameters suggested by the OATM approach provides the lowest NRMSE of all the runs. The range analysis of the OATM approach also indicates the sensitivity of the design factors to the performance of the models. For the TCN model, the input history size ($i_h$) is the most sensitive design factor in terms of NRMSE, followed by the number of filters ($n_f$), the number of residual stacks ($n_s$) and the kernel size ($k_s$). The optimal sets of hyperparameters for the comparing models are shown in Table 4.

Table 3
Range analysis of the TCN model.

Row No. | $n_f$ | $k_s$ | $n_s$ | $i_h$ | NRMSE
1 | 5 | 5 | 2 | 5 | 0.379
2 | 5 | 8 | 3 | 10 | 0.477
3 | 5 | 10 | 5 | 15 | 0.545
4 | 10 | 5 | 3 | 15 | 0.416
5 | 10 | 8 | 5 | 5 | 0.384
6 | 10 | 10 | 2 | 10 | 0.395
7 | 15 | 5 | 5 | 10 | 0.426
8 | 15 | 8 | 2 | 15 | 0.439
9 | 15 | 10 | 3 | 5 | 0.406
R_level1 | 1.401 | 1.221 | 1.212 | 1.169
R_level2 | 1.194 | 1.300 | 1.299 | 1.298
R_level3 | 1.271 | 1.346 | 1.355 | 1.399
E_level1 | 0.467 | 0.407 | 0.404 | 0.390
E_level2 | 0.398 | 0.433 | 0.433 | 0.433
E_level3 | 0.424 | 0.449 | 0.452 | 0.466
Lowest NRMSE | 0.398 | 0.407 | 0.404 | 0.390
Highest NRMSE | 0.467 | 0.449 | 0.452 | 0.466
Range | 0.069 | 0.042 | 0.048 | 0.077
Importance: $i_h > n_f > n_s > k_s$
Best level | Level 2 | Level 1 | Level 1 | Level 1
Suggested optimal values: $n_f = 10$, $k_s = 5$, $n_s = 2$, $i_h = 5$ (NRMSE 0.375)

Table 4
Optimal hyperparameters of the TCN, LSTM and CNN + LSTM.

Model | Factor 1 | Factor 2 | Factor 3 | Factor 4
TCN | $n_f$ = 10 | $k_s$ = 5 | $n_s$ = 2 | $i_h$ = 5
LSTM | $l_r$ = 0.001 | $n_l$ = 1 | $n_n$ = 32 | $i_h$ = 5
CNN + LSTM | $n_f$ = 128 | $n_l$ = 3 | $n_n$ = 32 | $i_h$ = 10

3.2. Analysis of the predictive performance

This section investigates the predictive performance of the TCN model in comparison to the LSTM, CNN + LSTM, and MLR models, using the optimized hyperparameters from Section 3.1. The MLR model is tested with 5, 10 and 15 timesteps of input history, and the model with the lowest NRMSE, the one using 5 timesteps of input history, is selected for the analysis. Table 5 shows the NRMSE and NMAE values of the comparing methods at t, t+1, t+2, t+3, t+4 and t+5 timesteps ahead, along with the overall NRMSE and NMAE, which concatenate the observations of all timesteps. All of the models in the study predict 6 timesteps ahead simultaneously. The cells in the table are color coded, with darker values representing higher predictive accuracy. In terms of both NRMSE and NMAE, the TCN model generally performs better than the other methods. The overall NRMSE of the TCN model is 12.6%, 6.4% and 6.4% less than that of the MLR, LSTM and CNN + LSTM models respectively, and the overall NMAE of the TCN model is 20.3%, 8.6% and 2.0% less than that of the MLR, LSTM and CNN + LSTM models respectively.
Table 5
NRMSE and NMAE of the TCN, LSTM, CNN + LSTM and MLR. (The color-coded body of this table appears as an image in the original article; the key values are discussed in the text.)

At prediction timestep t+3, although the NMAE values of TCN and CNN + LSTM look the same at 0.277, without rounding to the third decimal place the NMAE value of CNN + LSTM is 0.2% less than that of the TCN model. Considering the NRMSE at the same prediction timestep t+3, the TCN model's value is 4.4% less than that of the CNN + LSTM model. Although both RMSE and MAE express average model prediction error, RMSE has the advantage of penalizing large errors more and also does not use the absolute value, which is undesirable in many mathematical calculations [85]. As the RMSE metric is the most common metric used for revealing the performance of prediction models, the TCN model is considered the better performing model overall, even at prediction step t+3. These results demonstrate how the TCN's hierarchical temporal convolutional filters help capture long-range patterns by overcoming the previous shortcomings of recurrent network architectures.

To further extend the analysis, the scaled actual vs. predicted powers of the comparing methods are visualized over different periods of time. For this purpose, the test dataset, which covers the time interval of mid October to the end of December, is divided into four time periods of 1000 timesteps each (10 min apart), representing the beginning of November, the end of November, the beginning of December, and the end of December, respectively. These four time periods are then used to visualize the errors of the comparing methods for the current prediction timestep t. Figs. 5-8 show the scaled actual and predicted total powers at time t for all four periods. The predicted maximum total powers of the TCN model, normalized using the average of all predictions in the test set, are 1.811, 2.585, 2.664 and 2.469 for the four selected periods, respectively. The corresponding actual total powers, normalized using the average of all actual observations in the test set, are 2.906, 2.603, 2.756 and 2.739, respectively. Except for the period covering the beginning of November, the predicted maximum total powers of TCN are within the 90% confidence interval of the actual total powers. The period for which the predicted total power is not within the 90% confidence interval corresponds to the highest over-predicted observation of the overall test data. Similarly, comparing the predicted maximum total powers of each of the MLR, LSTM and CNN + LSTM models with the corresponding actual total powers, MLR and CNN + LSTM have only one time period out of four for which the prediction is within the 90% confidence interval, while the LSTM model has none. Predicting the valleys of all four period time series is challenging for all the models. The normalized predicted minimum total powers of the TCN model are −0.007, 0.049, 0.053 and 0.024, while the corresponding normalized actual total powers are 0.020, 0.083, 0.054 and 0.030, respectively.

Fig. 5. Scaled actual vs predicted total powers at time t for 1000 timesteps in the beginning of November month using TCN, LSTM, CNN + LSTM and MLR models.
Fig. 6. Scaled actual vs predicted total powers at time t for 1000 timesteps in the end of November month using TCN, LSTM, CNN + LSTM and MLR models.

Fig. 7. Scaled actual vs predicted total powers at time t for 1000 timesteps in the beginning of December month using TCN, LSTM, CNN + LSTM and MLR models.

Fig. 8. Scaled actual vs predicted total powers at time t for 1000 timesteps in the end of December month using MLR, LSTM, CNN + LSTM and TCN models.

Similarly, comparing the predicted minimum total powers of each of the MLR, LSTM and CNN + LSTM models with the corresponding actual total powers, none of the models, including the TCN model, predict the minimum total powers within the 90% confidence interval, except at one period for the TCN model. However, considering the timesteps at which the TCN model predicted the minimum total powers and comparing the predictions at those timesteps to the MLR, LSTM and CNN + LSTM models, the absolute error of the TCN model is significantly less than that of the comparing methods for all periods except the period covering the end of December.

Apart from the TCN model, considering the predicted maximum and minimum total powers of the CNN + LSTM model, the CNN + LSTM model performs significantly better than the LSTM and MLR models, having one period out of four within the 90% confidence interval and one period with an absolute error 51.6% and 52.7% less than the absolute errors of the MLR and LSTM models, respectively.

Finally, the power curves representing the correlation between the scaled actual total power and the wind speed, and between the scaled predicted total power from the TCN model at current time t and the wind speed, are shown in Fig. 9a and 9b, respectively. As shown in Fig. 9a and 9b, there exists a strong correlation between the total power and the wind speed, as expected. The Pearson's correlation coefficient of the actual total power with wind speed in the test data is 0.86, while the correlation coefficient of the predicted total power with wind speed is 0.91. Fig. 9c shows the absolute error plotted versus the wind speed. It should be noted that the cut-in speed is 3.5 m/s and the cut-out speed is 25 m/s. At low wind speeds and at high wind speeds, the errors are low. However, the errors are high in the range of wind speeds from 8 to 12 m/s. The absolute error distribution has a pronounced peak at the center (around 10 m/s), with a broad distribution of errors across the other wind speeds. Fig. 9d shows the squared error distribution.

3.3. Discussion

The TCN model provides the best performance over all of the comparing methods across almost all timesteps in terms of all performance metrics. With respect to the NRMSE metric, the highest performance gaps of the proposed TCN with each of the comparing methods are: 21.0% for the MLR model, occurring at t (current time); 9.9% for the LSTM model at t+1 (10 min ahead); and 8.0% for the CNN + LSTM model at t+2 (20 min ahead). Similarly, using NMAE, the highest performance gaps between the proposed TCN and the comparing methods are: 35.5% for MLR, which occurs at t; 13.1% for LSTM at t+1; and 1.9% for CNN + LSTM at t+2, respectively. Meanwhile, as expected, the performance gap between the proposed TCN and the other comparing methods generally decreases as the prediction timestep increases. This is mainly due to the decrease in the prediction power of all methods for timesteps further away.

Based on the OATM approach, an input history of 5 timesteps (50 min) is found optimal for all of the comparing methods except for CNN + LSTM, which requires 10 timesteps (100 min). The need for additional history in CNN + LSTM is probably due to the convolutional layers used to extract the features. Also, after the TCN model, the CNN + LSTM model provides the best performance in terms of NRMSE and NMAE compared to the MLR and LSTM models.

Finally, while the focus of the paper is mainly on using atmospheric variables for the prediction of wind power, for the reader's reference, Table 6 shows the NRMSE and NMAE performances of the TCN, LSTM, CNN + LSTM and MLR models when the historic wind power at time t is included as one of the input variables.

Fig. 9. Correlation of scaled actual and predicted total powers of the TCN model at time t, and absolute and squared error distributions of the total power at time t with respect to wind speed at hub height.

Table 6
NRMSE and NMAE of the TCN, LSTM, CNN + LSTM and MLR with total wind power as input in addition to atmospheric inputs. (The color-coded body of this table appears as an image in the original article.)

These models use the atmospheric variables and the total wind power at t to predict the total wind power at t+1, t+2, t+3, t+4 and t+5 simultaneously. As expected, including the historic wind power improves the accuracy of the forecasts for all models and brings the predictions of the different methods close to each other. This is because historic wind power is one of the most influential factors for the prediction of future power (due to autocorrelation). Also, while the proposed TCN method still provides the best overall performance when the historic wind power is included, for the very short prediction intervals (t+1, t+2) the MLR method shows better forecasts. This is because: (1) the objective function of the comparing models is designed to provide the lowest prediction error across all future timesteps (and not only one or two intervals), and (2) the very short prediction intervals (t+1, t+2) involve only a few minutes ahead predictions (10 min, 20 min), and therefore historic wind power makes a very good predictor, as significant changes in the wind power are unlikely over a very short time. Meanwhile, as the prediction timestep gets farther away from the input timestep, the prediction accuracy of MLR decreases.

4. Conclusions

In this study, a state-of-the-art temporal convolutional network (TCN) is proposed to provide accurate multi-step forecasts of the total wind power generated by a 130 MW full-scale wind farm for up to 50 min ahead using atmospheric variables. An orthogonal array tuning method (OATM) is utilized for optimizing the hyperparameters of the proposed TCN method, which include the number of filters, the kernel size, the number of residual stacks, and the amount of input history. As most of the literature considers using power as one of the input variables, which might not be easily available, the models in this study use only meteorological and derived (from MET data) inputs to predict the total power of a wind farm. We believe this study is the beginning of understanding the usage of TCN models in improving wind power prediction, as most of the current models are largely based on long short-term memory (LSTM) models. Using a full-scale wind farm dataset, including 12 months of meteorological (MET) and wind turbine power data from 86 wind turbines, this study compares the predictive performance of the TCN model with LSTM, convolutional neural network (CNN)+LSTM, and multi linear regression (MLR) in terms of normalized root mean square error (NRMSE) and normalized mean absolute error (NMAE). The results show the proposed TCN model provides the most competitive and robust performance of all comparing methods across almost all timesteps and performance metrics, by a large margin. The results also show that, unlike other methods, the maximum predicted total powers of the proposed TCN model are often within the 90% confidence interval of the actual values.
The proposed TCN model offers an important contribution to the short-term forecast of total wind power at utility-scale wind farms by increasing the prediction accuracy. The study also demonstrates the significance of meteorological variables, in addition to wind speed and wind direction, for the prediction of wind power. While wind speed has the highest correlation with the total power, the proposed study demonstrates non-trivial correlations with meteorological variables such as surface temperature, relative humidity, surface heat flux, and air density. Given that short-term predictions of power based on data collected from meteorological towers and numerical models are susceptible to significant errors, by integrating the recent history of all the meteorological variables, the proposed TCN model provides accurate short-term predictions.

Credit author statement

Rajitha Meka: Conceptualization, Methodology, Data Preparation, Coding the Algorithms, Conducting the Numerical Studies, Analyzing the Results, Writing - original draft. Adel Alaeddini: Conceptualization, Methodology, Supervision, Analyzing the Results, Writing - Reviewing and Editing. Kiran Bhaganagar: Conceptualization, Methodology, Supervision, Analyzing the Results, Writing - Reviewing and Editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

We acknowledge funding support from NASA MIRO Grant #80NSSC19M0194 and partial funding support from AFOSR YIP Grant #FA9550-16-1-0171.

Appendix

Air density, surface heat flux and the change in surface temperature are calculated using Equations (8)-(10), where $P_w$, $c_p$ and $r_{aa}$ are the vapor pressure, specific heat and aerodynamic resistance [86], respectively.

$$\text{Air density: } \rho = \frac{P_d}{287.058\,T} + \frac{P_w}{461.495\,T}, \quad \text{where } P_w = 6.1078 \times 10^{\frac{7.5\,T}{T+237.3}}\left(\frac{RH}{100}\right),\; P_d = P - P_w \tag{8}$$

$$\text{Surface heat flux: } H = 1000\,\frac{\rho\, c_p\,(T_s - T)}{r_{aa}}, \quad \text{where } c_p = 1.003,\; r_{aa} = \frac{208}{U_{1.5m}} \tag{9}$$

$$\text{Change in surface temperature} = \frac{\partial T_s}{\partial t} \tag{10}$$
12
R. Meka, A. Alaeddini and K. Bhaganagar Energy 221 (2021) 119759

[24] Barbounis TG, Theocharis JB, Alexiadis MC, Dokopoulos PS. Long-term wind the classification of satellite image time series. Rem Sens 2019;11(5):523.
speed and power forecasting using local recurrent neural network models. [54] Razavian N, Sontag D. Temporal convolutional neural networks for diagnosis
IEEE Trans Energy Convers 2006;21(1):273e84. from lab tests. arXiv preprint, 1511.07938.
[25] Senjyu T, Yona A, Urasaki N, Funabashi T. Application of recurrent neural [55] Jarrett D, Yoon J, van der Schaar M. Dynamic prediction in clinical survival
network to long-term-ahead generating power forecasting for wind power analysis using temporal convolutional networks. IEEE J Biomed Eng Health
generator. In: 2006 IEEE PES power systems conference and exposition. IEEE; Commun 2019 Jul 17;24(2). 424-36.
2006. p. 1260e5. [56] Lin L, Xu B, Wu W, Richardson T, Bernal EA. Medical time series classification
[26] Felder M, Kaifel A, Graves A. Wind power prediction using mixture density with hierarchical attention-based temporal convolutional networks: a case
recurrent neural networks. In: Poster presentation Gehalten auf der European study of myotonic dystrophy diagnosis. arXiv preprint, 1903.117481.
wind energy conference; 2010. [57] Moor M, Horn M, Rieck B, Roqueiro D, Borgwardt K. Temporal convolutional
[27] Chen G. A gentle tutorial of recurrent neural network with error back- networks and dynamic time warping can drastically improve the early pre-
propagation. arXiv preprint, 1610.02583. diction of sepsis. arXiv preprint, 1902.01659.
[28] F. A. Gers, J. Schmidhuber, F. Cummins, Learning to forget: Continual predic- [58] Catling FJ, Wolff AH. Temporal convolutional networks allow early prediction
tion with lstm. of events in critical care. J Am Med Inf Assoc 2020;27(3):355e65.
[29] Sundermeyer M, Schlüter R, Ney H. Lstm neural networks for language [59] Bai S, Kolter JZ, Koltun V. An empirical evaluation of generic convolutional and
modeling. In: Thirteenth annual conference of the international speech recurrent networks for sequence modeling. arXiv preprint, 1803.01271.
communication association; 2012. [60] Bartlett P. Introduction to time series analysis. lecture 2014;5.
[30] Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput [61] Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O,
1997;9(8):1735e80. Blondel M, Prettenhofer P, Weiss R, Dubourg V, et al. Scikit-learn: machine
[31] Gers FA, Schmidhuber J. Recurrent nets that time and count. In: Proceedings of learning in python. J Mach Learn Res 2011;12(Oct):2825e30.
the IEEE-INNS-ENNS international joint conference on neural networks. IJCNN [62] Lea C, Vidal R, Reiter A, Hager GD. Temporal convolutional networks: a unified
2000. Neural computing: new challenges and perspectives for the new mil- approach to action segmentation. In: European conference on computer
lennium, vol. 3. IEEE; 2000. p. 189e94. vision. Springer; 2016. p. 47e54.
[32] Gers FA, Schraudolph NN, Schmidhuber J. Learning precise timing with lstm [63] Remy P. Tcn library. https://github.com/philipperemy/keras-tcn.
recurrent networks. J Mach Learn Res 2002;3(Aug):115e43. [64] Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic
[33] Sutskever I, Vinyals O, Le QV. Sequence to sequence learning with neural segmentation. In: Proceedings of the IEEE conference on computer vision and
networks. In: Advances in neural information processing systems; 2014. pattern recognition; 2015. p. 3431e40.
p. 3104e12. [65] Yu F, Koltun V. Multi-scale context aggregation by dilated convolutions. arXiv
[34] Vinyals O, Kaiser Ł, Koo T, Petrov S, Sutskever I, Hinton G. Grammar as a preprint, 1511.07122.
foreign language. In: Advances in neural information processing systems; [66] Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: a
2015. p. 2773e81. simple way to prevent neural networks from overfitting. J Mach Learn Res
[35] Graves A, Mohamed A-r, Hinton G. Speech recognition with deep recurrent 2014;15(1):1929e58.
neural networks. In: 2013 IEEE international conference on acoustics, speech [67] Gal Y, Ghahramani Z. A theoretically grounded application of dropout in
and signal processing. IEEE; 2013. p. 6645e9. recurrent neural networks. In: Advances in neural information processing
[36] Xiaoyun Q, Xiaoning K, Chao Z, Shuai J, Xiuda M. Short-term prediction of systems; 2016. p. 1019e27.
wind power based on deep long short-term memory. In: 2016 IEEE PES asia- [68] Goodfellow I, Bengio Y, Courville A. Deep learning. MIT press; 2016.
pacific power and energy engineering conference (APPEEC). IEEE; 2016. [69] Chollet F, et al. Keras. 2015. https://keras.io.
p. 1148e52. [70] Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, Corrado GS, Davis A,
[37] Wu W, Chen K, Qiao Y, Lu Z. Probabilistic short-term wind power forecasting Dean J, Devin M, Ghemawat S, Goodfellow I, Harp A, Irving G, Isard M, Jia Y,
based on deep neural networks. In: 2016 international conference on prob- Jozefowicz R, Kaiser L, Kudlur M, Levenberg J, Mane  D, Monga R, Moore S,
abilistic methods applied to power systems (PMAPS). IEEE; 2016. p. 1e8. Murray D, Olah C, Schuster M, Shlens J, Steiner B, Sutskever I, Talwar K,
[38] Lo pez E, Valle C, Allende H, Gil E, Madsen H. Wind power forecasting based on Tucker P, Vanhoucke V, Vasudevan V, Vie gas F, Vinyals O, Warden P,
echo state networks and long short-term memory. Energies 2018;11(3):526. Wattenberg M, Wicke M, Yu Y, Zheng X. TensorFlow: large-scale machine
[39] Shi X, Lei X, Huang Q, Huang S, Ren K, Hu Y. Hourly day-ahead wind power learning on heterogeneous systems. 2015. software available from: tensor-
prediction using the hybrid model of variational model decomposition and flow.org, https://www.tensorflow.org/.
long short-term memory. Energies 2018;11(11):3227. [71] LeCun Y, Boser B, Denker JS, Henderson D, Howard RE, Hubbard W, Jackel LD.
[40] Cali U, Sharma V. Short-term wind power forecasting using long-short term Backpropagation applied to handwritten zip code recognition. Neural Comput
memory based recurrent neural network model and variable selection, vol. 8; 1989;1(4):541e51.
2019. p. 103e10. [72] Ju Y, Sun G, Chen Q, Zhang M, Zhu H, Rehman MU. A model combining
[41] Yu R, Gao J, Yu M, Lu W, Xu T, Zhao M, Zhang J, Zhang R, Zhang Z. Lstm-efg for convolutional neural network and lightgbm algorithm for ultra-short-term
wind power forecasting based on sequential correlation features. Future wind power forecasting. IEEE Access 2019;7:28309e18.
Generat Comput Syst 2019;93:33e42. [73] Uyanık GK, Güler N. A study on multiple linear regression analysis. Procedia-
[42] Hao Y, Tian C. A novel two-stage forecasting model based on error factor and Social and Behavioral Sciences 2013;106:234e40.
ensemble method for multi-step wind power forecasting. Appl Energy [74] M. Claesen, J. Simm, D. Popovic, Y. Moreau, B. De Moor, Easy hyperparameter
2019;238:368e83. search using optunity, arXiv preprint arXiv:1412.1114.
[43] Liu Y, Guan L, Hou C, Han H, Liu Z, Sun Y, Zheng M. Wind power short-term [75] Bergstra J, Bengio Y. Random search for hyper-parameter optimization. J Mach
prediction based on lstm and discrete wavelet transform. Appl Sci Learn Res 2012;13(Feb):281e305.
2019;9(6):1108. [76] N. Reimers, I. Gurevych, Optimal hyperparameters for deep lstm-networks for
[44] Xu G, Xia L. Short-term prediction of wind power based on adaptive lstm. In: sequence labeling tasks, arXiv preprint arXiv:1707.06799.
2018 2nd IEEE conference on energy internet and energy system integration [77] E. Brochu, V. M. Cora, N. De Freitas, A tutorial on bayesian optimization of
(EI2). IEEE; 2018. p. 1e5. expensive cost functions, with application to active user modeling and hier-
[45] X. Wang, Z. Li, J. Zhang, H. Liu, C. Qiu, X. Cai, An lstm-attention wind power archical reinforcement learning, arXiv preprint arXiv:1012.2599.
prediction method considering multiple factors. [78] Y. Sun, H. Gong, Y. Li, D. Zhang, Hyperparameter importance analysis based on
[46] M. Du, Improving lstm neural networks for better short-term wind power n-rrelieff algorithm., Int J Comput Commun Contr 14 (4).
predictions, arXiv preprint arXiv:1907.00489. [79] Luo G. A review of automatic selection methods for machine learning algo-
[47] Zhou B, Ma X, Luo Y, Yang D. Wind power prediction based on lstm networks rithms and hyper-parameter values. Network Modeling Analysis in Health
and nonparametric kernel density estimation. IEEE Access 2019;7: Informatics and Bioinformatics 2016;5(1):18.
165279e92. [80] Calandra R, Gopalan N, Seyfarth A, Peters J, Deisenroth MP. Bayesian gait
[48] Sun Y, Wang P, Zhai S, Hou D, Wang S, Zhou Y. Ultra short-term probability optimization for bipedal locomotion. In: International conference on learning
prediction of wind power based on lstm network and condition normal dis- and intelligent optimization. Springer; 2014. p. 274e90.
tribution. Wind Energy 2020;23(1):63e76. [81] Zhang X, Chen X, Yao L, Ge C, Dong M. Deep neural network hyperparameter
[49] Greff K, Srivastava RK, Koutník J, Steunebrink BR, Schmidhuber J. Lstm: a optimization with orthogonal array tuning. In: International conference on
search space odyssey. IEEE transactions on neural networks and learning neural information processing. Springer; 2019. p. 287e95.
systems 2016;28(10):2222e32. [82] Taguchi G. System of experimental design; engineering methods to optimize
[50] Lea C, Flynn MD, Vidal R, Reiter A, Hager GD. Temporal convolutional net- quality and minimize costs. 1987. Tech. rep.
works for action segmentation and detection. In: Proceedings of the IEEE [83] Zhao W, Gao Y, Ji T, Wan X, Ye F, Bai G. Deep temporal convolutional networks
conference on computer vision and pattern recognition; 2017. p. 156e65. for short-term traffic flow forecasting. IEEE Access 2019;7:114496e507.
[51] Kim TS, Reiter A. Interpretable 3d human action analysis with temporal [84] SAS Institute Inc., Cary, NC, 1989-2019, Jmp version 15.
convolutional networks. In: 2017 IEEE conference on computer vision and [85] T. Chai, R. R. Draxler, Root mean square error (rmse) or mean absolute error
pattern recognition workshops (CVPRW). IEEE; 2017. p. 1623e31. (mae)?earguments against avoiding rmse in the literature.
[52] A. v. d. Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. [86] Chehbouni A, Nouvellon Y, Lhomme J-P, Watts C, Boulet G, Kerr Y, Moran MS,
Kalchbrenner, A. Senior, K. Kavukcuoglu, Wavenet: a generative model for Goodrich D. Estimation of surface sensible heat flux using dual angle obser-
raw audio, arXiv preprint arXiv:1609.03499. vations of radiative surface temperature. Agric For Meteorol 2001;108(1):
[53] Pelletier C, Webb GI, Petitjean F. Temporal convolutional neural network for 55e65.

13

You might also like