A Hybrid Wind Speed Forecasting Model Using Stacked Autoencoder and LSTM

Cite as: J. Renewable Sustainable Energy 12, 023302 (2020); https://doi.org/10.1063/1.5139689
Submitted: 25 November 2019 . Accepted: 16 March 2020 . Published Online: 22 April 2020
K. U. Jaseena, and Binsu C. Kovoor
© 2020 Author(s).
K. U. Jaseena a) and Binsu C. Kovoor b)
AFFILIATIONS
Division of Information Technology, School of Engineering, Cochin University of Science and Technology, Kochi, Kerala, India
a) Electronic mail: jaseena.mes@gmail.com
b) Author to whom correspondence should be addressed: binsu.kovoor@gmail.com
ABSTRACT
Fossil fuels cause environmental and ecosystem problems. Hence, fossil fuels are being replaced by nonpolluting, renewable, and clean energy
sources such as wind energy. The stochastic and intermittent nature of wind speed makes it challenging to obtain accurate predictions. Long
short term memory (LSTM) networks have proved to be reliable models for time series forecasting. Hence, an improved deep learning-based
hybrid framework to forecast wind speed is proposed in this paper. The new framework employs a stacked autoencoder (SAE) and a stacked
LSTM network. The stacked autoencoder extracts more profound and abstract features from the original wind speed dataset. Empirical tests
are conducted to identify an optimal stacked LSTM network. The extracted features from the SAE are then transferred to the optimal stacked
LSTM network for predicting wind speed. The efficiency of the proposed hybrid model is compared with machine learning models such as
support vector regression and artificial neural networks, and with deep learning based models such as recurrent neural networks and long short term
memory networks. Statistical error indicators, namely, mean absolute error, root mean squared error, and R², are adopted to assess the
performance of the models. The simulation results demonstrate that the suggested hybrid model produces more accurate forecasts.
Published under license by AIP Publishing. https://doi.org/10.1063/1.5139689
I. INTRODUCTION
Wind energy is one of the most efficient renewable energy sources. Greenhouse gases originating from conventional energy sources such as fossil fuels are the ultimate reason for global warming. Unlike traditional energy sources, renewable energy sources are pollution-free, economical, inexhaustible, and environment friendly. The effective use of wind energy contributes to sustainable development. Urbanization, population growth, and industrialization are the key reasons for the high demand for renewable energy sources. Wind power generation witnessed a significant increase from 2001 to 2018. Wind power production can be enhanced if the available wind speed is higher than the cut-in speed of the wind turbine. Since wind power is proportional to the cube of wind speed, a small change in wind speed produces a much larger change in wind power. Therefore, continuous monitoring of wind speed is a necessity in wind farms. Wind speed forecasting is essential for the proper functioning of the wind turbine and for optimum wind power generation.
Wind speed forecasting models can be classified into many types, depending on the prediction horizon and the approaches used for implementation. Forecasting models can be very short term, short term, medium term, and long term based on the prediction horizon. Apart from this, the models can also be physical, statistical, and hybrid, depending upon the approaches used for forecasting. Physical models are mathematical models. Statistical models are further divided into time series models, spatial correlation models, and artificial intelligence models. Hybrid models are combinations of two or more approaches. Hybrid models are the most recent models, which produce better forecasts. The current extensive availability of massive datasets and the advent of information technology motivated many researchers to explore hidden patterns in datasets. The artificial neural network (ANN) is one of the popular intelligent techniques for wind speed forecasting. Now, neural networks with deep architectures are promising in the field of wind speed prediction.
Numerous studies have been carried out for wind speed forecasting using intelligent predictors, deep learning predictors, and hybrid models.¹ Artificial neural networks (ANNs), extreme learning machines (ELMs), support vector machines (SVMs), and fuzzy logic models are the main intelligent predictors used for wind speed and energy forecasting. The deep learning predictors consist of autoencoders, radial basis function (RBF), convolutional neural networks (CNNs), and recurrent neural networks (RNNs). Many works have been proposed so far based on the above-mentioned techniques.
Ranganayaki and Deepa² described a wind speed prediction model using the linear SVM and proximal SVM, where in the proximal SVM, both linear and nonlinear kernel functions are employed. Li et al.³ presented an improved wind speed forecasting system using the least squares SVM, where parameters of the model are optimized using the Ant Colony Optimization technique. Blanchard and Samanta⁴ suggested a wind speed prediction model using two variants of ANNs, namely, NAR (nonlinear auto regressive) and NARX (nonlinear auto regressive neural network with exogenous inputs). It is verified that the NARX model works with better prediction accuracy even with less data than the NAR network. Madhiarasan and Deepa⁵ provided an overview of different techniques used for selecting the count of hidden neurons in ANNs. The authors also suggested a new method for the selection of the count of hidden neurons, specifically for wind speed prediction problems. Du et al.⁶ analyzed the effectiveness and feasibility of deep belief networks (DBNs) to predict the weather using massive meteorological datasets. Zhang et al.⁷ proposed a model using several environmental factors to forecast precipitation. A deep neural network (DNN) architecture called DBNPF (deep belief networks precipitation forecasting) is used to extract essential features from hydrological datasets. Khalaf and Gan⁸ proposed a deep multilayer perceptron classifier model using the stacked autoencoder (SAE) to overcome difficulties such as overfitting and vanishing/exploding gradients that arise during the training of deep neural networks.
The analysis of studies reveals that hybrid algorithms usually produce better results in wind speed forecasting compared to stand-alone models. Zhang et al.⁹ suggested a hybrid wind speed prediction model based on hybrid decomposition and extreme learning machine. Mi et al.¹⁰ explained a novel hybrid multi-step forecasting model based on SSA (singular spectrum analysis), EMD (empirical mode decomposition), and the convolutional SVM. Qolipour et al.¹¹ developed a hybrid model to predict the behavior of wind speed using the gray ELM. Dhiman et al.¹² proposed a wind speed prediction model based on wavelet transform and variants of support vector regression (SVR). The decomposition of the input signal is done using wavelet transform, and the decomposed signals are given as input to SVR for wind speed prediction. The performance of the model is estimated by using four different datasets. Carrillo et al.¹³ implemented a hybrid predictive model based on ant colony optimization and extreme learning machine to forecast wind power. The authors estimated the performance of the model using the datasets collected from two wind farms in Spain. Sarkar et al.¹⁴ investigated the effects of various activation functions in time series forecasting. The models are implemented using nonlinear autoregressive and nonlinear autoregressive exogenous neural networks with tansig and logsig activation functions. Studies proved the effectiveness of the tansig activation function. Adnan et al.¹⁵ utilized a cross validation method to predict hourly wind speed and wind power by applying least squares SVR and M5 model regression tree methods. The effectiveness of the model is evaluated using root mean squared error (RMSE), mean absolute error (MAE), and R². Wang and Wang¹⁶ developed a hybrid wind speed forecasting model using similar coefficient sum (SCS), Hermite interpolation, and SVM. Qin et al.¹⁷ recommended a powerful and new method for wind speed forecasting using Elman neural networks (ENNs) and the smooth transition periodic autoregressive model (STPAR). The results showed that the proposed hybrid model exhibited better forecasting performance than the stand-alone ENN and STPAR models, specifically to forecast wind speed for the wind farms in the Hebei regions of China.
Deep neural networks consist of a large number of hidden layers. Hence, they are capable of learning high-level abstract features from massive datasets. The selection of the network structure and tuning of hyperparameters are the key factors required for designing any efficient deep neural network. High-level abstract features present in data can be extracted precisely by employing neural networks with deep architectures. Li et al.¹⁸ implemented a multi-step prediction system for long term forecasting using long short term memory (LSTM) and SVR. Yu et al.¹⁹ suggested a hybrid model using recurrent neural networks and SVM for wind speed forecasting. Tang and Sui²⁰ proposed a hybrid SAE and SVM model for power system transient stability assessment. The SAE is employed to extract abstract features, and the SVM is used to train the model. Liu et al.¹ conducted a comprehensive survey of different wind energy forecasting models. The theoretical background, applications, merits, and demerits of various intelligent predictive models and deep learning-based models are investigated. Chen et al.²¹ presented a wind speed forecasting model based on stacked denoising autoencoder and ensemble learning. Hong and Rioflorido²² built a novel hybrid method based on the CNN combined with the RBF and double Gaussian function (DGF) as the activation function. The CNN is employed to extract characteristics of wind power, and these features are fed to the RBF neural network for wind speed forecasting.
In recent years, the sizes of datasets have increased rapidly. This rapid increase in the volume of datasets has led to growing interest in the development of tools capable of automatic extraction of knowledge from data. High dimensionality degrades the performance of the mining algorithms and increases the time and space required for processing the data.²³,²⁴ The high dimensionality problem can be resolved by using dimensionality reduction techniques. Dimensionality reduction techniques aim at finding and exploiting low-dimensional structures in high-dimensional data to save the computation time and storage burden. Feature extraction is a dimension reduction technique used to obtain the most relevant and good quality information from the original data. Deep learning-based feature extraction has more significance in various domains, such as speech recognition, fraud detection, and time series prediction. Stacked autoencoders are mainly used for this purpose. An autoencoder is an unsupervised neural network trained by stochastic gradient descent algorithms. Stacked autoencoders are created by stacking multiple encoder layers. A stacked autoencoder performs nonlinear feature extraction and is capable of learning higher-order features.
Deep learning structures such as RNN and LSTM are reliable models for time series forecasting. Learning long term dependencies is one of the main challenges in deep learning. RNNs are a family of neural networks for processing sequential data. Vanilla RNNs are not often used for time series prediction because of their vanishing/exploding gradient problems.²⁵ In order to avoid the unstable gradient problems, LSTM networks were introduced by Hochreiter and Schmidhuber in 1997.²⁶ LSTM is one of the significant deep learning architectures, which is capable of long term memory tasks with more time steps. It has an internal memory state, which significantly reduces the multiplication effects of small/large gradients and thus reduces the vanishing/exploding gradient problems.
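The multiplication effect of small or large gradients described above can be illustrated with a short numerical sketch. It assumes, purely for illustration, that backpropagation through time scales the gradient by the same scalar factor at every time step; the factors 0.9 and 1.1 and the horizon of 144 steps are hypothetical choices and are not taken from the cited works.

```python
def gradient_scale(factor: float, steps: int) -> float:
    """Scale of a gradient after being multiplied by `factor` at each of `steps` time steps."""
    scale = 1.0
    for _ in range(steps):
        scale *= factor
    return scale


# Hypothetical per-step factors just below and just above 1.
vanishing = gradient_scale(0.9, 144)
exploding = gradient_scale(1.1, 144)

print(f"factor 0.9 over 144 steps: {vanishing:.2e}")
print(f"factor 1.1 over 144 steps: {exploding:.2e}")
```

Even factors this close to 1 shrink or grow the gradient by several orders of magnitude over a long sequence, which is the behavior that the additive memory state of the LSTM is designed to reduce.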
In short, hybrid models exhibit better prediction accuracy and performance. Deep learning architectures are appropriate for capturing long term dependencies present in time series data. So, in this paper, a deep learning-based hybrid model using stacked autoencoders and LSTM for wind speed forecasting is proposed. The datasets collected from the Melamandai wind farm situated in Tamil Nadu, India, are utilized for this study. The data sampled every ten minutes are normalized using standard scaling. The accuracy of a forecasting model depends on how much history is used for processing the time series data. In this study, a lag value (number of previous observations) of 144 is selected for the experiments, which means that the previous one-day data are used as input to forecast wind speed for the next time step. These input features are fed to a stacked autoencoder (SAE) to extract relevant features. The abstract features obtained from the SAE are then applied to the input layer of the stacked LSTM for forecasting wind speed. A four-layer stacked LSTM is used for this purpose. Experimental analysis shows that the proposed hybrid SAE–LSTM (stacked autoencoder–long short term memory) model outperforms the LSTM and RNN models. The main highlights of this work are summarized as follows:
1. A deep learning-based hybrid model for wind speed prediction using the SAE and LSTM.
2. Abstract feature extraction using optimal stacked autoencoders.
3. A suitable stacked LSTM network is adopted to forecast wind speed.
4. The optimal stacked LSTM is trained on top of the hidden representation of the SAE.
5. Extensive evaluation of the model using statistical error indicators and the C25 index.
6. An improved hybrid model for accurate wind speed prediction.
The remainder of this paper is organized as follows: an overview of the autoencoders and LSTM is given in Sec. II. Section III explains the proposed hybrid SAE–LSTM framework. Section IV outlines the experiments conducted and the results obtained. Concluding remarks are presented in Sec. V.
II. METHODS
The fundamental concepts of autoencoders and long short term memory networks are presented in this section.
A. Autoencoders
Autoencoders are one of the important deep learning architectures based on neural networks. An autoencoder is an unsupervised machine learning algorithm. It has three layers, namely, an input layer, a hidden layer, and an output layer. The hidden layer performs encoding, and the output layer performs decoding. The network is trained to copy or reconstruct its input to its output. The hidden layer achieves this by learning abstract representations of the inputs. Since autoencoders are trained to reconstruct their input, the dimensionality of the input and output layers must be the same.
Autoencoders are a variant of the feed-forward neural network, which performs encoding of its input $x_i$ to a hidden representation $h$ with the help of an encoder as in Eq. (1). Then, it decodes the input again from the hidden representation using a decoder as in Eq. (2). Autoencoders make use of the squared error loss function given in Eq. (3). The variables $W$, $b$, and $\hat{x}_i$ denote weights, bias, and reconstructed input, respectively. The activation functions are specified using $g$ and $f$
$$h = g(W x_i + b), \tag{1}$$
$$\hat{x}_i = f(W h + c), \tag{2}$$
$$J(\theta) = \min \frac{1}{m} \sum_{i=1}^{n} \sum_{j=1}^{n} \left(\hat{x}_{ij} - x_{ij}\right)^2. \tag{3}$$
An autoencoder can be under-complete or over-complete. An under-complete autoencoder has fewer dimensions for the hidden layer than the input layer, whereas an over-complete autoencoder has more dimensions for the hidden layer. An under-complete autoencoder is able to capture all the important characteristics of the inputs. These networks are trained using the backpropagation algorithm. Both linear and nonlinear transformations can be represented using autoencoders. Under-complete autoencoders are used primarily for data denoising and dimensionality reduction of data.²⁷
The architecture of a stacked under-complete autoencoder is shown in Fig. 1. It has three hidden layers, namely, h1, h2, and h3. The hidden layer h1 captures an abstract representation of input x, and h2 captures the abstract representation of h1. Then, h3 captures the abstract representation of h2. In general, each successive hidden layer captures an abstract representation of the previous layer. Some of the variants of autoencoders are listed below.
1. Denoising autoencoders (DAEs)
Autoencoders often set the output the same as the input. However, reconstruction does not guarantee clear and clean input because during extraction, some useful features may be lost.²⁸ This drawback can be rectified using denoising autoencoders. A denoising autoencoder reconstructs the clean input from a corrupted version of the input.
2. Stacked autoencoders (SAEs)
An SAE is obtained by stacking several autoencoders where the output of the previous layer is the input to the succeeding layer. It is better to train a stacked autoencoder by using greedy layer-wise training. Each layer is trained separately one after another, and the output of each layer is passed to the subsequent layer as input. The process of training each layer individually is called unsupervised pre-training. Once training of each layer is completed, supervised fine-tuning using backpropagation is done for the entire network to improve the results.²⁸ A supervised learning strategy is used for fine-tuning in order to minimize prediction error.
3. Stacked denoising autoencoders (SDAEs)
SDAEs are a group of denoising autoencoders stacked together. SDAEs are variations of the stacked autoencoder (SAE) approach. Once all layers are pre-trained, fine-tuning for the entire network is performed.
B. Long short term memory (LSTM)
Long short term memory is a variant of a recurrent neural network that can learn long term dependencies as well.²⁷ LSTM networks can recollect information for a long time. The internal organization of an LSTM cell is depicted in Fig. 2. In addition to input and output
layers, LSTM networks can have one or more hidden layers. Each LSTM cell in a hidden layer consists of three gates, namely, input gate, output gate, and forget gate. Each gate has its own job. The internal memory state of the LSTM cell stores only the useful and relevant information with the help of these three gates. The forget gate tries to improve the effectiveness of the network by removing less important information from the cell state. The input gate is designed to transfer new information to the cell state, whereas the output gate passes useful information from the memory cell to the output. The LSTM network can be mathematically defined using the following equations:
$$o_t = \sigma(W_o h_{t-1} + U_o x_t + b_o), \tag{4}$$
$$i_t = \sigma(W_i h_{t-1} + U_i x_t + b_i), \tag{5}$$
$$f_t = \sigma(W_f h_{t-1} + U_f x_t + b_f), \tag{6}$$
$$\tilde{s}_t = \tanh(W h_{t-1} + U x_t + b), \tag{7}$$
$$s_t = f_t \odot s_{t-1} + i_t \odot \tilde{s}_t, \tag{8}$$
$$h_t = o_t \odot \tanh(s_t), \tag{9}$$
where $s_t$ is the state of the network at time step $t$, $x_t$ is the input at time step $t$, and $h_t$ is the output at time step $t$. $i_t$, $o_t$, and $f_t$ represent the input gate, output gate, and forget gate, respectively. $\tilde{s}_t$ represents a temporary state. $U$ and $W$ are the weights, and $b$ is the bias. $\sigma$ and $\tanh$ denote the sigmoid and tanh activation functions. Element wise multiplication is represented using $\odot$.
FIG. 1. The architecture of a stacked autoencoder.
FIG. 2. LSTM cell.
The LSTM network with more than one hidden layer is called stacked or deep LSTM. Stacked LSTM consists of multiple layers of LSTM, with the output of the previous layer becoming the input to the next layer. The architecture of a stacked LSTM is illustrated in Fig. 3. Stacked LSTM cells can store more information through hidden layers. Each LSTM layer processes some portion of the task and transfers the output to the next layer. Finally, the results are taken from the output layer. Stacking of hidden layers makes the recurrent model deeper and lets it learn features more accurately. The concept of stacked or deep LSTM was first proposed by Graves et al.²⁹ Stacked LSTMs are found to be a powerful technique for sequence prediction problems.
III. PROPOSED SAE–LSTM FRAMEWORK
The proposed SAE–LSTM framework for wind speed forecasting is described in Fig. 4. The main steps include data collection, data preprocessing, feature extraction, model training, performance evaluation, and visualization.
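The LSTM cell update of Eqs. (4)–(9) in Sec. II B can be sketched as a single time step in plain Python. This is a minimal illustration and not the authors' implementation: the parameter dictionary keys, the helper names `affine` and `zero_params`, and the use of Python lists instead of a tensor library are all assumptions made for this example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, h_prev, U, x, b):
    """Compute W h_prev + U x + b for list-of-lists matrices and list vectors."""
    return [
        sum(w * h for w, h in zip(W_row, h_prev))
        + sum(u * v for u, v in zip(U_row, x))
        + b_k
        for W_row, U_row, b_k in zip(W, U, b)
    ]


def lstm_step(x, h_prev, s_prev, p):
    """One LSTM time step; `p` is a hypothetical dict of weights and biases."""
    o = [sigmoid(z) for z in affine(p["Wo"], h_prev, p["Uo"], x, p["bo"])]  # Eq. (4)
    i = [sigmoid(z) for z in affine(p["Wi"], h_prev, p["Ui"], x, p["bi"])]  # Eq. (5)
    f = [sigmoid(z) for z in affine(p["Wf"], h_prev, p["Uf"], x, p["bf"])]  # Eq. (6)
    s_tilde = [math.tanh(z) for z in affine(p["W"], h_prev, p["U"], x, p["b"])]  # Eq. (7)
    # Eq. (8): element-wise forget of the old state plus gated new candidate.
    s = [f_k * sp_k + i_k * st_k for f_k, sp_k, i_k, st_k in zip(f, s_prev, i, s_tilde)]
    # Eq. (9): gated output of the squashed state.
    h = [o_k * math.tanh(s_k) for o_k, s_k in zip(o, s)]
    return h, s


def zero_params(hidden, inputs):
    """All-zero parameters, used here only to make the arithmetic easy to follow."""
    p = {}
    for gate in ("o", "i", "f", ""):
        p["W" + gate] = [[0.0] * hidden for _ in range(hidden)]
        p["U" + gate] = [[0.0] * inputs for _ in range(hidden)]
        p["b" + gate] = [0.0] * hidden
    return p


params = zero_params(hidden=2, inputs=1)
h, s = lstm_step(x=[1.0], h_prev=[0.0, 0.0], s_prev=[2.0, -2.0], p=params)
print(h, s)
```

With all-zero parameters every gate evaluates to 0.5 and the candidate state to 0, so the step simply halves the previous state, which makes the roles of the forget and input gates in Eq. (8) easy to check by hand.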
FIG. 3. Stacked LSTM architecture.
FIG. 4. Framework of the proposed SAE-LSTM system.
TABLE I. Statistical analysis of data.
Feature          Units    Count     Mean       Std        Minimum   Maximum
Wind speed       m/s      159 505   6.3168     2.5704     0         19.4576
Wind direction   degree   159 505   160.7881   104.4688   0.0029    359.9996
TABLE II. Metrics used for evaluation.
Metric   Mathematical equation
MAE      $\frac{1}{n}\sum_{j=1}^{n} \left| y_j - \hat{y}_j \right|$
MSE      $\frac{1}{n}\sum_{j=1}^{n} \left( y_j - \hat{y}_j \right)^2$
RMSE     $\sqrt{\frac{1}{n}\sum_{j=1}^{n} \left( y_j - \hat{y}_j \right)^2}$
R²       $1 - \frac{\sum_{j=1}^{n} (y_j - \hat{y}_j)^2}{\sum_{j=1}^{n} (y_j - \bar{y}_j)^2}$
A. Data collection
The wind speed datasets provided by the NIWE (National Institute of Wind Energy), Ministry of New and Renewable Energy, Government of India, are considered for the training and evaluation of the proposed models. Wind speed data sampled every ten minutes from December 2013 to December 2017 from the Melamandai wind station situated in Tamil Nadu, India, are utilized for this study. The data are collected using an anemometer installed at a height of 100 m. Table I provides the statistics of the data. The datasets are divided into train and test sets in the ratio of 70:30. The train datasets are employed to train the model, and the remaining 30% of the data are used as the test set for validating the effectiveness of the model.
B. Data preprocessing
The quality of data affects the performance of the machine learning model. So, it is necessary to clean and prepare data before processing it. Missing values in the dataset are imputed using mean values. After imputing missing values, data are transformed for processing. Data transformation is an essential pre-processing step that transforms data to the required scale or range. Standardization and normalization are two common data transformation techniques. Standardization is a data preparation step where numerical values are converted to a common scale without degrading the results. The rescaled data will have a mean of 0 and a standard deviation of 1. The equation for standardization is shown in Eq. (10), where $x$ is the original data, $x'$ is the standardized data, $\mu$ is the mean, and $\sigma$ is the standard deviation of the data
$$x' = \frac{x - \mu}{\sigma}. \tag{10}$$
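The two preprocessing steps described in this subsection, mean imputation followed by the standardization of Eq. (10), can be sketched in plain Python as follows. The sample readings, the use of `None` to mark a missing value, and the choice of the population standard deviation (division by n) are assumptions made for this illustration; the paper does not state which estimator is used.

```python
import math


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def standardize(values):
    """Apply Eq. (10): subtract the mean and divide by the standard deviation."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / n)  # population std (assumed)
    return [(v - mu) / sigma for v in values]


speeds = [4.0, None, 6.0, 8.0, 6.0]  # hypothetical wind speed readings in m/s
filled = impute_mean(speeds)
scaled = standardize(filled)

print(filled)
print([round(v, 3) for v in scaled])
```

After the transformation the values have zero mean and unit standard deviation, which is the property stated in the text above.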
FIG. 5. Distribution of wind speed data.
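The evaluation metrics listed in Table II can be written out directly. The following is a minimal sketch in plain Python; the function names and the four sample values are hypothetical and serve only to make the formulas concrete.

```python
import math


def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(y, y_hat)) / len(y)


def mse(y, y_hat):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    """Root mean squared error, the square root of MSE."""
    return math.sqrt(mse(y, y_hat))


def r2(y, y_hat):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    y_bar = sum(y) / len(y)
    ss_res = sum((a - p) ** 2 for a, p in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


actual = [5.0, 6.0, 7.0, 8.0]      # hypothetical observed wind speeds (m/s)
predicted = [5.5, 5.5, 7.5, 7.5]   # hypothetical forecasts (m/s)

print(mae(actual, predicted), mse(actual, predicted),
      rmse(actual, predicted), r2(actual, predicted))
```

The tables in this paper report R² as a percentage, which corresponds to multiplying the value returned by `r2` by 100.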
TABLE III. Train and test error values corresponding to various SAE architectures. Boldface denotes the train and test error values of the optimal SAE architecture.
                                            Train accuracy                      Test accuracy
SAE architecture   Hidden layer neurons     MAE (m/s)   RMSE (m/s)   R² (%)     MAE (m/s)   RMSE (m/s)   R² (%)
Arch1              (130, 115, 100, 85)      0.06002     0.0858       99.229     0.06148     0.0861       99.216
Arch2              (135, 125, 115, 105)     0.05019     0.0697       99.516     0.05087     0.0701       99.481
Arch3              (140, 135, 130, 125)     0.04299     0.0577       99.669     0.04326     0.0581       99.644
Arch4              (135, 130, 125, 115)     0.04513     0.0615       99.623     0.04710     0.0641       99.566
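The data flow through a four-layer under-complete stacked autoencoder with the Arch3 widths of Table III can be sketched as follows, using the encoder and decoder mappings of Eqs. (1) and (2) and a squared reconstruction error in the spirit of Eq. (3). This is only a shape-level illustration under several assumptions that are not stated in the paper: a 144-value lag window as input, a mirrored decoder, tanh encoder activations, a linear decoder, and untrained random weights.

```python
import math
import random


def dense(x, W, b, act):
    """One layer: act(W x + b), the form used in Eqs. (1) and (2)."""
    return [act(sum(w * v for w, v in zip(row, x)) + b_k) for row, b_k in zip(W, b)]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (untrained, for illustration only)."""
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


def build_stack(widths, rng):
    return [init_layer(a, b, rng) for a, b in zip(widths[:-1], widths[1:])]


def forward(x, layers, act):
    for W, b in layers:
        x = dense(x, W, b, act)
    return x


rng = random.Random(0)
encoder_widths = [144, 140, 135, 130, 125]   # assumed input size, then Arch3 hidden layers
decoder_widths = encoder_widths[::-1]        # mirrored decoder (an assumption)

encoder = build_stack(encoder_widths, rng)
decoder = build_stack(decoder_widths, rng)

x = [rng.gauss(0.0, 1.0) for _ in range(144)]   # one synthetic standardized window
code = forward(x, encoder, math.tanh)           # abstract features for the next model
x_hat = forward(code, decoder, lambda z: z)     # reconstruction of the input
loss = sum((r - v) ** 2 for r, v in zip(x_hat, x)) / len(x)  # squared error, one sample

print(len(code), len(x_hat), loss)
```

Each hidden layer is narrower than the one before it, which is what makes the network under-complete; in the proposed framework it is the final 125-value encoding that would be passed on as the extracted features.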
C. Feature extraction
Feature extraction is a dimensionality reduction technique, which decreases the computational load and extracts better quality features from data. In this study, stacked autoencoders are employed for extracting abstract features from the original features. It is a deep learning-based feature extraction technique. A detailed experimental analysis comparing the performance of PCA (principal component analysis) and stacked autoencoders (SAE) for feature extraction is conducted. PCA is a feature extraction technique, which reduces the high dimensional data to lower dimensions by extracting the most significant features that capture maximum information about the dataset. The features are derived based on the variance that they cause in the output. Compared to PCA, the SAE exhibits better performance for feature extraction, and hence the stacked autoencoder is chosen for feature extraction in this study. The results comparing the performance are detailed in Sec. IV. A four-layer stacked under-complete autoencoder is used for this purpose. Since the hidden layers have fewer dimensions than the input layer, an under-complete autoencoder can extract important features from the data more efficiently. Four different architectures of SAE models are tried out with varying combinations of hidden neurons. The architecture with a maximum R² value and minimum error values is chosen as the final SAE model to extract meaningful features.
D. Model training
The stacked LSTM network is used for training the wind speed prediction model. Stacked LSTM networks with one to six hidden layers are trained and tested. The stacked network that achieves minimum MAE and maximum R² values is selected as the optimal architecture. The newly extracted features from the stacked autoencoder are provided as input to the optimal stacked LSTM. Training datasets are used for training the SAE–LSTM model. An optimal four-layer stacked LSTM with 100 neurons in each of the hidden layers is employed for this study.
In short, the procedure for selecting the optimal SAE–LSTM architecture is outlined as follows:
1. Wind speed datasets collected from the Melamandai wind station situated in Tamil Nadu, India, are preprocessed and utilized for the experiments. The dataset is subdivided in the ratio of 70:30 such that 70% of data are utilized for training the model, and the remaining 30% of data are used for evaluating the effectiveness of the model. Of the 70% of training data, 20% is used for validation. The train, validation, and test splits of the dataset are 89 322, 22 331, and 47 707, respectively. Holdout is the preferred validation technique when dealing with large datasets. The holdout validation method is computationally more effective and takes less time than the cross-validation technique when processing large datasets.³⁰ Hence, holdout validation is the model selection technique employed in this paper to validate the final optimal model.
2. Experiments are conducted to find an optimal four-layer stacked autoencoder architecture to extract essential features from the
TABLE IV. Average time taken by the CPU per epoch.
                      Average CPU time per epoch (in seconds)
Hidden LSTM layers    Without normalization    With normalization    Reduction in time
1                     212.29                   207.27                5.02
2                     486.39                   481.51                4.88
3                     770.71                   765.05                5.66
4                     1056.72                  1050.53               6.19
5                     1332.23                  1326.16               6.07
6                     1639.18                  1629.15               10.03
Average               916.25                   909.95                6.30
FIG. 6. Performance comparison of SAE architectures.
TABLE V. Evaluation of stacked LSTM networks based on statistical indicators. Boldface denotes the train and test accuracy of the optimal stacked LSTM network.
                      Train accuracy                      Test accuracy
Hidden LSTM layers    MAE (m/s)   RMSE (m/s)   R² (%)     MAE (m/s)   RMSE (m/s)   R² (%)
1                     0.4319      0.6064       94.54      0.4334      0.6105       93.79
2                     0.4306      0.6020       94.59      0.4322      0.6095       93.90
3                     0.4303      0.6019       94.60      0.4311      0.6093       93.92
4                     0.4292      0.5957       95.16      0.4306      0.6087       94.17
5                     0.4346      0.6109       94.34      0.4381      0.6124       93.72
6                     0.4334      0.6133       94.58      0.4371      0.6164       93.84
wind speed datasets. Four combinations of hidden layers are investigated, and the architecture with minimum MAE and maximum R² is selected as the optimal architecture. The optimal SAE selection is explained in Sec. IV C 1.
3. A suitable stacked LSTM network is then identified to forecast wind speed. For that, LSTM networks with one to six hidden layers are tested, and the optimal LSTM network with minimum MAE and maximum R² is selected. The optimal LSTM network selection is demonstrated in Sec. IV C 2.
4. Once an optimal SAE and stacked LSTM network are identified, a hybrid model is developed by combining these two optimal architectures to predict wind speed for the next time step. The evaluation of the hybrid model is described in Sec. IV C 3.
FIG. 7. Loss on the train and test sets of various stacked LSTM models based on hidden layers.
E. Performance evaluation
The performance evaluation of the models is carried out using the metrics MAE, MSE, RMSE, and R², which are summarized in Table II. The variables $y_j$, $\hat{y}_j$, $\bar{y}_j$, and $n$ denote the actual value, predicted value, mean value, and number of samples, respectively. MAE is the average absolute difference between the actual value and the predicted value. MSE is the average squared error between the actual and predicted values. RMSE represents the square root of MSE. R², also called the coefficient of determination, describes how well the regression model fits the data. It is defined as the ratio of variance explained by the model to the total variance. CPU time taken by the processor to train and test the models is also used as a metric to evaluate the efficiency of the models. The performance of the proposed hybrid model is compared with benchmark models, SVR, ANN, RNN, LSTM,
various PCA based hybrid models, and deep learning based (SAE) hybrid models.
FIG. 8. Plot of error values against hidden LSTM layer counts.
F. Visualization
Visualizing the results is yet another important step in every machine learning framework. Scatter plots, line plots, and semi-log plots are used for the graphical analysis of the results. The association between the actual and predicted values is visualized to evaluate the effectiveness of the proposed model.
IV. SIMULATION RESULTS
A. Hardware and software
The Python Keras library with the TensorFlow back end is employed for all the experiments. The experiments are conducted on an Intel(R) Core(TM) i5-5200U processor with 8 GB RAM.
B. Wind speed data analysis
The wind speed data utilized for the study are graphically represented using the distribution chart and wind rose diagram as in Fig. 5. The distribution chart displays the distribution of wind speed over time. The wind rose diagram displays the distribution of wind speed and wind direction of a particular location within a time period. It presents the frequency of wind blowing from different directions.
C. Evaluation of models
1. Selection of the optimal stacked autoencoder model
Experiments are conducted to find an optimal four-layer stacked autoencoder architecture in order to extract important features from the wind speed datasets. Four combinations of hidden layers are investigated, and the architecture with minimum MAE and maximum R²
is labeled as the optimal architecture. The various architectures employed for analysis and the corresponding results achieved are tabulated in Table III. The MAE values obtained for each architecture are plotted in Fig. 6. The figure illustrates that architecture 3 provides a minimum error value compared to other architectures. This architecture attains a minimum MAE of 0.04326 and an RMSE of 0.0581. Hence, it is selected as the optimal model. The optimal SAE model consists of 140, 135, 130, and 125 neurons, respectively, in the first, second, third, and fourth hidden layers. The selected architecture gains the highest R² value of 99.644, which indicates the feature extraction capability of the stacked autoencoder.

2. Selection of the optimal stacked LSTM network
LSTM makes use of the history of wind speed data for predicting future occurrences. Stacked LSTM networks are proposed for wind speed forecasting in this work. A stacked LSTM has multiple hidden LSTM layers. Experiments are conducted to identify a suitable number of hidden layers. LSTM networks with one to six hidden layers are tested, and the optimal LSTM network is determined. Before outlining the optimal model, the influence of using normalized datasets on these models is investigated. The datasets are normalized using standard scaling for conducting the experiments. In standard scaling, data are rescaled depending on the mean and standard deviation of the data. The main advantage of normalization is that it reduces the training time of the models. Table IV lists the average CPU time taken by the model for each epoch with and without normalization. It is evident from the table that the training time of each model is reduced significantly with normalized datasets. There is an average reduction of 6.3 s in training time per epoch when models are trained using standardized data.

The optimal stacked LSTM network is identified using two criteria. The first criterion is based on statistical error indicators such as MAE, RMSE, and R², and the second is based on the C25 index value.

FIG. 9. Comparison of stacked LSTM networks using scatterplots.

TABLE VI. C25 index values of various stacked LSTM networks. Values given in boldface denote C25 index values of the optimal model.

Coverage of points within the ±25% boundary
Hidden LSTM layers   Data points within the boundary   Total data points   C25 index (%)
1                    44 699                            47 707              93.69
2                    44 739                            47 707              93.78
3                    44 763                            47 707              93.83
4                    44 957                            47 707              94.24
5                    44 604                            47 707              93.50
6                    44 711                            47 707              93.72

Table V lists the results obtained for various stacked LSTM architectures based on statistical error indicators. The network that yields minimum MAE and RMSE values, as well as maximum R² values, is chosen as the optimal model. The LSTM network with four hidden layers achieves the lowest MAE of 0.4306 and RMSE of 0.6087 and also gains the highest R² value of 94.17. Hence, the four-layer stacked network is chosen as the optimal network. Figure 7 describes the model loss on train and test datasets for various stacked LSTM models. The network with four hidden LSTM layers achieves minimum train and test loss. The MAE and RMSE values obtained for each network are plotted in Figs. 8(a) and 8(b), respectively. Both plots have an almost similar structure. The figure illustrates that the error value is found to decrease significantly up to four LSTM layers beyond which it is
found to increase. This implies that after a steady increase in the performance of the stacked LSTM network up to four layers, the performance tends to decline significantly.

TABLE VII. Evaluation of models based on statistical indicators. Boldface denotes the train and test accuracy of the proposed optimal hybrid model.

              Train accuracy                       Test accuracy
Model         MAE (m/s)   RMSE (m/s)   R² (%)     MAE (m/s)   RMSE (m/s)   R² (%)
RNN           0.4412      0.6177       94.14      0.4437      0.6229       93.48
LSTM          0.4292      0.5957       95.16      0.4306      0.6087       94.17
PCA–RNN       0.6155      0.8151       90.13      0.6742      0.9058       87.80
PCA–LSTM      0.5595      0.7831       91.28      0.5688      0.7949       90.04
SVR           0.4421      0.6150       94.56      0.4512      0.6388       93.57
ANN           0.4516      0.6321       94.16      0.4697      0.6552       93.53
PCA–SVR       0.4495      0.6254       94.02      0.4566      0.6472       93.40
PCA–ANN       0.4583      0.6431       93.91      0.4743      0.6655       92.38
SAE–RNN       0.4331      0.6127       94.42      0.4410      0.6209       93.94
SAE-1 LSTM    0.4219      0.5921       95.63      0.4242      0.6015       94.24
SAE-2 LSTM    0.4214      0.5935       95.51      0.4254      0.6046       94.19
SAE-3 LSTM    0.4206      0.5938       95.46      0.4299      0.6073       94.23
SAE-4 LSTM    0.3921      0.5871       96.93      0.3982      0.5969       96.20
SAE-5 LSTM    0.4246      0.5929       95.37      0.4286      0.6080       94.21
SAE-6 LSTM    0.4243      0.5954       95.48      0.4262      0.6038       94.26

FIG. 10. Scatterplots for recurrent and hybrid models.

The various stacked LSTM models are graphically analyzed by plotting the predicted values against actual values using scatter plots, as shown in Fig. 9. In addition to the evaluation based on error indices, the efficiency of the various models is also validated by calculating the C25 index value of the models. The C25 index value of each model is computed by counting the number of data points that lie within the ±25% boundary of the lines drawn near the regression line. The C25
index is defined as the ratio of the number of data points within the ±25% boundary to the total number of data points and is specified in Eq. (11). C25 index values of the models are tabulated in Table VI. A high C25 value indicates that more data points lie within the boundary lines. The four-layer stacked LSTM model has the highest C25 index value of 94.24, which signifies that this model is more effective compared to other models.

$$\text{C25 Index} = \frac{\text{No. of data points within boundary}}{\text{Total number of data points}} \times 100\%. \tag{11}$$

3. Evaluation of hybrid models
Once the optimal SAE and stacked LSTM structures are identified, a hybrid model is developed by combining these two architectures to predict wind speed for the next time step. The effectiveness of the suggested hybrid model is compared with machine learning models such as SVR and ANN and deep learning based models such as RNN, LSTM, and various PCA based and SAE based hybrid models using the error indices MAE, RMSE, and R². The results obtained are listed in Table VII. It is clear from the table that the hybrid four-layer SAE–LSTM model attains minimum MAE and RMSE values of 0.3982 and 0.5969, respectively, and a maximum R² value of 96.20. Hence, the proposed four-layer SAE–LSTM model exhibits better forecasting accuracy compared to other models. This study also shows that deep learning based hybrid models have higher forecasting accuracy compared to individual models and PCA based hybrid models.

Figure 10 shows the scatter plots of the RNN, SAE–RNN, LSTM, and SAE–LSTM models. Figure 11 demonstrates the scatter plots of SVR, ANN, and various PCA based hybrid models. Figure 12 describes the scatter plots of the various hybrid SAE–LSTM models. C25 index values of the models described using Figs. 10–12 are tabulated in Table VIII. The four-layer SAE–LSTM model has the highest C25 index value of 95.49, which indicates that 95.49% of total points are within the ±25% boundary lines. This signifies that the proposed hybrid model is more powerful compared to SVR, ANN, vanilla RNN, LSTM, and other hybrid models.

FIG. 11. Scatterplots for SVR, ANN, and hybrid PCA models.

FIG. 12. Scatterplots for various hybrid SAE-LSTM models.

TABLE VIII. C25 index values of recurrent and hybrid models. Boldface denotes the C25 index value of the proposed hybrid model.

Coverage of points within the ±25% boundary
Model         Data points within the boundary   Total data points   C25 index (%)
RNN           44 586                            47 707              93.46
LSTM          44 957                            47 707              94.24
PCA–RNN       41 129                            47 707              86.21
PCA–LSTM      42 637                            47 707              89.37
SVR           44 671                            47 707              93.64
ANN           44 241                            47 707              92.73
PCA–SVR       44 634                            47 707              93.55
PCA–ANN       43 926                            47 707              92.07
SAE–RNN       44 792                            47 707              93.89
SAE-1 LSTM    45 029                            47 707              94.39
SAE-2 LSTM    45 040                            47 707              94.41
SAE-3 LSTM    44 992                            47 707              94.31
SAE-4 LSTM    45 556                            47 707              95.49
SAE-5 LSTM    45 021                            47 707              94.37
SAE-6 LSTM    45 207                            47 707              94.76

FIG. 13. Comparison of models using semi-log plot.

The comparison of the SVR, ANN, RNN, LSTM, and hybrid models is also performed graphically using a semi-log plot, as shown in Fig. 13. It is observed that the predicted values generated using a four-layer SAE–LSTM hybrid model are closer to the actual values. There is much deviation between actual and predicted values for other recurrent models used for comparison. This again confirms the
reliability of the developed hybrid stacked four-layer SAE–LSTM model. Since the key component of the proposed hybrid model is the LSTM network, a separate comparison of the performance of the hybrid model with LSTM is provided in Fig. 14. The relationship between actual and predicted values is plotted. It can be noticed that the deviation between the actual and predicted values of the SAE–LSTM model is smaller than that of the LSTM network. The stacked autoencoder is employed in this study for extracting the most relevant features from the input time series, and these extracted features are fed as inputs to the stacked LSTM network. The stacked LSTM network with four hidden layers has been used for forecasting wind speed. Each successive hidden layer extracts more abstract features from the previous layer. Hence, the deviation between actual and forecasted values is smaller than for other models. From the above observations, it is worth noting that the proposed hybrid SAE–LSTM model is more reliable for accurate wind speed forecasting.

V. CONCLUSION
A hybrid wind speed forecasting model using the SAE and deep LSTM is proposed to enhance the accuracy of prediction. In the hybrid framework, an optimal SAE architecture is first identified to extract high-quality, meaningful features from wind speed data. Then, a suitable deep LSTM architecture is identified. It is then trained on top of the hidden representation of the SAE to forecast wind speed for the next time step. Various hybrid SAE–LSTM models with different hidden layers are set up and evaluated using error indices MAE, RMSE, and R². The hybrid SAE–LSTM model with four hidden layers outperformed other recurrent and hybrid models with a minimum error value and a maximum R² value. This study also establishes that deep learning based hybrid models perform more effectively in wind speed prediction compared to individual models. Future work includes the implementation of a wind speed prediction system using a bidirectional LSTM network.
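As a supplementary illustration, the evaluation measures used in Sec. IV (MAE, RMSE, R², and the C25 index of Eq. (11)) can be sketched in a few lines of Python. This is a minimal sketch and not the authors' code: the function names `error_indices` and `c25_index` are hypothetical, R² is reported as a percentage to match Table VII, and the ±25% boundary is assumed here to mean that a prediction lies within 25% of the corresponding actual value (the paper draws the boundary lines on the scatter plot and does not give a formula for them).

```python
import math


def error_indices(y_true, y_pred):
    """Return (MAE, RMSE, R^2 in percent) for two equal-length sequences."""
    n = len(y_true)
    residuals = [a - p for a, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / n
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)                # residual sum of squares
    ss_tot = sum((a - mean_true) ** 2 for a in y_true)    # total sum of squares
    r2_percent = 100.0 * (1.0 - ss_res / ss_tot)
    return mae, rmse, r2_percent


def c25_index(y_true, y_pred, tolerance=0.25):
    """Eq. (11): percentage of points inside the boundary.

    Assumption: a point is "inside" when |predicted - actual| <= 25% of |actual|.
    """
    inside = sum(
        1 for a, p in zip(y_true, y_pred) if abs(p - a) <= tolerance * abs(a)
    )
    return 100.0 * inside / len(y_true)


if __name__ == "__main__":
    # Toy wind speeds in m/s, invented purely for illustration.
    actual = [4.0, 8.0, 10.0, 2.0]
    predicted = [5.0, 7.0, 10.0, 3.0]
    print(error_indices(actual, predicted))
    print(c25_index(actual, predicted))
```

With the toy data above, three of the four predictions fall inside the assumed boundary, so the sketch reports a C25 index of 75%. A different reading of the boundary lines would change only the condition inside `c25_index`.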
ACKNOWLEDGMENTS
The authors would like to thank the NIWE (National Institute
of Wind Energy), Ministry of New and Renewable Energy,
Government of India, for providing datasets for our research. The
contribution of the NIWE is acknowledged.
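FIG. 14. Comparison of SAE–LSTM vs LSTM models.

APPENDIX: BACKPROPAGATION THROUGH TIME OF LSTM
The loss function used by the model is the squared error function. The total loss made by the model is the sum of the loss over all time steps as given in the following equation:

$$E(\theta) = \sum_{t=1}^{T} E_t(\theta), \tag{A1}$$

where $E_t(\theta)$ is the loss function at time step $t$ and is defined as follows:

$$E_t(\theta) = \frac{1}{T}\sum_{t=1}^{T} (y_t - \hat{y}_t)^2, \tag{A2}$$

where $y_t$ is the original value, $\hat{y}_t$ is the predicted value at time $t$, and $T$ is the number of time steps. $\theta$ denotes the parameters of the model, i.e., $\theta = \{W, U, V\}$. The parameters $W$, $U$, and $V$ are the weights corresponding to the hidden layer, input layer, and output layer, respectively.

The network is trained using backpropagation through time (BPTT). During backpropagation, we need to compute the gradient of the loss function with respect to $W$, $U$, and $V$.

1. Derivative with respect to W, $\partial E(\theta)/\partial W$
Take the derivative of Eq. (A1) with respect to $W$:

$$\frac{\partial E(\theta)}{\partial W} = \sum_{t=1}^{T} \frac{\partial E_t(\theta)}{\partial W}. \tag{A3}$$

Each term of $\sum_{t=1}^{T} \partial E_t(\theta)/\partial W$ is the summation of the derivative of the loss with respect to the weights in the hidden layer. Using the chain rule of derivatives, $\partial E_t(\theta)/\partial W$ is computed by summing up the gradients along all the paths from the loss function $E_t(\theta)$ to $W$. Whenever we want to compute the derivative of the loss function with respect to any parameter, the procedure is to look at all the paths that go from the loss function to that parameter and sum up the gradients across those paths. There are $t$ paths connecting $E_t(\theta)$ to $W$. Each state variable in an ordered network is determined one at a time in a definite order. Therefore, we have

$$\frac{\partial E_t(\theta)}{\partial W} = \frac{\partial E_t(\theta)}{\partial s_t}\,\frac{\partial s_t}{\partial W}, \tag{A4}$$

where $\partial E_t(\theta)/\partial s_t$ depends on the parameter $V$ only.

Now, we need to compute $\partial E_t(\theta)/\partial s_t$:

$$\frac{\partial E_t(\theta)}{\partial s_t} = \frac{\partial E_t(\theta)}{\partial \hat{y}_t}\,\frac{\partial \hat{y}_t}{\partial s_t}, \tag{A5}$$

but

$$\frac{\partial E_t(\theta)}{\partial \hat{y}_t} = \frac{\partial}{\partial \hat{y}_t}\left(\frac{1}{T}\sum_{t=1}^{T} (y_t - \hat{y}_t)^2\right) = -\frac{2}{T}\sum_{t=1}^{T} (y_t - \hat{y}_t). \tag{A6}$$

We know that $\hat{y}_t = V s_t + c$, where $c$ is the bias. Then

$$\frac{\partial \hat{y}_t}{\partial s_t} = \frac{\partial (V s_t + c)}{\partial s_t} = V. \tag{A7}$$

Substituting the values of $\partial E_t(\theta)/\partial \hat{y}_t$ and $\partial \hat{y}_t/\partial s_t$ in Eq. (A5), we get

$$\frac{\partial E_t(\theta)}{\partial s_t} = -\frac{2}{T}\sum_{t=1}^{T} (y_t - \hat{y}_t)\,V = -\frac{2V}{T}\sum_{t=1}^{T} (y_t - \hat{y}_t). \tag{A8}$$

Next, we need to compute $\partial s_t/\partial W$. We know that

$$s_t = W s_{t-1} + U x_t + b. \tag{A9}$$

However, in an ordered network, it is not possible to compute $\partial s_t/\partial W$ by considering $s_{t-1}$ as a constant because $s_{t-1}$ also relies on $W$. Hence, $\partial s_t/\partial W$ has two components, namely, explicit and implicit.

Explicit: $\partial^{+} s_t/\partial W$, where the $+$ sign indicates that all other inputs are treated as constant.

Implicit: $\frac{\partial s_t}{\partial s_{t-1}}\frac{\partial s_{t-1}}{\partial W}$, which means summing up of all indirect paths from $s_t$ to $W$.

Therefore, $\partial s_t/\partial W$ can be written as

$$\begin{aligned}
\frac{\partial s_t}{\partial W}
&= \frac{\partial^{+} s_t}{\partial W} + \frac{\partial s_t}{\partial s_{t-1}}\frac{\partial s_{t-1}}{\partial W} \\
&= \frac{\partial^{+} s_t}{\partial W} + \frac{\partial s_t}{\partial s_{t-1}}\left[\frac{\partial^{+} s_{t-1}}{\partial W} + \frac{\partial s_{t-1}}{\partial s_{t-2}}\frac{\partial s_{t-2}}{\partial W}\right] \\
&= \frac{\partial^{+} s_t}{\partial W} + \frac{\partial s_t}{\partial s_{t-1}}\frac{\partial^{+} s_{t-1}}{\partial W} + \frac{\partial s_t}{\partial s_{t-1}}\frac{\partial s_{t-1}}{\partial s_{t-2}}\frac{\partial s_{t-2}}{\partial W} \\
&= \frac{\partial^{+} s_t}{\partial W} + \frac{\partial s_t}{\partial s_{t-1}}\frac{\partial^{+} s_{t-1}}{\partial W} + \frac{\partial s_t}{\partial s_{t-1}}\frac{\partial s_{t-1}}{\partial s_{t-2}}\left[\frac{\partial^{+} s_{t-2}}{\partial W} + \frac{\partial s_{t-2}}{\partial s_{t-3}}\frac{\partial s_{t-3}}{\partial W}\right] \\
&= \frac{\partial^{+} s_t}{\partial W} + \frac{\partial s_t}{\partial s_{t-1}}\frac{\partial^{+} s_{t-1}}{\partial W} + \frac{\partial s_t}{\partial s_{t-1}}\frac{\partial s_{t-1}}{\partial s_{t-2}}\frac{\partial^{+} s_{t-2}}{\partial W} + \frac{\partial s_t}{\partial s_{t-1}}\frac{\partial s_{t-1}}{\partial s_{t-2}}\frac{\partial s_{t-2}}{\partial s_{t-3}}\frac{\partial s_{t-3}}{\partial W}.
\end{aligned} \tag{A10}$$

The replacement of terms is repeated until the value of $t$ reaches 1. Finally, we get

$$\begin{aligned}
\frac{\partial s_t}{\partial W}
&= \frac{\partial^{+} s_t}{\partial W} + \frac{\partial s_t}{\partial s_{t-1}}\frac{\partial^{+} s_{t-1}}{\partial W} + \frac{\partial s_t}{\partial s_{t-1}}\frac{\partial s_{t-1}}{\partial s_{t-2}}\frac{\partial^{+} s_{t-2}}{\partial W} \\
&\quad + \frac{\partial s_t}{\partial s_{t-1}}\frac{\partial s_{t-1}}{\partial s_{t-2}}\frac{\partial s_{t-2}}{\partial s_{t-3}}\frac{\partial^{+} s_{t-3}}{\partial W} + \cdots \\
&\quad + \frac{\partial s_t}{\partial s_{t-1}}\frac{\partial s_{t-1}}{\partial s_{t-2}}\frac{\partial s_{t-2}}{\partial s_{t-3}}\cdots\frac{\partial s_4}{\partial s_3}\frac{\partial s_3}{\partial s_2}\frac{\partial^{+} s_2}{\partial W} \\
&\quad + \frac{\partial s_t}{\partial s_{t-1}}\frac{\partial s_{t-1}}{\partial s_{t-2}}\frac{\partial s_{t-2}}{\partial s_{t-3}}\cdots\frac{\partial s_3}{\partial s_2}\frac{\partial s_2}{\partial s_1}\frac{\partial^{+} s_1}{\partial W}.
\end{aligned} \tag{A11}$$

For simplicity, we can reduce Eq. (A11) by eliminating some variables so as to obtain a simple summation as follows:

$$\begin{aligned}
\frac{\partial s_t}{\partial W}
&= \frac{\partial^{+} s_t}{\partial W} + \frac{\partial s_t}{\partial s_{t-1}}\frac{\partial^{+} s_{t-1}}{\partial W} + \frac{\partial s_t}{\partial s_{t-2}}\frac{\partial^{+} s_{t-2}}{\partial W} + \cdots + \frac{\partial s_t}{\partial s_3}\frac{\partial^{+} s_3}{\partial W} + \frac{\partial s_t}{\partial s_2}\frac{\partial^{+} s_2}{\partial W} + \frac{\partial s_t}{\partial s_1}\frac{\partial^{+} s_1}{\partial W} \\
&= \frac{\partial s_t}{\partial s_t}\frac{\partial^{+} s_t}{\partial W} + \frac{\partial s_t}{\partial s_{t-1}}\frac{\partial^{+} s_{t-1}}{\partial W} + \frac{\partial s_t}{\partial s_{t-2}}\frac{\partial^{+} s_{t-2}}{\partial W} + \cdots + \frac{\partial s_t}{\partial s_3}\frac{\partial^{+} s_3}{\partial W} + \frac{\partial s_t}{\partial s_2}\frac{\partial^{+} s_2}{\partial W} + \frac{\partial s_t}{\partial s_1}\frac{\partial^{+} s_1}{\partial W}.
\end{aligned} \tag{A12}$$

Eq. (A12) can be written as follows:

$$\frac{\partial s_t}{\partial W} = \sum_{k=1}^{t} \frac{\partial s_t}{\partial s_k}\frac{\partial^{+} s_k}{\partial W}. \tag{A13}$$

Substituting the value of $\partial s_t/\partial W$ in Eq. (A4), we get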
$$\frac{\partial E_t(\theta)}{\partial W} = \frac{\partial E_t(\theta)}{\partial s_t}\sum_{k=1}^{t} \frac{\partial s_t}{\partial s_k}\frac{\partial^{+} s_k}{\partial W}. \tag{A14}$$

Substituting the value of $\partial E_t(\theta)/\partial s_t$ as computed in Eq. (A8) into Eq. (A14), we get

$$\frac{\partial E_t(\theta)}{\partial W} = -\frac{2V}{T}\sum_{t=1}^{T} (y_t - \hat{y}_t)\sum_{k=1}^{t} \frac{\partial s_t}{\partial s_k}\frac{\partial^{+} s_k}{\partial W}. \tag{A15}$$

2. Derivative with respect to U, $\partial E(\theta)/\partial U$
Computing the gradient of $U$ is similar to computing the gradient of $W$ since both of them require taking sequential derivatives of the $s_t$ vector:

$$\begin{aligned}
\frac{\partial E_t(\theta)}{\partial U}
&= \frac{\partial E_t(\theta)}{\partial s_t}\sum_{k=1}^{t} \frac{\partial s_t}{\partial s_k}\frac{\partial^{+} s_k}{\partial U} \\
&= -\frac{2V}{T}\sum_{t=1}^{T} (y_t - \hat{y}_t)\sum_{k=1}^{t} \frac{\partial s_t}{\partial s_k}\frac{\partial^{+} s_k}{\partial U}.
\end{aligned} \tag{A16}$$

The difference between $U$ and $W$ lies in the real implementation since the values of $\partial^{+} s_k/\partial U$ and $\partial^{+} s_k/\partial W$ vary.

3. Derivative with respect to V, $\partial E(\theta)/\partial V$
The loss function used by the model is the squared error function. The total loss made by the model is the sum of the loss over all time steps as given in Eq. (A1).

Taking the derivative of Eq. (A1) with respect to $V$, we get

$$\frac{\partial E(\theta)}{\partial V} = \sum_{t=1}^{T} \frac{\partial E_t(\theta)}{\partial V}. \tag{A17}$$

Each term of $\sum_{t=1}^{T} \partial E_t(\theta)/\partial V$ is the summation of the derivative of the loss with respect to the weights in the output layer:

$$\frac{\partial E_t(\theta)}{\partial V} = \frac{\partial E_t(\theta)}{\partial \hat{y}_t}\,\frac{\partial \hat{y}_t}{\partial V}, \tag{A18}$$

$$\frac{\partial E_t(\theta)}{\partial \hat{y}_t} = \frac{\partial}{\partial \hat{y}_t}\left(\frac{1}{T}\sum_{t=1}^{T} (y_t - \hat{y}_t)^2\right) = -\frac{2}{T}\sum_{t=1}^{T} (y_t - \hat{y}_t). \tag{A19}$$

Therefore, substituting this in Eq. (A18), we get

$$\frac{\partial E_t(\theta)}{\partial V} = -\frac{2}{T}\sum_{t=1}^{T} (y_t - \hat{y}_t)\,\frac{\partial \hat{y}_t}{\partial V}. \tag{A20}$$

However, we know that $\hat{y}_t = V s_t + c$. Then

$$\frac{\partial \hat{y}_t}{\partial V} = \frac{\partial (V s_t + c)}{\partial V} = s_t. \tag{A21}$$

Substituting this in Eq. (A20), we get

$$\frac{\partial E_t(\theta)}{\partial V} = -\frac{2}{T}\sum_{t=1}^{T} (y_t - \hat{y}_t)\,s_t = -\frac{2 s_t}{T}\sum_{t=1}^{T} (y_t - \hat{y}_t). \tag{A22}$$

At this stage, we have computed the gradient of the loss function with respect to $W$, $U$, and $V$. So, the total gradient can be calculated by adding these three gradients. Therefore

$$\begin{aligned}
\text{Total gradient}
&= \frac{\partial E_t(\theta)}{\partial W} + \frac{\partial E_t(\theta)}{\partial U} + \frac{\partial E_t(\theta)}{\partial V} \\
&= -\frac{2V}{T}\sum_{t=1}^{T} (y_t - \hat{y}_t)\sum_{k=1}^{t} \frac{\partial s_t}{\partial s_k}\frac{\partial^{+} s_k}{\partial W} - \frac{2V}{T}\sum_{t=1}^{T} (y_t - \hat{y}_t)\sum_{k=1}^{t} \frac{\partial s_t}{\partial s_k}\frac{\partial^{+} s_k}{\partial U} - \frac{2 s_t}{T}\sum_{t=1}^{T} (y_t - \hat{y}_t) \\
&= -\frac{2V}{T}\sum_{t=1}^{T} (y_t - \hat{y}_t)\sum_{k=1}^{t} \frac{\partial s_t}{\partial s_k}\left(\frac{\partial^{+} s_k}{\partial W} + \frac{\partial^{+} s_k}{\partial U}\right) - \frac{2 s_t}{T}\sum_{t=1}^{T} (y_t - \hat{y}_t) \\
&= -\frac{2}{T}\sum_{t=1}^{T} (y_t - \hat{y}_t)\left[V\sum_{k=1}^{t} \frac{\partial s_t}{\partial s_k}\left(\frac{\partial^{+} s_k}{\partial W} + \frac{\partial^{+} s_k}{\partial U}\right) + s_t\right].
\end{aligned} \tag{A23}$$

REFERENCES
1. H. Liu, C. Chen, X. Lv, X. Wu, and M. Liu, “Deterministic wind energy forecasting: A review of intelligent predictors and auxiliary methods,” Energy Convers. Manage. 195, 328–345 (2019).
2. V. Ranganayaki and S. Deepa, “Linear and non-linear proximal support vector machine classifiers for wind speed prediction,” Cluster Comput. 22, 379–390 (2019).
3. Y. Li, P. Yang, and H. Wang, “Short-term wind speed forecasting based on improved ant colony algorithm for LSSVM,” Cluster Comput. 22, 11575–11581 (2019).
4. T. Blanchard and B. Samanta, “Wind speed forecasting using neural networks,” Wind Eng. 44, 33 (2020).
5. M. Madhiarasan and S. Deepa, “A novel criterion to select hidden neuron numbers in improved back propagation networks for wind speed forecasting,” Appl. Intell. 44, 878–893 (2016).
6. J. Du, Y. Liu, and Z. Liu, “Study of precipitation forecast based on deep belief networks,” Algorithms 11, 132 (2018).
7. P. Zhang, L. Zhang, H. Leung, and J. Wang, “A deep-learning based precipitation forecasting approach using multiple environmental factors,” in 2017 IEEE International Congress on Big Data (BigData Congress, IEEE, 2017), pp. 193–200.
8. M. I. A. Khalaf and J. Q. Gan, “Deep classifier structures with autoencoder for higher-level feature extraction,” in Proceedings of the 10th International Joint Conference on Computational Intelligence, Volume 1: IJCCI (SciTePress, 2018), pp. 31–38, ISBN 978-989-758-327-8.
9. D. Zhang, X. Peng, K. Pan, and Y. Liu, “A novel wind speed forecasting based on hybrid decomposition and online sequential outlier robust extreme learning machine,” Energy Convers. Manage. 180, 338–357 (2019).
10. X. Mi, H. Liu, and Y. Li, “Wind speed prediction model using singular spectrum analysis, empirical mode decomposition and convolutional support vector machine,” Energy Convers. Manage. 180, 196–205 (2019).
11. M. Qolipour, A. Mostafaeipour, M. Saidi-Mehrabad, and H. R. Arabnia, “Prediction of wind speed using a new grey-extreme learning machine hybrid algorithm: A case study,” Energy Environ. 30, 44–62 (2019).
12. H. S. Dhiman, P. Anand, and D. Deb, “Wavelet transform and variants of svr with application in wind forecasting,” Innovations in Infrastructure (Springer, 2019), pp. 501–511.
13. M. Carrillo, J. Del Ser, M. N. Bilbao, C. Perfecto, and D. Camacho, “Wind power production forecasting using ant colony optimization and extreme learning machines,” in International Symposium on Intelligent and Distributed Computing (Springer, 2017), pp. 175–184.
14. R. Sarkar, S. Julai, S. Hossain, W. T. Chong, and M. Rahman, “A comparative study of activation functions of NAR and NARX neural network for long-term wind speed forecasting in Malaysia,” Math. Probl. Eng. 2019, 1.
15. R. M. Adnan, Z. Liang, X. Yuan, O. Kisi, M. Akhlaq, and B. Li, “Comparison of LSSVR, M5RT, NF-GP, and NF-SC models for predictions of hourly wind speed and wind power based on cross-validation,” Energies 12, 329 (2019).
16. J.-Z. Wang and Y. Wang, “A novel wind speed forecasting model for wind farms of Northwest China,” Int. J. Green Energy 14, 463–478 (2017).
17. S. Qin, J. Wang, J. Wu, and G. Zhao, “A hybrid model based on smooth transition periodic autoregressive and elman artificial neural network for wind speed forecasting of the Hebei region in China,” Int. J. Green Energy 13, 595–607 (2016).
18. S. Li, H. Fang, and B. Shi, “Multi-step-ahead prediction with long short term memory networks and support vector regression,” in 2018 37th Chinese Control Conference (CCC) (IEEE, 2018), pp. 8104–8109.
19. C. Yu, Y. Li, Y. Bao, H. Tang, and G. Zhai, “A novel framework for wind speed prediction based on recurrent neural networks and support vector machine,” Energy Convers. Manage. 178, 137–145 (2018).
20. J. Tang and H. Sui, “Power system transient stability assessment based on stacked autoencoders and support vector machine,” in IOP Conference Series: Materials Science and Engineering (IOP Publishing, 2018), Vol. 452, p. 042117.
21. L. Chen, Z. Li, and Y. Zhang, “Multiperiod-ahead wind speed forecasting using deep neural architecture and ensemble learning,” Math. Probl. Eng. 2019, 1.
22. Y.-Y. Hong and C. L. P. P. Rioflorido, “A hybrid deep learning-based neural network for 24-h ahead wind power forecasting,” Appl. Energy 250, 530–539 (2019).
23. V. A. Kumar and N. Elavarasan, “A survey on dimensionality reduction technique,” Int. J. Emerging Trends Technol. Comput. Sci. 3, 36–41 (2014), https://www.semanticscholar.org/paper/A-Survey-on-Dimensionality-Reduction-Technique-Kumar-Elavarasan/96100fcdc4f0f6daace44f8b27501f114ec8891f.
24. V. Sumithra and S. Surendran, “A review of various linear and non linear dimensionality reduction techniques,” Int. J. Comput. Sci. Inf. Technol. 6, 2354–2360 (2015), https://pdfs.semanticscholar.org/ed2f/c78cf5d7eb8233deaa9dde8f42b1e2d7f661.pdf.
25. M. A. Nielsen, Neural Networks and Deep Learning (Determination Press, San Francisco, CA, USA, 2015), Vol. 2018.
26. S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput. 9, 1735–1780 (1997).
27. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (MIT Press, 2016).
28. M. Hossain, B. Rekabdar, S. J. Louis, and S. Dascalu, “Forecasting the weather of Nevada: A deep learning approach,” in 2015 International Joint Conference on Neural Networks (IJCNN) (IEEE, 2015), pp. 1–6.
29. A. Graves, A.-R. Mohamed, and G. Hinton, “Speech recognition with deep recurrent neural networks,” in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (IEEE, 2013), pp. 6645–6649.
30. S. Raschka, “Model evaluation, model selection, and algorithm selection in machine learning,” Computing Research Repository (CoRR), abs/1811.12808, http://arxiv.org/abs/arXiv:1811.12808 (2018).
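
SUPPLEMENTARY NOTE: NUMERICAL CHECK OF EQ. (A13)
The summation over explicit and implicit paths in Eqs. (A9)–(A13) of the Appendix can be checked numerically. The sketch below is an illustration under stated assumptions and not the authors' implementation: it uses the scalar form of the linear recurrence in Eq. (A9), so that each one-step factor ∂s_k/∂s_{k-1} equals w and the explicit term ∂⁺s_k/∂w equals s_{k-1}. The function names (`run_states`, `ds_dw_bptt`, `ds_dw_numeric`) and the toy inputs are hypothetical. The result of the path sum is compared with a central finite difference.

```python
def run_states(w, u, b, xs, s0=0.0):
    """Return [s_0, s_1, ..., s_T] for s_t = w*s_{t-1} + u*x_t + b (Eq. A9, scalar case)."""
    states = [s0]
    for x in xs:
        states.append(w * states[-1] + u * x + b)
    return states


def ds_dw_bptt(w, u, b, xs, s0=0.0):
    """Eq. (A13): ds_T/dw = sum over k of (ds_T/ds_k) * (explicit d+ s_k/dw)."""
    states = run_states(w, u, b, xs, s0)
    t = len(xs)
    total = 0.0
    for k in range(1, t + 1):
        ds_t_ds_k = w ** (t - k)   # product of (t - k) one-step factors, each equal to w
        explicit = states[k - 1]   # d+ s_k/dw with s_{k-1} held constant
        total += ds_t_ds_k * explicit
    return total


def ds_dw_numeric(w, u, b, xs, s0=0.0, eps=1e-6):
    """Central finite-difference estimate of ds_T/dw, used only as a cross-check."""
    hi = run_states(w + eps, u, b, xs, s0)[-1]
    lo = run_states(w - eps, u, b, xs, s0)[-1]
    return (hi - lo) / (2.0 * eps)


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0]  # toy input sequence, invented for illustration
    print(ds_dw_bptt(0.5, 1.0, 0.1, xs))
    print(ds_dw_numeric(0.5, 1.0, 0.1, xs))
```

For this toy sequence the two estimates agree (both are approximately 3.2), which is what Eq. (A13) predicts: the total derivative is the sum of every explicit contribution propagated forward to time t. In the matrix case treated in the Appendix, the scalar power of w would be replaced by a product of Jacobians.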
