
2019 6th International Conference on Instrumentation, Control, and Automation (ICA)

Bandung, Indonesia. 31 July – 2 August 2019

Wind Speed Forecasting Using Recurrent Neural Networks and Long Short Term Memory

Fitriana R. Ningsih
Department of Informatics
Universitas Jenderal Achmad Yani
Cimahi, Indonesia
fitrianarn@gmail.com

Esmeralda C. Djamal
Department of Informatics
Universitas Jenderal Achmad Yani
Cimahi, Indonesia
esmeralda.contessa@lecture.unjani.ac.id

Asep Najmurrakhman
Department of Electrical Engineering
Universitas Jenderal Achmad Yani
Cimahi, Indonesia
asepnajmu@ieee.org

Abstract— Wind is a natural phenomenon that plays an essential role in various aspects of human life, including the spread of pests in plants. This variable is especially relevant for regions often hit by strong winds. The development of machine learning technology now makes it possible to predict wind speed and anticipate its future impacts. This study proposes wind speed prediction using a Recurrent Neural Network (RNN) with Long Short Term Memory (LSTM). The data used was obtained from the Nganjuk Meteorology and Geophysics Agency (BMKG), East Java, from 2008 to 2017. The results showed that using the Adam model could provide 92.7% accuracy for training data and 91.6% for new data.

Keywords— wind speed; forecasting; recurrent neural networks; LSTM; meteorology

I. INTRODUCTION

The wind has several essential roles in various aspects of human life, such as renewable energy [1], shipping, disasters due to turbulence intensity [2], and agriculture. One of the factors that influence agricultural production is wind [3]. For example, the tomato yellow leaf curl virus is spread due to climate change [4]. Strong winds can endanger pollination and spread pests such as fungal growth in plants. The impact of this is a loss for farmers. Therefore, some research has focused on wind speed prediction.

Wind speed forecasting relies on historical climate data or on expertise in physical knowledge. Thus, wind speed prediction mainly comprises two categories: data-driven model-based prediction and physical model-based forecasting [2]. Historical climate data that can be used to predict wind speed include wind speed alone [5]; air temperature and air humidity [1]; and wind speed, air humidity, air temperature, and air pressure [6]. Other studies predict wind speed using data from 26 cities and 12 parameters besides wind speed for monthly predictions [7]. One study used data for the last three months at 10-minute intervals for the prediction of the next hour.

Prediction of meteorological parameters has several aspects. In terms of area, it can cover a global area [8] or a specific area [9]. In terms of time horizon, it can be short term, from minutes to hours [2] [9], daily [10] [11], or weeks to months [7].

Current developments allow machines to learn patterns from data. Meteorological data is sequential or time series data, which certainly possesses particular patterns that can be learned. Some methods that have been used to predict wind speed include Backpropagation and Support Vector Regression (SVR) [12], the Backpropagation Levenberg-Marquardt algorithm [6], Support Vector Machine (SVM) [13], Long Short-Term Memory networks [14] [15] [16], and Recurrent Neural Networks [17].

Deep learning methods are mainly various neural networks designed by inspiration from brain neurons, such as Convolutional Neural Networks (CNN) [18] for image processing. In wind speed prediction, previous research used deep learning methods such as CNN [19], Recurrent Neural Networks (RNN) [17] [20], and Deep Belief Networks (DBN) [21]. These methods have been widely used in image processing, speech recognition, time series data, and natural language processing. The best way to deal with time series data such as meteorological data is the RNN, a general term for a family of neural networks capable of processing sequence data [16]. RNNs encounter great difficulties in dealing with long-term dependencies because of their gradients [16][22]. To solve the problem of long-term dependence, the Long Short Term Memory network was proposed, and it performs very well on time series problems [22]. The LSTM hidden layer adds three gates, namely the input, output, and forget gates, which are the key to solving the long-term dependence problem of RNNs. In deep learning, LSTM is usually used to make point predictions [16].

This study offers a method for predicting monthly wind speed using RNN and LSTM. The data used are temperature, humidity, and wind speed data obtained from BMKG Nganjuk district in the period 2008-2017. The prediction determines the wind speed one month ahead based on the previous wind speed data pattern and assigns it to one of the wind speed categories: "Calm" (less than 1 knot), "Light Breeze" (1-3 knots), "Moderate Breeze" (4-6 knots), "Strong Breeze" (7-10 knots), and strong winds of more than 10 knots.

II. PROPOSED METHODS

This research used the average wind speed, air temperature, and air humidity in 2008-2017. The data was obtained from the Agency for Meteorology, Climatology, and Geophysics (BMKG) of Nganjuk Regency, East Java, Sawahan Geophysics Station, through the BMKG website. The climate parameter data can be seen in Table I. Unfortunately, some climate data is lost or unusable, so it needs to be pre-processed as in Table II.

Pre-processing of the wind speed, air temperature, and humidity data will be carried out, including handling lost data,

978-1-7281-0916-9/19/$31.00 ©2019 IEEE 137


normalizing each feature into the same range, and converting from daily to monthly data by taking the maximum value of each climate parameter variable in every month. The stages can be seen in Fig. 1.

TABLE I. CLIMATE DATA

Day   | Date       | Temperature | Humidity | Wind Speed
1.    | 01/01/2008 | 22.4        | 75       | 3
2.    | 02/01/2008 | 22.5        | 88       | 3
…     | …          | …           | …        | …
2129. | 19/05/2014 | 24.6        | 88       | 9999
2130. | 20/05/2014 | 23.3        | 93       | 1
…     | …          | …           | …        | …
3651. | 30/12/2017 | 88          | 1        | 2
3652. | 31/12/2017 | 91          | 2        | 1

Fig. 1. Wind Speed Forecasting Model (pre-processing of training and test data through interpolation, feature extraction, normalization, and segmentation; training with RNN-LSTM; and prediction of the wind speed class: Calm, Light Breeze (1-3 knots), Moderate Breeze (4-6 knots), Strong Breeze (7-10 knots), or more than 10 knots)

1) Normalization
The climate parameters used are air humidity, air temperature, and wind speed. The three variables have different units and ranges of values. Therefore, normalization to the same range, namely 0-1, is needed, as shown in (1).

    x' = (x - min(x)) / (max(x) - min(x))    (1)

The normalization takes the highest and lowest values of each feature, and these values are used to normalize the numbers in that feature.

2) Day to Month Conversion
This extraction process searches for the maximum value every month of each variable, namely air temperature, air humidity, and wind speed, so that the data becomes monthly data as in Table II.

TABLE II. FEATURE EXTRACTION RESULTS

Month | Date       | Temperature | Humidity | Wind Speed
1.    | 31/01/2008 | 24.5        | 91.0     | 4.0
2.    | 29/02/2008 | 24.7        | 96.0     | 9.0
…     | …          | …           | …        | …
90.   | 31/07/2015 | 26.1        | 95.0     | 3.0
100.  | 31/08/2015 | 25.4        | 93.0     | 2.0
…     | …          | …           | …        | …
119.  | 30/11/2017 | 25.5        | 97.0     | 2.0
120.  | 31/12/2017 | 26.2        | 97.0     | 3.0

3) Segmentation with Overlapping
Each data set has a history of one year. The data used have five years and ten years of history. The overlapping segmentation is intended to minimize the effect of data discontinuity, as shown in Fig. 2.

Fig. 2. Data Segmentation (months 1-120 segmented into overlapping one-year datasets: Dataset 1, Dataset 2, …, Dataset 109)

B. Recurrent Neural Networks

RNNs are a model that mimics the human way of thinking by considering information from the past in the learning process. RNNs are a modification of the feedforward neural network, characterized by feedback from the output to the input. The RNN output depends not only on the current input but also on the previous state of the network, which acts as memory [23]. The RNN architecture includes the input layer (x_t), the hidden layer (h_t), and the output layer (o_t), as shown in Fig. 3.

Fig. 3. Recurrent Neural Networks Architecture (the network unfolded in time, with weight U between the input and hidden layers, V between the hidden and output layers, and W between hidden layers at consecutive time steps)

Nodes in the hidden layer are fully connected; the output of the hidden layer is also an input of the hidden layer at the next time step. U is the weight between the input layer and the hidden layer, V is the weight between the hidden layer and the output layer, and W is the weight between the current hidden layer and the hidden layer at the next time step. Through W, the hidden layer receives the state of the hidden layer at the previous time step (t-1) together with the current input x_t, and the result is stored as the new hidden state (h_t); in other words, this is how the hidden layer is updated. The hidden layer calculation is shown in (3), and the output calculation uses the softmax function, as shown in (4).

    h_t = f(U·x_t + W·h_{t-1})    (3)
    o_t = softmax(V·h_t)          (4)
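As an illustration, the recurrent update in (3) and the softmax output in (4) can be sketched in NumPy. This is a minimal sketch; the layer sizes and the choice of tanh for the activation f are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def rnn_step(x_t, h_prev, U, W, V):
    """One RNN time step: h_t = tanh(U·x_t + W·h_{t-1}), o_t = softmax(V·h_t)."""
    h_t = np.tanh(U @ x_t + W @ h_prev)   # hidden state update, eq. (3)
    o_t = softmax(V @ h_t)                # output distribution, eq. (4)
    return h_t, o_t

# Illustrative sizes: 3 input features (temperature, humidity, wind speed),
# 4 hidden units, 5 output classes (the wind speed categories).
rng = np.random.default_rng(0)
U = rng.normal(size=(4, 3))
W = rng.normal(size=(4, 4))
V = rng.normal(size=(5, 4))
x_t = rng.normal(size=3)
h_prev = np.zeros(4)

h_t, o_t = rnn_step(x_t, h_prev, U, W, V)
```

Because the same W is applied at every time step, unrolling this step over a long sequence repeatedly multiplies gradients by W, which is the mechanism behind the vanishing gradient problem discussed next.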
Although in theory RNNs can overcome long-term dependencies, the longer the interval between time steps, the more times the weight matrix is multiplied with the previous output. This condition causes the vanishing gradient problem, a situation where the gradient value becomes very small so that learning is prolonged. Within the RNN family there are several variants and training schemes, such as the Gated Recurrent Unit (GRU), Backpropagation Through Time (BPTT), and Long Short Term Memory (LSTM). This study used LSTM networks to overcome the vanishing gradient problem.

C. Long Short Term Memory

LSTM is a particular type of Recurrent Neural Network that is capable of learning long-term dependencies [24]. The difference between LSTM and traditional RNNs is that each neuron in LSTM is a memory cell [25]. LSTM connects previous data information to the current neurons. Each neuron contains three gates, namely the input gate, the forget gate, and the output gate, which can be seen in Fig. 4.

Fig. 4. Long Short Term Memory Architecture (each cell combines h_{t-1} and x_t through sigmoid and tanh layers in the forget, input, and output gates to update the cell state C_t and produce the output h_t)

1) Forget Gate
To determine which information will be discarded from the cell, (5) is used. By entering the output h_{t-1} of the previous unit (t-1) and the current input x_t at time t into the sigmoid function S(t) in (6), the value generated is between zero and one. This value is multiplied with the cell state C_{t-1} to determine how much information will be forgotten or remembered. If the result is zero, the information is completely discarded; if it is one, the information is fully retained. W and b are the weight matrix and bias vector parameters, respectively.

    f_t = σ(W_f·[h_{t-1}, x_t] + b_f)    (5)
    S(t) = 1 / (1 + e^(-t))              (6)

2) Input Gate
This gate determines which new information must be stored in the cell state. By entering the output h_{t-1} of the previous unit (t-1) and the current input x_t at time t into the sigmoid function, the value i_t generated by (7) is between zero and one and decides how much new information to store in the cell state. At the same time, a tanh layer produces the candidate values C̃_t, calculated using (8), to be added to the cell state.

    i_t = σ(W_i·[h_{t-1}, x_t] + b_i)        (7)
    C̃_t = tanh(W_C·[h_{t-1}, x_t] + b_C)    (8)

Multiplying the values of i_t and C̃_t gives the new information to be added to the cell state, so that the new cell state C_t is obtained using (9).

    C_t = f_t * C_{t-1} + i_t * C̃_t    (9)

3) Output Gate
The output gate determines which information from the cell state will be output. By entering the previous output h_{t-1} at time t-1 and the current input into the sigmoid function, the value o_t in (10) is between zero and one and determines how much of the cell state to output. The cell state is first passed through a tanh layer, which produces values between -1 and 1, and is then multiplied by o_t. The output h_t of the LSTM block at time t is calculated using (11).

    o_t = σ(W_o·[h_{t-1}, x_t] + b_o)    (10)
    h_t = o_t * tanh(C_t)                (11)

III. RESULT AND DISCUSSION

Climate data from 2008-2017 of Nganjuk, East Java were used. The data is divided into two parts, namely 80% for training data and 20% for test data. Training was optimized by varying the weight update model, the number of training datasets, and the learning rate.

A. Comparing Two Optimization Models

This study used two optimization models, Stochastic Gradient Descent (SGD) and Adaptive Moment Estimation (Adam). Both optimization models use a learning rate of 0.001 and 700 epochs to determine the effectiveness of the two optimization methods. The test results with the two optimization methods can be seen in Table III.

TABLE III. COMPARISON OF OPTIMIZATION MODELS

No. | Model | Training Loss | Training Accuracy (%) | New Data Loss | New Data Accuracy (%)
1.  | SGD   | 0.4630        | 46.87                 | 0.3770        | 25.00
2.  | Adam  | 0.0446        | 92.70                 | 0.9016        | 91.66

In this study, the Adam optimization method performed better than SGD, with higher accuracy and less deviation from the target, indicated by a lower loss than the SGD model. Graphs of the SGD and Adam test results can be seen in Fig. 5 and Fig. 6.

Fig. 5. Accuracy of SGD and Adam Model

Fig. 6. Loss of SGD and Adam Model
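To make the gate computations concrete, a single LSTM step following the gate equations (5)-(11) can be sketched in NumPy. This is a minimal illustrative sketch; the layer sizes and weight layout are assumptions, not the authors' implementation:

```python
import numpy as np

def sigmoid(t):
    # Logistic function S(t) = 1 / (1 + e^(-t)), eq. (6).
    return 1.0 / (1.0 + np.exp(-t))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step following eqs. (5)-(11); W and b hold per-gate parameters."""
    z = np.concatenate([h_prev, x_t])        # concatenated [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate, eq. (5)
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate, eq. (7)
    c_hat = np.tanh(W["c"] @ z + b["c"])     # candidate cell state, eq. (8)
    c_t = f_t * c_prev + i_t * c_hat         # cell state update, eq. (9)
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate, eq. (10)
    h_t = o_t * np.tanh(c_t)                 # block output, eq. (11)
    return h_t, c_t

# Illustrative sizes: 3 input features, 4 hidden units.
rng = np.random.default_rng(1)
W = {k: rng.normal(size=(4, 7)) for k in "fico"}  # 7 = 4 hidden + 3 input
b = {k: np.zeros(4) for k in "fico"}
h, c = np.zeros(4), np.zeros(4)
x_t = rng.normal(size=3)
h, c = lstm_step(x_t, h, c, W, b)
```

Because the cell state c_t is carried forward additively rather than through repeated matrix multiplication, gradients can flow across many time steps, which is how the LSTM avoids the vanishing gradient problem described earlier.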
Based on the results of the SGD and Adam optimization tests in Table III, the accuracy using SGD optimization is 46.87% for training data and 25.00% for test data, while testing with Adam optimization reached 92.70% accuracy for training data and 91.66% for test data. As shown in Fig. 5, the comparison of the SGD and Adam optimization models indicates that the SGD model shows no change in accuracy from epoch to epoch, while the Adam model is more volatile. The model loss shown in Fig. 6 shows that the loss of SGD is unchanged, while Adam's optimization yields a smaller loss. Therefore, it can be concluded that the choice of optimization method influences the training accuracy.

B. Influence of Data History Number

This study predicts wind speed with data from the past ten years. To find out how much the amount of training data influences the results, tests were carried out with different numbers of datasets, where the data processed is daily data. The training process was carried out using Adam optimization with a learning rate of 0.001 and 200 epochs. The configurations to be tested can be seen in Table IV, and the results can be seen in Table V.

TABLE IV. CONFIGURATION OF RNN

No. | Layer        | Ten years | Five years
1.  | Dataset      | 120       | 60
2.  | Input Layer  | 64        | 64
3.  | Hidden Layer | 64        | 64
4.  | Dropout      | 0.2       | 0.2
5.  | Dense        | 13        | 13
6.  | Output Layer | 1         | 1

TABLE V. RNN CONFIGURATION TESTING

No. | Dataset | Training Loss | Training Accuracy (%) | New Data Loss | New Data Accuracy (%)
1.  | 120     | 0.0446        | 92.70                 | 0.9016        | 91.66
2.  | 60      | 0.0741        | 85.71                 | 0.1764        | 83.33

Table V shows the difference in accuracy obtained based on the number of datasets. The experiment with 120 datasets, or ten years of data, achieved an accuracy of 92.70% for training data and 91.66% for test data, whereas the trial with 60 datasets, or five years of data, achieved 85.71% for training data and 83.33% for test data. It can be concluded that accuracy decreases with a reduced number of training datasets, so the amount of data affects the accuracy obtained.

C. Amount of Epoch Testing

This study used 700 epochs, which gave excellent results with a learning rate of 0.001 and the Adam optimization model. To find out how much the number of epochs influences training, results for different epoch counts can be seen in Table VI.

TABLE VI. INFLUENCE OF THE NUMBER OF EPOCHS ON ACCURACY

No. | Epoch | Training Loss | Training Accuracy (%) | Test Loss | Test Accuracy (%)
1.  | 100   | 0.1025        | 70.83                 | 0.1263    | 75.00
2.  | 200   | 0.1906        | 71.87                 | 0.1171    | 78.32
3.  | 300   | 0.0205        | 75.00                 | 0.0573    | 83.00
4.  | 400   | 0.0567        | 84.41                 | 0.0849    | 82.67
5.  | 500   | 0.0489        | 85.41                 | 0.0838    | 83.33
6.  | 600   | 0.0543        | 91.66                 | 0.0419    | 90.61
7.  | 700   | 0.0446        | 92.70                 | 0.9016    | 91.66

Table VI shows the difference in accuracy obtained based on the number of epochs. The experiment with 100 epochs had the lowest accuracy, 70.83% for training data and 75.00% for test data, while the trial with 700 epochs had the highest accuracy, 92.70% for training data and 91.66% for test data. As shown in Fig. 7, accuracy increases with the number of epochs. This happens because as the number of epochs increases, the weights of the neural network are updated more times. Therefore, it can be concluded that the number of epochs affects the accuracy obtained.

Fig. 7. Training and Testing Accuracy over Epochs

IV. CONCLUSION

This study has predicted wind speed using Recurrent Neural Networks (RNN) with Long Short Term Memory (LSTM). The wind speed prediction system consists of three stages. The first stage is preprocessing, consisting of data interpolation, extraction, normalization, and segmentation. The second stage is training using RNN with LSTM, and the third stage is testing. The results of this study indicate that wind speed prediction using RNN achieves outstanding results with Adam optimization and a learning rate of 0.001, obtaining an accuracy of 92.70% for training data and 91.66% for test data. Predictions using data from the past ten years give better results than predictions using five-year data. Therefore, it can be concluded that the optimization model, the amount of data, and the number of epochs used for the training process influence the accuracy obtained.
ACKNOWLEDGMENT
Thanks to Lembaga Penelitian dan Pengabdian Kepada Masyarakat, Universitas Jenderal Achmad Yani, for the financial support provided for this research in 2019.

REFERENCES

[1] A. Lodge and X. H. Yu, "Short term wind speed prediction using artificial neural networks," in Proceedings of the 2014 4th IEEE International Conference on Information Science and Technology (ICIST), 2014, pp. 539-542.
[2] F. Li, G. Ren, and J. Lee, "Multi-step wind speed prediction based on turbulence intensity and hybrid deep neural networks," Energy Conversion and Management, vol. 186, pp. 306-322, 2019.
[3] M. I. Indriyani, F. Nhita, and D. Saepudin, "Predictions of the spread of stem borers in Bandung regency based on weather information using the Adaptive Neuro-Fuzzy Inference System (ANFIS) algorithm," in e-Proceeding of Engineering, 2016, vol. 3, no. 2, pp. 3914-3926.
[4] R. Soares, L. Kumar, F. Shabani, and M. C. Picanço, "Risk of spread of tomato yellow leaf curl virus (TYLCV) in tomato crops under various climate change scenarios," Agricultural Systems, vol. 173, pp. 524-535, 2019.
[5] H. Liu, H. Tian, D. Pan, and Y. Li, "Forecasting models for wind speed using wavelet, wavelet packet, time series and Artificial Neural Networks," Applied Energy, vol. 107, pp. 191-208, 2013.
[6] T. Multazam, R. I. Putri, M. Pujiantara, V. Lystianingrum, A. Priyadi, and M. H. P, "Short-term wind speed prediction based on the Backpropagation Levenberg-Marquardt algorithm; case study area Nganjuk," in 5th International Conference on Instrumentation, Communications, Information Technology, and Biomedical Engineering (ICICI-BME), 2017.
[7] T. Kaur, S. Kumar, and R. Segal, "Application of artificial neural network for short term wind speed prediction," in Biennial International Conference on Power and Energy Systems: Towards Sustainable Energy (PESTSE), 2016, pp. 217-222.
[8] S. G. Gouda, Z. Hussein, S. Luo, and Q. Yuan, "Model selection for accurate daily global solar radiation prediction in China," Journal of Cleaner Production, vol. 221, pp. 132-144, 2019.
[9] I. Tanaka and H. Ohmori, "Method selection in different regions for short-term wind speed prediction in Japan," in SICE Annual Conference, 2015, pp. 189-194.
[10] K. Kaba, M. Sarıgül, M. Avcı, and H. K. Mustafa, "Estimation of daily global solar radiation using deep learning," Energy, vol. 162, no. 1, pp. 126-135, 2018.
[11] L. Wang, Z. Wang, and B. Wang, "Wind power day-ahead prediction based on LSSVM with fruit fly optimization algorithm," in 2018 International Conference on Power System Technology (POWERCON), 2018, pp. 999-1003.
[12] S. Nurunnahar and D. B. Talukdar, "A short term wind speed forecasting using SVR and BP-ANN: a comparative analysis," in 2017 20th International Conference of Computer and Information Technology (ICCIT), 2017, pp. 22-24.
[13] N. Kumar and G. Kaur, "Wind speed prediction using neural network," International Journal of Advanced Production and Industrial Engineering (IJAPIE), vol. 608, pp. 36-41, 2017.
[14] Z. Zhang et al., "Long short-term memory network based on neighborhood gates for processing complex causality in wind speed prediction," Energy Conversion and Management, vol. 192, pp. 37-51, 2019.
[15] X. Qing and Y. Niu, "Hourly day-ahead solar irradiance prediction using weather forecasts by LSTM," Energy, vol. 148, pp. 461-468, 2018.
[16] Z. Zhang, H. Qin, L. Yao, J. Lu, and L. Cheng, "Interval prediction method based on long-short term memory networks for system integrated of hydro, wind and solar power," in Energy Procedia, 2019, vol. 158, pp. 6176-6182.
[17] Z. Shi, H. Liang, and V. Dinavahi, "Direct interval forecast of uncertain wind power based on recurrent neural networks," IEEE Transactions on Sustainable Energy, vol. 9, no. 3, pp. 1177-1187, 2018.
[18] H. Fadhilah, E. C. Djamal, and R. Ilyas, "Non-halal ingredients detection of food packaging image using convolutional neural networks," in The 2018 International Symposium on Advanced Informatics (SAIN 2018), 2018.
[19] A. Zhu, X. Li, Z. Mo, and H. Wu, "Wind power prediction based on a convolutional neural network," in 2017 International Conference on Circuits, Devices and Systems, 2017, pp. 133-135.
[20] Q. Cao, B. T. Ewing, and M. A. Thompson, "Forecasting wind speed with recurrent neural networks," European Journal of Operational Research, vol. 221, no. 1, pp. 148-154, 2012.
[21] H. Z. Wang, G. B. Wang, G. Q. Li, J. C. Peng, and Y. T. Liu, "Deep belief network based deterministic and probabilistic wind speed forecasting approach," Applied Energy, vol. 182, pp. 80-93, 2016.
[22] F. A. Gers, J. Schmidhuber, and F. Cummins, "Learning to forget: continual prediction with LSTM," in Proceedings of ICANN'99 International Conference on Artificial Neural Networks, 1999, vol. 2, pp. 850-855.
[23] Z. Abbas, A. Al-Shishtawy, S. Girdzijauskas, and V. Vlassov, "Short-term traffic prediction using long short-term memory neural networks," in 2018 IEEE International Congress on Big Data (BigData Congress), 2018, pp. 57-65.
[24] I. Kök, M. U. Şimşek, and S. Özdemir, "A deep learning model for air quality prediction in smart cities," in Proceedings of the 2017 IEEE International Conference on Big Data, 2017, pp. 1983-1990.
[25] Y.-T. Tsai, Y.-R. Zeng, and Y.-S. Chang, "Air pollution forecasting using RNN with LSTM," in 2018 IEEE 16th International Conference on Dependable, Autonomic and Secure Computing, 2018, pp. 1074-1079.

