
sustainability

Article
Research on Short-Time Wind Speed Prediction in
Mountainous Areas Based on Improved ARIMA Model
Zelin Zhou 1 , Yiyan Dai 2, * , Jun Xiao 3,4 , Maoyi Liu 5 , Jinxiang Zhang 2 and Mingjin Zhang 2

1 China 19th Metallurgical Corporation, Chengdu 610031, China


2 Department of Bridge Engineering, Southwest Jiaotong University, Chengdu 610031, China
3 CCCC Second Highway Engineering Co., Ltd., Xi’an 710199, China
4 Shaanxi Union Research Center of University and Enterprise for Bridge Intelligent Construction,
Xi’an 710199, China
5 Chongqing Construction Investment (Group) Co., Ltd., Chongqing 400010, China
* Correspondence: dyy@my.swjtu.edu.cn

Abstract: In rugged mountain areas, the lateral aerodynamic force and aerodynamic lift caused by
strong winds are the main reasons for the lateral overturning of trains and the destruction of buildings
and structures along the railroad line. Therefore, it is important to build a strong wind alarm system
along the railroad line, and a reasonable and accurate short-time forecast of a strong wind is the
basis of it. In this research, two methods of constructive function and time-series decomposition
are proposed to pre-process the input wind speed for periodic strong winds in mountainous areas.
Then, the improved Auto-Regressive Integrated Moving Average (ARIMA) time-series model was
established through the steps of a white noise test, data stationarity test, model recognition, and
order determination. Finally, the effectiveness of the improved wind speed prediction was examined.
The results of the research showed that rational choice of processing functions has a large impact on
wind speed prediction results. The prediction accuracy of the improved ARIMA model proposed in
this paper is better than the results of the traditional Seasonal Auto-Regressive Integrated Moving
Average model, and it can quickly and accurately realize the short-time wind speed prediction along
the railroad line in rugged mountains. In addition, the improved ARIMA model has verified its
universality in different mountainous places.

Keywords: wind speed forecast; data pre-processing; auto-regressive integrated moving average;
seasonal auto-regressive integrated moving average; rugged mountain areas

Citation: Zhou, Z.; Dai, Y.; Xiao, J.; Liu, M.; Zhang, J.; Zhang, M. Research on Short-Time Wind
Speed Prediction in Mountainous Areas Based on Improved ARIMA Model. Sustainability 2022, 14, 15301.
https://doi.org/10.3390/su142215301

Academic Editor: Byungik Chang

Received: 11 October 2022
Accepted: 15 November 2022
Published: 17 November 2022

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and
institutional affiliations.

Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution (CC BY)
license (https://creativecommons.org/licenses/by/4.0/).

Sustainability 2022, 14, 15301. https://doi.org/10.3390/su142215301 https://www.mdpi.com/journal/sustainability

1. Introduction
With economic development and social progress, more and more railways have been
built in rugged mountainous areas. Many studies have therefore been carried out on
wind parameters in mountainous areas [1,2]. High winds in mountainous areas can
cause damage to buildings along the railway line and affect construction operations [3].
To prevent disasters in advance, it is necessary to establish a strong wind warning system [4].
However, the non-linear and non-stationary characteristics of mountain winds pose a
challenge to the prediction of wind speeds [5,6].
The main methods of predicting wind speed in mountainous areas include numerical
weather prediction, neural networks, and statistical methods. Numerical weather prediction
(NWP) can take into account the relationship between wind speed and topography
based on Weather Research and Forecasting (WRF) [7], but it can only achieve mesoscale
forecasts and needs to be combined with other models to achieve high accuracy [8]. The
application of neural networks in wind speed prediction is mature, and it has a strong
non-linear fitting ability, which can achieve good prediction effects for non-stationary wind in
mountainous areas. Many neural network models have achieved good results in mountain
wind speed prediction, such as Artificial Neural Network (ANN) [9], Back Propagation


(BP) [10], ELMAN [11], Long Short-Term Memory (LSTM) [12], etc. Moreover, hybrid
models usually behave better than single models [13–15]. However, the training time of the
neural network is longer, and although the hybrid model can get better prediction results,
the steps are more complicated.
The statistical method is widely applied in the prediction of wind speed in mountainous
areas. The Auto-Regressive Integrated Moving Average (ARIMA) model is representative
of the statistical method [16] and is simple to operate for short-term wind speed
prediction. In addition, fewer parameters of this model need to be determined, and the
method for determining the parameters is well established [17,18]. However, the input
data required by the model must be stationary and must not include potential seasonal
factors [19]. Wind speeds in mountainous areas are often non-stationary, and original
data fed directly into the model without processing cannot achieve the desired results.
Therefore, Seasonal Auto-Regressive Integrated Moving Average (SARIMA) was proposed
to solve the above problem and has been applied in wind speed prediction in rugged
mountainous areas [20,21]. However, the SARIMA has more parameters than the ARIMA
model and is more difficult to manipulate.
To solve the above problem of the ARIMA model and improve the quality of the data fed
into it, this research proposes two methods for preprocessing the data, namely the
constructive function method and the time-series decomposition method.
The parameter regularity of those two methods is discussed based on the historical wind
speed data from different sites. Then, the processed data are input into the ARIMA model
to predict the future wind speed. The results show that the improved ARIMA model based
on the proposed two methods can conduct accurate forecasting and behave better than the
SARIMA model. To further confirm the reliability of the improved ARIMA model, these
models are used to forecast another wind speed in the mountains. The results also show
the superiority of the model. Finally, the main conclusions are presented.
2. Methodology
2.1. ARIMA
The Auto-Regressive Integrated Moving Average (ARIMA) model is a combination of
the Auto-Regressive (AR) model and the Moving Average (MA) model, and is a means of
forecasting future data and trends based on historical data. The expression of the ARIMA
with parameters p, d, and q is shown in Equation (1).

$$X_t - \varphi_1 X_{t-1} - \cdots - \varphi_p X_{t-p} = \varepsilon_t - \theta_1 \varepsilon_{t-1} - \cdots - \theta_q \varepsilon_{t-q} \qquad (1)$$

where p is the order of the autoregressive term, and q is the order of the moving
average term. d stands for the number of differencing operations: the original data become
stationary after being differenced d times. $X_t$ is a stationary time series with zero mean.
$\varepsilon_t$ is stationary white noise with zero mean and variance $\sigma_\varepsilon^2$. The process of the
ARIMA model is shown in Figure 1.

Figure 1. The process of the ARIMA model.
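
As a hedged illustration of Equation (1), the sketch below rearranges it into a one-step recursion, X_t = φ_1 X_{t−1} + … + φ_p X_{t−p} + ε_t − θ_1 ε_{t−1} − … − θ_q ε_{t−q}, and shows d-fold differencing. The function names `difference` and `arma_step` and all numbers are assumptions made for illustration; they are not taken from the paper.

```python
def difference(series, d=1):
    """Apply first-order differencing d times (the 'I' part of ARIMA)."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def arma_step(past_x, past_eps, phi, theta, eps_t=0.0):
    """Value of X_t implied by Equation (1).

    past_x[-1] is X_{t-1} and past_eps[-1] is eps_{t-1};
    phi and theta are the AR and MA coefficient lists.
    """
    ar = sum(c * past_x[-j] for j, c in enumerate(phi, start=1))
    ma = sum(c * past_eps[-j] for j, c in enumerate(theta, start=1))
    return ar + eps_t - ma


# Illustrative numbers only (hypothetical, not from the paper).
print(difference([1.0, 3.0, 6.0, 10.0], d=1))
print(difference([1.0, 3.0, 6.0, 10.0], d=2))
print(arma_step([0.5, 1.0], [0.2], phi=[0.6, -0.2], theta=[0.5]))
```

Setting `eps_t=0.0` gives the one-step forecast, since the future noise term has zero mean.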

As shown in Figure 1, to confirm the quality of the data that are input to the model, the
original data should undergo the white noise test and the stationarity test. The
white noise test adopts the Ljung–Box (LB) statistics test [22]. The null hypothesis H0 is that
the data are white noise; if the p-value obtained from the test is less than 0.05, the null
hypothesis is rejected (i.e., the original data are not white noise). The stationarity test uses the
Augmented Dickey–Fuller (ADF) test. The null hypothesis H0 is that there is a unit
root (i.e., the data are non-stationary). If the null hypothesis cannot be rejected, differencing
is performed to make the data stationary. In contrast, if the null hypothesis
is rejected, the data are considered stationary. The Autocorrelation Function (ACF) and
the Partial Autocorrelation Function (PACF) are used to determine the suitable type of
model for the input data. Moreover, the Bayesian Information Criterion (BIC) is applied for
determining the appropriate parameters of the model, as shown in Equation (2).

$$\mathrm{BIC} = k \ln(n) - 2 \ln(L) \qquad (2)$$

where k is the number of parameters, L stands for the likelihood function, and n is the
sample number.
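
The two quantities above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names are hypothetical, the toy series is invented, the BIC function uses the standard definition k·ln(n) − 2·ln(L), and instead of a p-value the Ljung–Box statistic Q is compared with the 95% chi-square quantile for 10 lags (18.307). In practice, library routines such as `acorr_ljungbox` and `adfuller` in statsmodels return p-values directly.

```python
import math


def autocorr(x, k):
    """Sample autocorrelation of x at lag k."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
    return num / denom


def ljung_box_q(x, max_lag):
    """Ljung-Box statistic Q = n(n+2) * sum_k rho_k^2 / (n - k)."""
    n = len(x)
    return n * (n + 2) * sum(
        autocorr(x, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


def bic(k, n, log_likelihood):
    """Standard BIC, k*ln(n) - 2*ln(L), with ln(L) passed in directly."""
    return k * math.log(n) - 2.0 * log_likelihood


# Toy periodic series (hypothetical data): strongly autocorrelated, so Q is large.
series = [math.sin(2 * math.pi * t / 12) for t in range(120)]
CHI2_95_DF10 = 18.307  # 95% quantile of chi-square with 10 degrees of freedom
q = ljung_box_q(series, max_lag=10)
print("white noise rejected:", q > CHI2_95_DF10)

# Lower BIC is preferred: an extra parameter must buy enough likelihood.
print(bic(k=2, n=1000, log_likelihood=-1500.0) < bic(k=3, n=1000, log_likelihood=-1499.0))
```

Order determination then amounts to fitting candidate (p, q) pairs and keeping the pair with the smallest BIC.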

2.2. SARIMA
The Seasonal Auto-Regressive Integrated Moving Average (SARIMA) model is based
on the ARIMA theory and takes into account the influence of seasonal terms. The SARIMA
model has a good prediction effect for data with strong periodicity. The general form is
SARIMA (p, d, q) × (P, D, Q)s, where the significance of parameters p, d, and q is the same as
that of the ARIMA model. The parameter P represents the order of seasonal autoregression,
Q is the order of the seasonal moving average, and D is the order of the seasonal differential.
In addition, the s is the cycle length of the season. The expression of SARIMA can be seen
in Equations (3)–(5).

$$\varphi(B)\,\Phi(B^s)\,(1 - B^s)^D X_t = c + \theta(B)\,\Theta(B^s)\,\varepsilon_t \qquad (3)$$

$$\Phi(B^s) = 1 - \Phi_1 B^{s} - \cdots - \Phi_P B^{Ps} \qquad (4)$$

$$\Theta(B^s) = 1 - \Theta_1 B^{s} - \cdots - \Theta_Q B^{Qs} \qquad (5)$$

where $X_t$ is a stationary time series, $B^s$ denotes the seasonal backward shift operator, and $\varepsilon_t$ is
stationary white noise with zero mean and variance $\sigma_\varepsilon^2$. The modeling process of the
SARIMA is similar to that of the ARIMA. The difference is that the SARIMA takes into account
the potential periodicity in the order determination, which is reflected in the seasonal term
of the model. Because it eliminates the influence of periodic factors, SARIMA prediction
results are often better than those of the ordinary ARIMA model. Additionally, the SARIMA model
does not need the input data to be processed to remove the trend term and the seasonal term. In
essence, the SARIMA is a multiplicative ARIMA model that considers seasonal attributes.
The SARIMA model has a good predictive effect on periodic data. However, the SARIMA has
more parameters than the ARIMA, so it takes longer to fit than the general ARIMA,
and the uncertain parameters will also affect the prediction results.
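
To make the multiplicative structure concrete, the sketch below multiplies a non-seasonal autoregressive polynomial in B by a seasonal polynomial in B^s, as on the left-hand side of Equation (3). The helper names and coefficient values are hypothetical and chosen only for illustration.

```python
def poly_mul(a, b):
    """Multiply two polynomials in the backshift operator B.

    Each polynomial is a list of coefficients indexed by the power of B.
    """
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def ar_poly(coeffs):
    """1 - c1*B - ... - cp*B^p as a coefficient list."""
    return [1.0] + [-c for c in coeffs]


def seasonal_poly(coeffs, s):
    """1 - C1*B^s - ... - CP*B^(P*s) as a coefficient list."""
    out = [1.0] + [0.0] * (len(coeffs) * s)
    for j, c in enumerate(coeffs, start=1):
        out[j * s] = -c
    return out


# Hypothetical coefficients: one AR term (0.5), one seasonal AR term (0.3), s = 4.
combined = poly_mul(ar_poly([0.5]), seasonal_poly([0.3], s=4))
print(combined)
```

The product contains a cross term at lag s + 1 that neither factor has on its own. That coupling is why a seasonal model needs only a few extra parameters, yet each of them influences several lags.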

2.3. Improved ARIMA


Pre-processing can eliminate the non-stationarity of the original data. This addresses two
problems of the plain ARIMA model: prediction accuracy is low when the order parameters
are too low, and the model becomes complex and time-consuming when the order parameters
are too high. Moreover, the potential regularity of the input data cannot be identified by the
model itself. To solve these problems, this research proposes two methods to preprocess the
input data and thereby optimize the ARIMA model, namely the constructive
function method and the time-series decomposition method. The process of the improved
ARIMA model is shown in Figure 2.

Figure 2. Improved ARIMA modeling process.

For convenience, the training set data are denoted $\{X_{t1}\}$, and the data that are input to
the ARIMA model are denoted $\{X_{t2}\}$. $\{X_{t3}\}$ is the term obtained from the constructive
function or the time-series decomposition. The relationship between $\{X_{t1}\}$, $\{X_{t2}\}$, and
$\{X_{t3}\}$ is given in Equation (6). $\{X_{t4}\}$ stands for the testing set of data.

$$\{X_{t2}\} = \{X_{t1}\} - \{X_{t3}\} \qquad (6)$$

2.3.1. Constructive Function Method
Because wind in rugged mountainous areas is complex and non-stationary, a periodic
function $\{X_{t3}\}$ is constructed to eliminate the non-stationarity of the data and make the
data input to the model stable. Specifically, the periodic function $\{X_{t3}\}$ is subtracted from
the training set $\{X_{t1}\}$ to get $\{X_{t2}\}$, and $\{X_{t2}\}$ is predicted after the trend term is removed.
The expression of $\{X_{t3}\}$ is shown in Equation (7).

$$\{X_{t3}\} = K \cos\!\left(\frac{i}{\omega}\right) + C \qquad (7)$$

where K is the magnification of the periodic function, and i is the number of the data.
ω represents the period of the periodic function, and C is a constant.
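
A minimal sketch of Equations (6) and (7) follows. The function names are hypothetical, starting the index i at 1 is an assumption, and adding the processing term back to the forecast (`restore`) is an assumed inverse step that the text implies but does not spell out. The toy wind values are invented; K = 35 and ω = 26 are the site B optimum reported in the parameter discussion.

```python
import math


def constructive_function(n, K, omega, C=0.0):
    """Equation (7): X_t3(i) = K * cos(i / omega) + C for i = 1..n."""
    return [K * math.cos(i / omega) + C for i in range(1, n + 1)]


def preprocess(x_t1, x_t3):
    """Equation (6): X_t2 = X_t1 - X_t3, element by element."""
    return [a - b for a, b in zip(x_t1, x_t3)]


def restore(pred_t2, x_t3_future):
    """Assumed inverse step: add the processing term back to a forecast of X_t2."""
    return [a + b for a, b in zip(pred_t2, x_t3_future)]


# Hypothetical wind speeds (m/s) for five samples.
x_t1 = [10.0, 11.5, 9.8, 12.1, 10.7]
x_t3 = constructive_function(len(x_t1), K=35.0, omega=26.0)
x_t2 = preprocess(x_t1, x_t3)
print([round(v, 3) for v in x_t2])
```

Written as cos(i/ω), the function repeats every 2πω samples, so ω sets the period up to a factor of 2π.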
2.3.2. Time-Series Decomposition Method
To remove the influence of the period term in the training set data on the prediction,
this research uses statsmodels to decompose the historical measured data. statsmodels
can extract the period components from a one-dimensional time series: the trend is first
estimated by applying a convolution filter to the data, the trend is then removed from the
series, and the average value of the detrended series at each position within the period is
returned as the periodic component [23,24].
The decomposition splits the training set data into a trend term, a period
term $\{X_{t3}\}$, and a residual term. Moreover, to increase the effectiveness of the period term
on the input model data, the decomposition term is multiplied by a magnification coefficient
to form a treatment function, the expression of which is shown in Equation (8).

$$\{X_{t3}\} = K \cdot S(\varphi, t) \qquad (8)$$

where K is a magnification coefficient, S is the decomposition function of the seasonal term
of statsmodels, and φ represents the sampling period for decomposition.
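
The decomposition principle described above (convolution-filter trend, then per-period averaging of the detrended series) can be sketched without any library. This is an illustrative re-implementation of an additive classical decomposition, similar in spirit to the `seasonal_decompose` routine in statsmodels, and is not claimed to be the exact routine the authors used. All function names and the toy series are assumptions.

```python
import math


def moving_average_trend(x, period):
    """Centered moving average; None where the window does not fit."""
    n = len(x)
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        if period % 2:  # odd window: plain average
            trend[t] = sum(x[t - half:t + half + 1]) / period
        else:  # even window: half weights on the two end points
            inner = sum(x[t - half + 1:t + half])
            trend[t] = (0.5 * x[t - half] + inner + 0.5 * x[t + half]) / period
    return trend


def seasonal_component(x, period):
    """Average the detrended values at each phase of the period (zero-mean)."""
    trend = moving_average_trend(x, period)
    sums = [0.0] * period
    counts = [0] * period
    for t, (v, tr) in enumerate(zip(x, trend)):
        if tr is not None:
            sums[t % period] += v - tr
            counts[t % period] += 1
    means = [s / c for s, c in zip(sums, counts)]
    centre = sum(means) / period
    return [means[t % period] - centre for t in range(len(x))]


def processing_term(x, K, period):
    """Equation (8): X_t3 = K * S(phi, t), with phi given as `period`."""
    return [K * s for s in seasonal_component(x, period)]


# Toy series (hypothetical): linear trend plus a 12-sample cycle.
toy = [0.1 * t + math.sin(2 * math.pi * t / 12) for t in range(48)]
x_t3 = processing_term(toy, K=2.0, period=12)
print([round(v, 3) for v in x_t3[:6]])
```

The series must span at least two full periods, otherwise some phases have no detrended value to average. This matches the later constraint that the sampling period should not exceed half the training set length.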
3. Case Studies
3.1. Original Data
Measured 1-min average wind speed data from two mountainous areas were selected
as the objects of study, with high wind speed at site A and low wind speed at site B. In
addition, wind speed varies greatly at both locations, with a large amplitude between the
maximum and minimum wind speeds. The number of data points is 1500, of which 1000
are used as the training set $\{X_{t1}\}$ and the remaining 500 as the test set $\{X_{t4}\}$.
The data for sites A and B are plotted in Figure 3.

[Figure 3 image: two panels, (a) and (b), each plotting wind speed (m/s) against data number
(1 to 1500), divided into a training set and a test set.]
Figure 3. Modeling data visualization: (a) dataset division of site A; (b) dataset division of site B.

Because the original data cannot pass the stationarity test and the white noise test, a
first-order, twelve-step difference is used to remove the trend and period terms from the
original data. After this differencing, the p-values of the ADF test are 1.8 × 10^−18 at site A
and 2.0 × 10^−14 at site B, which are far below the significance level. Furthermore, both the
SARIMA model and the two methods proposed in this research can eliminate trend and
period terms from the data, so these three methods were used to conduct the wind
prediction for sites A and B to illustrate the superiority of the proposed methods.
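
The differencing step can be illustrated as follows. The sketch assumes that "first-order, twelve-step difference" means one ordinary (lag-1) difference followed by one lag-12 difference; that reading, the function names, and the toy series are assumptions.

```python
def diff(x, lag=1):
    """Lagged difference: y_t = x_t - x_{t-lag}."""
    return [x[t] - x[t - lag] for t in range(lag, len(x))]


def first_order_twelve_step(x):
    """One lag-1 difference followed by one lag-12 difference (assumed reading)."""
    return diff(diff(x, lag=1), lag=12)


# Toy series (hypothetical): linear trend plus a pattern repeating every 12 samples.
toy = [0.5 * t + (t % 12) for t in range(60)]
residual = first_order_twelve_step(toy)
print(len(residual), max(abs(v) for v in residual))
```

For this toy input the lag-1 difference removes the linear trend and the lag-12 difference removes the repeating pattern, leaving zeros. Real wind data would leave a residual series, which is what the ADF test is then applied to.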

3.2. Modeling
3.2.1. Data Pre-Processing
This research takes site B as an example and preprocesses the training set data of site B.
The processing functions are the constructed periodic function and the decomposition
function based on the statsmodels decomposition. For data pre-processing, the processing
function is subtracted from the original training set data, as in Equation (6). The two
processing functions are shown in Figures 4 and 5, respectively. The data for the input
model are plotted in Figure 6.

[Figure 4 image: constructed periodic function against data number (0 to 1000).]
Figure 4. Constructed periodic function.

[Figure 5 image: four panels against data number (0 to 1000): original data, trend term,
periodic term, and residual term.]
Figure 5. The results of the statsmodels decomposition.

[Figure 6 image: two panels plotting wind speed after treatment against data number.]
Figure 6. Processed data: (a) constructive function method; (b) time-series decomposition method.
3.2.2. Selection of Parameters
After preprocessing the data, an improved ARIMA model can be established as shown
in Figure 2. To further improve the accuracy of wind speed prediction, the parameters
of the processing function are optimized and analyzed. To evaluate the quality of the
prediction results, the corresponding indicators for its accuracy evaluation are introduced
as shown in Equations (9)–(12).
$$R^2 = 1 - \frac{\sum_i \left(\hat{X}(i) - X(i)\right)^2}{\sum_i \left(\bar{X} - X(i)\right)^2} \qquad (9)$$

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| X(i) - \hat{X}(i) \right| \qquad (10)$$

$$\mathrm{MAPE} = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{X(i) - \hat{X}(i)}{X(i)} \right| \qquad (11)$$

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( X(i) - \hat{X}(i) \right)^2 \qquad (12)$$

where $X(i)$ is the measured wind speed, $\hat{X}(i)$ is the predicted wind speed, and $\bar{X}$ stands
for the mean value of the measured wind speed. R² and MAPE are selected as the main
evaluation indicators. The parameters that affect the predicted wind speed are the
magnification coefficient K and the period ω of the constructed function in the
constructive function method, and the magnification coefficient K and the
sampling period φ in the time-series decomposition method.
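
Equations (9)–(12) translate directly into code. The sketch below is a plain-Python illustration; the function names and the sample values are hypothetical.

```python
def r2(actual, pred):
    """Equation (9): coefficient of determination."""
    mean = sum(actual) / len(actual)
    ss_res = sum((p - a) ** 2 for a, p in zip(actual, pred))
    ss_tot = sum((mean - a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def mae(actual, pred):
    """Equation (10): mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)


def mape(actual, pred):
    """Equation (11): mean absolute percentage error, as a fraction."""
    return sum(abs((a - p) / a) for a, p in zip(actual, pred)) / len(actual)


def mse(actual, pred):
    """Equation (12): mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual)


# Hypothetical measured and predicted wind speeds (m/s).
measured = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
for name, fn in (("R2", r2), ("MAE", mae), ("MAPE", mape), ("MSE", mse)):
    print(name, round(fn(measured, predicted), 4))
```

MAPE divides by the measured value, so it is undefined when a measured wind speed is exactly zero; such samples would need to be excluded or handled separately.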
Varying the parameters affects the prediction results, and observing the changes in the
evaluation indicators makes it possible to determine the optimal parameters. The results of
the parameter discussion are shown in Figure 7, where the parameters are sampled at
intervals and sampled more densely around the values that behave well. Figure 7 also shows
curves fitted to the sample points. Given the practical time cost of evaluating every
parameter value, a simple polynomial function is used to capture the trend as the
parameters vary.
It can be seen from Figure 7 that changing the parameters of the processing function
affects the accuracy of the prediction results. Moreover, the prediction results for sites A
and B follow the same regularity as the parameters change. To visualize this
regularity, the results are fitted by polynomial functions as shown in Figure 7. As the
parameters change, the following regularities are observed:
1. For the time-series decomposition method, it can be seen from the periodic term in
Figure 5 that the length of the repeated period of the decomposed subsequence is φ.
When the value of φ is 0, the length of the decomposition sequence under each period
is 0. When φ > 500, the number of cycles in the training set is less than 2. So, it is
reasonable to select φ in the range 0 < φ ≤ 500. From Figure 7a,b, as φ increases,
the overall trend of MAPE decreases, and the overall trend
of R² increases. The results show that the higher the value of φ, the better the series
obtained from the statsmodels decomposition extracts the periodicity of the
wind speed, which enhances the prediction results.
2. Parameter K in the time-series decomposition method amplifies the effect of the
decomposed subsequence on the original data. From Figure 7c,d, it can be seen that
different values of K give different predictions. When K is less than the optimal value,
R² increases and MAPE decreases as K increases. When K is greater than the optimal value,
the change regularity of R² and MAPE is the opposite. The dashed line in Figure 7c,d
indicates the location of the optimal value of K. Based on the results of the parameter
discussion, the optimal value of K is 0.5 times the maximum value of the
original wind speed divided by the maximum value of the processing function.
3. Figure 7e,f describe the changes in the indicators that evaluate the prediction results as
the parameter ω of the constructive function method changes. The change regularity of
the prediction indicators can be fitted by a quadratic polynomial. From the fitting results,
it can be easily seen that the extreme value of the function corresponds to the optimal
prediction result. When ω reaches the optimal value, R² is the largest and MAPE is the
smallest. The corresponding optimal ω values of A and B are both 26, which can provide
a reference for wind speed prediction in similar rugged mountainous areas.
4. Similar to K in the time-series decomposition method, K in the constructive function
method is also related to the wind speed of the original data. Figure 7g,h shows that
the model achieves the best prediction results, with the largest R² and the smallest
MAPE, when K = 5 at site A and K = 35 at site B. The optimal parameter K differs
between the two sites because of the different wind speeds. The small wind speed
at site A corresponds to the small K value, and site B is the opposite. The maximum
wind speed can provide a reference for the value of K.
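
Two of the rules above lend themselves to a short sketch: the rule of thumb for K in the time-series decomposition method (item 2), and locating the extremum of a quadratic trend in ω (item 3). The function names and all numbers are hypothetical. The vertex is computed from a parabola through three scan points, which is a simplification of the least-squares polynomial fit shown in Figure 7.

```python
def suggested_k(original, processing, factor=0.5):
    """Item 2: K = 0.5 * max(original wind speed) / max(processing function)."""
    return factor * max(original) / max(processing)


def parabola_vertex(p1, p2, p3):
    """x-position of the extremum of the parabola through three (x, y) points."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    num = (x2 - x1) ** 2 * (y2 - y3) - (x2 - x3) ** 2 * (y2 - y1)
    den = (x2 - x1) * (y2 - y3) - (x2 - x3) * (y2 - y1)
    return x2 - 0.5 * num / den


# Hypothetical inputs: wind speeds (m/s) and an unscaled seasonal term.
print(suggested_k([3.0, 12.0, 8.0], [-1.5, 2.0, 0.5]))

# Hypothetical MAPE values from a coarse scan over omega.
scan = [(20.0, 36.1), (25.0, 1.1), (30.0, 16.1)]
print(parabola_vertex(*scan))
```

A coarse scan followed by a fitted extremum keeps the number of model fits small, which is the practical motivation given for the polynomial fitting.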
[Figure 7 plots omitted: eight panels, (a) to (h), each showing MAPE, MSE, MAE, and R² against the varied parameter (ϕ, K, or ω), with a fitted curve and a 95% confidence band.]
Figure 7. Parameter discussion results: (a) ϕ, A, time-series decomposition method; (b) ϕ, B, time-series decomposition method; (c) K, A, time-series decomposition method; (d) K, B, time-series decomposition method; (e) ω, A, constructive function method; (f) ω, B, constructive function method; (g) K, A, constructive function method; (h) K, B, constructive function method.
Figure 7 shows that changing the parameters of the processing function affects the accuracy of the prediction results. Moreover, the prediction results for sites A and B follow the same regularity as the parameters change. To visualize this regularity, the results are fitted with polynomial functions, as shown in Figure 7. As the parameters change, the following regularities are observed:
1. For the time-series decomposition method, it can be seen from the periodic term in
the model can achieve the best-predicted results. The optimal parameter K differs between the two sites because their wind speeds differ. The low wind speed at site A corresponds to a small K value, and the opposite holds for site B. The maximum wind speed can therefore provide a reference for the value of K.
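
The polynomial fitting used to visualize these regularities can be sketched as follows. This is a minimal illustration, not the authors' code: the quadratic degree, the grid search for the minimum, and the sample points are all assumptions made for the example, and the names `fit_metric_trend` and `best_parameter` are hypothetical.

```python
import numpy as np


def fit_metric_trend(param_values, metric_values, degree=2):
    """Least-squares polynomial fit of an error metric against a parameter.

    The degree is an assumption; the text only says "a simple polynomial".
    """
    coeffs = np.polyfit(param_values, metric_values, degree)
    return np.poly1d(coeffs)


def best_parameter(trend, low, high, num=1001):
    """Return the grid point in [low, high] where the fitted metric is smallest."""
    grid = np.linspace(low, high, num)
    return float(grid[np.argmin(trend(grid))])


# Hypothetical sample points (NOT the paper's data): MAPE at several phi values.
phi = np.array([100.0, 200.0, 300.0, 400.0, 500.0, 600.0, 700.0])
mape = 0.10 + 1e-6 * (phi - 400.0) ** 2

trend = fit_metric_trend(phi, mape)
phi_best = best_parameter(trend, phi.min(), phi.max())
print(f"fitted minimum near phi = {phi_best:.1f}")
```

For a metric such as R², where larger is better, the same sketch applies with `np.argmax` in place of `np.argmin`.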

3.3. Results and Analysis

According to Section 3.2.1, the prediction results are affected by parameter changes, and the methods of selecting the optimal parameters have been discussed above. To describe the results obtained with the improved ARIMA model using the optimal parameters, the predicted results are plotted in Figures 8 and 9. Figures 8 and 9 show the prediction results of the three methods (the time-series decomposition method, the constructive function method, and SARIMA) for sites A and B, respectively. Moreover, Tables 1 and 2 give the evaluation indicators for the predicted results at sites A and B, respectively.

[Figure 8 plot omitted: wind speed (m/s) for the original data, the constructive function method, the time-series decomposition method, and SARIMA.]
Figure 8. Predicted results in site A.

[Figure 9 plot omitted: wind speed (m/s) for the original data, the constructive function method, the time-series decomposition method, and SARIMA.]
Figure 9. Predicted results in site B.

Table 1. Forecast evaluation indicators for site A.

| Model | MAPE | MSE | MAE | R² |
|---|---|---|---|---|
| SARIMA | 23.10% | 5.91 | 1.72 | 0.68 |
| Improved ARIMA by constructive function method | 9.28% | 0.84 | 0.72 | 0.95 |
| Improved ARIMA by time-series decomposition method | 9.22% | 0.99 | 0.76 | 0.94 |

Table 2. Forecast evaluation indicators for site B.

| Model | MAPE | MSE | MAE | R² |
|---|---|---|---|---|
| SARIMA | 17.02% | 0.09 | 0.21 | 0.94 |
| Improved ARIMA by constructive function method | 7.75% | 0.01 | 0.09 | 0.99 |
| Improved ARIMA by time-series decomposition method | 6.92% | 0.02 | 0.09 | 0.99 |

Based on the prediction results, the method proposed in this research is better able to obtain one-step-ahead predictions of short-time wind speed in rugged mountainous areas. However, as seen in Figure 7, if the parameters are not chosen appropriately, the proposed method's prediction results are inferior to those of the SARIMA model, which shows the importance of parameter selection. If appropriate parameters are selected, the predicted results for sites A and B are much better than those of the SARIMA model. Moreover, the proposed method is easy to operate, stable, and fast, and it places no special requirements on the configuration of the operating equipment.
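
For reference, the four evaluation indicators reported in the tables can be computed as in the sketch below. It assumes the standard textbook definitions of MAPE, MSE, MAE, and R²; the exact formulas used in the paper (for example, whether MAPE is scaled to a percentage) may differ, and the sample values are made up for illustration.

```python
def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (multiply by 100 for %).

    Undefined when any actual value is zero, which can occur for calm wind.
    """
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up wind speeds in m/s, for illustration only.
observed = [10.0, 12.0, 8.0, 10.0]
forecast = [9.0, 12.0, 10.0, 11.0]

print(mape(observed, forecast), mse(observed, forecast),
      mae(observed, forecast), r2(observed, forecast))
```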
However, the prediction results in Figures 8 and 9 are based on the optimal parameters found in the discussion above. It is therefore necessary to validate the methods for determining the parameters obtained in Section 3.2.1 in order to verify the applicability of the improved ARIMA model. A set of wind speeds in rugged mountainous areas was arbitrarily selected to build the model, and the selected dataset was divided into a training set and a testing set. Then, the optimal parameters for the training set were selected according to the regularity identified in Section 3.2.1. The predicted results of the improved ARIMA with the optimal parameters are plotted in Figure 10. The evaluation indicators are shown in Table 3.
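
The validation protocol described above (a chronological split followed by one-step predictions on the testing set) can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the 80/20 split ratio is assumed, the history is assumed to be updated with each observed value after it has been predicted, and a naive persistence forecaster stands in for the improved ARIMA model. All function names are hypothetical.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a time series into training and testing parts without shuffling."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def walk_forward(train, test, forecast_one_step):
    """Produce one-step-ahead predictions over the testing set.

    Each true value is appended to the history only after it has been
    predicted, so no future information leaks into a forecast.
    """
    history = list(train)
    predictions = []
    for observed in test:
        predictions.append(forecast_one_step(history))
        history.append(observed)
    return predictions


def persistence(history):
    """Placeholder forecaster: the next value equals the last observed value."""
    return history[-1]


# Made-up wind speeds in m/s, for illustration only.
series = [4.0, 5.0, 7.0, 9.0, 12.0, 10.0, 8.0, 6.0, 5.0, 4.0]
train, test = chronological_split(series, 0.8)
preds = walk_forward(train, test, persistence)
print(test, preds)
```

Any model exposing a "history in, next value out" function can replace `persistence` here, which keeps the comparison between forecasters on identical data.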

[Figure 10 plot omitted: wind speed (m/s) for the original data, the constructive function method, the time-series decomposition method, and SARIMA.]
Figure 10. Predicted results of validation data.
Table 3. Forecast evaluation indicators of validation data.

| Model | MAPE | MSE | MAE | R² |
|---|---|---|---|---|
| SARIMA | 17.02% | 0.09 | 0.21 | 0.94 |
| Improved ARIMA by constructive function method | 7.75% | 0.01 | 0.09 | 0.99 |
| Improved ARIMA by time-series decomposition method | 6.92% | 0.02 | 0.09 | 0.99 |

The results of the validation show that the improved ARIMA with the optimal parameters determined by the proposed method still performs better than the SARIMA model. This demonstrates the applicability of the proposed method to wind speed prediction in mountainous areas.

4. Conclusions

This research proposed two methods to improve the quality of the input data for the ARIMA model, namely the constructive function method and the time-series decomposition method. Based on the wind speed histories of two different sites, A and B, the parameters of the proposed methods were discussed. The preprocessed data were input into the ARIMA model to predict the future wind speed. Furthermore, to verify the reliability of the improved ARIMA model, another wind speed dataset from the mountain area was used for prediction. The following conclusions were obtained.
1. This research proposes two different methods (i.e., the constructive function method and the time-series decomposition method) for pre-processing the data input to the ARIMA model, which improves the quality of the wind speed input to the ARIMA model.
2. To address the problem of parameter uncertainty in the proposed methods, this research discusses parameter optimization based on the wind speeds of two typical mountain wind fields. The discussion yields regularities for determining the relevant parameters of the proposed methods, and these regularities are validated using another dataset of mountain wind fields.
3. By combining ARIMA with the proposed methods (i.e., the constructive function method and the time-series decomposition method), the improved ARIMA model is used for prediction in rugged mountainous areas. The results show that the proposed model performs well and is easy to operate, and the evaluation indicates that the improved ARIMA model outperforms the SARIMA model. The improved ARIMA model can provide a new method for prediction in mountainous areas.

Author Contributions: Conceptualization, Y.D. and M.Z.; methodology, Z.Z.; software, J.X.; validation, Z.Z., J.X. and M.L.; formal analysis, J.Z.; investigation, M.Z.; resources, M.Z.; data curation, M.L.; writing—original draft preparation, Y.D.; writing—review and editing, M.Z.; visualization, Z.Z.; supervision, Z.Z.; project administration, J.X.; funding acquisition, Z.Z. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The data presented in this study are available on request from the
corresponding author. The data are not publicly available due to Unit-related requirements.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Zhang, J.; Zhang, M.; Jiang, X.; Wu, L.; Qin, J.; Li, Y. Pair-Copula-Based Trivariate Joint Probability Model of Wind Speed, Wind Direction and Angle of Attack. J. Wind. Eng. Ind. Aerodyn. 2022, 225, 105010.
2. Li, Y.; Jiang, F.; Zhang, M.; Dai, Y.; Qin, J.; Zhang, J. Observations of Periodic Thermally-Developed Winds beside a Bridge Region in Mountain Terrain Based on Field Measurement. J. Wind. Eng. Ind. Aerodyn. 2022, 225, 104996.
3. Cui, P.; Ge, Y.; Li, S.; Li, Z.; Xu, X.; Zhou, G.G.D.; Chen, H.; Wang, H.; Lei, Y.; Zhou, L.; et al. Scientific Challenges in Disaster Risk Reduction for the Sichuan–Tibet Railway. Eng. Geol. 2022, 309, 106837.
4. Gou, H.; Chen, X.; Bao, Y. A Wind Hazard Warning System for Safe and Efficient Operation of High-Speed Trains. Autom. Constr. 2021, 132, 103952.
5. Wang, J.; Li, J.; Wang, F.; Hong, G.; Xing, S. Research on Wind Field Characteristics Measured by Lidar in a U-Shaped Valley at a Bridge Site. Appl. Sci. 2021, 11, 9645.
6. Xu, Y.L.; Chen, J. Characterizing Nonstationary Wind Speed Using Empirical Mode Decomposition. J. Struct. Eng. 2004, 130, 912–920.
7. Zhao, J.; Guo, Y.; Xiao, X.; Wang, J.; Chi, D.; Guo, Z. Multi-Step Wind Speed and Power Forecasts Based on a WRF Simulation and an Optimized Association Method. Appl. Energy 2017, 197, 183–202.
8. Li, L.; Yin, X.-L.; Jia, X.-C.; Sobhani, B. Day Ahead Powerful Probabilistic Wind Power Forecast Using Combined Intelligent Structure and Fuzzy Clustering Algorithm. Energy 2020, 192, 116498.
9. Jung, S.; Kwon, S.-D. Weighted Error Functions in Artificial Neural Networks for Improved Wind Energy Potential Estimation. Appl. Energy 2013, 111, 778–790.
10. Han, S.; Yang, Y.; Liu, Y. The Comparison of BP Network and RBF Network in Wind Power Prediction Application. In Proceedings of the 2007 Second International Conference on Bio-Inspired Computing: Theories and Applications, Zhengzhou, China, 14–17 September 2007; IEEE: New York, NY, USA; pp. 173–176.
11. Wang, J.; Zhang, W.; Li, Y.; Wang, J.; Dang, Z. Forecasting Wind Speed Using Empirical Mode Decomposition and Elman Neural Network. Appl. Soft Comput. 2014, 23, 452–459.
12. Shahid, F.; Zameer, A.; Muneeb, M. A Novel Genetic LSTM Model for Wind Power Forecast. Energy 2021, 223, 120069.
13. Kang, A.; Tan, Q.; Yuan, X.; Lei, X.; Yuan, Y. Short-Term Wind Speed Prediction Using EEMD-LSSVM Model. Adv. Meteorol. 2017, 2017, 6856139.
14. Altan, A.; Karasu, S.; Zio, E. A New Hybrid Model for Wind Speed Forecasting Combining Long Short-Term Memory Neural Network, Decomposition Methods and Grey Wolf Optimizer. Appl. Soft Comput. 2021, 100, 106996.
15. Fu, W.; Zhang, K.; Wang, K.; Wen, B.; Fang, P.; Zou, F. A Hybrid Approach for Multi-Step Wind Speed Forecasting Based on Two-Layer Decomposition, Improved Hybrid DE-HHO Optimization and KELM. Renew. Energy 2021, 164, 211–229.
16. Cadenas, E.; Rivera, W. Wind Speed Forecasting in the South Coast of Oaxaca, México. Renew. Energy 2007, 32, 2116–2128.
17. Hasan, M.K.; Hossain, N.M.; Naylor, P.A. Autocorrelation Model-Based Identification Method for ARMA Systems in Noise. IEE Proc. Vis. Image Process. 2005, 152, 520.
18. Zerubia, J.; Alengrin, G. Estimation of ARMA(p,q) Parameters. Signal Process. 1991, 22, 53–60.
19. Tam, W.-K.; Reinsel, G.C. Tests for Seasonal Moving Average Unit Root in ARIMA Models. J. Am. Stat. Assoc. 1997, 92, 725–738.
20. Zhang, W.; Lin, Z.; Liu, X. Short-Term Offshore Wind Power Forecasting—A Hybrid Model Based on Discrete Wavelet Transform (DWT), Seasonal Autoregressive Integrated Moving Average (SARIMA), and Deep-Learning-Based Long Short-Term Memory (LSTM). Renew. Energy 2022, 185, 611–628.
21. Liu, X.; Lin, Z.; Feng, Z. Short-Term Offshore Wind Speed Forecast by Seasonal ARIMA—A Comparison against GRU and LSTM. Energy 2021, 227, 120492.
22. Nankervis, J.C.; Savin, N.E. Testing for Uncorrelated Errors in ARMA Models: Non-standard Andrews-Ploberger Tests. Econom. J. 2012, 15, 516–534.
23. Lovrić, M. Molekulsko modeliranje odnosa strukturnih svojstava i aktivnosti molekula s pomoću programskog jezika Python (prvi dio). Kem. Ind. 2018, 67, 409–419.
24. Persson, I.; Khojasteh, J. Python Packages for Exploratory Factor Analysis. Struct. Equ. Model. A Multidiscip. J. 2021, 28, 983–988.
