
2022 Tenth International Conference on Advanced Cloud and Big Data (CBD)

Research on Cloud Computing load forecasting based on LSTM-ARIMA combined model
2022 Tenth International Conference on Advanced Cloud and Big Data (CBD) | 979-8-3503-0971-3/22/$31.00 ©2022 IEEE | DOI: 10.1109/CBD58033.2022.00013

Xiaomin Liu, Xiaolan Xie*, Qiang Guo
Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin University of Technology, Guilin, China
1378167510@qq.com, xie_xiao_lan@foxmail.com, guoqiang121@163.com

Abstract—With the continuous development of cloud computing technology, cloud computing resource load changes in increasingly complex ways, and efficient load prediction has become a key technology for solving the imbalance of cloud computing resource utilization. To address the low prediction performance of current load prediction models, a combined prediction model, LSAR, based on the long short-term memory network (LSTM) and the autoregressive integrated moving average model (ARIMA) is proposed, taking both prediction accuracy and prediction time into account. The model is compared with the traditional load prediction models ARIMA and LSTM in experiments on an open data set. The experimental results show that the prediction accuracy of the combined model is significantly higher than that of the other prediction models, and that the real-time prediction error of resource load in the cloud environment is significantly reduced.

Keywords—cloud computing, load forecasting, combined forecasting model, long short-term memory network (LSTM)

The work was supported by The National Natural Science Foundation of China (No. 62262011) and The Natural Science Foundation of Guangxi (No. 2021JJA170130).

I. INTRODUCTION

With the advent of the cloud computing era, more and more companies have migrated their applications to cloud computing platforms. Cloud computing uses virtualization technology to integrate discrete resources into resource pools and meets users' elastic demand for computing resources in an on-demand manner. The essence of cloud computing is to provide services on demand, realizing automatic allocation and switching of application resources on the cloud platform. To achieve dynamic scaling of computing power, load balancing technology is required. Cloud computing platforms use load balancing technology to increase processing capacity, which strengthens data processing capabilities. Load is a key indicator required to achieve load balancing. Accurately predicting application load allows application capacity to be expanded at traffic peaks and shrunk when traffic falls, thereby saving resource costs, reducing manual intervention, and improving the reliability and performance of cloud computing platforms.

At present, load forecasting models can be mainly divided into two categories: single forecasting models and combined forecasting models based on ensemble learning. Yan et al. [1] used the autoregressive integrated moving average (ARIMA) model to predict the resource consumption of a software system, comparing the prediction performance of ARIMA, the support vector machine and the artificial neural network in different scenarios. Calheiros et al. [2] designed a cloud platform workload prediction module based on the ARIMA algorithm, which proactively configures dynamic resources based on the model's predictions and can ensure the quality of service of users' applications with as little resource cost as possible. Shetty et al. [3] used multi-model fusion to predict the load of cloud computing resources. The experiment divided the load data set into a training set and a test set, and used exponential smoothing, ARIMA, neural network and other prediction models to calculate the mean square error on the training set. Based on this error, weights were assigned to the prediction results obtained on the test set, and finally the combined prediction value was obtained. Prassanna et al. [4] used the Holt-Winters algorithm to predict the workload of the cloud environment. This algorithm fits the trend curve of the historical time series by establishing a mathematical model, but its parameters are difficult to determine: they are set by experience, so the final accuracy of the model is greatly affected by them. At the same time, it is difficult for the model to learn the complex patterns inside the time series data. Zhang et al. [5] used a recurrent neural network (RNN) to predict cloud workloads, and verified the accuracy of the method through experiments on the Google Cloud Trace dataset. Babu et al. [6] adopted a combined ARIMA-ANN forecasting model that performs linear and nonlinear forecasting on time series data separately and then combines the results. The experimental results show that the hybrid model has higher prediction accuracy.

At present, various services run on cloud computing nodes, and their resource requirements differ. The load time series data therefore usually combines linear and nonlinear characteristics during operation. Most existing prediction methods either use a single prediction model or integrate the prediction results of multiple different prediction models by weighting to obtain the final prediction value. Both approaches improve prediction accuracy to different degrees, but neither fundamentally solves the problem of a poor model degrading the final result. Therefore, this paper proposes a new prediction model, LSAR, by combining LSTM [7] and ARIMA [8]. Firstly, the LSTM prediction model and the ARIMA prediction model are used to predict the future workload. Secondly, the objective weighting method is used to combine the prediction results of the two models into the combined prediction value.

II. PROBLEM DESCRIPTION

This paper mainly focuses on the load prediction problem of host resources in the cloud environment. Assume that the original resource sequence of cloud computing is $L = \{l_1, l_2, \cdots, l_t\}$, where $l_t$ is the load of the cluster at time $t$. The data at $m$ times are selected from $L$ as the input vector of the prediction model, $M = (l_{t-m+1}, \cdots, l_{t-1}, l_t)$. Load prediction then means predicting the cluster load at time $t+1$ from the input vector $M$.

The workload in the cloud environment has a certain dependency: the load at each moment is closely related to the previous load, and the closer a historical value is to the current time $t$, the closer the relationship. At the same time, deployments in the cloud computing environment are becoming more and more complex, and the load time series is neither purely linear nor purely nonlinear, but generally combines the two structures. Although a single forecasting model is faster than a combined forecasting model, its forecasting accuracy is significantly lower, and it has certain limitations in capturing complex load time series patterns; the prediction performance of a combined model is also more stable. Moreover, some poor single prediction models will affect the final results. This paper proposes the idea of error correction on the basis of combined prediction, which further improves the accuracy of load prediction.

III. LOAD PREDICTION MODEL

3.1 Load Model

The load describes the working state of the server and reflects the pressure of its current task processing. This paper mainly considers the impact of two factors, memory usage and CPU usage, on the cluster load. By adopting a dynamic load prediction model, the influence of these two factors on the load under different resource requirements is fully considered. The load calculation in a single node is shown in equations (1) and (2):

$$L = W_m L_m + W_{cpu} L_{cpu} \tag{1}$$

$$W_m + W_{cpu} = 1 \tag{2}$$

where $L$ is the calculated load of a virtual host in the computer cluster, $L_m$ and $L_{cpu}$ are the memory usage and CPU usage, and $W_m$ and $W_{cpu}$ are the weight coefficients of memory usage and CPU usage, respectively.

3.2 ARIMA prediction model

ARIMA can be used to forecast time series and is often used in demand forecasting. The ARIMA model is a comprehensive model that includes both the autoregressive (AR) model and the moving average (MA) model of time series analysis. It has three parameters: $d$ is the difference order, i.e. the number of differencing operations needed to obtain a stationary time series; $p$ is the number of autoregressive terms; $q$ is the number of moving average terms. The basic formula of ARIMA is shown in formula (3):

$$L_t = \sum_{i=1}^{p} \varphi_i L_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} \tag{3}$$

where $L_t$ is the cluster load value at time $t$; $\varphi_i$ and $\theta_j$ are the autoregressive (AR) and moving average (MA) parameters, respectively; $p$ is the order of the autoregressive model; $q$ is the order of the moving average model; and $\varepsilon_t$ is the residual sequence.

3.3 LSTM prediction model

Long Short-Term Memory networks (LSTMs) are variants of Recurrent Neural Networks (RNNs). RNNs are prone to exploding or vanishing gradients. LSTM introduces a new internal state $c_t$ for linear recurrent information transmission, and introduces a gate mechanism to control the path of information transmission, namely the input gate $i_t$, the forget gate $f_t$ and the output gate $o_t$. The three gates, the cell memory state and the hidden state are calculated as follows:

$$i_t = \sigma(\omega_i x_t + u_i h_{t-1} + b_i) \tag{4}$$

$$f_t = \sigma(\omega_f x_t + u_f h_{t-1} + b_f) \tag{5}$$

$$o_t = \sigma(\omega_o x_t + u_o h_{t-1} + b_o) \tag{6}$$

$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(\omega_c x_t + u_c h_{t-1} + b_c) \tag{7}$$

$$h_t = o_t \odot \tanh(c_t) \tag{8}$$

where $\sigma(\cdot)$ is the logistic function, whose output interval is (0,1); $\odot$ is the element-wise product of vectors; $\tanh(\cdot)$ is the activation function; $x_t$ is the input at the current moment; $h_{t-1}$ and $h_t$ are the external states at the previous moment and the current moment, respectively; and $\omega_*$, $u_*$ and $b_*$ are the learned network parameters.

The calculation process of the LSTM recurrent unit is as follows. Using the external state $h_{t-1}$ at the previous moment and the input $x_t$ at the current moment, the three gates and the candidate state are calculated according to formulas (4)~(7). The memory cell $c_t$ is then updated by combining the forget gate $f_t$ and the input gate $i_t$. Finally, combined with the output gate $o_t$, the information of the internal state is passed to the external state $h_t$.
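The calculation process of the LSTM recurrent unit in Section 3.3 can be illustrated with a minimal NumPy sketch of one time step. It is not the authors' implementation: the function names, the `params` dictionary layout, the hidden size of 4 and the random weights are all assumptions made for illustration, and no training is shown.

```python
import numpy as np

def sigmoid(z):
    """Logistic function; its output lies in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step following equations (4)-(8).

    params maps each of "i", "f", "o", "c" to a tuple (w, u, b);
    this layout is a hypothetical choice for the sketch.
    """
    def affine(name):
        w, u, b = params[name]
        return w @ x_t + u @ h_prev + b

    i_t = sigmoid(affine("i"))           # input gate, eq. (4)
    f_t = sigmoid(affine("f"))           # forget gate, eq. (5)
    o_t = sigmoid(affine("o"))           # output gate, eq. (6)
    c_tilde = np.tanh(affine("c"))       # candidate state inside eq. (7)
    c_t = f_t * c_prev + i_t * c_tilde   # cell state update, eq. (7)
    h_t = o_t * np.tanh(c_t)             # external state, eq. (8)
    return h_t, c_t

# Assumed sizes and random (untrained) parameters, for illustration only.
rng = np.random.default_rng(0)
n_in, n_hidden = 1, 4
params = {
    name: (rng.normal(size=(n_hidden, n_in)),
           rng.normal(size=(n_hidden, n_hidden)),
           np.zeros(n_hidden))
    for name in "ifoc"
}

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for load in [0.42, 0.45, 0.40]:          # made-up load values
    h, c = lstm_step(np.array([load]), h, c, params)
```

In a real predictor the final `h` would be fed to an output layer that produces the load at time t+1; that layer and the training loop are omitted here.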

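As a worked illustration of formula (3) in Section 3.2, the sketch below computes a one-step forecast from given AR and MA coefficients, with the unknown future residual set to zero. The function name, the coefficients and the sample numbers are hypothetical, the plus sign in front of the moving average sum is the convention assumed here, and parameter estimation is not shown.

```python
import numpy as np

def arma_one_step(history, residuals, phi, theta):
    """One-step forecast of formula (3) with the future residual taken as 0.

    history, residuals: past values and past residuals, oldest first.
    phi, theta: AR and MA coefficients (phi[0] multiplies the latest value).
    """
    history = np.asarray(history, dtype=float)
    residuals = np.asarray(residuals, dtype=float)
    p, q = len(phi), len(theta)
    ar_part = np.dot(phi, history[::-1][:p])      # sum of phi_i * L_{t-i}
    ma_part = np.dot(theta, residuals[::-1][:q])  # sum of theta_j * eps_{t-j}
    return float(ar_part + ma_part)

# The difference order d: the model is applied to the d-times differenced
# series, and the forecast is added back to the last observed value (d = 1).
series = np.array([10.0, 12.0, 15.0, 19.0])       # made-up load values
diffs = np.diff(series, n=1)                      # first differences: 2, 3, 4
diff_forecast = arma_one_step(diffs, np.zeros_like(diffs), phi=[0.5], theta=[])
forecast = series[-1] + diff_forecast
```

With p = 1, q = 0 and d = 1 as above, the forecast is the last value plus half of the last difference.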

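The load definition of equations (1) and (2) and the input vector M of Section II can be sketched together as follows. The helper names are hypothetical; the default weights 0.6 (memory) and 0.4 (CPU) are the values reported in Section 4.2, and the sample usage numbers are made up.

```python
import numpy as np

def combined_load(l_mem, l_cpu, w_mem=0.6, w_cpu=0.4):
    """Weighted load of equation (1); the weights must satisfy equation (2)."""
    if not np.isclose(w_mem + w_cpu, 1.0):
        raise ValueError("weights must sum to 1 (equation (2))")
    return w_mem * np.asarray(l_mem, dtype=float) + w_cpu * np.asarray(l_cpu, dtype=float)

def make_windows(load, m):
    """Build input vectors of m consecutive loads and the next load as target."""
    load = np.asarray(load, dtype=float)
    inputs = np.array([load[i:i + m] for i in range(len(load) - m)])
    targets = load[m:]
    return inputs, targets

mem = [50.0, 55.0, 60.0, 58.0, 62.0]   # made-up memory usage values
cpu = [30.0, 35.0, 45.0, 40.0, 50.0]   # made-up CPU usage values
load = combined_load(mem, cpu)
inputs, targets = make_windows(load, m=3)
```

Each row of `inputs` plays the role of M, and the matching entry of `targets` is the load at the following time step that the model is asked to predict.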
3.4 Evaluation Criteria for Prediction Models

This paper uses mean absolute error (MAE), mean absolute percentage error (MAPE), mean square error (MSE), root mean square error (RMSE) and coefficient of determination ($R^2$) to evaluate the prediction results. They are calculated as shown in formulas (9)~(13):

$$MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| \tag{9}$$

$$MAPE = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \tag{10}$$

$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \tag{11}$$

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \tag{12}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{13}$$

where $n$ is the number of load predictions, $y_i$ is the actual load value, $\hat{y}_i$ is the predicted load value, and $\bar{y}$ is the average load.

[Figure 1 flow: time series data; ARIMA load forecasting model and LSTM load prediction model; predicted load values; CRITIC data fusion; combined predicted value]

Figure 1 Combined forecasting model
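The evaluation criteria of formulas (9) to (13) in Section 3.4 translate directly into NumPy. The sketch below is one possible way to compute them; the function name and the sample values are made up for illustration.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Return MAE, MAPE, MSE, RMSE and R^2 as defined in formulas (9)-(13)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = np.mean(err ** 2)
    return {
        "MAE": np.mean(np.abs(err)),                # formula (9)
        "MAPE": np.mean(np.abs(err / y_true)),      # formula (10), as a fraction
        "MSE": mse,                                 # formula (11)
        "RMSE": np.sqrt(mse),                       # formula (12)
        "R2": 1.0 - np.sum(err ** 2)
              / np.sum((y_true - y_true.mean()) ** 2),  # formula (13)
    }

scores = evaluate([2.0, 4.0, 6.0, 8.0], [1.0, 5.0, 6.0, 9.0])
```

MAPE divides by the actual load, so it is undefined whenever an actual value is zero; such points would need to be filtered or handled separately.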
3.5 Combined prediction model based on ensemble learning

Considering that the ARIMA forecasting model is not accurate enough when fitting nonlinear time series, and that a simple neural network does not handle both linear and nonlinear structure well, this paper proposes a combined forecasting model based on LSTM and ARIMA. The specific model is shown in Figure 1.

The forecasting steps based on the combined forecasting model are as follows:

1) Time series load data acquisition. According to equations (1) to (3), and considering that a cluster service in actual production takes 3 to 5 minutes from creation to deployment, a cloud computing cluster load time series data set $L_T = \{l_1, l_2, \cdots, l_n\}$ with a time interval of 5 minutes and a length of $n$ is collected.

2) $L_T$ is used to train the ARIMA prediction model and the LSTM load prediction model, namely equations (4)~(8). The ARIMA predicted load $L_A^P = \{l_1^{PA}, \cdots, l_m^{PA}\}$ and the LSTM predicted load $L_L^P = \{l_1^{PL}, \cdots, l_m^{PL}\}$ at $m$ moments in the future are thereby obtained. In addition, $L_A^T = \{l_1^{TA}, l_2^{TA}, \cdots, l_m^{TA}\}$ and $L_L^T = \{l_1^{TL}, l_2^{TL}, \cdots, l_m^{TL}\}$ are obtained by applying the two trained prediction models to $L_T$.

3) Obtain the combined predicted value using the CRITIC objective weight method. Firstly, the mean absolute percentage error (MAPE), mean absolute error (MAE), mean square error (MSE), root mean square error (RMSE), standard deviation (SE) and relative standard deviation (RSD) of ARIMA and LSTM are calculated from $L_A^T$, $L_L^T$ and $L_T$.
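The text does not spell out the rest of the CRITIC step, so the following is only a minimal sketch of the general CRITIC weighting of Diakoulaki et al. [9] followed by a weighted sum of the two forecasts. How the paper builds its decision matrix is not stated; treating each model's predictions on held-out data as one criterion column is purely an assumption made for this example, and the function names and numbers are hypothetical.

```python
import numpy as np

def critic_weights(matrix):
    """CRITIC weights: one row per alternative, one column per criterion."""
    x = np.asarray(matrix, dtype=float)
    span = x.max(axis=0) - x.min(axis=0)
    if np.any(span == 0):
        raise ValueError("every column needs at least two distinct values")
    z = (x - x.min(axis=0)) / span              # min-max normalisation
    sigma = z.std(axis=0)                       # contrast intensity
    corr = np.corrcoef(z, rowvar=False)         # correlation between columns
    info = sigma * (1.0 - corr).sum(axis=0)     # information content C_j
    return info / info.sum()                    # weights sum to 1

def combine_forecasts(pred_arima, pred_lstm, weights):
    """Weighted sum of the ARIMA and LSTM forecasts."""
    pred_arima = np.asarray(pred_arima, dtype=float)
    pred_lstm = np.asarray(pred_lstm, dtype=float)
    return weights[0] * pred_arima + weights[1] * pred_lstm

# Made-up predictions of the two models on held-out data (assumed input).
arima_val = np.array([10.0, 12.0, 15.0, 11.0, 14.0])
lstm_val = np.array([11.0, 12.0, 13.0, 12.0, 15.0])
weights = critic_weights(np.column_stack([arima_val, lstm_val]))

# Made-up future forecasts of the two models.
combined = combine_forecasts([13.0, 16.0], [14.0, 15.0], weights)
```

Because the weights are positive and sum to 1, each combined value lies between the two individual forecasts.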

IV. EXPERIMENTAL RESULTS AND ANALYSIS

4.1 Experimental data

In order to verify the validity of the cloud computing resource load prediction model proposed in this paper, the public data set provided by The Grid Workloads Archive website is used for experimental verification. The load time series data provided by Materna is used; the data set records a grid workload every 5 minutes.

4.2 Analysis of experimental results

In order to test and evaluate the performance of the load prediction model (LSAR) proposed in this paper, the original test data (TEST) is compared with the predictions of the single models ARIMA and LSTM and of the combined model LSTM-ARIMA (LSAR).

The actual workload values in November 2010 were selected from the data set. By analyzing the load data, the weights of CPU utilization and memory utilization were set to 0.4 and 0.6 respectively. One data point was obtained every 5 minutes, totaling 18,800 data points. The first 70% of the data is selected as the training set, 20% of the data is used as the single-step test set, and the remaining data is used to verify the generalization ability of the model. The results before and after model load prediction are shown in Figure 2, and Table I gives the prediction error indicators of each model.

TABLE I. COMPARISON OF PREDICTION PERFORMANCE

Prediction    Evaluation standard
model         MAE       MAPE      MSE       RMSE      R^2
ARIMA         2.3716    0.0955    9.8971    3.146     0.6088
LSTM          1.6186    0.0672    5.2031    2.2811    0.7644
LSAR          0.9626    0.0381    2.4974    1.5804    0.9607

From the figure, all models and the original sequence are basically consistent in trend. However, in terms of prediction accuracy, the single load prediction models ARIMA and LSTM are clearly inferior to the combined prediction model LSAR. Compared with the traditional load prediction models, the LSAR model proposed in this paper greatly improves accuracy: in Table I its MAE of 0.9626 and RMSE of 1.5804 are lower than those of ARIMA (2.3716 and 3.146) and LSTM (1.6186 and 2.2811), and its R^2 of 0.9607 is the highest. At the same time, the experimental results show that the error is further corrected on the basis of the combined prediction model, which further ensures the accuracy of the predictions.

The load prediction of cloud computing resources mainly refers to predicting the changes in the load of cloud computing resources over a period of time in the future. The prediction results show that the model proposed in this paper can not only accurately predict the cloud computing resource load, but also has a certain generalization ability.
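The data partition described in Section 4.2 can be sketched as a chronological split. The text gives only the proportions, so the assumption here is that the 20% single-step test set directly follows the training set and the remainder comes last; the function name and the stand-in series are hypothetical.

```python
def chronological_split(series, train_frac=0.7, test_frac=0.2):
    """Split a time-ordered sequence into train, test and hold-out parts.

    No shuffling is done, so every test point lies after the training data.
    """
    n = len(series)
    n_train = round(n * train_frac)
    n_test = round(n * test_frac)
    train = series[:n_train]
    test = series[n_train:n_train + n_test]
    holdout = series[n_train + n_test:]     # used to check generalization
    return train, test, holdout

# Stand-in for the 18,800 load samples mentioned in Section 4.2.
samples = list(range(18800))
train, test, holdout = chronological_split(samples)
```

Keeping the split chronological avoids leaking future load values into training, which a random split of a time series would do.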

V. CONCLUSION

In order to deal with complex and changing cloud computing load time series data, this paper considers the impact of both CPU and memory on the load, and proposes a combined prediction model, LSAR, based on the long short-term memory network (LSTM) and the autoregressive integrated moving average model (ARIMA). Firstly, the ARIMA and LSTM prediction models are used for prediction, and then the objective weighting method is used to combine their prediction results into the final prediction. Results on a public data set show that the proposed model has better prediction accuracy and generalization ability, can more accurately predict the changing trend of cloud computing center load, and can effectively improve the network resource utilization of the cloud computing center.
Figure 2 Experimental comparison between different models

REFERENCES

[1] Yan Y Q, Guo P. Predicting resource consumption in a Web server using ARIMA model [J]. Journal of Beijing Institute of Technology, 2014, 23(4): 502-510.
[2] Calheiros R N, Masoumi E, Ranjan R, et al. Workload prediction using ARIMA model and its impact on cloud applications' QoS [J]. IEEE Transactions on Cloud Computing, 2014, 3(4): 449-458.
[3] Shetty J, Shobha G. An ensemble of automatic algorithms for forecasting resource utilization in cloud [C] // Proc of 2016 Future Technologies Conference (FTC), 2016: 301-306.
[4] Prassanna J, Venkataraman N. Adaptive regressive Holt-Winters workload prediction and firefly optimized lottery scheduling for load balancing in cloud [J]. Wireless Networks, 2019: 1-19.
[5] Zhang W, Li B, Zhao D, et al. Workload prediction for cloud cluster using a recurrent neural network [C] // Proc of 2016 International Conference on Identification, Information and Knowledge in the Internet of Things (IIKI), 2016: 104-109.
[6] Babu C N, Reddy B E. A moving average filter based hybrid ARIMA-ANN model for forecasting time series data [J]. Applied Soft Computing, 2014, 23: 27-38.
[7] Sudhakar C, Kumar A R, Siddartha N, et al. Workload prediction using ARIMA statistical model and long short term memory recurrent neural networks [C] // Proc of 2018 International Conference on Computing, Power and Communication Technologies (GUCON), 2018: 600-604.
[8] Nazaripouya H, Wang Y B, Chu C C, et al. Univariate time series prediction of solar power using a hybrid wavelet-ARMA-NARX prediction method [C] // 2016 IEEE/PES Transmission and Distribution Conference and Exposition (T&D). IEEE, 2016. DOI: 10.1109/TDC.2016.7519959.
[9] Diakoulaki D, Mavrotas G, Papayannakis L. Determining objective weights in multiple criteria problems: The CRITIC method [J]. Computers & Operations Research, 1995, 22(7): 763-770.
[10] Yang Teng-fei, Shi Kun, Wang Qrsheng. Two objective weighting methods and their application in determining combined prediction weights [J]. Surveying and Mapping Engineering, 2014, 23(7): 59-61. (in Chinese)
