ShaunBorg Placement Report

Optimizing Cost by Forecasting Server Utilization through Time Series Analysis
Shaun Borg
shaun.borg.22@um.edu.mt
Department of Artificial Intelligence
University of Malta
Malta
ABSTRACT
By predicting server utilisation, IT admins can take measures in advance to optimise usage and reduce costs. This research aims to identify potential improvements for a user’s RAS (Remote Application Server) configuration setup by predicting the utilisation of servers in a RAS farm using VARMAX (Vector Autoregressive Moving Average with Exogenous Variables) and LSTM (Long Short-Term Memory). The proposed models used historical server data to predict the server’s resources, such as CPU, RAM, and Disk Read/Write. Using that predicted data, the total usage per hour was calculated and displayed as a 7-day hourly heatmap table that highlights the different levels of predicted server utilisation. The proposed models were trained using optimal parameters and evaluated on different forecast horizons using metrics such as Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). The results indicated that the VARMAX model performed better at short-term predictions, whilst LSTM performed better at long-term predictions. The results obtained in this research showed that, using historical server data, we can successfully predict the server’s resource utilisation for a 7-day forecast.

KEYWORDS
Time Series Analysis, LSTM, VARMAX, neural networks, server utilisation

1 INTRODUCTION
Monitoring server utilisation has become a crucial factor in managing IT costs. Companies must ensure their servers operate efficiently and effectively to avoid having idle resources that can incur unnecessary expenses. They can achieve this by predicting server utilisation and taking measures in advance to optimise usage. The proposed system will utilise historical server usage data to identify patterns and trends in data and predict future server utilisation levels using time-series algorithms. By accurately forecasting a server’s under-utilisation patterns using these algorithms, companies can optimise resource allocation, reduce costs, and enhance overall system efficiency.

1.1 Motivation
Parallels Remote Application Server (RAS)¹ provides administrators with a centralised management console that combines on-premises servers, VDI and multi-cloud solutions. RAS allows IT admins to add RDSH servers to distribute the workload across multiple servers.
Using statistical analysis, we can identify patterns, trends, and seasonality in the historical server usage data to better understand the data and select the most suitable time series algorithm for it. By applying machine learning techniques such as the time series algorithms VARMAX (Vector Autoregressive Moving Average with Exogenous Variables) and LSTM (Long Short-Term Memory) to RAS, we can predict the utilisation of servers. Time series algorithms can identify data trends and extract valuable insights from temporal data (such as historical server usage) to make accurate predictions. Deep learning techniques, such as Long Short-Term Memory (LSTM) neural networks, can effectively retain long-term dependencies, making them suitable for modelling noisy one-dimensional time series data and making accurate predictions over extended periods [9].
By predicting server utilisation, IT admins using RAS can take a proactive approach and act in good time, for example through load balancing, resource allocation, or scaling down server capacities, which can help them optimise performance and lower operational costs. Currently, RAS lacks the functionality to forecast a server’s utilisation; implementing such a system would benefit RAS customers, as it can aid them in making decisions that reduce costs.

1.2 Challenges
Traditional time series forecasting techniques, like Autoregressive Integrated Moving Average (ARIMA), utilise patterns observed in historical data to predict future outcomes. Such techniques may, however, not be ideal when the data is non-stationary or contains random fluctuations [4]. Traditional approaches are also computationally intensive and perform best only when working with stationary data, making them poorly suited to predicting constantly changing data such as CPU usage [9].
Time series algorithms can effectively capture and model data’s seasonality (recurring patterns that repeat at fixed intervals) and trends (consistent upward or downward movements in the data). Modelling seasonality and trends in time series data can be challenging as they constantly change over time. Selecting the appropriate time series forecasting algorithm for the specific data is essential, as different models are suitable for different types of data [8]. As indicated by Nguyen et al. [9], trends in data may also affect the performance of short-term and long-term predictions. For long-term predictions, LSTM models are better suited for detecting patterns in the input features and can effectively examine long sequences, unlike traditional regression methods, which can find it challenging to adjust when forecasting problems include multiple inputs and variables [2].

¹ https://www.parallels.com/products/ras/remote-application-server/
1.3 Aim
This research investigates the application of machine-learning techniques that identify potential improvements for a user’s RAS configuration setup. The system will determine which servers are being under-utilised based on the usage of resources. This information will assist IT admins in making decisions that can help reduce costs. The company will provide the necessary data for this research, such as the historical records of server usage. The data used for this research will serve as the foundation for training and testing the time series algorithms to forecast future server utilization patterns.

1.4 Objectives
In order to reach the stated aim, the following objectives have been set:
O.1 Use statistical analysis to interpret how historical server data, such as the counters’ usage (CPU, RAM and Disk), varies over time and across different sessions to identify any patterns and trends in the data.
O.2 Evaluate different time series forecasting algorithms that forecast the server utilisation by predicting the CPU, RAM and Disk usage based on historical data.
O.3 Use the predictions generated by time series forecasting algorithms to generate an overall usage score based on time and day and visualise it as a heatmap to highlight server under-utilisation patterns.

1.5 Proposed Solution
Through this research, a system using time series algorithms will be developed to forecast the resource usage of servers belonging to a RAS farm. The predictions made by the algorithm will be used to highlight whether a server is being under-utilised. The number of sessions connected to a server will be used as the algorithm’s input to predict the server’s CPU, RAM and disk read/write usage. Overall usage scores will be generated per hour for the upcoming week using the algorithm’s usage predictions of each resource. The usage scores will be visualised as a table heatmap to display how the server utilisation varies across different time intervals and days and also help identify periods where the server is under-utilised. Based on the usage score derived from the heatmap visualisation, the user could decide to switch off the server or remove it to reduce costs. Different time-series algorithms will be evaluated to determine which algorithm performs better at predicting the CPU, RAM and disk usage using the number of sessions as input.

2 BACKGROUND
A time series refers to sequential observations made over time, such as daily share price and hourly air temperature readings. In time-series forecasting, the typical approach involves analyzing historical data, fitting a suitable model, and utilizing it to predict future values [3]. The four fundamental components that typically affect time series are trend, seasonal, cyclical, and irregular. The trend indicates whether a time series has a consistent upward or downward movement in the data over time. An example is the upward trend observed in population growth over the years. The seasonal component indicates whether seasonal factors influence time series data. An example is the recurring pattern at fixed and known intervals, such as quarterly, yearly, monthly, weekly, or daily data. The cyclical pattern is when the data fluctuates for an extended period and does not adhere to a consistent pattern. An example is a few years of economic growth followed by a few years of recession, then economic growth again. The irregular component is when a time series contains fluctuations that arise from unforeseen factors and do not follow a consistent pattern. An example of these fluctuations is when floods or earthquakes occur [6].
Predicting resource consumption can be considered a time series forecasting problem in which historical resource usage patterns are analyzed to anticipate future resource utilization [9]. Ullah et al. [22] explain that utilization prediction models can be categorized into three main categories: classical time-series analysis approaches (e.g., auto-regression and moving average), machine learning (e.g., support vector regression), and deep learning (e.g., long short-term memory).
The Autoregressive Integrated Moving Average Model (ARIMA) is a comprehensive time series model that incorporates both Autoregressive (AR) and Moving Average (MA) processes [24]. The ARIMA model has three parameters: 𝑝, 𝑑, and 𝑞. These parameters stand for Autoregression, Integrated, and Moving Average, respectively. Autoregression (𝑝) is a regression model that considers the relationships between an observation and lagged observations. Integrated (𝑑) measures the differences between observations at different times to make the time series stationary. Moving Average (𝑞) is a technique that considers the correlation between observations and residual error terms when a moving average model is applied to the lagged observations [19]. Vector Autoregression (VAR) is a statistical technique that extends the AR model to analyse multivariate time series data where multiple variables influence each other [18, 23]. In VAR, each variable is expressed as a function of the variables’ values at previous time points (lagged values), so all variables can influence each other through the bidirectional relationships between them [18].
The Vector Autoregressive Moving Average model with exogenous variables (VARMAX) expands on the ARMA/ARIMA model’s capabilities to enable the analysis of vector time series data with multiple variables and the ability to accommodate exogenous variables, which are independent of other variables [13]. VARMAX can effectively capture the dynamic relationship between dependent variables (endogenous) and the relationship between the dependent and independent variables (exogenous). This model is defined in terms of the orders of the AR (Autoregressive) or MA (Moving-Average) process or a combination of both [12].
Machine learning and deep learning algorithms are being used to solve multivariable time series forecasting problems as they have been found to perform better than traditional statistical analysis models on non-linear and complex datasets. A widely used machine learning technique is neural networks [9].
An Artificial Neural Network (ANN) is a computational model that functions similarly to biological nervous systems. It consists of multiple connected computational nodes, also known as neurons, that work together to learn from the input and optimize the output [10]. In an ANN, the connection between computational nodes is established through weights. Each neuron’s input is multiplied by a weight that influences the output produced by that unit. The learning process of an ANN occurs by adjusting the weights that
connect the neurons. The ANN modifies and refines the computed function by adjusting the weights between neurons to increase prediction accuracy with future iterations [1]. Each neuron receives weighted inputs from other neurons, which are then summed and passed to an activation function. The choice of activation function is a hyperparameter that affects the model’s performance. After all the layers have been processed, the deep neural network (DNN) produces an output which is then compared to the actual output to calculate an error. The backpropagation process updates the weights using this error, as measured by the specified loss function. The loss function assesses the model’s performance [5].
Recurrent Neural Networks (RNNs) are a type of neural network used for sequential tasks due to their ability to retain and analyze past information. The high-dimensional hidden state and nonlinear dynamics of RNNs allow the hidden state to accumulate information over multiple timesteps and utilize it to make precise predictions [20]. Despite their capacity to understand sequential dependencies, RNN models suffer from exploding or vanishing gradients, which cause them to have difficulty capturing long-term sequential patterns in the data. Long Short-Term Memory (LSTM) aims to mitigate the issues of vanishing and exploding gradients by replacing the hidden layer utilising “sigmoid” or “tanh” with memory cells regulated by gates that control the data flow to the hidden neurons and preserve the features from previous timesteps [17].

3 LITERATURE REVIEW
Jain et al. [7] use VARMAX in their research to predict the energy generated by a wind turbine using the data gathered from various sensors. They used a dataset consisting of a year of SCADA (Supervisory Control and Data Acquisition) data collected from a wind turbine. For the model features, they selected readings from ten sensors and scaled the observations from 144 observations to an observation per day. They included exogenous variables in the model to account for external factors and unforeseen events that may independently affect the energy generated by the turbine. By including these exogenous variables, they account for additional factors that may impact the accuracy of forecasted outcomes. Using a year’s data, their VARMAX model forecast the energy generated for the following three months by considering exogenous and endogenous variables.
The VARMAX model has also been used by Rafi et al. [13] to predict the cash demands of individual ATMs within a network of ATMs belonging to a particular financial institution. The research utilized a dataset consisting of two and a half years of transactions from 7 different ATMs located in financially active areas in Pakistan. They first tested the modelled data for correlation and found that there was a highly positive relation between the transaction amount and transaction count. They performed a second test to verify that the dataset was stationary. The data was then used on the VARMAX model to predict the transaction amount and transaction count while considering exogenous variables such as salary week, weekdays and holidays. According to their findings, the VARMAX model they used performed better than the LSTM model previously implemented by Rajwani et al. [14] for the same problem and dataset.
Nguyen et al. [9] use LSTM in their research to forecast CPU usage in cloud workloads and determine the under- and over-utilization of cloud resources. They develop short-term and long-term LSTM prediction models to forecast this usage. The short-term model uses data from the previous day as an interval for its predictions, while the long-term model uses the previous week to make its predictions. To improve working with seasonal and unbalanced datasets, they use the average of the predicted and actual values of multiple time points to predict the workload. They also combine LSTM with transfer learning (TL) to optimise the model using the previous model’s learning knowledge. Their experiment used three scenarios of simulated datasets and two practical datasets. Each dataset included 20,160 data records consisting of the host, CPU usage and time over two weeks. The first scenario was used to evaluate daily trends in data; the second scenario was used to evaluate how the model adapts to different trends; and the third scenario was used to evaluate the weekly seasonality in the data. Their results indicated that the LSTM short-term and long-term models resulted in more accurate forecasts than traditional one-step, multi-step, and iterated prediction approaches when evaluating the loss function using mean squared error (MSE), root mean square error (RMSE), mean absolute percentage error (MAPE), and mean absolute error (MAE).
Similarly, Rao et al. [15] also use LSTM to forecast future resource usage based on demand and optimise resource management using this prediction. In their research, they implement and compare three different time series algorithm models to predict the CPU utilisation of a server. The three models compared and evaluated are Holt-Winters, ARIMA and LSTM. The dataset used in their research consisted of CPU usage collected from a server that periodically receives requests from two clients. Their research findings showed that the LSTM model performed the best, giving the lowest MAPE and RMSE.
Thonglek et al. [21] adopted a similar yet distinct methodology. Through their research, they implemented an LSTM model that predicts resource allocation in a data centre for each job based on historical data. Their objective was to enhance resource utilisation in data centres by forecasting the resource demands of individual applications. They implemented an LSTM model with two layers that each learn the relationship between the allocation and usage and the CPU and memory. To train their model, they used Google’s cluster usage data, which included information such as the resource allocation and usage of jobs carried out in a Google data centre. The model was integrated into a data centre’s resource manager to distribute the estimated resource allocation generated by the model to computing resources. They evaluated their model using Google’s cluster scheduler simulator, where the results showed CPU and memory utilisation enhancements when using their proposed model. Compared to a traditional resource manager, their model improved utilisation by 10.71% for CPU and 47.36% for memory.

4 METHODOLOGY
The system implemented for this research uses time-series algorithms to predict the utilisation of servers belonging to a RAS farm. The algorithms will predict each server’s resource usage and use those predictions to generate overall usage scores that are displayed
as a 7-day hourly forecast heatmap to highlight server utilisation patterns.

4.1 Data Collection
An Azure test environment with RAS installed was used to collect data. The data consisted of information about the usage of Remote Desktop Session Host (RDSH) servers that belonged to the environment’s RAS farm. The usage details consisted of each server’s CPU, RAM, and Disk Read/Write usage along with the server ID, the server’s active and disconnected sessions and the time recorded at that particular instance. This data was recorded from the environment’s RAS console every 15 minutes and stored in a database. In order to gather accurate and comprehensive data on server utilization, simulated scenarios were created and deployed on the test environment to mimic real-world server usage. These simulations were deployed using an automated script that connects to the Azure test environment and opens between one and three applications for a random duration of one to five hours. These simulations were designed to mimic user behaviour, as throughout the day, users tend to use various applications for specific durations. This script was executed multiple times a day on different machines. Using different machines simulates different users accessing the published applications on the RDSH servers belonging to the test environment’s RAS farm.
Three datasets were used for the training and evaluation of each model. Each dataset was built upon the previous one with approximately an additional week of data. The reason for using three datasets was to analyze the impact and performance of the models when adding more data. The first dataset consisted of data from approximately 48 days, recorded at each hour, consisting of 1146 records. The second dataset consisted of data from approximately 57 days, which included 1368 records, while the third dataset consisted of data from approximately 64 days, comprising 1535 records.

4.2 Data Preprocessing
The data preprocessing first consisted of converting the SQLite database to a data frame. A column for the ‘Total Sessions’ was added, which combines the active and disconnected sessions. Sessions are created when users connect to RAS. This column will be used as the input for the algorithm. The ‘TimeCreate’ column, which indicates the time of the record’s instance, was converted from a Unix timestamp to a ‘DateTime’ object to display a readable date in ISO 8601 format.
The records were grouped by server ID and resampled to an hourly frequency using the mean. Data resampling involved grouping the original data into hourly intervals and finding the average value for each hour to create a single value. Since the dataset only consisted of usage records when the server was on, resampling data resulted in ‘NaN’ values in the hours when the server was off. These values were replaced with 0, indicating that the server had no activity or utilization and was off during that time. Feature selection was made by choosing the features useful for the model and dropping the unnecessary features.

4.3 Objective 1 - Using statistical analysis to interpret server utilisation
As outlined in the first section, Objective 1 involved applying statistical analysis to the processed dataset (dataset 1) to identify patterns in data and relationships between features. The approach used for statistical analysis was similar to that of Pathak et al. [11], where a dashboard was created for data visualisation. The interactive dashboard was implemented using ‘Shiny’², a framework which provides the ability to create a web application using Python. The dashboard created lets users interact with the data through its graphical user interface. Users can select a server and adjust a particular date range to view its usage through interactive visualisations.
The dashboard’s first tab, as depicted in Figure 12 of Appendix A.1, consists of the server’s resource usage over time for each counter. The usage is represented as separate line graphs for CPU and RAM (Figure 1), disk read and write, and active and disconnected sessions.

Figure 1: Dashboard - Displays graphs that show how each counter’s usage varies over time

The dashboard’s second tab, as shown in Figure 13 of Appendix A.2, consists of a multi-line graph that highlights how each counter’s usage varies over time. Each line represents the usage of a different counter (Figure 2).

Figure 2: Dashboard - Displays the overall counters’ usage over time

² https://shiny.rstudio.com/py/
Figure 14 of Appendix A.3 shows the dashboard’s third tab. This tab contains a bar chart depicting each counter’s usage across different sessions (Figure 3).

Figure 3: Dashboard - Displays the total usage of each counter at different number of total sessions

The dashboard’s fourth tab, as shown in Figure 15 of Appendix A.4, provides a heatmap that visualises the server’s utilisation over a week. Like the previous tabs, users can select which server they would like to view the heatmap for. Users can also view the heatmap of servers assigned to a particular group in RAS. The heatmap (Figure 4) highlights the total number of sessions (active + disconnected sessions) at each hour, where the warmer colours indicate a higher usage (higher number of sessions).

Figure 4: Dashboard - Displays a heatmap that shows the total Server Utilisation

4.4 Objective 2 - Evaluate different time series forecasting algorithms
Objective 2, as listed in the first section of this paper, involved the implementation of time series algorithms that forecast the server utilisation by predicting the server counters’ usage. Each preprocessed dataset was split so that 80% would be the train set whilst the remaining 20% would be the test set. Both train and test sets were normalised within the range of 0-1 using the “MinMaxScaler” class from the ‘sklearn’³ library. Both algorithms were trained and tested on the same dataset, enabling a direct comparison of their performance.

4.4.1 Vector Autoregression Moving-Average with Exogenous Regressors (VARMAX).
Similar to work conducted by Jain et al. [7] and Deepthi and Indiramma [13], the first algorithm implemented is VARMAX. The VARMAX algorithm is a multivariate time series model that can analyse the relationship between the dependent and independent variables using autoregressive (AR), moving-average (MA), or a combination of both. The ‘statsmodels’⁴ library was used to create the VARMAX model. When defining the model, the following parameters were passed: exogenous variables, endogenous variables, order and trend. The exogenous (independent) variable defined was the total sessions, while the endogenous (dependent) variables were the CPU, RAM and Disk Read/Write. The order was specified as (𝑝, 𝑞), where the first value represents the AR order, and the second value represents the MA order. The trend parameter is used to specify the pattern or trend observed in the data over time. The fit() method was utilised to train the model on the dataset through an iterative optimisation algorithm that identifies the optimal values for the model parameters. After training the VARMAX model, the model generates predictions of the test set using the predict() method. The predictions were inverse-transformed to the original scale and then placed in a data frame with the same index as the test set.

4.4.2 Long Short-Term Memory (LSTM).
The second algorithm implemented for this research was an LSTM network using a similar approach to that of Rao et al. [16] and Nguyen et al. [9]. As outlined in the background, LSTM is an RNN that uses memory cells and input/output gates to control the flow of information within the network.
A function was implemented to iterate through each row in the dataset and create input-output sequences for the training and testing of the LSTM model. In these input-output sequences, each input sequence consists of a fixed number of historical time steps (specified by the look-back parameter) used as input to predict the next value in the time series. The corresponding output sequence consists of the next time step value after the input sequence. In order to predict the server utilization, the past values of the number of sessions were used as the input sequence, whilst the corresponding CPU, RAM and disk usage was used as the output sequence. The look-back period in this implementation was set to 48, equivalent to two days, as the dataset has an hourly frequency. For the implementation of the model, the ‘Sequential’ class from the Keras⁵ library was used to create a linear stack of layers that allow data to flow from one layer to another. An LSTM layer was added to the Sequential model, where the number of neurons and the input shape expected by the LSTM layer were specified. The input shape represents the shape of the input data, where each input has look-back timesteps with one feature dimension (number of sessions).

³ https://scikit-learn.org/stable/
⁴ https://www.statsmodels.org/dev/index.html
⁵ https://keras.io/
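The construction of input-output sequences described in Section 4.4.2 can be sketched in plain Python as follows. This is a minimal illustration and not the code used in this research: the function name `make_sequences`, its argument names, and the toy data are assumptions introduced here. Each input is a window of `look_back` past session counts, and each output holds the resource readings at the time step immediately after that window.

```python
from typing import List, Sequence, Tuple


def make_sequences(
    sessions: Sequence[float],
    targets: Sequence[Sequence[float]],
    look_back: int = 48,
) -> Tuple[List[List[float]], List[List[float]]]:
    """Build (input window, next-step target) pairs from hourly data.

    sessions[i] is the total session count at hour i (hypothetical layout);
    targets[i] holds the readings at hour i, e.g. CPU, RAM, disk read, disk write.
    """
    if len(sessions) != len(targets):
        raise ValueError("sessions and targets must have the same length")
    inputs: List[List[float]] = []
    outputs: List[List[float]] = []
    # A window covers hours [end - look_back, end); its target is hour `end`.
    for end in range(look_back, len(sessions)):
        inputs.append(list(sessions[end - look_back:end]))
        outputs.append(list(targets[end]))
    return inputs, outputs


# Toy data: 60 hourly records with four target values per record.
sessions = [float(h % 24) for h in range(60)]
targets = [[s / 24, s / 48, 0.0, 0.0] for s in sessions]
X, y = make_sequences(sessions, targets, look_back=48)
print(len(X), len(X[0]), len(y[0]))  # 12 48 4
```

With 60 records and a look-back of 48, only 12 pairs are produced, which shows how much of a short dataset the look-back window consumes. Before being passed to a Keras LSTM layer, the inputs would typically be converted to an array of shape (samples, look-back, 1).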
The output of the LSTM layer is a sequence of hidden states, one invertibility in the MA component, was also defined with boolean
for each time step in the input sequence. This output is used as values. Table 1 shows the optimal hyperparameters for the VAR-
the input to the dense layer. The dense layer includes four neurons MAX model when using each of the three datasets. The optimal
that create a sequence of predicted values consisting of four output hyperparameters defined in the table resulted in the lowest RMSE
values for each input sequence. Each value represents the predicted value. The table’s results indicate that stationarity and invertibility
CPU, RAM, Disk Read and Write. The model was then configured for were enforced in all three datasets, whilst the 𝑝, 𝑞, and 𝑡𝑟𝑒𝑛𝑑 varied
training by specifying the loss and optimiser parameters. The loss across datasets. The table also shows the RMSE scores achieved us-
parameter represents the loss function used to calculate the error ing the hyperparameters combination for each dataset. The results
between the predicted and actual values. The optimiser parameter indicated that dataset 3 achieved the lowest RMSE score of 8.45.
was used to update the neural network weights during training.
Table 1: VARMAX Optimal Hyperparameters
4.5 Objective 3 - Use the predictions to generate
an overall usage score and create a heatmap
Dataset 1 Dataset 2 Dataset 3
highlighting the server’s under-utilisation
P 2 0 0
As outlined in the first section, Objective 3 consisted of using the
Q 2 5 4
predictions generated (for the next 7 days) by the VARMAX and
Trend c t n
LSTM models to calculate the total usage for each hour. The to-
Enforce Stationarity True True True
tal usage for each hour was determined by adding the CPU and
Enforce Invertibility True True True
RAM usage, multiplying by 100, and dividing by 200 to obtain a
percentage value representing the total usage. The total usage at RMSE Score 11.74 12.19 8.45
each hour was visualised in a heatmap table with seven columns
representing the days of the week and 24 rows representing each
hour. A colour scale is defined on the heatmap to map the total usage values to different colours. Lower percentages indicating low server utilisation patterns were assigned light shades of orange, whilst higher percentages indicating high utilisation were assigned warmer shades of red.

5 FINDINGS AND DISCUSSION OF RESULTS

5.1 Hyper-parameter Optimization

Determining the optimal hyperparameters consisted of using Optuna⁶, a Python framework that uses optimisation methods to find the best hyperparameters that achieve optimal performance. An objective function was created for each model to evaluate the model’s performance on different sets of hyperparameters using the Root Mean Square Error (RMSE). Optuna used this objective function to minimise the RMSE by iteratively exploring the hyperparameters defined in the search space for a predefined number of iterations (trials) set to 100. This process was repeated for the VARMAX and LSTM models to determine the optimal parameters using each of the three datasets. Each dataset was split into the train (60%), validation (20%), and test (20%) set. During each trial, the model was trained on the train set using a specific set of hyperparameters and evaluated using those parameters on the validation set.

For the VARMAX model, Optuna was used to optimise the order, trend, enforce stationarity, and enforce invertibility parameters. In the search space, the order parameter, representing the number of autoregressive (𝑝) and moving average (𝑞), was specified within a range of 0 to 5. The trend parameter included options consisting of trends that are constant (𝑐), linear (𝑡), both (𝑐𝑡) or none (𝑛). The enforce stationarity, which defines whether the AR enforces stationarity, was defined with either true or false values, whilst the enforce invertibility, which defines whether the MA parameter uses

Optuna was also used to optimise the LSTM model’s parameters. The parameters optimised consisted of the following: the number of units, dropout rate, activation, optimiser and batch size. For the number of units, the search space was defined as a range of values between 16 and 256. The dropout rate was also defined as the range of values between 0.0 and 0.5. The activation parameter was set to include the following categorical options: Rectified Linear Unit (ReLU), Hyperbolic Tangent Function (Tanh) and sigmoid. The optimiser was also defined as categorical options consisting of Adam and Root Mean Square Propagation (RMSProp) optimiser. For the batch size parameter, the following values were defined in the search space: 16, 32, 64, 128.

Table 2 exhibits the LSTM’s optimal hyperparameters achieved using Optuna across the three datasets. The table shows that the optimal number of units increased as the size of the dataset increased, indicating that by increasing the number of units in the LSTM layer, the LSTM model can help better capture and understand the data’s complex patterns. On the other hand, the dropout rate, activation function, optimizer, and batch size varied across each dataset as the data increased by an additional week. Similar to the previous table, the optimal parameters were determined by the RMSE score. The table indicates that dataset 1 achieved the lowest RMSE score of 8.79.

Table 2: LSTM Optimal Hyperparameters

                Dataset 1   Dataset 2   Dataset 3
Units           108         182         233
Dropout Rate    0.47        0.50        0.40
Activation      sigmoid     relu        sigmoid
Optimizer       rmsprop     adam        adam
Batch Size      16          32          16
RMSE Score      8.79        9.32        9.58

⁶ https://optuna.readthedocs.io/en/stable/
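The search procedure of Section 5.1 can be illustrated with a minimal sketch. It is not the implementation used in this work: plain random sampling stands in for Optuna's sampler, the split is assumed to be chronological (the text only gives the 60/20/20 proportions), and the names `chronological_split`, `sample_lstm_params` and `random_search` are hypothetical. The objective is supplied by the caller and is expected to return the validation RMSE for a given parameter set.

```python
import random


def chronological_split(rows, train_frac=0.6, val_frac=0.2):
    """Split a time-ordered sequence into train/validation/test without shuffling.

    Assumption: the 60/20/20 split is taken in time order.
    """
    n = len(rows)
    train_end = int(n * train_frac)
    val_end = train_end + int(n * val_frac)
    return rows[:train_end], rows[train_end:val_end], rows[val_end:]


def sample_lstm_params(rng):
    """Draw one candidate from the LSTM search space described in Section 5.1."""
    return {
        "units": rng.randint(16, 256),            # inclusive integer range
        "dropout": rng.uniform(0.0, 0.5),
        "activation": rng.choice(["relu", "tanh", "sigmoid"]),
        "optimizer": rng.choice(["adam", "rmsprop"]),
        "batch_size": rng.choice([16, 32, 64, 128]),
    }


def random_search(objective, n_trials=100, seed=0):
    """Keep the parameter set with the lowest objective value (e.g. validation RMSE).

    Random search is a stand-in here; Optuna explores the space adaptively.
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = sample_lstm_params(rng)
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

In practice the objective would train the model on the train set with the sampled parameters and return the RMSE measured on the validation set; any cheap function of the parameters can be substituted to exercise the loop.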
5.2 Evaluation of VARMAX and LSTM models using different forecast horizons

Similarly to how Duggan et al. [4] evaluated the accuracy of a neural network using different time steps to predict CPU utilisation, we also experimented using different forecast horizons for both VARMAX and LSTM models. The forecast horizon represents the number of time steps into the future to predict. In order to evaluate the performance of the models in predicting server utilisation across different durations, the models were trained and evaluated using three different forecast horizons: 48 hours (2 days), 120 hours (5 days), and 168 hours (7 days). For each forecast horizon, the models were trained using their respective optimal parameters on the complete train set (consisting of the train + validate set) and evaluated on the test set using Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) as metrics. This process was repeated for each dataset.

Figure 5 displays the performance of the VARMAX model across different forecast horizons using each dataset. The bar chart shows that the 2-day forecast horizon across each dataset had the lowest metric values compared to the 5-day and 7-day forecast horizons, indicating that the model performed better when making short-term predictions. In each dataset, the 5-day and 7-day forecast horizons had similar metric values. This similarity might indicate that the model reached a point where it captured most of the important information and patterns within the first few days and will show minimal improvement for longer forecast horizons.

Figure 5: VARMAX’s MAE and RMSE Metric Performance using different forecast horizons

The results of the VARMAX model indicate that across each dataset, each forecast horizon’s MAE and RMSE values tended to decrease as more data was added between each dataset. There was, however, an anomaly at the 2-day forecast horizon in Dataset 3, where the MAE and RMSE values increased compared to Dataset 2. The overall decrease in MAE and RMSE shows that the model made more accurate predictions (closer to the actual values) with each additional week of data. This suggests that the VARMAX model was learning more about the patterns and trends in the data as more data was made available.

In Figure 6, we can see the performance of the LSTM model across different forecast horizons for each dataset. The chart shows that the model’s performance when evaluating the MAE and RMSE values varied slightly across forecast horizons, which suggests that the different forecast horizons did not influence the LSTM model’s prediction accuracy.

Figure 6: LSTM’s MAE and RMSE Metric Performance using different forecast horizons

The LSTM’s performance across datasets for each forecast horizon indicates that the model’s performance remained stable and consistent as more data was added between datasets. The results indicate that the LSTM model effectively captured patterns and trends in the data and reached a convergence point.

Figure 7: Average VARMAX and LSTM’s MAE Performance across different forecast horizons

Figure 7 exhibits the average MAE across the three datasets for each forecast horizon of both VARMAX and LSTM models, while Figure 8 exhibits the average RMSE for the two models. When comparing the results of the two models, the VARMAX model outperforms the LSTM model for the 2-day forecast horizon with MAE and RMSE values lower than that of LSTM. On the other hand, for the 5-day and 7-day forecast horizons, the LSTM model performed better, with lower MAE and RMSE values, than the VARMAX model, which indicates that the LSTM model predicted long-term forecasts better than the VARMAX model.
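The metrics used in Section 5.2 can be made concrete with a short sketch. The MAE and RMSE functions follow the standard definitions; `evaluate_horizons` is a hypothetical helper that assumes each forecast horizon is scored on the first 48, 120 or 168 hourly values of the test set, which is one plausible reading of the procedure rather than a confirmed detail.

```python
import math

HORIZONS_HOURS = (48, 120, 168)  # 2, 5 and 7 days of hourly forecasts


def mae(actual, predicted):
    """Mean Absolute Error: average absolute difference between the series."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error: square root of the mean squared difference."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def evaluate_horizons(actual, predicted, horizons=HORIZONS_HOURS):
    """Score the first `h` hourly values for each horizon `h`.

    Assumption: a horizon of h hours is evaluated on the first h test points.
    """
    results = {}
    for h in horizons:
        if h > len(actual) or h > len(predicted):
            raise ValueError(f"need at least {h} hourly values for this horizon")
        results[h] = {
            "MAE": mae(actual[:h], predicted[:h]),
            "RMSE": rmse(actual[:h], predicted[:h]),
        }
    return results
```

Because RMSE squares each error before averaging, it penalises large misses more heavily than MAE, which is why both are reported side by side.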
Figure 8: Average VARMAX and LSTM’s RMSE Performance across different forecast horizons

Table 3 of Appendix B.1 and Table 4 of Appendix B.2 report the MAE and RMSE results for each counter when using the VARMAX and LSTM models, respectively.

5.3 Evaluation of Server Utilisation

In order to evaluate the accuracy of the server utilisation predictions, three heatmaps were generated. The first heatmap consisted of the actual total usage data (CPU + RAM) at each hour from the test set, which served as ground truth and provided a benchmark for comparison. The second and third heatmaps were generated using the predictions from the VARMAX and LSTM models, respectively. By comparing these three heatmaps, the performance of both the VARMAX and LSTM models can be evaluated against the actual usage. The predictions from each model were generated using the optimal parameters and results from Dataset 3.

Figure 9: Heatmap representing the actual Server Utilisation

The actual server utilisation heatmap (Figure 9) indicates that the server is mainly used on weekdays, Monday to Friday, from 9 am to 7 pm, except on Monday and Wednesday, when the server starts being used an hour earlier, at 8 am. The heatmap generated using the VARMAX predictions (Figure 10) shows that the server will be used from 9 am to 4 pm from Monday to Wednesday, whilst on Thursday and Friday, the server is expected to be used from 9 am to 6 pm. The overall start time of the server’s utilisation aligns closely with the actual data, whilst the end time, which indicates the time the server is no longer utilised, shows a discrepancy: the VARMAX model expects the server to finish three hours earlier on some days. Compared to the actual usage, the total usage of the VARMAX heatmap appears significantly lower, indicating that the model tends to underestimate the server utilisation levels on average.

Figure 10: Heatmap representing the Server Utilisation using VARMAX

The heatmap generated using the LSTM predictions (Figure 11) indicates that the server will be used from 11 am to 7 pm from Monday to Wednesday, and from 11 am to 9 pm on Thursday and Friday. Compared to the actual data, the LSTM’s heatmap shows a consistent delay of two hours in the start time of utilisation. The end time, however, closely resembles the actual data, except on Thursday and Friday, when the server is predicted to be used for an extended duration. The total usage of the LSTM values was also similar to the actual heatmap.

Figure 11: Heatmap representing the Server Utilisation using LSTM
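The heatmap comparison of Section 5.3 can be sketched as follows. This is an illustration under stated assumptions, not the dashboard code: "total usage" is taken to be the simple sum of the hourly CPU and RAM percentages, the 168 hourly values are assumed to start on Monday at 00:00, and the threshold used to decide when the server counts as "in use" is a hypothetical parameter. The function names are likewise illustrative.

```python
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def build_heatmap(cpu, ram):
    """Arrange one week of hourly total usage (CPU + RAM) as a 7 x 24 grid.

    Assumption: index 0 corresponds to Monday 00:00.
    """
    if len(cpu) != 168 or len(ram) != 168:
        raise ValueError("expected 168 hourly values (7 days x 24 hours)")
    total = [c + r for c, r in zip(cpu, ram)]
    return [total[day * 24:(day + 1) * 24] for day in range(7)]


def active_window(day_row, threshold):
    """Return (first_hour, last_hour) with usage at or above the threshold.

    Returns None when the server is never above the threshold that day.
    """
    active = [hour for hour, value in enumerate(day_row) if value >= threshold]
    return (active[0], active[-1]) if active else None
```

Applying `active_window` to the same day in the actual and predicted grids gives the start and end hours being compared in the text, for example a predicted window that begins two hours later than the actual one.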
The results show that the LSTM model is better suited for predicting server utilisation than the VARMAX model, despite variations in the start of utilisation times. The LSTM’s total usage closely resembles the actual data, indicating its ability to accurately predict the overall server utilisation levels. On the other hand, the VARMAX model exhibits significantly lower total usage values.

6 CONCLUSION AND FUTURE WORKS

In this research, we implemented two time series algorithms, VARMAX and LSTM, to predict the utilisation of servers in a user’s RAS configuration setup. The total predicted usage was calculated using the model’s predictions and represented as a heatmap to show the predicted usage for the upcoming week. By visualising the data as a heatmap, the user can determine which servers were under-utilised and at which hours to make decisions that can reduce costs. The results showed that VARMAX’s performance improved across each forecast horizon as more data was added between datasets. This indicated that the model was learning more about the patterns and trends in the data with each additional data week. In contrast, the LSTM model’s performance across datasets remained consistent as more data was added, which indicates that the model effectively captured patterns and trends in the data and reached a convergence point. The results indicated that the VARMAX model performed better when making short-term predictions (of 2 days), while the LSTM performed better at predicting long-term forecasts (for 5 and 7 days). The LSTM model successfully predicted a 7-day forecast and achieved similar results to the actual data. This shows that the proposed model can successfully predict the resource utilisation of a server using historical server usage.

In future works, we intend to investigate how the model can be extended to predict the total number of sessions, which will then be compared to the maximum number of sessions defined in RAS so that, together with total usage (CPU and RAM), we can provide a grading of utilisation. Furthermore, we envision the possibility of using the predictions from the model to provide suggestions and take action on the user’s behalf to reduce costs. Such actions include automatically reducing the number of servers, switching off/on a particular server or auto-scaling for a particular server.

REFERENCES

[1] Charu C. Aggarwal. 2018. Neural Networks and Deep Learning: A Textbook. Springer International Publishing, Cham.
[2] Srihari Athiyarath, Mousumi Paul, and Srivatsa Krishnaswamy. 2020. A Comparative Study and Analysis of Time Series Forecasting Techniques. SN Computer Science 1, 3 (May 2020), 175.
[3] Chris Chatfield. 2005. Time-series Forecasting. Significance 2, 3 (2005), 131–133.
[4] Martin Duggan, Karl Mason, Jim Duggan, Enda Howley, and Enda Barrett. 2017. Predicting host CPU utilization in cloud computing using recurrent neural networks. In 2017 12th International Conference for Internet Technology and Secured Transactions (ICITST). 67–72.
[5] Adrian Iustin Georgevici and Marius Terblanche. 2019. Neural networks and deep learning: a brief introduction. Intensive Care Medicine 45, 5 (May 2019), 712–714.
[6] Aboul Ella Hassanien and Ashraf Darwish (Eds.). 2021. Machine Learning and Big Data Analytics Paradigms: Analysis, Applications and Challenges. Studies in Big Data, Vol. 77. Springer International Publishing, Cham.
[7] Samyak Jain, Sanjay, Pawan, Ravi Arora, and Priyanka Behera. 2022. Wind Power Forecasting using VARMAX. In 2022 2nd International Conference on Technological Advancements in Computational Sciences (ICTACS). 353–360.
[8] G. Mahalakshmi, S. Sridevi, and S. Rajaram. 2016. A survey on forecasting of time series data. In 2016 International Conference on Computing Technologies and Intelligent Data Engineering (ICCTIDE’16). 1–8.
[9] Tu Nguyen, Tri Do, Khanh Le, Seungkyu Go, Sunghyun Na, Dukyun Kim, and Duc Tran. 2022. An LSTM-Based Approach for Predicting Resource Utilization in Cloud Computing. In Proceedings of the 11th International Symposium on Information and Communication Technology (Hanoi, Vietnam) (SoICT ’22). Association for Computing Machinery, New York, NY, USA, 173–179.
[10] Keiron O’Shea and Ryan Nash. 2015. An Introduction to Convolutional Neural Networks. arXiv:1511.08458 [cs.NE]
[11] Shivangi Pathak, Shambhavi Lau, Nikita Lakhanpal, and Rajni Sehgal Kaushik. 2023. National Information Visualization Dashboard for Indian Regions Using ML Approach. In 2023 13th International Conference on Cloud Computing, Data Science & Engineering (Confluence). 93–98.
[12] Deepthi A R and Indiramma M. 2022. COVID-19 Prediction Using Time Series Models. In 2022 4th International Conference on Advances in Computing, Communication Control and Networking (ICAC3N). 414–422.
[13] Muhammad Rafi, Mohammad Taha Wahab, Muhammad Bilal Khan, and Hani Raza. 2020. ATM Cash Prediction Using Time Series Approach. In 2020 3rd International Conference on Computing, Mathematics and Engineering Technologies (iCoMET). 1–6.
[14] Akber Rajwani, Tahir Syed, Behraj Khan, and Sadaf Behlim. 2017. Regression Analysis for ATM Cash Flow Prediction. In 2017 International Conference on Frontiers of Information Technology (FIT). 212–217.
[15] Sriram N Rao, G Shobha, Srinivas Prabhu, and N Deepamala. 2019. Time Series Forecasting methods suitable for prediction of CPU usage. In 2019 4th International Conference on Computational Systems and Information Technology for Sustainable Solution (CSITSS). 1–5.
[16] Sriram N Rao, G Shobha, Srinivas Prabhu, and N Deepamala. 2019. Time Series Forecasting methods suitable for prediction of CPU usage. In 2019 4th International Conference on Computational Systems and Information Technology for Sustainable Solution (CSITSS). 1–5.
[17] Hojjat Salehinejad, Sharan Sankar, Joseph Barfett, Errol Colak, and Shahrokh Valaee. 2018. Recent Advances in Recurrent Neural Networks. arXiv:1801.01078 [cs.NE]
[18] Jasleen Kaur Sethi and Mamta Mittal. 2020. Analysis of Air Quality using Univariate and Multivariate Time Series Models. In 2020 10th International Conference on Cloud Computing, Data Science & Engineering (Confluence). 823–827.
[19] Sima Siami-Namini, Neda Tavakoli, and Akbar Siami Namin. 2018. A Comparison of ARIMA and LSTM in Forecasting Time Series. In 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA). 1394–1401.
[20] Ilya Sutskever, James Martens, and Geoffrey Hinton. 2011. Generating Text with Recurrent Neural Networks. In Proceedings of the 28th International Conference on Machine Learning (Bellevue, Washington, USA) (ICML’11). Omnipress, Madison, WI, USA, 1017–1024.
[21] Kundjanasith Thonglek, Kohei Ichikawa, Keichi Takahashi, Hajimu Iida, and Chawanat Nakasan. 2019. Improving Resource Utilization in Data Centers using an LSTM-based Prediction Model. In 2019 IEEE International Conference on Cluster Computing (CLUSTER). 1–8.
[22] Farman Ullah, Muhammad Bilal, and Su-Kyung Yoon. 2023. Intelligent time-series forecasting framework for non-linear dynamic workload and resource prediction in cloud. Computer Networks 225 (2023), 109653.
[23] Zonghan Wu, Shirui Pan, Guodong Long, Jing Jiang, Xiaojun Chang, and Chengqi Zhang. 2020. Connecting the Dots: Multivariate Time Series Forecasting with Graph Neural Networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (Virtual Event, CA, USA) (KDD ’20). Association for Computing Machinery, New York, NY, USA, 753–763.
[24] Mahendra Pratap Yadav, Nisha Pal, and Dharmendar Kumar Yadav. 2021. Workload Prediction over Cloud Server using Time Series Data. In 2021 11th International Conference on Cloud Computing, Data Science & Engineering (Confluence). 267–272.
A DASHBOARD

A.1 Dashboard’s Counter vs Time Tab

Figure 12: Complete Dashboard UI that displays each counter’s usage over time

A.2 Dashboard’s Counters vs Time Tab

Figure 13: Complete Dashboard UI that displays the overall counters’ usage over time

A.3 Dashboard’s Usage vs Sessions Tab

Figure 14: Complete Dashboard UI that displays the total usage of each counter at different numbers of total sessions

A.4 Dashboard’s Server Utilization

Figure 15: Complete Dashboard UI that displays the heatmap consisting of the total Server Utilisation
B RESULTS

B.1 VARMAX’s Performance per Counter

Table 3: VARMAX’s MAE and RMSE Performance (Per Counter) across each dataset

B.2 LSTM’s Performance per Counter

Table 4: LSTM’s MAE and RMSE Performance (Per Counter) across each dataset
Dataset Forecast Horizon Counter MAE RMSE
Dataset Forecast Horizon Counter MAE RMSE Dataset 1 2 Days (48 hours) CPU 6.48 10.07
RAM 11.43 17.22
Dataset 1 2 Days (48 hours) CPU 0.09 0.09
DiskReadTime 2.21 8.06
RAM 0.19 0.20
DiskWriteTime 0.31 1.37
DiskWriteTime 0.01 0.01 5 Days (120 hours) CPU 7.13 10.63
RAM 13.24 21.41
5 Days (120 hours) CPU 11.28 17.55
RAM 34.54 56.20
RAM 10.79 19.18
7 Days (168 hours) CPU 12.09 17.43
RAM 37.62 56.21
DiskWriteTime 0.88 2.03 Dataset 2 2 Days (48 hours) CPU 5.28 9.20
RAM 16.02 22.21
RAM 0.21 0.21
RAM 12.54 19.05
5 Days (120 hours) CPU 6.52 9.23
RAM 31.54 45.02
RAM 10.38 17.36
7 Days (168 hours) CPU 5.81 8.19
RAM 28.05 39.87
DiskWriteTime 0.36 0.87 Dataset 3 2 Days (48 hours) CPU 1.41 1.47
RAM 5.29 5.51
RAM 1.39 5.84
RAM 9.95 15.67
5 Days (120 hours) CPU 1.88 4.66
RAM 6.64 16.12
RAM 9.25 14.71
7 Days (168 hours) CPU 2.76 6.44
RAM 8.03 17.89
Received 31 May 2023