ShaunBorg Placement Report
Series Analysis
Shaun Borg
shaun.borg.22@um.edu.mt
Department of Artificial Intelligence
University of Malta
Malta
ABSTRACT
By predicting server utilisation, IT admins can take measures in advance to optimise usage and reduce costs. This research aims to identify potential improvements for a user’s RAS (Remote Application Server) configuration setup by predicting the utilisation of servers in a RAS farm using VARMAX (Vector Autoregressive Moving Average with Exogenous Variables) and LSTM (Long Short-Term Memory). The proposed models used historical server data to predict the server’s resource usage, such as CPU, RAM, and Disk Read/Write. Using that predicted data, the total usage per hour was calculated and displayed as a 7-day hourly heatmap table that highlights the different levels of predicted server utilisation. The proposed models were trained using optimal parameters and evaluated on different forecast horizons using metrics such as Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). The results indicated that the VARMAX model performed better at short-term predictions, whilst LSTM performed better at long-term predictions. The results obtained in this research showed that, using historical server data, we can successfully predict the server’s resource utilisation for a 7-day forecast.

KEYWORDS
Time Series Analysis, LSTM, VARMAX, neural networks, server utilisation

1 INTRODUCTION
Monitoring server utilisation has become a crucial factor in managing IT costs. Companies must ensure their servers operate efficiently and effectively to avoid having idle resources that can incur unnecessary expenses. They can achieve this by predicting server utilisation and taking measures in advance to optimise usage. The proposed system will utilise historical server usage data to identify patterns and trends in data and predict future server utilisation levels using time-series algorithms. By accurately forecasting a server’s under-utilisation patterns using these algorithms, companies can optimise resource allocation, reduce costs, and enhance overall system efficiency.

1.1 Motivation
Parallels Remote Application Server (RAS) 1 provides administrators with a centralised management console that combines on-premises servers, VDI and multi-cloud solutions. RAS allows IT admins to add RDSH servers to distribute the workload across multiple servers.
Using statistical analysis, we can identify patterns, trends, and seasonality in the historical server usage data to better understand the data and select the most suitable time series algorithm for the data. By applying machine learning techniques such as the time series algorithms VARMAX (Vector Autoregressive Moving Average with Exogenous Variables) and LSTM (Long Short-Term Memory) to RAS, we can predict the utilisation of servers. Time series algorithms can identify data trends and extract valuable insights from temporal data (such as historical server usage) to make accurate predictions. Deep learning techniques, such as Long Short-Term Memory (LSTM) neural networks, can effectively retain long-term dependencies, making them suitable for modelling noisy one-dimensional time series data and making accurate predictions over extended periods [9].
By predicting server utilisation, IT admins using RAS can take a proactive approach for timely actions, such as load balancing, resource allocation, or scaling down server capacities, which can help them optimise performance and lower operational costs. Currently, RAS lacks the functionality to forecast a server’s utilisation; implementing such a system will benefit RAS customers, as it can aid them in making decisions that reduce costs.
1 https://www.parallels.com/products/ras/remote-application-server/

1.2 Challenges
Traditional time series forecasting techniques, like Autoregressive Integrated Moving Average (ARIMA), utilise patterns observed in historical data to predict future outcomes. Such techniques may, however, not be ideal when the data is non-stationary or contains random fluctuations [4]. Traditional approaches are also computationally intensive and perform best only when working with stationary data, making them unsuitable for predicting constantly changing data such as CPU usage [9].
Time series algorithms can effectively capture and model data’s seasonality (recurring patterns that repeat at fixed intervals) and trends (patterns in data). Modelling seasonality and trends in time series data can be challenging as they constantly change over time. Selecting the appropriate time series forecasting algorithm for the specific data is essential, as different models are suitable for different types of data [8]. As indicated by Nguyen et al. [9], trends in data may also affect the performance of short-term and long-term predictions. For long-term predictions, LSTM models are better suited for detecting patterns in the input features and can effectively examine long sequences, unlike traditional regression methods, which can find it challenging to adjust when forecasting problems include multiple inputs and variables [2].

1.3 Aim
This research investigates the application of machine-learning techniques that identify potential improvements for a user’s RAS configuration setup. The system will determine which servers are being under-utilised based on the usage of resources. This information will assist users in making decisions that can help reduce costs. The company will provide the necessary data for this research, such as the historical records of server usage. The data used for this research will serve as the foundation for training and testing the time series algorithms to forecast future server utilization patterns.

1.4 Objectives
In order to reach the stated aim, the following objectives have been set:
O.1 Use statistical analysis to interpret how historical server data, such as counter usage (CPU, RAM and Disk), varies over time and across different sessions to identify any patterns and trends in the data.
O.2 Evaluate different time series forecasting algorithms that forecast the server utilisation by predicting the CPU, RAM and Disk usage based on historical data.
O.3 Use the predictions generated by time series forecasting algorithms to generate an overall usage score based on time and day and visualise it as a heatmap to highlight server under-utilisation patterns.

1.5 Proposed Solution
Through this research, a system using time series algorithms will be developed to forecast the resource usage of servers belonging to a RAS farm. The predictions made by the algorithm will be used to highlight whether a server is being under-utilised. The number of sessions connected to a server will be used as the algorithm’s input to predict the server’s CPU, RAM and disk read/write usage. Overall usage scores will be generated per hour for the upcoming week using the algorithm’s usage predictions of each resource. The usage scores will be visualised as a table heatmap to display how the server utilisation varies across different time intervals and days and also help identify periods where the server is under-utilised. Based on the usage score derived from the heatmap visualisation, the user could decide to switch off the server or remove it to reduce costs.
Different time-series algorithms will be evaluated to determine which algorithm performs better at predicting the CPU, RAM and disk usage using the number of sessions as input.

2 BACKGROUND
A time series refers to sequential observations made over time, such as daily share price and hourly air temperature readings. In time-series forecasting, the typical approach involves analyzing historical data, fitting a suitable model, and utilizing it to predict future values [3]. The four fundamental components that typically affect time series are trend, seasonal, cyclical, and irregular. The trend indicates whether a time series has a consistent upward or downward movement in the data over time. An example is the upward trend observed in population growth over the years. The seasonal component indicates whether seasonal factors influence time series data. An example is a pattern that recurs at fixed and known intervals, such as quarterly, yearly, monthly, weekly, or daily data. The cyclical pattern is when the data fluctuates for an extended period and does not adhere to a consistent pattern. An example is a few years of economic growth followed by a few years of recession, then economic growth again. The irregular component is when a time series contains fluctuations that arise from unforeseen factors and do not have a consistent pattern. An example of these fluctuations is when floods or earthquakes occur [6].
Predicting resource consumption can be considered a time series forecasting problem in which historical resource usage patterns are analyzed to anticipate future resource utilization [9]. Ullah et al. [22] explain that utilization prediction models can be categorized into three main categories: classical time-series analysis approaches (e.g., auto-regression and moving average), machine learning (e.g., support vector regression), and deep learning (e.g., long short-term memory).
The Autoregressive Integrated Moving Average Model (ARIMA) is a comprehensive time series model that incorporates both Autoregressive (AR) and Moving Average (MA) processes [24]. The ARIMA model has three parameters: 𝑝, 𝑑, and 𝑞. These parameters stand for Autoregression, Integrated, and Moving Average, respectively. Autoregression (𝑝) is a regression model that considers the relationships between an observation and lagged observations. Integrated (𝑑) measures the differences between observations at different times to make the time series stationary. Moving Average (𝑞) is a technique that considers the correlation between observations and residual error terms when a moving average model is applied to the lagged observations [19]. Vector Autoregression (VAR) is a statistical technique that extends the AR model to analyse multivariate time series data where multiple variables influence each other [18, 23]. In VAR, each variable is expressed as a function of the variables’ values at previous time points (lagged values), and all variables can influence each other due to the bidirectional relationship between variables [18].
The Vector Autoregressive Moving Average model with exogenous variables (VARMAX) expands on the ARMA/ARIMA model’s capabilities to enable the analysis of vector time series data with multiple variables and the ability to accommodate exogenous variables, which are independent of other variables [13]. VARMAX can effectively capture the dynamic relationship between dependent variables (endogenous) and the relationship between the dependent and independent variables (exogenous). This model is defined in terms of the orders of the AR (Autoregressive) or MA (Moving-Average) process or a combination of both [12].
Machine learning and deep learning algorithms are being used to solve multivariable time series forecasting problems as they have been found to perform better than traditional statistical analysis models on non-linear and complex datasets. A widely used machine learning technique is neural networks [9].
An Artificial Neural Network (ANN) is a computational model that functions similarly to biological nervous systems. It consists of multiple connected computational nodes, also known as neurons, that work together to learn from the input and optimize the output [10]. In an ANN, the connection between computational nodes is established through weights. Each neuron’s input is multiplied by a weight that influences the output produced by that unit. The learning process of an ANN occurs by adjusting the weights that
connect the neurons. The ANN modifies and refines the computed function by adjusting the weights between neurons to increase prediction accuracy with future iterations [1]. Each neuron receives weighted inputs from other neurons, which are then summed and passed to an activation function. The choice of activation function is a hyperparameter that affects the model’s performance. After all the layers have been processed, the deep neural network (DNN) produces an output which is then compared to the actual output to calculate an error. The backpropagation process then updates the weights using this error, as measured by the specified loss function. The loss function assesses the model’s performance [5].
Recurrent Neural Networks (RNNs) are a type of neural network used for sequential tasks due to their ability to retain and analyze past information. The high-dimensional hidden state and nonlinear dynamics of RNNs allow the hidden state to accumulate information over multiple timesteps and utilize it to make precise predictions [20]. Despite their capacity to model sequential dependencies, RNNs suffer from exploding or vanishing gradients, which make it difficult for them to capture long-term sequential patterns in the data. Long Short-Term Memory (LSTM) aims to mitigate the issues of vanishing and exploding gradients by replacing the hidden layer that uses “sigmoid” or “tanh” with memory cells regulated by gates that control the data flow to the hidden neurons and preserve the features from previous timesteps [17].

3 LITERATURE REVIEW
Jain et al. [7] use VARMAX in their research to predict the energy generated by a wind turbine using the data gathered from various sensors. They used a dataset consisting of a year of SCADA (Supervisory Control and Data Acquisition) data collected from a wind turbine. For the model features, they selected readings from ten sensors and reduced the data from 144 observations to one observation per day. They included exogenous variables in the model to account for external factors and unforeseen events that may independently affect the energy generated by the turbine. By including these exogenous variables, they account for additional factors that may impact the accuracy of forecasted outcomes. Using a year’s data, their VARMAX model forecast the energy generated for the following three months by considering exogenous and endogenous variables.
The VARMAX model has also been used by Rafi et al. [13] to predict the cash demands of individual ATMs within a network of ATMs belonging to a particular financial institution. The research utilized a dataset consisting of two and a half years of transactions from 7 different ATMs located in financially active areas in Pakistan. They first tested the modelled data for correlation and found that there was a highly positive relation between the transaction amount and transaction count. They performed a second test to verify that the dataset was stationary. The data was then used with the VARMAX model to predict the transaction amount and transaction count while considering exogenous variables such as salary week, weekdays and holidays. According to their findings, the VARMAX model they used performed better than the LSTM model previously implemented by Rajwani et al. [14] for the same problem and dataset.
Nguyen et al. [9] use LSTM in their research to forecast CPU usage in cloud workloads and determine the under- and over-utilization of cloud resources. They develop short-term and long-term LSTM prediction models to forecast this usage. The short-term model uses data from the previous day as an interval for its predictions, while the long-term model uses the previous week to make its predictions. To improve working with seasonal and unbalanced datasets, they use the average of the predicted and actual values of multiple time points to predict the workload. They also combine LSTM with transfer learning (TL) to optimise the model using the previous model’s learning knowledge. Their experiment used three scenarios of simulated datasets and two practical datasets. Each dataset included 20,160 data records consisting of the host, CPU usage and time over two weeks. The first scenario was used to evaluate daily trends in data; the second scenario was used to evaluate how the model adapts to different trends; and the third scenario was used to evaluate the weekly seasonality in the data. Their results indicated that the LSTM short-term and long-term models resulted in more accurate forecasts than traditional one-step, multi-step, and iterated prediction approaches when evaluating the loss function using mean squared error (MSE), root mean square error (RMSE), mean absolute percentage error (MAPE), and mean absolute error (MAE).
Similarly, Rao et al. [15] also use LSTM to forecast future resource usage based on demand and optimise resource management using this prediction. In their research, they implement and compare three different time series algorithm models to predict the CPU utilisation of a server. The three models compared and evaluated are Holt-Winters, ARIMA and LSTM. The dataset used in their research consisted of CPU usage collected from a server that periodically receives requests from two clients. Their research findings showed that the LSTM model performed the best, giving the lowest MAPE and RMSE.
Thonglek et al. [21] adopted a similar yet distinct methodology. Through their research, they implemented an LSTM model that predicts resource allocation in a data centre for each job based on historical data. Their objective was to enhance resource utilisation in data centres by forecasting the resource demands of individual applications. They implemented an LSTM model with two layers that each learn the relationship between the allocation and usage and the CPU and memory. To train their model, they used Google’s cluster usage data, which included information such as the resource allocation and usage of jobs carried out in a Google data centre. The model was integrated into a data centre’s resource manager to distribute the estimated resource allocation generated by the model to computing resources. They evaluated their model using Google’s cluster scheduler simulator, where the results showed CPU and memory utilisation enhancements when using their proposed model. Compared to a traditional resource manager, their model improved utilisation by 10.71% for CPU and 47.36% for memory.

4 METHODOLOGY
The system implemented for this research uses time-series algorithms to predict the utilisation of servers belonging to a RAS farm. The algorithms will predict each server’s resource usage and use those predictions to generate overall usage scores that are displayed
as a 7-day hourly forecast heatmap to highlight server utilisation patterns.

4.1 Data Collection
An Azure test environment with RAS installed was used to collect data. The data consisted of information about the usage of Remote Desktop Session Host (RDSH) servers that belonged to the environment’s RAS farm. The usage details consisted of each server’s CPU, RAM, and Disk Read/Write usage along with the server ID, the server’s active and disconnected sessions and the time recorded at that particular instance. This data was recorded from the environment’s RAS console every 15 minutes and stored in a database. In order to gather accurate and comprehensive data on server utilization, simulated scenarios were created and deployed on the test environment to mimic real-world server usage. These simulations were deployed using an automated script that connects to the Azure test environment and opens between 1 and 3 applications for a random duration of 1 to 5 hours. These simulations were designed to mimic user behaviour, as throughout the day, users tend to use various applications for specific durations. This script was executed multiple times a day on different machines. Using different machines simulates different users accessing the published applications on the RDSH servers belonging to the test environment’s RAS farm.
Three datasets were used for the training and evaluation of each model. Each dataset was built upon the previous one with approximately an additional week of data. The reason for using three datasets was to analyze the impact and performance of the models when adding more data. The first dataset consisted of data from approximately 48 days, recorded at each hour, consisting of 1146 records. The second dataset consisted of data from approximately 57 days, which included 1368 records, while the third dataset consisted of data from approximately 64 days, comprising 1535 records.

4.3 Objective 1 - Using statistical analysis to interpret server utilisation
As outlined in the first section, Objective 1 involved applying statistical analysis to the processed dataset (dataset 1) to identify patterns in data and relationships between features. The approach used for statistical analysis was similar to that of Pathak et al. [11], where a dashboard was created for data visualisation. The interactive dashboard was implemented using ‘Shiny’ 2, a framework which provides the ability to create a web application using Python. The dashboard lets users interact with the data through its graphical user interface. Users can select a server and adjust a particular date range to view its usage through interactive visualisations.
The dashboard’s first tab, as depicted in Figure 12 of Appendix A.1, consists of the server’s resource usage over time for each counter. The usage is represented as separate line graphs for CPU and RAM (Figure 1), disk read and write, and active and disconnected sessions.

Figure 1: Dashboard - Displays graphs that show how each counter’s usage varies over time

The dashboard’s second tab, as shown in Figure 13 of Appendix A.2, consists of a multi-line graph that highlights how each counter’s usage varies over time. Each line represents the usage of a different counter (Figure 2).
Figure 14 of Appendix A.3 shows the dashboard’s third tab. This tab contains a bar chart depicting each counter’s usage across different sessions (Figure 3).

Figure 3: Dashboard - Displays the total usage of each counter at different numbers of total sessions

The dashboard’s fourth tab, as shown in Figure 15 of Appendix A.4, provides a heatmap that visualises the server’s utilisation over a week. Like the previous tabs, users can select the server whose heatmap they would like to view. Users can also view the heatmap of servers assigned to a particular group in RAS. The heatmap (Figure 4) highlights the total number of sessions (active + disconnected sessions) at each hour, where the warmer colours indicate a higher usage (higher number of sessions).

Figure 4: Dashboard - Displays a heatmap that shows the total Server Utilisation

4.4 Objective 2 - Evaluate different time series forecasting algorithms
Objective 2, as listed in the first section of this paper, involved the implementation of time series algorithms that forecast the server utilisation by predicting the server counters’ usage. Each preprocessed dataset was split so that 80% would be the train set whilst the remaining 20% would be the test set. Both train and test sets were normalised within the range of 0-1 using the “MinMaxScaler” function from the ‘sklearn’ 3 library. Both algorithms were trained and tested on the same dataset, enabling a direct comparison of their performance.

4.4.1 Vector Autoregressive Moving-Average with Exogenous Regressors (VARMAX).
Similar to work conducted by Jain et al. [7] and Deepthi and Indiramma [13], the first algorithm implemented is VARMAX. The VARMAX algorithm is a multivariate time series model that can analyse the relationship between the dependent and independent variables using autoregressive (AR), moving-average (MA), or a combination of both. The ‘statsmodels’ 4 library was used to create the VARMAX model. When defining the model, the following parameters were passed: exogenous variables, endogenous variables, order and trend. The exogenous (independent) variable defined was the total sessions, while the endogenous (dependent) variables were the CPU, RAM and Disk Read/Write. The order was specified as (𝑝, 𝑞), where the first value represents the AR order, and the second value represents the MA order. The trend parameter is used to specify the pattern or trend observed in the data over time. The fit() method was utilised to fit the model to the training data through an iterative optimisation algorithm that identifies the optimal values for the model parameters. After training, the VARMAX model generates predictions for the test set using the predict() method. The predictions were inverse-transformed to the original scale and then placed in a data frame with the same index as the test set.

4.4.2 Long Short-Term Memory (LSTM).
The second algorithm implemented for this research was an LSTM network using a similar approach to that of Rao et al. [16] and Nguyen et al. [9]. As outlined in the background, LSTM is an RNN that uses memory cells and input/output gates to control the flow of information within the network.
A function was implemented to iterate through each row in the dataset and create input-output sequences for the training and testing of the LSTM model. In these input-output sequences, each input sequence consists of a fixed number of historical time steps (specified by the look-back parameter) used as input to predict the next value in the time series. The look-back parameter specifies the number of previous time steps to use as input for predicting the next time step. The corresponding output sequence consists of the next time step value after the input sequence. In order to predict the server utilization, the past values of the number of sessions were used as the input sequence, whilst the corresponding CPU, RAM and disk usage was used as the output sequence. The look-back period in this implementation was set to 48, equivalent to two days, as the dataset has an hourly frequency. For the implementation of the model, the ‘Sequential’ class from the Keras 5 library was used to create a linear stack of layers that allow data to flow from one layer to another. An LSTM layer was added to the Sequential model, where the number of neurons and the input shape expected by the LSTM layer were specified. The input shape represents the shape of the input data, where each input has look-back timesteps with one feature dimension (number of sessions).
3 https://scikit-learn.org/stable/
4 https://www.statsmodels.org/dev/index.html
5 https://keras.io/
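The look-back windowing described in Section 4.4.2 can be illustrated with a minimal sketch. It is not the code used in this research: the function name `make_sequences` and its argument names are hypothetical, and it assumes the session counts and the four resource counters are already aligned hourly arrays of equal length. Each input window holds the previous `look_back` session counts, and the matching output is the resource usage at the hour that follows the window.

```python
import numpy as np


def make_sequences(sessions, targets, look_back=48):
    """Build input/output pairs for a look-back window (illustrative only).

    sessions: 1-D array of hourly session counts (the single input feature).
    targets:  2-D array with one row per hour; the columns are assumed to be
              CPU, RAM, disk read and disk write.
    """
    sessions = np.asarray(sessions, dtype=float)
    targets = np.asarray(targets, dtype=float)
    if len(sessions) != len(targets):
        raise ValueError("sessions and targets must cover the same hours")
    if len(sessions) <= look_back:
        raise ValueError("need more rows than look_back")

    inputs, outputs = [], []
    for start in range(len(sessions) - look_back):
        end = start + look_back
        inputs.append(sessions[start:end])  # the previous look_back hours
        outputs.append(targets[end])        # resource usage at the next hour

    # Recurrent layers in Keras take input shaped (samples, timesteps, features).
    X = np.array(inputs).reshape(-1, look_back, 1)
    y = np.array(outputs)
    return X, y
```

With 60 hourly rows and a look-back of 48, this yields 12 windows: `X` has shape (12, 48, 1) and `y` has shape (12, 4), which matches an input shape of look-back timesteps with one feature and four predicted values per window.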
The output of the LSTM layer is a sequence of hidden states, one for each time step in the input sequence. This output is used as the input to the dense layer. The dense layer includes four neurons that produce four output values for each input sequence. Each value represents the predicted CPU, RAM, Disk Read and Disk Write. The model was then configured for training by specifying the loss and optimiser parameters. The loss parameter represents the loss function used to calculate the error between the predicted and actual values. The optimiser parameter was used to update the neural network weights during training.

4.5 Objective 3 - Use the predictions to generate an overall usage score and create a heatmap highlighting the server’s under-utilisation
As outlined in the first section, Objective 3 consisted of using the predictions generated (for the next 7 days) by the VARMAX and LSTM models to calculate the total usage for each hour. The total usage for each hour was determined by adding the CPU and RAM usage, multiplying by 100, and dividing by 200 to obtain a percentage value representing the total usage. The total usage at each hour was visualised in a heatmap table with seven columns representing the days of the week and 24 rows representing each hour. A colour scale is defined on the heatmap to map the total usage values to different colours. Lower percentages indicating low server utilisation patterns were assigned light shades of orange, whilst higher percentages indicating high utilisation were assigned warmer shades of red.

5 FINDINGS AND DISCUSSION OF RESULTS

5.1 Hyper-parameter Optimization
Determining the optimal hyperparameters consisted of using Optuna 6, a Python framework that uses optimisation methods to find the best hyperparameters that achieve optimal performance. An objective function was created for each model to evaluate the model’s performance on different sets of hyperparameters using the Root Mean Square Error (RMSE). Optuna used this objective function to minimise the RMSE by iteratively exploring the hyperparameters defined in the search space for a predefined number of iterations (trials) set to 100. This process was repeated for the VARMAX and LSTM models to determine the optimal parameters using each of the three datasets. Each dataset was split into train (60%), validation (20%), and test (20%) sets. During each trial, the model was trained on the train set using a specific set of hyperparameters and evaluated using those parameters on the validation set.
For the VARMAX model, Optuna was used to optimise the order, trend, enforce stationarity, and enforce invertibility parameters. In the search space, the order parameter, representing the autoregressive (𝑝) and moving-average (𝑞) orders, was specified within a range of 0 to 5. The trend parameter included options consisting of trends that are constant (𝑐), linear (𝑡), both (𝑐𝑡) or none (𝑛). The enforce stationarity parameter, which defines whether stationarity is enforced in the AR component, was defined with either true or false values, whilst the enforce invertibility parameter, which defines whether invertibility is enforced in the MA component, was also defined with boolean values. Table 1 shows the optimal hyperparameters for the VARMAX model when using each of the three datasets. The optimal hyperparameters listed in the table resulted in the lowest RMSE value. The table’s results indicate that stationarity and invertibility were enforced in all three datasets, whilst 𝑝, 𝑞, and the trend varied across datasets. The table also shows the RMSE scores achieved using the hyperparameter combination for each dataset. The results indicated that dataset 3 achieved the lowest RMSE score of 8.45.

Table 1: VARMAX Optimal Hyperparameters

                       Dataset 1  Dataset 2  Dataset 3
p                      2          0          0
q                      2          5          4
Trend                  c          t          n
Enforce Stationarity   True       True       True
Enforce Invertibility  True       True       True
RMSE Score             11.74      12.19      8.45

Optuna was also used to optimise the LSTM model’s parameters. The parameters optimised consisted of the following: the number of units, dropout rate, activation, optimiser and batch size. For the number of units, the search space was defined as a range of values between 16 and 256. The dropout rate was defined as a range of values between 0.0 and 0.5. The activation parameter was set to include the following categorical options: Rectified Linear Unit (ReLU), Hyperbolic Tangent Function (Tanh) and sigmoid. The optimiser was also defined as categorical options consisting of the Adam and Root Mean Square Propagation (RMSProp) optimisers. For the batch size parameter, the following values were defined in the search space: 16, 32, 64, 128.
Table 2 shows the LSTM’s optimal hyperparameters achieved using Optuna across the three datasets. The table shows that the optimal number of units increased as the size of the dataset increased, suggesting that a larger LSTM layer helps the model capture the data’s complex patterns as more data becomes available. On the other hand, the dropout rate, activation function, optimizer, and batch size varied across each dataset as the data increased by an additional week. Similar to the previous table, the optimal parameters were determined by the RMSE score. The table indicates that dataset 1 achieved the lowest RMSE score of 8.79.

Table 2: LSTM Optimal Hyperparameters

              Dataset 1  Dataset 2  Dataset 3
Units         108        182        233
Dropout Rate  0.47       0.50       0.40
Activation    sigmoid    relu       sigmoid
Optimizer     rmsprop    adam       adam
Batch Size    16         32         16
RMSE Score    8.79       9.32       9.58

6 https://optuna.readthedocs.io/en/stable/
6
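The tuning procedure described above (a chronological 60/20/20 split, a search space for the VARMAX parameters, and 100 trials that minimise the validation RMSE) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: a plain random search stands in for Optuna's sampler so that the example needs only the standard library, and `toy_validation_rmse` is a hypothetical placeholder for fitting a VARMAX model and scoring it on the validation set. The synthetic `hourly_cpu` series is invented.

```python
import math
import random


def chronological_split(series, train_pct=60, val_pct=20):
    """Split a time-ordered sequence into train/validation/test without shuffling."""
    n = len(series)
    train_end = n * train_pct // 100
    val_end = n * (train_pct + val_pct) // 100
    return series[:train_end], series[train_end:val_end], series[val_end:]


def rmse(actual, predicted):
    """Root Mean Squared Error between two equally long sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def sample_varmax_params(rng):
    """Draw one configuration from the search space described in the text."""
    return {
        "order": (rng.randint(0, 5), rng.randint(0, 5)),  # (p, q), each in 0..5
        "trend": rng.choice(["c", "t", "ct", "n"]),
        "enforce_stationarity": rng.choice([True, False]),
        "enforce_invertibility": rng.choice([True, False]),
    }


def toy_validation_rmse(params, train_set, val_set):
    """Hypothetical objective: forecast with the mean of the last p+1 training values.

    A real objective would fit the model with `params` and forecast the validation set.
    """
    p, _q = params["order"]
    window = train_set[-(p + 1):]
    forecast = [sum(window) / len(window)] * len(val_set)
    return rmse(val_set, forecast)


def random_search(series, n_trials=100, seed=0):
    """Keep the configuration with the lowest validation RMSE over n_trials draws."""
    rng = random.Random(seed)
    train_set, val_set, _test_set = chronological_split(series)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = sample_varmax_params(rng)
        score = toy_validation_rmse(params, train_set, val_set)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Invented data: three weeks of hourly CPU usage with a daily cycle.
hourly_cpu = [50 + 30 * math.sin(2 * math.pi * h / 24) for h in range(24 * 21)]
best_params, best_score = random_search(hourly_cpu)
print(best_params, round(best_score, 2))
```

The test set is left untouched during the search, which matches the description above: only the validation RMSE drives the choice of parameters.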
5.2 Evaluation of VARMAX and LSTM models using different forecast horizons

Similarly to how Duggan et al. [4] evaluated the accuracy of a neural network using different
time steps to predict CPU utilisation, we also experimented using different forecast horizons
for both VARMAX and LSTM models. The forecast horizon represents the number of time steps into
the future to predict. To evaluate the performance of the models in predicting server
utilisation across different durations, the models were trained and evaluated using three
different forecast horizons: 48 hours (2 days), 120 hours (5 days), and 168 hours (7 days). For
each forecast horizon, the models were trained using their respective optimal parameters on the
complete train set (the train and validation sets combined) and evaluated on the test set using
Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) as metrics. This process was
repeated for each dataset.

Figure 5 displays the performance of the VARMAX model across different forecast horizons using
each dataset. The bar chart shows that, in every dataset, the 2-day forecast horizon had lower
metric values than the 5-day and 7-day forecast horizons, indicating that the model performed
better when making short-term predictions. In each dataset, the 5-day and 7-day forecast
horizons had similar metric values. This similarity might indicate that the model reached a
point where it captured most of the important information and patterns within the first few
days and shows minimal improvement for longer forecast horizons.

Figure 5: VARMAX's MAE and RMSE Metric Performance using different forecast horizons

The VARMAX results indicate that, for each forecast horizon, the MAE and RMSE values tended to
decrease as more data was added from one dataset to the next. There was, however, an anomaly at
the 2-day forecast horizon in Dataset 3, where the MAE and RMSE values increased compared to
Dataset 2. The overall decrease in MAE and RMSE shows that the model made more accurate
predictions (closer to the actual values) with each additional week of data. This suggests that
the VARMAX model was learning more about the patterns and trends in the data as more data was
made available.

In Figure 6, we can see the performance of the LSTM model across different forecast horizons
for each dataset. The chart shows that the model's MAE and RMSE values varied only slightly
across forecast horizons, which suggests that the forecast horizon had little influence on the
LSTM model's prediction accuracy.

Figure 6: LSTM's MAE and RMSE Metric Performance using different forecast horizons

The LSTM's performance across datasets for each forecast horizon indicates that the model's
performance remained stable and consistent as more data was added between datasets. The results
suggest that the LSTM model effectively captured patterns and trends in the data and reached a
convergence point.

Figure 7: Average VARMAX and LSTM's MAE Performance across different forecast horizons

Figure 7 exhibits the average MAE across the three datasets for each forecast horizon of both
VARMAX and LSTM models, while Figure 8 exhibits the average RMSE for the two models. When
comparing the results of the two models, the VARMAX model outperforms the LSTM model for the
2-day forecast horizon, with MAE and RMSE values lower than those of the LSTM. On the other
hand, for the 5-day and 7-day forecast horizons, the LSTM model performed better, with lower
MAE and RMSE values than the VARMAX model, which indicates that the LSTM model predicted
long-term forecasts better than the VARMAX model.
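The per-horizon evaluation described in this section can be sketched as follows. This is only an illustration under stated assumptions: the report does not say exactly how the test set was sliced, so the sketch assumes that each horizon is scored on the first 48, 120, or 168 hourly steps of the test period. The function name `evaluate_horizons` and the synthetic series are hypothetical.

```python
import math

# Forecast horizons from the report, expressed in hourly time steps.
FORECAST_HORIZONS = {"2 days": 48, "5 days": 120, "7 days": 168}


def mae(actual, predicted):
    """Mean Absolute Error: the average size of the errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error: penalises large errors more heavily than MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def evaluate_horizons(actual, predicted, horizons=FORECAST_HORIZONS):
    """Score the first `steps` hours of the test set for every horizon (assumed protocol)."""
    scores = {}
    for name, steps in horizons.items():
        if steps > len(actual) or steps > len(predicted):
            raise ValueError(f"test set is shorter than the {name} horizon")
        scores[name] = {
            "MAE": mae(actual[:steps], predicted[:steps]),
            "RMSE": rmse(actual[:steps], predicted[:steps]),
        }
    return scores


# Synthetic one-week test set: a daily CPU cycle, and a forecast whose error
# grows by 0.05 per hour (values invented purely for illustration).
actual = [50 + 30 * math.sin(2 * math.pi * h / 24) for h in range(168)]
predicted = [a + 0.05 * h for h, a in enumerate(actual)]

scores = evaluate_horizons(actual, predicted)
for name, metrics in scores.items():
    print(f"{name}: MAE={metrics['MAE']:.2f} RMSE={metrics['RMSE']:.2f}")
```

Because RMSE squares each error before averaging, it is never smaller than MAE on the same data, and a large gap between the two points to a few large errors rather than a uniform offset.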
generated using the VARMAX predictions (Figure 10) shows that the server will be used from
9 am to 4 pm from Monday to Wednesday, whilst on Thursday and Friday, the server is expected
to be used from 9 am to 6 pm. The overall start time of the server's utilisation aligns
closely with the actual data, whilst the end time, which indicates the time the server is no
longer utilised, shows a discrepancy: on some days, the VARMAX model predicts a finish time
three hours earlier than the actual one. Compared to the actual usage, the total usage of the
VARMAX heatmap appears considerably lower, indicating that the model tends to underestimate
the server utilisation levels on average.
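The comparison above can be made concrete by reducing each day of an hourly heatmap to its first and last active hour and then comparing the predicted window with the actual one. The sketch below is a hypothetical illustration: the report does not state how utilisation levels were thresholded, so the `threshold` value, the helper names, and the example day are all assumptions.

```python
def active_window(hourly_usage, threshold=10.0):
    """Return (first_active_hour, end_hour) for one day, or None if never active.

    `end_hour` is the hour after the last active one, so (9, 16) reads "9 am to 4 pm".
    """
    active_hours = [hour for hour, usage in enumerate(hourly_usage) if usage >= threshold]
    if not active_hours:
        return None
    return active_hours[0], active_hours[-1] + 1


def day_profile(start, end, level, idle=2.0):
    """Build 24 hourly values: `level` between start and end, `idle` otherwise."""
    return [level if start <= hour < end else idle for hour in range(24)]


# Invented example day: actual usage runs from 9 am to 7 pm, while the forecast
# stops at 4 pm and sits at a lower level.
actual_day = day_profile(9, 19, level=45.0)
predicted_day = day_profile(9, 16, level=30.0)

actual_start, actual_end = active_window(actual_day)
predicted_start, predicted_end = active_window(predicted_day)

start_gap = predicted_start - actual_start        # hours between the two start times
end_gap = actual_end - predicted_end              # hours by which the forecast finishes early
usage_gap = sum(actual_day) - sum(predicted_day)  # positive when the forecast underestimates

print(f"start gap: {start_gap} h, end gap: {end_gap} h, usage gap: {usage_gap:.1f}")
```

Summing the hourly values per day gives a simple check for systematic underestimation, while the window comparison isolates timing errors from level errors.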
Figure 12: Complete Dashboard UI that displays each counter's usage over time
A.4 Dashboard's Server Utilization
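B RESULTS
B.1 VARMAX's Performance per Counter

Table 3: VARMAX's MAE and RMSE Performance (Per Counter) across each dataset

Dataset     Forecast Horizon      Counter          MAE     RMSE
Dataset 1   2 Days (48 hours)     CPU              0.09    0.09
Dataset 1   2 Days (48 hours)     RAM              0.19    0.20
Dataset 1   2 Days (48 hours)     DiskReadTime     0.01    0.01
Dataset 1   2 Days (48 hours)     DiskWriteTime    0.01    0.01
Dataset 1   5 Days (120 hours)    CPU              11.28   17.55
Dataset 1   5 Days (120 hours)    RAM              34.54   56.20
Dataset 1   5 Days (120 hours)    DiskReadTime     3.45    6.92
Dataset 1   5 Days (120 hours)    DiskWriteTime    0.87    2.09
Dataset 1   7 Days (168 hours)    CPU              12.09   17.43
Dataset 1   7 Days (168 hours)    RAM              37.62   56.21
Dataset 1   7 Days (168 hours)    DiskReadTime     3.93    7.61
Dataset 1   7 Days (168 hours)    DiskWriteTime    0.88    2.03
Dataset 2   2 Days (48 hours)     CPU              0.09    0.09
Dataset 2   2 Days (48 hours)     RAM              0.21    0.21
Dataset 2   2 Days (48 hours)     DiskReadTime     0.01    0.01
Dataset 2   2 Days (48 hours)     DiskWriteTime    0.02    0.02
Dataset 2   5 Days (120 hours)    CPU              6.52    9.23
Dataset 2   5 Days (120 hours)    RAM              31.54   45.02
Dataset 2   5 Days (120 hours)    DiskReadTime     1.57    3.30
Dataset 2   5 Days (120 hours)    DiskWriteTime    0.31    0.77
Dataset 2   7 Days (168 hours)    CPU              5.81    8.19
Dataset 2   7 Days (168 hours)    RAM              28.05   39.87
Dataset 2   7 Days (168 hours)    DiskReadTime     3.21    6.95
Dataset 2   7 Days (168 hours)    DiskWriteTime    0.36    0.87
Dataset 3   2 Days (48 hours)     CPU              1.21    3.56
Dataset 3   2 Days (48 hours)     RAM              1.39    5.84
Dataset 3   2 Days (48 hours)     DiskReadTime     2.98    7.95
Dataset 3   2 Days (48 hours)     DiskWriteTime    0.09    0.37
Dataset 3   5 Days (120 hours)    CPU              1.88    4.66
Dataset 3   5 Days (120 hours)    RAM              6.64    16.12
Dataset 3   5 Days (120 hours)    DiskReadTime     4.51    12.07
Dataset 3   5 Days (120 hours)    DiskWriteTime    0.40    1.96
Dataset 3   7 Days (168 hours)    CPU              2.76    6.44
Dataset 3   7 Days (168 hours)    RAM              8.03    17.89
Dataset 3   7 Days (168 hours)    DiskReadTime     4.86    12.14
Dataset 3   7 Days (168 hours)    DiskWriteTime    0.53    2.25

B.2 LSTM's Performance per Counter

Table 4: LSTM's MAE and RMSE Performance (Per Counter) across each dataset

Dataset     Forecast Horizon      Counter          MAE     RMSE
Dataset 1   2 Days (48 hours)     CPU              6.48    10.07
Dataset 1   2 Days (48 hours)     RAM              11.43   17.22
Dataset 1   2 Days (48 hours)     DiskReadTime     2.21    8.06
Dataset 1   2 Days (48 hours)     DiskWriteTime    0.31    1.37
Dataset 1   5 Days (120 hours)    CPU              7.13    10.63
Dataset 1   5 Days (120 hours)    RAM              13.24   21.41
Dataset 1   5 Days (120 hours)    DiskReadTime     2.81    8.46
Dataset 1   5 Days (120 hours)    DiskWriteTime    0.52    2.33
Dataset 1   7 Days (168 hours)    CPU              5.49    9.61
Dataset 1   7 Days (168 hours)    RAM              10.79   19.18
Dataset 1   7 Days (168 hours)    DiskReadTime     2.15    7.38
Dataset 1   7 Days (168 hours)    DiskWriteTime    0.55    2.57
Dataset 2   2 Days (48 hours)     CPU              5.28    9.20
Dataset 2   2 Days (48 hours)     RAM              16.02   22.21
Dataset 2   2 Days (48 hours)     DiskReadTime     2.90    6.75
Dataset 2   2 Days (48 hours)     DiskWriteTime    0.38    1.23
Dataset 2   5 Days (120 hours)    CPU              3.63    7.06
Dataset 2   5 Days (120 hours)    RAM              12.54   19.05
Dataset 2   5 Days (120 hours)    DiskReadTime     3.62    8.60
Dataset 2   5 Days (120 hours)    DiskWriteTime    0.31    1.12
Dataset 2   7 Days (168 hours)    CPU              3.03    6.17
Dataset 2   7 Days (168 hours)    RAM              10.38   17.36
Dataset 2   7 Days (168 hours)    DiskReadTime     2.76    7.54
Dataset 2   7 Days (168 hours)    DiskWriteTime    0.24    0.98
Dataset 3   2 Days (48 hours)     CPU              1.41    1.47
Dataset 3   2 Days (48 hours)     RAM              5.29    5.51
Dataset 3   2 Days (48 hours)     DiskReadTime     0.37    0.42
Dataset 3   2 Days (48 hours)     DiskWriteTime    0.02    0.02
Dataset 3   5 Days (120 hours)    CPU              4.20    8.36
Dataset 3   5 Days (120 hours)    RAM              9.95    15.67
Dataset 3   5 Days (120 hours)    DiskReadTime     5.39    14.86
Dataset 3   5 Days (120 hours)    DiskWriteTime    0.70    2.70
Dataset 3   7 Days (168 hours)    CPU              3.91    7.31
Dataset 3   7 Days (168 hours)    RAM              9.25    14.71
Dataset 3   7 Days (168 hours)    DiskReadTime     4.70    12.76
Dataset 3   7 Days (168 hours)    DiskWriteTime    0.64    2.33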
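The per-counter values in Tables 3 and 4 can be collapsed into one average per model and forecast horizon, which is one way to reproduce a comparison like the one in Figure 7. The sketch below is an assumption about the aggregation: the report does not state how the averages were formed, so an unweighted mean over the four counters and then over the three datasets is used here, and the result may not match the published figure exactly. Only the 2-day and 7-day MAE values are copied from the tables to keep the example short.

```python
from statistics import mean

# MAE per counter in the order CPU, RAM, DiskReadTime, DiskWriteTime,
# copied from Table 3 (VARMAX) and Table 4 (LSTM).
VARMAX_MAE = {
    "2 days": {
        "Dataset 1": [0.09, 0.19, 0.01, 0.01],
        "Dataset 2": [0.09, 0.21, 0.01, 0.02],
        "Dataset 3": [1.21, 1.39, 2.98, 0.09],
    },
    "7 days": {
        "Dataset 1": [12.09, 37.62, 3.93, 0.88],
        "Dataset 2": [5.81, 28.05, 3.21, 0.36],
        "Dataset 3": [2.76, 8.03, 4.86, 0.53],
    },
}
LSTM_MAE = {
    "2 days": {
        "Dataset 1": [6.48, 11.43, 2.21, 0.31],
        "Dataset 2": [5.28, 16.02, 2.90, 0.38],
        "Dataset 3": [1.41, 5.29, 0.37, 0.02],
    },
    "7 days": {
        "Dataset 1": [5.49, 10.79, 2.15, 0.55],
        "Dataset 2": [3.03, 10.38, 2.76, 0.24],
        "Dataset 3": [3.91, 9.25, 4.70, 0.64],
    },
}


def average_mae(per_dataset):
    """Mean over counters within each dataset, then mean over datasets (assumed scheme)."""
    return mean(mean(values) for values in per_dataset.values())


for horizon in VARMAX_MAE:
    varmax_avg = average_mae(VARMAX_MAE[horizon])
    lstm_avg = average_mae(LSTM_MAE[horizon])
    lower = "VARMAX" if varmax_avg < lstm_avg else "LSTM"
    print(f"{horizon}: VARMAX {varmax_avg:.2f}, LSTM {lstm_avg:.2f}, lower MAE: {lower}")
```

Under this averaging scheme, VARMAX has the lower average MAE at the 2-day horizon and the LSTM has the lower average at the 7-day horizon, which agrees with the comparison drawn in Section 5.2.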