Article info

Article history: Received 26 March 2021; Received in revised form 17 August 2021; Accepted 22 November 2021; Available online 5 December 2021.

Keywords: Virtual Machine Placement; Ant Colony Optimization; Swarm intelligence; Parallel MAX-MIN Ant System; Parallel Ant Colony Optimization

Abstract: The Virtual Machine Placement (VMP) problem is a challenging optimization task that involves the assignment of virtual machines to physical machines in a cloud computing environment. The placement of virtual machines can significantly affect the use of resources in a cluster, with a subsequent impact on operational costs and the environment. In this paper, we present an improved algorithm for VMP, based on Parallel Ant Colony Optimization (PACO), which makes effective use of parallelization techniques and modern processor technologies. We achieve solution qualities that are comparable with or superior to those obtained by other nature-inspired methods, with our parallel implementation obtaining a speed-up of up to 2002x over recent serial algorithms in the literature. This allows us to rapidly find high-quality solutions that are close to the theoretical minimum number of Physical Machines.

© 2021 Elsevier B.V. All rights reserved.
https://doi.org/10.1016/j.future.2021.11.019
J. Peake, M. Amos, N. Costen et al. Future Generation Computer Systems 129 (2022) 174–186
Machines from one Physical Machine to another in a seamless fashion, without disruption for the user [8]. This ability to migrate VMs therefore offers the possibility of optimization of their placement on servers. The fundamental question we address, therefore, is as follows: given a set of VMs, each with a specific resource footprint, and a number of PMs with individual capacities, what is the best allocation of VMs to PMs, such that the number of PMs is minimized? For scenarios where a small number of servers are available, determining the most efficient allocation for a small number of VMs can be trivial. However, services such as AWS have millions of users and hundreds of thousands of servers.

Virtual Machine Placement (VMP) is an NP-hard problem [9], in which the aim is to allocate Virtual Machines (VMs) to Physical Machines (PMs) as efficiently as possible. While VM features differ between variants of the problem, the two most typical attributes are memory (RAM) and processing (CPU). RAM requirements are generally measured in Gigabytes (GB), while CPU requirements are generally measured in either processor cores or MIPS (million instructions per second). Every VM has its own specific requirement for each, which means it occupies its own resource "footprint". The aim of the problem is to "legally" allocate every VM to a PM in such a way that the number of PMs required is minimized (that is, this version of the VMP is a variant of a bin packing problem). Here, we focus on the static variant of the VMP problem, where we need to allocate a fixed set of VMs to PMs.

Although the static variant of the VMP problem is relatively limited in applicability (as, in reality, cloud computing represents a dynamic situation in which virtual machines are added and removed at different times), it still offers a useful challenge, with a number of recent solutions in the literature for the purposes of comparison. Our focus here, and the key contribution of the current paper, lies in the efficient solution of such problems using parallelized and vectorized ACO. This work represents an evolution of our earlier work on applying ACO to the Travelling Salesman Problem (TSP) [10,11], and future work will adapt our methods to solve the dynamic version of VMP. We note that the pheromone evaporation mechanism of ACO could provide a suitable adaptation strategy for the dynamic problem, which has recently been shown to be effective on the dynamic version of the TSP [12].

R_j, where the sums are taken over the indices of all virtual machines i which are mapped to the physical machine j. The optimization problem seeks to find a feasible solution which maximizes the number of empty physical machines, which is equal to the cardinality of the set of indices j to which no virtual machine i is mapped.

An illustrative example of an instance of the static VMP problem is shown in Fig. 1, along with its solution. We have an initially unbounded number of PMs, each with a fixed CPU and RAM resource, and a number of VMs (VM1-5) to be allocated to PMs such that the total resource requirement on each PM does not exceed its capacity, and the number of PMs is minimized (in this case, to three).

2.2. Existing methods

As with many combinatorial optimization problems, a wide range of techniques exist to find solutions to VMP instances. As well as heuristic-based approaches such as Next Fit, First Fit, First Fit Decreasing (FFD) and Best Fit (BF) [13], more advanced optimization techniques have been successfully applied to the VMP problem; these include Genetic Algorithms (GA) [14–16], Particle Swarm Optimization [17], Q-Learning [18] and Ant Colony Optimization (ACO) [14,19–21].

A recently-published method for VMP, which provides one of the comparison baselines for the work presented here, is the IGA-POP genetic algorithm (GA) [22]. This frames the VMP as a Variable-Sized Bin Packing Problem (VSBPP), a variant of the Bin Packing Problem in which the container elements have differing capacities. In IGA-POP, a solution encodes an ordering of VM assignments to PMs. The fitness function for this algorithm prioritizes low power usage, and it performs competitively in terms of solution quality against the BF and First-Fit (FF) greedy algorithms, the Sine–Cosine Optimization Algorithm (SCA) [23] and a generic GA. For this reason, we select IGA-POP as being representative of the "evolutionary" algorithm class of solvers for VMP.

Our own method is based on Ant Colony Optimization (ACO) [24], which is an optimization metaheuristic modelled on the foraging behaviour of ants [25]. When ants leave the nest to look for food, they initially explore the local area; once food is located by an ant, it returns to the nest. On the return journey, the ant leaves a pheromone trail, which increases the probability
that other members of the colony will take that path to the food source. As each ant follows a trail, it also lays its own trail, strengthening the pheromone over time and causing more ants to follow it, in a process of positive feedback reinforcement.

This phenomenon is abstractly replicated by the ACO algorithm, with problem instances generally represented by graph structures, and with multiple "ants" searching for a solution by traversing its edges according to pheromone concentrations. As an ant traverses a problem graph, it uses a weighted random process to select its next move, in which solution components with a better combination of pheromone and heuristic are more likely to be selected. Pheromone "evaporates" over time, allowing unproductive paths to be eventually removed (thus preventing premature convergence).

Feller et al. [20] presented an early application of ACO to VMP. This treats VMP as a multi-dimensional bin-packing problem (MDBP), with the bins being the Physical Machines, and the items to be packed being the VMs. As reducing the number of PMs is the most effective way of reducing energy usage, the objective of the algorithm is to minimize the number of bins used. The algorithm fills bins (PMs) one-at-a-time, with each bin being closed when no remaining VMs can fit inside. Pheromone is deposited on Item–Bin pairs, with VMs being linked to the specific PM to which they are allocated. The heuristic is based on the total resource utilization of the PM if the current VM were to be assigned to it, and pheromone deposition is based on the average utilization of all utilized PMs. The Feller ACO technique outperforms First Fit Decreasing (FFD) in terms of energy usage, saving 4.1% on average. However, execution time for the algorithm is significant, ranging from 37.46 s for 100 VMs to 2.01 h for 600 VMs.

A more recent ACO-based VMP algorithm is OEMACS [21]. This adds two Local Search procedures: an exchange procedure similar to a local search procedure used for Bin Packing Problems [26], which swaps VMs between PMs in an attempt to find a more efficient configuration, and an insertion procedure, which attempts to remove a VM from one PM and insert it into another. OEMACS outperforms [20] in terms of both solution quality and execution time. We therefore compare our ACO algorithm against OEMACS, as it stands as a representative modern ACO algorithm for the VMP.

To summarize, we select OEMACS and IGA-POP as representative state-of-the-art ACO and GA solvers for the VMP, along with a standard heuristic, First-Fit.

Due to the inherently distributed nature of ACO (many "ants" effectively act independently, informed by their environment), parallelizing the algorithm is a well-researched topic. Early implementations of parallel ACO made use of distributed systems [29,30]. Later work on distributed ACO systems included the use of agent-based systems [31–34], decentralized algorithms [35] and, most recently, desynchronized parallel ACO has demonstrated scalability on up to 400 compute nodes [36].

The development of Nvidia's Compute Unified Device Architecture (CUDA) framework led to a significant number of GPU-based implementations [37–39]. The CUDA framework gives access to the powerful parallelization architecture offered by GPUs for graphics processing, which may be utilized for other purposes (this is known as General-Purpose computing on Graphics Processing Units (GPGPU)). Before CUDA, distributed systems were the only realistic method of parallelizing an algorithm, but CUDA allows for parallelization to be performed on a single machine.

The rising thread count on modern processors, along with the availability of Single Instruction Multiple Data (SIMD) vector operations such as the AVX512 instruction set on Intel Xeon Phi and Xeon processors, has also increased the viability of parallel ACO on CPUs [40–44]. SIMD is a class of parallel computing in which the same operation is performed on multiple data points simultaneously, allowing multiple operations to be performed in parallel. In the case of the AVX512 and the earlier AVX2 instruction sets, 16 and 8 operations respectively may be performed simultaneously. These instructions may be applied to code that is already parallelized, essentially reducing the number of operations required by up to a factor of 16, allowing for an even deeper level of parallel processing.

While running ants in parallel across a graph seems like a straightforward task, certain aspects of the ACO algorithm make parallelization difficult. In particular, the fundamental roulette-wheel selection technique used by ACO, where different paths are allocated a "slice" of a roulette wheel that is proportional to their favourability, is not amenable to parallelization. In order to overcome this issue, Cecilia et al. [37] developed a new data-parallel approach to ACO edge selection, known as I-Roulette. While the exact proportionality between probability and edge weights is not fully maintained, I-Roulette still accurately replicates the behaviour of Roulette Wheel selection in a parallel-compatible manner. I-Roulette was later adapted into vRoulette [42], which made use of the AVX512 vector instructions to further increase the efficiency of the algorithm.

In what follows we describe an alternative ACO algorithm for VMP which dramatically reduces run-time by harnessing the techniques described above, along with modern processor features.

3. Our ACO algorithm for VMP

3.1. Overview of ACO

As we base our algorithm on ACO, we now provide a brief overview of its operation. The first ACO algorithm (Ant System) was described by [27]. Many variants and applications of the algorithm have since been developed; however, the key features of ACO, shared by all variants, are that the algorithm uses a number of agents (ants) which independently construct solutions guided by a global pheromone matrix data structure (and, in some cases, a problem-specific heuristic). The representation of a solution typically takes the form of a subset of edges of a graph, and the solution construction phase of the algorithm involves each ant iteratively traversing the graph, building a feasible solution by selecting from the available edges at each step. Edges are selected using a random choice, weighted by the pheromone and heuristic values associated with the available edges. The pheromone update phase of the algorithm associates higher pheromone values with edges which are included in objectively "good" solutions, and evaporates pheromone from older, less successful, solution components. In what follows, we base our algorithm on the MAX-MIN Ant System (MMAS) ACO variant [28].

3.2. Data structures and algorithm design

In this Section we describe our algorithm for Parallel Ant Colony Optimization for Virtual Machine Placement (PACO-VMP), which combines a standard ACO variant with existing parallelization techniques and SIMD vector operations. Our method is based on the MMAS variant of ACO, as this is the version that is most amenable to parallelization (due to the absence of communication between ants during an iteration). We have made complete reference code available online.1 Our notation is defined in Table 1, a high-level pseudocode description of the method is supplied in Algorithm 1, and a detailed description of the algorithm is given in Fig. 4.

1 https://github.com/jnpeake/PACO-VMP.
Fig. 2. Schematic representation of the solution graph and the solution construction process carried out by the ants. Edges represented by bold lines correspond to edges in the solution; possible edges are in light grey.

Fig. 3. A demonstration of how the vRoulette-1 technique combines the heuristic and pheromone values of a PM with a random number between 0 and 1. AVX2 instructions allow operations to be carried out on each vector lane (numbered 0–7) simultaneously.
When constructing a solution, ants visit each virtual machine in turn and allocate it to a physical machine. The graph representation of the solution is therefore a complete bipartite graph between the two sets of vertices representing the virtual and physical machines respectively. Solution construction corresponds to the ants selecting NVM edges from this graph, with one edge attached to each of the NVM vertices on one side of the graph representing the virtual machines. The graph is illustrated schematically in Fig. 2.

The key data structures used by the algorithm are:

1. The pheromone matrix, a square (NVM × NVM) matrix of floating point numbers which holds the pheromone values τij corresponding to placing two VMs, i and j, in the same PM in a solution.
2. An array of ant data structures, which each contain a representation of a solution (an array of arrays of integers, which lists the indices of the VMs assigned to each PM).

At the highest level of description, the algorithm proceeds as shown in Algorithm 1; each phase of the algorithm is discussed in detail in the subsections which follow.

Algorithm 1 High-level description of proposed algorithm.

Initialization Phase
for each Iteration do
    for each Ant do
        // Solution Construction
        Randomize order of VM list
        for each VM in VM list do
            // allocate VM to PM
            Calculate pheromone and heuristic for each PM
            if placement feasible then
                Place VM with weighted random selection
            else
                Allocate VM to PM with most available space
            end if
        end for
    end for
    Apply Local Search to iteration-best Ant
    if Iteration-best solution is feasible then
        Update Global Best Solution
    end if
    Pheromone Update
end for

3.2.1. Initialization phase

In this phase, the parameters and structures required by the algorithm are created and initialized. An important first step is to ensure that any arrays used for vectorized computations are padded correctly, which prevents errors when they are loaded into vectors. As our implementation uses the Intel AVX2 instructions, which operate on 8 32-bit values at a time, the size of each array must be a multiple of 8. The arrays also need to be aligned in memory in order to be correctly loaded into AVX2 vectors. The pheromone matrix is a matrix of size NVM × NVM, with NVM being the number of Virtual Machines in the problem instance. The values of the pheromone matrix are initially set to τ0 = 1/NPM, where NPM is the number of Physical Machines. In this phase we also set the value of the MMAS constant a (see Eq. (11)), which is later used to determine the maximum and minimum pheromone values. The number of PMs is initially set to be equal to the number of VMs.

3.2.2. Solution construction

The first step of the solution construction phase is to randomly shuffle the VMs. This happens at the beginning of each iteration in order to prevent VMs being allocated to the same PM purely due to their position in the array. OpenMP is used to allocate each ant's construction process to a separate thread. As each ant only reads from global pheromone memory during the construction phase, and does not write to memory, synchronization is not required. During the construction phase, the ants loop through every VM and allocate it to a PM, unless the current VM does not fit in any remaining PM. Any VMs left un-allocated at the end of the loop are then allocated to the PM with the most available capacity, creating an infeasible solution. A Local Search procedure, fully described in a later section, is then applied to the solution in an attempt to make it feasible.

The selection procedure used to allocate VMs is based on the vRoulette-1 technique developed by Lloyd & Amos [42]. This is illustrated in Fig. 3, which shows how the Heuristic and Pheromone values (which we describe in detail later) of the PM in each vector lane are combined with a random number between 0 and 1. This is then multiplied by a Tabu value, which is set to 0 or 1 (the value is only set to 0 in the instance that the "PM" in that lane is actually just a placeholder used to pad the PM list to a multiple of 8), and then masked by vectors (denoted MaxCPUMask and MaxRAMMask) that filter out any PMs that do not have enough available capacity for the current VM. This is done on a vector-by-vector basis, with 8 PMs being processed for selection in parallel. The 8 current PM values are compared lane-by-lane with a vector of the highest PM values in the current selection process. Once every PM has been processed, a parallel reduction (with the max operator) is carried out on this vector, and the PM corresponding to the highest value is then assigned to the current VM. If the highest value is lower than 0, this indicates that no PMs had enough capacity available for the current VM, and the VM is added to the unassigned list, to be allocated once the solution construction procedure has been completed.
3.2.4. Local Search

Local Search is a procedure that takes a candidate solution generated by an optimization algorithm and applies small perturbations to it in order to find a local minimum with respect to some neighbourhood. The term "Local Search" refers to a vast array of usually problem-specific techniques for making these small changes. Local Search techniques are widely used in conjunction with ACO to good effect, and they are essential for creating solutions that are optimal or near-optimal. Local Search takes place after the solution construction phase, once the iteration-best solution has been determined. Following [21], we perform local search only on the iteration-best solution. However, we note (from our experimental results) that the time taken by the local search phase was relatively short (less than 1% of execution time on our largest instances), and that local search could (in principle) be parallelized so that each ant performs this step on its own thread. We followed [21] for a fair comparison, but note that in future work we may obtain further improved solution performance by conducting local search on all candidate solutions.

Our Local Search is based on a technique developed by Alvim et al. [26] for the Bin Packing Problem, and also utilized by Liu et al. [21] for the VMP. In this algorithm, after each solution is found, one bin is destroyed. If a subsequent solution is then able to successfully fit all items in the remaining bins, it is considered feasible. However, if no feasible solution can be found, the local search technique is applied. There are two phases of our technique, the swap phase and the insertion phase. Any PM that has been allocated more VMs than it has capacity for is marked as overloaded. In the swap phase an overloaded PM is compared with every non-overloaded PM, and the algorithm attempts to swap each VM in the overloaded PM with each VM in the non-overloaded PM. This continues until either a successful swap takes place, or every non-overloaded PM has been compared to the overloaded PM. Regardless of the outcome, the process is carried out again for the next PM, and this continues until every overloaded PM has been compared. If the swap phase is unable to successfully find a feasible solution, the insertion phase is then performed. In this phase, each overloaded PM attempts to allocate each of its VMs to a non-overloaded PM. While this is far less likely to produce positive results than the swap phase, it is still able to occasionally make progress when the swap phase cannot.

3.2.5. Pheromone update

The final phase of our ACO algorithm is the deposition of pheromone. As our algorithm is based on the MMAS ACO variant, pheromone is deposited only by the global-best ant. As mentioned previously, pheromone is distributed on edges connecting VMs allocated to the same PM. The pheromone matrix is updated using

\tau_{ij} \leftarrow \tau_{ij} + \frac{K_P}{P} \qquad (6)

where K_P is a constant, for all pairs of VMs i, j which are allocated to the same PM in the global best solution, and where P is the power usage of the solution. In our runs we used K_P = 365. In practice, this is a constant which scales power usage into the typical range of values for \tau_{min} and \tau_{max}, defined below. The actual value of this parameter may be varied depending on the units used for power in the instance definition, and can be considered a hyper-parameter of the method. The power usage is defined as

P = \sum_{j=1}^{N_{PM}} \left[ (P_j^{max} - P_j^{idle}) \frac{C_j^{used}}{C_j^{cap}} + P_j^{idle} \right] \qquad (7)

where N_{PM} is the number of PMs in the current instance and P_j^{max} and P_j^{idle} are the maximum and idle power usage of physical machine j respectively. We choose a definition of pheromone based on power usage, as this reflects the positive impact of a lower number of PMs while still measuring differences between solutions with the same number of PMs used. The global amount of pheromone then decays by a static amount, controlled by the parameter \rho,

\tau_{ij} \leftarrow \tau_{ij}(1 - \rho) \quad \forall i, j \in [1, N_{VM}]. \qquad (8)

The choice of value of \rho will be discussed in Section 4.

MMAS utilizes a clamping procedure to prevent stagnation, by restricting the level of pheromone to be between maximum and minimum values. The maximum and minimum values are defined as

\tau_{max} = \frac{1}{\rho N_{best}^{global}} \qquad (9)

\tau_{min} = \tau_{max} \frac{2(1 - a)}{(N_{VM} + 1)a} \qquad (10)

where N_{VM} is the number of VMs in the current instance, \rho is the evaporation parameter and

a = \exp(\ln(0.05)/N_{VM}). \qquad (11)

This clamping is applied to the whole matrix after evaporation.

4. Experimental evaluation

We investigate the performance of our algorithm by comparing it with an implementation of the OEMACS algorithm (which is an ACO-based method that generally out-performs conventional heuristics and evolutionary algorithms for this problem [21]), and a state-of-the-art genetic algorithm, IGA-POP [22]. Code for OEMACS is publicly available.2 All algorithms were implemented in C++, and all tests were carried out on a machine with an Intel® Xeon® E5-2640 v4 processor with 20 cores running at a base frequency of 2.4 GHz and a maximum frequency of 3.4 GHz. Code was compiled using the GNU C++ compiler (g++), with O2 optimization enabled. The initial comparative tests compare a serial implementation of PACO-VMP with the serial algorithms OEMACS and IGA-POP, with the aim of comparing the quality of solutions obtained. Two variants of IGA-POP are used: the first, referred to as GA1, uses the fitness function also used by PACO-VMP and OEMACS; the second, referred to as GA2, uses a slightly modified version of the fitness function used in the initial IGA-POP experiments [22]. Finally, to evaluate the impact of our OpenMP parallelization, we also run a parallelized PACO-VMP using OpenMP, assigning one ant to each of our 20 cores. While the execution time differs, solution quality is identical to the serial version.

4.1. Problem instances

All problem instances used in our experiments were randomly generated. For each instance, the initial number of Physical Machines is set to be equal to the number of Virtual Machines. Our dataset consists of three sets of 600 VMP instances, each further split into 6 subsets of 100 instances with 100, 200, 300, 400, 500 and 1000 VMs. The three sets (A, B and C) are described in the following subsections, and differ in terms of either the inclusion of bottlenecks in one or other resource, or the homogeneity of the physical machines in the instance setup. One run is performed using each instance. We use one run per instance with a larger

2 https://github.com/Budding0828/OEMACS.
4.3. Results

Fig. 5. Solution difference measured as percentage over theoretical optimum for PACO-VMP, OEMACS, GA1, GA2 and FF for instance set A.

Fig. 6. Average time to solve (seconds) for 100 instances of a given instance size for PACO-VMP, OEMACS, GA1 and GA2 for instance set A.

Fig. 8. Average time to solve (seconds) for 100 instances of a given instance size for PACO-VMP, OEMACS, GA1 and GA2 for instance set B.

Fig. 9. Solution difference measured as percentage over theoretical optimum for PACO-VMP, OEMACS, GA1, GA2 and FF for instance set C.
Table 2
Solution quality results of our experiments on FF, OEMACS, GA and PACO-VMP. Entries in the Set column represent
the 100 instances of the specified size from the specific instance set. Solution Quality is the average percentage
over the theoretical minimum for all 100 problem instances for each size within each set with values in bold
being the best result, on the condition that it is significantly different when results are analysed with the Wilcoxon
signed-rank test. Errors quoted are the standard deviation over the 100 instances in each set.
Set Solution Quality (%)
FF OEMACS GA1 GA2 PACO
A100 10.3 ± 4.0 8.98 ± 2.76 8.80 ± 2.42 8.54 ± 1.92 8.25 ± 3.13
A200 5.03 ± 1.82 4.71 ± 1.60 5.13 ± 2.23 4.4 ± 1.31 4.47 ± 1.46
A300 3.69 ± 1.33 3.26 ± 1.15 4.21 ± 2.45 2.92 ± 0.858 3.32 ± 1.24
A400 2.98 ± 1.00 2.78 ± 0.98 4.78 ± 2.78 2.46 ± 1.04 2.78 ± 1.01
A500 2.58 ± 0.75 2.39 ± 0.78 5.54 ± 3.57 1.92 ± 0.672 2.42 ± 0.81
A1000 1.58 ± 0.31 1.96 ± 0.39 11.3 ± 4.2 1.26 ± 0.479 1.58 ± 0.31
B100 16.0 ± 3.7 8.24 ± 3.33 6.49 ± 1.41 7.75 ± 5.57 6.24 ± 1.90
B200 12.7 ± 2.2 5.05 ± 2.31 3.01 ± 1.02 4.78 ± 4.11 3.08 ± 1.04
B300 11.5 ± 1.5 3.72 ± 1.35 2.05 ± 0.47 5.36 ± 4.12 2.11 ± 0.62
B400 11.0 ± 1.4 3.47 ± 1.24 1.64 ± 0.384 6.77 ± 4.51 1.69 ± 0.41
B500 10.4 ± 1.1 2.89 ± 1.06 1.25 ± 0.35 6.30 ± 3.88 1.37 ± 0.52
B1000 9.83 ± 0.72 2.82 ± 0.68 0.79 ± 0.29 11.4 ± 4.02 0.91 ± 0.32
C100 34.8 ± 14.1 13.7 ± 4.7 6.96 ± 2.29 23.9 ± 7.79 7.67 ± 2.89
C200 31.6 ± 10.5 11.9 ± 2.6 5.03 ± 1.18 31.4 ± 6.25 6.10 ± 1.46
C300 32.1 ± 9.4 11.7 ± 2.4 4.46 ± 0.97 37.2 ± 6.56 5.39 ± 1.09
C400 31.0 ± 7.6 12.1 ± 2.4 4.07 ± 0.84 41.3 ± 5.4 4.66 ± 0.86
C500 29.2 ± 6.7 12.0 ± 2.0 3.93 ± 0.61 43.0 ± 4.45 4.44 ± 0.62
C1000 26.3 ± 5.2 12.8 ± 1.5 3.7 ± 0.48 54.5 ± 3.67 3.62 ± 0.32
results reveals that the issue stems from the tendency of the can be paired, and compared to each other using the Wilcoxon
algorithm to assign VMs to empty PMs even when currently used signed-rank test (a non-parametric test which can be used to
PMs have enough capacity remaining, which happens regardless compare paired sets of readings). Since we perform an all-vs-
of fitness function. Both PACO-VMP and OEMACS enforce a limit all comparison of three tests (all possible pairs of algorithms)
on the number of PMs that can be used (the previous best number we apply the Bonferroni correction, and divide the significance
of PMs) which prevents this behaviour. Additionally, it is worth threshold by the number of tests (in this case 3). Bold values in
nothing that despite a 10x increase in instance size across our the table show solution quality values that are significantly better
experiments, the quality of the solutions produced by PACO-VMP than the other four algorithms, using a significance threshold of
remains consistent. The percentages above the lower limit of PM 0.002 (that is, 0.01 after application of the Bonferroni correction).
utilization decrease with each increase in instance size, which (in In general, one of the two GA versions tends to produce the
terms of raw numbers) indicates a fairly consistent number of best solutions; however, although the differences are in many
PMs over the minimum. This suggests that our implementation cases statistically significant, the magnitude of the effect is small.
could still produce good results for even larger VMP instances. This is an advantage over OEMACS, where solution quality degrades for size 1000 instances, a trend which would potentially continue as instance sizes increase.

The main focus of PACO-VMP is to improve execution time, and it succeeds at this objective. While PACO-VMP and OEMACS use similar pheromone definitions and local search techniques, PACO-VMP produces better results, both in terms of execution time and solution quality. This may be due partly to the choice of the MMAS algorithm over ACS, and partly to differences in selection probabilities arising from the use of independent roulette. It has been shown [47] that independent roulette algorithms (such as vRoulette) tend to make greedier selections than the traditional roulette wheel algorithm, which may be a factor explaining the different solution qualities found between PACO-VMP and OEMACS. Clearly the areas in which PACO-VMP and OEMACS differ are significant in terms of execution time, as PACO-VMP has a time complexity of O(n²), whereas OEMACS is, experimentally, closer to O(n⁴). The main contributing factor to this is the probability calculation; whereas PACO-VMP uses the resource wastage formula as given in Formula (1) as the heuristic value, OEMACS uses a much more complex formula that includes resource wastage, but also has extra sums over the VMs in both the numerator and denominator. Experimentally, the time complexity of the IGA-POP variants is approximately O(n³).

We summarize our results in Tables 2 and 3. These show results for solution quality and execution time; for the solution quality results we indicate which results are statistically significant. The results of each algorithm on each instance of a given size. For example, in the case of the A1000 instances, a comparison between GA2 and PACO shows that – out of the 100 trials – GA2 is superior for 46 instances, whereas ACO is superior for 3, with 51 ties. Although this is a statistically highly significant difference, its magnitude is only 0.32% in terms of solution quality. This demonstrates that the experiments are sensitive enough to detect significant, but small, differences in performance.

Qualitatively, the results show that one of the two GAs generally performs the best for any set of instances, but this is often accompanied by the other GA performing the worst. Since we used the GA with the recommended parameters for the population size and number of generations, this performance also comes at a significant cost; for example, in the 1000 VM instances, GA1 and GA2 perform 4000 evaluations per generation for 200 generations, compared to 20 evaluations for 50 iterations in PACO. Furthermore, the difference in performance between the two cost functions is very clear; using the original cost function proposed by [22] leads to poor performance on the B and C instance sets. On the other hand, PACO-VMP achieves solution quality close to best (or best) across all instance types, without any sensitivity to the algorithm parameters, and achieves better average solution quality than the other ACO algorithm (OEMACS) in 15 out of the 18 instance categories. There is also a clear advantage for PACO-VMP in terms of both scalability and execution time. The computational complexity of PACO is superior to both OEMACS and GA1/2, and the execution time of the parallel version is several orders of magnitude lower in most cases. For the C1000 instances, the most challenging instance set, PACO-VMP achieves the best solution quality of all algorithms, in an average time of
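The greedier behaviour of independent roulette can be illustrated with a small simulation. This is a toy sketch, not the vectorized vRoulette implementation analysed in [47]: with weights (1, 1, 8), a proportional roulette wheel picks the dominant candidate with probability 0.8, while independent roulette picks it about 92% of the time.

```python
import random

def roulette_wheel(weights, rng):
    # Standard roulette wheel: a single spin on the cumulative distribution,
    # so candidate i is chosen with probability w_i / sum(w).
    total = sum(weights)
    r = rng.random() * total
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return len(weights) - 1

def independent_roulette(weights, rng):
    # Independent roulette: every candidate draws its own uniform random
    # number, scaled by its weight; the largest product wins. High-weight
    # candidates win more often than their proportional share.
    return max(range(len(weights)), key=lambda i: rng.random() * weights[i])

rng = random.Random(42)
weights = [1.0, 1.0, 8.0]  # one dominant candidate (index 2)
n = 100_000
rw = sum(roulette_wheel(weights, rng) == 2 for _ in range(n)) / n
ir = sum(independent_roulette(weights, rng) == 2 for _ in range(n)) / n
print(f"P(best picked): roulette wheel ~ {rw:.3f}, independent roulette ~ {ir:.3f}")
```

The estimated probabilities converge to roughly 0.80 and 0.92 respectively, illustrating why independent roulette behaves as a greedier selection operator.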
Table 3
Execution time results from our experiments on OEMACS, GA and PACO-VMP. Entries in the Set column represent
the 100 instances of the specified size from the specific instance set. Execution Time is the average time per run in
seconds for all 100 problem instances for each size within each set. Errors quoted are the standard deviation over
the 100 instances in each set.
Set Execution Time (s)
OEMACS GA1 GA2 PACO(S) PACO(P)
A100 0.9364 ± 0.0655 9.098 ± 0.166 15.05 ± 0.27 0.1246 ± 0.0217 0.05473 ± 0.00447
A200 12.22 ± 0.60 60.23 ± 0.57 107.9 ± 3.2 0.4234 ± 0.0596 0.1542 ± 0.0052
A300 51.81 ± 1.60 194.4 ± 1.9 354.3 ± 6.4 0.8922 ± 0.0772 0.2653 ± 0.0029
A400 158.8 ± 4.4 440.0 ± 3.8 817.7 ± 4.1 1.528 ± 0.099 0.4989 ± 0.0133
A500 369.8 ± 8.7 832.8 ± 6.0 1575 ± 9 2.387 ± 0.117 0.7591 ± 0.0158
A1000 5451 ± 233 5711 ± 200 12120 ± 310 10.37 ± 0.62 2.986 ± 0.476
B100 1.082 ± 0.116 9.044 ± 0.164 15.32 ± 0.19 0.1211 ± 0.0202 0.05445 ± 0.00463
B200 14.14 ± 0.59 59.15 ± 0.54 109.2 ± 3.1 0.4228 ± 0.0595 0.1544 ± 0.0045
B300 58.75 ± 1.97 189.7 ± 1.6 358.6 ± 3.8 0.9016 ± 0.0871 0.266 ± 0.003
B400 181.1 ± 5.7 426.2 ± 2.4 829.1 ± 8.0 1.514 ± 0.094 0.4988 ± 0.0105
B500 414.8 ± 6.6 805.4 ± 4.4 1596 ± 22 2.351 ± 0.114 0.7561 ± 0.0137
B1000 6080 ± 208 5547 ± 190 12190 ± 280 9.594 ± 0.506 2.904 ± 0.426
C100 1.646 ± 0.242 9.733 ± 0.220 16.98 ± 0.30 0.1349 ± 0.0232 0.0556 ± 0.0049
C200 22.14 ± 3.00 63.66 ± 0.95 116.7 ± 2.6 0.4644 ± 0.0514 0.1576 ± 0.0048
C300 88.56 ± 9.21 205.0 ± 3.1 383.7 ± 7.1 0.9796 ± 0.0841 0.2742 ± 0.0037
C400 270.3 ± 23.0 456.8 ± 4.9 874.7 ± 14.7 1.663 ± 0.0968 0.5112 ± 0.0068
C500 633.0 ± 40.0 858.4 ± 8.0 1665 ± 20 2.623 ± 0.140 0.7709 ± 0.0212
C1000 8874 ± 509 5754 ± 195 12490 ± 430 10.69 ± 0.57 2.874 ± 0.486
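The execution-time gap in Table 3 can be expressed as explicit speed-up factors by taking ratios of the reported means. The short sketch below does this for the 1000-VM instance sets; the script is purely illustrative and not part of the paper's implementation.

```python
# Mean execution times (seconds) for the 1000-VM instance sets,
# transcribed from Table 3: (OEMACS, PACO serial, PACO parallel).
times = {
    "A1000": (5451.0, 10.37, 2.986),
    "B1000": (6080.0, 9.594, 2.904),
    "C1000": (8874.0, 10.69, 2.874),
}

def speedups(oemacs, paco_serial, paco_parallel):
    # Speed-up of each PACO variant relative to OEMACS (ratio of means).
    return oemacs / paco_serial, oemacs / paco_parallel

for name, row in times.items():
    ser, par = speedups(*row)
    print(f"{name}: PACO(S) {ser:.0f}x, PACO(P) {par:.0f}x faster than OEMACS")
```

For C1000 the parallel ratio is roughly 3000x, consistent with the statement that the parallel version is several orders of magnitude faster.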
Acknowledgement

JP is funded by the Centre for Advanced Computational Science at Manchester Metropolitan University.

References

[1] Brian Hayes, Cloud computing, 2008.
[2] A. Botta, W. de Donato, V. Persico, A. Pescapé, On the integration of cloud computing and internet of things, in: 2014 International Conference on Future Internet of Things and Cloud, 2014, pp. 23–30.
[3] M. Satyanarayanan, The emergence of edge computing, Computer 50 (1) (2017) 30–39.
[4] A. Al-Fuqaha, M. Guizani, M. Mohammadi, M. Aledhari, M. Ayyash, Internet of things: A survey on enabling technologies, protocols, and applications, IEEE Commun. Surv. Tutor. 17 (4) (2015) 2347–2376.
[5] A. Zanella, N. Bui, A. Castellani, L. Vangelista, M. Zorzi, Internet of things for smart cities, IEEE Internet Things J. 1 (1) (2014) 22–32.
[6] J.A. Guerrero-Ibanez, S. Zeadally, J. Contreras-Castillo, Integration challenges of intelligent transportation systems with connected vehicle, cloud computing, and internet of things technologies, IEEE Wirel. Commun. 22 (6) (2015) 122–128.
[7] Damián Fernández-Cerero, Alejandro Fernández-Montes, Agnieszka Jakóbik, Limiting global warming by improving data-centre software, IEEE Access 8 (2020) 44048–44062.
[8] Christopher Clark, Keir Fraser, Steven Hand, Jacob Gorm Hansen, Eric Jul, Christian Limpach, Ian Pratt, Andrew Warfield, Live migration of virtual machines, in: Proceedings of the 2nd Conference on Symposium on Networked Systems Design & Implementation, Vol. 2, 2005, pp. 273–286.
[9] Mark Stillwell, David Schanzenbach, Frédéric Vivien, Henri Casanova, Resource allocation algorithms for virtualized service hosting platforms, J. Parallel Distrib. Comput. 70 (9) (2010) 962–974.
[10] Joshua Peake, Martyn Amos, Paraskevas Yiapanis, Huw Lloyd, Vectorized candidate set selection for parallel ant colony optimization, in: Proceedings of the Genetic and Evolutionary Computation Conference Companion, GECCO '18, Association for Computing Machinery, New York, NY, USA, 2018, pp. 1300–1306.
[11] Joshua Peake, Martyn Amos, Paraskevas Yiapanis, Huw Lloyd, Scaling techniques for parallel ant colony optimization on large problem instances, in: Proceedings of the Genetic and Evolutionary Computation Conference, GECCO '19, Association for Computing Machinery, New York, NY, USA, 2019, pp. 47–54.
[12] Michalis Mavrovouniotis, Shengxiang Yang, Mien Van, Changhe Li, Marios Polycarpou, Ant colony optimization algorithms for dynamic optimization: A case study of the dynamic travelling salesperson problem [research frontier], IEEE Comput. Intell. Mag. 15 (1) (2020) 52–63.
[13] E.G. Coffman Jr., M.R. Garey, D.S. Johnson, Approximation algorithms for bin packing: A survey, in: Approximation Algorithms for NP-Hard Problems, 1996, pp. 46–93.
[14] Yongqiang Gao, Haibing Guan, Zhengwei Qi, Yang Hou, Liang Liu, A multi-objective ant colony system algorithm for virtual machine placement in cloud computing, J. Comput. System Sci. 79 (8) (2013) 1230–1242.
[15] Haibo Mi, Huaimin Wang, Gang Yin, Yangfan Zhou, Dianxi Shi, Lin Yuan, Online self-reconfiguration with performance guarantee for energy-efficient large-scale cloud computing data centers, in: 2010 IEEE International Conference on Services Computing, IEEE, 2010, pp. 514–521.
[16] Fei Tao, Chen Li, T. Warren Liao, Yuanjun Laili, BGM-BLA: A new algorithm for dynamic migration of virtual machines in cloud computing, IEEE Trans. Serv. Comput. 9 (6) (2015) 910–925.
[17] Shangguang Wang, Zhipiao Liu, Zibin Zheng, Qibo Sun, Fangchun Yang, Particle swarm optimization for energy-aware virtual machine placement optimization in virtualized data centers, in: 2013 International Conference on Parallel and Distributed Systems, IEEE, 2013, pp. 102–109.
[18] Seyed Saeid Masoumzadeh, Helmut Hlavacs, Integrating VM selection criteria in distributed dynamic VM consolidation using fuzzy Q-learning, in: Proceedings of the 9th International Conference on Network and Service Management, CNSM 2013, IEEE, 2013, pp. 332–338.
[19] Fares Alharbi, Yu-Chu Tian, Maolin Tang, Wei-Zhe Zhang, Chen Peng, Minrui Fei, An ant colony system for energy-efficient dynamic virtual machine placement in data centers, Expert Syst. Appl. 120 (2019) 228–238.
[20] Eugen Feller, Louis Rilling, Christine Morin, Energy-aware ant colony based workload placement in clouds, in: 2011 IEEE/ACM 12th International Conference on Grid Computing, IEEE, 2011, pp. 26–33.
[21] Xiao-Fang Liu, Zhi-Hui Zhan, Jeremiah D. Deng, Yun Li, Tianlong Gu, Jun Zhang, An energy efficient ant colony system for virtual machine placement in cloud computing, IEEE Trans. Evol. Comput. 22 (1) (2016) 113–128.
[22] A.S. Abohamama, Eslam Hamouda, A hybrid energy-aware virtual machine placement algorithm for cloud environments, Expert Syst. Appl. 150 (2020) 113306.
[23] Seyedali Mirjalili, SCA: A Sine Cosine algorithm for solving optimization problems, Knowl.-Based Syst. 96 (2016) 120–133.
[24] Marco Dorigo, Mauro Birattari, Thomas Stützle, Ant colony optimization, IEEE Comput. Intell. Mag. 1 (4) (2006) 28–39.
[25] Jean-Louis Deneubourg, Jacques M. Pasteels, Jean-Claude Verhaeghe, Probabilistic behaviour in ants: a strategy of errors? J. Theoret. Biol. 105 (2) (1983) 259–271.
[26] Adriana Alvim, Fred S. Glover, Celso C. Ribeiro, Dario J. Aloise, Local Search for the Bin Packing Problem, Citeseer, 1999.
[27] Marco Dorigo, Vittorio Maniezzo, Alberto Colorni, Ant system: Optimization by a colony of cooperating agents, IEEE Trans. Syst. Man Cybern. B 26 (1) (1996) 29–41.
[28] Thomas Stützle, MAX-MIN Ant System for Quadratic Assignment Problems, Citeseer, 1997.
[29] Bernd Bullnheimer, Gabriele Kotsis, Christine Strauß, Parallelization strategies for the ant system, in: High Performance Algorithms and Software in Nonlinear Optimization, Springer, 1998, pp. 87–100.
[30] Ling Chen, Chunfang Zhang, Adaptive parallel ant colony algorithm, in: International Conference on Natural Computation, Springer, 2005, pp. 1239–1249.
[31] S. Ilie, C. Bădică, A comparison of the island and ACODA approaches for distributing ACO, in: 2013 17th International Conference on System Theory, Control and Computing, ICSTCC, 2013, pp. 757–762.
[32] Sorin Ilie, Costin Bădică, Multi-agent approach to distributed ant colony optimization, Sci. Comput. Program. 78 (6) (2013) 762–774.
[33] Sorin Ilie, Costin Bădică, Multi-agent distributed framework for swarm intelligence, Procedia Comput. Sci. 18 (2013) 611–620.
[34] A. Kaplar, M. Vidaković, N. Luburić, M. Ivanović, Improving a distributed agent-based ant colony optimization for solving traveling salesman problem, in: 2017 40th International Convention on Information and Communication Technology, Electronics and Microelectronics, MIPRO, 2017, pp. 1144–1148.
[35] Huw Lloyd, Decentralized parallel ant colony optimization for distributed memory systems, in: 2019 IEEE SmartWorld, Ubiquitous Intelligence Computing, Advanced Trusted Computing, Scalable Computing Communications, Cloud Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), 2019, pp. 1561–1567.
[36] Mateusz Starzec, Grażyna Starzec, Aleksander Byrski, Wojciech Turek, Kamil Pietak, Desynchronization in distributed ant colony optimization in HPC environment, Future Gener. Comput. Syst. 109 (2020) 125–133.
[37] José M. Cecilia, José M. García, Andy Nisbet, Martyn Amos, Manuel Ujaldón, Enhancing data parallelism for ant colony optimization on GPUs, J. Parallel Distrib. Comput. 73 (1) (2013) 42–51.
[38] José M. Cecilia, José M. Garcia, Manuel Ujaldón, Andy Nisbet, Martyn Amos, Parallelization strategies for ant colony optimisation on GPUs, in: 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum, IEEE, 2011, pp. 339–346.
[39] Rafał Skinderowicz, The GPU-based parallel ant colony system, J. Parallel Distrib. Comput. 98 (2016) 48–60.
[40] Darren M. Chitty, Applying ACO to large scale TSP instances, in: Fei Chao, Steven Schockaert, Qingfu Zhang (Eds.), Advances in Computational Intelligence Systems, Springer International Publishing, Cham, 2018, pp. 104–118.
[41] Ivars Dzalbs, Tatiana Kalganova, Accelerating supply chains with ant colony optimization across a range of hardware solutions, Comput. Ind. Eng. 147 (2020) 106610.
[42] Huw Lloyd, Martyn Amos, A highly parallelized and vectorized implementation of MAX-MIN Ant System on Intel® Xeon Phi™, in: 2016 IEEE Symposium Series on Computational Intelligence, SSCI, IEEE, 2016, pp. 1–6.
[43] Felipe Tirado, Ricardo J. Barrientos, Paulo González, Marco Mora, Efficient exploitation of the Xeon Phi architecture for the ant colony optimization (ACO) metaheuristic, J. Supercomput. 73 (11) (2017) 5053–5070.
[44] Felipe Tirado, Angelica Urrutia, Ricardo J. Barrientos, Using a coprocessor to solve the ant colony optimization algorithm, in: 2015 34th International Conference of the Chilean Computer Science Society, SCCC, IEEE, 2015, pp. 1–6.
[45] Mauro Birattari, On the Estimation of the Expected Performance of a Metaheuristic on a Class of Instances. How Many Instances, How Many Runs?, IRIDIA, Université Libre de Bruxelles, Brussels, Belgium, 2004.
[46] D. Wilcox, A. McNabb, K. Seppi, Solving virtual machine packing with a reordering grouping genetic algorithm, in: 2011 IEEE Congress of Evolutionary Computation, CEC, 2011, pp. 362–369.
[47] Huw Lloyd, Martyn Amos, Analysis of independent roulette selection in parallel ant colony optimization, in: Proceedings of the Genetic and Evolutionary Computation Conference, GECCO '17, Association for Computing Machinery, New York, NY, USA, 2017, pp. 19–26.

Joshua Peake received the B.Sc. degree in Computer Science from Manchester Metropolitan University, UK, in 2018. He is currently conducting research towards his Ph.D. in Computer Science, also at Manchester Metropolitan University.

Nicholas Costen received the BA degree in experimental psychology from the University of Oxford, and the Ph.D. degree in mathematics and psychology from the University of Aberdeen. He has undertaken research with the Advanced Telecommunications Research Laboratory, Kyoto, and the Division of Imaging Science and Biomedical Engineering, University of Manchester. He is currently a reader in Cognitive Computer Vision with Manchester Metropolitan University, where his interests include face recognition and human motion analysis.

Giovanni Masala is Senior Lecturer in Computer Science and Leader of the Robotics Lab in Manchester Metropolitan University (UK). He is also Visiting Research Fellow at University of Plymouth (UK). His research interests are in Artificial Intelligence (AI) and Robotics, natural language understanding, social robots for the elderly, and AI in medical applications.