
J. Cent. South Univ. (2017) 24: 1050−1062
DOI: 10.1007/s11771-017-3508-7

Multi-objective workflow scheduling in cloud system based on
cooperative multi-swarm optimization algorithm

YAO Guang-shun(姚光顺)1, 2, DING Yong-sheng(丁永生)1, HAO Kuang-rong(郝矿荣)1


1. Engineering Research Center of Digitized Textile & Fashion Technology of Ministry of Education,
College of Information Sciences and Technology, Donghua University, Shanghai 201620, China;
2. College of Computer and Information Engineering, Chuzhou University, Chuzhou 239000, China
© Central South University Press and Springer-Verlag Berlin Heidelberg 2017

Abstract: To improve the performance of multi-objective workflow scheduling in cloud systems, a multi-swarm multi-objective optimization algorithm (MSMOOA) is proposed to satisfy multiple conflicting objectives. Inspired by the way a species in nature divides into multiple swarms for different objectives and shares information among these swarms, each physical machine in the data center is treated as a swarm that employs an improved multi-objective particle swarm optimization to find non-dominated solutions for one objective in the MSMOOA. The particles in each swarm are divided into two classes that adopt different strategies to evolve cooperatively. One class of particles can communicate with several swarms simultaneously to promote information sharing among swarms, while the other class can only exchange information with particles located in the same swarm. Furthermore, to avoid the influence of elastically changing available resources, a manager server is adopted in the cloud data center to collect the available resources for scheduling. The quality of the proposed method is evaluated against other related approaches using hybrid and parallel workflow applications. The experimental results show that the MSMOOA outperforms the compared algorithms.

Key words: multi-objective workflow scheduling; multi-swarm optimization; particle swarm optimization (PSO); cloud computing system

1 Introduction

Workflow, emerging as a popular paradigm, is used by many scientists and engineers to model scientific and industrial applications [1−3]. Typically, a workflow application contains a great number of tasks with precedence constraints, where the input of some tasks may depend on the output of others. It can usually be described by a directed acyclic graph (DAG) in which each computational task is represented by a node, and each data or control dependency between tasks is represented by a directed edge between the corresponding nodes.

In a distributed heterogeneous computing system, how to schedule the tasks of a workflow onto the available computing resources, which belongs to a class of NP-complete problems [4], is one of the major challenges. Many classical optimization methods, such as suffrage, min–min, max–min, HEFT and auction-based optimization, were reported in Refs. [5, 6], and many other works have proposed effective algorithms for workflow scheduling in distributed computing systems for different objectives and from different perspectives [2, 7−9].

Recently, cloud computing has become a revolutionary paradigm that changes the way heterogeneous services and computational resources are provided to customers in a pay-as-you-go model [10−14]. Cloud service providers, such as Amazon EC2 and IBM, can offer flexible and scalable IT infrastructures to customers. With cloud computing, customers can scale up to massive capacities in an instant without having to pay for software licenses or invest in new infrastructure. These characteristics attract an increasing number of individuals and corporations to rent cloud services for their applications.

In the context of cloud computing, workflow scheduling is even more difficult because several factors have to be considered. Firstly, the goal is

Foundation item: Project(61473078) supported by the National Natural Science Foundation of China; Project(2015−2019) supported by the Program for
Changjiang Scholars from the Ministry of Education, China; Project(16510711100) supported by International Collaborative Project of
the Shanghai Committee of Science and Technology, China; Project(KJ2017A418) supported by Anhui University Science Research,
China
Received date: 2015−06−17; Accepted date: 2016−09−30
Corresponding author: DING Yong-sheng, Professor, PhD; Tel: +86−21−67792323; E-mail: ysding@dhu.edu.cn
different between customers and cloud service providers. Customers are usually interested in minimizing the makespan and cost of their applications, whereas cloud service providers are often interested in maximizing resource utilization, minimizing energy consumption, or ensuring user fairness. In these circumstances, the scheduling must be formulated as a multi-objective optimization problem (MOOP) aiming at optimizing multiple, possibly conflicting criteria, where it is impossible to find a globally optimal solution with respect to all objectives. Moreover, the cloud data center offers its services to customers in the form of virtual machines (VMs) through virtualization technology, and the running VMs can scale up and down dynamically according to the workloads in the system. So, the scheduling strategy should be able to check the available computational resources as quickly as possible after a change happens.

Recently, some related works [15−17] have proposed methods for multi-objective workflow scheduling in cloud or grid systems. In Ref. [15], the problem was simplified to a single-objective problem by aggregating all the objectives in one analytical function. The main drawback of such approaches is that the computed solution depends on the selected weights, which are usually decided a priori, without any knowledge about the workflow, the infrastructure, or, in general, the problem being solved. Therefore, the computed solution may not be satisfactory if the weights do not capture the user preferences in an accurate way. Other approaches are based on sorting the different objectives in a sequential fashion [16]. Once an objective has been optimized and no further improvement is possible, the next objective is considered. The optimization of this new objective is carried out so that none of the constraints imposed over the previous criteria are violated. ZHAN et al [17] took this kind of approach to optimize makespan and economic cost. However, for the above approaches, the number of objectives is limited and the order in which the objectives are optimized requires some sort of preferential information, which may be difficult to derive.

Recently, some Pareto-based approaches have been used for multi-objective task scheduling. TAO et al [18] proposed a case-library and Pareto-solution based hybrid genetic algorithm to find Pareto solutions for makespan and energy consumption optimization. DURILLO et al [19] designed a Pareto-based heuristic list scheduling that provides customers with a set of tradeoff optimal solutions for makespan and energy consumption. They also proposed a similar multi-objective workflow scheduling method for makespan and cost [20]. However, all of the above works have focused on two objectives. As for three objectives (makespan, cost and energy consumption), FARD et al [21] and YASSA et al [22] presented a heuristic list scheduling and a hybrid particle swarm optimization (PSO) algorithm, respectively. However, none of the above methods is integrated with the structure of the cloud data center, which is composed of multiple physical machines (PMs) among which information can be shared through the Intranet. The above methods also did not consider the dynamic change of computational resources in the context of cloud computing.

In this work, we also take three objectives (makespan, cost and energy consumption) into consideration, and design a multi-swarm multi-objective optimization algorithm (MSMOOA) for workflow scheduling in cloud computing. In order to obtain the available computational resources for the scheduling, a data center model is designed first. In this model, a manager server is adopted to collect the information of available computational resources after accepting the workflows submitted by customers, which effectively avoids the influence of elastic resources on the scheduling results. Then, the MSMOOA is executed. Different from previous algorithms [18−22], the MSMOOA takes advantage of the structure of the cloud data center to search for non-dominated scheduling solutions. In the MSMOOA, each available PM is considered a swarm and employs the improved multi-objective particle swarm optimization algorithm (MOPSOA) to find non-dominated solutions for one objective. Through the Intranet connection among PMs, some particles in one swarm can get information from other swarms, and the velocity update of these particles is also influenced by the states of other swarms, which promotes information sharing and cooperation among swarms. Some new update strategies are designed to improve the particles' search capability. We compare the MSMOOA with another multi-objective scheduling algorithm in the cloud system and further analyze the quality of the solutions computed by these algorithms. Simulation results on well-known hybrid and parallel workflow applications highlight the performance of the proposed approach.

The main contributions of this work are as follows: 1) The MSMOOA is proposed by introducing a new multi-swarm cooperative mechanism and modifying the update of the particles' velocity. The update of a particle's velocity at each iteration is affected not only by the personal and global best but also by the swarm best. 2) The proposed MSMOOA is used for multi-objective workflow scheduling in the cloud system, and it is the first workflow scheduling algorithm that takes the structural characteristics of the cloud data center into consideration.
2 Problem modeling

In this section, the concept of the MOOP is presented first. Then, the models of the cloud data center, the workflow application and the scheduling, including all QoS parameters used in this work, are formally described. Our goal is to distribute workflow applications to the cloud computing system so as to optimize both customers' criteria (quality of service, QoS) and cloud providers' profits. Finally, the scheduling model is presented as a MOOP.

2.1 Concept for multi-objective optimization problem

Most real-world engineering problems are MOOPs. The goal of a MOOP is to find a set of good trade-off solutions from which the decision maker can select one according to his preference. Since Pareto offered the most common definition of optimum in multi-objective optimization, many research works have been presented to solve MOOPs [23, 24].

The mathematical formulation of a minimization MOOP with m decision variables and n objectives can be formally defined as

\min \big( y = F(x) = [f_1(x), f_2(x), \dots, f_n(x)] \big) \quad (1)

where x \in X is an m-dimensional decision vector; X is the search space; y \in Y is the objective vector and Y is the objective space.

Because there are multiple objectives involved in a MOOP, there is no single optimal solution with regard to all objectives. The solutions which represent good trade-offs or compromises among all objectives should be found, for which Pareto optimality is usually adopted. Some Pareto concepts [24] are given as follows (without loss of generality, supposing that the objectives are to be minimized).

Definition 1: Pareto dominance. The vector x_1 dominates the vector x_2 (denoted by x_1 \prec x_2) if and only if

\forall i \in \{1, 2, \dots, n\}: f_i(x_1) \le f_i(x_2) \;\wedge\; \exists i: f_i(x_1) < f_i(x_2) \quad (2)

Definition 2: Pareto optimality. A decision vector x_1 is said to be Pareto optimal if and only if

\{ x_2 \in X \mid F(x_2) \prec F(x_1) \} = \varnothing \quad (3)

Definition 3: Pareto optimal set. The Pareto optimal set PS is the set of all Pareto optimal decision vectors:

PS = \{ x_1 \in X \mid \nexists\, x_2 \in X : x_2 \prec x_1 \} \quad (4)

Definition 4: Pareto optimal front. The Pareto front Pf consists of the values of the objectives corresponding to the solutions in PS:

Pf = \{ F(x) \mid x \in PS \} \quad (5)

2.2 Cloud data center model

The cloud data center used in this work offers a set of resources provided by a cloud service provider in the form of VMs in a pay-as-you-go model. In our model, the cloud data center consists of a set of PMs, PM = \{pm_1, pm_2, \dots, pm_P\}, which is similar to multiple swarms of the same species in nature. Each PM can communicate with the other PMs located in the same data center through the Intranet, and all these connected PMs together with a manager server make up the cloud data center. The cloud data center offers its services to customers through the Internet via the manager server, which holds the information of available resources in the cloud data center, accepts the workflows submitted by customers, and stores the found non-dominated solutions.

Through virtualization technology, each PM is virtualized into a set of heterogeneous VMs, VM_i = \{vm_{i1}, vm_{i2}, \dots, vm_{im}\}, 1 \le i \le P, which have different performance and prices, as shown in Fig. 1. The number of running VMs and PMs can scale up and down dynamically according to the workloads in the system. If the running VMs and PMs are changed, the corresponding information is sent to the manager server immediately. When the manager server accepts a workflow from customers, it first checks the information of available VMs and PMs. Based on this information, the multi-objective scheduling, which will be described in the next section, is executed without being influenced by the change of available computational resources.

Pre-emption is not allowed in our model, which means that each task must be completed without interruption once started. It is also supposed that each VM cannot perform more than one task at a time.

Fig. 1 Cloud data center model
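The Pareto concepts of Definitions 1−3 in Section 2.1 can be made concrete with a short sketch. This is a minimal Python illustration; the sample objective vectors in the usage note below are hypothetical.

```python
def dominates(f1, f2):
    """Pareto dominance (Definition 1, Eq. (2)) for minimization:
    f1 dominates f2 iff it is no worse in every objective and
    strictly better in at least one."""
    return (all(a <= b for a, b in zip(f1, f2))
            and any(a < b for a, b in zip(f1, f2)))

def non_dominated(points):
    """Pareto optimal set over a finite collection of objective
    vectors (Definitions 3-4): keep the vectors that no other
    vector dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]
```

Applied to the three objectives of this work, each point would be a triple (O1(V), O2(V), O3(V)) computed for one candidate schedule; for example, of the points (1, 2), (2, 1) and (2, 2), the first two are mutually non-dominated and the third is dominated by (1, 2).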
2.3 Workflow application model

In this work, a DAG is used to represent customers' workflow applications submitted to the cloud computing system. A DAG V = (T, E) consists of R tasks T = \{t_1, t_2, \dots, t_R\}, which are interconnected through control flow and data flow:

E = \{ (t_i, t_j, Data_{ij}) \mid (t_i, t_j) \in T \times T,\; i \ne j \}

where Data_{ij} represents the size of the data which needs to be transferred from task t_i to t_j. The relationships among tasks, such as serialization, parallelization and selection, are represented by the control flow and data flow. pred(t_i) and succ(t_i) denote the sets of predecessors and successors of task t_i, respectively; every task in pred(t_i) must complete before t_i starts. Every task t_i \in T is characterized by its length, len(t_i), measured for example in the total number of instructions, which affects its execution time and energy consumption. Figure 2 shows a customer's workflow model, which contains four tasks.

Fig. 2 A customer's workflow model with four tasks

2.4 Scheduling model
2.4.1 Makespan

The makespan of a task t_r on VM vm_{ij} (the j-th VM in the i-th PM) can be computed as the sum of the longest input transfer time (from all inputs to t_r) and the task computation time [19−21]:

O_1(t_r, vm_{ij}) = \max_{t_p \in pred(t_r)} \left\{ \frac{Data_{pr}}{b_{sched(t_p),ij}} \right\} + \frac{len(t_r)}{s_{ij}} \quad (6)

where sched(t_p) is the VM on which task t_p is executed; Data_{pr} is the size of the data to be transferred between t_p and t_r; b_{sched(t_p),ij} is the network bandwidth between sched(t_p) and vm_{ij}; len(t_r) is the length of task t_r in machine instructions as mentioned before; and s_{ij} is the computation speed of vm_{ij}. So, the completion time T_{t_r} of t_r on vm_{ij} can be computed as follows:

T_{t_r} =
\begin{cases}
O_1(t_r, vm_{ij}), & pred(t_r) = \varnothing \\
\max_{t_p \in pred(t_r)} \{ T_{t_p} + O_1(t_r, vm_{ij}) \}, & pred(t_r) \ne \varnothing
\end{cases} \quad (7)

The makespan of a DAG is finally defined as the maximum completion time of all the tasks in the DAG:

O_1(V) = \max_{r \in [1, R]} \{ T_{t_r} \} \quad (8)

2.4.2 Economic cost

Because of the marketization of cloud services, most cloud providers, such as Amazon, have set prices for their services. They have fixed the price for transferring a basic data unit (e.g., per MB) between two services and the price for processing basic time units (e.g., per hour) in the pay-as-you-go model. The economic objective O_2 of task t_r executed on vm_{ij} is the sum of the data transfer and computation costs [20, 21]:

O_2(t_r, vm_{ij}) = O_1(t_r, vm_{ij}) \times vc_{ij} + \sum_{t_p \in pred(t_r)} Data_{pr} \times vt_{sched(t_p),ij} \quad (9)

where vc_{ij} is the hourly price of vm_{ij}; Data_{pr} is the size of the data to be transferred from t_p to t_r; and vt_{sched(t_p),ij} is the price of data transfer between sched(t_p) and vm_{ij}. So, the economic cost of a DAG is the sum of all tasks' economic costs:

O_2(V) = \sum_{i=1}^{R} O_2(t_i, sched(t_i)) \quad (10)

2.4.3 Energy consumption

In this work, our focus is on computation-intensive applications. Therefore, the data transfer and storage energy consumptions are ignored and only the energy consumption for the computation of tasks is considered. So, the energy consumption of task t_r executed on vm_{ij} can be expressed as follows [19, 21]:

O_3(t_r, vm_{ij}) = O_1(t_r, vm_{ij}) \times ve_{ij} \quad (11)

where ve_{ij} is the hourly energy consumption of vm_{ij}. So, the energy consumption of a DAG is the sum of all tasks' energy consumptions:

O_3(V) = \sum_{r=1}^{R} O_3(t_r, sched(t_r)) \quad (12)

2.4.4 Scheduling model

In this work, a cloud computing system consists of a set of PMs. Each PM offers different VMs, and the users submit their DAG applications, each composed of a set of tasks that have to be executed on these VMs. The workflow scheduling problem is to construct a mapping of these tasks to VMs that minimizes the following conflicting objectives: makespan, cost, and energy
consumption. Therefore, the task scheduling in the cloud system can be formulated as the following MOOP:

\begin{cases}
\text{makespan}: \text{minimize } O_1(V) \\
\text{cost}: \text{minimize } O_2(V) \\
\text{energy}: \text{minimize } O_3(V)
\end{cases} \quad (13)

3 Multi-objective workflow scheduling based on multi-swarm optimization algorithm

3.1 Particle swarm optimization algorithm

Inspired by the social behavior of animals, such as bird flocking and fish schooling, the PSO algorithm was proposed by KENNEDY and EBERHART in 1995 [25]. The PSO algorithm has received more and more attention from researchers due to its relative simplicity, fast convergence and population-based nature [26]. In the past decades, the PSO algorithm has been successfully applied in a variety of fields, such as constrained mixed-variable optimization problems [27] and wireless sensor networks [28, 29]. Some of the existing multi-objective particle swarm optimization (MOPSO) algorithms can be found in Refs. [30−32].

In the PSO algorithm, the population of solution candidates is called a "swarm", while each candidate solution is called a "particle". Initially, the particles are generated at random, and each particle is a candidate solution to the optimization problem. The current position of a particle in the search space represents a potential solution. In its more basic form, the position of the i-th particle in the search space at generation k+1, x_i(k+1), is decided by

x_i(k+1) = x_i(k) + v_i(k+1) \quad (14)

where the factor v_i(k+1) is known as the velocity and is given by

v_i(k+1) = \omega \cdot v_i(k) + C_1 \cdot r_1 \cdot (p_i - x_i) + C_2 \cdot r_2 \cdot (g_i - x_i) \quad (15)

where p_i, called the personal best (pbest), is the best solution that x_i has viewed, and g_i, called the global best (gbest), is the best particle that the entire swarm has viewed. \omega is the inertia weight of the particle and controls the tradeoff between the global experience and the local one. r_1 and r_2 are two uniformly distributed random numbers in the range [0, 1], and C_1 and C_2 are specific parameters which control the effect of the personal best and the global best particles.

At each iteration of the PSO algorithm, each particle updates its position and velocity according to Eqs. (14) and (15). Then, the fitness value of each particle is evaluated according to the desired optimization fitness function, and the pbest as well as the gbest is updated. Analogous to other evolutionary computation algorithms, the PSO algorithm is terminated when the stopping criterion is satisfied.

3.2 MSMOOA

In nature, many swarms of the same species can exist in the living space simultaneously. Each swarm occupies a part of the living space and evolves according to the rules that benefit its own swarm. For various reasons, such as competition for territory or mating, some individuals can communicate with several other swarms and get information from the swarms they have visited. So, the evolution of these individuals is affected by different swarms' rules, and they also promote information sharing among swarms and the cooperative development of the whole species.

The cloud data center consists of several PMs, and information can be communicated among PMs through the Intranet, as mentioned in Section 2.2. Each PM is virtualized into a set of heterogeneous VMs through virtualization technology. Thus, the structure of the cloud data center is similar to multiple swarms of the same species in nature. So, in this work, we design a multi-swarm cooperative PSO, called MSMOOA, for multi-objective workflow scheduling in the cloud computing environment. It has been shown that multi-swarm cooperative PSO can achieve better performance than typical MOPSO [32]. In our proposed algorithm, each available PM acts as a swarm, and the swarms can communicate with each other through the Intranet. The particles of each swarm in the MSMOOA are divided into two classes. The first class of particles belongs to only one swarm and develops according to the rule of this swarm. The second class of particles can communicate with multiple swarms and get evolutionary information from these swarms, so the update of these particles' position and velocity is affected by different swarms' rules. The second class of particles also promotes information sharing among swarms and cooperative evolution in the whole process of multi-objective optimization. The spirit of dividing each swarm into two classes is similar to the idea presented in Ref. [33]. However, the functions and evolutionary strategies for the particles in Ref. [33] and in this work differ from each other. Furthermore, the method proposed in Ref. [33] is used for multimodal functions and is not suitable for the multi-objective scheduling in this work.

In the MSMOOA, each swarm employs the MOPSO algorithm to find non-dominated particles with respect to one objective. Then, all non-dominated particles obtained by different swarms cooperate to find the final global non-dominated solutions. As in most of the existing MOPSO algorithms, external archives with maximal capacity, called the local external archive (LEA) in each PM and the global external archive (GEA) in the
manager server, are adopted to store the non-dominated optimal solutions for one swarm and for all swarms, respectively. The details are presented as follows.

3.2.1 Particle representation

To solve an optimization problem with the PSO algorithm, one needs to encode a candidate solution of the underlying problem into a particle vector form; the evolved vector is then decoded back into solution form to evaluate its fitness. In our work, each particle is represented by a vector of n elements (n is the total number of tasks in a DAG application), and each element has three integer values, which indicate the number of the task in the DAG, the number of the PM, and the number of the VM, respectively. The position also satisfies the precedence constraints between tasks. For example, Fig. 3 shows a feasible particle for the DAG application shown in Fig. 2. This particle indicates that task 1 is scheduled to the second VM in the first PM, task 2 is scheduled to the first VM in the second PM, and so on.

Fig. 3 A particle example for the DAG application in Fig. 2

3.2.2 Handling workflow scheduling using MSMOOA

Our work has three fitness functions: 1) minimizing the makespan O_1(V) according to Eq. (8), 2) minimizing the economic cost O_2(V) according to Eq. (10), and 3) minimizing the energy consumption O_3(V) according to Eq. (12).

In this work, each available PM is considered a swarm and corresponds to one of the objectives of the workflow scheduling. Each swarm employs the MOPSOA to find non-dominated solutions with respect to this objective, which are stored in the LEA. Then, all non-dominated solutions obtained by the different swarms cooperate to find the final global non-dominated solutions, which are stored in the GEA. So, the evolution of all particles in the MSMOOA is affected not only by the personal and the swarm's information but also by the global information. This makes it different from the similar algorithm presented in Ref. [34], called vector evaluated particle swarm optimization (VEPSO), which does not use global information.

Inspired by the division of the same species into multiple swarms in nature and the information sharing among swarms that promotes cooperative evolution, the particles of each swarm in the MSMOOA are divided into two classes, as mentioned before. The first class of particles belongs to only one swarm and develops according to the rule of this swarm. So, the velocity update of these particles (v_i^s, the velocity of the i-th particle in the s-th swarm) is modified as

v_i^s(t+1) = \omega \cdot v_i^s(t) + C_1 \cdot r_1 \cdot (p_i^s - x_i^s) + C_2 \cdot r_2 \cdot (sb^s - x_i^s) + C_3 \cdot r_3 \cdot (gb^s - x_i^s) \quad (16)

Like the individuals in nature that can communicate with several other swarms, the other class of particles in the MSMOOA can migrate among several other swarms and get information from them. So, the velocity update of these particles is expressed as

v_i^s(t+1) = \omega \cdot v_i^s(t) + C_1 \cdot r_1 \cdot (p_i^s - x_i^s) + C_2 \cdot r_2 \cdot (sb^s - x_i^s) + C_3 \cdot r_3 \cdot (gb^s - x_i^s) + C_4 \cdot r_4 \cdot (sb^k - x_i^s) \quad (17)

where p_i^s (called pbest) is the personal best position of the i-th particle in the s-th swarm; sb^s (called sbest) is the best position in the s-th swarm; gb^s (called gbest) is the global best position selected for the s-th swarm; and sb^k (called osbest) is the best position of the k-th swarm, with which the i-th particle in the s-th swarm can communicate. r_3 and r_4 are uniformly distributed random numbers in the range [0, 1], like r_1 and r_2, and the same holds for C_3 and C_4. The position of all particles is updated as follows:

x_i^s(t+1) = x_i^s(t) + v_i^s(t+1) \quad (18)

1) Update of the pbest for one particle. After updating the position and the velocity of a particle according to Eq. (18) and Eq. (16) or Eq. (17) at every iteration, the fitness of this particle is evaluated according to Eqs. (8), (10) and (12). If the current position dominates the previous pbest, the pbest is updated to the current position. Conversely, it remains unchanged. If the current position and the previous pbest do not dominate each other, the one with the minimum value of this swarm's objective is selected as the new pbest. The update of the pbest for the i-th particle in the s-th swarm is expressed as follows:

p_i^s(t+1) =
\begin{cases}
x_i^s(t+1), & \text{if } x_i^s(t+1) \prec p_i^s(t) \\
p_i^s(t), & \text{if } p_i^s(t) \prec x_i^s(t+1) \\
xp_i^s, & \text{otherwise}
\end{cases} \quad (19)

where xp_i^s indicates, between x_i^s(t+1) and p_i^s(t), the one with the minimum value of the s-th swarm's objective.

2) Update of the LEA and the sbest for one swarm. After updating the position of each particle in one swarm, all particles in this swarm and in the LEA are assessed by the three fitness functions, and non-dominated solutions are added to the LEA via the Pareto dominance-based species technique. If the LEA is not full, all the non-dominated solutions are kept in the LEA. If the size
exceeds the maximal capacity of the LEA, the non-dominated solution with the maximal fitness of this swarm's objective is deleted until the LEA is no longer full. No solution in an LEA dominates any other solution stored in the same LEA. In the MSMOOA, the solutions in each LEA are sorted according to the objective function of its swarm at each iteration. Then, the solution with the minimum fitness of this objective is selected as the new sbest sb^s for this swarm. The process of updating the LEA and the sbest for the s-th swarm is depicted as follows.

Input: all particles after the position update of Eq. (18)
Output: updated LEA and sbest for all particles
1 Evaluate all particles in the s-th swarm and the s-th LEA according to Eqs. (8), (10) and (12)
2 Select all non-dominated solutions and add them to the s-th LEA
3 While (the s-th LEA is full)
   a) Delete the non-dominated solution with the maximal fitness of the objective of the s-th swarm
4 End while
5 Sort the solutions in the s-th LEA by the objective function of the s-th swarm
6 Select the solution with the minimum fitness of this objective as the new sbest sb^s for the s-th swarm

3) Update of the osbest for the second class of particles. The second class of particles can communicate with different swarms through the Intranet and get the sbests of these swarms. For each particle in the second class, if one of these sbests dominates the others, it is selected as the osbest. Otherwise, the sbest with the minimum objective value of the swarm to which the particle belongs is selected as the osbest.

4) Update of the GEA and the gbest for all particles. The GEA, which is the output of the MSMOOA, consists of solutions from all LEAs. After updating all LEAs, the solutions from these LEAs are added to the GEA, and dominated solutions are deleted. If the size of the archive exceeds the maximal capacity of the GEA, it is truncated on the basis of the density of its elements to keep the quality of the output high. The crowding distance [35] is adopted to estimate each element's density; a higher crowding distance value signifies a better solution. So, the crowding distance of every solution in the GEA is calculated at each iteration. Then, a binary tournament with these crowding distance values is performed to update the GEA. By this approach, the most widely distributed elements are retained in the archive.

After the GEA is updated, no solution in the GEA dominates any other solution in it. The selection of a suitable gbest from the GEA for the particles in different swarms with different objectives becomes an important problem, for it directly affects the solution searching ability and the convergence of the algorithm. In the MSMOOA, the two external archives, LEA and GEA, store the non-dominated optimal solutions of one swarm and of all swarms, respectively. For one swarm, the LEA stores the found non-dominated solutions with respect to the objective of this swarm, and the sbest, which has the minimum fitness of this objective, is selected to guide the particles in this swarm at each iteration. So, the selection of the gbest for the particles in different swarms should give priority to another objective.

In the MSMOOA, all solutions in the GEA are first sorted into multiple queues according to the fitness of the different objectives. Then, for one swarm with a predefined objective, a solution in the GEA with the minimum fitness of another objective is randomly selected as the gbest. The general steps of updating the GEA and the gbest are depicted as follows.

Input: all particles after updating the LEA and sbest
Output: updated GEA and gbest for all particles
1 Add all solutions from the LEAs to the GEA
2 Delete the dominated solutions from the GEA
3 While (the GEA is full)
   a) Compute the crowding distance of every solution in the GEA
   b) Update the GEA by the binary tournament with these crowding distance values
4 End while
5 Sort the solutions into multiple queues according to the fitness of the different objectives
6 For s=1 to S (S is the number of all swarms)
   a) Select a solution with the minimum fitness of another objective as the gbest from the GEA
7 End for

The general steps of the MSMOOA-based workflow scheduling in the cloud computing system are outlined as follows.

Input: DAG, the information of all available VMs in the manager server
Output: the set of non-dominated solutions for the DAG
1 For s=1 to S (S is the number of all PMs)
   a) For i=1 to Snum_s (Snum_s is the size of the particle swarm in the s-th swarm)
      i) Select one objective as the main function for the s-th swarm
      ii) Determine the velocity v_i^s and position x_i^s randomly
      iii) Initialize the pbest of the i-th particle in
J. Cent. South Univ. (2017) 24: 1050−1062 1057

the s-th swarm pis  xis


  b) End For
  c) Update the LEAs and sbest
2 End For
3 Select the osbest for the second class of particles
4 Update the GEA and gbest
5 For t=1 to T (T is the maximum number of iterations)
  a) For s=1 to S (S is the number of all PMs)
    i) For i=1 to Snum_s (Snum_s is the size of the particle swarm in the s-th swarm)
      1) Calculate the new velocity v_i^s according to Eq. (16) or (17) and the new position x_i^s according to Eq. (18)
      2) Update the pbest p_i^s according to Eq. (19)
    ii) End For
    iii) Update the LEAs and sbest
  b) End For
  c) Select the osbest for the second class of particles
  d) Update the GEA and gbest
6 End For
7 Return the Pareto front (the set of non-dominated solutions from the GEA)

By using the above method, each available PM concentrates on searching for its own non-dominated optima through the two classes of particles. Based on the non-dominated particles and the Pareto-optimal solutions of every PM, the GEA is able to evolve a diverse and well-distributed near-optimal Pareto front. Thus, the proposed algorithm brings cooperative search among the swarms to the multi-objective workflow scheduling in a cloud computing system and obtains a set of non-dominated scheduling strategies.

4 Performance evaluation

In this section, the overall setup of our experiments and the results obtained from them are described to validate the proposed MSMOOA. In our experiments, two well-known workflow applications, Montage and Epigenomics [36], are chosen as test cases. The Montage workflow, created by the NASA/IPAC Infrared Science Archive, is a hybrid workflow that can be used to generate custom mosaics of the sky from input images in the Flexible Image Transport System format. The Epigenomics workflow, created by the USC Epigenome Center and the Pegasus Team, is a parallel workflow used to automate various operations in genome sequence processing. Figure 4 shows their simplified representations; DEELMAN et al [3] provided a detailed characterization of each workflow.

Fig. 4 Structure of evaluated workflows: (a) Montage; (b) Epigenomics

The WorkflowSim toolkit [37] has been chosen as the simulation platform, as it is a modern simulation framework aimed at workflow scheduling in cloud computing environments. The experiments were performed on 5 PMs, with the available VMs changing dynamically. The pricing models of Amazon EC2 (http://aws.amazon.com/ec2) and Amazon CloudFront (http://aws.amazon.com/cloudfront/) are chosen as the processing costs and the data transfer costs, respectively. To estimate the energy consumption, we rely on the cube-root rule [21, 38], which approximates the clock frequency of a chip as the cube root of its power consumption.

We evaluate the performance of the proposed MSMOOA on the workflow instances described previously and compare our results with those of another multi-objective workflow scheduling algorithm for cloud systems, MOHEFT [20], by comparing the quality of the solutions computed by these algorithms. Both MOHEFT and MSMOOA can offer a set of tradeoff solutions to users. To facilitate comparison, we take the single result from HEFT, one of the most widely used heuristics for workflow scheduling in distributed heterogeneous computing systems, as the baseline and report the gain over this baseline for the multiple scheduling objectives; a larger gain indicates a better result. We also compare the running time of each algorithm, i.e., its execution time for obtaining the output schedule of a given workflow.

As mentioned before, both MOHEFT and MSMOOA offer a set of tradeoff solutions, so the experiments run MOHEFT and MSMOOA ten times each and first compare the gain of makespan, cost and energy consumption over HEFT by the average results of
MOHEFT and MSMOOA, respectively. Secondly, one solution is selected from the results of MSMOOA and of MOHEFT, respectively; this solution is the closest one to the result of HEFT in the sense of Euclidean distance. Then, we consider the gain of makespan, cost and energy consumption over HEFT achieved by the selected solution from MSMOOA and from MOHEFT, respectively. Finally, in order to compare the performance of MOHEFT and MSMOOA, we focus on the hypervolume indicator [39], a useful indicator to measure the quality of a set of tradeoff solutions.

Table 1 Average gain over HEFT of Montage workflow

Workflow        MOHEFT average/%          MSMOOA average/%
                Time    Cost    Energy    Time    Cost    Energy
Montage_25      0.18    11.31    9.74     0.15    14.21   12.11
Montage_50      0.20    13.17   17.67     0.14    15.55   14.57
Montage_100     0.17    16.06   14.58     0.08    19.95   16.80
Montage_1000    0.23    16.83   14.25     0.25    20.37   19.39

Fig. 5 Comparative analysis of Montage workflow: (a) Makespan; (b) Cost; (c) Energy; (d) Hypervolume

The experiment results for the Montage workflow are shown in Table 1 and Fig. 5. From Table 1, we can see that MSMOOA obtains results similar to those of MOHEFT in makespan, and both are better than HEFT. Moreover, the average gain in makespan of MSMOOA is better than that of MOHEFT for Montage_1000, while the average makespan of MOHEFT is better than that of MSMOOA for the three other workflows. Although the performance of MOHEFT in makespan does not degrade compared with MSMOOA, MOHEFT is initialized with the solution computed by HEFT, whereas MSMOOA is initialized with stochastic solutions; hence, MSMOOA has the better search ability. As for cost and energy consumption, both MOHEFT and MSMOOA show significant improvement, and the average gain of MSMOOA is better than that of MOHEFT in most cases. This is because HEFT is a single-objective scheduling algorithm that does not consider other objectives, such as cost and energy consumption, and because MSMOOA designs the multi-objective workflow scheduling around the structural characteristics of the cloud data center, which MOHEFT leaves out of consideration.
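The gains reported in Table 1 are relative improvements over the HEFT baseline. As an illustrative sketch only (the paper does not spell out the exact formula, and the function name and sample values below are assumptions, not data from the experiments), such a gain can be computed as:

```python
def gain_over_baseline(baseline, value):
    """Percentage improvement of `value` over `baseline` for a
    minimization objective such as makespan, cost or energy."""
    return 100.0 * (baseline - value) / baseline

# Hypothetical averages over ten runs of one scheduler on one workflow:
heft_cost = 250.0     # cost of the single HEFT schedule (baseline)
msmooa_cost = 199.25  # average cost of the MSMOOA schedules
print(f"gain over HEFT in cost: {gain_over_baseline(heft_cost, msmooa_cost):.2f}%")
```

Under this definition a larger gain is better, which matches how the tables are read.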
Performance analysis of one selected solution from MOHEFT and from MSMOOA is given in Fig. 5 for the Montage workflow. This selected solution is the closest one to the result of HEFT in the sense of Euclidean distance, and the gain of makespan, cost and energy consumption over HEFT achieved by the selected solution of MOHEFT and of MSMOOA is displayed in Figs. 5(a)−(c). MOHEFT and MSMOOA obtain results similar to that of HEFT in makespan, as shown in Fig. 5(a). As for cost and energy consumption, MSMOOA and MOHEFT perform better than HEFT, as shown in Figs. 5(b) and (c), and the improvement becomes more and more obvious as the number of workflow tasks increases. In the case of Montage_25 (25 tasks), the gain over HEFT in cost is 13.30% and 15.21% for MOHEFT and MSMOOA, respectively, and the gain over HEFT in energy consumption is 11.10% and 13.06%. When the number of tasks is 1000 (Montage_1000), the gain in cost is 17.40% and 21.20%, respectively, and the gain in energy consumption is 14.10% and 19.77%. We can also see that the performance of MSMOOA is better than that of MOHEFT, as illustrated in Fig. 5(d). The explanation for this behavior is that, in MSMOOA, multiple swarms are designed to find the solutions collaboratively, with each swarm corresponding to one objective of the multi-objective workflow scheduling. At the same time, two classes of particles are designed to promote information sharing among these swarms, and different strategies are adopted for updating the LEAs and the GEA. Therefore, the selected solution of MSMOOA is better than that of MOHEFT.

Furthermore, we also compare the running time used by MOHEFT, MSMOOA and MOPSO [32] for each Montage workflow; the results are presented in Fig. 6. The compared time of each algorithm is again the average over ten runs. The running times of all algorithms increase with the growing task number, as shown in Fig. 6. It can also be seen that the running times of MSMOOA and MOPSO are longer than those of MOHEFT. The reason for this phenomenon is that both MSMOOA and MOPSO are meta-heuristic methods, while MOHEFT is a list scheduling algorithm based on HEFT. Considering the results compared previously, it is worthwhile to spend this additional time for a better scheduling result, especially in terms of the cost and energy consumption. The results also indicate that MSMOOA outperforms MOPSO. This is due to the fact that MSMOOA needs fewer iterations than MOPSO, although the time used for each iteration of MSMOOA is longer than that of MOPSO.

Fig. 6 Average running time of algorithms for the Montage workflow

As for the Epigenomics workflow, the results summarized in Fig. 7, Fig. 8 and Table 2 are similar to those of the previous experiment and confirm the findings for the Montage workflow in terms of the compared metrics. The gain over HEFT in makespan, cost and energy consumption is shown in Figs. 7(a), (b) and (c), respectively. The hypervolume of MSMOOA and MOHEFT is indicated in Fig. 7(d). The comparison of running time is presented in Fig. 8.

5 Conclusions

1) A manager server is adopted to avoid the influence of the elastic characteristic of the cloud system on the scheduling results and to collect the information of the available computational resources when the system accepts a workflow submitted by customers.
2) The proposed MSMOOA can find better non-dominated solutions effectively, as has been proved by the experiments.
3) Compared with HEFT and MOHEFT by simulating them with both hybrid and parallel workflow applications having different structures, the MSMOOA
has better performance than HEFT and MOHEFT, not only in terms of the cost but also in terms of the energy consumption.

Fig. 7 Comparative analysis of Epigenomics workflow: (a) Makespan; (b) Cost; (c) Energy; (d) Hypervolume

Table 2 Average gain over HEFT of Epigenomics workflow

Workflow          MOHEFT average/%          MSMOOA average/%
                  Time    Cost    Energy    Time    Cost    Energy
Epigenomics_24    0.06     6.60    5.40     0.10     9.03    6.63
Epigenomics_46    0.10     8.00    7.70     0.06    11.31    9.92
Epigenomics_100   0.04     9.30   11.30     0.03    14.02   14.85
Epigenomics_997   0.07    11.30   14.40     0.11    17.69   19.36
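The hypervolume values compared in Figs. 5(d) and 7(d) measure how much of the objective space a set of tradeoff solutions dominates with respect to a reference point. For two minimization objectives it can be computed with a simple sweep; the front and reference point below are made-up values for illustration, not data from the experiments:

```python
def hypervolume_2d(front, ref):
    """Area dominated by a 2-objective (minimization) front,
    bounded by the reference point `ref`."""
    # Keep only points that strictly dominate the reference point,
    # sorted by the first objective.
    pts = sorted(p for p in front if p[0] < ref[0] and p[1] < ref[1])
    area, level = 0.0, ref[1]
    for f1, f2 in pts:
        # Each non-dominated point adds a rectangle between itself,
        # the current f2 level and the reference point.
        if f2 < level:
            area += (ref[0] - f1) * (level - f2)
            level = f2
    return area

# Hypothetical (makespan, cost) front and reference point:
front = [(10.0, 5.0), (8.0, 7.0), (12.0, 4.0)]
print(hypervolume_2d(front, ref=(15.0, 10.0)))  # larger is better
```

Dominated points contribute nothing to the sweep, so the function can be applied directly to an archive such as the GEA.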
Fig. 8 Average running time of algorithms for the Epigenomics workflow

4) The MSMOOA can only be used in a stable computing environment; in the real world, the computing resources are not available all the time. In future work, other objectives, such as robustness, reliability and security, in
addition to the objectives in this work should be considered.

References

[1] YU J, BUYYA R. Scheduling scientific workflow applications with deadline and budget constraints using genetic algorithms [J]. Scientific Programming, 2006, 14(3): 217−230.
[2] VISWANATHAN S, VEERAVALLI B, ROBERTAZZI T G. Resource-aware distributed scheduling strategies for large-scale computational cluster/grid systems [J]. IEEE Transactions on Parallel and Distributed Systems, 2007, 18(10): 1450−1461.
[3] DEELMAN E, VAHI K, JUVE G, RYNGE M, CALLAGHAN S, MAECHLING P, MAYANI R, CHEN W, SILVA R F, LIVNY M, WENGER K. Pegasus, a workflow management system for science automation [J]. Future Generation Computer Systems, 2015, 46: 17−35.
[4] PINEDO M L. Scheduling: Theory, algorithms, and systems [M]. New York: Springer, 2012.
[5] TOPCUOGLU H, HARIRI S, WU M. Performance-effective and low-complexity task scheduling for heterogeneous computing [J]. IEEE Transactions on Parallel and Distributed Systems, 2002, 13(3): 260−274.
[6] BRAUN T D, SIEGEL H J, BECK N, BÖLÖNI L L, MAHESWARAN M, REUTHER A I, ROBERTSON J P, THEYS M D, YAO B, HENSGEN D, FREUND R F. A comparison of eleven static heuristics for mapping a class of independent tasks onto heterogeneous distributed computing systems [J]. Journal of Parallel and Distributed Computing, 2001, 61(6): 810−837.
[7] CAO Jun-wei, HWANG K, LI Ke-qin, ZOMAYA A Y. Optimal multiserver configuration for profit maximization in cloud computing [J]. IEEE Transactions on Parallel and Distributed Systems, 2013, 24(6): 1087−1096.
[8] TSAI C, HUANG Wei-cheng, CHIANG M H, CHIANG M C, YANG Chu-sing. A hyper-heuristic scheduling algorithm for cloud [J]. IEEE Transactions on Cloud Computing, 2014, 2(2): 236−250.
[9] LUO Jiang-ying, RAO Lei, LIU Xue. Temporal load balancing with service delay guarantees for data center energy cost optimization [J]. IEEE Transactions on Parallel and Distributed Systems, 2014, 25(3): 775−784.
[10] BUYYA R, YEO C S, VENUGOPAL S, BROBERG J, BRANDIC I. Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility [J]. Future Generation Computer Systems, 2009, 25(6): 599−616.
[11] FU Zhang-jie, REN Kui, SHU Jian-gang, SUN Xing-ming, HUANG Feng-xiao. Enabling personalized search over encrypted outsourced data with efficiency improvement [J]. IEEE Transactions on Parallel and Distributed Systems, 2016, 27(9): 2546−2559.
[12] XIA Zhi-hua, WANG Xin-hui, SUN Xing-ming, WANG Qian. A secure and dynamic multi-keyword ranked search scheme over encrypted cloud data [J]. IEEE Transactions on Parallel and Distributed Systems, 2015, 27(3): 340−352.
[13] FU Zhang-jie, SUN Xing-ming, LIU Qi, ZHOU Lu, SHU Jian-gang. Achieving efficient cloud search services: Multi-keyword ranked search over encrypted cloud data supporting parallel computing [J]. IEICE Transactions on Communications, 2015, E98-B(1): 190−200.
[14] REN Yong-jun, SHEN Jian, WANG Jin, HAN Jin, LEE S Y. Mutual verifiable provable data auditing in public cloud storage [J]. Journal of Internet Technology, 2015, 16(2): 317−323.
[15] GARG S K, BUYYA R, SIEGEL H J. Scheduling parallel applications on utility grids: Time and cost trade-off management [C]// The Thirty-Second Australasian Conference on Computer Science. Australia: Australian Computer Society Inc., 2009: 151−160.
[16] TENG S, HAY L L, PENG C E. Multi-objective ordinal optimization for simulation optimization problems [J]. Automatica, 2007, 43(11): 1884−1895.
[17] ZHAN Fan, CAO Jun-wei, LI Ke-qin, KHAN S U, HWANG K. Multi-objective scheduling of many tasks in cloud platforms [J]. Future Generation Computer Systems, 2014, 37: 309−320.
[18] TAO Fei, FENG Ying, ZHANG Lin, LIAO T W. CLPS-GA: A case library and Pareto solution-based hybrid genetic algorithm for energy-aware cloud service scheduling [J]. Applied Soft Computing, 2014, 19: 264−279.
[19] DURILLO J J, NAE V, PRODAN R. Multi-objective energy-efficient workflow scheduling using list-based heuristics [J]. Future Generation Computer Systems, 2014, 36: 221−236.
[20] DURILLO J J, PRODAN R. Multi-objective workflow scheduling in Amazon EC2 [J]. Cluster Computing, 2014, 17(2): 169−189.
[21] FARD H M, PRODAN R, FAHRINGER T. Multi-objective list scheduling of workflow applications in distributed computing infrastructures [J]. Journal of Parallel and Distributed Computing, 2014, 74(3): 2152−2165.
[22] YASSA S, CHELOUAH R, KADIMA H, GRANADO B. Multi-objective approach for energy-aware workflow scheduling in cloud computing environments [J]. The Scientific World Journal, 2013, doi: 10.1155/2013/350934.
[23] CHENG Ji-xiang, ZHANG Ge-xiang, LI Zhi-dan, LI Yuan-quan. Multi-objective ant colony optimization based on decomposition for bi-objective traveling salesman problems [J]. Soft Computing, 2012, 16(4): 597−614.
[24] GÓMEZ J, GIL C, BAÑOS R, MÁRQUEZ A L, MONTOYA F G, MONTOYA M G. A Pareto-based multi-objective evolutionary algorithm for automatic rule generation in network intrusion detection systems [J]. Soft Computing, 2013, 17(2): 255−263.
[25] KENNEDY J, EBERHART R C. Particle swarm optimization [C]// The 1995 IEEE International Conference on Neural Networks. Perth: IEEE, 1995: 1942−1948.
[26] COELLO C A C. Evolutionary multi-objective optimization: A historical view of the field [J]. IEEE Computational Intelligence Magazine, 2006, 1(1): 28−36.
[27] GAO Lei, HAILU A. Comprehensive learning particle swarm optimizer for constrained mixed-variable optimization problems [J]. International Journal of Computational Intelligence Systems, 2010, 3(6): 832−842.
[28] HU Yi-fan, DING Yong-sheng, HAO Kuang-rong, REN Li-hong, HAN Hua. An immune orthogonal learning particle swarm optimization algorithm for routing recovery of wireless sensor networks with mobile sink [J]. International Journal of Systems Science, 2014, 45(3): 337−350.
[29] HU Yi-fan, DING Yong-sheng, REN Li-hong, HAO Kuang-rong, HAN Hua. An endocrine cooperative particle swarm optimization algorithm for routing recovery of wireless sensor networks with multiple mobile sinks [J]. Information Sciences, 2015, 300: 100−113.
[30] COELLO C A C, PULIDO G T, LECHUGA M S. Handling multiple objectives with particle swarm optimization [J]. IEEE Transactions on Evolutionary Computation, 2004, 8(3): 256−279.
[31] LIU Da-sheng, TAN K C, GOH C K, HO W K. A multiobjective memetic algorithm based on particle swarm optimization [J]. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2007, 37(1): 42−50.
[32] GAO Hong-yuan, CAO Jin-long. Non-dominated sorting quantum particle swarm optimization and its application in cognitive radio spectrum allocation [J]. Journal of Central South University, 2013, 20(7): 1878−1888.
[33] YEN G G, DANESHYARI M. Diversity-based information exchange among multiple swarms in particle swarm optimization [J]. International Journal of Computational Intelligence and Applications, 2008, 7(1): 57−75.
[34] PARSOPOULOS K E, TASOULIS D K, VRAHATIS M N. Multiobjective optimization using parallel vector evaluated particle swarm optimization [C]// The IASTED International Conference on Artificial Intelligence and Applications. America: IEEE, 2004, 2: 823−828.
[35] DEB K, PRATAP A, AGARWAL S, MEYARIVAN T. A fast and elitist multiobjective genetic algorithm: NSGA-II [J]. IEEE Transactions on Evolutionary Computation, 2002, 6(2): 182−197.
[36] JUVE G, CHERVENAK A, DEELMAN E, BHARATHI S, MEHTA G, VAHI K. Characterizing and profiling scientific workflows [J]. Future Generation Computer Systems, 2013, 29(3): 682−692.
[37] CHEN Wei-wei, DEELMAN E. WorkflowSim: A toolkit for simulating scientific workflows in distributed environments [C]// IEEE 8th International Conference on E-Science (e-Science). Chicago: IEEE, 2012: 1−8.
[38] BROOKS D M, BOSE P, SCHUSTER S E, JACOBSON H, KUDVA P N, BUYUKTOSUNOGLU A, WELLMAN J, ZYUBAN V, GUPTA M, COOK P W. Power-aware microarchitecture: Design and modeling challenges for next-generation microprocessors [J]. IEEE Micro, 2000, 20(6): 26−44.
[39] ZITZLER E, THIELE L, LAUMANNS M, FONSECA C M, FONSECA V G. Performance assessment of multiobjective optimizers: An analysis and review [J]. IEEE Transactions on Evolutionary Computation, 2003, 7(2): 117−132.

(Edited by YANG Hua)

Cite this article as: YAO Guang-shun, DING Yong-sheng, HAO Kuang-rong. Multi-objective workflow scheduling in cloud system based on cooperative multi-swarm optimization algorithm [J]. Journal of Central South University, 2017, 24(5): 1050−1062. DOI: 10.1007/s11771-017-3508-7.