Network Failure-Aware Redundant Virtual Machine Placement in A Cloud Data Center
Abstract—Cloud has become a very popular infrastructure for many smart city applications, and a growing number of smart city applications from all over the world are deployed on clouds. However, node failure events in a cloud data center have a negative impact on the performance of smart city applications. Survivable virtual machine placement has been proposed by researchers to enhance service reliability. Because they ignore switch failures, current survivable virtual machine placement approaches cannot achieve the best effect. In this paper, we enhance service reliability by designing a novel network failure-aware redundant virtual machine placement approach for a cloud data center. First, we formulate the network failure-aware redundant virtual machine placement problem as an integer non-linear programming problem and prove that the problem is NP-hard. Second, we propose a heuristic algorithm to solve the problem. Finally, extensive simulation results show the effectiveness of our algorithm.
Index Terms—cloud computing, virtual machine placement, reliability, smart city application
1 INTRODUCTION
machine. In addition, there is no inter-traffic between a backup virtual machine and other active virtual machines before a failure event occurs. However, the backup virtual machine must interact with the other active virtual machines after a failure event occurs. Without considering this condition, existing virtual machine placement approaches consume too much upper-layer network resource. The core layer, which is the bottleneck of a fat-tree data center, would be over-utilized.
In this paper, we investigate the network failure-aware redundant virtual machine placement problem in a cloud data center. Our contributions are summarized as follows:
1) By mining the characteristics of the data center network, we specifically investigate how to apply the replication technique to enhance service reliability. We formulate the network failure-aware redundant virtual machine placement problem as an Integer Non-Linear Programming (INLP) problem.
2) We prove that the network failure-aware redundant virtual machine placement problem is NP-hard. An efficient heuristic algorithm, EFAP (Edge switch Failure-Aware Placement), is presented for the problem.
3) Extensive simulation results show that our algorithm can ensure service reliability and avoid adding burden to the bottleneck of the data center.
The rest of this paper is organized as follows. Section 2 describes related work. Section 3 presents the system model and the challenges in network failure-aware redundant virtual machine placement. The problem formulation is given in Section 4. Section 5 presents the details of the proposed algorithm. Section 6 shows experiment results, and Section 7 concludes the paper.
2 RELATED WORK
Many notable cloud service reliability enhancement approaches have been proposed by researchers.
The checkpoint mechanism achieves fault tolerance by saving the virtual machine state as a checkpoint image periodically during failure-free execution [16]. A checkpoint image contains the whole recovery information. Because of the dynamic nature of Infrastructure as a Service clouds, it is hard to design an efficient checkpoint mechanism. To address the problem, [17] proposed an optimal checkpoint mechanism aiming at minimizing the performance overhead and storage resource consumption. In order to efficiently save the complete running state of the application, the proposed checkpoint mechanism leverages new functions, such as disk-image multi-snapshotting and inside checkpoint protocols. [18] presented an incremental checkpoint mechanism for the cloud data center. To reduce the network resource consumption and the time needed to take a checkpoint image, only modifications relative to the latest stored checkpoint image are checkpointed. [19, 20] presented a distributed checkpoint image storage system for fat-tree data centers to reduce the upper-layer data center network resource consumption. In the distributed checkpoint image storage system, the checkpoint images are stored on nearby host servers.
The checkpoint mechanism is suitable for large-scale computation-intensive services. To address the problem, replication is another mechanism that can be employed.
Taking the quality of service requirements of applications into consideration, [21] presented two optimal data replication approaches for cloud computing environments. [22] proposed a k-fault-tolerant virtual machine placement approach. The proposed approach can minimize the number of running servers while satisfying the quality of service requirements under any k physical server failures. Only atomic services are considered in these approaches. [14, 15, 23] proposed reliability enhancement approaches for complex applications. [14] proposed a reliable virtual data center mapping algorithm. Considering both the failure characteristics of hardware and the impact of individual failures on service reliability, the mapping algorithm tries to achieve high reliability and low cost. To improve the reliability and performance of applications deployed in a cloud data center, [23] presented a structural constraint-aware virtual machine placement algorithm. The service availability is formulated as a combination of collocation/anti-collocation constraints. In a cloud data center, the Infrastructure as a Service provider intends to place a group of virtual machines with high inter-traffic in the same subnet to reduce intra-network traffic. However, the service reliability cannot be ensured. A novel virtual machine placement algorithm is proposed to minimize the network traffic under reliability constraints. [15] proposed an availability-aware virtual machine mapping algorithm to improve the network resource utilization and service reliability for multi-tier applications. These approaches do not employ the replication mechanism.
In [11], a backup virtual machine is created for each critical virtual machine to enhance survivability. When a server fails or a virtual machine failure occurs, the affected service is switched over to the backup virtual machine. The group of correlated virtual machines and their backups composes a survivable virtual machine set. An efficient algorithm is proposed to map the survivable virtual machine set to a cloud data center. [24] identified the significant components of a complex application and determined the most suitable reliability enhancement strategy for each identified component. The significance of each component is calculated based on the invocation frequencies and invocation structures. However, the traditional redundant virtual machine placement approaches may become useless when a network failure occurs. We will address the above-mentioned problems in this paper.
on host servers. Several binary variables are defined for virtual machine placement:

$$X_{vm}^{k}=\begin{cases}1, & \text{if virtual machine } vm \text{ is placed on } hs_k\\ 0, & \text{otherwise}\end{cases} \quad (2)$$

$$Y_{vm}^{k}=\begin{cases}1, & \text{if virtual machine } vm \text{ is placed on } sub_k\\ 0, & \text{otherwise}\end{cases} \quad (3)$$

$X_{vm}^{k}$ is equal to 1 when virtual machine $vm$ is placed on host server $hs_k$; otherwise, $X_{vm}^{k}$ is equal to 0. $Y_{vm}^{k}$ is equal to 1 when virtual machine $vm$ is placed on a host server in subnet $sub_k$. The following should be satisfied for any $vm_{v_p,i}$:

$$\sum_{j} X_{vm_{v_p,i}^{B}}^{j} = 1 \quad (5)$$

A host server should have enough available computing resource to allocate to the virtual machines placed on it. These constraints can be expressed by the following:

$$r_{vm_{v_p,i}}^{cpu} X_{vm_{v_p,i}}^{j} \le c_{j}^{cpu} \quad (6)$$

$$r_{vm_{v_p,i}^{B}}^{cpu} X_{vm_{v_p,i}^{B}}^{j} \le c_{j}^{cpu} \quad (9)$$

$$r_{vm_{v_p,i}^{B}}^{mem} X_{vm_{v_p,i}^{B}}^{j} \le c_{j}^{mem} \quad (10)$$

$$r_{vm_{v_p,i}^{B}}^{disk} X_{vm_{v_p,i}^{B}}^{j} \le c_{j}^{disk} \quad (11)$$

All backup virtual machines allocated for a specific service should be placed on host servers in different subnets. When a primary virtual machine becomes unavailable, the corresponding sub-service is quickly switched to the backup virtual machine, which then becomes a primary virtual machine. Otherwise, two primary virtual machines could end up on host servers in the same subnet after failure events occur. This constraint can be expressed by the following:

$$Y_{vm_{v_p,i}^{B}}^{k} \cdot Y_{vm_{v_p,j}^{B}}^{k} = 0 \quad (14)$$

4.2 Virtual Machine Interaction Cost
The virtual machines interact with each other to provide a complex service, and a large amount of data is transferred between the virtual machines over the data center network. Different virtual machine placement strategies result in different network resource consumption, which we aim to minimize. Three types of communication consume data center network resource.

$$C_p(vm_{v_p,i}, vm_{v_p,j}) = \sum_{m}\sum_{n} X_{vm_{v_p,i}}^{m} X_{vm_{v_p,j}}^{n}\, dt(vm_{v_p,i}, vm_{v_p,j})\, ds(hs_m, hs_n) \quad (16)$$

$$C_P = \sum_{i}\sum_{j} C_p(vm_{v_p,i}, vm_{v_p,j}) \quad (17)$$
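Under a fixed placement, Eqs. (16)-(17) collapse to summing $dt \times ds$ over VM pairs, because each placement variable $X$ selects exactly one host per virtual machine. The following is a minimal sketch of that computation, not the paper's implementation; `placement`, `dt`, and `ds` are illustrative names, and hosts are labeled by assumed (pod, subnet) tuples:

```python
# Sketch of the total pairwise interaction cost C_P of Eqs. (16)-(17).
# Since each VM is placed on exactly one host, the double sum over hosts
# m, n in Eq. (16) reduces to a single distance lookup per VM pair.

def primary_interaction_cost(placement, dt, ds):
    """C_P: sum over VM pairs of dt(vm_i, vm_j) * ds(host(vm_i), host(vm_j)).

    placement: {vm_id: host_label}; dt: {(vm_a, vm_b): data_rate};
    ds: distance function between two host labels (Eq. (22))."""
    vms = list(placement)
    total = 0.0
    for a in range(len(vms)):
        for b in range(a + 1, len(vms)):
            i, j = vms[a], vms[b]
            rate = dt.get((i, j), dt.get((j, i), 0.0))
            total += rate * ds(placement[i], placement[j])
    return total

# Toy distance per Eq. (22), with hosts labeled (pod, subnet) for brevity.
def toy_ds(m, n):
    if m == n:
        return 0
    if m[0] == n[0] and m[1] == n[1]:
        return 2
    if m[0] == n[0]:
        return 4
    return 6

# Example: vm1-vm2 share a pod (distance 4), vm2-vm3 span pods (distance 6),
# so C_P = 4 * 1.2 + 6 * 0.8 = 9.6.
placement = {"vm1": (0, 0), "vm2": (0, 1), "vm3": (1, 0)}
dt = {("vm1", "vm2"): 1.2, ("vm2", "vm3"): 0.8}
print(primary_interaction_cost(placement, dt, toy_ds))
```

The backup costs $C_B$ and $C_V$ of the following equations have the same shape, with the backup VM's host substituted for one side of each pair.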
The network resource consumption for synchronizing a primary virtual machine with its backup is calculated by the following:

$$C_b(vm_{v_p,i}) = \sum_{m}\sum_{n} X_{vm_{v_p,i}}^{m} X_{vm_{v_p,i}^{B}}^{n}\, dt(vm_{v_p,i}, vm_{v_p,i}^{B})\, ds(hs_m, hs_n) \quad (18)$$

$$C_B = \sum_{i} C_b(vm_{v_p,i}) \quad (19)$$

When a failure event occurs, the interrupted sub-application is switched to the backup virtual machine. $C_V$ denotes the network resource consumption between a backup virtual machine and other virtual machines after a failure event occurs.

$$C_v(vm_{v_p,i}^{B}, vm) = \sum_{m}\sum_{n} X_{vm_{v_p,i}^{B}}^{m} X_{vm}^{n}\, dt(vm_{v_p,i}, vm)\, ds(hs_m, hs_n) \quad (20)$$

$$C_V = \sum_{i} \sum_{vm \in VM(vm_{v_p,i}^{B})} C_v(vm_{v_p,i}^{B}, vm) \quad (21)$$

where $VM(vm_{v_p,i}^{B})$ denotes the virtual machines that need to interact with $vm_{v_p,i}^{B}$ after failure events occur.

In a fat-tree cloud data center, no data needs to be transferred over the data center network when two virtual machines are placed on the same host server. When two host servers are in the same subnet, the data is transferred by one edge switch; the distance is 2. When the two host servers are in different subnets of the same pod, the data is transferred by two edge switches and an aggregation switch; the distance is 4. When the two host servers are in different pods, the data is transferred by two edge switches, two aggregation switches, and a core switch; the distance is 6. Therefore, the distance between two host servers is defined as follows:

$$ds(hs_m, hs_n)=\begin{cases}0, & \text{if } m=n\\ 2, & \text{if } hs_m \text{ and } hs_n \text{ are in the same subnet}\\ 4, & \text{if } hs_m \text{ and } hs_n \text{ are in the same pod}\\ 6, & \text{otherwise}\end{cases} \quad (22)$$

If the virtual machine placement problem involving only primary virtual machines can be solved, then it can be used to solve the multidimensional packing problem [25, 26]. Consider a special case of the network failure-aware redundant virtual machine placement problem: there are m host servers in the data center and n primary virtual machines. Suppose there is no critical sub-application; therefore, the number of backup virtual machines is 0. The problem is to place all virtual machines on host servers with the goal of minimizing the network resource consumption. It is easy to see that an algorithm for solving this virtual machine placement problem can be used to solve the multidimensional packing problem. The multidimensional packing problem is NP-hard [27, 28]. The proof outlined above shows that even the simpler case involving only primary virtual machines makes the problem NP-hard. Hence the network failure-aware redundant virtual machine placement problem is NP-hard. End of proof.
There are tens of thousands of host servers in a data center. Therefore, it is impractical to iterate over all primary and backup virtual machine placement strategies. We propose a heuristic algorithm to solve the INLP in this paper.
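The distance function of Eq. (22) follows directly from a host's position in the fat-tree. A minimal Python sketch, assuming (for illustration only; the paper indexes hosts simply as $hs_m$) that each host is labeled by a (pod, subnet, slot) tuple:

```python
def ds(host_m, host_n):
    """Hop-count distance of Eq. (22) between two fat-tree hosts.

    Hosts are (pod, subnet, slot) tuples -- a labeling assumed here for
    illustration. Same host: 0 (no network transfer). Same subnet: the
    data crosses 1 edge switch, distance 2. Same pod, different subnets:
    2 edge switches plus 1 aggregation switch, distance 4. Different
    pods: 2 edge, 2 aggregation, and 1 core switch, distance 6."""
    if host_m == host_n:
        return 0
    if host_m[:2] == host_n[:2]:   # same pod and same subnet
        return 2
    if host_m[0] == host_n[0]:     # same pod, different subnets
        return 4
    return 6

# Example: two hosts in pod 0 but in different subnets.
print(ds((0, 1, 3), (0, 2, 5)))  # 4
```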
5 PROPOSED ALGORITHM
Our goal is to calculate the optimal primary and backup virtual machine placement strategy while minimizing the total network resource consumption. Based on the topology of the data center network, a heuristic algorithm is proposed to solve the problem. The communication distance is shorter when two physical servers are in the same pod. Therefore, we attempt to place the virtual machines with high inter-traffic in the same pod. The algorithms are illustrated in Algorithm 1 and Algorithm 2. As shown in Algorithm 1, we first sort all links by inter-traffic size and add all pods to a candidate pod set. Second, we iterate over all pods and determine the optimal pod, i.e., the one that can host the largest number of virtual machines. Third, the virtual machines that can be placed on the current optimal pod are removed from the virtual machine set, and the current optimal pod is removed from the candidate pod set. These steps are repeated until all virtual machines have been placed. As shown in Algorithm 2, we iterate over the links in the link list. The links in the list have already been sorted in descending order of data rate (in Algorithm 1). A link is in the list if at least one node (denoting a virtual machine) of the link has not been placed. Then, we select available subnet(s) for the unplaced node(s). The function find is used to find an available host server. These steps are repeated until there is no available host server in the current pod. "Available" denotes that constraints (6)-(15) are satisfied.
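The outer loop of Algorithm 1 can be sketched as follows. This is a simplified illustration of the steps described above, not the authors' implementation: capacity checking is reduced to a per-pod VM slot count, standing in for constraints (6)-(15), and all names are hypothetical.

```python
# Simplified sketch of Algorithm 1's pod-selection loop: repeatedly pick
# the candidate pod that can host the most still-unplaced virtual
# machines, filling it along the traffic-sorted link list so VM pairs
# with high inter-traffic land in the same pod.

def greedy_pod_placement(vms, links, pod_slots):
    """vms: iterable of VM ids; links: {(vm_a, vm_b): data_rate};
    pod_slots: {pod_id: free VM slots}. Returns {vm_id: pod_id}."""
    # Sort links in descending order of inter-traffic, as in Algorithm 1.
    ordered = sorted(links.items(), key=lambda kv: kv[1], reverse=True)
    unplaced = set(vms)
    candidates = dict(pod_slots)        # candidate pod set
    assignment = {}
    while unplaced and candidates:
        # Optimal pod: the one able to host the most unplaced VMs.
        best_pod = max(candidates,
                       key=lambda p: min(candidates[p], len(unplaced)))
        capacity = candidates.pop(best_pod)   # remove from candidates
        # Place VMs following the traffic-sorted link list.
        for (a, b), _rate in ordered:
            for vm in (a, b):
                if vm in unplaced and capacity > 0:
                    assignment[vm] = best_pod
                    unplaced.discard(vm)
                    capacity -= 1
        # VMs that appear on no link fill any leftover capacity.
        while unplaced and capacity > 0:
            assignment[unplaced.pop()] = best_pod
            capacity -= 1
    return assignment

# Example: "a" and "b" share the heaviest link, so they land in one pod.
result = greedy_pod_placement({"a", "b", "c"},
                              {("a", "b"): 2.0, ("b", "c"): 1.0},
                              {0: 2, 1: 2})
print(result)
```

Algorithm 2's subnet-level placement inside the chosen pod would follow the same pattern one level down, which is omitted here.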
6 PERFORMANCE EVALUATION
6.1 Experiment Setup
In our experiment, the network topology is a 32-port fat-tree. The fat-tree data center network consists of 32 pods, with 16 subnets in each pod and 16 host servers in each subnet. We generate 100 applications to be deployed in the cloud data center. The number of sub-applications is uniformly distributed in [7, 8] for each application, and the sub-applications are serially connected. We randomly add 1 backup virtual machine for each application. The execution time of each sub-stage is 5 min. The data interaction rate falls in [0.8 MB/min, 1.2 MB/min]. 6000 tasks are generated for the applications.
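A workload with the distributions stated above can be generated in a few lines. The sketch below follows the described setup (serial chains of 7 or 8 sub-applications, one backup per application, link rates uniform in [0.8, 1.2] MB/min); the data structure and field names are illustrative, not the authors' simulator.

```python
import random

# Sketch of the experiment's workload generator: 100 applications, each
# a serial chain of 7 or 8 sub-applications (one VM per sub-application).
# One randomly chosen sub-application receives a backup VM, and each
# link in the chain carries a data rate drawn uniformly from
# [0.8, 1.2] MB/min.

def generate_applications(n_apps=100, seed=0):
    rng = random.Random(seed)
    apps = []
    for app_id in range(n_apps):
        n_sub = rng.randint(7, 8)                  # inclusive bounds
        vms = [f"app{app_id}-vm{i}" for i in range(n_sub)]
        # Serial chain: each VM talks only to its successor.
        links = {(vms[i], vms[i + 1]): rng.uniform(0.8, 1.2)
                 for i in range(n_sub - 1)}
        backup_of = rng.choice(vms)                # 1 backup VM per app
        apps.append({"vms": vms, "links": links, "backup_of": backup_of})
    return apps

apps = generate_applications()
print(len(apps))  # 100
```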
We compare our proposed algorithm with two other algorithms:
(1) RANDOM. The target host server is randomly selected. A first-fit strategy is employed to place the primary and backup virtual machines.
(2) HFAP. Host server failure-aware redundant virtual machine placement, proposed in [11]. HFAP only considers host server failures in virtual machine placement.
All algorithms are evaluated using the following metrics: (1) total lost time caused by failures; (2) total delayed tasks, where a task is delayed if a primary virtual machine and its backup virtual machine fail at the same time during the task execution; (3) total data transferred by the edge switches; (4) total data transferred by the aggregation switches; (5) total data transferred by the core switches.
Fig. 3 The number of delayed tasks under different failure rate settings. X-axis denotes the failure rate, and Y-axis denotes the number of delayed tasks.
Fig. 4 The total lost time under different failure rate settings. X-axis denotes the failure rate, and Y-axis denotes the total lost time.
Fig. 6 The aggregation layer network resource consumption under different failure rate settings. X-axis denotes the failure rate, and Y-axis denotes the aggregation layer network resource consumption.
Fig. 7 The edge layer network resource consumption under different failure rate settings. X-axis denotes the failure rate, and Y-axis denotes the edge layer network resource consumption.
As shown in Fig. 3 and Fig. 4, the total delayed task number and the total lost time of all three algorithms grow as the edge switch failure rate increases from 0 to 0.01. EFAP consistently outperforms RANDOM and HFAP under all edge switch failure rate settings. HFAP performs badly in all settings. The reason is that only host server failure is considered by HFAP. There is frequent interaction between a primary virtual machine and its backup virtual machine, because the backup virtual machine is synchronized with the active virtual machine periodically. Therefore, HFAP attempts to place a primary virtual machine and its backup virtual machine in the same subnet but on different host servers. Many tasks may then be interrupted at a critical stage when an edge switch fails. A task fails when it is interrupted at a critical stage: instead of being restarted from the last sub-stage, the execution time of the task is lost.
As illustrated in Figs. 5-7, RANDOM consumes the most network resource, and HFAP consumes the least. That is because HFAP attempts to place the virtual machines with high inter-traffic on host servers in the same subnet; the data transfer therefore does not consume many core layer and aggregation layer network resources. However, the reliability cannot be ensured by HFAP. By spreading virtual machines among host servers in the same pod but in different subnets, EFAP consumes very little core layer network resource. Therefore, EFAP can avoid adding burden to the bottleneck of the data center while still enhancing service reliability.
7 CONCLUSION
In this paper, we studied the network failure-aware redundant virtual machine placement problem with consideration of data center network resource consumption. We formulated the network failure-aware redundant virtual machine placement problem as an integer non-linear programming problem and proved that the problem is NP-hard. An efficient heuristic algorithm is proposed to solve the problem. Extensive simulations show the effectiveness of our algorithm. We will experiment with real-world workloads in our future work.
ACKNOWLEDGMENT
The work presented in this study is supported by NSFC (61602054), Beijing Natural Science Foundation (4174100), and NSFC (61571066).
REFERENCES
[1] M. D. Dikaiakos, D. Katsaros, P. Mehra, G. Pallis, and A. Vakali, "Cloud computing: distributed internet computing for IT and scientific research," IEEE Internet Computing, vol. 13, pp. 10-13, 2009.
[2] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz, A. Konwinski, et al., "A view of cloud computing," Communications of the ACM, vol. 53, pp. 50-58, 2010.
[3] J. Chen, C. Wang, B. B. Zhou, L. Sun, Y. C. Lee, and A. Y. Zomaya, "Tradeoffs between profit and customer satisfaction for service provisioning in the cloud," in Proceedings of the 20th International Symposium on High Performance Distributed Computing, 2011.
[4] S. Yi, A. Andrzejak, and D. Kondo, "Monetary cost-aware checkpointing and migration on Amazon cloud spot instances," IEEE Transactions on Services Computing, vol. 5, pp. 512-524, 2012.
[5] A. Zhou, Q. Sun, L. Sun, J. Li, and F. Yang, "Maximizing the profits of cloud service providers via dynamic virtual resource renting approach," EURASIP Journal on Wireless Communications and Networking, vol. 2015, p. 71, 2015.
[6] W. Guo, K. Chen, Y. Wu, and W. Zheng, "Bidding for highly available services with low price in spot instance market," in Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, 2015.
[7] X. He, P. Shenoy, R. Sitaraman, and D. Irwin, "Cutting the cost of hosting online services using cloud spot markets," in Proceedings of the 25th International ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC), 2015.
[8] J. Liu, S. Wang, A. Zhou, and F. Yang, "PFT-CCKP: A proactive fault tolerance mechanism for data center network," in 2015 IEEE 23rd International Symposium on Quality of Service (IWQoS), 2015, pp. 79-80.
[9] A. Zhou, S. Wang, B. Cheng, Z. Zheng, F. Yang, R. Chang, et al., "Cloud service reliability enhancement via virtual machine placement optimization," IEEE Transactions on Services Computing, vol. PP, pp. 1-1, 2016.
[10] S. Wang, Z. Liu, Q. Sun, H. Zou, and F. Yang, "Towards an accurate evaluation of quality of cloud service in service-oriented cloud computing," Journal of Intelligent Manufacturing, vol. 25, pp. 283-291, 2014.
[11] J. Xu, J. Tang, K. Kwiat, W. Zhang, and G. Xue, "Survivable virtual infrastructure mapping in virtualized data centers," in 2012 IEEE 5th International Conference on Cloud Computing (CLOUD), 2012, pp. 196-203.
[12] M. Al-Fares, A. Loukissas, and A. Vahdat, "A scalable, commodity data center network architecture," ACM SIGCOMM Computer Communication Review, vol. 38, pp. 63-74, 2008.
[13] S. Kandula, J. Padhye, and P. Bahl, "Flyways to de-congest data center networks," 2009.
[14] M. Shen, X. Ke, F. Li, F. Li, L. Zhu, and L. Guan, "Availability-aware virtual network embedding for multi-tier applications in cloud networks," in Proceedings of the 2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, and 2015 IEEE 12th International Conference on Embedded Software and Systems, 2015.
[15] X. Li and C. Qian, "Traffic and failure aware VM placement for multi-tenant cloud computing," in 2015 IEEE 23rd International Symposium on Quality of Service (IWQoS), 2015, pp. 41-50.
[16] T. Knauth and C. Fetzer, "VeCycle: Recycling VM checkpoints