Benchmarking Servers Using Virtual Machines
Abstract
Virtualization software is often used in enterprise on server machines to
run virtual machines (VMs), each running a distinct server. This has many
benefits including flexibility and lower hardware cost. This paper presents the
results from conducting a performance evaluation of two comparable dual
quad-core server machines, one from HP and another from Dell, both run-
ning VMware ESX V3. Performance is measured using a series of independent benchmarks testing the performance of a file server, database server, web server, Java server, and email server. This multi-workload benchmark is
used to evaluate the two server machines and how the different server work-
loads interact to better utilize the hardware resources.
1 Introduction
Benchmarking is a useful tool in determining the performance of a particular piece
of technology. In this paper we describe a benchmarking experience to deter-
mine server performance using virtual machines to run several different servers
per physical machine. We were asked to perform this study by Indiana University's University Information Systems (UIS), a division of University Information Technology Services (UITS). UIS is responsible for developing, implementing, and managing information systems that are part of the university's core business actions
[16]. They require a number of servers performing different tasks. UIS prefers to
use VMware's ESX Server product [9] to create virtual machines to run many dif-
ferent servers on the same physical machine. Our task is to determine which server
performs best, and what workload best utilizes the machines.
Virtualization can provide the user a flexible environment for running applica-
tions. On one physical machine, there can be multiple operating systems running
independently of each other sharing the resources of the physical machine in har-
mony. Virtual machine software typically runs on the bare hardware and then one
can create virtual machines (VMs) with different operating systems and software
on each one. The number of VMs per physical machine depends on the virtual-
ization software and the hardware capabilities. This is an advantage because when
the need for another server arises, one can just add another VM to a machine, in-
stall the appropriate software and the system is ready, as opposed to purchasing a
new machine to handle each new server. Virtualization allows the user to change
the allocation of resources through various tools provided by the software vendor.
This is very important for finding the optimum configuration of applications and VMs.
For our experiment we compare the performance of HP and Dell servers run-
ning five VMs. Each VM runs a benchmark to test a database, email, file, java
or web server. The virtualization software, ESX Server, is provided by VMware,
a commercial vendor that has a long history of enterprise virtualization products.
We based our server and benchmark server choices on the needs of our client (UIS)
and previous virtual machine benchmarks [6, 8, 18].
Section 2 describes each of the benchmarks we chose to use and why. Section 3
compares how our work relates to previous work on benchmarking VMs. Section
4 provides a thorough description of the experimental platform. Section 5 explains
the workloads we benchmarked and discusses the results. Section 6 describes the
future research directions some of the authors will pursue. We close with conclusions and acknowledgments.
2 Benchmarks
In this section we describe the five benchmarks we use to test the virtual machine
systems.
2.1 Database Server Benchmark: SwingBench
Swingbench is a free load generator (and set of benchmarks) designed to stress
test an Oracle 10g database. It models users repeatedly executing a predefined mix
of transactions. Swingbench includes two benchmarks: OrderEntry and CallingCircle. OrderEntry is based on the oe schema that ships with Oracle 10g. It can be run continuously (that is, until you run out of space). It introduces heavy contention
on a small number of tables and is designed to stress interconnects and memory.
Our client, UIS, uses Oracle for all of their database needs, thus we chose to
use a benchmark designed for Oracle. Swingbench was chosen after surveying the
literature and exploring the documentation. It is specifically designed for stress testing an Oracle database and the hardware it runs on. Swingbench offers two modes that generate the network load: a GUI mode, called Swingbench, and a command line version, CharBench. All the parameters are easily configurable via a single XML file. It is also scalable, as it can be extended to stress test using multiple load generators, with a coordinator component for controlling them.
The benchmark setup consisted of an Oracle 10g (10.2.0.1) Database Server in-
stalled on Red Hat Enterprise Linux 4 and a load generator client. The client requires a Java 1.5 JVM, SwingBench, and the Oracle Client, also installed on Red Hat Enterprise Linux 4. The database instance used was approximately 8 GB. Application
data files consumed approximately 7 GB, with log files and dictionary views ac-
counting for the remaining space. The load generator average load was maintained
below 70% and ratio of one load generator CPU to two database CPUs was also
maintained to give reliable results. The number of users for the benchmark was set to over 100 standard database connections. The user think time between transactions was 250 milliseconds. Oracle's SGA (in-memory data cache) was
configured to be 512 MB.
Figure 1 shows the total number of transactions and the distribution of the types of transactions that were run on the Oracle database. We chose to have the Process Orders transactions as 5.5% of the total transaction load and New Customer Reg-
istration transactions as 11%. The transactions Browse Products, Browse Orders
and Order Products each contributed around 27.8% of the total transactions. These
parameters were decided according to the contemporary database write and read
trends and are easily configurable. These would be changed in the future work de-
pending upon the requirements of UIS (see section 6).
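The transaction mix above can be captured as a small table; the following is a minimal sketch using the percentages stated above (the dictionary layout is ours for illustration, not SwingBench's actual configuration format, which is XML):

```python
# Transaction mix used for the SwingBench OrderEntry runs (percent of total).
# The names mirror the transaction types described above; the structure is
# illustrative only -- SwingBench itself reads these weights from an XML file.
mix = {
    "Process Orders": 5.5,
    "New Customer Registration": 11.0,
    "Browse Products": 27.8,
    "Browse Orders": 27.8,
    "Order Products": 27.8,
}

total = sum(mix.values())
# The three browse/order transaction types dominate the mix, making this a
# read-heavy workload, consistent with contemporary read/write trends.
read_heavy = mix["Browse Products"] + mix["Browse Orders"] + mix["Order Products"]

print(round(total, 1))       # 99.9 -- the stated percentages sum to ~100%
print(round(read_heavy, 1))  # 83.4
```

Adjusting these weights (for example, to match UIS's production read/write ratio) only requires changing the corresponding entries in the benchmark's XML configuration.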
Figure 1: Transactions Load for the Oracle Database
One of the key observations concerns the total number of transactions. The HP virtual machine reaches a high of 7,500 transactions fairly easily and is consistent across individual runs, as shown in Figure 1. The Dell machine, however, reaches a high of between 5,500 and 6,000 transactions.
2.2 Email Server Benchmark: LoadSim
Our intention was to benchmark several differing workflows that would mimic
messaging and appointment loads that commonly occur in generalized business
and education environments. Several things went wrong. After managing to partially initialize the benchmark testing routine, we encountered the ubiquitous MAPI_E_FAILONEPROVIDER error. We eventually managed to delete several hundred
spuriously-created Microsoft Exchange server users that had arbitrary passwords
generated during the first failed initialization. This was an extremely time-consuming
error that turned out to be somewhat simple to solve. We also encountered a shar-
ing error stemming from an improperly configured public mail folder. We strug-
gled during the entire time given for benchmark setup to understand why we could
not select a non-local Exchange Server when we initialized LoadSim from the client
machine. By the time the first batch of multi-service benchmarks was being exe-
cuted on the entire system, we had determined that there was no Windows Domain
Controller configured on the Exchange server.
2.3 File Server Benchmark: dbench and tbench
version of smbtorture in the Samba 4.0 development tree. Unfortunately, there was
not enough time to install and use this component as part of the benchmark. Since we are measuring how well VMs perform as single servers, we decided it is sufficient to measure file system operations using dbench and network traffic using tbench. The benchmark was run simulating from one to 195 clients.
The results for the benchmark are shown in Figures 2 and 3. Figure 2 shows the results of the dbench component. As more clients are added, file system performance drops off quite fast. File I/O is very CPU intensive, so as more clients demand files and the other VMs use the CPU, performance decreases. Figure 3 shows the results of the tbench component. As more clients are added, throughput improves, with a peak at approximately 80 clients; it is clear that the CPU is a bottleneck in the file server case.
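The throughput figures plotted below come from the single summary line that dbench and tbench print at the end of a run. A minimal sketch of extracting that number follows; the sample line is fabricated, and the exact wording varies between Samba versions, so the pattern is an assumption:

```python
import re

# dbench and tbench end each run with a summary line of the general shape
# "Throughput 160.5 MB/sec 50 clients" (some releases print "procs").
# This pattern is an assumption based on that shape, not a guaranteed format.
SUMMARY = re.compile(r"Throughput\s+([\d.]+)\s+MB/sec\s+(\d+)\s+(?:clients|procs)")

def parse_summary(line):
    """Return (throughput_mb_s, num_clients), or None if the line doesn't match."""
    m = SUMMARY.search(line)
    if m is None:
        return None
    return float(m.group(1)), int(m.group(2))

sample = "Throughput 160.5 MB/sec 50 clients"  # fabricated example line
print(parse_summary(sample))  # (160.5, 50)
```

A wrapper like this makes it straightforward to sweep the client count and collect the (clients, throughput) pairs plotted in the figures.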
Figure 2: dbench results. Throughput (MB/s) on the HP and Dell machines for 1 to 65 clients.
Figure 3: tbench results. Throughput (MB/s) on the HP and Dell machines for 1 to 195 clients.
2.4 Java Server Benchmark: SPECjbb2005
Key characteristics of SPECjbb2005:
- It is totally self-contained and self-driving (it generates its own data, generates its own multi-threaded operations, and does not depend on any package beyond the JRE).
- It is memory resident, performs no I/O to disks, has only local network I/O, and has no think times.
- Clients are replaced by driver threads, database storage by binary trees of objects, and increasing amounts of workload are applied, providing a graphical view of scalability.
Benchmarks like RUBiS and Volanomark have been used in the past to bench-
mark Java servers. While RUBiS is an auction site prototype and is usually used to
evaluate application servers' performance and scalability, VolanoMark is a pure Java server benchmark characterized by long-lasting network connections and high thread counts. It creates client connections in groups of 20 and measures the time required by the clients to take turns broadcasting a set of messages to the group.
SPECjbb2005 emulates a three-tier system, the most common type of server-side Java application, which is why we use this benchmark in our study.
The user can configure the number of application instances to run. When more
than one instance is selected, several instances will be run concurrently with the
final measurement being the sum of those for the individual instances. The multi-
ple application instances are synced using local socket communication and a con-
troller.
Results:
1. The value of the expected peak warehouse count (N) is set to the result of a runtime call that obtains the maximum number of processors in the system. In our
case, N = 2.
2. For all points from N to 2*N warehouses, the scores for the individual JVM
instances are added. (The other points do not contribute in the calculation
of throughput metrics.)
3. The summed throughputs for all the points from N warehouses to 2*N ware-
houses (inclusive of both) are averaged. This average is the SPECjbb2005
bops metric. The SPECjbb2005 bops/JVM is obtained by dividing the SPECjbb2005
bops metric by the number of JVM instances.
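The scoring rule above can be sketched as a short function. The per-warehouse throughput numbers in the example are invented for illustration; only the summing and averaging rule follows the SPECjbb2005 description:

```python
def specjbb_bops(per_instance, n):
    """SPECjbb2005 metric per the rule above: for each warehouse count from
    N to 2*N inclusive, sum the scores of the individual JVM instances,
    then average those sums.  per_instance maps an instance name to a dict
    of {warehouse_count: throughput}; n is the expected peak warehouse count."""
    points = range(n, 2 * n + 1)
    sums = [sum(inst[w] for inst in per_instance.values()) for w in points]
    bops = sum(sums) / len(sums)
    return bops, bops / len(per_instance)  # (bops, bops/JVM)

# Invented throughputs for a 2-processor system (N = 2), two JVM instances.
data = {
    "jvm1": {2: 9000, 3: 9000, 4: 9000},
    "jvm2": {2: 8800, 3: 8800, 4: 8800},
}
bops, bops_per_jvm = specjbb_bops(data, n=2)
print(bops, bops_per_jvm)  # 17800.0 8900.0
```

With real runs, the per-point throughputs vary with warehouse count, which is why the metric averages over the N to 2*N range rather than taking a single peak.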
On the Dell machine:
1. There is only one JVM instance. Both the SPECjbb2005 bops and the SPECjbb2005 bops/JVM are 16431. See Figure 4.
2. There are now two JVM instances. JVM 1 has a score of 8964, while JVM 2 has a score of 8831. The SPECjbb2005 bops is 17795 and the SPECjbb2005 bops/JVM is 8898. See Figure 5.
Figure 5: Dell machine with two JVM instances
Figure 7: HP machine with two JVM instances
On the HP machine:
1. There is only one JVM instance. Both the SPECjbb2005 bops and the SPECjbb2005 bops/JVM are 17453. See Figure 6.
2. There are now two JVM instances. JVM 1 has a score of 6094, while JVM 2 has a score of 12980. The SPECjbb2005 bops is 19074 and the SPECjbb2005 bops/JVM is 9537. See Figure 7.
2.5 Web Server Benchmark: SPECweb2005
SPECweb2005 Support is a purely unencrypted workload. Most of the requests and traffic in SPECweb2005 Support are for normal HTTP downloading of files of various sizes. All of the workload's page requests are dynamic, so customization of the size of the page requests is possible. In this benchmarking run, however, we chose to stick with the default settings.
3 Related Work
Modern computers are now powerful enough to run hundreds of processes at the same time, so it is wasteful to acquire a new machine for each server process. Virtualization is therefore an important technique for subdividing the resources of a modern computer. The benefits of virtualization include [14]:
- Improved utilization of machine resources.
- Secure, isolated sandboxes for running untrusted applications.
- Resource limit constraints and, in some cases, resource guarantees, which help in the creation of QoS-enabled operating systems.
- Easy migration of systems, which allows for flexible and robust error handling.
VMware [9] is an example of a popular full system virtualization tool for x86 archi-
tecture [13]. Virtualization benchmarking using VMware has been performed by
various hardware vendors, such as Dell [11, 12], and HP [3]. Xen is another popular
virtualization tool. The difference between Xen and VMware is that while VMware employs a full virtualization approach, where the exposed functionality of the virtual hardware is identical to that of the underlying machine, Xen employs an approach known as paravirtualization, where the virtual machine abstraction is similar but not identical to the underlying hardware [8, 13]. Virtualization benchmarking using Xen is the topic of the papers of Clark et al. [6] and Barham et al. [8]. Also
relevant to the topic of virtualization is the usage of Disco, a full virtualization tech-
nique, in running commodity operating systems on ccNUMA architectures [7] and
Denali, a paravirtualization technique for hosting vast numbers of virtualized OS
instances [18]. Finally, VMware has also published a performance comparison between several virtualization frameworks [17], including Xen and VMware's own products.
4 Experiment Description
The experimental setup consists of an HP ProLiant DL585 G2 with eight 2.6 GHz Dual-Core AMD Opteron processors and 32 GB of memory, and a Dell PowerEdge 6950 with eight 2.6 GHz Dual-Core AMD Opteron processors and 24 GB of memory. Each of these machines runs VMware's ESX Server 3.0.1 as the platform for the virtual machines. The five disks connected to the HP are configured in RAID Level 5; however, the five disks connected to the Dell are not configured in any RAID level. We created five virtual machines (three running Red Hat Enterprise Linux 4 and two running Windows Server 2003), each with 2 virtual CPUs and 4 GB of memory. For each benchmark run against the HP virtual machines, the Dell virtual machines act as the clients, and vice versa.
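As a quick sanity check on the setup above, the five 2-vCPU / 4 GB VMs fit within either host's core and memory budget. A minimal sketch of the arithmetic (the dictionary layout is ours, for illustration only):

```python
# Host specs as described above: eight dual-core Opterons per machine.
hosts = {
    "HP ProLiant DL585 G2": {"cores": 8 * 2, "mem_gb": 32},
    "Dell PowerEdge 6950":  {"cores": 8 * 2, "mem_gb": 24},
}

# Five VMs, each with 2 virtual CPUs and 4 GB of memory.
vms = 5
vcpus_total = vms * 2   # 10 virtual CPUs
vm_mem_total = vms * 4  # 20 GB allocated to VMs

for name, h in hosts.items():
    # Both hosts retain spare cores and memory beyond the VM allocation,
    # leaving headroom for the ESX hypervisor itself.
    print(name, "spare cores:", h["cores"] - vcpus_total,
          "spare GB:", h["mem_gb"] - vm_mem_total)
```

Note the asymmetry: the Dell has only 4 GB of headroom beyond the VM allocation, against 12 GB on the HP, which matters when interpreting the memory results below.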
5 Results
Benchmark performance is further constrained by the amount of resources available to each virtual machine on the target physical host.
Through four runs of the benchmark, with two runs per system, our results indicate a slight difference in CPU load between the two systems (see, e.g., Figure 8), but the amounts of memory consumed are almost identical: the free memory on the Dell machine hovers around the 13 GB mark, while that of the HP machine hovers around the 21 GB mark. Given that the Dell machine has only 24 GB of physical memory, compared to 32 GB for the HP machine, this indicates that the benchmarks were not able to saturate system memory. This can be attributed partly to the constraint of running the clients on the Dell machine while benchmarking the HP machine, and vice versa, which creates an artificial barrier to stress testing the systems.
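The memory observation above reduces to a simple subtraction: despite the different physical memory sizes, both hosts consume roughly the same amount. A minimal sketch of the arithmetic using the figures stated above:

```python
# Physical memory and observed free memory per host (GB), as reported above.
phys = {"Dell": 24, "HP": 32}
free = {"Dell": 13, "HP": 21}

# Memory actually consumed on each host (VMs plus hypervisor overhead).
used = {name: phys[name] - free[name] for name in phys}
print(used)  # {'Dell': 11, 'HP': 11} -- almost identical consumption
```

Since each machine allocates 20 GB across its five VMs but only about 11 GB is resident, the workloads clearly never pressed the hosts' memory limits.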
6 Future Work
Due to time constraints and technical difficulties, we were unable to test the ma-
chines as thoroughly as we had hoped. Our future work will include running ad-
ditional workload configurations to fully utilize the hardware resources.
In addition, a subset of the authors plan to continue this work as a research project that will carry on from the preliminary work done this semester. The project will
focus on one or two of the applications described in this paper, such as the database
Oracle 10G application, and focus the experimentation, benchmark, and workload
on exposing aspects of configuration, organization, and load balancing.
The tests will be run on both 32-bit and 64-bit versions of the Linux OS. We
will explore generating workloads based on what is learned from the previous run of the workload.

Figure 8: Comparison of CPU load between the Dell and HP machines

As UIS is underwriting use of the machines, we will refine the
workloads through close interaction with UIS. Moreover, since UIS handles heavy database transaction loads, we will study the advantages of running small databases in a virtual environment, so that better response times may be obtained despite the processing of a large number of queries.
7 Conclusions
The two physical machines did not seem to have a significant difference in per-
formance. The unfortunate and unintended difference in machines is most likely
the culprit in this case. Further work needs to be done in order to ascertain the
true performance differences and an ideal workload configuration. It is clear from this experiment that virtualization software is an excellent tool for large and small businesses to host servers and to expand and adapt to ever-changing markets.
References
[1] Answers.com. Computer Desktop Encyclopedia. Computer Language Company Inc., 2007. URL http://www.answers.com/topic/pc-magazine-benchmarks. Accessed April 2007.
[6] B. Clark et al. Xen and the art of repeated research. In USENIX Annual Technical Conference, pages 135-144, 2004.
[7] K. Govil et al. Cellular disco: resource management using virtual clusters on shared-memory multiprocessors. ACM Transactions on Computer Systems, 18:229-262, 2000.
[8] P. Barham et al. Xen and the art of virtualization. In Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles, pages 164-177, 2003.
[10] S. King, G. Dunlap, and P. Chen. Operating system support for virtual machines. In 2003 USENIX Annual Technical Conference, pages 71-84, 2003.
[12] T. Muirhead and D. Jaffe. Advantages of Dell PowerEdge 2950 two socket servers over Hewlett-Packard ProLiant DL585 G2 four socket servers for virtualization. Technical report, Dell Enterprise Systems, December 2006.