
Best Practices for Oversubscription of CPU, Memory and Storage in vSphere Virtual Environments
How far can oversubscription be taken safely?

Written by Scott D. Lowe

Abstract

One of the benefits of virtualization is that it enables administrators to efficiently share host resources among different applications. In fact, administrators often oversubscribe the physical resources on a host in order to maximize the number of workloads that can run on a host. But how much oversubscription is too much? This paper discusses what oversubscription is and why it is used, explores the pros and cons of the practice, and proposes some ideas about the point at which oversubscription becomes dangerous.

Introduction

Virtualization enables data centers to focus on the needs of business applications, but it can make resource allocation more challenging.

One of the great features of virtualization is the ability to run many disparate workloads on a single host server, thereby maximizing the utilization of that host server. In doing so, organizations have been able to reinvent the modern data center. Whereas data centers of ten years ago tended to be server-centric places, modern data centers revolve around the needs of line-of-business applications, including ensuring that those applications remain highly available and able to survive the loss of host servers.

Virtualization has changed the data center dynamic in many other ways as well. While workloads used to be confined to the hardware on which they were originally installed, in a modern data center, workloads are fluid; they flow from host to host based on sets of administrator-defined rules as well as in reaction to changes in the host environment. The fluidic nature of the modern data center has added new challenges to resource allocation, but over the years, both free and paid tools have been introduced to assist administrators in their resource planning efforts.
Sizing resources in a virtualized environment

Virtualization makes hardware sizing more complex, but offers opportunities for improving data center efficiency.

The rise of virtualization has also enabled the use of hardware in ways that were never envisioned even just ten years ago. In those days, administrators purchased servers sized to support the peak needs of a single application, and that sizing included a projection of the resources the application would likely need over the life of the server hardware. Because many servers were deployed with just a single application, resource planning was relatively simple. In modern data center environments, which are heavily virtualized, resource planning takes on new complexity: because a wide array of I/O patterns will be present on single pieces of hardware, administrators need insight into how individual applications interact with the rest of the environment.

This blending of I/O in a heavily virtualized environment has also created a significant opportunity for efficiency in the data center. Whereas administrators used to size individual servers based on the needs of a single application, the mixed nature of I/O in a virtual environment enables sharing of resources with different peak needs. As a result, there is opportunity for administrators to very efficiently share host resources among different applications.

Overprovisioning is useful, but how much is too much?

Even with virtualization, administrators still sometimes overprovision resources and size individual virtual machines (VMs) to meet peak demands, so there are often resources that go unused in a virtual machine. vSphere provides a number of powerful methods for sharing idle resources with other running workloads. In addition, administrators can oversubscribe the physical resources on a host in order to maximize the number of workloads that can run on a host. In other words, they can assign to virtual machines, in aggregate, more resources than are actually available on the host. For example, suppose a host has 96 GB of physical RAM. Under the right circumstances, an administrator might assign a total of 128 GB of RAM across the virtual machines running on that host.

But just how far can this oversubscription be taken? The limits depend on a number of factors. This paper discusses oversubscription in general, explores its pros and cons, and proposes some ideas about the point at which oversubscription becomes dangerous.

Resource management and oversubscription

What is oversubscription?

Oversubscription in vSphere refers to various methods by which more resources than are available on the physical host can be assigned to the virtual servers that are supported by that host. In general, administrators have the ability to oversubscribe processing, memory and storage resources in virtual machines.

Refusing to oversubscribe resources is the safest choice, but often wastes resources.

Different administrators have different opinions on the wisdom of oversubscribing physical resources. Many administrators prefer to assign only those resources that are physically available to support all of the running workloads. This is the safest option, as it ensures that, in general, all running virtual machines will have the resources they need.

However, in the days of physical servers, it was not uncommon to find that physical servers rarely made use of all of their resources. From a processor standpoint, utilization averaged only 5–15 percent, meaning that there was a whole lot of room for growth.
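To make the 96 GB and 128 GB memory example above concrete, the following Python sketch expresses an allocation as a percentage of physical capacity. The function name and the percentage convention are illustrative assumptions, not part of vSphere or of any Dell tool.

```python
def oversubscription_percent(allocated: float, physical: float) -> float:
    """Return allocated capacity as a percentage of physical capacity."""
    if physical <= 0:
        raise ValueError("physical capacity must be positive")
    return 100.0 * allocated / physical


# Example from the text: 128 GB assigned to VMs on a host with 96 GB of RAM.
ram_percent = oversubscription_percent(allocated=128, physical=96)
print(f"RAM allocated: {ram_percent:.0f}% of physical")  # RAM allocated: 133% of physical
```

Any value above 100 percent means the resource is oversubscribed; the same calculation applies equally to vCPUs and to provisioned storage.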

Oversubscription maximizes the value of resources, but introduces risk of the host not having enough resources to service all its VMs.

While virtual machines are generally more right-sized than their physical counterparts were in the past, there is still room to grow built in, especially when particular workloads are idle. Many administrators see this as an opportunity to make use of those idle resources in order to maximize virtual machine density on a host. However, with oversubscription, administrators are basically assigning to virtual machines more resources than are actually available on the host. In other words, if all of the virtual machines suddenly requested access to all of their allocated resources, the host would not have enough resources to service the needs.

Resource oversubscription, while it does increase virtual machine density, carries with it some risks. Once an oversubscribed resource is finally exhausted, stability issues can occur and major performance problems can be introduced, affecting all of the workloads running on the host server.

Before we discuss more about oversubscription, it’s important to understand how vSphere manages the three basic types of resources:
• Processing resources
• Memory resources
• Storage resources

How vSphere manages processing resources

How physical resources are represented on a vSphere host

In vSphere, administrators assign CPUs to virtual machines in order to support the workload needs of each individual virtual machine. These virtual processing resources are pulled from the host’s available physical CPUs. The number of physical CPUs that are present in a host depends on a couple of factors.

In vSphere, a physical CPU (pCPU) refers to:
• When hyperthreading is not present or not enabled: A single physical CPU core
• When hyperthreading is present and enabled: A single logical CPU core

Here are two examples:
• If a host has two eight-core processors and hyperthreading is either not supported or not enabled, that host has sixteen physical CPUs (8 cores x 2 processors).
• If a host has two eight-core processors and hyperthreading is enabled, that host has thirty-two physical CPUs (8 cores x 2 processors x 2 threads per core).

How those resources are presented to virtual machines

In a virtual machine, processors are referred to as virtual CPUs (vCPUs). When an administrator adds vCPUs to a virtual machine, each of those vCPUs is assigned to a pCPU, although the actual pCPU may not always be the same. There must be enough pCPUs available to support the number of vCPUs assigned to an individual virtual machine or that virtual machine will not boot.

However, that doesn’t mean that administrators are limited to just the number of pCPUs in the host. On the contrary, there is no 1:1 ratio between the number of vCPUs that can be assigned to virtual machines and the number of physical CPUs in the host. In fact, as of vSphere 5.0, there is a maximum of 25 vCPUs per physical core, and administrators can allocate up to 2,048 vCPUs to virtual machines on a single host.

How vSphere manages memory resources

vSphere uses a number of techniques to maximize the use of RAM in a virtual environment:
• Transparent page sharing (TPS): In most virtual environments, administrators run many copies of the same operating system. In these cases, there is a lot of duplication of memory pages in host memory. Transparent page sharing is basically a form of memory deduplication: vSphere combines multiple identical memory pages into just one and frees the remaining pages up for other uses. TPS has an almost imperceptible impact on the performance of the host.
• Memory ballooning: When the VMware Tools are installed inside a guest virtual machine, a memory balloon driver is installed as well. This driver behaves like a process in the guest OS, and the OS can use its normal memory management techniques to assign idle or unused memory pages to the driver. The balloon driver then “pins” those pages and reports this back to the hypervisor. If the host becomes low on physical memory, guest memory pages are assigned to the balloon driver, and the host can then reclaim these memory pages in order to address the needs of other virtual machines that may need the RAM.
  In this way, when a particular virtual machine has RAM to spare, it can transparently share that RAM with other virtual machines on the same host, enabling the host to achieve yet higher levels of VM density. Whereas TPS is a memory deduplication technique, the ballooning process brings to RAM a sort of thin provisioning capability. The ballooning process does require some processing overhead, which is usually imperceptible in the performance of the guest and host. However, in extreme cases, ballooning can cause swapping inside the guest OS.
• Memory compression: Introduced in vSphere 4.1, memory compression can, in some cases, replace the costly swapping process. With this technique, rather than memory pages being swapped to disk on a per-VM basis, the memory pages are compressed and placed into a compression cache in host memory. When the need arises to return to RAM a page that would otherwise have been swapped, the page is instead retrieved from the cache and uncompressed. While this process is less costly than swapping to disk, it does still carry something of a performance hit.
• Swapping to disk: Swapping to disk is the hypervisor’s last-ditch effort to retrieve enough physical RAM to satisfy the needs of workloads running on a host. Swapping is a process by which the hypervisor moves the least used memory pages to disk. Those memory pages are still accessible, but when they are required, they must be retrieved from disk. Swapping will noticeably degrade the overall performance of the host.

Because vSphere’s other memory management techniques are so good, swapping usually takes place only on seriously overcommitted hosts, although swapping can also be caused by resource pool constraints or memory limits configured on a virtual machine. In addition, if a VM does not have VMware Tools installed or VMware Tools is not running, the ballooning process is skipped completely and the system goes straight to swapping.

It should be noted that neither swapping nor compression takes place unless there is a memory contention issue on the host, or in the situations discussed with regard to swapping. In most environments, memory contention issues that result in swapping or compression should be avoided, since this situation means that the host has basically run out of RAM.

How vSphere manages storage resources

Storage is the third piece of the resource puzzle in a vSphere environment. Storage resources can also be oversubscribed through what have become very common resource allocation techniques.

Thin provisioning is the most common technique for overprovisioning storage resources.

The most common technique for overprovisioning storage is a process known as thin provisioning. In many cases, when an administrator allocates storage to a virtual machine, more storage than is absolutely necessary is allocated. After all, it’s reasonable to expect that the virtual machine will continue to need additional disk space as time goes on.

Thin provisioning operates as follows: When an administrator provisions the total disk space for the virtual machine, the virtual machine is told that it has access to the entirety of the allocated space. In reality, however, vSphere gives the virtual machine only the space that it is actually consuming. For instance, if an administrator allocates 200 GB to a new virtual machine, but that virtual machine is using only 40 GB, the remaining 160 GB remains available for allocation to other virtual machines. As a virtual machine requires more space, vSphere provides additional chunks to that virtual machine, up to the size of the disk that was originally allocated.

By using thin provisioning, administrators can create virtual machines with virtual disks of a size that is necessary in the long term, without having to immediately commit the total disk space that is necessary to support that allocation. In many tests, it has been shown that thin provisioning carries with it only a very slight, almost negligible, performance impact. Accordingly, thin provisioning has become a common, acceptable and often recommended method for managing storage capacity.

Some storage devices have additional features, such as data compression and deduplication, that enable additional levels of oversubscription. For the purposes of this paper, however, the focus is on the hypervisor, so only thin provisioning will be discussed.

Getting insight into resource usage in your environment

Now that we have seen how resources are managed in a vSphere environment, let’s move on to oversubscribing those resources. In order to determine whether resources are overcommitted, the administrator needs a monitoring tool such as the free vOPS™ Server Explorer tool from Dell®. One of its several utilities is Environment Explorer, which provides administrators with a high-level view of resource usage in the environment. As shown in Figure 1, Environment Explorer shows resource utilization as a percentage of actual physical resources, making the utility a perfect fit for exploring the topic of resource over-commitment.

Now let’s explore some guidelines for oversubscribing each of the three types of resources: processing, memory and storage.
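Before turning to those guidelines, the thin provisioning behaviour described earlier (200 GB allocated, 40 GB actually consumed) can be modelled in a few lines of Python. This is only a sketch of the bookkeeping: the `ThinDisk` class and its field names are hypothetical and do not correspond to any vSphere API.

```python
from dataclasses import dataclass


@dataclass
class ThinDisk:
    allocated_gb: float  # size the guest is told it has
    used_gb: float       # space actually consumed on the datastore

    def grow(self, extra_gb: float) -> None:
        """Consume more space, never beyond the originally allocated size."""
        if self.used_gb + extra_gb > self.allocated_gb:
            raise ValueError("disk cannot grow past its allocated size")
        self.used_gb += extra_gb

    @property
    def uncommitted_gb(self) -> float:
        """Allocated space that is not yet backed by physical storage."""
        return self.allocated_gb - self.used_gb


disk = ThinDisk(allocated_gb=200, used_gb=40)
print(disk.uncommitted_gb)  # 160 GB remains available to other VMs
disk.grow(10)
print(disk.used_gb)         # 50
```

The uncommitted space is what makes oversubscription possible, and it is also the amount the datastore would have to supply if the guest eventually filled its disk.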

Figure 1. Environment Explorer (part of the free vOPS Server Explorer tool) provides a high-level view of resource usage.

Oversubscribing processing resources

The common wisdom

As mentioned earlier, in vSphere 5, every physical processor core can support up to 25 vCPUs. However, for every additional workload beyond a 1:1 vCPU to pCPU ratio, the vSphere hypervisor needs to invoke processor scheduling in order to distribute processor time to virtual machines that need it. For example, if an administrator has created a vCPU to pCPU ratio of 5:1, then each processor is supporting five vCPUs.

Experts in the field disagree about the proper rules of thumb when it comes to vCPU to pCPU ratio. However, there are two items on which just about everyone agrees:
• Start with one vCPU per virtual machine. Most experts agree that administrators should create new virtual machines with just one vCPU and add vCPUs as needs dictate. Each added vCPU ties the virtual machine to more processor time from the host. Whenever the virtual machine needs to perform an operation, it has to wait for a number of physical CPUs equal to the number of assigned vCPUs to be available. So, as administrators add more vCPUs to a virtual machine, there is an increased risk of poorer overall performance.
• The vCPU to pCPU ratio is workload dependent. While 1:1 vCPU to pCPU assignment is sometimes advocated, other ratios are common. Although vSphere 5 supports a ratio of up to 25:1, your ability to achieve a high ratio will depend on the kinds of workloads you need to support. If the host is supporting lots of virtual machines, each with only meager processing needs, the vCPU to pCPU ratio could be quite high. If, however, the host is running a number of processor-intensive workloads, the ratio may be much smaller.

Metrics to watch

The following metrics will help you maintain a vCPU to pCPU ratio that makes efficient use of resources while still allowing workloads to run well:
• Inside virtual machines:
  • CPU Utilization: When average CPU usage remains high, it’s time to add an additional vCPU to the VM.
• On the host:
  • CPU Ready: CPU Ready is, by far, the most important gauge of overall host health with regard to CPU. CPU Ready indicates the length of time that a VM is waiting for enough physical processors to become available in order to meet its demands. For example, if a VM is allocated four vCPUs, this metric will show the length of time that the VM waited for four corresponding pCPUs to become available at the same time.
  • CPU Utilization: The overall CPU usage on the host server is also critical because it enables an administrator to understand just how much work the host server is doing.

Real-world observations and advice

Figure 2. A lab with a vCPU to pCPU ratio of 2:1. Virtual resources shown: 4 vCPUs (200% of actual cores); 3 GB memory (37% of physical); 26.6 GB storage (13% of provisioned).

Online forums are filled with questions from users requesting insight into acceptable vCPU to pCPU ratios in a real-world environment. While some responses continue to advocate for a 1:1 ratio, from a pure density standpoint, 1:1 should be considered a worst-case scenario. Figure 2 shows a lab with a ratio of 2:1.
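The pCPU definition and the ratios discussed above reduce to simple arithmetic, sketched here in Python. The function names are hypothetical, the two limits are the vSphere 5.0 figures quoted in this paper, and the last line assumes the Figure 2 lab host has two cores, which is what 4 vCPUs at 200 percent of actual cores implies.

```python
VCPUS_PER_CORE_LIMIT = 25    # vSphere 5.0 figure quoted in this paper
VCPUS_PER_HOST_LIMIT = 2048  # vSphere 5.0 figure quoted in this paper


def pcpu_count(sockets: int, cores_per_socket: int, hyperthreading: bool) -> int:
    """Count pCPUs: one per core, or one per logical core with hyperthreading."""
    cores = sockets * cores_per_socket
    return cores * 2 if hyperthreading else cores


def max_vcpus(sockets: int, cores_per_socket: int) -> int:
    """Upper bound on vCPUs for a host under the two limits above."""
    per_core_bound = sockets * cores_per_socket * VCPUS_PER_CORE_LIMIT
    return min(per_core_bound, VCPUS_PER_HOST_LIMIT)


def vcpu_to_pcpu_ratio(total_vcpus: int, pcpus: int) -> float:
    """Return N in an N:1 vCPU to pCPU ratio."""
    return total_vcpus / pcpus


print(pcpu_count(2, 8, hyperthreading=False))  # 16
print(pcpu_count(2, 8, hyperthreading=True))   # 32
print(max_vcpus(2, 8))                         # 400
print(vcpu_to_pcpu_ratio(4, 2))                # 2.0, i.e. 200% of actual cores
```

The supported maximum is a ceiling, not a target: as the text notes, the ratio a host can actually sustain depends on its workloads.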

Some respondents indicate that they have received guidance that suggests no more than a 1.5:1 vCPU to pCPU ratio, but guidance from industry experts suggests that vSphere real-world numbers are in the 10:1 to 15:1 range. Still others indicate that VMware itself has a recommended ratio range of 6:1 to 8:1.

The Dell white paper, “Demystifying CPU Ready (% RDY) as a Performance Metric,” establishes the following vCPU:pCPU guidelines:
• 1:1 to 3:1 is no problem.
• 3:1 to 5:1 may begin to cause performance degradation.
• 6:1 or greater is often going to cause a problem.

In addition, keeping the CPU Ready metric at 5 percent or below is considered a best practice.

The actual achievable ratio in a specific environment will depend on a number of factors:
• vSphere version: The vSphere CPU scheduler is always being improved. The newer the version of vSphere, the more consolidation should be possible.
• Processor age: Newer processors are much more robust than older ones, so organizations with newer processors should be able to achieve higher processor ratios.
• Workload type: Different kinds of workloads on the host will result in different optimal ratios.

vScope Explorer, another utility included in vOPS Server Explorer, can help you investigate performance metrics, including CPU Ready, at both the host and virtual machine levels, so you can determine whether your vCPU to pCPU ratio is too high.

In addition, Environment Explorer identifies where host processor resources are overcommitted, so you’ll know where to perform additional analysis to determine if that over-commitment is causing performance issues. As the “% of actual cores” metric begins to surpass 500 percent, carefully monitor CPU Ready and general workload performance to ensure that business needs are being met.

Oversubscribing memory resources

The common wisdom

Oversubscribing RAM is one of the more controversial resource oversubscription options. Whereas CPU and storage resources are often overcommitted, there seems to be some conservatism when it comes to overcommitting RAM.

Metrics to watch

On a host server, administrators need to watch the amount of RAM actually in use by virtual machines. As the actual RAM in use approaches 100 percent, either add additional RAM to the server or migrate workloads to hosts that have more available RAM.

Real-world observations and advice

In order to maximize VM density and ensure that the environment remains operational, it’s important to monitor the actual memory utilization. That’s why Environment Explorer displays RAM usage (the “% of physical” metric) using the amount of RAM actually provisioned to each virtual machine, rather than the amount of RAM actually being used by virtual machines once all of vSphere’s various memory management techniques are taken into consideration.
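The numeric guidance quoted in the CPU and memory discussion above can be collected into a small checker, sketched here in Python. The function names are hypothetical, the treatment of ratios between 5:1 and 6:1 is an assumption (the quoted guidelines leave that range unspecified), and the 90 percent RAM warning level is an arbitrary illustrative stand-in for “approaches 100 percent”.

```python
def classify_vcpu_ratio(ratio: float) -> str:
    """Map an N:1 vCPU to pCPU ratio onto the guideline bands quoted above."""
    if ratio <= 3:
        return "no problem"
    if ratio <= 5:
        return "may begin to cause performance degradation"
    # Assumption: the guidelines say nothing about ratios between 5:1 and 6:1,
    # so this sketch groups them with 6:1 and above.
    return "often going to cause a problem"


def host_warnings(cpu_ready_percent: float,
                  percent_of_actual_cores: float,
                  ram_in_use_percent: float,
                  ram_warn_percent: float = 90.0) -> list[str]:
    """Collect warnings; ram_warn_percent is an arbitrary illustrative value."""
    warnings = []
    if cpu_ready_percent > 5:
        warnings.append("CPU Ready above 5 percent")
    if percent_of_actual_cores > 500:
        warnings.append("vCPU allocation above 500 percent of actual cores")
    if ram_in_use_percent >= ram_warn_percent:
        warnings.append("RAM in use approaching 100 percent")
    return warnings


print(classify_vcpu_ratio(2.0))
print(host_warnings(cpu_ready_percent=7.5,
                    percent_of_actual_cores=200,
                    ram_in_use_percent=37))
```

A check like this only flags hosts that deserve a closer look; whether a flagged host is actually suffering still has to be judged from workload performance.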

The level of over-commitment possible depends on one primary factor: how much memory deduplication can take place by virtue of the fact that there are many similar workloads running on the host. The greater the level of disparity between running workloads, the less memory consolidation can take place and the less density can be enjoyed. Here are some observations about what others are doing and recommending with regard to memory over-commitment:
• Many administrators refuse to oversubscribe RAM at all.
• Some administrators prefer to not exceed 125 percent of physical memory, feeling that going beyond that limit carries unacceptable risk.
• If every workload on the server is identical, much higher over-commitment levels are possible.
• Many other administrators simply spot-check host memory usage, but don’t regularly scan for over-commitment levels.

Oversubscribing storage resources

The common wisdom

It has become commonplace to oversubscribe storage resources using thin provisioning. This technique offers many benefits; the primary one is maximizing the use of the organization’s storage capacity. Plus, thin provisioning also helps IT in two ways. First, it enables administrators to give a virtual machine all of the storage it will ever need without having to constantly watch to see if it needs more space. Second, thin provisioning can reduce conflicts between IT and other teams: application owners can request all of the storage they like, and storage administrators, knowing full well that the request is too high, can simply grant the request without worrying about wasting that over-requested storage.

However, thin provisioning also carries with it some challenges. While it can make life easier on a daily basis, it does add some complexity, and if administrators aren’t careful, they can introduce major availability issues. If the storage oversubscription results in the storage volume running out of space, the VMs will still think they have available disk space to use, but there won’t be any space available. This can cause serious outages that can result in data loss and costly recovery. So if you use thin provisioning, be sure to monitor carefully.

Metrics to watch

To mitigate the risks associated with thin provisioning, you need to keep a close eye on the amount of free space in a datastore. As a datastore gets low on space, proactively add space to the datastore or use Storage vMotion to move one of the virtual machines to a different datastore that has enough available capacity to serve the needs of the workloads.

Real-world observations and advice

Thin provisioning is well represented in Environment Explorer, although it’s displayed only in aggregate. In the example in Figure 2, 26.6 GB of storage is currently in use, which is 13 percent of what’s actually provisioned to the three virtual machines that are powered on. As the “% of provisioned” metric approaches 100 percent, take care to ensure that additional physical resources are made available.

Conclusion

Overprovisioning of processing, memory and storage can help you maximize resource utilization in your virtual environment. But you want to overprovision in such a way that you can also maintain high performance. Using the techniques and real-world advice presented in this white paper, along with the right tools, you can balance these needs and overprovision wisely.

For More Information
[...] contains proprietary information protected by copyright. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying and recording for any purpose without the written permission of Dell, Inc. (“Dell”).

Dell, Dell Software, the Dell Software logo and products, as identified in this document, are registered trademarks of Dell, Inc. in the U.S.A. and/or other countries. All other trademarks and registered trademarks are property of their respective [...]

The information in this document is provided in connection with Dell products. No license, express or implied, by estoppel or otherwise, to any intellectual property right is granted by this document or in connection with the sale of Dell products. [...] ANY EXPRESS, IMPLIED OR STATUTORY WARRANTY RELATING TO ITS PRODUCTS INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT. IN NO EVENT SHALL DELL BE LIABLE FOR ANY DIRECT, INDIRECT, [...] DAMAGES (INCLUDING, WITHOUT LIMITATION, DAMAGES FOR LOSS OF PROFITS, BUSINESS INTERRUPTION OR LOSS OF INFORMATION) ARISING OUT OF THE USE OR INABILITY TO USE THIS DOCUMENT, EVEN IF DELL HAS BEEN ADVISED [...] representations or warranties with respect to the accuracy or completeness of the contents of this document and reserves the right to make changes to specifications and product descriptions at any time without notice. Dell does not make any commitment to update the information contained in this [...]

About Dell
Dell Inc. (NASDAQ: DELL) listens to customers and delivers
worldwide innovative technology, business solutions and
services they trust and value. For more information,

If you have any questions regarding your potential use of this material, contact:

Dell Software
5 Polaris Way
Aliso Viejo, CA 92656
Refer to our Web site for regional and international
office information.

