
Storage and Hyper-V Part 1: Fundamentals

www.altaro.com/hyper-v/storage-and-hyper-v-part-1-fundamentals/

17 Sep 2013 by Eric Siron

When it comes to Hyper-V, storage is a massive topic.


There are enough possible ways to configure storage that it could almost get its own book,
and even then something would likely be forgotten. This post is the first of a multi-part
series that will try to cover just about everything about storage. I won't promise that it
won't miss something, but the intent is to make it all-inclusive.

Part 1 – Hyper-V Storage Fundamentals
Part 2 – Drive Combinations
Part 3 – Connectivity
Part 4 – Formatting and file systems
Part 5 – Practical Storage Designs
Part 6 – How To Connect Storage
Part 7 – Actual Storage Performance

The Beginning
This series was actually inspired by MVP Alessandro Cardoso, whom you can catch up
with on his blog over at CloudTidings.com. He was advising me on another project in which
I was spending a lot of time and effort on the very basics of storage. Realistically, it was too
in-depth for that project. His advice was simply to move it here. Who am I to argue with an
MVP?

This first entry won't deal with anything that is directly related to Hyper-V. It will introduce
you to some very nuts-and-bolts aspects of storage. For many of you, a lot of this material
is going to be pretty remedial. Later entries in this series will build upon the contents of this
post as though everyone is perfectly familiar with them.

Intro
In the morass of storage options, one thing remains constant: you need drives. You might
have a bunch of drives in a RAID array. You might stick a bunch of drives in a JBOD
enclosure. You might rely on disks inside your host. Whatever you do, it will need drives.
These are the major must-haves of any storage system, so it is with drives that we will kick
off our discussion.

Spinning Disks
Hard disks with spinning platters are the venerated workhorses of the datacenter and are a
solid choice. Their pros are:

The technology is advanced and mature
Administrators are comfortable with them
They provide the lowest cost per gigabyte of space
They offer acceptable performance for most workloads

Cons are:

Their reliance on moving parts makes them some of the most failure-prone equipment
in modern computing
Their reliance on moving parts makes them some of the slowest equipment in modern
computing

Spinning disks come in many forms and there are a lot of things to consider.

Rotational Speed and Seek Time


Together, rotational speed and seek time are the primary determinants of how quickly a
hard drive performs. As the disk spins, the drive head is moved to the track that contains
the location it needs to read or write and, as that location passes by, performs the
necessary operation.

How quickly the disk spins is measured in revolutions per minute (RPM). Typical disk
speeds range from 5400 RPM to 15,000 RPM. For most server-class workloads, especially
an active virtualization system, 10,000 RPM drives and higher are recommended. 7200
RPM drives may be acceptable in low-load systems. Slower drives should be restricted to
laptops, desktops, and storage systems for backup.

The second primary metric is seek time, occasionally called access time. This refers to the
amount of time it takes for the drive's read/write head to travel from its current track to the
next track needed for a read/write operation. Manufacturers usually publish a drive's
average seek time, which is determined by measuring seeks in a random pattern.
Sometimes a maximum seek time will also be given, which measures how long it takes the
head to travel across the entire platter. Lower seek times are better.
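
To see how these two numbers interact, here is a minimal back-of-the-envelope sketch in
Python. The RPM and seek figures are illustrative assumptions, not vendor specifications,
and real drives will vary:

```python
def estimated_random_iops(rpm, avg_seek_ms):
    # Average rotational latency is the time for half a revolution.
    rotational_latency_ms = (60_000 / rpm) / 2
    # Approximate one random I/O as one average seek plus the rotational wait.
    service_time_ms = avg_seek_ms + rotational_latency_ms
    return 1_000 / service_time_ms

# Illustrative drive classes: (RPM, assumed average seek in milliseconds)
for rpm, seek in [(7_200, 8.5), (10_000, 4.5), (15_000, 3.5)]:
    print(f"{rpm:>6} RPM, {seek} ms seek: ~{estimated_random_iops(rpm, seek):.0f} random IOPS")
```

The output lands in the familiar 80-180 IOPS range for a single spinning disk, which is why
rotational speed matters so much on a busy virtualization host.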

Consider the following diagram (I apologize for the crudity of this drawing; I'm a pretty good
sysadmin, for an artist, and the original plan was to have an actual artist redo this):

Basically, the drive is always spinning in the direction indicated by the arrow and at full
speed. The read/write head moves on its arm in the direction indicated by the arrow. What
it has to do is move to the track where it needs to read or write data, then wait for the disk
to spin around to the necessary spot. If you've ever seen a video of a train picking up mail
from a hook as it went past, it's sort of like that, only at much, much higher speeds. The
electronics in the drive tell it the precise moment when it needs to perform its read or write
operation.

Hard Disk Diagram

Cache
Drives have a small amount of onboard cache memory, which is typically only for reads. If
the computer requests data that happens to be in the cache, the drive sends it immediately
without waiting for an actual read operation. In general, this can happen at the maximum
speed of the computer-to-drive bus. Most drives will not use their onboard cache for writes,
as the contents would be reported to the computer as written but lost during a power
outage. Mechanisms that save writes in the cache (for subsequent reads) and also directly
on the platter are called "write-through" caches. "Write-back" caches place writes in the
cache and immediately report success to the operating system, allowing the disk to commit
them to the platter when it gets around to it. These should be battery-backed. The larger
the cache, the better the overall performance.
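
As a rough illustration of the difference, here is a toy Python sketch. It is purely conceptual;
drive firmware is not structured this way, and the class and method names are my own
invention:

```python
class WriteThroughCache:
    """Writes go to stable media before success is reported."""
    def __init__(self):
        self.cache, self.platter = {}, {}

    def write(self, lba, data):
        self.platter[lba] = data  # committed to the platter first...
        self.cache[lba] = data    # ...and kept in cache to speed up later reads
        return "ack"              # safe: the data survives a power loss

class WriteBackCache:
    """Writes are acknowledged from volatile cache and flushed later."""
    def __init__(self):
        self.cache, self.platter = {}, {}

    def write(self, lba, data):
        self.cache[lba] = data    # only in volatile cache so far
        return "ack"              # fast, but lost on a power failure
                                  # unless the cache is battery-backed

    def flush(self):
        self.platter.update(self.cache)  # the disk "gets around to it"
```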

IOPS
Input/output operations per second (IOPS) is sort of a catch-all metric that encompasses all
possible operation types on the hard drive using a variety of tests. Due to the
unpredictable nature of the loads on shared storage in a virtualized environment, the metric
of most value is usually random IOPS, although you will also find specific metrics such as
sequential write IOPS and random read IOPS. If a drive's specifications distinguish between
read IOPS and write IOPS, keep in mind that most server loads, including virtualization, are
heavier on reads than writes. Higher IOPS is better.

Sustained Transfer Rate
Sustained transfer rate is effectively the rate at which the drive can transfer data in a
worst-case scenario, usually measured in megabytes per second. There are a variety of
ways to ascertain this speed, so you should assume that manufacturers are publishing
optimistic numbers. A higher sustained transfer rate is better.
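
The two metrics connect through I/O size: throughput is just IOPS multiplied by the size of
each operation. A quick sketch with illustrative numbers shows why a drive with a
respectable sustained transfer rate can still crawl under random load:

```python
def throughput_mb_per_s(iops, io_size_kb):
    return iops * io_size_kb / 1_024

# ~150 random IOPS of 4 KB operations barely moves any data...
print(throughput_mb_per_s(150, 4))     # ~0.6 MB/s
# ...while the same class of drive streaming 64 KB sequential I/O looks far better.
print(throughput_mb_per_s(2_000, 64))  # ~125 MB/s
```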

Form Factor
Spinning disks come in two common sizes: 2.5 inch and 3.5 inch. The smaller size limits
the platter radius, which limits the distance that the drive head can possibly cover. This
automatically reduces the maximum seek time and, as a result, lowers the average seek
time as well. The drawback is that the surface area of the platters is reduced. 2.5 inch
drives are faster, but do not come in the same capacities as their 3.5 inch cousins.

4 Kilobyte Sector Sizes


Update 9/20/2013: A note from reader Max pointed out some serious flaws in this section
that have now been corrected.

A recent advance in hard drive technology has been the move to four-kilobyte sector
sizes. A sector is a logical boundary on a hard drive disk track that is designated to hold a
specific number of bits. Historically, a sector has been almost universally defined as being
512 bytes in size. However, this eventually led to a point where expanding hard drive
capacities by increasing bit density was no longer feasible. By increasing sector sizes to
4,096 bytes (4k), superior error-correction methods can be employed to help protect data
from the inevitable degradation of the magnetic state of a sector (fondly known as
"bit rot").

The drawback with 4k sectors is that not all operating systems support them. So, in the
interim, "Advanced Format" drives have been created to bridge the gap. They use
emulation to present themselves as 512-byte-per-sector drives. These are noted as 512e
(for emulated) drives, as opposed to 512n (for native). The emulation incurs a substantial
performance penalty during write operations. Because Windows Server 2012, and by
extension, Hyper-V Server 2012, can work directly with 4k sector drives, it is recommended
that you choose either 512n or 4k sector drives.
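
One practical consequence: on a 512e drive, writes that do not line up with the physical
4,096-byte sectors force a read-modify-write cycle, which is where the performance penalty
comes from. A minimal alignment check, using illustrative partition offsets:

```python
PHYSICAL_SECTOR_BYTES = 4_096

def is_aligned(partition_offset_bytes):
    # An offset that is a whole multiple of the physical sector size avoids
    # read-modify-write cycles on 512e drives.
    return partition_offset_bytes % PHYSICAL_SECTOR_BYTES == 0

print(is_aligned(1_048_576))  # True  - the 1 MB offset modern Windows uses
print(is_aligned(32_256))     # False - the old 63-sector (31.5 KB) offset
```

On Windows, fsutil fsinfo ntfsinfo can show the bytes per physical sector that a volume's
underlying drive reports.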

Solid-State Storage
Solid-state storage, generally known as solid-state drives (SSDs), uses flash memory
technology. Because they have no moving parts, SSDs have none of the latency or
synchronization issues of spinning hard drives, and they consume much less power in
operation. Their true attractiveness is speed; SSD sustained data transfer rates are an
order of magnitude faster than those of conventional drives. However, they are much more
expensive than spinning disks.

Another consideration with SSD is that every write operation causes irreversible
deterioration to its flash memory such that the drive eventually becomes unusable. As a
result of this deterioration, early SSDs had a fairly short life span. Manufacturers have
made substantial improvements that maximize the overall lifespan of SSDs. There are two
important points to remember about this deterioration:

A decent SSD will be able to process far more write operations in its lifetime than a
spinning disk
The wearing effects of data writes on an SSD are measurable, so the amount of
remaining life for an SSD is far more predictable than for a spinning disk

The counterbalance is that SSDs are so much faster than spinning disks that it is possible
for them to reach their limits much faster than their mechanical counterparts. Every
workload is unique, so there is no certain way to know just how long an SSD will last in your
environment. Typical workloads are much heavier on reads than on writes, and read
operations do not have the wearing effect of writes. Most installations can reasonably
expect their SSDs to last at least five years. Just watch out for that imbalanced
administrator who needs to defragment something every few minutes because he's
convinced that someone was almost able to detect a slowdown. He's going to need to be
kept far away from your SSDs.
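
Because the wear is measurable, you can do a crude lifespan projection. The endurance
rating and daily write volume below are illustrative assumptions; substitute your vendor's
published TBW (terabytes written) figure and your own monitoring data:

```python
def estimated_ssd_years(tbw_rating_tb, writes_per_day_gb):
    # Days until the rated write endurance is consumed, converted to years.
    days = (tbw_rating_tb * 1_024) / writes_per_day_gb
    return days / 365

# A hypothetical 300 TBW drive absorbing 100 GB of writes per day:
print(f"~{estimated_ssd_years(300, 100):.1f} years")  # ~8.4 years
```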

There are two general types of SSD: single-level cell (SLC) and multi-level cell (MLC). A
single-level cell is binary: it’s either charged or it isn’t. This means that SLC can hold one bit
of information per cell. MLC has multiple states, so each cell can hold more than a single bit
of data. MLC has a clear advantage in density, so you can find these drives at a lower cost
per gigabyte. However, SLC is much more durable. For this reason alone, SLC is almost
exclusively the SSD of choice in server-class systems where data integrity is a priority. As
added bonuses, SLC has superior write performance and uses less energy to operate.
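
The density relationship is simple: the number of bits a cell can hold is the base-2
logarithm of the number of charge states it can reliably distinguish. A one-liner makes the
point:

```python
import math

# SLC distinguishes 2 states (1 bit); MLC packs more states into each cell.
for states in (2, 4, 8):
    print(f"{states} states -> {int(math.log2(states))} bit(s) per cell")
```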

Performance measurements for SSDs can vary widely, but they are capable of read
input/output operations per second (IOPS) in the tens of thousands, write IOPS in the
thousands, and transfer rates of hundreds of megabytes per second.

Drive Bus
The drive bus is the technology used to connect a drive to its controller. If you’re thinking
about using shared storage, don’t confuse this connection with the external host-to-storage
connector. The controller being covered in this section is inside the computer or device that
directly connects to the drive(s). As an example, you can connect an iSCSI link to a storage
device whose drives are connected to a SAS bus.

The two most common drive bus types used today are Serial-Attached SCSI (SAS) and
serial ATA (SATA). Fibre Channel drives are also available, but they have largely fallen out
of favor as a result of advances in SAS technology.

SAS is an iteration of the SCSI bus that uses a serial connection instead of a parallel one.
SCSI controllers are designed with the limitations of hard drives in mind, so requests for
drive access are quite well-optimized. SAS is most commonly found in server-class
computing devices.

SATA is an improvement upon the earlier PATA bus. Like SAS to SCSI, it signaled a switch
from parallel connectivity to serial. However, an early iteration of the SATA bus also
introduced a very important new feature. Whereas older ATA drives processed I/O requests
in the order they arrived (a model known as FIFO: first-in, first-out), SATA was enhanced
with native command queuing (NCQ). NCQ allows the drive to plan its platter access in a
fashion that minimizes read head travel, which helped it gain on the optimizations in the
electronics of SCSI drives. SATA drives are most commonly found in desktop and laptop
computers, although they are becoming more common in entry-level and even mid-tier
NAS and SAN devices.
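
Here is a simplified Python sketch of the idea behind NCQ: rather than serving requests
first-in, first-out, sweep the head across the platter in one direction and pick up requests
along the way. Real firmware also accounts for rotational position; the track numbers here
are illustrative:

```python
def fifo_travel(head, requests):
    # Total head movement when requests are served in arrival order.
    travel = 0
    for track in requests:
        travel += abs(track - head)
        head = track
    return travel

def elevator_travel(head, requests):
    # Serve everything ahead of the head in one sweep, then reverse.
    ahead = sorted(t for t in requests if t >= head)
    behind = sorted((t for t in requests if t < head), reverse=True)
    return fifo_travel(head, ahead + behind)

queue = [500, 90, 480, 10, 530]
print(fifo_travel(100, queue))      # 2190 tracks of head movement
print(elevator_travel(100, queue))  # 950 tracks for the same five requests
```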

SAS drives are available in speeds up to 15,000 RPM where SATA tops out at 10,000
RPM. SAS drives usually have superior seek times to SATA. Also, they tend to be more
reliable, but this has less to do with any particular strengths and weaknesses of either bus
type and more to do with how manufacturers focus on build quality. Of course, all this
translates directly to cost. Quality aside, SATA's one benefit over SAS is that it comes in
higher maximum capacities. Both bus types currently have a maximum speed of six
gigabits per second but, as explained above, it is not realistic to expect any one drive to be
able to reach that speed on a regular basis.

You will sometimes find hard drive systems labelled as near-line. Unfortunately, this isn’t a
tightly-controlled term so you may find it used in various ways, but for a pure hard drive
system it most properly refers to a SATA drive that can be plugged into a SAS system. This
is intended mainly for mixed-drive systems where you need some drives to have the higher
speed of SAS but also the higher capacity of SATA.

SSDs can use either bus type. Because they have no moving parts and therefore none of
the performance issues of spinning disks, the differences between SAS and SATA SSDs
are largely unnoticeable.

External Connections
If you’re going to use external storage, your choices are actually pretty simple. You can use
a direct-connect technology, usually something with an external SAS cable. You can use
Fibre Channel, which can direct-connect or go through an FC switch. And, you can use
iSCSI, which travels over regular Ethernet hardware and cables. Pretty straightforward. The
issue here is typically cost. FC is lossless and gets you the maximum speed of the wire…
assuming your disks can actually push/pull that much data. iSCSI has the standard TCP
overhead. It can be reduced using jumbo frames, but it’s still there. However, if you’ve got
10GbE connections, then the overhead is probably of little concern. Another option is SMB
3.0. Like iSCSI, this works over regular Ethernet.
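
To put a rough number on the Ethernet overhead mentioned above, here is a
payload-efficiency approximation in Python. Header sizes are the usual Ethernet/IP/TCP
figures plus the 48-byte iSCSI basic header segment; real traffic varies with TCP options
and PDU sizes, so treat the output as a ballpark:

```python
def iscsi_payload_efficiency(mtu):
    # Data left in the frame after IP (20), TCP (20), and iSCSI (48) headers.
    payload = mtu - (20 + 20 + 48)
    # On the wire, every frame also carries the Ethernet header (14), FCS (4),
    # preamble (8), and inter-frame gap (12).
    wire_bytes = mtu + 14 + 4 + 8 + 12
    return payload / wire_bytes

print(f"Standard 1500 MTU: {iscsi_payload_efficiency(1_500):.1%}")  # ~91.8%
print(f"Jumbo 9000 MTU:    {iscsi_payload_efficiency(9_000):.1%}")  # ~98.6%
```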

Why This is All So Important


So, why have a whole blog post dedicated to really basic material? Well, if you truly know
all these things, then I apologize, as I probably wasted your time. But, even though this all
seems like common knowledge, many administrators come into this field midstream and
have to sort of piece all this together through osmosis and a lot of them come out
understanding a lot less than they think they do. For instance, I once watched a couple of
“storage experts” insist on connecting an array using a pair of 8Gb Fibre Channel
connectors. Their reasoning was that it would aggregate and get 16Gbps throughput in and
out of the drives. That's all well and good, except that there were only 5 drives in the array,
and they were 10,000 RPM disks. Know when those drives are going to be able to push
16Gbps? Never. Of course, the link is redundant, and there's some chance that one of the
lines could fail. Know when those 5 drives are going to reach 8Gbps? Also never. "Cache!"
they exclaimed. Sure, but at 8Gbps, the entire cache would empty in a fraction of a second,
so it still wouldn't sustain anything close to 8Gbps. "Expansion!" they cried. OK, so the
system has been in place for a decade
and experiences about 5% data growth annually. Of the presented capacity of these 5
drives, 30% was being used. So, the drives and device would no longer be in warranty
once they reached the point where a sixth drive would need to be added. Even a sixth drive
wouldn’t put the array in a place where it could use 8Gbps.

This was a discussion I was having with experts. People paid to do nothing but sit around
and think about storage all day. People who will be consulting for you when you want to put
storage in your systems. Look around a bit, and you’ll see all sorts of people making
mistakes like this all the time. Remedial or not, this information is important, and a lot further
from the realm of common knowledge than you might think.

What’s Next
The next part of this series is going to look at how you combine drives together to offset
their performance and reliability issues. It will then add all this together to start down the
path of architecting a good storage solution for your Hyper-V system.


Eric Siron
I have worked in the information technology field since 1998. I have
designed, deployed, and maintained server, desktop, network, and
storage systems. I provided all levels of support for businesses ranging
from single-user through enterprises with thousands of seats. Along
the way, I have achieved a number of Microsoft certifications and was
a Microsoft Certified Trainer for four years. In 2010, I deployed a
Hyper-V Server 2008 R2 system and began writing about my
experiences. Since then, I have been writing regular blogs and
contributing what I can to the Hyper-V community through forum participation and free
scripts.
