
WHITE PAPER

CREATING STRETCH CLUSTERS WITH STORMAGIC SvSAN

Overview
Many physical disruptions and disasters that organizations face are localized to a specific room, building or campus. Much maintenance and repair work that organizations undertake is similarly contained to a single location or is completed in a staged manner. For IT operations, these more-frequent, localized events cause significant disruption, but the disruption can be minimized or eliminated through the use of stretch clusters.

This white paper will examine the use cases for stretch clusters, and how StorMagic SvSAN enables stretch clusters to be created. Supporting this are test results demonstrating the level of latency (and therefore distance) that SvSAN can tolerate for both read and write operations.

Key considerations in operational resiliency
1. Localized disruptions and disasters are more frequent than regional disasters
2. Recovery from many regional disasters can be accomplished locally without resorting to a disaster declaration
3. Stretch clusters across buildings, campuses and metro-areas are a valuable tool in reducing or eliminating service interruptions from both local and regional disasters
4. Stretch clusters are relatively low cost, especially when compared to the cost of an out-of-region disaster recovery center
5. Stretch clusters help avoid the high cost of disaster declarations

Regional disasters are rare, but localized disruptions are frequent
Even in high-risk areas, such as California for earthquakes, Kansas and Oklahoma for tornadoes, and Louisiana for hurricanes, regional disasters of sufficient impact to disrupt IT operations are a relatively rare occurrence. Smaller disasters and disruptions, however, are significantly more frequent and have greater impact on daily operations than the regional disasters against which organizations protect themselves. Small disasters may include localized flooding, fires, air contamination, bombs from localized terrorist attacks, and lock-downs due to active-shooter incidents. Small disruptions may occur from equipment maintenance, utility system upgrades, and building remodeling.

These situations rarely require a disaster declaration, which would invoke costly recovery procedures at a distant disaster recovery site, but rather, can typically be managed by failing over applications to another system located within the building, campus, or metropolitan area. This approach is dramatically less disruptive

StorMagic. Copyright © 2018. All rights reserved.


and, with today's typical stretch cluster technology, can be done quite cost effectively.

Creating stretch clusters with StorMagic SvSAN
The majority of StorMagic SvSAN cluster deployments will reside in a single physical location, either in the same rack, equipment room or site, with cluster nodes usually separated by a relatively short distance. Despite this, SvSAN offers the storage architect the ability to separate the cluster nodes over greater distances while maintaining synchronous data replication, thereby creating a stretch cluster configuration, shown in fig. 1.

The recommended maximum distance two SvSAN nodes can be stretched across is 1,000 km. That's equivalent to the distance from New York City to Cincinnati. Expressed as latency, SvSAN can tolerate a recommended round trip time of 10ms.

Bandwidth and latency - the limitations of using stretch clusters
If you are considering stretch clusters as part of your storage infrastructure, then you will no doubt be aware of the bandwidth and latency limitations that will influence your deployment.

When the network performance is poor, a typical response to solve the issue is to increase network bandwidth, which is a relatively simple thing to achieve. However, this only improves performance when there is network congestion. Bandwidth addresses the amount of data that can be transmitted, but does not govern the speed of the link. In general, having more bandwidth reduces the likelihood of congestion. Bandwidth is similar to lanes on a highway – more lanes enable more vehicles (data packets) to use the highway at the same time. When all lanes are full, the bandwidth limit is reached, which leads to congestion (traffic jams).

Latency, or round trip time (RTT), is another important factor and determines the speed of the link – lower latency equates to better network speeds. Referring back to the highway analogy, latency is the highway speed limit and impacts the time it takes to complete a round trip. Unfortunately, getting better network latency is not as simple as increasing bandwidth, as it is affected by a number of factors, most of which are out of the user's control. These include:

• Propagation delay - the time that data takes to travel through a medium, such as fibre optic cable or copper wire, relative to the speed of light.

• Routing and switching delays - the number of routers and switches data has to pass through.

• Data protocol conversions - decoding and re-encoding slows down data.

• Network congestion and queuing - bottlenecks caused by routers and switches.

• Application latency - some applications can introduce or only tolerate a certain amount of latency.
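
As a rough illustration of how propagation delay and bandwidth contribute differently to link performance, the sketch below estimates the round trip time caused by fibre distance alone, and the time to send a single small write and receive its acknowledgement. It is not taken from StorMagic's tests. It assumes a signal speed in fibre of about 200,000 km/s (a commonly quoted approximation, roughly two-thirds of the speed of light in a vacuum), and the function names are hypothetical. Routing, switching, protocol conversion and queuing delays would add to these figures.

```python
# Assumption: light travels through optical fibre at roughly 200,000 km/s,
# about two-thirds of its speed in a vacuum. Real links vary.
FIBRE_SPEED_KM_PER_S = 200_000.0


def propagation_rtt_ms(distance_km: float) -> float:
    """Round trip time in milliseconds from propagation delay alone."""
    return 2 * distance_km / FIBRE_SPEED_KM_PER_S * 1000.0


def max_distance_km(rtt_budget_ms: float) -> float:
    """Longest fibre run whose propagation delay fits within an RTT budget."""
    return rtt_budget_ms / 1000.0 / 2 * FIBRE_SPEED_KM_PER_S


def remote_write_ms(payload_bytes: int, bandwidth_bps: float, rtt_ms: float) -> float:
    """Simplified time to send one write and receive its acknowledgement.

    Serialization time depends on bandwidth; the round trip depends on latency.
    """
    serialization_ms = payload_bytes * 8 / bandwidth_bps * 1000.0
    return serialization_ms + rtt_ms


if __name__ == "__main__":
    for km in (1, 100, 1000):
        print(f"{km:>5} km -> {propagation_rtt_ms(km):.2f} ms RTT")
    print(f"10 ms budget -> {max_distance_km(10):.0f} km of fibre")

    # A 4KB write over a 10 ms RTT link at 1 Gb/s and at 10 Gb/s.
    for bps in (1e9, 10e9):
        total = remote_write_ms(4096, bps, rtt_ms=10.0)
        print(f"{bps / 1e9:.0f} Gb/s -> {total:.3f} ms per 4KB write")
```

Under these assumptions, 1,000 km of fibre accounts for about 10ms of round trip time, which is consistent with the distance and latency figures quoted in this paper. The second loop shows why adding bandwidth does little for small writes on an uncongested link: the round trip dominates the total.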

Fig. 1: A typical stretch cluster, with one node on Site A, another on Site B, and a witness node separated remotely.

Putting SvSAN's latency tolerance to the test
StorMagic has conducted tests with SvSAN to determine the bandwidth and latency (and distance) that two nodes within a cluster can tolerate before service is adversely affected.



Server configuration
The tests were performed using a pair of Supermicro X7DVL-3 servers with the following configuration:

Processor        1 x Intel Xeon E5405 4-core 2GHz
Memory           8GB RAM
Network Adapter  2 x Intel 80003ES2 Gigabit Ethernet Adapters
RAID Controller  1 x LSI MegaRAID 1078 RAID Controller
Disk Drives      3 x Western Digital 500GB 7.2k RPM SATA (WD5002ABYS) in a RAID-0 stripe and presented as Raw Device Mapped (RDM)
Hypervisor       VMware vSphere ESXi 5.5

I/O load generation software – IOMeter
To ensure that the tests were repeatable, I/O load was generated using the IOMeter tool (www.iometer.org).

The following I/O patterns were used:
• 4KB block size, 100% random read
• 4KB block size, 100% random write
• 8KB block size, 100% random read
• 8KB block size, 100% random write
Disk queue depth = 32

Latency emulator – WANem
To simulate different network latencies, the WANem (http://wanem.sourceforge.net) Wide Area Network Emulator tool was used. WANem can be used to simulate Wide Area Network conditions, allowing different network characteristics, such as network delay (latency), bandwidth, packet loss, packet corruption, disconnections, packet re-ordering, jitter, etc. to be configured.

The configuration used for the tests is illustrated in fig. 2.

Fig. 2: Stretch cluster testing configuration.

The response times and IOPS remained relatively constant up to latencies of 30ms, after which the random write IOPS rapidly dropped off and write response times began to rise. These results are shown in fig. 3.

Fig. 3: Graph depicting the impact of latency on read and write IOPS and response time when using SvSAN.

NOTE: Using different disk types (SSDs, 15K RPM, SAS, wider disk stripes, etc.) would increase the number of IOPS and/or lower the response times, but would still follow similar profiles.

Additional findings included:

• Random read IOPS and response times were not affected by increased latencies and remained relatively constant at 700 IOPS and 45.6ms throughout the test. The read requests are fulfilled by the local storage, and
do not need to traverse the network.

• Write IOPS were between 730 and 770 until the RTT reached 30ms, after which the number of IOPS rapidly dropped off. This is because synchronous mirroring requires all writes to be committed to both sides of the mirror before being acknowledged as complete, so every write has to traverse the network.

• The write response time averaged ~42.5ms until the RTT reached 30ms, after which it began to rise; on reaching 100ms of delay it increased exponentially. These response times could lead to unacceptable application performance.

Recommendations
Should you choose to leverage SvSAN's stretch cluster feature, and to ensure data is synchronously replicated, the following recommendations should be followed:

• Although SvSAN synchronous mirroring can tolerate high round trip times, it is recommended that the round trip time should not exceed 10ms, equating to 1,000 km excluding other factors that affect network latency. Latencies greater than this could lead to performance degradation or unpredictable behavior.

• Bandwidth between SvSAN VSAs should be a minimum of 1Gb/s, although the actual requirement depends on the amount of data that needs to be synchronously mirrored.

• Ideally there should be a minimum of two separate network connections for redundancy.

• Make certain the path selection policy ensures that the data is read from the local side of the mirror, so that read requests do not traverse the network.

Conclusions
Separating or stretching the nodes within a storage cluster is a very effective way of increasing the resiliency of storage infrastructure. Creating a stretch cluster by placing the nodes in separate racks, rooms, buildings or cities can almost eliminate downtime associated with localized disasters such as fire, flooding or power outages. The versatility of SvSAN's stretch cluster feature enables storage infrastructure to be configured in this highly resilient manner.

The testing conducted for this white paper has shown that SvSAN synchronous mirroring can withstand high latencies, allowing the SvSAN cluster nodes to be separated by up to 1,000 km, which provides a huge scope for building resiliency into an organization's storage infrastructure.

Further Reading
Enabling stretch clusters is just one feature of many within SvSAN. Why not explore some of the others, such as Predictive Storage Caching, or Data Encryption? These features and more can be accessed through the extensive collection of white papers on the StorMagic website.

Additional details on SvSAN are available in the Technical Overview, which details SvSAN's capabilities and deployment options.

If you're ready to test SvSAN in your environment, you can do so totally free of charge, with no obligations. Simply download our fully-functioning free trial of SvSAN from the website.

If you still have questions, or you'd like a demo of SvSAN, you can contact the StorMagic team directly by sending an email to sales@stormagic.com

StorMagic
Unit 4, Eastgate Office Centre
Eastgate Road
Bristol
BS5 6XX
United Kingdom
+44 (0) 117 952 7396
sales@stormagic.com
www.stormagic.com
