Many physical disruptions and disasters that organizations face are localized to a specific room, building or campus. Much maintenance and repair work that organizations undertake is similarly contained to a single location or is completed in a staged manner. For IT operations, these more-frequent, localized events cause significant disruption, but the disruption can be minimized or eliminated through the use of stretch clusters.

This white paper will examine the use cases for stretch clusters, and how StorMagic SvSAN enables stretch clusters to be created. Supporting this are test results demonstrating the level of latency (and therefore distance) that SvSAN can tolerate for both read and write operations.

Localized disruptions are frequent
Even in high-risk areas, such as California for earthquakes, Kansas and Oklahoma for tornadoes, and Louisiana for hurricanes, regional disasters of sufficient impact to disrupt IT operations are a relatively rare occurrence. Smaller disasters and disruptions, however, are significantly more frequent and have greater impact on daily operations than the regional disasters against which organizations protect themselves. Small disasters may include localized flooding, fires, air contamination, bombs from localized terrorist attacks, and lock-downs due to active-shooter incidents. Small disruptions may occur from equipment maintenance, utility system upgrades, and building remodeling.
These situations rarely require a disaster declaration, which would invoke costly recovery procedures at a distant disaster recovery site. Rather, they can typically be managed by failing over applications to another system located within the building, campus, or metropolitan area. This approach is dramatically less disruptive and, with today's stretch cluster technology, can be done quite cost effectively.

Key considerations in operational resiliency
1. Localized disruptions and disasters are more frequent than regional disasters
2. Recovery from many regional disasters can be accomplished locally without resorting to a disaster declaration
3. Stretch clusters across buildings, campuses and metro-areas are a valuable tool in reducing or eliminating service interruptions from both local and regional disasters
4. Stretch clusters are relatively low cost, especially when compared to the cost of an out-of-region disaster recovery center
5. Stretch clusters help avoid the high cost of disaster declarations

Creating stretch clusters with StorMagic SvSAN
The majority of StorMagic SvSAN cluster deployments will reside in a single physical location, either in the same rack, equipment room or site, with cluster nodes usually separated by a relatively short distance. Despite this, SvSAN offers the storage architect the ability to separate the cluster nodes over greater distances while maintaining synchronous data replication, thereby creating a stretch cluster configuration, shown in fig. 1.

The recommended maximum distance two SvSAN nodes can be stretched across is 1,000km. That's equivalent to the distance from New York City to Cincinnati. Expressed as latency, the recommended maximum SvSAN can tolerate is 10ms.

Bandwidth and latency - the limitations of using stretch clusters
If you are considering stretch clusters as part of your storage infrastructure, then you will no doubt be aware of the bandwidth and latency limitations that will influence your deployment.

When network performance is poor, a typical response is to increase network bandwidth, which is a relatively simple thing to achieve. However, this only improves performance when there is network congestion. Bandwidth addresses the amount of data that can be transmitted, but does not govern the speed of the link. In general, having more bandwidth reduces the likelihood of congestion. Bandwidth is similar to lanes on a highway: more lanes enable more vehicles (data packets) to use the highway at the same time. When all lanes are full the bandwidth limit is reached, which leads to congestion (traffic jams).

Latency, or round trip time (RTT), is another important factor and determines the speed of the link: lower latency equates to better network speeds. Referring back to the highway analogy, latency is the highway speed limit and impacts the time it takes to complete a round trip. Unfortunately, getting better network latency is not as simple as increasing bandwidth, as it is affected by a number of factors, most of which are out of the user's control. These include:

• Propagation delay - the time that data takes to travel through a medium, such as fibre optic cable or copper wire, relative to the speed of light.
• Routing and switching delays - the number of routers and switches data has to pass through.
• Data protocol conversions - decoding and re-encoding slows down data.
• Network congestion and queuing - bottlenecks caused by routers and switches.
• Application latency - some applications can introduce or only tolerate a certain amount of latency.
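
The 1,000km and 10ms figures quoted for SvSAN are consistent with propagation delay alone. The Python sketch below is illustrative and is not part of StorMagic's testing. It assumes a signal speed in optical fibre of roughly 200,000 km/s (about two thirds of the speed of light in a vacuum), and the function names are hypothetical.

```python
# Assumption: light travels through optical fibre at roughly 200,000 km/s
# (about two thirds of its speed in a vacuum). Real links add routing,
# switching, protocol conversion and queuing delays on top of this.
FIBRE_SPEED_KM_PER_S = 200_000.0


def propagation_rtt_ms(distance_km: float) -> float:
    """Round trip propagation delay in milliseconds over a fibre path."""
    one_way_s = distance_km / FIBRE_SPEED_KM_PER_S
    return 2 * one_way_s * 1000.0


def max_distance_km(rtt_budget_ms: float) -> float:
    """Longest fibre path whose propagation delay alone fits the RTT budget."""
    one_way_s = (rtt_budget_ms / 1000.0) / 2
    return one_way_s * FIBRE_SPEED_KM_PER_S


if __name__ == "__main__":
    for km in (1, 100, 1000):
        print(f"{km:>5} km -> {propagation_rtt_ms(km):.2f} ms RTT")
    print(f"10 ms RTT budget -> {max_distance_km(10):.0f} km")
```

Because the other factors in the list add delay on top of propagation, a real 1,000km link will normally measure more than 10ms, so the round trip time should be measured on the actual path rather than inferred from distance.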
Fig. 1: A typical stretch cluster, with one node on Site A, another on Site B, and a witness node separated remotely.

Putting SvSAN's latency tolerance to the test
StorMagic has conducted tests with SvSAN to determine the bandwidth and latency (and distance) that two nodes within a cluster can tolerate before service is adversely affected.
SATA (WD5002ABYS) in a RAID-0 stripe and presented as Raw Device Mapped (RDM)

Hypervisor: VMware vSphere ESXi 5.5

I/O load generation software - IOMeter
To ensure that the tests were repeatable, I/O load was generated using the IOMeter tool (www.iometer.org). The following I/O patterns were used:
• 4KB block size, 100% random read
• 4KB block size, 100% random writes
• 8KB block size, 100% random read
• 8KB block size, 100% random writes
• Disk queue depth = 32

Latency emulator - WANem
To simulate different network latencies, the WANem (http://wanem.sourceforge.net) Wide Area Network Emulator tool was used. WANem can be used to simulate Wide Area Network conditions, allowing different network characteristics, such as network delay (latency), bandwidth, packet loss, packet corruption, disconnections, packet re-ordering, jitter, etc. to be configured.

The configuration used for the tests is illustrated in fig. 2.

Fig. 2: Stretch cluster testing configuration.

The response times and IOPS remained relatively constant up to latencies of 30ms, after which the random write IOPS rapidly dropped off and write response times began to rise. These results are shown in fig. 3.

Fig. 3: Graph depicting the impact of latency on read and write IOPS and response time when using SvSAN.

NOTE: Using different disk types (SSDs, 15K RPM, SAS, wider disk stripes, etc.) would increase the number of IOPS and/or lower the response times, but would still follow similar profiles.

Additional findings included:

• Random read IOPS and response times were not affected by increased latencies and remained relatively constant at 700 IOPS and 45.6ms throughout the test. The read requests are fulfilled by the local storage and do not need to traverse the network.
• Write IOPS were between 730 and 770 until the RTT reached 30ms, after which the number of IOPS rapidly dropped off. This is because synchronous mirroring requires all writes to be committed to both sides of the mirror before being acknowledged as complete, so every write has to traverse the network.
• The write response time averaged ~42.5ms until the RTT reached 30ms, after which it began to rise. On reaching 100ms of delay it increased exponentially. These response times could lead to unacceptable application performance.

Recommendations
Should you choose to leverage SvSAN's stretch cluster feature, and to ensure data is synchronously replicated, the following recommendations should be followed:

• Although SvSAN synchronous mirroring can tolerate high round trip times, it is recommended that the RTT should not exceed 10ms, equating to 1,000km excluding other factors that affect network latency. Latencies greater than this could lead to performance degradation or unpredictable behavior.
• Bandwidth between SvSAN VSAs should be a minimum of 1Gb/s, although the requirement depends on the amount of data that needs to be synchronously mirrored.
• Ideally there should be a minimum of two separate network connections for redundancy.
• Make certain the path selection policy ensures that data is read from the local side of the mirror, to prevent read requests from traversing the network.

Conclusions
Separating or stretching the nodes within a storage cluster is a very effective way of increasing the resiliency of storage infrastructure. Creating a stretch cluster by placing the nodes in separate racks, rooms, buildings or cities can almost eliminate downtime associated with localized disasters such as fire, flooding or power outages. The versatility of SvSAN's stretch cluster feature enables storage infrastructure to be configured in this highly resilient manner.

The testing conducted for this white paper has shown that SvSAN synchronous mirroring can withstand high latencies, allowing the SvSAN cluster nodes to be separated by up to 1,000km, which provides a huge scope for building resiliency into an organization's storage infrastructure.

Further Reading
Enabling stretch clusters is just one feature of many within SvSAN. Why not explore some of the others, such as Predictive Storage Caching, or Data Encryption? These features and more can be accessed through the extensive collection of white papers on the StorMagic website.

Additional details on SvSAN are available in the Technical Overview, which details SvSAN's capabilities and deployment options.

If you're ready to test SvSAN in your environment, you can do so totally free of charge, with no obligations. Simply download our fully-functioning free trial of SvSAN from the website.

If you still have questions, or you'd like a demo of SvSAN, you can contact the StorMagic team directly by sending an email to sales@stormagic.com

StorMagic
Unit 4, Eastgate Office Centre
Eastgate Road
Bristol BS5 6XX
United Kingdom
+44 (0) 117 952 7396
sales@stormagic.com
www.stormagic.com