
Maintaining Continuity of Operations with a Disaster Tolerance Strategy

IT risks must now be considered as serious as any other significant business risk.
Business white paper

Table of contents
Executive overview
How much business risk can you afford?
The cost of an IT outage
The case for disaster-tolerant systems
Benefits of disaster-tolerant systems
Which services are at greatest risk?
Implementing disaster-tolerant systems: typical use cases
Match your business recovery to your business risk
For more information

Executive overview
Increasingly, business and IT risks are intertwined. In today's global 24x7 economy, organizations are less tolerant of interruptions, and for some of them even a few hours of downtime constitutes a business disaster. Conventional disaster recovery solutions do not work under these circumstances. Instead, many companies seek disaster-tolerant solutions to mitigate the business impact and business costs of such significant events. Companies implement disaster-tolerant solutions to keep business processes running right through a disruption and immediately after it, without any delay. They want continuity of operations. They achieve it with IT solutions that keep IT services running on secondary systems, which assume responsibility for the business if the primary site is interrupted. Not all IT services need disaster tolerance. Executives can assess the business requirements, the cost of downtime for each application, and the business impact of its loss. From that assessment they can determine which applications have critical recovery time objectives (RTO) and recovery point objectives (RPO) near zero, and hence need disaster-tolerant solutions.
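The sketch below gives a rough illustration of that triage. It tags each application by its recovery objectives. The application names, the minute values, and the 5-minute "near zero" threshold are hypothetical assumptions for illustration. They are not figures from this paper.

```python
from dataclasses import dataclass

# Hypothetical threshold: objectives at or below this many minutes count as "near zero".
NEAR_ZERO_MINUTES = 5


@dataclass
class Application:
    name: str
    rto_minutes: float  # recovery time objective: how long the service may be down
    rpo_minutes: float  # recovery point objective: how much recent data may be lost


def protection_tier(app: Application) -> str:
    """Return 'disaster-tolerant' when both RTO and RPO are near zero, else 'conventional DR'."""
    if app.rto_minutes <= NEAR_ZERO_MINUTES and app.rpo_minutes <= NEAR_ZERO_MINUTES:
        return "disaster-tolerant"
    return "conventional DR"


if __name__ == "__main__":
    # Hypothetical portfolio; real values come from a business impact analysis.
    portfolio = [
        Application("order entry", rto_minutes=0, rpo_minutes=0),
        Application("payroll", rto_minutes=24 * 60, rpo_minutes=60),
        Application("patient care orders", rto_minutes=2, rpo_minutes=0),
    ]
    for app in portfolio:
        print(f"{app.name}: {protection_tier(app)}")
```

The sketch requires both objectives to be near zero, mirroring the wording above. An organization could reasonably choose a stricter rule, for example flagging an application when either objective is near zero.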

How much business risk can you afford?


Critical system disruptions are quite common. Very few of them are caused by natural disasters. Many are caused by internal IT failures or by common external events such as loss of power, fire, or flooding. In March 2007, both US Airways and Canada Revenue Agency experienced system disruptions. In June 2008, Amazon.com experienced an unplanned site outage. Direct users of the website faced interruptions, although its international websites and other service sites were not affected. The lesson is clear: system interruptions that significantly impact operations can happen to any organization at any time. You cannot operate for long without access to your critical IT services and applications. As a result, business risks that impact IT directly, and IT risks themselves, must now be considered and treated like any other significant business risk. This demanding environment makes organizations less tolerant of interruptions that were once acceptable. For example, the loss of a customer call center for a few hours would now constitute a disaster for many businesses. IT systems are critical for every organization, not just for large global financial or telecommunications firms. As the industry analyst firm Enterprise Strategy Group (ESG) states: "Companies are becoming increasingly dependent on a global economy. Many have established key technology in follow-the-sun modes that require 24x7 availability." In response, managers are turning to disaster-tolerant systems to mitigate IT business risk when the business impact of site outages and the business costs of downtime are large.

Figure 1. Average cost of downtime. The loss of a critical system for even a few hours can cost thousands, even millions, of dollars.
[Chart: average cost per hour of downtime in USD, on a scale from $1K to $10,000K, for ATM fees, shipping, tele-ticketing, airline, catalog sales, home shopping, pay-per-view (PPV), credit card, and brokerage.]
IT downtime is business downtime. IT downtime can account for up to 10 percent of business costs, and its impact can be devastating.

Source: Contingency Planning Research, Inc.; a division of Eagle Rock Alliance, Ltd.; West Orange, NJ.

The cost of an IT outage


All business interruptions cost money. When a critical system is interrupted, the costs can mount fast. For example, downtime of a catalog e-commerce system could cost up to $100,000 USD an hour in lost sales. If the system processes credit card transactions, lost sales could exceed $1 million USD per hour (see Figure 1). Interruptions like these concerned a large Midwest healthcare provider. It operates a network of almost two dozen healthcare facilities and affiliates, as well as a primary care physician network with several hundred practitioners. One of its centralized systems gives care providers at multiple facilities online access to patient care orders, medication information, dietary needs, and more. Given its tornado-alley location, this system was clearly at risk of disruption. In the event of a tornado, care providers would not be able to maintain the level of care their patients required. Disruptions caused by tornadoes and other events are rare. Even so, management insisted on a disaster-tolerant system to make certain that it could continue to deliver the care its patients counted on.

In the case of the Midwest healthcare organization, the risk was loss of life. For other businesses that do not face life-and-death scenarios, the major risks are no less real and no less critical to the survival of the organization. Loss of revenue due to the inability to process transactions, for example, is often great, but there are other costs to consider as well.

These include disruptions to internal systems that can have significant productivity impacts and costs relating to employees and partners. Performance penalties may be incurred when service interruptions impact service-level commitments. Customers' goodwill may be lost and negative publicity may follow, which can impact brand and corporate reputation. There may be associated liabilities and financial penalties. And, for some organizations, even lives could be at stake. Costs of business interruptions vary by industry and by company, and they also vary by system. If payroll goes down, the organization needs to recover within the cycle of its pay period. However, should key production systems go down, the revenue stream may stop immediately and not resume until the system is restored. The organization itself is not the only one impacted: suppliers, partners, regulators, customers, and other stakeholders may also be affected. Risks to systems such as these take many forms, and managers need to assess and prioritize all of them.

Not all business and IT risks are equal in the likelihood of their taking place or in their impact on the business. Hence, it is imperative to assess risks realistically, prioritizing those with the highest probability of occurrence and those with the highest business impact. Then, you must allocate resources based on what is at stake. By doing so, you can determine which systems need to be made disaster-tolerant and which systems can be protected through traditional recovery solutions.

The case for disaster-tolerant systems


Market research shows that most, but not all, organizations practice some form of disaster recovery (DR). At a minimum, DR employs data protection and recovery solutions, often tape-based. In practice, DR for most organizations is more complicated than that. Most must be concerned not only with how quickly the data can be restored, but also with where they can acquire the IT infrastructure to restore services. They need access not only to the backup data but also to a site (owned or third party) with compatible IT infrastructure on which to resume IT services. DR is necessary and must be part of every organization's plans. Many organizations, however, augment conventional DR for mission-critical services with disaster-tolerant solutions. Disaster-tolerant systems differ from DR in that they allow the organization to continue functioning despite an interruption of its primary systems. This enables continuity of business operations even while the disaster is taking place. They do so by turning over operational responsibilities to systems at the secondary site. These systems provide more than protection against data loss: you need to protect your ability to continue delivering IT services despite and during a disaster or disruption. Disaster-tolerant systems continuously capture and save data from the primary systems for use by a backup system, which continues delivering IT services in the event of a disruption. It does not matter what causes the disruption. It might be a natural disaster; a major catastrophe such as a fire, an earthquake, or a terrorist attack; or a local event such as an extended power loss, an accident, or a human error.

Which organizations need disaster-tolerant systems? In general, organizations need them:
- Where the business costs of downtime are large
- Where the business impact of a site outage is large
- Where the investment in disaster-tolerant solutions is clearly smaller than the perceived business and IT risks

Benefits of disaster-tolerant systems


Disaster-tolerant systems reduce the business risk resulting from application downtime. This translates into financial savings and avoided costs, because significant downtime and data loss are prevented. The decision to opt for disaster-tolerant systems comes down to a straightforward benefit assessment. Multiply the likelihood of a risk taking place by the business cost should that risk occur. Then compare the result with the cost of investing in an appropriate recovery or disaster-tolerant solution. Given the declining costs of IT infrastructure, disaster-tolerant solutions can be justified easily for customers of all sizes and in all industries. Using Figure 2 (Balance risks and costs), managers can identify and prioritize the risks their organizations face. Then, using Figure 1 (Average cost of downtime) as a guide, they can estimate the amount of value at risk should an incident or event result in system or application downtime.

Disaster-tolerant systems produce both direct and indirect benefits. Direct benefits can be translated into hard dollars. For many business-critical systems, they can quickly offset the cost of a disaster-tolerant system. Such benefits include:
- Mitigation of business risks relating to IT system interruption and data loss, keeping applications and data accessible as normal through the interruption
- Reduction of any financial impact from business disruption, enabling the continuity of revenue-generating capabilities
- Maintenance of acceptable levels of productivity
- Maintenance of expected and committed levels of service, for customer business continuity, customer satisfaction, and compliance with service-level obligations
- Maintenance of supply chain continuity and consistency, avoiding disruptions in the supply chain
- Avoidance of legal, regulatory, and contractual compliance exposure
- Continuation of customer experience, to prevent degradation of customer loyalty
- Protection of corporate reputation and brand integrity

The indirect benefits may not be as dramatic, but they too can have long-term business impact, which can also be translated into hard dollars. These benefits include the ability to preserve the brand and company image, to avoid bad press and publicity, and to maintain the confidence of partners, suppliers, and other stakeholders.

Once, only a small handful of global enterprises considered such disaster tolerance important. Now, according to ESG, "Many more organizations of all sizes, in all industries, and located across the globe require applications to be running and data to be always available. The needs of these organizations for production continuity go far beyond simple disaster recovery, requiring an environment that maintains business continuity during and immediately after a disaster. To make it more interesting, the number and types of applications that require this level of protection is very diverse."

Figure 2. Balance risks and costs
[Chart: COST and LOSS curves (money) plotted against time to recover (slow or fast) and data loss (high or low). Regions are labeled "Spend more, lose less" and "Spend less, lose more", a point is labeled "Acceptable downtime", and a level is labeled "Maximum cost of control".]
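The benefit assessment described in this section can be worked through in a few lines: the likelihood of a risk, multiplied by its business cost, compared with the cost of the solution. In this sketch the annual probability, the outage durations, and the annual solution cost are all hypothetical inputs, because the paper does not supply them. The $1 million hourly figure only echoes the order of magnitude that Figure 1 suggests for credit card processing.

```python
def expected_annual_loss(annual_probability: float, outage_hours: float,
                         cost_per_hour: float) -> float:
    """Likelihood of the event times the business cost if it occurs."""
    return annual_probability * outage_hours * cost_per_hour


def net_annual_benefit(annual_probability: float, cost_per_hour: float,
                       hours_with_dr: float, hours_with_dt: float,
                       dt_annual_cost: float) -> float:
    """Expected loss avoided by disaster tolerance, minus its yearly cost.

    A positive result means the investment is smaller than the risk it removes.
    """
    loss_with_dr = expected_annual_loss(annual_probability, hours_with_dr, cost_per_hour)
    loss_with_dt = expected_annual_loss(annual_probability, hours_with_dt, cost_per_hour)
    return (loss_with_dr - loss_with_dt) - dt_annual_cost


if __name__ == "__main__":
    # All inputs are hypothetical: a 5% yearly chance of losing the primary site,
    # $1M per hour of downtime, 48 hours to recover with conventional DR,
    # 6 minutes (0.1 hour) with disaster tolerance, and $500K per year for the solution.
    benefit = net_annual_benefit(0.05, 1_000_000, 48, 0.1, 500_000)
    print(f"Net annual benefit: ${benefit:,.0f}")
```

These assumed inputs give the following figures:
- Expected loss with conventional DR: $2.4 million per year
- Expected loss with disaster tolerance: $5,000 per year
- Net benefit after the solution's cost: about $1.9 million per year

A single-event model like this ignores smaller, more frequent outages. Summing it over several risk scenarios is a natural extension.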

Clearly, businesses are at risk when IT is not operational. According to a 2007 ESG poll, 14 percent of enterprise businesses reported that they cannot tolerate any application downtime. More than 58 percent cannot tolerate even four hours of application downtime. Overall, more than 80 percent of enterprise-class and mid-tier respondents reported that they cannot tolerate more than 24 hours of application unavailability. Interestingly, ESG notes that survey respondents reporting low tolerance for downtime were not just from the financial sector, as might be expected, but were also from government, manufacturing, retail, and healthcare (including pharmaceutical) sectors. ESG looked at the level of tolerance for system downtime in various vertical industry segments and found the following:

Retail: The critical applications that handle point-of-sale data and enable inventory and distribution must always be on. Being able to react quickly to changing conditions can mean the difference between profitability and loss. Online shopping and the customer's experience are also very important to retailers, making downtime unacceptable.

Online commerce: Similarly, B2B and B2C commerce requires 24x7 availability. As online commerce represents a larger proportion of a company's revenue, the need for disaster-tolerant commerce systems increases. System interruptions reduce revenue flow, reduce customer satisfaction, and risk driving customers to competitors who are just a click away.

Healthcare: With the digitization of medical images and patient records, ensuring the availability of these applications and files goes beyond mission-critical. Consider the pervasive use of technology in delivering critical patient care: disaster tolerance can actually be driven by the need to save lives, not just money.

Manufacturing: Competitive pressures drive companies to run as efficiently as possible. In particular, just-in-time (JIT) manufacturing processes that coordinate shipments from suppliers around the world demand 24x7 availability. Interruption of critical applications can throw off the precisely orchestrated timing of order, production, and delivery, with serious bottom-line ramifications.

Which services are at greatest risk?

A number of factors are increasing the business risks relating to system downtime and driving the need for disaster-tolerant systems. Customer-touching services delivered through the call center or website rely completely on the availability of systems and data. If those systems are down or the data is unavailable, customer satisfaction suffers and customers may move to competitors.

Not every application requires disaster-tolerant systems. Managers must assess each application on the basis of the following business cost factors:
- Revenue risk: loss of current or future revenue due to downtime or data loss
- Customer risk: loss of customers or degradation of customer experience during a period of downtime or data loss
- Operational/productivity risk: loss of worker productivity and operational efficiency (automation) during a period of downtime or data loss
- Regulatory/compliance risk: inability to meet regulatory and compliance obligations due to data loss or during a period of downtime
- Legal and contractual risk: inability to meet legal and contractual obligations during a period of downtime
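One way to apply these business cost factors is to estimate each one as a cost per hour of downtime, sum them per application, and rank the results. The factors are revenue, customer, operational/productivity, regulatory/compliance, and legal/contractual risk. The sketch assumes every factor can be expressed in dollars per hour; the application names and figures are invented for illustration. In practice, regulatory or legal exposure is often a one-time penalty rather than an hourly rate.

```python
from typing import Dict, List, Tuple

# The five business cost factors named in the text.
FACTORS = ("revenue", "customer", "productivity", "regulatory", "legal")


def hourly_downtime_cost(estimates: Dict[str, float]) -> float:
    """Sum per-factor estimates (USD per hour of downtime); missing factors count as zero."""
    unknown = set(estimates) - set(FACTORS)
    if unknown:
        raise ValueError(f"unknown factors: {sorted(unknown)}")
    return sum(estimates.get(factor, 0.0) for factor in FACTORS)


def rank_applications(apps: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Return (application, hourly cost) pairs, most expensive downtime first."""
    costs = [(name, hourly_downtime_cost(est)) for name, est in apps.items()]
    return sorted(costs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical estimates in USD per hour of downtime.
    portfolio = {
        "web storefront": {"revenue": 80_000, "customer": 15_000, "legal": 2_000},
        "call center CRM": {"customer": 20_000, "productivity": 8_000},
        "payroll": {"productivity": 1_000, "regulatory": 500},
    }
    for name, cost in rank_applications(portfolio):
        print(f"{name}: ${cost:,.0f} per hour")
```

Applications at the top of such a ranking are the natural candidates for disaster tolerance. Those further down may be adequately served by conventional DR.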

The heightened business risks associated with system interruptions, according to industry analyst ESG, have made business continuity a boardroom-level concern. In many cases, it is the CEO who mandates that the business be fully protected. Even worse than an outage itself is the fallout from negative press, loss of customer confidence, and, for public companies, potential impact on stock prices.

Implementing disaster-tolerant systems: typical use cases


Table 1 summarizes use cases that illustrate the role of disaster-tolerant systems in a business continuity and availability strategy. Developing a risk management plan that includes the right level of disaster tolerance for various applications is not a complicated challenge. You must understand the risks that matter to the business and the business impact (costs) of those risks. These may or may not include regional, city, or site-level risks, or may be limited to risks that only impact a specific data center. Following a risk assessment, a business impact analysis project determines your costs. Given your risks and costs, you can match business continuity and availability solutions to your specific requirements.

Table 1. Disaster-tolerant systems: typical use cases

Large manufacturing company
  Situation: Multiple locations; SAP is the core system; runs a wide array of systems and applications to support other business functions
  Challenge: Global operation with precisely orchestrated 24x7 supply, production, business financials, and business intelligence; need for continuous access to SAP applications and data
  Disaster tolerance strategy: Deploy SAP on primary and secondary disaster-tolerant systems at two locations; capture data on the primary system and copy it concurrently to the secondary system; in the event of failure at the primary site, fail over immediately to the secondary site; implement a conventional DR strategy for other systems and applications

Online commerce company
  Situation: Global B2B and B2C operation; centralized North American data center
  Challenge: Maintain 24x7 e-commerce operations, 365 days a year, for B2B and B2C operations; maintain a 24x7 online support operation for B2B and B2C operations
  Disaster tolerance strategy: Deploy B2B and B2C e-commerce systems on primary and secondary disaster-tolerant systems; establish a remote facility for the secondary data center site; in the event of a failure at the central data center, immediately fail over to the secondary site to enable uninterrupted commerce

Major telecommunications company
  Situation: Maintains telephone call centers on three continents
  Challenge: Enable 24x7 support; balance call center workload across all call centers at peak times; enable one call center to back up another; make sure all customer data is available to all call center agents
  Disaster tolerance strategy: Deploy call center systems and master customer data at all three locations on disaster-tolerant systems; continuously replicate data between systems; monitor and rebalance the workload in the event any call center is overloaded or down

Major healthcare provider with centralized data center
  Situation: Delivers urgent and comprehensive healthcare services; supports a large network of physician practices and health centers
  Challenge: Make sure all systems that impact the delivery of urgent, critical care do not go down; reduce the duration of interruptions for non-urgent and non-critical applications and systems
  Disaster tolerance strategy: Establish a secondary data center to provide an alternative to the primary data center; deploy all urgent and critical applications on disaster-tolerant systems to reduce downtime; for other systems, deploy DR systems with fast recovery point and recovery time objectives to reduce disruptions
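Several strategies in Table 1 rest on the same pattern. Each write is captured on the primary system and copied to the secondary, and service fails over when the primary site is lost. The toy in-memory sketch below shows that pattern. The class and method names are hypothetical. Real disaster-tolerant products replicate at the storage, database, or cluster layer, and they add network links, quorum, and split-brain protection that this sketch omits.

```python
from typing import Dict, Optional


class Site:
    """A hypothetical data center holding a simple in-memory key-value store."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}
        self.up = True

    def apply(self, key: str, value: str) -> None:
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        self.data[key] = value


class ReplicatedService:
    """Mirror every write to a standby site and fail over when the active site is lost."""

    def __init__(self, primary: Site, secondary: Site) -> None:
        self.active = primary
        self.standby: Optional[Site] = secondary

    def fail_over(self) -> None:
        """Promote the standby; writes were already mirrored, so acknowledged data survives."""
        standby = self.standby
        if standby is None or not standby.up:
            raise RuntimeError("no healthy standby site")
        self.active = standby
        self.standby = None

    def write(self, key: str, value: str) -> None:
        if not self.active.up:
            self.fail_over()
        self.active.apply(key, value)
        # Synchronous copy: the write returns only after the standby also holds it.
        # If the standby is down, writes continue unprotected (degraded mode).
        if self.standby is not None and self.standby.up:
            self.standby.apply(key, value)

    def read(self, key: str) -> str:
        if not self.active.up:
            self.fail_over()
        return self.active.data[key]


if __name__ == "__main__":
    primary, secondary = Site("primary"), Site("secondary")
    service = ReplicatedService(primary, secondary)
    service.write("order-1001", "paid")
    primary.up = False  # simulate the sudden loss of the primary site
    print(service.read("order-1001"), service.active.name)  # prints: paid secondary
```

Copying before acknowledging the write keeps the recovery point at zero for acknowledged data. The trade-off is write latency, since every write waits for the secondary site. This is why synchronous replication is usually limited to sites within a bounded distance.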

Match your business recovery to your business risk


When the business risk is great (when seconds, minutes, and hours count), conventional DR systems are simply not fast enough. In the time it takes to recover, revenue and customers can be lost, critical supply chains disrupted, contractual and regulatory obligations jeopardized, and even lives put at risk. Therefore, some business risks require foolproof systems. These systems can almost instantly transfer services to the secondary site without missing a beat, even if the primary systems are disrupted instantaneously and without warning. This was exactly the disaster tolerance scenario HP recently set up in a test observed by ESG. In this test, HP engineered a true disaster by physically blowing up a primary data center (simulating a natural gas explosion). The explosion interrupted operations instantly, a worst-case scenario for IT operations. Yet the HP disaster-tolerant solution worked flawlessly, without loss of data or services. For more information about this disaster simulation, go to
www.hp.com/go/disasterproof

Business risks of system downtime:
- Lost revenue
- Unanticipated costs
- Lost customers
- Reduced levels of customer satisfaction
- Negative impact on brand and reputation
- Reduced employee productivity
- Penalty clauses in customer agreements
- Exposure to legal risks
- Going out of business
- Potential threat to human health and safety

To determine your needs and requirements, call HP and let us work with you to conduct a business risk and business impact analysis. Or, if you already know you need more protection than you currently have, we can help make your current environment more resilient, and even disaster-tolerant if that is what you need. We can help you align your IT environment to your business needs, today and into the future.

For more information


To learn more, please visit
www.hp.com/go/continuityandavailability

Not every application needs this level of disaster tolerance, but yours might. When did you last truly assess your business risks and the business impact of those risk events? Did you then match your current recovery time against those costs to determine whether your business was adequately protected? You might be surprised, and you might discover a need to build disaster tolerance into your plans.


Get connected
www.hp.com/go/getconnected

Get the insider view on tech trends, alerts, and HP solutions for better business outcomes

Copyright 2007, 2009-2010 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein. 4AA1-6439ENW, Created November 2007; Updated July 2010, Rev. 2
