
Prepare disaster recovery and contingency plans
Introduction
A business is made up of many different systems: marketing, accounting, logistics,
manufacturing and so on. Information systems are an important part of any business. They are the
glue that interconnects all the other systems.

Business continuity management is a holistic management process that identifies potential
threats to an organisation’s systems and ensures the provision of processes and procedures to
minimise their impact on business continuance.

A Disaster Recovery Plan (DRP) is an important part of business continuity management. It is an
action plan that charts the procedures for recovering critical business functions after a
catastrophe, that is, an event resulting in great loss or misfortune.

The goal of a DRP is to recover critical business systems as soon as possible to a minimal
functioning order.

Unit topics
The topics for this unit are as follows:

Evaluate the impact of critical systems on business continuity

In this topic you will learn to gather information about the different information systems in a
business. In particular you will focus on how to:

• determine the critical systems of a business
• gather information about the critical systems
• rank the systems in terms of importance and impact on the business.

Evaluate threats to the system

In this topic you will learn how to identify and classify different threats to the system. You will
also learn about the strategies available to minimise the impact of the threats.

Formulate a prevention and recovery strategy

In this topic we look at which recovery and preventive measures are available and how to
document and gain approval for a disaster prevention and recovery strategy.

Develop disaster recovery plan

In this topic you will look at how an agreed disaster prevention and recovery strategy can be
translated into detailed processes, procedures and resources.

Evaluate the impact of critical systems on business continuity

Overview
When creating a disaster recovery plan, business impact statement and business continuity plan,
the first step is to understand which parts of a business are critical for operation: that is, which
systems, including processes, infrastructure and operating data, are critical to what the
business does. To understand what is critical, information is gathered from many different parts
of the business. The material gathered contains information about technology infrastructure
including:

• software
• hardware
• data
• network
• facilities.

The information is then analysed to determine what must be available for the business to
continue working.

In this topic we collect information about the different information systems in a business. In
particular we focus on how to:

• determine the critical systems of a business
• gather information about the critical systems
• rank the systems in terms of importance and impact on the business.

What is a critical business system?

A system is critical for a commercial organisation if its failure results directly or indirectly in
loss of life (for example, an air traffic control system) and/or major financial loss. When
developing a disaster recovery plan (DRP) it is essential to identify critical systems and ensure
they are restored as soon as possible.

Each critical system has a maximum allowable downtime beyond which its loss will severely
impact the business. The shorter the period of time before losses start to occur, the more critical
the system is. The size of the financial loss, relative to the financial worth of the business, is also
significant. The greater the financial loss in percentage terms, the more critical the system is.
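The second indicator, loss relative to the worth of the business, can be shown with a short calculation. This is a minimal sketch rather than a prescribed method. The function name `loss_percentage` and the dollar figures are hypothetical, chosen only to show how the same loss weighs very differently on businesses of different sizes.

```python
def loss_percentage(estimated_loss: float, business_worth: float) -> float:
    """Express an estimated financial loss as a percentage of the business's worth."""
    if business_worth <= 0:
        raise ValueError("business_worth must be positive")
    return 100 * estimated_loss / business_worth


# Hypothetical figures: the same $50,000 outage loss for two businesses.
large_business = loss_percentage(50_000, 10_000_000)  # 0.5 (%)
small_business = loss_percentage(50_000, 200_000)     # 25.0 (%)
print(f"Large business: {large_business}% of worth")
print(f"Small business: {small_business}% of worth")
```

By this measure, a system whose failure costs $50,000 would be far more critical for the smaller business.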

Identifying critical systems and data


Many organisations will have identified which systems their business relies on. Nevertheless, it
is important to formalise the identification of these systems and put in place appropriate recovery
procedures.

Ideally, the business case or proposal for each new system should identify its importance and a
risk analysis should be undertaken early in the project. This information may already be
available in the project documentation, in which case you would review this material and
identify the risk issues that have been raised.

In the absence of this information, you may need to survey the organisation’s business areas or
conduct workshops where managers can consider the critical nature of their systems.

During this process, each system should be considered as a whole. All the parts that make up the
system must be carefully documented. Only then can it be determined what part of the system is
critical.

You will need to collect information about how the system uses:

• software
• hardware
• networks (voice and data)
• data
• facilities (chairs, tables, projectors, etc.).

Software in the form of standard packages is used to access data. It can be readily replaced.

Data may have been gathered over many years and is unique and irreplaceable.

Hardware is needed to run the software and access the data. Software requires a minimum
hardware platform to work properly.

Networks provide the basis for distributing data.

Facilities such as chairs, tables, telephones and paper-based forms complete the system.

Because a system can be more critical at some times than at others, the maximum allowable downtime can
vary depending on the time of day, week, month or year a disaster happens. For example, many
businesses work to a monthly accounting cycle: losing their financial system at the end of the
month would have a greater impact than in the middle of the month.
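A minimal sketch of this idea follows. It assumes a hypothetical financial system whose maximum allowable downtime shrinks during the last three days of each month, while the accounting cycle closes. The 4-hour and 24-hour limits and the three-day window are invented for illustration. Only `calendar.monthrange` and `datetime.date` are standard Python.

```python
import calendar
from datetime import date

# Hypothetical limits for a financial system (illustrative values only).
MONTH_END_WINDOW_DAYS = 3
MONTH_END_LIMIT_HOURS = 4
NORMAL_LIMIT_HOURS = 24


def max_allowable_downtime_hours(day: date) -> int:
    """Return the assumed maximum allowable downtime for the given calendar day."""
    days_in_month = calendar.monthrange(day.year, day.month)[1]
    if day.day > days_in_month - MONTH_END_WINDOW_DAYS:
        return MONTH_END_LIMIT_HOURS  # month-end processing: losses start quickly
    return NORMAL_LIMIT_HOURS


print(max_allowable_downtime_hours(date(2024, 3, 15)))  # 24
print(max_allowable_downtime_hours(date(2024, 3, 30)))  # 4
```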

An example of critical assessment

Consider the critical systems on your personal computer at home. Assess whether the following
situations make your systems critical or not.

1. You are working late on a 50-page assignment that must be handed in by 9:30am the next day,
otherwise you will fail the course.
2. You are using the Internet to book a holiday you intend taking in three months’ time.
3. You have developed a spreadsheet to calculate your tax return.
4. You have created a database of CDs, records, tapes and videos which you will need to show your
insurance company if the collection is destroyed or stolen.
5. You have saved several versions of your favourite computer game.

You may have come up with something like this:

Table 1: Levels of critical systems

Item | Critical assessment
1 | Critical until 9:30am and then not critical
2 | Not critical
3 | Critical when completing tax return
4 | Critical if event occurs
5 | Not critical

Activity

To practise identifying critical systems, complete Activity 1 in the Activities section of the Topic
menu.

Critical systems/data assessment forms

Before starting work on the DRP all critical systems must be identified and documented. Users
and management complete critical systems/data assessment forms with the guidance of IT staff.
Once completed, they form an integral part of the system documentation.

The following are examples of the forms used:

Form 1: Review software used


Use this form to identify the software that is most frequently used. Frequency may or may not
indicate the software is critical. For example, many users may use a word processor every day
but this may not be critical to the organisation. Further analysis is required.

Form 2: Reviewing data used or created by the system


Complete this form for each system that is used constantly or frequently, for example, an email
or e-commerce system. You can use it to identify how important the data is, which data items are
easy to recover and which are not.

The form identifies the types of data activity carried out and where the source data originates.
The difficulty of restoring the data and its impact on the organisation are then rated as a
percentage.

Let’s say, for example, we need to assess an email system. The percentage level of criticality to
the organisation is indicated with examples and an explanation of how this level was arrived at.

Table 2: Example of completed Form 2 (email system)

The form’s columns record what the data is used for: to update corporate data files, create own data files, create shared documents, create own temporary documents or create own longer term documents. The ratings entered for the email system are:

Data source | Criticality | Example and explanation
From source documents | 10% | Software program: not critical because source documents are replaceable from other areas.
From other data files | 10% | Email addresses stored on the server: not critical because they can be recovered elsewhere.
From irrecoverable sources such as telephone calls | 10% | Diary and calendar: not critical for the running of a business, even though the data can’t be recovered.
Developed at the workstation, such as report writing | 60% | Sent emails and attachments: critical for the organisation because if email crashes the business suffers.
Developed at the workstation, such as report writing | 5% | Meeting room bookings in a shared inbox: not critical for running the business, even though the data can’t be recovered.
Developed at the workstation, such as report writing | 5% | Received emails and attachments stored in temporary files: can’t be replaced, but not critical because emails and attachments can be resent.
Other – specify | – | –

Note how most of the data files for the email system are developed and created at the
workstation. The loss of these files has a high impact on the individual but not on the business as
a whole.
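The ratings in Table 2 can be tallied to confirm where the criticality lies. This is a minimal sketch. The figures are those in Table 2, but recording the form as simple (source, percentage, example) tuples is an assumption made for illustration, not part of the form itself.

```python
# Table 2 ratings for the email system: (data source, criticality %, example).
form2_ratings = [
    ("From source documents", 10, "software program"),
    ("From other data files", 10, "email addresses stored on server"),
    ("From irrecoverable sources", 10, "diary and calendar"),
    ("Developed at the workstation", 60, "sent emails and attachments"),
    ("Developed at the workstation", 5, "meeting room bookings in shared inbox"),
    ("Developed at the workstation", 5, "received emails and attachments"),
]


def totals_by_source(ratings):
    """Sum the criticality percentages recorded for each data source."""
    totals = {}
    for source, percent, _example in ratings:
        totals[source] = totals.get(source, 0) + percent
    return totals


totals = totals_by_source(form2_ratings)
print(totals["Developed at the workstation"])  # 70
print(sum(totals.values()))                    # 100
```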

The following tables describe the significance of the loss of source files in relation to the purpose
for which they are used.

Table 3: Difficulty in recovering lost source files

Data sourced | Issues
From source documents | Recovery could be from source documents if they are kept.
From other data files | Recovery could be from other data files if they are backed up.
From irrecoverable sources such as telephone calls | Recovery impossible unless regular backups of files are made and stored externally.
Developed at the workstation, such as report writing | Recovery impossible unless regular backups of files are made and stored externally.
Other – specify | Determine how easy it is to get back to a source or original.

Table 4: Impact of loss of source files on business

Data used to | Issues
Update corporate data files | Important data used by many and may be critical.
Create own data files | May be critical data but restricted impact and short life.
Create shared documents | May be critical data but restricted impact.
Create own temporary documents | Unlikely to be critical.
Create own longer term documents | May be critical data but restricted impact; may be required again.

Form 3: Resource Requirements


Complete this form for each system. It helps identify what equipment is needed to run each
system.

Form 4: Analysing Critical Areas


Complete this form to identify the impact of system failure in a number of different areas. The
answers ‘very costly’, ‘serious’ and ‘little or no effect’ indicate the size of the financial loss and
thus the magnitude of the impact on the business.

The form should be completed for different time periods to show what the impact of system
failure would be in minutes and hours for time-sensitive critical systems and hours and days for
others.
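Once the form has been completed for several time periods, one useful summary for each area is the first period at which its rating reaches a given level. This is a minimal sketch. It assumes the Form 4 answers for one area are recorded as (period, answer) pairs in time order. The function name and the cash-flow answers are hypothetical.

```python
# Form 4 answers ordered from least to most severe.
SEVERITY = ["little or no effect", "serious", "very costly"]


def first_period_reaching(ratings, level):
    """Return the first period whose answer is at least as severe as `level`, or None.

    `ratings` is a list of (period_label, answer) pairs in increasing time order.
    """
    threshold = SEVERITY.index(level)
    for period, answer in ratings:
        if SEVERITY.index(answer) >= threshold:
            return period
    return None


# Hypothetical 'impact on cash flow' answers for an order-processing system.
cash_flow = [("1 hour", "little or no effect"), ("4 hours", "serious"),
             ("1 day", "very costly"), ("1 week", "very costly")]
print(first_period_reaching(cash_flow, "serious"))      # 4 hours
print(first_period_reaching(cash_flow, "very costly"))  # 1 day
```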

The following table describes the critical areas in Form 4.

Table 5: Description of critical areas


Area | Issue
Impact on cash flow | Businesses must be able to pay their debts and to obtain income. Is the system critical to the cash flow?
Impact on profitability | If sales are lost or expenses incurred, it begins to bite into the ‘bottom line’.
Impact on customer or supplier relations | Customers may put up with delayed shipment of goods once, but next time they may go elsewhere.
Impact on legal requirements | Are there contracts or statutory obligations that may incur penalties if missed?
Impact on staff or morale | If systems are regularly down or inaccurate, staff may be harassed by customers or have to undertake extra work to sort out problems.

Note that all these areas can eventually have an impact on profit so the user should identify the
primary area of impact.

In addition to the forms, a number of related questions should be answered.

Question 4d

This is a different approach to identifying the user’s dependence on the system. The question would
be asked for all major systems.

Question 4e

Problems tend to occur at the worst possible time. Payroll programs may only be critical once a month,
when the payroll is calculated, and that is likely to be when they fail. You should plan to handle
the worst-case scenario.

Question 4f

This is a different approach to identifying the user’s dependence on any system.

Activity

To practise analysing critical areas, go to Activity 2 in the Activities section of the Topic menu.

Ranking of critical systems

Once one or more critical systems have been identified, they need to be ranked in order of importance
and impact on the organisation. It is unlikely that you will have the time to implement DRP
procedures for all systems, so you should initially concentrate on the most important.

Form 5: Ranking of critical systems


Complete this form when ranking critical systems.
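A minimal sketch of one possible ranking follows. It assumes systems are ordered first by how quickly losses begin and then by the estimated loss per day, which is one reasonable reading of the criteria above. Form 5 may weigh other factors. The systems, figures and the `rank_systems` function are hypothetical.

```python
from typing import NamedTuple


class CriticalSystem(NamedTuple):
    name: str
    max_downtime_hours: float  # time before losses become serious (e.g. from Form 4)
    loss_per_day: float        # estimated financial loss per day of outage


def rank_systems(systems):
    """Most critical first: shortest allowable downtime, then largest daily loss."""
    return sorted(systems, key=lambda s: (s.max_downtime_hours, -s.loss_per_day))


# Hypothetical systems and estimates for illustration only.
systems = [
    CriticalSystem("payroll", 72, 5_000),
    CriticalSystem("point of sale", 1, 40_000),
    CriticalSystem("dispatch", 8, 15_000),
    CriticalSystem("email", 8, 2_000),
]
for rank, system in enumerate(rank_systems(systems), start=1):
    print(rank, system.name)
```

Ties on downtime (dispatch and email here) are broken by the larger daily loss. An organisation could instead weight the two factors into a single score.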

Activity

To practise ranking the critical systems, go to Activity 3 in the Activities section of the Topic
menu.

Impact of system failure

When undertaking risk analysis and disaster planning, it is usual to focus on critical systems,
software and data. The very definition of a critical system is that the business depends upon it
and would be severely impacted if the system were not available.

Forms 1 to 4 assist in analysing how long the business can cope after a loss.

Many organisations, such as banks, stock exchanges and automated factories, cannot manage
more than a few minutes without their systems. Imagine the state of the rail system or air traffic
control without the use of their computers. Even the local supermarket would suffer loss if the
tills went down for several minutes.

When assessing the impact on a business it is usual to consider the financial impact. Profits will
suffer if customers cannot trade with the company. If an e-commerce website is down, for
example, customers may turn to competitors.

There may also be an impact on cash flow. Not so long ago, a bank had to borrow millions of
dollars overnight to cover its needs when its computers went down.

If systems are regularly down or slow, customers may eventually go elsewhere. If faulty
systems delay payments, suppliers may stop delivering essential goods and services.

Statutory and business requirements


Statutory and commercial requirements must be considered when assessing the impact of a
system failure. The Act governing the Australian financial industry promotes financial
soundness, stability and appropriate risk management.

Corporate regulation

Business continuity management (BCM) and DRP form part of the core principles of the
International Standards on Prudential Regulation. The Australian Prudential Regulation
Authority (APRA) regulates the Australian financial industry, overseeing banks, general and life
insurers and most members of the superannuation industry.

Financial institutions are subject to auditing by APRA, including on-site visits. APRA
determines whether the business has an adequate and up-to-date DRP in place and whether the
testing program is sound. Any irregularities are noted in an audit report and a formal notice is
sent to the business. If the business fails to rectify problems it can be fined or even suspended
from trade.

Organisations trading in the USA are subject to recently enacted legislation (the Sarbanes-Oxley Act),
which has considerably tightened their operating requirements. Failure to comply can result in
heavy fines.

Managing business continuity

The need to identify which critical systems rely on outside services or resources is paramount in
managing business continuity. Once critical systems are identified, it is necessary to state in the
Service Level Agreement (SLA) with the supplier how business outages will be handled. For
example, the SLA may require a supplier to store excess stock at an offsite storage area or
arrange for a competitor to handle supply until business resumes.

Take, for example, a car manufacturer that purchases components for steering wheels. If the
component supplier is unable to fulfil orders due to a disaster, the car manufacturer must
stop production and lose millions of dollars. To reduce the risk of this happening, the car
manufacturer stipulates in its SLA with the component supplier that there will be a penalty of
$100,000 per day for non-supply of components. The component supplier is forced to have a
Disaster Recovery Plan to ensure production resumes as fast as possible, or risk being
penalised or even financial collapse.

Evaluate threats to the system
Overview
Have you ever done something risky like skydiving, snowboarding or even walking
down the street? You may think walking down the street is not risky. Yet if it is raining there is a
chance of slipping over, or there may be a large angry dog nearby that would like nothing
better than to bite you. To guard against these threats you could wear non-slip shoes or cross the road,
keeping away from the dog. Understanding the different threats to oneself is an important
part of staying alive. The same can be said for business. A business needs to understand the
different types of threats before it can prepare against them and so remain a
viable business.

In this section you will learn how to:

• identify different threats to the system
• examine how threats are classified
• select strategies available to minimise the impact of threats.

You should have already evaluated the impact of critical systems on business continuity.
Evaluating the impact of critical systems on business continuity requires you to:

• determine the critical systems of a business
• gather information about the critical systems
• rank the systems in terms of importance and impact on the business.

Risk analysis
Risk analysis is an analytical process undertaken to evaluate system assets and examine their
susceptibility to threats. Through this process we evaluate the possible commercial losses that
may result from the loss of these assets.

Figure 1 Risk Analysis

In order to undertake a risk analysis you must:

• identify which system assets are included in the analysis
• identify threats to the system
• consider the probability of the event occurring
• estimate the possible loss that could occur
• consider safeguards to prevent or recover from the event
• carry out a cost-benefit analysis of loss versus the cost of the safeguard
• implement safeguards and a recovery plan.
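The steps above can be captured in a simple worksheet, with one row per asset and threat. This is a minimal sketch. The `RiskEntry` class, the figures and the simple "implement or review" rule are hypothetical. The sketch also assumes each safeguard removes the whole expected loss, which real safeguards rarely do.

```python
from dataclasses import dataclass


@dataclass
class RiskEntry:
    """One row of a hypothetical risk analysis worksheet."""
    asset: str
    threat: str
    occurrences_per_year: float   # estimated probability/rate of the event
    single_incident_cost: float   # estimated loss each time it occurs
    safeguard: str
    safeguard_cost_per_year: float

    def expected_annual_loss(self) -> float:
        return self.single_incident_cost * self.occurrences_per_year

    def safeguard_justified(self) -> bool:
        # Assumes the safeguard removes the whole loss; a partial safeguard
        # would need its own estimate of the loss it avoids.
        return self.safeguard_cost_per_year < self.expected_annual_loss()


worksheet = [
    RiskEntry("file server", "disk failure", 0.2, 20_000, "mirrored disks (RAID)", 1_500),
    RiskEntry("head office", "fire", 0.01, 500_000, "hot site", 60_000),
]
for entry in worksheet:
    decision = "implement" if entry.safeguard_justified() else "review"
    print(f"{entry.asset}/{entry.threat}: expected loss ${entry.expected_annual_loss():,.0f} "
          f"per year, safeguard '{entry.safeguard}' -> {decision}")
```

A "review" result does not rule a safeguard out. A hot site, for example, may still be worthwhile because it covers many threats at once.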

Why do we carry out a risk analysis?

The basic purpose of a risk analysis is to identify preventive and recovery options for assets.
Think about assets of your own which you would take steps to protect from loss. For example, if
you own a car, you might install an alarm and immobiliser to deter theft. In the same way, a
company will also take precautions with its assets.

Computer systems (including hardware, software and data) are valuable assets of an
organisation. It is therefore very important that a risk analysis be undertaken to identify and
safeguard these systems. A major part of risk analysis is identifying the impact of systems on
business continuity. ‘Mission critical’ systems require the greatest level of protection.

The loss of IT systems could have a major impact on many businesses. Many would come to a
standstill in minutes without their critical business systems. Even a small company could get into
financial difficulties if it lost its accounting data and did not know who owed it money.

An organisation undertakes an IT risk analysis to identify:

• how dependent it is on IT systems
• what could go wrong with these systems
• what system assets they might lose
• what can be done about it.

Identify system threats

IT systems can comprise many parts including:

• hardware
• software
• networks
• data
• technical skills
• projects.

There are many ways to categorise threats. One way is to consider whether the source of the
threat is internal or external.

Internal threats

Internal threats mainly result from actions by users and/or IT staff. These can include:

• viruses corrupt or delete data*. Users can unknowingly transfer viruses to the corporate network via mobile devices such as personal digital assistants (PDAs) or laptop computers. For example, a user might buy a new laptop and connect it to the Internet at home to check for updates, unaware that a virus has been downloaded onto the computer. The next day the user takes the laptop to work and connects it to the corporate network. The virus then spreads throughout the network, deleting important data. Had it arrived from the Internet, the virus would normally have been stopped by the corporate firewall.
• the wrong disk is formatted, destroying data and software. Mistakes are easily made when formatting a hard disk using the command line. For example, a person on work experience could accidentally format the wrong hard disk drive by entering a wrong command.
• sabotage. Data and software* are intentionally destroyed or corrupted.
• data and software files are deleted. Deleting data can be accidental or intentional. For example, a person could accidentally press the delete key when moving data, or intentionally delete data by exploiting known software vulnerabilities.
• a password is forgotten, so data or software cannot be accessed. For example, a retrenched employee deliberately doesn’t update a password list.
• input errors cause data to be corrupted.* If operators input incorrect, duplicated or unauthorised transactions, the data very quickly becomes corrupted or inaccurate. How many stories have you heard about computers sending out a bill for millions of dollars to an old age pensioner, or cheques for two cents?
• processing errors cause data to be corrupted.* Poorly designed software changes data incorrectly.
• hardware failure occurs, so data and software are not available. Hardware and networking equipment is usually rated with a mean time to failure (the expected operating time before it fails) and a mean time to repair (the average time taken to repair it). Preventive maintenance can prolong the period before failure.
• fraud. Data is corrupted in order to steal assets.*
• poor testing. Bugs are left in software, so errors or delays occur.*
• incorrect processes or calculations occur in programs, so errors or delays occur.*
• copyright and licence agreements are broken, which leads to the company being sued by the copyright owner or licence provider.
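Input errors such as incorrect, duplicated or unauthorised transactions are typically caught by checks that run before a transaction updates the data. This is a minimal sketch of such checks. The authorised-operator list, the duplicate-ID rule, the $10,000 upper limit and the `validate` function are all hypothetical. A real organisation would set its own controls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    txn_id: str
    operator: str
    amount_cents: int  # money held as whole cents to avoid rounding errors


# Hypothetical controls: who may enter transactions and a plausible amount range.
AUTHORISED_OPERATORS = {"op01", "op02"}
MAX_AMOUNT_CENTS = 1_000_000  # $10,000, an illustrative upper limit


def validate(batch):
    """Split a batch into accepted transactions and (transaction, reason) rejections."""
    accepted, rejected, seen_ids = [], [], set()
    for txn in batch:
        if txn.txn_id in seen_ids:
            reason = "duplicate transaction"
        elif txn.operator not in AUTHORISED_OPERATORS:
            reason = "unauthorised operator"
        elif not 0 < txn.amount_cents <= MAX_AMOUNT_CENTS:
            reason = "amount out of range"
        else:
            reason = None
        seen_ids.add(txn.txn_id)
        if reason is None:
            accepted.append(txn)
        else:
            rejected.append((txn, reason))
    return accepted, rejected


batch = [
    Transaction("T1", "op01", 5_000),
    Transaction("T1", "op01", 5_000),        # entered twice
    Transaction("T2", "op99", 1_000),        # unknown operator
    Transaction("T3", "op02", 300_000_000),  # a $3 million bill
]
accepted, rejected = validate(batch)
for txn, reason in rejected:
    print(txn.txn_id, reason)
```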

External threats

External threats can include:

• theft of data and loss of confidential information, especially customer details transmitted over the Internet or wide area network connections*
• breakdowns of Internet or wide area network connections, or failure of critical systems hardware
• fire or earthquake, which renders the system inaccessible
• flooding, which renders the system inaccessible. Water from sprinklers or sewer lines can cause flooding of offices.
• hackers corrupt or steal data*. A discontented customer or ex-employee may decide to post customers’ credit card details to the Internet.
• power problems make the system inaccessible. Power spikes or outages can disrupt critical systems.
• ‘buggy’ software from a package vendor may cause errors in data or delays.

The more serious external threats are likely to have an impact on the hardware and networks on
which the system runs.

Threats marked with an asterisk (*) above may already have been identified by a security audit or
analysis. The organisation’s internal or external auditors may have already performed such an
analysis, providing you with a useful source of information. An example of an audit report is
available at the following link:

http://www.anao.gov.au/WebSite.nsf/Publications/4A256AE90015F69BCA256A6900112E38

Example of system threats

Consider the Urban Homeware Company, which has 10 stores located across the state. The
company headquarters are located in the capital city. The company has identified its point-of-sale (POS) and
dispatch systems as ‘critical systems’. What threats can be identified for these systems?

Internal threats

• viruses deleting important data. Viruses can spread to stores via the dial-up connection to company headquarters. POS terminals are not connected to the Internet but are still susceptible to virus attacks when employees transfer data from CDs or floppy disks.
• hardware failure. Computer servers or networking equipment fail, causing loss or inaccessibility of data.
• deleting or changing data. Accidental deleting or changing of data by employees or software programs.
• input errors. Mistakes by POS operators.

External threats

• theft of data. Corporate espionage by competitors or by a hacker.
• breakdown of telephone connections. Inability to transfer data to head office.
• fire, earthquake, flood or windstorm. Disruption to facilities or the supply chain.
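Recording threats in a simple register makes the internal/external classification explicit. The register can later be extended with likelihood and impact columns for the risk analysis. This is a minimal sketch using the Urban Homeware threats listed above. The tuple layout and the `group_by_source` helper are assumptions for illustration, not a standard format.

```python
from collections import defaultdict

# Threats identified for Urban Homeware's POS and dispatch systems, tagged by source.
threat_register = [
    ("viruses", "internal"),
    ("hardware failure", "internal"),
    ("deleting or changing data", "internal"),
    ("input errors", "internal"),
    ("theft of data", "external"),
    ("breakdown of telephone connections", "external"),
    ("fire, earthquake, flood or windstorm", "external"),
]


def group_by_source(register):
    """Group threat names under their source (internal or external)."""
    groups = defaultdict(list)
    for threat, source in register:
        groups[source].append(threat)
    return dict(groups)


for source, threats in group_by_source(threat_register).items():
    print(f"{source} ({len(threats)}): {'; '.join(threats)}")
```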

Activity

To practise identifying threats to the system, go to Activity 1 and Activity 2 in the Activities
section of the Topic menu.



Formulate a prevention and recovery strategy
Overview
How would you feel if you lost all the digital photos you had ever taken because a virus infected your
computer? Your answer is probably influenced by when you last backed up your photos. If you
made a backup the week before it would be quite easy to replace the photos, so it is just a matter
of being inconvenienced. If you never backed up your photos you would be devastated as they
would be gone forever. To avoid this devastating loss, it is a good idea to develop prevention and
recovery strategies.

As individuals we can choose whether to develop these prevention and recovery strategies, but
for organisations they are essential to avoid serious loss of revenue or even the collapse of the
company.

In this topic we look at the recovery and preventive measures available for the different
identified threats to organisations and how to formulate and gain approval for a disaster
prevention and recovery strategy.

You should have already evaluated threats to the system. Evaluating threats to the system
requires:

• identifying different threats to the system
• examining how threats are classified
• selecting strategies available to minimise the impact of threats.

Strategies for dealing with risk
There are two main strategies for dealing with risk (apart from ignoring it in the hope it will go
away): prevent or recover. Both options have the objective of minimising the impact of the risk
event.

Prevention

With prevention you attempt to decrease the probability (maybe even to 0) of the event occurring
or causing damage. Many events can never be totally eliminated but their impact may be
minimised.

For example, an extensive sprinkler system will ensure that any outbreak of fire does minimal
damage. It is almost impossible to totally prevent a fire from occurring in the first place but this
is still considered a preventative action. This type of activity may also be termed risk
minimisation.

Recovery

Recovery procedures are put in place to ensure that the system can be quickly restored after the
event occurs. For example, the use of a hot-site (one that has a computer system already set up
and ready to use) allows for speedy recovery after a fire has gutted the building. This process
may also be termed a contingency. In fact DRP is sometimes referred to as contingency
planning.

Recovery and prevention options


The recovery or prevention option chosen will vary depending upon the threat being analysed.
Some of the more common options are listed in the following table.

Table 1: Recovery and prevention options

Purpose | Option | Type
To recover data/software when it has been destroyed or corrupted | Backup and recovery | Recovery
To minimise the impact of software bugs and errors | Testing | Prevention
To stop unauthorised access and data theft or destruction | User and resource security | Prevention
To stop errors in the data | System controls | Prevention
To minimise the impact of a major disaster at the main site | Hot sites (one option among many) | Recovery
To stop unauthorised access to data | Encryption, password control | Prevention
To stop virus attacks | Virus checking software | Prevention
To minimise user errors | User training | Prevention
To stop software being copied and licence agreements being broken | Software keys | Prevention
To allow access to data to continue even if a disk fails | Mirrored disks or RAID (Redundant Array of Inexpensive Disks) systems, clustered systems | Prevention
To stop unauthorised access to data and data destruction | Access rights | Prevention
To minimise the impact of power loss, spikes and surges | Uninterruptible power supplies (UPS), standby generator | Prevention

Cost of recovery and prevention options


As you can see, there are many options available to prevent risk from occurring. Some of these
are based on policies or standards and may involve no additional cost. However, some options,
such as a hot site, can be very expensive.

When deciding which options to adopt, you need to weigh the possible cost of the risk event
(the single incident cost) against the cost of the recovery or prevention option. A simple formula
can be used to estimate the expected loss from a threat. This indicates how much money it is
worth allocating to a recovery or prevention measure for an asset of known value.

Loss = Single Incident Cost × Rate of Threat Occurrence

The loss of critical systems can cost major organisations, such as banks, large sums of money.
They are therefore willing to invest in backup sites to keep their systems running in the event of
a major disaster. Their numerous branches and offices provide locations in which they can site
the backup equipment.

While a typical small business can still suffer a relatively large loss in the case of critical system
failure, it will probably not choose to create a backup site because of the high cost.

Example of prevention and recovery options

Let us consider the case for installing a power surge protector in an average home. Suppose there
is a power surge while you are operating your computer. It could be seriously damaged or, at the
very least, you would be faced with disruption while your computer is being repaired.

Let’s assume the worst-case scenario, in which the single incident cost is $1200 (the cost of a new
computer). Meanwhile, a computer vendor is selling power surge protectors for $10.

So spending $10 could save $1200 in the long run. While this represents a substantial cost
benefit, it may not be enough to convince some people to purchase such a device, especially if
their computer is only used for games.

However, people who use their home computer for work are likely to have a different attitude.
Assume someone is earning $50 per hour. Their computer is damaged by a power surge and is
taken away for repairs for one day. That person stands to lose around $400 (earnings for an 8-hour
day) plus the cost of repairs, say $1600 in total. Intangible costs also need to be
considered: if a customer has their work delayed as a result, they may decide to send their work
elsewhere in the future.

If you live in an area that is prone to power surges, common sense would dictate that you
purchase a power surge protector. Let’s suppose that the probability of a power surge occurring
on any given day is 1 in 120, or roughly three times a year.

Use the following simple formula to estimate the expected loss from power surges.

Loss = Single Incident Cost × Rate of Threat Occurrence

Loss = $1600 × 1/120 ≈ $13.33 per day

Over a year this amounts to $1600 × 365/120 ≈ $4867, or roughly three incidents at $1600 each.
Spending $10 on a surge protector to guard against power spikes is therefore easily justified.
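The arithmetic can be checked with a few lines of code. This is a minimal sketch. The figures come from the example, but two points are assumptions: the 1 in 120 chance is treated as a daily probability (consistent with "roughly three times a year"), and the repair cost is taken as $1,200, which is what remains of the $1,600 once the $400 of lost earnings is subtracted.

```python
# Figures from the worked example; the daily interpretation of 1-in-120 and the
# $1,200 repair cost are assumptions (see the lead-in).
HOURLY_RATE = 50
HOURS_LOST = 8
REPAIR_COST = 1_200
DAILY_SURGE_PROBABILITY = 1 / 120
SURGE_PROTECTOR_COST = 10

single_incident_cost = HOURLY_RATE * HOURS_LOST + REPAIR_COST  # lost earnings + repairs
loss_per_day = single_incident_cost * DAILY_SURGE_PROBABILITY   # Loss = SIC x rate
loss_per_year = loss_per_day * 365

print(f"Single incident cost:   ${single_incident_cost:,}")     # $1,600
print(f"Expected loss per day:  ${loss_per_day:.2f}")           # $13.33
print(f"Expected loss per year: ${loss_per_year:,.0f}")         # $4,867
print("Surge protector justified:", SURGE_PROTECTOR_COST < loss_per_year)
```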

So why do so many people not bother to buy a power surge protector? When it comes to risk
analysis, people often seem to adopt an air of optimism – the ‘it will never happen to me’
syndrome. Interestingly, it is the computer user who has already suffered a loss who buys these
devices – a case of shutting the stable door after the horse has not just bolted but died!

This attitude can also be found in business. There are many managers who are willing to live
with a risk rather than spend the money on something that may never be needed.

Available options

When choosing which risk prevention and recovery options to employ, you need to consider:

• how critical the system is and how far the organisation relies on it
• the surrounding infrastructure and how susceptible the organisation is to a risk event
• the existing procedures and controls used and how these may be enhanced
• the equipment that may be available to prevent or recover from the event
• the number of risk events or systems a particular option may cover
• what the option will cost and how much the organisation is prepared to pay.

When you have completed the analysis and considered the risk minimisation options, the findings are compiled in a report and submitted to management for approval. Once the risk minimisation strategy has been approved, it has to be acted upon and the equipment and procedures put in place.

Example of DPR strategies

Consider the following disaster prevention and recovery strategies available for a typical home
computer user.

• save work every few minutes
• regularly back up files according to their importance
• use external backup devices such as tape, zip or CDs
• store important files away from the home, possibly at the office
• use UPS or surge protectors, especially in areas prone to power problems
• use telephone surge protectors with modems
• install and update anti-virus software
• create a repair disk
• record serial numbers of all components in case of theft
• keep a fire extinguisher in the vicinity of the computer
• use only licensed software and store all licenses safely
• use passwords and/or encryption to protect confidential files
• avoid storing passwords in dial-up settings (while this makes logging in easier, it also makes it easy for a thief to access your account)
• use anti-spyware software and firewalls if connected to the Internet
• keep security patches for software (operating systems and applications) up to date.
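The backup items in this list can be partly automated. The following is a minimal Python sketch, not a recommendation of any particular product. It copies a folder of important files into a date-stamped zip archive on an external drive. The folder names are hypothetical. Running it regularly (for example with Windows Task Scheduler or cron) is left to the operating system.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_folder(source: Path, backup_dir: Path) -> Path:
    """Zip everything under `source` into a date-stamped archive inside `backup_dir`."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    base_name = backup_dir / f"{source.name}-{stamp}"
    # make_archive adds the ".zip" extension and returns the path of the archive it created
    archive = shutil.make_archive(str(base_name), "zip", root_dir=str(source))
    return Path(archive)
```

For example, `backup_folder(Path.home() / "Documents", Path("E:/Backups"))` would write an archive such as `Documents-20240101-180000.zip`, where `E:` stands for a hypothetical external drive. Keeping several dated archives, rather than overwriting one, means an older copy is still available if the latest backup turns out to be corrupt.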

Strategy report
A Strategy Report recommends the risk prevention and recovery strategies to be applied and
provides a summary of the risk analysis exercise. A typical report includes the following:

The systems covered by the analysis and any other scope definition

You may be preparing a plan for a single system or for several systems. There may already be a
DRP in place to recover from network or hardware disasters. You should define the areas that
this particular plan covers and also what it does not cover.

The systems that were identified as critical on which the analysis has been based

It is normal to focus on the most critical systems first since, if these are protected, the same
processes will often also protect other systems. You should ensure that readers of your report
understand how you arrived at your findings and why you concluded that a particular system is a
critical one.

Parts of the business impacted by the systems

This may be described in terms of business functions or departments.

Possible impacts to the organisation of major and minor events

Since you need to persuade managers to spend money on the DRP, you should describe in vivid terms the impact of a disaster on profitability, cash flow and customer relationships to achieve a dramatic effect.

Current security and control of these systems

If the system is already in production then you should summarise the current situation and
identify strengths and weaknesses.

Assumptions made and the impact of any future developments

Your recommendations may be based on little or no change in the environment. However, future business developments may have an impact on your solution. For example, you may propose to
use one of the organisation’s sites as a backup site to minimise the cost. If this site is scheduled
for closure, then your plan may not be practical.

Threat and risk events considered

This summarises the risk analysis activity that should have been fully documented.

Findings and probabilities used in evaluation

Details of the findings of the risk analysis, outlining the method used to determine the probability of events occurring.

Cost to the organisation if events occur

Costs should be expressed in a way that will capture the attention of the managers reading the report. For example, ‘If the system is down for 30 minutes we would lose $1,000,000 in revenue!’

Possible preventative and recovery measures (a major part of the report)

Having described the problems, you can now show how you can solve them. You may need to
provide alternative approaches and options, for example, the facilities provided by a hot site and
a cold site.

Cost benefit analysis

This is what the managers will be keen to find out. What is the value of the benefits of the
proposed solutions against the cost? You should also include intangible benefits (for example,
improved customer service) in your analysis even though these are difficult to quantify in dollar
terms.
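The tangible part of a cost benefit analysis compares each option's annual cost with the reduction in expected annual loss it delivers. The following Python sketch uses hypothetical figures for two options. Intangible benefits, such as improved customer service, still have to be argued in words alongside the numbers.

```python
from dataclasses import dataclass


@dataclass
class Option:
    """One prevention or recovery option, with all money figures per year."""
    name: str
    annual_cost: float   # purchase price spread over its useful life, plus running costs
    loss_before: float   # expected annual loss without the option
    loss_after: float    # expected annual loss with the option in place

    @property
    def benefit(self) -> float:
        """Tangible benefit: the reduction in expected annual loss."""
        return self.loss_before - self.loss_after

    @property
    def net_benefit(self) -> float:
        return self.benefit - self.annual_cost

    @property
    def ratio(self) -> float:
        """Benefit per dollar spent; above 1.0 the option pays for itself."""
        return self.benefit / self.annual_cost


# Hypothetical figures for illustration only
options = [
    Option("Hypothetical option A", annual_cost=5_000, loss_before=20_000, loss_after=2_000),
    Option("Hypothetical option B", annual_cost=15_000, loss_before=20_000, loss_after=500),
]

for opt in sorted(options, key=lambda o: o.net_benefit, reverse=True):
    print(f"{opt.name}: benefit ${opt.benefit:,.0f}, cost ${opt.annual_cost:,.0f}, "
          f"net ${opt.net_benefit:,.0f}, ratio {opt.ratio:.1f}")
```

Ranking by net benefit favours option A, even though option B removes slightly more loss. Showing both the net figure and the ratio lets managers see that trade-off.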

Recommendations

Develop your argument into a recommendation. It may also be worth discussing what may
happen if the recommendations are not followed.

Action items and activities required to implement recommendations

To show that you have thought through the proposal, you should describe how the DRP will be
implemented if it is approved.

There are several examples of reports on the Internet and these can be found by following the
suggested links in References.

A well thought out, logical, cost-effective report should have no trouble being approved.
However, if funding is tight, management may still prefer to save money in the short term by not
implementing any high-cost recommendations. In this case it is prudent to ensure that the
minutes of the meeting clearly show that management decided on this course of action.

Submitting the report


The report needs to be submitted to management for approval and authorisation of the required
funding. Often you will be asked to present and explain your report in person. This is an
opportunity, if you are well prepared, to obtain the desired approval from management.

Your presentation (using PowerPoint or other software) should include the following:

• introduction and approval process
• importance of a DRP
• impact of a disaster event
• real-life example(s)
• what the DRP will offer the business
• threats to be safeguarded against
• recovery and prevention processes
• cost benefit analysis
• how the DRP supports the business
• recommendations
• action plan
• conclusion and call for approval.

Getting approval
You may think that you have developed the best DRP in the world. However, you might present
it to management only to have it rejected. Why could this happen?

As you prepare to write your report and/or present your case, consider the following issues:

Present to the audience

Use appropriate language. If the audience is made up of non-IT managers, avoid technical terms.
Use business terms and try to show the impact on individuals. For example, ask the payroll manager ‘What would be the impact of an incorrect tax calculation?’ Ask the production
manager ‘How long could you go without raw materials before laying off workers?’

Make it a business case

Most major decisions that management makes are based on a business case. A business case explains the current situation, what the problems are and how to solve them. Express your argument in terms of financial values and key performance indicators. Most organisations focus on the profit of the business, so explain what impact a disaster would have on it.

Provide examples to support your case

Disasters occur all the time. Perhaps your business recently suffered a power outage. What
happened and how did it cope? Use this and other real-life incidents to demonstrate that these
things do happen. Carry out research and have the facts available to back up your argument.

Consider legal or contractual implications

The business may need to meet certain legal or statutory obligations. How embarrassing would it
be for a hospital if patients’ records were disclosed? What would happen if Tax File Numbers
were not kept secure?

Show cost benefits and extra benefits

The heart of the business case is what the DRP will cost and what benefits will be gained. The
problem with DRPs is that if a disaster never occurs, it can appear to be a waste of money. Are
there any benefits to be gained as a by-product? For example, could the use of encryption be promoted as a marketing tool to attract more security-conscious customers?

Work on the budget

Where will the money come from? Can the costs be spread over a period of time? What can be
achieved without cost or simply by reconfiguring the system? Can each business department be
individually billed? Don’t forget to add the cost of the project team carrying out the DRP.

Provide alternatives

Managers like choices. Give them options but don’t give them so many that they get confused.
Keep it simple. Remember, they can still decide to go with the risk and not put in place any
recovery or preventative strategy.

Show you can provide solutions

Describe the threats and the problems but quickly move on to your suggested solutions and the
associated benefits. Managers want to hear solutions not problems. If you follow these guidelines
then you should get the desired response.

The minutes of the meeting and a summary of any changes made to the proposal will form the
basis of the DRP that will be implemented.

If your recommended DRP has been modified by management or they have chosen one of a
number of alternatives that you suggested, you should produce a new document that reflects
these decisions. This should then be signed off as the DRP.

Develop disaster recovery plan


Overview
Imagine being in a foreign country and not having any money because your credit cards and
money have been stolen from your bag. What would you do next? If you had thought about this before leaving, you might have asked your bank what to do in such a situation. They might have given you a reverse-charge phone number to call or the details of a local contact for emergencies.

Having a plan to follow when things go wrong is also important to business continuity. A plan
makes it easier for a business to return to production as soon as possible. Statistics show that
without a plan the business would most likely fail.

In this topic we will look at how to translate an agreed disaster prevention and recovery strategy into a detailed process, procedure and resource plan. The plan is then used to recover from a disaster of any magnitude, be it minor, major or total devastation.

You should have already formulated preventive and recovery strategies. Formulating preventive and recovery strategies requires:

• developing strategies for dealing with risk
• identifying the cost of preventive and recovery options
• completing a strategy report
• gaining approval from management for strategy implementation.

Implementing a disaster prevention and recovery strategy


Once the DPR strategy has been formally accepted by the business and approved by senior management, it’s time to implement it. Required actions include:

• changing procedures, eg virus checkers to run each time a computer is switched on
• purchasing equipment to provide fault tolerance and standby
• implementing additional controls to identify errors
• improving backup procedures
• increasing security over data and user access
• developing the disaster recovery plan.

These can be categorised as:

• building or implementing in-built system contingencies
• bringing the current site to the standard required
• making changes to policies and procedures
• implementing additional or changed hardware and/or software.

In-built system contingencies


Not all prevention or recovery processes will cost money to implement. Often existing facilities
have not been fully implemented or turned on. These will vary from system to system and it is
important for the team undertaking the risk analysis to be aware of these built-in facilities.

We will examine a few of the built-in facilities of Windows XP Professional and how these may
be used to safeguard against different risk events. These are summarised in the following table:

Table 1: Windows XP system contingencies


Facility | Function
User accounts | Restrict access to authorised users only.
Encryption | Additional level of security to ensure that confidential files are secure.
Permissions | Allows some users restricted access (such as read only) to safeguard the data from destruction or corruption.
Auditing | Tracks events to determine what users have been doing on their computers.
Lock computer | Prevents others from accessing a user’s computer.
Support for smart cards | Restricts access to authorised users only.
Automated System Recovery | Allows quick recovery from an operating system problem.
Support for RAID 5 and mirroring | Allows system to continue working even if a hard disk fails.
Recycle bin | Allows recovery of recently deleted files.
Backup software | Creates backups of files and the whole system.
System restore | Monitors and records system changes. Enables roll back to a previous point in time.
File protection | Protects Windows files from being corrupted by rogue software installs.
Firewall | Blocks attacks by worms and other malicious software arriving from the network or Internet.

Controls such as passwords and access permissions may be referred to as logical controls.

Current site configuration

Here we are primarily concerned with systems in terms of software, data and hardware.
However, the security and controls that are implemented at the physical site are also an important
consideration in the risk analysis.

While encryption and user accounts can be used to prevent unauthorised access, ideally no unauthorised person should be able to physically reach a computer in the first place. The following diagram will give you an
idea of some of the levels of physical security that may be applied.

An organisation in a secure building with locked doors on each floor, security guards and video cameras can be confident that an intruder would find it difficult to access a PC and the
confidential data it contains. However, many frauds and errors are perpetrated by trusted
employees. That is why there is still an ongoing need for logical controls and passwords for each
user.

Figure 1: Security measures

Activity

To practise identifying system contingencies, go to Activity 1 located in the Activities section of the Topic menu.

Review and update policies and procedures


The normal day-to-day operations of an organisation are described in its policy and procedures manual. This may be stored electronically, for example on the company's intranet, or published as a paper-based manual. After designing the recovery requirements, you will often need to update this manual to include the changes required to prevent or recover from a disaster.

As mentioned earlier, many risk events are also security threats which are often identified during
a security audit or review. Similarly, review and investigation of the current procedures also
form part of the Disaster Recovery Planning process to ensure that they meet DRP requirements.

The review process has the following stages:

1. Identify key DRP issues that should have been resolved by the existing processes and
procedures
2. Review and evaluate the operational policies to ensure that they meet the demands
imposed by the DRP
3. Design a series of tests to verify that procedures are in accordance with these policies
4. Carry out the testing and document the results
5. Evaluate the findings and make any recommendations for changes or approve the current
processes.
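Step 3 can include automated checks wherever a policy leaves evidence on disk. As a minimal sketch, assume two things for illustration: a policy requires a backup archive to have been written within the last 24 hours, and backups are saved as `.zip` files in one folder. A test of that policy might look like this in Python:

```python
import time
from pathlib import Path
from typing import Optional

MAX_AGE_HOURS = 24  # assumed policy: at least one backup in the last 24 hours


def newest_backup_age_hours(backup_dir: Path, pattern: str = "*.zip") -> Optional[float]:
    """Age in hours of the most recently modified backup file, or None if there are none."""
    times = [f.stat().st_mtime for f in backup_dir.glob(pattern) if f.is_file()]
    if not times:
        return None
    return (time.time() - max(times)) / 3600


def backup_policy_met(backup_dir: Path, max_age_hours: float = MAX_AGE_HOURS) -> bool:
    """Pass only if a backup exists and it is no older than the policy allows."""
    age = newest_backup_age_hours(backup_dir)
    return age is not None and age <= max_age_hours
```

A failing result is recorded as a finding in step 4 and evaluated in step 5. The check only proves that a file was written, not that it can be restored, so it complements a restore test rather than replacing one.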

The procedural changes required will depend upon what is discovered and the DRP strategy
adopted. Here are a few examples:

Table 2: Examples of procedural changes


Strategy adopted | Impact on procedures
Nightly backups to be taken offsite | Backup procedures and the process for getting backups offsite and subsequent retrieval will need to be described.
Software to be fully tested before going into production | Testing procedures (defining what ‘fully tested’ means) and the documentation and test results to be maintained will need to be described.
Virus checking | Procedures to explain the danger of viruses, how to check for viruses on disks and in e-mails and what to do if a virus is discovered will be required.
Only licensed software to be used | Procedures for checking the number of licenses that the organisation has and what to do if more are needed will be required, together with the penalties to be imposed if staff disregard the policy.

A set of procedures for the disaster recovery plan itself will also be required.

Additional or changed hardware and/or software required


A DRP strategy usually requires new or updated hardware and software. Some of these
requirements are detailed in the following table:

Table 3: DRP requirements


Strategy | Hardware or software
Regular backups to tapes | Tape backup unit with sufficient capacity. Tapes for the backup. Appropriate backup software.
Mirrored disks or RAID | Additional disks or disk subsystems.
Fault-tolerant systems, duplicated systems | Requires similar hardware to that being duplicated. If a file server is to be duplicated, a matching machine will be needed. May also require additional software licenses.
Virus checking | Virus software licenses for all users.

Think about the hardware and software that would be required by the home user to implement the disaster prevention and recovery strategies identified earlier, under which:

• work is saved every few minutes
• files are regularly backed up
• external backup devices such as tape, zip or CDs are used
• important files are stored away from the home, possibly in the office
• UPS or surge protectors are used, especially in an area that suffers power problems
• telephone surge protectors are used with modems
• virus checking software is always used and kept up to date
• a repair disk is always created
• serial numbers of all components are recorded in case of theft
• a fire extinguisher is kept in the vicinity of the computer
• only licensed software is used and all licenses are stored safely
• passwords and/or encryption are used to protect confidential files
• passwords are not stored in dial-up settings
• anti-spyware software and firewalls are always used if connected to the Internet
• security patches for software (operating systems and applications) are kept up to date.

The following hardware and software would be required:

• Backup tape unit (or zip drive or CD writer), tapes (or zip cartridges or CDs), appropriate backup software and hardware drivers
• UPS and/or surge protectors for power and telephone
• Virus-checking software
• Fire extinguisher.

Identifying the required hardware and software and developing implementation plans to install
them forms part of the DRP project. One aspect of the risk analysis and recovery plan is to
identify cost-effective options to meet a variety of threats. These options will already have been approved as part of the plan. However, before implementing them, you will need to select particular products that meet both business requirements and cost constraints.

Sometimes the required resources cost much more than was originally estimated. In this case,
you will need to revisit the DRP and submit a new recommendation for approval. In extreme cases, management may decide to drop that contingency option and accept a greater risk. For example, a hot site originally estimated at $300,000 might turn out to cost nearer $1,000,000, and management might opt instead for a cold site costing only $200,000.

Precise requirements and costs are documented only when current operational procedures have
been reviewed and gaps between the ideal and what is actually agreed upon have been identified.

Once the report has been approved, it is time to put it into action. This may involve changing
existing policies and procedures and purchasing new hardware and software.

Large organisations may follow specific risk analysis methods and use specialised tools. Without these, you can still carry out an analysis by identifying the asset to be safeguarded, possible risks to that asset,
the cost to the company of the risk event occurring, the likelihood of such an event and the cost
of prevention or recovery.

The output from the risk analysis is an action plan to make changes to the current way of
working in order to minimise or prevent the risk and a disaster recovery plan so staff know what
to do should the risk event occur.

Identifying cut-over criteria


How do you know when to activate your disaster recovery plan? If an earthquake destroyed the office building, the answer would be obvious. But what if a computer virus deleted all the data on one or all of the servers? Each possible incident needs to be analysed to determine the
impact of the disruption to the business. The first step is to determine the extent of the impact to
establish how long it will take for the business systems to be restored. If this exceeds the
maximum allowable downtime, then a disaster is declared.
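The cut-over rule compares the estimated restoration time with the maximum allowable downtime agreed for each system. Here is a minimal Python sketch of that comparison. The systems and hour limits are hypothetical placeholders for the figures your own impact analysis produces.

```python
# Hypothetical maximum allowable downtime, in hours, for each critical system
MAX_ALLOWABLE_DOWNTIME = {
    "file server": 8,
    "e-mail": 24,
    "time recording": 72,
}


def should_declare_disaster(system: str, estimated_restore_hours: float) -> bool:
    """Declare a disaster only if normal restoration would exceed the allowable downtime."""
    return estimated_restore_hours > MAX_ALLOWABLE_DOWNTIME[system]


# A virus has wiped the file server and a rebuild from tape is estimated at 20 hours
if should_declare_disaster("file server", 20):
    print("Declare a disaster and activate the DRP")
else:
    print("Handle through normal support procedures")
```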

The Disaster Recovery Co-ordinator, with input from upper management, is responsible for
deciding when to activate the disaster recovery plan. If the co-ordinator is not available,
responsibility flows down the chain of command. This is why it is important for roles and
responsibilities to be clearly defined in the Disaster Recovery Plan. A contact list should be
created and maintained containing details of all employees with after-hours phone numbers. The
organisation’s internal directory listing can be modified for this purpose.
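If the co-ordinator cannot be reached, responsibility flows down the chain of command. In practice this means taking the first available person from an ordered list. Here is a minimal Python sketch of that rule. The role titles and the availability check are hypothetical; in practice, availability is found by working through the contact list.

```python
from typing import Callable, Optional, Sequence


def decision_maker(chain_of_command: Sequence[str],
                   is_available: Callable[[str], bool]) -> Optional[str]:
    """Return the first person in the chain of command who can be reached, or None."""
    for person in chain_of_command:
        if is_available(person):
            return person
    return None


# Hypothetical chain, in the order defined in the DRP
CHAIN = ["Disaster Recovery Co-ordinator", "Deputy Co-ordinator", "IT Manager"]
unreachable = {"Disaster Recovery Co-ordinator"}

print(decision_maker(CHAIN, lambda person: person not in unreachable))  # Deputy Co-ordinator
```

Returning None makes the worst case explicit: if nobody in the chain can be reached, the plan needs a further fallback.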

Figure 2: Example of a generic structure for disaster recovery

Documenting the Disaster Recovery Plan


All that remains is to document the Disaster Recovery Plan. The plan outlines the tasks that need
to be completed to recover from the disaster and return the business to its normal operations. The
plan is a dynamic one – it will constantly change as the business changes. Therefore it is
important to review it at regular intervals to ensure it is up to date.

There are many different possible formats for a DRP.

Here is one suggestion:

• Introduction
• Purpose
• Scope
• Authorities (what legal/contractual requirements the DRP complies with)
• Record of change
• Operations
• Systems description and architecture (a general description of all the systems)
• Responsibilities (detailed outline of teams responsible for recovery operations)
• Activation phase (initial actions to detect and assess damage)
• Recovery phase (processes and procedures to complete recovery of each system with nominated staff positions responsible for each task)
• Details of the post-recovery review to be performed after the completion of the recovery from any declared disaster.

An example DRP

A small firm of accountants consisting of two partners and four assistants operates from a house
converted into offices. The building has fire alarms installed but no other fire or security devices.
Computers are allocated to each staff member and networked to a file server. The equipment,
which is around two years old, has been reliable to date but is now out of warranty and the
original supplier is no longer in business.

The server has a tape backup unit but no one knows how to use it.

The office uses the following software:

• MYQB accounting system to process the accounts for 150 clients. Average charge per client is $1,500 per account. Work for each client is carried out throughout the year.
• PAT tax return system used with 2,000 clients. Average charge to a client is $150 per return. Most work is done from September to December.
• FIN to manage the financial affairs of 100 clients. Average charge per client is $2,000. Work is carried out four times a year in March, June, September and December.
• TIM for time recording of staff and billing of clients. Weekly time recording and monthly billing.
• TRUS to keep trust accounts for 200 clients. Average charge is $1,000 per annum. Work is carried out in March and September. This software was developed by one of the assistants.
• Excel for spreadsheet work
• Word for word-processing
• Windows ME and a NetWare server.

All users have access to the Internet and use e-mail to communicate with clients. Turnover of
staff is quite high and all the assistants have been with the firm for less than a year. To minimise
the need to administer the network, everyone signs on with the same user ID and password. This
has been the situation for over two years.

Issues to be considered at this firm

If you were asked to undertake a risk analysis for this office, you would need to take into account the following issues.

• There are peak periods when risk events could cause greater damage
• Confidential data is kept about clients
• There are legal requirements to meet deadlines
• Clients could readily change accountants if service is poor.

Major risks

Table 4: Major risks


Risk | Likelihood
PC failure and breakdown | Very likely
Theft | Possible
Fire | Possible
Client data security breaches | Possible
Software support problems | Very likely
Loss of data | Possible

Business requirements of the firm

The firm generates a lot of income from the use of computers. If we consider the worst possible
case then during peak periods they could earn an income of $18,750 from accounting work,
$75,000 from tax returns, $50,000 from FIN work, and $100,000 from trusts. This amounts to
nearly $250,000 a month or $60,000 a week or $12,000 a day! If the server were down and little work could be done, this is the amount of revenue they could lose.

This is an over-simplification, however, since work would most likely just be delayed and staff
could work overtime to catch up. On the other hand, if the problem resulted in the late payment
of taxes, and the firm were held liable, they could end up paying a lot more in fines. Either way,
this exercise demonstrates that the systems are critical.
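The peak-month figures can be reproduced from the client numbers and charges listed for each system. This Python sketch assumes, as the figures imply, that each annual charge is spread evenly over the months in which that work is done. On that basis the peak is September, when all four systems are busy. The week and day figures assume four weeks and 20 working days per month.

```python
# system: (number of clients, average annual charge per client, months in which the work is done)
WORKLOAD = {
    "MYQB accounts": (150, 1_500, 12),         # throughout the year
    "PAT tax returns": (2_000, 150, 4),        # September to December
    "FIN financial affairs": (100, 2_000, 4),  # March, June, September, December
    "TRUS trust accounts": (200, 1_000, 2),    # March and September
}

# Revenue earned in a month when every system is active (September)
peak_month = {
    system: clients * charge / months
    for system, (clients, charge, months) in WORKLOAD.items()
}
total = sum(peak_month.values())

for system, amount in peak_month.items():
    print(f"{system:<22} ${amount:>9,.0f}")
print(f"{'Peak month total':<22} ${total:>9,.0f}")
print(f"Per week: ${total / 4:,.0f}   Per working day: ${total / 20:,.0f}")
```

The total of $243,750 is the "nearly $250,000 a month" quoted above. Dividing by four weeks and 20 working days gives roughly $61,000 a week and $12,000 a day.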

Actions to be taken

Table 5: Actions
Risk | Action
PC failure and breakdown | Consider the need for standby PCs and/or server. Negotiate service and support contracts for 24/7 quick-response service. Implement backup procedures as a priority.
Theft | Consider office security involving the use of alarm systems and window shutters or bars. Implement backup procedures as a priority.
Fire | Consider use of a fireproof safe. Implement backup procedures as a priority, including off-site backup.
Client data security breaches | Assign each user their own user ID and password. Improve security procedures and staff training. Ensure appropriate access is implemented. Consider the use of encryption for sensitive data. Consider implementing auditing of file-level access.
Software support problems | Review the quality of support provided for the TRUS system. Consider the outcome if the assistant who developed and supports it leaves. Negotiate for the source code or consider an escrow arrangement.
Loss of data | Implement backup procedures as a priority.

Disaster Recovery Plan

This document describes the procedures to be followed in the event of a major disaster, such as a
fire, that completely destroys the building.

In order to recover from a disaster, it is important that staff carry out the regular operational procedures on which recovery depends. These are listed below.

File storage and backup

• All client files must be stored on the file server. Only files of a temporary or personal nature should be kept on the local hard disk.
• Every evening, a full backup of the file server is to be completed. Instructions for carrying out the backup are provided in the folder next to the backup tapes in the storage room. The tape rotation scheme is also documented.
• The backup takes several hours and will run automatically overnight. First thing each morning the IT administrator will check the backup report to ensure that it finished correctly.
• The tape will then be stored in the fireproof safe until 4 pm, when a courier from the off-site storage agent will collect it. Full procedures for recording the dispatch of these tapes are available.
• If the IT administrator is away from the office, the chief financial officer will arrange for someone else to carry out their task.
• Every quarter, backups are verified to ensure that the backup tapes are readable and data can be recovered.
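The quarterly check that data can be recovered is most convincing when a sample of files is actually restored and compared with the originals. The following is a minimal Python sketch of that comparison. It assumes the tape has been restored to a separate scratch folder. It also assumes the chosen files have not changed since the backup was taken; otherwise, legitimate edits show up as differences. The function names are illustrative.

```python
import hashlib
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """SHA-256 digest of a file, read in 1 MB chunks so large files are never held in memory whole."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(original_dir: Path, restored_dir: Path) -> List[str]:
    """Compare every file under original_dir with its restored copy; list any problems."""
    problems = []
    for src in sorted(original_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(original_dir)
        copy = restored_dir / rel
        if not copy.is_file():
            problems.append(f"missing: {rel.as_posix()}")
        elif sha256_of(src) != sha256_of(copy):
            problems.append(f"differs: {rel.as_posix()}")
    return problems
```

An empty list is the evidence the IT administrator records for that quarter. Any entry is a reason to investigate the backup procedure before the tapes are needed for real.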

Diaries

• All staff must record their day’s activities in their diary. This is used for billing purposes and is also part of the Disaster Recovery Plan.
• Details of what is to be recorded are contained in the Employee Handbook.
• Each evening staff must update the time recording system and take their diaries home with them.

In case the office is destroyed

• A copy of these instructions is kept at each partner’s house and at the bank. Whoever arrives at the office first (normally a partner) will call the appropriate site to get these instructions.
• All staff are to meet at the local RSL. They are not to go home! The firm has an agreement with the RSL that it can hire one of their rooms.
• The IT administrator will call the off-site agent and arrange for the latest backup tape to be delivered.
• The firm has an arrangement with a computer hire company to hire computers in order to resume business as soon as possible. These will be set up at the RSL, and the IT administrator will rebuild the file server and restore the data from the backup tape. While this will still take several hours, the firm should be able to start operating again by lunchtime.
• The backup tape will be one day out of date, so staff must now review their diaries to identify the work they had done on the previous day.
• If the IT administrator is not available that day, responsibility for the recovery procedures will pass down the chain of command.
