Information Sheet (DRP)
Introduction
A business is made up of many different systems: marketing, accounting, logistics,
manufacturing and so on. Information systems are an important part of any business. They are the
glue that interconnects all the other systems.
The goal of a DRP is to recover critical business systems as soon as possible to a minimal
functioning order.
Unit topics
The topics for this unit are as follows:
In this topic you will learn to gather information about the different information systems in a
business. In particular you will focus on how to:
In this topic you will learn how to identify and classify different threats to the system. You will
also learn about the strategies available to minimise the impact of the threats.
In this topic we look at which recovery and preventive measures are available and how to
document and gain approval for a disaster prevention and recovery strategy.
In this topic you will look at how an agreed disaster prevention and recovery strategy can be
translated into detailed processes, procedures and resources.
Overview
When creating a disaster recovery plan, business impact statement and business continuity plan,
the first step is to understand which parts of a business are critical for operation. That is, which
systems, including processes, infrastructure and operating data, are critical in doing what the
business does. To understand what is critical, information is gathered from many different parts
of the business. The material gathered contains information about technology infrastructure
including:
software
hardware
data
network
facilities.
The information is then analysed to determine what must be available for the business to
continue working.
In this topic we collect information about the different information systems in a business. In
particular we focus on how to:
A system is critical for a commercial organisation if its failure results directly or indirectly in
loss of life (for example, an air traffic control system) and/or major financial loss. When
developing a disaster recovery plan (DRP) it is essential to identify critical systems and ensure
they are restored as soon as possible.
Each critical system has a maximum allowable downtime beyond which its loss will severely
impact the business. The shorter the period of time before losses start to occur, the more critical
the system is. The size of the financial loss, relative to the financial worth of the business, is also
significant. The greater the financial loss in percentage terms, the more critical the system is.
Ideally, the business case or proposal for each new system should identify its importance and a
risk analysis should be undertaken early in the project. This information may already be
available in the project documentation, in which case you would review this material and
identify the risk issues that have been raised.
In the absence of this information, you may need to survey the organisation’s business areas or
conduct workshops where managers can consider the critical nature of their systems.
During this process, each system should be considered as a whole. All the parts that make up the
system must be carefully documented. Only then can it be determined what part of the system is
critical.
You will need to collect information about how the system uses:
software
hardware
networks (voice and data)
data
facilities (chairs, tables, projectors etc).
Software in the form of standard packages is used to access data. It can be readily replaced.
Data may have been gathered over many years and is unique and irreplaceable.
Hardware is needed to run the software and access the data. Software requires a minimum
hardware platform to work properly.
Facilities such as chairs, telephones, tables and paper-based forms complete the system.
Since systems become more critical at different times, the maximum allowable downtime can
vary depending on the time of day, week, month or year a disaster happens. For example, many
businesses work to a monthly accounting cycle: losing their financial system at the end of the
month would have a greater impact than in the middle of the month.
Consider the critical systems on your personal computer at home. Assess whether the following
situations make your systems critical or not.
1. You are working late on a 50-page assignment that must be handed in by 9:30am the next day,
otherwise you will fail the course.
2. You are using the Internet to book a holiday you intend taking in three months’ time.
3. You have developed a spreadsheet to calculate your tax return.
4. You have created a database of CDs, records, tapes and videos which you will need to show your
insurance company if the collection is destroyed or stolen.
5. You have saved several versions of your favourite computer game.
Activity
To practise identifying critical systems complete Activity 1 in the Activities section of the Topic
menu.
Before starting work on the DRP all critical systems must be identified and documented. Users
and management complete critical systems/data assessment forms with the guidance of IT staff.
Once completed, they form an integral part of the system documentation.
Review Software used (.doc 30KB)
Use this form to identify the software that is most frequently used. Frequency may or may not
indicate the software is critical. For example, many users may use a word processor every day
but this may not be critical to the organisation. Further analysis is required.
Complete this form for each system that is used constantly or frequently, for example, an email
or e-commerce system. You can use it to identify how important the data is, which data items are
easy to recover and which are not.
The form identifies the types of data activity carried out and where the source data originates.
The level of difficulty in restoring data and impact on the organisation is then measured in
percentage points.
Let’s say, for example, we need to assess an email system. The percentage level of criticality to
the organisation is indicated with examples and an explanation of how this level was arrived at.
not critical because it can be recovered elsewhere)

From irrecoverable sources such as telephone calls:
  10% (eg diary and calendar: not critical for the running of a business even though data can’t be recovered)

Developed at the workstation such as report writing:
  60% (eg sent emails and attachments: critical for the organisation because if email crashes the business suffers)
  5% (eg meeting room bookings in shared inbox: not critical for running the business even though data can’t be recovered)
  5% (eg received emails and attachments stored in temporary files: can’t be replaced but not critical because email and attachments can be resent)

Other – specify
Note how most of the data files for the email system are developed and created at the
workstation. The loss of these files has a high impact on the individual but not on the business as
a whole.
The following tables describe the significance of the loss of source files in relation to the purpose
for which they are used.
Developed at the workstation such as report writing | Recovery impossible unless regular backups of files are made and stored externally.
Other – specify | Determine how easy it is to get back to a source or original.

Table 4: Impact of loss of source files on business.
Data used to | Issues
Update corporate data files | Important data used by many and may be critical.
Create own data files | May be critical data but restricted impact and short life
Create shared documents | May be critical data but restricted impact
Create own temporary documents | Unlikely to be critical
Create own longer term documents | May be critical data but restricted impact, may be required again
Complete this form for each system. It helps identify what equipment is needed to run each
system.
Complete this form to identify the impact of system failure in a number of different areas. The
answers ’very costly’, ’serious’ and ’little or no effect’ quantify the size of the financial loss and
thus the magnitude of the impact on the business.
The form should be completed for different time periods to show what the impact of system
failure would be in minutes and hours for time-sensitive critical systems and hours and days for
others.
by customers or have to undertake extra work to sort out problems.
Note that all these areas can eventually have an impact on profit so the user should identify the
primary area of impact.
Question 4d
This is a different approach to identify the user’s dependence on the system. The question would
be asked for all major systems.
Question 4e
Problems occur at the worst possible time. Payroll programs may only be critical once a month
when the payroll is calculated and that will be the time that they fail. You should plan to handle
the worst-case scenario.
Question 4f
Activity
To practise analysing critical areas go to Activity 2 in the Activities section of the Topic menu.
Once one or more critical systems have been identified, they need to be ranked in order of
importance and impact on the organisation. It is unlikely that you will have the time to implement
DRP procedures for all systems, so you should initially concentrate on the most important.
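The ranking can be sketched in a few lines of Python. This is only an illustration, assuming the two criteria described earlier (the shorter the maximum allowable downtime and the larger the percentage loss, the more critical the system). The class name, field names and all figures are hypothetical; real values would come from the completed assessment forms.

```python
from dataclasses import dataclass


@dataclass
class CriticalSystem:
    name: str
    max_downtime_hours: float  # maximum allowable downtime
    loss_percent: float        # financial loss as a % of the worth of the business


def rank_systems(systems):
    """Most critical first: shortest allowable downtime, then largest loss."""
    return sorted(systems, key=lambda s: (s.max_downtime_hours, -s.loss_percent))


# Illustrative figures only (assumptions, not taken from any real business).
systems = [
    CriticalSystem("payroll", 72, 2.0),
    CriticalSystem("e-commerce website", 0.5, 8.0),
    CriticalSystem("email", 24, 1.0),
    CriticalSystem("point of sale", 0.5, 12.0),
]

ranked = [s.name for s in rank_systems(systems)]
print(ranked)
```

Sorting on downtime first and loss second is one possible choice. An organisation may prefer to weight the two criteria differently.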
Activity
To practise ranking the critical systems go to Activity 3 in the Activities section of the Topic
menu.
When undertaking risk analysis and disaster planning, it is usual to focus on critical systems,
software and data. The very definition of a critical system is that the business depends upon it
and would be severely impacted if the system were not available.
Forms 1 to 4 assist in analysing how long the business can cope after a loss.
Many organisations, such as banks, stock exchanges and automated factories, cannot manage
more than a few minutes without their systems. Imagine the state of the rail system or air traffic
control without the use of their computers. Even the local supermarket would suffer loss if the
tills went down for several minutes.
When assessing the impact on a business it is usual to consider the financial impact. Profits will
suffer if customers cannot trade with the company. If an e-commerce website is down, for
example, customers may turn to competitors.
There may also be an impact on cash flow. Not so long ago, a bank had to borrow millions of
dollars overnight to cover its needs when its computers went down.
If systems are regularly down or slow then customers may eventually go elsewhere. If faulty
systems delay payments, suppliers may stop delivering essential goods and services.
Corporate regulation
Business continuity management (BCM) and DRP form part of the core principles of the
International Standards on Prudential Regulation. The Australian Prudential Regulation
Authority (APRA) regulates the Australian financial industry, overseeing banks, general and life
insurers and most members of the superannuation industry.
Financial institutions are subject to auditing by APRA, including on-site visits. APRA
determines whether the business has an adequate and up-to-date DRP in place and whether the
testing program is sound. Any irregularities are noted in an audit report and a formal notice is
sent to the business. If the business fails to rectify problems it can be fined or even suspended
from trade.
Organisations trading in the USA are subject to recently enacted legislation (Sarbanes-Oxley)
which has considerably tightened their operating requirements. Failure to comply would result in
heavy fines.
The need to identify which critical systems rely on outside services or resources is paramount in
managing business continuity. Once critical systems are identified, it is necessary to state in the
Service Level Agreement (SLA) with the supplier how business outages will be handled. For
example, the SLA may require a supplier to store excess stock at an offsite storage area or
arrange for a competitor to handle supply until business resumes.
Take, for example, a car manufacturer which purchases components for steering wheels. If the
component supplier is unable to fulfil orders due to a disaster, then the car manufacturer must
stop production and lose millions of dollars. To reduce the risk of this happening, the car
manufacturer stipulates in its SLA with the component supplier that there will be a penalty of
$100,000 per day for non-supply of components. The component supplier is forced to have a
Disaster Recovery Plan to ensure production is resumed as fast as possible, or risk being
penalised or even financial collapse.
You should have already evaluated the impact of critical systems on business continuity.
Evaluating the impact of critical systems on business continuity requires
Risk analysis
Risk analysis is an analytical process undertaken to evaluate system assets and examine their
susceptibility to threats. Through this process we evaluate the possible commercial losses that
may result from the loss of these assets.
The basic purpose of a risk analysis is to identify preventive and recovery options for assets.
Think about assets of your own which you would take steps to protect from loss. For example, if
you own a car, you might install an alarm and immobiliser to deter theft. In the same way, a
company will also take precautions with its assets.
Computer systems (including hardware, software and data) are valuable assets of an
organisation. It is therefore very important that a risk analysis be undertaken to identify and
safeguard these systems. A major factor in risk analysis is to identify the impact of systems on
business continuity. ‘Mission critical’ systems require the greatest level of protection.
The loss of IT systems could have a major impact on many businesses. Many would come to a
standstill in minutes without their critical business systems. Even a small company could get into
financial difficulties if it lost its accounting data and did not know who owed it money.
IT systems can comprise many parts including:
hardware
software
networks
data
technical skills
projects.
There are many ways to categorise threats. One way is to consider whether the source of the
threat is internal or external.
Internal threats
Internal threats mainly result from actions by users and/or IT staff. These can include:
viruses corrupt or delete data*. Users can unknowingly transfer viruses to the corporate
network via mobile devices such as personal digital assistants or laptop computers. For
example, a user might buy a new laptop and connect it to the Internet to check for updates
at home. They are unaware that a virus has been downloaded onto the computer. The next day
the user takes the laptop to work and connects it to the corporate network. The virus is
then spread throughout the network deleting important data. Normally the virus would
have been stopped by the corporate firewall.
the wrong disk is formatted destroying data and software. Mistakes are easily made
when formatting a hard disk using the command line. For example, a person on work
experience could accidentally format the wrong hard disk drive by entering a wrong
command.
sabotage. Data and software* are intentionally destroyed or corrupted.
data and software files are deleted. Deleting data can be accidental or intentional. For
example, a person could accidentally press the delete key when moving data or
intentionally delete data through known software system vulnerabilities.
a password is forgotten so data or software cannot be accessed. For example, a
retrenched employee deliberately doesn’t update a password list.
input errors cause data to be corrupted.* If operators input incorrect, duplicated or
unauthorised transactions, then very quickly the data becomes corrupted or inaccurate.
How many stories have you heard about computers sending out a bill for millions of
dollars to an old age pensioner or cheques for two cents?
processing errors cause data to be corrupted.* Poor software design changes data.
hardware failure occurs so data and software are not available. Hardware and
networking equipment is rated with a mean time to failure, the expected operating time
before it fails, and a mean time to repair, the expected time needed to restore it.
Preventive maintenance can prolong the time to failure.
fraud. Data is corrupted in order to steal assets.*
poor testing. Bugs are left in software so errors or delays occur.*
incorrect processes or calculations occur in programs so errors or delays occur.*
copyright and license agreements are broken which leads to the company being sued
by the owner of the copyright or license provider.
External threats
The more serious external threats are likely to have an impact on the hardware and networks on
which the systems run.
Threats listed above marked with an * may have been previously identified by a security audit or
analysis. The organisation’s internal or external auditors may have already performed such an
analysis providing you with a useful source of information. To see an example of an audit report,
click on or copy the following link.
http://www.anao.gov.au/WebSite.nsf/Publications/4A256AE90015F69BCA256A6900112E38
Consider the Urban Homeware Company which has 10 stores located across the state. The
company headquarters are located in the capital city. They have identified their POS and
dispatch systems as ‘critical systems’. What threats can be identified for these systems?
Internal threats
viruses – deleting important data. Viruses can spread to stores via dial-up connection to
company headquarters. Point of Sale terminals are not connected to the Internet but are
still susceptible to virus attacks by employees transferring data from CDs or floppy
disks.
hardware failure. Computer servers or networking equipment fail causing loss or
inaccessibility of data.
deleting or changing data. Accidental deleting or changing of data by employees or
software programs.
input errors. Mistakes by POS operators.
External Threats
Activity
To practise identifying threats to the system go to Activity 1 and Activity 2 in the Activities
section of the Topic menu.
As individuals we have a choice whether to develop these prevention and recovery strategies but
for organisations it is essential in order to avoid serious loss of revenue or even the loss of a
company.
In this topic we look at the recovery and preventive measures available for the different
identified threats to organisations and how to formulate and gain approval for a disaster
prevention and recovery strategy.
You should have already evaluated threats to the system. Evaluating threats to the system
requires:
Strategies for dealing with risk
There are two main strategies for dealing with risk (apart from ignoring it in the hope it will go
away): prevent or recover. Both options have the objective of minimising the impact of the risk
event.
Prevention
With prevention you attempt to decrease the probability (maybe even to 0) of the event occurring
or causing damage. Many events can never be totally eliminated but their impact may be
minimised.
For example, an extensive sprinkler system will ensure that any outbreak of fire does minimal
damage. It is almost impossible to totally prevent a fire from occurring in the first place but this
is still considered a preventative action. This type of activity may also be termed risk
minimisation.
Recovery
Recovery procedures are put in place to ensure that the system can be quickly restored after the
event occurs. For example, the use of a hot-site (one that has a computer system already set up
and ready to use) allows for speedy recovery after a fire has gutted the building. This process
may also be termed a contingency. In fact DRP is sometimes referred to as contingency
planning.
To stop software being copied and breaking license agreements. | Software keys | Prevention
To allow access to data to continue even if a disk fails. | Mirrored disks or RAID (Redundant Array of Inexpensive Disks) systems, clustered systems | Prevention
To stop unauthorised access to data and data destruction. | Access rights | Prevention
To minimise impact of power loss or spikes and surges. | Uninterruptible power supplies (UPS), standby generator | Prevention
When deciding which options to adopt, you need to weigh the possible cost of the risk event
against the cost of the recovery or prevention option (single incident cost). A simple formula can
be used to calculate how much money to allocate to a recovery or prevention measure for the
known value of an asset.
The loss of critical systems can cost major organisations, such as banks, large sums of money.
They are therefore willing to invest in backup sites to keep their systems running in the event of
a major disaster. Their numerous branches and offices provide locations in which they can site
the backup equipment.
While a typical small business can still suffer a relatively large loss in the case of critical system
failure, it will probably not choose to create a backup site because of the high cost.
Let us consider the case for installing a power surge protector in an average home. Suppose there
is a power surge while you are operating your computer. It could be seriously damaged or, at the
very least, you would be faced with disruption while your computer is being repaired.
Let’s assume the worst-case scenario: the single incident cost is $1200, the cost of a new
computer. Meanwhile, a computer vendor is selling power surge protectors for $10.
So spending $10 could save $1200 in the long run. While this represents a substantial cost
benefit, it may not be enough to convince some people to purchase such a device, especially if
their computer is only used for games.
However, people who use their home computer for work are likely to have a different attitude.
Assume someone is earning $50 per hour. Their computer is damaged by a power surge and is
taken away for repairs for one day. That person stands to lose around $400 (earnings for an
8-hour day) plus the cost of repairs – say $1600 in total. Intangible costs also need to be
considered: if a customer has their work delayed as a result, they may decide to send their work
elsewhere in the future.
If you live in an area that is prone to power surges, common sense would dictate that you
purchase a power surge protector. Let’s suppose that the probability of a power surge occurring
is 1 in 120 on any given day, or roughly three times a year.
Use the following simple formula to estimate how much should be spent to safeguard against
power surges.
This means that it would be viable to spend $13.60 per incident or $40.80 per year on protection
against power spikes.
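The formula itself is not reproduced in this sheet. A minimal sketch, assuming the common expected-loss form (single incident cost multiplied by the probability of the event), is shown below. The function and variable names are illustrative. With the figures from the example it gives roughly $13.33 per incident and $40 per year, slightly below the $13.60 and $40.80 quoted above, so the original calculation presumably rounds the probability differently.

```python
def expected_loss(single_incident_cost, probability):
    """Amount worth spending on protection: incident cost times probability."""
    return single_incident_cost * probability


single_incident_cost = 1600  # dollars, from the home-worker example
probability = 1 / 120        # chance of a power surge, from the text
incidents_per_year = 3       # "roughly three times a year"

per_incident = expected_loss(single_incident_cost, probability)
per_year = per_incident * incidents_per_year

print(f"per incident: ${per_incident:.2f}")
print(f"per year:     ${per_year:.2f}")
```

On either set of figures, a $10 surge protector costs less than the amount the calculation says is worth spending in a single year.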
So why do so many people not bother to buy a power surge protector? When it comes to risk
analysis, people often seem to adopt an air of optimism – the ‘it will never happen to me’
syndrome. Interestingly, it is the computer user who has already suffered a loss who buys these
devices – a case of shutting the stable door after the horse has not just bolted but died!
This attitude can also be found in business. There are many managers who are willing to live
with a risk rather than spend the money on something that may never be needed.
Available options
In choosing risk prevention and recovery options to employ you need to consider:
how critical the system is and how far the organisation relies on it
the surrounding infrastructure and how susceptible the organisation is to a risk event
the existing procedures and controls used and how these may be enhanced
the equipment that may be available to prevent or recover from the event
the number of risk events or systems a particular option may cover
what the option will cost and how much the organisation is prepared to pay
When you have completed the analysis and considered the risk minimisation options, the
findings are compiled in a report to be submitted to management for approval.
Once the risk minimisation strategy has been approved it has to be acted upon and the
equipment and procedures put in place.
Consider the following disaster prevention and recovery strategies available for a typical home
computer user.
save work every few minutes
regularly back up files according to their importance
use external backup devices such as tape, zip or CDs
store important files away from the home, possibly at the office
use UPS or surge protectors especially in areas prone to power problems
use telephone surge protectors with modems
install and update anti-virus software
create a repair disk
record serial numbers of all components in case of theft
keep a fire extinguisher in the vicinity of the computer
use only licensed software and store all licenses safely
use passwords and/or encryption to protect confidential files
avoid storing passwords in dial-up settings (while this makes logging in easier, it also makes it
easy for a thief to access your account)
use anti-spyware software and firewalls if connected to the Internet
keep security patches for software (operating systems and applications) up to date.
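As an illustration of the backup strategy in the list above, the following Python sketch copies a file into a backup folder under a timestamped name, so earlier versions are not overwritten. The function name, file names and folder layout are assumptions made for the example. In practice the backup folder would sit on an external device or away from the home.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def backup_file(source, backup_dir):
    """Copy source into backup_dir under a timestamped name; return the new path."""
    source = Path(source)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{source.stem}-{stamp}{source.suffix}"
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    return target


# Demonstration in a throwaway directory (hypothetical file and folder names).
work_dir = Path(tempfile.mkdtemp())
assignment = work_dir / "assignment.txt"
assignment.write_text("draft of the 50-page assignment")

copy_path = backup_file(assignment, work_dir / "backups")
print(copy_path.name)
```

Running the function on a schedule, more often for more important files, matches the advice to back up files according to their importance.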
Strategy report
A Strategy Report recommends the risk prevention and recovery strategies to be applied and
provides a summary of the risk analysis exercise. A typical report includes the following:
The systems covered by the analysis and any other scope definition
You may be preparing a plan for a single system or for several systems. There may already be a
DRP in place to recover from network or hardware disasters. You should define the areas that
this particular plan covers and also what it does not cover.
The systems that were identified as critical on which the analysis has been based
It is normal to focus on the most critical systems first since, if these are protected, the same
processes will often also protect other systems. You should ensure that readers of your report
understand how you arrived at your findings and why you concluded that a particular system is a
critical one.
Since you need to persuade managers to spend money on the DRP, you should describe in vivid
terms the impact of a disaster on profitability, cash flow and customer relationships to achieve a
dramatic effect.
Current security and control of these systems
If the system is already in production then you should summarise the current situation and
identify strengths and weaknesses.
Your recommendations may be based on little or no change in the environment. However future
business developments may have an impact on your solution. For example, you may propose to
use one of the organisation’s sites as a backup site to minimise the cost. If this site is scheduled
for closure, then your plan may not be practical.
This summarises the risk analysis activity that should have been fully documented.
Details of the findings of the risk analysis outlining the method used to determine probability of
events occurring.
Costs should be expressed in such a way that they will capture the attention of the managers reading
the report. For example ‘If the system is down for 30 minutes we would lose $1,000,000 in
revenue!’
Having described the problems, you can now show how you can solve them. You may need to
provide alternative approaches and options, for example, the facilities provided by a hot site and
a cold site.
This is what the managers will be keen to find out. What is the value of the benefits of the
proposed solutions against the cost? You should also include intangible benefits (for example,
improved customer service) in your analysis even though these are difficult to quantify in dollar
terms.
Recommendations
Develop your argument into a recommendation. It may also be worth discussing what may
happen if the recommendations are not followed.
Action items and activities required to implement recommendations
To show that you have thought through the proposal, you should describe how the DRP will be
implemented if it is approved.
There are several examples of reports on the Internet and these can be found by following the
suggested links in References.
A well thought out, logical, cost-effective report should have no trouble being approved.
However, if funding is tight, management may still prefer to save money in the short term by not
implementing any high-cost recommendations. In this case it is prudent to ensure that the
minutes of the meeting clearly show that management decided on this course of action.
Your presentation (using PowerPoint or other software) should include the following:
Getting approval
You may think that you have developed the best DRP in the world. However, you might present
it to management only to have it rejected. Why could this happen?
As you prepare to write your report and/or present your case consider the following issues:
Use appropriate language. If the audience is made up of non-IT managers, avoid technical terms.
Use business terms and try to show the impact on individuals. For example, ask the payroll
manager ‘What would be the impact of an incorrect tax calculation?’ Ask the production
manager ‘How long could you go without raw materials before laying off workers?’
All major decisions that management makes are usually based on a business case. Basically
this explains the current situation, what the problems are and how to solve them. Express your
argument in values and key performance indicators. Most organisations focus on the profit of the
business. Explain what impact a disaster would have on this.
Disasters occur all the time. Perhaps your business recently suffered a power outage. What
happened and how did it cope? Use this and other real-life incidents to demonstrate that these
things do happen. Carry out research and have the facts available to back up your argument.
The business may need to meet certain legal or statutory obligations. How embarrassing would it
be for a hospital if patients’ records were disclosed? What would happen if Tax File Numbers
were not kept secure?
The heart of the business case is what the DRP will cost and what benefits will be gained. The
problem with DRPs is that if a disaster never occurs, it can appear to be a waste of money. Are
there any benefits to be gained as a by-product? For example, could encryption be
used as a marketing tool to attract more security-conscious customers?
Where will the money come from? Can the costs be spread over a period of time? What can be
achieved without cost or simply by reconfiguring the system? Can each business department be
individually billed? Don’t forget to add the cost of the project team carrying out the DRP.
Provide alternatives
Managers like choices. Give them options but don’t give them so many that they get confused.
Keep it simple. Remember, they can still decide to go with the risk and not put in place any
recovery or preventative strategy.
Describe the threats and the problems but quickly move on to your suggested solutions and the
associated benefits. Managers want to hear solutions not problems. If you follow these guidelines
then you should get the desired response.
The minutes of the meeting and a summary of any changes made to the proposal will form the
basis of the DRP that will be implemented.
If your recommended DRP has been modified by management or they have chosen one of a
number of alternatives that you suggested, you should produce a new document that reflects
these decisions. This should then be signed off as the DRP.
Having a plan to follow when things go wrong is also important to business continuity. A plan
makes it easier for a business to return to production as soon as possible. Statistics show that
without a plan the business would most likely fail.
In this topic we will look at how to translate an agreed disaster prevention and recovery strategy
into a detailed process, procedure and resource plan. The plan is then used to recover from a
disaster of any magnitude be it minor, major or total devastation.
You should have already formulated preventive and recovery strategies. Formulating preventive
and recovery strategies requires:
These can be categorised as:
We will examine a few of the built-in facilities of Windows XP Professional and how these may
be used to safeguard against different risk events. These are summarised in the following table:
Controls such as passwords and access permissions may be referred to as logical controls.
Here we are primarily concerned with systems in terms of software, data and hardware.
However, the security and controls that are implemented at the physical site are also an important
consideration in the risk analysis.
While encryption and user access can be used to prevent unauthorised access, no-one should be
able to physically access a computer in the first place. The following diagram will give you an
idea of some of the levels of physical security that may be applied.
An organisation in a secure building with locked doors on each floor with security guards and
video cameras can be confident that an intruder would find it difficult to access a PC and the
confidential data it contains. However, many frauds and errors are perpetrated by trusted
employees. That is why there is still an ongoing need for logical controls and passwords for each
user.
Activity
based manual. After designing the recovery requirements, you will often need to update this
manual to include the changes required to prevent or recover from a disaster.
As mentioned earlier, many risk events are also security threats which are often identified during
a security audit or review. Similarly, review and investigation of the current procedures also
form part of the Disaster Recovery Planning process to ensure that they meet DRP requirements.
1. Identify key DRP issues that should have been resolved by the existing processes and
procedures
2. Review and evaluate the operational policies to ensure that they meet the demands
imposed by the DRP
3. Design a series of tests to verify that procedures are in accordance with these policies
4. Carry out the testing and document the results
5. Evaluate the findings and make any recommendations for changes or approve the current
processes.
The procedural changes required will depend upon what is discovered and the DRP strategy
adopted. Here are a few examples:
A set of procedures for the disaster recovery plan itself will also be required.
tapes Appropriate backup software.
Mirrored disks or RAID | Additional disks or disk subsystems.
Fault tolerance systems, duplicated systems | Requires similar hardware to that being duplicated. If a file server is to be duplicated, a matching machine will be needed. May also require additional software licenses.
Virus checking | Virus software licenses for all users.
Think about the hardware and software that would be required by the home user to implement
the disaster prevention and recovery strategies identified earlier. These might include:
Backup tape unit (or zip drive or CD writer), tapes (or zip cartridges or CDs), appropriate
backup software and hardware drivers
UPS and/or surge protectors for power and telephone
Virus-checking software
Fire extinguisher.
Identifying the required hardware and software and developing implementation plans to install
them forms part of the DRP project. One aspect of the risk analysis and recovery plan is to
identify cost-effective options to meet a variety of threats. These would have been approved as
part of the plan. However, before implementing these, you will need to select particular
products which meet both business requirements and cost constraints.
Sometimes the required resources cost much more than was originally estimated. In this case,
you will need to revisit the DRP and submit a new recommendation for approval. In extreme
cases (say, for example, the original cost for a hot site was estimated as $300,000, but turned out
to be nearer $1,000,000) management may decide to drop that contingency option and accept more
of the risk, opting instead for a cold site costing only $200,000.
Precise requirements and costs are documented only when current operational procedures have
been reviewed and gaps between the ideal and what is actually agreed upon have been identified.
Once the report has been approved, it is time to put it into action. This may involve changing
existing policies and procedures and purchasing new hardware and software.
Large organisations may follow specific risk analysis methods and use specialised tools. You can
still carry out an analysis by identifying the asset to be safeguarded, possible risks to that asset,
the cost to the company of the risk event occurring, the likelihood of such an event and the cost
of prevention or recovery.
The output from the risk analysis is an action plan to make changes to the current way of
working in order to minimise or prevent the risk and a disaster recovery plan so staff know what
to do should the risk event occur.
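The elements of a simple risk analysis described above (asset, risk event, cost of the event, likelihood, and cost of prevention or recovery) can be combined into a rough ranking. The following is a minimal sketch rather than a formal risk analysis method. It assumes that exposure is estimated as the cost of the event multiplied by its yearly likelihood, and every asset, figure and name in it (`Risk`, `annual_exposure`, `worth_mitigating`) is a hypothetical illustration.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    asset: str                # the asset to be safeguarded
    event: str                # the risk event that threatens it
    event_cost: float         # cost to the company if the event occurs
    annual_likelihood: float  # estimated occurrences per year (an assumption)
    mitigation_cost: float    # yearly cost of prevention or recovery measures

    def annual_exposure(self) -> float:
        """Expected yearly loss: cost of the event times its likelihood."""
        return self.event_cost * self.annual_likelihood

    def worth_mitigating(self) -> bool:
        """True when the measure costs less than the loss it guards against."""
        return self.mitigation_cost < self.annual_exposure()


# Hypothetical figures for illustration only.
risks = [
    Risk("File server", "Disk failure", 40_000, 0.2, 2_000),
    Risk("Office", "Fire", 500_000, 0.01, 8_000),
    Risk("Client data", "Virus infection", 12_000, 0.5, 1_000),
]

# Rank the risks so the largest expected loss is considered first.
ranked = sorted(risks, key=lambda r: r.annual_exposure(), reverse=True)

for r in ranked:
    action = "mitigate" if r.worth_mitigating() else "review or accept"
    print(f"{r.asset:12} {r.event:16} exposure ${r.annual_exposure():>9,.0f}  -> {action}")
```

A ranking like this only supports the decision. Management may still choose to accept a risk, or to fund a measure that the simple comparison does not favour, for legal or reputational reasons.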
The Disaster Recovery Co-ordinator, with input from upper management, is responsible for
deciding when to activate the disaster recovery plan. If the co-ordinator is not available,
responsibility flows down the chain of command. This is why it is important for roles and
responsibilities to be clearly defined in the Disaster Recovery Plan. A contact list should be
created and maintained containing details of all employees, including after-hours phone numbers. If the
organisation already has an internal directory listing, it can be modified accordingly.
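The way responsibility flows down the chain of command can be pictured as a walk through an ordered contact list. This is only a sketch of the idea: the `Contact` fields, the `who_activates` function, and the names and phone numbers are all hypothetical, and a real plan would keep this information in the maintained contact list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Contact:
    name: str
    role: str
    after_hours_phone: str
    available: bool = True


def who_activates(chain: list[Contact]) -> Optional[Contact]:
    """Return the first available person, in chain-of-command order."""
    for contact in chain:
        if contact.available:
            return contact
    return None  # nobody reachable: the plan should say what happens next


# Hypothetical chain of command, most senior first.
chain = [
    Contact("A. Example", "Disaster Recovery Co-ordinator", "555-0101", available=False),
    Contact("B. Example", "Deputy Co-ordinator", "555-0102"),
    Contact("C. Example", "IT Administrator", "555-0103"),
]

decision_maker = who_activates(chain)
print(decision_maker.role if decision_maker else "No one available")
```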
Figure 2 Example of a generic structure for disaster recovery
Introduction
Purpose
Scope
Authorities (what legal/contractual requirement the DRP complies with)
Record of change
Operations
Systems description and architecture (a general description of all the systems)
Responsibilities (detailed outline of teams responsible for recovery operations)
Activation phase (initial actions to detect and assess damage)
Recovery phase (processes and procedures to complete recovery of each system with
nominated staff positions responsible for each task)
Details of the post-recovery review to be performed after the completion of the recovery
from any declared disaster.
An example DRP
A small firm of accountants consisting of two partners and four assistants operates from a house
converted into offices. The building has fire alarms installed but no other fire or security devices.
Computers are allocated to each staff member and networked to a file server. The equipment,
which is around two years old, has been reliable to date but is now out of warranty and the
original supplier is no longer in business.
The server has a tape backup unit but no one knows how to use it.
MYQB accounting system to process the accounts for 150 clients. Average charge per
client is $1,500 per account. Work for each client is carried out throughout the year.
PAT tax return system used with 2,000 clients. Average charge to a client is $150 per
return. Most work is done from September to December.
FIN to manage the financial affairs of 100 clients. Average charge per client is $2,000.
Work is carried out four times a year in March, June, September and December.
TIM for time recording of staff and billing of clients. Weekly time recording and monthly
billing.
TRUS to keep trust accounts for 200 clients. Average charge is $1,000 per annum. Work
is carried out in March and September. This software was developed by one of the
assistants.
Excel for spreadsheet work
Word for word-processing
Windows ME and a NetWare server.
All users have access to the Internet and use e-mail to communicate with clients. Turnover of
staff is quite high and all the assistants have been with the firm for less than a year. To minimise
the need to administer the network, everyone signs on with the same user ID and password. This
has been the situation for over two years.
If you were asked to undertake a risk analysis for this office you would need to take into
account the following issues.
There are peak periods when risk events could cause greater damage
Confidential data is kept about clients
There are legal requirements to meet deadlines
Clients could readily change accountants if service is poor.
Major risks
The firm generates a lot of income from the use of computers. If we consider the worst possible
case then during peak periods they could earn an income of $18,750 from accounting work,
$75,000 from tax returns, $50,000 from FIN work, and $100,000 from trusts. This amounts to
nearly $250,000 a month or $60,000 a week or $12,000 a day! If the server were down and little
work could be done, this is the amount of revenue they could lose.
This is an over-simplification, however, since work would most likely just be delayed and staff
could work overtime to catch up. On the other hand, if the problem resulted in the late payment
of taxes, and the firm were held liable, they could end up paying a lot more in fines. Either way,
this exercise demonstrates that the systems are critical.
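The worst-case figures in this section can be reproduced with a short calculation. The sketch below assumes that each system's yearly fees (clients multiplied by average charge) are spread evenly over the months in which that work is done, and that a month has four weeks of five working days. These assumptions are inferred from the rounded figures in the text, not stated in it, and the variable names are illustrative.

```python
# Each entry: (number of clients, average charge per client, months in which the work is done)
SYSTEMS = {
    "MYQB accounting": (150, 1_500, 12),  # work carried out throughout the year
    "PAT tax returns": (2_000, 150, 4),   # September to December
    "FIN": (100, 2_000, 4),               # March, June, September, December
    "TRUS trusts": (200, 1_000, 2),       # March and September
}

WEEKS_PER_MONTH = 4        # assumption used for the weekly figure
WORKING_DAYS_PER_WEEK = 5  # assumption used for the daily figure


def peak_monthly_income(clients: int, charge: float, active_months: int) -> float:
    """Yearly fees for one system, spread over the months the work is done."""
    return clients * charge / active_months


monthly = {name: peak_monthly_income(*details) for name, details in SYSTEMS.items()}

total_month = sum(monthly.values())
total_week = total_month / WEEKS_PER_MONTH
total_day = total_week / WORKING_DAYS_PER_WEEK

for name, income in monthly.items():
    print(f"{name:16} ${income:>10,.0f} per peak month")
print(f"Total: ${total_month:,.0f} a month, ${total_week:,.0f} a week, ${total_day:,.0f} a day")
```

The exact totals are slightly different from the rounded figures quoted in the text, which is to be expected for an estimate of this kind.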
Actions to be taken
Table 5: Actions
Risk | Action
PC failure and breakdown | Consider the need for standby PCs and/or server. Negotiate service and support contracts for 24/7 quick response service. Implement backup procedures as a priority.
Theft | Consider office security involving the use of alarm systems and window shutters or bars. Implement backup procedures as a priority.
Fire | Consider use of a fireproof safe. Implement backup procedures as a priority including off-site backup.
Client data security breaches | Assign each user an individual ID and password. Improve security procedures and staff training. Ensure appropriate access is implemented. Consider the use of encryption for sensitive data. Consider implementing auditing of file level access.
Software support problems | Review quality of support provided for TRUS system. Consider outcomes if staff member leaves. Negotiate for the source code or consider escrow arrangement.
Loss of data | Implement backup procedures as a priority.
This document describes the procedures to be followed in the event of a major disaster, such as a
fire, that completely destroys the building.
In order to recover from a disaster it is important that staff complete regular recovery and
operational procedures. These are listed below.
All client files must be stored on the file server. Only files of a temporary or personal
nature should be kept on the local hard disk.
Every evening, a full backup of the file server is to be completed. Instructions for
carrying out the backup are provided in the folder next to the backup tapes in the storage
room. The tape rotation scheme is also documented.
The backup takes several hours and will run automatically overnight. First thing each
morning the IT administrator will check the backup report to ensure that it finished
correctly.
The tape will then be stored in the fireproof safe until 4 pm. At that time a courier from the
off-site storage agent will collect the tape. Full procedures for recording the dispatch
of these tapes are available.
If the IT administrator is away from the office, the chief financial officer will arrange for
someone else to carry out their task.
Every quarter, backups are verified to ensure the backup tapes are readable and data can
be recovered.
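One way to carry out the quarterly verification is to restore a tape to a scratch folder and compare the restored files against known-good copies. The sketch below is an assumption about how such a check could be scripted, not part of the firm's documented procedure. The function names and the demonstration files are hypothetical, and the comparison is only meaningful for files that have not changed since the backup was taken.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum a file in chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(original_dir: Path, restored_dir: Path) -> list[str]:
    """Return a description of each file that is missing or differs after a restore."""
    problems = []
    for original in sorted(original_dir.rglob("*")):
        if not original.is_file():
            continue
        relative = original.relative_to(original_dir)
        restored = restored_dir / relative
        if not restored.is_file():
            problems.append(f"missing: {relative}")
        elif sha256_of(original) != sha256_of(restored):
            problems.append(f"differs: {relative}")
    return problems


# Demonstration with throwaway folders standing in for the server and a test restore.
with tempfile.TemporaryDirectory() as tmp:
    live = Path(tmp, "live")
    restored = Path(tmp, "restored")
    live.mkdir()
    restored.mkdir()
    (live / "clients.txt").write_text("client records")
    (live / "trust.txt").write_text("trust accounts")
    (restored / "clients.txt").write_text("client records")
    (restored / "trust.txt").write_text("trust accXunts")  # simulated damaged data
    demo_problems = verify_restore(live, restored)

print(demo_problems or "Restore verified")
```

An empty result means every file was restored intact. Anything else should be recorded and followed up before the tape is relied upon.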
Diaries
All staff must record their day’s activities in their diary. This is used for billing purposes
and is also part of the Disaster Recovery Plan.
Details of what is to be recorded are contained in the Employee Handbook.
Each evening staff must update the time recording system and take their diaries home
with them.
A copy of these instructions is kept at each partner’s house and at the bank. Whoever
arrives at the office first (normally a partner) will call the appropriate site to get these
instructions.
All staff are to meet at the local RSL. They are not to go home! The firm has an
agreement with the RSL that it can hire one of their rooms.
The IT administrator will call the off-site agent and arrange for the latest backup tape to
be delivered.
The firm has an arrangement with a computer hire company to hire computers in order to
resume business as soon as possible. These will be set up at the RSL and the IT
administrator will rebuild the file server and restore the data from the backup tape. While this will still take
several hours, the firm should be able to start operating again by lunchtime.
The backup tape will be one day out of date and staff must now review their diaries to
identify the work they had done on the previous day.
If the IT administrator is not available that day, responsibility for the recovery procedures
will pass down the chain of command.