
Raval • Fichadia

John Wiley & Sons, Inc. 2007

Systems Availability and Business Continuity
Chapter Four

Prepared by: Raval, Fichadia


Chapter Four Objectives
1. Understand system availability and business continuity, and recognize the differences between the two.
2. Comprehend incident response systems and their role in achieving the system availability objective.
3. Explain the objectives of disaster recovery planning and its design, implementation, and testing requirements.
4. Comprehend the link between business continuity and disaster recovery.
5. Understand the role of backup and recovery in disaster recovery plans.
[Figure: Concept map of system availability and business continuity. System availability is a technology-focused concern, impacted by incidents or breaches that warrant a response: incident detection and protection. Business continuity is a business process-focused concern, interrupted by disasters that warrant recovery, which is designed as two stages: backup of systems resources and data, which permits recovery when necessary.]
Power outage at Northwest Airlines
- A thunderstorm and lightning at the datacenter location caused the problem.
- Systems were down initially and operated in a degraded manner the next morning.
- Checking passengers in for their flights took a very long time.
- NWA switched to manual processes. Lines grew longer, and so did departure delays.
- Arrivals were late as well: because departures from gates at the destination airport were delayed, arriving flights had to wait before they could get to a gate.
- NWA announced an embargo, limiting its operations to what it could handle under the circumstances.
System Availability and Business Continuity
- System availability helps assure that the business will continue to operate.
- Business continuity is necessary for systems to add value on an ongoing basis.
- The issues of business continuity and system availability are related and even overlap to a degree.
Incident Response
- Incident: A level of interruption in system availability that appears to be temporary.
- An incident may be triggered by an accidental action of an authorized user, or it may result from a threat.
- Incidents may be detected by:
  - End users, who may describe the symptoms but not the cause.
  - Those monitoring systems and processes, who may detect anomalies indicating that an incident has occurred.
- Attack: A series of steps taken by an attacker to achieve an unauthorized result.
- Event: An action directed at a target that is intended to result in a change of state, or status, of the target.
  - An event consists of an action and a target.
Nature of Response to an Incident
- Assess the business significance of the incident’s impact.
- Identify critical business processes that might have been compromised.
- Determine the root causes of the incident. This can be challenging, because each incident may be of a different kind. The team may need to consult experts from outside the team.
- Training in forensics can help the team collect and evaluate evidence systematically.
- Standard procedures must be followed for restoring the affected systems and processes, instead of ad hoc, one-off attempts to restore what was compromised or lost.
Preventive Measures
- Prevention is better than cure, and it can be more cost-effective.
- Preventive measures require anticipating or predicting what incidents might occur and what compromises might follow from them.
- Lessons learned from the organization’s own experience and from others’ experiences can help design and implement effective preventive measures.
Incident Response Team
- A multi-skilled group, since an incident may be of any variety and may impact almost any information asset.
- May include representatives from human resources, legal, information systems, networks and communications, physical security, information security, and public relations.
- A member of top management may be designated as a direct contact for counseling and support.
CERT
- CERT stands for Computer Emergency Response Team.
- Also called the CERT Coordination Center (CERT/CC), it is the Internet’s official emergency team.
- Provides alerts and offers guidelines for handling and avoiding incidents.
- Is located at Carnegie Mellon University.
- www.cert.org
Disaster Recovery
- Disaster: An event that causes a significant and perhaps prolonged disruption in system availability.
- Disasters can be man-made or natural.
  - Man-made disasters can be malicious or unintentional.
- Disaster recovery is a systematic effort to recover from the impact of a disaster.
- The best way to understand recovery is to focus on the post-disaster phases:
  - Immediate response
  - Near-term resumption
  - Recovery toward normalization
  - Restoration to pre-disaster state
Post-disaster phases, illustrated with one example event: a logic bomb that destroyed the operating system and customer data.

Immediate response
  Objective: Address the emergency situation only.
  Example: Call customers whose orders are yet to be filled. Determine the current state of the system and data. Call in backup tapes and equipment to a warm site. Begin manual processing of critical orders.

Near-term resumption
  Objective: Resume operations at any level possible.
  Example: Install equipment, load the operating system and applications, restore the data, and test outputs. Switch to automated processing.

Recovery toward normalization
  Objective: Expand operations and extend capabilities and functionalities.
  Example: Expand the order processing cycle. Increase functionality (e.g., report generation).

Restoration to pre-disaster state
  Objective: Return as close to the original (pre-disaster) state as possible.
  Example: Load the operating system, data, and applications at the original site. Pre-test. Resume processing in a parallel run with the warm site. Cut over to the original site. Fold operations at the warm site and return the equipment.
Timeliness of Action and Value of Recovery
- Timeliness of action
  - The planned timeline of actions should reflect the value of each action at that time.
  - Certain steps can wait, while others must be taken without delay to minimize losses.
- Value of recovery
  - Timeliness of action reflects the value of the recovery target.
  - With this in mind, recovery tasks should be systematically assigned to each post-disaster phase.
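Figure 4.2. Relationship between timeliness of action and value of recovery
[Plot of value of recovery (high to low) against timeliness of action (high to low). The phases lie in the order Respond, Resume, Recover, Restore, running from high timeliness and high value to low timeliness and low value.]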
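One way to make this assignment systematic is to rank tasks by how soon delaying them would cause unacceptable loss, and to place each task in the earliest phase that can meet that deadline. The sketch below assumes hypothetical phase cut-offs (4 hours, 48 hours, 14 days) and hypothetical task deadlines loosely based on the logic bomb example above; real values would come from the firm's own analysis.

```python
from enum import IntEnum


class Phase(IntEnum):
    IMMEDIATE_RESPONSE = 1
    NEAR_TERM_RESUMPTION = 2
    RECOVERY_TOWARD_NORMALIZATION = 3
    RESTORATION = 4


# Hypothetical cut-offs: hours after the disaster by which each phase's tasks must be done.
PHASE_CUTOFFS = [
    (Phase.IMMEDIATE_RESPONSE, 4),
    (Phase.NEAR_TERM_RESUMPTION, 48),
    (Phase.RECOVERY_TOWARD_NORMALIZATION, 14 * 24),
]


def assign_phase(hours_until_unacceptable_loss: float) -> Phase:
    """Place a task in the earliest phase whose cut-off covers its deadline."""
    for phase, cutoff in PHASE_CUTOFFS:
        if hours_until_unacceptable_loss <= cutoff:
            return phase
    return Phase.RESTORATION  # tasks that can wait the longest


def plan(tasks: dict[str, float]) -> dict[Phase, list[str]]:
    """Group tasks by phase, most urgent (highest value) first within each phase."""
    schedule: dict[Phase, list[str]] = {phase: [] for phase in Phase}
    for name, hours in sorted(tasks.items(), key=lambda item: item[1]):
        schedule[assign_phase(hours)].append(name)
    return schedule


# Hypothetical deadlines (hours) for tasks from the logic bomb example.
schedule = plan({
    "Call customers whose orders are yet to be filled": 2,
    "Begin manual processing of critical orders": 4,
    "Restore data and test outputs": 36,
    "Increase functionality (report generation)": 120,
    "Cut over to the original site": 720,
})
for phase, names in schedule.items():
    print(phase.name, names)
```

A shorter deadline stands in for higher value of recovery here, matching the idea that the most time-critical actions come first.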
Disaster Recovery Planning (DRP)
- DRP: The definition of business processes, their infrastructure supports, and their tolerances to interruption, and the formulation of strategies for reducing the likelihood of interruption or its consequences.
- Component steps of DRP:
  - Define the process.
  - Identify what supports the process and its tolerance to interruptions.
  - Determine and implement strategies that reduce the likelihood and consequences of interruptions.
Disaster Recovery Planning (DRP)
- Assessing potential losses: Disaster Impact Analysis
  - What disasters is the firm likely to face?
  - What is the probability of each type of disaster?
  - What is the impact of each disaster on the firm?
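The last two questions are often combined into an expected loss for each disaster type: expected annual loss = annual probability × impact. The sketch below assumes both quantities can be expressed as numbers (a probability per year and a dollar impact); the scenarios and figures are invented for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DisasterScenario:
    name: str
    annual_probability: float  # estimated chance the disaster occurs in a given year (0 to 1)
    impact: float              # estimated loss to the firm if it does occur, in dollars

    @property
    def expected_annual_loss(self) -> float:
        # Probability times impact: a common single-number summary of exposure.
        return self.annual_probability * self.impact


def rank_by_expected_loss(scenarios: list[DisasterScenario]) -> list[DisasterScenario]:
    """Order scenarios from largest to smallest expected annual loss."""
    return sorted(scenarios, key=lambda s: s.expected_annual_loss, reverse=True)


# Invented figures, for illustration only.
scenarios = [
    DisasterScenario("Datacenter fire", 0.02, 2_000_000),
    DisasterScenario("Power outage from lightning", 0.5, 200_000),
    DisasterScenario("Flood", 0.01, 5_000_000),
]
for s in rank_by_expected_loss(scenarios):
    print(f"{s.name}: ${s.expected_annual_loss:,.0f}")
```

A single expected-loss figure can hide rare but catastrophic events, so a ranking like this is a starting point for judgment rather than a substitute for it.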
Disaster Recovery Planning (DRP)
- Value-based recovery planning
  - Definition of criticality and the criteria used to determine it
  - Identification of critical business processes and their supports
  - Identification of the role of information systems resources in each critical process
  - Determination of process owners and process customers
  - Determination of how long the business can survive without the process after a disaster
  - Identification of interdependencies between the process and the rest of the business processes and systems
- To find critical processes, consider attributes such as importance, key users, tolerance to outage, waiting time between cycles, and the possibility of data recovery.
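A simple way to apply these attributes consistently is a weighted score. The sketch below assumes each attribute is rated from 1 to 5, with 5 always meaning "pushes toward critical" (so a process with very low tolerance to outage gets a 5), and uses hypothetical weights and a hypothetical threshold of 3.5; a firm would set its own criteria when it defines criticality.

```python
# Hypothetical weights for the attributes listed above; they sum to 1.0.
WEIGHTS = {
    "importance": 0.30,
    "key_users": 0.15,
    "outage_intolerance": 0.30,     # 5 = the process cannot tolerate any outage
    "cycle_urgency": 0.15,          # 5 = very short waiting time between cycles
    "data_unrecoverability": 0.10,  # 5 = lost data cannot be reconstructed
}


def criticality_score(ratings: dict[str, int]) -> float:
    """Weighted average of 1-5 ratings; higher means more critical."""
    missing = WEIGHTS.keys() - ratings.keys()
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    return sum(weight * ratings[attr] for attr, weight in WEIGHTS.items())


def is_critical(ratings: dict[str, int], threshold: float = 3.5) -> bool:
    """Hypothetical cut-off separating critical from non-critical processes."""
    return criticality_score(ratings) >= threshold


# Hypothetical ratings for two processes.
order_processing = {"importance": 5, "key_users": 4, "outage_intolerance": 5,
                    "cycle_urgency": 4, "data_unrecoverability": 3}
newsletter = {"importance": 2, "key_users": 1, "outage_intolerance": 1,
              "cycle_urgency": 1, "data_unrecoverability": 2}
print(is_critical(order_processing), is_critical(newsletter))
```

Raising an error on missing ratings keeps a process from looking non-critical simply because an attribute was never assessed.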
Disaster Recovery Planning (DRP)
- Disaster recovery strategies
  - How do we recover a system, given its priority?
  - Address the question by system component:
    - Data (e.g., designate off-site storage)
    - Processing (e.g., back up current copies of the software and store them off-site)
    - Network and communication (e.g., back up a copy of the current network configuration and store it off-site)
    - Dependencies with other systems (e.g., identify how these processes will be interfaced after the disaster)
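For the data component, the essential mechanics are making a copy away from the primary site and confirming that the copy is usable before relying on it. The sketch below is a minimal illustration only: a second local directory stands in for off-site storage (a real off-site store would be physically separate), and SHA-256 checksums from Python's standard `hashlib` module verify both the backup and the restore. The function names `back_up` and `restore` are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, offsite_dir: Path) -> Path:
    """Copy a file to the (stand-in) off-site directory and verify the copy matches."""
    offsite_dir.mkdir(parents=True, exist_ok=True)
    copy = offsite_dir / source.name
    shutil.copy2(source, copy)  # copy2 also preserves metadata such as timestamps
    if sha256_of(copy) != sha256_of(source):
        raise OSError(f"backup of {source} failed verification")
    return copy


def restore(copy: Path, destination: Path) -> None:
    """Copy a backup back into place and verify the restored file."""
    shutil.copy2(copy, destination)
    if sha256_of(destination) != sha256_of(copy):
        raise OSError(f"restore to {destination} failed verification")


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "orders.csv"
    data_file.write_text("order_id,amount\n1,250\n")
    saved = back_up(data_file, Path(tmp) / "offsite")
    data_file.unlink()  # simulate loss of the primary copy
    restore(saved, data_file)
    restored_text = data_file.read_text()

print(restored_text)
```

Verifying at both ends reflects the backup-then-recovery sequence: a backup that has never been restored successfully offers little assurance.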
[Figure: Concept map of disaster recovery planning. The potential for disasters requires assessing potential losses, which results in finding criticality and a value-based recovery plan. The plan is linked to recovery strategies, which are used to form recovery teams and to select recovery locations; both teams and locations are tested for disaster readiness.]

DRP: Recovery Locations
- Recovery location: The site or sites where processes and systems will be recovered after a disaster.
  - Hot sites: Near-perfect replicas of the operations.
  - Cold sites: Just the infrastructure (computer operations room, platform for installing hardware, power and communication lines, cabling, etc.).
  - Warm sites: More than a cold site, but not quite as ready as a hot site. For example, a warm site may include commonly used computers and operating systems.
  - Reciprocal agreements: Sharing of similar resources among organizations in the same or similar computing environments.
  - Colocations: Recovery is planned using the computing resources available at the firm’s many locations.
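The choice among these options is largely a trade-off between how quickly a site can take over and what it costs to maintain. The sketch below assumes hypothetical activation times and cost ranks for hot, warm, and cold sites, and picks the least costly option that can be operational within a process's tolerance to outage; reciprocal agreements and colocations are left out for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteOption:
    kind: str
    hours_to_operational: float  # hypothetical time until processing can run there
    relative_cost: int           # hypothetical rank: 1 = least costly to maintain


# Illustrative figures only; real values depend on contracts, equipment, and staffing.
SITE_OPTIONS = [
    SiteOption("hot site", 4, 3),
    SiteOption("warm site", 48, 2),
    SiteOption("cold site", 21 * 24, 1),
]


def choose_site(max_tolerable_outage_hours: float) -> SiteOption:
    """Return the least costly site that is operational within the tolerance."""
    feasible = [s for s in SITE_OPTIONS
                if s.hours_to_operational <= max_tolerable_outage_hours]
    if not feasible:
        raise ValueError("no site option is fast enough; revisit the recovery strategy")
    return min(feasible, key=lambda s: s.relative_cost)


for tolerance in (8, 72, 1000):
    print(tolerance, "->", choose_site(tolerance).kind)
```

Cost appears only as a rank here; a fuller analysis would weigh actual site costs against the expected losses identified in the impact analysis.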
DRP: Teams
- The purpose of forming teams is to ensure that recovery tasks are accomplished in an orderly and responsible manner.
- The number and nature of teams can vary across organizations.
  - However, each team should have the knowledge and skills necessary to perform its assigned tasks.
- Recovery teams can be organized by recovery phase.
- Flexibility in assignments is necessary, because an actual disaster may require adjustments to the teams. It is also likely that some team members will be unavailable when disaster strikes.
DRP: Disaster Readiness
- Meaning of readiness: Having assurance that if and when a disaster strikes, the firm is highly likely to recover from it. Testing the plan is crucial to obtaining this assurance. Disaster readiness practices include:
  - Walkthroughs: A plan preparer walks others through the plan to show how it leads from point A to point B.
  - Rehearsals: An “as-if” exercise that simulates a disaster’s impact and has the people responsible recreate the recovery of “lost” processes and systems.
  - Compliance (live) testing: An actual test of recovery with a simulated disaster.
Business Continuity Planning (BCP)
- BCP: The totality of plans made to recover the business operations following a disaster.
  - Recovery of all operations is involved, not just information assets.
- Methods and strategies adopted for BCP are comparable to, and often overlap with, those used in DRP.
Business Continuity Planning (BCP)
- Business impact analysis is an exercise in risk assessment.
  - Identify vulnerabilities of the firm.
  - Assess the business impact:
    - Focus on a particular disaster and determine the processes that might be affected, and/or
    - Analyze all business processes to assess the probable business impact if a disaster strikes.
  - Initiate a planning process to develop methods and strategies to mitigate risk.
- Business recovery
  - Approaches and methods for business recovery are similar to those discussed in disaster recovery planning.
Assurance Considerations
- Any assurance that a BCP/DRP will be effective requires examining the plan from three angles:
  - Method: Review the method followed in developing the plan. A sound planning process makes a complete and reliable plan possible.
  - Content: The plan’s content should have been collected from the “right” participants, and the instruments and methods used to collect the data must be valid. The plan should be current.
  - Testing: Critical components of the plan should be tested, results should be documented, and corrective action, where necessary, should follow.
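The testing angle implies a record of each test and a rule for when corrective action follows. The sketch below assumes a test is judged by comparing the recovery time observed in a rehearsal or compliance test with the recovery time the plan targets; the class `RecoveryTest`, its field names, and the sample results are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RecoveryTest:
    component: str         # plan component that was tested
    test_type: str         # e.g. "rehearsal" or "compliance"
    tested_on: date
    target_hours: float    # recovery time the plan commits to
    measured_hours: float  # recovery time actually observed in the test

    @property
    def passed(self) -> bool:
        return self.measured_hours <= self.target_hours


def corrective_actions(results: list[RecoveryTest]) -> list[str]:
    """List a follow-up item for every test that missed its target."""
    return [
        f"{r.component}: recovered in {r.measured_hours} h against a "
        f"{r.target_hours} h target; revise the plan and retest"
        for r in results
        if not r.passed
    ]


# Hypothetical documented results.
results = [
    RecoveryTest("Order database restore", "compliance", date(2007, 3, 10), 24, 30),
    RecoveryTest("Warm site network cut-over", "rehearsal", date(2007, 3, 10), 8, 6),
]
for action in corrective_actions(results):
    print(action)
```

Keeping every result, including the passing ones, gives reviewers the documented evidence the assurance review calls for, not just a list of failures.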