
Course Overview

Hi everyone. Welcome to the Business Continuity, Disaster Recovery, and Incident Response for the

Certified in Cybersecurity course. This certification prep course will help you prepare for the Certified

in Cybersecurity examination. In this course, we're going to cover the skills measured in the three

sections of the business continuity management system as listed in the exam outline. This domain

counts for about 10% of the total exam. This is the second of the five domains for the CC

examination, and it addresses the areas of business resilience and even survival. My name is Kevin

Henry. I'm an educator and security professional, and I've been developing and teaching information

security courses for over 20 years based on my own years of practical experience in the field. This

course will address the key topics related to the principles of incident management and response,

business continuity planning, and disaster recovery planning. This course is supported by several

reference and exercise files. To download the exercise files, navigate to the Exercise files tab and

click on the Download button. In the exercise files, you'll find a helpful study guide that you can use

to follow along during your certification prep for the CC exam. The study guide contains a glossary, a

list of key points to remember, and some sample questions. I'm happy to join you on your

certification prep journey with the Business Continuity, Disaster Recovery, and Incident Response

for the Certified in Cybersecurity course, here at Pluralsight.

Incident Response

Business Continuity, Disaster Recovery, and Incident Response Intro

Welcome to the second domain of the Certified in Cybersecurity Certification course. This domain is

entitled Business Continuity, Disaster Recovery, and Incident Response. These three elements

make up the business continuity management system and are crucial to helping organizations

prepare for and manage the many problems and challenges that all organizations face. This domain

represents 10% of the examination content, which makes it the lowest weighted domain in the exam.

Consider this to be a free 10% in the exam. We'll cover these topics in a way that makes them
logical and understandable so you're well prepared for the exam questions. This domain is divided

into three sections, incident response, business continuity, and disaster recovery. Bad things

happen. Every organization must be prepared to face adversity and unexpected problems. Power

outages, hacking, employee errors, storms, and equipment failures are some of the many types of

incidents that can and do affect business mission and operations. The secret to managing a crisis is
to be prepared: have a plan, in fact, many plans, each designed to deal with a different type of
incident. Most incidents can be resolved quickly, allowing the resumption of normal business activity,

but sometimes an incident requires a more detailed response, a business continuity plan to enable

the continuation of critical business processes. And a severe crisis may require a disaster recovery

plan to rebuild services at perhaps another location. These plans work together to ensure life safety,

always the first priority, and the identification and containment of the incident and the ability to return

to normal as quickly as possible. I hope you enjoy this domain. Let's get started with incident

response.

Incident Response

Let's take a look at incident response. The outcomes of a business continuity management system

are that we have plans in place for incidents through incident response planning which address

things like life safety, containment of the incident, documentation of the incident, and the ability to

return to normal operations. Business continuity planning is based on a business impact analysis,

the critical business functions, the recovery time objective, the data recovery point objective, and the

requirements to enable recovery of systems. Disaster recovery planning addresses the relocation of IT and

other services to an alternate location. When we look at an event, an event can be defined as any

measurable occurrence, something happened, somebody walked in, somebody walked out. That's

an event. An incident is a type of event with a potential to affect business mission. In other words, we

could call it an adverse event. All incidents are types of events, but certainly not all events are types

of incidents. Our goal is to build resilient systems. We see that term used a lot today; business
resilience means we can continue operations even during adverse circumstances. We

have response plans in place to address especially things that have happened in the past. If it's
happened before, there is a chance it could happen again. We also have to know what are the

current trends and threats, what are the types of attacks being used today? Is today's problem, say

ransomware or DDoS attacks? We should know what the current, should we say, tool of choice of

hackers is. And of course, we should look at areas of change because everything worked well until

we made a change. In many cases, it's when we have a change in staff, a change in procedures, a

change in equipment that we get more incidents as well. Incident management is a structured

process that starts with preparation. Let's be prepared in case something happens, then we can

prevent it as much as possible if we know the things that can happen. But we have to be alert to the

fact that things can still happen, even though we are prepared and have prevented, so we need

good detection. When something happens, we need to stop it from spreading, and that, of course, is

containment. Then we want to get back to normal, restoration, and apply lessons that were learned.

We can see, for example, a fire is an example of an incident. We're prepared by having equipment

and alarms and smoke detectors. We try to prevent fires through good practice of not overloading

electrical circuits or having dangerous circumstances that could lead to fire, but we have those

detectors so if there is a fire we'd know about it. The first thing we want to do if there is a fire is to

contain it, stop it from spreading, close fire doors, for example. After the fire is out, we need to

rebuild, restoration, and then, of course, learn how could we make sure that this doesn't happen

again. The idea of preparation starts with policy. Do we have policies about how to deal with things

and who has the authority if there is an incident? So it's not the case that, in a crisis, everybody
is wondering, well, who can make the decisions? Who's in charge? We have defined team members,

each with their own role, and of course, with the procedures of how we would do things. We want to

make sure that everything is documented because when we have things documented, we'll be able

to go back and review what went well, what could we improve on, for example. And of course, we

want to have regular reporting back to management and our customers and employees of what is

the current status of the incident. The idea of prevention, of course, is to have learned what are the

things that could happen, so hopefully we reduce the vulnerabilities or reduce the likelihood of

something happening again. The better we can be at prevention, the better we can be hopefully at

avoiding having to deal with incidents at all. We know that a lot of this is learning from what the
bad guys are doing. The types of attacks they're using are the things I should especially be watching for,
in other words, offense drives defense. We need to monitor and know what's happening on our

systems, networks, applications, and users. We see far too often that the problem is the attack had

gone on for months and nobody recognized it because nobody knew what was normal activity. We

should test our controls to make sure they're working, and certainly we should have awareness

programs so people know what to watch for and what to do if something happens. The key points

review. The secret to incident management is preparation, manage the incident and don't let the

incident manage you. Prevention is better than recovery, and learn from past incidents how to be

better prepared.
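To make that structured process easier to remember, here is a minimal sketch in Python, purely my own illustration and not part of the exam outline, that lists the phases of incident management in order:

from enum import Enum

class IncidentPhase(Enum):
    """Phases of the incident management process described above."""
    PREPARATION = 1      # policy, teams, procedures, documentation
    PREVENTION = 2       # reduce vulnerabilities and the likelihood of incidents
    DETECTION = 3        # monitoring, alarms, and alerts
    CONTAINMENT = 4      # stop the incident from spreading
    RESTORATION = 5      # return to normal operations
    LESSONS_LEARNED = 6  # post-incident review and improvement

# Walk the phases in order, for example as an agenda for a tabletop exercise.
for phase in IncidentPhase:
    print(phase.name.replace("_", " ").title())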

Detection

Of course, we want to try to prevent incidents, but we have to be ready for when they happen. We

need to detect incidents, and this can come through the use of various tools and technology: for
example, detecting a change in behavior on a system, a network, or a user; looking for signatures of known
types of attacks, which we often see with, for example, malicious code; or using heuristics, a type of
artificial intelligence that tries to learn when there is something that may be undesirable. We use

alarms because they can notify us if something has gone wrong, and alerts can come in
that allow us, along with employees, customers, and suppliers, to be aware of a problem and, of
course, to communicate with these outside parties as well. One of the things that can be important

is to do audits and reviews of how well we've handled incidents in the past. What are the things we

could learn? When it comes to incident detection, the first line of defense is often the help desk. They

are the first people who become aware of people calling in and saying we're having a problem. They

have trouble tickets, and we should look for trends and patterns in the types of problems that people

are having. Alerts come in from our various monitoring systems, maybe a security information event

management system, for example. But when something goes wrong, our first priority must always be

life safety, looking after our employees, our customers, and certainly, the community around us as well.

But then we have to do some analysis of the incident. The analysis of the incident should lead to a

classification. Is this really an incident, or is it just noise, something not really serious that we could call
a false positive? Or is it a true positive, an incident, something we need to then
immediately take action on? The identification of it as a real incident should lead to the classification
of whether this is just a minor problem, or whether it is serious or even catastrophic, something that could affect the
whole organization. Depending on the classification, we can determine whether or not

it's just an internal problem or something has come from outside. Was it something that was done

intentionally or just accidentally? And then, of course, we activate the appropriate response teams. If

it's a minor incident, maybe just a few people are involved, but if it's catastrophic, it could be that we

activate teams right up to the senior management level and even our public relations group as well.
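As a rough illustration of that triage logic, the sketch below uses severity labels and team names that are my own assumptions rather than terms mandated by the exam outline:

def classify_incident(confirmed: bool, scope: str) -> str:
    """Classify a detected event based on whether it is real and how far it reaches."""
    if not confirmed:
        return "false positive"  # just noise, no incident response needed
    severity_by_scope = {"single system": "minor",
                         "department": "serious",
                         "organization": "catastrophic"}
    return severity_by_scope.get(scope, "serious")

def teams_to_activate(severity: str) -> list[str]:
    """The more severe the classification, the further up we escalate."""
    teams = ["help desk", "incident response team"]
    if severity in ("serious", "catastrophic"):
        teams.append("IT operations")
    if severity == "catastrophic":
        teams += ["senior management", "public relations"]
    return teams

print(teams_to_activate(classify_incident(confirmed=True, scope="organization")))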

We want to contain incidents so we can contain the bad effect or adverse impact of the incident, and

often, we'll do this through things like isolation. We have a system that's infected, we disconnect it

from the network. In the case of a fire, we close fire doors, we disable network connections, we put a

system into quarantine so we can examine and see what's going on. And then quite often, this is

where we'll use a sandbox. A sandbox means we put, for example, malware or an infected machine
into a secure environment, often a virtual machine, where we can watch its execution and see what it's trying to do, but it is
limited to that area so it can't infect or spread to other systems. One of the
things we will often do is power the system down, which gives us a chance to stop it from
continuing to generate whatever type of malicious activity it's doing. We then sometimes, in a minor

incident, might just monitor. For example, we have things like honeypots where we can try to watch

the type of activity, or we see something that's going wrong and it's not something which is spreading

quickly, but we can monitor so we can see whether or not there is something going on and how that

is developing. We can learn maybe some of the behavior, the tools and techniques of the attacker.

Some of the considerations when we want to contain or stop something from spreading depends on

whether or not this is a critical system. If this is a system that is critical to business operations,

maybe I can't power it down or isolate it. We also have to look at whether this is something that's going to
spread or something which is just, for example, in one area and not going to start infecting other

systems or networks. We also said, in some cases, we'll allow an attack to continue because we're

trying to gather evidence, we're trying to learn what's actually been going on so hopefully we can

improve our response and protection. The key points review. Incident management starts with

preparation, but then follows up with the ability, the watchfulness, so we detect any type of an

incident. Then we need to classify the incident so we can respond appropriately.
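The containment considerations just described, criticality, likelihood of spreading, and whether we are deliberately gathering evidence, could be sketched as a simple decision; the wording of each action is illustrative only:

def containment_action(critical: bool, spreading: bool, gathering_evidence: bool) -> str:
    """Pick a containment approach that mirrors the considerations discussed above."""
    if gathering_evidence and not spreading:
        return "monitor the activity (for example, via a honeypot) to learn the attacker's tools"
    if spreading:
        return "isolate: disconnect from the network, quarantine, or move to a sandbox"
    if critical:
        return "monitor closely; avoid powering down a business-critical system"
    return "power the system down and examine it in a sandbox"

print(containment_action(critical=False, spreading=True, gathering_evidence=False))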


Eradication

Once the incident has been detected, classified, and we've tried to contain it, we want to then

eradicate the problem. In this case, eradication means we remove the damage, the damaged system

or software, and rebuild the system maybe from backups or making sure that we have a clean

backup that is not infected as well and apply any patches that were missing that maybe allowed the

attack to happen in the first place. In some cases today, the problem is that many of the attacks will
actually affect the hardware itself, and there have been a number of cases, especially with
ransomware, where it was necessary to replace the hardware because it was impossible to
remove the infection reliably. The idea of restoration is we want to get back to normal,

and of course, part of getting back to normal is to recover the things that are most important first. We

set out timelines and priorities for recovery. It's important, though, that we don't just get back to

normal and become re-infected. So we need to take steps to make sure that we've identified the

actual root cause of the initial infection and taken steps to prevent that from happening again. We've

talked a number of times about documentation and sometimes the documentation of the incident is

the most valuable thing we have. It outlines the steps and procedures we are to use in the recovery

process, but then it also documents what we did so that we can make sure that we can review it,

what went well, what could be improved, are there decisions that would have been easier to make if

we'd had more information, for example. So we keep this documentation in order to assist in

reviewing the feedback, and of course, future incidents. If we've already addressed this problem

once, it's really good if we know how we did it and we don't have to reinvent the wheel and try to find

out how to make that same repair again or repeat even the same mistakes again. Reporting is

important. We should obviously report when the incident is over so that everybody knows that

this is now finished and completed. But part of the report should include our analysis and

assessment of the incident, what caused it. It could be more than one thing. It could be many small

things, not one big thing. We often say that the problem is that organizations look too much for the

trigger, but the trigger was just the spark that started it. That was a small part. There were many

other things that led up to the incident before maybe that spark or trigger happened. We document

and report on what we did. How did we fix the problem? And certainly from all of that, we assess

how the staff responded as well. Not everybody is good during a time of stress, and we want to know
who are the people that do work well and excel when it's a time of stress, so those are key people on

our teams. All of this should result in lessons learned. Now the problem, of course, with many

organizations is that by the time the incident is over, they didn't document anything, and therefore,

they don't learn what they could have learned from it. The key points review. We need to have

incident response plans because incidents will happen, so it's a critical capability required for every

business, but we also need senior management support when there is an incident. It's not that

everybody is guessing what should we do, but the senior management supports the plans we have

in place. We know that the plans should be detailed and action-oriented and should list the

procedures we will follow and it should be required that everybody follows those procedures. All of

the team members should be properly chosen, trained, and equipped to be able to do their job in a

crisis time. And certainly, incident response should link to our other plans as well, such as business

continuity, disaster recovery, and human resources plans.

Post-incident Review Practices - Lessons Learned

The final step in this incident response and incident management module is to review what we

learned from this incident. In other words, we conduct a post incident review and apply lessons

learned. When we review, we should look at what went well. We certainly want to continue the things

that went well, but we also want to do a very truthful self-assessment of what could be

improved. We want to know who demonstrated competence and the appropriate demeanor. Did

people get angry and argue during the middle of the crisis? Who were the ones that displayed

leadership, the ability to make good decisions, rational decisions, in the middle of chaos? One of the

things we'll sometimes do is we'll do a review right following the incident when the emotions are still

high, everybody's still a little bit, shall we say, agitated, and that's often called a hot wash. Let's

hear, right now, what happened. The next step is to do a cold wash, to go back later and look at it in

the cold light of dawn, and now that people have had a chance to recover, think about it, and sort of

say, okay, what do we think now that we've had a little more time to reflect on it? Both are important

because sometimes, by the cold wash, we have lost some of the things that we knew about at the
time. But in a hot wash, we weren't always using the most rational thinking either. The idea of lessons
learned is to improve our preparation, improve our plans, improve our teams, make sure we have

the right tools and training that are needed, improve our prevention through things like enhanced

controls and improve our detection. I remember talking with one company that had a major breach,

and as they said, the one thing that they learned was they weren't even monitoring the right things.

They had monitored many things, but they didn't monitor the things that would have told them about

that breach. And certainly, we have to look at whether or not our containment really worked. Was it

an effective response? A lot of this comes down to awareness, letting people know what we can

learn, what they can do, certainly making the whole situation real for them as well, and addressing the

lessons learned through our various awareness sessions. One of the things is that we want

everybody on the staff to be a part of our security team and have a security culture so they are

conscious of the types of threats that are out there and know what to watch for. In summary, every

incident contains key learning points that the organization can learn from. We often say the problem

is trying to extract those small little flakes of gold from the mountain of rubble of the actual incident

itself. We want to improve our incident response so we're better prepared for future incidents.

Business Continuity

Let's continue with this Business Continuity, Disaster Recovery, and Incident Response for the

Certified in Cybersecurity certification with a more detailed look at business continuity. Earlier on, we

saw this definition, business resilience, a common word being used today, and it can be defined as

the ability to continue operations, even during adverse circumstances, so this is the heartbeat or the

main thrust of a business continuity program: continuing operations, not just recovering.

We saw before that incident response is very often the first step, but when it's a severe incident, it

might trigger the need to implement and start to use business continuity plans. The outcomes of the

business continuity management system were to have an incident response plan focused on life

safety, containment, documentation, and return to normal, but then to have a business continuity plan

focused on business impact analysis, critical business functions, recovery time objective, the data

recovery point objective, and the recovery requirements. When we looked at disaster recovery
planning, we're looking at a catastrophic event that meant we had to relocate IT and other services

to an alternate location. Business continuity is, quite simply, project management. It starts with project

initiation, then moves on to business impact analysis. Based on the business impact analysis, we'll

select our recovery strategy. Then we write plans for how to implement that recovery strategy in the

event of a serious incident, but we know that all plans need to be tested. We need to roll it out,

communicate it so that everyone is aware of what to do in a crisis, and certainly through testing, we

train our staff, and we also find any flaws in the plan. Every type of use of the plan, whether it's a test

or a real incident, will allow us also to learn more about how to make the plans better and maintain

the plan. The heartbeat of business continuity is understanding the business, and this is a process

known as business impact analysis, or BIA, and it could easily be said this is the critical

and most important step in the actual business continuity planning process. Through business

impact analysis, we identify what is critical, the critical business functions, processes, for example,

that are going to have the most impact on the profitability, the reputation, and operations of the

organization. Some departments are more important than others. For a while, I worked in internal

audit, and believe me, we weren't a critical process. Most of the business thought they'd run better

without us, but the ones that are important need to be identified so that's where we set our priorities.

We also need to know what are the critical supporting processes in order to support those critical

business functions. In other words, the dependencies that critical business functions have on

supporting processes. When we want to recover a business process, we need to know what we

need in resources, people, data, facilities, equipment, and supply chain. The BIA allows us to

determine our priorities for recovery. Let's look at how this all works. We have the element of time

and business impact analysis is all about impact over time. In that way, it's different from risk

management because when we looked at risk management back in the Security Principles course,

we saw that risk was based on impact and likelihood. So here, we're looking at impact over time, so

very much an overlapping type of supporting process, but slightly different from a risk assessment.

Over time, the business is running as normal, normal operations, but then one day, we encounter a

crisis. As a result of that crisis, our level of business drops to 0. We're no longer producing a product,

we're no longer meeting our mission. Now, immediately we should start to determine what is the

impact of that inability to operate our business over time, and we can see that it quite often will
grow almost exponentially the longer the outage lasts. Over the first few hours, people understand if we've got a little

bit of an outage, but the longer it goes, the greater the damage to our reputation and finance

becomes. Now, this is different for different business processes. Obviously, if this is the life support

system, this is measured in minutes, not in hours or days. One of the things we try to determine

through all of this is when the level of impact would be high enough that we actually encounter

business failure, the business has to shut down. We are unable to continue business operations.

We've lost the confidence of our customers, our owners, our bankers, for example, and that point in

time at which we would encounter business failure can be called the maximum tolerable downtime.

Sometimes we'll hear that called the maximum tolerable period of disruption. In the old days, we

used to hear it called maximum allowable downtime. I think sometimes they change the name just to

keep us all a little confused. So we look at all the business processes of the organization. We said

that some are more critical than others, and we want to know what are the critical supporting

processes for each of the critical business processes as well. We'll quite often then group these together. There is

no way to recover a business process without also recovering its supporting processes, so our

recovery plan should look at recovering both of them, should we say, concurrently. We can say it

simply this way, you cannot recover essential services without recovering supporting processes. One

of the things we need to learn is what will our owners, what will regulators, and what will our

customers tolerate? These would be tolerable levels of outage. We all know that, in some cases, the

customer will say, oh yeah, sure, your systems are down, I'll call back in an hour. In other cases, we

will lose the customer. So this is where we have to understand what our customers expect. Are there

regulations that say we must provide a certain level of service bound by say, government

regulations? All of these can help us determine the point of business failure, something we called

before the maximum tolerable downtime for those critical processes and their supporting processes.
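As a purely hypothetical illustration, the maximum tolerable downtime can be thought of as the first point at which the impact of the outage exceeds what the business can survive; the cost curve and threshold below are invented numbers, not figures from the course:

# Invented cost curve: impact grows roughly exponentially with outage duration.
def impact(hours_down: float) -> float:
    return 10_000 * (1.5 ** hours_down)

failure_threshold = 500_000  # assumed loss at which the business fails

hours = 0.0
while impact(hours) < failure_threshold:
    hours += 0.5
print(f"Maximum tolerable downtime is roughly {hours} hours for this process")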

Then we want to determine what is our ideal time of recovery, and this is known as the recovery time

objective, and we will have different recovery time objectives for different processes. The requirement,
of course, is that the recovery time objective must be, in fact, significantly less than

the maximum tolerable downtime. I don't want to write a plan that would have me recover my critical

business process an hour before the business would fail. The other thing we have to look at is the

recovery point objective, and I always call it the data recovery point objective because what this
refers to is what is my data recovery point. I'm really saying that if I have a major interruption, how

much data can I afford to lose. So really what this measures is the amount of data that can be lost in

the case of an outage and how old the data would be when it's restored. When we looked at the

resource requirements, we need to identify what would be required in order to restore systems. Now

that, as we said, also included some of our supporting processes, our dependencies, but also it

includes things like the controls we put in place that could be added to try to make sure that this

doesn't just happen again right away. So let's go back to that diagram we looked at before. The idea

here of BIA was that we determine what was the level of impact over time until the point of business

failure. Then we want to say, okay, what would it cost for us to recover the business? Now, the cost

of recovery is often inversely related to the duration of the outage. In other words, I could have a very

minimal amount of, should we say, outage time, but then the cost of the recovery is very high. So in

most cases, instead, we will try to find more of that crossover point at which point we could say the

cost of recovery is, should we say, in line with the impact. This is where we want to set our

recovery time objective. So we write plans to try to recover these critical business processes by this

point in time. But when I recover, say after a fire that wiped out my head office, I have to go to my

data backups and maybe I did data backups on a regular basis, but the time of the failure was not

the same as the time of my last data backup. So, when I rebuild my systems, I am going to have to

use the most recent backup I have, which quite simply means that quite likely all of the data from the

time of the last backup until the time of the crisis will actually be lost data. All of this allows me to set

out my priorities and plans for recovery. I establish the priorities for system recovery based on

cost, as well as the level of impact to the business. And of course, I must have a plan which is

feasible, not unrealistic; I can't recover a major system in a few minutes. It must be something which
is acceptable, acceptable to, should we say, our customers, our owners, and management, something

which is suitable for the type of business we're in. And of course, this is something that quite often is

a little bit contentious. We'll have a lot of different people think, well, my department is most

important so you should recover my department first. In the end, we need to go back to senior

management and hope that they will approve the actual choices we've made for which parts of the

business should be recovered first. The key points review. It's business impact analysis that provides

us the information we need in order to move ahead with selecting our recovery strategies and writing
plans. It's critical to the business continuity planning process. It identifies all of the critical business

processes, documents the resources required to restore those processes, and gives us now the

ability to choose restoration timelines. It sets out our priorities, and through this, helps us to move on

so we can write effective plans for business continuity.
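Pulling those BIA terms together, a plan can be sanity-checked in a few lines of Python; the processes, hours, and backup intervals below are invented purely to show how the recovery time objective, maximum tolerable downtime, and recovery point objective relate:

# Each entry: (process, MTD in hours, planned RTO in hours, backup interval in hours).
# The backup interval approximates the recovery point objective: the worst-case
# amount of data, measured in time, that could be lost.
processes = [
    ("order processing", 24, 8, 1),
    ("payroll", 72, 48, 24),
    ("life support", 0.25, 0.05, 0.0),  # measured in minutes, not hours or days
]

for name, mtd, rto, backup_interval in processes:
    assert rto < mtd, f"{name}: the RTO must be well below the MTD"
    print(f"{name}: recover within {rto} h; worst-case data loss about {backup_interval} h")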

Data Preservation and Recovery

One of the essential resources required to recover our IT systems today is really data, and we have

to be prepared so we are able to recover the data if we have a major outage. That means we have

some type of data preservation plan. We use this term the recovery point objective. The recovery

point objective means we won't lose too much data and that recovery point objective determines, or

in many cases, influences or drives what our backup strategy should be. Do we back up our data to

the cloud so that it should be available from an offsite location if our head office burned down? Do

we actually have some type of storage area network with say internal hard drives or some type of

removable storage that we could put off in a secure location, say every day? Do we mirror our data

on two different, should we say, systems, maybe even geographically dispersed locations? Do we

take all of our data, say once an hour, and write it off into an electronic vault, or maybe every 1,000
transactions, so it goes offsite, and if there were a problem with the primary site, well, the most I would
ever lose is the 1,000 transactions that have happened since the last time I did a vault. When we

deal with databases, whenever we make a change to a database, we write a little journal entry that

allows us to recover the actual changes made to the database, even if the database was corrupt or

failed. The thing is that if that journal is just kept on the same system that failed, it's probably going to

be lost as well. So we will write that journal off to another location. We'll take a full database backup

on a regular basis, and we can apply those journals to bring the database right up to the time of the

failure, minimizing the amount of actual data loss. We want to build our systems to be resilient. That

means, quite often, fault tolerance. We put in things like, for example, duplication and redundancy of

equipment and networks so that if one failed, the others will still be able to keep going, and one of

the solutions to that is a cluster. Maybe I have a number of servers working together and all of them

sharing the load. If one goes down, the others just keep on processing and should have a very
minimal impact on our customers and users. We build high-availability systems, systems where

we've built in the ability to fail over if a piece of equipment fails, for example. We also make sure we
have the appropriate levels of quality of service, which ensures that we have the bandwidth and the
storage we need for our processing to actually be handled. In summary, in this module, we set

out the foundation for continuity of operations. Our goal is to ensure the organization is prepared to

deal with and manage disruption to business mission and operations. This is so that we can sustain

the critical business operations through proper preparation and planning.
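Here is a minimal sketch of the full-backup-plus-journal idea described a moment ago, using simplified stand-in data structures rather than any particular database engine's API:

# Recover a database to the time of failure by restoring the last full backup
# and then replaying the journal (the log of changes) kept at another location.
full_backup = {"balance": 100}                 # state at the last full backup
journal = [("balance", 120), ("balance", 95)]  # changes journalled since then

def restore(backup: dict, journal_entries: list) -> dict:
    db = dict(backup)                    # start from the clean backup copy
    for key, value in journal_entries:   # replay each journalled change in order
        db[key] = value
    return db

print(restore(full_backup, journal))     # -> {'balance': 95}, the state at failure time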

Disaster Recovery

Let's continue looking at Business Continuity, Disaster Recovery, and Incident Response for the

Certified in Cybersecurity course. Now let's take a look at the third part of this, disaster recovery. We

looked earlier at this slide about the outcomes of a business continuity management system, and we

said the first of the three parts, incident response planning, was concerned with life safety,

containment, documentation, and getting back to normal. Business continuity planning was based on

the business impact analysis, the critical business functions, the recovery time objective, the data

recovery point objective, and the various recovery requirements. Now when we look at disaster

recovery planning, we're looking primarily at the relocation of IT and other services to an alternate

location. Our primary location has been damaged, we can't use it, so we need to recover by

rebuilding systems, for example, our processes, at another place. When we choose that other
place, which we could call our recovery site, there are a number of factors used in
determining what is an appropriate recovery site. For example, how quickly do I need to recover?

If it's an 8-hour drive away, that's maybe not something that's going to work if I need to recover in 4

hours. So the recovery time objective drives the site selection, but we also know that if I need to

recover very quickly, it's probably going to cost me more as well. So in some cases, the fastest

recovery would be having redundant sites, if one fails, the other is still running, but that doubles my

cost of operation. So quite often, we choose a less expensive option, such as a warm site. We also

have to look at how are we going to prioritize our systems recovery. We want to prioritize by
recovering the most critical business processes first. Now most critical could be from a financial

perspective or it could be from a reputational perspective as well. We also realize there are

challenges. If I have a recovery site too far away, it could be difficult to manage when I have

employees and systems at different sites based, of course, on process criticality. So the selection of

that contingency site is going to bring in a number of factors such as what would it cost, what's its

availability, can I be sure it's there when I need it, and will it help me meet my recovery time

objective? I want it close enough, but not so close that it could be affected by the same threat that

damaged my primary site, so proximity is a consideration. We want to have a site which is secure so

we don't have to worry about other problems, for example, relocating to a site which itself would be

under an immense threat. We also have to worry about employees. They need to get to that site,

and logistics is often missed in disaster recovery plans. How can my employees get to this alternate

site? If they have to work there for the next 6 months to a year, that may not be so easy if that site is

hours' drive away and there is no public transit available. We also want to make sure we have

support, whether we're discussing power, fire, police, ambulances, or food; all of these are

important for the recovery site as well.
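One hedged way to compare candidate recovery sites against the factors just listed (cost, availability, ability to meet the RTO, proximity, security, and logistics) is a simple weighted score; the weights and candidate sites below are invented for illustration only:

# Scores run from 1 (poor) to 5 (good); the weights reflect an assumed judgement
# that meeting the recovery time objective and security matter most.
weights = {"cost": 2, "availability": 3, "meets_rto": 5,
           "proximity": 3, "security": 4, "logistics": 3}

candidates = {
    "warm site two hours away": {"cost": 4, "availability": 4, "meets_rto": 4,
                                 "proximity": 4, "security": 4, "logistics": 3},
    "fully redundant hot site": {"cost": 1, "availability": 5, "meets_rto": 5,
                                 "proximity": 3, "security": 5, "logistics": 4},
}

def score(site: dict) -> int:
    return sum(weights[factor] * site[factor] for factor in weights)

best = max(candidates, key=lambda name: score(candidates[name]))
print(best, score(candidates[best]))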

Writing the Plan(s)

Writing the plan. Now, we usually say writing the plan, using the singular, often a business
continuity plan, but for a large organization there are quite often 100 different plans. Our recovery in

the case, for example, of a fire, is very different than it is in the case of malware, for example. But we

write plans to deal with the various types of situations we could expect to face. A plan should be
thorough; it should address all types of situations. Yes, there can always be things that
happen that we didn't expect, but if I've written good plans, those can be adjusted to whatever type of

incident this is. We get the team together because we want the business continuity plan and disaster

recovery plans to address all areas, not just IT or not just the business, but we have to look at

everything from finance to operations and logistics. A plan should be a series of steps and actions.

We should try to minimize verbiage. We don't want a person to have to read pages of documentation

in the middle of a crisis. Instead, we want them to read and say do this, then do this, check, do this,
check off, and all these things mean we move towards the actual resumption of business processes.
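Since the plan should read as actions to check off rather than pages of prose, it could be represented as simply as the sketch below; the steps shown are placeholders of my own, not a prescribed checklist:

# An action-oriented plan is just an ordered checklist, which also makes it easy
# to report milestones and progress during the crisis.
plan = [
    {"step": "Confirm life safety and account for all staff", "done": False},
    {"step": "Activate the incident response team", "done": False},
    {"step": "Contain the incident and isolate affected systems", "done": False},
    {"step": "Notify management and regulators as required", "done": False},
    {"step": "Begin recovery of critical business processes", "done": False},
]

def progress(checklist: list[dict]) -> str:
    done = sum(1 for item in checklist if item["done"])
    return f"{done}/{len(checklist)} steps complete"

plan[0]["done"] = True
print(progress(plan))  # -> 1/5 steps complete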

We should write the plan for what we often call a worst-case scenario, the most resource-intensive
situation, because then we can always use just a part of the plan if it's not a worst-case scenario; that
means any type of lesser incident or situation would still be addressed

in that plan. One of the problems we have is that during a crisis, we have an elevated level of risk.

We know that, for example, many of the normal controls we would have had in place, separation of

duties, for example, are missing. We have people making decisions that go beyond their
normal budgetary authority. So this is an elevated security risk we also have to watch for. We

want to have teams that are ready to go. We assign roles and responsibilities, as well as, of course,

the leaders, but for every leader, there should be a deputy, a person who can fill in if that leader was

not available. Ideally, we want to have people on the teams that understand more than just their area

so that if another team was in some ways impaired from being able to do their job, there is

cross-training and there is support that can be provided. An important thing in a crisis is to have clear

leadership and lines of reporting. We define who's in charge, who makes the decisions, who talks to

the media so that we have good and clearly understood reporting relationships, and it's not the case that

everybody's just doing whatever they think is best. We need to ensure that the people on our teams

have the appropriate training so they can execute their responsibilities, as well as the tools they

would need in order to do their job. An important part in any crisis is communication, communication

with our employees, managers, our customers, all of the stakeholders, or in other words, all of the

people who could be affected by this crisis. We want management to know what's going on so they

can provide direction and certainly answer questions from the media. We quite often have to report

to government and regulatory agencies, let's say if we had a spill of diesel fuel or some other type of

environmental issue, or even an injury to an employee or a customer. We want to communicate with

our customers so they have confidence that we are there to support and help them and it's not such

that we are going to disappear and their warranties are now worth nothing. This is especially

important when we're dealing with a privacy breach. We want all of our customers to be confident

that we have done everything we can to protect their information, but also that we're being upfront

about what had happened and how we will prevent that from happening in the future. We need to

communicate with our suppliers. We quite often rely on them for some of the raw materials we'll need,
and we don't want them to stop shipping those products because they're afraid they'll never get paid.

And of course, our shareholders. By law, in many cases, we have to communicate with our

shareholders all at the same time so they're all aware of what's going on if this is something that

could affect share price. When we talk about reporting, we want to do regular reports on the status of

the crisis to management, and this quite often can be done through an emergency operations center,

the heartbeat or control point where we'll actually manage all the various teams and activities, and

from this point, we can communicate to management what's going on. We should have checklists,

our plans are action-oriented, so we can show milestones and progress we've made towards

addressing various types of systems or issues. And then, of course, we want to get back to normal.

We will call this the process of restoration. To restore to normal means I will recover the business

functions at whatever is now going to be my primary site. Now, normally, when we recovered after

the incident, we recovered our most critical business processes first, but when I restore, I'm going to

recover the less important areas first. That will allow me to test my migration plan, my networks,

and my systems before I jeopardize my most critical business processes by trying to move them into

whatever the new normal is going to be. No plan can be trusted unless it's been tested, and we do

tests of the plan with the intention of finding any deficiencies. The point of the test is to find

something that could go wrong so we can fix it before the incident. The testing also helps us to train

our staff so they develop skills and know how to respond effectively. The test should be thorough,

they should be as accurate and realistic as possible so we know that this is how things would work in

a real world situation. When we test, it's always good to start small. Do some little tests of just

individual processes before we move on to more complex types of tests. One of the problems is that

very often from any incident and from any, we could say, test, there have been lessons that have

been identified. It is important that those become lessons learned. We apply what we learned so we

improve, so the same problem doesn't just happen again. In summary, in this module, we set out the requirements

for disaster recovery planning. This is for the most serious types of incidents that would require

relocation of operations.

Domain Summary
Congratulations on completing the Business Continuity, Disaster Recovery, and Incident Response domain

for the Certified in Cybersecurity examination. Let's do a quick summary of the important things we

covered in this domain. This domain is worth 10% of the examination. It looked at these three areas

and how they relate to each other and how they ensure that our systems will be available for

business to operate in a secure manner. The first step in all of this really is incident response. We

deal with incidents as they happen, some maybe major, some maybe minor, but sometimes we

need to then also invoke a second process, that of business continuity. That is when the duration of

an incident would exceed acceptable timelines, and we need to take steps to keep the business

going, hence the name business continuity. One of the things that we often have to do when there

has been a major disruption is recover things like IT services, and that is why we also have disaster

recovery often seen to be the recovery of IT, even at an alternate location, which in many ways is

kind of a subset of business continuity. We have to remember that when something happens, the

first priority is always life safety. We want to make sure that people are safe, and therefore, that is

the first thing we must address. We can look at how NIST, the National Institute of Standards and

Technology, defined all of these areas of incidents as, first of all, being prepared. We're prepared, we

know what to do, then when we detect something, we already have a plan. We execute that plan to

try to contain the incident and recover from what actually happened. And sometimes as we're trying

to contain, we learn more, we do more detection and analysis until we finally have completed and

eradicated the problem and we can do a review, what did we learn, and what we learned as part of

post incident activity can help us be better prepared for next time. When we looked at business

continuity, we defined a number of key parts of what we're trying to do. We often don't try to recover

everything. We set a priority on critical business functions first, and we do this through that process

we called business impact analysis, in other words, analyzing what that impact of an outage would

be on the business. We also had to determine what our drop-dead deadlines were, the maximum

tolerable downtime, and that is the point by which we had to recover or else maybe we could be out

of business altogether, but that wasn't our goal for recovery. Our goal for recovery was based on the

recovery time objective, that's when we wanted to recover by, and we set that so that we could put in

place a plan to help us to recover the critical business functions by that point in time. We looked at

disaster recovery as recovery of operations at an alternate location which included, of course, the
recovery of the data we needed for the business to run, the personnel required, the equipment that

we required for our business to operate, and of course, looking at things such as where that location

could be as well. So here we've looked at these three important points worth 10% of the exam, and

we can move on to our next steps. Review each of these areas, make sure we understood them and

didn't just memorize them, for example, do the sample questions to ensure we really have

understood the concepts behind them, and then proceed to the next domain, Access Control

Concepts.
