
Root Cause Analysis Fundamentals

Table of Contents

Root Cause Analysis
Why Do Root Cause Analysis?
Who Does Root Cause Analysis?
When to Do Root Cause Analysis
How to Do Root Cause Analysis
Root Cause Analysis Fundamentals
Root Cause Analysis Caveats
Root Cause Analysis Preparatory Tasks
Notices

Root Cause Analysis

A root cause is
• an initiating cause of either a condition or a causal chain that leads to
an outcome or effect of interest
(source: Wikipedia, https://en.wikipedia.org/wiki/Root_cause)

• the highest level cause of a problem; “the evil at the bottom” that sets
in motion the entire cause-and-effect chain causing the problem(s)
(source: American Society for Quality (ASQ),
http://asq.org/learn-about-quality/root-cause-analysis/overview/overview.html)

Root cause analysis is
• a method of problem solving used for identifying the root causes of
faults or problems
(source: Wikipedia, https://en.wikipedia.org/wiki/Root_cause)

• a collective term that describes a wide range of approaches, tools, and
techniques used to uncover causes of problems
(source: http://asq.org/learn-about-quality/root-cause-analysis/overview/overview.html)

• the understanding of the design or implementation flaw that allowed
the attack
(source: FIRST “Security Incident Response Team (SIRT) Services Framework,”
https://www.first.org/_assets/global/FIRST_SIRT_Services_Framework_Version1.0.pdf)

[Distribution Statement A] This material has been approved for public release and unlimited distribution.

**004 So let's begin with some
definitions. What is a root cause?
Cause is often associated with effect,
and you'll see that some of these
definitions of root cause use the
term "effect." Wikipedia defines a
root cause as an initiating cause of
either a condition or a causal chain
that leads to an outcome or effect of
interest. The American Society for
Quality defines a root cause as the
highest-level cause of a problem, or,
as they call it, "the evil at the
bottom" that sets in motion the
cause-and-effect chain causing the
problem.

So root cause analysis is the
analysis of root causes. Wikipedia
defines it as a method of problem
solving used for identifying the root
causes of particular problems or
faults. The American Society for
Quality defines root cause analysis
as a collective term that includes a
wide range of approaches, tools, and
techniques that can be used to
uncover the causes of problems. And
the Forum of Incident Response and
Security Teams, FIRST, has drafted
an initial framework for security
incident response teams that
identifies a number of different
processes, one of which is root cause
analysis. They define root cause
analysis as the understanding of a
flaw, specifically a design or
implementation flaw, though it may
be another type of flaw, that allowed
a particular attack or incident to
occur.

Why Do Root Cause Analysis?

Root cause analysis can benefit other processes and activities in
the overall incident management workflow.
• Response: Understanding the root cause of an incident can
support the development of an appropriate (more focused and
targeted) response.
- Align the response courses of action with the underlying root
cause(s).
• Depending on the circumstances, mitigation (elimination of the
root cause) and recovery might not happen in the short term;
some response actions may be deferred until a later time.
• Prevention and detection: Root cause identification can help in
developing indicators/signatures to better prevent or detect
future incidents.
- Failure to mitigate the root cause(s) can allow new or repeat
incidents to occur.


**005 So why would we want to do
root cause analysis? Well, it can be
very beneficial to other processes
and activities, and it can improve
the incident management and incident
response workflows. In particular,
it happens during the response
process: understanding why or how an
incident occurred can lead you to a
better course of action, with more
focused, more targeted, more
efficient response steps. You can
then align those recommendations or
follow-up actions to target,
eliminate, or mitigate the underlying
vulnerabilities or causes of the
incident in question.

And depending on the individual
circumstances of that incident, there
may be some types of follow-up
responses, such as mitigation and
recovery, that might not happen in
the short term and could be deferred
to a later time.

So in addition to the response
phases, root cause analysis can also
feed back into and benefit the
prevention and detection processes,
because by understanding what
caused an incident, you can then
identify whether there are other
changes that need to be made on
similar systems to detect or even
prevent those incidents from
occurring beyond the affected system
at hand. And if you don't take a
comprehensive approach to
addressing, mitigating, or eliminating
the underlying root causes, the
incident may recur, or it may occur
on other systems too if these
problems aren't addressed.

Who Does Root Cause Analysis?

It depends…
Root cause analysis, at some level, is often performed by CSIRT
incident analysts.
Many teams might not have a defined approach or process for
formally conducting root cause analysis.
• Informal (ad hoc) root cause analysis may be performed at
lower levels of effort or rigor, as needed.
• Constituents or system owners with direct access to the
available incident information may be more capable and likely to
perform this analysis than a coordinating CSIRT.


**006 So who does root cause
analysis? Well, it depends.
Oftentimes it may be done by the
people on the computer security
incident response team (CSIRT) who
primarily perform the incident
analysis phases of incident response,
and as we know, incident analysis can
occur iteratively at many different
points during the course of the
incident lifecycle.

Many teams may not have a formal
set of processes for doing root cause
analysis. They may do this in an
informal or ad hoc manner, or they
may even be doing root cause
analysis without realizing this is what
they're doing, just trying to answer
questions and understand what is
basically happening with the incident
so they can move on to the other
steps in following up on responses.

Sometimes it may be your users,
your end-users or system
administrators or other parts of your
constituency, who are performing
some level of root cause analysis and
then providing their interpretation of
what might have caused the incident
to occur to your incident response
team staff. So there's a variety of
different people who might be doing
this, but typically it's the incident
analysts who are doing the further
investigation, trying to identify the
causes of a particular incident.

When to Do Root Cause Analysis

Root cause analysis usually occurs during the detailed analysis
steps of the incident response process.
• to support the development of an appropriate response
But root cause analysis can also occur in conjunction with other
analysis steps anywhere in the incident management lifecycle.
- event/incident detection (confirm/verify whether a possible incident
has occurred)
- triage
- response
- lessons learned (after the incident response for the affected system
has already been completed)


**007 When is it done? Well, again,
generally it's done during the
response process, but it can happen
at other times in the incident
management lifecycle too. Generally
it's done to provide a better set of
response courses of action, in
support of the analysis that has been
done on a particular incident, but
there may even be some level of root
cause analysis in the initial
determination that an incident has
occurred. If you're monitoring your
systems to verify that a SIEM alert
or an intrusion detection alert is an
actual indicator of something
happening versus a false
alarm or false-positive, you might be
having to do some level of root cause
analysis.

When doing the triage of the initial
incident report and trying to
categorize the type of incident, there
may be some high-level root cause
analysis being done to categorize and
prioritize that incident for further
follow-up analysis and response.

Later on in the response process,
after the initial incident response, if
you're coordinating with other people
and getting additional information,
those additional data sources will
require further analysis and may
answer questions that were still
open at the time of the initial
analysis.

And sometimes the root cause
analysis is deferred until after the
incident has been addressed.
Perhaps the initial response actions
were to contain, eradicate, and
recover the system and get it back
online. Then, at a later point in
time, after the response has already
been completed, you may have people
who go and look at the logs and the
information available to try to
understand the underlying cause that
allowed the incident to occur.

So it can occur at various times, but
generally it's during the incident
analysis phase of response.

How to Do Root Cause Analysis

Root cause analysis requires
• a list, catalog, or taxonomy of possible causes or threat vectors
- Use your existing incident and threat categories; add new threat
vectors as they are detected/discovered.
- Adapt the incident/threat categories used by others.
• information sources to identify (confirm or refute) the possible
threat vectors
- Use the same sources as needed in other incident analyses (e.g.,
log files, running processes, network connections, and artifacts).
- Use the results of vulnerability assessments.
• a methodical approach for analyzing the available information to
identify the suspected threat vectors


**008 How do you do it? Well,
there are a variety of different ways,
but there are three primary
requirements that can allow a more
efficient, formalized root cause
analysis process to be performed.

One of the things that'll be very
useful is to have some kind of list,
catalog, or taxonomy of the possible
causes, or you may call them threat
vectors, for different types of
incidents. One way to build this, if
you don't already have a list, is
to look at your past incident
reports. What are the typical types of
activities that your constituents have
reported or seen, or that you've
detected, in the past? Looking at
these, organizing and cataloging
them, and identifying the various
types of causes is one way to map
those causes to the different
activities that you're analyzing.

If you don't already have such a list
or set of terms (you may have to make
up terms), perhaps look at other
organizations and see what types of
incident categories or threat
categories they use; you might be
able to adapt some of their
terminology. It could be that you
have never experienced some of the
incident or threat categories that
others have seen, but you still want
to include them in your incident
category list or taxonomy in case
those incidents happen at some point
in the future.

In addition to having some
descriptions or a list of the
possible causes or threat vectors, you
also need access to various types of
data sources or information sources
to perform the analysis and try to
answer the questions about what is
happening: the methods used, the
timing, and other indicators of the
attack. You may also have other
sources available to you, such as
internal vulnerability assessments or
penetration testing. Some of those
activities can serve as
information sources for the types of
vulnerabilities, threats, weaknesses
that might be used as methods or
mechanisms to cause an incident to
occur.

And finally, to do this, you need
some kind of process or method or
approach for analyzing the variety of
data sources and information that
could be available, to correlate these
and map them to the possible or
suspected threat vectors or causes
for the particular incidents that you're
analyzing.
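
The three requirements described above (a taxonomy of threat vectors, a
set of information sources, and a methodical way to map one onto the
other) can be illustrated with a minimal sketch. Everything in it is an
assumption made for illustration: the category names, the indicator
phrases, the Evidence record, and the candidate_vectors function are
hypothetical and do not come from the course material. A real taxonomy
would be built from your own incident history and categories.

```python
from dataclasses import dataclass

# Hypothetical taxonomy: threat-vector name -> indicator phrases.
# Real categories would come from your own incident and threat lists.
TAXONOMY = {
    "phishing": {"malicious attachment", "credential prompt"},
    "unpatched_software": {"known cve", "outdated version"},
    "weak_credentials": {"brute force", "default password"},
}


@dataclass
class Evidence:
    source: str       # e.g., "log file", "vulnerability assessment"
    observation: str  # free-text finding taken from that source


def candidate_vectors(evidence, taxonomy=TAXONOMY):
    """Map each threat vector to the evidence items that support it."""
    matches = {}
    for item in evidence:
        text = item.observation.lower()
        for vector, indicators in taxonomy.items():
            if any(indicator in text for indicator in indicators):
                matches.setdefault(vector, []).append(item)
    return matches


evidence = [
    Evidence("log file", "Repeated brute force attempts against SSH"),
    Evidence("vulnerability assessment", "Web server runs an outdated version"),
    Evidence("network connections", "Outbound traffic to unfamiliar host"),
]

result = candidate_vectors(evidence)
for vector, items in sorted(result.items()):
    print(vector, "<-", [e.source for e in items])
```

Simple phrase matching stands in here for the methodical approach; in
practice an analyst would confirm or refute each suspected vector
against the sources rather than rely on keywords. Evidence that matches
nothing in the taxonomy (the third item above) is a prompt to consider
adding a new threat vector to the list.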

Root Cause Analysis Fundamentals

By definition, the goal is to identify the root cause of a problem
(e.g., why an incident occurred).
• This focus differs from other types of analysis, such as impact
analysis.
To answer the Why question(s), you often also need to answer
related What and How questions. (Who and When questions may
also be asked.)
• Answering these questions may require the results or
information from other types of analysis, such as
- system analysis
- network analysis
- malware analysis
- vulnerability analysis
- retrospective analysis (What else did the attacker do?)
- trend analysis


**009 So some of the fundamentals
for root cause analysis are-- again, by
definition we're trying to identify why
or how an incident occurred. So root
cause analysis is fundamentally
different from other types of incident
analysis, such as impact analysis or
risk analysis. To answer these "why"
questions, you often have to answer
other related questions, such as
"how." Depending on how you word the
question, "how" and "why" can be
different terms for getting at the
same information. The underlying
"what" may be another way to phrase
the question, and sometimes other
questions, such as the "who" and the
"when," may also need to be addressed
to get to the underlying why or how.

So answering these types of
questions typically requires other
types of analyses that you do in
incident analysis. One is network
analysis: looking at available network
data sources, logs, network tools,
SIEM tools, intrusion detection
systems, and other things that are
monitoring the network activity.
Another is system- or host-based
analysis: looking at specific files,
processes, and things that are
running on a particular system. If
there's malicious code involved,
there may be some malware analysis
(there's going to be some overlap),
and understanding the particular
malware that's being used can help
you understand what vulnerabilities
might have been exploited to get it
there in the first place, or what
else it might be doing.

There may be other types of
analyses, like vulnerability analysis,
which looks at the underlying
vulnerabilities in software or
systems that allow an exploitation to
occur and create an incident, and
other things like retrospective
analysis, which looks at what else
the attacker may have done, or trend
analysis, which looks at trends in
what has been seen in the past and
may help predict what might happen
in the future.

Root Cause Analysis Caveats

Keep these things in mind:
• Incidents may occur because of more than one problem (i.e.,
there may be multiple root causes).
• A lack of information may leave the root cause unknown.
• Be cautious in making assumptions about the actual root cause.
- Insecure victim systems may have had more than one attacker
(multiple, unrelated root causes).
- Analysis may lead to the discovery of other, previously undetected,
unrelated incidents that occurred on the same system.


**010 So some things to keep in
mind: with many incidents, when
you're analyzing and trying to
investigate them, you may find out
that there are multiple root
causes. So don't be constrained into
thinking that you only have to
identify one particular category from
your list. You should expect that a
variety of different things may be
identified that map to a set of
incident activities. So having this
as part of your analysis process,
with the ability to identify multiple
root causes, is going to be
important.

Another thing to keep in mind is
that, depending on the availability of
information to analyze, sometimes
the information simply isn't available
or it's inconclusive or ambiguous, and
you may not be able to identify the
true root cause, and you'll have to
decide what to do from that point.

Another thing to keep in mind when
you're doing root cause analysis is to
be careful about making assumptions
in your analysis process. It's not
unusual to have an incident report
about a particular system and,
depending on the initial symptoms
that led to detection, to find out
that the system has vulnerabilities,
or that information about that
vulnerable system has been shared
with more than one intruder. So you
have two or more intruders who have
used that same system. They may have
gained access through different
mechanisms, and they may be using it
for different purposes. So just be
careful about making assumptions and
linkages. You may see multiple
different indicators from different
data sources that may be unrelated,
and so drawing the correlations and
linking these different root causes
together is something you're just
going to have to watch out for.

You may also discover, in the course
of analyzing one particular incident,
other types of unrelated activity
occurring on that system or on other
systems too. So you might have
multiple incidents with different root
causes. Again, be careful to draw
distinctions between them, and be
cautious about making unfounded
correlations.
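
One way to keep these caveats visible in practice is to record findings
so that several root causes can coexist, unrelated activity stays
separate, and a missing answer is reported as unknown rather than
guessed. The sketch below is only an illustration under assumptions:
the cluster labels, the (cluster, vector, confirmed) tuple layout, and
the summarize_root_causes function are hypothetical names, not part of
the course material.

```python
from collections import defaultdict

UNKNOWN = "unknown"


def summarize_root_causes(findings):
    """Group confirmed threat vectors by activity cluster.

    Each finding is a (cluster, vector, confirmed) tuple. A cluster may
    end up with several root causes. A cluster with no confirmed vector
    is reported as unknown instead of being assigned a guess.
    """
    by_cluster = defaultdict(set)
    for cluster, vector, confirmed in findings:
        vectors = by_cluster[cluster]  # registers the cluster either way
        if confirmed:
            vectors.add(vector)
    return {
        cluster: sorted(vectors) if vectors else [UNKNOWN]
        for cluster, vectors in by_cluster.items()
    }


# Two unrelated intruders on the same victim system (hypothetical data).
findings = [
    ("intruder-A", "weak_credentials", True),
    ("intruder-A", "unpatched_software", True),
    ("intruder-B", "phishing", False),  # suspected, but not confirmed
]

summary = summarize_root_causes(findings)
for cluster, causes in sorted(summary.items()):
    print(cluster, "->", causes)
```

Keeping each cluster of activity separate is a deliberate choice here:
it avoids linking indicators from different intruders or incidents
unless the evidence actually supports the correlation.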

Root Cause Analysis Preparatory Tasks

Prepare your infrastructure and knowledge. Example tasks include the
following:
1. Have your specialists educated in specific areas of application
development and development languages.
2. Have your specialists educated in security aspects of infrastructures
and their implementations and designs.
3. Provide access to a lab environment to enable the specialists to
research vulnerabilities in applications, infrastructures, or designs.
4. Establish and document a workflow for various types of root cause
analyses.
5. Cooperate with your constituency during the incident response
process to obtain access to affected infrastructures or compromised
applications.

(source: [DRAFT] FIRST SIRT Services Framework, Tasks and Sub-Tasks for Function 2.4 Vulnerability/Exploitation Analysis – Sub-Function 2.4.2 Root cause analysis (Task 2.4.2.1))


**011 So here are some of the tasks
that'll be very useful in preparing to
do root cause analysis. This list,
again, comes from the Forum of
Incident Response and Security Teams
SIRT Services Framework. One thing
that'll help is for the people doing
root cause analysis to understand the
underlying fundamental causes of
vulnerabilities, through knowledge of,
experience with, or familiarity with
different application development and
software development languages, and
with various aspects of the
infrastructures and the
implementations and designs within
your constituency itself. It's a nice
thing to have, but it's something
that's not necessarily essential to do
root cause analysis, but it can
definitely provide better insight to
what's happening.

In addition, it helps if you can
have some equipment or maybe a
dedicated lab environment that
enables the people doing root cause
analysis to do research, testing, or
verification of various types of
vulnerabilities, applications,
operating systems, implementations,
configurations, design problems, and
things like that, for further
analysis.

Another thing is to define an actual
process or workflow on how root
cause analysis is done, and as we
mentioned earlier, having a list of
different root cause categories, threat
vectors, whatever you want to call
them, is going to be something that's
going to be very useful for having
consistency in the results of your
analyses; and also setting up some
expectations or some relationships
with your constituents and the people
who are going to be reporting the
incidents to you so that you can
better obtain or request access to the
data sources or information or
systems to do the further analysis to
answer the questions to understand
what caused the incident in the first
place.

Notices
Copyright 2016 Carnegie Mellon University

[Distribution Statement A] This material has been approved for public release and unlimited distribution. Please see Copyright notice for non-US
Government use and distribution.

This material is based upon work funded and supported by Department of Homeland Security under Contract No. FA8721-05-C-0003 with Carnegie
Mellon University for the operation of the Software Engineering Institute, a federally funded research and development center sponsored by the
United States Department of Defense.

NO WARRANTY. THIS CARNEGIE MELLON UNIVERSITY AND SOFTWARE ENGINEERING INSTITUTE MATERIAL IS FURNISHED ON AN
“AS-IS” BASIS. CARNEGIE MELLON UNIVERSITY MAKES NO WARRANTIES OF ANY KIND, EITHER EXPRESSED OR IMPLIED, AS TO ANY
MATTER INCLUDING, BUT NOT LIMITED TO, WARRANTY OF FITNESS FOR PURPOSE OR MERCHANTABILITY, EXCLUSIVITY, OR
RESULTS OBTAINED FROM USE OF THE MATERIAL. CARNEGIE MELLON UNIVERSITY DOES NOT MAKE ANY WARRANTY OF ANY KIND
WITH RESPECT TO FREEDOM FROM PATENT, TRADEMARK, OR COPYRIGHT INFRINGEMENT.

This material is distributed by the Software Engineering Institute (SEI) only to course attendees for their own individual study. Except for the U.S.
government purposes described below, this material SHALL NOT be reproduced or used in any other manner without requesting formal permission
from the Software Engineering Institute at permission@sei.cmu.edu.

The U.S. Government's rights to use, modify, reproduce, release, perform, display, or disclose this material are restricted by the Rights in Technical
Data-Noncommercial Items clauses (DFAR 252-227.7013 and DFAR 252-227.7013 Alternate I) contained in the above identified contract. Any
reproduction of this material or portions thereof marked with this legend must also reproduce the disclaimers contained on this slide.

Although the rights granted by contract do not require course attendance to use this material for U.S. Government purposes, the SEI recommends
attendance to ensure proper understanding.

Carnegie Mellon®, CERT® and CERT Coordination Center® are registered marks of Carnegie Mellon University.

DM-0003588
