Professional Documents
Culture Documents
From TTP To IoC Advanced Persistent Graphs For Threat Hunting
From TTP To IoC Advanced Persistent Graphs For Threat Hunting
From TTP To IoC Advanced Persistent Graphs For Threat Hunting
Abstract—Defenders fighting against Advanced Persistent maneuvers whose objectives are to test the soldier readiness
Threats need to discover the propagation area of an adver- and attack effectiveness through simulations. In cybersecurity,
sary as quickly as possible. This discovery takes place through a these exercises help organizations keep their assets safe. The
phase of an incident response operation called Threat Hunting,
where defenders track down attackers within the compromised Red Team is composed of highly trained individuals playing
network. In this article, we propose a formal model that dis- the role of potential attackers motivated by a strategic objec-
sects and abstracts elements of an attack, from both attacker tive (e.g., stealing sensitive information, using organizations’
and defender perspectives. This model leads to the construction capabilities for malicious purposes, defeating the availability
of two persistent graphs on a common set of objects and compo- of victim’s services). The Blue Team defends the company,
nents allowing for (1) an omniscient actor to compare, for both
defender and attacker, the gap in knowledge and perceptions; and has to ensure that its assets are not compromised, in the
(2) the attacker to become aware of the traces left on the tar- event of the Red Team finding a vulnerability and exploiting it.
geted network; (3) the defender to improve the quality of Threat The Blue Team thus needs to rapidly remediate the incident to
Hunting by identifying false-positives and adapting logging policy control the Red Team’s network propagation and contain the
to be oriented for investigations. In this article, we challenge this threat. To estimate the effectiveness of their respective games,
model using an attack campaign mimicking APT29, a real-world
threat, in a scenario designed by the MITRE Corporation. We we can naively measure the time it took the Red Team to dom-
measure the quality of the defensive architecture experimentally inate the target and the time it took the Blue Team to detect
and then determine the most effective strategy to exploit data and respond to the attack. We believe that this measure would
collected by the defender in order to extract actionable Cyber be greatly improved with knowledge of the compromised com-
Threat Intelligence, and finally unveil the attacker. ponents, by the Red Team, from the victim’s network (i.e., its
Index Terms—Advanced persistent threat, tactics techniques propagation area) and how aware the Blue Team was of this.
procedures, threat hunting, IOC, SIEM. In this article, we formalize both defensive processes and
the attacker’s offensive approaches, allowing for confronting
their respective perceptions during the same attack cam-
I. I NTRODUCTION paign. The attacker’s perception of the campaign is built from
HE RISE of collective awareness of Advanced Persistent (1) the execution of his procedures chosen from among his
T Threats (APT) has required companies to reconsider their
approach to cybersecurity. The resilience level of a company’s
Tactics, Techniques and Procedures (TTP) [2]; (2) his exposed
resources during these executions; and (3) the victim’s com-
information system can be challenged today through a Red ver- ponents he compromised. The defender’s perception of the
sus Blue exercise that simulates a realistic attack campaign, attack is built from (1) the collected traces on the targeted
aimed at reproducing the behavior of the adversary. In these information system; and (2) the exploitation of these traces
exercises, the Blue Team seeks the Red Team in a Threat through his defensive procedures. The benefits of the proposed
Hunting operation [1], while the Red Team tries to carry out model are twofold. First of all, it provides a high-level repre-
their attack as stealthily as possible. These exercises allow for sentation of the attack campaign, allowing to quickly assess
testing the security controls, sensors, Security Information and the attacker and defender progressions. This representation
Event Management (SIEM), and incident response processes. can also be used by an omniscient third party to mea-
In such a game an attacker, the Red Team, intends to break into sure the success of a Red versus Blue exercise. Second, the
the infrastructure defended by the Blue Team. Such exercises model highlights how the defender can improve the efficiency
of attacking and defending are inspired by similar military of his detection process by tweaking the input (configura-
tions and rules) of some of his defensive procedures. Here,
Manuscript received October 14, 2020; revised January 5, 2021 and January we conduct an experiment with our model, using an attack
27, 2021; accepted February 1, 2021. Date of publication February 3, 2021;
date of current version June 10, 2021. The associate editor coordinating campaign issued from the public project Mordor [3]. This
the review of this article and approving it for publication was S. Scott- attack campaign mimics the real-world threat APT29 in a
Hayward. (Corresponding author: Aimad Berady.) scenario designed by MITRE for the purposes of ATT&CK
Aimad Berady, Valérie Viet Triem Tong, and Gilles Guette are with
CentraleSupélec, Inria, University Rennes, CNRS, IRISA, 35042 Rennes, Evaluations [4]. The confrontation of these two perceptions
France (e-mail: aimad.berady@irisa.fr; valerie.viet_triem_tong@irisa.fr; allows us to define a metric to estimate the deployed detection
gilles.guette@univ-rennes1.fr). chain efficiency.
Mathieu Jaume is with Sorbonne Université, CNRS, LIP6, 75005 Paris,
France (e-mail: mathieu.jaume@lip6.fr). This article is structured as follows. Section II provides
Digital Object Identifier 10.1109/TNSM.2021.3056999 an overview of the concepts manipulated in the model.
1932-4537
c 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: K J Somaiya College of Engineering - MUMBAI. Downloaded on December 01,2021 at 16:01:15 UTC from IEEE Xplore. Restrictions apply.
1322 IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, VOL. 18, NO. 2, JUNE 2021
Sections III and IV present respectively the attacker and to the victim (some of them related to components of C), by
the defender’s perspectives. Section V details the experiment OA the set of objects relative to the attacker and by O the set
architecture and specifies requirements for integration in exist- OD ∪ OA .
ing infrastructures. Section VI discusses how to evaluate the
efficiency of the detection chain and how to enhance it. B. The Attacker Scope
The attacker can be an individual, a group, or a Red Team,
II. OVERVIEW but for the sake of clarity, we simply refer here to the attacker.
This article formalizes the Threat Hunting process con- In the same way, the presence of multiple attackers in the
ducted by an incident response team, towards ultimately victim’s network is not an obstacle because the ambition of
evaluating the efficiency of a detection chain. We start our pro- this model is to provide a more exhaustive view of the com-
cess with the attacker’s point of view; the attacker has initiative promised components. The attacker is at the initiative of the
and sets the tempo. attack campaign and has his own components at his disposal,
We begin by specifying the scope of this study and explain- which is part of his infrastructure. He also owns a collection of
ing the terms we will use in the rest of this article. Thus, this attack procedures (denoted TTPA ) often related to techniques
section successively details the infrastructure under considera- described in the MITRE ATT&CK matrix. These procedures
tion, the attacker’s scope, and then the defender’s scope before may be parameterized by objects from OD as well as objects
outlining how we will be able to compare their two points of from OA . These procedures are executed on components in the
view with the help of two graphs. targeted network (in C) only if the attacker has already dis-
covered these components. The attacker scope is completely
A. The Infrastructure formalized in Section III.
The targeted network is the information system hosting the
attacker’s final objective. In this network, we distinguish the C. The Defender Scope
components from the objects. The components are assets (i.e., The defender can be the victim itself, the security team
machines) with logging capabilities over which the defender of a company, an external security team, or a Blue Team.
has full control, such as computers, servers, or appliances. For the same reasons, we simply refer here to the defender.
The objects are measurable events or stateful properties rela- The defender has defensive procedures to cope with an attack
tive to malware characterization, intrusion detection, incident campaign. The defender only observes the events occurring
response or digital forensics. These objects correspond to the on components in C that he has chosen to monitor. The rele-
observable objects defined in the STIX standard [5]. Each vant events to be monitored by sensors are specified in their
object plays a precise role in the context of an event. This configuration S. The defensive procedure plogs allows him to
role specifies the function that the object holds in the event. generate traces from events that occur on these components.
The set of possible roles is here denoted by R. Each role r The defensive procedure phunting exploits these traces in an
is associated with a unique type that specifies the nature of attempt to identify in C the components compromised by the
any object playing this role. By denoting T the set of possible attacker. The defender scope is formalized in Section IV.
types, the type associated with a role is defined through the
function τ : R → T.
For our implementation, we used the MITRE Cyber D. Confronting Perspectives
Analytics Repository (CAR) [6] data model to name objects’ We propose here to represent the attacker network propa-
roles. An object can play different roles associated with dif- gation during an attack campaign by a persistent graph GA
ferent types. For example, the object maliciousfile.com between objects and components. This graph will be com-
can hold the role destination hostname with the type domain, puted from the sequence of attack procedures and the involved
the role file name, or even the role executable with the type file. objects. These objects represent characteristic information that
The three columns on the left of the Table I detail some exam- the attacker cannot conceal and is aware of exposing.
ples of objects and their relative types and roles. IP addresses, Similarly, we represent the defender perception of the
domain names, or files are examples of frequently observed attacker’s propagation by a second persistent graph GD . This
types. graph computation relies on the defensive procedures exe-
Although the attacker also has his own machines, which cuted by the defender. Figure 1 gives a global overview of
could be components, in this model, we only consider the vic- this process, including the attacker and the defender scopes.
tim’s components since those of the attacker cannot be reached The knowledge of the attacker and the defender of the tar-
by the defender, in a strictly defensive posture. Nevertheless, geted network are enriched by their own directory. For both
the defender may have an insight into some of the attacker’s the attacker and the defender, their directory determines how
objects (e.g., a domain name, a hash value, an IP) because the an object with a given type is relative to a component from
attacker would have exposed them. The attacker will, however, his own perspective. In the following, DA will be the attacker
be able to discover and gain access to part of the victim’s directory and D the defender directory. Both DA and D are
components through objects. For these reasons, this article possibly imperfect or incomplete; they are here represented by
highlights only the set of components of the targeted network, two functions from T × OD to C ∪ {None}. When the attacker
refering to it as C. We denote by OD the set of objects relative (resp. the defender) has no information concerning an object
Authorized licensed use limited to: K J Somaiya College of Engineering - MUMBAI. Downloaded on December 01,2021 at 16:01:15 UTC from IEEE Xplore. Restrictions apply.
BERADY et al.: FROM TTP TO IOC: ADVANCED PERSISTENT GRAPHS FOR THREAT HUNTING 1323
TABLE I
E XAMPLES OF O BJECTS , T HEIR R ELATIVE T YPES , P OSSIBLE ROLES , AND E XISTENCE IN B OTH D IRECTORIES
o with a type t or if the object o in a type t does not corre- observed by the infosec community. Subsequently, Roberto
spond to a component, then DA (t, o) is equal to None (resp. Rodriguez [3] was at the initiative of a dataset of logs recorded
DD (t, o) = None). The representation of a component c from on an infrastructure allowing the procedure execution from
the attacker’s point of view may be different from the repre- scenarios. We decide to merge these two scenarios because of
sentation of the same component c from the defender’s point their similarities in targeting the same infrastructure and the
of view. For example, the defender may know that a machine fact that they mimic the same APT actor. Among the traces in
has a given IP address, is a Web server, owns files, and has the dataset, we focused on those coming from the Sysmon sen-
a name, while the attacker only knows the IP address of this sor, which is the one recommended according to state of the
machine. In the Table I example, the two right columns show art [7]. Experimentation infrastructure is detailed in Section V,
if an object with a specific type is related to a component and the results of this experiment are discussed in Section VI.
using the attacker and the defender directory.
In the following, Section V explains how the comparison of
III. ATTACKER ’ S P ERSPECTIVE
GA and GD allows confronting the perceptions of the attack
campaign from both points of view: that of the attacker and The attacker is modeled through a graph of objects and
that of the defender. We also show how the Threat Hunting components. The components belong to the targeted network,
operation can be greatly improved by increasing the quality of with which the attacker interacts while executing an offensive
Indicators of Compromise (IoC) and Events of Interest (EoI) procedure. The objects represent the traces that the attacker
used by the defender. is aware of leaving on the target infrastructure while execut-
ing these procedures. This section details the construction of
E. Experiment: APT29 Simulated Campaign this graph by taking into account both actions, which define
the whole attack campaign, and progression of the attacker’s
In the context of cybersecurity product evaluations, MITRE knowledge about the targeted network.
creates attack scenarios inspired by real-world threats. Two
of them concern the so-called APT29, which is a state-
sponsored attackers group that has been active since 2008. A. Actions of the Attacker
During these attacks, the attacker used different procedures to An attack campaign A is composed of a sequence
collect and exfiltrate sensitive files from the targeted network e1 , . . . , eN of executions of N attack procedures p1 , . . . , pN
after exploring it. These two scenarios detail the steps of an on components in C of the targeted network. We assume here
APT campaign using the threat actor’s TTPs as they were that the attacker knows at least one initial component of the
Authorized licensed use limited to: K J Somaiya College of Engineering - MUMBAI. Downloaded on December 01,2021 at 16:01:15 UTC from IEEE Xplore. Restrictions apply.
1324 IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, VOL. 18, NO. 2, JUNE 2021
Authorized licensed use limited to: K J Somaiya College of Engineering - MUMBAI. Downloaded on December 01,2021 at 16:01:15 UTC from IEEE Xplore. Restrictions apply.
BERADY et al.: FROM TTP TO IOC: ADVANCED PERSISTENT GRAPHS FOR THREAT HUNTING 1325
side. To this end, the defender uses his own defensive proce-
dures. The semantic framework proposed in this article allows
for generalizing offensive or defensive procedures. We specify
that we have already identified several other tactics that would
be generalizable. These include, for example, pdiscovery , which
consists for the attacker to browse a namespace to discover
new components.
In the following, we describe two main defensive proce-
dures: plogs and phunting , which must be implemented by the
defender to conduct incident response operations. Attack cam-
Fig. 2. G(psexec). paigns against the targeted network generate events on its com-
ponents. We represent an event ev by a tuple (, c, O) where
∈ E denotes the type of ev (with E the set of event types
that can be observed), c ∈ C is the component over which the
event is observed, and O = {(o1 , r1 ), . . . , (om , rm )} ⊆ O×R
contains all the objects involved in this event associated with
their role from the defender’s point of view.
Depending on the configuration of sensors, events can gen-
erate traces. A trace x is a tuple (idx , t, , c, O) where idx is
the trace identifier, t is a timestamp, ∈ E is the type of
events causing this trace, c ∈ C is the component where the
event causing this trace has been observed and O are all the
objects together with their roles involved in this trace. A single
object can assume several roles in a single event and therefore
appears several times in the O set. In contrast, each role is
unique within the same trace.
In a Threat Hunting process, the defender can influence the
quality of his results on three aspects: sensor configuration,
detection rules, and his IoC database.
Fig. 3. GA computed by the attacker during APT29 simulated campaign.
First, the defender decides through his defensive procedure
plogs which components on the targeted network are moni-
tored by sensors and which events produce traces that will
graph GA (A) resulting from the union of all
the graphs asso- be forwarded to the Security Information Event Management
N
ciated with each execution: GA (A) = i=1 G(ei ). The (SIEM). This procedure, detailed in Section IV-A, will gener-
computation of this graph allows the attacker to represent ate a huge number of traces allowing the defender to identify
all the objects he exposes during his attack campaign. This the traces relative to the attacker’s activity.
graph can be seen as an attack footprint exposed to a defender. The second defensive procedure called phunting , detailed in
Moreover, in a Red versus Blue exercise, this graph allows an Section IV-B, exploits those traces.
omniscient team (i.e., the White Team) to measure the dis-
tance between what the Blue Team has actually found and the
attacker’s actual footprint. A. Targeted Network Monitoring
Figure 3 presents the graph GA computed by the attacker Technically, the defender is able to monitor almost any event
during the APT29 simulated campaign in 49 steps corre- occurring on the targeted network’s components. Nevertheless,
sponding to execution procedures, 4 compromised components many of these events tend to be irrelevant from a security point
SCRANTON, NASHUA, NEWYORK and UTICA, which are the of view and may generate too many false-positives in a SIEM
four on which the attacker actually executed procedures dur- alerting system.
ing the attack campaign. Those 4 components are also defined As the traces raised by the sensors are the only way for
in his directory. the defender to perceive a part of the attacker’s activity, the
defender has to pay very close attention to the configuration S
of sensors. The more meaningful a trace is, the more valuable
IV. D EFENDER ’ S P ERSPECTIVE it is to the defender.
We consider the same attack campaign but now from the We assume here that all components can be observed, and
defender’s perspective. The defender is never considered to their monitoring is configured by a sensor configuration S.
have initiative. The aim of the defender is to compute a rep- In this configuration, the defender defines the relevant event
resentation of the attacker propagation in the targeted network types for each component c ∈ C, which need to be traced and
from his (the defender’s) point of view. In other words, his goal filtered on objects according to their roles. The configuration
is to compute a graph of objects and components that are as S(c) makes it possible to monitor the component c. The con-
similar as possible to the one produced by the attacker on his figuration is defined by a set of tuple (, φ, R ) where ∈ E
Authorized licensed use limited to: K J Somaiya College of Engineering - MUMBAI. Downloaded on December 01,2021 at 16:01:15 UTC from IEEE Xplore. Restrictions apply.
1326 IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, VOL. 18, NO. 2, JUNE 2021
objects). The first option will generate too many traces and
risks producing many false-positives. The second option will
rarely be positively satisfied and thus will produce a very few
numbers of traces, which will lead the defender to misjudge
the threat.
Finally, the defender’s defensive procedure determines the
monitored events and the reported traces in the procedure
plogs . This procedure is parameterized by S that specifies the
configurations associated with components and by E a set of
events that occurred on the components.
plogs generates a set of traces Xc for each component c:
Listing 2. Sysmon Sensor Configuration Example, Part of a plogs Procedure.
TABLE II
R ELEVANT ROLES IN A S YSMON T RACE AND T HEIR CAR NAME
Authorized licensed use limited to: K J Somaiya College of Engineering - MUMBAI. Downloaded on December 01,2021 at 16:01:15 UTC from IEEE Xplore. Restrictions apply.
BERADY et al.: FROM TTP TO IOC: ADVANCED PERSISTENT GRAPHS FOR THREAT HUNTING 1327
TABLE III
E XTRACT OF THE I O C DATABASE U SED BY THE D EFENDER D URING
APT29 T HREAT H UNTING P ROCESS
Authorized licensed use limited to: K J Somaiya College of Engineering - MUMBAI. Downloaded on December 01,2021 at 16:01:15 UTC from IEEE Xplore. Restrictions apply.
1328 IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, VOL. 18, NO. 2, JUNE 2021
V. M ODEL E XPERIMENTATION
Integrating our approach in a production environment would
allow both the attacker and the defender on either side to
become aware of the traces left on the victim’s network. The
Fig. 5. GD computed by the defender during APT29 simulated campaign. attacker could use this information to improve the stealthiness
of his procedures, and the defender could use it to improve his
Threat Hunting process. In practice, its deployment in existing
architectures requires:
3) Big Step: Finally, an attack campaign A observed by • On the Attacker’s Side: a logging system for executed
a defender through a collection
of traces X can be modeled procedures formatted according to Listing 1; a directory
by the graph G(X ) = x∈X G(x, Rx , Ox ) resulting from the that corresponds to the attacker’s current knowledge of
union of all the graphs associated with each Event of Interest the victim’s network;
computed from a set of traces X. • On the Defender’s Side: sensors installed on defended
Figure 5 is the graph computed by the defender during the components and configured as presented in Listing 2
APT29 simulated campaign with rules from the public project and whose relevant roles are specified as presented in
Sigma. Section V gives all the details on its computation and Table II; a directory that corresponds to the defender’s
discusses how to deal with the objects present. current knowledge of the victim’s network; an indexer
The graph on the Figure 5 has to be compared with the graph enriched with a set of detection rules as in Listing 4; an
representing this same attack campaign from the attacker’s IoC database, such as in Table III, an analyst for tasks
perspective on the Figure 3. that are not automatable to date, such as IoC generation.
4) Quality of the Defender Perspective: The quality of In this article, we take the point of view of an omniscient
the graph G(X) constructed by the defender is influenced by actor, which allows us to answer the following questions.
three parameters: the configuration of the sensors S, the set How to measure the quality of defensive architecture?
of detection rules R database, and the IoC database. In a The comparison of the graphs GA and GD allows calcu-
Threat Hunting process, updating the sensor configuration or lating the coverage rate of objects coming from the attacker
the set of detection rules is too long and too impactful to be into the defender’s graph. This allows estimating the relevance
done straightaway. On the other hand, the IoC database can of the detection chain.
and must be updated each time a trace is considered to be How to reduce the defender’s graph to unveil the attacker?
of interest by the function generate_IoC((Rx , Ox ), x, IoC). Too many objects in the defender’s graph GD can make it
All objects that appear in a trace considered as an Event unusable. We are therefore looking for ways to reduce the
Authorized licensed use limited to: K J Somaiya College of Engineering - MUMBAI. Downloaded on December 01,2021 at 16:01:15 UTC from IEEE Xplore. Restrictions apply.
BERADY et al.: FROM TTP TO IOC: ADVANCED PERSISTENT GRAPHS FOR THREAT HUNTING 1329
Listing 5. Part of All Procedures From the Mordor APT29 Attack Campaign.
Authorized licensed use limited to: K J Somaiya College of Engineering - MUMBAI. Downloaded on December 01,2021 at 16:01:15 UTC from IEEE Xplore. Restrictions apply.
1330 IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, VOL. 18, NO. 2, JUNE 2021
TABLE IV
D EFENDER P OINT OF V IEW: B IG G RAPH S PECIFICATIONS
Authorized licensed use limited to: K J Somaiya College of Engineering - MUMBAI. Downloaded on December 01,2021 at 16:01:15 UTC from IEEE Xplore. Restrictions apply.
BERADY et al.: FROM TTP TO IOC: ADVANCED PERSISTENT GRAPHS FOR THREAT HUNTING 1331
Fig. 7. Evolutions of coverage rates and count of objects present in the graph GD according to the number of disabled rules.
• Update(GD , RD ) is a function that updates the graph allow a good understanding of the different layers and their
GD view by excluding the events resulting from the interactions. We observe two main lines of research that should
rules RD . converge: those that formalize the relationships between real-
Implementing RD as a list allows us to keep the order in which life components; and those focusing on the improvement of
the rules were disabled and therefore revert the graph GD to actors’ strategies, for instance, based on Game Theory.
an earlier state.
Figure 7 shows two 3D curves which correspond respec-
A. Need for Unified Views
tively to the deactivation of Top-objects and Top-events strate-
gies. We can observe that the Top-objects strategy seems to be Gianvecchio et al. [11] point out the semantic gap between
the most effective because it allows maintaining high coverage the defender and the attacker. Indeed, the attacker operates
while considerably reducing the number of objects. So when at the level of strategy and tactics; he focuses on target dis-
there are only 63 objects left in the graph and only 4 rules covery, and can deploy various kill chains tactics [8], [12].
have been deactivated, the coverage rate is 24.4%. However, the defender spends significant time processing low-
level, rule-generated alerts and single-log analysis can hardly
C. Discussion reveal the complete attack story for complex, multi-stage
attacks. Gianvecchio et al. reduce this gap by proposing an
By applying a rule disabling strategy, the defender will thus
explicit model of the attacker strategy using machine-readable
considerably reduce the number of objects to be investigated
data structures and clustering of security events around TTP
while maintaining a sufficiently high coverage rate. However,
annotations from a well-known behavioral taxonomy. This
it should be emphasized that, unlike a detection system which
model enables the defender to operate similarly to the attacker
aims to automatically contain a technical threat (e.g., anti-
at the strategic level without sacrificing their ability to drill
malware, Endpoint Detection and Response), the defensive
into evidential details. For their implementation, they used
infrastructure, as described in this article, aims to collect as
CALDERA, previously introduced by Applebaum et al. [13]
much Cyber Threat Intelligence as possible related to the
in order to automate Red Teaming while retaining the concept
adversary in order to be able to better hunt it in the victim’s
of TTP.
network. Thus, an exhaustive detection is not necessary since
the objective is not to block unknown threats but simply to
ensure the sufficient number of IoC that will allow it to be B. From Events of Interest to Objects
tracked in the victim’s network. This approach of measuring In [14], Najafi et al. are one of the first to introduce the intu-
the coverage rate is possible only in the context of Red versus ition behind global features for threat detection. They define a
Blue exercises. However, the disabling of too verbose rules can SIEM-based knowledge graph allowing highlighting the most
be done in post-processing without observing the evolution of important entities (what we call objects) and relationships
the coverage rate, but simply in order to clean up the graph observed in Event Logs of Interest extracted from DNS and
generated by the defender and to make it more exploitable. proxy logs. These relations are enriched with information gath-
ered from publicly available sources of threat intelligence.
VII. R ELATED W ORKS Over this knowledge graph, Najafi et al. design MalRank,
Threat Hunting is an agile and iterative process of search- a graph-based inference algorithm designed to infer a node
ing, characterizing, and later identifying attackers who may maliciousness score based on its associations to other entities
have compromised the victim’s network. In 2021, it is still a presented in the graph. MalRank has successfully been helpful
widespread focus of cyber defense research. We believe that in identifying previously unknown malicious entities such as
this area still requires further formalization efforts in order to malicious domain names and IP addresses.
Authorized licensed use limited to: K J Somaiya College of Engineering - MUMBAI. Downloaded on December 01,2021 at 16:01:15 UTC from IEEE Xplore. Restrictions apply.
1332 IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, VOL. 18, NO. 2, JUNE 2021
C. From Traces to Indicator of Compromise graph analysis subserves the attack correlations by identify-
In [9], Kurogome et al. propose to enhance the Threat ing event patterns. During this study, we better understood the
Hunting process by automatically generating accurate and origin of objects which then become IoC. We also emphasize
interpretable IoCs from malware traces. They design EIGER that some objects, which have a meaning only in the con-
that takes a dataset of traces computed from malware as input. text of the information system where they were found, can
EIGER then computes IoCs of different abstraction levels be very interesting to exploit for Threat Hunting. Finally, our
using an enumerate-then-optimize algorithm. Kurogome et al. model explains the mutual inference necessary for attacker
demonstrate that their generated IoCs bear comparison with and defender to understand each other. The most valuable
manually generated ones, which indicates that EIGER is an intelligence is the understanding of attacker’s procedures.
appealing complement to endpoint malware detection in real- In future work, we plan to focus on graph similarity com-
world security operations. This is an example of a concrete putation in order to design metrics that could testify to the
implementation for the function generate_IoC formalized quality of attacker and defender points of view. Such exper-
here. iments require implementing different comparison algorithms
and benchmarking them. However, beyond a metric confronta-
D. From Logs to Defender’s Perspective tion between two graphs, we hope the semantics introduced
in this article allows for interpreting the differences between
Pei et al. describe in [15] HERCULE a log-based intru- attacker and defender perceptions at a deeper level, towards
sion analysis system. It models the relationship between designing other defensive procedures.
multiple logs in the system and automatically generates a In the long run, our goal is to adjust defender levers defined
multidimensional weighted graph with potentially valuable in this article dynamically: sensor configuration, detection
information for the defender embedded within. Their proposed rules, and IoC database to better reveal the presence of an
graph provides a panoramic view of the logs generated by attacker. To achieve this goal, we might need to define a defen-
different system components and help the defender to under- sive strategy that takes deployment costs into account. Thus,
stand the whole attack trace. Recently, Burr et al. published a Red versus Blue exercise would have concrete and immedi-
a study [16] that focuses on community detection in graphs ate technical repercussions on the company’s cybersecurity to
constructed from Intrusion Detection System (IDS) alerts. fight Advanced Persistent Threats.
This article is in line with these works and proposes a richer
model that offers a dual view of the Threat Hunting process
by also taking the perception into account, but also knowledge A PPENDIX
and actions of the campaign from the attacker perspective. S ATISFACTION R ELATIONS
We believe that in the near future, this work will allow us to a) Sensor Configuration: Expressions φ of sensor configura-
join the research conducted in Game Theory and in Security tions are built from logical constants true, false and operators
Games [17], [18] where the Threat Hunting process requires and, or, not applied over pairs (r, c) where r is a role and c
more accurate models [19]. In these games, the defender, who is a property over the object playing this role (we write c(o)
is monitoring some collection of resources, has to decide how to express that an object o satisfies the property c):
to deploy a certain number of sensors or honeypots [20] at
some predetermined cost. His goal is to properly protect this φ ::= true | false | (r, c) | not φ | φ and φ | φ or φ
network at a minimal cost. The attacker’s goal is to cheat the
Then, given a set O = {(o1 , r1 ), . . . , (on , rn )} ⊆ O × R, the
defender in order to reach his objectives.
satisfaction relation |= of a condition is defined by:
VIII. C ONCLUSION O |= true
Threat Hunting is a fundamental step of an incident response O |= (ri , c) iff c(oi )
operation that allows for spotting the components of an O |= not φ iff O |= φ
information system compromised by an attacker. In this pro-
cess, the defender’s ambition is to shed light on the attacker’s
O |= φ1 and φ2 iff O |= φ1 and O |= φ2
propagation area in order to best prepare the operation for his O |= φ1 or φ2 iff O |= φ1 or O |= φ2
eradication.
We assume here that for each role in φ there exists an object
In this article, we have proposed a model to analyze both the
in O playing it and that a role occurs at most one time in O.
attacker propagation and the defender knowledge of that prop-
b) Detection Rule: Detection rules r are built from logical
agation. All the steps involved in a Threat Hunting approach
constants true, false and operators and, or, not applied over
have been carefully formalized here. Our approach allows
pairs (α, c) where α is a timestamp t, or an event type ∈ E,
enhancing the knowledge base of the defender with new
or a component c ∈ C, or a role r ∈ R, and c is a property over
Indicators of Compromise, which can subsequently enable
α (we write c(α) to express that α satisfies the property c):
proactive threat detection. Furthermore, our model and its
experimentation highlight the existence of false-positives in r ::= true | false | (α, c) | not r | r and r | r or r
particular because of lax detection rules. This feature is valu-
able for the defender since it allows him to gain efficiency Then, given a trace x = (idx , t, , c, O), the satisfaction relation
and to improve his detection tools. In particular, because the |= of a detection rule r is inductively defined by:
Authorized licensed use limited to: K J Somaiya College of Engineering - MUMBAI. Downloaded on December 01,2021 at 16:01:15 UTC from IEEE Xplore. Restrictions apply.
BERADY et al.: FROM TTP TO IOC: ADVANCED PERSISTENT GRAPHS FOR THREAT HUNTING 1333
x |= true R EFERENCES
x |= t , c
iff t = t and c(t) [1] E. C. Thompson, “Threat hunting,” in Designing a HIPAA-Compliant
Security Operations Center. Berkeley, CA, USA: Apress, 2020.
x |= , c iff = and c() [2] F. Maymi, R. Bixler, R. M. Jones, and S. D. Lathrop, “Towards a defi-
x |= c , c iff c = c and c(c) nition of cyberspace tactics, techniques and procedures,” in Proc. IEEE
Int. Conf. Big Data, 2017, pp. 4674–4679.
x |= (ri , c) iff (o, ri ) ∈ O and c(oi ) [3] R. Rodriguez. (2020). APT29 Activity From the ATT&CK Evaluations.
[Online]. Available: https://github.com/hunters-forge/mordor/tree/
x |= not r iff x |= r master/datasets/large/apt29
x |= r1 and r2 iff x |= r1 and x |= r2 [4] MITRE Corporation. (2019). ATT&CK Evaluation. [Online]. Available:
https://attackevals.mitre-engenuity.org/
x |= r1 or r2 iff x |= r1 or x |= r2 [5] OASIS Cyber Threat Intelligence. (2017). STIX a Structured Language
for Cyber Threat Intelligence. [Online]. Available: https://oasis-open.
c) Main Notations: github.io/cti-documentation/stix/intro
[6] MITRE Corporation. (2018). The MITRE Cyber Analytics Repository
– C is a set of components of the targeted network; c is a (CAR). [Online]. Available: https://car.mitre.org/
component in C [7] V. Mavroeidis and A. Jøsang, “Data-driven threat hunting using
– O = OD ∪ OA where OD (resp. OA ) is the set of objects Sysmon,” in Proc. 2nd Int. Conf. Cryptogr. Security Privacy, 2018,
pp. 82–88.
relative to the victim (resp. to the attacker); o ∈ O [8] A. Berady, V. V. T. Tong, G. Guette, C. Bidan, and G. Carat, “Modeling
– R is a set of roles associated with objects the operational phases of APT campaigns,” in Proc. 6th Annu. Conf.
– T is set of types associated with roles; t is a type in T Comput. Sci. Comput. Intell., 2019, pp. 96–101.
[9] Y. Kurogome et al., “EIGER: Automated IOC generation for accu-
– τ : R → T is the function from roles to types rate and interpretable endpoint malware detection,” in Proc. 35th Annu.
– DA (resp. D) : (T × OD ) → (C ∪ {None}) is the attacker Comput. Security Appl. Conf., 2019, pp. 687–701.
(resp. defender) directory [10] Sudhakar and S. Kumar, “An emerging threat fileless malware: A survey
– E is a set of types of observable events (by the defender) and research challenges,” Cybersecurity, vol. 32, p. 1, Jan. 2020.
[11] S. Gianvecchio, C. Burkhalter, H. Lan, A. Sillers, and K. Smith,
on components; is an event in E “Closing the gap with APTs through semantic clusters and auto-
– ev = (, c, O) is an observable event on a component mated cybergames,” in Lecture Notes of the Institute for Computer
– O ⊆ O × R is a (sub)set of objects with their role Sciences, Social Informatics and Telecommunications Engineering.
Cham, Switzerland: Springer, 2019.
– x = (idx , t, , c, O) is a trace observed on c at date t [12] P. N. Bahrami, A. Dehghantanha, T. Dargahi, R. M. Parizi,
– S is a set of sensor configurations K.-K. R. Choo, and H. H. S. Javadi, “Cyber kill chain-based tax-
– S(c) is a sensor configuration on the component c onomy of advanced persistent threat actors: Analogy of tactics,
techniques, and procedures,” J. Inf. Process. Syst., vol. 15, no. 4,
– TTP is a set of procedures; p is a procedure in TTP pp. 865–889, 2019.
– A = e1 , . . . , eN is an attack campaign [13] A. Applebaum, D. Miller, B. Strom, C. Korban, and R. Wolf,
“Intelligent, automated red team emulation,” in Proc. 32nd Annu. Conf.
– e is a procedure execution Comput. Security Appl., 2016, pp. 363–373.
– M(e) is a machine (component) on which e is executed [14] P. Najafi, A. Mühle, W. Pünter, F. Cheng, and C. Meinel, “MalRank: A
– φ is a condition over some objects and their roles measure of maliciousness in SIEM-based knowledge graphs,” in Proc.
35th Annu. Comput. Security Appl. Conf., 2019, pp. 417–429.
– R ⊆ R are relevant roles for an event type [15] K. Pei et al., “HERCULE: Attack story reconstruction via community
– (, φ, R ) is an element of a sensor configuration discovery on correlated log graph,” in Proc. 32nd Annu. Conf. Comput.
– IoC ⊆ O × T is an Indicators of Compromise database Security Appl., 2016, pp. 583–595.
– R is a set of detection rules; r is a rule in R [16] B. Burr, S. Wang, G. Salmon, and H. Soliman, “On the detection of
persistent attacks using alert graphs and event feature embeddings,” in
– EoIR,IoC (x) = (Rx , Ox ) is the function providing relevant Proc. NOMS, 2020, pp. 1–4.
rules (Rx ⊆ R) and objects (Ox ⊆ O) from a trace generated [17] A. H. Anwar and C. Kamhoua, “Game theory on attack graph for
cyber deception,” in Decision and Game Theory for Security. Cham,
by an Event of Interest Switzerland: Springer, 2020.
– G(e) = (Ve , →e ) is an attacker’s graph relating to an [18] T. E. Carroll and D. Grosu, “A game theoretic investigation of deception
execution e in network security,” in Proc. 18th Int. Conf. Comput. Commun. Netw.,
– GA (A) = (VA , →A ) is the attacker’s network propaga- 2009, pp. 1–6.
[19] M. Bilinski et al., Lie Another Day: Demonstrating Bias in a
tion graph resulting from the A attack campaign Multi-round Cyber Deception Game of Questionable Veracity. Cham,
– G(x, Rx , Ox ) is a defender’s graph relating to a trace x Switzerland: Springer, 2020.
[20] R. Píbil, V. Lisý, C. Kiekintveld, B. Bošanský, and M. Pěchouček,
– GD (X ) = (VD , →D ) is the defender’s perception graph “Game theoretic model of strategic honeypot selection in computer
of the attacker’s propagation associated with each Event of networks,” in Decision and Game Theory for Security. Berlin, Germany:
Interest computed from a set of traces X. Springer, 2012.
Authorized licensed use limited to: K J Somaiya College of Engineering - MUMBAI. Downloaded on December 01,2021 at 16:01:15 UTC from IEEE Xplore. Restrictions apply.