

Malicious Data Leak Prevention and Purposeful Evasion Attacks: An Approach to Advanced Persistent Threat (APT) Management
Tarique Mustafa,
Founder & Chief Executive Officer / Chief Technology Officer
nexTier Networks, Inc.
2953, Bunker Hill Lane, Ste: 400, Santa Clara, CA-95054, USA
Tarique@nextiernetworks.com

Abstract: Existing Data Leak Prevention (DLP) solutions are inherently incapable of scaling beyond trivial scenarios of “Accidental Data Leak” wherein no “Purposeful Evasion Attack” is encountered. Nevertheless, these attacks can render a DLP system completely useless (or greatly depreciate the effectiveness/usefulness of any DLP solution). A true DLP solution, therefore, must support “Malicious Data Leak Prevention” capability wherein “Purposeful Evasion Attacks” can be effectively detected and prevented.

With the advent of Advanced Persistent Threats (APTs) against Information Security and DLP Systems, “Purposeful Evasion Attacks” have emerged as the most sophisticated class of threats against DLP solutions. Unfortunately, “Purposeful Evasion Attacks” have also remained un-addressed in their most basic forms.

This paper presents (1) an insight into the lifecycle of APTs launched against Information Security and DLP systems, (2) a classification of real-life “Purposeful Evasion Attacks” against Information Security and DLP systems, (3) a reference model for enabling Malicious Data Leak Prevention (called 3-D Correlation Paradigm).

Keywords: Data Leak Prevention, Malicious DLP, Evasion Attack, Information Security, Advanced Persistent Threat, APT, False Negative, Egress Control.

I. INTRODUCTION
A security solution is only as good as its weakest vulnerability. Conventional DLP approaches are effective for Accidental Data Leak (ADL) only. An example of ADL is the scenario where an “unintended” recipient is included in the ‘To List’ of an email, utterly by mistake. Conventional ADL based DLP solutions are inherently incapable of scaling beyond trivial scenarios of ADL wherein no “Purposeful Evasion Attack” is encountered.

Nevertheless, these “Purposeful Evasion Attacks” can render a DLP system completely useless (or greatly depreciate the effectiveness/usefulness of any DLP solution). A true DLP solution, therefore, must support “Malicious Data Leak Prevention” capability wherein “Purposeful Evasion Attacks” can be effectively detected and prevented.

With the advent of Advanced Persistent Threats (APTs) against Information Security Systems, “Purposeful Evasion Attacks” have assumed even more serious significance. In fact “Purposeful Evasion Attacks” have emerged as the most sophisticated class of threats against DLP solutions.

Unfortunately, the “Purposeful Evasion Attacks” have also remained un-addressed in their most basic forms, primarily because, 1) “Purposeful Evasion Attacks” are difficult to address, and 2) “Purposeful Evasion Attacks” pose unparalleled challenges at the basic algorithmic level.

In Section II this paper provides an insight into the anatomy and lifecycle of APTs that launch Evasion Attacks against the DLP systems after infiltrating the target infrastructure. In Section III an in-depth analysis and classification of various real-life Evasion Attacks against Information Security and DLP systems is presented. The specific algorithms targeted by each type of Evasion Attacks are also described. Section IV presents a reference model for Malicious Data Leak Prevention System that enables mechanisms for protection against “Purposeful Evasion Attacks” launched via APTs (or by a ‘Mole’) in the target infrastructure. Section V summarizes some conclusions.

This paper will enable the audience to, 1) Better understand the real-life significance of “Purposeful Evasion Attacks” in the context of Information Security and Data Leak Prevention, 2) Evaluate the inherent limitations of existing DLP algorithms that further aggravate vulnerability to “Purposeful Evasion Attacks”, 3) Design and implement methods to counter the “Purposeful Evasion Attacks” against Information Security and Data Leak Prevention solutions, 4) Establish business processes and guidelines to help mitigate vulnerabilities against various “Purposeful Evasion Attacks”, 5) Improve their overall strategy against APTs and eliminate compliance violation issues that result from “Purposeful Evasion Attacks”.

978-1-4673-6195-8/13/$31.00 ©2013 IEEE


II. MALICIOUS DATA LEAK – AS A CLASS OF ADVANCED PERSISTENT THREAT
Malicious Data Leak (MDL) is one of the most serious challenges facing the information security industry [6], both the commercial market and the Government departments/agencies. It involves the scenario wherein an agent (whether a human actor or a malicious process, such as a malware or bot) causes a deliberate unauthorized exfiltration of confidential and/or mission critical information. If the MDL is caused by a human actor, it is regarded as a classical case of industrial (or military) espionage where the human actor is a mole or a spy. If the MDL is effectuated via non-human actors (such as malicious process, malware, bot, etc.) it takes the form of a quintessential Advanced Persistent Threat (APT).

A. How is Malicious Data Leak Invoked?
An MDL follows a distinct lifecycle and pattern as listed below;

1) Infiltration Phase: In this phase the Actor (human mole or non-human malicious process) infiltrates the target infrastructure.
2) Identity Acquisition Phase: In this phase the Actor (human mole or non-human malicious process) acquires a legitimate Identity and corresponding Authorizations in the target infrastructure.
3) Discovery and Access Phase: In this phase the Actor (using its Identity and Authorizations) launches discovery campaign to locate the target information on the target infrastructure. Once the target information is located, access processes are invoked to gain access to the target information.
4) Exfiltration Phase: In this phase the Actor invokes the exfiltration process. Depending upon the sophistication of the Actor multiple actions are taken, including,
• Protocol and Communication Channel Selection
• Target Destination Selection
• Evasion Attack Selection

Depending upon the sophistication of the Actor, these decisions are made either autonomously by the Actor or in conjunction with the Command & Control (C2) mechanism of the APT.

B. Why is Malicious Data Leak More Complex?
Malicious Data Leak is more difficult to address not only due to the presence of intent to exfiltrate on part of the Actor(s) but also due to the absence of foolproof underlying technology for automated identification of important classified data and information. Data and content identification and matching algorithms employed in conventional DLP solutions to protect against ADL can be easily gamed. Depending upon its sophistication an MDL usually involves invocation of “Purposeful Evasion Techniques” (called Evasion Attacks) against the Information Security and DLP systems deployed on the target infrastructure. Examples of Evasion Attacks could range from (a) Identity Theft and Masquerading, to (b) Exploitation of the known “Vulnerability” of the Data/Content Identification and Matching Algorithms.

III. EVASION ATTACKS
Evasion Attacks are a new class of cyber attack against the Information Security and DLP infrastructure [5], [6]. Evasion Attacks can be invoked either manually (by a ‘Mole’) or via APTs.

Evasion Attacks can render Information Security and DLP infrastructure completely useless (or greatly depreciate its usefulness). Real-life impact of Evasion Attacks on Information Security and DLP infrastructure manifests itself as False Negatives i.e. failure to control Egress,
• Very frequently for Unstructured & Semi-structured information (e.g. emails, spreadsheets, ad-hoc documents, etc.).
• Relatively frequently for Structured Data & Information (e.g. database information, PCI, PII, etc.).

These attacks are extremely difficult to detect as they involve sophisticated techniques that exploit vulnerability of the underlying Information Security and DLP constructs at the systemic and algorithmic levels. Fig. 1 illustrates a formal layout of an exemplary construct for Information Security and DLP process. The two main variables in this model are (a) Identity of the Actor, and (b) the Content itself. Hence, these two variables constitute the key vulnerability of Information Security and DLP constructs forming the main focus of Evasion Attacks (whether manual through a ‘Mole’ or via an APT).

Fig. 1 Information/Data Exfiltration Prevention: Formal Definition
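To make the notion of a False Negative concrete, the following minimal Python sketch models a deliberately naive egress check that depends only on the two variables named in this paper, the Identity of the Actor and the Content itself. The identity list, the keyword list, and the function name are hypothetical and invented for illustration; the sketch is not the construct of Fig. 1 and does not represent any particular DLP product.

```python
import codecs

# Hypothetical policy data, invented for this sketch (not from the paper).
AUTHORIZED_IDENTITIES = {"alice"}
PROTECTED_KEYWORDS = {"merger", "acquisition"}


def naive_egress_allowed(identity: str, content: str) -> bool:
    """Return True if a naive identity + keyword check lets the content leave."""
    # Variable (a): Identity of the Actor.
    if identity not in AUTHORIZED_IDENTITIES:
        return False
    # Variable (b): the Content itself, matched verbatim against keywords.
    words = {w.strip(".,;:!?").lower() for w in content.split()}
    return not (words & PROTECTED_KEYWORDS)


message = "The merger closes Friday."

# Content based evasion: a Rot(13) transform hides every keyword,
# while the recipient can trivially reverse it.
disguised = codecs.encode(message, "rot_13")

blocked_plain = not naive_egress_allowed("alice", message)    # leak is caught
leaked_disguised = naive_egress_allowed("alice", disguised)   # False Negative

# Identity based evasion: the check cannot tell a rogue process that
# presents the identity "alice" apart from the legitimate user.
blocked_unknown = not naive_egress_allowed("mallory", disguised)
```

In this sketch the plain message is blocked, while the same message under a Rot(13) transform passes, and any Actor presenting a legitimate identity is treated as the legitimate user. Both outcomes are failures to control Egress of the kind described above.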
A. Types of Evasion Attacks
Evasion Attacks can be broadly classified into two categories;

1) Identity Based Evasion Attacks: It is to be noted here that the most likely source of Evasion Attacks against Information Security and DLP infrastructure is a ‘Mole’ within the enterprise with proper Identity and Access Rights, or an APT wherein the malicious process assumes a legitimate Identity and Access Rights.
Hence, Identity based Evasion Attacks involve obtaining a ‘legitimate’ Identity and Access Rights for an Actor either via a properly prescribed process (e.g. in the case of a ‘Mole’) or through stealth in the case of APT. Since most of the Information Security and DLP solutions rely exclusively on the ability to protect the Identity, once the Identity is compromised the rogue Actor has no obstacles and exfiltration can be performed unhindered.

Identity based Evasion Attacks in the context of Information Security and DLP have been an ongoing topic of research [5], [6], [7], [8], [9]. Many remedial means [1], [4] have been devised with varying levels of success.

2) Content Based Evasion Attacks: Content based Evasion Attacks constitute some of the most challenging attacks against Information Security and DLP infrastructure. Fig. 2 illustrates an exemplary algorithmic construct for Content Identification and Matching used in Information Security and DLP infrastructure. As is evident from the logic flow in Fig. 2, the most vulnerable part of the algorithm is the ideal ‘Epicenter’ for Evasion Attacks.

Fig. 2 Content Identification and Matching Process: Epicentre of Content based Evasion Attacks

Content based Evasion Attacks may involve multiple techniques including,
• Manipulation of structural, lexical or temporal composition of the content (e.g. emails, spreadsheets, ad-hoc documents, etc.).
• Content Encoding and Steganographic techniques.
• Exploitation of the known “Vulnerability” of the Data/Content Identification and Matching Algorithms.

Content Encoding and Steganographic techniques are complex challenges. However, challenges posed by Evasion Attacks based on these techniques have been either addressed or greatly mitigated by making them relatively easier to detect in the context of Information Security and DLP [2], [5], [6].

Evasion Attacks based on manipulation of structural, lexical or temporal composition of data or content are by far the most difficult to detect and neutralize. These techniques exploit the vulnerabilities of content Identification and Matching Algorithms. Few remedial means exist [1], [2], [5], [6]. The key objective behind manipulation of the content itself is to avoid matching, and hence identification, with the declared instances of data or content that is being protected.

B. Classification of Content based Evasion Attacks
Content based Evasion Attacks involve data/content manipulation to exploit the vulnerability of the underlying matching algorithms. Fig. 3 displays a taxonomy of Content based Evasion Attacks.

Fig. 3 Classification of Content Based Evasion Attacks

Table I provides a description of various types of Content based Evasion Attacks and the corresponding algorithmic vulnerability that each of these attacks exploits. Information Security and DLP systems normally employ a suite of content identification and matching algorithms.

It is to be noted here that the more precise a content identification and matching algorithm is, the more vulnerable it is to Content based Evasion Attacks. Thus, precision (the accuracy of an algorithm for exact matching of content), tolerance (the ability for partial matching of content) and rigorousness (the ability to identify similarity of content in a particular context) are equally important (although
at times contradictory) characteristics of content matching algorithms in order to build a formidable and robust defence against Content based Evasion Attacks.

TABLE I
SUMMARY OF CONTENT BASED EVASION ATTACKS

No. Structural Alteration Attacks

1 Transposition Attack
Description: This involves moving around phrases or sections of a document out of order. For example, transposing sections of a document.
Evasion Target:
• Fingerprinting Algorithms

2 Sentence Structure Alteration (SSA) Attack
Description: This involves altering the structure of a sentence to make it seem different, even though the semantic meaning is the same. For example, “Someone will buy Company X” vs. “Company X is going to be acquired by a buyer”.
Evasion Target:
• Fingerprinting Algorithms
• NLP Algorithms

3 Substitution Attack
Description: This involves substituting a word or phrase for part of a sentence or section of a document.
Evasion Target:
• Fingerprinting Algorithms
• NLP Algorithms
• Keyword Matching Algorithms

No. Transformation Attacks

1 Mapping & Hashing
Description: This involves using a mapping function F(Map) or a hashing function H(c1 … ck) to transform a given Data/Content into a unique non-lexical sequence to evade Content Matching Algorithms.
Evasion Target:
• Fingerprinting Algorithms
• NLP Algorithms
• Keyword Matching Algorithms
• Pattern matching Algorithms

2 Encryption Transforms
Description: This involves using encryption to evade Content Identification Algorithms.
Evasion Target:
• Fingerprinting Algorithms
• NLP Algorithms
• Keyword Matching Algorithms
• Pattern matching Algorithms
• Transform matching Algorithms

3 Substitution Cipher Attack
Description: This usually involves using advanced derivatives of classical Caesar Cipher (e.g. Rot(N) Algorithms) as a means of camouflaging the Content.
Evasion Target:
• Fingerprinting Algorithms
• NLP Algorithms
• Keyword Matching Algorithms
• Pattern matching Algorithms
• Transform matching Algorithms

No. Obfuscation Attacks

1 Synonymy Attack
Description: This involves using synonyms for key words, for example, “purchase” vs. “buy” vs. “acquire”. Simply changing certain key words will evade many Data/Content Matching Algorithms.
Evasion Target:
• Fingerprinting Algorithms
• Keyword Matching Algorithms
• Pattern matching Algorithms
• LSI Algorithms

2 Polysemy Attack
Description: This involves using polysemous words for key words to Obfuscate or Ambiguate the original Data/Content. Simply changing certain key words will evade many Data/Content Matching Algorithms.
Evasion Target:
• Fingerprinting Algorithms
• Keyword Matching Algorithms
• Pattern matching Algorithms
• LSI Algorithms

3 Book Cipher Attack
Description: This involves using a cipher for certain words. For example, using the word “seven” for the number “7” in a Social Security Number.
Evasion Target:
• Fingerprinting Algorithms
• Keyword Matching Algorithms
• Pattern matching Algorithms
• LSI Algorithms

IV. ADVANCED PERSISTENT THREAT MANAGEMENT IN DLP SYSTEMS
As described in Sections II and III, real-life scenarios require the ability to protect against MDL that is caused via externally planted APTs using sophisticated Evasion Attacks. These attacks may include a combination of Identity based and Content based Evasion Attacks.

In this section a unique paradigm called “3-D Correlation” will be presented. This provides an effective security framework against data exfiltration caused by Evasion Attacks launched by APTs (Malware, Bots, Agents, etc.). This unique approach correlates ‘Actors~Information~Operations’ using multiple criteria including ‘Identity and Roles’ and ‘Security Profile’ of Actors.

As shown in Fig. 4, the 3-D Correlation is a “Formal Definition” of generalized Data Leak Prevention scenarios.

Fig. 4 Formal Definition of Generalized Data Leak Prevention
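As an informal illustration of what correlating ‘Actors~Information~Operations’ could look like, the following minimal Python sketch evaluates a request along all three dimensions before allowing it. The paper does not specify the correlation rule, so everything here is an assumption made for illustration: the numeric clearance and confidentiality levels, the per-operation limits, the sample map entries, and the name `is_permitted`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Actor:
    identity: str          # uniquely identifies the Actor
    clearance_level: int   # assumed numeric form of the Security Profile


@dataclass(frozen=True)
class InformationElement:
    name: str
    classification_type: str    # e.g. "PCI", "HIPAA", "PII"
    confidentiality_level: int  # assumed numeric form of the contextual value


# Hypothetical accessibility map: Actor identity -> IE types it may access.
ACCESSIBILITY_MAP = {"analyst": {"PCI"}, "nurse": {"HIPAA"}}

# Hypothetical limits: highest confidentiality level each Operation may carry.
OPERATION_LIMITS = {"print": 2, "email": 1, "usb_transfer": 1}


def is_permitted(actor: Actor, ie: InformationElement, operation: str) -> bool:
    """Allow a request only if all three dimensions agree (assumed rule)."""
    # Dimension 1: is this Actor mapped to this type of information at all?
    if ie.classification_type not in ACCESSIBILITY_MAP.get(actor.identity, set()):
        return False
    # Dimension 2: does the Actor's clearance cover the IE's confidentiality?
    if actor.clearance_level < ie.confidentiality_level:
        return False
    # Dimension 3: may this Operation carry information at that level?
    # Unknown operations get a limit of -1 and are therefore always denied.
    return ie.confidentiality_level <= OPERATION_LIMITS.get(operation, -1)


analyst = Actor("analyst", clearance_level=2)
card_data = InformationElement("card numbers", "PCI", confidentiality_level=2)

can_print = is_permitted(analyst, card_data, "print")
can_copy_to_usb = is_permitted(analyst, card_data, "usb_transfer")
```

Under these assumed rules, an Actor holding a legitimate Identity can still be denied a particular Operation on a particular piece of information, which is the property that makes a compromised Identity alone insufficient for exfiltration in this sketch.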
The paradigm is comprised of four main components;

1. Actors: An Actor is either a human agent (e.g. an employee, a user), or a process (e.g. a malware, botnet, software, B2B process, etc.), or a machine (e.g. a USB drive, CD drive, WiFi port, etc.).

Each Actor has two key attributes,
(a) Identity – that uniquely identifies the Actor
(b) Security Profile – that specifies the ‘Security Clearance Level’ of the Actor.

2. Information Elements: An Information Element (IE) is a piece of data or information that is of interest.

Each IE has two key attributes,
(a) Classification Type – that uniquely identifies the type of information the IE represents (e.g. financial information PCI, healthcare information HIPAA, personal information PII, etc.).
(b) Confidentiality Level – that specifies the ‘Contextual Value’ of the IE.

3. Operations: An Operation is an action that can be performed on an IE (e.g. Print, Copy, USB transfer, etc.) by means of data communication or data transfer (e.g. a communication protocol or application).

4. Accessibility Map: The Accessibility Map is a mapping between Actors and IEs. It specifies which Actor is allowed to have access to which IE type or instance.

A. Advanced 3-D Correlation for Malicious DLP~APT:
The 3-D Correlation entails a Canonical Representation of Segmentation of Duty (SoD) based abstraction of Information Communication Infrastructure so that Information Access Control can be enforced.

One of the key capabilities of the 3-D Correlation paradigm is its ability to address the “Zero Day Document” use case scenarios. It incorporates an Ontology driven technology for real-time automatic ‘Identification’, ‘Classification’ and ‘Correlation’ of confidential and compliance regulated data/information without human intervention. Thus, any “Zero Day Document” or “Virgin Data” can be automatically identified, classified and correlated in real-time for any potential violation. Simultaneously, the ‘Policy Engine’ enforces the corresponding DLP policies in real-time.

It provides advanced and sophisticated constructs to enable the Governmental Agencies, Intelligence Agencies, Police Departments, Department of Defence and Homeland Security Agencies to monitor and conduct surveillance over communication infrastructure. The resulting product could be a key tool in Cyber Counter Intelligence and Cyber Warfare.

V. CONCLUSIONS
A new generation of threats is challenging the Information Security and DLP solutions today: over two-thirds of all data/information exfiltration now derives from evasive, malicious attacks launched through sophisticated APTs against the data/information repositories and applications. The pace and sophistication of these advanced attacks is increasing. It is therefore essential to understand the anatomy of APTs launched against Information Security and DLP infrastructure.

Evasion Attacks pose unparalleled challenges to DLP systems at the basic “algorithmic level”. The problem is inherent in the underlying DLP algorithms & constructs.

Transition from “innocent” DLP to “malicious” DLP capabilities requires fundamental innovations at algorithmic level in order to address Evasion Attacks. Optimal combination of processes (i.e. sophisticated Workflow) and methods (i.e. intelligent Content Identification & Matching Algorithms) is needed to counter the APTs.

This paper presents (a) insight into the anatomy of APTs that cause MDL, (b) the taxonomy of Evasion Attacks that these APTs can launch to cause failure of Egress Control, (c) a DLP paradigm that can effectively stop MDL caused by APTs.

The 3-D Correlation paradigm presented in this paper provides an effective “reference” framework for defence against sophisticated APTs.

REFERENCES
[1] T. Mustafa, “High Granularity Reactive Measures for Reactive Pruning of Information”, US Patent 8,141,127 B1, Mar. 20, 2012.
[2] T. Mustafa, “High Accuracy Document Information-Element Vector Encoding Server”, US Patent 7,725,466 B2, May 25, 2010.
[3] N. Srinivasa et al., “Method and Apparatus for Electronically Extracting Application Specific Multidimensional Information from Documents Selected from a Set of Documents Extracted from a Library of Electronically Searchable Documents”, US Patent 6,965,900 B2, Nov. 15, 2005.
[4] D. Gupta et al., “System and Method for Preventing Large-Scale Account Lockout”, US Patent 8,302,187 B2, Oct. 30, 2012.
[5] T. Mustafa, “Evasion Attacks: The Next Frontier in Data Leak Prevention”, in Proc. RSA Security Conference, 2009.
[6] T. Mustafa, “Malicious Data Leak Prevention (DLP): Impact and Challenges for Business Processes”, in Proc. DLP Conference - Russia, 2010.
[7] E.M. Hutchins, M.J. Cloppert, R.M. Amin, “Intelligence-Driven Computer Network Defense Informed by Analysis of Adversary Campaigns and Intrusion Kill Chains”, Lockheed Martin Corporation, Abstract, 2013.
[8] M.K. Daly, “The Advanced Persistent Threat (or Informationized Force Operations)”, Raytheon, Report, 2009.
[9] Command and Control in the Fifth Domain, Command Five Pty Ltd, 2012.
[10] Advanced Persistent Threats: A Decade in Review, Command Five Pty Ltd., 2011. Available: http://www.commandfive.com/papers/C5_APT_ADecadeInReview.pdf
