Malicious Data Leak Prevention and Purposeful Evasion Attacks: An Approach to Advanced Persistent Threat (APT) Management
Abstract— Existing Data Leak Prevention (DLP) solutions are inherently incapable of scaling beyond trivial scenarios of “Accidental Data Leak” wherein no “Purposeful Evasion Attack” is encountered. Nevertheless, these attacks can render a DLP system completely useless (or greatly depreciate the effectiveness/usefulness of any DLP solution). A true DLP solution, therefore, must support “Malicious Data Leak Prevention” capability wherein “Purposeful Evasion Attacks” can be effectively detected and prevented.

With the advent of Advanced Persistent Threats (APTs) against Information Security and DLP Systems, “Purposeful Evasion Attacks” have emerged as the most sophisticated class of threats against DLP solutions. Unfortunately, “Purposeful Evasion Attacks” have also remained un-addressed in their most basic forms.

This paper presents (1) an insight into the lifecycle of APTs launched against Information Security and DLP systems, (2) a classification of real-life “Purposeful Evasion Attacks” against Information Security and DLP systems, (3) a reference model for enabling Malicious Data Leak Prevention (called 3-D Correlation Paradigm).

Keywords— Data Leak Prevention, Malicious DLP, Evasion Attack, Information Security, Advanced Persistent Threat, APT, False Negative, Egress Control.

I. INTRODUCTION

A security solution is only as good as its weakest vulnerability. Conventional DLP approaches are effective for Accidental Data Leak (ADL) only. An example of ADL is the scenario where an “unintended” recipient is included in the ‘To List’ of an email, utterly by mistake. Conventional ADL based DLP solutions are inherently incapable of scaling beyond trivial scenarios of ADL wherein no “Purposeful Evasion Attack” is encountered.

Nevertheless, these “Purposeful Evasion Attacks” can render a DLP system completely useless (or greatly depreciate the effectiveness/usefulness of any DLP solution). A true DLP solution, therefore, must support “Malicious Data Leak Prevention” capability wherein “Purposeful Evasion Attacks” can be effectively detected and prevented.

With the advent of Advanced Persistent Threats (APTs) against Information Security Systems, “Purposeful Evasion Attacks” have assumed even more serious significance. In fact “Purposeful Evasion Attacks” have emerged as the most sophisticated class of threats against DLP solutions. Unfortunately, the “Purposeful Evasion Attacks” have also remained un-addressed in their most basic forms, primarily because, 1) “Purposeful Evasion Attacks” are difficult to address, and 2) “Purposeful Evasion Attacks” pose unparalleled challenges at the basic algorithmic level.

In Section II this paper provides an insight into the anatomy and lifecycle of APTs that launch Evasion Attacks against the DLP systems after infiltrating the target infrastructure. In Section III an in-depth analysis and classification of various real-life Evasion Attacks against Information Security and DLP systems is presented. The specific algorithms targeted by each type of Evasion Attacks are also described. Section IV presents a reference model for Malicious Data Leak Prevention System that enables mechanisms for protection against “Purposeful Evasion Attacks” launched via APTs (or by a ‘Mole’) in the target infrastructure. Section V summarizes some conclusions.

This paper will enable the audience to, 1) Better understand the real-life significance of “Purposeful Evasion Attacks” in the context of Information Security and Data Leak Prevention, 2) Evaluate the inherent limitations of existing DLP algorithms that further aggravate vulnerability to “Purposeful Evasion Attacks”, 3) Design and implement methods to counter the “Purposeful Evasion Attacks” against Information Security and Data Leak Prevention solutions, 4) Establish business processes and guidelines to help mitigate vulnerabilities against various “Purposeful Evasion Attacks”, 5) Improve their overall strategy against APTs and eliminate compliance violation issues that result from “Purposeful Evasion Attacks”.
Content Encoding and Steganographic techniques are complex challenges. However, the challenges posed by Evasion Attacks based on these techniques have been either addressed or greatly mitigated by making them relatively easier to detect in the context of Information Security and DLP [2], [5], [6].

Evasion Attacks based on manipulation of the structural, lexical or temporal composition of data or content are by far the most difficult to detect and neutralize. These techniques exploit the vulnerabilities of Content Identification and Matching Algorithms. Few remedial means exist [1], [2], [5], [6]. The key objective behind manipulation of the content itself is to avoid matching, and hence identification, with the declared instances of data or content that is being protected.

Fig. 3 Classification of Content Based Evasion Attacks

Table I provides a description of various types of Content based Evasion Attacks and the corresponding algorithmic vulnerability that each of these attacks exploits. Information Security and DLP systems normally employ a suite of content identification and matching algorithms.

It is to be noted here that the more precise a content identification and matching algorithm is, the more vulnerable it is to Content based Evasion Attacks. Thus, precision¹, tolerance² and rigorousness³ are equally important (although at times contradictory) characteristics of content matching algorithms in order to build a formidable and robust defence against Content based Evasion Attacks.

¹ Accuracy of algorithm for exact matching of content
² Ability for partial matching of content
³ Ability to identify similarity of content in a particular context

TABLE I
SUMMARY OF CONTENT BASED EVASION ATTACKS

No.  Structural Alteration Attacks
1    Transposition Attack
     Description: This involves moving phrases or sections of a document out of order, for example transposing sections of a document.
     Evasion Target:
     • Fingerprinting Algorithms
2    Sentence Structure Alteration (SSA) Attack
     Description: This involves altering the structure of a sentence to make it seem different, even though the semantic meaning is the same. As in our previous example, “Someone will buy Company X” vs. “Company X is going to be acquired by a buyer”.
     Evasion Target:
     • Fingerprinting Algorithms
     • NLP Algorithms
3    Substitution Attack
     Description: This involves substituting a word or phrase for part of a sentence or section of a document.
     Evasion Target:
     • Fingerprinting Algorithms
     • NLP Algorithms
     • Keyword Matching Algorithms

No.  Transformation Attacks
1    Mapping & Hashing
     Description: This involves using a mapping function F(Map) or a hashing function H(c1 … ck) to transform a given Data/Content into a unique non-lexical sequence to evade Content Matching Algorithms.
     Evasion Target:
     • Fingerprinting Algorithms
     • NLP Algorithms
     • Keyword Matching Algorithms
     • Pattern Matching Algorithms
2    Encryption Transforms
     Description: This involves using encryption to evade Content Identification Algorithms.
     Evasion Target:
     • Fingerprinting Algorithms
     • NLP Algorithms
     • Keyword Matching Algorithms
     • Pattern Matching Algorithms
     • Transform Matching Algorithms
3    Substitution Cipher Attack
     Description: This usually involves using advanced derivatives of the classical Caesar Cipher (e.g. Rot(N) Algorithms) as a means of camouflaging the Content.
     Evasion Target:
     • Fingerprinting Algorithms
     • NLP Algorithms
     • Keyword Matching Algorithms
     • Pattern Matching Algorithms
     • Transform Matching Algorithms

No.  Obfuscation Attacks
1    Synonymy Attack
     Description: This involves using synonyms for key words, for example, “purchase” vs. “buy” vs. “acquire”. Simply changing certain key words will evade many Data/Content Matching Algorithms.
     Evasion Target:
     • Fingerprinting Algorithms
     • Keyword Matching Algorithms
     • Pattern Matching Algorithms
     • LSI Algorithms
2    Polysemy Attack
     Description: This involves using polysemes for key words to obfuscate or ambiguate the original Data/Content. Simply changing certain key words will evade many Data/Content Matching Algorithms.
     Evasion Target:
     • Fingerprinting Algorithms
     • Keyword Matching Algorithms
     • Pattern Matching Algorithms
     • LSI Algorithms
3    Book Cipher Attack
     Description: This involves using a cipher for certain words, for example using the word “seven” for the number “7” in a Social Security Number.
     Evasion Target:
     • Fingerprinting Algorithms
     • Keyword Matching Algorithms
     • Pattern Matching Algorithms
     • LSI Algorithms

IV. ADVANCED PERSISTENT THREAT MANAGEMENT IN DLP SYSTEMS

As described in Sections II and III, real-life scenarios require the ability to protect against Malicious Data Leak (MDL) that is caused via externally planted APTs using sophisticated Evasion Attacks. These attacks may include a combination of Identity based and Content based Evasion Attacks.

In this section a unique paradigm called “3-D Correlation” will be presented. This provides an effective security framework against data exfiltration caused by Evasion Attacks launched by APTs (Malware, Bots, Agents, etc.). This unique approach correlates ‘Actors~Information~Operations’ using multiple criteria including ‘Identity and Roles’ and ‘Security Profile’ of Actors.

As shown in Fig. 4, the 3-D Correlation is a “Formal Definition” of generalized Data Leak Prevention scenarios.

Fig. 4 Formal Definition of Generalized Data Leak Prevention
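As a rough illustration of what correlating ‘Actors~Information~Operations’ could look like in code, the following Python sketch models an Actor, an Information Element (IE) and an Accessibility Map, and combines them into a single allow/deny decision. It is only a sketch under stated assumptions: the paper does not specify the decision rules, so the `Level` scale, the `OPERATION_CEILING` table, the sample `ACCESS_MAP` entries and the function `is_allowed` are all hypothetical names and policies introduced here for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical ordered scale shared by clearance and confidentiality."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    SECRET = 3


@dataclass(frozen=True)
class Actor:
    identity: str      # uniquely identifies the Actor
    clearance: Level   # stands in for the Actor's 'Security Profile'


@dataclass(frozen=True)
class InformationElement:
    name: str
    classification: str      # Classification Type, e.g. "PCI", "HIPAA", "PII"
    confidentiality: Level   # stands in for the 'Contextual Value' of the IE


# Accessibility Map (assumed shape): Actor identity -> IE types it may access.
ACCESS_MAP = {
    "alice": {"PCI", "PII"},
    "backup-service": {"PCI"},
}

# Assumed policy: the highest confidentiality level each Operation may carry.
OPERATION_CEILING = {
    "read": Level.SECRET,
    "print": Level.CONFIDENTIAL,
    "usb_transfer": Level.INTERNAL,
}


def is_allowed(actor, ie, operation, access_map=ACCESS_MAP):
    """Correlate Actor, IE and Operation; deny unless every check passes."""
    # Dimension 1 (Actor ~ Information): the Accessibility Map must list the IE type.
    if ie.classification not in access_map.get(actor.identity, set()):
        return False
    # The Actor's clearance must cover the IE's confidentiality level.
    if actor.clearance < ie.confidentiality:
        return False
    # Dimension 2 (Information ~ Operation): unknown operations are denied.
    ceiling = OPERATION_CEILING.get(operation)
    if ceiling is None:
        return False
    return ie.confidentiality <= ceiling
```

Under these assumed rules, an Actor that does not appear in the Accessibility Map (for example a planted process) is denied regardless of the Operation it attempts, and a cleared user can still be stopped when the Operation itself, such as a USB transfer, exceeds the ceiling for that IE.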
The paradigm comprises four main components:

1. Actors: An Actor is either a human agent (e.g. an employee, a user), or a process (e.g. a malware, botnet, software, B2B process, etc.), or a machine (e.g. a USB drive, CD drive, WiFi port, etc.).

Each Actor has two key attributes:
(a) Identity: uniquely identifies the Actor.
(b) Security Profile: specifies the ‘Security Clearance Level’ of the Actor.

2. Information Elements: An Information Element (IE) is a piece of data or information that is of interest.

Each IE has two key attributes:
(a) Classification Type: uniquely identifies the type of information the IE represents (e.g. financial information PCI, healthcare information HIPAA, personal information PII, etc.).
(b) Confidentiality Level: specifies the ‘Contextual Value’ of the IE.

3. Operations: An Operation is an action that can be performed on an IE (e.g. Print, Copy, USB transfer, etc.) by means of data communication or data transfer (e.g. a communication protocol or application).

4. Accessibility Map: The Accessibility Map is a mapping between Actors and IEs. It specifies which Actor is allowed to have access to which IE type or instance.

A. Advanced 3-D Correlation for Malicious DLP~APT:

The 3-D Correlation entails a Canonical Representation of Segmentation of Duty (SoD) based abstraction of Information Communication Infrastructure so that Information Access Control can be enforced.

One of the key capabilities of the 3-D Correlation paradigm is its ability to address the “Zero Day Document” use case scenarios. It incorporates an Ontology driven technology for real-time automatic ‘Identification’, ‘Classification’ and ‘Correlation’ of confidential and compliance regulated data/information without human intervention. Thus, any “Zero Day Document” or “Virgin Data” can be identified, classified and correlated for any potential violation automatically and in real time. Simultaneously, the ‘Policy Engine’ enforces the corresponding DLP policies in real time.

It provides advanced and sophisticated constructs to enable Governmental Agencies, Intelligence Agencies, Police Departments, the Department of Defence and Homeland Security Agencies to monitor and conduct surveillance over communication infrastructure. The resulting product could be a key tool in Cyber Counter Intelligence and Cyber Warfare.

V. CONCLUSIONS

A new generation of threats is challenging the Information Security and DLP solutions today: over two-thirds of all data/information exfiltration now derives from evasive, malicious attacks launched through sophisticated APTs against the data/information repositories and applications. The pace and sophistication of these advanced attacks is increasing. It is, therefore, essential to understand the anatomy of APTs launched against Information Security and DLP infrastructure.

Evasion Attacks pose unparalleled challenges to DLP systems at the basic “algorithmic level”. The problem is inherent in the underlying DLP algorithms and constructs.

Transition from “innocent” DLP to “malicious” DLP capabilities requires fundamental innovations at the algorithmic level in order to address Evasion Attacks. An optimal combination of processes (i.e. sophisticated Workflow) and methods (i.e. intelligent Content Identification & Matching Algorithms) is needed to counter the APTs.

This paper presents (a) insight into the anatomy of APTs that cause MDL, (b) the taxonomy of Evasion Attacks that these APTs can launch to cause failure of Egress Control, (c) a DLP paradigm that can effectively stop MDL caused by APTs.

The 3-D Correlation paradigm presented in this paper provides an effective “reference” framework for defence against sophisticated APTs.

REFERENCES
[1] T. Mustafa, “High Granularity Reactive Measures for Reactive Pruning of Information”, US Patent 8,141,127 B1, Mar. 20, 2012.
[2] T. Mustafa, “High Accuracy Document Information-Element Vector Encoding Server”, US Patent 7,725,466 B2, May 25, 2010.
[3] N. Srinivasa et al., “Method and Apparatus for Electronically Extracting Application Specific Multidimensional Information from Documents Selected from a Set of Documents Extracted from a Library of Electronically Searchable Documents”, US Patent 6,965,900 B2, Nov. 15, 2005.
[4] D. Gupta et al., “System and Method for Preventing Large-Scale Account Lockout”, US Patent 8,302,187 B2, Oct. 30, 2012.
[5] T. Mustafa, “Evasion Attacks: The Next Frontier in Data Leak Prevention”, in Proc. RSA Security Conference, 2009.
[6] T. Mustafa, “Malicious Data Leak Prevention (DLP): Impact and Challenges for Business Processes”, in Proc. DLP Conference - Russia, 2010.
[7] E.M. Hutchins, M.J. Cloppert, R.M. Amin, “Intelligence-Driven Computer Network Defense Informed by Analysis of Adversary Campaigns and Intrusion Kill Chains”, Lockheed Martin Corporation, Abstract, 2013.
[8] M.K. Daly, “The Advanced Persistent Threat (or Informationized Force Operations)”, Raytheon, Report, 2009.
[9] Command and Control in the Fifth Domain, Command Five Pty Ltd, 2012.
[10] Advanced Persistent Threats: A Decade in Review, Command Five Pty Ltd., 2011. Available: http://www.commandfive.com/papers/C5_APT_ADecadeInReview.pdf