
Improving the Efficiency of Network Intrusion Detection Systems

B. Tech Project Report


Submitted in partial fulfillment of the requirements
for the degree of
Bachelor of Technology

Nakul Aggarwal
Roll No: 02005022

under the guidance of


Prof. Om Damani
&
Prof. Krithi Ramamritham

Department of Computer Science and Engineering
Indian Institute of Technology
Bombay
May 3, 2006
BTech Project Approval Sheet
I hereby state that the contents of this work are mine. Any substantially borrowed material
(cut-pasted or otherwise), including figures, tables and sketches, has been duly acknowledged.

Nakul Aggarwal
(Roll no: 02005022)

Date :

I hereby give my approval for the B.Tech Project Report titled “Improving the Efficiency
of Network Intrusion Detection Systems” by Nakul Aggarwal (02005022) to be submitted.

Prof. Om Damani

Prof. Krithi Ramamritham

Date :

Acknowledgments

I would like to express my sincere gratitude towards my guides, Prof. Om Damani and Prof.
Krithi Ramamritham, for their invaluable and consistent support and guidance. They have been
generous enough to let me pursue the work of my interest.

Nakul Aggarwal,
May, 2006
IIT Bombay.

Abstract

Network intrusion detection systems have become standard components in security infrastructures.
The elements central to intrusion detection are the resources to be protected in a
target network (computer systems, file systems, network information, etc.); models that
characterize the normal or legitimate behavior of the network; and techniques that compare the
actual network activities with the established models and identify those that are abnormal or
intrusive.
There are two approaches to combating intrusions, depending upon whether we have prior
information about the attacks. In the first, we use knowledge of earlier intrusions to decide
whether new flows are intrusive in nature. In the second, after learning the normal behavior
of a network, we classify new flows as normal or intrusive. Here we look at some of the
approaches and algorithms, and at issues still unsolved. We then examine the issue of evading
IDSs by overflowing their network buffers with out-of-order packets, and propose a solution.
Also, implementing inline and adaptive clustering mechanisms for anomaly detection at high
traffic rates has been a limitation of anomaly detection approaches. ADWICE was the first
effort in this field, but since it uses a distance-based clustering mechanism it suffers from
inefficient clustering. We propose additional density-based statistical variables with each
cluster so as to improve the efficiency.

Contents

1 Introduction

2 Misuse Detection
2.1 Approaches to Misuse Detection

3 Algorithms in Misuse Detection
3.1 Boyer-Moore
3.2 Knuth-Morris-Pratt
3.3 Aho-Corasick
3.4 Bloom Filters
3.5 NFA/DFA at hardware level

4 Snort
4.1 Introduction
4.2 Snort Rules
4.3 Architecture of String Matching
4.4 Working Model of the code
4.5 Some More about Snort Powers
4.5.1 Preprocessors
4.5.2 Inline Mode
4.6 Multi-Pattern String Matching Algorithms in Snort
4.6.1 Boyer-Moore Multi-pattern String Matching
4.6.2 Wu-Manber Multi-pattern String Matching
4.6.3 Aho-Corasick Multi-pattern Matching
4.6.4 Aho-Corasick with Sparse Matrix Implementation
4.6.5 SFKSearch using Tries

5 Bro

6 Issues with Pattern Matching

7 Issue of Out of Order Packets
7.1 Solution
7.2 Example
7.3 Limitations

8 Anomaly Detection
8.1 Approaches to Anomaly Detection

9 Clustering Algorithms for Anomaly Detection
9.1 BIRCH - Balanced Iterative Reducing and Clustering
9.2 DBSCAN - Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise

10 ADWICE-TRAD

11 Conclusion and Future Work
11.1 Future Work
Chapter 1

Introduction

There has been a significant rise in the number of network attacks: hacking into systems
using exploits as simple as buffer overflows, new worms bringing whole networks down, and
attacks on web servers via exploitation of software bugs or DoS attacks. Because of the
increasing amount of personal information at stake in networks and the ever-expanding
internet/intranet threats, there is much work going on in combating these attacks. Intrusion
detection is primarily concerned with the detection of illegal activities and acquisitions of
privileges that cannot be detected with information flow and access control modules. Intrusion
detection can be of two types: Pattern Matching or Anomaly Detection. Pattern matching is the
method in which the system inspects network traffic for matches against exact, precisely-
described patterns, while Anomaly Detection learns the normal network traffic and then
detects network intrusions by classifying the real traffic as normal or anomalous.
The increasing network utilization and the weekly growth in the number of critical application
layer exploits imply that IDS designers must find ways to speed up their attack analysis
techniques when monitoring a fully-saturated network, while keeping the number of false
positives low. Studies on empirical data indicate that the number of signatures (each
representing one or another unique malicious activity) has grown around 2.5 times in the
last 3 years [20]. Moreover, tens of vulnerabilities in various software products are exposed
each day on security-related lists and newsgroups such as the Bugtraq mailing list.
Signature matching is the core of malicious traffic/event detection engines, independent
of where it is implemented in the network, i.e., whether it is deployed at the network
perimeter (typically known as the Demilitarized Zone, or DMZ), at the network level (NIDS),
or at the host level (HIDS). Implementations exist at all these levels, as software products,
hardware chips, or pattern matching engines. Some of the most popular software NIDS include
Snort, Bro, Dragon IDS, etc. Signature matching engines at the hardware level implement the
signatures with the help of lookup tables (LUTs), TCAMs, and NFAs/DFAs, and pattern matching
is done in the router itself for each packet while maintaining per-IP session flow information.
Snort is one of the most widely deployed IDS tools. Statistics say that signature
matching is the most computationally intensive part of an IDS. In Snort, up to 70% of
the total execution time goes into this process, which clearly reflects the vast amount of work
that still needs to be done. Also, beyond plain pattern matching, stateful pattern matching
raises issues of out-of-memory conditions and excessive CPU usage, so much work still needs
to be done in this field. Pattern matching for network security and intrusion detection
demands exceptionally high performance.
People have adopted various techniques for string matching over time, and this technology
is still evolving, with new optimizations and heuristics every day. String matching is also
of high theoretical interest. But here the need is for fast multiple-pattern string matching.
Pattern matching started with the use of the most common string matching algorithms like
Boyer-Moore, Knuth-Morris-Pratt (KMP), Aho-Corasick, etc. But over time, researchers have
designed new and efficient algorithms, including improvements over these existing approaches.
Some of them are Coloured Petri Nets, the hash table mapping for each pattern over Boyer-Moore
given by Wu-Manber, the sparse-vector implementation over Aho-Corasick, tries and suffix
trees (linked-list style implementations of state-machine matching), etc.
Anomaly Detection is currently in its infancy as far as real-world implementations
are concerned, but it is more powerful than pattern matching because of its capability to
identify new attacks. Also, less human effort is involved once it is set up and running,
compared to the former approach, where one constantly needs to update the signature database.
While there are a number of limitations to this approach, listed in Chapter 8, the lack of an
efficient and fast clustering mechanism has been one of the most important. In ADWICE [4], the
authors have implemented a scalable and efficient anomaly detection system which uses the
BIRCH clustering algorithm with some modifications. We have proposed a fix in the BIRCH
algorithm which will make clustering more robust and efficient, especially in intrusion
detection applications.

The rest of this report is organized as follows.

• In Chapter 2, we begin with a general introduction to Misuse Detection and its approaches.

• In Chapter 3, we look at some of the most common pattern matching algorithms.

• In Chapter 4, we explain the design and architecture of the most widely used
Intrusion Detection System, Snort.

• In Chapter 5, we look at Bro.

• In Chapter 6, we list the issues with most detection systems.

• In Chapter 7, we look at the issue of ”out of order” packets in NIDS, a proposed
solution, and its proof.

• In Chapter 8, we go into Anomaly Detection Systems, starting with a general
overview.

• Chapter 9 follows with a discussion of clustering algorithms for large-scale data and
their analysis for implementation in anomaly detection systems.

• In Chapter 10, we propose a fix for ADWICE, an adaptive anomaly detection system.

• We conclude with Chapter 11.

Chapter 2

Misuse Detection

Misuse detection aims to detect well-known attacks as well as slight variations of them,
by characterizing the rules that govern these attacks. Systems based on this approach use
different models like state transition analysis or more formal pattern classification. By its
nature, misuse detection tends to have a low number of false positives but is unable to detect
attacks that lie beyond its knowledge. Some examples:
1. IP packets that exceed the maximum legal length (65535 octets)
2. /User-Agent \:[^\n]+ PassSickle/i
This is an example signature for capturing packets containing the PassSickle trojan horse.

2.1 Approaches to Misuse Detection


Misuse detection approaches can be classified into the following categories:
• Signature Analysis
• Association Rules
• State Transition Analysis
• Data Mining Approaches
Misuse detection systems have knowledge of both normal and anomalous data, and new flows are
classified into one of the two categories using one of the above-mentioned approaches. The
anomalous data is represented by signatures, as we have seen in the above example; all data
matching no such signature is considered normal.

Signature analysis or Pattern Matching is the technique of matching the data against a
predefined rule set or signatures, using any of the pattern matching algorithms, which will
be discussed in Chapter 3.

Association Rules or Expert Systems define intrusions as a set of rules and corresponding
actions, which are fired whenever a match with some rule takes place.

State Transition Analysis Here the known intrusions are defined as a deterministic finite
state machine with some end nodes; every event takes you to the next state depending upon the
transition input. Bro (refer to Chapter 5) is an example of this kind of approach, where each
match of some signature, flag, etc. triggers the correlation engine, which makes a transition
on the state machine.

Data Mining Approaches use statistical classification techniques like Naive Bayes, Decision
Trees, Neural Networks, genetic algorithms, etc. to classify new events/flows as normal or
anomalous. Being statistical, this requires some data to build up the models against which new
data is matched. Hence, some learning data, where flows are pre-labelled as normal and/or
anomalous, is fed into the machine initially to build up the models, and these learned models
are then used for further classification.

While misuse detection is the most widely deployed mechanism for NIDS, it suffers
from the following flaws, which have led to the search for more efficient techniques for
intrusion detection. Some of the limitations are:

1. Since it uses a pre-defined set of signatures, it is not able to detect new threats/intrusions.

2. Over the last few years, networks have seen a large variety of intrusions, resulting in
large signature sets that lead to a large number of false positives and require a large human
effort to optimize the signature set as per one's network needs and requirements.

3. Updating of signatures. These systems need to be regularly updated with the newest
rule set from the respective sites to combat each day's new attacks.

4. Signature obfuscation. Here an attack eludes the NIDS by exploiting the fact that a
signature does not cover all the attack instances: e.g., given a signature “blaster”, the NIDS
can easily be evaded by malicious packet[s] containing a variant string such as “mlaster”.

5. Other IDS evasion and insertion techniques. There is a large set of these, thoroughly
discussed in [12] (e.g., evading the signature matching rule set by sending an additional
packet with an arbitrary string and a low TTL in between the packets which contain the main
string; this substring prevents the matching engine from matching, but the end host still gets
affected, since it never receives the additional low-TTL packet).

6. Latency in development. These systems involve high manual involvement throughout their
life: installing, optimizing the rule set, regularly updating signatures, and checking the
alerts and hence the intrusions.

7. Association rules suffer from all of the above, with the additional overhead of the
clumsiness that sets in as the number of attributes to match keeps increasing.

But rather than looking for a superset of misuse detection able to detect every intrusion,
people instead looked at removing these limitations, which gave rise to anomaly detection
techniques, which are able to detect new intrusions and do not suffer from large-signature-set
issues (since they do not use any signature set).

Misuse detection via signature matching is the most widely accepted approach because
of the large research base, which provides a constant and updated flow of signatures,
typically within hours of a new vulnerability, exploit or worm being detected. One of
the most widely deployed tools for NIDS is Snort, which also uses this approach. A detailed
study of the Snort architecture, techniques, algorithms and code is presented in Chapter 4.

Pattern matching matches the input flow against a given set of signatures. Signatures can
match flags, IP addresses, or content in the payload (which is present in most of the
signatures). Hence, string matching algorithms like Boyer-Moore etc. are most commonly
deployed as part of these NIDS. Let us look at some of the common pattern matching algorithms.

Chapter 3

Algorithms in Misuse Detection

Here, we will be discussing some of the basic must-know algorithms of string or pattern
matching. These include
1. Simple string matching
• Boyer-Moore
• Knuth-Morris-Pratt (KMP)
2. State Machine Matching
• Aho/Corasick
3. Hardware Solutions
• Bloom Filters and Extended Bloom Filters
• NFA/DFA implementation at hardware level

3.1 Boyer-Moore


Main features
• performs the comparisons from right to left;
• preprocessing phase in O(m + σ) time and space complexity, where σ is the character set
size of the pattern;
• searching phase in O(mn) worst-case time complexity (e.g., searching for a^m in a^n);
• 3n text character comparisons in the worst case when searching for a non-periodic
pattern, and n on average;
• O(n/m) best-case performance (e.g., searching for a^m b in b^n).
The Boyer-Moore algorithm [5] uses two different heuristics for determining the maximum
possible shift distance in case of a mismatch: the “bad character” and the “good suffix”
heuristics. The first heuristic, the bad character heuristic, works as follows: if the
search pattern contains a mismatching character (one different from the corresponding
character in the given text), the pattern is shifted so that the mismatching character
is aligned with the rightmost position at which it appears inside the pattern. The second
heuristic works as follows: if a mismatch is found in the middle of the pattern, the search
pattern is shifted to the next occurrence of the matched suffix in the pattern. Both
heuristics can lead to a shift distance of m. For the bad character heuristic this is the
case if the first comparison causes a mismatch and the corresponding text symbol does not
occur in the pattern at all. For the good suffix heuristic this is the case if only the first
comparison was a match, but that symbol does not occur elsewhere in the pattern. With the help
of the preprocessed “bad character” and “good suffix” values, one finds the shift needed as
the maximum of the two.
The preprocessing for the good suffix heuristic is rather difficult to understand and to
implement. Therefore, some versions of the Boyer-Moore algorithm are found in which the
good suffix heuristic is left out, the argument being that the bad character heuristic is
sufficient and the good suffix heuristic would not save many comparisons. However,
this is not true for small alphabets.
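
A minimal Python sketch of the bad character heuristic (our own illustration, not code from
any IDS; the good suffix heuristic is omitted, as in the simplified variants just mentioned):

def bad_character_table(pattern):
    # Rightmost index of each character in the pattern.
    return {ch: i for i, ch in enumerate(pattern)}

def boyer_moore_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    last = bad_character_table(pattern)
    m, n = len(pattern), len(text)
    i = 0                                   # current alignment in the text
    while i <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[i + j]:
            j -= 1                          # compare right to left
        if j < 0:
            return i                        # all m characters matched
        # Align the mismatching text character with its rightmost
        # occurrence in the pattern, or skip past it entirely.
        i += max(1, j - last.get(text[i + j], -1))
    return -1

assert boyer_moore_search("here is a simple example", "example") == 17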

3.2 Knuth-Morris-Pratt
Main features

• performs the comparisons from left to right;


• preprocessing phase in O(m) space and time complexity;
• searching phase in O(n + m) time complexity (independent of the alphabet size).

This was a significant improvement in memory requirements over finite-state-machine based
string matching. It pre-calculates an auxiliary function π (of dimension m) which encodes
how to jump from the current state to the next state.
During the string matching process, π gives the optimum shift needed in the case of a
mismatch. The optimum shift depends on the longest prefix of the pattern which is also a
suffix of the part of the pattern matched so far.
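
A short sketch of the idea (the function names are ours): the π table below is the classical
prefix function, and the search falls back through it on every mismatch.

def prefix_function(pattern):
    """pi[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    pi = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[k] != pattern[i]:
            k = pi[k - 1]          # fall back to the next shorter border
        if pattern[k] == pattern[i]:
            k += 1
        pi[i] = k
    return pi

def kmp_search(text, pattern):
    """Yield all start indices of pattern in text in O(n + m) time."""
    pi = prefix_function(pattern)
    k = 0
    for i, ch in enumerate(text):
        while k > 0 and pattern[k] != ch:
            k = pi[k - 1]
        if pattern[k] == ch:
            k += 1
        if k == len(pattern):
            yield i - k + 1
            k = pi[k - 1]

assert list(kmp_search("abababa", "aba")) == [0, 2, 4]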

3.3 Aho-Corasick


Main Features

• performs the comparison from left to right;


• searching phase in O(n) time complexity;
• preprocessing phase has O(m|Σ|) space requirements, where Σ is the alphabet set.

The Aho-Corasick string matching automaton for a given finite set P of patterns is a
(deterministic) finite automaton G accepting the set of all words containing a word of P.
G consists of the following components:

1. a finite set Q of states
2. a finite alphabet Σ
3. a transition function δ : Q × Σ → Q ∪ {fail}
4. an initial state q0 in Q
5. a set F of final states

The transition table is built during the preprocessing part: at each state there is
information about where to jump for each character ∈ Σ. The matcher just traverses the string
to be matched, making transitions via δ, the transition function, which tells which state to
jump to for each character ∈ Σ. Whenever we reach a state ∈ F, a match is reported by the
engine. For simple single-string matching it does not perform very well, but when there are
multiple patterns, or pattern matching is done at the regular expression level, it is one of
the best options for pattern matching.
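
The following is a compact, textbook-style Python construction of such an automaton
(goto/fail/output tables built with a BFS over the pattern trie); it is an illustrative
sketch, not the implementation used by any particular IDS.

from collections import deque

class AhoCorasick:
    """Goto/fail/output automaton over a finite pattern set."""

    def __init__(self, patterns):
        self.goto = [{}]      # goto[state][char] -> next state
        self.fail = [0]       # failure link per state
        self.out = [set()]    # patterns recognized at each state
        for p in patterns:
            self._add(p)
        self._build_fail_links()

    def _add(self, pattern):
        s = 0
        for ch in pattern:
            if ch not in self.goto[s]:
                self.goto[s][ch] = len(self.goto)
                self.goto.append({})
                self.fail.append(0)
                self.out.append(set())
            s = self.goto[s][ch]
        self.out[s].add(pattern)

    def _build_fail_links(self):
        q = deque(self.goto[0].values())   # depth-1 states fail to the root
        while q:
            s = q.popleft()
            for ch, t in self.goto[s].items():
                q.append(t)
                f = self.fail[s]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]       # follow failure links upward
                self.fail[t] = self.goto[f].get(ch, 0)
                self.out[t] |= self.out[self.fail[t]]

    def search(self, text):
        """Yield (end_index, pattern) for every occurrence in text."""
        s = 0
        for i, ch in enumerate(text):
            while s and ch not in self.goto[s]:
                s = self.fail[s]
            s = self.goto[s].get(ch, 0)
            for p in self.out[s]:
                yield i, p

ac = AhoCorasick(["he", "she", "his", "hers"])
assert sorted(p for _, p in ac.search("ushers")) == ["he", "hers", "she"]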

3.4 Bloom Filters


A Bloom filter is a space-efficient randomized data structure used for concisely representing
a set in order to support approximate membership queries. The space efficiency is achieved
at the cost of a small probability of false positives. This means that a Bloom filter could
wrongly accept some entry even if it does not belong to the set under consideration. However,
wise selection of the filter's parameters can guarantee a small false positive probability.
Given a string X, the Bloom filter computes k hash functions on it, producing k hash
values ranging from 1 to m. It then sets k bits in an m-bit long vector at the addresses
corresponding to the k hash values. The same procedure is repeated for all the members
of the set; this process is called programming of the filter. The query process is similar to
programming: a string whose membership is to be verified is input to the filter, and the
Bloom filter generates k hash values using the same hash functions it used to program the
filter. The bits in the m-bit long vector at the locations corresponding to the k hash values
are looked up. If at least one of these k bits is found not set, then the string is declared
a nonmember of the set. If all the bits are found to be set, then the string is said to belong
to the set with a certain probability. Therefore, a Bloom filter can produce false positives,
where an item is accepted while it does not actually belong to the set.
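
A minimal sketch of programming and querying such a filter, assuming k salted SHA-256
digests stand in for the k independent hash functions:

import hashlib

class BloomFilter:
    """m-bit Bloom filter with k salted SHA-256 hash functions."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, item):
        for salt in range(self.k):
            digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):               # "programming" the filter
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):      # membership query
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

bf = BloomFilter(m=1024, k=4)
bf.add("blaster")
assert "blaster" in bf    # members are always accepted; non-members may
                          # rarely be accepted too (false positives only)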
Lately, there has been much improvement in this technology as well, with modifications
leading to Counting Bloom Filters, Compressed Bloom Filters, Bloomier Filters, etc.

3.5 NFA/DFA at hardware level


Sidhu and Prasanna in [18] were the first to implement NFA matching on programmable logic,
in O(n²) logic while still providing constant time per input character. They implemented a
One-Hot Encoding (OHE) scheme, where one flip-flop is associated with each state and at any
time only one is active. Combinational logic associated with each flip-flop then ensures that
this 1-bit is transferred to the flip-flop corresponding to the next state of the automaton.
To fit the existing patterns into logic, each pattern is first converted into an NFA, and each
transition is mapped onto this flip-flop structure. By taking care of the ε-transitions in the
NFA (providing the same input to the next state as well) and using LUTs for comparing the
input character, they are able to map the patterns onto FPGAs.
The reported times are impressive: for string matching over an 11 MB file, the reported CPU
time and maximum memory usage were 0.34 s and 580 KB respectively, while the same matching
done by a software DFA matching engine reported 87309.38 s and 229 MB respectively.
There have been many further modifications and advancements in this approach since this
initial effort.

Chapter 4

Snort

4.1 Introduction
Snort can perform real-time packet logging, protocol analysis, and content searching/matching.
It can be used to detect a variety of attacks and probes such as stealth port scans, CGI-based
attacks, Address Resolution Protocol (ARP) spoofing, and attacks on daemons with known
weaknesses. Snort utilizes descriptive rules to determine what traffic it should monitor and
a modularly designed detection engine to pinpoint attacks in real time. When an attack
is identified, Snort can take a variety of actions to alert the systems administrator to the
threat. In its first releases, Snort used brute-force matching, which was very slow. The
first boost to signature matching came with the implementation of the Boyer-Moore pattern
matching algorithm. Snort has come a long way since these initial implementations; we will
see some of the improvements soon.

4.2 Snort Rules


A sample Snort rule can be written as:
alert udp $EXTERNAL_NET any -> $HOME_NET 177 (msg:"MISC xdmcp
query"; content:"|00 01 00 03 00 01 00|"; reference:arachnids,476;
classtype:attempted-recon; sid:517; rev:1;)
This rule breaks down into 2 parts: the rule header (everything up to the first parenthesis)
and the rule options (everything inside the parentheses). Rule headers form the RTNs (Rule
Tree Nodes) in the Snort matching architecture, while rule options form the OTNs (Optional
Tree Nodes). How this helps in matching, we will see in the next section.

4.3 Architecture of String Matching


Snort contains a RuleList global variable which has four RTN head nodes, corresponding to
the four protocols TCP, ICMP, IP and UDP. These are the head nodes of the RTN linked lists,
and each rule in the rules file is added to the respective list. Since many rules contain the
same rule header, each RTN node contains a pointer to the head node of an OTN linked list
containing all the rules with that RTN header. Each OTN node further contains other flags
that need to be matched (e.g., the ACK flag should be set), and these checks are performed
before string matching, to avoid unnecessary pattern matching when even a flag does not
match. When the flags also match, the engine calls the stored function pointer to do any
other necessary checks and the string matching itself, using one of the string matching
algorithms.
By default, the Wu-Manber string matching algorithm is used. But Snort contains
implementations of a large number of other pattern matching engines as well, including the
Modified Wu-Manber style multi-pattern matcher, the SFK matching engine, Aho-Corasick,
optimizations of Aho-Corasick, a sparse matrix implementation of Aho-Corasick, etc. We will
discuss some of these algorithms in Section 4.6.
This RuleList is built up during the initialization of the engine. But lately the number
of rules in the Snort rule DB has exceeded the 3000 mark, such that the above-mentioned
3-dimensional linked lists [RTN, OTN, function pointers] can no longer keep up with high
network speeds. Therefore one more optimization has been made: a fast packet classification
engine that adds a fourth dimension to the above structure.
This fourth dimension is a port-based classification of rules, done before the RTN lists are
created; that is, we have a port-based classification of the rule set below the four protocols
mentioned above. The authors assume that, given the port values (of source and destination),
we can drop each rule into one of the following classes.

1. Unique Source Port

2. Unique Destination Port

3. Unique Source and Destination Port

4. Generic (source and Destination port can take “any” value)

Now each structure has a linked-list array of MAX_PORT size (64*1024). This allows O(1)
mapping of a rule, on the basis of its port value, to its appropriate list. This additional
dimension speeds up string matching, since the number of rules to be matched against the
incoming traffic is greatly reduced.
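
A sketch of this port-based pre-classification in Python (our own simplification with
hypothetical names, collapsing the four classes above into source, destination and generic
buckets; it is not the actual Snort data structure):

MAX_PORT = 64 * 1024

# One rule list per source port, per destination port, plus generic rules.
src_rules = [[] for _ in range(MAX_PORT)]
dst_rules = [[] for _ in range(MAX_PORT)]
generic_rules = []

def add_rule(rule, src_port=None, dst_port=None):
    """File a rule under its unique port(s); 'any'/'any' rules are generic."""
    if src_port is None and dst_port is None:
        generic_rules.append(rule)
    if src_port is not None:
        src_rules[src_port].append(rule)
    if dst_port is not None:
        dst_rules[dst_port].append(rule)

def candidate_rules(pkt_src_port, pkt_dst_port):
    """O(1) array lookups: only these rules go through RTN/OTN matching."""
    return src_rules[pkt_src_port] + dst_rules[pkt_dst_port] + generic_rules

add_rule("xdmcp query rule", dst_port=177)
assert candidate_rules(40000, 177) == ["xdmcp query rule"]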

4.4 Working Model of the code


Snort’s architecture is focused on performance, simplicity, and flexibility. There are three
primary subsystems that make up Snort: the packet decoder, the detection engine, and the
logging and alerting subsystem.
These subsystems ride on top of the libpcap promiscuous packet sniffing library, which
provides a portable packet sniffing and filtering capability. Program configuration, rules
parsing, and data structure generation takes place before the sniffer section is initialized,
keeping the amount of per packet processing to the minimum required to achieve the base
program functionality.

4.5 Some More about Snort Powers
4.5.1 Preprocessors
With the arrival of the term ”Anomaly Detection”, there was a high demand for it in Snort
as well. Since rule-based matching is done in the detection engine, protocol anomaly
detection and many other functionalities which are independent of rules come under this
category. Also, each added preprocessor demands more processing time, affecting the
main strength of Snort, i.e., fast rule-based matching. Hence, the authors added this
functionality as modular “plug-ins”, similar to modules in the Linux kernel, which can
be deactivated whenever they are not needed or when they are affecting Snort's performance.
Preprocessors are pluggable components of Snort, introduced since version 1.5. They are
“located” just after the protocol analysis module and before the detection engine, and do not
depend on rules. They are called whenever a packet arrives, but just once; the detection
plugins, on the other hand, do depend on rules and may be applied many times for a single
packet. SPPs (Snort Preprocessors) can be used in different ways: they can look for a
specific behavior (portscan, flow-portscan), serve as support for further analysis (like
flow), or just collect certain information, like perfmonitor.
Hola Anonimo has given a very basic tutorial [1] on how to write a preprocessor
plugin for Snort. Some of the major achievements or goals of preprocessors were:
• to decrease the number of false positives,
• to add anomaly detection techniques to Snort, and, last but not least,
• to improve pattern matching when a pattern extends over multiple segments or
fragments.
Now we will look at the last two of the above-mentioned achievements.
Anomaly Detection Anomaly detection preprocessors include both protocol anomaly
detection (via protocol-specific preprocessors like arpspoof, telnet decode, etc.) and
the more advanced statistical approaches to anomaly detection via the Spade plugin
(a brief description is given in Appendix A).
Pattern Matching over Multiple Packets This is achieved through the Stream4 and Frag2
preprocessors. The former adds TCP statefulness and session reassembly, so that
connection status and information can be stored, providing more information on alerts,
removing unnecessary checks, and also checking patterns which extend over multiple
packets. The latter preprocessor prevents IDS evasion and insertion via fragmented
packets [15].

4.5.2 Inline Mode


Inline mode is an optional mode of Snort which actually increases its effective processing
speed. At this level Snort works alongside iptables: each packet is first accessed by Snort
and then passed to the Linux kernel, preventing the significant overheads involved in kernel
processing in cases when the packet needs to be dropped. Snort inline obtains packets from
iptables instead of libpcap and then uses new rule types to help iptables pass or drop
packets based on Snort rules.

4.6 Multi-Pattern String Matching Algorithms in Snort
1. Boyer-Moore
2. Wu-Manber
3. Aho-Corasick
4. Sparse matrix with Aho-Corasick
5. Tries in SFKSearch

4.6.1 Boyer-Moore Multi-pattern String Matching


This is the same algorithm we discussed in Section 3.1, except that the patterns are quite
large in number rather than just one; matching here is done sequentially for each pattern.

4.6.2 Wu-Manber Multi-pattern String Matching


This is the default string matching algorithm used in Snort. It improves over Boyer-Moore
in 2 aspects (assuming all the patterns are of the same length and each pattern is broken
into substrings of the same length; e.g., if the patterns are of length m and k in number,
we form fragments of length b = log(mk); in practice, however, b = 2 or 3):
• A SHIFT table, a b-byte shift table preprocessed during initialization (here all
possible b-strings over the given alphabet are considered). Hence blocks of characters
are matched, by mapping them to unique integral values. It is used to determine how many
characters in the text can be shifted (skipped) as the text is scanned. Only when a shift
value indicates a matched fragment of a pattern (i.e., value ≤ 0) is pattern matching done.

• Rather than matching all the patterns, the authors exploited the power of hash
functions: a HASH table is built initially and all patterns are filed under the appropriate
table entry. The building of the hash table is quite interesting here, since the first
b-length substring from the prefix of each pattern is used to calculate its hash value.
An ambiguity arises when the SHIFT table reports a match but there is no entry in the hash
table (since the hash depends only on the first b-character substring of the pattern); in
that case only 1 character is skipped.
The reported matching times are nearly two times faster than GNU grep. The scanning
operation was also shown to be O(bN/m), where N is the size of the input.
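
A much-simplified, illustrative sketch of the SHIFT/HASH interplay (ours, not Snort's code:
whole b-character blocks are used as dictionary keys instead of compressed table indices,
and candidate patterns are filed under the block that ends their length-m window):

def build_tables(patterns, b=2):
    """SHIFT and HASH tables over the first m characters of each pattern."""
    m = min(len(p) for p in patterns)
    default = m - b + 1                     # block absent from every pattern
    shift, hash_table = {}, {}
    for p in patterns:
        for i in range(m - b + 1):
            block = p[i:i + b]
            shift[block] = min(shift.get(block, default), m - b - i)
        hash_table.setdefault(p[m - b:m], []).append(p)
    return m, default, shift, hash_table

def wu_manber_search(text, patterns, b=2):
    m, default, shift, hash_table = build_tables(patterns, b)
    pos = m - 1                             # last index of the scan window
    while pos < len(text):
        block = text[pos - b + 1:pos + 1]
        s = shift.get(block, default)
        if s > 0:
            pos += s                        # skip ahead: no fragment matched
            continue
        for p in hash_table.get(block, []): # shift == 0: verify candidates
            start = pos - m + 1
            if text.startswith(p, start):
                yield start, p
        pos += 1                            # ambiguity case: skip 1 character

assert list(wu_manber_search("whatshell", ["hello", "she"])) == [(4, "she")]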

4.6.3 Aho-Corasick Multi-pattern Matching


The Aho-Corasick implementation first forms one combined DFA for all the patterns. Since
this is preprocessed during the initialization part, there is no overhead of DFA formation
per pattern, and no traversal of individual patterns or pattern sets: for each new input
character we just take one step. But the memory overheads are huge. Also, the state held at
each step is large, because there are multiple copies of active DFAs: a new DFA instance gets
activated at each new input character besides the already existing ones. Of course some die
out as well, but the difference is huge.
The power of the algorithm is that it is unaffected by variance in the size of the patterns,
and its worst-case and average-case performance are the same.

4.6.4 Aho/Corasick with Sparse Matrix Implementation


The enhanced design based on Aho-Corasick uses an optimized vector storage design for
storing the transition table. This memory-efficient variant uses sparse matrix storage to
reduce the memory requirements and further improve performance on large pattern groups. The
author [13] has reported 1.2 to 1.7 times faster performance with the sparse-matrix version
and 1.5-2.5 times with the full-matrix version.

Sparse-Row format
Vector: 0 0 0 2 4 0 0 0 6 0 7 0 0 0 0 0 0
Sparse-Row Storage: 8 4 2 5 4 9 7 11 7

Now, for each DFA state, rather than keeping a 256-entry vector of which most entries are 0,
we use sparse rows to represent each transition element together with its corresponding
value. Clearly we can no longer have O(1) transition time in this implementation, since we
need to traverse the new vector to find the transition element, but the memory requirements
go down by a factor of four, which is quite significant. Some other compact representations
are also discussed by the author, namely the Compressed Sparse Vector format, Banded-Row
format and CSR Matrix format.
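
A toy Python sketch of this tradeoff (ours, not the Snort code): the dense row gives O(1)
indexed lookup, while the sparse (index, value) list trades lookup time for memory:

def to_sparse_row(dense_row):
    """Keep only (index, value) pairs for non-zero transitions."""
    return [(i, v) for i, v in enumerate(dense_row) if v != 0]

def sparse_lookup(sparse_row, symbol):
    """A linear scan replaces the O(1) dense array index."""
    for index, value in sparse_row:
        if index == symbol:
            return value
    return 0    # no stored transition for this symbol

dense = [0, 0, 0, 2, 4, 0, 0, 0, 6, 0, 7, 0, 0, 0, 0, 0, 0]
sparse = to_sparse_row(dense)       # [(3, 2), (4, 4), (8, 6), (10, 7)]
assert sparse_lookup(sparse, 8) == 6 and sparse_lookup(sparse, 5) == 0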

4.6.5 SFKSearch using Tries


The term trie comes from the word ”retrieval”. A trie is a k-ary position tree, constructed
from input strings: the input is a set of n strings S1, S2, ..., Sn, where each Si consists
of symbols from a finite alphabet Σ and has a unique terminal symbol $. This algorithm is
used for low-memory situations in Snort. The algorithm builds a trie. Each level in the trie
is a sequential list of sibling nodes that contain a pointer to matching rules, a character
that must be matched to traverse to their child node, and a pointer to the (next) sibling
node. The algorithm uses a bad character shift table to advance through the search text until
it encounters a possible start of a match string, at which point it traverses the trie
looking for matches. If there is a match between the character in the current node and the
current character in the packet, the algorithm follows the child pointer and increments the
packet's character pointer; otherwise it follows the sibling pointer until it reaches the end
of the list, at which point it recognizes that no further matches are possible. In the case
that matching fails, the algorithm backtracks to the point at which the match started, and
then considers matches starting from the next character in the packet.
While its worst-case performance is quite poor in comparison to Aho-Corasick, its low
memory requirements make it an appropriate substitute at times.
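
A minimal reconstruction of the sibling-list trie traversal described above (our own sketch,
without the bad character shift table that the real SFKSearch adds on top):

class TrieNode:
    """Sibling-list trie node: child/sibling pointers keep memory low."""
    def __init__(self, ch):
        self.ch = ch
        self.child = None      # first child (next pattern character)
        self.sibling = None    # next alternative at this level
        self.rule = None       # rule/pattern ending at this node, if any

def insert(root, pattern):
    node = root
    for ch in pattern:
        cur, prev = node.child, None
        while cur is not None and cur.ch != ch:   # walk the sibling list
            prev, cur = cur, cur.sibling
        if cur is None:
            cur = TrieNode(ch)
            if prev is not None:
                prev.sibling = cur
            else:
                node.child = cur
        node = cur
    node.rule = pattern

def match_at(root, text, start):
    """Return the patterns matching at text[start:]."""
    found, node, i = [], root.child, start
    while node is not None and i < len(text):
        if node.ch == text[i]:
            if node.rule is not None:
                found.append(node.rule)
            node, i = node.child, i + 1           # descend to the child
        else:
            node = node.sibling                   # try the next sibling
    return found

root = TrieNode(None)
for p in ("he", "hers", "his"):
    insert(root, p)
assert match_at(root, "hers", 0) == ["he", "hers"]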

Chapter 5

Bro

Bro [2] is a Unix-based Network Intrusion Detection System (IDS). Bro monitors network
traffic and detects intrusion attempts based on the traffic characteristics and content. Bro
detects intrusions by comparing network traffic against rules describing events that are
deemed troublesome. These rules might describe activities (e.g., certain hosts connecting
to certain services), which activities are worth alerting on (e.g., attempts to a given
number of different hosts constitute a “scan”), or signatures describing known attacks or
access to known vulnerabilities. If Bro detects something of interest, it can be instructed
to either issue a log entry or initiate the execution of an operating system command. The
main aim of this IDS is to combat two major shortcomings of the Snort engine, namely the
high false alarm rate and the string matching time. For the former, they designed the
concept of context-based pattern matching, where additional context is provided by:
1. Regular expressions for signatures rather than strings.
2. Providing the alert engine a notion of connection state and knowledge.
In their design, for every matched pattern or rule, rather than generating an alert, an
event is generated and passed to another component, called the policy script component,
which at an abstract level correlates these events to find the possibility of an attack. But
matching a large number of patterns each time is quite intensive, especially with two engines
running simultaneously. To combat this problem, they implemented DFA matching as the pattern
matching algorithm, which also provides additional strength to their patterns, since the
patterns are now more robust against false positives. Since DFA construction has quite large
memory requirements, they used the approach of on-the-fly generation of the DFA given in [7],
and also implemented a memory-bounded DFA engine, in case there is an algorithmic attack on
the engine itself, so as not to affect the other engine.
They have compared their approach with Snort and reported some interesting results:
• The reported matching time was quite similar between the without-cache implementation of
the Bro engine and Snort.
• The alerts and signatures in Bro were much more informative than Snort's, eliminating a
large number of false positives.
• Because of their context-based matching engine, they have an inbuilt capability to fight
back TCP reassembly and fragmentation issues.

An important question, then, is why Snort is the most widely used tool. There is no
definitive answer available anywhere, but the following arguments are my own inferences:

• With its efficient string matching algorithm implementations, Snort outperforms Bro in
running time by a large margin.
• Snort has a large and regularly updated signature database, which is the most important
reason for its usage.
• Even though Bro signatures are more context-specific, without regular updating of
signatures and proper categorization (with the ever-increasing signature set), the
performance goes down.
• Bro's memory requirements are quite high, since it uses a DFA matching engine.

Chapter 6

Issues with Pattern Matching

Beyond the choice of pattern matching algorithm, there are a lot of other issues that need
to be considered. Of course, fast matching is the natural requirement, but other issues must
be kept in mind, like fighting false positives: for example, the payload may contain a
pattern for a buffer overflow attack via the telnet application protocol, but what if there
was no active telnet session between the two hosts? Another issue: what if a pattern is split
over multiple packets? Some of the issues with respect to the choice of algorithm, and the
limitations of signature matching, are stated below.

• Memory vs speed
• Signature format
• Session-based and application-level signature matching
• State holding issues in cases of patterns extending over multiple packets
• Packet fragmentation issues
• Getting packet dumps or testing data sets (other than attack tools and the DARPA set)

One always needs to compromise between memory requirements and speed. As we can see among
the existing algorithms, Aho-Corasick provides O(1) per-character pattern matching but
requires quite large memory for storing the state machine, while other string matching
algorithms such as Boyer-Moore can degrade to O(mn) time in case of algorithmic attacks.
One must trade one off against the other depending upon the constraints.
Most IDSs, except a few, use byte- or character-based strings as the pattern presentation
format. This is natural, since the most common algorithms used are Boyer-Moore, KMP, etc. But
if state-machine matching is deployed, a regular expression can provide a better pattern,
more informative and more unique to the attack it identifies. Moreover, most Snort rules
contain multiple patterns with different offset and depth values, which can be expressed
very well in a single regular expression using basic constructs like . and *; [19] provides
some examples. Bro, too, keeps its patterns in regex (regular expression) format.
[19] also discusses stateful packet matching, where the IDS stores information about the
context of the traffic between two peers, providing more effective pattern matching results;
but the overheads involved are massive, because of the content-specific information that
needs to be stored for a large number of flows. On top of this, one can also provide
application-level pattern matching for even better results.
One of the most important issues with IDS systems is the state holding issue, i.e., the
amount of information that needs to be stored for each flow passing through the IDS. In the
case of pattern matching over individual packets, this is of no concern, since it does not
even come into the picture. But with the advent of attacks split over multiple packets,
pattern matching has become packet stream matching: patterns must now be matched across
multiple packets, demanding more memory for storing information about session flows and
packets in flight, partially matched patterns, other flow-specific data structures, etc.
Although there is a Snort preprocessor, Stream4, to counter this issue, the same problems
exist with this plugin as well. For how long does the information need to be stored before
being dropped? It should not be the case that the IDS declares a timeout and drops the
session information while the destination host still keeps waiting, or vice-versa. And what
is the maximum number of sessions that can be stored, given that the information to be
stored can vary from flow to flow?
Continuing the above discussion, the issue of fragmented packets [9], [12], [15], [14]
complicates the situation even more, since several new issues come into the picture:
• out-of-order arrival of TCP segments;
• re-transmitted segments;
• overlapping TCP packets, and hence issues with reassembly;
• missing fragments, or losing the state of the connection while the connection is still
alive;
• how much data should be buffered (the TCP window);
• varying TTLs of fragments for evasion of the NIDS: if the NIDS believes a packet was
received when in fact it did not reach the end-system, then its model of the end-system's
protocol state will be incorrect. If the attacker can find ways to systematically ensure
that some packets will be received and some not, the attacker may be able to evade the
NIDS.
The authors of [17] have examined the character and effects of fragmented IP traffic as
monitored on highly aggregated internet links. They measured the amount of fragmented
traffic in normal internet traffic and characterized and classified it by statistics,
protocol and application layer. They show that the amount of fragmented traffic at internet
links is less than 1%, but there are two caveats: first, they measured at the internet level
with good connection speeds, and second, traffic may be fragmented specifically as part of
an attack. These issues raise new questions beyond the existing ones. Because different
operating systems have unique methods of fragment reassembly, if an intrusion detection
system uses a single “one size fits all” reassembly method, it may not reassemble and
process the packets the same way the destination host does. An attack that successfully
exploits these differences in fragment reassembly can cause the IDS to miss the malicious
traffic and fail to alert. Much of this has been solved in existing tools heuristically; the
above-mentioned papers themselves discuss several solutions. Snort even contains a
preprocessor plugin, Frag2, for most of these issues, with some assumptions, e.g., if the
next fragments do not arrive within 30 seconds, the packet is dropped; and one can (and
needs to) specify the end host's OS so that OS-specific reassembly is done for that session.
Some tools even use bifurcating analysis [12]: if the NIDS does not know which of two
possible interpretations the end-system may apply to incoming packets, it splits its
analysis context for that connection into multiple threads, one for each possible
interpretation, and analyzes each context separately from then on. Some other methodologies
are discussed in the same paper.
Finally, one of the major issues we have come across is the testing of existing approaches.
The MIT DARPA datasets exist, but there are two problems with them: firstly, they contain
very few attacks, and secondly, they are from the 1998-99 period, and attack technologies
have advanced a lot since then. Even the attack tools are too specific: they produce
individual attacks rather than generic traffic with attack packets embedded in between.
Recently, [16] designed a new tool for IDS testing, AGENT, which, other than producing
”pattern strings”, also generates other types of traffic, like the kinds described in [12];
but then, it too is always synthetic.

Chapter 7

Issue of Out of Order packets

In the last chapter, we saw some of the limitations of existing NIDS; the handling of
“out of order” packets is one of them. In most current implementations of intrusion
detection systems, out-of-order packets need to be stored until all the fragments/segments
have been received; only then is the packet re-assembled and transmitted to the destination.
Since this involves temporary storage of the fragments, one can easily evade the IDS by
constant bombardment with never-ending fragments. Currently, IDSs handle this issue by
limiting the number of fragments per flow and by setting a timeout value for each fragmented
packet, so that it is dropped as soon as the timeout has passed after the packet's arrival.
Since logging the packets (and hence blocking network buffers) also affects the other
modules of the system, we propose a solution in which one need not store the out-of-order
packets, i.e., as we keep getting fragments they are pushed to the destination instantly.
But we make one assumption: the fragment size should always be greater than the largest
signature in the signature set.

7.1 Solution
Consider the Aho-Corasick algorithm of pattern matching, which involves building a
deterministic finite automaton from the signature set and then traversing this automaton
(we call it simply the DFA) over the incoming traffic payload. Now consider another
automaton, which we call the RDFA, defined over a new signature set formed by reversing all
the signatures of the original signature set. The RDFA is constructed exactly like the
original DFA, just using the reversed signature set.

How this works

We claim that using the two DFAs we can do the matching (under the above assumption)
without storing the fragmented packets. For each input packet payload, make the transitions
on the original DFA, and for the reverse of the packet payload, make the transition jumps on
the RDFA. Now store pointers to the intermediate states of both DFAs (these are stored
anyway in stream-based pattern matching methodology). When the next fragment comes, we move
on the respective DFAs from the stored states. There are the following possible cases:

• Fragments are in order

• We have seen fragments up to sequence n and now some fragment with sequence n + i
(where i > 1) arrives.

• We have seen fragments with sequence numbers n and n + 2. Now comes the fragment with
sequence number n + 1.

Implications of the Assumption


Our assumption, that the fragment payload size is greater than the largest signature size in
our signature set, implies that no signature (if it exists in the flow) will extend across
more than 2 fragments.
Proof: Let the largest signature size be k; then every fragment payload size is ≥ k bytes.
Suppose a pattern starts matching at index i inside fragment 0 (any fragment n could be
taken; 0 is used without loss of generality), where i ≤ s0 − 1 for a fragment of size s0.
The pattern extends at most to index i + k − 1 ≤ (s0 − 1) + (k − 1) < s0 + s1, since the
next fragment's size s1 is ≥ k. Hence, if a matching pattern exists, it lies entirely within
two consecutive fragments.

Handling the 3 cases


Case 1:
Here the matching in the RDFA is redundant, since normal DFA transition analysis also works
in this case.

Case 2:
Here we store the pointers into the respective DFAs for sequence n, and for packet n + i we
start the transitions from state 0 on both of them again.

Case 3:
Since we keep storing the pointers into both DFAs for each flow (if packets are out of
order), when we get a packet whose sequence number immediately follows one of the packets
already seen, we start making transitions from the stored states.

So if there exists a possible match for a signature, it will be reported. For example, if it
starts somewhere in packet n and ends in n + 1, and n is seen earlier, then it will be
matched by the DFA transitions; while if n + 1 is seen earlier, the RDFA, moving in the
reverse direction, ensures that the match does take place and a notification is sent to the
appropriate action-taking engine.

7.2 Example
Let us consider an example:
Signature Set = {"hello", "she"}
Reverse Signature Set = {"olleh", "ehs"}
Stream flow (payloads of packets) = {"whatshel", "lomg"}

Now, the DFAs will be as shown in the figures below:

Figure 7.2: DFA and RDFA (respectively) for the above example.

Now, if the first packet comes first, the DFA will report the match for "she" and will be in
state 3, while the RDFA will still be in state 0. As the second packet arrives, the DFA will
report another match (for "hello") as it crosses the 'o' in the payload, and at the end it
will be in state 0. Otherwise, if the second packet comes first, the DFA will be in state 0,
while the RDFA will be in state 2 (having matched "ol", a prefix of "olleh"). Then, as the
first packet arrives, the RDFA will report the matches for both signatures and ends up in
state 0, as does the DFA.
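
To make the mechanics concrete, here is a small Python simulation of the idea on this example
(our illustration only: automaton states are represented as matched-prefix strings rather
than node pointers, and the helper names are ours):

def step(patterns, state, ch):
    """Advance one character; state is the longest suffix of the consumed
    stream that is still a prefix of some pattern."""
    s = state + ch
    matches = [p for p in patterns if s.endswith(p)]
    while s and not any(p.startswith(s) for p in patterns):
        s = s[1:]
    return s, matches

def feed(patterns, state, payload):
    found = []
    for ch in payload:
        state, m = step(patterns, state, ch)
        found += m
    return state, found

signatures = ["hello", "she"]
reversed_sigs = [p[::-1] for p in signatures]       # ["olleh", "ehs"]

# Out-of-order arrival: fragment 2 ("lomg") comes first. The RDFA consumes
# its payload reversed; the payload itself need not be stored.
rstate, _ = feed(reversed_sigs, "", "lomg"[::-1])   # rstate == "ol"
# Fragment 1 ("whatshel") arrives; resume the RDFA from the saved state.
rstate, found = feed(reversed_sigs, rstate, "whatshel"[::-1])
assert sorted(found) == ["ehs", "olleh"]            # i.e. "she" and "hello"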

7.3 Limitations
• The assumption is itself a limitation. While [17] shows that fragmented packets are
quite rare, so our assumption will hold in most cases, we still regard it as a limitation.

• Snort with one DFA ends up using around 58 MB of memory for the DFA; with two DFAs this
almost doubles. So the saving in network buffers turns into a requirement for more memory.

We looked at different ways of optimizing the huge memory requirements of our proposed
solution, like merging the 2 DFAs, keeping 2 transition tables rather than 2 DFAs, using
suffix trees, etc., but none of them worked: some are inefficient in terms of matching
speed, while others can lead to wrong results.

Chapter 8

Anomaly Detection

Anomaly detection is a key element of intrusion detection, in which perturbations of
normal behavior suggest the presence of intentionally or unintentionally induced attacks,
faults, defects, etc. Anomaly detection approaches build models of normal data and detect
deviations from the normal model in observed data. Anomaly detection applied to intrusion
detection and computer security has been an active area of research since it was originally
proposed in 1987. Most anomaly detection algorithms require a set of purely normal data to
train the model, and they implicitly assume that anomalies can be treated as patterns not
observed before. Since an outlier may be defined as a data point which is very different
from the rest of the data based on some measure (which can be distance-based or
density-based), this field has seen a large number of clustering algorithms from the fields
of databases and data mining being employed.
Some of the commonly employed algorithms belonging to this class are Nearest Neighbor (NN),
Distance to the k-th Nearest Neighbor, Mahalanobis-distance Based Outliers, Density Based
Local Outliers (LOF) [3], Unsupervised Support Vector Machines, and Balanced Iterative
Reducing and Clustering using Hierarchies (BIRCH). [10] discusses all of these in good
detail, with their relative pros and cons and their performance on the DARPA [8] data set as
well as a real data set, indicating that LOF is the best among these approaches. One of the
best-known anomaly detection tools, MINDS [6], has also used this approach in its NIDS.
Some popularly used anomaly detection tools/products:

• MINDS (Minnesota Network Intrusion Detection System), using the LOF approach for its
learning model.

• ADWICE [4], in collaboration with Safeguard, using the BIRCH clustering algorithm.

• LANCOPE (a commercial behavioral network anomaly detection product).

• The SPADE plug-in for the open source IDS Snort, which inspects recorded data for
anomalous behavior based on a computed score.

8.1 Approaches to Anomaly Detection
Anomaly detection involves two parts, namely building the normal profile of the network
and scoring new flows on a scale of 0 to 100 (0 being normal, 100 being anomalous).
Building the normal profile can be done in one of two ways:

1. Using one of the clustering algorithms like BIRCH etc., where all the learning data
points are clustered first; when new data comes, it is tested for the possibility of
dropping into one of the clusters, else it is declared an outlier (see the sketch after
this list).

2. Using measures such as Local Outlier Factor, Nearest Neighbour, etc. Here, rather than
clustering the data points, some features or statistics are calculated over each point;
when new data arrives, it is matched against the nearest data points (where "nearest" is
also defined by these measures) and scored as normal or anomalous.
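
As a toy illustration of the first approach (entirely ours: the features, radius threshold
and linear scoring rule are hypothetical, not taken from ADWICE or MINDS), a new flow can be
scored by its distance to the nearest learned cluster centroid:

import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def score_flow(flow, centroids, radius):
    """Score a new flow in [0, 100]: 0 inside the nearest normal cluster,
    growing toward 100 as it moves away from all clusters."""
    d = min(euclidean(flow, c) for c in centroids)
    if d <= radius:
        return 0.0                      # fits an existing normal cluster
    return min(100.0, 100.0 * (d - radius) / radius)

# Hypothetical 2-feature flows (e.g. packets/s, mean payload size),
# with centroids learned from normal traffic.
centroids = [(10.0, 200.0), (50.0, 800.0)]
print(score_flow((12.0, 210.0), centroids, radius=25.0))   # ~0: normal
print(score_flow((400.0, 60.0), centroids, radius=25.0))   # 100: outlier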

The latter techniques have mainly been deployed in anomaly detection systems at the system
level, where intrusion on a host is the prime concern; some have tried to use them at the
network level as well. But they have the shortcoming that they require storing all the data
points from the learning data, and when new data points arrive, heavy computation over both
the existing data and the new data point is required to get good results. Efficient data
structures are deployed to avoid these computations for the existing data points, but at
least for new data points the computations remain quite heavy.
As for the former approach, clustering for anomaly detection, we look at some of the
algorithms and their scalability issues in the next chapter.
Some of the limitations of anomaly detection systems are:

1. They work best when you have properly labelled normal data. Now, normal is
defined as the regular traffic features of the network. Gathering or Capturing the
normal data for a network is not feasible always because of variety of reasons like
toplogy of network, what if some attack or scan going on while assuming normal
data etc.

2. Since, these are based on statistical analysis, false positives rate are much higher than
pattern matching techniques.

3. The algorithms and techniques used are not scalable and fast enough to keep up with the gigabit network requirements of these days. They are not fast enough because the statistical processing involves heavy computations on each incoming packet, and a large feature set (which implies high dimensional data) makes computation even more expensive. Scalability is an issue since these systems depend on network traffic behavior, and today's networks have diverse and differing requirements at times.

4. Selection of features for defining network behavior from the packets is still a developing area. A proper feature set that can be said to completely define network behavior is still not available.

5. Detection of application level exploits by network anomaly detection systems is still in a developing phase (i.e. no product does this); hence new buffer overflow, SQL injection or similar exploits remain undetectable by these systems (since the most commonly defined features capture network behavior from the headers or flags of packets).

6. Lack of adaptiveness to changing network behavior.

People do try to provide solutions for these various shortcomings. In the case of normal data, one can use a pattern matching engine to detect network attacks, scans etc., if any are going on, and then collect the data over long periods of time, since networks may have different requirements at different times of day or on different days of the week.

Event correlation engines have been developed for correlating the various events/alerts once a threshold or rule is found to be violated. But rather than looking for a superset of misuse detection able to detect every intrusion, people instead looked at removing its limitations, which gave rise to anomaly detection techniques, which are able to detect new intrusions and do not suffer from large signature set issues (since they do not use any signature set).

People have also tried to work on the expensive (in CPU and memory terms) and time-consuming computation of these systems. Techniques such as SVD (Singular Value Decomposition) and PCA (Principal Component Analysis) have been applied, which reduce the dimensionality of the data in such a way that the results do not vary much from those obtained with the complete set of dimensions. Also, clustering algorithms have been proposed which work by exploring the "dense" sub-dimensions of the data rather than working on the large data set in full dimensionality, and the results are positive.

ADWICE has looked at the last flaw and developed a system which is adaptive to network behavior. Their clustering algorithm is a very popular scalable clustering algorithm from the databases field; let us look at some of the clustering algorithms with their minuses and plusses. When looking for a clustering algorithm for anomaly detection, we want one with the following properties:

• Clustering should be unique, implying that the clusters returned as output should be independent of the order of the input data points.

• Clustering should be as accurate as possible. With the distance based approach, accuracy comes at the cost of time, while the density based approach (where the tradeoff is large memory requirements) provides much better accuracy and can be deployed to handle all cases: data with all clusters of equal size, data with some very dense and some sparse clusters, data with some dense and small clusters alongside some large and sparse ones, and vice-versa, etc.

• The clustering algorithm should be adaptive, that is, new data points can be fed into the appropriate clusters and/or clusters can be modified even at testing time.

• It should be able to classify new input points as normal or anomalous efficiently and fast (keeping in mind gigabit requirements).

• (This is optional.) Having a memory and space efficient clustering algorithm would be helpful for converting the product into an inline one.

Chapter 9

Clustering Algorithms for Anomaly Detection

While a lot of research has been going on in the databases and data mining fields on large-scale, scalable and efficient clustering algorithms, in anomaly detection we have the additional requirement of speed, which must be much greater than in database applications.
Clustering algorithms can be broadly classified into three classes:

1. Partitioning Algorithms (e.g. K-Means)

2. Grid Based Algorithms (e.g. CLIQUE)

3. Hierarchical Algorithms

• Agglomerative (e.g. BIRCH)

• Divisive

(Density based algorithms such as DBSCAN, discussed below, are sometimes treated as a separate class of their own.)

Partitioning approaches try to optimize a function such that the space is divided into k partitions and each point lies in the best possible partition. Grid based approaches slice the n-dimensional space into small cells and then form the dense clusters (this even helps when working with dimensionality reduction). Divisive hierarchical clustering algorithms group all data points into the same cluster initially and then keep partitioning it as dense clusters start forming; agglomerative algorithms work the other way around, starting from individual points and merging. Let us look at some of the clustering algorithms.

9.1 BIRCH - Balanced Iterative Reducing and Clustering using Hierarchies
BIRCH [21] is one of the fastest running clustering algorithms, with running time of order O(N). It is divided into 4 phases, of which the last three are optional and serve only to fine tune the phase 1 clustering. The following constants are fed to the algorithm:

• T, the threshold on cluster size.

• B, the branching factor of the tree.
• P, the memory size available to the process.

• L, the maximum number of clusters at each leaf node.
It maintains a balanced tree structure (similar to a B+-tree) with each internal node having a maximum of B children. All the clusters are at the leaf nodes of the tree. Initially the tree is empty and, say, T = 0. As new data points keep coming, the algorithm traverses the tree to find the appropriate leaf node where each point can fit, and then looks for the best match among the clusters in that leaf node. If the point can fit into one of the clusters, it is inserted there; else a new cluster is formed. The fit of a data point is decided by a distance based measure (which can be Manhattan, Euclidean etc.), and the cluster statistics are updated after insertion. If the formation of a new cluster increases the leaf's cluster count beyond L, the leaf is split into 2 leaves with a parent above them, and the clusters are assigned to the appropriate leaf nodes. Also, if at some point the memory cap P is reached, then T is increased so that the cluster sizes grow and more points can fit into each cluster, thereby reducing the cluster count and freeing up some memory. A sketch of the per-point insertion step follows.
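The following is a minimal sketch in Python of this leaf-level bookkeeping, assuming Euclidean distance; the CF tree routing, the node split at L, and the threshold adjustment are elided, and all names are illustrative.

    import numpy as np

    class ClusterFeature:
        # CF = (n, LS, SS): point count, linear sum and squared sum.
        # These suffice to maintain the centroid and radius incrementally.
        def __init__(self, x):
            self.n, self.ls, self.ss = 1, x.astype(float).copy(), float(x @ x)

        def centroid(self):
            return self.ls / self.n

        def radius(self):
            # Root mean squared distance of the points from the centroid.
            c = self.centroid()
            return np.sqrt(max(self.ss / self.n - c @ c, 0.0))

        def add(self, x):
            self.n += 1
            self.ls += x
            self.ss += float(x @ x)

    def insert(leaf_clusters, x, T):
        # Fit x into the closest cluster if it lies within threshold T,
        # else open a new cluster (a full BIRCH would split the leaf once
        # it holds more than L clusters).
        if leaf_clusters:
            closest = min(leaf_clusters,
                          key=lambda c: np.linalg.norm(c.centroid() - x))
            if np.linalg.norm(closest.centroid() - x) <= T:
                closest.add(x)
                return
        leaf_clusters.append(ClusterFeature(x))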
Positive points of this algorithm:
• Running time is O(N), which is much better than other algorithms; the additional phases also clean up some of the errors from phase 1.
• Memory efficient (hence it can easily be built into an inline product).
• Because of the efficient tree data structure, classifying new data points is easy.
Negative points of this algorithm:
• The clustering is not unique, i.e. the clustering results depend upon the order of the data points. This is because two small dense clusters can be joined into one cluster if the data points arrive alternately from the two clusters, or in some similar order (assuming T is large enough to encapsulate both clusters).
• It uses distance based measures for all calculations, which are known to be less accurate when clusters with different densities and sizes exist.
• Some data points may be classified into wrong clusters because of the limitations of distance based measurement.
• It requires a large number of input parameters.
• The clusters formed are spherical, which may lead to a large number of false positives.

9.2 DBSCAN - Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise
DBSCAN [11], an O(N log N) time clustering algorithm, uses density rather than distance based formulas as the similarity measure between data points. It iterates once over all the data points for all the clusters, with an additional O(log N) lookup in each step, making it an O(N log N) algorithm. Compared to BIRCH, it has just two input parameters, namely

1. Eps - the distance measure within which one should look for a point's neighbours.

2. Minpts - the number of points which must lie within the Eps-neighbourhood of a point for it to be a core point.

For each point, first find the Eps-neighbourhood of that point (this step takes O(log N) time, using efficient R∗-trees); if there exist more than Minpts data points within this region, then the point is declared a cluster of its own and assigned a new clusterID. One might think that each point which has more than Minpts data points within its neighbourhood would thus form a separate cluster; this is not true, though, since the paper also defines a merging mechanism for clusters which should form one cluster, i.e. density reachability and density connectedness for a pair of clusters. In the step where the neighbourhood is found and the clusterID is assigned, one more loop is run over each of the points in the neighbourhood, checking whether they also form clusters of their own; if yes, the clusters are merged based on the definition of reachability. This step recurses (each of the merged cluster's points checks its own points, and hence their cluster possibilities) and, using the definition of connectedness, clusters keep being merged until no more clusters can be merged. A sketch of this procedure follows.
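A minimal sketch of this procedure in Python is given below, using a brute-force neighbourhood query for clarity (the paper attains the O(log N) per-query cost with R∗-trees); all names are illustrative.

    import numpy as np

    UNVISITED, NOISE = 0, -1

    def region_query(data, i, eps):
        # Brute-force Eps-neighbourhood; an R*-tree would make this O(log N).
        return [j for j in range(len(data))
                if np.linalg.norm(data[i] - data[j]) <= eps]

    def dbscan(data, eps, minpts):
        labels = [UNVISITED] * len(data)
        cid = 0
        for i in range(len(data)):
            if labels[i] != UNVISITED:
                continue
            neigh = region_query(data, i, eps)
            if len(neigh) < minpts:
                labels[i] = NOISE            # may later be absorbed as a border point
                continue
            cid += 1                         # i is a core point: start a new cluster
            labels[i] = cid
            seeds = list(neigh)
            while seeds:                     # expand via density reachability
                j = seeds.pop()
                if labels[j] == NOISE:
                    labels[j] = cid          # border point joins the cluster
                if labels[j] != UNVISITED:
                    continue
                labels[j] = cid
                jn = region_query(data, j, eps)
                if len(jn) >= minpts:        # j is also a core point: keep expanding
                    seeds.extend(jn)
        return labels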
The following are the positive points of this algorithm:

• The algorithm uses density based statistics for clustering, which are far more accurate than distance based ones.

• The algorithm produces unique clustering results.

• It is able to detect clusters of any size and shape.

While these are the negative points of this algorithm:

• It also (as does BIRCH) requires two input parameters from the user.

• Anomaly detection on new data points is not very efficient, since for each new input the program has to find its Eps-neighbourhood.

• It is not capable of differentiating clusters with different sizes and different densities, since Eps is pre-defined and fixed all the time.

Chapter 10

ADWICE-TRAD

ADWICE [4] is an adaptive anomaly detection algorithm which uses BIRCH [21] as the clustering algorithm for learning the normal data and then classifying new data as anomalous or normal. As already seen in the last chapter, BIRCH suffers from a number of shortcomings. Here we try to reduce the number of false positives by modifying the threshold calculation and the cluster bounds. The original BIRCH algorithm uses the same constant threshold (named 'T') for every cluster, which is increased whenever we run out of the given amount of memory, so as to merge some clusters and free memory.

Fixing the same threshold for all clusters is unfair to many of them. For example, consider a cluster with all points near its center and the cluster threshold 'T': this cluster can end up including some bad points which lie near the boundary. Hence, fixing the same threshold for all clusters is not right; rather, the threshold should depend on cluster properties like the distribution of points, the density of the cluster, etc. We therefore propose a density based mechanism for deciding the cluster size and threshold, which we name ADWICE-TRAD.

BIRCH uses distance based measures in its clustering algorithm, according to which all clusters have the same threshold size 'T'. For a new point to be included in a cluster, its distance from the center of the cluster has to be less than 'T'. So, define the 'inclusion region' as the spherical region of radius 'T' around the center of a cluster. Currently, the 'inclusion region' is independent of the current density of the cluster and the same for all clusters.

But if a cluster is dense, its 'inclusion region' should be smaller, and it should depend on the current radius of the cluster rather than on some predefined fixed threshold; for a sparse cluster, the 'inclusion region' should be relatively large. So, the inclusion of a new point in a cluster should depend on the density of the cluster (i.e. the number of points in the cluster and its current radius). Mathematically, the measurements are made on the basis of two more variables, t0 and R0, both explained below.

• R0 (an additional statistical variable that needs to be stored with each cluster's Cluster Feature set) is different for each cluster and depends on the cluster's current number of points and its current radius R(CFi):

R0(CFi) = R(CFi) * (1 + c / fn(n, d))

where
d = dimension of the data points,
n = number of points inside this cluster,
fn(n, d) = some function of 'n' and 'd',
c = some constant.
In other words, R0 is the current radius plus the current radius multiplied by some constant and divided by some function of 'n'. The function fn can be log_d(n) or just log(n). So the threshold requirement should be

R(CFi) <= R0(CFi)

• But using the above expression as the measure, clustering will suffer in the case of one or very few points in the cluster; hence define t0 as the threshold for handling these base cases (it can be kept fairly small). So the threshold requirement becomes

R(CFi) <= max(R0(CFi), t0)

• Also, for large sparse clusters, we want an upper bound on the radius of the cluster, so as to prevent an explosion of some of the clusters. So the threshold requirement in ADWICE-TRAD becomes

R(CFi) <= min(max(R0(CFi), t0), T)

ADWICE-TRAD requires one additional variable as input to the clustering algorithm, namely t0, since R0 is automatically calculated from the data points. A sketch of the resulting inclusion test follows.
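Below is a minimal sketch in Python of the resulting inclusion test, reusing the ClusterFeature class from the BIRCH sketch of Chapter 9; the choice fn(n, d) = log(n + 1) and the value of c are illustrative (the text leaves fn and c open), and we read the requirement as applying to the radius after a trial insertion of the new point, which is one possible reading.

    import math

    def r0(cluster, d, c=1.0):
        # R0(CFi) = R(CFi) * (1 + c / fn(n, d)); the +1 inside the log guards
        # against division by zero for single-point clusters. d is unused for
        # fn = log(n); it would enter for the fn = log_d(n) variant.
        fn = math.log(cluster.n + 1)
        return cluster.radius() * (1.0 + c / fn)

    def fits(cluster, x, d, t0, T, c=1.0):
        # ADWICE-TRAD test: the cluster radius with x included must satisfy
        # R(CFi) <= min(max(R0(CFi), t0), T).
        trial = ClusterFeature(x)            # simulate the insertion on a copy
        trial.n = cluster.n + 1
        trial.ls = cluster.ls + x
        trial.ss = cluster.ss + float(x @ x)
        return trial.radius() <= min(max(r0(cluster, d, c), t0), T)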

Chapter 11

Conclusion and Future Work

Both methodologies of intrusion detection, namely misuse and anomaly detection, have been widely researched. Since we have already listed the major issues of both types of systems, it is quite clear that neither can work without the other; that is, for a robust intrusion detection system one needs both. We have looked at two problems. The first is handling out of order packets in an NIDS, where we have proposed a solution but are still far from a workable or optimized model. The second is decreasing the false positive rate of ADWICE by introducing additional statistical parameters for the clusters, which bring a component of density into the clustering algorithm.

11.1 Future Work

There are a number of avenues for pursuing the work of optimizing the pattern matching and anomaly detection engines:

• Optimizing the proposed solution for the handling of out of order packets.

• Looking for solutions to other issues of NIDSs.

• Providing solutions for any of the issues in anomaly detection, or combating the limitations of such systems.

• Checking the effectiveness of the proposed fix in ADWICE.

Bibliography

[1] http://afrodita.unicauca.edu.co/∼cbedon/snort/spp kickstart.html.

[2] http://bro-ids.org.

[3] Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng, and Jörg Sander. LOF: identifying density-based local outliers. In SIGMOD '00: Proceedings of the 2000 ACM SIGMOD international conference on Management of data, pages 93–104, New York, NY, USA, 2000. ACM Press.

[4] Kalle Burbeck and Simin Nadjm-Tehrani. ADWICE - anomaly detection with real-time incremental clustering. In ICISC, pages 407–424, 2004.

[5] T. Cormen, C. Leiserson, and R. Rivest. Introduction to Algorithms. MIT Press, 1990.

[6] L. Ertoz, E. Eilertson, A. Lazarevic, P. Tan, J. Srivastava, V. Kumar, and P. Dokas. The MINDS - Minnesota Intrusion Detection System, in "Next Generation Data Mining". MIT/AAAI Press, 2004.

[7] J. Heering, P. Klint, and J. Rekers. Incremental generation of lexical scanners. ACM
Trans. Program. Lang. Syst., 14(4):490–520, 1992.

[8] http://www.ll.mit.edu/IST/ideval/. DARPA Intrusion Detection Evaluation, 1999.

[9] C. A. Kent and J. C. Mogul. Fragmentation considered harmful. WRL Technical Report
87/3, 1987.

[10] Aleksandar Lazarevic, Aysel Ozgur, Levent Ertoz, Jaideep Srivastava, and Vipin
Kumar. A comparative study of anomaly detection schemes in network intrusion
detection. In SIAM International Conference on Data Mining, 2003.

[11] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining (KDD 96). AAAI Press, 1996.

[12] C. Kreibich, M. Handley, and V. Paxson. Network intrusion detection: Evasion, traffic normalization, and end-to-end protocol semantics. In Proc. of the 10th USENIX Security Symposium (Security '01), 2001.

[13] Marc Norton. Optimizing pattern matching for intrusion detection, 2004.

[14] Judy Novak. Target-based fragmentation reassembly, April, 2005.

[15] Thomas H. Ptacek and Timothy N. Newsham. Insertion, evasion, and denial of
service: Eluding network intrusion detection. Technical report, Secure Networks, Inc.,
Suite 330, 1201 5th Street S.W, Calgary, Alberta, Canada, T2R-0Y6, 1998.

[16] Shai Rubin, Somesh Jha, and Barton P. Miller. Automatic generation and analysis of
nids attacks. In ACSAC ’04: Proceedings of the 20th Annual Computer Security
Applications Conference (ACSAC’04), pages 28–38, Washington, DC, USA, 2004. IEEE
Computer Society.

[17] C. Shannon, D. Moore, and K. Claffy. Characteristics of fragmented IP traffic on internet links, 2001.

[18] R. Sidhu and V. Prasanna. Fast regular expression matching using FPGAs, 2001.

[19] R. Sommer and V. Paxson. Enhancing byte-level network intrusion detection signatures with context, 2003.

[20] N. Tuck, T. Sherwood, B. Calder, and G. Varghese. Deterministic memory-efficient string matching algorithms for intrusion detection, 2004.

[21] Tian Zhang, Raghu Ramakrishnan, and Miron Livny. BIRCH: an efficient data
clustering method for very large databases. pages 103–114, 1996.
