Improving The Efficiency of Intrusion Detection Systems
Nakul Aggarwal
Roll No: 02005022
Department of Computer Science and Engineering
Indian Institute of Technology
Bombay
May 3, 2006
BTech Project Approval Sheet
I hereby state that the contents of this work are mine. Any substantially borrowed material
(cut-pasted or otherwise), including figures, tables and sketches, has been duly acknowledged.
Nakul Aggarwal
(Roll no: 02005022)
Date :
I hereby give my approval for the B.Tech Project Report titled “Improving The Efficiency
of Intrusion Detection Systems” by Nakul Aggarwal (02005022) to be submitted.
Prof. Om Damani
Date :
Acknowledgments
I would like to express my sincere gratitude towards my guides, Prof. Om Damani and Prof.
Krithi Ramamritham, for their invaluable, consistent support and guidance. They have been
generous enough to let me pursue the work of my interest.
Nakul Aggarwal,
May, 2006
IIT Bombay.
Abstract
Network intrusion detection systems have become standard components in security infrastructures.
The elements central to intrusion detection are: the resources to be protected in a
target network, i.e., computer systems, file systems, network information, etc.; models that
characterize the normal or legitimate behavior of the network; and techniques that compare the
actual network activities with the established models and identify those that are abnormal or
intrusive.
There are two approaches to combating intrusions, depending on whether or not we have
prior knowledge of the attacks. In the first, signatures from earlier intrusions are used to
decide whether new flows are intrusive in nature. In the second, after learning the normal
behavior of a network, we classify new flows as normal or intrusive. Here we survey some
of the approaches, algorithms, and issues still unsolved. We then look at the issue of evading
IDSs by overflowing their network buffers with out-of-order packets, and propose a solution.
Further, implementing inline and adaptive clustering mechanisms for anomaly detection at
high traffic rates has been a limitation of anomaly detection approaches. ADWICE was the
first effort in this field, but since it uses a distance-based clustering mechanism it suffers from
inefficient clustering. We have proposed additional density-based statistical variables for
each cluster so as to improve the efficiency.
Contents
1 Introduction 1
2 Misuse Detection 3
2.1 Approaches to Misuse Detection . . . . . . . . . . . . . . . . . . . . . . . . . . 3
4 Snort 10
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
4.2 Snort Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
4.3 Architecture of String Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
4.4 Working Model of the code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
4.5 Some More about Snort Powers . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
4.5.1 Preprocessors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
4.5.2 Inline Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
4.6 Multi-Pattern String matching Algorithms in Snort . . . . . . . . . . . . . . . . 13
4.6.1 Boyer-Moore Multi-pattern String Matching . . . . . . . . . . . . . . . 13
4.6.2 Wu-Manber Multi-pattern String Matching . . . . . . . . . . . . . . . . 13
4.6.3 Aho-Corasick Multi-pattern Matching . . . . . . . . . . . . . . . . . . . 13
4.6.4 Aho/Corasick with Sparse Matrix Implementation . . . . . . . . . . . 14
4.6.5 SFKSearch using Tries . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
5 Bro 15
7 Issue of Out of Order packets 20
7.1 Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
7.2 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
7.3 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
8 Anomaly Detection 23
8.1 Approaches to Anomaly Detection . . . . . . . . . . . . . . . . . . . . . . . . . 24
10 ADWICE-TRAD 30
Chapter 1
Introduction
There has been a significant rise in the number of network attacks: break-ins using techniques
as simple as buffer overflows, new worms bringing whole networks down, and attacks on
web servers via exploitation of software bugs or DoS attacks. Because of the increasing
amount of personal information at stake in networks and the ever-expanding internet/intranet
threats, much work is going on to combat these attacks. Intrusion detection is primarily
concerned with the detection of illegal activities and acquisition of privileges that
cannot be detected with information flow and access control modules. Intrusion detection
can be of two types: Pattern Matching or Anomaly Detection. Pattern matching is just one
of the methods, where the system inspects network traffic for matches against exact, precisely-described
patterns, while Anomaly Detection learns the normal network traffic and then
detects network intrusions by classifying the real traffic as normal or anomalous.
The increasing network utilization and the weekly increase in the number of critical application-layer
exploits imply that IDS designers must find ways to speed up their attack analysis
techniques when monitoring a fully-saturated network, while keeping the number of false
positives low. Studies on empirical data indicate that the number of signatures (each representing
some unique malicious activity) has grown around 2.5 times in the last three years [20].
Moreover, tens of vulnerabilities in various software packages are exposed each day on
security-related lists and newsgroups such as the Bugtraq mailing list.
Signature matching is the core of malicious traffic/event detection engines, independent
of the level in the network at which it is implemented, i.e., whether it is deployed at the
network perimeter (typically known as the Demilitarized Zone (DMZ)), at the network level
(NIDS), or at the host level (HIDS). Implementations exist at all these levels, as software
products, hardware chips, or pattern matching engines. Some of the most popular software
NIDS include Snort, Bro, Dragon IDS, etc. Signature matching engines at the hardware level
implement the signatures with the help of LookUp Tables (LUTs), TCAMs, or NFAs/DFAs,
and pattern matching is done in the router itself for each packet, maintaining the session
flow information per IP.
Snort is one of the most widely deployed IDS tools. Statistics show that signature
matching is the most computationally intensive part of an IDS: in Snort, up to 70% of
the total execution time goes into this process, which clearly reflects the vast amount of work
that still needs to be done. Beyond plain pattern matching, stateful pattern matching brings
issues of out-of-memory conditions and excessive CPU usage, so much work still needs
to be done in this field. Pattern matching for network security and
Intrusion Detection demands exceptionally high performance.
People have adopted various techniques for string matching over time, and this technology
is still evolving, with new optimizations and heuristics every day. String matching is of
high interest from a theoretical standpoint as well, but here the need is for fast multi-pattern
string matching. Pattern matching started with the use of the most common string matching
algorithms like Boyer-Moore, Knuth-Morris-Pratt (KMP), Aho/Corasick, etc. But over time,
researchers have designed new and efficient algorithms, including improvements over these
existing approaches. Some of them are Coloured Petri Nets, the hash-table mapping for each
pattern over Boyer-Moore given by Wu-Manber, a sparse-vector implementation over Aho-Corasick,
and tries and suffix trees (linked-list style implementations of state-machine
matching).
Anomaly detection is currently in its infancy as far as real-world implementations
are concerned, but it is more powerful than pattern matching because of its capability to
identify new attacks. Also, less human effort is involved once it is set up and running,
compared to the former approach, where one constantly needs to update the signature database.
While this approach has a number of limitations, listed in Chapter 8, the lack of an efficient
and fast clustering mechanism has been one of the most important. In ADWICE [4], the authors
implemented a scalable and efficient anomaly detection system which uses the BIRCH
clustering algorithm with some modifications. We have proposed a fix to the BIRCH
algorithm which makes clustering more robust and efficient, especially in intrusion detection
applications.
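BIRCH-style clustering, as used by ADWICE, summarizes each cluster with a compact cluster feature rather than the raw points. As a rough, minimal sketch (the class and method names here are illustrative, not ADWICE's actual code), a cluster feature keeps the point count N, the linear sum LS, and the square sum SS, from which a centroid and a distance-based radius can be derived incrementally:

```python
import math

class ClusterFeature:
    # BIRCH-style cluster summary: count N, linear sum LS, square sum SS.
    def __init__(self, dim):
        self.n, self.ls, self.ss = 0, [0.0] * dim, 0.0

    def add(self, point):
        # Absorbing a point only updates the three summaries, so the
        # cluster can grow without storing its members.
        self.n += 1
        for i, x in enumerate(point):
            self.ls[i] += x
        self.ss += sum(x * x for x in point)

    def centroid(self):
        return [s / self.n for s in self.ls]

    def radius(self):
        # sqrt(SS/N - ||centroid||^2): RMS distance of members from centroid.
        c2 = sum(c * c for c in self.centroid())
        return math.sqrt(max(self.ss / self.n - c2, 0.0))
```

Such distance-based summaries are exactly what the proposed density-based variables would augment.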
• In Chapter 3, we will see some of the most common pattern matching algorithms.
• In Chapter 4, we explain the design and architecture of the most widely used intrusion
detection system, Snort.
• In Chapter 6, we list the issues with most detection systems.
• In Chapter 7, we look at the issue of “out of order” packets in NIDS, along with a
proposed solution and its proof.
• In Chapter 8, we go into anomaly detection systems, starting with a general overview.
• Chapter 9 follows with a discussion of clustering algorithms for large-scale data and an
analysis of their suitability for anomaly detection systems.
• In Chapter 10, we propose a fix for ADWICE, an adaptive anomaly detection system.
Chapter 2
Misuse Detection
Misuse detection aims to detect well-known attacks as well as slight variations of them,
by characterizing the rules that govern these attacks. Systems based on this approach use
different models like state transition analysis, or a more formal pattern classification. By its
nature, misuse detection tends to have a low number of false positives but is unable to detect
attacks that lie beyond its knowledge. Some examples:
1. IP packets that exceed the maximum legal length (65535 octets)
2. /User-Agent \:[ˆ\n]+ PassSickle/i
This is an example signature for capturing packets containing the trojan horse PassSickle.
Signature analysis or Pattern Matching is the technique of matching the data against a
predefined ruleset, or set of signatures, using one of the pattern matching algorithms
discussed in Chapter 3.
Association Rules or Expert systems define intrusions as a set of rules and corresponding
actions, which are fired whenever a match with some rule takes place.
State Transition Analysis Here the known intrusions are defined as deterministic finite state
machines with some end states; every event takes you to the next state depending on the
transitional input. Bro (see Chapter 5) is an example of this kind of approach, where each
match of some signature, flag, etc. triggers the correlation engine, which makes a transition
on the state machine.
Data Mining Approaches use statistical classification techniques like Naive Bayes, Decision
Trees, Neural Networks, genetic algorithms, etc. to classify new events/flows as
normal or anomalous. Being statistical, this requires some data to build up the models against
which new data is matched. Hence, learning data in which flows are pre-labelled as
normal and/or anomalous is fed into the machine initially to build the models, which are then
used for further classification.
While misuse detection is the most widely deployed mechanism for NIDS, it suffers
from the following flaws, which have led to the search for more efficient techniques for intrusion
detection. Some of the limitations:
1. Since it uses a pre-defined set of signatures, it is not able to detect new threats/intrusions.
2. Over the last few years, networks have seen a large variety of intrusions. Covering them
requires a large signature set, leading to a large number of false positives, and substantial
human effort to optimize the signature set to one's network needs and requirements.
4. Signature obfuscation. Here an attack eludes the NIDS by exploiting the fact that the
signature doesn't cover all attack instances. E.g., given a signature “blaster”, the NIDS
can easily be evaded if the malicious packet[s] contain the string mlaster instead.
5. Other IDS evasion and invasion techniques. There is a large set of these, thoroughly
discussed in [12] (for example, evading the signature matching rule set by inserting an
additional packet with an arbitrary string and a low TTL between the packets which contain
the main string; this substring prevents the matching engine from matching, but the end
host is still affected, since it never receives the additional low-TTL packet).
6. Latency in development. These systems involve high manual involvement throughout
their lifetime: installation, optimization of the rule set, regular updating of signatures,
and checking the alerts and hence the intrusions.
7. Association rules suffer from all of the above, with the additional overhead of the
clumsiness that grows as the number of attributes to match keeps increasing.
But rather than looking for a superset of misuse detection able to detect every intrusion,
people instead looked at removing these limitations, which gave rise to anomaly detection
techniques, which can detect new intrusions and do not suffer from large-signature-set
issues (since they do not use any signature set).
Misuse detection via signature matching is the most widely accepted approach because
of the large research base which provides a constant, updated flow of signatures, typically
within hours of a new vulnerability, exploit, or worm being detected. One of the most
widely deployed NIDS tools is Snort, which also uses this approach. A detailed study of the
Snort architecture, techniques, algorithms, and code is given in Chapter 4.
Pattern matching matches the input flow against a given set of signatures. Signatures can
involve flag matching, IP addresses, or content in the payload (present in most signatures).
Hence, string matching algorithms like Boyer-Moore are most commonly deployed as part
of these NIDS. Let us look at some of the common pattern matching algorithms.
Chapter 3
Here, we will be discussing some of the basic must-know algorithms of string or pattern
matching. These include
1. Simple string matching
• Boyer-Moore
• Knuth-Morris-Pratt (KMP)
2. State Machine Matching
• Aho/Corasick
3. Hardware Solutions
• Bloom Filters and Extended Bloom Filters
• NFA/DFA implementation at hardware level
pattern is shifted to the next occurrence of the matched suffix in the pattern. Both heuristics
can lead to a shift distance of m. For the bad character heuristic this is the case if the
first comparison causes a mismatch and the corresponding text symbol does not occur in
the pattern at all. For the good suffix heuristic this is the case if only the first comparison
was a match, but that symbol does not occur elsewhere in the pattern. With the help of the
preprocessed “bad character” and “good suffix” values, one finds the required shift as the
maximum of the two.
The preprocessing for the good suffix heuristic is rather difficult to understand and to
implement. Therefore, some versions of the Boyer-Moore algorithm are found in which the
good suffix heuristic is left out. The argument is that the bad character heuristic is sufficient
and the good suffix heuristic would not save many comparisons. However, this is not true
for small alphabets.
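The shifting described above can be sketched as follows. For brevity, this minimal Python sketch implements only the bad character heuristic (the reduced variant mentioned above, with the good suffix table left out), so it is not the full Boyer-Moore algorithm; the function names are illustrative:

```python
def bad_character_table(pattern):
    # Rightmost index of each character in the pattern; characters
    # absent from the pattern implicitly give a full-length shift.
    table = {}
    for i, ch in enumerate(pattern):
        table[ch] = i
    return table

def boyer_moore_search(text, pattern):
    """Bad-character-only Boyer-Moore: compare right to left, shift right."""
    table = bad_character_table(pattern)
    m, n = len(pattern), len(text)
    matches = []
    s = 0  # current alignment of the pattern against the text
    while s <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1
        if j < 0:
            matches.append(s)
            s += 1
        else:
            # Align the mismatched text character with its rightmost
            # occurrence in the pattern (shifting at least 1).
            s += max(1, j - table.get(text[s + j], -1))
    return matches
```

When the mismatching text character does not occur in the pattern at all, the shift is j + 1, which reaches m when the first (rightmost) comparison mismatches, as the text describes.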
3.2 Knuth-Morris-Pratt
Main features
This was a significant improvement in memory requirements over finite-state-machine
based string matching. It pre-calculates an auxiliary function π (of dimension m) which contains
the information needed to jump from the current state to the next state.
During the string matching process, π gives the optimum shift needed in the case of a
mismatch. The optimum shift depends on the longest prefix of the pattern which is also a
suffix of the part of the pattern matched so far.
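A minimal sketch of the π function and the search loop that uses it, following the standard textbook formulation of KMP (function names are illustrative):

```python
def prefix_function(pattern):
    # pi[i] = length of the longest proper prefix of pattern[:i+1]
    # that is also a suffix of it.
    m = len(pattern)
    pi = [0] * m
    k = 0
    for i in range(1, m):
        while k > 0 and pattern[i] != pattern[k]:
            k = pi[k - 1]  # fall back to the next shorter border
        if pattern[i] == pattern[k]:
            k += 1
        pi[i] = k
    return pi

def kmp_search(text, pattern):
    """Report all start positions of pattern in text in O(n + m)."""
    pi = prefix_function(pattern)
    matches, k = [], 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = pi[k - 1]  # on mismatch, shift by the amount pi dictates
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = pi[k - 1]  # continue searching for overlapping matches
    return matches
```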
Aho/Corasick String Matching Automaton for a given finite set P of patterns is a (deter-
ministic) finite automaton G accepting the set of all words containing a word of P .
G consists of the following components:
The transition table is built during the preprocessing phase, where at each state there is
information about where to jump for each character ∈ Σ. The engine then simply traverses
the string to be matched, making transitions via δ, the transition function, which tells which
state to jump to for each character ∈ Σ. Whenever we reach a state ∈ F, a match is reported
by the engine. For simple string matching it does not perform very well, but when there are
multiple patterns, or pattern matching is done at the regular-expression level, it is one of the
best options for pattern matching.
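A compact sketch of such an automaton: build_automaton constructs the goto trie, failure links, and per-state output sets for a pattern set P, and search reports every (position, pattern) match in one pass over the text. This is an illustrative implementation of the textbook algorithm, not any particular IDS's code:

```python
from collections import deque

def build_automaton(patterns):
    # goto: per-state transition dicts; fail: failure links; out: matches.
    goto, fail, out = [{}], [0], [set()]
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(p)
    # BFS from the root to compute failure links and merge outputs.
    q = deque(goto[0].values())
    while q:
        s = q.popleft()
        for ch, t in goto[s].items():
            q.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0) if goto[f].get(ch, 0) != t else 0
            out[t] |= out[fail[t]]
    return goto, fail, out

def search(text, patterns):
    """One pass over text, reporting (start, pattern) for every match."""
    goto, fail, out = build_automaton(patterns)
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for p in out[s]:
            hits.append((i - len(p) + 1, p))
    return hits
```

Each input character causes at most one goto transition plus amortized failure-link hops, which is the source of the per-character O(1) behavior cited later in this report.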
matching, when done by a DFA matching engine in software, reported the above-mentioned
statistics to be 87309.38 sec and 229 MB respectively.
There have also been many modifications of and advancements to this approach since
this initial effort.
Chapter 4
Snort
4.1 Introduction
Snort can perform real-time packet logging, protocol analysis, and content searching/matching.
It can be used to detect a variety of attacks and probes such as stealth port scans, CGI-based
attacks, Address Resolution Protocol (ARP) spoofing, and attacks on daemons with known
weaknesses. Snort utilizes descriptive rules to determine what traffic it should monitor and
a modularly designed detection engine to pinpoint attacks in real time. When an attack
is identified, Snort can take a variety of actions to alert the systems administrator to the
threat. In its first releases, Snort used brute-force matching, which was very slow. The
first boost to signature matching came with the implementation of the Boyer-Moore pattern
matching algorithm. Snort has come a long way since these initial implementations; we will
see some of those improvements soon.
RTN header. Each OTN node further contains some other flags that need to be matched
(e.g., the ACK flag should be set), and these checks are performed before the string matching,
to avoid unnecessary pattern matching when even the flags do not match. When the flags do
match, the engine calls the stored function pointer to perform any other necessary checks and
the string matching itself, using one of the string matching algorithms.
By default, the Wu-Manber string matching algorithm is used, but Snort contains implementations
of a large number of other pattern matching engines as well, including the Modified
Wu-Manber Style Multi-Pattern Matcher, the SFK matching engine, Aho/Corasick, optimizations
of Aho/Corasick, a sparse matrix implementation of Aho/Corasick, etc. We will
discuss some of these algorithms in the next section.
This rule list is built up during the initialization of the engine. But lately the number
of rules in the Snort rule DB has exceeded even the 3000 mark, such that the above-mentioned
3-dimensional linked lists [RTN, OTN, function pointers] are unable to work at high network
speeds. Therefore one more optimization has been made: a fast packet classification engine
adding a fourth dimension to the above structure.
This fourth dimension is a port-based classification of rules, done before the RTN lists
are created. That is, we have a port-based classification of the ruleset after the four protocols
mentioned above. The authors assume that, given the port values (of source or destination),
a rule can be dropped into one of the following classes.
Now each structure has a linked-list array of MAX_PORT size (64*1024). This allows O(1)
mapping of a rule, on the basis of its port value, to the appropriate list. This additional
dimension speeds up string matching, since the number of rules to be matched against the
incoming traffic is greatly reduced.
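The port-bucket idea can be sketched as follows; the rule representation, bucket layout, and function names here are illustrative, not Snort's actual data structures:

```python
MAX_PORT = 64 * 1024

def build_port_buckets(rules):
    # rules: list of (src_port, dst_port, rule_id); None means "any port".
    buckets = [[] for _ in range(MAX_PORT)]
    any_port = []
    for src, dst, rid in rules:
        if dst is not None:
            buckets[dst].append(rid)   # classify by destination port first
        elif src is not None:
            buckets[src].append(rid)
        else:
            any_port.append(rid)       # port-agnostic rules go to a catch-all
    return buckets, any_port

def rules_for_packet(buckets, any_port, dst_port):
    # O(1) array index selects the small candidate list for this packet;
    # only these rules proceed to content matching.
    return buckets[dst_port] + any_port
```

The speedup comes from shrinking the candidate set before any expensive content matching runs, at the cost of one 64K-entry array per structure.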
4.5 Some More about Snort Powers
4.5.1 Preprocessors
With the arrival of the term “Anomaly Detection”, there was high demand for it in Snort
as well. Since rule-based matching is done in the detection engine, protocol anomaly detection
and many other functionalities that are independent of rules come under this category.
However, each added preprocessor demands more processing time, affecting the
main strength of Snort, i.e., fast rule-based matching. Hence, the authors added this
functionality as modular “plug-ins”, similar to modules in the Linux kernel, which can
be deactivated whenever they are not needed or when they affect Snort's performance.
Preprocessors are pluggable components of Snort, introduced in version 1.5. They are
“located” just after the protocol analysis module and before the detection engine, and do not
depend on rules. They are called whenever a packet arrives, but just once; the detection
plugins, on the other hand, do depend on rules and may be applied many times for a single
packet. SPPs (Snort Preprocessors) can be used in different ways: they can look for
a specific behavior (portscan, flowportscan), support further analysis, like
flow, or just collect certain information, like perfmonitor.
Hola Anonimo has given a very basic tutorial [1] on how to write a preprocessor
plugin for Snort. Some of the major goals and achievements of preprocessors were:
• decreasing the number of false positives,
• adding anomaly detection techniques to Snort, and, last but not least,
• improving pattern matching when a pattern extends over multiple segments or
fragments.
We will now look at the last two of the above-mentioned achievements.
Anomaly Detection Anomaly detection preprocessors include both protocol anomaly
detection (via protocol-specific preprocessors like arpspoof, telnet decode, etc.) and
even advanced statistical approaches to anomaly detection via the Spade plugin
(a brief description is given in Appendix A).
Pattern Matching over Multiple Packets This is achieved through the Stream4 and Frag2
preprocessors. The former adds TCP statefulness and session reassembly, so that
connection status and information can be stored, providing more information on alerts,
removing unnecessary checks, and allowing checks on patterns that extend over
multiple packets. The latter preprocessor prevents IDS evasion
and invasion via fragmented packets [15].
4.6 Multi-Pattern String matching Algorithms in Snort
1. Boyer-Moore
2. Wu-Manber
3. Aho/Corasick
4. Sparse matrix with Aho/Corasick
5. Tries in SFKSearch
• Rather than matching all the patterns, this approach exploits the power of hash functions:
initially a HASH table is built and all patterns are categorized into the appropriate
table entries. The building of the hash table is quite interesting here, since the first
b-length substring from the prefix of each pattern is considered for the calculation of
each hash value.
Ambiguity arises in the case where the SHIFT table reports a possible match but there is no
entry in the hash table (since the hash depends only on a b-character substring of the pattern);
in that case only one character is skipped.
The reported matching times are nearly two times faster than GNU grep. The scanning
operation was also shown to be O(bN/m), where N is the size of the input.
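A toy Wu-Manber-style sketch of the SHIFT/HASH interaction described above. This follows the common formulation in which b-character blocks from each pattern's m-character prefix drive the SHIFT table and the window's final block selects hash-table candidates; the block size b = 2, the verification step, and the function names are illustrative assumptions, and every pattern is assumed to be at least b characters long:

```python
def build_tables(patterns, b=2):
    m = min(len(p) for p in patterns)  # only the m-char prefix is indexed
    default = m - b + 1                # safe shift for blocks never seen
    shift, hash_tbl = {}, {}
    for p in patterns:
        for i in range(m - b + 1):
            block = p[i:i + b]
            # If this block ends the scan window, we may shift this far.
            shift[block] = min(shift.get(block, default), m - b - i)
        # Block ending the prefix selects candidate patterns for verification.
        hash_tbl.setdefault(p[m - b:m], []).append(p)
    return shift, hash_tbl, m, default

def wu_manber_search(text, patterns, b=2):
    shift, hash_tbl, m, default = build_tables(patterns, b)
    hits, pos = [], m - 1
    while pos < len(text):
        block = text[pos - b + 1:pos + 1]
        d = shift.get(block, default)
        if d > 0:
            pos += d                   # SHIFT says no match can end here
        else:
            # SHIFT reports a possible match: verify candidates via HASH;
            # whether or not one verifies, skip only 1 character.
            start = pos - m + 1
            for p in hash_tbl.get(block, []):
                if text.startswith(p, start):
                    hits.append((start, p))
            pos += 1
    return hits
```

Most windows are skipped by the SHIFT table alone; the hash table is consulted only when a shift of 0 is reported, which is the ambiguity the text above describes.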
activated on each new input character, in addition to the existing DFAs. Of course some also
drop out, but the difference is huge.
The power of the algorithm is that it is unaffected by variance in the size of the patterns,
and its worst-case and average-case performance are the same.
Sparse-Row format
Vector: 0 0 0 2 4 0 0 0 6 0 7 0 0 0 0 0 0
Sparse-Row Storage: 4 4 2 5 4 9 6 11 7 (entry count, then 1-indexed index/value pairs)
Now, for each DFA state, rather than having a 256-entry vector in which most values are 0, we
use sparse matrices to represent each transition element and its corresponding value. Clearly,
we cannot have O(1) transition time in this implementation, since we need to traverse
the new vector to find the transition element. The memory requirements go down by four
times, which is quite significant. Some other compact representations have also
been discussed by the author, namely the Compressed Sparse Vector Format, the Banded-Row
Format, and the CSR Matrix Format.
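The conversion and lookup can be sketched as follows. This illustrative sketch uses 0-indexed positions and a count-prefixed [count, index, value, ...] layout; the function names are hypothetical:

```python
def dense_to_sparse_row(row):
    # [count, idx1, val1, idx2, val2, ...] keeping only nonzero transitions.
    pairs = [(i, v) for i, v in enumerate(row) if v != 0]
    out = [len(pairs)]
    for i, v in pairs:
        out.extend((i, v))
    return out

def sparse_lookup(sparse_row, ch):
    # Linear scan over the stored pairs instead of O(1) array indexing:
    # the memory/time trade-off described above.
    count = sparse_row[0]
    for k in range(count):
        if sparse_row[1 + 2 * k] == ch:
            return sparse_row[2 + 2 * k]
    return 0  # no transition stored for this character
```

A 256-entry dense row with 4 nonzero transitions shrinks to 9 integers, at the cost of scanning up to 4 pairs per lookup.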
Chapter 5
Bro
Bro[2] is a Unix-based Network Intrusion Detection System (IDS). Bro monitors network
traffic and detects intrusion attempts based on the traffic characteristics and content. Bro
detects intrusions by comparing network traffic against rules describing events that are
deemed troublesome. These rules might describe activities (e.g., certain hosts connecting
to certain services), what activities are worth alerting (e.g., attempts to a given number of
different hosts constitutes a “scan”), or signatures describing known attacks or access to
known vulnerabilities. If Bro detects something of interest, it can be instructed either to issue
a log entry or to initiate the execution of an operating system command. The main aim of
this IDS is to combat two major shortcomings of the Snort engine, namely high false alarm rates
and string matching time. For the former, they designed the concept of context-based
pattern matching, where additional context is provided by:
1. Regular expressions for signatures, rather than plain strings.
2. Providing the alert engine a notion of connection state and knowledge.
In their design, for every matched pattern or rule, rather than generating an alert, an
event is generated and passed to another component, the policy script component, which
at an abstract level correlates these events to determine the possibility of an attack. But
matching a large number of patterns each time is quite intensive, especially with
two engines running simultaneously. To combat this problem, they implemented
DFA matching as the pattern matching algorithm, which also strengthens their
patterns, since the patterns are now more robust against false positives. Since DFA construction
has quite large memory requirements, they used the approach of on-the-fly generation
of the DFA, as given in [7], and also implemented a memory-bounded DFA engine,
in case of an algorithmic attack on the engine itself, so as not to affect the other engine.
They compared their approach with Snort and reported some interesting results:
• The reported matching time was quite similar between the without-cache implementation
of the Bro engine and Snort.
• The number of alerts and signatures in Bro was much more informative compared
to Snort, eliminating a large number of false positives.
• Because of their context-based matching engine, they have an inbuilt capability to fight
back TCP reassembly and fragmentation issues.
An important question, then, is why Snort is the most widely used tool. There is no
definitive answer available anywhere; the following arguments are simply my inferences:
• With its implementation of efficient string matching algorithms, Snort's matching
speed exceeds Bro's by a large margin.
• Snort has a large and regularly updated signature database, which is the most important
reason for its usage.
• Even though Bro signatures are more context-specific, without regular updating of
signatures and proper categorization (with the ever-increasing signature set), the
performance goes down.
• Bro's memory requirements are quite high, since it uses a DFA matching engine.
Chapter 6
Other than the choice of pattern matching algorithm, there are many other issues that
need to be considered before choosing any one of them. Of course fast matching is the natural
requirement, but there are other issues to keep in mind, like fighting false
positives: for example, in some cases the payload may contain a pattern for a buffer
overflow attack via the telnet application protocol, but what if there was no active telnet
session between the two hosts? Another issue: what if the pattern is split over multiple packets?
Some of the issues with respect to the choice of algorithm and the limitations of signature
matching are stated below.
• Memory vs Speed
• Signature format
• Session-Based and Application Level Signature Matching
• State holding issues in cases of patterns extending over multiple packets
• Packet fragmentation issues
• Getting packet dumps or a testing data set (other than attack tools and the DARPA set)
One always needs to compromise between memory requirements and speed. As we can
see in the existing algorithms themselves, Aho/Corasick provides O(1)-per-character pattern
matching but requires quite large memory for storage of the state machine, while
other string matching algorithms, such as Boyer-Moore, can lead to O(mn) running time
in the case of algorithmic attacks. One must trade one off against the other, depending on the
constraints.
Most IDSs, except a few, use byte- or character-based strings as the pattern presentation
format. This is natural, since the most common algorithms used are Boyer-Moore,
KMP, etc. But if state-machine matching is deployed, a regular expression can
provide a better pattern, more informative and more unique to the attack it identifies.
Moreover, most Snort rules contain multiple patterns with different offset and depth values,
which can be expressed well in a single regular expression using basic operators like . and *;
[19] provides some examples. Bro, too, keeps its patterns in regex (regular expression)
format.
Then, [19] also discusses stateful packet matching, where the IDS stores information
about the context of the traffic between two peers, providing more efficient pattern
matching results; but the overheads involved are massive, because of the information that
needs to be stored, specific to the content of the traffic, for a large number of flows. Over and
above this, one can also provide application-level pattern matching for even better results.
One of the most important issues with IDS systems is state holding: the amount of
information that needs to be stored for each flow passing through. In the case of pattern
matching over individual packets this is not of much concern, since it does not even come
into the picture. But with the advent of attacks split over multiple packets, pattern matching
has become packet-stream matching, since patterns now need to be matched across multiple
packets, demanding more memory for storing information about session flows and the
packets flowing, the partially matched patterns, other flow-specific data structures, etc.
Although there is a Snort preprocessor, namely Stream4, to counter this issue, these issues
exist with this plugin too. For how long does the information need to be stored before being
dropped? (It should not be the case that the IDS declares a timeout and drops the session
information while the destination host keeps waiting, or vice versa.) Then, what is the
maximum number of sessions that can be stored, since the information that needs to be
stored can vary from flow to flow?
Continuing the above discussion, the issue of fragmented packets [9], [12], [15], [14]
complicates the situation even more, since new issues come into the picture:
• Out-of-order arrival of TCP segments
• Re-transmitted segments
• Overlapping TCP packets, and hence issues with reassembly
• Missing fragments in between, or losing the state of the connection while the connection
is still alive
• How much data should be buffered (TCP window)
• Varying the TTL of fragments to evade the NIDS. If the NIDS believes a packet was
received when in fact it did not reach the end-system, then its model of the end-system's
protocol state will be incorrect. If the attacker can find ways to systematically ensure
that some packets will be received and some not, the attacker may be able to evade the
NIDS.
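To make the out-of-order and gap issues above concrete, here is a minimal sketch of in-order delivery from buffered TCP segments. It is deliberately naive: retransmissions keep the first copy seen, and overlapping or conflicting segments (the very ambiguities an evader exploits) are not resolved. The function name and the (seq, data) representation are illustrative:

```python
def reassemble(segments, isn):
    """Deliver contiguous payload from (seq, data) segments that may
    arrive out of order; anything past a sequence gap stays buffered."""
    pending = {}
    for seq, data in segments:
        pending.setdefault(seq, data)  # keep the first copy on retransmission
    delivered, nxt = [], isn
    while nxt in pending:
        data = pending.pop(nxt)
        delivered.append(data)
        nxt += len(data)               # advance past the delivered bytes
    return b"".join(delivered), nxt    # nxt = next expected sequence number
```

Everything still sitting in the buffer after the loop is exactly the state a NIDS must hold per connection, which is why floods of out-of-order segments translate directly into memory pressure.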
The authors of [17] have examined the character and effects of fragmented IP traffic as monitored on highly aggregated Internet links. They measure the amount of fragmented traffic in normal Internet traffic and characterize and classify it by statistics, protocol, and application layer. They show that fragmented traffic on Internet links amounts to less than 1%, but two caveats apply: first, they measure at the Internet level on links with good connection speeds, and second, an attacker may deliberately fragment attack-specific traffic. These observations raise new questions beyond the existing ones. Because different operating systems have unique methods of fragment reassembly, an intrusion detection system that uses a single "one size fits all" reassembly method may not reassemble and process packets the same way the destination host does. An attack that successfully exploits these differences in fragment reassembly can cause the IDS to miss the malicious traffic and fail to alert. Much of this has been solved heuristically in existing tools.
The papers mentioned above have themselves discussed a few of these issues. Snort even contains a preprocessor plugin, Frag2, that addresses most of them under certain assumptions: if the next few fragments do not arrive within 30 seconds, the stored fragments are dropped, and one can (or needs to) specify the end host's operating system so that OS-specific reassembly is done for that session. Some tools even use bifurcating analysis [12]: if the NIDS does not know which of two possible interpretations the end-system may apply to incoming packets, it splits its analysis context for that connection into multiple threads, one for each possible interpretation, and from then on analyzes each context separately. Some other methodologies have also been discussed in the same paper.
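The cost of bifurcating analysis is easy to see in a sketch: every ambiguous packet multiplies the number of live analysis contexts, one per possible end-system interpretation. The function below and its list-of-payloads context representation are illustrative, not taken from [12]:

```python
def bifurcate(contexts, candidate_payloads):
    """Fork each live analysis context once per possible interpretation
    of an ambiguous packet; each copy is then analysed independently."""
    return [ctx + [p] for ctx in contexts for p in candidate_payloads]

# Two ambiguous packets, two interpretations each: 1 -> 2 -> 4 contexts.
contexts = [[]]
contexts = bifurcate(contexts, ["AAA", "AXA"])  # e.g. overlapping-fragment ambiguity
contexts = bifurcate(contexts, ["BB", "B"])
print(len(contexts))  # 4
```

The exponential growth in contexts is exactly why bifurcation is used sparingly, only when the ambiguity cannot be normalized away.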
One of the major issues we have come across is the testing of existing approaches. The MIT DARPA datasets exist, but they have two problems: they contain very few attacks, and they date from 1998–99, since when attack technologies have advanced considerably. Even the available attack tools are too specific, producing individual attacks rather than generic traffic with attack packets interspersed. Recently, [16] designed a new tool for IDS testing, named AGENT, which, besides producing "pattern strings", also generates other types of traffic such as those described in [12]; but then, it is still synthetic.
Chapter 7
In the last section, we saw some of the limitations of existing NIDS systems; handling of out-of-order packets is one of them. In most current implementations of intrusion detection systems, out-of-order packets need to be stored until all the fragments/segments have been received; only then is the packet reassembled and transmitted to the destination. Since this involves temporary storage of the fragments, one can easily evade the IDS by constant bombardment with never-ending fragments. Currently, IDSs handle this issue by limiting the number of fragments per flow and by setting a timeout value for each fragmented packet, so that a fragment is dropped as soon as the timeout has passed after its arrival.
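That conventional defence can be sketched as follows. The class name, the per-flow cap, and the timeout default are illustrative assumptions (the 30-second figure mirrors the Frag2 timeout mentioned earlier), not the configuration of any specific IDS:

```python
import time

class FragBuffer:
    """Sketch of the conventional defence against fragment flooding:
    cap the fragments held per flow and expire them after a timeout.
    Parameter values are illustrative, not from any specific IDS."""
    def __init__(self, max_frags=64, timeout=30.0):
        self.max_frags, self.timeout = max_frags, timeout
        self.flows = {}  # flow id -> list of (arrival_time, seq, payload)

    def add(self, flow, seq, payload, now=None):
        now = time.monotonic() if now is None else now
        frags = self.flows.setdefault(flow, [])
        # drop fragments whose timeout has passed since arrival
        frags[:] = [f for f in frags if now - f[0] < self.timeout]
        if len(frags) >= self.max_frags:
            return False  # flow over budget: fragment rejected
        frags.append((now, seq, payload))
        return True

fb = FragBuffer(max_frags=2, timeout=30.0)
print(fb.add("flow1", 1, b"...", now=0.0))   # True
print(fb.add("flow1", 2, b"...", now=1.0))   # True
print(fb.add("flow1", 3, b"...", now=2.0))   # False: per-flow cap reached
print(fb.add("flow1", 4, b"...", now=40.0))  # True: old fragments timed out
```

Note how the cap itself becomes the evasion lever: an attacker who fills the budget can cause legitimate fragments to be rejected, which is the motivation for the storage-free approach proposed below in this chapter.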
Since logging the packets (and hence clogging the network buffers) also affects the other modules of the system, we propose a solution in which one need not store out-of-order packets: as fragments arrive, they are pushed to the destination instantly. We do, however, make one assumption: the fragment size must always be greater than the largest signature in the signature set.
7.1 Solution
Consider the Aho-Corasick pattern matching algorithm, which builds a deterministic finite automaton over the signature set and then traverses this DFA (we call it simply the DFA) over the incoming traffic payload. Now consider another DFA, which we call the RDFA. Define a new signature set formed by reversing every signature in the original signature set; the RDFA is then constructed exactly like the original DFA, just over this reversed signature set.
We claim that using the two DFAs we can do the matching (under the above assumption) without storing the fragmented packets. For each input packet payload, perform the transitions on the original DFA, and for the reverse of the packet payload, perform the transition jumps on the RDFA. Then store pointers to the intermediate states of both DFAs (these are stored anyway in stream-based pattern matching). When the next fragment comes, we resume on the respective DFAs from the stored states. The following cases are possible:
• We have seen fragments up to sequence n, and now a fragment with sequence n + i (where i > 1) arrives.
• We have seen fragments with sequence numbers n and n + 2, and now the fragment with sequence number n + 1 arrives.
Case 1:
Here, we store the pointers in the respective DFAs for sequence n, and for packet n + i we start the transitions from state 0 on both of them again.
Case 2:
Since we keep storing the pointers in both DFAs for each flow (when packets are out of order), when we get a packet whose sequence number immediately follows one of the packets already seen, we resume the transitions from the stored states.
So if a possible match for a signature exists, it will be reported. For example, suppose a match starts somewhere in packet n and ends in packet n + 1. If n is seen first, the match will be found through the DFA transitions, while if n + 1 is seen first, the RDFA, moving in the reverse direction, ensures that the match still takes place and a notification is sent to the appropriate action-taking engine.
7.2 Example
Let us consider an example:
Signature set = {"hello", "she"}
Reverse signature set = {"olleh", "ehs"}
Stream flow (packet payloads) = {"whatshel", "lomg"}
Figure 7.2: DFA and RDFA (respectively) for the above example.
Now, if the first packet arrives first, the DFA will report the match for "she" and end in state 3, while the RDFA stays in state 0. As the second packet arrives, the DFA reports another match ("hello") as it crosses the 'o' in the payload, and at the end it is in state 0. Otherwise, if the second packet arrives first, the DFA stays in state 0 while the RDFA ends in state 2. As the first packet then arrives, the RDFA reports the matches for both signatures and ends up in state 0, as does the DFA.
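The walkthrough above can be checked mechanically. The sketch below (the class and its interface are our own illustration, not the report's implementation) builds the DFA and RDFA with Aho-Corasick and replays both arrival orders, carrying the stored state across fragments as Section 7.1 describes:

```python
from collections import deque

class AhoCorasick:
    """Minimal Aho-Corasick automaton: goto transitions plus failure links."""
    def __init__(self, patterns):
        self.goto = [{}]   # state -> {char: next state}
        self.fail = [0]    # failure link per state
        self.out = [[]]    # patterns recognised on entering each state
        for pat in patterns:
            s = 0
            for ch in pat:
                if ch not in self.goto[s]:
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                    self.goto[s][ch] = len(self.goto) - 1
                s = self.goto[s][ch]
            self.out[s].append(pat)
        q = deque(self.goto[0].values())  # BFS to fill in failure links
        while q:
            s = q.popleft()
            for ch, t in self.goto[s].items():
                q.append(t)
                f = self.fail[s]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[t] = self.goto[f].get(ch, 0)
                self.out[t] = self.out[t] + self.out[self.fail[t]]

    def feed(self, state, payload):
        """Resume from a stored state, scan payload, return (state, matches)."""
        matches = []
        for ch in payload:
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            matches += self.out[state]
        return state, matches

sigs = ["hello", "she"]
dfa = AhoCorasick(sigs)                       # forward automaton
rdfa = AhoCorasick([s[::-1] for s in sigs])   # automaton over reversed signatures
p1, p2 = "whatshel", "lomg"

# In-order arrival: the DFA alone finds both matches across the boundary.
st, m1 = dfa.feed(0, p1)
st, m2 = dfa.feed(st, p2)
print(m1, m2)   # ['she'] ['hello']

# Out-of-order arrival: feed reversed payloads to the RDFA as they come.
rst, m3 = rdfa.feed(0, p2[::-1])
rst, m4 = rdfa.feed(rst, p1[::-1])
print(m3, m4)   # [] ['olleh', 'ehs']
```

The stored intermediate state (`st`, `rst`) is the only per-flow bookkeeping needed, which is exactly the memory saving the proposal aims for.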
7.3 Limitations
• The assumption itself is also a limitation. While [17] shows that fragmented packets are quite rare and that our assumption will hold in most cases, we still regard it as a limitation.
• Snort with one DFA already uses around 58 MB of memory for the DFA; with two DFAs this almost doubles. So the saving in network buffers is traded for a larger memory requirement.
We looked at different ways of optimizing the huge memory requirement of our proposed solution, such as merging the two DFAs, keeping two transition tables rather than two DFAs, or using suffix trees, but none of them worked: some are inefficient in terms of matching speed, while others can lead to wrong results.
Chapter 8
Anomaly Detection
Several anomaly detection systems exist, for example:
• MINDS (Minnesota Network Intrusion Detection System), which uses the LOF approach for learning the model.
• ADWICE [4], developed in collaboration with SafeGuard, which uses the BIRCH clustering algorithm.
• The SPADE plug-in for the open-source IDS Snort, which inspects recorded data for anomalous behavior based on a computed score.
8.1 Approaches to Anomaly Detection
Anomaly Detection involves two parts namely building the normal profile of the network
and scoring the new flows on the scale 0 to 100 (0 being normal, 100 being anomalous).
Building of normal profile can be done in one of the two ways:
1. Using one of the clustering algorithms like BIRCH etc. where all the learning data
points are clustered first and then when new data comes, it is tested for possbility to
drop into one of the clusters else declared as outlier.
2. Using measures such as Local Outlier Factors, Nearest Neighbour etc. can be used.
Here, rather than clustering the data points, some features or statistics are calculated
over each of the points and then when the new data arrives, it is matched with the
nearest (which is also defined by these measures) data points and scored as normal or
anomalous.
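As a toy illustration of the second approach (the scoring function, its parameters, and the squashing onto the 0–100 scale are all our assumptions, standing in for LOF-style measures, not any particular system's method):

```python
import math

def knn_anomaly_score(train, x, k=3):
    """Score x by its mean distance to its k nearest training points,
    squashed onto a 0-100 scale (0 normal, 100 anomalous).
    The squashing function is an arbitrary illustrative choice."""
    dists = sorted(math.dist(x, p) for p in train)
    avg = sum(dists[:k]) / k
    return 100.0 * avg / (avg + 1.0)

# A point inside the learned "normal" region scores lower than a far outlier.
normal = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
print(knn_anomaly_score(normal, (0.5, 0.5))
      < knn_anomaly_score(normal, (10.0, 10.0)))  # True
```

The sketch also makes the cost visible: scoring one new point touches every stored training point, which is the computational burden discussed next.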
The latter techniques have mainly been deployed in anomaly detection systems at the host level, where intrusion on a single host is the prime concern; some have tried to use them at the network level as well. Their shortcoming is that they require storing all the data points from the learning data, and when new data points arrive, heavy computation over both the existing data and the new point is needed to get good results. Efficient data structures can avoid recomputation over the existing data points, but the computation for each new point remains quite heavy.
As for the former approach, clustering for anomaly detection, we will look at some of the algorithms and their scalability issues in the next chapter.
Some of the limitations of anomaly detection systems are:
1. They work best with properly labelled normal data, where "normal" is defined as the regular traffic features of the network. Gathering or capturing such normal data for a network is not always feasible, for a variety of reasons: the topology of the network, the possibility that an attack or scan is underway while the data is being assumed normal, and so on.
2. Since these systems are based on statistical analysis, false positive rates are much higher than with pattern matching techniques.
3. The algorithms and techniques used are not scalable or fast enough to keep up with today's gigabit network requirements. They are not fast enough because the statistical processing involves heavy computation on each incoming packet, and a large feature set (which implies high-dimensional data) makes the computation even more expensive. Scalability is an issue since these systems depend on network traffic behavior, and today's networks have diverse and changing requirements.
4. Selection of features for defining network behavior from packets is still a developing area. A set of features that can be said to properly and completely define network behavior is still not available.
5. Application-level exploits are still in a developing phase for network anomaly detection systems (no product handles them yet), so any new buffer overflow, SQL injection, or similar exploit remains undetectable by these systems, since most commonly defined features capture network behavior from the headers or flags of packets.
People do try to provide solutions for these various shortcomings. For the normal data, one can use a pattern matching engine to detect any network attacks or scans that are going on, and then collect data over long periods of time, since networks may have different requirements at different times of day or on different weekdays.
Event correlation engines have been developed to correlate the various events/alerts after a threshold is exceeded or a rule is found violated. But rather than looking for a superset of misuse detection capable of detecting every intrusion, people have instead looked at removing the limitations of anomaly detection techniques, which can detect new intrusions and do not suffer from large-signature-set issues (since no signature set is used at all).
People have also worked on the expensive (in CPU and memory terms) and time-consuming computation of these systems, for example by applying techniques such as SVD (singular value decomposition) and PCA (principal component analysis), which reduce the dimensionality of the data in such a way that the results do not vary much from those obtained with all dimensions. Clustering algorithms have also been proposed that work by exploring the "dense" sub-dimensions of the data rather than working on the large data set in full dimensionality, and the results are positive.
ADWICE addresses the last flaw and has even developed a system that adapts to network behavior; its clustering algorithm is a very popular scalable clustering algorithm from the database literature. Let us look at some of the clustering algorithms with their minuses and pluses. We know that, when looking for a clustering algorithm for anomaly detection, we want one with the following properties:
• The clustering algorithm should be adaptive: new data points can be fed into the appropriate clusters, and/or clusters can be modified even at testing time.
• It should be able to classify new input points as normal or anomalous efficiently and quickly (keeping gigabit requirements in mind).
• (Optionally) a memory- and space-efficient clustering algorithm would help in turning the product into an inline one.
Chapter 9
While a lot of research in the database and data mining fields has gone into large-scale, scalable, and efficient clustering algorithms, in anomaly detection we have the additional requirement of speed, which must be much higher than in database settings.
Clustering algorithms can be broadly classified into three categories:
1. Partitioning approaches
2. Grid-based approaches
3. Hierarchical algorithms
Partitioning approaches try to optimize a function such that the space is divided into k partitions and each point lies in the best possible partition. Grid-based approaches slice the n-dimensional space into small cells and then form the dense clusters (this even helps when working with dimensionality reduction). Hierarchical clustering algorithms initially group all data points into the same cluster and then keep partitioning it as dense clusters start forming (the divisive approach), or vice versa for the agglomerative approach. Let us look at some of the clustering algorithms.
BIRCH takes several input parameters, among them:
• P, the amount of memory available to the process.
• L, the maximum number of clusters at each leaf node.
It maintains a B-tree-like structure in which each node has at most B children. All the clusters sit at the leaf nodes of the tree. Initially the tree is empty and, say, T = 0. As new data points arrive, each traverses the tree to find the appropriate leaf node it can fit into, and then looks for the best match among the clusters in that leaf node. If it can fit into one of the clusters, it is inserted there; otherwise a new cluster is formed. Whether a data point fits is decided by a distance-based measure (Manhattan, Euclidean, etc.), and the cluster statistics are updated after insertion. If forming a new cluster raises the leaf's cluster count above L, the leaf is split into two leaves with a parent above them, and the clusters are assigned to the appropriate leaf nodes. Also, if at some point the memory cap P is reached, T is increased so that the cluster sizes grow and more points fit into each cluster, thereby reducing the cluster count and freeing up some memory.
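The per-cluster bookkeeping behind this insertion step can be illustrated with the cluster-feature statistics BIRCH maintains per cluster. The triple (N, LS, SS) and the threshold test below are a simplified sketch, not the full CF-tree:

```python
import copy
import math

class CF:
    """BIRCH cluster feature: point count N, linear sum LS, squared sum SS.
    These three statistics suffice to update centroid and radius in O(1)."""
    def __init__(self, dim):
        self.n, self.ls, self.ss = 0, [0.0] * dim, 0.0

    def radius(self):
        """Average distance of member points from the centroid."""
        c = [s / self.n for s in self.ls]
        return math.sqrt(max(0.0, self.ss / self.n - sum(v * v for v in c)))

    def add(self, x):
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, x)]
        self.ss += sum(v * v for v in x)

def try_insert(cf, x, T):
    """Insert x into cf only if the resulting radius stays within threshold T;
    otherwise the caller starts a new cluster instead."""
    trial = copy.deepcopy(cf)
    trial.add(x)
    if trial.n == 1 or trial.radius() <= T:
        cf.n, cf.ls, cf.ss = trial.n, trial.ls, trial.ss
        return True
    return False

cf = CF(2)
print(try_insert(cf, [0.0, 0.0], T=1.0))      # True: first point always fits
print(try_insert(cf, [0.5, 0.0], T=1.0))      # True: radius stays small
print(try_insert(cf, [100.0, 100.0], T=1.0))  # False: would blow up the radius
```

The fixed threshold T in this test is precisely what Chapter 10's ADWICE-TRAD proposal revisits.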
Positive points of this algorithm:
• Running time is O(n), which is much better than other algorithms; additional phases also clean up some of the errors from phase 1.
• It is memory efficient (hence it can easily be built into an inline product).
• Because of the efficient tree data structure, classifying new data points is easy.
Negative points of this algorithm:
• The clustering is not unique: the result depends on the order of the data points. This is because two small dense clusters can be joined into one if the data points arrive alternately from the two clusters, or in some similar order (assuming T is large enough to encapsulate both clusters).
• It uses distance-based measures for all calculations, which are known to be less accurate when clusters with different densities and sizes exist.
• Some data points may be assigned to the wrong clusters because of the limitations of distance-based measurement.
• It requires a large number of input parameters.
• The clusters formed are spherical, which may lead to many false positives.
The algorithm takes two input parameters:
1. Eps, the distance within which one should look for a point's neighbours.
2. Minpts, the number of points that must lie within the Eps-neighbourhood of a point for it to be a core point.
For each point, first find its Eps-neighbourhood (this step takes O(log N) time using efficient R∗-trees); if more than Minpts data points lie within this region, the point is made a cluster of its own and assigned a new cluster ID. One might think that every point with more than Minpts data points in its neighbourhood would then form a separate cluster, but this is not so, since the paper also defines a merging mechanism, via density-reachability and density-connectedness, for pairs of clusters that should form one cluster. In the step where the neighbourhood is found and the cluster ID assigned, another loop runs over each point in that neighbourhood, checking whether it also forms its own cluster; if so, the clusters are merged based on the definition of reachability. This step then recurses (each point of the merged cluster checks its own points and hence its own cluster possibilities), and using the definition of connectedness, clusters keep merging until no more can be merged.
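The core-point test and cluster expansion of the density-based algorithm of [11] (commonly known as DBSCAN) can be condensed into a short sketch. The R∗-tree index is replaced here by a linear scan, so neighbourhood search is O(N) rather than O(log N):

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: label each point with a cluster id >= 0, or -1 for noise.
    Cluster expansion follows density-reachability from core points."""
    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cid = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nb = neighbours(i)
        if len(nb) < min_pts:
            labels[i] = -1            # noise, unless later claimed as a border point
            continue
        labels[i] = cid
        seeds = list(nb)
        while seeds:                  # expand the cluster from core points
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cid       # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cid
            nb_j = neighbours(j)
            if len(nb_j) >= min_pts:  # j is itself a core point
                seeds.extend(nb_j)
        cid += 1
    return labels

# Two tight blobs plus one isolated outlier (min_pts counts the point itself).
pts = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (20, 20)]
print(dbscan(pts, eps=0.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

For anomaly detection, the noise label (-1) is what would be flagged; note that every query recomputes neighbourhoods, illustrating the efficiency concern raised below.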
The positive points of this algorithm:
• It uses density-based statistics for clustering, which is far more accurate than distance-based measures.
• It requires only two input parameters from the user.
Its negative points:
• Anomaly detection on new data points is not very efficient, since for each new input the program has to find its Eps-neighbourhood.
• It is not capable of differentiating clusters with different sizes and different densities, since Eps is predefined and fixed all the time.
Chapter 10
ADWICE-TRAD
Fixing the same threshold for all clusters is unfair to many of them. For example, consider a cluster with all its points near the center and a threshold T: such a cluster can still include bad points that lie near its boundary. Hence, fixing the same threshold for all clusters is not right; rather, the threshold should depend on cluster properties such as the distribution of points and the density of the cluster. We therefore propose a density-based mechanism for deciding cluster size and threshold, which we name ADWICE-TRAD.
BIRCH uses distance-based measures in its clustering algorithm, under which all clusters have the same threshold size T: for a new point to be included in a cluster, its distance from the center of the cluster must be less than T. Define the 'inclusion region' as the spherical region of radius T around the center of a cluster. Currently the inclusion region is independent of the current density of the cluster and is the same for all clusters. But if a cluster is dense, its inclusion region should be smaller and should depend on the current radius of the cluster rather than on some predefined fixed threshold, while for a sparse cluster the inclusion region should be relatively large.
So the inclusion of a new point in a cluster should depend on the density of the cluster (i.e., the number of points in the cluster and its current radius). Mathematically, the measurements are made on the basis of two more variables, t0 and R0, both explained below.
• R0 (an additional statistical variable that needs to be stored with each cluster's feature set) is different for each cluster and depends on its current number of points and its current radius R(CFi).
• Using the above expression alone as the measure, clustering would suffer in the case of one or very few points in a cluster; hence we define t0 as the threshold handling these base cases (it can be kept fairly small). So, the threshold requirement becomes
• Also, for large sparse clusters, we want an upper bound on the radius of the cluster, so as to prevent explosion of some of the clusters.
So, the threshold requirement in ADWICE-TRAD would be
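The exact threshold expressions are not reproduced in this excerpt. Purely to illustrate the three requirements above, one plausible form (assumed by us, not the report's actual formula) combines the base threshold t0, a term proportional to the current radius, and an upper bound:

```python
def inclusion_threshold(n_points, radius, t0=0.1, c=1.5, r_max=5.0):
    """Illustrative density-dependent inclusion threshold.
    t0, c and r_max are assumed parameters, not values from the report:
    - t0 covers the base case of clusters with one or very few points;
    - otherwise the threshold scales with the cluster's current radius,
      giving dense (small-radius) clusters a tighter inclusion region;
    - r_max caps the growth of large sparse clusters."""
    if n_points <= 1:
        return t0
    return max(t0, min(c * radius, r_max))

print(inclusion_threshold(1, 0.0))    # 0.1  (base case)
print(inclusion_threshold(50, 0.02))  # 0.1  (dense cluster: tight region)
print(inclusion_threshold(50, 2.0))   # 3.0  (scales with radius)
print(inclusion_threshold(50, 10.0))  # 5.0  (capped)
```

Any formula with these three behaviours would satisfy the stated requirements; the specific constants here are arbitrary.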
Chapter 11
Both methodologies of intrusion detection, misuse detection and anomaly detection, have been widely researched. Since we have already listed the major issues with both types of systems, it is quite clear that neither can work without the other: a robust intrusion detection system needs both. We have looked at two problems. The first is handling out-of-order packets in a NIDS, for which we have proposed a solution, though it is still far from a workable or optimized model. The second is decreasing the false positive rate of ADWICE by introducing additional statistical parameters for the clusters, which bring a component of density into the clustering algorithm.
Future work includes:
• Providing solutions for any of the remaining issues in anomaly detection, or combating the limitations of such systems.
Bibliography
[2] http://bro-ids.org.
[3] Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng, and Jörg Sander. Lof:
identifying density-based local outliers. In SIGMOD ’00: Proceedings of the 2000 ACM
SIGMOD international conference on Management of data, pages 93–104, New York, NY,
USA, 2000. ACM Press.
[4] Kalle Burbeck and Simin Nadjm-Tehrani. Adwice - anomaly detection with real-time
incremental clustering. In ICISC, pages 407–424, 2004.
[5] T. Cormen, C. Leiserson, and R. Rivest. Introduction to Algorithms. MIT Press, 1990.
[6] L. Ertoz, E. Eilertson, A. Lazarevic, P. Tan, J. Srivastava, V. Kumar, and P. Dokas. The MINDS - Minnesota Intrusion Detection System, in "Next Generation Data Mining". MIT/AAAI Press, 2004.
[7] J. Heering, P. Klint, and J. Rekers. Incremental generation of lexical scanners. ACM
Trans. Program. Lang. Syst., 14(4):490–520, 1992.
[9] C. A. Kent and J. C. Mogul. Fragmentation considered harmful. WRL Technical Report
87/3, 1987.
[10] Aleksandar Lazarevic, Aysel Ozgur, Levent Ertoz, Jaideep Srivastava, and Vipin
Kumar. A comparative study of anomaly detection schemes in network intrusion
detection. In SIAM International Conference on Data Mining, 2003.
[11] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining (KDD 96). AAAI Press, 1996.
[12] M. Handley, C. Kreibich, and V. Paxson. Network intrusion detection: Evasion, traffic normalization, and end-to-end protocol semantics. In Proc. of the 10th USENIX Security Symposium (Security '01), 2001.
[13] Marc Norton. Optimizing pattern matching for intrusion detection, 2004.
[15] Thomas H. Ptacek and Timothy N. Newsham. Insertion, evasion, and denial of
service: Eluding network intrusion detection. Technical report, Secure Networks, Inc.,
Suite 330, 1201 5th Street S.W, Calgary, Alberta, Canada, T2R-0Y6, 1998.
[16] Shai Rubin, Somesh Jha, and Barton P. Miller. Automatic generation and analysis of
nids attacks. In ACSAC ’04: Proceedings of the 20th Annual Computer Security
Applications Conference (ACSAC’04), pages 28–38, Washington, DC, USA, 2004. IEEE
Computer Society.
[18] R. Sidhu and V. Prasanna. Fast regular expression matching using fpgas, 2001.
[21] Tian Zhang, Raghu Ramakrishnan, and Miron Livny. BIRCH: an efficient data
clustering method for very large databases. pages 103–114, 1996.