C4.5 Based Sequential Attack Detection and Identification Model
Introduction

Examples of attacks: Denial of Service (DoS), IP Spoofing, Buffer Overflow

[Figure: The number of total vulnerabilities catalogued from 1995 to 2006]

[Figure: Packets drop under DDoS Attack. Attack packets and legitimate packets travel from stub domains through edge routers and the transit domain toward the victim, congesting the bottleneck link.]
Motivation

Existing approaches to defend against attacks
Sequential Multi-Level Classification Model

The objective is to find the natural hierarchy in network traffic and to exploit the generic and differentiating characteristics of different attacks to build a more secure environment.

A differential approach is used to detect one kind of attack at a time from the network traffic.

A sequential model with different binary classifiers at each level, categorizing attacks in a step-by-step manner, is used.

Rules are also generated at different levels of abstraction.

The KDD99 dataset is used for evaluation.
[Figure: Sequential classification tree. Node 1 separates Class 1 from Node 2; Node 2 separates Class 2 from Node 3; Node 3 separates Class 3 from Class 4.]
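The sequential model can be sketched as a cascade of binary decision trees, each level separating one traffic category from the rest and passing the remainder downward. This is a minimal illustration, not the authors' Weka setup: the class names and toy data are invented, and scikit-learn's entropy criterion is used as a stand-in for C4.5.

```python
# Sketch of the sequential multi-level idea: a cascade of binary
# decision trees, one per level, each separating one category from
# the remaining traffic. Assumes scikit-learn as a C4.5 stand-in.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class SequentialClassifier:
    def __init__(self, levels):
        # levels: ordered class labels; one label is peeled off per level
        self.levels = levels
        self.trees = []

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        for label in self.levels[:-1]:
            tree = DecisionTreeClassifier(criterion="entropy")
            tree.fit(X, (y == label).astype(int))   # binary: label vs. rest
            self.trees.append(tree)
            keep = y != label                        # pass the rest downward
            X, y = X[keep], y[keep]
        return self

    def predict_one(self, x):
        x = np.asarray(x, dtype=float).reshape(1, -1)
        for label, tree in zip(self.levels, self.trees):
            if tree.predict(x)[0] == 1:
                return label
        return self.levels[-1]                       # last remaining class
```

Each classifier sees only the traffic not already claimed by an earlier level, which is what keeps the individual trees small and interpretable.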
Mathematical Model

1. Traffic Feature Distribution

    X = {n_i, i = 1, 2, ..., N}

where X is a random process in which feature value i occurs n_i times. The index i is defined by one of the following traffic features in the packet header, or a combination of them:

Source IP address
Destination IP address
Source Port
Destination Port

Flow Id (i)    Number of Packets (n_i)
1              n_1
2              n_2
3              n_3
:              :

2. Sampling

    {X(t), t = jΔ, j = 1, ..., n}

X(t) represents the number of packet arrivals for a flow in the interval {t_{j-1}, t_j}. At each sampling instant (e.g., t = 3Δ), the observations X(3Δ, 1), X(3Δ, 2), ..., X(3Δ, N) across the N flows give one entropy value H(3Δ); sampling over the intervals {t_1, t_2}, {t_2, t_3}, ... yields the entropy series H(X).
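The entropy of one sampled distribution can be computed directly from the packet counts n_i. A minimal sketch, assuming Shannon entropy in bits (the log base is my assumption, not stated in the slides):

```python
# Sketch: Shannon entropy H(X) of a traffic-feature distribution,
# where packet_counts[i] = n_i is the number of packets whose header
# feature (e.g. source IP) takes value i. A flood concentrated on few
# values drives the entropy toward zero.
import math

def feature_entropy(packet_counts):
    """H(X) = -sum(p_i * log2(p_i)) with p_i = n_i / sum(n)."""
    total = sum(packet_counts)
    if total == 0:
        return 0.0
    probs = (n / total for n in packet_counts if n > 0)
    return -sum(p * math.log2(p) for p in probs)
```

For example, four flows with equal counts give H(X) = 2 bits, while a single dominant flow gives H(X) = 0.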
Level 2 (Dos Attack vs. Other Attacks)

Actual Class     Classified as:  Normal   Dos Attack   Other Attacks
Normal                           0        83           257
Dos Attack                       0        222524       795
Other Attacks                    0        435          3998

Correctly Classified Instances: 99.3117%

Level 3 (Probe Attacks vs. Others)

Actual Class     Classified as:  Normal   Dos   Probe   Others
Normal                           0        0     253     7
Dos Attack                       0        0     358     471
Probe Attack                     0        0     3086    0
Other Attacks                    0        0     347     527

Correctly Classified Instances: 71.5587%

Level 4 (U2R vs. R2L)

Actual Class     Classified as:  Normal   Dos   Probe   U2R   R2L
Normal                           0        0     0       1     6
Dos                              0        0     0       0     471
Probe                            0        0     0       0     0
U2R                              0        0     0       9     8
R2L                              0        0     0       2     508

Correctly Classified Instances: 51.4428%
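The "Correctly Classified Instances" percentage in each table is simply the confusion-matrix trace over the total instance count, as reported by Weka. A short sketch:

```python
# Sketch: accuracy from a confusion matrix, i.e. the diagonal (correct
# predictions) divided by the total number of instances, in percent.
def accuracy(confusion):
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return 100.0 * correct / total
```

Applied to the Level-2 matrix above ([[0, 83, 257], [0, 222524, 795], [0, 435, 3998]]), this reproduces the reported 99.3117%.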
Improvements in Training Dataset

KDD99 10% training dataset and testing dataset distribution:

Class     Training Set   Testing Set
Normal    19.69%         19.48%
Probe     0.83%          1.34%
Dos       79.24%         73.90%
U2R       0.01%          0.07%
R2L       0.23%          5.20%
Improvements

New dataset: the U2R, R2L and Probe data was duplicated 5 times.
The Level-1 classifier was trained using this new dataset.

Testing results of the Level-1 classifier on the earlier dataset:

The attack detection rate increased from 90.942% to 92.2515%.
The data duplication improved the misuse and anomaly detection rates from 99.6835% and 26.4618% to 99.9832% and 35.2479%, respectively.
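The rebalancing step can be sketched as simple oversampling of the rare classes. The record format (label as the last field) is illustrative, and whether "duplicated 5 times" means 5 extra copies or 5 total is ambiguous in the slides; this sketch appends 5 extra copies.

```python
# Sketch: rebalance a KDD99-style training set by duplicating records
# of the rare classes (U2R, R2L, Probe). Assumes each record is a
# tuple whose last field is the class label; "copies" extra duplicates
# are appended per rare record (my reading of "duplicated 5 times").
def rebalance(records, rare_labels, copies=5):
    out = []
    for rec in records:
        out.append(rec)
        if rec[-1] in rare_labels:
            out.extend([rec] * copies)
    return out
```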
Descriptive Modeling

The advantage of the multi-level sequential approach is that we get small, easily interpretable trees.

Rules can be derived from these decision trees at different levels of abstraction. These rules are in terms of the 41 features of the KDD dataset.

E.g., a rule derived from the second classifier:

If (% of connections to different services for the same host over the last 1000 connections < 0.1
    and % of connections to different hosts for the same service over the past 1000 connections < 0.01
    and number of connections to the same host over the past two seconds > 2)
=> Dos Attack
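Transcribed as code, the rule above is a predicate over three KDD99-style features. The parameter names below are my shorthand for the three quantities in the rule's wording, not the dataset's exact feature identifiers:

```python
# Sketch of the derived Level-2 rule as a predicate. Parameter names
# paraphrase the rule's wording; mapping them onto the exact KDD99
# feature names is left as an assumption.
def matches_dos_rule(diff_srv_rate, srv_diff_host_rate, same_host_count):
    """True when the connection profile matches the derived Dos rule."""
    return (diff_srv_rate < 0.1          # % connections to different services
            and srv_diff_host_rate < 0.01  # % connections to different hosts
            and same_host_count > 2)       # connections to same host, past 2 s
```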
Conclusion

The model has a low false alarm ratio of 0.15%.

Individual attack detection rates of 99.644% for Dos and 100% for Probe are achievable.

The classification accuracy between U2R and R2L is as high as 98.1024%.

The new dataset gives better results: a misuse detection rate of 99.9832% and an anomaly detection rate of 35.2479%.

The generated trees are small, and rules are easy to derive from them at different levels of abstraction.
References

[1] S. Axelsson, "The Base-Rate Fallacy and the Difficulty of Intrusion Detection," ACM Transactions on Information and System Security, 2000.
[2] V. Corey et al., "Network Forensics Analysis," IEEE Internet Computing, vol. 6, no. 6, 2002, pp. 60-66.
[3] R. J. Henery, "Classification," in Machine Learning, Neural and Statistical Classification, D. Michie, D. J. Spiegelhalter, and C. C. Taylor (Eds.), Ellis Horwood, New York, 1994.
[4] E. Bloedorn, L. Talbot, C. Skorupka, A. Christiansen, W. Hill, and J. Tivel, "Data Mining Applied to Intrusion Detection: MITRE Experiences," in Proc. IEEE International Conference on Data Mining, 2001.
[5] Y. Ma, D. Choi, and S. Ata, Eds., Application of Data Mining to Network Intrusion Detection: Classifier Selection Model, ser. Lecture Notes in Computer Science, vol. 5297. Berlin Heidelberg, Germany: Springer-Verlag, 2008.
[6] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, "A Detailed Analysis of the KDD CUP 99 Data Set," in Proc. IEEE Symposium CISDA'09, 2009.
[7] J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo, California, 1993.
[8] Weka - Data Mining Machine Learning Software. [Online]. Available: http://www.cs.waikato.ac.nz/ml/weka/
[9] KDD Cup 1999 Data. [Online]. Available: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
[10] M. Sabhnani and G. Serpen, "Why Machine Learning Algorithms Fail in Misuse Detection on KDD Intrusion Detection Dataset," Intelligent Data Analysis, vol. 6, June 2004.
[11] K. Kendall, "A Database of Computer Attacks for the Evaluation of Intrusion Detection Systems," M.Eng. Thesis, Massachusetts Institute of Technology, Massachusetts, United States.
Thank You