Integrated Intrusion Detection Model Based On Artificial Immune
1. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China
2. Zhengzhou University of Light Industry, Zhengzhou 450000, China
Abstract
The authors put forward an integrated intrusion detection (ID) model based on artificial immune (IIDAI), a vaccination strategy based on the significance degree of genes, and a method to generate initial memory antibodies with rough set (RS). IIDAI integrates two intrusion detection modes: misuse detection and anomaly detection, which are applied to detect known and unknown attacks, respectively. On the basis of the IIDAI model, an ID algorithm is presented. Simulation shows that IIDAI performs better than traditional ID methods in feasibility and effectiveness. The vaccination strategy helps to achieve a higher convergence rate. Moreover, RS removes redundant attributes and increases the detection speed, and the integrated method increases the detection rate.
to apply misuse detection and anomaly detection to detect known or unknown attacks effectively. First, misuse detection and anomaly detection are applied to detect known and unknown attacks, respectively. But anomaly detection takes a long time. Consequently, misuse detection must be carried out before anomaly detection. Second, removing redundant attributes is an important factor in increasing the detection speed. Usually this is achieved by machine learning algorithms such as Bayesian classification, fuzzy clustering, rock clustering, RS, etc. In particular, RS theory has many unique traits. In 1982, Pawlak first proposed RS to reduce the attribution of important contents. With the development of information science, many researchers have paid more attention to the application of RS theory in the fields of artificial intelligence, such as knowledge discovery, data mining and pattern recognition. RS plays an important role in data mining for obtaining data clustering, forming equivalent sets, and generating decision-making rules, which are called memory antigens [11]. For incomplete, uncertain or small-sample data, RS has obvious merits in erasing redundant attributes to get effective initial memory antibodies.
The artificial intelligence [6] and RS algorithms are repeated according to the actual needs of intrusion detection, and the IIDAI method is demonstrated in detail. Three merits are given below:
1) By using RS, redundant attributes are removed and initial memory antibodies are generated. The time complexity of the algorithm is reduced and the detection speed is improved.
2) Enlightened by RS theory, the definition of the antigen decision table is given. The significance degree of gene and the vaccination extraction scheme are proposed based on this definition. Moreover, vaccinations are generated by this method to improve the convergence speed of the detection algorithm.
3) The method of generating immature antibodies is improved. It selects the best ones among the memory antibodies, which are cloned and varied to improve the stability of the detection algorithm.
2 The definitions
2.1 Antigens decision table
In Ref. [12], antibody information decision table, equivalence relation, approximation, antibodies attribute dependency and significance degree of gene are defined as follows.
Definition 1 Antigen $g \in A$, $A \subset \{0,1\}^l$ ($l \in \mathbb{N}$, $l > 0$). A is the antigen set, g is a binary string of length l, and the value of antigen g denotes the character of the attribute.
Definition 2 Antibody $b \in B$, $B \subset \{0,1\}^l$, antibody gene $a \in \{0,1\}^1$, $e \in \mathbb{N}$, $B = \{\langle d, s, e, c \rangle\}$, $s \in \{00, 01, 10\}$. B is the antibody set, s denotes the status of the antibody, whose values include 00, 01 and 10, e is the age of the antibody, and c is the matching number. $B = B_I \cup B_T \cup B_M$, where $B_I$ is the set of immature antibodies, $B_I = \{I \mid I \in B, I_s = 00\}$, $B_T$ is the set of mature antibodies, $B_T = \{T \mid T \in B, T_s = 01\}$, and $B_M$ is the memory antibody set, $B_M = \{M \mid M \in B, M_s = 10\}$.
Definition 3 Let $S_{self}$ and $S_{nself}$ be the sets of normal and abnormal behaviors, respectively. Obviously, we have $S_{self} \cap S_{nself} = \emptyset$, $S_{self} \cup S_{nself} = A$.
The affinity of g and b means the bonding strength between antigen and antibody, denoted by fit(b, g).
Definition 4 fit(b, g) can be calculated by Euclidean distance:
$\mathrm{fit}(b, g) = \sum_{i=1}^{l} (g_i - b_i)^2$ (1)
where $g_i$ and $b_i$ are the ith bits of antigen g and antibody b, respectively.
Definition 5 An antibody decision information system is defined as $T = \langle U, C \cup D, V, f \rangle$, where:
1) Nonempty domain $U = \{b_1, b_2, \ldots, b_n\}$; $b_1, b_2, \ldots, b_n$ are antibodies, $B \subset U$, and B is the antibody population.
2) Conditional attribute subset $C = \{x_1, x_2, \ldots, x_m\}$ ($m < l$); D is the decision attribute subset.
3) $D = \{0, 1\}$; if the value is 0, the antibody is self, otherwise the antibody is non-self, and $C \cap D = \emptyset$.
4) $V = \bigcup_{x \in C \cup D} V_x$; V is called the domain of antibody.
5) $f: U \times (C \cup D) \to V$; f is an information function, and $\forall x \in C \cup D$, $b \in U$, $f(b, x) \in V$.
Definition 6 Every subset of antibodies $B' \subset B$ decides a binary equivalence relation $I(B')$, which is denoted as:
$I(B') = \{(b, b') \in U \times U \mid \forall b \in B, f(b, x) = f(b', x)\}$.
U is divided into k classes by $I(B')$, each class
Issue 2 ZHANG Ling, et al. / Integrated intrusion detection model based on artificial immune 85
$I(B') = \{B'_1, B'_2, \ldots, B'_k\}$; $k \le n$ (2)
$\{b_1, b_2, \ldots, b_{n'}\}$, $1 \le n' \le n$, is the antibody population; $b_{i'k'}$, $1 \le i' \le n'$, $1 \le k' \le l$, denotes the $k'$th gene of the $i'$th antibody $b_{i'}$. Let $a = b_{i'k'}$; for antibody attribute D, the significance degree of antibody gene a is:
$F(a, C, D) = F(b_{i'}, C, D) = r(C, D) - r((C - \{b_{i'}\}), D)$ (5)
Definition 11 $v \in \{0, 1, *\}^l$ ($l \in \mathbb{N}$, $l > 0$); v is defined as a string of length l coded with the characters 0, 1 and *, and $v_{k'}$ is the $k'$th character of v.
Definition 12 Vaccination extraction operator: let $b_1, b_2, \ldots, b_{s'}$ be the excellent antibodies in population $\{b_1, b_2, \ldots, b_n\}$ under certain evaluation criteria, and let $\mathrm{sig}(b_{i'k'})$ be the significance degree of gene $b_{i'k'}$ with respect to the decision attributes. The vaccination extraction operator is defined in Eq. (6):
$1;\ s' \sum (\mathrm{sig}(b_{i'k'})\, b_{i'k'}) > \alpha,\ \alpha \ge 0.8$
In Eq. (8), $E'(v)$ is the value of the fitness of the antibody before inoculation, fit is the function with which the affinity between antigen and antibody is calculated, and $\hat{a}$ is the one after inoculation.
3 IIDAI model
The ID evolution model based on artificial immune is described as a four-tuple $\Omega = (A, B, V, \Theta)$, where $A = \{S_{self}, S_{nself}\}$, $B = B_I \cup B_T \cup B_M$ and V are the sets of antigens, antibodies and vaccinations, and $\Theta$ represents the vaccination inoculation operator. The evolution model is a dynamic cyclical process of 'production, differentiation, proliferation and apoptosis'. Each antigen and antibody changes from moment to moment; thus, all the antigens, antibodies and vaccinations are dynamically ever-changing in the evolution model. The evolution model is shown in Fig. 1.
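As an illustration only, the antibody tuple of Definition 2, the affinity of Eq. (1) and the split of B into immature, mature and memory sets can be sketched in Python as follows. The names `Status`, `Antibody`, `fit` and `partition` and the sample population are hypothetical and do not come from the original model. The affinity is implemented exactly as Eq. (1) is printed (a sum of squared bit differences); if a true Euclidean distance is intended, a square root would have to be added.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    """Antibody status s from Definition 2."""
    IMMATURE = "00"
    MATURE = "01"
    MEMORY = "10"


@dataclass
class Antibody:
    genes: str        # binary string of length l
    status: Status    # s in {00, 01, 10}
    age: int = 0      # e, the age of the antibody
    matches: int = 0  # c, the matching number


def fit(antibody_genes: str, antigen: str) -> int:
    """Eq. (1) as printed: sum over all bits of (g_i - b_i)^2."""
    if len(antibody_genes) != len(antigen):
        raise ValueError("antigen and antibody must have the same length l")
    return sum((int(g) - int(b)) ** 2 for g, b in zip(antigen, antibody_genes))


def partition(population):
    """Split the population B into (B_I, B_T, B_M) according to status."""
    by_status = {s: [] for s in Status}
    for antibody in population:
        by_status[antibody.status].append(antibody)
    return (by_status[Status.IMMATURE],
            by_status[Status.MATURE],
            by_status[Status.MEMORY])


# Hypothetical 4-bit population, for illustration only.
population = [
    Antibody("1010", Status.IMMATURE),
    Antibody("1110", Status.MATURE, age=3, matches=2),
    Antibody("0001", Status.MEMORY, age=9, matches=7),
]
b_i, b_t, b_m = partition(population)
print(fit("1110", "1010"))
```

For binary strings the sum in Eq. (1) equals the number of bit positions in which the antigen and the antibody differ.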
86 The Journal of China Universities of Posts and Telecommunications 2014
After the vaccination, Eq. (8) is used to evaluate the new antibodies; if the new ones detect the antigens more effectively than the antibodies which are not inoculated, the new antibodies are excellent vaccinations. Let $b \in B_T$ be a mature antibody and a an antibody gene of b; a is inoculated and the affinity is calculated. If $\mathrm{fit}(a\,\Theta\,v, g) > \mathrm{fit}(a, g)$, then b is replaced by the new antibody; otherwise the old antibody is kept. The significance degree of gene is calculated by the significance degree of attribute.
3) Transition from mature antibodies to memory antibodies
For a mature antibody b in $B_T$, within its lifecycle $T_2$, if the mature antibody matches non-self, namely $b_c > \theta$ ($\theta$ is the activation threshold), b is activated; otherwise it dies away naturally.
After b is activated, the CSM signal is used to judge whether the mature antibody may change into a memory antibody. If there is no CSM signal within a certain time, $b_T$ dies naturally. The CSM signal is needed in order to avoid a high false alarm rate.
4) Obtaining next-generation immature antibodies
The next-generation antibodies are generated from the antibodies which already existed within the iteration cycle. In order to ensure optimal antibodies, the strategy is as follows:
a) The memory antibodies are divided into several categories. In each category, the better memory antibodies are selected for Gaussian mutation, and the mutated individuals are saved as the next-generation immature antibody population.
b) Some individuals in the antibody population are chosen for crossover and variation operations, and are then saved in the immature antibody population.
c) A random function is applied to get some immature antibodies, and those immature antibodies are added into the immature antibody population.
The purposes of the strategy above are: on the one hand, the diversity of antibodies is ensured and local convergence is avoided; on the other hand, excessive copying of antibodies is limited, and the detection rate is also increased.
5) Dividing memory antibodies into different populations
To obtain good vaccinations, the memory antibodies need to be classified, namely $B_M = \bigcup B_{M_{j'}}$. Each set stands for one type of antibody population. The method used to classify the antibodies is the rock clustering algorithm of Ref. [8], applied at certain periods.
6) Extraction of different vaccinations
The iteration of the algorithm increases the load of the system, so at certain periods a vaccination extraction operation is applied to the memory antibodies. For each subset, the vaccination $v_i$ is extracted and added into the vaccination set.
The classification of vaccinations serves to detect different types of attacks effectively and to obtain an excellent antibody population. The operation is done separately for the different types of intrusion to ensure that each kind of intrusion behavior will be detected.
3.3 Complexity of algorithm
In this section the authors analyze the time and space complexity of the proposed algorithm. Let $N_t$, $N_a$, $N_s$, $N_m$, $N_n$ and J be the numbers of training samples, antigens, self-antibodies, memory antibodies, non-memory antibodies and detectors' attributes, respectively. The lengths of the detectors and of the input antigen signal are denoted by $L_1$ and L.
The time complexity of the RS algorithm is $O(2^{41} N_t^2 L)$, covering the processes of calculating OB(D), calculating the significance degree of attributes and generating the memory antibodies with the RS algorithm. Moreover, the time complexity of detecting antigens is $O(N_a N_m L_1)$, but it is increased to $O(2^J N_t^2 L)$ after M iterations. It is known that when $J < 41$ and after one iteration, the time complexity of our IIDAI algorithm is less than that of IIDV. Once the vaccinations are inoculated, initial memory antibodies are generated only at the beginning; therefore the total time complexity of IIDAI is $O(N_a N_m L_1)$.
The space complexity of IIDAI is related to generating memory antibodies by means of RSA. Because $N_1$ is large, the space complexity of DynamiCS and IIDV is $O(N_n L (N_n + N_a))$ after an iteration, but that of the antigens decision table is $O(N_t L)$ in the process of generating memory antibodies. Thus, the space complexity of IIDAI is $O(N_t L)$.
The comparisons among the DynamiCS, IIDV and IIDAI algorithms are shown in Table 1.
Table 1 The complexity of algorithms
Algorithm | Time complexity | Space complexity
DynamiCS in Ref. [6] | $O(N_a N_m L)$ | $O((N_n + N_m) L)$
IIDV in Ref. [8] | $O(N_a N_m L)$ | $O((N_n + N_m) L)$
IIDAI | $O(N_a N_s L_1)$ | $O(N_t L)$
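To make the effect of a shorter detector length concrete, the following sketch simply evaluates the detection-cost expression $N_a N_m L$ for two antibody lengths. It is a back-of-the-envelope illustration, not a benchmark: `detection_cost` is a hypothetical helper, `N_A = 1000` is an assumed number of antigens, and the values 324, 92 and 28 are the memory-antibody count and the antibody lengths reported in Sect. 4.

```python
def detection_cost(n_antigens: int, n_memory: int, length: int) -> int:
    """Bit comparisons needed to match every antigen against every memory antibody."""
    return n_antigens * n_memory * length


N_A = 1_000      # assumed number of antigens to scan (hypothetical)
N_M = 324        # memory antibodies obtained with RSA (Sect. 4)
L_FULL = 92      # antibody length before attribute reduction (Table 2)
L_REDUCED = 28   # antibody length after attribute reduction (Sect. 4)

full = detection_cost(N_A, N_M, L_FULL)
reduced = detection_cost(N_A, N_M, L_REDUCED)
speedup = full / reduced

print(f"full={full} reduced={reduced} speedup={speedup:.2f}")
```

With the number of antigens and memory antibodies held fixed, the ratio depends only on the two lengths, which is why removing redundant attributes translates directly into fewer comparisons per detection pass.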
As is well known, a prominent merit of RSA is eliminating redundancies; therefore the length of memory antibodies decreases, namely $L > L_1$. It is concluded from Table 1 that our proposed algorithm based on RSA can improve the detection speed and the effectiveness of obtaining memory antibodies.
However, much space is needed because of the large amount of antigens in the process of training the data. Once the memory antibodies are generated, the space of IIDAI is equal to that of the other two algorithms. Thus, it is worth sacrificing certain space complexity in order to improve the detection speed.
4.1 Experimental datasets
The experimental data are from the '10% KDD99' dataset. 50 million records are adopted in the simulations. About 80% of the records, S1, are used for training to get decision rules; the others (20% of the record set) are included in set S2, which is used as testing data. The testing data are divided into 5 groups. Each set has 2 million data. A timestamp is added to each record in order to simulate detection in which one record is sampled per second in the testing data.
1) Antigens
The data domains are converted to binary strings which are used to represent antigens [13]; the conversion methods are shown in Table 2.
Table 2 The construction of antigens table
Item | Conversion method | Length/bit
Protocol_type | TCP, UDP and ICMP are replaced by 01, 11, 11 | 2
Service | According to the initials, the strings are converted into 1 to 66, and converted into a binary string | 7
Duration | According to the initials, the strings are converted into 1 to 11, and transformed to a binary string | 4
Wrong fragment | Convert into a binary string | 2
Urgent | Convert into a binary string | 2
Logged in | Represented as 000, 001, 010, 011 | 3
File creations | Convert into a binary string | 4
Shells | Convert into a binary string | 2
Access files | Convert into a binary string | 3
Srv diff host rate | Multiply by 100, and convert into a binary string | 7
{Land, logged in, flag, root shell, is guest login, is hot login} | 0 and 1 | 1
Others | Lower is 00, middle is 01, high is 10 and higher is 11 | 2
From Table 2, the length of antibodies is 92 bit.
2) The parameters
There are 4 parameters which have great influence on the performance of the IIDAI algorithm: the length of memory antibodies $L_1$, the tolerance period $T_1$ (the lifecycle of an immature antibody), the lifecycle of a mature antibody $T_2$ and the activation threshold $\theta$. The value of $L_1$ is given in Sect. 4.2 by the experiment.
As in Ref. [6, 8–9], the other parameters are set as: $T_1$ is 40 s, $T_2$ is 50 s and $\theta$ is 5. The number of non-memory antibodies is 250, the affinity is calculated by Eq. (1), and the radius of the detector is 10 bit.
The decision table has 41 conditional attributes, some of which are unimportant. If all the attributes are considered in detection, the length of the antibodies is 92 bit, and the time complexity and space complexity increase as well. So it is necessary to delete the unimportant attributes.
The reduction algorithm based on RS is used to process the antibody information decision table S1 and to get the simplest decision table. After reduction, there are 10 attributes, which are: {protocol type, service, flag, serror rate, srv serror rate, srv rerror rate, same srv rate, srv diff host rate, dst host diff srv rate, dst host same src port rate}. The decision rules before and after reduction are given in Table 3. After reduction, the length of antibodies $L_1$ is 28 bit.
Table 3 The number of decision rules
Type | Before reduction | After reduction
Self | 20 000 | 324
Non-self | 20 000 | 785
In Table 3, after reduction by RS, the number of antibody decision rules obviously decreases. In accordance with the importance of the rules, all the decision rules whose decision attribute value is 0, and 315 rules, are sorted by matching number from high to low.
With RSA, we obtain 324 memory antibodies, while the number is 1 035 in Ref. [6] and 1 235 in Ref. [8]. At the same time, the length of a memory antibody is 28 bit, but in Ref. [8] the length is 128 bit. In Sect. 3.3, the analysis of the algorithm's time complexity shows that, with RS, the time complexity of detection decreases. In this section, the conclusion is that the method of generating the memory antibody can be used to remove the redundancy, and make
the length of memory antibody much shorter, which will increase the detection rate.
4.3 The abnormal detection results
The dataset S2 is divided into five groups, and each group of data is run 10 times. The values of the true positive detection rate (TP) and the false positive detection rate (FP) are averaged and used to evaluate the algorithm above. The results are shown in Table 4.
Table 4 The values of TP and FP
Algorithm | Set | TP/(%) | FP/(%)
IIDAI | 1 | 97.65 | 1.5
IIDAI | 2 | 96.85 | 1.2
IIDAI | 3 | 97.26 | 1.4
IIDAI | 4 | 95.87 | 0.95
IIDAI | 5 | 97.70 | 0.91
IIDAI | Average | 97.07 | 1.19
IIDAI | Variance | 0.00564 | 0.00069
DynamiCS in Ref. [6] | Average | 93.16 | 1.38
DynamiCS in Ref. [6] | Variance | 0.1815 | 0.00748
IIDV in Ref. [8] | Average | 96.93 | 1.36
IIDV in Ref. [8] | Variance | 0.0289 | 0.0592
From Table 4, it can be concluded that the average TP of IIDAI is 3.91% higher than that of DynamiCS [6] and 0.34% higher than that of IIDV, and that the average FP is 0.19% below that of DynamiCS [6] and 0.17% below that of IIDV [8]. The stability of the algorithms is evaluated with the variances, so it is obvious that IIDAI is more stable than the others.
between IIDAI with IIDV obtains the result that except U2L, all the average TP of IIDAI is higher than IIDV, and the average FP is lower than IIDV. The conclusion is that the detection performance of IIDAI is better.
4.5 Convergence rate
In Sect. 3.2, a method of generating memory antibodies is described, and an integrated way is used to extract the vaccinations. The two methods are applied to improve the convergence speed of detection. In this section, simulations are run for different numbers of iterations to get the TP, which is used to observe the convergence speed.
The experimental results of IIDAI, DynamiCS [6] and IIDV [8] after different numbers of iterations are shown in Fig. 2. IIDAI achieves a higher detection rate in a shorter time, namely its convergence speed is faster than that of the other two algorithms. It is also important that when the number of iterations is 0, the TP already reaches 72.3%, because effective memory antibodies are obtained with IIDAI. It is concluded that IIDAI's stability is better than that of DynamiCS [6] and IIDV [8].
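As a quick arithmetic check, the IIDAI averages in Table 4 and the differences to the other algorithms can be recomputed from the tabulated values. The sketch below is only illustrative; the variable names are hypothetical, and the variances are not recomputed because the scaling used for them in Table 4 is not stated.

```python
from statistics import mean

# Per-set IIDAI results from Table 4 (percent).
tp = [97.65, 96.85, 97.26, 95.87, 97.70]
fp = [1.5, 1.2, 1.4, 0.95, 0.91]

tp_avg = round(mean(tp), 2)
fp_avg = round(mean(fp), 2)

# Averages reported in Table 4 for the two reference algorithms.
dynamics_tp, dynamics_fp = 93.16, 1.38
iidv_tp, iidv_fp = 96.93, 1.36

tp_gain_vs_dynamics = round(tp_avg - dynamics_tp, 2)
fp_drop_vs_dynamics = round(dynamics_fp - fp_avg, 2)
tp_gain_vs_iidv = round(tp_avg - iidv_tp, 2)
fp_drop_vs_iidv = round(iidv_fp - fp_avg, 2)

print(tp_avg, fp_avg)
print(tp_gain_vs_dynamics, fp_drop_vs_dynamics)
print(tp_gain_vs_iidv, fp_drop_vs_iidv)
```

The recomputed averages agree with the Average row of Table 4, and the differences to DynamiCS match the 3.91 and 0.19 points quoted in the text. Applying the same subtraction to the IIDV averages in Table 4 gives 0.14 points for TP and 0.17 points for FP.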
Five kinds of attacks are obtained from the simulation environment, and each kind of attack is tested. The tests show that, with IIDAI, the detection rate reached 96.2% in the simulated environment. In summary, the IIDAI method is feasible and performs well in the simulation environment.
5 Conclusions
In this article, the authors proposed a new IIDAI model. First, the RS approach is introduced in order to increase the ID speed. Meanwhile, an overall comparison is made among IIDAI, DynamiCS [6] and IIDV [8]. Analysis shows that the time complexity decreases; however, sacrificing certain space complexity is needed in order to improve the detection speed. Second, to improve the detection performance, such as a high TP rate and a low FP rate, the significance degree of antibody gene is defined. Then a vaccination strategy based on the significance degree of gene is given. Furthermore, simulation on KDD99 shows that the TP rate and FP rate can achieve 97.07% and 1.19%, respectively. Third, in order to increase the detection convergence speed, the improved method is used for selecting excellent memory antibodies to generate immature antibodies, which is different from selecting randomly from immature memory antibodies. Simulations also show that the detection rate can reach 96.2% in real simulation environment. In summary, our proposed IIDAI method is feasible and has better performance. In the future, we will focus on applying the IIDAI method to the cloud computing environment.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (61161140320).
References
1. Shingo M, Chen C, Lu N N, et al. An intrusion-detection model based on fuzzy class-association-rule mining using genetic programming network. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 2011, 41(1): 130-139
2. Li L, Zhao K N. A new intrusion detection system based on rough set theory and fuzzy support vector machine. Proceedings of the 3rd International Workshop on Intelligent Systems and Applications (ISA'11), May 28-29, 2011, Wuhan, China. Piscataway, NJ, USA: IEEE, 2011: 5p
3. Afaneh S, Zitar R A, Al-Hamami A. Virus detection using clonal selection algorithm with genetic algorithm (VDC algorithm). Applied Soft Computing, 2013, 13(1): 239-246
4. Denning D E. An intrusion detection model. IEEE Transactions on Software Engineering, 1987, 13(2): 222-232
5. Hofmeyr S A, Forrest S. Architecture for an artificial immune system. Evolutionary Computation, 2000, 8(4): 443-473
6. Kim J, Bentley P J. Towards an artificial immune system for network intrusion detection: an investigation of dynamic clonal selection. Proceedings of the 2002 Congress on Evolutionary Computation (CEC'02): Vol 2, May 12-17, 2002, Honolulu, HI, USA. Piscataway, NJ, USA: IEEE, 2002: 1015-1020
7. Li T. Dynamic detection for computer virus based on immune system. Science in China Series F: Information Sciences, 2008, 51(10): 1475-1486
8. Yan X H. An artificial immune-based intrusion detection model using vaccination strategy. Acta Electronica Sinica, 2009, 37(4): 780-785 (in Chinese)
9. Chen Y B, Feng C, Zhang Q, et al. Integrated artificial immune system for intrusion detection. Journal on Communications, 2012, 33(2): 125-131 (in Chinese)
10. Sperotto A, Mandjes M, Sadre R, et al. Autonomic parameter tuning of anomaly-based IDSs: an SSH case study. IEEE Transactions on Network and Service Management, 2012, 9(2): 128-141
11. Pawlak Z, Grzymala-Busse J, Slowinski R, et al. Rough sets. Communications of the ACM, 1995, 38(11): 88-95
12. Qian J, Miao D Q, Zhang Z H. Knowledge reduction algorithms in cloud computing. Chinese Journal of Computers, 2011, 34(12): 2332-2343 (in Chinese)
13. Zhang L, Bai Z Y, Luo S S, et al. Integrated intrusion detection model based on rough set and artificial immune. Journal on Communications, 2013, 34(9): 166-176 (in Chinese)
(Editor: WANG Xu-ying)