
2016 10th International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing

The Approach to Detect Abnormal Access Behavior Based on Naive Bayes Algorithm
Yong Ma
Information Security Evaluation Center, Civil Aviation University, Tianjin, China
heimajushi@sina.com

Shuang Liang, Xuan Chen, Chunfu Jia
College of Computer and Control Engineering, Nankai University, Tianjin, China
liangshuang@mail.nankai.edu.cn, chenxuan@mail.nankai.edu.cn, cfjia@nankai.edu.cn

Abstract: Fraud targeting air tickets has become increasingly rampant. Lawbreakers obtain passenger information through illegal channels, which not only causes financial loss to many passengers but also damages the reputation and business of airlines. Passenger information can be leaked in many ways; in this paper we focus on detecting the abnormal access behavior of hackers who intrude into the frequent passenger system to obtain passenger information. We analyze in depth how passengers access the frequent passenger system, and then use the Naive Bayes algorithm to classify passenger behavior and decide whether an account is abnormal. Through experiment and evaluation, we demonstrate that our approach is effective in identifying illegal logins by hackers and helps improve the security of passenger information.

Keywords: abnormal access behavior detection; Naive Bayes algorithm; information security

I. ALGORITHM INTRODUCTION

A. Bayesian algorithm
Naive Bayes is used in this paper to identify abnormal access behavior. The Bayesian algorithm is a statistical classification method that classifies samples on the basis of probability and statistics.

1) The Naive Bayes classification is described as follows:
a) Assume a data sample X with n characteristic attributes, that is, X = {x1, x2, ..., xn}.
b) C is the set of categories of all data samples. The samples belong to m categories, that is, C = {c1, c2, ..., cm}.
c) Given a new, unclassified data sample X, if P(ci|X) > P(cj|X) for all j ≠ i, then Naive Bayes assigns X to category ci.
d) By Bayes' theorem, P(ci|X) = P(X|ci)P(ci) / P(X).
e) P(X) is the same constant for all categories, so maximizing the posterior P(ci|X) (MAP) is equivalent to maximizing the product P(X|ci)P(ci) (MPP).
f) Because the characteristic attributes are assumed to be mutually independent given the category, P(X|ci)P(ci) = P(x1|ci)P(x2|ci)…P(xn|ci)P(ci) = P(ci) ∏ P(xj|ci), j = 1, 2, ..., n.

2) Naive Bayes classification is used in this paper mainly for the following reasons:
a) The correlation between the characteristic attributes of passenger access behavior is quite limited, so the attributes can be assumed to be mutually independent, which is the precondition for using Naive Bayes.
b) Naive Bayes is built on the statistics of the probability distribution, so its classification efficiency is relatively stable. It is especially accurate when the characteristic attributes are discrete.
c) Naive Bayes is relatively simple to implement, requires only a few characteristic attributes for classification, and is less sensitive to missing data.

B. The Classification Process of Passenger Access Behavior
1) Data preprocessing stage
This stage analyzes the data obtained from the log of the passenger access system and determines the characteristic attributes of passenger access behavior [1]. The training samples are labeled according to information reported by users and information detected automatically by the system, forming the training sample set of the passenger access system. Both the extraction of the characteristic attributes of access behavior and the accuracy of the labels in the training set directly affect the quality of the classifier [2].

2) Generate a classifier
In this phase the classification algorithm is chosen and the training sample set is used to generate the classifier. This paper uses the Bayesian algorithm, so the classifier is generated by calculating, from the training sample set, the probability of each category and the conditional probability of each characteristic attribute given the category.
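The decision rule described in Section A (items c to f) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `naive_bayes_decide`, the two discretized attributes, and all conditional probabilities are hypothetical. Only the class priors are taken from the paper, as the class proportions of the training set in Table I (13200 and 1800 out of 15000).

```python
from math import prod


def naive_bayes_decide(priors, likelihoods, sample):
    """Return the category maximizing P(c) * prod_j P(x_j | c), plus all scores.

    priors:      {category: P(c)}
    likelihoods: {category: [ {value: P(x_j = value | c)} for each attribute j ]}
    sample:      tuple of attribute values (x_1, ..., x_n)
    """
    scores = {
        c: prior * prod(likelihoods[c][j][x] for j, x in enumerate(sample))
        for c, prior in priors.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Priors from the class proportions in Table I; everything else is made up.
priors = {"normal": 13200 / 15000, "abnormal": 1800 / 15000}
likelihoods = {
    "normal": [{"low": 0.9, "high": 0.1}, {"low": 0.8, "high": 0.2}],
    "abnormal": [{"low": 0.3, "high": 0.7}, {"low": 0.4, "high": 0.6}],
}

best, scores = naive_bayes_decide(priors, likelihoods, ("high", "high"))
print(best, scores)
```

P(X) is never computed, because it is the same for every category and does not change which score is largest.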

978-1-5090-0984-8/16 $31.00 © 2016 IEEE
DOI 10.1109/IMIS.2016.83
3) Using the classifier to classify
In this phase the classifier is applied to the test sample set: the unclassified set of access attributes is given as input, the classifier generated by Naive Bayes classifies it, and the result is output. The flow is shown in Fig. 1.

Fig. 1. Classification flow.

II. DATA PREPROCESSING AND MODELING

A. The data source
The data in this paper come from the mobile passenger system of an airline company. There are 20000 data samples in total. The original sample properties are as follows:

Equipment model | Login IP | Login time | Logout time | Access contents information

Of these, 15000 samples form the training sample set, as shown in Table I, and the other 5000 form the testing sample set, as shown in Table II. The abnormal access accounts are identified from user reports and the airline's daily test results [3].

TABLE I. TRAINING SAMPLE SET
Count of normal accounts | Count of abnormal accounts | Total
13200 | 1800 | 15000

TABLE II. TESTING SAMPLE SET
Count of normal accounts | Count of abnormal accounts | Total
4580 | 420 | 5000

B. Data preprocessing
Analysis shows that the characteristic attributes of access behavior have to be extracted and classified from the original sample set, so data preprocessing is necessary before Naive Bayes can be used for classification.

1) The extraction of characteristic attributes
Through the analysis, the following characteristic attributes of access behavior are defined in this paper:
a) Definition 1: access terminal changing frequency = the number of access terminal types / passenger registration time.
b) Definition 2: access address changing frequency = the number of different login IPs / passenger registration time.
c) Definition 3: the number of login errors is the number of times a passenger enters the wrong password when accessing the passenger system.
d) Definition 4: the range of accessing is the set of contents a passenger accesses after logging in to the system, including the credits function, the ticket booking function, and the back meal service function.
e) Definition 5: the frequency of accessing = the number of accesses / passenger registration time.

C. The Naive Bayes classification modeling
1) The data sample: X = {access terminal changing frequency, access address changing frequency, the number of login errors, the range of accessing, the frequency of accessing}.
2) Classification properties: C = {normal access, abnormal access}.
3) Classification.
If P(xj|ci) ≠ 0 for all j = 1, 2, 3, 4, 5 and i = 1, 2, then
C = argmax over ci of P(ci) ∏ P(xj|ci).
If P(xj|ci) = 0 for some j in {1, 2, 3, 4, 5} and i in {1, 2}, one is added to the count of every characteristic attribute value, the probability distribution is recalculated, and then
C = argmax over ci of P(ci) ∏ P(xj|ci).

Because the characteristic attributes of system access behavior are assumed to be independent, the score is a product of probabilities, so a single characteristic probability of 0 makes the whole product 0 and greatly reduces the accuracy of the classifier. Laplace calibration is introduced into the calculation to avoid this situation [4]. This paper involves only normal accounts and abnormal accounts, whose characteristic attributes are quite distinct, so a probability of 0 does not occur during the implementation of the algorithm. However, an attribute value may still have a low probability, and when that happens the estimated classification probability has to be corrected.

Naive Bayes classification is used in this paper to classify passenger access behavior in the passenger system. The abnormal access behavior is analyzed statistically for test sample sets of 1000, 2000, 3000, 4000 and 5000 samples.

TABLE III. TESTING RESULT
Number of test samples | 1000 | 2000 | 3000 | 4000 | 5000
Number of abnormal access accounts | 43 | 72 | 89 | 102 | 120
Number of correctly identified abnormal access accounts | 37 | 62 | 73 | 84 | 97
Number of normal accesses identified as abnormal access | 8 | 10 | 12 | 23 | 23
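Definitions 1 to 5 can be computed from raw log records roughly as in the sketch below. The paper does not give the log schema, so everything named here is an assumption: the `AccessRecord` type, its fields (in particular `wrong_passwords`, which is not among the listed sample properties), the `extract_features` function, and the choice of days as the unit of passenger registration time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRecord:
    """One login session of one account (hypothetical schema)."""
    device_model: str
    login_ip: str
    contents: frozenset          # functions visited, e.g. {"credits", "booking"}
    wrong_passwords: int = 0     # assumed field: failed password attempts


def extract_features(records, registration_days):
    """Compute the five characteristic attributes for one account."""
    if registration_days <= 0:
        raise ValueError("registration_days must be positive")
    terminals = {r.device_model for r in records}
    addresses = {r.login_ip for r in records}
    visited = set()
    for r in records:
        visited |= r.contents
    return {
        "terminal_change_freq": len(terminals) / registration_days,  # Definition 1
        "address_change_freq": len(addresses) / registration_days,   # Definition 2
        "login_errors": sum(r.wrong_passwords for r in records),     # Definition 3
        "access_range": tuple(sorted(visited)),                      # Definition 4
        "access_freq": len(records) / registration_days,             # Definition 5
    }


records = [
    AccessRecord("phone-A", "10.0.0.1", frozenset({"credits"})),
    AccessRecord("phone-A", "10.0.0.2", frozenset({"booking"}), wrong_passwords=2),
    AccessRecord("phone-B", "10.0.0.3", frozenset({"credits", "booking"}), wrong_passwords=1),
    AccessRecord("phone-A", "10.0.0.1", frozenset({"credits"})),
]
features = extract_features(records, registration_days=100)
print(features)
```

The continuous attributes would still have to be discretized (for example into low and high bins) before being passed to a classifier over discrete values; the paper does not state how this was done.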

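The add-one (Laplace) correction used in the modeling step can be illustrated with a small trainer over discrete attribute values. This is a minimal sketch, not the authors' implementation: the function names `train` and `classify`, the `alpha` parameter, and the toy data are assumptions. Sums of logarithms replace the product of probabilities to avoid numerical underflow, which does not change the selected category.

```python
from collections import Counter, defaultdict
from math import log


def train(samples, labels, alpha=1.0):
    """Estimate log P(c) and Laplace-smoothed log P(x_j = v | c)."""
    n_features = len(samples[0])
    values = [sorted({s[j] for s in samples}) for j in range(n_features)]
    class_counts = Counter(labels)
    counts = defaultdict(Counter)            # (category, j) -> value counts
    for s, y in zip(samples, labels):
        for j, v in enumerate(s):
            counts[(y, j)][v] += 1
    model = {"log_prior": {}, "log_cond": {}}
    for c, n_c in class_counts.items():
        model["log_prior"][c] = log(n_c / len(labels))
        for j in range(n_features):
            # Add alpha to every value count so that no estimate is zero.
            denom = n_c + alpha * len(values[j])
            model["log_cond"][(c, j)] = {
                v: log((counts[(c, j)][v] + alpha) / denom) for v in values[j]
            }
    return model


def classify(model, sample):
    """Return the category with the largest log P(c) + sum_j log P(x_j | c).

    Assumes every attribute value in `sample` was seen during training.
    """
    def score(c):
        return model["log_prior"][c] + sum(
            model["log_cond"][(c, j)][v] for j, v in enumerate(sample)
        )
    return max(model["log_prior"], key=score)


# Toy data: two discretized attributes per account (hypothetical).
samples = [("low", "low"), ("low", "low"), ("low", "high"),
           ("high", "high"), ("high", "high")]
labels = ["normal", "normal", "normal", "abnormal", "abnormal"]
model = train(samples, labels)
print(classify(model, ("high", "high")), classify(model, ("low", "low")))
```

In the toy data no abnormal account has the value "low" for the first attribute. Without smoothing, that conditional probability would be 0 and would zero out the whole product; with add-one smoothing it becomes (0 + 1) / (2 + 2) = 0.25.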
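The counts in the testing result table (Table III) can be converted into the two rates the paper uses for evaluation, namely the rate of correctly identified abnormal access accounts (Definition 6) and the rate of misinformation of abnormal access (Definition 7). The sketch below only restates those two formulas; the function and variable names are assumptions.

```python
# Counts copied from Table III (testing result).
test_sizes = [1000, 2000, 3000, 4000, 5000]
abnormal = [43, 72, 89, 102, 120]          # abnormal access accounts
identified = [37, 62, 73, 84, 97]          # correctly identified abnormal accounts
false_alarms = [8, 10, 12, 23, 23]         # normal access flagged as abnormal


def recognition_rate(correct, total_abnormal):
    """Definition 6: correctly identified abnormal / all abnormal."""
    return correct / total_abnormal


def misinformation_rate(correct, wrong):
    """Definition 7: false alarms / all accounts flagged as abnormal."""
    return wrong / (correct + wrong)


recognition = [recognition_rate(c, a) for c, a in zip(identified, abnormal)]
misinformation = [misinformation_rate(c, w) for c, w in zip(identified, false_alarms)]

for n, r, m in zip(test_sizes, recognition, misinformation):
    print(f"{n:5d} samples: recognition {r:.1%}, misinformation {m:.1%}")
```

Note that Definition 7 divides by the number of accounts flagged as abnormal, not by the number of normal accounts, so it is the complement of precision, not the false positive rate in the usual sense.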
To analyze the classification results further, this paper introduces the following definitions:
a) Definition 6: the rate of correctly identified abnormal access accounts = the number of correctly identified abnormal access accounts / the number of abnormal access accounts.
b) Definition 7: the rate of misinformation of abnormal access = the number of normal accesses identified as abnormal access / (the number of correctly identified abnormal access accounts + the number of normal accesses identified as abnormal access).

The difference in identification effect between Naive Bayes classification and the decision mechanism based on incorrect logins alone is shown in Fig. 2.

Fig. 2. Recognition results (series: recognition rate of Bayes algorithm, false positive rate of Bayes algorithm, recognition rate of error logging approach, false positive rate of error logging approach).

From the above we can see that the recognition rate of the Bayesian classification algorithm is generally higher than that of the method based on login errors, and that the Bayesian classification algorithm is more stable. The accuracy of the method based on login errors fluctuates. Analyzing the corresponding subset of the test sample, we found that it contains more samples with the login error attribute, which explains the pattern shown in the figure.

The false positive rate of the login error identification method is lower than that of the Bayesian classification algorithm because the decision condition of the login error method is clear-cut. However, as the figure shows, the false positive rates of the two methods are very close and partly overlap. In general, the difference between the two methods is small.

The analysis shows that the Bayesian classification algorithm is effective in identifying abnormal passenger access behavior. Compared with the original detection method in the frequent passenger system, the Bayesian classification algorithm has the following advantages:
1) The classification results of the Bayesian classification algorithm are more stable and fluctuate less.
2) The classification results of the Bayesian classification algorithm are clearly better. The correct recognition rate is more than 80%, which is much higher than that of the original detection method in the frequent passenger system.
3) Testing on the test samples shows that the false positive rates of the Bayesian classification algorithm and of the original detection method in the frequent passenger system are very close.

However, classification with the Bayesian classification algorithm has some shortcomings that need to be improved:
1) Naive Bayes classification relies heavily on the accuracy of the training sample set [5]. In practice we cannot find all abnormal access accounts, so the training sample set needs to be further verified and organized.
2) To make sure that the feature attributes are independent of each other, we chose only 5 feature attributes for the experiment. The classification result may be affected because there are so few feature attributes.

The experimental results demonstrate that Naive Bayes classification clearly improves identification in the frequent passenger system. It contributes to improving the security of frequent passenger accounts, and it is also significant for improving airline service quality and security. In general, Naive Bayes classification can achieve the results desired by the airline.

ACKNOWLEDGMENT
We would like to thank the anonymous reviewers of this paper for their helpful feedback and our friend Zheli Liu for his insightful comments and suggestions. We greatly appreciate the help they gave us. This work is supported by the Civil Airport Information Security Technology Research under Grant MHRD20150233.

REFERENCES
[1] Cooley S. Systems and methods for detecting abnormal behavior of networked devices: US8973133[P]. 2015.
[2] Heo Y J, Sohn S G, Kim B K, et al. System and method for detecting abnormal behavior of control system: US20150341380[P]. 2015.
[3] Kim T, Kim H. A system for detection of abnormal behavior in BYOD based on web usage patterns[C]// Information and Communication Technology Convergence (ICTC), 2015 International Conference on. IEEE, 2015.
[4] Lai Y, Zhang W, Yang Z. Software Abnormal Behavior Detection Based on Function Semantic Tree[J]. IEICE Transactions on Information & Systems, 2015, E98.D(10):1777-1787.
[5] Ni S Y, Lan Y T, Lin I T C, et al. Abnormal behavior detection system and method using automatic classification of multiple features: US8885929[P]. 2014.
