Ma 2016
Abstract: Fraud targeting air tickets has recently become increasingly rampant. Lawbreakers obtain passenger information through illegal channels, which not only causes many passengers financial loss but also damages the reputation and business of airlines. Passenger information can be leaked in many ways; in this paper, we focus on detecting abnormal access behavior in which a hacker intrudes into the frequent passenger system to obtain passenger information. We analyze in depth how passengers access the frequent passenger system, and then use the Naive Bayes algorithm to classify passenger access behavior and decide whether an account is abnormal. Through experiment and evaluation, we demonstrate that our approach is effective at identifying illegal logins by hackers and helps considerably in improving the security of passenger information.

Keywords: abnormal access behavior detection; Naive Bayes algorithm; information security

ALGORITHM INTRODUCTION

A. Bayesian algorithm
Naive Bayes is used in this paper to identify abnormal access behavior. The Bayesian algorithm is a statistical classification method; it classifies based on probability and statistics.

1) The Naive Bayes classification is described as follows:
a) Assume a data sample $X$ with $n$ characteristic attributes, that is, $X = \{x_1, x_2, \ldots, x_n\}$.
b) $C$ is the set of categories of all the data samples; the samples belong to $m$ categories, that is, $C = \{c_1, c_2, \ldots, c_m\}$.
f) Because the characteristic attributes are assumed to be mutually independent within each category, $P(X \mid c_i)P(c_i) = P(x_1 \mid c_i)P(x_2 \mid c_i) \cdots P(x_n \mid c_i)P(c_i) = P(c_i)\prod_{j=1}^{n} P(x_j \mid c_i)$.

2) Naive Bayes classification is used in this paper mainly for the following reasons:
a) The correlation between the characteristic attributes of passenger access behavior is quite limited, so the attributes can be assumed to be mutually independent, which is the precondition for using Naive Bayes.
b) Naive Bayes works from statistically estimated probability distributions, so its classification performance is relatively stable; it is especially accurate when the characteristic attributes are discrete.
c) Naive Bayes is relatively simple to implement, requires few characteristic attributes for classification, and is less sensitive to missing data.

B. The Classification Process of Passenger Access Behavior
1) Data preprocessing stage
This stage analyzes the data obtained from the logs of the passenger access system and determines the characteristic attributes of passenger access behavior [1]. The training samples are labeled according to information reported by users and information detected automatically by the system, forming the training set for the passenger access system. Both the extraction of the characteristic attributes of access behavior and the accuracy of the training-set labels directly affect the quality of the classifier [2].
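The product rule in item f) of Section A can be illustrated with a small classifier for discrete attributes. This is a minimal sketch, not the paper's implementation: the two attributes (`login_failures`, `new_ip`), their values, and the toy training set are hypothetical, and Laplace (add-one) smoothing is an assumption added here to avoid zero probabilities. The score for each category is computed in log space as log P(ci) + Σ log P(xj|ci), and the category with the highest score is chosen.

```python
import math
from collections import Counter, defaultdict


def train(samples, labels):
    """Count how often each category and each (category, attribute, value) occurs."""
    class_counts = Counter(labels)
    # value_counts[c][j][v] = number of category-c samples whose j-th attribute equals v
    value_counts = defaultdict(lambda: defaultdict(Counter))
    values_seen = defaultdict(set)  # distinct values observed for attribute j
    for x, c in zip(samples, labels):
        for j, v in enumerate(x):
            value_counts[c][j][v] += 1
            values_seen[j].add(v)
    return class_counts, value_counts, values_seen


def classify(x, model):
    """Return the category c_i that maximizes P(c_i) * prod_j P(x_j | c_i)."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)  # log P(c_i)
        for j, v in enumerate(x):
            # Laplace smoothing: an assumption of this sketch, not stated in the paper
            numerator = value_counts[c][j][v] + 1
            denominator = n_c + len(values_seen[j])
            score += math.log(numerator / denominator)  # log P(x_j | c_i)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Hypothetical training set: (login_failures, new_ip) -> label
samples = [("few", "no"), ("few", "no"), ("few", "yes"),
           ("many", "yes"), ("many", "yes"), ("many", "no")]
labels = ["normal", "normal", "normal", "abnormal", "abnormal", "abnormal"]
model = train(samples, labels)

print(classify(("many", "yes"), model))
print(classify(("few", "no"), model))
```

Working in log space avoids numerical underflow when many attribute probabilities are multiplied; it does not change which category wins.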
[Figure 2. Plotted series: recognition rate of the Bayes algorithm; false positive rate of the Bayes algorithm; recognition rate of the error logging approach; false positive rate of the error logging approach.]
Fig. 2. Recognition results

To analyze the classification results further, this paper uses the following definitions:
a) Definition 6: the rate of correctly identified abnormal access accounts = the number of correctly identified abnormal access accounts / the number of abnormal access accounts.
b) Definition 7: the rate of false positives (misinformation) of abnormal access = the number of normal accesses identified as abnormal access / (the number of correctly identified abnormal access accounts + the number of normal accesses identified as abnormal access).

Fig. 2 shows the difference in identification performance between Naive Bayes classification and the incorrect-login decision mechanism alone.

Fig. 2 shows that the recognition rate of the Bayesian classification algorithm is generally higher than that of the method based on error logging, and that the Bayesian classification algorithm is more stable. The accuracy of the method based on error logging fluctuates. Analyzing the corresponding subset of the test samples, we found that it contains more samples with the error-logging characteristic attribute, which explains the behavior shown in the figure.

The false positive rate of the login-error identification method is lower than that of the Bayesian classification algorithm because the decision condition of the login-error method is clear-cut. However, as Fig. 2 shows, the false positive rates of the two methods are very close and partly overlap, so overall the difference between them is small.

This analysis shows that the Bayesian classification algorithm is effective at identifying abnormal passenger access behavior. Compared with the original detection method in the frequent passenger system, the Bayesian classification algorithm has the following advantages:
1) Its classification results are more stable and fluctuate less.
2) Its improvement in classification is clear: the correct recognition rate is more than 80%, much higher than that of the original detection method in the frequent passenger system.
3) On the test samples, the false positive rates of the Bayesian classification algorithm and the original detection method in the frequent passenger system are very close.

However, classification with the Bayesian algorithm has some shortcomings that still need to be addressed:
1) Naive Bayes classification relies heavily on the accuracy of the training sample set [5]. In practice we cannot find all abnormal access accounts, so the training sample set needs further verification and organization.
2) To make sure that the feature attributes are independent of each other, we chose only 5 feature attributes for the experiment. The classification result may be affected because so few feature attributes are used.

Experimental results demonstrate that Naive Bayes classification clearly improves identification in the frequent passenger system. It contributes to improving the security of frequent passenger accounts and is also significant for improving airline service quality and security. In general, Naive Bayes classification can achieve the results the airline desires.

ACKNOWLEDGMENT
We would like to thank the anonymous reviewers of this paper for their helpful feedback and our friend Zheli Liu for his insightful comments and suggestions. We greatly appreciate their help. This work is supported by the Civil Airport Information Security Technology Research under Grant MHRD20150233.

REFERENCES
[1] Cooley S. Systems and methods for detecting abnormal behavior of networked devices: US8973133[P]. 2015.
[2] Heo Y J, Sohn S G, Kim B K, et al. System and method for detecting abnormal behavior of control system: US20150341380[P]. 2015.
[3] Kim T, Kim H. A system for detection of abnormal behavior in BYOD based on web usage patterns[C]// Information and Communication Technology Convergence (ICTC), 2015 International Conference on. IEEE, 2015.
[4] Lai Y, Zhang W, Yang Z. Software Abnormal Behavior Detection Based on Function Semantic Tree[J]. IEICE Transactions on Information & Systems, 2015, E98.D(10):1777-1787.
[5] Ni S Y, Lan Y T, Lin I T C, et al. Abnormal behavior detection system and method using automatic classification of multiple features: US8885929[P]. 2014.
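As a supplementary illustration, the two rates given as Definition 6 and Definition 7 in the evaluation can be computed from true and predicted labels as sketched here. The label strings, function names, and the toy data are hypothetical and are not results from the paper. Note that Definition 7, as stated in the paper, divides by the number of accounts flagged as abnormal (correct detections plus false alarms), not by the number of normal accounts.

```python
def recognition_rate(true_labels, predicted_labels):
    """Definition 6: correctly identified abnormal accounts / all abnormal accounts."""
    abnormal_total = sum(1 for t in true_labels if t == "abnormal")
    hits = sum(1 for t, p in zip(true_labels, predicted_labels)
               if t == "abnormal" and p == "abnormal")
    return hits / abnormal_total if abnormal_total else 0.0


def false_positive_rate(true_labels, predicted_labels):
    """Definition 7: normal flagged as abnormal / (correct detections + normal flagged as abnormal)."""
    hits = sum(1 for t, p in zip(true_labels, predicted_labels)
               if t == "abnormal" and p == "abnormal")
    false_alarms = sum(1 for t, p in zip(true_labels, predicted_labels)
                       if t == "normal" and p == "abnormal")
    flagged = hits + false_alarms
    return false_alarms / flagged if flagged else 0.0


# Hypothetical test set: 4 abnormal and 4 normal accounts
true_labels = ["abnormal"] * 4 + ["normal"] * 4
predicted_labels = ["abnormal", "abnormal", "abnormal", "normal",
                    "abnormal", "normal", "normal", "normal"]

print(recognition_rate(true_labels, predicted_labels))     # 3 of 4 abnormal accounts found
print(false_positive_rate(true_labels, predicted_labels))  # 1 false alarm among 4 flagged
```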