
Cluster Computing for Neural Network based Anomaly Detection

Srinivasan N.
Department of Information Technology
Madras Institute of Technology, Anna University
Chennai 600044, INDIA
ns@annauniv.edu

Vaidehi V.
Department of Electronics Engineering
Madras Institute of Technology, Anna University
Chennai 600044, INDIA
vaidehi@annauniv.edu

Abstract - Network intrusion-detection systems are now being identified as a mandatory component in a multilayered security architecture. Intrusion detection systems have traditionally been based on the characterization of a user and tracking of the user's activity to see if it matches that characterization. Artificial neural networks provide a feasible approach to model complex engineering systems such as intrusion detection. Applications of artificial neural networks to characterize the behavior of users have been well studied in the recent past, without considering the enormous time they take to get modeled. In this paper we present an implementation of a parallel version of the Back Propagation training algorithm for feed-forward neural networks that are used for detecting intruders, based on the MPI standard on Linux PC clusters. The experiments show a considerable increase in speedup during training and testing of the neural network, which in turn increases the speed of detecting intruders.

I. INTRODUCTION

Intrusion detection systems (IDS) have become an integral component of the protection perimeter of traditional computing systems. One definition, put forward by Amoroso [1], is: "Intrusion detection is the process of identifying and responding to malicious activity targeted at computing and networking resources". Of the many possible approaches to intrusion detection, one that has received considerable attention is anomaly detection. According to Kumar [2], "Anomaly detection attempts to quantify the usual or acceptable behavior and flags other irregular behavior as potentially intrusive". Under this definition, the scope of anomaly detection encompasses not only violations by an outsider but also anomalies arising from violations on the part of an authorized user. It is important to note that anomaly detection omits the class of security policy violations that occur within the bounds of normal behavior for a system. Detecting anomalous behavior can be viewed as a binary-valued classification problem in which measurements of system activity such as system log files, resource usage, command traces, and audit trails are used to produce a classification of the state of the system as normal or abnormal.

In a distributed environment, each workstation in the monitored domain is equipped with a host monitor that sends the monitored data to the server. The system used to detect intruders uses an artificial neural network trained to identify users based on what commands they use during a particular session. Typical intrusion detection systems based on neural networks expect the system administrator to run the neural network based intrusion detection system at the end of each day to see if the users' sessions match their normal pattern. If not, an investigation would be launched. This model has its own shortcoming: by the time the administrator monitors and identifies an intruder, the attack would have been successfully completed. The model explained in this paper is near real time; it has a cluster at the server's end to facilitate online training of the neural network and speedy processing of detection. This model is implemented in a UNIX environment and consists of keeping logs of the commands executed, forming command histograms for each user, and learning the users' profiles from these histograms. This provides an elegant solution to both online and offline monitoring with these user profiles.

For detecting intrusion in this type of environment, the amount of records to be handled by the server from all the workstations increases with the number of users; a neural network based approach becomes too intricate, and the computational complexity of training the neural network rises to a great extent. This in turn increases the processing delays and thus defers the detection of the intruder. Hence a cluster is built at the server to handle the tasks of training and testing the neural network. Buyya [3] defines a cluster as "A type of parallel or distributed processing system, which consists of a collection of interconnected stand-alone computers working together as a single, integrated computing resource".

The rest of the paper discusses the organization of the cluster based anomaly detection system: section 2 introduces the ways and means by which anomaly detection can be done using a neural network, section 3 describes how parallelization can be achieved in a neural network, section 4 includes a brief note on the Linux cluster that was established and the LAM MPI parallel libraries used, and section 5 presents our results in terms of classification accuracy and computational speedups attained through implementation.

II. ANOMALY DETECTION AND NEURAL NETWORK

Anomaly detection relies on models of the intended behavior of users and applications and interprets deviations from this 'normal' behavior as evidence of malicious activity

1-4244-0000-7/05/$20.00 ©2005 IEEE.

Authorized licensed use limited to: Florida Institute of Technology. Downloaded on October 28,2020 at 00:39:20 UTC from IEEE Xplore. Restrictions apply.
[5,6,7,8]. This approach is complementary with respect to misuse detection, where a number of attack descriptions (usually in the form of signatures) are matched against the stream of audited events, looking for evidence that one of the modeled attacks is occurring [9,10,11]. Since anomaly detection techniques signal all anomalies as intrusions, false alarms are expected when anomalies are caused by behavioral irregularities instead of intrusions [2]. Unusual but legitimate use may sometimes be considered anomalous. The challenge is to develop a model of legitimate behavior that would accept novel legitimate use.

A basic assumption underlying anomaly detection is that attack patterns differ from normal behavior. In addition, anomaly detection assumes that this 'difference' can be expressed quantitatively. Under these assumptions, many techniques have been proposed to analyze different data streams, such as data mining for network traffic [13], statistical analysis for audit records [14], and sequence analysis for operating system calls [15].

But can a computer act as smart as a human being? An artificial neural network (ANN) gives a computer the ability of learning and thinking. The application of the ANN method to anomaly detection has proved to be particularly advantageous if the measured behavior is not connected exactly to the characteristics of the user. The optimum structure of a neural network is determined by a trial and error method. The back propagation neural network (BPN) is the most popular technology behind detecting intruders. The back propagation method is part of the parallel distributed processing system [17]. One-layer networks like the Hopfield and Kohonen structures, and multi-layer systems like counter propagation and back propagation of errors, can be used for detecting intruders. Back propagation is a learning algorithm for modifying a feed-forward neural network which minimises a continuous "error function" or "objective function". It is a "gradient descent" method of training in that it uses gradient information to modify the network weights to decrease the value of the error function on subsequent tests of the inputs. Other gradient-based methods from numerical analysis can be used to train networks more efficiently [18].

III. PARALLELIZATION

Parallelism is one of the underlying principles of artificial neural networks [20-26]. Several parallel schemes have been proposed in the literature. So far, some of them have been implemented in neural hardware - in comparatively inexpensive neural boards. Another approach is to exploit the inherent parallelism and to map neural networks on special purpose hardware or to simulate them on general-purpose parallel computers. We focus on the data parallel simulation of ANNs. An ANN consists of a number of very simple units, which are connected with each other via adjustable links. In feed-forward networks, units are grouped into layers with continuous activation functions. We present a set of input-target (desired output) examples to the network representing the functional relationship to be learned. Supervised learning is an optimization problem: the task is to minimize a cost function or error measure, i.e. the "difference" between the actual output and the desired output. During training, the parameters of the ANN are adapted in an iterative process.

The two types of parallelism utilized by the parallel version of Backpropagation are as follows:

Pattern Parallelism: Most of the speed-up is due to data parallelism. Since the weight changes do not occur until after the sweep is over, there is no data dependency between the operations performed for different patterns in the sweep. Consequently, these computations can all be done simultaneously. Therefore, we can simulate more than one network at a time and train each one with a different input pattern, in parallel. These networks all have the same initial random weights and, ideally, only one input pattern to learn. Each network calculates updates for its weights based on the input pattern and the desired output pattern it is assigned to. This is done for all the networks at the same time. After this step, the weight changes are accumulated from all the networks and are applied simultaneously, based on the total weight changes.

Network Parallelism: This parallelism is due to the parallel features of the architecture of the multistage neural network. The computations performed in the neurons of the same stage can all be performed at the same time. Since there are no connections between the neurons of the same stage, no communication overhead is necessary.

[Fig. 1. Pattern distribution strategy]

It is known that neural network training can be efficiently implemented on parallel computers. Several methods have been proposed for the implementation of backpropagation training on distributed and parallel architectures. In [26] each neuron is mapped on a different Processing Element (PE) of a parallel architecture; in [23] each PE handles several neurons. Other solutions distribute the training patterns, the connections, or both on different PEs [24]. There are also many examples of special hardware
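The pattern-parallel scheme above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: a single-layer logistic unit stands in for the full feed-forward network, the function names (`delta_for_pattern`, `train_sweep`) are made up, and sequential loops stand in for the parallel network replicas.

```python
import math

def forward(w, x):
    # Weighted sum followed by a logistic activation.
    s = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-s))

def delta_for_pattern(w, x, target, lr=0.5):
    # Gradient-descent weight change for ONE pattern; the weights are
    # not modified here, mirroring the "no update until the sweep is
    # over" rule that removes data dependencies between patterns.
    y = forward(w, x)
    err = (target - y) * y * (1.0 - y)   # logistic derivative
    return [lr * err * xi for xi in x]

def train_sweep(w, patterns):
    # Each (x, t) pair could be handled by a separate network replica
    # holding the same weights; here the "replicas" run sequentially.
    deltas = [delta_for_pattern(w, x, t) for x, t in patterns]
    # Accumulate all replicas' changes and apply them at once.
    total = [sum(d[i] for d in deltas) for i in range(len(w))]
    return [wi + ti for wi, ti in zip(w, total)]

# Toy task: learn OR on a 2-input unit with a bias input.
patterns = [([0, 0, 1], 0), ([0, 1, 1], 1), ([1, 0, 1], 1), ([1, 1, 1], 1)]
w = [0.0, 0.0, 0.0]
for _ in range(2000):
    w = train_sweep(w, patterns)
```

Because the per-pattern deltas are independent, distributing the `delta_for_pattern` calls across cluster nodes changes only where the loop runs, not the result of the sweep.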

[22,25]. On the other hand, there are few approaches that exploit loosely-coupled distributed systems like workstation clusters. Our work was to choose a scheme for general-purpose parallel computers with the availability of distributed low-level software, MPI, which hides the intrinsic differences between non-homogeneous architectures for a wide range of parallel computers.

For complex multidimensional problems, the number of training points is usually large while the number of model parameters (weights) remains comparatively small. Thus obtaining the gradient is typically the most computationally expensive stage of the iterative step of the gradient optimizer.

To tackle the problem, the following parallelization scheme can be adopted. The training points are distributed among several nodes of the cluster. Each node has a local copy of all involved neural networks. One of the nodes is considered the master. Once the current version of the weights is received from the master, each node can independently (i.e. in parallel with the other nodes) compute the sum of the derivatives of the error along the corresponding subset of collocation points. The final gradient is accumulated on the master.

[Fig 2. Problem decomposition]

IV. LINUX CLUSTER SETUP

A cluster of 5 personal computers working with the Linux (RedHat Linux 2.6) operating system was established to carry out the implementation. Connectivity between the computers was achieved via a 3Com switch and the Ethernet protocol. Message Passing Interface (MPI) is a paradigm that provides the facility to develop parallel and portable algorithms. An MPI program consists of autonomous processes, executing their own code, in an MIMD style, as described in [28]. The codes executed by each process need not be identical. The processes communicate via calls to MPI communication primitives. Typically, each process executes in its own address space, although shared-memory implementations of MPI are possible. LAM stands for Local Area Multicomputer and is an implementation of the MPI standard. It is a parallel processing environment and development system, described in [29], for a network of independent computers. It features the MPI programming standard for developing parallel programs. The LAM MPI parallel libraries were installed on each computer in the cluster to implement parallel constructs based on MPI. One of these computers was designated as the master. The master monitors the overall execution of the application program. The rest of the computers were designated as slave nodes. (Henceforth a slave node will simply be referred to as a node.) Basically, a setup consisting of a master-slave environment with 4 slave nodes was established.

An audit collection mechanism is executed at every workstation where the user is working. The anomaly detection module can either be processed at every workstation, or the raw audit obtained at every workstation can be sent as such to the server, where a cluster of computers is commissioned for further processing. Audited activity is described by a blend of intrusion detection variables that correspond to the measures recorded in the user profiles. As each audit record arrives, the corresponding user profile is retrieved from the knowledge base and compared with the recorded intrusion detection variables. If the reference defined by the vector of intrusion detection variables is sufficiently far from the position defined by the expected values, with respect to historical co-variances for the variables stored in the profiles, then the record is considered anomalous.

[Fig. 3. Organization of a Cluster based Anomaly detection]

If neural networks show sparsely occupied weight matrices, each available processor can simulate a cluster of neighboring neurons. Here the partitioning algorithm should be combined with a heuristic to minimize the communication costs, because these are usually high in a parallel system with distributed memory. As the speedup is strongly dependent on the topology of the neural network,
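The master-accumulated gradient scheme can be sketched serially as follows. This is an illustration, not the paper's code: the linear model and the names `gradient_subset` and `accumulate` are made up, and the loop over node subsets stands in for the MPI processes (on the cluster, `accumulate` corresponds to what an MPI_SUM reduction onto the master would do).

```python
def gradient_subset(w, subset):
    # Sum of d(error)/d(w) over one node's share of the training
    # points, for a linear model y = w0*x + w1 with squared error.
    g = [0.0, 0.0]
    for x, t in subset:
        y = w[0] * x + w[1]
        g[0] += 2 * (y - t) * x
        g[1] += 2 * (y - t)
    return g

def accumulate(partials):
    # Master-side reduction: element-wise sum of the nodes' partial
    # gradients, giving the same result as a full-batch gradient.
    return [sum(p[i] for p in partials) for i in range(len(partials[0]))]

points = [(x, 3 * x + 1) for x in range(8)]                   # targets from y = 3x + 1
nodes = [points[0:2], points[2:4], points[4:6], points[6:8]]  # 4 slave nodes

w = [0.0, 0.0]
for _ in range(2000):
    partials = [gradient_subset(w, s) for s in nodes]  # in parallel on the cluster
    grad = accumulate(partials)
    w = [wi - 0.005 * gi for wi, gi in zip(w, grad)]   # master updates, re-broadcasts
```

The key property is that the sum of the per-subset gradients equals the gradient over all points, so distributing the points changes only where the work is done.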

this partitioning method seems to be reserved for special cases.

The entire procedure involved in detecting an intruder is as shown below:

Procedure:

Step A: Audit collection.
    Track system calls from user commands
    sys_call_table points to sys_execve
    sys_call_table is made to point to the user defined my_sys_execve
    When the user executes a command, my_sys_execve captures it and stores it in the /proc filesystem
    Periodically, a daemon flushes our audit data from the /proc filesystem
    Store the flushed data in the respective user's file
Step B: Audit Preprocessing
    Segregate the commands according to the userid with reference to a pre-defined command vector
    "Command - Frequency" details of each user are calculated
    Normalization is done (ratio of the frequency of each command to the total frequency)
    A pattern of "No. of commands x 1" for each user is formed
Step C: Test or Train
    If new network
        <Perform Neural Algorithm>
    else
        Test neural network (detect for intruder)
        (Feed forward the data pattern and compare the output with the threshold.)
        If value < threshold
            alarm as intruder
        else
            <Perform Neural Algorithm>
Step D: END

Neural Algorithm:

Step 1: Initialize system variables
Step 2: If new network, initialize the network in the master node of the cluster; else read weights from the permanent storage media
Step 3: Broadcast initial weights to all the nodes of the cluster
Step 4: Broadcast the number of epochs to all the nodes of the cluster
Step 5: Train the network
    a) Get the number of patterns
    b) for each epoch
        for each pattern do
            Get the user pattern (command vector)
            for each hidden and output neuron, sum and apply the activation function
            for each hidden and output neuron, calculate the error
            Gather the calculated errors from all the nodes
            Master node synchronizes the error vector from all the nodes
            Master node broadcasts the synchronized error to all the nodes
            Update the weights based on the broadcasted error
Step 6: Write the trained network to the permanent storage media

V. EXPERIMENTATION AND RESULTS

The system was tested at the Visual Programming laboratory of the Department of Information Technology, which serves as a nodal center for over 200 students. Audit trails were collected and dispatched to the system for detection. The results are as shown below.

Fig. 4.1 shows that plot 2, on a cluster based server, gives a better performance than plot 1, on a stand-alone server, when training the neural networks for detecting intruders. It could be observed that even a considerable increase in the number of hidden neurons results in a meager increase in the time required to process them. Fig. 4.2 (a) and Fig. 4.2 (b) are concerned with adjustments in the neural network that is modeled on a cluster. Fig. 4.2 (a) shows that the time consumed for processing is less on the cluster (plot 2) than on the stand-alone system (plot 1), even though the performance of both versions looks alike. A considerable drop in the time could clearly be noticed upon increasing the size of the training set, according to Fig. 4.2 (b). An evaluation made by increasing the number of slave nodes on the cluster is shown in Fig. 4.3.

[Fig. 4.1. Performance of Cluster vs. Stand-alone]
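Steps B and C of the detection procedure can be sketched as follows. This is a minimal illustration under assumptions: the command vector, the sessions, and the histogram-overlap score that stands in for the trained network's feed-forward output are all made up for the example.

```python
from collections import Counter

COMMAND_VECTOR = ["ls", "cd", "cat", "gcc", "ssh"]  # pre-defined command vector

def session_pattern(commands):
    # Step B: frequency of each known command, normalized by the
    # total, giving the "No. of commands x 1" ratio pattern.
    counts = Counter(c for c in commands if c in COMMAND_VECTOR)
    total = sum(counts.values()) or 1
    return [counts[c] / total for c in COMMAND_VECTOR]

def detect(pattern, profile, threshold=0.5):
    # Step C (test branch): compare an output value with the
    # threshold; here a simple histogram-overlap score replaces the
    # trained network's feed-forward pass.
    score = sum(min(p, q) for p, q in zip(pattern, profile))
    return "intruder" if score < threshold else "normal"

# Learned profile from a user's past sessions, then two test sessions.
profile = session_pattern(["ls", "ls", "cd", "cat", "ls", "cd"])
normal  = detect(session_pattern(["ls", "cd", "ls", "cat"]), profile)
odd     = detect(session_pattern(["ssh", "ssh", "gcc", "ssh"]), profile)
```

A session whose command mix resembles the stored profile scores high and passes; a session dominated by commands the user never issues scores near zero and raises the alarm.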
[Fig. 4.2. (a) Variation in Hidden neurons]

[Fig. 4.2. (b) Variation in Training set size]

[Fig. 4.3. Scalability on the no. of nodes]

VI. CONCLUSION

The parallel neural network training algorithm for classifying large volumes of training data speeds up the training process without the requirement for any special hardware. The speed-up capability of this technique is evident from the implementation results. Since the data is distributed over all the nodes of the cluster, this method can also bring down the space requirements on a single computer to manageable limits. Thus, the eventual goal of detecting intruders in a near real-time situation and at a faster pace is achieved.
REFERENCES

[1] E. Amoroso, "Intrusion Detection - An Introduction to Internet Surveillance, Correlation, Trace Back, Traps, and Response", AT&T, 7-8, 1999.
[2] S. Kumar, "Classification and Detection of Computer Intrusions", Ph.D. Thesis, Department of Computer Sciences, Purdue University, W. Lafayette, IN, 1995.
[3] R. Buyya et al., "Cluster computing at a glance", in High Performance Cluster Computing: Architectures and Systems.
[4] Xu Ming, Chen Chun, Ying Jing, "Anomaly Detection Based on System Call Classification", Journal of Software, 15(3):391-403, March 2004.
[5] D.E. Denning, "An Intrusion Detection Model", IEEE Transactions on Software Engineering, 13(2):222-232, February 1987.
[6] C. Ko, M. Ruschitzka, and K. Levitt, "Execution Monitoring of Security-Critical Programs in Distributed Systems: A Specification-based Approach", in Proceedings of the 1997 IEEE Symposium on Security and Privacy, 175-187, May 1997.
[7] A.K. Ghosh, J. Wanken, and F. Charron, "Detecting Anomalous and Unknown Intrusions Against Programs", in Proceedings of the Annual Computer Security Applications Conference (ACSAC'98), 259-267, Scottsdale, AZ, December 1998.
[8] T. Lane and C.E. Brodley, "Temporal sequence learning and data reduction for anomaly detection", in Proceedings of the 5th ACM Conference on Computer and Communications Security, 150-158, ACM Press, 1998.
[9] K. Ilgun, R.A. Kemmerer, and P.A. Porras, "State Transition Analysis: A Rule-Based Intrusion Detection System", IEEE Transactions on Software Engineering, 21(3):181-199, March 1995.
[10] V. Paxson, "Bro: A System for Detecting Network Intruders in Real-Time", in Proceedings of the 7th USENIX Security Symposium, San Antonio, TX, January 1998.
[11] U. Lindqvist and P.A. Porras, "Detecting Computer and Network Misuse with the Production-Based Expert System Toolset (P-BEST)", in IEEE Symposium on Security and Privacy, 146-161, Oakland, California, May 1999.
[12] S.N. Chari and P.C. Cheng, "BlueBoX: A policy-driven, host-based intrusion detection system", ACM Transactions on Information and System Security, 6(2):173-200, 2003.
[13] W. Lee, S. Stolfo, and K. Mok, "Mining in a Dataflow Environment: Experience in Network Intrusion Detection", in Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '99), San Diego, CA, August 1999.
[14] H.S. Javitz and A. Valdes, "The SRI IDES Statistical Anomaly Detector", in Proceedings of the IEEE Symposium on Security and Privacy, May 1991.
[15] S. Forrest, "A Sense of Self for UNIX Processes", in Proceedings of the IEEE Symposium on Security and Privacy, 120-128, Oakland, CA, May 1996.
[16] Ll. Belanche, "Heterogeneous neural networks: Theory and applications", PhD Thesis, Department of Languages and Informatic Systems, Polytechnic University of Catalonia, Barcelona, Spain, July 2000.
[17] D.E. Rumelhart and J.L. McClelland, "Parallel Distributed Processing", MIT Press, U.S.A.
[18] ftp://ftp.sas.com/pub/neural/FAQ.html
[19] D. Aberdeen, J. Baxter, and R. Edwards, "92¢/MFlops/s, Ultra-Large-Scale Neural-Network Training on a PIII Cluster", in Proceedings of the IEEE/ACM SC2000 Conference, IEEE Computer Society, November 2000.
[20] D. Anguita, G. Parodi, and R. Zunino, "An Efficient Implementation of Back-Propagation on RISC-based Workstations", Neurocomputing, December 1993.
[21] L.C. Chu and B.W. Wah, "Optimal mapping of neural networks learning on message passing multicomputers", JPDC, No. 14, 1992.
[22] D. Hammerstrom, "A VLSI architecture for high-performance, low-cost, on-chip learning", IJCNN '90, San Diego, USA, 1990.
[23] J.N. Hwang, J.A. Vlontzos, and S.Y. Kung, "A systolic neural-network architecture for hidden Markov models", IEEE Transactions on ASSP, Vol. 37, No. 12, 1989.
[24] D.A. Pomerleau et al., "Neural network simulation at warp speed: how we got 17 million connections per second", in Proc. of IEEE ICNN, San Diego, USA, 1988.
[25] U. Ramacher, J. Beichter, W. Raab, J. Anlauf, N. Bruels, U. Hachmann, and M. Wesseling, "Design of a 1st generation neurocomputer", in VLSI Design of Neural Networks, Kluwer Academic, 271-310, 1991.
[26] C.R. Rosenberg and G. Blelloch, "An implementation of network learning on the Connection Machine", 10th Int. Conf. on AI, Milan, Italy, 1987.
[27] J.J. Valdes, "Time Series Models Discovery with Similarity-Based Neuro-Fuzzy Networks and Evolutionary Algorithms", IEEE World Conference on Computational Intelligence WCCI'2002, Hawaii, USA, 2002.
[28] Message Passing Interface Forum, "MPI: A Message-Passing Interface Standard", Technical Report Version 1.0, University of Tennessee, Knoxville, Tennessee, June 1995.
[29] "MPI Primer / Developing With LAM", Technical Report Version 1.0, Ohio Supercomputer Center, The Ohio State University, November 1996.
