A High Accuracy and Adaptive Anomaly Detection Model With Dual-Domain Graph Convolutional Network

IEEE Transactions on Information Forensics and Security, Vol. 18, 2023
Abstract—Insider threat is destructive and concealable, making it a challenging problem in cybersecurity. Most existing methods transform user behavior into sequential information and analyze it while neglecting the structural information among users, resulting in high false positives. To solve this problem, in this paper we propose the Dual-Domain Graph Convolutional Network (DD-GCN), a graph-based modularized method for high-accuracy and adaptive insider threat detection. The central idea is to convert user features and structural information into heterogeneous graphs in light of various relationships, and to take user behavior and relationships into account together. To this end, a weighted feature similarity mechanism is applied to balance the feature similarity of users against the original linkages among them so as to generate the fused structure. Next, specific graph embeddings are extracted from the original topology structure and the fused structure simultaneously, which converts behavior information into high-level representations. Furthermore, an attention mechanism is applied to learn the adaptive importance weights of the user's features in the corresponding embedding. Combination and difference constraints are proposed to enhance the learned embeddings' commonality and their ability to capture different information. Extensive experiments on two real-world datasets clearly show that our proposed DD-GCN extracts the most correlated information from structural topology and feature information, and achieves improved accuracy with a clear margin.

Index Terms—Insider threat detection, anomaly detection, graph convolutional network.

I. INTRODUCTION

INSIDER threat, as an essential research topic in cybersecurity, has become one of the main factors in numerous cybersecurity incidents and has caused significant losses to organizations in recent years. Compared with outsider threats, insider threats are usually launched by insiders of the organization who already have the authorization to access the information system and are familiar with enterprise internal defense processes [1]. Insider attackers can remove their footprints after putting enterprise privacy or property at risk, which makes them difficult for traditional anomaly detection schemes to discover. Since insider attackers are familiar with the core secrets of the company, the consequences of insider threats are often more severe than those caused by external attackers [2], [3], [4]. This situation has attracted more and more attention from academia. Thus, deploying a cybersecurity approach on information systems to detect insider threats comprehensively is crucial.

A. Motivation

So far, researchers have done numerous works on insider threat detection, and the existing works fall into two categories. 1) The first type of typical methods attempts to extract various insider threat patterns and detect insider threats based on prior knowledge. These techniques focus on establishing users' behavior baselines to classify normal users against insider threats via game theory [5], machine learning [6], etc., analyzing users' behavior based on system logs (e.g., file access, logon and logoff operations, removable device usage), and adding the consideration of role attributes to form a real-time monitoring system in the company. 2) Another group of typical methods concentrates on transforming various user behavior into sequence data (i.e., sequential relationships among log entries), which holds the temporal information of the user. The rapid evolution of deep learning has recently brought a new aspect to sequence processing techniques for insider threat detection. Deep learning, as a subfield of representation learning, has a powerful feature-representing capability: it can learn profound information and perform abstract, accurate representation. Sequence processing techniques of deep learning, including the convolutional neural network (CNN) and recurrent neural network (RNN), are widely applied to learn knowledge from historical behavior and analyze users' future behavior. Essentially, these behavior-to-entry methods simulate normal user behavior and mark a deviation as an anomaly.

Manuscript received 15 March 2022; revised 5 October 2022 and 5 January 2023; accepted 3 February 2023. Date of publication 14 February 2023; date of current version 2 March 2023. This work was supported in part by the National Natural Science Foundation of China under Grant 62202066, Grant 62102040, and Grant 62002028. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Ragnar Thobaben. (Corresponding author: Xiaoyong Li.)
Ximing Li, Xiaoyong Li, Jia Jia, Linghui Li, Jie Yuan, and Yali Gao are with the Key Laboratory of Trustworthy Distributed Computing and Service, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing 100876, China (e-mail: ximingli@bupt.edu.cn; lixiaoyong@bupt.edu.cn; jiajia@bupt.edu.cn; lilinghui@bupt.edu.cn; yuanjie@bupt.edu.cn; gaoyali@bupt.edu.cn).
Shui Yu is with the School of Computer Science, University of Technology Sydney, Ultimo, NSW 2007, Australia (e-mail: shui.yu@uts.edu.au).
Digital Object Identifier 10.1109/TIFS.2023.3245413
1556-6021 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
LI et al.: HIGH ACCURACY AND ADAPTIVE ANOMALY DETECTION MODEL WITH DD-GCN 1639
Although prior works on insider threat detection have made exciting progress by extracting threat patterns and fully using user behavior information, they still have numerous drawbacks. Methods proposed in prior works can effectively block hackers and prevent intruders from attacking to some extent, but they are often costly and occupy considerable system resources. Similarly, much complicated and costly work is required when facing high-dimensional, complex, heterogeneous, and sparse data. Moreover, it is too idealized for existing methods to identify anomalies by comparing them with users' daily normal behavior, based on the assumption that users' daily behavior over time is relatively regular and steady (logical relationships over days or weeks). Relationships can provide valuable and essential information, similar to social networks in our daily life, yet existing approaches ignore such relationships in the insider threat detection task. At the same time, current methods depend on a large amount of labeled data to ensure the model's performance. However, collecting enough labeled user information and malicious activities from an organization is challenging in real-world scenarios due to privacy concerns. Thus, maximizing the usage of circumscribed data is critical. In summary, we need to address the following three problems: 1) How to capture different behavior accurately under high-dimension, complexity, heterogeneity, and sparsity conditions. 2) How to integrate the relationships of structural information (i.e., user interaction) with user behavior features in an insider threat detection model. 3) How to use the limited labeled instances efficiently for insider threat tasks.

B. Our Designs and Contributions

To alleviate the aforementioned drawbacks, we propose a novel graph-based model named Dual-Domain Graph Convolutional Network (DD-GCN) for high-accuracy and adaptive insider threat detection. DD-GCN mainly consists of three specialized modules to extract relationships, accurately capture different behavior, and detect anomalies efficiently under limited labeled data. First, a weighted feature similarity mechanism is applied to construct users' fused structure from the original linkage network among users and to integrate multiple relationships, such as email communication, device, or file transferring operations. Next, users' behaviors are propagated over the fused structure and the original topology graph to extract embeddings in two domains with two specific convolutional modules. Vectorized user behavior and operations based on the embeddings are extracted from the original structural topology and the fused structural information simultaneously. Finally, an attention mechanism is utilized to learn the importance weights of the user's features within the two embeddings in order to propagate them adaptively. In this way, insider threat labels are able to supervise the learning process to adjust the adaptive importance weights of users' features in the two graphs and to extract the most correlated information. Moreover, we design combination and difference constraints to ensure the commonality and disparity effects of the learned embeddings. Meanwhile, the complete framework of the insider threat detection system based on the Dual-Domain GCN algorithm is provided. In summary, we highlight our contributions as follows:

• By leveraging feature similarity and original linkage, a weighted feature similarity mechanism for insider threat detection is applied to balance user behavior and original linkage information, and to translate users' log entries into heterogeneous graphs. The weighted feature similarity mechanism calculates the feature similarity among users and extracts latent structural information, which holds users' specific behavior information and maximizes the usage of labeled samples. As a result, various structural information can be fully quantified and discovered.
• We propose DD-GCN, a graph-based model with two domains and two convolutional operations for high-accuracy and adaptive insider threat detection, which takes into account both structural relationships and feature information. The fused structural information serves as a bonus component for the final result. Furthermore, an attention mechanism is applied to learn the different importance of related nodes so as to capture behavior differences adaptively. The model aggregates information at different scales and achieves improved insider threat detection accuracy.
• Two constraints, i.e., the combination constraint and the difference constraint, are designed for insider threat detection to ensure the consistency and disparity of the learned embeddings, enhancing their commonality and their ability to capture information at different scales.
• Our evaluation of the proposed model on public insider threat datasets illustrates that it achieves an improved detection accuracy of 98.65% and outperforms state-of-the-art techniques by a clear margin.

The rest of the article is organized as follows. In Section II, we review related works on insider threat detection and the application of the Graph Convolutional Network (GCN). In Section III, we elaborate the Dual-Domain GCN model and the insider threat detection framework based on it. The performance evaluation results are reported in Section IV, and we conclude and discuss future work in Section V.

II. RELATED WORK

Insider threat is a deeply investigated problem that causes severe damage to enterprises, and scholars have produced plenty of studies on insider threat detection and prevention. In this section, we review related works on insider threat detection and the application of the Graph Convolutional Network.

A. Insider Threat Detection

Insider threats have received significant attention for a long time as one of the most challenging cyberattacks to counter. One of the most typical methods of existing paradigms [8], [9], [10], [11], [12] extracts and transforms user behavior features, including email communication, network activity, file access, and personal computer usage, into machine learning models to identify malicious sessions and events. Features for enterprise situations also contain logon/logoff information, printer usage, and device connections. The data used for insider threat detection are often heterogeneous and sparse, making feature-
extracting-based approaches time-consuming and complicated, hindering practical deployment.

Another group of typical approaches converts log information into sequential data and then analyzes user behavior. Du et al. [13] design an anomaly detection method based on LSTM named DeepLog, which uses system log keywords and different types of anomalies in log entries. Tuor et al. [14] propose a stacked LSTM structure to capture user actions and utilize user activity log-likelihoods as anomaly scores to identify insider threat sessions. Instead of relying only on the activity type, Yuan et al. [15] utilize file uploading and web browsing activities for insider threat detection and propose a hierarchical neural temporal point process model, which generates an anomaly score based on the differences between test and normal activities in terms of activity types and occurrence times. Shen et al. [16] leverage an RNN to predict a user's upcoming action and identify anomalous users: if no significant disparities are detected between the prediction findings and the user's activities, the user is identified as a regular user; otherwise, malicious activities are identified. Hu et al. [17] propose a CNN-based user authentication approach to detect insider threats by evaluating mouse bio-behavioral traits. The proposed approach utilizes a picture to show how users move their pointing devices; if ID theft is committed, the user's pointer actions will differ from those of the authorized user. Although sequence processing models improve insider threat detection significantly, they always neglect the interaction information among users, which can provide indispensable information for insider threat identification.

Recently, graph data structures, which are able to represent the interactions among users effectively, have been widely deployed. Oka et al. [18] convert user-system interaction into a bipartite graph and use an unsupervised learning framework to evaluate whether a potential insider threat is triggered after an incident shows a clear correlation with significant anomalies. Jiang et al. [19] adopt a GCN model for insider threat detection. It is reasonable to utilize a graph structure to represent the interdependencies among users, since individuals in an organization frequently contact one another via email. The proposed model also uses profound user profiles as the feature vectors of nodes. Liu et al. [20] propose a heterogeneous graph embedding model, named Log2vec, to encode activity linkages. Log2vec first creates a heterogeneous graph from audit data by representing varied activities as nodes and profound relationships between activities as edges. Then, Log2vec distinguishes malicious and benign activity into different groups and identifies insider threats by applying a clustering algorithm to the node embeddings. The graph data-based methods above are built on normal user behavior patterns and compare or predict them against new behavior to identify anomalies. Different from the ideas of [19] and [20], our proposed DD-GCN transforms user feature behavior and relationships into graphs and uses domain-specific embeddings to extract latent information. Beyond the relevant user behavior transformation patterns, DD-GCN exploits two domains to explore users' latent information and applies an attention mechanism to learn the corresponding importance weights of user behavior under different domains adaptively.

In addition to the typical detection models mentioned above, tracking systems are applied to monitor and analyze activities of the system for insider threat detection [21], [22], [23]. Our proposed DD-GCN differs from these tracking systems in two ways. First, most tracking systems aim at attack forensics, not cyber threat detection. Second, these tracking systems usually apply causal graphs to track the flow of operations, processes, and interactions, such as IPC syscalls operating on pipes. Our proposed DD-GCN analyzes the log information that records users' behavior in an information system and captures multiple relationships among log entries that reflect user features, such as removable device connection history or website browsing contents.

B. Graph Convolutional Network

With the development of representation learning and graph embedding techniques, Graph Convolutional Networks have attracted much attention and are widely studied [24], [25], [26]. Bruna et al. [27] first apply the graph Laplacian to graph convolution in the Fourier domain. Defferrard et al. [28] approximate the graph convolution with a Chebyshev expansion, improving efficiency. Kipf et al. [7] simplify the convolution operation and recommend that only single-hop neighbor node features be aggregated. GraphSAGE [29] aggregates node features from the local neighbors with mean, max, or LSTM pooling. The Graph Attention Network [30] highlights the nodes that should be paid attention to by learning attention weights over node features. Pei et al. [31] design Geom-GCN, which applies structural similarity to handle long-range dependencies in disassortative graphs. Xu et al. [32] propose the Graph Wavelet Neural Network (GWNN), which modifies the GCN model by replacing the eigenvectors with wavelet bases to improve efficiency. Monti et al. [33] propose MoNet, which provides a unified generalization of graph convolutional structures in the spatial domain. Most current GCN variants mainly concentrate on propagating node features over the topology to learn graph embeddings for detection or classification tasks; they aim to design the aggregation function of neighbor nodes for message-passing patterns.

Conversely, some studies question and analyze the propagation mechanism of GCN. Li et al. [34] show that GCN actually carries out Laplacian smoothing on node attributes, so that the embeddings of nodes across the network gradually converge. Nt et al. [35] and Wu et al. [36] show that when feature information is propagated through the network topology, the topology acts as a low-frequency filter on node attributes, and using only low-frequency information under different conditions is limiting. Based on this, Bo et al. [37] propose the Frequency Adaptation Graph Convolutional Network (FAGCN) based on a self-gating mechanism, whose core idea is an adaptive convolution over frequency information; they also analyze the roles of low-frequency and high-frequency information in learning node feature representations. Also, Wu et al. [38] provide an elaborate review of GCNs. For now, it remains an open question whether GCN can adaptively extract the relevant information from node features and topology structure for classification or detection tasks.
Fig. 1. The architecture of Dual-Domain GCN. The input matrices are the adjacency matrix A and the feature matrix X. The adjacency matrix A represents the linkage among nodes; the node feature matrix X is used to construct the feature domain. DD-GCN consists of two convolution modules, for the topology domain and the feature domain, and the attention mechanism. The adjacency matrix and feature matrix for the feature domain are generated by Equation (1).
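As a concrete illustration of the fused-structure step the caption refers to, the following NumPy sketch blends cosine feature similarity with the original links using a balance parameter ω and binarizes the result at 0.5, mirroring the per-pair update in Algorithm 1. All names are illustrative; this is a sketch of the mechanism, not the authors' implementation:

```python
import numpy as np

def fused_adjacency(A, X, omega=0.5):
    """Blend cosine feature similarity with original links, then binarize."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.clip(norms, 1e-12, None)      # row-normalize features
    S = Xn @ Xn.T                             # cosine similarity S_ij
    AF = omega * S + (1.0 - omega) * A        # weighted blend of both sources
    return (AF > 0.5).astype(float)           # threshold as in Algorithm 1

# toy example: nodes 0 and 1 share identical features but have no original link
X = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])
A = np.zeros((3, 3))
AF = fused_adjacency(A, X, omega=0.6)
```

With ω = 0.6, the similar pair (0, 1) gains a fused edge even though the original graph had none, which is how latent structure enters the feature domain.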
Algorithm 1 Calculate Adjacency Matrix A_F for the Feature Domain
Input: Adjacency matrix: A; input feature: X; feature vector: x_i, ∀i ∈ V; balance parameter: ω
Output: Adjacency matrix for the feature domain: A_F
1: Initialize A_F with zeros;
2: Initialize ω;
3: for node i, j ∈ V do
4:   S_{i,j} ← cos(x_i, x_j) = (x_i · x_j) / (|x_i||x_j|)
5:   A_F(i, j) ← ω × S_{i,j} + (1 − ω) × A_{ij}
6:   if A_F(i, j) > 0.5 then
7:     A_F(i, j) ← 1
8:   end if
9: end for
Return: A_F

and node j in the feature domain is established. The generation of A_F in the feature domain is summarized in Algorithm 1.

After the fused adjacency matrix in the feature domain A_F is generated, the input graph in the feature domain is G_f = (A_F, X). Subsequently, the l-th layer feature embedding Z_F^l can be represented as:

Z_F^l = σ( D̃_F^{-1/2} Ã_F D̃_F^{-1/2} Z_F^{l-1} W_F^l ),   (4)

where Z_F^0 = X, which is similar to Kipf's graph convolution kernel [7]. σ is the activation function, and we choose ReLU because of its low computational complexity, non-exponential operation, and ease of optimization. D̃_F is the diagonal degree matrix of Ã_F, where Ã_F = A_F + I_F and I_F is the identity matrix. W_F^l is the weight matrix of the l-th GCN layer in the feature domain. The last embedding in the feature domain is denoted as Z_F. In this way, we obtain the node embedding in the feature domain that captures the specific fused information.

2) Topology Domain Convolution Module: For the topology domain, the input of the convolution module is the original adjacency matrix and feature matrix, i.e., the graph G_t = (A_T, X), where A_T = A. The l-th layer topology embedding Z_T^l is calculated in the same way as in the feature domain:

Z_T^l = σ( D̃_T^{-1/2} Ã_T D̃_T^{-1/2} Z_T^{l-1} W_T^l ),   (5)

where σ is the activation function, and ReLU is chosen due to its low computational complexity and non-exponential operation. D̃_T is the diagonal degree matrix of Ã_T; specifically, Ã_T = A_T + I_T, where I_T is the identity matrix. W_T^l is the weight matrix of the l-th GCN layer in the topology domain. The last embedding in the topology domain is denoted as Z_T.

3) Attention Mechanism: Since we have two specific embeddings Z_T and Z_F, different node features contribute differently to the detection result. Thus, an attention mechanism is applied to learn the corresponding adaptive importance weights. We first concatenate the two embeddings to get Z_cat ∈ R^{n'×d×d'}, where n' is the number of nodes in the two embeddings, i.e., n' = 2n, d is the number of features applied in this section, and d' is the dimension of the feature information. For node i in the embedding Z_cat, z_i ∈ R^{d×d'} is node i's embedding vector in Z_cat. The embedding of node i is propagated through a nonlinear transformation, and a shared attention vector q ∈ R^{h×1} is applied to calculate the attention values ξ^i as follows:

ξ^i = q^T σ( W (z_i)^T + b ),   (6)

where σ is the activation function, for which we choose tanh in the attention value calculation, W is the weight matrix, and b is the bias vector. After that, the attention values are normalized with the softmax function to get the final importance weights:

α^{ij} = softmax( ξ^{ij} ) = exp( ξ^{ij} ) / Σ_{j=1}^{d} exp( ξ^{ij} ).   (7)

The values of α^i reflect the importance of node i within all node features in these two domains for the corresponding
embedding. Each embedding Z^i is calculated as follows:

Z^i = Σ_j α^{ij} · z^{ij}.   (8)

The whole process of final embedding generation is summarized in Algorithm 2.

Algorithm 2 Node Embedding Generation of Dual-Domain GCN
Input: Graph: G = (V, E); input feature: X; weight matrices: W^k, ∀k ∈ {1, …, K}; non-linearity: σ; balance parameter: ω; feature vector: x_i, ∀i ∈ V; shared attention vector: q; bias: b
Output: Final embedding Z
1: Build adjacency matrix: A ← (V, E)
2: Feature domain fused structure: A_F(i, j)
3: Feature domain graph generation: G_F ← (A_F, X)
4: Degree matrix of the feature domain: D̃_F
5: Z_F^0 ← X
6: Embedding: Z_F^l ← σ( D̃_F^{-1/2} Ã_F D̃_F^{-1/2} Z_F^{l-1} W_F^l )
7: Adjacency matrix for the topology domain: A_T ← A
8: Topology domain graph generation: G_T ← (A_T, X)
9: Degree matrix of the topology domain: D̃_T
10: Z_T^0 ← X
11: Embedding: Z_T^l ← σ( D̃_T^{-1/2} Ã_T D̃_T^{-1/2} Z_T^{l-1} W_T^l )
12: Concatenate the two embeddings: Z_cat ← Z_T ∥ Z_F
13: for node i ∈ V do
14:   Node i's embedding in Z_cat: z_i
15:   Attention value: ξ^i ← q^T · tanh( W (z_i)^T + b )
16:   Adaptive importance weight: α^{ij} ← softmax( ξ^{ij} )
17:   Z^i ← Σ_j α^{ij} · z^{ij}
18: end for
Return: Z

4) Loss Function: In this section, we design two constraints to enhance the effectiveness of our proposed model. Since the information in these two domains has a combination effect, i.e., the node label is related to both domains, the combination constraint is designed to improve this effect through a combination embedding calculated by two graph convolutional operations with shared parameters. The difference constraint is intended to improve the disparity effect between the two domain embeddings and the corresponding combination embedding. The generation of these two constraints is shown in Fig. 2. The loss function consists of the two constraints and an optimization function.

a) Combination constraint: Generally speaking, the node classification results may be related to the information in both the feature and topology domains. The combination constraint is designed to enforce the commonality effect between these two domain embeddings and implies that the node label is related to both domains. Thus the combination embedding is learned with shared parameters W_C^l of the two domain convolutional operations as in Eqs. (4) and (5). We utilize L2-normalization to normalize the different domain embeddings, denoted as Z_Tnor and Z_Fnor. These two normalized matrices are then used to calculate the similarity matrices S_T and S_F:

S_T = Z_Tnor · Z_Tnor^T,   (9)
S_F = Z_Fnor · Z_Fnor^T.   (10)

The combination constraint is calculated by:

L_comb = ∥ S_T − S_F ∥_2,   (11)

where ∥·∥_2 denotes the L2 norm.

b) Difference constraint: The feature and topology embeddings each have a combination embedding, denoted Z_combT and Z_combF for Z_T and Z_F, respectively; Z_combT and Z_combF are learned from the same graphs as Z_T and Z_F with shared parameters W_C^l. Thus, a difference constraint is proposed to encourage them to learn different information. Inspired by mutual information, the Hilbert-Schmidt Independence Criterion (HSIC) [39], an effective measure of the independence of two variables, is applied to enforce the difference constraint in this work. Unlike mutual information, HSIC does not need to estimate the probability density of the two variables; it transforms this process directly into a sampling form. Technically, the topology embedding Z_T and the combination embedding Z_combT are learned from the same graph, and the HSIC between them is calculated as:

HSIC(Z_T, Z_combT) = (1 / (n − 1)^2) tr( K_T J K_combT J ),   (12)

where J = I − (1/n) 1 1^T, I is the n × n identity matrix, 1 is the all-ones vector, and tr(·) is the trace of a matrix. K_T and K_combT are Gram matrices calculated with an RBF kernel:

K(x_1, x_2) = exp( − ∥ x_1 − x_2 ∥_2^2 / σ ), (σ > 0),   (13)

where K_{T,ij} = K(z_T^i, z_T^j) and K_{combT,ij} = K(z_combT^i, z_combT^j). Here z_T^i is the i-th node's embedding in the topology domain, and z_combT^i stands for the same in the combination embedding. The difference constraint in the feature domain is calculated in the same way as in the topology domain:

HSIC(Z_F, Z_combF) = (1 / (n − 1)^2) tr( K_F J K_combF J ).   (14)

Thus, the difference constraint, denoted as L_diff, is:

L_diff = HSIC(Z_T, Z_combT) + HSIC(Z_F, Z_combF).   (15)

c) Optimization function: The output embedding Z is used for multi-class classification with a linear transformation and a softmax function. The class prediction Ŷ of a node can be calculated as follows:

Ŷ = softmax( W Z + b ),   (16)

where softmax(x_i) = exp(x_i) / Σ_i exp(x_i). In this work, we choose the cross-entropy loss for node classification over the whole training process. Suppose the training set is L; for each l ∈ L, the real label of
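The two regularizers can be sketched in the same spirit: the combination constraint compares similarity matrices of the L2-normalized embeddings (Eqs. (9)-(11)), and the difference constraint is an empirical HSIC with RBF Gram matrices (Eqs. (12)-(15)). A hedged NumPy sketch with illustrative names; note that in the paper each domain embedding is paired with its shared-parameter combination embedding, which we stand in for here with plain embeddings:

```python
import numpy as np

def combination_constraint(Z_T, Z_F):
    """L_comb = || Z_Tnor Z_Tnor^T - Z_Fnor Z_Fnor^T ||  (Eqs. (9)-(11))."""
    def sim(Z):
        Zn = Z / np.linalg.norm(Z, axis=1, keepdims=True)  # L2 row normalization
        return Zn @ Zn.T
    return np.linalg.norm(sim(Z_T) - sim(Z_F))

def hsic(Z1, Z2, sigma=1.0):
    """Empirical HSIC with RBF Gram matrices (Eqs. (12)-(14))."""
    def gram(Z):
        sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)  # pairwise ||x1 - x2||^2
        return np.exp(-sq / sigma)
    n = Z1.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n                      # centering matrix
    return np.trace(gram(Z1) @ J @ gram(Z2) @ J) / (n - 1) ** 2

rng = np.random.default_rng(1)
Z_T, Z_F = rng.normal(size=(5, 3)), rng.normal(size=(5, 3))
L_comb = combination_constraint(Z_T, Z_F)
# in the paper: HSIC(Z_T, Z_combT) + HSIC(Z_F, Z_combF)
L_diff = hsic(Z_T, Z_F) + hsic(Z_F, Z_T)
```

A low HSIC between two embeddings means they carry statistically independent information, which is exactly what the difference constraint rewards.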
Fig. 3. The framework of insider threat detection via Dual-Domain GCN. This framework has four modules: the input data module, the data pre-processing module, the DD-GCN detection module, and the output module. The input data module collects user behavior information. The data pre-processing module extracts the specific features of each user and forms the adjacency matrix A_i and the feature matrix X_i. The DD-GCN detection module applies DD-GCN to generate node embeddings for the insider threat classification task; the architecture of DD-GCN is introduced in Section III-A, the original adjacency matrix A_i is transformed to A_F for the feature domain, and the adjacency matrix for the topology domain is the same as the original adjacency matrix. The output module states the classification result.
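The output module's classification step reduces to Eq. (16): a linear transform, a row-wise softmax, and a cross-entropy loss over the labeled nodes. A minimal NumPy sketch with illustrative names:

```python
import numpy as np

def predict(Z, W, b):
    """Y_hat = softmax of a linear transform of the final embeddings (Eq. (16) style)."""
    logits = Z @ W + b
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def cross_entropy(Y_hat, y):
    """Mean negative log-likelihood of the true classes."""
    return -np.mean(np.log(Y_hat[np.arange(len(y)), y] + 1e-12))

rng = np.random.default_rng(2)
Z = rng.normal(size=(6, 4))                  # final node embeddings from DD-GCN
W, b = rng.normal(size=(4, 2)), np.zeros(2)  # two classes: normal vs. insider threat
Y_hat = predict(Z, W, b)
loss = cross_entropy(Y_hat, np.array([0, 1, 0, 0, 1, 0]))
```

In the full objective this classification loss is combined with the combination and difference constraints described in Section III.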
TABLE III
VARIANTS C OMPARISON
than that in the topology domain on the corresponding dataset. Meanwhile, for the attention distribution in CERT r6.2 shown in Fig. 5(b), the attention difference between the topology and feature domains is more noteworthy than in CERT r4.2. The reason could be that the number of anomalous users in CERT r6.2 is smaller than in CERT r4.2. Thus, user features play a more significant role in insider threat detection.
For the insider threat detection task, the information extracted from the feature domain is critical and contributes to better results. On the same dataset, different domains contribute differently to insider threat detection, and the same domain contributes differently across datasets. Overall, the attention distribution experiments demonstrate that DD-GCN can focus on the more critical information.
c) Attention value: With the same parameters, we test the attention value of each feature on the test sets of both datasets. The results are shown in Fig. 6. The feature IDs represent the same features in both datasets: IDs 1-6 stand for user logon activities (i.e., work-hour logons, after-hour logons, or the total number of logon activities in a given time); IDs 7-15 represent removable-device connection and file-upload activities; IDs 16-20 represent email operations; and IDs 21-26 denote web-browsing operations such as downloading, browsing, or uploading, based on simple NLP preprocessing. In CERT r4.2, anomalous users more often visit malicious hacking-related websites and then download a keylogger and use a thumb drive to transfer it to other machines, or surf job-hunting websites and solicit employment from a competitor. Thus, the attention values of anomalous nodes on web-browsing operations are higher than on other features, which verifies that our attention mechanism can assign different importance weights to different features and focus more on the important ones. In CERT r6.2, malicious users are distinguished more by file and removable-device operations and by hacking-related websites browsed during or after work; accordingly, the attention values of web-browsing and device operations are higher than those of other features. In summary, the evaluation of attention distribution and attention values indicates that our proposed DD-GCN is able to adaptively assign larger importance weights to more critical information and features.
Fig. 6. Analysis of attention values of different features. Together with the insider scenarios in both datasets, the insider-threat-related features get higher attention values than other session-related features, which implies that the attention mechanism assigns higher importance weights to crucial information.
3) Parameter Study: To examine the sensitivity of the parameters of our proposed model, we investigate the balance parameter ω of the feature similarity mechanism in the feature domain and the parameters of the two constraints on both datasets. The results are shown in Fig. 7.
a) Analysis of balance parameter ω of the feature domain: As ω is used to balance the feature similarity in the feature domain against the original linkage in the topology domain, we study the performance of DD-GCN for insider threat detection
1648 IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 18, 2023
with the balance parameter ω ranging from 0.1 to 0.9 on CERT r4.2 and CERT r6.2 in the first column of Fig. 7. The combination constraint coefficient γ and the difference constraint coefficient β are set to 0.001 and 5e-9, respectively. In both figures, the accuracy first increases and then decreases. This may be because when ω is small, the adjacency matrix in the feature domain is similar to that in the topology domain, so the model differs little from GCN, while a large ω may introduce more noisy edges and ignore the original linkage among nodes.
b) Analysis of combination constraint parameter γ: To test the effect of the combination constraint coefficient γ, we run experiments with γ varying from 0 to 10,000. The results on CERT r4.2 and CERT r6.2 are shown in the second column of Fig. 7. The balance parameter ω is set to 0.6, and the difference constraint coefficient is set to 5e-9. With the increase of γ, the accuracy first rises and then drops, probably because more labeled data brings more information to the feature and topology domains. Furthermore, the curves of all label rates display similar trends.
c) Analysis of difference constraint parameter β: After testing the effect of the combination constraint coefficient, we examine the difference constraint coefficient and vary it from 0 to 1e-5. The results on CERT r4.2 and CERT r6.2 are shown in the third column of Fig. 7. Similar to the combination constraint, the accuracy first rises and then drops; the performance drops significantly when β increases to 1e-9. With β between 5e-9 and 1e-10, the result at the 60% label rate is steadier than those at the 30% and 15% label rates. Moreover, even at the 15% label rate, the accuracy still increases, which implies that our proposed DD-GCN can achieve good results at all label rates.
C. Insider Threat Detection Analysis
We compare our proposed Dual-Domain GCN insider threat detection with other state-of-the-art methods, including traditional machine learning-based and deep learning-based methods. The traditional machine learning methods compared in this section include support vector machine (SVM), random forest (RF), logistic regression (LR), and two outstanding machine learning-based anomaly detection methods. The deep learning-based schemes comprise two categories: manually structured skeleton-data approaches and graph-based architectures. We conducted the experiments 10 times and report the average values and standard deviations. These traditional machine learning methods and deep learning paradigms are described below; the parameters applied in this part are γ=0.001, β=5e-8, ω=0.6:
• Lightweight on-line Detector of Anomalies (LODA) [42] is a method that combines a collection of weak histogram-based anomaly detectors into a robust ensemble detector.
• Local Outlier Factor (LOF) [43] is a density-based algorithm whose core is characterizing the local density of data points. LOF reflects the degree of anomaly of a sample mainly by calculating a score.
• Auto-Encoder (AE) [44] is a form of multi-layer neural network that compresses and reproduces data.
• Long Short-Term Memory (LSTM) is widely used to model long sequences and capture long-term dependence.
• Convolutional Neural Network (CNN) is a typical deep learning model for computer vision, which achieves shift, scale, and distortion invariance through local receptive fields, shared weights, and sub-sampling.
• Graph Convolutional Network (GCN) learns node representations by aggregating information from neighbors; it is a variant of the graph neural network and can be applied to semi-supervised multi-class classification tasks.
• Graph Convolutional Network with Feature Augmentation (GCN-FA) [19] is a GCN-based model for insider threat detection with feature augmentation via a weighted feature function.
1) Comparison with Machine Learning-Based Methods: The comparison results are reported in Table IV. Different label rates mean different percentages of labeled nodes per class in the training set. We have the following observations:
• Our proposed DD-GCN achieves the best performance over all label rates compared with the machine learning baselines. In particular, DD-GCN achieves accuracy of around 98.21% and 98.65% on CERT r4.2 and r6.2, respectively; the maximum relative improvement is about 13.86% on CERT r4.2 and 14.24% on CERT r6.2. As for the F1 score, DD-GCN reaches the best result, around 93.04%, improving by around 30% over the other traditional machine learning-based methods on both datasets. These results demonstrate the effectiveness of DD-GCN.
• DD-GCN outperforms the traditional machine learning methods at all label rates, because DD-GCN extracts not only user behavior feature information but also the relationship information between the topology structure and user features.
This indicates that sophisticated features and the relationships in structural information are fundamental for the insider threat detection task, which traditional machine learning-based schemes neglect. Moreover, the number of representative user-behavior features considered in this work is smaller than traditional machine learning-based schemes require, which undermines the performance of the traditional machine learning baselines. The size of the dataset is another concern for the machine learning baselines; for example, LODA could in theory be improved with more training data. In some cases, LOF performs better than other algorithms because the identified anomalies are determined by local features that other algorithms may ignore. However, LOF suffers from long training and prediction times, whereas LODA has lower time complexity, which makes it suitable for time-critical detection tasks.
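The role of a density-based baseline such as LOF can be illustrated with a minimal, self-contained sketch. This uses scikit-learn's `LocalOutlierFactor` on synthetic feature vectors, not the evaluation pipeline of this paper; the data and parameter choices are assumptions for illustration only.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Synthetic stand-in for per-user behavior feature vectors
# (26 features, matching the feature IDs discussed above).
rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(200, 26))
anomalous = rng.normal(4.0, 1.0, size=(5, 26))
X = np.vstack([normal, anomalous])

# LOF scores each sample by comparing its local density with that
# of its k nearest neighbors; fit_predict returns -1 for samples
# flagged as outliers and 1 for inliers.
lof = LocalOutlierFactor(n_neighbors=20, contamination=5 / 205)
labels = lof.fit_predict(X)

print("flagged as outliers:", np.where(labels == -1)[0])
```

Because the five injected points are far from the dense cluster of normal behavior, their local density is much lower than that of their neighbors and they receive the lowest (most negative) outlier factors.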
Fig. 7. Parameter study: the sensitivity of the balance parameter ω, the combination constraint coefficient γ, and the difference constraint coefficient β.
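The fusion that ω controls can be sketched as a weighted combination of a feature-similarity graph and the original topology. The snippet below is a simplified illustration: cosine similarity with a k-nearest-neighbor cutoff is an assumption here, not necessarily the paper's exact weighted feature similarity mechanism.

```python
import numpy as np

def fused_adjacency(X, A_topo, omega=0.6, k=2):
    """Blend a kNN cosine-similarity graph built from node
    features X with the original topology A_topo using omega."""
    # Cosine similarity between user feature vectors.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T
    np.fill_diagonal(S, 0.0)
    # Keep only the k strongest feature-similarity edges per node.
    A_feat = np.zeros_like(S)
    for i in range(S.shape[0]):
        nbrs = np.argsort(S[i])[-k:]
        A_feat[i, nbrs] = 1.0
    A_feat = np.maximum(A_feat, A_feat.T)  # symmetrize
    # omega balances feature similarity against original linkage.
    return omega * A_feat + (1.0 - omega) * A_topo

X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
A_topo = np.array([[0, 1, 0, 0],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [0, 0, 1, 0]], dtype=float)
A = fused_adjacency(X, A_topo, omega=0.6)
print(A)
```

This makes the trend in the first column of Fig. 7 intuitive: a small ω keeps the fused graph close to the original topology (little different from plain GCN), while a large ω lets noisy feature-similarity edges dominate and discards the original linkage.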
TABLE IV
RESULTS (%) COMPARED WITH MACHINE LEARNING-BASED METHODS
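A label rate, i.e., the percentage of labeled nodes per class available for training, can be realized as a per-class mask. The `label_mask` helper below is hypothetical, written only to illustrate the sampling scheme, and is not the authors' released code.

```python
import numpy as np

def label_mask(y, rate, seed=0):
    """Mark `rate` of the nodes in each class as labeled,
    mirroring the per-class label rates used in Table IV."""
    rng = np.random.default_rng(seed)
    mask = np.zeros(len(y), dtype=bool)
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        n_labeled = max(1, int(round(rate * len(idx))))
        mask[rng.choice(idx, size=n_labeled, replace=False)] = True
    return mask

y = np.array([0] * 90 + [1] * 10)   # e.g., 10% anomalous users
mask = label_mask(y, rate=0.3)
print(mask.sum(), "labeled nodes")  # 27 from class 0 + 3 from class 1
```

Sampling per class rather than globally keeps the rare anomalous class represented in the training split even at low label rates.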
Comparing the results in Tables III and IV, we find that even without the two constraints, DD-GCN-w/o still outperforms the other schemes, indicating that our proposed framework is competitive.
2) Comparison with Deep Learning-Based Methods: The results of accuracy and F1-score are summarized in Table V. The first observation is the overwhelming performance of the graph-based architectures compared with the structured skeleton-data ones, such as CNN and LSTM. Even though LSTM can be considered to extract structural information successfully thanks to its data-driven architecture, solid improvements are not evident when modeling irregular skeleton data such as a graph. For better performance of structured skeleton-data schemes on insider threat detection tasks, LSTM can be applied to extract the sequential features of user behavior, which are then processed by a CNN for classification.
The results of DD-GCN and GCN-FA are better than those of GCN, which verifies that the feature domain contributes more significantly to this task. DD-GCN consistently outperforms GCN-FA and GCN at all label rates, which indicates the effectiveness of the adaptive attention mechanism and transformation pattern in DD-GCN. DD-GCN can extract more information than GCN and achieves better accuracy, around 98.21% and 98.65% on the two datasets, respectively; the maximum relative improvement in accuracy is 6.57%. For the F1 score, our approach improves by around 7.36%. At the 60% label rate, DD-GCN achieves higher accuracy than GCN with a similar F1 score.
Comparing GCN-FA and GCN, we find that structural differences exist between the topology and feature domains, and performing the convolutional operation only on the topology domain does not show a better result than on the feature domain.
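For reference, the graph-based baselines build on the standard GCN propagation rule of Kipf and Welling [7], H' = σ(D^{-1/2}(A + I)D^{-1/2} H W). The following is a self-contained NumPy sketch of one such layer, not the DD-GCN implementation; the toy graph and weights are illustrative.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer: symmetrically normalized propagation
    followed by a linear map and ReLU (Kipf & Welling [7])."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)  # ReLU

# Toy graph: 3 nodes in a path, 2 input features, 2 hidden units.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
W = np.eye(2)
print(gcn_layer(A, H, W))
```

Each output row mixes a node's own features with its neighbors', which is exactly why running this propagation over only one graph (the topology) cannot capture structure that exists only in the feature-similarity domain.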
TABLE V
RESULTS (%) COMPARED WITH DEEP LEARNING-BASED METHODS
TABLE VI
COMPUTATIONAL TIME ANALYSIS
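Computational time comparisons such as Table VI are typically collected with a simple wall-clock harness. The sketch below averages repeated runs with `time.perf_counter`; the workload is a placeholder standing in for the actual training and inference routines, which are not reproduced here.

```python
import time

def timed(fn, *args, repeats=10):
    """Run fn several times and report the mean wall-clock
    seconds, mirroring how per-model times can be averaged."""
    t0 = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    return (time.perf_counter() - t0) / repeats

# Placeholder workload standing in for a model's forward pass.
workload = lambda n: sum(i * i for i in range(n))
print(f"mean time: {timed(workload, 100_000):.4f} s")
```

Averaging over repeats smooths out scheduler jitter, which matters when comparing fast methods such as LODA against slower ones such as LOF.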
V. SUMMARY AND FUTURE WORK
In this paper, we propose Dual-Domain GCN (DD-GCN), a novel graph-based model for insider threat detection. The key idea behind DD-GCN is fusing user behavior from structural topology and feature information. A weighted feature similarity mechanism is designed to convert user log entries into heterogeneous graphs by leveraging the relationship information between users and their corresponding activity features. We extract specific graph embeddings from the structural topology and the feature information simultaneously to analyze the graph, transforming behavior information into high-level representations. Meanwhile, an attention mechanism is applied to learn the importance weight of each node's embedding within the two domains so as to fuse them adaptively. The evaluation of DD-GCN on public datasets demonstrates that the DD-GCN based detection model achieves improved detection accuracy and outperforms other state-of-the-art insider threat detection models.
In future work, we plan to apply the DD-GCN based model to other practical applications to further evaluate its feasibility and scalability. We believe that the process of creating anomaly detection models in many other areas (such as network traffic anomaly detection and fraud detection) could be similar to our proposed model, with only a few changes in the feature extraction process. In addition, since the number of labeled attack samples for insider threats is limited, we wonder whether a pre-trained GCN model is capable of tackling this situation. Thus, we will seek to incorporate a pre-training mechanism into our model in future work.
REFERENCES
[1] L. Liu, O. de Vel, Q.-L. Han, J. Zhang, and Y. Xiang, "Detecting and preventing cyber insider threats: A survey," IEEE Commun. Surveys Tuts., vol. 20, no. 2, pp. 1397–1417, 2nd Quart., 2018, doi: 10.1109/COMST.2018.2800740.
[2] U.S. Cybercrime-Survey. (2015). CERT Division of the Software Engineering Institute, Price Waterhouse Cooper. [Online]. Available: http://www.pwc.com/us/en/increasing-it-effectiveness/publications/assets/2015-us-cybercrime-survey.pdf
[3] Verizon 2018 Data Breach Investigations Report. (2018). [Online]. Available: http://www.verizon-enterprise.com/resources/reports/
[4] Dtex Systems. (2018). 2018 Insider Threat Intelligence Report. [Online]. Available: http://www.dtex-systems.com/2018-insider-threat-intelligence-report/
[5] X. Feng, Z. Zheng, D. Cansever, A. Swami, and P. Mohapatra, "Stealthy attacks with insider information: A game theoretic model with asymmetric feedback," in Proc. IEEE Mil. Commun. Conf., Nov. 2016, pp. 277–282.
[6] B. A. Alahmadi, P. A. Legg, and J. R. C. Nurse, "Using Internet activity profiling for insider-threat detection," in Proc. 17th Int. Conf. Enterprise Inf. Syst., 2015, pp. 709–720.
[7] T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," in Proc. ICLR, 2017, pp. 1–14.
[8] Y. Chen, C. M. Poskitt, and J. Sun, "Learning from mutants: Using code mutation to learn and monitor invariants of a cyber-physical system," in Proc. IEEE Symp. Secur. Privacy (SP), May 2018, pp. 648–660, doi: 10.1109/SP.2018.00016.
[9] X. Shu, D. Yao, N. Ramakrishnan, and T. Jaeger, "Long-span program behavior modeling and attack detection," ACM Trans. Privacy Secur., vol. 20, no. 4, pp. 1–28, Nov. 2017, doi: 10.1145/3105761.
[10] P. A. Legg, O. Buckley, and M. Goldsmith, "Caught in the act of an insider attack: Detection and assessment of insider threat," in Proc. IEEE Symp. Technol. Homeland Secur., Apr. 2015, pp. 1–6.
[11] I. Homoliak, F. Toffalini, J. Guarnizo, Y. Elovici, and M. Ochoa, "Insight into insiders and IT: A survey of insider threat taxonomies, analysis, modeling, and countermeasures," ACM Comput. Surveys, vol. 52, no. 2, pp. 1–40, Mar. 2020.
[12] H. A. Kholidy, F. Baiardi, and S. Hariri, "DDSGA: A data-driven semiglobal alignment approach for detecting masquerade attacks," IEEE Trans. Dependable Secure Comput., vol. 12, no. 2, pp. 164–178, Jun. 2015.
[13] M. Du, F. Li, G. Zheng, and V. Srikumar, "DeepLog: Anomaly detection and diagnosis from system logs through deep learning," in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., New York, NY, USA, Oct. 2017, pp. 1285–1298, doi: 10.1145/3133956.3134015.
[14] A. Tuor, S. Kaplan, B. Hutchinson, N. Nichols, and S. Robinson, "Deep learning for unsupervised insider threat detection in structured cybersecurity data streams," 2017, arXiv:1710.00811.
[15] S. Yuan, P. Zheng, X. Wu, and Q. Li, "Insider threat detection via hierarchical neural temporal point processes," in Proc. IEEE Int. Conf. Big Data (Big Data), Dec. 2019, pp. 1343–1350.
[16] Y. Shen, E. Mariconti, P. A. Vervier, and G. Stringhini, "Tiresias: Predicting security events through deep learning," in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., New York, NY, USA, Oct. 2018, pp. 592–605, doi: 10.1145/3243734.3243811.
[17] T. Hu, W. Niu, X. Zhang, X. Liu, J. Lu, and Y. Liu, "An insider threat detection approach based on mouse dynamics and deep learning," Secur. Commun. Netw., vol. 2019, pp. 1–12, Feb. 2019.
[18] M. Oka, Y. Oyama, and K. Kato, "Eigen co-occurrence matrix method for masquerade detection," in Proc. 7th JSSST SIGSYS Workshop Syst. Program. Appl. (SPA). Tsukuba, Japan: Tsukuba Univ., 2004.
[19] J. Jiang et al., "Anomaly detection with graph convolutional networks for insider threat and fraud detection," in Proc. IEEE Mil. Commun. Conf. (MILCOM), Nov. 2019, pp. 109–114.
[20] F. Liu, Y. Wen, D. Zhang, X. Jiang, X. Xing, and D. Meng, "Log2vec: A heterogeneous graph embedding based approach for detecting cyber threats within enterprise," in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., New York, NY, USA, Nov. 2019, pp. 1777–1794, doi: 10.1145/3319535.3363224.
[21] M. N. Hossain et al., "SLEUTH: Real-time attack scenario reconstruction from COTS audit data," in Proc. USENIX Secur. Symp., 2017, pp. 487–504.
[22] S. Ma, J. Zhai, F. Wang, K. H. Lee, X. Zhang, and D. Xu, "MPI: Multiple perspective attack investigation with semantics aware execution partitioning," in Proc. USENIX Secur. Symp., 2017, pp. 1111–1128.
[23] Y. Tang et al., "NodeMerge: Template based efficient data reduction for big-data causality analysis," in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., New York, NY, USA, Oct. 2018, pp. 1324–1337, doi: 10.1145/3243734.3243763.
[24] J. Chen, T. Ma, and C. Xiao, "FastGCN: Fast learning with graph convolutional networks via importance sampling," in Proc. ICLR, 2018, pp. 1–15.
[25] Y. Ma, S. Wang, C. C. Aggarwal, and J. Tang, "Graph convolutional networks with EigenPooling," in Proc. 25th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Jul. 2019, pp. 723–731.
[26] M. Qu, Y. Bengio, and J. Tang, "GMNN: Graph Markov neural networks," in Proc. ICML, 2019, pp. 5241–5250.
[27] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun, "Spectral networks and locally connected networks on graphs," in Proc. ICLR, 2013, pp. 1–14.
[28] M. Defferrard, X. Bresson, and P. Vandergheynst, "Convolutional neural networks on graphs with fast localized spectral filtering," in Proc. NeurIPS, 2016, pp. 3844–3852.
[29] W. Hamilton, Z. Ying, and J. Leskovec, "Inductive representation learning on large graphs," in Proc. NeurIPS, 2017, pp. 1024–1034.
[30] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio, "Graph attention networks," in Proc. ICLR, 2017, pp. 1–12.
[31] H. Pei, B. Wei, K. C. C. Chang, Y. Lei, and B. Yang, "Geom-GCN: Geometric graph convolutional networks," in Proc. ICLR, 2020, pp. 1–12.
[32] B. Xu, H. Shen, Q. Cao, Y. Qiu, and X. Cheng, "Graph wavelet neural network," in Proc. ICLR, 2019, pp. 1–13.
[33] F. Monti, D. Boscaini, J. Masci, E. Rodola, J. Svoboda, and M. M. Bronstein, "Geometric deep learning on graphs and manifolds using mixture model CNNs," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 5425–5434.
[34] Q. Li, Z. Han, and X. Wu, "Deeper insights into graph convolutional networks for semi-supervised learning," in Proc. AAAI, 2018, pp. 3538–3545.
[35] N. T. Hoang and T. Maehara, "Revisiting graph neural networks: All we have is low-pass filters," 2019, arXiv:1905.09550.
[36] F. Wu, T. Zhang, A. H. D. Souza, C. Fifty, T. Yu, and K. Q. Weinberger, "Simplifying graph convolutional networks," in Proc. ICML, 2019, pp. 6861–6871.
[37] D. Bo, X. Wang, C. Shi, and H. Shen, "Beyond low-frequency information in graph convolutional networks," 2021, arXiv:2101.00797.
[38] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. S. Yu, "A comprehensive survey on graph neural networks," 2019, arXiv:1901.00596.
[39] W. D. K. Ma, J. P. Lewis, and W. B. Kleijn, "The HSIC bottleneck: Deep learning without back-propagation," in Proc. AAAI, 2019, pp. 5085–5092.
[40] J. Glasser and B. Lindauer, "Bridging the gap: A pragmatic approach to generating insider threat data," in Proc. IEEE Secur. Privacy Workshops, San Francisco, CA, USA, 2013, pp. 98–104, doi: 10.1109/SPW.2013.37.
[41] A. Paszke et al., "PyTorch: An imperative style, high-performance deep learning library," in Proc. NeurIPS, 2019, pp. 1–12.
[42] T. Pevny, "Loda: Lightweight on-line detector of anomalies," Mach. Learn., vol. 102, no. 2, pp. 275–304, 2016.
[43] M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander, "LOF: Identifying density-based local outliers," in Proc. ACM SIGMOD, May 2000, vol. 29, no. 2, pp. 93–104.
[44] E. Pantelidis, G. Bendiab, S. Shiaeles, and N. Kolokotronis, "Insider threat detection using deep autoencoder and variational autoencoder neural networks," in Proc. IEEE Int. Conf. Cyber Secur. Resilience (CSR), Jul. 2021, pp. 129–134, doi: 10.1109/CSR51186.2021.9527925.
Linghui Li (Member, IEEE) received the Ph.D. degree from the Institute of Computing Technology, University of Chinese Academy of Sciences. She was a Post-Doctoral Researcher with the Beijing University of Posts and Telecommunications (BUPT), where she is currently an Associate Professor of cyberspace security. Her current research interests are information security and artificial intelligence.
Jie Yuan (Member, IEEE) received the Ph.D. degree in cyberspace security from the Beijing University of Posts and Telecommunications. She has published several papers in journals and conference proceedings. Her current research focuses on cloud computing, network security, and trusted systems.