A Novel Method For Malware Detection On ML-based Visualization Technique
Journal Pre-proof
PII: S0167-4048(18)31462-7
DOI: https://doi.org/10.1016/j.cose.2019.101682
Reference: COSE 101682
Please cite this article as: Xinbo Liu, Yaping Lin, He Li, Jiliang Zhang, A Novel Method for
Malware Detection on ML-based Visualization Technique, Computers & Security (2019), doi:
https://doi.org/10.1016/j.cose.2019.101682
This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition
of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of
record. This version will undergo additional copyediting, typesetting and review before it is published
in its final form, but we are providing this version to give early visibility of the article. Please note that,
during the production process, errors may be discovered which could affect the content, and all legal
disclaimers that apply to the journal pertain.
Changsha, China
Abstract
Malware detection is one of the most challenging tasks in network security. With the
flourishing of network techniques and mobile devices, threats from malware,
such as metamorphic malware, zero-day attacks, and code obfuscation, have
become increasingly significant. Many machine learning (ML)-based malware
detection methods have been proposed to address this problem. However, considering
the attacks from adversarial examples (AEs) and the exponential increase in
malware variants, malware detection remains an active field of
research. To overcome these limitations, we propose a novel method that applies
data visualization and adversarial training to ML-based detectors to efficiently
detect different types of malware and their variants. Experimental results
on the MS BIG malware database and the Ember database demonstrate that
the proposed method is able to prevent zero-day attacks and achieves up to
97.73% accuracy, along with 96.25% on average for all the malware tested.
Keywords: Malware Detection, Adversarial Training, Adversarial Examples,
Image Texture, Data visualization
2018 MSC: 00-01, 99-00
∗ Corresponding author:
Email address: yplin@hnu.edu.cn (Yaping Lin)
1. Introduction

Malicious software (malware) is a generic term for unwanted software
designed to gain unauthorized access, steal useful information, disrupt normal
operation, and adversely affect computers or even mobile devices [1, 2]. Although
eight major types of malware have been discovered in the real world [3], there
were 6,480 and 6,447 publicly disclosed vulnerabilities in 2015 and 2016,
respectively, and the number grew to 14,712 in 2017 according to cvedetails1.
Therefore, detecting different kinds of malware, and especially their variants,
efficiently and accurately remains a challenge.
Traditionally, signature-based determination has been used to detect malware,
but its limited scalability restricts its applicability [4]. Static code analysis is
another malware detection method, which achieves complete coverage through
disassembly; however, it usually suffers from code obfuscation, and the executable
files must be unpacked and decrypted before analysis [2]. Different from analyzing
code statically, dynamic code analysis runs the executable in a virtual environment
without unpacking or decrypting it, which is time-intensive and
resource-consuming [3]. More importantly, the methods mentioned above
are unable to detect specific types of malware whose behavior is well camouflaged
or whose trigger conditions are not satisfied.
Recently, malware detection has employed different machine learning (ML)
methods to improve detection efficiency [5, 1]. In particular, visualization
techniques in ML-based detection methods are not only more efficient
in the detection process but also flexible enough to break the restrictions
between file formats [6, 7]. However, ML-based detectors are vulnerable to
attacks from adversarial examples (AEs) [7, 8]. An AE is a special sample generated
from the original dataset with a tiny perturbation, which is able to fool
ML-based malware detectors [9]. If such interference from malware
AEs exists, the detection accuracy of ML-based malware analysis methods will
1 https://www.cvedetails.com
be greatly influenced [7, 10]. Even worse, these detectors can be induced
to output an opposite result. Fortunately, adversarial training (AT) has been
proposed to defend against attacks from AEs. AT is an AE-driven technique
that increases the accuracy and robustness of the detection model by augmenting
training data with targeted AEs during pre-training [11, 9]. Although
AT/AE techniques have been used for malware detection, such as malware
analysis with targeted interference, they focus on one specific malware
format, such as Android apps or SWF files [7, 12, 13], rather than investigating
a universal file format covering all potential malware and its variants. To
the best of our knowledge, the AT technique is able to solve the problem of
AE perturbation, but it has not been used in the area of ML-based malware
visualization detection.
In this paper, we propose an AT-based malware visualization detection
method, named Visual-AT, which not only improves the detection accuracy of
malware analysis but also prevents potential attacks from malware AEs and
associated variants. Additionally, the proposed method is suitable for most
universal malware file formats, such as worms, viruses, trojans, spyware,
etc. In Visual-AT, the generated AEs are used to simulate potential
malware variants, which are normally disguised as benign samples against
traditional ML-based detectors. Meanwhile, Visual-AT can also imitate malware
variants to not only extend the malware dataset but also facilitate extracting
malware features. Besides, we optimize the commonly used AE-generation
methods (FGSM and the C&W attack) and image-transformation methods (visualized
transformation and normalization). The experimental results on real malware
datasets (from the MS BIG database and the EMBER database) show that our
method achieves superior performance versus four recent works and traditional
SVM- and CNN-based detectors in terms of efficiency and accuracy. An up-to
97.73% accuracy is obtained, along with 96.25% on average for the malware
variants tested. In addition, compared to normal methods without AE/AT, the
proposed method obtains an average 28.12% increase in detection accuracy and
an 81.24% reduction in false positive rate. This paper makes the following
contributions:
(1) To the best of our knowledge, this paper proposes the first ML-based
malware visualization detection method that exploits both AE and AT techniques
with a suitable model regularization.
(2) The proposed method can generate the variants and camouflage of
malware to mitigate novel and probable malware in real detection. The
method (Visual-AT) is also a more accurate and efficient detection method that
prevents zero-day attacks.
(3) This work carries out a performance analysis of the proposed method
and evaluates its accuracy and robustness on real datasets.
The rest of the paper is organized as follows: Sect. 2 surveys related work
on malware detection and adversarial techniques, Sect. 3 describes the proposed
method and the corresponding improved malware detection approach, Sect. 4
presents the experimental results, and Sect. 5 discusses how the proposed
method improves accuracy and robustness. Finally, conclusions
and future work are presented in Sect. 6.
2. Related work
whether the program is benign or malware. Although these methods only process
the sequence in the forward direction, some sequential patterns may lie in the
backward direction. A bidirectional RNN [19] tries to learn patterns from both
directions; that is, an additional backward RNN is used to process the reversed
sequence. But the computation, in which the output probability is calculated
from the concatenation of the hidden states from both directions, incurs a
substantial overhead with low efficiency. By applying discriminant distance
metric learning, Kong and Yan [20] proposed a method that measures the
similarity of fine-grained features extracted from the function call graphs
of two malware programs. This learning method can cluster the malware samples
belonging to the same family while keeping different clusters separated by a
marginal distance. The weakness of this method is that the detection accuracy
relies heavily on the extracted fine-grained features used to compare similarity.
In addition, a different kind of method arose from the recent application
of statistical topic modeling to the classification of system call sequences
[21], which was further extended with a nonparametric methodology [22].
Subsequently, this method was extended by taking system call arguments as
additional information, as well as memory allocation patterns and other
traceable operations [23]. Pfoh et al. [24] exploit an SVM with string kernels
as a sequence-aware method capable of detecting malicious behavior online, by
classifying system call traces in small sections and keeping a moving average
over the probability estimates. The shortcoming of this method is that a
process, to act maliciously, must interact in some manner with the rest of
the system, and this interaction must take place through the interface provided
by the operating system (e.g., system calls). Moreover, Mohaisen and Alrawi
[25] introduced an automated system, AMAL, for large-scale malware analysis
and classification. AMAL consists of two subsystems: AutoMal collects malicious
low-granularity behavioral artifacts, and MaLabel creates representative features
from those artifacts. MaLabel can tune different learning algorithms for
classification, including SVM, k-nearest neighbors, and decision trees. The
downside of this automated system is the unnecessary overhead of running malware
samples in virtualized environments.
Figure 1: Transformation of a malware binary into a grayscale image: the
binary (e.g., PE sections .text, .rdata, .data, .rsrc) is read as 8-bit
vectors, and each byte becomes one grayscale pixel.
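The byte-to-pixel transformation sketched in Figure 1 can be illustrated as follows. This is a minimal NumPy sketch of the common visualization approach; the fixed image width of 256 is an assumption for illustration, not a value taken from the paper:

```python
import numpy as np

def binary_to_grayscale(data: bytes, width: int = 256) -> np.ndarray:
    """Treat each byte of a binary as one 8-bit grayscale pixel and
    reshape the resulting vector into a 2-D image."""
    arr = np.frombuffer(data, dtype=np.uint8)  # bytes -> 8-bit vector
    height = len(arr) // width                 # drop the ragged last row
    return arr[:height * width].reshape(height, width)
```

Sections such as .text or .data then appear as visually distinct texture bands in the resulting image.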
Random Forest (RF) [33], Decision Tree (DT) [34, 35], etc. These ML-based
detectors have been proposed for detecting unknown samples or flagging samples
that exhibit unseen behavior for detailed analysis. Lee et al. [36] first
transformed malicious code into images to accelerate malware detection. Then,
Kong and Yan [20] proposed to classify malicious samples automatically with
encoded features. Han et al. [5] analyzed the global features of malware based
on binary textures. However, the limitation of these methods is that an attacker
can adopt countermeasures to beat the system because the features are based on
a global image. Later, Makandar et al. [1] constructed texture feature
vectors with multi-resolution wavelets for malware image classification via
SVM. Huang et al. [6] introduced the R2-D2 method, which classifies
color-inspired RGB texture images via a CNN without extracting pre-selected
features. Recently, Kalash et al. [28] exploited a deep learning approach that
converts malware binaries to grayscale images and subsequently trains a CNN for
classification. In general, these ML-based visualization methods can
obtain high accuracy, and their true positive and false positive rates
also illustrate good robustness [1, 2]. But the majority of these studies
only use compression and dimensionality reduction on real data samples to
extract malicious features. This brings a serious security threat: if AEs are
involved, traditional ML-based detection cannot successfully identify malware
[37].
to defend against white-box attacks if the perturbation computed during training is
close to the maximum of the model loss. Even though a few studies have employed AE
and AT for malware detection, none of them rely on the ML-based visualization
detection approach. Grosse et al. [7] used AEs over a discrete and binary input
domain to mislead classifiers on malware samples, but this attack only
handles the specific binary features of Android malware detection. Maiorca et
al. [12] were the first to formally define and quantitatively characterize this
vulnerability on an SWF-file-based malware dataset. To mitigate zero-day attacks
from malware, Kim et al. [13] proposed a transferred generative adversarial
network (tGAN) for automatic classification with visualization-based malware
transformation. The methods mentioned above are not able to deal with the
exponential growth of malware variants and AEs. For these reasons, and
with the increased importance of adversarial techniques, normal detection
methods (even ML-based visualization approaches) are hard to adapt
to the requirements of different detection objects or environments in
recent malware-determination research [37].
3. Methodology
Fig. 2 shows the flow chart of Visual-AT, which contains five functional
modules: Data Preprocessing, Pre-training Process, AE Generation Process,
Adversarial Training & Detection Process, and Output Results.

Figure 2: The flow chart of Visual-AT. Bytes files from the original dataset
pass through the pre-training process; the generated AEs (Type 1 to Type n)
enlarge the dataset, which is then used by the CNN- and SVM-based detection
methods in the adversarial training & detection process.

Each module plays a specific role. Meanwhile, the colored arrows denote the flow
direction of the corresponding dataset. Since researchers usually obtain
malware datasets with all existing classes, the proposed method can be
applied to solve the problem under a gray-box attack model [42].
Visual-AT first converts malware code into feature images (or malware
images) and then rescales the transformed images to the same size in the Data
Preprocessing stage. By means of these preprocessing measures (transformation
and normalization), the image samples can be used for ML-based malware detection,
for instance with a CNN or an SVM. To generate AEs and simulate corresponding
subtle variants for certain malware types, the original dataset is transmitted
to the Pre-training Process and Adversarial Training & Detection Process stages
after Data Preprocessing. If no AEs are available, the preprocessed dataset is
transmitted to the final stage directly, yielding only a limited determination.
However, with the help of adversarial techniques during the Pre-training Process
and AE Generation Process stages, the proposed method crafts a subset of targeted
AEs to enlarge the dataset. Finally, Visual-AT is enhanced with a suitable
regularization on the generated dataset. Therefore, this method can intentionally
craft a subtle perturbation to simulate malware variants and camouflage from
the original malware dataset via different AE-generation methods.
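The stages above can be condensed into a short sketch. Here `train` and `generate_ae` are hypothetical stand-ins for the model-training and AE-generation routines (the paper uses FGSM and the C&W attack), not the actual Visual-AT implementation:

```python
def visual_at_training(model, images, labels, train, generate_ae, epsilon=0.45):
    """Pre-train, craft targeted AEs, enlarge the dataset, retrain."""
    # 1. Pre-training process on the original (preprocessed) malware images.
    train(model, images, labels)
    # 2. AE generation: simulate subtle malware variants / camouflage.
    ae_images = [generate_ae(model, x, y, epsilon) for x, y in zip(images, labels)]
    # 3. Enlarge the dataset; each AE keeps its true (malware) label.
    enlarged_images = images + ae_images
    enlarged_labels = labels + labels
    # 4. Adversarial training on the enlarged dataset.
    train(model, enlarged_images, enlarged_labels)
    return model
```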
In FGSM, an AE is crafted as

x∗ = x + ε · sign(∇x Jθ (x, l)),

where ε represents the distortion between AEs and original samples, sign(·)
denotes the sign function, ∇x Jθ (·, ·) computes the gradient of the cost function
J around the current value x of the model parameters θ, and l is the label of x.
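A minimal NumPy sketch of this FGSM update, where `grad_fn` is a hypothetical callable returning the gradient of the cost with respect to the input image:

```python
import numpy as np

def fgsm(x, grad_fn, eps):
    """Craft an AE by stepping in the sign of the cost gradient."""
    perturbation = eps * np.sign(grad_fn(x))    # eps controls the distortion
    return np.clip(x + perturbation, 0.0, 1.0)  # keep pixels in a valid range
```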
x∗ = (1/2)(tanh(w) + 1),

Figure 3: An AE perturbation example: a sample correctly detected as
“Obfuscator.ACY” with 81.3% confidence is misclassified as a “Benign sample”
with 99.7% confidence after perturbation.
where δ = x∗ − x, a new objective function g is defined as:

Note that if and only if f (x∗) = y′ ≠ y can the penalty term and the distance
be further optimized. In this case, the optimization formulation can be modified
as follows. Taking the l2 attack as an example, the original single reformer
c∗ · g∗ (·) is divided into two parts, r · gr (·) and d · gd (·), with
c∗ · g∗ (·) = r · gr (·) + d · gd (·), which denotes the corresponding loss
function in a detector. Additionally, r and d are chosen via binary search.
Algorithm 1 illustrates the pseudo-code of the optimized C&W-attack-based
method under the l2-norm attack, according to the existing
works of [9, 43]. Fig. 3 shows an example of the AE perturbation process by the
C&W attack with distortion ε = 0.45: the correct detection result, with an
average confidence of 81.3% on the MS BIG dataset, is turned into an erroneous
detection result with an average confidence of 99.7%.
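The C&W l2 idea can be illustrated with a simplified NumPy sketch: an unconstrained variable w is optimized, with x∗ = ½(tanh(w) + 1) guaranteeing valid pixel values, trading distortion against a classifier loss. `loss_grad` is a hypothetical gradient of the detector's loss, and the binary search over the trade-off constants is omitted:

```python
import numpy as np

def cw_l2_sketch(x, loss_grad, c=1.0, lr=0.1, steps=100):
    """Gradient descent on ||x_adv - x||^2 + c * loss, via the tanh map."""
    w = np.arctanh(2 * np.clip(x, 1e-6, 1 - 1e-6) - 1)  # start at x
    for _ in range(steps):
        x_adv = 0.5 * (np.tanh(w) + 1)
        dx = 2 * (x_adv - x) + c * loss_grad(x_adv)      # d(objective)/d(x_adv)
        w -= lr * dx * 0.5 * (1 - np.tanh(w) ** 2)       # chain rule through tanh
    return 0.5 * (np.tanh(w) + 1)
```

The tanh change of variable is what removes the box constraint on pixel values, so ordinary gradient descent suffices.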
The output of a neuron is computed as

y = F (Σ_{i=1}^{m} w_i x_i + b), with softmax activation F (x_i) = e^{x_i} / Σ_{n=1}^{m} e^{x_n},

where F (·) is an activation function, x_i denotes the i-th input to a neuron,
w is the weight vector updated by gradient descent, and b represents the
corresponding bias value.
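A small NumPy sketch of the softmax activation above, with the usual max-shift for numerical stability:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))  # shifting by max(x) leaves the result unchanged
    return e / e.sum()
```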
As a machine-learning-based detection method, the computational cost is also
an important factor worth taking into consideration. For the proposed method,
the computational cost mainly depends on the AE generation process [48] and the
training process of the detection model [49]. This section first discusses the
asymptotic analysis of the training process and the AE generation process,
including the time and space complexity of the proposed Visual-AT method. For
the training process of the detection model, the time complexity can be
expressed, according to the research in [49, 50], as

Time ∼ O(Σ_{l=1}^{D} M_l² · K_l² · C_{l−1} · C_l),

where M is the edge length of the output feature map of each convolutional
kernel, K denotes the edge length of each convolutional kernel, C represents
the number of channels of each convolutional kernel, D is the number of
convolutional layers of the neural network (i.e., the depth of the network),
and l denotes the l-th convolutional layer. Meanwhile, the corresponding space
complexity is

Space ∼ O(Σ_{l=1}^{D} K_l² · C_{l−1} · C_l),

which reflects the volume of the method (or model) itself and is related to
the size of the convolution kernels (K), the number of channels (C), and the
depth of the network (D). Referring to the work of [48, 51], the time
complexity of a gradient-optimization-based AE generation method can be
expressed as Time ∼ O(L P_M² / ε), where L and P are Lipschitz constants
associated with the specific norm (‖·‖), as L ≡ L_{‖·‖}, P ≡ P_{M,‖·‖}, and
ε denotes the gradient distortion parameter. Additionally, the corresponding
space complexity can be expressed as Space ∼ O(L P_x² / ε).
In this work, the proposed Visual-AT method contains not only a CNN-based
training process for the detection model but also a gradient-optimization-based
AE generation process. According to the analysis above, the overall time
complexity can be expressed as

Time ∼ O(Σ_{l=1}^{D} M_l² · K_l² · C_{l−1} · C_l + L P_M² / ε).

The space complexity reflects the volume of the proposed method itself and has
nothing to do with the size of the input.
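The training-time and space estimates above can be evaluated directly from a layer configuration. Below is a sketch where each layer is given as (M, K, C_out) and C_0 is the input channel count; the layer sizes in the example are illustrative assumptions, not the paper's network:

```python
def conv_cost(layers, c_in):
    """Evaluate Time ~ sum(M^2 K^2 C_{l-1} C_l) and Space ~ sum(K^2 C_{l-1} C_l)."""
    time_cost = space_cost = 0
    c_prev = c_in
    for m, k, c in layers:
        time_cost += (m ** 2) * (k ** 2) * c_prev * c  # per-layer multiply-adds
        space_cost += (k ** 2) * c_prev * c            # kernel parameters only
        c_prev = c
    return time_cost, space_cost
```

For instance, a single layer with M = 8, K = 3, C = 4 on one input channel contributes 8² · 3² · 1 · 4 = 2,304 to the time term.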
The Visual-AT method presented in this paper is a robust method based on a
model pre-training process with an ML algorithm. In general, the modeling
process accounts for the majority of the computational cost. Indeed, from the
calculation above we have Time ∼ O(Σ_{l=1}^{D} M_l² · K_l² · C_{l−1} · C_l) ≫
Time ∼ O(L P_M² / ε). The corresponding overhead can thus be computed mainly
from the model framework selected in the Visual-AT method. Therefore, for the
proposed Visual-AT scheme, the computational cost can be approximated as
Time ∼ O(Σ_{l=1}^{D} M_l² · K_l² · C_{l−1} · C_l) and
Space ∼ O(Σ_{l=1}^{D} K_l² · C_{l−1} · C_l).
Additionally, the computational cost of the detection model's training process
is always consistent with the corresponding detection method. In the experiment
part, Sect. 4.5, this paper discusses the measured results of the AE generation
and model training processes of Visual-AT in detail.
4. Experiments
methods with different AE-crafting methods. Meanwhile, cross-validation
for statistical evaluation is also used to analyze the reliability of the
proposed approach. Finally, by testing the computational cost of the proposed
method, including the training process and the AE generation process, this
experiment validates the flexibility and reliability of Visual-AT on a platform
with a real malware dataset. The experiment programs are implemented in Python
3.6 and MATLAB 2018a under CentOS Linux release 7.6.1810 (Core), while the
testbed features an Intel(R) Xeon(R) Silver 4110 CPU at 2.10GHz, 128GB RAM,
and a 16GB NVIDIA Quadro P5000 GPU with CUDA 10.1. The detailed descriptions
of each experiment are as follows.
Table 1: MS BIG dataset with malware class distribution & benign samples.
2 https://github.com/BartyzalRadek/Multi-label-Inception-net
of the proposed method, the targeted AEs generated from the BIG dataset are
exploited to compute the detection accuracy of Visual-AT, while comparing the
method against traditional ML-based visualization malware detection methods,
such as CNN and SVM. The evaluation metrics of this experiment include the
accuracy, false positive rate, and false negative rate, which are effective for
quantitative evaluation. A high accuracy with both a low false positive rate
and a low false negative rate indicates better performance. Moreover, in order
to evaluate the performance metrics of Visual-AT, the AEs generated from the
Ember dataset are used to test the detection accuracy of the constructed
models during AE generation. By adjusting the size of the training dataset
and the parameters of the AE generation process, the detection-accuracy results
show a clear variation tendency in the performance analysis. Therefore, malware
detection results based on different datasets are elaborated, along with
discussions of the ability to resist AEs. Finally, by executing the model
pre-training process with different dataset sizes and evaluating the different
AE generation methods in Visual-AT, the computational cost results of the
proposed method are also presented in this section. Parameters and settings
are described in detail below.
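The three metrics can be computed from a binary confusion matrix (here 1 = malware, 0 = benign); a minimal sketch:

```python
def detection_metrics(y_true, y_pred):
    """Accuracy, false positive rate, and false negative rate."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    fpr = fp / (fp + tn) if fp + tn else 0.0  # benign flagged as malware
    fnr = fn / (fn + tp) if fn + tp else 0.0  # malware missed
    return accuracy, fpr, fnr
```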
In Table 2, the comparison results are listed, including the accuracy, false
positive rate, and false negative rate. Visual-AT achieves an increase in
average accuracy of 8.64% with the SVM-based detector, while the average false
positive and false negative rates drop below 3.65% and 2.31%, respectively.
The corresponding figures for the CNN-based detector are 4.47%, 2.93%, and
1.70%. Compared to both traditional ML models, Visual-AT not only obtains an
average accuracy increase of 7.41% but also a 68.17% decrease in the false
positive rate. In addition, the detection model using FGSM-based AEs is slightly
less accurate than the one using C&W-attack-based AEs in the SVM detector, with
values of 94.55% versus 96.18%. As for the accuracy of the CNN detector, the
model using FGSM-based AEs shows superiority over the one using the
C&W-attack-based method, with accuracies of 97.56% versus 96.25%. As expected,
the difference between gradient descent and norm optimization in the AE
generation methods is responsible for the findings above.
Compared with traditional detection methods, such as code analysis and
data signatures in related work [2, 3, 54], the ML-based visualization detection
methods improve the accuracy by an average of 20%, with values of 87.13% for the
SVM-based detector and 92.16% for the CNN-based one according to the experimental
results in Table 2. Furthermore, by adopting the adversarial techniques, the
detection results obtain a further efficiency improvement. Table 2 lists the
comparison between normal ML-based visualization detectors and Visual-AT, where
the latter achieves better performance. The detection accuracy of
Table 2: The detection results of different detectors with the pure dataset
(for each of SVM and CNN: accuracy, false positive rate, and false negative rate).
Visual-AT achieves a 5%∼10% accuracy increase and a 5%∼8% false-positive-rate
drop versus the traditional methods. The CNN-based detector without AT has an
average accuracy of 92.16%, whereas Visual-AT increases this number to 96.53%.
The corresponding figures are 87.13% and 95.76% for the SVM-based detector.
Table 3: The detection results of different detectors with the obstructed dataset
(for each of SVM and CNN: accuracy, false positive rate, and false negative rate).
and from 92.16% to 54.16% in the CNN-based one. In terms of the average false
positive rate, it increases from 4.82% to 34.59% and from 8.62% to 48.62% in
the SVM- and CNN-based detectors, respectively. These obvious gaps indicate
that AEs hugely affect the performance of traditional ML-based visualization
methods in malware detection.
More importantly, the proposed Visual-AT method shows strong robustness
and performs even better when AEs are involved in the dataset. Compared
with the results of the normal ML-based visualization methods in Tables 2 and 3,
Visual-AT achieves 1.59× and 1.78× detection-accuracy increases over the
SVM-based and CNN-based detectors, respectively. In addition, one can observe
an obvious drop in the false positive rate for the two ML-based detectors,
from 34.59% to 3.36% in SVM and from 48.62% to 2.69% in CNN. Since the AEs
can be considered potential malware variants, the proposed method is proven to
have a unique ability to defend against the threat of malware variants.
curves indicate that the accuracy of each detection model grows with the scale
of AE generation.
When the number of AEs in the training dataset grows to 1,000, the accuracies
of the two improved methods reach 96.81% and 97.39%, respectively, as shown in
Fig. 4. These results are much higher than those of the traditional CNN and
SVM detectors, whose detection results are below an average value of 86% in
related references. However, as the AE set grows, the accuracies do not rise
indefinitely: once the number exceeds a certain degree, the accuracy drops.
This phenomenon results from over-fitting of the detection model as the
training samples are enlarged. Therefore, a reasonable size of the AE set is
of great importance for improving the accuracy of the proposed Visual-AT.
Figure 4: The detectors' accuracy with different AE-set sizes (SVM and CNN
curves; x-axis: number of AEs, 0–1,600; y-axis: accuracy, 91%–98%).
By using these AE’s sets the detection accuracy can be tested for different
improvement detectors, including SVM and CNN. The experiment process
465 adjusts the value of distortion from 0.1 to 0.7 during AE generation process,
and then uses the AEs to calculate the different accuracy in testing different
detectors, as shown in Fig. 5.
When the distortion (ε) is set to 0.5, the accuracies of these two methods
reach 94.61% and 97.10%, respectively. Even though the detectors’ accuracy
470 cannot obtain a satisfactory result with a small intensity value in distortion
(ε), adjusting ε during AE generation will achieves better performance. In
Fig. 5, one can find that the variation trend of these two lines (red for SVM
and green for CNN) are growing gradually with the increase of the distortion
strength. However, when the distortion parameter reaches a certain degree (as
475 ε > 0.5), the variational trends of the accuracy will gradually tend to be stable
or even decreased. Although the range of the fluctuations in distortion (ε) is
less than 0.5%, the influence of these small variations on accuracy cannot be
ignored as well. Just as solving the optimal interval in parameter optimized
Figure 5: The detectors' accuracy with different distortion values (SVM and
CNN curves; x-axis: distortion ε from 0.1 to 0.7; y-axis: accuracy, 86%–98%).
process, the detection accuracy achieves a relatively optimal result when the
distortion (ε) is set between 0.45 and 0.60. Since the distortion value directly
affects the difference between the AEs and the original samples, it can be
inferred that as the distortion (ε) increases, the difference between the AEs
and the original samples is gradually enlarged and the AEs deviate from the
originals. In general, the larger the distortion parameter (ε), the easier the
desired attack purpose is to achieve, and this difference becomes more and more
obvious. Even worse, for the pre-training process of the detection model,
excessive perturbation can have the opposite effect on accuracy, i.e., an
over-fitting phenomenon. Therefore, an appropriate perturbation with a suitable
distortion value (ε) for AE generation is also of great significance for the
Visual-AT method.
4.4. Comparison
The comparison results of these four similar recent works against the Visual-AT
method are illustrated in Table 4. As shown in Table 4, these methods obtain
high accuracy, with the maximum average accuracy of both the M-CNN method and
Visual-AT beyond 98%. Comparing these five methods, one can find that the
CNN-based detection methods (such as M-CNN, Lanczos-CNN, and Visual-AT)
generally obtain higher accuracy than the others, which are based on
non-neural-network structures (such as RF and LZJD with kNN). Conversely, for
the methods based on tree structures or nearest-neighbor methods with an
optimal distance, the detection accuracy is somewhat lower than that of the
CNN-based ones. Meanwhile, comparing the accuracy results between 5-fold and
10-fold cross-validation, M-CNN, Random Forest, and Visual-AT achieve high
stability, with deviations of 0.82%, 0.70%, and 0.65%, respectively. In other
words, the robustness of these methods is well suited to real-world detection.
Therefore, from the analysis above, even though the proposed method (Visual-AT)
is not the most accurate, considering the comparison results in Table 4, it
shows good robustness and high accuracy in detection.
detection model, where the modeling and training processes generally account
for the majority of the computational cost. In general, the computational cost
of the detection model's training process is consistent with that of the
corresponding detection methods. In this work, for Visual-AT, the GoogLeNet
Inception V3 framework has been selected for the training and modeling process
in the experiment. Therefore, this subsection discusses in detail the added
computational overhead of AE generation and, especially, of the model
pre-training process.
To present the computational cost clearly, both the BIG 2015 dataset (11,878 samples) and the Ember 2018 dataset (5,000 samples) are used for model pre-training with different numbers of iterations in AE generation. The testbed features a 2.10 GHz CPU, 128 GB RAM, and a 16 GB GPU. First, the Ember dataset (5,000 samples) is used to execute the model training process with different numbers of iterations. To expose the different overheads, the iteration counts are chosen an order of magnitude apart, e.g., 100 and 1,000. Table 5 lists the computational results in detail, including Volatile GPU-Util, GPU memory usage, power usage/cap, GPU fan speed, and time consumption. These overhead figures are averages over three groups of experiments and therefore represent the general behavior of Visual-AT. Moreover, considering the factors of sample size and dataset, the BIG dataset (11,878 samples) is also used in the model training process with different iteration counts. One can see
the computational results in Table 5 as well. Even with over ten thousand samples and one thousand iterations, the computational cost remains modest, e.g., 43% Volatile GPU-Util and 69 W/180 W power overhead. Generally, 100 iterations are sufficient to construct a pre-trained model. As listed in Table 5, the group with 11,878 samples and 100 iterations takes only 32% Volatile GPU-Util and 91.71 seconds for the whole training process. Consequently, comparing the experiment groups in Table 5, the computational cost of the proposed method is within an acceptable range.
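The three-run averaging protocol used to obtain the time figures above can be sketched as follows; this is a minimal illustration, and `train_step` is a hypothetical stand-in for the real pre-training workload, not the paper's code.

```python
import time

def measure_mean_time(workload, runs=3):
    """Run `workload` several times and return the mean wall-clock time,
    mirroring the three-run averaging used for the overhead figures."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        times.append(time.perf_counter() - start)
    return sum(times) / len(times)

# Stand-in workload: a tiny CPU-bound loop instead of real model pre-training.
def train_step():
    total = 0
    for i in range(100_000):
        total += i * i
    return total

mean_t = measure_mean_time(train_step)
print(f"mean time over 3 runs: {mean_t:.4f}s")
```

GPU-side metrics such as Volatile GPU-Util and memory usage would be read from the driver's monitoring tools rather than from Python timers; only the wall-clock component is shown here.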
Furthermore, to capture the whole overhead of Visual-AT, this experiment also measures the computational cost of the AE generation process with two generation methods (FGSM and the C&W attack) after model pre-training, even though the pre-training model accounts for the main cost of Visual-AT. In this experiment, we use the same distortion parameter (ε = 0.5) to generate AEs with a pre-trained detection model. Table 6 lists the computational cost of AE generation with these two methods, including CPU utilization (CPU percent), memory usage, and consumption time; the values are means over three runs of the same experiment. The FGSM-based method takes on average 34.5% CPU utilization, 4.4% memory usage (5.648E+09/1.288E+11), and 0.220 seconds of consumption time. For the C&W-attack-based method, the average computational cost is also at a low level: 35.7% CPU utilization, 4.5% memory usage (5.659E+09/1.288E+11), and 9.603 seconds of consumption time. Meanwhile, comparing the evaluation metrics of these two methods in Table 6, the average calculation efficiency
of the FGSM-based method is slightly better than that of the C&W-attack-based method, especially in consumption time (0.220 s). Therefore, although Visual-AT, as an ML-based detection method, incurs some additional computational overhead, the evaluation metrics of these experimental results remain within a reasonable range.
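As a minimal illustration of the FGSM generation step discussed above (a one-step perturbation with distortion ε), the following sketch applies FGSM to a toy linear scorer whose input gradient is known analytically; the toy model, data, and `fgsm` helper are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

# Hedged FGSM sketch: for a toy linear scorer f(x) = w.x with loss
# J = -f(x) on a "malware" input, the input gradient is dJ/dx = -w,
# so the one-step FGSM perturbation is x* = x + eps * sign(dJ/dx).

def fgsm(x, grad, eps=0.5):
    """One-step FGSM: perturb x along the sign of the loss gradient."""
    x_adv = x + eps * np.sign(grad)
    return np.clip(x_adv, 0.0, 1.0)  # keep values in the valid image range

rng = np.random.default_rng(0)
x = rng.random(16)            # toy 16-"pixel" grayscale malware image
w = rng.standard_normal(16)   # toy linear model weights
grad = -w                     # analytic gradient of J = -w.x w.r.t. x

x_adv = fgsm(x, grad, eps=0.5)
print("max perturbation:", np.max(np.abs(x_adv - x)))
```

Because FGSM needs only a single gradient evaluation, while the C&W attack solves an iterative optimization per sample, the large gap in consumption time (0.220 s vs. 9.603 s) reported above is expected.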
5. Discussion
In this section, we discuss some possibilities for adversarial techniques to enhance ML-based visualization malware detectors and provide defensive strategies to effectively defend against AE attacks.
First of all, the results in Table 2 show that the SVM-based detection methods are more robust against AE attacks. For instance, when AEs are present in the testing dataset, the average accuracy of the SVM-based detector is generally higher than that of the CNN-based detector. Since AE generation is based on a DNN with a hierarchical structure, we can reasonably assume, according to the experimental results in Sect. 4, that the algorithm structure influences detection accuracy. Comparing the linear-structure-based SVM detector with the hierarchical-structure-based CNN detector, the former achieves better accuracy and is more robust than the latter. Therefore, one can attribute this phenomenon to the decision features of a linear algorithm and the application of kernel functions. Although CNN-based deep learning methods (including Visual-AT) are vulnerable to attacks from malware AEs, detection accuracy can be improved through adversarial training with an enlarged dataset and suitable regularization in the pre-training process. At present, no published work validates that the structure of the detection algorithm directly has a large influence on model accuracy under AEs. We can therefore only infer that the similar hierarchical discriminant structure shared by the AE-generation and malware-detection algorithms is an important factor in inducing misjudgment by the malware detector.
Secondly, given the difference between malware samples from different datasets, such as BIG 2015 and Ember 2018, this research also analyzes the potential threat from different data samples. Here, the mean distance Dx∗ between the original inputs (x) and normal AE samples (x∗, simulating many different types of benign or malware samples) is compared with the mean distance Dxt between the original inputs (x) and the corresponding simulated pseudo-benign AEs (malware AEs xt targeting the benign class). The calculation shows that Dx∗ and Dxt have similar values, with a ratio of 1:1.17. The AEs (x∗) generated to obstruct malware detectors are very similar to the pseudo-benign AEs (xt), and the mathematical distribution of these perturbation values is close to uniform. Therefore, it is easy for the targeted AEs (xt) to mislead the detectors, and in practice it is rather difficult to distinguish normal samples from different AEs. Since the AE-generation process follows the optimal gradient toward the target category, whatever the target label is (even an irrelevant one), the detector will eventually be induced to produce the result desired by the attacker. From this discussion, it is clear that differences between samples from different datasets do not influence the final detection results.
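The distance comparison above can be sketched as follows; the arrays, perturbation magnitudes, and `mean_distance` helper are synthetic illustrations and do not reproduce the paper's 1:1.17 measurement.

```python
import numpy as np

# Hedged sketch of the distance analysis: mean Euclidean distance between
# original inputs x and two AE sets (x_star: untargeted AEs, x_t: AEs
# targeting the benign class). All data here is synthetic.

def mean_distance(x, x_adv):
    """Mean L2 distance between paired originals and adversarial examples."""
    return float(np.mean(np.linalg.norm(x - x_adv, axis=1)))

rng = np.random.default_rng(1)
x = rng.random((50, 64))                   # 50 toy samples, 64 features each
x_star = x + rng.normal(0, 0.05, x.shape)  # untargeted perturbations
x_t = x + rng.normal(0, 0.06, x.shape)     # targeted perturbations

d_star = mean_distance(x, x_star)
d_t = mean_distance(x, x_t)
print(f"D_x*: {d_star:.3f}  D_xt: {d_t:.3f}  ratio: 1:{d_t / d_star:.2f}")
```

When the two mean distances are this close, the targeted AEs are statistically almost indistinguishable from untargeted ones, which is why the detectors above cannot separate them.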
Thirdly, the Visual-AT method can simulate malware camouflage and variants. Unlike normal dataset augmentation, AEs do not naturally appear in the training set. Therefore, AT can be used to improve the robustness of detection models and close the flaws that attackers would otherwise exploit. Figs. 4 and 5 indicate that the accuracy and detection ability of these detectors improve as the AE scale and the distortion intensity increase. However, the accuracy does not grow without bound: when the number of AEs or the distortion intensity reaches a certain value, the accuracy levels off or even gradually decreases. According to the varied size scales and distortion values in the experiments above, this phenomenon can be attributed to over-fitting of the detection model when the training parameters become too large. Therefore, it is of great
importance to choose a suitable sample-set size and a reasonable distortion intensity for a specific ML-based malware detector in the Visual-AT method.
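The augmentation idea above (adding n AEs with distortion ε to the training set) can be sketched as follows; the `augment_with_aes` helper and its random-sign "attack" are placeholders for illustration, not a real AE generator.

```python
import numpy as np

# Hedged sketch of adversarial-training data augmentation: mix n generated
# AEs (with distortion eps) into the clean training set before retraining.
# The AE "generator" here is a random sign perturbation, not a real attack.

def augment_with_aes(x_train, y_train, n, eps):
    """Append n perturbed copies of randomly chosen training samples."""
    rng = np.random.default_rng(42)
    idx = rng.integers(0, len(x_train), size=n)
    perturb = eps * rng.choice([-1.0, 1.0], size=x_train[idx].shape)
    x_aes = np.clip(x_train[idx] + perturb, 0.0, 1.0)
    x_aug = np.concatenate([x_train, x_aes])
    y_aug = np.concatenate([y_train, y_train[idx]])  # AEs keep true labels
    return x_aug, y_aug

x = np.random.default_rng(7).random((100, 8))  # toy feature vectors
y = np.arange(100) % 2                          # toy binary labels
x_aug, y_aug = augment_with_aes(x, y, n=30, eps=0.1)
print(x_aug.shape, y_aug.shape)
```

In line with the discussion above, n and ε are the knobs to tune: past a certain scale or intensity, the augmented set starts to over-fit the detector rather than harden it.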
In summary, based on the above analysis with adversarial techniques, three factors can be inferred to affect the accuracy and robustness of malware detection: first, the algorithmic structure of different detectors, such as linear versus hierarchical optimization; second, the similarity of the differences between AEs; third, parameter factors, such as the distribution of the distortion (ε) and the scale of the AE dataset (n).
6. Conclusion
to all possible types of malware attacks is also a challenging topic worth studying, such as defending against specific attacks like Distributed Denial of Service (DDoS). Finally, we believe that the Visual-AT method could have a wide range of applications in machine learning and computer security.
Acknowledgment
Author Biography:
Xinbo Liu
Jiliang Zhang
He Li