A Novel Method For Malware Detection On ML-based Visualization Technique


Journal Pre-proof


Xinbo Liu, Yaping Lin, He Li, Jiliang Zhang

PII: S0167-4048(18)31462-7
DOI: https://doi.org/10.1016/j.cose.2019.101682
Reference: COSE 101682

To appear in: Computers & Security

Received date: 22 December 2018


Revised date: 11 October 2019
Accepted date: 26 November 2019

Please cite this article as: Xinbo Liu, Yaping Lin, He Li, Jiliang Zhang, A Novel Method for
Malware Detection on ML-based Visualization Technique, Computers & Security (2019), doi:
https://doi.org/10.1016/j.cose.2019.101682

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition
of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of
record. This version will undergo additional copyediting, typesetting and review before it is published
in its final form, but we are providing this version to give early visibility of the article. Please note that,
during the production process, errors may be discovered which could affect the content, and all legal
disclaimers that apply to the journal pertain.

© 2019 Published by Elsevier Ltd.


A Novel Method for Malware Detection on ML-based
Visualization Technique

Xinbo Liu (a,b), Yaping Lin (a,b,∗), He Li (a), Jiliang Zhang (a)

a The College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
b Hunan Provincial Key Laboratory of Trusted System and Networks in Hunan University, Changsha, China

Abstract

Malware detection is one of the challenging tasks in network security. With the
growth of network techniques and mobile devices, threats from malware, such as
metamorphic malware, zero-day attacks, and code obfuscation, have become
increasingly significant. Many machine learning (ML)-based malware detection
methods have been proposed to address this problem. However, given attacks that
use adversarial examples (AEs) and the exponential increase in malware variants,
malware detection is still an active field of research. To overcome these
limitations, we propose a novel method that uses data visualization and
adversarial training on ML-based detectors to efficiently detect different types
of malware and their variants. Experimental results on the MS BIG malware
database and the Ember database demonstrate that the proposed method is able to
prevent zero-day attacks and achieves up to 97.73% accuracy, with 96.25% on
average across all the malware tested.
Keywords: Malware Detection, Adversarial Training, Adversarial Examples,
Image Texture, Data visualization
2018 MSC: 00-01, 99-00

∗ Corresponding author. Email address: yplin@hnu.edu.cn (Yaping Lin)

Preprint submitted to Computers & Security December 2, 2019


1. Introduction

Malicious software (malware) is a generic term for all unwanted software designed
to gain unauthorized access, steal useful information, disrupt normal operation,
and adversely affect computers or mobile devices [1, 2]. Even though there are
eight major types of malware discovered in the real world [3], there were 6,480
and 6,447 publicly disclosed vulnerabilities in 2015 and 2016, respectively, and
the number rose to 14,712 in 2017 according to cvedetails¹. Detecting different
kinds of malware, and especially their variants, efficiently and accurately
therefore remains a challenge.

Traditionally, signature-based detection has been used to detect malware, but its
limited scalability restricts its applicability [4]. Static code analysis is
another kind of malware detection method, which aims at complete coverage through
disassembly. However, it usually suffers from code obfuscation, and executable
files must be unpacked and decrypted before analysis [2]. In contrast to static
analysis, dynamic code analysis does not need to unpack or decrypt the executable
file because it runs the file in a virtual environment, but it is time-intensive
and resource-consuming [3]. More importantly, the methods mentioned above are
unable to detect specific types of malware whose behavior is well camouflaged or
whose trigger conditions are not satisfied.

Recently, malware detection has employed different machine learning (ML) methods
to improve detection efficiency [5, 1]. Visualization techniques in ML-based
detection methods are especially attractive, since they are not only more
efficient in the detection process but also flexible enough to break the
restrictions between file formats [6, 7]. However, ML-based detectors are
vulnerable to attacks that use adversarial examples (AEs) [7, 8]. An AE is a
special sample generated from the original dataset with a tiny perturbation,
which is able to fool ML-based malware detectors [9]. When such malware AEs are
present, the detection accuracy of ML-based malware analysis methods is greatly
affected [7, 10]. Even worse, the detectors can be induced to output the opposite
result. Fortunately, adversarial training (AT) has been proposed to defend
against AE attacks. AT is an AE-driven technique that increases the accuracy and
robustness of the detection model by augmenting the training data with targeted
AEs in the pre-training process [11, 9]. Although AT/AE techniques have been used
for malware detection, such as malware analysis with targeted interference,
existing work focuses on one specific malware format, such as Android apps or SWF
files [7, 12, 13], rather than investigating a universal file format for all
potential malware and its variants. To the best of our knowledge, although the AT
technique is able to solve the problem of AE perturbation, it has not been used
in the area of ML-based malware visualization detection.

¹ https://www.cvedetails.com

In this paper, we propose an AT-based malware visualization detection method,
named Visual-AT, which not only improves the detection accuracy in malware
analysis but also prevents potential attacks from malware AEs and associated
variants. Additionally, the proposed method is suitable for most of the universal
malware file formats, covering worms, viruses, trojans, spyware, and others. In
Visual-AT, the generated AEs are used to simulate the potential malware variants
that are normally disguised as benign samples to traditional ML-based detectors.
Visual-AT can also imitate malware variants, which both extends the malware
dataset and facilitates the extraction of malware features. In addition, we
optimize the commonly used AE-generation methods (FGSM and the C&W attack) and
image-transformation methods (visualized transformation and normalization). The
experimental results on real malware datasets (from the MS BIG database and the
EMBER database) show that our method outperforms four recent works and the
traditional SVM- and CNN-based detectors in terms of efficiency and accuracy. An
accuracy of up to 97.73% is obtained, with 96.25% on average for the malware
variants tested. In addition, compared with the normal methods that do not use
AE/AT, the proposed method increases detection accuracy by 28.12% on average and
reduces the false positive rate by 81.24%. This paper makes the following
contributions:

(1) To the best of our knowledge, this paper proposes the first ML-based malware
visualization detection method with a suitable model regularization that exploits
both AE and AT techniques.
(2) The proposed method can generate variants and camouflaged forms of malware to
mitigate novel and probable malware in real detection. Visual-AT is also a more
accurate and efficient detection method for preventing zero-day attacks.
(3) This work carries out a performance analysis of the proposed method and
evaluates its accuracy and robustness on real datasets.

The rest of the paper is organized as follows: Sect. 2 surveys related work on
malware detection and adversarial techniques, Sect. 3 describes the proposed
method and the corresponding improved malware detection approach, Sect. 4 shows
the experimental results, and Sect. 5 discusses the possibilities for the
proposed method to improve accuracy and robustness. Finally, conclusions and
future work are presented in Sect. 6.

2. Related work

2.1. Machine Learning based Malware Detection Methods

Machine learning-based malware analysis and detection is a hot spot in several
research efforts. These efforts employ various behavioral features of malware as
input for statistical analytical models. By analyzing code or tracing events,
analysts can acquire the features needed [14], e.g. system calls, registry
accesses, and network traffic. In general, these kinds of sequences are analyzed
through supervised (for classification), semi-supervised, and unsupervised (for
clustering) learning methods [15, 16].

In recent years, several researchers have used advanced machine learning methods
and extracted more information from malware datasets for analysis. As recurrent
neural networks (RNNs) have become popular, some researchers have used RNNs for
malware detection and classification [17, 18, 14], where the API sequence invoked
by a program is used as the input of the RNN. The RNN detector then predicts
whether the program is benign or malicious. However, these methods only process
the sequence in the forward direction, whereas some sequential patterns may lie
in the backward direction. A bidirectional RNN [19] tries to learn patterns from
both directions; that is, an additional backward RNN is used to process the
reversed sequence. The output probability is computed from the concatenation of
the hidden states from both directions, which incurs a substantial overhead and
lowers efficiency. By applying discriminant distance metric learning, Kong and
Yan [20] proposed a method that measures the similarity of the extracted
fine-grained features between two malware programs based on the function call
graph of each sample. This learning method clusters the malware samples belonging
to the same family while keeping the different clusters separated by a marginal
distance. The weakness of this method is that the detection accuracy relies
heavily on the extracted fine-grained features used to compare similarity.

In addition, a different kind of method applies statistical topic modeling
approaches to the classification of system call sequences [21], which was further
extended with a nonparametric methodology [22]. Subsequently, this method was
extended by taking system call arguments as additional information, as well as
memory allocation patterns and other traceable operations [23]. Pfoh et al. [24]
exploit an SVM with string kernels to build a sequence-aware method that is
capable of detecting malicious behavior online by classifying system call traces
in small sections and keeping a moving average over the probability estimates.
The shortcoming of this method is that, to be detected, a process that acts
maliciously must interact in some manner with the rest of the system, and this
interaction must take place through the interface provided by the operating
system (e.g. system calls). Moreover, Mohaisen and Alrawi [25] introduced an
automated system, AMAL, for large-scale malware analysis and classification. AMAL
consists of two subsystems: AutoMal, which collects low-granularity behavioral
artifacts of malware, and MaLabel, which creates representative features from
these artifacts. MaLabel can tune different learning algorithms for
classification, including SVM, K-nearest neighbor, and decision tree. The
downside of this automatic system is the unnecessary overhead of running malware
samples in virtualized environments.

2.2. ML-based Visualization Detection Methods without Adversarial Technique

The visualization technique reads the binary file of a malware sample as a
sequence of 8-bit unsigned integers and visualizes it as a gray-scale image
[26, 1]. Each value lies between 0 (black) and 255 (white). Depending on the
scale of the data samples and the analytical requirements, the width of the
transformed images can be adjusted appropriately, for instance, 32 for file sizes
below 10 KB and 64 for file sizes between 10 KB and 30 KB. Once the width of the
transformed image is set, its height varies with the size of the malware file.
Fig. 1 shows an example of a Trojan downloader from the MS BIG dataset [27],
illustrating the visualization process of malware. Additionally, various
primitive binary fragments and their corresponding visual regions (as gray-scale
images) are illustrated with their distinctive image textures, e.g. the .text,
.rdata, .data, and .rsrc sections. Icons might also be included in the
application if needed.
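
The byte-to-image step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, only the two width bands quoted in the text (32 and 64) come from the paper, and the fallback width of 128 for larger files and the zero padding of the last row are assumptions.

```python
def choose_width(num_bytes):
    """Pick an image width from the file size in bytes."""
    size_kb = num_bytes / 1024
    if size_kb < 10:
        return 32
    if size_kb <= 30:
        return 64
    return 128  # assumption: the text gives no width for files above 30 KB


def bytes_to_grayscale(data, width=None):
    """Return a list of rows; each pixel is one byte, 0 (black) to 255 (white)."""
    if width is None:
        width = choose_width(len(data))
    pixels = list(data)  # iterating over bytes yields integers in 0..255
    remainder = len(pixels) % width
    if remainder:
        # assumption: pad the final row with black pixels
        pixels += [0] * (width - remainder)
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


rows = bytes_to_grayscale(bytes(range(256)) * 20)  # a 5 KB dummy "binary"
print(len(rows), len(rows[0]))  # height and width of the resulting image
```

The width is fixed first and the height follows from the file length, which matches the statement that the height is allowed to change with the malware size.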

Figure 1: Process of Malware Visualization Transformation. (Stages shown: malware
binary, bytes file, binary to 8-bit vector, 8-bit vector to grayscale image; the
resulting image is annotated with the .text, .rdata, .data, and .rsrc sections.)

ML-based visualization methods have become popular for detecting malware in
recent years, using models such as convolutional neural networks (CNN) [28, 29],
support vector machines (SVM) [30, 1], nearest neighbours (NN) [1, 4], K-means
[31, 32], random forest (RF) [33], and decision tree (DT) [34, 35]. Across these
ML-based detectors, this type of method has been proposed for detecting unknown
samples or highlighting samples that exhibit unseen behavior for detailed
analysis. Lee et al. [36] first transformed malicious code into images to
accelerate malware detection. Kong and Yan [20] then proposed classifying
malicious samples automatically with encoded features. Han et al. [5] analyzed
the global features of malware based on binary textures. However, the limitation
of these methods is that, because the features are based on a global image, an
attacker can adopt countermeasures to beat the system. Makandar et al. [1] then
constructed texture feature vectors with multi-resolution and wavelet techniques
for malware image classification via SVM. Huang et al. [6] introduced the R2-D2
method, which feeds color-inspired RGB texture images to a CNN without extracting
pre-selected features. Recently, Kalash et al. [28] exploited a deep learning
approach that converts malware binaries to gray-scale images and subsequently
trains a CNN for classification. In general, these ML-based visualization methods
obtain high accuracy, and their true positive and false positive rates also
indicate good robustness [1, 2]. However, the majority of these studies extract
malicious features only through compression and dimensionality reduction of real
data samples. This brings a serious security threat: if AEs are involved,
traditional ML-based detection cannot successfully identify malware [37].

2.3. Adversarial Techniques in Malware Detection

Adversarial techniques have developed rapidly in the last three years [38].
Adversarial training (AT) was originally proposed by Szegedy's group [9] to
improve the robustness of the model between training and prediction. However, it
requires a computationally expensive inner loop to identify the adversarial
direction. To overcome this problem, Goodfellow et al. [39, 40] proposed an
optimized definition of the adversarial perturbation that approximately computes
the inner loop. Madry et al. [41] showed that AT can be used to defend against
white-box attacks if the perturbation computed during training is close to the
maximum of the model loss. Although a few studies have employed AEs and AT for
malware detection, none of them relied on the ML-based visualization detection
approach. Grosse et al. [7] used AEs in discrete and binary input domains to
mislead a classifier on malware samples, but this attack can only handle the
specific binary features used in Android malware detection. Maiorca et al. [12]
were the first to formally define and quantitatively characterize this
vulnerability on an SWF-file-based malware dataset. To mitigate zero-day attacks
from malware, Kim et al. [13] proposed a transferred generative adversarial
network (tGAN) for automatic classification with visualization-based malware
transformation. The methods mentioned above are not able to deal with the
exponential growth of malware variants and AEs. For these reasons, and given the
increasing importance of adversarial techniques, normal detection methods (even
those with ML-based visualization approaches) can hardly adapt to the
requirements of different detection objects or environments identified in recent
research [37] on malware detection.

3. Methodology

In this section, the proposed method, Visual-AT, for ML-based visualization
detection of malware is described in detail. By using adversarial techniques,
Visual-AT improves not only the effectiveness of the detection model but also its
accuracy and robustness. First, we provide an overview of Visual-AT in Sect. 3.1.
Second, we describe how to craft an AE to simulate a malware variant in
Sect. 3.2. Third, we elaborate on how the detection model can be enhanced with AT
in Sect. 3.3. Finally, building on existing research, we analyze the
computational cost of the proposed Visual-AT method in detail in Sect. 3.4.

3.1. Overview

Fig. 2 shows the flow chart of Visual-AT, which contains five functional modules:
Data preprocessing, Pre-training Process, AE Generation Process, Adversarial
Training & Detection Process, and Output Results. Each module plays a specific
role, and the colored arrows denote the flow direction of the corresponding
dataset. Since researchers usually obtain malware datasets with all of these
existing classes, the proposed method can be applied to a gray-box attack setting
[42].

Figure 2: Flowchart of the proposed Visual-AT method. (Modules shown: Data
preprocessing, where the malware dataset is read as bytes files and malware
binaries, converted to grayscale images, then rescaled and preprocessed;
Pre-training Process; AE Generation Process, producing AE Type 1 to AE Type n
from the original dataset; Adversarial Training & Detection Process, where the
enlarged dataset is passed to the CNN and SVM detection methods; Output Results,
labelling each sample as Malware (M) or Benign sample (B), with malware families
such as Ramnit (R), Lollipop (L), ..., Gatak (G).)

Visual-AT first converts malware code into feature images (or malware images) and
then rescales these transformed images to the same size in the Data preprocessing
stage. After these preprocessing steps (transformation and normalization), the
image samples are used for ML-based malware detection, for example with a CNN or
an SVM. To generate AEs and simulate the corresponding subtle variants of certain
malware types, the original dataset is passed to the Pre-training Process and
Adversarial Training & Detection Process stages after Data preprocessing. If no
AEs are available, the preprocessed dataset is passed to the final stage
directly, which yields only a limited determination. With the help of adversarial
techniques during the Pre-training Process and AE Generation Process stages,
however, the proposed method crafts a subset of targeted AEs to enlarge the
dataset. Finally, Visual-AT is enhanced with a suitable regularization on the
generated dataset. The method is therefore able to intentionally craft subtle
perturbations that simulate malware variants and camouflage from the original
malware dataset via different AE generation methods.
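
The dataset-enlargement step can be illustrated with a short Python sketch. It assumes that each AE generation method is available as a function taking a sample and its label and returning a perturbed sample, and that every AE keeps the true label of the sample it was crafted from, which is the usual convention in adversarial training. The names `enlarge_dataset` and `craft_fns` are hypothetical and do not come from the paper.

```python
def enlarge_dataset(samples, labels, craft_fns):
    """Append one adversarial variant per sample and per crafting method.

    craft_fns is a list of callables craft(x, y) -> perturbed x
    (for example an FGSM-style and a C&W-style generator).
    """
    enlarged_samples = list(samples)
    enlarged_labels = list(labels)
    for craft in craft_fns:
        for x, y in zip(samples, labels):
            enlarged_samples.append(craft(x, y))
            enlarged_labels.append(y)  # the variant keeps its true class
    return enlarged_samples, enlarged_labels


# Toy usage with a stand-in "attack" that inverts normalized pixel values.
invert = lambda x, y: [1 - v for v in x]
data, classes = enlarge_dataset([[0, 1], [1, 1]], ["R", "L"], [invert])
print(len(data), classes)
```

The detector is then trained on the enlarged set instead of the original one, so the perturbed variants act as a regularizer.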

3.2. Generation of Malware Variants

Visual-AT targets the generation of malware variants by exploiting the
purposiveness of AEs. FGSM and the C&W attack are very popular AE-crafting
methods [8, 43] in terms of transferability, robustness, and overhead. Both are
therefore used to generate targeted malicious samples that produce malware
variants and effectively improve the discriminant accuracy.

3.2.1. Visual-AT FGSM method


FGSM [39] is a fast and robust method that performs only a one-step gradient
update along the direction of the sign of the gradient at each pixel. The
perturbation can be expressed as

$$\delta = \varepsilon \cdot \mathrm{sign}(\nabla_x J_\theta(x, l)),$$

where ε represents the distortion between the AEs and the original samples,
sign(·) denotes the sign function, $\nabla_x J_\theta(\cdot, \cdot)$ computes the
gradient of the cost function J with respect to the input x at the current value
of the model parameters θ, and l is the label of x.
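
As a concrete illustration of the perturbation formula, the sketch below applies one FGSM step to a toy logistic-regression detector, for which the gradient of the cross-entropy loss with respect to the input has the closed form (p − l)·w. The model, its weights, and all function names are assumptions made for the example; the paper itself uses CNN and SVM detectors on malware images.

```python
import math


def predict(x, weights, bias):
    """Probability of class 1 under a toy logistic-regression model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def sign(value):
    return (value > 0) - (value < 0)


def fgsm_perturbation(x, label, weights, bias, epsilon):
    """delta = epsilon * sign(grad_x J), with J the cross-entropy loss."""
    p = predict(x, weights, bias)
    gradient = [(p - label) * w for w in weights]  # dJ/dx for this model
    return [epsilon * sign(g) for g in gradient]


def apply_perturbation(x, delta):
    """Add delta and clip so that pixels stay in the normalized range [0, 1]."""
    return [min(1.0, max(0.0, xi + di)) for xi, di in zip(x, delta)]


x = [0.2, 0.8, 0.5]                      # three normalized "pixels"
weights, bias = [1.5, -2.0, 0.0], 0.1
delta = fgsm_perturbation(x, label=1, weights=weights, bias=bias, epsilon=0.1)
x_adv = apply_perturbation(x, delta)
print(predict(x, weights, bias), predict(x_adv, weights, bias))
```

Each pixel moves by at most ε, and the model's confidence in the true label drops after the step, which is the effect the attack relies on.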

Figure 3: Process of Perturbed Visualized Malware. (The original image x,
detected as "Obfuscator.ACY" with 81.3% confidence, is combined with the
perturbation δ = ½(tanh(w) + 1) − x to give x∗ = x + δ, which is detected as
"Benign sample" with 99.7% confidence.)

To achieve accurate detection, the FGSM method is optimised to search for an
index i with maximum gradient, i.e. $\arg\max_i f(x^*) = y^*$, or to stop when
the index reaches the threshold given by the maximum index $i_{\max}$, following
the existing works [39, 8]. Since the quantity of generated AEs is limited for
the training dataset, a random normal distribution [44] is introduced to
fine-tune the distortion parameter (ε) for AT. The perturbations (δ) are
distributed as

$$f(\delta \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(\delta-\mu)^2}{2\sigma^2}},$$

where µ is the expectation of the distribution and σ is the standard deviation.
In this way, Visual-AT can rapidly simulate different types of malware variants
and disguises.
2πσ 2 e 2σ 2
and σ is the standard deviation. Visual-AT can simulate different types of
malware variants and disguise rapidly.

3.2.2. Visual-AT C&W’s attack method


C&W attacks [45] based on L-norm distances, e.g. $l_0$, $l_2$, and $l_\infty$,
are also used in the AE generation module. To optimize the penalty term and the
distance approximation, the basic problem of this targeted attack is expressed as

$$\arg\min_{\delta} \|x^* - x\|, \quad \text{s.t. } f(x^*) \neq f(x),$$

where $\delta = x^* - x$. A new objective function g is then defined as

$$\min_{\delta} \|\delta\|_p + c^* \cdot g^*(x + \delta), \quad \text{s.t. } x + \delta \in [0, 1]^n.$$

Note that the penalty term and distance can be further optimized if and only if
$f(x^*) = y' \neq y$. In this case, the optimization formulation can be modified
as follows. Taking the $l_2$ attack as an example, the original single term
$c^* \cdot g^*(\cdot)$ is divided into two parts, $r \cdot g_r(\cdot)$ and
$d \cdot g_d(\cdot)$, so that
$c^* \cdot g^*(\cdot) = r \cdot g_r(\cdot) + d \cdot g_d(\cdot)$, which denotes
the corresponding loss function in a detector. Additionally, r and d are chosen
via binary search. Algorithm 1 gives the pseudo-code of the optimized C&W
attack-based method for the $l_2$-norm attack, following the existing works
[9, 43]. Fig. 3 shows an example of the AE perturbation process on the MS BIG
dataset, from the correct detection result with an average confidence of 81.3% to
the induced erroneous detection result with an average confidence of 99.7%,
produced by the C&W attack with distortion ε = 0.45.

Algorithm 1 Crafting AEs with C&W's attack-based method

Input: x, y, ε, f(·)
Output: x∗, δ
    x∗ ← x                                                // data preprocessing
    δ = ½(tanh(w) + 1) − x
    while arg min_δ D(x, x + δ) and f(x∗) ≠ y do
        min_w ‖½(tanh(w) + 1)‖₂ + c · g(½(tanh(w) + 1))   // optimize w
        g(x∗) = max_δ(max(Z(x∗)_i) − Z(x∗)_t, −κ)         // Z(·)_i is the softmax function for class i
        c · g(x∗) = r · g_r(x∗) + d · g_d(x∗)             // r and d are chosen via binary search
        if w_max ≤ 0 then
            return Failure
        end if
        δ = ε · ½(tanh(w) + 1) − x                        // update δ
        x∗ ← x + δ
    end while
    return x∗, δ
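
Two ingredients of the C&W-style attack are easy to show in isolation: the change of variables x∗ = ½(tanh(w) + 1), which keeps every pixel in [0, 1] without explicit clipping, and the margin term g. The Python sketch below follows the formulation of the original C&W l2 attack (squared distance to x, and a maximum over classes other than the target t); these details, the toy model, and the function names are assumptions, and the split of c·g into r·g_r + d·g_d and the binary search are not reproduced.

```python
import math


def to_pixel_space(w):
    """Map unconstrained variables w to pixel values in [0, 1]."""
    return [0.5 * (math.tanh(wi) + 1.0) for wi in w]


def margin_term(logits, target, kappa=0.0):
    """g = max(max_{i != t} Z_i - Z_t, -kappa); small once the target class wins."""
    best_other = max(z for i, z in enumerate(logits) if i != target)
    return max(best_other - logits[target], -kappa)


def cw_l2_objective(w, x, model, target, c, kappa=0.0):
    """Squared l2 distortion plus c times the margin term."""
    x_adv = to_pixel_space(w)
    distortion = sum((a - b) ** 2 for a, b in zip(x_adv, x))
    return distortion + c * margin_term(model(x_adv), target, kappa)


# Hypothetical two-class model: the class scores depend on the first pixel only.
toy_model = lambda pixels: [pixels[0], 1.0 - pixels[0]]
print(cw_l2_objective([0.0], [0.5], toy_model, target=1, c=1.0))
```

An optimizer would minimize this objective over w; because tanh is bounded, the resulting x∗ is always a valid normalized image.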

3.3. Optimised ML-based Visualization Malware Detection

Since AT can effectively improve the regularization of the detection model, a
regularized model is incorporated in Visual-AT to achieve better detection
accuracy. Unlike normal data augmentation, AT uses AEs that do not appear
naturally in the test set but that attackers are highly likely to use to reveal
the flaws of detection models. If the AE-based malware variants are submitted
directly to the CNN and SVM detectors, the detection results cannot meet the
security requirement. Therefore, AEs are employed in the AT process to enhance
the detection model with a suitable regularization. On the basis of the existing
works [46, 40], the modified loss function of AT is defined as

$$L_{adv}(X, Y, \theta) = D[H(Y), P(Y \mid X + R_{adv}, \theta)],$$

where
$R_{adv} = \arg\max_{R;\, \|R\| \le \varepsilon} D[H(Y), P(Y \mid X + R, \theta)]$
and θ is the model parameter vector during training. $D[P, P']$ denotes a
non-negative function that represents the distance between two distributions P
and P′. H(Y) is the distribution derived from the training samples, which the
parameterized model strives to approximate. Since the malware's binary indicator
vector X does not possess any particular structural properties or
interdependencies, a regular feed-forward neural network is applied for the
training process. The rectifier F(·) is used as the activation function for each
hidden neuron in the training network, together with standard dropout and
gradient descent. For the normalization of the output probabilities, a softmax
layer [47] is employed, which can be expressed as

$$F_i(X) = \frac{e^{x_i}}{e^{x_0} + e^{x_1}}, \qquad x_i = \sum_{j=1}^{m_n} w_{j,i} \cdot x_j + b_{j,i},$$

where F(·) is an activation function, $x_i$ denotes the i-th input to a neuron, w
is the weight vector in gradient descent, and b represents the corresponding bias
value.
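
The softmax output layer and the worst-case loss used in adversarial training can be sketched as follows. The sketch makes several simplifying assumptions that are not taken from the paper: the distance D is instantiated as the cross-entropy against the true label, there is one bias per output unit, and the inner maximization over R is approximated by trying a small list of candidate perturbations supplied by the caller, who is responsible for keeping them inside the allowed norm ball.

```python
import math


def softmax(logits):
    """Normalize scores into probabilities (shifted for numerical stability)."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def output_probabilities(x, weights, biases):
    """weights[j][i] links input j to output i; biases[i] belongs to output i."""
    logits = [
        sum(weights[j][i] * x[j] for j in range(len(x))) + biases[i]
        for i in range(len(biases))
    ]
    return softmax(logits)


def cross_entropy(probs, label):
    return -math.log(probs[label])


def adversarial_loss(x, label, weights, biases, candidates):
    """Largest loss over the candidate perturbations R added to x."""
    losses = []
    for r in candidates:
        shifted = [xi + ri for xi, ri in zip(x, r)]
        probs = output_probabilities(shifted, weights, biases)
        losses.append(cross_entropy(probs, label))
    return max(losses)


W = [[1.0, -1.0], [0.5, 0.5]]
b = [0.0, 0.0]
x = [0.2, 0.4]
candidates = [[0.0, 0.0], [0.1, 0.0], [-0.1, 0.0]]
print(adversarial_loss(x, 0, W, b, candidates))
```

Training on this worst-case loss, rather than on the loss at x alone, is what gives adversarial training its regularizing effect.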

3.4. Computational cost analysis

For a machine learning-based detection method, the computational cost is also an
important factor that is worth taking into consideration. For the proposed
method, the computational cost mainly depends on the AE generation process [48]
and the training process of the detection model [49]. This section first
discusses the asymptotic analysis of the training process and the AE generation
process, including the time complexity and space complexity of the proposed
Visual-AT method. For the training process of the detection model, the time
complexity can be expressed, according to [49, 50], as

$$Time \sim O\left(\sum_{l=1}^{D} M_l^2 \cdot K_l^2 \cdot C_{l-1} \cdot C_l\right),$$

where M is the edge length of the output feature map of each convolutional
kernel, K denotes the edge length of each convolutional kernel, C represents the
number of channels of each convolutional kernel, D is the number of convolutional
layers of the neural network (i.e., the depth of the network), and l denotes the
l-th convolutional layer. The corresponding space complexity is

$$Space \sim O\left(\sum_{l=1}^{D} K_l^2 \cdot C_{l-1} \cdot C_l\right),$$

which reflects the volume of the method (or model) itself and is related to the
size of the convolution kernel (K), the number of channels (C), and the depth of
the network (D). Following [48, 51] for the AE generation process, the time
complexity of a gradient optimization-based AE generation method can be expressed
as

$$Time \sim O\left(\frac{L P_M^2}{\varepsilon}\right),$$

where L and P are Lipschitz constants associated with the specific norm (‖·‖),
with $L \equiv L_{\|\cdot\|}$ and $P \equiv P_{M,\|\cdot\|}$, and ε denotes the
gradient distortion parameter. Additionally, the corresponding space complexity
can be expressed as

$$Space \sim O\left(\frac{L P_x^2}{\varepsilon}\right).$$
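
The two layer-wise sums for the convolutional model are straightforward to evaluate for a given architecture. The sketch below does so for a hypothetical two-layer network; the layer sizes are invented for illustration, and the results are operation and parameter counts up to constant factors, not measured costs.

```python
def conv_time_cost(layers):
    """Sum of M_l^2 * K_l^2 * C_{l-1} * C_l over the convolutional layers.

    Each layer is a tuple (M, K, C_in, C_out): output feature-map edge,
    kernel edge, input channels, output channels.
    """
    return sum(m * m * k * k * c_in * c_out for m, k, c_in, c_out in layers)


def conv_space_cost(layers):
    """Sum of K_l^2 * C_{l-1} * C_l, i.e. the number of kernel weights."""
    return sum(k * k * c_in * c_out for _, k, c_in, c_out in layers)


# Hypothetical network: 3x3 kernels, 1 -> 16 -> 32 channels.
layers = [(32, 3, 1, 16), (16, 3, 16, 32)]
print(conv_time_cost(layers))   # 1327104
print(conv_space_cost(layers))  # 4752
```

The time term grows with the feature-map size M while the space term does not, which is consistent with the observation that the space complexity reflects only the volume of the model.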

In this work, the proposed Visual-AT method contains not only a CNN-based
training process for the detection model but also a gradient optimization-based
AE generation process. According to the analysis above, the time complexity can
be expressed as

$$Time \sim O\left(\sum_{l=1}^{D} M_l^2 \cdot K_l^2 \cdot C_{l-1} \cdot C_l + \frac{L P_M^2}{\varepsilon}\right).$$

In addition, the corresponding space complexity of the proposed Visual-AT method
can be deduced as

$$Space \sim O\left(\sum_{l=1}^{D} K_l^2 \cdot C_{l-1} \cdot C_l + \frac{L P_x^2}{\varepsilon}\right),$$

where the space complexity reflects the volume of the proposed method and is
independent of the size of the input.
The Visual-AT method presented in this paper is a robust method based on a model
pre-training process with an ML algorithm. In general, the modeling process
accounts for the majority of the computational cost. From the calculation above,
we have

$$O\left(\sum_{l=1}^{D} M_l^2 \cdot K_l^2 \cdot C_{l-1} \cdot C_l\right) \gg O\left(\frac{L P_M^2}{\varepsilon}\right),$$

so the overhead is determined mainly by the model framework selected in the
Visual-AT method. Therefore, for the proposed Visual-AT scheme, the computational
cost can be approximated as
$Time \sim O\left(\sum_{l=1}^{D} M_l^2 \cdot K_l^2 \cdot C_{l-1} \cdot C_l\right)$
and $Space \sim O\left(\sum_{l=1}^{D} K_l^2 \cdot C_{l-1} \cdot C_l\right)$.
Additionally, the computational cost of the detection model's training process is
always consistent with that of the corresponding detection methods. The
experimental results for the AE generation and model training processes of
Visual-AT are discussed in detail in Sect. 4.5.

4. Experiments

In this section, verification and experimental evaluation are conducted in terms
of the effectiveness and performance of Visual-AT. First, this section introduces
the experimental setup in detail, including the necessary routines for
preprocessing the collected datasets. Then, two types of experiments are
conducted: one verifies the feasibility of the proposed Visual-AT method, and the
other tests its extensibility to different detection methods with different
AE-crafting methods. Cross-validation is also used for statistical evaluation to
analyze the reliability of the proposed approach. Finally, by measuring the
computational cost of the proposed method, which includes the training process
and the AE generation process, the experiment validates the flexibility and
reliability of Visual-AT on a platform with real malware datasets. The experiment
programs are implemented in Python 3.6 and MATLAB 2018a under CentOS Linux
release 7.6.1810 (Core). The testbed features an Intel(R) Xeon(R) Silver 4110 CPU
at 2.10 GHz, 128 GB of RAM, and a 16 GB NVIDIA Quadro P5000 GPU with CUDA 10.1.
The detailed descriptions of each experiment are as follows.

4.1. Dataset and Setups

Verification and experimental evaluation are conducted in terms of the
effectiveness and performance of Visual-AT. First, the open-source malware
dataset from the Kaggle Microsoft Malware Classification Challenge (BIG 2015)
[27] is used in this experiment; it consists of 10,678 labeled malware samples in
nine classes. Benign executables are collected by scraping all the valid
executables from freshly installed Linux CentOS 7.6, Windows 10, and iOS 11.3
systems on virtual machines. After checking them against the anti-virus vendors
in the VirusTotal search [52], 1200 benign file samples were selected. The
distribution of these samples is shown in Table 1. Second, to account for the
renewal and variation of malware datasets, another more recent dataset, Ember
2018 (Endgame Malware BEnchmark for Research) [53], is also collected for this
experiment; it is an open-source collection of 1.1 million portable executable
files that were also scanned by VirusTotal. From the Ember 2018 dataset, 4000
training samples (2000 malicious and 2000 benign) and 1000 test samples (500
malicious and 500 benign) are collected for the further experiments.

Table 1: MS BIG dataset with malware class distribution & benign samples.

Types of Malware        Number of Samples
Ramnit (R)              1534
Lollipop (L)            2470
Kelihos ver3 (K3)       2942
Vundo (V)               451
Simda (S)               41
Tracur (T)              685
Kelihos ver1 (K1)       386
Obfuscator.ACY (O)      1158
Gatak (G)               1011
Benign sample (B)       1200

To evaluate the Visual-AT method, this work conducts extensive experiments on
both the BIG 2015 and Ember 2018 datasets. First, Sect. 4.2 presents experiments
designed to validate the effectiveness of the proposed method through a
quantitative comparison between different malware detection methods. Second,
Sect. 4.3 presents experiments that evaluate the performance of Visual-AT by
testing the accuracy of the constructed models during AE generation. Third,
Sect. 4.4 compares different methods for malware detection, including a detection
accuracy comparison between four recent similar works and the proposed Visual-AT.
Finally, Sect. 4.5 presents the computational cost of the proposed method,
including memory usage, volatile GPU utilization, CPU utilization, power usage,
and the time consumption of model training in AE generation.

In these experiments, considering the robustness and expandability of the
neural network architecture, GoogleNet Inception V3 (footnote 2) is adopted in
Visual-AT for AE generation and pre-training. During AE generation and
pre-training, targeted AEs are crafted with the FGSM and C&W’s attack methods
under different values of the distortion parameter ε, and are then used in the
follow-up testing experiments. To verify the effectiveness of the proposed
method, the targeted AEs generated from the BIG dataset are used to compute the
detection accuracy of Visual-AT, which is compared against traditional ML-based
visualization malware detection methods, such as CNN and SVM. The evaluation
metrics of this experiment are the accuracy, false positive rate, and false
negative rate, which are effective for quantitative evaluation. A high accuracy
together with a low false positive rate and a low false negative rate indicates
better performance. Moreover, to evaluate the performance of Visual-AT, the AEs
generated from the Ember dataset are used to test the detection accuracy of the
constructed models during AE generation. By varying the size of the training
dataset and the parameters of the AE generation process, the detection accuracy
results clearly show how performance changes. Malware detection results based
on the different datasets are then elaborated, along with a discussion of the
ability to resist AEs. Finally, by executing the model pre-training process
with datasets of different sizes and evaluating the different AE generation
methods in Visual-AT, the computational cost of the proposed method is also
presented in this section. Parameters and settings are described in detail
below.

2 https://github.com/BartyzalRadek/Multi-label-Inception-net
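The three evaluation metrics named above can be computed from predicted and
true labels as in the following sketch. It assumes the standard binary
definitions with malware as the positive class; the paper does not spell out
its formulas, so the function name `detection_metrics` and the toy labels are
illustrative only.

```python
def detection_metrics(y_true, y_pred):
    """Return (accuracy, false positive rate, false negative rate).

    Labels: 1 = malware (positive), 0 = benign (negative).
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0   # benign flagged as malware
    fnr = fn / (fn + tp) if (fn + tp) else 0.0   # malware missed
    return accuracy, fpr, fnr


# Toy labels: 4 malware and 6 benign samples, one error of each kind.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
print(detection_metrics(y_true, y_pred))
```

Because the false positive and false negative rates are normalized by
different class totals, they need not sum with the accuracy to 100%.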

4.2. Quantitative Comparison for Different Malware Detection Methods

4.2.1. Results of Malware Detection with Pure Dataset


A pure dataset drawn from the BIG dataset, without AEs, is used first; it
contains only the transformed malware samples. After data pre-processing, the
selected samples are partitioned for 10-fold cross-validation. The data are
then fed to the SVM- and CNN-based malware detectors and to the proposed
Visual-AT enhanced detectors. During the pre-training procedure, two different
AE crafting methods are used to generate AEs for Visual-AT. The distortion
parameter ε is 0.5 for FGSM and 0.35 for C&W’s attack. The SVM-based detector
uses the parameters γ = 1, C = 1.0 and k = 10, while the CNN-based detector
uses the default parameters of GoogleNet Inception V3.
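The 10-fold partitioning mentioned above can be sketched as follows. This is an
illustrative helper, not the authors' code: the name `k_fold_indices`, the
shuffling seed, and the sample count of 10,678 (the malware samples of the BIG
dataset) are assumptions.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding assigns every index to exactly one fold; sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(10678, k=10))
for fold_no, (train_idx, test_idx) in enumerate(splits, start=1):
    print(fold_no, len(train_idx), len(test_idx))
```

Each sample is used for testing exactly once and for training in the other
nine folds, so the reported figures are averages over ten train/test runs.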

Table 2 lists the comparison results, including the accuracy, false positive
rate, and false negative rate. For the SVM-based detector, Visual-AT raises the
average accuracy by 8.64%, while the average false positive and false negative
rates drop to 3.65% and 2.31%, respectively. For the CNN-based detector, the
corresponding figures are 4.47%, 2.93%, and 1.70%. Compared to both traditional
ML models, Visual-AT obtains not only an average accuracy increase of 7.41% but
also a 68.17% decrease in the false positive rate. In addition, in the SVM
detector the model using FGSM-based AEs is slightly less accurate than the one
using C&W’s attack-based AEs (94.55% versus 96.18%). In the CNN detector, by
contrast, the model using FGSM-based AEs outperforms the one using C&W’s
attack-based AEs (97.56% versus 96.25%). These findings are attributed to the
difference between gradient descent and norm optimization in the AE generation
methods.

Table 2: The detection results of different detectors with the pure dataset
(Acc. = accuracy, FPR = false positive rate, FNR = false negative rate)

Method                                        SVM Acc.  SVM FPR  SVM FNR  CNN Acc.  CNN FPR  CNN FNR
Normal Methods                                87.13%    11.57%   4.82%    92.16%    8.62%    3.27%
Visual-AT (our method), FGSM                  94.55%    5.82%    3.76%    97.56%    3.56%    2.10%
Visual-AT (our method), C&W’s attack, l0      96.18%    2.45%    1.48%    96.43%    2.61%    1.31%
Visual-AT (our method), C&W’s attack, l2      95.80%    3.76%    2.35%    96.07%    3.19%    2.13%
Visual-AT (our method), C&W’s attack, l∞      96.53%    2.58%    1.67%    96.24%    2.45%    1.47%

Compared with traditional detection methods, such as code analysis and data
signatures in related work [2, 3, 54], the ML-based visualization detection
methods improve the accuracy by about 20% on average, reaching 87.13% for the
SVM-based detector and 92.16% for the CNN-based one according to Table 2.
Furthermore, adopting the adversarial techniques improves the detection results
further. Table 2 compares the normal ML-based visualization detectors with
Visual-AT, and the latter performs better: the detection accuracy of Visual-AT
is 5%∼10% higher and its false positive rate 5%∼8% lower than those of the
traditional methods. The CNN-based detector without AT has an average accuracy
of 92.16%, whereas Visual-AT raises it to 96.53%. For the SVM-based detector,
the corresponding values are 87.13% and 95.76%.

4.2.2. Results of Malware Detection with AEs


The sample set used above is based on the real dataset without crafted AEs.
Since recent works have found that the threat posed by AEs to ML and Artificial
Intelligence (AI) is steadily growing [43, 37], further experiments are carried
out using datasets with AEs. A 10-fold cross-validation test is again used to
estimate the performance of the proposed method: a malware AE set of the same
size as one fold is inserted into the dataset, replacing the held-out subset
used for prediction testing. In each sub-test, different AE generation methods
are used for the AE malware set and for the AE generation process inside the
Visual-AT detector. For example, when the AE malware set is crafted by the
FGSM-based method, the C&W’s attack-based AE generation process is selected in
the Visual-AT detector. The results, averaged over the 10 folds, are shown in
Table 3. Compared with the results on the pure dataset in Table 2, the
discriminant performance of traditional ML models deteriorates significantly
under interference from AEs. The accuracy drops from 87.13% to 60.23% in the
SVM-based detector and from 92.16% to 54.16% in the CNN-based one. The average
false positive rate rises from 11.57% to 34.59% and from 8.62% to 48.62% in the
SVM- and CNN-based detectors, respectively. These gaps indicate that AEs
severely degrade the performance of traditional ML-based visualization methods
in malware detection.

Table 3: The detection results of different detectors with the obstructed dataset
(Acc. = accuracy, FPR = false positive rate, FNR = false negative rate)

Method                                        SVM Acc.  SVM FPR  SVM FNR  CNN Acc.  CNN FPR  CNN FNR
Normal Methods                                60.23%    34.59%   8.82%    54.16%    48.62%   7.63%
Visual-AT (our method), FGSM                  95.37%    5.12%    2.96%    97.73%    3.17%    1.58%
Visual-AT (our method), C&W’s attack, l0      96.72%    2.45%    1.48%    97.14%    1.98%    1.05%
Visual-AT (our method), C&W’s attack, l2      95.80%    3.47%    2.13%    96.26%    3.18%    2.17%
Visual-AT (our method), C&W’s attack, l∞      96.21%    2.47%    1.71%    96.56%    2.14%    1.52%

More importantly, the proposed Visual-AT method shows strong robustness and
performs even better when AEs are involved in the dataset. Compared with the
results of the normal ML-based visualization methods in Tables 2 and 3,
Visual-AT achieves a 1.59× and a 1.78× detection-accuracy increase over the
SVM-based and CNN-based detectors, respectively. In addition, the false
positive rate drops markedly in both ML-based detectors, from 34.59% to 3.36%
in SVM and from 48.62% to 2.69% in CNN. Since AEs can be regarded as potential
malware variants, these results indicate that the proposed method is able to
defend against the threat of malware variants.
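The relative improvements quoted above can be re-derived from the rows of
Table 3 as in the following sketch. Averaging over the four Visual-AT rows is
an assumption about how the quoted figures were obtained, so the output may
differ from the text in the last digit; the dictionary layout and the function
name `summarize` are illustrative.

```python
from statistics import mean

# (accuracy %, false positive rate %) taken from Table 3.
table3 = {
    "SVM": {
        "normal": (60.23, 34.59),
        "visual_at": [(95.37, 5.12), (96.72, 2.45), (95.80, 3.47), (96.21, 2.47)],
    },
    "CNN": {
        "normal": (54.16, 48.62),
        "visual_at": [(97.73, 3.17), (97.14, 1.98), (96.26, 3.18), (96.56, 2.14)],
    },
}


def summarize(entry):
    """Average the Visual-AT rows and compare them with the normal detector."""
    base_acc, base_fpr = entry["normal"]
    avg_acc = mean(acc for acc, _ in entry["visual_at"])
    avg_fpr = mean(fpr for _, fpr in entry["visual_at"])
    return {
        "avg_accuracy": avg_acc,
        "accuracy_ratio": avg_acc / base_acc,   # how many times more accurate
        "avg_fpr": avg_fpr,
        "fpr_drop": base_fpr - avg_fpr,         # absolute drop in percentage points
    }


summary = {name: summarize(entry) for name, entry in table3.items()}
for name, s in summary.items():
    print(name, {key: round(value, 2) for key, value in s.items()})
```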

4.3. Performance Analysis of Visual-AT

To further analyze the performance of Visual-AT, the accuracy of the
constructed models is tested with different training-dataset sizes and
different parameter values during AE generation. Considering how quickly
malware has evolved in recent years, the relatively recent open-source Ember
2018 dataset is used for this evaluation.

4.3.1. The number of AEs in dataset

In general, the robustness and accuracy of an ML-based malware detector are
affected by the size of the training dataset, and a larger dataset usually
leads to higher prediction accuracy. For the FGSM method, AE sets of different
sizes (n = 100, 200, 500, 1000, 1500) are added to the experiment dataset, and
the accuracy of Visual-AT in the SVM- and CNN-based detectors is tested with
ε = 0.5. The relationship between the accuracy and the number of AEs is shown
in Fig. 4. The rising curves indicate that the accuracy of each detection model
grows with the number of generated AEs.

When the number of AEs in the training dataset reaches 1000, the accuracies of
the two enhanced detectors reach 96.81% and 97.39%, respectively, as shown in
Fig. 4. These results are much higher than those of the traditional CNN and SVM
detectors, whose detection results average below 86% in related references.
However, the accuracy does not keep rising as the AE set grows: once the number
of AEs exceeds a certain threshold, the accuracy drops. This phenomenon results
from over-fitting of the detection model as the number of training samples
grows. Therefore, a reasonable AE set size is of great importance for the
accuracy of the proposed Visual-AT.

Figure 4: The detectors’ accuracy with different number scale (accuracy in %
versus number of AEs, 0 to 1600, for the SVM and CNN detectors)

4.3.2. The perturbation parameter of AE’s generation in dataset

Another important factor, the distortion parameter ε, is analyzed next to
investigate how it affects the discriminant accuracy. Firstly, the FGSM-based
method is used to generate a series of AEs with n = 500. These AE sets are then
used to test the detection accuracy of the enhanced detectors, including SVM
and CNN. The distortion is varied from 0.1 to 0.7 during AE generation, and the
resulting AEs are used to measure the accuracy of each detector, as shown in
Fig. 5.

Figure 5: The detectors’ accuracy with different distortion value (accuracy in
% versus distortion ε, 0.1 to 0.7, for the SVM and CNN detectors)

When the distortion (ε) is set to 0.5, the accuracies of the two methods reach
94.61% and 97.10%, respectively. Although the detectors do not reach a
satisfactory accuracy with a small distortion (ε), increasing ε during AE
generation improves performance. In Fig. 5, the two lines (red for SVM and
green for CNN) rise gradually as the distortion strength increases. However,
once the distortion parameter passes a certain point (ε > 0.5), the accuracy
levels off or even decreases. Although these fluctuations are smaller than
0.5%, their influence on accuracy cannot be ignored. As with finding the
optimal interval in parameter optimization, the detection accuracy is close to
optimal when the distortion (ε) is set between 0.45 and 0.60.

Since the distortion value directly determines the difference between AEs and
the original samples, it can be inferred that as the distortion (ε) increases,
the AEs deviate more and more from the original samples. In general, the larger
the distortion parameter (ε), the easier it is to achieve the desired attack.
Even worse, for the pre-training process of the detection model, excessive
perturbation can reduce rather than improve the accuracy, i.e. the over-fitting
phenomenon. Therefore, an appropriate perturbation with a suitable distortion
value (ε) for AE generation is also of great significance to the Visual-AT
method.
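How ε controls the gap between an AE and its original sample can be seen in a
minimal FGSM sketch. The example below applies single-step, untargeted FGSM to
a toy logistic-regression model in pure Python; it is not the Inception V3
pipeline used in the experiments, and the weights, the four-pixel input, and
the helper names (`predict`, `fgsm`, `linf`) are assumptions chosen for
illustration.

```python
import math


def predict(x, w, b):
    """Probability of class 1 under a logistic-regression model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def fgsm(x, y, w, b, eps):
    """Single-step FGSM: x_adv = clip(x + eps * sign(dLoss/dx), 0, 1).

    For cross-entropy loss on a logistic model, dLoss/dx = (p - y) * w.
    """
    p = predict(x, w, b)
    grad = [(p - y) * wi for wi in w]
    sign = [(g > 0) - (g < 0) for g in grad]
    return [min(1.0, max(0.0, xi + eps * si)) for xi, si in zip(x, sign)]


def linf(a, b):
    """Largest per-pixel difference between two samples."""
    return max(abs(ai - bi) for ai, bi in zip(a, b))


# Toy "image" of four pixel intensities in [0, 1]; weights are illustrative.
x = [0.5, 0.5, 0.5, 0.5]
w = [2.0, -1.0, 0.5, -3.0]
b = 0.0
y = 1  # true label

for eps in (0.1, 0.3, 0.5, 0.7):
    x_adv = fgsm(x, y, w, b, eps)
    print(eps, round(linf(x, x_adv), 3), round(predict(x_adv, w, b), 3))
```

The per-pixel change equals ε until clipping to the valid intensity range takes
over, which is one reason why ever larger ε values produce samples that drift
far from the originals.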

4.4. Comparison

To investigate the advantages and disadvantages of the Visual-AT method, four
recent studies in this field are selected for comparison. The first is Random
Forest (RF) [33], a decision tree-based ensemble learning method, combined with
a visualization technique for malware detection. The second is the Lempel-Ziv
Jaccard Distance (LZJD) with kNN method [55], which is based on the normalized
compression distance. Since CNNs are widely used, two recent CNN-based works,
M-CNN [28] and Lanzcos-CNN [56], are also included. For the proposed Visual-AT
method, FGSM is selected for AE generation and GoogleNet Inception V3 for both
AE pre-training and malware detection. The whole MS BIG dataset, containing the
pre-processed and transformed malware samples, is used in this comparison. All
selected samples are partitioned for both 5-fold and 10-fold cross-validation
to measure the accuracy.

After running three groups of experiments for each cross-validation test under
the same settings, the average accuracies of the four related works and of
Visual-AT are obtained, as listed in Table 4. All five methods obtain a high
accuracy, and the highest average accuracies of both M-CNN and Visual-AT exceed
98%. The CNN-based detection methods (M-CNN, Lanzcos-CNN, and Visual-AT)
generally obtain a higher accuracy than the methods without a neural network
(RF and LZJD with kNN); the methods based on a tree structure or on nearest
neighbors with an optimal distance are slightly less accurate. Meanwhile,
comparing the accuracy under 5-fold and 10-fold cross-validation, M-CNN, Random
Forest, and Visual-AT are highly stable, with deviations of 0.82%, 0.70% and
0.65%, respectively. In other words, these methods are robust enough to be
applied to real-world detection. Therefore, even though the proposed method
(Visual-AT) is not the most accurate method in every setting, the comparison in
Table 4 shows that it offers good robustness and high accuracy in detection.
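The stability figures quoted above are the absolute differences between the
5-fold and 10-fold accuracies in Table 4. The short sketch below recomputes
them for all five methods; the dictionary layout is an illustrative choice.

```python
# (5-fold accuracy %, 10-fold accuracy %) taken from Table 4.
table4 = {
    "Random Forest": (94.43, 95.13),
    "LZJD with kNN": (95.47, 96.78),
    "M-CNN": (97.23, 98.05),
    "Lanzcos-CNN": (96.72, 95.61),
    "Visual-AT": (98.21, 97.56),
}

# Absolute gap between the two cross-validation settings, in percentage points.
deviation = {
    method: round(abs(acc5 - acc10), 2)
    for method, (acc5, acc10) in table4.items()
}

# Smaller gaps indicate results that depend less on the fold count.
for method, gap in sorted(deviation.items(), key=lambda item: item[1]):
    print(f"{method:<15} {gap:.2f}")
```

The two methods not named in the text, LZJD with kNN and Lanzcos-CNN, are the
ones with gaps above one percentage point.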

Table 4: The comparison of different methods for malware detection

Method                        5-fold Accuracy   10-fold Accuracy
Random Forest (2016) [33]     94.43%            95.13%
LZJD with kNN (2017) [55]     95.47%            96.78%
M-CNN (2018) [28]             97.23%            98.05%
Lanzcos-CNN (2019) [56]       96.72%            95.61%
Visual-AT (ours)              98.21%            97.56%

4.5. Computational Cost Results

As an ML-based method, the computational cost of the proposed method mainly
depends on the AE generation process and the training process of the detection
model, where modeling and training generally account for the majority of the
cost. The cost of training the detection model is in general the same as for
other comparable detection methods. In this work, the GoogleNet Inception V3
framework is selected for the training and modeling process of Visual-AT.
Therefore, this subsection discusses in detail the additional computational
overhead introduced by AE generation, especially the model pre-training
process.
To present the computational cost clearly, both the BIG 2015 dataset (with
11,878 samples) and the Ember 2018 dataset (with 5,000 samples) are used to run
the model pre-training with different numbers of iterations in AE generation.
The testbed features a CPU at 2.10GHz, 128GB RAM, and a 16GB GPU. First, the
Ember dataset with 5,000 samples is used to execute the model training process.
To expose the difference in overhead, the numbers of training iterations are
chosen an order of magnitude apart, i.e., 100 and 1,000. Table 5 lists the
results in detail, including the Volatile GPU-Util, GPU memory usage, Power:
Usage/Cap, GPU fan, and time consumption. These overhead results are average
values obtained from three groups of experiments, and can represent the general
attributes of Visual-AT. Moreover, to account for sample size and dataset
differences, the BIG dataset with 11,878 samples is also run through the model
training process with the same iteration counts; the results are likewise given
in Table 5.

Table 5: Computational cost results for model training of Visual-AT

Sample Size  Training Iteration  Volatile GPU-Util (%)  GPU Memory Usage     Pwr: Usage/Cap (W)  GPU Fan (%)  Time Consumption (s)
5000         100                 39%                    15631MiB/16273MiB    54W/180W            26%          48.92s
5000         1000                42%                    15631MiB/16273MiB    58W/180W            36%          467.54s
11878        100                 32%                    15631MiB/16273MiB    57W/180W            27%          91.71s
11878        1000                43%                    15631MiB/16273MiB    69W/180W            37%          903.11s

Even when the sample size exceeds ten thousand and the number of iterations is
one thousand, the computational cost remains moderate, e.g., 43% for Volatile
GPU-Util and 69W/180W for power usage. Generally, 100 iterations are sufficient
to construct a pre-training model. As listed in Table 5, the group with 11,878
samples and 100 iterations takes only 32% Volatile GPU-Util and 91.71 seconds
for the whole training process. Consequently, across all the experiment groups
in Table 5, the computational cost of the proposed method is within an
acceptable range.
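The time figures in Table 5 can be normalized per iteration to see how the
training cost scales. The sketch below is simple arithmetic on the reported
values; the variable names and the choice of normalization are illustrative
and not part of the original evaluation.

```python
# (sample size, training iterations, time in seconds) taken from Table 5.
runs = [
    (5000, 100, 48.92),
    (5000, 1000, 467.54),
    (11878, 100, 91.71),
    (11878, 1000, 903.11),
]

# Seconds spent per training iteration for each configuration.
per_iteration = {
    (size, iters): seconds / iters for size, iters, seconds in runs
}

for (size, iters), secs in per_iteration.items():
    print(f"{size:>6} samples, {iters:>4} iterations: {secs:.3f} s/iteration")

# Cost growth when the iteration count rises tenfold at a fixed sample size.
for size in (5000, 11878):
    growth = (per_iteration[(size, 1000)] * 1000) / (per_iteration[(size, 100)] * 100)
    print(f"{size} samples: 10x iterations -> {growth:.2f}x time")
```

On these figures the time per iteration stays roughly constant for a given
sample size, so total training time grows close to linearly with the number of
iterations.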
Furthermore, to cover the whole overhead of Visual-AT, this experiment also
measures the computational cost of the AE generation process for the two
generation methods (FGSM and C&W’s attack) after model pre-training, even
though the pre-training accounts for the main cost in Visual-AT. The same
distortion parameter (ε = 0.5) is used to generate AEs with a pre-trained
detection model. Table 6 lists the computational cost of AE generation with the
two methods. The results include CPU utilization (CPU percent), memory usage,
and consumption time, and are mean values obtained from three groups of
identical experiments. The FGSM-based method takes 34.5% CPU utilization, 4.4%
memory usage (5.648E+09/1.288E+11), and 0.220 seconds on average. The C&W’s
attack-based method also has a low average cost, with 35.7% CPU utilization,
4.5% memory usage (5.659E+09/1.288E+11), and 9.603 seconds of consumption time.

Table 6: Computational cost results for different AE generation process

AE generation Method          CPU Utilization (%)  Memory Usage (%)  Consumption Time (s)
FGSM-based method             34.5%                4.4%              0.220s
C&W’s attack based method     35.7%                4.5%              9.603s

Comparing the two methods in Table 6, the FGSM-based method is more efficient
than the C&W’s attack-based method, most notably in consumption time (0.220s
versus 9.603s), while their CPU utilization and memory usage are nearly the
same. Therefore, although Visual-AT, as an ML-based detection method,
introduces some additional computational overhead, the measured costs remain
within a reasonable range.
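Measurements of this kind, averaged over three runs, could be collected with a
small harness such as the one below. It is a sketch using only the Python
standard library and is not the tooling used for Table 6: it reports wall-clock
time, process CPU time, and peak Python-level memory via `tracemalloc`, which
is not the same quantity as the system-wide memory percentage in the table. The
names `measure` and `placeholder_ae_generation` are hypothetical.

```python
import time
import tracemalloc
from statistics import mean


def measure(fn, repeats=3):
    """Run fn() `repeats` times and return mean wall time, CPU time, peak memory."""
    wall, cpu, peak = [], [], []
    for _ in range(repeats):
        tracemalloc.start()  # note: tracing itself adds some run-time overhead
        wall_start = time.perf_counter()
        cpu_start = time.process_time()
        fn()
        wall.append(time.perf_counter() - wall_start)
        cpu.append(time.process_time() - cpu_start)
        peak.append(tracemalloc.get_traced_memory()[1])  # (current, peak) -> peak
        tracemalloc.stop()
    return {"wall_s": mean(wall), "cpu_s": mean(cpu), "peak_bytes": mean(peak)}


def placeholder_ae_generation():
    # Stand-in workload; a real run would call the AE generator here.
    return sum(i * i for i in range(200_000))


stats = measure(placeholder_ae_generation)
print(stats)
```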

5. Discussion

This section discusses how the adversarial technique can enhance ML-based
visualization malware detectors and provides some defensive strategies against
AE attacks.

First of all, the results in Table 3 show that the detection methods based on
SVM are more robust against AE attacks. For instance, when AEs exist in the
testing dataset, the average accuracy of the SVM-based detector is generally
higher than that of the CNN-based detector. Since the generation of AEs is
based on a DNN with a hierarchical structure, it is reasonable to assume, given
the experiment results in Sect. 4, that the algorithm structure influences the
detection accuracy. Comparing the linear structure-based SVM detector with the
hierarchical structure-based CNN detector, the former achieves better accuracy
and is more robust than the latter. This phenomenon can be attributed to the
decision feature of a linear algorithm and the application of the kernel
function. Although the CNN-based deep learning method (including Visual-AT) is
vulnerable to attacks from malware AEs, the detection accuracy can be improved
through adversarial training with an enlarged dataset and suitable
regularization in the pre-training process. At present, no published work has
validated that the structure of different algorithms directly has a large
influence on the accuracy of a detection model facing AEs. One can nevertheless
infer that the similar hierarchical discriminant structure shared by the AE
generation and malware detection algorithms is an important factor in inducing
the misjudgment of malware detection.
Secondly, since malware samples differ between datasets such as BIG 2015 and
Ember 2018, this research also analyzes the potential threat from different
data samples. The mean distance Dx∗ between the original inputs (x) and normal
AE samples (i.e., x∗, which simulate many different types of benign or malware
samples) is compared with the mean distance Dxt between the original inputs (x)
and the corresponding simulated pseudo-benign AEs (i.e., malware AEs xt that
simulate the targeted benign sample). The calculation shows that Dx∗ and Dxt
have similar values, with a ratio of 1:1.17. The AEs (x∗) generated for
obstructing malware detectors are very similar to the pseudo-benign AEs (xt),
and the distribution of these interference values is close to uniform.
Therefore, it is easy for the targeted AEs (xt) to steer the detection result
of the detectors. In practice, it is rather difficult to distinguish normal
samples from different AEs. Since the AE generation process follows the optimal
gradient toward the target category, whatever the target label is (even an
irrelevant one), the detector is eventually induced to produce the result
desired by the attackers. From this discussion, it follows that the differences
between samples from different datasets do not influence the final detection
results.
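A mean-distance comparison of the kind described above can be sketched as
follows. The text does not state which distance is used, so the Euclidean (l2)
distance here is an assumption, and the two-dimensional toy vectors are
placeholders rather than real samples; `mean_distance` is a hypothetical helper
name.

```python
import math


def l2(a, b):
    """Euclidean distance between two flattened samples."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def mean_distance(originals, adversarials):
    """Mean pairwise distance between each original and its AE."""
    if len(originals) != len(adversarials):
        raise ValueError("each original needs exactly one adversarial example")
    return sum(l2(x, xa) for x, xa in zip(originals, adversarials)) / len(originals)


# Placeholder data: originals x, untargeted AEs x*, targeted pseudo-benign AEs xt.
originals = [[0.0, 0.0], [1.0, 1.0]]
x_star = [[0.3, 0.4], [1.0, 0.5]]
x_t = [[0.6, 0.8], [1.0, 0.8]]

d_star = mean_distance(originals, x_star)
d_t = mean_distance(originals, x_t)
print(f"D_x* : D_xt = 1 : {d_t / d_star:.2f}")
```

A ratio close to 1 means that targeted and untargeted AEs sit at a similar
distance from the originals, which is the sense in which the two kinds of AEs
are described as hard to tell apart.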
Thirdly, the Visual-AT method can simulate malware camouflage and variants.
Unlike normal dataset augmentation, AEs do not naturally appear in the training
set. Therefore, AT can be used to improve the robustness of detection models
and to fix their flaws before attackers expose them. Figs. 4 and 5 indicate
that the accuracy and detection ability of the detectors improve as the number
of AEs and the intensity of the distortion parameter increase. However, the
accuracy does not grow without limit. When the number of AEs or the intensity
of distortion reaches a certain value, the accuracy stops rising, and then
levels off or even decreases by degrees. Given how the accuracy varies with the
set size and the distortion value in the experiments above, the phenomenon can
be attributed to over-fitting of the detection model when the training
parameters are too large. Therefore, it is of great importance to choose a
suitable sample-set size and a reasonable distortion intensity for a specific
ML-based malware detector in the Visual-AT method.
In summary, according to the above analysis of the adversarial technique, three
factors affect the accuracy and robustness of malware detection. Firstly, the
algorithmic structure of the detector, such as a linear or hierarchical
optimization method. Secondly, the similarity between different AEs. Thirdly,
the parameter settings, such as the distortion (ε) and the scale of the AE
dataset (n).

6. Conclusion

Malware increasingly poses a serious security threat to the computer industry,
especially in AI and IoT. As far as we know, this paper proposes the first
generic ML-based visualization method, named Visual-AT, to detect malware and
its variants in a universal binary format. In addition, it employs the AT
technique to detect and analyze originally difficult-to-identify malware, as
well as potential variants, using the transformed image data and two ML
algorithms. Experimental results demonstrate that the accuracy and robustness
of the proposed method are greatly improved over the existing ML-based malware
detectors. Visual-AT achieves an accuracy of up to 97.73%, and 96.56% on
average, for the malware cases tested. Compared with some recent similar
methods and the traditional ML-based visualization detection methods, Visual-AT
obtains a 28.14% detection-accuracy increase and an 81.17% false positive rate
reduction on average.

In future work, since the AEs are trained per classification problem with a
horizontal set of labels, an interesting topic will be to extend Visual-AT to
new scenarios, such as a hierarchical set of labels (e.g. Animal-Dog-Poodle).
Moreover, Visual-AT could be applied to other fields, such as sound and speech
processing to help people with recognition impairments. Furthermore, how to
optimize the Visual-AT method so that it generalizes to all possible types of
malware attacks is also a challenging topic worth studying, for example the
defense against a specific attack such as Distributed Denial of Service (DDoS).
Finally, the Visual-AT method is expected to have a wide range of applications
in machine learning and computer security.

Acknowledgment

This work is supported by the National Natural Science Foundation of China
under Grants No. 61874042, No. 61472125, and No. 61602107, the Hu-Xiang Youth
Talent Program under Grant No. 2018RS3041, the Key Research and Development
Program of Hunan Province under Grant No. 2019GK2082, and the Peng Cheng
Laboratory Project of Guangdong Province PCL2018KP004.

References

[1] A. Makandar, A. Patrot, Malware class recognition using image processing
techniques, in: Analytics and Innovation (ICDMAI), 2017 International
Conference on Data Management, IEEE, 2017, pp. 76–80.

[2] E. Gandotra, D. Bansal, S. Sofat, Malware analysis and classification: A
survey, Journal of Information Security 5 (02) (2014) 56.

[3] A. Makandar, A. Patrot, Overview of malware analysis and detection,
in: IJCA proceedings on national conference on knowledge, innovation in
technology and engineering, NCKITE, Vol. 1, 2015, pp. 35–40.

[4] L. Nataraj, S. Karthikeyan, G. Jacob, B. Manjunath, Malware images:
visualization and automatic classification, in: Proceedings of the 8th
international symposium on visualization for cyber security, ACM, 2011,
p. 4.
[5] K. S. Han, J. H. Lim, B. Kang, E. G. Im, Malware analysis using visualized
images and entropy graphs, International Journal of Information Security
14 (1) (2015) 1–14.

[6] T. H.-D. Huang, C.-M. Yu, H.-Y. Kao, R2-d2: Color-inspired convolutional
neural network (cnn)-based android malware detections, arXiv preprint
arXiv:1705.04448.

[7] K. Grosse, N. Papernot, P. Manoharan, M. Backes, P. McDaniel, Adversarial
examples for malware detection, in: European Symposium on Research in
Computer Security, Springer, 2017, pp. 62–79.

[8] X. Yuan, P. He, Q. Zhu, R. R. Bhat, X. Li, Adversarial examples: Attacks
and defenses for deep learning, arXiv preprint arXiv:1712.07107.

[9] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow,
R. Fergus, Intriguing properties of neural networks, arXiv preprint
arXiv:1312.6199.

[10] X. Liu, J. Zhang, Y. Lin, H. Li, ATMPA: attacking machine learning-based
malware visualization detection methods via adversarial examples, in:
Proceedings of the International Symposium on Quality of Service, IWQoS
2019, Phoenix, AZ, USA, June 24-25, 2019., 2019, pp. 38:1–38:10.

[11] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair,
A. Courville, Y. Bengio, Generative adversarial nets, in: Advances in neural
information processing systems, 2014, pp. 2672–2680.

[12] D. Maiorca, B. Biggio, M. E. Chiappe, G. Giacinto, Adversarial detection
of flash malware: Limitations and open issues, arXiv preprint
arXiv:1710.10225.

[13] J.-Y. Kim, S.-J. Bu, S.-B. Cho, Malware detection using deep transferred
generative adversarial networks, in: International Conference on Neural
Information Processing, Springer, 2017, pp. 556–564.
[14] B. Kolosnjaji, A. Zarras, G. Webster, C. Eckert, Deep learning for
classification of malware system call sequences, in: Australasian Joint Conference
on Artificial Intelligence, Springer, 2016, pp. 137–149.

[15] D. Ucci, L. Aniello, R. Baldoni, Survey of machine learning techniques for
malware analysis, Computers & Security.

[16] B. Devyani, B. Poonam, Malware classification and machine learning: A
survey, International Journal of Latest Research in Engineering and Technology
(IJLRET) 2 (10) (2016) 5.

[17] R. Pascanu, J. W. Stokes, H. Sanossian, M. Marinescu, A. Thomas, Malware
classification with recurrent networks, in: 2015 IEEE International
Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE,
2015, pp. 1916–1920.

[18] S. Tobiyama, Y. Yamaguchi, H. Shimada, T. Ikuse, T. Yagi, Malware
detection with deep neural network using process behavior, in: 2016 IEEE
40th Annual Computer Software and Applications Conference (COMPSAC),
Vol. 2, IEEE, 2016, pp. 577–582.

[19] M. Schuster, K. K. Paliwal, Bidirectional recurrent neural networks, IEEE
Transactions on Signal Processing 45 (11) (1997) 2673–2681.

[20] D. Kong, G. Yan, Discriminant malware distance learning on structural
information for automated malware classification, in: Proceedings of the
19th ACM SIGKDD international conference on Knowledge discovery and
data mining, ACM, 2013, pp. 1357–1365.

[21] H. Xiao, T. Stibor, A supervised topic transition model for detecting
malicious system call sequences, in: Workshop on Knowledge Discovery,
2011.

[22] B. Kolosnjaji, A. Zarras, T. Lengyel, G. Webster, C. Eckert, Adaptive
semantics-aware malware classification, in: International Conference on
Detection of Intrusions & Malware, 2016.

[23] X. Han, C. Eckert, Efficient online sequence prediction with side information,
in: IEEE International Conference on Data Mining, 2013.

[24] J. Pfoh, C. Schneider, C. Eckert, Leveraging String Kernels for Malware
Detection, 2013.

[25] A. Mohaisen, O. Alrawi, M. Mohaisen, Amal: High-fidelity, behavior-based
automated malware analysis and classification, Computers & Security 52
(2015) 251–266.

[26] P. Parmuval, M. Hasan, S. Patel, Malware family detection approach using
image processing techniques: Visualization technique, International Journal
of Computer Applications Technology and Research 07 (2018) 129–132.
doi:10.7753/IJCATR0703.1004.

[27] L. Wang, J. Liu, X. Chen, Microsoft malware classification challenge (big
2015) first place team: Say no to overfitting (2015).

[28] M. Kalash, M. Rochan, N. Mohammed, N. D. B. Bruce, Y. Wang, F. Iqbal,
Malware classification with deep convolutional neural networks, in: IFIP
International Conference on New Technologies, Mobility and Security, 2018,
pp. 1–5.

[29] J. Zhang, Z. Qin, H. Yin, L. Ou, K. Zhang, A feature-hybrid malware
variants detection using cnn based opcode embedding and bpnn based api
embedding, Computers & Security 84 (2019) 376–392.

[30] K. Kancherla, S. Mukkamala, Image visualization based malware detection,
in: Computational Intelligence in Cyber Security, 2013.

[31] L. Liu, B. Wang, Malware classification using gray-scale images and ensemble
learning, in: International Conference on Systems and Informatics, 2017,
pp. 1018–1022.

[32] S. Pai, F. Di Troia, C. A. Visaggio, T. H. Austin, M. Stamp, Clustering
for malware classification, Journal of Computer Virology and Hacking
Techniques 13 (2) (2017) 95–107.

[33] F. C. C. Garcia, I. F. P. Muga, Random forest for malware classification.

[34] K. Kosmidis, C. Kalloniatis, Machine learning and images for malware
detection and classification, in: Proceedings of the 21st Pan-Hellenic Conference
on Informatics, ACM, 2017, p. 5.

[35] S. A. Mohd, M. S. Bin, A. M. Mohd, Classification of malware family
using decision tree algorithm, in: Innovations in Computing Technology
and Applications, Vol. 02, 2017, pp. 1–8.

[36] D. Lee, I. S. Song, K. J. Kim, J.-h. Jeong, A study on malicious codes
pattern analysis using visualization, in: Information Science and Applications
(ICISA), 2011 International Conference on, IEEE, 2011, pp. 1–5.

[37] X. Liu, Y. Lin, H. Li, J. Zhang, Adversarial examples: Attacks on machine
learning-based malware visualization detection methods, arXiv preprint
arXiv:1808.01546.

[38] J. Zhang, C. Li, Adversarial examples: Opportunities and challenges,
IEEE Transactions on Neural Networks and Learning Systems,
doi:10.1109/TNNLS.2019.2933524.

[39] I. J. Goodfellow, J. Shlens, C. Szegedy, Explaining and harnessing adversarial
examples, arXiv preprint arXiv:1412.6572.

[40] F. Tramèr, A. Kurakin, N. Papernot, I. Goodfellow, D. Boneh, P. McDaniel,
Ensemble adversarial training: Attacks and defenses, arXiv preprint
arXiv:1705.07204.

[41] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, A. Vladu, Towards
deep learning models resistant to adversarial attacks, arXiv preprint
arXiv:1706.06083.

[42] T. P. Bohlin, Practical grey-box process identification: theory and
applications, Springer Science & Business Media, 2006.

[43] N. Akhtar, A. Mian, Threat of adversarial attacks on deep learning in
computer vision: A survey, arXiv preprint arXiv:1801.00553.

[44] J. Aldrich, J. Miller, Earliest known uses of some of the words of
mathematics, in particular, the entries for “bell-shaped and bell curve”, “normal
(distribution)”, “Gaussian”, and “Error, law of error, theory of errors, etc.”

[45] N. Carlini, D. Wagner, Towards evaluating the robustness of neural networks,
in: Security and Privacy (SP), 2017 IEEE Symposium on, IEEE, 2017, pp.
39–57.

[46] T. Miyato, S.-i. Maeda, S. Ishii, M. Koyama, Virtual adversarial training: a
regularization method for supervised and semi-supervised learning, IEEE
transactions on pattern analysis and machine intelligence.

[47] A. Krizhevsky, I. Sutskever, G. E. Hinton, Imagenet classification with
deep convolutional neural networks, in: Advances in neural information
processing systems, 2012, pp. 1097–1105.

[48] G. Lan, Y. Zhou, Conditional gradient sliding for convex optimization,
SIAM Journal on Optimization 26 (2) (2016) 1379–1409.

[49] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition,
in: Proceedings of the IEEE conference on computer vision and pattern
recognition (IEEE CVPR 2016), 2016, pp. 770–778.

[50] Y. Bengio, et al., Learning deep architectures for ai, Foundations and
Trends® in Machine Learning 2 (1) (2009) 1–127.

[51] S. Bubeck, et al., Convex optimization: Algorithms and complexity,
Foundations and Trends® in Machine Learning 8 (3-4) (2015) 231–357.

[52] Virustotal tool, https://www.virustotal.com/en/, 2018.

[53] H. S. Anderson, P. Roth, Ember: an open dataset for training static pe
malware machine learning models, arXiv preprint arXiv:1804.04637.

[54] F. A. Narudin, A. Feizollah, N. B. Anuar, A. Gani, Evaluation of machine
learning classifiers for mobile malware detection, Soft Computing 20 (1)
(2016) 343–357.

[55] E. Raff, C. Nicholas, An alternative to ncd for large sequences, lempel-ziv
jaccard distance, in: Proceedings of the 23rd ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, ACM, 2017, pp.
1007–1015.

[56] D. Gibert, C. Mateu, J. Planes, R. Vicens, Using convolutional neural
networks for classification of malware represented as images, Journal of
Computer Virology and Hacking Techniques 15 (1) (2019) 15–28.

Author Biography:

Xinbo Liu

Received his M.S. degree in Analytical Science from Central
South University, China, in 2015. Since 2015, he has been a Ph.D.
candidate in the College of Computer Science and Electronic Engineering,
Hunan University. His research interests include machine learning,
artificial neural networks, information security, data mining and
software security.

Yaping Lin (Corresponding author)

Received the B.S. degree in Computer Application from Hunan
University, China, in 1982, and the M.S. degree in Computer
Application from National University of Defense Technology, China,
in 1985. He received the Ph.D. degree in Control Theory and
Application from Hunan University in 2000. He has been a professor
and Ph.D. supervisor at Hunan University since 1996. During
2004-2005, he worked as a visiting researcher at the University of
Texas at Arlington. His research interests include machine learning,
artificial intelligence, network security and wireless sensor networks.

Jiliang Zhang

Received the Ph.D. degree in Computer Science and Technology
from Hunan University, Changsha, China, in 2015. From 2013 to 2014,
he worked as a Research Scholar at the Maryland Embedded Systems
and Hardware Security Lab, University of Maryland, College Park.
From 2015 to 2017, he was an Associate Professor with Northeastern
University, China. Since 2017, he has been with Hunan University. His
current research interests include hardware/hardware-assisted security,
artificial intelligence security, and emerging technologies. Prof. Zhang
was a recipient of the Hu-Xiang Youth Talent, and the best paper
nominations at the International Symposium on Quality Electronic
Design 2017. He has been serving on the technical program committees of many international
conferences such as ASP-DAC, FPT, GLSVLSI, ISQED and AsianHOST, and is a Guest Editor of the
Journal of Information Security and Applications and the Journal of Low Power Electronics and
Applications.

He Li

Received the M.S. degree from the Department of Microelectronics
at Tianjin University, China. He is currently a Ph.D. student in the
Department of Electrical and Electronics Engineering at Imperial
College London, UK. He is also a visiting student at Hunan
University. His research interests include custom computing, computer
arithmetic, hardware security and machine learning. He has received the
Best Paper Presentation award at the 2017 IEEE International Conference on
Field-Programmable Technology (FPT), the Student Travel award at the
2017 IEEE Symposium on Computer Arithmetic (ARITH’24) and the
outreach award at the 2015 IEEE International System-on-Chip Conference
(SOCC).
A CONFLICT OF INTEREST STATEMENT
• None of the authors of this paper has a financial or personal relationship with other
people or organizations that could inappropriately influence or bias the content of the
paper.
• The authors specifically state that “No Competing interests are at stake and there is No
Conflict of Interest” with other people or organizations that could inappropriately
influence or bias the content of the paper.
