Generative Adversarial Network To Detect Unseen Internet of Things Malware
All content following this page was uploaded by Zahra Moti on 18 February 2022.
Abstract
Machine learning is widely used for malware and adversary detection in industrial Internet of Things networks. However, the majority of these methods require significant prior knowledge of malware properties to identify optimal features for malware detection. This challenge is amplified in IoT environments due to the limited availability of malware samples. Some researchers utilized data deformation
techniques such as converting malware to images or music to generate features that
can be used for malware detection. However, these processes can be time-consuming
and require a significant amount of data. This paper proposes MalGan, a framework
for detecting and generating new malware samples based on the raw byte code at the
edge layer of the Internet of Things (IoT) networks. Convolutional Neural Network
(CNN) was utilized to extract high-level features, and boundary-seeking Generative
Adversarial Network technique was used to generate new malware samples. Thus,
even with a few malware samples, a significant number of previously unseen malware
samples are detectable with high accuracy. To capture the short-term and long-term
dependency of features, we employed an attention-based model, a combination of
CNN and Long Short Term Memory. The attention mechanism improves the model’s
performance by increasing or decreasing attention to certain parts of the features.
The proposed method is examined extensively using standard Windows and IoT
malware datasets. The experimental results indicate that our proposed MalGan is
the method of choice, as it offers a higher detection rate compared to the previous
malware detection algorithms.
Keywords: Malware, Generative Adversarial Network (GAN), Deep Learning,
Convolutional Neural Network (CNN), Long Short Term Memory (LSTM), Edge
Layer, Internet of Things
purpose, they proposed an improved Monte Carlo Tree Search (MCTS) algorithm
whose selection, expansion, and simulation stages are different from normal MCTS.
Also, they implemented a GAN-based model to detect adversarial examples. They added adversarial examples to the training set to equip the discriminator and improve the detection rate. Li et al. [39] utilized ensemble deep learning to detect and generate adversarial examples and demonstrated their impact on models' performance.
1. To improve the GAN training stability and enhance the accuracy of classifiers
for malware detection, we implement a novel representation learning model
based on a CNN to extract a high-level representation of raw data.
2. To improve the generalization and robustness of the models, and thus better defend system security, we oversample the training set by generating signatures of new malware samples using a generative model called boundary-seeking GAN (BGAN).
3. To accurately detect malware samples, we propose an attention-based deep
learning model, a combination of CNN and LSTM. The model considers long-
term and short-term dependencies of the input data concurrently and focuses
on important features.
We evaluate our models using standard IoT and Windows malware datasets. The
remainder of the paper is arranged as follows: Section 2 reviews related work; in Section 3, we discuss the fundamental background of the GAN architecture.
In Section 4, the proposed method is introduced. Section 5 reports experimental
results, and finally, in Section 6, we summarize and conclude our paper.
2. Literature Review
Due to the importance of malware detection, many studies have been conducted so
far [30, 46, 29, 12]. In general, malware analysis techniques are divided into dynamic
analysis and static analysis [3].
2.1. Dynamic Analysis
In dynamic analysis, files are executed in a virtual or real environment, and decisions are made based on their behavior during execution, such as network traffic, memory usage, system calls, and hardware features. The most notable advantage of this technique is its resistance to packing and obfuscation. However, this method is very time-consuming and requires substantial computing resources, making it unsuitable for large datasets. Furthermore, many malware samples recognize the virtual environment and either hide their malicious behavior or reveal it only over a very long period [58].
3. Background
3.1. Generative Adversarial Learning
The primary goal of Generative Adversarial Networks is to learn the density function and estimate the distribution of the training set. A GAN consists of two neural networks, a generator and a discriminator, which are trained simultaneously with a min-max objective function [20]. The generator intends to generate realistic
samples to deceive the discriminator while the discriminator tries to correctly classify
generated and the real samples. The generator first takes a random vector as input, containing noise drawn from a specific distribution, and in each iteration tries to make its output more similar to the real samples. The discriminator is an ordinary neural network taking its input samples from two sources: original and generated samples. It then returns a probability close to one for real data and close to zero for
synthetic data. The error rate in the discriminator network is calculated by the cross-entropy function and then back-propagated to the generator and discriminator networks
to update the weights [19]. The objective function of GAN is shown in Equation
(1) where D and G refer to the discriminator and generator, respectively [20]. pr
is the probability distribution of real data, and pg is the probability distribution of
generated data. The goal of the generator is to minimize the objective function,
while the goal of the discriminator is to maximize it.
min_G max_D V(D, G) = E_{x∼pr}[log D(x)] + E_{z∼pz}[log(1 − D(G(z)))]    (1)
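As a concrete illustration, the value function in Equation (1) can be estimated from discriminator outputs on mini-batches of real and generated samples; the sketch below (NumPy, illustrative only, not the paper's code) also checks the value at the equilibrium D = 0.5:

```python
import numpy as np

def gan_value(d_real, d_fake):
    """Monte Carlo estimate of V(D, G) in Equation (1):
    E_{x~p_r}[log D(x)] + E_{z~p_z}[log(1 - D(G(z)))].
    d_real / d_fake are discriminator outputs on real / generated batches."""
    d_real = np.asarray(d_real, float)
    d_fake = np.asarray(d_fake, float)
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# A maximally confused discriminator (D = 0.5 everywhere) gives
# log(0.5) + log(0.5) = -log 4, the global minimum reached when p_g = p_r.
v_equilibrium = gan_value([0.5, 0.5], [0.5, 0.5])
```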
In the original GAN paper, it is assumed that G is given and the optimal value
of the D is equal to [19]:
D∗(x) = pr(x) / (pr(x) + pg(x))    (2)
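Equation (2) can be evaluated pointwise whenever the two density values are known; a minimal sketch (the density values p_r(x) and p_g(x) are assumed given, e.g. from histograms):

```python
import numpy as np

def optimal_discriminator(p_r, p_g):
    """Optimal discriminator for a fixed generator, Equation (2):
    D*(x) = p_r(x) / (p_r(x) + p_g(x))."""
    p_r = np.asarray(p_r, float)
    p_g = np.asarray(p_g, float)
    return p_r / (p_r + p_g)

# When the generator matches the data distribution (p_g = p_r), D* = 0.5.
```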
Under an ideal discriminator, unlike traditional maximum-likelihood approaches, which minimize the Kullback-Leibler divergence between pg and pr, the generator minimizes the Jensen-Shannon divergence [15], as indicated in Equation (3); if pg equals pr, the global minimum of V(G) is V∗ = − log 4 [4].
w∗(x) = (∂f∗/∂T)(T(x)) = D∗(x) / (1 − D∗(x))    (8)
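The weight w∗ equals 1 exactly on the decision boundary D = 0.5, which motivates the boundary-seeking generator objective of [26]: in the continuous case, the generator minimizes the squared distance of its samples from that boundary. A hedged sketch (not the paper's exact training code):

```python
import numpy as np

def bgan_generator_loss(d_fake):
    """Continuous boundary-seeking generator loss from [26]:
    0.5 * E[(log D(G(z)) - log(1 - D(G(z))))^2], which is zero exactly
    when D(G(z)) = 0.5, i.e. when the importance weight w* = D/(1 - D) is 1."""
    d = np.asarray(d_fake, float)
    return 0.5 * np.mean((np.log(d) - np.log(1.0 - d)) ** 2)
```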
4. Proposed Method
4.1. Overview of the Proposed Method
Generally, instead of creating a program from scratch, malware developers produce new samples by making minor changes to existing ones to make detection harder.
In this paper, we generated new malware signatures based on a limited number of
existing ones using GAN. Since the ultimate objective of GAN is to estimate the
distribution of data, the generated data are based on the distribution of existing
malware; they are potentially similar to malware that may be released in the future.
After generating the new samples, we add this data to the training set, which causes
the model to be trained with more diverse samples.
Figure 1 shows the structure of our proposed framework, MalGan. First, the headers of the executable files are extracted; then high-level, abstract features are derived using the CNN. After that, new malware signatures are generated by the boundary-seeking GAN and added to the original dataset to train our proposed deep network.
[Figure 1: Structure of the MalGan framework. PE headers (DOS, file, and optional headers) and ELF headers are extracted from benign and malware files, high-level features are extracted by a CNN, and the training set is oversampled with samples generated by the GAN before classification.]

[Figure 2: Architecture of the generator and discriminator. The generator maps a 10-dimensional noise vector Z through dense layers (Dense(256), Dense(512), Dense(1024)) to a fake feature vector (32×1); the discriminator passes real and fake feature vectors through Dense(512) and Dense(256) layers and outputs real or fake.]
After each convolutional layer, there are pooling and dropout layers to prevent overfitting [42]. Dropout is a technique that randomly removes some neurons of each layer at a constant rate during network training, so the model does not depend on a specific set of neurons and weights. Then, an LSTM layer is added,
which reduces the effect of vanishing gradient and paves the way for working with
long sequences [17]. LSTM contains inner mechanisms named gates that can adjust
the flow of information.
ft = σ (Wf · [ht−1 , xt ] + bf ) (11)
it = σ (Wi · [ht−1 , xt ] + bi ) (12)
ot = σ (Wo [ht−1 , xt ] + bo ) (13)
Ct = ft ∗ Ct−1 + it ∗ tanh (WC · [ht−1 , xt ] + bC ) (14)
ht = ot ∗ tanh (Ct ) (15)
In the above equations, ft denotes the forget gate, which is responsible for removing information no longer needed from the cell state. it represents the input gate, which decides what new information can be added to the cell, and ot indicates the output gate, which specifies the output of the current cell. Ct represents the cell state, and ht denotes the hidden state, which is the output of the LSTM layer.
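Equations (11)-(15) can be sketched as a single NumPy step (an illustrative cell, not the Keras layer used in the paper; each weight matrix acts on the concatenation [h_{t-1}, x_t]):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_o, W_c, b_f, b_i, b_o, b_c):
    """One LSTM step following Equations (11)-(15)."""
    z = np.concatenate([h_prev, x_t])                   # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)                        # forget gate,  Eq. (11)
    i_t = sigmoid(W_i @ z + b_i)                        # input gate,   Eq. (12)
    o_t = sigmoid(W_o @ z + b_o)                        # output gate,  Eq. (13)
    c_t = f_t * c_prev + i_t * np.tanh(W_c @ z + b_c)   # cell state,   Eq. (14)
    h_t = o_t * np.tanh(c_t)                            # hidden state, Eq. (15)
    return h_t, c_t
```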
To reduce the execution time, convolutional and max-pooling layers have been used to extract important features from long sequences and pass them to the LSTM layer. The combination of CNN and LSTM captures short-term and long-term dependencies concurrently [43]. In other words, the CNN extracts local features with filters that convolve over the input, while the LSTM, with its multiple gates and memory, considers correlations between the features. Then, an attention mechanism [57] is used to
determine the relevance between different positions of a sequence. The motivation is to enhance accuracy by increasing or decreasing the concentration and contribution of certain parts of the input sequences. The attention mechanism imitates the mechanism of human vision: when identifying an item, human sight does not scan the entire scene but concentrates on a specific part according to the target. Attention learns the correlation between features and their impact on target identification by assigning scores to features. The steps are explained in the following equations, where hi indicates the hidden states of the LSTM layer and w0 and w1 refer to weight matrices. First, a neural network is trained to assign scores to the inputs. Once the scores are calculated, attention weights are generated by applying the softmax function; the weights sum to one. This vector is then multiplied by the input vector to
apply the weights to the output. Here, the input is the hidden state of the LSTM
layer, so the weight is calculated for each state, and it indicates its importance for
a classification task and makes the model more robust against noise and outliers.
The output of this layer (out) is given to a fully connected layer with the softmax
activation function for classification.
out = Σ_{i=1}^{T} αi hi    (16)

h̄ = (1/T) Σ_{i=1}^{T} αi hi    (17)
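The attention steps above can be sketched as follows; the tanh scoring network with weights w0 and w1 is one common choice and an assumption here, not necessarily the paper's exact scoring network:

```python
import numpy as np

def softmax(s):
    e = np.exp(s - np.max(s))
    return e / e.sum()

def attention_pool(H, w0, w1):
    """Soft attention over LSTM hidden states H (shape T x d), per Eq. (16):
    score each state, normalize the scores with softmax so they sum to one,
    then return the weighted sum of the states."""
    scores = np.tanh(H @ w0) @ w1           # one score per time step
    alpha = softmax(scores)                 # attention weights, sum to 1
    out = (alpha[:, None] * H).sum(axis=0)  # Eq. (16): out = sum_i alpha_i h_i
    return out, alpha
```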
5. Experimental Setup
A system with an Intel Core i7-6700K CPU and 16.0 GB of RAM running Ubuntu 18.04 is used for our experiments. All of the neural networks in the proposed model are built using Python 3.6 and the Keras library. The Adam optimizer is utilized, and
the batch size is set to 80. The loss function is binary cross-entropy. The CNN is trained for 15 epochs as a feature-extraction module, and the BGAN is trained for 1200 epochs to generate new malware signatures. More details about the discriminator and generator networks are shown in Tables 1 and 2. Also, our deep
network for malware detection is trained for 20 epochs.
[Figure 3: Architecture of the proposed deep network — stacks of CNN and max-pooling layers feed LSTM layers, whose hidden states are combined by an attention layer.]
5.1. Dataset
We use two Windows malware datasets to evaluate our proposed method. The first
dataset contains 5,000 executable files, including 2,500 malware samples and 2,500
non-malicious files. Malware samples are collected from the VirusShare online malware repository. Benign samples are gathered from the system files of an original Windows 10 installation and other known programs, checked with a commercial anti-virus to ensure that they are not malicious. The second dataset includes 15,000 EXE and DLL files: 7,500 malware samples, randomly selected from the VX-Heaven dataset and covering various malware types including trojans, worms, backdoors, viruses, and hack-tools, and 7,500 non-malicious samples. We also employ an IoT dataset, which includes 271 malware samples and 281 benign samples. The last dataset contains 4,200 executable files, whose malware samples
are randomly selected from the VX-Heaven dataset and packed using a popular
tool called UPX. The purpose of selecting this dataset is to examine the selected
feature’s ability to detect malware in packed mode. We summarize the information
of all datasets in Table 3.
Figure 4: Generator and discriminator loss when the raw header is used to train the BGAN.
Figure 5: Generator and discriminator loss when CNN features are used to train the BGAN.
Table 4: Comparison between the distribution of the original VX-Heaven samples and the synthetic samples generated by our proposed framework.

MMD     JSD     KLD (fake, real)   KLD (real, fake)
0.054   0.007   0.017              0.016
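For reference, the JSD and KLD columns of Table 4 are standard divergences between discrete distributions; a minimal NumPy sketch (histogram-based, with a small epsilon for numerical safety):

```python
import numpy as np

def kld(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(p || q); note it is asymmetric,
    which is why Table 4 reports both directions."""
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def jsd(p, q):
    """Jensen-Shannon divergence: symmetric KL against the mixture."""
    p = np.asarray(p, float); q = np.asarray(q, float)
    m = 0.5 * (p / p.sum() + q / q.sum())
    return 0.5 * kld(p, m) + 0.5 * kld(q, m)
```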
Figure 6: Distribution of real malware, malware generated by the BGAN, and non-malicious files.
Table 5: Performance comparison between the proposed method and the SMOTE algorithm.

                      SMOTE              MalGan
Method                Accuracy   F1      Accuracy   F1
Decision Tree         89.65      89.88   93.12      93.21
SVM                   89.25      89.07   93.25      93.25
Logistic Regression   87.67      87.32   93.50      98.95
3-NN                  88.12      88.63   92.56      92.60
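SMOTE [9], the baseline in Table 5, synthesizes minority-class samples by interpolating between a minority point and one of its k nearest minority neighbors; a minimal sketch of one synthetic sample (not the imbalanced-learn implementation):

```python
import numpy as np

def smote_sample(X_minority, k=5, rng=None):
    """Generate one synthetic minority sample in the style of SMOTE [9]:
    pick a random minority point, one of its k nearest minority neighbors,
    and interpolate at a random position on the segment between them."""
    if rng is None:
        rng = np.random.default_rng(0)
    x = X_minority[rng.integers(len(X_minority))]
    d = np.linalg.norm(X_minority - x, axis=1)
    neighbors = np.argsort(d)[1:k + 1]       # skip the point itself (distance 0)
    nn = X_minority[rng.choice(neighbors)]
    lam = rng.random()                       # interpolation factor in [0, 1)
    return x + lam * (nn - x)
```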
6.2. Computational Complexity
Our proposed framework contains pre-processing and malware detection phases. The
pre-processing phase includes header extraction, high-level feature extraction by the CNN, and data generation by the boundary-seeking GAN. Since data generation is performed offline, it is only considered in the computational complexity of training. Computational complexity is calculated for the worst-case scenario, in which the number of layers, neurons, hidden units, and filters in each layer is taken to be n, the number of training samples. The header
extraction step consists of an iterative loop, which in the worst case is of order
O (n). According to the algorithm of GAN [20], considering the worst case for the
number of training iterations as well as the number of steps to apply to the generator
and discriminator, which are both composed of dense layers with the complexity of
O (n2 ) [49], the computational complexity of GAN is O (n2 ). A CNN model was used
to extract high-level features. As mentioned in [25], the computational cost of the entire convolutional stack per training epoch is O((Σ_{l=1}^{n_cnn} n_{l−1} · s_l² · n_l · m_l²) · len_i), where n_cnn is the number of convolutional layers, n_l is the number of filters in the l-th layer, n_{l−1} is the number of the l-th layer's input channels, s_l is the filter's spatial size, m_l is the spatial size of the output feature map, and len_i is the length of the input data.
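This cost formula can be evaluated directly for a concrete architecture; a small sketch (the layer tuples below are hypothetical, not the paper's actual configuration):

```python
def conv_cost_per_epoch(layers, len_i):
    """Per-epoch cost of a convolutional stack following the formula above:
    sum over layers of n_{l-1} * s_l^2 * n_l * m_l^2, scaled by input length.
    Each layer is a tuple (in_channels, filter_size, num_filters, out_map_size)."""
    return len_i * sum(n_prev * s ** 2 * n * m ** 2
                       for n_prev, s, n, m in layers)

# Hypothetical two-layer stack on an input of length 10.
cost = conv_cost_per_epoch([(1, 3, 8, 16), (8, 3, 16, 8)], 10)
```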
We generalize this phase to the worst-case; therefore, the order of this algorithm will
be O (n3 ) for training and O (n2 ) for testing computational complexity. Thus, the
overall computational complexity of the pre-processing phase in the worst scenario
is O(n³) for training and O(n²) for testing. For malware detection, we presented a novel attention-based deep learning model integrating CNN and LSTM. As discussed in [27], the computational complexity of LSTM per epoch is O(w · len_i), where w is the number of weights and len_i is the input data length; considering the worst-case scenario, this generalizes to O(n⁴) for training and O(n²) for testing. The
computational cost of the attention layer is O (n) [57]. In total, the computational
complexity of Malware detection is the order of O (n3 ) + O (n4 ) + O(n) = O (n4 )
and O (n2 ) + O (n2 ) + O(1) = O (n2 ) for training and testing respectively. Hence,
the overall computational complexity of training and testing, which contains pre-
processing and malware detection phases, is O (n3 ) + O (n4 ) = O (n4 ) and O (n2 ) +
O (n2 ) = O (n2 ).
proposed deep network with them. For this purpose, we randomly selected malware
samples from VX-Heaven dataset. We then compared the proposed deep network
with some classical machine learning models and some neural networks. As shown
in Figure 7, the accuracy of the CNN-LSTM model is higher than that of CNN
and LSTM, so it can be concluded that short-term and long-term dependencies are
explored simultaneously by combining CNN and LSTM models, and the accuracy of
the model is improved. Also, our proposed deep network has a better performance
than the CNN-LSTM model. It is because of the effect of the attention mechanism
on the model. As discussed earlier, with the attention mechanism the essential features for malware detection gain greater weight, which increases the accuracy of the proposed model. Also, based on [57], in terms of computational complexity,
attention layers are faster than convolutional and recurrent layers.
Figure 7: Comparison of the accuracy of our proposed deep network and other models.
Figure 8: Comparison between packed and non-packed modes, considering the raw header as a
feature.
Figure 9: Comparison between CNN and Autoencoder as a feature extraction.
than the other classifiers. This shows that the generated data follows the original data distribution well, and that augmenting the data helps make the models more general and robust. Also, by utilizing the features extracted by the CNN, the performance of all models is enhanced. Generally, one advantage of deep networks is to extract a
conceptual space from the data. In addition, the accuracy, F1, and other metrics of
our proposed deep network are superior to almost all machine learning algorithms
and also LSTM and CNN. As mentioned in the previous sections, our proposed deep network combines CNN and LSTM to consider both short-term and long-term dependencies concurrently. The selection of an optimal structure and the fine-tuning of the network hyper-parameters after numerous experiments are other important factors.
Figure 10: Accuracy of models on the VX-Heaven dataset.
Figure 11: AUC of models on the VX-Heaven dataset.
Figure 12: F1 score of models on the VX-Heaven dataset.
Figure 13: Sensitivity of models on the VX-Heaven dataset.
Figure 14: Accuracy of models on the collected dataset.
Figure 15: AUC of models on the collected dataset.
For a more comprehensive evaluation of the proposed method and the effect of
oversampling by BGAN, we have compared our work with several recent works, given in Tables 6, 7, and 8. As can be seen, MalGan is more efficient than the other methods; therefore, even with a limited number of training samples, it can detect malware with reasonable accuracy. In [18] and [38], due to the network structure and the input type (images), more samples are needed to train the network. [37] and [14] have also performed poorly due to their feature-engineering phase with limited data. Also, in [29], unlike our method, only short-term dependencies between features are considered. In our method, because of the optimal network structure and the use of raw bytecode as a feature, the overhead and complexity are very low compared to the other methods, except Namavar et al. [29], whose feature-extraction time is not considered. Therefore, our proposed
method can be used as a real-time malware detection system.
Figure 16: F1 score of models on the collected dataset.
Figure 17: Sensitivity of models on the collected dataset.
Figure 18: Accuracy of models on the IoT dataset.
Figure 19: AUC of models on the IoT dataset.
Figure 20: F1 score of models on the IoT dataset.
Figure 21: Sensitivity of models on the IoT dataset.
and analyze them using other methods such as executing in a virtual environment.
References
[1] M. Ahmadi, D. Ulyanov, S. Semenov, M. Trofimov, and G. Giacinto. Novel feature extraction,
selection and fusion for effective malware family classification. pages 183–194, 2016.
[2] A. Al-Abassi, H. Karimipour, A. Dehghantanha, and R. M. Parizi. An ensemble deep learning-
based cyber-attack detection in industrial control system. IEEE Access, 8:83965–83973, 2020.
[3] M. Amin, B. Shah, A. Sharif, T. Ali, K.-l. Kim, and S. Anwar. Android malware detec-
tion through generative adversarial networks. Transactions on Emerging Telecommunications
Technologies, page e3675, 2019.
[4] M. Arjovsky and L. Bottou. Towards principled methods for training generative adversarial
networks. arXiv preprint arXiv:1701.04862, 2017.
[5] A. Azmoodeh, A. Dehghantanha, and K.-K. R. Choo. Robust malware detection for internet of
(battlefield) things devices using deep eigenspace learning. IEEE Transactions on Sustainable
Computing, 4(1):88–95, 2018.
[6] E. Barshan, A. Ghodsi, Z. Azimifar, and M. Z. Jahromi. Supervised principal component
analysis: Visualization, classification and regression on subspaces and submanifolds. Pattern
Recognition, 44(7):1357–1371, 2011.
[7] Z. Bazrafshan, H. Hashemi, S. M. H. Fard, and A. Hamzeh. A survey on heuristic malware
detection techniques. pages 113–120, 2013.
[8] C. Beek, C. Castillo, C. Cochin, and A. Dolezal. Mcafee threats report: December 2018.
McAfee Labs, New York, NY, USA, pages 3–7, 2018.
[9] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer. Smote: synthetic minority
over-sampling technique. Journal of artificial intelligence research, 16:321–357, 2002.
[10] T. Committee et al. Tool interface standard (tis) executable and linking format (elf) specifi-
cation version 1.2, 1995.
[11] Z. Cui, F. Xue, X. Cai, Y. Cao, G.-g. Wang, and J. Chen. Detection of malicious code variants
based on deep learning. IEEE Transactions on Industrial Informatics, 14(7):3187–3196, 2018.
[12] H. Darabian, A. Dehghantanha, S. Hashemi, S. Homayoun, and K.-K. R. Choo. An opcode-
based technique for polymorphic internet of things malware detection. Concurrency and
Computation: Practice and Experience, 32(6):e5173, 2020.
[13] M. Egele, T. Scholte, E. Kirda, and C. Kruegel. A survey on automated dynamic malware-
analysis techniques and tools. ACM computing surveys (CSUR), 44(2):1–42, 2008.
[14] M. Farrokhmanesh and A. Hamzeh. Music classification as a new approach for malware
detection. Journal of Computer Virology and Hacking Techniques, 15(2):77–96, 2019.
[15] B. Fuglede and F. Topsoe. Jensen-Shannon divergence and Hilbert space embedding. In International Symposium on Information Theory (ISIT 2004), page 31. IEEE, 2004.
[16] E. Gandotra, D. Bansal, and S. Sofat. Malware analysis and classification: A survey. Journal
of Information Security, 2014, 2014.
[17] F. A. Gers, J. Schmidhuber, and F. Cummins. Learning to forget: Continual prediction with
lstm. 1999.
[18] D. Gibert, C. Mateu, J. Planes, and R. Vicens. Using convolutional neural networks for
classification of malware represented as images. Journal of Computer Virology and Hacking
Techniques, 15(1):15–28, 2019.
[19] I. Goodfellow. Nips 2016 tutorial: Generative adversarial networks. arXiv preprint
arXiv:1701.00160, 2016.
[20] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville,
and Y. Bengio. Generative adversarial nets. pages 2672–2680, 2014.
[21] A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola. A kernel two-sample
test. The Journal of Machine Learning Research, 13(1):723–773, 2012.
[22] H. HaddadPajouh, A. Dehghantanha, R. Khayami, and K.-K. R. Choo. A deep recurrent neu-
ral network based approach for internet of things malware threat hunting. Future Generation
Computer Systems, 85:88–96, 2018.
[23] H. HaddadPajouh, A. Dehghantanha, R. M. Parizi, M. Aledhari, and H. Karimipour. A
survey on internet of things security: Requirements, challenges, and solutions. Internet of
Things, page 100129, 2019.
[24] H. Hashemi and A. Hamzeh. Visual malware detection using local malicious pattern. Journal
of Computer Virology and Hacking Techniques, 15(1):1–14, 2019.
[25] K. He and J. Sun. Convolutional neural networks at constrained time cost. In Proceedings of
the IEEE conference on computer vision and pattern recognition, pages 5353–5360, 2015.
[26] R. D. Hjelm, A. P. Jacob, T. Che, A. Trischler, K. Cho, and Y. Bengio. Boundary-seeking
generative adversarial networks. arXiv preprint arXiv:1702.08431, 2017.
[27] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8), 1997.
[28] A. N. Jahromi, J. Sakhnini, H. Karimpour, and A. Dehghantanha. A deep unsupervised rep-
resentation learning approach for effective cyber-physical attack detection and identification
on highly imbalanced data. In Proceedings of the 29th Annual International Conference on
Computer Science and Software Engineering, pages 14–23, 2019.
[29] A. N. Jahromi, S. Hashemi, A. Dehghantanha, K.-K. R. Choo, H. Karimipour, D. E. Newton,
and R. M. Parizi. An improved two-hidden-layer extreme learning machine for malware
hunting. Computers & Security, 89:101655, 2020.
[30] H. Karimipour, A. Dehghantanha, R. M. Parizi, K.-K. R. Choo, and H. Leung. A deep and
scalable unsupervised machine learning system for cyber-attack detection in large-scale smart
grids. IEEE Access, 7:80778–80788, 2019.
[31] J. O. Kephart. Automatic extraction of computer virus signatures. pages 178–184, 1994.
[32] C. H. Kim, E. K. Kabanga, and S.-J. Kang. Classifying malware using convolutional gated
neural network. pages 40–44, 2018.
[33] J.-Y. Kim, S.-J. Bu, and S.-B. Cho. Malware detection using deep transferred generative
adversarial networks. pages 556–564, 2017.
[34] J.-Y. Kim, S.-J. Bu, and S.-B. Cho. Zero-day malware detection using transferred generative
adversarial networks based on deep autoencoders. Information Sciences, 460:83–102, 2018.
[35] T. Kim, B. Kang, M. Rho, S. Sezer, and E. G. Im. A multimodal deep learning method
for android malware detection using various features. IEEE Transactions on Information
Forensics and Security, 14(3):773–788, 2018.
[36] S. Kiranyaz, O. Avci, O. Abdeljaber, T. Ince, M. Gabbouj, and D. J. Inman. 1d convolutional
neural networks and applications: A survey. arXiv preprint arXiv:1905.03554, 2019.
[37] A. Kumar, K. Kuppusamy, and G. Aghila. A learning model to detect maliciousness of
portable executable using integrated feature set. Journal of King Saud University-Computer
and Information Sciences, 31(2):252–265, 2019.
[38] Q. Le, O. Boydell, B. Mac Namee, and M. Scanlon. Deep learning at the shallow end: Malware
classification for non-domain experts. Digital Investigation, 26:S118–S126, 2018.
[39] D. Li and Q. Li. Adversarial deep ensemble: Evasion attacks and defenses for malware
detection. IEEE Transactions on Information Forensics and Security, 15:3886–3900, 2020.
[40] H. Li, S. Zhou, W. Yuan, J. Li, and H. Leung. Adversarial-example attacks toward android
malware detection system. IEEE Systems Journal, 14(1):653–656, 2019.
[41] F. Liese and I. Vajda. On divergences and informations in statistics and information theory.
IEEE Transactions on Information Theory, 52(10):4394–4412, 2006.
[42] L. Liu and B. Wang. Automatic malware detection using deep learning based on static analy-
sis. In International Conference of Pioneering Computer Scientists, Engineers and Educators,
pages 500–507. Springer, 2017.
[43] T. Liu, J. Bao, J. Wang, and Y. Zhang. A hybrid cnn–lstm algorithm for online defect
recognition of co2 welding. Sensors, 18(12):4369, 2018.
[44] M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein generative adversarial networks. 2017.
[45] N. Milosevic, A. Dehghantanha, and K.-K. R. Choo. Machine learning aided android malware
classification. Computers & Electrical Engineering, 61:266–274, 2017.
[46] Z. Moti, S. Hashemi, and A. Namavar. Discovering future malware variants by generating
new malware samples using generative adversarial network. pages 319–324, 2019.
[47] V. Nagarajan and J. Z. Kolter. Gradient descent gan optimization is locally stable. pages
5585–5595, 2017.
[48] S. Nowozin, B. Cseke, and R. Tomioka. f-gan: Training generative neural samplers using
variational divergence minimization. In Advances in neural information processing systems,
pages 271–279, 2016.
[49] P. Orponen et al. Computational complexity of neural networks: a survey. Nordic Journal of
Computing, 1994.
[50] N. Peiravian and X. Zhu. Machine learning for android malware detection using permission
and api calls. pages 300–305, 2013.
[51] M. D. Preda, M. Christodorescu, S. Jha, and S. Debray. A semantics-based approach to
malware detection. ACM SIGPLAN Notices, 42(1):377–388, 2007.
[52] E. Raff, J. Barker, J. Sylvester, R. Brandon, B. Catanzaro, and C. Nicholas. Malware detection
by eating a whole exe. arXiv preprint arXiv:1710.09435, 2017.
[53] E. Raff, J. Sylvester, and C. Nicholas. Learning the pe header, malware detection with minimal
domain knowledge. pages 121–132, 2017.
[54] T. Rezaei and A. Hamze. An efficient approach for malware detection using pe header spec-
ifications. In 2020 6th International Conference on Web Research (ICWR), pages 234–239.
IEEE, 2020.
[55] J. Sakhnini, H. Karimipour, A. Dehghantanha, R. M. Parizi, and G. Srivastava. Security
aspects of internet of things aided smart grids: A bibliometric survey. Internet of things, page
100111, 2019.
[56] S. M. Tahsien, H. Karimipour, and P. Spachos. Machine learning based solutions for security
of internet of things (iot): A survey. Journal of Network and Computer Applications, 161:
102630, 2020.
[57] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and
I. Polosukhin. Attention is all you need. In Advances in neural information processing systems,
pages 5998–6008, 2017.
[58] Z. Xu, S. Ray, P. Subramanyan, and S. Malik. Malware detection using machine learning
based analysis of virtual memory access patterns. pages 169–174, 2017.
[59] C. Yinka-Banjo and O.-A. Ugot. A review of generative adversarial networks and its applica-
tion in cybersecurity. Artificial Intelligence Review, pages 1–16, 2019.
[60] X. Zhang, Y. Zhou, S. Pei, J. Zhuge, and J. Chen. Adversarial examples detection for xss
attacks based on generative adversarial networks. IEEE Access, 8:10989–10996, 2020.