Generating Adversarial Malware Examples For Black-Box Attacks Based On GAN
max(·, ·) represents the element-wise max operation. If an element of m has the value 1, the corresponding result of G is also 1, which is unable to back-propagate gradients. If an element of m has the value 0, the result of G is the neural network's real-number output in the corresponding dimension, and gradient information is able to go through. It can be seen that m′ is actually the binarization-transformed version of $G_{\theta_g}(m, z)$.
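This gating behaviour of the element-wise max is easy to verify numerically. The snippet below is a minimal PyTorch illustration, not the authors' code; the concrete feature vector and generator output values are made up for the example.

    # Toy check (not the authors' code): max(m, o) keeps original features and blocks
    # gradients exactly where the original feature is already 1.
    import torch

    m = torch.tensor([1.0, 0.0, 1.0, 0.0])                       # original binary feature vector
    o = torch.tensor([0.2, 0.7, 0.9, 0.3], requires_grad=True)   # generator output G(m, z), made-up values

    m_smooth = torch.max(m, o)            # real-valued result used during training
    m_prime = (m_smooth > 0.5).float()    # binarized adversarial example m'

    m_smooth.sum().backward()
    print(m_smooth)   # tensor([1.0000, 0.7000, 1.0000, 0.3000])
    print(m_prime)    # tensor([1., 1., 1., 0.])
    print(o.grad)     # tensor([0., 1., 0., 1.])  -> gradients flow only where m was 0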
2.3 Substitute Detector
Since malware authors know nothing about the detailed structure of the black-box detector, the substitute detector is used to fit the black-box detector and to provide gradient information for training the generator.
The substitute detector is a multi-layer feed-forward neural network with weights $\theta_d$ which takes a program feature vector x as input. It classifies the program as either benign or malware. We denote the predicted probability that x is malware as $D_{\theta_d}(x)$.
The training data of the substitute detector consist of adversarial malware examples from the generator and benign programs from an additional benign dataset collected by malware authors. The ground-truth labels of the training data are not used to train the substitute detector; the goal of the substitute detector is to fit the black-box detector. The black-box detector first classifies this training data and outputs whether each program is benign or malware, and these predicted labels are what the substitute detector is trained on.

3 Training MalGAN
To train MalGAN, malware authors should first collect a malware dataset and a benign dataset.
The loss function of the substitute detector is defined in Formula 2.

$$L_D = -\mathbb{E}_{x \in BB_{Benign}} \log\bigl(1 - D_{\theta_d}(x)\bigr) - \mathbb{E}_{x \in BB_{Malware}} \log D_{\theta_d}(x). \tag{2}$$

$BB_{Benign}$ is the set of programs that are recognized as benign by the black-box detector, while $BB_{Malware}$ is the set of programs that are detected as malware by the black-box detector. To train the substitute detector, $L_D$ should be minimized with respect to the weights of the substitute detector.
The loss function of the generator is defined in Formula 3.

$$L_G = \mathbb{E}_{m \in S_{Malware},\, z \sim p_{uniform[0,1)}} \log D_{\theta_d}\bigl(G_{\theta_g}(m, z)\bigr). \tag{3}$$

$S_{Malware}$ is the actual malware dataset, not the malware set labelled by the black-box detector. $L_G$ is minimized with respect to the weights of the generator.
Minimizing $L_G$ reduces the predicted malicious probability of the adversarial malware examples and pushes the substitute detector to recognize them as benign. Since the substitute detector tries to fit the black-box detector, training the generator in this way will also fool the black-box detector.
The whole process of training MalGAN is shown in Algorithm 1.

Algorithm 1 The Training Process of MalGAN
1: while not converging do
2:   Sample a minibatch of malware M
3:   Generate adversarial examples M′ from the generator for M
4:   Sample a minibatch of benign programs B
5:   Label M′ and B using the black-box detector
6:   Update the substitute detector's weights $\theta_d$ by descending along the gradient $\nabla_{\theta_d} L_D$
7:   Update the generator's weights $\theta_g$ by descending along the gradient $\nabla_{\theta_g} L_G$
8: end while
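Formulas 2 and 3 and Algorithm 1 map directly onto a standard GAN-style training loop. The following is a minimal PyTorch sketch of that loop, not the authors' implementation; the feature dimension, noise dimension, layer widths, learning rates and the stand-in black_box_detector function are assumptions made only to keep the example self-contained.

    # Minimal PyTorch sketch of Formulas 2-3 and Algorithm 1 (not the authors' code).
    # Feature/noise sizes, layer widths and the stand-in black_box_detector are assumptions.
    import torch
    import torch.nn as nn

    FEATURE_DIM, NOISE_DIM, EPS = 160, 10, 1e-8

    generator = nn.Sequential(                      # G_theta_g(m, z)
        nn.Linear(FEATURE_DIM + NOISE_DIM, 256), nn.ReLU(),
        nn.Linear(256, FEATURE_DIM), nn.Sigmoid())
    substitute = nn.Sequential(                     # D_theta_d(x): predicted P(x is malware)
        nn.Linear(FEATURE_DIM, 256), nn.ReLU(),
        nn.Linear(256, 1), nn.Sigmoid())
    opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)
    opt_d = torch.optim.Adam(substitute.parameters(), lr=1e-3)

    def black_box_detector(x):
        # Stand-in for the real black-box detector; returns 1 = malware, 0 = benign.
        return (x.sum(dim=1) > FEATURE_DIM / 2).float()

    def perturb(malware):
        # o = G(m, z); max(m, o) keeps every original feature and lets gradients
        # flow only through dimensions where m is 0 (Section 2.2).
        z = torch.rand(malware.size(0), NOISE_DIM)
        o = generator(torch.cat([malware, z], dim=1))
        return torch.max(malware, o)

    def train_step(malware, benign):
        # One iteration of Algorithm 1 on a minibatch of malware M and benign programs B.
        # Substitute-detector update (Formula 2): fit the labels given by the black box.
        m_adv = (perturb(malware) > 0.5).float()    # binarized adversarial examples M'
        x = torch.cat([m_adv, benign], dim=0)
        y_bb = black_box_detector(x)                # labels from the black box, not ground truth
        p = substitute(x).squeeze(1)
        loss_d = -(y_bb * torch.log(p + EPS) + (1 - y_bb) * torch.log(1 - p + EPS)).mean()
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()

        # Generator update (Formula 3): make adversarial malware look benign to the substitute.
        p_adv = substitute(perturb(malware)).squeeze(1)
        loss_g = torch.log(p_adv + EPS).mean()
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()
        return loss_d.item(), loss_g.item()

    # Example usage: one training step on random binary feature vectors.
    malware_batch = torch.randint(0, 2, (32, FEATURE_DIM)).float()
    benign_batch = torch.randint(0, 2, (32, FEATURE_DIM)).float()
    print(train_step(malware_batch, benign_batch))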
[Figure 3: True positive rate on the adversarial examples over the iterative process when using the algorithm proposed by Grosse et al. The plot tracks TPR against the iteration number (0 to 160) for the training set and the test set, with and without retraining.]

Table 3: True positive rate (in percentage) on the adversarial examples after the black-box detector is retrained.

              Before Retraining MalGAN   After Retraining MalGAN
Training set  100                        0
Test set      100                        0
Adversarial examples crafted against the substitute neural network are unable to fool the black-box random forest.

In order to fit the black-box random forest more accurately on the adversarial examples, we tried retraining the substitute neural network on the adversarial examples. At each iteration, the adversarial examples currently generated from the whole training set are used to retrain the substitute neural network. As shown in Figure 3, this retraining approach makes the TPR converge to 46.18% on the training set, which means the black-box random forest can still detect about half of the adversarial examples. However, the retrained model is unable to generalize to the test set, since the TPR on the test set converges to 90.12%. The odd probability distribution of these adversarial examples limits the generalization ability of the substitute neural network.

MalGAN uses a generative network to transform original samples into adversarial samples. The neural network has enough representation ability to perform complex transformations, making MalGAN able to achieve nearly zero TPR on both the training set and the test set, while the representation ability of the gradient-based approach is too limited to generate high-quality adversarial examples.
4.4 Retraining the Black-Box Detector
Several defensive algorithms have been proposed to deal with adversarial examples. Gu et al. proposed to use auto-encoders to map adversarial samples to clean input data [Gu and Rigazio, 2014]. An algorithm named defensive distillation was proposed by Papernot et al. to weaken the effectiveness of adversarial perturbations [Papernot et al., 2016d]. Li et al. found that adversarial retraining can boost the robustness of machine learning algorithms [Li et al., 2016]. Chen et al. compared these defensive algorithms and concluded that retraining is a very effective way to defend against adversarial examples, and is robust even against repeated attacks [Chen et al., 2016].

In this section we analyze the performance of MalGAN under the retraining-based defensive approach. If antivirus vendors collect enough adversarial examples generated by MalGAN and retrain the black-box detector on them, the retrained detector is again able to detect all of the original adversarial examples, as shown in the "Before Retraining MalGAN" column of Table 3.

However, once antivirus vendors release the updated black-box detector publicly, malware authors will be able to get a copy of it and retrain MalGAN to attack the new black-box detector. After this process the black-box detector can hardly detect any malware again, as shown in the last column of Table 3. We found that reducing the TPR from 100% to 0% can be done within one epoch when retraining MalGAN. We alternated retraining the black-box detector and retraining MalGAN ten times, and the results were the same as Table 3 every time.

To retrain the black-box detector, antivirus vendors have to collect enough adversarial examples, and collecting and labelling a large number of malware samples is a long process. Adversarial malware examples therefore have enough time to propagate before the black-box detector is retrained and updated. Once the black-box detector is updated, malware authors can attack it immediately by retraining MalGAN, and our experiments showed that this retraining takes much less time than the first-time training. After retraining MalGAN, the new adversarial examples again go undetected. This dynamic adversarial process lands antivirus vendors in a passive position; machine learning based malware detection algorithms can hardly work in this case.
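This defender/attacker alternation can be summarized in a small driver loop. The sketch below is a hedged illustration rather than the authors' experiment code: the scikit-learn random forest is only a stand-in for the vendor's black-box detector, and retrain_malgan_and_generate is a hypothetical callable supplied by the caller, standing in for the MalGAN retraining and generation steps of Section 3.

    # Hedged sketch of the alternation described above; not the authors' experiment code.
    # The random forest is a stand-in black-box detector and retrain_malgan_and_generate
    # is a hypothetical callable standing in for MalGAN retraining/generation (Section 3).
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def true_positive_rate(detector, adversarial_malware):
        # Fraction of adversarial malware the detector still flags as malicious (label 1).
        return detector.predict(adversarial_malware).mean()

    def arms_race(black_box, benign, malware, retrain_malgan_and_generate, rounds=10):
        for i in range(rounds):
            # Attacker: retrain MalGAN against the current black box, emit fresh adversarial malware.
            adversarial = retrain_malgan_and_generate(black_box, malware, benign)
            tpr_attack = true_positive_rate(black_box, adversarial)      # ~0% in the paper

            # Defender: collect the adversarial examples and retrain the black box on them.
            features = np.vstack([benign, malware, adversarial])
            labels = np.concatenate([np.zeros(len(benign)), np.ones(len(malware) + len(adversarial))])
            black_box = RandomForestClassifier(n_estimators=100).fit(features, labels)
            tpr_retrain = true_positive_rate(black_box, adversarial)     # back to ~100%

            print(f"round {i}: TPR after attack {tpr_attack:.2f}, after retraining {tpr_retrain:.2f}")
        return black_box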
5 Conclusions
This paper proposed a novel algorithm named MalGAN to generate adversarial examples against a machine learning based black-box malware detector. A neural network based substitute detector is used to fit the black-box detector, and a generator is trained to generate adversarial examples which are able to fool the substitute detector. Experimental results showed that the generated adversarial examples are able to effectively bypass the black-box detector.

The probability distribution of the adversarial examples is controlled by the weights of the generator. Malware authors are able to change this probability distribution frequently by retraining MalGAN, so that the black-box detector cannot keep up with it and is unable to learn stable patterns from it. Once the black-box detector is updated, malware authors can immediately crack it again. This process makes machine learning based malware detection algorithms unable to work.
Acknowledgments
This work was supported by the Natural Science Foundation of China (NSFC) under grant no. 61375119 and the Beijing Natural Science Foundation under grant no. 4162029, and partially supported by the National Key Basic Research Development Plan (973 Plan) Project of China under grant no. 2015CB352302.

References
[Arjovsky and Bottou, 2017] Martin Arjovsky and Léon Bottou. Towards principled methods for training generative adversarial networks. In NIPS 2016 Workshop on Adversarial Training. In review for ICLR, volume 2016, 2017.
[Chen et al., 2016] Xinyun Chen, Bo Li, and Yevgeniy Vorobeychik. Evaluation of defensive methods for DNNs against multiple adversarial evasion models. 2016.
[Denton et al., 2015] Emily L Denton, Soumith Chintala, Rob Fergus, et al. Deep generative image models using a Laplacian pyramid of adversarial networks. In Advances in Neural Information Processing Systems, pages 1486–1494, 2015.
[Goodfellow et al., 2014a] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
[Goodfellow et al., 2014b] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
[Grosse et al., 2016] Kathrin Grosse, Nicolas Papernot, Praveen Manoharan, Michael Backes, and Patrick McDaniel. Adversarial perturbations against deep neural networks for malware classification. arXiv preprint arXiv:1606.04435, 2016.
[Gu and Rigazio, 2014] Shixiang Gu and Luca Rigazio. Towards deep neural network architectures robust to adversarial examples. arXiv preprint arXiv:1412.5068, 2014.
[Kingma and Ba, 2014] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[Kolter and Maloof, 2004] Jeremy Z Kolter and Marcus A Maloof. Learning to detect malicious executables in the wild. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 470–478. ACM, 2004.
[Kolter and Maloof, 2006] J Zico Kolter and Marcus A Maloof. Learning to detect and classify malicious executables in the wild. The Journal of Machine Learning Research, 7:2721–2744, 2006.
[Li et al., 2016] Bo Li, Yevgeniy Vorobeychik, and Xinyun Chen. A general retraining framework for scalable adversarial classification. arXiv preprint arXiv:1604.02606, 2016.
[Liu et al., 2016] Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. Delving into transferable adversarial examples and black-box attacks. arXiv preprint arXiv:1611.02770, 2016.
[Microsoft, 2013] Microsoft. Microsoft portable executable and common object file format specification, 2013.
[Mirza and Osindero, 2014] Mehdi Mirza and Simon Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.
[Papernot et al., 2016a] Nicolas Papernot, Patrick McDaniel, and Ian Goodfellow. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277, 2016.
[Papernot et al., 2016b] Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z Berkay Celik, and Ananthram Swami. Practical black-box attacks against deep learning systems using adversarial examples. arXiv preprint arXiv:1602.02697, 2016.
[Papernot et al., 2016c] Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. In Security and Privacy (EuroS&P), 2016 IEEE European Symposium on, pages 372–387. IEEE, 2016.
[Papernot et al., 2016d] Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In Security and Privacy (SP), 2016 IEEE Symposium on, pages 582–597. IEEE, 2016.
[Radford et al., 2015] Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
[Salimans et al., 2016] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training GANs. In Advances in Neural Information Processing Systems, pages 2226–2234, 2016.
[Schultz et al., 2001] Matthew G Schultz, Eleazar Eskin, Erez Zadok, and Salvatore J Stolfo. Data mining methods for detection of new malicious executables. In Security and Privacy, 2001. S&P 2001. Proceedings. 2001 IEEE Symposium on, pages 38–49. IEEE, 2001.
[Szegedy et al., 2013] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.