Hybrid Sequence Based Android Malware Detection Using Natural Language Processing

Received: 20 March 2021 | Revised: 19 April 2021 | Accepted: 11 May 2021

DOI: 10.1002/int.22529

RESEARCH ARTICLE

Hybrid sequence‐based Android malware detection using natural language processing

Nan Zhang1 | Jingfeng Xue1 | Yuxi Ma1 | Ruyun Zhang2 | Tiancai Liang3 | Yu‐an Tan4

1 School of Computer, Beijing Institute of Technology, Beijing, China
2 Zhejiang Lab, Hangzhou, Zhejiang Province, China
3 GRG Banking Equipment Co. Ltd., Guangzhou, China
4 School of Cyberspace Science and Technology, Beijing Institute of Technology, Beijing, China

Correspondence
Tiancai Liang, GRG Banking Equipment Co. Ltd., 510145 Guangzhou, China.
Email: 13808895230@139.com
Yu‐an Tan, School of Cyberspace of Security, Beijing Institute of Technology, 100081 Beijing, China.
Email: tan2008@bit.edu.cn

Funding information
National Natural Science Foundation of China, Grant/Award Numbers: U1936218, 61876019; Zhejiang Lab,
Grant/Award Number: NO.2020LE0AB02

Abstract
The Android platform has become a target of attackers because of its openness and increasing
popularity. Android malware has increased explosively in recent years, posing serious threats to
Android security. Proposing efficient Android malware detection methods is therefore crucial in
defeating malware. Machine learning over features extracted by static or dynamic analysis has
recently played an important role in malware detection. However, code obfuscation, code encryption,
and dynamic code loading can hinder systems based solely on static analysis, while purely dynamic
analysis systems cannot detect all potential code execution paths. To address these issues, we
propose CoDroid, a sequence‐based hybrid Android malware detection method that uses sequences of
static opcodes and dynamic system calls. We treat each sequence as a sentence in natural language
processing and construct a CNN–BiLSTM–Attention classifier, which consists of Convolutional Neural
Networks (CNNs) and a Bidirectional Long Short‐Term Memory (BiLSTM) network with an attention
language model. We extensively evaluate CoDroid on a real‐world data set and perform a comprehensive
comparison against other existing related detection methods. The evaluations show the effectiveness
and flexibility of CoDroid across a variety of experimental settings.

5770 | © 2021 Wiley Periodicals LLC wileyonlinelibrary.com/journal/int Int J Intell Syst. 2021;36:5770–5784.
ZHANG ET AL. | 5771

KEYWORDS
Android malware detection, attention, deep learning, hybrid
analysis, machine learning, natural language processing, text
classification

1 | INTRODUCTION

Malware has hit the Android platform hard. Statista1 reports that new Android malware samples
totaled 482,579 per month in March 2020. Malware is on the rise, posing a significant threat to
mobile security. To curb the rapid growth of Android malware, a number of well‐performing detection
methods have been proposed. In recent years, many security researchers have focused on solving the
detection problem with machine learning (ML) techniques and have obtained good performance. In
general, ML‐based Android malware detection approaches are categorized as static analysis,2–6
dynamic analysis,7–12 and hybrid analysis.13–16 In the last few years, deep learning (DL) has also
been applied to Android malware detection, thanks to advances in theory and the maturation of
tools.17–20 Static analysis does not require an execution environment to be set up and has
relatively low computational overhead. However, static analysis cannot observe code that is only
revealed during execution. Malware that uses obfuscation techniques (e.g., code encryption and
packing) can be dealt with by dynamic analysis, but some code execution paths may be missed.
Alternatively, researchers classify ML‐based Android malware detection approaches by the features
they consider into two categories: semantic features (e.g., Application Programming Interface [API]
calls, intents, hardware, and permissions) and syntax features (e.g., opcode sequences).21 Almost
all recent ML solutions place greater emphasis on semantic features derived from Dalvik
bytecode.22 Several studies have shown that opcodes and system calls are useful for detecting
malware, and many previous solutions depend on opcodes (or opcode sequences)23,24 and system calls
(or system call sequences).25–27 However, most of these solutions employ only opcode sequences or
only system call sequences, and so inherit the drawbacks of purely static or purely dynamic
analysis described above.
To overcome these limitations, we propose CoDroid, an Android malware detection framework that uses
hybrid sequence features (opcodes + system calls) and is based on DL. Static features (opcode
sequences) are extracted from the bytecode of the applications; for dynamic behavior, we trace the
system call sequences invoked by the running applications. Because the combination of opcodes and
system calls is sequential, we apply a Convolutional Neural Network (CNN)–Bidirectional Long
Short‐Term Memory (BiLSTM)–Attention model that classifies Android applications as benign or
malicious. The experiments demonstrate that CoDroid performs better than several related
approaches.

1.1 | Our contributions

The following is a list of our contributions:

• Hybrid sequence‐based features without manual feature engineering: Our scheme uses a hybrid
sequence‐based feature and does not require feature engineering. We use hybrid sequences of opcodes
and system calls to analyze the behavior of apps, which systematically reveals both their static
and dynamic features.
• Text classification‐based Android malware detection method: CoDroid employs a text classification
method from natural language processing (NLP). Hybrid sequence data are classified by a
CNN–BiLSTM–Attention model, which can find more sensitive semantic information.
• Automatic prototype of CoDroid evaluated on a real‐world data set: We implemented an automatic
prototype of CoDroid and evaluated it on a real‐world data set. The experimental results show that
CoDroid outperforms several traditional ML algorithms and related advanced works.

1.2 | Paper organization

The rest of the paper is organized as follows. Section 2 presents related works, Section 3
describes preliminaries, and Section 4 details the methodology. Section 5 gives the implementation
and experiments, and Section 6 concludes and discusses future work.

2 | RELATED WORKS

2.1 | Malware detection on Android using opcode

To classify Android malware families efficiently, Jiang et al.28 used sensitive opcode sequences.
In this approach, opcodes, sensitive APIs, STRs, and actions are used to create a sensitive opcode
sequence. Furthermore, oversampling was used to improve efficiency. Experiments on Android malware
classification show that the approach achieves high accuracy and area under the curve. Zhao et
al.29 presented and evaluated a deep CNN‐based offline tool that uses opcode sequences as app
features to improve the security of the Android system. Canfora et al.30 designed an approach based
on advanced classifiers using the frequencies of opcode n‐grams. Experiments on a recent data set
of 11,120 applications showed that an average accuracy of 97% can be achieved. ByteDroid31 is a
CNN‐based Android malware detection system that automatically learns malware features. ByteDroid
not only detects malware but also performs well on unknown samples. Furthermore, ByteDroid is
resistant to a number of popular obfuscation techniques.

2.2 | Malware detection on Android using system calls

Canfora et al.32 proposed a method that uses system call sequences as features. Using a training
collection of execution traces, they apply ML to build the fingerprint. They tested their system on
real devices and found it to be 97% effective at detecting malware. SWORD33 captures system calls
from apps and uses a Markov chain to construct a sequential system call graph. SWORD obtained an
accuracy of 94.2% in experiments on a data set containing 2000 Android samples from various
sources. Vinod et al.34 investigated system calls with two feature selection approaches to tackle
Android malware. Human interaction and random inputs are used to generate system calls. In
addition, the detector's robustness against adversarial examples was investigated. DL‐Droid35 is an
automated dynamic analysis framework that employs DL. Experiments on over 30,000 applications were
conducted on real devices. The results reveal that DL‐Droid achieves better performance both with
dynamic features only and with hybrid features.

2.3 | Malware detection on Android using DL method

Droid‐sec17 uses a Deep Belief Network (DBN) to implement a hybrid‐analysis tool for detecting
Android malware, and obtained a 96% accuracy rate. DroidDeep18 starts by extracting five different
types of static features from a number of applications, both goodware and malware. A DBN model is
used to learn the features, and a support‐vector machine (SVM) algorithm is then used to classify
the samples. According to the reported results, DroidDeep achieves a 99.4% accuracy score. A newer
version of DroidDeep36 improves detection performance by employing DL to extract distinctive
behavioral features from apps. DeepRefiner37 is an automatic semantic‐based framework that extracts
relevant features from method‐level bytecode using an LSTM with multiple hidden layers; its
reported accuracy is up to 97.74%. Kim et al.20 introduced the first multimodal approach for
detecting Android malware using DL. For feature extraction, they proposed two new feature
representations: existence‐based and similarity‐based.

2.4 | Malware detection on Android using NLP

McLaughlin et al.38 used raw opcode sequences derived from Android apps as features and applied a
convolutional neural network to detect Android malware. MalDozer39 used sequences of raw API method
calls as static features, and a DL method to recognize malicious samples and attribute them to
families. Large‐scale experiments showed that MalDozer was efficient and effective in malware
detection. In MalDy,40 behavior reports of Android malware were generated by the Droidbox tool, and
the authors then used a bag of words (BoW) to model these reports as sequences of words. MalDy
constructs an ensemble model to detect and attribute malware. TextDroid41 is an automatic and
efficient Android malware detection system that leverages NLP and ML techniques. Word segmentation
and the n‐gram model were used to convert mobile traffic into n‐gram sequences, and TextDroid
applies a feature selection algorithm to capture significant features. The authors then developed a
detection model with an SVM algorithm that achieved good performance on the test set. DySign42 uses
an advanced NLP method to automatically generate fingerprints from the dynamic behaviors of Android
malware. DySign combines Android malware detection with family classification and achieved good
detection performance with high scalability in an evaluation on a real‐life data set. Overall, most
existing NLP‐based solutions use traditional ML methods and therefore require feature engineering.
Our method, on the other hand, employs a hybrid sequence‐based feature that does not require
feature engineering and uses an advanced DL algorithm. As a result, our method can capture more
sensitive features and thus achieve better performance.

3 | BACKGROUND

3.1 | Word embedding

Word embedding is an important part of NLP whose function is to map the input into a
low‐dimensional vector representation. By giving related words similar vector representations,
word embedding can help reduce computational complexity. Word embedding is widely used in different
tasks, such as syntax analysis and sentiment analysis. When used as a low‐level input
representation, word embedding has been shown to improve the performance of NLP tasks. Popular
methods include Word2vec43 and GloVe.44 For text classification, word embedding can capture the
syntactic and semantic similarity between the words of a document.
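
As a minimal illustration of the lookup that an embedding layer performs, the following Python sketch assigns each token of an opcode or system call sequence an integer id and maps it to a row of an embedding table. The table is randomly initialized and merely stands in for trained Word2vec vectors; the helper names (build_vocab, init_embeddings, embed, cosine), the 4‐dimensional vectors, and the sample tokens are assumptions made for illustration and are not taken from CoDroid.

```python
import math
import random


def build_vocab(sequences):
    """Assign each distinct token an integer id; id 0 is reserved for padding."""
    vocab = {"<pad>": 0}
    for seq in sequences:
        for token in seq:
            vocab.setdefault(token, len(vocab))
    return vocab


def init_embeddings(vocab_size, dim, seed=0):
    """Random table standing in for trained Word2vec vectors (assumption)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(vocab_size)]


def embed(seq, vocab, table, max_len):
    """Map tokens to vectors, then pad or truncate to exactly max_len rows."""
    ids = [vocab[token] for token in seq][:max_len]
    ids += [0] * (max_len - len(ids))
    return [table[i] for i in ids]


def cosine(u, v):
    """Cosine similarity, the usual way to compare two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical sequences: one of Dalvik opcodes, one of system calls.
sequences = [["invoke-virtual", "move-result", "return-void"],
             ["openat", "read", "close"]]
vocab = build_vocab(sequences)
table = init_embeddings(len(vocab), dim=4)
matrix = embed(sequences[0], vocab, table, max_len=5)
```

With trained vectors, cosine would be expected to score related tokens higher than unrelated ones; with the random table used here it only demonstrates the mechanics.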

3.2 | CNN

CNNs are a special kind of neural network most often used to process image data, for example in
image classification and image recognition. CNNs are also commonly used in a variety of other
fields, including face recognition, text classification, and target segmentation. For text
classification, CNNs have been shown to be very efficient. Each input is a k × n‐dimensional
feature map, where n is the number of sentences in the input and k is the dimension of each
sequence vector (padded as needed). The feature map is then subjected to max‐over‐time pooling,
which keeps the highest value produced by each filter. The purpose is to capture the most important
feature of each feature map.
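
The convolution and max‐over‐time pooling steps described above can be sketched in plain Python as follows. This is a toy illustration under stated assumptions: a single filter, a window of two tokens, and made‐up 2‐dimensional token vectors and weights; conv1d and max_over_time are hypothetical helper names.

```python
def relu(x):
    """Rectified linear unit."""
    return x if x > 0.0 else 0.0


def conv1d(embedded, kernel, bias):
    """Slide one filter over windows of m consecutive token vectors.

    embedded: list of token vectors; kernel: m weight vectors of the same size.
    """
    m = len(kernel)
    features = []
    for i in range(len(embedded) - m + 1):
        total = bias
        for row, weights in zip(embedded[i:i + m], kernel):
            total += sum(x * w for x, w in zip(row, weights))
        features.append(relu(total))
    return features


def max_over_time(features):
    """Keep only the strongest response of the filter along the sequence."""
    return max(features)


# Made-up inputs: four tokens with 2-dimensional embeddings, window size m = 2.
embedded = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
kernel = [[1.0, -1.0], [0.5, 0.5]]
feature_map = conv1d(embedded, kernel, bias=0.0)   # [1.5, 0.0, 0.0]
pooled = max_over_time(feature_map)                # 1.5
```

A real layer applies many such filters in parallel, so pooling yields one value per filter.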

3.3 | LSTM/BiLSTM

A Long Short‐Term Memory (LSTM)45 network is a special kind of Recurrent Neural Network (RNN)
architecture. LSTM networks alleviate the long‐term dependence problem and are therefore well
suited to processing sequence information. Extensive experiments have shown that LSTM networks are
successful in machine translation, sentiment classification, image captioning, and other tasks. In
an LSTM network, the input sequence is processed word by word. At time step $t$, the memory $c_t$
and hidden state $h_t$ are updated using the following equations:

\[
\begin{bmatrix} i_t \\ f_t \\ o_t \\ \tilde{c}_t \end{bmatrix}
=
\begin{bmatrix} \sigma \\ \sigma \\ \sigma \\ \tanh \end{bmatrix}
W \cdot [h_{t-1}, x_t] + b, \tag{1}
\]

\[ c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \tag{2} \]

\[ h_t = o_t \odot \tanh(c_t). \tag{3} \]

However, a single LSTM can only retain information from one direction, namely the part of the
sequence that precedes the current position. In CoDroid, we employ BiLSTM, an extension of the LSTM
network. A bidirectional LSTM network can extract full‐text context information effectively because
it summarizes sequence information from both directions.
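
To make the update rule concrete, the sketch below implements one LSTM step following Equations (1)–(3) and a bidirectional pass that concatenates forward and backward hidden states. It reads Equation (1) in the usual way, applying σ and tanh elementwise to the pre‐activation W · [h_{t−1}, x_t] + b; that reading, the gate ordering inside W, the zero initial states, and the toy weights are all assumptions made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM update. W has 4*H rows (i, f, o, candidate) and H+D columns."""
    H = len(h_prev)
    z = h_prev + x_t  # concatenation [h_{t-1}, x_t]
    pre = [sum(w * v for w, v in zip(row, z)) + bias for row, bias in zip(W, b)]
    i = [sigmoid(p) for p in pre[0:H]]
    f = [sigmoid(p) for p in pre[H:2 * H]]
    o = [sigmoid(p) for p in pre[2 * H:3 * H]]
    g = [math.tanh(p) for p in pre[3 * H:4 * H]]  # candidate memory
    c_t = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]  # Eq. (2)
    h_t = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]                  # Eq. (3)
    return h_t, c_t


def run_lstm(xs, W, b, H):
    """Run the LSTM over a sequence from zero initial states; return all h_t."""
    h, c = [0.0] * H, [0.0] * H
    outputs = []
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, W, b)
        outputs.append(h)
    return outputs


def bilstm(xs, W_fwd, b_fwd, W_bwd, b_bwd, H):
    """Concatenate forward states with backward states aligned by position."""
    forward = run_lstm(xs, W_fwd, b_fwd, H)
    backward = run_lstm(xs[::-1], W_bwd, b_bwd, H)[::-1]
    return [fw + bw for fw, bw in zip(forward, backward)]


# Toy dimensions and weights, chosen only so the example runs.
H, D = 2, 3
W = [[0.1 * ((r + c) % 3 - 1) for c in range(H + D)] for r in range(4 * H)]
b = [0.0] * (4 * H)
xs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
states = bilstm(xs, W, b, W, b, H)
```

Each element of states has 2H entries, so every position sees context from both the preceding and the following tokens.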

3.4 | Attention mechanism

The attention mechanism46 is a significant breakthrough in the area of DL. It was initially created
to tackle the problem of encoding long sequences in Seq2Seq models. It is now widely used in ML
applications such as machine translation, sentiment classification, automatic summarization,
automatic question answering, and dependency analysis.

4 | METHODOLOGY

4.1 | Overall architecture

We now introduce CoDroid, a system that employs DL‐based hybrid analysis. Figure 1 depicts
CoDroid's architecture, which consists of three main components: preprocessing, sequence
generation, and the DL model. The training data for CoDroid are behavior sequences from benign
Android applications and malware. The rest of this section details the three components.

4.2 | Preprocessing

We first describe how Android apps are handled in the preprocessing stage. An Android app is first
decompiled and then executed in an Android emulator in preparation for collecting system calls.
Through reverse engineering, we obtain the Dalvik Executable (DEX) file, a compiled Android
application code file that includes details on all of the project's class files. How system calls
are extracted while the app is running is detailed in the next stage.

FIGURE 1 Architecture of CoDroid


4.3 | Sequence generation

4.3.1 | Opcode sequence

After preprocessing, as shown in Figure 1, CoDroid uses static analysis to extract opcode
sequences. Each smali file defines a single class and all of its methods. Each method consists of
many instructions, and each instruction is made up of a single Dalvik opcode and several operands.
In this phase, we extract the opcode sequence of each method and discard the operands. Figure 2
shows an example .smali file from the benign APK sample “suannihen.apk”
(MD5:909cbf6548ee4951882a69408d3bd333).
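
The opcode extraction step can be sketched as a small parser over the text of a .smali file. This is a simplified illustration and not the authors' implementation: it assumes the usual smali layout in which directives start with ".", labels with ":", and comments with "#", and it skips annotation and switch/array payload blocks. The function name extract_opcodes and the sample class are hypothetical.

```python
# Directives that open a block whose body lines are data, not instructions.
BLOCK_DIRECTIVES = (".annotation", ".array-data", ".packed-switch", ".sparse-switch")


def extract_opcodes(smali_text):
    """Return one list of opcodes (operands discarded) per method."""
    methods, current, skipping = [], None, False
    for raw in smali_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # blank line or comment
        if line.startswith(".method"):
            current, skipping = [], False
        elif line.startswith(".end method"):
            if current is not None:
                methods.append(current)
            current = None
        elif current is None:
            continue                      # outside any method body
        elif line.startswith(BLOCK_DIRECTIVES):
            skipping = True               # payload or annotation body follows
        elif line.startswith(".end "):
            skipping = False
        elif not skipping and line[0] not in ".:":
            current.append(line.split()[0])  # first token is the opcode
    return methods


# Hypothetical smali text, not the sample shown in Figure 2.
sample = """
.class public Lcom/example/Demo;
.super Ljava/lang/Object;

.method public constructor <init>()V
    .locals 0
    invoke-direct {p0}, Ljava/lang/Object;-><init>()V
    return-void
.end method

.method public add(II)I
    .locals 1
    # sum the two arguments
    add-int v0, p1, p2
    :done
    return v0
.end method
"""
opcode_sequences = extract_opcodes(sample)
# [['invoke-direct', 'return-void'], ['add-int', 'return']]
```

Concatenating the per‐method lists in file order gives one opcode "sentence" per class; how the sequences of different classes are ordered and joined is left open here.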

4.3.2 | System calls sequence

System calls are generated while the application is running and describe how services are
exchanged between the application and the Linux kernel of the Android platform. System call
sequences can therefore reveal the dynamic behavior of applications and help identify them as
benign or malicious. To obtain this feature, we use Strace47 scripts to log the system traces of
running applications. At run time, Monkey48 is used to produce a variety of system events and
emulate User Interface (UI) interactions. Figure 3 shows a segment of the system call list
collected from the benign APK sample “suannihen.apk” (MD5:909cbf6548ee4951882a69408d3bd333).
The process of extracting system calls is detailed in Algorithm 1.

Algorithm 1 Extract system calls

Input: apk_dataset: The collection of apk samples
Output: SysCall: System calls of apk_dataset
for each apk in apk_dataset do
    Package_name = getPackage_name(apk);
    Activity_name = getActivity_name(apk);
    launchApk(Activity_name);
    PID = getPID(Package_name);
    stracePID(PID);
    killPID(PID);
    sleep(5);
    uninstall(Package_name);
end for
return SysCall
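
Algorithm 1 leaves open how a logged trace becomes a sequence of call names. One plausible post‐processing step is sketched below in Python: it keeps only the system call name at the start of each strace line and ignores signal and exit markers. The sketch assumes strace's default line format, optionally prefixed by a process id; the function name parse_strace and the sample log are invented for illustration, and options that add timestamps would need a different pattern.

```python
import re

# Optional "[pid N] " or bare "N " prefix, then the call name before "(".
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+)?(?:\d+\s+)?([a-z_][a-z0-9_]*)\(")


def parse_strace(log_text):
    """Return the ordered list of system call names found in a strace log."""
    calls = []
    for line in log_text.splitlines():
        match = SYSCALL_RE.match(line.strip())
        if match:
            calls.append(match.group(1))
    return calls


# Invented log excerpt in strace's default output style.
sample_log = """\
openat(AT_FDCWD, "/data/app/base.apk", O_RDONLY) = 3
[pid  1234] read(3, "PK", 2) = 2
--- SIGCHLD {si_signo=SIGCHLD} ---
<... futex resumed>) = 0
close(3) = 0
+++ exited with 0 +++
"""
trace = parse_strace(sample_log)   # ['openat', 'read', 'close']
```

Lines that resume an interrupted call are dropped here because the call name was already recorded when the call started.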

4.4 | DL model

We employ the CNN–BiLSTM–Attention model49 in our approach. As illustrated in Figure 4, it contains
a sequence embedding layer, a CNN layer, a BiLSTM layer, an attention layer, and an output layer.
The BiLSTM layer provides global sequence features. The attention layer focuses on the words that
contribute most to the benign or malicious classification, which helps in understanding the
semantics of sentences. Next, we go through each layer of the model in detail, from bottom to top.

FIGURE 2 Instance of Dalvik opcode sample in the .smali file

FIGURE 3 Strace output of a sample application

FIGURE 4 Architecture of the CNN–BiLSTM–Attention model.49 BiLSTM, Bidirectional Long Short‐Term
Memory; CNN, Convolutional Neural Network; LSTM, Long Short‐Term Memory

• Sequence embedding layer: In our model, the input consists of sequences of opcodes and system
calls. This layer maps the words $w_1, w_2, \ldots, w_N$ into a low‐dimensional vector space
$\mathbb{R}^W$, where $N$ is the number of words in a sequence and $W$ denotes the size of the
embedding layer. In this paper, we employ Word2vec43 to embed words.
• CNN layer: We use the CNN layer to extract local, high‐level features from the input textual
sequences and to reduce the dimension of the word vectors generated by word embedding. The window
of words $x_{i:i+m-1}$ generates the $n$th feature sequence $S_n$ as

\[ S_n = \mathrm{relu}\left(W_c^{T} x_{i:i+m-1} + b\right), \tag{4} \]

where relu(·) represents a nonlinear activation function and $b$ is a bias vector.
• BiLSTM layer: By constructing forward and backward LSTMs over the hidden units, we obtain a
high‐level feature representation. The output of the CNN layer is the input of the BiLSTM layer.
At time step $j$, the output of the BiLSTM is

\[ h_j = [\overrightarrow{h_j} \oplus \overleftarrow{h_j}]. \tag{5} \]

• Attention layer: It is evident that each word contributes differently to the malicious or benign
properties of behaviors. To extract malicious features (words) that are significant to reveal the
behavior properties, the features (words) are assigned different weights based on the attention
mechanism.50 Finally, we obtain the sentence representation by aggregating the representations of
those malicious features (words), as shown in

\[ e_{ij} = a(s_{i-1}, h_j), \tag{6} \]

\[ \alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})}, \tag{7} \]

\[ c_i = \sum_{j=1}^{T_x} \alpha_{ij} h_j. \tag{8} \]

• Output layer: In CoDroid, we feed the vector representations into a fully connected soft‐max
layer, which generates conditional probabilities to accomplish the classification task. This layer
outputs the probability of the classes Benign and Malicious. Finally, CoDroid returns a
classification decision in the form of a score ranging from 0 to 1, where 0 denotes Benign and 1
stands for Malware, as shown in

\[ p = \frac{e^{W r + b}}{\sum_{i \in [1, L]} \left(e^{W_i r + b}\right)}. \tag{9} \]
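
The attention and output layers above can be illustrated with a short Python sketch. Equation (6) leaves the scoring function a(·) unspecified, so the sketch assumes a simple dot product between each hidden state and a query vector; that choice, the per‐class bias in the soft‐max layer, the function names, and all numbers are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Normalize scores to probabilities, as in Equations (7) and (9)."""
    peak = max(scores)                      # subtract the max for stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Weight each hidden state h_j and sum them into one context vector."""
    # Assumed score function: dot product of h_j with the query vector.
    scores = [sum(hk * qk for hk, qk in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)                                   # alpha_j, Eq. (7)
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states))
               for d in range(dim)]                             # Eq. (8)
    return weights, context


def classify(context, W, b):
    """Fully connected layer followed by soft-max over the classes."""
    logits = [sum(w * c for w, c in zip(row, context)) + bias
              for row, bias in zip(W, b)]
    return softmax(logits)


# Made-up BiLSTM outputs for three tokens and a made-up query.
hidden_states = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
weights, context = attention_pool(hidden_states, query=[1.0, 1.0])
probs = classify(context, W=[[1.0, 0.0], [0.0, 1.0]], b=[0.0, 0.0])
```

In this toy input the third token scores highest and so receives the largest attention weight, which is the behavior the attention layer relies on to emphasize the tokens most indicative of malicious behavior.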

5 | EXPERIMENTS

This section evaluates the detection performance of CoDroid as described in Section 4. All
experiments were run on a machine running Ubuntu 18.04 with an Intel Xeon E5‐2620 v4 CPU
@ 2.10 GHz and 32 GB RAM. We use Genymotion51 to execute apps and collect system calls. The modules
of CoDroid were implemented with Android Debug Bridge (ADB)52 and Python scripts, and the model was
built with Keras. The model parameters were optimized using Adadelta with a learning rate of 1e2
for 20 epochs and a minibatch size of 64 during training. We now evaluate CoDroid experimentally in
terms of efficiency and effectiveness.

5.1 | Data set

In our evaluation, the benign applications are collected from the PlayDrone data set.53 We gather
apps from the DREBIN data set2 to construct our malware collection. To clean the data set, we use
the AndroGuard54 tool to remove duplicate and invalid applications. In addition, all applications
are verified by the online malware detection engine VirusTotal.55 The final data set contains 2978
malicious applications and 2707 benign ones.

5.2 | Evaluation parameters

We use four main metrics to measure our malware detection approach: F1‐score, precision, recall,
and accuracy. A true positive (TP) is a malware sample correctly classified as malicious, a false
positive (FP) is a benign sample misclassified as malware, a true negative (TN) is a benign app
correctly classified as benign, and a false negative (FN) is a malware sample classified as benign.
Accuracy is the ratio of correctly classified samples to the total number of apps.

\[ \mathrm{Accuracy} = \frac{TN + TP}{TN + TP + FN + FP}. \tag{10} \]

Precision is the ratio of correctly identified malware samples to all samples classified as
malicious.

\[ \mathrm{Precision} = \frac{TP}{TP + FP}. \tag{11} \]

Recall is the ratio of correctly identified malware samples to all malware samples in the data
set.

\[ \mathrm{Recall} = \frac{TP}{TP + FN}. \tag{12} \]

F1‐score is defined as

\[ \mathrm{F1\text{-}score} = 2 \cdot \frac{\mathrm{Recall} \cdot \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}}. \tag{13} \]
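
As a worked example of Equations (10)–(13), the following Python sketch computes the four metrics from confusion‐matrix counts, with malware as the positive class. The counts are invented for illustration and are not results from the paper; the function name detection_metrics is hypothetical, and the sketch does not guard against empty denominators.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall and F1-score from raw counts."""
    accuracy = (tn + tp) / (tn + tp + fn + fp)           # Eq. (10)
    precision = tp / (tp + fp)                           # Eq. (11)
    recall = tp / (tp + fn)                              # Eq. (12)
    f1 = 2 * recall * precision / (recall + precision)   # Eq. (13)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Invented counts: 105 malware samples and 95 benign samples.
metrics = detection_metrics(tp=90, fp=10, tn=85, fn=15)
# accuracy = 175/200 = 0.875, precision = 90/100 = 0.9,
# recall = 90/105 ≈ 0.857, F1 = 180/205 ≈ 0.878
```

Swapping the roles of the two classes (benign as positive) gives per‐class values for benign apps.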

5.3 | Performance evaluation

5.3.1 | Run‐time performance

CoDroid can conduct an analysis in about 80 s per Android app. In addition, Figure 5
depicts the time needed for sequence generation and training for a variety of Android apps.
We can see that as the number of Android apps grows, the processing time grows almost
linearly.

FIGURE 5 Process of CoDroid with different number of Android apps: (A) sequence generation and (B) training

TABLE 1 Detection performance comparison

Algorithm  F1‐score (M)  Recall (M)  Precision (M)  F1‐score (B)  Recall (B)  Precision (B)  Overall accuracy
KNN        0.96          0.94        0.97           0.94          0.96        0.92           0.95
LR         0.95          0.96        0.94           0.93          0.92        0.94           0.94
NB         0.89          0.88        0.90           0.86          0.87        0.85           0.88
RF         0.95          0.96        0.94           0.95          0.94        0.96           0.94
CoDroid    0.96          0.93        0.98           0.96          0.98        0.95           0.97

Note: The bold values stand for highest value of every metric.
Abbreviations: B, benign apps; KNN, K‐nearest neighbor; LR, logistic regression; M, malware ones;
NB, naive Bayes; RF, random forest.

5.3.2 | Detection performance

We compare the performance of CoDroid with several traditional ML algorithms: K‐nearest neighbor
(KNN), Naive Bayes (NB), Logistic Regression (LR), and Random Forest (RF). Table 1 shows the
comprehensive results. The proposed model exceeds the common ML methods on most of the performance
metrics. For malware apps, our model has the highest F1‐score and precision; for benign apps, it
achieves the highest F1‐score and recall. In particular, our model obtained an overall accuracy of
97%.

5.3.3 | Comparison with related detection methods

We also compare CoDroid with related detection systems from the literature. In Table 2, we compare
the performance of CoDroid with the opcode sequence‐based approach presented by McLaughlin et
al.,38 two methods based on opcodes, and a method based on sequences of system calls. In our
evaluation, CoDroid outperforms the other approaches on almost all of the key metrics. In
particular, CoDroid achieves an accuracy of 97.6%.

TABLE 2 Detection performance comparison

Approach                     Model                  Accuracy  F1‐score  Precision  Recall
opcode n‐grams (n = 3)56     RF                     0.946     0.945     0.951      0.935
opcode‐LSTM57                LSTM                   0.938     0.936     0.957      0.895
Sequence of system calls58   SVM                    0.933     0.906     0.944      0.876
McLaughlin et al.38          CNN                    0.948     0.947     0.953      0.935
CoDroid                      CNN–BiLSTM–Attention   0.976     0.986     0.954      0.978

Note: The bold values stand for highest value of every metric.
Abbreviations: BiLSTM, Bidirectional Long Short‐Term Memory; CNN, Convolutional Neural Network;
LSTM, Long Short‐Term Memory; RF, random forest; SVM, support‐vector machine.

6 | CONCLUSIONS AND FUTURE WORK

In this paper, we proposed CoDroid, an automated sequence‐based hybrid analysis system for Android
malware detection using DL. To improve malware detection for Android‐based systems, we extract
sequence‐based features that combine opcode sequences with system call sequences. We use a
CNN–BiLSTM–Attention model to automatically learn sensitive features from Android apps. The results
of our evaluation show the promise of this approach, as CoDroid outperforms other similar methods
and can differentiate malware with reasonable accuracy. In the future, we will try to support the
analysis of natively compiled libraries and richer dynamic features from a sandbox (e.g., Cuckoo)
using real devices.

ACKNOWLEDGMENTS
We gratefully acknowledge the support of the National Natural Science Foundation of China
under Grant Nos. U1936218 and 61876019, and Zhejiang Lab (Grant No. 2020LE0AB02).

ORCID
Nan Zhang https://orcid.org/0000-0001-7328-6159
Jingfeng Xue https://orcid.org/0000-0002-3087-9701
Yuxi Ma https://orcid.org/0000-0001-5639-5214
Ruyun Zhang https://orcid.org/0000-0002-4173-2668
Tiancai Liang https://orcid.org/0000-0002-6334-0411
Yu‐an Tan https://orcid.org/0000-0001-6404-8853

REFERENCES
1. Development of Android malware worldwide 2016‐2020. Statista [Online]. Available at
https://www.statista.com/statistics/680705/global-android-malware-volume/
2. Arp D, Spreitzenbarth M, Hubner M, Gascon H, Rieck K. DREBIN: effective and explainable detection of
Android malware in your pocket. In: NDSS, San Diego, CA, USA; 2014;14:23‐26.
3. Onwuzurike L, Mariconti E, Andriotis P, Cristofaro ED, Ross G, Stringhini G. MaMaDroid: detecting
Android malware by building Markov chains of behavioral models (extended version). ACM Trans Priv
Secur. 2019;22(2). https://doi.org/10.1145/3313391
4. Aafer Y, Du W, Yin H. DroidAPIMiner: mining API‐level features for robust malware detection in Android.
Lect Notes Inst Comput Sci Soc Inf Telecommun Eng. 2013;127:86‐103.
5. Yang C, Xu Z, Gu G, Yegneswaran V, Porras PA. DroidMiner: automated mining and characterization of
fine‐grained malicious behaviors in Android applications. In: Kutyłowski M, Vaidya J, eds. Computer
Security ‐ ESORICS 2014. Cham: Springer International Publishing; 2014:163‐182.
6. Zhang M, Duan Y, Yin H, Zhao Z. Semantics‐aware Android malware classification using weighted
contextual API dependency graphs. In: Proceedings of the 2014 ACM SIGSAC Conference on Computer and
Communications Security. New York, NY, USA: Association for Computing Machinery; 2014:1105‐1116.
7. Yan LK, Yin H. Droidscope: seamlessly reconstructing the OS and Dalvik semantic views for dynamic
Android malware analysis. In: Proceedings of the 21st USENIX Conference on Security Symposium,
Security'12. USA: USENIX Association; 2012:569‐584.
8. Bläsing T, Batyuk L, Schmidt A, Çamtepe S, Albayrak S. An Android application sandbox system for
suspicious software detection. In: 2010 5th International Conference on Malicious and Unwanted Software.
Nancy, France: IEEE; 2010:55‐62.
9. Saracino A, Sgandurra D, Dini G, Martinelli F. MADAM: effective and efficient behavior‐based Android
malware detection and prevention. IEEE Trans Depend Secure Comput. 2016;15(1):83‐97.
10. Enck W, Gilbert P, Han S, et al. TaintDroid: an information‐flow tracking system for realtime privacy
monitoring on smartphones. ACM Trans Comput Syst (TOCS). 2014;32(2):1‐29.
https://doi.org/10.1145/2619091
11. Shabtai A, Kanonov U, Elovici Y, Glezer C, Weiss Y. Andromaly: a behavioral malware detection
framework for Android devices. J Intell Inf Syst. 2012;38(1):161‐190.
12. Burguera I, Zurutuza U, Nadjm‐Tehrani S. Crowdroid: behavior‐based malware detection system for
Android. In: Proceedings of the 1st ACM Workshop on Security and Privacy in Smartphones and Mobile
Devices, SPSM '11. New York, NY, USA: Association for Computing Machinery; 2011:15‐26.
13. Chen S, Xue M, Tang Z, Xu L, Zhu H. Stormdroid: a streaminglized machine learning‐based system for
detecting Android malware. In: Proceedings of the 11th ACM on Asia Conference on Computer and
Communications Security, ASIA CCS '16. New York, NY, USA: Association for Computing Machinery;
2016:377‐388.
14. Lindorfer M, Neugschwandtner M, Platzer C. MARVIN: efficient and comprehensive mobile app
classification through static and dynamic analysis. In: Proceedings of the 2015 IEEE 39th Annual
Computer Software and Applications Conference, COMPSAC '15. USA: IEEE Computer Society; 2015;2:422‐433.
15. Spreitzenbarth M, Freiling F, Echtler F, Schreck T, Hoffmann J. Mobile‐sandbox: having a deeper look into
Android applications. In: Proceedings of the 28th Annual ACM Symposium on Applied Computing, SAC'13.
New York, NY, USA: Association for Computing Machinery; 2013:1808‐1815.
16. Cai H, Fu X, Hamou‐Lhadj A. A study of run‐time behavioral evolution of benign versus malicious apps in
Android. Inf Softw Technol. 2020;122:106291.
17. Yuan Z, Lu Y, Wang Z, Xue Y. Droid‐sec: deep learning in Android malware detection. In: Proceedings of
the 2014 ACM Conference on SIGCOMM. 2014:371‐372.
18. Su X, Zhang D, Li W, Zhao K. A deep learning approach to Android malware feature learning and
detection. In: 2016 IEEE Trustcom/BigDataSE/ISPA. Tianjin, China: IEEE; 2016:244‐251.
19. Amin M, Tanveer TA, Tehseen M, Khan M, Khan FA, Anwar S. Static malware detection and
attribution in Android byte‐code through an end‐to‐end deep system. Future Gener Comput Syst.
2020;102:112‐126.
20. Kim T, Kang B, Rho M, Sezer S, Im EG. A multimodal deep learning method for Android malware
detection using various features. IEEE Trans Inf Forensics Secur. 2018;14(3):773‐788.
21. Chen S, Xue M, Fan L, et al. Automated poisoning attacks and defenses in malware detection systems: an
adversarial machine learning approach. Comput Secur. 2018;73:326‐344.
22. Chen X, Li C, Wang D, et al. Android HIV: a study of repackaging malware for evading machine‐learning
detection. IEEE Trans Inf Forensics Secur. 2019;15:987‐1001.
23. Li D, Zhao L, Cheng Q, Lu N, Shi W. Opcode sequence analysis of Android malware by a convolutional
neural network. Concurrency Comput: Pract Exper. 2020;32(18):e5308.1‐e5308.18.
24. Jerome Q, Allix K, State R, Engel T. Using opcode‐sequences to detect malicious Android applications.
In: 2014 IEEE International Conference on Communications (ICC). Sydney, NSW, Australia: IEEE;
2014:914‐919.
25. Das PK, Joshi A, Finin TW. App behavioral analysis using system calls. In: 2017 IEEE Conference on
Computer Communications Workshops (INFOCOM WKSHPS). Atlanta, GA, USA: IEEE; 2017:487‐492.
26. Malik S, Khatter K. System call analysis of Android malware families. Indian J Sci Technol. 2016;9(21):1‐13.
27. Althelaya KA, El‐Alfy E‐SM. Android malware detector based on sequences of system calls and bidirectional
recurrent networks. In: Thampi SM, Martinez Perez G, Ko R, Rawat DB, eds. Security in Computing
and Communications. Singapore: Springer Singapore; 2020:309‐321.
28. Jiang J, Li S, Yu M, et al. Android malware family classification based on sensitive opcode sequence.
In: 2019 IEEE Symposium on Computers and Communications (ISCC). Barcelona, Spain: IEEE; 2019:1‐7.
29. Zhao L, Li D, Zheng G, Shi W. Deep neural network based on Android mobile malware detection system
using opcode sequences. In: 2018 IEEE 18th International Conference on Communication Technology
(ICCT). Chongqing, China: IEEE; 2018:1141‐1147.
30. Canfora G, Lorenzo AD, Medvet E, Mercaldo F, Visaggio CA. Effectiveness of opcode ngrams for detection
of multi family Android malware. In: 2015 10th International Conference on Availability, Reliability and
Security. Toulouse, France: IEEE; 2015:333‐340.
31. Kewen Z, Xi L, Pengfei L, Weiping W, Hao‐dong W. ByteDroid: Android malware detection using deep
learning on bytecode sequences. Commun Comput Inf Sci. 2020;1149:159‐176.
32. Canfora G, Medvet E, Mercaldo F, Visaggio CA. Detecting Android malware using sequences of system calls.
In: Proceedings of the 3rd International Workshop on Software Development Lifecycle for Mobile, DeMobile
2015. New York, NY, USA: Association for Computing Machinery; 2015:13‐20.
33. Bhandari S, Panihar R, Naval S, Laxmi V, Zemmari A, Gaur MS. SWORD: semantic aware Android
malware detector. J Inf Secur Appl. 2018;42:46‐56.
34. Vinod P, Zemmari A, Conti M. A machine learning based approach to detect malicious Android apps using
discriminant system calls. Future Gener Comput Syst. 2019;94:333‐350.
35. Alzaylaee MK, Yerima SY, Sezer S. DL‐Droid: deep learning based Android malware detection using real
devices. Comput Secur. 2020;89:101663.
36. Su X, Shi W, Qu X, Zheng Y, Liu X. DroidDeep: using deep belief network to characterize and detect
Android malware. Soft Comput. 2020;24(8):6017‐6030.
37. Xu K, Li Y, Deng R, Chen K. DeepRefiner: multi‐layer Android malware detection system applying deep
neural networks. In: 2018 IEEE European Symposium on Security and Privacy (EuroS&P). Los Alamitos,
CA, USA: IEEE Computer Society; 2018:473‐487.
38. McLaughlin N, Rincón JMD, Kang B, et al. Deep Android malware detection. In: Proceedings of the Seventh
ACM on Conference on Data and Application Security and Privacy, CODASPY '17. New York, NY, USA:
Association for Computing Machinery; 2017:301‐308.
39. Karbab E, Debbabi M, Derhab A, Mouheb D. MalDozer: automatic framework for Android malware
detection using deep learning. Digit Invest. 2018;24:S48‐S59.
40. Karbab EB, Debbabi M. MalDy: portable, data‐driven malware detection using natural language processing
and machine learning techniques on behavioral analysis reports. Digit Invest. 2019;28:S77‐S87.
41. Wang S, Yan Q, Chen Z, Yang B, Zhao C, Conti M. TextDroid: semantics‐based detection of mobile
malware using network flows. In: 2017 IEEE Conference on Computer Communications Workshops
(INFOCOM WKSHPS), MobiSec 2017. Atlanta, GA, USA: IEEE; 2017:18‐23.
42. Karbab E, Debbabi M, Alrabaee S, Mouheb D. DySign: dynamic fingerprinting for the automatic detection
of Android malware. In: 2016 11th International Conference on Malicious and Unwanted Software
(MALWARE). Los Alamitos, CA, USA: IEEE Computer Society; 2016:1‐8.
43. Mikolov T, Sutskever I, Chen K, Corrado G, Dean J. Distributed representations of words and phrases and
their compositionality. In: Proceedings of the 26th International Conference on Neural Information
Processing Systems, NIPS'13. Red Hook, NY, USA: Curran Associates Inc; 2013;2:3111‐3119.
44. Pennington J, Socher R, Manning CD. GloVe: global vectors for word representation. In: Proceedings of the
2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Doha, Qatar: Association
for Computational Linguistics; 2014:1532‐1543.
45. Hochreiter S, Schmidhuber J. Long short‐term memory. Neural Comput. 1997;9(8):1735‐1780.
46. Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. 2014.
arXiv preprint arXiv:1409.0473.
47. Strace. http://linux.die.net/man/1/strace
48. Android monkey. https://developer.Android.com/studio/test/monkey.html
49. Liu G, Guo J. Bidirectional LSTM with attention mechanism and convolutional layer for text classification.
Neurocomputing. 2019;337:325‐338.
50. Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. 2017. arXiv preprint arXiv:1706.03762.
51. Genymotion Android Emulator. https://www.genymotion.com/
52. Android Debug Bridge. http://developer.Android.com/tools/help/adb.html
53. Viennot N, Garcia E, Nieh J. A measurement study of google play. In: The 2014 ACM International
Conference on Measurement and Modeling of Computer Systems, SIGMETRICS '14. New York, NY, USA:
Association for Computing Machinery; 2014:221‐233.
54. Androguard. https://github.com/androguard/androguard
55. VirusTotal. https://www.virustotal.com/
56. Kassadinsw. https://github.com/Kassadinsw/AndroidMalware‐ngram‐RF
57. opcode‐LSTM. https://github.com/ndhpro/opcode‐lstm
58. Akhilesh64. https://github.com/Akhilesh64/Android‐Malware‐Detection

How to cite this article: Zhang N, Xue J, Ma Y, Zhang R, Liang T, Tan Y‐A. Hybrid
sequence‐based Android malware detection using natural language processing.
Int J Intell Syst. 2021;36:5770‐5784. https://doi.org/10.1002/int.22529
