
Wireless Personal Communications

https://doi.org/10.1007/s11277-022-09765-0

An Efficient Android Malware Detection Using Adaptive Red Fox Optimization Based CNN

P. C. Senthil Mahesh1 · S. Hemalatha2

Accepted: 5 May 2022


© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022

Abstract
Android smartphones are widely used because of their flexible programming system and their many user-oriented features for daily life. With the substantial growth of smartphone technologies, cyber-attacks against such devices have surged at an exponential rate. The majority of smartphone users grant permissions blindly to arbitrary applications, which weakens the efficiency of the authorization mechanism. Numerous approaches have been established for malware detection, but their results are ineffective owing to limitations such as a low identification rate, a low malware detection rate and limited category detection. Therefore, this paper proposes a convolutional neural network based adaptive red fox optimization (CNN-ARFO) approach to classify Android applications as benign or malware. The proposed approach comprises three phases: pre-processing, feature extraction and detection. In the pre-processing phase, the features of the selected dataset are normalized with the MinMax technique. In the feature extraction phase, the malicious APKs and the collected benign apps are investigated to identify and extract the features essential to the functioning of malware. Finally, the Android applications are classified using the CNN based ARFO approach. The results of detecting benign and malicious applications are demonstrated by evaluating the model accuracy rate, model loss rate, accuracy, precision, recall and F-measure. The results show that the proposed approach achieves a detection accuracy of 97.29%.

Keywords Android phones · Malware applications · Convolution neural network · Red fox · Databases · Detection rate

* P. C. Senthil Mahesh
pcsenthilmahesh@gmail.com
1 Department of Computer Science and Engineering, Excel Engineering College, Komarapalayam, Namakkal District, Tamil Nadu, India
2 Department of Computer Science and Engineering, Panimalar Institute of Technology, Chennai, Tamil Nadu, India


1 Introduction

Since 2021, the popularity of Android phones has risen steadily owing to their ease of access, applications and functionalities. Among the numerous mobile operating systems, Google Android is the most widely used [1]. As reported by Strategy Analytics, Android holds 87.5% of the global smartphone market share. Professionals at the AV-TEST foundation counted approximately 1.61 million malicious applications during recent years. The most significant reasons for such substantial growth are the injecting, repacking and cloning of malicious app variants [2]. In addition, malware applications pose a continuous threat that includes bank fraud, deception, unethical use and monetary gain for an attacker. The Android platform offers numerous security techniques that limit malicious functions, in particular the Android authorization control mechanism. However, the majority of users grant permissions blindly to arbitrary applications, which weakens the efficiency of the authorization mechanism [3]. Consequently, the authorization mechanism can barely limit the propagation of malware applications.
Numerous researchers have developed machine learning approaches for detecting malicious Android applications [4]. Generally, a machine learning approach comprises two significant steps: (a) feature set construction and (b) classification model construction. The feature sets are constructed from opcodes, Android permissions, application programming interface (API) calls and activities, and the classification models are constructed using recurrent neural networks (RNN), deep belief networks (DBN) and convolution neural networks (CNN) [5]. When integrated with program analysis approaches, machine learning techniques automatically deduce application behaviour. These analysis approaches fall into two major categories, namely dynamic and static. The static approach is considered one of the most beneficial because it quickly scans and identifies malicious applications [6]. To evade static analysis, numerous malware apps have adopted a series of obfuscation techniques, namely native code execution, bytecode encryption and reflection [1]. Such transformation approaches significantly challenge static analysis techniques [7–16].
On the other hand, because it resists code transformation methods, dynamic analysis has become a promising approach for monitoring runtime behaviours [17]. The monitored runtime behaviours include system calls, hidden-icon operations and API calls, which are combined with supervised learning algorithms to detect malware efficiently. Although advanced technologies enhance the quality of life and make the cyber world a better place, new kinds of malicious attacks are rising at an alarming rate. These new types of attacks, which infringe smartphone security, are increasing day by day [18]. On the Android OS, the security policies can be violated in diverse ways. A few existing approaches use permissions and API calls as features to evaluate the presence of malware applications on Android phones. These studies report that static analysis is less effective than dynamic analysis at detecting malicious applications. In addition, a few research studies have used deep learning approaches to evaluate the presence of malware applications. However, owing to limitations such as a low identification rate, a low malware detection rate and limited category detection, the results obtained are ineffective [19].
To enhance the detection results, this paper proposes a CNN based ARFO approach. Three phases are employed, namely the pre-processing phase, the feature extraction phase and the detection phase. This paper focuses on addressing the above-mentioned limitations, thereby attaining a highly effective method to identify and detect malware apps. The major contributions of the paper are listed below.

• Proposing an effective and functional CNN-ARFO approach to classify Android applications as malicious or benign.
• Evaluating the effectiveness of the proposed approach with a focus on requested permissions and on applications exhibiting malicious code.
• Enhancing the detection performance and attaining an improved accuracy rate.
• Comparing the proposed approach with other state-of-the-art studies to determine the effectiveness of the system.

The rest of the paper is structured as follows. Section 2 presents the theoretical background on malware detection. Section 3 describes the proposed methodology and its three phases, namely the pre-processing phase, the feature extraction phase and the detection phase. The experimental results and comparative analysis are discussed in Sect. 4. Section 5 concludes the article along with future work directions.

2 Theoretical Background

In recent years, numerous research scholars have been focused on the detection of android
malware applications based on diverse types of approaches. A few relevant works and their
techniques are summarized in Table 1.
Wu et al. [20] utilized a multiple view information integration technique to identify the malware family and to detect Android malware applications. Here, multiple kernel learning (MKL) is employed to classify applications as malicious or benign. The datasets were obtained from Android Malware Genome, Drebin, the Virus Share project and the Google Play store. The experimental investigation used accuracy, precision, recall and F-measure and reported an enhanced performance rate. However, this approach was highly expensive in terms of time and memory.
The intelligent malware identification for Android (IMIAD) platform was demonstrated by Hussain et al. [21]. Here, random forest, decision tree, naive Bayes and gradient boosting classification techniques were employed to classify the malware applications. The performance analysis concluded that the accuracy rate was high when detecting malware applications. However, this approach failed to compute dynamic properties and memory utilization.
Feng et al. [22] developed a novel dynamic Android malware detection system based on ensemble learning approaches, namely decision tree and chi-squared techniques. The datasets were obtained from the Google Play Store, AndroZoo and Drebin databases. Accuracy, precision, recall, F-measure, true positive value and false positive value were evaluated for the respective datasets, showing high classification performance. However, this approach was imprecise and had poor dynamic coverage.
Xiao et al. [23] proposed Android malware detection based on system call sequences and long short-term memory (LSTM). Applications are classified as malicious or benign, and the approach is evaluated using accuracy, precision, recall, F-measure, true positive value and false positive value. The results demonstrated high efficiency and recall with a minimum false positive rate. However, this approach failed to pre-process the datasets, which results in high noise signals.

Table 1  Summary of existing literature survey

| Authors and references | Technique employed | Databases utilized | Performance measures | Strength | Limitations |
|---|---|---|---|---|---|
| Wu et al. [20] | Multiple kernel learning technique, MVIIDroid | Android Malware Genome, Drebin, Virus Share project, Google Play store | Accuracy, precision, recall, F-measure | Enhanced performance rate | Highly expensive in terms of time and memory |
| Hussain et al. [21] | Random forest, decision tree, naive Bayes, gradient boosting | Google Play store, Apkpure.com, Drebin databases | Accuracy, precision, recall, F-measure, true positive value, false positive value | High rate of accuracy to detect malware applications | Failed to determine dynamic properties and memory utilization |
| Feng et al. [22] | Decision tree, chi-squared technique | Google Play store, Androzoo, Drebin databases | Accuracy, precision, recall, F-measure, true positive value, false positive value | High classification performances | Imprecise and poor dynamic coverage |
| Xiao et al. [23] | Long short-term memory | Drebin, Google Play store | Accuracy, precision, recall, F-measure, true positive value, false positive value | High efficiency, recall and minimum false positive rate | High noise signal, failed to pre-process, poor accuracy |
| Imtiaz et al. [24] | Deep artificial neural network | CICInvesAndMal2019 and CICAndMal2017 | Accuracy, precision, recall, F-measure | Enhanced malware classification | High security issues |
| Mahindru et al. [25] | Machine learning and clustering approach | Google's Play store, mumayi, AndroMalshare | Accuracy, precision, recall, F-measure, true positive value, false positive value | Enhanced malware detection rate | Failed to detect new malware |
| Zhu et al. [26] | Principal component analysis, support vector machine, multilayer perceptron | Official Android market and https://virusshare.com/ | Matthews Correlation Coefficient, specificity, accuracy, precision, recall, F-measure | Capable of extracting high sensitive data, high accuracy | Failed to implement in real world applications |
| Su et al. [27] | Deep belief network | Drebin, several unofficial markets, official Android market | Accuracy, precision, recall, F-measure | Enhanced detection accuracy, required less time | Insufficient to detect huge malwares |
| Zhang et al. [28] | Method level behavioural semantic approach | Drebin, AMD | Accuracy, precision, recall, F-measure | High F-measure, high accuracy, system stability | Highly expensive |
| Wang et al. [29] | Deep auto encoder and convolution neural network | Drebin, Google Play store | Accuracy, precision, recall, F-measure, time | High accuracy and minimum execution time | Failed to detect sophisticated android malware |
| Proposed approach | CNN-ARFO | Google Play store, private companies | Accuracy percentage, loss percentage, precision, recall, f-measure | High detection accuracy | Security issues |
Imtiaz et al. [24] developed a highly efficient deep artificial neural network to identify and detect Android malware applications. The datasets were obtained from CICInvesAndMal2019 and CICAndMal2017, and the approach was evaluated using accuracy, precision, recall and F-measure. The experimental analysis revealed that the malware was classified accurately. High security issues were considered the major drawback of this approach.
An Android malware detection framework using machine learning and clustering approaches was developed by Mahindru et al. [25]. Here, the datasets were collected from Google's Play store, mumayi and AndroMalShare. Accuracy, precision, recall, F-measure, true positive value and false positive value were used for evaluation. The malware applications were detected accurately, but the approach failed to detect new malware.
Zhu et al. [26] developed a stacking ensemble framework for Android malware detection that combines principal component analysis, support vector machine and multilayer perceptron. The datasets were collected from the official Android market and https://virusshare.com/. This approach extracts highly sensitive data with a high rate of accuracy, evaluated using the Matthews Correlation Coefficient, specificity, accuracy, precision, recall and F-measure. However, this approach has not been implemented in real-world applications.
Android malware detection using a deep belief network was proposed by Su et al. [27]. The datasets used in this approach were collected from Drebin, several unofficial markets and the official Android market. The experimental results showed that the detection accuracy was high and the execution time was minimal. On the other hand, this approach is insufficient for detecting large volumes of malware applications.
Zhang et al. [28] utilized a method-level behavioural semantic approach to obtain an effective Android malware detection system. Drebin and AMD were the databases used for evaluation. Here, a high F-measure, high accuracy and system stability were obtained when evaluating accuracy, precision, recall and F-measure. However, this approach is highly expensive.
Effective Android malware detection with a hybrid model based on a deep autoencoder and a convolutional neural network was developed by Wang et al. [29]. The datasets were obtained from Drebin and the Google Play Store. The experimental investigation used accuracy, precision, recall and F-measure and showed high accuracy and minimum execution time. On the other hand, this approach failed to detect sophisticated Android malware.
In order to enhance the detection results, this paper proposes a CNN based ARFO approach.
Three different phases namely the pre-processing phase, feature extraction phase and the
detection phase are employed. Then the results based on detecting the benign and malicious
applications from the android mobiles are demonstrated by evaluating certain parameters like
model accuracy rate, model loss rate, accuracy, precision, recall and f-measure.

3 Proposed Framework

Figure 1 depicts the workflow of the proposed approach, which comprises three phases: the pre-processing phase, the feature extraction phase and the detection phase. In this section, the proposed CNN-ARFO approach for detecting malicious Android applications is presented. The accuracy rate of the proposed approach is based on API call behaviours and permission requests. The phases involved in the proposed approach are discussed in the following subsections.

Fig. 1  Workflow of the proposed approach

3.1 Pre‑Processing

Pre-processing is the initial phase needed to achieve enhanced performance in any machine learning technique. In general, the pre-processing phase includes normalization, scaling, and the elimination of NaN values and duplicate instances. The selected dataset contains ambiguities and features with low variance. Hence, MinMax scaling is selected to normalize the features. Normalizing here means re-scaling real-valued numerical attributes to the fixed range from 0 to 1. Scaling is particularly important for input attributes that differ in magnitude. The normalized value Z(Norm) under MinMax scaling is stated as follows.
$$Z_{(Norm)} = \frac{Z_K - Z_{MIN}}{Z_{MAX} - Z_{MIN}} \tag{1}$$

In Eq. (1), the feature's minimum value $Z_{MIN}$ is subtracted from its original value $Z_K$, and the result is divided by the difference between the feature's maximum value $Z_{MAX}$ and its minimum value $Z_{MIN}$ [30].
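
As an illustration of Eq. (1), the following minimal Python sketch rescales one feature column to the range from 0 to 1. The function name, the sample values and the handling of a constant column are assumptions made for this example and are not taken from the paper.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to [0, 1] following Eq. (1)."""
    z_min, z_max = min(values), max(values)
    span = z_max - z_min
    if span == 0:
        # Assumption: a constant feature is mapped to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(z - z_min) / span for z in values]


# Hypothetical feature column, e.g. the number of requested permissions per app.
permission_counts = [3, 10, 7, 24]
print(min_max_normalize(permission_counts))
```

The smallest value maps to 0 and the largest to 1, so features measured on different scales become comparable before they are passed to the classifier.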


3.2 Feature Extraction

During feature extraction, the malicious APKs and the collected benign apps are investigated to identify and extract the features essential to the functioning of malware [31]. Feature selection focuses on the semantic data embedded in the bytecode of the applications. More specifically, the package-level information and the API calls are extracted. In addition, the permissions requested by the applications are extracted to generate the baseline model. The feature extraction phase also inspects the manifest file to extract intents and permissions. The two feature types considered in this phase are:

• Permission type (regular or hazardous) and its number


• Intent type (regular or hazardous) and its number

Furthermore, intents and permissions are categorized into four groups, namely regular permissions, hazardous permissions, regular intents and hazardous intents. Malicious applications often use hazardous intents and permissions, whereas benign applications employ regular intents and permissions. Each Android application is characterized by two features, namely the permission feature and the API call feature. A Python script is used to extract the API call features and permission requests from the APK files. The following steps describe the procedure for extracting the API call features [32].
First, the distinct packages are generated; then the API calls and the packages containing package-level information (i.e., Android, Java) are extracted. Notably, Android permissions protect some sensitive API calls. The permissions requested by the apps are then extracted, and every app is represented as a binary vector of API calls.
$$Apps = \begin{cases} 1; & \text{API employed in the app} \\ 0; & \text{API not employed in the app} \end{cases} \tag{2}$$

Then permissions and API calls are mapped to characterize the association map.

$$A_m = \left\{ (P_E, D_E) \mid P_E \in PE,\ D_E \in DE,\ P_E \text{ controls } D_E \right\} \tag{3}$$

Finally, the extracted API calls are obtained and the entire databases are decompiled
accordingly [33].
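
The binary API vector of Eq. (2) and the association map of Eq. (3) can be sketched in Python as follows. This is only an illustrative sketch: the function names, the small API vocabulary and the permission-to-API mapping are hypothetical inputs chosen for the example, and the paper's own extraction script is not reproduced here.

```python
def api_binary_vector(app_api_calls, api_vocabulary):
    """Eq. (2): 1 if the API call is employed in the app, 0 otherwise."""
    used = set(app_api_calls)
    return [1 if api in used else 0 for api in api_vocabulary]


def association_map(permission_to_apis):
    """Eq. (3): set of (permission, API call) pairs where the permission controls the call."""
    return {(perm, api) for perm, apis in permission_to_apis.items() for api in apis}


# Hypothetical vocabulary and app; the names are illustrative only.
vocabulary = ["sendTextMessage", "getDeviceId", "openConnection"]
app_calls = ["getDeviceId", "openConnection"]
print(api_binary_vector(app_calls, vocabulary))  # [0, 1, 1]

# Hypothetical permission-to-API mapping.
controls = {"SEND_SMS": ["sendTextMessage"], "READ_PHONE_STATE": ["getDeviceId"]}
print(sorted(association_map(controls)))
```

Keeping the vocabulary order fixed ensures that the same vector position always refers to the same API call across all apps.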

3.3 Detection Phase

This section integrates the convolution neural network and the adaptive red fox optimization algorithm to classify Android applications as malicious or benign. The detection of malicious mobile applications deals with requested permissions and with applications exhibiting malicious code. The technical background of the CNN and the ARFO algorithm is discussed in the following subsections.


Fig. 2  Typical CNN structure

A. Convolution neural network (CNN)

Generally, the CNN is a class of deep neural networks used for analyzing and extracting data from the databases. The typical network structure of a CNN is portrayed in Fig. 2. The CNN comprises several significant layers, namely the input layer, convolution layer, pooling layer and fully connected layer [34].

• Convolution layer

The convolution layer is the fundamental, central layer of the CNN and consists of small filters. Convolution is computed as the dot product between a filter and the input data, as expressed in Eq. (4).

$$X_K^L = \sigma\left[ Z_J^{L-1} * \omega_{JK}^{1(L)} + B_K^{1(L)} \right] \tag{4}$$

In Eq. (4), $\omega_{JK}^{1(L)}$ and $B_K^{1(L)}$ signify the weight between the Jth and Kth nodes of the Lth layer and the bias of the Lth layer, respectively. $\sigma$ is the sigmoid function, and $Z_J^{L-1}$ is the input of the convolution layer [35].
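
To make the dot-product view of Eq. (4) concrete, the sketch below slides a small square filter over a 2-D input, adds a bias and applies the sigmoid. It is a simplified single-channel illustration under assumed names (`conv2d_valid`, `sigmoid`); like most CNN libraries it does not flip the kernel, and it is not the network configuration used in the paper.

```python
import math


def sigmoid(x):
    """Sigmoid activation used as sigma in Eq. (4)."""
    return 1.0 / (1.0 + math.exp(-x))


def conv2d_valid(inputs, kernel, bias=0.0):
    """Dot product of a square kernel with every window of the input, plus bias, then sigmoid."""
    k = len(kernel)
    rows = len(inputs) - k + 1
    cols = len(inputs[0]) - k + 1
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            dot = sum(inputs[r + i][c + j] * kernel[i][j]
                      for i in range(k) for j in range(k))
            row.append(sigmoid(dot + bias))
        out.append(row)
    return out


# Hypothetical 3x3 input and 2x2 filter; the output has size 2x2.
feature_input = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]
filter_weights = [[1, 0], [0, 1]]
print(conv2d_valid(feature_input, filter_weights))
```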

• Pooling layer

The pooling layer performs a down-sampling operation. Several pooling functions exist, and max pooling is the most commonly used. Here, a 2×2 filter with a stride of 2 is applied. The max pooling filter returns the maximum value of each sub-region. Hence, applying a max pooling filter of size 2×2×1 to a feature map of size 4×4×1 yields a down-sampled feature map of size 2×2×1. The output of the pooling layer is stated in Eq. (5).

$$Z_K^L = \sigma\left[ Z_J^{L-1} * \omega_{JK}^{2(L)} + B_K^{2(L)} \right] \tag{5}$$
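
The 4×4 to 2×2 down-sampling described above can be illustrated with a short Python sketch of 2×2 max pooling with a stride of 2. The function name and the sample feature map are assumptions for illustration only.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2-D list with even height and width."""
    pooled = []
    for r in range(0, len(feature_map), 2):
        row = []
        for c in range(0, len(feature_map[0]), 2):
            window = [feature_map[r][c], feature_map[r][c + 1],
                      feature_map[r + 1][c], feature_map[r + 1][c + 1]]
            row.append(max(window))  # keep only the largest value of the sub-region
        pooled.append(row)
    return pooled


# Hypothetical 4x4 feature map.
fm = [[1, 3, 2, 4],
      [5, 6, 1, 2],
      [7, 2, 9, 0],
      [1, 8, 3, 4]]
print(max_pool_2x2(fm))  # [[6, 4], [8, 9]]
```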


• Fully connected layer

The fully connected layer, also referred to as the hidden layer, is analogous to an artificial neural network (ANN). Every neuron of the preceding pooling layer is connected to each neuron of the following layer, which involves a huge number of training parameters.

• Softmax layer

The softmax layer evaluates the probability distribution of a particular event over the possible events. This function computes the probability $\Pr(Z_K^L)$ of every target over all possible target classes. The softmax layer operation is stated in Eq. (6).

$$\Pr(Z_K^L) = \frac{\exp(Z_K^L)}{\sum_{K=1}^{I} \exp(Z_K^L)} \tag{6}$$
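
A minimal Python sketch of the softmax operation in Eq. (6) is given below. Subtracting the maximum score before exponentiating is a common numerical-stability step added here as an assumption; it does not change the resulting probabilities. The two sample scores are hypothetical.

```python
import math


def softmax(scores):
    """Eq. (6): exponentiate each score and normalize so the outputs sum to 1."""
    shift = max(scores)  # stability shift; cancels out in the ratio
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical output scores for the two classes (malware, benign).
probabilities = softmax([2.0, 0.5])
print(probabilities)
```

The class with the larger score receives the larger probability, and the predicted label is the class with the highest probability.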

To optimize the weights of the CNN, this paper uses an adaptive red fox optimization (ARFO) algorithm. ARFO is selected for its high optimization speed, minimum computational complexity and high precision. In addition, the ARFO algorithm can handle complex problems whose solution space is not well characterized, and it solves the mathematical models effectively. The behaviour of ARFO is discussed in the following subsection.

B. Adaptive Red Fox Optimization (ARFO) Algorithm

Generally, the red fox population comprises two categories: (1) foxes leading a wandering, nomadic lifestyle and (2) foxes living in well-defined territories. Every herd occupies its own territory under the hierarchy of an alpha couple. When a red fox reaches adulthood, it leaves the herd and establishes its own. In addition, red foxes are effective hunters of both wild and domestic animals. The algorithm that imitates these behavioural characteristics of red foxes is referred to as the ARFO algorithm. Exploring the territory in search of prey corresponds to the global search, and approaching the prey prior to the attack corresponds to the local search. The step-by-step procedure of the ARFO algorithm is described below [36].

• Underlying principle of ARFO algorithm

The population consists of a constant number of red foxes. Every red fox is denoted by a point $Z$ with $m$ coordinates. The notation $[Z_K^J]^T$ is introduced to distinguish the foxes, where $J$ is the number of the fox in the population and $K$ is the coordinate with respect to the dimensions of the solution space.
Let the criterion function be $F \in \Re^m$ with $m$ variables according to the dimensions of the solution space; the notation is stated in Eq. (7).

$$(Z)^J = \left[ (Z_0)^{(J)}, (Z_1)^{(J)}, \ldots, (Z_{m-1})^{(J)} \right] \tag{7}$$

Equation (7) denotes a point in the space $\langle p, q \rangle^m$, where $p, q \in \Re$. Therefore, $(Z)^J$ is considered an optimal solution if the value of the function $F((Z)^J)$ is the global maximum or minimum on $\langle p, q \rangle$.

• Prey search in global search space

During the search, each fox must play its role in the herd for survival. When no prey is available in the local surroundings, the members of the herd (i.e. the red foxes) travel to various places to hunt. As soon as they obtain information about the location of the prey, they share it with their family members. To find the best search area in which to hunt, the red fox optimization algorithm applies the search process of the chameleon swarm algorithm (CSA) [37]. The resulting search behaviour is stated in Eq. (8).
$$Z_{T+1}^{J,K} = \begin{cases} Z_T^{J,K} + \rho_1 \left( A_T^{J,K} - B_T^K \right) + \rho_2 \left( B_T^K - A_T^{J,K} \right) & \Re_J \ge C_P \\ Z_T^{J,K} + \delta \left[ \left( U_Q^K - L_Q^K \right) \Re_3 + L_Q^K \right] S(\Re - 0.5) & \Re_J < C_P \end{cases} \tag{8}$$

In Eq. (8), $Z_T^{J,K}$ and $Z_{T+1}^{J,K}$ signify the current and the new position of the Jth chameleon, the latter with respect to the (T + 1)th iteration. The two positive numbers that control the exploration capability are denoted by $\rho_1$ and $\rho_2$. The best local position and the global best position achieved by the chameleon with respect to the Jth dimension and the (T + 1)th iteration are represented by $A_T^{J,K}$ and $B_T^K$, respectively. $\Re_J$ indicates a random number ranging from 0 to 1. The probability that the chameleon perceives the prey is represented by $C_P$, with $C_P = 0.1$. The direction of the exploration and exploitation process is denoted by $S(\Re - 0.5)$, and $\delta$ is a parameter that is a function of the iteration. This search operation constitutes the proposed global search performed by every red fox in each iteration.

• Journey towards domestic regions in local search space

In this phase, the red fox frequently travels to neighbouring territory looking for prey. When the red fox spots prey, it approaches quietly. Usually, the red fox circles around the prey and then attacks it by surprise. In the ARFO algorithm, this observing and moving to deceive the prey during hunting is modelled in the local search space. The action of a red fox is defined in Eq. (9).
$$\begin{cases} \text{Red fox moving closer} & \text{if } \delta > 0.75 \\ \text{Red fox hides and stays} & \text{if } \delta \le 0.75 \end{cases} \tag{9}$$

In Eq. (9), $\delta$ signifies the movement of the population. Each red fox's movement is modelled with a modified cochleoid equation, and the radius that defines the vision of the hunting fox is determined in Eq. (10). Thus,

$$R = \begin{cases} b \dfrac{\sin \phi_0}{\phi_0} & \text{if } \phi_0 \neq 0 \\ \vartheta & \text{if } \phi_0 = 0 \end{cases} \tag{10}$$

In Eq. (10), the scaling parameter is represented by $b$, where $b \in \langle 0, 0.2 \rangle$. The value $\phi_0 \in \langle 0, 2\pi \rangle$ is chosen to model the observation angle of a fox, and $\vartheta$ is a random value ranging from 0 to 1. The movement with respect to the spatial coordinates is determined in Eq. (11).

$$\begin{cases} Z_0^{NEW} = bR \cdot \cos(\phi_1) + Z_0^{ACTUAL} \\ Z_1^{NEW} = bR \cdot \sin(\phi_1) + bR \cdot \cos(\phi_2) + Z_1^{ACTUAL} \\ Z_2^{NEW} = bR \cdot \sin(\phi_1) + bR \cdot \sin(\phi_2) + bR \cdot \cos(\phi_3) + Z_2^{ACTUAL} \\ \vdots \\ Z_{m-2}^{NEW} = bR \cdot \sum_{K=1}^{m-2} \sin(\phi_K) + bR \cdot \cos(\phi_{m-1}) + Z_{m-2}^{ACTUAL} \\ Z_{m-1}^{NEW} = bR \cdot \sin(\phi_1) + bR \cdot \sin(\phi_2) + \cdots + bR \cdot \sin(\phi_{m-1}) + Z_{m-1}^{ACTUAL} \end{cases} \tag{11}$$

In Eq. (11), $\phi_1, \phi_2, \phi_3, \ldots, \phi_{m-1} \in \langle 0, 2\pi \rangle$ signify the randomized angular values for every point.
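
The local search step of Eqs. (10) and (11) can be sketched in Python as follows. This is a hedged illustration rather than the authors' implementation: the function names are hypothetical, the random quantities are passed in explicitly so the example is deterministic, and the coordinate update follows the row pattern of Eq. (11), in which coordinate i adds the sines of the first i angles and the cosine of the next angle.

```python
import math


def observation_radius(b, phi0, theta):
    """Eq. (10): R = b*sin(phi0)/phi0 if phi0 != 0, otherwise the random value theta."""
    if phi0 != 0:
        return b * math.sin(phi0) / phi0
    return theta


def move_fox(position, b, radius, phis):
    """Eq. (11): phis holds phi_1..phi_{m-1} for an m-dimensional position."""
    m = len(position)
    step = b * radius
    new_position = []
    sin_sum = 0.0  # running sum of sin(phi_1) .. sin(phi_i)
    for i in range(m):
        if i < m - 1:
            new_position.append(step * sin_sum + step * math.cos(phis[i]) + position[i])
            sin_sum += math.sin(phis[i])
        else:
            # The last coordinate uses only the sine terms.
            new_position.append(step * sin_sum + position[i])
    return new_position


# Hypothetical 3-dimensional fox with b = 0.2 and observation angle pi/2.
r = observation_radius(0.2, math.pi / 2, theta=0.7)
print(move_fox([1.0, 2.0, 3.0], 0.2, r, [0.3, 1.1]))
```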

• Breeding and departing from the group

Biologically, red foxes face numerous challenges. If a red fox does not find prey in its local habitat, it needs to move to remote territories in search of prey. Additionally, danger arises from humans, since the fox hunts and harms domesticated animals. On the other hand, red foxes never die out or migrate completely; instead, they escape cunningly and reproduce to establish new fox herds.
To model this behaviour, the worst 5% of the red fox population is selected according to the value of the criterion function. This proportion is a subjective assumption used to simulate slight changes in the herd. It is assumed that the red foxes with the worst fitness values are either shot by hunters or migrate. New individuals then replace them, so that the population size remains constant, using the habitat territory established by the alpha couple.
The alpha couple is represented by the two best individuals, $[Z^1]^T$ and $[Z^2]^T$, of the red fox optimization algorithm, which are used to compute the centre of the habitat $C_H$ as well as the Euclidean distance $D_H$ between the alpha couple. Thus,
$$\text{Centre of the habitat: } C_H = \frac{[Z^1]^T + [Z^2]^T}{2} \tag{12}$$

$$\text{Euclidean distance: } D_H = \left\| [Z^1]^T - [Z^2]^T \right\| \tag{13}$$

For every iteration, a random parameter $\zeta$ ranging from 0 to 1 is drawn to define the replacements. Hence,

$$\text{New nomadic red fox: } \zeta \ge 0.45 \tag{14}$$

$$\text{Alpha couple breeding: } \zeta < 0.45 \tag{15}$$


First, it is assumed that new red foxes leave the herd and choose a nomadic life to search for prey and to reproduce. Second, it is assumed that new individuals are produced by the alpha couple: the two best individuals $[Z^1]^T$ and $[Z^2]^T$ are combined to form a new reproduced individual. Therefore,

$$\text{New reproduced individual} = \zeta \, \frac{[Z^1]^T + [Z^2]^T}{2} \tag{16}$$
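
The replacement rule of Eqs. (12) to (16) can be sketched in Python as shown below. The function names are hypothetical, and the `nomad_sampler` callback is an assumption introduced for this example, because the text does not specify how the position of a new nomadic fox is drawn.

```python
import math


def habitat_center(alpha1, alpha2):
    """Eq. (12): midpoint of the two best individuals."""
    return [(a + b) / 2 for a, b in zip(alpha1, alpha2)]


def habitat_distance(alpha1, alpha2):
    """Eq. (13): Euclidean distance between the two best individuals."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(alpha1, alpha2)))


def new_individual(alpha1, alpha2, zeta, nomad_sampler):
    """Eqs. (14)-(16): nomadic fox if zeta >= 0.45, otherwise offspring of the alpha couple."""
    if zeta >= 0.45:
        # Assumption: the caller supplies how a nomadic fox position is sampled.
        return nomad_sampler()
    return [zeta * c for c in habitat_center(alpha1, alpha2)]


# Hypothetical alpha couple in a 2-dimensional solution space.
best, second_best = [0.0, 0.0], [2.0, 4.0]
print(habitat_center(best, second_best), habitat_distance(best, second_best))
print(new_individual(best, second_best, 0.4, lambda: [9.0, 9.0]))
```

In a full optimizer this function would be called once for each of the worst 5% of foxes, with a fresh random value of zeta per replacement.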
The algorithmic procedure involved in ARFO algorithm is obtained as follows.

C. CNN based ARFO approach for android malware detection

Generally, the data mining process uses various machine learning approaches to predict and classify data samples. Therefore, it is necessary to choose an approach appropriate to the particular application. Here, the CNN-ARFO approach is well suited to detecting malicious applications, and once training is done it can also perform relatively quick classification with minimum computational overhead. Figure 3 depicts the structural outline of the proposed CNN based ARFO approach to classify Android mobile applications as malware or benign.

Fig. 3  CNN-ARFO approach to detect malicious app

4 Evaluation Results

To assess the performance of the proposed CNN based ARFO approach for detecting malware applications, numerous analyses are performed. Several simulation metrics and experimental setups are used to evaluate the proposed approach. For the experiments, the data samples are split into training and testing data. Here, 80% of the data samples are used for training and the remaining 20% for testing.
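
The 80/20 partition described above can be sketched with the Python standard library as follows. Shuffling before the split and the fixed seed are assumptions made for reproducibility of the example; the paper does not state how the samples were ordered or selected.

```python
import random


def train_test_split(samples, train_fraction=0.8, seed=0):
    """Shuffle the samples and split them into training and testing subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # assumption: random shuffling before the split
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical dataset of 100 sample identifiers.
train, test = train_test_split(range(100))
print(len(train), len(test))
```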

4.1 Computational Environment

This section describes the computing environment used in the experiments. Table 2 lists the parameters of the computing environment used to evaluate the proposed approach.

Table 2  Hardware specifications of the computing environment

Parameters        Attributes
OS                Windows 10 Professional, version 1909
RAM               16 GB
CPU               Intel® Core™ i7-6700HQ processor
CUDA version      9.0
GPU               NVIDIA GeForce 1060
Python version    3.8

4.2 Simulation Metrics

In this section, four simulation measures, namely accuracy, precision, recall and
F-measure, are taken into consideration. The mathematical formula for each measure is
given below.

• Accuracy

Accuracy refers to the fraction of records in the test database that are recognized
correctly. A better machine learning model yields a higher accuracy rate. Additionally,
accuracy is a significant metric for test databases with balanced classes. Therefore,

$$\text{Accuracy} = \frac{|TNEG(\text{Normal})| + |TPOS(\text{Abnormal})|}{|TNEG(\text{Normal})| + |TPOS(\text{Abnormal})| + |FNEG(\text{Abnormal})| + |FPOS(\text{Normal})|} \tag{17}$$

• Precision

Precision refers to the ratio of instances that are classified correctly as a given
class to the total number of instances classified as that class. The mathematical
expression for the precision value is stated in Eq. (18).

$$\text{Precision} = \frac{|TPOS(\text{Abnormal})|}{|TPOS(\text{Abnormal})| + |FPOS(\text{Normal})|} \tag{18}$$

• Recall

Recall is the percentage of malicious applications that are correctly determined to be
malicious, as given in Eq. (19). Therefore,

$$\text{Recall} = \frac{|TPOS(\text{Abnormal})|}{|TPOS(\text{Abnormal})| + |FNEG(\text{Abnormal})|} \tag{19}$$

• F-Measure

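By integrating two measures, namely precision and recall, the F-Measure value is
obtained. Hence,

$$\text{F-Measure} = 2 \times \frac{\text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}} \tag{20}$$

$$\text{F-Measure} = 2 \times \frac{\dfrac{|TPOS(\text{Abnormal})|}{|TPOS(\text{Abnormal})| + |FNEG(\text{Abnormal})|} \times \dfrac{|TPOS(\text{Abnormal})|}{|TPOS(\text{Abnormal})| + |FPOS(\text{Normal})|}}{\dfrac{|TPOS(\text{Abnormal})|}{|TPOS(\text{Abnormal})| + |FNEG(\text{Abnormal})|} + \dfrac{|TPOS(\text{Abnormal})|}{|TPOS(\text{Abnormal})| + |FPOS(\text{Normal})|}} \tag{21}$$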
In Eqs. (17) to (21),

• True positive (TPOS): the application is predicted to be malicious (abnormal) and is actually malicious.
• False positive (FPOS): the application is predicted to be malicious but is actually benign (normal).
• True negative (TNEG): the application is predicted to be benign and is actually benign.
• False negative (FNEG): the application is predicted to be benign but is actually malicious.
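The four measures of Eqs. (17) to (20) can be computed directly from the confusion counts, as in the short sketch below. The function name and the counts in the example are hypothetical; they are chosen only to illustrate the arithmetic and are not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall and F-measure from confusion counts."""
    accuracy = (tn + tp) / (tn + tp + fn + fp)      # Eq. (17)
    precision = tp / (tp + fp)                      # Eq. (18)
    recall = tp / (tp + fn)                         # Eq. (19)
    f_measure = 2 * recall * precision / (recall + precision)  # Eq. (20)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical counts: 100 malicious and 100 benign test applications.
metrics = classification_metrics(tp=90, tn=85, fp=15, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The sketch assumes that at least one application is predicted malicious and at least one is actually malicious; otherwise the precision or recall denominator would be zero.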

4.3 Parametric Configuration

In this section, the parameters of the proposed approach and their respective values
are discussed. Table 3 provides the parametric details of the CNN and ARFO approaches.

4.4 Description of Databases

The dataset comprises 34,000 data samples altogether, including both malicious and
benign applications. In this research, we utilized benign and malicious applications
containing 32,000 each as the training data and 1000 malicious and benign applications
as the testing data.

Table 3  Parameter description

Methods   Parameters                      Ranges
CNN       Input layer                     74 * 1200 * 1
          Convolution layer 1             8 filters of size 3 * 3 * 1
          Max pooling layer 1             2 * 2 with stride 1
          Convolution layer 2             16 filters of size 3 * 3 * 8
          Max pooling layer 2             2 * 2 with stride 1
          Convolution layer 3             32 filters of size 3 * 3 * 16
          Fully connected layer           13
          Output layer                    Softmax
ARFO      Size of the population          50
          Maximum number of iterations    100
          Scaling parameter               0 to 0.2
          Observation angle               𝜙0 from 0 to 2𝜋
          Random parameter                𝜁 from 0 to 1
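To make the layer sizes in Table 3 concrete, the following sketch traces the feature-map shape through the three convolution layers and two pooling layers. Table 3 does not state the padding or the convolution stride, so unpadded ("valid") windows and a convolution stride of 1 are assumed here; the helper names are hypothetical.

```python
def window_output(size, kernel, stride=1):
    """Output length of an unpadded ('valid') convolution or pooling window."""
    return (size - kernel) // stride + 1


def trace_shapes(height, width, channels, layers):
    """Return the (height, width, channels) shape after the input and each layer."""
    shapes = [(height, width, channels)]
    for kind, kernel, stride, filters in layers:
        height = window_output(height, kernel, stride)
        width = window_output(width, kernel, stride)
        if kind == "conv":
            channels = filters  # pooling keeps the channel count unchanged
        shapes.append((height, width, channels))
    return shapes


# (kind, kernel size, stride, number of filters) following Table 3.
LAYERS = [
    ("conv", 3, 1, 8),
    ("pool", 2, 1, None),
    ("conv", 3, 1, 16),
    ("pool", 2, 1, None),
    ("conv", 3, 1, 32),
]

for shape in trace_shapes(74, 1200, 1, LAYERS):
    print(shape)
```

Under these assumptions the last convolution layer yields a 66 * 1192 * 32 feature map, which would be flattened before the fully connected layer.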

Table 4  Overall confusion matrix

Actual class    Predicted class
                Malicious    Benign
Malicious       96.7%        3.3%
Benign          2%           98%

Fig. 4  a Model accuracy and b model loss rate for testing and training set

• Benign applications

The benign applications were collected from the Google Play Store by means of a
crawler that randomly picks a word from the dictionary, searches for it, and downloads
the top 300 applications. Because the benign applications are downloaded randomly, their
diversity is guaranteed.

• Malicious applications

The malicious applications are obtained from private companies and comprise 425
malware families. The malware applications are selected randomly from these 425 families
to form the malware database.
Table 4 depicts the confusion matrix used to evaluate the performance of the proposed
CNN based ARFO malware detection approach. The proposed approach classifies each
application as a malicious app or a benign app. Here, the confusion matrix is evaluated
in terms of four quantities, namely the true positive, false positive, true negative and
false negative.
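A row-normalized confusion matrix of the kind shown in Table 4 can be obtained from the actual and predicted labels as sketched below. The function name and the eight example labels are hypothetical and do not reproduce the data of this study.

```python
def confusion_rates(actual, predicted, labels=("malicious", "benign")):
    """Return per-actual-class percentages of each predicted class."""
    counts = {a: {p: 0 for p in labels} for a in labels}
    for a, p in zip(actual, predicted):
        counts[a][p] += 1
    rates = {}
    for a in labels:
        total = sum(counts[a].values())
        rates[a] = {
            p: (100.0 * counts[a][p] / total if total else 0.0) for p in labels
        }
    return rates


# Hypothetical labels for eight applications.
actual = ["malicious"] * 4 + ["benign"] * 4
predicted = ["malicious", "malicious", "malicious", "benign",
             "benign", "benign", "benign", "malicious"]

for actual_class, row in confusion_rates(actual, predicted).items():
    print(actual_class, row)
```

Each row sums to 100%, so the diagonal entries correspond to the per-class detection rates and the off-diagonal entries to the misclassification rates.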

4.5 Evaluation Results

In this section, the results of detecting benign and malicious applications on Android
mobiles are demonstrated by evaluating the model accuracy rate, model loss rate,
accuracy, precision, recall and F-measure.
Figure 4a, b portrays the model accuracy and loss value for both the testing and the
training data of the proposed CNN-ARFO approach. Figure 4a demonstrates the accuracy
convergence of the proposed CNN-ARFO approach with respect to the epoch value. From the
graphical analysis, it is clear that the proposed approach attains the highest testing
accuracy rate of 93.4% at the sixth epoch. The testing accuracy curve decreases slightly
in the fourth and sixth epochs. The training accuracy starts increasing from the seventh
epoch and reaches 93.4%.

Fig. 5  Classification results for the proposed approach

Fig. 6  ROC curve for the proposed approach

Figure 4b describes the model loss value for both the testing and the training set with
respect to the epoch value of the proposed approach. From the graphical representation,
it is demonstrated that the training loss reached about 25% at the second epoch. The
training loss then decreases slightly, reaches 5%, and remains stable thereafter. The
testing loss begins at 41% and fluctuates irregularly as the epoch value increases.
Figure 5 depicts the classification results of the proposed CNN-ARFO approach with
respect to four simulation measures, namely accuracy, precision, recall and F-measure.
From the resulting outcome, it is clear that the accuracy, precision, recall and
F-measure of the proposed approach are 97.27%, 96.45%, 93.21% and 95.43% respectively.

Fig. 7  Comparative evaluation results for a accuracy, b precision, c recall, d F-measure

Figure 6 portrays the ROC curve for the proposed approach. A ROC graph is a curve that
shows the performance of a classification model at all classification thresholds. The
ROC curve is plotted between two parameters, namely the true positive rate and the false
positive rate; it is obtained by computing and plotting the true positive rate against
the false positive rate of the proposed CNN-ARFO classifier at diverse thresholds. From
the graphical analysis, the overall true positive rate is obtained to be 96.88% and the
false positive rate is obtained as 2.39%. Therefore, from the evaluation, it is
considered that the best outcome maximizes the difference between the true positive rate
and the false positive rate, and here a high true positive rate is obtained with a
minimum false positive rate.
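The construction of the ROC points described above can be sketched as follows. The sketch assumes that the classifier outputs a malicious-class score between 0 and 1 for each application and that label 1 denotes a malicious application; the function name, scores and thresholds are hypothetical.

```python
def roc_points(scores, labels, thresholds):
    """Return (false positive rate, true positive rate) for each threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for threshold in thresholds:
        # An application is predicted malicious when its score reaches the threshold.
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical scores for three malicious (1) and one benign (0) application.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 1, 0, 1]
for fpr, tpr in roc_points(scores, labels, thresholds=[0.0, 0.5, 1.0]):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Sweeping the threshold from 0 to 1 moves the operating point from the top-right corner of the ROC plot towards the origin; a point close to the top-left corner corresponds to a high true positive rate with a low false positive rate.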
Figure 7a–d depicts the comparative results for the proposed CNN-ARFO approach and
various other malware detection approaches, namely the deep artificial neural network
(DANN) [20], deep belief network (DBN) [31], method-level behavioural semantic approach
(MLBS) [32] and deep autoencoder and convolutional neural network (DE-CNN) [34].
Figure 7a portrays the accuracy rate for the various approaches. The resulting outcome
revealed that the accuracy rate achieved by the proposed approach is 97.29%, whereas the
accuracy rates obtained by the other approaches are 95.57%, 93.21%, 91.67% and 90.79%.
The precision values for the proposed approach and the other approaches mentioned above
are represented in Fig. 7b. The analysis revealed that the proposed approach achieved a
precision value of about 96.45%, whereas in DANN, DBN, MLBS and DE-CNN the precision
values obtained are 94.25%, 89.67%, 92.73% and 90.46% respectively.
Figure 7c depicts the recall rate for the various approaches. The resulting outcome
revealed that the recall rate achieved by the proposed approach is 93.21%, whereas the
recall rates obtained by the other approaches are 90.57%, 89.92%, 85.27% and 88.71%.
The F-measure values for the proposed approach and the other approaches mentioned above
are represented in Fig. 7d. The analysis revealed that the proposed approach achieved an
F-measure value of about 95.43%, whereas in DANN, DBN, MLBS and DE-CNN the F-measure
values obtained are 91.42%, 90.31%, 87.38% and 88.23% respectively.

5 Conclusion

Numerous researchers have developed machine learning approaches for detecting malicious
Android applications, but owing to certain drawbacks, effective detection of malware
applications remains a challenging task. Therefore, this paper presented a CNN-ARFO
approach to detect malicious Android applications. Here, the accuracy rate of the
proposed approach relies on API call behaviours and permission requests. In order to
attain enhanced detection accuracy, the databases are first pre-processed using the
MinMax scaling approach. Then each Android application is characterized by two features,
namely the permission feature and the API call feature, for effective feature
extraction. The CNN-ARFO approach then detects malicious applications, and it can also
perform relatively quick classification with minimum computational overhead once the
training process is done. We utilized benign and malicious applications to evaluate the
performance of the proposed CNN based ARFO malware detection approach. In addition, the
proposed CNN-ARFO approach is compared with various other malware detection approaches,
namely DANN, DBN, MLBS and DE-CNN, in terms of accuracy, precision, recall and
F-measure. From the resulting outcome, it is clear that the accuracy, precision, recall
and F-measure of the proposed approach achieved higher values (97.27%, 96.45%, 93.21%
and 95.43%) when compared with other state-of-the-art studies. In a future study, it is
intended to develop a means of identifying applications (benign or malware) before
downloading. The enhancement of security will also be taken into consideration.

Authors’ Contributions PCSM agreed on the content of the study. PCSM and SH collected all the data
for analysis. PCSM agreed on the methodology. PCSM and SH completed the analysis based on agreed
steps. Results and conclusions are discussed and written together. Both authors read and approved the
final manuscript.

Funding Not applicable.

Availability of Data and Material Data sharing is not applicable to this article as no new data were created or
analyzed in this study.

Code Availability Not applicable.

Declarations
Conflict of interest The authors declare that they have no conflict of interest.

Consent to Participate Not applicable.

Consent for Publication Not applicable.

Ethics Approval Compliance with Ethical Standards.

Human and Animal Rights This article does not contain any studies with human or animal subjects per-
formed by any of the authors.

Informed Consent For this type of study informed consent is not required.

References
1. Syrris, V., & Geneiatakis, D. (2021). On machine learning effectiveness for malware detection in
android OS using static analysis data. Journal of Information Security and Applications, 59, 102794.
2. Cai, L., Li, Y., & Xiong, Z. (2021). JOWMDroid: Android malware detection based on feature weight-
ing with joint optimization of weight-mapping and classifier parameters. Computers & Security, 100,
102086.
3. Ren, Z., Wu, H., Ning, Q., Hussain, I., & Chen, B. (2020). End-to-end malware detection for
android IoT devices using deep learning. Ad Hoc Networks, 101, 102098.
4. Bhatia, T., & Kaushal, R. (2017). Malware detection in android based on dynamic analysis. In 2017
International conference on cyber security and protection of digital services (Cyber security) (pp.
1–6). IEEE.
5. Zhou, Q., Feng, F., Shen, Z., Zhou, R., Hsieh, M.-Y., & Li, K.-C. (2019). A novel approach for mobile
malware classification and detection in Android systems. Multimedia Tools and Applications, 78(3),
3529–3552.
6. Alzaylaee, M. K., Yerima, S. Y., & Sezer, S. (2020). DL-Droid: Deep learning based android malware
detection using real devices. Computers & Security, 89, 101663.
7. Sundararaj, V. (2016). An efficient threshold prediction scheme for wavelet based ECG signal noise
reduction using variable step size firefly algorithm. International Journal of Intelligent Engineering and
Systems, 9(3), 117–126.
8. Sundararaj, V., Anoop, V., Dixit, P., Arjaria, A., Chourasia, U., Bhambri, P., Rejeesh, M. R., & Sunda-
raraj, R. (2020). CCGPA-MPPT: Cauchy preferential crossover-based global pollination algorithm for
MPPT in photovoltaic system. Progress in Photovoltaics Research and Applications, 28(11), 1128–1145.
9. Vinu, S. (2019). Optimal task assignment in mobile cloud computing by queue based ant-bee algo-
rithm. Wireless Personal Communications, 104(1), 173–197.
10. Sundararaj, V., Muthukumar, S., & Kumar, R. S. (2018). An optimal cluster formation based energy
efficient dynamic scheduling hybrid MAC protocol for heavy traffic load in wireless sensor networks.
Computers & Security, 77, 277–288.
11. Rejeesh, M. R. (2019). Interest point based face recognition using adaptive neuro fuzzy inference sys-
tem. Multimedia Tools and Applications, 78(16), 22691–22710.
12. Kavitha, D., & Ravikumar, S. (2021). IOT and context-aware learning-based optimal neural network
model for real-time health monitoring. Transactions on Emerging Telecommunications Technologies,
32(1), e4132.
13. Hassan, B. A., & Rashid, T. A. (2020). Datasets on statistical analysis and performance evaluation of
backtracking search optimisation algorithm compared with its counterpart algorithms. Data in Brief, 28,
105046.
14. Hassan, B. A. (2020). CSCF: A chaotic sine cosine firefly algorithm for practical application problems.
Neural Computing and Applications, 33, 1–20.
15. Gowthul Alam, M. M., & Baulkani, S. (2017). Reformulated query-based document retrieval using
optimised kernel fuzzy clustering algorithm. International Journal of Business Intelligence and Data
Mining, 12(3), 299.
16. Manikandan, N., Gobalakrishnan, N., & Pradeep, K. (2022). Bee optimization based random double
adaptive whale optimization model for task scheduling in cloud computing environment. Computer
Communications, 187, 35–44.
17. Bayazit, E. C., Sahingoz, O. K., & Dogan, B. (2020). Malware detection in Android systems with
traditional machine learning models: A survey. In 2020 International congress on human–computer
interaction, optimization and robotic applications (HORA) (pp. 1–8). IEEE.

18. Millar, S., McLaughlin, N., Martinez del Rincon, J., Miller, P., & Zhao, Z. (2020). DANdroid: A multi-
view discriminative adversarial network for obfuscated Android malware detection. In Proceedings of
the tenth ACM conference on data and application security and privacy (pp. 353–364).
19. Jerbi, M., Dagdia, Z. C., Bechikh, S., & Said, L. B. (2020). On the use of artificial malicious pat-
terns for android malware detection. Computers & Security, 92, 101743.
20. Wu, Q., Li, M., Zhu, X., & Liu, B. (2020). Mviidroid: A multiple view information integration
approach for android malware detection and family identification. IEEE Multimedia, 27(4), 48–57.
21. Hussain, S. J., Ahmed, U., Liaquat, H., Mir, S., Jhanjhi, N. Z., & Humayun, M. (2019). IMIAD:
Intelligent malware identification for android platform. In 2019 International conference on
computer and information sciences (ICCIS) (pp. 1–6). IEEE.
22. Feng, P., Ma, J., Sun, C., Xu, X., & Ma, Y. (2018). A novel dynamic android malware
detection system with ensemble learning. IEEE Access, 6, 30996–31011.
23. Xiao, X., Zhang, S., Mercaldo, F., Hu, G., & Sangaiah, A. K. (2019). Android malware
detection based on system call sequences and LSTM. Multimedia Tools and Applications, 78(4),
3979–3999.
24. Imtiaz, S. I., ur Rehman, S., Javed, A. R., Jalil, Z., Liu, X., & Alnumay, W. S. (2021). DeepAMD:
Detection and identification of android malware using high-efficient deep artificial neural network.
Future Generation Computer Systems, 115, 844–856.
25. Mahindru, A., & Sangal, A. L. (2021). MLDroid—Framework for android malware detection using
machine learning techniques. Neural Computing and Applications, 33(10), 5183–5240.
26. Zhu, H., Li, Y., Li, R., Li, J., You, Z.-H., & Song, H. (2020). Sedmdroid: An enhanced stacking
ensemble of deep learning framework for android malware detection. IEEE Transactions on Net-
work Science and Engineering, 8, 984–994.
27. Su, X., Shi, W., Qu, X., Zheng, Y., & Liu, X. (2020). DroidDeep: Using deep belief network
to characterize and detect Android malware. Soft Computing, 24, 1–14.
28. Zhang, H., Luo, S., Zhang, Y., & Pan, L. (2019). An efficient android malware detection system
based on method-level behavioral semantic analysis. IEEE Access, 7, 69246–69256.
29. Wang, W., Zhao, M., & Wang, J. (2019). Effective android malware detection with a hybrid model
based on deep autoencoder and convolutional neural network. Journal of Ambient Intelligence and
Humanized Computing, 10(8), 3035–3043.
30. Karbab, E. B., Debbabi, M., Derhab, A., & Mouheb, D. (2017). Android malware detection using
deep learning on API method sequences. arXiv:1712.08996.
31. Alazab, M., Alazab, M., Shalaginov, A., Mesleh, A., & Awajan, A. (2020). Intelligent mobile mal-
ware detection using permission requests and API calls. Future Generation Computer Systems, 107,
509–521.
32. Jiang, X., Mao, B., Guan, J., & Huang, X. (2020). Android malware detection using fine-grained
features. Scientific Programming, 2020, 1–13.
33. Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press.
34. Garg, M., & Dhiman, G. (2020). Deep convolution neural network approach for defect inspection of
textured surfaces. Journal of the Institute of Electronics and Computer, 2(1), 28–38.
35. Kumar, A., Gandhi, C. P., Zhou, Y., Kumar, R., & Xiang, J. (2020). Improved deep convolution
neural network (CNN) for the identification of defects in the centrifugal pump using acoustic
images. Applied Acoustics, 167, 107399.
36. Połap, D., & Woźniak, M. (2021). Red fox optimization algorithm. Expert Systems with Applica-
tions, 166, 114107.
37. Braik, M. S. (2021). Chameleon Swarm Algorithm: A bio-inspired optimizer for solving engineering
design problems. Expert Systems with Applications, 174, 114685.

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.

Dr. P. C. Senthil Mahesh received his B.E. in Computer Science and Engineering from
Madras University, Chennai in 1997. He received his M.E. and Ph.D. in Computer Science
and Engineering from Anna University, Chennai in 2006 and 2016 respectively. He has
twenty-one years of teaching experience in various engineering colleges. At present he
is working as an Associate Professor in the Computer Science and Engineering Department
at Excel Engineering College, Komarapalyam, Namakkal District, Tamil Nadu. He has
published more than twenty research papers in international journals and fifteen papers
in national journals. His main research interests include network security, software
engineering and mobile ad hoc networks. He is a life member of CSI and ISTE.

Dr. S. Hemalatha received her B.E. in Computer Science and Engineering from the
University of Madras in 2000 and her M.E. in Computer Science and Engineering from Anna
University, Chennai, India in 2004. She completed her Ph.D. in Computer Science and
Engineering at Anna University, India in 2016. She has twenty years of teaching
experience in different engineering colleges and is currently working as a Professor in
Computer Science and Engineering at Panimalar Institute of Technology, Chennai, India.
She has published 45 papers in national and international journals and 30 papers in
international conferences. Her research areas are network security and mobile security.
She is a member of CSI, IEEE and ISTE.
