
Journal of Intelligent Information Systems (2020) 55:1–26
https://doi.org/10.1007/s10844-020-00598-6

Clustering-Aided Multi-View Classification: A Case Study on Android Malware Detection

Annalisa Appice¹,² · Giuseppina Andresini¹ · Donato Malerba¹,²

Annalisa Appice
annalisa.appice@uniba.it
Giuseppina Andresini
giuseppina.andresini@uniba.it
Donato Malerba
donato.malerba@uniba.it

¹ Department of Informatics, Università degli Studi di Bari Aldo Moro, via Orabona 4, I-70125 Bari, Italy
² Consorzio Interuniversitario Nazionale per l'Informatica - CINI, via Orabona 4, I-70125 Bari, Italy

Received: 3 October 2019 / Revised: 29 January 2020 / Accepted: 17 March 2020 / Published online: 4 May 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract
Recognizing malware before its installation plays a crucial role in keeping an Android device safe. In this paper we describe a supervised method that is able to analyse multiple types of information (e.g. permissions, api calls and network addresses) that can be retrieved through a broad static analysis of Android applications. In particular, we propose a novel multi-view machine learning approach to malware detection, which couples knowledge extracted via both clustering and classification. In an assessment, we evaluate the effectiveness of the proposed method using benchmark Android applications and established machine learning metrics.

Keywords Multi-view Learning · Classification · Clustering · Android Malware Detection · Android Application Static Analysis

1 Introduction

The ceaseless growth of mobile sales and their pervasiveness in our daily lives has fostered
the development of malware for attacking mobile devices. Malware is a malicious appli-
cation aimed at stealing private information, sending SMS messages and reading contacts.
It can even cause damage by exploiting private, sensitive data, as well as infiltrating well-
defended organisations and service providers. Nowadays the Android operating system is by far the most popular mobile operating system, dominating the worldwide mobile market with a share of 74.13 percent in December 2019.¹ Unfortunately, Android is also
the most commonly targeted operating system for malware developers. Nokia reports that, among smartphones, Android devices were responsible for 47.15% of the observed malware infections in 2018 (NOKIA 2019). In response to the increase in malicious applications, malware detection is a vital component of any Android security solution (Painter and Kadhiwala 2017).
In the ongoing arms race with Android malware developers, several malware detection approaches have been proposed in the cybersecurity literature. These approaches increasingly rely on machine learning techniques, in order to automate and speed up the detection process, as well as to keep pace with malware evolution. In the last decade
various machine learning methods have been developed, in order to automatically reason
about malicious and genuine samples and fit the best detection model parameters (see (Ucci
et al. 2019) for a recent survey). These methods mainly boost malware detection by learn-
ing from sample data extracted through static, dynamic or hybrid analysis (where static and
dynamic are used together) of android applications.
Methods based on static analysis perform the reverse engineering of each application
sample for analysing the malicious code before it is executed (pre-execution) (Arp et al.
2014; Fan et al. 2017; Kang et al. 2016; Nguyen-Vu et al. 2019). By contrast, dynamic
analysis executes an application in a controlled environment such as an emulator or a real
device with the purpose of tracing its behaviour (Alzaylaee et al. 2017; 2020; Bhatia and
Kaushal 2017). As discussed in (Kapratwar et al. 2017), dynamic analysis may be very
informative, but requires sophisticated skills and platforms, which cause cost overheads and
time-consuming processes. On the other hand, static analysis is very useful and popular,
since it is much easier to remedy malware if it is never executed. Therefore, coherently with
conclusions drawn in (Kapratwar et al. 2017), this study still explores the machine learn-
ing potential with android static analysis. In particular, we investigate how various static
characteristics of android malware samples can be processed through machine learning, in
order to accurately profile malware signatures (i.e. patterns that indicate the presence of a
malicious code).
While several machine learning studies restrict static analysis to permissions (Rovelli and
Vigfússon 2014; Tiwari and Singh 2015; Talha et al. 2015), api calls (Li et al. 2015; Peira-
vian and Zhu 2013; Sheen et al. 2015; Shiqi et al. 2018; Yerima et al. 2014) or intent filters
(Idrees and Rajarajan 2014), we decide to pursue a machine learning method for android
malware detection, which is able to take advantage of various views of a broad static analysis
simultaneously. This follows the study in (Arp et al. 2014), where the authors have clari-
fied that a high variety of static features can be actually extracted from reverse engineering
of both the manifest and dexcode of android applications. These features define a wealth
of information naturally related to ten distinct views (i.e. permissions, features, intents,
application components, services, providers, restricted api calls, used permissions, suspi-
cious api calls and network addresses). All these views can be used simultaneously to better
understand decision-making in the analysis of differences between malicious and genuine
applications. In (Arp et al. 2014) a step of data fusion is applied before machine learning,
in order to build a single feature vector by concatenating all the features extracted in the
identified views. Therefore, machine learning is performed along this single feature vector,
concluding that feature variety is the crucial fuel for the accuracy of the machine learning

1 http://gs.statcounter.com/os-market-share/mobile/worldwide

approach. This conclusion is still confirmed when feature selection is used to diminish the
volume of information (Demontis et al. 2017; Roy et al. 2015; Wen and Yu 2017).
Continuing in this research direction, we formulate a novel machine learning method,
named MuViDA (Multi-View clustering-aided classifier for malware Detection in Android),
that learns a multi-view classification pattern for detecting android applications that are
malicious. It benefits from the informative richness of a broad, multi-view, static analysis
which gathers simultaneously as many static feature views as possible. Instead of flattening
the multiple views of a static analysis in a single feature vector, as promoted in (Arp et al.
2014), we resort to a multi-view learning approach, in order to efficaciously learn the fea-
ture volume of application samples as it is spanned over the identified variety of views. The
proposed method is developed in the supervised machine learning paradigm (i.e. a training set of fully labelled samples is required for learning), and applies an intriguing combination of clustering and classification to process knowledge produced
across multiple views. In particular, a clustering strategy is used to synthesise useful knowl-
edge in intra-view data. A classification strategy is then used to exchange clustering-aware
knowledge across multiple views, as well as to process the inter-view interactions, captured
through the exchange stage, and learn an effective multi-view malware detection pattern.
We assess the effectiveness of the proposed malware detection method in a benchmark
case study with 129013 Android applications (Arp et al. 2014). The case study is made
complex by the presence of an imbalanced class problem (less than five percent of the
applications are malware). We note that performing the evaluation with exactly all the data
in (Arp et al. 2014) resembles the realistic case where the number of malware applications
appearing in a market is significantly lower than the number of genuine applications. This
imbalanced setting is more challenging than evaluating malware detection methods with
balanced datasets as was done, for example, in (Bai and Wang 2016; Milosevic et al. 2017;
Narayanan et al. 2018; Narayanan et al. 2018; Zhang et al. 2016). Specifically, MuViDA
integrates the sampling strategy introduced in (Andresini et al. 2020), in order to handle the
imbalanced class problem of this scenario.
We evaluate the accuracy of both the proposed method and several competitors, by
computing the metrics (sensitivity—true positive rate, fall-out—false positive rate and
AUC—Area Under the ROC Curve), which are usually considered by the machine learning
and cybersecurity communities. The experimental results show the efficacy of the proposed multi-view learning method, compared to that of various competitors (including a deep neural network).
The paper is organised as follows. Section 2 clarifies the background, the motiva-
tion and the contribution of this paper. Section 3 describes the proposed machine learning
method. Section 4 presents the data scenario, the experimental setup and reports rele-
vant results. Finally, in Section 5 some conclusions are drawn and some future work is
outlined.

2 Background, motivation and contribution

Android is designed with security as one of its cornerstone principles. It employs a sophisticated permission system to guarantee that no application can access system-level resources without adequate permissions. Each permission corresponds to a cer-
tain security-critical action, for example, the dispatch of an SMS message. Each android
application declares the permissions it requires in its manifest. These privileges are accred-
ited by the user before installation. As permissions play a crucial role in the android security

system, it is no surprise that various machine learning studies focus on analysing the permissions that legitimate Android applications request (Tiwari and Singh 2015; Rovelli and Vigfússon 2014; Talha et al. 2015; Taheri et al. 2019).
Although these studies prove that permission analysis can effectively contribute to
detecting malicious android applications, there are also various papers highlighting that api
calls (Li et al. 2015; Peiravian and Zhu 2013; Yerima et al. 2014; Sheen et al. 2015; Shiqi
et al. 2018; Taheri et al. 2019), intent filters (Idrees and Rajarajan 2014; Taheri et al. 2019),
as well as combinations of statements, method calls, function arguments and instructions
(Milosevic et al. 2017) can be considered for accurate android malware detection. Follow-
ing this research direction, the authors of (Arp et al. 2014) have identified which kind of
android information can be statically extracted, in order to help accurate machine learning.
Their study selects more than 500,000 static features spanned on ten distinct views, and
proves that feature variety contributes to accuracy gain in the machine learning approach.
Following this research direction, the authors of (Demontis et al. 2017; Roy et al. 2015; Wen
and Yu 2017) have explored the use of feature selection algorithms, in order to diminish the
high volume of static features identified in (Arp et al. 2014). Interestingly they have proved
that machine learning is still accurate, provided that the view variety is preserved in the
final feature vector model, i.e. selected features still come from various views. In any case,
although existing studies reach an important milestone, demonstrating that machine learn-
ing can take advantage of information along multiple views, it is improper to concatenate
features of several views into one long vector. This is because they have different proper-
ties, and simply concatenating them into a high-dimensional feature vector will suffer from
the so-called curse of dimensionality (Yu et al. 2012).
On the other hand, multi-view learning has become popular in recent years (Zhao et al.
2017), as it is considered a viable solution to the curse of dimensionality problem when
multiple views of data are available. In particular, multi-view learning aims to learn one
function to model each view and jointly optimizes all the functions to improve the general-
ization performance. Well studied multi-view learning schemes include ensemble learning
(Folino and Pisani 2016) and stacking (Garcia-Ceja et al. 2018), which can be applied to
classification functions learned from various data views.
Multi-view learning has recently attracted a certain amount of attention in cybersecu-
rity. In particular, the authors of (Bai and Wang 2016; Zhang et al. 2016) apply multi-view
learning to malware detection, in order to appropriately address the curse of dimension-
ality with features retrieved through android static analysis. Both studies consider several
categories of static features defining various views. They propose learning a distinct clas-
sification function from each view and combining the multiple classification functions
by applying an ensemble with majority voting or a stacking strategy. In (Bai and Wang
2016) the authors also investigate the opportunity of learning several classification functions
simultaneously for each view. The authors of (Guo et al. 2010) apply multi-view learning to
information available in API call sequences, in order to discriminate between genuine and
malicious processes. They divide the API call sequences of malware into sub-sequences and
use each sub-sequence to train a classification function. The classification outputs are finally
combined in a single score. Following this research direction, the effectiveness of both
multi-view learning and ensemble learning has been proved in intrusion detection problems
(Miller and Busby-Earle 2017), as well as in Windows malware detection (Tajoddin and
Abadi 2019). More recently, a unified framework that integrates multiple views of android
applications has been formulated in (Narayanan et al. 2018). It performs comprehensive
malware detection in combination with malicious code localization. Specifically, it applies

Multiple Kernel Learning to find a weighted combination of the views which yields the
best detection accuracy. Finally, a feature hashing-based word embedding model is investi-
gated in (Narayanan et al. 2018), in order to learn multi-view graph/subgraph embeddings
of android applications.
Considering the promising results of multi-view learning in cybersecurity (Zhang et al.
2016; Bai and Wang 2016; Narayanan et al. 2018; Narayanan et al. 2018), we have decided
to continue investigations in a multi-view learning direction, in order to bridge the gap
between the machine learning bias and the choice of the android application view bias,
from which traditional machine learning approaches may suffer. We base our study on
the assumption that the truth underlying a malware detection pattern would recognize the
malicious nature of an android application independently of the application view. There-
fore, we decide to use simultaneously the multiple views derived from the broad android
static analysis described in (Arp et al. 2014), by bootstrapping classifications with clustering knowledge learned on different views and exchanged from one view to another. This is done
with a novel, application-specific, multi-view methodology that combines clustering and
classification, in order to learn a malware detection pattern by accounting for knowledge
exchanged across several views. Our proposal differs from the combination of clustering and classification explored in (Milosevic et al. 2017), where clustering is considered
for producing labels in a single-view, semi-supervised setting. These labels are subsequently
used to train a classification model with more data. It is also different from (Andresini et al.
2020), where clustering is considered as a strategy to deal with the imbalanced class prob-
lem. In particular, a k-means algorithm is used in (Andresini et al. 2020), in order to reduce
the amount of genuine information and learn the malware detection pattern in a balanced
setting. Finally, it is different from the existing multi-view approaches (see the brief review
reported above) that mainly learn independent malware detection patterns from independent
views, while exchanging the intra-view learned patterns during the combination of classi-
fications achieved on the separate views. On the contrary, in our proposal the exchange of
knowledge among the various views involves the clusters learned at the view level, while using classification as the means of exchange.
In particular, we use a clustering algorithm to roughly separate malicious samples from
genuine samples. As we adopt a mechanism to automatically determine the number of
clusters based on the data to cluster, application samples having different profiles (e.g.
malicious applications belonging to various families) may be separated in distinct clusters.
The clustering stage is repeated on each separate view, so it assigns every training sam-
ple to one distinct cluster per view. In every view, the clustering assignment of a sample is
decided according to the static feature vector of the sample in the considered view. There-
fore, the clustering assignments of the training samples (namely clustering patterns) define
the knowledge extracted from the training applications in an intra-view level. Subsequently
we use a classification strategy to exchange the learned clustering patterns across multiple
views and identify possible interactions between separate views, which may improve the
accuracy of the malware detection task. In order to make this exchange, we propose using a
classification strategy that learns predictions of the sample clustering assignments decided
in one view (dependent feature of the classification used for the exchange) depending on the
static feature vector of each left-out view (independent features of the classification used
for the exchange).
Our assumption behind using a classification strategy to exchange clustering patterns
across views is that training a classification function from the static features of one view, in
order to predict cluster assignments decided in the left-out views, may help to constrain the

Table 1 Description of frequently used symbols

Symbol        Description

A             android application domain
D             training set (D ⊂ A)
a             android application (sample)
ω_j           j-th view of (Arp et al. 2014), with j = 1, ..., 10
X^j           vector of binary static features associated with the j-th view
Y             binary label (malicious/genuine)
A^j           aggregate feature that counts how many binary static features spanned on X^j are 1-valued on a sample
θ             parameter to control the granularity of clustering
ρ             parameter to control the size of the sampling strategy
C^j           categorical cluster variable that represents the clustering assignments of training samples decided by the clustering pattern learned on the j-th view
C^{≠j}        feature vector composed of cluster variables C^i with i ≠ j
γ^j           multi-target classification function γ^j : X^j ↦ C^{≠j}
Ĉ^i_{γ^j}     cluster feature C^i as it is predicted by γ^j (with i ≠ j)
Ĉ^{≠j}        vector composed of Ĉ^i_{γ^j}, with i ≠ j

search for complex malware detection patterns that agree across the various views. Based
upon this assumption, we should be able to learn a more robust malware detection pattern
from the combination of the clustering-aided exchanged patterns than from the combina-
tion of the separate malware classification functions that are commonly trained from the
independent views (as it has been proposed in (Bai and Wang 2016; Zhang et al. 2016)).
We determine the empirical validity of this hypothesis by including traditional multi-view
solutions described in (Bai and Wang 2016; Zhang et al. 2016) as competitors of the experi-
mental study illustrated in Section 4. The malware detection pattern can finally be learned as
a consensus classification function depending on the clusters’ assignments predicted during
the exchange phase. We note that to learn an accurate consensus in an imbalanced training
scenario (i.e. in a scenario with a low number of malicious samples and a high number of
genuine samples), a sampling strategy is applied to the genuine information.
Considering that we define a novel method for android malware detection in multi-view
learning, this study is relevant for the machine learning community. It contributes to demonstrating that multi-view learning can gain improvement over traditional single-view learning
in an application context (android security) that has recently gained importance. On the
other hand, this study is also significant for the cybersecurity community, as it describes
a machine learning method that deals with multiple views of a broad static analysis, by
properly handling the curse of dimensionality.

3 Multi-view clustering-aided classification

MuViDA is formulated in the supervised setting by resorting to the multi-view learning


paradigm. This is to properly integrate the multiple data views that are derived through a
broad static analysis of a collection of android applications. The list of symbols frequently
used in the illustration of the proposed method is reported in Table 1.

In MuViDA, the application data (samples) are extracted through the static android analyser described in (Arp et al. 2014). This analyser maps every android application onto ten distinct feature vectors. These vectors convey binary information extracted through reverse
engineering and related to ten different entities, namely, permissions, features, intents, appli-
cation components, services, providers, restricted api calls, used permissions, suspicious
api calls and network addresses. Every binary feature is defined to be set equal to 1 if the
reverse engineering analyser verifies that the application code contains the feature, 0 oth-
erwise. For example, the static analyser extracts a feature RECEIVE SMS (related to the
entity permission) that is equal to 1 if the manifest of the analysed application declares that
this permission will be granted at installation time, 0 otherwise. We note that the ten entities described in (Arp et al. 2014) and considered in this study produce android application views that are compatible with the assumption of view independence, which is generally made in multi-view learning (Zhao et al. 2017).
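To make this representation concrete, the following minimal Python sketch (not the authors' code; the view names and vocabularies are illustrative placeholders) shows how a single application can be mapped onto ten binary view vectors:

```python
# Illustrative sketch of the multi-view binary encoding. View names and
# vocabularies are placeholders for the ten entities extracted by the
# static analyser of (Arp et al. 2014).
import numpy as np

VIEW_NAMES = [
    "permission", "hardware_feature", "intent", "activity", "provider",
    "service_receiver", "restricted_api_call", "used_permission",
    "suspicious_call", "network_address",
]

def encode_app(found_features, vocabularies):
    """Map one application onto ten binary feature vectors X^1, ..., X^10.

    found_features: dict view -> set of string features found by reverse engineering
    vocabularies:   dict view -> ordered list of all string features of that view
    """
    views = {}
    for view in VIEW_NAMES:
        vocab = vocabularies[view]
        x = np.zeros(len(vocab), dtype=np.int8)
        for i, feature in enumerate(vocab):
            # 1 if the analyser verified the feature in the code/manifest, else 0
            if feature in found_features.get(view, ()):
                x[i] = 1
        views[view] = x
    return views
```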
The multi-view learning process of MuViDA (see the block diagram reported in Fig. 1)
is three-staged. In particular, it comprises:
– A clustering stage that is repeated on every view of the training set, in order to learn a
view-related clustering pattern of the training set in each view. This pattern separates
training samples in disjoint clusters, based on the feature vector of the considered view
(see details in Section 3.1);
– A knowledge exchange stage that resorts to a classification strategy, in order to export each clustering pattern learnt in one view to the left-out views. This stage learns one multi-target classification function for each view involved in the exchange operation (see details in Section 3.2);
– A final multi-view combination stage that applies a stacking-based fusion method to learn the consensus malware detection pattern, by accounting for the clustering patterns created and exchanged in the previous two stages. In particular, the learned malware detection pattern allows us to classify any android application as malicious or genuine, based on its clustering assignments predicted with the classification functions learned in the knowledge exchange stage. A sampling strategy is introduced, in order to deal with the imbalanced class problem when learning the consensus pattern (see details in Section 3.3).

Fig. 1 The block diagram of MuViDA. For the sake of simplicity, the block diagram is drawn with three views. (1) The clustering stage is repeated on every view. (2) The knowledge exchange stage exchanges the clustering patterns across the multiple views by learning a classification function for each view, in order to predict the cluster assignments of the dependent, left-out views from the static features of the independent, considered view. (3) The stacking stage learns the classification pattern that combines the cluster assignments predicted by the classification functions learned at each view during the exchange stage
We note that, although MuViDA encloses a clustering stage, which is traditionally formulated as an unsupervised task, it performs its learning phase in the supervised setting. In fact, the ground truth labels are taken into account both during the clustering stage (see details in Section 3.1) and the stacking stage (see details in Section 3.3).
Formally, the input parameters of MuViDA are:
– a training dataset D that is a collection of n android applications (D ⊂ A) labeled in
the set {malicious, genuine};
– a collection of ten independent views ω_j (with j = 1, ..., 10),² so that each view corresponds to one entity considered in (Arp et al. 2014); the j-th view can be seen as a function ω_j : A ↦ X^j that maps any android application a ∈ A onto a view-related binary feature vector X^j;
– a binary label Y , so that each training sample a ∈ D is labelled according to an
unknown classification function, whose range is a finite set of two distinct labels, i.e.
Y = malicious or Y = genuine;
– two user-defined thresholds θ and ρ, which control the granularity of the clustering
patterns learned in the clustering stage and the size of the sampling strategy introduced
in the stacking stage, respectively.
The output is a malware detection pattern that composes the stacking function learned in
the stacking stage on the multi-target classification functions learned in the knowledge
exchange stage.

3.1 Clustering stage

Clustering is performed along each view, in order to learn view-related, compact represen-
tations (clustering patterns) of the training set. As the clustering stage learns one clustering pattern for each view, it aids in understanding the view-specific data structures behind
the training samples. In particular, for every view, the clustering stage assigns each training
sample to one view-related cluster with the purpose of roughly separating malicious and
genuine applications into distinct clusters. We assume that this clustering knowledge may
aid in deriving information useful for malware detection. To this aim, each view-related
clustering assignment of a training sample is decided on the basis of the description of the
sample in both the static feature vector associated with the considered view, and the label
of the sample. We highlight that our idea of clustering by accounting for the static fea-
ture vector of a view contributes to group applications that presumably manifest a similar
signature in the considered view. On the other hand, accounting for the label information
during clustering helps to boost the discovery of clusters, which mainly group malicious

2 By changing the static analyser the method can be generalised to any number of views.

applications, and clusters, which mainly group genuine applications. In any case, coherently
with the unsupervised formulation of clustering, both the static features of a view and the labels are treated as descriptive features during the clustering stage.
In principle, clustering can be completed using various algorithms. In MuViDA we select k-means++ (Arthur and Vassilvitskii 2007) as the clustering algorithm. Standard k-means starts by initialising k centres randomly. Then it performs an iterative step that partitions the samples into clusters by minimising the intra-cluster variance. In particular, the iterative step of k-means alternates between (1) assigning training samples to clusters based on their closeness to the current centres and (2) re-computing the centres (the centroids of the clusters) based on the current assignment of samples to clusters. This process is repeated for a given number of iterations or until convergence (i.e. until the clustering assignments no longer change).
The choice of k-means is due to various considerations. It is a well-known partition-based algorithm that is easy to implement, once the dissimilarity measure (measuring the closeness between every training sample and every cluster centre) is defined and the number of clusters is decided. It is an effective way to identify the distinct behaviours in a training set, with cluster centres that plausibly model the signatures of these behaviours. In particular, we select k-means++, as it also applies a procedure to initialise the cluster centres before proceeding with the standard k-means iterative step. With the k-means++ initialisation, the algorithm is guaranteed to find a solution that is O(log k)-competitive (with k being the number of clusters) with the optimal k-means solution (Arthur and Vassilvitskii 2007).
Regarding the definition of the dissimilarity measure adopted in the k-means iterative step, we decompose its computation into two terms, which are independently defined on the static feature vector of the considered view and on the label, respectively. Both terms contribute equally to the overall dissimilarity measurement. If we consider the j-th view, the dissimilarity measure is formulated as follows:

$$d^j(a, b) = \frac{1}{2}\, dis_{X^j}(a, b) + \frac{1}{2}\, dis_Y(a, b), \quad (1)$$

where a and b are the compared points (i.e. a training sample and a cluster centre), and dis_{X^j}(a, b) and dis_Y(a, b) are the dissimilarity terms computed between a and b on X^j and Y, respectively. Several dissimilarity measures have been defined in the machine learning field, in order to compare categorical and numeric data. Let us recall that we handle binary features, as described in (Arp et al. 2014), as well as binary labels. As binary data are numerically encoded with 0/1, we can resort to the weighted Euclidean distance to compute both the dissimilarity terms reported in Equation 1. In particular, they are computed as follows:

$$dis_{X^j}(a, b) = \sqrt{\frac{1}{|X^j|} \sum_{X \in X^j} (a(X) - b(X))^2}, \quad (2)$$

$$dis_Y(a, b) = \sqrt{(a(Y) - b(Y))^2}. \quad (3)$$

According to Equations 2 and 3, the terms dis_{X^j}(a, b) and dis_Y(a, b) are both computed in the range [0,1]. This allows both the static feature vector X^j and the label Y to contribute equally to Equation 1, without suffering from any bias due to the higher number of features included in X^j with respect to Y.
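For illustration, a minimal Python reading of Equations 1-3 follows, assuming the compared points are 0/1-encoded numpy arrays (this is a sketch, not the authors' code):

```python
# Minimal sketch of the dissimilarity measure of Equations 1-3, assuming
# 0/1-encoded numpy arrays for the view vectors and scalar 0/1 labels.
import numpy as np

def dis_view(a_x, b_x):
    # Equation (2): Euclidean distance on the view vector, weighted by the
    # number of features so that the term lies in [0, 1] for binary data.
    return np.sqrt(np.mean((a_x - b_x) ** 2))

def dis_label(a_y, b_y):
    # Equation (3): dissimilarity on the binary label (0 = genuine, 1 = malicious).
    return np.sqrt((a_y - b_y) ** 2)

def d_j(a_x, a_y, b_x, b_y):
    # Equation (1): both terms contribute equally to the overall measurement.
    return 0.5 * dis_view(a_x, b_x) + 0.5 * dis_label(a_y, b_y)
```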
With regard to the number of clusters, we resort to the Elbow method (Bholowalia and Kumar 2014), in order to automatically select the number of clusters per view, based on the

data characteristics of the training set in the considered view. The method explains the intra-
cluster variance of the clustering assignment of a training set as a function of the number of
clusters. It is based upon the idea that the intra-cluster variation tends to decrease toward 0
as we increase the number of clusters (the intra-cluster variation is 0 when the number of
clusters is equal to the number of samples in the training set, since, in this case, each sample
is its own cluster and there is no dissimilarity between it and the centre of its cluster). If we plot the intra-cluster variance against the number of clusters, each added cluster initially explains a large amount of additional variance; however, at some point the marginal gain drops drastically (see Fig. 2). Based upon this consideration, the
Elbow method chooses a number of clusters, so that adding another cluster does not give
a much better modelling of the data. In MuViDA we resort to a threshold-based approach
to determine the elbow of the number of clusters, with a separate elbow estimated on each
independent view. Specifically, if we consider the j-th view, the optimal number of clusters k^j_elbow is computed on the view ω_j of D as the lowest value of k such that:

$$k^j_{elbow} = \arg\min_{k \geq 2} \{ |var(kmeans_{\omega_j,k}(D)) - var(kmeans_{\omega_j,k+1}(D))| \leq \theta \}, \quad (4)$$

where var(·) measures the intra-cluster variance and θ is the user-defined threshold that controls the clustering granularity (by default θ = 0.005 in this study). According to Equation 4, k^j_elbow is determined as the lowest k such that increasing k by 1 leads to a reduction of variance of the new clustering pattern that is lower than θ. We note that the elbow method with threshold θ allows us to run k-means with a number of clusters that may change view by view. This aids the clustering stage in appropriately fitting the structure of the training samples, by accounting for possible different data characteristics of the same training set in the various views.
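The sketch below illustrates this threshold-based elbow rule with scikit-learn's k-means++. Two caveats apply: scikit-learn's KMeans supports only the plain Euclidean distance, so the label is appended as an extra column as a rough stand-in for the two-term measure of Equation 1 (the exact measure would need a custom k-means loop), and the normalisation used inside var(·) is one plausible reading of the intra-cluster variance:

```python
# Hedged sketch of the threshold-based elbow rule of Equation 4.
import numpy as np
from sklearn.cluster import KMeans

def intra_cluster_variance(X, km):
    # Mean squared distance of each sample to its assigned centre, normalised
    # by the number of features (one plausible reading of var(.) in Equation 4).
    centres = km.cluster_centers_[km.labels_]
    return np.mean(np.sum((X - centres) ** 2, axis=1)) / X.shape[1]

def elbow_k(X_view, y, theta=0.005, k_max=30, seed=0):
    Xy = np.hstack([X_view, y.reshape(-1, 1)])  # view features + 0/1 label
    prev = None
    for k in range(2, k_max + 1):
        km = KMeans(n_clusters=k, init="k-means++", n_init=10,
                    random_state=seed).fit(Xy)
        var_k = intra_cluster_variance(Xy, km)
        if prev is not None and abs(prev - var_k) <= theta:
            return k - 1  # lowest k whose variance reduction is <= theta
        prev = var_k
    return k_max  # fallback if no elbow is found below k_max
```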
Final considerations concern how every view-related clustering pattern, learned accord-
ing to the theory reported above, can actually aid in deriving an intra-view compact
representation of the training set. Let us consider that any clustering pattern, learned
with a partition-based clustering algorithm such as k-means, naturally engineers a cluster
feature. This feature maps each training sample onto a cluster label that belongs to a view-
defined categorical domain (cluster label set). For example, the clustering pattern learned
on the j-th view allows us to build the cluster feature C^j having the ω_j-based domain C^j = {cluster^j_1, ..., cluster^j_{k^j_elbow}}. In principle, different clustering patterns can be learned
in different views of the same training set, so the cluster features engineered from these pat-
terns express a vector of distinct features with distinct domains. According to this point of

Fig. 2 The intra-cluster variance (axis Y) by varying the number of clusters k (axis X), as computed on the view Restricted api call of the training set of one split of the android application collection described in Section 4.1. The elbow corresponds to k = 9

view, the clustering stage performs a data engineering step that maps each training sample onto a tuple of values, spanned on the vector of ten cluster features C^1, ..., C^10, where each cluster feature C^j (with j = 1, ..., 10) provides a compact (1-dimensional) representation of every sample with respect to the j-th view ω_j.

3.2 Knowledge exchange stage

A classification strategy is formulated, in order to exchange view-related clustering patterns


across multiple views. For this purpose, we learn a classification function on the static fea-
ture vector of each view, in order to predict the clustering assignments learned in the left-out
views. Specifically, for the j-th view, we define: (1) an independent space spanned on X^j and (2) a dependent space spanned on C^{≠j}, where C^{≠j} comprises the cluster features C^i with i ≠ j. Let us consider a representation of the training samples as pairs of tuples spanned on the independent space (X^j) and the dependent space (C^{≠j}) defined above (see an example in Fig. 3). We create the knowledge exchange operation by learning the classification functions defined as:

$$\gamma^j : X^j \mapsto C^{\neq j}. \quad (5)$$
Equation 5 involves a dependent space that is structured as a vector of dependent (tar-
get) features. Therefore, this formulation of a classification strategy properly resembles the
definition of a multi-target classification task. As reported in (Last 2016), the multi-target
classification can be seen as a generalization of the multi-label classification task (Madjarov
et al. 2012). Namely, multi-label classification is concerned with learning from a training
set, where each sample is associated with multiple labels. These multiple labels belong to
a predefined set of labels. The goal of multi-label classification is to construct a predictive
model that will provide a list of relevant labels for a given, previously unseen sample. In the
multi-target task, the output list of labels includes a distinct label selected from the domain
of every target.
Additional considerations concern the fact that the multi-target classification task can
also be naively decomposed into a series of single-target classifications, so that one tradi-
tional classification pattern can be learned to predict each target variable. In any case, we

Fig. 3 Details of the input of the knowledge exchange operation from the block diagram of MuViDA in Fig. 1. The detail refers to view ω_1 and the cluster features C^2 and C^3

decide here to follow a main stream of research in machine learning (Kocev et al. 2013;
Papagiannopoulou et al. 2015; Last 2016), which has repeatedly confirmed that learning a
single multi-target classification function can outperform the decision of learning several
single-target classification functions. In fact, the multi-target classification strategy limits
the risk of data overfitting and gains in accuracy by exploiting possible dependencies among
the targets (Last 2016). Considering that some relationships should arise between clustering
patterns learned on various views, we can reasonably resort to a strategy that capitalizes on the dependencies across the targets, in order to identify inter-view knowledge, which aids the
search for malicious and genuine families that share a similar signature.
In particular, we learn an ensemble of predictive clustering trees for multi-target predic-
tion (Kocev et al. 2013), in order to learn each multi-target classification function defined as
in Equation 5. The ensembles are constructed by using a multi-target version of a Random
Forest algorithm. Random Forests are currently one of the most popular machine learning
algorithms as they are efficient, easy to use and commonly highly accurate (Lin et al. 2017).
The decision to consider Random Forests in this paper is also supported by several studies (Alam and Vuong 2013; Bhatia and Kaushal 2017; Goyal et al. 2016; Andresini et al. 2020), which apply Random Forests to android malware detection with great success, also in combination with multi-view ensemble and stacking (Bai and Wang 2016). Random Forests have also been used for malware detection in PDF files (Khorshidpour et al. 2017).
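As a hedged stand-in for the CLUS ensembles of predictive clustering trees used in the paper, the sketch below relies on scikit-learn's RandomForestClassifier, which natively accepts a two-dimensional target matrix and grows trees that predict all targets jointly (the names are illustrative):

```python
# Hedged stand-in for the knowledge exchange stage (Equation 5). The paper
# uses multi-target ensembles of predictive clustering trees (CLUS); here a
# multi-output Random Forest plays that role instead.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def learn_exchange_functions(X_views, C, n_trees=100, seed=0):
    """X_views: list of ten (n_samples, d_j) binary matrices.
    C: (n_samples, 10) matrix of cluster assignments, one column per view.
    Returns one classifier gamma^j per view, predicting the left-out columns."""
    gammas = []
    for j, Xj in enumerate(X_views):
        targets = np.delete(C, j, axis=1)  # C^{!=j}: all cluster features but C^j
        rf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
        rf.fit(Xj, targets)                # multi-output fit: one forest, 9 targets
        gammas.append(rf)
    return gammas

def predict_exchanged_clusters(gammas, X_views):
    # Concatenate the predicted C_hat^{!=j} vectors of all views: this is the
    # independent space of the stacking combiner of Section 3.3.
    return np.hstack([g.predict(Xj) for g, Xj in zip(gammas, X_views)])
```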

3.3 Stacking stage

A stacking combiner (Wolpert 1992) is finally learned as a consensus malware detection


pattern. In particular, the stacking combiner allows us to learn a meta-classification function
that predicts the label of a sample (malicious or genuine), based on the cluster predic-
tions yielded with the multi-target classification functions learned during the knowledge
exchange stage. For this purpose, the independent space of the stacking combiner necessarily comprises tuples of values spanned on [Ĉ^{≠j}]_{j=1,...,10}, that is, on the concatenation of the cluster feature vectors Ĉ^{≠j} predicted by every γ^j() (i.e. Ĉ^{≠j} = γ^j(X^j)), with j varying between 1 and 10.
In principle, any feature engineering step can be used to enrich this independent space. For example, a recent study (Suarez-Tangil et al. 2017) has shown that the accuracy of an android malware detection pattern can be improved by accounting for independent features that count the number of permissions granted and the number of intents declared. Based upon this analysis, we define [A^j]_{j=1,...,10}, that is, the vector of aggregate features A^j (with j varying between 1 and 10). Every aggregate feature A^j counts how many static binary features of X^j are verified (i.e. set equal to 1) in the considered sample. For example, the aggregate
feature associated with the view on permissions counts how many permissions are declared
in the manifest of an android application.
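A minimal sketch of this counting step, assuming the ten views are stored as binary numpy matrices:

```python
# Minimal sketch of the aggregate features: each A^j counts the 1-valued
# binary static features of the view vector X^j (e.g. the number of
# permissions declared in the manifest).
import numpy as np

def aggregate_features(X_views):
    """X_views: list of ten (n_samples, d_j) binary matrices.
    Returns the (n_samples, 10) matrix [A^1, ..., A^10]."""
    return np.column_stack([Xj.sum(axis=1) for Xj in X_views])
```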
Based upon the considerations reported above, we consider the aggregate function:

$$\alpha^j : X^j \mapsto A^j, \quad (6)$$

as the function that maps X^j onto A^j. Given an independent space spanned on both [Ĉ^{≠j}]_{j=1,...,10} and [A^j]_{j=1,...,10},³ the dependent label Y (with values malicious and genuine) and a representation of the training samples as pairs of tuples spanned on the independent space [Ĉ^{≠j}]_{j=1,...,10} × [A^j]_{j=1,...,10} and the dependent label Y, the consensus meta-classification function can be learned as:

$$\sigma : [\hat{C}^{\neq j}]_{j=1,\ldots,10} \times [A^j]_{j=1,\ldots,10} \mapsto Y. \quad (7)$$

³ The advantage of accounting for aggregate information in Equation 7 is validated empirically in the application scenario of this study (see the results in Section 4.4.1).
In this study, we select a Random Forest algorithm to learn the classification function σ().
To deal with the imbalanced class setting, which is common in android scenarios (such as that explored in Section 4), we decide to balance the training set actually considered to learn σ(). For this purpose, we apply the clustering-based sampling strategy adopted in (Andresini et al. 2020). In particular, we consider a training set that comprises all the malicious training samples (minority class samples) and a number of genuine pseudo-samples, which are constructed as representatives of the genuine training samples (majority class samples). These representatives are built by partitioning the genuine training samples, as they are spanned on the space [Ĉ^{≠j}]_{j=1,...,10} × [A^j]_{j=1,...,10}, into clusters (with the k-means algorithm) and selecting the centres of these clusters as genuine training pseudo-samples. The k-means algorithm is run in the sampling strategy with the number of clusters set equal to the user-defined parameter ρ multiplied by the number of malicious training samples. By default ρ = 1, that is, σ() is learned from a balanced training set comprising the same number of malicious and genuine samples. As this balanced setting diminishes the amount of genuine information processed to learn σ(), it may produce a high number of false alarms. So, the parameter ρ can be used in MuViDA, in order to control the fall-out of malware alerts (false positive rate) of σ().⁴
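A hedged sketch of this sampling strategy follows, using scikit-learn's k-means to build the genuine pseudo-samples (labels are assumed to be encoded as 1 = malicious, 0 = genuine; the function and variable names are illustrative):

```python
# Hedged sketch of the clustering-based sampling strategy of
# (Andresini et al. 2020) as adopted here: genuine (majority-class) samples
# are replaced by k-means centres, with k = rho * (number of malicious samples).
import numpy as np
from sklearn.cluster import KMeans

def balance_training_set(Z, y, rho=1.0, seed=0):
    """Z: (n, d) meta-feature matrix [C_hat, A] of all training samples.
    y: binary labels, 1 = malicious, 0 = genuine."""
    Z_mal, Z_gen = Z[y == 1], Z[y == 0]
    k = max(1, int(round(rho * len(Z_mal))))  # assumes k <= len(Z_gen)
    km = KMeans(n_clusters=k, init="k-means++", n_init=10,
                random_state=seed).fit(Z_gen)
    pseudo_genuine = km.cluster_centers_      # representatives of genuine samples
    Z_bal = np.vstack([Z_mal, pseudo_genuine])
    y_bal = np.concatenate([np.ones(len(Z_mal)), np.zeros(k)])
    return Z_bal, y_bal
```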
In short, the consensus malware detection pattern that is finally learned in this stage is defined by the function composition:

$$\sigma\big(\gamma^1(X^1), \gamma^2(X^2), \ldots, \gamma^{10}(X^{10}), \alpha^1(X^1), \alpha^2(X^2), \ldots, \alpha^{10}(X^{10})\big), \quad (8)$$

which combines Equations 5, 6 and 7. Equation 8 describes a pattern that allows us to classify any android application as malicious or genuine, based on the multi-view tuple of values derived through the considered static analysis.
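At prediction time, the composition of Equation 8 can be read as in the following minimal sketch, assuming exchange classifiers and a stacking combiner fitted as in the previous sketches:

```python
# Minimal prediction-time sketch of the composition of Equation 8: the ten
# view vectors of one application pass through the exchange functions gamma^j
# and the aggregate counts alpha^j, and the stacking combiner sigma outputs
# the final label.
import numpy as np

def classify_application(x_views, gammas, sigma):
    """x_views: list of ten 1-D binary view vectors of one application.
    gammas:  fitted exchange classifiers, one per view (Section 3.2).
    sigma:   fitted stacking combiner (Section 3.3)."""
    c_hat = np.hstack([g.predict(xj.reshape(1, -1))[0]
                       for g, xj in zip(gammas, x_views)])  # [C_hat^{!=j}] vectors
    agg = np.array([xj.sum() for xj in x_views])            # [A^1, ..., A^10]
    z = np.concatenate([c_hat, agg]).reshape(1, -1)
    return sigma.predict(z)[0]  # 'malicious' or 'genuine'
```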

4 Experimental analysis

The performance of MuViDA is investigated by considering 129013 benchmark android


applications described in (Arp et al. 2014). A brief description of the data considered in
this study is reported in Section 4.1. The evaluation metrics are illustrated in Section 4.2,
while the implementation details are reported in Section 4.3. Finally, the empirical results
are discussed in Section 4.4.

4.1 Data

The experimental study is carried out adopting the dataset and the evaluation methodology
introduced in (Arp et al. 2014).5 This dataset contains 123453 genuine android applications
and 5560 malware applications, collected in the period from August 2010 to October 2012.

4 The impact of the parameter ρ on the fall-out and sensitivity is investigated in the application scenario of
this study (see the results in Section 4.4.1).
5 https://www.sec.cs.tu-bs.de/∼danarp/drebin/

The applications are collected from various sources such as the GooglePlay Store, alter-
native Chinese Markets, alternative Russian Markets, other sources (e.g. android websites,
malware forums and security blogs) and Android Malware Genome Project (Zhou and Jiang
2012). Both the manifest and dexcode of these android applications undergo the static anal-
ysis described in (Arp et al. 2014), which extracts 545292 string-defined features. These features are represented as binary features and grouped into ten views, as reported in Table 2 (column 2).
To speed up machine learning, the malware detection task is evaluated on the feature
selection of the considered dataset. This feature selection is performed by retaining the
most discriminant string features x, for which |p(x|malware) − p(x|genuine)| (estimated
on training data) is greater than 0.001. The number of features selected by applying this
procedure is reported in Table 2 (column 3). This use of feature selection is consistent with
findings in (Demontis et al. 2017; Roy et al. 2015; Wen and Yu 2017), as it is shown that
only a very small fraction of features is significantly discriminant and usually selected to
learn the malware detection pattern.
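A minimal sketch of this filter follows, with p(x = 1|malware) and p(x = 1|genuine) estimated as empirical frequencies on the training split:

```python
# Minimal sketch of the discriminant-feature filter described above: keep the
# binary features x for which |p(x|malware) - p(x|genuine)| > 0.001.
import numpy as np

def select_discriminant_features(X, y, threshold=0.001):
    """X: (n, d) binary feature matrix; y: 1 = malware, 0 = genuine.
    Returns the indices of the retained columns."""
    p_mal = X[y == 1].mean(axis=0)  # empirical p(x = 1 | malware)
    p_gen = X[y == 0].mean(axis=0)  # empirical p(x = 1 | genuine)
    return np.where(np.abs(p_mal - p_gen) > threshold)[0]
```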
For the evaluation, the dataset is randomly split into a known partition (training dataset -
67%) and an unknown partition (testing dataset - 33%) on ten trials. The malware detection
pattern is determined on the training set, whereas the testing set is used for measuring the
final detection performance. In particular, the evaluation procedure is repeated ten times,
so that performance results can be averaged on the performed independent trials. The par-
titioning ensures that reported results only refer to malicious applications unknown during
the learning phase. We note that the 67/33 dataset splits are provided by the authors of (Arp et al. 2014). We use these splits (instead of different data splits) to guarantee a fair
comparison between the accuracy performance achieved in this study and the performance
already described in (Arp et al. 2014).

4.2 Evaluation metrics

The following metrics are computed to evaluate the detection performance:

Table 2 DREBIN data description. For each view (column 1) derived from the static analysis: the number of
string features extracted (column 2); the mean (as well as the standard deviation) of the number of features
selected on each training set by performing the feature selection with p = 0.001 (column 3)

Views Features Selected Features (mean±stdev)

Manifest views

Hardware feature 72 15.7±0.458


Activity 185729 894.9±17.073
Intent 6379 86.1±3.590
Permission 3812 138.0±2.569
Provider 4513 16.1±1.220
Service & receiver 33222 432.4±12.760

Dexcode views
Restricted api call 315 98.0±3.098
Used permission 70 36.3±0.781
Suspicious call 733 100.9±8.549
Network address 310447 2192.8±21.885

– Sensitivity, which measures the proportion of actual malware (positive samples) that is correctly predicted as malicious software. Formally,

$$sensitivity = \frac{TP}{TP + FN}.$$

The higher the sensitivity, the better the true positive rate of the malware detection pattern.
– Fall-out, which measures the proportion of actual genuine applications (negative samples) that are wrongly categorized as malicious (false alarms), with respect to the total number of actual genuine applications. Formally,

$$fall\text{-}out = \frac{FP}{FP + TN}.$$

The lower the fall-out, the lower the false alarm rate of the malware detection pattern.
– Area Under the ROC Curve (AUC), which measures the trade-off between sensitivity and specificity (with specificity = 1 − fall-out). Formally,

$$AUC = \frac{1}{2}\left(\frac{TP}{TP + FN} + \frac{TN}{TN + FP}\right).$$

The higher the AUC, the better the malware detection pattern is at classifying malicious samples as malicious and genuine samples as genuine.

In the above formulations, TP is the number of correctly classified samples that belong to the class malicious, TN is the number of correctly classified samples that belong to the class genuine, FP is the number of samples that are incorrectly classified as belonging to the class malicious, and FN is the number of actual malicious samples that are incorrectly classified as genuine.
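For reference, the three metrics can be computed from the confusion matrix as in the following sketch (scikit-learn is used here only for convenience; note that the AUC formulation above is the label-based, balanced form rather than a threshold-swept ROC area):

```python
# Minimal sketch of the three evaluation metrics, assuming labels are
# encoded as 1 = malicious (positive) and 0 = genuine (negative).
import numpy as np
from sklearn.metrics import confusion_matrix

def detection_metrics(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    sensitivity = tp / (tp + fn)                    # true positive rate
    fall_out = fp / (fp + tn)                       # false positive rate
    auc = 0.5 * (tp / (tp + fn) + tn / (tn + fp))   # label-based AUC, as above
    return sensitivity, fall_out, auc
```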

4.3 Implementation details

MuViDA is written in Java using the implementation of the multi-target Random Forest
algorithm included in CLUS toolkit (https://dtai.cs.kuleuven.be/clus/) and the implemen-
tation of the single Random Forest algorithm included in the Weka toolkit (https://www.
cs.waikato.ac.nz/ml/weka/index.html). The Random Forests are learned with 100 trees and with the number of features to randomly investigate at each split set equal to log₂(#independent features) + 1 (i.e. the default choice reported in (Breiman 2001)). In addition, in the multi-target Random Forest algorithm, we set GainRatio as the heuristic and ProbabilityDistribution as the voting type.
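For readers working in Python, a hedged scikit-learn analogue of this configuration can be sketched as follows (the study itself uses the CLUS and Weka implementations, which differ in details such as the multi-target heuristic and the voting type):

```python
# Hedged scikit-learn analogue of the Random Forest configuration above:
# 100 trees and log2(#independent features) + 1 candidate features per split.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def make_random_forest(n_features, seed=0):
    max_features = int(np.log2(n_features)) + 1  # default choice of (Breiman 2001)
    return RandomForestClassifier(n_estimators=100,
                                  max_features=max_features,
                                  random_state=seed)
```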

4.4 Results and discussion

The presentation of the results is organized as follows. We start investigating how the
detection performance can be influenced by the parameter configuration of the learn-
ing components (clustering and stacking component) that contribute to the definition
of MuViDA (see Section 4.4.1). We proceed by comparing the detection performance
of MuViDA to that of single-view and multi-view competitors described in both the
machine learning and android security literature (Section 4.4.2). We complete the study
(Section 4.4.3) by analyzing the time spent in learning the multi-view malware detection pattern, as well as in classifying an android application. All the experiments are run on the ReCaS cloud (CPU 1:8 @ 2 GHz, 16.0 GB RAM, running Ubuntu 14.04.4, GNU/Linux 3.13.0-39-generic x86_64).

4.4.1 Learning component study

The accuracy performance of MuViDA is here investigated along the parameter configura-
tion of the clustering and stacking stages.

Clustering The structure of the clustering patterns is influenced by the user-defined thresh-
old θ , that controls the number of clusters discovered per view and, consequently, the
granularity of the clustering pattern that is exchanged across the multiple views. We repeat
the experiment by varying θ between 0.001, 0.005 (default) and 0.01. For each view, the
number of clusters discovered is plotted in Fig. 4a, while the time (in seconds) spent discov-
ering the clustering patterns is plotted in Fig. 4b. Both the number of clusters and the clustering
time are averaged on the ten independent trials of this study and computed by varying θ
with ρ = 1 (default configuration of ρ). As expected, the number of discovered clusters,
as well as the clustering time, monotonically decrease with respect to the clustering threshold θ: the lower the value of θ, the higher the number of clusters (i.e. the finer the clustering
granularity) and the more the time spent deriving this clustering information. This observed
change in the granularity of the clustering pattern may slightly condition the accuracy of the
malware detection pattern. In particular, the accuracy results reported in Fig. 5a-c show that
the change in the clustering granularity may cause only small fluctuations in the accuracy
as measured along sensitivity, fall-out and AUC. In any case, the highest AUC can be observed with θ = 0.005.

Stacking The performance of the stacking stage depends on the independent feature space
that is considered to learn the combiner, as well as the size of the sampling strategy that is
applied to deal with the imbalanced class problem.
The independent space of the stacking combiner (see Section 3.3) is composed of the
cluster features computed via clustering (see Section 3.1) and exchanged via multi-target
classification (see Section 3.2). This space is enriched with the aggregate features, which
are synthesized as an overall summary of the android applications in each considered view
(see Section 3.3). For each view considered, the mean values of these aggregate features
are reported in Table 3. The box plot distribution of the aggregate feature over the mal-
ware and genuine applications is reported in Fig. 6a for the view of used permissions and

Fig. 4 Number of clusters discovered per view (Fig. 4a); time (in seconds) spent performing the clustering stage along each view (Fig. 4b). Results are averaged on 10 trials and plotted by varying θ between 0.001, 0.005 (default) and 0.01

Fig. 5 Average sensitivity (Fig. 5a), average fall-out (Fig. 5b) and average AUC (Fig. 5c) of MuViDA. Results are averaged on 10 trials and plotted by varying θ between 0.001, 0.005 (default) and 0.01, while setting ρ = 1 (default)

Fig. 6b for the view of api calls. This preliminary analysis of the aggregate feature distribu-
tion highlights that aggregate features are distributed differently over malware and genuine
applications (the only exceptions are Provider and Network). This observation supports the
idea of exploiting these features in learning the stacking combiner, in order to improve its
malware discovery ability.
To further evaluate how the aggregate features can actually contribute to the accuracy of
the stacking combiner, we also compare the accuracy of stacking over an independent space
populated with: (1) cluster features, (2) aggregate features, (3) cluster features and aggre-
gate features (default configuration). The accuracy results are reported in Table 4. These
results are collected by disabling the clustering-based sampling strategy, i.e. considering all
the genuine training samples to learn the stacking combiner. This analysis confirms that
learning the stacking combiner by accounting for both the cluster features and the aggregate features gains sensitivity (as well as improving AUC) without augmenting the fall-out. This
supports our decision to adopt this enriched independent feature schema for the stacking
stage.
Final considerations arise from the analysis of the performance of the classifications along the size of the clustering-based sampling strategy. This size is controlled by the parameter ρ. As the sampling strategy is performed to gain sensitivity in the imbalanced class setting, ρ allows us to control the amount of genuine information sampled and, consequently, to keep the fall-out under control. We evaluate how the final classifications depend on ρ by performing
a set of experiments with θ = 0.005 (default configuration of θ ) and ρ varying between 1
(default value), 1.25, 1.5, 1.75 and 2. The results are reported in Fig. 7a-c. They confirm

Table 3 The overall mean of the aggregate feature values, which are computed by counting the number of
features verified per each view on both the collection of malicious samples (M) and the collection of genuine
samples (G), respectively

view M G view M G

Suspicious call 6.2 4.1 Used permission 6.0 3.9


Restricted api call 6.8 4.7 Hardware feature 3.7 2.7
Activity 3.3 5.1 Provider 0.2 0.1
Service & Receiver 3.3 0.8 Permission 11.7 4.3
Intent 4.4 2.9 Network 15.6 17.7

Fig. 6 Box plot distributions of aggregate feature values computed over malware and genuine applications for the views of used permissions (Fig. 6a) and restricted api calls (Fig. 6b). The line that divides the box into two parts indicates the 2nd quartile (median value)

that we are actually able to diminish the fall-out by increasing ρ. As expected, this happens
at the cost of the sensitivity, while not necessarily at the cost of the AUC.

4.4.2 Comparative study

The methods used for the comparison are listed here.


– MuViDA, which leverages the power of both clustering and classification, as it imple-
ments the multi-view clustering-aided classification approach illustrated in Section 3.
For this comparative experiment, we consider MuViDA's performance by using the default configuration of both θ = 0.005 (see Section 3.1) and ρ = 1 (see Section 3.3).
– Single, which learns a classification function from the data spanned over the feature
vector model of a single data view. Ten distinct configurations of this algorithm can be
considered, that is, one configuration for each data view selected. Based on the analysis
of the performance of the ten configurations (see results reported in Fig. 8a-c), we select
the configuration Single (Permission) for the subsequent comparative analysis. This
configuration achieves the highest sensitivity and the highest AUC with fall-out equal
to 0.0027. The only configurations with a fall-out lower than 0.0027 (Activity, Provider,
Service & Receiver and Network) have a sensitivity lower than 0.650. We note that this
result, achieved with permission information, provides further confirmation of the
common opinion that permissions play a crucial role in the android security
system.

Table 4 Stacking detection performance: average sensitivity (avgSensitivity), average fall-out (avgFallOut)
and average AUC (avgAUC) of the stacking combiner learned with cluster features (column 2), aggregate
features (column 3), and cluster features plus aggregate features (column 4). Results are computed without
applying the clustering-based sampling strategy. For each compared configuration, metrics are averaged on 10 trials

metric           Cluster features   Aggregate features   Cluster + Aggregate features
avgSensitivity   0.913              0.808                0.916
avgFallOut       0.002              0.002                0.002
avgAUC           0.955              0.902                0.957

Fig. 7 Average sensitivity (Fig. 7a), average fall-out (Fig. 7b) and average AUC (Fig. 7c) of MuViDA.
Results are averaged on 10 trials and plotted by varying ρ between 1 (default), 1.25, 1.5, 1.75 and 2, while
setting θ = 0.005

– All, which learns a classification function from the data spanned over the feature vector
model, that is derived by concatenating the features associated with all data views into
a single vector model.
– Stacking, which learns one classification function from the data spanned over the
feature vector model of each data view and combines these multiple classification
functions by stacking. This method implements the multi-view malware detection com-
petitor based on the stacking multi-view strategy and described in (Bai and Wang 2016;
Zhang et al. 2016).
– Ensemble, which learns one classification function from the data spanned over the
feature vector model of each data view and combines these multiple classification func-
tions by the majority voting rule. This method implements the multi-view malware
detection competitor based on the ensemble multi-view strategy and described in (Bai
and Wang 2016; Zhang et al. 2016).
– Aggregate+Clustering-based Sampling, which uses the clustering-based sampling
strategy to handle the imbalanced class problem when learning a classification function
from the data spanned over the vector of aggregate features—with an aggregate feature
associated with every view to count how many features of the view are verified (i.e. set
equal to 1) in a sample. This method is introduced in (Andresini et al. 2020).
– Deep-Net, which implements the deep neural network described in (Vinayakumar et al.
2018) for Android malware detection. The architecture processes the data spanned over
the feature vector model, that is derived by concatenating the features associated with
all data views into a single vector model. It is an MLP with 4 fully-connected layers
and a final softmax layer. As described in (Vinayakumar et al. 2018), Dropout layers
(rate 0.01), to prevent over-fitting, and Batch Normalization layers, to speed up model
training, have been used between the fully-connected layers. The rectified linear unit
(ReLU) activation function has been used for each hidden layer.

Fig. 8 Average sensitivity (Fig. 8a), fall-out (Fig. 8b) and AUC (Fig. 8c) of Single run with the configurations
Suspicious call, Used permission, Restricted api call, Hardware feature, Activity, Provider, Service &
Receiver, Permission, Intent and Network. Results are averaged on 10 trials
All competitors listed above, except for Deep-Net, use the Random Forest algorithm,
as implemented in the Weka toolkit (https://www.cs.waikato.ac.nz/ml/weka/index.html), as
the base classification algorithm. In particular, All uses a Random Forest to learn
a classification function from the concatenated dataset. Single, Stacking and Ensemble
learn a Random Forest from each single-view dataset. In addition, Stacking also learns a
Random Forest as the stacking combiner. We note that our implementation of Ensemble and
Stacking, using the Random Forest algorithm for learning both the intra-view classification
functions and the stacking combiner (in the case of Stacking), is consistent with the work
(Bai and Wang 2016), which introduced these competitors and showed Random Forests
to be superior to other baseline classification algorithms. Similarly, the authors of (Andresini
et al. 2020) use the Random Forest algorithm for the classification in their study. Finally, as
in MuViDA, the Random Forests of the competitors are learned with 100 trees and with the
number of features randomly inspected at each split set equal to log2(#independent features) + 1.
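The sketch below, written with scikit-learn in place of the Weka implementation actually used, shows the shape of this configuration for the Stacking competitor: one 100-tree Random Forest per view, whose malware probabilities feed a Random Forest combiner. For brevity it reuses training-set predictions at the meta level; out-of-fold predictions would be preferable to avoid leakage.

```python
# Sketch of the Stacking competitor with scikit-learn (the paper uses Weka):
# one Random Forest per view, combined by a Random Forest stacking combiner.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def n_split_features(n_features):
    # number of features randomly inspected at each split, as in the paper
    return int(np.log2(n_features)) + 1

def fit_stacking(views, y):
    """views: list of (n_samples, n_features_v) arrays; y: 0/1 class labels."""
    base_models, meta_columns = [], []
    for X in views:
        rf = RandomForestClassifier(n_estimators=100,
                                    max_features=n_split_features(X.shape[1]))
        rf.fit(X, y)
        base_models.append(rf)
        # training-set probabilities; out-of-fold ones would avoid leakage
        meta_columns.append(rf.predict_proba(X)[:, 1])
    combiner = RandomForestClassifier(n_estimators=100)
    combiner.fit(np.column_stack(meta_columns), y)
    return base_models, combiner

def predict_stacking(base_models, combiner, views):
    meta = np.column_stack([rf.predict_proba(X)[:, 1]
                            for rf, X in zip(base_models, views)])
    return combiner.predict(meta)
```

Replacing the learned combiner with a majority vote over the per-view predictions yields the Ensemble competitor.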
For the sake of comparison, we also consider the performance of the malware detection
method Drebin (Arp et al. 2014). This performs machine learning via a Support Vector
Machine on the entire feature set (545292 string-defined features), derived by concatenating
manifest and dexcode features. The accuracy performance of Drebin is reported in (Arp
et al. 2014) and measured on the same dataset splits considered in this study, so that the
result in (Arp et al. 2014) can be safely considered here as a baseline.
The accuracy results (sensitivity, fall-out and AUC) of the methods compared in this
study are reported in Table 5. The analysis of AUC shows that MuViDA outperforms its
competitors in terms of overall capability of distinguishing between malicious and genuine
classes. Interestingly, this performance is confirmed by the analysis of sensitivity, except
for the competitor Aggregate + Clustering-based Sampling. This is the competitor that,
similarly to MuViDA, introduces the clustering-based sampling strategy in order to deal
with the imbalanced class data. However, the higher sensitivity of Aggregate + Clustering-based
Sampling comes at the cost of the higher fall-out reported in Table 5. This result shows
that the clustering-based sampling strategy suffers from a high number of false alarms (with
a fall-out of 0.309, roughly 31 false alarms per 100 installed applications) if it is decoupled
from the multi-view cluster features, which represent the actual novelty of MuViDA.
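For reference, the metrics in this discussion follow directly from the confusion matrix; the short snippet below (our notation) also shows how a fall-out value translates into false alarms per 100 installed genuine applications.

```python
# Metrics used throughout this comparison, from confusion-matrix counts.
def sensitivity(tp, fn):
    return tp / (tp + fn)  # true positive rate: fraction of malware detected

def fall_out(fp, tn):
    return fp / (fp + tn)  # false positive rate: false alarms on genuine apps

# e.g. the fall-out of 0.309 amounts to about 31 false alarms per 100 genuine apps
print(round(0.309 * 100))  # -> 31
```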

Table 5 Comparative detection performance: average sensitivity (avgSensitivity), average fall-out (avgFallOut)
and average AUC (avgAUC). Metrics are averaged on 10 trials. The average detection rates of Drebin
are reported in (Arp et al. 2014). “-” denotes that no value is reported in (Arp et al. 2014) for the AUC

Algorithm                             avgSensitivity   avgFallOut   avgAUC
MuViDA                                0.960            0.028        0.966
Single (Permission)                   0.822            0.002        0.909
All                                   0.874            0.001        0.936
Stacking                              0.878            0.001        0.938
Ensemble                              0.649            0.0005       0.824
Aggregate+Clustering-based Sampling   0.976            0.309        0.833
Deep-Net                              0.891            0.001        0.945
Drebin                                0.939            0.010        -

Further considerations can be made by broadening this discussion to the analysis of the
multi-view competitors All, Stacking, Ensemble, Deep-Net and Drebin. We note that
sensitivity (and, in general, AUC) is commonly gained when machine learning accounts for
a broad spectrum of multiple feature views. The exception is Ensemble, which is outperformed
by Single (Permission). This mainly depends on a specific weakness of the ensemble as a
multi-view learner. In fact, the ensemble detection pattern applies a multi-view classification
function combiner that is lazily based on the majority voting rule. This means that there is
no sophisticated learning to derive a classification function combiner. The consequence is
that the ensemble can be skewed towards the weaker single-view classification functions,
which are included in the multi-view combiner. Both Stacking and MuViDA are more
robust to this weakness as they apply a multi-view combiner that is eagerly learned across
the various views. This interpretation of the performance of the ensemble approach is also
empirically supported by studies described in both (Bai and Wang 2016) and (Zhang et al.
2016), which achieve a similar result, showing that stacking can outperform ensemble as a
multi-view learner for malware detection. We also note that All and Drebin do not suffer
from the problem of learning a classification function combiner, as they directly learn one
classification function from the joined dataset derived by concatenating all features in a
single feature vector model. However, as pointed out in (Yu et al. 2012), this kind of data
fusion approach may suffer from the curse of dimensionality.
It is also significant that MuViDA surpasses the sensitivity of Drebin—the baseline
competitor defined in (Arp et al. 2014) for this data scenario—by 2.1 percentage points.
This improvement is not negligible, considering that it is achieved in a challenging
imbalanced setting and with a small increase of the fall-out (the fall-out passes from 1 false
alarm per 100 installed applications in Drebin to 2.8 false alarms per 100 installed applications
in MuViDA). In any case, we recall that, if we run MuViDA with ρ = 1.75, we are able
to learn a malware detection pattern achieving a sensitivity of 0.956 at a fall-out of
0.01 (see the results reported in Fig. 7a and b). Interestingly, this sensitivity performance
still outperforms that of Drebin (by 1.7 percentage points) without augmenting the fall-out
risk.
Final conclusions can be drawn by focusing the analysis on the performance of the
deep neural network. The accuracy performance of Deep-Net is close to that of Stacking
and All in terms of fall-out, although Deep-Net outperforms both methods in terms of
sensitivity and AUC. On the other hand, MuViDA significantly outperforms Deep-Net in terms of
sensitivity. Although the higher sensitivity of MuViDA comes at the cost of a higher fall-out,
the AUC of MuViDA remains greater than the AUC of Deep-Net.
In short, our analysis so far confirms that multi-view learning can commonly out-
perform single-view learning, although the multi-view classification function combination
requires sophisticated machine learning. The clustering-based sampling strategy allows us
to improve the ability of detecting a malware, but it produces a growth in the number of
false alarms. However, coupling the clustering-based sampling with the specific multi-view
learning approach proposed here is a good means to realize the desirable trade-off between
high true positive rate and low false alarm rate. These conclusions are statistically supported
by the one-way analysis of variance (ANOVA).6 This is performed, in order to determine
whether there are any significant differences between the averages of the AUC rates across

6 The ANOVA analysis leaves out the competitor Drebin, as (Arp et al. 2014) reports only the sensitivity and fall-out
of Drebin averaged on the ten trials of the experiment (and no result on AUC), while the ANOVA analysis
is done on the series of results collected on the ten splits.

Fig. 9 The interactive graphs of the ANOVA analysis performed for the AUC of the methods compared in the
analysis. The analysis compares the AUC rate computed for each method (MuViDA, Single (Permission), All,
Stacking, Ensemble, Aggregate+Clustering-based Sampling and Deep-Net) on the ten independent splitting
trials (training set - testing set) considered in this study. The average accuracy rates are shown in Table 5

the compared methods on the tested trials. The significance level of this statistical analysis
is 0.01. The ANOVA analysis reveals that the null hypothesis is rejected, as the detection
rates of the compared methods are statistically different. A multiple comparison test is then performed,
in order to determine whether any of these averages are significantly different from
each other. The results plotted in Fig. 9 highlight that the AUC of MuViDA is statistically
better than the AUC of its competitors, at the p-value of 8.4951e-70 in the ANOVA analysis.
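A hedged sketch of this statistical workflow is given below. The per-trial AUC series are synthetic placeholders, and Tukey's HSD stands in for the multiple comparison procedure, which the text does not name explicitly.

```python
# Sketch of the one-way ANOVA and multiple comparison over per-trial AUCs.
# The AUC series are synthetic stand-ins for the values collected on the
# ten train/test splits of the experiment.
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(0)
auc = {
    "MuViDA":   rng.normal(0.966, 0.002, 10),
    "Stacking": rng.normal(0.938, 0.002, 10),
    "Deep-Net": rng.normal(0.945, 0.002, 10),
}

# one-way ANOVA: are the mean AUCs significantly different across methods?
_, p_value = f_oneway(*auc.values())
print(f"ANOVA p-value: {p_value:.3e}")

# post-hoc multiple comparison (Tukey HSD, alpha = 0.01) locating the pairs
# of methods whose mean AUCs actually differ
scores = np.concatenate(list(auc.values()))
methods = np.repeat(list(auc.keys()), 10)
print(pairwise_tukeyhsd(scores, methods, alpha=0.01))
```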

4.4.3 Computation time analysis

We complete this empirical investigation by analyzing the computation time spent, on aver-
age, performing both the training phase and the testing phase of MuViDA. In particular,
the training phase takes 8809.908 seconds on average, while each testing android
application is classified in 0.21 seconds on average. We consider that these
computation times may satisfy the requirements of an android security service that enables
users to build the machine learning pattern offline on a remote cloud, while using the
discovered detection pattern to analyze online a request for application installation. We also
regard parallel machine learning as an effective direction to speed up the machine learning
service, considering that several steps of the presented multi-view learning methodology
can be run in parallel (e.g. clustering and multi-target clustering-aided classification). In any
case, this aspect requires further investigation that is out of the scope of this paper.

5 Conclusion

A novel multi-view machine learning method for android malware detection is described.
This method accounts for information that can be extracted via reverse engineering through
a broad static analysis of the manifest and dexcode of android applications. Static infor-
mation is extracted by accounting for the multiple data views introduced in (Arp et al.
2014). The presented method computes the clustering structure of each data view and

uses the multi-target classification strategy in order to exchange this clustering knowledge
across the multiple views. Clustering is performed in order to distinguish clusters of
(malicious/genuine) android applications that presumably fit a common signature (the cluster centre)
in the considered view. Multi-target classification is performed to constrain the search for
patterns useful for malware detection, which agree across the multiple views. Stacking is
finally performed to achieve a consensus malware detection pattern, by combining the
various patterns learned by joining clustering and multi-target classification through multi-view
learning, while resorting to a sampling strategy to deal with the imbalanced class problem.
The effectiveness of the proposed method is assessed via an empirical study on a bench-
mark collection of android applications. This benchmark is made complex by the presence
of an imbalanced class problem (the malware collection is less than five percent of the entire
benchmark). This study confirms that the proposed combination of clustering and classi-
fication via multi-view learning can effectively deal with data, spanned over a variety of
data views. It gains accuracy by detecting malware applications and limiting the number of
false alarms. The empirical study contributes to proving that a multi-view learning method
is more accurate than traditional single-view methods, which focus on a single informa-
tion source. The proposed method is also more accurate than the state-of-the-art multi-view
learning methods, which learn the classification patterns directly from the multiple data
views, instead of leveraging the knowledge enclosed in the hidden cluster structures of
these views. The presented method also outperforms the ad-hoc baseline solution proposed
in the literature (Arp et al. 2014) for this benchmark, as well as a deep neural network
formulated for malware detection tasks (Vinayakumar et al. 2018).
Various directions for further work are still to be explored. The proposed method has
been implemented tightly connected to the specific cybersecurity task at hand (i.e. mal-
ware detection on the static analysis of android applications). However, multi-view learning
strategies have been successfully put forward in the recent literature for various domains,
ranging from process modelling (Appice and Malerba 2016) to disease analysis (Valmarska
and Miljkovic 2017), biological and census domains (Ceci et al. 2012), as well as remote
sensing (Appice et al. 2017; Kumar 2015). Motivated by the increasing interest in applica-
tions of multi-view learning in various domains, we plan to perform further investigations
on the application potential of the proposed method by evaluating its effectiveness in new
classification tasks outside the cybersecurity field. A further research direction concerns the
imbalanced aspect of the considered scenario. The effectiveness of the proposed method
has been evaluated in a realistic imbalanced scenario of cybersecurity. The imbalanced class
problem has been explicitly handled by resorting to a resampling strategy. In any case, we
plan to explore new strategies for imbalanced class data (see (Fernández et al. 2018) for
a recent survey) to investigate whether their introduction can contribute to gaining further
sensitivity while lowering the fall-out of the android malware detection ability. In addition, following an
emerging research direction (Sun et al. 2019) that has started the exploration of deep learn-
ing in combination with multi-view learning, we may investigate how deep learning can be
used, in order to pre-train multiple views for the feature learning of the classification pat-
terns. In addition, parallel machine learning can be investigated, in order to perform the
parallel computation of both clustering and classification patterns over the separate data
views. It would also be interesting to study the performance of the proposed algorithm when
the data views are populated with hybrid features arising from both static and dynamic anal-
ysis of android applications. Finally, one limitation of the proposed method is that it is not
designed to account for drift in malicious characteristics induced by malware evolution over
time. This could be addressed by using online approaches in the various phases of the proposed

algorithm. Another limitation that follows from the use of machine learning is the possibility
of adversarial attacks. Adversaries may succeed in reducing the detector's accuracy by incorporating
genuine features or fake invariants into malicious applications. Even though such
adversarial attacks against machine learning-based detectors cannot be ruled out in general, meticulous
sanitization of training data can limit their impact.

Acknowledgements The authors wish to thank Lynn Rudd for her help in reading the manuscript, Dragi
Kocev for his help in configuring the parameters, in order to learn the multi-target Random Forests, Daniel
Arp for providing the benchmark data used in the empirical study, and the ReCaS-Bari resource team for
providing the infrastructure to run the experimental study.

References

Alam, M.S., & Vuong, S.T. (2013). Random forest classification for detecting android malware. In Proceed-
ings of the 2013 IEEE International Conference on Green Computing and Communications and IEEE
Internet of Things and IEEE Cyber, Physical and Social Computing, pp. 663–669.
Alzaylaee, M., Yerima, S., Sezer, S. (2017). Improving dynamic analysis of android apps using hybrid
test input generation. In International Conference on Cyber Security and Protection of Digital Ser-
vices (Cyber Security 2017): Proceedings, pp. 1–8. IEEE, https://doi.org/10.1109/CyberSecPODS.2017.
8074845.
Alzaylaee, M.K., Yerima, S.Y., Sezer, S. (2020). Dl-droid: Deep learning based android malware detection
using real devices. Computers & Security, 89, 101663. https://doi.org/10.1016/j.cose.2019.101663.
Andresini, G., Appice, A., Malerba, D. (2020). Dealing with Class Imbalance in Android Malware Detec-
tion by Cascading Clustering and Classification, pp. 173–187. Springer International Publishing: Cham,
Switzerland.
Appice, A., Guccione, P., Malerba, D. (2017). A novel spectral-spatial co-training algorithm for the
transductive classification of hyperspectral imagery data. Pattern Recognition, 63, 229–245.
Appice, A., & Malerba, D. (2016). A co-training strategy for multiple view clustering in process mining.
IEEE Trans. Services Computing, 9(6), 832–845.
Arp, D., Spreitzenbarth, M., Hubner, M., Gascon, H., Rieck, K. (2014). DREBIN : Effective and explainable
detection of android malware in your pocket. In Proceedings of the 21st Annual Network and Distributed
System Security Symposium. The Internet Society.
Arthur, D., & Vassilvitskii, S. (2007). K-means++: the advantages of careful seeding. In Proceedings of the
8th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027–1035. Society for Industrial and
Applied Mathematics.
Bai, J., & Wang, J. (2016). Improving malware detection using multi-view ensemble learning. Security and
Communication Networks, 9(17), 4227–4241.
Bhatia, T., & Kaushal, R. (2017). Malware detection in android based on dynamic analysis. In Proceedings
of the 2017 International Conference on Cyber Security And Protection Of Digital Services (Cyber
Security), pp. 1–6.
Bholowalia, P., & Kumar, A. (2014). Article: ebk-means: A clustering technique based on elbow method and
k-means in wsn. International Journal of Computer Applications, 105(9), 17–24.
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
Ceci, M., Appice, A., Viktor, H.L., Malerba, D., Paquet, E., Guo, H. (2012). Transductive relational classifi-
cation in the co-training paradigm. In Perner, P. (Ed.) Proceedings of the 8th International Conference
on Machine Learning and Data Mining in Pattern Recognition, LNCS, vol. 7376, pp. 11–25. Springer,
Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31537-4_2.
Demontis, A., Melis, M., Biggio, B., Maiorca, D., Arp, D., Rieck, K., Corona, I., Giacinto, G., Roli, F. (2017).
Yes, Machine Learning Can Be More Secure! A Case Study on Android Malware Detection. IEEE
Transactions on Dependable and Secure Computing. PP. https://doi.org/10.1109/TDSC.2017.2700270.
Fan, M., Liu, J., Wang, W., Li, H., Tian, Z., Liu, T. (2017). Dapasa: Detecting android piggybacked apps
through sensitive subgraph analysis. IEEE Transactions on Information Forensics and Security, 12(8),
1772–1785. https://doi.org/10.1109/TIFS.2017.2687880.
Fernández, A., García, S., Galar, M., Prati, R.C., Krawczyk, B., Herrera, F. (2018). Learning from Imbalanced
Data Sets. Springer.

Folino, G., & Pisani, F. (2016). Evolving meta-ensemble of classifiers for handling incomplete and
unbalanced datasets in the cyber security domain. Applied Soft Computing, 47, 179–190.
Garcia-Ceja, E., Galván-Tejada, C.E., Brena, R. (2018). Multi-view stacking for activity recognition with
sound and accelerometer data. Information Fusion, 40, 45–56.
Goyal, R., Spognardi, A., Dragoni, N., Argyriou, M. (2016). Safedroid: a distributed malware detection
service for android. In Proceedings of the 2016 IEEE 9th International Conference on Service-Oriented
Computing and Applications (SOCA), pp. 59–66.
Guo, S., Yuan, Q., Lin, F., Wang, F., Ban, T. (2010). A malware detection algorithm based on multi-view
fusion. In Wong, K.W., Mendis, B.S.U., Bouzerdoum, A. (Eds.) Neural Information Processing. Models
and Applications, pp. 259–266. Springer.
Idrees, F., & Rajarajan, M. (2014). Investigating the android intents and permissions for malware detection. In
Proceedings of the IEEE 10th International Conference on Wireless and Mobile Computing, Networking
and Communications, pp. 354–358.
Kang, B., Yerima, S.Y., Mclaughlin, K., Sezer, S. (2016). N-opcode analysis for android malware classifi-
cation and categorization. In 2016 International conference on cyber security and protection of digital
services (cyber security), pp. 1–7.
Kapratwar, A., Troia, F., Stamp, M. (2017). Static and dynamic analysis of android malware. In Proceed-
ings of the 3rd International Conference on Information Systems Security and Privacy, pp. 653–662.
SCITEPRESS.
Khorshidpour, Z., Hashemi, S., Hamzeh, A. (2017). Evaluation of random forest classifier in security domain.
Applied Intelligence, 47(2), 558–569. https://doi.org/10.1007/s10489-017-0907-2.
Kocev, D., Vens, C., Struyf, J., Džeroski, S. (2013). Tree ensembles for predicting structured outputs. Pattern
Recognition, 46(3), 817–833.
Kumar, V. (2015). Multi-view ensemble learning using optimal feature set partitioning: An extended exper-
iments and analysis in low dimensional scenario. Procedia Computer Science, 58, 499–506. Second
International Symposium on Computer Vision and the Internet.
Last, M. (2016). Multi-target classification: Methodology and practical case studies. In Berendt, B., Bring-
mann, B., Fromont, É., Garriga, G.C., Miettinen, P., Tatti, N., Tresp, V. (Eds.) Proceedings of the
Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2016,
Part III, LNCS, vol. 9853, pp. 280–283. Springer.
Li, Y., Shen, T., Sun, X., Pan, X., Mao, B. (2015). Detection, classification and characterization of
android malware using api data dependency. In Thuraisingham, B., Wang, X., Yegneswaran, V. (Eds.)
Proceedings of the Security and Privacy in Communication Networks, pp. 23–40. Springer.
Lin, W., Wu, Z., Lin, L., Wen, A., Li, J. (2017). An ensemble random forest algorithm for insurance big data
analysis. IEEE Access, 5, 16568–16575.
Madjarov, G., Kocev, D., Gjorgjevikj, D., Džeroski, S. (2012). An extensive experimental comparison of
methods for multi-label learning. Pattern Recognition, 45(9), 3084–3104.
Miller, S.T., & Busby-Earle, C. (2017). Multi-perspective machine learning a classifier ensemble method for
intrusion detection. In Proceedings of the 2017 International Conference on Machine Learning and Soft
Computing, ICMLSC ’17, pp. 7–12. ACM, https://doi.org/10.1145/3036290.3036303.
Milosevic, N., Dehghantanha, A., Choo, K.K.R. (2017). Machine learning aided android malware classifica-
tion. Computers and Electrical Engineering, 61, 266–274.
Narayanan, A., Chandramohan, M., Chen, L., Liu, Y. (2018). A multi-view context-aware approach to
android malware detection and malicious code localization. Empirical Software Engineering, 23(3),
1222–1274. https://doi.org/10.1007/s10664-017-9539-8.
Narayanan, A., Soh, C., Chen, L., Liu, Y., Wang, L. (2018). Apk2vec: Semi-supervised multi-view rep-
resentation learning for profiling android applications. In IEEE International Conference on Data
Mining, ICDM 2018, Singapore, November 17-20, 2018, pp. 357–366. IEEE Computer Society,
https://doi.org/10.1109/ICDM.2018.00051.
Nguyen-Vu, L., Ahn, J., Jung, S. (2019). Android fragmentation in malware detection. Computers & Security,
87, 101573. https://doi.org/10.1016/j.cose.2019.101573.
NOKIA (2019). Nokia threat intelligence report – 2019. White paper, online at
https://pages.nokia.com/T003B6-Threat-Intelligence-Report-2019.html.
Painter, N., & Kadhiwala, B. (2017). Comparative analysis of android malware detection techniques. In
Satapathy, S.C., Bhateja, V., Joshi, A. (Eds.) Proceedings of the International Conference on Data
Engineering and Communication Technology, pp. 131–139. Springer.
Papagiannopoulou, C., Tsoumakas, G., Tsamardinos, I. (2015). Discovering and exploiting determin-
istic label relationships in multi-label learning. In Cao, L., Zhang, C., Joachims, T., Webb, G.I.,
Margineantu, D.D., Williams, G. (Eds.) Proceedings of the 21th ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining, pp. 915–924. ACM.

Peiravian, N., & Zhu, X. (2013). Machine learning for android malware detection using permission and api
calls. In Proceedings of the IEEE 25th International Conference on Tools with Artificial Intelligence,
pp. 300–305.
Rovelli, P., & Vigfússon, Ý. (2014). Pmds: Permission-based malware detection system. In Prakash, A., &
Shyamasundar, R. (Eds.) Proceedings of the Information Systems Security, pp. 338–357. Springer.
Roy, S., DeLoach, J., Li, Y., Herndon, N., Caragea, D., Ou, X., Ranganath, V.P., Li, H., Guevara, N. (2015).
Experimental study with real-world data for android app security analysis using machine learning. In
Proceedings of the 31st Annual Computer Security Applications Conference, ACSAC 2015, pp. 81–90.
Sheen, S., Anitha, R., Natarajan, V. (2015). Android based malware detection using a multifeature
collaborative decision fusion approach. Neurocomputing, 151, 905–912.
Shiqi, L., Shengwei, T., Long, Y., Jiong, Y., Hua, S. (2018). Android malicious code classification
using deep belief network. KSII Transactions on Internet and Information Systems, 12, 454–475.
https://doi.org/10.3837/tiis.2018.01.022.
Suarez-Tangil, G., Dash, S.K., Ahmadi, M., Kinder, J., Giacinto, G., Cavallaro, L. (2017). Droidsieve: Fast
and accurate classification of obfuscated android malware. In Proceedings of the 7th ACM on Conference
on Data and Application Security and Privacy, CODASPY 2017, pp. 309–320.
Sun, S., Mao, L., Dong, Z., Wu, L. (2019). Multiview Deep Learning, (pp. 105–138). Singapore: Springer
Singapore.
Taheri, R., Javidan, R., Shojafar, M., Pooranian, Z., Miri, A., Conti, M. (2019). On defending against label
flipping attacks on malware detection systems. arXiv preprint arXiv:1908.04473.
Tajoddin, A., & Abadi, M. (2019). Ramd: registry-based anomaly malware detection using one-class
ensemble classifiers. Applied Intelligence.
Talha, K.A., Alper, D.I., Aydin, C. (2015). Apk auditor: Permission-based android malware detection system.
Digital Investigation, 13, 1–14.
Tiwari, P.K., & Singh, U. (2015). Android users security via permission based analysis. In Abawajy, J.H.,
Mukherjea, S., Thampi, S.M., Ruiz-Martı́nez, A. (Eds.) Proceedings of the Security in Computing and
Communications, pp. 496–505. Springer.
Ucci, D., Aniello, L., Baldoni, R. (2019). Survey of machine learning techniques for malware analysis.
Computers & Security, 81, 123–147. https://doi.org/10.1016/j.cose.2018.11.001.
Valmarska, A., Miljkovic, D., Robnik-Šikonja, M., & Lavrač, N. (2017). Multi-view approach to Parkinson's
disease quality of life data analysis. In Appice, A., Ceci, M., Loglisci, C., Masciari, E., Raś, Z.W. (Eds.)
Proceedings of the 2016 New Frontiers in Mining Complex Patterns, Selected papers, pp. 163–178.
Springer.
Vinayakumar, R., Barathi Ganesh, H.B., Poornachandran, P., Anand Kumar, M., Soman, K.P. (2018). Deep-net:
Deep neural network for cyber security use cases. arXiv preprint arXiv:1812.03519.
Wen, L., & Yu, H. (2017). An android malware detection system based on machine learning. In Proceedings
of the AIP Conference, vol. 1864. American Institute of Physics.
Wolpert, D.H. (1992). Stacked generalization. Neural Networks, 5(2), 241–259.
Yerima, S.Y., Sezer, S., Muttik, I. (2014). Android malware detection using parallel machine learning classi-
fiers. In Proceedings of the 8th International Conference on Next Generation Mobile Apps, Services and
Technologies, pp. 37–42.
Yu, J., Wang, M., Tao, D. (2012). Semisupervised multiview distance metric learning for cartoon synthesis.
IEEE Transactions on Image Processing, 21(11), 4636–4648.
Zhang, Y., Huang, Q., Ma, X., Yang, Z., Jiang, J. (2016). Using multi-features and ensemble learning method
for imbalanced malware classification. In Proceedings of the 2016 IEEE Trustcom/BigDataSE/ISPA,
pp. 965–973.
Zhao, J., Xie, X., Xu, X., Sun, S. (2017). Multi-view learning overview: Recent progress and new challenges.
Information Fusion, 38, 43–54.
Zhou, Y., & Jiang, X. (2012). Dissecting android malware: Characterization and evolution. In Proceedings
of the 2012 IEEE Symposium on Security and Privacy, pp. 95–109.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps
and institutional affiliations.
