
Performance Anomaly Detection Models of Virtual Machines for Network Function Virtualization Infrastructure with Machine Learning

Juan Qiu, Qingfeng Du, Yu He, YiQun Lin, Jiaye Zhu, and Kanglin Yin

Tongji University, Shanghai 201804, China
juan_qiu@tongji.edu.cn, du_cloud@tongji.edu.cn

Abstract. Network Function Virtualization (NFV) technology has become a new solution for running network applications. It proposes a new paradigm for network function management and has opened up much innovation space for network technology. However, the complexity of the NFV Infrastructure (NFVI) imposes a hard-to-predict relationship between Virtualized Network Function (VNF) performance metrics (e.g., latency, throughput), the underlying allocated resources (e.g., vCPU load), and the overall system workload. The evolving NFV scenario therefore calls for adequate performance analysis methodologies, and early detection of performance anomalies plays a significant role in providing high-quality network services. In this paper, we propose a novel method for detecting performance anomalies in the NFV infrastructure with machine learning. We present a case study on the open source NFV-oriented project Clearwater, an IP Multimedia Subsystem (IMS) NFV application. Several classical classifiers are applied and compared empirically on an anomaly dataset that we built ourselves. Taking the risk of over-fitting into account, the experimental results show that the neural network is the best anomaly detection model, with an accuracy of over 94%.

Keywords: NFV · Performance anomaly detection · Machine learning

1 Introduction

The paradigm of Network Function Virtualization (NFV) has quickly emerged as a new vision of the network that takes advantage of advances in dynamic cloud architecture, Software Defined Networking (SDN), and modern software provisioning techniques. NFV bottleneck analysis and the hardware and software features relevant to high and predictable performance have already been highlighted in the Group Specification (GS) published by the European Telecommunications Standards Institute (ETSI) Industry Specification Group (ISG) for Network Functions Virtualisation [1].
© Springer Nature Switzerland AG 2018
V. Kůrková et al. (Eds.): ICANN 2018, LNCS 11140, pp. 479–488, 2018.
https://doi.org/10.1007/978-3-030-01421-6_46

This paper aims to detect performance anomalies by modeling the various performance metrics collected from the virtual machines of the NFV platform. We conduct an experiment with an open source NFV-oriented project, Clearwater, which has been designed to support massive horizontal scalability and adopts popular cloud computing design patterns and technologies, to demonstrate how the proposed method can be applied to performance anomaly detection. The main contributions of this paper are as follows:
1. Present an approach on how to build the performance anomaly dataset for
NFVI.
2. Put forward an approach for detecting performance anomalies in NFVI with
machine learning models.
The paper is organized as follows: the next section discusses related work; the methodology and implementation are presented in Sect. 3; we then conduct a case study on Clearwater in Sect. 4; Sect. 5 concludes the paper.

2 Related Works
Reliability studies for NFV technology, including performance and security topics, are hot research areas for both academia and industry. In order to guarantee high and predictable performance of data-plane workloads, a list of minimal features which the Virtual Machine (VM) Descriptor and Compute Host Descriptor should contain for the appropriate deployment of VM images over an NFV Infrastructure (NFVI) is presented in [1]. NFV-Bench [2] is proposed by Cotroneo et al. to analyze faulty scenarios and to provide joint dependability and performance evaluations for NFV systems. Bonafiglia et al. [3] provide a (preliminary) benchmark of widespread virtualization technologies when used for NFV, i.e., when they are exploited to run the so-called virtual network functions and to chain them in order to create complex services. Naik et al. present the design and implementation of a tool, NFVPerf [4], to monitor performance and identify performance bottlenecks in an NFV system. NFVPerf runs as part of a cloud management system such as OpenStack and sniffs traffic between NFV components in a manner that is transparent to the VNFs.
Anomaly detection is an important data analysis task that identifies abnormal data in a given dataset. It is an important data mining research problem, has been widely studied in many fields, and can usually be addressed with statistical and machine learning methods [5–8]. In recent years, anomaly detection literature in NFV has also begun to emerge. Kourtis et al. [9] presented the use of an open-source monitoring system especially tailored for NFV, in conjunction with statistical approaches commonly used for anomaly detection, towards the timely detection of anomalies in deployed NFV services. Cotroneo et al. [10] proposed an approach on an NFV-oriented Interactive Multimedia System to detect problems affecting the quality of service, such as overload, component crashes, avalanche restarts and physical resource contention. EbAT [11] is an automated online detection framework for anomaly identification and tracking in data center systems. Fu [12] proposed a framework for autonomic anomaly detection on cloud systems which can select the most relevant among a large number of performance metrics.
We have been actively participating in the OPNFV1 Yardstick project2. In particular, we have been continuously and deeply involved in the evolution of the Yardstick HA framework architecture, and the fault injection techniques used in this paper are based on our previous research work [13,14]. Recently we have been participating in the OPNFV Bottlenecks project3, a testing project that aims to find system bottlenecks by testing and verifying the OPNFV infrastructure in a staging environment before committing it to a production environment. Most cloud operators identify performance bottlenecks by monitoring hardware resource utilization or other application-specific metrics obtained by instrumenting the application itself. In this paper, we detect performance anomalies by modeling the various performance metrics collected from the virtual machines of the NFV platform.

3 Methodology and Implementation

3.1 Classification Problem

The performance anomaly detection method studied in this paper is based on classification methods. The essence of the anomaly detection problem is to train a detection model using the performance metrics collected from the virtual machines in the NFV infrastructure layer. The virtual machine state, characterized by the performance metrics collected in real time, is divided into multiple classes based on the anomaly detection model.

Fig. 1. The training and testing processes of anomaly detection model

1 https://www.opnfv.org/.
2 https://wiki.opnfv.org/display/yardstick/Yardstick/.
3 https://wiki.opnfv.org/display/bottlenecks/Bottlenecks/.

As shown in Fig. 1, we are given the training performance-metric samples T = {(x_1, y_1), (x_2, y_2), ..., (x_k, y_k)} ∈ (R^n × Y)^k, where x_i = [x_i^1, x_i^2, ..., x_i^n]^T ∈ R^n is the input vector whose components are the performance metrics, y_i is the corresponding anomaly label of x_i, (x_i, y_i) is one sample of the training set, and k is the size of the training set. For the multi-class classification problem we need to determine not only whether there is an anomaly, but also which kind of anomaly it is: y_i ∈ Y = {1, 2, ..., c}, i = 1, 2, ..., k, where c is the number of anomaly classes. We agree that y_i = 1 means normal status, while the other values of y_i represent abnormal statuses. The solution is therefore to learn a decision function y = f(x): R^n → Y, which can be used to infer the corresponding label y_new of any new instance x_new. It performs detection with localization of an anomalous behaviour by assigning one class label to each anomalous behaviour depending on its localization.
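To make the decision function y = f(x) concrete, the following minimal Python sketch (using scikit-learn) fits a classifier on a handful of synthetic metric vectors and infers the class of a new observation. The feature names, metric values and label assignment are purely illustrative and are not taken from the paper's dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical metric vectors x_i (columns: CPU load, memory usage, I/O wait)
# with labels y_i, where 1 = normal and 2, 3, 4 = CPU / memory / I/O anomalies.
X_train = np.array([
    [0.20, 0.35, 0.01],   # normal
    [0.95, 0.40, 0.02],   # CPU bottleneck
    [0.25, 0.92, 0.03],   # memory bottleneck
    [0.30, 0.38, 0.85],   # I/O bottleneck
    [0.22, 0.33, 0.02],   # normal
])
y_train = np.array([1, 2, 3, 4, 1])

# Fit the decision function f: R^n -> Y on the labeled samples.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Infer the label y_new of a previously unseen metric vector x_new.
x_new = np.array([[0.93, 0.41, 0.02]])
print(model.predict(x_new))   # likely class 2 (CPU bottleneck)
```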
Machine learning is well known to be highly relevant for solving classification problems. For the machine learning models that we aim to build as detection classifiers, samples of labeled monitoring data are needed to train them to discern different system behaviours. There is a large pool of classification-based techniques available; in this paper we consider some of the well-known classifiers, such as Support Vector Machines (SVM), K-Nearest Neighbors (KNN), Random Forests, Decision Trees and Neural Networks (NN).
The measures of classification efficiency can be built from a confusion matrix, which counts correctly and incorrectly detected instances for each class of events. The confusion matrix, also known as an error matrix, is a specific table layout that allows visualization of the performance of a classifier. In a binary classification task, the terms 'positive' and 'negative' refer to the classifier's prediction, and the terms 'true' and 'false' refer to whether that prediction corresponds to the external judgment (sometimes known as the 'observation'). Given these definitions, the confusion matrix can be formulated as in Table 1.

Table 1. Confusion matrix

                                Actual class (observation)
Predicted class (expectation)   TP (True Positive): correct result      FP (False Positive): unexpected result
                                FN (False Negative): missing result     TN (True Negative): correct absence of result

Accuracy, precision and the F-measure are well-known performance measures for machine learning models. Intuitively, accuracy = (TP + TN) / (TP + FP + TN + FN) is easy to understand: it is the proportion of correctly categorized samples among all samples. Generally speaking, the higher the accuracy, the better the classifier. Precision = TP / (TP + FP) is the ability of the classifier not to label as positive a sample that is negative, and recall = TP / (TP + FN) is the ability of the classifier to find all the positive samples. The F_β measure, F_β = (1 + β²) · (precision × recall) / (β² · precision + recall), can be interpreted as a weighted harmonic mean of precision and recall. An F_β measure reaches its best value at 1 and its worst score at 0. With β = 1, F_β reduces to F_1, and recall and precision are equally important.
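These measures can be computed directly with scikit-learn, as in the following sketch; the observed and predicted labels are illustrative only and follow the convention above (1 = normal, 2..c = anomaly classes).

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support, fbeta_score)

# Illustrative observed labels and classifier predictions.
y_true = [1, 1, 2, 3, 4, 1, 2, 4]
y_pred = [1, 1, 2, 3, 3, 1, 2, 4]

print(confusion_matrix(y_true, y_pred))   # per-class error matrix
print(accuracy_score(y_true, y_pred))     # overall accuracy

# Macro-averaged precision, recall and F1 over all classes.
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
print(prec, rec, f1)

# With beta = 1 the F-beta measure reduces to F1.
print(fbeta_score(y_true, y_pred, beta=1.0, average="macro"))
```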

3.2 Implementation

Performance Anomaly Detection Framework. We implement an anomaly detection framework which includes a system perturbation module, a cloud platform monitoring module, and a data processing and analysis module. The perturbation module generates workload and faultload to simulate performance issues or bottlenecks. At the same time, the monitoring module collects the relevant performance data; it performs the monitoring process according to Key Performance Indicators (KPIs). The goal of monitoring is to gather data samples from the target system via performance counters, the so-called monitoring metrics, from which the anomaly dataset can be built. As shown in Table 2, the anomaly dataset is composed of three parts: the performance metrics, the anomalous behavior labels and the miscellaneous features, i.e., Schema = {Metrics ∪ AnomalyLabels ∪ MiscFeatures}, where Metrics comprises the specific performance metrics such as CPU usage and memory usage. The AnomalyLabels indicate the type of a performance anomaly: the value '1' means the underlying anomaly happens, and '0' means no such anomaly happens. The dataset also contains some miscellaneous features such as the location where the VNF is deployed and the timestamp of the record. Finally, the data processing and analysis module is responsible for creating models that are trained offline for performance anomaly detection based on the anomaly dataset.

Table 2. The schema of the anomaly dataset
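The full field list of Table 2 is not reproduced here; the following sketch, with hypothetical column names, shows how a record conforming to Schema = {Metrics ∪ AnomalyLabels ∪ MiscFeatures} could be represented with pandas. The host name and metric names are illustrative, not taken from the actual feature description.

```python
import pandas as pd

# One illustrative record of the anomaly dataset. Column names are
# hypothetical; the real feature set follows the Zabbix OS template.
record = {
    # Metrics: raw performance counters sampled from the VM.
    "cpu_usage": 0.91, "memory_usage": 0.42, "disk_io_wait": 0.03,
    # AnomalyLabels: one indicator per anomaly type (1 = present, 0 = absent).
    "normal": 0, "cpu_bottleneck": 1, "memory_bottleneck": 0, "io_bottleneck": 0,
    # MiscFeatures: where and when the sample was collected.
    "host": "clearwater-vm-01", "timestamp": "2018-03-01T12:00:00Z",
}

dataset = pd.DataFrame([record])
print(dataset.dtypes)
```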

Bottlenecks Simulation. In order to better support the research on NFV performance anomaly detection, performance anomalies and bottlenecks can be simulated by the perturbation module, as implemented in Algorithm 1, and the performance-related data in the NFVI layer can be collected by the monitoring module. Both workload and faultload can be generated by the perturbation module.

Algorithm 1. Bottlenecks injection controller

Input: vm_list, bottleneck_type_list, injection_duration, user_count, duration
1: timer = start_timer()
2: while timer < duration do
3:   sip_simulate(user_count, duration)
4:   bottleneck_type = random(bottleneck_type_list)
5:   vm = random(vm_list)
6:   injection = new_injection(bottleneck_type)
7:   inject(vm, injection_duration)
8:   sleep(pause)
9: end while
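For readers who prefer a runnable form, the following is a minimal Python sketch of the controller loop in Algorithm 1. The functions sip_simulate and inject_bottleneck are placeholder stubs standing in for the SIPp workload launcher and the fault-injection tooling (they are not part of any published API), and the pause length is an assumed default.

```python
import random
import time

def sip_simulate(user_count, duration):
    # Placeholder for launching the SIPp workload generator.
    pass

def inject_bottleneck(vm, bottleneck_type, injection_duration):
    # Placeholder for the actual fault-injection call against the target VM.
    pass

def run_bottleneck_injection(vm_list, bottleneck_type_list,
                             injection_duration, user_count, duration,
                             pause=30):
    """Repeatedly inject a random bottleneck into a random VM while the
    SIP workload is running, until `duration` seconds have elapsed."""
    start = time.time()
    while time.time() - start < duration:
        # Keep the SIP workload running for the configured user count.
        sip_simulate(user_count, duration)

        # Pick a random bottleneck type and a random target VM.
        bottleneck_type = random.choice(bottleneck_type_list)
        vm = random.choice(vm_list)

        # Inject the chosen bottleneck for `injection_duration` seconds.
        inject_bottleneck(vm, bottleneck_type, injection_duration)

        # Give the system time to recover before the next round.
        time.sleep(pause)
```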

Performance Metric Model. Classification-based techniques rely heavily on expert domain knowledge of the characteristics of performance issues or bottleneck statuses. The work in this paper particularly focuses on the identification of performance anomalies from monitoring data of the VM operating systems of the NFVI, such as CPU consumption, disk I/O, and memory consumption. A classic Zabbix4 OS monitoring template5 is adopted as the performance metric model in this paper.

4 Case Study

4.1 Experimental Environment Setup

The testbed is built on one powerful physical server, a DELL R730 equipped with 2x Intel Xeon CPU E5-2630 v4 @ 2.10 GHz, 128 GB of RAM and a 5 TB hard disk. The vIMS under test is the Clearwater project, an open-source implementation of an IMS for cloud computing platforms. The Clearwater application is installed on a commercial hypervisor-based virtualization platform (VMware ESXi). The 10 components of Clearwater are individually hosted in Docker containers on virtual machines (VMs), and the containers are managed by Kubernetes. In particular, there is an attack host for injecting bottlenecks into the Clearwater virtual hosts: a fault injection tool runs on this host, and the Zabbix agents are installed on the other hosts, so that the performance data of each virtual host can be collected by the agents while the faultload and workload are injected.
The open source tool SIPp6 is used as the workload generator for the IMS. Fault injection techniques are applied to simulate bottlenecks, following Algorithm 1 presented in the previous section.

4 https://www.zabbix.com/.
5 https://github.com/chunchill/nfv-anomaly-detection-ml/blob/master/data/Features-Description-NFVI.xlsx.
6 http://sipp.sourceforge.net/.

The monitoring agent collects the performance data from each virtual host in each round, and the timestamp is recorded in a log file whenever there is a bottleneck injection, so that the performance data can be labeled with the related injection type according to the injection log. Finally, the performance dataset is built for the data analysis in the next section.
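A possible labeling step is sketched below. It assumes the monitoring samples and the injection log are exported as CSV files with the column names shown; these names are illustrative rather than prescribed by the framework.

```python
import pandas as pd

# Assumed inputs: monitoring samples (one row per host and timestamp) and the
# injection log written by the perturbation module (start/end of each injection).
metrics = pd.read_csv("metrics.csv", parse_dates=["timestamp"])
injections = pd.read_csv("injection_log.csv", parse_dates=["start", "end"])

metrics["label"] = "normal"
for inj in injections.itertuples():
    # A sample receives the injection's label if it was taken on the target
    # host while the bottleneck was active.
    mask = ((metrics["host"] == inj.host) &
            (metrics["timestamp"] >= inj.start) &
            (metrics["timestamp"] <= inj.end))
    metrics.loc[mask, "label"] = inj.bottleneck_type

metrics.to_csv("labeled_dataset.csv", index=False)
```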

4.2 Experimental Results


There were three kinds of bottlenecks in the data: CPU bottlenecks, memory bottlenecks and I/O bottlenecks; in addition, when there was no bottleneck injection, the data was labeled as 'normal'. We extracted a total of 3693 records from the experiment, including 2462 of the normal class, 373 of the CPU bottleneck class, 266 of the memory bottleneck class and 592 of the I/O bottleneck class. The schema of a record consists of two identification fields (host, timestamp), 45 monitoring-metric feature fields, and 4 labels (normal, CPU bottleneck, memory bottleneck, and I/O bottleneck).
We used the following machine learning classifiers to perform comparative experiments: Neural Networks, Neural Networks combined with SVM, K-Nearest Neighbors, Linear SVM, Radial Basis Function (RBF) SVM, Decision Tree and Random Forests.
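The comparison in Table 3 can be reproduced along the following lines with scikit-learn. The sketch assumes the labeled dataset file produced in the previous sketch, uses generic hyperparameters, and omits the combined NN+SVM model, so the exact numbers will differ from those reported.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

data = pd.read_csv("labeled_dataset.csv")               # assumed file name
X = data.drop(columns=["host", "timestamp", "label"])   # 45 metric features
y = data["label"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

models = {
    "NN": MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Linear SVM": SVC(kernel="linear"),
    "RBF SVM": SVC(kernel="rbf"),
    "Decision Tree": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(n_estimators=100),
}

# Standardize the metrics and report training/testing accuracy per model.
for name, clf in models.items():
    pipe = make_pipeline(StandardScaler(), clf)
    pipe.fit(X_train, y_train)
    print(f"{name:14s} train={pipe.score(X_train, y_train):.2f} "
          f"test={pipe.score(X_test, y_test):.2f}")
```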

Table 3. Accuracy comparison results of machine learning classifiers

Models          Training set   Testing set
NN              0.94           0.90
NN+SVM          0.93           0.89
KNN             0.92           0.87
Linear SVM      0.80           0.83
RBF SVM         0.80           0.83
Decision Tree   0.77           0.80
Random Forest   0.90           0.89

As shown by the comparison results in Table 3, the neural network performs best on both the training set and the testing set. Table 4 shows the detailed results of the neural network. From the epoch history of the neural network training shown in Fig. 2, we can see that the accuracy and loss trends on the training set and the validation set are almost the same, indicating that there is no over-fitting in the training process. This shows that the neural network is effective for detecting the performance anomalies.
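The accuracy/loss curves of Fig. 2 can be monitored during training as sketched below. The sketch assumes Keras (TensorFlow) and the X_train/y_train split from the previous sketch; the network architecture shown is illustrative and not necessarily the exact one used in this paper.

```python
import tensorflow as tf
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Encode the string labels to integers 0..3 and standardize the metrics.
X = StandardScaler().fit_transform(X_train)
y = LabelEncoder().fit_transform(y_train)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(X.shape[1],)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(4, activation="softmax"),   # 4 classes incl. normal
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# A held-out validation split makes over-fitting visible: the training and
# validation accuracy/loss curves should stay close (cf. Fig. 2).
history = model.fit(X, y, epochs=50, batch_size=32,
                    validation_split=0.2, verbose=0)
print(history.history["accuracy"][-1], history.history["val_accuracy"][-1])
```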
All of the experiment artifacts are available in this GitHub repository7, including the fault injection tools, the datasets and the Python code.
7 https://github.com/chunchill/nfv-anomaly-detection-ml.

Table 4. The results by neural network

Accuracy on training set: 0.94
Labels      Precision   Recall   F1-Score
Normal      0.97        0.95     0.96
CPU         0.90        0.93     0.92
Memory      0.91        0.85     0.88
I/O         0.87        0.95     0.91
avg/total   0.94        0.94     0.94

Accuracy on testing set: 0.90
Labels      Precision   Recall   F1-Score
Normal      0.96        0.92     0.94
CPU         0.81        0.90     0.86
Memory      0.86        0.78     0.82
I/O         0.75        0.88     0.81
avg/total   0.91        0.90     0.90

Fig. 2. The accuracy and loss trend of Neural Networks for both training set and
validation set

5 Conclusion
This paper has proposed a machine learning based performance anomaly detection approach for NFV-oriented cloud system infrastructure. Considering that it is difficult for researchers to obtain comprehensive and accurate abnormal behavior data in a real NFV production environment, a system perturbation technique to simulate faultload and workload is presented, and the monitoring module integrated into the anomaly detection framework monitors and evaluates the platform; it is responsible for constructing the anomaly dataset consisting of abnormal labels and multi-dimensional monitoring metrics. Finally, effective machine learning models are obtained by training statistical learning models on the anomaly dataset. The experimental results show that machine learning classifiers can be effectively applied to the performance anomaly detection problem, and the neural network is the best detection model, with a precision of over 94%.

Acknowledgement. This work has been supported by the National Natural Science Foundation of China (Grant No. 61672384); part of the work has also been supported by the Huawei Research Center under Grant No. YB2015120069. We also acknowledge the OPNFV project: some of the ideas come from the OPNFV community, and we gained much inspiration and discussion from our involvement in the OPNFV Yardstick and Bottlenecks projects.

References
1. ETSI GS NFV-PER 001. https://www.etsi.org/deliver/etsi_gs/NFV-PER/. Accessed 1 July 2018
2. Cotroneo, D., De Simone, L., Natella, R.: NFV-bench: a dependability benchmark for network function virtualization systems. IEEE Trans. Netw. Serv. Manag., 934–948 (2017)
3. Bonafiglia, R., et al.: Assessing the performance of virtualization technologies for NFV: a preliminary benchmarking. In: European Workshop on Software Defined Networks (EWSDN), pp. 67–72. IEEE (2015)
4. Naik, P., Shaw, D.K., Vutukuru, M.: NFVPerf: online performance monitoring and bottleneck detection for NFV. In: International Conference on Network Function Virtualization and Software Defined Networks (NFV-SDN), pp. 154–160. IEEE (2016)
5. Liu, D., et al.: Opprentice: towards practical and automatic anomaly detection through machine learning. In: Proceedings of the Internet Measurement Conference, pp. 211–224. ACM (2015)
6. Li, K.-L., Huang, H.-K., Tian, S.-F., Wei, X.: Improving one-class SVM for anomaly detection. In: IEEE International Conference on Machine Learning and Cybernetics, vol. 5, pp. 3077–3081 (2003)
7. Shanbhag, S., Gu, Y., Wolf, T.: A taxonomy and comparative evaluation of algorithms for parallel anomaly detection. In: ICCCN, pp. 1–8 (2010)
8. Yairi, T., Kawahara, Y., Fujimaki, R., Sato, Y., Machida, K.: Telemetry-mining: a machine learning approach to anomaly detection and fault diagnosis for space systems. In: Second International Conference on Space Mission Challenges for Information Technology (SMC-IT), p. 8. IEEE (2006)
9. Kourtis, M.A., Xilouris, G., Gardikis, G., Koutras, I.: Statistical-based anomaly detection for NFV services. In: International Conference on Network Function Virtualization and Software Defined Networks (NFV-SDN), pp. 161–166. IEEE (2016)
10. Cotroneo, D., Natella, R., Rosiello, S.: A fault correlation approach to detect performance anomalies in virtual network function chains. In: IEEE 28th International Symposium on Software Reliability Engineering (ISSRE), pp. 90–100 (2017)
11. Wang, C., Talwar, V., Schwan, K., Ranganathan, P.: Online detection of utility cloud anomalies using metric distributions. In: Network Operations and Management Symposium (NOMS), pp. 96–103. IEEE (2010)
12. Fu, S.: Performance metric selection for autonomic anomaly detection on cloud computing systems. In: Global Telecommunications Conference (GLOBECOM), pp. 1–5. IEEE (2011)
13. Du, Q., et al.: High availability verification framework for OpenStack based on fault injection. In: International Conference on Reliability, Maintainability and Safety (ICRMS), pp. 1–7. IEEE (2016)
14. Du, Q., et al.: Test case design method targeting environmental fault tolerance for high availability clusters. In: International Conference on Reliability, Maintainability and Safety (ICRMS), pp. 1–7. IEEE (2016)
