Performance Anomaly Detection Models of Virtual Machines For Network Function Virtualization Infrastructure With Machine Learning
Juan Qiu(B) , Qingfeng Du(B) , Yu He, YiQun Lin, Jiaye Zhu, and Kanglin Yin
1 Introduction
2 Related Works
Reliability studies of NFV technology, covering both performance and security topics, are active research areas in academia and industry. To guarantee high and predictable performance of data-plane workloads, a list of minimal features that the Virtual Machine (VM) Descriptor and Compute Host Descriptor should contain for the appropriate deployment of VM images over an NFV Infrastructure (NFVI) is presented in [1]. NFV-Bench [2] is proposed by Cotroneo et al. to analyze faulty scenarios and to provide joint dependability and performance evaluations for NFV systems. Bonafiglia et al. [3] provide a preliminary benchmark of the widespread virtualization technologies when used in NFV, i.e., when they are exploited to run so-called virtual network functions and to chain them in order to create complex services. Naik et al. present the design and implementation of NFVPerf [4], a tool to monitor performance and identify performance bottlenecks in an NFV system. NFVPerf runs as part of a cloud management system such as OpenStack and sniffs traffic between NFV components in a manner that is transparent to the VNFs.
Anomaly detection is an important data analysis task that identifies abnormal records in a given dataset. It is a long-standing data mining research problem that has been widely studied in many fields, and it is usually addressed with statistical and machine learning methods [5–8]. In recent years, anomaly detection literature for NFV has also begun to emerge. Kourtis et al. [9] presented the use of an open-source monitoring system especially tailored for NFV, in conjunction with statistical approaches commonly used for anomaly detection, towards the timely detection of anomalies in deployed NFV services. Cotroneo et al. [10] proposed an approach, demonstrated on an NFV-oriented Interactive Multimedia System, to detect problems affecting the quality of service, such as the overload, component
1 https://www.opnfv.org/.
2 https://wiki.opnfv.org/display/yardstick/Yardstick/.
3 https://wiki.opnfv.org/display/bottlenecks/Bottlenecks/.
accuracy, precision and F-measure are well-known performance measures for machine learning models. Intuitively, accuracy = (TP + TN) / (TP + TN + FP + FN) is easy to understand: it is the proportion of correctly classified samples among all samples. Generally speaking, the higher the accuracy, the better the classifier. precision = TP / (TP + FP) is the ability of the classifier not to label a negative sample as positive.
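Under the confusion-matrix notation above (TP, TN, FP, FN), these measures can be computed directly. The following minimal Python sketch (the function name is ours, not from the paper's code) shows the relations, including the F-measure as the harmonic mean of precision and recall:

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F-measure
    from the four confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # ability of the classifier to find all positives
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_measure

# Illustrative counts: 80 TP, 90 TN, 10 FP, 20 FN
acc, prec, rec, f1 = classification_metrics(80, 90, 10, 20)
print(round(acc, 3), round(prec, 3), round(rec, 3), round(f1, 3))
# → 0.85 0.889 0.8 0.842
```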
3.2 Implementation
4 Case Study
The testbed is built on one powerful physical server, a Dell R730 equipped with two Intel Xeon E5-2630 v4 CPUs @ 2.10 GHz, 128 GB of RAM and a 5 TB hard disk. The vIMS under test is the Clearwater project, an open-source implementation of an IMS for cloud computing platforms. The Clearwater application is installed on a commercial hypervisor-based virtualization platform (VMware ESXi). The 10 components of Clearwater are individually hosted in Docker containers, one per virtual machine (VM), and the containers are managed by Kubernetes. In addition, there is an attack host for injecting bottlenecks into the Clearwater virtual hosts: the fault injection tool runs on this host, while Zabbix agents are installed on the other hosts, so that the performance data of each virtual host can be collected by its agent while the faultload and workload are injected.
The open-source tool SIPp6 is used as the workload generator for the IMS. Fault injection techniques are applied to simulate the bottlenecks, following Algorithm 1 presented in the previous section.
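Algorithm 1 itself is given earlier in the paper; purely as an illustration, an injection driver in this style could stress one resource at a time and append a timestamped record to an injection log for later labeling. The function below is our own hypothetical sketch (the field names and the commented `stress-ng` invocation are assumptions, not the paper's tooling):

```python
import json
import time

def inject_bottleneck(host, fault_type, duration_s, log_path):
    """Append a timestamped record of one bottleneck injection to the
    injection log; the record is later used to label the monitoring data."""
    record = {"host": host, "fault": fault_type,
              "start": time.time(), "duration_s": duration_s}
    # In the real testbed the stressor would run here, e.g. (hypothetical):
    #   subprocess.run(["ssh", host, "stress-ng", "--cpu", "0",
    #                   "--timeout", f"{duration_s}s"], check=True)
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record
```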
4 https://www.zabbix.com/.
5 https://github.com/chunchill/nfv-anomaly-detection-ml/blob/master/data/Features-Description-NFVI.xlsx.
6 http://sipp.sourceforge.net/.
The monitoring agent collects the performance data from each virtual host in each round. The timestamp is recorded in a log file whenever a bottleneck injection occurs, so that the performance data can be labeled with the related injection type according to the injection log. Finally, the performance dataset can be built for the data analysis in the next section.
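Concretely, the labeling step is a join between sample timestamps and the time windows in the injection log. A minimal sketch (our own helper, assuming one log record per injection with a start time and a duration):

```python
def label_samples(samples, injections):
    """Label each (timestamp, metrics) sample with the fault type whose
    [start, start + duration) window contains it, else 'normal'.

    samples:    list of (timestamp, metrics_dict)
    injections: list of (start, duration_s, fault_type) from the log
    """
    labeled = []
    for ts, metrics in samples:
        label = "normal"
        for start, duration_s, fault in injections:
            if start <= ts < start + duration_s:
                label = fault
                break
        labeled.append((ts, metrics, label))
    return labeled

samples = [(100, {"cpu": 0.2}), (130, {"cpu": 0.9}), (200, {"cpu": 0.3})]
injections = [(120, 30, "cpu_hog")]
print([lab for _, _, lab in label_samples(samples, injections)])
# → ['normal', 'cpu_hog', 'normal']
```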
As shown by the comparison results in Table 3, the neural network performs best on both the training set and the testing set. Table 4 shows the detailed results of the neural network. From the epoch history of neural network training shown in Fig. 2, we can see that the trends of accuracy and loss on the training set and the validation set are almost the same, indicating that there is no over-fitting in the training process. This shows that the neural network is effective for detecting the performance anomalies.
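The over-fitting observation can also be checked numerically rather than only visually: over-fitting typically shows up as validation loss rising for several epochs while training loss keeps falling. A small helper (our own heuristic, not part of the paper's method) that flags the first epoch at which the validation loss has increased for `patience` consecutive epochs:

```python
def first_overfit_epoch(val_losses, patience=3):
    """Return the first epoch index where validation loss has increased for
    `patience` consecutive epochs (a simple over-fitting signal), or None
    if the validation curve never does so."""
    rising = 0
    for epoch in range(1, len(val_losses)):
        if val_losses[epoch] > val_losses[epoch - 1]:
            rising += 1
            if rising >= patience:
                return epoch
        else:
            rising = 0
    return None

# A validation curve that keeps improving, as in Fig. 2, is never flagged:
print(first_overfit_epoch([1.0, 0.7, 0.5, 0.45, 0.44]))  # → None
# A curve that turns upward is flagged at the third consecutive rise:
print(first_overfit_epoch([1.0, 0.6, 0.62, 0.65, 0.7]))  # → 4
```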
All of the experiment artifacts are available in this GitHub repository7, including the fault injection tools, the datasets and the Python code.
7 https://github.com/chunchill/nfv-anomaly-detection-ml.
Fig. 2. The accuracy and loss trends of the neural network for both the training and validation sets
5 Conclusion
This paper has proposed a machine learning based performance anomaly detection approach for NFV-oriented cloud system infrastructure. Considering that it is difficult for researchers to obtain comprehensive and accurate abnormal behavior data in a real NFV production environment, a system perturbation technique to simulate faultload and workload is presented, and the monitoring module
Acknowledgement. This work has been supported by the National Natural Science Foundation of China (Grant No. 61672384); part of the work has also been supported by Huawei Research Center under Grant No. YB2015120069. We also acknowledge the OPNFV project: some of the ideas come from the OPNFV community, and we gained much inspiration and discussion through our involvement in the OPNFV projects Yardstick and Bottlenecks.
References
1. ETSI GS NFV-PER 001. https://www.etsi.org/deliver/etsi_gs/NFV-PER/.
Accessed 1 July 2018
2. Cotroneo, D., De Simone, L., Natella, R.: NFV-bench: a dependability benchmark
for network function virtualization systems. IEEE Trans. Netw. Serv. Manag., 934–
948 (2017)
3. Bonafiglia, R., et al.: Assessing the performance of virtualization technologies
for NFV: a preliminary benchmarking. In: European Workshop on Software Defined
Networks (EWSDN), pp. 67–72. IEEE (2015)
4. Naik, P., Shaw, D.K., Vutukuru, M.: NFVPerf: Online performance monitoring and
bottleneck detection for NFV. In: International Conference on Network Function
Virtualization and Software Defined Networks (NFV-SDN), pp. 154–160. IEEE
(2016)
5. Liu, D., et al.: Opprentice: towards practical and automatic anomaly detection
through machine learning. In: Proceedings of the Internet Measurement Confer-
ence, pp. 211–224. ACM (2015)
6. Li, K.-L., Huang, H.-K., Tian, S.-F., Wei, X.: Improving one-class SVM for anomaly
detection. In: IEEE International Conference on Machine Learning and Cybernet-
ics, vol. 5, pp. 3077–3081 (2003)
7. Shanbhag, S., Gu, Y., Wolf, T.: A taxonomy and comparative evaluation of algo-
rithms for parallel anomaly detection. In: ICCCN, pp. 1–8 (2010)
8. Yairi, T., Kawahara, Y., Fujimaki, R., Sato, Y., Machida, K.: Telemetry-mining: a
machine learning approach to anomaly detection and fault diagnosis for space sys-
tems. In: Second International Conference on Space Mission Challenges for Infor-
mation Technology(SMC-IT), p. 8. IEEE (2006)
9. Kourtis, M.A., Xilouris, G., Gardikis, G., Koutras, I.: Statistical-based anomaly
detection for NFV services. In: International Conference on Network Function Vir-
tualization and Software Defined Networks (NFV-SDN), pp. 161–166. IEEE (2016)
10. Cotroneo, D., Natella, R., Rosiello, S.: A fault correlation approach to detect per-
formance anomalies in virtual network function chains. In: IEEE 28th International
Symposium on Software Reliability Engineering (ISSRE), pp. 90–100 (2017)
11. Wang, C., Talwar, V., Schwan, K., Ranganathan, P.: Online detection of utility
cloud anomalies using metric distributions. In: Network Operations and Manage-
ment Symposium (NOMS), pp. 96–103. IEEE (2010)
12. Fu, S.: Performance metric selection for autonomic anomaly detection on cloud
computing systems. In: Global Telecommunications Conference (GLOBECOM),
pp. 1–5. IEEE (2011)
13. Du, Q., et al.: High availability verification framework for OpenStack based on fault
injection. In: International Conference on Reliability, Maintainability and Safety
(ICRMS), pp. 1–7. IEEE (2016)
14. Du, Q., et al.: Test case design method targeting environmental fault tolerance for
high availability clusters. In: International Conference on Reliability, Maintainabil-
ity and Safety (ICRMS), pp. 1–7. IEEE (2016)