Professional Documents
Culture Documents
United States Patent: Viswanathan Et A) - (10) Patent N0.: (45) Date of Patent
United States Patent: Viswanathan Et A) - (10) Patent N0.: (45) Date of Patent
Viswanathan et a].
(54)
(56)
References Cited
8,027,897 B2 *
8,037,178 B1 *
8,046,468 B2 *
8,115,151 B2 *
8,180,693 B2 *
8,180,694 B2 *
Apr. 1, 2014
(*)
US 8,688,620 B2
705/35
8,185,913 B1 *
5/2012
8,255,709 B2 *
705/35
705/35
713/300
709/223
8,458,547 B2 *
714/746
8,463,784 B1 *
707/737
8,504,733 B1 *
8/2013
8,549,004 B2 *
2012/0136909 A1*
2012/0209575 A1*
* cited by examiner
(22) Filed:
(65)
US 2013/0080375 A1
(51)
Int. Cl.
(52)
US. Cl.
G06F 15/18
USPC
(58)
(57)
(2006.01)
.......................................................... ..
706/62
ABSTRACT
.......................................................... .. 706/ 62
10!)
125
140
Network
152
Data Center
US. Patent
Apr. 1, 2014
Sheet 1 014
//
5?
US 8,688,620 B2
150/
149
Progam Code
152$
15
% a
1.1V 0 i/
US. Patent
Apr. 1, 2014
Sheet 2 0f 4
Wilta
wanna-owwaannnuunnnnav-u
DomK
WE}
Inc-Iannnnowwannuv'
i
I
Ilnnnvd
,nuuouhnqnnuouannunnuuinnnu
Jrannnnu-Irannnnuumnnnnvuin!
i:sla?
US 8,688,620 B2
US. Patent
Apr. 1, 2014
Sheet 3 0f4
US 8,688,620 B2
mm
Tim
US. Patent
Apr. 1, 2014
Sheet 4 0f4
US 8,688,620 B2
410
429
430
US 8,688,620 B2
1
BACKGROUND
class machines.
15
pro?ts.
Anomaly detection deployed in many data centers com
pare metrics (which are being monitored in the data center) to
?xed thresholds. These thresholds may be determined
30
35
40
50
55
DETAILED DESCRIPTION
60
network (WAN). The host 120 and client 130 may be pro
vided on the network 140 via a communication connection,
65
US 8,688,620 B2
3
inter-quartile range.
system.
The program code executes the function of the architecture
of machine readable instructions as self-contained modules.
These modules can be integrated within a self-standing tool,
or may be implemented as agents that run on top of an existing
program code. In an example, the architecture of machine
readable instructions may be executed to detect anomalies in
the data center.
Anomalies in data centers may manifest in a variety of
different ways. In an example, an anomaly may manifest
itself as a data point that is atypical with reference to the
normal behavior of the data distribution. When an anomaly
manifests as an atypical data point, the distribution of the data
may be used to compute thresholds. Data under normal pro
cess behavior tends to oscillate within the threshold limits.
35
40
making sure that the threshold takes into account the variance
of the sample mean of the window. The thresholds can then be
computed from the entire past history, or a subset of past
windows. However, this approach may not capture all anoma
lies.
Instead, a new approach may be used to detect anomalies
which manifest as changes in the system behavior that are
inconsistent with what is expected. This approach is based on
determining if the observed data is consistent with a given
50
Data points falling beyond and below the upper and lower
shape of the distribution as well as a desired, false positive
65
US 8,688,620 B2
6
5
the probability of incorrectly accepting the null hypothesis.
{1, 2, . . . k}.
factor).
It is also noted that some data centers operate in multiple
D(Q||P) = 2 $108?
lowing algorithm:
40
GOODNESS-OF-FIT
'
SetPl=P,m=l,cl=1
i s m,
Else
55
f) Else
60
0) Compute P
6) Else if (2 * W * D(PiiP,-) < T) for any hypothesis Pi.
W)
1) Set m = 0
2) Set Windex = l
3) Set Stepsize = (Utilmax Utilmin)/Nbins
US 8,688,620 B2
8
or 0.99.
ior in the past, the entire past data can be used to smooth
averages. If the workload has varying behavior that is time
dependent, or is originating from virtual machines that have
current'
25
accepted fewer than cth times in the past, then the window is
textual anomalies.
data center.
40
45
tiles are computed for the Tukey method, and the empirical
frequency of previous windows is computed for the relative
entropy method. These quantities can be computed in time
linear in the input, and these quantities can be well-approxi
mated in one pass using only limited memory, without having
55
60
65
US 8,688,620 B2
9
10
mented by hour of day, and day of week for both CUST1 and
Statistics Over
PPR
Gaussian (Window?d)
Entire past
006
002
Gaussian (Window?d)
R?c?nt Windows
006
002
Gaussian(srnoothed)
Entire Past
0.58
0.08
Tuk??smooth?d)
Entire Past
0-76
0-04
Relattve entropy
Entire Past
0-46
0-04
Relattve entropy
Rm? past
Multiple
Hypo?wses
0'74
0'04
Relative entropy
Recall
0.86
Increases
15
0.04
statistical techniques perform poorly, as compared to the multiple hypotheses relative entropy technique described herein.
wise Gaussian and Tukey techniques, the ?rst half of the data
set was used to estimate the mean, standard deviation and
window length was set to 100. For algorithms that use only 25
The server in question had been allocateda maximum of 16
the recent past, 10 windows of data were used. For the Gauscores and thus the values of CPU utilization ranged between
evaluation, an alarm was raised if an anomaly was detected in 35 one window, then the window was deemed anomalous. This
was performed for each of the 24 hours. The results are shown
in Table 2.
TABLE 2
Hour
# Anomalies
#
#
(Relative Anomalies Anomalies
of day
EntropyRE) (Gaussian)
0000
0100
0200
0300
0400
0500
0600
0700
0800
0900
1000
1100
1200
1300
1400
1500
1600
1700
1800
1900
2000
2100
2200
2300
6
4
2
5
5
4
5
4
6
7
6
7
4
1
5
6
4
4
4
5
3
6
5
6
7
6
4
4
5
5
7
6
2
2
2
2
2
2
5
5
5
3
3
3
2
5
4
0
Common
Unique
Unique to
Unique
to
(Tukey)
Anomalies
to RE
Gaussian
Tukey
7
1
0
2
2
2
3
7
3
14
13
3
2
1
5
12
2
3
0
3
0
8
0
6
6
1
0
1
2
1
3
4
2
2
2
1
2
1
4
4
2
2
0
3
0
4
0
5
0
0
0
2
0
1
0
0
3
0
0
5
2
0
1
1
1
2
1
2
2
1
2
0
0
2
2
0
0
1
2
0
0
0
0
0
0
1
0
0
2
0
0
0
1
0
1
3
0
0
0
0
0
0
0
1
0
7
7
0
0
0
0
6
0
0
0
0
0
2
0
0
US 8,688,620 B2
11
12
The ?rst column in Table 2 shows the hour of day that was
analyzed. The next three columns indicate number of anoma
25
couple of cases, the value did not drop down entirely and
remained between 4.0 and 6.0, indicating perhaps a stuck
thread. The Gaussian and Tukey methods do not catch this
30
It can be seen from the above examples that the systems and
tially automated.
40
ing:
tional states of the data center;
constructing upper and lower bounds based on the statisti
50
55
the time series data and analyzing each window for anoma
60
hypothesis; and
performing the multinomial goodness-of-?t test with a
threshold based on an acceptable false negative prob
65
ability.
5. The method of claim 3, wherein the multinomial good
ness-of-?t test is based on relative entropy analysis.
US 8,688,620 B2
14
13
6. The method of claim 5, further comprising raising an
alarm that the window includes an anomaly if the hypothesis
is rejected.
hypotheses; and
accepted.
monitored metric.
9. The method of claim 8, further comprising declaring a
window to be anomalous if distribution of metric values in a
window differs signi?cantly from distribution of historical
metric values.
center.
data center.
CERTIFICATE OF CORRECTION
PATENT N0.
: 8,688,620 B2
APPLICATION NO.
: 13/243476
DATED
INVENTOR(S)
: April 1, 2014
: Krishnamurthy Viswanathan et a1.
Page 1 of 1
It is certified that error appears in the above-identi?ed patent and that said Letters Patent is hereby corrected as shown below:
In the Claims
In column 12, line 61, in Claim 4, delete window and insert -- Window; --, therefor.
In column 14, line 22, in Claim 18, before processor insert -- the --.
WMZ44L_
Michelle K. Lee