Professional Documents
Culture Documents
BPM2011 Session 9a 2
BPM2011 Session 9a 2
Matthias Weidlich
Mathias Weske
Behavioral Similarity
A Proper Metric
Abstract
2
lundi 5 septembre 2011
Abstract
What can it be
used for?
2
lundi 5 septembre 2011
Abstract
What is a
proper metric?
What can it be
used for?
2
lundi 5 septembre 2011
Abstract
What is a
proper metric?
What can it be
used for?
2
lundi 5 septembre 2011
Abstract
What is a
proper metric?
What can it be
used for?
Abstract
What is a
proper metric?
What can it be
used for?
4
lundi 5 septembre 2011
reduce tcomparison
reduce |comparisons|
5
lundi 5 septembre 2011
Process Similarity
[17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27]
lundi 5 septembre 2011
Process Similarity
Existing
work!
[17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27]
lundi 5 septembre 2011
Proper Metric
A metric d is distance function, that is
7
lundi 5 septembre 2011
Proper Metric
A metric d is distance function, that is
7
lundi 5 septembre 2011
Proper Metric
A metric d is distance function, that is
b
a
7
lundi 5 septembre 2011
pace partition T (p) (solid circle) with pivot p and similarity query
Proper
Metric
with
query model
q. (a) Exclusion, (b) Inclusion, (c) Intersection
Metric). A metric
a is a distance function d : D D R between
cannot be larger
in D with the following properties:
than
a
+
b
oi , oj D : d(oi , oj ) = d(oj , oi )
ty: oi , oj D, oi = oj : d(oi , oj ) > 0
i , oj D : d(oi , oj ) = 0 oi = oj
equality: oi , oj , ok D : d(oi , ok ) d(oi , oj ) + d(oj , ok )
7
lundi 5 septembre 2011
r(p)
r(q)
8
lundi 5 septembre 2011
r(p)
r(q)
8
lundi 5 septembre 2011
Exclusion
No model in the partition of p can match q.
r(p)
Exclusion
No model in the partition of p can match q.
r(p)
other
element
in
T
Intersection
r(q) r(p) d(p, q): T
Partition of p cannot be excluded
and
must
be
Fig. 1(b). Thus, al
searched exhaustively.
be compared with
r(q) + r(p) d(p, q): T
of the objects that
Pruning complete parti
r(p)
the number of compari
r(q)
ecient
d(p,q)to implement
q
p
the comparison operat
2.2
Process Mode
10
other
element
in
T
Intersection
r(q) r(p) d(p, q): T
Partition of p cannot be excluded
and
must
be
Fig. 1(b). Thus, al
searched exhaustively.
be compared with
r(q) + r(p) d(p, q): T
of the objects that
Pruning complete parti
r(p)
the number of compari
r(q)
to
implement
ecient
d(p,q)
p
q
the comparison operat
2.2
Process Mode
10
other
element
in
T
Intersection
r(q) r(p) d(p, q): T
Partition of p cannot be excluded
and
must
be
Fig. 1(b). Thus, al
searched exhaustively.
be compared with
r(q) + r(p) d(p, q): T
of the objects that
Pruning complete parti
r(p)
the number of compari
r(q)
to
implement
ecient
d(p,q)
p
q
the comparison operat
Remember:
No coordinates!
2.2
Process Mode
10
Efficient Search
If we partition the whole repository, we can
p1
p3
p2
11
lundi 5 septembre 2011
Efficient Search
If we partition the whole repository, we can
p1
p3
p2
11
lundi 5 septembre 2011
Efficient Search
If we partition the whole repository, we can
r(q)
p1
p3
p2
11
lundi 5 septembre 2011
Abstract
a suitable
distance function
reduce number
of comparisons
Behavioral Profile
semantics
[4]
lundi 5 septembre 2011
13
behavioral profile of P .
Strict Order
behavioral profile of P .
Strict Order
Exclusiveness Relation
B + C
two activities B and C never occur together in
15
lundi 5 septembre 2011
Interleaving Order
B || C
else, i.e., B may happen before C and vice
versa
16
lundi 5 septembre 2011
Interleaving Order
B || C
else, i.e., B may happen before C and vice
versa
information loss:
CAUSALITY,
CARDINALITY
16
lundi 5 septembre 2011
Behavioral Profile
in a model
17
lundi 5 septembre 2011
profiles
of two
process order,
models, Pthe
andreverse
Q
pair
(x, y)
in strict
s
1
elementary
e pair
(y, x),similarities
i.e., x (,P+, y||) y P
extended similarities
properties
that we will exploit for sim
e strict order relation, the relations of
[34]
18
profiles
of two
process order,
models, Pthe
andreverse
Q
pair
(x, y)
in strict
s
1
elementary
e pair
(y, x),similarities
i.e., x (,P+, y||) y P
extended similarities
properties
that we will exploit for sim
e strict order relation, the relations of
[34]
18
reward
of common
activities
in a
Definitionsimilar
6 (Strictordering
Order Similarity).
Let P , Q be
process models
Q the strict
order relations
of their
process
models
P and
Q respective behavioral profiles BP , B
define the Strict Order Similarity
|P Q |
sim (BP , BQ ) = |P Q |
Due to x P y y 1
P x, cf. Section 3, it suces to incorporate o
strict order relation into the similarity measure. The reverse relation is im
covered. This can be seen in the behavioral profile matrix as the stric
19
lundi 5 septembre 2011
reward
of common
activities
in a
Definitionsimilar
6 (Strictordering
Order Similarity).
Let P , Q be
process models
Q the strict
order relations
of their
process
models
P and
Q respective behavioral profiles BP , B
define the Strict Order Similarity
|P Q |
sim (BP , BQ ) = |P Q |
Due to x P y y 1
P x, cf. Section 3, it suces to incorporate o
strict order relation into the similarity measure. The reverse relation is im
covered. This can be seen in the behavioral profile matrix as the stric
d=1
|AB|
|AB|
sim (BP , BQ ) =
lundi 5 septembre 2011
|{(A,B),(A,C)}{(A,B),(A,C)}|
|{(A,B),(A,C)}{(A,B),(A,C)}|
=1
19
penalizes
rewards
high
corresponding
similarity
quantifies,
how
many
common
pairs
exist
in
pairs,
if
they
are,
e.g.,
executed
i
of exclusiveness
of flexibility
models that feature the same exclusiveness
relation.
flow cycle
in the other.
Elementary Similarities
constraints
e.g. loops
Definition
Definition 5 (Exclusiveness Similarity).
Let7 P(Interleaving
,and
Q be processO
concurrency
||Prespective
, ||Q the interleaving
ord
+P , +Q the exclusiveness relations ofand
their
behavioral profi
BP , BQ . We define the Interleav
We define the Exclusiveness Similarity
| ||P ||Q |
|+P +Q |
sim|| (BP , BQ ) = | ||P ||Q |
sim+ (BP , BQ ) = |+P +Q |
The
process
models models
of Fig. 2ofshF
See, for example, the activities c and
d in
both process
(d,f)
and (f,d).
the scenario, it is prohibitive to reject
an order
and fileThis
thisyields
order si
a
20
all cases
does
the ord
The only dierence between m1 and m2 Not
within
regard
to the
exclusiven
The process models of Fig. 2 share interleaving order relations for (d,e),(e,d),
(d,f) and (f,d). This yields sim|| (Bm1 , Bm2 ) = 48 = 0.5.
Not in all cases does the ordering of activities in a model correspond to the
actual order of execution of these activities in practice. Their order may simply be
one possible sequence sprung from the habit of the process modeler. To address
such cases, we extend the elementary similarities above to increase the tolerance
of the measures.
The strict order similarity only rewards pairs of activities that are executed in
the same order. If two activities appear both in each trace of two distinct process
pair of activities in interleaving order in one model
models but are executed in inverse order
respectively,
it is plausible
they do
of the
same activities
in a staticthat
sequence
in anothe
not depend on each other, but ratherorder
been
havesupersedes
simply
modeled
in flexibility.
a sequenceHowever
strict
order in
that seemed suitable to the process interleaving
modeler. Thus,
propose the
extended
orderwesimilarity.
Thus,
we suggest the
strict order similarity that will rewardsimilarity,
these pairs
of activities.
which
rewards the containment of strict o
Extended Similarities
Extended Strict Order
Similarity
Extended Interleaving
Order Similarity
Q
P the reverse strict
order relations of their respective behavioral profiles BP , BQ . We define the
Definition 9 (Extended Interleaving Order S
Extended Strict Order Similarity
cess
models and P , Q the strict order relations,
1
|(P 1
)(
)|
Q
sim (BP , BQ ) = |( P1 )( Q
order
relations, ||P , ||Q the interleaving order relat
1
)|
P
Q
P
Q
ioral profiles BP , BQ . We define the Extended Int
1
The above consideration applies to activities f and g in |(
Fig.P
2: 1
They
are
in
strict
||
)(
P
Q
P
Q ||Q ) |
simin
= |(
1
1
|| (B
order in m1 and in reverse strict order
mP2 ., BInQ )the
given
scenario,
it
seems
||
)(
P
P
Q
P
Q ||Q ) |
to be reasonable to assume that there is no explicit order constraint between
As an example, refer to the activities e and f in Fi
them. Thus, these pairs should contribute to the similarity of both models:
order in m1 . In an instance of this process they may21b
14
lundi sim
5 septembre
(Bm1 , Bm2 ) =
0.467.
2011
sim
Behavioral Profile Metric h P Q
h
weight that accounts for
the respective
metrics
impact
as explained
in Section
4.2 and [34].
Theno
that accounts
the respective
me
We postulate B as weight
the universe
of allforpossible
behavio
We
postulate
B
as
the
universe
of
all
p
this matches the behavioral profiles of all models within
this matches the behavioral profiles of all
Aggregate the similarities by weighted sum
Definition
10
(Behavioral
Profile
Metric).
B
=
(B
Definition
10 (Behavioral Profile Me
and transform it into
a metric.
of behavioral profiles B,
theprofiles
behavioral
metr
of where
behavioral
B, whereprofile
the behavio
weights determined
through experiments
metric,
metric,
dB (BP , BQ ) = 1 wh simh (BP , BQ
dB (BP , BQ ) = 1
wh simh (BP , BhQ )
h
with
h
{+,
,
||,
,
||
}
and
weighting
f
{+, , ||, , || w}h and
= 1. weighting factors wh R
with
h
wh = 1.
h
sim
Behavioral Profile Metric h P Q
h
weight that accounts for
the respective
metrics
impact
as explained
in Section
4.2 and [34].
Theno
that accounts
the respective
me
We postulate B as weight
the universe
of allforpossible
behavio
We
postulate
B
as
the
universe
of
all
p
this matches the behavioral profiles of all models within
this matches the behavioral profiles of all
Aggregate the similarities by weighted sum
Definition
10
(Behavioral
Profile
Metric).
B
=
(B
Definition
10 (Behavioral Profile Me
and transform it into
a metric.
of behavioral profiles B,
theprofiles
behavioral
metr
of where
behavioral
B, whereprofile
the behavio
weights determined
through experiments
metric,
metric,
dB (BP , BQ ) = 1 wh simh (BP , BQ
dB (BP , BQ ) = 1
wh simh (BP , BhQ )
h
with
h
{+,
,
||,
,
||
}
and
weighting
f
{+, , ||, , || w}h and
= 1. weighting factors wh R
with
h
wh = 1.
h
In orderproof
to usethat
this this
aggregate
metric
for
is a metric
that it iscan
indeed
aound
metric
as established
in
be f
in the
paper
Abstract
a suitable
distance function
reduce number
of comparisons
Evaluation
SAP Reference Model (604 EPC)
human assessment of similarity among a
recall levels
24
lundi 5 septembre 2011
Search Quality
single metrics
So
In
ES
equal weights
EI
1,00
1,00
0,75
0,75
Precision
Precision
Ex
aggregated metric
0,50
0,50
0,25
0,25
0
0
0,1
0,2
0,3
0,4
0,5
Recall
0,6
0,7
0,8
0,9
1Ex-4ES
0,1
0,2
0,3
0,4
0,5
0,6
0,7
0,8
0,9
Recall
25
lundi 5 septembre 2011
Search Performance
built metric tree index and searched for
bpd_median
400
300
200
100
0
20
80
140
200
260
320
dataset size
lundi 5 septembre 2011
380
440
500
26
Search Performance
built metric tree index and searched for
Beyond the
paper
bpd_median
400
300
200
100
0
20
80
140
200
260
320
dataset size
lundi 5 septembre 2011
380
440
500
26
Search Performance
built metric tree index and searched for
Beyond the
paper
linear search
bpd_median
bpd_median
100
number of comparisons
400
bpd_max
300
200
100
75
50
25
0
20
80
140
200
260
320
dataset size
lundi 5 septembre 2011
380
440
500
20
80
140
200
260
320
dataset size
380
440
500
26
Limitations
subgraphs
27
lundi 5 septembre 2011
Conclusions
Conclusions
reduce number
of comparisons
Conclusions
b
a
a suitable
distance function
reduce number
of comparisons
Conclusions
b
a
a suitable
distance function
reduce number
of comparisons
Conclusions
b
a
a suitable
distance function
reduce number
of comparisons
1,00
0,75
300
0,50
200
0,25
100
0
0
0,2
0,4
0,6
0,8
References
1. Rosemann, M.: Potential pitfalls of process modeling: part B. Business Process
Management Journal 12(3) (2006) 377-384
2. Dijkman, R., Dumas, M., van Dongen, B., Kaarik, R., Mendling, J.: Similarity of
business process models: Metrics and evaluation. Inf. Syst. 36(2) (2011) 498-516
3. Zezula, P., Amato, G., Dohnal, V., Batko, M.: Similarity Search: The Metric Space
Approach. Springer-Verlag New York, Inc., Secaucus, NJ, USA (2005)
4. Weidlich, M., Mendling, J., Weske, M.: Efficient Consistency Measurement based on
Behavioural Profiles of Process Models. IEEE Trans. Softw. Eng. (2011)
5. Kunze, M., Weidlich, M., Weske, M.: m3 - A Behavioral Similarity Metric for Business
Processes. In: ZEUS'11. CEUR-WS. (2011) 89-95
6. Ch'avez, E., Navarro, G., Baeza-Yates, R., Marroqu'in, J.L.: Searching in Metric Spaces.
ACM Comput. Surv. 33(3) (2001) 273-321
7. Hjaltason, G.R., Samet, H.: Index-driven similarity search in metric spaces (Survey Article).
ACM Trans. Database Syst. 28(4) (2003) 517-580
8. Dumas, M., Garc'ia-Ba~nuelos, L., Dijkman, R.M.: Similarity Search of Business Process
Models. IEEE Data Eng. Bull. 32(3) (2009) 23-28
9. Rahm, E., Bernstein, P.A.: A survey of approaches to automatic schema matching. VLDB
J. 10(4) (2001) 334-350
10. Euzenat, J., Shvaiko, P.: Ontology matching. Springer-Verlag, Heidelberg (DE) (2007)
11. Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions, and reversals.
Soviet Physics Doklady 10(8) (1966) 707-710
12. Salton, G., Wong, A., Yang, C.S.: A Vector Space Model for Automatic Indexing.
Commun. ACM 18(11) (1975) 613-620
13. Manning, C.D., Sch"atze, H.: Foundations of Statistical Natural Language Processing.
The MIT Press (1999)
14. Miller, G.A.: WordNet: A Lexical Database for English. Commun. ACM 38(11) (1995)
39-41
15. Bunke, H., Allermann, G.: Inexact graph matching for structural pattern recognition.
Pattern Recognition Letters 1(4) (1983) 245-253
16. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NPCompleteness. W. H. Freeman (1979)
17. Dijkman, R.M., Dumas, M., Garc'ia-Ba~nuelos, L.: Graph Matching Algorithms for
Business Process Model Similarity Search. In: BPM. Volume 5701 of LNCS, Springer (2009)
48-63
18. Li, C., Reichert, M., Wombacher, A.: On Measuring Process Model Similarity Based on
High-Level Change Operations. In: ER. Volume 5231 of LNCS, Springer (2008) 248-264
19. Yan, Z., Dijkman, R.M., Grefen, P.: Fast Business Process Similarity Search with FeatureBased Similarity Estimation. In: OTM '10, Volume 6426 of LNCS, Springer (2010) 60-77
20. Wombacher, A., Rozie, M.: Evaluation of Workflow Similarity Measures in Service
Discovery. In: Service Oriented Electronic Commerce. Volume 80 of LNI., GI (2006)
51-71
21. Nejati, S., Sabetzadeh, M., Chechik, M., Easterbrook, S., Zave, P.: Matching and
Merging of Statecharts Specifications. In: ICSE '07, Washington, DC, USA, IEEE
Computer Society (2007) 54-64
22. Eshuis, R., Grefen, P.: Structural Matching of BPEL Processes. In: ECOWS '07,
Washington, DC, USA, IEEE Computer Society (2007) 171-180
23. van Dongen, B., Dijkman, R., Mendling, J.: Measuring Similarity between Business
Process Models. In: CAISE '08, Volume 5074 of LNCS, Springer (2008) 450-464
24. Minor, M., Tartakovski, A., Bergmann, R.: Representation and Structure-Based
Similarity Assessment for Agile Workflows. In: ICCBR '07 Volume 4626 of LNCS,
Springer (2007) 224-238
25. Ehrig, M., Koschmider, A., Oberweis, A.: Measuring Similarity between Semantic
Business Process Models. In: APCCM '07 Australian Computer Society, Inc. (2007)
71-80
26. van der Aalst, W., de Medeiros, A.A., Weijters, A.: Process Equivalence: Comparing
Two Process Models Based on Observed Behavior. In: BPM Volume 4102 of LNCS,
Springer (2006) 129-144
27. Lu, R., Sadiq, S.: On the Discovery of Preferred Work Practice Through Business
Process Variants. In: ER' 07, Volume 4801 of LNCS, Springer (2007) 165-180
28. Kunze, M., Weske, M.: Metric Trees for Efficient Similarity Search in Process Model
Repositories. In: IW-PL '10, Hoboken, NJ (September 2010)
29. Vanhatalo, J., V"olzer, H., Leymann, F., Moser, S.: Automatic Workflow Graph Refactoring
and Completion. In: ICSOC. Volume 5364 of LNCS, Springer (2008) 100-115
30. Lohmann, N., Verbeek, E., Dijkman, R.M.: Petri Net Transformations for Business
Processes - A Survey. T. Petri Nets and Other Models of Concurrency 2 (2009) 46-63
31. van der Aalst, W.: Workflow Verification: Finding Control-Flow Errors Using Petri-NetBased Techniques. In: BPM '09, Volume 1806 of LNCS, Springer (2000) 161-183
32. Weidlich, M., Polyvyanyy, A., Mendling, J., Weske, M.: Efficient Computation of Causal
Behavioural Profiles Using Structural Decomposition. In:Petri Nets. Volume 6128 of LNCS,
Springer (2010) 63-83
33. Weidlich, M., Mendling, J.: Perceived Consistency between Process Models. Inf. Syst.
(2011) In Press.
34. Lipkus, A.: A Proof of the Triangle Inequality for the Tanimoto Distance. Journal of
Mathematical Chemistry 26 (1999) 263-265
35. Decker, G., Mendling, J.: Process instantiation. Data Knowl. Eng. 68(9) (2009) 777-792
36. Buckley, C., Voorhees, E.M.: Evaluating evaluation measure stability. In: SIGIR '00.
(2000) 33-40
37. Awad, A.: BPMN-Q: A Language to Query Business Processes. In: EMISA. Volume P-119
of LNI., GI (2007) 115-128
38. Beeri, C., Eyal, A., Kamenkovich, S., Milo, T.: Querying Business Processes. In: VLDB
'06VLDB Endowment (2006) 343-354
39. Choi, I., Kim, K., Jang, M.: An XML-based process repository and process query
language for integrated process management. Knowledge and Process Management 14(4)
(2007) 303-316
40. Jin, T., Wang, J., Wu, N., Rosa, M.L., ter Hofstede, A.H.M.: Efficient and Accurate
Retrieval of Business Process Models through Indexing - (Short Paper). In: OTM '10, Volume
6426 of LNCS, Springer (2010) 402-409
29
lundi 5 septembre 2011
! !
% %
% &#!'%$((((
!
&#!'%$((((
! %
"#!
"#!
$
(b)
%
!
&#!'%$((((
!
%
&#!'%$
(a)
"#%
"#%
$
$
"#%
$
"#%
$
"#%
$
&#!'%$((((
"#%
$ "#!
$
"#%
$
"#!
$
"#!
"#!
$
$
"#!
((((
"#%
$
"#!
&#!'%$
(c)
Fig. 1. Metric space partition T (p) (solid circle) with pivot p and similarity query
(dotted circle) with query model q. (a) Exclusion, (b) Inclusion, (c) Intersection.
data transformation and reduction techniques need to be applied, which often
result in objects for which only pairwise similarity can be computed [3].
Within the last two decades, data structures and algorithms have been studied
that allow for ecient similarity search in such a case, cf. [6,7]. These methods
30
Aspect
Structure
Structure
Structure
Structure
Behavior
Behavior
Behavior
Structure
Behavior
Behavior
Structure
yes
yes
yes
yes
yes
yes
yes
yes
yes
yes
yes
yes
yes
yes
yes
yes
yes
yes
yes
yes
yes
yes
no
no
no
yes
yes
no
no
yes
no
no
no
Further, edit distances have been defined for the behavior of a process model based
on all traces, an automaton encoding the language, or an n-gram representation of
all traces [20]. If the process behavior is defined by a transition system, the degree
lundi 5 septembre 2011
31
Discovery. In: Service Oriented Electronic Commerce. Volume 80 of LNI., GI (2006) 51-71
21. Nejati, S., Sabetzadeh, M., Chechik, M., Easterbrook, S., Zave, P.: Matching and
Merging of Statecharts Specifications. In: ICSE '07, Washington, DC, USA, IEEE Computer
Society (2007) 54-64
22. Eshuis, R., Grefen, P.: Structural Matching of BPEL Processes. In: ECOWS '07,
Process Models. In: CAISE '08, Volume 5074 of LNCS, Springer (2008) 450-464
24. Minor, M., Tartakovski, A., Bergmann, R.: Representation and Structure-Based
Similarity Assessment for Agile Workflows. In: ICCBR '07 Volume 4626 of LNCS, Springer
(2007) 224-238
25. Ehrig, M., Koschmider, A., Oberweis, A.: Measuring Similarity between Semantic
Business Process Models. In: APCCM '07 Australian Computer Society, Inc. (2007) 71-80
26. van der Aalst, W., de Medeiros, A.A., Weijters, A.: Process Equivalence: Comparing
Two Process Models Based on Observed Behavior. In: BPM Volume 4102 of LNCS,
Springer (2006) 129-144
27. Lu, R., Sadiq, S.: On the Discovery of Preferred Work Practice Through Business
Process Variants. In: ER' 07, Volume 4801 of LNCS, Springer (2007) 165-180
32
lundi 5 septembre 2011
For each pair (x, y) in strict order, the reverse strict order relation comprises
the inverse pair (y, x), i.e., x P y y 1
P x. Behavioral profiles show a
number of properties that we will exploit for similarity analysis. Together with
the reverse strict order relation, the relations of the behavioral profile partition
the Cartesian product of activities [4]. Further, behavioral profiles are computed
lundi 5eciently
septembre 2011
for the class of process models introduced above under the assumption
33
0"3)'0.$45
6"31'*",
#"4(#1'0.$45
6"31'*7,
0"3)'
$3!($-"'*%,
#"8"-1'
(#)"#'*-,
!"%$
!"#$%&'
(#)"#'*+,
%$9"'(#)"#'
*),
0"3)'0.$45
6"31'*",
#"4(#1'0.$45
6"31'*7,
0"3)'
$3!($-"'*%,
+ - ) " % 7
+ :
: : : : :
)
: : ;; ;; ;;
;; :
"
: ;;
%
:
:
7
: ;;
%$9"'(#)"#'
*),
-."-/'
01(-/'*2,
:
!"#$%&'
(#)"#'*+,
:
#"8"-1'
(#)"#'*-,
!"#$
+ 2 - ) " % 7
+ :
2
:
: : : : :
)
: : ;; ;;
;; : ;;
"
: ;; ;; :
%
7
:
:
34
contrast
to pure similarity
ordering features,
strict
ordercf.and
interleaving
order,
thatInallow
for ecient
search ini.e.,
such
a case,
[6,7].
These methods
we do not
relax the exclusiveness
similarity,
since exclusiveness
a very strong
leverage
transitivity
properties exposed
by a distance
functionaiscomplement
to
statement
about
the dependency
and correlation of activities.
the
similarity
of data
objectsa metric.
Definition
1 (Metric).
A metric
is a distance
function d : D D R between
4.3 Aggregated
Metric
for Behavioral
Profiles
objects of domain D with the following properties:
Based
on the elementary
above, we construct an ag Symmetry:
oi , oj Dsimilarity
: d(oi , oj )measures
= d(oj , odefined
i)
gregated
similarity metric
follows.
elementary
Nonnegativity:
oi , oj as
D,
oi = oEach
j : d(o
i , oj ) > 0 similarity translates into an
elementary
BQi), o=j )1=
0sim
,
B
)
for
all
h
{+,
,
||,
, || },
Identity:metric,
oi , ojd
DP: ,d(o
ho(B
=
o
h (B
P
Q
i
j
as explained
in Section 4.2
Then,
up ithese
Triangle inequality:
oi ,and
oj , o[34].
: d(owe
) d(o
, oj ) +values
d(oj , and
ok ) assign a
k D
i , oksum
weight
that
accounts
for S
the
metrics impact on the overall metric.
A
metric
space
is a pair
= respective
(D, d).
We postulate B as the universe of all possible behavioral profiles. In practice
The
triangle inequality
enables
determining
minimum
maximum distances
this matches
the behavioral
profiles
of all models
withinand
a repository.
of two objects oi , ok without calculating it, if their pairwise distance to a third
Definition
10 (Behavioral
Profile Metric).
B =sets,
(B, i.e.,
dB ) is
a metric
space
object
oj is given.
To search eciently
in large data
avoid
comparison
of behavioral
profiles
where
behavioral
profile metric
B can
R isbe
a
B : B some
with
every object,
theB,data
setthe
is split
into partitions,
fromdwhich
metric, during search. In metric spaces, a partition is established by a pivot p D
pruned
anddaB (B
covering
that
= 1 r(p)
wh R
sim
) sphere around p. All data objects oi
P , BQ ) radius
h (Bspans
P , BQ a
within a distance d(oi, hp) r(p) are contained in that sphere, oi T (p).
with
h {+, , ||, , || } and weighting factors wh R, 0 < wh < 1 such that
Similarity
search requires a query q from the same domain of the objects in
wh = 1.
the
h data set, i.e., q D, and a tolerance that expresses how similar the objects
of a valid result may be to qa query radius r(q) R. Fig. 1(a) shows a pivot p
In order
to use this
aggregate
metric
for similarity
search,
has toalong
be shown
with
its covering
radius
r(p) and
all elements
of T (p)
(soliditcircle),
with 35
that
it 2011
is indeed
established
in Definition
lundi
5 septembre
the
query
model aq metric
and itsas
similarity
tolerance
(dotted1.circle) spanned by r(q).
./
*,
*+,-./0-
*.
!#(
!#(
!#'
!#'
!#&
!#&
!#%
!#%
!
!"#$%&%'(
!
!"#$%&%'(
*+
!#$
!"
!#"
!"
!#"
!#8
!#0
!#7
!#)
!#)
!
!#)
!#0
!#1
!#"
!#$
)#$*++
(a)
!#%
!#&
!#'
!#(
)123"14
!#$
!#1
12345360314316
!#)
!#7
!#8
!#"
!#$
!#%
!#&
!#'
!#(
)#$*++
(b)
36