Download as pdf or txt
Download as pdf or txt
You are on page 1of 61

Matthias Kunze

Matthias Weidlich
Mathias Weske

Behavioral Similarity
A Proper Metric

lundi 5 septembre 2011

Abstract

A proper metric to enable efficient similarity


search of process models based on a
behavioral abstraction thereof.

2
lundi 5 septembre 2011

Abstract
What can it be
used for?

A proper metric to enable efficient similarity


search of process models based on a
behavioral abstraction thereof.

2
lundi 5 septembre 2011

Abstract
What is a
proper metric?

What can it be
used for?

A proper metric to enable efficient similarity


search of process models based on a
behavioral abstraction thereof.

2
lundi 5 septembre 2011

Abstract
What is a
proper metric?

What can it be
used for?

A proper metric to enable efficient similarity


search of process models based on a
behavioral abstraction thereof.
How does
it work?

2
lundi 5 septembre 2011

Abstract
What is a
proper metric?

What can it be
used for?

A proper metric to enable efficient similarity


search of process models based on a
behavioral abstraction thereof.
How does
it work?

How well does


it work?
2

lundi 5 septembre 2011

Abstract
What is a
proper metric?

What can it be
used for?

A proper metric to enable efficient similarity


search of process models based on a
behavioral abstraction thereof.
How does
it work?

How well does


it work?
3

lundi 5 septembre 2011

Search in Process Repositories


large process model repositories require

meaningful means for search

find similar models to given model


identify duplicates

Similarity Search: Find all process models within

a given proximity to a given query model.

4
lundi 5 septembre 2011

Efficient Similarity Search


tseach = tcomparison |comparisons|
data preprocessing and reduction

reduce tcomparison

index data structures and algorithms

reduce |comparisons|

traditional indexing: requires transitive

relation, e.g., ordering or coordinates

5
lundi 5 septembre 2011

Process Similarity

[17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27]
lundi 5 septembre 2011

Process Similarity

Existing
work!

[17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27]
lundi 5 septembre 2011

Proper Metric
A metric d is distance function, that is

symmetric, non-negative and obeys the


triangle inequality.

No coordinates or ordering required!

7
lundi 5 septembre 2011

Proper Metric
A metric d is distance function, that is

symmetric, non-negative and obeys the


triangle inequality.

No coordinates or ordering required!

7
lundi 5 septembre 2011

Proper Metric
A metric d is distance function, that is

symmetric, non-negative and obeys the


triangle inequality.

No coordinates or ordering required!

b
a

7
lundi 5 septembre 2011

pace partition T (p) (solid circle) with pivot p and similarity query
Proper
Metric
with
query model
q. (a) Exclusion, (b) Inclusion, (c) Intersection

A metric d is distance function, that is


ation
and
reduction
techniques
need
to
be
applied,
which
often
symmetric, non-negative and obeys the
s for which only pairwise similarity can be computed [3].
triangle inequality.
ast two decades, data structures and algorithms have been studied
cient
searchor
in ordering
such a case,
cf. [6,7]. These method
No similarity
coordinates
required!
vity properties exposed by a distance functiona complement to
f data objectsa metric.
b

Metric). A metric
a is a distance function d : D D R between
cannot be larger
in D with the following properties:
than
a
+
b
oi , oj D : d(oi , oj ) = d(oj , oi )
ty: oi , oj D, oi = oj : d(oi , oj ) > 0
i , oj D : d(oi , oj ) = 0 oi = oj
equality: oi , oj , ok D : d(oi , ok ) d(oi , oj ) + d(oj , ok )
7
lundi 5 septembre 2011

Similarity Search with a Metric


Given a partition that includes a set of models

with p at its center and a search query q.

r(p)
r(q)

8
lundi 5 septembre 2011

Similarity Search with a Metric


Given a partition that includes a set of models

with p at its center and a search query q.


max distance of
any model to p
search
radius

r(p)
r(q)

8
lundi 5 septembre 2011

Exclusion
No model in the partition of p can match q.

r(p)

r(q) + r(p) < d(p, q): A


they cannot be in
other element in T
r(q) r(p) d(p, q): T
r(q)
Fig.q 1(b). Thus, al
be compared with
r(q) + r(p) d(p, q): T
of the objects that
Pruning complete parti
9

lundi 5 septembre 2011

Exclusion
No model in the partition of p can match q.

r(p)

r(q) + r(p) < d(p, q): A


they cannot be in
other element in T
r(q) r(p) d(p, q): T
r(q)
Fig.q 1(b). Thus, al
d(p,q)
be compared with
r(q) + r(p) d(p, q): T
of the objects that
Pruning complete parti
9

lundi 5 septembre 2011

other
element
in
T
Intersection
r(q) r(p) d(p, q): T
Partition of p cannot be excluded
and
must
be
Fig. 1(b). Thus, al
searched exhaustively.
be compared with
r(q) + r(p) d(p, q): T
of the objects that
Pruning complete parti
r(p)
the number of compari
r(q)
ecient
d(p,q)to implement
q
p
the comparison operat
2.2

Process Mode
10

lundi 5 septembre 2011

other
element
in
T
Intersection
r(q) r(p) d(p, q): T
Partition of p cannot be excluded
and
must
be
Fig. 1(b). Thus, al
searched exhaustively.
be compared with
r(q) + r(p) d(p, q): T
of the objects that
Pruning complete parti
r(p)
the number of compari
r(q)
to
implement
ecient
d(p,q)
p
q
the comparison operat
2.2

Process Mode
10

lundi 5 septembre 2011

other
element
in
T
Intersection
r(q) r(p) d(p, q): T
Partition of p cannot be excluded
and
must
be
Fig. 1(b). Thus, al
searched exhaustively.
be compared with
r(q) + r(p) d(p, q): T
of the objects that
Pruning complete parti
r(p)
the number of compari
r(q)
to
implement
ecient
d(p,q)
p
q
the comparison operat
Remember:
No coordinates!

2.2

Process Mode
10

lundi 5 septembre 2011

Efficient Search
If we partition the whole repository, we can

safely exclude partitions from search and


become efficient.

p1

p3

p2
11
lundi 5 septembre 2011

Efficient Search
If we partition the whole repository, we can

safely exclude partitions from search and


become efficient.

p1

p3

p2
11
lundi 5 septembre 2011

Efficient Search
If we partition the whole repository, we can

safely exclude partitions from search and


become efficient.

r(q)

p1

p3

p2
11
lundi 5 septembre 2011

Abstract
a suitable
distance function

reduce number
of comparisons

A proper metric to enable efficient similarity


search of process models based on a
behavioral abstraction thereof.
How does
it work?

How well does


it work?
12

lundi 5 septembre 2011

Behavioral Profile

describes the behavioral relation between any

pair of activities in a process model

is an abstraction of behavior, based on trace

semantics

[4]
lundi 5 septembre 2011

13

behavioral profile of P .
Strict Order

t order, the reverse strict order relati


1
x ABP y y P x. Behavioral pr
any exploit
trace of the
in which
A and B T
we inwill
forprocess,
similarity
analysis.
occur, A happens before B
tion, the relations of the behavioral pro
tivities [4]. Further, behavioral profiles
ocess models introduced above under th
a correctness criterion that guarantees
e [31]. Our notion of a process model tr
a dedicated structural sub-class of Petr
14

lundi 5 septembre 2011

behavioral profile of P .
Strict Order

t order, the reverse strict order relati


1
x ABP y y P x. Behavioral pr
any exploit
trace of the
in which
A and B T
we inwill
forprocess,
similarity
analysis.
occur, A happens before B
tion, the relations of the behavioral
pro
information loss:
CAUSALITY
tivities [4]. Further, behavioral
profiles
ocess models introduced above under th
a correctness criterion that guarantees
e [31]. Our notion of a process model tr
a dedicated structural sub-class of Petr
14

lundi 5 septembre 2011

Exclusiveness Relation
B + C
two activities B and C never occur together in

a trace of the process

15
lundi 5 septembre 2011

Interleaving Order
B || C
else, i.e., B may happen before C and vice

versa

16
lundi 5 septembre 2011

Interleaving Order
B || C
else, i.e., B may happen before C and vice

versa

information loss:
CAUSALITY,
CARDINALITY

16
lundi 5 septembre 2011

Behavioral Profile

describes relation between any pair of activities

in a model

abstraction makes comparison faster


can be (pre-) computed efficiently for sound,

free-choice process models

17
lundi 5 septembre 2011

n 4 (Behavioral Profile). Let P = (


Design of a Metric
pair (x, y) (A A) is in the following
trict order relation P , i x P y and
1
Introdu
xclusiveness relation +P , i x P y an
|AB|
nterleaving
order
relation
||
,
i
x P
P 1 |AB|
Jaccard Distance for sets
d=
A
and
B
measures
their
overlap
,
+
,
||
}
is
the
behavioral
profile of P
P
P
P
compare particular relations of the behavioral

profiles
of two
process order,
models, Pthe
andreverse
Q
pair
(x, y)
in strict
s
1
elementary
e pair
(y, x),similarities
i.e., x (,P+, y||) y P
extended similarities
properties
that we will exploit for sim
e strict order relation, the relations of
[34]

lundi 5 septembre 2011

18

n 4 (Behavioral Profile). Let P = (


Design of a Metric
pair (x, y) (A A) is in the following
proven
metric
trict order relation
toPbe, ai
x P y and
by Lipkus [34]
1
Introdu
xclusiveness relation +P , i x P y an
|AB|
nterleaving
order
relation
||
,
i
x P
P 1 |AB|
Jaccard Distance for sets
d=
A
and
B
measures
their
overlap
,
+
,
||
}
is
the
behavioral
profile of P
P
P
P
compare particular relations of the behavioral

profiles
of two
process order,
models, Pthe
andreverse
Q
pair
(x, y)
in strict
s
1
elementary
e pair
(y, x),similarities
i.e., x (,P+, y||) y P
extended similarities
properties
that we will exploit for sim
e strict order relation, the relations of
[34]

lundi 5 septembre 2011

18

a high similarity sim+ (Bm1 , Bm2 ) = 14


15 0.933.
In most cases, the order of tasks of a business process is reflected by th
order of these activities in the process model. The strict order similarity
to reward a large overlap of two strict order relations, whereas order vio
are penalized.

Strict Order Similarity

reward
of common
activities
in a
Definitionsimilar
6 (Strictordering
Order Similarity).
Let P , Q be
process models

Q the strict
order relations
of their
process
models
P and
Q respective behavioral profiles BP , B
define the Strict Order Similarity
|P Q |
sim (BP , BQ ) = |P Q |

Due to x P y y 1
P x, cf. Section 3, it suces to incorporate o
strict order relation into the similarity measure. The reverse relation is im
covered. This can be seen in the behavioral profile matrix as the stric

19
lundi 5 septembre 2011

a high similarity sim+ (Bm1 , Bm2 ) = 14


15 0.933.
In most cases, the order of tasks of a business process is reflected by th
order of these activities in the process model. The strict order similarity
to reward a large overlap of two strict order relations, whereas order vio
are penalized.

Strict Order Similarity

reward
of common
activities
in a
Definitionsimilar
6 (Strictordering
Order Similarity).
Let P , Q be
process models

Q the strict
order relations
of their
process
models
P and
Q respective behavioral profiles BP , B
define the Strict Order Similarity
|P Q |
sim (BP , BQ ) = |P Q |

Due to x P y y 1
P x, cf. Section 3, it suces to incorporate o
strict order relation into the similarity measure. The reverse relation is im
covered. This can be seen in the behavioral profile matrix as the stric

d=1

|AB|
|AB|

sim (BP , BQ ) =
lundi 5 septembre 2011

|{(A,B),(A,C)}{(A,B),(A,C)}|
|{(A,B),(A,C)}{(A,B),(A,C)}|

=1
19

Fig. 2. Order management process their


models
m1 order
and mrelations,
2 with their
strict
bec
profile matrices Bm1 and Bm2 .
order in m1 , whereas in m2 the
missing in m2 , which yields five
are illustrated by equal labels in both
models,
and are
represented
Bm1
. The strict
order
relationst
indices a to g. The behavioral profiles
(Bm1 , Bnot
are depicted
as m
however,
by that.
T
m2 )aected
6
Exclusiveness
Interleaving
Order
the activities, e.g., a and g are in strict
since
in
all
traces, w
simorder,
(Bm1 , B
)
=
m2
16 = 0.375.
activities appear, a happens before g.
Interleaving order is the we
Similarity
Similarity
Exclusiveness is the strictest relation
of a behavioral
becaust
activities,
since itprofile,
only states
the
absence of aviolations
co-occurrence of two
activities
within
onedegree
process
in
process
instance.
Thus,
the inte

penalizes
rewards
high
corresponding
similarity
quantifies,
how
many
common
pairs
exist
in
pairs,
if
they
are,
e.g.,
executed
i
of exclusiveness
of flexibility
models that feature the same exclusiveness
relation.
flow cycle
in the other.

Elementary Similarities

constraints

e.g. loops
Definition
Definition 5 (Exclusiveness Similarity).
Let7 P(Interleaving
,and
Q be processO

concurrency
||Prespective
, ||Q the interleaving
ord
+P , +Q the exclusiveness relations ofand
their
behavioral profi
BP , BQ . We define the Interleav
We define the Exclusiveness Similarity
| ||P ||Q |
|+P +Q |
sim|| (BP , BQ ) = | ||P ||Q |
sim+ (BP , BQ ) = |+P +Q |

The
process
models models
of Fig. 2ofshF
See, for example, the activities c and
d in
both process
(d,f)
and (f,d).
the scenario, it is prohibitive to reject
an order
and fileThis
thisyields
order si
a
20
all cases
does
the ord
The only dierence between m1 and m2 Not
within
regard
to the
exclusiven

lundi 5 septembre 2011

The process models of Fig. 2 share interleaving order relations for (d,e),(e,d),
(d,f) and (f,d). This yields sim|| (Bm1 , Bm2 ) = 48 = 0.5.
Not in all cases does the ordering of activities in a model correspond to the
actual order of execution of these activities in practice. Their order may simply be
one possible sequence sprung from the habit of the process modeler. To address
such cases, we extend the elementary similarities above to increase the tolerance
of the measures.
The strict order similarity only rewards pairs of activities that are executed in
the same order. If two activities appear both in each trace of two distinct process
pair of activities in interleaving order in one model
models but are executed in inverse order
respectively,
it is plausible
they do
of the
same activities
in a staticthat
sequence
in anothe
not depend on each other, but ratherorder
been
havesupersedes
simply
modeled
in flexibility.
a sequenceHowever
strict
order in
that seemed suitable to the process interleaving
modeler. Thus,
propose the
extended
orderwesimilarity.
Thus,
we suggest the
strict order similarity that will rewardsimilarity,
these pairs
of activities.
which
rewards the containment of strict o

Extended Similarities
Extended Strict Order
Similarity

Extended Interleaving
Order Similarity

rewards strict order,


rewards any order
even if in opposite
unless exclusiveness
directions
is violated
order, accounting
for Let
strictP order
both behaviora
Definition
8 (Extended Strict Order
Similarity).
, Q bein process

1forward and reverse strict o


in relations,
both directions,
i.e.,
models and P , Q the strict order
1
,

Q
P the reverse strict
order relations of their respective behavioral profiles BP , BQ . We define the
Definition 9 (Extended Interleaving Order S
Extended Strict Order Similarity
cess
models and P , Q the strict order relations,
1
|(P 1
)(

)|
Q
sim (BP , BQ ) = |( P1 )( Q
order
relations, ||P , ||Q the interleaving order relat
1
)|
P
Q
P
Q
ioral profiles BP , BQ . We define the Extended Int
1
The above consideration applies to activities f and g in |(
Fig.P
2: 1
They
are
in
strict
||
)(

P
Q
P
Q ||Q ) |
simin
= |(
1
1
|| (B
order in m1 and in reverse strict order
mP2 ., BInQ )the
given
scenario,
it
seems
||
)(

P
P
Q
P
Q ||Q ) |
to be reasonable to assume that there is no explicit order constraint between
As an example, refer to the activities e and f in Fi
them. Thus, these pairs should contribute to the similarity of both models:
order in m1 . In an instance of this process they may21b
14
lundi sim
5 septembre
(Bm1 , Bm2 ) =
0.467.
2011

elementary metric, dh (BP , BQ ) = 1 simh (BP , BQ ) for a


gregated similarity metric as follows. Each e
as
explained
in
Section
4.2
and
[34].
Then,
we
sum
up
th
elementary
metric,
d
(B
,
B
)
=
1

sim
Behavioral Profile Metric h P Q
h
weight that accounts for
the respective
metrics
impact
as explained
in Section
4.2 and [34].
Theno
that accounts
the respective
me
We postulate B as weight
the universe
of allforpossible
behavio
We
postulate
B
as
the
universe
of
all
p
this matches the behavioral profiles of all models within
this matches the behavioral profiles of all
Aggregate the similarities by weighted sum
Definition
10
(Behavioral
Profile
Metric).
B
=
(B
Definition
10 (Behavioral Profile Me
and transform it into
a metric.
of behavioral profiles B,
theprofiles
behavioral
metr
of where
behavioral
B, whereprofile
the behavio
weights determined
through experiments
metric,
metric,
dB (BP , BQ ) = 1 wh simh (BP , BQ
dB (BP , BQ ) = 1
wh simh (BP , BhQ )

h
with
h

{+,
,
||,

,
||
}
and
weighting
f

{+, , ||, , || w}h and
= 1. weighting factors wh R

with
h

wh = 1.
h

In order to use this aggregate metric for


that it is indeed a metric as established in

In order to use this aggregate metric for similarity sear


Theorem
1. The weighted
sum D(oi , o1.
)
that it is indeed a metric
as established
in Definition
j22
lundi 5 septembre 2011

rics d is a metric if h [1...n] : w R

elementary metric, dh (BP , BQ ) = 1 simh (BP , BQ ) for a


gregated similarity metric as follows. Each e
as
explained
in
Section
4.2
and
[34].
Then,
we
sum
up
th
elementary
metric,
d
(B
,
B
)
=
1

sim
Behavioral Profile Metric h P Q
h
weight that accounts for
the respective
metrics
impact
as explained
in Section
4.2 and [34].
Theno
that accounts
the respective
me
We postulate B as weight
the universe
of allforpossible
behavio
We
postulate
B
as
the
universe
of
all
p
this matches the behavioral profiles of all models within
this matches the behavioral profiles of all
Aggregate the similarities by weighted sum
Definition
10
(Behavioral
Profile
Metric).
B
=
(B
Definition
10 (Behavioral Profile Me
and transform it into
a metric.
of behavioral profiles B,
theprofiles
behavioral
metr
of where
behavioral
B, whereprofile
the behavio
weights determined
through experiments
metric,
metric,
dB (BP , BQ ) = 1 wh simh (BP , BQ
dB (BP , BQ ) = 1
wh simh (BP , BhQ )

h
with
h

{+,
,
||,

,
||
}
and
weighting
f

{+, , ||, , || w}h and
= 1. weighting factors wh R

with
h

wh = 1.
h

In orderproof
to usethat
this this
aggregate
metric
for
is a metric
that it iscan
indeed
aound
metric
as established
in
be f
in the
paper

In order to use this aggregate metric for similarity sear


Theorem
1. The weighted
sum D(oi , o1.
)
that it is indeed a metric
as established
in Definition
j22
lundi 5 septembre 2011

rics d is a metric if h [1...n] : w R

Abstract
a suitable
distance function

reduce number
of comparisons

A proper metric to enable efficient similarity


search of process models based on a
behavioral abstraction thereof.
overlap in behavioral
profile relations

How well does


it work?
23

lundi 5 septembre 2011

Evaluation
SAP Reference Model (604 EPC)
human assessment of similarity among a

subset of pairs provided by Dijkman et al. [2]

identified matching nodes by string edit

distance of their labels

measured precision of metrics for different

recall levels

24
lundi 5 septembre 2011

Search Quality
single metrics

So

In

with certain weights

ES

equal weights

EI

1,00

1,00

0,75

0,75

Precision

Precision

Ex

aggregated metric

0,50

0,50

0,25

0,25

0
0

0,1

0,2

0,3

0,4

0,5

Recall

0,6

0,7

0,8

0,9

1Ex-4ES

0,1

0,2

0,3

0,4

0,5

0,6

0,7

0,8

0,9

Recall

25
lundi 5 septembre 2011

Search Performance
built metric tree index and searched for

10 most similar process models

bpd_median

query time (ms)

400

300

200

100

0
20

80

140

200

260

320

dataset size
lundi 5 septembre 2011

380

440

500

26

Search Performance
built metric tree index and searched for

10 most similar process models

Beyond the
paper
bpd_median

query time (ms)

400

300

200

100

0
20

80

140

200

260

320

dataset size
lundi 5 septembre 2011

380

440

500

26

Search Performance
built metric tree index and searched for

10 most similar process models

saved up to 90% of |comparisons|

Beyond the
paper

compared to linear search

linear search

bpd_median

bpd_median

100

number of comparisons

400

query time (ms)

bpd_max

300

200

100

75

50

25

0
20

80

140

200

260

320

dataset size
lundi 5 septembre 2011

380

440

500

20

80

140

200

260

320

dataset size

380

440

500

26

Limitations

efficiency is dependent of distance distribution

among model collection

limited support for behavioral excerpts/

subgraphs

27
lundi 5 septembre 2011

Conclusions

A proper metric to enable efficient similarity


search of process models based on a
behavioral abstraction thereof.

lundi 5 septembre 2011

Conclusions
reduce number
of comparisons

A proper metric to enable efficient similarity


search of process models based on a
behavioral abstraction thereof.

lundi 5 septembre 2011

Conclusions
b
a

a suitable
distance function

reduce number
of comparisons

A proper metric to enable efficient similarity


search of process models based on a
behavioral abstraction thereof.

lundi 5 septembre 2011

Conclusions
b
a

a suitable
distance function

reduce number
of comparisons

A proper metric to enable efficient similarity


search of process models based on a
behavioral abstraction thereof.
overlap in behavioral
profile relations

lundi 5 septembre 2011

Conclusions
b
a

a suitable
distance function

reduce number
of comparisons

A proper metric to enable efficient similarity


search of process models based on a
behavioral abstraction thereof.
overlap in behavioral
profile relations

provides good quality


and performance
400

1,00
0,75

300

0,50

200

0,25

100

0
0

lundi 5 septembre 2011

0,2

0,4

0,6

0,8

20 100 180 260 340 420 500

References
1. Rosemann, M.: Potential pitfalls of process modeling: part B. Business Process
Management Journal 12(3) (2006) 377-384
2. Dijkman, R., Dumas, M., van Dongen, B., Kaarik, R., Mendling, J.: Similarity of
business process models: Metrics and evaluation. Inf. Syst. 36(2) (2011) 498-516
3. Zezula, P., Amato, G., Dohnal, V., Batko, M.: Similarity Search: The Metric Space
Approach. Springer-Verlag New York, Inc., Secaucus, NJ, USA (2005)
4. Weidlich, M., Mendling, J., Weske, M.: Efficient Consistency Measurement based on
Behavioural Profiles of Process Models. IEEE Trans. Softw. Eng. (2011)
5. Kunze, M., Weidlich, M., Weske, M.: m3 - A Behavioral Similarity Metric for Business
Processes. In: ZEUS'11. CEUR-WS. (2011) 89-95
6. Ch'avez, E., Navarro, G., Baeza-Yates, R., Marroqu'in, J.L.: Searching in Metric Spaces.
ACM Comput. Surv. 33(3) (2001) 273-321
7. Hjaltason, G.R., Samet, H.: Index-driven similarity search in metric spaces (Survey Article).
ACM Trans. Database Syst. 28(4) (2003) 517-580
8. Dumas, M., Garc'ia-Ba~nuelos, L., Dijkman, R.M.: Similarity Search of Business Process
Models. IEEE Data Eng. Bull. 32(3) (2009) 23-28
9. Rahm, E., Bernstein, P.A.: A survey of approaches to automatic schema matching. VLDB
J. 10(4) (2001) 334-350
10. Euzenat, J., Shvaiko, P.: Ontology matching. Springer-Verlag, Heidelberg (DE) (2007)
11. Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions, and reversals.
Soviet Physics Doklady 10(8) (1966) 707-710
12. Salton, G., Wong, A., Yang, C.S.: A Vector Space Model for Automatic Indexing.
Commun. ACM 18(11) (1975) 613-620
13. Manning, C.D., Sch"atze, H.: Foundations of Statistical Natural Language Processing.
The MIT Press (1999)
14. Miller, G.A.: WordNet: A Lexical Database for English. Commun. ACM 38(11) (1995)
39-41
15. Bunke, H., Allermann, G.: Inexact graph matching for structural pattern recognition.
Pattern Recognition Letters 1(4) (1983) 245-253
16. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NPCompleteness. W. H. Freeman (1979)
17. Dijkman, R.M., Dumas, M., Garc'ia-Ba~nuelos, L.: Graph Matching Algorithms for
Business Process Model Similarity Search. In: BPM. Volume 5701 of LNCS, Springer (2009)
48-63
18. Li, C., Reichert, M., Wombacher, A.: On Measuring Process Model Similarity Based on
High-Level Change Operations. In: ER. Volume 5231 of LNCS, Springer (2008) 248-264
19. Yan, Z., Dijkman, R.M., Grefen, P.: Fast Business Process Similarity Search with FeatureBased Similarity Estimation. In: OTM '10, Volume 6426 of LNCS, Springer (2010) 60-77
20. Wombacher, A., Rozie, M.: Evaluation of Workflow Similarity Measures in Service
Discovery. In: Service Oriented Electronic Commerce. Volume 80 of LNI., GI (2006)
51-71
21. Nejati, S., Sabetzadeh, M., Chechik, M., Easterbrook, S., Zave, P.: Matching and
Merging of Statecharts Specifications. In: ICSE '07, Washington, DC, USA, IEEE
Computer Society (2007) 54-64

22. Eshuis, R., Grefen, P.: Structural Matching of BPEL Processes. In: ECOWS '07,
Washington, DC, USA, IEEE Computer Society (2007) 171-180
23. van Dongen, B., Dijkman, R., Mendling, J.: Measuring Similarity between Business
Process Models. In: CAISE '08, Volume 5074 of LNCS, Springer (2008) 450-464
24. Minor, M., Tartakovski, A., Bergmann, R.: Representation and Structure-Based
Similarity Assessment for Agile Workflows. In: ICCBR '07 Volume 4626 of LNCS,
Springer (2007) 224-238
25. Ehrig, M., Koschmider, A., Oberweis, A.: Measuring Similarity between Semantic
Business Process Models. In: APCCM '07 Australian Computer Society, Inc. (2007)
71-80
26. van der Aalst, W., de Medeiros, A.A., Weijters, A.: Process Equivalence: Comparing
Two Process Models Based on Observed Behavior. In: BPM Volume 4102 of LNCS,
Springer (2006) 129-144
27. Lu, R., Sadiq, S.: On the Discovery of Preferred Work Practice Through Business
Process Variants. In: ER' 07, Volume 4801 of LNCS, Springer (2007) 165-180
28. Kunze, M., Weske, M.: Metric Trees for Efficient Similarity Search in Process Model
Repositories. In: IW-PL '10, Hoboken, NJ (September 2010)
29. Vanhatalo, J., V"olzer, H., Leymann, F., Moser, S.: Automatic Workflow Graph Refactoring
and Completion. In: ICSOC. Volume 5364 of LNCS, Springer (2008) 100-115
30. Lohmann, N., Verbeek, E., Dijkman, R.M.: Petri Net Transformations for Business
Processes - A Survey. T. Petri Nets and Other Models of Concurrency 2 (2009) 46-63
31. van der Aalst, W.: Workflow Verification: Finding Control-Flow Errors Using Petri-NetBased Techniques. In: BPM '09, Volume 1806 of LNCS, Springer (2000) 161-183
32. Weidlich, M., Polyvyanyy, A., Mendling, J., Weske, M.: Efficient Computation of Causal
Behavioural Profiles Using Structural Decomposition. In:Petri Nets. Volume 6128 of LNCS,
Springer (2010) 63-83
33. Weidlich, M., Mendling, J.: Perceived Consistency between Process Models. Inf. Syst.
(2011) In Press.
34. Lipkus, A.: A Proof of the Triangle Inequality for the Tanimoto Distance. Journal of
Mathematical Chemistry 26 (1999) 263-265
35. Decker, G., Mendling, J.: Process instantiation. Data Knowl. Eng. 68(9) (2009) 777-792
36. Buckley, C., Voorhees, E.M.: Evaluating evaluation measure stability. In: SIGIR '00.
(2000) 33-40
37. Awad, A.: BPMN-Q: A Language to Query Business Processes. In: EMISA. Volume P-119
of LNI., GI (2007) 115-128
38. Beeri, C., Eyal, A., Kamenkovich, S., Milo, T.: Querying Business Processes. In: VLDB
'06VLDB Endowment (2006) 343-354
39. Choi, I., Kim, K., Jang, M.: An XML-based process repository and process query
language for integrated process management. Knowledge and Process Management 14(4)
(2007) 303-316
40. Jin, T., Wang, J., Wu, N., Rosa, M.L., ter Hofstede, A.H.M.: Efficient and Accurate
Retrieval of Business Process Models through Indexing - (Short Paper). In: OTM '10, Volume
6426 of LNCS, Springer (2010) 402-409

29
lundi 5 septembre 2011

! !

% %

% &#!'%$((((
!
&#!'%$((((
! %

"#!

"#!
$

(b)

%
!
&#!'%$((((
!
%

&#!'%$

(a)

"#%
"#%
$

$
"#%
$

"#%
$
"#%
$

&#!'%$((((

"#%

$ "#!
$

"#%
$

"#!

$
"#!

"#!
$

$
"#!

((((

"#%
$

Metric Space Partitions

"#!

&#!'%$

(c)

Fig. 1. Metric space partition T (p) (solid circle) with pivot p and similarity query
(dotted circle) with query model q. (a) Exclusion, (b) Inclusion, (c) Intersection.
data transformation and reduction techniques need to be applied, which often
result in objects for which only pairwise similarity can be computed [3].
Within the last two decades, data structures and algorithms have been studied
that allow for ecient similarity search in such a case, cf. [6,7]. These methods

lundi 5 septembre 2011

30

Process Model Similarity


Table 1. Overview of process model similarities
Approach
Minor et al. [24]
Li et al. [18]
Ehrig et al. [25]
Dijkman et al. [17]
Eshuis and Grefen [22]
Aalst et al. [26]
Wombacher and Rozie [20]
Lu and Sadiq [27]
Dongen et al. [23]
Nejati et al. [21]
Yan et al. [19]

Aspect
Structure
Structure
Structure
Structure
Behavior
Behavior
Behavior
Structure
Behavior
Behavior
Structure

Symmetry Nonnegativity Identity Triangle


inequality
yes
yes
yes
yes
yes
no
yes
yes
yes
yes
yes

yes
yes
yes
yes
yes
yes
yes
yes
yes
yes
yes

yes
yes
yes
yes
yes
yes
yes
yes
yes
yes
yes

no
no
no
yes
yes
no
no
yes
no
no
no

Further, edit distances have been defined for the behavior of a process model based
on all traces, an automaton encoding the language, or an n-gram representation of
all traces [20]. If the process behavior is defined by a transition system, the degree
lundi 5 septembre 2011

31

Process Model Similarity


20. Wombacher, A., Rozie, M.: Evaluation of Workflow Similarity Measures in Service

Discovery. In: Service Oriented Electronic Commerce. Volume 80 of LNI., GI (2006) 51-71
21. Nejati, S., Sabetzadeh, M., Chechik, M., Easterbrook, S., Zave, P.: Matching and

Merging of Statecharts Specifications. In: ICSE '07, Washington, DC, USA, IEEE Computer
Society (2007) 54-64
22. Eshuis, R., Grefen, P.: Structural Matching of BPEL Processes. In: ECOWS '07,

Washington, DC, USA, IEEE Computer Society (2007) 171-180


23. van Dongen, B., Dijkman, R., Mendling, J.: Measuring Similarity between Business

Process Models. In: CAISE '08, Volume 5074 of LNCS, Springer (2008) 450-464
24. Minor, M., Tartakovski, A., Bergmann, R.: Representation and Structure-Based

Similarity Assessment for Agile Workflows. In: ICCBR '07 Volume 4626 of LNCS, Springer
(2007) 224-238
25. Ehrig, M., Koschmider, A., Oberweis, A.: Measuring Similarity between Semantic

Business Process Models. In: APCCM '07 Australian Computer Society, Inc. (2007) 71-80
26. van der Aalst, W., de Medeiros, A.A., Weijters, A.: Process Equivalence: Comparing

Two Process Models Based on Observed Behavior. In: BPM Volume 4102 of LNCS,
Springer (2006) 129-144
27. Lu, R., Sadiq, S.: On the Discovery of Preferred Work Practice Through Business

Process Variants. In: ER' 07, Volume 4801 of LNCS, Springer (2007) 165-180
32
lundi 5 septembre 2011

on common Petri net-based formalizations [30].


Our metric relies on behavioral profiles as a behavioral abstraction. These
profiles capture behavioral characteristics of a process model by means of relations
between activity pairs. These relations are grounded in weak order, which holds
between two activities if both are observed in a trace in a certain order.

Behavioral Profile (formally)

Definition 3 (Weak Order). Let P = (A, s, e, C, N, F, T ) be a process model


and TP its set of traces. The weak order relation P (A A) contains all
pairs (x, y), such that there exists a trace = a1 , a2 , . . . in TP and there exist
two indices j, k {1, 2, . . .} with j < k for which holds aj = x and ak = y.
Using the notion of weak order, we define the behavioral profile as follows.

Definition 4 (Behavioral Profile). Let P = (A, s, e, C, N, F, T ) be a process


model. A pair (x, y) (A A) is in the following relations:
The strict order relation P , i x P y and y P x.
The exclusiveness relation +P , i x P y and y P x.
The interleaving order relation ||P , i x P y and y P x.
BP = {P , +P , ||P } is the behavioral profile of P .

For each pair (x, y) in strict order, the reverse strict order relation comprises
the inverse pair (y, x), i.e., x P y y 1
P x. Behavioral profiles show a
number of properties that we will exploit for similarity analysis. Together with
the reverse strict order relation, the relations of the behavioral profile partition
the Cartesian product of activities [4]. Further, behavioral profiles are computed
lundi 5eciently
septembre 2011
for the class of process models introduced above under the assumption

33

Behavioral Profile (Example)

0"3)'0.$45
6"31'*",

#"4(#1'0.$45
6"31'*7,
0"3)'
$3!($-"'*%,

#"8"-1'
(#)"#'*-,

!"%$
!"#$%&'
(#)"#'*+,

%$9"'(#)"#'
*),
0"3)'0.$45
6"31'*",

#"4(#1'0.$45
6"31'*7,

0"3)'
$3!($-"'*%,

+ - ) " % 7
+ :     
: : : : :
)
: : ;; ;; ;;
;; :  
"
: ;;
%
:
 :
7
: ;;



%$9"'(#)"#'
*),

-."-/'
01(-/'*2,



: 



!"#$%&'
(#)"#'*+,

  
 
 
:  
 
 


#"8"-1'
(#)"#'*-,

!"#$

+ 2 - ) " % 7
+ :      
2
:     
: : : : :
)
: : ;; ;; 
;; : ;; 
"
: ;; ;; : 
%
7
:
:

Fig. 2. Order management process models m1 and m2 with their behavioral


profile matrices Bm1 and Bm2 .
are illustrated by equal labels in both models, and are represented through the
indices a to g. The behavioral profiles (Bm1 , Bm2 ) are depicted as matrices over
the activities, e.g., a and g are in strict order, since in all traces, where both
activities appear, a happens before g.
lundi 5 septembre 2011

34

contrast
to pure similarity
ordering features,
strict
ordercf.and
interleaving
order,
thatInallow
for ecient
search ini.e.,
such
a case,
[6,7].
These methods
we do not
relax the exclusiveness
similarity,
since exclusiveness
a very strong
leverage
transitivity
properties exposed
by a distance
functionaiscomplement
to
statement
about
the dependency
and correlation of activities.
the
similarity
of data
objectsa metric.

Behavioral Profile Metric (formally)

Definition
1 (Metric).
A metric
is a distance
function d : D D R between
4.3 Aggregated
Metric
for Behavioral
Profiles
objects of domain D with the following properties:
Based
on the elementary
above, we construct an ag Symmetry:
oi , oj Dsimilarity
: d(oi , oj )measures
= d(oj , odefined
i)
gregated
similarity metric
follows.
elementary
Nonnegativity:
oi , oj as
D,
oi = oEach
j : d(o
i , oj ) > 0 similarity translates into an

elementary
BQi), o=j )1=
0sim
,
B
)
for
all
h

{+,
,
||,

, || },
Identity:metric,
oi , ojd
DP: ,d(o
ho(B
=
o
h (B
P
Q
i
j
as explained
in Section 4.2
Then,
up ithese
Triangle inequality:
oi ,and
oj , o[34].
: d(owe
) d(o
, oj ) +values
d(oj , and
ok ) assign a
k D
i , oksum
weight
that
accounts
for S
the
metrics impact on the overall metric.
A
metric
space
is a pair
= respective
(D, d).
We postulate B as the universe of all possible behavioral profiles. In practice
The
triangle inequality
enables
determining
minimum
maximum distances
this matches
the behavioral
profiles
of all models
withinand
a repository.
of two objects oi , ok without calculating it, if their pairwise distance to a third
Definition
10 (Behavioral
Profile Metric).
B =sets,
(B, i.e.,
dB ) is
a metric
space
object
oj is given.
To search eciently
in large data
avoid
comparison
of behavioral
profiles
where
behavioral
profile metric
B can
R isbe
a
B : B some
with
every object,
theB,data
setthe
is split
into partitions,
fromdwhich
metric, during search. In metric spaces, a partition is established by a pivot p D
pruned

anddaB (B
covering
that
= 1 r(p)
wh R
sim
) sphere around p. All data objects oi
P , BQ ) radius
h (Bspans
P , BQ a
within a distance d(oi, hp) r(p) are contained in that sphere, oi T (p).
with
h {+, , ||, , || } and weighting factors wh R, 0 < wh < 1 such that
Similarity
search requires a query q from the same domain of the objects in
wh = 1.
the
h data set, i.e., q D, and a tolerance that expresses how similar the objects
of a valid result may be to qa query radius r(q) R. Fig. 1(a) shows a pivot p
In order
to use this
aggregate
metric
for similarity
search,
has toalong
be shown
with
its covering
radius
r(p) and
all elements
of T (p)
(soliditcircle),
with 35
that
it 2011
is indeed
established
in Definition
lundi
5 septembre
the
query
model aq metric
and itsas
similarity
tolerance
(dotted1.circle) spanned by r(q).

Measures in the paper


,-

./

*,

*+,-./0-

*.

!#(

!#(

!#'

!#'

!#&

!#&

!#%

!#%

!
!"#$%&%'(

!
!"#$%&%'(

*+

!#$
!"
!#"

!"
!#"
!#8

!#0

!#7

!#)

!#)

!
!#)

!#0

!#1

!#"

!#$

)#$*++

(a)

!#%

!#&

!#'

!#(

)123"14

!#$

!#1

12345360314316

!#)

!#7

!#8

!#"

!#$

!#%

!#&

!#'

!#(

)#$*++

(b)

Fig. 3. (a) Precision-recall curve for metrics based on elementary similarities,


(b) precision-recall curve for a baseline metric, and two aggregated metrics
of 0.6), two elements are considered to be a matching pair. This proves very

lundi 5 septembre 2011

36

You might also like