Visual Notation Design 2.0: Towards User-Comprehensible RE Notations
Daniel L. Moody
University of Luxembourg
Luxembourg
email: patrice.caire@uni.lu
I. INTRODUCTION
Designing notations that business stakeholders can understand is one of the most difficult practical problems and greatest research challenges in the RE field. Requirements engineering is, to a large extent, a communication problem: its success depends critically on effective communication between business analysts and end users (customers). For this reason, the Holy Grail for RE researchers and practitioners has always been to develop modelling notations that naïve users can understand.
Empirical studies show that we have been spectacularly unsuccessful in doing this: both field and laboratory studies show that naïve users understand RE models very poorly and that most analysts do not even show models to their customers [13,14,31,40]. One reason this problem is so intractable is that it is extraordinarily difficult for experts to think like novices, a problem called the curse of knowledge [12]. Also, members of the target audience (naïve users) are rarely involved in the notation design process, or even in acceptance testing prior to release.
[Figure: the semantic transparency continuum. At one end, semantically perverse symbols (false mnemonics) have an appearance that suggests a different or opposite meaning (e.g. a green light for STOP); in the middle, semantically opaque symbols (conventional) have an arbitrary appearance (e.g. a rectangle for Class); at the other end, semantically transparent symbols (mnemonics) have an appearance that suggests their correct meaning (e.g. a stick figure for Person).]
At the negative end, semantic perversity means a novice reader would be likely to guess an incorrect meaning from the symbol's appearance (e.g. a green light for stop). Such symbols require the most effort to learn and remember, as they require unlearning the familiar meaning.
Semantic transparency formalises subjective notions like
intuitiveness or naturalness that are often used informally when discussing visual notations.
B. Operationalising semantic transparency
Semantic transparency is defined as the extent to which a novice reader can infer the meaning of a symbol from its appearance alone [27]. However, semantic transparency is typically evaluated subjectively: experts (researchers, notation designers) try to estimate the likelihood that novices will be able to infer the meanings of particular symbols [e.g. 28,29]. Experts are poorly qualified to do this because it is difficult for them to think like novices (cf. the curse of knowledge). This paper defines a way of empirically measuring (operationalising [8]) semantic transparency, which provides a way of objectively resolving such issues.
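As a sketch of what such an empirical measurement could look like, the snippet below computes a guess-corrected transparency score from blind-interpretation responses. The correction formula and function name are illustrative assumptions; the paper's exact definition of the semantic transparency coefficient is not reproduced in this excerpt.

```python
def transparency_score(hits, n_participants, n_candidates):
    """Guess-corrected interpretation score (an illustrative assumption,
    not the paper's exact coefficient):
      ~0 -> chance-level guessing (semantically opaque)
      >0 -> better than chance    (semantically transparent)
      <0 -> worse than chance     (semantically perverse)
    """
    chance = 1.0 / n_candidates          # probability of a correct blind guess
    observed = hits / n_participants     # observed proportion correct (hit rate)
    return (observed - chance) / (1.0 - chance)

# 9 candidate i* concepts per page, as in the experimental materials.
print(round(transparency_score(hits=58, n_participants=100, n_candidates=9), 2))  # 0.53
```

The correction matters because, with 9 candidate meanings on the answer sheet, a purely opaque symbol would still collect about 11% "correct" responses by chance.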
C. Visual Notation Design 2.0
Until now, the design of RE visual notations has been the
exclusive domain of technical experts (e.g. researchers,
members of OMG technical committees). Even when notations are specifically designed for communicating with business stakeholders, members of the target audience are rarely
involved. For example, BPMN 2.0 is a notation designed for
communicating with business stakeholders, yet no business
representatives were involved in the notation design process
and no testing was conducted with them prior to its release
[35,41]. In the light of this, it is perhaps no surprise that RE
notations are understood so poorly by business stakeholders:
this is analogous to building a software system without involving end users or conducting user acceptance testing
prior to its release, which would be a recipe for disaster.
Web 2.0 involves a radical change in the dynamics of
content creation on the web, where end users can contribute
their own content rather than being passive consumers [32].
For example, Threadless is a T-shirt company that does not
have its own designers but allows customers to submit their
own designs, which are voted on by other customers: the
most popular designs are then put into production. In this
paper, we apply the Web 2.0 philosophy to designing RE
visual notations. We define a process for actively involving naïve users in the notation design process as co-developers (prosumers) rather than as passive consumers.
D. Research objectives
The broad research questions addressed by this paper are:
RQ1. How can we objectively measure the semantic
transparency of visual notations?
RQ2. How can we improve the semantic transparency of
visual notations?
RQ3. Can novices design more semantically transparent
symbols than experts?
II. PREVIOUS RESEARCH
[Figure: i* symbols for the concepts Agent, Goal, Position, Resource, Role, Softgoal and Task.]
III. RESEARCH DESIGN
The research design consists of 5 related empirical studies (4 experiments and 1 non-reactive study):
1. Symbolisation experiment: naïve participants generated symbols for i* concepts, a task normally reserved for experts.
2. Stereotyping analysis (non-reactive study): we identified the most common symbols produced for each i* concept. This defined the stereotype symbol set.
3. Prototyping experiment: naïve participants identified the best representations for each i* concept. This defined the prototype symbol set.
4. Semantic transparency experiment: we evaluated the ability of naïve participants to infer the meanings of novice-designed symbols (the stereotype and prototype symbol sets) compared to expert-designed symbols (standard i* and PoN i*).
5. Recognition experiment: we evaluated the ability of naïve participants to learn and remember symbols from the 4 symbol sets.
The research design combines quantitative and qualitative research methods, thus providing triangulation of method [20]: Studies 1, 2 and 3 use qualitative methods, while Studies 4 and 5 use quantitative methods.
IV. SYMBOLISATION EXPERIMENT
In this experiment, we asked naïve participants to generate symbols for i* concepts. To do this, we used the sign production technique, developed by Howell and Fuchs [16] to design military intelligence symbols. This involves asking members of the target audience (those who will be interpreting diagrams) to generate symbols to represent concepts.
A. Participants
In previous research, semantic transparency has almost always been evaluated by experts, who are poorly qualified to do this: the definition of semantic transparency [27] refers to novice readers, which they most certainly are not. For this reason, we used naïve participants in this experiment. There were 65 participants, undergraduate students in Interpretation and Translation from the Haute École Marie Haps-Bruxelles or Accountancy from the Haute École Robert Schuman-Libramont. As in Studies 1 and 3, the participants had no previous knowledge of goal modelling or i*.
B. Experimental design
A 4-group, post-test-only experimental design was used, with 1 active between-groups factor (symbol set). There were four experimental groups, corresponding to different levels of the independent variable (symbol set):
1. Standard i* (Fig. 2): produced by experts using intuition (unselfconscious design [1]).
2. PoN i* (Fig. 3): produced by experts using explicit
principles (selfconscious design [1]).
3. Stereotype i* (Fig. 4): the most common symbols produced by novices.
4. Prototype i* (Fig. 5): the best symbols produced by
novices (as judged by other novices).
The dependent variables were hit rate and the semantic transparency coefficient. The levels of the independent variable enable comparisons between unselfconscious and selfconscious design (group 1 vs 2), experts and novices (groups 1+2 vs 3+4) and stereotyping and prototyping (group 3 vs 4), and so provide the basis for answering RQ3 and RQ6 plus one additional question:
RQ11. Does prototyping result in more semantically transparent symbols than stereotyping?
C. Materials
4 sets of materials were prepared, one for each symbol
set. As in all previous experiments, the first page was used to
ask the screening question and collect demographic data. The
remaining 9 pages were used to evaluate the semantic transparency of the symbols. Each symbol was displayed at the top of the page (the stimulus) and the complete set of i* constructs and definitions was displayed in a table below (the candidate responses). Participants were asked to indicate which construct they thought most likely corresponded to the symbol. Both the order in which the symbols were presented and the order in which the concepts were listed on each page were randomised to avoid sequence effects.
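A minimal sketch of this randomisation step, assuming each participant receives a booklet with one page per symbol (the function and variable names are hypothetical):

```python
import random

I_STAR_CONCEPTS = ["Actor", "Agent", "Belief", "Goal", "Position",
                   "Resource", "Role", "Softgoal", "Task"]

def build_booklet(symbol_ids, concepts, seed):
    """One participant's materials: symbols in random presentation order,
    with an independently shuffled concept list on every page."""
    rng = random.Random(seed)              # per-participant seed -> reproducible
    pages = list(symbol_ids)
    rng.shuffle(pages)                     # randomise symbol order across pages
    return [(sym, rng.sample(concepts, len(concepts))) for sym in pages]

booklet = build_booklet(I_STAR_CONCEPTS, I_STAR_CONCEPTS, seed=1)
```

Giving each participant a different seed varies both orders between participants while keeping each booklet reproducible, matching the sequence-effect control described above.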
D. Procedure
Participants were randomly assigned to experimental
groups and provided with a copy of the experimental materials. They were instructed to work alone and not discuss their
responses with other participants. No time limit was set but
subjects took 10-15 minutes to complete the task.
E. Hypotheses
Given that sign production studies consistently show that
symbols produced by novices are more accurately interpreted
than those produced by experts, we predicted that the stereotype and prototype symbol sets would outperform the standard i* and PoN symbol sets. We also predicted that the prototype symbol set would outperform the stereotype set as it
represents the best drawings rather than just the most common ones. Finally, we predicted that the PoN symbol set
would outperform the standard i* symbol set as it was designed based on explicit principles rather than intuition. This
results in a total ordering of the symbol sets:
Prototype > Stereotype > PoN > Standard i*
This corresponds to 12 separate hypotheses: all possible
comparisons between groups on both dependent variables.
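The count of 12 follows directly: C(4,2) = 6 pairwise group comparisons, each tested on two dependent variables. A quick enumeration (with symbol-set names abbreviated) confirms it:

```python
from itertools import combinations

# Symbol sets in the predicted order, best first (Prototype > Stereotype > PoN > Standard).
SYMBOL_SETS = ["Prototype", "Stereotype", "PoN", "Standard"]
DEPENDENT_VARS = ["hit rate", "semantic transparency coefficient"]

# Each pair (predicted better, predicted worse) on each dependent variable:
# C(4,2) * 2 = 12 hypotheses.
hypotheses = [(better, worse, dv)
              for better, worse in combinations(SYMBOL_SETS, 2)
              for dv in DEPENDENT_VARS]
print(len(hypotheses))  # 12
```

Because SYMBOL_SETS is listed in the predicted total order, each generated pair already reads as a directional hypothesis (first element predicted to outperform the second).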
F. Results
1) Statistical significance vs practical meaningfulness
In interpreting empirical results, it is important to distinguish between statistical significance and practical meaningfulness [6]. Statistical significance is measured by the p-value: the probability that the result could have occurred by chance. However, significance testing only provides a binary (yes/no) answer as to whether there is a difference (and whether hypotheses should be accepted or rejected), without providing any information about how large the difference is [5,6]. Using large enough sample sizes, it is possible to achieve statistical significance for differences that have little or no practical relevance. Effect size (ES) provides a way of measuring the size of differences and has been suggested as an index of practical meaningfulness [37]. Statistical significance is most important in theoretical work, while effect size is most important for applied research that addresses practical problems [5].
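This distinction can be made concrete with Cohen's d, the effect-size index used for the group comparisons reported below; the sketch uses only the standard library, and the group data shown is hypothetical:

```python
from statistics import mean, stdev

def cohens_d(a, b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5

def effect_label(d):
    """Cohen's conventional benchmarks: 0.2 small, 0.5 medium, 0.8 large."""
    d = abs(d)
    return ("large" if d >= 0.8 else
            "medium" if d >= 0.5 else
            "small" if d >= 0.2 else "negligible")

# Hypothetical per-participant scores for two symbol sets.
stereotype = [0.6, 0.7, 0.8]
standard = [0.1, 0.2, 0.3]
print(effect_label(cohens_d(stereotype, standard)))  # large
```

Unlike a p-value, d does not shrink toward triviality as the sample grows: it answers "how big is the difference?", not "is there one?".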
2) Hit rate
The traditional way of measuring the comprehensibility of graphical symbols [17,19] is to measure hit rates (the percentage of correct responses). Only 6 out of the 36 symbols meet the ISO threshold for comprehensibility of 67% [9], with 5 of these from the stereotype symbol set (TABLE I).
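As a sketch, the hit-rate computation and the ISO threshold check look like this (the response data below is hypothetical):

```python
ISO_THRESHOLD = 0.67   # ISO criterion: at least 67% of respondents interpret correctly

def hit_rate(responses, correct):
    """The traditional comprehensibility measure: proportion of correct responses."""
    return sum(1 for r in responses if r == correct) / len(responses)

# Hypothetical blind-interpretation responses for two symbols (10 participants each).
data = {
    "stereotype_goal": (["Goal"] * 8 + ["Task", "Softgoal"], "Goal"),
    "standard_goal":   (["Goal"] * 2 + ["Belief"] * 5 + ["Resource"] * 3, "Goal"),
}
passes_iso = {name: hit_rate(resp, correct) >= ISO_THRESHOLD
              for name, (resp, correct) in data.items()}
print(passes_iso)  # {'stereotype_goal': True, 'standard_goal': False}
```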
TABLE I. HIT RATE ANALYSIS
(GREEN = ABOVE ISO THRESHOLD; UNDERLINE = BEST)

                    Standard     PoN           Stereotype     Prototype
Actor                -0.31        0.30          0.58           0.37
Agent                -0.19        0.30          0.39           0.30
Belief                0.25        0.37          0.83           0.23
Goal                 -0.07        0.23          0.45           0.23
Position             -0.19       -0.30          0.33           0.44
Resource             -0.25        0.44          0.64           0.30
Role                 -0.19        0.37          0.64           0.37
Softgoal              0.44       -0.16          0.64           0.44
Task                 -0.13        0.79          0.64           0.44
Mean                 -0.07        0.26          0.57           0.34
Std dev               0.25        0.32          0.15           0.09
vs 0 (one-           Opaque      Transparent   Transparent    Transparent
sample t-test)       (p = .419)  (p = .042*)   (p = .000***)  (p = .000***)
Hypothesis                    Statistical         Practical
                              significance (p)    meaningfulness (d)
H1: PoN > Standard            .005**              1.22+++
H2: Stereotype > Standard     .000***             3.32+++
H3: Prototype > Standard      .002***             2.19+++
H4: Stereotype > PoN          .000***             1.57+++
H5: Prototype > PoN           .703                -
H6: Prototype > Stereotype    .001***             2.21+++
B. Experimental design
A 5-group, post-test-only experimental design was used, with 2 active between-groups factors (symbol set and design rationale). The groups were the same as in Study 4 with one additional group: PoN with design rationale (PoN DR).
C. Materials
5 sets of materials were prepared, one for each group:
Training materials: these defined all symbols and associated meanings for one of the symbol sets. The PoN
DR symbol set included explicit design rationale for
each symbol, taken from [29]. Design rationale could
not be included for any of the other symbol sets because
Hypothesis                    Statistical         Practical
                              significance (p)    meaningfulness (d)
H13: PoN > Standard           .006**              1.16+++
H14: Stereotype > Standard    .000***             2.32+++
H15: Prototype > Standard     .000***             1.65+++
H16: Stereotype > PoN         .022***             1.02+++
H17: Prototype > PoN          .052                -
Only two comparisons were not significant: no difference was found between the prototype and PoN symbol sets (as in Study 4) or between the prototype and stereotype symbol sets. All statistically significant differences were also practically meaningful, with large effect sizes in all cases.

2) Effect of semantic transparency on recall/recognition
To evaluate RQ6, we conducted a linear regression analysis across all symbols and symbol sets using the semantic transparency coefficient as the independent (predictor) variable and recognition accuracy as the dependent (outcome) variable. The results show that semantic transparency explains 43% of the variance in recognition performance (r² = .43) [38]. The effect is both statistically significant (p = .000) and practically meaningful (r² ≥ .25 = large effect size), which confirms RQ6. The standardised regression coefficient (β) is .66, meaning that for a 1% increase in semantic transparency, there will be a corresponding .66% increase in recognition accuracy. The resulting regression equation is:

Recognition accuracy (%) = 15 × semantic transparency coefficient + 88    (1)

3) Effect of design rationale on recognition
The difference between the PoN and PoN DR experimental groups (H13) shows that design rationale improves recognition performance over and above the effects of semantic transparency, thus confirming RQ6. Including design rationale seems to have a similar effect to more than doubling semantic transparency: PoN DR achieved a recognition accuracy of 96.3%, which based on the regression equation (Equation 1) would require a semantic transparency coefficient of .55 (rather than .26). Design rationale appears to act as an adjunct to semantic transparency in recognition processes: when it is difficult to infer the meaning of symbols from their appearance alone, design rationale helps people remember what symbols mean by creating additional semantic cues in long-term memory. This emphasises the importance of explanations to the human mind, which is a well-known causal processor: even from an early age, children constantly need to know "why" [25,39].

IX. CONCLUSION
A. Summary of results
We summarise the findings of the paper by answering the research questions raised in Sections 1 and 2 of the paper:

RQ1. How can we objectively measure the semantic transparency of symbols and notations?
A: Through blind interpretation experiments (e.g. Study 4). In this paper, we have defined a new metric (the semantic transparency coefficient), which has theoretical and practical advantages over traditional measures such as hit rate.

RQ2. How can we improve the semantic transparency of visual notations?
A: Through symbolisation experiments (e.g. Study 1) and stereotyping analyses (e.g. Study 2). Using this approach, we were able to increase the comprehensibility of symbols (as measured by hit rates) by naïve subjects by a factor of almost 4 (from 17% to 67%: TABLE I) and reduce interpretation errors by more than 80% (from 16% to 3.1%: Fig. 6) over notations designed in the traditional way (the existing i* symbol set). Even more surprisingly, it resulted in symbols that meet the ISO threshold for public information and safety symbols (67% hit rate): in other words, self-explanatory symbols.

RQ3. Can novices design more semantically transparent symbols than experts?
A: The answer to this question is emphatically "yes". 8 out of the 9 best (most semantically transparent) symbols for each i* concept were designed by novices, and 8 out of the 9 worst (least semantically transparent) symbols were designed by experts (TABLE II). The average semantic transparency of novice-generated symbols was more than 5 times that of expert-generated symbols, which challenges the longstanding assumption in the RE field that experts are best qualified to design visual notations.

RQ4. How can we actively (and productively) involve end users in the process of designing notations?
A: Through symbolisation experiments (e.g. Study 1), blind interpretation experiments (e.g. Study 4) and recognition experiments (e.g. Study 5).

RQ5. How can we evaluate the user comprehensibility of visual notations prior to their release (like clinical trials for drugs or usability testing for software systems)?
A: Through blind interpretation studies (e.g. Study 4) and recognition studies (e.g. Study 5) using members of the target audience (or proxies). The results of such studies will help identify potential interpretation problems, which can be addressed prior to releasing notations to the public.

It has always been the case that naïve users have to learn our languages to communicate with us: by getting them to design the languages themselves, we may be able to overcome many of the communication problems that currently beset RE practice.
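The arithmetic behind the design-rationale comparison can be checked directly against Equation (1):

```python
def predicted_recognition(transparency):
    """Equation (1): recognition accuracy (%) from the semantic transparency coefficient."""
    return 15 * transparency + 88

def required_transparency(accuracy):
    """Equation (1) inverted: the coefficient needed for a given recognition accuracy."""
    return (accuracy - 88) / 15

# PoN's measured coefficient (.26) predicts ~91.9% recognition accuracy,
# while the 96.3% achieved by PoN DR corresponds to a coefficient of ~.55.
print(round(predicted_recognition(0.26), 1))   # 91.9
print(round(required_transparency(96.3), 2))   # 0.55
```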
REFERENCES
[1] Alexander, C.W. (1970): Notes On The Synthesis Of Form. Boston,
USA: Harvard University Press.
[2] Bodart, F., Patel, A., Sim, M. and Weber, R. (2001): "Should the
Optional Property Construct be used in Conceptual Modelling: A
Theory and Three Empirical Tests." Information Systems Research
12(4): 384-405.
[3] Britton, C. and Jones, S. (1999): "The Untrained Eye: How Languages
for Software Specification Support Understanding by Untrained
Users." Human Computer Interaction 14: 191-244.
[4] Chen, P.P. (1976): "The Entity Relationship Model: Towards An
Integrated View Of Data." ACM Transactions On Database Systems
1(1): 9-36.
[5] Chow, S.L. (1988): "Significance Test or Effect Size." Psychological
Bulletin 103(1): 105-110.
[6] Cohen, J. (1988): Statistical Power Analysis for the Behavioural
Sciences (2nd ed.). Hillsdale, NJ: Lawrence Earlbaum and Associates.
[7] Doan, A., Ramakrishnan, R. and Halevy, A.Y. (2011):
"Crowdsourcing Systems on the World-Wide Web." Communications
of the ACM 54(4): 86-96.
[8] Dubin, R. (1978): Theory Building (revised edition). New York: The
Free Press.
[9] Foster, J.J. (2001): "Graphical Symbols: Test Methods for Judged
Comprehensibility and for Comprehension." ISO Bulletin: 11-13.
[10] Genon, N., Caire, P., Toussaint, H., Heymans, P. and Moody, D.L.
(2012): Towards a More Semantically Transparent i* Visual Syntax.
Proceedings of the 18th International Working Conference on
Requirements Engineering: Foundation for Software Quality
(REFSQ). Essen, Germany, Springer LNCS 7195.
[11] Grau, G., Horkoff, J., Yu, E. and Abdulhadi, S. (2007): "i* Guide 3.0." Retrieved February 10, 2009, from http://istar.rwth-aachen.de/tiki-index.php?page_ref_id=67.
[12] Heath, C. and Heath, D. (2008): Made to Stick: Why Some Ideas Take
Hold and Others Come Unstuck. London, England: Arrow Books.
[13] Hitchman, S. (1995): "Practitioner Perceptions on the use of Some
Semantic Concepts in the Entity Relationship Model." European
Journal of Information Systems 4(1): 31-40.
[14] Hitchman, S. (2002): "The Details of Conceptual Modelling Notations
are Important - A Comparison of Relationship Normative Language."
Communications of the AIS 9(10).
[15] Howard, C., O'Boyle, M.W., Eastman, V., Andre, T. and Motoyama,
T. (1991): "The relative effectiveness of symbols and words to convey
photocopier functions." Applied Ergonomics 22(4): 218-224.
[16] Howell, W.C. and Fuchs, A.H. (1968): "Population Stereotypy in
Code Design." Organizational Behavior and Human Performance 3:
310-339.
[17] ISO (2003): ISO Standard Graphical Symbols: Safety Colours and
Safety Signs Registered Safety Signs (ISO 7010:2003). Geneva,
Switzerland, International Standards Organisation (ISO).
[18] ISO (2007): Graphical symbols - Test methods - Part 1: Methods for
testing comprehensibility (ISO 9186-1:2007). Geneva, Switzerland,
International Standards Organisation (ISO).
[19] ISO (2007): ISO Standard Graphical Symbols: Public Information
Symbols (ISO 7001:2007). Geneva, Switzerland, International
Standards Organisation (ISO).
[20] Jick, T.D. (1979): "Mixing Qualitative and Quantitative Methods: Triangulation in Action." Administrative Science Quarterly 24: 602-611.
[21] Jones, S. (1983): "Stereotypy in pictograms of abstract concepts."
Ergonomics 26(6): 605-611.
[22] Lee, J. (1997): "Design Rationale Systems: Understanding the Issues." IEEE Expert 12(3): 78-85.