

Heuristic Evaluation of Websites: Lessons Learned

Jan M. Noyes1 and Stella Mills2


1 University of Bristol, UK
2 University of Gloucestershire, UK

Abstract

Heuristic evaluation is a technique that has been developed over the last 15 years as a result of the growth of, and interest in, the usability of systems and products. It is a so-called 'discount usability engineering' method since it comprises a 'quick and dirty' means of evaluation that can be carried out by informed non-experts. This paper overviews the use of heuristic evaluations in usability testing, and presents two heuristic evaluations of websites carried out by the authors. From the lessons learned, a series of recommendations is made in order to help computer and other professionals who need to conduct heuristic evaluations.

Keywords: Heuristic evaluation, discount usability engineering, human-computer interaction, usability, Internet, website design.

Designing websites needs to be accompanied by evaluation in order to ensure systems and products are well designed for human use. However, the engineering emphasis has traditionally concentrated on bringing a product to market as quickly as possible in order to begin to recoup the costs of development. Economic, logistic and practical reasons have often precluded carrying out a full, or even a partial, evaluation. In response to this, the so-called 'discount usability engineering' methods have been developed (Nielsen & Mack, 1994). One example of these methods is heuristic evaluation. “Heuristic evaluation is an easy to use, easy to learn, discount usability evaluation technique used to identify major usability problems of a product in a timely manner with reasonable cost” (Zhang, Johnson, Patel, Paige, & Kubose, 2003, p. 25). Further, it has been defined as a specialist report method, and is thought to be one of the most effective in identifying usability problems (Fu, Salvendy, & Turley, 2002) and, more recently, in evaluating the interactivity of web-based learning systems (Evans & Sabry, 2003). Heuristic evaluation provides the focus of the current paper, which aims to introduce the method as a useful tool for evaluating websites. The various steps of undertaking an heuristic analysis are given, together with a discussion of these in the light of two website evaluations; issues raised are considered at the end of each step.

What is Heuristic Evaluation?

An heuristic evaluation is essentially a systematic inspection of a user interface in which a small set of evaluators examine the interface and judge its compliance with recognised usability principles (Nielsen & Molich, 1990). The overall aim of an heuristic evaluation is to identify problems with the user interface. Besides finding problems, an heuristic evaluation helps the designer to prioritize further refinement. Although this type of evaluation does not provide explicit technical solutions, the act of identifying problems often leads to suggestions concerning their rectification. The focus of an heuristic evaluation is usually on the whole of the design rather than on specific aspects (Lewis & Wharton, 1997). It is carried out as part of the overall iterative design process, and typically involves a small group of people acting as evaluators and examining the interface in order to check the extent to which it meets recognised design principles, i.e. the heuristics. These heuristics are often in the form of guidelines.

Heuristic evaluation was developed as a 'discount usability engineering' method, i.e. “an easy and fast method of finding potential problems” (Nielsen & Mack, 1994, p. 25). This is an important point, since many studies have attempted to compare heuristic evaluation and empirical testing methods. The results have been mixed, with some researchers, e.g. Jeffries, Miller, Wharton, and Uyeda (1991) and Virzi, Sorce, and Herbert (1993), finding that the heuristic method identifies the largest number of usability problems, while others, e.g. Desurvire, Lawrence, and Atwood (1991) and Karat, Campbell, and Fiegel (1992), found that empirical testing identified more problems. The difficulty here might lie in distinguishing the severity and importance of the usability problems identified, as opposed to their sheer number: measurements have tended to focus on quantity, whereas the type of usability problem may be more important. Kamper (2002) concluded that heuristic evaluation is more appropriate when used earlier in the design process, whereas empirical testing is of more use later on. This may help explain some of these earlier findings of mixed success.

We turn now to the steps in the process of conducting an heuristic evaluation; each step is illustrated with our experiences, followed by a discussion of the lessons learnt.

Step 1. Defining the Heuristics

The heuristics are features relating to the usability of the interface that will have been determined prior to the evaluation. Two examples of heuristics might be compatibility, i.e. the operation of the interface should be in keeping with the expectations of the user from using similar interfaces, and reversibility, i.e. users should have the opportunity to 'undo' their actions. Given that Smith and Mosier (1986) generated 944 guidelines for user-system interaction, a large number of heuristics are possible for some interfaces. One of the initial problems, therefore, when carrying out an heuristic evaluation is to determine which heuristics to use.

A systematic attempt at defining appropriate heuristics was made by Nielsen (1994), who put forward a set of nine heuristics derived from a factor analysis of 11 studies covering 249 usability problems. These were: visibility of system status; match between system and the real world; user control and freedom; consistency and standards; error prevention; recognition rather than recall; flexibility and efficiency of use; aesthetic and minimalist design; and helping users recognize, diagnose and recover from errors. A tenth heuristic, help and documentation, was later added to this list by Nielsen, while Muller, McClard, Bell, Dooley, Meiskey, Meskill, Sparks, and Tellam (1995) extended the original list with three more heuristics: respect the user and his/her skills; provide a pleasurable experience with the system; and support quality work. Although it is unrealistic to expect all these heuristics to relate to every interface, they do provide a starting point when carrying out an heuristic evaluation.

User characteristics and the context of use are important when defining heuristics.


Clearly, if the system is to be used by novices, then the heuristics need to reflect their needs, so that, for example, a minimalist style of interface may not be suitable. The context of use also needs to be ascertained. If the system is to be used in a safety-critical situation, for example, then the interface must reflect good practice in ensuring cognitive efficiency. In other words, the more specific the heuristics can be, the more closely the evaluation will mimic the actual working situation and, consequently, the more likely the system is to satisfy the users' needs in achieving their goals.

Evaluating websites: Our experiences

Given the numerous generic evaluation principles available, ranging from those directly related to system design (e.g. Nielsen & Mack, 1994) to those based on psychological principles (e.g. Gardiner & Christie, 1987), it is necessary to choose the heuristics carefully. Following a typical user-centred design approach, a good starting point is to consider the users' characteristics and the context of use of the system. The principles given in Table 1 have been derived previously from the literature and used effectively by one of the authors in an evaluation context.



Table 1

Some Generic Principles for Website Evaluation

1. The presentation is of great importance when decision making is necessary from the
interpretation of the information displayed (Mills, 1995).
2. Information must be within the user’s capability to comprehend instantly and
simultaneously (Mills, 1995).
3. Information supplied should be easy to receive and absorb and clear in all respects
(Mills, 1995).
4. Operational errors should be rectifiable without consequence (Mills, 1995).
5. Users should be able to master the system and use it effectively (Mills, 1995).
6. Information should be consistently displayed (Nielsen & Mack, 1994; Parlangeli,
Marchigiani, & Bagnara, 1999).
7. Clearly marked exits should be available at all times (Parlangeli et al., 1999).
8. The design style of the interface should reflect the users’ characteristics (Mills &
Araújo, 1999).
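
Where such a principle list is reused across evaluations, it can help to hold it as structured data from the outset, so that the same list can later drive either a tabular or a question-and-answer report (both formats are discussed under Step 4 below). The following minimal sketch, in Python, is our own illustration and not part of the original evaluations; the field names are arbitrary.

```python
# An illustrative representation of some Table 1 principles as a reusable
# checklist; the evaluator fills in 'finding' during the inspection.
generic_principles = [
    {"id": 6, "principle": "Information should be consistently displayed",
     "source": "Nielsen & Mack (1994); Parlangeli et al. (1999)"},
    {"id": 7, "principle": "Clearly marked exits should be available at all times",
     "source": "Parlangeli et al. (1999)"},
    {"id": 8, "principle": "The design style should reflect the users' characteristics",
     "source": "Mills & Araújo (1999)"},
]

for item in generic_principles:
    item["finding"] = None  # to be completed by the evaluator
    print(f"{item['id']}. {item['principle']} ({item['source']})")
```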

Charlton, Gittings, Leng, Little, and Neilson (1999) refuted the claim that the web provides a simple graphical interface that is usable by anyone, arguing that usability is more than utilising a set of skills possessed by the user. They admit, however, that many web users already possess skills that enable them to navigate satisfactorily through menu structures, icons and browsers. Even so, in evaluating websites, it is necessary to supplement generic principles, such as those given above in Table 1, with more specific guidelines. It must be pointed out, however, that guidelines are by no means unanimously agreed by evaluators, since some of them have yet to be tested fully. Even so, it is possible to be specific in some cases and to give some guidance as to what constitutes a more usable interface. Table 2 shows a list of ergonomic principles derived from the literature.

34
Heuristic Evaluation of Websites: Lessons Learned

Table 2

Ergonomic Principles for Website Evaluation

1. Text should always be in a style that is understandable without being patronising (Noyes
& Mills, 1998).
2. Text should not use a variety of fonts unless a special effect is required (Noyes & Mills,
1998).
3. Upper case letters should only be used for the first letter of words, normally only at the
beginning of a sentence, the exception being some short words used in warnings (Noyes
& Mills, 1998).
4. The minimum number of colours should be used to gain the desired effect (Noyes &
Mills, 1998).
5. The use of colours should reflect the time the user spends viewing a particular page
(Shneiderman, 1998).
6. The various facets of each page should be clear instantaneously to the user; these include
navigation aids, exit points and links to other pages (Shneiderman, 1998).
7. Navigation support should clearly indicate the user’s progress through the system
(Shneiderman, 1998).
8. Metaphors should be life-like and instantaneously recognisable by the user (Noyes &
Mills, 1998).
9. The software must accommodate the technical limitations of the hardware (Mills &
Araújo, 1998).
10. Sound should always match the user’s expectation (Noyes & Mills, 1998).
11. Sound should always be distinct from any background noise (Noyes & Mills, 1998).
12. Any sound should only provide the minimum information for comprehension (Noyes &
Mills, 1998).
13. Music can be used to communicate the flow of time and emotion (Robertson, 1996).
14. Animation should be used purposefully (Shneiderman, 1998).
15. Graphics should be used purposefully (Noyes & Mills, 1998).
16. No more than four different graphics should be used (Shneiderman, 1998).
17. Video should be used in short sequences to illustrate static procedures (Shneiderman,
1998).
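
Some of the more mechanical of these principles can be screened for automatically before the human inspection proper. The sketch below is our own illustration: it flags possible breaches of principles 2 and 4 in a page's stylesheet; the sample CSS and the colour threshold are invented, and such a check is a pre-filter for the evaluator, not a substitute for judgement.

```python
import re

def count_font_families(css: str) -> int:
    """Count distinct font-family declarations in a stylesheet."""
    families = re.findall(r"font-family\s*:\s*([^;}]+)", css, re.IGNORECASE)
    return len({f.strip().lower() for f in families})

def count_hex_colours(css: str) -> int:
    """Count distinct hex colour literals in a stylesheet."""
    return len({c.lower() for c in re.findall(r"#[0-9a-fA-F]{3,6}\b", css)})

css = ("body { font-family: Georgia; color: #333; } "
       "h1 { font-family: Arial; color: #c00; } "
       "nav { font-family: Verdana; }")

if count_font_families(css) > 2:
    print("Principle 2: more than two text fonts in use")
if count_hex_colours(css) > 4:  # the threshold is the evaluator's call
    print("Principle 4: colour palette may be larger than needed")
```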

Lessons learned

In applying the heuristics, it is often necessary to derive the set from first principles, depending on the users and the purpose of the software. In this case, however, we have been able to use generic questions which cover the main design areas of websites, and the user groups will be ubiquitous owing to the nature of the Internet. Thus, given that businesses want to target the general populace, and that their sites must therefore be designed for the 'average' web user (whoever that person may be!), the heuristics lend themselves to easy general application. They have, in fact, been used profitably to evaluate a number of different websites and have yielded design modifications that have improved the accessibility and general usage of those websites. It should be stressed, however, that beyond the rather specialised type of software that a website is (after all, most websites are used primarily for information seeking and purchasing and do not, for example, involve much data input from the user), it may be necessary at least to fine-tune the heuristics for the software under evaluation.


From our experience, when deriving heuristics, it is better to have a small number of categories, say up to 10, and then, if necessary, to include sub-heuristics under each category. While this may lengthen the list of heuristics, it focuses the evaluator on a particular evaluation section and allows a more in-depth evaluation to be achieved more quickly. This is especially advantageous if the system is to be used by a variety of users, as it allows the evaluator to examine each part of the system for relevant potential problems for each user category.

In using the heuristics, it is sometimes necessary to ignore sections that do not apply. For example, a website may not have any sound associated with it. However, it is not recommended that such sections be removed, as their presence reminds the designer that future modifications to the website may include these other media. Also, if these heuristics are used for comparisons, particularly of competitors' websites, then it is much easier to compare the design features and their characteristics. Interestingly, it has been our experience that few companies have thought about the website's design giving competitive advantage, as most companies seem still to be focussed on their own sites.

Step 2. Identifying who should carry out the Heuristic Evaluation

A further issue relates to the people carrying out the heuristic evaluation. There is a need for someone to define the heuristics, as well as for someone to act as evaluator and actually carry out the practical evaluation. The 10 heuristics defined by Nielsen (1994) require the evaluators to know the meaning of the terms both from a human-computer interaction (HCI) viewpoint and from a domain area perspective. This often means that the evaluators need to be 'double experts' – experts in HCI and in the domain area (Nielsen, 1992). It is generally accepted that the person defining the heuristics is a usability expert, but the situation with regard to the evaluators of the interface is not so clear. In some situations, it is quite likely that the person who determines the heuristics (the 'defining evaluator') could assume the role of evaluator (the 'user evaluator'), but it may be more appropriate to have 'real users' evaluate the interface.

Nielsen (1992) considered this issue in a number of experimental studies. He defined three groups of user evaluators: (i) 'novices', who knew about computers but had no specialist usability expertise; (ii) 'single experts', who were usability experts but had no knowledge of the system interface; and (iii) 'double experts', who had knowledge of both the system and usability attributes. Nielsen found a systematic difference in the performance of the evaluators, with the double experts locating 60% of the usability problems, the single experts finding 41%, and the novices an average of 22%. An heuristic evaluation is therefore likely to be more successful when carried out by individuals with both usability engineering knowledge and system knowledge. However, we would argue for going beyond this and for evaluators to have domain knowledge as well (i.e. to be 'triple' experts). This would allow comparison within the field of application to be made intrinsically, thus leading to a more knowledgeable evaluation. This is particularly appropriate for specialist domains such as sports or project management software.

The number of evaluators also needs to be determined. It is possible to conduct an heuristic evaluation with only one individual, who acts as both the defining and the user evaluator. However, the obvious difficulty is that a single evaluator may not be particularly good at evaluating the interface, because of a limited knowledge base. This was shown by Nielsen (1992) in an analysis of six projects. He found that one evaluator alone located only 35% of the usability problems associated with the interface, while five evaluators found 75% of the usability problems. Increasing the number of evaluators to 15 resulted in 90% of the problems being identified. Work by Virzi (1992) also supported this finding, in that four to five evaluators were found to highlight 80% of the usability problems. However, adding in more evaluators will inevitably lead to more problems being uncovered.

In conclusion, the optimum number of evaluators is probably between three and five. Some support for this comes from a recent study evaluating ambient displays, where three to five evaluators were able to identify between 40 and 60% of the usability problems (Mankoff, Dey, Hsieh, Kientz, Lederer, & Ames, 2003). However, the final decision needs to be made according to the specific characteristics of the interface under evaluation. If, for example, there are health and safety considerations, and the usability of the interface is critical to the safe functioning of the system, more evaluators should be used in order to ensure that the majority of the problems can be located before the system becomes operational. If, on the other hand, the interface is for a website, where not uncovering all the usability problems will result in no more than a mild irritation for some users, then the number of evaluators could be considerably reduced.

Lessons learned

Having stated that the optimum number of evaluators lies between three and five, we have successfully carried out heuristic evaluations with two evaluators. However, both these individuals were human factors specialists and had system knowledge, i.e. they were so-called 'double experts'.

User characteristics and context of use are vital considerations. If the system is to be used by a variety of users, the evaluator will need to examine each section of the system for relevant potential problems for each user category. This assumes, of course, that the evaluators have sufficient skill and knowledge to adapt their evaluating to different cognitive scenarios when role-playing the different users' needs and goals, which again supports using double experts. This may increase the overall costs of the evaluation, but the quality of the evaluation should reflect this investment.
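
The pattern behind these percentages can be made concrete: each evaluator finds an overlapping subset of the interface's problems, so the union of problems found grows, with diminishing returns, as evaluators are added. The sketch below uses hypothetical problem sets of our own devising, not data from Nielsen (1992) or from our evaluations.

```python
# Hypothetical findings for three evaluators; the union grows with each
# evaluator added, but overlap means diminishing returns.
found_by = {
    "evaluator_1": {"small buttons", "unclear links", "cluttered homepage"},
    "evaluator_2": {"small buttons", "garish colours", "no progress cues"},
    "evaluator_3": {"unclear links", "garish colours", "dead links"},
}

union = set()
for name, problems in found_by.items():
    union |= problems
    print(f"after {name}: {len(union)} distinct problems found")
```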

Step 3. Conducting the Heuristic Evaluation

A number of different techniques can be used for the practical evaluation of the interface, but the basic premises are the same. The actual heuristic evaluation is carried out by each individual evaluating the interface, usually alone and in their own time. It is implicit in the technique that the evaluator uses the interface, even if only a mock-up is available. Ideally, most interfaces subject to an heuristic evaluation would be close to product release, i.e. an operational interface is present, but there is an opportunity to make changes (Fleming, 1998). However, given that the evaluators in an heuristic evaluation need not use the interface to perform a task as in the real world, it is possible to carry out an evaluation using only 'pen and paper'. Although there is a loss of ecological validity in doing this, it might be appropriate for systems that are not yet developed, and it also allows heuristic evaluations to be carried out early in the life cycle. In the simplest form of an heuristic evaluation, the user evaluator could merely study the interface without carrying out any pre-described tasks. In these cases, a 'walkthrough' technique is often used (see Lewis, Polson, Wharton & Rieman, 1990), where the evaluator 'walks through' typical tasks which would be carried out using the interface.

A variation on this is the cognitive walkthrough, where the focus of the tasks is on the cognitive aspects of the user-system interaction (Lewis & Wharton, 1997). The benefit of a walkthrough is that it allows use of the interface to be simulated, and specific tasks to be considered. Although Lewis and Wharton (p. 718) argued that “heuristic evaluation provides an overview of a complete design rather than a detailed examination of particular tasks”, incorporating a cognitive walkthrough into the evaluation allows a focus on specific tasks.

There are various modifications of the walkthrough technique, e.g. the programming walkthrough (Bell, Rieman, & Lewis, 1991) and the cognitive jogthrough (Rowley & Rhoades, 1992). In the jogthrough technique, the evaluation session is recorded on video tape, allowing the evaluation (but not the test) to be conducted in a much shorter time; Lewis and Wharton reported that about three times more problems could be found in the same amount of evaluation time by adopting the jogthrough technique. Finally, an extension of the basic walkthrough is the pluralistic walkthrough (Bias, 1994), in which the practical evaluation is carried out by a number of different groups of individuals who have been involved in the system development, e.g. usability experts, users and the system developers.

Our experiences

The heuristics were applied as a checklist to two UK websites, one for an agricultural charity and the other for a small wine-producing company. Tables 3a and 3b summarize the results of applying the questions relating to the four areas for evaluation.

Table 3a

Website Evaluation of an Agricultural Charity

Fitness for Purpose

Integrity of information
Is the information reliable?
Yes, but rather dated
Is the information consistent with similar information found elsewhere?
Yes, as much as I could judge

Identify users' characteristics
Have the potential users experience of computers?
Have the potential users experience of website exploration?
This is assumed
For this evaluation, are the users considered to be browsers or searching with fully specified goals?
Yes
Is the website intended for specific age groups such as children?
Both were role-played

Identify users' goals
Are the users' goals consistent with their characteristics?
It appeared that adults using the site for serious information were envisaged as the users

System functionality
Is the information available within the website specification consistent with the users' goals?
Yes
Is the content of the website (or at least the main links) clearly visible on entering?
Yes, but rather thin and dated

Overall Design

Decision making
Is decision making necessary from the interpretation of the information displayed?
Yes, but first impressions are rather clouded by the colour combination and the large amount of information and choice
Is the design sufficiently clear to allow a fully informed decision?

Information comprehension
Is the information comprehensible instantly and simultaneously?
Yes, the choices weren't immediately obvious

Clarity of information
Is the information supplied easily absorbed?
Yes, after thought
Is the information clear in all respects?
No – initially the site appears cluttered, with buttons too small for quick comprehension

Rectification of operational errors
Are operational errors rectifiable without consequence?
Yes, but at times the style is rather emotional and does not, therefore, give the impression of scholarly independence (e.g. articles by Brian Dobby). Also, some articles are long for reading, especially with such a small font, but printing is easily achieved
Are error messages comprehensible instantaneously?
No – section buttons are too small and some links are not obvious
Are fatal errors prevented?
I found no errors as such

Effective mastery of system
Can users master the system and use it effectively?
N/A

Consistency
Is information consistently displayed throughout the website?
Yes, if they are experienced

Clearly marked exits
Are clearly marked exits available at all times?
More or less, but there were some different routes to the same article, such as rural theology arriving at redefining sustainability

Style of interface
Does the design style of the interface reflect the users' characteristics?
The homepage is linked at the top but not overly obvious. Some thought was needed. The colour scheme of green and yellow is rather eye-straining, although the idea is good. Perhaps green and white is a better combination
If applicable, does the design style cater for any specified age groups?
Definitely adults, and probably educated viewers searching for information

Interface/page design

General aspects
Use of text
Is the text understandable instantaneously?
Yes
Is the text free from any patronising of the user?
Yes, but at times rather emotional
Are the number of different text fonts limited to two?
Are upper case letters only used for the first letter of words?
Yes

Use of colour
Is the minimum number of colours used to gain the desired effect?
Yes
Are soft colours used on pages that require the user to read sections of text?
No – this needs rectification, as they were too bright. Black and white for text is good, as is white/green, but why red headings?

Clarity
Are the various facets, such as navigation aids, exit points and links to other pages, of each page clear instantaneously to the user?
No – there are too many headings, boxes etc. on the homepage. The search area is too small

Navigation
Is the user's progress through the system clearly indicated?
Not at all, only through the browser facilities
Is the number of links to other pages adequate but not excessive?
Probably yes, but there are various paths to the same pages. New links text is very small and there are links to nowhere, such as briefing papers
Is the website broad and shallow in terms of navigation levels?
Yes

Metaphors
Are metaphors life-like and instantaneously recognisable by the user?
The search engine 'go' button has to be pressed for bulletin board queries. The pull-down box is below, and then the user needs to click 'search'

Use of multimedia facets
Does the software adequately accommodate the technical limitations of the hardware?
Yes, except possibly for frames, if used

Sound (N/A)
Are the sounds used those that the user would expect in a given situation?
Are all sounds distinct from any background noise?
Do all sounds provide the minimum information for comprehension?
If music is used, does it communicate the flow of time?
Is music used to create emotion?

Animation (N/A)
Is animation used to increase the user's enjoyment?
Is animation used to aid icon understanding?

Graphics
Are all graphics used purposefully?
Pictures on the homepage could be made active as icons, as on the site map. This leads to an inconsistency at present
Are no more than four different graphics used on any one website?
There are more than four photos on the homepage

Video
Is video used in short sequences to illustrate static procedures?
N/A


Table 3b

An Evaluation of a Wine Producers’ Website

Fitness for Purpose


Integrity of information
Is the information reliable?
Yes – this is assumed.
Is the information consistent with similar information found elsewhere?
N/A
Identify users’ characteristics
Have the potential users experience of computers?
No evidence of users being consulted
Have the potential users experience of website exploration?
Possibly
For this evaluation, are the users considered to be browsers or searching with fully
specified goals?
Both possibly.
Is the website intended for specific age groups such as children?
Adults.
Identify users’ goals
Are the users’ goals consistent with their characteristics?
Goals not clearly indicated
System functionality
Is the information available within the website specification consistent with the users’
goals?
Yes probably.
Is the content of the website (or at least the main links) clearly visible on entering?
Yes but there is little leading of the user to the goals

Overall Design
Decision making
Is decision making necessary from the interpretation of the information displayed?
Yes.
Is the design sufficiently clear to allow a fully informed decision?
No - the information is far too cluttered. Scrolling is used.
Information comprehension
Is the information comprehensible instantly and simultaneously?
Yes – but it is difficult to find
Clarity of information
Is the information supplied easily absorbed?
Possibly - but the design is too cluttered. The user is given far too much choice without
clarity. The navigation does not lead the user or help.
Is the information clear in all respects?
Yes - except for too much text.
Rectification of operational errors
Are operational errors rectifiable without consequence?
Yes
Are error messages comprehensible instantaneously?
None found
Are fatal errors prevented?
None found
Note all error messages are browser dependent; the system appears not to have its own
error messages.
Effective mastery of system
Can users master the system and use it effectively?
Yes but not very effectively. There is far too much information almost ‘thrown’ at the
user. Pictures and colours are used indiscriminately.
Consistency
Is information consistently displayed throughout the website?
Probably
Clearly marked exits
Are clearly marked exits available at all times?
No – some links are not clear.
Style of interface
Does the design style of the interface reflect the users’ characteristics?
Only superficially through the use of pictures.
If applicable, does the design style cater for any specified age groups?
Yes – adult

Interface/page design
General aspects
Use of text
Is the text understandable instantaneously?
Yes
Is the text free from any patronising of the user?
Yes
Are the number of different text fonts limited to two?
Yes
Are upper case letters only used for the first letter of words?
Yes
Use of colour
Is the minimum number of colours used to gain the desired effect?
No – colours have no obvious metaphor and are used indiscriminately
Are soft colours used on pages that require the user to read sections of text?
Yes – text mirrors WORD conventions
Clarity
Are the various facets, such as navigation aids, exit points and links to other pages, of
each page clear instantaneously to the user?
Not really – there is too much text and there are too many graphics
Navigation
Is the user’s progress through the system clearly indicated?
No – the left bar helps and at times the upper bar too but no structure becomes
evident.


Is the number of links to other pages adequate but not excessive?
Rather too many
Is the website broad and shallow in terms of navigation levels?
No – it appears square.
Metaphors
Are metaphors life-like and instantaneously recognisable by the user?
I wasn’t aware of any metaphors really

Use of multimedia facets
Does the software adequately accommodate the technical limitations of the hardware?
Yes - although the machine used in the evaluation was of a good specification.
Sound
Are the sounds used those that the user would expect in a given situation?
N/A
Are all sounds distinct from any background noise?
N/A
Do all sounds provide the minimum information for comprehension?
N/A
If music is used, does it communicate the flow of time?
N/A
Is music used to create emotion?
N/A
Animation
Is animation used to increase the user’s enjoyment?
N/A
Is animation used to aid icon understanding?
No - but icon meaning is generally clear.
Graphics
Are all graphics used purposefully?
No - some pictures seem to serve little purpose and add to the clutter
Are no more than four different graphics used on any one website?
Yes - graphics are restricted to pictures.
Video
Is video used in short sequences to illustrate static procedures?
N/A

Lessons learned

As with any heuristic evaluation, there was an element of role-playing; in both cases, the evaluator was a human factors expert with previous experience of heuristic evaluation and so was able to identify possible problems from the perspectives of both naïve and experienced users. In addition, the evaluator's knowledge of interface design was essential in order to identify potential problems in navigation, user confusion, the use of colour and other facets that are particularly important in website design. Domain knowledge and practical knowledge of the targeted users also facilitated the evaluation.

Step 4: Recording and Presenting the Data

The exact way in which the heuristics are checked against the design needs to be determined. A checklist approach could be used, or more detailed answers elicited; some evaluations might use a combination of approaches. Recording of this information is usually carried out by the evaluator, although some heuristic evaluations have an independent observer who records the comments being made by the evaluator (Nielsen, 2003). The advantage of this is that the evaluator does not lose track of the procedure whilst having to stop to make notes; the disadvantage is inaccurate reporting by an observer who misinterprets the evaluator's responses.

Lessons learned

For the evaluation, the heuristics were listed as questions rather than as stated design principles; this is rather akin to research aims being re-phrased as research questions, and it has been found easier to administer. The questions are shown in Tables 3a and 3b. Questions also allow for a clearer categorization of the answers, as these can be recorded simply as 'yes', 'no', or 'maybe', following suggestions used in Usability Context Analysis quality documentation (Bowden, Thomas, & Bevan, 1995), where a minimal positive answer, essentially 'yes', indicates the compliance of the website with the associated principle. Conversely, a negative answer indicates non-compliance. Thus, a basic usability measure is possible by simply counting the positive responses to the applicable questions; in essence, it is a simplified Nielsen (2003) scale. If a tabular format for recording the results of the heuristic evaluation is preserved, then a third column can be used for comments and suggested solutions. Alternatively, these can be discussed in a separate document whilst referring to the table; this is helpful if the problems are numerous and need fairly drastic remedies. In business terms, this tabular approach may be sufficient, but we have found that some companies prefer to have a prose document highlighting the main findings; this gives good scope for suggesting solutions and wider connected issues.

It will be noted that Tables 3a and 3b both use the same heuristics but the results are presented slightly differently; the first is in true tabular form while the second is more a list of questions and answers, in which bold type was used to highlight the answers. We are content working with either format and use whichever one the client prefers. However, this does emphasize that there is no generally accepted way of presenting the data from an heuristic analysis, and we would favour any presentation style which gives clarity while highlighting issues of concern. Severity ratings (see below) were not used formally, but in the accompanying report for both businesses emphasis was given to those factors which might cause a user to give up or move to another site.
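
The counting measure described above is simple enough to show directly. In the sketch below, the questions and answers are invented for illustration (they are not the responses from Tables 3a and 3b); questions marked not applicable are excluded from the count.

```python
# Illustrative checklist responses: 'yes', 'no', 'maybe', or 'n/a'.
responses = [
    ("Is the information reliable?", "yes"),
    ("Are clearly marked exits available at all times?", "no"),
    ("Is the text understandable instantaneously?", "yes"),
    ("Is animation used purposefully?", "n/a"),  # excluded as not applicable
]

applicable = [(q, a) for q, a in responses if a != "n/a"]
positives = sum(1 for _, a in applicable if a == "yes")
print(f"compliance: {positives}/{len(applicable)} applicable questions "
      f"({100 * positives / len(applicable):.0f}%)")
```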

Step 5: Assessing the Severity of the Usability of the Websites

A further step in the heuristic evaluation is for the evaluators to rate all of the problems identified. As can be seen from the examples given, this can be done verbally in the 'comments' column or quantitatively using the following 5-point scale: '0 = I do not agree that this is a usability problem'; '1 = Cosmetic problem only: need not be fixed unless extra time is available'; '2 = Minor usability problem: fixing this should be given low priority'; '3 = Major usability problem: important to fix, so should be given high priority'; '4 = Usability catastrophe: imperative to fix this before real users use the interface' (Nielsen, 2003). Compilation of these ratings will indicate the order in which usability problems need to be addressed, and can also be used to provide a crude usability rating for the system.
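
Where the quantitative route is taken, the ratings compile naturally into a prioritized fix list. The following brief sketch applies Nielsen's (2003) 0-4 scale to invented problems and ratings of our own; only the scale itself comes from the source.

```python
# Nielsen's (2003) severity scale, applied to hypothetical rated problems.
SEVERITY = {0: "not a problem", 1: "cosmetic", 2: "minor",
            3: "major", 4: "catastrophe"}

rated_problems = [
    ("section buttons too small for quick comprehension", 3),
    ("red headings on pages of body text", 2),
    ("links leading to empty pages", 4),
]

# Sort so that the most severe problems head the fix queue.
for description, rating in sorted(rated_problems, key=lambda p: -p[1]):
    print(f"[{rating} - {SEVERITY[rating]}] {description}")
```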

already live. Often managers are aware that all is not well with the website, but they are unable to define the problems technically. In such circumstances, we have found that it is much better to use a verbal system of rating, as given in the examples (Tables 3a and 3b), rather than to try to quantify the problems in a rather subjective hierarchy. Managers generally want the website to be usable in all respects, and so it has been our practice to list all problems that detract from the objectives of the website, as all of these need to be fixed for the user to move freely within the website. However, should the website be evaluated heuristically in its early stages, then some quantitative ranking of the problems along the lines of Nielsen's (2003) hierarchy above could be useful.

Lessons learned

While the quantitative ranking of problems suggested by Nielsen (2003) may be useful in systems development, it has been our experience that for the live websites of businesses it can be 'overkill', adding little to a more verbal approach. Managers want to know what must be done to fix problems which they can perceive but may not be in a position to suggest viable solutions for. Thus, in ranking usability problems, it is often the solutions which are more important; for example, if a problem has no solution within a budget, then practically it has to remain, regardless of the consequences for the users.

Issues and recommendations

Issue 1. Defining the heuristics

The primary consideration is to have someone define the heuristics who knows about interface design and the principles of good design from a user perspective; in addition, domain knowledge is helpful as this contextualizes the system and should lead to more user-centred and specialised heuristics. It can be argued that knowledge of the system under evaluation is also advantageous, but if the end-users of the system will not be familiar with a previous version, then system knowledge may hinder the defining evaluator's derivation of the heuristics.

Recommendation 1: The evaluator has computing and human factors knowledge of interface design, and domain knowledge (i.e. is a 'triple' expert).

It is important that the heuristics reflect all the user categories, as well as the context of use in terms of any special target users. In website design, for example, most businesses have a target audience for their products, and it is likely that these people will also visit the website. Consequently, it is important that the design of the website reflects the needs of these users, as otherwise they may be alienated from using the site. In addition, the company may want to attract a different target group, and the needs of these users must also be accommodated in the design. This can lead to conflict, and there is no easy design solution. Heuristic evaluation can, however, be instrumental in identifying the design features that conflict; e.g. animation may be useful for attracting young people but may not be desirable for the older generation (Greenwood, 2000).

Recommendation 2: Heuristics need to reflect user characteristics and context of use.

Issue 2. Identifying the evaluators

Generally it has been found easy to apply the heuristics, but this has been done by experienced human factors evaluators, and it is hard to see how an inexperienced user without any appropriate knowledge or training could find the same type of design problems. Indeed, the fact that such problems have been found after design suggests the need for the human factors expert's evaluation. We would argue that a single expert evaluator is more beneficial than a larger number of inexperienced evaluators, since it is impossible for novice users to role-play experienced users. However, it is not difficult for experienced users to role-play novice users. In addition, the use of one expert evaluator is much easier and cheaper strategically and operationally.

Recommendation 3: Single evaluators, expert in computing, human factors and the domain, are likely to be more beneficial than multiple novice users.

As with any evaluation, the personal bias of the evaluator plays a role in identifying the areas of concern. In this sense, questions perhaps ease the situation by narrowing the scope of answers, whereas guidelines or traditional heuristics may allow a wider answer. But the problem of bias is one that is inherent in any evaluation, and at least it is minimized through using a generically derived table.

Recommendation 4: Bias should be reduced by using generically derived heuristics.

Issue 3. Carrying out the evaluation

The evaluator works through a series of typical tasks using the interface in order to compare the attributes of the system with a list of recognised usability principles. Recording of the results is usually carried out by the evaluator. At the end of the practical evaluation, the results are compiled to gauge the extent to which the interface meets these pre-determined usability criteria. If a number of evaluators have been employed, it is important that they do not confer, since this may introduce bias.

Recommendation 5: The format for recording the results of the evaluation will be determined by the nature of the heuristic evaluation and the skills of the evaluator(s).

Recommendation 6: There are benefits to be attained from having an independent observer record the results.

Issue 4. Severity ratings

The calculation of severity ratings helps assess the extent of the problems associated with the usability of the interface. It also allows comparisons to be made between interfaces.

Recommendation 7: Severity ratings are useful in that they provide a ranking of the problems found; however, for managerial use, they may be more useful in a verbal form than as a quantitative ranking.

Issue 5. Further analyses

Nielsen (1993) suggested that it is useful to hold a debriefing session at the end of the evaluation in order to allow the good aspects of the interface to be highlighted, and to begin to give some consideration to how the usability issues can be addressed. The end-result of the practical evaluation is usually a list of usability problems that need to be rectified before product release. This is useful in the development stages of the website but, for live websites, a written report can give an overview and context of the problems, together with possible solutions. Such a report can be passed around management for action as required. There should then be further studies to ascertain that the problems have been fixed.

Recommendation 8: If the resource is available, further confirmatory and verification exercises will be beneficial.

Conclusion

In this paper, we have discussed our experiences of using heuristic evaluations for two websites which are live and whose managers were concerned about usability problems in the sites. These experiences, together with other heuristic evaluations not detailed formally here, suggest that heuristic evaluation can be an inexpensive and useful method of finding many of the problems that users encounter.

Clearly, this is only part way to a solution to each problem, so an evaluator who can also suggest suitable and practical solutions can add value to the design of the website. This is beneficial, since more usable websites may mean more profit and success for the business.

References

Bell, B., Rieman, J., & Lewis, C. (1991). "Usability testing of a graphical programming system: Things we missed in a programming walkthrough". Proceedings of CHI '91 (New Orleans, LA) (pp. 7-12). New York: ACM.

Bias, R. G. (1994). "The pluralistic usability walkthrough: Coordinated empathies". In J. Nielsen & R. L. Mack (Eds.), Usability inspection methods (pp. 63-76). New York: Wiley.

Bowden, R., Thomas, C., & Bevan, N. (Eds.) (1995). Usability context analysis: A practical guide. Teddington, UK: National Physical Laboratory.

Charlton, C., Gittings, C., Leng, P., Little, J., & Neilson, I. (1999). "Bringing the internet to the community". Interacting with Computers, 12(1), 51-61.

Desurvire, H., Lawrence, D., & Atwood, M. (1991). "Empiricism versus judgement: Comparing user interface evaluation methods on a new telephone-based interface". SIGCHI Bulletin, 23, 58-59.

Evans, C. & Sabry, K. (2003). "Evaluation of the interactivity of web-based learning systems: Principles and process". Innovations in Education and Teaching International, 40(1), 89-99.

Fleming, J. (1998). User Testing: How to Find Out What Users Want [Online]. Available: http://www.ahref.com/guides/design/199806/0615jef.html

Fu, L., Salvendy, G., & Turley, L. (2002). "Effectiveness of user testing and heuristic evaluation as a function of performance classification". Behaviour and Information Technology, 21(2), 137-143.

Gardiner, M. M. & Christie, B. (1987). Applying cognitive psychology to user-interface design. Chichester: Wiley.

Greenwood, P. J. (2000). Animation in electronic commerce web sites: Potential usefulness to Internet marketers. Unpublished Master's Thesis, University of Gloucestershire, UK.

Jeffries, R., Miller, J. R., Wharton, C., & Uyeda, K. M. (1991). "User interface evaluation in the real world: A comparison of four techniques". Proceedings of CHI '91 Conference on human factors in computing systems (New Orleans, LA) (pp. 119-124). New York: ACM.

Kamper, R. J. (2002). "Extending the usability of heuristics for design and evaluation: Lead, follow, and get out of the way". International Journal of Human-Computer Interaction, 14(3&4), 447-462.

Karat, C. M., Campbell, R., & Fiegel, T. (1992). "Comparison of empirical testing and walkthrough methods in user interface evaluation". Proceedings of CHI '92 Conference on human factors in computing systems (pp. 397-404). New York: ACM.

Lewis, C. & Wharton, C. (1997). "Cognitive walkthroughs". In M. Helander, T. K. Landauer & P. Prabhu (Eds.), Handbook of human-computer interaction (2nd Ed.) (pp. 717-732). North-Holland: Elsevier.

Lewis, C., Polson, P., Wharton, C., & Rieman, J. (1990). "Testing a walkthrough methodology for theory-based design of walk-up-and-use interfaces". Proceedings of CHI '90 Conference on human factors in computing systems (Seattle, WA) (pp. 235-242). New York: ACM.

Mankoff, J., Dey, A. K., Hsieh, G., Kientz, J., Lederer, S., & Ames, M. (2003). "Heuristic evaluation of ambient displays". Proceedings of CHI '03 Conference on human factors in computing systems (Fort Lauderdale, FL) (pp. 169-176). New York: ACM.

Mills, S. (1995). "Usability problems of acoustical fishing aids". Displays, 16(3), 115-121.

Mills, S. & Araújo, M. M. T. (1998). "Learning from virtual reality – a lesson for a designer". HCI Letters, 1, 28-31.

Mills, S. & Araújo, M. M. T. (1999). "Learning through virtual reality: A preliminary investigation". Interacting with Computers, 11, 453-462.

Muller, M. J., McClard, A., Bell, B., Dooley, S., Meiskey, L., Meskill, J. A., Sparks, R., & Tellam, D. (1995). "Validating an extension to participatory heuristic evaluation: Quality of work and quality of work life". Proceedings of CHI '95 Conference on human factors in computing systems (pp. 115-116). New York: ACM.

Nielsen, J. (1992). "Finding usability problems through heuristic evaluation". Proceedings of CHI '92 Conference on human factors in computing systems (Monterey, CA) (pp. 373-380). New York: ACM.

Nielsen, J. (1993). Usability engineering. London: AP Professional.

Nielsen, J. (1994). "Enhancing the explanatory power of usability heuristics". Proceedings of CHI '94 Conference on human factors in computing systems (Boston, MA) (pp. 152-158). New York: ACM.

Nielsen, J. (2003). [Online]. Available: http://www.useit.com/papers/heuristic

Nielsen, J. & Mack, R. L. (1994). Usability inspection methods. New York: Wiley.

Nielsen, J. & Molich, R. (1990). "Heuristic evaluation of user interfaces". Proceedings of CHI '90 Conference on human factors in computing systems (Seattle, WA) (pp. 249-256). New York: ACM.

Noyes, J. M. & Mills, S. (1998). Display design for human-computer interaction. Cheltenham: Cheltenham & Gloucester College of Higher Education.

Parlangeli, O., Marchigiani, E., & Bagnara, S. (1999). "Multimedia systems in distance education: Effects of usability on learning". Interacting with Computers, 12(1), 37-49.

Robertson, P. (1996). Music and the mind. London: Channel 4 Television.

Rowley, D. E. & Rhoades, D. G. (1992). "The cognitive jogthrough: A fast-paced user interface evaluation procedure". Proceedings of CHI '92 Conference on human factors in computing systems (Monterey, CA) (pp. 389-395). New York: ACM.

Shneiderman, B. (1998). Designing the user interface: Strategies for effective human-computer interaction (3rd Ed.). Reading, MA: Addison Wesley Longman.

Smith, S. L. & Mosier, J. N. (1986). Guidelines for designing user interface software. Report ESD-TR-86-278, USAF Electronic Systems Center, Hanscom AFB, MA.

Virzi, R. A. (1992). "Refining the test phase of usability evaluation: How many subjects is enough?" Human Factors, 34, 457-468.

Virzi, R. A., Sorce, J. F., & Herbert, L. B. (1993). "A comparison of three usability evaluation methods: heuristic, think-aloud, and performance testing". Proceedings of the Human Factors and Ergonomics Society 37th Annual Meeting (pp. 309-313). Santa Monica, CA: The Human Factors and Ergonomics Society.

Zhang, J., Johnson, T. R., Patel, V. L., Paige, D. L., & Kubose, T. (2003). "Using usability heuristics to evaluate safety of medical devices". Journal of Biomedical Informatics, 36, 23-30.
