HCI Sem 1 2020/21 LU5


TME3423/TMI3053/TMS3843/TMT3683
Human Computer Interaction


Semester 1, 2020/21
Unit 5: Evaluation
Contents
 Introducing evaluation
 Who, why, what, where, when
 Types of evaluation
 Considerations
 Usability testing
 Heuristic evaluation
Introduction
 Evaluation is integral to the design process; it collects information about users'
experiences when interacting with the product

 Conducting evaluations involves


 Understanding why evaluation is important
 What aspects to evaluate
 Where evaluation should take place
 When to evaluate
Introduction
 There are many evaluation methods – choose which one to use based on the
goals of the evaluation

 Evaluation focuses on both


 Usability of the system (how easy it is to learn and use)
 User experience when interacting with it (how satisfying, enjoyable, motivating the
interaction is)
Introduction
 Usually involves observing participants and measuring their performance
 Usability testing
 Experiments
 Field studies

 There are also methods that do not involve participants:


 Modeling user behavior – approximating what users might do when interacting with
the product
 Analytics – examine the performance of an already existing product so that it can
be improved
Who is involved?
Participant: The person who interacts with or inspects the UI to give feedback. Could be an actual user or a representative.
Evaluator: Conducts the evaluation (the "interviewer" or "tester"). Plans the evaluation session, analyzes the data, and reports the findings.
Observer: Observes the participant. Makes notes of the participant's comments and any usability problems.
Facilitator: Manages the evaluation – explains the aims and describes how it will be conducted. Answers participants' questions during the session.

*The evaluator may also take on the roles of observer and facilitator


Who is involved?
 Who is the actual user?
 E.g. 1: Public information kiosk
 Actual users: general public, including tourists who do not speak BM
 Participants in evaluation can be
 Actual users, all of whom speak BM, for the first round
 Non-BM-speaking users for the second round

 E.g. 2: Safety-critical plant system

 Actual users: plant supervisors with experience
 Participants in evaluation:
 Actual users (domain knowledge of the actual users is important)
Why evaluate?
 Evaluation is done to
 assess whether the UI design is effective, efficient, engaging, error tolerant, and easy
to learn
 check whether there are any problems/difficulties, and find ways to improve the
design
 fix problems before the product goes on sale/is released to the user

 Basically, evaluation is done to check if the design is acceptable and


appropriate for the users
Why evaluate?
The 5E’s (the five dimensions of usability)
Effective: Completeness and accuracy with which users achieve their goals
Efficient: Speed with which users complete tasks
Engaging: Degree to which the product is pleasant or satisfying to use
Error tolerant: Prevents errors or helps with recovery from errors that occur
Easy to learn: Supports both initial orientation and deepening understanding of its capabilities
What to evaluate?
 Does the design do what the users need and want it to do?
 Items that can be evaluated include:
 Low-tech prototypes
 Complete systems
 A particular screen function
 The whole workflow
 Aesthetic design
 Technical features
 Examples:
 Game developers would want to know how engaging and fun the game is
 Web browser developers would want to know if users find items faster with their browser
 Government authorities would want to know whether the system controlling traffic lights results in
fewer accidents
 Etc.
What to evaluate?
 What type of data to collect?
 Quantitative, qualitative
 Refer to the data collection notes given earlier
 What tasks to ask the participants to try?
 What metric to use to measure?
 How to collect the data?
 What are the constraints?
What to evaluate - Data to collect
For each usability dimension, possible quantitative and qualitative data to collect per task:

Effective
  Quantitative: Task completed accurately
  Qualitative: User's views of whether the task was finished correctly or not

Efficient
  Quantitative: Counting clicks/keystrokes; analysis of navigation paths
  Qualitative: User's views of whether the task was finished correctly or not

Engaging
  Quantitative: Numeric measures of satisfaction
  Qualitative: User satisfaction surveys or qualitative interviews to gauge user acceptance and attitudes

Easy to learn
  Quantitative: Number of "false starts"; time spent in incorrect routes; time spent by experienced vs novice users to complete the task
  Qualitative: Novice users' reports about their level of confidence in using the interface

Error tolerant
  Quantitative: Level of accuracy achieved in tasks compared to time spent
  Qualitative: Users' reports of confidence in the interface even if they made mistakes
What to evaluate?
 Usability metrics are particular measurements e.g. percentages, timings,
numbers etc.

 Example:
 A user must be able to load a webpage in 10 seconds
 It should take not more than 2 minutes for an experienced user to enter a
customer’s details in the hotel’s database
 At least 4 out of 5 novices using the product must rate it as "easy to use" on a 5-point
scale (very easy to use, easy to use, neither easy nor difficult to use, difficult to use,
very difficult to use)
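A minimal sketch (Python; the task names, targets, and measurements are hypothetical) of how metric targets like those above could be checked against measured data:

targets = {
    "load_webpage_s": 10,             # page must load within 10 seconds
    "enter_customer_details_s": 120,  # experienced user, at most 2 minutes
}
measured = {
    "load_webpage_s": 7.4,
    "enter_customer_details_s": 95.0,
}
for task, limit in targets.items():
    status = "PASS" if measured[task] <= limit else "FAIL"
    print(f"{task}: {measured[task]}s vs target {limit}s -> {status}")

# The "4 out of 5 novices rate it easy to use" metric is a proportion:
ratings = ["easy", "very easy", "easy", "neutral", "easy"]  # hypothetical
easy = sum(r in ("easy", "very easy") for r in ratings)
print(f"rated easy to use: {easy}/{len(ratings)} ->",
      "PASS" if easy / len(ratings) >= 4 / 5 else "FAIL")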
What to evaluate?
 Usability metrics can be assigned levels

 Example – to perform a task,


 Current level of performance: user takes 4 minutes (benchmark)
 Best case: 2 minutes (level to be achieved)
 Planned level: 3 minutes (first version)
 Worst case: slightly better than 4 minutes (lowest acceptable level of user
performance)
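A minimal sketch (Python, reusing the example's numbers) of classifying a measured completion time against these levels:

BEST, PLANNED, BENCHMARK = 2.0, 3.0, 4.0  # minutes, from the example above

def classify(minutes):
    # Place a measured completion time against the defined levels.
    if minutes <= BEST:
        return "best case achieved"
    if minutes <= PLANNED:
        return "planned level met"
    if minutes < BENCHMARK:
        return "worst case: better than the 4-minute benchmark, lowest acceptable"
    return "unacceptable: no improvement over current performance"

print(classify(2.7))  # -> "planned level met"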
What to evaluate?
 Prepare task descriptions – tasks that participants will perform during the
evaluation

 Decide which tasks to administer and how many to evaluate – not feasible to
evaluate all

 Choose tasks that


 help in validating requirements, OR
 focus on particular features or usability concerns
What to evaluate?
 The possible tasks to consider:
 Core tasks that are frequently performed
 Tasks that are very important to the users or the business
 Tasks that have some new design features or functionality added
 Critical tasks, even if not frequently used
 Tasks that have to be validated with users for greater clarity and understanding
 Design features highlighted in marketing efforts
 Tasks in use scenarios

 It is convenient to use task cards – 5" x 3" cards with the task description written on them


What to evaluate?
 Constraints should be considered, as what you would like to do in an
evaluation will be affected by the constraints

 Constraints:
 Money (budget)
 Timescales
 Availability of equipment
 Availability of participants and the cost of recruiting them
 Availability of evaluators
What to evaluate?
 How to collect data?
 Timing – stopwatch, clock
 Logging – software
 Think aloud – encourage participants to talk about what they are trying to do
during evaluation
 Taking notes – write down comments and observations
 Retrospective protocol – re-running the events of the evaluation and asking the participant to
comment
 Recording technologies – video, audio, eye tracking
 Questionnaires
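For the logging method above, a minimal sketch of what such software might look like (Python; the class, event names, and file path are illustrative, not a real tool):

import csv
import time

class InteractionLogger:
    # Minimal logger: timestamps each user action for later analysis.
    def __init__(self):
        self.events = []

    def log(self, participant, event, detail=""):
        self.events.append((time.time(), participant, event, detail))

    def save(self, path):
        with open(path, "w", newline="") as f:
            csv.writer(f).writerows(self.events)

logger = InteractionLogger()
logger.log("P01", "task_start", "task 1: find a website")
logger.log("P01", "click", "search button")
logger.log("P01", "task_end", "task 1")
logger.save("session_P01.csv")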
What to evaluate?
 Examples:
 Web browser: if users find items faster?
 Ambient display: does it change user’s behaviour?
 Game app: how engaging and fun? How long does a user play?
 Computerized traffic lights: fewer accidents?
 Website: complies with accessibility requirements?
 Toy makers: can the child user manipulate the controls? Engagement? Is toy safe?
 Personal digital music player: size, colour, shape?
 Software: market reaction to new homepage design?
Where to evaluate?
 Where evaluation takes place depends on what is being evaluated

 Evaluation can take place in different places:


 Labs
 Living labs
 Living labs combine aspects of both: some of the controlled context of labs and the natural,
uncontrolled character of real-world settings
 Natural settings (also known as in-the-wild studies)
 User’s homes
 Outdoors
 Work place
Where to evaluate?
 Example:

What to evaluate – Where to evaluate
Web accessibility; size and layout of a smartphone – Labs
Children's enjoyment of playing, and how long before they get bored – Natural settings
Social networking and online behaviour – User's home
When to evaluate?
 When to evaluate depends on the type of product being evaluated
 Brand new product
 Upgrade of an existing product

 Brand new product:


 More time is usually invested in market research and establishing user
requirements
 These are used to create initial sketches, storyboards, series of screens, or a
prototype, which will then be evaluated
When to evaluate?
 Upgrade of an existing product:
 Evaluation focuses on what needs improving
 E.g. specific aspects such as enhanced navigation

 Formative vs. summative evaluation


 Formative evaluation: done during design to check if product meets user
requirements
 Summative evaluation: done to assess a finished product
Types of evaluation
Involving users
  Controlled settings: Users' activities are controlled in order to measure/observe certain behaviours. E.g. usability testing, experiments.
  Natural settings: Little or no control of users' activities, to determine how the product will be used in the real world. E.g. field studies.

Not involving users
  Consultants and researchers critique, predict, and model aspects of the UI to identify usability problems. E.g. inspections, heuristics, walkthroughs, models, analytics.
Types of evaluation – pros & cons

Lab-based studies – Pros: reveal usability problems. Cons: poor at capturing context of use.
Field studies – Pros: demonstrate how people use technology in its intended settings. Cons: expensive and difficult to conduct.
Modeling & predicting approaches – Pros: cheap and quick to perform. Cons: can miss unpredictable usability problems and subtle aspects of UX.
Types of evaluation: controlled settings
 Main methods: usability testing, experiments

 Controlled settings enable evaluators to:


 Control what users do
 When they do it
 For how long they do it
 Reduce outside influences and distractions

 Extensively and successfully used to evaluate software where participants can


be seated in front of a computing device to perform a set of tasks
Types of evaluation: controlled settings

Usability testing
• Primary goal: determine whether the UI is usable by the intended user population to carry out the tasks for which it was designed
• Investigates how typical users perform on typical tasks
• Uses a combination of observation, interviews, and questionnaires
• Findings are summarized in a usability specification
Types of evaluation: controlled settings

Experiments
• Typically conducted in research labs in universities or industry to test hypotheses
• Any extraneous variables that may interfere with participants' performance are removed
• A number of participants are brought into the lab to carry out a predefined set of tasks, and their performance is measured in specified terms
Types of evaluation: natural settings
 The main method used is field studies

 The main aim is to evaluate people in their natural settings

 Methods used to conduct this:


 Observation
 Interviews
 Focus groups
 Interaction logging
Types of evaluation: natural settings
 Field studies are used to:
 Help identify opportunities for new technology
 Establish requirements for new design
 Facilitate introduction of technology
 Inform deployment of existing technology in new contexts

 In field studies, a goal is to try to be unobtrusive and not to affect what


people do during the evaluation
Types of evaluation: not involving users
 The main methods used are
 Inspections
 Heuristics
 Walkthroughs
 Models
 Analytics

 Researchers/evaluators have to imagine or model how a UI is likely to be used, often
with the support of software
Usability testing
 Traditionally tested in controlled laboratory settings

 Evaluators can control


 what users do
 environmental and social influences that may impact user performance

 The goal: test whether the product being developed is usable by the intended
user population for which it was designed
Usability testing
 Collecting data about users' performance on predefined tasks is a central
component

 Methods to collect data:


 Video recordings
 Logged keystrokes and mouse movements
 Think aloud
 User satisfaction questionnaires
 Interviews
 Etc.
Usability testing
 Example of tasks given to user:
 Search for information
 Read different typefaces (fonts)
 Navigate through different menus
 Find a website
 Etc.

 Number of users to involve: 5-12 is typical, but 2-3 is also acceptable when there are
budget and schedule constraints
Usability testing
 Measures used are performance times and numbers, e.g.:
 Time taken to complete a task
 Time taken to complete a task after a specified time away from the product
 Number of errors made per task
 Type of errors made per task
 Number of errors made per unit of time
 Number of times users navigate to the online help/manual
 Number of users making a particular type of error
 Number of users completing a particular task successfully
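A minimal sketch (Python; participant data is invented) of how several of these measures could be computed from session records:

# Hypothetical per-participant records for one task.
results = [
    {"id": "P01", "time_s": 182, "errors": 1, "completed": True},
    {"id": "P02", "time_s": 240, "errors": 4, "completed": False},
    {"id": "P03", "time_s": 150, "errors": 0, "completed": True},
]

times = [r["time_s"] for r in results]
print("mean time to complete task (s):", round(sum(times) / len(times), 1))
print("errors per task:", {r["id"]: r["errors"] for r in results})
done = sum(r["completed"] for r in results)
print(f"users completing the task successfully: {done}/{len(results)}")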
Usability testing
 Labs and equipment
 Lab – proper lab/makeshift lab
 Equipment – items needed to carry out the testing

 Interpreting and presenting the data


 Write up the results as a report
 Make recommendations
Other types of evaluation involving users
 Conducting experiments (controlled settings)
 Test specific hypotheses that make a prediction about the way users will perform
with an interface
 A hypothesis involves examining a relationship between 2 variables
 Example: selecting options from context menus is easier than from cascading
menus
 Hypotheses are based on theory or previous research findings
 Experiments have to be designed to test the hypotheses
 Statistical tests are applied to the data collected
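For the menu hypothesis above, a minimal sketch of the statistical step (Python, assuming SciPy is available; all timings are invented for illustration):

from scipy import stats  # assumes SciPy is installed

# Invented selection times (seconds) for the two menu designs.
context_menu = [1.9, 2.1, 1.7, 2.4, 2.0, 1.8]
cascading_menu = [2.6, 3.1, 2.8, 2.5, 3.0, 2.7]

# Independent-samples t-test comparing the two conditions.
t, p = stats.ttest_ind(context_menu, cascading_menu)
print(f"t = {t:.2f}, p = {p:.4f}")
if p < 0.05:
    print("difference is statistically significant at the 5% level")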
Other types of evaluation involving users
 Evaluation conducted in natural settings: field studies/in the wild studies
 Little or no control imposed on participants’ activities
 Can range from a few minutes to days/months/years
 Data collection is done by observing, interviewing, collecting audio, video and field
notes
 Useful to discover how the product will be used within its intended social and
physical context of use
Heuristic evaluation
 Experts inspect the interface and role-play as a user to identify usability
problems and evaluate UI elements using a set of guidelines

 This set of guidelines is called heuristics or usability principles
 They resemble design principles

 The original heuristics were developed by Nielsen and colleagues in 1994,


from an analysis of 249 usability problems
Heuristic evaluation – Nielsen’s heuristics

Visibility of system status: The system should always keep users informed about what is going on through appropriate feedback within reasonable time.

Match between system and the real world: The system should speak the users' language, with words, phrases, and concepts familiar to the user, rather than system-oriented terms. Follow real-world conventions, making information appear in a natural and logical order.
Heuristic evaluation – Nielsen’s heuristics

User control and freedom: Users often choose system functions by mistake and will need a clearly marked exit to leave the unwanted state without having to go through an extended dialog. Support undo and redo.

Consistency and standards: Users should not have to wonder whether different words, situations, or actions mean the same thing. Follow platform conventions.
Heuristic evaluation – Nielsen’s heuristics

Error prevention: Even better than good error messages is a careful design that prevents a problem from occurring in the first place. Either eliminate error-prone conditions or check for them and present users with a confirmation option before they commit to the action.

Recognition rather than recall: Minimize the user's memory load by making objects, actions, and options visible. The user should not have to remember information from one part of the dialog to another. Instructions for use of the system should be visible or easily retrievable whenever appropriate.
Heuristic evaluation – Nielsen’s heuristics

Flexibility and efficiency of use: Accelerators – unseen by the novice user – may often speed up the interaction for the expert user such that the system can cater to both inexperienced and experienced users. Allow users to tailor frequent actions.

Aesthetic and minimalist design: Dialogs should not contain information that is irrelevant or rarely needed. Every extra unit of information in a dialog competes with the relevant units of information and diminishes their relative visibility.
Heuristic evaluation – Nielsen’s heuristics

Help users recognize, diagnose, and recover from errors: Error messages should be expressed in plain language (no codes), precisely indicate the problem, and constructively suggest a solution.

Help and documentation: Even though it is better if the system can be used without documentation, it may be necessary to provide help and documentation. Any such information should be easy to search, focused on the user's task, list concrete steps to be carried out, and not be too large.
Heuristic evaluation
 These heuristics are meant to be used by judging aspects of the interface
against them

 These heuristics may be too general for some products available today, so
Nielsen suggested developing category-specific heuristics that apply to a
specific class of products as a supplement to the general heuristics

 Exactly which heuristics and how many are needed depends on the goals of
the evaluation, but most sets have 5-10 items

 Recommended number of evaluators: 3-5
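A minimal sketch (Python; the evaluator IDs and findings are invented for illustration) of how reports from the recommended 3-5 evaluators could be collated per heuristic:

from collections import Counter

# Invented reports from three evaluators, each tagged with the
# Nielsen heuristic the problem violates.
findings = [
    ("E1", "Visibility of system status", "no progress indicator on upload"),
    ("E2", "Visibility of system status", "no progress indicator on upload"),
    ("E2", "Consistency and standards", "'OK' and 'Done' used interchangeably"),
    ("E3", "Error prevention", "delete has no confirmation step"),
]

per_heuristic = Counter(heuristic for _, heuristic, _ in findings)
for heuristic, count in per_heuristic.most_common():
    print(f"{heuristic}: {count} report(s)")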


Heuristic evaluation - mobile
 Prioritize content over UI elements
 Show users a variety of content right away

 Use of gestures
 Use gestures that are familiar to users

 Integration between the mobile Web and the app


 Easy transition from mobile Web to app (and back)
Heuristic evaluation - mobile
 Better use of phone features
 Take advantage of basic phone functionalities (e.g. using user’s location or existing
payment platforms)

 Fewer tutorials
 Replace lengthy tutorials with a general overview of the app

 Read more: https://www.nngroup.com/articles/state-mobile-ux/


Other types of evaluation not involving users
 Walkthroughs
 Walking through a task with the product and taking note of problematic usability
features
 Two types: cognitive and pluralistic
 Cognitive walkthroughs involve simulating a user's problem-solving process at each
step in the human-computer dialog, checking to see if the user's goals and memory
for actions can be assumed to lead to the next correct action
 Pluralistic walkthroughs involve users, developers, and usability experts working
together to step through a scenario, discussing the usability issues associated with
the elements involved in the scenario steps
Other types of evaluation not involving users
 Analytics
 A method for evaluating user traffic through a
system
 Can be collected by logging user interaction
activity, counting and analyzing the data in
order to understand what parts of the system
are being used and when
 To show how users are using the system,
where they are from, their behaviours etc.
 Analytics tools can be used for this purpose
 Examples: web analytics, Google Analytics
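A minimal sketch of the counting and analysis described above (Python; the log entries are invented):

from collections import Counter

# Invented log: one (user, page) pair per recorded visit.
visits = [
    ("u1", "/home"), ("u1", "/search"), ("u2", "/home"),
    ("u3", "/home"), ("u3", "/help"), ("u2", "/search"),
]

page_hits = Counter(page for _, page in visits)
print("most used parts of the system:", page_hits.most_common())
print("distinct users:", len({user for user, _ in visits}))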
Other types of evaluation not involving users
 Predictive models
 Use formulas to derive various measures of user performance
 Provide estimates of the efficiency of different systems for various kinds of task
 Two predictive models are influential in HCI: Fitts' Law and the GOMS family of
models (one common formulation is given below)
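For reference (not stated on the slide), one widely used Shannon formulation of Fitts' Law predicts the time MT to move to a target of width W at distance D, with constants a and b fitted empirically:

\[ MT = a + b \log_2\!\left(\frac{D}{W} + 1\right) \]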

 A/B testing
 A large-scale experiment to evaluate how two groups of users perform using two
different designs
 One for control condition
 The other for experimental condition (for the new design being tested)
 Involves hundreds or thousands of participants
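A minimal sketch of the statistical comparison behind an A/B test (Python, standard library only; the conversion counts are invented): a two-sided two-proportion z-test on the control and experimental conversion rates.

from math import erf, sqrt

# Invented conversion counts: control design A vs new design B.
conv_a, n_a = 120, 2400
conv_b, n_b = 156, 2400

p_a, p_b = conv_a / n_a, conv_b / n_b
pooled = (conv_a + conv_b) / (n_a + n_b)
se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / se
# Two-sided p-value from the standard normal CDF.
p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
print(f"A: {p_a:.1%}  B: {p_b:.1%}  z = {z:.2f}  p = {p_value:.4f}")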
Choosing & combining methods
 Combinations of methods across the three broad categories are often used, for
better understanding

 Weigh the pros and cons when choosing methods to use

 Example:
 Usability testing in labs + observations in natural settings
Considerations
 Considerations when carrying out evaluation:

 Participants’ rights
 Ensure participants are not endangered physically and emotionally, and their right
to privacy is protected

 Getting consent
 Participants have to be told what they will be asked to do, the conditions under
which the data will be collected, and what will happen to their data when they
finish their tasks; they can withdraw at any time if they wish
Considerations
Consent Form

I………………………………………agree to participate in [name]’s research study.

I give permission for my interview with [name] to be tape-recorded. I understand that I


can withdraw from the study, without repercussions, at any time, whether before it starts
or while I am participating.

(Please tick one box:)


I agree to quotation/publication of extracts from my interview 
I do not agree to quotation/publication of extracts from my interview 

Signed……………………………………. Date……………….
Considerations
 Non-disclosure agreements
 In cases when the product is proprietary/confidential in nature, ask participants to
keep what they learn in evaluation to themselves
 Cannot share with anyone or post on social media

 Things that can influence data interpretation


 Reliability – how well it produces the same results on separate occasions under the
same circumstances
 Validity – does it measure what it is intended to measure?
 Biases – are the results distorted?
 Scope – how much of the findings can be generalized?
 Ecological validity – how the environment in which an evaluation is conducted
influences/distorts the results
Considerations
 Children
 Get permission from their parents/legal guardians if they are below 18 years old

 Take into account biases or influences that may affect evaluation findings
Summary
 Introducing evaluation
 Who, why, what, where, when
 Types of evaluation
 Considerations
 Usability testing
 Heuristic evaluation
