Pradipta Biswas
Trinity College
December 2009
Declaration
I, Pradipta Biswas, declare that this thesis and the work presented in it are my own
and have been generated by me as the result of my own original research. I confirm
that this work was wholly done while in candidature for a research degree at this
University. No part of this thesis has previously been submitted for a degree or
any other qualification at this University or any other institution. Where I have
consulted the published work of others, this is always clearly attributed. Where I have
quoted from the work of others, the source is always given. With the exception of
such quotations, this thesis is entirely my own work. The thesis has approximately
33,000 words.
(Pradipta Biswas)
Dedicated
to
Rocky
Acknowledgement
I would like to thank the Gates Cambridge Trust for funding this work. I am indebted
to my supervisor Prof. Peter Robinson for his constant help and support in my
research. It would have been impossible to complete this research without his vision
and encouragement. I am also grateful to Dr. Alan Blackwell for his immense help in
conducting user trials and analyzing results. I would also like to thank Dr. Graham
Titmus and Dr. Neil Dodgson for always lending me a helping hand.
I am quite fortunate to be part of a very helpful research group and would like to
thank all the present and former members for their immense help and support.
Christian Richardt, Arnab Chakrabarti and Cecily Morrison need special mention for
proof reading my thesis chapters. I would also like to thank the staff and members of
the Computer Laboratory especially Mrs. Lise Gough, Mrs. Megan Sammons, Mrs.
Fiona Billingsley, Mrs. Jeniffer Underhill and Ms. Michelle Jeffery for their help on
many occasions. I am grateful to my college tutor Dr. Ali Alavi and secretary to the
tutor Mrs. Helene Sutton for their kind support. I am grateful to the volunteers at
Papworth Trust and University of Cambridge for taking part in my studies. I would
like to thank Ms. Adela Xu and Dr. Setor Knutsor of Papworth Trust for organizing
the user trials. Dr. Pattrick Langdon and his research group at Engineering Design
Centre were particularly helpful in sharing their research work. I would also like to
thank the following researchers for providing me with useful resources and technical
suggestions on many occasions: Dr. Daniel Bernhardt, Dr. William Billingsley, Dr.
Joy Goodman, Mrs. Margaret Hayden, Dr. Sean Holden, Dr. Gregory Hughes, Dr.
Faustina Hwang, Prof. David Kieras, Prof. Clayton Lewis, Prof. John Mollon, Prof.
Helen Pain, Prof. Gary Rubin, Dr. H. M. Shah, Dr. Metin Sezgin, Prof. Alistair
Sutcliffe, Dr. Shari Trewin, Dr. Phil Tuddenham, Dr. Sam Waller, Prof. Jacob
Wobbrock and Prof. Richard Young.
Last but not least, I would like to thank my numerous friends and well-wishers at
Cambridge, especially Moin Nizami, Shazia Afzal and Amitabha Roy, for both their
technical and moral support. Finally, I would like to express my gratitude to my
parents for always encouraging me to do well.
Abstract
The simulator consists of a perception model, a cognitive model and a motor behaviour
model. The perception model simulates the phenomena of visual perception (like
focussing and shifting attention) and can also simulate the effects of different visual
impairments on interaction. It has predicted the visual search time and eye gaze pattern of
able-bodied people and a few types of visually impaired users with statistically
significant accuracy. The cognitive model simulates expert performance using the
CPM-GOMS model, and can also simulate the performance of novices using a dual-space model.
The motor-behaviour model is based on statistical analysis of cursor traces from motor-
impaired users. As part of the model, I have also developed a new scale for characterizing
the extent of users' disability by measuring their grip strength. I have evaluated the
simulator through an icon searching task undertaken by visually and motor impaired
people and also used the simulator to develop a new assistive interaction technique. My
studies have already been used to design an accessible game and the University has been
awarded EU funding for a project that will build on results from my PhD research.
Table of contents
Chapter 1 Introduction 12
1.1. Background 12
Hypothesis 13
1.2. Proposed solution 14
1.3. Development methodology 15
1.4. Thesis Structure 17
1.5. Publications 18
2.1. Introduction 20
2.2. The GOMS family of models 22
2.3. Cognitive architectures 24
2.4. Grammar based models 26
2.5. Application specific models 26
2.6. Review 28
2.7. Objective 31
2.8. Architecture 32
2.9. Conclusion 33
3.1. Introduction 35
3.2. Related works 36
3.3. Design 39
3.4. Modelling visual impairments 41
3.4.1. Visual acuity loss 41
3.4.2. Contrast sensitivity loss 41
3.4.3. Macular Degeneration 41
3.4.4. Diabetic Retinopathy 44
3.4.5. Glaucoma 44
3.4.6. Colour blindness 45
3.4.7. Demonstrations 48
3.4.8. User interfaces of the model 50
3.5. Experiment to collect eye tracking data 52
3.5.1. Process 52
3.5.2. Material 53
3.5.3. Participants 54
3.5.4. Calibration for predicting fixation duration 55
3.5.5. Calibration for predicting eye movement patterns 58
3.6. Validation 61
3.7. Discussion 69
3.8. Conclusion 71
4.1. Introduction 72
4.2. Design 73
4.2.1. Learning 75
4.2.2. User interfaces 76
4.3. Case studies 76
4.3.1. Study 1- Modelling simple icon manipulation operations 81
4.3.2. Study 2- A cognitive model for eight-directional scanning 83
4.3.3. Study 3- Modelling interaction for a novel interface 91
4.4. Conclusion 95
Chapter 5 Motor behaviour model 96
5.1. Introduction 96
5.2. Related work 97
5.3. Design 98
5.3.1. User interfaces of the model 101
5.4. Pilot study 102
5.5. Confirmatory study 107
5.5.1. Process 108
5.5.2. Material 108
5.5.3. Participants 112
5.5.4. Results 114
5.5.5. Calibration 118
5.5.6. Validation 120
5.6. Effect of hand strength for able-bodied users 123
5.6.1. Process 124
5.6.2. Material 124
5.6.3. Participants 124
5.6.4. Results 124
5.7. Discussion 126
5.8. Conclusions 129
6.3.3. Participants 135
6.3.4. Simulation 136
6.3.5. Results 137
6.3.6. Discussion 145
6.4. The Cluster scanning system 146
6.4.1. Related work 147
6.4.2. Current scanning systems 150
6.4.3. The cluster scanning system 154
6.4.4. Evaluation through simulation 159
6.4.5. Validation of the result 163
6.5. Conclusion 169
Appendix 184
Bibliography 188
List of Figures
Figures Page No.
5-15. Average number of Pauses per pointing task vs. Active range of ROM of Forearm 116
5-16. Average number of Pauses per pointing task in different phases of movement vs. Grip Strength (SMNS: Sub-Movement Near Source, SMIM: Sub-Movement in Middle, SMNE: Sub-Movement Near End) 117
5-17. Velocity of Movement vs. Grip Strength 117
5-18. Scatter plot of prediction 121
5-19. Percentage error of prediction 121
5-20. Scatter plot between actual and predicted task completion times 122
5-21. Error Plot 123
5-22. Index of Performance vs. Grip Strength 125
5-23. Index of Performance vs. Tip Pinch Strength 125
5-24. Parameter b vs. Tip Pinch Strength 126
6-1. Sequence of operations in the simulator 131
6-2. Corpus of Shapes 133
6-3. Corpus of Icons 133
6-4. Sample screenshot of the study 134
6-5. Scatter plot between actual and predicted task completion time 138
6-6. Relative error in prediction 138
6-7. Effect size comparison in ANOVA 142
6-8. Effect size comparison in MANOVA 142
6-9. Effect of Font size in different user groups 143
6-10. Effect of Spacing in different user groups 144
6-11. Effect of medium Spacing in motor impaired users 144
6-12. Effect of medium Font size in motor impaired users 145
6-13. The eight-directional Scanning System 151
6-14. The Block Scanning System 152
6-15. The Cluster Scanning System 153
6-16. Screenshot of the demonstration for scanning interfaces 154
6-17. Variation of T w.r.t. the number of clusters 158
6-18. Performance Comparison of Different Scanning Systems 160
6-19. Comparing Cluster Scanning and Block Scanning for tasks using and not using the Internet 162
6-20. Task completion times for the scanning systems 165
6-21. Comparing the scanning systems 168
7-1. Timescale of human action (adapted from [Newell, 1990]) 174
List of Tables
Not only do physically disabled people have experiences which are not available to the able-
bodied, they are in a better position to transcend cultural mythologies about the body,
because they cannot do things the able-bodied feel they must do in order to be happy,
'normal,' and sane. If disabled people were truly heard, an explosion of knowledge of the
human body and psyche would take place.
-Susan Wendell, from her book “The Rejected Body: Feminist Philosophical Reflections on
Disability”, 1996
1.1. Background
The World Health Organisation (WHO) states that the number of people aged 60 and
over will be 1.2 billion by 2025 and 2 billion by 2050 [WHO website, 2009]. The very
old (aged 80+) are the fastest-growing population group in the developed world. Many
of these elderly people have disabilities which make it difficult for them to use
computers. The definition of the term ‘disability’ differs across countries and
cultures, but the World Bank estimates that 10-12% of the population worldwide
have a condition that inhibits their use of standard computer systems [World Bank
website, 2009]. The Americans with Disabilities Act (ADA) in the USA and the
Disability Discrimination Act (DDA) in the UK prohibit discrimination between
able-bodied and disabled people with respect to education, services, and employment.
There are also ethical and social reasons for designing products and services for this
vast population. In particular, computers offer valuable assistance to people with
physical disabilities and help to improve their quality of life. However, the diverse
range of abilities complicates the design of human-computer interfaces for these
users. Many inclusive or assistive software systems address only a specific class of
user and still exclude many others. A lack of knowledge about the problems of disabled
and elderly users has often led designers to develop non-inclusive systems. There are
guidelines for designing accessible systems, particularly accessible websites [Web
Content Accessibility Guidelines, 2008], but designers often do not conform to the
guidelines when developing new systems. Additionally, the guidelines are not always
adequate to analyze the effects of impairment on interaction with devices.
Evaluation of assistive interfaces can be even more difficult than their design.
Assistive interfaces are generally evaluated by analysing log files after a user trial
[Lesher and colleagues, 2000; O’Neill, Roast and Hawley, 2000; Hill and Romich,
2007]. As an example of a different approach, Rizzo and colleagues [1997] evaluated
the AVANTI project [Stephanidis, 1998] by a technique combining cognitive
walkthrough and Norman's seven-stage model [Shneiderman, 2001]. However, it is
often difficult to find participants with specific disabilities to conduct a user trial.
Petrie and colleagues [2006] take the approach of remote evaluation. This evaluation
technique does not require participants to be brought into a laboratory and can
increase the sample size of a study. However, it still does not avoid the need to find
disabled participants, nor does it replace controlled experiments.
Hypothesis
A modelling tool for people with disabilities is particularly important, as user trials
are often difficult and time consuming. In different domains of science, simulation
and modelling have already been found effective for explaining and augmenting
existing theories. However, very few human computer interaction (HCI) models have
considered users with disabilities. I take a novel approach to the design and
evaluation of assistive interfaces by simulating interaction patterns of users with and
without disabilities. I hypothesize that
I have investigated how the physical capabilities of users with a wide range of abilities
are reflected in their interactions with digital devices, and modelled their interaction
patterns. In my work, I have tried to answer the following questions:
o How can we predict completion time of representative tasks for people with a
wide range of abilities?
o How do different physical impairments affect human computer interaction?
In particular,
o How does visual impairment affect visual searching in a computer
screen?
o How does mobility impairment affect pointing using different input
devices?
o How can we relate physical characteristics of users with the simulation
parameters?
o How effective will a simulation be in designing and evaluating interfaces for
people with a diverse range of abilities?
Figure 1-1 shows the use of my simulator. I aim to help evaluate existing systems and
different design alternatives with respect to many types of disability. The evaluation
process would be used to select a particular interface, which can then be validated by
a formal user trial. The user trials also provide feedback to the models to increase
their accuracy. As not every alternative design needs to be evaluated by a user
trial, this approach should reduce development time significantly.
Figure 1-1. Use of the simulator (diagram: prototype and existing systems are evaluated through simulation of interaction patterns; the best alternative is validated by user testing of new systems, and the user trials feed back into design, calibration and validation of the models)
The type of the model (e.g. neural network, linear regression model and so on)
(Diagram: the development methodology; the design stage yields a basic framework, calibration through user studies yields a calibrated model, and validation yields a validated model.)
Exploratory models emphasize explaining facts while validated models are used for
prediction. I also followed this general principle in developing models of human
performance. I calibrated the models through a series of user studies and later
validated them by controlled experiments.
The perception model explores the principles of eye gaze fixation and eye movement
trajectories of able-bodied people and people with visual impairment. In Chapter 3, I
present the design of the model followed by an experiment to calibrate and validate
the model.
The cognitive model simulates intentions of users based on their prior knowledge and
the task. I also present three case studies of using the cognitive model in Chapter 4.
The motor behaviour model explores how hand strength is reflected in pointing
performance and predicts movement time of pointing tasks undertaken by motor
impaired participants. In Chapter 5, I present the design of the model followed by
pilot and confirmatory studies to calibrate and validate the model. I have also
explored how hand strength affects pointing performance of able-bodied users.
1.5. Publications
Some portions of the dissertation have appeared in the following publications:
8. P. Biswas, Simulating HCI for all, CHI Extended Abstracts 2008: pp. 2649-
2652.
9. P. Biswas and P. Robinson, Simulation to Predict Performance of Assistive
Interfaces, Proceedings of the 9th International ACM SIGACCESS
Conference on Computers and Accessibility (ASSETS’07) pp. 827-828.
10. P. Biswas, Simulating HCI for Special Needs, ACM SIGACCESS Newsletter,
Issue 89, Sept 2007, pp. 7-10.
11. P. Biswas and P. Robinson, Modelling Perception using Image Processing
Algorithms, 23rd British Computer Society Conference on Human-Computer
Interaction (HCI 09).
12. P. Biswas and P. Robinson, Predicting Pointing Time from Hand Strength,
USAB 2009, LNCS 5889, pp. 428–447.
13. P. Biswas, T. M. Sezgin and P. Robinson, Perception model for people with
visual impairments, Proceedings of the 10th International Conference on
Visual Information Systems (VISUAL 2008), LNCS 5188, pp. 279-290, 2008.
14. P. Biswas and P. Robinson, Performance Comparison of Different Scanning
System using a Simulator, Proceedings of the 9th European Conference of
Advancement of Assistive Technology in Europe (AAATE ‘07), pp. 873-877.
15. P. Biswas and P. Robinson, Effects of Physical Capabilities on Interaction,
Workshop: Defining the Architecture for Next Generation Inclusive
Television in EuroITV 2009.
16. P. Biswas and P. Robinson, Modelling user interfaces for special needs,
Accessible Design in the Digital World (ADDW) 2008.
17. P. Biswas and P. Robinson, A Motor-Behaviour Model for Physically
Challenged Users, Cambridge Workshop on Universal Access and Assistive
Technology, Cambridge, April 2008, pp 5-9, ISSN 0963-5432.
18. P. Biswas and P. Robinson, Simulating HCI for all, Proceedings of the IET
Conference on Recent Advances in Assistive Technology and Engineering
(RAATE 2007).
Chapter 2 Literature survey
Computer simulation is not the deception of others but the construction of "a suitably
analogous apparatus" on a computer.
-Hans-Jürgen Eikmeyer and Ulrich Schade, from their paper, "The Role of Computer
Simulation in Neurolinguistics", Nordic Journal of Linguistics, 16, 1993
2.1. Introduction
decompose an interaction task and gave a conceptual view of the interface before its
implementation. However, it completely ignored the human aspect of the interaction
and did not model the capabilities and limitations of users. Card, Moran and Newell’s
Model Human Processor (MHP) [Card, Moran and Newell, 1983] was an important
milestone in modelling HCI since it introduced the concept of simulating HCI from
the perspective of users. It gave birth to the GOMS family of models [Card, Moran
and Newell, 1983] that are still the most popular modelling tools in HCI.
There is another kind of model for simulating human behaviour that not only works
for HCI but also aims to establish a unified theory of cognition. These types of models
originated from the earlier work of computational psychologists. Allen Newell
pioneered the idea of unifying existing theories in cognition in his famous paper “You
can’t play 20 questions with nature and win” at the 1973 Carnegie Symposium
[Newell, 1973]. Since then, a plethora of systems have been developed that are termed
as cognitive architectures and they simulate the results of different experiments
conducted in psychological laboratories. Since these models are capable (or at least
claimed to be capable) of simulating any type of user behaviour, they are also often
used to simulate the behaviour of users while interacting with a computer. Gray and
colleagues [1997] assert that cognitive architectures ensure the development of
consistent models over a range of behavioural phenomena due to their rigorous
theoretical basis.
So there are two main approaches to user modelling: the GOMS family of models was
developed only for HCI while the models involving cognitive architectures took a
more detailed view of human cognition. Based on the accuracy, detail and
completeness of these models, Kieras [2005] classified them as low fidelity and high
fidelity models respectively. These two types of model can be roughly mapped to two
different types of knowledge representation. The GOMS family of models is based on
goal-action pairs and corresponds to the Sequence/Method representation while
cognitive architectures aim to represent the users’ mental model [Carroll and Olson,
1990]. The Sequence/Method representation assumes that all interactions consist of a
sequence of operations or generalized methods, while the mental model representation
assumes that users have an underlying model of the whole system.
There is a third kind of model in HCI that evaluates an interface by predicting users’
expectations, rather than their performance (e.g. Task Action Language [Reisner,
1981], Task Action Grammar [Payne and Green, 1986] etc.). These models represent
an interaction by using formal grammar where each action is modelled by a sentence.
They can be used to compare users’ performance based on standard sentence
complexity measures; however, they have not yet been used and tested extensively for
simulating users’ behaviour [Carroll and Olson, 1990].
In the following sections, I briefly describe these different types of user model. Then,
I present a critical review of existing models and set out the objectives of this
research.
The KLM model [Keystroke Level Model, Card, Moran and Newell, 1983] simplifies
the GOMS model by eliminating the goals, methods, and selection rules, leaving only
six primitive operators. They are:
1. Pressing a key
2. Moving the pointing device to a specific location
3. Making pointer drag movements
4. Performing mental preparation
5. Moving hands to appropriate locations, and
6. Waiting for the computer to execute a command.
The durations of these six operators have been determined empirically. The task
completion time is predicted from the number of times each type of operator must
occur to accomplish the task.
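The prediction itself is a simple sum. As a minimal sketch (the operator letters are the conventional KLM mnemonics, and the durations are the commonly cited textbook estimates rather than values from this thesis):

```python
# Illustrative KLM operator durations in seconds; these are the
# commonly cited textbook estimates, not values from this thesis.
KLM_OPERATORS = {
    "K": 0.28,  # pressing a key
    "P": 1.10,  # moving the pointing device to a target
    "B": 0.10,  # pressing or releasing a pointer button (drag component)
    "M": 1.35,  # mental preparation
    "H": 0.40,  # moving hands between keyboard and pointing device
    "R": 0.00,  # waiting for the system to respond; set per task
}

def klm_time(sequence):
    """Predict task completion time as the sum of operator durations."""
    return sum(KLM_OPERATORS[op] for op in sequence)

# Home to the mouse, think, point at an icon and double-click:
print(round(klm_time(["H", "M", "P", "B", "B"]), 2))  # 3.05
```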
John and Kieras [1996] proposed a new version of the GOMS model, called CPM-
GOMS, to explore the parallelism in users’ actions. This model decomposes a task
into an activity network (instead of a serial stream) of basic operations (as defined by
KLM) and predicts the task completion time based on the Critical Path Method.
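The critical path of such an activity network is its longest chain of dependent operators. A small sketch of the idea (the operator names and durations below are invented for illustration, not taken from the thesis):

```python
def critical_path(durations, deps):
    """Length of the longest path through an activity network.
    durations: {operator: seconds}; deps: {operator: prerequisites}."""
    finish = {}

    def finish_time(op):
        # Earliest finish = earliest start (after all prerequisites)
        # plus the operator's own duration.
        if op not in finish:
            earliest = max((finish_time(p) for p in deps.get(op, [])),
                           default=0.0)
            finish[op] = earliest + durations[op]
        return finish[op]

    return max(finish_time(op) for op in durations)

# Perceptual, cognitive and motor operators overlapping in time:
durations = {"perceive": 0.10, "cognize": 0.05, "verify": 0.05,
             "move-cursor": 0.55, "click": 0.10}
deps = {"cognize": ["perceive"], "verify": ["perceive"],
        "move-cursor": ["cognize"], "click": ["move-cursor", "verify"]}
print(round(critical_path(durations, deps), 2))  # 0.8
```

Operators off the critical path (here, "verify") add nothing to the predicted time, which is how CPM-GOMS captures parallelism.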
The ACT-R system [Adaptive Control of Thought-Rational, Anderson and Lebiere, 1998] does
not follow the pure symbolic modelling strategy of SOAR; rather, it was developed
as a hybrid model with both symbolic and sub-symbolic levels of processing. At
the symbolic level, ACT-R operates as a rule-based system. It divides the long-term
memory into declarative and procedural memory. Declarative memory is used to store
facts in the form of ‘chunks’ and the procedural memory stores production rules. The
system works to achieve a goal by firing appropriate productions from the production
memory and retrieving relevant facts from the declarative memory. However the
variability of human behaviour is modelled at the sub-symbolic level. The long-term
memory is implemented as a semantic network. Calculation of the retrieval time of a
fact and conflict resolution among rules is done based on the activation values of the
nodes and links of the semantic network.
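This sub-symbolic computation can be sketched with the usual ACT-R activation and retrieval-latency equations (A = B plus weighted associative strengths; T = F * exp(-A)); the parameter values below are illustrative, not those of any particular model:

```python
import math

def activation(base_level, sources):
    """Chunk activation: base-level activation plus spreading
    activation from associated source chunks (weight, strength pairs)."""
    return base_level + sum(w * s for w, s in sources)

def retrieval_time(act, latency_factor=1.0):
    """Standard ACT-R retrieval latency: T = F * exp(-A)."""
    return latency_factor * math.exp(-act)

# A well-practised fact (high activation) is retrieved faster than a
# weakly activated one:
strong = activation(1.5, [(0.5, 1.0)])   # base 1.5 + spread 0.5 = 2.0
weak = activation(0.2, [(0.5, 0.4)])     # base 0.2 + spread 0.2 = 0.4
print(retrieval_time(strong) < retrieval_time(weak))  # True
```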
The Generative User Model [Motomura, Yoshida and Fujimoto, 2000] was developed
for personalized information retrieval. In this model, input query words are related to
the user's mental state and retrieved objects through latent probabilistic variables.
Norcio [1989] used fuzzy logic to classify users of an intelligent tutoring system. The
fuzzy groups are used to derive certain characteristics of users, and thus to derive new
rules for each class of user.
Norcio and Chen [1992] also used an artificial neural network for the same purpose as
their previous work [Norcio, 1989]. In their model, users’ characteristics are stored as
an image and neural networks are used to find patterns in users’ knowledge, goals and
so on.
The Lumiere project [Horovitz and colleagues, 2008] used influence diagrams in
modelling users. The Lumiere project provides the background theory of the Office
Assistant shipped with Microsoft Office applications. The influence diagram models
the relationships among users' needs, goals, background and so on. However, all these
models were developed with only a single application in mind, and so they are
hardly usable for modelling human performance in general.
2.6. Review
The GOMS family of models is mainly suitable for modelling the optimal behaviour
(skilled behaviour) of users [John and Kieras, 1996]. These models assume that for
each instance of a task execution, the goal and the plan of a user are determined
before the execution is started. During execution of a task, a novice first time user or a
knowledgeable intermittent user may not have a fixed plan beforehand and can even
change goals (or subgoals) during execution of the task. Even expert users do not
follow a fixed sequence of actions every time. So the assumptions of the GOMS
model may not hold true for many real life interactions. In actuality, these models do
not have probabilistic components beyond the feature of selecting the execution time
of primitive operators from a statistical distribution in order to model the uncertainty
involved in the sub-optimal behaviour of users. As they fail to model sub-optimal
behaviour, these models cannot be used to predict the occurrence of different errors
during interaction. These problems are common to any Sequence/Method representation,
since such representations overlook the underlying mental models of users
[Carroll and Olson, 1990].
On the other hand, cognitive architectures model the uncertainty of human behaviour
in detail, but they are not easily accessible to non-psychologists; this causes
problems, as interface designers are rarely psychologists. For example, the ACT-R
architecture models the contents of long-term memory in the form of a semantic
network, but it is very difficult for an interface designer to develop a semantic
network of the related concepts of a moderately complex interface. Developing a
sequence of production rules for SOAR or a set of constraints for CORE is equally
difficult. The usability problems of cognitive architectures are also evidenced
by the development of the X-PRT system [Tollinger and colleagues, 2005] for the
CORE architecture. Additionally, Kieras [2005] has shown that a high fidelity model
cannot always outperform a low fidelity one though it is expected to do so.
Researchers have already attempted to combine the GOMS family of models and
cognitive architectures to develop more usable and accurate models. Salvucci and Lee
[2003] developed the ACT-Simple model by translating basic GOMS operations
(such as move hand, move mouse, press key) into ACT-R production rules. However
they do not model the ‘think’ operator in detail, which corresponds to the thinking
action of users and differentiates novices from experts. The model works well in
predicting expert performance but does not work for novices.
Blandford and colleagues [2004] implemented the Programmable User Model (PUM)
[Young, Green and Simon, 1989] by using the SOAR architecture. They developed a
program, STILE (SOAR Translation from Instruction Language made Easy), to
convert the PUM Instruction Language into SOAR productions. However, this
approach also demands good knowledge of SOAR on the part of an interface
designer. Later, the PUM team identified additional problems with runnable user
models and they are now investigating abstract mathematical models [Butterworth
and Blandford, 2007].
There also exist some application specific models that combine GOMS models with a
cognitive architecture. For example, Gray and Sabnani [1994] combined GOMS with
ACT-R to model a VCR programming task, while Peck and John [1992] used SOAR
to model interaction with a help-browser, which ultimately turned out to be a GOMS
model.
not address the basic perceptual, cognitive and motor behaviour of users and so it is
hard to generalize to other applications.
Keates and colleagues [2000] measured the difference between able-bodied and motor
impaired users with respect to the Model Human Processor (MHP) [Card, Moran and
Newell, 1983] and motor impaired users were found to have a greater motor action
time than their able-bodied counterparts. The finding is obviously important, but the
KLM model itself is too primitive to model complex interaction and especially the
performance of novice users.
My previous user model [Biswas and colleagues, 2005] also took a more generalized
approach than the AVANTI project. It broke down the task of user modelling into
several steps that included clustering users based on their physical and cognitive
ability, customizing interfaces based on user characteristics and logging user
interactions to update the model itself. However the objective of this model was to
design adaptable interfaces and not to simulate users’ performance.
2.7. Objective
Based on the previous discussion, Figure 2-2 plots the existing general purpose HCI
models in a space defined by the skill and physical ability of users. To cover most of
the blank spaces in the diagram, I set my objectives to develop models that can:
(Diagram: CPM-GOMS, EPIC, SOAR, ACT-R and CORE plotted on axes of skill level, from novice upwards, and physical ability, from disabled to able-bodied.)
Figure 2-2. Existing HCI models w.r.t. skill and physical ability of users
2.8. Architecture
In light of my objective, I have developed the simulator as shown in Figure 2-3. It
consists of the following three components:
The Application model represents the task currently undertaken by the user by
breaking it up into a set of simple atomic tasks, following the KLM model [Card, Moran
and Newell, 1983].
The Interface model decides the type of input and output devices to be used by a
particular user and sets parameters for an interface.
The User model simulates the interaction patterns of users for undertaking a task
analysed by the task model under the configuration set by the interface model. It uses
the sequence of phases defined by Model Human Processor [Card, Moran and Newell,
1983].
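The pipeline of these three components can be sketched as below. The class names, method names and timing constants here are my own illustration of the data flow, not the simulator's actual code:

```python
class ApplicationModel:
    """Breaks the current task into a list of atomic KLM-style operations."""
    def decompose(self, task):
        return task["operations"]

class InterfaceModel:
    """Holds the device choice and interface parameters for a user."""
    def __init__(self, input_device="mouse", font_size=12):
        self.input_device = input_device
        self.font_size = font_size

class UserModel:
    """Runs each operation through perception, cognition and motor
    phases, mirroring the Model Human Processor sequence."""
    def simulate(self, operations, interface):
        total = 0.0
        for op in operations:
            total += self.perceive(op, interface)   # perception model
            total += self.cognize(op)               # cognitive model
            total += self.act(op, interface)        # motor behaviour model
        return total

    def perceive(self, op, interface):
        return 0.10   # placeholder visual-search time

    def cognize(self, op):
        return 0.05   # placeholder decision time

    def act(self, op, interface):
        return 0.60   # placeholder pointing/keying time

task = {"operations": ["point-at-icon", "click"]}
app, ui, user = ApplicationModel(), InterfaceModel(), UserModel()
print(round(user.simulate(app.decompose(task), ui), 2))  # 1.5
```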
Figure 2-3. Architecture of the simulator (diagram: a display model feeds the perception model, a task model feeds the cognitive model, and an input model feeds the motor behaviour model)
2.9. Conclusion
In this chapter I have presented a literature survey on human behaviour simulation
and its applications to modelling users in human-computer interaction. The review of
the current state-of-the-art work shows a deficiency of modelling tools for users with
Chapter 3 Perception model
We can now begin to develop a science of graphic design based on a scientific understanding
of visual attention and pattern perception.
-Colin Ware, from his book “Visual Thinking: For Design”, 2008
3.1. Introduction
In the next section I present a review of the existing perception models. In the
following sections I discuss the design, calibration and validation of the model.
Finally I make a comparative analysis of my model with other approaches and
conclude by exploring possibilities for further research.
Feature extraction: As the name suggests, in this step the image is analysed
to extract different features such as colour, edge, shape, curvature and so on.
This step mimics neural processing in the V1 region of the brain [Tovee,
2008].
Object recognition: The grouped features are compared to known objects and
the closest match is chosen as the output.
In these three steps, the first step models the bottom up theory of attention while the
last two steps are guided by top down theories. All of these models aim to recognize
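As a toy illustration of the feature-extraction step, the sketch below computes a crude edge map from grey-level differences on a small image; it is only in the spirit of such models, not code from any of them:

```python
def edge_magnitude(img):
    """Approximate edge strength at each interior pixel as the sum of
    absolute horizontal and vertical grey-level differences."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            out[y][x] = abs(gx) + abs(gy)
    return out

# A bright square on a dark background: the edge map responds only
# around the square's boundary, much as V1 cells respond to contours.
img = [[0] * 6 for _ in range(6)]
for y in range(2, 4):
    for x in range(2, 4):
        img[y][x] = 255
edges = edge_magnitude(img)
print(edges[2][1], edges[3][3])  # 255 510
```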
objects from a background picture and some of them have been proved successful at
recognizing simple objects (such as mechanical instruments). However, they have not
demonstrated such good performance at recognizing arbitrary objects [Rosandich,
1997]. These early models do not operate at a detailed neurological level. Itti and
Koch [2001] present a review of computational models, which try to explain vision at
the neurological level. Itti’s pure bottom up model [Itti and Koch, 2001] even worked
in some natural environments, but most of these models are used to explain the
underlying phenomena of vision (mainly the bottom up theories) rather than
prediction. As an example of a predictive model, the VDP model [Daly, 1993] uses
image processing algorithms to predict retinal sensitivity for different levels of
luminance, contrast and so on. Privitera and Stark [2000] also used different image processing algorithms to identify points of fixation in natural scenes; however, they do not have an explicit model to predict the eye movement trajectory.
In the field of human computer interaction, the EPIC [Kieras and Meyer, 1990] and
ACT-R [Anderson and Lebiere, 1998] cognitive architectures have been used to
develop perception models for menu searching and icon searching tasks. Both the
EPIC and ACT-R models [Hornof and Kieras, 1997; Byrne, 2001] were used to explain the results of Nielsen's experiment on searching menu items [Nielsen, 1992], and found that users search through a menu list in both systematic and random ways. The
ACT-R model has also been used to find out the characteristics of a good icon in the
context of an icon searching task [Fleetwood and Byrne, 2002; 2006]. However, the
cognitive architectures emphasize modeling human cognition and so the perception
and motor modules in these systems are not as well developed as the remainder of the
system. The working principles of the perception models in EPIC and ACT-R/PM are
simpler than the earlier general purpose computational models of vision. These
models do not use any image processing algorithms [Hornof and Kieras, 1997;
Fleetwood and Byrne, 2002; 2006]. The features of the target objects are manually fed
into the system and they are manipulated by handcrafted rules in a rule-based system.
As a result, these models do not scale well to general purpose interaction tasks. It will
be hard to model the basic features and perceptual similarities of complex screen
objects using propositional clauses. Modelling of visual impairment is particularly
difficult using these models. For example, an object appears blurred on a continuous scale for different degrees of visual acuity loss, and this continuous scale is hard to model using propositional clauses in ACT-R or EPIC. Shah and colleagues [2003]
have proposed the use of image processing algorithms in a cognitive model, but they
have not published any result about the predictive power of their model yet.
In this chapter, I present a perception model that uses image processing algorithms to quantify the perceptual similarities of objects, and that I have calibrated for predicting eye movements. The calibrated model can predict the visual search time for two different visual search tasks with significant accuracy for both able-bodied and visually impaired people.
3.3. Design
The perception model takes a list of mouse events, a sequence of bitmap images of an
interface and locations of different objects in the interface as input, and produces a
sequence of eye movements as output. The model is controlled by four free
parameters: distance of the user from the screen, foveal angle, parafoveal angle and
periphery angle (Figure 3-1). The default values of these parameters are set according
to the EPIC architecture [Kieras and Meyer, 1990].
The model simulates the coverage of peripheral vision with a focus rectangle, whose size is determined by the viewing distance and the periphery angle (distance × tan(periphery angle / 2), Figure 3-1). If the focus rectangle contains more than one probable target then it shrinks in size to investigate each individual item. Similarly, in a sparse area of the screen, the focus rectangle increases in size to reduce the number of attention shifts.
The model scans the whole screen by dividing it into several focus rectangles, one of
which should contain the actual target. The probable points of attention fixation are
calculated by evaluating the similarity of other focus rectangles to the one containing
the target. We know which focus rectangle contains the target from the list of mouse
events that was input to the system. The similarity is measured by decomposing each
focus rectangle into a set of features (colour, edge, shape and so on) and then
comparing the values of these features. The focus rectangles are aligned with respect
to the objects within them during comparison. Finally, the model shifts attention by
combining different eye movement strategies (such as Nearest [Findlay, 1992; 1997],
Systematic, Cluster [Fleetwood and Byrne, 2002; 2006] and so on), which are
discussed later.
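As an illustration of the similarity measurement, the sketch below compares two focus-rectangle patches by colour-histogram intersection. It is a minimal stand-in for the full feature set described above (which also includes edge and shape features); the 4-bin channel quantisation and the representation of a patch as a flat list of RGB tuples are assumptions made purely for the sketch.

```python
from collections import Counter

def colour_histogram(pixels, bins=4):
    """Quantise each RGB channel into `bins` levels and count occurrences."""
    step = 256 // bins
    return Counter((r // step, g // step, b // step) for r, g, b in pixels)

def histogram_similarity(patch_a, patch_b, bins=4):
    """Histogram intersection between two patches, normalised to [0, 1]."""
    ha, hb = colour_histogram(patch_a, bins), colour_histogram(patch_b, bins)
    overlap = sum(min(count, hb[key]) for key, count in ha.items())
    return overlap / max(len(patch_a), len(patch_b))
```

A patch identical to the target scores 1.0, while a patch sharing no quantised colours scores 0.0; intermediate scores drive the choice of probable fixation points.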
The model can also simulate the effect of visual impairment on interaction by
modifying the input bitmap images according to the nature of the impairment (such as
blurring for visual acuity loss, changing colours for colour blindness). I discuss the
modelling of visual impairment in the next section. Following that, I present the
calibration and validation of the model using an eye gaze tracking experiment.
Visual Acuity is the sensitivity of the visual interpretative mechanism of the brain. It
represents the acuteness of vision, which depends on the sharpness of the retinal
image within the eye [Crick and Khaw, 2003].
I have modelled visual acuity loss by using a Gaussian low pass filter. I have
calibrated the filter by blurring a Snellen chart [Crick and Khaw, 2003] and then
observing the effect of blurring on people with normal vision.
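The acuity-loss blurring step can be sketched as a separable Gaussian filter over a greyscale image stored as a list of rows. The mapping from degree of acuity loss to the filter's standard deviation is exactly what the calibration against the Snellen chart determines, so sigma is left as a free parameter here; the kernel-size rule (roughly three sigma on each side) is an assumption of the sketch.

```python
import math

def gaussian_kernel(sigma, size):
    """1-D Gaussian kernel, normalised to sum to 1."""
    half = size // 2
    vals = [math.exp(-(i - half) ** 2 / (2 * sigma ** 2)) for i in range(size)]
    total = sum(vals)
    return [v / total for v in vals]

def blur_row(row, kernel):
    """Convolve one row with the kernel, clamping indices at the edges."""
    half = len(kernel) // 2
    out = []
    for i in range(len(row)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - half, 0), len(row) - 1)
            acc += w * row[idx]
        out.append(acc)
    return out

def blur_image(image, sigma):
    """Separable Gaussian blur: filter all rows, then all columns."""
    size = 2 * int(3 * sigma) + 1      # template size grows with sigma
    kernel = gaussian_kernel(sigma, size)
    rows = [blur_row(r, kernel) for r in image]
    cols = [blur_row(list(c), kernel) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]
```

Increasing sigma (and with it the template size) then corresponds to a greater degree of acuity loss.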
Contrast Sensitivity is the ability to perceive differences between an object and its
background. It is measured by the difference in the amount of light reflected
(luminance) from two adjacent surfaces [Crick and Khaw, 2003].
Dry Macular Degeneration causes vision loss through loss of photoreceptors (rods
and cones) in the central part of the eye [Faye, 1980]. It progresses at a slower pace
than the wet form and vision loss is less severe. In the dry form, the macula thins over
time as part of the aging process. Words may appear blurred or hazy and colours may
appear dim or gray.
In the wet form of Macular Degeneration, patients lose vision due to abnormal blood vessel growth, ultimately leading to blood and protein leakage below the macula [Faye, 1980]. The wet form may cause visual distortion and make straight lines appear wavy. A central blind spot develops in the later stages of the disease. The wet type progresses more rapidly and vision loss is more pronounced.
I have simulated the central field loss by a function that takes as input
o an image
o the tentative point of eye gaze fixation on it,
o the radius of the lost visual field
The function processes the input image to put a black (or grey) spot of the specified
radius at the point of fixation. It also shifts the point of fixation according to the
position of the pseudo-fovea. The radius of the black spot increases proportionately
with the progress of the disease.
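The central-field-loss function described above might be sketched as follows, with the image as a list of rows of greyscale pixels; the pseudo-fovea shift is omitted for brevity.

```python
def simulate_central_field_loss(image, fixation, radius):
    """Return a copy of `image` with a black disc of the given radius
    centred on the (x, y) point of fixation; the radius grows with the
    progress of the disease."""
    fx, fy = fixation
    out = []
    for y, row in enumerate(image):
        out.append([0 if (x - fx) ** 2 + (y - fy) ** 2 <= radius ** 2 else p
                    for x, p in enumerate(row)])
    return out
```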
However, I have simulated the initial stage of wet Macular Degeneration separately from the dry form. I have simulated the early stages of wet Macular Degeneration by distorting and blurring the image. For simulating the early stages of dry Macular Degeneration, I have used a function that takes as input
o an image
o the tentative point of eye gaze fixation on it
o the size, number and positions of scotoma (black patches) with respect to the
point of fixation
The function processes the image to draw the black patches at some random positions
within a ring surrounding the point of eye gaze fixation. The number and size of the
scotoma are determined from a normal distribution with the given parameters as
mean. Nephroid and cardioid curves are used to draw the patches because, according to ophthalmologists, these curves closely match the shape of the scotoma. As the disease progresses, the patches grow in size and cover the macula. At this stage, the simulation signifies central visual field loss: it starts to use the function described in the previous paragraph and progressively increases the disc size. I also blur the whole image using a Gaussian low pass filter. The standard deviation and template size of the filter increase proportionately with the progress of the disease.
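One way to sketch the placement of scotoma within a ring around the fixation point, with count and size drawn from normal distributions as described above. The ring radii and the standard deviations chosen here are illustrative assumptions, and the nephroid/cardioid outlines are abstracted to a single size value per patch.

```python
import math
import random

def scotoma_positions(fixation, inner_r, outer_r, mean_count, mean_size,
                      rng=random):
    """Sample scotoma centres uniformly within a ring around the fixation
    point; patch count and size are drawn from normal distributions with
    the given means (stand-ins for the model's parameters)."""
    fx, fy = fixation
    count = max(1, round(rng.gauss(mean_count, 1)))
    patches = []
    for _ in range(count):
        angle = rng.uniform(0, 2 * math.pi)
        dist = rng.uniform(inner_r, outer_r)
        size = max(1.0, rng.gauss(mean_size, mean_size / 4))
        patches.append((fx + dist * math.cos(angle),
                        fy + dist * math.sin(angle), size))
    return patches
```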
3.4.4. Diabetic Retinopathy
I have simulated Diabetic Retinopathy using a function that takes as input
o an image
o the tentative point of eye gaze fixation on it
o the size and the number of scotoma.
The function processes the image to draw the black patches at some random positions
within an area defined by the point of attention fixation and the periphery angle. The
number and size of the black patches are determined from a normal distribution with
the given parameters as mean. As the disease progresses, the patches grow in size and
number. I also blur the screen as in Macular Degeneration.
For Macular Degeneration and Diabetic Retinopathy, I feed the modified image to the perception model as input. The perception model then performs the previously mentioned three steps on these images.
3.4.5. Glaucoma
Glaucoma is caused by the death of retinal ganglion cells [Faye, 1980]. Higher
intraocular pressure is one of the significant risk factors for glaucoma. Initially it only
creates a few scotoma, but with the progress of the disease patients lose their peripheral visual field and eventually become blind. It is the second leading cause of blindness
worldwide [Crick and Khaw, 2003].
3.4.6. Colour blindness
Colour blindness is the inability to perceive differences between some of the colours
that other people can distinguish [Kaiser and Boynton, 1996]. The normal human
retina contains two kinds of photoreceptor cells: the rod cells (active in low light) and
the cone cells (active in normal daylight). Normally, there are three kinds of cones,
each containing a different pigment. The cones are activated when the pigments
absorb light. The absorption spectra of the cones differ; one is maximally sensitive to
short wavelengths, one to medium wavelengths, and the third to long wavelengths.
Their peak sensitivities are in the blue, yellowish-green, and yellow regions of the
spectrum, respectively. The sensitivity of normal colour vision depends on the overlap
between the absorption spectra of the three systems - different colours are recognized
when the different types of cone are stimulated to different extents. There exist three
main types of colour blindness [Kaiser and Boynton, 1996; Tovee, 2008]: Protanopia (lacking the long-wavelength cones), Deuteranopia (lacking the medium-wavelength cones) and Tritanopia (lacking the short-wavelength cones).
[Figure 3-3. Ishihara test (plates 16 and 17) and my simulation of colour blindness on them]
I have also confirmed the correctness of my program using the Ishihara Test for
colour blindness [Colour Blindness Test, 2008]. I have used plates 16 and 17 of a 24-plate version of the test (Figure 3-3). It can be seen in Figure 3-3 that the right hand digit is prominent in the Protanopia simulation, while the left hand one is prominent in the Deuteranopia simulation, as should happen for protans and deuterans.
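A deliberately crude sketch of a red-green confusion transform is shown below. It is not the calibrated simulation used here: realistic simulations (such as Viénot and colleagues') first convert to LMS cone space, and the 0.3/0.7 mixing weights in this sketch are arbitrary illustrative values.

```python
def simulate_protanopia(r, g, b):
    """Crude red-green confusion: collapse R and G onto a common value
    weighted towards green, since protanopes lack long-wavelength cones.
    Illustrative only; the weights are arbitrary assumptions."""
    mixed = 0.3 * r + 0.7 * g
    return (round(mixed), round(mixed), b)
```

Applying such a transform to every pixel leaves blue targets distinguishable while red and green targets collapse towards each other, which is the behaviour the Ishihara plates probe.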
3.4.7. Demonstrations
The early stage of wet Macular Degeneration (Figure 3-4b) introduces blurring and distortion, while the early stage of dry Macular Degeneration (Figure 3-4d) introduces some random black spots. As the disease progresses, the black spot increases in radius, signifying more central visual field loss (Figure 3-4f). In the case of Diabetic Retinopathy (Figures 3-4g and 3-4h), some random black spots appear at the region of attention fixation. As the disease progresses, they increase both in size and number
(Figure 3-4h). Protans (Figure 3-4i) and Deuterans (Figure 3-4j) do not face any problem for this particular screen and target, as their conditions do not hamper perception of the blue colour, but the red and green targets appear different to them. In cases of visual
acuity loss (Figure 3-4c), Macular Degeneration (Figure 3-4d) and Diabetic
Retinopathy (Figure 3-4h) the number of points of fixation is greater than in normal
vision (Figure 3-4a), contrast sensitivity loss (Figure 3-4e) and colour blindness
(Figures 3-4i and 3-4j) since users may need to investigate all blue targets due to
blurring of the screen.
To cover a wide range of visual impairments, I have developed the user interfaces in
three different levels - in the first level (Figure 3-5) the system simulates different
diseases. In the next level (Figure 3-6) the system simulates the effect of change in
different visual functions (e.g. Visual acuity, Contrast sensitivity, Visual field loss
etc.). In the last level (Figure 3-7), the system allows different image processing
algorithms to be run (such as High pass filtering, Blurring etc.) on input images to
simulate the effect of a particular impairment. This approach also makes it easier to
model the progress of an impairment. Previous simulations of visual impairments
model the progress of impairment by a single parameter [Inclusive Design Toolkit,
2008 and Vision Simulator, 2008] or using a large number of parameters [Visual
Impairment Simulator, 2008]. In my system, the progress of any impairment can be
modelled either by a single parameter or by changing the values of different visual
functions. For example, the extent of a particular case of Macular Degeneration can
be modelled either by a single scale (Figure 3-5) or by using different scales for visual
acuity and central visual field loss (Figure 3-6).
3.5. Calibration
3.5.1. Process
I conducted trials with two families of icons. The first consisted of geometric shapes
with colours spanning a wide range of hues and luminance (Figure 3-8). The second
consisted of images from the system folder in Microsoft Windows to increase the
external validity (Figure 3-9) of the experiment.
The experimental task consisted of searching two families of icons. The task was as follows:
1. A particular target (shape or icon) was shown.
2. A set of 18 candidates was shown.
3. Participants were asked to click on the candidate(s) that were the same as the target.
4. The number of candidates similar to the target was randomly chosen between
1 and 8 to simulate both serial and parallel searching effects [Treisman and
Gelade, 1980], the other candidates were distractors.
5. The candidates were separated by 150 pixels horizontally and by 200 pixels
vertically.
6. Each participant did ten searching tasks with two families of icons.
3.5.2. Material
I used a 1024 × 768 LCD colour display driven by a 1.7 GHz Pentium 4 PC running
the Microsoft Windows XP operating system. I also used a standard computer mouse
(Microsoft IntelliMouse® Optical Mouse) for clicking on the target, and a Tobii X120 Eye Tracker, which has an accuracy of 0.5º of visual angle, for tracking eye gaze patterns. The Tobii Studio software was used to extract the points of fixation. I used the
default fixation filter (Tobii fixation filter) and fixation radius (minimum distance to
separate two fixations) of 35 pixels.
3.5.3. Participants
I collected data from 8 visually impaired and 10 able-bodied participants (Table 3-1).
All were expert computer users and had no problem in using the experimental set up.
Table 3-1. Participants

Able-bodied:
C1    22   M
C2    29   M
C3    27   M
C4    30   F
C5    24   M
C6    28   M
C7    29   F
C8    50   F
C9    27   M
C10   25   M

Visually impaired:
P1    24   M   Retinopathy
P2    22   M   Nystagmus and acuity loss due to Albinism
P3    22   M   Myopia (-3.5 / -3.5 Dioptre)
P4    50   F   Colour blindness - Protanopia
P5    24   F   Myopia (-4.5 / -4.5 Dioptre)
Initially I measured the drift of the eye tracker for each participant. The drift was
smaller than half the separation between the icons, so most of the fixations around the
icons could be identified. I calibrated the model to predict fixation duration by the
following two steps.
Step 1: Calculation of image processing coefficients and relating them to the fixation
duration
As I discussed in section 3.2, the first phase of the process of vision is feature
extraction. To extract features, I calculated the similarity coefficients given by different image processing algorithms (Colour Histogram in the YUV and RGB colour spaces, Shape Context and Edge Similarity; see Table 3-2) between the target and the other candidates.
Then I used a Support Vector Machine (SVM) and a cross validation test to identify
the best feature set for predicting fixation duration for each participant as well as for
all participants. I found that the Shape Context Similarity coefficient and the Colour
Histogram coefficient in YUV space work best for all participants taken together. The
combination also has a recognition rate within the 5% limit of the best classifier for
individual participants. Finally I measured the correlation of the Colour Histogram
and Shape Context coefficients between the targets and distractors with the fixation
durations (Table 3-2). The image processing coefficients correlate significantly with
the fixation duration, though the significance is not indicative of their actual
predictive power, as the number of data points is large. However, the Colour
Histogram algorithm in YUV space is moderately correlated (0.51) with the fixation
duration (Figure 3-10). So I developed a classifier that takes the Shape Context
Similarity coefficient and Colour Histogram coefficient in YUV space of a target as
input and predicts the fixation duration on it as output.
Table 3-2. Correlation between fixation duration and image processing algorithms

Statistics            Colour Histogram (YUV)   Colour Histogram (RGB)   Shape Context   Edge Similarity
Spearman's Rho (ρ)    0.507**                  0.444**                  0.383**         0.363**
[Figure 3-10. Fixation duration (msec) plotted against the Colour Histogram (YUV) coefficient]
I found in the eye tracking data that users often fixed eye gaze more than once on
targets or distractors. I investigated the number of fixations with respect to the
fixation durations (Figures 3-11 and 3-12). I assumed that in the case of more than one attention fixation, the recognition took place during the fixation with the largest duration. Figure 3-12 shows the total number of fixations with respect to the
maximum fixation duration for all able-bodied users and each visually impaired user.
I found that visually impaired people fixed eye gaze more often than their able-bodied
counterparts. Participant P2 (who has nystagmus) has many fixations of duration less
than 100 msec and only two fixations having duration more than 400 msec.
It can be seen that as the fixation duration increases, the number of fixations decreases (Figures 3-11 and 3-12). This can be explained by the fact that when the
fixation duration is higher, the users can recognize the target and do not need more
long fixations on it. The number of fixations is smaller when the fixation duration is less than 100 msec; probably these are fixations where the distractors are very different from the targets and users quickly realize that they are not the intended target.
In my model, I predict the maximum fixation duration using the image processing
coefficients (as discussed in the previous section) and then decide the number of
fixations based on the value of that duration.
[Figure 3-11. Total number of fixations against maximum fixation duration (msec)]
[Figure 3-12. Number of fixations against fixation duration (msec) for able-bodied participants (average) and each visually impaired participant]
I investigated different strategies to explain and predict the actual eye movement
trajectory. I rearranged the points of fixation given by the eye tracker following
different eye movement strategies and then compared the rearrangements with the
actual sequences, which signify the actual trajectories.
I used the average Levenshtein distance between actual and predicted eye fixation
sequences to compare different eye movement strategies. I converted each sequence
of points of fixation into a string of characters by dividing the screen into 36 regions
and replacing a point of fixation by a character according to its position in the screen
[Privitera and Stark, 2000]. The Levenshtein distance measures the minimum number
of operations needed to transform one string into the other, where an operation is an
insertion, deletion, or substitution of a single character. I normalized the Levenshtein distance within a range of 0 to 1 using the formula 1 − (No. of Operations / String Length).
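The normalized comparison of fixation strings can be sketched as follows; dividing by the length of the longer string is an assumption of the sketch, since the formula above does not say which string's length is used.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions
    needed to turn string `a` into string `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def scanpath_similarity(actual, predicted):
    """1 - (operations / string length); each character encodes one
    fixation's screen region, as in the 36-region encoding above."""
    return 1 - levenshtein(actual, predicted) / max(len(actual), len(predicted))
```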
I considered the following eye movement strategies:
o Nearest strategy [Fleetwood and Byrne, 2002; 2006]: At each instant, the
model shifts attention to the nearest probable point of attention fixation from
the current position.
o Systematic Strategy: Eyes move systematically from left to right and top to
bottom.
o Random Strategy: Attention randomly shifts to any probable point of
fixation.
o Cluster Strategy: The probable points of attention fixation are clustered
according to their spatial position and attention shifts to the centre of one of
these clusters. This strategy reflects the fact that a saccade tends to land at the
centre of gravity of a set of possible targets [O’Regan, 1992; Findlay, 1992;
1997], which is particularly noticeable in eye tracking studies on reading
tasks.
o Cluster Nearest (CN): The points of fixations are clustered and the first
saccade launches at the centre of the biggest cluster (highest number of points
of fixation). Then the strategy switches to the Nearest strategy.
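The Nearest strategy, for instance, amounts to a greedy walk over the probable points of fixation, always moving to the closest unvisited point:

```python
import math

def nearest_strategy(start, points):
    """Visit probable fixation points greedily, always shifting attention
    to the nearest unvisited point from the current position."""
    current, remaining, path = start, list(points), []
    while remaining:
        nxt = min(remaining, key=lambda p: math.dist(current, p))
        path.append(nxt)
        remaining.remove(nxt)
        current = nxt
    return path
```

The Cluster Nearest strategy would first jump to the centre of the largest spatial cluster and then continue with this same greedy walk.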
Figures 3-13 and 3-14 show the average Levenshtein distance for different eye
movement strategies for able-bodied and visually impaired participants respectively.
The best strategy varies across participants. However one of the Cluster, Nearest and
Cluster Nearest (CN) strategies was best for each participant individually. I did not
find any difference in the eye movement patterns of able-bodied and visually impaired
users. The Cluster Nearest strategy turns out to be the best considering all participants
together. It is also significantly better than the random strategy (Figure 3-15, two
tailed paired t-test, t(180) = 3.89, p < 0.001), which indicates that it actually captures
the pattern of eye movement in most of the cases.
[Figure 3-13. Average Levenshtein distance for different eye movement strategies (Nearest, Systematic, Cluster, CN, NR, NCR, Random) for able-bodied participants]
[Figure 3-14. Average Levenshtein distance for different eye movement strategies for visually impaired participants]
Figure 3-15. Comparing the best strategy against the Random strategy
3.6. Validation
Initially I used a 10-fold cross validation test on the classifiers that predict fixation durations. In this test, 90% of the data was randomly selected for training and the prediction
was tested on the remaining 10%. The process is repeated 10 times and the prediction
error is averaged. It can be seen that the prediction error is less than or equal to 40%
for 12 out of 18 participants and 40% taking all participants together (Figure 3-16).
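The 10-fold procedure can be sketched as an index-splitting helper; the classifiers trained on each split are SVMs as described earlier, and the shuffling seed here is an arbitrary choice for the sketch.

```python
import random

def k_fold_splits(n_items, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross
    validation: each fold is held out once while the rest train."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test
```

Averaging the per-fold prediction error over the 10 splits gives the error figures reported above.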
[Figure 3-16. Prediction error (%) of the cross validation test for each participant]
Then I used the model to predict the total fixation time for each individual search task by each participant. The total fixation time is the summation of all fixation durations, which is nearly the same as the visual search time. Table 3-3 shows the
correlation coefficient between actual and predicted time for each participant. Figure
3-17 shows a scatter plot of the actual and predicted times taking all able-bodied
participants together and Figure 3-18 shows the scatter plot for each visually impaired
participant.
For able-bodied participants, the predicted time significantly correlates with the actual time for 6 participants (each undertook 10 search tasks), correlates moderately for 3 participants, and did not work for one participant (participant C8). For visually impaired participants, the predicted time significantly correlates with the actual time for 5 participants and correlates moderately for the remaining 3 participants.
Table 3-3. Correlation between actual and predicted total fixation time
Participants Correlation
C1 0.74*
C2 0.79**
C3 0.78**
C4 0.46
C5 0.44
C6 0.74*
C7 0.53
C8 -0.31
C9 0.91**
C10 0.66*
P1 0.85**
P2 0.45
P3 0.63
P4 0.67*
P5 0.84**
P6 0.76**
P7 0.73**
P8 0.53
** p < 0.01
* p < 0.05
Figure 3-17. Scatter plot of actual and predicted time for able-bodied users
Figure 3-18. Scatter plot of actual and predicted time for visually impaired users
I also validated the model using a Leave-1-out validation test. In this process I tested
the model for each participant by training the classifiers using data from the other
participants. Figure 3-19 shows the scatter plot of actual and predicted time. The
predicted and actual time is correlated significantly (ρ = 0.5, p < 0.01). I also calculated the relative error, (Predicted − Actual) / Actual, and show its distribution in Figure 3-20.
The superimposed curve shows a normal distribution with the same mean (-5%) and standard deviation (66%) as the relative error. I also found that 64% of the trials have
a relative error within ± 40%.
[Figure 3-19. Scatter plot of actual and predicted time (msec) for the Leave-1-out validation test]
[Figure 3-20. Distribution of the relative error (%), with a superimposed normal distribution]
Finally I validated the model by taking data from seven new participants (Table 3-4).
I used a single classifier for all of them which was trained by the previous data set. I
did not change the value of any parameter of the model for any participant. Table 3-4
shows the correlation coefficients between actual and predicted time for each
participant. Figure 3-21 shows a scatter plot of the actual and predicted times for each
participant. It can be seen that the prediction from the model significantly correlates
with the actual time for 6 out of 7 participants.
Table 3-5 shows the actual and predicted visual search paths for some sample tasks.
The prediction is similar though not exactly the same. The model successfully detected
most of the points of fixation. In the second picture of Table 3-5, there is only one
target, which pops out from the background. The model successfully captures this
parallel searching effect while the serial searching is also captured in the other cases.
The last figure shows the prediction for a protanope (a type of colour blindness)
participant and so the right hand figure is different from the left hand one as the effect
of protanopia was simulated on the input image.
[Table 3-4. Correlation between actual and predicted time for the new participants (* p < 0.05, ** p < 0.01)]
Figure 3-21. Scatter plot of actual and predicted time for new users
3.7. Discussion
The eye tracking data shows that the eye movement patterns are different for different
participants. The performance of the eye tracker (drift, fixation identification and so
on) also differs across participants.
I found that the visual search time is greater for visually impaired users than for able-
bodied users. However, the eye movement strategies of visually impaired users were
not different from their able-bodied counterparts. This is due to the fact that the V4 region of the brain controls visual scanning; the visually impaired participants did not have any brain injury, so their V4 region worked in the same way as in able-bodied users. However, visually impaired users had a greater number of attention fixations
which made the search time longer. Additionally the difference between the numbers
of fixations for able-bodied and visually impaired users was more prominent for
shorter duration fixations (less than 400 msec). Perhaps this means that visually impaired users needed many short duration fixations to confirm the recognition of the target.
From an interface designer's point of view, these results indicate that the clarity and
distinctiveness of targets are more important than the arrangement of the targets in a
screen. Since the eye movement patterns are almost identical for all users, the
arrangement of the targets need not be different to cater for visually impaired users.
However clarity and distinctiveness of targets will reduce the visual search time by
reducing the recognition time and the number of fixations as well.
However, in real life situations the model fails to take account of the domain
knowledge of users. This knowledge can be either application specific or application
independent. There is no way to simulate application specific domain knowledge
without knowing the application beforehand. However there are certain types of
domain knowledge that are application independent and apply for almost all
applications. For example, the appearance of a pop-up window immediately shifts
attention in real life, however the model still looks for probable targets in the other
parts of the screen. Similarly, when the target is a text box, users focus attention on
the corresponding labels rather than other text boxes, which I have not yet modelled.
There is also scope to model perceptual learning. For that purpose, I could incorporate
a factor like the frequency factor of the EMMA model [Salvucci 2001] or consider
some high level features like the caption of a widget, handle of the application and so
on to remember the utility of a location for a certain application. These issues did not
arise in most previous work since they considered very specific and simple search
tasks.
Table 3-6. Comparative analysis of my model
3.8. Conclusion
In this work, I have developed a systematic model of visual perception which works
for people with a wide range of abilities. I have used image processing algorithms to
quantify the perceptual similarities among objects and predict the fixation duration
based on that. I have calibrated the model by considering different eye movement
strategies. My model is intended to be used by software engineers to design software
interfaces, so I have tried to make the model easy to use and comprehend. As a result, it is not detailed enough to explain the results of different psychological experiments on visual perception. However, it is accurate enough to select the best
interface among a pool of interfaces based on the visual search time. Additionally, it
can be tuned to capture the individual differences among users and to generate
accurate prediction for any user.
Chapter 4 Cognitive model
The human mind does not reach its goals mysteriously or miraculously. Even its sudden
insights and "ahas" are explainable in terms of recognition processes, well-informed search,
knowledge-prepared experiences of surprise, and changes in representation motivated by
shifts in attention. When we incorporate these processes into our theory, as empirical
evidence says we should, the unexplainable is explained.
4.1. Introduction
Cognition refers to the underlying mental processes of all our activities, including perception, intuition, reasoning, judgement and so on. Research on cognitive
modelling dates back to the work of computational psychologists during the 1940s.
The early attempts of cognitive modelling [Duffy, 2008] include the use of various
mathematical models like Bayes' decision model (e.g. Edwards' [1962] probabilistic information processor) or Shannon's information theory. Recent research on cognitive modelling addresses a wide range of topics such as investigating mental processes for
new idea generation [Wang, 2008], speech perception [Strauss, Mirman and
Magnuson, 2006], bilingualism [Li, 2006], knowledge representation, learning
[Griffiths, Kemp and Tenenbaum, 2008] and so on. However the domain of cognitive
modelling is currently overwhelmed by the cognitive architectures and models
developed using them. As I discussed in chapter 2, the main problems with cognitive
architectures are
My cognitive model strikes a balance between the detail of cognitive architectures and
the comprehensibility of the GOMS family of HCI models. It takes a task definition as
input and produces the most probable user actions as output. In the following
sections I discuss the design of the model and demonstrate three case studies of its use.
However, I have not addressed cognitive impairment in the present
research.
4.2. Design
I have modelled optimal (expert) and sub-optimal (non-expert) behaviour
separately. I have used the CPM-GOMS model [John and Kieras, 1996] to simulate
optimal behaviour. For sub-optimal behaviour, I have developed a new model.
This model takes a task definition as input and produces as output the sequence of operations
needed to accomplish the task. It simulates the interaction patterns of non-expert
users by two interacting Markov processes: one models the user's view of the
system and the other signifies the designer's view of the system. Users operate in the
user space to achieve their goals. They do this by converting their intended actions
into operations offered by the device. At the same time, they map a state of the
device space onto a state of the user space to decide on the next action. Users behave
sub-optimally when these mappings between the device space and the user space drift
apart. The assumptions can be summarized as follows:
o Users and devices operate in two different state spaces [Rieman and Young,
1996].
o A good interface will minimize the mismatch between the user space and the
device space.
The operation of the system is illustrated in Figure 4-1. At any stage, users have a
fixed policy based on the current task in hand. The policy produces an action, which
in turn is converted into a device operation (e.g. clicking on a button, selecting a menu
item and so on). After application of the operation, the device moves to a new state.
Users have to map this state onto one of the states in the user space. They then
decide on a new action, and the cycle continues until the goal state is achieved.
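This policy-action-operation cycle can be sketched as a pair of lookup tables, one per space; every state, action and operation name below is an illustrative placeholder of mine, not an example from the thesis:

```python
# Sketch of the interaction cycle: a fixed user policy proposes an action,
# the action is converted to a device operation, the device changes state,
# and the new device state is mapped back into the user space.

USER_POLICY = {"want_file_open": "open_icon"}            # user state -> action
ACTION_TO_OPERATION = {"open_icon": "double_click"}      # user -> device space
DEVICE_TRANSITIONS = {("icon_selected", "double_click"): "window_open"}
DEVICE_TO_USER = {"window_open": "done"}                 # device -> user space

def run_interaction(user_state, device_state, goal):
    """Iterate the cycle until the user-space goal state is reached."""
    trace = []
    while user_state != goal:
        action = USER_POLICY[user_state]
        operation = ACTION_TO_OPERATION[action]
        device_state = DEVICE_TRANSITIONS[(device_state, operation)]
        user_state = DEVICE_TO_USER[device_state]
        trace.append((action, operation, device_state, user_state))
    return trace
```

When the two spaces agree, as here, the loop reaches the goal in one cycle; sub-optimal behaviour arises when these mappings drift apart.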
[Figure 4-1: the old user state produces an action, which the action-to-operation mapping converts into an operation on the old device state]
4.2.1. Learning
Besides simulating performance, my model also has the ability to learn new
interaction techniques. Learning can occur either offline or online. The offline
learning takes place when the user of the model (such as an interface designer) adds
new states or operations to the user space. The model can also learn new states and
operations itself. During execution, whenever the model cannot map the intended
action of the user into an operation permissible by the device, it tries to learn a new
operation. To do so, it first asks for instructions from outside. The interface designer
is provided with the information about previous, current and future states and he can
choose an operation on behalf of the model. If the model does not get any external
instructions then it searches the state transition matrix of the device space and selects
an operation according to the label matching principle [Rieman and Young, 1996]. If
the label matching principle cannot return a prospective operation, it randomly selects
an operation that can change the device state in a favourable way. It then adds this
new operation to the user space and updates the state transition matrix of the user
space accordingly. In the same way, the model can also learn a new device state.
Whenever it arrives in a device state unknown to the user space, it adds this new state
to the user space. It then selects or learns an operation that can bring the device into a
state desirable to the user. If it cannot reach a desirable state, it simply selects or
learns an operation that can bring the device into a state known to the user.
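The fallback chain for learning a new operation (known mapping, then external instruction, then label matching, then a random favourable choice) can be sketched as follows. The word-overlap test is only one plausible reading of the label matching principle, and all names and the dictionary representation are assumptions of mine:

```python
import random

def choose_operation(intended_action, user_ops, device_ops, external_choice=None):
    """Map an intended user action to a device operation, learning when needed.
    `user_ops` maps intended actions to known operations; `device_ops` maps
    operation names to their on-screen labels (an assumed representation).
    Returns (operation, whether it was newly learned)."""
    if intended_action in user_ops:                  # already known mapping
        return user_ops[intended_action], False
    if external_choice is not None:                  # designer instruction
        operation = external_choice
    else:
        # Label matching: prefer an operation whose label shares a word
        # with the intended action (cf. Rieman and Young, 1996).
        words = set(intended_action.lower().split())
        matches = [op for op, label in device_ops.items()
                   if words & set(label.lower().split())]
        operation = matches[0] if matches else random.choice(list(device_ops))
    user_ops[intended_action] = operation            # add to the user space
    return operation, True
```

On a second call with the same intended action, the now-learned mapping is returned directly, mirroring the model's reuse of learned knowledge.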
The model can also simulate the practice effect of users. Initially the mapping
between the user space and the device space remains uncertain. It means that the
probabilities for each pair of state/action in the user space and state/operation in the
device space are less than 1. After each successful completion of a task the model
increases the probabilities of those mappings that lead to the successful completion of
the task, and after sufficient practice the probability values of certain mappings reach
one. At this stage the user can map his space unambiguously onto the device space and
thus behaves optimally.
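The practice effect can be sketched as reinforcement of mapping probabilities toward 1 after each successful task; the particular update rule and the rate below are assumptions of mine, as the thesis only states that the probabilities increase with practice:

```python
def reinforce(mapping_probs, used_pairs, rate=0.1):
    """After a successful task, move the probability of every user-space /
    device-space mapping that contributed to it a fraction `rate` of the
    way toward 1, capping at 1."""
    for pair in used_pairs:
        p = mapping_probs[pair]
        mapping_probs[pair] = min(1.0, p + rate * (1.0 - p))
    return mapping_probs
```

Repeated application drives an initially uncertain mapping (say 0.5) arbitrarily close to 1, at which point the simulated user behaves optimally.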
One important aspect of a cognitive model is its own usability, which is mostly
ignored in the current literature on cognitive models. I have developed user interfaces
for developing and running the model (Figures 4-2 and 4-3 respectively). Following
my approach, any model should be developed in three steps. In the first step, the
designer has to specify the possible user states and actions. Then he has to define a
state transition diagram for the current task by alternately selecting states and
actions. This can be done with the help of a physical data flow diagram (for
structured design) or a state transition diagram (for object-oriented design), which are
developed as part of the system design document. Individual entries of the probability
transition matrix can be modified by clicking on the ‘Advanced Control’ button
(Figure 4-2a). In step 2, all of the previous operations have to be repeated for
developing the device space. Finally in step 3, the states and actions of the user space
and the device space have to be mapped to each other. The mapping can be done by
defining a joint probability distribution matrix using the interface shown in Figure 4-2d.
The interface designer is also free to choose any advanced modelling techniques
(such as a rule-based system or a decision network) to model the mapping between
the user space and the device space. Once developed, the model can be run using the
interface shown in Figure 4-3a. At this stage, the system can also be used to define
and simulate a new task (Figure 4-3b).
The first case study applies the model to a simple but non-trivial task. The second
case study shows the use of the model to simulate an assistive interaction technique;
in this case study I have mainly demonstrated the probabilistic mapping between the
user space and the device space.
The third case study demonstrates the simulation of users’ behaviour for a new
interface and highlights the learning capability of the model.
Initially I developed a cognitive model for simple icon manipulation operations (such
as opening, copying, cutting or deleting a folder or a shortcut). These icon manipulation
operations can be done in more than one way: either by using keyboard shortcuts or
by using the popup menu that appears after right-clicking on the icon. Following my cognitive
model, the state space diagrams for these icon manipulation operations are shown in
Figure 4-4.
Table 4-1 shows the output from the model for opening a folder or an application
through a shortcut. The model can be configured to do the operation either way, or
to select one of the ways at random. When it uses the popup menu for the first
time, it learns the new states and operations and updates the user space accordingly.
Figure 4-4. User and Device spaces for icon manipulation operation
Table 4-1. Output from the model for icon manipulation operation
Model No.: 1
Task Name: Opening Application
Learning Rate: 10
Device Space User Space
In GOMS analyses, selection rules are often ignored. This small demonstration
shows how we can incorporate selection rules into a cognitive model. My model also
permits setting a priority order among different methods of undertaking a task. Besides
that, this demonstration shows how the model can learn new operations, which is not
possible in GOMS models. Moreover, we do not need to write a set of detailed
procedural rules to accomplish this, as we would in a cognitive architecture.
able-bodied users, which ensures that the sub-optimality resulted only from lack of
skill rather than from physical impairment.
The model
In the eight-directional scanning technique, the pointer icon changes at regular time
intervals to show one of eight directions (Up, Up-Left, Left, Left-Down, Down,
Down-Right, Right, Right-Up). The user chooses a direction by pressing the
switch when the pointer icon shows the required direction. After the direction is
chosen, the pointer starts moving. When the pointer reaches the desired point on the
screen, the user makes another key press to stop the pointer movement and make
a click. A state chart diagram of the scanning system is shown in Figure 4-5, which is
the same for the user and device spaces in this case. A demonstration of the scanning system
can be seen at http://www.youtube.com/watch?v=0eSyyXeBoXQ&feature=user.
The task hierarchy using a CPM-GOMS model of a sample session is shown in Figure
4-6. The model can determine the optimal direction for movement from the source
and target coordinates. If a horizontal or vertical line from the source can reach the
target, then one of the left, right, up or down directions is chosen. Otherwise, the
pointer is moved diagonally until it reaches the same horizontal or vertical line as the
target. The eight-directional scanning system takes the scan delay and scan step as
input. The scan delay is the time interval between any two state changes of the system
while the scan step is the distance crossed by the cursor during an interval equal to the
scan delay. I set the default values of the two parameters at 1 sec and 10 pixels
respectively.
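The optimal-direction rule described above can be sketched as follows, assuming screen coordinates in which x grows rightwards and y grows downwards; the coordinate convention and function name are mine, not the thesis's:

```python
# Optimal first direction for eight-directional scanning: axis-aligned if the
# target lies on the same row or column as the source, otherwise diagonal
# until the pointer reaches the target's row or column.

DIAGONAL_NAMES = {("Up", "Left"): "Up-Left", ("Down", "Left"): "Left-Down",
                  ("Down", "Right"): "Down-Right", ("Up", "Right"): "Right-Up"}

def optimal_direction(sx, sy, tx, ty):
    """Return the direction name for the first movement, or None if on target."""
    dx, dy = tx - sx, ty - sy
    if dx == 0 and dy == 0:
        return None
    if dy == 0:
        return "Right" if dx > 0 else "Left"
    if dx == 0:
        return "Down" if dy > 0 else "Up"
    vert = "Down" if dy > 0 else "Up"
    horiz = "Right" if dx > 0 else "Left"
    return DIAGONAL_NAMES[(vert, horiz)]
```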
In the eight-directional scanning system, I have found that users behave sub-optimally
for the following reasons:
• They do not always choose the best direction of movement.
• They try to place the cursor exactly over the centre of the target before
clicking.
In terms of my model, it means that the user space differs from the device space for
the ‘Start Moving’ and ‘Stop Moving’ operations (refer Figure 4-5). I have modelled
this difference by a rule-based system developed in CLIPS [2007]. The rule based
system models the uncertainty in choosing a direction of movement and the stopping
position of the pointer. The rules for choosing a direction take the difference between
the target coordinates and the current coordinates as input and give the probabilities of different
direction choices as output. The eight-directional scanning system shows the direction
choices in a particular sequence. The general structure of a rule to select a direction is
as follows:
Choose the optimum direction choice with probability p1
Choose the direction choice that comes after the optimum direction choice with
probability p2
Choose the direction choice that comes before the optimum direction choice with
probability p3
with p1 >>> p2 > p3 for Left, Right, Down and Up, since novice users prefer Manhattan
direction choices to diagonal ones.
The values of p1, p2 and p3 were chosen to reduce the prediction error. However, the
parameters were kept the same for all participants to keep the model general. I have
found that users show more sub-optimal behaviour as the separation between the source
and the target increases. So the probability of a correct direction choice is made inversely
proportional to the separation between the source and the target. Since CLIPS fires
rules concurrently, a direction choice may appear with two different probability
values. In that case I consider the average probability.
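These rules can be sketched in Python rather than CLIPS: the optimum direction is chosen with probability p1, its successor in the scan order with p2 and its predecessor with p3, with p1 falling as the source-target separation grows. The numeric weights and the linear fall-off below are illustrative assumptions of mine:

```python
# Probability of each direction a novice actually picks, given the optimum.

SCAN_ORDER = ["Up", "Up-Left", "Left", "Left-Down",
              "Down", "Down-Right", "Right", "Right-Up"]

def direction_probabilities(optimum, distance, max_dist=1500.0):
    """Return {direction: probability} with p1 >> p2 > p3 and p1 shrinking
    with the source-target separation, as observed in the trials."""
    p1 = 0.95 - 0.25 * min(distance / max_dist, 1.0)  # worse choices when far
    leftover = 1.0 - p1
    p2, p3 = 0.7 * leftover, 0.3 * leftover           # successor vs predecessor
    i = SCAN_ORDER.index(optimum)
    return {optimum: p1,
            SCAN_ORDER[(i + 1) % 8]: p2,
            SCAN_ORDER[(i - 1) % 8]: p3}
```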
I have found that users stop the pointer movement almost optimally when the source
to target distance is small (less than 700 pixels for a 1280×800 pixel resolution
screen) or when the distance is very large (more than 1300 pixels). In other situations,
users often stop the pointer movement before or after the optimum time instant. When
the pointer is close to the target, users also frequently fail to stop the pointer at the
optimum point. So the rules take the source-to-target distance as input and give the
probability of the stopping position deviating from the optimum as output. The general
structure of the rules for the stopping condition is as follows:
Choose deviation_of_input 0 with probability p1
Choose deviation_of_input 1 with probability p2
Choose deviation_of_input -1 with probability p3
Choose deviation_of_input 2 with probability p4
Choose deviation_of_input -2 with probability p5
where p1 >> p2, p3 > p4, p5
In the fragment above, a deviation_of_input of 1 means the pointer is stopped one step
after the optimum stopping position; similarly, a deviation_of_input of -1 means the
pointer is stopped one step before the optimum stopping position.
Process
In this experiment, the participants were instructed to select buttons which were
randomly placed on a screen. The buttons had to be pressed in a particular sequence,
chosen randomly for each execution of the experiment. The random arrangement of
buttons ensured that the experimental set up was not biased towards any particular
screen layout or navigation pattern. All of the buttons were coloured red except the
next target, which was green. After each button press, the last pressed button was
disabled to show that it was no longer a target. The buttons were also labelled with a
serial number to indicate the sequence. The actual task to be performed by the
scanning system was kept very simple so that it would not impose any cognitive load
on users. Hence any sub-optimal behaviour occurred only because of the scanning
technique itself.
Material
The experiment was carried out on a laptop with an LCD screen of resolution
1280×800 pixels running the Windows XP operating system. A single keyboard switch
was used to control the scanning technique. The scan delay was set at 1000 ms. The
buttons measured 25×40 pixels, kept constant throughout the experiment.
Participants
Eight able-bodied participants undertook the experiment. They were undergraduate
and graduate students at my institution. None of them was colour blind. Six
participants were male and two were female. Their ages ranged from 23 to 35 years.
Results
The actual and predicted task completion times are shown in Table 4-2 and Figure 4-7.
The predictions were obtained by running a Monte Carlo simulation. It can be seen
from Figure 4-7 that, with two exceptions, the model predicts task completion time
with an overall relative error, (Predicted − Actual) / Actual, within ±7%. I also do not find any
statistically significant difference between actual and predicted task completion time
(t(8) = 0.31, p > 0.05 for a two tailed paired t-test).
Table 4-2. Actual and Predicted Task Completion Time (in sec)

Participants   Actual   Predicted   Relative Difference
P1             364      384         5.5%
P2             391      506         29.4%
P3             386      370         4.2%
P4             367      457         24.5%
P5             314      335         6.7%
P6             303      303         0.0%
P7             299      312         4.4%
P8             473      474         0.2%
In this particular example, the difference between the user space and the device space
lies in the interpretation of the ‘Send Mail’ operator. After clicking on the ‘Send Mail’
button, users expected that they would automatically be asked to specify a recipient,
which was not supported by the interface. As a result, while executing the task for the
first time, users encountered the error message and learned the operation 'Give
Recipient'. After specifying the recipient, users wanted to confirm the sending
operation. The 'ConfirmSending' action did not have any matching operation in the
device space. At this stage the model applied the label matching principle [Rieman and
Young, 1996], which successfully returned the 'Send Mail' operation in the device
space. At the next iteration, the model performed the task optimally by using its
learned knowledge.
Table 4-3. Mapping between the user space and device space
4.4. Conclusion
Cognition covers a wide range of topics, and it is almost impossible to develop a
single model that simulates all aspects of cognition. Even the cognitive architectures fail
to model high-level cognitive functions such as affective state, consciousness and so on.
My cognitive model is intended to simulate human-computer interaction only. However, it
is not as primitive as the GOMS family of models and can simulate the performance of
novice users. In contrast to the cognitive architectures, the model does not require
detailed knowledge of psychology to operate. It has graphical user interfaces for
providing input parameters and showing the output of simulations.
I have not considered simulating the effect of cognitive impairment. However the
model can be extended to simulate a few types of impairment, such as limited short-term
memory (as in dyslexia) or inadequate planning, which is apparent in some cases of
autism according to the executive dysfunction hypothesis [Burack and colleagues,
2001]. For example, lack of short-term memory can be modelled by limiting the
number of states in the user space. The number of states will depict the maximum
amount of information that a user can keep in mind at one time. The state transition
matrix in the user space can also be calibrated to simulate problems in planning.
I have addressed the issue of the model's lack of detail in comparison to the
cognitive architectures by modelling perception and motor action separately. I
presented the perception model in the previous chapter and describe the
motor behaviour model in the next chapter.
Chapter 5 Motor behaviour model
Movement is used in some way, to some degree, in every task accomplished by human beings.
Every individual needs to understand human movement so that any task -- light or heavy, fine
or gross, fast or slow, of long or short duration, whether it involves everyday living skills,
work skills or recreation skills - can be approached effectively.
- Marion R. Broer, from her book "Efficiency of Human Movement", Saunders, 1966
5.1. Introduction
Pointing tasks form a significant part of human-computer interaction (HCI) in graphical
user interfaces. Fitts' Law [Fitts, 1954] and its variations [Mackenzie, 2003] are
widely used to model pointing as a sequence of rapid aiming movements, especially
for able-bodied users. Fitts' Law predicts the movement time as a function
of the distance to the target and the target width. The law has proved very robust and
works in many different situations, including in space and under water. However, the
application of Fitts' Law to people with motor impairment is debatable. Motor-impaired
users conform to Fitts' Law only when the task is very simple and thus requires
little coordination between vision and motor action [Smits-Engelsman, 2007] or when there
are other sensory cues besides vision [Gajos, Wobbrock and Weld, 2007].
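For reference, Fitts' Law in the Shannon formulation popularized by MacKenzie predicts movement time from the index of difficulty log2(D/W + 1); the coefficients below are arbitrary placeholders that would normally be fitted by regression for a given user and device:

```python
import math

def fitts_mt(distance, width, a=50.0, b=150.0):
    """Fitts' Law, Shannon formulation: MT = a + b * log2(D/W + 1).
    `a` (intercept, ms) and `b` (slope, ms/bit) are fitted parameters;
    the defaults here are illustrative only."""
    index_of_difficulty = math.log2(distance / width + 1.0)
    return a + b * index_of_difficulty
```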
I have developed a statistical model to predict the movement time of pointing tasks
performed by people with motor impairment. Predictions from my model correlate
significantly with actual pointing times for different data sets. As part of the
model, I have also developed a new scale for characterizing the extent of users'
disability by measuring their grip strength. Finally, I have found that hand strength
also affects the performance of able-bodied users: the Index of Performance in a
Fitts' Law task correlates significantly with grip and tip pinch strength.
There have been a few attempts to develop an alternative to Fitts' Law for motor-impaired
people. Gump and colleagues [2002] found a significant correlation between the movement
time and the square root of the movement amplitude (the Ballistic Movement Factor [Gan and
Hoffmann, 1988]). Gajos, Wobbrock and Weld [2007] estimated the movement time
by selecting a set of features from a pool of seven functions of movement amplitude
and target width, and then using the selected features in a linear regression model.
This approach reveals interesting characteristics of movement patterns among different
users but fails to produce a single model for all: different users' movement patterns
fit different functions of target distance and width.
5.3. Design
Able-bodied users move the mouse pointer towards a target with a single long
sub-movement followed by some smaller sub-movements to home in on the target. In the
original formulation of Fitts' Law [Fitts, 1954], it was assumed that a rapid aiming
movement consists of two phases: an initial ballistic movement covering most of the
distance to the target, followed by a homing phase of small corrective movements.
However, this assumption does not hold for motor-impaired users, because their
movement is disturbed by many pauses and they rarely make a single big movement
towards the target. The main difference between the mouse movements of
motor-impaired and able-bodied users lies in the characteristics of the sub-movements
[Trewin and Pain, 1999]. The number of sub-movements for motor-impaired users is
greater than for able-bodied users, and the main movement towards the target is
often composed of two or more sub-movements. The time spent between two
sub-movements (described as a pause) also significantly affects the total task completion
time. So my model estimates the total task completion time by calculating the average
number of sub-movements in a single pointing task, their average duration, and the
average duration of pauses. In the present study, I define a pause as an event in which
the mouse stops moving for more than 100 ms, and a sub-movement as
a movement occurring between two pauses (Figure 5-1).
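Under these definitions, segmenting a cursor trace is straightforward; the sketch below assumes the trace is sampled as (time in ms, x, y) tuples, which is a format of my choosing rather than the thesis's:

```python
def segment_trace(samples, pause_threshold=100):
    """Split a cursor trace into sub-movements and pauses: a pause is an
    interval with no movement lasting more than `pause_threshold` ms, and a
    sub-movement is the motion between two pauses.
    Returns two lists of (start_ms, end_ms) intervals."""
    submovements, pauses = [], []
    start = samples[0][0]
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        if (x0, y0) == (x1, y1) and t1 - t0 > pause_threshold:
            if t0 > start:
                submovements.append((start, t0))   # motion before the pause
            pauses.append((t0, t1))
            start = t1
    if samples[-1][0] > start:
        submovements.append((start, samples[-1][0]))
    return submovements, pauses
```

The per-phase counts and durations that feed the movement-time estimate can then be taken from these intervals.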
To reveal the characteristics of the sub-movements and the pauses, I clustered
the points where the pauses occurred (i.e. where a new sub-movement started). I
evaluated the optimum number of clusters using Classification Entropy [Ross, 1997] as
a cluster validation index; the optimum number of clusters is three. I found that
about 90% of the sub-movements took place when the mouse pointer was
o near the source, such that the pointer had not moved more than 20% of the total
distance, or
o near the target, such that the pointer had moved more than 85% of the total
distance.
The remaining 10% of the sub-movements actually constitute the main movement.
The positions of the cluster centres indicate three phases of movement:
• Starting Phase: This phase consists of small sub movements near the
source, perhaps while the user gets control of the pointing device.
• Middle Phase: This consists of relatively large sub movements which bring
the pointer near the target.
• Homing Phase: This is similar to the homing phase in Fitts’ Law, though
the number of sub movements is greater.
So my model divides the sub-movements and pauses during a pointing task into three
classes based on their position relative to the source and the target, as shown in
Figure 5-1 (the thick blue line depicts a sample cursor trace between a source and a
target). The movement time is estimated as:
MT = p1(d1 + s1) + p2·d2 + f(Dist / v2) + p3(d3 + s3) − (s1 + s3)

where
Dist   distance from source to target
p1     number of pauses near the source
d1     average duration of a pause near the source
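The estimate can be evaluated term by term as below. The readings of p2, d2, p3, d3, s1, s3 and v2 are inferred by analogy with p1 and d1 (pause counts, pause durations and sub-movement durations in the main-movement and near-target phases, and the average main-movement velocity), and treating f as a simple multiplier on the travel term is likewise an assumption; neither reading is spelled out in the surviving text:

```python
def movement_time(dist, v2, p1, d1, s1, p2, d2, p3, d3, s3, f=1.0):
    """Evaluate MT = p1(d1+s1) + p2*d2 + f*(dist/v2) + p3(d3+s3) - (s1+s3),
    with all durations in consistent time units."""
    near_source = p1 * (d1 + s1)          # pauses and sub-movements near source
    main = p2 * d2 + f * (dist / v2)      # mid-path pauses plus travel time
    near_target = p3 * (d3 + s3)          # pauses and sub-movements near target
    return near_source + main + near_target - (s1 + s3)
```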
[Figure 5-1. A sample cursor trace from source to target, showing the initial phase, the main movement, the homing phase and the structure of a sub-movement]
The motor behaviour model takes the position and size of the target as input (Figure
5-2). It also considers the extent of a user's disability. In later sections of this chapter
I explain the techniques used to measure the extent of disability.
Figure 5-3 shows an example of the output from the model. The thin purple line
shows a sample trajectory of mouse movement by a motor-impaired user. It can be
seen that the trajectory contains random movements near the source and the target.
The thick red and black lines trace the contour of these random movements. The
area under the contour has a high probability of missed clicks, as the movement there
is random and thus lacks control. A good interface should not place more than one
target within this contour, and the contour should help decide the amount of separation
between icons.
source, main movement, near target). I blurred these boundaries to make the model
more realistic. I calculated the probability of a pause from the function shown in Figure
5-7; as can be seen there, the probability of a pause gradually increases to 1 near the
source and the target. The expected pause duration is then estimated by multiplying
the average pause duration by the probability of occurrence of a pause.
The actual and predicted average task completion times, and a Z-score distribution of the
actual values and the predictions, are shown in Table 5-1 and Figure 5-9 respectively. Figure 5-8
presents a scatter diagram of actual versus average predicted times. The median of the
Z-scores was at −0.27 instead of 0; however, the predicted average task completion time
correlates significantly (p < 0.002) with the actual times.
[Figure 5-7. Probability of a Pause vs. Normalized Distance]
Table 5-1. Average predicted and actual task completion times

Participants   Average Predicted Time (msec.)   Actual Time (msec.)
P1             3566                             1880
P2             4138                             2176
P3             3418                             2400
P4             4018                             2500
P5             3920                             2907
P7             14632                            10309
P9             7389                             2796
P11            687                              1293
P12            14512                            9349
P14            14974                            22833
P15            4134                             10478
P16            3584                             1629
P17            7895                             15888
P19            4018                             2335
P20            3188                             8771
r = 0.71, t(15) = 3.64, p < 0.01
Figure 5-8. Scatter Diagram of Actual vs. Predicted Task Completion Time (in msec.)
Several clinical scales have been used to measure disability (e.g. the Ashworth scale, the
Weighted Disability Score, the Tardieu scale, spasticity grading [Barnes and Johnson,
2001; Scholtes and colleagues, 2006]), but they are not readily applicable to
modelling HCI. The clinical scales each deal with a single disease and are often very
subjective (such as the Ashworth scale for spasticity). Users' descriptions of their
diseases are inadequate to calibrate a model numerically. I also found in my previous dataset
[Trewin and Pain, 1999] that users' own reports of their skills are not very accurate.
So I have developed a new scale by evaluating the hand strength of motor-impaired
users and then correlating it with their HCI performance (such as task completion
time, number of pauses and so on). It has already been found that the active range of
motion (ROM) of the wrist correlates significantly with the movement time in a
Fitts' Law task for children with spasticity [Smits-Engelsman and colleagues, 2007].
Hand evaluation devices are cheap, easy to operate and have good test-retest reliability
[Mathiowetz and colleagues, 1984], so they are reliable and useful tools for
measuring physical strength, which makes these results useful in practice.
5.5.1. Process
My study consisted of pointing tasks. A sample screenshot of the task is shown in
Figure 5-10. I followed the description of the multiple tapping task in ISO 9241 part
9. In this task the pointer was initially located at the middle of the screen. The
participants had to move it towards a target (one of the red dots, appearing light grey in
monochrome) and click on it. This process was repeated for all the targets. There
were eight targets on the screen, and each participant performed the test twice, except
one participant (P2), who retired after completing the first test. The distances to the
targets ranged from 200 to 600 pixels, while target widths were randomly selected as
integers between 16 and 48 pixels.
5.5.2. Material
I used a standard optical mouse and an Acer Aspire 1640 laptop with a 15.5″ monitor
at 1280×800 pixel resolution. I used the same seating arrangement (same
table height and distance from the table) for all participants. I measured the following six
variables for hand strength evaluation (Figure 5-11). Each variable was measured
three times and the average value was taken. I evaluated only one hand of each
participant: the dominant hand, which they used to operate the mouse.
Grip strength measures how much force a person can exert gripping with the hand. I
measured it using a mechanical dynamometer.
Tip pinch strength measures the maximum force generated by a person squeezing
something between the tips of the thumb and index finger. I measured it using a
mechanical dynamometer.
Radial deviation is the motion that rotates the wrist away from the midline of the
body when the person is standing in the standard anatomical position [Kaplan, 2006].
When the hand is placed over a table with palm facing down, this motion rotates the
hand about the wrist towards the thumb. I measured the maximum radial deviation
using a goniometer.
Ulnar deviation is the motion that rotates the wrist towards the midline of the body
when the person is standing in the standard anatomical position. When the hand is
placed over a table with palm facing down, this motion rotates the hand about the
wrist towards the little finger. I measured it with the goniometer.
Pronation is the rotation of the forearm so that the palm moves from a facing-up
position to a facing-down position. I measured it using a wrist inclinometer.
Supination is the opposite of pronation: the rotation of the forearm so that the palm
moves from a facing-down position to a facing-up position. I measured it with the
wrist inclinometer.
[Figure 5-11, continued: measuring pronation and supination]
5.5.3. Participants
I initially collected data from 10 motor-impaired and 6 able-bodied participants (Trial
1 in Table 5-2). The motor-impaired participants were recruited from a local centre
which works on the treatment and rehabilitation of disabled people, and they volunteered
for the study. To generalize the study, I selected participants with both hypokinetic
and hyperkinetic movement disorders [Flowers, 1976]. Hypokinetic motor impairment
results in restricted movement of the limbs (e.g. participants P1, P3, P4 and so on),
while hyperkinetic refers to uncontrolled movement or tremor (e.g. participants P5,
P6 and so on). All motor-impaired participants used a computer at least once a
week. Able-bodied participants were students at my university and expert computer
users.
Table 5-2. Participants

Participant   Age     Sex   Description                                                      Trial(s)
C1            30      M     Able-bodied                                                      Trial 1
C2            29      M     Able-bodied                                                      Trial 1
C3            28      M     Able-bodied                                                      Trial 1
C4            25      M     Able-bodied                                                      Trial 1
C5            29      M     Able-bodied                                                      Trial 1
C6            27      F     Able-bodied                                                      Trial 1
P1            30      M     Cerebral palsy; reduced manual dexterity; wheelchair user        Trial 1
P2            43      M     Cerebral palsy; reduced manual dexterity; some tremor in
                            hand; wheelchair user                                            Trial 1
P3            25-45   F     One-handed (dominant hand); the other hand is paralysed          Trial 1
P4            30      M     Dystonia; cannot speak; cannot move fingers; wheelchair user     Trial 1
P5            62      M     Left side (non-dominant) paralysed after a stroke in 1973;
                            also has tremor                                                  Trials 1 and 2
P6            44      M     Cerebral attack; significant tremor in whole upper body;
                            fingers always remain folded                                     Trial 1
P7            46      F     Did not mention disease; difficulty gripping things; no tremor   Trial 1
5.5.4. Results
I found that movement time significantly correlates (ρ = 0.57, p < 0.001) with
the number of pauses in a pointing task. I also correlated the average number of
pauses per pointing task with the hand strength metrics. Figures 5-12 to 5-15 plot the
number of pauses against grip strength, the natural logarithm of grip strength, the
active ROM of the wrist (ulnar + radial deviation) and the active ROM of the forearm
(pronation + supination) respectively. I found that some users did not have any range
of motion in their wrists, though they managed to move the mouse to perform the
pointing tasks correctly. I also found that the natural logarithm of grip strength
(Figure 5-13) significantly correlates with the mean (ρ = -0.72, p < 0.001) and
standard deviation (ρ = -0.53, p < 0.05) of the number of pauses per pointing task. I
did not find any correlation between movement time and the distance, width or Fitts'
Law index of difficulty (ID) [Fitts, 1954] of the targets for motor impaired users. This
may be due to the presence of physical impairment and the small number of pointing
tasks (only 16) performed by each participant. I also did not find any significant
correlations involving the ranges of motion (Figures 5-14 and 5-15).
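The number of pauses per pointing task can be extracted from timestamped cursor logs. The sketch below is my illustration, not the thesis's instrumentation: it treats a pause as a run of near-still samples, and the stillness threshold and minimum duration are invented for the example.

```python
def count_pauses(samples, min_dist=2, min_time=100):
    """Count pauses in one pointing task.

    samples: list of (t_ms, x, y) cursor samples in time order.
    A pause is a run of near-still samples (movement < min_dist px
    between consecutive samples) lasting at least min_time ms.
    Both thresholds are illustrative assumptions.
    """
    pauses = 0
    run_start = None
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        if abs(x1 - x0) + abs(y1 - y0) < min_dist:   # cursor nearly still
            if run_start is None:
                run_start = t0
        else:                                        # cursor moved again
            if run_start is not None and t0 - run_start >= min_time:
                pauses += 1
            run_start = None
    if run_start is not None and samples[-1][0] - run_start >= min_time:
        pauses += 1                                  # pause at end of task
    return pauses

# A task sampled every 20 ms: move, stay still for ~120 ms, move again.
task = [(0, 0, 0), (20, 10, 0), (40, 20, 0), (60, 20, 0), (80, 20, 0),
        (100, 20, 0), (120, 20, 0), (140, 20, 0), (160, 20, 0),
        (180, 30, 0), (200, 40, 0)]
print(count_pauses(task))   # → 1
```

The same per-task counts, averaged over tasks, give the quantities plotted in Figures 5-12 to 5-16.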
I divided the whole movement path into three phases and observed how hand
strength affects the number of pauses in the initial, main movement and homing
phases. I found that grip strength significantly correlates with the average number of
pauses near the source (Figure 5-16, ρ = -0.61, p < 0.01) and near the target
(ρ = -0.78, p < 0.001). I also found that the mean and standard deviation of the
velocity of movement were significantly correlated with grip strength (Figure 5-17,
ρ = 0.82, p < 0.001 for the mean and ρ = 0.81, p < 0.001 for the standard deviation).
Figure 5-12. Average number of Pauses per pointing task vs. Grip Strength
Figure 5-13. Average number of Pauses per pointing task vs. Log of Grip Strength
Figure 5-14. Average number of Pauses per pointing task vs. Active ROM of Wrist
Figure 5-15. Average number of Pauses per pointing task vs. Active ROM of Forearm
Figure 5-16. Average number of Pauses per pointing task in different phases of
movement vs. Grip Strength (SMNS: Sub Movement Near Source, SMIM: Sub
Movement in Middle, SMNE: Sub Movement Near End)
[Figure 5-17. Mean and standard deviation of velocity of movement vs. Grip Strength, with linear fits]
5.5.5. Calibration
I revised my model in the light of these results. Grip strength is used to predict the
number of pauses near the source and destination, and also to predict the speed of
movement. Probability distributions for the other factors were derived using the
inverse transform method [Ross, 2002]. The model is based on the following equations.
p1 = α + β × log(S) + 0.5 × ρ × χ × e^(δ×S)

Where
α = 3.95
β = -0.84
χ = 2.29
δ = -0.02
ρ = a random value from a normal distribution with mean 0 and standard deviation 1
S = grip strength in kg
d1 = α × e^(β × (χ + δ × µ))

Where
α = 3997279
β = -0.16
χ = 140
δ = 100
µ = a random value from a uniform distribution between 0 and 1
d2 = α × e^(β × (χ + δ × µ))

Where
α = 12956.60
β = -0.11
χ = 140
δ = 100
µ = a random value from a uniform distribution between 0 and 1
d3 = α + β × log(S)

Where
α = 449.72
β = -70.78
S = grip strength in kg
MT = (p1 − 1) × d1 + α × Dist/v2 + d2 + (p3 − 1) × d3

Where
α = 0.9
MT = movement time
Dist = distance from source to target
p1 = number of pauses near the source
d1 = average duration of a pause near the source
d2 = average duration of a pause in the main movement
v2 = speed of movement in the main movement
p3 = number of pauses near the target
d3 = average duration of a pause near the target
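As a concrete illustration, the calibrated equations above can be run as a small Monte-Carlo simulation. This sketch is mine, not the thesis code: the constants are transcribed from the equations above, but no equation for p3 (pauses near the target) appears in this excerpt, so the p1 equation is reused as a stand-in, and the main-movement speed v2 is passed in directly rather than predicted from grip strength. All function names are hypothetical.

```python
import math
import random

def pauses_near_source(S):
    # p1 = 3.95 - 0.84*log(S) + 0.5*rho*2.29*e^(-0.02*S), rho ~ N(0, 1)
    rho = random.gauss(0, 1)
    return 3.95 - 0.84 * math.log(S) + 0.5 * rho * 2.29 * math.exp(-0.02 * S)

def pause_duration_source():
    # d1 = 3997279 * e^(-0.16*(140 + 100*mu)), mu ~ U(0, 1)
    # (inverse transform sampling of the fitted duration distribution)
    return 3997279 * math.exp(-0.16 * (140 + 100 * random.random()))

def pause_duration_main():
    # d2 = 12956.60 * e^(-0.11*(140 + 100*mu)), mu ~ U(0, 1)
    return 12956.60 * math.exp(-0.11 * (140 + 100 * random.random()))

def pause_duration_target(S):
    # d3 = 449.72 - 70.78*log(S)
    return 449.72 - 70.78 * math.log(S)

def movement_time(dist, S, v2, n=1000):
    """Mean predicted movement time (ms) over n Monte-Carlo runs.

    dist: source-target distance in pixels; S: grip strength in kg;
    v2: main-movement speed in pixel/ms.  ASSUMPTIONS: p3 is drawn from
    the p1 equation, and pause counts are clamped to at least 1.
    """
    total = 0.0
    for _ in range(n):
        p1 = max(pauses_near_source(S), 1.0)
        p3 = max(pauses_near_source(S), 1.0)   # stand-in for p3
        total += ((p1 - 1) * pause_duration_source()
                  + 0.9 * dist / v2
                  + pause_duration_main()
                  + (p3 - 1) * pause_duration_target(S))
    return total / n
```

For example, `movement_time(500, 20, 0.3)` predicts the average time for a 500-pixel movement by a user with a 20 kg grip strength moving at 0.3 pixel/ms.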
5.5.6. Validation
I tested the performance of my model on 232 pointing tasks performed by 10 motor
impaired and 6 able-bodied participants. The predictions were obtained by simulating
each pointing task using Monte-Carlo simulation. Figures 5-18 and 5-19 show the
scatter plot and relative error in prediction respectively. I calculated the relative error
as (Predicted − Actual) / Actual. In Figure 5-19, I superimposed a Gaussian curve
with the same mean and standard deviation as the relative error.
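The relative-error summary used here and in later sections is straightforward to reproduce. A minimal sketch with made-up numbers, not the trial data:

```python
import statistics

def relative_errors(predicted, actual):
    """Relative error (Predicted - Actual) / Actual for paired observations."""
    return [(p - a) / a for p, a in zip(predicted, actual)]

# Illustrative values only:
errs = relative_errors([1200, 900, 1000, 1100], [1000, 1000, 1000, 1000])
print([round(e, 2) for e in errs])       # → [0.2, -0.1, 0.0, 0.1]
print(round(statistics.mean(errs), 2))   # → 0.05
print(round(statistics.stdev(errs), 3))  # → 0.129
```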
In 10% of the cases the error was more than ±70%, so the model failed for those
tasks. However, the predicted values are significantly correlated with the actual
values (ρ = 0.65, p < 0.001), with the error within ±40% for over half of the trials.
The average relative error is -2% with a standard deviation of 57%.
[Figure 5-18. Scatter plot of log(predicted time) vs. log(actual time)]

[Figure 5-19. Distribution of relative prediction error]
I further validated the model with data from six participants (Trial 2 in Table 5-2).
In this second trial, participants P5, P8, P9, P10 and two new participants took
part. As most participants felt fatigue quickly, I ran the trial for six minutes per
participant. In total, they undertook 435 pointing tasks. Figures 5-20 and 5-21 show
the scatter plot and the distribution of relative error between actual and predicted
times respectively. It can be seen that the model did not work well for 10% of the
tasks, where the relative error is greater than ±70%. For the remaining 90% of the
trials, the actual times are significantly correlated with the predictions (ρ = 0.56,
p < 0.001). The relative error is within ±30% for 60% of the trials (262 out of 435
pointing tasks). The average relative error is -16% with a standard deviation of 34%.
I also calculated the correlation for each participant (Table 5-3). It can be seen that
the prediction is significantly correlated (p < 0.01) with the actual times for five out
of six participants.
Figure 5-20. Scatter plot between actual and predicted task completion times
[Figure 5-21. Distribution of relative prediction error in the second trial]
Participants   Correlation Coefficients
P5             0.41*
P8             0.44*
P9             0.61*
P10            0.55*
P11            0.30
P12            0.46*

* p < 0.01
5.6.1. Process
In the Fitts' Law task, I used 26 different combinations of target amplitude (A,
ranging from 30 to 700 pixels) and target width (W, ranging from 16 to 48 pixels).
The resulting index of difficulty (ID) ranged from 2 to 5. Each participant performed
450 pointing tasks.
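Under the Shannon formulation used in the Results section below, the index of difficulty for an amplitude-width pair is log2(A/W + 1). A minimal sketch; the two example pairs are hypothetical combinations consistent with the quoted ID range, not necessarily ones used in the study:

```python
import math

def index_of_difficulty(A, W):
    """Shannon formulation of Fitts' index of difficulty, in bits."""
    return math.log2(A / W + 1)

# Hypothetical amplitude/width pairs spanning the quoted ID range of 2 to 5:
print(index_of_difficulty(48, 16))    # → 2.0  (A/W = 3)
print(index_of_difficulty(496, 16))   # → 5.0  (A/W = 31)
```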
5.6.2. Material
I used a standard optical mouse and an Acer Aspire 1640 laptop with a 15.5” monitor
at 1280×800 pixel resolution. I also used the same seating arrangement for all
participants. I measured the same six variables for hand strength evaluation as in the
previous study.
5.6.3. Participants
I collected data from 14 able-bodied users (9 male, 5 female, aged 22 to 50 with an
average age of 29.3). All participants were expert computer users.
5.6.4. Results
The correlation coefficients between the index of difficulty (ID) and movement time
range from 0.73 to 0.95 with an average value of 0.85, which conforms to Fitts' Law.
I compared the hand evaluation metrics with the Fitts' Law coefficients (a and b,
where MT = a + b × log2(A/W + 1)) and the Index of Performance
(IP = ID_Average / MT_Average). I found that IP is significantly correlated with grip
strength and tip pinch strength (ρ = 0.57, p < 0.05 for grip strength; ρ = 0.72,
p < 0.005 for tip pinch strength; Figures 5-22 and 5-23 respectively). The parameter b
significantly correlates with tip pinch strength (ρ = 0.65, p < 0.01, Figure 5-24). I did
not find any other significant correlation between IP, a or b and the other hand
evaluation metrics.
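The coefficients a and b and the index of performance can be recovered from per-condition (ID, MT) pairs by ordinary least squares. A sketch on synthetic data; the values are fabricated to lie exactly on a known line, purely to check the fit, and are not the study's data:

```python
def fit_fitts(ids, mts):
    """Least-squares fit of MT = a + b * ID; returns (a, b)."""
    n = len(ids)
    mean_id = sum(ids) / n
    mean_mt = sum(mts) / n
    b = (sum((i - mean_id) * (m - mean_mt) for i, m in zip(ids, mts))
         / sum((i - mean_id) ** 2 for i in ids))
    a = mean_mt - b * mean_id
    return a, b

def index_of_performance(ids, mts):
    """IP = mean ID / mean MT, converted to bits/s when MT is in ms."""
    return (sum(ids) / len(ids)) / (sum(mts) / len(mts)) * 1000

# Fabricated data lying exactly on MT = 200 + 150*ID:
ids = [2.0, 3.0, 4.0, 5.0]
mts = [200 + 150 * i for i in ids]
a, b = fit_fitts(ids, mts)
print(round(a), round(b))                        # → 200 150
print(round(index_of_performance(ids, mts), 2))  # → 4.83
```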
[Figure 5-22. Index of Performance vs. Grip Strength]
[Figure 5-23. Index of Performance vs. Tip Pinch Strength]
[Figure 5-24. Parameter b vs. Tip Pinch Strength]
5.7. Discussion
For able-bodied users, pointing performance is generally analysed in terms of Fitts'
Law. Fitts' Law can be applied to rapid aiming movements in many different
contexts, but a complete explanation of the law is still lacking. Crossman and
Goodeve pioneered an early but limited mathematical explanation [Rosenbaum,
1991]. They assumed that a movement consists of several sub movements, each of
which takes a constant time to execute and crosses a constant fraction of the total
distance, with the last sub movement bringing the pointer within the target. However,
many biomechanical experiments failed to find the sub movements predicted by the
Crossman-Goodeve model in all movements [Langolf and colleagues, 1976]. Schmidt
and colleagues [1979] rejected the existence of sub movements altogether and
explained the speed-accuracy trade-off in terms of the initial impulse generated by the
muscles. Later, Meyer and colleagues [1988] combined the ideas of sub movements
and the initial impulse. They formulated a generalized model of rapid aiming
movements in which Fitts' Law appears as a special case when the number of sub
movements tends to infinity. Alternative explanations of rapid aiming movements are
also available, such as Polit and Bizzi's [1978] mass-spring model.
Fitts' Law does not account for users' physical abilities in predicting movement
time. This seems reasonable for able-bodied users, but may not be correct for users
with disabilities. My analysis indicates that people with higher hand strength also
have greater control of hand movement and can perform pointing faster. The positive
correlation between the velocity of movement and grip strength also supports this
claim. As motor impairment reduces the strength of the hands, motor impaired people
lose control of hand movement, so the numbers of pauses near the source and the
target are significantly affected by grip strength. The relation between grip strength
and number of pauses indicates that a minimum amount of grip strength (about
20 kg) is required to move the mouse without pausing more than twice. This
threshold of 20 kg can be used to determine the type of input device suitable for a
user, along with other factors like preference, expertise and so on. My analysis also
showed that flexibility of motion (as measured by range of motion of the wrist or
forearm) is not as important as strength of the hand (as measured by grip strength).
I found that hand strength affects the pointing performance of able-bodied users, too.
The positive correlation between index of performance and hand strength shows that
people with greater hand strength perform pointing faster. The correlation between
the constant term b and tip pinch strength indicates a difference in movement patterns
among people with different hand strengths. As the constant b indicates the effect of
the index of difficulty (ID) on movement time, perhaps the movement pattern of
people with higher hand strength mainly consists of an initial ballistic phase and does
not have a long homing phase, since the time to complete the homing phase should
depend more on the target characteristics. The opposite holds true for people with
less hand strength. Since the homing phase requires more control of hand movement,
the negative correlation between b and hand strength also indicates that people with
higher hand strength have greater control of hand movement.
However, the model did not work well for about 10% of pointing tasks. These cases
shifted the average relative error away from zero and also increased the standard
deviation of the relative errors. The failures can be attributed to various
characteristics of users, such as the effects of learning and fatigue, interest, expertise
and so on. In future I plan to incorporate more input parameters into the model. I
would also like to extend the scope of the model beyond pointing with a mouse. I
shall investigate different modalities of interaction like finger or stylus based input
[Hoffmann and Sheikh, 1991; Holzinger, 2003; Holzinger and colleagues, 2008] and
the effects of situational impairments on interaction [Schedlbauer and Heines, 2007;
Schedlbauer, Pastel and Heines, 2008], which will make the model useful for
designing ubiquitous interfaces, too.
5.8. Conclusions
I have developed a motor behaviour model for those motor impaired people who can
use their hands to interact with a computer. My statistical model has accurately
predicted the task completion time for pointing tasks on different data sets. As part of
the model, I have also developed a new scale for characterizing the extent of users'
disability by measuring their grip strength. Finally, I have shown that hand strength
also affects the performance of able-bodied users and that the Index of Performance
in a Fitts' Law task significantly correlates with grip and tip pinch strength. The
model can be used to optimize interface layout for motor impaired users based on the
completion time of representative tasks.
Chapter 6 Applications
It doesn't matter how beautiful your theory is, it doesn't matter how smart you are. If it doesn't
agree with experiment, it's wrong. (Richard Feynman)
6.1. Introduction
The previous three chapters have presented the design, calibration and validation of
the individual models of the simulator. This chapter demonstrates the use of the
simulator as a whole. The first application uses the simulator to model an icon
searching task. The study validates the models for an externally valid task and also
demonstrates the application of the simulator to compare different interface layouts
for people with a wide range of abilities. The second application is the development of
a new scanning system by clustering screen objects. I have evaluated the scanning
technique against others by using the simulator and later confirmed the result by a
controlled experiment.
2. Initially, the cognitive model analyzes the task and produces a list of atomic
tasks.
3. If an atomic task involves perception, the perception model operates on the
event list and the sequence of bitmap images. Similarly, if an atomic task
involves movement, the motor behaviour model operates on the event list and
the high-level snapshot (Figure 6-1).
[Figure 6-1. Architecture of the simulator]
I have implemented the models in a modular fashion – all of the models can be run
independently of each other as well as together. The cognitive model takes a task
description from the task model and produces a list of low-level device operations.
The interface designer or participants have to execute these operations manually
with the monitor program running in the background. An interface designer is free to use any
modules of the system, together or separately. For example, one can run a KLM
analysis on the output of the cognitive model instead of using my perception or motor
behaviour models. Similarly the monitor program can be run for any interaction that is
not produced by the cognitive model, and the perception and motor behaviour models
can be used on the output of the monitor program.
Text searching includes any search which only involves searching for text and not
any other visual artifact. Examples include menu searching, keyword searching in a
document, mailbox searching and so on.
Icon Searching includes searching for a visual artifact (such as an icon or a button)
along with text search for its caption. The search is mainly guided by the visual
artifact and the text is generally used to confirm the target.
In this section, I present a study involving an icon searching task. I simulated the task
using the simulator and evaluated the predictive power of the model by comparing
actual task completion time with prediction in terms of correlation and percentage
error in prediction.
6.3.1. Process
I conducted trials with two families of icons. The first consisted of geometric shapes
with colours spanning a wide range of hues and luminance (Figure 6-2). The second
consisted of images from the system folder in Microsoft Windows to increase the
external validity of the experiment (Figure 6-3). Each icon bears a caption underneath
(Figure 6-4). The first two letters and the lengths of all the captions were kept nearly
the same to avoid any pop-out effect of the captions during visual search.
The experiment was a mixed design with two within-subject measures and a
between-subjects factor. The within-subject measures were the spacing between icons
and the font size of captions. I used the following three levels for each measure:
o Dense: 120 pixels horizontally, 170 pixels vertically. This was the
minimum possible separation without overlapping the icons.
o Font size
o Small: 10 point.
o Medium: 14 point as recommended by the RNIB [2006].
o Large: 20 point.
The between-subjects factor is
o Group
o Able bodied
o Visually impaired
o Motor impaired
The experimental task consisted of shape searching and icon searching tasks. The task
was as follows:
o A particular target (shape or icon with a caption) was shown.
o A set of 18 candidates for matching was shown.
o Participants were asked to click on the candidate that was the same as the target
in terms of both icon and caption.
The sequence of the trials was randomized using a Latin-square. Each participant
undertook 8 trials for each combination of the within-subject measures. Each
participant performed 72 searching and pointing tasks in total. They were trained on
the task before the start of the actual trial. However, one of the participants (P4)
withdrew after undertaking 40 trials.
6.3.2. Material
I used a 1280 × 800 LCD colour display driven by a 1.7 GHz Pentium 4 PC running
the Microsoft Windows XP operating system. I also used a standard computer mouse
(Microsoft IntelliMouse® Optical) for clicking on the target.
6.3.3. Participants
I collected data from 2 able bodied, 2 visually impaired and 3 motor impaired
participants (Table 6-1). All were expert computer users and used computers more
than once a week.
6.3.4. Simulation
Initially I analyzed the task in light of my cognitive model. Since the users undertook
preliminary training, I considered them as expert users. I followed the GOMS analysis
technique and identified two sub-tasks
The prediction is obtained by sequentially running the perception model and the
motor behaviour model. The predicted task completion time is the summation of the
visual search time (output by the perception model) and the pointing time (output by
the motor behaviour model).
6.3.5. Results
Figure 6-5 shows the correlation between actual and predicted task completion times.
I also calculated the relative error, (Predicted − Actual) / Actual, and show its
distribution in Figure 6-6. The superimposed curve shows a normal distribution with
the same mean and standard deviation as the relative error. I found that the
correlation is ρ = 0.7 (p < 0.001) and that 56% of the trials have a relative error
within ±40%. The average relative error is +16% with a standard deviation of 54%.
The model did not work for 10% of the trials, where the relative error is more than
100%. For the remaining 90% of the trials the average relative error is +6% with a
standard deviation of 42%.
I also analyzed the effects of font size and icon spacing on the task completion time
and investigated whether the prediction reflects these effects as well. So I conducted
two 3 × 3 ANOVAs (Spacing × Font × Group) on the actual and predicted task
completion times respectively. I investigated both the within-subject effects and the
results of a multivariate test. In the ANOVAs, I did not consider the trials for which
the relative error was more than 100%, as the model did not work for those trials.
Participant P4 also did not complete the trial, leaving 40 rows of data (N = 40).
Figure 6-5. Scatter plot between actual and predicted task completion time

[Figure 6-6. Distribution of relative prediction error]
o A main effect of Spacing (F(2, 74) = 5.44, p < 0.05) on actual task completion
time.
o A main effect of Spacing (F(2, 74) = 6.95, p < 0.05) on predicted task
completion time.
o An interaction effect of Spacing and Group (F(4, 74) = 3.15, p < 0.05) on
actual task completion time.
o An interaction effect of Spacing and Group (F(4, 74) = 4.64, p < 0.05) on
predicted task completion time.
o An interaction effect of Font and Group (F(3.4, 62.97) = 5.02, p < 0.05) on
actual task completion time.
o An interaction effect of Font and Group (F(3.44, 63.6) = 3.75, p < 0.05) on
predicted task completion time.
The main effect of Font and the interaction effects of Spacing × Font and
Spacing × Font × Group were not significant for either actual or predicted task
completion times. I confirmed these effects through a multivariate test (Table 6-3),
which is not affected by the sphericity assumption. Table 6-3 shows the following
effects:
o A main effect of Spacing (Wilks' λ = 0.762, F(2, 36) = 5.62, p < 0.05) on
actual task completion time.
o A main effect of Spacing (Wilks' λ = 0.741, F(2, 36) = 6.28, p < 0.05) on
predicted task completion time.
o A main effect of Font (Wilks' λ = 0.817, F(2, 36) = 4.05, p < 0.05) on
predicted task completion time.
o An interaction effect of Spacing and Group (Wilks' λ = 0.750, F(4, 72) = 2.78,
p < 0.05) on actual task completion time.
o An interaction effect of Spacing and Group (Wilks' λ = 0.671, F(4, 72) = 3.97,
p < 0.05) on predicted task completion time.
o An interaction effect of Font and Group (Wilks' λ = 0.545, F(4, 72) = 6.39,
p < 0.05) on actual task completion time.
o An interaction effect of Font and Group (Wilks' λ = 0.610, F(4, 72) = 5.05,
p < 0.05) on predicted task completion time.
It can be seen from Tables 6-2 and 6-3 that the prediction captures all effects at the
99.95% confidence level in both the within-subject test and the multivariate test.
Figures 6-7 and 6-8 show that the effect sizes (η²) are also fairly similar for the
prediction and the actual data. The maximum difference is below 10% in the
within-subject test and below 20% in the multivariate test. This suggests that the
simulator successfully explained the variance in task completion time for the
different factors. As these factors include both interface parameters and physical
characteristics of users, we can infer that the simulator has successfully explained the
effects of different interface layouts on task completion time for people with visual
and motor impairment.
[Figures 6-7 and 6-8. Effect sizes (η²) of Spacing, Font and their interactions with Group, for actual and predicted task completion times]
Figures 6-9 and 6-10 show the effects of font size and spacing for different user
groups. In Figures 6-9 and 6-10, the points depict the average task completion time
and the bars show the standard error at a 95% confidence level. It can be seen from
Figures 6-9 and 6-10 that the prediction is in line with the actual task completion
times for different font sizes and icon spacing.
However, the prediction is less accurate in one of the nine conditions: the medium
font size with medium spacing for the motor impaired users. So I further analyzed
these conditions (Figures 6-11 and 6-12). As in the previous figures, Figures 6-11 and
6-12 depict the average task completion time and the bars show the standard error at
a 95% confidence level. Figure 6-11 shows the task completion time for different
font sizes while the spacing between the icons was medium. Figure 6-12 shows the
task completion time for different icon layouts while the font size of the captions was
14 pt (medium). Figures 6-11 and 6-12 show that the standard error is smaller in the
prediction than in the actual data, so in these cases the model fails to capture the
variability in the task completion time. The model also underestimates the task
completion times for motor impaired users.
[Figure 6-9. Effect of font size on task completion time for each user group (actual vs. predicted)]
[Figure 6-10. Effect of spacing on task completion time for each user group (actual vs. predicted)]
[Figure 6-11. Task completion time for different font sizes at medium spacing (actual vs. predicted)]
[Figure 6-12. Task completion time for different spacings at medium font size (actual vs. predicted)]
6.3.6. Discussion
I have developed the simulator to help with the design and evaluation of assistive
interfaces. Choosing a particular interface from a set of alternatives is a significant
task in both design and evaluation. In this study, I considered a representative task,
and the results showed that the effects of both factors (spacing between icons and
font size) were the same in the prediction as in the actual trials with different user
groups. The prediction from the simulator can therefore be used reliably to capture
the main effects of different design alternatives for people with a wide range of
abilities. However, the model did not work accurately for about 30% of the trials,
where the relative error is more than ±50%. These trials also accounted for an
increase in the average relative error from zero to 16%. In particular, the predicted
variance in task completion times for motor impaired users was smaller than the
actual variance. This can be attributed to many factors; the most important ones are
as follows.
o Effect of usage time - fatigue and learning effects: The trial continued for
about 15 to 20 minutes. A few participants (especially one user in the motor
impaired group) felt fatigue. On the other hand, some users worked more
quickly as the trial proceeded. The model did not consider these effects of
fatigue and learning. In future I plan to incorporate the usage time into the
input parameters of the model.
o User characteristics: The variance in the task completion time can be
attributed to various factors such as expertise, usage time, type of motor
impairment (hypokinetic vs. hyperkinetic), interest of the participant and so
on. Currently, the model characterizes the extent of a user's motor impairment only
by measuring grip strength; in future more input parameters may be considered.
o The choice of the motor behaviour model: I trained the motor behaviour
model by collecting data from people with and without motor impairment.
However, Fitts' Law [Fitts, 1954] predicts the movement time better than my
model for people without any mobility impairment. An eclectic approach,
choosing Fitts' Law for people without mobility impairment and my motor
behaviour model for people with mobility impairment, may produce more
accurate results.
In this section I demonstrate the use of the simulator for developing a new assistive
interaction technique. Many physically challenged users cannot interact with a
computer through a conventional keyboard and mouse. They may interact with a
computer through one or two switches with the help of a scanning mechanism.
Scanning is the technique of successively highlighting items on a computer screen
and pressing a switch when the desired item is highlighted. I have developed a new
scanning technique by clustering screen objects. Initially, I evaluated the cluster
scanning system against two other scanning systems using the simulator; later I
confirmed the result by collecting data from motor impaired participants.
Most work on scanning has aimed to enhance the text entry rate of a virtual keyboard.
In these systems the mechanism is usually block-row-column-item based scanning
[Simpson and Koester, 1999; Lesher and colleagues, 2002]. However, navigation to
arbitrary locations on a screen has also become important as graphical user interfaces
are more widely used. Two types of scanning mechanism are commonly used for
navigation. Cartesian scanning moves the cursor progressively in a direction parallel
to the edges of the screen, and Polar scanning selects a direction and then moves
along a fixed bearing. A particular type of Polar scanning that allows movement only
in eight directions is commonly used [Steriadis and Constantnou, 2002; Ntoa, Savidis
and Stephanidis, 2004] (and in a wheelchair mobility interface [O’Neill, Roast and
Hawley, 2002]). In both Cartesian and Polar scanning systems, the interaction rate of
users remains very low. So recent scanning systems have tried to combine two or
more types of scanning to get better performance. Examples of some existing systems
in the same discipline are the Autonomia System [Steriadis and Constantnou, 2002],
the FastScanner system [Ntoa, Savidis and Stephanidis, 2004], the Gus! Scanning
Cursor [2007], the ScanBuddy system [2007] and the SSMCI system [Moynahan and
Mahoney, 1996].
The Autonomia system [Steriadis and Constantnou, 2002] replaces the windows and
widgets of a typical Windows interface with Frames and Widgets For Single-switch
Input Devices (WIFSIDs) respectively. The system consists of different frames, such
as a Cursor Frame, a Virtual Keyboard Frame and a Console Frame. The cursor
frame provides eight-directional scanning, whereas the frame itself and other frames
are scanned using the block-row-item based scanning approach.
The FastScanner system [Ntoa, Savidis and Stephanidis, 2004] starts the scanning
process by showing a list of currently open applications and asks the user to choose an
application. The scanning procedure then restarts itself in the selected application. The
objects of an interface are scanned sequentially based on a predefined order. Screen
navigation is done by eight-directional scanning. Additionally, the objects of an
interface are divided into four classes –
The user input is interpreted according to the type of the object that has received the
input.
The Gus Scanning Cursor [2007] provides different types of navigation strategies
(such as Cartesian, Polar and eight-directional) on a single screen, and the screen
itself is scanned by row-item based scanning. The user has to choose a particular
scanning type to navigate through the screen.
The ScanBuddy system [2007] scans the screen by iteratively dividing it into two
equal parts up to 4 times. Finally it scans the smallest part using Cartesian scanning.
In the Single Switch Mouse Control Interface (SSMCI) system [Moynahan and Mahoney, 1996], an intelligent agent guesses the target and moves the cursor accordingly. If the guess is incorrect, the user has to signal the agent, which then re-evaluates the situation and proposes a new solution.
There also exist a few scanning applications for specialized tasks such as text selection [Shein, 1997], menu selection [Evreinov and Raisamo, 2004] and Web browsing [Ntoa and colleagues, 2009], but they are not really useful for navigating to an arbitrary location on a screen.
Most of these scanning systems (except the Gus Scanning Cursor [2007] and SSMCI [Moynahan and Mahoney, 1996]) have a similar structure. They start by dividing the
screen into several blocks and then introduce either Cartesian or Polar scanning within
a block. As a result, users can traverse shorter distances using Cartesian or Polar
scanning and the time needed to reach a target from long distances is reduced.
However, an arbitrary screen layout cannot always be evenly divided into blocks,
rows or columns. So scanning systems define blocks in different ways. The
Autonomia system introduces blocks by providing different frames. The FastScanner
system defines blocks based on the hierarchy of objects in the Windows operating
system. The Scanbuddy system defines blocks just by dividing the screen in two equal
segments.
In the present study I have considered the following three types of scanning systems.
Block Scanning System: In the block scanning system the screen area is iteratively segmented into equally sized sub-areas (Figure 6-14). The user has to select the sub-area that contains the intended target (the green rectangle in Figure 6-14). The segmentation process runs for a certain number of iterations, after which eight-directional scanning is initiated in the selected sub-area.
Cluster Scanning System: The cluster scanning system initially collects all possible targets on a screen. Then it iteratively divides the screen into several clusters of targets based on their locations (Figure 6-15). The user has to select the cluster that contains the intended target. The clustering process iterates until each cluster contains a single target (Figure 6-15).
The cluster scanning system works by enumerating the objects shown on the screen and storing the positions of windows, buttons, icons and other possible targets. The algorithm starts by considering all the processes running on the computer. If a process is controlling a window, then the algorithm also considers all child and thread processes owned by it. During the enumeration process, the algorithm identifies the foreground window and stores the positions of the foreground window and the targets within it separately from those of the background windows. The algorithm also calculates the area occupied by the foreground window. Then it clusters the targets in the foreground and background windows separately. The ratio of the number of clusters in the foreground and background windows is proportional to the ratio of the area occupied by the foreground window to the area of the whole screen. I used the Fuzzy c-means algorithm [Ross, 1997] to cluster the targets into similarly sized groups. The algorithm is similar to the k-means clustering algorithm. The k-means algorithm partitions points into k
clusters, where each point belongs to the cluster with the nearest mean. This algorithm aims at minimizing the following objective function

$$J = \sum_{j=1}^{k} \sum_{i=1}^{n} \left\| x_i^{(j)} - c_j \right\|^2$$

where $\| x_i^{(j)} - c_j \|^2$ is a distance measure between a data point $x_i^{(j)}$ and the cluster
centre $c_j$. Instead of assigning each data point to a single cluster, the Fuzzy c-means algorithm returns the membership values of the data points in the different clusters. As a result, when the data points are not naturally separated, it returns overlapping clusters. The c-means algorithm takes the number of clusters (c) as input. It aims at minimizing the following objective function

$$J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \left\| x_i - c_j \right\|^2, \quad 1 \le m < \infty$$

where m is any real number greater than 1, $u_{ij}$ is the degree of membership of $x_i$ in cluster j, $x_i$ is the i-th of the d-dimensional measured data, $c_j$ is the d-dimensional centre of the cluster, and $\|\cdot\|$ is any norm expressing the similarity between any measured data point and the centre. The pseudocode of the algorithm is shown in the Appendix.
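As an illustration of the clustering step, a minimal pure-Python Fuzzy c-means sketch for 2-D target positions might look like the following (the function name and parameters are my own; the thesis's actual pseudocode is in the Appendix):

```python
import math
import random

def fuzzy_c_means(points, c, m=2.0, iters=50, seed=0):
    """Cluster 2-D target positions into c overlapping groups.
    Returns the cluster centres and the membership matrix u, where
    u[i][j] is the degree of membership of point i in cluster j."""
    rnd = random.Random(seed)
    n = len(points)
    # Random initial memberships, normalized so each row sums to 1
    u = [[rnd.random() for _ in range(c)] for _ in range(n)]
    u = [[v / sum(row) for v in row] for row in u]
    for _ in range(iters):
        # Centre update: c_j = sum_i u_ij^m x_i / sum_i u_ij^m
        centres = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            tot = sum(w)
            centres.append((sum(wi * p[0] for wi, p in zip(w, points)) / tot,
                            sum(wi * p[1] for wi, p in zip(w, points)) / tot))
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
        for i, p in enumerate(points):
            d = [max(math.dist(p, cj), 1e-12) for cj in centres]
            for j in range(c):
                u[i][j] = 1.0 / sum((d[j] / dk) ** (2.0 / (m - 1.0)) for dk in d)
    return centres, u
```

For well-separated groups of targets the memberships become nearly crisp, while targets lying between clusters receive intermediate membership values, which is what produces the overlapping clusters described above.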
I calculated the number of clusters that minimizes the target acquisition time. Initially, I evaluated the average time needed to select a single cluster from a set of clusters. Then I estimated the number of iterations needed to reach a single target in the cluster scanning process. Based on these two estimates, I evaluated the total target acquisition time and found the number of clusters that minimizes it. The optimal number of clusters is five, and it does
not depend on the number of targets on the screen, nor on the size or resolution of the screen. The details of the analysis are shown below.
Let
Number of targets = n
Number of Clusters = c
Scan Delay = s msec.
Suppose that each cluster is equally likely to be selected, since we cannot assume that a particular target has a higher probability of being selected than the others (unless a significant amount of interaction is recorded and analysed). So the expected time to select a cluster is

$$T_c = \frac{1}{c}(1 + 2 + 3 + \dots + c)\,s = \frac{1}{c} \cdot \frac{1}{2}\,c(c+1)\,s = \frac{1}{2}(c+1)\,s$$

After reaching a cluster, the user needs to press the key one more time to indicate that it is the correct selection. To give this confirmation signal he has to wait another s msec., which makes the average cluster selection time

$$T_c = \frac{1}{2}(c+1)\,s + s = \frac{1}{2}(c+3)\,s \qquad \text{(Eq. 6-1)}$$
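Eq. 6-1 can be checked by direct enumeration over the c equally likely scan positions; a small sketch (the function name is my own):

```python
def expected_selection_time(c, s):
    """Expected time (in msec) to select one of c clusters with scan delay s:
    the target cluster is equally likely to be in any scan position i = 1..c,
    reaching position i takes i*s, and the confirmation press adds one more s."""
    return sum(i * s + s for i in range(1, c + 1)) / c

# Agrees with the closed form of Eq. 6-1: (1/2)(c + 3)s
for c in range(2, 11):
    assert expected_selection_time(c, s=2000) == (c + 3) * 2000 / 2
```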
If the clustering is optimal, we can assume each cluster will contain an equal number
of targets.
Hence,

after the first iteration each cluster will contain $n/c$ targets,
after the second iteration each cluster will contain $n/c^2$ targets, and
after the i-th iteration each cluster will contain $n/c^i$ targets.

Finally, after the last iteration (say the k-th) each cluster will contain a single target. Hence

$$\frac{n}{c^k} = 1 \quad \text{or} \quad k = \frac{\ln n}{\ln c}$$

The total target acquisition time is therefore

$$T = \frac{\ln n}{\ln c} \cdot \frac{1}{2}(c+3)\,s = \left(\tfrac{1}{2}\,s \ln n\right)\frac{(c+3)}{\ln c} = Q \cdot \frac{(c+3)}{\ln c}$$

[where $Q = \tfrac{1}{2}\,s \ln n$, which is constant for a particular interface]
$$\frac{dT}{dc} = Q \cdot \left[\frac{1}{\ln c} - \frac{(c+3)}{c\,(\ln c)^2}\right]$$

So $\frac{dT}{dc} = 0$ when $\ln c = 1 + \frac{3}{c}$.

Numerical analysis gives a solution at c ≈ 4.97 (see Figure 6-17). Further examination of discrete values of c confirms that T is minimized with five clusters.
[Figure 6-17. Variation of T with the number of clusters (c = 2 to 10); T is minimized near c = 5]
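The minimization can be reproduced numerically; a short sketch with Q normalized to 1, since the constant does not affect the location of the minimum:

```python
import math

def T(c, Q=1.0):
    # Normalized total target acquisition time: T = Q * (c + 3) / ln c
    return Q * (c + 3) / math.log(c)

# Locate the continuous minimum on a fine grid over c in [2, 10]
grid = [2 + i / 100 for i in range(801)]
c_star = min(grid, key=T)
assert abs(c_star - 4.97) < 0.011

# Among integer cluster counts, c = 5 minimizes T
assert min(range(2, 11), key=T) == 5
```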
Since T does not vary much between 4 and 6 clusters, I used the classification entropy [Ross, 1997] of a clustering to choose among them. At each instance, I cluster the targets into 4, 5 and 6 clusters and then select the clustering that minimizes the classification entropy.
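Classification entropy for a fuzzy partition can be sketched as follows; the membership matrices in the example are purely illustrative, and lower entropy indicates a crisper, better-separated clustering:

```python
import math

def classification_entropy(u):
    """Classification entropy of a fuzzy membership matrix u (n rows,
    one membership value per cluster); lower is crisper."""
    n = len(u)
    return -sum(v * math.log(v) for row in u for v in row if v > 0) / n

# A perfectly crisp partition has entropy 0; a maximally fuzzy one does not.
crisp = [[1.0, 0.0], [0.0, 1.0]]
fuzzy = [[0.5, 0.5], [0.5, 0.5]]
assert classification_entropy(crisp) < classification_entropy(fuzzy)
```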
o The model for the cluster scanning system takes the scan delay, the number of
clusters, the intended target and the total number and positions of targets in a
screen as input and gives the target acquisition time as output. The model
calculates the target acquisition time by running the cluster scanning algorithm
on the input and using equation 6-1.
o The model for the block scanning system takes the scan delay (s), the number
of blocks (k) and the number of iterations (r) as input and gives the target
acquisition time as output. The model calculates the target acquisition time by
running the block scanning algorithm on the input. The minimum target acquisition time is s × r (when the target can be reached by always selecting the first block) and the maximum is s × k × r (when the target has to be reached by always selecting the last block).
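The stated bounds can be sketched directly; the uniform-target expected time in the sketch is my own added assumption, not part of the model description:

```python
def block_scan_times(s, k, r):
    """Target acquisition times for block scanning: r iterations of
    choosing one of k blocks, with scan delay s per highlighted block."""
    best = s * r          # target always in the first highlighted block
    worst = s * k * r     # target always in the last highlighted block
    # Expected time under a uniform-target assumption (my own addition,
    # not stated in the text): on average (k + 1) / 2 blocks per iteration.
    expected = r * s * (k + 1) / 2
    return best, expected, worst
```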
Results
I investigated the eight-directional scanning, block scanning for different numbers of
blocks and different numbers of iterations, and cluster scanning for different numbers
of clusters. The estimated task completion times are shown in Table 6-4 and Figure 6-18. The fact that some of these tasks would take over two hours to complete indicates the value of simulation over user trials.
Discussion
The results clearly show that both the cluster and block scanning processes perform better than eight-directional scanning, and thus support the use of screen segmentation in recent scanning systems. The cluster scanning system performs best when the number of clusters is five. However, among the different versions of the cluster and block scanning processes, I found that a type of block scanning that divides the screen into four equally sized partitions for four iterations performed best.
Table 6-4. Estimated Task Completion Time for different scanning systems
I expected that the cluster scanning process would perform better since it uses the
information about target types and locations in the clustering process. For example,
labels are not considered as possible targets. So as part of a post-hoc analysis I studied
the actual tasks undertaken by the participants. Most of the time, participants used
instant messenger software and browsed the World Wide Web. The present version of the clustering process does not consider the locations of hyperlinks in the target acquisition process, and so it might miss possible targets during Web surfing. To test this, I compared the two scanning processes on tasks with and without web browsing.
I found that the cluster scanning process performed far better than the block
scanning process when it considered all possible targets in its clustering process (i.e.
in tasks without web browsing). The intended audience of the scanning systems
(motor impaired users) can use special browsers customized for them [Stephanidis,
1998; IBM Web Adaptation, 2007]. In those browsers, a web page is preprocessed before presentation and the hyperlinks are arranged at a fixed location on the screen. In that case, the cluster scanning process will have no problem locating hyperlinks and should perform better than the other scanning systems.
Figure 6-19. Comparing Cluster Scanning and Block Scanning for tasks using and not using the Internet
The simulation predicts that participants should take less time to complete a task using the cluster scanning system than the block scanning system if the clustering process can include all targets on a screen. I validated this result with a controlled experiment on motor impaired users. Additionally, I investigated how hand strength affects pointing performance in the case of single-switch scanning systems. The details of the experiment are discussed in the following sections.
Process
In this experiment, the participants were instructed to press a set of buttons placed on a screen (Figure 6-16) in a particular sequence. All the buttons were coloured grey except the next target, which was red. The same task was repeated for all the scanning
systems. In particular, I evaluated the cluster and block scanning systems. I recorded
cursor traces, target height, width, and task completion time. For internal validity of
the experiment, I did not use any scan delay adaptation algorithm. The scan delay was
kept constant at 2 sec. for the motor impaired participants and at 1 sec. for the control group. These values were selected after observing the participants' reaction times and were greater than those reaction times. All participants were adequately trained with the scanning systems before undertaking the experiment.
Material
I used a push button switch [The Super Switch, 2007] and an Acer Aspire 1640
Laptop with 1280 × 800 pixel screen resolution. I used the same seating arrangement
(same table height and distance from table) for all participants. I measured the same
six variables for hand strength evaluation as discussed in Chapter 5.
Participants
I collected data from 8 motor impaired and 8 able-bodied participants (Table 6-5).
The motor impaired participants were recruited from a local centre that works on the treatment and rehabilitation of disabled people, and they volunteered for the study. All motor impaired participants used a computer at least once a week. The able-bodied participants were students at my university and expert computer users. None of them had used the scanning systems before.
Results
Initially I measured the total task completion time for the scanning systems (Figure 6-20 and Table 6-6). It can be seen that participants took less time to complete the task using the cluster scanning system. The dotted bars in Figure 6-20 indicate that two participants could not complete the task using the block scanning system.
[Figure 6-20. Task completion time (in msec) for the cluster and block scanning systems for each participant (P1–P8: motor impaired participants; C1–C8: control group)]
To further investigate the scanning systems, I measured the following three variables:
Number of missed clicks: It measures the number of times the participants wrongly
pressed the switch.
Idle Count: The scanning systems periodically highlight the buttons. This variable
measures the number of cycles when the participants did not provide any input,
though they were expected to do so.
Efficiency: The scanning systems require a minimum time to complete any task, which depends on the particular scanning system and not on the performance of the user. I calculated the efficiency as the ratio OptimalTime / ActualTime. An efficiency of 100% indicates optimal performance, 50% indicates taking twice the minimal time, and 0% indicates failure to complete the task. Table 6-6 presents the efficiency of each participant. The optimal time was the same for each participant within a group. In calculating the efficiency I used the average time needed to optimally and actually make one click (or selection), since two participants could not complete the task.
Table 6-7 shows the correlation coefficients of these variables with the hand strength evaluation metrics. The only significant effect is a correlation between the number of missed clicks in the cluster scanning system and grip strength. There was a similar, but weaker, effect in the block scanning system. Otherwise, hand strength does not seem to affect users' performance with the scanning systems.
I did not find any significant difference between the performances of motor impaired and able-bodied users in an equal variance paired t-test at the p < 0.05 level. However, the efficiency is significantly higher, and the average number of missed clicks and idle count significantly lower, in the cluster scanning system than in the block scanning system (equal variance paired t-test, p < 0.05) (Figure 6-21). Additionally, two participants (P3 and P7) could not complete the task using the block scanning system, while all participants could complete it using the cluster scanning system.
[Figure 6-21. Average efficiency (block: 0.49, cluster: 0.62) and average idle count (block: 18.59, cluster: 3.75) for the two scanning systems]
Discussion
I failed to find any effect of hand strength on pointing performance while participants
used the scanning systems. There are two possible explanations:
o The switch used in scanning only requires a gentle push to operate, and the hand strength of motor impaired users is sufficient to operate it.
o The scanning software does the navigation itself and the users need not move
their hand to move the pointer.
This result with the scanning systems also shows that an appropriate choice of assistive technology can make interaction independent of the physical strength of users. It can be noted from Table 6-5 and Figure 6-20 that participants P4 and P5 both have hyperkinetic motor impairment and both could not complete the task using the block scanning system. Perhaps this means they face a different challenge compared to other users. In the future, it will be interesting to investigate the effects of the type of motor impairment on the scanning systems.
The simulator predicted that the task completion time would be lower with the cluster scanning system than with the block scanning system when the cluster scanning system could consider all possible targets in its clustering process. The experiment shows similar results. The total task completion time, sub-optimal task completion time, idle time and number of missed clicks are all lower for the cluster scanning system than for the block scanning system. The efficiency of the cluster scanning system can be attributed to the following factors.
o The cluster scanning system does not introduce any new interface element, such as a frame or form, on the screen, as the Autonomia [Steriadis and Constantinou, 2002] and FastScanner [Ntoa, Savidis and Stephanidis, 2004] systems do.
o The cluster scanning system does not blindly divide the screen into a predefined number of segments as the ScanBuddy system [2007] or the block scanning systems do. It clusters the targets so that they are evenly divided into blocks, and a block is never drawn in a region that does not contain any target.
6.5. Conclusion
In this chapter, I have presented two representative applications of the simulator. The first study demonstrates the use of the simulator to choose an interface layout from a set of alternatives. The simulator was found to correctly predict the main effects of
different layout options for people with a wide range of abilities. The second study demonstrates the use of the simulator for designing a new assistive interaction technique. Initially I developed a new scanning technique and used the simulator to analyze it in comparison with other scanning techniques. Later I confirmed the results of the analysis through a controlled experiment.
Chapter 7 Conclusions
While we believe strongly in user testing and iterative design, each iteration of a design is expensive. The effective use of such models means that we get the most out of each iteration that we do implement.
-Bill Buxton, from his book “Human Input to Computer Systems: Theories, Techniques and Technology”, 2010
7.1. Introduction
In this work, I have developed a simulator to help with the design and evaluation of
assistive interfaces. The simulator embodies both the internal state of a computer
application and also the perceptual, cognitive and motor processes of its user. It takes
a task definition and locations of different objects in an interface as input. It then
predicts possible eye movements and cursor paths on the screen and uses these to
predict task completion times. The models are parameterized to represent different
physical abilities, levels of skill and input devices.
In the following sections, I summarize my work and discuss the implications, limitations and future directions of my research.
7.2. Summary
I have taken a novel approach to designing and evaluating inclusive systems by
modelling the performance of users with a wide range of abilities. As I discussed in
Chapter 2, two main types of user model are in widespread use:
o The GOMS family of models, which were developed specifically for human computer interaction (HCI).
o Models developed using cognitive architectures, which simulate human cognition in general.
The GOMS (Goals, Operators, Methods, Selection rules) family of HCI models (e.g. KLM, CMN-GOMS, CPM-GOMS) is mainly suitable for modelling the optimal (skilled)
behaviour of users. On the other hand, models developed using cognitive architectures
consider the uncertainty of human behaviour in detail but have not been widely
adopted for simulating HCI as their use demands a detailed knowledge of psychology.
There is also not much reported work on systematic modelling of assistive interfaces.
In the present work, I have addressed some of the current problems of user modelling by developing a simulator inspired by the Model Human Processor [Card, Moran and Newell, 1983]. My simulator consists of a perception model, a cognitive model and a
motor behaviour model.
The perception model simulates the phenomena of visual perception such as focussing
and shifting attention. It can also simulate the effects of different visual impairments
on interaction. I have investigated eye gaze patterns of able-bodied users as well as
people with visual impairment and my model can predict the visual search time and
eye gaze pattern of able-bodied people and a few types of visually impaired users with
statistically significant accuracy.
The motor behaviour model is developed by statistical analysis of cursor traces from
motor impaired users. As part of the model, I have also developed a new scale for
characterizing the extent of users' disability by measuring their grip strength, which was not previously possible using existing clinical scales.
7.4. Contributions
With reference to the hypothesis in Chapter 1, the main contributions of my work are
My studies have been used to design an inclusive accessible game [Phillips, 2009] and
the University has recently been awarded EU funding for a project [The GUIDE
Project, 2009] that will build on results from my PhD research.
In the present work, I have validated the models using controlled experiments.
However it will be interesting to investigate the performance of the models in practice
and design new applications based on their predictions.
loss of visual acuity and a few types of motor impairment. The Lumiere project [Horvitz and colleagues, 2008] uses an influence diagram to model users. This records the relationships among users' needs, goals, background, etc. The Office Assistant of the Microsoft Office application uses this influence diagram to provide runtime help to users. The AVANTI project [Stephanidis and colleagues, 1998; 2003] provides a multimedia web browser for people with light or severe motor disabilities and for blind people.
and blind people. It distinguishes personalization into two classes –
However, the Lumiere project does not generalize its personalization mechanisms to other applications, and the AVANTI project only addresses a small segment of disabilities for a particular application.
The lack of a generalized framework for personalization of users with a wide range of
abilities affects the scalability of products. My model covers users with a wide range
of abilities which can lead to a generalized framework for interface personalization.
The framework will work by identifying the specific problem experienced by a user
with a component of an interface and then personalizing the component according to
the needs of the user. For example an elderly user may find an online form unusable
because he suffers from significant tremor in his finger. In this case increasing the
font size or the size of the whole webpage will not be very helpful. Rather we have to
identify the optimum size of the textboxes and buttons that would help him to point to
them but will not consume much screen real-estate. The interface personalization
mechanism will be similar to the interface feature selection and customization process
described in Kumar et al. [2004], but will be more rigorous and generalized in relating interface personalization to user modelling. It will work based on the following steps:
Web browser
Another interesting use of the models will be designing inclusive websites. Currently,
many applications (like shopping, banking, social networking systems) are developed
as web based systems. There are numerous guidelines and systems for developing
accessible websites, but they are not adequate to ensure accessibility. Moreover, designers often do not conform to the guidelines while developing new systems. It is equally difficult to change existing systems according to the guidelines. There are a few systems (like IBM Web Adaptation Technology [2008], the AVANTI Web browser and the WebAdapt2Me system) which offer features to make websites accessible, but either they serve a very specific type of user (motor-impaired users for AVANTI) or there is no way to relate the inclusive features to the particular needs of users. My model can be used to relate the existing inclusive features to the needs of users with a wide range of abilities. For example, if we know the visual acuity of a user, we can
use my model to decide the optimum font size for a website. This type of adjustment can also be made for ubiquitous devices, where the context itself poses limitations. For example, an interface suitable for people with low visual acuity will be useful for small screen devices with high pixel density. Similarly, a good interface for a
hyperkinetic motor impaired user will also be suitable for a handheld device used in a moving vehicle.
The application of the simulator will also help to analyse the models in more detail. For example, I have developed models for visual search tasks, but most real-life search tasks also involve a significant amount of textual search. The perception model can be extended to simulate reading tasks. The model can also be extended to simulate other types of perception besides vision, such as auditory and haptic perception.
I have not addressed cognitive impairment, but my cognitive model can be extended to simulate a few types of cognitive disability. Similarly, the motor behaviour model can be extended to other input devices besides the mouse and single-switch scanning systems.
I have developed user interfaces for each model. It will be interesting and important to work with interface designers and software engineers to optimize the design of these interfaces.
The accuracy of the existing models can also be increased by separately calibrating
and validating them for different impairments (such as Macular Degeneration,
Diabetic Retinopathy, Hypokinetic and Hyperkinetic motor impairment [Flowers,
1976], Dyslexia). The extension of the work will help to understand the effect of task
and devices on human cognition in more detail, which will also be of interest to
researchers in other disciplines besides computer science.
Glossary
Assistive technology Technology developed for people with disabilities to assist them
in rehabilitation.
Dual space model A type of user model that uses two separate state space
diagrams.
Effect size A statistical metric that shows the amount of variance explained
by a factor.
Fovea The Fovea is a spot near the centre of the retina that provides
most detailed vision. It has the highest density of cone cells.
GOMS A type of user model, which assumes that people interact with a
computer to achieve a goal by selecting a method, which
consists of a sequence of basic operations.
KLM Model The KLM (Keystroke-Level Model) simplifies the GOMS model by eliminating the goals, methods and selection rules, leaving only six primitive operators.
Label matching principle A heuristic used in searching interface objects. It states that users search an interface for words that have also appeared in the task definition.
Markov Process A type of probability model which assumes that the present state
depends on a finite set of past states. When the present state only
depends on the immediate past state, it is called a first order
Markov process.
MHP A user model that explains cognition by an information
processing model. It classifies all cognitive activities in three
classes – perception, cognition and motor behaviour.
Abbreviations
AI Artificial Intelligence
Appendix
I have calculated the colour histograms of two different regions of an image, one of which contains the actual target while the other contains a possible target. The locations of the actual and possible targets are part of the input to the perception model. Finally, I calculate the mean square distance between the histograms of the two regions of the image.
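A minimal sketch of this comparison, assuming each region is given as a list of RGB pixel values (the function names and bin count are my own; the actual implementation details are not in the text):

```python
def colour_histogram(region, bins=8):
    """Normalized per-channel colour histogram of a region, given as a
    list of (r, g, b) pixel values with components in 0..255."""
    hist = [0] * (3 * bins)
    for pixel in region:
        for ch, value in enumerate(pixel):
            hist[ch * bins + min(value * bins // 256, bins - 1)] += 1
    total = sum(hist)
    return [h / total for h in hist]

def mean_square_distance(h1, h2):
    """Mean square distance between two histograms of equal length."""
    return sum((a - b) ** 2 for a, b in zip(h1, h2)) / len(h1)
```

Normalizing each histogram makes regions of different sizes comparable, so a region containing the actual target can be matched against candidate regions elsewhere on the screen.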
Sobel operator
The Sobel operator, named after Irwin Sobel (currently at HP Labs, Palo Alto), is used to detect edges in an image. It convolves the whole image with the following 3 × 3 kernel
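The kernel figure itself is missing from this copy; the standard Sobel kernels, with a naive convolution sketch (the function name is my own), are:

```python
import math

# Standard Sobel kernels for horizontal (KX) and vertical (KY) gradients
KX = [[-1, 0, 1],
      [-2, 0, 2],
      [-1, 0, 1]]
KY = [[-1, -2, -1],
      [ 0,  0,  0],
      [ 1,  2,  1]]

def sobel_magnitude(img):
    """Gradient magnitude of a greyscale image (a list of rows of numbers),
    computed by sliding both 3x3 kernels over every interior pixel."""
    h, w = len(img), len(img[0])
    out = [[0.0] * (w - 2) for _ in range(h - 2)]
    for y in range(h - 2):
        for x in range(w - 2):
            gx = sum(KX[i][j] * img[y + i][x + j] for i in range(3) for j in range(3))
            gy = sum(KY[i][j] * img[y + i][x + j] for i in range(3) for j in range(3))
            out[y][x] = math.hypot(gx, gy)
    return out
```

The magnitude is high where intensity changes sharply (an edge) and zero in flat regions, which is what the shape context step below relies on.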
Shape context matching
The shape context algorithm detects the shape of an object in an image. It works after detecting the edges of the object. It divides the image, or the particular image region containing the object, into uniform bins in log-polar space (figure below) with respect to a reference point, and then counts the number of edge pixels in each bin.
I have calculated the shape context histograms of two different regions of an image, one of which contains the actual target while the other contains a possible target. The locations of the actual and possible targets are part of the input to the perception model. The centre of a target is taken as the reference point. Finally, I calculate the mean square distance between the histograms of the two regions of the image.
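A minimal sketch of this binning and distance computation (the bin counts, radius limit and helper names are my own assumptions, not taken from the text):

```python
import math
from collections import Counter

def shape_context(edge_points, ref, r_bins=5, theta_bins=12, r_max=100.0):
    """Log-polar histogram of edge pixel positions relative to a reference
    point: log-spaced radial bins crossed with uniform angular bins."""
    hist = Counter()
    for (x, y) in edge_points:
        dx, dy = x - ref[0], y - ref[1]
        r = math.hypot(dx, dy)
        if r == 0 or r > r_max:
            continue
        # Log-spaced radial bin index in [0, r_bins)
        ri = min(int(r_bins * math.log1p(r) / math.log1p(r_max)), r_bins - 1)
        angle = math.atan2(dy, dx) % (2 * math.pi)
        ti = int(theta_bins * angle / (2 * math.pi)) % theta_bins
        hist[(ri, ti)] += 1
    return hist

def histogram_distance(h1, h2, r_bins=5, theta_bins=12):
    """Mean square distance between two shape-context histograms."""
    bins = [(r, t) for r in range(r_bins) for t in range(theta_bins)]
    return sum((h1[b] - h2[b]) ** 2 for b in bins) / len(bins)
```

Identical edge configurations give a distance of zero, and the log-polar spacing makes the descriptor more sensitive to edge structure near the reference point than far from it.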
Subroutine Get Targets
    For each currently running process
        If the process creates a window and the window is visible
            Store its position
            If the window is the foreground window
                Store its handle
            End If
            Get all child processes
            Get all thread processes
        End If
    End For
End Subroutine
Subroutine Get Thread Processes
    If the process creates a window and the window is visible
        Get the window's position
        If the parent process is the foreground window
            Store the position with a mark
        Else
            Store the position without any mark
        End If
    End If
End Subroutine
Bibliography
4. Barnard P. "The Emotion Research Group Website, MRC Cognition and Brain
Sciences Unit." Available at: http://www.mrc-cbu.cam.ac.uk/~philb, Accessed on 1st July 2007.
6. Belongie S., Malik J., and Puzicha J. "Shape Context: A new descriptor for
shape matching and object recognition." Neural Information Processing Systems
Conference 2000.
7. Belongie S., Malik J. and Puzicha J. "Shape Matching and Object Recognition
Using Shape Contexts." IEEE Transactions on Pattern Analysis and Machine
Intelligence 24.4 (2002): 509-521.
10. Biswas P., Bhattacharyya S. and Samanta D. "User Model to Design Adaptable
Interfaces For Motor-Impaired Users." Tencon '05 - IEEE Region 10
Conferences 2005. 1801-1844.
11. Blackstien-Adler S., Shein F., Quintal J., Birch S. and Weiss P. L. "Mouse
Manipulation through Single-Switch Scanning." Assistive Technology 16.1
(2004): 28-42.
14. Bovair S., Kieras D. E., and Polson P. G. "The acquisition and performance of
text-editing skill: A cognitive complexity analysis." Human-Computer
Interaction 5 (1990): 1-48.
15. Bravo P. E. , LeGare M., Cook A.M. and Hussey S. "A study of the application
of Fitts' Law to selected cerebral palsy adults." Perceptual and Motor Skills 77
(1993): 1107-1117.
17. Burack J. A., Zelazo P. R., Charman T. and Yirmiya N. "The Development of
Autism: Perspectives from Theory and Research." Lawrence Erlbaum
Associates, 2001.
18. Butterworth R. and Blandford A. "Programmable user models: The story so far."
Available at: http://www.cs.mdx.ac.uk/puma/wp8.pdf, Accessed on 30th June,
2007
24. Daly S. "The Visible Differences Predictor: An algorithm for the assessment of
image fidelity." Digital Images and Human Vision Ed. Watson A. B. Cambridge,
MA, USA: MIT Press, 1993. 179-206.
30. Eng K., Lewis R. L., Tollinger I., Chu A., Howes A. and Vera A. "Generating
Automated Predictions of Behavior Strategically Adapted To Specific
Performance Objectives." ACM/SIGCHI Conference on Human Factors in
Computing Systems (CHI) 2006. 621-630.
31. Faye E. "The effect of the eye condition on functional vision." Clinical low
vision Ed. Faye E. Boston, USA: Little, Brown and Company, 1980. 172-189.
32. Field A. "Discovering Statistics Using SPSS." SAGE Publications Ltd., 2009.
33. Findlay J. M. "Saccade Target Selection during Visual Search." Vision Research
37.5 (1997): 617-631.
35. Fitts P.M. "The Information Capacity of The Human Motor System In
Controlling The Amplitude of Movement." Journal of Experimental Psychology
47 (1954): 381-391.
40. Gan K. C. and Hoffmann E. R. "Geometrical conditions for ballistic and visually
controlled movements." Ergonomics 31 (1988): 829-839.
42. Gray W. D. and Sabnani H. "Why you can't program your VCR, or, predicting
errors and performance with production system models of display-based action."
Conference Companion On Human Factors In Computing Systems in
ACM/SIGCHI Conference on Human Factors in Computing Systems (CHI)
1994. 79-80.
43. Gray W., Young R.M. and Kirschenbaum S. "Introduction to this special issue
on cognitive architectures and human-computer interaction." Human-Computer
Interaction 12 (1997): 301-309.
45. Gump A., Legare M. and Hunt D. L. "Application of Fitts' Law to individuals
with cerebral palsy." Perceptual and Motor Skills 94 (2002): 883-895.
48. Hick W.E. "On the rate of gain of information." Quarterly Journal of
Experimental Psychology 4 (1952): 11-26.
49. Hill K. and Romich B. "A Rate Index for Augmentative and Alternative
Communication." Available At: http://www.AACinstitute.Org/Resources/
Methodsandtools/2002rateindex/Paper.html, Accessed on 21st May 2007.
54. Horvitz E., Breese J., Heckerman D., Hovel D. and Rommelse K. "The Lumiere
Project: Bayesian User Modeling for Inferring the Goals and Needs of Software
Users." Available at: http://research.microsoft.com/en-
us/um/people/horvitz/lumierehtm, Accessed on 28th October 2009.
55. Howes A., Vera A., Lewis R.L. and McCurdy M. "Cognitive Constraint
Modeling: A Formal Approach To Reasoning About Behavior." Annual Meeting
of the Cognitive Science Society, Lawrence Erlbaum Associates, 2004.
56. Hwang F., Langdon P.M., Keates S., Clarkson P.J., and Robinson P. "Cursor
Characteristics And Haptic Interfaces For Motor-Impaired Users." Cambridge
Workshop on Universal Access and Assistive Technology 2002. 87-96.
60. John B. E. and Kieras D. "The GOMS Family of User Interface Analysis
Techniques: Comparison And Contrast." ACM Transactions on Computer
Human Interaction 3 (1996): 320-351.
62. Johnson-Laird P.A. "The Computer and The Mind." Cambridge, MA, USA:
Harvard University Press, 1988.
63. Jonides J. "Voluntary versus automatic control over the mind's eye's movement."
Attention and performance Ed. Long J. B. and Baddeley A. D. Hillsdale, NJ,
USA: Erlbaum, 1981. 187-203.
64. Kaiser P. and Boynton R. "Human color vision." Optical Society of America,
1996.
65. Kaplan R. J. "Physical medicine and rehabilitation review." McGraw-Hill Book
Company, 2006.
67. Keates S. and Trewin S. "Effect of Age And Parkinson's Disease On Cursor
Positioning Using A Mouse." ACM/SIGACCESS Conference on Computers and
Accessibility (ASSETS) 2005. 68-75.
68. Keates S., Clarkson J. and Robinson P. "Investigating The Applicability of User
Models For Motion Impaired Users." ACM/SIGACCESS Conference on
Computers and Accessibility (ASSETS) 2000. 129-136.
69. Keates S., Trewin S. and Paradise J. "Using Pointing Devices: Quantifying
Differences Across User Groups." International Conference on Universal Access
in Human-Computer Interaction 2005.
70. Kennedy P. R., Bakay R. A., Moore M. M., Adams K. and Goldwaithe J. "Direct
Control of a Computer from the Human Central Nervous System." IEEE
Transactions on Rehabilitation Engineering 8.2 (2000): 198-203.
71. Kieras D. and Meyer D. E. "An Overview of The EPIC Architecture For
Cognition And Performance With Application to Human-Computer Interaction."
Human-Computer Interaction 12 (1997): 391-438.
74. Kieras D. E., Wood S. D., Abotel K. and Hornof A. "GLEAN: A Computer-
Based Tool For Rapid GOMS Model Usability Evaluation of User Interface
Designs." ACM Symposium on User Interface and Software Technology (UIST)
1995. 91-100.
75. Laird J.E., Rosenbloom P.S. and Newell A. "Towards chunking as a general
learning mechanism." National Conference on Artificial Intelligence, Austin,
TX: Morgan Kaufmann, 1984. 188-192.
77. Langolf G. D., Chaffin D. B. and Foulke J. A. "An investigation of Fitts' Law
using a wide range of movement amplitudes." Journal of motor behaviour 8
(1976): 113-128.
81. Luck S. J., Chelazzi L., Hillyard S. A. and Desimone R. "Neural Mechanisms of
Spatial Selective Attention In Areas V1, V2, And V4 of Macaque Visual
Cortex." Journal of Neurophysiology 77.1 (1997): 24-42.
83. Majaranta P. and Räihä K. "Twenty Years of Eye Typing: Systems and Design
Issues." Eye Tracking Research & Applications 2002. 15-22.
84. Marr D. C. "Visual Information Processing: the structure and creation of visual
representations." Philosophical Transactions of the Royal Society of London
290.1038 (1980): 199-218.
85. Mathiowetz V., Weber K., Volland G. and Kashman N. "Reliability and validity
of hand strength evaluation." Journal of Hand Surgery 9A (1984): 222-226.
88. McMillan W. W. "Computing For Users With Special Needs And Models of
Computer-Human Interaction." ACM/SIGCHI Conference on Human Factors in
Computing Systems (CHI) 1992. 143-148.
89. Meyer D. E., Abrams R. A., Kornblum S., Wright C. E. and Smith J. E. K.
"Optimality in human motor performance: Ideal control of rapid aimed
movements." Psychological Review 95 (1988): 340-370.
90. Moran T.P. "Command Language Grammar: A Representation For The User
Interface of Interactive Computer Systems." International Journal of Man-
Machine Studies 15.1 (1981): 3-50.
91. Motomura Y., Yoshida K. and Fujimoto K. "Generative user models for
Adaptive Information Retrieval." IEEE International Conference on Systems
2000.
95. Newell A. "You can't play 20 questions with nature and win: Projective
comments on the papers of this symposium." Pittsburgh, PA, USA: Carnegie
Mellon University, Department of Computer Science, 1973.
96. Newell A. and Simon H. A. "GPS, A Program That Simulates Human Thought."
Cambridge, MA, USA: MIT Press, 1995.
98. Norcio F. "Adaptive Interfaces: Modelling Tasks and Users." IEEE Transaction
on Systems, Man, Cybernetics 19.2 (1989): 399-408.
99. Nixon M. and Aguado A. "Feature Extraction and Image Processing." Oxford,
UK: Elsevier Ltd., 2002.
100. Norcio F. and Chen Q. "Modeling Users with Neural Architecture." International
Joint Conference on Neural Networks 1992. 547-552.
101. Ntoa S., Savidis A. and Stephanidis C. "Fastscanner: An Accessibility Tool For
Motor Impaired Users." International Conference on Computers Helping People
with Special Needs, LNCS-3118, Springer-Verlag 2004. 796-803.
102. O'Neill P., Roast C. and Hawley M. "Evaluation of Scanning User Interfaces
Using Real Time Data Usage Logs." ACM/SIGACCESS Conference on
Computers and Accessibility (ASSETS) 2000. 137-141.
104. Oka N. "Hybrid cognitive model of conscious level processing and unconscious
level processing." IEEE International Joint Conference on Neural Networks
1991. 485-490.
105. Pasero R., Richardet N. and Sabatier P. "Guided Sentences Composition for
Disabled People." Applied Natural Language Processing 1994. 205-206.
106. Payne S.J. and Green T.R.G. "Task-Action Grammars: A Model of Mental
Representation of Task Languages." Human-Computer Interaction 2 (1986): 93-
133.
108. Petrie H., Hamilton F., King N. and Pavan P. "Remote Usability Evaluations
With Disabled People." ACM/SIGCHI Conference on Human Factors in
Computing Systems (CHI) 2006. 1133-1141.
115. Rizzo A., Marchigiani E. and Andreadis A. "The AVANTI Project: Prototyping
And Evaluation With A Cognitive Walkthrough Based On The Norman's Model
of Action." Designing interactive systems: processes, practices, methods, and
techniques 1997. 305-309.
118. Ross S. M. "Probability Models For Computer Science." Elsevier Ltd., 2002.
120. Salvucci D. D. "An integrated model of eye movements and visual encoding."
Cognitive Systems Research (2001).
125. Scholtes V.A. B., Becher J. G., Beelen A. and Lankhorst G. J. "Clinical
assessment of spasticity in children with cerebral palsy: a critical review of
available instruments." Developmental Medicine and Child Neurology 48
(2006): 64-73.
127. Shah K., Rajyaguru S., Amant R. S. and Ritter F. E. "Connecting a Cognitive
Model to Dynamic Gaming Environments: Architectural and Image Processing
Issues." International Conference on Cognitive Modeling 2003. 189-194.
132. Stephanidis C., Paramythis A., Sfyrakis M., Stergiou A., Maou N., Leventis A.,
Paparoulis G. and Karagiannidis C. "Adaptable And Adaptive User Interfaces
for Disabled Users in the AVANTI Project." Intelligence in Services and
Networks, LNCS-1430, Springer-Verlag 1998. 153-166.
139. Tollinger I., Lewis R. L., McCurdy M., Tollinger P., Vera A., Howes A. and
Pelton L. "Supporting Efficient Development of Cognitive Models At Multiple
Skill Levels: Exploring Recent Advances In Constraint-Based Modeling."
ACM/SIGCHI Conference on Human Factors in Computing Systems (CHI)
2005. 411-420.
143. Trewin S. and Pain H. "Keyboard And Mouse Errors Due To Motor
Disabilities." International Journal of Human-Computer Studies 50.2 (1999):
109-144.
144. Viénot F., Brettel H. and Mollon J. D. "Digital video colour maps for checking
the legibility of displays by dichromats." Color Research and Application 24.4
(1999): 243-252.
148. Waller S., Langdon P., Cardoso C. and Clarkson P.J. "Calibrating capability loss
simulators to population data." Contemporary Ergonomics Ed. Bust P. Taylor
and Francis Ltd., 2008.
150. Warrick A. and Kaul S. "Their Manner of Speaking." Calcutta, India: Indian
Institute of Cerebral Palsy, 2002.
153. Wobbrock J. O. and Gajos K. Z. "A Comparison of area pointing and goal
crossing for people with and without motor impairments." International
ACM/SIGACCESS Conference on Computers and Accessibility (ASSETS)
2007. 3-10.
155. Young R.M., Green T.R.G. and Simon T. "Programmable User Models For
Predictive Evaluation of Interface Designs." ACM/SIGCHI Conference on
Human Factors in Computing Systems (CHI) 1989. 15-19.