Thesis

See discussions, stats, and author profiles for this publication at: https://www.researchgate.
net/publication/234782328
Simulating HCI for special needs
Article in ACM SIGACCESS Accessibility and Computing · September 2007

DOI: 10.1145/1328567.1328569
CITATIONS READS
15 1,344
1 author:
Pradipta Biswas
Indian Institute of Science
186 PUBLICATIONS 1,260 CITATIONS
SEE PROFILE
Some of the authors of this publication are also working on these related projects:
HCI for Automotive View project
Human Robot Interaction : A Literature Survey View project
All content following this page was uploaded by Pradipta Biswas on 19 July 2020.
The user has requested enhancement of the downloaded file.

Inclusive User Modelling
A dissertation submitted to the University of Cambridge Computer Laboratory

for the degree of Doctor of Philosophy
Pradipta Biswas
Trinity College
December 2009
Declaration
I, Pradipta Biswas, declare that this thesis and the work presented in it are my own
and have been generated by me as the result of my own original research. I confirm
that this work was wholly done while in candidature for a research degree at this
University. Any part of this thesis has not previously been submitted for a degree or
any other qualification at this University or any other institution. Where I have
consulted the published work of others, this is always clearly attributed. Where I have
quoted from the work of others, the source is always given. With the exception of
such quotations, this thesis is entirely my own work. The thesis has approximately
33,000 words.
Date:…………13/3/10………….... Pradipta Biswas…..

Signed:…...…Pradipta Biswas
(Pradipta Biswas)
Dedicated
to
Rocky
Acknowledgement
I would like to thank the Gates Cambridge Trust for funding this work. I am indebted
to my supervisor Prof. Peter Robinson for his constant help and support in my
research. It would have been impossible to complete this research without his vision
and encouragement. I am also grateful to Dr. Alan Blackwell for his immense help in
conducting user trials and analyzing results. I would also like to thank Dr. Graham
Titmus and Dr. Neil Dodgson for always providing me helping hands.
I am quite fortunate to be part of a very helpful research group and would like to
thank all the present and former members for their immense help and support.
Christian Richardt, Arnab Chakrabarti and Cecily Morrison need special mention for
proof reading my thesis chapters. I would also like to thank the staff and members of
the Computer Laboratory especially Mrs. Lise Gough, Mrs. Megan Sammons, Mrs.
Fiona Billingsley, Mrs. Jeniffer Underhill and Ms. Michelle Jeffery for their help on
many occasions. I am grateful to my college tutor Dr. Ali Alavi and secretary to the
tutor Mrs. Helene Sutton for their kind support. I am grateful to the volunteers at
Papworth Trust and University of Cambridge for taking part in my studies. I would
like to thank Ms. Adela Xu and Dr. Setor Knutsor of Papworth Trust for organizing
the user trials. Dr. Pattrick Langdon and his research group at Engineering Design
Centre were particularly helpful for sharing their research work. I would also like to
thank the following researchers for providing me useful resources and technical
suggestions on many occasions: Dr. Daniel Bernhardt, Dr. William Billingsley, Dr.
Joy Goodman, Mrs. Margaret Hayden, Dr. Sean Holden, Dr. Gregory Hughes, Dr.
Faustina Hwang, Prof. David Kieras, Prof. Clayton Lewis, Prof. John Mollon, Prof.
Helen Pain, Prof. Gary Rubin, Dr. H. M. Shah, Dr. Metin Sezgin, Prof. Alistair
Sutcliffe, Dr. Shari Trewin, Dr. Phil Tuddenham, Dr. Sam Waller, Prof. Jacob
Wobbrock and Prof. Richard Young.
Last but not the least I would like to thank my numerous friends and well wishers at
Cambridge, especially Moin Nizami, Shazia Afzal and Amitabha Roy for their both
technical and moral support. Finally I would like to express my gratitude to my
parents for always encouraging me to do well.
Abstract
Computers offer valuable assistance to people with physical disabilities. However

designing human-computer interfaces for these users is complicated. The range of
abilities is more diverse than for able-bodied users, which makes analytical modelling
harder. I have taken a novel approach to designing and evaluating inclusive systems by
modelling the performance of users with a wide range of abilities. Previous work has
explored this principle for able-bodied users; however performance modelling of people
with a diverse range of abilities remains a challenge. I have investigated how physical
capabilities of users with a wide range of abilities are reflected in their interactions with
digital devices. I have formulated a simulator that embodies both the internal state of a
computer application and also the perceptual, cognitive and motor processes of its user.
The simulator can predict the likely interaction patterns when undertaking a task using a
variety of input devices, and estimate the time to complete the task in the presence of
different disabilities and for different levels of skill.
The simulator consists of a perception model, a cognitive model and a motor behaviour
model. The perception model simulates the phenomena of visual perception (like
focussing and shifting attention) and can also simulate the effects of different visual
impairments on interaction. It has predicted the visual search time and eye gaze pattern of
able-bodied people and a few types of visually impaired users with statistically
significant accuracy. The cognitive model simulates expert performance by using CPM-
GOMS model. It can also simulate performance of novices by using a dual-space model.
The motor-behaviour model is based on statistical analysis of cursor traces from motor-
impaired users. As part of the model, I have also developed a new scale of characterizing
the extent of disability of users by measuring their grip strength. I have evaluated the
simulator through an icon searching task undertaken by visually and motor impaired
people and also used the simulator to develop a new assistive interaction technique. My
studies have already been used to design an accessible game and the University has been
awarded EU funding for a project that will build on results from my PhD research.
2
Table of contents
Chapter 1 Introduction 12
1.1. Background 12
Hypothesis 13
1.2. Proposed solution 14
1.3. Development methodology 15
1.4. Thesis Structure 17
1.5. Publications 18
Chapter 2 Literature survey 20
2.1. Introduction 20
2.2. The GOMS family of models 22
2.3. Cognitive architectures 24
2.4. Grammar based models 26
2.5. Application specific models 26
2.6. Review 28
2.7. Objective 31
2.8. Architecture 32
2.9. Conclusion 33
Chapter 3 Perception model 35
3.2. Related works 36
3.3. Design 39
3.4. Modelling visual impairments 41
3.4.1. Visual acuity loss 41
3.4.2. Contrast sensitivity loss 41
3.4.3. Maccular Degeneration 41
3.4.4. Diabetic Retinopathy 44
3.4.5. Glaucoma 44
3.4.6. Colour blindness 45
3.4.7. Demonstrations 48
3.4.8. User interfaces of the model 50
3.5. Experiment to collect eye tracking data 52
3.5.1. Process 52
3.5.2. Material 53
3.5.3. Participants 54
3.5.4. Calibration for predicting fixation duration 55
3.5.5. Calibration for predicting eye movement patterns 58
3.6. Validation 61
3.7. Discussion 69
3.8. Conclusion 71
Chapter 4 Cognitive model 72
4.2. Design 73
4.2.1. Learning 75
4.2.2. User interfaces 76
4.3. Case studies 76
4.3.1. Study 1- Modelling simple icon manipulation operations 81
4.3.2. Study 2- A cognitive model for eight-directional scanning 83
4.3.3. Study 3- Modelling interaction for a novel interface 91
4.4. Conclusion 95
4
Chapter 5 Motor behaviour model 96
5.2. Related work 97
5.3. Design 98
5.3.1. User interfaces of the model 101
5.4. Pilot study 102
5.5. Confirmatory study 107
5.5.1. Process 108
5.5.2. Material 108
5.5.4. Results 114
5.5.5. Calibration 118
5.5.6. Validation 120
5.6. Affect of hand strength for able-bodied users 123
5.6.1. Process 124
5.6.2. Material 124
5.6.4. Results 124
5.6.1. Process 124
5.7. Discussion 126
5.8. Conclusions 129
Chapter 6 Applications 130

6.2. Working principle of the simulator 130
6.3. Icon searching task 132
6.3.1. Process 133
6.3.2. Material 135
5
6.3.4. Simulation 136
6.3.5. Results 137
6.3.6. Discussion 145
6.4. The Cluster scanning system 146
6.4.1. Related work 147
6.4.2. Current scanning systems 150
6.4.3. The cluster scanning system 154
6.4.4. Evaluation through simulation 159
6.4.5. Validation of the result 163
6.5. Conclusion 169
Chapter 7 Conclusions 171

7.2. Summary 171
7.3. Implications and limitations 173
7.4. Contributions 175
7.5. Future work 176
7.5.1. New applications 176
7.5.2. New models 179
Glossary & Abbreviations 180
Appendix 184
Bibliography 188
6
List of Figures
Figures Page No.

1-1. Use of the simulator 15
1-2. Development methodology 16
2-1. Application Specific User Models 27
2-2. Existing HCI models w.r.t. skill and physical ability of users 32
2-3. Architecture of the Simulator 33
3-1. Foveal, parafoveal and peripheral vision 39
3-2. Comparing my colour blindness simulation with Brettels’ Simulation 47
3-3. Simulation of Ishihara test 47
3-4. Visual Impairment Simulation 49
3-5. Screenshot to model diseases 51
3-6. Screenshot to model visual functions 51
3-7. Screenshot to run image processing algorithms 52
3-8. Corpus of Shapes 53
3-9. Corpus of Icons 53
3-10. Relating colour histogram coefficients with fixation duration 56
3-11. Total no. of fixations w.r.t. fixation duration 58
3-12. Number of fixations w.r.t. fixation duration 58
3-13. Average Levenshtein Distance for able-bodied users 60
3-14. Average Levenshtein Distance for visually impaired users 61
3-15. Comparing the best strategy against the Random strategy 61
3-16. Cross validation test on the classifiers 62
3-17. Scatter plot of actual and predicted time for able-bodied users 64
3-18. Scatter plot of actual and predicted time for visually impaired users 64
3-19. Scatter plot of predicted and actual time 65
3-20. Relative error in prediction 66
3-21. Scatter plot of actual and predicted time for new users 67
4-1. Sequence of events in an interaction 74
4-2a. Interface to design user space 77
Figures Page No.
4-2b. Interface to design device space 78
4-2c. Interface to map between user and device space 79
4-2d. Interface to map by joint probability matrix 80
4-3a. Interface to run the model 80
4-3b. Interfaces to define a new task 81
4-4. User and Device spaces for icon manipulation operation 82
4-5. State Transition Diagram of the eight-directional scanning mechanism with 84
a single switch
4-6. CPM-GOMS analyses for clicking on a target using eight-directional 86
scanning
4-7. Comparing Actual and Predicted Task Completion Time 90
4-8. An interface to send and receive emails 91
4-9. State transition diagram of the device space 93
4-10. State transition diagram of the user space 93
5-1. Different phases of movement 100
5-2. Input to the motor behaviour model 101
5-3. An example of the output from the model 102
5-4.Variation of number of pauses w.r.t. difficulty in dragging 103
5-5. Variation of velocity (pixel/msec) of main movement w.r.t. difficulty in 104
dragging
5-6. Histograms of model parameters 104
5-7. Probability of occurrence of a pause 105
5-8. Scatter Diagram of Actual vs. Predicted Task Completion Time (in msec.) 106
5-9. Prediction from my model for mouse interface 107
5-10. Screenshot of the experiment for mouse interface 109
5-11. Measurement of hand strength 112
5-12. Average number of Pauses per pointing task vs. Grip Strength 115
115
5-13. Average number of Pauses per pointing task vs. Log of Grip Strength
116
5-14. Average number of Pauses per pointing task vs. Active range of ROM of
Wrist
8
Figures Page No.
5-15. Average number of Pauses per pointing task vs. Active range of ROM of 116
Forearm
5-16. Average number of Pauses per pointing task in different phases of 117
movement vs. Grip Strength (SMNS: Sub Movement Near Source, SMIM: Sub
Movement in Middle SMNE: Sub Movement Near End)
5-17. Velocity of Movement vs. Grip Strength 117
5-18. Scatter plot of prediction 121
5-19. Percentage error of prediction 121
5-20. Scatter plot between actual and predicted task completion times 122
5-21. Error Plot 123
5-22. Index of Performance vs. Grip Strength 125
5-23. Index of Performance vs. Tip Pinch Strength 125
5-24. Parameter b vs. Tip Pinch Strength 126
6-1. Sequence of operations in the simulator 131
6-2. Corpus of Shapes 133
6-3. Corpus of Icons 133
6-4. Sample screenshot of the study 134
6-5. Scatter plot between actual and predicted task completion time 138
6-6. Relative error in prediction 138
6-7. Effect size comparison in ANOVA 142
6-8. Effect size comparison in MANOVA 142
6-9. Effect of Font size in different user groups 143
6-10. Effect of Spacing in different user groups 144
6-11. Effect of medium Spacing in motor impaired users 144
6-12. Effect of medium Font size in motor impaired users 145
6-13. The eight-directional Scanning System 151
6-14. The Block Scanning System 152
153
6-15. The Cluster Scanning System
6-16. Screenshot of the demonstration for scanning interfaces 154
6-17. Variation of T w.r.t. the number of clusters 158
9
Figures Page No.
6-18. Performance Comparison of Different Scanning Systems 160
6-19. Comparing Cluster Scanning and Block Scanning for tasks using and not 162
using Internet
6-20. Task completion times for the scanning systems 165
6-21. Comparing the scanning systems 168
7-1. Timescale of human action (adapted from [Newell, 1990]) 174
10
List of Tables
Tables Page No.

3-1. List of Participants 54
3-2. Correlation between fixation duration and image processing algorithms 56
3-3. Correlation between actual and predicted total fixation time 63
3-4. New Participants 67
3-5. Actual and predicted visual search path 68
3-6. Comparative analysis of my model 71
4-1. Output from the model for icon manipulation operation 82
4-2. Actual and Predicted Task Completion Time (in sec) 90
4-3. Mapping between the user space and device space 92
4-4. Output of the cognitive model 94
5-1. Actual and Predicted Task Completion Time 105
5-3. Correlation coefficients for each participant 123
6-2. Test of within-subjects effects on task completion time 141
6-3. Multivariate test on completion time 141
6-4. Estimated Task Completion Time for different scanning systems 161
6-6.Comparing scanning systems 166
6-7. Correlation coefficients for the Scanning Systems 167
Chapter 1 Introduction
Not only do physically disabled people have experiences which are not available to the able-
bodied, they are in a better position to transcend cultural mythologies about the body,
because they cannot do things the able-bodied feel they must do in order to be happy,
'normal,' and sane. If disabled people were truly heard, an explosion of knowledge of the
human body and psyche would take place.
-Susan Wendell, from her book “The Rejected Body: Feminist Philosophical Reflections on
Disability”,1996
1.1. Background
The World Health Organisation (WHO) states that the number of people aged 60 and
over will be 1.2 billion by 2025 and 2 billion by 2050 [WHO website, 2009]. The very
old (age 80+) is the fastest growing population group in the developed world. Many
of these elderly people have disabilities which make it difficult for them to use
computers. The definition of the term ‘Disability’ differs across countries and
cultures, but the World Bank estimates a rate of 10-12% of population worldwide
having a condition that inhibits their use of standard computer systems [World Bank
website, 2009]. The Americans with Disabilities Act (ADA) in USA and The
Disability Discrimination Act (DDA) in UK prohibits any discrimination between
able bodied and disabled people with respect to education, service, and employment.
There are also ethical and social reasons for designing products and services for this
vast population. In particular, computers offer valuable assistance to people with
physical disabilities and help to improve their quality of life. However the diverse
range of abilities complicates the designing of human computer interfaces for these
users. Many inclusive or assistive software systems often address a specific class of
user and still exclude many users. Lack of knowledge about the problems of disabled
and elderly users has often led designers to develop non inclusive systems. There are
guidelines for designing accessible systems, particularly accessible websites [Web
Content Accessibility Guidelines, 2008], but designers often do not conform to the
guidelines when developing new systems. Additionally, the guidelines are not always
adequate to analyze the effects of impairment on interaction with devices.
Evaluation of assistive interfaces can be even more difficult than their design.
Assistive interfaces are generally evaluated by analysing log files after a user trial
[Lesher and colleagues, 2000; O’Neill, Roast and Hawley, 2000; Hill and Romich,
2007]. As an example of a different approach, Rizzo and colleagues [1997] evaluated
the AVANTI project [Stephanidis, 1998] by a technique combining cognitive
walkthrough and Normans’ seven-stage model [Shneiderman, 2001]. However it is
often difficult to find participants with specific disabilities to conduct a user trial.
Petrie and colleagues [2006] take the approach of remote evaluation. This evaluation
technique does not require participants to be brought into a laboratory and can
increase the sample size of a study. However it still does not avoid the need to find
disabled participants nor to replace controlled experiment.
Hypothesis
A modelling tool for people with disabilities is particularly important, as user trials
are often difficult and time consuming. In different domains of science, simulation
and modelling have been already found to be effective for explaining and augmenting
existing theories. However, very few human computer interaction (HCI) models have
considered users with disabilities. I take a novel approach to the design and
evaluation of assistive interfaces by simulating interaction patterns of users with and
without disabilities. I hypothesize that
It is possible to develop systematic models of human computer interaction

patterns of people with a wide range of abilities by analysing and quantifying
their range of abilities.
13
I have investigated how physical capabilities of users with a wide range of abilities
are reflected in their interactions with digital devices and modelled their interaction
patterns. In my work, I have tried to find answers of the following questions
o How can we predict completion time of representative tasks for people with a
wide range of abilities?
o How do different physical impairments affect human computer interaction?
In particular,
o How does visual impairment affect visual searching in a computer
screen?
o How does mobility impairment affect pointing using different input
devices?
o How can we relate physical characteristics of users with the simulation
parameters?
o How effective will a simulation be in designing and evaluating interfaces for
people with diverse range of abilities?
1.2. Proposed solution

To explain the above hypothesis, I have developed a simulator to help with the design
and evaluation of assistive interfaces. The simulator can predict the likely interaction
patterns when undertaking a task using a variety of input devices, and estimate the
time to complete the task in the presence of different disabilities and for different
levels of skill.
Figure 1-1 shows the use of my simulator. I aim to help evaluate existing systems and
different design alternatives with respect to many types of disability. The evaluation
process would be used to select a particular interface, which can then be validated by
14
a formal user trial. The user trials also provide feedback to the models to increase
their accuracy. As each alternative design does not need to be evaluated by a user
trial, it will reduce the development time significantly.
Prototype
Systems
Best
Alternative
User New
Simulation
Testing Systems
Interaction
Existing Patterns
Systems
Figure 1-1. Use of the simulator
1.3. Development methodology

I have developed the models using the following three steps (Figure 1-2)
Design
Calibration
Validation
The models draw on earlier work on cognitive science, studies of different

impairments and the effects of those on human computer interaction. The designs are
characterised by
The type of the model (e.g. Neural network, linear regression model and so
on)
15
Input parameters (e.g. type and extent of disability of users)

Intended output (e.g. task completion time, visual search path and so on)
User interfaces for interacting with the model (e.g. interfaces to take input and
showing output)
Literature survey on cognitive

science, disability and impairment
Design
Basic framework
Calibration
User
Calibrated model
Studies
Validation
Validated model
Figure 1-2. Development methodology
There are two phases in model development
Calibration through exploration

Validation through prediction
Exploratory models emphasize explaining facts while validated models are used for
prediction. I also followed this general principle in developing models of human
16
performance. I calibrated the models through a series of user studies and later
validated them by controlled experiments.
1.4. Thesis Structure

The remainder of the dissertation is organized as follows:
I discuss earlier research on human behaviour simulation in Chapter 2. At the end of

the chapter I set the objectives for this work and also introduce my simulator.
My user model consists of three components - a perception model, a cognitive model

and a motor behaviour model. I describe these models in Chapters 3, 4 and 5
respectively.
The perception model explores the principles of eye gaze fixation and eye movement
trajectories of able-bodied people and people with visual impairment. In Chapter 3, I
present the design of the model followed by an experiment to calibrate and validate
the model.
The cognitive model simulates intentions of users based on their prior knowledge and
the task. I also present three case studies of using the cognitive model in Chapter 4.
The motor behaviour model explores how hand strength is reflected in pointing
performance and predicts movement time of pointing tasks undertaken by motor
impaired participants. In Chapter 5, I present the design of the model followed by
pilot and confirmatory studies to calibrate and validate the model. I have also
explored how hand strength affects pointing performance of able-bodied users.
17
Chapter 6 evaluates the complete simulator. I present two applications of the

simulator - selecting an interface layout from a set of alternatives and designing a new
assistive interaction technique.
Finally I conclude and point to a few possible extensions of my research in Chapter 7.
1.5. Publications
Some portions of the dissertation have appeared in the following publications:
1. P. Biswas and P. Robinson, A New Screen Scanning System based on

Clustering Screen Objects, Journal of Assistive Technologies, Vol. 2 Issue 3
September 2008, pp. 24-31, ISSN: 1754-9450.
2. P. Biswas and P. Robinson, A Case Study of Simulating HCI for Special
Needs, Designing Inclusive Futures, Springer-Verlag, 2008, pp. 175-184,
ISBN 978-1-84800-210-4.
3. P. Biswas and P. Robinson, Evaluating the cluster scanning system, Designing
Inclusive Interactions, Springer-Verlag, 2010
4. P. Biswas and P. Robinson, The effects of hand strength on pointing
performance, Designing Inclusive Interactions, Springer-Verlag, 2010
5. P. Biswas and P. Robinson, A brief survey on user modelling in HCI,
Intelligent Techniques for Speech, Image and Language Processing, Springer-
Verlag, 2010
6. P. Biswas and P. Robinson, Evaluating the design of inclusive interfaces by
simulation, Proceedings of the ACM International Conference on Intelligent
User Interfaces (IUI) 2010
7. P. Biswas and P. Robinson, Automatic Evaluation of Assistive Interfaces,
Proceedings of the ACM International Conference on Intelligent User
Interfaces (IUI) 2008, pp. 247-256.
18
8. P. Biswas, Simulating HCI for all, CHI Extended Abstracts 2008: pp. 2649-
2652.
9. P. Biswas and P. Robinson, Simulation to Predict Performance of Assistive
Interfaces, Proceedings of the 9th International ACM SIGACCESS
Conference on Computers and Accessibility (ASSETS’07) pp. 827-828.
10. P. Biswas, Simulating HCI for Special Needs, ACM SIGACCESS Newsletter,
Issue 89, Sept 2007, pp. 7-10.
11. P. Biswas and P. Robinson, Modelling Perception using Image Processing
Algorithms, 23rd British Computer Society Conference on Human-Computer
Interaction (HCI 09).
12. P. Biswas and P. Robinson, Predicting Pointing Time from Hand Strength,
USAB 2009, LNCS 5889, pp. 428–447.
13. P. Biswas, T. M. Sezgin and P. Robinson, Perception model for people with
visual impairments, Proceedings of the 10th International Conference on
Visual Information Systems (VISUAL 2008), LNCS 5188, pp. 279-290, 2008.
14. P. Biswas and P. Robinson, Performance Comparison of Different Scanning
System using a Simulator, Proceedings of the 9th European Conference of
Advancement of Assistive Technology in Europe (AAATE ‘07), pp. 873-877.
15. P. Biswas and P. Robinson, Effects of Physical Capabilities on Interaction,
Workshop: Defining the Architecture for Next Generation Inclusive
Television in EuroITV 2009.
16. P. Biswas and P. Robinson, Modelling user interfaces for special needs,
Accessible Design in the Digital World (ADDW) 2008.
17. P. Biswas and P. Robinson, A Motor-Behaviour Model for Physically
Challenged Users, Cambridge Workshop on Universal Access and Assistive
Technology, Cambridge, April 2008, pp 5-9, ISSN 0963-5432.
18. P. Biswas and P. Robinson, Simulating HCI for all, Proceedings of the IET
Conference on Recent Advances in Assistive Technology and Engineering
(RAATE 2007).
19
Chapter 2 Literature survey
Computer simulation is not the deception of others but the construction of "a suitably
analogous apparatus" on a computer.
-Hans-Jiirgen Eikmeyer and Ulrich Schade, from their paper, "The Role of Computer
Simulation in Neurolinguistics", Nordic Journal of Linguistics, 16, 1993
2.1. Introduction
Addressing a large variety of users is always a challenge to designers due to diverse

range of abilities and differences in task, prior knowledge and situation. A user model
is a representation of the knowledge and preferences of users [Benyon and Murray,
1993].
Research on simulating user behaviour to predict machine performance was originally

started during the Second World War. Researchers tried to simulate operators’
performance to explore their limitations while operating different military hardware.
During the same time, computational psychologists were trying to model the mind by
considering it as an ensemble of processes or programs. McCulloch and Pitts’ model
of the neuron and subsequent models of neural networks, and Marr’s model of vision
are two influential works in this discipline. Boden [1985] presents a detailed
discussion of such computational mental models. In the late 70s, as interactive
computer systems became cheaper and accessible to more people, modelling human
computer interaction (HCI) also gained much attention. However, models like Hick’s
Law [Hick, 1952] or Fitts’ Law [Fitts, 1954] which predict visual search time and
movement time respectively were individually not enough to simulate a whole
interaction.
The Command Language Grammar [Moran, 1981] developed by Moran at Xerox

PARC could be considered as the first HCI model. It took a top down approach to
decompose an interaction task and gave a conceptual view of the interface before its
implementation. However it completely ignored the human aspect of the interaction
and did not model the capabilities and limitations of users. Card, Moran and Newell’s
Model Human Processor (MHP) [Card, Moran and Newell, 1983] was an important
milestone in modelling HCI since it introduced the concept of simulating HCI from
the perspective of users. It gave birth to the GOMS family of models [Card, Moran
and Newell, 1983] that are still the most popular modelling tools in HCI.
There is another kind of model for simulating human behaviour that not only works
for HCI but also aims to establish a unified theory of cognition. These types of models
originated from the earlier work of computational psychologists. Allen Newell
pioneered the idea of unifying existing theories in cognition in his famous paper “You
can’t play 20 questions with nature and win” at the 1973 Carnegie Symposium
[Newell, 1973]. Since then, a plethora of systems have been developed that are termed
as cognitive architectures and they simulate the results of different experiments
conducted in psychological laboratories. Since these models are capable (or at least
demanded to be capable) of simulating any type of user behaviour, they are also often
used to simulate the behaviour of users while interacting with a computer. Gray and
colleagues [1997] assert that cognitive architectures ensure the development of
consistent models over a range of behavioural phenomena due to their rigorous
theoretical basis.
So there are two main approaches of user modelling: the GOMS family of models was
developed only for HCI while the models involving cognitive architectures took a
more detailed view of human cognition. Based on the accuracy, detail and
completeness of these models, Kieras [2005] classified them as low fidelity and high
fidelity models respectively. These two types of model can be roughly mapped to two
different types of knowledge representation. The GOMS family of models is based on
goal-action pairs and corresponds to the Sequence/Method representation while
21
cognitive architectures aim to represent the users’ mental model [Carroll and Olson,
1990]. The Sequence/Method representation assumes that all interactions consist of a
sequence of operations or generalized methods, while the mental model representation
assumes that users have an underlying model of the whole system.
There is a third kind of model in HCI that evaluates an interface by predicting users’
expectations, rather than their performance (e.g. Task Action Language [Reisner,
1981], Task Action Grammar [Payne and Green, 1986] etc.). These models represent
an interaction by using formal grammar where each action is modelled by a sentence.
They can be used to compare users’ performance based on standard sentence
complexity measures; however, they have not yet been used and tested extensively for
simulating users’ behaviour [Carroll and Olson, 1990].
In the following sections, I briefly describe these different types of user model. Then,
I present a critical review of existing models and set out the objectives of this
research.
2.2. The GOMS family of models

GOMS stands for Goals, Operators, Method and Selection. It was inspired by the GPS
system [Newell and Simon, 1995] developed by Newell. It assumes that people
interact with a computer to achieve a goal by selecting a method, which consists of a
sequence of basic operations. The GOMS model enables a designer to simulate the
sequence of actions of a user while undertaking a task by decomposing the task into
goals and sub goals [John and Kieras, 1996]. There are many variations of the original
GOMS model.
22
The KLM model [Keystroke Level Model, Card, Moran and Newell, 1983] simplifies
the GOMS model by eliminating the goals, methods, and selection rules, leaving only
six primitive operators. They are:
1. Pressing a key
2. Moving the pointing device to a specific location
3. Making pointer drag movements
4. Performing mental preparation
5. Moving hands to appropriate locations, and
6. Waiting for the computer to execute a command.
The durations of these six operations have been empirically determined. The task
completion time is predicted by the number of times each type of operation must
occur to accomplish the task.
Kieras developed a structured language representation of GOMS model, called

NGOMSL (Natural GOMS Language) [Kieras, 1994]. Originally, it was an attempt to
represent the content of a CCT model [Johnson, 1992] at a higher level of notation.
CCT is a rule-based system developed by Bovaria and colleagues [1990] to model the
knowledge of users of an interactive computer system. In NGOMSL, the methods of
the original GOMS model are represented in terms of production rules of the CCT
model. Kieras and colleagues [1995] also developed a modelling tool, GLEAN
(GOMS Language Evaluation and Analysis), to execute NGOMSL. It simulates the
interaction between a simulated user with a simulated device for undertaking a task.
John and Kieras [1996] proposed a new version of the GOMS model, called CPM-
GOMS, to explore the parallelism in users’ actions. This model decomposes a task
into an activity network (instead of a serial stream) of basic operations (as defined by
KLM) and predicts the task completion time based on the Critical Path Method.
23
2.3. Cognitive architectures

Allen Newell [1990] developed the SOAR (State Operator And Result) architecture as
a possible candidate for his unified theories of cognition. According to Newell [1990]
and Johnson-Laird [1988], the vast variety of human response functions for different
stimuli in the environment can be explained by a symbolic system. So the SOAR
system models human cognition as a rule-based system and any task is carried out by
a search in a problem space. The heart of the SOAR system is its chunking
mechanism. Chunking is “a way of converting goal-based problem solving into
accessible long-term memory (productions)” [Newell, 1990]. It operates in the
following way. During a problem solving task, whenever the system cannot determine
a single operator for achieving a task and thus cannot move to a new state, an impasse
is said to occur. An impasse models a situation where a user does not have sufficient
knowledge to carry out a task. At this stage SOAR explores all possible operators and
selects the one that brings it nearest to the goal. It then learns a rule that can solve a
similar situation in future. Laird and colleagues successfully explained the power law
of practice through the chunking mechanism [Laird, Rosenbloom, and Newell, 1984].
However, there are certain aspects of human cognition (such as perception,

recognition, motor action) that can better be explained by a connectionist approach
than a symbolic one [Oka, 1991]. It is believed that initially conscious processes
control our responses to any situation while after sufficient practice, automatic
processes are in charge for the same set of responses [Hampson and Morris, 1996].
Lallement and Alexandre [1997] have classified all cognitive processes into synthetic
or analytical processes. Synthetic operations are concerned with low level, non
decomposable, unconscious, perceptual tasks. In contrast, analytical operations
signify high level, conscious, decomposable, reasoning tasks. From the modelling
point of view, synthetic operations can be mapped on to connectionist models while
analytic operations correspond to symbolic models. Considering these facts, the ACT-
24
R system [Adaptive Control of Thought- Rational, Anderson and Lebiere, 1998] does
not follow the pure symbolic modelling strategy of the SOAR, rather it was developed
as a hybrid model, which has both symbolic and sub symbolic levels of processing. At
the symbolic level, ACT-R operates as a rule-based system. It divides the long-term
memory into declarative and procedural memory. Declarative memory is used to store
facts in the form of ‘chunks’ and the procedural memory stores production rules. The
system works to achieve a goal by firing appropriate productions from the production
memory and retrieving relevant facts from the declarative memory. However the
variability of human behaviour is modelled at the sub-symbolic level. The long-term
memory is implemented as a semantic network. Calculation of the retrieval time of a
fact and conflict resolution among rules is done based on the activation values of the
nodes and links of the semantic network.
The EPIC (Executive-Process/Interactive Control) [Kieras and Meyer, 1990]

architecture pioneers to incorporate separate perception and motor behaviour modules
in a cognitive architecture. It mainly concentrates on modelling the capability of
simultaneous multiple task performance of users. It also inspired the ACT-R
architecture to install separate perception and motor modules and developing the
ACT-R/PM system. A few examples of their usage in HCI are the modelling of menu
searching and icon searching tasks [Hornof and Kieras, 1997; Byrne, 2001].
The CORE system (Constraint-based Optimizing Reasoning Engine) [Howes, Vera

and Lewis, 2004; Tollinger and colleagues, 2005; Eng and colleagues, 2006,] takes a
different approach to model cognition. Instead of a rule-based system, it models
cognition as a set of constraints and an objective function. Constraints are specified in
terms of the relationship between events in the environment, tasks and psychological
processes. Unlike the other systems, it does not execute a task hierarchy; rather
prediction is obtained by solving a constraint satisfaction problem. The objective
function of the problem can be tuned to simulate the flexibility in human behaviour.
25
There exist additional cognitive architectures (such as Interactive Cognitive

Subsystems [Barnard, 2007], Apex, DUAL, CLARION [Cognitive Architecture,
2007] etc.), but they are not yet as extensively used as the previously discussed
systems.
2.4. Grammar based models

The grammar based model (such as Task action grammar [Payne and Green, 1986]
and Task action language [Reisner, 1981]) simulates an interaction in the form of
grammatical rules. As for example, Task Action Language models
o Operations by Terminal symbols

o Interaction by a Set of rules
o Knowledge by Sentences
This type of modelling is quite useful to compare different interaction techniques.

However, they are more relevant to model knowledge and competence of a user than
performance.
2.5. Application specific models

A lot of work has been done on user modelling for developing customizable
applications. These models have the following generic structure (Figure 2-1). They
maintain a user profile and use different types of Artificial Intelligence (AI) systems
to predict performance. These type of models are particularly popular in online
adaptable systems (such as personalized search engines or portals).
26
The Generative User Model [Motomura, Yoshida and Fujimoto, 2000] was developed
for personalized information retrieval. In this model input query words are related to
user’s mental state and retrieved object using latent probabilistic variables. Norcio
[1989] used fuzzy logic to classify users of an intelligent tutoring system. The fuzzy
groups are used to derive certain characteristic of the user and thus deriving new rules
for each class of user.
Figure 2-1. Application Specific User Models
Norcio and Chen [1992] also used an artificial neural network for the same purpose as
their previous work [Norcio, 1989]. In their model, users’ characteristics are stored as
an image and neural networks are used to find patterns in users’ knowledge, goals and
so on.
The Lumiere convenience project [Horovitz and colleagues, 2008] used influence
diagram in modelling users. Lumiere project is the background theory of the Office
Assistant shipped with Microsoft Office application. The influence diagram models
the relationships among users’ needs, goals, user background etc. However all these
models are developed by keeping only a single application in mind and so they are
hardly usable to model human performance in general.
27
2.6. Review
The GOMS family of models is mainly suitable for modelling the optimal behaviour
(skilled behaviour) of users [John and Kieras, 1996]. These models assume that for
each instance of a task execution, the goal and the plan of a user are determined
before the execution is started. During execution of a task, a novice first time user or a
knowledgeable intermittent user may not have a fixed plan beforehand and can even
change goals (or subgoals) during execution of the task. Even expert users do not
follow a fixed sequence of actions every time. So the assumptions of the GOMS
model may not hold true for many real life interactions. In actuality, these models do
not have probabilistic components beyond the feature of selecting the execution time
of primitive operators from a statistical distribution in order to model the uncertainty
involved in the sub-optimal behaviour of users. As it fails to model the sub-optimal
behaviour, it cannot be used to predict the occurrences of different errors during
interaction. These problems are common for any Sequence/Method representations
since these ways of representations overlook the underlying mental models of users
[Carroll and Olson, 1990].
On the other hand, cognitive architectures model the uncertainty of human behaviour
in detail but they are not easily accessible to non psychologists and this causes
problem as interface designers are rarely psychologist as well. For example, the ACT-
R architecture models the content of a long-term memory in the form of a semantic
network, but it is very difficult for an interface designer to develop a semantic
network of the related concepts of a moderately complex interface. Developing a
sequence of production rules for SOAR or a set of constraints for CORE is equally
difficult. The problem in usability issues of cognitive architectures is also supported
by the development of the X-PRT system [Tollinger and colleagues, 2005] for the
CORE architecture. Additionally, Kieras [2005] has shown that a high fidelity model
cannot always outperform a low fidelity one though it is expected to do so.
28
Researchers have already attempted to combine the GOMS family of models and
cognitive architectures to develop more usable and accurate models. Salvucci and Lee
[2003] developed the ACT-Simple model by translating basic GOMS operations
(such as move hand, move mouse, press key) into ACT-R production rules. However
they do not model the ‘think’ operator in detail, which corresponds to the thinking
action of users and differentiates novices from experts. The model works well in
predicting expert performance but does not work for novices.
Blandford and colleagues [2004] implemented the Programmable User Model (PUM)
[Young, Green and Simon, 1989] by using the SOAR architecture. They developed a
program, STILE (SOAR Translation from Instruction Language made Easy), to
convert the PUM Instruction Language into SOAR productions. However, this
approach also demands good knowledge of SOAR on the part of an interface
designer. Later, the PUM team identified additional problems with runnable user
models and they are now investigating abstract mathematical models [Butterworth
and Blandford, 2007].
There also exist some application specific models that combine GOMS models with a
cognitive architecture. For example, Gray and Sabnani [1994] combined GOMS with
ACT-R to model a VCR programming task, while Peck and John [1992] used SOAR
to model interaction with a help-browser, which ultimately turned out to be a GOMS
model.
Another problem of existing modelling approaches stems from issues related to

disability. Researchers have concentrated on designing assistive interfaces for many
different applications including
o Web Browsers [Stephanidis, 1998 ; IBM Web Adaptation Technology, 2008]
29
o Augmentative and alternative communication aids ([Alm, Arnott and Newell,

1992; Pasero, Richardet and Sabatier, 1994; Mccoy, 1997; Stephanidis, 2003]
etc.).
o New interaction techniques
o Scanning interfaces [Moynahan and Mahoney, 1996; Steriadis and
Constantnou, 2002; Ntoa, Savidis and Stephanidis, 2004]
o Gravity wells [Hwang and colleagues, 2002])
o Novel hardware interfaces
o Head mounted switches
o Eye gaze trackers
o Brain-computer interfaces [Kennedy and colleagues, 2000; Majaranta
and Raiha, 2002; Gnanayutham and colleagues, 2005; Abledata
Products, 2007].
Most of these works concentrate on a particular application or a set of users, which

reduces the scalability of the overall approach. Furthermore, developing systems for a
small segment of market often makes the system very costly [Stephanidis and
colleagues, 1997].
There is not much reported work on systematic modelling of assistive interfaces.

McMillan [1992] felt the need to use HCI models to unify different research streams
in assistive technology, but his work aimed to model the system rather than the user.
The AVANTI project [Stephanidis and colleagues, 1998; 2003] modelled an assistive
interface for a web browser based on static and dynamic characteristics of users. The
interface is initialised according to static characteristics (such as age, expertise, type
of disability and so on) of the user. During interaction, the interface records users’
interaction and adapts itself based on dynamic characteristics (such as idle time, error
rate and so on) of the user. This model works based on a rule based system and does
30
not address the basic perceptual, cognitive and motor behaviour of users and so it is
hard to generalize to other applications.
Keates and colleagues [2000] measured the difference between able-bodied and motor
impaired users with respect to the Model Human Processor (MHP) [Card, Moran and
Newell, 1983] and motor impaired users were found to have a greater motor action
time than their able-bodied counterparts. The finding is obviously important, but the
KLM model itself is too primitive to model complex interaction and especially the
performance of novice users.
My previous user model [Biswas and colleagues, 2005] also took a more generalized
approach than the AVANTI project. It broke down the task of user modelling into
several steps that included clustering users based on their physical and cognitive
ability, customizing interfaces based on user characteristics and logging user
interactions to update the model itself. However the objective of this model was to
design adaptable interfaces and not to simulate users’ performance.
2.7. Objective
Based on the previous discussion, Figure 2-2 plots the existing general purpose HCI
models in a space defined by the skill and physical ability of users. To cover most of
the blank spaces in the diagram, I set my objectives to develop models that can:
o Simulate HCI of both able-bodied and disabled users.

o Work for users with different levels of skill.
o Be easy to use and comprehend for an interface designer.
31
CPM-GOMS EPIC
KLM for Disabled

Expert GOMS
SOAR
Skill ACT-R
level Core
Novice
Disabled Physical Ability Able-bodied
Figure 2-2. Existing HCI models w.r.t. skill and physical ability of users
2.8. Architecture
In light of my objective, I have developed the simulator as shown in Figure 2-3. It
consists of the following three components:
The Application model represents the task currently undertaken by the user by
breaking it up into a set of simple atomic tasks following KLM model [Card, Moran
and Newell, 1983].
The Interface model decides the type of input and output devices to be used by a
particular user and sets parameters for an interface.
The User model simulates the interaction patterns of users for undertaking a task
analysed by the task model under the configuration set by the interface model. It uses
the sequence of phases defined by Model Human Processor [Card, Moran and Newell,
1983].
32
The perception model simulates the visual perception of interface

objects. It is based on the theories of visual attention.
The cognitive model determines an action to accomplish the current
task. It is more detailed than the GOMS model [John and Kieras, 1996]
but not as complex as other cognitive architectures.
The motor behaviour model predicts the completion time and possible
interaction patterns for performing that action. It is based on statistical
analysis of screen navigation paths of disabled users.
Display Perception
Model Model
Task Cognitive
Model Model
Motor
Input
Behaviour
Model
Model
Application Model Interface Model User Model
Figure 2-3. Architecture of the Simulator
2.9. Conclusion
In this chapter I have presented literature survey on human behaviour simulation and
their applications on modelling users in human computer interaction. The review of
the current state-of-the-art work shows a deficiency of modelling tools for users with
33
disabilities. I have developed a simulator to address the problems in existing

modelling techniques. Among all the modules of the simulator, the user model is the
crucial part, which simulates users’ interaction. The following three chapters present
the details of the user model by describing the perception, cognitive and motor
behaviour models.
34
Chapter 3 Perception model
We can now begin to develop a science of graphic design based on a scientific understanding
of visual attention and pattern perception.
-Colin Ware, from his book “Visual Thinking: For Design”, 2008
3.1. Introduction
Computer scientists have studied theories of visual perception extensively for

graphics and, more recently, for human computer interaction (HCI). A good interface
should contain unambiguous objects that are easily distinguishable from each other
and reduce visual search time. In HCI, there are guidelines for designing good
interfaces such as colour selection rules and object arrangement rules [Shneiderman,
1992]. However the guidelines are not always adequate to design a good interface. I
have developed a model of visual perception that can be used to design and evaluate
interfaces based on the visual search time.
My model uses image processing techniques to identify a set of features that

differentiate screen objects. The model is calibrated to estimate fixation durations and
eye movement trajectories and is validated by comparing its predicted visual search
time with actual time for different visual search tasks.
In the next section I present a review of the existing perception models. In the
following sections I discuss the design, calibration and validation of the model.
Finally I make a comparative analysis of my model with other approaches and
conclude by exploring possibilities for further research.
3.2. Related works

Human vision has been investigated in many ways over the years. The Gestalt
psychologists in early 20th century pioneered an interpretation of the processing
mechanisms for sensory information [Hampson and Morris, 1996]. Later the Gestalt
principle gave birth to the top down or constructivist theories of visual perception.
According to this theory, the processing of sensory information is governed by
existing knowledge and expectations [Duncan and Humphreys 1989; Desimone and
Duncan, 1995]. On the other hand, bottom up theorists suggest that perception occurs
by automatic and direct processing of stimuli [Jonides,1981; Itti and Koch, 2001].
Considering both approaches, present models of visual perception incorporate both
top down and bottom up mechanisms [Neisser, 1976]. This is also reflected in recent
experimental results in neurophysiology [Luck and colleagues, 1997; Reynolds and
Desimone, 1999].
Knowledge about theories of perception has helped researchers to develop

computational models of visual perception. Marr’s model of perception is the pioneer
in this field [Marr, 1980] and most of the other models follow its organization. In
recent years, a plethora of models have been developed (e.g. ACRONYM, PARVO,
CAMERA and so on [Rosandich, 1997]), which have also been implemented in
computer systems. The working principles of these models are based on the general
framework proposed in the analysis-by-synthesis model of Neisser [1976] and also
quite similar to the Feature Integration Theory of Triesman [1980]. It mainly consists
of the following three steps:
Feature extraction: As the name suggests, in this step the image is analysed
to extract different features such as colour, edge, shape, curvature and so on
This step mimics neural processing in the V1 region of the brain [Tovee,
2008].
36
Perceptual grouping: The extracted features are grouped together based on

different heuristics or rules (such as the proximity and containment rule in the
CAMERA system, rules of collinearity, parallelism and terminations in the
ACRONYM system [Rosandich, 1997]). Similar types of perceptual grouping
occur in V2 and V3 regions of the brain [Tovee, 2008].
Object recognition: The grouped features are compared to known objects and
the closest match is chosen as the output.
In these three steps, the first step models the bottom up theory of attention while the
last two steps are guided by top down theories. All of these models aim to recognize
objects from a background picture and some of them have been proved successful at
recognizing simple objects (such as mechanical instruments). However, they have not
demonstrated such good performance at recognizing arbitrary objects [Rosandich,
1997]. These early models do not operate at a detailed neurological level. Itti and
Koch [2001] present a review of computational models, which try to explain vision at
the neurological level. Itti’s pure bottom up model [Itti and Koch, 2001] even worked
in some natural environments, but most of these models are used to explain the
underlying phenomena of vision (mainly the bottom up theories) rather than
prediction. As an example of a predictive model, the VDP model [Daly, 1993] uses
image processing algorithms to predict retinal sensitivity for different levels of
luminance, contrast and so on. Privitera and Stark [2000] also used different image
processing algorithms to identify points of fixations in natural scenes, however they
do not have an explicit model to predict eye movement trajectory.
In the field of human computer interaction, the EPIC [Kieras and Meyer, 1990] and
ACT-R [Anderson and Lebiere, 1998] cognitive architectures have been used to
develop perception models for menu searching and icon searching tasks. Both the
37
EPIC and ACT-R models [Hornof and Kieras, 1997; Byrne, 2001] are used to explain
the results of Nielsen’s experiment on searching menu items [Nielsen, 1992], and
found that users search through a menu list in both systematic and random ways. The
ACT-R model has also been used to find out the characteristics of a good icon in the
context of an icon searching task [Fleetwood and Byrne, 2002; 2006]. However, the
cognitive architectures emphasize modeling human cognition and so the perception
and motor modules in these systems are not as well developed as the remainder of the
system. The working principles of the perception models in EPIC and ACT-R/PM are
simpler than the earlier general purpose computational models of vision. These
models do not use any image processing algorithms [Hornof and Kieras, 1997;
Fleetwood and Byrne, 2002; 2006]. The features of the target objects are manually fed
into the system and they are manipulated by handcrafted rules in a rule-based system.
As a result, these models do not scale well to general purpose interaction tasks. It will
be hard to model the basic features and perceptual similarities of complex screen
objects using propositional clauses. Modelling of visual impairment is particularly
difficult using these models. For example, an object seems blurred in a continuous
scale for different degrees of visual acuity loss and this continuous scale is hard to
model using propositional clauses in ACT-R or EPIC. Shah and colleagues [2003]
have proposed the use of image processing algorithms in a cognitive model, but they
have not published any result about the predictive power of their model yet.
In short, approaches based on image processing have concentrated on predicting

points of fixations in complex scenes while researchers in HCI mainly try to predict
the eye movement trajectories in simple and controlled tasks. There has been less
work on using image processing algorithms to predict fixation durations and
combining it with a suitable eye movement strategy in a single model. The EMMA
model [Salvucci 2001] is an attempt in that direction, but it does not use any image
processing algorithm to quantify the perceptual similarities among objects. I have
separately calibrated the model for predicting fixation duration based on perceptual
38
similarities of objects and also calibrated it for predicting eye movements. The
calibrated model can predict the visual search time for two different visual search
tasks with significant accuracy for both able-bodied and visually impaired people.
3.3. Design
The perception model takes a list of mouse events, a sequence of bitmap images of an
interface and locations of different objects in the interface as input, and produces a
sequence of eye movements as output. The model is controlled by four free
parameters: distance of the user from the screen, foveal angle, parafoveal angle and
periphery angle (Figure 3-1). The default values of these parameters are set according
to the EPIC architecture [Kieras and Meyer, 1990].
The model follows the ‘spotlight’ metaphor of visual perception. We perceive

something on a computer screen by focusing attention at a portion of the screen and
then searching for the desired object within that area. If the target object is not found
we look at other portions of the screen until the object is found or the whole screen is
scanned. My model simulates this process in the following three steps.
1. Scanning the screen and decomposing it into primitive features.
2. Finding the probable points of attention fixation by evaluating the similarity of
different regions of the screen to the one containing the target.
3. Deducing a trajectory of eye movement.
Figure 3-1. Foveal, parafoveal and peripheral vision

39
The perception model represents a user’s area of attention by defining a focus

rectangle within a certain portion of the screen. The area of the focus rectangle is
calculated from the distance of the user from the screen and the periphery angle
(distance × tan peripheryangle  ), Figure 3-1). If the focus rectangle contains more
 2
than one probable target then it shrinks in size to investigate each individual item.
Similarly, in a sparse area of the screen, the focus rectangle increases in size to reduce
the number of attention shifts.
The model scans the whole screen by dividing it into several focus rectangles, one of
which should contain the actual target. The probable points of attention fixation are
calculated by evaluating the similarity of other focus rectangles to the one containing
the target. We know which focus rectangle contains the target from the list of mouse
events that was input to the system. The similarity is measured by decomposing each
focus rectangle into a set of features (colour, edge, shape and so on) and then
comparing the values of these features. The focus rectangles are aligned with respect
to the objects within them during comparison. Finally, the model shifts attention by
combining different eye movement strategies (such as Nearest [Findlay, 1992; 1997],
Systematic, Cluster [Fleetwood and Byrne, 2002; 2006] and so on), which are
discussed later.
The model can also simulate the effect of visual impairment on interaction by
modifying the input bitmap images according to the nature of the impairment (such as
blurring for visual acuity loss, changing colours for colour blindness). I discuss the
modelling of visual impairment in the next section. Following that, I present the
calibration and validation of the model using an eye gaze tracking experiment.
40
3.4. Modelling visual impairments

As I have discussed earlier, the perception model takes a set of bitmap images as
input. The visual impairment simulations modify these input images before using the
remainder of the model. In particular, I have modelled the following visual functions
and diseases:
3.4.1. Visual acuity loss
Visual Acuity is the sensitivity of the visual interpretative mechanism of the brain. It
represents the acuteness of vision, which depends on the sharpness of the retinal
image within the eye [Crick and Khaw, 2003].
I have modelled visual acuity loss by using a Gaussian low pass filter. I have
calibrated the filter by blurring a Snellen chart [Crick and Khaw, 2003] and then
observing the effect of blurring on people with normal vision.
3.4.2. Contrast sensitivity loss
Contrast Sensitivity is the ability to perceive differences between an object and its
background. It is measured by the difference in the amount of light reflected
(luminance) from two adjacent surfaces [Crick and Khaw, 2003].
Lack of contrast sensitivity can be modelled by reducing the contrast of an image. So

I model it using a function, which can change the contrast of an image in a scale of 1
to 100.
3.4.3. Maccular Degeneration
Maccular Degeneration is a medical condition which causes loss of central vision.

The macula or macula lutea is an oval yellow spot near the centre of the retina that
provides most detailed vision [Crick and Khaw, 2003]. Maccular Degeneration causes
41
damage to the macula. It is predominantly found in elderly adults, which entails

inability to see fine details, to read, or to recognize faces [Faye, 1980]. It mainly has
two forms: dry and wet.
Dry Maccular Degeneration causes vision loss through loss of photoreceptors (rods
and cones) in the central part of the eye [Faye, 1980]. It progresses at a slower pace
than the wet form and vision loss is less severe. In the dry form, the macula thins over
time as part of the aging process. Words may appear blurred or hazy and colours may
appear dim or gray.
In the wet form of Maccular Degeneration, patients lose vision due to abnormal blood
vessel growth, ultimately leading to blood and protein leakage below the macula
[Faye, 1980]. The wet form may cause visual distortion and make straight lines
appear wavy. A central blind spot develops in later stage of the disease. The wet type
progresses more rapidly and vision loss is more pronounced.
I have simulated Maccular Degeneration based on interviews with ophthalmologists

and following literature [Faye, 1980]. I have considered the following two facts
during developing the programs for simulation
o Maccular Degeneration causes loss of central visual field.

o Dry Maccular Degeneration progresses at a slower pace than the wet form of
the disease.
I have simulated the central field loss by a function that takes as input
o an image
o the tentative point of eye gaze fixation on it,
o the radius of the lost visual field
42
o the relative position of the pseudo-fovea [Faye, 1980].
The function processes the input image to put a black (or grey) spot of the specified
radius at the point of fixation. It also shifts the point of fixation according to the
position of the pseudo-fovea. The radius of the black spot increases proportionately
with the progress of the disease.
However, I have simulated the initial stage of wet Maccular Degeneration separately
from the dry form. I have simulated the early stages of wet Maccular Degeneration by
distorting and blurring the image. For simulating the early stages of dry Maccular
Degeneration, I have used a function that takes as input
o an image
o the tentative point of eye gaze fixation on it
o the size, number and positions of scotoma (black patches) with respect to the
point of fixation
The function processes the image to draw the black patches at some random positions
within a ring surrounding the point of eye gaze fixation. The number and size of the
scotoma are determined from a normal distribution with the given parameters as
mean. The nephroid and cardioid curves are used to draw the patches, which closely
match the shape of the scotoma according to the ophthalmologists. As the disease
progresses, the patches grow in size and covers the macula. At this stage, the
simulation signifies central visual field loss and starts to use the function described in
the previous paragraph and progressively increases the disc size. I also blur the whole
image using a Gaussian low pass filter. The standard deviation and template size of
the filter increases proportionately with the progress of the disease.
43
3.4.4. Diabetic Retinopathy
Diabetic Retinopathy is a type of damage to the retina (retinopathy) caused by

damage of blood vessels inside the eyes due to diabetes mellitus. It affects up to 80%
of all patients who have had diabetes for 10 years or more [Crick and Khaw, 2003].
Due to poor blood sugar control, the small blood vessels inside the eyes leak and the
leaked fluid occlude the path of light. As a result random black spots (or scotoma)
appear into the vision. The leaked fluid from the blood vessels makes the macula
swell, which also blurs vision.
I have simulated Diabetic Retinopathy by interviewing the ophthalmologists. I have
used a function which takes as input
o an image
o the tentative point of eye gaze fixation on it
o the size and the number of scotoma.
The function processes the image to draw the black patches at some random positions
within an area defined by the point of attention fixation and the periphery angle. The
number and size of the black patches are determined from a normal distribution with
the given parameters as mean. As the disease progresses, the patches grow in size and
number. I also blur the screen as in Macular Degeneration.
For Maccular Degenration and Diabetic Retinopathy, I use the modified image to the
perception model as input. The perception model works following the previously
mentioned three steps on these images.
3.4.5. Glaucoma
Glaucoma is caused by the death of retinal ganglion cells [Faye, 1980]. Higher
intraocular pressure is one of the significant risk factors for glaucoma. Initially it only
44
creates a few scotoma but with the progress of the disease, patients loose peripheral
field vision and eventually become blind. It is the second leading cause of blindness
worldwide [Crick and Khaw, 2003].
I have simulated Glaucoma by interviewing the ophthalmologists. The early stage of

Glaucoma is simulated by using the same function that I used for early stage of dry
Maccular Degeneration. As the disease progresses the black patches increase in size
and at a later stage they cover the peripheral field of vision, unlike dry Maccular
Degeneration in which case they cover the central field of vision. When the model
simulates complete loss of peripheral field of vision (tunnel vision), the perception
model works differently than other cases since the user can never see the screen as a
whole. The model scans the whole screen from left to right and top to bottom as it is
the default scanning pattern for most people (though it may differ among different
cultures). I interviewed a few ophthalmologists regarding this question and it turns out
that the visual search pattern of people with tunnel vision is not yet well known. So I
have assumed the default, since there is no reason to assume a special scanning
strategy.
3.4.6. Colour blindness
Colour blindness is the inability to perceive differences between some of the colours
that other people can distinguish [Kaiser and Boynton, 1996]. The normal human
retina contains two kinds of photoreceptor cells: the rod cells (active in low light) and
the cone cells (active in normal daylight). Normally, there are three kinds of cones,
each containing a different pigment. The cones are activated when the pigments
absorb light. The absorption spectra of the cones differ; one is maximally sensitive to
short wavelengths, one to medium wavelengths, and the third to long wavelengths.
Their peak sensitivities are in the blue, yellowish-green, and yellow regions of the
spectrum, respectively. The sensitivity of normal colour vision depends on the overlap
between the absorption spectra of the three systems - different colours are recognized
45
when the different types of cone are stimulated to different extents. There exist three
main types of colour blindness [Kaiser and Boynton, 1996; Tovee, 2008]:
• Protanopia: Lacking the long-wavelength sensitive retinal cones, those with

this condition are unable to distinguish between colours in the green-yellow-
red section of the spectrum. They have a neutral point [Kaiser and Boynton,
1996] at a greenish wavelength around 492 nm, which means they cannot
discriminate light of this wavelength from white.
• Deuteranopia: Lacking the medium-wavelength cones, those affected are

again unable to distinguish between colours in the green-yellow-red section of
the spectrum. Their neutral point is at a slightly longer wavelength, 498 nm.
The deuteranope suffers the same hue discrimination problems as the
protanope, but without the abnormal dimming.
• Tritanopia: Lacking the short-wavelength cones, those affected are unable to

distinguish between the colours in the blue-yellow section of the spectrum.
I have simulated colour blindness by following the algorithms of Vienot and

colleagues [1999] and Brettel and colleagues [1997]. This algorithm converts the
images from the RGB colour model to LMS colour model, changes the LMS values
according to the type of colour blindness and then restores the images to RGB space.
The LMS colour space is closer to the cone responses of human vision than other
colour spaces (e.g. CIE XYZ, HSV and so on) and so simulating the effects of colour
blindness using the LMS colour space is more accurate than doing it in other colour
spaces. I have initially verified my programs by comparing its output with a previous
simulation [Interactive demonstration 2008, (Figure 3-2)].
46
Original Image Protanopia Deuteranopia Tritanopia

Brettels’ Simulation
Not done
My Simulation
Figure 3-2. Comparing my colour blindness simulation with Brettels’ Simulation
Original Image Protanopia Simulation Deuteranopia Simulation
Ishihara test
Plate 16
Plate 17
Figure 3-3. Simulation of Ishihara test

47
I have also confirmed the correctness of my program using the Ishihara Test for
colour blindness [Colour Blindness Test, 2008]. I have used plate 16 and 17 of a 24-
plate version of the test (Figure 3-3). It can be seen in Figure 3-3 that the right hand
digit is prominent for Protanopia simulation, while the left hand one for Deuteranopia,
as should happen in cases of protans and deuterans.
3.4.7. Demonstrations
In this section, I have demonstrated a few sample executions of my simulator. In all

these figures, the desired target is the blue triangle at the bottom right corner (marked
with a white dot). The black line indicates a probable trajectory of eye movements
through a series of intermediate points of attention fixation marked with yellow rings.
b. Early stage of Wet Maccular

a. Normal Vision Degeneration
d. Early stage of Dry Maccular

c. Visual Acuity Loss Degeneration
48
f. Severe Maccular Degeneration

e. Contrast Sensitivity Loss
g. Early stage of Diabetic Retinopathy h. Late stage of Diabetic Retinopathy
i. Colour Blindness – Protanopia j. Colour Blindness – Deuteranopia
Figure 3-4. Visual Impairment Simulation
The early stage of wet Maccular Degeneration (Figure 3-4b) introduces blurring and
distortion while the early stage of dry Maccular Degeneration (Figure 3-4d)
introduces some random black spots. As the disease progress, the black spot increases
in radius signifying more central visual field loss (Figure 3-4f). In case of Diabetic
Retinopathy (Figures 3-4g and 3-4h), some random black spots appear at the region of
attention fixation. As the disease progresses, they increase both in size and number
49
(Figure 3-4h). Protans (Figure 3-4i) and Deuterans (Figure 3-4j) do not face any
problem for this particular screen and target, as it does not hamper vision of blue
colour, but the red and green targets appear differently to them. In cases of visual
acuity loss (Figure 3-4c), Maccular Degeneration (Figure 3-4d) and Diabetic
Retinopathy (Figure 3-4h) the number of points of fixation is greater than in normal
vision (Figure 3-4a), contrast sensitivity loss (Figure 3-4e) and colour blindness
(Figures 3-4i and 3-4j) since users may need to investigate all blue targets due to
blurring of the screen.
3.4.8. User interfaces of the model
To cover a wide range of visual impairments, I have developed the user interfaces in
three different levels - in the first level (Figure 3-5) the system simulates different
diseases. In the next level (Figure 3-6) the system simulates the effect of change in
different visual functions (e.g. Visual acuity, Contrast sensitivity, Visual field loss
etc.). In the last level (Figure 3-7), the system allows different image processing
algorithms to be run (such as High pass filtering, Blurring etc.) on input images to
simulate the effect of a particular impairment. This approach also makes it easier to
model the progress of an impairment. Previous simulations of visual impairments
model the progress of impairment by a single parameter [Inclusive Design Toolkit,
2008 and Vision Simulator, 2008] or using a large number of parameters [Visual
Impairment Simulator, 2008]. In my system, the progress of any impairment can be
modelled either by a single parameter or by changing the values of different visual
functions. For example, the extent of a particular case of Maccular Degeneration can
be modelled either by a single scale (Figure 3-5) or by using different scales for visual
acuity and central visual field loss (Figure 3-6).
50
Figure 3-5. Screenshot to model diseases
Figure 3-6. Screenshot to model visual functions

51
Figure 3-7. Screenshot to run image processing algorithms
3.5. Experiment to collect eye tracking data

In this experiment, I have investigated how eyes move across a computer screen while
searching for a particular target. I have kept the searching task very simple to avoid
any cognitive load. The eye gazes of users were tracked by using a Tobii X120 eye
tracker [Tobii Eye Tracker, 2008].
3.5.1. Process
I conducted trials with two families of icons. The first consisted of geometric shapes
with colours spanning a wide range of hues and luminance (Figure 3-8). The second
consisted of images from the system folder in Microsoft Windows to increase the
external validity (Figure 3-9) of the experiment.
52
Figure 3-8. Corpus of Shapes
Figure 3-9. Corpus of Icons
The experimental task consisted of searching two families of icons. The task was as
follows
1. A particular target (shape or icon) was shown.
2. A set of 18 candidates was shown.
3. Participants were asked to click on the candidate(s), which are same as the
target.
4. The number of candidates similar to the target was randomly chosen between
1 and 8 to simulate both serial and parallel searching effects [Treisman and
Gelade, 1980], the other candidates were distractors.
5. The candidates were separated by 150 pixels horizontally and by 200 pixels
vertically.
6. Each participant did ten searching tasks with two families of icons.
3.5.2. Material
I used a 1024 × 768 LCD colour display driven by a 1.7 GHz Pentium 4 PC running
the Microsoft Windows XP operating system. I also used a standard computer Mouse
53
(Microsoft IntelliMouse® Optical Mouse) for clicking on the target and a Tobii X120
Eye Tracker for tracking eye gaze pattern, which has an accuracy of 0.5º of visual
angle. The Tobii studio software was used to extract the points of fixation. I used the
default fixation filter (Tobii fixation filter) and fixation radius (minimum distance to
separate two fixations) of 35 pixels.
3.5.3. Participants
I collected data from 8 visually impaired and 10 able-bodied participants (Table 3-1).
All were expert computer users and had no problem in using the experimental set up.
Table 3-1. List of Participants
Age Gender Impairment
C1 22 M
C2 29 M
C3 27 M
C4 30 F
C5 24 M
Able-bodied
C6 28 M
C7 29 F
C8 50 F
C9 27 M
C10 25 M
P1 24 M Retinopathy
P2 22 M Nystagmus and acuity loss due to Albinism
P3 22 M Myopia (-3.5 / -3.5 Dioptre)
P4 50 F Colour blindness - Protanopia
P5 24 F Myopia (-4.5 / -4.5 Dioptre)
54

P6 24 F Myopia (-5.5 / -5.5 Dioptre)
P7 27 M Colour blindness - Protanopia
P8 22 M Colour blindness - Protanopia
3.5.4. Calibration for predicting fixation duration
Initially I measured the drift of the eye tracker for each participant. The drift was
smaller than half the separation between the icons, so most of the fixations around the
icons could be identified. I calibrated the model to predict fixation duration by the
following two steps.
Step 1: Calculation of image processing coefficients and relating them to the fixation
duration
As I discussed in section 3.2, the first phase of the process of vision is feature
extraction. To extract features, I calculated the
o Colour Histogram [Nixon and Aguado, 2002] algorithm for colour

o Sobel operator [Nixon and Aguado, 2002] to extract edges and
o Shape Context coefficients [Belongie, Malik and Puzicha, 2000; 2002] to
extract shape features of target and distractors (whose locations in the screen
are also part of the input to the perception model). [These algorithms are
further discussed in Appendix]
Then I used a Support Vector Machine (SVM) and a cross validation test to identify
the best feature set for predicting fixation duration for each participant as well as for
all participants. I found that the Shape Context Similarity coefficient and the Colour
55
Histogram coefficient in YUV space work best for all participants taken together. The
combination also has a recognition rate within the 5% limit of the best classifier for
individual participants. Finally I measured the correlation of the Colour Histogram
and Shape Context coefficients between the targets and distractors with the fixation
durations (Table 3-2). The image processing coefficients correlate significantly with
the fixation duration, though the significance is not indicative of their actual
predictive power, as the number of data points is large. However, the Colour
Histogram algorithm in YUV space is moderately correlated (0.51) with the fixation
duration (Figure 3-10). So I developed a classifier that takes the Shape Context
Similarity coefficient and Colour Histogram coefficient in YUV space of a target as
input and predicts the fixation duration on it as output.
Table 3-2. Correlation between fixation duration and image processing algorithms
Colour Colour
Image Shape Edge
Histogram Histogram
Statistics Context Similarity
(YUV) (RGB)
Spearman’s 0.507** 0.444** 0.383** 0.363**
Rho (ρ)
**All are significant at 0.01 level
Colour Histogram (YUV) Vs. Fixation Duration

Fixation Duration (in msec)
1600
1400
1200
1000
800
600
400
200
0
0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 1.05
Colour histogram(YUV) coefficient
Figure 3-10. Relating Colour Histogram coefficients with fixation duration

56
Step 2: Number of fixations
I found in the eye tracking data that users often fixed eye gaze more than once on
targets or distractors. I investigated the number of fixations with respect to the
fixation durations (Figures 3-11 and 3-12). I assumed that in case of more than one
attention fixation, the recognition took place during the fixation with the largest
duration. Figure 3-12 shows the total number of fixations with respect to the
maximum fixation duration for all able-bodied users and each visually impaired user.
I found that visually impaired people fixed eye gaze more often than their able-bodied
counterparts. Participant P2 (who has nystagmus) has many fixations of duration less
than 100 msec and only two fixations having duration more than 400 msec.
It can be seen that as the fixation duration increases, the number of fixations also
decreases (Figures 3-11 and 3-12). This can be explained by the fact that when the
fixation duration is higher, the users can recognize the target and do not need more
long fixations on it. The number of fixations is smaller when the fixation duration is
less than 100 msec, probably these are fixations where the distractors are very
different from the targets and users quickly realize that they are not intended target.
In my model, I predict the maximum fixation duration using the image processing
coefficients (as discussed in the previous section) and then decide the number of
fixations based on the value of that duration.
57
No of Fixations
250
200
Total No. of Fixations
150
100
50
0
0-100 101-200 201-300 301-400 401-500 501-600 601-700 701-800 801-900 901-1000 >1000
Maximum Fixation Duration (msec)
Figure 3-11. Total no. of fixations w.r.t. fixation duration
Number of Fixations
80
Able-bodied (Avg)
70
P1
Number of Fixations
60
P2
50 P3
40 P4
30 P5
P6
20
P7
10
P8
0
100 200 300 400 500 600 700 800 900 1000
Fixation Duration (in msec)
Figure 3-12. Number of fixations w.r.t. fixation duration
3.5.5. Calibration for predicting eye movement patterns
I investigated different strategies to explain and predict the actual eye movement
trajectory. I rearranged the points of fixation given by the eye tracker following
58
different eye movement strategies and then compared the rearrangements with the
actual sequences, which signify the actual trajectories.
I used the average Levenshtein distance between actual and predicted eye fixation
sequences to compare different eye movement strategies. I converted each sequence
of points of fixation into a string of characters by dividing the screen into 36 regions
and replacing a point of fixation by a character according to its position in the screen
[Privitera and Stark, 2000]. The Levenshtein distance measures the minimum number
of operations needed to transform one string into the other, where an operation is an
insertion, deletion, or substitution of a single character. I normalized the Levenshtein
 No.ofOperations 
distance within a range of 0 to 1 using the following formula 1 − .
 StringLength 
I considered the following eye movement strategies,
o Nearest strategy [Fleetwood and Byrne, 2002; 2006]: At each instant, the
model shifts attention to the nearest probable point of attention fixation from
the current position.
o Systematic Strategy: Eyes move systematically from left to right and top to
bottom.
o Random Strategy: Attention randomly shifts to any probable point of
fixation.
o Cluster Strategy: The probable points of attention fixation are clustered
according to their spatial position and attention shifts to the centre of one of
these clusters. This strategy reflects the fact that a saccade tends to land at the
centre of gravity of a set of possible targets [O’Regan, 1992; Findlay, 1992;
1997], which is particularly noticeable in eye tracking studies on reading
tasks.
59
o Cluster Nearest (CN): The points of fixations are clustered and the first
saccade launches at the centre of the biggest cluster (highest number of points
of fixation). Then the strategy switches to the Nearest strategy.
Figures 3-13 and 3-14 show the average Levenshtein distance for different eye
movement strategies for able-bodied and visually impaired participants respectively.
The best strategy varies across participants. However one of the Cluster, Nearest and
Cluster Nearest (CN) strategies was best for each participant individually. I did not
find any difference in the eye movement patterns of able-bodied and visually impaired
users. The Cluster Nearest strategy turns out to be the best considering all participants
together. It is also significantly better than the random strategy (Figure 3-15, two
tailed paired t-test, t(180) = 3.89, p < 0.001), which indicates that it actually captures
the pattern of eye movement in most of the cases.
Comparing Eye Movement Strategies
0.60
Average Levenshtine Distance
0.50
Nearest
0.40 Systematic
Cluster
0.30 CN
NR
0.20 NCR
Random
0.10
0.00
C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 All
Participants
Figure 3-13. Average Levenshtein Distance for able-bodied users
60
Comparing Eye Movement Strategies
0.45
Average Levenshtine Distance
0.40
0.35
CN
0.30
Cluster
0.25
Nearest
0.20
Systematic
0.15
0.10 Random
0.05
0.00
P1 P2 P3 P4 P5 P6 P7 P8 All
Participants
Figure 3-14. Average Levenshtein Distance for visually impaired users
Average
Levenshitin
Distance
Eye Movement Strategies
Figure 3-15. Comparing the best strategy against the Random strategy
3.6. Validation
Initially I have used a 10-fold cross validation test on the classifiers to predict fixation
durations. In this test, 90% data was randomly selected for training and the prediction
61
was tested on the remaining 10%. The process is repeated 10 times and the prediction
error is averaged. It can be seen that the prediction error is less than or equal to 40%
for 12 out of 18 participants and 40% taking all participants together (Figure 3-16).
Cross Validation Test to predict Fixation Duration
70
60
50
% Error
40
30
20
10
0
C1
C2
C3
C4
C5
C6
C7
C8
C9
C10
P1
P2
P3
P4
P5
P6
P7
P8
All
Participants
Figure 3-16. Cross validation test on the classifiers
Then, I have used the model to predict the total fixation time for each individual
search task by each participant. The total fixation time is summation of all fixation
durations, which is nearly same as the visual search time. Table 3-3 shows the
correlation coefficient between actual and predicted time for each participant. Figure
3-17 shows a scatter plot of the actual and predicted times taking all able-bodied
participants together and Figure 3-18 shows the scatter plot for each visually impaired
participant.
For able-bodied participants, the predicted time significantly correlates with the actual
for 6 participants (each undertook 10 search tasks), correlates moderately for 3
participants and did not work for one participant (participant C8). For visually
impaired participants, the predicted time significantly correlates with the actual for 5
participants (each undertook 10 search tasks), correlates moderately for 3 participants.
62
Table 3-3. Correlation between actual and predicted total fixation time
Participants Correlation
C1 0.74*
C2 0.79**
C3 0.78**
C4 0.46
C5 0.44
C6 0.74*
C7 0.53
C8 -0.31
C9 0.91**
C10 0.66*
P1 0.85**
P2 0.45
P3 0.63
P4 0.67*
P5 0.84**
P6 0.76**
P7 0.73**
P8 0.53
** p < 0.01
* p < 0.05
63
Scatter Plot for Able-bodied Users
20000
Predicted Time (in msec)
18000
16000
14000
12000
10000
8000
6000
4000
2000
0
0 2000 4000 6000 8000 10000 12000
Actual Time (in msec)
Figure 3-17. Scatter plot of actual and predicted time for able-bodied users
Scatter Plot for Visually Impaired Users

P1
25000 P2
P3
Predicted Time (for msec.)
20000 P4
P5
P6
15000
P7
P8
10000 Linear (P1)
Linear (P2)
5000 Linear (P3)
Linear (P4)
0 Linear (P5)
0 2000 4000 6000 8000 10000 12000 Linear (P6)
Linear (P7)
Actual Time (for msec.)
Linear (P8)
Figure 3-18. Scatter plot of actual and predicted time for visually impaired users
I also validated the model using a Leave-1-out validation test. In this process I tested
the model for each participant by training the classifiers using data from the other
participants. Figure 3-19 shows the scatter plot of actual and predicted time. The
64
predicted and actual time is correlated significantly (ρ = 0.5, p < 0.01). I also
Predicted − Actual
calculated the relative error Actual and show its distribution in Figure 3-20.
The superimposed curve shows a normal distribution with same mean (-5%) and
standard deviation (66%) as the relative error. I also found that 64% of the trials have
a relative error within ± 40%.
Scatter Plot for Leave 1 out Validation
20000
18000
16000
14000
12000
10000
8000
6000
4000
2000
0
0 2000 4000 6000 8000 10000 12000 14000 16000 18000 20000
Figure 3-19. Scatter plot of predicted and actual time
65
Error Plot
25
20
15
% Data
10
0
0
0
0
0
0
0
0
0
0
0
00
80
60
40
20
00
0
20
40
60
80
10
12
14
16
18
20
-8
-6
-4
-2
-2
-1
-1
-1
-1
-1
% Error
Figure 3-20. Relative error in prediction
Finally I validated the model by taking data from seven new participants (Table 3-4).
I used a single classifier for all of them which was trained by the previous data set. I
did not change the value of any parameter of the model for any participant. Table 3-4
shows the correlation coefficients between actual and predicted time for each
participant. Figure 3-21 shows a scatter plot of the actual and predicted times for each
participant. It can be seen that the prediction from the model significantly correlates
with actual for 6 out of 7 participants.
Table 3-5 shows the actual and predicted visual search paths for some sample tasks.
The prediction is similar though not exactly same. The model successfully detected
most of the points of fixation. In the second picture of Table 3-5, there is only one
target, which pops out from the background. The model successfully captures this
parallel searching effect while the serial searching is also captured in the other cases.
The last figure shows the prediction for a protanope (a type of colour blindness)
66
participant and so the right hand figure is different from the left hand one as the effect
of protanopia was simulated on the input image.
Table 3-4. New Participants
Participants Age Gender Correlation Impairment

V1 29 F 0.64* None
V2 29 M 0.89** None
V3 25 F 0.70* None
*
V4 25 F 0.72 Myopia (-4.75 / -4.5 Dioptre)
*
V5 25 F 0.69 Myopia (-3.5 / -3.5 Dioptre)
V6 27 F 0.44 Myopia (-8 / -7.5 Dioptre)
V7 26 M 0.70* None
* p < 0.05
** p < 0.01
Validating Perception Model
14000
V1
12000 V2
V3
10000
V4
V5
8000
V6
6000 V7
Linear (V1)
4000 Linear (V2)
Linear (V3)
2000 Linear (V4)
Linear (V5)
0 Linear (V6)
0 2000 4000 6000 8000 10000 12000 14000 Linear (V7)
Figure 3-21. Scatter plot of actual and predicted time for new users
67
Table 3-5. Actual and predicted visual search path
Actual Eye Gaze Pattern Predicted Eye Gaze Pattern
68
3.7. Discussion
The eye tracking data shows that the eye movement patterns are different for different
participants. The performance of the eye tracker (drift, fixation identification and so
on) also differs across participants.
I found that the visual search time is greater for visually impaired users than for able-
bodied users. However, the eye movement strategies of visually impaired users were
not different from their able-bodied counterparts. This is due to the fact that the V4
region in the brain controls the visual scanning and visually impaired participants did
not have any brain injury and so the V4 region worked the same as the able-bodied
users. However, visually impaired users had a greater number of attention fixations
which made the search time longer. Additionally the difference between the numbers
of fixations for able-bodied and visually impaired users was more prominent for
shorter duration fixations (less than 400 msec). Perhaps this means that visually
impaired users needed many short duration fixations to confirm the recognition of
target.
From an interface designers’ point of view, these results indicate that the clarity and
distinctiveness of targets are more important than the arrangement of the targets in a
screen. Since the eye movement patterns are almost identical for all users, the
69
arrangement of the targets need not be different to cater for visually impaired users.
However clarity and distinctiveness of targets will reduce the visual search time by
reducing the recognition time and the number of fixations as well.
Regarding my model, I tried to keep it as general as possible by using the same

feature set (Shape Context Similarity coefficient and Colour Histogram coefficient in
YUV space) to predict fixation duration for all participants. Additionally, I also used
the same eye movement strategy (Cluster Nearest) for all participants. The
experimental task consisted of searching for both basic shapes and real life icons. I
found that the fixation duration does not depend on the type of the target (icon/shape),
hence, the model does not need to be tuned for a particular task and works for both
types of search task. Table 3-6 presents a comparative analysis of my model with the
ACT-R/PM and EPIC models. My model seems to be more accurate, scalable (in
terms of stimuli that can be tackled) and easier to use than the existing models.
However, in real life situations the model fails to take account of the domain
knowledge of users. This knowledge can be either application specific or application
independent. There is no way to simulate application specific domain knowledge
without knowing the application beforehand. However there are certain types of
domain knowledge that are application independent and apply for almost all
applications. For example, the appearance of a pop-up window immediately shifts
attention in real life, however the model still looks for probable targets in the other
parts of the screen. Similarly, when the target is a text box, users focus attention on
the corresponding labels rather than other text boxes, which I have not yet modelled.
There is also scope to model perceptual learning. For that purpose, I could incorporate
a factor like the frequency factor of the EMMA model [Salvucci 2001] or consider
some high level features like the caption of a widget, handle of the application and so
on to remember the utility of a location for a certain application. These issues did not
70
arise in most previous work since they considered very specific and simple search
tasks.
Table 3-6. Comparative analysis of my model
ACT-R/PM or My Model Advantages of

EPIC models my model
Storing Propositional Spatial Array
Stimuli Clauses Easy to use and
Extracting Manually Automatically using scalable
Features Image Processing
algorithms
Matching Rules with binary Image processing More accurate
Features outcome algorithms that give the
minimum squared error
Modelling Not relevant as Considers the type of More detailed and
top down applied to very target (e.g. button, icon, practical
knowledge specific domain. combo box and so on).
Shifting Systematic/ Random Clustering/ Nearest Not worse than
Attention and /Random strategy previous, probably
Nearest strategy more accurate
3.8. Conclusion
In this work, I have developed a systematic model of visual perception which works
for people with a wide range of abilities. I have used image processing algorithms to
quantify the perceptual similarities among objects and predict the fixation duration
based on that. I have calibrated the model by considering different eye movement
strategies. My model is intended to be used by software engineers to design software
interfaces. So I have tried to make the model easy to use and comprehend. As a result
it is not so detailed and accurate to explain the results of different psychological
experiments on visual perception. However, it is accurate enough to select the best
interface among a pool of interfaces based on the visual search time. Additionally, it
can be tuned to capture the individual differences among users and to generate
accurate prediction for any user.
71
Chapter 4 Cognitive model
The human mind does not reach its goals mysteriously or miraculously. Even its sudden
insights and "ahas" are explainable in terms of recognition processes, well-informed search,
knowledge-prepared experiences of surprise, and changes in representation motivated by
shifts in attention. When we incorporate these processes into our theory, as empirical
evidence says we should, the unexplainable is explained.
- Herbert A. Simon, from his book " Machine As Mind", 1995
4.1. Introduction
Cognition refers to the underlying mental processes of our all activities including
perception, intuition, reasoning, judgement and so on. Research on cognitive
modelling dates back to the work of computational psychologists during the 1940s.
The early attempts of cognitive modelling [Duffy, 2008] include the use of various
mathematical models like Bayes’ decision model (e.g. Edwards [1962] probabilistic
information processor) or Shanon’s information theory. Recent research on cognitive
modelling address a wide range of topics such as investigating mental processes for
new idea generation [Wang, 2008], speech perception [Strauss, Mirman and
Magnuson, 2006], bilingualism [Li, 2006], knowledge representation, learning
[Griffiths, Kemp and Tenenbaum, 2008] and so on. However the domain of cognitive
modelling is currently overwhelmed by the cognitive architectures and models
developed using them. As I discussed in chapter 2, the main problems with cognitive
architectures are
1. They are difficult to use without detailed knowledge of psychology.

2. They are mainly used for exploratory modelling. For predictive purpose, they
need a lot of parameter tuning which reduces their scalability.
My cognitive model hits a balance between the details of cognitive architectures and
the comprehensibility of GOMS family of HCI models. It takes a task definition as
input and produces the most probable users’ action as output. In the following
sections I discuss the design of the model and demonstrate three case studies of using
the model. However I have not addressed cognitive impairment in the present
research.
4.2. Design
I have modelled the optimal (expert) and sub-optimal (non expert) behaviour
separately. I have used the CPM-GOMS model [John and Kieras, 1996] to simulate
the optimal behaviour. For sub-optimal behaviour, I have developed a new model.
This model takes a task definition as input and produces a sequence of operations
needed to accomplish the task as output. It simulates interaction patterns of non expert
users by two interacting Markov processes. One of them models the user’s view of the
system and the other signifies the designer’s view of the system. Users operate in the
users’ space to achieve their goals. They do this by converting their intended actions
into an operation offered by the device. At the same time, they map a state of the
device space into a state of the user space to decide on the next action. Users behave
sub-optimally when these mappings between the device space and the user space drift
apart. The assumptions can be summarized as follows:
o Users and devices operate in two different state spaces [Rieman and Young,
1996].
o Each state space can be modelled as a Markov Decision Process. This is

consistent with the fact of finite capacity of short-term memory.
o Users follow the principle of maximum rationality [Newell, 1990], so if they

know an action to achieve their goal, then they will select that action.
73
o Users behave sub-optimally by not properly converting their intended action

into a device operation and misperception of a device state.
o A good interface will minimize the mismatch between the user space and the
device space.
The operation of the system is illustrated in Figure 4-1. At any stage, users have a
fixed policy based on the current task in hand. The policy produces an action, which
in turn is converted into a device operation (e.g. clicking on a button, selecting a menu
item and so on). After application of the operation, the device moves to a new state.
Users have to map this state to one of the state in the user space. Then they again
decide a new action until the goal state is achieved.
Action Operation
Old User Action to Old Device
Operation
State State
Mapping
New User State New Device

State Mapping State
User Space Mapping Device Space
Figure 4-1. Sequence of events in an interaction
74
4.2.1. Learning
Besides performance simulation, my model also has the ability to learn new
techniques for interactions. Learning can occur either offline or online. The offline
learning takes place when the user of the model (such as an interface designer) adds
new states or operations to the user space. The model can also learn new states and
operations itself. During execution, whenever the model cannot map the intended
action of the user into an operation permissible by the device, it tries to learn a new
operation. To do so, it first asks for instructions from outside. The interface designer
is provided with the information about previous, current and future states and he can
choose an operation on behalf of the model. If the model does not get any external
instructions then it searches the state transition matrix of the device space and selects
an operation according to the label matching principle [Rieman and Young, 1996]. If
the label matching principle cannot return a prospective operation, it randomly selects
an operation that can change the device state in a favourable way. It then adds this
new operation to the user space and updates the state transition matrix of the user
space accordingly. In the same way, the model can also learn a new device state.
Whenever it arrives in a device state unknown to the user space, it adds this new state
to the user space. It then selects or learns an operation that can bring the device into a
state desirable to the user. If it cannot reach a desirable state, it simply selects or
learns an operation that can bring the device into a state known to the user.
The model can also simulate the practice effect of users. Initially the mapping
between the user space and the device space remains uncertain. It means that the
probabilities for each pair of state/action in the user space and state/operation in the
device space are less than 1. After each successful completion of a task the model
increases the probabilities of those mappings that lead to the successful completion of
the task and after sufficient practice the probability values of certain mappings reach
75
one. At this stage the user can map his space unambiguously to the device space and
thus behave optimally.
4.2.2. User interfaces
One important aspect of a cognitive model is its own usability, which is mostly
ignored in the current literature on cognitive models. I have developed user interfaces
for developing and running the model (Figures 4-2 and 4-3 respectively). Following
my approach, any model should be developed in three steps. In the first step, the
designer has to specify the possible user states and actions. Then he has to define a
state transition diagram for the current task by selecting a state and an action
alternatively. This can be done with the help of a physical data flow diagram (for
structured design) or a state transition diagram (for object-oriented design), which are
developed as part of the system design document. Individual entries of the probability
transition matrix can be modified by clicking on the ‘Advanced Control’ button
(Figure 4-2a). In step 2, all of the previous operations have to be repeated for
developing the device space. Finally in step 3, the states and actions of the user space
and the device space have to be mapped to each other. The mapping can be done by
defining a joint probability distribution matrix using the interface shown in Figure 4-
2d. The interface designer is also free to choose any advanced modelling techniques
(such as a rule-based system or a decision network) to model the mapping between
the user space and the device space. Once developed, the model can be run using the
interface shown in Figure 4-3a. At this stage, the system can also be used to define
and simulate a new task (Figure 4-3b).
4.3. Case studies

The following three case studies demonstrate the use of the model in three different
contexts. The first one demonstrates the working principle of the model by modelling
76
a simple but non trivial task. The second case study shows the use of the model to
simulate an assistive interaction technique. In this case study I have mainly
demonstrated the probabilistic mapping between the users’ space and device space.
The third case study demonstrates the simulation of users’ behaviour for a new
interface and highlights the learning capability of the model.
Figure 4-2a. Interface to design user space
77
Figure 4-2b. Interface to design device space
78
Figure 4-2c. Interface to map between user and device space
79
Figure 4-2d. Interface to map by joint probability matrix
Figure 4-3a. Interface to run the model
80
Figure 4-3b. Interfaces to define a new task
4.3.1. Study 1- Modelling simple icon manipulation operations
Initially I developed a cognitive model for simple icon manipulation operations (such
as opening, copying, cutting, deleting a folder or a shortcut). These icon manipulation
operations can be done in more than one way- either using the keyboard shortcuts or
by using the popup menu after right clicking on the icon. Following my cognitive
model, the state space diagrams for these icon manipulation operations are shown in
Figure 4-4.
Table 4-1 shows the output from the model for opening a folder or an application
through a shortcut. The model can be configured to do the operation in either way or
by randomly selecting one of the ways. While it uses the popup menu for the first
time, it learns the new states and operations and updates the user space accordingly.
81
Figure 4-4. User and Device spaces for icon manipulation operation
Table 4-1. Output from the model for icon manipulation operation
Model No.: 1
Task Name: Openning Application
Learning Rate: 10
Device Space User Space
Method 1: Opening by double click

State Ready Ready
Action DoubleClick Open
State Opened Opened
Method 2: Opening by right click

Before Learning
State Ready Ready
Action RightClick Open
New State
PopUpMenu
Learned
New Action LeftClick
Learned
State Opened Opened
After Learning
State Ready Ready
Action RightClick Open
State PopUpMenu PopUpMenu
Action LeftClick LeftClick
State Opened Opened
82
In GOMS analyses, the selection rules are often ignored. This small demonstration
shows how we can incorporate selection rules into a cognitive model. My model also
permits to set a priority order among different methods of undertaking a task. Besides
that, this demonstration also shows how it can learn new operations, which is not
possible in GOMS models. Moreover, we do not need to write a set of detailed
procedural rules to accomplish these, as we would in a cognitive architecture.
4.3.2. Study 2- A cognitive model for eight-directional scanning
Many physically challenged users cannot interact with a computer through a

conventional keyboard and mouse. For example, Spasticity, Amyotrophic Lateral
Sclerosis, and Cerebral Palsy confine movement to a very small part of the body.
People with these disorders may interact with a computer through one or two switches
with the help of a scanning mechanism. Scanning is the technique of successively
highlighting items on a computer screen and pressing a switch when the desired item
is highlighted. Two types of scanning mechanisms are commonly used for navigating
through a graphical user interface.
Cartesian scanning moves the cursor progressively in a direction parallel to

the edges of the screen,
Polar scanning selects a direction and then moves along a fixed bearing.
A particular type of Polar scanning (referred as eight-directional scanning system)

that allows movement only in eight directions is commonly used [O’Neill, Roast and
Hawley, 2000; Steriadis and Constantnou, 2002 and Ntoa and colleagues, 2004].
There is little previous work [Damper, 1986] on user models for scanning interfaces
that can predict task completion time or interaction patterns. I have developed and
evaluated a cognitive model for eight-directional scanning system. I have simulated
lack of skill and lack of physical capability separately, so I validated the model with
83
able-bodied users, which ensures that the sub-optimality resulted only due to lack of
skill rather than physical impairment.
The model
In eight-directional Scanning technique the pointer icon is changed at regular time
intervals to show one of eight directions (Up, Up-Left, Left, Left-Down, Down,
Down-Right, Right, Right-Up). The user can choose a direction by pressing the
switch when the pointer icon shows the required direction. After getting the direction
choice, the pointer starts moving. When the pointer reaches the desired point in the
screen, the user has to make another key press to stop the pointer movement and make
a click. A state chart diagram of the scanning system is shown in Figure 4-5, which is
same for user and device spaces in this case. A demonstration of the scanning system
can be seen at http://www.youtube.com/watch? v=0eSyyXeBoXQand feature=user.
Figure 4-5. State Transition Diagram of the eight-directional scanning mechanism

with a single switch
As mentioned previously, I have developed the cognitive model by separately

modelling the optimal (expert) and sub-optimal (non expert) behaviour of users.
84
Modelling optimal behaviour

The task of clicking on a target in an eight-directional scanning system can be broken
down to the following three tasks according to the state chart diagram shown in
Figure 4-5.
1. Select Direction and start Moving
2. Stop Moving
3. Select Mouse Action or request further Movement
The task hierarchy using a CPM-GOMS model of a sample session is shown in Figure
4-6. The model can determine the optimal direction for movement from the source
and target coordinates. If a horizontal or vertical line from the source can reach the
target, then one of the left, right, up or down directions is chosen. Otherwise, the
pointer is moved diagonally until it reaches the same horizontal or vertical line as the
target. The eight-directional scanning system takes the scan delay and scan step as
input. The scan delay is the time interval between any two state changes of the system
while the scan step is the distance crossed by the cursor during an interval equal to the
scan delay. I set the default values of the two parameters at 1 sec and 10 pixels
respectively.
Modelling sub-optimal behaviour
In the eight-directional scanning system, I have found users behave sub-optimally for
the following three reasons:
• They do not always choose the best direction of movement.
• They do not stop the pointer movement at the correct place.
• They try to place the cursor exactly over the centre of the target before
clicking.
85
Figure 4-6. CPM-GOMS analyses for clicking on a target using eight-directional

scanning
In terms of my model, it means that the user space differs from the device space for
the ‘Start Moving’ and ‘Stop Moving’ operations (refer Figure 4-5). I have modelled
this difference by a rule-based system developed in CLIPS [2007]. The rule based
system models the uncertainty in choosing a direction of movement and the stopping
position of the pointer. The rules for choosing direction take the difference in target
86
coordinates and current coordinates as input and give the probabilities of different
direction choices as output. The eight-directional scanning system shows the direction
choices in a particular sequence. The general structure of a rule to select a direction is
as follows:
Compare the difference in X and Y coordinates
Choose optimum direction choice with probability p1
Choose direction choice that comes after the optimum direction choice with
probability p2
Choose direction choice that comes before the optimum direction choice with
probability p3
Where p1>p2>p3 for UpLeft, UpRight, DownLeft and DownRight
And p1>>>p2>p3 for Left, Right, Down and Up, since novice users prefer Manhattan
direction choices than diagonal
The values of p1, p2, and p3 are chosen to reduce the error in prediction. However the
parameters were kept the same for all participants to generalize the model. I have
found that users show more sub-optimal behaviour as the separation between source
and targets increases. So the probability of correct direction choice is made inversely
proportional to the separation between the source and the target. Since CLIPS fires
rules concurrently, a direction choice may appear with two different probability
values. In that case I consider the average probability.
I have found that users stop the pointer movement almost optimally when the source
to target distance is small (less than 700 pixels for a 1280×800 pixel resolution
screen) or when the distance is very large (more than 1300 pixels). In other situations,
users often stop the pointer movement before or after the optimum time instant. When
87
the pointer is close to the target, users also frequently failed to stop the pointer at the
optimum point. So the rules take the source to target distance as input and give the
probability of deviation of input from optimum position as output. The general
structure of rules for stopping condition is as follows
Consider the distance from the target
deviation_of_input is 0 with probability p1
deviation_of_input is -1 with probability p2
deviation_of_input is -2 with probability p4
where p1>>p2,p3>p4,p5
In the fragment above, deviation_of_input 1 means the pointer is stopped after going
one step further from the optimum stopping position, similarly deviation_of_input –1
means the pointer is stopped one step before the optimum stopping position.
Validation of the model

The model was validated with able-bodied users, who had not used this scanning
system previously. In this case the participants behaved sub-optimally because they
were novices.
Process
In this experiment, the participants were instructed to select buttons which were
randomly placed on a screen. The buttons had to be pressed in a particular sequence,
chosen randomly for each execution of the experiment. The random arrangement of
88
buttons ensured that the experimental set up was not biased towards any particular
screen layout or navigation pattern. All of the buttons were colored red except the
next target, which was green. After each button press, the last pressed button was
disabled to show that it was no longer a target. The buttons were also labeled with a
serial number to indicate the sequence. The actual task to be performed by the
scanning system was kept very simple so that it would not impose any cognitive load
on users. Hence any sub-optimal behaviour occurred only because of the scanning
technique itself.
Material
The experiment was carried out on a Laptop with an LCD screen of resolution
1280×800 pixels using the Windows XP operating system. A single keyboard switch
was used to control the scanning techniques. The scan delay was set at 1000 ms. The
dimension of the buttons was 25×40 pixels and kept constant throughout the
experiment.
Participants
Eight able-bodied participants undertook the experiment. The able-bodied participants
were undergraduate and graduate students of my institute. None of them had any
colour blindness. Six participants were male and two were female. Their age ranged
from 23 to 35 years.
Results
The actual and predicted task completion times are shown in Table 4-2 and Figure 4-
7. The predictions are obtained by running Monte-Carlo simulation. It can be seen
from Figure 4-7 that, with two exceptions, the model can predict task completion time
Predicted − Actual
with an overall relative error Actual within ±7%. I also do not find any
89
statistically significant difference between actual and predicted task completion time
(t(8) = 0.31, p > 0.05 for a two tailed paired t-test).
Table 4-2. Actual and Predicted Task Completion Time (in sec)
Relative
Participants Actual Predicted
Difference
P1 364 384 5.5%
P2 391 506 29.4%
P3 386 370 4.2%
P4 367 457 24.5%
P5 314 335 6.7%
P6 303 303 0.0%
P7 299 312 4.4%
P8 473 474 0.2%
Figure 4-7. Comparing Actual and Predicted Task Completion Time
90
4.3.3. Study 3- Modelling interaction for a novel interface
I have modelled an application for sending emails using my cognitive modelling

technique. I developed a very simple interface (Figure 4-8) for sending and receiving
emails. The interface did not impose or indicate any particular order of operations and
allowed the user to do any operation at any time. So it helped us to observe the natural
interaction patterns of users while sending or receiving emails. The device model was
developed from the interface itself. The state transition diagram of the device space is
shown in Figure 4-9. I developed the user space by collecting interaction patterns
from 5 participants on the interface. The participants were expert computer users but
none had used the interface before. They were aged between 25 to 35 years. The state
transition diagram of the user space is shown in Figure 4-10. The mapping between
the user space and the device space is presented in Table 4-3. I ran the model for two
iterations to simulate the task of sending an email using this particular interface. The
output of the model is shown in Table 4-4.
Figure 4-8. An interface to send and receive emails
In this particular example, the difference between the user space and the device space
lies in the interpretation of the ‘Send Mail’ operator. After clicking on the ‘Send Mail’
91
button, users expected that they would automatically be asked to specify a recipient,
which was not supported by the interface. As a result while executing the task for the
first time, users encountered the error message and learned the operation ‘Give
Recipient’. After specifying the recipient, the user wanted to confirm the sending
operation. The ‘ConfirmSending’ action did not have any matching operation in the
device space. At this stage the model applied the label matching principle[Rieman and
Young, 1996], which successfully returned the ‘Send Mail’ operation in the device
space. At the next iteration, the model performed the task optimally by using its
learned knowledge.
Table 4-3. Mapping between the user space and device space
User Space Device Space

States
Ready to write mail Welcome Screen
Letter without recipient Notepad without recipient
Specify recipient Specify recipient
Letter with recipient Notepad with recipient
Confirmation Message Confirmation Message
Actions
Write Mail Write Mail
Send Mail Send Mail
Confirm Recipient(s) Confirm Recipient(s)
92
Figure 4-9. State transition diagram of the device space
Figure 4-10. State transition diagram of the user space
93
Table 4-4. Output of the cognitive model
Device Space User Space

Iteration 1
State Welcome Screen Ready to write mail
Action Write Mail WriteMail
State Notepad without Letter without
recipient recipient
Action SendMail SendMail
State ErrorMsg
New Action GiveRecipients
Learned
State Specify recipient Specify recipient
Action ConfirmRecipient ConfirmRecipient
State Notepad with recipient Letter with recipient
New Action SendMail
Learned
Action SendMail Confirm Sending
State Confirmation Confirmation
Iteration 2
State Welcome Screen Ready to write mail
Action Write Mail WriteMail
State Notepad without Letter without
recipient recipient
Action GiveRecipients GiveRecipients
State Recipient Recipient
Action ConfirmRecipient ConfirmRecipient
State Notepad with recipient Letter with recipient
Action SendMail Confirm Sending
State Confirmation Confirmation
94
4.4. Conclusion
Cognition covers a wide range of topics and it is almost impossible to develop a
single model to simulate all aspects of cognition. Even the cognitive architectures fail
to model high level cognitive functions like affective state, consciousness and so on
My cognitive model intends to simulate human computer interaction only. However it
is not as primitive as GOMS family of models and can simulate the performance of
novice users. In contrast to the cognitive architectures, the model does not need
detailed knowledge of psychology to operate. It has graphical user interfaces to
provide input parameters and showing output of simulation.
I have not considered simulating the effect of cognitive impairment. However the
model can be extended to simulate a few types of impairment like lack of short-term
memory (as in dyslexia), inadequacy in planning which is apparent in some cases of
autism according to the executive dysfunction hypothesis [Burack and collegues,
2001]. For example, lack of short-term memory can be modelled by limiting the
number of states in the user space. The number of states will depict the maximum
amount of information that a user can keep in mind at one time. The state transition
matrix in the user space can also be calibrated to simulate problems in planning.
I have addressed the issue of lack of detail in the model in comparison to the
cognitive architectures by separately modelling perception and motor action. I have
already presented the perception model in the previous chapter and described the
motor behaviour model in the next chapter.
95
Chapter 5 Motor behaviour model
Movement is used in some way, to some degree, in every task accomplished by human beings.
Every individual needs to understand human movement so that any task -- light or heavy, fine
or gross, fast or slow, of long or short duration, whether it involves everyday living skills,
work skills or recreation skills - can be approached effectively.
- Marion R. Broer, from her book "Efficiency of Human Movement", Saunders, 1966
5.1. Introduction
Pointing tasks form a significant part of human computer interaction (HCI) in graphi-
cal user interfaces. Fitts’ Law [Fitts, 1954] and its variations [Mackenzie, 2003] are
widely used to model pointing as a sequence of rapid aiming movements, especially
for able-bodied users. Fitts’ Law [Fitts, 1954] predicts the movement time as a func-
tion of the width and distance to the target. This law is found to be very robust and
works in many different situations including space and under water. However the ap-
plication of Fitts’ Law for people with motor impairment is debatable. Motor im-
paired users only conform to Fitts’ Law when the task is very simple and thus requires
less coordination between vision and motor action [Smits-Engelsman, 2007] or there
are other sensory cues besides vision [Gajos, Wobbrock and Weld, 2007].
I have developed a statistical model to predict the movement time of pointing tasks
performed by people with motor impairment. Prediction from my model has signifi-
cantly correlated with the actual pointing time for different data sets. As part of the
model, I have also developed a new scale of characterizing the extent of disability of
users by measuring their grip strength. Finally I have found that hand strength also
affects performance of able-bodied users and the Index of Performance in a Fitts’ Law
task significantly correlates with grip and tip pinch strength.
5.2. Related work

For disabled users, there is growing evidence that their interaction patterns are signifi-
cantly different from those of their able-bodied counterparts [Trewin and Pain, 1999;
Keates, Clarkson and Robinson, 2000; Keates and Trewin, 2005; Keates, Trewin and
Paradise, 2005]. However the applicability of Fitts’ Law for motor impaired users is a
debatable issue. Smits-Engelsman [2007] and Gajos [2007] found it to be applicable
for children with congential spastic hemiplegia and motor impaired people respec-
tively, but Bravo [1993] and Gump [2002] obtained different result. Fitts’ Law as-
sumes that people perform rapid movements through visual feedback. However, mo-
tor impaired persons can not always follow visual feedback during real life pointing
tasks. Their movements seem to be more rapid and discrete [Gump, 2002]. This may
be a result of their poor coordination between perception and motor action. This poor
coordination causes more neuro-motor noise than the permissible limit of Fitts’ Law
[McCrea and Eng, 2005]. They obey Fitts’ Law when the task is very simple and thus
requires less coordination between vision and motor action [Smits-Engelsman, 2007]
or there are other cues (such as auditory) besides vision [Gajos, Wobbrock and Weld,
2007].
There are a few works to develop an alternative to Fitts’ Law for motor impaired peo-
ple. Gump and colleagues [2002] found significant correlation between the movement
time and the root of movement amplitude (Ballistic Movement Factor [Gan and
Hoffmann, 1988]). Gajos, Wobbrock and Weld [2007] estimated the movement time
by selecting a set of features from a pool of seven functions of movement amplitude
and target width, and then using the selected features in a linear regression model.
This model shows interesting characteristics of movement patterns among different
users but fails to develop a single model for all. Movement patterns of different users
are found to be inclined to different functions of distance and width of targets.
97
5.3. Design
Able-bodied users move the mouse pointer towards a target by a single long sub
movement followed by some smaller sub movements to home on the target. In the
original formulation of Fitts’ Law [Fitts, 1954], it was assumed that a rapid aiming
movement consists of two phases:
• An initial ballistic phase, which brings one near the target.

• A homing phase, which is one or more precise sub movements to home on
the target.
However, this assumption does not hold for motor impaired users. It is due to the fact
that their movement is disturbed by many pauses and they rarely make a big move-
ment towards the target. The main difference between the mouse movement of the
motor impaired and able-bodied users lie in the characteristics of the sub movements
[Trewin and Pain, 1999]. The number of sub movements for motor impaired users is
greater than that of able-bodied users and the main movement towards the target is
often composed of two or more sub movements. The time spent between two sub
movements (described as pause) also significantly affects the total task completion
time. So my model estimates the total task completion time by calculating the average
number of sub movements in a single pointing task, their average duration, and the
average duration of pauses. In the present study, I define a pause as the event when
the mouse stops movement for more than 100 msec. and a sub movement is defined as
a movement occurring between two pauses (Figure 5-1).
To reveal the characteristics of the sub movements and the pauses, I have clustered
the points where the pauses occurred (i.e. a new sub movement started). I have evalu-
ated the optimum number of clusters by using Classification Entropy [Ross, 1997] as
98
a cluster validation index. The optimum number of clusters is three. I have found that
about 90% of the sub movements took place when the mouse pointer is
o near the source such that the pointer has not moved more than 20% of the total
distance or,
o near the target such that the pointer has moved more than 85% of the total dis-
tance.
The remaining 10% of the sub movements actually constitutes the main movement.
The positions of the cluster centres indicate three phases of movement
• Starting Phase: This phase consists of small sub movements near the
source, perhaps while the user gets control of the pointing device.
• Middle Phase: This consists of relatively large sub movements which bring
the pointer near the target.
• Homing Phase: This is similar to the homing phase in Fitts’ Law, though
the number of sub movements is greater.
So my model divided the sub movements and pauses during a pointing task into three
classes based on their position with respect to the source and the target as shown in
Figure 5-1 (the thick blue line depicts a sample cursor trace between a source and a
target). The movement time is estimated as:
p1 (d 1 + s1 ) + p 2 ⋅ d 2 + f (Dist / v 2 ) + p 3 (d 3 + s 3 ) − (s1 + s 3 )
Where,
Dist Distance from source to target
p1 Number of pauses near source
d1 Average duration of a pause near source
99
s1 Average duration of a sub movement near source

p2 Number of pauses in main movement
d2 Average duration of a pause in main movement
v2 Speed of movement in main movement
Fraction of the total distance covered by the main
f movement
p3 Number of pauses near target
d 3 Average duration of a pause near target
s 3 Average duration of a sub movement near target
Source
Target
Initial phase
Main movement
Homing phase
Structure of a sub-movement
Pause Sub Movement Pause
Figure 5-1. Different phases of movement
100
5.3.1. User interfaces of the model
The motor behaviour model takes the position and size of the target as input (Figure
5-2). It also considers the extent of disability of a user. In later sections of the chapter
I have explained the techniques used to measure the extent of disability.
Figure 5-3 shows an example of the output from the model. The thin purple line
shows a sample trajectory of mouse movement of a motor impaired user. It can be
seen that the trajectory contains random movements near the source and the target.
The thick red and black lines encircle the contour of these random movements. The
area under the contour has a high probability of missed clicks as the movement is ran-
dom there and thus lacks control. A good interface should not have more than one tar-
get in this contour and the contour should help to decide the amount of separation be-
tween icons.
Figure 5-2. Input to the motor behaviour model
101
Figure 5-3. An example of the output from the model
5.4. Pilot study

Initially, I have developed a model by statistical analysis of cursor traces of a previous
experiment [Trewin and Pain, 1999]. In that set of data, the experimenters categorized
the users in several ways based on their experience, difficulty in clicking, pointing,
dragging and so on. Among these I found that a scale based on difficulty in dragging,
is significantly correlated (p < 0.05) with three model parameters (Number of pauses
near source, Number of pauses near target [Figure 5-4] and average velocity of main
movement [Figure 5-5]). I drew histograms of other parameters (Figure 5-6) and then
they were approximated by the inverse transform method [Ross, 2002]. However in
developing the model I assumed a fixed boundary among the three regions (near
102
source, main movement, near target). I blurred these boundaries to make the model
more realistic. I calculated the probability of a pause from the function shown in Fig-
ure 5-7. As can be seen from Figure 5-7, the probability of a pause gradually increases
to 1 near the source and the target. The pause duration is estimated by multiplying it
with the probability of occurrence of a pause.
Figure 5-4.Variation of number of pauses w.r.t. difficulty in dragging
103
Figure 5-5. Variation of velocity (pixel/msec) of main movement w.r.t. difficulty in

dragging
Figure 5-6. Histograms of model parameters
To estimate the accuracy of the model, I tested it on 62 pointing tasks undertaken by

15 participants. The predictions are obtained by running Monte-Carlo simulation. The
104
actual and predicted average task completion times and a Z-score distribution of the
actual and predictions are shown in Table 5-1 and Figure 5-9 respectively. Figure 5-8
presents a scatter diagram of actual and average predicted time. The median of the Z-
scores was at –0.27 instead of 0, however the predicted average task completion time
significantly correlates (p < 0.002) with the actual.
Pause Distribution with Disance
1
0.9
Probability of a Pause
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
0 20 40 60 80 100 120 140 160
Normalized Distance
Figure 5-7. Probability of occurrence of a pause
Table 5-1. Actual and Predicted Task Completion Time
Average Actual
Participants Predicted Time
Time (msec.) (msec.)
P1 3566 1880
P2 4138 2176
P3 3418 2400
P4 4018 2500
P5 3920 2907
105
Average Actual
Participants Predicted Time
Time (msec.) (msec.)
P7 14632 10309
P9 7389 2796
P11 687 1293
P12 14512 9349
P14 14974 22833
P15 4134 10478
P16 3584 1629
P17 7895 15888
P19 4018 2335
P20 3188 8771
r = 0.71, t(15) = 3.64, p < 0.01
Actual vs Predicted Task Completion Time
25000
20000
Actual Time
15000
10000
5000
0
0 2000 4000 6000 8000 10000 12000 14000 16000
Average Predicted Time
Figure 5-8. Scatter Diagram of Actual vs. Predicted Task Completion Time (in
msec.)
106
Figure 5-9. Prediction from my model for mouse interface
5.5. Confirmatory study

In the pilot study, I used a scale based on self reporting of dragging skill for character-
izing the degree of disability. However the choice of the scale was arbitrary and use of
the output from a likert scale in a regression analysis is also debatable among statisti-
cians. So I have tried to develop a scale to measure the extent of disability of a par-
ticipant.
Several clinical scales have been used to measure disability (e.g. Ashworth scale, the
Weighted disability score, Tardieu Scale, Spasticity Grading [Barnes and Johnson,
2001; Scholtes and colleagues, 2006]. etc.), but they are not really applicable for
modelling HCI. The clinical scales deal with a single disease and often are very sub-
jective (such as the Ashworth scale for Spasticity). The descriptions of disease of the
users are inadequate to calibrate a model numerically. I found in my previous dataset
[Trewin and Pain, 1999] that user survey about their skills is also not very accurate.
So I have developed a new scale by evaluating the hand strength of motor impaired
107
users and then correlating them with their HCI performance (such as the task comple-
tion time, number of pauses and so on). It has already been found that active range of
motion (ROM) of wrist can be significantly correlated with the movement time of a
Fitts’ Law task for children with spasticity [Smits-Engelsman and colleagues, 2007].
Hand evaluation devices are cheap, easy to operate and have good test-retest reliabil-
ity [Mathiowetz and colleagues, 1984]. So these are reliable and useful tools for
measuring physical strength making these results useful in practice.
5.5.1. Process
My study consisted of pointing tasks. A sample screenshot of the task is shown in
Figure 5-10. I followed the description of the multiple tapping tasks in ISO 9241 part
9. In this task the pointer was initially located at the middle of the screen. The partici-
pants had to move it towards a target (one of the red dots, appearing a light grey in
monochrome), and click on it. This process was repeated for all the targets. There
were eight targets on the screen and each participant performed the test twice except
one participant (P2), who retired after completing the first test. The distances to the
targets ranged from 200 to 600 pixels while target widths were randomly selected as
an integer between 16 and 48 pixels.
5.5.2. Material
I used a standard optical Mouse and an Acer Aspire 1640 Laptop with a 15.5” monitor
having 1280×800 pixel resolution. I also used the same seating arrangement (same
table height and distance from table) for all participants. I measured the following six
variables for hand strength evaluation (Figure 5-11). Each variable was measured
three times and the average value was considered. I evaluated only one hand (the
dominant hand) of the participants which they used to operate the mouse.
108
Figure 5-10. Screenshot of the experiment for mouse interface
Grip strength measures how much force a person can exert gripping with the hand. I
measured it using a mechanical dynamometer.
Tip pinch strength measures the maximum force generated by a person squeezing
something between the tips of his thumb and index finger. I measured it using a me-
chanical dynamometer.
Radial deviation is the motion that rotates the wrist away from the midline of the
body when the person is standing in the standard anatomical position [Kaplan, 2006].
When the hand is placed over a table with palm facing down, this motion rotates the
hand about the wrist towards the thumb. I measured the maximum radial deviation
using a goniometer.
109
Ulnar deviation is the motion that rotates the wrist towards the midline of the body
when the person is standing in the standard anatomical position. When the hand is
placed over a table with palm facing down, this motion rotates the hand about the
wrist towards the little finger. I measured it with the goniometer.
Pronation is the rotation of the forearm so that the palm moves from a facing up posi-
tion to a facing down position. I measured it using a wrist-inclinometer.
Supination is the opposite of pronation, the rotation of the forearm so that the palm
moves from a facing down position to a facing up position. I measured it with the
wrist-inclinometer.
Measuring Grip Strength Measuring Tip-pinch Strength
110
Standard anatomical position
Measuring ranges of motion
Measuring Radial Deviation
Range of Motion of wrist

(Palm facing down)
Measuring Ulnar Deviation
111
Pronation
Measuring Pronation
Supination
Measuring Supination
Figure 5-11. Measurement of hand strength
5.5.3. Participants
I initially collected data from 10 motor impaired and 6 able-bodied participants (Trial
1 in Table 5-2). The motor impaired participants were recruited from a local centre,
which works on treatment and rehabilitation of disabled people, and they volunteered
112
for the study. To generalize the study, I selected participants with both hypokinetic
and hyperkinetic movement disorders [Flowers, 1976]. Hypokinetic motor impair-
ment results in restricted movement of limbs, (e.g. participants P1, P3, P4 and so on)
while hyperkinetic refers to uncontrolled movement or tremor, (e.g. participants P5,
P6 and so on). All motor impaired participants used a computer at least once each
week. Able-bodied participants were students of my university and expert computer
users.
Age Gender Impairment Trials

participated
C1 30 M
C2 29 M
C3 28 M Able-bodied Trial 1
C4 25 M
C5 29 M
C6 27 F
P1 M Cerebral Palsy reduced manual dex-
30 Trial 1
terity wheel chair user.
Cerebral Palsy reduced manual dex-
P2 43 M terity also some tremor in hand Trial 1
wheel chair user.
P3 25-45 F One handed (dominant hand) the
Trial 1
other hand is paralyzed.
P4 30 M Dystonia cannot speak cannot move
Trial 1
fingers wheelchair user.
Left side (non dominant) paralysed
P5 62 M after a stroke in 1973 also has Trials 1 and 2
tremor
Cerebral attack significant tremor in
P6 44 M whole upper body part fingers al- Trial 1
ways remain folded.
P7 46 F Did not mention disease difficulty
Trial 1
in gripping things no tremor.
113
Age Gender Impairment Trials

participated
P8 >45 F Spina Bifida/ Hydrocephalus wheel-
Trials 1 and 2
chair user.
P9 43 F Did not mention disease restricted
Trials 1 and 2
hand movement no tremor.
P10 >45 M Cerebral Palsy from birth restricted
Trials 1 and 2
hand movement no tremor.
P11 46 M Multiple Sclerosis Trial 2
Cerebral Palsy, Hyper pressure,
P12 41 M Trial 2
nocturnal epilepsy
5.5.4. Results
I found that the movement time significantly correlates (ρ = 0.57, p < 0.001) with
the number of pauses in a pointing task. I also correlated the average number of
pauses per pointing task with the hand strength metrics. Figures 5-12 to 5-15 show the
graphs of number of pauses with respect to Grip Strength, active ROM of Wrist (Ul-
nar + Radial Deviation) and active ROM of Forearm (Pronation + Supination) respec-
tively. I found that some users did not have any range of motion in their wrist, though
they managed to move the mouse to perform the pointing tasks correctly. I also found
that the natural logarithm of grip strength (Figure 5-13) significantly correlates with
the mean (ρ = -0.72, p < 0.001) and standard deviation (ρ = -0.53, p < 0.05) of the
number of pauses per pointing task. I did not find any correlation between that
movement time and the distance, width or Fitts’ Law index of difficulty (ID) [Fitts,
1954] of the targets for motor impaired users. This may be due to the presence of
physical impairment and the small number of pointing tasks (only 16) performed by
the participants. I also did not find any significant correlations involving ranges of
motion (Figures 5-14 and 5-15).
I divided the whole movement path into three phases and observed how the hand
strength affects in the initial, main movement and homing phases. I found that grip
114
strength significantly correlates with the average number of pauses near the source
(Figure 5-16, ρ = -0.61, p < 0.01) and near the target (ρ = -0.78, p < 0.001). I also
found that the mean and standard deviation of the velocity of movement were signifi-
cantly correlated with grip strength (Figure 5-17, ρ = 0.82, p < 0.001 for mean and
ρ = 0.81, p < 0.001 for standard deviation).
Number of Pauses vs. Grip Strength
18
16
14
Number of Pauses
12
10 Motor-impaired
8 Able-bodied
6
4
2
0
0 10 20 30 40 50 60 70
Grip Strength (in Kg)
Figure 5-12. Average number of Pauses per pointing task vs. Grip Strength
Number of Pauses vs. Log of Grip Strength
18
16
14
No. of Pauses
12
10
8
6
4
2
0
0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2
Log of Grip Strength (in Kg)
Figure 5-13. Average number of Pauses per pointing task vs. Log of Grip Strength
115
Number of Pauses vs. ROM Wrist
18
16
14
Number of Pauses
12
10 Motor-impaired
8 Able-bodied
6
4
2
0
0 20 40 60 80 100 120
ROM Wrist (in degree)
Figure 5-14. Average number of Pauses per pointing task vs. Active range of ROM of
Wrist
Number of Pauses vs. ROM ForeArm
18
16
14
Number of Pauses
12
10 Motor-impaired
8 Able-bodied
6
4
2
0
0 50 100 150 200 250
ROM ForeArm (in degree)
Figure 5-15. Average number of Pauses per pointing task vs. Active range of ROM of
Forearm
116
Number of Sub Movements vs. Grip Strength
10
Number of Sub Movements
9
8
7
6 SMNS
5 SMIM
4 SMNE
3
2
1
0
0 10 20 30 40 50 60 70
Figure 5-16. Average number of Pauses per pointing task in different phases of
movement vs. Grip Strength (SMNS: Sub Movement Near Source, SMIM: Sub
Movement in Middle SMNE: Sub Movement Near End)
Velocity vs Grip Strength
0.4 Mean
0.35 Velocity
Velocity (in pixel/msec)
0.3
Stddev
0.25 Velocity
0.2
Linear
0.15 (Stddev
0.1 Velocity)
Linear
0.05 (Mean
0 Velocity)
0 10 20 30 40 50 60 70
Figure 5-17. Velocity of Movement vs. Grip Strength
117
5.5.5. Calibration
I revised my model in the light of these results. Grip strength is used to predict the
number of pauses near the source and destination, and also to predict the speed of
movement. Probability distributions for the other factors were derived using the in-
verse transform method [Ross, 2002]. The model works based on following equations.
(
p1 = α + β × log(S ) + 0.5 × ρ × χ × e (δ ×S ) )
Where
α= 3.95
β = -0.84
χ= 2.29
δ = -0.02
ρ = a random value from a normal distribution with mean 0 and standard
deviation 1
S = Grip strength in kg
p3 = α + β × log(S ) + 0.5 × ρ × (χ + δ × log(S ))

Where
α = 11.06
β = -2.50
χ= 5.73
δ = -1.23
ρ = a random value from a normal distribution with mean 0
and standard deviation 1
118
d1 = α × e β × ( χ + δ × µ )
Where
α = 3997279
β = -0.16
χ = 140
δ = 100
µ = a random value from a uniform distribution between 0 and 1
d 2 = α × e β ×( χ +δ × µ )
Where
α = 12956.60
β = -0.11
χ = 140
δ = 100
µ = a random value from a uniform distribution between 0 and 1
d 3 = α + β × log( S )
Where
α = 449.72
β = -70.78
v2 = α + β × log(S ) + 0.5 × ρ × ( χ + δ × log(S ))

Where
α = -0.12
β = 0.14
χ = 0.007
δ = 0.03
119
ρ = a random value from a normal distribution with mean 0 and standard

deviation 1
Dist
MT = ( p1 − 1) × d1 + α × + d 2 + ( p3 − 1) × d 3
v2
Where
α = 0.9
MT Movement Time
Dist Distance from source to target
p1 Number of pauses near source
d1 Average duration of a pause near source
d2 Average duration of a pause in main movement
v2 Speed of movement in main movement
p3 Number of pauses near target
d3 Average duration of a pause near target
5.5.6. Validation
I tested the performance of my model on 232 pointing tasks performed by 10 motor
impaired and 6 able-bodied participants. The predictions were obtained by simulating
each pointing task using Monte-Carlo simulation. Figures 5-18 and 5-19 show the
scatter plot and relative error in prediction respectively. I calculated the relative error
by using the following formula (Pr edicted− Actual) . In Figure 5-19, I superimposed a Gaus-
Actual
sian curve with same mean and standard deviation as the relative error.
In 10% of the cases the error was more than ±70%, so the model has failed for those
tasks. However, the predicted values are significantly correlated with actual values
(ρ = 0.65, p < 0.001) with error within ±40% over half of the trials. The average rela-
tive error is -2% with a standard deviation of 57%.
120
Scatter Plot
5.5
5
Log(Predicted Time)
4.5
3.5
2.5
2.5 3 3.5 4 4.5 5
Log(Actual Time )
Figure 5-18. Scatter plot of prediction
Error Plot
20
18
16
14
12
% Data
10
0
-120 -100 -80 -60 -40 -20 0 20 40 60 80 100 > 120
% Error
Figure 5-19. Percentage error of prediction
I further validated the model by taking data from six participants (Trial 2 in Table 5-
2). In this second trial, participants P5, P8, P9, P10 and two new participants took
121
part. As most participants felt fatigue quickly, I ran the trial for six minutes for each
participant. In total, they undertook 435 pointing tasks. Figures 5-20 and 5-21 show
the scatter plot and the distribution of relative error between actual and prediction re-
spectively. It can be seen the model did not work well for 10% of the tasks and the
relative error is greater than ±70%. For the remaining 90% of the trial, the actual is
significantly correlated with prediction (ρ = 0.56, p < 0.001). The relative error is
within ±30% for 60% of the trials (262 out of 435 pointing tasks). The average rela-
tive error is -16 % with a standard deviation of 34%. I also calculated the correlation
for each participant (Table 5-3). It can be seen that the prediction is significantly cor-
related (p < 0.01) with actual for five out of six participants.
Scatter Plot
6000
5000
4000
3000
2000
1000
0
0 1000 2000 3000 4000 5000 6000
Figure 5-20. Scatter plot between actual and predicted task completion times
122
Error Plot
30
25
20
% Data
15
10
0
-120 -100 -80 -60 -40 -20 0 20 40 60 80 100 120
% Error
Figure 5-21. Error Plot
Table 5-3. Correlation coefficients for each participant
Correlation
Participants
Coefficients
P5 0.41*
P8 0.44*
P9 0.61*
P10 0.55*
P11 0.30
P12 0.46*
* p < 0.01
5.6. Affect of hand strength for able-bodied users

After developing the model for motor impaired users, I investigated how hand
strength affects performance of able-bodied users. Fitts’ Law provides a robust and
123
accurate model for rapid aiming movements of able-bodied users. So I conducted a 2-

dimensional Fitts’ Law task.
5.6.1. Process
In the Fitts’ Law task, I used 26 different combinations of target amplitude (A, ranged
from 30 to 700 pixels) and target width (W, ranged from 16 to 48 pixels). The result-
ing index of difficulty (ID) ranged from 2 to 5. Each of the participants performed 450
pointing tasks.
5.6.2. Material
I used a standard optical Mouse and an Acer Aspire 1640 Laptop with 15.5” monitor
having 1280×800 pixel resolution. I also used the same seating arrangement for all
participants. I measured the same six variables for hand strength evaluation as in the
previous study.
5.6.3. Participants
I collected data from 14 able-bodied users (9 male, 5 female, and age range 22 to 50
with average age of 29.3). All participants were expert computer users.
5.6.4. Results
The correlation coefficients between index of difficulty ( ID ) and movement time
ranges from 0.73 to 0.95 with an average value of 0.85, which conforms to Fitts’ Law.
I compared the hand evaluation metrics with the Fitts’ Law coefficients (a and b
IDAverage
 A 
where, MT = a + b log 2  + 1 and Index of Performance (IP = MTAverage ). I found that
W 
IP is significantly correlated with the grip strength and tip pinch strength (ρ = 0.57,
p < 0.05 for grip strength, ρ = 0.72, p < 0.005 for tip pinch strength, Figures 5-22 and
5-23 respectively). The parameter b significantly correlates with tip pinch strength (ρ
124
= 0.65, p < 0.01, Figure 5-24). I did not find any other significant correlation between
IP, a and b and any other hand evaluation metrics.
IP vs. Grip Strength
4
Index of Performance (bits/sec)
3.8
3.6
3.4
3.2
3
2.8
2.6
2.4
2.2
2
10 15 20 25 30 35 40 45 50 55 60
Grip Strngth (in Kg)
Figure 5-22. Index of Performance vs. Grip Strength
IP vs. Tip Pinch Strength
4
Index of Performance (bits/sec)
3.8
3.6
3.4
3.2
3
2.8
2.6
2.4
2.2
2
1 2 3 4 5 6 7 8 9
Tip Pinch Strength (in Kg)
Figure 5-23. Index of Performance vs. Tip Pinch Strength
125
b vs. Tip Pinch Strength
360
320
280
Parameter b
240
200
160
120
80
40
1 2 3 4 5 6 7 8 9
Tip Pinch Strength (in Kg)
Figure 5-24. Parameter b vs. Tip Pinch Strength
5.7. Discussion
For able-bodied users, pointing performance is generally analysed in terms of Fitts’
Law. Fitts’ Law can be applied to rapid aiming movements in many different con-
texts, but a proper explanation of this law is still unclear. Crossman and Goodeve pio-
neered an early but limited mathematical explanation [Rosenbaum, 1991]. They as-
sumed that a movement consists of several sub movements. Each sub movement takes
a constant time to execute and crosses a constant fraction of the total distance. The
last sub movement brings the pointer within the target. However many bio-
mechanical experiments failed to find existence of sub movements in all movements
as predicted by the Crossman-Goodeve model [Langolf and colleagues, 1976].
Schmidt and colleagues [1979] rejected the idea of existence of any sub movements
and they explained the speed-accuracy trade-off with the help of the initial impulse
generated by the muscles. Later Meyer and colleagues [1988] combined the ideas of
126
both sub movement and initial impulse. They formulated a generalized model of rapid
aiming movements in which Fitts’ Law appears as a special case when the number of
sub movements tends to infinity. However alternative explanations of rapid aiming
movements are also available (such as Polit and Bizzi’s [1978] Mass Spring model).
Fitts’ Law does not account for the users’ physical abilities in predicting movement
time. This seems reasonable for able-bodied users, but may not be correct for users
with disabilities. My analysis indicates that people having higher hand strength also
have greater control in hand movement and can perform pointing faster. The positive
correlation between the velocity of movement and grip strength also supports this
claim. As motor impairment reduces the strength of hands, motor impaired people
lose control of hand movement. So the number of pauses near the source and the tar-
get are significantly affected by the grip strength. The relation between grip strength
and number of pauses indicates that there is a minimum amount of grip strength
(about 20 kg) required to move the mouse without pausing more than twice. This
threshold of 20 kg can be used to determine the type of input device suitable for a
user, along with other factors like preference, expertise and so on. My analysis also
showed that flexibility of motion (as measured by range of motion of wrist or fore-
arm) is not as important as strength of hand (as measured by grip strength).
I found that hand strength affects pointing performance of able-bodied users, too. The
positive correlation between index of performance and hand strength shows people
with greater hand strength perform pointing faster. The correlation between the con-
stant term b and tip pinch strength indicates a difference in movement patterns among
people with different hand strengths. As the constant b indicates the effect of index of
difficulty (ID) on the movement time, perhaps the movement pattern of people with
higher hand strength mainly consists of an initial ballistic phase and does not have a
long homing phase since time to complete the homing phase should depend more on
127
the target characteristics. The opposite holds true for people with less hand strength.
Since the homing phase requires more control in hand movement, the negative corre-
lation between b and hand strength also indicates people having higher hand strength
also have greater control in hand movement.
My model predicts pointing time by separately working on each individual phase of

movement and it also incorporates personal characteristics of users. In particular, I
predicted the number of pauses near source and target and the velocity of movement
based on the extent of disability of the user. The model has accurately predicted the
task completion time for pointing tasks undertaken by motor impaired users. The
model can be used to predict task completion time for different interfaces and thus
optimize interface layout based on minimum task completion time. It can also be used
to generate design guidelines regarding minimum target separation, target size and so
on for different degrees of motor impairment.
However, the model did not work well for about 10% of pointing tasks. This 10%
data shifted the average relative error from zero and also increased the standard devia-
tion of relative errors. This failure can be attributed to various characteristics of users
like effects of learning and fatigue, interest, expertise and so on In future I plan to in-
corporate more input parameters into the model. I would also like to extend the scope
of the model beyond pointing with a mouse. I shall investigate different modalities of
interaction like finger or stylus based input [Hoffmann and Sheikh, 1991; Holzinger,
2003; Holzinger and colleagues, 2008] and effects of situational impairments in inter-
action [Schedlbauer and Heines, 2007; Schedlbauer, Pastel and Heines, 2008], which
will make the model useful for designing ubiquitous interfaces, too.
128
5.8. Conclusions
I have developed a motor behaviour model for those motor impaired people, who can
use their hands to interact with a computer. My statistical model has accurately pre-
dicted the task completion time for pointing tasks for different data sets. As part of the
model, I have also developed a new scale of characterizing the extent of disability of
users by measuring their grip strength. Finally I have showed that hand strength also
affects the performance of able-bodied users and the Index of Performance in a Fitts’
Law task significantly correlates with the grip and tip pinch strength. The model can
be used to optimize interface layout for motor impaired users based on completion
time of representative tasks.
129
Chapter 6 Applications
It doesn't matter how beautiful your theory is, it doesn't matter how smart you are. If it doesn't
agree with experiment, it's wrong.
- Richard P. Feynman, unsourced quotation
6.1. Introduction
The previous three chapters have presented the design, calibration and validation of
the individual models of the simulator. This chapter demonstrates the use of the
simulator as a whole. The first application uses the simulator to model an icon
searching task. The study validates the models for an externally valid task and also
demonstrates the application of the simulator to compare different interface layouts
for people with a wide range of abilities. The second application is the development of
a new scanning system by clustering screen objects. I have evaluated the scanning
technique against others by using the simulator and later confirmed the result by a
controlled experiment.
6.2. Working principle of the simulator
The simulator works in the following three steps.
1. While a task is undertaken by participants, a monitor program records the

interaction. This monitor program records
a. A list of key presses and mouse clicks (operations),
b. A sequence of bitmap images of the interfaces (low-level snapshot)
c. Locations of windows, icons, buttons and other controls in the screen
(high-level snapshot).
2. Initially, the cognitive model analyzes the task and produces a list of atomic
tasks.
3. If an atomic task involves perception, the perception model operates on the
event list and the sequence of bitmap images. Similarly, if an atomic task
involves movement, the motor behaviour model operates on the event list and
the high-level snapshot (Figure 6-1).
Application Model
Perception
Interface Model
Designer Low Level
Snapshot
Cognitive List of Mouse and

Task Operations
Model Kbd Events
Model
High Level Motor

Snapshot behaviour
Monitor
Model
Program
User Characteristics
Figure 6-1. Sequence of operations in the simulator
I have implemented the models in a modular fashion – all of the models can be run
independently of each other as well as together. The cognitive model takes a task
description from the task model and produces a list of low-level device operations.
131
The interface designer or participants have to execute these operations manually with
the monitor program running in background. An interface designer is free to use any
modules of the system, together or separately. For example, one can run a KLM
analysis on the output of the cognitive model instead of using my perception or motor
behaviour models. Similarly the monitor program can be run for any interaction that is
not produced by the cognitive model, and the perception and motor behaviour models
can be used on the output of the monitor program.
6.3. Icon searching task
In graphical user interfaces, searching and pointing constitute a significant portion of

human computer interaction. Users search for many different artifacts like information
in a web page, button with a particular caption in an application, email from a list of
mails etc. We can broadly classify searching in two categories.
Text searching includes any search which only involves searching for text and not
any other visual artifact. Examples include menu searching, keyword searching in a
document, mailbox searching and so on.
Icon Searching includes searching for a visual artifact (such as an icon or a button)
along with text search for its caption. The search is mainly guided by the visual
artifact and the text is generally used to confirm the target.
In this section, I present a study involving an icon searching task. I simulated the task
using the simulator and evaluated the predictive power of the model by comparing
actual task completion time with prediction in terms of correlation and percentage
error in prediction.
132
6.3.1. Process
I conducted trials with two families of icons. The first consisted of geometric shapes
with colours spanning a wide range of hues and luminance (Figure 6-2). The second
consisted of images from the system folder in Microsoft Windows to increase the
external validity (Figure 6-3) of the experiment. Each icon bears a caption underneath
(Figure 6-4). The first two letters and length of all the captions were kept nearly same
to avoid any pop out effect of the captions during visual search.
Figure 6-2. Corpus of Shapes
Figure 6-3. Corpus of Icons
The experiment was a mixed design with two measures and a between-subject factor.
The within-subject measures were spacing between icons and font size of captions. I
used the following three levels for each measure
o Spacing between icons

o Sparse: 180 pixels horizontally, 230 pixels vertically. This was the
maximum separation possible in the screen.
o Medium: 150 pixels horizontally, 200 pixels vertically.
133
o Dense: 120 pixels horizontally, 170 pixels vertically. This was the
minimum possible separation without overlapping the icons.
Figure 6-4. Sample screenshot of the study

(a. Dense Spacing Big Font, b. Medium Spacing Medium Font, c. Sparse Spacing
Medium Font, d. Dense Spacing Small Font)
o Font size
o Small: 10 point.
o Medium: 14 point as recommended by the RNIB [2006].
o Large: 20 point.
The between-subjects factor is
134
o Group
o Able bodied
o Visually impaired
o Motor impaired
The experimental task consisted of shape searching and icon searching tasks. The task
was as follows:
o A particular target (shape or icon with a caption) was shown.
o A set of 18 candidates for matching was shown.
o Participants were asked to click on the candidate, which was same as the target
both in terms of icon and caption.
The sequence of the trials was randomized using a Latin-square. Each participant
undertook 8 trials for each combination of the within-subject measures. Each
participant performed 72 searching and pointing tasks in total. They were trained for
the task before start of the actual trial. However one of the participants (P4) retired
after undertaking 40 trials.
6.3.2. Material
I used a 1280 × 800 LCD colour display driven by a 1.7 GHz Pentium 4 PC running
the Microsoft Windows XP operating system. I also used a standard computer Mouse
(Microsoft IntelliMouse® Optical Mouse) for clicking on the target.
6.3.3. Participants
I collected data from 2 able bodied, 2 visually impaired and 3 motor impaired
participants (Table 6-1). All were expert computer users and used computers more
than once a week.
135

C1 27 M
Able-bodied
C2 30 M

Hypokinetic motor impairment resulted
P3 30 M from Cerebral Palsy, restricted hand
movement, wheelchair user
Cerebral Palsy, restricted hand
P4 42 M movement, also suffering tremor in
hand, wheelchair user
Hyperkinetic motor impairment
P5 45 M resulted from stroke, significant tremor
in fingers, wheelchair user
6.3.4. Simulation
Initially I analyzed the task in light of my cognitive model. Since the users undertook
preliminary training, I considered them as expert users. I followed the GOMS analysis
technique and identified two sub-tasks
o Searching for the target.

o Pointing and clicking on the target.
The prediction is obtained by sequentially running the perception model and the
motor behaviour model. The predicted task completion time is the summation of the
visual search time (output by the perception model) and the pointing time (output by
the motor behaviour model).
136
6.3.5. Results
Figure 6-5 shows the correlation between actual and predicted task completion times.
I also calculated the relative error Predicted − Actual and show its distribution in Figure 6-
Actual
6. The superimposed curve shows a normal distribution with same mean and standard
deviation as the relative error. I found that the correlation is ρ = 0.7 (p < 0.001) and
56% of the trials have a relative error within ± 40%. The average relative error is
+16% with a standard deviation of 54%. The model did not work for 10% of the trials
and the relative error is more than 100% in those cases. For the remaining 90% of the
trials the average relative error is + 6% with a standard deviation of 42%.
I also analyzed the effects of font size and icon spacing on the task completion time
and investigated whether the prediction reflects these effects as well. So I conducted
two 3 × 3 ANOVA (Spacing × Font × Group) on the actual and predicted task
completion times respectively. I investigated both the within-subject effects and
results of a multivariate test. In the ANOVAs, I did not consider the trials for which
the relative error was more than 100% as the model did not work for those trials.
Participant P4 did not also complete the trial, leaving us with 40 rows of data (N =
40).
137
Scatter Plot
20000
Predicted task completion time (in msec)
15000
10000
5000
0
0 5000 10000 15000 20000
Actual task completion time (in msec)
Figure 6-5. Scatter plot between actual and predicted task completion time
Relative Error in Prediction
18
16
14
12
10
% Data
0
<-120 -120 -100 -80 -60 -40 -20 0 20 40 60 80 100 120
% Error
Figure 6-6. Relative error in prediction

138
For calculating the within-subject effects, the Greenhouse-Geisser correction was

used if the Mauchy’s test detected violation from sphericity assumption [Field, 2009]
giving fractional values for the degrees of freedom. In this study, the main effect of
Spacing did not violate sphericity assumption (W = .854, χ² = 5.69 in actual, W = .99,
χ² = 0.37 in prediction, p > 0.05), while the main effect of Font (W = .825, χ² = 6.93 in
actual, W = .836, χ² = 6.43 in prediction, p < 0.05) and the interaction effect of
Spacing and Font (W = .244, χ² = 49.94 in actual, W = .539, χ² = 21.91 in prediction, p
< 0.05) violated sphericity assumption. Tables 6-2 and 6-3 show results of the within-
subjects tests and multivariate tests on the actual and predicted task completion times
respectively. The tables list the degrees of freedom (df), F value and corresponding
significance for different measures. Table 6-2 shows that three sources have
significant effects on both actual and predicted task completion time. They are
o A main effect of Spacing (F(2, 74) = 5.44, p < 0.05) on actual task completion
time.
o A main effect of Spacing (F(2, 74) = 6.95, p < 0.05) in predicted task
completion time.
o An interaction effect of Spacing and Group (F(4, 74) = 3.15, p < 0.05) on
actual task completion time.
o An interaction effect of Spacing and Group (F(4, 74) = 4.64, p < 0.05) on
predicted task completion time.
o An interaction effect of Font and Group (F(3.4, 62.97) = 5.02, p < 0.05) on
o An interaction effect of Font and Group (F(3.44, 63.6) = 3.75, p < 0.05) on
The main effect of Font and interaction effects between Font and Group and Spacing,
Font and Spacing do not have significant effects on both actual and predicted task
139
completion times. I confirmed these effects through a multivariate test (Table 6-3),
which is not affected by the sphericity assumption. Table 6-3 shows the following
effects
o A main effect of Spacing (Wilks' λ = 0.762, F(2, 36) = 5.62, p < 0.05) on
o A main effect of Spacing (Wilks' λ = 0.741, F(2, 36) = 6.28, p < 0.05) in
o A main effect of Font (Wilks' λ = 0.817, F(2, 36) = 4.05, p < 0.05) in
o An interaction effect of Spacing and Group (Wilks' λ = 0.750, F(4, 72) = 2.78,
p < 0.05) on actual task completion time.
o An interaction effect of Spacing and Group (Wilks' λ = 0.671, F(4, 72) = 3.97,
p < 0.05) on predicted task completion time.
o An interaction effect of Font and Group (Wilks' λ = 0.545, F(4, 72) = 6.39,
p < 0.05) on actual task completion time.
o An interaction effect of Font and Group (Wilks' λ = 0.610, F(4, 72) = 5.05,
p < 0.05) on predicted task completion time.
It can be seen from Tables 6-2 and 6-3 that the prediction captures all effects at
99.95% confidence level in both within-subject test and multivariate test. Figures 6-7
and 6-8 show that the effect sizes (η2) are also fairly similar in the prediction as in the
actual. The maximum difference is below 10% in within-subject test and below 20%
in multivariate test. This suggests that the simulator successfully explained the
variance in task completion time for different factors. As these factors include both
interface parameters and physical characteristics of users, we can infer that the
simulator has successfully explained the effects of different interface layouts on task
completion time for people with visual and motor impairment.
140
Table 6-2. Test of within-subjects effects on task completion time
Source Actual Predicted

df F Sig. df F Sig.
Spacing 2.0 5.44 0.006 2.0 6.95 0.002
Spacing × Group 4.0 3.15 0.019 4.0 4.64 0.002
Error(Spacing) 74.0 74.0
Font 1.7 0.22 0.770 1.7 2.89 0.071
Font × Group 3.4 5.02 0.002 3.4 3.75 0.012
Error(Font) 63.0 63.6
Spacing × Font 2.3 1.03 0.370 3.3 1.54 0.204
Spacing × Font × Group 4.7 0.83 0.528 6.5 1.32 0.250
Error(Spacing × Font) 86.3 121.0
Table 6-3. Multivariate test on completion time
Effect Actual Predicted

df F Sig. df F Sig.
Spacing 2 5.62 0.008 2 6.28 0.005
Spacing × Group 4 2.78 0.033 4 3.97 0.006
Font 2 0.31 0.739 2 4.05 0.026
Font × Group 4 6.39 0 4 5.05 0.001
Spacing × Font 4 1.41 0.253 4 2.18 0.093
Spacing × Font × Group 8 2.15 0.043 8 1.74 0.106
141
Effect Size Comparison in Within-Subject Test
0.25
0.2
Eta Squared
0.15 Actual
0.1 Predicted
0.05
0
SPACING SPACING * FONTSIZE FONTSIZE SPACING * SPACING *
GROUP * GROUP FONTSIZE FONTSIZE
* GROUP
Measures
Figure 6-7. Effect size comparison in ANOVA
Effect Size Comparison in Multivariate Test
0.3
0.25
Eta Squared
0.2
Actual
0.15
Predicted
0.1
0.05
0
SPACING SPACING * FONTSIZE FONTSIZE SPACING * SPACING *
GROUP * GROUP FONTSIZE FONTSIZE
* GROUP
Measures
Figure 6-8. Effect size comparison in MANOVA
Figures 6-9 and 6-10 show the effects of font size and spacing for different user
groups. In Figures 6-9 and 6-10, the points depict the average task completion time
and the bars show the standard error at a 95% confidence level. It can be seen from
Figures 6-9 and 6-10 that the prediction is in line with the actual task completion
times for different font sizes and icon spacing.
142
However the prediction is less accurate in one of the nine conditions - the medium
font size and medium spacing for the motor impaired users. So I further analyzed
these two conditions (Figures 6-11 and 6-12). As in previous figures, Figures 6-11 and
6-12 depict the average task completion time and the bars show the standard error at
95% confidence level. Figure 6-11 shows the task completion time for different font
sizes while the spacing between the icons were medium. Figure 6-12 shows the task
completion time for different icon layouts while the font size of the icons was 14 pt
(medium). Figures 6-11 and 6-12 show that the standard error is estimated less in the
prediction than the actual, and in these cases the model fails to capture variability in
the task completion time. The model also underestimates the task completion times
for motor impaired users.
Effect of Font siz e
14000
Task completion time (in msec)
12000
10000
8000
Actual
6000
Predicted
4000
2000
0
m
m
l
l
e
e
al
al
al
rg
rg
rg
du
iu
iu
sm
sm
sm
ed
ed
la
la
la
ei
m
Able bodied Visually impaired Motor impaired
Figure 6-9. Effect of Font size in different user groups
143
Effect of Spacing
14000
Task completion time (in msec)
12000
10000
8000
Actual
6000 Predicted
4000
2000
0
e
e
se
se
se
m
m
ns
ns
ns
iu
iu
iu
ar
ar
ar
de
de
de
ed
ed
ed
sp
sp
sp
m
m
Able bodied Visually impaired Motor impaired
Figure 6-10. Effect of Spacing in different user groups
Effect of fontsiz e for medium spacing with motor impaired users
18000
16000
Task completion time (in
14000
12000
msec)
10000 Actual
8000 Predicted
6000
4000
2000
0
Small Medium Large
Conditions
Figure 6-11. Effect of medium Spacing in motor impaired users
144
Effect of spacing for medium font siz e with motor impaired

users
18000
16000
Task completion time (in
14000
12000
m sec)
10000 Actual
8000 Predicted
6000
4000
2000
0
Sparse Medium Dense
Conditions
Figure 6-12. Effect of medium Font size in motor impaired users
6.3.6. Discussion
I have developed the simulator to help with the design and evaluation of assistive
interfaces. Choosing a particular interface from a set of alternatives is a significant
task for both design and evaluation. In this study, I considered a representative task
and the results showed that the effects of both factors (Spacing between icons and
Font size) were the same in the prediction as for actual trials with different user
groups. The prediction from the simulator can be reliably used to capture the main
effects of different design alternatives for people with a wide range of abilities.
However the model did not work accurately for about 30% of the trials where the
relative error is more than ±50%. These trials also accounted for an increase in the
average relative error from zero to 16%. In particular, the predicted variance in task
completion times for motor impaired users was smaller than the actual variance. This
can be attributed to many factors; the most important ones are as follows.
145
o Effect of usage time - fatigue and learning effects: The trial continued for
about 15 to 20 minutes. A few participants (especially one user in the motor
impaired group) felt fatigue. On the other hand, some users worked more
quickly as the trial proceeded. The model did not consider these effects of
fatigue and learning. In future I plan to incorporate the usage time into the
input parameters of the model.
o User characteristics: The variance in the task completion time can be
attributed to various factors such as expertise, usage time, type of motor
impairment (hypokinetic vs. hyperkinetic), interest of the participant and so
on. Currently, the model characterizes the extent of motor impairment of the
user only by measuring the grip strength, in future more input parameters may
be considered.
o The choice of the motor behaviour model: I trained the motor behaviour
model by collecting data from people with and without motor impairment.
However Fitts’ Law [Fitts’, 1954] predicts the movement time better than my
model for people without any mobility impairment. An eclectic approach of
choosing Fitts’ Law for people without mobility impairment and my motor
behaviour model for people with mobility impairment may produce more
accurate results.
6.4. The Cluster scanning system
In this section I demonstrate the use of the simulator for developing a new assistive
interaction technique. Many physically challenged users cannot interact with a
computer through a conventional keyboard and mouse. They may interact with a
computer through one or two switches with the help of a scanning mechanism.
Scanning is the technique of successively highlighting items on a computer screen and
146
pressing a switch when the desired item is highlighted. I have developed a new
scanning technique by clustering screen objects. Initially I have evaluated the cluster
scanning system against two other scanning systems using the simulator. Later I have
also confirmed the result by collecting data from motor impaired participants.
6.4.1. Related work
Most work on scanning has aimed to enhance the text entry rate of a virtual keyboard.
In these systems the mechanism is usually block-row-column-item based scanning
[Simpson and Koester, 1999; Lesher and colleagues, 2002]. However, navigation to
arbitrary locations on a screen has also become important as graphical user interfaces
are more widely used. Two types of scanning mechanism are commonly used for
navigation. Cartesian scanning moves the cursor progressively in a direction parallel
to the edges of the screen, and Polar scanning selects a direction and then moves
along a fixed bearing. A particular type of Polar scanning that allows movement only
in eight directions is commonly used [Steriadis and Constantnou, 2002; Ntoa, Savidis
and Stephanidis, 2004] (and in a wheelchair mobility interface [O’Neill, Roast and
Hawley, 2002]). In both Cartesian and Polar scanning systems, the interaction rate of
users remains very low. So recent scanning systems have tried to combine two or
more types of scanning to get better performance. Examples of some existing systems
in the same discipline are the Autonomia System [Steriadis and Constantnou, 2002],
the FastScanner system [Ntoa, Savidis and Stephanidis, 2004], the Gus! Scanning
Cursor [2007], the ScanBuddy system [2007] and the SSMCI system [Moynahan and
Mahoney, 1996].
The Autonomia system [Steriadis and Constantnou, 2002] replaces the windows and
widgets of a typical Windows interface by Frames and Widget for Single-switch input
devices (WIFSID) respectively. The system consists of different frames such as
147
Cursor Frame, Virtual Keyboard Frame, Console frame etc. The cursor frame
provides eight-directional scanning whereas the frame itself and other frames are
scanned using the block-row-item based scanning approach.
The FastScanner system [Ntoa, Savidis and Stephanidis, 2004] starts the scanning
process by showing a list of currently open applications and asks the user to choose an
application. The scanning procedure then restarts itself in the selected application. The
objects of an interface are scanned sequentially based on a predefined order. Screen
navigation is done by eight-directional scanning. Additionally, the objects of an
interface are divided into four classes –
o Text entry objects

o Simple objects
o Selection objects
o Container objects
The user input is interpreted according to the type of the object that has received the
input.
The Gus Scanning Cursor [2007] provides different types of navigation strategies
(such as Cartesian, Polar, eight-directional) at a single screen and the screen itself is
scanned by row-item based scanning. The user has to choose a particular scanning
type to navigate through the screen.
The ScanBuddy system [2007] scans the screen by iteratively dividing it into two
equal parts up to 4 times. Finally it scans the smallest part using Cartesian scanning.
148
In the Single Switch Mouse Control Interface (SSMCI) system [Moynahan and
Mahoney, 1996], an intelligent agent operates to guess the target and moves the cursor
accordingly. If the guess is incorrect the user has to signal the agent, which then
reevaluates the situation and comes up with a new solution.
There also exist few scanning applications for specialized tasks like text selection
[Shein, 1997], menu selection [Evreinov and Raisamo, 2004], Web browsing [Ntoa
and colleagues, 2009] and so on, but they are not really useful for navigating to an
arbitrary location in a screen.
Most of these scanning systems (excepts Gus Scanning Cursor [2007] and SSMCI
[Moynahan and Mahoney, 1996]) have a similar structure. They start by dividing the
screen into several blocks and then introduce either Cartesian or Polar scanning within
a block. As a result, users can traverse shorter distances using Cartesian or Polar
scanning and the time needed to reach a target from long distances is reduced.
However, an arbitrary screen layout cannot always be evenly divided into blocks,
rows or columns. So scanning systems define blocks in different ways. The
Autonomia system introduces blocks by providing different frames. The FastScanner
system defines blocks based on the hierarchy of objects in the Windows operating
system. The Scanbuddy system defines blocks just by dividing the screen in two equal
segments.
There is little published work comparing performances of different scanning

techniques. Angelo [1991] compared Automatic, Inverse and Step scanning and
unsurprisingly concluded that the choice of scanning depended on the type of
disability of the user. Birch [2000] compared Cartesian and Polar scanning
approaches and found that “the Cartesian method was shown to be faster over the
entire screen when compared to the rotational method”. Blackstien-Adler [2004]
149
compared continuous Cartesian, discrete Cartesian, rotational, and hybrid

quadrant/continuous Cartesian scanning techniques and found that Cartesian scanning
(both continuous and discrete) was not only preferred and but also the most effective
scanning strategy. However these works were confined to Polar and Cartesian
scanning. Additionally they compared the performance of the scanning systems on
some synthesized situations.
I have taken a novel approach of designing a new scanning system by clustering

screen objects and evaluating it in a realistic situation through simulation. In the next
section I have described three different scanning systems that are considered in the
present study. In Section 6.4.3, I discuss the cluster scanning system in detail. The
performance of the scanning systems is compared in Section 6.4.4. Finally, I have
validated my result through a user study in Section 6.4.5.
6.4.2. Current scanning systems
In the present study I have considered the following three types of scanning systems.
Eight-Directional Scanning System: In this scanning technique the pointer icon is

changed at a particular time interval to show one of eight directions (Up, Up-Left,
Left, Left-Down, Down, Down-Right, Right, Right-Up). The user can choose a
direction by pressing the switch when the pointer icon shows the required direction.
When the pointer reaches the desired point in the screen, the user has to give another
key press to stop the pointer movement and make a click. Figure 6-13 shows a
probable cursor trace using the eight-directional scanning system.
150
Figure 6-13. The eight-directional Scanning System
Block Scanning System: In the block scanning system the screen area is iteratively
segmented into equally sized sub-areas (Figure 6-14). The user has to select a sub-
area that contains the intended target (the green rectangle in Figure 6-14). The
segmentation process runs a certain number of iterations and after that eight-
directional scanning is initiated in the selected sub-area.
151
Figure 6-14. The Block Scanning System
Cluster Scanning System: The cluster scanning system initially collects all possible
targets in a screen. Then it iteratively divides a screen into several clusters of targets
based on their locations (Figure 6-15). The user has to select the cluster that contains
the intended target. The clustering process iterates until the cluster contains a single
target (Figure 6-15).
152
Figure 6-15. The Cluster Scanning System
Demonstration of the scanning system
A demonstration of these scanning systems can be downloaded from

Eight Directional Scanning: http://www.youtube.com/watch?v=0eSyyXeBoXQand
feature=user
Block Scanning: http://www.youtube.com/watch?v=UTmMstrGDZY
Cluster Scanning: http://www.youtube.com/watch?v=tRtbsn2LfeAandfeature=user
The demonstrations involved a task of pressing a set of buttons placed in a screen

(Figure 6-16) in a particular sequence. All of the buttons were coloured grey except
the next target, which was red. The same task was repeated for all the scanning
systems.
153
Figure 6-16. Screenshot of the demonstration for scanning interfaces
6.4.3. The cluster scanning system
The cluster scanning system works by enumerating objects being shown in the screen
and storing positions of windows, buttons, icons and other possible targets. The
algorithm starts by considering all the processes running on the computer. If a process
is controlling a window, then the algorithm also considers all child and thread
processes owned by it. During the enumeration process, the algorithm identifies the
foreground window and separately stores the positions of the foreground window and
targets within it from the background windows. The algorithm also calculates the area
occupied by the foreground window. Then it separately clusters the targets in the
foreground window and background windows. The ratio of the number of clusters in
foreground and background windows is proportional to the ratio of the area occupied
by the foreground window in the whole screen. I used the Fuzzy c-means algorithm
[Ross, 1997] to cluster the targets into similarly sized groups. The algorithm is similar
to k-means clustering algorithm. The k-means algorithm partitions points into k
154
clusters where each point belongs to the cluster with nearest mean. This algorithm
aims at minimizing the following objective function
k n
J =∑ ∑x
2
( j)
i − cj
j =1 i =1
2
where x i( j ) − c j is a distance measure between a data point xi( j ) and the cluster
centre cj. The Fuzzy c-means algorithm returns the membership values of data points
into different clusters instead of putting the data points into separate clusters. As a
result when the data points are not naturally separated, it returns overlapping clusters.
The c-means algorithm takes the number of clusters (c) as input. It aims at minimizing
the following objective function
N C
Jm = ∑ ∑u
2
m
ij xi − c j , 1 ≤ m <∝
i =1 j =1
where m is any real number greater than 1, uij is the degree of membership of xi in the
cluster j, xi is the ith of d-dimensional measured data, cj is the d-dimension centre of
the cluster, and ||*|| is any norm expressing the similarity between any measured data
and the centre. The pseudocode of the algorithm is shown in Appendix.
Evaluating optimum number of clusters
I have calculated the optimum number of clusters to minimize the target acquisition
time with respect to the number of clusters. Initially I have evaluated the average time
needed to select a single cluster from a set of clusters. Then the number of iterations
needed to reach a single target in the cluster scanning process is estimated. Based on
these two estimations I have evaluated the total target acquisition time and found the
number of clusters that minimizes it. The optimal number of cluster is five and it does
155
not depend on the number of targets in the screen nor the size or resolution of the
screen. The detail of the analysis is shown below.
Let
Number of targets = n
Number of Clusters = c
Scan Delay = s msec.
Cluster selection time
Suppose that each cluster is equally likely to be selected since we cannot assume a
particular target has higher probability of being selected than the others (unless a
significant amount of interaction is recorded and analysed). So the expected time to
select each cluster is
1
Tc = ∑  (1 + 2 + 3 + .... + c )s
c
1  1 
=   c (c + 1 )s
2  c 
1
= (c + 1 )s
2
After reaching a cluster the user needs to press the key another time to indicate that it
is the correct selection. To give this confirmation signal he has to wait another s sec.,
which makes the average cluster selection time to be
1
Tc = (c + 1)s + s
2
1
= (c + 3)s …...……………………..Eq. 6-1
2
156
Number of iterations required to reach a single target
If the clustering is optimal, we can assume each cluster will contain an equal number
of targets.
Hence,
n
After the first iteration each cluster will contain targets
c
n
After the second iteration each cluster will contain targets
c2
n
After the i-th iteration each cluster will contain targets
ci
Finally after the last iteration (say k-th) each cluster will contain a single target
n
Hence =1
ck
ln n
or, k=
ln c
Total time needed to reach a single target
Now the total selection time for an individual target will be

T = kT c
ln n 1
= . (c + 3)s
ln c 2
1
= (s. ln n )
(c + 3)
2 ln c
= Q.
(c + 3)
ln c
1
[Where, Q = (s. ln n ) ,and is constant for a particular interface]
2
157
c is a discrete quantity, but we can consider it to be continuous. Then T will achieve

dT
an extreme value at =0
dc
Now
dT  1
= Q. −
(c + 3) 
2 
dc  ln c c.(ln c ) 
dT 3
So =0 when ln c = 1 +
dc c
Numerical analysis gives a solution at c ≈ 4.97 (see Figure 6-17). Further
examination of discrete values of c confirms that T is minimized with five targets in
each cluster.
Variations of T with c
7.5
7
6.5
6
T
5.5
5
4.5
4
2 3 4 5 6 7 8 9 10
Number of Clusters
Figure 6-17. Variation of T w.r.t. the number of clusters
Since T does not vary much for 4 to 6 clusters, so I have used the classification
entropy [Ross, 1997] of a cluster to minimize T. At each instance, I cluster targets into
4, 5 and 6 clusters and then select the one that minimizes the classification entropy.
158
6.4.4. Evaluation through simulation
I evaluated the scanning systems by the following three models.
o The model for the cluster scanning system takes the scan delay, the number of
clusters, the intended target and the total number and positions of targets in a
screen as input and gives the target acquisition time as output. The model
calculates the target acquisition time by running the cluster scanning algorithm
on the input and using equation 6-1.
o The model for the block scanning system takes the scan delay (s), the number
of blocks (k) and the number of iterations (r) as input and gives the target
acquisition time as output. The model calculates the target acquisition time by
running the block scanning algorithm on the input. The minimum target
acquisition time is s × r (while the target can be reached by always selecting
the first block) and the maximum is equal to s × k × r (while the target has to
be reached by always selecting the last block).
o I used the cognitive model discussed in Section 4.3.2 of Chapter 4 to simulate

the eight-directional scanning system.
In the present study, I recorded sample interactions by two able-bodied users to

generate a list of tasks, which were fed to the simulator to evaluate different scanning
techniques. The users were expert computer users and they were not instructed to use
any particular application or to do any specific task. These can therefore be taken as
representative of natural interactions. The simulator estimated the time needed to
undertake the same set of tasks using different scanning systems.
159
Results
I investigated the eight-directional scanning, block scanning for different numbers of
blocks and different numbers of iterations, and cluster scanning for different numbers
of clusters. The estimated task completion times are shown in Table 6-4 and Figure 6-
18. The fact that some of these tasks would take over two hours to complete indicates
the value of simulation over user trials.
Discussion
The results clearly show that both the cluster scanning and block scanning processes
perform better than eight-directional scanning and thus support the use of screen
segmentation in recent scanning systems. The cluster scanning system performs best
when the number of clusters is five. However, among the different versions of Cluster
and Block scanning processes, I found a type of block scanning that divides the screen
into four equal sized partitions for four iterations performed best.
Figure 6-18. Performance Comparison of Different Scanning Systems

(B.F.: Branching Factor, Itr: #Iterations)
160
Table 6-4. Estimated Task Completion Time for different scanning systems
Scanning Type Branching Number of Iterations Estimated Task

Factor (For Block Scanning) Completion Time
(#Clusters or in sec.
#Blocks)
Eight- 8676
Directional
Scanning
Cluster 2 6943
Scanning
3 5996
4 5842
5 5707
6 5937
7 5965
Block Scanning 2 2 7595
2 4 7859
2 8 7781
4 1 7206
4 2 7116
4 4 5374
16 1 8201
16 2 6961
I expected that the cluster scanning process would perform better since it uses the
information about target types and locations in the clustering process. For example,
labels are not considered as possible targets. So as part of a post-hoc analysis I studied
the actual tasks undertaken by the participants. Most of the time, participants used
instant messenger software and browsed the World Wide Web. The present version of
the clustering process does not consider locations of hyperlinks in the target
acquisition process and so it might miss possible targets during Web surfing. To test
161
my hypothesis, I again collected some sample cursor traces in two different

conditions: in the first condition I asked users not to browse the Web while in the
second there was no such restriction. The estimated time for block scanning (with
branching factor 4 and 4 iterations) and cluster scanning (with 5 cluster centres) are
shown in Figure 6-19.
I found that that the cluster scanning process performed far better than the block
scanning process when it considered all possible targets in its clustering process (i.e.
in tasks without web browsing). The intended audience of the scanning systems
(motor impaired users) can use special browsers customized for them [Stephanidis,
1998; IBM Web Adaptation, 2007]. In those browsers, a web page is preprocessed
before presentation and the hyperlinks are arranged in a fixed location of screen. In
that case, the cluster scanning process will have no problem locating hyperlinks and
should perform better than other scanning systems.
Figure 6-19. Comparing Cluster Scanning and Block Scanning for tasks using and not
using Internet
162
6.4.5. Validation of the result
The simulation predicts that participants should take less time to complete a task using
the cluster scanning system than the block scanning system if the clustering process
can include all targets in a screen. I validated this result by a controlled experiment on
motor impaired users. Additionally, I also investigated how hand strength affects
pointing performance in case of single-switch scanning systems. The detail of the
experiment is discussed in the following sections.
Process
In this experiment, the participants were instructed to press a set of buttons placed in a
screen (Figure 6-16) in a particular sequence. All the buttons were coloured grey
except the next target, which was red. The same task was repeated for all the scanning
systems. In particular, I evaluated the cluster and block scanning systems. I recorded
cursor traces, target height, width, and task completion time. For internal validity of
the experiment, I did not use any scan delay adaptation algorithm. The scan delay was
kept constant at 2 sec. for motor impaired participants and at 1 sec. for the control
group. These values were selected after observing their reaction time and were greater
than the reaction time. All participants were trained adequately with the scanning
systems before undertaking the experiment.
Material
I used a push button switch [The Super Switch, 2007] and an Acer Aspire 1640
Laptop with 1280 × 800 pixel screen resolution. I used the same seating arrangement
(same table height and distance from table) for all participants. I measured the same
six variables for hand strength evaluation as discussed in Chapter 5.
163
Participants
I collected data from 8 motor impaired and 8 able-bodied participants (Table 6-5).
The motor impaired participants were recruited from a local centre, which works on
treatment and rehabilitation of disabled people and they volunteered for the study. All
motor impaired participants used computer at least once each week. Able-bodied
participants were students of my university and expert computer users. None of them
had used the scanning systems before.

C1 27 F
C2 28 F
C3 30 M
C4 30 M
Able-bodied
C5 31 M
C6 28 F
C7 30 F
C8 26 F
P1 30 M Cerebral Palsy resulting manual dexterity, wheel chair user.

Cerebral Palsy resulting manual dexterity, also have tremor in
P2 43 M hand, wheel chair user.
P3 30 M Dystonia, cannot speak, cannot move fingers, wheelchair user.
Left side (non-dominant) paralysed after a stroke in 1973, also
P4 62 M have tremor.
Cerebral attack, significant tremor in whole upper body part,
P5 44 M fingers always remain folded.
P6 46 F Did not mention disease, hard to grip things, no tremor.
P7 >45 F Spina Bifida/ Hydrocephalus, wheelchair user.
P8 >45 M Cerebral Palsy from birth, restricted hand movement, no tremor.
164
Results
Initially I measured the total task completion time for the scanning systems (Figure 6-
20 and Table 6-6). It can be seen that participants took less time to complete the task
using the cluster scanning system. The dotted bars in Figure 6-20 mean that two
participants could not complete the task using the block scanning system.
Comparing Cluster and Block Scanning Systems
200000
180000
Task Completion Time (in msec)
160000
140000
120000
Cluster Scanning System
100000
Block Scanning System
80000
60000
40000
20000
0
P1 P2 P3 P4 P5 P6 P7 P8 C1 C2 C3 C4 C5 C6 C7 C8
Participants
Figure 6-20. Task completion times for the scanning systems
To further investigate the scanning systems, I measured the following three variables:
Number of missed clicks: It measures the number of times the participants wrongly
pressed the switch.
Idle Count: The scanning systems periodically highlight the buttons. This variable
measures the number of cycles when the participants did not provide any input,
though they were expected to do so.
165
Efficiency: The scanning systems require a minimum time to complete any task
which depends on the particular scanning system and not on the performance of the
OptimalTim e
user. I calculated the efficiency as the ratio ActualTime . An efficiency of 100%
indicates optimal performance, 50% indicates taking twice the minimal time and 0%
indicates failure to complete the task. Table 6-6 presents the efficiency of each
participant. The optimal time was same for each participant within a group. In
calculating the efficiency I took the average time needed to optimally and actually
make one click (or selection) since two participants could not complete the task.
Table 6-6. Comparing scanning systems
Participants Cluster scanning system Block scanning system

Task Task
completion completion
Efficiency time (in sec.) Efficiency time (in sec.)
P1 0.39 166 0.13 618
P2 0.76 86 0.62 125
P3 0.89 73 0.75 104
P4 0.58 113 0.15 171
P5 0.54 120 0.48 107
P6 0.61 107 0.33 236
P7 0.68 97 0.57 137
P8 0.50 131 0.47 165
C1 0.56 58 0.51 77
C2 0.84 39 0.53 73
C3 0.49 67 0.37 105
C4 0.76 43 0.63 62
C5 0.71 46 0.67 58
C6 0.56 59 0.57 68
C7 0.62 53 0.66 59
C8 0.42 77 0.46 84
Average 0.62 83 0.49 141
166
Table 6-7 shows the correlation coefficients of these variables with the hand
evaluation metrics. The only significant effect is a correlation between the number of
missed clicks in the cluster scanning system and grip strength. There was a similar,
but weaker effect, in the block scanning system. It seems that hand strength does not
affect performance of users with the scanning systems.
Table 6-7. Correlation coefficients for the Scanning Systems
Cluster Scanning System Block Scanning System

Missed Idle Efficiency Missed Idle Efficiency
Click Count Click Count
Correlations GS -0.580 -0.191 0.160 -0.429 -0.331 0.174
TPS -0.374 -0.105 0.094 -0.271 -0.153 0.02
ROM -0.414 -0.154 -0.127 -0.120
Wrist 0.188 0.028
ROM 0.000 0.106 -0.268 -0.225
Forearm -0.103 0.082
Significance GS 0.018 0.478 0.478 0.097 0.210 0.519
TPS 0.153 0.699 0.699 0.310 0.572 0.941
ROM 0.111 0.569 0.639 0.659
Wrist 0.569 0.918
ROM 1.000 0.695 0.315 0.401
Forearm 0.695 0.762
I did not find any significant difference between the performances of motor impaired
and able-bodied users by an equal variance paired t-test at p < 0.05 level. However the
efficiency, average number of missed clicks and idle count are significantly lower in
the cluster scanning system than in the block scanning system in an equal variance
paired t-test (p < 0.05) (Figure 6-21). Additionally two participants (P3 and P7) could
not complete the task using the block scanning system while all participants could
complete the task using the cluster scanning system.
167
Comparing Cluster and Block Scanning Systems
0.49
Avg. Efficiency
0.62
18.59
Avg. Idle Count
3.75
Block Scanning System

6.96
Avg. Missed Click Cluster Scanning System
0.42
Figure 6-21. Comparing the scanning systems
Discussion
I failed to find any effect of hand strength on pointing performance while participants
used the scanning systems. There are two possible explanations:
o The switch used in scanning only requires a gentle push to operate and the
hand strength of motor impaired users are sufficient to operate the switch.
o The scanning software does the navigation itself and the users need not move
their hand to move the pointer.
This result with the scanning system also shows that an appropriate choice of an
assistive technology can make interaction independent of the physical strength of
users. It can be noted from tables 6-5 and figure 6-20 that participants P4 and P5 both
have hyperkinetic motor impairment and both can not complete the task using block
scanning system. Perhaps it means they face a different challenge in comparison to
other users. In future, it will be interesting to investigate the effects of the type of
motor impairment on the scanning systems.
168
The simulator predicted that the task completion time would be less in the cluster
scanning system than the block scanning system when the cluster scanning system can
consider all possible targets in its clustering process. The experiment also shows
similar results. The total task completion time, sub-optimal task completion time, idle
time and number of missed clicks are less in the cluster scanning system than the
block scanning system. The efficiency of the cluster scanning system can be attributed
to the following factors.
o The cluster scanning system does not introduce any new interface element like
a frame or form in the screen as Autonomia [Steriadis and Constantnou, 2002]
or FastScanner [Ntoa, Savidis and Stephanidis, 2004] systems do.
o The cluster scanning system does not blindly divide the screen in a predefined
number of segments as the ScanBuddy system [2007] or the block scanning
systems do. It clusters the target so that the targets are evenly divided into
blocks and a block is not drawn in a region that does not contain any target.
My study also confirms the value of automatically evaluating assistive interfaces

using a simulator. Before running a formal user trial, a system designer may tune
interface parameters or select the best design alternative using my simulator. As each
alternative design needs not to be evaluated by a user trial, the simulator will reduce
the development time significantly.
6.5. Conclusion
In this chapter, I have presented two representative applications of the simulator. The
first study demonstrates the use of the simulator to choose an interface layout from a
set of alternatives. The simulator is found to correctly predict the main effects of
169
different layout options for people with a wide range of abilities. The second study
demonstrates the use of the simulator for designing a new assistive interaction
technique. Initially I have developed a new scanning technique and used the simulator
to analyze it in comparison to other scanning techniques. Later I have confirmed the
results of the analysis through a controlled experiment.
170
Chapter 7 Conclusions
While we believe strongly in user testing and iterative design. However, each iteration of a
design is expensive. The effective use of such models means that we get the most out of each
iteration that we do implement.
-Bill Buxton from his book “Human Input to Computer Systems: Theories, Techniques and
Technology”, 2010
7.1. Introduction
In this work, I have developed a simulator to help with the design and evaluation of
assistive interfaces. The simulator embodies both the internal state of a computer
application and also the perceptual, cognitive and motor processes of its user. It takes
a task definition and locations of different objects in an interface as input. It then
predicts possible eye movements and cursor paths on the screen and uses these to
predict task completion times. The models are parameterized to represent different
physical abilities, levels of skill and input devices.
In the following sections, I summarize my work and discuss about the implication,
limitation and future directions of my research.
7.2. Summary
I have taken a novel approach to designing and evaluating inclusive systems by
modelling the performance of users with a wide range of abilities. As I discussed in
Chapter 2, two main types of user model are in widespread use:
o The GOMS family of models, which were developed only for human
computer interaction (HCI).
o Models involving cognitive architectures, which take a detailed view of human

cognition.
The GOMS (Goal, Operator, Model, Selection) family of HCI models (e.g. KLM,
CMN-GOMS, CPM-GOMS) is mainly suitable for modelling the optimal (skilled)
behaviour of users. On the other hand, models developed using cognitive architectures
consider the uncertainty of human behaviour in detail but have not been widely
adopted for simulating HCI as their use demands a detailed knowledge of psychology.
There is also not much reported work on systematic modelling of assistive interfaces.
In the present work, I have addressed some of the current problems of user modelling
by developing a simulator inspired by Model Human Processor [Card, Moran and
Newell, 1983]. My simulator consists of a perception model, a cognitive model and a
motor behaviour model.
The perception model simulates the phenomena of visual perception such as focussing
and shifting attention. It can also simulate the effects of different visual impairments
on interaction. I have investigated eye gaze patterns of able-bodied users as well as
people with visual impairment and my model can predict the visual search time and
eye gaze pattern of able-bodied people and a few types of visually impaired users with
statistically significant accuracy.
The cognitive model simulates expert performance by using CPM-GOMS model

[John and Kieras, 1996]. It can also simulate performance of novices by using a dual
space model [Rieman and Young, 1996].
The motor behaviour model is developed by statistical analysis of cursor traces from
motor impaired users. As part of the model, I have also developed a new scale for
172
characterizing the extent of disability of users by measuring their grip strength, which
was not earlier possible by using existing clinical scales.
These models do not need detailed knowledge of psychology or programming to

operate. They have graphical user interfaces to provide input parameters and showing
output of simulation.
7.3. Implications and limitations

User trials are always expensive in terms of both time and cost. A design evolves
through an iteration of prototypes and if each prototype is to be evaluated by a user
trial, the whole design process will be slowed down. Additionally, user trials are not
representative in certain cases, especially for designing inclusive interfaces for people
with special needs. A good simulation with a principled theoretical foundation can be
more useful than a user trial in such cases. Exploratory use of modelling can also help
designers to understand the problems and requirements of users, which may not
always easily be found through user trials or controlled experiments.
I have shown that it is possible to develop engineering models to simulate human

computer interaction of people with a wide range of abilities and that the prediction is
useful in designing and evaluating interfaces. According to Allen Newell’s time scale
of human action [Figure 7-1, Newell, 1990], our model works in the cognitive band
and predicts activity in millisecond to second range. It can not model activities outside
the cognitive band like micro-saccadic eye gaze movements, response characteristics
of different brain regions (in biological band [Newell, 1990]), affective state, social
interaction, consciousness (in rational and social band [Newell, 1990]) and so on.
Simulations of each individual band have their own implications and limitations.
However the cognitive band is particularly important since models working in this
173
band are technically feasible, experimentally verifiable and practically usable.

Research in computational psychology and more recently in cognitive architectures
supports this claim. I have added a new dimension in cognitive modelling by
including users with special needs.
Figure 7-1. Timescale of human action (adapted from [Newell, 1990])
174
7.4. Contributions
With reference to the hypothesis in Chapter 1, the main contributions of my work are
1. Development of a model to predict completion time of representative tasks

undertaken by users with a wide range of abilities.
2. Analysing effects of physical strength on human computer interaction. In
particular,
a. Identification and calibration of two image processing algorithms to
predict points of eye gaze fixations and the corresponding fixation
durations during visual search in a computer screen undertaken by
people with and without visual impairment.
b. Analysis of eye movement trajectories during visual search in a
computer screen and identification of the most probable strategies to
predict the actual trajectory.
c. Investigation of the effect of hand strength on human computer
interaction.
d. Development of a statistical model to predict pointing times of motor
impaired computer users based on their hand strength.
3. Evaluation of the predictive power of the simulation for selecting a design
alternative and developing a new application.
My studies have been used to design an inclusive accessible game [Phillips, 2009] and
the University has recently been awarded EU funding for a project [The GUIDE
Project, 2009] that will build on results from my PhD research.
175
7.5. Future work

My research has created many open questions that pave way for two different avenues
for research. They are:
1. Applications of the simulator

2. Further development of the models
In the present work, I have validated the models using controlled experiments.
However it will be interesting to investigate the performance of the models in practice
and design new applications based on their predictions.
7.5.1. New applications
Interface personalization framework

The simulator can be extended to develop a framework for interface personalization.
Interface personalization is mainly explored in the domain of content personalization
and developing intelligent information filtering or recommendation systems based on
user profiles. In most of those systems content (or information) is represented in a
graph like structure (such as ontology or semantic network) and filtering or
recommendation is generated by storing and analyzing users’ interaction patterns.
Little research work has been done beyond content personalization. A few
representative and significant projects on interface personalization are the SUPPLE
project at University of Washington [Gajos, Wobbrock and Weld, 2007], the Lumiere
Project at Microsoft Research [Horovitz and colleagues, 2008] and AVANTI project
[Stephanidis and colleagues, 1998; 2003] for people with disabilities. The SUPPLE
project personalizes interfaces mainly by changing layout and font size for people
with visual and motor impairment and also for ubiquitous devices. However, the user
models do not consider visual and motor impairment in detail and thus work for only
176
loss of visual acuity and a few types of motor impairment. The Lumiere convenience
project [Horovitz and colleagues, 2008] uses an influence diagram in modelling users.
This records the relationships among users’ needs, goals, background etc. The Office
Assistant of Microsoft Office application uses this influence diagram to provide
runtime help to users. The AVANTI project [Stephanidis and colleagues, 1998; 2003]
provides a multimedia web browser for people with light, or severe motor disabilities,
and blind people. It distinguishes personalization into two classes –
o static adaptation which is personalization based on user profile and,

o dynamic adaptation that is personalization following the interaction pattern
(such as calculating error rate, user idle time etc. from usage log) with the
system.
However, the Lumiere project does not generalize their personalization mechanisms
for other applications and the AVANTI project only addresses a small segment of
disabilities for a particular application.
The lack of a generalized framework for personalization of users with a wide range of
abilities affects the scalability of products. My model covers users with a wide range
of abilities which can lead to a generalized framework for interface personalization.
The framework will work by identifying the specific problem experienced by a user
with a component of an interface and then personalizing the component according to
the needs of the user. For example an elderly user may find an online form unusable
because he suffers from significant tremor in his finger. In this case increasing the
font size or the size of the whole webpage will not be very helpful. Rather we have to
identify the optimum size of the textboxes and buttons that would help him to point to
them but will not consume much screen real-estate. The interface personalization
mechanism will be similar to the interface feature selection and customization process
177
described in Kumar et. al [2004], but will be more rigorous and generalized in relating
interface personalization to user modelling. It will work based on the following steps:
Collection of a set of sample interactions.

Prediction of interaction patterns using the user model and Monte-Carlo
simulation for different types of disabilities and skill levels.
Finding out the problems faced by users with different ranges of abilities.
Optimization of the features of the interface (such as font size, colour contrast,
layout etc.) for different users.
Web browser
Another interesting use of the models will be designing inclusive websites. Currently,
many applications (like shopping, banking, social networking systems) are developed
as web based systems. There are numerous guidelines and systems for developing
accessible websites but they are not adequate to provide accessibility. Moreover
designers often do not conform to the guidelines while developing new systems. It is
also equally difficult to change existing systems according to the guidelines. There are
a few systems (like IBM Web Adaptation Technology [2008], AVANTI Web
browser, WEBADPT2ME systems) which offer features to make web sites accessible
but either they serve a very special type of user (motor-impaired for AVANTI) or
there is no way to relate the inclusive features with the particular need of users. My
model can be used to relate the existing inclusive features with the need of users with
a wide range of abilities. For example, if we know the visual acuity of a user, I can
use my model to decide the optimum font size for a website. This type of adjustments
can also be done for ubiquitous devices, where the context itself poses limitations. For
example, an interface suitable for people having less visual acuity will be useful for
small screen devices which have high pixel-density. Similarly, a good interface for a
178
hyperkinetic motor impaired user will also be suitable for a handheld device during its
use in a moving vehicle.
7.5.2. New models
The application of the simulator will also help to analyse the models in more detail.
As for example, I have developed models for visual search tasks. However most of
the real life search tasks also involve significant amount of textual search. The
perception model can be extended to simulate reading tasks. The model can also be
extended to simulate other types of perception besides vision like auditory and haptic
perception.
I have not addressed cognitive impairment but my cognitive model can be extended to
simulate a few types of cognitive disabilities. Similarly the motor behaviour model
can be extended for other input devices besides mouse and single switch scanning
systems.
I have developed user interfaces for each model. It should be interesting and
important to work with interface designers and software engineers to optimize the
design of the interfaces of the models themselves.
The accuracy of the existing models can also be increased by separately calibrating
and validating them for different impairments (such as Maccular Degeneration,
Diabetic Retinopathy, Hypokinetic and Hyperkinetic motor impairment [Flowers,
1976], Dyslexia). The extension of the work will help to understand the effect of task
and devices on human cognition in more detail, which will also be of interest to
researchers in other disciplines besides computer science.
179
Glossary
ANOVA A statistical test that determines how much variance of a

dependent variable is explained by different factors and
interaction among them.
Assistive technology Technology developed for people with disabilities to assist them
in rehabilitation.
Cognitive A type of intelligent system that models cognition by assuming

Architecture
it a set of computational processes.
Colour blindness Colour blindness is the inability to perceive differences between

some of the colours that other people can distinguish .
Contrast Sensitivity Contrast Sensitivity is the ability to perceive differences between

an object and its background. It is measured by the difference in
the amount of light reflected (luminance) from two adjacent
surfaces.
Correlation A statistical analysis to determine the relationship between two

variables.
Diabetic Diabetic Retinopathy is a type of damage to the retina

Retinopathy
(retinopathy) caused by damage of blood vessels inside the eyes
due to diabetes mellitus.
Dual space model A type of user model that uses two separate state space
diagrams.
Effect size A statistical metric that shows the amount of variance explained
by a factor.
Eye Tracker An electronic device that tracks eye movement.
Fovea The Fovea is a spot near the centre of the retina that provides
most detailed vision. It has the highest density of cone cells.
Fuzzy c means A clustering algorithm that returns overlapping groups of cluster

based on fuzzy logic.
Glaucoma Glaucoma is a medical condition which causes loss of vision due
to the death of retinal ganglion cells.
GOMS A type of user model, which assumes that people interact with a
computer to achieve a goal by selecting a method, which
consists of a sequence of basic operations.
Goniometer A device to measure the range of motions of bone joints.
Hyperkinetic motor Hyperkinetic motor impairment refers to motor mpairment that

impairment
causes uncontrolled movement of body parts or tremor.
Hypokinetic motor Hypokinetic motor impairment refers to motor impairment that

impairment
causes restricted movement of body parts.
KLM Model The KLM model [Keystroke Level Model] simplifies the GOMS
model by eliminating the goals, methods, and selection rules,
leaving only six primitive operators.
A heuristics used to search interface objects. It tells that users
Label matching
principle search for similar words in an interface which have also
appeared in the task definition.
Levenshtein The Levenshtein distance is a metric to determine the similarity

distance
between two strings. It measures the minimum number of
operations needed to transform one string into the other, where
an operation is an insertion, deletion, or substitution of a single
character.
Maccular Maccular Degeneration is a medical condition which causes loss

Degeneration
of central vision.
MANOVA A statistical test to determine the interactions among the

dependent variables and the independent variables.
Markov Process A type of probability model which assumes that the present state
depends on a finite set of past states. When the present state only
depends on the immediate past state, it is called a first order
Markov process.
181
MHP A user model that explains cognition by an information
processing model. It classifies all cognitive activities in three
classes – perception, cognition and motor behaviour.
Motor impairment Motor-impairment is defined as the partial or complete disorder

of motor neurons resulting in the inability of muscle movements.
It includes people with spasticity, Amyotrophic Lateral Sclerosis
(ALS), Cerebral Palsy and other forms of acquired brain injury
that effect motor neurons.
The Parafovea is the region of retina at a distance of 1¼ mm
Parafovea
from the fovea, and extends to 2¾ mm away from the fovea. It
contains the highest density of rod cells.
The region of retina outside the parafoveal belt. It provides
Periphery
highly compressed information of low resolution.
People who lose foveal vision often develop a spot outside fovea
Pseudo fovea
called pseudo fovea, which they use to get most detailed vision.
When they look at a target, they focus it on the pseudo fovea
rather than on the real fovea.
Scanning Scanning is the technique of successively highlighting items on a

computer screen and pressing a switch when the desired item is
highlighted. Severely motor impaired users can access a
computer using scanning mechanism.
Syslistview32 A process in Windows operating system that renders icons in the

process
screen.
t Test A statistical test to compare means between two variables.
User model A user model is a representation of the knowledge and

preferences of users.
Visual Acuity Visual Acuity is the sensitivity of the visual interpretative

mechanism of the brain. It represents the acuteness of vision,
which depends on the sharpness of the retinal image within the
eyes.
Wrist Inclinometer A device to measure the range of motions of wrist.
182
Abbreviations
ACT-R Adaptive Control of Thought--Rational
AI Artificial Intelligence
ANOVA Analysis Of Variance
CORE Constraint-based Optimizing Reasoning Engine
EPIC Executive-Process/Interactive Control
GOMS Goal Operator Method Selection
HCI Human Computer Interaction
MANOVA Multivariate Analysis Of Variance
MHP Model Human Processor
SOAR State Operator And Result
183
Appendix
Colour histogram matching
The colour histogram of an image shows the distribution of colours of an image. It

divides the whole colour space (RGB, YUV and so on) into a set of bins and then
calculates the number of pixels in each bin.
I have calculated colour histogram of two different regions of an image one of which
contains the actual target while the other contains a possible target. The locations of the
actual and possible targets are part of the input to the perception model. Finally I
calculate the mean square distance between the histograms of the two regions of the
image.
Sobel operator
Sobel operator is used to detect edges in an image named after Irwin Sobel (currently at
HP Labs, Palo Alto). It convolutes the whole image with the following 3 × 3 kernel
where * here denotes the 2-dimensional convolution operation.
184
Shape context matching
The shape context algorithm detects the shape of an object in an image. It works after
detecting the edge of the object. It divides the image or particular image region
containing the object into uniform bins in log polar space (figure below) with respect to a
reference point and then calculates the number of pixels denoting edge in each bin.
I have calculated shape context histogram of two different regions of an image one of
which contains the actual target while the other contains a possible target. The locations
of the actual and possible targets are part of the input to the perception model. The centre
of a target is considered as the reference point. Finally I calculate the mean square
distance between the histograms of the two regions of the image.
Pseudo code of the cluster scanning system
Subroutine Cluster Scan

Get Targets
Cluster Targets
End Subroutine
185
Subroutine Get Targets
Enumerate all currently running processes
If the process creates a window and the window is visible
Store its position
If the window is the foreground window
Store its handle
End if
Get all child processes
Get all Thread Processes
End if
End Subroutine
Subroutine Get Child Processes

Get Class name
//Store positions of buttons, textboxes and combo boxes
If class name contains “Button” or class name =”text” or class name =”combo”
Get Position
If Parent Process is Foreground Window
Store its position with a mark
Else
Store its position without any mark
End if
//Store positions of icons
Else If class name =”SysListView32”
Get number of Icons
Get Position of all Icons
Store positions of all icons with a mark
Else
Store positions of all icons without any mark
End if
End if
End Subroutine
186
Subroutine Get Thread Processes
If the process creates a Window and the window is visible
Get Window’s Position
Store its position with a mark
Else
Store its position without any mark
End if
End if
End Subroutine
Subroutine Cluster Targets

Get the percentage of area occupied by the foreground window
Run Fuzzy c-means algorithm on the targets of the foreground window with values of c
proportional to percentage of area occupied by the foreground window
Run Fuzzy c-means algorithm on the rest of the targets with values of c proportional to (1 -
percentage of area occupied by the foreground window)
End Subroutine
187
Bibliography
1. "Abledata Products." Available at: http://www.abledata.com, Accessed on 1st

July, 2007
2. Alm N., Arnott J. L. and Newell A. F. "Prediction and Conversational

Momentum in an Augmentative Communication System." Communications of
the ACM 35.5 (1992): 46-57.
3. Anderson J. R. and Lebiere C. "The Atomic Components of Thought." Hillsdale,

NJ, USA: Lawrence Erlbaum Associates, 1998.
4. Barnard P. "The Emotion Research Group Website, MRC Cognition and Brain
Sciences Unit." Available at: http://www.mrc-cbu.cam.ac.uk/~philb , Accessed
on 1st July, 2007
5. Barnes M. P. and Johnson G.P. "Upper Motor Neurone Syndrome and

Spasticity." UK: Cambridge University Press, 2001.
6. Belongie S., Malik J., and Puzicha J. "Shape Context: A new descriptor for
shape matching and object recognition." Neural Information Processing Systems
Conference 2000.
7. Belongie S., Malik J. and Puzicha J. "Shape Matching and Object Recognition
Using Shape Contexts." IEEE Transactions on Pattern Analysis and Machine
Intelligence 24.4 (2002): 509-521.
8. Benyon D. and Murray D. "Applying User Modeling to Human Computer

Interaction Design." Artificial Intelligence Review 7.3 (1993): 199-225.
9. Birch S. "Single-Switch Mouse Emulation: A Tale of Two Methods." Annual

Conference of RESNA 2000.
10. Biswas P., Bhattacharyya S. and Samanta D. "User Model to Design Adaptable
Interfaces For Motor-Impaired Users." Tencon '05 - IEEE Region 10
Conferences 2005. 1801-1844.
11. Blackstien-Adler S., Shein F., Quintal J., Birch S. and Weiss P. L. "Mouse
Manipulation through Single-Switch Scanning." Assistive Technology 16.1
(2004): 28-42.
Bibliography
12. Blandford A., Butterworthb R. and Curzonb P. "Models of interactive systems: a

case study on programmable user modelling." International Journal of Human-
Computer Studies 60 (2004): 149-200.
13. Boden M. A. "Computer Models of Mind: Computational Approaches in

Theoretical Psychology." Cambridge, UK: Cambridge University Press, 1985.
14. Bovair S., Kieras D. E., and Polson P. G. "The acquisition and performance of
text-editing skill: A cognitive complexity analysis." Human-Computer
Interaction 5 (1990): 1-48.
15. Bravo P. E. , LeGare M., Cook A.M. and Hussey S. "A study of the application
of Fitts' Law to selected cerebral palsy adults." Perceptual and Motor Skills 77
(1993): 1107-1117.
16. Brettel H., Viénot F. and Mollon J. D. "Computerized simulation of color

appearance for dichromats." Journal of the Optical Society of America 14.10
(1997): 2647-2655.
17. Burack J. A., Zelazo P. R., Charman T. and Yirmiya N. "The Development of
Autism: Perspectives from Theory and Research." Lawrence Erlbaum
Associates, 2001.
18. Butterworth R. and Blandford A. "Programmable user models: The story so far."
Available at: http://www.cs.mdx.ac.uk/puma/wp8.pdf, Accessed on 30th June,
2007
19. Byrne M. D. "ACT-R/PM And Menu Selection: Applying A Cognitive

Architecture To HCI." International Journal of Human Computer Studies 55
(2001): 41-84.
20. Card S., Moran T. and Newell A. "The Psychology of Human-Computer

Interaction." Hillsdale, NJ, USA: Lawrence Erlbaum Associates, 1983.
21. Carroll J. M. and Olson J.M. "Mental Models In Human-Computer Interaction."

Handbook of Human-Computer Interaction Ed. Helander M. Amsterdam,
Netherlands: Elsevier Ltd., 1990. 135-158.
22. "Cognitive Architectures." Available at: http://en.wikipedia.org/wiki/Cognitive_

architecture, Accessed on 1st July, 2007
189
Bibliography
23. Crick R. P. and Khaw P. T. "A textbook of clinical ophthalmology - A Practical

Guide to Disorders of the Eyes and Their Management." World Scientific
Publishing Co Pvt. Ltd., 2003.
24. Daly S. "The Visible Differences Predictor: An algorithm for the assessment of
image fidelity." Digital Images and Human Vision Ed. Watson A. B. Cambridge,
MA, USA: MIT Press, 1993. 179-206.
25. Damper R. I. "Text Composition By The Physically Disabled: A Rate Prediction

Model For Scanning Input." Applied Ergonomics 15.4 (1984): 289-296.
26. Desimone R. and Duncan J. "Neural mechanisms of selective visual attention."

Annual Review Neuroscience 18 (1995): 193-222.
27. Duncan J. and Humphreys G. W. "Visual search and stimulus similarity."

Psychological Review 96.3 (1989): 433-458.
28. Duffy V. G. "Handbook of Digital Human Modeling: Research for Applied

Ergonomics and Human Factors Engineering." Boca Raton, FL, USA: CRC
Press, 2008.
29. Edwards W. D. "Dynamic decision theory and probabilistic information

processing." Human Factors 4.2 (1962): 59-73.
30. Eng K., Lewis R. L., Tollinger I., Chu A., Howes A. and Vera A. "Generating
Automated Predictions of Behavior Strategically Adapted To Specific
Performance Objectives." ACM/SIGCHI Conference on Human Factors in
Computing Systems (CHI) 2006. 621-630.
31. Faye E. "The effect of the eye condition on functional vision." Clinical low
vision Ed. Faye E. Boston, USA: Little, Brown and Company, 1980. 172-189.
32. Field A. "Discovering Statistics Using SPSS." SAGE Publications Ltd., 2009.
33. Findlay J. M. "Saccade Target Selection during Visual Search." Vision Research
37.5 (1997): 617-631.
34. Findlay J. M. "Programming of Stimulus-Elicited Saccadic Eye Movements."

Eye Movements and Visual Cognition: Scene Perception and Reading Ed.
Rayner K. New York, USA: Springer-Verlag, 1992. 8-30.
190
Bibliography
35. Fitts P.M. "The Information Capacity of The Human Motor System In
Controlling The Amplitude of Movement." Journal of Experimental Psychology
47 (1954): 381-391.
36. Fleetwood M. F. and Byrne M. D. "Modeling icon search in ACT-R/PM."

Cognitive Systems Research 3.1 (2002): 25-33.
37. Fleetwood M. F. and Byrne M. D. "Modeling the Visual Search of Displays: A

Revised ACT-R Model of Icon Search Based on Eye-Tracking Data." Human-
Computer Interaction 21.2 (2006): 153-197.
38. Flowers K. A. "Visual 'Closed-Loop' And 'Open-Loop' characteristics of

Voluntary Movement in Patients With Parkinsonism And Intention Tremor."
Brain 99 (1976): 269-310.
39. Gajos K. Z., Wobbrock J. O. and Weld D. S. "Automatically generating user

interfaces adapted to users' motor and vision capabilities." ACM Symposium on
User Interface and Software Technology (UIST) 2007. 231-240.
40. Gan K. C. and Hoffmann E. R. "Geometrical conditions for ballistic and visually
controlled movements." Ergonomics 31 (1988): 829-839.
41. Gnanayutham P., Bloor C. and Cockton G. "Discrete Acceleration and

Personalised Tiling as Brain-Body Interface Paradigms for Neurorehabilitation."
ACM/SIGCHI Conference on Human Factors in Computing Systems (CHI)
2005. 261-270.
42. Gray W. D. and Sabnani H. "Why you can't program your VCR, or, predicting
errors and performance with production system models of display-based action."
Conference Companion On Human Factors In Computing Systems in
1994. 79-80.
43. Gray W., Young R.M. and Kirschenbaum S. "Introduction to this special issue
on cognitive architectures and human-computer interaction." Human-Computer
Interaction 12 (1997): 301-309.
44. Griffiths T. L., Kemp C. and Tenenbaum J. B. "Bayesian Models of Inductive

Learning." Tutorial at the Annual Meeting of the Cognitive Science Society
2008. 2665
191
Bibliography
45. Gump A., Legare M. and Hunt D. L. "Application of Fitts' Law to individuals
with cerebral palsy." Perceptual and Motor Skills 94 (2002): 883-895.
46. "Gus Scanning Cursor." Available At: www.Turningpointtechnology.Com/

Software/Gs/Scanning Cursor.htm, Accessed on 21st May 2007
47. Hampson P. J. and Moris P. E. "Understanding Cognition.” Oxford, UK:

Blackwell Publishers Ltd., 1996.
48. Hick W.E. "On the rate of gain of information." Journal of Experimental
Psychology 4 (1952): 11-26.
49. Hill K. and Romich B. "A Rate Index for Augmentative and Alternative
Communication." Available At: http://www.AACinstitute.Org/Resources/
Methodsandtools/2002rateindex/ Paper.html, Accessed on 21st May 2007
50. Hoffmann E. R. and Sheikh I. "Finger width corrections in Fitts' Law:

Implications for speed-accuracy research." Journal of Motor Behavior 24 (1991):
259-262.
51. Holzinger A. "Finger Instead of Mouse: Touch Screens as a means of enhancing

Universal Access." Universal Access, Theoretical Perspectives, Practice, and
Experience. LNCS - 2615 Ed. Carbonell N. and Stephanidis C. Springer-Verlag
Ltd., 2003. 387-397.
52. Holzinger A., Höller M.,Schedlbauer M. and Urlesberger B. "An Investigation of

Finger versus Stylus Input in Medical Scenarios." IEEE International Conference
on Information Technology Interfaces 2008. 433-438.
53. Hornof A. J. and Kieras D. E. "Cognitive Modeling Reveals Menu Search Is

Both Random And Systematic." ACM/SIGCHI Conference on Human Factors in
54. Horvitz E., Breese J., Heckerman D., Hovel D. and Rommelse K. "The Lumiere
Project: Bayesian User Modeling for Inferring the Goals and Needs of Software
Users." Available at: http://research.microsoft.com/en-
us/um/people/horvitz/lumierehtm, Accessed on 28th October 2009
55. Howes A., Vera A., Lewis R.L. and Mccurdy, M. "Cognitive Constraint
Modeling: A Formal Approach To Reasoning About Behavior." Annual meeting
of the Cognitive Science Society Lawrence Erlbaum Associates, 2004.
192
Bibliography
56. Hwang F., Langdon P.M., Keates S., Clarkson P.J., and Robinson P. "Cursor
Characteristics And Haptic Interfaces For Motor-Impaired Users." Cambridge
Workshop on Universal Access and Assistive Technology 2002. 87-96.
57. "IBM Web Adaptation Technology." Available at: http://www.webadapt.org/,

Accessed on 22nd May 2007
58. "Inclusive Design Toolkit." Available at: http://www-

edc.eng.cam.ac.uk/betterdesign/ downloads/visualsim.html, Accessed on 27th
March, 2008
59. Itti L. and Koch C. "Computational Modelling of Visual Attention." Nature

Reviews, Neuroscience 2 (2001): 1-10.
60. John B. E. and Kieras D. "The GOMS Family of User Interface Analysis
Techniques: Comparison And Contrast." ACM Transactions on Computer
Human Interaction 3 (1996): 320-351.
61. Johnson P. "Human Computer Interaction: psychology, task analysis and

software engineering." McGraw Hill Book Company, 1992.
62. Johnson-Laird P.A. "The Computer and The Mind." Cambridge, MA, USA:
Harvard University Press, 1988.
63. Jonides J. "Voluntary versus automatic control over the mind's eye's movement."
Attention and performance Ed. Long J. B. and Baddeley A. D. Hillsdale, NJ,
USA: Erlbaum, 1981. 187-203.
64. Kaiser P. and Boynton R. "Human color vision." Optical Society of America,
1996.
65. Kaplan R. J. "Physical medicine and rehabilitation review." MacGraw Hill Book
Company, 2006.
66. Keates S. and Clarkson J. "Countering Design Exclusion An Introduction To

Inclusive Design." UK: Springer-Verlag London Ltd., 2004.
67. Keates S. and Trewin S. "Effect of Age And Parkinson's Disease On Cursor
Positioning Using A Mouse." ACM/SIGACCESS Conference on Computers and
Accessibility (ASSETS) 2005. 68-75.
193
Bibliography
68. Keates S., Clarkson J. and Robinson P. "Investigating The Applicability of User
Models For Motion Impaired Users." ACM/SIGACCESS Conference on
Computers and Accessibility (ASSETS) 2000. 129-136.
69. Keates S., Trewin S. and Paradise J. "Using Pointing Devices: Quantifying
Differences Across User Groups." International Conference on Universal Access
in Human-Computer Interaction 2005.
70. Kennedy P. R., Bakay R. A., Moore M. M., Adams K. and Goldwaithe J. "Direct
Control of a Computer from the Human Central Nervous System." IEEE
Transactions on Rehabilitation Engineering 8.2 (2000): 198-203.
71. Kieras D. and Meyer D. E. "An Overview of The EPIC Architecture For
Cognition And Performance With Application to Human-Computer Interaction."
Human-Computer Interaction 12 (1990): 391-438.
72. Kieras D. E. "Fidelity Issues In Cognitive Architectures For HCI Modelling: Be

Careful What You Wish For." International Conference on Human Computer
Interaction 2005.
73. Kieras D. "GOMS Modeling of User Interfaces Using NGOMSL." Conference

Companion On Human Factors In Computing Systems in ACM/SIGCHI
Conference on Human Factors in Computing Systems 1994. 371-372.
74. Kieras D. E., Wood S. D., Abotel K. and Hornof A. "GLEAN: A Computer-
Based Tool For Rapid GOMS Model Usability Evaluation of User Interface
Designs." ACM Symposium on User Interface and Software Technology (UIST)
1995. 91-100.
75. Laird J.E., Rosenbloom P.S. and Newell A. "Towards chunking as a general
learning mechanism." National Conference on Artificial Intelligence at Austin,
TX: Morgan 1984. 188-192.
76. Lallement Y. and Alexandre F. "Cognitive Aspects of Neurosymbolic

Integration." Connectionist-Symbolic Integration Ed. Sun R. and Alexandre F.
London, UK: Lawrence Erlbaum Associates, 1997.
77. Langolf G. D., Chaffin D. B. and Foulke J. A. "An investigation of Fitts' Law
using a wide range of movement amplitudes." Journal of motor behaviour 8
(1976): 113-128.
194
Bibliography
78. Lesher G. W. , Rinkus G. J., Moulton B. J. and Higginbotham D. J. "Logging

And Analysis of Augmentative Communication." Annual Conference of RESNA
2000.
79. Lesher G. W., Higginbotham D.J. and Alsofrom B. "Acquisition of Scanning

Skills: The Use of An Adaptive Scanning Delay Algorithm Across Four
Scanning Dispays." Annual Conference of RESNA 2002.
80. Li P. "Computational Modeling of Bilingualism." Workshop at the Annual.

Meeting of the Cognitive Science Society 2006. 2659
81. Luck S. J., Chelazzi L., Hillyard S. A. and Desimone R. "Neural Mechanisms of
Spatial Selective Attention In Areas V1, V2, And V4 of Macaque Visual
Cortex." Journal of Neurophysiology 77.1 (1997): 24-42.
82. Mackenzie I. S. "Motor Behaviour Models For Human-Computer Interaction."

HCI Models, Theories, And Frameworks: Toward A Multidisciplinary Science
Ed. Carroll J. M. San Francisco, USA: Morgan Kaufmann, 2003. 27-54.
83. Majaranta P. and Raiha K. "Twenty Years of Eye Typing: Systems and Design
Issues." Eye Tracking Research & Application 2002. 15-22.
84. Marr D. C. "Visual Information Processing: the structure and creation of visual
representations." Philosophical Transactions of the Royal Society of London
290.1038 (1980): 199-218.
85. Mathiowetz V., Weber K., Volland G. and Kashman N. "Reliability and validity
of hand strength evaluation." Journal of Hand Surgery 9A (1984): 222-226.
86. Mccoy K. "Simple NLP Techniques For Expanding Telegraphic Sentences."

Natural Language Processing for Communication Aids, an ACL/EACL '97
Workshop 1997. 17-22.
87. Mccoy K. and Demasco P. W. "Some Interface Issues In Developing Intelligent

Communication Aid for Disabled." Intelligent User Interfaces 1997. 163-170.
88. Mcmillan W. W. "Computing For Users With Special Needs And Models of
Computer-Human Interaction." ACM/SIGCHI Conference on Human Factors in
195
Bibliography
89. Meyer D. E., Abrams R. A., Kornblum S., Wright C. E. and Smith J. E. K.
"Optimality in human motor performance: Ideal control of rapid aimed
movements." Psychological Review 95 (1988): 340-370.
90. Moran T.P. "Command Language Grammar: A Representation For The User
Interface of Interactive Computer Systems." International Journal of Man-
Machine Studies 15.1 (1981): 3-50.
91. Motomura Y., Yoshida K. and Fujimoto K. "Generative user models for
Adaptive Information Retrieval." IEEE International Conference on Systems
2000.
92. Moynahan A. J. and Mahoney R. M. "Single Switch Mouse Control Interface."

Annual Conference of RESNA 1996.
93. Neisser U. "Cognition and reality: principles and implications of cognitive

psychology." San Francisco, USA: WH Freeman, 1976.
94. Newell A. "Unified Theories of Cognition." Cambridge, MA, USA: Harvard

University Press, 1990.
95. Newell A. "You can’t play 20 questions with nature and win." Projective
comments on the papers of this symposium. Pittsburgh, Pa: Carnegie Mellon
University, Department of Computer Science. 1973.
96. Newell A. and Simon H. A. "GPS, A Program That Simulates Human Thought."
Cambridge, MA, USA: MIT Press, 1995.
97. Nilsen E. L. "Perceptual-motor Control in Human-Computer Interaction." Ann

Arbor, MI, USA: The Cognitive Science and Machine Intelligence Laboratory,
the University of Michigan, 1992.
98. Norcio F. "Adaptive Interfaces: Modelling Tasks and Users." IEEE Transaction
on Systems, Man, Cybernetics 19.2 (1989): 399-408.
99. Nixon M. and Aguado A. "Feature Extraction and Image Processing." Oxford,
UK: Elsevier Ltd., 2002.
100. Norcio F. and Chen Q. "Modeling User's with Neural Architecture." International
Joint Conference on Neural Networks 1992. 547-552.
196
Bibliography
101. Ntoa S., Savidis A. and Stephanidis C. "Fastscanner: An Accessibility Tool For
Motor Impaired Users." International Conference on Computers Helping People
with Special Needs, LNCS-3118, Springer-Verlag 2004. 796-803.
102. O'neill P., Roast C. and Hawley M. "Evaluation of Scanning User Interfaces
Using Real Time Data Usage Logs." ACM/SIGACCESS Conference on
Computers and Accessibility (ASSETS) 2000. 137-141.
103. O'Regan K. J. "Optimal Viewing position in words and the Strategy-Tactics

Theory of Eye Movements in Reading." Eye Movements and Visual Cognition:
Scene Perception and Reading Ed. Rayner K. New York, USA: Springer-Verlag,
1992. 333-355.
104. Oka N. "Hybrid cognitive model of conscious level processing and unconscious
level processing." IEEE International Joint Conference on Neural Networks
1991. 485-490.
105. Pasero R., Richardet N. and Sabatier P. "Guided Sentences Composition for
Disabled People." Applied Natural Language Processing 1994. 205-206.
106. Payne S.J. and Green T.R.G. "Task-Action Grammars: A Model of Mental
Representation of Task Languages." Human-Computer Interaction 2 (1986): 93-
133.
107. Peck V. A. and John B. E. "Browser-Soar: a computational model of a highly

interactive task." ACM/SIGCHI conference on Human factors in computing
systems (CHI) 1992. 165-172.
108. Petrie H., Hamilton F., King N. and Pavan P. "Remote Usability Evaluations
With Disabled People." ACM/SIGCHI Conference on Human Factors in
109. Phillips N. "Graphical modification for partially sighted gamer accessibility."

Computer Laboratory, Tripos Part II project University of Cambridge, 2009.
110. Polit A. and Bizzi E. "Processes controlling arm movements in monkeys."

Science 201 (1978): 1235-1237.
111. Privitera C. M. and Stark L. W. "Algorithms for defining Visual Regions-of-

Interests: Comparison with Eye Fixations." IEEE Transactions on Pattern
Analysis and Machine Intelligence 22.9 (2000): 970-982.
197
Bibliography
112. Reisner P. "Formal Grammar And Human Factors Design of An Interactive

Graphics System." IEEE Transactions On Software Engineering 7 (1981): 229-
240.
113. Reynolds J. H. and Desimone R. "The Role of Neural Mechanisms of Attention

in Solving The Binding Problem." Neuron 24.1(19-29) (1999): 111-125.
114. Rieman J. and Young R. M. "A dual-space model of iteratively deepening

exploratory learning." International Journal of Human-Computer Studies 44
(1996): 743-775.
115. Rizzo A., Marchigiani E. and Andreadis A. "The AVANTI Project: Prototyping
And Evaluation With A Cognitive Walkthrough Based On The Norman's Model
of Action." Designing interactive systems: processes, practices, methods, and
techniques 1997. 305-309.
116. Rosandich R. G. "Intelligent Visual Inspection using artificial neural networks."

London, UK: Chapman and Hall, 1997.
117. Rosenbaum D. A. "Human Motor Control." California, USA: Academic Press

Inc., 1991.
118. Ross S. M. "Probability Models For Computer Science.” Elsevier Ltd., 2002.
119. Ross T. J. "Fuzzy Logic with Engineering Application." McGraw-Hill Inc.,

1997.
120. Salvucci D. D. "An integrated model of eye movements and visual encoding."
Cognitive Systems Research (2001):
121. Salvucci D. D. and Lee F. J. "Simple cognitive Modelling in a complex cognitive

architecture." ACM/SIGCHI Conference on Human Factors in Computing
Systems (CHI) 2003. 265-272.
122. Schedlbauer M. and Heines J. "Selecting While Walking: An Investigation of

Aiming Performance in a Mobile Work Context." Americas Conference on
Information Systems 2007.
123. Schedlbauer M. J., Pastel R. L. and Heines J. M. "Effect of Posture on Target

Acquisition with a Trackball and Touch Screen." International Conference on
Information Technology Interfaces 2006. 257-262.
198
Bibliography
124. Schmidt R. A., Zelaznik H. N. and Frank J. S. "Ssources of Inaccuracy in Rapid

Movement." Information Processing in Motor Control and Learning Ed.
Stelmach G. E. London, UK: Academic Press Inc. Ltd., 1978. 183-203.
125. Scholtes V.A. B., Becher J. G., Beelen A. and Lankhorst G. J. "Clinical
assessment of spasticity in children with cerebral palsy: a critical review of
available instruments." Developmental Medicine and Child Neurology 48
(2006): 64-73.
126. "See it Right." UK: RNIB, 2006.
127. Shah K., Rajyaguru S., Amant R. S. and Ritter F. E. "Connecting a Cognitive
Model to Dynamic Gaming Environments: Architectural and Image Processing
Issues." International Conference on Cognitive Modeling 2003. 189-194.
128. Shneiderman B. "Designing The User Interface - Strategies For Effective

Human-Computer Interaction." Pearson Education, 2001.
129. Simpson R. C. and Koester H. H. “Adaptive One-Switch Row-Column

Scanning." IEEE Transactions On Rehabilitation Engineering 7.4 (1999): 464-
473.
130. Smits-Engelsman B. C. M., Rameckers E. A. A. and Duysens J. "Children with

congential spastic hemiplegia obey Fitts' Law in a visually guided tapping task."
Journal of Experimental Brain Research 177 (2007): 431-439.
131. Stephanidis C. and Constantinou P. "Designing Human Computer Interfaces For

Quadriplegic People." ACM Transactions on Computer-Human Interaction 10.2
(2003): 87-118.
132. Stephanidis C., Paramythis A., Sfyrakis M., Stergiou A., Maou N., Leventis
A.,Paparoulis G. and Karagiannidis C. "Adaptable And Adaptive User Interfaces
for Disabled Users in the AVANTI Project." Intelligence in Services and
Networks, LNCS-1430, Springer-Verlag 1998. 153-166.
133. Stephanidis C., Paramythis A., Karagiannidis C. and Savidis A. "Supporting

Interface Adaptation: the AVANTI Web-Browser." 3rd ERCIM Workshop on
User Interfaces for All 1997.
134. Steriadis C. E. and Constantnou P. "Using The Scanning Technique to make an

Ordinary Operating System Accessible to Motor-Impaired Users. The
199
Bibliography
Autonomia System." ACM/SIGACCESS Conference on Computers and

Accessibility (ASSETS) 2002.
135. Strauss T. J. and Mirman D. and Magnuson J. S. "Speech Perception: Linking

Computational Models and Human Data." Tutorial at the Annual Meeting of the
Cognitive Science Society 2006. 2669
136. The GUIDE Project, Available at: http://hcim.di.fc.ul.pt/wiki/GUIDE Accessed

on 2nd October 2009
137. "The Scanbuddy System." Available At: www.Ahf-Net.Com/Scanbuddy.htm,

Accessed on 21st May 2007
138. "The Super-Switch." Available at: http://rjcooper.com/super-switch/index.html,

Accessed on 1st July, 2007
139. Tollinger I., Lewis R. L., McCurdy M., Tollinger P., Vera A., Howes A. and
Pelton L. "Supporting Efficient Development of Cognitive Models At Multiple
Skill Levels: Exploring Recent Advances In Constraint-Based Modeling."
2005. 411-420.
140. Tovee M. J. "An introduction to Visual System." Cambridge, UK: Cambridge

University Press, 2008.
141. "Tobii Eye Tracker." Available at http://www.imotionsglobal.com/Tobii+X120+

Eye-Tracker.344.aspx Accessed on 12th December 2008
142. Treisman A. and Gelade G. "A Feature Integration Theory of Attention."

Cognitive Psychology 12 (1980): 97-136.
143. Trewin S. and Pain H. "Keyboard And Mouse Errors Due To Motor
Disabilities." International Journal of Human-Computer Studies 50.2 (1999):
109-144.
144. Viénot F., Brettel H. and Mollon J. D. "Digital video colour maps for checking
the legibility of displays by dichromats." Color Research and Application 24.4
(1999): 243-252.
145. "Version 6.24 of CLIPS." Available At: http://www.Ghg.Net/Clips/

Version624.html, Accessed on 21st May 2007
200
Bibliography
146. "Vision Simulator." Available at: http://www.my-vision-simulator.com/,

Accessed on 27th March, 2008
147. "Visual Impairment Simulator." Available at: http://cita.rehab.uiuc.edu/software/

vis/index.php, Accessed on 27th February, 2008
148. Waller S., Langdon P., Cardoso C. and Clarkson P.J. "Calibrating capability loss
simulators to population data." Contemporary Ergonomics Ed. Bust P. Taylor
and Francis Ltd., 2008.
149. Wang H. C. "Modeling Idea Generation Sequences Using Hidden Markov

Models." The Annual Meeting of the Cognitive Science Society 2008. 107-112.
150. Warrick A. and Kaul S. "Their Manner of Speaking." Calcutta, India: Indian
Institute of Cerebral Palsy, 2002.
151. "WAI Guidelines and Techniques." Available at http://www.w3.org/WAI/guid-

tech.html, Accessed on 4th December, 2009
152. "WHO website." Available at http://www.who.int/ageing/en, Accessed on 18th

September, 2009
153. Wobbrock J. O. and Gajos K. Z. "A Comparison of area pointing and goal
crossing for people with and without motor impairments." International
ACM/SIGACCESS Conference on Computers and Accessibility (ASSETS)
2007. 3-10.
154. "World Bank Website." Available at http://web.worldbank.org, Accessed on

18th September, 2009
155. Young R.M., Green T.R.G. and Simon T. "Programmable User Models For
Predictive Evaluation of Interface Designs." ACM/SIGCHI Conference on
Human Factors in Computing Systems (CHI) 1989. 15-19.
201
View publication stats

Thesis

Uploaded by

Copyright:

Available Formats

You might also like

Thesis

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Thesis

Uploaded by

Copyright:

Available Formats

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

Simulating HCI for special needs

Article in ACM SIGACCESS Accessibility and Computing · September 2007

HCI for Automotive View project

Human Robot Interaction : A Literature Survey View project

The user has requested enhancement of the downloaded file.

A dissertation submitted to the University of Cambridge Computer Laboratory

Date:…………13/3/10………….... Pradipta Biswas…..

Computers offer valuable assistance to people with physical disabilities. However

Chapter 2 Literature survey 20

Chapter 3 Perception model 35

Chapter 4 Cognitive model 72

Chapter 6 Applications 130

6.1. Introduction 130

Chapter 7 Conclusions 171

7.1. Introduction 171

Glossary & Abbreviations 180

Figures Page No.

Tables Page No.

It is possible to develop systematic models of human computer interaction

1.2. Proposed solution

Figure 1-1. Use of the simulator

1.3. Development methodology

The models draw on earlier work on cognitive science, studies of different

Input parameters (e.g. type and extent of disability of users)

Literature survey on cognitive

Figure 1-2. Development methodology

There are two phases in model development

Calibration through exploration

1.4. Thesis Structure

I discuss earlier research on human behaviour simulation in Chapter 2. At the end of

My user model consists of three components - a perception model, a cognitive model

Chapter 6 evaluates the complete simulator. I present two applications of the

Finally I conclude and point to a few possible extensions of my research in Chapter 7.

1. P. Biswas and P. Robinson, A New Screen Scanning System based on

Addressing a large variety of users is always a challenge to designers due to diverse

Research on simulating user behaviour to predict machine performance was originally

The Command Language Grammar [Moran, 1981] developed by Moran at Xerox

2.2. The GOMS family of models

Kieras developed a structured language representation of GOMS model, called

2.3. Cognitive architectures

However, there are certain aspects of human cognition (such as perception,

The EPIC (Executive-Process/Interactive Control) [Kieras and Meyer, 1990]

The CORE system (Constraint-based Optimizing Reasoning Engine) [Howes, Vera

There exist additional cognitive architectures (such as Interactive Cognitive

2.4. Grammar based models

o Operations by Terminal symbols

This type of modelling is quite useful to compare different interaction techniques.

2.5. Application specific models

Figure 2-1. Application Specific User Models

Another problem of existing modelling approaches stems from issues related to

o Web Browsers [Stephanidis, 1998 ; IBM Web Adaptation Technology, 2008]

o Augmentative and alternative communication aids ([Alm, Arnott and Newell,

Most of these works concentrate on a particular application or a set of users, which

There is not much reported work on systematic modelling of assistive interfaces.

o Simulate HCI of both able-bodied and disabled users.

KLM for Disabled

The perception model simulates the visual perception of interface

Application Model Interface Model User Model