Dissertation - Viktoria Spanou
Viktoria Spanou
This dissertation was submitted in part fulfilment of requirements for the degree of MSc
Information Management
August 2020
Abstract
This dissertation explores the use of Perceptual Speed (PS) in the STEM hiring process.
PS has been associated with a variety of cognitive traits as well as
performance in certain industries. PS is defined as the cognitive ability of
an individual to compare, scan and find symbols or numbers quickly and accurately. This study
aims to determine whether or not high PS is an indicator of high performance in the context of
computer programming. An online experiment was conducted comprising two PS tests and one
programming work task in the attempt to identify a correlation between the two. Analysis of the
results demonstrated a relationship between one of the PS tests and work task recall values.
The results indicate that other variables (experience, device used) also affect an individual’s
performance in the administered task. Further research is necessary to identify whether PS can
be used to determine if a candidate is an appropriate fit for a STEM role.
Declaration
This dissertation is submitted in part fulfilment of the requirements for the degree of
MSc of the University of Strathclyde.
I declare that this dissertation embodies the results of my own work and that it has
been composed by myself. Following normal academic conventions, I have made due
acknowledgement, to the work of others.
I declare that I have sought, and received, ethics approval via the Departmental Ethics
Committee as appropriate to my research.
I declare that the word count for this dissertation (excluding title page, declaration,
abstract, acknowledgements, table of contents, list of illustrations, references and appendices)
is .
3.1. Further research: The relationship between PS and other cognitive tasks
4.3. Perceptual Speed test types, choosing the right ones
6.2. Scoring
8. References
Table of figures
Figure 1: An example of a test for perceptual speed
Figure 14: Work Task Precision and Recall per Device
Figure 15: Page 1 of the administered programming work task (Desktop Site)
Figure 16: Page 1 of the administered programming work task (Mobile Site)
Table of tables
Table 1: Hypotheses
Table 8: Correlation between Work Task Precision/Recall and other variables
The next section (Section 2) outlines the previous work undertaken in the Perceptual
Speed space. The remainder of the dissertation is structured as follows: Section 3 presents the
use of PS tests in the hiring process; Section 4 presents the hypotheses investigated in this
dissertation and describes the various types of PS tests and the rationale in selecting the
appropriate test; and Sections 5, 6 and 7 present the methodology, the results and analysis, and
the conclusions (including discussion and recommendations for future research), respectively.
2. A history of Perceptual Speed Studies and
Applications
1 It may be noted that PS tests were used prior to 1976, mainly to measure the work task efficiency
of administrative hires, as PS was a validated proxy for clerical speed (Gael, Grant and Ritchie, 1975).
Figure 2: Number Comparison Test
Extract from Ekstrom’s kit (page 123): PS test with instructions, example and marking criteria
With the exception of digitisation, which began in the 1980s, the administered PS tests
have not varied greatly since the early applications. A comparison was conducted between the
computerised and paper versions of various aptitude tests to confirm whether or not the
computerised test was an “accurate representation of the test battery” (Henly et al., 1989). This
comparison validated Computerised Adaptive Versions of aptitude tests and as a result,
broadened their scope.
3. Contextual relevance: Perceptual Speed in the hiring
process
The use of PS tests has evolved over the last twenty years and they are currently being
used together with cognitive tests in the context of job hiring. Employers are continually seeking
more criteria to determine a candidate’s suitability and fit for a role and the
organisation as a whole, in addition to how well they will perform after hiring. Moreover, due to
the increased number of highly qualified applicants for each open position, this additional testing
adds another layer of classification and contributes to reducing costs throughout the entire hiring
process. As this process is often outsourced, this further explains the success of companies
like the aforementioned Thomas International Ltd.
Ackerman further explored Perceptual Speed tests by relating them to skill acquisition,
investigating whether or not he could identify individual differences based on PS (Ackerman
and Beier, 2007). Ackerman’s research aimed to determine if an individual with high Perceptual
Speed was able to more quickly adapt to a new role than an equally qualified candidate with
lower PS. Although this study took place many years after the first instance of digitisation, there
was still very little research into the benefits of computerised testing. Therefore, in all
applications of PS testing prior to this study, one limitation of administering the PS tests was the
effort and time-intensive nature of the test scoring, as the tests had to be administered in small
groups and then graded individually.
In this study, skill acquisition is described using the three phases
defined by Fitts and Posner (Fitts and Posner, 1967). The first phase is general cognitive
abilities (verbal, maths, spatial), the second phase refers to PS and the third phase
encompasses psychomotor abilities. This model is employed in this study due to Ackerman’s
previous successful use of and endorsement of it (Ackerman and Cianciolo, 2000). Ackerman
concluded that a correlation exists between participant performance on speed tests and their
ability to acquire skills. This correlation increases in phase one and two and declines in phase
three. Ackerman was able to highlight the usefulness of testing PS as a proxy for predicting
work task performance (for certain job types) due to the ease with which PS tests can be
administered and the variety of tests that are available.
Johnson and Deary conducted a study on information processing speed in which they
measured PS reaction and inspection time compared to general cognitive ability. The sample
population chosen for this study were over the age of 70, which is important as it has been
proven that PS decreases with age (Ghisletta and Lindenberger, 2003). The researchers
administered 18 different ability and speed tests. The study concluded that high spatial, verbal
and perceptual speed were more relatable to information processing speed as opposed to
general intelligence (Johnson and Deary, 2011).
There are many case studies outlining the benefits of high perceptual speed in
information retrieval tasks. In preparation for the 17th Annual Conference on research and
development in Information Retrieval (IR), Allen conducted an experiment on the relation
between PS, IR performance and learning (Allen, 1994). He first identified the two learning
components in IR: general search patterns and the specific topic related process. The latter
stage is the one most influenced by PS. His sample was comprised of 100 students from the
University of Illinois: after answering demographic-related questions and completing a PS test,
their instruction was to read a stimulus article and perform a search on this topic (as if they
intended to write a paper on it). Two different search systems were randomly assigned to the
participants (one presented the references in usual order and the other prioritised subject
headings), while all other factors remained the same. Although this experiment verified the
correlation between high PS and learning, this correlation was only present in the users
assigned to the system “designed to enable fast scanning of subject descriptors”, i.e. the system
prioritising subject headings (Allen, 1994). This experiment indicates the importance of the
design of a system and its relation to usability.
This experiment would not have been possible without the use of technology. The tests
were all administered online, which allowed diversification of the sample and simplified scoring
and data analysis.
In the context of this experiment neither speed nor accuracy was prioritised.
Examinees were briefed on the concept of Perceptual Speed and the purpose of the study, and the
tests were timed based on their difficulty.
Before delving into the experiment methodology and defining the research problem at
hand, I would like to address the concept of performance, due to its ambiguity and contextual
variance. Performance is defined as “how well a person does a piece of work or an activity”
(“Cambridge English Dictionary”). In the workplace, this definition does not change, although
the sector, job role and department play a major role in how performance is measured. O’Neill
addresses the ambiguous subject of measuring workplace performance in office environments.
He defines his own model for performance due to “the dynamic characteristics and biological
metaphor of organisations” (O’Neill, 2016). O’Neill highlights the importance of the organisation
in the measure of performance, which is the reason the model is more process than object
oriented, as is illustrated in the application of case studies.
In accordance with these definitions and in the context of this research problem,
performance will be quantified based on the presented work task involving visual processing
and accuracy.
To respond to these objectives, the dissertation is structured as follows:
- The remainder of Section 4 will outline the hypotheses investigated and the PS tests used
and the rationale in selecting them
- Section 5: methodology
- Section 6: results and analysis
- Section 7: conclusion, comprising a discussion on limitations, achievements, and
recommendations for future research
4.2. Hypotheses
The following hypotheses will be explored throughout
the analysis. They have been derived from the research conducted on PS as well as previous
studies conducted in this space.
Table 1: Hypotheses
The study hypotheses
Null Hypothesis: There is no correlation between PS and Performance. | The results from the survey described above were analysed using a multilinear regression. The data was also explored for correlations.
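The correlation part of this analysis can be illustrated with a minimal sketch. It assumes a Pearson correlation coefficient, which is one common choice and not necessarily the exact statistic used in the study; the function name `pearson_r` and the sample values are hypothetical and are not study data. The multilinear regression itself is not reproduced here.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical PS scores and work task recall values (illustrative only).
ps_scores = [30, 45, 50, 62, 75]
recall_values = [0.20, 0.35, 0.30, 0.50, 0.60]
r = pearson_r(ps_scores, recall_values)
print(f"r = {r:.2f}")
```

Under the null hypothesis, a coefficient close to 0 would be expected; a value close to 1 or -1 would indicate a strong positive or negative linear relationship.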
As previously stated, all tests were administered online which limits the test options
due to the author’s programming experience. Table 2 defines several previously mentioned
existing applications or uses for certain tests.
A trend identified across previous applications is variety, i.e. every application includes
both a numerical and an alphabetical test. For this study, a Finding As test will be administered
in conjunction with a Summing to 10 test.
In a study focused on the connection between Information Retrieval and PS, Foulds,
Azzopardi and Halvey recently identified the limitations of the current findings and exposed a lack
of standardisation of PS test results. They concluded that comparison between study outcomes
is not possible due to the lack of a harmonised scale to score results of the test (Foulds,
Azzopardi and Halvey, 2020). This research paper establishes its own scale of high versus low
perceptual speed, based on the results of the experiment, where a high score in comparison to
the remainder of the scores indicates high perceptual speed.
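One way to picture such a relative scale is sketched below. The cut-off is an assumption: the text only states that scores are judged relative to the rest of the sample, so the sample median is used here purely for illustration, and the function name `label_relative_ps` is hypothetical.

```python
from statistics import median


def label_relative_ps(scores):
    """Label each score 'high' or 'low' relative to the sample median.

    The median cut-off is an illustrative assumption, not the study's
    documented threshold.
    """
    cutoff = median(scores)
    return ["high" if score > cutoff else "low" for score in scores]


# Hypothetical Finding As scores (illustrative only).
labels = label_relative_ps([23, 41, 52, 60, 78])
print(labels)
```

Because the scale depends on the sample, the same raw score could be labelled differently in another study, which is the comparability problem raised by Foulds, Azzopardi and Halvey.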
5. An Agile Methodology
This section outlines the dissertation methodology and sampling methods, followed by a
brief description of contextual limitations encountered. A description of the demographic collection
as well as the three tests will follow. Finally, the software methodology and architecture are
explained, including database creation and management and the testing methods employed.
To organise and structure this dissertation project, an Agile development method was
implemented. The Agile method is centred around four core values:
Sprint 1 (29th of May to 26th of June)
✓ Obtain ethical approval from the Computer Science (CS) ethics department
✓ Write and submit introduction for feedback
✓ Conduct background research and establish a sourcing method for the project
✓ Research various PS tests and select which ones to administer for this experiment
These sprints were designed to be flexible; however, they are mainly sequential tasks
(with the exception of Sprints 3 and 4), i.e. one must be completed to proceed to the next.
5.2. Sampling
In order to collect sufficient data for analysis, this study aimed to recruit a minimum of
50 participants. Participants were recruited by stratified, convenience sampling. Stratified
sampling targets a subset of the population possessing a common attribute (Sincero, 2012), in
this case knowledge and frequent use of Python. Convenience sampling is a form of
non-probabilistic sampling characterised by convenience: using family, friends or colleagues that fit
the sampling criteria (Albert, Tullis and Tedesco, 2010). This was mainly due to the skillset
required to complete the full study. Participants were required to be familiar with Python 3. I
used my personal network, by sharing the study details on social media. This broadened the
scope significantly, seeing as many people were able to share the survey with their own network.
The study was also shared with the Computer Science and Data Analytics students at Strathclyde
University. Due to the alternate hypothesis involving the experience factor in programming
speed, the survey needed to reach a more experienced group of programmers. To facilitate
this, the website was shared on 4 closed online forums that specialise in Python and more
specifically Python for Machine Learning.
However, alongside newly imposed “work from home” procedures and economic
turmoil across a variety of industries, both organisations were forced to withdraw their
participation. The experiment pivoted to focus on the STEM industry. In lieu of external
performance data, an online debugging challenge was created that mimics a work task.
5.4. Demographic collection
Our hypotheses state the possibility that experience is a better indicator of
programming performance than Perceptual Speed. Demographic collection preceded the tests
to diversify the variables and collect data on the various characteristics of the sample population.
After providing their consent, the programmers were asked to specify their age between four
ranges of approximately 10 years and their gender (male, female, other). In relation to
experience, the participants were asked to specify their overall coding experience as well as
their familiarity and usage of Python specifically (once again with the use of ranges). Finally,
participants specified the device used, to determine if this would affect the outcome.
A desktop or laptop with external mouse was recommended to best emulate a paper-pencil test.
However, this was not a concrete requirement and the website was also available in a mobile
version. The demographic questions were made mandatory for the participants to be able to
proceed to the tests.
The debugging task (see Figure 3) was written using Python 3, due to its increasing
importance in data processing and machine learning. The language has gained immense
popularity over the past few years and it was intended to facilitate a higher participant rate.
Three pages of code were presented, each one independent from the next. The users were
asked to spot the syntax errors within the lines of code. They were instructed to select and
highlight the line containing the error, by clicking on it. They were able to deselect the line, in
case of error. No semantic errors were used, due to their dependence on the packages
imported. Sampling would have proven extremely difficult if the knowledge base were restricted
to Python function. Therefore, various syntax errors were spread across the three pages of
code, along with undefined variable errors. Across a total of 117 lines of code, there were 23
errors. The time limit provided was four minutes, after which the test timed out.
The Finding As test (see Figure 4) was composed of 315 words and was also spread
across three pages. There was a total of 85 words containing the letter A evenly spread out
across all three pages, with 28 on pages one and three, and 29 on page two. The words listed
varied from four to eight characters. Participants were asked to click on the words containing an A,
which selected each word by crossing it out. In case of error, they were able to click on the word
again, to deselect it. The time limit provided was two minutes, after which the test timed out and
they were asked to proceed to the next one.
Figure 4: Finding As Test
Instruction page for the Finding As Perceptual Speed test: on this page participants have the
opportunity to “practice” the test. If they click on a word that does not contain the letter “a”, the
line below the table will read: Correct: 0 Incorrect: 1. The example code is presented exactly
like the timed work task is, in order to familiarise participants with what is expected. In this
practice test there are two correct answers (“glasses” and “boats”) and two incorrect answers
(“monkey” and “yellow”).
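The practice feedback described in this caption can be sketched as follows. This is a minimal illustration that assumes case-insensitive matching on the letter "a"; the function name `score_finding_as` is hypothetical and the website's actual implementation is not shown in this dissertation extract.

```python
def score_finding_as(clicked_words):
    """Return (correct, incorrect) counts for the words a participant clicked.

    A click is correct when the word contains the letter 'a'
    (case-insensitive matching is an assumption).
    """
    correct = sum(1 for word in clicked_words if "a" in word.lower())
    incorrect = len(clicked_words) - correct
    return correct, incorrect


# The four practice words named in the caption above.
practice_words = ["glasses", "boats", "monkey", "yellow"]

correct, incorrect = score_finding_as(["monkey"])
print(f"Correct: {correct} Incorrect: {incorrect}")
```

Clicking only "monkey" reproduces the feedback line quoted in the caption, while clicking all four practice words gives two correct and two incorrect selections.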
The Summing to 10 test (see Figure 5) was a grid of 280 two-digit numbers. Participants
were requested to select the numbers whose two digits sum to 10, for example: 82 → 8+2=10. There were
121 numbers that fell into this category, spread out across the grid. Participants were asked to
select said numbers by clicking on them, which highlighted the selection. They deselected by
clicking again. After 1.5 minutes, the test timed out.
Figure 5: Summing to 10 test
Instruction page for the Summing to 10 Perceptual Speed test: on this page participants have
the opportunity to “practice” the test. If they click on a number that does not sum to 10, the line
below the table will read: Correct: 0 Incorrect: 1. The example code is presented exactly like
the timed work task is, in order to familiarise participants with what is expected. In this practice
test there are four correct answers (“46” on line one, “19” and “64” on line two and “19” on line
three).
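The selection rule for this test can be expressed in a few lines. The sketch below is illustrative only; the function name `digits_sum_to_ten` is hypothetical and does not come from the study's website code.

```python
def digits_sum_to_ten(number):
    """Return True when a two-digit number's digits add up to 10."""
    if not 10 <= number <= 99:
        return False
    tens, ones = divmod(number, 10)
    return tens + ones == 10


# The example from the test description: 82 -> 8 + 2 = 10.
print(digits_sum_to_ten(82))

# All distinct two-digit numbers that satisfy the rule.
targets = [n for n in range(10, 100) if digits_sum_to_ten(n)]
print(targets)
```

Only nine distinct two-digit numbers satisfy the rule, so the 121 target cells in the 280-number grid necessarily repeat these values, as the practice answers ("19" appearing twice) already suggest.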
The Finding As test was first, followed by the work task and finally, the Summing to
10. The reason for this order was to avoid discouraging participants: Finding As is the most relatable and
least demanding of the three tests. The work task was by far the longest, which is why it was
placed in the middle. The experiment concluded with the Summing to 10, the only test which
was one page.
*Preliminary step to Sprint 1: determined the software with which to write, store and
test the code, and which languages were most fitting.
This software architecture (Figure 6) illustrates this
interaction between different system elements. This illustration proved fundamental in assisting
to determine at what stages changes needed to be implemented throughout testing and various
development phases.
Figure 6: Website Architecture Design
Visual representation of the three-tier architecture: the presentation tier controls what the user
sees, the logic tier implements the various website functionalities and the data tier receives and
stores the submitted data.
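The separation of concerns in a three-tier design can be sketched as below. This is not the website's actual code: the language, the in-memory SQLite store, the table layout and all function names are assumptions chosen only to show how the three tiers interact.

```python
import sqlite3


# Data tier: receives and stores the submitted data.
# An in-memory SQLite database stands in for the real store (assumption).
def create_store():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE submissions "
        "(participant TEXT, test TEXT, correct INTEGER, incorrect INTEGER)"
    )
    return conn


def save_submission(conn, participant, test, correct, incorrect):
    conn.execute(
        "INSERT INTO submissions VALUES (?, ?, ?, ?)",
        (participant, test, correct, incorrect),
    )
    conn.commit()


# Logic tier: implements the functionality, here scoring a set of clicks.
def score_clicks(clicked, answer_key):
    correct = len(set(clicked) & set(answer_key))
    incorrect = len(set(clicked) - set(answer_key))
    return correct, incorrect


# Presentation tier: controls what the user sees.
def render_feedback(correct, incorrect):
    return f"Correct: {correct} Incorrect: {incorrect}"


conn = create_store()
correct, incorrect = score_clicks(clicked=[3, 7, 9], answer_key=[3, 9, 12])
save_submission(conn, "p01", "work_task", correct, incorrect)
print(render_feedback(correct, incorrect))
```

Keeping the tiers separate in this way means that a change to the presentation, such as a mobile layout, does not require changes to the scoring logic or to the stored data.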
The web and mobile versions of the website are identical; however, the presentation
varies according to the device in use.
Testing was done incrementally throughout the web creation process. Each webpage
was confirmed to be fully operational and interactive before moving onto the next. In accordance
with the chosen Agile development method, usability testing only began when the survey and
both PS tests were complete. This was pivotal in the creation of the website, due to the fact that
unlike most experiments, there is no supervision and users are not able to ask questions prior
to commencing.
5.10. Functional Testing
Functional testing is a crucial component to the development of any software. This step
is in alignment with Agile development, which focuses on the end user of the software. It is
important to revisit the objectives at hand and ensure the approach to functional testing covers
as many aspects of the project as possible. To ensure compliance with this step in the
development process, a few different methods were applied as follows.
5.10.1.Usability Testing
Usability testing plays a highly important role in any project involving software and web
design. It allows developers to uncover design problems and opportunities for improvement.
Usability testing also reveals aspects related to the behaviour and preferences of the user. The
Agile Method particularly promotes this by ensuring the project plan is designed
around the end user. With an experiment of this nature, it was imperative to receive and
implement user feedback.
There were three instances of usability testing: the first two involved interface testing
and the third comprised a full system test. Interface testing helps to understand user trends and
implement timely changes throughout the development process. The first instance arose in
Sprint Four (Table 4). At this stage of the project, three webpages were fully operational,
including database connectivity. A group of five people were selected to participate and provide
their feedback. At this stage, no programming knowledge was required; however, it is important
to note that the selection of people was based on the target audience and only appropriate and
knowledgeable users participated. The test sample population was chosen via the convenience
method in all three instances. The ages of the participants ranged from 19 to 55; two
were female and the remaining three were male. The second instance appeared in Sprint Five
(Table 4) and was purely centred on the Python work task portion of the survey. This test
evolved significantly during creation, due to its important role in our measurement of work task
performance. User interpretation and clarity were pivotal. To emulate the potential sample as
closely as possible, three male participants were involved, with programming experience
varying from 1 year to 30 years. The third and final instance of usability testing was a full system
test in Sprint Six (Table 4), upon completion of the full web design. There were five participants,
four male and one female, with programming experience varying from 3 to 30 years. This was
an evaluation of the complete system as it was intended to be deployed during the experiment.
User feedback was positive in all instances, which was encouraging. Test and website
comprehensiveness were also confirmed. Certain suggestions were made to promote clarity
and general usability of the form (see Table 5). Additionally, the use of traditional PS test
instructions proved to be outdated and required updating. While requesting feedback and noting
how users were testing the website, changes were also implemented to the demographic
collection, including more questions related to user programming experience and the device
used to take the test.
Finding As | One of the users was able to finish the test before the timer ran out. | The number of words was increased substantially, so as to ensure that it would be impossible to finish the test.
Work Task | Users reported difficulty reading the code. | Decided to emulate the popular Python coding platform Anaconda. This enhanced similarity to an actual work task.
An additional variable to note is the device used to complete the survey (see
Figure 11): 60% of respondents chose to use a desktop or laptop computer, with an external
mouse. Although it was indicated in the survey that this method was recommended, 27% still
chose to complete it on their mobile device and the remaining 13% used their laptop trackpad.
Figure 8: Ages of sample population
Histogram chart plotting age of participants: 34 between the age of 19 and 32, 7 between 32
and 41, 1 between 42 and 51, and 4 over 51.
For the Finding As test, the minimum score reported is 23 and the maximum is 78. For
the Summing to 10 test, the lowest score is 13 and the highest is 80. The standard deviation for
the PS tests is high, meaning our sample population had a diverse variation of Perceptual
Speed, which is representative of the general population. Furthermore, the average score for
each test is equidistant from the minimum and maximum values, which means approximately
half of the sample population has higher PS and the other half has lower PS, also validating the
variety of our sample population.
The Work Task standard deviation is significantly lower. Participants found 9 correct
answers on average, with a minimum of 3 and a maximum of 20. In contrast to the PS
tests, the number of incorrect answers was much more notable in this task. On average, the
participants clicked on 4 lines of code that did not contain any errors. The minimum number of incorrect
answers is 0, and 10 participants were able to finish with 100% precision, which correspondingly
increased the average precision figure: 73% for the sample population. The maximum number of incorrect
answers is 25 (there was only one instance of this). Interestingly, this is higher than the total
number of possible correct answers. This explains the lower recall values: the average is 40%
for the sample population.
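The precision and recall figures above can be related to the raw counts with a short sketch. It assumes the standard definitions (precision as correct selections over all selections, recall as correct selections over the 23 errors present in the task); the function name `precision_recall` is hypothetical, and the study's own scoring code is not shown in this extract.

```python
TOTAL_ERRORS = 23  # errors spread across the 117 lines of the work task


def precision_recall(correct, incorrect, total_errors=TOTAL_ERRORS):
    """Return (precision, recall) for one participant's selections.

    precision = correct / (correct + incorrect)
    recall    = correct / total_errors
    """
    selected = correct + incorrect
    precision = correct / selected if selected else 0.0
    recall = correct / total_errors
    return precision, recall


# A participant close to the sample averages: 9 correct, 4 incorrect clicks.
precision, recall = precision_recall(correct=9, incorrect=4)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

A participant with no incorrect clicks reaches 100% precision regardless of how many errors were found, whereas recall can only reach 100% if all 23 errors are selected within the time limit.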
 | PS1 | PS2 | Age | Experience | Python | Correct | Incorrect | Total Answers | Precision | Recall
Count | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 48
Mean | 50.58 | 44.92 | | | | 9.13 | 3.98 | 13.10 | 0.73 | 0.40
Std | 13.79 | 14.79 | | | | 4.31 | 4.88 | 6.10 | 0.22 | 0.19
25% | 41.75 | 35.75 | | | | 6 | 1 | 9 | 0.62 | 0.26
50% | 51.5 | 41.5 | | | | 8 | 3 | 12 | 0.74 | 0.35
Total Possible Answers | 85 | 121 | n/a | n/a | n/a | 23 | n/a | n/a | n/a | n/a
Python programming, especially within the machine learning space, often contains
functions. Data in the form of parameters are loaded into a function to produce results. The
deductive reasoning involved in the Summing to 10 test can be related to that required in the
Python world. This could explain the strong correlation between the two variables.
Conversely, general coding experience shows a negative coefficient (Table 13). In other
words, the more experienced the individual, the lower their performance on the presented work
task. In addition, performance and age indicate a negative correlation to the overall precision
values (Table 8). Interestingly, this opposes the presented hypothesis that experience plays a
greater role in work task performance than PS. These two arguments can be explained by a
few factors. When learning Python, a lot of importance is placed upon syntax and clean code.
An individual with less experience might identify errors of this nature more easily because of
their more recent familiarity with these requirements. This counter-intuitive finding may also
indicate that after reaching a certain level of experience, these mundane errors (somewhat like
common grammatical errors in any spoken language) are no longer made or encountered.
Therefore, their ability to spot them decreases. Moreover, the intuitive software that experienced
users use to write code often identifies or auto-completes basic syntax errors (much like
Microsoft Word does when writing in most languages). Perhaps these findings identify a
reliance on the actual coding software, that is exhibited by more seasoned coders.
Total 1.651 47
Constant 0.746
R2 = 0.371 N = 46
Figure 15: Page 1 of the administered programming work task (Desktop Site)
After clicking “start”, participants are navigated to this page and need to scroll down in order to
read the full code and toggle to the next page.
Figure 16: Page 1 of the administered programming work task (Mobile Site)
After clicking “start”, participants are navigated to this page and need to zoom in to read the full
code.
7. Conclusion
This section provides a brief overview of the initial research objectives and outcomes.
A discussion highlighting the limitations and difficulties encountered throughout the study as
well as the various achievements will also be presented. Finally, suggestions will be put forward
for future research accompanied by recommendations for replicability attempts.
A further limitation in this experiment was the sample itself. The population was
primarily male, with only 25.5% females and 6.3% other. Unfortunately, the programming
industry remains predominantly male. The 2018 Women in Tech Report found that only 2,000 out
of 14,000 programmers in an industry survey were women (McDowell, 2018). Furthermore,
although the University of Strathclyde is actively trying to decrease gender imbalances within
the STEM fields, the 2017 Gender Action Plan indicates that over 75% of STEM students were
male (Strathclyde, 2017). This is a three-year action plan, therefore more recent data is not
currently available. The gender gap undisputedly continues, which helps to explain the lack of
female participants. As a result, it was impossible to perform an analysis based on gender.
Using gender as a variable could prove to be very interesting. This could be achieved through
communication with a Female Coding society or using a STEM industry with a higher number
of female contributors. This would most definitely increase female participants and allow for a
more equal gender-based analysis.
Another aspect to note is that 68.1% of participants were below the age of 32. The
demographic collection could have been more precise to allow for further speculation on the
correlation between age and PS. The reason for the large age ranges was privacy related and
to not discourage participants from proceeding by protecting their identity as much as possible.
However, it is believed that protecting the respondent’s anonymity does not need to extend to
age, especially as there is an increased understanding for the necessity of this variable when
conducting research. Participants were questioned after completing the test (when possible)
and they reported no issue with providing a more precise age. With more precise demographic
collection regarding participants’ age and experience, there could have been more variety in the
visualisations: i.e. plotting a histogram or scatter plot per variable in order to identify trends
rather than the simple bar charts used in this dissertation.
Although the correlation between Perceptual Speed (Summing to 10) and work task
recall is evident, no definite conclusions on this matter can be drawn from this experiment. It
could be beneficial to further explore the influence of the type of PS test administered. STEM
professionals might respond better to locating symbols as opposed to comparing them.
With respect to the programming challenge itself, feedback was collected from one of
the more experienced programmer participants. They indicated that due to the programming
software currently available and in use, spotting syntax errors is no longer necessary, in that
the software can either auto-complete it, or the output indicates exactly where the error is. This
aligns with the negative correlation between experience and performance on the work task. It
further validates Ackerman’s previous research on PS and skill acquisition. It was presented
that the importance of PS increases in the first two phases and then declines in importance in
the third phase, where psychomotor ability takes precedence (Ackerman and Cianciolo, 2000).
To emulate real world performance, the coding task could involve reading and interpreting
code and describing what the output would be. This task involves PS and would provide a better
indication of an individual’s coding ability. However, the test would have to be administered to
specific skill levels.