
THE RELATIONSHIP BETWEEN PERCEPTUAL SPEED AND WORK TASK PERFORMANCE IN THE CONTEXT OF PROGRAMMING

Viktoria Spanou

This dissertation was submitted in part fulfilment of the requirements for the degree of MSc Information Management

DEPARTMENT OF COMPUTER AND INFORMATION SCIENCES


University of Strathclyde

August 2020
Abstract
This dissertation explores the use of Perceptual Speed in the STEM hiring process.
Perceptual Speed (PS) has been associated with a variety of cognitive traits as well as
performance in certain industries. Perceptual Speed (PS) is defined as the cognitive ability of
an individual to compare, scan and find symbols or numbers quickly and accurately. This study
aims to determine whether or not high PS is an indicator of high performance in the context of
computer programming. An online experiment was conducted comprising two PS tests and one
programming work task in the attempt to identify a correlation between the two. Analysis of the
results demonstrated a relationship between one of the PS tests and work task recall values.
The results indicate that other variables (experience, device used) also affect an individual’s
performance in the administered task. Further research is necessary to identify whether PS can
be used to determine if a candidate is an appropriate fit for a STEM role.
Declaration
This dissertation is submitted in part fulfilment of the requirements for the degree of
MSc of the University of Strathclyde.

I declare that this dissertation embodies the results of my own work and that it has
been composed by myself. Following normal academic conventions, I have made due acknowledgement of the work of others.

I declare that I have sought, and received, ethics approval via the Departmental Ethics
Committee as appropriate to my research.

I give permission to the University of Strathclyde, Department of Computer and Information Sciences, to provide copies of the dissertation, at cost, to those who may in the
future request a copy of the dissertation for private study or research.

I give permission to the University of Strathclyde, Department of Computer and Information Sciences, to place a copy of the dissertation in a publicly available archive.
(please tick) Yes [X] No [ ]

I declare that the word count for this dissertation (excluding title page, declaration, abstract, acknowledgements, table of contents, list of illustrations, references and appendices) is .

I confirm that I wish this to be assessed as a Type (five): 1 2 3 4 5 Dissertation (please circle).

Signature: Viktoria Spanou

Date: August 15th 2020


Acknowledgments
A special thank you to my parents for their continued support throughout this process
and to my sister, whose expertise has assisted me more times than I can count.
Table of Contents
1. Introduction ......................................................................................................................... 9

2. A history of Perceptual Speed Studies and Applications ....................................................11

2.1. What is Perceptual Speed? .......................................................................................11

2.2. How is Perceptual Speed measured? .......................................................................12

3. Contextual relevance: Perceptual Speed in the hiring process ...........................................14

3.1. Further research: The relationship between PS and other cognitive tasks ................15

3.2. PS and Programming ................................................................................................16

4. Research Hypotheses and Objectives................................................................................17

4.1. Research Objectives .................................................................................................17

4.2. Hypotheses ...............................................................................................................18

4.3. Perceptual Speed test types, choosing the right ones ...............................................19

5. An Agile Methodology .........................................................................................................21

5.1. Dissertation Methodology ..........................................................................................21

5.2. Sampling ...................................................................................................................23

5.3. External factors and contextual limitations ................................................................23

5.4. Demographic collection .............................................................................................24

5.5. The tests ...................................................................................................................24

5.6. Software methodology ..............................................................................................27

5.7. Software Architecture ................................................................................................29

5.8. Web design ...............................................................................................................30

5.9. Database Creation and Management ........................................................................30

5.10. Functional Testing .....................................................................................................32

5.10.1. Usability Testing ..................................................................................................32


5.10.2. Software Integration and Unit Testing ..................................................................34
5.10.3. Further Website Functionality ..............................................................................35
6. Data Analysis: Results and Evaluations .............................................................................37
6.1. Sample population ....................................................................................................37

6.2. Scoring......................................................................................................................40

6.3. Descriptive Perceptual Speed Scores .......................................................................40

6.4. Results Analysis and Evaluation ...............................................................................41

6.4.1. Perceptual Speed, Coding Experience and Programming....................................41


6.4.2. Effect of device on overall Performance ...............................................................44
7. Conclusion .........................................................................................................................49

7.1. A relatively unexplored domain .................................................................................49

7.2. Discussion: Limitations and Recommendations for further experimentation ..............49

7.3. Summary and Final Remarks ....................................................................................51

8. References.........................................................................................................................53
Table of figures
Figure 1: An example of a test for perceptual speed .................................................................. 9

Figure 2: Number Comparison Test ..........................................................................................12

Figure 3: Work Task ..................................................................................................................25

Figure 4: Finding As Test ..........................................................................................................26

Figure 5: Summing to 10 test ....................................................................................................27

Figure 6: Website Architecture Design ......................................................................................30

Figure 7: Entity Relationship Diagram (ERD) ............................................................................31

Figure 8: Ages of sample population .........................................................................................38

Figure 9: General Coding experience of sample population ......................................................38

Figure 10: Python Experience of sample population .................................................................39

Figure 11: Devices used by sample population .........................................................................39

Figure 12: Finding As Score by Device .....................................................................................45

Figure 13: Summing to 10 Score per Device .............................................................................45

Figure 14: Work Task Precision and Recall per Device .............................................................46

Figure 15: Page 1 of the administered programming work task (Desktop Site)..........................47

Figure 16: Page 1 of the administered programming work task (Mobile Site) ............................48
Table of tables
Table 1: Hypotheses ..............................................................................................................18

Table 2: Types of PS tests and their use ................................................................................19

Table 3: Dissertation Management: Sprints ............................................................................22

Table 4: Software Creation: Sprints ........................................................................................28

Table 5: User Feedback .........................................................................................................33

Table 6: Functionality Tests ...................................................................................................35

Table 7: Descriptive Statistics ................................................................................................41

Table 8: Correlation between Work Task Precision/Recall and other variables ......................42

Table 9: Model Summary: Precision .......................................................................................43

Table 10: Model Summary: Recall .........................................................................................43

Table 11: ANOVA...................................................................................................................43

Table 12: Coefficients ............................................................................................................44

Table 13: P-Value ..................................................................................................................44


1. Introduction
Perceptual Speed (PS) is the cognitive ability which determines an individual’s “speed
and accuracy in comparing figures or symbols, [in] scanning to find figures or symbols, or [in]
carrying out other very simple tasks involving visual perception” (Ekstrom et al., 1976). An
example of a method to test an individual’s PS is to present them with a list of words and give
them a few minutes to identify all words containing the letter “a”, otherwise known as the Finding
As Test (see Figure 1).

Figure 1: An example of a test for perceptual speed


A partially completed paper-pencil Finding As test: participants were instructed to cross out all
words containing the letter “A”
The type of test presented in Figure 1 was administered on paper and dates back to
the 1970s. Since those early days of perceptual speed testing, the methods have progressed
and evolved in accordance with the introduction of new technologies and reasons for use. In
more recent years, researchers have used these tests in conjunction with other tests to relate
an individual's level of PS to another cognitive trait. Contemporary practical applications most
often involve a measure of the speed with which participants can accurately assimilate visual
stimuli.
High PS has been related to high performance in various sectors, for example within
the air force aviation industry (Johnson et al., 2017). Johnson et al. conducted a study wherein
a combination of spatial ability, PS and academic aptitude tests were administered to Aviator
trainees. Perceptual Speed showed superior incremental validity in predicting flying
performance when compared to the results of academic tests combined with the technical
knowledge of the participants. This study is one of very few (to the author’s knowledge) which
investigates the relationship between PS and performance in a STEM sector. Despite this lack
of research, there is an increasing trend in the use of these tests in the first stages of the hiring
process (in various sectors including STEM) to assess the abilities and potential capabilities of
interviewees. Thomas International Ltd. offers companies a range of online assessments to better
evaluate future candidates and ensure longevity within their roles. One of these
is the General Intelligence Assessment (GIA), which is used to classify potential employees
based on their ability to assimilate to a role. The GIA is composed of 5 online assessments:
Reasoning, Perceptual Speed, Number Speed and Accuracy, Word Meaning, and Spatial
Visualisation. This composition is based on techniques of the British Army Recruitment Battery
and aims to identify fluid intelligence as opposed to general intelligence. This assessment is
registered with the British Psychological Society (BPS) and is believed to have helped
thousands of companies “bring out the best in [your] people” (Thomas International Ltd, 2020).

Programming, from the perspective of a work task, comprises a large visual component. In this dissertation, the relationship between PS and task performance is explored, examining whether high PS is an indicator of efficient performance in programming
work tasks, or whether experience in the task takes precedence over PS within the programming
industry. This relationship is explored through an online experiment conducted with a sample
group of programmers and comprises three tests: two perceptual speed tests and a
programming work task.

The next section (Section 2) outlines the previous work undertaken in the Perceptual
Speed space. The remainder of the dissertation is structured as follows: Section 3 presents the
use of PS tests in the hiring process; Section 4 presents the hypotheses investigated in this
dissertation and describes the various types of PS tests and the rationale in selecting the
appropriate test; and Sections 5, 6 and 7 present the methodology, analysis, results and
discussion, conclusions, and recommendations for future research respectively.
2. A history of Perceptual Speed Studies and
Applications

2.1. What is Perceptual Speed?


Perceptual Speed tests were first officially defined by Ekstrom, French, Harman and
Dermen, who created a Kit of Factor-Referenced Cognitive Tests (Ekstrom et al., 1976)1. This
kit serves as a guideline for Perceptual Speed researchers and contains various
recommendations on how to conduct cognitive ability research such as the importance of using
multiple cognitive tests when conducting a study in order to ensure the validity of the collected
data. The kit describes a variety of different PS tests including: Finding A’s, Number Comparison
Test and Identical Pictures Test along with scoring methods and examples (see Figure 2). This
manual is considered a reference for Perceptual Speed and other cognitive tests and led to a
plethora of studies, including but not limited to the ones listed below.

1 It may be noted that PS tests were used prior to 1976, mainly to measure the work task efficiency

of administrative hires, as PS was a validated proxy for clerical speed (Gael, Grant and Ritchie, 1975).
Figure 2: Number Comparison Test
Extract from Ekstrom’s kit (page 123): PS test with instructions, example and marking criteria

2.2. How is Perceptual Speed measured?


In their original form, PS tests were administered in paper-pencil format as indicated in
Figure 1 and Figure 2. In general, the tests comprise low difficulty tasks that participants must
complete within a short time limit. A participant’s Perceptual Speed is measured by the accuracy
of task completion within the imposed time limit (Parks et al., 2001).

Philip Ackerman, a pioneer in Perceptual Speed research and literature, conducted a study with Victor Ellingsen designed to test the importance of the time limit component of PS tests (Ackerman and Ellingsen, 2016). They hypothesised that if the PS tests were administered without a time limit, participants would not make any errors, and that there is therefore a trade-off between accuracy and speed. PS tests were used in conjunction with
psychomotor ability tests to determine whether and how good performance was related to
higher intelligence. There were three different treatments of the PS test: in the first treatment,
the test instructions required that participants prioritise accuracy in their responses, the second that
they prioritise speed, and the third that both accuracy and speed be
equally prioritised. This study highlighted the importance of the type of instructions provided to
examinees due to the fact that test results varied depending on the recommended focus (speed
or accuracy). The study concluded that perceptual speed and intellectual ability positively
correlate in laboratory conditions (Ackerman and Ellingsen, 2016).

With the exception of digitisation, which began in the 1980s, the administered PS tests
have not varied greatly since the early applications. A comparison was conducted between the
computerised and paper versions of various aptitude tests to confirm whether or not the
computerised test was an “accurate representation of the test battery” (Henly et al., 1989). This
comparison validated Computerised Adaptive Versions of aptitude tests and as a result,
broadened their scope.
3. Contextual relevance: Perceptual Speed in the hiring
process
The use of PS tests has evolved over the last twenty years and they are currently being
used together with cognitive tests in the context of job hiring. Employers are continually seeking
more and more criteria to determine a candidate’s suitability and fit in a role and the
organisation as a whole, in addition to how well they will perform after hiring. Moreover, due to
the increased number of highly qualified applicants for each open position, this additional testing
adds another layer of classification and contributes to reducing costs throughout the entire hiring
process. As this process is often outsourced, this further explains the success of companies
like the aforementioned Thomas International Ltd.

Ackerman further explored Perceptual Speed tests by relating them to skill acquisition,
investigating whether or not he could identify individual differences based on PS (Ackerman
and Beier, 2007). Ackerman’s research aimed to determine if an individual with high Perceptual
Speed was able to more quickly adapt to a new role than an equally qualified candidate with
lower PS. Although this study took place many years after the first instance of digitisation, there
was still very little research into the benefits of computerised testing. Therefore, in all
applications of PS testing prior to this study, one limitation of administering the PS tests was the
effort and time-intensive nature of the test scoring, as the tests had to be administered in small
groups and then graded individually.

In this study, skill acquisition is defined following the three phases of skill acquisition
defined by Fitts and Posner (Fitts and Posner, 1967). The first phase is general cognitive
abilities (verbal, maths, spatial), the second phase refers to PS and the third phase
encompasses psychomotor abilities. This model is employed in this study due to Ackerman’s
previous successful use of and endorsement of it (Ackerman and Cianciolo, 2000). Ackerman
concluded that a correlation exists between participant performance on speed tests and their
ability to acquire skills. This correlation increases in phases one and two and declines in phase
three. Ackerman was able to highlight the usefulness of testing PS as a proxy for predicting
work task performance (for certain job types) due to the ease with which PS tests can be
administered and the variety of tests that are available.

To further generalise the predictive validity of PS in this context, Ackerman explains that its use is particularly relevant when the associated work task involves speed and accuracy, e.g. a bank teller or various administrative roles. PS tests can provide insight into work performance in ways that general aptitude tests do not (Ackerman and Beier, 2007).
The use of these tests to predict performance in clerical roles dates back to the 1930s (Andrew,
Paterson and Longstaff, 1979). Furthermore, the US military utilises two different PS tests to
screen and select applicants across a variety of roles (Held, Carretta and Rumsey, 2014).

In 2008, another experiment was conducted which compared PS to General Mental Ability (GMA), as well as Personality and Job Performance (Mount, Oh and Burns, 2008). The
researchers defined PS as a person's “speed of processing and ability to focus attention”. The
study was carried out with warehouse workers and different tests were administered for each
factor. Mount, Oh and Burns found that for a job with low complexity, PS can be used instead
of GMA to predict job performance. Mount et al. also concluded that high Perceptual Speed
predicts one’s ability to produce results under pressure and time constraints, but should be
taken in relation with other criteria in the assessment of job performance as a whole (Mount, Oh
and Burns, 2008).

3.1. Further research: The relationship between PS and other cognitive tasks
Although PS might not be the sole indicator of an individual’s overall success in a
certain role, it has been related to a plethora of other cognitive tasks.

Johnson and Deary conducted a study on information processing speed in which they
measured PS, reaction time and inspection time against general cognitive ability. The sample
population chosen for this study was over the age of 70, which is important as it has been
proven that PS decreases with age (Ghisletta and Lindenberger, 2003). The researchers
administered 18 different ability and speed tests. The study concluded that high spatial, verbal and perceptual speed were more closely related to information processing speed than to general intelligence (Johnson and Deary, 2011).

There are many case studies outlining the benefits of high perceptual speed in
information retrieval tasks. In preparation for the 17th Annual Conference on research and
development in Information Retrieval (IR), Allen conducted an experiment on the relation
between PS, IR performance and learning (Allen, 1994). He first identified the two learning
components in IR: general search patterns and the specific topic-related process. The latter
stage is the one most influenced by PS. His sample comprised 100 students from the
University of Illinois: after answering demographic-related questions and completing a PS test,
their instruction was to read a stimulus article and perform a search on this topic (as if they
intended to write a paper on it). Two different search systems were randomly assigned to the
participants (one presented the references in usual order and the other prioritised subject
headings), while all other factors remained the same. Although this experiment verified the
correlation between high PS and learning, this correlation was only present in the users
assigned to the system “designed to enable fast scanning of subject descriptors”, i.e. the system
prioritising subject headings (Allen, 1994). This experiment indicates the importance of the
design of a system and its relation to usability.

3.2. PS and Programming


Within the science, technology, engineering and mathematics (STEM) career fields
“individual differences in spatial ability contribute to learning, the development of expertise, and
securing advanced educational and occupational credentials” (Lubinski, 2010). Perceptual
Speed was shown to be a part of spatial cognition, defined as a cognitive capacity to
comprehend and recall spatial relations among objects (Lohman, 1979). Despite this demonstrated relevance, PS testing remains extremely underutilised in the assessment of potential STEM professionals.
4. Research Hypotheses and Objectives

4.1. Research Objectives


This dissertation will attempt to validate the use of PS tests in the context of hiring by
administering two PS tests in conjunction with a programming work task. The results will be
analysed to identify any underlying trends in the data.

This experiment would not have been possible without the use of technology. The tests
were all administered online, which allowed diversification of the sample and simplified scoring
and data analysis.

In the context of this experiment, neither speed nor accuracy was prioritised. Examinees were briefed on the concept of Perceptual Speed and the purpose of the study, and the tests were timed according to their difficulty.

Although programming cannot be considered a low complexity task, in relation to the specific work task performance, Mount, Oh and Burns’ theory might prove valid in that the programmers with higher Perceptual Speed might perform better than those with lower PS scores, regardless of their levels of experience. Furthermore, due to the nature of the experiment (i.e.
online), there is no immediate control over the system in use. Therefore, this will be considered
a variable in order to determine if there is an effect on the results.

Before delving into the experiment methodology and defining the research problem at
hand, I would like to address the concept of performance, due to its ambiguity and contextual
variance. Performance is defined as “how well a person does a piece of work or an activity”
(“Cambridge English Dictionary”). In the workplace, this definition does not change, although
the sector, job role and department play a major role in how performance is measured. O’Neill
addresses the ambiguous subject of measuring workplace performance in office environments.
He defines his own model for performance due to “the dynamic characteristics and biological
metaphor of organisations” (O’Neill, 2016). O’Neill highlights the importance of the organisation
in the measure of performance, which is the reason the model is more process than object
oriented, as is illustrated in the application of case studies.

In accordance with these definitions and in the context of this research problem,
performance will be quantified based on the presented work task involving visual processing
and accuracy.
To respond to these objectives, the dissertation is structured as follows:

- The remainder of Section 4 will outline the hypotheses investigated and the PS tests used
and the rationale in selecting them
- Section 5: methodology
- Section 6: results and analysis
- Section 7: conclusion, comprising a discussion on limitations, achievements, and
recommendations for future research

4.2. Hypotheses
The following hypotheses will be explored throughout the analysis. They have been derived from the research conducted on PS as well as from previous studies in this space.

Table 1: Hypotheses
The study hypotheses

Hypothesis: The higher an individual’s Perceptual Speed, the better they will perform in a visual work task such as programming.
Testing method: Two PS tests were administered in a survey alongside a visual programming work task to test the relationship between PS and work task performance.

Null Hypothesis: There is no correlation between PS and Performance.
Testing method: The results from the survey described above were analysed using a multilinear regression. The data was also explored for correlations.

Alternate Hypothesis: Experience plays a greater role in work task performance than PS.
Testing method: The strengths of the relationships between experience/PS and work task performance were investigated using the estimated coefficients and correlations.

Alternate Hypothesis: Other variables have a greater correlation to work task performance than PS.
Testing method: The strengths of the relationships between other variables/PS and work task performance were investigated using the estimated coefficients and correlations.
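As a purely illustrative sketch of the analysis approach summarised in the testing methods above (the actual analysis, reported in Section 6, was carried out in Python and SPSS; the column names below are hypothetical placeholders rather than the real dataset schema), the multilinear regression and correlation exploration could be run as follows:

import pandas as pd
import statsmodels.api as sm

# Hypothetical export of the survey database; column names are placeholders.
df = pd.read_csv("results.csv")

# Multilinear regression: work task recall against PS scores and coding experience
predictors = ["finding_as_score", "summing_to_10_score", "coding_experience"]
X = sm.add_constant(df[predictors])
y = df["work_task_recall"]
model = sm.OLS(y, X).fit()
print(model.summary())  # coefficients, R-squared and the overall F test

# Simple correlation exploration between the same variables
print(df[predictors + ["work_task_recall"]].corr())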
4.3. Perceptual Speed test types, choosing the right ones
There are two types of PS tests: tests of speed in locating symbols or patterns
(Cancellation, Finding As, Scattered Xs) and tests of speed comparing symbols or patterns
(Clerical Checking, Name Comparison, Numerical Checking). When designing these PS tests,
Ekstrom, French and Dermen specified that to validate the use of these tests in research, more
than one should be administered (Ekstrom et al., 1976). Therefore, this experiment comprises
one of each type of test.

As previously stated, all tests were administered online which limits the test options
due to the author’s programming experience. Table 2 defines several previously mentioned
existing applications or uses for certain tests.

Table 2: Types of PS tests and their use


Application and context for certain PS tests.

Context: Clerical testing
Who uses it? Since when: Minnesota Clerical and Occupational Testing; in use since the 1930s (Andrew, Paterson and Longstaff, 1979)
PS tests: Number Comparison, Name Comparison

Context: US military applicant screening
Who uses it? Since when: US military (Held, Carretta and Rumsey, 2014)
PS tests: Code Speed, Numerical Operations

Context: Used to measure inspection time and IQ
Who uses it? Since when: Used in a research study (Johnson and Deary, 2011)
PS tests: Verbal speed test, Numerical speed test, Figural speed test

A trend identified across previous applications is variety, i.e. every application includes
both a numerical and an alphabetical test. For this study, a Finding As test will be administered
in conjunction with a Summing to 10 test.

In a study focused on the connection between Information Retrieval and PS, Foulds,
Azzopardi and Halvey recently identified the limitations of the current findings and exposed a lack
of standardisation of PS test results. They concluded that comparison between study outcomes
is not possible due to the lack of a harmonised scale to score results of the test (Foulds,
Azzopardi and Halvey, 2020). This dissertation therefore establishes its own scale of high versus low perceptual speed based on the results of the experiment: a score that is high relative to the rest of the sample indicates high perceptual speed.
5. An Agile Methodology
This section outlines the dissertation methodology and sampling methods, followed by a
brief description of contextual limitations encountered. A description of the demographic collection
as well as the three tests will follow. Finally, the software methodology and architecture are
explained, including database creation and management and the testing methods employed.

5.1. Dissertation Methodology


In order to test the presented hypotheses, a range of preliminary work was completed: outlining the method with which they would be tested, applying for ethical approval, factoring in external variables and selecting the PS tests to administer.

To organise and structure this dissertation project, an Agile development method was
implemented. The Agile method is centred around four core values:

1. “individuals and interactions over processes and tools,
2. working software over comprehensive documentation,
3. customer collaboration over contract negotiation, and
4. responding to change over following a plan.” (Beck et al., 2001)

Defined as a flexible methodology allowing for expedited feedback (Rumpe, 2017), Agile is popular within the project management sector, especially when software development is involved. The experiment itself was prefaced by a significant technical component that
required development, testing and deployment. Due to the experimental nature of this
dissertation, the Agile method was most suitable, as it involved segregating the project into
smaller tasks with set timelines, otherwise known as sprints. The project itself was divided into
six sprints, illustrated in Table 3.

Communication is fundamental to Agile processes. Weekly meetings were scheduled with my dissertation supervisor and a shared OneDrive folder was set up to facilitate the
meetings. Prior to each meeting the folder was updated with the new documents if applicable,
allowing for review and feedback. This method facilitated transparency as to the progress of the
project as well as a frequent opportunity for critique and questions, which ensured key
milestones were met. The flexibility of the Agile method proved pivotal when external factors
came into play, as detailed in Section 5.3 below.
Table 3: Dissertation Management: Sprints
Sprint breakdown according to task and project timeline.
Sprint 1 (29th of May to 26th of June):
✓ Obtain ethical approval from the Computer Science (CS) ethics department
✓ Write and submit introduction for feedback
✓ Conduct background research and establish a sourcing method for the project
✓ Research various PS tests and select which ones to administer for this experiment

Sprint 2 (29th of June to 17th of July):
✓ Design and create website
✓ Design and create database
✓ Establish connectivity between database and webpage
✓ Testing: both local and external

Sprint 3 (17th to 24th of July):
✓ Deploy website publicly on the server
✓ Further user testing
✓ 24th of July: Test deployment

Sprint 4 (17th to 31st of July):
✓ Redaction of the methodology portion of the research project

Sprint 5 (31st of July to 7th of August):
✓ Data visualisations and analysis: on Python and SPSS
✓ Redaction of the conclusion: Discussion, Limitations and Recommendations

Sprint 6 (10th to 17th of August):
✓ Submit final draft
✓ Review, Review, Review
✓ Gathering appendices (if applicable) and formatting

These sprints were designed to be flexible, however, they are mainly sequential tasks
(with the exception of Sprints 3 and 4), i.e. one must be completed to proceed to the next.
5.2. Sampling
In order to collect sufficient data for analysis, this study aimed to recruit a minimum of
50 participants. Participants were recruited by stratified, convenience sampling. Stratified
sampling targets a subset of the population possessing a common attribute (Sincero, 2012). In
this case, the common attribute was knowledge and frequent use of Python. Convenience sampling is a form of non-
probabilistic sampling characterised by convenience: using family, friends or colleagues that fit
the sampling criteria (Albert, Tullis and Tedesco, 2010). This was mainly due to the skillset
required to complete the full study. Participants were required to be familiar with Python 3. I
used my personal network, by sharing the study details on social media. This broadened the
scope significantly, as many people were able to share the survey with their own networks.
The study was also shared with the Computer Science and Data Analytics students at Strathclyde
University. Due to the alternate hypothesis involving the experience factor in programming
speed, the survey needed to reach a more experienced group of programmers. To facilitate
this, the website was shared on 4 closed online forums that specialise in Python and more
specifically Python for Machine Learning.

5.3. External factors and contextual limitations


In March 2020, due to the spread of COVID-19, the world went into lockdown.
Non-essential businesses were closed and, although the severity differed from country to
country, the general instruction was to remain home until further notice. Initially, the research
proposal was in conjunction with two organisations (which is reflected in the form for ethical
approval). One of these was an online coding website where programmers from across the
globe participate in online challenges. The challenges are divided by specialty and have a time
limit, generally of less than 2 hours. The programmers compete to win a monetary reward. This
challenge would have comprised the performance measure for the programmers and, after completing it, the participants would have taken the PS tests. The second organisation specialises in
digital media, another industry involving visual stimuli. The annual performance of employees
was to be quantified and compared to their level of Perceptual Speed.

However, alongside newly imposed “work from home” procedures and economic
turmoil across a variety of industries, both organisations were forced to withdraw their
participation. The experiment pivoted to focus on the STEM industry. In lieu of external
performance data, an online debugging challenge was created that mimics a work task.
5.4. Demographic collection
Our hypotheses state the possibility that experience is a better indicator of
programming performance than Perceptual Speed. Demographic collection preceded the tests
to diversify the variables and collect data on the various characteristics of the sample population.
After providing their consent, the programmers were asked to specify their age within one of four ranges of approximately ten years each, and their gender (male, female, other). In relation to
experience, the participants were asked to specify their overall coding experience as well as
their familiarity and usage of Python specifically (once again with the use of ranges). Finally,
the participants were asked to note the device used, to determine whether this would affect the outcome.
A desktop or laptop with external mouse was recommended to best emulate a paper-pencil test.
However, this was not a concrete requirement and the website was also available in a mobile
version. The demographic questions were made mandatory for the participants to be able to
proceed to the tests.

5.5. The tests


Before describing the tests, it is important to note an aspect of administering PS tests:
the length of the test itself. The tests are intentionally designed to be too long to finish, as the
goal is not completion, but rather to determine accuracy and speed. If a respondent were able
to finish before the timer, it would give them time to review their results, which in turn would
nullify them, and they would no longer be indicators of PS. Therefore, all three tests presented
below, have been designed to be impossible to complete. This was tested extensively, with
different skill levels throughout the testing process (Section 5.10, Functional Testing).

The debugging task (see Figure 3) was written using Python 3, due to its increasing
importance in data processing and machine learning. The language has gained immense
popularity over the past few years, and this choice was intended to facilitate a higher participation rate.
Three pages of code were presented, each one independent from the next. The users were
asked to spot the syntax errors within the lines of code. They were instructed to select and
highlight the line containing the error, by clicking on it. They were able to deselect the line, in
case of error. No semantic errors were used, due to their dependence on the packages
imported. Sampling would have proven extremely difficult if the knowledge base were restricted
to specific Python functions. Therefore, various syntax errors were spread across the three pages of
code, along with undefined variable errors. Across a total of 117 lines of code, there were 23
errors. The time limit provided was four minutes, after which the test timed out.

Figure 3: Work Task


Instruction page for the programming work task: on this page participants have the opportunity
to “practice” the test. If they click on a line that does not contain an error, the line below the
table will read: Correct: 0 Incorrect: 1. This is so participants know what kind of errors they
are looking for when they click on the “Start” button. The example code is presented exactly
like the timed work task is, in order to familiarise participants with what is expected. In this
practice question, the error is on the fourth line, it should read: np.dot(a,b) (punctuation error).
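For illustration only, the snippet below is written in the spirit of the practice example above; it is not taken from the actual test material, and the comment simply indicates where a planted punctuation error of the kind described would sit.

import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# In the administered task, a "punctuation error" would be planted on a line such as
# the one below, e.g. a stray or missing character in the call (the exact planted
# variants are not reproduced here).
result = np.dot(a, b)  # correct form, matching the practice example
print(result)  # prints 32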

The Finding As test (see Figure 4) was composed of 315 words and was also spread
across three pages. There was a total of 85 words containing the letter A evenly spread out
across all three pages, with 28 on pages one and three, and 29 on page two. The words listed
varied from four to eight characters. Participants were asked to click on the words with an A,
which would select it by crossing it out. In case of error, they were able to click on the word
again, to deselect it. The time limit provided was two minutes, after which the test timed out and
they were asked to proceed to the next one.
Figure 4: Finding As Test
Instruction page for the Finding As Perceptual Speed test: on this page participants have the
opportunity to “practice” the test. If they click on a word that does not contain the letter “a”, the
line below the table will read: Correct: 0 Incorrect: 1. The example code is presented exactly
like the timed work task is, in order to familiarise participants with what is expected. In this
practice test there are two correct answers (“glasses” and “boats”) and two incorrect answers
(“monkey” and “yellow”).

The Summing to 10 test (see Figure 5) was a grid of 280 two-digit numbers; participants
were requested to select the numbers whose digits sum to 10, for example: 82 → 8+2=10. There were
121 numbers that fell into this category, spread out across the grid. Participants were asked to
select said numbers by clicking on them, which highlighted the selection. They deselected by
clicking again. After 1.5 minutes, the test timed out.
Figure 5: Summing to 10 test
Instruction page for the Summing to 10 Perceptual Speed test: on this page participants have
the opportunity to “practice” the test. If they click on a number that does not sum to 10, the line
below the table will read: Correct: 0 Incorrect: 1. The example code is presented exactly like
the timed work task is, in order to familiarise participants with what is expected. In this practice
test there are four correct answers (“46” on line one, “19” and “64” on line two and “19” on line
three).
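As an aside, the selection rule participants had to apply can be expressed in a few lines of Python; this sketch is purely illustrative and was not part of the experiment code.

def sums_to_ten(number: int) -> bool:
    """Return True if the digits of a two-digit number add up to 10 (e.g. 82 -> 8 + 2 = 10)."""
    return sum(int(digit) for digit in str(number)) == 10

# A few illustrative grid entries: only the first four satisfy the rule
print([n for n in (82, 46, 19, 64, 21, 77) if sums_to_ten(n)])  # [82, 46, 19, 64]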

The Finding As test came first, followed by the work task and, finally, the Summing to 10 test. This order was chosen so as not to discourage participants: Finding As is the most relatable and least demanding of the three tests. The work task was by far the longest, which is why it was placed in the middle. The experiment concluded with the Summing to 10, the only test presented on a single page.

5.6. Software methodology


As previously illustrated by Ackerman and Ellingsen (Ackerman and Ellingsen, 2016), the user’s interpretation of the test can greatly affect the outcome. Therefore, a crucial step in the website development was to ensure the clarity and usability of the tests. To facilitate this, an Agile approach was implemented and, in accordance with the Agile method, this phase was also managed in sprints, illustrated in Table 4.
Table 4: Software Creation: Sprints
Sprint breakdown with timeline for creation of the website.
Sprint 1* (29th of June to 3rd of July):
✓ Creating the first webpage: demographic form with consent collection
✓ Creating a local database to store test data
✓ Creating corresponding table within the database
✓ Connecting the database to the webpage
✓ Local testing to ensure connectivity and comprehensive data storage

Sprint 2 (3rd to 8th of July):
✓ Creating the second webpage: Finding As Test
✓ Creating corresponding table within the database
✓ Connecting the database to the webpage
✓ Local testing

Sprint 3 (8th to 13th of July):
✓ Creating the third webpage: Summing to 10 Test
✓ Creating corresponding table within the database
✓ Connecting the database to the webpage
✓ Local testing

Sprint 4 (13th to 17th of July):
✓ Ensuring page connectivity and creation of a unique ID per submission
✓ Deploying partial website (Demographic Form and both PS tests) on the server
✓ Usability testing (detailed in 5.10.1)

Sprint 5 (17th to 21st of July):
✓ Creating the fourth webpage: Python 3 Debugging Test
✓ Creating corresponding table within the database
✓ Connecting the database to the webpage
✓ Local testing
✓ Deploying webpage on server on its own
✓ Usability testing

Sprint 6 (21st to 24th of July):
✓ Ensuring full page connectivity and testing
✓ Usability Testing
✓ Website Deployment

*Preliminary step to Sprint 1: determining the software with which to write, store and test the code, and which languages were most fitting.

5.7. Software Architecture


A three-tier architecture was implemented, where the web pages were written in PHP
with embedded HTML. Due to the length of the code for each test, the corresponding webpage
comprises its own PHP file and is linked to separate CSS and JavaScript files, used for styling
and creating responsive interactive tests respectively. The JavaScript library employed is
jQuery, which is used to condense and simplify JavaScript code. Although not the most popular
of the JavaScript libraries, jQuery had the suitable features for what was needed: HTML and
CSS manipulation, HTML event methods and effects (timer). The code was written using Visual
Studio Code and was backed up on a GitHub repository. Following the Agile method, this
practice allows multiple versions of the code to be saved onto the cloud which facilitates sharing,
regular review and code security. The pages were tested locally before being linked and
deployed on the server.

The software architecture diagram (Figure 6) illustrates the interaction between the different system elements. This illustration proved fundamental in determining at which stages changes needed to be implemented throughout testing and the various development phases.
Figure 6: Website Architecture Design
Visual representation of the three-tier architecture: the presentation tier controls what the user
sees, the logic tier implements the various website functionalities and the data tier receives and
stores the submitted data.

5.8. Web design


The aim was to create pages with a simple design and interface so as not to distract
from the visual task in hand. All pages were designed with a simple blue colour scheme, with
grey undertones, in accordance with the logo of Strathclyde University. The tests follow a very
similar design in order to ensure visual continuity (see Figure 3, Figure 4, Figure 5).

The web and mobile versions of the website are identical; however, the presentation
varies according to the device in use.

5.9. Database Creation and Management


Database creation and management was executed through phpMyAdmin, which is an
online MySQL database management application. The database contains one table per
webpage. Therefore, on each webpage, the user submits a form to the database before
proceeding to the next page. Initially, the idea was to create one large form that would be
submitted to the database upon completion of the full test. However, due to the length of the
entire survey, the best practice seemed to be a more segmented approach. As shown below in
the Entity Relationship Diagram (ERD: Figure 7), the primary key for each table is also the
foreign key, ensuring a one-to-one relationship between each table.
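For illustration, a minimal sketch of this one-table-per-form, shared-primary-key design is shown below; the live system used MySQL managed through phpMyAdmin, and the table and column names here are assumptions rather than the production schema.

import sqlite3

# SQLite stands in for the MySQL/phpMyAdmin setup so the example is self-contained;
# table and column names are illustrative, not the production schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE demographics (
    session_id TEXT PRIMARY KEY,   -- assigned when the first form is submitted
    age_range TEXT, gender TEXT, coding_experience TEXT, device TEXT
);
CREATE TABLE finding_as (
    -- the primary key doubles as a foreign key, giving a one-to-one relationship
    session_id TEXT PRIMARY KEY REFERENCES demographics(session_id),
    correct INTEGER, incorrect INTEGER
);
CREATE TABLE summing_to_10 (
    session_id TEXT PRIMARY KEY REFERENCES demographics(session_id),
    correct INTEGER, incorrect INTEGER
);
CREATE TABLE work_task (
    session_id TEXT PRIMARY KEY REFERENCES demographics(session_id),
    correct INTEGER, incorrect INTEGER
);
""")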

To abide by ethics requirements and increase participation by ensuring anonymity, each participant was assigned a unique session ID using PHP. To achieve this, upon submission of the first form, the webpage was designed to assign a set of random numbers and characters to the user. However, purely random generation of this kind can produce duplicates. To ensure the uniqueness of each submission, the session ID was prefixed with a web-generated timestamp of submission.
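The live site implemented this in PHP; the following Python sketch only illustrates the underlying idea of a timestamp-prefixed random identifier and is not the production code.

import random
import string
from datetime import datetime

def make_session_id(suffix_length: int = 8) -> str:
    # The timestamp prefix keeps IDs ordered and practically unique even if the
    # random suffix were ever repeated.
    timestamp = datetime.now().strftime("%Y%m%d%H%M%S")
    suffix = "".join(random.choices(string.ascii_letters + string.digits, k=suffix_length))
    return f"{timestamp}-{suffix}"

print(make_session_id())  # e.g. "20200815143015-k3Fz9QaB"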

Figure 7: Entity Relationship Diagram (ERD)


Diagram summarising entities that compile the database and the relationships between the
tables (1:1)

Testing was done incrementally throughout the web creation process. Each webpage
was confirmed to be fully operational and interactive before moving onto the next. In accordance
with the chosen Agile development method, usability testing only began when the survey and
both PS tests were complete. This was pivotal in the creation of the website, due to the fact that
unlike most experiments, there is no supervision and users are not able to ask questions prior
to commencing.
5.10. Functional Testing
Functional testing is a crucial component to the development of any software. This step
is in alignment with Agile development, which focuses on the end user of the software. It is
important to revisit the objectives at hand and ensure the approach to functional testing covers
as many aspects of the project as possible. To ensure compliance with this step in the
development process, a few different methods were applied as follows.

5.10.1. Usability Testing
Usability testing plays a highly important role in any project involving software and web
design. It allows developers to uncover design problems and opportunities for improvement.
Usability testing also reveals aspects related to the behaviour and preferences of the user. The
Agile method particularly promotes this by ensuring that the project plan is designed
around the end user. With an experiment of this nature, it was imperative to receive and
implement user feedback.

There were three instances of usability testing: the first two involved interface testing
and the third comprised a full system test. Interface testing helps to understand user trends and
implement timely changes throughout the development process. The first instance arose in
Sprint Four (Table 4). At this stage of the project, three webpages were fully operational,
including database connectivity. A group of five people were selected to participate and provide
their feedback. At this stage, no programming knowledge was required, however, it is important
to note that the selection of people was based on the target audience and only appropriate and
knowledgeable users participated. The test sample population was chosen via the convenience
method in all three instances. The ages of the participants ranged from 19 to 55, two of which
were female and the remaining three were male. The second instance appeared in Sprint Five
(Table 4) and was purely centred on the Python work task portion of the survey. This test
evolved significantly during creation, due to its important role in our measurement of work task performance. User interpretation and clarity were pivotal. To emulate the potential sample as
closely as possible, three male participants were involved, with programming experience
varying from 1 year to 30 years. The third and final instance of usability testing was a full system
test in Sprint Six (Table 4), upon completion of the full web design. There were five participants,
four male and one female, with programming experience varying from 3 to 30 years. This was
an evaluation of the complete system as it was intended to be deployed during the experiment.
User feedback was positive in all instances, which was encouraging. Test and website
comprehensiveness were also confirmed. Certain suggestions were made to promote clarity
and general usability of the form (see Table 5). Additionally, the use of traditional PS test
instructions proved to be outdated and required updating. While requesting feedback and noting
how users were testing the website, changes were also implemented to the demographic
collection, including more questions related to user programming experience and the device
used to take the test.

Table 5: User Feedback


User feedback collected throughout all testing phases, and actions taken.
Webpage: Summing to 10
Feedback: Although the test had a “practice” section, users were confused as to what the required task was and requested further clarity in the instructions.
Action: The instructions were reworded to promote clarity. The spacing between the pairs of numbers was increased to make the table of numbers easier to read.

Webpage: Finding As
Feedback: Initially, the page “markers” used to toggle between pages were boxes that were designed to stay highlighted when on the page. However, users were confused by this design.
Action: The markers were completely redesigned into page buttons that were more visually appealing and familiar to users.

Webpage: Finding As
Feedback: Although explicit throughout the instructions, certain users did not realise there were three pages, which affected their scores.
Action: The instructions were clear and so no changes were made. However, the number of words per page was reduced. This was to ensure that, regardless of the device in use, the users could see the page buttons.

Webpage: Finding As
Feedback: One of the users was able to finish the test before the timer ran out.
Action: The number of words was increased substantially, so as to ensure that it would be impossible to finish the test.

Webpage: Work Task
Feedback: Users reported difficulty reading the code.
Action: Decided to emulate the popular Python coding platform Anaconda. This enhanced similarity to an actual work task.

Webpage: Form
Feedback: No user feedback. This change arose based on review and reflection throughout the testing process.
Action: Throughout the usability testing, increased demographic collection needs became evident. More questions were added in regard to users’ programming experience, as well as information on the device they chose to take the test.

5.10.2. Software Integration and Unit Testing


As previously illustrated in the software methodology, incremental testing was
implemented throughout the project. Unit testing was done locally to guarantee the website was
operating properly, followed by integration testing. Integration testing guaranteed seamless
webpage connectivity and functionality. As indicated in Table 5 above, it enabled bug detection
and expedited final testing.
5.10.3. Further Website Functionality
To guarantee a well-rounded analysis, additional local testing was implemented to
understand how specific actions might appear in the collected data.

Table 6: Functionality Tests


Overview of various functionality tests carried out.
Action: Clicking all the words (or as many as possible before the test times out) in the Finding As Test
Outcome: It would still be impossible to complete the test before the timer runs out, and this would immediately raise a flag that no perception was employed. A high score could be achieved with a correspondingly high error rate, indicating that the test was falsified (e.g. 74 correct – 183 incorrect).

Action: Exiting the test midway
Outcome: If the user exits before submitting all four forms, the database only collects partial data for that participant. This is part of the reason the database contains one table per form. Are we able to conclude the user’s reason for quitting based on their submitted demographics? Did they exit before Test 1 or after? Perhaps the user exited after the programming challenge. This is all data worthy of further speculation and examination.

Action: Clicking the back/refresh button
Outcome: Due to the nature of the webpage, it was impossible to prevent the user from doing this, meaning they were able to “reattempt” one of the tests if they chose to. This is a limitation of administering the test at a distance. However, an alert was in place to warn the user that any submitted data would be lost and they would have to restart the task completely. In the cases where a user opted to refresh the page and redo the task, only one submission was retained.

Action: Trying to submit the form without providing consent or completing the demographics
Outcome: Impossible: using jQuery form validation, to ensure data collection, the user is unable to submit the form without completing the mandatory fields (i.e. all of them apart from the e-mail address, which was collected only if the user desired to see his or her results).
6. Data Analysis: Results and Evaluations
Firstly, this section will present the sample population, followed by the scoring methods
employed for each test. Secondly, the descriptive PS scores are presented as well as a linear
regression to further analyse the results. Explanations for every aspect of the results are
presented, when possible.

6.1. Sample population

There was a total of 48 participants in the experiment, comprising 33 males, 12 females and 3 other. The majority of the participants were students, which is reflected in the age distribution shown in Figure 8: over 75% of the respondents were under the age of 32. Nonetheless, despite the lack of variety within the age variable, the range of experience varied significantly. With respect to general coding experience, 21% had under a year, whereas 42% had more than four years; the remainder of the participants fell between these two groups, with two to three years of experience (see Figure 9). Despite their general coding experience, the participants were less experienced Python users: 44% had used Python for less than a year, 27% were more familiar and had been using it for two to three years, and the remaining 29% had been using it for more than four years (see Figure 10).

An additional variable to note is the device used to complete the survey (see Figure 11): 60% of respondents chose to use a desktop or laptop computer with an external mouse. Although the survey indicated that this method was recommended, 27% still chose to complete it on their mobile device and the remaining 13% used their laptop trackpad.
Figure 8: Ages of sample population
Histogram chart plotting age of participants: 34 between the age of 19 and 32, 7 between 32
and 41, 1 between 42 and 51, and 4 over 51.

Figure 9: General Coding experience of sample population


Histogram chart plotting general coding experience of participants: 9 with less than 1 year of
experience, 17 between 2 to 3 years and 20 with over 4 years of experience.
Figure 10: Python Experience of sample population
Histogram chart plotting Python experience of participants: 20 with less than 1 year of
experience, 12 with 2 to 3 years and 14 with over 4 years of experience.

Figure 11: Devices used by sample population


Pie chart plotting devices used: 29 participants used an external mouse, 11 used a mobile
device and 6 used the trackpad on their laptop.
6.2. Scoring
Ekstrom's manual (Ekstrom et al., 1976) was again referenced to determine the scores. For the Finding As Test, the number of words marked correctly determined the participant's score. For the Summing to 10 Test, the score was the number of correct answers minus the number of incorrect selections. For the Work Task, due to the nature of the results (explored in sections 6.3 and 6.4.1), the participant's precision and recall were calculated in order to quantify their performance. In general, precision and recall are metrics used to define accuracy in information retrieval; a further application is characterising classifiers in data analytics (EMC Education Services, 2015). In the context of this experiment, where participants were asked to identify errors within the code, precision is the percentage of a participant's selections that correspond to actual syntax errors, and recall is the percentage of the total errors that the participant was able to identify.
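
To make the scoring concrete, the following sketch shows how precision and recall can be computed for a single participant. The function name and the example inputs (sets of clicked line numbers and seeded error lines) are illustrative assumptions, not the actual scoring script used in this study.

# Sketch of the precision/recall scoring used for the Work Task.
# `selected` and `actual_errors` are hypothetical example inputs:
# the lines the participant clicked and the lines that truly contain
# syntax errors, respectively.

def precision_recall(selected, actual_errors):
    """Return (precision, recall) for one participant."""
    correct = selected & actual_errors            # true positives
    precision = len(correct) / len(selected) if selected else 0.0
    recall = len(correct) / len(actual_errors) if actual_errors else 0.0
    return precision, recall


# Example: 9 correct and 4 incorrect selections out of 23 seeded errors
selected = set(range(1, 10)) | {30, 31, 32, 33}   # 13 clicked lines
actual_errors = set(range(1, 24))                 # 23 lines with errors
print(precision_recall(selected, actual_errors))  # -> approx. (0.69, 0.39)

The example values are deliberately close to the sample averages reported below (around 9 correct and 4 incorrect selections).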

6.3. Descriptive Perceptual Speed Scores


Descriptive statistics for each variable are presented in Table 7 below. The total number of responses is indicated, as well as the highest possible score for each test, for reference.

For the Finding As test, the minimum score reported is 23 and the maximum is 78. For the Summing to 10 test, the lowest score is 13 and the highest is 80. The standard deviation for the PS tests is high, meaning the sample population had a wide variation in Perceptual Speed, which is representative of the general population. Furthermore, the average score for each test lies roughly midway between the minimum and maximum values, suggesting that approximately half of the sample population has higher PS and the other half lower PS, further supporting the variety of the sample.

The Work Task standard deviation is significantly lower. Participants found 9 correct answers on average, with a minimum of 3 and a maximum of 20. In contrast to the PS tests, the number of incorrect answers was much more notable in this task. On average, participants clicked on 4 lines of code that did not contain any errors. The minimum number of incorrect answers is 0, and 10 participants finished with 100% precision, which correspondingly increased the average precision figure to 73% for the sample population. The maximum number of incorrect answers is 25 (there was only one instance of this); interestingly, this is higher than the total number of possible correct answers. This explains the lower recall values: the average is 40% for the sample population.
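
For reference, the sketch below shows how descriptive statistics of this kind can be generated with pandas. The file name results.csv and the column names are assumptions made for illustration; the actual export of the experiment database may differ.

# Sketch of how Table 7-style descriptive statistics could be produced.
import pandas as pd

df = pd.read_csv("results.csv")                   # one row per participant

columns = ["PS1", "PS2", "Correct", "Incorrect",
           "TotalAnswers", "Precision", "Recall"]
summary = df[columns].describe()                  # count, mean, std, min, quartiles, max
print(summary.round(2))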

Table 7: Descriptive Statistics

A summary of descriptive statistics for all variables.

                 PS1      PS2      Correct   Incorrect   Total Answers   Precision   Recall
Count            48       48       48        48          48              48          48
Mean             50.58    44.92    9.13      3.98        13.10           0.73        0.40
Std              13.79    14.79    4.31      4.88        6.10            0.22        0.19
Min              23       13       3         0           4               0.27        0.13
25%              41.75    35.75    6         1           9               0.62        0.26
50%              51.5     41.5     8         3           12              0.74        0.35
75%              61       51       12        5           16.25           0.91        0.52
Max              78       80       20        25          37              1.00        0.87
Total Possible
Answers          85       121      23        n/a         n/a             n/a         n/a

6.4. Results Analysis and Evaluation

The data analysis was carried out in Python and SPSS, using Analysis of Variance (ANOVA) methods followed by a multiple linear regression to identify trends between the variables.
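
As an illustration of this step, the sketch below fits a comparable multiple linear regression in Python with statsmodels, using recall as the dependent variable. The file name and the DataFrame columns (Score1, Score2, Python, Experience, Recall) are assumptions chosen to mirror the variables reported in Tables 8 to 13, not the exact analysis script used.

# Sketch of the multiple linear regression with recall as the dependent variable.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("results.csv")

X = sm.add_constant(df[["Score1", "Score2", "Python", "Experience"]])
y = df["Recall"]

model = sm.OLS(y, X).fit()
print(model.summary())   # reports R-squared, coefficients and p-values (cf. Tables 10-12)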

6.4.1. Perceptual Speed, Coding Experience and Programming

Previously it was noted that the participants' precision values were more homogeneous across the sample. Although the correlation between precision and the variables is positive, it is lower than the correlation between recall and the variables (see Table 8). To determine whether this positive correlation validates any of the hypotheses, two multiple linear regressions were plotted with precision and recall as the dependent variables (see Table 9 and Table 10 respectively). As the correlations predict, the model is not a good fit for the data when precision is the dependent variable (R = 0.37, R² = 0.135; see Table 9), meaning that precision does not vary substantially with the independent variables. The model performs significantly better when using recall as the dependent variable (R = 0.61, R² = 0.371; see Table 10); therefore, further analysis was done on this regression. Unfortunately, the coefficient statistics are not indicative of a strong relationship between most of the variables and recall (Table 12): Python experience, general experience and Finding As scores all fail to reach significance (p greater than 0.05). However, Table 12 and Table 13 show that the Summing to 10 score is highly significant (p value = 0.008), meaning there is strong evidence of a correlation between the scores on that test and work task performance. This partially validates the presented hypothesis and indicates a relationship between PS and work task accuracy.

Table 8: Correlation between Work Task Precision/Recall and other variables

Correlation between recall and the variables is more significant than that of precision.

             Score 1:      Score 2:
             Finding As    Summing to 10    Age         Experience    Python
Precision    0.14986       0.243709         0.010680    0.111579      -0.154011
Recall       0.45421       0.573631         -0.20243    -0.00874      0.174125

Python programming, especially within the machine learning space, relies heavily on functions: data in the form of parameters are passed into a function to produce results. The deductive reasoning involved in the Summing to 10 test resembles the reasoning required when working with such functions, which could explain the strong correlation between the two variables.

Conversely, general coding experience shows a negative coefficient (Table 13). In other words, the more experienced the individual, the lower their performance on the presented work task. In addition, age shows a negative correlation with work task recall (Table 8). Interestingly, this opposes the presented hypothesis that experience plays a greater role in work task performance than PS. These findings can be explained by a few factors. When learning Python, a lot of importance is placed upon syntax and clean code. An individual with less experience might identify errors of this nature more easily because of their more recent familiarity with these requirements. This counter-intuitive finding may also indicate that, after reaching a certain level of experience, these mundane errors (somewhat like common grammatical errors in any spoken language) are no longer made or encountered; therefore, the ability to spot them decreases. Moreover, the intuitive software that experienced users rely on to write code often identifies or auto-completes basic syntax errors (much like Microsoft Word does when writing in most languages). Perhaps these findings identify a reliance on the coding software itself that is exhibited by more seasoned coders.

Table 9: Model Summary: Precision

A summary of the indicators of model fit with precision as the dependent variable: not a good fit for the data (R = 0.37).

Model    R         R Square    Adjusted R Square    Std. Error of the Estimate
1        0.368a    0.135       0.055                0.21251

a. Predictors: (Constant), Experience, Score2, Python, Score1

Table 10: Model Summary: Recall

A summary of the indicators of model fit with recall as the dependent variable: a better fit (R = 0.61).

Model    R         R Square    Adjusted R Square    Std. Error of the Estimate
1        0.609a    0.371       0.313                0.15535

a. Predictors: (Constant), Experience, Score2, Python, Score1

Table 11: ANOVA

Analysis of Variance: variation between treatments and residual (error) variation.

Model            Sum of Squares    df    Mean Square    F        Significance
1  Regression    0.613             4     0.153          6.351    .000b
   Residual      1.038             43    0.024
   Total         1.651             47

a. Dependent Variable: Recall

b. Predictors: (Constant), Experience, Score2, Python, Score1
Table 12: Coefficients

The coefficient table outlines the significance of each variable to overall work task recall: Score 2 (p = 0.008) shows a high significance. B and Std. Error are unstandardised coefficients; Beta is the standardised coefficient.

Model            B         Std. Error    Beta       t         Significance
1  (Constant)    0.033     0.101                    0.327     0.746
   Python        0.028     0.020         0.197      1.43      0.160
   Score 1       0.002     0.002         0.162      0.978     0.334
   Score 2       0.006     0.002         0.457      2.791     0.008
   Experience    -0.025    0.023         -0.152     -1.084    0.284

a. Dependent Variable: Recall

Table 13: P-Values

The p-value indicates how significant each coefficient is (significant if p < 0.05): Score 2 is highly significant; Experience shows a negative coefficient.

Variable      Coefficient    p-value
Constant                     0.746
Score 1       0.162          0.334
Score 2       0.457 ***      0.008
Python        0.197          0.160
Experience    -0.152         0.284

R² = 0.371    N = 46

6.4.2. Effect of device on overall Performance

This variable proved to be very conclusive and fully supported the hypothesis. On average, the PS scores of participants who used mobile devices were lower than those of participants using a laptop or desktop computer with an external mouse (see Figure 12 and Figure 13). These results validate the theory presented: using an external mouse provides a better representation of an individual's PS. Similarly to Allen's experiment (Allen, 1994), it seems that although the instructions remain the same across devices, high Perceptual Speed scores are only achieved when the participant uses a system with an external mouse. It also validates the hypothesis that other variables have a greater correlation with work task performance than PS. This phenomenon, illustrated in Figure 12 and Figure 13, could be explained by the fact that reacting quickly is easier with a mouse, as it is the closest to holding a pen or pencil.

Figure 12: Finding As Score by Device


The average score of participants that used an external mouse is significantly higher than
those who did not.

Figure 13: Summing to 10 Score per Device


The average score of participants that used an external mouse is significantly higher than
those who did not.
Figure 14: Work Task Precision and Recall per Device
The average work task precision score of participants that used a mobile device is
comparable to that of those who used an external mouse. The average work task recall score
of participants that used a mobile device is notably lower than those who used an external
mouse.
The results of the programming work task per device present slightly different findings (Figure 14). It seems that the participants who used a mobile device were more accurate (higher precision); however, they were able to complete less of the test (lower recall). As the web version of the test comprises a great deal of information on a single page, the user is required to scroll in order to read the full code and navigate to the next page (Figure 15). On a mobile device, reading the code requires the user to zoom in, which enlarges the characters (Figure 16). Therefore, in order to read and view the full PS test, mobile users had to navigate around the page, which cost valuable seconds and meant they were unable to complete as much of the test as their desktop counterparts. Conversely, this may have worked in their favour in the Work Task. In the programming world, it is well known that no matter how experienced the programmer is, reading someone else's code is always mentally taxing and can prove difficult; in other words, it is always easier to write your own code than to review someone else's. A simple Google search on the topic confirms that there are a multitude of blog posts and articles with recommendations and techniques to assist in deciphering another programmer's code. When taking the desktop version of the test, the work task opens immediately onto 21 lines of code (see Figure 15). Although the example outlines the presentation to expect, respondents reported being intimidated by the scale of the task, which further validates the difficulty of reviewing code written by others. When using a mobile device, the full test is presented on screen when navigating to the task, but the characters are too small to read. To complete the task, the user must zoom in (see Figure 16), which may help to pinpoint errors. Unfortunately, this is speculation and more research on the matter is required. Another aspect to note with respect to the Work Task is that programmers generally use a desktop computer (often with multiple monitors) to write, read and edit code. Therefore, completing the test with this type of set-up best emulates a real-world work task and could explain the spike in performance.

Figure 15: Page 1 of the administered programming work task (Desktop Site)
After clicking “start”, participants are navigated to this page and need to scroll down in order to
read the full code and toggle to the next page.
Figure 16: Page 1 of the administered programming work task (Mobile Site)
After clicking “start”, participants are navigated to this page and need to zoom in to read the full
code.
7. Conclusion
This section provides a brief overview of the initial research objectives and outcomes. A discussion highlighting the limitations and difficulties encountered throughout the study, as well as its various achievements, is also presented. Finally, suggestions are put forward for future research, accompanied by recommendations for replication attempts.

7.1. A relatively unexplored domain


Existing research on Perceptual Speed is limited in comparison to other fields of research. Applications and studies are centred on its relation to information retrieval, skill acquisition and the decline of PS with age. The use of PS within general hiring practices, let alone within STEM fields, has not yet been questioned or explored. This experiment was the first of its kind, which helps to explain the high number of limitations encountered.

7.2. Discussion: Limitations and Recommendations for further experimentation

Data collection proved to be the most challenging task in this experiment. The low number of respondents translates to low accuracy in the regression models and reduces the validity of generalising from the sample population to the general population. Furthermore, although there was extensive usability testing to ensure ease of website navigation and content flow, several users did not complete the test. Out of 63 people who completed the preliminary survey, only 48 finished the entire test; a 24% dropout rate is extraordinarily high and discouraging. Of the incomplete tests, 5 participants quit after the Form, 9 after the Finding As Test and 1 after the Work Task. This limitation is directly related to administering the test online: the lack of communication with, and feedback from, participants as to why they were unable or unwilling to complete the full survey is a missing key variable. However, it is worth noting that this validates the web page and database structures used for this experiment. Had the form submission not been incremental throughout the survey, this limitation would never have been identified, which would leave room for error in future research. Administering the experiment to a group of employees, or even a controlled group of individuals in a laboratory or workshop setting, would help to ensure completion of the test as well as facilitate feedback collection. If collaboration with an organisation or access to a workspace is not feasible or preferred, administering the survey online is a very useful way to broaden the scope of the research. However, due to lack of time and resources, online testing could not be used to its full potential. Had the survey been promoted with some sort of incentive, this would most likely have increased both the participant pool and the completion rate.

A further limitation of this experiment was the sample itself. The population was primarily male, with only 25.5% females and 6.3% other. Unfortunately, the programming industry remains predominantly male: the 2018 Women in Tech Report found that only 2,000 out of the 14,000 programmers surveyed were women (McDowell, 2018). Furthermore, although the University of Strathclyde is actively trying to decrease gender imbalances within the STEM fields, the 2017 Gender Action Plan indicates that over 75% of STEM students were male (Strathclyde, 2017). This is a three-year action plan; therefore, more recent data are not currently available. The gender gap undisputedly continues, which helps to explain the lack of female participants. As a result, it was impossible to perform an analysis based on gender. Using gender as a variable could prove to be very interesting; this could be achieved through communication with a female coding society, or by targeting a STEM industry with a higher number of female contributors, which would most likely increase female participation and allow for a more balanced gender-based analysis.

Another aspect to note is that 68.1% of participants were below the age of 32. The demographic collection could have been more precise to allow for further speculation on the correlation between age and PS. Large age ranges were used for privacy reasons, to protect participants' identities as much as possible and avoid discouraging them from proceeding. However, it is believed that protecting a respondent's anonymity does not need to extend to age, especially as there is an increased understanding of the necessity of this variable when conducting research. Participants were questioned after completing the test (when possible) and reported no issue with providing a more precise age. With more precise demographic collection regarding participants' age and experience, there could have been more variety in the visualisations, e.g. plotting a histogram or scatter plot per variable to identify trends, rather than the simple bar charts used in this dissertation.

A further recommendation regarding demographic collection would be a greater focus on participants' programming skillsets, in order to collect better-defined variables. For example, instead of asking about general Python experience, the survey could be more specific and ask how much Python was used in the past six months. Moreover, it may be interesting to understand the nature of participants' work in the field. It is also recommended to collect feedback from users after completion whenever possible. Although convenience sampling is not always preferred in research due to the possible increase in bias, it worked positively for this experiment by facilitating respondent feedback, which provided valuable insight into the study's limitations.

Although the correlation between Perceptual Speed (Summing to 10) and work task
recall is evident, no definite conclusions on this matter can be drawn from this experiment. It
could be beneficial to further explore the influence of the type of PS test administered. STEM
professionals might respond better to locating symbols as opposed to comparing them.

With respect to the programming challenge itself, feedback was collected from one of the more experienced programmer participants. They indicated that, due to the programming software currently available and in use, manually spotting syntax errors is rarely necessary: the software can either auto-complete the code or its output indicates exactly where the error is. This aligns with the negative correlation between experience and performance on the work task. It further validates Ackerman's previous research on PS and skill acquisition, which presented that the importance of PS increases in the first two phases of skill acquisition and then declines in the third phase, where psychomotor ability takes precedence (Ackerman and Cianciolo, 2000). To better emulate real-world performance, the coding task could instead involve reading and interpreting a piece of code and describing what its output would be. Such a task still involves PS and would provide a better indication of an individual's coding ability; however, the test would have to be administered to specific skill levels.

Additionally, to ensure continuity within the results, a further possibility would be to


impose the use of an external mouse and desktop or laptop computer. This method best
emulates a traditional paper-pencil test and contributes to a better score across all tests. Should
a mobile version be preferred to further expand the reach of the experiment, it is recommended
to create a separate mobile application. This would remove the device variable and ensure the
design of the test is suitable for the device.

7.3. Summary and Final Remarks


To conclude, the results of this experiment do not confirm the validity of Perceptual Speed in the STEM hiring process. Nevertheless, this dissertation did identify a variety of findings which will assist future replication in an evolved form. For the designed programming work task, the recall values were better suited to linear regression. Results of the Summing to 10 PS test indicated the strongest correlation with the collected data, which invites future research on PS test types and their use. Experience showed a negative correlation with work task performance, which can be explained by the nature of the exercise; providing experienced programmers with a more analytical work task would produce more conclusive results. Finally, the device and set-up used for completion are critical to the end results, and in a more controlled environment further conclusions could be drawn.

The Agile methodology employed facilitated a flexible project and software development plan. Adaptability and incremental feedback were pivotal to the deployment of the website and the completion of the project. Establishing continuous communication with other stakeholders ensured that the dissertation did not stray from its topic and contributed to its timely completion.
8. References
Ackerman, P. L. and Beier, M. E. (2007) ‘Further explorations of perceptual
speed abilities in the context of assessment methods, cognitive abilities, and individual
differences during skill acquisition.’, Journal of Experimental Psychology: Applied,
13(4), pp. 249–272. doi: 10.1037/1076-898X.13.4.249.
Ackerman, P. L. and Cianciolo, A. T. (2000) ‘Cognitive, perceptual-speed, and
psychomotor determinants of individual differences during skill acquisition.’, Journal of
Experimental Psychology: Applied, 6(4), pp. 259–290. doi: 10.1037/1076-898X.6.4.259.
Ackerman, P. L. and Ellingsen, V. J. (2016) ‘Speed and accuracy indicators of
test performance under different instructional conditions: Intelligence correlates’,
Intelligence, 56, pp. 1–9. doi: 10.1016/j.intell.2016.02.004.
Albert, B., Tullis, T. and Tedesco, D. (2010) Beyond the Usability Lab. Elsevier.
doi: 10.1016/C2009-0-19827-6.
Allen, B. (1994) ‘Perceptual speed, learning and information retrieval
performance’, in. Springer-Verlag, pp. 71–80.
Andrew, D. M., Paterson, D. G. and Longstaff, H. P. (1979) Manual for the
Minnesota Clerical Test. New York: The Psychological Corporation.
Beck, K. et al. (2001) Manifesto for agile software development.
Ekstrom, R. et al. (1976) Manual for Kit of Factor-Referenced Cognitive Tests.
Princeton, NJ: Educational Testing Service.
EMC Education Services (2015) ‘Advanced Analytical Theory and Methods’, in
EMC Education Services (ed.) Data Science & Big Data Analytics. Indianapolis, Indiana:
John Wiley & Sons, Inc., pp. 191–231. doi: 10.1002/9781119183686.ch7.
Fitts, P. and Posner, M. I. (1967) Human Performance. Belmont, CA:
Brooks/Cole.
Foulds, O., Azzopardi, L. and Halvey, M. (2020) Reflecting upon Perceptual
Speed Tests in Information Retrieval: Limitations, Challenges, and Recommendations.
Gael, S., Grant, D. L. and Ritchie, R. J. (1975) ‘Employment test validation for
minority and nonminority clerks with work sample criteria.’, Journal of Applied
Psychology, 60(4), pp. 420–426. doi: 10.1037/h0076908.
Ghisletta, P. and Lindenberger, U. (2003) ‘Age-Based Structural Dynamics
Between Perceptual Speed and Knowledge in the Berlin Aging Study: Direct Evidence
for Ability Dedifferentiation in Old Age.’, Psychology and Aging, 18(4), pp. 696–713. doi:
10.1037/0882-7974.18.4.696.
Held, J. D., Carretta, T. R. C. and Rumsey, M. G. (2014) ‘Evaluation of Tests of
Perceptual Speed/Accuracy and Spatial Ability for Use in Military Occupational
Classification’. doi: 10.1037/mil0000043.
Henly, S. J. et al. (1989) ‘Adaptive and Conventional Versions of the DAT: The
First Complete Test Battery Comparison’, Applied Psychological Measurement, 13(4),
pp. 363–371. doi: 10.1177/014662168901300403.
Johnson, J. F. et al. (2017) ‘Predictive Validity of Spatial Ability and Perceptual
Speed Tests for Aviator Training’, The International Journal of Aerospace Psychology,
27(3–4), pp. 109–120. doi: 10.1080/24721840.2018.1442222.
Johnson, W. and Deary, I. J. (2011) ‘Placing inspection time, reaction time, and
perceptual speed in the broader context of cognitive ability: The VPR model in the
Lothian Birth Cohort 1936’, in Intelligence, pp. 405–417. doi:
10.1016/j.intell.2011.07.003.
Lohman, D. F. (1979) Spatial Ability: A Review and Reanalysis of the
Correlational Literature. Stanford, CA: Stanford University.
Lubinski, D. (2010) ‘Spatial ability and STEM: A sleeping giant for talent
identification and development’, Personality and Individual Differences, 49(4), pp. 344–
351. doi: 10.1016/j.paid.2010.03.022.
McDowell, G. L. (2018) 2018 Women in Tech Report, HackerRank. Available
at: https://research.hackerrank.com/women-in-tech/2018.
Mount, M. K., Oh, I.-S. and Burns, M. (2008) ‘Incremental Validity of Perceptual
Speed and Accuracy over General Mental Ability’, Personnel Psychology, 61(1), pp.
113–139. doi: 10.1111/j.1744-6570.2008.00107.x.
O’Neill, M. J. (2016) Measuring Workplace Performance. Second Edi. Edited by
Micheal J O’Neill. CRC Press. doi: 10.1201/9781420006131.
Parks, S. et al. (2001) Developing a computerized test of perceptual/clerical
speed, Computers in Human Behavior. doi: 10.1016/S0747-5632(00)00031-5.
Rumpe, B. (2017) Agile Modeling with UML. Cham: Springer International
Publishing. doi: 10.1007/978-3-319-58862-9.
Sincero, S. M. (2012) Methods of Survey Sampling. Available at:
https://explorable.com/methods-of-survey-sampling.
Strathclyde, U. of (2017) ‘University of Strathclyde Gender Action Plan’.
Available at: https://www.strath.ac.uk/media/ps/sees/equality/Gender_Action_Plan.pdf.
Thomas International Ltd (2020) General Intelligence Assessment (GIA).
Available at: https://www.thomas.co/general-intelligence-assessment-gia.
