Received: 7 September 2018 | Revised: 15 October 2019 | Accepted: 6 December 2019

DOI: 10.1002/hfm.20831

Assessment and monitoring of mental workload in subway train operations using physiological, subjective, and performance measures

Mohammad‐Javad Jafari¹ | Farid Zaeri² | Amir H. Jafari³ | Amir T. Payandeh Najafabadi⁴ | Saif Al‐Qaisi⁵ | Narmin Hassanzadeh‐Rangi⁶

¹ Safety Promotion and Injury Prevention Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
² Department of Biostatistics, Faculty of Paramedicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
³ Medical Physics & Biomedical Engineering Department, School of Medicine, Tehran University of Medical Sciences, Tehran, Iran
⁴ Department of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran
⁵ Department of Industrial Engineering and Management, American University of Beirut (AUB), Beirut, Lebanon
⁶ Department of Occupational Health and Safety Engineering, School of Public Health and Safety, Student Research Committee, Shahid Beheshti University of Medical Sciences, Tehran, Iran

Correspondence
Narmin Hassanzadeh‐Rangi, Department of Occupational Health and Safety Engineering, School of Public Health and Safety, Student Research Committee, Shahid Beheshti University of Medical Sciences, 7th Floor, Bldg No. 2, SBUMS, Arabi Ave, Daneshjoo Blvd, Velenjak, Tehran 19839‐63113, Iran.
Email: narminhassanzadeh@sbmu.ac.ir

Abstract
Subway train operation is a complex sociotechnical system that involves a variety of cognitively demanding tasks. Train operators are responsible for continuously monitoring the surrounding environment, maintaining awareness, processing information, and making decisions under risk. The resulting mental strain can negatively affect operators' performance and the interaction of the human–machine system. The objective of this study was to evaluate whether physiological, subjective, and performance measures could identify the level of mental workload arising from routine and nonroutine operations in the subway system. A total of 11 subway train operators completed different driving scenarios in a high‐fidelity simulator. The simulated tasks were divided into two categories: routine operations (preparing to drive and driving between stations without interruptions or emergencies) and nonroutine operations (responding to a tunnel fire, dealing with a high density of passengers, encountering a passenger/technician on the track, and responding to train failure). Mental workload in these tasks was monitored and evaluated using an electrocardiogram, subjective self‐rating scales, and driving performance. Both heart rate variability and performance measures (including reaction time and error rate) detected mental workload variations between the operations. In contrast, the subjective ratings (including NASA‐TLX) assessed the overall mental workload associated with a task without capturing variations in mental demand within the task over time. Subway train drivers experienced different levels of mental workload during routine and nonroutine driving conditions. The findings of this study can be used to derive mental workload limits for optimizing workload levels during train operations.

KEYWORDS
mental workload, physiological feedback, simulator, subjective measure, subway

Hum. Factors Man. 2019;1–11. wileyonlinelibrary.com/journal/hfm © 2019 Wiley Periodicals, Inc.

1 | INTRODUCTION

More studies focusing on human factors and safety in the railway industry have been called for (Kecklund et al., 2000; Rolo, Hernández‐Fernaud, & Díaz‐Cabrera, 2010; Tripathi & Borrion, 2016; Zoer, Sluiter, & Frings‐Dresen, 2014). Such studies are especially needed in underdeveloped systems such as subway networks (Zhao, Tang, & Ning, 2017). Previous qualitative studies on subway systems have mostly focused on technological features and less on human interaction with the system (Hassanzadeh‐Rangi, Khosravi, Farshad, & Jalilian, 2017; Karvonen et al., 2011; Khossravi, Hassanzadeh‐Rangi, & Farshad, 2017). Beyond their obvious tasks, train drivers perform many hidden tasks (Karvonen et al., 2011). They not only drive the train along the track and control its doors at stations but also monitor the surrounding environment, communicate with traffic controllers, and interact with other actors in the subway system (Khossravi et al., 2017). During routine train operations, the driver plays a passive role in driving the train safely between stations: the automatic train protection (ATP) system controls the speed, the safe distance between two trains, and where and when the doors open or close. During nonroutine train operations, the ATP system is off and the driver is fully responsible for driving the train safely between stations (Hassanzadeh‐Rangi et al., 2017).

Several qualitative studies have described train driving as a series of visual, communication, decision, and action tasks performed in a dynamic environment (Hassanzadeh‐Rangi et al., 2017; Karvonen et al., 2011; Khossravi et al., 2017). Train drivers work in a dynamic context that requires continuous monitoring of the surrounding environment, maintaining awareness, cognitive processing, and timely decision making (Khossravi et al., 2017). They acquire information from many sources, such as the trip schedule, rulebook, ATP system, track‐side environment, warning signals, and signs, to provide safe and reliable operation (Karvonen et al., 2011). Under high workload conditions, train drivers may make many errors in implementing safe procedures (Abd Rahman & Md Dawal, 2016; Tripathi & Borrion, 2016). Therefore, human–machine interaction (HMI) must be carefully investigated for the safety of operators and passengers in the subway.

The level of mental workload is one of the primary factors affecting HMIs, including observing and obeying signaling indications and train warning systems, obeying speed limits and other ATP orders, continuously monitoring the surrounding environment, cognitive processing, and timely decision making (Hassanzadeh‐Rangi et al., 2017; Karvonen et al., 2011; Khossravi et al., 2017). Mental workload is complex and multidimensional; it is a function of task load, external provisions, and past skills (Young & Stanton, 2005). Psychophysiological feedback, subjective self‐rating scales, and task performance are among the basic methods for assessing mental workload (Cain, 2007; Young, Brookhuis, Wickens, & Hancock, 2015).

The NASA task load index (NASA‐TLX), the subjective workload assessment technique, and the modified Cooper–Harper (MCH) scale are examples of well‐established subjective self‐rating scales. In these methods, operators rate the work demands themselves on a numerical or graphical scale (Young et al., 2015). Subjective self‐rating scales are highly recommended for assessing modern mental tasks involving judgment and decision making, but they are not strongly recommended for assessing physical tasks involving repetitive or highly learned activities (Cain, 2007).

Physiological measures are objective measures for assessing mental workload. For example, as mental workload increases, heart rate (HR) increases and heart rate variability (HRV) decreases (Mansikka, Simola, Virtanen, Harris, & Oksama, 2016). HR and HRV require less sophisticated instruments and have been shown to be sensitive measures of mental workload in simulated environments (Brookhuis & de Waard, 2010; Mansikka et al., 2016). However, these methods often require attaching physical sensors to the participant, and for this reason HR and HRV are not widely used in practical settings (Chuang, Lin, Shiang, Hsieh, & Liou, 2016).

Performance measures assess the operator’s performance on the task of interest. These measures are useful when job demands exceed the operator’s capacity. Many indices, such as reaction times and error rates, can be used to measure task performance; for example, as mental workload increases, reaction time and the probability of error increase (Schaap, Van Der Horst, Van Arem, & Brookhuis, 2008). Task performance measures have nevertheless been criticized: although they are a good measure of mental workload, on their own they may not suffice to assess the mental state and capacity of operators during work overload (Wilson, 2005).

A combination of physiological and subjective measures has been recommended to reflect the mental workload involved in maintaining task performance (Ryu & Myung, 2005). In other words, psychophysiological feedback, subjective scales, and task performance provide different perspectives that complement one another in the assessment of mental workload.

Although mental workload measurement has been studied for more than 30 years, past studies on train operation have mostly focused on subjective scales rather than on a combination of physiological feedback, subjective scales, and task performance (Pickup et al., 2005). In addition, previous studies have shown that ambient parameters, including noise, illumination, and temperature, affect workload levels (Rolo et al., 2010; Varjo et al., 2015) and should therefore be considered as confounders in experimental studies.

Some past studies have investigated the reliability of a combined approach to assessing mental workload for computerized tasks and simulated conditions in other domains (Cegarra & Chevalier, 2008; Luque‐Casado, Perales, Cárdenas, & Sanabria, 2016; Wanyan, Zhuang, & Zhang, 2014). Luque‐Casado et al. (2016) investigated HRV as a function of cognitive demands (based on NASA‐TLX ratings) in several computerized tasks. They suggested that HRV is highly sensitive to the overall demands of sustained attention, over and above the influence of other cognitive processes suggested by previous literature. Their findings also highlighted a potential dissociation between objective and subjective measures of mental workload, with important implications for applied settings (Luque‐Casado et al., 2016). Cegarra and Chevalier (2008) investigated the reliability of using multiple measures of mental workload in a puzzle‐solving experiment; their results indicate the importance of combining multiple measures to build on the theoretical and methodological foundations of mental workload (Cegarra & Chevalier, 2008). Wanyan et al. (2014) investigated behavioral performance, subjective assessment based on the NASA‐TLX, and physiological measures indexed by electrocardiograph
(ECG), event‐related potential, and eye‐tracking data in a flight simulation. They suggested that their findings can be applied to the comprehensive evaluation of mental workload during flight tasks and to further quantitative classification (Wanyan et al., 2014). However, the reliability of a combined approach in real and new operations such as subway train operation remains unclear. Research is needed to assess the mental workload of train drivers in subway operation using physiological, subjective, and performance measures under controlled conditions. Therefore, this study aimed to assess the mental workload associated with routine and nonroutine subway train operations using physiological, subjective, and performance measures. Specifically, the objective was to evaluate whether these measures could identify the level of mental workload arising from routine and nonroutine operations in the subway system.

2 | METHODS

2.1 | Participants

The participants were train drivers from the Tehran subway system, systematically selected from the train driver population. Because this study focused on routine and nonroutine conditions of subway operations and their effects on workload, we sought to control for the effect of other individual differences on workload, both between and within individuals. Following Wiberg, Nilsson, Lindén, Svanberg, and Poom (2015), two rules were applied to minimize variation between and within individuals. First, to minimize variation between individuals, the following inclusion criteria were required: age of 25–45 years; driving experience of 1–5 years; body mass index (BMI) between 20 and 25; physical and mental health; no drug use; and nonsmoking. Second, to minimize physiological variation within each individual, participants were instructed to satisfy the following conditions before each experimental task: no caffeine consumption within 6 hr before each test; no blood donation within the previous 2 weeks; no exercise on the same day; no alcohol consumption during the previous 24 hr; and at least three consecutive nights of good sleep before the experiment (Wiberg et al., 2015). A demographic form (covering age, driving experience, shift work, education level, smoking, night sleep, second job, and other inclusion/exclusion criteria) was used to collect background information on the participants. Since all drivers working in the Tehran subway system are male, 12 healthy male subway train drivers volunteered to participate in this experimental study. In comparable experimental studies, the number of participants has typically ranged from 9 to 20 (Dorrian, Roach, Fletcher, & Dawson, 2006; Jay, Dawson, Ferguson, & Lamond, 2008; Käthner, Wriessnegger, Müller‐Putz, Kübler, & Halder, 2014; Lamond, Darwent, & Dawson, 2005; Miller, Rietschel, McDonald, & Hatfield, 2011; Wei, Zhuang, Wanyan, Liu, & Zhuang, 2014). One participant was excluded because of drug consumption and poor detection of electrophysiological signals. The final 11 participants (age range, 25–42 years; mean [SD], 33 [1.44] years) read and signed a consent form before the experiment. The experimental procedure was approved by the University’s Research Ethics Committee (Ethics number: IR.SBMU.PHNS.REC.1395.105).

2.2 | Mental workload measurement

2.2.1 | Physiological measures

HR and HRV parameters were recorded to evaluate the mental workload of each experimental task (Fallahi, Motamedzade, Heidarimoghadam, Soltanian, & Miyake, 2016; Mansikka et al., 2016; Ryu & Myung, 2005). A 7‐lead ECG was used to continuously monitor and record HR and HRV. ECG has been widely used to assess mental workload, mainly because it reflects the functioning of the autonomic nervous system (J. Zhang, Yin, & Wang, 2015). Ag/AgCl electrodes were attached to the drivers’ chests to record the ECG. Data were recorded continuously during the experimental tasks with a Holter‐ECG (Beneware CT‐08 model) supported by Biotrace+ software (CardioTrack Holter System; version 1.4.1.5).

2.2.2 | Subjective measures

The NASA‐TLX was administered as a subjective self‐rating scale (Hart & Staveland, 1988). The NASA‐TLX contains six dimensions: mental demand, physical demand, temporal demand, performance, effort, and frustration. Each dimension is scored from a low of 0 to a high of 100. The weighting procedure used to combine the six dimensions into an overall score is described by Hart and Staveland (1988) and Rubio et al. (2004). Each driver was also asked to mark the ASHRAE 7‐point thermal sensation scale (Y. Zhang & Zhao, 2008), the 5‐point subjective loudness rating scale (Williams, Beach, & Gilliver, 2013), and the 5‐point visual comfort scale (Mui & Wong, 2006) to describe their perception of the surrounding environment during each trip.

2.2.3 | Performance measures

Task performance measures, based on reaction times and error rates, were used to assess each driver’s performance. The task performance involved maintaining the train on the track, controlling its doors, and performing other driving tasks, such as visual monitoring, communication, recalling information, timely decision making, and reacting to an emergency. Train driving performance was rated between 0 (worst performance) and 100 (best performance) based on an official checklist rating scale and the output report from the train simulator. Task performance was rated by a qualified train simulator instructor. The simulator
instructors or examiners in the Tehran subway training center routinely evaluate train drivers’ performance to issue driving certificates for subway lines. This type of performance measurement has been applied by Mansikka et al. (2016) for a similar objective.

2.3 | Subway train simulator

The Corys Metro Train Simulator (CMTS) was used for the experimental driving tasks in this study. The CMTS is a full‐motion, high‐fidelity train simulator with a fully functional AC500 train cabin. It replicates subway train driving with high accuracy for both routine and nonroutine operations on Tehran subway lines 1 and 2, and the simulated task is based on actual track data and video for these lines. A simulator instructor monitored and controlled all driving operations from a control room. The CMTS is used for basic and advanced training and for driving license examinations in the Tehran subway system.

2.4 | Study design and procedure

The experimental route was a part of Tehran subway line 1 (39 km, 20 min). Both routine and nonroutine operations were presented to the participants. The routine and nonroutine scenarios were designed according to the train operation rulebooks and the mental workload findings of several past pilot and independent studies (Hassanzadeh‐Rangi et al., 2017; Khossravi et al., 2017). The scenarios are presented in detail in Table 1. To avoid learning effects, the routine and nonroutine conditions were counterbalanced across participants. All train drivers were familiar with the experimental tasks, but none had driving experience on Tehran subway line 1 specifically. The rationale for this selection was that the experimental route would be interesting and demanding enough to elicit mental effort from the participants. The environmental parameters in the simulated cabin were maintained at the same conditions as in the real cabin. The illumination level (in lux), the wet‐bulb globe temperature (in °C), and the noise level (in dBA) were measured to ensure that these parameters were consistent between the real and simulated environments.

The drivers were asked to sit in the simulator chair for 15 min to adapt to the ECG apparatus and to record baseline data; the last 5 min of this rest period were used to record the resting baseline ECG. They also completed the demographic form. To reduce artifacts that could influence the HRV analysis, the drivers were asked to avoid vigorous body movement during the experimental tasks. The participants drove the train along the proposed paths while ECG data were recorded. The subjective rating scales were completed by the drivers after each 10‐min driving period. Consecutive driving scenarios were separated by a 10‐min rest period, during which the simulator was set up for the next scenario. To minimize the learning effect, the presentation order of the experimental trials was randomized among the participants using the Latin square method (Lan, Lian, & Pan, 2010). The simulator instructors evaluated the drivers’ performance based on their task measures (such as reaction time and error rate). To make the driving task more realistic, the drivers were allowed to follow self‐selected techniques and problem‐solving approaches.

TABLE 1 The experimental tasks in both routine and nonroutine scenarios

Task/Condition | Train driver’s required action | Routine scenario | Nonroutine scenario
Preparing to drive | Changing the train’s direction; completing the train dispatching procedure, including train activation, train safety activation, and check‐up; observing and obeying signaling indications and train warning systems. | √ | √
Driving between stations without interruptions or emergencies | Driving the train along the track; obeying speed limits and other ATP orders; monitoring the surrounding environment; observing and obeying signaling indications and train warning systems; stopping the train at a specific area in the stations; opening and closing train doors; entering and leaving stations within the speed limit. | √ | √
Driving between stations with an emergency situation (responding to a tunnel fire) | Driving the train along the track; obeying speed limits and other ATP orders; monitoring the surrounding environment; observing and obeying signaling indications and train warning systems; quickly diagnosing the emergency situation; proper response to the emergency, such as stopping at a safe distance and reporting the emergency; stopping the train at a specific area in the stations; opening and closing train doors; entering and leaving stations within the speed limit. | | √
Dealing with a high density of passengers | Driving the train along the track; monitoring the surrounding environment; fixing an open or incompletely closed door; proper communication with the control room and passengers. | | √
Encountering a passenger/technician on the track | Driving the train along the track; monitoring the surrounding environment; proper response to the emergency situation, such as stopping at a safe distance and reporting the emergency. | | √
Responding to train failure | Reporting an ATP fault to the control room; isolating the ATP; driving the train with the isolated ATP; obeying speed limits; activating the ATP; proper communication with an assistant driver in the rear cabin. | | √
Abbreviation: ATP, automatic train protection.

2.5 | Data processing and analysis

For each participant, the last 5‐min baseline period and two 10‐min segments (one each for the routine and nonroutine driving scenarios) of the ECG records were extracted for further analysis. In comparable experimental studies, analyzed ECG records have typically lasted 3 to 5 min (Fallahi et al., 2016; Mansikka et al., 2016). All artifacts and noise were detected and removed automatically by the CardioTrack Holter System before data analysis. The ECG data were interpreted according to the guidelines of the Task Force of The European Society of Cardiology and The North American Society of Pacing and Electrophysiology (1996), and an internal medicine physician verified the results. The HR/HRV components were analyzed in both the time and frequency domains. The time‐domain parameters were the mean HR, the mean RR interval (mean RR), the standard deviation of the RR intervals (SDNN), the average of the standard deviations of the RR intervals in each 5‐min segment (SDNNIX), the root mean square of the successive differences of the RR intervals (RMSSD), and the percentage of successive RR intervals that differed by more than 50 ms from the previous interval (pNN50). The frequency‐domain parameter was the ratio of low‐frequency to high‐frequency power (LF/HF) (Fallahi et al., 2016). Because individuals differ significantly in their cardiac and subjective responses to different task demands (Mansikka et al., 2016), the physiological and subjective measurements were compared within each participant rather than between participants. The Friedman test was used to compare the HR and HRV parameters across the baseline, routine, and nonroutine operations, and pairwise differences between these conditions were analyzed using the Wilcoxon signed‐rank test.

Subjective and performance measurements were normalized to account for within‐ and between‐individual variation (Wiberg et al., 2015). Differences in the participants’ subjective responses between routine and nonroutine operations, for each NASA‐TLX scale and for the overall NASA‐TLX score, were analyzed using the Wilcoxon signed‐rank test. Performance scores were also compared between routine and nonroutine operations using the Wilcoxon signed‐rank test. The Mann–Whitney test was used to compare the environmental parameters between the simulated and real cabins. Nonparametric (distribution‐free) tests were used because the number of participants was small and the data were not normally distributed (Aprahamian, Martinelli, Neri, & Yassuda, 2010; Javorka, Javorková, Tonhajzerová, Calkovska, & Javorka, 2005). The data were analyzed using IBM SPSS (version 21; IBM Corp., Armonk, NY). GraphPad Prism (version 5; GraphPad, San Diego) was used to test for significant differences in the physiological and subjective variables between the routine and nonroutine operations. The α level was .05.

3 | RESULTS

Comparisons of the environmental parameters showed no significant difference between the simulated and real cabins in terms of illumination level (62.70 ± 23.94 vs. 69.70 ± 9.42 lux; p = .52), wet‐bulb globe temperature (19.50 ± 2.12 vs. 20.50 ± 2.12 °C; p = .66), or noise level (25.75 ± 5.67 vs. 26.0 ± 6.0 dBA; p = .68).

Table 2 presents the means and standard deviations of the physiological parameters in the baseline, routine, and nonroutine conditions. The Friedman test revealed significant differences among the baseline, routine, and nonroutine operations for all HR/HRV parameters (p < .001) except the mean RR (p = .103).

For the mean HR, there was a significant difference between the baseline condition and routine operation (p = .004) but not between routine and nonroutine operation (p = .298). For the mean RR, no significant difference was observed between baseline and routine operation (p = .373) or between routine and nonroutine operation (p = .266). Significant differences between baseline and routine operation, and between routine and nonroutine operation, were found for the SDNN (p = .003 and p = .005), SDNNIX (p = .003 and p = .014), RMSSD (p = .002 and p = .005), and pNN50 (p = .003 and p = .013). The LF/HF ratio differed significantly between routine and nonroutine operation (p = .041), whereas its increase from baseline to routine operation was not significant (p = .051).

The changes in the physiological parameters between the two operations are plotted in Figure 1. Most HRV parameters (SDNN, SDNNIX, RMSSD, and pNN50) were significantly lower in the nonroutine operation than in the routine operation, whereas the reduction in the mean RR in the nonroutine operation was not significant. The LF/HF ratio was significantly higher in the nonroutine operation than in the routine operation.

The subjective rating results for the routine and nonroutine conditions are summarized in Table 3. The weighted NASA‐TLX score
and the scores in each dimension (mental demand, physical demand,


temporal demand, performance level, effort level, and frustration
level) were higher during the nonroutine operation than in the
RO×NOb
(p‐value)
routine operation. The results indicated that there was a significant
difference in the mental demand dimension (p = .01) between the
.298
.266
.005
.014
.005
.013
.041
routine operation and nonroutine operation. However, there were no
significant differences in physical demand, temporal demand,
performance level, effort level, frustration level, and weighted
workload between the routine operation and nonroutine operation.
(p‐value)
BC×ROb

The task performance (including reaction time and error rate)


.004
.373
.003
.003
.002
.003
.051 was lower during the nonroutine operation (69.09 ± 6.64) rather than
the routine operation (95.45 ± 4.71). The results indicated that there
was a significant difference in task performance between the routine
operation and nonroutine operation (p < .001).
Abbreviations: HR, heart rate; RMSSD, root mean square of the successive difference of the RR intervals; SDNN, standard deviation of the RR intervals
BC×RO×NOa
(p‐value)

4 | D I S C U SS I O N
.103
<.001

<.001
<.001
<.001
<.001
<.001

Human cognitive abilities and limitations play an important role in


the majority of industrial accidents and driving crashes (Allahyari,
Rangi, Khalkhali, & Khosravi, 2014; Hassanzadeh Rangi, Allahyari,
T A B L E 2 Comparison of physiological variables during the routine and nonroutine operations of the subway driving

Khosravi, Zaeri, & Saremi, 2012; Hassanzadeh‐Rangi, Asghar Farshad,


Nonroutine operation (NO)

Khosravi, Zare, & Mirkazemi, 2014; Khosravi, Asilian‐Mahabadi,


Hajizadeh, Hassanzadeh‐Rangi, & Behzadan, 2014; Khosravi et al.,
2013). There was a need for research to assess the mental workload
664.72 ± 110.94
92.64 ± 16.56

49.18 ± 17.25
49.09 ± 19.42
28.01 ± 15.88
6.01 ± 7.20
4.46 ± 2.36

of train drivers in the subway operation. In the current study, mental demand was manipulated by varying the difficulty of the tasks performed in a high‐fidelity simulator. Previous studies have also shown that ambient parameters, including noise, illumination, and temperature, affect workload levels (Rolo et al., 2010; Varjo et al., 2015); these are confounders that should be considered in an experimental study. The findings of this study showed that the simulated cabin was similar to the real cabin in terms of illumination, temperature, and noise levels. This is one of the first studies to assess the mental workload of train drivers in subway operation under a controlled environment.

Past studies have emphasized the need to use a combination of physiological, subjective, and performance measures to assess the mental effort of users in multitask jobs (Young et al., 2015). Therefore, in this study, a combination of measures, including HR/HRV, NASA‐TLX, and task performance, was used to evaluate mental workload during routine and nonroutine train operations. Because this study used a within‐subject repeated‐measures design, concerns related to variations between and within individuals were minimized. The ECG was selected in this study for acquiring physiological signals because it has high face validity and is not intrusive in any way (Mansikka et al., 2016).

The findings of this study showed that subway train drivers experienced different levels of mental workload under routine and nonroutine train operations. Changes in work demand among the baseline, routine, and nonroutine conditions were detected by the following HRV parameters: SDNN, SDNNIX, RMSSD, PNN50, and LF/HF. In addition, the mental demand dimension of the NASA‐TLX and the task performance measures revealed differences in work demand between the routine and nonroutine operations.

Variable Baseline condition (BC) (mean ± SD) Routine operation (RO) (mean ± SD)
Mean RR (ms) 673.45 ± 107.03 671.63 ± 108.27
Mean HR (1/min) 89.00 ± 17.09 91.82 ± 16.46
SDNN (ms) 78.63 ± 20.55 65.54 ± 16.47
SDNNIX (ms) 75.45 ± 20.27 64.18 ± 17.05
RMSSD (ms) 38.90 ± 16.94 33.27 ± 15.76
PNN50 (%) 10.29 ± 8.37 8.45 ± 7.15
LF/HF ratio (‐) 2.43 ± 3.37 3.02 ± 3.01
(a) Wilcoxon test.
(b) Friedman test.
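The time‐domain HRV parameters discussed in this section (mean RR, mean HR, SDNN, SDNNIX, RMSSD, and PNN50) can all be computed from a series of RR intervals. The Python sketch below uses the conventional definitions described by the Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology (1996). The function name `time_domain_hrv`, the use of the sample standard deviation, the 5‐min default segment for SDNNIX, and the omission of artifact correction are assumptions for illustration. It is not the study's processing pipeline.

```python
import math
from statistics import mean, stdev


def time_domain_hrv(rr_ms, segment_s=300.0):
    """Time-domain HRV indices from RR intervals given in milliseconds.

    Hypothetical helper; segment_s (SDNNIX segment length) is an assumption.
    """
    if len(rr_ms) < 3:
        raise ValueError("need at least three RR intervals")

    # Differences between successive RR intervals (ms).
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]

    # SDNNIX: mean of the SDs of the RR intervals within consecutive
    # segments of segment_s seconds; a trailing partial segment is dropped.
    segment_sds, segment, elapsed_s = [], [], 0.0
    for rr in rr_ms:
        segment.append(rr)
        elapsed_s += rr / 1000.0
        if elapsed_s >= segment_s:
            if len(segment) >= 2:
                segment_sds.append(stdev(segment))
            segment, elapsed_s = [], 0.0

    return {
        "mean_rr": mean(rr_ms),                                   # ms
        "mean_hr": mean([60000.0 / rr for rr in rr_ms]),          # beats/min
        "sdnn": stdev(rr_ms),                                     # ms (sample SD)
        "sdnnix": mean(segment_sds) if segment_sds else math.nan, # ms
        "rmssd": math.sqrt(mean([d * d for d in diffs])),         # ms
        "pnn50": 100.0 * sum(abs(d) > 50 for d in diffs) / len(diffs),  # %
    }


# Hypothetical RR intervals (ms), far shorter than a real recording.
rr = [800, 850, 780, 900, 820]
print(round(time_domain_hrv(rr)["rmssd"], 2))  # → 83.96
print(time_domain_hrv(rr)["pnn50"])            # → 75.0
```

With a recording shorter than one segment, SDNNIX is returned as NaN rather than computed from a partial segment.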

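The LF/HF ratio is the spectral power of the RR tachogram in the low‐frequency band (0.04–0.15 Hz) divided by the power in the high‐frequency band (0.15–0.40 Hz). These are the band limits given by the Task Force (1996). The following is a simplified, hypothetical sketch. It resamples the RR series at an assumed 4 Hz by linear interpolation and uses a plain periodogram computed with a direct DFT, whereas HRV software commonly uses Welch or autoregressive estimates. It should not be read as the spectral method used in the study.

```python
import math


def lf_hf_ratio(rr_ms, fs=4.0, lf_band=(0.04, 0.15), hf_band=(0.15, 0.40)):
    """LF/HF ratio of an RR-interval series (ms) from a simple periodogram.

    Hypothetical helper; fs (resampling rate) and the plain periodogram are
    assumptions, not the study's method.
    """
    if len(rr_ms) < 3:
        raise ValueError("need at least three RR intervals")

    # Beat times in seconds: each RR interval ends at the cumulative sum.
    times, t = [], 0.0
    for rr in rr_ms:
        t += rr / 1000.0
        times.append(t)

    # Linearly interpolate the RR tachogram onto an even grid at fs Hz.
    n = int((times[-1] - times[0]) * fs)
    if n < 2:
        raise ValueError("series too short for the chosen sampling rate")
    x, j = [], 0
    for k in range(n):
        g = times[0] + k / fs
        while times[j + 1] < g:
            j += 1
        frac = (g - times[j]) / (times[j + 1] - times[j])
        x.append(rr_ms[j] + frac * (rr_ms[j + 1] - rr_ms[j]))

    # Remove the mean (DC component) before the spectral estimate.
    m = sum(x) / n
    x = [v - m for v in x]

    # One-sided periodogram via a direct DFT at bins f = k * fs / n; constant
    # scale factors are omitted because they cancel in the ratio.
    lf_power = hf_power = 0.0
    for k in range(1, n // 2 + 1):
        f = k * fs / n
        in_lf = lf_band[0] <= f < lf_band[1]
        in_hf = hf_band[0] <= f < hf_band[1]
        if not (in_lf or in_hf):
            continue
        re = sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(x))
        im = sum(v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(x))
        power = re * re + im * im
        if in_lf:
            lf_power += power
        else:
            hf_power += power
    return lf_power / hf_power if hf_power > 0 else math.inf


# Hypothetical RR series (ms): ~0.8 s beats with a strong ~0.1 Hz (LF) and a
# weaker ~0.3 Hz (HF) oscillation, indexed by beat number.
rr = [800 + 40 * math.sin(0.16 * math.pi * i) + 10 * math.sin(0.48 * math.pi * i)
      for i in range(300)]
print(lf_hf_ratio(rr) > 1)  # → True
```

The direct DFT keeps the sketch dependency‐free; for long recordings an FFT‐based Welch estimate would be faster and less noisy.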
FIGURE 1 Box plots (mean, SD, Min, and Max) of the routine and nonroutine operations for the physiological parameters: (a) Mean HR, (b) mean RR, (c) SDNN, (d) SDNNIX, (e) RMSSD, (f) PNN50, and (h) LF/HF. HR, heart rate; RMSSD, root mean square of the successive differences of the RR intervals; SDNN, standard deviation of the RR intervals.
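The weighted workload score reported for the NASA‐TLX follows the procedure of Hart and Staveland (1988). Each of the six subscales is rated on a 0–100 scale. The 15 pairwise comparisons between subscales give each subscale a weight of 0–5. The weighted workload is the sum of rating × weight divided by 15. A minimal Python sketch is shown below; the function name and the example ratings and weights are hypothetical and are not data from the study.

```python
# The six NASA-TLX subscales (Hart & Staveland, 1988).
SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")


def nasa_tlx_weighted(ratings, weights):
    """Weighted NASA-TLX workload score (hypothetical helper).

    ratings: subscale -> rating on a 0-100 scale
    weights: subscale -> times the subscale was chosen in the 15 pairwise
             comparisons (each tally is 0-5 and the tallies sum to 15)
    """
    if set(ratings) != set(SUBSCALES) or set(weights) != set(SUBSCALES):
        raise ValueError("ratings and weights must cover exactly the six subscales")
    if any(not 0 <= w <= 5 for w in weights.values()) or sum(weights.values()) != 15:
        raise ValueError("weights must be 0-5 each and sum to 15")
    return sum(ratings[s] * weights[s] for s in SUBSCALES) / 15


# Hypothetical ratings and pairwise-comparison tallies for one driver.
ratings = {"mental": 80, "physical": 50, "temporal": 70,
           "performance": 70, "effort": 90, "frustration": 20}
weights = {"mental": 5, "physical": 1, "temporal": 3,
           "performance": 2, "effort": 4, "frustration": 0}
print(round(nasa_tlx_weighted(ratings, weights), 2))  # → 77.33
```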

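This discussion recommends integrating mental workload measures into a combined measure using different weight coefficients. One way this could be sketched is shown below. Each measure is standardized across conditions (z‐scores). The sign is reversed for measures that fall as workload rises, such as RMSSD. The weighted results are then summed. The function name, the z‐score normalization, the weights, and the example values are all assumptions rather than the study's model. With a repeated‐measures design, the standardization could instead be done within each participant.

```python
from statistics import mean, pstdev


def combined_workload_index(measures, weights, direction):
    """Weighted sum of z-scored measures, one index value per condition.

    measures:  name -> values, one per condition (same order for every measure)
    weights:   name -> weight coefficient (hypothetical; here they sum to 1)
    direction: name -> +1 if higher values indicate more workload, -1 otherwise
    """
    n_conditions = len(next(iter(measures.values())))
    index = [0.0] * n_conditions
    for name, values in measures.items():
        if len(values) != n_conditions:
            raise ValueError(f"{name}: expected {n_conditions} values")
        mu, sd = mean(values), pstdev(values)
        for i, value in enumerate(values):
            z = (value - mu) / sd if sd > 0 else 0.0
            index[i] += weights[name] * direction[name] * z
    return index


# Hypothetical values for three task conditions of increasing difficulty.
example = {
    "rmssd_ms": [40.0, 30.0, 20.0],
    "reaction_time_s": [0.5, 0.6, 0.7],
}
print(combined_workload_index(
    example,
    weights={"rmssd_ms": 0.5, "reaction_time_s": 0.5},
    direction={"rmssd_ms": -1, "reaction_time_s": +1},
))
```

Standardizing first puts measures with different units (ms, s, scale points) on a common footing before the weights are applied.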
The findings of this study and other qualitative studies (Kecklund et al., 2000; Smith, Blandford, & Back, 2009) raise serious concern about the effects of nonroutine train operations (without the ATP system) on drivers' mental workload and performance. Therefore, new technologies, such as the ATP system, appear to play an essential role in the work organization and job design of train operations (Hassanzadeh‐Rangi et al., 2017; Khossravi et al., 2017). Improper consideration of these features in job design can lead to complicated HMI and mishaps. The HMI level and the related workload depend on the level of automation and on the roles of subway train drivers and other traffic controllers in the subway system. These findings confirm that subway train driving is a complex and mentally demanding job.

The current results indicated that the increased work demand of nonroutine operations significantly affects drivers' reaction time and error rate in task performance. Two studies have suggested that traditional train drivers may have more accidents and make more errors in implementing safe procedures under conditions of high workload and alertness demand (Abd Rahman & Md Dawal, 2016; Tripathi & Borrion, 2016).

The findings of this study showed that as mental demand increased during the nonroutine operation, the HRV parameters SDNN, SDNNIX, RMSSD, and PNN50 decreased significantly, whereas the LF/HF ratio increased significantly. These findings are consistent with the literature, which has also shown that the RMSSD, SDNN, and PNN50 decrease significantly as mental demand increases (Cinaz, Arnrich, La Marca, & Tröster, 2013; Fallahi et al., 2016; Mansikka et al., 2016). In the current study, the LF/HF ratio emerged as another sensitive indicator of mental workload variability, consistent with past research showing that the LF/HF ratio increases significantly with mental demand (Cinaz et al., 2013). However, one study reported that the LF/HF ratio is limited in reflecting mental workload changes because it is strongly affected by respiratory fluctuations and by sympathetic and parasympathetic activity (Miyake et al., 2009). This study also showed that increasing mental demand increases the mean HR and decreases the mean RR. However, the mean HR and the mean RR could not differentiate the routine operation from the nonroutine operation, which is consistent with a previous study (Mukherjee, Yadav, Yung, Zajdel, & Oken, 2011). Mean HR has been proposed as an indicator of mental workload in other studies (Fallahi et al., 2016; Mansikka et al., 2016). Previous findings on the mean RR are mixed: some studies have shown a significant reduction in the mean RR (Mansikka et al., 2016; Mukherjee et al., 2011), whereas others have reported a nonsignificant reduction (Cinaz et al., 2013). Overall, the findings of this study support the view that HR and HRV parameters can reveal variations in the task difficulty of subway train operations.

The comparison of subjective mental workload in this study showed that the overall NASA‐TLX score and most of its dimension scores (all except performance and frustration) were higher in the nonroutine operation than in the routine operation; however, only the mental demand dimension showed a significant difference between the routine and nonroutine operations. Past studies have similarly shown that perceived mental workload (self‐rated with the NASA‐TLX) in the rail industry is not sufficiently sensitive to mental workload variability (Larue, Rakotonirainy, & Haworth, 2016). Another study conducted with a train simulator concluded that the NASA‐TLX is sensitive to changes in task difficulty but not to changes over time on task (Haga, Shinoda, & Kokubun, 2002). In other studies, the NASA‐TLX and the integrated workload scale (Pickup et al., 2005) have been used as subjective, retrospective tools for mental workload assessment in the rail industry. The sensitivity of the different dimensions of mental workload appears to depend on the type of operation investigated. For example, mental demand and performance (measured with the NASA‐TLX) were the most sensitive dimensions in flight operations (Lee & Liu, 2003), whereas all NASA‐TLX dimensions except physical demand were sensitive in traffic control operations (Fallahi et al., 2016). Some NASA‐TLX dimensions may not be sufficiently sensitive to mental workload variability because they are subjective and treat mental workload as a holistic, retrospective, and nondynamic construct. Such subjective tools should be used to assess overall perceived mental workload and the collective judgment of job demands; they are not suitable for monitoring mental workload or for studies that need to register online variability in mental workload.

TABLE 3 Comparison of the subjective and performance measures during the routine and nonroutine operations of the subway driving
Variable Routine operation (mean ± SD) Nonroutine operation (mean ± SD) p‐value (a)
Mental demand (NASA‐TLX) 73.82 ± 28.78 83.36 ± 26.38 .01
Physical demand (NASA‐TLX) 53.64 ± 24.40 59.55 ± 22.18 .44
Temporal demand (NASA‐TLX) 69.82 ± 25.75 73.64 ± 22.92 .39
Performance level (NASA‐TLX) 76.36 ± 20.50 71.82 ± 21.94 .35
Effort level (NASA‐TLX) 89.18 ± 18.01 93.18 ± 12.30 .59
Frustration level (NASA‐TLX) 20.91 ± 27.64 19.55 ± 26.31 .68
Weighted workload (NASA‐TLX) 72.50 ± 18.83 75.04 ± 18.09 .55
Abbreviation: NASA‐TLX, NASA Task Load Index.
(a) Wilcoxon test.

This study showed that train drivers' performance was significantly lower during nonroutine operations than during routine operations of subway trains. Therefore, train drivers' performance was confirmed as a sensitive measure of changes in mental demand. A past study has also shown that reaction time and performance accuracy are sensitive measures of mental workload in complex operations (Wei et al., 2014). This study showed that the subjective rating of task performance is a strong candidate when different levels of mental workload are being considered for a train operation environment. In other words, heart rate variability and task performance were associated with mental workload variations. The findings of this study can be used to support the identification of workload limits for train operations.

The results of this study showed that the combination of measures—including task performance (reaction time and error

rate), heart rate (the LF/HF ratio), and heart rate variability (including the RMSSD, PNN50, SDNN, and SDNNIX parameters)—can be used to monitor mental workload in the HMIs of complex systems. By contrast, subjective ratings measured with the NASA‐TLX can be used for the overall assessment of mental workload and for the collective judgment of job demands. Because mental demands are multidimensional, mental workload measures should be integrated into a combined measure using different weight coefficients.

One limitation of this study was that we applied specific inclusion/exclusion criteria to control the effect of some individual differences, such as age, driving experience, and lifestyle, on workload between and within individuals, so that the analysis could focus on task‐based variables and their effects on workload. Future studies should therefore evaluate individual differences and their effects on mental workload in the subway operation system. Furthermore, all train drivers working in the Tehran subway system were men, so this study did not address the impact of gender on physiological responses and mental workload. Future research could investigate the workload of operators in fields that employ both men and women. Another limitation of this study was that operating in the simulator may impose less work demand than the real environment. Although a simulated environment can be designed to include critical emergency operations, it cannot reproduce the perceived risk of a real operation, such as the risk of a crash, injury, or death (Mansikka et al., 2016). On the other hand, the real operational environment is affected by uncontrolled and perhaps unknown factors (Cain, 2007), and testing critical emergency operations there with an acceptable safety margin is difficult, if not impossible. The simulated operational environment, conversely, is under greater experimental control. Advances in simulator technologies now provide opportunities for more reliable research to estimate mental workload in the simulated environment before real‐world measurements are undertaken (Cain, 2007). Nevertheless, future studies should also compare mental workload in the real subway environment with that in the simulated environment.

5 | CONCLUSION

A high‐fidelity simulator experiment was designed to manipulate and evaluate mental workload during close‐to‐real operations under more controlled conditions. Train drivers experienced different levels of mental workload during routine and nonroutine driving operations. The current study showed that nonroutine train operations (without the ATP system) could have adverse effects on train drivers' mental workload and performance. Therefore, new technologies, such as the ATP system, play an essential role in performance promotion and accident prevention in train operations. The findings of this study can be used to derive mental workload limits based on a combined approach to optimize workload levels during train operations.

Both heart rate variability (including the RMSSD, PNN50, SDNN, and SDNNIX parameters) and performance measures (including reaction time and error rate) detected mental workload variations across the different operations. By contrast, the subjective ratings (NASA‐TLX) assessed the overall mental workload associated with a task without capturing mental demand variations within the task. In other words, subjective measures were more sensitive to overall mental workload than to dynamic work demand over time. The findings of this study suggest a combined approach to assessing overall and dynamic mental workload to improve the effectiveness of mental workload optimization procedures. The current findings can be applied to develop a reliable model for the comprehensive evaluation of mental workload in complex sociotechnical systems.

ACKNOWLEDGMENTS

The cooperation of the Tehran Metro Company in this study is greatly appreciated. This study was supported by Shahid Beheshti University of Medical Sciences.

CONFLICT OF INTERESTS

The authors declare that there are no conflicts of interest.

ORCID

Narmin Hassanzadeh‐Rangi http://orcid.org/0000-0001-9885-8889

REFERENCES

Abd Rahman, N. I., & Md Dawal, S. Z. (2016). The mental workload and alertness levels of train drivers under simulated conditions based on electroencephalogram signals. Malaysian Journal of Public Health Medicine, 1(sup1), 115–123. http://www.scopus.com/inward/record.url?eid=2-s2.0-84957565869&partnerID=40&md5=20e22d12a15efa239848144a8b2da840
Allahyari, T., Rangi, N. H., Khalkhali, H., & Khosravi, Y. (2014). Occupational cognitive failures and safety performance in the workplace. International Journal of Occupational Safety and Ergonomics, 20(1), 175–180.
Aprahamian, I., Martinelli, J. E., Neri, A. L., & Yassuda, M. S. (2010). The accuracy of the Clock Drawing Test compared to that of standard screening tests for Alzheimer's disease: Results from a study of Brazilian elderly with heterogeneous educational backgrounds. International Psychogeriatrics, 22(1), 64–71.
Brookhuis, K. A., & de Waard, D. (2010). Monitoring drivers' mental workload in driving simulators using physiological measures. Accident Analysis & Prevention, 42(3), 898–903.
Cain, B. (2007). A review of the mental workload literature.
Cegarra, J., & Chevalier, A. (2008). The use of Tholos software for combining measures of mental workload: Toward theoretical and methodological improvements. Behavior Research Methods, 40(4), 988–1000.
Chuang, C. Y., Lin, C. J., Shiang, W. J., Hsieh, T. L., & Liou, J. L. (2016). Development of an objective mental workload assessment tool based on Rasmussen's skill‐rule‐knowledge framework. Journal of Nuclear Science and Technology, 53(1), 123–128. https://doi.org/10.1080/00223131.2015.1027156

Cinaz, B., Arnrich, B., La Marca, R., & Tröster, G. (2013). Monitoring of mental workload levels during an everyday life office‐work scenario. Personal and Ubiquitous Computing, 17(2), 229–239. https://doi.org/10.1007/s00779-011-0466-1
Dorrian, J., Roach, G. D., Fletcher, A., & Dawson, D. (2006). The effects of fatigue on train handling during speed restrictions. Transportation Research Part F: Traffic Psychology and Behaviour, 9(4), 243–257.
Fallahi, M., Motamedzade, M., Heidarimoghadam, R., Soltanian, A. R., & Miyake, S. (2016). Effects of mental workload on physiological and subjective responses during traffic density monitoring: A field study. Applied Ergonomics, 52, 95–103. https://doi.org/10.1016/j.apergo.2015.07.009
Haga, S., Shinoda, H., & Kokubun, M. (2002). Effects of task difficulty and time‐on‐task on mental workload. Japanese Psychological Research, 44(3), 134–143.
Hart, S. G., & Staveland, L. E. (1988). Development of NASA‐TLX (Task Load Index): Results of empirical and theoretical research. Advances in Psychology, 52, 139–183.
Hassanzadeh Rangi, N., Allahyari, T., Khosravi, Y., Zaeri, F., & Saremi, M. (2012). Development of an Occupational Cognitive Failure Questionnaire (OCFQ): Evaluation validity and reliability. Iran Occupational Health, 9(1), 29–40.
Hassanzadeh‐Rangi, N., Khosravi, Y., Farshad, A. A., & Jalilian, H. (2017). Assessment and analysis of physical workload in metro driving job and recommendation for improvement. Journal of Health and Safety at Work, 7(1), 33–44. http://jhsw.tums.ac.ir/article-1-5588-en.html
Hassanzadeh‐Rangi, N., Asghar Farshad, A., Khosravi, Y., Zare, G., & Mirkazemi, R. (2014). Occupational cognitive failure and its relationship with unsafe behaviors and accidents. International Journal of Occupational Safety and Ergonomics, 20(2), 265–271.
Javorka, M., Javorková, J., Tonhajzerová, I., Calkovska, A., & Javorka, K. (2005). Heart rate variability in young patients with diabetes mellitus and healthy subjects explored by Poincaré and sequence plots. Clinical Physiology and Functional Imaging, 25(2), 119–127.
Jay, S. M., Dawson, D., Ferguson, S. A., & Lamond, N. (2008). Driver fatigue during extended rail operations. Applied Ergonomics, 39(5), 623–629.
Karvonen, H., Aaltonen, I., Wahlström, M., Salo, L., Savioja, P., & Norros, L. (2011). Hidden roles of the train driver: A challenge for metro automation. Interacting with Computers, 23(4), 289–298.
Käthner, I., Wriessnegger, S. C., Müller‐Putz, G. R., Kübler, A., & Halder, S. (2014). Effects of mental workload and fatigue on the P300, alpha and theta band power during operation of an ERP (P300) brain–computer interface. Biological Psychology, 102, 118–129.
Kecklund, L., Ingre, M., Kecklund, G., Söderström, M., Åkerstedt, T., Lindberg, E., & Almqvist, P. (2000). Railway safety and the train driver information environment. Advances in Transport, 7, 1047–1056.
Khosravi, Y., Asilian‐Mahabadi, H., Hajizadeh, E., Hassanzadeh‐Rangi, N., & Behzadan, A. H. (2014). Structural modeling of safety performance in construction industry. Iranian Journal of Public Health, 43(8), 1099–1106.
Khosravi, Y., Asilian‐Mahabadi, H., Hajizadeh, E., Hassanzadeh‐Rangi, N., Bastani, H., Khavanin, A., & Mortazavi, S. B. (2013). Modeling the factors affecting unsafe behavior in the construction industry from safety supervisors' perspective. Journal of Research in Health Sciences, 14(1), 29–35.
Khossravi, Y., Hassanzadeh‐Rangi, N., & Farshad, A. (2017). Task and hazard analysis of metro drivers and recommendations to improvement. Iran Occupational Health, 13(6), 46–57.
Lamond, N., Darwent, D., & Dawson, D. (2005). Train drivers' sleep and alertness during short relay operations. Applied Ergonomics, 36(3), 313–318.
Lan, L., Lian, Z., & Pan, L. (2010). The effects of air temperature on office workers' well‐being, workload and productivity‐evaluated with subjective ratings. Applied Ergonomics, 42(1), 29–36.
Larue, G. S., Rakotonirainy, A., & Haworth, N. L. (2016). A simulator evaluation of effects of assistive technologies on driver cognitive load at railway‐level crossings. Journal of Transportation Safety & Security, 8(sup1), 56–69.
Lee, Y. H., & Liu, B. S. (2003). Inflight workload assessment: Comparison of subjective and physiological measurements. Aviation, Space, and Environmental Medicine, 74(10), 1078–1084.
Luque‐Casado, A., Perales, J. C., Cárdenas, D., & Sanabria, D. (2016). Heart rate variability and cognitive processing: The autonomic response to task demands. Biological Psychology, 113, 83–90.
Mansikka, H., Simola, P., Virtanen, K., Harris, D., & Oksama, L. (2016). Fighter pilots' heart rate, heart rate variation and performance during instrument approaches. Ergonomics, 59(10), 1344–1352.
Miller, M. W., Rietschel, J. C., McDonald, C. G., & Hatfield, B. D. (2011). A novel approach to the physiological measurement of mental workload. International Journal of Psychophysiology, 80(1), 75–78.
Miyake, S., Yamada, S., Shoji, T., Takae, Y., Kuge, N., & Yamamura, T. (2009). Physiological responses to workload change. A test/retest examination. Applied Ergonomics, 40(6), 987–996.
Mui, K., & Wong, L. (2006). Acceptable illumination levels for office occupants. Architectural Science Review, 49(2), 116–119.
Mukherjee, S., Yadav, R., Yung, I., Zajdel, D. P., & Oken, B. S. (2011). Sensitivity to mental effort and test–retest reliability of heart rate variability measures in healthy seniors. Clinical Neurophysiology, 122(10), 2059–2066.
Pickup, L., Wilson, J. R., Sharples, S., Norris, B., Clarke, T., & Young, M. S. (2005). Fundamental examination of mental workload in the rail industry. Theoretical Issues in Ergonomics Science, 6(6), 463–482.
Rolo, G., Hernández‐Fernaud, E., & Díaz‐Cabrera, D. (2010). Impact of perceived physical and environmental conditions on mental workload: An exploratory study in office workers. Psyecology, 1(3), 333–342. https://doi.org/10.1174/217119710792774861
Rubio, S., Díaz, E., Martín, J., & Puente, J. M. (2004). Evaluation of subjective mental workload: A comparison of SWAT, NASA‐TLX, and workload profile methods. Applied Psychology, 53(1), 61–86.
Ryu, K., & Myung, R. (2005). Evaluation of mental workload with a combined measure based on physiological indices during a dual task of tracking and mental arithmetic. International Journal of Industrial Ergonomics, 35(11), 991–1009. https://doi.org/10.1016/j.ergon.2005.04.005
Schaap, T., Van Der Horst, A., Van Arem, B., & Brookhuis, K. (2008). Drivers' reactions to sudden braking by lead car under varying workload conditions; towards a driver support system. IET Intelligent Transport Systems, 2(4), 249–257.
Smith, P., Blandford, A., & Back, J. (2009). Questioning, exploring, narrating and playing in the control room to maintain system safety. Cognition, Technology & Work, 11(4), 279–291.
Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology (1996). Heart rate variability: Standards of measurement, physiological interpretation, and clinical use. European Heart Journal, 17, 354–381.
Tripathi, K., & Borrion, H. (2016). Safe, secure or punctual? A simulator study of train driver response to reports of explosives on a metro train. Security Journal, 29(1), 87–105. https://doi.org/10.1057/sj.2015.46
Varjo, J., Hongisto, V., Haapakangas, A., Maula, H., Koskela, H., & Hyönä, J. (2015). Simultaneous effects of irrelevant speech, temperature and ventilation rate on performance and satisfaction in open‐plan offices. Journal of Environmental Psychology, 44, 16–33. https://doi.org/10.1016/j.jenvp.2015.08.001
Wanyan, X., Zhuang, D., & Zhang, H. (2014). Improving pilot mental workload evaluation with combined measures. Bio‐Medical Materials and Engineering, 24(6), 2283–2290.
Wei, Z., Zhuang, D., Wanyan, X., Liu, C., & Zhuang, H. (2014). A model for discrimination and prediction of mental workload of aircraft cockpit display interface. Chinese Journal of Aeronautics, 27(5), 1070–1077. https://doi.org/10.1016/j.cja.2014.09.002

Wiberg, H., Nilsson, E., Lindén, P., Svanberg, B., & Poom, L. (2015). Physiological responses related to moderate mental load during car driving in field conditions. Biological Psychology, 108, 115–125.
Williams, W., Beach, E., & Gilliver, M. (2013). Development of a subjective loudness rating scale. International Journal of Audiology, 52(9), 650–653.
Wilson, G. F. (2005). Operator functional state assessment for adaptive automation implementation. Paper presented at Defense and Security 2005, Orlando, FL.
Young, M. S., Brookhuis, K. A., Wickens, C. D., & Hancock, P. A. (2015). State of science: Mental workload in ergonomics. Ergonomics, 58(1), 1–17. https://doi.org/10.1080/00140139.2014.956151
Young, M. S., & Stanton, N. A. (2005). Mental workload. In Handbook of Human Factors and Ergonomics Methods. London: Taylor & Francis.
Zhang, J., Yin, Z., & Wang, R. (2015). Recognition of mental workload levels under complex human‐machine collaboration by using physiological features and adaptive support vector machines. IEEE Transactions on Human‐Machine Systems, 45(2), 200–214. https://doi.org/10.1109/THMS.2014.2366914
Zhang, Y., & Zhao, R. (2008). Overall thermal sensation, acceptability and comfort. Building and Environment, 43(1), 44–50.
Zhao, B., Tang, T., & Ning, B. (2017). System dynamics approach for modelling the variation of organizational factors for risk control in automatic metro. Safety Science, 94, 128–142.
Zoer, I., Sluiter, J. K., & Frings‐Dresen, M. H. (2014). Psychological work characteristics, psychological workload and associated psychological and cognitive requirements of train drivers. Ergonomics, 57(10), 1473–1487.

How to cite this article: Jafari M‐J, Zaeri F, Jafari AH, Payandeh Najafabadi AT, Al‐Qaisi S, Hassanzadeh‐Rangi N. Assessment and monitoring of mental workload in subway train operations using physiological, subjective, and performance measures. Hum. Factors Man. 2019;1–11. https://doi.org/10.1002/hfm.20831
