

JOURNAL OF ORGANIZATIONAL BEHAVIOR MANAGEMENT
https://doi.org/10.1080/01608061.2023.2240767

The Effectiveness of Virtual Reality Training: A Systematic Review

An An Chang, Ellie Kazemi, Vahe Esmaeili, and Matthew S. Davies
Department of Psychology, California State University Northridge, Northridge, CA, United States

ABSTRACT
Researchers have conducted studies on integrating autonomous artificial intelligence (AI) in Virtual Reality Training (VRT); however, little is known about the effectiveness of these trainings and the skills typically taught. Out of the 2,017 related articles found, there were 20 articles that met our inclusionary criteria. We analyzed the 20 articles along the dimensions of participant demographics (e.g., age, disability, ethnicity); skills taught; measurement methods; components of VRTs (e.g., feedback, communication medium, degree of immersion); effectiveness; and social validity. We also checked the 11 VRTs mentioned in the present review for components of behavior skills training (BST). Our results showed that VRT is effective in teaching social, safety, and professional skills (e.g., initiation of play, emergency bystander intervention, job interview) to 1,144 participants, including children with disabilities and adults with and without disabilities. Across the reviewed articles, authors probed for skill generalization and found that the targeted skills generalized across setting or time in 15 out of 20 (75%) articles. Our results indicate that VRT is a flexible and viable option for scaling BSTs, although additional research is needed for cost-benefit analysis. Lastly, we discussed ways for behavior analysts to leverage VRTs with autonomous AI and recommendations for future research.

KEYWORDS: artificial intelligence; autonomous; training; virtual reality

Virtual Reality (VR) as a training medium offers various benefits. VR encompasses various hardware and software technologies that allow users to interact with computer-generated environments. This can provide different levels of immersion depending on the requirements of the training and allows for widespread access to the same training. Many organizations across diverse industries have already found success using VR. For example, organizations such as Verizon, Walmart, and FedEx Ground have begun using VR training (VRT) to cut long-term operating costs without compromising the quality of the trainings they provide to their employees (Harvard Business Review Analytic Services, 2020). Specifically, Verizon trained its call center staff in de-escalation and empathy skills through VR; Walmart used VR to train its staff to set up merchandise across its complicated network of services; and FedEx Ground used VR to train staff in efficiently loading packages on a truck, “an essential task [that is] difficult to simulate in the classroom” (Harvard Business Review Analytic Services, 2020, p. 3).
VRTs offer controlled, safe environments for learners to rehearse essential skills in both high-risk situations (e.g., Çakıroğlu & Gökoğlu, 2019) and ones that prove too difficult to replicate in vivo (e.g., Leary et al., 2019). Additionally, VRTs can be programmed to automatically collect data on learners’ performance and provide feedback. As technology advances and the integration of detection systems becomes more feasible, VR technology can observe, evaluate, and provide feedback on learners’ performance. However, some of this technology is still being developed. In the meantime, some VRTs have included a “Wizard of Oz” technique (Hanington & Martin, 2019) in which human operators control the behavior of virtual avatars to interact with the learners (e.g., Burke et al., 2018). However, the Wizard of Oz technique relies on the presence of a human trainer for the VR environment to respond to the learner.
Artificial intelligence (AI) is any computer software that can independently mimic humans’ ability to (a) observe the environment, (b) orient toward a discriminative stimulus, (c) make data-informed decisions, and (d) act upon the decision (Proud et al., 2003). For example, an autonomous AI avatar may be able to discriminate between and record correct and incorrect responses independently, then respond with changes in its nonverbal body language (e.g., Smith et al., 2021, 2022). When embedded in VRTs, AI can alter teaching strategies to adapt to the learner’s responses, recreating an interactive learning experience. For instance, it may automatically alter prompting procedures (e.g., prompt delay, echoic prompts) based on a learner’s performance (e.g., King, Boyer, et al., 2022; King, Estapa, et al., 2022). Additionally, VRTs with AI can increase the verisimilitude of the training, increasing the likelihood that learners will generalize the target skill to novel situations. For example, Hassani et al. (2013) programmed AI avatars to automatically respond to learners’ vocal responses using natural language processing.
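To make this four-part definition concrete, here is a minimal sketch, assuming a simple adaptive prompt-delay rule; the class, method names, and thresholds are hypothetical illustrations, not code from any reviewed system:

```python
# Illustrative only: an autonomous avatar loop following Proud et al.'s (2003)
# four functions (observe, orient, decide, act) with a hypothetical
# prompt-delay adaptation rule. Names and values are not from any reviewed VRT.

class AdaptivePromptingAvatar:
    def __init__(self, prompt_delay_s: float = 5.0):
        self.prompt_delay_s = prompt_delay_s  # seconds to wait before prompting

    def observe(self, learner_response: str | None) -> str | None:
        # (a) Observe the environment: capture the learner's (possibly absent) response.
        return learner_response

    def orient(self, response: str | None, target: str) -> bool:
        # (b) Orient toward the discriminative stimulus: score response against target.
        return response is not None and response.strip().lower() == target.lower()

    def decide(self, correct: bool) -> str:
        # (c) Make a data-informed decision: fade prompts after successes,
        # prompt sooner after errors.
        if correct:
            self.prompt_delay_s = min(self.prompt_delay_s + 2.0, 10.0)
            return "praise"
        self.prompt_delay_s = max(self.prompt_delay_s - 2.0, 1.0)
        return "echoic_prompt"

    def act(self, decision: str, target: str) -> str:
        # (d) Act upon the decision: emit praise or an echoic prompt.
        return "Well done!" if decision == "praise" else f"Try saying: '{target}'"

avatar = AdaptivePromptingAvatar()
response = avatar.observe("hello")
print(avatar.act(avatar.decide(avatar.orient(response, "hello")), "hello"))
```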
By embedding autonomous AI within VR, researchers developed realistic
training simulations in which the AI responded to the learner and provided
individualized expert feedback in the absence of expert human trainers. This
combination allows organizations to reduce costs and make training more
scalable, all while preserving individualization (Harvard Business Review
Analytic Services, 2020, p. 3). Furthermore, by reducing the reliance on trainer
availability, organizations can provide on-demand training that better fits
within learners’ schedule constraints. The purpose of our review was to
identify the efficacy of self-instructional VRTs with autonomous AI avatars
in training skills and monitoring performance.
JOURNAL OF ORGANIZATIONAL BEHAVIOR MANAGEMENT 3

Figure 1. Literature search and review process. Other reasons for excluding an article from the present review include: (a) the IV or DV did not meet inclusion criteria; (b) the article was inaccessible; (c) the article was a review, meta-analysis, or dissertation; (d) the article duplicated another article in the present review.

Method
We identified articles in PsycINFO and Academic Search Premier (see Figure 1)
using identical keywords, filters, and inclusionary/exclusionary criteria in each
search. Our keywords included: virtual reality, train* or teach*, and skill*,
adding asterisks to include similar variations on the terms (e.g., train, training,
trained). We set our search range from July 2010 until December 10th, 2022, to
control for any major technological advances in VR hardware and software.
We also selected “peer-reviewed empirical studies” and “English” filters to
narrow our search to experimental studies written in English. We found a total of 2,017 articles across both databases at the time of the search.
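Combined, the keyword logic above corresponds to a Boolean query of roughly the following form; the exact field codes and syntax vary by database, and this is an approximation rather than the authors’ verbatim search string:

```
virtual reality AND (train* OR teach*) AND skill*
```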

Inclusion/Exclusion criteria
We included articles in the review that met all the following criteria: (a)
dependent variables that were observable and measurable; (b) authors
used VRT to teach a specific skill; (c) the explicit presence of an
autonomous AI avatar; (d) learners interacted with an autonomous AI
avatar in the virtual world (e.g., using voice, text, or gestures). We
defined “autonomous AI avatar” as a computer-generated avatar that
independently responds to the learner’s behavior within the training
without the control of a confederate. We added autonomy as
a criterion to include the growing trend of organizations using machine
learning to improve strategic data-informed decision-making (Jordan &
Mitchell, 2015). Moreover, in our goal to identify self-instructional
packages, we identified explicit evidence of autonomous Human-AI
interaction (see Table 5).
Two independent readers reviewed each article. In the first round, we
scanned the abstract, introduction, and method sections to exclude duplicate articles and articles that did not meet our inclusion criteria. We also excluded dissertations because they have not undergone a peer-review process.
In the second round of review, the independent reviewers read the article in its
entirety, then met to discuss any discrepancies around inclusion of the articles.
All discrepancies were resolved by highlighting statements in the article that
conflicted with inclusion criteria. We found 20 articles that met our criteria for
inclusion and examined the following dimensions of the articles.

Participants
Participant characteristics (e.g., age, disabilities, ethnicity/culture) can influence responsiveness to intervention (Jones et al., 2020; Li et al., 2017). To evaluate whether effects of VRTs generalized across diverse populations, we recorded the number of participants in each study, their age (defining adults as 18 years or older and children as 17 years or younger), and any diagnoses of mental and/or physical disability. Age and disability were the two participant characteristics consistently reported across the articles. Two articles included student populations, so we looked at age distribution to identify any potential learning differences between adults and children using VRTs. Similarly, we were interested in whether mental and physical disabilities could impede learners’ response to VRT as an intervention.

Target skills and context

The skills being taught within the VRTs varied with overlapping topographies; however, the contexts in which each skill was taught were clearly discriminable. For example, engaging in appropriate job-interview skills and engaging in appropriate conversation while waiting for public transportation both involve social skills, but only job-interview skills occur in the context of a job. We evaluated the different contexts of each skill being trained by examining three different categories: social, occupational, and safety contexts. We defined occupational skills as any observable and measurable behavior for job-related tasks (e.g., classroom management, interview skills, patient triage). We defined social skills as any observable and measurable verbal behavior for non-job-related community contexts (e.g., appropriate responses while waiting for a bus, asking airport staff for directions). We defined safety skills as any observable and measurable behavior for emergency protocols (e.g., bystander CPR administration).

Skill and generalization measurement

The databases used in the present review include literature from disciplines beyond behavior analysis and organizational behavior management, such as clinical psychology, nursing, and education. We reviewed the various measurement methods used across the multiple disciplines to identify potential trends in the literature around VR and training. We defined “direct measures” as any instance of authors recording behavior based on their objective observation (e.g., tracking rate of speech, scoring roleplays) and “indirect measures” as any instance of authors recording behavior based on participants’ self-reports or third-party reports. Additionally, since every interaction with the virtual world requires technology that can run code and track interactions, we checked for explicitly stated use of automatic data collection within the VRT.

Efficacy and generalization


For each article, we noted whether the authors reported a significant improvement in learners’ performance of the skill when assessed within the training environment (i.e., within the virtual world). We looked at the different ways authors measured the generalizability of the training interventions and defined generalization as any instance in which the authors reported that learner(s) performed the target skills at mastery level in a novel context (e.g., a novel situation or the real world) and/or maintained the target skill and performance level over time.

Automatic data collection


Technology can be programmed to automatically collect and present data. We checked whether the VRTs collected and visually presented performance data on screen to trainers and/or learners. Examples of visual presentation could include filled versus empty health points, a linear graph of rates of behavior over sessions, and a fraction summary of correct versus incorrect responses. We also checked whether the authors utilized the presented data to assess their dependent variables.
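As a rough sketch of what such automatic collection and presentation might look like, assuming a simple per-trial logger (hypothetical code, not drawn from any reviewed VRT):

```python
# Hypothetical sketch: log each rehearsal trial automatically, then render a
# fraction summary of correct versus incorrect responses for the learner.

from dataclasses import dataclass, field

@dataclass
class TrialLogger:
    trials: list[bool] = field(default_factory=list)  # True = correct response

    def record(self, correct: bool) -> None:
        self.trials.append(correct)

    def fraction_summary(self) -> str:
        # e.g., "3/4 correct" -- the kind of on-screen summary described above
        return f"{sum(self.trials)}/{len(self.trials)} correct"

logger = TrialLogger()
for outcome in [True, True, False, True]:
    logger.record(outcome)
print(logger.fraction_summary())  # -> 3/4 correct
```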

Social validity
Following the recommendations set by Wolf (1978), we evaluated VRTs’ social
validity based on the significance of targeted goals, acceptability of procedures,
and satisfaction of the results across participants and stakeholders (e.g., parents, teachers, and trainers). For goal significance, we checked if authors
conducted pre-assessments or surveys to identify participants’ baseline skill
levels. For acceptability and satisfaction of VRTs, we reviewed the reported
results of post-training surveys and assessments. We also noted complications
learners reported after using the VR equipment (e.g., discomfort and restricted
movement).

Virtual reality and level of immersion

Virtual Reality (VR) is an umbrella term that refers to a broad spectrum of technologies and applications that allow users to interact with and experience a computer-generated environment (Bartle, 2003). With the diversification of VR technology over the past decade, the term “VR” is frequently used in conjunction with other terms to describe specific applications like augmented reality (AR) or mixed reality (MR), in which users can engage with both the virtual world and the real world simultaneously. To account for the changes in the technology over time, we used Bartle’s (2003) definition of VR (i.e., any system that allows users to interact with a virtual world) to identify VR technology, which groups AR and MR under VR.
Different types of VR hardware offer users varying levels of immersion. Immersion refers to the degree to which users can engage with, and experience a sense of presence within, the virtual world. With non-immersive VR technology, users typically see virtual environments on a monitor and navigate the virtual environment using a keyboard and mouse. With immersive VR technology, users typically see the virtual environment through a head-mounted display (HMD) that completely covers the user’s field of view and engage the virtual environment using hand-held controllers and haptic devices. In this paper, we coded each VRT’s level of immersion, as it correlates with the hardware used and the complexity of system development, to discriminate between the different VR systems (i.e., immersive and non-immersive). We recorded VRTs as “immersive” if the learners could see the virtual world but not the real world, and “non-immersive” if the learners could see both the virtual and real worlds.

Human-AI interaction
We examined how the learners interacted with the AI in each program. We categorized interaction modalities into “point-and-click,” “text chat,” and “vocal chat.” We defined point-and-click as interactions in which the learner selected from pre-determined scripts programmed into the system (e.g., clicking a response from a list when prompted). We defined text chat as interactions in which the learner verbally interacted with the system by typing (e.g., typing responses with a keyboard). We defined vocal chat as interactions in which the learner communicated with the AI vocally via microphone. We coded systems with multiple modalities as “yes” for all modalities used (e.g., both vocal chat and point-and-click). We included both guided response (point-and-click) and free response (text chat and vocal chat) to determine whether the topography of learners’ responses would influence skill acquisition.

Behavior Skills Training (BST) in VR

We compiled all the information found across the articles reviewed and created a table to compare the features of the VRTs (see Table 3). We first checked whether the different components of BST (i.e., instruction, model, rehearsal, and feedback) were presented in the real or virtual world. We then noted the presence of human trainers across the different components to determine whether the VRT can be implemented without an expert human trainer present. Next, we checked for multiple exemplars presented during rehearsals and feedback. For the feedback portions, we noted the type and timing of feedback provided (see Table 4). We defined natural consequences as any feedback provided in training (excluding vocal verbal behavior) that is presented contingent on the learner’s behavior and independent of another person’s efforts (e.g., a correct response leads to changes in the avatar’s heart rate). We defined socially mediated consequences as any contingency of reinforcement (or punishment) that is presented by another person (e.g., a store helper answering the learner’s question). We defined “concurrent feedback” as any instance of learners receiving feedback during rehearsal and “terminal feedback” as any instance of learners receiving feedback post-rehearsal. Finally, we recorded whether the system is capable of automatically collecting and presenting learners’ performance data.

Interobserver Agreement (IOA)

We conducted article inclusion IOA with primary and secondary reviewers simultaneously and independently after the initial search. We conducted variable measurement IOA after finalizing the list of included articles. We calculated the IOA percentage for both article inclusion and variable measurement by dividing the number of agreements by the number of agreements plus disagreements, then multiplying the result by 100. For article inclusion, an agreement was defined as two independent reviewers stating that an article had met all our inclusion and exclusion criteria. The IOA for article inclusion across the two independent reviewers was 90%. The average IOA across all variables was 94% (range: 84%–100%). The observers met after their independent reviews to discuss any discrepancies in data collection. To resolve the discrepancies, we first checked whether locating specific sections of the article that directly referenced the relevant information would help reviewers come to an agreement. We resolved 90% of the discrepancies with this first step. Next, we discussed the definition of the variables under review until a consensus was reached, then reviewed all the articles again independently. When disagreement persisted, we selected the first author’s data to include in calculations.
Table 1. Characteristics of the 20 articles in the present review.

Information Reported                        % of articles
Participant Demographics(a)
  Adults with disabilities                  50%
  Adults without disabilities               45%
  Children with disabilities                10%
  Children without disabilities              0%
Target Skills
  Occupational                              80%
  Social                                    15%
  Safety                                     5%
Dependent Measures
  Direct                                    50%
  Indirect                                  10%
  Combination                               40%
Experimental Designs
  Between-group                             70%
  Within-group                              10%
  Single-subject                            10%
  Combination                               10%
  Quasi Experiments                         10%
Generality Measures
  Direct                                    35%
  Indirect                                  25%
  Combination                               15%
  Not measured                              25%
Included Social Validity Measure            85%

(a) Participant demographics exceed 100% because Smith et al. (2021) included both adult and child participants with disabilities.

Results
We compiled the characteristics of the 20 articles that met our criteria (see Table 1). There were 1,144 participants across all studies (n = 20). Nineteen of the 20 articles included exclusively adult or exclusively child participants. Smith et al. (2021) was the only study to include both adult and child participants (n = 71) but did not report the age distribution. Across the other 19 articles, there were 1,073 participants: 619 were adults with no reported disabilities (57.69%), 451 were adults with some form of mental or developmental disability (42.03%), and three were children with autism spectrum disorder (0.28%). Authors reported information on ethnicity in 10 articles (50%), which was insufficient to derive an accurate representation. Across articles, the authors used VRTs to teach participants occupational skills in 16 articles (80%), social skills in three articles (15%), and safety skills in one article (5%).
Next, we examined the methodologies found in recent literature and found that authors evaluated behavior change using direct measures (e.g., performance scores, expert score comparisons), indirect measures (e.g., knowledge tests and self-reported surveys), and combinations thereof in 50%, 10%, and 40% of the articles, respectively. All authors reported a significant increase in skill performance after using VRTs, though there is evidence that VRTs may not be effective across all topographies of skills. For example, in Leary et al. (2019), participants were taught chest compressions by tapping on a smartphone. Participants were able to practice their rhythm by tapping but could not practice how deep to press down on the chest. As a result, when tested with CPR mannequins, participants met the mastery criteria for the rhythm of the chest compressions but not the criteria for their depth. In terms of measuring the generality of VRTs, we found that authors probed for skill generalization and found that the targeted skills generalized across setting and/or time in 15 out of 20 (75%) of the articles. The longest maintenance probe was conducted 9 months after training ended (Smith et al., 2022). Five studies did not include a generality probe (Aysina et al., 2016; Hassani et al., 2013; Middeke et al., 2018; Sapkaroski et al., 2021; Ward & Esposito, 2018).
In addition to the effectiveness of VRTs, we wanted to know whether VRTs are socially valid interventions. We found that in 17 articles (85%), authors surveyed participants about their training experience using Likert scales, rating scales, and/or qualitative short answers. Learners and stakeholders alike scored and described VRTs favorably. Additional benefits of VRTs reported by authors included lower attrition rates for participants in VRT experimental groups compared to those in alternative training methods. Authors reported no health-related complications. However, one article (Park et al., 2011) mentioned that learners expressed some difficulty with movement due to wires on the technology.
Next, we reviewed characteristics of the mentioned VRTs to understand the different mechanisms authors utilized that made VRTs effective. Across the 20 articles, authors mentioned 11 different VRTs. Virtual Reality Job Interview Training (VR-JIT) was examined across nine different articles, and King et al.’s VR training (King, Boyer, et al., 2022; King, Estapa, et al., 2022) was examined across two articles. We compared the characteristics of the 11 different VRTs in Table 2. No clear trend was found in terms of degrees of immersion, topographies of Human-AI interactions, or data presentation capability across the mentioned VRTs. However, we found that nine of the 11 (82%) VRTs included multiple exemplars during practice. Multiple exemplar trainings were presented in the form of different settings (e.g., school, home, and community), situations (e.g., symptoms and events), levels of difficulty (e.g., easy and hard), and/or personas of conversation partners (e.g., friendly, aggressive, and cold).
We reported the modality of BST components across the 11 mentioned VRTs in Table 3.

Table 2. Characteristics of the 11 mentioned VRTs.

Information Reported                        % of VRTs
Degrees of Immersion
  Fully immersive                           55%
  Non-immersive                             45%
Human-AI Interactions
  Text chat only                             0%
  Vocal only                                27%
  Point-and-click only                      45%
  Combined(a)                               27%
Multiple Exemplars                          82%
Data Presentation Capability                73%

(a) Vocal and point-and-click interactions were available to participants in Smith et al. (2021), but the authors did not specify whether participants were allowed to use either or both modes of response.
Table 3. BST components in mentioned VRTs.


VRTs Instruction Model Rehearsal Feedback
Albright et al. (2018) ● ●
Aysina et al. (2016) ● ●
Cheng et al. (2015) ●
Hassani et al. (2013) ● ●
King, Boyer, et al. (2022); King, Estapa, et al. (2022) ○ ○ ● ●
Leary et al. (2019) ○ ●
Middeke et al. (2018) ● ●
Pantziaras et al. (2015) ● ● ●
Park et al. (2011) ● ○ ●
Sapkaroski et al. (2021) ○ ● ●
Virtual Reality Job Interview Training (VR-JIT)a ● ● ●
Smith et al. (2021)b ○ ●
Smith et al. (2022)b ● ●
Note: BST components included in each distinct VRT: ● = in VR only; ○ = with human trainer only; ◐ = combination thereof.
(a) Authors who evaluated effects of VR-JIT on skill performance include Humm et al. (2014); M. J. Smith, Ginger, Wright, et al. (2014); M. J. Smith, E. J. Ginger, M. Wright, et al. (2014); Smith, Fleming, et al. (2015); Smith, Humm, et al. (2015); Smith et al. (2016); and Ward and Esposito (2018).
(b) Smith et al. (2021) and Smith et al. (2022) also used VR-JIT but slightly altered their BST modality.

The presentation of BST components in VR and the inclusion of human trainer support were not mutually exclusive (e.g., a learner could receive feedback in VR and have a follow-up with a human trainer to further discuss performance). All the reviewed VRTs incorporated rehearsal and feedback, although the incorporation of human trainers varied. For example, the authors showed the flexibility of VR-JIT by implementing BST either completely in VR or in combination with additional human trainer support (see Table 3).
In Table 4, we reviewed the distribution of feedback modality across all 20 articles. Learners contacted natural consequences in VR during rehearsals in 11 articles (55%) and after rehearsals in nine articles (45%). Learners contacted socially mediated consequences presented within VR during rehearsals in 17 articles (85%) and after rehearsals in 16 articles (80%). Learners did not contact any natural consequences in person from human trainers during or after rehearsal. Learners contacted socially mediated consequences in person from human trainers during rehearsal in one article (5%) and after rehearsal in seven articles (35%). Natural consequences presented in VR were emotional responding or non-verbal responses from AI conversation partners (e.g., a tone change after being asked an inappropriate question) and changes in status or events (e.g., getting the job). Socially mediated consequences in VR included visual performance ratings (e.g., number of hearts left, a trust or rapport meter, thumbs up/down), numerical summary reports (e.g., number of correct and incorrect responses), and textual summaries (e.g., outlines of behavior-specific feedback). Socially mediated consequences presented by human trainers included group reviews of performance reports generated by the VRT or examples of correct responding.

Table 4. Feedback modality utilized in the 20 articles in the present review.

                        VR                                          Human Trainer
Timing        Natural Consequences  Socially Mediated     Natural Consequences  Socially Mediated
Concurrent           55%                  85%                      0%                   5%
Terminal             45%                  80%                      0%                  35%
Table 5. Evidence of autonomous human-AI interactions across reviewed articles.

Articles | Visual Display | Description of Automatic Human-AI Interactions

Albright et al. (2018) | Monitor | Learners used point-and-click to select verbal statements to emit to an AI-patient. The AI-patient delivered contingent verbal and non-verbal responses. A visual “Trust Meter” displayed the effects of the learner’s responses on the AI-patient as additional feedback. (p. 251)

Aysina et al. (2016) | Monitor | Learners used point-and-click to select from between 6 and 10 available pre-scripted responses. The AI-interviewer responded vocally contingent on the learner’s response and allowed the learner to recover from mistakes. The VRT system scored learners’ responses and provided individualized feedback in either text or audio format. (pp. 45–46)

Cheng et al. (2015) | HMD | Learners vocally answered questions presented to them in VR. Contingent on the correctness of the learner’s response, the AI-teacher praised the learner or asked them to try again. (p. 8)

Hassani et al. (2013) | HMD | Learners vocally communicated with AI-community members. Contingent on the learner’s responses, AI-community members provided information or asked follow-up questions using natural language processing. (pp. 13–14)

Humm et al. (2014) | Monitor | Learners vocally responded to the AI-interviewer’s vocal questions. The AI-interviewer delivered contingent verbal and non-verbal responses. An AI-trainer located at the bottom right corner of the learner’s screen gave a thumbs up or down as additional feedback. (pp. 2–4)

King, Estapa, et al. (2022) | HMD | Learners vocally communicated with an AI-student. Contingent on the learner’s response, the AI-student vocally responded with answers and follow-up questions or exhibited no response. Interactions continued until the AI-student completed assigned tasks. (pp. 8–11)

King, Boyer, et al. (2022) | HMD | Learners vocally communicated with an AI-student. Contingent on the learner’s response, the AI-student vocally responded with answers and follow-up questions or exhibited no response. Interactions continued until the AI-student completed assigned tasks. (p. e41097)

Leary et al. (2019) | Mobile HMD | Learners used the VR viewer’s click-button to instruct AI-bystanders to call 911 and retrieve the AED, then physically provided chest compressions to a physical CPR manikin that tracked the quality of the compressions. Contingent on the learner’s response, AI-bystanders either followed the instruction or stood by. The VRT provided a visual scorecard with ratings of 0–5 hearts as feedback on the learner’s emergency response performance. (p. 169)

Middeke et al. (2018) | Monitor | Learners used point-and-click to select verbal and non-verbal responses with 10 different AI-patients. Contingent on the learner’s response, each AI-patient provided a different medical history and described different symptoms. The VRT provided digital feedback on the learner’s diagnosis and treatment. (pp. 4–5)

Pantziaras et al. (2015) | Monitor | Learners used point-and-click to select pre-scripted responses to communicate with an AI-patient. Contingent on the learner’s response, the AI-patient provided verbal and non-verbal responses. The AI-patient and an AI-advisor provided additional individualized feedback. (p. 2)

Park et al. (2011) | HMD | Learners used a joystick and buttons to communicate with AI-community members. Participants were asked to navigate their virtual environment and socialize with other avatars within the environment; helper avatars praised or delivered corrective feedback following participants’ responses. (p. 167)

Sapkaroski et al. (2021) | HMD | Learners used point-and-click to select pre-scripted responses to communicate with an AI-patient. Contingent on the learner’s response, the AI-patient provided verbal and non-verbal responses. (p. 59)

M. J. Smith, Ginger, Wright, et al. (2014) | Monitor | Learners used both point-and-click and vocal responses to communicate with the AI-interviewer. Contingent on the learner’s responses, the AI-interviewer provided verbal and nonverbal responses. A different AI-avatar praised or delivered corrective feedback. (pp. 2452–2453)

M. J. Smith, E. J. Ginger, M. Wright, et al. (2014) | Monitor | Learners used both point-and-click and vocal responses to communicate with the AI-interviewer. Contingent on the learner’s responses, the AI-interviewer provided verbal and nonverbal responses. A different AI-avatar praised or delivered corrective feedback. (pp. 660–661)

Smith, Fleming, et al. (2015) | Monitor | Learners used both point-and-click and vocal responses to communicate with the AI-interviewer. Contingent on the learner’s responses, the AI-interviewer provided verbal and nonverbal responses. A different AI-avatar praised or delivered corrective feedback. (p. 87)

Smith, Humm, et al. (2015) | Monitor | Learners used both point-and-click and vocal responses to communicate with the AI-interviewer. Contingent on the learner’s responses, the AI-interviewer provided verbal and nonverbal responses. A different AI-avatar praised or delivered corrective feedback. (pp. 273–274)

Smith et al. (2016) | Monitor | Learners used both point-and-click and vocal responses to communicate with the AI-interviewer. Contingent on the learner’s responses, the AI-interviewer provided verbal and nonverbal responses. A different AI-avatar praised or delivered corrective feedback. (p. 325)

Smith et al. (2021) | Monitor | Learners used both point-and-click and vocal responses to communicate with the AI-interviewer. Contingent on the learner’s responses, the AI-interviewer provided verbal and nonverbal responses. A different AI-avatar praised or delivered corrective feedback. (pp. 1538–1541)

Smith et al. (2022) | Monitor | Learners used both point-and-click and vocal responses to communicate with the AI-interviewer. Contingent on the learner’s responses, the AI-interviewer provided verbal and nonverbal responses. A different AI-avatar praised or delivered corrective feedback. (p. 1030)

Ward and Esposito (2018) | Monitor | Learners used both point-and-click and vocal responses to communicate with the AI-interviewer. Contingent on the learner’s responses, the AI-interviewer provided verbal and nonverbal responses. A different AI-avatar praised or delivered corrective feedback. (p. 425)

Finally, we found that even though authors used direct measures in some capacity to assess learners’ performance in 18 articles (90%), authors of only eight articles (40%) reported using the technology itself for data collection (Hassani et al., 2013; King, Boyer, et al., 2022; King, Estapa, et al., 2022; Middeke et al., 2018; Sapkaroski et al., 2021; Smith et al., 2021, 2022; Ward & Esposito, 2018).

Discussion
Across disciplines, researchers have provided evidence for the efficacy and social validity of VRTs, as well as the generalization of target skills from the virtual world to the real world. Learners with and without disabilities, young and old alike, have shown improvements in acquiring targeted skills after using VRTs in settings such as health care (e.g., bedside manner and patient triage), emergency response (e.g., bystander CPR), employment services (e.g., job interview skills), and education (e.g., classroom management, learning a second language). Post-training surveys from the studies we reviewed suggested that the success of VRTs may be attributed to a variety of factors, including novelty (Cheng et al., 2015; Leary et al., 2019) and engagement (Cheng et al., 2015; M. J. Smith, Ginger, Wright, et al., 2014; Smith et al., 2021). Other benefits of VRTs identified in this review include high acceptability of the training, high retention rates of learners undergoing the training, and the minimal harm and discomfort reported by learners.
In this review, we identified the behavioral principles and mechanisms the researchers incorporated in their VRTs. Although many of the articles were sourced from outside the traditional behavior analytic literature, we found that all but four articles (n = 15) included at least three components of BST. In four articles, authors presented all four components of BST (see Table 3), but only one group of authors presented all components of BST within VR (Middeke et al., 2018). As exhibited in Table 3, only three groups of authors stated their inclusion of models of correct behaviors as part of their intervention. More authors may have included modeling of correct responses in their trainings as part of the instruction without directly specifying their inclusion of the model component. For example, authors reported that VR-JIT included extensive e-learning materials but did not describe what those e-learning materials fully entailed. Authors of four articles successfully trained target skills by implementing only practice and feedback (e.g., Albright et al., 2018; Aysina et al., 2016; Cheng et al., 2015; Hassani et al., 2013). However, upon closer examination, the participants of these studies were previously enrolled in related coursework (e.g., medicine and special education), except the participants from Aysina et al. (2016).
The variations in how authors incorporated BST within VR show that VRTs can be completely self-guided or used in conjunction with other training formats. For instance, authors implemented three different procedures with VR-JIT. Most authors tested VR-JIT as a self-instruction training program. Smith et al. (2021) provided instruction in person and included an additional feedback review after learners completed their VRT. Smith et al. (2022) presented instruction, rehearsal, and feedback in VR and then provided additional feedback in person post-VRT. These results highlight the flexible applications of VRTs and provide further support that organizations can take an iterative approach to developing effective VRTs, reducing initial development costs.
The two components of BST consistently implemented across the articles we reviewed were rehearsal and feedback. We wanted to understand what specific behavioral mechanisms may have contributed to the effectiveness of VRTs. Starting with rehearsal, we looked at how authors took advantage of the VRTs’ customizability to incorporate multiple exemplar training to program for generalization. Based on our review, authors frequently included three or more practice scenarios with variations in setting, people, events, and/or levels of difficulty. Cheng et al. (2015) programmed a total of 16 different practice scenarios in VR to teach learners social skills. M. J. Smith, Ginger, Wright, et al. (2014) programmed VR-JIT to include interviews for eight different employment positions (e.g., cashier, inventory worker, fast food service worker, grounds worker, stock clerk, janitor, customer service representative, and security). Although multiple exemplar training can be done in person, there is a limit to how many examples can be created and implemented with high fidelity. Programming different scenarios in VRTs with autonomous AI may be time and cost intensive during initial development, but the costs associated with training modalities that require the presence of human trainers (e.g., paying trainers for each session, renting out spaces) can be reduced. By using VRTs with autonomous AI, organizations can accumulate examples over time without expecting human trainers to remember each training script.
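As a sketch of how such exemplar variation might be programmed, here is a hypothetical scenario generator that crosses the dimensions reported across the reviewed VRTs (settings, conversation-partner personas, and difficulty); the dimension values mirror the examples above, but the code itself is illustrative, not from any reviewed system:

```python
# Hypothetical multiple-exemplar generator: cross setting, persona, and
# difficulty to enumerate distinct practice scenarios for rehearsal.

import itertools
import random

SETTINGS = ["school", "home", "community"]
PERSONAS = ["friendly", "aggressive", "cold"]
DIFFICULTY = ["easy", "hard"]

def exemplar_pool() -> list[dict]:
    """Enumerate every distinct combination of the three dimensions."""
    return [
        {"setting": s, "persona": p, "difficulty": d}
        for s, p, d in itertools.product(SETTINGS, PERSONAS, DIFFICULTY)
    ]

pool = exemplar_pool()        # 3 x 3 x 2 = 18 distinct scenarios
print(random.choice(pool))    # sample one exemplar for the next rehearsal
```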
Similarly, the customizability of VRTs with autonomous AI expands the types of consequences learners contact during and after rehearsal. Delivering feedback autonomously reduces the risk of trainer drift and the potential for disrupting the roleplay simulation, increasing the reliability with which learners systematically receive individualized feedback. Our results suggested that researchers can deliver a variety of feedback to personalize training and cater to specific learner demographics and targeted skills. For example, Cheng et al. (2015) used the reward message “Great! You have done well.” combined with an animated applause sound for child participants. Hassani et al. (2013) aimed to teach manding skills to adult English learners, so they opted for the neutral consequences of access to information and/or assistance as feedback. VR-JIT utilized multiple forms of feedback: learners contacted both natural consequences and socially mediated feedback during training. Our results support that all forms of feedback resulted in behavior change, regardless of the presentation (i.e., by human, in VR, concurrent, terminal, natural, or socially mediated).
In addition to reviewing the mechanisms for behavior change, we looked at how VRTs with autonomous AI could improve training efficiency. A major benefit of implementing BST in VR is the possibility of automating data collection and analysis. We found a discrepancy between VRTs with a preprogrammed capability for automating data collection and whether authors relied on the automatic function. There were 16 articles in which authors used a VRT with automatic data collection and analysis functionality. Of those, authors depended on the VRT to collect and evaluate user performance in seven articles. One possible explanation is that in early stages of testing, authors manually collected data to establish the reliability of the VRT’s data collection features.

Limitations
Given the complexity and constant evolution of VR and AI technology, there
are several limitations to this present review. To narrow down self-
instructional VRTs, we specifically looked for the presence of autonomous
AI within each article as one of our inclusionary criteria. As a result, viable
VRTs like those with Wizard of Oz technique may not have been included. For
example, if a research article used an autonomous AI avatar within the VRT
for roleplays, but used a different, controlled avatar for an experimenter to
provide feedback within the virtual environment, this would still have met our
criteria for inclusion. At the time of present review, Wizard of Oz VRTs offer
some benefits that are still difficult to achieve with autonomous AI. For
instance, Wizard of Oz VRTs (e.g., Simmersion®; TeachLivE®) offer learners
non-scripted role-play experiences that cannot currently be simulated in VRTs
with autonomous AI even with extensive programming. Natural language
processing may be used to help the autonomous AI adapt to variations of
learner’s verbal behaviors, but large datasets are needed to effectively train the
autonomous AI. Although it is possible to program for VRT with AI to vary as
often as needed, based on the learner’s performance, the programming skills
and upfront time required may be currently cost-prohibitive.
Moreover, due to the lack of standardization in reporting the use of technology, it was difficult for us to identify exactly which autonomous AIs authors used and how they used them. Some articles included only a single sentence describing the function of the autonomous AI avatar. It is possible that more VRTs with autonomous AI were not included in this review because it was unclear that autonomous AI was used. Our analysis of keywords showed that only three articles specifically mentioned AI: both King et al. articles (King, Boyer, et al., 2022; King, Estapa, et al., 2022) included “artificial intelligence,” and Hassani et al. (2013) included “embodied conversational agents” among their keywords.

With the advances in VR technology and the application of AI, closer examination of the specific characteristics of their capabilities is needed. This would involve questions such as how AI decision-making can better aid real-time changes in virtual environments and its interactions with users, and how VR with autonomous AI can make the learning process more efficient. The aim of the current literature review was much more precise, so it did not capture such potential.

Future research

VRT with autonomous AI is an emerging topic within the literature, with limited empirical studies that fit our criteria between 2010 and 2022 (n = 20). We recommend that future reviews examine the quality of the included studies as new research is published in the coming years. Additionally, we recommend that researchers conduct more detailed analyses of immersion and hardware, such as those outlined by Dechsling et al. (2021).
Future research is necessary to understand to what degree VRTs can help organizations save money. Developing training technology costs valuable resources. Currently, little information about the development of VRTs is reported in published studies, which makes it difficult to conduct accurate cost-benefit analyses. For instance, during the initial stages of development, video recording is cheaper and faster than developing 3D assets. However, during post-production, editing the videos can become more costly than editing the 3D assets. Another underreported variable is the duration of training. Although effectiveness has been established, efficiency is one of the key benefits for wide adoption of VR BSTs. A potential way to improve training efficiency is to automate data collection and analysis within the VRTs. Not only does automation save time, but it also creates a database that researchers can later use to train a more sophisticated AI to analyze and predict learners’ responses.

Regarding feedback, many authors opted to provide post-training group discussions to review any questions learners may have had regarding the performance reports generated by the VRT. More research is needed to determine to what degree additional in-person feedback may produce greater behavior change when the VRTs provide non-behavior-specific feedback (e.g., number of hearts left, percentage of correct responses).
Finally, we want to highlight the wide spectrum of hardware we found in conducting this review. VRTs can be designed to be highly immersive from inception, but a high degree of immersion is not necessary for behavior change. Our results support that non-immersive VRTs can be complex without using newer technologies like HMDs. Researchers should continue to look for ways to develop non-immersive VRTs using computers, laptops, and mobile devices, since new technologies like HMDs and haptic devices are not yet common household technologies. However, researchers should develop non-immersive VRTs with plans to incorporate more immersive technology as it becomes more accessible to the public. For instance, early iterations of VRTs can be developed as web-based applications to minimize development scope, and additional development phases can then be included for more immersive technologies.

By recreating empirically supported BSTs in VR, effective BSTs can become more accessible and scalable. Behavioral service providers can look to invest in VRTs as a potential cost-saving solution without compromising effectiveness.

Disclosure statement
We have no known conflict of interest to disclose.

ORCID
Ellie Kazemi http://orcid.org/0000-0001-8316-4112

References
Albright, G., Bryan, C., Adam, C., Mcmillan, J., & Shockley, K. (2018). Using virtual patient
simulations to prepare primary health care professionals to conduct substance use and
mental health screening and brief intervention. Journal of the American Psychiatric Nurses
Association, 24(3), 247–259. https://doi.org/10.1177/1078390317719321
Aysina, R. M., Maksimenko, Z. A., & Nikiforov, M. V. (2016). Feasibility and efficacy of job
interview simulation training for long-term unemployed individuals. PsychNology Journal,
14(1), 41–60. https://www.researchgate.net/publication/324108447_Feasibility_and_
Efficacy_of_Job_Interview_Simulation_Training_for_Long-Term_Unemployed_
Individuals
Bartle, R. (2003). Introduction to virtual worlds. In Designing virtual worlds (pp. 1–108). Pearson Education Limited. https://www.researchgate.net/publication/200025892_Designing_Virtual_Worlds
Burke, S. L., Bresnahan, T., Li, T., Epnere, K., Rizzo, A., Partin, M., Ahlness, R. M., &
Trimmer, M. (2018). Using Virtual interactive Training Agents (ViTA) with adults with
autism and other developmental disabilities. Journal of Autism and Developmental Disorders,
48(3), 905–912. https://doi.org/10.1007/s10803-017-3374-z
Çakıroğlu, Ü., & Gökoğlu, S. (2019). Development of fire safety behavioral skills via virtual
reality. Computers & Education, 133, 56–68. https://doi.org/10.1016/j.compedu.2019.01.014
Cheng, Y., Huang, C. L., & Yang, C. S. (2015). Using a 3D immersive virtual environment
system to enhance social understanding and social skills for children with autism spectrum
disorders. Focus on Autism and Other Developmental Disabilities, 30(4), 222–236. https://
doi.org/10.1177/1088357615583473
Dechsling, A., Orm, S., Kalandadze, T., Sütterlin, S., Øien, R. A., Shic, F., & Nordahl-Hansen,
A. (2021). Virtual and augmented reality in social skills interventions for individuals with
autism spectrum disorder: A scoping review. Journal of Autism and Developmental
Disorders, 52(11), 4692–4707. https://doi.org/10.1007/s10803-021-05338-5
Hanington, B., & Martin, B. (2019). Wizard of Oz. In Universal methods of design: 125 ways to research complex problems, develop innovative ideas, and design effective solutions (p. 462). Rockport Publishers.
Harvard Business Review Analytic Services. (2020). The Future of Work Is Immersive. Harvard
Business School Publishing.
Hassani, K., Nahvi, A., & Ahmadi, A. (2013). Design and implementation of an intelligent
virtual environment for improving speaking and listening skills. Interactive Learning
Environments, 24(1), 252–271. https://doi.org/10.1080/10494820.2013.846265
Humm, L. B., Olsen, D., Bell, M., Fleming, M., & Smith, M. (2014). Simulated job interview
improves skills for adults with serious mental illnesses. Studies in Health Technology and
Informatics, 199, 50–54. https://doi.org/10.1007/s11414-014-9392-0
Jones, S. H., Peter, C., & Ruckle, M. M. (2020). Reporting of demographic variables in the
Journal of Applied Behavior Analysis. Journal of Applied Behavior Analysis, 53(3),
1304–1315. https://doi.org/10.1002/jaba.722
Jordan, M. I., & Mitchell, T. M. (2015). Machine learning: Trends, perspectives, and prospects. Science, 349(6245), 255–260. https://doi.org/10.1126/science.aaa8415
King, S., Boyer, J., Bell, T., & Estapa, A. (2022). An automated virtual reality training system for
teacher-student interaction: A randomized controlled trial. JMIR Serious Games, 10(4),
e41097. https://doi.org/10.2196/41097
King, S., Estapa, A., Bell, T., & Boyer, J. (2022). Behavioral skills training through smart virtual
reality: Demonstration of feasibility for a verbal mathematical questioning strategy. Journal
of Behavioral Education, 1–25. https://doi.org/10.1007/s10864-022-09492-3
Leary, M., Mcgovern, S. K., Chaudhary, Z., Patel, J., Abella, B. S., & Blewer, A. L. (2019).
Comparing bystander response to a sudden cardiac arrest using a virtual reality CPR
training mobile app versus a standard CPR training mobile app. Resuscitation, 139,
167–173. https://doi.org/10.1016/j.resuscitation.2019.04.017
Li, A., Wallace, L., Ehrhardt, K. E., & Poling, A. (2017). Reporting participant characteristics in
intervention articles published in five behavior-analytic journals, 2013-2015. Behavior
Analysis: Research and Practice, 17(1), 84–91. https://doi.org/10.1037/bar0000071
Middeke, A., Anders, S., Schuelper, M., Raupach, T., Schuelper, N., & Ito, E. (2018). Training of clinical reasoning with a serious game versus small-group problem-based learning: A prospective study. PLoS ONE, 13(9), e0203851. https://doi.org/10.1371/journal.pone.0203851
Pantziaras, I., Fors, U., & Ekblad, S. (2015). Training with virtual patients in transcultural
psychiatry: Do the learners actually learn? Journal of Medical Internet Research, 17(2), e46.
https://doi.org/10.2196/jmir.3497
Park, K. M., Ku, J., Choi, S. H., Jang, H. J., Park, J. Y., Kim, S. I., & Kim, J. J. (2011). A virtual
reality application in role-plays of social skills training for schizophrenia: A randomized,
controlled trial. Psychiatry Research, 189(2), 166–172. https://doi.org/10.1016/j.psychres.
2011.04.003
Proud, R. W., Hart, J. J., & Mrozinski, R. B. (2003). Methods for determining the level of
autonomy to design into a human spaceflight vehicle: A function specific approach. Proc.
Performance Metrics for Intelligent Systems (PerMIS ’03), NIST Special Publication 1014,
September 2003. https://ntrs.nasa.gov/citations/20100017272
Sapkaroski, D., Mundy, M., & Dimmock, M. R. (2021). Immersive virtual reality simulated
learning environment versus role-play for empathic clinical communication training.
Journal of Medical Radiation Sciences, 69(1), 56–65. https://doi.org/10.1002/jmrs.555
Smith, M. J., Bell, M. D., Wright, M. A., Humm, L. B., Olsen, D., & Fleming, M. F. (2016).
Virtual reality job interview training and 6-month employment outcomes for individuals
with substance use disorders seeking employment. Journal of Vocational Rehabilitation, 44
(3), 323–332. https://doi.org/10.3233/jvr-160802
Smith, M. J., Fleming, M. F., Wright, M. A., Roberts, A. G., Humm, L. B., Olsen, D., &
Bell, M. D. (2015). Virtual reality job interview training and 6-month employment outcomes
for individuals with schizophrenia seeking employment. Schizophrenia Research, 166(1–3), 86–91. https://doi.org/10.1016/j.schres.2015.05.022
Smith, M. J., Ginger, E. J., Wright, M., Wright, K., Humm, L. B., Olsen, D., Bell, M. D., &
Fleming, M. F. (2014). Virtual reality job interview training for individuals with psychiatric
disabilities. The Journal of Nervous and Mental Disease, 202(9), 659–667. https://doi.org/10.
1097/nmd.0000000000000187
Smith, M. J., Ginger, E. J., Wright, K., Wright, M. A., Taylor, J. L., Humm, L. B., Olsen, D. E.,
Bell, M. D., & Fleming, M. F. (2014). Virtual reality job interview training in adults with
autism spectrum disorder. Journal of Autism and Developmental Disorders, 44(10),
2450–2463. https://doi.org/10.1007/s10803-014-2113-y
Smith, M. J., Humm, L. B., Fleming, M. F., Jordan, N., Wright, M. A., Ginger, E. J., Wright, K.,
Olsen, D., & Bell, M. D. (2015). Virtual reality job interview training for veterans with
posttraumatic stress disorder. Journal of Vocational Rehabilitation, 42(3), 271–279. https://
doi.org/10.3233/jvr-150748
Smith, M. J., Sherwood, K., Ross, B., Smith, J. D., DaWalt, L., Bishop, L., Humm, L., Elkins, J., &
Steacy, C. (2021). Virtual interview training for autistic transitional age youth:
A randomized controlled feasibility and effectiveness trial. Autism, 25(6), 1536–1552.
https://doi.org/10.1177/1362361321989928
Smith, M. J., Smith, J. D., Blajeski, S., Ross, B., Jordan, N., Bell, M. D., McGurk, S. R.,
Mueser, K. T., Burke-Miller, J. K., Oulvey, E. A., Fleming, M. F., Nelson, K., Brown, A.,
Prestipino, J., Pashka, N. J., & Razzano, L. A. (2022). An RCT of virtual reality job interview
training for individuals with serious mental illness in IPS supported employment.
Psychiatric Services, 73(9), 1027–1038. https://doi.org/10.1176/appi.ps.202100516
Ward, D. M., & Esposito, M. C. K. (2018). Virtual reality in transition program for adults with
autism: Self-efficacy, confidence, and interview skills. Contemporary School Psychology, 23
(4), 423–431. https://doi.org/10.1007/s40688-018-0195-9
Wolf, M. M. (1978). Social validity: The case for subjective measurement or how applied
behavior analysis is finding its heart. Journal of Applied Behavior Analysis, 11(2), 203–214.
https://doi.org/10.1901/jaba.1978.11-203
