Mary Dankbaar
Erasmus University Medical Centre Rotterdam,
Institute of Medical Education Research,
PO Box 2040, 3000 CA Rotterdam, The Netherlands
Email: m.dankbaar@erasmusmc.nl
Wim Trooster
Hogeschool Windesheim,
Kenniscentrum Domein Bewegen & Educatie,
Postbus 10090, 8000 GB Zwolle, The Netherlands
Email: w.trooster@windesheim.nl
applied in different contexts show how GEM can be practically used and how
these studies have contributed to the improvement of GEM.
Keywords: applied gaming; educational games; educational technology;
effectiveness research; evaluation framework; evaluation model; evidence-
based; game-based learning; serious gaming; validation.
Reference to this paper should be made as follows: Oprins, E., Visschedijk, G.,
Roozeboom, M.B., Dankbaar, M., Trooster, W. and Schuit, S.C.E. (2015) ‘The
game-based learning evaluation model (GEM): measuring the effectiveness of
serious games using a standardised method’, Int. J. Technology Enhanced
Learning, Vol. 7, No. 4, pp.326–345.
Biographical notes: Esther Oprins, PhD, has been a researcher at TNO, Soesterberg,
the Netherlands, since 2009. Her background is in Educational Psychology. Her main
research interests at TNO are training design, assessment, evaluation, and
validation research of training with new technology (serious gaming,
simulation, simulators), in various domains such as education, aviation, and
military. She obtained her PhD in Competence Assessment of Air Traffic
Controllers at Maastricht University which she combined with her previous
work as Training Expert at Air Traffic Control the Netherlands (2002–2009).
Gillian Visschedijk, MSc, is a Scientific Researcher at the Training and
Performance Innovations Department at TNO. She graduated cum laude at the
University of Twente in Enschede with a Degree in Educational Science &
Technology. She specialises in the use of serious gaming and simulation
technology for training purposes in various domains such as education, health,
crisis management, and military. Her work is characterised by both design
projects and validation studies. Her main expertise in these projects is
combining didactical approaches and game design principles.
Maartje Bakhuys Roozeboom, MSc, has been a Researcher at TNO, Leiden, since
2007. She works in the Department of Work, Health, and Care. Her main interests
are in the area of evaluation research in various domains such as occupational
health and serious gaming. She has a background in Social Psychology and
Sociology and she has been involved in the development and (effect)
evaluation of various serious games.
Mary E.W. Dankbaar, MSc, has been a Program Manager for e-learning at the
Erasmus University Medical Centre since 2006 and has a background in
Educational Psychology, with a specialisation in Technology-based Learning.
She has designed, developed, and implemented a large number of online and
blended programs for companies and for higher education organisations. Aside
from her work in instructional design and online learning, she is currently
working on her PhD thesis on the effectiveness of serious games in health care.
She has developed, in close cooperation with health care and game
professionals, the abcdeSIM ‘serious game’ for emergency care skills training.
From 2008 to 2013, she was the chairwoman of the national e-learning working
group of the Dutch Association for Medical Education (NVMO), and since
2012 she has been a member of the scientific board of the Dutch Society for
Simulation in Healthcare.
Wim Trooster is currently working as a Senior Lecturer/Researcher on
e-learning at Windesheim University, Zwolle, the Netherlands. He has a Master
of Science in Biology and a PhD in Medical Sciences (Neurology). He gained
experience in education as well (secondary vocational and university education),
where he developed a strong interest in the application of IT in education. For
the last six years he has worked as a researcher on IT & Educational Innovation
at Windesheim University, among other topics on the use of
serious gaming in education and training.

This article describes the theoretical background and main principles of this evaluation framework, and its practical application in three studies.

1 Introduction
Various evaluation models and frameworks for evaluating and measuring the effectiveness of
learning interventions have been designed, both generic and gaming-specific. This
section discusses the characteristics and approaches of the various models in order to
conclude which principles should lead the design of an appropriate evaluation
framework for game-based learning.
interest and user behaviours such as greater persistence on task. Output refers to the
debriefing and the learning outcomes. An interesting aspect of their model is the distinction
between game design features and the effects these have on the users' perceptions and
subsequent behaviour. Pavlas (2010) also describes the process of game-based learning.
Pavlas makes the distinction between ‘player traits’ and ‘game characteristics’ as input
variables, ‘player in game states and behaviours’ as a kind of process variables with flow
as a central theme, and ‘outcomes’ as the output variables. Connolly, Stansfield and
Hainey (2009) describe some existing evaluation models for serious games. For instance,
they describe the game object model (GOM; Tan, Ling, and Ting, 2007), the CRESST
model (Baker and Mayer, 1999), and the four dimensional framework (FDF; De Freitas
and Oliver, 2006). Connolly, Stansfield and Hainey (2009) also propose an integrated
evaluation framework for game-based learning. The idea of their model is that games can
be evaluated in terms of learner performance, motivation, perceptions, and preferences as
well as the game environment and collaboration between players.
All these models have in common that they present only high-level views on the
evaluation of game-based learning, with a focus on game design features. They do
not, however, specify which indicators should be measured in line with the generic
evaluation models of Kirkpatrick and others. Mayer (2012) presents these indicators more
clearly. His conceptual model contains various types of variables at individual or team
level and organisational or system level, that are also present in other generic evaluation
models (e.g., Kraiger et al., 1993; Tannenbaum et al., 1993). As in Kirkpatrick's model,
outcome variables are measured directly after the gaming intervention, and more indirect
effects are measured later on. Mayer's model presents a pre-condition, the gaming intervention,
and a post-condition, from which (quasi-)experimental designs can be derived. Mayer
(2012) also stresses the dilemmas between generalisability and standardisation necessary
for comparative research, and specificity and flexibility necessary for evaluation of single
cases. Generic constructs as part of validated measurement instruments can be re-used for
various effectiveness studies of different games so that comparative studies could be done
in line with our ambition.
In sum, among the various evaluation models of game-based learning, the model of
Mayer (2012) best fits the approach to evaluating serious games that we propose.
His model is most in line with the general principles of evaluation models by
Kirkpatrick and others, applied to serious games.
This section presents the proposed game-based learning evaluation model (GEM). It
explains the main principles and its theoretical foundation.
experts applied this evaluation framework, using checklists and questionnaires, to two
different serious games. The results led to a next version of GEM. Finally, various
versions of GEM were applied in studies on the effectiveness of games, which also led
to improvements. Three of these studies are briefly summarised in the next section.
The final version of GEM is presented.
GEM indicates which types of indicators should be measured in validation research and
how they relate to each other. The idea is that a validation study should always measure
the learning outcomes, the design and learning indicators, personal features, and
environmental influences. For each study, choices should be made as to which indicators
are measured, based on the research hypotheses. The framework presents the most
important indicators adopted from the literature.
3.3.3 Feedback
Feedback is a broad term comprising various elements of feedback within the small game
and the setting in which the game is played (big game). In the small game, we refer to
feedback as:
1 the measurement of achievement (e.g., scores)
2 information related to the progress towards game goals
3 direct effects on actions
4 instructional support (e.g., hints).
Feedback is also part of the big game, as successful serious games often include a
debriefing session (Garris et al., 2002).
3.3.4 Challenge
Challenge refers to the actual content of the game, the problem the player is faced with.
One of the most important aspects is the balance between the challenge of the game and
the skills of the player (Csikszentmihalyi, 1990; Garris et al., 2002; Pavlas, 2010). In other
words, the game should not be too easy and not too difficult. As skills generally improve
during play, challenge should progress as well. This can be provided by an adaptive
system, measuring the performance of the player and choosing the optimal level of
challenge (Sweetser and Wyeth, 2005). Another aspect of challenge concerns the nature
of the problem to be solved, characterised by uncertainty, mystery, and curiosity (Garris
et al., 2002; Gee, 2003; Koster, 2005).
3.3.5 Control
Control gives the player a feeling of agency or power over what happens in the game.
More specifically, it is the amount of active learner control over content or gameplay,
and the player's capacity to influence elements of the game (Garris et al.,
2002). This can be created by allowing players to select their own strategy (Graesser
et al., 2009). It can also be created by incorporating those tools the player would expect
to reach the goal (Squire, 2011). Powerful as well is to allow players to manipulate or
co-design the game world (Gee, 2003).
3.5.1 Self-efficacy
Self-efficacy has been defined as individuals' beliefs about their performance capabilities
in a particular domain, or ‘the power of believing you can’ (Bandura, 1997). In an
achievement context, it includes learners' confidence in their cognitive skills to learn or
perform the task.
3.5.2 Motivation
Motivation is a very broad construct which has been proven to have a strong influence on
learning. Pintrich and De Groot (1990) even consider self-efficacy a part of
motivation. Motivation refers to the perceived importance of, or interest in, what is learned.
The intrinsic motivation inventory (IMI) by Ryan and Deci (for example in Deci et al.,
1994) shows the diverse constructs of motivation.
3.5.3 Engagement
Engagement describes a person’s active involvement in a task or activity (Reeve et al.,
2004) and is considered to stimulate learning (Appleton et al., 2006). In the serious
gaming literature engagement is often related to immersion and presence. Here
engagement refers to the player's subjective acceptance of a game's reality, as well as
their degree of involvement and focus on this reality (McMahan, 2003).
3.6.1 Self-directedness
Self-directedness, including related constructs such as self-regulation and self-directed
learning, is assumed to enhance learning, and is often mentioned as one of the most
important desirable effects of new learning solutions (Pintrich and De Groot, 1990).
Self-directed learning refers to the extent to which learners take responsibility for
their own learning process (Stubbé and Theunissen, 2008).
GEM evolved over time, based on its application in multiple studies. In this section, we
highlight three studies that demonstrate its use. In all studies the overall goal
was to investigate the effectiveness of a particular game (experimental group) in
comparison with another way of learning, usually classroom instruction (control group),
measuring both outcome and process variables.
Figure 2 Experimental design of the three economics games in education (see online version
for colours)
All three studies measured the same indicators adopted from GEM. We used self-made
scales, rated on a five-point scale, which were updated based on a factor analysis and
internal consistency (N = 228). The final design indicators were 'control'
(alpha = .82), 'challenge' (alpha = .69), 'feedback' (alpha = .85), 'social interaction'
(alpha = .85), and the learning indicator 'engagement' (alpha = .90). In this study, the
learning indicators 'motivation' and 'self-efficacy' were used only as control variables.
Learning outcomes were measured by self-assessment and knowledge tests, different for
each game. One of the games was ease-IT, a non-digital multiplayer simulation about
business engineering. University economics students (N = 88) played the game
during half a day. A control group of comparable students (N = 22) followed
classroom instruction by the same teacher with similar content.
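The internal-consistency values above are Cronbach's alphas. As a minimal sketch of how such a value is computed, the following plain-Python function uses hypothetical item data, not the study's actual scores:

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a scale: `items` is a list of items, each a
    list of scores (one per respondent). Uses population variances."""
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    k = len(items)                                    # number of items
    item_var_sum = sum(var(it) for it in items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent totals
    return k / (k - 1) * (1 - item_var_sum / var(totals))

# Hypothetical 3-item scale rated by four respondents on a five-point scale
alpha = cronbach_alpha([[1, 2, 3, 4], [2, 3, 4, 5], [1, 3, 3, 5]])
```

Values above roughly .70, as for most scales reported here, are conventionally taken as sufficient internal consistency.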
Independent-samples t-tests were performed to investigate differences on learning
indicators in relation to condition. For ease-IT we found that all five indicators were
significantly higher for the experimental group. For instance, the score on 'control' of the
experimental group (N = 76, M = 3.60, SD = .53) was significantly higher than that of the
control group (N = 20, M = 2.11, SD = .73) (t = −8.90, df = 27.30, p < .001). In
addition, independent-samples t-tests were performed to investigate the differences in
progression on self-assessment (post-test minus pre-test). Progression on all self-assessed
competences was significantly higher for the experimental group using ease-IT. For
example, progression on 'Strategic Management and Organisation' of the experimental
group (N = 76, M = .54, SD = .61) was significantly higher than that of the control group
(N = 20, M = −.14, SD = .66) (t = −4.50, df = 96, p < .001). Apparently,
the game group had more faith in their own competences; in other words, their self-
efficacy must have grown. Effects of serious gaming on the knowledge tests were not
found, probably because the type of tests did not match the learning goals of the games.
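The fractional degrees of freedom reported above (df = 27.30) suggest a Welch correction for unequal group variances. A minimal sketch of such a t-test in plain Python, with illustrative data rather than the study's:

```python
from math import sqrt

def welch_t(a, b):
    """Welch's independent-samples t-test: returns (t, df).
    Does not assume equal group variances."""
    def mean_var(xs):
        m = sum(xs) / len(xs)
        v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)  # sample variance
        return m, v

    (m1, v1), (m2, v2) = mean_var(a), mean_var(b)
    n1, n2 = len(a), len(b)
    se1, se2 = v1 / n1, v2 / n2
    t = (m1 - m2) / sqrt(se1 + se2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df

t, df = welch_t([1, 2, 3, 4], [2, 4, 6, 8])
```

The resulting t is then compared against the t-distribution with the (generally fractional) df to obtain a p-value.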
Furthermore, the influence of the five indicators on the self-assessed competences was
investigated with a multivariate linear regression analysis. The results for ease-IT showed
that 'feedback' (B = .37, p = .07) and 'challenge' (B = .39, p = .07) explained the
differences in progress on the self-assessed competence 'strategic management and
organisation' between the experimental and control groups. We found similar results for
the other two games, described in detail in Bakhuys Roozeboom et al. (in press).
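A regression coefficient B of this kind expresses how strongly an indicator predicts progress on an outcome. As a minimal single-predictor illustration (not the study's multivariate model), the least-squares coefficient can be computed in plain Python:

```python
def ols_fit(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    b = cov / var        # regression coefficient B
    a = my - b * mx      # intercept
    return a, b

# Hypothetical data: design-indicator scores vs. progression scores
a, b = ols_fit([1, 2, 3, 4], [0.1, 0.3, 0.5, 0.7])
```

A multivariate model fits several such coefficients simultaneously, one per indicator, but the interpretation of each B is the same.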
This first study taught us that measuring design and learning indicators in relation to
learning outcomes, as proposed by GEM, is useful for understanding why some serious
games are effective. Since we found similar results over the three economics games,
using a similar experimental design, we could argue that design features such as challenge,
control, social interaction, and feedback really add to the value of serious games.
However, we did not yet measure the learning indicators in the way proposed by GEM,
using 'motivation' and 'self-efficacy' only as control variables.
Figure 3 Experimental design abcdeSIM study (see online version for colours)
We measured the following learning indicators using the Motivated Strategies for
Learning Questionnaire (MSLQ, Pintrich and De Groot, 1990) before the training
(N = 159): ‘self-efficacy’ (alpha = .83), ‘intrinsic value’ (alpha = .76), ‘test anxiety’
(alpha = .87) and ‘self-regulation’ (alpha = .56). From the ‘intrinsic motivation inventory’
(IMI, Deci et al., 1994) the following learning indicators were measured after the
training: ‘enjoyment’ (alpha = .81), ‘perceived competence’ (alpha = .81), ‘effort’
(alpha = .66), ‘value’ (alpha = .77), and ‘pressure/tension’ (alpha = .86). Generally, the
Cronbach’s alphas are sufficiently high except for ‘self-regulation’ and ‘effort’. The
learning outcomes were measured using a regular skill assessment in which the junior
doctor was confronted with a role-play patient and an assessor. The skill assessment was
validated in a separate study (Dankbaar et al., 2014b). In addition, a self-assessment
questionnaire on emergency skills was completed several times.
Independent-samples t-tests were performed to investigate differences on learning
outcomes between the two groups. Results showed that after the game, before
face-to-face training, the experimental group performed better on the clinical
competencies of the skill assessment, and the variability in scores was smaller,
compared to the non-game group (7-point scale; M = 4.3/3.5; SD = 0.75/1.27, p = 0.03;
Cohen's d = 0.62) (Dankbaar et al., 2015). The self-assessment scores on clinical
competencies before training were also higher for the experimental group than for the
control group (M = 4.7/4.4, p = 0.01). After two weeks of training, a positive effect of
the preparatory game was no longer found on the assessment or self-assessment scores.
An obvious explanation for this is learning time: the effect of the 2–2.5 hours of game
learning time was overshadowed by the two weeks of face-to-face training.
Independent-samples t-tests were performed to investigate differences on learning
indicators in relation to condition. We did not find significant differences between the
control group and the game group on any of the learning indicators.
This brings us to the findings in relation to GEM. We applied GEM in a typical
experimental design with a face-to-face training for both groups. The purpose of the
game, preparation, is different from offering an alternative learning method. As the game
was only used by the experimental group, comparison of learning indicators had to be
directed at the training instead of the game. The learning indicators based on the MSLQ
and IMI showed good reliability and were used further in study 3. This second study
especially showed the importance of measuring the learning outcomes in a reliable and
valid way, as proposed by GEM.
Figure 4 Experimental design LINGO online study (see online version for colours)
The validated existing IMI questionnaire (Deci et al., 1994) was used for the learning
indicators. Self-made scales, mainly adopted from study 1, were used for the design
indicators and rated on a five-point scale. The scales were updated for this target group
based on factor analysis and internal consistency. An important change was that control
and challenge were combined into one scale. The final scales, all with 4 to 5 items, used
for the experimental group (N = 37) were: 'feedback' (alpha = .74), 'challenge & control'
(alpha = .71), 'rules & goals' (alpha = .73), 'action language' (alpha = .73), and 'game
world' (alpha = .75). For the control group (N = 38) only the first three scales were
used: 'feedback' (alpha = .62), 'challenge & control' (alpha = .68), 'rules & goals'
(alpha = .72).
The study showed interesting results based on the learning indicators. For the
experimental group, the difference between the average pre-test (N = 63, M = 3.25,
SD = .89) and post-test (N = 62, M = 3.60, SD = .78) scores on 'motivation',
investigated with a t-test, was significant (t = −3.03; df = 58; p = .004). We did not find
a significant difference for the control group. This implies that the motivation of the
experimental group increased significantly, in contrast with the control group. We also
found a difference for 'engagement' between the control group (N = 54, M = 2.71,
SD = .74) and the experimental group (N = 53, M = 2.89, SD = .88), although it did not
reach significance (t = −1.19; df = 117; p = .24). We did not find significant differences
for the other learning indicators.
Finally, the results showed significant differences between the control and experimental
groups, split by type of school, on the one-minute test, a standardised test of English
pronunciation. For primary schools, the difference in progression (post-test minus
pre-test) between the control group (N = 79; M = 2.35, SD = 5.41) and the experimental
group (N = 76; M = 4.63, SD = 3.77) was significant in an independent-samples t-test
(t = −2.09, df = 73, p = .04). We did not find any significant differences in
self-assessments; this might be explained by the fact that children find it difficult to
assess themselves. Furthermore, we did not find any effects for the design indicators.
Given its reliability, this questionnaire might not be suitable in its current form for
children.
In relation to GEM, which was applied here in its most complete form so far, we may
state that it is useful to measure the learning indicators to explain certain learning
outcomes in relation to condition, as also shown in study 1. The learning indicators
should optimally be measured in both pre-test and post-test, under the assumption that
serious gaming improves motivation, engagement, self-efficacy, etc. Comparable to
study 2, effects on learning outcomes were found based on standardised tests. The
questionnaire for the design indicators, the only one not based on validated
questionnaires, is usable but should be improved further based on multiple studies
applying GEM.
In the first study (Bakhuys Roozeboom et al., in press) we did not find clear significant
differences between the experimental and control groups in outcome measures. This was
mainly due to the methodology used, that is, a mismatch between the learning objectives
of the games and the outcome measures. In the second study (Dankbaar et al., 2014)
significant differences were found in the skill assessments between the two groups (with
or without the game). The same result was found for the third study (Trooster et al.,
2015), in which standardised language tests were used. This emphasises the importance
of measuring the learning outcomes in a reliable and valid way, as proposed by GEM. In
all three projects self-assessments were included as well, which relate to self-efficacy
(Pintrich and De Groot, 1990). For self-assessments we found (significant) results in the
first two studies (Bakhuys Roozeboom et al., in press; Dankbaar et al., 2014), in which
this was measured quite thoroughly. In the second study we found that self-assessment
was a more reliable outcome measure than self-efficacy, since it correlates with the skill
assessment. Especially younger children appear to have difficulty assessing themselves
reliably, as shown in the third study (Trooster et al., 2015). Thus, the usefulness of
self-assessments as outcome measures differs per goal and per target group.
As argued for GEM, it is also important to investigate the influence of the design and
learning indicators on the learning outcomes in order to gain insight into the black box of
learning. It is important to know which design features (e.g., challenge) contribute to
effective learning and why (e.g., higher motivation), to be able to improve serious
games based on the evaluation results (e.g., Mayer, 2012). Here we found even more
variation in the results of the various studies. It must be mentioned, however, that GEM
was still in development during the studies. Different combinations of design and
learning indicators were measured in various ways, both in pre- and post-test or only
once. We used different existing questionnaires for the learning indicators, the IMI and
the MSLQ, to find out what works best. In addition, we applied a self-made questionnaire
for the design indicators. This questionnaire has evolved over time and must still be
improved in further studies.
In the first study (Bakhuys Roozeboom et al., in press) the learning indicator
'engagement' appeared to have an important influence on learning outcomes, as found
earlier (e.g., Bedwell et al., 2012; McMahan, 2003). In addition, effects were found for
the design indicators that have been marked as important features of serious games by
others, that is, 'challenge', 'social interaction', 'feedback', and 'control' (e.g., Bedwell
et al., 2012; Gee, 2009). In general, this study showed that measuring these mediating
learning and design indicators is worthwhile. The second study (Dankbaar et al., 2014a,b)
had a different experimental design: the game was not meant to replace another type of
intervention but to serve as an (extra) preparation for face-to-face training. This made it
difficult and less useful to compare the two conditions (gaming, control) on design and
learning indicators. The third study (Trooster et al., 2015) resembled the first study more
closely. We found comparable effects for 'engagement' and were able to conclude that
'motivation' was increased by the game, as shown in many studies (e.g.,
Csikszentmihalyi, 1990; Pintrich and De Groot, 1990).
Thus, despite the fact that the studies differed greatly in their experimental designs
and the indicators measured, we were able to make a first start with combining results
over multiple games: an important future ambition for GEM. We can conclude that
GEM is a practically usable framework for measuring the effectiveness of serious
gaming for learning purposes, under the condition that it is applied appropriately and
methodically.
For game designers, it is interesting to analyse a game in more detail using the design
indicators. In this way, we can learn from how the game is designed. Of course, this is
still just one game, and the mechanism need not work for all types of games. Imagine,
though, that more and more effectiveness studies are conducted using GEM, measuring
the same indicators. By comparing the results of these analyses with the design
mechanics of the different games, we believe the patterns that emerge can provide
thorough and more informed guidelines for serious game design. Although we realise
that game design, also for games used for learning, will remain a creative process with
no definite right or wrong design principles, serious game designers could in the future
be provided with more informed guidelines as to what makes a serious game effective
and motivating, based on evidence gathered in several effectiveness studies using GEM.
References
Alliger, G.M. and Janak, E.A. (1989) ‘Kirkpatrick’s levels of training criteria: thirty years later’,
Personnel Psychology, Vol. 42, pp.331–342.
Alvarez, K., Salas, E. and Garofano, C.M. (2004) ‘An integrated model of training evaluation and
effectiveness’, Human Resource Development Review, Vol. 3, No. 4, pp.385–416.
Anderson, L.W. and Krathwohl, D.R. (2001) A Taxonomy for Learning, Teaching, and Assessing:
A Revision of Bloom’s Taxonomy of Educational Objectives, Longman, New York.
Appleton, J., Christenson, S.L., Kim, D. and Reschly, A.L. (2006) ‘Measuring cognitive and
psychological engagement: validation of the student engagement instrument’, Journal of
School Psychology, Vol. 44, pp.427–445.
Bakhuys Roozeboom, M., Visschedijk, G. and Oprins, E. (in press) ‘The effectiveness of three
serious games measuring generic learning features’, British Journal of Educational
Technology.
Bandura, A. (1997) Self-Efficacy: The Exercise of Control, Freeman, New York.
Barnard, L., Lan, W.Y., Yen, M.T., Osland Paton, V. and Lai, S. (2009) ‘Measuring self-regulation
in online and blended learning environments’, Internet and Higher Education, Vol. 12,
pp.1–6.
Bates, R. (2004) ‘A critical analysis of evaluation practice: the Kirkpatrick model and the principle
of beneficence’, Evaluation and Program Planning, Vol. 27, pp.341–347.
Bedwell, W.L., Pavlas, D., Heyne, K., Lazzara, E.H. and Salas, E. (2012) ‘Toward a taxonomy
linking game attributes to learning: an empirical study’, Simulation and Gaming, Vol. 43,
No. 6, pp.729–760.
Bekebrede, G., Warmelink, H.J.G. and Mayer, I.S. (2011) ‘Reviewing the need for gaming in
education to accommodate the Net generation’, Computers and Education, Vol. 57, No. 2,
pp.1521–1529, doi:10.1016/j.compedu.2011.02.010.
Cannon-Bowers, J.A., Salas, E., Tannenbaum, S.I. and Mathieu, J.E. (1995) ‘Toward theoretically
based principles of training effectiveness: a model and initial empirical investigation’, Military
Psychology, Vol. 7, pp.141–164.
Connolly, T.M., Boyle, E.A., MacArthur, E., Hainey, T. and Boyle, J.M. (2012) ‘A systematic
literature review of empirical evidence on computer games and serious games’, Computers
and Education, Vol. 59, No. 2, pp.661–686, doi:10.1016/j.compedu.2012.03.004.
Connolly, T.M., Stansfield, M.H. and Hainey, T. (2009) ‘Towards the development of a games-
based learning evaluation framework’, Connolly, T.M., Stansfield, M.H. and Boyle, E. (Eds.):
Games-based Learning Advancement for Multisensory Human Computer Interfaces:
Techniques and Effective Practices, Idea-Group Publishing, Hershey. ISBN: 978-1-60566-
360-9.
Csikszentmihalyi, M. (1990) Flow: The Psychology of Optimal Experience, Harper and Row, New
York.
Dankbaar, M.E.W., Bakhuys Roozeboom, M., Oprins, E.A.P.B., Rutten, F., van Saase, J.J.L.C.M.,
van Merrienboer, J.J.G., Schuit, S.C.E. (2014a) ‘Gaming as a training tool to train cognitive
skills in emergency care: how effective is it?’, in Schouten, B. et al. (Eds.): Games for Health,
Proceedings of the 4th Conference, Springer, New York, pp.13–15.
Dankbaar, M.E.W., Stegers-Jager, K.M., Baarveld, F., van Merrienboer, J.J.G., Norman, G.R.,
Rutten, F.L., van Saase, J.L.C.M. … Schuit, S.C.E. (2014b), ‘Assessing the assessment in
emergency care training’, PLoS One, Vol. 9, No. 12, p.e114663.
Dankbaar, M.E.W. (2015) ‘Serious games and blended learning; effects on performance and
motivation in medical education’, Thesis, Erasmus University Rotterdam, the Netherlands
(ch. 4), pp.61–78.
Deci, E.L., Eghrari, H., Patrick, B.C. and Leone, D. (1994) ‘Facilitating internalization: the self-
determination theory perspective', Journal of Personality, Vol. 62, pp.119–142.
Egenfeldt-Nielsen, S. (2006), ‘Overview of research on the educational use of video games’,
Digital Kompetanse, Vol. 1, No. 3, pp.184–213.
Garris, R., Ahlers, R. and Driskell, J.E. (2002) ‘Games, motivation and learning: a research and
practice model’, Simulation and Gaming: An Interdisciplinary Journal, Vol. 33, pp.441–467.
Gee, J.P. (2003) What Videogames Have To Teach Us About Learning And Literacy, Palgrave
Macmillan, New York.
Gee, J.P. (2009) ‘Deep learning properties of good digital games. How far can they go?’, In
Ritterfeld, U., Cody, M. and Vorderer, P. (Eds.): Serious Games: Mechanisms and Effects,
Routledge, New York/London, pp.68–82.
Graesser, A.C., Chipman, P., Leeming, F. and Biedenbach, S. (2009) ‘Deep learning and emotion
in serious games’, in Ritterfeld, U., Cody, M. and Vorderer, P. (eds.): Serious Games:
Mechanisms and Effects, Routledge, Taylor and Francis, New York and London, pp.81–100.
Harteveld, C. (2012), Making Sense of Virtual Risks: A Quasi-Experimental Investigation into
Game-Based Training, IOS Press, Amsterdam.
Hockey, G.R. (1997) ‘Compensatory control in the regulation of human performance under stress
and high workload: a cognitive-energetical framework’, Biological Psychology, Vol. 45,
pp.73–93.
Holton, E.F. (1996) ‘The flawed four-level evaluation model’, Human Resource Development
Quarterly, Vol. 7, pp.5–21.
Kirkpatrick, D.L. (1976) ‘Evaluation of training’, in Craig, R.L. (Ed.): Training and
Development Handbook: A Guide to Human Resource Development, McGraw Hill,
New York, pp.301–319.
Kirkpatrick, D.L. (1998) Evaluating Training Programs: The Four Levels, 2nd ed., Berrett-Koehler,
San Francisco.
Korteling, J.E., Helsdingen, A.S. and Theunissen, N.C.M. (2012) ‘Serious games @ work: learning
job-related competences using serious gaming’, in Bakker, A. and Derks, D. (Eds.): The
Psychology of Digital Media at Work, Psychology Press Ltd, Taylor and Francis Group,
pp.123–144.
Korteling, J.E., Oprins, E.A.P.B. and Venrooij, W. (2014) Evaluatie van leerinterventies en
teamfunctioneren in dynamische teams [Evaluation of learning interventions and team
functioning in dynamic teams] (TNO Report R10243), TNO, Soesterberg.
Kraiger, K. (2002) ‘Decision-based evaluation’, in Kraiger, K. (Ed.): Creating, Implementing and
Managing Effective Training and Development, Jossey-Bass, San Francisco, CA, pp.331–375.
Kraiger, K., Ford, J.K. and Salas, E. (1993) ‘Application of cognitive, skill-based, and affective
theories of learning outcomes to new methods of training evaluation’, Journal of Applied
Psychology, Vol. 78, No. 2, pp.311–328.
Leemkuil, H., de Jong, T. and Ootes, S. (2000) Review of Educational Use of Games and
Simulations (IST-1999-13078 Deliverable D1), University of Twente, Enschede.
Malone, T.W. (1981) ‘Towards a theory of intrinsically motivating instruction’, Cognitive Science,
Vol. 4, pp.333–369.
Mayer, I. (2012) ‘Towards a comprehensive methodology for the research and evaluation of serious
games’, Procedia Computer Science, Vol. 15, pp.233–247.
McMahan, A. (2003) ‘Immersion, engagement, and presence: a method for analyzing 3-D video
games’, in Wolf, M.J.P. and Perron, B. (Eds.): The Video Game Theory Reader, Routledge,
Taylor and Francis Group, New York, pp.77–78.
Oprins, E. and Korteling, H. (2014) ‘Transfer of gaming: effectiveness of a cashier trainer’, in Cai,
Y.Y. and Goei, S.L. (Eds.): Serious Games, Simulation and Their Applications, Springer,
Science and Business Media, Singapore, pp.227–253.
Pavlas, D. (2010) A Model of Flow and Play in Game-Based Learning: The Impact of Game
Characteristics, Player Traits, and Player States, Unpublished PhD thesis, University of
Central Florida.
Pintrich, P.R. and de Groot, E.V. (1990) ‘Motivational and self-regulated learning components of
classroom academic performance’, Journal of Educational Psychology, Vol. 82, No. 1,
pp.33–40.
Reeve, J., Jang, H., Carrell, D., Jeon, S. and Barch, J. (2004) ‘Enhancing students’ engagement by
increasing teachers’ autonomy support’, Motivation and Emotion, Vol. 28, No. 2, pp.147–169.
Richardson, M., Abraham, C. and Bond, R. (2012) ‘Psychological correlates of university students’
academic performance: a systematic review and meta-analysis’, Psychological Bulletin,
Vol. 138, No. 2, pp.353–387.
Roe, R.A. (2005) ‘The design of selection systems: contexts, principles, issues’, in Evers, A., Smit,
O. and Anderson, N. (Eds.): Handbook of Personnel Selection, Blackwell, Oxford, pp.73–97.
Ryan, R.M. and Deci, E.L. (2000) ‘Intrinsic and extrinsic motivations: classic definitions and new
directions’, Contemporary Educational Psychology, Vol. 25, No. 1, pp.54–67.
Salas, E., Milham, L.M. and Bowers, C.A. (2003) ‘Training evaluation in the military:
misconceptions, opportunities, and challenges’, Military Psychology, Vol. 15, pp.3–16.
Shaffer, D.W. (2006) How Computer Games Help People Learn, Palgrave Macmillan, New York.
Sitzmann, T. (2011) ‘A meta-analytic examination of the instructional effectiveness of computer-
based simulation games’, Personnel Psychology, Vol. 64, pp.489–528.
Squire, K.D. (2011) Video Games and Learning: Teaching and Participatory Culture in the
Digital Age, Teachers College Press, New York.
Stubbé, H.M. and Theunissen, N.C.M. (2008) ‘Self-directed learning in a ubiquitous learning
environment: a meta-review’, Proceedings of Special Track on Technology Support for Self-
Organised Learners, Vol. 2008, pp.5–28.
Sweetser, P. and Wyeth, P. (2005) ‘GameFlow: a model for evaluating player enjoyment in
games’, Computers in Entertainment, Vol. 3, No. 3, pp.1–24.
Tannenbaum, S., Cannon-Bowers, J., Salas, E. and Mathieu, J. (1993) Factors That Influence
Training Effectiveness: A Conceptual Model And Longitudinal Analysis (Tech. Rep. No. 93-
011), Naval Training Systems Center, Human Systems Integration Division, Orlando.
Tobias, S. and Fletcher, J.D. (2007) ‘What research has to say about designing computer games for
learning’, Educational Technology, Vol. 47, pp.20–29.
Trooster, W., Goei, S.L., Ticheloven, A., Oprins, E., Visschedijk, G., Corbalan, G. and
van Schaik, M. (2014) ‘The effectiveness of LINGO online, a serious game for English
pronunciation’, Report, Windesheim University of Applied Sciences, Zwolle.
Vogel, J., Vogel, D.S., Cannon-Bowers, J., Bowers, C.A., Muse, K. and Wright, M. (2006)
‘Computer gaming and interactive simulations for learning: a meta-analysis’, Journal of
Educational Computing Research, Vol. 34, pp.229–243.
Wang, H., Shen, C. and Ritterfeld, U. (2009) ‘Enjoyment of digital games: what makes them
‘seriously’ fun?’, in Ritterfeld, U., Cody, M. and Vorderer, P. (Eds.): Serious Games:
Mechanisms and Effects, Routledge, New York/London, pp.25–47.
Zijlstra, F.R.H. (1993) Efficiency in Work Behaviour: A Design Approach for Modern Tools,
Unpublished PhD Thesis, Delft University of Technology, Delft.