Mary Dankbaar
Erasmus University Medical Centre Rotterdam,
Institute of Medical Education Research,
PO Box 2040, 3000 CA Rotterdam, The Netherlands
Email: m.dankbaar@erasmusmc.nl
Wim Trooster
Hogeschool Windesheim,
Kenniscentrum Domein Bewegen & Educatie,
Postbus 10090, 8000 GB Zwolle, The Netherlands
Email: w.trooster@windesheim.nl
applied in different contexts show how GEM can be practically used and how
these studies have contributed to the improvement of GEM.
Keywords: applied gaming; educational games; educational technology;
effectiveness research; evaluation framework; evaluation model; evidence-
based; game-based learning; serious gaming; validation.
Reference to this paper should be made as follows: Oprins, E., Visschedijk, G.,
Roozeboom, M.B., Dankbaar, M., Trooster, W. and Schuit, S.C.E. (2015) ‘The
game-based learning evaluation model (GEM): measuring the effectiveness of
serious games using a standardised method’, Int. J. Technology Enhanced
Learning, Vol. 7, No. 4, pp.326–345.
Biographical notes: Esther Oprins, PhD, has been a researcher at TNO, Soesterberg,
the Netherlands, since 2009. Her background is in Educational Psychology. Her main
research interests at TNO are training design, assessment, evaluation, and
validation research of training with new technology (serious gaming,
simulation, simulators), in various domains such as education, aviation, and
military. She obtained her PhD in Competence Assessment of Air Traffic
Controllers at Maastricht University which she combined with her previous
work as Training Expert at Air Traffic Control the Netherlands (2002–2009).
Gillian Visschedijk, MSc, is a Scientific Researcher at the Training and
Performance Innovations Department at TNO. She graduated cum laude at the
University of Twente in Enschede with a Degree in Educational Science &
Technology. She specialises in the use of serious gaming and simulation
technology for training purposes in various domains such as education, health,
crisis management, and military. Her work is characterised by both design
projects and validation studies. Her main expertise in these projects is
combining didactical approaches and game design principles.
Maartje Bakhuys Roozeboom, MSc, has been a Researcher at TNO, Leiden, since
2007. She works in the Department of Work, Health, and Care. Her main interests
are in the area of evaluation research in various domains such as occupational
health and serious gaming. She has a background in Social Psychology and
Sociology and she has been involved in the development and (effect)
evaluation of various serious games.
Mary E.W. Dankbaar, MSc, has been a Program Manager for e-learning at the
Erasmus University Medical Centre since 2006 and has a background in
Educational Psychology, with a specialisation in Technology-based Learning.
She has designed, developed, and implemented a large number of online and
blended programs for companies and for higher education organisations. Aside
from her work in instructional design and online learning, she is currently
working on her PhD thesis on the effectiveness of serious games in health care.
She has developed, in close cooperation with health care and game
professionals, the abcdeSIM ‘serious game’ for emergency care skills training.
From 2008 to 2013, she was the chairwoman of the national e-learning working
group of the Dutch Association for Medical Education (NVMO), and since
2012 she has been a member of the scientific board of the Dutch Society for
Simulation in Healthcare.
Wim Trooster is currently working as a Senior Lecturer/Researcher on
e-learning at Windesheim University, Zwolle, the Netherlands. He has a Master
of Science in Biology and a PhD in Medical Sciences (Neurology). He gained
experience in education as well (secondary vocational and university education),
where he developed a strong interest in the application of IT in education. For
the last six years he has worked as a researcher on IT & Educational Innovation
at Windesheim University, among other topics on the use of
serious gaming in education and training.

This article describes the theoretical background and main principles of this evaluation framework, and its practical application in three studies.

1 Introduction
Various evaluation models and frameworks for evaluating and measuring the effectiveness of
learning interventions have been designed, both generic and gaming-specific. This
section discusses the characteristics and approaches of the various models in order to
conclude which principles should lead the design of an appropriate evaluation
framework for game-based learning.
interest and user behaviours such as greater persistence on task. Output refers to the
debriefing and the learning outcomes. An interesting aspect of their model is the distinction
between game design features and the effects these have on the users' perceptions and
subsequent behaviour. Pavlas (2010) also describes the process of game-based learning.
Pavlas makes the distinction between ‘player traits’ and ‘game characteristics’ as input
variables, ‘player in game states and behaviours’ as a kind of process variables with flow
as a central theme, and ‘outcomes’ as the output variables. Connolly, Stansfield and
Hainey (2009) describe some existing evaluation models for serious games. For instance,
they describe the game object model (GOM; Tan, Ling, and Ting, 2007), the CRESST
model (Baker and Mayer, 1999), and the four dimensional framework (FDF; De Freitas
and Oliver, 2006). Connolly, Stansfield and Hainey (2009) also propose an integrated
evaluation framework for game-based learning. The idea of their model is that games can
be evaluated in terms of learner performance, motivation, perceptions, and preferences as
well as the game environment and collaboration between players.
All these models have in common that they present only high-level views on the
evaluation of game-based learning, with a focus on game design features. They do
not, however, specify which indicators should be measured in line with the generic
evaluation models of Kirkpatrick and others. Mayer (2012) presents these indicators more
clearly. His conceptual model contains various types of variables at individual or team
level and organisational or system level, that are also present in other generic evaluation
models (e.g., Kraiger et al., 1993; Tannenbaum et al., 1993). As in Kirkpatrick's model,
outcome variables are measured directly after the gaming intervention, and more indirect
effects are measured later on. Mayer's model presents a pre-condition, the gaming intervention,
and a post-condition, from which (quasi-)experimental designs can be derived. Mayer
(2012) also stresses the dilemmas between generalisability and standardisation necessary
for comparative research, and specificity and flexibility necessary for evaluation of single
cases. Generic constructs as part of validated measurement instruments can be re-used for
various effectiveness studies of different games so that comparative studies could be done
in line with our ambition.
In sum, among the various evaluation models of game-based learning, the model of
Mayer (2012) best fits the approach to evaluating serious games that we propose.
His model is most in line with the general principles of evaluation models by
Kirkpatrick and others, applied to serious games.
This section presents the proposed game-based learning evaluation model (GEM). It
explains the main principles and its theoretical foundation.
experts applied this evaluation framework, using checklists and questionnaires, to two
different serious games. The results led to a next version of GEM. Finally, various
versions of GEM were applied in studies on the effectiveness of games, which also led
to improvements. Three of these studies are briefly summarised in the next section.
The final version of GEM is presented.
GEM indicates which types of indicators should be measured in validation research and
how they relate to each other. The idea is that a validation study should always measure
the learning outcomes, the design and learning indicators, personal features, and
environmental influences. For each study, choices should be made as to which indicators
are measured, based on the research hypotheses. The framework presents the most
important indicators adopted from the literature.
3.3.3 Feedback
Feedback is a broad term comprising various elements of feedback within the small game
and the setting in which the game is played (big game). In the small game, we refer to
feedback as:
1 the measurement of achievement (e.g., scores)
2 information related to the progress towards game goals
3 direct effects on actions
4 instructional support (e.g., hints).
Feedback is also part of the big game, as successful serious games often include a
debriefing session (Garris et al., 2002).
3.3.4 Challenge
Challenge refers to the actual content of the game, the problem the player is faced with.
One of the most important aspects is the balance between the challenge of the game and
the skills of the player (Csikszentmihalyi, 1990; Garris et al., 2002; Pavlas, 2010). In other
words, the game should not be too easy and not too difficult. As skills generally improve
during play, challenge should progress as well. This can be provided by an adaptive
system, measuring the performance of the player and choosing the optimal level of
challenge (Sweetser and Wyeth, 2005). Another aspect of challenge concerns the nature
of the problem to be solved, characterised by uncertainty, mystery, and curiosity (Garris
et al., 2002; Gee, 2003; Koster, 2005).
3.3.5 Control
Control gives the player a feeling of agency or power over what happens in the game.
More specifically, it is the amount of active learner control over content or gameplay,
and the player's capacity to influence elements of the game (Garris et al.,
2002). This can be created by allowing players to select their own strategy (Graesser
et al., 2009). It can also be created by incorporating those tools the player would expect
to reach the goal (Squire, 2011). Powerful as well is to allow players to manipulate or
co-design the game world (Gee, 2003).
3.5.1 Self-efficacy
Self-efficacy has been defined as individuals' beliefs about their performance capabilities
in a particular domain, or ‘the power of believing you can’ (Bandura, 1997). In an
achievement context, it includes learners' confidence in their cognitive skills to learn or
perform the task.
3.5.2 Motivation
Motivation is a very broad construct which has been proven to have a strong influence on
learning. Pintrich and De Groot (1990) even consider self-efficacy a part of
motivation. Motivation refers to the perceived importance of, or interest in, what is learned.
The intrinsic motivation inventory (IMI) by Ryan and Deci (for example in Deci et al.,
1994) shows the diverse constructs of motivation.
3.5.3 Engagement
Engagement describes a person’s active involvement in a task or activity (Reeve et al.,
2004) and is considered to stimulate learning (Appleton et al., 2006). In the serious
gaming literature engagement is often related to immersion and presence. Here
engagement refers to the player's subjective acceptance of a game's reality, as well as
their degree of involvement and focus on this reality (McMahan, 2003).
3.6.1 Self-directedness
Self-directedness, including related constructs such as self-regulation and self-directed
learning, is assumed to enhance learning, and is often mentioned as one of the most
important desirable effects of new learning solutions (Pintrich and De Groot, 1990).
Self-directed learning refers to the extent to which learners take responsibility for
their own learning process (Stubbé and Theunissen, 2008).
GEM evolved over time, based on its application in multiple studies. In this section, we
highlight three studies that demonstrate its use. In all studies the overall goal
was to investigate the effectiveness of a particular game (experimental group) in
comparison with another way of learning, usually classroom instruction (control group),
measuring both outcome and process variables.
Figure 2 Experimental design of the three economics games in education (see online version
for colours)
All three studies measured the same indicators adopted from GEM. We used self-made
scales, rated on a five-point scale, which were updated based on a factor analysis and
internal consistency (N = 228). The final design indicators were 'control'
(alpha = .82), 'challenge' (alpha = .69), 'feedback' (alpha = .85), 'social interaction'
(alpha = .85), and the learning indicator 'engagement' (alpha = .90). In this study, the
learning indicators 'motivation' and 'self-efficacy' were used only as control variables.
Learning outcomes were measured by self-assessment and knowledge tests, different for
each game. One of the games was ease-IT, a non-digital multiplayer simulation about
business engineering. University economics students (N = 88) played the game
during half a day. A control group of comparable students (N = 22) followed
classroom instruction by the same teacher with similar content.
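The internal-consistency values above are Cronbach's alphas. As a minimal sketch of how such a value is computed, the following plain-Python function uses hypothetical item data, not the study's actual scores:

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a scale: `items` is a list of items, each a
    list of scores (one per respondent). Uses population variances."""
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    k = len(items)                                    # number of items
    item_var_sum = sum(var(it) for it in items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent totals
    return k / (k - 1) * (1 - item_var_sum / var(totals))

# Hypothetical 3-item scale rated by four respondents on a five-point scale
alpha = cronbach_alpha([[1, 2, 3, 4], [2, 3, 4, 5], [1, 3, 3, 5]])
```

Values above roughly .70, as for most scales reported here, are conventionally taken as sufficient internal consistency.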
Independent-samples t-tests were performed to investigate differences on learning
indicators in relation to condition. For ease-IT we found that all five indicators were
significantly higher for the experimental group. For instance, the score on 'control' of the
experimental group (N = 76, M = 3.60, SD = .53) was significantly higher than that of the
control group (N = 20, M = 2.11, SD = .73) (t = −8.90, df = 27.30, p < .001). In
addition, independent-samples t-tests were performed to investigate the differences in
progression on self-assessment (post-test minus pre-test). Progression on all self-assessed
competences was significantly higher for the experimental group using ease-IT. For
example, progression on 'Strategic Management and Organisation' of the experimental
group (N = 76, M = .54, SD = .61) was significantly higher than that of the control group
(N = 20, M = −.14, SD = .66) (t = −4.50, df = 96, p < .001). Apparently,
the game group had more faith in their own competences; in other words, their self-
efficacy must have grown. Effects of serious gaming on the knowledge tests were not
found, probably because the type of tests did not match the learning goals of the games.
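The fractional degrees of freedom reported above (df = 27.30) suggest a Welch correction for unequal group variances. A minimal sketch of such a t-test in plain Python, with illustrative data rather than the study's:

```python
from math import sqrt

def welch_t(a, b):
    """Welch's independent-samples t-test: returns (t, df).
    Does not assume equal group variances."""
    def mean_var(xs):
        m = sum(xs) / len(xs)
        v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)  # sample variance
        return m, v

    (m1, v1), (m2, v2) = mean_var(a), mean_var(b)
    n1, n2 = len(a), len(b)
    se1, se2 = v1 / n1, v2 / n2
    t = (m1 - m2) / sqrt(se1 + se2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df

t, df = welch_t([1, 2, 3, 4], [2, 4, 6, 8])
```

The resulting t is then compared against the t-distribution with the (generally fractional) df to obtain a p-value.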
Furthermore, the influence of the five indicators on the self-assessed competences was
investigated with a multivariate linear regression analysis. The results for ease-IT showed
that 'feedback' (B = .37, p = .07) and 'challenge' (B = .39, p = .07) explained the
differences in progress on the self-assessed competence 'strategic management and
organisation' between the experimental and control groups. We found similar results for
the other two games, described in detail in Bakhuys Roozeboom et al. (in press).
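A regression coefficient B of this kind expresses how strongly an indicator predicts progress on an outcome. As a minimal single-predictor illustration (not the study's multivariate model), the least-squares coefficient can be computed in plain Python:

```python
def ols_fit(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    b = cov / var        # regression coefficient B
    a = my - b * mx      # intercept
    return a, b

# Hypothetical data: design-indicator scores vs. progression scores
a, b = ols_fit([1, 2, 3, 4], [0.1, 0.3, 0.5, 0.7])
```

A multivariate model fits several such coefficients simultaneously, one per indicator, but the interpretation of each B is the same.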
This first study taught us that measuring design and learning indicators in relation to
learning outcomes, as proposed by GEM, is useful for understanding why some serious
games are effective. Since we found similar results over the three economics games,
using a similar experimental design, we could argue that design features such as challenge,
control, social interaction, and feedback really add to the value of serious games.
However, we did not yet measure the learning indicators in the way proposed by GEM,
using 'motivation' and 'self-efficacy' only as control variables.
Figure 3 Experimental design abcdeSIM study (see online version for colours)
We measured the following learning indicators using the Motivated Strategies for
Learning Questionnaire (MSLQ, Pintrich and De Groot, 1990) before the training
(N = 159): ‘self-efficacy’ (alpha = .83), ‘intrinsic value’ (alpha = .76), ‘test anxiety’
(alpha = .87) and ‘self-regulation’ (alpha = .56). From the ‘intrinsic motivation inventory’
(IMI, Deci et al., 1994) the following learning indicators were measured after the
training: ‘enjoyment’ (alpha = .81), ‘perceived competence’ (alpha = .81), ‘effort’
(alpha = .66), ‘value’ (alpha = .77), and ‘pressure/tension’ (alpha = .86). Generally, the
Cronbach’s alphas are sufficiently high except for ‘self-regulation’ and ‘effort’. The
learning outcomes were measured using a regular skill assessment in which the junior
doctor was confronted with a role-play patient and an assessor. The skill assessment was
validated in a separate study (Dankbaar et al., 2014b). In addition, a self-assessment
questionnaire on emergency skills was completed several times.
Independent-samples t-tests were performed to investigate differences on learning
outcomes between the two groups. Results showed that after the game, before
face-to-face training, the experimental group performed better on the clinical
competencies of the skill assessment, and the variability in scores was smaller,
compared to the non-game group (7-point scale; M = 4.3/3.5; SD = 0.75/1.27, p = 0.03;
Cohen's d = 0.62) (Dankbaar et al., 2015). The self-assessment scores on clinical
competencies before training were also higher for the experimental group than for the
control group (M = 4.7/4.4, p = 0.01). After two weeks of training, a positive effect of
the preparatory game was no longer found on the assessment or self-assessment scores.
An obvious explanation for this is learning time: the effect of the 2–2.5 hours of game
learning time was overshadowed by the two weeks of face-to-face training.
Independent-samples t-tests were performed to investigate differences on learning
indicators in relation to condition. We did not find significant differences between the
control group and the game group on any of the learning indicators.
This brings us to the findings in relation to GEM. We applied GEM in a typical
experimental design with a face-to-face training for both groups. The purpose of the
game, preparation, is different from offering an alternative learning method. As the game
was only used by the experimental group, comparison of learning indicators had to be
directed at the training instead of the game. The learning indicators based on the MSLQ
and IMI showed good reliability and were used further in study 3. This second study
especially showed the importance of measuring the learning outcomes in a reliable and
valid way, as proposed by GEM.
Figure 4 Experimental design LINGO online study (see online version for colours)
The validated existing IMI questionnaire (Deci et al., 1994) was used for the learning
indicators. Self-made scales, mainly adopted from study 1, were used for the design
indicators and rated on a five-point scale. The scales were updated for this target group
based on factor analysis and internal consistency. An important change was that control
and challenge were combined into one scale. The final scales, all with 4 to 5 items, used
for the experimental group (N = 37) were: 'feedback' (alpha = .74), 'challenge & control'
(alpha = .71), 'rules & goals' (alpha = .73), 'action language' (alpha = .73), and 'game
world' (alpha = .75). For the control group (N = 38) only the first three scales were
used: 'feedback' (alpha = .62), 'challenge & control' (alpha = .68), 'rules & goals'
(alpha = .72).
The study showed interesting results based on the learning indicators. For the
experimental group, the difference between the average pre-test (N = 63, M = 3.25,
SD = .89) and post-test (N = 62, M = 3.60, SD = .78) scores on 'motivation',
investigated with a t-test, was significant (t = −3.03; df = 58; p = .004). We did not find
a significant difference for the control group. This implies that the motivation of the
experimental group increased significantly, in contrast with the control group. We also
found a difference for 'engagement' between the control group (N = 54, M = 2.71,
SD = .74) and the experimental group (N = 53, M = 2.89, SD = .88), although it did not
reach significance (t = −1.19; df = 117; p = .24). We did not find significant differences
for the other learning indicators.
Finally, the results showed significant differences between the control and experimental
groups, split by type of school, on the one-minute test, a standardised test of English
pronunciation. For primary schools, the difference in progression (post-test minus
pre-test) between the control group (N = 79; M = 2.35, SD = 5.41) and the experimental
group (N = 76; M = 4.63, SD = 3.77) was significant in an independent-samples t-test
(t = −2.09, df = 73, p = .04). We did not find any significant differences in
self-assessments; this might be explained by the fact that children find it difficult to
assess themselves. Furthermore, we did not find any effects for the design indicators.
Given its reliability, this questionnaire might not be suitable in its current form for
children.
In relation to GEM, which was applied here in its most complete form so far, we may
state that it is useful to measure the learning indicators to explain certain learning
outcomes in relation to condition, as also shown in study 1. The learning indicators
should optimally be measured in both pre-test and post-test, under the assumption that
serious gaming improves motivation, engagement, self-efficacy, etc. Comparable to
study 2, effects on learning outcomes were found based on standardised tests. The
questionnaire for the design indicators, the only one not based on validated
questionnaires, is usable but should be improved further based on multiple studies
applying GEM.
In the first study (Bakhuys Roozeboom et al., in press) we did not find clear significant
differences between the experimental and control groups in outcome measures. This was
mainly due to the methodology used, that is, a mismatch between the learning objectives
of the games and the outcome measures. In the second study (Dankbaar et al., 2014)
significant differences were found in the skill assessments between the two groups (with
or without the game). The same result was found for the third study (Trooster et al.,
2015), in which standardised language tests were used. This emphasises the importance
of measuring the learning outcomes in a reliable and valid way, as proposed by GEM. In
all three projects self-assessments were included as well, which relate to self-efficacy
(Pintrich and De Groot, 1990). For self-assessments we found (significant) results in the
first two studies (Bakhuys Roozeboom et al., in press; Dankbaar et al., 2014), in which
this was measured quite thoroughly. In the second study we found that self-assessment
was a more reliable outcome measure than self-efficacy, since it correlates with the skill
assessment. Especially younger children appear to have difficulty assessing themselves
reliably, as shown in the third study (Trooster et al., 2015). Thus, the usefulness of
self-assessments as outcome measures differs per goal and per target group.
As argued for GEM, it is also important to investigate the influence of the design and
learning indicators on the learning outcomes in order to gain insight into the black box of
learning. It is important to know which design features (e.g., challenge) contribute to
effective learning and why (e.g., higher motivation), to be able to improve serious
games based on the evaluation results (e.g., Mayer, 2012). Here we found even more
variation in the results of the various studies. It must be mentioned, however, that GEM
was still in development during the studies. Different combinations of design and
learning indicators were measured in various ways, both in pre- and post-test or only
once. We used different existing questionnaires for the learning indicators, the IMI and
the MSLQ, to find out what works best. In addition, we applied a self-made questionnaire
for the design indicators. This questionnaire has evolved over time and must still be
improved in further studies.
In the first study (Bakhuys Roozeboom et al., in press) the learning indicator
'engagement' appeared to have an important influence on learning outcomes, as found
earlier (e.g., Bedwell et al., 2012; McMahan, 2003). In addition, effects were found for
the design indicators that have been marked as important features of serious games by
others, that is, 'challenge', 'social interaction', 'feedback', and 'control' (e.g., Bedwell
et al., 2012; Gee, 2009). In general, this study showed that measuring these mediating
learning and design indicators is worthwhile. The second study (Dankbaar et al., 2014a,b)
had a different experimental design: the game was not meant to replace another type of
intervention but to serve as an (extra) preparation for face-to-face training. This made it
difficult and less useful to compare the two conditions (gaming, control) on design and
learning indicators. The third study (Trooster et al., 2015) resembled the first study more
closely. We found comparable effects for 'engagement' and were able to conclude that
'motivation' was increased by the game, as shown in many studies (e.g.,
Csikszentmihalyi, 1990; Pintrich and De Groot, 1990).
Thus, despite the fact that the studies differed greatly in their experimental designs
and the indicators measured, we were able to make a first start with combining results
over multiple games: an important future ambition for GEM. We can conclude that
GEM is a practically usable framework for measuring the effectiveness of serious
gaming for learning purposes, under the condition that it is applied appropriately and
methodically.
For game designers, it is interesting to analyse a game in more detail using the design
indicators. In this way, we can learn from how the game is designed. Of course, this is
still just one game, and the mechanism need not work for all types of games. Imagine,
though, that more and more effectiveness studies are conducted using GEM, measuring
the same indicators. By comparing the results of these analyses with the design
mechanics of the different games, we believe the patterns that emerge can provide
thorough and more informed guidelines for serious game design. Although we realise
that game design, also for games used for learning, will remain a creative process with
no definite right or wrong design principles, serious game designers could in the future
be provided with more informed guidelines as to what makes a serious game effective
and motivating, based on evidence gathered in several effectiveness studies using GEM.
References
Alliger, G.M. and Janak, E.A. (1989) ‘Kirkpatrick’s levels of training criteria: thirty years later’,
Personnel Psychology, Vol. 42, pp.331–342.
Alvarez, K., Salas, E. and Garofano, C.M. (2004) ‘An integrated model of training evaluation and
effectiveness’, Human Resource Development Review, Vol. 3, No. 4, pp.385–416.
Anderson, L.W. and Krathwohl, D.R. (2001) A Taxonomy for Learning, Teaching, and Assessing:
A Revision of Bloom’s Taxonomy of Educational Objectives, Longman, New York.
Appleton, J., Christenson, S.L., Kim, D. and Reschly, A.L. (2006) ‘Measuring cognitive and
psychological engagement: validation of the student engagement instrument’, Journal of
School Psychology, Vol. 44, pp.427–445.
Bakhuys Roozeboom, M., Visschedijk, G. and Oprins, E. (in press) ‘The effectiveness of three
serious games measuring generic learning features’, British Journal of Educational
Technology.
Bandura, A. (1997) Self-Efficacy: The Exercise of Control, Freeman, New York.
Barnard, L., Lan, W.Y., Yen, M.T., Osland Paton, V. and Lai, S. (2009) ‘Measuring self-regulation
in online and blended learning environments’, Internet and Higher Education, Vol. 12,
pp.1–6.
Bates, R. (2004) ‘A critical analysis of evaluation practice: the Kirkpatrick model and the principle
of beneficence’, Evaluation and Program Planning, Vol. 27, pp.341–347.
Bedwell, W.L., Pavlas, D., Heyne, K., Lazzara, E.H. and Salas, E. (2012) ‘Toward a taxonomy
linking game attributes to learning: an empirical study’, Simulation and Gaming, Vol. 43,
No. 6, pp.729–760.
Bekebrede, G., Warmelink, H.J.G. and Mayer, I.S. (2011) ‘Reviewing the need for gaming in
education to accommodate the Net generation’, Computers and Education, Vol. 57, No. 2,
pp.1521–1529, doi:10.1016/j.compedu.2011.02.010.
Cannon-Bowers, J.A., Salas, E., Tannenbaum, S.I. and Mathieu, J.E. (1995) ‘Toward theoretically
based principles of training effectiveness: a model and initial empirical investigation’, Military
Psychology, Vol. 7, pp.141–164.
Connolly, T.M., Boyle, E.A., MacArthur, E., Hainey, T. and Boyle, J.M. (2012) ‘A systematic
literature review of empirical evidence on computer games and serious games’, Computers
and Education, Vol. 59, No. 2, pp.661–686, doi:10.1016/j.compedu.2012.03.004.
Connolly, T.M., Stansfield, M.H. and Hainey, T. (2009) ‘Towards the development of a games-
based learning evaluation framework’, Connolly, T.M., Stansfield, M.H. and Boyle, E. (Eds.):
Games-based Learning Advancement for Multisensory Human Computer Interfaces:
Techniques and Effective Practices, Idea-Group Publishing, Hershey. ISBN: 978-1-60566-
360-9.
Csikszentmihalyi, M. (1990) Flow: The Psychology of Optimal Experience, Harper and Row, New
York.
Dankbaar, M.E.W., Bakhuys Roozeboom, M., Oprins, E.A.P.B., Rutten, F., van Saase, J.J.L.C.M.,
van Merrienboer, J.J.G., Schuit, S.C.E. (2014a) ‘Gaming as a training tool to train cognitive
skills in emergency care: how effective is it?’, in Schouten, B. et al. (Eds.): Games for Health,
Proceedings of the 4th Conference, Springer, New York, pp.13–15.
Dankbaar, M.E.W., Stegers-Jager, K.M., Baarveld, F., van Merrienboer, J.J.G., Norman, G.R.,
Rutten, F.L., van Saase, J.L.C.M. … Schuit, S.C.E. (2014b), ‘Assessing the assessment in
emergency care training’, PLoS One, Vol. 9, No. 12, p.e114663.
Dankbaar, M.E.W. (2015) ‘Serious games and blended learning; effects on performance and
motivation in medical education’, Thesis, Erasmus University Rotterdam, the Netherlands
(ch. 4), pp.61–78.
Deci, E.L., Eghrari, H., Patrick, B.C. and Leone, D. (1994) ‘Facilitating internalization: the self-
determination theory perspective', Journal of Personality, Vol. 62, pp.119–142.
Egenfeldt-Nielsen, S. (2006), ‘Overview of research on the educational use of video games’,
Digital Kompetanse, Vol. 1, No. 3, pp.184–213.
Garris, R., Ahlers, R. and Driskell, J.E. (2002) ‘Games, motivation and learning: a research and
practice model’, Simulation and Gaming: An Interdisciplinary Journal, Vol. 33, pp.441–467.
Gee, J.P. (2003) What Videogames Have To Teach Us About Learning And Literacy, Palgrave
Macmillan, New York.
Gee, J.P. (2009) ‘Deep learning properties of good digital games. How far can they go?’, In
Ritterfeld, U., Cody, M. and Vorderer, P. (Eds.): Serious Games: Mechanisms and Effects,
Routledge, New York/London, pp.68–82.
Graesser, A.C., Chipman, P., Leeming, F. and Biedenbach, S. (2009) ‘Deep learning and emotion
in serious games’, in Ritterfeld, U., Cody, M. and Vorderer, P. (eds.): Serious Games:
Mechanisms and Effects, Routledge, Taylor and Francis, New York and London, pp.81–100.
Harteveld, C. (2012), Making Sense of Virtual Risks: A Quasi-Experimental Investigation into
Game-Based Training, IOS Press, Amsterdam.
Hockey, G.R. (1997) ‘Compensatory control in the regulation of human performance under stress
and high workload: a cognitive-energetical framework’, Biological Psychology, Vol. 45,
pp.73–93.
Holton, E.F. (1996) ‘The flawed four-level evaluation model’, Human Resource Development
Quarterly, Vol. 7, pp.5–21.
Kirkpatrick, D.L. (1976) ‘Evaluation of training’, in Craig, R.L. (Ed.): Training and
Development Handbook: A Guide to Human Resource Development, McGraw Hill,
New York, pp.301–319.
Kirkpatrick, D.L. (1998) Evaluating Training Programs: The Four Levels, 2nd ed., Berrett-Koehler,
San Francisco.
Korteling, J.E., Helsdingen, A.S. and Theunissen, N.C.M. (2012) ‘Serious games @ work: learning
job-related competences using serious gaming’, in Bakker, A. and Derks, D. (Eds.): The
Psychology of Digital Media at Work, Psychology Press Ltd, Taylor and Francis Group,
pp.123–144.
Korteling, J.E., Oprins, E.A.P.B. and Venrooij, W. (2014) Evaluatie van leerinterventies en
teamfunctioneren in dynamische teams [Evaluation of learning interventions and team
functioning in dynamic teams] (TNO Report R10243), TNO, Soesterberg.
Kraiger, K. (2002) ‘Decision-based evaluation’, in Kraiger, K. (Ed.): Creating, Implementing and
Managing Effective Training and Development, Jossey-Bass, San Francisco, CA, pp.331–375.
Kraiger, K., Ford, J.K. and Salas, E. (1993) ‘Application of cognitive, skill-based, and affective
theories of learning outcomes to new methods of training evaluation’, Journal of Applied
Psychology, Vol. 78, No. 2, pp.311–328.
Leemkuil, H., de Jong, T. and Ootes, S. (2000) Review of Educational Use of Games and
Simulations (IST-1999-13078 Deliverable D1), University of Twente, Enschede.
Malone, T.W. (1981) ‘Towards a theory of intrinsically motivating instruction’, Cognitive Science,
Vol. 4, pp.333–369.
Mayer, I. (2012) ‘Towards a comprehensive methodology for the research and evaluation of serious
games’, Procedia Computer Science, Vol. 15, pp.233–247.
McMahan, A. (2003) ‘Immersion, engagement, and presence: a method for analyzing 3-D video
games’, in Wolf, M.J.P. and Perron, B. (Eds.): The Video Game Theory Reader, Routledge,
Taylor and Francis Group, New York, pp.77–78.
Oprins, E. and Korteling, H. (2014) ‘Transfer of gaming: effectiveness of a cashier trainer’, in Cai,
Y.Y. and Goei, S.L. (Eds.): Serious Games, Simulation and Their Applications, Springer,
Science and Business Media, Singapore, pp.227–253.
Pavlas, D. (2010) A Model of Flow and Play in Game-Based Learning: The Impact of Game
Characteristics, Player Traits, and Player States, Unpublished PhD thesis, University of
Central Florida.
Pintrich, P.R. and de Groot, E.V. (1990) ‘Motivational and self-regulated learning components of
classroom academic performance’, Journal of Educational Psychology, Vol. 82, No. 1,
pp.33–40.
Reeve, J., Jang, H., Carrell, D., Jeon, S. and Barch, J. (2004) ‘Enhancing students’ engagement by
increasing teachers’ autonomy support’, Motivation and Emotion, Vol. 28, No. 2, pp.147–169.
Richardson, M., Abraham, C. and Bond, R. (2012) ‘Psychological correlates of university students’
academic performance: a systematic review and meta-analysis’, Psychological Bulletin,
Vol. 138, No. 2, pp.353–387.
Roe, R.A. (2005) ‘The design of selection systems: contexts, principles, issues’, in Evers, A., Smit,
O. and Anderson, N. (Eds.): Handbook of Personnel Selection, Blackwell, Oxford, pp.73–97.
Ryan, R.M. and Deci, E.L. (2000) ‘Intrinsic and extrinsic motivations: classic definitions and new
directions’, Contemporary Educational Psychology, Vol. 25, No. 1, pp.54–67.
Salas, E., Milham, L.M. and Bowers, C.A. (2003) ‘Training evaluation in the military:
misconceptions, opportunities, and challenges’, Military Psychology, Vol. 15, pp.3–16.
Shaffer, D.W. (2006) How Computer Games Help People Learn, Palgrave Macmillan, New York.
Sitzmann, T. (2011) ‘A meta-analytic examination of the instructional effectiveness of computer-
based simulation games’, Personnel Psychology, Vol. 64, pp.489–528.
Squire, K.D. (2011) Video Games and Learning: Teaching and Participatory Culture in the
Digital Age, Teachers College Press, New York.
Stubbé, H.M. and Theunissen, N.C.M. (2008) ‘Self-directed learning in a ubiquitous learning
environment: a meta-review’, Proceedings of Special Track on Technology Support for Self-
Organised Learners, Vol. 2008, pp.5–28.
Sweetser, P. and Wyeth, P. (2005) ‘GameFlow: a model for evaluating player enjoyment in
games’, Computers in Entertainment, Vol. 3, No. 3, pp.1–24.
Tannenbaum, S., Cannon-Bowers, J., Salas, E. and Mathieu, J. (1993) Factors That Influence
Training Effectiveness: A Conceptual Model And Longitudinal Analysis (Tech. Rep. No. 93-
011), Naval Training Systems Center, Human Systems Integration Division, Orlando.
Tobias, S. and Fletcher, J.D. (2007) ‘What research has to say about designing computer games for
learning’, Educational Technology, Vol. 47, pp.20–29.
Trooster, W., Goei, S.L., Ticheloven, A., Oprins, E., Visschedijk, G., Corbalan, G. and
van Schaik, M. (2014) ‘The effectiveness of LINGO online, a serious game for English
pronunciation’, Report, Windesheim University of Applied Sciences, Zwolle.
Vogel, J., Vogel, D.S., Cannon-Bowers, J., Bowers, C.A., Muse, K. and Wright, M. (2006)
‘Computer gaming and interactive simulations for learning: a meta-analysis’, Journal of
Educational Computing Research, Vol. 34, pp.229–243.
Wang, H., Shen, C. and Ritterfeld, U. (2009) ‘Enjoyment of digital games: what makes them
‘seriously’ fun?’, in Ritterfeld, U., Cody, M. and Vorderer, P. (Eds.): Serious Games:
Mechanisms and Effects, Routledge, New York/London, pp.25–47.
Zijlstra, F.R.H. (1993) Efficiency in Work Behaviour: A Design Approach for Modern Tools,
Unpublished PhD Thesis, Delft University of Technology, Delft.