Critically Reviewing GraphoGame Across the World: Recommendations and Cautions for Research and Implementation of Computer-Assisted Instruction for Word-Reading Acquisition

Article in Reading Research Quarterly · May 2019
DOI: 10.1002/rrq.256



Erin M. McTigue
Oddny Judith Solheim
University of Stavanger, Norway

Wendi K. Zimmer
Texas A&M University, College Station, USA

Per Henning Uppstad
University of Stavanger, Norway

ABSTRACT
Overall, game-based technology for early reading instruction has not robustly met the learning potentials of young readers. To better understand the effects and limitations of computer-assisted instruction in classrooms, researchers have called for more critical attention to learning theory, methodological selection, and context for learning. GraphoGame (GG), an adaptive serious game designed to prevent reading difficulties through the promotion of sound–symbol connections, has been implemented in over 20 countries. Therefore, the GG research base provides an opportunity to synthesize research on a single computer-assisted instruction across diverse contexts. Surprisingly, despite extensive use and further implementation plans, no review has yet synthesized GG's effects. Specifically, this systematic literature review, with an embedded meta-analysis, synthesized 28 empirical studies for theory, methodological quality, and outcomes. The GG research base was dominated by theories of reading disabilities and psycholinguistics. Methodologically, quantitative methods, focusing on phonological and decoding outcome measures only, were most common. The meta-analysis (n = 19), measuring GG's impact on word-reading outcomes, did not yield an overall meaningful effect size (g = −0.02). However, among moderators (language complexity, duration of intervention, and adult interaction), adult interaction was significant, favoring implementation contexts with high levels of adult interaction. Specifically, studies with high adult interaction produced an average positive effect size (g = 0.48), which suggests implications for classroom use. Regarding future research, the authors recommend stronger clarity of theory, attention to learning context, and a more purposeful collection of process data, which can be obtained through greater plurality of methodology.

Reading Research Quarterly, 55(1), pp. 45–73 | doi:10.1002/rrq.256
© 2019 International Literacy Association.

Despite great potential, the effect of educational technology on classroom learning has been overpromised, both in general (e.g., Eng, 2005; Lawless, 2016; Livingstone, 2012; Young et al., 2012) and in reading development specifically (Baye, Inns, Lake, & Slavin, 2018; Cheung & Slavin, 2013; Dynarski et al., 2007; Slavin, Lake, Davis, & Madden, 2011; Wood, Underwood, & Avis, 1999). Yet, simultaneously, encouraged by technological advances, policymakers and educators optimistically invest increasingly larger percentages of funds and students' time into technology supports, particularly for reading (Livingstone, 2012). Determining why there is often a mismatch between the potential and demonstrated value of educational technology for reading instruction has become increasingly urgent.

The answer is undoubtedly complex. Previous explanations include inadequate theoretical grounding (Gobet & Wood, 1999), limitations of research design (Amiel & Reeves, 2008; Eng, 2005), bias from evaluating technology under highly specific conditions (Ma, Adesope, Nesbit, & Liu, 2014), learner variables such as motivation (Young et al., 2012), and implementation (Lawless, 2016; Wood et al., 1999). Focusing on the initial stages of reading development, we work to untangle these issues and inquire, How can we better harness the power of educational technology for the promotion of word reading?

To date, the majority of research syntheses examining educational technology (e.g., Clark, Tanner-Smith, & Killingsworth, 2016; Ma et al., 2014; Parr & Fung, 2000; Vogel et al., 2006; Wood et al., 1999; Wouters, van Nimwegen, van Oostendorp, & van der Spek, 2013) have considered multiple technologies applied across disciplines and developmental levels, thus rendering it difficult to consider how technology interacts specifically with the content and the users (Livingstone, 2012). To better isolate such variables, we approached this challenge from an alternate direction: We synthesized research from one highly regarded educational technology platform for word-reading acquisition, GraphoGame (GG). The notion of word reading applied in the present work is consistent with Gough and Tunmer's (1986) definition of decoding skills, encompassing both word recognition and the use of letter–sound correspondence rules.

GG, developed for the goal of dyslexia prevention (Lyytinen, Erskine, Kujala, Ojanen, & Richardson, 2009), was designed for students at risk for reading disabilities. GG also represents a common manner in which technology is integrated into decoding instruction (Lai, Chang, & Ye, 2006; Lawless, 2016): individual games focused on basic skill attainment and containing much repetition (i.e., skill and drill). Furthermore, GG has a global focus, supported by an international network, with the goal of providing "technology-enhanced support as widely as possible to all learners globally, however, with a special emphasis on countries where access to literacy education is limited" (GraphoLearn, n.d.b, para. 3). Accordingly, the GG network has partnered with UNESCO and international nongovernmental organizations (Agora Center, n.d.) and has been studied in over 20 countries across Africa, Asia, Europe, North America, and South America. Findings have been published in top journals in the field (e.g., Reading Research Quarterly, Scientific Studies of Reading). Such a research base provides an opportunity to consider contextual factors (language, implementation, and duration) of a widely distributed game.

GG's Origin and Scope
In conjunction with the Jyväskylä Longitudinal Study of Dyslexia (e.g., Lyytinen et al., 2006), GG, an internet-based learning platform of serious games, was developed at the University of Jyväskylä in Finland in collaboration with the Niilo Mäki Institute. Although originally conceived as a dynamic assessment, a multidisciplinary team further developed it as an intervention. The game focuses on connections between spoken and written language, at increasingly larger units, while adapting to learner performance and providing specific feedback. To create versions for individual languages or levels, GG can vary on multiple dimensions: the basic content (i.e., language), sequence of content, size of lexical units, adaptability algorithms (fixed or general adaption levels and level of challenge), and number of learning items. These adaptions limit exact comparisons across studies because GG represents a learning platform and an instructional approach, not a single, immutable game.

Beyond geography, the differentiation possibilities have prompted individual studies of diverse, at-risk learners (see Table 1 for details). Researchers have explored GG's effect on L2 readers (e.g., Oksanen, 2010; Patel, 2018), bilingual students reading in two languages (Baker et al., 2017), and students from low socioeconomic backgrounds (Rosas, Escobar, Ramírez, Meneses, & Guajardo, 2017). Researchers have also considered effects for students with cognitive challenges, including rapid automatized naming (Heikkilä, Aro, Närhi, Westerholm, & Ahonen, 2013) and short-term memory (Hintikka, Aro, & Lyytinen, 2005). Furthermore, Nakeva von Mentzer and colleagues (2014) studied how GG contributes to deaf and hard of hearing students' reading development.

GG's Theoretical Grounding
As described previously, the inspiration for GG derived from the Jyväskylä Longitudinal Study of Dyslexia rather than theory. However, the simple view of reading (SVR) model (Gough & Tunmer, 1986) informed the dyslexia study (e.g., Torppa et al., 2016) and therefore frames GG. The SVR conceptualizes that reading comprehension is the product of decoding and linguistic comprehension and, thereby, positions GG as a tool to support efficient decoding skills. The SVR lens also forms an implicit rationale for the implementation of GG into world orthographies, with detailed attention to linguistic differences but minimal attention to cultural



TABLE 1
Descriptive Summary of Main Findings From Reviewed Studies, Presented by Year of Publication
Columns: Study | Sample (N; Description; Language and country) | Intensity (Duration of GG and location; Session length) | Research goal(s) (Focus) | Main findings (Positive findings related to GG; Null or negative findings related to GG; Conclusions)
Hintikka, Aro, and Lyytinen (2005)a
N: 44
Description: μ = 7.25 years, grade 1, poor readers
Language and country: Finnish, Finland
Duration of GG and location: 6 weeks (170 minutes), school
Session length: 15–20 minutes
Focus: To quantify if training in GG (letters, syllables, and words) will help poor students improve grapheme–phoneme knowledge and conversion and if this will translate to reading skills
Positive findings related to GG: Some evidence that subgroups differentially benefited from GG: attention deficit disorder, short-term memory problems, and low letter naming
Null or negative findings related to GG: Overall, poor readers in the GG group did not outperform the nontraining control group in letter knowledge measures or reading measures.
Conclusions: Overall, GG was not more effective than classroom instruction. However, findings suggest that subgroups may benefit from GG.

Hintikka, Landerl, Aro, and Lyytinen (2008)a
N: 39
Description: μ = 8.4 years, grades 2 and 3, poor readers
Language and country: German, Austria
Duration of GG and location: 6 days in a row (90–120 minutes), school
Session length: 15–20 minutes
Focus: To compare different types of GG training (phonological–orthographic association group, read-aloud group, and combined group) both to one another and to a nontraining control group; learning measured for both trained and transfer items
Positive findings related to GG: GG students read trained syllables and words with trained syllables faster than students without GG training.
Null or negative findings related to GG: The gains from GG did not transfer to pseudowords with trained syllables, pseudowords, or high-frequency words. All GG groups performed similarly.
Conclusions: Short GG training on syllables increased reading speed on target sublexical patterns, but there was minimal evidence of transfer to word reading.

Huemer, Landerl, Aro, and Lyytinen (2008)a
N: 39
Description: μ = 9.3 years, grade 2, poor readers; μ = 11.8 years, grade 4, poor readers
Language and country: German, Austria
Duration of GG and location: 25 sessions in 6 weeks (375 minutes), school
Session length: 15 minutes
Focus: To measure if GG training in sublexical patterns will transfer to word reading; to compare effects of GG training and a paired reading intervention for reading outcomes, while considering subgroups
Positive findings related to GG: GG students made greater growth in trained syllables. Paired reading students made greater gains in global word-reading fluency.
Null or negative findings related to GG: GG students and paired reading students did not differ in reading words with GG-trained items. Neither group improved with pseudowords.
Conclusions: Training in sublexical syllables did not benefit reading words with target syllables. Paired reading had stronger effects on connected text reading.

Brem et al. (2010)a
N: 32
Description: 6 years, kindergarten, nonreaders
Language and country: German, Switzerland
Duration of GG and location: 8 weeks (216 minutes), home
Session length: Varied
Focus: To observe, using ERP and fMRI, how the visual word form system is activated via print processing; print processing was stimulated through GG; control group did GG-Math
Positive findings related to GG: GG students showed growth in word reading and literacy skills, compared with the control (math) group.
Null or negative findings related to GG: After GG, students' word-reading skills remained basic; only three of 32 students could decode >10 words.
Conclusions: GG improved grapheme–phoneme correspondence for nonreaders and sensitized the visual word form system.

Oksanen (2010)
N: 8
Description: 6–8 years, grades 1 and 2, L1 = Russian, L2 = Finnish
Language and country: Finnish, Finland
Duration of GG and location: 3 weeks (μ = 210 minutes total when outlier removed), home
Session length: Varied
Focus: Using case studies, considered the manner in which L2 speakers of Finnish benefit from GG; used process data to explore the manner that a child's language environment affects results
Positive findings related to GG: Seven out of eight L2 learners made progress in letter sounds. Patterns of learning had much variability, not accounted for by years in Finland.
Null or negative findings related to GG: Only two students showed growth at the level of syllable recognition. Many students exhibited lack of motivation for playing GG at home.
Conclusions: Playing GG at home may provide benefit to Finnish L2 learners for learning letter sounds. Analysis of errors showed interference from L1.

Saine, Lerkkanen, Ahonen, Tolvanen, and Lyytinen (2010)a,b
N: 166
Description: 7 years, grade 1, at risk (longitudinal)
Language and country: Finnish, Finland
Duration of GG and location: 4 sessions per week in 28 weeks, school
Session length: 15 minutes (within a 45-minute lesson)
Focus: Within a small-group reading intervention, students did 15 minutes of word building (treated control group) or 15 minutes of GG; assessed the interventions' effect for different profiles of at-risk students
Positive findings related to GG: GG students outperformed control students in reading fluency. GG students caught up with not at-risk students in fluency in grade 2.
Null or negative findings related to GG: At-risk students, in either the GG or the control group, made slower progress toward fluency than normally developing students.
Conclusions: Results provide evidence for GG as an intervention for promoting word-reading fluency in Finnish for at-risk readers.

Saine, Lerkkanen, Ahonen, Tolvanen, and Lyytinen (2011)b
N: 166
Description: 7 years, grade 1, at risk (longitudinal)
Language and country: Finnish, Finland
Duration of GG and location: 4 sessions per week in 28 weeks, school
Session length: 15 minutes (within a 45-minute lesson)
Focus: To consider the long-term effects of intervention (including GG) on multiple aspects of reading (Note: Same sample as Saine et al., 2010, 2013)
Positive findings related to GG: GG students had greater growth than control students in letter knowledge, spelling, word fluency, and accuracy.
Null or negative findings related to GG: None reported
Conclusions: Results indicate that GG for at-risk students in grade 1 improves literacy outcomes.

Lovio, Halttunen, Lyytinen, Näätänen, and Kujala (2012)
N: 30
Description: μ = 6.56 years, kindergarten, nonreaders, some at risk for dyslexia
Language and country: Finnish, Finland
Duration of GG and location: 3 weeks (180 minutes total), school
Session length: 5–20 minutes
Focus: To determine if GG can have an effect on reading skills and central auditory processing for 6-year-old nonreaders with familial risk for dyslexia
Positive findings related to GG: The intervention group performed better than the math treatment control group in phonological processing, writing words, and writing nonwords.
Null or negative findings related to GG: Group differences were not found in letter knowledge, letter recognition, or reading syllables and nonwords.
Conclusions: Reading-related skills can be improved with even a short intervention (three hours). Training effects are reflected in brain activity.

Bach, Richardson, Brandeis, Martin, and Brem (2013)
N: 19
Description: 6 years, kindergarten, nonreaders
Language and country: German, Switzerland
Duration of GG and location: 8 weeks, home
Session length: Varied
Focus: To assess nonreaders (before and after GG) with reading behavior, ERP, and fMRI data to build prediction models of later reading skills (grade 2) and risk status
Positive findings related to GG: Reading behavior, ERP, and fMRI data predicted reading skills in grade 2 above reading behavior data alone.
Null or negative findings related to GG: Phonological awareness at age 5 did not improve prediction of poor readers at age 7.
Conclusions: There is a potential for combining neuroimaging and reading behavior data for early prediction risk.

Bhide, Power, and Goswami (2013)a
N: 19
Description: 6 and 7 years, poor readers
Language and country: English, UK
Duration of GG and location: 19 sessions in 8 weeks, school
Session length: 25 minutes
Focus: To compare the effects of GG Rime (English) with a rhythmic musical intervention on literacy outcomes
Positive findings related to GG: Pre–post gains were 2.03 ES in word reading, 1.28 ES in pseudowords, and 1.40 ES in spelling.
Null or negative findings related to GG: No difference in reading growth was found between two intervention groups
Conclusions: Both GG Rime and the rhythmic musical intervention benefited struggling readers of English.

Heikkilä, Aro, Närhi, Westerholm, and Ahonen (2013)a
N: 150
Description: μ = 9.2 years, grades 2 and 3, poor readers
Language and country: Finnish, Finland
Duration of GG and location: 10 sessions in 2–3 weeks, school
Session length: 5–10 minutes
Focus: To explore if syllable-reading speed can be improved via practice with GG, as compared with a nontraining control; to find out if improvement depends on the type of practiced syllable (e.g., length, frequency)
Positive findings related to GG: With limited GG play, students' reading speed on practiced syllables increased compared with that of the nontraining control group
Null or negative findings related to GG: Syllable training did not transfer among syllable types, pseudowords with trained syllables, or connected texts with practiced syllables.
Conclusions: Training in syllable fluency was effective, even in readers with low RAN. Transfer to reading was not effective.

Kyle, Kujala, Richardson, Lyytinen, and Goswami (2013)a
N: 31
Description: 6 and 7 years, grade 2, poor readers
Language and country: English, UK
Duration of GG and location: 12 weeks (daily, ≈11 hours total), school
Session length: 10–15 minutes
Focus: To compare the efficacy of two theoretically motivated versions of GG for English (GG Rime and GG Phoneme) with each other and with a control
Positive findings related to GG: Compared with the control group, GG Rime had a meaningful effect on spelling, decoding, deletion, and rhyme, and GG Phoneme had a meaningful effect on deletion.
Null or negative findings related to GG: In follow-up tests, neither GG group maintained advantage on sight words.
Conclusions: Both versions of GG (for English) had effects on students' literacy. GG Rime showed more improvement at posttest and follow-up.

Saine, Lerkkanen, Ahonen, Tolvanen, and Lyytinen (2013)b
N: 166
Description: 7 years, grade 1, at-risk readers
Language and country: Finnish, Finland
Duration of GG and location: 4 sessions per week in 28 weeks, school
Session length: 15 minutes (within a 45-minute lesson)
Focus: To build a predictive model of spelling for students whose reading acquisition was supported differently (GG or control) (Note: Same sample as Saine et al., 2010, 2011)
Positive findings related to GG: GG students outperformed control and mainstream students in spelling performance.
Null or negative findings related to GG: None reported
Conclusions: GG was effective in supporting spelling development for at-risk students.

Jere-Folotiya et al. (2014)
N: 312
Description: 5–9 years, grade 1, all levels
Language and country: ciNyanja, Zambia
Duration of GG and location: Variable (3 phases, μ = 94 minutes)
Session length: 7–9 minutes
Focus: To compare several methods of providing GG to first-grade students: cell phone vs. computer format, and student access vs. teacher access vs. both (no true control group)
Positive findings related to GG: Students playing GG showed improvement in spelling (decoding) outcome measures.
Null or negative findings related to GG: Students playing GG did not show improvement in orthographic or emergent literacy measures.
Conclusions: The teachers + students condition was most effective for implementing GG.

Kamykowska, Haman, Latvala, Richardson, and Lyytinen (2014)a
N: 48
Description: 6.3–7.4 years, grade 1, low letter knowledge
Language and country: Polish, Poland
Duration of GG and location: 8 weeks (group 1 = 57.36 minutes total), school
Session length: 15 minutes
Focus: To assess the efficacy of GG for improving literacy skills for children with low letter-naming scores, as compared with matched students playing GG-Math game (crossover design); included a reference, skilled reader group
Positive findings related to GG: All students made growth in letter knowledge, word-reading speed, and pseudoword-reading speed.
Null or negative findings related to GG: Students in GG and GG-Math made equal progress on letter knowledge, word-reading speed, and pseudoword-reading speed.
Conclusions: GG was ineffective on reading skills. Intervention students made growth, but the gap with the reference group remained.

Nakeva von Mentzer et al. (2014)
N: 48
Description: 5–7 years, DHH and NH children
Language and country: Swedish, Sweden
Duration of GG and location: 4 weeks (μ = 202 minutes total), home
Session length: 10 minutes
Focus: To examine the potential of GG for DHH children in Sweden using cochlear implants and/or hearing aids; 16 NH children served as a reference group
Positive findings related to GG: All children improved in decoding and passage comprehension.
Null or negative findings related to GG: For DHH children, there was no improvement in nonword decoding (e.g., pseudowords).
Conclusions: The intervention indicated growth for both DHH and NH students, although areas of growth varied between groups.

Ronimus, Kujala, Tolvanen, and Lyytinen (2014)
N: 138
Description: 6–10 years, grades 1 and 2
Language and country: Finnish, Finland
Duration of GG and location: 2 sessions a week for 8 weeks, home
Session length: 10–15 minutes
Focus: To explore how certain features of GG (e.g., level of challenge, novel reward system, fantasy) can impact children's engagement during play in the home
Positive findings related to GG: Children self-reported enjoying GG. The in-game reward session encouraged initial play.
Null or negative findings related to GG: Children played the game less than expected. The level of challenge did not affect children's engagement. The impact of the reward system was short term.
Conclusions: Internal features of the game had little impact on students' motivation to play GG at home.

Ecochard (2015)
N: 9
Description: Grade 1, poor readers
Language and country: Spanish, Peru
Duration of GG and location: 8 weeks (daily, 480 minutes total), after school
Session length: 20 minutes
Focus: To understand, via qualitative methods, the use of GG in a unique context (rural Peru); to see how children use GG for learning; the teacher's role in a GG intervention
Positive findings related to GG: Adults played an important role in supporting motivation and providing scaffolding for learning with GG.
Null or negative findings related to GG: Children did not play GG for mastery learning goals, as intended, but instead for fun. Initial interest diminished quickly.
Conclusions: Cultural and situational context must be considered for implementing GG, particularly for sustaining student engagement.

Koikkalainen (2015)
N: 203
Description: 7 and 8 years, grade 2, all levels
Language and country: Finnish, Finland
Duration of GG and location: N/A (used for assessment purpose), school
Session length: 120–150 min
Focus: To measure how computerized reading assessment methods, via GG, compare with traditional pen-and-paper measures for identifying at-risk students
Positive findings related to GG: Computerized and pen-and-paper tests correlated. Risk predictors: word recognition, sentence fluency, and RAN.
Null or negative findings related to GG: Results of predictive variables may be language dependent. Finnish has a highly transparent, regular orthography.
Conclusions: Provides evidence for using computerized measures in the game environment for assessment

Ronimus and Lyytinen (2015)
N: 194
Description: 6–10 years, grades 1 and 2
Language and country: Finnish, Finland
Duration of GG and location: 2 sessions a week for 8 weeks, school and home
Session length: 10–15 minutes
Focus: To determine whether the school or home environment is better for GG play, as related to frequency, duration, engagement, level, and adult involvement
Positive findings related to GG: Students played GG for more sessions and more minutes at school than home. Teachers were more involved than parents during GG.
Null or negative findings related to GG: Children were not highly motivated to play GG at home. Parental control was negatively related to engagement.
Conclusions: GG was more effective at school than at home related to engagement, frequency, and adult involvement.

Mourgues et al. (2016)
N: 110
Description: 8–18.5 years, grades 3–7, all levels
Language and country: Chi-tonga and English, Zambia
Duration of GG and location: N/A (GG adapted for use as dynamic assessment), school
Session length: Not reported
Focus: To examine the contribution of visual–verbal paired-associate learning tasks for predicting reading; two paired-associate learning tasks were administered: BBG (GG was adapted to be a dynamic assessment) and foreign-language learning task
Positive findings related to GG: BBG was a statistically significant predictor for both word and pseudoword reading.
Null or negative findings related to GG: In BBG, many students performed close to chance, indicating guessing or a need for additional practice with the format.
Conclusions: Further development of BBG may differentiate low-performing students from those who have specific reading disabilities.

Baker et al. (2017)a
N: 78
Description: 6 and 7 years, grade 1, bilingual Spanish–English, low-SES backgrounds
Language and country: Spanish and English, USA
Duration of GG and location: 80 sessions in 16 weeks, school
Session length: 10 minutes
Focus: To increase decoding and fluency in both Spanish and English for bilingual children playing GG Spanish, compared with business-as-usual instruction (nontraining control group); to explore linguistic transfer effects
Positive findings related to GG: Pre–post gains were 0.30 ES for Spanish fluency. A subset of students in GG Spanish improved in English decoding.
Null or negative findings related to GG: When compared with control students, GG students showed no overall improvement in Spanish decoding and fluency or English decoding and fluency.
Conclusions: GG promoted reading growth in at-risk bilingual students but not more than business-as-usual students.

Rosas, Escobar, Ramírez, Meneses, and Guajardo (2017)a
N: 87
Description: Grade 1, low- and high-SES backgrounds
Language and country: Spanish, Chile
Duration of GG and location: 12 weeks (360 minutes total), school
Session length: 30 minutes
Focus: To consider the impact of GG on literacy learning for Spanish speakers from both high- and low-SES backgrounds as compared with control
Positive findings related to GG: For students from high-SES backgrounds, GG had an effect on RAN. For students from low-SES backgrounds, GG had an effect on letter sounds.
Null or negative findings related to GG: GG did not have an impact on word reading or phonological awareness.
Conclusions: Provides partial support for GG because of the effect on sublexical skills, but no evidence of transfer to reading

Borleffs et al. (2018)
N: 69
Description: 6.2 years, grade 1, 6 weeks of formal instruction
Language and country: Standard Indonesian, Singapore
Duration of GG and location: 8.9 sessions average (1.86 hours average), school
Session length: 12 minutes
Focus: To develop and pilot GG for Standard Indonesian for first-grade students (no control group)
Positive findings related to GG: For students with low phonological awareness, greater exposure to the game was associated with gains in decoding.
Null or negative findings related to GG: There was no control group to isolate the effects of the game.
Conclusions: GG may be effective for readers of Standard Indonesian. Students may only need 60–120 minutes of the game for effects.

Carvalhais, Richardson, and Castro (2018)a
N: 30
Description: Grade 2, at risk for reading failure
Language and country: Portuguese, Portugal
Duration of GG and location: 6 weeks (420 minutes total), school
Session length: 15 minutes
Focus: To assess the efficacy of using GG Fluent Portuguese for second-grade students at risk for reading failure
Positive findings related to GG: Compared with a math intervention, GG students improved in word reading, spelling, and explicit syllable awareness.
Null or negative findings related to GG: GG students did not outperform control students on implicit phonological awareness tests or the global reading measure (cloze test).
Conclusions: Portuguese at-risk readers responded to training in GG for both spelling and word-reading outcomes.

Ngorosho (2018)a
N: 108
Description: Grade 1
Language and country: Kiswahili, Tanzania
Duration of GG and location: 3 weeks, school
Session length: 20–40 minutes
Focus: To establish the efficacy of using GG Kiswahili to assess basic reading and spelling skills; crossover design, so all students had GG exposure and classroom instruction
Positive findings related to GG: Students in all four groups (two interventions at two schools) improved in reading skills after a combination of classroom and GG instruction.
Null or negative findings related to GG: When comparing control and GG students (before switching), GG students did not outperform the traditional classroom instruction.
Conclusions: The most effective intervention had both teachers and students exposed to GG.

Patel (2018)a
N: 30
Description: 7 and 8 years, grade 3, ELs in India
Language and country: ELs, India
Duration of GG and location: 21 days over 8 weeks (470 minutes total), school
Session length: 20–30 minutes
Focus: To determine the effectiveness of the GraphoLearn version (teaching English to nonnative speakers); students assessed in letter knowledge, syllables (rime), reading skills, and spelling
Positive findings related to GG: GraphoLearn students outperformed control students (math) in letter sounds.
Null or negative findings related to GG: GraphoLearn students did not outperform control students in rime, word reading, pseudoword reading, or spelling.
Conclusions: GraphoLearn was implemented with no teacher involvement. Future work should consider greater collaboration with teachers.

Worth, Nelson, Harland, Bernardinelli, and Styles (2018)a
N: 389
Description: Year 2, at-risk readers
Language and country: English, UK
Duration of GG and location: 5 sessions per week for 10–12 weeks, school
Session length: 10–15 minutes
Focus: Efficacy trial and external evaluation for GG Rime intervention that had been previously piloted (see Bhide et al., 2013; Kyle et al., 2013)
Positive findings related to GG: Teachers reported that GG Rime was easy to implement and that children enjoyed the game.
Null or negative findings related to GG: GG Rime did not improve students' reading or spelling test scores when compared with business-as-usual students. The same pattern was found for students from low-SES backgrounds.
Conclusions: GG Rime is no more or less effective than the typical support that students would receive.

Note. BBG = Bala Bbala Graphogame; DHH = deaf and hard of hearing; EL = English learner; ERP = event-related potential; ES = effect size; fMRI = functional magnetic resonance imaging; GG = GraphoGame; L1 = first language; L2 = second language; N/A = not applicable; NH = normal hearing; RAN = rapid automatized naming; SES = socioeconomic status.
aIncluded in the meta-analysis. bThese three studies analyzed the same sample of students, so only one was included in the meta-analysis.
context; SVR describes reading as principally a cognitive act by an individual rather than positioning reading more broadly as a cultural and communication practice that is bestowed value by a group.

Similarly, Lyytinen and Richardson (2014) documented three additional theories informing GG: Ehri's (2005) theory of word reading, Ziegler and Goswami's (2005) psycholinguistic grain size theory, and Katz and Frost's (1992) orthographic depth hypothesis, also informing content. These additional theories were noted to supplement and specify the SVR framing as described by Gough and Tunmer's (1986) definition of decoding skills, although the exact theoretical connections must be inferred. Specifically, Ehri's theory of word-reading development can provide justification for sublexical instruction, whereas Ziegler and Goswami's psycholinguistic grain size theory guides the sequence from phonemes to syllables to words. Katz and Frost's orthographic depth hypothesis likely informs adaptation across languages with varying levels of complexity. In total, with the exception of self-determination theory to promote intrinsic motivation (subsequently described), the theoretical base informs the scope and sequence of the game. Curiously, for a tool positioned to address reading difficulties through technology, the theoretical focus contains a notable lack of sociocultural theories or technology-learning theories.

FIGURE 1
Sample Screenshot From GraphoGame
Note. In this activity, the player hears a phoneme and clicks on the corresponding letter. If the choice is correct, the character goes up the ladder. The player receives immediate, specific feedback because after making a selection, there is an affirmative or corrective sound, and the correct letter choice is framed in green and any incorrect choice framed in red. The color figure can be viewed in the online version of this article at http://ila.onlinelibrary.wiley.com.

item mastery, ensuring efficient progress on easy items and additional opportunities on harder items (Ojanen et al., 2015).

GG’s Approach and Structure Defining Technology Constructs


Despite variability, GG versions are consistent in the Before proceeding further, we place GG within the con-
overall approach. GG relies on the foundation that when text of similar resources. Most broadly, educational
readers fail to achieve automatic phoneme–grapheme technology is a variety of electronic tools that support
correspondence, they will be impaired in decoding the learning process (Cheung & Slavin, 2013), and those
speed, thus creating a bottleneck for learning to read that directly support instruction are often described as
(Lyytinen et  al., 2009). Essential for preventing bottle- computer-­assisted instruction (CAI). Computer games
necks in transparent orthographies, each game explicitly are a further subset and, despite their name, can be
directs attention to grapheme and phoneme links and played on many devices (smartphones, tablets, and
encourages speed (Ronimus & Richardson, 2014). Then, desktop computers). Fundamentally, computer games
relevant for opaque orthographies, players continue to have goals and interactive elements, are rewarding
make rapid connections but at increasingly larger units (Vogel et al., 2006), and incorporate increasing levels of
(blends, rimes, and words; Ziegler & Goswami, 2005). challenge (Ronimus, Kujala, Tolvanen, & Lyytinen,
Regarding player experience, GG has a visually sim- 2014). The subtype played in school settings is serious
ple animation style (see Figure 1), with only a few writ- games (Wouters et al., 2013), because the goal is learn-
ten elements at any one time. The player matches these ing not entertainment. Additionally, although CAI can
elements with short segments of audio, in both untimed be a core reading program, computer games usually
and time-­restricted formats (Richardson & Lyytinen, supplement a curriculum through individualized adap-
2014). When errors occur, the player is guided to the tive practice (Cheung & Slavin, 2012). Finally, computer
correct match. Finally, after a short sequence, the player games including systems to record individual students’
is rewarded with game tokens and virtual stickers progress are called integrated learning systems (Parr &
(Richardson & Lyytinen, 2014). Informed by self-­ Fung, 2000). Using these definitions, GG is a serious
determination theory (Ryan & Deci, 2000), to build con- game within an integrated learning system and typically
fidence, students move at their unique pace, based on used as a supplemental intervention.

54 | Reading Research Quarterly, 55(1)


Previous Synthesis Regarding Learning Effects From Technology

Multiple meta-analyses and reviews regarding computer games for learning (Clark et al., 2016; Vogel et al., 2006; Wouters et al., 2013; Young et al., 2012) and for learning with information and communication technologies (Lee, Waxman, Wu, Michko, & Lin, 2013) provide convergent evidence that under specific conditions, technology can significantly enhance students’ learning. We extract common themes that informed this work.

Motivation Interactions
Counter to common wisdom, learners are not always highly motivated by learning games. In their review of 39 studies, Wouters et al. (2013) found that serious games were not considered more motivating than traditional instruction. However, Vogel et al.’s (2006) review of 32 studies found an interaction with autonomy: When players navigated the programs, technology was preferred, but when teachers controlled the game, the advantage was lost. Considering design features, Clark et al.’s (2016) review of 69 studies tested intrinsic integration to capture features added to promote motivation, but found little benefit. A review of 39 studies suggested that the context surrounding the game may be more important for students’ engagement than the actual game itself (Young et al., 2012). For example, in Lee et al.’s (2013) review of 58 studies, when technology interventions allowed for conversations and peer collaboration, students reported more positive affect. In total, findings indicate that student motivation can be achieved but should not be assumed for serious games, thus making motivation a lens through which to interpret GG results.

Integration and Alignment
Regarding technology’s role in classrooms, researchers found the largest learning effects when teachers actively scaffolded students’ learning (Clark et al., 2016; Lee et al., 2013; Young et al., 2012). Wouters et al. (2013) also found stronger effects when the games supplemented other instruction, such as teacher-led lessons. Regarding content type, technology best supported factual and basic skills–learning goals rather than higher order learning (Lee et al., 2013), which is aligned with the learning goals in GG. Integration effects prompted our moderator focus of level of adult interaction.

Time
Findings provide little consensus regarding optimum time. However, consistent with distributed practice, both Clark et al. (2016) and Wouters et al. (2013) found stronger effects when learners played a game across multiple sessions, instead of a single session, which occurred in all GG studies. In contrast, duration (total minutes) was not associated with greater learning (Clark et al., 2016). Despite mixed results, within GG research, authors consistently reported time as a major limitation to their studies; therefore, we explored this as a moderator.

Learning Literacy With Computer Support

Transitioning to literacy specifically, reviews considering literacy learning with technology (Baye et al., 2018; Cheung & Slavin, 2013; Slavin et al., 2011) and large-scale evaluation studies from the United States (Campuzano, Dynarski, Agodini, & Rall, 2009; Dynarski et al., 2007), the United Kingdom (Wood et al., 1999), and Australia (Parr & Fung, 2000) are most connected to our review. However, compared with the more general learning with technology, these provide less optimistic predictions for educational technology and serious games for reading, such as GG.

Small Effects for Readers
Studying low-achieving first graders in the United States, Dynarski et al. (2007) found no overall impact of the CAI interventions on word reading (ES = 0.03). In follow-up work, Campuzano et al. (2009) documented that for low-achieving students, CAI interventions had a negative effect on word reading. Wood et al. (1999) and Parr and Fung (2000) both reported high variability and concluded that their evaluations did not produce convincing evidence for reading gains. Similarly, Slavin et al. (2011), in their review of 96 studies, reported that the educational technology programs for struggling elementary readers had limited impact (ES = 0.09). Perhaps reflecting recent technologies, Cheung and Slavin’s (2013) synthesis of 20 studies regarding educational technology for elementary school reading showed relatively larger but still small effects (ES = 0.14). However, a recent review of reading programs for secondary students (Baye et al., 2018), with 23 programs including technology and 46 not, showed no advantage of technology-enhanced programs (ES = −0.01). These effect sizes provide a range for which to interpret the effects of GG on students’ learning.

Implementation Interactions
Wood et al. (1999) stressed that for meaningful learning, games and systems must be integrated into authentic learning experiences. Teacher involvement was specifically identified for the transfer of learning beyond the game
environment (Parr & Fung, 2000). By quantifying the differential effects of implementation, Cheung and Slavin (2013) provided further evidence: When CAI was used in a small-group integrated approach, CAI produced the strongest effects (ES = 0.32), whereas supplementary CAI models, in which students independently followed self-tutorials, yielded smaller effects (ES = 0.18). In total, analogous to the findings describing general learning effects from technology, the context of the game play and teacher involvement affected learning and prompted us to examine GG when used with varying levels of support (i.e., our adult interaction moderator).

Methodological Concerns
Methodological concerns have been identified as limiting our understanding of CAI in the classroom. In fact, the studies by Dynarski et al. (2007) and Campuzano et al. (2009) were a direct response to such concerns, prompting the U.S. Congress to fund large-scale evaluations. Specific concerns include weak or absent comparison groups, small sample sizes, insufficiently characterized samples, and lack of reliability and validity information about outcome measures (Parr & Fung, 2000; Wood et al., 1999). Slavin and colleagues’ (2011) and Cheung and Slavin’s (2013) reviews controlled for methodological weaknesses by a stringent inclusion criterion; however, such decisions also excluded many studies and favored large-scale quantitative studies. These concerns prompted our use of a qualitative methodological screening tool, as well as a moderator to monitor for control group variability.

The Current Study

In 2000, the National Reading Panel noted technology’s promise (National Institute of Child Health and Human Development, 2000) but left two persistent questions that remain topical:
1. What is the proper role of integration of computers in reading instruction?
2. Under what conditions can they replace or supplement conventional instruction?

Our overarching purpose is making incremental progress toward answering such questions. GG, after a decade of global use in diverse settings, provides a unique opportunity to potentially unravel key variables in supporting word-decoding technologies.

Our purpose is multileveled. We first focus quantitatively on outcomes and variables that may moderate outcomes, and then our qualitative synthesis of studies informs the meta-analytic results. By holding GG constant, we explore the role of context by considering under which conditions learning occurs and, more importantly, when it does not. Our contextual focus further extends the research base because the majority of previous CAI reviews concerning reading considered English-speaking children in developed nations, whereas this review incorporates many languages across both developing and developed nations.

Our next goals are epistemological. Theory has received particularly limited attention in CAI research for reading (Yang, Kuo, Ji, & McTigue, 2018), and in the literacy field, advances in reading theories have been consistently eclipsed by empirical advances (Cain & Parrila, 2014). Our theoretical analysis provides insight into the types of understanding (and potentially bias) of GG research. Finally, for the goal of identifying research limitations, we critically analyze methodologies so future researchers can develop more rigorous approaches.

Two categories of research questions guided this review: outcome (questions 1a and 1b) and epistemological (questions 2a and 2b):
1a. How effective is GG in improving students’ word-reading skills?
1b. Which contextual factors are associated with improved word reading?
2a. To what extent does theory inform GG research and relate to contextual factors?
2b. To what extent do methodological choices impact our understanding of the effectiveness of GG?

Method

Our overarching approach, to allow for a range of methodologies, was a mixed-methods systematic literature review, in which we identified relevant articles, screened for methodological rigor, and qualitatively summarized results in a systematic manner (Khan, Riet, Popay, Nixon, & Kleijnen, 2001). Systematic reviews typically have one of three foci: outcome, theoretical, or methodological (Petticrew & Roberts, 2006); we employed all three. For quantitatively analyzing outcomes, we employed meta-analytic procedures.

We report results according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement (see Moher, Liberati, Tetzlaff, Altman, & The PRISMA Group, 2009). We applied a five-step process: identification of studies, initial screening process via inclusionary criteria, eligibility decision according to methodological quality indicators, descriptive synthesis, and quantitative analysis of studies appropriate for meta-analytic review (see Figure 2). Due to the wide scope of questions, we analyzed each question independently, resulting in a

FIGURE 2
Flow Diagram for Search and Inclusion Criteria for Studies in This Review

Identification
Search features:
• Electronic databases (ERIC, PsycINFO, Linguistics and Language Behavior Abstracts, and ProQuest Education Journals)
• GraphoGame webpage (https://www.graphogame.com/) list
• Reference lists of identified articles reviewed
• Google Scholar’s “cited by” function for identified articles
• Email requests with researchers in the field

Initial screening
408 abstracts screened for GG as the intervention used and duplications
Abstracts excluded (n = 361):
• Not GG (n = 264)
• Duplicates (n = 97)

Inclusion criteria
48 studies screened for inclusion: published January 2005–April 2018, GG implemented as intervention/assessment, empirical, participants in pre-K–12, accessible in English, and full text available
Articles excluded (n = 17):
• Not empirical (n = 8)
• Language of publication (n = 8)
• Full text not available (n = 1)

Eligibility
31 studies evaluated for methodological quality: empirical goal, provides empirical/theoretical evidence for design, methods provide sufficient detail and rigor of design, reliability of data, validity of data, participants/sample well characterized, implementation fidelity, and interpretation consistent with data
Articles excluded (n = 3):
• Lacking detail of methods, measures, and sample (n = 2)
• Rigor of design and implementation fidelity (n = 1)
Studies included in overall, qualitative synthesis (n = 28)

Included
Inclusion criteria for meta-analysis: quantitative data reported, randomized or quasi-experimental design with control, and outcome measures included word reading
Full articles excluded (n = 13):
• Qualitative (n = 2)
• Research design (n = 7)
• Not word reading (n = 2)
• Same word-reading data reported (n = 2)
Studies included in quantitative synthesis (n = 15; 19 independent comparisons)

Note. GG = GraphoGame. Adapted from “Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement,” by D. Moher, A. Liberati, J. Tetzlaff, D.G. Altman, and The PRISMA Group, 2009, PLoS Medicine, 6(7), e1000097, https://doi.org/10.1371/journal.pmed.1000097. Copyright 2009 by Moher et al.
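
The full-text stages of the flow in Figure 2 can be tallied with a short script. This is only an illustrative sketch: the counts are copied from the figure, starting at the 48 studies screened at the full-text level, while the names `STAGES`, `tally`, and `is_chain_consistent` are hypothetical and not part of the review.

```python
# Illustrative tally of the full-text stages reported in Figure 2.
# Counts come from the figure; all names here are hypothetical.

STAGES = [
    ("inclusion criteria", 48, {
        "not empirical": 8,
        "language of publication": 8,
        "full text not available": 1,
    }),
    ("methodological quality", 31, {
        "lacking detail of methods, measures, and sample": 2,
        "rigor of design and implementation fidelity": 1,
    }),
    ("meta-analysis inclusion", 28, {
        "qualitative": 2,
        "research design": 7,
        "not word reading": 2,
        "same word-reading data reported": 2,
    }),
]


def tally(stages):
    """Return (stage name, entering, excluded, remaining) for each stage."""
    rows = []
    for name, entering, exclusions in stages:
        excluded = sum(exclusions.values())
        rows.append((name, entering, excluded, entering - excluded))
    return rows


def is_chain_consistent(stages):
    """Check that each stage's remaining count equals the next stage's entering count."""
    rows = tally(stages)
    return all(rows[i][3] == rows[i + 1][1] for i in range(len(rows) - 1))


if __name__ == "__main__":
    for name, entering, excluded, remaining in tally(STAGES):
        print(f"{name}: {entering} - {excluded} = {remaining}")
    print("chain consistent:", is_chain_consistent(STAGES))
```

Listing the exclusion reasons per stage, rather than only the totals, makes it easy to confirm that the sub-counts add up to each reported exclusion total.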



series of short, yet interrelated, reviews with a common discussion and conclusion.

Search Process and Inclusion Criteria
Regarding database searches (see Figure 2 for details), we began with keyword searches for “GraphoGame” and “Ekapali” (the original Finnish name, meaning first game) in the following databases: ERIC, Google Scholar, Linguistics and Language Behavior Abstracts, ProQuest Education Journals, and PsycINFO. We triangulated our database findings with the publication list maintained on the GG webpage (https://www.graphogame.com/). To reduce potential publication bias, non-peer-reviewed sources were included (e.g., book chapters, research reports, dissertations). After removing duplicates and nonrelevant articles, as an ancestral search, we reviewed the reference lists of each source. Next, to identify more recent sources, we applied the “cited by” function on Google Scholar. Select scholars (N = 5) active in GG research reviewed the list of identified studies (see Figure 2).

An abstract screening, primarily to determine whether GG was the CAI of focus, reduced the initial total from 408 to 48 (see Figure 2). At the full-text level, these 48 were double-coded by two authors for inclusion. The inclusion criteria were that studies used GG as an intervention, were empirical, were published in English, and involved pre-K–12 students (see Figure 2). Five manuscripts were coded collaboratively; the remaining manuscripts were double-coded independently, with 100% agreement, and 17 sources were excluded (see Figure 2).

Methodological Quality Evaluation
Modeled after the systematic review guidelines established by Torgerson (2007), the studies underwent a methodological quality evaluation at the full-text level. Our Methodological Quality Questionnaire (see Appendix) was adapted from Miller, Scott, and McTigue (2018). We added the criterion of fidelity of implementation because it predicts reading intervention success (e.g., Fogarty et al., 2014). All studies (n = 31) were double-coded at the full-text level, and only those that met all eight quality criteria were included. Again, five manuscripts were coded collaboratively, and the remaining 26 studies were double-coded independently, with 100% agreement for inclusion. Three manuscripts were excluded (see Figure 2).

Inclusion Criteria for Meta-Analysis
For further inclusion in the meta-analysis, the studies had to report quantitative data, employ random assignment or matching with appropriate adjustment for pretest differences, include at least one outcome measure of word reading, and have pre- and posttest data on word reading or posttest data only if the researchers used random assignment and had evidence of no initial group differences. Word reading (not prereading skills) was the outcome variable of interest. Previous researchers have advocated only using reading outcomes because prereading outcome measures are often designed in close alignment with interventions, which can overstate effects (e.g., Slavin et al., 2011). We allowed control groups to be either treated or nontreated, and we coded for control type as a moderator. Furthermore, we did not exclude for length of intervention but also coded and tested it as a moderator. Thirteen articles were excluded at this stage (see Figure 2), leaving 17 articles. However, three studies used the same data set. Therefore, so as to not bias our sample, from those three studies, we selected the one with a primary focus on word reading, yielding a total of 15 articles.

Coding and Synthesis Procedure for Narrative Analysis
We initially coded 44 dimensions for each article, including participant descriptions, intervention procedures, outcome measures, theory, and methodological rigor. All articles were double-coded, and discrepancies were discussed until consensus was reached. After the initial coding process, we created an individual matrix for each question. At this stage, using inductive paradigmatic analysis, we described themes through multiple close readings of each study, rather than approaching with an a priori hypothesis (Smeyers & Verhesschen, 2001).

We provide an example to illustrate our process: When analyzing the theoretical frameworks, theories were identified in three manners. First, when reading the introductions, we annotated the occurring theoretical frames. Second, we performed a keyword search on the electronic files for signal words such as theory, model, and grounded. Finally, we checked reference lists for any overlooked theoretical contributions. Next, we coded the purpose of each theory within the research design. Then, through the process of constant comparison, we grouped like theories together. Resulting categories were given tentative names based on underlying similarities. This process continued until all theories could be logically placed in one category, and no single theory fit into two categories.

Coding Procedures for Meta-Analysis
We operationalized word-reading outcomes as word lists (accuracy), word lists (speed and accuracy), pseudoword lists (accuracy), and short passages scored for accuracy. However, in situations with multiple outcome measure types (e.g., pseudoword and word reading), we
selected the outcome closest to authentic reading (i.e., word reading). In studies containing more than one measure that equally captured word reading (e.g., Test of Word Reading Efficiency, British Ability Scales second edition), both measures were averaged. We included outcome measures that were aligned with the intervention if they required learning transfer to word reading. For example, in Hintikka, Landerl, Aro, and Lyytinen’s (2008) study, students practiced sublexical units, and we selected the outcome measure of words with trained segments rather than words with untrained segments, as the latter was considered the control. However, we excluded outcome measures that combined sublexical tasks and word reading in the same list (e.g., Lovio, Halttunen, Lyytinen, Näätänen, & Kujala, 2012) because we could not determine if students had read words or only word parts. For studies with both a posttest and follow-up, we selected the immediate posttest.

In research with a randomized crossover design (Brem et al., 2010; Kamykowska, Haman, Latvala, Richardson, & Lyytinen, 2014; Ngorosho, 2018), we extracted data from the first cohort only, thus allowing for a true control. Furthermore, in three studies (Heikkilä et al., 2013; Hintikka et al., 2008; Kyle, Kujala, Richardson, Lyytinen, & Goswami, 2013), GG participants were divided into multiple subgroups, with each subgroup playing a GG variant (e.g., frequent vs. infrequent syllables), but all subgroups were compared with the same control group. In such situations, to reduce bias of selecting one GG subgroup to analyze, we first verified that there were no subgroup differences on word reading. Then, we combined all GG conditions into one experimental GG group. In instances of missing data or unclear reporting, we emailed authors and received clarification for all of our requests.

Meta-Analytic Procedure
We conducted analyses using the Comprehensive Meta-Analysis program, version 3 (Borenstein, Hedges, Higgins, & Rothstein, 2005), and computed effect sizes as Hedges’s g. When Hedges’s g was positive, the group playing GG performed higher than the control. When possible (n = 16), Hedges’s g was calculated as the difference in gain (pretest to posttest) between the GG and control groups. However, because beginning readers play GG, in three studies (Brem et al., 2010; Kamykowska et al., 2014; Saine, Lerkkanen, Ahonen, Tolvanen, & Lyytinen, 2010), assuming nonreading, researchers did not use reading pretests. However, random assignment was employed, so we calculated Hedges’s g as the posttest differences between the GG and control groups.

For studies that reported multiple reading outcome measures, we calculated a weighted average Hedges’s g with the mean standard error based on a number of measures. Specifically, we first input the different means, standard deviations, and sample sizes. Then, we calculated each Hedges’s g and the associated weight and divided the sum of weighted Hedges’s g by the sum of weights (i.e., Σ w_i g_i / Σ w_i) to produce a weighted average Hedges’s g for that study.

We estimated the overall effect sizes by calculating a weighted average of individual effect sizes using a random effects model. According to Borenstein, Hedges, Higgins, and Rothstein (2010), a random effects model should be selected when researchers anticipate that the true effect size is not identical across studies. Thereby, with different study designs, populations, and languages, we hypothesized that the true effect size would vary across the studies.

Publication Bias
We used multiple methods to estimate the sensitivity of our results to publication bias: funnel plot, Egger’s test of publication bias (Egger, Smith, Schneider, & Minder, 1997), Duval and Tweedie’s (2000) trim and fill method, and cumulative forest plot (Borenstein et al., 2010). Meta-analyses assume that effect sizes are symmetrical to the mean, and results may be biased if the funnel plot visually depicts an asymmetrical distribution (Borenstein et al., 2010). Egger’s linear regression test (Egger et al., 1997) also examines the assumption of symmetry. The trim and fill procedure trims the outlying studies from one side, fills them to the other side, and reestimates Hedges’s g (Schwarzer, Carpenter, & Rücker, 2015). Finally, a cumulative forest plot can detect the impact of studies with small sample sizes (Borenstein et al., 2010).

Moderator Variables
Unfortunately, the intervention research in computer games has been dominated by those providing simple exposure to technology but often ignoring teacher and learner variables and, consequently, cultural variables (Young et al., 2012). This limitation in the field provided the impetus for this work, but this limitation also extended to GG and constrained our potential moderator variables. However, we examined the capacity of four contextual moderator variables, derived from our literature and theoretical review, to explain variability in the effect sizes.

Control Group Type
Researchers employed three different types of control: untreated control, Math Computer Game, and reading intervention. The untreated control was typically described as business as usual, and GG represented an added instructional opportunity. The Math Computer Game was specifically designed to mirror the gaming
experience of GG but with math content not reading. The reading intervention controls included both word-reading interventions (see Saine et al., 2010) and more general reading interventions such as paired reading (see Huemer, Landerl, Aro, & Lyytinen, 2008).

Language of Intervention
GG was designed for use across languages, yet much evidence indicates different patterns of reading acquisition related to orthography (e.g., Ellis et al., 2004). Specifically, both theoretical models (e.g., Katz & Frost, 1992) and empirical evidence (Seymour, Aro, & Erskine, 2003) indicate that young readers acquire decoding skills more readily in transparent (or shallow) orthographies with high sound–symbol consistency (e.g., Italian, Icelandic) compared with readers of complex orthographies with higher proportions of phonetically irregular spelling patterns (e.g., English, French). Because of the many orthographies, following Landerl and colleagues (2013), we grouped them into three categories based on the dimensions of orthographic depth and syllable complexity (Seymour et al., 2003): shallow/simple (e.g., Finnish, Spanish), moderate/complex (e.g., German, Polish), and deep/complex (e.g., English).

Duration
Students in all studies played GG in multiple sessions for at least one week. The total play/intervention times were either directly reported or calculated from the treatment schedule (e.g., 15 minutes, three times a week, over six weeks). The average reported durations ranged from 57 to 900 minutes. Interventions were thus coded as short (180 minutes or less), medium (181–360 minutes), or long (361–900 minutes). GG was most commonly played daily for 15-minute sessions. Therefore, the short duration represents approximately a one- to two-week intervention, the medium duration approximately a two- to six-week intervention, and the long duration greater than six weeks.

Level of Adult Interaction
We coded interventions as either low or high adult interaction. Low-interaction interventions had at least one of the following conditions: Students worked individually within a large group (often in a computer lab), GG was played at home, and/or the researcher described that adults provided only technical assistance after session 1. High-interaction interventions had at least one of the following implementation characteristics: an adult to child ratio of 1:1 or 1:2, the methods described that an adult provided support in some manner beyond technical (e.g., adults provided encouragement), and/or GG was integrated into small-group (two to six students), teacher-led lessons.

Results and Discussion

Overall Sample
In total, 28 empirical studies were included in the review: 22 peer-reviewed journal articles, one research report, one book chapter, one doctoral dissertation, and three master’s theses (see Table 1). Table 1 provides an overview of the studies and their key characteristics. This review represents research on 2,430 students. Eleven languages and 14 countries were included, although Finland was overrepresented. Within the quantitative studies (N = 26), sample sizes were small to modest, ranging from 19 to 389, with a mean of 100. About half of all studies (N = 15) were included in the embedded meta-analysis regarding GG’s impact on word reading, but because of multiple independent groups, this yielded 19 independent comparisons. Three studies had kindergarten participants, 17 studies focused on first grade, and eight studied second-grade participants or older. Regarding setting, 20 studies occurred in school, seven occurred at home or after school, and one compared settings. No studies included students formally identified as dyslexic; however, 18 studies selected participants with risk factors associated with difficulties in learning to read.

Results did not indicate evidence of publication bias. Specifically, the funnel plot showed that although dispersed, the studies were almost symmetrical to the mean effect size. The Egger’s test of publication bias was not statistically significant (t = 1.19, df = 17, p = .13). Furthermore, Duval and Tweedie’s (2000) trim and fill method did not provide evidence that results were sensitive to the publication bias, because original values were unchanged. Finally, the cumulative forest plot displayed a stable overall effect size.

How Effective Is GG in Improving Students’ Word-Reading Skills?
When analyzing the meta-analytic subsample (19 comparisons), the null hypothesis regarding homogeneity of studies was not rejected, Q(18) = 16.27, p = .58, I² = 0.00%, τ² = 0.00, indicating consistency across studies and appropriateness to combine findings. Figure 3 visually depicts the individual effect sizes, measuring students’ word-reading skills for students completing a GG intervention compared with a control. The mean effect size was slightly negative (g = −0.02, 95% CI [−0.14, 0.09]) but not statistically significantly different from zero (p = .70), indicating that overall, as a word-reading intervention, there is no evidence that GG produced growth in students’ word reading. Furthermore, the trend in researchers’ conclusions reported from individual studies (see Table 1) indicates that although students often learned from GG, their learning did not typically surpass that of control groups. This result is consistent with previous reviews with beginning readers

FIGURE 3
Forest Plot for Word-­Reading Outcomes After GraphoGame Intervention
[Forest plot. Column headings: Study name; Hedges’s g and 95% CI. Horizontal axis runs from −1.00 to 1.00, with the left side labeled “Favors Control” and the right side labeled “Favors GraphoGame.”]
Comparisons plotted: Ngorosho 2018 School B Int 2; Ngorosho 2018 School A Int 2; Ngorosho 2018 School A Int 1; Huemer et al. 2008; Rosas et al. HI SES 2017; Heikkila et al. 2013; Hintikka et al. 2005; Worth et al. 2018; Kamykowska et al. 2014; Baker et al. 2017; Patel 2018; Bhide et al. 2013; Brem et al. 2010; Kyle et al. 2013; Rosas et al. LO SES 2017; Hintikka et al. 2008; Ngorosho 2018 School B Int 1; Saine et al. 2010; Carvalhais et al. 2018.
Note. Effect sizes are Hedges’s g values. The overall average effect size is displayed as a diamond. The individual effect sizes are displayed as rectangles,
with confidence intervals represented by horizontal lines; horizontal lines with arrows indicate that the confidence interval exceeds ±2 Hedges’s g.
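
As a reading aid for the homogeneity statistics reported for the 19 comparisons in Figure 3, the following minimal sketch shows how an I² value follows from Q and its degrees of freedom. It assumes the conventional definition, I² = max(0, (Q − df)/Q) × 100%; the function name `i_squared` is hypothetical and is not taken from the authors' analysis software.

```python
def i_squared(q: float, df: int) -> float:
    """Return I-squared as a percentage, truncated at zero.

    Assumes the conventional definition I^2 = max(0, (Q - df) / Q) * 100.
    """
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Values reported in the text: Q(18) = 16.27 for k = 19 comparisons.
q_reported = 16.27
k = 19
print(f"I-squared = {i_squared(q_reported, k - 1):.2f}%")
```

Because the reported Q is smaller than its degrees of freedom, the ratio is negative and is truncated to zero, which agrees with the reported I² of 0.00%.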

(Campuzano et al., 2009; Cheung & Slavin, 2013; Dynarski et al., 2007; Slavin et al., 2011; Wood et al., 1999). However, the wide range of effect sizes (g = −1.07 to 1.55) indicates that the variation may be impacted by additional, contextual factors, which are explored in our moderator analyses.

Furthermore, as noted earlier, to avoid overinflation of results when outcome measures are closely aligned with an intervention, we purposefully used word reading as the outcome variable. However, multiple studies provided evidence of growth in sublexical skills (see Table 1 for details), which we highlight here. Specifically, Huemer et al. (2008), Hintikka et al. (2008), and Heikkilä et al. (2013) found evidence that students improved in syllable reading after GG interventions, but only Hintikka et al. found transfer effects. Patel (2018) and Lovio and colleagues (2012) reported that GG students improved in letter sounds and phonological processing, respectively. Similarly, Rosas and colleagues (2017) found that students from low socioeconomic backgrounds playing GG grew in letter sound knowledge and that students from high socioeconomic backgrounds grew in rapid automatized naming. Although exact patterns are not consistent across studies, taken together, these findings suggest that select students are gaining reading subskills through GG but not easily applying them to word reading.

What Features of Implementation May Be Associated With Better Effects in Using GG?

How Does the Impact of GG on Reading Vary Across Languages?
As described earlier, we grouped languages into three categories: shallow/simple, moderate/complex, and deep/complex. Language complexity was not a statistically significant moderator (see Table 2). This finding indicates that factors beyond language explain differences among students’ learning from GG. The lack of linguistic effect may be related to researchers’ precise attention to orthographic differences (e.g., see Kyle et al., 2013), thus minimizing linguistic effects.

Duration of GG Play and Reading Outcomes
To consider relations between the GG exposure and reading outcomes, we considered duration (total time)

Critically Reviewing GraphoGame Across the World | 61


TABLE 2
Analysis of Moderators on GraphoGame Word-­Reading Outcomes
Moderator variable            Number of effect   Effect size (g)   Heterogeneity (I²)   Test of difference
                              sizes (k)                                                 (Q test)

Language of intervention
  Shallow/simple              9                  −0.03             0.00
  Moderate/complex            6                  0.15              46.16
  Deep/complex                4                  −0.04             0.00                 0.57

Duration of intervention
  Short (1–3 hours)           5                  −0.08             0.00
  Medium (3–6 hours)          6                  0.19              22.91
  Long (6+ hours)             8                  0.01              11.93                0.58

Control group type
  Untreated control           11                 −0.02             0.00
  Math computer game          5                  −0.01             39.14
  Reading intervention        3                  −0.08             56.63                0.88

Level of adult interaction
  Low                         14                 −0.07             0.00
  High                        5                  0.48*             7.10                 0.01*

*p < .05.
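
The duration rows in Table 2 are labeled in hours, whereas the coding rule in the text is stated in minutes: short (180 minutes or less), medium (181–360 minutes), or long (361–900 minutes). The sketch below is one way to express that rule in code. The function names are hypothetical, treating any total above 360 minutes as long is an assumption, and the 15-minute default reflects the session length that the text describes as most common.

```python
import math


def code_duration(total_minutes: float) -> str:
    """Map total GraphoGame play time in minutes to a duration category."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    if total_minutes <= 180:
        return "short"
    if total_minutes <= 360:
        return "medium"
    # Assumption: anything above 360 minutes is treated as long.
    return "long"


def sessions_needed(total_minutes: float, session_minutes: int = 15) -> int:
    """Number of sessions needed to reach the given total play time."""
    return math.ceil(total_minutes / session_minutes)


for minutes in (180, 360, 900):
    print(minutes, code_duration(minutes), sessions_needed(minutes))
```

Under these assumptions, the upper bound of each category converts directly into a count of 15-minute sessions, which makes the hour labels in Table 2 easier to compare with the minute-based rule.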

in three categories (see Table 2). Duration was not a statistically significant moderator. The nonsignificance of duration aligns with previous reviews (e.g., Clark et al., 2016). However, as a caution, most researchers offered length of play time as a principal limitation and speculated that extended GG time would have yielded greater learning; our findings do not support that inference.

Adult Interaction and Reading Outcomes
We coded multiple aspects of implementation, yielding two categories. Studies with high interaction were implemented in either individual or small-group situations, with an adult providing support, typically motivational. In studies with low interaction, students played GG in a fully independent manner. Adult interactions proved to be a statistically significant moderator (see Table 2), with high interactions associated with better word reading: Studies with high adult interaction had a moderate positive effect (g = 0.47), whereas those with low adult interaction had a small negative effect (g = −0.07).

The studies with high adult interaction occurred in three different manners, each of which would lead to increased interaction. In Saine, Lerkkanen, Ahonen, Tolvanen, and Lyytinen (2011), students rotated through a series of small-group work, with GG being one learning station. This could allow teachers to integrate GG learning across activities. In contrast, in Carvalhais, Richardson, and Castro (2018) and Kyle et al. (2013), the students played in a different room, so the learning within the game likely remained relatively isolated from the classroom curriculum. However, in addition to technology support, an adult provided encouragement and motivational support. Finally, in Bhide, Power, and Goswami (2013) and Hintikka et al. (2008), the students played GG in either a one-to-one setup with an adult or in student pairs with an adult.

Beyond those five studies included in the meta-analysis, multiple other researchers discussed the role of adults for GG interventions, both for affective and cognitive support (see Table 1). Ecochard’s (2015) ethnographic work most thoroughly addressed this issue. Despite intending to occupy an observer role, she found that to ensure students’ adequate participation with GG, her role had to expand. The game’s reward system did not sustain students’ motivation, but her encouragement and attention readily motivated students. Other researchers’ observations (Kamykowska et al., 2014; Oksanen, 2010; Ronimus & Lyytinen, 2015) triangulated Ecochard’s conclusions about students’ motivation toward the game when playing solo. Likewise, when explaining students’ greater success with GG in



TABLE 3
Theoretical Categories in GraphoGame Research
Each entry lists the theoretical category with its citation frequency, the theoretical focus, and examples from GraphoGame research.

Theories informing reading disability (citation frequency: 14)
  Theoretical focus: Addresses and explains causes and sequence of reading disabilities and/or delays: These theories may address typical development but only as a contrast to delayed development.
  Examples from GraphoGame research:
  • Double-deficit theory of developmental dyslexia (Wolf & Bowers, 1999)
  • Phonological deficit hypothesis (Snowling, 1998)

Psycholinguistic theories (citation frequency: 12)
  Theoretical focus: Systematically applies language development and linguistic structures to reading acquisition; often compares learning with reading across languages of varying dimensions
  Examples from GraphoGame research:
  • Psycholinguistic grain size theory (Ziegler & Goswami, 2005)
  • Small versus large unit theories of reading acquisition (Seymour & Duncan, 1997)

Micro reading theories (citation frequency: 12)
  Theoretical focus: Details one or a few subprocesses of reading, such as word reading or automaticity; typically includes both the development of that skill and cognitive pathways
  Examples from GraphoGame research:
  • Automatic information processing (LaBerge & Samuels, 1974)
  • Dual-route cascaded model of visual word recognition and reading aloud (Coltheart, Rastle, Perry, Langdon, & Ziegler, 2001)

General learning theories (citation frequency: 5)
  Theoretical focus: Cognitive theories that inform the internal working of the mind during learning, memory, and skill attainment; not specific to language and reading
  Examples from GraphoGame research:
  • Theories of working memory (Baddeley, 2012)
  • Theory of instruction (Bruner, 1966)

Macro reading theories (citation frequency: 3)
  Theoretical focus: Addresses at least two of the three major subdivisions within a continuum of reading practices: decoding, comprehension, or response
  Examples from GraphoGame research:
  • Simple view of reading (Hoover & Gough, 1990)

Affective and motivational theories (citation frequency: 2)
  Theoretical focus: Informs the relation between emotions and learning; predicts why learners engage in activities; does not have to be specific to literacy
  Examples from GraphoGame research:
  • Self-determination theory (Ryan & Deci, 2000)

Social learning theories (citation frequency: 1)
  Theoretical focus: Considers the central role of social interaction in knowledge and learning; asserts that literacy is a cultural practice and therefore influenced by families, communities, and culture
  Examples from GraphoGame research:
  • Cultural difference theories (Eisenhart, 2001)

school, as compared with home, Ronimus et al. (2014) noted that teachers provided greater support and interaction than did parents.

Shifting to cognitive benefits, learning from technology is often conceptualized as occurring within a triangular dynamic, with the three points being the technology, the learner, and the teacher (Schmid, Miodrag, & Di Francesco, 2008; Wood et al., 1999). Providing evidence for that concept, Ecochard (2015) documented that her role included co-constructing meaning with the Peruvian students. Through discussions, she helped students internalize more understanding from GG and begin to transfer that learning to text reading. Similarly, in Zambia (Jere-Folotiya et al., 2014) and Tanzania (Ngorosho, 2018), the most successful intervention models were those in which both teachers and students had the opportunity to play GG, likely allowing teachers to better understand the learning within the game. Likewise, Patel (2018) suggested that the limited effects of GG on word reading were related to a lack of teacher involvement.

To What Extent Does Theory Inform GG Research and Relate to Contextual Factors?

Theoretical Representation
Within the corpus of studies, 27 unique theories were identified and referenced 49 times. Informed by theoretical reviews (Sadoski & Paivio, 2007; Tracey & Morrow, 2017), we categorized individual theories into seven categories (see Table 3). Macro reading theories contain aspects of a unified theoretical model, whereas micro reading theories detail a reading subprocess such as word reading or decoding (Sadoski & Paivio, 2007). The psycholinguistic theories typically compare reading across languages, whereas theories informing reading disability aim to predict reading disability development.

In total, this corpus of research was dominated by theories informing reading disability. The prominence of such theories is predictable because many studies targeted at-risk students. This theoretical focus also aligns



with the developers’ reason for distributing GG: the prevention of reading difficulties (Lyytinen et al., 2009). In contrast, the next two well-represented categories, psycholinguistic theories and micro reading theories, provide insight for both at-risk and normally developing students and were essential for researchers to adapt the content of GG across languages. Finally, it is important to note that cognitivist theories (Tracey & Morrow, 2017) dominate all three of these categories, suggesting that researchers conceptualize reading as a cognitive skill rather than a cultural or social act.

Of additional interest is considering underrepresented theories. Unexpectedly, given the attention to motivation within the game development literature, only two studies (Ronimus et al., 2014; Ronimus & Lyytinen, 2015) cited motivation theories, and both were specifically studying engagement. Connected to the lack of theoretical attention, researchers rarely measured motivation, indicating an assumption of motivation. For example, current descriptions of GG state that “positive feedback sustains the child’s engagement in playing for sufficient time for learning to be established” (GraphoLearn, n.d.a, para. 2).

Furthermore, only the single ethnographic study (Ecochard, 2015) interpreted results through a social learning theory, specifically cultural difference theory (Eisenhart, 2001), discussing how changing conceptions of culture impact research. No socioecological theories were documented, which may be imprudent considering how an education intervention, developed in Finland (with a notably unique education system), has been piloted worldwide. Additionally, this absence of sociocultural theories seems discrepant with the detailed attention provided to linguistic theory and likely reflects a cognitivist viewpoint. As a result, issues on how a serious game for word reading aligns with international schooling practices and cultural systems remain largely unexplored.

Individual Theories
To further explore the theoretical grounding, we also quantified the three most prevalent individual theories. Psycholinguistic grain size theory (Ziegler & Goswami, 2005) was most cited. Due to GG’s use across countries and languages, a focus on how language features impact children’s reading acquisition is highly relevant. The second most represented theory was Ehri’s (2005) phase model of word reading, which provided a framework to describe readers’ pathway from emergent reading to word reading. Finally, LaBerge and Samuels’s (1974) theory of automaticity was the third most cited theory. GG’s design of repetition and rapid matching of sound to symbol connects logically to automaticity. Again, we emphasize that these theories all logically inform the content and linguistic adaptations of GG but do not guide broader questions regarding learning with technology, motivation, or adapting an intervention across cultural boundaries.

To What Extent Do Theories Inform GG Research?
To capture the level of connection between theory and the research, we coded how explicitly the theory was connected to design. Studies coded as explicit had a clearly stated theory that could be traced to design and/or interpretation. In contrast, studies coded as implicit had an identifiable theory(s) that could be discerned through careful reading but was not cued by the author. Consistent with previous research examining theory within technology and reading instruction research (Yang et al., 2018), theories were often implicit: Only five of the 28 articles (18%) presented an explicit theoretical link.

In situations with explicit theoretical links, theory informed research in multiple ways. For example, theory informed the adaptation of GG, as described by Kyle and colleagues (2013): “GG Rime is based on the intrasyllabic unit of the rime, which is argued to be an important psycholinguistic unit for English-speaking students (Goswami, 1999; Ziegler & Goswami, 2005)” (p. 67). In other theoretically explicit studies, theory provided the rationale for the research. For example, Ronimus and Lyytinen (2015) compared the context of where children played GG (home or school) and stated that “according to self-determination theory (see, e.g., Ryan & Deci, 2002), ideal learning environments are those that satisfy the three basic psychological needs of competence, autonomy, and relatedness” (p. 125). Additionally, researchers who provided an explicit link were more likely to interpret results with theory. For example, Ronimus and Lyytinen interpreted that school-based GG may be more successful than a home-based GG intervention, due to the school environment satisfying children’s psychological needs.

To What Extent Do Methodological Choices Impact Our Understanding of GG’s Effectiveness?
In this subsection, we document methodological decisions made by researchers and consider how such choices impact our knowledge of learning with GG.

Predominant Research Methodologies

Quantitative Methodology
Of the 28 publications, the majority (n = 25) reported quantitative results, incorporating many best practices of large-group multivariate design. This quantitative focus is aligned with the GG network’s goals of providing evidence-based support via experimental studies (GraphoWORLD,



2010). Within experimental designs, seven employed random assignment, nine used stratified random assignment, and nine used quasi-experimental or rigorously matched designs.

Qualitative and Mixed Methodology
Qualitative research was underrepresented, with only two qualitative studies. One, an ethnographic case study, documented GG’s implementation in a rural Peruvian school (Ecochard, 2015). Ecochard explained that her methodological choice was to “complement the quantitative approach which has dominated the research on Graphogame” (p. 2). The rich descriptions of settings provided insight regarding the role of cultural context. The second qualitative study (Oksanen, 2010) documented case studies about Russian immigrant children learning to read Finnish. Unlike group designs, Oksanen’s work revealed the many disparate pathways that children can move through GG.

Using Tashakkori and Creswell’s (2007) definition of mixed-methods research “as research in which the investigator collects and analyzes data, integrates the findings, and draws inferences using both qualitative and quantitative approaches or methods in a single study or a program of inquiry” (p. 4), such designs were absent. This absence is particularly surprising considering the availability of process data. Select researchers even collected multiple sources of data but did not employ a mixed-methods design. For example, Saine and colleagues (2010, 2011; Saine, Lerkkanen, Ahonen, Tolvanen, & Lyytinen, 2013) conducted tutoring observations and recorded discussions with teachers, but the data were not used in analysis.

Methodological Rigor
Although the studies all met minimal rigor guidelines on our Methodological Quality Questionnaire (see the Appendix), additional consideration of control group and aspects of fidelity of implementation is needed to understand the technology’s effectiveness fully.

Control Group
Researchers used three different types of control groups of varying rigor, thus complicating attempts for direct comparisons: untreated control (n = 7), nonreading computer intervention (e.g., a math game designed to mirror GG; n = 5), or a competing intervention for reading skills (n = 3). We tested control group type statistically, and it was not a statistically significant moderator for word-reading gain (see Table 2). However, the lack of significance does not negate this weakness in research design. Specifically, although a nontreatment control group allows for calculating additive benefits, it increases internal validity threats via the Hawthorne effect (McCambridge, Witton, & Elbourne, 2014). In comparison, the math intervention (same game format but with numeracy content instead of literacy) controlled for the gaming experience. However, in such studies, any learning gained by GG may simply reflect that engaging in any type of reading practice enhances reading skills. Therefore, we argue that for establishing efficacy as an intervention, a well-characterized reading intervention would be the most appropriate comparison.

Furthermore, for future research, it is equally important to note strengths. Following Goswami’s (2003) recommendation, select researchers (see Kamykowska et al., 2014; Saine et al., 2010, 2011, 2013) considered intervention readers’ growth (compared with a control group), as well as the extent to which struggling readers attenuated the gap with normally developing peers.

Fidelity of Implementation
No studies were excluded for lacking implementation fidelity, but this reflects GG’s capacity to store students’ work progress on a server, which led most researchers to report students’ mean playing times. Unfortunately, few authors provided in-depth evidence of treatment fidelity as recommended by the Council for Exceptional Children (Cook et al., 2014) and other frameworks (Dane & Schneider, 1998), which would include coverage of material, student responsiveness (e.g., engagement), and quality of delivery (e.g., interventionist enthusiasm). In fact, multiple researchers voiced concern about students’ engagement (Ecochard, 2015; Kamykowska et al., 2014; Oksanen, 2010; Patel, 2018; Ronimus & Lyytinen, 2015), but only more rigorous measures of fidelity could capture this information.

Select studies provided more complete examples of treatment fidelity, which can inform future research. Kyle and colleagues (2013) reported both exposure time and the final levels reached, thus confirming progress. Using GG server data, Oksanen (2010) analyzed both duration and students’ play behavior for indication of guessing, and through this analysis, she identified outliers. Finally, Worth, Nelson, Harland, Bernardinelli, and Styles (2018) provided an exemplar of measuring treatment fidelity, including consultation and teacher interviews for 14 of 15 schools. The researchers then reported and assessed the impact of issues relating to dosage, mode of delivery, and level of difficulty.

General Discussion and Implications
GG, a serious game for reading attainment developed by leading researchers from neuropsychology, linguistics,



and special education, has achieved attention worldwide and been promoted as an avenue to address literacy needs in developing nations. GG derives from a highly regarded longitudinal study of dyslexia from Finland (Jyväskylä Longitudinal Study of Dyslexia), in which researchers identified a key bottleneck where readers with reading disabilities diverged from normally developing readers in achieving automaticity in phoneme–grapheme correspondence. GG instructionally targets such bottlenecks, but our synthesis reveals that, at best, GG’s effectiveness for word-reading gains is inconsistent. More concerning is that despite initial promising results, GG did not yield meaningful gains in word reading overall. This logically leads to the question as to why a learning game, originally demonstrated to be highly effective in Finland (e.g., Saine et al., 2010, 2011, 2013), has not had its effects consistently replicated by others. Looking more broadly, we began this review referencing two questions:

1. What is the proper role of integration of computers in reading instruction?
2. Under what conditions can they replace or supplement conventional instruction?

Putting our findings within previous research contexts, in this section, we address implications for the intersection of CAI and reading in the classroom.

Theoretical and Methodological Choices Impacting Epistemology
Based on our theoretical analysis, we infer that GG is an intervention designed with exquisite attention to linguistics but paired with a model of learning that puts the main emphasis on the technology itself at the cost of other relevant agents, such as the student, teacher, pedagogy, and culture. Considering that Finland is an outlier in education (albeit a positive one), it is not surprising that an intervention that worked in this unique environment (e.g., highly effective educational system, high overall socioeconomic status, transparent orthography) may not easily transfer across the globe, even when the instruction is adapted to the language. Success in Finland may be attributed to a complexity of influence that could result from an interaction among the classroom instruction, orthography, and GG or even reside outside the game (e.g., the teachers’ manner of integrating GG). Unfortunately, researchers to date, perhaps because of a reliance on quantitative outcomes but with minimal process data, have not fully examined critical aspects of the learning process (e.g., engagement, goal orientation, social interactions, school environment).

The preponderance of group experimental designs likely reflects a shift that occurred in the early 2000s toward prioritization for randomized experimental research as a basis for policy (Slavin, 2002). These research designs, often referred to as the gold standard (Eisenhart, 2006), are considered essential for building confidence in educational research. However, although randomized control trials occupy a critical niche, they contain limitations that other methods can complement.

In GG research, investigators faced a myriad of contextual characteristics beyond linguistic diversity (e.g., irregular school attendance: Patel, 2018; teachers’ knowledge of language development: Jere-Folotiya et al., 2014; student disengagement: Kamykowska et al., 2014), which can add much noise to quantitative measures and are often more readily captured through qualitative approaches. However, only two qualitative studies (Ecochard, 2015; Oksanen, 2010) were located, and neither was published (see Table 1). Nevertheless, these two studies provide critical insight for interpreting findings and guiding GG implementation, particularly related to issues of motivation and differential response by individual students.

Similarly, there was an absence of mixed-methods research in GG, which may also represent a more general gap in the research base for struggling readers (Klingner & Boardman, 2011). Yet, as researchers from special education (Klingner & Boardman, 2011) and educational technology (Amiel & Reeves, 2008; Eng, 2005) have contended, we need to recognize the complexity of implementing an intervention in real-world contexts. When designed thoughtfully, mixed methods can help researchers explore socially situated and contextualized learning processes, as well as understand individual differences (McCrudden, Marchand, & Schutz, 2019). In turn, such approaches could address the need for parallel examination of larger contextual analyses (i.e., cultural, political).

In summary, the GG research favors quantitative, large-group designs. Considering that this serious game was developed in a European context, the sociopolitical complexity of this project (Geraldi, Maylor, & Williams, 2011) may not be readily captured using only quantitative outcome methods. Therefore, moving forward, GG and CAI research may require shifting epistemological beliefs to more formally represent that in methodological design, learning to read is a complex event embedded within systems of families, schools, and cultures.

Implementation Interactions: The Adults in the Room
Our findings indicate the fundamental importance of adult involvement when students learn word-reading skills from a serious game. This finding may be of greater significance than the overall lack of effect of GG because it informs the forefront of research on CAI instruction of a critical parameter: adult–child interactions. The traditional model of pairing individual



children and expecting transfer to reading tasks is outdated and has been proven, time and again, to be ineffective (e.g., Campuzano et al., 2009; Dynarski et al., 2007; Wood et al., 1999). Therefore, echoing the call of researchers in education technology (Amiel & Reeves, 2008; Eng, 2005; Gobet & Wood, 1999; Livingstone, 2012), we recommend that research expand to also focus on aspects of implementation, particularly as related to motivation and transfer.

Paradoxically, the lack of learning effects was most prominent when the game was used in strict accordance with its prescriptions, such as emphasizing its potential for playing in solitude, in which the game itself takes the role as an additional teacher (GraphoLearn, n.d.a). As Huemer et al. (2008) discussed, because literacy is a social undertaking, independent computer learning may need to be nested within more authentic and social reading activities to become fully realized. Additionally, teacher instructional support during CAI helps learners more efficiently select and organize new information (Wouters & van Oostendorp, 2013).

Understanding the exact type of teacher or adult involvement requires additional in-situation evaluations of the game, but based on the current review, we can tentatively draw a few conclusions. When GG was more integrated into a coherent instructional sequence (e.g., Saine et al., 2011), the effects appear stronger. Researchers have expressed concerns that the learning within computer games can remain inert, meaning situated only within the game (Renkl, Mandl, & Gruber, 1996), but teacher questioning can help learners apply that knowledge to novel situations. Similarly, others have expressed concerns that game-based learning is intuitive, meaning that learners can apply it but not verbalize it (Leemkuil & de Jong, 2011). Therefore, discussions with adults during the game play require verbalization of learning.

The second trend in adult involvement relates to student motivation, particularly through relational channels, which would be predicted by sociocultural theories. For example, Kyle and colleagues (2013) reported (ironically as a limitation) that the researchers’ encouragement may have enhanced GG’s positive effects. Ecochard (2015) documented that although the reward system in the game became less effective over time, the students were highly motivated by her attention and encouragement, whether verbal or nonverbal. For example, she even rearranged seating to increase her physical proximity to more students during the GG sessions. This finding is in concert with Baye et al.’s (2018) recent analysis of successful reading interventions in which programs with positive outcomes emphasized student motivation, student–teacher relationships, and socioemotional learning.

Limitations of Transfer Among CAI, Sublexical Learning, and Word Reading
Throughout this review, we maintained a strict focus on word reading as our outcome variable of interest because prereading outcome measures are often designed in close alignment with interventions, which can overstate effects (e.g., Slavin et al., 2011). However, as documented in Table 1, students in multiple studies gained sublexical knowledge, although such learning did not consistently benefit word reading. This conundrum is complex but consistent with demonstrated limitations of learning from educational technology (e.g., Lawless, 2016; Livingstone, 2012).

First, in two reviewed studies, researchers discussed alignment issues between GG and the classroom reading curriculum. Kamykowska et al. (2014) noted that phoneme-based GG training did not complement the synthetic phonics class instruction for Polish reading, rendering GG redundant. Extra time on synthetic phonics did not add value. Kyle et al.’s (2013) comparison between the versions of GG Phoneme and GG Rime, both of which were supporting a synthetic phonics curriculum for English, explicitly considered alignment. Although equivalent learning occurred in both versions, there were trends of greater effectiveness for GG Rime, which may be due to the complementary effects of analogical, rime-based phonics.

Second, more globally, the lack of transfer between sublexical skills and word-reading tasks is likely related to the discrete nature of supplementary CAI programs, as compared with (more effective) situations when CAI is integrated (Cheung & Slavin, 2012). As a field, it seems that this lesson repeats itself, because an early large-scale evaluation of the integrated learning system concluded that

    either one accepts that such systems have no place in the classroom at all, or one accepts that they must be integrated alongside other teaching and learning practices if they are to work as a significant contributor to learning and understanding. (Wood et al., 1999, p. 104)

Supporting this approach, the Finnish studies conducted by Saine and colleagues (2010, 2011, 2013), which demonstrated positive and persistent effects of GG (g = 0.83), followed a model in which computer instruction was embedded in a 45-minute small-group intervention, likely allowing teachers to coordinate across activities.

Finally, looking to disciplinary-specific explanations, a potent perspective was put forth by Compton, Miller, Elleman, and Steacy (2014), who challenged contemporary code-based (i.e., phonics) interventions for their lack of effects in promoting generative code-based skills, that is, skills that are transferable beyond the mere mastery of code to text reading. In line with this,



Heikkilä et al. (2013) conjectured that after teaching syllables in GG, an intermediary step of teaching students how to read words through syllable analysis would assist transfer. Future research initiatives should, in line with Compton and colleagues’ perspective, search for interventions that propel generative skills, rather than assume that sublexical growth will provide later reading benefits.

Recommendations for Future Research

Researchers should replicate the quantitative studies in contexts where GG and other CAI programs have been shown to have strong effects on reading. Such studies are needed to clarify the extent to which the active ingredient resides inside or outside the game. For example, the series of studies by Saine and colleagues (2010, 2011, 2013), which produced particularly strong reading outcomes on average, was characterized by an integrated approach in which GG play was immediately followed by small-group, teacher-led instruction. It remains an open question as to whether the level of support and teacher feedback led to such large reading effects or whether the cultural, linguistic, and educational contexts of Finland facilitated such growth.

Furthermore, the overall (quantitative) conclusion was most saliently explained in Ecochard’s (2015) qualitative observations and reflections from her ethnographic and culturally specific work. The utility of her work gives evidence to the importance of in-situation evaluations of educational technology. More qualitative and mixed-methods studies are needed. Such work would also allow researchers to better engage with practitioners in the design process and implementation strategies and result in meaningful learning (Amiel & Reeves, 2008).

Researchers generally provided strong empirical background, but the link to theory was tenuous. Future research should prioritize theoretical clarity and diversity because theory advances learning interventions in general. Unfortunately, according to Compton and colleagues (2014), the reading field

may have inadvertently diluted reading theory in ways that compromise the power of intervention programs.…we argue that current intervention programs target instruction at a knowledge level below that which is necessary to foster reading skill development that is ‘generative’ in children with RD [reading disabilities]. (p. 55)

Finally, future empirical research should also use control and comparison groups judiciously. Three different types of control groups were used in the reviewed studies, thus complicating attempts at direct comparisons: no-treatment control group, nonreading computer intervention (typically math), or a competing reading intervention. A large portion of the studies avoided comparing with a competing reading intervention and, as such, failed to assess the unique contribution of GG as a reading intervention. Ideally, studies would use both a true control and a comparison matched group to study the rate at which students diverge from, or converge with, typically developing peers.

Implications for the Classroom: Tutor/Child/Computer Triangle

We must now address an inconvenient implication of our findings. In classrooms, CAI often serves a dual purpose of assisting with classroom management. Therefore, our recommendation for increased teacher interactions with CAI may reduce serious games’ functional role. For example, in Baker et al. (2017), despite lack of reading gains, teachers remained positive about GG because they “found it easy to incorporate as a means for providing students with opportunities to independently practice their letter-sound correspondence knowledge while teachers provided small-group instruction to other students” (p. 234). Similarly, Worth et al. (2018) reported that teachers were overwhelmingly positive even though “the vast majority were hesitant to specifically assign any improvements in phonics attainment or progress to playing it” (p. 33), thus implying that GG’s perceived value was not purely instructional.

Therefore, we suggest that future research on students using CAI should explore potential benefits of adult support by those who are not necessarily highly qualified classroom teachers but may be community volunteers or paraprofessionals. Such cost-effective arrangements may capitalize on the social benefits of tutoring and the systematic reading content presented by a CAI. Research by Schmid et al. (2008), in which tutors individually supported preschool students’ reading skills using CAI, shows promise. The tutors provided both socioemotional and instructional support: creating rapport, building motivation, and scaffolding instruction. The researchers concluded that a well-designed CAI program can be effective and motivational for young learners when supported by an adult tutor, an arrangement they termed the tutor/child/computer triangle. Chambers and colleagues’ (2011) work provides an additional template. Struggling readers in grades 1 and 2 were systematically supported with a combination of tutors and CAI and non-CAI instructional reading activities, both in small groups and individually, adhering to the Tier I and Tier II levels of the Response to Intervention framework (U.S. Department of Education, 2005).

Although volunteers and well-researched CAI cannot, and should not, replace teachers, a combined approach, highly sensitive to the role of interactions between teacher and students and between students, in which learning technology provides instructional content, is promising. Although technology has fully entered reading classrooms, our focus should always prioritize teacher–student



interactions, whether the interactions surround technology or books. McTigue and Uppstad (2019) provided recommendations to enact such principles for CAI in a classroom setting.

Limitations

This review only documents one educational serious game and thus is limited in generalizability to other serious games. However, we argue that controlling the type of game allows us to more readily consider issues of context and thus uniquely add to the research. Despite all studies using GG, it is important to note that it is not a singular game but a learning platform that varies in both content and playing features. Thus, although the variation does not create a situation of comparing the proverbial apples and oranges, it arguably creates a situation of comparing Granny Smith and McIntosh apples. However, any concerns related to game variability were attenuated by our rich descriptions of the studies and use of the random-effects model in the meta-analysis. In a subset of the reviewed studies, GG was implemented for a short duration, which may reduce any measured effect. However, the shortest duration included in the meta-analysis was 90–120 total minutes across six sessions.

Additionally, duration was not found to be a statistically significant moderator. However, before transfer of skills can occur, students at risk for dyslexia may need a longer, sustained duration of interventions than students who are not at risk need (Lyon & Moats, 1997). Finally, because of a lack of process data, we can identify that adult interaction was a statistically significant moderator, but researchers provided little information on the types of interactions.

Conclusion

In this review, we first documented GG’s lack of evidence for reading improvement when implemented as recommended (an independent CAI tutorial), which raises the question of how this serious game’s global popularity has arguably exceeded its efficacy as a reading intervention. Such a disconnect is particularly troubling when considering the scarce resources for education in low-literacy nations investing in GG. We argue that the narrow use of research methodologies and the limited application of learning theories created a context for this disconnect to occur. The preponderance of special education and linguistic theory, as well as quantitative research design, resulted in an underrepresentation of critical and affective theories and qualitative and mixed-methods research, thus limiting our knowledge. This methodological imbalance may reflect an epistemological belief that undergirds special education research more generally.

Referencing the GG developers’ goal of empowering marginalized groups with limited access to literacy instruction, we recognize innovative technological solutions as noble and meaningful. However, operationalizing such attempts may require revisiting epistemological beliefs to more fully recognize that learning to read is a complex event embedded within systems of classrooms, families, and cultures, which requires interaction with others. Therefore, in the manner that medical researchers have begun to broaden their repertoire of methodologies (Curry, Nembhard, & Bradley, 2009), research with struggling readers may also benefit from moves toward greater plurality of methodology. As Ferri, Gallagher, and Connor (2011) explained, in an effort to determine what works, we may lose sight of what matters.

NOTES

This work was supported by a grant (237861) from the Research Council of Norway’s research program FINNUT. We are grateful to the GraphoGame developers and researchers, Drs. Heikki Lyytinen and Ulla Richardson, for providing valuable feedback on this manuscript.

REFERENCES

References marked with an asterisk indicate studies included in the narrative review.

Agora Center. (n.d.). UNESCO Chair on inclusive literacy learning for all. Jyväskylä, Finland: Agora Center, University of Jyväskylä. Retrieved from https://agoracenter.jyu.fi/projects/unesco-chair
Amiel, T., & Reeves, T.C. (2008). Design-based research and educational technology: Rethinking technology and the research agenda. Journal of Educational Technology & Society, 11(4), 29–40.
*Bach, S., Richardson, U., Brandeis, D., Martin, E., & Brem, S. (2013). Print-specific multimodal brain activation in kindergarten improves prediction of reading skills in second grade. NeuroImage, 82, 605–615. https://doi.org/10.1016/j.neuroimage.2013.05.062
Baddeley, A. (2012). Working memory: Theories, models, and controversies. Annual Review of Psychology, 63, 1–29. https://doi.org/10.1146/annurev-psych-120710-100422
*Baker, D.L., Basaraba, D.L., Smolkowski, K., Conry, J., Hautala, J., Richardson, U., … Cole, R. (2017). Exploring the cross-linguistic transfer of reading skills in Spanish to English in the context of a computer adaptive reading intervention. Bilingual Research Journal, 40(2), 222–239. https://doi.org/10.1080/15235882.2017.1309719
Baye, A., Inns, A., Lake, C., & Slavin, R.E. (2018). A synthesis of quantitative research on reading programs for secondary students. Reading Research Quarterly, 54(2), 133–166. https://doi.org/10.1002/rrq.229
*Bhide, A., Power, A., & Goswami, U. (2013). A rhythmic musical intervention for poor readers: A comparison of efficacy with a letter-based intervention. Mind, Brain, and Education, 7(2), 113–123. https://doi.org/10.1111/mbe.12016
Borenstein, M., Hedges, L., Higgins, J., & Rothstein, H. (2005). Comprehensive Meta-Analysis (Version 3) [Computer software]. Englewood, NJ: Biostat.
Borenstein, M., Hedges, L.V., Higgins, J., & Rothstein, H.R. (2010). A basic introduction to fixed-effect and random-effects models for meta-analysis. Research Synthesis Methods, 1(2), 97–111. https://doi.org/10.1002/jrsm.12



*Borleffs, E., Glatz, T.K., Daulay, D.A., Richardson, U., Zwarts, F., & Maassen, B.A. (2018). GraphoGame SI: The development of a technology-enhanced literacy learning tool for Standard Indonesian. European Journal of Psychology of Education, 33(4), 595–613. https://doi.org/10.1007/s10212-017-0354-9
*Brem, S., Bach, S., Kucian, K., Kujala, J.V., Guttorm, T.K., Martin, E., … Richardson, U. (2010). Brain sensitivity to print emerges when children learn letter–speech sound correspondences. Proceedings of the National Academy of Sciences of the United States of America, 107(17), 7939–7944. https://doi.org/10.1073/pnas.0904402107
Bruner, J.S. (1966). Toward a theory of instruction. Cambridge, MA: Harvard University Press.
Cain, K., & Parrila, R. (2014). Introduction to the special issue. Theories of reading: What we have learned from two decades of scientific research. Scientific Studies of Reading, 18(1), 1–4. https://doi.org/10.1080/10888438.2013.836525
Campuzano, L., Dynarski, M., Agodini, R., & Rall, K. (2009). Effectiveness of reading and mathematics software products: Findings from two student cohorts (NCEE 2009-4041). Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education.
*Carvalhais, L., Richardson, U., & Castro, S.L. (2018). Computer-assisted reading and spelling intervention with Graphogame Fluent Portuguese. In Á. Rocha, H. Adeli, L. Reis, & S. Costanzo (Eds.), WorldCIST’18 2018: Vol. 2. Trends and advances in information systems and technologies (pp. 1452–1460). Cham, Switzerland: Springer.
Chambers, B., Slavin, R.E., Madden, N.A., Abrami, P., Logan, M.K., & Gifford, R. (2011). Small-group, computer-assisted tutoring to improve reading outcomes for struggling first and second graders. The Elementary School Journal, 111(4), 625–640. https://doi.org/10.1086/659035
Cheung, A.C., & Slavin, R.E. (2012). How features of educational technology applications affect student reading outcomes: A meta-analysis. Educational Research Review, 7(3), 198–215. https://doi.org/10.1016/j.edurev.2012.05.002
Cheung, A.C., & Slavin, R.E. (2013). Effects of educational technology applications on reading outcomes for struggling readers: A best-evidence synthesis. Reading Research Quarterly, 48(3), 277–299. https://doi.org/10.1002/rrq.50
Clark, D.B., Tanner-Smith, E.E., & Killingsworth, S.S. (2016). Digital games, design, and learning: A systematic review and meta-analysis. Review of Educational Research, 86(1), 79–122. https://doi.org/10.3102/0034654315582065
Coltheart, M., Rastle, K., Perry, C., Langdon, R., & Ziegler, J. (2001). DRC: A dual route cascaded model of visual word recognition and reading aloud. Psychological Review, 108(1), 204–256. https://doi.org/10.1037/0033-295X.108.1.204
Compton, D.L., Miller, A.C., Elleman, A.M., & Steacy, L.M. (2014). Have we forsaken reading theory in the name of “quick fix” interventions for children with reading disability? Scientific Studies of Reading, 18(1), 55–73. https://doi.org/10.1080/10888438.2013.836200
Cook, B., Buysse, V., Klingner, J., Landrum, T., McWilliam, R., Tankersley, M., & Test, D. (2014). Standards for evidence-based practices in special education. Arlington, VA: Council for Exceptional Children.
Curry, L.A., Nembhard, I.M., & Bradley, E.H. (2009). Qualitative and mixed methods provide unique contributions to outcomes research. Circulation, 119(10), 1442–1452. https://doi.org/10.1161/CIRCULATIONAHA.107.742775
Dane, A.V., & Schneider, B.H. (1998). Program integrity in primary and early secondary prevention: Are implementation effects out of control? Clinical Psychology Review, 18(1), 23–45. https://doi.org/10.1016/S0272-7358(97)00043-3
Duval, S., & Tweedie, R. (2000). Trim and fill: A simple funnel-plot–based method of testing and adjusting for publication bias in meta-analysis. Biometrics, 56(2), 455–463. https://doi.org/10.1111/j.0006-341X.2000.00455.x
Dynarski, M., Agodini, R., Heaviside, S., Novak, T., Carey, N., Campuzano, L., … Sussex, W. (2007). Effectiveness of reading and mathematics software products: Findings from the first student cohort (NCEE 2007-4005). Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education.
*Ecochard, S. (2015). Learning to read with Graphogame, an ethnography in a Peruvian rural school (Unpublished doctoral dissertation). University of Jyväskylä, Finland.
Egger, M., Smith, G.D., Schneider, M., & Minder, C. (1997). Bias in meta-analysis detected by a simple, graphical test. British Medical Journal, 315, 629–634. https://doi.org/10.1136/bmj.315.7109.629
Ehri, L.C. (2005). Learning to read words: Theory, findings, and issues. Scientific Studies of Reading, 9(2), 167–188. https://doi.org/10.1207/s1532799xssr0902_4
Eisenhart, M. (2001). Changing conceptions of culture and ethnographic methodology: Recent thematic shifts and their implications for research on teaching. In V. Richardson (Ed.), The handbook of research on teaching (4th ed., pp. 209–225). Washington, DC: American Educational Research Association.
Eisenhart, M. (2006). Qualitative science in experimental time. International Journal of Qualitative Studies in Education, 19(6), 697–707. https://doi.org/10.1080/09518390600975826
Ellis, N.C., Natsume, M., Stavropoulou, K., Hoxhallari, L., Daal, V.H., Polyzoe, N., … Petalas, M. (2004). The effects of orthographic depth on learning to read alphabetic, syllabic, and logographic scripts. Reading Research Quarterly, 39(4), 438–468. https://doi.org/10.1598/RRQ.39.4.5
Eng, T.S. (2005). The impact of ICT on learning: A review of research. International Education Journal, 6(5), 635–650.
Ferri, B.A., Gallagher, D., & Connor, D.J. (2011). Pluralizing methodologies in the field of LD: From “what works” to what matters. Learning Disability Quarterly, 34(3), 222–231. https://doi.org/10.1177/0731948711419276
Fogarty, M., Oslund, E., Simmons, D., Davis, J., Simmons, L., Anderson, L., … Roberts, G. (2014). Examining the effectiveness of a multicomponent reading comprehension intervention in middle schools: A focus on treatment fidelity. Educational Psychology Review, 26(3), 425–449. https://doi.org/10.1007/s10648-014-9270-6
Geraldi, J., Maylor, H., & Williams, T. (2011). Now, let’s make it really complex (complicated): A systematic review of the complexities of projects. International Journal of Operations & Production Management, 31(9), 966–990. https://doi.org/10.1108/01443571111165848
Gobet, F., & Wood, D. (1999). Expertise, models of learning and computer-based tutoring. Computers & Education, 33(2/3), 189–207. https://doi.org/10.1016/S0360-1315(99)00032-9
Goswami, U. (2003). Why theories about developmental dyslexia require developmental designs. Trends in Cognitive Sciences, 7(12), 534–540. https://doi.org/10.1016/j.tics.2003.10.003
Gough, P.B., & Tunmer, W.E. (1986). Decoding, reading, and reading disability. Remedial and Special Education, 7(1), 6–10. https://doi.org/10.1177/074193258600700104
GraphoLearn. (n.d.a). GraphoLearn. Retrieved from https://info.grapholearn.com/
GraphoLearn. (n.d.b). Partners. Retrieved from https://info.grapholearn.com/partners/
GraphoWORLD. (2010). Declaration on establishing a Language/Literacy Network of Excellence. Retrieved from http://info.grapholearn.com/wp-uploads/2015/02/GraphoWORLD-Declaration_2010_2015.pdf



*Heikkilä, R., Aro, M., Närhi, V., Westerholm, J., & Ahonen, T. (2013). Does training in syllable recognition improve reading speed? A computer-based trial with poor readers from second and third grade. Scientific Studies of Reading, 17(6), 398–414. https://doi.org/10.1080/10888438.2012.753452
*Hintikka, S., Aro, M., & Lyytinen, H. (2005). Computerized training of the correspondences between phonological and orthographic units. Written Language and Literacy, 8(2), 79–102. https://doi.org/10.1075/wll.8.2.07hin
*Hintikka, S., Landerl, K., Aro, M., & Lyytinen, H. (2008). Training reading fluency: Is it important to practice reading aloud and is generalization possible? Annals of Dyslexia, 58(1), 59–79. https://doi.org/10.1007/s11881-008-0012-7
Hoover, W.A., & Gough, P.B. (1990). The simple view of reading. Reading and Writing, 2(2), 127–160. https://doi.org/10.1007/BF00401799
*Huemer, S., Landerl, K., Aro, M., & Lyytinen, H. (2008). Training reading fluency among poor readers of German: Many ways to the goal. Annals of Dyslexia, 58(2), 115–137. https://doi.org/10.1007/s11881-008-0017-2
*Jere-Folotiya, J., Chansa-Kabali, T., Munachaka, J.C., Sampa, F., Yalukanda, C., Westerholm, J., & Lyytinen, H. (2014). The effect of using a mobile literacy game to improve literacy levels of grade one students in Zambian schools. Educational Technology Research and Development, 62(4), 417–436. https://doi.org/10.1007/s11423-014-9342-9
*Kamykowska, J., Haman, E., Latvala, J.M., Richardson, U., & Lyytinen, H. (2014). Developmental changes of early reading skills in six-year-old Polish children and GraphoGame as a computer-based intervention to support them. L1-Educational Studies in Language and Literature, 13, 1–17. https://doi.org/10.17239/l1esll-2013.01.05
Katz, L., & Frost, R. (1992). The reading process is different for different orthographies: The orthographic depth hypothesis. In R. Frost & L. Katz (Eds.), Advances in psychology: Vol. 94. Orthography, phonology, morphology, and meaning (pp. 67–84). Amsterdam, Netherlands: North-Holland.
Khan, K., Riet, G., Popay, J., Nixon, J., & Kleijnen, J. (2001). Study quality assessment: Undertaking systematic reviews of research effectiveness, CRD’s guidance for those carrying out or commissioning reviews. York, UK: NHS Centre for Reviews and Dissemination.
Klingner, J.K., & Boardman, A.G. (2011). Addressing the “research gap” in special education through mixed methods. Learning Disability Quarterly, 34(3), 208–218. https://doi.org/10.1177/0731948711417559
*Koikkalainen, M. (2015). Computerized reading fluency assessment: Task validity and the strongest discriminators of fluency skills among second-graders (Unpublished master’s thesis). University of Jyväskylä, Finland.
*Kyle, F., Kujala, J., Richardson, U., Lyytinen, H., & Goswami, U. (2013). Assessing the effectiveness of two theoretically motivated computer-assisted reading interventions in the United Kingdom: GG Rime and GG Phoneme. Reading Research Quarterly, 48(1), 61–76. https://doi.org/10.1002/rrq.038
LaBerge, D., & Samuels, S.J. (1974). Toward a theory of automatic information processing in reading. Cognitive Psychology, 6(2), 293–323. https://doi.org/10.1016/0010-0285(74)90015-2
Lai, S.-L., Chang, T.-S., & Ye, R. (2006). Computer usage and reading in elementary schools: A cross-cultural study. Journal of Educational Computing Research, 34(1), 47–66. https://doi.org/10.2190/95BG-4XDR-MJWD-KARA
Landerl, K., Ramus, F., Moll, K., Lyytinen, H., Leppänen, P.H., Lohvansuu, K., … Kunze, S. (2013). Predictors of developmental dyslexia in European orthographies with varying complexity. Journal of Child Psychology and Psychiatry, 54(6), 686–694. https://doi.org/10.1111/jcpp.12029
Lawless, K.A. (2016). Educational technology: False profit or sacrificial lamb? A review of policy, research, and practice. Policy Insights From the Behavioral and Brain Sciences, 3(2), 169–176. https://doi.org/10.1177/2372732216630328
Lee, Y.-H., Waxman, H., Wu, J.-Y., Michko, G., & Lin, G. (2013). Revisit the effect of teaching and learning with technology. Journal of Educational Technology & Society, 16(1), 133–146.
Leemkuil, H., & de Jong, T. (2011). Instructional support in games. In S. Tobias & J.D. Fletcher (Eds.), Computer games and instruction (pp. 353–369). Charlotte, NC: Information Age.
Livingstone, S. (2012). Critical reflections on the benefits of ICT in education. Oxford Review of Education, 38(1), 9–24. https://doi.org/10.1080/03054985.2011.577938
*Lovio, R., Halttunen, A., Lyytinen, H., Näätänen, R., & Kujala, T. (2012). Reading skill and neural processing accuracy improvement after a 3-hour intervention in preschoolers with difficulties in reading-related skills. Brain Research, 1448, 42–55. https://doi.org/10.1016/j.brainres.2012.01.071
Lyon, G.R., & Moats, L.C. (1997). Critical conceptual and methodological considerations in reading intervention research. Journal of Learning Disabilities, 30(6), 578–588. https://doi.org/10.1177/002221949703000601
Lyytinen, H., Erskine, J., Kujala, J., Ojanen, E., & Richardson, U. (2009). In search of a science-based application: A learning tool for reading acquisition. Scandinavian Journal of Psychology, 50(6), 668–675. https://doi.org/10.1111/j.1467-9450.2009.00791.x
Lyytinen, H., Erskine, J., Tolvanen, A., Torppa, M., Poikkeus, A.M., & Lyytinen, P. (2006). Trajectories of reading development: A follow-up from birth to school age of children with and without risk for dyslexia. Merrill-Palmer Quarterly, 52(3), 514–546. https://doi.org/10.1353/mpq.2006.0031
Lyytinen, H., & Richardson, U. (2014). Supporting urgent basic reading skills in children in Africa and around the world. Human Technology, 10(1), 1–4. https://doi.org/10.17011/ht/urn.201405281856
Ma, W., Adesope, O.O., Nesbit, J.C., & Liu, Q. (2014). Intelligent tutoring systems and learning outcomes: A meta-analysis. Journal of Educational Psychology, 106(4), 901–918. https://doi.org/10.1037/a0037123
McCambridge, J., Witton, J., & Elbourne, D.R. (2014). Systematic review of the Hawthorne effect: New concepts are needed to study research participation effects. Journal of Clinical Epidemiology, 67(3), 267–277. https://doi.org/10.1016/j.jclinepi.2013.08.015
McCrudden, M.T., Marchand, G., & Schutz, P. (2019). Mixed methods in educational psychology inquiry. Contemporary Educational Psychology, 57, 1–8. https://doi.org/10.1016/j.cedpsych.2019.01.008
McTigue, E.M., & Uppstad, P.H. (2019). Getting serious about serious games: Best practices for computer games in reading classrooms. The Reading Teacher, 72(4), 453–461. https://doi.org/10.1002/trtr.1737
Miller, D.M., Scott, C.E., & McTigue, E.M. (2018). Writing in the secondary-level disciplines: A systematic review of context, cognition, and content. Educational Psychology Review, 30(1), 83–120. https://doi.org/10.1007/s10648-016-9393-z
Moher, D., Liberati, A., Tetzlaff, J., Altman, D.G., & The PRISMA Group. (2009). Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement. PLoS Medicine, 6(7), e1000097. https://doi.org/10.1371/journal.pmed.1000097
*Mourgues, C., Tan, M., Hein, S., Ojanen, E., Reich, J., Lyytinen, H., & Grigorenko, E.L. (2016). Paired associate learning tasks and their contribution to reading skills. Learning and Individual Differences, 46, 54–63. https://doi.org/10.1016/j.lindif.2014.12.003
*Nakeva von Mentzer, C., Lyxell, B., Sahlén, B., Dahlström, Ö., Lindgren, M., Ors, M., … Uhlén, I. (2014). Computer-assisted reading intervention with a phonics approach for children using cochlear implants or hearing aids. Scandinavian Journal of Psychology, 55(5), 448–455. https://doi.org/10.1111/sjop.12149



National Institute of Child Health and Human Development. (2000). *Saine, N.L., Lerkkanen, M.K., Ahonen, T., Tolvanen, A., & Lyytinen,
Report of the National Reading Panel. Teaching children to read: An H. (2011). Computer-­ assisted remedial reading intervention for
evidence-based assessment of the scientific research literature on school beginners at risk for reading disability. Child Development,
reading and its implications for reading instruction (NIH Publication 82(3), 1013–1028. https://doi.org/10.1111/j.1467-8624.2011.01580.x
No. 00-4769). Washington, DC: U.S. Government Printing *Saine, N.L., Lerkkanen, M.K., Ahonen, T., Tolvanen, A., & Lyytinen,
Office. H. (2013). Long-­term intervention effects of spelling development
*Ngorosho, D. (2018). Enhancing the acquisition of basic reading for children with compromised preliteracy skills. Reading &
skills in Kiswahili using GraphoGame. Papers in Education and Writing Quarterly, 29(4), 333–357. https://doi.org/10.1080/105735
Development, 35. Retrieved from http://journals.udsm.ac.tz/index. 69.2013.741962
php/ped/article/view/1488 Schmid, R.F., Miodrag, N., & Di Francesco, N. (2008). A human-­
Ojanen, E., Ronimus, M., Ahonen, T., Chansa-Kabali, T., February, P., computer partnership: The tutor/child/computer triangle promot-
Jere-Folotiya, J., … Puhakka, S. (2015). GraphoGame—a catalyst for ing the acquisition of early literacy skills. Journal of Research on
multi-­level promotion of literacy in diverse contexts. Frontiers in Technology in Education, 41(1), 63–84. https://doi.org/10.1080/15
Psychology, 6, article 671. https://doi.org/10.3389/fpsyg.2015.00671 391523.2008.10782523
*Oksanen, L. (2010). The effect of language environment on learning Schwarzer, G., Carpenter, J.R., & Rücker, G. (2015). Meta-analysis
the letter–sound correspondence through GraphoGame: Case stories with R. Cham, Switzerland: Springer.
of eight immigrant children in Finland (Unpublished master’s the- Seymour, P.H.K., Aro, M., & Erskine, J.M. (2003). Foundation literacy
sis). University of Jyväskylä, Finland. acquisition in European orthographies. British Journal of Psychology,
94(2), 143–174. https://doi.org/10.1348/000712603321661859
Parr, J.M., & Fung, I. (2000). A review of the literature on computer-assisted learning, particularly integrated learning systems, and outcomes with respect to literacy and numeracy. Wellington, New Zealand: Ministry of Education.
*Patel, P. (2018). GraphoLearn India: The effectiveness of a computer-assisted reading intervention in supporting English readers in India (Unpublished master’s thesis). University of Jyväskylä, Finland.
Petticrew, M., & Roberts, H. (2006). Systematic reviews in the social sciences: A practical guide. Malden, MA: Blackwell.
Renkl, A., Mandl, H., & Gruber, H. (1996). Inert knowledge: Analyses and remedies. Educational Psychologist, 31(2), 115–121. https://doi.org/10.1207/s15326985ep3102_3
Richardson, U., & Lyytinen, H. (2014). The GraphoGame method: The theoretical and methodological background of the technology-enhanced learning environment for learning to read. Human Technology, 10(1), 39–60. https://doi.org/10.17011/ht/urn.201405281859
*Ronimus, M., Kujala, J., Tolvanen, A., & Lyytinen, H. (2014). Children’s engagement during digital game-based learning of reading: The effects of time, rewards, and challenge. Computers & Education, 71, 237–246. https://doi.org/10.1016/j.compedu.2013.10.008
*Ronimus, M., & Lyytinen, H. (2015). Is school a better environment than home for digital game-based learning? The case of GraphoGame. Human Technology, 11(2), 123–147. https://doi.org/10.17011/ht/urn.201511113637
Ronimus, M., & Richardson, U. (2014). Digital game-based training of early reading skills: Overview of the GraphoGame method in a highly transparent orthography. Estudios de Psicología, 35(3), 648–661. https://doi.org/10.1080/02109395.2014.974424
*Rosas, R., Escobar, J.P., Ramírez, M.P., Meneses, A., & Guajardo, A. (2017). Impact of a computer-based intervention in Chilean children at risk of manifesting reading difficulties. Infancia y Aprendizaje, 40(1), 158–188. https://doi.org/10.1080/02103702.2016.1263451
Ryan, R.M., & Deci, E.L. (2000). Self-determination theory and the facilitation of intrinsic motivation, social development, and well-being. American Psychologist, 55(1), 68–78. https://doi.org/10.1037/0003-066X.55.1.68
Sadoski, M., & Paivio, A. (2007). Toward a unified theory of reading. Scientific Studies of Reading, 11(4), 337–356. https://doi.org/10.1080/10888430701530714
*Saine, N.L., Lerkkanen, M.K., Ahonen, T., Tolvanen, A., & Lyytinen, H. (2010). Predicting word-level reading fluency outcomes in three contrastive groups: Remedial and computer-assisted remedial reading intervention, and mainstream instruction. Learning and Individual Differences, 20(5), 402–414. https://doi.org/10.1016/j.lindif.2010.06.004
Seymour, P.H.K., & Duncan, L.G. (1997). Small versus large unit theories of reading acquisition. Dyslexia, 3(3), 125–134. https://doi.org/10.1002/(SICI)1099-0909(199709)3:3<125:AID-DYS85>3.0.CO;2-4
Slavin, R.E. (2002). Evidence-based education policies: Transforming educational practice and research. Educational Researcher, 31(7), 15–21. https://doi.org/10.3102/0013189X031007015
Slavin, R.E., Lake, C., Davis, S., & Madden, N.A. (2011). Effective programs for struggling readers: A best-evidence synthesis. Educational Research Review, 6(1), 1–26. https://doi.org/10.1016/j.edurev.2010.07.002
Smeyers, P., & Verhesschen, P. (2001). Narrative analysis as philosophical research: Bridging the gap between the empirical and the conceptual. International Journal of Qualitative Studies in Education, 14(1), 71–84. https://doi.org/10.1080/09518390010007629
Snowling, M. (1998). Dyslexia as a phonological deficit: Evidence and implications. Child Psychology and Psychiatry Review, 3(1), 4–11. https://doi.org/10.1017/S1360641797001366
Tashakkori, A., & Creswell, J.W. (2007). The new era of mixed methods. Journal of Mixed Methods Research, 1(1), 3–7. https://doi.org/10.1177/2345678906293042
Torgerson, C.J. (2007). The quality of systematic reviews of effectiveness in literacy learning in English: A ‘tertiary’ review. Journal of Research in Reading, 30(3), 287–315. https://doi.org/10.1111/j.1467-9817.2006.00318.x
Torppa, M., Georgiou, G.K., Lerkkanen, M.-K., Niemi, P., Poikkeus, A.-M., & Nurmi, J.-E. (2016). Examining the simple view of reading in a transparent orthography: A longitudinal study from kindergarten to grade 3. Merrill-Palmer Quarterly, 62(2), 179–206. https://doi.org/10.13110/merrpalmquar1982.62.2.0179
Tracey, D.H., & Morrow, L.M. (2017). Lenses on reading: An introduction to theories and models. New York, NY: Guilford.
U.S. Department of Education. (2005). Additional procedures for evaluating children with specific learning disabilities. Federal Register, 70(118), 35802–35803.
Vogel, J.J., Vogel, D.S., Cannon-Bowers, J., Bowers, C.A., Muse, K., & Wright, M. (2006). Computer gaming and interactive simulations for learning: A meta-analysis. Journal of Educational Computing Research, 34(3), 229–243. https://doi.org/10.2190/FLHV-K4WA-WPVQ-H0YM
Wolf, M., & Bowers, P.G. (1999). The double-deficit hypothesis for the developmental dyslexias. Journal of Educational Psychology, 91(3), 415–438. https://doi.org/10.1037/0022-0663.91.3.415
Wood, D., Underwood, J., & Avis, P. (1999). Integrated learning systems in the classroom. Computers & Education, 33(2/3), 91–108. https://doi.org/10.1016/S0360-1315(99)00027-5


*Worth, J., Nelson, J., Harland, J., Bernardinelli, D., & Styles, B. (2018). GraphoGame Rime: Evaluation report and executive summary. Slough, UK: National Foundation for Educational Research.
Wouters, P., van Nimwegen, C., van Oostendorp, H., & van der Spek, E.D. (2013). A meta-analysis of the cognitive and motivational effects of serious games. Journal of Educational Psychology, 105(2), 249–265. https://doi.org/10.1037/a0031311
Wouters, P., & van Oostendorp, H. (2013). A meta-analytic review of the role of instructional support in game-based learning. Computers & Education, 60(1), 412–425. https://doi.org/10.1016/j.compedu.2012.07.018
Yang, X., Kuo, L.-J., Ji, X., & McTigue, E. (2018). A critical examination of the relationship among research, theory, and practice: Technology and reading instruction. Computers & Education, 125, 62–73. https://doi.org/10.1016/j.compedu.2018.03.009
*Young, M.F., Slota, S., Cutter, A.B., Jalette, G., Mullin, G., Lai, B., … Yukhymenko, M. (2012). Our princess is in another castle: A review of trends in serious gaming for education. Review of Educational Research, 82(1), 61–89. https://doi.org/10.3102/0034654312436980
Ziegler, J.C., & Goswami, U. (2005). Reading acquisition, developmental dyslexia, and skilled reading across languages: A psycholinguistic grain size theory. Psychological Bulletin, 131(1), 3–29. https://doi.org/10.1037/0033-2909.131.1.3

Submitted August 30, 2018
Final revision received March 23, 2019
Accepted March 29, 2019

ERIN M. MCTIGUE (corresponding author) is a research scientist and associate professor II at the National Centre for Reading Education and Reading Research at the University of Stavanger, Norway; email erin.m.mctigue@uis.no. Her work focuses on addressing the needs of struggling readers, including motivation and classroom processes, as well as disciplinary literacy for elementary school students.

ODDNY J. SOLHEIM is a professor at the National Centre for Reading Education and Reading Research at the University of Stavanger, Norway; email oddny.j.solheim@uis.no. Her areas of expertise include assessment of reading, early reading intervention, difficulties with reading and writing, and reading motivation.

WENDI ZIMMER is currently a visiting assistant professor at Texas A&M University, College Station, USA; email wzimmer@tamu.edu. Her research focuses on teachers’ knowledge and use of technology for teaching reading and writing.

PER HENNING UPPSTAD is a professor at the National Centre for Reading Education and Reading Research at the University of Stavanger, Norway; email per.h.uppstad@uis.no. His academic background in linguistics and phonetics provides a foundation for his current research areas in reading and writing attainment, with a focus on at-risk learners, dyslexia, psycholinguistics, and second-language attainment.

APPENDIX
Methodological Quality Questionnaire for Screening Studies
Standard 1: Provides a clear argument that links theory and research and demonstrates a coherent chain of reasoning; explicates theoretical and previous research in a way that builds the formulation of the question(s)
Quality criteria:
1.1. Explicates theory and/or previous research in a way that builds the formulation of the question; poses a question/purpose/objective that can be investigated empirically
1.2. Explicitly links findings to previous theory and research

Standard 2: Applies rigorous, systematic, and objective methodology to obtain reliable and valid knowledge relevant to educational activities and programs
Quality criteria:
2.1. Ensures that methods are presented in sufficient detail and clarity to clearly visualize procedures (another person could actually collect the same data): Data collection should be described so readers can replicate the procedures in a quantitative study and follow the trail of data analysis in a qualitative study. For qualitative studies, the researchers should report some of the following: number of observations, interviews, or documents analyzed; if interviews and observations are taped and/or transcribed; the duration of the observations; the diversity of material analyzed; and the degree of the investigators’ involvement in the data collection and analysis.
2.2. Was evidence of reliability provided for data collected? Information about instrument development and adaptations for specialized populations is provided. For qualitative studies, were trustworthiness, credibility, and/or dependability addressed and reported?
2.3. Was evidence of validity provided for data collected (e.g., does instrumentation measure what it is designed to measure and accurately perform the function?)? Information about instrument development and adaptations for specialized populations is provided. For qualitative studies, were trustworthiness, credibility, and/or dependability addressed and reported?
2.4. Describes participants and has a well-characterized sample
2.5. Provides evidence of implementation fidelity

Standard 3: Presents findings and makes claims that are appropriate to and supported by the methods that have been employed.
Quality criteria:
3.1. Findings and conclusions are legitimate or consistent with data collected.

