Schaefer - Analysing Early Grade Reading Data
Table of Contents
1. Introduction
2. Task types
   Constructs measured
   Untimed and timed tasks
3. Scoring
   Scoring untimed tasks
   Scoring timed tasks
   What scores are used in analyses by researchers and project evaluators?
References
Introduction
I have worked on a few early grade reading projects that use the early grade reading assessment
(EGRA) to collect data on letter recognition, word reading, oral reading fluency and reading
comprehension. A former colleague asked me to write a short overview of which data matter for
analysis and reporting. While writing, I decided to also describe considerations around
administering EGRA, as well as around choosing response options for untimed tasks. For this
review, I draw on my experience as a researcher who has used EGRA in my own research, and as a
research consultant who prepared EGRA tasks and data collection for clients.
This review DOES NOT explain what EGRA is and how it is used in research, monitoring and
evaluation. For an overview of the theoretical underpinnings of EGRA and the tasks that are
available, please see Dubeck and Gove (2015).
Small-scale projects (such as student research) most likely use paper-based methods, while
large-scale projects (such as project evaluations, e.g. Funda Wande) are more likely to use online
methods. There are more advantages than disadvantages associated with online data collection;
however, most researchers do not have the financial resources or know-how to conduct EGRA
electronically. The small samples most researchers work with may also mean that online form
programming and the associated assessor training are too burdensome.
Table 1. Advantages and disadvantages of scoring EGRA on paper and online

Paper (you can download an example paper-based EGRA from my OSF project)

Advantages:
● Assessors don't have to be computer literate (less training needed).
● Paper is less interesting to children, so may distract them less.
● Scoring can be quality controlled electronically. Scores can be checked by a third party.

Disadvantages:
● Costly to print pages, and requires pencils/pens and a timer/cellphone.
● Documents can go missing before data is captured electronically.
● Mistakes can be introduced when presenting questions for text reading.
● Mistakes can be introduced in data capturing (but double capturing can address this concern).
● Immediate data monitoring cannot take place.
● Often, only final scores are captured (e.g. total number of words incorrect), rather than exactly
which words were read incorrectly.
● Discontinuation (stop-criterion) rules have to be manually applied.

Online (some examples of using Tangerine are provided by RTI International on their website)

Advantages:
● Data is immediately saved and cannot go missing.
● Data can be checked as it comes in.
● As the timed aspects are programmed into the tablet, timed tasks may be more accurate.
● Data output is standardised across projects if the same basic task structure is followed in the
programming.
● More nuanced data is collected, such as exactly which words were read incorrectly.
● Only relevant comprehension questions are presented, depending on how far each child read.
● Discontinuation (stop-criterion) rules can be automatically applied.

Disadvantages:
● Requires more extensive training in how to use the app.
● Technical glitches do occur.
● Initial cost to purchase tablets may be prohibitive.
● Requires knowledge of backend programming of the tasks.
2. Task types
Constructs measured
In South Africa (SA), the most often used tasks are:
- Rapid object naming
- Letter recognition
- Word reading
- Text reading
- Reading comprehension
- Listening comprehension
- Phonological awareness
At a minimum, EGRA tasks in SA include letter recognition fluency, text reading fluency (most often
called oral reading fluency (ORF)) and reading comprehension.
Letter recognition fluency (the easiest task) is included even when EGRA is administered in later
grades, as it allows children to demonstrate what reading skill they do have (i.e. avoiding floor
effects). Research on benchmarks indicates that children need to read at least 40 letter units a
minute to be able to understand some of what they read (Ardington et al., 2021). ORF and reading
comprehension are the main outcomes on which research focuses.
3. Scoring
Scoring untimed tasks
Table 2. Examples and considerations for deciding on response options for untimed tasks
Scoring timed tasks
Timed tasks in EGRA require the following information to be captured as raw data:
1. Which item(s) was/were read incorrectly (number incorrect)1
2. Last item read when the timer ran out (number attempted) OR total time taken if participant
completed before end of timer
Traditionally, the timed tasks were administered for 60 seconds, but nowadays, the timing can vary
(see example below). Nevertheless, the same data is collected at a minimum.
1 In paper-based administrations, this data is often summarised as the TOTAL NUMBER OF WORDS READ
INCORRECTLY, and this total is entered into spreadsheet software. Item-level data is often not recorded
in paper administrations of these timed tasks. For online software, item-level data (i.e. whether an item was
read correctly or not) is recorded, providing nuanced data for both primary and secondary analysts.
What scores are used in analyses by researchers and project evaluators?
Researchers will use the following scores in their analyses:
1. Untimed tasks:
   a. Item-level frequency tables
   b. Total items answered correctly
   AND/OR
   c. Total items answered correctly as a proportion of attempted questions
2. Timed tasks:
   a. Words correct per minute (derived from (number of items attempted minus number
      incorrect over the whole passage) divided by time taken in seconds, multiplied by 60)
   b. Accuracy (derived from (number attempted minus number incorrect) divided by
      number attempted)
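The two timed-task scores above can be sketched in a few lines of Python. The function names and the example child are my own illustration, not output from any particular EGRA tool:

```python
def wcpm(attempted: int, incorrect: int, seconds: float) -> float:
    """Words correct per minute: correct items scaled to a 60-second rate."""
    return (attempted - incorrect) / seconds * 60

def accuracy(attempted: int, incorrect: int) -> float:
    """Proportion of attempted items read correctly."""
    return (attempted - incorrect) / attempted

# A child who attempted 45 words, misread 5, and used the full 60 seconds:
print(wcpm(45, 5, 60))   # 40.0
print(accuracy(45, 5))   # roughly 0.89
```

Note that a child who finishes the passage early still gets credit for speed: 40 correct words in 30 seconds yields 80 wcpm.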
test_session == 2. You can always reformat the data into wide format in the analysis program you
use. For more on long vs wide data, see Reynolds et al. (2020).
Figure 1 shows how the responses to each question were captured (orf_comp_1_1 to orf_comp_1_5).
Here, the responses were coded as 1 (correct), 0 (incorrect), and 99 (nonresponse). Remember also
that participants are only presented questions based on how far they have read. Thus, row 4 (which
has blank observations) indicates those questions which were not presented (and so were not
attempted). It is useful to have different values for questions which were answered as nonresponse
and those which were not presented. This allows the analyst more options.
Additionally, the total comprehension score was calculated within the SurveyCTO form and
captured in calc_orf_1_comp_score. You can also calculate the total score in the script of your
analysis program. For example, analysts may use the percentage correct out of all questions
(divide by 5 in this example) OR the percentage correct out of questions asked (which would mean
dividing by 1 for the participant in row 4).
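Both scoring choices can be sketched as follows, assuming the coding described above: 1 (correct), 0 (incorrect), 99 (nonresponse), and None for a question that was not presented. The function name is my own:

```python
def comp_scores(responses):
    """Return (% correct of all questions, % correct of questions asked)."""
    correct = sum(1 for r in responses if r == 1)
    presented = sum(1 for r in responses if r is not None)
    pct_of_all = correct / len(responses) * 100
    pct_of_asked = correct / presented * 100 if presented else None
    return pct_of_all, pct_of_asked

# Participant in row 4: asked only the first question, answered it correctly.
row4 = [1, None, None, None, None]
print(comp_scores(row4))  # (20.0, 100.0)
```

Keeping 99 and None distinct is what makes the second denominator possible: a nonresponse counts as asked, a blank does not.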
SurveyCTO makes use of a plugin which automatically times the timed tasks, and records certain
metadata. The metadata captured for ORF tasks is presented in Figure 2 below. Here you can see
that this metadata is also labelled. orf_1_metadata_4 tells us how many items were attempted.
orf_1_metadata_5 tells us how many items were read incorrectly. The program automatically
calculates items correct in orf_1_metadata_6 but you should also calculate this value yourself after
cleaning the data as a way to check the scores.
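That consistency check can be as simple as the sketch below, using the metadata field names from the text (the record itself is a made-up example):

```python
def check_items_correct(record):
    """Items correct should equal items attempted minus items incorrect."""
    attempted = record["orf_1_metadata_4"]
    incorrect = record["orf_1_metadata_5"]
    reported_correct = record["orf_1_metadata_6"]
    return reported_correct == attempted - incorrect

rec = {"orf_1_metadata_4": 38, "orf_1_metadata_5": 3, "orf_1_metadata_6": 35}
print(check_items_correct(rec))  # True
```

Run this over every submitted record; any False flags a row where the automatic calculation and the raw item data disagree.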
Figure 2. Output for timed tasks using the plugin in SurveyCTO
● When the same child completes two ORF passages, expect a high correlation between the two
scores, at least above .75. In later grades, the correlation may be much higher.
Investigate any data points that are very far from the regression line or that seem to be
influencing the regression line. For example, it's unlikely that a child will score zero on the
first ORF then 60 wcpm on the second ORF. One of these data points seems to be incorrect.
○ When you have discrepancies, try to examine the participant’s scores for all tasks. A
non-reader will score zero or close to zero in all the tasks, and a child that can read
should be able to read in all the tasks. This is how the EGRA works.
○ When you find these discrepancies, you may decide to delete the participant from the
study, or to replace one or some of the scores with a missing value (R uses NA to
indicate missing values). Whatever you decide, record what you did in your data
cleaning script so you can always re-assess later.
● In large studies with multiple data collectors, you may want to examine the average scores
(and their standard deviation) provided by each data collector. If data collectors have been
randomly allocated to sites/participants their average scores should be quite similar in the
long run. Data collectors whose scores are very different from the group of data collectors
may need to be investigated and further supported. In the worst case, you may have to
discard the data submitted by some data collectors. Regularly inspecting incoming data will
help you to identify problematic data collectors, or faulty software!
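A minimal per-collector summary can be computed with the standard library; the record layout here (collector ID paired with a wcpm score) is my own illustration, and in practice the scores would come from your cleaned dataset:

```python
from collections import defaultdict
from statistics import mean, stdev

def collector_summaries(records):
    """Return {collector: (mean score, sd)} from (collector, score) pairs."""
    by_collector = defaultdict(list)
    for collector, score in records:
        by_collector[collector].append(score)
    return {c: (round(mean(s), 1), round(stdev(s), 1) if len(s) > 1 else None)
            for c, s in by_collector.items()}

records = [("A", 30), ("A", 42), ("A", 35), ("B", 5), ("B", 8), ("B", 3)]
print(collector_summaries(records))
```

With randomly allocated sites, collector B's much lower average in this toy data is exactly the kind of pattern worth investigating.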
References
Ardington, C., Wills, G., Pretorius, E., Mohohlwane, N., & Menendez, A. (2021). Benchmarking oral
reading fluency in the early grades in Nguni languages. International Journal of Educational
Development, 84, 102433. https://doi.org/10.1016/j.ijedudev.2021.102433
Dubeck, M. M., & Gove, A. (2015). The early grade reading assessment (EGRA): Its theoretical
foundation, purpose, and limitations. International Journal of Educational Development, 40,
315–322. https://doi.org/10.1016/j.ijedudev.2014.11.004
Reynolds, T., Schatschneider, C., & Logan, J. (2020). The Basics of Data Management (Version 2).
figshare. https://doi.org/10.6084/m9.figshare.13215350.v2