
Processing and Analysing Early Grade Reading Assessment (EGRA) Data: Reflections from a South African Researcher

Maxine Schaefer
https://orcid.org/0000-0002-5455-2762
mschaeferliteracy@gmail.com
October 2022

How to cite this document:

Table of Contents

Introduction

1. Administration: paper first or online first

2. Task types
   Constructs measured
   Untimed and timed tasks

3. Scoring
   Scoring untimed tasks
   Scoring timed tasks
   What scores are used in analyses by researchers and project evaluators?

4. Electronic storage of data
   Overview of the template
   How data is saved by SurveyCTO

5. Cleaning raw EGRA data

References

Introduction
I have worked on a few early grade reading projects that use the early grade reading assessment
(EGRA) to collect data on letter recognition, word reading, oral reading fluency and reading
comprehension. A former colleague asked me to write a short overview of which data matters for
analysis and reporting. While I was writing, I decided to also describe the considerations around
administration of EGRA, as well as considerations when it comes to including response options for
untimed tasks. For this review, I use my experience as a researcher who has used EGRA in my own
research, as well as my experience as a research consultant who prepared EGRA tasks and data
collection for clients.

This review DOES NOT explain what EGRA is and how it is used in research, monitoring and
evaluation. For an overview of the theoretical underpinnings of EGRA and the tasks that are
available, please see Dubeck and Gove (2015).

1. Administration: paper first or online first


EGRAs can be administered with pen and paper, with the assessor manually recording scores on a score sheet that is later captured electronically (e.g. typed directly into a spreadsheet), or via an app such as Tangerine or SurveyCTO. Each method has its advantages and disadvantages (Table 1).

Small-scale projects (such as student research) most likely use paper-based methods, while large-scale projects (such as project evaluations, e.g. Funda Wande) are more likely to use online methods. There are more advantages than disadvantages associated with online data collection; however, most researchers do not have the financial resources or know-how to conduct EGRA electronically. The small samples most researchers work with may also mean that online form programming and the associated assessor training are too burdensome.

Table 1. Advantages and disadvantages of scoring EGRA on paper and online

Paper administration
Example: You can download an example paper-based EGRA from my OSF project.
Advantages:
● Assessors don't have to be computer literate (less training needed).
● Paper is less interesting to children so may distract them less.
● Scoring electronically can be quality controlled. Scores can be checked by a third party.
Disadvantages:
● Costly to print pages, and requires pencils/pens and a timer/cellphone.
● Documents can go missing before data is captured electronically.
● Mistakes can be introduced when presenting questions for text reading.
● Mistakes can be introduced in data capturing (but double capturing can address this concern).
● Immediate data monitoring cannot take place.
● Often, only final scores are captured (e.g. total number of words incorrect), rather than exactly which words were read incorrectly.
● Discontinuation (stop-criterion) rules have to be manually applied.

Online administration
Example: Some examples of using Tangerine are provided by RTI International on their website.
Advantages:
● Data is immediately saved and cannot go missing.
● Data can be checked as it comes in.
● As the timed aspects are programmed into the tablet, timed tasks may be more accurate.
● Data output is standardised across projects if the same basic task structure is followed in the programming.
● More nuanced data is collected, such as exactly which words were read incorrectly.
● Only relevant comprehension questions are presented depending on how far each child read.
● Discontinuation (stop-criterion) rules can be automatically applied.
Disadvantages:
● Requires more extensive training in how to use the app.
● Technical glitches do occur.
● Initial cost to purchase tablets may be prohibitive.
● Requires knowledge of backend programming of the tasks.

2. Task types

Constructs measured
In South Africa (SA), the most often used tasks are:
- Rapid object naming
- Letter recognition
- Word reading
- Text reading
- Reading comprehension
- Listening comprehension
- Phonological awareness

At a minimum, EGRA tasks in SA include letter recognition fluency, text reading fluency (most often
called oral reading fluency (ORF)) and reading comprehension.

Letter recognition fluency (the easiest task) is included even when EGRA is administered in later grades, as it allows children to demonstrate what reading skill they do have (i.e. avoiding floor effects). Research on benchmarks indicates that children need to read at least 40 letter units a minute to be able to understand some of what they read (Ardington et al., 2021). ORF and reading comprehension are the main outcomes on which research focuses.

Untimed and timed tasks


EGRA has two main task types: untimed and timed. Untimed tasks, such as reading/listening comprehension questions, can either be dummy coded (e.g. correct (1) or incorrect (0)) or have single-select options (e.g. correct in first language (1), correct in English (2), incorrect (0), non-response (99)). Most often, single-select options are provided so that non-responses can also be captured. In the case of reading comprehension, the assessor should only ask comprehension questions up to where the child has read. In paper administrations of the test, the assessor needs to work this out manually. However, in online methods, the software will only present the relevant questions (as long as the programming is correct). The automatic presentation of only relevant questions is therefore a major advantage of online administration methods.

3. Scoring

Scoring untimed tasks


When it comes to including response options for untimed tasks, there is always a compromise
between being succinct (and reducing the complexity of scoring) and collecting sufficiently nuanced
data which informs the current project goals and future secondary analysis of the data. Table 2
presents an example of succinct and detailed scoring. The more nuanced the scoring is, the more
training is required for assessors, and the more difficult it becomes to score. Researchers should
evaluate what level of detail is required to answer their primary research questions, but also what
data may be useful for secondary data analysts, while balancing what will be straightforward for
assessors to follow. It doesn’t help to have a very detailed coding scheme that assessors find too
complicated to use!

Table 2. Examples, and considerations for deciding on response options for untimed tasks

Succinct scoring
Example:
1 correct
0 incorrect
99 no response
Advantages:
● Easy to sum into total scores, especially when the software automatically recognises and does not count non-responses in frequencies.
Disadvantages:
● Nuance is lost.
● Limits the usability of the data for secondary analyses that may have different research questions.

Detailed/nuanced scoring
Example 1:
1 correct in first language
2 correct in English
0 incorrect
99 non-response
Example 2:
1 correct answer, correct spelling
2 correct answer, incorrect spelling
0 incorrect
4 writing is illegible
99 non-response
Advantages:
● Detailed information can be used for both the current primary analysis as well as secondary analysis, e.g. secondary analysts can use the data to analyse children's multilingual response options and derive different groups of children with different instructional needs.
● Analysts can make choices about how they would like to analyse the data, e.g. the scores can be converted to dummy scores (right/wrong) if the analysis warrants it.
Disadvantages:
● Requires slightly more data processing for any particular analysis.
● Complicated scoring makes training more difficult and could lead to more mistakes/inconsistencies.
● Ethically, one should only collect data that will be used.
Scoring timed tasks
Timed tasks in EGRA require the following information to be captured as raw data:
1. Which item(s) was/were read incorrectly (number incorrect)1
2. Last item read when the timer ran out (number attempted) OR total time taken if participant
completed before end of timer

From these variables, the analyst calculates:


1. Words correct per minute = (number attempted minus number incorrect) divided by total time taken (in minutes)
2. Accuracy = (number attempted minus number incorrect) divided by number attempted
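To make the arithmetic concrete, here is a small sketch of these two calculations in Python (variable names are illustrative); if the total time taken is recorded in seconds, it first needs to be converted to minutes.

```python
def words_correct_per_minute(attempted: int, incorrect: int, time_seconds: float) -> float:
    """(number attempted - number incorrect) divided by total time taken in minutes."""
    return (attempted - incorrect) / (time_seconds / 60)

def accuracy(attempted: int, incorrect: int) -> float:
    """(number attempted - number incorrect) divided by number attempted."""
    return (attempted - incorrect) / attempted

# Example: 45 words attempted, 5 read incorrectly, timer stopped at 60 seconds.
print(words_correct_per_minute(45, 5, 60))  # 40.0 wcpm
print(accuracy(45, 5))                      # 0.888...
```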

Traditionally, the timed tasks were administered for 60 seconds, but nowadays, the timing can vary
(see example below). Nevertheless, the same data is collected at a minimum.

Example of new timing methods


Recent uses of EGRA in Funda Wande include collecting the following information:
1. Which item(s) was/were read incorrectly (number incorrect)
2. Last item read at 60 seconds OR total time taken if participant completed before end of
timer
AND
3. Last item read at 180 seconds OR total time taken if participant completed before end of
timer
From these variables, the analyst calculates:
1. Words correct per minute (whole passage) = (number attempted over the whole passage minus number incorrect over the whole passage) divided by total time taken (in minutes)
2. Accuracy = (number attempted minus number incorrect) divided by number attempted

What scores are used in analyses by researchers and project evaluators?
A later section addresses what you should look out for when cleaning EGRA data. I start by describing the main variables researchers and project evaluators use in their analyses.

1 In paper-based administrations, this data is often summarised as the TOTAL NUMBER OF WORDS READ INCORRECTLY, and this total is captured in spreadsheet software. Item-level data is often not recorded in paper administrations of these timed tasks. For online software, item-level data (i.e. whether an item was read correctly or not) is recorded, providing nuanced data for both primary and secondary analysts.
Researchers will use the following scores in their analyses:
1. Untimed tasks:
a. Item level frequency tables
b. Total items answered correctly
AND/OR
c. Total items answered correctly as a proportion of attempted questions
2. Timed tasks:
a. Words correct per minute (derived from (number of items attempted over the whole passage minus number incorrect over the whole passage) divided by total time taken in minutes)
b. Accuracy (derived from (number attempted minus number incorrect) divided by number attempted)

4. Electronic storage of data


I always find it helpful to understand how the data looks when it is stored in a spreadsheet. Researchers will each have their own way of capturing data in a spreadsheet. I have provided an example .csv file and associated codebook for capturing data on OSF (https://osf.io/a4wky/). For a detailed guide on basic data management principles, I recommend reading Reynolds, Schatschneider and Logan (2022). In the sections below, I summarise how data is stored via SurveyCTO and make the connection to the template I provided for capturing paper-based scores.

Overview of the template


The template includes columns for unique identifiers, some demographic information, and columns to capture the data from a paper-based EGRA. The template includes letter-sound recognition, word reading, ORF and reading comprehension after the ORF. Note how columns related to the same construct start with the same prefix (e.g. the letter recognition task variables always start with letters_). This naming convention makes it easier for people outside the project to follow the data. For more tips on naming conventions for data management, please review the directions provided by Jan Schenk in this article. You will also notice some variables which I suggest you calculate in the analysis program (e.g. letters correct per minute) so that you reduce errors in calculating and capturing them. Another recommendation is to capture data in long format, i.e. each row in the spreadsheet is one observation (one child's scores for one test session). When the same child is assessed twice, use the test_session variable to indicate that the second row relates to another assessment. Thus, the data for a learner who is tested twice will appear in two rows: once where test_session == 1, and again where test_session == 2. You can always reformat the data into wide format in the analysis program you use. For more on long vs wide data, see this article.
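To make the long-versus-wide distinction concrete, here is a small illustrative sketch in Python/pandas (the learner IDs and column names are hypothetical, but follow the template's naming style) showing a learner tested in two sessions stored in long format and then reshaped to wide format in the analysis program.

```python
import pandas as pd

# Long format: one row per child per test session, as recommended for capturing.
long_df = pd.DataFrame({
    "learner_id":   ["L001", "L001", "L002"],
    "test_session": [1, 2, 1],
    "orf_wcpm":     [20.0, 35.0, 48.0],
})

# Wide format: one row per child, one column per test session.
wide_df = long_df.pivot(index="learner_id", columns="test_session", values="orf_wcpm")
wide_df.columns = [f"orf_wcpm_session{s}" for s in wide_df.columns]

print(wide_df.reset_index())
```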

How data is saved by SurveyCTO


SurveyCTO exports data in wide format: each row is one observation (one submission) and each variable has its own column. The data from the survey is exported as a .csv file. For untimed tasks, such as the comprehension questions after an ORF task, the relationship between variables and columns is clear: each question has its own column in the spreadsheet. Figure 1 demonstrates the output for untimed reading comprehension questions.

Figure 1. Output for untimed tasks

Figure 1 shows how the responses to each question were captured (orf_comp_1_1 - orf_comp_1_5). Here, the responses were coded as 1 (correct), 0 (incorrect), and 99 (non-response). Remember also that participants are only presented questions based on how far they have read. Thus, row 4 (which has blank observations) indicates those questions which were not presented (and so were not attempted). It is useful to have different values for questions which were answered as non-response and those which were not presented. This allows the analyst more options.

Additionally, the total comprehension score was calculated within the SurveyCTO form and captured in calc_orf_1_comp_score. One can also calculate the total score in the script of the analysis program you use. For example, analysts may use the percentage correct out of all questions (divide by 5 in this example) OR the percentage correct out of questions asked (which would be divided by 1 for the participant in row 4).
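As an illustration, here is a minimal Python/pandas sketch of both options, using the column names from Figure 1 (orf_comp_1_1 to orf_comp_1_5) and assuming blank cells for questions that were never presented and 99 for non-responses.

```python
import numpy as np
import pandas as pd

comp_cols = [f"orf_comp_1_{i}" for i in range(1, 6)]

# Toy export: the last row mirrors a participant who was only asked one question.
df = pd.DataFrame([
    [1, 1, 0, 99, 1],
    [1, 0, 0, 0, 99],
    [0, np.nan, np.nan, np.nan, np.nan],  # blanks (NaN) = question not presented
], columns=comp_cols)

correct = df[comp_cols].eq(1).sum(axis=1)    # number of questions answered correctly
asked = df[comp_cols].notna().sum(axis=1)    # number of questions actually presented

df["pct_correct_of_all"] = correct / len(comp_cols)  # out of all 5 questions
df["pct_correct_of_asked"] = correct / asked         # out of questions asked

print(df[["pct_correct_of_all", "pct_correct_of_asked"]])
```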

SurveyCTO makes use of a plugin which automatically times the timed tasks and records certain metadata. The metadata captured for ORF tasks is presented in Figure 2 below. Here you can see that this metadata is also labelled: orf_1_metadata_4 tells us how many items were attempted, and orf_1_metadata_5 tells us how many items were read incorrectly. The program automatically calculates items correct in orf_1_metadata_6, but you should also calculate this value yourself after cleaning the data as a way to check the scores (a small sketch follows Figure 2).

Figure 2. Output for timed tasks using the plug-in in SurveyCTO
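As a quick cross-check of the automatic calculation, here is a small Python/pandas sketch (using the metadata column names shown in Figure 2, with toy values) that re-derives items correct from items attempted and items incorrect and flags any rows where the plugin's value disagrees.

```python
import pandas as pd

# Toy ORF metadata, using the SurveyCTO column names shown in Figure 2.
df = pd.DataFrame({
    "orf_1_metadata_4": [45, 52, 30],  # items attempted
    "orf_1_metadata_5": [5, 2, 4],     # items read incorrectly
    "orf_1_metadata_6": [40, 50, 27],  # items correct (auto-calculated)
})

# Re-derive items correct and flag mismatches with the automatic value.
df["correct_check"] = df["orf_1_metadata_4"] - df["orf_1_metadata_5"]
df["mismatch"] = df["correct_check"] != df["orf_1_metadata_6"]

print(df[df["mismatch"]])  # rows to investigate (here the third row: 30 - 4 = 26, not 27)
```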

5. Cleaning raw EGRA data


There are many ways data capturing can go wrong, leading to the need to clean data. Because of all the varied ways that gremlins creep in, I cannot provide an exhaustive list of how to clean EGRA data, but I try to address the most common things to look out for. Firstly, though, always save your raw data in its own file, and any subsequent cleaned data in its own file! It's also best to document the cleaning you do via your stats program using a script. For a much longer list of data cleaning tips, I recommend the data management guide by Reynolds et al. (2022).
● Check that the total scores are plausible per task. Running descriptive statistics (min and max scores) can help here. The number of words attempted should be the same as or less than the total number of possible words, and there should not be any negative values in EGRA data. (A consolidated code sketch of these checks follows this list.)
● Check that any automatically calculated or hand calculated scores were calculated correctly
by re-doing the calculation in your statistical program (and check against the paper version if
available). These kinds of errors can creep in especially in paper-based EGRA scoring.
● For paper-based scoring, ensure that any discontinuation rules were applied correctly. You
should also check that the correct number of reading comprehension questions were
administered based on where the child read up until in the ORF task.
● Check that bivariate correlations look correct. I run scatterplots and include a linear line of best fit, with raw data points included. I usually add jitter so that data points don't overlap, allowing me to see the underlying data more clearly. At this stage, we have a good idea of how the sub-tasks of the EGRA are correlated. For example, two ORF tasks should have a high correlation, at least above .75. In later grades, the correlation may be much higher. Investigate any data points that are very far from the regression line or that seem to be influencing the regression line. For example, it's unlikely that a child will score zero on the first ORF and then 60 wcpm on the second ORF; one of these data points is likely to be incorrect.
○ When you have discrepancies, try to examine the participant's scores for all tasks. A non-reader will score zero or close to zero in all the tasks, and a child who can read should show some reading ability across all the tasks. This is how the EGRA works.
○ When you find these discrepancies, you may decide to delete the participant from the
study, or to replace one or some of the scores with a missing value (R uses NA to
indicate missing values). Whatever you decide, record what you did in your data
cleaning script so you can always re-assess later.
● In large studies with multiple data collectors, you may want to examine the average scores
(and their standard deviation) provided by each data collector. If data collectors have been
randomly allocated to sites/participants their average scores should be quite similar in the
long run. Data collectors whose scores are very different from the group of data collectors
may need to be investigated and further supported. In the worst case, you may have to
discard the data submitted by some data collectors. Regularly inspecting incoming data will
help you to identify problematic data collectors, or faulty software!
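To make these checks concrete, here is a consolidated sketch in Python (pandas and matplotlib), under the assumption of a file with hypothetical columns such as orf1_wcpm, orf2_wcpm, orf1_attempted and assessor_id; it covers the plausibility checks, the jittered scatterplot with a line of best fit, and the per-assessor summaries described above.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("egra_data_raw.csv")  # hypothetical file and column names

# 1. Plausibility: inspect min/max per task; attempted should not exceed the
#    passage length (60 words assumed here) and no EGRA score should be negative.
print(df[["orf1_attempted", "orf1_incorrect", "orf1_wcpm"]].describe())
print((df["orf1_attempted"] > 60).sum(), "rows where attempted exceeds the passage length")
print((df[["orf1_attempted", "orf1_incorrect", "orf1_wcpm"]] < 0).any(axis=1).sum(),
      "rows with negative values")

# 2. Jittered scatterplot of two ORF tasks with a linear line of best fit.
plot_df = df[["orf1_wcpm", "orf2_wcpm"]].dropna()
jitter = np.random.normal(0, 0.5, size=(len(plot_df), 2))
plt.scatter(plot_df["orf1_wcpm"] + jitter[:, 0], plot_df["orf2_wcpm"] + jitter[:, 1], alpha=0.5)
slope, intercept = np.polyfit(plot_df["orf1_wcpm"], plot_df["orf2_wcpm"], deg=1)
xs = np.linspace(plot_df["orf1_wcpm"].min(), plot_df["orf1_wcpm"].max(), 100)
plt.plot(xs, slope * xs + intercept)
plt.xlabel("ORF 1 (wcpm)")
plt.ylabel("ORF 2 (wcpm)")
plt.show()
print(plot_df.corr())  # expect a high correlation, at least above .75

# 3. Per-assessor means and standard deviations; outlying assessors need follow-up.
print(df.groupby("assessor_id")["orf1_wcpm"].agg(["count", "mean", "std"]))
```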

References
Ardington, C., Wills, G., Pretorius, E., Mohohlwane, N., & Menendez, A. (2021). Benchmarking oral
reading fluency in the early grades in Nguni languages. International Journal of Educational
Development, 84, 102433. https://doi.org/10.1016/j.ijedudev.2021.102433
Dubeck, M. M., & Gove, A. (2015). The early grade reading assessment (EGRA): Its theoretical
foundation, purpose, and limitations. International Journal of Educational Development, 40,
315–322. https://doi.org/10.1016/J.IJEDUDEV.2014.11.004
Reynolds, T., Schatschneider, C., & Logan, J. (2020). The Basics of Data Management (Version 2).
figshare. https://doi.org/10.6084/m9.figshare.13215350.v2

