
http://www.jmde.com/

Ideas to Consider

Using Readability Tests to Improve the Accuracy of Evaluation Documents Intended for Low-Literate Participants

Julien B. Kouamé
Western Michigan University

Background: Readability tests are indicators that measure how easily a document can be read and understood. Simple, but very often ignored, readability statistics can not only provide information about the level of difficulty of particular documents but can also increase an evaluator's credibility.

Purpose: The purpose of this article is two-fold: (1) to provide readers with logical reasons for using readability tests and (2) to show how to choose the right test for a project.

Setting: United States.

Research Design: A comparative framework is used to present the need for readability testing.

Keywords: readability tests; evaluation instruments; survey research; low-literacy survey

Increasingly, survey critics reproach researchers for using language at levels too advanced for, and therefore inappropriate to, their intended audience. In 1993, the National Adult Literacy Survey indicated that about 50% of the public cannot read and understand information in even short publications (Association of Medical Directors, 2004). An article published by Calderón et al. (2006) stated that major survey tools, such as the "quality-of-life survey" used in health institutions, present variations in readability levels between items. To enable their patients to understand most evaluation documents, the Association of Medical Directors (2004) suggested that researchers and evaluators adapt their instruments to between a seventh- and eighth-grade level, which is the average adult American reading ability (Kirsch et al., 1993). More than ever, new evaluators as well as professional evaluators borrow instruments from other languages. Often, these instruments cannot be used in other linguistic settings without major modifications that account for comprehension. Consequently, evaluation instruments lose their ability to accurately measure outcomes because of linguistic inaccessibility. Most of these concerns, however, can easily be addressed by considering the use of a readability test as part of the content analysis of the evaluation instrument.

Journal of MultiDisciplinary Evaluation, Volume 6, Number 14, ISSN 1556-8180, August 2010

Today, despite the multiplicity of readability formulas, only a few are actually used, some of which are incorporated into computer software to facilitate their use. Provocative questions concern the selection of the appropriate formula, the method by which to interpret a result, and the need to continue the conversation regarding the use of readability testing. This paper attempts to address these questions.

Readability Tests for Test Validity and Report Clarity

Readability tests are indicators that measure how easy a document is to read and understand. For evaluators, readability statistics can be solid predictors of the language difficulty level of particular documents. The essential information in an evaluation document should be easily understandable, and a proper readability level will greatly reduce frustration for the project's participants. In short, difficult items in a survey could lead to nonresponse, missing data points, or "unreliable responses because of a mismatch between item readability and the reading skills of the respondent" (Calderón et al., 2006). Implementing readability tests prior to pilot testing also results in more efficient use of evaluators' time, a critical resource. Readability testing can also increase the validity and reliability of data collection instruments, as well as the credibility of the evaluator.

To illustrate this point, I use an example from a study I conducted in fall 2006. The goal of this project was to develop and evaluate a simple and understandable survey for formative evaluation and to assess the effect of the readability test on low-literate participants.

A child abuse evaluation survey was borrowed for this assessment. The evaluation was conducted with 65 low-literate participants (10 years of formal schooling) for whom English was their second language. Participants were randomly assigned to two groups of 33 and 32 individuals. One group used a form of the survey whose content had been revised to suit the target readability level using the Flesch-Kincaid formula. Table 1 shows the Flesch-Kincaid grade level for each question across the two forms of the survey. Participants were also asked to evaluate the instructions and the understandability of each item on a scale of 1 to 10, where 1 describes an item that is easy to read and 10 describes an item that is difficult to read. For each group, the understanding level was calculated. Descriptive statistics are provided in Table 2, and frequency distributions of the ratings are provided in Figures 1 and 2. On average, version 1 has a grade level (GL) of 9.83 (between ninth and tenth grade), compared to 5.22 for version 2. The participants' ratings show that both documents were generally well understood (see Table 2); however, the form revised after readability testing received a better understandability score.
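The Flesch-Kincaid grade level used to revise the survey is a published formula: GL = 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. As a minimal sketch of how such a score is computed (not the software used in the study), here is a Python version with a deliberately naive vowel-group syllable counter; real readability tools use pronunciation dictionaries and exception rules:

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: one syllable per run of consecutive vowels.
    # Real readability software uses pronunciation dictionaries.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    # Published Flesch-Kincaid formula:
    # 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

# Short, monosyllabic text scores far below grade 1 (scores can be negative).
print(flesch_kincaid_grade("The cat sat on the mat. The dog ran fast."))
```

Because the syllable and sentence counters here are heuristic, scores from this sketch will not exactly match those of commercial software such as Microsoft Word.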


Table 1
Flesch-Kincaid Grade Level by Survey Question

Question   Survey Form 1   Survey Form 2
Item 1          8.1             3.6
Item 2         10.4             5.2
Item 3          8.2             0.7
Item 4          7.6             0.7
Item 5          7.3             3.6
Item 6         13.0             5.8
Item 7         15.4            11.3
Item 8         14.2             9.0
Item 9          9.0             6.2
Item 10        10.9             9.0
Item 11         2.2             2.2
Item 12        11.7             5.4
Average         9.83            5.22

Figure 1. Distribution of Rating of Survey Form 1
Figure 2. Distribution of Rating of Survey Form 2
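The "Average" row of Table 1 follows directly from the twelve item scores; a quick arithmetic check:

```python
# Flesch-Kincaid grade levels from Table 1, items 1-12.
form1 = [8.1, 10.4, 8.2, 7.6, 7.3, 13.0, 15.4, 14.2, 9.0, 10.9, 2.2, 11.7]
form2 = [3.6, 5.2, 0.7, 0.7, 3.6, 5.8, 11.3, 9.0, 6.2, 9.0, 2.2, 5.4]

mean1 = sum(form1) / len(form1)  # ~9.83, matching the reported average
mean2 = sum(form2) / len(form2)  # ~5.22
print(round(mean1, 2), round(mean2, 2))
```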
The results show a better readability of the survey after the revision following the Flesch-Kincaid test. Version 1 has a rating mean of 4.43 with a standard deviation of 1.4; version 2 received a lower mean of 3.60 with a standard deviation of 1.3.

Table 2
Descriptive Statistics for Survey Rating

                N     M     SD    Min   Max
Survey Form 1   30    4.43  1.36  2     7
Survey Form 2   30    3.60  1.33  2     7

Although the Flesch-Kincaid test shows that the GL is higher than that of the target population, the participants' ratings show that the survey could be understood by its users. The principal reason for the difference between the two results may be the following: the questions are tested individually, so the software cannot relate them to each other. While the readability test yields a high GL, the participants may not have difficulty understanding because they take the context of the writing into consideration. However, the test was useful, as it helped in revising the survey for easy reading.
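The M and SD columns of Table 2 are the ordinary mean and sample standard deviation of the individual 1-10 ratings. The raw ratings are not published in the article, so the list below is hypothetical, purely to show the computation:

```python
import statistics

# Hypothetical 1-10 understandability ratings (NOT the study's raw data).
ratings = [2, 3, 4, 4, 5, 6, 7]

m = statistics.mean(ratings)    # the "M" column
sd = statistics.stdev(ratings)  # the "SD" column (sample standard deviation)
print(m, sd)
```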


Critiques of Readability Formulae

There is no doubt that readability testing has always been at the center of controversy. In "The Principles of Readability," DuBay (2004) listed numerous papers that criticized readability testing. These papers, as DuBay wrote, have titles such as "Readability: A Postscript" (Manzo, 1970), "Readability Formulas: What's the Use?" (Duffy, 1985), and "Last Rites for Readability Formulas in Technical Communication" (Connatser, 1999). Still others suggest the idea of usability as an alternative to readability. However, usability testing is not able to provide an objective prediction of text difficulty (DuBay, 2004). Until other reasonable alternatives are invented, readability testing for predicting text reading level remains essential. This point is well illustrated in the study described above.

Simple Skill, Often Forgotten

In his Key Evaluation Checklist (KEC), Scriven (2007) urges professional evaluators not to ignore readability testing. Not only do we have the commitment to provide our customers with accurate information, but we also have the obligation to present those findings in a report that is easily understandable.

In the summer of 2006, I was part of a team of 10 evaluators who metaevaluated five evaluation reports prepared by other evaluators, each with at least 2 years of experience. The project was initiated by the Department of Educational Studies of my school and supervised by an expert in evaluation. The objective was to provide opportunities for evaluators to further develop evaluation skills (i.e., build evaluation capacity) that will improve evaluation reports and the practice of evaluation. The metaevaluators judged the evaluation reports using criteria for sound evaluations according to The Program Evaluation Standards (Joint Committee on Standards for Educational Evaluation, 1994). The metaevaluation focused on the extent to which the five reports, individually and collectively, met the requirements for utility, feasibility, propriety, and accuracy. For the purpose of this paper, only part of the U5 standard in Stufflebeam's (1999) Metaevaluation Checklist is presented.

As a result of this metaevaluation, we found that only one evaluation met the following two criteria of the comparison analysis: (1) provides definitions of the terms used, and (2) uses audience-appropriate language, tables, and graphs. However, none of the evaluators provided information about testing the readability level of their documents (research tools and reports). The judgment made by the metaevaluators was based not only on their own ability to read and understand the reports, but also on whether readability testing had been implemented to account for the skills of the target population. Although the report entitled Improving Asset Utilization was difficult to understand, its evaluator not only provided definitions of difficult expressions but also clearly claimed that the language used in the report was familiar to his client. Despite the fact that this assessment was done using only five evaluation reports, it clearly illustrates that readability testing is not always used by evaluators.


Selecting an Appropriate Formula

Recently I was talking to a friend about my interest in readability testing of evaluation documents. He agreed that this is a "must do" for any researcher. Then my friend continued the discussion by noting that the various formulas do not all report the same reading level. The most important question he asked during our conversation was the following: "How should we select an appropriate test (formula)?" This section attempts to answer that important question.

Indeed, more than 200 readability test formulas have been invented since the 1940s. However, only a few of these are currently being used. A recent study led by Calderón et al. (2006), presented in Table 3, reports the results of a literature search on survey readability. The literature search was conducted in 2005 by examining major medical publication databases: Medline (1966-2003), CINAHL (1982-2003), ClinPSYC (1993-2003), and PsycINFO (2003-2005). The 17 articles reported focused on readability and presented methods for estimating readability scores. Table 3 shows that Flesch-Kincaid and Flesch Reading Ease are the most common formulas used to assess readability (Calderón et al., 2006). These formulas are the most widely used because they are the most reliable; the Flesch Reading Ease formula is especially respected because it is the most tested and the most reliable (Chall, 1958; Klare, 1963). In addition, the formula is incorporated into Microsoft Word, which favors its ease of use.

Table 3
Publications on Survey Readability: Methods, Application to Test, and Scores
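For reference, the Flesch Reading Ease score discussed above uses the same two ratios as the Flesch-Kincaid grade level but on an inverted 0-100 scale, where higher means easier (roughly 60-70 is "plain English"): RE = 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word). A minimal sketch, taking the counts as given; the example counts are illustrative, not taken from the article:

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    # Published Flesch Reading Ease formula; ~60-70 is "plain English."
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

# A 100-word passage in 8 sentences with 130 syllables (illustrative counts)
# scores in the mid-80s: an easy text.
print(flesch_reading_ease(words=100, sentences=8, syllables=130))
```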

However, good research practice suggests that we use several methods for testing, because error is inevitable. Therefore, my suggestion is that we use a combination of more than one formula to assess the reading level of our evaluation documents. Using more than one test provides greater insight into the document. Be reminded that any measurement is susceptible to error.


Indeed, errors are the essence of the field of measurement. Some readability formulas tend to predict higher scores than others; this is the case for the SMOG and FOG formulas. Users of readability formulas find discrepancies between the formulas because each of them is constructed with a specific objective in mind. Therefore, "different uses of a text require different levels of difficulty" (DuBay, 2004). For example, while FORCAST provides a good prediction for non-running narrative (questionnaires, forms), FOG is widely used for running text in the health care and general insurance industries and for general business publications. To illustrate the discrepancy among readability tests, I tested the present paragraph using Flesch and SMOG; the result is shown below. In addition, I provide in Table 4 a list of frequently used readability test formulas with what they test best.
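The published SMOG and Gunning FOG formulas make the discrepancy easy to see: fed identical counts, they return different grades because they weight sentence length and hard words differently. A sketch with illustrative counts (not the Flesch/SMOG figures for the author's own paragraph, which are not reproduced here):

```python
import math

def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    # FOG: 0.4 * (words per sentence + 100 * complex_words / words)
    return 0.4 * (words / sentences + 100 * complex_words / words)

def smog(sentences: int, polysyllables: int) -> float:
    # SMOG: 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291
    return 1.0430 * math.sqrt(polysyllables * 30 / sentences) + 3.1291

# The same passage statistics fed to both formulas give different grades.
words, sentences, hard = 100, 8, 10
print(gunning_fog(words, sentences, hard))  # grade ~9.0
print(smog(sentences, hard))                # grade ~9.5
```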

Table 4
Suggested Usage of Common Readability Formulas

As can be seen from the formulas in Table 4, manual calculation of reading levels can be tedious, complex, and sometimes time consuming, because words, sentences, and paragraphs must be counted. Fortunately, many of these formulas are incorporated into software applications that make them easy to use. Unfortunately, not all of these applications provide reliable results.

Here is a list of concerns to keep in mind when thinking about using such tools:

1. There are many free tools to consider. However, sometimes free is also cheap in value.
2. Using more than one testing tool will give you significantly more knowledge of your document.


3. Consider also a visual display of the results of your test. This will help you know where to focus your revision.
4. The tool should help you locate the best test for your audience.
5. Establish the credibility of the author(s) of the tool.
6. Test the accuracy of the tool by checking the reading level of sentences such as "The students saw Mrs. Kate during the recess." Some software may count this as two sentences; if it does, reconsider your choice of software.

Readability testing should be part of all evaluators' projects, even when those individuals are internal evaluators and think they know the common language used in the institution. The final report of an evaluator can be disseminated to the public at large, not only to an internal constituency. Therefore, the evaluator should not only have the client in mind while working for her or him but should also think of the audience (Scriven, 1991). Scriven has even suggested that reports be field tested to suit the target. Whether you are writing a proposal, the first question of a research tool, or a report, it is a valuable habit to pretest for readability.

You should not look at this just in terms of the KEC, as Scriven advises, but should consider that cultivating this habit will also save considerable time in revision and even make you a better evaluator.

The KEC puts a focus on frequent consultation with stakeholders and audience before, during, and after we have developed any evaluation document. Assessing both the reading ability of the audience and the readability of the text will greatly facilitate this process. The field test suggested by Scriven will allow you to find out whether your document suits the target. If it does not, you have to revise and test again. But by subjecting your documents to readability testing, you will predict the reading level and save one or two stages of field testing. Readability testing will become more necessary than ever before because of the multiple layers of reading capability within our diverse society. As with any tool, it can only do well what it is designed to do. Despite the limits of readability formulas, they remain a unique way to predict the extent to which documents can be comprehended by their intended audience. However, researchers have demonstrated that although readability testing is relatively simple, it is often forgotten.

References

Association of Medical Directors. (2004). Comprehension and reading level. Retrieved February 20, 2008, from http://www.informatics-review.com/FAQ/reading.html
Calderón, J. L., Morales, L. S., Liu, H., & Hays, R. D. (2006). Variation in the readability of items within surveys. American Journal of Medical Quality, 21(1), 49-56.
Chall, J. S. (1958). Readability: An appraisal of research and application. Columbus, OH: Ohio State University Press.
DuBay, W. (2004). The principles of readability. Retrieved July 24, 2008, from http://www.impact-information.com
ReadabilityFormulas.com. Can YOU read me now? Retrieved July 25, 2008, from http://www.readabilityformulas.com/free-ebooks.php
Kirsch, I., Jungeblut, A., Jenkins, L., & Kolstad, A. (1993). Adult literacy in America: A first look at the results of


the National Adult Literacy Survey. Washington, DC: US Department of Health, Education and Welfare.
Powers, R. D., Sumner, W. A., & Kearl, B. E. (1958). A recalculation of four adult readability formulas. Journal of Educational Psychology, 49(2), 99-105.
Scriven, M. (1991). Evaluation thesaurus (4th ed.). Newbury Park, CA: Sage.
Scriven, M. (2007). Key evaluation checklist. Kalamazoo, MI: The Evaluation Center, Western Michigan University.
Stufflebeam, D. L. (1999). Program evaluation models metaevaluation checklist. Kalamazoo, MI: The Evaluation Center, Western Michigan University.

