

The Educational Forum, 75: 298–314, 2011
Copyright © Kappa Delta Pi
ISSN: 0013-1725 print/1938-8098 online
DOI: 10.1080/00131725.2011.602467

High School Exit Exams and Mismeasurement
Christopher H. Tienken
Department of Education Leadership, Management, and Policy,
Seton Hall University, Spring Lake Heights, New Jersey, USA

Address correspondence to Christopher H. Tienken, Department of Education Leadership, Management, and Policy, Seton Hall University, 1104 Ocean Rd., Spring Lake Heights, NJ 07762, USA. E-mail: christopher.tienken@shu.edu

Abstract
Test score validity takes center stage in the debate over the use of high school
exit exams. Scant literature addresses the amount of conditional standard
error of measurement (CSEM) present in individual student results on high
school exit exams. The purpose of this study is to fill a void in the literature
and add a national review of the CSEM, including data on the amount of
CSEM present in high school exit exam results. Individual student results
from each of the 23 exit exams contained a CSEM ranging from 3.29 to 39
scale-score points. Nearly one-fourth of the state education agencies did not
report the CSEM for the individual student results.

Key words: assessment, curriculum and instruction, high school exit exams, secondary
education.

Standardized testing of public, secondary-school students’ academic skills and knowledge is an international education practice (Organization of Economic Cooperation and Development 2008). Students in many countries in the European Union and
Asia take national or provincial exams in middle school to determine the type of high
school they can attend. Some students in those countries must also take national tests
at the end of their high school careers to determine whether they can attend univer-
sities or receive a standard high school diploma. High school students in Germany
must usually take the Abitur exam to enter college, unless they have a diploma from
a vocational education institution, which requires an exit exam (European Glossary on
Education 2004). In Finland, high school students must take a high school graduation
exam, and those who wish to attend a university must pass the national matriculation
exam (Matriculation Examination Board 2009). Education officials in France, Italy, and
England also administer standardized exit-type national exams to their high school students to establish eligibility for diplomas and entrance into universities (European Glossary on Education 2004). In China, students must take a national college entrance exam to
determine whether they can attend a university (United Nations Educational, Scientific,
and Cultural Organization [UNESCO] 2003). They must also take a high school gradu-
ate test to determine the type of diploma they receive. Students in India take the Indian
Certificate of Secondary Education Examination after their second year in high school, or
Year 10, as it is known in India. Then, students who stay in high school beyond Year 10
take the Indian School Certificate Examination at the end of high school to earn a standard
diploma and to meet university access requirements (UNESCO 2003).

Assessment-driven education policies are in place in all 50 states in the United States.
The latest assessment-driven legislation included the reauthorization of the Elementary
and Secondary Education Act of 1965 (U.S. Congress 2004), but the modern-day groundwork
for recent federal education-reform initiatives was laid more than 30 years ago with the
release of the report, Improving Educational Achievement (Committee on Testing and Basic
Skills 1978). The report, which formed the foundation for the No Child Left Behind (NCLB)
Act of 2001 (U.S. Congress 2002), called for changes in schooling by recommending that
government push schools to (1) return to “basic skills” as a means to increase achieve-
ment test scores, (2) increase teacher quality, and (3) use test score-driven accountability
of teachers and administrators as methods to “improve” education. In hindsight, some
statements were prophetic (Committee on Testing and Basic Skills 1978): “American educa-
tion should be paying much more attention to doing a thorough job in the fundamentals
of reading, writing, and arithmetic” (iii); and, the authors elaborated, “Tests can play
several different roles. One is as a means of public accountability” (7).

One influence of 30 years of increased federal and state pressure to pursue assessment-
driven education policies has been an increased use of high school exit exams. Broadly
defined, a high school exit exam is a statewide standardized test given to all high school
students in a specified high school grade or at the end of specified courses, such as
Algebra II or Biology, as the basis for a judgment about students’ high school graduation
status: with a standard diploma, not graduate, or with a lesser diploma. A state board of
education can waive the exit exam requirement for groups of students with individual
education plans (IEPs) or other special cases if defined in state education statutes. Some
states’ rules allow students, especially those with IEPs, to take an alternative assessment
if they do not pass the initial exit exam.

In 1978, state education agency (SEA) personnel from Virginia unveiled a “minimum
competency” test required for high school graduation (Sanger 1978). In 1979, the New
York SEA instituted a basic competency test administered to students in the ninth grade.
By 1990, 14 states used high school exit exams; and, by 2001, prior to the passage of the
NCLB Act, 18 states required students to pass a standardized statewide exit exam for
graduation (Education Commission of the States 2008). By the end of the 2008 through
2009 school year, 23 states required students to pass a standardized statewide test in at
least language arts (LA) and mathematics to receive a standard high school diploma. The
states included Alabama, Alaska, Arizona, California, Florida, Georgia, Idaho, Indiana,
Louisiana, Massachusetts, Minnesota, Missouri, Nevada, New Jersey, New Mexico, New
York, North Carolina, Ohio, South Carolina, Tennessee, Texas, Virginia, and Washington.
By 2012, Arkansas, Maryland, Oklahoma, and Pennsylvania might also use exit exams,
bringing the total to 27 states (Education Commission of the States 2008).

Purpose
A macrolevel problem with state-mandated high school exit exams is that the reported
results from all statewide tests of academic skills and knowledge contain inherent tech-
nical flaws that should preclude them from being used as the only data point or as the
deciding factor to make high-stakes decisions about individual students, such as whether a
student qualifies for high school graduation (American Educational Research Association
[AERA], American Psychological Association [APA], and National Council on Measure-
ment in Education [NCME] 1999; Joint Committee on Testing Practices [JCTP] 2004). The
technical qualities of the reported test results for individual students do not support the
potential negative social and educational consequences that result from their use as a
diploma gatekeeper. Unintended social and educational consequences of high school exit
exams can include retention of students in a particular grade level, increased chances of
economically disadvantaged students not completing high school (Borg, Plumlee, and
Stranahan 2007), placement of students in low-level course sequences (which increases the
chances of not completing high school), disallowance of a standard high school diploma
to particular students, or denial of graduation for some students (e.g., Booher-Jennings
2005; Burch 2006). Each of the potential consequences—and this list is not complete—costs
society more money in the long term because of the depressed earnings of those who do
not attain a high school diploma. Depressed earnings result in depressed tax receipts, and
are also associated with higher public medical costs, greater rates of incarceration, and
greater use of the welfare system (Levin 2009).

Test score validity takes center stage in the debate over high school exit exams when
validity is discussed in the context of whether the interpretation of a single test score is
an appropriate measure of an individual’s high school achievement (i.e., traditional con-
struct validity). The traditional view of validity as three distinct categories of construct,
content, and criterion is ill-suited to explain the potential negative social and education
consequences of test score misinterpretation. Messick (1988; 1995; 1996) called for a view
of validity that integrated criteria and content, with intended and unintended conse-
quences within the construct validity framework. Messick (1995) placed the intended
and unintended social and educational consequences of test score interpretation as an
aspect of construct validity, and not as its own category of validity. The integrated view
of construct validity allows school administrators and policymakers to consider social
and education consequences within the validity discussion.

The specific problem centers on one technical characteristic associated with the con-
struct validity of using large-scale exit exam results as the determining factor to allow a
student to graduate from high school: conditional standard error of measurement (CSEM)
and its effect on individual test score interpretation. The reported results of individual
students might not be the actual or true score. The CSEM is an estimate of the amount of
error or the lack of precision one must consider when interpreting a test score at a specific
cut-point or proficiency level (Harville 1991). Think of it as the margin of error reported in
political polls (e.g., ± 5 points). The individual student-level results from every large-scale
state standardized test have a margin of error. The CSEM describes how large the margin of error is and how far the reported test results might differ from a student’s true score. The CSEM reflects the amount of scale-score imprecision of individual test scores. For
example, if a student receives a reported scale-score of 199, and there are ± 10 scale-score
points of CSEM, then the true score could be located somewhere within the range of 189
to 209. Furthermore, if that state’s proficiency cut-score is 200, then the student is rated
“not proficient” based on his or her reported score if the SEA does not account for CSEM in
some way in its proficiency calculations for individual students, even though the student
scored within the CSEM band. Students receive incorrect proficiency categorizations when
SEA personnel do not account for CSEM in the individual reported test scores. This is
especially troubling when the single test score determines whether a student can graduate
from high school or receive a standard diploma, as it currently does in 23 states.
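
To make the arithmetic of the example concrete, a brief sketch follows (in Python), using only the hypothetical numbers from the paragraph above: a reported score of 199, a cut-score of 200, and a CSEM of 10 scale-score points. It simply flags reported scores that fall below the cut-score but within one CSEM of it.

```python
# Illustrative sketch only; the numbers (reported score 199, cut-score 200,
# CSEM of 10) come from the hypothetical example in the text, not from any
# particular state's exit exam.

def csem_band(reported_score: float, csem: float) -> tuple[float, float]:
    """Return the range in which the true score plausibly falls (± one CSEM)."""
    return (reported_score - csem, reported_score + csem)

def classify(reported_score: float, cut_score: float, csem: float) -> str:
    low, high = csem_band(reported_score, csem)
    if reported_score >= cut_score:
        return "proficient"
    if high >= cut_score:
        # The reported score is below the cut, but the cut lies inside the CSEM
        # band, so the student's true score may in fact be at or above the cut.
        return "not proficient (within CSEM band of the cut-score)"
    return "not proficient"

print(csem_band(199, 10))      # (189, 209)
print(classify(199, 200, 10))  # not proficient (within CSEM band of the cut-score)
```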

If bureaucrats within SEAs and legislators do not provide policy relief for the CSEM
that exists in their test results, some percentage of students might be wrongly denied
a standard high school diploma when, in fact, they passed the exit exam. For example,
based on information from the latest technical manual, in 2009, about 9,500 New Jersey high school students scored within the SEM range at the proficiency cut-score on the spring administration of their mathematics exit exam (Tienken 2011). Likewise, approximately
54,000 students in California scored within the CSEM margin of error on their November
2006 LA exit exam (Tienken 2011). This happens in every state. The reported student-level
scores are not the true scores, yet SEA personnel make determinations about graduation
eligibility as if scores were error free.

Scant literature addresses the amount of CSEM present in the results of individual
students on high school exit exams. The purpose of this study was to add literature on
the amount of CSEM present in the mathematics and LA high school exit exams to help
school administrators understand the validity issues associated with exit exams.

Questions
Five questions guided the study: (1) How many states report the scale-score CSEM
at the proficiency cut-score point for the LA and mathematics sections of the high school
exit exam in their official technical documents as recommended in the Standards for Edu-
cational and Psychological Testing (AERA, APA, and NCME 1999)? (2) What is the size of
the reported CSEM at the proficiency cut-score point, in scale-score points, for the LA and
mathematics sections of each state’s exit exams? (3) How do SEAs attempt to remedy the
imprecision issues posed by CSEM on the interpretation of reported individual student
test scores? (4) How many students are potentially affected (denied graduation) due to the
presence of CSEM? (5) How aligned are state practices regarding the treatment of CSEM
to standards of educational testing?

The results of this study add needed literature on this topic, and provide leaders le-
verage with which to advocate for policy adjustments. Education policy and high-stakes
testing continue to take shape at the federal level, and the informed discussion of CSEM
should be a priority topic.

Research and Literature on High School Exit Exams


The purpose of the literature review is to give the reader an overview of the issue. An
initial Internet search was conducted to explore the literature on the topic of exit exams
and CSEM. The search terms included measurement error and high school exit exam, and
exit exam and conditional standard error of measurement. The exploratory search produced
three types of results: (1) non-empirical, advocacy literature, (2) empirical literature,
and (3) psychometric technical documents and related standards for testing. The results
divided into writings that advocated for the use of exit exams, writings that opposed
their use, and psychometric and technical standards and recommendations for the ap-
propriate use of test results. Administrators looking for coherent answers about the ef-
ficacy of exit exams are hard-pressed to find a consistent message in the non-empirical
and empirical literature, whereas technical psychometric standards provide concrete
guidance and recommendations to guide administrators’ initial judgments about their
state’s exit exam program.

Non-Empirical Literature
The non-empirical literature ranges from advocacy, faux policy briefs, and editorials
published by think-tanks and pseudo-scientists who support the practice (e.g., Freedman
2004; Greene and Winters 2004; Hanushek and Welch 2006; Achieve, Inc. 2008; Education
Commission of the States 2008), to opposition advocacy (e.g., Neill 1997; Ohanian 2001;
FairTest.org 2008). Advocates’ arguments for high school exit exams focus on six areas.
Exit exams (Amrein and Berliner 2002):

• provide a measure of quality control for the high school diploma;
• motivate students and teachers to do their best;
• represent important content;
• foster equal education opportunities for all students;
• provide accountability for students and teachers; and
• provide a reliable measure of individual achievement.

Opponents of the practice state that the exams (Warren, Jenkins, and Kulick, 2006):

• lack construct validity;
• are not suitable to make high-stakes decisions about individual students or
teachers;
• penalize minorities, special education students, and English language learners;
• increase the number of dropouts; and
• increase teachers’ use of gaming strategies, such as test preparation and narrow-
ing of the curriculum.

Though the layperson’s literature might not rise to the level of empirical research as
defined by Haller and Kleine (2001), it has influenced education policy (e.g., Goals 2000
[see U.S. Congress 1994], NCLB, and Achieve, Inc. and its American Diploma Project). This
literature contains little discussion about the CSEM.

Empirical Literature
The review of empirical research included a search of three databases—EBSCO, OVID
SP, and ERIC; and six refereed education research journals—American Educational Research
Journal, Educational Researcher, Educational Evaluation and Policy Analysis, Review of Education
Research, Review of Research in Education, and the Education Policy Analysis Archives—for
literature on CSEM issues and high school exit exams. There were 53 peer-reviewed articles
with the terms high school exit exam. When a search with the terms conditional standard error
of measurement and exit exam was performed, not a single peer-reviewed article was found
that reported the actual CSEM present in high school exit exams or directly reported on
the influence of CSEM on interpretation of the results. Three mutually exclusive common
themes about exit exam influences on achievement and graduation rates surfaced. High
school exit exams (1) improve overall achievement and graduation rates (Stringfield and
Yakimowski-Srebnick 2005); (2) suppress overall achievement and graduation rates, and
have negative unintended consequences, especially for minorities (Lee and Wong 2004;
Hursh 2007; Vasquez Heilig and Darling-Hammond 2008); or (3) provide mixed, uneven,
or inconsistent results (Clarke et al. 2003; Allensworth 2005; Dee and Jacob 2006).

Standards for Education Testing


Authors of the Standards for Educational and Psychological Testing (AERA, APA, and
NCME 1999) and the Code of Fair Testing Practices in Education (JCTP 2004) presented spe-
cific standards and recommendations for test developers, test takers, and those who use
test results to make decisions about children. The standards and recommendations cover
test construction, fairness in testing practices, appropriate documentation of technical
characteristics of tests, and other related topics. Both publications make specific recom-
mendations for how to address CSEM in the context of high-stakes testing. The Standards
was chosen as this study’s benchmark for appropriate reporting practices instead of the
Code because three of the largest organizations (in terms of membership) associated with
testing produced the Standards (AERA, APA, and NCME 1999). They provided specific
guidance for developers and users of high-stakes testing programs. There are many
overlaps between the two because the working group that produced the Code included
members of the three Standards organizations, and many recommendations contained in
the Code are included in the Standards.

Specific statements related to construct validity, as defined by Messick (1995; 1996), and
measurement error appear in Part I and Part III of the Standards (AERA, APA, and NCME
1999). For example, the authors of the Standards (AERA, APA, and NCME 1999, 27) stated:

Measurement error reduces the usefulness of measures. It limits the extent to which test results can be generalized beyond the particulars of a specific application of
the measurement process. Therefore, it reduces the confidence that can be placed in any
single measurement.

The authors recommended that error and its sources be reported: “The critical information
on reliability includes the identification of the major sources of error, summary statistics
bearing on the size of such error” (AERA, APA, and NCME 1999, 27). Also, “Precision
and consistency in measurement are always desirable. However, the need for precision
increases as the consequences of decisions and interpretations grow in importance” (AERA,
APA, and NCME 1999, 30). They (i.e., the AERA, APA, and NCME 1999, 30) explained
why test developers and users (i.e., SEA personnel) must report the CSEM at the cut-score
levels of their tests:

Mismeasurement of examinees whose true scores are close to the cut score is a
more serious concern. The techniques used to quantify reliability should recognize these
circumstances. This can be done by reporting the conditional standard error in the
vicinity of the critical value.

Several standards for test score precision in high-stakes contexts exist (AERA, APA,
and NCME 1999), which policymakers and school administrators can use to guide high-
stakes testing policy and decision-making. Table 1 includes the applicable macro-standards,
statements, and paraphrased recommendations. Authors of the Standards (AERA, APA, and
NCME 1999, 139) provided overall guidance on interpretation and score precision: “The
higher the stakes … the more important it is that the test-based inferences are supported
with strong evidence of technical quality.”

Table 1. Standards for Educational and Psychological Testing Related to Test Score Precision and Conditional Standard Error of Measurement

Standard 2.2. Statement: “The standard error of measurement, both overall and conditional … should be reported … in units of each derived score” (31). Recommendation: The CSEM is important in high school exit exam situations due to the consequence of imprecision.

Standard 5.10. Statement: “[T]hose responsible for the testing programs should provide appropriate interpretations. [They] should describe … the precision of the scores, common misinterpretations of tests scores” (65). Recommendation: Score precision should be illustrated by error bands or potential score ranges for individual students and should show the CSEM.

Standard 6.5. Statement: “When relevant for test interpretation, test documents ordinarily should include item level information, cut scores … the SEM” (69). Recommendation: The SEM should be reported.

Standard 7.9. Statement: “When tests or assessments are proposed for use as instruments of social, educational or public policy … users … should fully and accurately inform policy-makers of the characteristics of the tests” (83). Recommendation: Precision is an important issue. … Users should report the amount of error present in scores.

Note. Source: American Educational Research Association, American Psychological Association, and National Council on Measurement in Education (1999). CSEM = conditional standard error of measurement.

Though the non-empirical and empirical literature on high school exit exams is char-
acterized by multiple perspectives, contradictory research results, and worn-out slogans,
the Standards (AERA, APA, and NCME 1999) provided guidance about the influence
of CSEM on construct validity related to the potential negative social and educational
consequences for children. The Standards clearly called for recognizing CSEM as a factor
that affects score interpretation of individual results and affects the test’s usefulness as a
valid measure of high school achievement (e.g., construct validity, as defined by Messick,
1995; 1996).

Theoretical Perspectives for High School Exit Exams


Advocates of high school exit exams generally harvest policy frameworks from ra-
tional choice theory and behaviorist theories of cognitive development. The frameworks
are operationalized via state education policies that use positive reinforcement and nega-
tive reinforcement, also known as carrots and sticks. Bryk and Hermanson (1993) termed
this an instrumental use model. The theory is that a policy body develops a set of expected
education outcome measures (e.g., state standards); monitors the relationship between the
measures and school processes, usually through high-stakes testing; and then implements
rewards or sanctions to change behavior through external force to maximize performance.
The measures rest on arbitrary achievement proficiency levels and external control. For
example, advocates of exit exam policies postulate that high-stakes exit exams cause
students and teachers to work harder and achieve more because the tests create teaching
and learning targets that have perceived meanings to both groups. In other words, they
will make a rational choice to work harder to prepare for the exams. Of course, there is
an underlying assumption in the theoretical framework that teachers and students do not
work hard and, therefore, need external motivators to improve. Another example includes
the threats from SEAs to withhold funding for poor performance on high-stakes tests to
compel school personnel to work harder because they do not want to lose funding. A
similar version is the use of public castigation via the press and ratings, and rankings of
districts by SEA personnel to spur educators to work harder to achieve outcomes.

Conversely, exit exam opponents derive theoretical guidance from an enlightenment model based on self-determination theory (Laitsch 2006). Creators of an assessment system
based on an enlightenment model seek to foster greater discussion, study, and reflection
of education practices based on the indicators of the assessment system. Standardized
tests still play a part, but their uses and interpretations are different from those within an
instrumental use model.

Design and Method


A non-experimental, descriptive, cross-sectional design (Johnson 2001) was used for
this research because the purpose of the research was to (1) investigate the type of CSEM
information SEAs currently report and the policy remedies in place to address the CSEM
present in individual test scores, (2) identify the number of students potentially denied
graduation as a result of CSEM, and (3) report results of the investigation. An Internet
search was made of SEA Web sites for the mathematics and LA exit exam technical manuals
of the 23 states that used high school exit exams in 2008. The “search” function on each
SEA site and appropriate key word descriptors (i.e., testing and accountability, exit exam,
high school exit exam, exit exam and mathematics, exit exam and LA, technical manual and LA,
technical manual and mathematics, technical report and mathematics, technical report and LA,
and technical manual and exit exam) were used to locate the technical manuals. If a manual
was not posted on the SEA Web site, a formal e-mail request was sent to the SEA test-
ing coordinators, and another e-mail was sent after two weeks if no reply was received.
Technical manuals are supposed to be available in the public domain.

The search occasionally led to several exit exam technical manuals for each subject.
If there were exit exams for multiple high school grades, the most recent manual at the
grade level closest to 12th grade was chosen. For example, if a state included Algebra I
and Algebra II exams as exit exams, the Algebra II exam was chosen because of the as-
sumption that the Algebra II exam would more closely represent the higher level of high
school math attainment.

After locating a technical manual for a high school exit exam, the “find” function
was used with the search terms standard error and conditional standard error. Sometimes
an SEA did not report the CSEM for the proficiency cut-score. In that case, a search was
conducted for the reported SEM at each scale-score point along with the just-proficient
scale-score. When the just-proficient scale-score—the lowest score a student could achieve
to be considered proficient—was located, the SEM reported for that scale-score as the
CSEM was recorded. In the cases when the SEM was not reported for the scale-scores,
but reported for raw scores, cross-referencing of the just-proficient raw score with its
corresponding scale-score was performed. Then, the researchers cross-referenced with
raw scores at one raw score point above the just-proficient score and one raw score point
below the just-proficient raw score with their corresponding scale-scores, calculated the
difference between the just-proficient and not-proficient scale-scores, and used that as
the estimated CSEM.
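
The cross-referencing step can be sketched briefly; the raw-to-scale conversion table below is a hypothetical placeholder standing in for the tables published in state technical manuals, and the function reflects one reading of the procedure described above.

```python
# Hypothetical raw-score-to-scale-score conversion table; real tables differ
# by state and appear in each state's technical manual.
raw_to_scale = {28: 196, 29: 199, 30: 200, 31: 202}  # 30 = just-proficient raw score

def approx_csem_at_cut(conversion: dict[int, int], just_proficient_raw: int) -> int:
    """Approximate the CSEM at the cut as the scale-score distance between the
    just-proficient raw score and the raw score one point below it (one reading
    of the procedure described above)."""
    return conversion[just_proficient_raw] - conversion[just_proficient_raw - 1]

print(approx_csem_at_cut(raw_to_scale, 30))  # 1 scale-score point in this toy table
```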

The CSEM, in scale-score points for LA and mathematics, was recorded. In some
cases, SEA personnel reported CSEM or SEM values for reading and writing separately.
Only the reading CSEM was recorded under the heading of LA because most questions
and scale-score points for the LA portion of the exit exam came from the reading section.
When states offered multiple testing opportunities, the CSEM was taken from the first
test administration. The number of testing opportunities was located by searching the
graduation requirements found on each SEA Web site. When SEA personnel scheduled
two testing cycles—for example, a retake in October and a retake in March—the CSEM
for the October test was used.

Two methods were used to calculate the number of students potentially denied gradu-
ation because of inaccurate categorizations of students due to CSEM. None of the states
reported this information directly. The first method was a simple calculation of how many
students scored within the CSEM range below the proficiency cut-score. For example, in
New Jersey, the proficiency scale-score cut-point is 200, and the CSEM is approximately
10 scale-score points. After locating the frequency distribution of student scores, the frequency counts of students who scored within 10 scale-score points below the cut-point were summed. For
states that did not provide frequency data, the number used was derived by multiplying
the reported classification accuracy estimate by 34 percent of the student population who
took the test. Some states reported estimates for the accuracy of their proficiency clas-
sification. Thirty-four percent of the population was used because that represents the ap-
proximate percentage of students, on average, who scored within the CSEM range below
the cut-point. That percentage was determined by taking the percentage of students who
scored within one CSEM around the cut-score (68 percent, on average) and halving that to
account for only the students who scored below the cut-point with the CSEM. This figure
is tentative, given the lack of data provided by some state SEAs, and necessitated using
a method to approximate the number of students affected. However, it does provide a
useful approximation to help gain a sense of the potential magnitude of the problem.
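
For readers who want to trace the two approximations, a minimal sketch follows. The frequency counts, classification-accuracy figure, and test-taker total are hypothetical placeholders, and the second function follows the approximation exactly as described above.

```python
# Method 1: count students whose reported scores fall within one CSEM below the
# proficiency cut-score. The frequency table here is hypothetical.
def affected_from_frequencies(score_freq: dict[int, int], cut: int, csem: float) -> int:
    return sum(n for score, n in score_freq.items() if cut - csem <= score < cut)

# Method 2: when no frequency data are published, follow the approximation
# described above: classification-accuracy estimate x 34 percent of test takers.
def affected_from_accuracy(classification_accuracy: float, n_test_takers: int) -> float:
    return classification_accuracy * 0.34 * n_test_takers

freqs = {197: 310, 198: 295, 199: 330, 200: 340}           # hypothetical counts
print(affected_from_frequencies(freqs, cut=200, csem=10))  # 935 in this toy example
print(affected_from_accuracy(0.90, 100_000))               # 30600.0
```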

Results
Table 2 lists the name of each state, the most recently reported or approximate CSEM at
the proficiency cut-point for the LA and mathematics portions of the high school exit exam,
the number of opportunities to take and pass the exam, the method used to address the
presence of CSEM, whether the state used a “hard and fast” cut-score (e.g., did not allow
for a range of scores at the proficiency cut-point), and the number of students potentially
affected by CSEM. It was possible to locate or calculate CSEM at the proficiency cut-score
for 18 out of 23 high school exit exams. Approximately 21 percent, or 5 out of 23, of the
SEAs did not report CSEM or provide enough information to permit a calculation. The
SEAs that did not report CSEM also did not post technical manuals on their Web sites, or
the manuals did not provide any usable information to calculate CSEM. Testing personnel
from two states did not respond to e-mail requests for information.

The CSEM at the proficiency cut-point for LA and mathematics ranged from
3.24 on the Idaho mathematics exit exam to 39 scale-score points on the Texas mathematics
exit exam. That means that the Texas students’ true math score can be ± 39 points from
the reported test score. The actual size of the error is less of a concern in this case because
each state uses a hard-and-fast cut-score. Even one scale-score point of CSEM can cause
misinterpretation and mis-categorization of student performance.

Every SEA provided at least two opportunities for students to take and pass the high
school exit exam. The mode was three testing opportunities. None of the SEAs reported
policies that averaged students’ scores from multiple testing opportunities to form a single
score on a specific test (e.g., LA or math) to determine proficiency. Eleven SEAs posted
technical manuals for tests administered in 2007, and nine SEAs posted only technical
manuals for tests administered prior to 2007. Almost 60 percent of the SEAs did not provide
information about their accounting for CSEM in individual student test results. Only one
SEA included a visual CSEM band on student reports. None of the SEAs accounted for
the CSEM by awarding the student the theoretical higher score, the score at the top end
of the CSEM band, even though SEA personnel knew CSEM existed.

Table 2. Reported Conditional Standard Error of Measurement in Scale-Score Points for the Language Arts and Mathematics Portions of State High School Exit Exams and Number of Testing Opportunities

State/Year | LA CSEM | Math CSEM | Testing opportunities | Account for CSEM in scores | Report CSEM band on student scores | Students potentially affected by CSEM at proficient cut (LA/Math)
Alabama | DNR | DNR | — | Established by incorporating the CSEM | No | DNR
Arkansas/2007 | 19 | 19 | 3 | — | — | 412/937
Arizona/2007 | 13 | 8 | 3 | No | No | 4,906/1,907
California/2007 | 14 | 18 | 7 | SEA stated that cut-scores are set to account for CSEM | No | 54,000/54,000
Florida/2006 | 19 | 8 | 3 | — | — | 6,348a/10,006
Georgia/2007 | 9 | 5 | 3 | — | — | —
Idaho/2007 | 3.29 | 3.24 | 3 | No | No | 718/1,226
Indiana/2006 | DNR | DNR | 3 | — | — | DNR
Louisiana/2006 | 3.54 | 3.98 | 3 | — | — | —
Massachusetts/2007 | ~9 | ~9 | 3 | No | Yes | 8,189/6,206
Minnesota/2007 | 14 | 12 | 3 | No | No | 5,435/—
Missouri/2007 | ~8 | ~9 | >1 | No | Yes | 1,926a/2,038a
Nevada/2007 | 26 | 33 | >3 | — | — | —
New Jersey/2006 | ~9 | ~9 | — | — | — | 9,500a/9,500
New Mexico/2006 | 10 | 11 | 2 | No | No | 1,664/1,897
New York/2006 | DNR | DNR | 2 | — | — | —
North Carolina | — | — | 3 | — | — | —
Ohio/2006 | ~8.59 | ~10.02 | 5 | — | — | —
South Carolina/2004 | 5.6 | 5.5 | 3 | No | No | 995a/1,079a
Tennessee/2007 | — | — | 3 | No | Yes | —
Texas/2007 | 32 | 39 | 3 | — | — | 19,183/19,058
Virginia/2004 | 24 | 17 | — | — | — | 1,212/1,445
Washington/2007 | 8.86 | 9.93 | 3 | — | — | 3,623/5,092

Note. N = 23. CSEM = conditional standard error of measurement; LA = language arts; DNR = did not report; SEA = state education agency.
a Number of students potentially mis-categorized was calculated by hand.

Nationwide, an estimated 118,111 students were potentially denied a passing score on their state’s high school exit exam in LA, and an estimated 114,391 students were denied a passing score on their state’s exit exam in mathematics the first time they took the exam. Keep in mind that these figures represent data from only 18 states. All states administer statewide high school exams in LA and mathematics as mandated by the NCLB Act.
Though only 23 out of 50 states used the exams to determine graduation in 2008, there
are other issues, as stated earlier, that arise for students based on the results of the non-
exit exams administered in the other 27 states. Admittedly, the calculations of students
potentially affected by CSEM are tentative, but they represent the best estimate possible
given the lack of data provided by SEAs on this subject. The fact remains that students
are being mis-categorized and subjected to inappropriate education decisions that carry
high stakes for them and their families based on either an inability or unwillingness of
their SEA to address this issue. SEAs are encouraged to respond with more exact figures
to bring transparency and clarity to this topic.

Not all SEAs adhere to the standards, best practices, and recommendations regarding
CSEM advocated in the Standards for Educational and Psychological Testing (AERA, APA,
and NCME 1999). For example, Standard 2.2 (AERA, APA, and NCME 1999, 31) reads:
“The standard error of measurement, both overall and conditional … should be reported
… in units of each derived score.” Nearly 25 percent of the states did not report CSEM.
When SEA personnel choose not to report the CSEM (Standard 2.2), it creates a snow-
ball effect of Standards violations. “When test score information is released to parents …
those responsible for the testing programs should provide appropriate interpretations.
The interpretations should describe in simple language … the precision of the scores and
common misinterpretations of tests scores” (AERA, APA, and NCME 1999, 65, Standard
5.10). “When relevant for test interpretation, test documents ordinarily should include
item level information, cut-scores … the standard errors of measurement” (AERA, APA,
and NCME 1999, 69, Standard 6.5).

The lack of transparency and lack of psychometric professionalism calls into question the
overall quality of a state’s entire testing program, and raises concerns about the construct
validity, as defined by Messick (1995; 1996), to include the social and educational consequences
of high-stakes testing. Messick (1995) included the potential intended and unintended nega-
tive social and education consequences of test score misuse and misinterpretation as part of
construct validity. He cautioned that people who create and use high-stakes tests and their
results should weigh the possible intended and unintended consequences to children before
enacting the policies of a proposed testing program. The integrated view of construct validity
allows school administrators and policymakers to consider social and education consequences
in the validity discussion, and potentially make more informed policy decisions.

School administrators, students, and parents in states where SEAs do not report CSEM
or publish technical manuals in the public domain have no way to judge the precision of
reported individual test results, and they are limited in their attempts to appeal the results.
Administrators also are handcuffed in their attempts to lobby for a fair testing system
that protects students and minimizes unintended social and educational consequences.
How can school administrators initiate policy remedies for problems they do not know
exist? All large-scale standardized tests contain CSEM; and administrators, parents, and
policymakers need to be aware of the size of the CSEM. What do the testing personnel in
the SEA have to hide? Where is the institutional accountability, and where are procedural
safeguards for children?


One purpose of this study was to add literature on the topic of CSEM and exit ex-
ams, not to cast aspersions on the entire practice of exit exam testing. That would be unwarranted. Heubert and Hauser (1999, 276) wrote, “Blanket criticisms of tests are not
justified. When tests are used in ways that meet relevant psychometric, legal, and edu-
cational standards, students’ scores provide important information that, combined with
information from other sources, can lead to decisions that promote student learning and
equality of opportunity.”

Is CSEM a real concern for students? Yes, according to the leadership of APA, AERA,
NCME, and JCTP, and individuals in the field of educational testing, like Messick (1995;
1996) and Koretz (2008), because of the unintended consequences that CSEM produces if
SEA personnel do not report it and if SEAs do not account for it through policy remedies.
Even a small amount of CSEM can have severe consequences for students when SEA
personnel do not account for it and, instead, simply require students to achieve a set cut-
score to demonstrate proficiency (Koretz 2008), as do states in this study.

Because high school exit exams and CSEM are nationwide phenomena, perhaps
hundreds of thousands of students might have been negatively affected in the
NCLB era by what appears as inaction at the state and national levels to develop policy
remedies aligned with standards and recommendations for appropriate testing practices.
As stated in the Standards, “Measurement error reduces the usefulness of measures. …
[I]t reduces the confidence that can be placed in any single measurement” (AERA, APA,
and NCME 1999, 27).

Nearly all the states with exit exams, 21 out of 23, provided a basic policy remedy to
help account for CSEM: multiple testing opportunities. While this seems like a positive
approach, it does not overcome the issue, but simply shifts the CSEM to another test
(Koretz 2008) and does not account for it in the interpretation phase.

Policy Remedies
One appropriate policy remedy is for SEAs to keep their current number of testing op-
portunities, but report scores with the CSEM band and award the higher score to the student
(i.e., student’s reported score plus the CSEM at the proficiency cut-point). This increases the
transparency of the process and helps with score interpretation because the SEAs would
formally recognize the CSEM on the individual score reports. This policy would help to
ameliorate the potential negative social and educational consequences to students when
there is no accounting for the CSEM. The score advantage should always go to the student
in the high-stakes situation because of the inherent uncertainty and imprecision of the re-
ported test results (APA, AERA, and NCME 1999). An advantage of this recommendation
is that the SEAs do not have to change the testing cycle or incur additional costs.
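
A minimal sketch of this remedy follows, assuming the reported score, the CSEM at the proficiency cut-point, and the cut-score are all known; the numbers echo the New Jersey figures cited earlier (a cut-score of 200 and a CSEM of roughly 10 scale-score points) and are illustrative only.

```python
# Sketch of the first remedy: report the CSEM band and award the student the
# score at the top of the band (reported score plus the CSEM at the cut-point).
def adjusted_score(reported: float, csem_at_cut: float) -> float:
    return reported + csem_at_cut

def passes_with_remedy(reported: float, csem_at_cut: float, cut: float) -> bool:
    return adjusted_score(reported, csem_at_cut) >= cut

# Illustrative values (cut-score 200, CSEM ~10, as in the New Jersey example):
print(passes_with_remedy(reported=193, csem_at_cut=10, cut=200))  # True
print(passes_with_remedy(reported=189, csem_at_cut=10, cut=200))  # False
```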

Another approach is for states to allow students additional testing opportunities during their high school years (e.g., up to four times) and up to two times within one year
after completing their high school credit requirements; to report scores with the CSEM
band; and to award the higher score to the student. This recommendation simply adds
the reporting and awarding of the CSEM to the individual score of that test. While it does
cost more to allow more testing opportunities, it costs even more in the long run to retain
students or deny them a high school diploma because of score error. It does not cost more
to award the CSEM to the reported score. Though it does reduce the effect of CSEM, the
major weakness of this policy change is that it simply shifts the CSEM from one test to
another test. Other weaknesses are the short-term costs and logistical issues associated
with additional testing. Strengths of this recommendation are that it gives students more
time to prepare for the assessment and provides multiple practice opportunities. Includ-
ing the CSEM in the student’s score and awarding the score at the top end of the CSEM,
along with additional testing opportunities, would provide one procedural safeguard to
lessen the unintended consequence of students not being awarded a high school diploma
due to CSEM precision issues. “Precision and consistency in measurement are always
desirable. However, the need for precision increases as the consequences of decisions and
interpretations grow in importance” (APA, AERA, and NCME 1999, 30).

A third policy remedy is to provide students unlimited testing opportunities and to


average their scores because the average of multiple scores represents student perfor-
mance better than just one score (Koretz 2008). Unlimited testing opportunities can be
accomplished via a validated online testing system. The psychometric reliability of the
reported scores for a student increase with the number of times the student takes the test.
The averaging process helps to erode the effect of CSEM. While some might argue that
this recommendation does not really help the student, it does provide for psychometric
honesty and greater score reliability.
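
The statistical rationale for averaging can be sketched briefly. Under the textbook assumption of independent administrations with comparable measurement error, the standard error of the mean of n attempts is roughly the single-administration SEM divided by the square root of n; the scores below are hypothetical.

```python
from math import sqrt

# Averaging multiple attempts: the mean is a steadier estimate of the student's
# true score because, under the assumption of independent attempts with similar
# error, the standard error of the mean shrinks by a factor of sqrt(n).
def mean_score(scores: list[float]) -> float:
    return sum(scores) / len(scores)

def sem_of_mean(single_admin_sem: float, n_attempts: int) -> float:
    return single_admin_sem / sqrt(n_attempts)

attempts = [196, 203, 201]             # hypothetical scores from three attempts
print(round(mean_score(attempts), 1))  # 200.0
print(round(sem_of_mean(10, 3), 1))    # 5.8
```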

A fourth option is to combine the first three recommendations: (1) allow additional
testing opportunities up to one year after students satisfy high school credit requirements,
(2) report the CSEM for individual scores, (3) award the student the reported score plus
the CSEM at the proficiency cut-point, and (4) average the scores from all testing op-
portunities. This option seeks to address CSEM on multiple fronts, is psychometrically
sound, and goes the furthest to mitigate the potential negative and social consequences
associated with the interpretation of individual results.

Closing Thoughts
It is important to keep in mind that many countries around the world engage in
standardized testing to sort students into academic and vocational tracks, gatekeeping
some students from accessing the total curriculum, and overtly managing their country’s
current social-class structures and human capital. For example, sorting students in Europe
via standardized testing is a leftover cultural practice from the days of the aristocracies
and restrictive class systems. The practice of sorting and gatekeeping is generally cultur-
ally accepted in other parts of the world, such as in some Asian countries, but it is not
necessarily part of the democratic culture and ideals in the United States. One view of
education in the United States derives from the Jeffersonian idea of public school as a tool
to level the playing field between the haves and have-nots. Those who hold a democratic,
Jeffersonian view of education reject sorting systems, whether those systems are overt or
covert. After all, the public school system in the United States is the only social institution
that allows democratic values to be passed on to the next generation, and it is the only
institution with the ability to socialize all Americans, citizens and immigrants alike, to the democratic ideas of the country (Commission on the Reorganization of Secondary Education 1918).

Mechanisms or gates that interfere with that function can create potential threats to
the long-term democratic health of the republic. Consider that the education system is
usually one of the first institutions to be “remade” after coups, jihads, revolutions, or in-
vasions and occupations. For example, Mikhail Gorbachev had the history curriculum in
the Soviet Union changed to be more reflective of negative aspects of previous communist
regimes in order to operationalize his vision of glasnost (transparency) and help to move
the Soviets out from behind the Iron Curtain. Education cannot be separated from the
larger governmental context. It is difficult to evolve a democracy if the education system
relies on undemocratic methods and policies.

The larger policy question remains, and is not explored in this study: Given what
is known about the effect of CSEM on score interpretation and the high-stakes negative
social and educational consequences to students associated with score imprecision, do
the ends (e.g., the mythical standardizing of the high school diploma) justify the means
of using high-stakes tests with known technical flaws that affect score interpretation?
Children do not have a seat at the decision-making table. Policy must speak for them.
Adults make policy.

I thank Gabriella and Francesca, and acknowledge the doctoral candidates in Seton
Hall University’s Cohort XI for their feedback regarding this manuscript.

References
Achieve, Inc. 2008. Closing the expectations gap. Washington, DC: Author. Available at: www.
achieve.org/files/50-state-2008-final02-25-08.pdf.
Allensworth, E. M. 2005. Dropout rates after high-stakes testing in elementary school: A
study of the contradictory effects of Chicago’s efforts to end social promotion. Edu-
cational Evaluation and Policy Analysis 27(4): 341–64.
American Educational Research Association, American Psychological Association, and
National Council on Measurement in Education. 1999. Standards for educational and
psychological testing. Washington, DC: American Educational Research Association.
Amrein, A. L., and D. C. Berliner. 2002. High-stakes testing, uncertainty, and student
learning. Education Policy Analysis Archives 10(18): 1–74. Available at: http://epaa.asu.
edu/epaa/v10n18.
Booher-Jennings, J. 2005. Below the bubble: “Education triage” and the Texas account-
ability system. American Education Research Journal 42(2): 231–68.
Borg, M. O., J. P. Plumlee, and H. A. Stranahan. 2007. Plenty of children left behind. Edu-
cational Policy 21(5): 695–716.
Bryk, A. S., and K. L. Hermanson. 1993. Educational indicator systems: Observations on
their structure, interpretation, and use. Review of Research in Education 19(1): 451–84.
Burch, P. 2006. The new educational privatization: Educational contracting and high stakes
accountability. Teachers College Record 108(12): 2582–610. Available at: www.tcrecord.
org/content.asp?contentid=12259.
Clarke, M., A. Shore, K. Rhoades, L. Abrams, J. Miao, and J. Li. 2003. State-mandated test-
ing programs on teaching and learning: Findings from interviews with educators in low-,
medium-, and high-stakes states. Boston, MA: National Board on Testing and Public
Policy, in conjunction with Boston College, Lynch School of Education.
Commission on the Reorganization of Secondary Education. 1918. Cardinal principles of
secondary education, Bulletin No. 35. Washington, DC: U.S. Bureau of Education.
Committee on Testing and Basic Skills. 1978. Improving educational achievement. Washington,
DC: National Academy of Education.
Dee, T. S., and B. Jacob. 2006. Do high school exit exams influence educational attainment or labor
market performance?, NBER Working Paper No. W12199. Washington, DC: National
Bureau of Economic Research. Available at: http://ssrn.com/abstract=900985.
Education Commission of the States. 2008. State notes: Exit exams. Denver, CO: Author.
Available at: http://mb2.ecs.org/reports/Report.aspx?id=1357.


European glossary on education. 2004. Vol. 1: Examinations, qualifications and titles, 2nd ed.
Brussels, Belgium: Eurydice.
FairTest.org. 2008. Why graduation tests/exit exams fail to add value to high school diplomas.
Jamaica Plain, MA: Author. Available at: www.fairtest.org/gradtestfactmay08.
Freedman, M. K. 2004. The fight for high standards. Hoover Digest 2004(3). Available at:
www.hoover.org/publications/digest/3020841.html.
Greene, J. P., and M. A. Winters. 2004. Pushed out or pulled up? Exit exams and dropout rates
in public high schools. New York, NY: Manhattan Institute for Policy Research. Avail-
able at: www.manhattan-institute.org/html/ewp_05.htm.
Haller, E. J., and P. F. Kleine. 2001. Using educational research: A school administrator’s guide.
New York, NY: Longman.
Hanushek, E. A., and F. Welch. 2006. Handbook of the economics of education. St. Louis, MO:
Elsevier.
Harville, L. M. 1991. Standard error of measurement. Educational Measurement: Issues and
Practices 10(2): 33–41.
Heubert, J. P., and R. M. Hauser. 1999. High stakes: Testing for tracking, promotion, and gradu-
ation. Washington, DC: National Academies Press.
Hursh, D. 2007. Assessing No Child Left Behind and the rise of neoliberal education poli-
cies. American Educational Research Journal 44(3): 493–518.
Johnson, R. B. 2001. Toward a new classification of nonexperimental quantitative research.
Educational Researcher 30(2): 3–13.
Joint Committee on Testing Practices. 2004. Code of fair testing practices in education. Wash-
ington, DC: American Psychological Association.
Koretz, D. 2008. Measuring up: What educational testing really tells us. Cambridge, MA:
Harvard University Press.
Laitsch, D. 2006. Assessment, high stakes, and alternative visions: Appropriate use of the right
tools to leverage improvement. Tempe, AZ: Education Policy Research Unit. Available
at: http://epsl.asu.edu/epru/documents/EPSL-0611-222-EPRU.pdf.
Lee, J., and K. K. Wong. 2004. The impact of accountability on racial and socioeconomic
equity: Considering both school resources and achievement outcomes. American
Educational Research Journal 41(4): 797–832.
Levin, H. 2009. The economic payoff to investing in educational justice. Educational Re-
searcher 38(1): 5–20.
Matriculation Examination Board. 2009. Finnish matriculation examination 2007. Sastamala,
Finland: Author. Available at: www.ylioppilastutkinto.fi/Tilastoja/Matriculation_07_web.
pdf.
Messick, S. 1988. The once and future issues of validity: Assessing the meaning and
consequences of measurement. In Test validity, eds. H. Wainer and H. I. Braun, 33–45.
Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Messick, S. 1995. Standards-based score interpretation: Establishing valid grounds for
valid inferences. Proceedings on the joint conference of Standard Setting for Large-
Scale Assessments, National Assessment Governing Board and National Center for
Educational Statistics, Washington, DC.
Messick, S. 1996. Validity of performance assessments. In Technical issues in large-scale performance assessment, ed. G. W. Phillips, 1–18. Washington, DC: National Center for
Educational Statistics.
Neill, M. 1997. Testing our children: A report card on state assessment systems. Jamaica Plain,
MA: FairTest.org. Available at: www.fairtest.org/states/survey.htm.
Ohanian, S. 2001. One size fits few: The folly of educational standards. Portsmouth, NH:
Heinemann.
Organization of Economic Cooperation and Development. 2008. Education at a glance 2008:
OECD indicators. Paris, France: Author.
Sanger, D. 1978. Is “competency” good enough? The New York Times, April 2.
Stringfield, S. C., and M. E. Yakimowski-Srebnick. 2005. Promise, progress, problems, and
paradoxes of three phases of accountability: A longitudinal case study of the Baltimore
City public schools. American Educational Research Journal 42(1): 43–75.
Tienken, C. H. 2011. Structured inequity: The intersection of socioeconomic status and the
standard error of measurement of state mandated high school test results. Accepted
for NCPEA yearbook, ed. B. Alford. Mill Valley, CA: Qoop Publishing and National
Council of Professors of Educational Administration. (in press)
United Nations Educational, Scientific, and Cultural Organization. 2003. Global education
digest 2003. Montreal, Quebec, Canada: UNESCO Institute for Statistics. Available at:
http://www.uis.unesco.org/ev.php?ID=5483_201&ID2=DO_TOPIC.
U.S. Congress. 2002. No Child Left Behind Act of 2001, Pub. L. No. 107–110. Washington,
DC: Author. Available at: www2.ed.gov/policy/elsec/leg/esea02/107-110.pdf.
U.S. Congress. 2004. Elementary and Secondary Education Act of 1965, Pub. L. No. 89-10.
Washington, DC: Author.
U.S. Congress. 1994. Goals 2000: Educate America Act, Pub. L. No. 103-227. Washington,
DC: Author.
Vasquez Heilig, J., and L. Darling-Hammond. 2008. Accountability Texas-style: The
progress and learning of urban minority students in a high-stakes testing context.
Educational Evaluation and Policy Analysis 30(2): 75–110.
Warren, J. R., K. N. Jenkins, and R. B. Kulick. 2006. High school exit examinations and
state-level completion and GED rates, 1975 through 2002. Educational Evaluation and
Policy Analysis 28(2): 131–52.
