
Second and Foreign Language Assessment
JAMES E. PURPURA
Teachers College, Columbia University
Program in Applied Linguistics and TESOL
Department of Arts and Humanities
525 West 120th St.
New York, NY 10027
Email: jp248@tc.columbia.edu

This article summarizes some of the main issues, concerns, and debates that have ensued over the years
in the field of L2 assessment and shows how past concerns have shaped contemporary L2 assessment
research and practice. The article first describes what L2 assessment is and what it entails, arguing that
notions of L2 assessment have been broadened over the years to keep pace with contemporary inter-
pretations of how information from assessments in large-scale or classroom contexts is used to make
decisions. It then describes in some detail four approaches to construct definition, showing how these
approaches have influenced what gets assessed. Finally, the paper discusses a range of other topics of
current and future importance for L2 assessment theory, research, and practice.
Keywords: L2 assessment; L2 testing; L2 validation; L2 construct definition; models of L2 proficiency

OVER THE YEARS, SOCIETAL, ECONOMIC, geopolitical, and technological forces in the workplace have increased the knowledge, skills, and abilities (KSAs) that people need to perform their jobs. We are now asked to read, listen, and synthesize large amounts of information from several sources via multiple modalities; search for information, judge its accuracy, and evaluate its applicability; and use communication technologies to collaborate in teams whose members represent a diverse global community (National Research Council, 1999, 2001). Importantly, many of us are asked to do this in a second, foreign, or heritage language (L2), requiring competencies for communicating ideas and establishing relationships in culturally respectful ways. Such demands have sparked the need to have disciplinary skills, along with linguistic skills, to communicate in complex ways across diverse contexts via multiple modalities. To succeed in this environment, L2 users must demonstrate that they have the skills needed to process information, reason from evidence, make decisions, solve problems, self-regulate, collaborate, and learn—and they need to do this in their L2. This reality has created performance demands for all aspects of education, but especially for assessment, where the public depends on assessments to do everything from diagnosing learning needs to verifying the achievement of standards in educational and professional settings.

To keep pace with changing workforce and educational demands, the field of L2 assessment has been tasked with determining whether L2 users can perform high-level skills in the L2, or whether they have acquired sufficient linguistic resources, along with other relevant competencies, to benefit from L2 and disciplinary instruction. The changing nature of work and education has also required L2 assessments to tap into a broader range of competencies over the years, competencies that measure more complex linguistic, sociocognitive, and sociocultural skills and, to some extent, the ability to engage affectively with topical content on assessments in new ways. Finally, these changes have led educators and testing experts to rethink the KSAs they aim to tap into and to reconsider how assessments might be
designed, developed, and validated before being used for decision making in a range of contexts.

Besides debating the theoretical components of L2 proficiency, L2 educators and testing experts have also capitalized on new communication and digital technologies (e.g., Skype, digital video clips) and analytic procedures (e.g., automated scoring, corpus analyses) for inclusion into the assessment system, thereby impacting design, development, delivery, scoring, and validation. They have also embraced the use of new and sophisticated measurement practices in assessment systems. And they have used these innovations in both large-scale and classroom contexts. However, despite these advances and the many theoretical and methodological affinities assessment experts have with other areas of applied linguistics (e.g., L2 analysis, acquisition, use, pedagogy), L2 assessment as a field has often struggled to influence how other applied linguists (e.g., SLA researchers) conceptualize, operationalize, and validate the assessments they use or even how other applied linguists (e.g., discourse analysts) might benefit from transdisciplinary research with testers. Rather than seeing assessment as an organic part of applied linguistics, L2 assessment is still often viewed as an afterthought, or as a craft. Consequently, the real potential that L2 assessment has for transdisciplinary understandings and practices in applied linguistics is, in my view, yet to come.

This article provides an overview of the main issues, concerns, and debates in contemporary L2 assessment theory, research, and practice. I will first describe what L2 assessment is and what it entails, arguing that notions of L2 assessment have broadened over the years to keep pace with different interpretations of how information from assessments in large-scale or classroom contexts is used to make decisions. I will then describe four approaches to construct definition and show how these approaches have influenced what gets assessed. While this discussion might reflect an admittedly narrow but traditional conceptualization of assessment as a 'test,' these debates have clear relevance for assessments in nontesting situations. Finally, I will discuss a range of other topics of current and future resonance.

WHAT DOES LANGUAGE ASSESSMENT ENTAIL?

Language assessment is a broad term referring to a systematic procedure for eliciting test and nontest data (e.g., a teacher checklist of student performance) for the purpose of making inferences or claims about certain language-related characteristics of an individual. In other words, the term assessment refers not only to formal tests such as the TOEFL or an end-of-chapter evaluation, but also to other methods of obtaining information about KSAs such as by observing L2 performance during pair work or by asking learners to report their understandings and uncertainties. The goal behind all L2 assessments is to elicit L2 performance from an individual under certain conditions so that performance consistencies can be interpreted and used to produce records such as scores, verbal descriptions, or mental notes. Interpretations from these records are then used as evidence for making decisions.

One quality of all L2 assessments, according to Bachman and Palmer (2010), is that assessments are based on substantive grounding. In other words, they are designed to elicit information rooted in a principled and verifiable body of content, coming from a lesson in a textbook, a syllabus, standards, or a model of L2 proficiency. A second quality is that assessments are goal-oriented in that they have an intended purpose, even though in many spontaneous, classroom-based assessments embedded in instruction the purpose may be more implicit than explicit. A final quality of all assessments is that the procedures used to elicit information involve varying degrees of systematicity, ranging from very controlled tests to far less controlled assessments as in routinized teacher protocols during instruction.

More specifically, an L2 test, measure, or measurement instrument is a special type of assessment designed to control: (a) the type of L2 behavior elicited from a content domain under certain conditions, (b) the systematicity of the elicitation procedure, and (c) the standardized procedures for scoring performance and generating performance records. Other measures designed to assess social-psychological or demographic factors like student affective dispositions are sometimes referred to as measurement scales, but for all intents and purposes, these have characteristics similar to tests, and are treated as such. For example, the ACTFL oral interview1 is both an assessment and a test (the terms are often used synonymously), whereas the principled observation of student performance in class is an assessment, but never a test.

Claims or inferences about a user's L2 attributes, made on the basis of an assessment, relate to constructs, such as L2 knowledge, L2 ability, or linguistic complexity, whose definition must be constructed from research, theory, standards, accumulated experience, or principled
practice. Constructs cannot be observed directly, but can only be inferred from concrete operationalizations of the construct by means of tasks or other less formal protocols (e.g., naturalistic conversations) that elicit performance data. In L2 assessment, we are specifically interested in eliciting evidence of L2 performance under certain conditions so that we can make claims about what L2 learners or users know, what L2 skills they have, and the extent to which they can use L2 resources, along with a host of other resources (e.g., topical, sociocognitive), to communicate effectively within or across contexts.

Technically speaking, assessment refers only to the elicitation and collection of performance data for some specific purpose; however, a discussion of 'what assessment is' should also address what we do with the performance records once collected. This gives rise to the topic of interpretation. As assessment data are gathered and records generated with respect to some specific purpose, we need to consider the meaning of these records in terms of the claims they are intended to make about an individual's attributes or some other feature of interest (e.g., the characteristics of L2 production). The process of deriving and justifying meaning from assessment records is at the heart of validation. These record-based interpretations are then used as an evidentiary basis for making assessment (or research) decisions. The systematic use of evidence for the purpose of making decisions is then referred to as evaluation. According to Bachman and Palmer (2010), "Evaluation involves making value judgments and decisions on the basis of information, and gathering information to inform decisions is the primary purpose for which language assessments are used" (p. 21, original emphasis).

As L2 educators and researchers, we use interpretations of assessments to make a wide range of decisions. In classrooms, this involves decisions about readiness (i.e., diagnosis), progress, mastery, retention, and promotion, or decisions about teaching, learning, and curricula. In schools, we make decisions about proficiency, selection, and placement. And in society, we make decisions about certification and accountability. All these decisions come with consequences for the stakeholders, or the individuals most impacted by the decision. For example, university selection decisions based on proficiency test scores are obviously high stakes given the major consequences these decisions have on the applicants' lives, whereas decisions about whether learners need further practice based on a quiz are low stakes since the consequences to the students' lives are relatively minor.

As might be imagined, using score-based interpretations to make high-stakes decisions does not occur in a sociocultural or political vacuum, even in classroom contexts. Rather, the use of assessment information cannot be fully understood or evaluated without considering the specific use(s) for which assessments are intended, as well as the potential consequences of these uses. Therefore, a discussion of the consequences of assessment use, especially as this relates to issues of fairness and social justice, is currently part and parcel of a discussion of what assessment is (see Kunnan, 2004; McNamara & Ryan, 2011).

This naturally leads to a discussion of the consequences of assessment misuse, given the number of situations in which assessments are (un)intentionally used in ways for which they were not intended or in ways in which social beneficence is not prioritized, thereby leading to negative consequences for stakeholders. This situation may arise, for example, in research contexts when readily available measures from previous studies are used for purposes other than the ones for which they were designed. Since the two contexts are different, the construct may be measured too broadly or narrowly for use in both contexts, resulting in low score variability and uninterpretable results. The unjustified score interpretations and ensuing decisions may well lead to spurious claims with unintended, negative consequences (Purpura, Brown, & Schoonen, 2015). A similar situation may arise in classroom settings where erroneous assumptions about learning are made about a student's KSAs based on inadequate performance data. Given the importance of the social consequences of assessments, we will discuss this topic in greater detail later on.

Finally, current views of 'what L2 assessment is' would certainly recognize 'assessment' in its simplest form as a principled collection of information elicited under certain conditions for some intended purpose, leading to performance that can be scored or characterized verbally, analyzed, and interpreted to produce assessment records. However, few assessment experts would now agree that assessment begins and ends with data collection, scoring, and analysis. Most now recognize that assessments are used to make decisions in real-world contexts, involving factors outside the assessment itself, where the probability of making the 'right' decision about individual attributes or research questions is not only a function of the quality of the assessment and the related interpretations, but also a function of the use of the assessment for
some intended purpose, together with the consequences of making those decisions. This broadened scope of L2 assessment has opened the way for an ever-burgeoning diversity of ideas that now falls under the rubric of L2 assessment. This evolution also comes at a time in history when the fundamental scientific principles, philosophical and sociopolitical assumptions, and technological opportunities for assessment are being rethought, thereby revealing the true potential that assessment has not only for beneficial and constructive purposes, such as the improvement of teaching and learning, but for safeguards against compromises to assessment quality or unethical misuses of assessment information.

In sum, the dynamism and intellectual diversity of L2 assessment are charted by a virtual explosion of literature in this subfield of applied linguistics. This includes at least three journals, four book series, two encyclopedias, several handbooks, and a companion, all dedicated exclusively to L2 assessment topics.

APPROACHES TO CONSTRUCT DEFINITION: BROADENING THE CONSTRUCT OF L2 PROFICIENCY

Probably the most compelling and enduring challenge in L2 assessment is the quest to understand and define the construct of L2 proficiency, so that meaningful interpretations of an individual's performance consistencies can be attributed, to the extent possible, to L2 resources, along with other internal and external factors, responsible for communicative success across a range of tasks and contexts. This has led assessment theorists in educational measurement and applied linguistics (e.g., Bachman, 2005; Chapelle, 2008; Kane, 2006; Messick, 1989) to propose several approaches to defining constructs so that these approaches can serve as the basis not only for meaningful interpretation of performance consistencies, but also for assessment design and operationalization, interpretation, and use. In some cases, these approaches have led to frameworks or 'underlying models' of theoretical constructs, where specific components and their possible interrelationships are hypothesized. These models have not been conceptualized as universal prescriptions of the construct, given the breadth and complexity of the proposals, but as a heuristic for informing conceptual assessment frameworks for any given context, including those involving elicitation methods other than traditional tests (i.e., assessments embedded in instruction and mediated through interaction).

In an effort to contextualize how past concerns convene to create opportunities for current and future assessment understandings, practice, and research activity, I will describe four approaches to construct definition: (a) trait-centered, (b) task-based, (c) interactionist, and (d) sociointeractional, and illustrate how these approaches shape how L2 educators and assessment experts can tailor their assessments to a diversity of assessment purposes.2

One question surrounding differences in how L2 proficiency has been defined relates to a consideration of the components of L2 knowledge and their interrelationships. Another concerns the role that context plays in characterizing an examinee's KSAs based on performance consistencies. A third involves sociocognitive and affective factors underlying L2 assessment performance, and how these factors mediate between an examinee's KSAs and their ability to respond to a task. A final question—and one receiving too little attention in my opinion—is the role that topical, content, or disciplinary knowledge and the conveyance of meaning play in the L2 proficiency construct. Dialectics surrounding these questions have led to a broadening of the L2 construct in light of growing interpretation needs, and to refinements in the construction, delivery, and use of assessments. As these four approaches shape current thinking in L2 assessment, I will describe each with an eye toward how contemporary theory, research, and practice follow suit.

What Gets Assessed: Trait-Based Approach to Construct Definition

The trait- or construct-based approach to construct definition interprets performance consistencies as a reflection of an examinee's KSAs as conceptualized within some theoretical model of the construct of interest (Chapelle, 1998). To assess the construct, items or prompts, for example in a test, are drawn from a pool of tasks in the target language use (TLU) domain (e.g., the academic context), and then specified so that task characteristics constrain the performance to reflect the trait. This selection process allows for the KSAs needed to perform these tasks to generalize across other tasks in a wide range of unspecified contexts. In this approach, interpretation of performance is thus a function of the specified construct, assuming that the trait will generalize across contexts. While this has been the dominant paradigm in L2 assessment for years, researchers have begun to question the generalizability of traits across real-world L2 proficiency contexts.
Numerous examples of the trait-based approach in the L2 assessment literature can be found, beginning with Lado (1961) and Carroll (1961), who proposed a skills-and-elements model in which the individual L2 elements (e.g., phonology, structure, lexis) could be assessed separately while performing language skills. The assumption in this approach was that L2 proficiency transpires by internalizing simple, discrete units of the L2 before acquiring complex sequences, the accumulation of which constituted proficiency. This led to a discrete-point approach to assessment, which is still a common method of assessment in many contexts today, especially where curricula and instructional methods assume simple-to-complex notions of development. For example, in classroom-based grammar assessment, where the goal is for learners to recognize the differences between discrete forms and their meanings, the skills-and-elements model and discrete-point testing would provide an adequate basis for that interpretation. Also, discrete-point testing of isolated items with reference to a theoretical construct is the basis for identifying L2 production features measured in current automated scoring protocols of writing and speaking.
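To make the contrast with more integrative approaches concrete, here is a minimal sketch of discrete-point scoring, in which each item targets a single isolated element and is scored dichotomously; the items, element labels, and responses are invented for illustration and are not drawn from any operational test.

# Illustrative sketch of discrete-point scoring: each item targets one
# discrete L2 element and is scored right/wrong in isolation.
# Item content and element labels are hypothetical.
from collections import defaultdict

items = [
    {"id": 1, "element": "structure", "key": "b"},
    {"id": 2, "element": "structure", "key": "a"},
    {"id": 3, "element": "lexis",     "key": "c"},
    {"id": 4, "element": "phonology", "key": "a"},
]

responses = {1: "b", 2: "c", 3: "c", 4: "a"}  # one examinee's answers

# Dichotomous scoring: 1 if the response matches the key, else 0,
# accumulated separately per element, as a skills-and-elements
# report would require.
scores_by_element = defaultdict(int)
for item in items:
    scores_by_element[item["element"]] += int(responses[item["id"]] == item["key"])

print(dict(scores_by_element))  # {'structure': 1, 'lexis': 1, 'phonology': 1}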
Another example of a trait-based model of L2 proficiency was proposed by Oller (1979), who conceptualized L2 proficiency not as a set of components, but as a unitary, general proficiency factor, integrated with perceptual and productive processing.3 This was referred to as pragmatic expectancy grammar, or "any procedure or task that causes the learner to process [and produce] sequences of linguistic elements in a language that conform to the normal contextual constraints of that language and which requires the learner to relate sequences of linguistic elements via pragmatic mappings to the extralinguistic context" (p. 38). This model addressed how L2 elements relate to pragmatic meanings associated with an external context of situation, thereby foreshadowing current notions of pragmatic knowledge and meaning potential. It also identified perceptual and productive processing as a means by which learners are able to make moment-by-moment predictions about what an interlocutor is likely to understand or say in interaction, especially in context-reduced situations, again presaging current inquiry into the role of L2 processing in performance on assessments, as seen today in the work of Doughty (2014), focusing on aptitude, and Phakiti (2003) and Purpura (1999, 2014a, 2014b), focusing on strategy use and test performance. Finally, rejecting discrete-point testing, Oller proposed an integrative approach to assessment, where scores are interpreted as measures of overall proficiency—a practice not so dissimilar from holistic assessment today.

While Oller's notion of global proficiency was subsequently rejected for a multicomponential depiction of L2 ability, involving a general higher-order factor along with several distinct factors (Bachman & Palmer, 1982; Vollmer & Sang, 1983), depictions of L2 proficiency as an integrated factor have recently resurfaced, to some extent, with contemporary proposals for integrated skills assessments or task-based L2 assessments, topics to be discussed later in further detail.

Another example of a trait-based approach to L2 proficiency was offered by Canale and Swain (1980) and later by Canale (1983) with their model of communicative competence (CC), which defined proficiency as knowledge of grammatical resources and the capacity to use these resources to understand and generate written and spoken texts in sociolinguistically appropriate ways. This model specified grammatical, sociolinguistic, and discourse competence, along with strategic competence, defined in terms of compensatory strategies, that is, strategies that serve to mediate between competence and performance. By specifying sociolinguistic and discourse competence, Canale and Swain acknowledged how context shaped L2 use, again broadening the notion of L2 proficiency. Today, communicative language teaching and assessment are still the mainstay of L2 education.

Extending the CC model, Bachman (1990), with later refinements by Bachman and Palmer (1996, 2010), provided another trait-based approach to construct definition with their model of communicative language ability (CLA). In this model, L2 ability was conceptualized as an individual's capacity for language use, as inferred from performance consistencies on tasks assumed to be generalizable across other tasks in the TLU domain. L2 ability was thus seen as an underlying competence to be accessed during future performance, but not as a prediction of L2 ability for a specific TLU context.

In their model, L2 ability was defined theoretically in terms of L2 knowledge and strategic competence. L2 knowledge consisted of organizational and pragmatic knowledge, where organizational knowledge involved grammatical and textual knowledge, and pragmatic knowledge included functional and sociolinguistic knowledge. Strategic competence was seen as strategies underlying language use. In this model, CLA is
understood to be activated by the need to perform tasks drawn from the 'context of situation'—where tasks represent "restricted or controlled versions of the contextual features that determine the nature of language performance" (Bachman, 1990, p. 112). That said, this model does not specify how L2 ability interacts with tasks in specific contexts, that is, how L2 ability might relate to tasks in the academic domain. Nonetheless, the CLA model is still one of the most comprehensive models of L2 proficiency and has been the source of inspiration for several conceptual assessment frameworks today.

What Gets Assessed: Task-Centered Approaches to Construct Definition

A second approach to construct definition is referred to as a task-centered, task-based, or performance-based approach to L2 assessment (Clark, 1972, 1978; McNamara, 1996; Norris et al., 1998; Skehan, 1998), where performance consistencies are assumed to reflect the examinee's ability to accomplish open-ended, written or spoken tasks within some real-life TLU domain. Task specifications represent critical features of the task and context, such as the setting (e.g., a pharmacy), the communicative context (e.g., a new prescription), the communicative goal (e.g., understanding instructions), and so forth. Task completion is judged mostly from criteria drawn from real-world standards of performance (e.g., criteria from aviation), and the components of L2 knowledge may or may not be isolated and assessed (McNamara, 1996). In this regard, McNamara distinguished strong from weak performance assessments (PAs) depending on the real-world criteria used and noted that most L2 PAs were 'weak' assessments, since they failed to use real-world assessment criteria. Thus, performance in this approach may be interpreted as a predictor of the ability to perform similar tasks in real-life settings and/or as a reflection of the ability underlying performance.

Task-centered approaches are all rooted in the work of Clark (1972) and Jones (1985), whose goal was to measure L2 proficiency in a direct manner, that is, by requiring examinees to perform tasks similar to those in the real-life domain (Clark, 1972, p. 121). Instead of focusing on L2 ability components, they described PA as a measure of an examinee's ability to communicate for some pragmatically useful purpose within a real-life setting by performing tasks that "duplicate as closely as possible the setting and operation of the real-life situations in which the proficiency is normally demonstrated" (Clark, 1975, p. 10). The examinee's ability is then based on raters' judgments of the performance, as described in a scoring rubric. Performance on these assessments is seen as a predictor of future task performance.

McNamara (1996) referred to Clark and Jones's approach to assessment as the strong version of PA. In describing the weak version, he emphasized that the focus was not so much on the candidate's ability to complete surrogate, real-world test tasks as predictors of future task success; rather, the focus was on the task's success in eliciting L2 performance on assessments in contexts and conditions similar to those of the intended real-life performance, so that raters could then judge the performance with reference to the KSAs of interest by means of a holistic or analytic rubric. PA is currently the most common approach to assessing L2 production and is one of the most thoroughly researched areas of the field of L2 assessment, especially as this concerns the effects of different sources of variance on performance scores.4 Some current examples of PAs are seen in the Occupational English Test5 and the Canadian Language Benchmarks.6
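As a rough illustration of how such rater judgments on an analytic rubric are typically turned into a performance record, the following sketch averages two hypothetical raters' band scores per criterion and then computes a composite; the criterion names and scores are invented, not taken from any operational rubric.

# Minimal sketch: combining analytic-rubric judgments from two raters.
# Criterion names and band scores are hypothetical.
ratings = {
    "rater_1": {"task_fulfillment": 4, "organization": 3, "language_control": 4},
    "rater_2": {"task_fulfillment": 5, "organization": 3, "language_control": 3},
}

criteria = ratings["rater_1"].keys()
# Average across raters per criterion, then across criteria for a composite.
per_criterion = {c: sum(r[c] for r in ratings.values()) / len(ratings) for c in criteria}
composite = sum(per_criterion.values()) / len(per_criterion)

print(per_criterion)        # {'task_fulfillment': 4.5, 'organization': 3.0, 'language_control': 3.5}
print(round(composite, 2))  # 3.67

Operational programs would, of course, add safeguards this sketch omits, such as adjudicating large rater discrepancies before averaging.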
Another example of a task-centered approach to construct definition is seen in the work of Norris et al. (1998) and Brown et al. (2002) and is referred to as task-based language assessment (TBLA). An extension of the direct method, TBLA interprets performance consistencies as the accomplishment of real-life tasks within a specific contextual domain, given the abilities that candidates bring to the task, so that future performance can be predicted. "Task" is the "fundamental unit of analysis motivating item selection, test instrument construction, and the rating of task performance" (Long & Norris, 2000, p. 60). The competencies needed to perform tasks are not drawn, per se, from a theoretical model of L2 ability, but rather from the KSAs needed to accomplish tasks at different levels of success. TBLA typically assesses performance based on can-do statements, like those in the Common European Framework of Reference,7 rather than on capacity-for-use statements, where performance reflects what learners know based on a theoretical construct. Thus, the focus in TBLA is not on ability level, but on performance and task outcome (Skehan, 1998). A current example of a TBLA is seen in the Certificate of Dutch as a Foreign Language (Van Gorp & Deygers, 2014).8

Skehan (1998) proposed a somewhat different version of TBLA in which he is less concerned with tasks as predictors of real-world performance
than with tasks as triggers of language and L2 processes during task performance. This perspective highlights the potential interactions among language, task, and cognitive variables. Tasks in this approach are designed with reference to code complexity, cognitive complexity, and communicative stress, rather than according to contextual features such as setting and purpose. Also, instead of using can-do statements as performance indicators, performance is assessed by means of capacity-for-use scales involving rater judgments of performance, as in traditional PA, or by describing the features of L2 production involving the complexity, accuracy, and fluency (CAF) of the output. Interestingly, Skehan also proposed the use of processing-driven scales, where performance moderators relating to processing (e.g., lexical searches) are specified in L2 production rubrics. However, this idea has, unfortunately, yet to gain attention in the assessment literature.

Although Skehan's (1998) proposal to use L2 production features (i.e., CAF) as a measure of performance consistencies has had considerable resonance in the SLA literature (e.g., Ellis & Barkhuizen, 2005), this approach to construct definition has had little influence on L2 testing practice, since complexes of L2 production features (e.g., percentage of error-free clauses) are virtually impossible to interpret and use in operational assessment contexts. Also, in examining the effects of task characteristics and performance conditions related to cognitive demands on difficulty associated with CAF, Iwashita, McNamara, and Elder (2001) found that the performance conditions hypothesized by Skehan (and others) failed to affect CAF as predicted. Nonetheless, the use of these features has gained considerable momentum in the development of automated scoring systems of speaking and writing and their use in the enhancement of human scores (Deane, 2015). The use of L2 production features in validation research to compare the characteristics of tasks, contexts, or domains in assessment contexts with those in TLU contexts has also emerged as an informative source of evidence in validation (see Cumming et al., 2006; Knoch, Macqueen, & O'Hagan, 2014).
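Although such indices have proven hard to use operationally, they are simple to compute. The sketch below derives illustrative CAF-style measures from clause-level annotations; the annotation format and sample values are invented, and operational automated scoring systems compute many more, and more refined, indices than these.

# Minimal sketch of CAF-style indices from clause-level annotations.
# The annotation format and sample data are hypothetical.
clauses = [
    {"words": 7,  "error_free": True},
    {"words": 12, "error_free": False},
    {"words": 9,  "error_free": True},
]
speaking_time_minutes = 0.5

total_words = sum(c["words"] for c in clauses)
accuracy = sum(c["error_free"] for c in clauses) / len(clauses)  # proportion of error-free clauses
complexity = total_words / len(clauses)                          # mean length of clause
fluency = total_words / speaking_time_minutes                    # words per minute

print(f"accuracy={accuracy:.0%}, complexity={complexity:.1f} words/clause, fluency={fluency:.0f} wpm")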
In sum, TBLA in general has gained considerable recognition over the years as L2 teachers have become more interested in task-based language teaching as an extension of communicative language teaching. This approach has also generated considerable research interest in SLA and L2 assessment (see Van den Branden, Bygate, & Norris, 2009; Van Gorp & Deygers, 2014).

What Gets Assessed: Interactionist Approach to Construct Definition

A third approach to construct definition is referred to as the interactionist approach to construct definition, as proposed by Chapelle (1998). This approach addresses the need to relate performance consistencies to "traits, contextual factors, and to their interactions" (p. 34). It interprets L2 performance as a combination of the trait of L2 knowledge and the characteristics of the context in which L2 ability is elicited and to which performance generalizes, all the while assuming that the interaction between the trait and contexts is facilitated by metacognitive strategy use (Chapelle, 1998). In other words, when learners or users are presented with goal-oriented L2 assessment tasks in some relevant context, their L2 knowledge is activated along with the cognitive processes needed to perform the task. Therefore, since score meaning derives from the interactions among L2 knowledge, contextual factors, and strategy use, the interactionist approach requires specification of (a) the traits, (b) the context, and (c) the strategies required for construct-relevant performance. As Chapelle (1998) points out, "the trait components can no longer be defined in context-independent, absolute terms, and contextual features cannot be defined without relevance to their impact on underlying characteristics" (p. 43). The interactionist approach has significantly broadened the construct of L2 proficiency and is currently the model underlying the assessment framework for the TOEFL iBT.

Chalhoub–Deville (2003) carried Chapelle's (1998) ideas on context and L2 ability a step further by arguing for a representation of L2 ability as an 'ability-in-individual-in-context' construct, rather than as an individually focused 'cognitive or within language user' representation of L2 ability. She maintained that the context of L2 use sometimes activates an examinee's L2 ability, and at other times, the examinee's L2 ability influences facets of the context, concluding that "ability and context features are intrinsically connected and it is difficult or impossible to disentangle them" (p. 372).

An example of the interactionist approach is seen in Douglas's (2000) work on language for specific purposes (LSP) assessment. Arguing that "the nature of language knowledge may be different from one domain to another" (p. 24), Douglas proposed a special purpose language ability construct in which performance consistencies reflect the strategic engagement of L2 ability resulting from tasks situated within
specific purposes contexts. He also argued that since background knowledge plays such an important role in these assessments, special purpose language ability needed to specify background knowledge as part of the construct. So, drawing on Hymes (1974), he extended Bachman and Palmer's (1996) task characteristics framework to include critical features of context. Douglas also provided applied linguists with a concrete example of how the L2 knowledge trait, task/context, and their interaction through strategy use could be incorporated into assessment construction. Importantly, this included background knowledge as part of the L2 proficiency construct.

Another example of the interactionist approach is seen in the work of Purpura (2004, 2014c). Arguing that the fundamental linguistic resources of communication involve not only grammatical forms, but also the meaning potentials associated with these forms when used alone or in context, he proposed a model of L2 proficiency in which performance consistencies can be attributed not only to the selective use of grammatical forms in conveying and understanding literal, semantic, or propositional meanings associated with the topic of communication, but also to a range of intended or implied pragmatic meaning potentials (e.g., sociolinguistic, sociocultural, psychological/affective, rhetorical) encoded in the communicative context. To elicit these meaning potentials, tasks then need to specify contextual features of communication (e.g., the setting, the communication goal), together with relevant sociolinguistic features (e.g., participant roles), sociocultural features (e.g., New York City culture), psychological/affective features (e.g., participant attitudinal or emotional dispositions), and rhetorical features (e.g., genres) of the context. Finally, to mediate between L2 knowledge, topical or disciplinary knowledge, and task engagement, examinees need to access processing capabilities associated with both the mind's cognitive architecture (e.g., attention, perception, memory) and its functions (e.g., processing, strategies). This expansion of the L2 proficiency construct has implications for LSP assessment, scenario-based assessment, and the assessment of language and content integrated learning. An example of this approach will be found in the newly created Community English Placement Test at Teachers College, Columbia University.
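As a purely hypothetical illustration of what specifying these features might look like in practice, the sketch below encodes one such task specification as data; the feature categories follow the paragraph above, but all values are invented and are not drawn from the Community English Placement Test.

# Hypothetical task specification organized by Purpura's feature categories.
# All values are invented for illustration.
task_spec = {
    "contextual": {
        "setting": "neighborhood pharmacy",
        "communication_goal": "negotiate a prescription refill",
    },
    "sociolinguistic": {
        "participant_roles": ["customer", "pharmacist"],
        "register": "polite service encounter",
    },
    "sociocultural": {"cultural_frame": "New York City service norms"},
    "psychological_affective": {"interlocutor_stance": "impatient but helpful"},
    "rhetorical": {"genre": "face-to-face transaction"},
}

for category, features in task_spec.items():
    print(f"{category}: {features}")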
In sum, the construct of L2 ability has evolved over the years to keep pace with contemporary theories, empirical research, and increasing demands to align assessment interpretation with target language use in real-life contexts. The interactionist view of L2 ability has provided theoretical grounding for incorporating traits, contextual factors, and processing into construct definition, thereby acknowledging that a person's L2 ability is sociocognitively activated by goal-oriented tasks embedded within TLU contexts. What is lacking in this approach, however, is a consideration of a meaning component—or what meaning potentials or topical/disciplinary content are being communicated. Functional knowledge (Bachman & Palmer, 2010) allows us to examine the purpose of meaning conveyance, but not the content of meaning conveyance. This is important if L2 educators and assessment experts want to assess the extent to which communication is substantively meaningful or content responsible (Douglas, 2000; Purpura, 2004; Snow & Katz, 2014). Thus, besides specifying the trait component, the context, and the interactions, I believe it is critical to specify the meaning component by addressing the role that topic, theme, and content (topical or disciplinary) play in L2 performance as elicited from contextually rich tasks, designed within social–interactional, transactional, academic, or professional TLU domains.

What Gets Assessed: Sociointeractional Approaches to Construct Definition

A fourth approach to construct definition might be referred to as the sociointeractional approach to construct definition. In this approach, performance in the TLU domain involves a local, social activity (e.g., pitching a new product idea to associates) in which individuals interact on a moment-by-moment basis to jointly construct meaning to perform a goal-oriented activity (McNamara, 1997). Embedded in this activity are indigenous (L2 and content) assessment activities that can be associated with indigenous assessment criteria. Assessment thus involves an evaluation of the degree to which participants effectively achieve the communicative goal by co-constructing construct-relevant meanings and appropriately engaging in discursive practices (e.g., turn-taking, preference organization, repair) expected within the specific TLU domain.

An example of this approach occurs in the work of Jacoby and McNamara (1999), who reported on how physicists in a graduate seminar evaluated and provided feedback on conference presentation rehearsals. These rehearsals were intended to socialize participants into the practices, goals, and habits of mind of this particular community. Based on an analysis of the interactions, participants oriented to several assessment criteria (e.g.,
newsworthiness, clarity, content accuracy, technical delivery, overall quality) used to judge performance in the discursive practice of presenting a conference talk. Jacoby and McNamara argued that in this context, an independent set of assessment criteria would have been inappropriate for the assessment goal.

Another example of this approach is seen in the work of He and Young (1998), who, in refining Kramsch's (1986) notion of interactional competence (IC), described performance consistencies "not as an ability within an examinee," but as the joint co-construction of "abilities, actions, and activities of all participants" (p. 5, italics in the original), while performing assessment tasks. IC is, thus, conceived as a product of both interlocutors, since interaction is locally managed on a moment-by-moment basis between participants. Interlocutors are assumed to acquire "practice-specific IC" rather than "general, practice-independent communicative competence" (p. 7), suggesting that performance on interactive speaking tasks does not generalize beyond the specific event.

While the claim that IC is an attribute of talk-in-interaction rather than an attribute of individuals using resources to jointly co-construct meaning, knowledge, and action within some discursive practice is problematic for most assessment experts, IC and the notion of indigenous assessment criteria have certainly broadened the L2 proficiency construct in that they have highlighted the role of co-construction (e.g., topic management, turn-taking) and intersubjectivity in spoken communication, thereby leading L2 assessors to reexamine interaction in test construction and validation. They have also underscored the need to understand and consider indigenous scoring criteria from the TLU domain in the assessment.

A somewhat different example of the sociointeractional approach is seen in the work of Lantolf and Poehner (2004) on dynamic assessment (DA), where performance consistencies are conceptualized in terms of changes in L2 development as a result of interventions provided by a mediator (assessor), who, during the course of instruction, attempts to mediate (i.e., assist) the narrowing of learning gaps. The assumption underlying DA is that mediations, provided through co-constructed talk-in-interaction with a more capable person, will lead to the internalization of new understandings. DA, unlike other conceptualizations, is located within the teaching/learning activity to provide mediation between the learner and task, instead of after the activity. While DA has typically been used and researched inside classrooms, current efforts have interestingly extended DA principles to large-scale assessment contexts, especially in contexts where feedback, scaffolded assistance, and other forms of mediation are delivered through intelligent, computer-assisted L2 assessment protocols (Poehner, 2014).

Where Do We Go From Here? Since its inception, the field of L2 assessment has engaged in lively debates over the ongoing challenge of construct definition and how best to rethink constructs to support contemporary needs and purposes for which assessments may be put to use. These developments have direct implications for assessment practice. This section attempted to summarize these major issues and debates by considering how past understandings continue to inform current thinking, research, and practice, and how they define future agendas for how the construct of L2 proficiency might be assessed. In the next section, I will discuss some selected areas of current interest and debate within the assessment community.

SELECTED AREAS OF CURRENT ASSESSMENT ACTIVITY

Aside from the debates on approaches to construct definition, L2 educators and assessors have continued to explore and improve ways to design, develop, deliver, score, and analyze assessments, and to report assessment information in light of current understandings. They have also taken advantage of new opportunities for assessment through technological advances and new methods of educational measurement for statistical modeling. Finally, they have pursued critical inquiries on the use of assessments as fair and just measures of performance. Since the research contribution in these areas is overwhelming, I will focus on a few selected areas of interest.

The Assessment of Individual Elements and Skills

Assessing Pronunciation. After years of marginalization, the assessment of pronunciation has resurfaced as an important area of inquiry and practice in L2 assessment. This stems, for example, from the need to evaluate speech samples from asylum seekers who claim they are from repressed groups, from demands for the intelligibility of international teaching assistants, from the desire to represent several varieties of English in standardized exams, and from an interest in measuring phonological control through automated scoring protocols of L2 speaking (Isaacs,
2014). While the focus is no longer on accent reduction or the attainment of native-like standards, current research in this area is concerned with understanding what phonological features have the greatest effect on intelligibility and comprehensibility, in other words, what features contribute most to communication breakdowns. Another area of inquiry relates to the influence of rater characteristics, such as phonological memory, on the ability to judge L2 pronunciation. A final area concerns the use of speech recognition technology to relate automated scores of intelligibility to scores from human judgments (Isaacs, 2014). In sum, we are likely to see much more work on the assessment of pronunciation in the future.

Assessing Pragmatics. Most applied linguists would agree that communication in real-world contexts involves not only the ability to communicate literal propositions, but also the ability to decipher and convey nuanced pragmatic meanings that encode, inter alia, the interpersonal relationship of interlocutors, their emotional and attitudinal stance, presuppositions about what interlocutors know, and assumptions about how to interact in a particular setting (Purpura, 2004). However, until recently, the assessment of pragmatic knowledge has received relatively little attention, even though pragmatic competence is implicitly assessed in all L2 assessments.

Empirical research on pragmatics assessment began with the work of Hudson, Detmer, and Brown (1995), who investigated the sociolinguistic component of pragmatic knowledge by designing discourse completion tasks, role plays, and self-assessments to measure the contextual variables of relative power, social distance, and absolute ranking of impositions in the context of three speech acts. Building on this work, Roever (2005) developed a web-based multiple-choice test of pragmalinguistics intended to measure not only the sociolinguistic variables of politeness, but also conversational implicature and situational routine formulae. Current research in the area (see Roever, 2014) has also begun to look at pragmatic awareness and notions of interactional competence, drawing on conversation analysis (Ross & Kasper, 2013).

Another approach to the assessment of pragmatic knowledge is seen in Purpura (2004). Situating pragmatics assessment within a model of CLA, he defined pragmatic competence as the ability to use L2 resources, together with context, to understand and convey meanings beyond what is literally said. The source of these implied pragmatic meaning potentials may be contextual, sociolinguistic, sociocultural/intercultural, psychological/affective, or rhetorical. The contextual features driving pragmatic ability include the interlocutors' beliefs and presuppositions about the temporal, spatial, and social characteristics of the settings, their knowledge of appropriate subject matter, their knowledge of appropriate registers, and so forth (Levinson, 1983). Critical to this model is the role of context, without which the ability to interpret extended or exophoric meanings encoded in utterances would be lost. This conceptualization again represents a significant broadening of the construct of L2 proficiency with implications for the construction of test tasks, scoring, and validation.

Grabowski (2009, 2013) used Purpura's model to design contextually rich, reciprocal role-play tasks intended to measure the ability to convey sociolinguistic, sociocultural, and psychological meanings, with rubrics intended to measure these meanings. She found that the tasks succeeded in eliciting the meanings of interest, and that performance could be reliably judged. The tasks also produced scalable scores across different proficiency levels. Grabowski concluded that while pragmatic knowledge can be measured at all proficiency levels, it is critical to do so at the advanced levels to avoid construct underrepresentation. She also noted the challenges, but not the impossibility, of administering and scoring reciprocal tasks in operational testing situations.

Currently, the assessment of pragmatic knowledge has emerged as an area of strong interest in L2 assessment research, and one that is being investigated for incorporation into large-scale L2 assessments.

Assessing Integrated Skills

Integrated Skills Assessment. Research on the divisibility of L2 proficiency clearly showed that L2 ability consists of a higher-order factor along with several interrelated components (Bachman & Palmer, 1982). As a result, L2 proficiency assessments have traditionally been designed to measure the four skills, together with grammar and vocabulary, in separate sections or embedded in the assessment of productive skills. If the individual skill scores on these assessments produced strong correlations, the assumption was that a combined score could serve as an estimate of overall proficiency. However, recent descriptions of the L2 demands needed in academic and workplace contexts have shown that besides needing to perform skills independently, in the vast majority of
contexts, learners need to express meanings through the integration of L2 skills (Leki & Carson, 1997). In other words, source information is obtained from reading or listening and then integrated into speaking and writing. Thus, increasing numbers of assessments (e.g., the TOEFL iBT) have designed tasks in which two or more modalities are integrated, and where performance consistencies are interpreted as the combination of these constructs within a contextual space, as one might expect in a synthesis task.
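To illustrate the kind of evidence behind the traditional combined-score reasoning described above, the sketch below computes Pearson correlations among four skill-section scores; the examinee scores are fabricated, and strong intercorrelations of this sort were read as support for reporting an overall proficiency estimate.

# Minimal sketch: intercorrelations among skill-section scores.
# The five examinees' scores are fabricated for illustration.
import statistics

def pearson(x, y):
    # Pearson r using population standard deviations.
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (statistics.pstdev(x) * statistics.pstdev(y) * len(x))

scores = {
    "reading":   [62, 75, 81, 58, 90],
    "listening": [60, 72, 85, 55, 88],
    "writing":   [58, 70, 80, 52, 86],
    "speaking":  [57, 74, 78, 50, 85],
}

skills = list(scores)
for i, s1 in enumerate(skills):
    for s2 in skills[i + 1:]:
        print(f"r({s1}, {s2}) = {pearson(scores[s1], scores[s2]):.2f}")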
Empirical research on integrated skills has taken some interesting pathways. One strand has attempted to understand the extent to which reading and writing are separate or integrated constructs. For example, Sawaki, Quinlan, and Lee (2013) examined performance on TOEFL integrated writing tasks in which examinees used source material content from the reading and listening sections to write a response to the integrated items. They found that, in fact, both the reading/listening comprehension factor and the writing factor were subsumed within a higher-order integrated comprehension and writing factor, which served to explain source-text-based writing and text comprehension. Another set of studies examined the processes and strategies that examinees associate with performing integrated skills assessment. Plakans (2009) found that writers asked to perform integrated skills tasks showed composing processes that clearly reflected 'discourse synthesis.' Still another interesting line of inquiry examined the role of source-text characteristics of writing ability on L2 tests (Cumming et al., 2006).

In sum, demands for L2 assessments that measure competencies and processes similar to those in the TLU domain have motivated the assessment of integrated L2 skills (Cumming, 2014). As Plakans (2013) notes, this represents a step away from conventional tests of the four skills toward constructive principles of knowledge integration and synthesis. While many questions still remain regarding what constructs are actually being assessed by complex tasks requiring integration or how best to assess them, the ability to integrate information is fundamental to language use and a sure area of increased research interest in the future.

Scenario-Based Language Assessments. While the work on integrated skills assessments has highlighted the importance of tasks designed to measure the learners' ability to understand source material to communicate ideas in writing and speaking (e.g., reading to summarize in writing), this perspective has concentrated somewhat narrowly on skill integration and performance. Scenario-based assessment (SBA) takes this a step further by orchestrating skill integration within a thematically coherent, socially familiar, purpose-driven scenario in which participants are required to complete a sequence of subtasks intended to reflect the habits of mind underlying the overarching scenario goal (e.g., work with peers to pitch an idea about creating a green school; Sabatini & O'Reilly, 2013). The completion of such a goal requires a range of L2 skills in which examinees have to understand, summarize, synthesize, and integrate information across texts and modalities; evaluate the relevance and quality of the synthesized information; and integrate the synthesized information into coherent, goal-oriented spoken or written responses. Capitalizing on the potential of computer delivery, SBA also provides opportunities to measure component skills presumed to enable performance of the higher-order integrated skills, topical/disciplinary knowledge across the scenario, examinees' need for and response to feedback and assistance, and a range of demographic and social–psychological dispositions.

Several researchers in mainstream assessment (e.g., Bennett, 2010) have worked on the use of scenarios in disciplinary content learning. Among them, Sabatini and O'Reilly (2013) described how scenarios in the context of L1 literacy assessment could be used to provide a comprehensive measure of reading ability designed to tap into a wide span of macro and micro reading competencies with the potential of providing diagnostic information to examinees. The component skills part of the assessment targeted generalizable features related to basic processing of text (e.g., word recognition), while the integrated skills part aimed to capture the ability to accomplish performance on purpose-driven, thematically related, integrated tasks, called global-integrated, scenario-based assessments (GISAs). O'Reilly and Sabatini (2013) described how several dimensions of the reading ability construct could be organized and measured through scenarios. They state:

This orchestration is achieved by incorporating a scenario-based design that organizes the assessment around a central theme and goal for reading (e.g., work with fellow students to study for an exam or prepare a presentation on a science or history topic) (. . .) a set of diverse sources (e.g., blogs, Web sites, videos, charts and diagrams, traditional text genre excerpts), and a sequence of subtasks to achieve the final goal (e.g., evaluate sources, identify important or relevant ideas, integrate information
across sources, make decisions, edit a wiki). In this manner, GISA is designed to resemble the types of reading activities one might engage in school, work, or leisure. (pp. 2–3)
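A hypothetical sketch of how such a scenario might be represented as data follows, loosely modeled on the quoted description; the theme, goal, sources, and subtask sequence are invented and do not reproduce any actual GISA content.

# Hypothetical representation of a scenario-based assessment unit,
# loosely following the GISA description above; all content is invented.
from dataclasses import dataclass, field

@dataclass
class Subtask:
    prompt: str
    skills: list  # component or integrated skills the subtask targets

@dataclass
class Scenario:
    theme: str
    goal: str
    sources: list
    subtasks: list = field(default_factory=list)

scenario = Scenario(
    theme="green schools",
    goal="work with peers to pitch an idea for creating a green school",
    sources=["blog post", "charts and diagrams", "video interview"],
    subtasks=[
        Subtask("Evaluate the relevance and quality of each source", ["reading", "evaluation"]),
        Subtask("Summarize the key ideas across sources", ["reading", "writing"]),
        Subtask("Draft the pitch integrating the synthesized information", ["writing"]),
    ],
)

# The ordered subtasks build toward the overarching scenario goal.
for i, st in enumerate(scenario.subtasks, 1):
    print(f"{i}. {st.prompt} -> {st.skills}")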
Although L2 assessment experts have only begun to develop SBAs of L2 ability, this type of assessment, in my view, is theoretically appealing for its capacity to measure L2 ability within thematically coherent, goal-driven contexts requiring KSAs that one would expect to find in real-life L2 task completion. It also offers opportunities to incorporate the measurement of performance moderators, such as background knowledge, into the assessment protocol, along with a learning-oriented component9 designed to provide feedback and assistance for those who need it. In my view, SBA provides a plausible solution to the theoretical conundrum of designing tasks that account for the trait, context, content, and the ensuing sociocognitive and affective interactions.

Assessment, Learning, and Instruction

While much discussion in L2 assessment has focused on assessments external to the classroom, the role that assessment plays inside classrooms has emerged as an exciting new area of inquiry, especially as this relates to assessments embedded in learning and instruction. Traditionally, the focus of L2 classroom-based assessment (CBA) has been on the principled construction of tests, designed mostly to measure achievement for summative purposes (e.g., a grade) and administered at the end of instruction. This focus has unsurprisingly resulted in several books devoted to the development of assessments of individual L2 components (e.g., assessing grammar), skills (e.g., assessing writing), and methods (e.g., assessing language for specific purposes).10 At the same time, the emphasis of CBA research in both mainstream education and L2 assessment has also been motivated by a push to understand and promote formative assessment (FA), or assessment FOR learning (AFL; i.e., providing information to improve), as opposed to summative assessment, or assessment OF learning (AOL; i.e., providing information on mastery; Black & Wiliam, 1998). The L2 CBA research has focused primarily on the teacher and teacher assessment literacy—that is, teachers' understandings of assessment and assessment processes related to the identification and narrowing of learning gaps in instruction through FA. To that end, several studies have examined the role of teacher knowledge, experience, and beliefs in L2 assessment (e.g., Leung, 2009; Rea–Dickins, 2004), teacher assessment processes (e.g., Davison, 2004; Rea–Dickins, 2001), and L2 assessment methods (Cheng, Todd, & Huiqin, 2004). Other research, concentrating more on learners, has looked at the value of self- and peer assessment (Patri, 2002), the role of diagnostic (Alderson et al., 2014) or dynamic (e.g., Poehner, 2014) assessment in promoting teaching and learning, or the role of technology in learning and assessment (Chapelle & Douglas, 2006). Finally, Hill and McNamara (2011) used data from two foreign language classrooms to propose a framework for researching CBA processes, which addressed questions related to the teacher perspective, such as what teachers do, what teachers look for, and what theory or standards they use. Interestingly, they also addressed the learner's perspective, that is, what understandings learners have regarding learning and assessment of the foreign language.

More recently, Turner and Purpura (2015) have proposed a multidimensional approach to CBA, entitled learning-oriented assessment (LOA), in which the goal is to understand the complexities of how information from assessments (e.g., tests, observations, class discussions, naturalistic talk-in-interaction, peer feedback, self-assessment, projects, portfolios, or moment-to-moment evaluations of performance embedded in dialogic instruction) serves to trigger L2 processes in the ultimate resolution of learning gaps. Importantly, this approach acknowledges that assessment in classrooms is multifaceted, involving not only many different dimensions (e.g., the context) that contribute to the learning process, but also several agents (e.g., students, teachers, peers, computers). In this respect, Turner and Purpura move beyond dichotomous depictions of CBA as formative/summative or AOL/AFL, toward an approach that characterizes assessment, learning, and instruction, while different, as intrinsically intertwined. They identify seven critical interconnecting dimensions of CBA: the contextual dimension (i.e., the social, cultural, or educational context of learning), the elicitation dimension (i.e., the method used to elicit performance), the proficiency dimension (i.e., the KSAs being targeted and tracked), the cognitive or learning dimension (i.e., the sociocognitive characteristics underlying performance and learning), the affective dimension (i.e., attitudinal and emotional dispositions engaged in performance and learning), the interactional dimension (i.e., the interactional attributes of assessment-related communication—turn-taking, preference structure, etc.), and the instructional dimension (i.e., the instructor's content knowledge, pedagogical content knowledge, and assessment literacy).
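To see how these dimensions might operate jointly rather than in isolation, consider how a single classroom assessment episode could, hypothetically, be coded along all seven at once. The record below is an invented illustration, not an instrument proposed by Turner and Purpura:

```python
# Hypothetical coding of one classroom assessment episode along the seven
# interconnecting LOA dimensions; the labels and example values are invented.
LOA_DIMENSIONS = (
    "contextual", "elicitation", "proficiency",
    "cognitive", "affective", "interactional", "instructional",
)

episode = {
    "contextual": "EFL secondary classroom, exam-oriented culture",
    "elicitation": "teacher follow-up question during pair work",
    "proficiency": "past-tense narration (form-meaning mapping)",
    "cognitive": "learner notices a gap and self-corrects",
    "affective": "initial hesitation, then willingness to retry",
    "interactional": "repair sequence unfolding across three turns",
    "instructional": "teacher recast followed by elicited peer feedback",
}

# A complete record touches every dimension; anything missing flags what an
# observer would still need to capture.
missing = [d for d in LOA_DIMENSIONS if d not in episode]
print("Uncoded dimensions:", missing or "none")
```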
While considerable research on LOA is currently under way, the importance of this topic as an emerging area of inquiry is highlighted by several recent conference symposia,11 a three-day roundtable in 2015 at Teachers College, Columbia University on LOA theory and practice in large-scale and classroom contexts,12 and a symposium on classroom-based L2 assessment at the 2016 Georgetown University Round Table on Language and Linguistics.

Language Assessment in Global Contexts

Over the last few decades, societal, economic, and geopolitical forces around the globe have converged to create the need for L2 certification for individuals wishing to pursue education, employment, asylum, and citizenship. This has resulted in the creation of high-stakes standardized L2 assessments designed for these uses, followed by a substantial body of research relating to the use of these exams for their intended purposes. As the need for high-quality exams administered around the world has forced testers to rethink some of their practices, I will briefly highlight some of the current issues.

Assessing English as a Lingua Franca. English has become the de facto global lingua franca given its extensive use around the world by native and nonnative speakers. In fact, increasing numbers of speakers from different first languages use English to communicate, something that many think should be reflected to some degree in assessments. For example, Harding (2011a) highlighted the need to incorporate L2 accents of English into the listening sections of large-scale exams to reflect the diversity of accents in English-speaking contexts. Research on nonnative English speakers has also highlighted how nonnative speakers use phonological accommodations to enhance intelligibility in L2 communication (Jenkins, 2000). Research from corpus analyses has shown how certain nonnative-like lexicogrammatical features have little effect on communication (Seidlhofer, 2011). More recently, research from pragmatics analyses has examined accommodations with respect to miscommunication and the role of code-switching as a means of preserving or broadening one's linguistic and cultural identity. This research has called into question, as Jenkins and Leung (2014) point out, the use of the native-like standard of competence with respect to grammatical and pragmatic norms on assessments. It has also highlighted the need to include 'native' and lingua franca varieties of English in assessments, especially those claiming to provide a global measure of L2 proficiency (Brown, 2014). These needs have been tempered by concerns that the inclusion of 'some' native varieties of English and virtually all lingua franca varieties into high-stakes assessments could introduce construct-irrelevant variance into the assessment and ultimately incur negative consequences for examinees (Harding, 2011b). While this debate is far from over, research in the area has initiated discussions for a reconsideration of what L2 proficiency means, especially in EFL contexts, and for further research on the effect of these features on test performance.

Common European Framework of Reference (CEFR). The CEFR, originally designed by the Council of Europe (2001) to provide "a common basis for the elaboration of language syllabuses, curriculum guidelines, examinations, textbooks, etc. across Europe" (p. 1), provides several sets of L2 performance indicators across six levels of proficiency. These indicators are then used to align L2 assessments to the framework so that proficiency levels can reflect common understandings across contexts in terms of what candidates are able to do with the L2 (Council of Europe, 2009). The CEFR has been used extensively across Europe but has more recently also been used in contexts outside of Europe as a set of standards for curricular reform. Research in the area has examined CEFR scale functionality, procedures for aligning tests with the CEFR, scale use in several demographic contexts, and the limitations of the CEFR (Council of Europe, 2002; see also a special issue of Language Testing, volume 22). Another body of research has criticized the theoretical underpinnings of the CEFR, its use as a policy instrument, and its use with non-European languages (Fulcher, 2004). Nonetheless, the CEFR has had an enormous effect on how L2 teaching and assessment relate to and can be aligned with a set of external standards, and it can be credited with raising assessment literacy across the globe.
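In operational terms, the outcome of an alignment project is often a set of cut scores linking test results to the six levels. A minimal sketch of such a mapping appears below; the cut scores are invented for a hypothetical 100-point test, since defensible values can only come from formal standard-setting procedures such as those described in the Council of Europe (2009) manual.

```python
# Invented cut scores for a hypothetical 100-point test; real values must be
# derived through formal standard setting (Council of Europe, 2009).
CEFR_CUTS = [(90, "C2"), (78, "C1"), (60, "B2"), (42, "B1"), (25, "A2"), (0, "A1")]

def to_cefr(score: float) -> str:
    """Map a raw score onto one of the six CEFR proficiency levels."""
    for cut, level in CEFR_CUTS:
        if score >= cut:
            return level
    raise ValueError("score below the scale minimum")

print(to_cefr(65))  # -> 'B2' under these illustrative cut scores
```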
The Values and Social Consequences of Assessment Use

As mentioned previously, L2 assessors have long been interested in the consequences of assessment use, especially as this relates to fairness, social justice, and ethics. This has mostly been argued from the perspective of linking test validity to empirical or rational evidence supporting the beneficial consequences of test use. However, much of the recent research in this area has actually focused on test misuse.
For example, one potential misuse of assessment occurs when tests are used to make decisions about individuals or groups for whom the assessment was not originally intended. This situation may unintentionally lead to harmful consequences for stakeholders affected by those exams (Shohamy, 2001a). The problem is brought into focus when assessments are used to make high-stakes decisions in educational contexts or in contexts related to citizenship, immigration, asylum, or naturalization (Eades, 2009; Kunnan, 2009; McNamara & Roever, 2006; Shohamy, 2001b). In the educational context, Leung and Rea–Dickins (2007) discuss the contradictions and capacities of teacher assessment as a policy instrument for the assessment of English literacy within the National Curriculum in the United Kingdom. They describe the consequences for pedagogy and curriculum provision when policy makers seem more interested in the political benefits of educational success than in assessment findings that reveal the actual technical and educational issues related to educational reform for L2 learners. In the citizenship context, McNamara and Ryan (2011) use the role of English literacy in the Australian citizenship test as a context for distinguishing between fair assessments (i.e., qualities related to the test properties) and just assessments (i.e., qualities related to the ethical values embodied in assessment use), arguing that while the technical standards of an assessment might be fair, it may still be ethically unjust to stakeholders, and vice versa.

Since the late 1990s, the social consequences of assessment use have been a deep and enduring concern for L2 assessors, as evidenced by two special issues on the topic (Language Testing in 2004 and Language Assessment Quarterly in 2007). This topic is likely to remain at the fore for years to come.
Technology and Language Assessment

Advances in computer and telecommunications technologies have profoundly changed the ways we work, learn, and socialize. These trends have significantly impacted the competencies we now need, and those to be assessed. These same technological advances have allowed assessors to design, deliver, and score assessments that not only reach beyond conventional practices, but also provide opportunities for assessing constructs that have previously eluded us. This section discusses some of the current work in this area.

Technology and Assessment Design. Some of the most exciting applications of technology to L2 assessment come from the new constructs that can be assessed and the ways assessments can be delivered on computers and other digital devices. Once beyond the reach of most assessors, several programs with authoring tools for the creation of all sorts of assessments are now available (e.g., Google apps). As a result, L2 educators and assessors can fairly easily create enriched task environments, incorporating into the assessment audio files, digital video clips, wikis, podcasts, limited web access through hyperlinks, interactivity through video chats or messaging, simulations of face-to-face interviews or phone conversations, editing and proofreading tools (e.g., spell check), and other resources (e.g., a dictionary). The availability of this technology has, to some extent, revolutionized how assessments can be designed and delivered, the caveat always being that the usefulness of these features for expanding constructs and augmenting authenticity should be determined by validation research.

Also available to assessment developers are written and spoken corpora that can be used in the construction of assessments to examine grammatical, semantic, functional, and discourse features at several levels of proficiency and across general and specific language use domains. These corpora are now available online for use in developing, for example, options for multiple-choice questions or for selecting graded texts (Cobb, 2009). In fact, several large corpora have been compiled for use in large-scale assessment. ETS developed the TOEFL 2000 Spoken and Written Academic Language Corpus to help item writers assess language drawn from a variety of contexts in the real-life academic domain (Biber et al., 2004). Cambridge ESOL and Cambridge University Press developed a learner corpus as a resource for designing test items and conducting lexicographic research. This corpus consists of written exam scripts by nonnative English speakers along with the associated questions; the scripts represent several general, academic, and professional English tests at six CEFR levels. Barker (2014) provides an excellent review of the use of corpora in assessment.
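As a toy illustration of how corpus-derived frequency information might inform the selection of graded texts, the sketch below rates candidate passages by the proportion of words falling outside a high-frequency list. The word list and the interpretation are invented for the example; operational work would draw on full frequency bands from corpora such as those reviewed by Barker (2014).

```python
# Toy illustration: a corpus-derived high-frequency list used to gauge the
# lexical load of candidate passages. The list itself is invented.
HIGH_FREQUENCY = {"the", "a", "of", "to", "and", "in", "is", "was",
                  "students", "school", "read", "write", "test"}

def off_list_ratio(passage: str) -> float:
    """Share of tokens not on the high-frequency list (a crude difficulty proxy)."""
    tokens = [w.strip(".,;:!?").lower() for w in passage.split()]
    tokens = [w for w in tokens if w]
    return sum(w not in HIGH_FREQUENCY for w in tokens) / len(tokens)

passages = {
    "A": "The students read the test in school.",
    "B": "Meticulous annotation of idiosyncratic lexis hampers comprehension.",
}
for name, text in passages.items():
    print(name, round(off_list_ratio(text), 2))
# Passage B's higher ratio suggests it belongs in a higher proficiency band.
```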
Considerable research has targeted the role of technology and new media in L2 assessment as well. Chapelle and Douglas (2006) provided a comprehensive overview of computer-based L2 assessment, including discussions of how new technologies influence construct definitions and other features of the assessment. Ockey (2009) highlighted the need to disentangle the interplay among learner abilities, technical affordances, and multimodal texts so that L2 constructs can be reexamined. Gruba (2014) called for a broader validation research agenda on new media and L2 assessment by means of an assessment use argument. Given the potential of these new technologies for the development of assessments and the questions that they pose, I think we have only begun to see what the future holds.
Automated Scoring of Speaking and Writing. New digital technologies have facilitated the design, delivery, and collection of assessment data by means of a growing number of tools and applications that allow assessment responses to be automatically scored and analyzed. As a result, several large-scale L2 assessment programs have designed automated scoring and evaluation programs for speaking or writing. As Burstein (2013) points out, automated scoring programs only produce scores, whereas automated evaluation systems also offer diagnostic feedback, usually related to linguistic features. One widely used system for automatic essay scoring, developed at the Educational Testing Service (ETS), is called e-rater.13 This system uses natural language processing methodology to detect text structure, linguistic structure, and other features, combined with statistical protocols to generate essay scores. Despite criticisms that e-rater focuses too much on linguistic features rather than on messages, research comparing e-rater and human essay scores has shown remarkably high rates of agreement (96% of the time; Chodorow & Burstein, 2004), attributed perhaps to the relationship between "ease of text production" and the "ability to mobilize cognitive resources to address rhetorical and conceptual problems" (Deane, 2015, p. 7). Since e-rater is more efficient than humans in focusing on micro features, e-rater scores are increasingly used alongside human scores to produce essay scores. Although many testers are still skeptical about assigning scores based on a complex of production features without the inclusion of robust meaning-based contributions to the scores, considerable research in the area is under way and is likely to continue as new technologies emerge.
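The general logic of such systems, in which linguistically motivated features are extracted from a response and then combined statistically into a score, can be sketched in a few lines. The features and weights below are invented for illustration and bear no relation to e-rater's actual, proprietary model.

```python
# Caricature of automated essay scoring: extract surface features, then
# combine them with a fitted linear model. All features and weights here are
# invented and do not reflect any operational system.
def extract_features(essay: str) -> dict:
    words = essay.split()
    ends = essay.replace("!", ".").replace("?", ".")
    sentences = [s for s in ends.split(".") if s.strip()]
    return {
        "length": len(words),                                      # fluency proxy
        "avg_sentence_len": len(words) / max(len(sentences), 1),   # syntactic proxy
        "type_token_ratio": len({w.lower() for w in words}) / max(len(words), 1),
    }

WEIGHTS = {"length": 0.01, "avg_sentence_len": 0.05, "type_token_ratio": 2.0}
INTERCEPT = 0.5  # in practice, weights would be estimated against human ratings

def score(essay: str) -> float:
    feats = extract_features(essay)
    return INTERCEPT + sum(WEIGHTS[k] * v for k, v in feats.items())

print(round(score("Testing is useful. Automated scoring is fast and consistent."), 2))
```

Even this caricature shows why critics worry that feature-based scoring privileges surface form over meaning: nothing in the model inspects what the essay actually says.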
CONCLUSION

In this article I have attempted to provide an update on some of the major issues, concerns, and debates in L2 assessment, as I see them. I have shown how past concerns shape current solutions, and then how current solutions create opportunities for advancement. I have tried to argue that 'past' concerns often have a way of resurfacing for current consideration. Thus, I began the article by clarifying how notions of what constitutes L2 assessment have significantly expanded over the years as assessment concerns have moved beyond the technical quality of an instrument to the use of that instrument in decision making. I then provided a rather in-depth look at a critical concern for L2 assessors—approaches to construct definition and how they influence what actually gets assessed. Finally, with an eye toward the future, I rather subjectively highlighted a few areas of assessment activity that have attracted considerable research interest and enthusiasm. Given the limits of such a review, I admit I may have left out some areas of critical concern for L2 testers, such as the debates regarding the different approaches to validation or the latest quantitative methods and their application to assessment validation. This review was never meant to be conclusive or fully comprehensive, but it is my hope that it has succeeded in highlighting the exciting work that L2 assessors are currently engaged in and the work that still remains before us.

NOTES

1 See the American Council on the Teaching of Foreign Languages Oral Proficiency Interview Familiarization Manual (2012): http://www.languagetesting.com/oral-proficiency-interview-opi
2 For further reading on these approaches, see Bachman (2007) and Chapelle (1998).
3 While Oller does not specifically mention 'productive processing,' we can assume that pragmatic expectancy grammar also applied to production.
4 For a current review of PA, see Yu (2013).
5 See https://www.occupationalenglishtest.org
6 See https://www.google.com/search?client=safari&rls=en&q=Canadian+Language+Benchmarks&ie=UTF-8&oe=UTF-8
7 For more information on the CEFR, see http://www.coe.int/t/dg4/linguistic/cadre1_en.asp
8 See http://www.cnavt.org/files/alg.folder_EN.pdf
9 For more information on SBA, see http://www.ets.org/research/topics/reading_for_understanding/assessments
10 See the Cambridge Language Assessment Series or Routledge's New Perspectives on Language Assessment Series.
11 See the LOA symposia at AAAL 2013, BAAL 2014, and LTRC 2015.
12 For more information on LOA, see http://www.tc.columbia.edu/tccrisls/
13 For more information, see https://www.ets.org/erater/about
REFERENCES

Alderson, J. C., Haapakangas, E.–L., Huhta, A., Nieminen, L., & Ullakonoja, R. (2014). The diagnosis of reading in a second or foreign language. New York: Routledge/Taylor & Francis.
American Council on the Teaching of Foreign Languages. (2012). Oral Proficiency Interview familiarization manual. White Plains, NY: ACTFL. Accessed 17 April 2015 at http://www.languagetesting.com/oral-proficiency-interview-opi
Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford: Oxford University Press.
Bachman, L. F. (2005). Building and supporting a case for test use. Language Assessment Quarterly, 2, 1–34.
Bachman, L. F. (2007). What is the construct? The dialectic of abilities and contexts in defining constructs in language assessment. In J. Fox, M. Wesche, D. Bayliss, L. Cheng, C. E. Turner, & C. Doe (Eds.), Language testing reconsidered (pp. 41–71). Ottawa: University of Ottawa Press.
Bachman, L. F., & Palmer, A. (1982). The construct validation of some components of communicative proficiency. TESOL Quarterly, 16, 449–465.
Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice. Oxford: Oxford University Press.
Bachman, L. F., & Palmer, A. S. (2010). Language assessment in practice. Oxford: Oxford University Press.
Barker, F. (2014). Using corpora to design assessment. In A. J. Kunnan (Ed.), The companion to language assessment (pp. 1452–1476). Oxford, UK: John Wiley & Sons.
Bennett, R. (2010). Cognitively based assessment of, for, and as learning (CBAL): A preliminary theory of action for summative and formative assessment. Measurement, 8, 70–92.
Biber, D., Conrad, S. M., Reppen, R., Byrd, P., Helt, M., & Clark, V. (2004). Representing language use in the university: Analysis of the TOEFL 2000 spoken and written academic language corpus (TOEFL Monograph No. MS-25). Princeton, NJ: Educational Testing Service.
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education: Principles, Policy & Practice, 5, 7–74.
Brown, J. D. (2014). The future of world Englishes in language testing. Language Assessment Quarterly, 11, 5–16.
Brown, J. D., Hudson, T. D., Norris, J. M., & Bonk, W. (2002). An investigation of second language task-based performance assessments. Honolulu, HI: University of Hawai'i Press.
Burstein, J. (2013). Automated essay evaluation and scoring. In C. A. Chapelle (Ed.), The encyclopedia of applied linguistics: Assessment and evaluation. Oxford, UK: Wiley–Blackwell.
Canale, M. (1983). On some dimensions of language proficiency. In J. W. Oller, Jr. (Ed.), Issues in language testing research (pp. 333–342). Rowley, MA: Newbury House.
Canale, M., & Swain, M. (1980). Theoretical bases of communicative approaches to second language teaching and testing. Applied Linguistics, 1, 1–47.
Carroll, J. B. (1961). Fundamental considerations in testing for English proficiency of foreign students. In Testing the English proficiency of foreign students (pp. 30–40). Washington, DC: Center for Applied Linguistics.
Chalhoub–Deville, M. (2003). Second language interaction: Current perspectives and future trends. Language Testing, 20, 369–383.
Chapelle, C. A. (1998). Construct definition and validity inquiry in SLA research. In L. F. Bachman & A. D. Cohen (Eds.), Interfaces between second language acquisition and language testing research (pp. 32–70). New York: Cambridge University Press.
Chapelle, C. A. (2008). The TOEFL validity argument. In C. A. Chapelle, M. Enright, & J. Jamieson (Eds.), Building a validity argument for the Test of English as a foreign language (pp. 319–352). London: Routledge/Taylor & Francis.
Chapelle, C. A., & Douglas, D. (2006). Assessing language through computer technology. Cambridge: Cambridge University Press.
Cheng, L., Todd, R., & Huiqin, H. (2004). ESL/EFL instructors' classroom assessment practices: Purposes, methods, and procedures. Language Testing, 21, 360–389.
Chodorow, M., & Burstein, J. (2004). Beyond essay length: Evaluating e-rater's performance on TOEFL essays (TOEFL Research Report No. RR-73; ETS Research Report No. RR-04-04). Princeton, NJ: Educational Testing Service.
Clark, J. L. D. (1972). Foreign language testing: Theory and practice. Philadelphia: Center for Curriculum Development.
Clark, J. L. D. (1975). Theoretical and technical considerations in oral proficiency testing. In R. Jones & B. Spolsky (Eds.), Testing language proficiency (pp. 10–24). Arlington, VA: Center for Applied Linguistics.
Clark, J. L. D. (1978). Direct testing of speaking proficiency. Princeton, NJ: Educational Testing Service.
Cobb, T. (2009). Complete Lexical Tutor. Montréal: University of Québec. Accessed 7 September 2015 at http://www.lextutor.ca/
Council of Europe. (2001). Common European Framework of Reference for Languages: Learning, teaching, assessment. Cambridge: Cambridge University Press.
Council of Europe. (2002). Common European Framework of Reference for Languages: Learning, teaching, assessment—Case studies. Cambridge: Cambridge University Press.
Council of Europe. (2009). Manual for relating language examinations to the Common European Framework of Reference for Languages. Strasbourg, France: Council of Europe.
Cumming, A. (2014). Assessing integrated skills. In A. J. Kunnan (Ed.), The companion to language assessment (pp. 216–229). Oxford, UK: John Wiley & Sons.
Cumming, A., Kantor, R., Baba, K., Eouanzoui, K., Erdosy, U., & James, M. (2006). Analysis of discourse features and verification of scoring levels for independent and integrated prototype written tasks for the new TOEFL test (TOEFL Monograph No. MS-30). Princeton, NJ: Educational Testing Service.
Davison, C. (2004). The contradictory culture of teacher-based assessment: ESL teacher assessment practices in Australian and Hong Kong secondary schools. Language Testing, 21, 305–334.
Deane, P. (2015). On the relation between automated essay scoring and modern views of the writing construct. Assessing Writing, 18, 7–24.
Doughty, C. (2014). Assessing aptitude. In A. J. Kunnan (Ed.), The companion to language assessment (pp. 25–46). Oxford, UK: John Wiley & Sons.
Douglas, D. (2000). Assessing language for specific purposes: Theory and practice. Cambridge: Cambridge University Press.
Eades, D. (2009). Testing the claims of asylum seekers: The role of language analysis. Language Assessment Quarterly, 6, 30–40.
Ellis, R., & Barkhuizen, G. (2005). Analysing learner language. Oxford: Oxford University Press.
Fulcher, G. (2004). Deluded by artifices? The Common European Framework and harmonization. Language Assessment Quarterly, 1, 253–266.
Grabowski, K. (2009). Investigating the construct validity of a test designed to measure grammatical and pragmatic knowledge in the context of speaking. (Unpublished doctoral dissertation). Columbia University, New York.
Grabowski, K. (2013). Investigating the construct validity of a role-play test designed to measure grammatical and pragmatic knowledge at multiple proficiency levels. In S. Ross & G. Kasper (Eds.), Assessing second language pragmatics. New York: Palgrave Macmillan.
Gruba, P. (2014). New media in language assessment. In A. J. Kunnan (Ed.), The companion to language assessment (pp. 955–1012). Oxford, UK: John Wiley & Sons.
Harding, L. (2011a). Accent and listening assessment: A validation study of the use of speakers with L2 accents on an academic English listening test. Frankfurt am Main, Germany: Peter Lang.
Harding, L. (2011b). Accent, listening assessment and the potential for a shared-L1 advantage: A DIF perspective. Language Testing, 29, 163–180.
He, A. W., & Young, R. (1998). Language proficiency interviews: A discourse approach. In R. Young & A. W. He (Eds.), Talking and testing (pp. 1–24). Philadelphia/Amsterdam: John Benjamins.
Hill, K., & McNamara, T. (2011). Developing a comprehensive, empirically based research framework for classroom-based assessment. Language Testing, 29, 395–420.
Hudson, T., Detmer, E., & Brown, J. D. (1995). Developing prototypic measures of cross-cultural pragmatics. Honolulu, HI: University of Hawai'i Press.
Hymes, D. (1974). Foundations in sociolinguistics: An ethnographic approach. Philadelphia: University of Pennsylvania Press.
Isaacs, T. (2014). Assessing pronunciation. In A. J. Kunnan (Ed.), The companion to language assessment (pp. 140–155). Oxford, UK: John Wiley & Sons.
Iwashita, N., McNamara, T., & Elder, C. (2001). Can we predict task difficulty in an oral proficiency test? Exploring the potential of an information-processing approach to task design. Language Learning, 51, 401–436.
Jacoby, S., & McNamara, T. (1999). Locating competence. English for Specific Purposes, 18, 213–241.
Jenkins, J. (2000). The phonology of English as an international language. Oxford: Oxford University Press.
Jenkins, J., & Leung, C. (2014). English as a lingua franca. In A. J. Kunnan (Ed.), The companion to language assessment (pp. 1605–1616). Oxford, UK: John Wiley & Sons.
Jones, R. (1985). Some basic considerations in testing oral proficiency. In Y. Lee, A. Fok, R. Lord, & G. Low (Eds.), New directions in language testing (pp. 77–84). Oxford, UK: Pergamon.
Kane, M. (2006). Validation. In R. Brennan (Ed.), Educational measurement (4th ed., pp. 17–64). Westport, CT: Greenwood.
Knoch, U., Macqueen, S., & O'Hagan, S. (2014). An investigation of the effect of task type on the discourse produced by students at various score levels in the TOEFL iBT writing test. ETS Research Report Series, 2014, 1–82.
Kramsch, C. (1986). From language proficiency to interactional competence. Modern Language Journal, 70, 366–372.
Kunnan, A. J. (2004). Test fairness. In M. Milanovic & C. Weir (Eds.), European language testing in a global context (pp. 27–48). Cambridge: Cambridge University Press.
Kunnan, A. J. (2009). Testing for citizenship: The U.S. naturalization test. Language Assessment Quarterly, 6, 89–97.
Lado, R. (1961). Language testing. New York: McGraw–Hill.
Lantolf, J. P., & Poehner, M. E. (2004). Dynamic assessment: Bringing the past into the future. Journal of Applied Linguistics, 1, 49–74.
Leki, I., & Carson, J. (1997). "Completely different worlds": EAP and the writing experiences of ESL students in university courses. TESOL Quarterly, 31, 39–69.
Leung, C. (2009). Developing formative teacher assessment: Knowledge, practice, and change. Language Assessment Quarterly, 1, 19–41.
Leung, C., & Rea–Dickins, P. (2007). Teacher assessment as policy instrument: Contradictions and capacities. Language Assessment Quarterly, 4, 6–36.
Levinson, S. (1983). Pragmatics. Cambridge: Cambridge University Press.
Long, M. H., & Norris, J. M. (2000). Task-based language teaching and assessment. In M. Byram (Ed.), Encyclopedia of language teaching (pp. 597–603). London: Routledge.
McNamara, T. F. (1996). Measuring second language performance. London: Longman.
McNamara, T. F. (1997). 'Interaction' in second language performance assessment: Whose performance? Applied Linguistics, 18, 446–466.
McNamara, T. F., & Roever, C. (2006). Language assessment at school: Social values and policy. Language Learning, 56, 203–245.
McNamara, T. F., & Ryan, K. (2011). Fairness versus justice in language testing: The place of English literacy in the Australian citizenship test. Language Assessment Quarterly, 8, 161–178.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New York: American Council on Education and Macmillan Publishing Company.
National Research Council. (1999). The changing nature of work: Implications for occupational analysis. Committee on Techniques for the Enhancement of Human Performance: Occupational Analysis. Commission on Behavioral and Social Sciences and Education. Washington, DC: National Academy Press.
National Research Council. (2001). Building a workforce for the information economy. Committee on Workforce Needs in Information Technology. Board on Testing and Assessment; Board on Science, Technology, and Economic Policy; and Office of Scientific and Engineering Personnel. Washington, DC: National Academy Press.
Norris, J. M., Brown, J. D., Hudson, T., & Bonk, J. (2002). Examinee abilities and task difficulty in task-based second language performance assessment. Language Testing, 19, 395–418.
Norris, J. M., Brown, J. D., Hudson, T., & Yoshioka, J. (1998). Designing second language performance assessments (SLTCC Technical Report No. 18). Honolulu, HI: Second Language Teaching and Curriculum Center, University of Hawai'i at Mānoa.
Ockey, G. (2009). Developments and challenges in the use of computer-based testing (CBT) for assessing second language ability. Modern Language Journal, 93, 836–847.
O'Reilly, T., & Sabatini, J. (2013). Reading for understanding: How performance moderators and scenarios impact assessment design (Research Report No. RR-13-31). Princeton, NJ: Educational Testing Service.
Oller, J. W. (1979). Language tests at school: A pragmatic approach. London: Longman.
Patri, M. (2002). The influence of peer feedback on self- and peer-assessment of oral skills. Language Testing, 19, 109–131.
Phakiti, A. (2003). A closer look at the relationship of cognitive and metacognitive strategy use to EFL reading comprehension test performance. Language Testing, 20, 26–56.
Plakans, L. (2009). Discourse synthesis in integrated second language assessment. Language Testing, 26, 561–587.
Plakans, L. (2013). Assessment of integrated skills. In C. A. Chapelle (Ed.), The encyclopedia of applied linguistics. Oxford, UK: Wiley–Blackwell.
Poehner, M. (2014). Dynamic assessment in the classroom. In A. J. Kunnan (Ed.), The companion to language assessment (pp. 677–692). Oxford, UK: John Wiley & Sons.
Purpura, J. E. (1999). Strategy use and language test performance: A structural equation modeling approach. Cambridge: Cambridge University Press.
Purpura, J. E. (2004). Assessing grammar. Cambridge: Cambridge University Press.
Purpura, J. E. (2014a). Cognition and language assessment. In A. J. Kunnan (Ed.), The companion to language assessment (pp. 1452–1476). Oxford, UK: John Wiley & Sons.
Purpura, J. E. (2014b). Language learner styles and strategies. In M. Celce–Murcia, D. Brinton, & A. Snow (Eds.), Teaching English as a second or foreign language (4th ed., pp. 532–549). Boston: National Geographic Learning/Cengage Learning.
Purpura, J. E. (2014c). Assessing grammar. In A. J. Kunnan (Ed.), The companion to language assessment (pp. 100–124). Oxford, UK: John Wiley & Sons.
Purpura, J. E., Brown, J. D., & Schoonen, R. (2015). Improving the validity of quantitative measures in applied linguistics research. Language Learning, 65, 36–73.
Rea–Dickins, P. (2001). Mirror, mirror on the wall: Identifying processes of classroom assessment. Language Testing, 18, 429–462.
Rea–Dickins, P. (2004). Understanding teachers as agents of assessment. Language Testing, 21, 249–258.
Roever, C. (2005). Testing ESL pragmatics. Frankfurt am Main, Germany: Peter Lang.
Roever, C. (2014). Assessing pragmatics. In A. J. Kunnan (Ed.), The companion to language assessment (pp. 125–139). Oxford, UK: John Wiley & Sons.
Ross, S., & Kasper, G. (Eds.). (2013). Assessing second language pragmatics. New York: Palgrave Macmillan.
Sabatini, J., & O'Reilly, T. (2013). Rationale for a new generation of reading comprehension assessments. In B. Miller, L. Cutting, & P. McCardle (Eds.), Unraveling reading comprehension: Behavioral, neurobiological, and genetic components (pp. 100–111). Baltimore, MD: Brookes Publishing.
Sawaki, Y., Quinlan, T., & Lee, Y.–W. (2013). Understanding learner strengths and weaknesses: Assessing performance on an integrated writing task. Language Assessment Quarterly, 10, 73–95.
Seidlhofer, B. (2011). Understanding English as a lingua franca: A complete introduction to the theoretical nature and practical implications of English used as a lingua franca. Oxford: Oxford University Press.
Shohamy, E. (2001a). The power of tests: A critical perspective on the uses of language tests. London: Pearson.
Shohamy, E. (2001b). Democratic assessment as an alternative. Language Testing, 18, 373–391.
Skehan, P. (1998). A cognitive approach to language learning. Oxford: Oxford University Press.
Snow, A., & Katz, A. (2014). Assessing language and content. In A. J. Kunnan (Ed.), The companion to language assessment (pp. 230–247). Oxford, UK: John Wiley & Sons.
Turner, C. E., & Purpura, J. E. (2015). Learning-oriented assessment in second and foreign language classrooms. In D. Tsagari & J. Banerjee (Eds.), Handbook of second language assessment (pp. 255–272). Boston: De Gruyter Mouton.
Van den Branden, K., Bygate, M., & Norris, J. M. (Eds.). (2009). Task-based language teaching: A reader. Philadelphia/Amsterdam: John Benjamins.
Van Gorp, K., & Deygers, B. (2014). Task-based language assessment. In A. J. Kunnan (Ed.), The companion to language assessment (pp. 578–593). Oxford, UK: John Wiley & Sons.
Vollmer, H. J., & Sang, F. (1983). Competing hypotheses about second language ability: A plea for caution. In J. W. Oller, Jr. (Ed.), Issues in language testing research (pp. 29–79). Rowley, MA: Newbury House.
Yu, G. (2013). Performance assessment in the classroom. In A. J. Kunnan (Ed.), The companion to language assessment (pp. 615–630). Oxford, UK: John Wiley & Sons.