
Comparing the Effects of Test Anxiety on Independent and Integrated Speaking Test Performance
HENG-TSUNG DANNY HUANG
National Sun Yat-Sen University
Kaohsiung, Taiwan
SHAO-TING ALAN HUNG
National Taiwan University of Science and Technology
Taipei, Taiwan

Integrated speaking test tasks (integrated tasks) offer textual and/or aural input for test takers on which to base their subsequent oral
responses. This path-analytic study modeled the relationship between
test anxiety and the performance of such tasks and explored whether
test anxiety would differentially affect the performance of indepen-
dent speaking test tasks (independent tasks) and the performance of
integrated tasks. A total of 352 students studying English as a foreign
language took two independent tasks for which they spoke without
input support, performed two integrated tasks for which they orally
summarized the reading and listening input, and completed the state
anxiety inventory twice. To avoid topic effects, half of them took the
tasks on one topic combination, and the other half took the tasks on
another combination. Path analyses of the data reveal that (1) test
anxiety significantly affected integrated performance, (2) test anxiety
impacted independent performance and integrated performance in a
statistically equivalent way, and (3) topic effects were absent. These
findings suggest that the advantage of integrated tasks over indepen-
dent tasks might not relate to the reduction of test anxiety or its
impact on test performance and that integrated tasks suffer the con-
struct validity threat posed by test anxiety as much as independent
tasks.
doi: 10.1002/tesq.69

The beginning of the new millennium has witnessed a wave of interest in using integrated test tasks to determine second language
(L2) performance (Gebril, 2006; Huang & Hung, 2010; Iwashita,
Brown, McNamara, & O’Hagan, 2008; Lee, 2006; Plakans, 2008, 2009,
2010; Swain, Huang, Barkaoui, Brooks, & Lapkin, 2009; Watanabe,
2001; Weigle, 2004). Vis-à-vis their independent counterparts for which
test takers generate answers without the benefit of input, integrated
test tasks offer prior textual and/or aural support on which test takers
may base their subsequent oral or written responses (Lewkowicz,
1997). Integrated test tasks have been claimed to feature higher
authenticity (Luoma, 2004), put test takers on a more equal footing in
terms of background knowledge (Read, 1990), and exert a positive
impact on teaching and learning as a form of performance assessment
(Shohamy, 1995). However, to date little research has attempted to
explore how test taker characteristics impact the performance on such
integrated test tasks. The current research, in response, made a preli-
minary excursion into this largely uncharted domain by examining the
influence of test anxiety on integrated speaking test performance and
comparing independent speaking test performance and integrated
speaking test performance in terms of how they relate to test anxiety,
in an effort to add to the research on integrated language assessment
and to accrue further construct validity evidence for integrated speak-
ing test tasks.

REVIEW OF LITERATURE

Integrated Test Tasks

As defined by Lewkowicz (1997), integrated language test tasks represent thematically connected tasks of a test, and for such a test, "the
input that has been provided forms the basis of the response(s) to be
generated by test takers” (p. 121). Three benefits have been claimed
for these tasks. First, they are believed to feature a higher level of
authenticity. As Butler, Eignor, Jones, McNamara, and Suomi (2000)
indicate, in the real-life academic context, students usually speak after
being given reading or listening input. Integrated test tasks, customar-
ily designed to supply textual or aural input (Lewkowicz, 1997), may
better approximate the realistic language use tasks in the academic
milieu and as such feature an air of authenticity (Luoma, 2004).
Second, they are believed to promote equity. In evaluative situations,
test takers usually bring varying degrees of prior knowledge to bear on
the test tasks, and those weaker in relevant background knowledge are
thus at a disadvantage as they attempt to formulate their responses. As
Read (1990) asserts, integrated test tasks can effectively mitigate any
undesirable impact of background knowledge by offering textual or
aural information ahead of time. Third, taking the form of perfor-
mance assessment, integrated test tasks might initiate positive wash-
back effects on teaching (Shohamy, 1995). In the face of high-stakes
tests such as the TOEFL, instructional practices usually focus on material directly related to the exam at the expense of the development of
skills and knowledge not tested on the exam. Such teaching-to-the-test
practices would inevitably emphasize drill and rote learning and in
turn impair the effectiveness of a curriculum. Integrated tests, requir-
ing test takers to use more than one language skill to perform tasks,
may encourage teachers to adopt “a more holistic approach to instruc-
tion” (Miller & Legg, 1993, p. 12) whereby attention may be directed
more equally to all skill areas.
A number of studies have shed light on the implementation of inte-
grated writing test tasks in the L2 context. For instance, Watanabe
(2001) discovered that reading-to-write prompts could allow teachers
to assess students’ ability to identify and integrate important informa-
tion in the source texts into their own writing. Echoing this positive
finding, Weigle (2004) found that an English as a second language
(ESL) proficiency test with an integrated reading-writing component
boasted much higher pass rates and rater agreement and exerted a
positive washback effect on the instruction geared toward the prepara-
tion for this test. In the same vein, Gebril (2006) reports that reading-
to-write tasks generated scores equally reliable as those produced by
independent writing tasks. Furthermore, Plakans (2008, 2009, 2010)
reveals that reading-to-write test tasks stimulated a more interactive
writing process, enjoyed a higher degree of authenticity and student
preference, elicited discourse synthesis transformations in the compos-
ing process, and featured a task representation dissimilar to that of
the writing-only tasks.
However, only a few research efforts have thus far examined inte-
grated speaking test tasks. Among them, Lee (2006) focused attention
on the speaking component of the TOEFL-iBT, which incorporates
independent speaking tasks, reading-listening-speaking tasks, and lis-
tening-speaking tasks, and found that an increase in the number of
tasks favorably influenced the score reliability. Iwashita et al. (2008)
examined the features that distinguish performance on the speaking
component of the TOEFL-iBT and found that linguistic resources,
phonology, and fluency all significantly affected the evaluation of the
spoken performances. Swain et al. (2009), probing the strategic behav-
iors associated with the speaking section of the TOEFL-iBT, suggest
that reported test-taking strategies fell into five categories: metacogni-
tive, cognitive, communication, approach, and affective. They found
that integrated tasks, especially those providing both reading and
listening input, elicited more strategy use than independent tasks.
Huang and Hung (2010) compared a reading-to-speak test task and a
speaking-only test task in terms of the anxiety each induced and the
perceptions they produced, and show that, although the two oral test
tasks incurred a comparable amount of test anxiety, the participating
students demonstrated an overwhelming preference for the reading-to-
speak test task.

Test Taker Characteristics: Test Anxiety

In view of Bachman and Palmer's (1996) model of language use, five test taker characteristics (i.e., topical knowledge, language knowl-
edge, personal characteristics, strategic competence, and affective sche-
mata) and the interactions among them constitute the essential
ground that underlies language use and language test performance.
Among these characteristics, affective schemata represent “the basis on
which language users assess, consciously or unconsciously, the charac-
teristics of the language use task and its setting in terms of past emo-
tional experiences in similar contexts” (p. 65). That is, they determine
language users’ affective responses to language use tasks of a similar
nature. As Bachman and Palmer point out, these schemata function to
either facilitate or inhibit the language user’s access to the full spec-
trum of language knowledge and strategies available to a test taker as
he or she sets out to tackle a language task. Given the crucial role of
affective schemata in language use situations, this research centered
attention primarily on them and, for the purpose of this study, opera-
tionalized them as test anxiety.

Definition and prior empirical work. To date, the most popular def-
inition of test anxiety situates itself within the state-trait model
advanced by Spielberger (1966). This model distinguishes between
state anxiety and trait anxiety, with the former denoting “a transitory
emotional state or condition” of tension, apprehension, and autonomic
nervous reactions and the latter concerning the “relatively stable
individual differences in anxiety proneness” to respond with A-state
reactions in an array of stimulus situations (Spielberger, 1972, p. 39).
Defined in reference to this dichotomous distinction, test anxiety has
thus been conceptualized as “a situation-specific personality trait” that
responds with heightened anxiety to evaluative situations (Spielberger,
Anton, & Bedell, 1976, p. 323). Because the research reported here
concerned itself primarily with the anxiety reactions test takers experi-
enced at the time of completing oral test tasks, this psychological
construct was thus operationalized as state anxiety in this study.
In the field of education, test anxiety has been shown to bear a
moderate inverse association with performance in a wide range of test-
ing contexts (Zeidner, 1998). Likewise, in the realm of foreign lan-
guage learning, the relevant studies have presented a similar, though
less conclusive, picture. For instance, Steinberg and Horwitz (1986)
elicited and compared the oral responses to three picture-description
tests and found that state anxiety significantly affected oral perfor-
mance. Hembree (1988) reveals in a meta-analytic study that foreign
language achievement surfaced as one significantly but weakly inverse
correlate of test anxiety. Oya, Manalo, and Greenwood (2004) show
that, as ESL students’ state test anxiety escalated, their performance
on a story-retelling task dropped significantly in clausal construction
accuracy. Liu (2007) found that oral English test anxiety was signifi-
cantly and inversely associated with oral test performance. However,
thus far, scant research has looked into how test anxiety relates to the
performance of integrated speaking test tasks.

Construct validity: Test anxiety as a source of construct-irrelevant variance. As Messick (1989) contends, construct validity often falls
victim to the threat posed by construct-irrelevant variance (CIV),
among other threats. Representing the “excess reliable variance . . . that
affect[s] responses in a manner irrelevant to the interpreted construct”
(Messick, 1995, p. 742), such variance would operate to influence the
judgments formulated for test performance (Messick, 1994) and as
such undermines “the accuracy of test score interpretations, the legiti-
macy of decisions made on the basis of test scores, and the validity
evidence for tests” (Downing, 2002, p. 236). According to Haladyna
and Downing (2004), test anxiety constitutes a major source of such
CIV. That is, because test anxiety has been demonstrated to deleteri-
ously impact test performance, its presence would thus lead to the
underestimation of test scores and attenuate the adequacy and validity
of the interpretations and actions based on such scores. In other words,
when test takers experience test anxiety and in turn fail to perform
well, the test scores they receive consequently might not reflect their
true ability because such scores have been negatively affected by their
anxiety. Further, the decisions based on these scores would likewise fall
short of being adequate.
However, we hypothesized that test anxiety might not be as strong a
source of CIV for integrated speaking test tasks as it is for independent
speaking test tasks. As delineated earlier, integrated test tasks provide
textual and aural input for the test takers to use to formulate their
responses; that is, they offer the test takers the most pertinent back-
ground knowledge they need to generate answers. Given that the cur-
rent researchers have often observed that insufficient background
knowledge about the test topic usually emerges as a major cause of test
anxiety, it seems reasonable to assume that test takers performing inte-
grated speaking test tasks might experience less test anxiety due to the
input provision. Furthermore, because test anxiety and test perfor-
mance have been shown to negatively relate to each other, it seems
likely that, if test takers experience a lower degree of test anxiety while
completing integrated speaking test tasks, their performance would be
less negatively influenced by this affective factor. If this reasoning
holds (i.e., input provision leads to reduced impact of test anxiety on
performance), test anxiety could then be regarded as an inconsequen-
tial source of CIV for integrated speaking test tasks. However, because
little research has investigated how test anxiety would influence perfor-
mance on integrated speaking test tasks, the veracity of this assump-
tion remains unknown.

THE CURRENT RESEARCH

As the foregoing review demonstrates, research efforts on integrated speaking test tasks have hitherto been few and far between. Moreover,
the impact of test anxiety on performance of integrated speaking test
tasks remains unknown. In light of these gaps, the current study thus
examined how test anxiety impacts integrated speaking test perfor-
mance and whether independent speaking test performance and inte-
grated speaking test performance differ in how they are related to test
anxiety. We intended to contribute to the body of knowledge pertain-
ing to integrated speaking test tasks and to provide further construct
validity evidence to support the inferences and actions based on inte-
grated speaking performance. Proceeding from these intentions, we
operationalized the independent speaking test tasks as speaking-only
test tasks and the integrated speaking test tasks as reading-listening-
speaking test tasks while addressing the following research questions:
1. What is the relationship between test anxiety and integrated
speaking test performance?
2. Does test anxiety differentially impact independent speaking
test performance and integrated speaking test performance?

METHOD

Participants

The 352 English as a foreign language (EFL) students involved in this study consisted of 88 males and 264 females recruited from a
Taiwanese university. Their ages ranged from 18 to 30 at the time of
the study. Nineteen were pursuing a master’s degree, and the other
333 were working on a bachelor’s degree. Prior to their participation
in the current study, they had garnered an average of 9 years of formal
education in English. In terms of academic majors, although they
came from a wide array of academic disciplines, the majority (71%)
were in the College of Foreign Languages.

Instruments
Speaking test tasks

Derivation and development procedure. Six semidirect oral speaking test tasks were used in this study: three integrated and three independent (hereafter referred to as integrated tasks and independent tasks).
The process of finalizing these tasks consisted of four major phases. In
the first phase, the researchers derived four integrated tasks from the
existing TOEFL-iBT preparation kits based on such criteria as cultural
neutrality, religious neutrality, and low potential for provoking controversy. These four tasks centered on the topics of the air transportation
industry, biofuels, immunization, and the Euro, and came from two
commercially available books (Lee, 2008; Jiang & Grimes, 2008). Fur-
ther, in light of the objective ETS established for the reading-listening-
speaking tasks on the TOEFL-iBT—that is, to assess test takers’ ability
to synthesize and maneuver the textual and aural information related
to academic course content to generate an oral discourse (Pearlman,
2008)—the researchers revised the four integrated tasks to ensure that
they each ended with a question inviting the test takers to integrate
the information from both the reading input and the listening input
into their oral responses. In the second phase, the researchers devel-
oped four independent tasks on the same topics as the integrated
tasks derived in the preceding phase, based on the description ETS
(2008) set forth for the independent tasks on the TOEFL-iBT; that is,
such tasks “ask test takers to draw upon their own ideas, opinions, and
experiences when responding” (p. 16). The purpose of forging such
thematic links between the integrated tasks and the independent tasks
lay in the need to counterbalance the topics for the test tasks. In the
third phase, the researchers recruited six content experts from the
areas of nursing, chemistry, and economics to inspect and confirm the
thematic links between the independent tasks and the integrated tasks.
In the fourth phase, the researchers assigned tasks on the topics of
the air transportation industry and biofuels as the test tasks and those
on the topics of immunization and the Euro as the practice tasks.

Task description. For the independent tasks, the test takers needed
to capitalize on their world knowledge and personal experience to formu-
late oral responses without the benefit of input provided ahead of time.
For instance, for the test task on the air transportation industry, they had
to rely completely on themselves in enumerating the advantages and dis-
advantages of producing new aircraft as a strategy to grapple with the
economic recession. In compliance with the time allowance set for such
tasks on the official TOEFL-iBT, test takers had 1 minute to complete
each task: 15 seconds for preparation and 45 seconds for responding.
For the integrated tasks that took the format of reading-listening-
speaking tasks, test takers needed to first read a short passage, then
listen to a lecture on the same topic as the reading passage, and
finally generate an oral summary of the information provided in the
textual and aural input. For example, for the test task on the air
transportation industry, the reading passage introduced the major
features of two new aircraft produced by Boeing (Dreamliner) and
EADS (A380); the lecture described how the advent of these two air-
craft gave rise to two very different forecasts of future air travel
spending, and the question called on the test takers to integrate the
information they obtained from the passage and the lecture into an
oral description of how the lecture lent support to the ideas revealed
by the passage. In terms of time allowance, for each task, test takers
spent 45 seconds on reading, approximately 90 seconds on listening,
30 seconds on preparation, and 60 seconds on responding to the
question.

Counterbalancing the topics. The topics of the test tasks were coun-
terbalanced for the integrated task and the independent task in the
administration phase; that is, half of the participants received one
combination of topics, and the other half were assigned the other
topic combination. This counterbalancing practice arose from the
need to rule out possible topic effects. That is, if the participants all
received the same topics for the independent tasks and the integrated
tasks, measuring the influence of test anxiety on speaking perfor-
mance would be confounded with the topic effects. In other words, it
would be difficult, if not impossible, to discern clearly whether the
change in performance was due solely to test anxiety or resulted from
the interaction of test anxiety and the particular topics used. However,
no counterbalancing occurred for the topics of the practice tasks,
because these tasks merely functioned to get the participants accli-
mated to the test and did not provide data for the official analyses.
Table 1 illustrates the topics for the test tasks and the practice tasks.

TABLE 1
Topics for the Test Tasks and the Practice Tasks

                                     Topic
Task             Combination   Independent task               Integrated task
Test tasks       A             Biofuels                       Air transportation industry
                 B             Air transportation industry    Biofuels
Practice tasks   N/A           The Euro                       Immunization

State anxiety inventory. The state anxiety inventory (SAI) was
drawn from the State-Trait Anxiety Inventory (STAI) introduced by
Spielberger, Gorsuch, and Lushene (1970) to explore the respective
anxiety level participants suffered while tackling the independent
tasks and the integrated tasks. A self-report measure by nature, the
SAI has been shown to feature a high alpha coefficient of .93 and
to generate significantly higher scores for presumably more stressful
and anxiety-provoking conditions, as such demonstrating adequate
reliability and construct validity evidence for the scores it produces
(Spielberger, 1983). This scale comprises 20 items; half of them are
worded positively, such as “I felt calm,” and the other half are
worded negatively, such as “I was worried.” To respond to this scale,
the participants indicated their answer to each item on a 4-point
Likert scale, with the choices being not at all, somewhat, moderately so,
and very much so. In this study, the Chinese version created and vali-
dated by Chung and Lung (1984) was employed in lieu of the ori-
ginal English version to avoid potential language-induced
misunderstandings.

Data Collection Procedures

The data collection took place in two computer labs, where each
participant had access to a desktop computer equipped with head-
phones and a microphone. Specifically, the participants were randomly
assigned to one of the two computer labs (lab A and lab B) and took
different combinations of the oral test tasks according to their lab
assignment. That is, although the participants from both labs took the
practice tasks on the same topics, lab A participants took the indepen-
dent task on the topic of biofuels and the integrated task on the air
transportation industry, whereas the lab B participants performed the
two test tasks on reversed topics. The participants first completed the
two independent tasks and then answered an SAI to indicate the level
of anxiety they experienced during each of these tasks. Following the
completion of this first SAI, the participants undertook the two inte-
grated tasks and then completed another SAI, the scores of which rep-
resented the anxiety they endured during the taking of this second set
of tasks.



Data Coding

The researchers coded the responses on the SAIs and rated the per-
formance on the oral test tasks. With respect to the SAIs, for the posi-
tively worded items the researchers represented the four response
categories with four ascending numerical values, and for the negatively
worded items they represented the categories with four descending
numerical values. In so doing, they allowed the same numerical value
to reflect a similar amount of anxiety.
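The recoding step can be illustrated with a short sketch, assuming pandas and hypothetical item names; the paper does not publish its scoring script, so which items get reversed is an assumption here. The point is simply that, after recoding, the same numerical value reflects a similar amount of anxiety across items.

```python
import pandas as pd

# Hypothetical raw SAI responses, each item scored 1-4
# (1 = not at all, 2 = somewhat, 3 = moderately so, 4 = very much so).
raw = pd.DataFrame({
    "item_01": [1, 3, 4],   # e.g., "I felt calm" (positively worded)
    "item_02": [4, 2, 1],   # e.g., "I was worried" (negatively worded)
    # ... the remaining 18 items would follow the same pattern
})

# Items to reverse-score so every item points in the same direction
# (assumed list, for illustration only).
REVERSED_ITEMS = ["item_01"]

def score_sai(responses: pd.DataFrame) -> pd.Series:
    """Recode reversed items (1<->4, 2<->3) and sum to a per-respondent total."""
    recoded = responses.copy()
    recoded[REVERSED_ITEMS] = 5 - recoded[REVERSED_ITEMS]
    return recoded.sum(axis=1)

print(score_sai(raw))
```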
In regard to the oral test tasks, the researchers rated the test takers’
performance on the two tasks with reference to the independent
speaking rubrics and integrated speaking rubrics developed by ETS to
evaluate the performance on the speaking component of the TOEFL-
iBT. In accordance with these rubrics, the researchers each evaluated
all speech segments in terms of three criteria: delivery, language use,
and topic development. For each segment, they assigned one score
band on a scale ranging from 0 to 4 for each criterion, totaling three
score bands. The three score bands awarded for each segment by each
researcher were then summed into a composite score. Finally, the two
composite scores generated for each speech segment, one from each
researcher, were then averaged to numerically represent the overall
quality of that particular speech segment. The Cronbach’s alpha coeffi-
cient was found to reach a level as high as .92, suggesting excellent
consistency between the ratings produced by the two researchers.
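A sketch of how the composite scores and a two-rater consistency coefficient of this kind could be computed, assuming numpy; the three 0-4 bands and the averaging follow the description above, but the sample numbers are placeholders.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a respondents-by-raters (or items) score matrix."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Placeholder data: each rater awards three 0-4 bands (delivery, language use,
# topic development) per speech segment; bands are summed into a 0-12 composite.
rater1_bands = np.array([[3, 3, 2], [4, 3, 3], [2, 2, 1], [3, 4, 3]])
rater2_bands = np.array([[3, 2, 2], [4, 4, 3], [2, 1, 1], [3, 3, 3]])

composite1 = rater1_bands.sum(axis=1)          # rater 1 composites
composite2 = rater2_bands.sum(axis=1)          # rater 2 composites
final_score = (composite1 + composite2) / 2    # averaged per-segment score

alpha = cronbach_alpha(np.column_stack([composite1, composite2]))
print(final_score, round(alpha, 2))
```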

Data Analysis

Preliminary analysis. The data analysis began with a preliminary analysis to ensure that the data could lend themselves to the subsequent
statistical examinations. First, the researchers employed the listwise
deletion technique to cope with the missing data and subsequently
dropped from the data set 14 cases that were missing one or more items
on the SAIs. Second, the descriptive statistics of the responses were
computed, including means, standard deviations, skewness, and kurto-
sis. Third, the researchers calculated the reliability of the instruments
and ratings by means of Cronbach’s alpha coefficient. Fourth, four sta-
tistical assumptions were tested (normality, linearity, outliers, and multi-
collinearity), leading to the removal of four additional cases (i.e., three
univariate outliers and one multivariate outlier). In sum, the prelimin-
ary analysis eliminated 18 cases, reducing the sample size to a total of
334 cases: 165 for combination A and 169 for combination B.
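The screening steps just described might look roughly like the following, assuming pandas and scipy, a hypothetical analysis file, and conventional cutoffs (|z| > 3.29 for univariate outliers, a Mahalanobis-distance check for multivariate outliers); the paper does not report its exact thresholds, so those are assumptions.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2

cols = ["anxiety_independent", "anxiety_integrated",
        "independent_performance", "integrated_performance"]
data = pd.read_csv("speaking_study.csv")          # assumed file name

# 1. Listwise deletion of cases with missing values.
data = data.dropna(subset=cols)

# 2. Descriptive statistics: means, SDs, skewness, kurtosis.
print(data[cols].describe())
print(data[cols].skew(), data[cols].kurt())

# 3. Univariate outliers: standardized scores beyond +/-3.29 (assumed cutoff).
z = (data[cols] - data[cols].mean()) / data[cols].std(ddof=1)
data = data[(z.abs() <= 3.29).all(axis=1)]

# 4. Multivariate outliers: Mahalanobis distance against a chi-square cutoff.
x = data[cols].to_numpy()
diff = x - x.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(x, rowvar=False))
d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
data = data[d2 <= chi2.ppf(0.999, df=len(cols))]

print(len(data), "cases retained")
```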



Primary analysis. This study capitalized on path analysis to perform
the primary analysis. Path analysis represents an extension of multiple
regression analysis (Webley & Lea, 1997) and aims to estimate the
hypothesized causal relationships among various observed variables
(Kline, 2005). Yet, unlike multiple regression analysis, it allows
for simultaneous assessments of the relationships among any number
of independent and dependent variables (Schumacker & Lomax,
2004). Further, it enables researchers to decompose the influence of
one variable on another into direct and indirect effects, which could
provide insight into the “operative causal mechanism” (Olobatuyi,
2006, p. 12).
Figure 1 graphically depicts the baseline path model proposed in this
study to address the first research question. Specifically, this model
aimed to disentangle the relationships among the four assessed
variables, namely, anxiety for independent tasks (anxiety_independent),
anxiety for integrated tasks (anxiety_integrated), independent speaking
test performance (independent performance), and integrated speaking test
performance (integrated performance). In this baseline model, the
researchers specified the relationships among the variables using arrows;
a single-headed arrow reflects the direct effect of an exogenous variable
(i.e., independent variable) on an endogenous variable (i.e., dependent
variable), whereas a double-headed, curved arrow denotes the correla-
tion shared by two variables (Stage, Carter, & Nora, 2004).

FIGURE 1. The baseline path model: Anxiety_Independent → Independent Performance and Anxiety_Integrated → Integrated Performance, with disturbance terms e1 and e2 attached to the two performance variables.

Moreover, in the model, each endogenous variable comes with a disturbance term
that signals the portion of variance not explained by the model and
measurement error (Garson, 2011). Further, the researchers set the
scale of measurement for each disturbance term by constraining the
path connecting the disturbance term and its associated endogenous
variable to the value of one (Keith, 2006) and expressed the hypothe-
sized relationship between the two endogenous variables by correlating
their disturbance terms (Kenny, 2011).
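For readers who want to reproduce this specification outside AMOS, the sketch below expresses the same baseline model in lavaan-style syntax, assuming the Python semopy package (an assumption; the authors used AMOS 16.0 with maximum likelihood estimation). The data are simulated from the combination A means, SDs, and correlations reported in Table 2, purely so the example runs end to end.

```python
import numpy as np
import pandas as pd
import semopy

# Baseline path model from Figure 1: two direct paths plus two covariances
# (between the exogenous anxiety variables and between the disturbances).
MODEL = """
independent_performance ~ anxiety_independent
integrated_performance ~ anxiety_integrated
anxiety_independent ~~ anxiety_integrated
independent_performance ~~ integrated_performance
"""

# Simulate data roughly matching Table 2 (topic combination A, n = 165).
rng = np.random.default_rng(0)
corr = np.array([[1.00,  .69, -.31, -.27],
                 [ .69, 1.00, -.25, -.34],
                 [-.31, -.25, 1.00,  .63],
                 [-.27, -.34,  .63, 1.00]])
sd = np.array([9.61, 9.77, 2.31, 2.24])
mean = np.array([57.13, 55.79, 7.67, 7.56])
cov = corr * np.outer(sd, sd)
data = pd.DataFrame(
    rng.multivariate_normal(mean, cov, size=165),
    columns=["anxiety_independent", "anxiety_integrated",
             "independent_performance", "integrated_performance"],
)

model = semopy.Model(MODEL)
model.fit(data)                      # maximum likelihood estimation
print(model.inspect())               # path coefficients and covariances
print(semopy.calc_stats(model).T)    # chi-square, CFI, RMSEA, and other indices
```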
This baseline path model was formulated with reference to the
guidelines posited by Keith (2006). First, the unidirectional paths
connecting anxiety and speaking performance for both types of
speaking test tasks were postulated based on theory and previous
research. In terms of theory, the transactional process model of test
anxiety put forth by Spielberger and Vagg (1995) lends theoretical
support for this causal relationship. In light of this model, test anxi-
ety might arise from the cognitive appraisals or reappraisals test tak-
ers perform during the test task at hand and in turn set off task-
irrelevant thoughts and depress test performance. With regard to pre-
vious research, L2 scholars have repeatedly demonstrated the substan-
tial relationship between anxiety and language learning and/or test
performance (e.g., Hembree, 1988; Liu, 2007; Oya et al., 2004; Stein-
berg & Horwitz, 1986; Zeidner, 1998). Second, as for the correlation
between the two anxiety variables and between the two speaking per-
formance variables, the researchers found support largely in logic.
That is, it stands to reason that the two anxiety variables might con-
stitute the componential variables of a higher order anxiety construct
such as overall oral test anxiety, and the two speaking performance
variables might fall under the umbrella of an overall oral proficiency
construct.
The researchers used the AMOS 16.0 program with maximum likeli-
hood estimation to evaluate each proposed baseline path model using
the correlation matrix of the four observed variables. Further, they
assessed each model in terms of its fit with the gathered data by refer-
ring to multiple fit indices. Specifically, the researchers employed the
set of fit indices recommended by Kline (2005): (1) the χ² test statistic
along with its level of significance, (2) the comparative fit index (CFI),
(3) the root mean square error of approximation (RMSEA), and (4) the
standardized root mean square residual (SRMR). With regard to the
criteria for these indices, for the χ² test, a good model fit should
feature a relatively small and nonsignificant χ² value (Hatcher, 1996).
Concerning CFI, RMSEA, and SRMR, the more stringent cutoff criteria
suggested by Hu and Bentler (1999) were adopted (i.e., CFI > .95,
RMSEA < .06, and SRMR < .08).
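As a concrete illustration of how these cutoffs work together, the helper below checks a set of fit values against Kline's index set and the Hu and Bentler criteria; the numbers plugged in are those later reported for the combination A baseline model (Figure 3), and the function is only a bookkeeping sketch, not part of the authors' procedure.

```python
from scipy.stats import chi2

def acceptable_fit(chi2_value, df, cfi, rmsea, srmr, alpha=0.05):
    """Apply the criteria used in the study: a nonsignificant chi-square plus
    Hu and Bentler's (1999) cutoffs of CFI > .95, RMSEA < .06, SRMR < .08."""
    p = chi2.sf(chi2_value, df)
    return (p > alpha) and (cfi > 0.95) and (rmsea < 0.06) and (srmr < 0.08)

# Values reported for the combination A baseline model (Figure 3).
print(acceptable_fit(chi2_value=1.405, df=2, cfi=1.000, rmsea=0.000, srmr=0.0297))
# -> True; chi2.sf(1.405, 2) is about .495, matching the reported p value.
```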

FIGURE 2. The sample anxiety constraint model: identical to the baseline path model except that the two paths from the anxiety variables to the performance variables are constrained to be equal (both labeled a).

In an attempt to address the second research question, the researchers constructed an anxiety constraint model for each topic combination
based on its associated baseline path model (Figure 2). Specifically, they
developed each anxiety constraint model by imposing an equality con-
straint on the two paths pointing toward the speaking performance vari-
ables from the anxiety variables to fix them to be equal in strength (as
denoted by the letter a assigned to the two paths, as shown in Figure 2).
The rationale behind this equality constraint was that, because adding
constraints would invariably deteriorate the model fit (Keith, 2006), if
the equality constraint did not lead to a significant model fit deteriora-
tion, it would imply that the two constrained paths shared a comparable
value, and a case could then be made that anxiety did not differentially
influence the independent performance and the integrated perfor-
mance. Conversely, if the equality constraint significantly worsened the
model fit, it would suggest that the two constrained paths indeed dif-
fered substantially from each other in magnitude and, by inference, anx-
iety did differentially impact the independent performance and the
integrated performance. Finally, the researchers compared the baseline
path model and the anxiety constraint model in terms of their relative
model fit by way of the χ² difference test. By performing this test, the
researchers could examine whether the deterioration in model fit
caused by constraining the two paths had reached the predetermined
level of statistical significance (Kline, 2005).
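The χ² difference test itself reduces to a few lines, sketched here with scipy; the values plugged in are the combination A figures the authors report in Table 4, used only to show the arithmetic.

```python
from scipy.stats import chi2

def chi_square_difference(chi2_constrained, df_constrained,
                          chi2_baseline, df_baseline):
    """p value for the fit deterioration caused by the equality constraint."""
    delta_chi2 = chi2_constrained - chi2_baseline
    delta_df = df_constrained - df_baseline
    return delta_chi2, delta_df, chi2.sf(delta_chi2, delta_df)

# Topic combination A (Table 4): baseline chi2 = 1.405 (df = 2),
# constraint chi2 = 1.446 (df = 3).
print(chi_square_difference(1.446, 3, 1.405, 2))
# -> (0.041, 1, ~.84): no significant loss of fit, so the two constrained
#    paths are treated as equal in strength.
```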



TABLE 2
Correlations, Means, and Standard Deviations for the Observed Variables for Topic Combination A (n = 165)

                               Correlation
Observed variable              1      2      3      4      M       SD
1. Anxiety independent         1                           57.13   9.61
2. Anxiety integrated          .69    1                    55.79   9.77
3. Independent performance     –.31   –.25   1             7.67    2.31
4. Integrated performance      –.27   –.34   .63    1      7.56    2.24

TABLE 3
Correlations, Means, and Standard Deviations for the Observed Variables for Topic Combination B (n = 169)

                               Correlation
Observed variable              1      2      3      4      M       SD
1. Anxiety independent         1                           51.76   10.88
2. Anxiety integrated          .60    1                    52.62   9.74
3. Independent performance     –.23   –.13   1             7.65    2.79
4. Integrated performance      –.17   –.21   .72    1      7.46    2.46

RESULTS

Correlations

As shown in Tables 2 and 3, the four observed variables comprising the path model constructed for each topic combination were all
correlated with one another. Further, these correlations all reached
statistical significance and, for the most part, featured a moderate
degree of association, suggesting that for each pair of variables the
value of one variable would change in proportion to that of the other
(StatSoft, 2012). To illustrate, for combination A (Table 2), the
negative, moderate correlation between anxiety_integrated and integrated
performance (r = -.34) implied that an increase in anxiety reactions
toward integrated tasks would lead to a moderate decrease in perfor-
mance on such tasks.
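Descriptive tables of this kind (Tables 2 and 3) can be generated directly from the screened data, as in the sketch below; the file and column names carry over from the earlier sketches and are assumptions, as is the grouping column for the two topic combinations.

```python
import pandas as pd

cols = ["anxiety_independent", "anxiety_integrated",
        "independent_performance", "integrated_performance"]
data = pd.read_csv("speaking_study.csv")                 # assumed analysis file

for combo, group in data.groupby("topic_combination"):   # assumed grouping column
    print(f"Topic combination {combo} (n = {len(group)})")
    print(group[cols].corr().round(2))                   # correlation matrix
    print(group[cols].agg(["mean", "std"]).round(2))     # M and SD rows
```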

Baseline Path Models

The baseline path model for each topic combination contained two
paths and two correlations among the four observed variables. Figure 3
presents the baseline path model for topic combination A annotated
with the calculated parameter estimates.

FIGURE 3. The estimated baseline path model for topic combination A. Path coefficients: Anxiety_Independent → Independent Performance = -.28, Anxiety_Integrated → Integrated Performance = -.31; correlations: anxiety variables = .69, performance disturbances = .60. Fit: χ² = 1.405, df = 2, p = .495; CFI = 1.000; RMSEA = .000; SRMR = .0297.

As shown, the χ² significance
test for this model generated a nonsignificant χ² value of 1.405 with 2
degrees of freedom (p = .495), the CFI achieved an impressive value of
1.00, the RMSEA (.00) fell below the recommended criterion of .06,
and the SRMR (.0297) was well within the acceptable limit. Taken
together, these fit indices corroborated the baseline path model for
topic combination A as a nearly perfect representation of the data, one
that required no additional modifications. In regard to topic combination
B, a similar picture emerged for the fit of its baseline path model. As
revealed by Figure 4, this model yielded a χ² value of 0.519 with 2
degrees of freedom (p = .771), a CFI of 1.00, an RMSEA approximating
0, and an SRMR of .0191. These statistics collectively suggest that the
baseline path model for topic combination B featured an outstanding
fit to the data and needed no further changes.
A path coefficient reflects “the direct effect of one variable (assumed
to be a cause) on another variable (assumed to be the effect)” (Stage
et al., 2004, p. 5). As illustrated by Figure 3, for topic combination A
the path coefficients for the connection between anxiety_independent and
independent performance (β = -.28) and between anxiety_integrated and
integrated performance (β = -.31) both represented large direct effects,
in view of the effect interpretation guidelines (Keith, 2006). These esti-
mates imply that test anxiety exerted a significantly negative impact on
the performance of both types of test tasks; the stronger the test anxi-
ety, the weaker the test performance, be it independent or integrated.

FIGURE 4. The estimated baseline path model for topic combination B. Path coefficients: Anxiety_Independent → Independent Performance = -.20, Anxiety_Integrated → Integrated Performance = -.20; correlations: anxiety variables = .60, performance disturbances = .71. Fit: χ² = 0.519, df = 2, p = .771; CFI = 1.000; RMSEA = .000; SRMR = .0191.

As also demonstrated by Figure 3, the two anxiety variables were characterized by a moderate correlation estimate of .69, and the disturbances
of the two speaking performance variables were highly correlated
(r = .60). This result lent partial support to the assumption that these
two pairs of variables might each represent the componential variables
of a more encompassing construct; that is, anxiety_independent and anxi-
ety_integrated both measured the overall oral test anxiety, and indepen-
dent performance and integrated performance both assessed the overall oral
proficiency. With respect to topic combination B, a cursory perusal of
Figure 4 reveals a similar pattern: a moderate path coefficient
for the path leading to independent performance from anxiety_independent
(β = -.20) and for the path leading to integrated performance from
anxiety_integrated (β = -.20), a moderate correlation between the two
anxiety variables (r = .60), and a strong relationship between the
two speaking performance variables (r = .71).

Anxiety Constraint Models

Before comparing the baseline path model (hereafter the baseline model) and the anxiety constraint model (hereafter the constraint
model) to examine how the added path constraint affected the model
fit, the researchers first estimated the goodness-of-fit of each constraint
model, because a model comparison is only justified when the models
to be compared both show at least an acceptable fit to begin with (Brown, 2006).

TABLE 4
Model Comparison Results for the Two Topic Combinations

Topic combination   Model        χ²       df   Δχ²      Δdf   p
A                   Baseline     1.405    2
                    Constraint   1.446    3    0.041    1     .84
B                   Baseline     0.5188   2
                    Constraint   0.5195   3    0.0007   1     .98

The estimation results reveal an excellent model-data fit for both the
constraint model for topic combination A (χ²(3) = 1.446, p > .05;
CFI = 1.00; RMSEA = .00; SRMR = .0297) and the constraint model for
topic combination B (χ²(3) = 0.5195, p > .05; CFI = 1.00; RMSEA = .00;
SRMR = .0191).
Having confirmed the adequate fit of the constraint models, the
researchers proceeded to compare, for each topic combination, the
baseline model and the constraint model by way of a χ² difference
test. Table 4 presents the comparison results for the two topic
combinations. For topic combination A, the χ² difference test pro-
duced a Δχ² of 0.041 with a Δdf of 1, whose probability (p = .84) fell
well above .05, suggesting that the constraint model did not exhibit a
significant loss of model fit as compared to the baseline model. This
finding indicates that the two paths from the anxiety variables to the
speaking performance variables possessed comparable values and, by
inference, anxiety equally affected the independent performance and
the integrated performance. In other words, test anxiety detracted from
performance on the two types of tasks to a similar degree. In
regard to topic combination B, the model comparison results painted
an almost identical picture. That is, it was shown that the constraint
model enjoyed a similar degree of model fit as the baseline model
(Δχ²(1) = 0.0007, p = .98). This finding implies that the values of the
two paths from the anxiety variables to the speaking performance vari-
ables did not differ from each other in any significant way, which in
turn connotes that the effects of anxiety on the independent perfor-
mance and on the integrated performance approximated each other.
Put another way, test anxiety compromised performance on the two
types of tasks to a comparable extent.

DISCUSSION
Following the path analyses performed on the collected data, three
major findings came to light: absence of topic effects, adverse impact
of test anxiety on speaking performance, and nondifferential
impact of test anxiety on independent performance and integrated
performance. The following sections discuss and interpret each of these findings.

Absence of Topic Effects

Previous research has demonstrated that topic selection can produce variations in L2 speaking test or learning performance (e.g., Douglas & Selinker, 1992; Papajohn, 1999). This study therefore used two counterbalanced topic combinations for the independent tasks and the integrated tasks (topic combination A and topic combination B) and performed all of the path analyses twice, once for each combination, to account for such potential topic effects.
close correspondence between the findings associated with the two
topic combinations; namely, for both combinations, anxiety contrib-
uted significantly and negatively to both the independent performance
and the integrated performance, and its impact on these two types of
performance did not differ in any substantial way. These parallels
between the two combinations readily rule out the presence of topic
effects and signify that the relationships between test anxiety and
speaking test performance do not hinge on the topics of the given test
tasks, which in turn provides justification for interpretations to be
made of the overall relationships between test anxiety and speaking
test performance without regard to the topics.

Adverse Impact of Test Anxiety on Speaking Performance

In answer to the first research question, the baseline models indicate that test anxiety consistently exerted a significantly negative influence on speaking test performance, be it independent
performance or integrated performance. Although it was well within
the researchers’ expectation that test anxiety interfered with indepen-
dent performance, because a large body of research has confirmed the
detrimental impact of anxiety on language learning or test perfor-
mance (e.g., Hembree, 1988; Liu, 2007; Oya et al., 2004; Steinberg &
Horwitz, 1986; Zeidner, 1998), it came as a surprise that test anxiety
applied an equally significant, detrimental effect on integrated perfor-
mance. As mentioned earlier, one primary motivation of the current
research pertained to the theoretical assumption that integrated per-
formance might be less affected by test anxiety, because integrated
tasks provided background knowledge for the test takers to formulate
their oral responses and because lack of relevant background
knowledge has often been cited as a source of test anxiety. However,
based on this study’s findings, this assumption was effectively rejected,
suggesting that despite the provision of background knowledge, test
anxiety still played an important role in determining the performance
on the integrated tasks. At the same time, this finding also implies that
test anxiety still constituted as important a source of CIV for inte-
grated tasks as it did for independent tasks, which in turn signifies that
the scores generated for integrated performance represent both oral
proficiency and anxiety and should be interpreted with caution.
The researchers surmise that this finding might have arisen due to
the test takers’ unfamiliarity with the integrated test format. As demon-
strated by Bonacci and Reeve (2010), test format constituted one of
the primary precursors of test anxiety. Because integrated tasks
emerged as a relatively novel form of oral speaking assessment for the
participants, they may thus have perceived these tasks as more
challenging. Further, it has been shown that when the task at hand
presents a higher level of difficulty or challenge, cognitive interfer-
ence, one major component of test anxiety (Hodapp, 1995), might
manifest itself for the more test-anxious students (Sarason, 1987), caus-
ing them to churn out task-irrelevant thoughts (such as ruminative,
self-evaluative, and self-occupied concerns) that intrude upon their
attention to the task (Sarason, 1984) and lower their performance as a
consequence (Sarason & Stoops, 1978). Following this line of reason-
ing, it seems plausible to speculate that although integrated tasks
provided input for the test takers to generate their responses and
might, as such, have reduced their anxiety to a certain degree, a com-
paratively unfamiliar test format might have simultaneously augmented
their anxiety through increasing the occurrence of task-irrelevant
thinking, eventually allowing test anxiety to remain a crucial factor in
determining performance.

Nondifferential Impact of Test Anxiety on Independent Performance and Integrated Performance

As the anxiety constraint models illustrate, the two paths linking the anxiety variables and the speaking performance variables were statistically comparable in magnitude, which implies that test anxiety did not differentially affect independent performance and integrated performance, thus answering the second research question in the negative. To conjecture, this nondifferential impact of test anxiety
might have stemmed from the additional processing loads entailed by
the reading and listening input in the integrated tasks. In view of the
transactional process model of test anxiety noted earlier, the cognitive
appraisals or reappraisals test takers perform during the test task at
hand might determine the degree of test anxiety they might experi-
ence and, in turn, lead to the formation of task-irrelevant thoughts
that would distract attention from the task and eventually compromise
performance (Spielberger & Vagg, 1995). In the context of the
integrated tasks, when tackling such tasks the test takers might have
initially appraised them as less threatening and stressful due to the
availability of background information (and perhaps their lack of prior
experience with this test format) and as such experienced a lower
degree of test anxiety and fewer task-irrelevant thoughts. However, as
they noticed that for such integrated tasks they needed to almost
simultaneously process the reading and listening input while generat-
ing an oral summary, they might have turned around and reappraised
these tasks as challenging, leading them to an elevated level of test
anxiety and task-irrelevant thoughts and eventually impairing their per-
formance in the same way that test anxiety impaired independent performance.
Simply put, while the integrated tasks might have reduced test
anxiety and its impact on performance via supplying background
input, their requirements for the input processing might have con-
currently inflicted additional anxiety and thus a more deleterious
impact on performance, in the end canceling out the advantage
afforded by the input and allowing test anxiety to impact integrated
performance in a manner analogous to the way it impacted indepen-
dent performance.
An alternative explanation might be drawn from the low-stakes
nature of the oral tasks. As suggested by Young (1986), one possible
reason for the nonsignificant impact of state anxiety on oral profi-
ciency interview performance revealed in her study might have related
to the unofficial nature of the test. This might also hold true for the
finding here. In this study, the participants’ performance on the oral
tasks did not have any repercussions for them, so it is reasonable to
assume that they might have suffered less anxiety and fewer task-irrele-
vant thoughts than when the test results mattered to them, thus mask-
ing the advantage afforded by the input in reducing the impact of test
anxiety. The researchers hypothesized that had the oral tasks been
high-stakes tests carrying some formal consequences, the test takers
might have experienced less test anxiety and fewer task-irrelevant thoughts
while completing the integrated tasks because such tasks offered back-
ground information for them to frame their responses, in which case
the impact of test anxiety on integrated performance might have been
less than that on independent performance.



CONCLUSION AND IMPLICATIONS

This path-analytic study modeled the relationship between anxiety and integrated speaking performance and explored whether test anxi-
ety would differentially affect independent speaking performance and
integrated speaking performance. The analyses performed on the four
path models constructed on the basis of theory, previous research, and
logic led to two major findings that held constant for both of the topic
combinations. In answer to the first research question, one major
finding shows that test anxiety significantly affected integrated
performance. In response to the second research question, the other
major finding reveals that the impacts of test anxiety on independent
performance and on integrated performance were statistically equiva-
lent. These findings, coupled with Huang and Hung’s (2010) finding
that test takers experienced a comparable degree of test anxiety in sit-
ting for an independent task and an integrated task, suggest that the
advantage of integrated tasks over independent tasks, if any, might not
relate to the reduction of test anxiety or its impact on performance.
These findings hold theoretical and pedagogical implications for L2
oral assessment. Theoretically, they substantiate the model of language
use (Bachman & Palmer, 1996) discussed earlier. That is, as revealed
in this study, test anxiety significantly impacted both independent per-
formance and integrated performance, which provides endorsement
for this model whereby affective schemata (e.g., anxiety) are hypothe-
sized to influence language use in an evaluative context. Additionally,
the findings also lend support for the expanded model of oral test
performance propounded by Skehan (1998). This model conceptual-
izes oral test performance as coming under the influence of multiple
components, including task, rater, scale criteria, interactants, and can-
didate (i.e., test taker). Because the current findings have revealed the
obstructive influence of anxiety on oral test performance, and anxiety
clearly constitutes an important test taker characteristic, this study thus
further corroborates the adequacy of this theoretical model. The find-
ings also imply that the ratings awarded for the integrated perfor-
mance represent both ability (i.e., oral proficiency) and test anxiety.
The current study uncovered a significant impact of test anxiety on
integrated performance, which amounts to stating that the perfor-
mance on integrated tasks reflects both how well test takers could
speak in the L2 and how much test anxiety took away from this ability.
In other words, the construct of integrated tasks taps both speaking
proficiency and test anxiety, or, more accurately, it represents the
speaking proficiency attenuated by test anxiety. In light of this finding,
test users are advised to exercise caution in making inferences about
the ratings assigned to the performance on integrated tasks and in
taking actions based on these inferences.
In addition, in light of these findings, L2 practitioners are recom-
mended to help students develop anxiety-coping strategies for grap-
pling with speaking test tasks. As this study found, test anxiety exerted
a significantly negative impact on the performance of the two types of
oral test tasks. Therefore, if students could manage to reduce anxiety
while taking an oral task, their performance might be improved as a
result. In this case, L2 instructors might consider setting aside some
class time to demonstrate and practice anxiety-coping strategies that
can lend themselves to application in the oral evaluation context. For
instance, they could introduce the emotion-focused coping strategies
posited by Zeidner (1998) that guide students, in addressing an oral
test task, to remind themselves to relax by taking deep breaths and dis-
tancing themselves from the testing threat by temporarily ignoring the
consequence of the task.

FUTURE RESEARCH

To provide further insight into the implementation of integrated speaking test tasks, the researchers propose three avenues for future
research. First, because this study recruited only Taiwanese EFL col-
lege students who shared cultural and linguistic backgrounds, the find-
ings might only apply to this particular population. Thus, future
research should include participants with diverse nationalities and lan-
guages so as to cross-validate the results of the current study. Second,
this study administered the oral test tasks to all of the participants in
the same order (i.e., beginning with two independent tasks and con-
cluding with two integrated tasks), due primarily to the test takers’
unfamiliarity with the integrated speaking test format. That is, because
most test takers in this study had never taken integrated tasks before,
beginning with such tasks might run the risk of confounding the anxi-
ety caused by this test format itself with the anxiety induced by the
unfamiliarity with the format. To circumvent this potential confound-
ing effect, the researchers consistently preceded the integrated tasks
with the independent tasks. Therefore, subsequent research may coun-
terbalance the sequence of presenting these two types of tasks (i.e.,
starting with independent tasks for half of the participants and begin-
ning with integrated tasks for the other half) so as to rule out the
potential impact of task ordering, if any. Third, the oral test tasks in
this research took place in an informal context and might as such
have distorted the picture of the anxiety-performance relationship to a
certain degree. Hence, ensuing research may probe the same issues
with data gathered from oral test tasks of a more formal or high-stakes
nature.

ACKNOWLEDGMENTS
This study was undertaken based on part of the research data collected for the
first author’s doctoral dissertation sponsored by the TOEFL Small Grants for Doc-
toral Research in Second or Foreign Language Assessment Program at ETS. Thus,
we would like to express gratitude to this program for its financial support.

THE AUTHORS

Heng-Tsung Danny Huang obtained his PhD in foreign language education from
the University of Texas at Austin, in the United States, and is currently working as
an assistant professor in the Department of Foreign Languages and Literature at
National Sun Yat-Sen University, in Taiwan. His research interests include lan-
guage testing, computer-assisted language teaching and learning, and quantitative
research methods.

Shao-Ting Alan Hung received his doctoral degree from the Department of Liter-
acy, Culture and Language Education at Indiana University, Bloomington, in the
United States. He is currently an associate professor in the Department of Applied
Foreign Languages at National Taiwan University of Science and Technology, in
Taiwan. His research interests include computer-assisted language teaching and
learning, second language writing and speaking pedagogy, language assessment,
and language teacher education.

REFERENCES
Bachman, L., & Palmer, A. S. (1996). Language testing in practice. Oxford, England:
Oxford University Press.
Bonacci, S., & Reeve, C. L. (2010). The nature and relative importance of
students’ perceptions of the sources of test anxiety. Learning and Individual
Differences, 20, 617–625. doi:10.1016/j.lindif.2010.09.007
Brown, T. A. (2006). Confirmatory factor analysis for applied research. New York, NY:
Guilford Press.
Butler, F. A., Eignor, D., Jones, S., McNamara, T., & Suomi, B. K. (2000). TOEFL
2000 speaking framework: A working paper (TOEFL Monograph No. 20). Prince-
ton, NJ: Educational Testing Service.
Chung, S. C., & Lung, C. F. (1984). 修訂情境與特質量表之研究 [Modifying the
State-Trait Anxiety Inventory]. 測驗年刊 [Psychological Testing], 31, 27–36.
Douglas, D., & Selinker, L. (1992). Analyzing oral proficiency test performance in
general and specific purpose contexts. System, 20, 317–328. doi:10.1016/0346-251X(92)90043-3
Downing, S. (2002). Threats to the validity of locally developed multiple-choice
tests in medical education: Construct-irrelevant variance and construct under-
representation. Advances in Health Sciences Education, 7, 235–241. doi:10.1023/
A:1021112514626

266 TESOL QUARTERLY


ETS. (2008). TOEFL iBT tips: How to prepare for the TOEFL iBT. Princeton, NJ: Author.
Garson, G. D. (2011). Statnotes: Topics in multivariate analysis. Retrieved from
http://faculty.chass.ncsu.edu/garson/pa765/statnote.htm
Gebril, A. (2006). Independent and integrated academic writing tasks: A study in general-
izability and test method (Unpublished doctoral dissertation). Iowa City, IA: Uni-
versity of Iowa.
Haladyna, T. M., & Downing, S. M. (2004). Construct-irrelevant variance in high-
stakes testing. Educational Measurement: Issues and Practice, 23(1), 17–27. doi:10.
1111/j.1745-3992.2004.tb00149.x
Hatcher, L. (1996). Using SAS PROC CALIS for path analysis: An introduction.
Structural Equation Modeling, 3, 176–192. doi:10.1080/10705519609540037
Hembree, R. (1988). Correlates, causes, effects, and treatment of test anxiety.
Review of Educational Research, 58, 47–77.
Hodapp, V. (1995). The TAI-G: A multidimensional approach to the assessment of
test anxiety. In C. Schwarzer & M. Zeidner (Eds.), Stress, anxiety, and coping in
academic settings (pp. 95–130). Tübingen, Germany: Francke.
Hu, L. T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance
structure analysis: Conventional criteria versus new alternatives. Structural Equa-
tion Modeling, 6, 1–55. doi:10.1080/10705519909540118
Huang, H.-T., & Hung, S.-T. (2010). Examining the practice of a reading-to-speak
test task: Anxiety and experience of EFL students. Asia Pacific Education Review,
11, 235–242. doi:10.1007/s12564-010-9072-6
Iwashita, N., Brown, A., McNamara, T., & O’Hagan, S. (2008). Assessed levels of sec-
ond language speaking proficiency: How distinct? Applied Linguistics, 29, 24–49.
doi:10.1093/applin/amm017
Jiang, P., & Grimes, J. (2008). TOEFL-iBT speaking 120. Taipei, Taiwan: Jinni.
Keith, T. Z. (2006). Multiple regression and beyond. Boston, MA: Pearson Education.
Kenny, D. A. (2011). Structural equation modeling. Retrieved from http://david
akenny.net/cm/causalm.htm
Kline, R. B. (2005). Principles and practice of structural equation modeling (2nd ed.).
New York, NY: Guilford Press.
Lee, Y. (2006). Dependability of scores for a new ESL speaking assessment consist-
ing of integrated and independent tasks. Language Testing, 23, 131–166. doi:10.
1191/0265532206lt325oa
Lee, Y. (2008). Longman iBT TOEFL–Speaking. Taipei, Taiwan: Pearson Education.
Lewkowicz, J. A. (1997). The integrated testing of a second language. In C. Clap-
ham & D. Corson (Eds.), Encyclopedia of language and education: Language testing
and assessment 7 (pp. 121–130). Dordrecht, the Netherlands: Kluwer Academic.
Liu, M. (2007). Language anxiety in EFL testing situations. International Journal of
Applied Linguistics, 153, 53–75. doi:10.2143/ITL.153.0.2022821
Luoma, S. (2004). Assessing speaking. Cambridge, England: Cambridge University
Press.
Messick, S. (1989). Meaning and values in test validation: The science and ethics
of assessment. Educational Researcher, 18(2), 4–11.
Messick, S. (1994). The interplay of evidence and consequences in the validation
of performance assessments. Educational Researcher, 23(2), 13–23.
Messick, S. (1995). Validity of psychological assessment: Validation of inferences
from persons’ responses and performances as scientific inquiry into score
meaning. American Psychologist, 50, 741–749. doi:10.1037/0003-066X.50.9.741
Miller, M., & Legg, S. (1993). Alternative assessment in a high-stakes environment.
Educational Measurement: Issues and Practice, 12(2), 9–15. doi:10.1111/j.1745-
3992.1993.tb00528.x

TEST ANXIETY AND SPEAKING TEST PERFORMANCE 267


Olobatuyi, M. E. (2006). A user’s guide to path analysis. Lanham, MD: University
Press of America.
Oya, T., Manalo, E., & Greenwood, J. (2004). The influence of personality and
anxiety on the oral performance of Japanese speakers of English. Applied
Cognitive Psychology, 18, 841–855. doi:10.1002/acp.1063
Papajohn, D. (1999). The effect of topic variation in performance testing: The
case of the chemistry TEACH test for international teaching assistants. Language
Testing, 16, 52–81. doi:10.1177/026553229901600104
Pearlman, M. (2008). Finalizing the test blueprint. In C. A. Chapelle, M. K.
Enright, & J. M. Jamieson (Eds.), Building a validity argument for the Test of Eng-
lish as a Foreign Language (pp. 227–258). New York, NY: Taylor & Francis.
Plakans, L. (2008). Comparing composing processes in writing-only and reading-
to-write test tasks. Assessing Writing, 13, 111–129. doi:10.1016/j.asw.2008.07.001
Plakans, L. (2009). Discourse synthesis in integrated second language writing
assessment. Language Testing, 26, 561–587. doi:10.1177/0265532209340192
Plakans, L. (2010). Independent vs. integrated writing tasks: A comparison of task
representation. TESOL Quarterly, 44, 185–194. doi:10.5054/tq.2010.215251
Read, J. (1990). Providing relevant content in an EAP writing test. English for
Specific Purposes, 9, 109–121. doi:10.1016/0889-4906(90)90002-T
Sarason, I. G. (1984). Stress, anxiety, and cognitive interference: Reactions to tests.
Journal of Personality and Social Psychology, 46, 929–938. doi:10.1037/0022-3514.
46.4.929
Sarason, I. G. (1987). Test anxiety, cognitive interference, and performance. In R.
E. Snow & M. J. Farr (Eds.), Aptitude, learning and instruction: Cognitive and affect
process analyses (Vol. 3, pp. 131–142). Hillsdale, NJ: Lawrence Erlbaum.
Sarason, I. G., & Stoops, R. (1978). Test anxiety and the passage of time. Journal of
Consulting and Clinical Psychology, 46, 102–109. doi:10.1037/0022-006X.46.1.102
Schumacker, R. E., & Lomax, R. G. (2004). A beginner’s guide to structural equation
modeling. New York, NY: Taylor & Francis.
Shohamy, E. (1995). Performance assessment in language testing. Annual Review of
Applied Linguistics, 15, 188–211. doi:10.1017/S0267190500002683
Skehan, P. (1998). A cognitive approach to language learning. Oxford, England:
Oxford University Press.
Spielberger, C. D. (1966). Theory and research on anxiety. In C. D. Spielberger
(Ed.), Anxiety and behavior (pp. 3–20). New York, NY: Academic Press.
Spielberger, C. D. (1972). Anxiety as an emotional state. In C. D. Spielberger
(Ed.), Anxiety: Current trends in theory and research (pp. 23–49). New York, NY:
Academic Press.
Spielberger, C. D. (1983). Manual for the State-Trait Anxiety Inventory (Form Y). Palo
Alto, CA: Consulting Psychologists Press.
Spielberger, C. D., Anton, W. D., & Bedell, J. (1976). The nature and treatment of
test anxiety. In M. Zuckerman & C. D. Spielberger (Eds.), Emotions and anxiety:
New concepts, methods, and applications (pp. 317–345). Mahwah, NJ: Lawrence
Erlbaum.
Spielberger, C. D., Gorsuch, R. L., & Lushene, R. D. (1970). STAI: Manual for the
State-Trait Anxiety Inventory. Palo Alto, CA: Consulting Psychologists Press.
Spielberger, C. D., & Vagg, P. R. (1995). Test anxiety: A transactional process
model. In C. D. Spielberger & P. R. Vagg (Eds.), Test anxiety: Theory, assessment,
and treatment (pp. 3–14). Washington, DC: Taylor & Francis.
Stage, F. K., Carter, H. C., & Nora, A. (2004). Path analysis: An introduction and
analysis of a decade of research. Journal of Educational Research, 98, 5–12. doi:10.
3200/JOER.98.1.5-13

268 TESOL QUARTERLY


StatSoft. (2012). Electronic statistics textbook. Tulsa, OK: Author. Retrieved from
http://www.statsoft.com/textbook/
Steinberg, F. S., & Horwitz, E. K. (1986). The effect of induced anxiety on the
denotative and interpretative content of second language speech. TESOL
Quarterly, 20, 131–136. doi:10.2307/3586395
Swain, M., Huang, L.-S., Barkaoui, K., Brooks, L., & Lapkin, S. (2009). The speaking
section of the TOEFL iBT (SSTiBT): Test-takers’ reported strategic behaviors (TOEFL
iBT Research Report No. iBT-10). Princeton, NJ: Educational Testing Service.
Watanabe, Y. (2001). Read-to-write tasks for the assessment of second language academic
writing skills: Investigating text features and rater reactions (Unpublished doctoral
dissertation). Manoa, HI: University of Hawaii.
Webley, P., & Lea, S. (1997). Topic 3: Path analysis. Retrieved from http://people.
exeter.ac.uk/SEGLea/multvar2/pathanal.html
Weigle, S. (2004). Integrating reading and writing in a competency test for
non-native speakers of English. Assessing Writing, 9, 28–47. doi:10.1016/j.asw.
2004.01.002
Young, D. J. (1986). The relationship between anxiety and foreign language oral
proficiency ratings. Foreign Language Annals, 19, 439–445. doi:10.1111/j.1944-
9720.1986.tb01032.x
Zeidner, M. (1998). Test anxiety: The state of the art. New York, NY: Plenum Press.

TEST ANXIETY AND SPEAKING TEST PERFORMANCE 269

You might also like