
Linguistics and Education 24 (2013) 535–544


Task type and linguistic performance in school-based assessment situation

Zhengdong Gan
Department of Linguistics and Modern Languages, The Hong Kong Institute of Education, 10 Lo Ping Road, Tai Po, N.T., Hong Kong

Article info
Available online 21 September 2013
Keywords: ESL; Task type; Linguistic performance; School-based assessment

Abstract
This study aims at examining how learner L2 oral performance may vary across two different task types in the current school-based assessment initiative being implemented across secondary schools in Hong Kong. The study is innovative in that the tasks in this study involve speaking in a high-stakes language assessment context but they also build on a regular reading and viewing programme integrated into the school curriculum. An in-depth analysis of learner oral linguistic performance on two different task types, i.e., group interaction and individual presentation, from 30 ESL secondary school students, was conducted using a wide range of linguistic measures of accuracy, fluency and complexity derived from previous L2 speaking studies. The analysis shows generally systematic variation in performance dimensions across the two task types, suggesting a trend in the direction of less accuracy, lower fluency and less complex language being associated with the group discussion task. In addition, differences on rater assessments also appeared in the same direction across the two tasks as those differences on the linguistic measures. The results of this study appear to offer little support to the existing categorization of interactive tasks producing greater L2 complexity and accuracy than non-interactive tasks. Implications of the results for both test task development and classroom task design are discussed.

© 2013 Elsevier Inc. All rights reserved.

1. Introduction

Studying task characteristics and the effect they have on language learning and language performance has become a burgeoning area of research within second language acquisition (SLA), pedagogy, and assessment. Previous studies have examined the effects of one or another aspect of second language (L2) task demands, such as the nature and extent of participation on tasks (e.g., Duff, 1993; van Lier & Matsuo, 2000), the availability of planning time and task output (Wigglesworth, 1997), the effect of task design and performance conditions on language performance (e.g., Tavakoli, 2009; Tavakoli & Foster, 2008), and task difficulty (e.g., Elder, Iwashita, & McNamara, 2002; Norris, Brown, Hudson, & Bonk, 2002). L2 acquisition and pedagogy researchers are interested in task-based learner performance because learner language output during task performance can inform us about the impact of tasks on emerging or partially internalized target language rules (Samuda, 2001), and because tasks are seen as important vehicles for fostering or steering L2 learning and L2 development. In L2 assessment, although the notion of task in task-based language performance assessment is of relatively recent lineage, deriving much of its impetus from research in SLA and pedagogy (Bachman, 2002), it has been recognized that understanding the effects of assessment tasks on test performance and how test-takers interact with these tasks is "the most pressing issue facing language performance assessment" (p. 471). More explicitly, Fulcher and Marquez-Reiter (2003) suggest that learner

Tel.: +852 29487391.
E-mail address: zdgan@ied.edu.hk
0898-5898/$ – see front matter © 2013 Elsevier Inc. All rights reserved.
http://dx.doi.org/10.1016/j.linged.2013.08.004

Table 1
Definitions of fluency, complexity, and accuracy (based on Housen & Kuiken, 2009; Pallotti, 2009).

Aspect       Definition
Fluency      The capacity to produce speech at a normal rate and without interruption, or the production of language in real time without undue pausing or hesitation.
Complexity   The size, elaborateness, richness, and diversity of the learner's linguistic L2 system.
Accuracy     The degree of deviancy from a particular norm.

task-based oral language performance, at least to some extent, reflects test-takers' language ability plus some construct-irrelevant variance triggered by the impact of task characteristics (see also Tavakoli, 2009). Fulcher's argument is thus that task variability may well constitute a source of variance in test-takers' oral linguistic performance. In other words, there is likely to be variation in test-taker performance by task characteristics, and this variation may influence the type of discourse and interaction elicited, which in turn probably impacts upon the final assessment of a candidate. Consequently, research identifying the catalysing features of tasks that impact on a learner's language processing should provide empirically sound principles for both classroom materials design and test task development, and thus have practical value (Bygate, 1999; Tavakoli & Foster, 2008).
Prominently, in many SLA and L2 pedagogy contexts, variable performance by task and task characteristics or conditions has been the object of a rich research programme adopting a cognitive approach that focuses on the interplay of various aspects of task demandingness and language performance, as displayed in its fluency, accuracy, and complexity. Theoretical models proposed by Skehan (1998, 2001, 2009) and Robinson (2001, 2005, 2007) represent good examples of such efforts. Drawing on Levelt's (1989) model of the speech production process, Skehan (2001, 2009) argues that performing in an imperfectly learned L2 imposes a large burden on the learner's attention, as attending to one area may drain attention from other areas, due to the inherently limited attentional and reasoning resources that humans can invest in solving a task. In contrast to Skehan's limited-attention model, Robinson's Triadic Componential Framework (2001, 2007; Robinson, Cadierno, & Shirai, 2009) specifies that language learners can access multiple attentional pools that do not compete, and depletion of attention in one pool has no effect on the amount remaining in another. This suggests that language learners can prioritize, for example, both accuracy and complexity. Robinson argues that the more demanding a task is in terms of its content, the more complex and accurate its linguistic performance will be. Again, this is somewhat different from Skehan's (2001, 2009) proposal that more complex tasks direct learners' attention to content and divert attention away from form, generating more complex speech at the expense of accuracy and fluency.
One of the ways in which learner linguistic performance has typically been examined by task-based researchers adopting a cognitive approach is to analyse the transcripts of the real performance data for evidence of particular linguistic characteristics or features. These researchers, in particular, tend to explain and evaluate learner language with measures of accuracy, fluency and complexity, which are seen as constituting a learner's language proficiency. According to Robinson, Cadierno, and Shirai (2009), one advantage of using measures of accuracy, fluency and complexity is that they enable comparison of findings for the effects of task demands on learner language production across a wide variety of task conditions, although operational definitions have varied considerably, making comparisons across studies difficult in some instances (Ellis, 2009). For the purpose of this study, drawing on Pallotti (2009) and Housen and Kuiken (2009), accuracy, fluency, and complexity are defined as follows (see Table 1):
Importantly, Skehan (2009) suggests that more needs to be said about the precise ways in which the performance areas (i.e., accuracy, complexity, and fluency) enter into competition, and what influences there are which mediate this competition. Robinson et al. (2009) also highlight that further research into differences in the language learners produce in response to complex L2 task demands is warranted for both theoretical and practical reasons. Clearly, more empirical research needs to be undertaken before we are able to conclude which of the above theoretical models of task-based L2 performance is the most convincing. This article attempts to address the question of how task type impacts on oral linguistic performance, as displayed in its fluency, accuracy, and complexity. It will briefly review the research literature on task type and task-based L2 performance. The current research study, in which the effects of task type on L2 performance were examined, will then be reported, and the implications the findings of the study have for both test task development and classroom task design will be discussed.

2. Literature review

2.1. Defining tasks

In spite of a global interest in task-based pedagogy and the growing body of research work on task characteristics and L2 performance, the notion of task remains a somewhat fuzzy one, although various attempts have been made to define it (Richards, 2006). Richards (2006, p. 32) outlines the main characteristics of a task in the pedagogic context:

It is something learners do and carry out using their existing language resources.
It has an outcome that is not simply linked to learning language, though language acquisition may occur as the learner carries out the task.
It has a focus on meaning.


In the case of tasks involving two or more learners, it calls upon learners' use of communication strategies and interactional skills.

In line with Richards' summary of the key characteristics of a pedagogic task, task in this paper is defined in the same way as in Bachman and Palmer (1996, p. 44): "A language use task is an activity that involves individuals in using language for the purpose of achieving a particular goal or objective in a particular situation". This way of defining tasks allows a range of tasks to be included, not only assessment tasks but also tasks intended specifically for language teaching and learning purposes.
As is the case with task definition, there also seems to be a lack of consensus about task classification in the literature. As far as second language speaking tasks are concerned, Bygate (1987), drawing on Brown and Yule (1983), makes a distinction between factually-oriented talk and evaluative talk. Factually-oriented talk is further divided into four task types: description, narration, instruction, and comparison. Evaluative talk is likewise divided into four task types: explanation, justification, prediction, and decision. There are other taxonomies of tasks based on particular task features, such as one-way vs. two-way tasks. While this feature-classification approach has shed light on current understandings of tasks' impact upon language learning and the nature of classroom interaction, it is in the recent work of Skehan (1998, 2001, 2009) and Robinson (2001, 2007, 2010) that explicit links have been made between particular task type characteristics or conditions and their impact on task difficulty.

2.2. Theoretical perspectives on task-based L2 performance

The main tenet of Skehan's (1998, 2003) theoretical work is that task characteristics and performance conditions may affect task difficulty in complex ways. For example, tasks emphasizing particular types of attentional demand may have a measurable impact not on task difficulty as a whole but on particular aspects of performance. Most recently, Skehan (2009) has canvassed a range of influences that may have an impact on task performance, and proposed links between these influences and predictions of task difficulty. Some of these influences may either complexify the performance, in being more demanding of cognitive resources and requiring more active working memory use, or have the effect of pressuring performance as a result of more effortful and slower access to the information needed. For example, some pressuring influences concern the online pressures that the learner has to cope with, particularly the time pressure under which speaking has to take place. In Skehan's view, monologic tasks simply represent the task type which combines the different pressuring conditions, since they are likely to contain significant quantities of input and have to be done under pressuring conditions.
Skehan (2009) also emphasizes that there are beneficial influences which either ease the task or alternatively focus attention in a particular area. For example, pre-task planning time can help to identify ideas and their inter-relationships, and enable learners to prepare syntactic frames, sentence fragments, or even complete sentences. Skehan thus thinks that the main beneficial influence with interactive tasks is that one has more time to plan and to prepare the ground for the message one will utter while one's interlocutor has the floor. In addition, Skehan thinks that the presence of an interlocutor in interactive tasks makes more salient the need to be accurate and to avoid error.
Clearly, Skehan has established an explicit link between the four categories of influence (complexifying, pressuring, easing, and focusing influences) and learner language performance: complexification links mainly to structural and lexical complexity, whereas pressuring, easing, and focusing are more relevant for accuracy and fluency. Overall, in Skehan's view, it is task characteristics and task conditions, in particular combinations, which predict accuracy–complexity correlations when they occur. This enables him to conclude that interactive tasks advantage accuracy and complexity, but disadvantage fluency. This hypothesis is somewhat different from Robinson's taxonomy of task characteristics, which focuses on task complexity affecting the accuracy–complexity relationship. In his model of the three superordinate categories of task characteristics (2001, 2007), Robinson has proposed the independence of the dimensions of complexity and difficulty, with complexity being a feature of the task, and difficulty operationalized in terms of perceptions of task difficulty on the part of learners. Robinson thus thinks that variation in the quality of language produced in a task is a function of task complexity, and that task complexity should thus be the sole basis of prospective task classification, design, and sequencing decisions. While Robinson's Triadic Componential Framework (see Robinson et al., 2009) does not make clear predictions on the effects of interactivity on particular aspects of L2 performance, Robinson does state that interactivity and task complexity will have a combined effect on the linguistic complexity of L2 performance, in that complex interactive tasks affect the linguistic complexity of L2 performance negatively because they trigger structurally and lexically less complex speech. Robinson further argues that the beneficial influence of increased task complexity on accuracy, however, is not supposed to be affected differently in complex dialogic tasks, although fluency is expected to decrease.

2.3. Empirical studies on task type and L2 task performance

Since the present study focuses on two task types, i.e., interactive group discussion and monologic individual presentation,
I next present a brief review of a number of empirical studies that compared interactive and monologic task performance.
Importantly, these studies provide methodological insights into the issues relevant to L2 task performance which the present

study aims to address. First, Foster and Skehan (1996) examined a personal information exchange task, a narrative task and a decision-making task in terms of their impact on the accuracy, fluency, and complexity of learners' language performance. The participants in their study were 32 pre-intermediate-level students studying English as a foreign language at college level. Foster and Skehan found that the personal task generated less complexity than the narrative and decision-making tasks, although the personal task produced the greatest amount of fluency. In light of this result, they proposed that interactive tasks tend to be associated with greater accuracy and complexity, but lower fluency. Skehan and Foster (1997) further examined the effects of types of tasks, as well as different task implementation conditions, on the fluency, accuracy and complexity of the learner language produced. The three tasks chosen for this study were similar in type to the tasks used in Foster and Skehan (1996). They found that the decision task under planning conditions produced the highest complexity scores. The results of these two studies became the basis of Skehan's trade-off hypothesis that fluency, accuracy and complexity seem to enter into competition with one another, given the limited attentional capacities of second language users. However, Skehan and his colleagues' observation that more interactive tasks lead to more complex language performance did not find support in Michel, Kuiken, and Vedder's (2007) study, which found that the dialogic (i.e., interactive) task tended to elicit shorter and structurally simpler sentences than the monologic narrative task. In other words, Michel et al.'s study suggests that interactivity may affect structural complexity negatively. But Michel et al. also found that students made significantly fewer errors and were significantly more fluent in the dialogic task condition. Finally, Bygate (1999) investigated the grammatical complexity of the language of Hungarian secondary school learners of English aged 15–16 on two types of unscripted oral communication task: a narrative task and an argumentation task. Bygate's findings suggested that the narrative tasks might stretch the speakers more in terms of complexity of syntactic and lexical processing. In addition, Bygate's study revealed that interpersonal content, represented by the presence or absence of a requirement to interact with an interlocutor, might influence the language content in a given task.
Overall, the above empirical studies that compared interactive and non-interactive task performance in pedagogic contexts apparently yielded mixed results. Similarly, a number of empirical studies (e.g., Elder & Iwashita, 2005; Iwashita, McNamara, & Elder, 2001) that replicated Skehan's or Robinson's framework in experimental language testing settings have also yielded mixed results. Given this, any claims concerning task characteristics and learner performance should be seen as tentative, and need further empirical confirmation. This study aims at examining how learner oral performance may vary across two different task types in the current school-based assessment being implemented across secondary schools in Hong Kong. The study is innovative in that the tasks in this study involve speaking in a real high-stakes language assessment context, although they also build on a regular reading and viewing programme integrated into the school curriculum. Given the scarcity of task-based research in authentic assessment contexts, this study adds to our understanding of the impact of task characteristics on task performance. Specifically, the study focuses on the following research questions:

Is L2 performance in individual presentation tasks more accurate and fluent, and syntactically more complex, than that in group discussion tasks in the current school-based assessment in Hong Kong?
How may L2 performance as reflected in rater assessments vary across the two different task types in the current school-based assessment in Hong Kong?

3. This study

3.1. Setting

The Hong Kong Examinations and Assessment Authority (HKEAA) has recently introduced a school-based oral assessment component into the senior secondary English language curriculum. This assessment component, worth 15% of the total Hong Kong Certificate of Education Examination (HKCEE) English mark, involves the assessment of English oral language skills based on topics and texts drawn from a programme of independent extensive reading/viewing (Davison, 2007). The assessment is currently used with Cantonese-speaking ESL Form 4 and 5 (that is, Grade 9 and 10) students in secondary schools. This in-class performance assessment of students' English oral language skills is carried out by the class teachers, who have been well trained through the professional development support programme offered by the HKEAA and are involved at all stages of the assessment cycle, from planning the assessment programme, to identifying and/or developing appropriate assessment tasks, right through to making the final judgments (Davison & Leung, 2009). Specifically, the assessment involves two major types of assessment task, individual presentation and group oral discussion, which are defined by the HKEAA (2010, p. 8) as follows:

An individual presentation, which may be quite informal, is defined as a single piece of oral text in which an individual speaker presents some ideas or information over a sustained period, with the expectation that they will not be interrupted. An individual presentation requires comparatively long turns, hence a more explicit structure to ensure coherence.
A group interaction is defined as an exchange of turns or dialogue with more than one speaker on a common topic. An interaction is jointly constructed by two or more speakers, hence generally needs less explicit structuring but more attention to turn-taking skills and more planning of how to initiate, maintain and/or control the interaction by making suggestions, asking for clarification, supporting and/or developing each other's views, disagreeing and/or offering alternatives.

3.2. Participants and tasks

The study reported here draws on video data of one class of 30 ESL Cantonese-mother-tongue Form 4 students1 (aged 14–16) carrying out the two kinds of school-based assessment tasks: individual presentation and group discussion. In the individual presentation task, each participant was trying to promote a documentary film they had viewed earlier in their reading/viewing programme embedded in the regular English language curriculum. This film was about former United States Vice President Al Gore's campaign to educate citizens about global warming. The students were expected to talk about the purpose of the documentary and what they had learned from it. They were also expected to explore how the issue featured in the documentary related to their real life. Each individual presentation lasted about 3–5 minutes. For the group discussion task, the participants had a discussion about a book called Charlie and the Chocolate Factory in groups of three or four. This book was also one of the required readings in their reading/viewing programme. The group discussion task also aimed at promoting the book among young children. Within each group, the students were expected to discuss the personalities of the main characters in the book and highlight some of the major events the characters took part in, and the way that the characters handled crises in the story. Each group discussion lasted about 7–9 minutes. For each of the assessment tasks, the participants were told only about ten minutes before the assessment took place what their topic for presentation or discussion was. The teacher assessor, who had received rater training organized by the HKEAA before undertaking assessment, sat nearby and assessed each participant with a scoring sheet, following the assessment criteria for both group discussion and individual presentation, which cover six levels of oral English proficiency in the four major domains of English language performance. These four domains are: Pronunciation & Delivery; Communication Strategies; Vocabulary & Language Patterns; and Ideas & Organization (see Davison, 2007).
The video-taped individual presentations and group interactions were transcribed following conversation analysis conventions (Atkinson & Heritage, 1984) by a research assistant who was a bilingual speaker of Cantonese and English. The transcripts of both the individual presentations and group interactions were later checked by the author. Given the enormous amount of time involved in coding the data, the coding work was undertaken by both the author and the research assistant. Specifically, the data were coded by the author for production units like T-units, clauses, verb phrases, and words, and further coded by the research assistant for production units like syllables, silent pauses, filled pauses, repetitions, reformulations, and utterances. In the course of data coding, a sample of six transcripts coded by the author was checked by another researcher, and the author checked six transcripts coded by the research assistant. Discrepancies that arose were resolved by discussion until agreement was reached.2

3.3. Linguistic measures

Given a general consensus in the literature that gauging the three traits of accuracy, fluency, and complexity in the language production of learners is a good starting point for describing task-based linguistic performance and its multidimensionality (Norris & Ortega, 2009; Pallotti, 2009), this study attempts to apply measures of accuracy, fluency, and complexity to an L2 oral assessment context in order to account for how task type may impact on learner linguistic performance.

3.3.1. Accuracy
In previous research (e.g., Foster & Skehan, 1996; Iwashita, Brown, McNamara, & O'Hagan, 2008; Skehan & Foster, 1999), accuracy was measured by the percentage of error-free clauses. This general measure of accuracy has often been described as having the advantage of being potentially the most comprehensive, in that all errors are considered. Given this advantage, accuracy was measured by an index of error-free clauses in this study. Following Tavakoli and Skehan (2005), an error-free clause was defined as a clause in which no error was seen in grammar, lexis, or word order.
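Once clauses have been hand-coded, the error-free-clause index is a simple proportion. The sketch below is an illustration only, not the study's actual analysis script; the coded sample is hypothetical.

```python
def accuracy_index(clauses):
    """Proportion of clauses coded as error-free, where error-free means
    no error of grammar, lexis, or word order (Tavakoli & Skehan, 2005).
    `clauses` is a list of booleans produced by hand-coding."""
    if not clauses:
        raise ValueError("no clauses coded")
    return sum(clauses) / len(clauses)

# Hypothetical sample: 6 of 10 clauses coded error-free.
coded = [True, True, False, True, False, True, True, False, True, False]
print(accuracy_index(coded))  # 0.6
```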

3.3.2. Fluency
Following Kormos and Dénes (2004), Ellis and Yuan (2005), and Mehnert (1998), the following five features were identified as the measures of fluency for this study: (1) Speech rate A; (2) Speech rate B; (3) the number of silent pauses per minute; (4) the number of filled pauses per minute; (5) the number of repetitions and reformulations per minute. Speech rate A was calculated by dividing the number of all syllables uttered in a given speech sample by the amount of time (expressed in seconds) required to produce the speech sample, including pause time. The outcome was then multiplied by sixty to give a figure expressed in syllables per minute. Speech rate B was calculated the same way as Speech rate A, but all syllables, words, or clauses that were repeated or reformulated were excluded from the count. In analysing pauses, only pauses over 0.2 s were considered. To calculate the number of silent pauses per minute, the total number of silent pauses was divided by the total amount of time spent speaking, expressed in seconds; the outcome was then multiplied by sixty. To calculate the number of filled pauses per minute, the total number of filled pauses such as "uhm", "er" and "mm" was divided by the total amount of time, expressed in seconds; the outcome was then multiplied by sixty. Finally, to calculate the number of repetitions and reformulations per minute, the total number of repetitions and reformulations was divided by the total amount of time, expressed in seconds; the outcome was then multiplied by sixty.

1 The participants in this study were all male students.
2 The changes in the way data were coded for these instances were carried over to the other transcripts, i.e., the discussion led to altering other transcripts that were not cross-checked.

Table 2
Comparison of accuracy measure on the two tasks: one-way repeated measures ANOVA.

                     Discussion              Presentation
                     N    M      SD          N    M      SD        F       p       eta
Error-free clauses   30   0.49   0.10        30   0.67   0.12      65.40   0.000   0.69

Notes: p < 0.0045. Effect size (eta): marginal (<0.2); small (>0.2 to <0.5); medium (>0.5 to <0.8); large (>0.8).
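The fluency indices in Section 3.3.2 are all simple per-minute rates over a timed speech sample. As an illustration only (the figures below are made up, and this is not the study's own script), they can be computed as:

```python
def per_minute(count, total_seconds):
    """Convert a raw count (syllables, pauses, repetitions) over a timed
    speech sample into a per-minute rate."""
    return count * 60 / total_seconds

def speech_rate_a(total_syllables, total_seconds):
    """Speech rate A: all syllables per minute, pause time included."""
    return per_minute(total_syllables, total_seconds)

def speech_rate_b(total_syllables, repeated_syllables, total_seconds):
    """Speech rate B: as A, but syllables that were repeated or
    reformulated are excluded from the count."""
    return per_minute(total_syllables - repeated_syllables, total_seconds)

# Hypothetical 3-minute sample: 450 syllables, 30 of them repeated or
# reformulated, 9 silent pauses (over 0.2 s) and 6 filled pauses.
secs = 180
print(speech_rate_a(450, secs))      # 150.0 syllables per minute
print(speech_rate_b(450, 30, secs))  # 140.0
print(per_minute(9, secs))           # 3.0 silent pauses per minute
print(per_minute(6, secs))           # 2.0 filled pauses per minute
```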

3.3.3. Complexity
Following Iwashita (2006) and Iwashita et al. (2008), the following five measures of grammatical complexity were used in this study. (1) Length of T-units in terms of number of words. For this study, a T-unit is defined as "a nuclear sentence with its embedded or related adjuncts" (Harrington, 1986, p. 53). The mean length of T-unit was derived by dividing the total number of words by the total number of T-units. (2) The number of clauses per T-unit, i.e., the T-unit complexity ratio. It is assumed that the more clauses per T-unit, the more complex the speech. This measure was calculated by dividing the total number of clauses by the total number of T-units. (3) The ratio of dependent clauses to the total number of clauses, i.e., the dependent clause ratio. This measure reflects the degree of embedding. It was calculated by dividing the number of dependent clauses by the total number of clauses. In this study, two types of clauses were identified: independent and dependent clauses. Following Iwashita (2006, p. 158), an independent clause is operationalized as the main clause of a complex sentence or a simple sentence, with or without subject, complete or incomplete, target-like or nontarget-like; a dependent clause is operationalized as a unified predicate (i.e., containing a finite verb, a predicate adjective, or a nontarget-like predication in which the verb or part of the verb phrase is missing) that is embedded in or dependent on a main matrix clause. (4) The number of verb phrases per T-unit, i.e., the verb-phrase ratio. This measure was calculated by dividing the total number of verb phrases by the total number of T-units. (5) The mean length of utterance, i.e., MLU. According to Dewaele and Furnham (2000), mean length of utterance can well reflect learners' capacity to build complex structures in their interlanguage. For this study, an utterance was defined as a stream of speech constituting a single semantic unit (Crookes & Rulon, 1985, cited in Foster, Tonkyn, & Wigglesworth, 2000, p. 359). Following Dewaele and Furnham (2000), MLU in this study was derived by calculating the mean number of words of the three longest utterances produced by each speaker in each task.
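Each of the five complexity indices is a ratio over hand-coded production units. A minimal sketch under hypothetical counts (the function and its sample values are illustrative, not the study's actual code):

```python
def complexity_measures(words, t_units, clauses, dependent_clauses,
                        verb_phrases, longest_three_utterances):
    """Return the five grammatical-complexity indices described in the
    text, given hand-coded counts for one speaker on one task.
    `longest_three_utterances` holds the word counts of that speaker's
    three longest utterances."""
    return {
        "mean_length_of_t_unit": words / t_units,
        "clauses_per_t_unit": clauses / t_units,
        "dependent_clause_ratio": dependent_clauses / clauses,
        "verb_phrases_per_t_unit": verb_phrases / t_units,
        "mlu": sum(longest_three_utterances) / len(longest_three_utterances),
    }

# Hypothetical coded sample for one speaker on one task
m = complexity_measures(words=260, t_units=20, clauses=32,
                        dependent_clauses=10, verb_phrases=24,
                        longest_three_utterances=[28, 24, 20])
print(m["mean_length_of_t_unit"])   # 13.0
print(m["dependent_clause_ratio"])  # 0.3125
print(m["mlu"])                     # 24.0
```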

3.4. Rater assessments

Rater assessments on the four domains of language performance across the two task types, i.e., Pronunciation & Delivery;
Communication Strategies; Vocabulary & Language Patterns; and Ideas & Organization, were collected from the teacher
assessor mentioned above.

3.5. Data analysis

One-way repeated measures ANOVA was used to analyse the potential impact of task type on the students' language performance. To reduce the chance of a Type 1 error, a Bonferroni adjustment was applied. For the 11 linguistic measures, the alpha level was set at 0.05/11, i.e., 0.0045. For the 4 domains of rater assessment, the alpha level was set at 0.05/4, i.e., 0.012.
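With only two repeated conditions (discussion vs. presentation), a one-way repeated measures ANOVA is equivalent to a paired t-test, with F = t². The sketch below illustrates this equivalence and the Bonferroni adjustment; it uses only the standard library and made-up scores, and is not the study's analysis script.

```python
import math

def bonferroni_alpha(family_alpha, n_tests):
    """Per-test alpha level under a Bonferroni adjustment."""
    return family_alpha / n_tests

def rm_anova_f_two_conditions(a, b):
    """F statistic of a one-way repeated measures ANOVA with two
    conditions, computed via the equivalent paired t-test (F = t^2).
    `a` and `b` hold one score per participant, in the same order."""
    n = len(a)
    diffs = [x - y for x, y in zip(a, b)]
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    t = mean_d / math.sqrt(var_d / n)
    return t * t  # F with df = (1, n - 1)

print(round(bonferroni_alpha(0.05, 11), 4))  # 0.0045
print(round(bonferroni_alpha(0.05, 4), 4))   # 0.0125 (reported as 0.012)
print(round(rm_anova_f_two_conditions([1.0, 2.0, 3.0],
                                      [0.0, 1.0, 1.0]), 6))  # 16.0
```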

Table 3
Comparison of fluency measures on the two tasks: one-way repeated measures ANOVA.

                                                      Discussion               Presentation
                                                      N    M        SD         N    M        SD       F       p       eta
Speech rate A                                         30   158.12   27.31      30   143.42   22.98    12.36   0.001   0.29
Speech rate B                                         30   141.18   28.31      30   134.51   21.80    1.87    0.183   0.06
Number of filled pauses per minute                    30   6.26     3.86       30   1.90     2.09     35.34   0.000   0.55
Number of silent pauses per minute                    30   6.80     2.62       30   2.13     1.99     61.98   0.000   0.68
Number of repetitions and reformulations per minute   30   4.18     2.45       30   2.76     2.24     5.93    0.021   0.17

Notes: p < 0.0045. Effect size (eta): marginal (<0.2); small (>0.2 to <0.5); medium (>0.5 to <0.8); large (>0.8).

Table 4
Comparison of complexity measures on the two tasks: one-way repeated measures ANOVA.

                              Discussion            Presentation
Measure                    N    M      SD       N    M      SD       F      p      eta
Length of T-unit           30   10.67  2.14     30   13.08  2.14     22.65  0.000  0.44
Clauses per T-unit         30    1.60  0.24     30    1.56  0.25      0.45  0.509  0.015
Dependent clause ratio     30    0.34  0.08     30    0.31  0.08      3.68  0.065  0.11
Verb phrase ratio          30    0.29  0.17     30    0.61  0.22     43.91  0.000  0.60
Mean length of utterance   30   20.00  5.15     30   26.44  5.88     21.92  0.000  0.43

Notes: p < 0.0045. Effect size (eta): marginal (<0.2); small (0.2 to 0.5); medium (0.5 to 0.8); large (>0.8).

4. Results

The results from the repeated measures ANOVA are presented in Tables 2–4, with the F values, significance levels, means for each
measure, standard deviations, and effect sizes. With regard to accuracy effects, as can be seen in Table 2, a one-way within-
subjects analysis of variance yields an F value of 65.40 and a p value of 0.000, with a medium effect size (eta = 0.69). This means
the presentation task yielded significantly greater accuracy than the discussion task. This result seems opposed to Skehan's
(2001) observation that dialogic tasks tend to generate greater accuracy than monologic tasks.
There also seems to be supporting evidence for greater fluency being associated with the presentation task in this study.
The comparison of the fluency measures across the two tasks shows that the discussion task generated lower
fluency than the presentation task, with the discussion task eliciting a significantly higher number of filled pauses per minute
(F = 35.34, p = 0.000, eta = 0.55) and a significantly higher number of silent pauses per minute (F = 61.98, p = 0.000, eta = 0.68). There
was also a higher number of repetitions and reformulations per minute in the discussion task than in the presentation
task (F = 5.93, p = 0.021, eta = 0.17), although this difference did not reach the adjusted significance level. The trend thus seems to be in
the direction of lower fluency being associated with the discussion task. On the other hand, however, the results are not
completely clear-cut, in that the discussion task produced a significantly higher speech rate A (F = 12.36, p = 0.001, eta = 0.29),
with a small effect size, although when speech rate B was calculated in such a way that all syllables, words, or clauses
that were repeated or reformulated were excluded from the count, it generated no significant difference (F = 1.87, p = 0.183,
eta = 0.06), with a marginal effect size.
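The contrast between the two rate measures can be illustrated with a small sketch. The angle-bracket convention for marking repeated or reformulated material, and the use of words rather than syllables as the counting unit, are simplifying assumptions for illustration, not the study's actual coding scheme:

```python
import re

def speech_rates(transcript, seconds):
    """Return (rate A, rate B) in words per minute.  Rate A counts all
    words; rate B excludes material marked <like this> as repeated or
    reformulated.  The markup convention is illustrative only."""
    all_words = re.sub(r"[<>]", " ", transcript).split()
    pruned_words = re.sub(r"<[^>]*>", " ", transcript).split()
    minutes = seconds / 60
    return len(all_words) / minutes, len(pruned_words) / minutes

# Invented six-second stretch with one repetition and one reformulation
rate_a, rate_b = speech_rates("I <I> think <think that> we should go", 6)
print(rate_a, rate_b)  # rate A counts 8 words, rate B only 5
```

On this toy stretch the gap between the two rates comes entirely from the pruned material, which mirrors why the two measures can diverge across tasks.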
We now turn to the complexity results (see Table 4 above). Three of the five complexity measures (length of T-unit,
verb phrase ratio, and mean length of utterance) elicited significant differences across the two tasks, with verb phrase ratio
generating the highest F value, with a medium effect size (F = 43.91, p = 0.000, eta = 0.60). This means that the presentation
task produced significantly longer T-units and utterances as well as significantly greater use of verb phrases. Although
the discussion task scored higher on clauses per T-unit (F = 0.45, p = 0.509, eta = 0.015) and dependent clause ratio (F = 3.68,
p = 0.065, eta = 0.11) than the presentation task, these differences did not reach significance level. It is assumed that the larger
the proportion of dependent clauses to the total number of clauses, the greater the degree of subordination and consequently
the more complex the oral production. While the discussion task resulted in significantly lower mean scores in verb phrase
ratio, as well as significantly shorter T-units and utterances, it generated slightly higher mean scores on clauses per T-unit
and dependent clause ratio. In other words, the students seemed to have produced more words in their T-units in the
presentation task, but this does not mean that they produced syntactically more complex T-units. To search for an answer to
this issue, the author carried out a more finely grained analysis of finite verbs across the two task types to account
for the difference in subordination between the two tasks. It was discovered that, as in Bygate's (1999) study,
there was greater use of the expression "I think" in the discussion task than in the presentation task in the current
study, which would automatically affect the occurrence of subordinate clauses, as suggested by Bygate. Consequently, the
slightly higher mean scores on clauses per T-unit and dependent clause ratio for the discussion task in this study could be
the result of the occurrences of the formulaic expression "I think".
Table 5
Comparison of rater assessments on the two tasks: one-way repeated measures ANOVA.

                                    Discussion          Presentation
                                  N    M      SD      N    M      SD      F      p      eta
Global score                      30   13.03  2.91    30   14.20  3.56    6.88   0.014  0.19

Domain scores
Pronunciation and delivery        30    3.13  0.90    30    3.63  1.07   10.12   0.003  0.26
Communication strategies          30    3.00  0.87    30    3.30  1.21    2.16   0.153  0.07
Vocabulary and language patterns  30    3.13  0.86    30    3.50  0.97    5.58   0.03   0.16
Ideas and organization            30    3.77  0.82    30    3.77  0.97    0.00   1.000  0.00

Notes: p < 0.012. Effect size (eta): marginal (<0.2); small (0.2 to 0.5); medium (0.5 to 0.8); large (>0.8).

The assessment criteria for both group discussion and individual presentation cover six levels of oral English proficiency
in the four major domains of English language performance (see Davison, 2007): Pronunciation & Delivery; Communication
Strategies; Vocabulary & Language Patterns; and Ideas & Organization. Each participant in this study thus received a separate
score for each of the four domains of assessment criteria, as well as a global score resulting from the aggregation of the
domain scores, in each of the two assessment tasks. Table 5 above lists the statistics on the differences in global and domain
scores assigned to the students by the teacher assessor across the two tasks. Evidently, in terms of the global score, the
students scored significantly higher in the presentation task (F = 6.88, p = 0.014, eta = 0.19). In terms of the domain scores,
Pronunciation & Delivery (F = 10.12, p = 0.003, eta = 0.26) showed a significant difference, with the presentation task
generating higher scores than the group discussion task. Communication Strategies and Vocabulary & Language Patterns
scores were higher in the presentation task than in the group discussion task, but these differences did not attain statistical
significance (p = 0.153; p = 0.03). Interestingly, Ideas & Organization obtained exactly the same mean scores across the two
task types. Overall, a trend towards higher assessment scores on most of the assessment domains in the presentation task
can thus be seen here.

5. Discussion

This study examined the quality of language secondary school ESL students produced on two types of oral assessment
task: a group discussion task and an individual presentation task. Its main focus is the relationship between task type and
linguistic accuracy, fluency and complexity. Before discussing the linkage of the results to current theoretical hypotheses
relating to L2 performance in the SLA literature, I will summarize them briefly. First, there was a significant difference
across the two tasks in students' linguistic accuracy as measured by error-free clause ratio, with the group discussion task yielding
less accuracy. The results also show that differences across the two tasks were significant on three measures of linguistic
complexity, i.e., length of T-unit, verb phrase ratio, and mean length of utterance, suggesting a trend in the direction of the discussion
task being associated with less complex language and the monologic task leading to longer and more sophisticated stretches of
discourse. This finding appears to fit in with our knowledge of discourse, i.e., the ways in which human beings use language
in situations of communication (e.g., monologic and dialogic speech), discussed in linguistic research such as Brown and Yule
(1983) and, more recently, Biber (2006) and Halliday and Matthiessen (2004). Significant differences were also seen in two
fluency measures in this study, i.e., filled pauses and silent pauses, across the two task types, again suggesting a trend in the
performance of the discussion task towards lower fluency. There was also a higher number of repetitions and reformulations
per minute in the discussion task than in the presentation task, although this difference did not reach significance level. It
can thus be postulated that there was a general trend towards the presentation task being associated with greater accuracy,
higher fluency and more complex language. In addition, differences on rater assessments appeared in the same direction
across the two tasks as the differences on most of the linguistic measures.
The most straightforward effect is that of task type on accuracy, as evidenced by the result that the students' language
output on the discussion task was significantly less accurate than on the individual presentation task. It could be that,
as Swain and Lapkin (2001) observe, in the heat of communication, students in the interactive task might be so concerned
with making themselves understood that they did not have much chance to pay attention to language forms. Krashen
(1988) also suggests that in the case of improvised interactive talk, L2 learners tend to rely on their "feel" for correctness
in conversation, which may lead to inaccurate output. Consequently, the phenomenon of interactive conversational tasks
leading to greater attention to form as a result of greater amounts of language-oriented negotiation of meaning documented
in the SLA literature (e.g., Long, 1996; Pica, 1994) did not materialize in the present speaking assessment context. It could
be that the learners in this study were a homogeneous group in terms of proficiency level and L1 background, and shared
similar L2 speech habits, which facilitates comprehensibility and reduces the chances of communication breakdowns and,
hence, the opportunities for negotiating meaning.
The results of this study also suggest a trend in the direction of less complex language and lower fluency being associated
with the group discussion task. There are two possible interpretations of this result. First, from an information-processing
perspective, the group discussion task could make greater demands on attentional resources than the individual presentation
task, as each student in the group discussion task had to be on the alert, listening to, processing and responding to his or her
partners' contributions, or initiating a new turn. Meanwhile, throughout the discussion task, while the students in each group
pursued, developed, and shifted topics to display individual contributions, they also had to constantly monitor the content
of talk for relevance to the assessment task agenda to ensure the successful completion of the assigned task as a group. The
group discussion task in this study was thus likely to be more demanding of attention and to impose a greater
mental workload on the participants than the individual presentation task, and hence it appeared to push them
towards less complex syntactic processing as well as less fluency.
From an interpersonal perspective, the type of interaction built into the discussion task entailed the students co-
constructing their performance. In this sense, the oral output on the group discussion task was relatively co-constructed,
with competitively distributed responsibility among interlocutors for the initiation of topics-for-talk and the creation of sequen-
tial coherence, identities, and events (Jacoby & Ochs, 1995; Lynch & Maclean, 2001). In Skehan's (1998) discussion of task
difficulty features, components of communicative stress are explicitly described as a combination of real-time processing
pressure for improvised talk and the extent to which individuals can control or influence the task. Even under non-high-
stakes testing conditions, speaking in a group setting could be a challenge for those who are normally quiet or shy
and not good at commanding attention or holding their own in social situations. Seen in this light, the presence of
collaborative dialogic discourse in the discussion task in this study could result in a certain level of communicative pressure

and anxiety on the part of the speakers, which could negatively impact the students' linguistic performance, as suggested
in the literature on L2 anxiety (e.g., MacIntyre, Noels, & Clément, 1997). On the other hand, the individual presentation in this
study might remove the communicative pressure of an interactive task, thus freeing up attentional resources for the learners
and making it possible for them to use the upper limits of their interlanguage systems and try complicated lan-
guage. This could further help to explain why such a monologic task might lead to longer and more fluent stretches
of discourse characterized by relatively more sophisticated structure and potentially higher levels of accuracy.

6. Conclusion

This study focused on the influence of task type on learners' oral linguistic performance by attempting to apply to a
school-based assessment context the insights from task-based research carried out by SLA researchers. The results of this
study showed generally systematic variation in performance dimensions across the two task types, suggesting a trend in
the direction of less accuracy, lower fluency and less complex language being associated with the group discussion task.
These findings did not quite support Skehan's (2001) categorization of interactive tasks as producing greater L2 complexity
and accuracy than non-interactive tasks. This discrepancy may well have to do with the fact that Skehan's studies
were conducted in a pedagogical context, whereas the current study was conducted in a high-stakes assessment situation.
Such contextual differences might have some bearing on learners' attitude, anxiety, motivation and commitment to doing
their best on the task in question, which could produce different levels of stress and engagement during task performance
and in turn might affect candidates' willingness to lift their game or to strive for enhanced performance. These differences
in the intensity of learners' affective reactions to tasks performed under different conditions are clearly
worth further empirical study. An important priority for future research is how learners' attitudes, anxiety and motivation
in the assessment context may influence task performance.
The findings of this study confirm that different tasks may set different linguistic demands and result in different types
of language processing, and hence in the production of different linguistic features. The findings thus add to our understanding of
variation in learner L2 performance by specific task types in a non-native L2 context. The results have implications for test
task design and development, as they suggest a need for assessment procedures to employ different elicitation tasks and to
gather samples of speech from a variety of contexts to ensure that the full range of candidates' ability will be tapped. In addition,
a greater understanding of the impact of task type on learner performance could assist L2 testers in providing evidence in
support of, or questioning, the interpretations about learners' abilities made on the basis of test scores on different language
elicitation tasks. Pedagogically, a greater understanding of how specific task types can affect the way learners process the
target language will assist teachers in developing learners' capacity to align or realign their resources against actual task
demands and in identifying effective ways in which different task types can be used to increase learners' repertoire of specific
language features and hence to promote learners' L2 development.

Acknowledgements

The author wishes to thank the Linguistics and Education editors and the anonymous reviewers for their very helpful
comments on an earlier version of this article. I am also grateful to Elizabeth Walker and Paul Stapleton who read the earlier
draft and offered me advice and encouragement.

References

Atkinson, J. M., & Heritage, J. (Eds.). (1984). Structures of social action: Studies in conversation analysis. Cambridge, UK: Cambridge University Press.
Bachman, L. F. (2002). Some reflections on task-based language performance assessment. Language Testing, 19(4), 453–476.
Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice. Oxford: Oxford University Press.
Biber, D. (2006). University language: A corpus-based study of spoken and written registers. Amsterdam: John Benjamins.
Brown, G., & Yule, G. (1983). Teaching the spoken language. Cambridge, UK: Cambridge University Press.
Bygate, M. (1987). Speaking. Oxford: Oxford University Press.
Bygate, M. (1999). Quality of language and purpose of task: Patterns of learners' language on two oral communication tasks. Language Teaching Research,
3(2), 185–214.
Crookes, G. V., & Rulon, K. (1985). Incorporation of corrective feedback in Native Speaker/Non-Native Speaker conversation. Honolulu: Centre for Second
Language Research, Social Sciences Research Institute, University of Hawaii (Technical Report No. 3).
Davison, C. (2007). Views from the chalk face: School-based assessment in Hong Kong. Language Assessment Quarterly, 4(1), 37–68.
Davison, C., & Leung, C. (2009). Current issues in English language teacher-based assessment. TESOL Quarterly, 43(3), 393–415.
Dewaele, J.-M., & Furnham, A. (2000). Personality and speech production: A pilot study of second language learners. Personality and Individual Differences,
28, 355–365.
Duff, P. (1993). Tasks and interlanguage performance: An SLA perspective. In G. Crookes, & S. Gass (Eds.), Tasks and language learning: Integrating theory and
practice. Clevedon, Avon: Multilingual Matters.
Elder, C., & Iwashita, N. (2005). Planning for test performance: Does it make a difference? In R. Ellis (Ed.), Planning and task performance in a second language
(pp. 219–238). Amsterdam: John Benjamins.
Elder, C., Iwashita, N., & McNamara, T. (2002). Estimating the difficulty of oral proficiency tasks: What does the test-taker have to offer? Language Testing,
19(3), 347–368.
Ellis, R. (2009). The differential effects of three types of task planning on the fluency, complexity, and accuracy in L2 oral production. Applied Linguistics, 30,
474–509.
Ellis, R., & Yuan, F. (2005). The effects of careful within-task planning on oral and written task performance. In R. Ellis (Ed.), Planning and task performance
in a second language (pp. 193–218). Amsterdam: John Benjamins.

Foster, P., & Skehan, P. (1996). The influence of planning time and task type on second language performance. Studies in Second Language Acquisition, 18,
299–323.
Foster, P., Tonkyn, A., & Wigglesworth, G. (2000). Measuring spoken language: A unit for all reasons. Applied Linguistics, 21(3), 354–374.
Fulcher, G., & Marquez-Reiter, R. (2003). Task difficulty in speaking tests. Language Testing, 20(3), 321–344.
Halliday, M., & Matthiessen, C. (2004). An introduction to functional grammar (3rd ed.). London: Arnold Press.
Harrington, M. (1986). The T-unit as a measure of JSL oral proficiency. Descriptive and Applied Linguistics, 19, 49–56.
Hong Kong Examinations and Assessment Authority. (2010). English Language School-based Assessment Teachers' Handbook.
Housen, A., & Kuiken, F. (2009). Complexity, accuracy, and fluency in second language acquisition. Applied Linguistics, 30(4), 461–473.
Iwashita, N. (2006). Syntactic complexity measures and their relation to oral proficiency in Japanese as a foreign language. Language Assessment Quarterly,
3(2), 151–169.
Iwashita, N., Brown, A., McNamara, T., & O'Hagan, S. (2008). Assessed levels of second language speaking proficiency: How distinct? Applied Linguistics,
29(1), 24–49.
Iwashita, N., McNamara, T., & Elder, C. (2001). Can we predict task difficulty in an oral proficiency test? Exploring the potential of an information-processing
approach to task design. Language Learning, 51(3), 401–436.
Jacoby, S., & Ochs, E. (1995). Co-construction: An introduction. Research on Language and Social Interaction, 28(3), 171–183.
Kormos, J., & Dénes, M. (2004). Exploring measures and perceptions of fluency in the speech of second language learners. System, 32(2), 145–164.
Krashen, S. (1988). Second language acquisition and second language learning. New Jersey: Prentice-Hall International.
Levelt, W. (1989). Speaking: From intention to articulation. Cambridge: MIT Press.
Long, M. (1996). The role of linguistic environment in second language acquisition. In W. C. Ritchie, & T. K. Bhatia (Eds.), Handbook of second language
acquisition (pp. 413–468). San Diego, CA: Academic Press.
Lynch, T., & Maclean, J. (2001). A case of exercising: Effects of immediate task repetition on learners' performance. In M. Bygate, P. Skehan, & M. Swain
(Eds.), Researching pedagogic tasks: Second language learning, teaching, and testing (pp. 141–162). Harlow: Longman.
MacIntyre, P. D., Noels, K. A., & Clément, R. (1997). Biases in self-ratings of second language proficiency: The role of language anxiety. Language Learning,
47(2), 265–287.
Mehnert, U. (1998). The effects of different lengths of planning time on second language performance. Studies in Second Language Acquisition, 20, 83–108.
Michel, M. C., Kuiken, F., & Vedder, I. (2007). The influence of complexity in monologic versus dialogic tasks in Dutch L2. International Review of Applied
Linguistics in Language Teaching, 45(2), 241–259.
Norris, J., Brown, J., Hudson, T., & Bonk, W. (2002). Examinee abilities and task difficulty in task-based second language performance assessment. Language
Testing, 19(3), 395–418.
Norris, J., & Ortega, L. (2009). Towards an organic approach to investigating CAF in instructed SLA: The case of complexity. Applied Linguistics,
30(4), 555–578.
Pallotti, G. (2009). CAF: Defining, refining and differentiating constructs. Applied Linguistics, 30(4), 590–601.
Pica, T. (1994). Research on negotiation: What does it reveal about second-language learning conditions, processes, and outcomes? Language Learning,
44(4), 493–527.
Richards, J. C. (2006). Communicative language teaching today. Cambridge: Cambridge University Press.
Robinson, P. (2001). Task complexity, task difficulty and task production: Exploring interactions in a componential framework. Applied Linguistics, 21(1),
27–57.
Robinson, P. (2005). Cognitive complexity and task sequencing: Studies in a componential framework for second language task design. International Review
of Applied Linguistics, 43(1), 1–32.
Robinson, P. (2007). Task complexity, theory of mind, and intentional reasoning: Effects on L2 speech production, interaction, uptake, and perceptions of
task difficulty. International Review of Applied Linguistics, 45(3), 193–213.
Robinson, P., Cadierno, T., & Shirai, Y. (2009). Time and motion: Measuring the effects of the conceptual demands of tasks on second language speech
production. Applied Linguistics, 30(4), 533–544.
Samuda, V. (2001). Guiding relationship between form and meaning during task performance: The role of the teacher. In M. Bygate, P. Skehan, & M. Swain
(Eds.), Researching pedagogic tasks: Second language learning, teaching and testing (pp. 119–140). Harlow: Longman.
Skehan, P. (1998). A cognitive approach to language learning. Oxford, UK: Oxford University Press.
Skehan, P. (2001). Tasks and language performance assessment. In M. Bygate, P. Skehan, & M. Swain (Eds.), Researching pedagogic tasks: Second language
learning, teaching and testing (pp. 167–185). Harlow: Longman.
Skehan, P. (2003). Task-based instruction. Language Teaching, 36(1), 1–14.
Skehan, P. (2009). Modelling second language performance: Integrating complexity, accuracy, fluency, and lexis. Applied Linguistics, 30(4), 510–532.
Skehan, P., & Foster, P. (1997). Task type and task processing conditions as influences on foreign language performance. Language Teaching Research, 1(3),
185–211.
Swain, M., & Lapkin, S. (2001). Focus on form through collaborative dialogue: Exploring task effects. In M. Bygate, P. Skehan, & M. Swain (Eds.), Researching
pedagogic tasks: Second language learning, teaching and testing (pp. 99–118). Harlow: Longman.
Tavakoli, P. (2009). Assessing L2 task performance: Understanding the effects of task design. System, 37(3), 482–495.
Tavakoli, P., & Foster, P. (2008). Task design and second language performance: The effect of narrative type on learner output. Language Learning, 58(2),
439–473.
Tavakoli, P., & Skehan, P. (2005). Strategic planning, task structure, and performance testing. In R. Ellis (Ed.), Planning and task performance in a second
language (pp. 239–273). Amsterdam, the Netherlands: John Benjamins.
van Lier, L., & Matsuo, N. (2000). Varieties of conversational experience: Looking for learning opportunities. Applied Language Learning, 11(2), 265–287.
Wigglesworth, G. (1997). An investigation of planning time and proficiency level on oral test discourse. Language Testing, 14(1), 85–106.
