Formulaic sequences and L2 oral proficiency: Does the type of target language influence the association?

HELENE STENGERS, FRANK BOERS, ALEX HOUSEN AND JUNE EYCKMANS

Abstract

This paper investigates the extent to which productive use of formulaic se-
quences by intermediate students of two typologically different languages, i.e.,
English and Spanish, is associated with their oral proficiency in these lan-
guages. Previous research (e.g., Boers et al. 2006) has shown that appropriate
use of formulaic sequences helps learners of English come across as fluent and
idiomatic speakers. The evidence from the present study, which was conducted
with the participation of Dutch-speaking students of English and Spanish, con-
firms that finding, as oral proficiency assessments based on re-tell tasks cor-
related positively with the number of formulaic sequences the students used in
these tasks. The correlations were strongest in the English language samples,
however. It seems that the greater incidence of morphological-inflectional er-
rors in our participants’ spoken Spanish dampens the contribution that using
formulaic sequences tends to make to their oral proficiency (as perceived by
our assessors). The findings are discussed with reference to typological differ-
ences between L1 and L2.

1. Introduction¹

Many researchers, including Ellis (1996, 2002), Myles et al. (1998), Schmitt
(2004), Weinert (1995) and Wray (2002), assign a central role to formulaic
sequences in language acquisition and use. Formulaic sequences is used here
as a cover term for a variety of related phenomena also referred to as lexical
phrases or chunks, including classes as diverse as strong collocations (e.g., tell

1. We are grateful to Jean-Pierre van Noppen, Justine Kemlo, Kate McDonald, Carole Fielding,
Isabel Cangas, Rosa Fernández, Ana Gutiérrez, Pilar Fernández, Sofía Gallego and Paola
Acosta for their help with the study. We would also like to thank two anonymous reviewers
for their insightful comments and useful suggestions.

IRAL 49 (2011), 321–343 0019042X/2011/049-321
DOI 10.1515/iral.2011.017 © Walter de Gruyter

a lie; heavy traffic), idioms (e.g., turn the tide; back to square one), binomials
(e.g., cuts and bruises; research and development), standardised similes (e.g.,
clear as crystal; dry as dust), proverbs and clichés (e.g., When the cat’s away
. . . ; That’s the way the cookie crumbles), discourse organisers (e.g., On the
other hand; Having said that) and social routine formulae (e.g., Nice to meet
you; Have a nice day). Some formulaic sequences are pretty much fixed as far
as their lexical composition is concerned, while others occur in different guises,
showing syntagmatic expansion (e.g., tell a white lie; Have a very nice day) or
paradigmatic substitution (e.g., Pleased/Nice to meet you; conduct/carry out
an investigation). Paradigmatic substitution is especially evident in slot-and-
frame patterns, such as ‘take (someone) x time (to do y)’ (e.g., It took us two
hours to get there; It’ll take only five minutes). Additionally, some formulaic
sequences are completely fixed at the morphological level as well (e.g., *On
the other hands; *he was jumping guns), while others vary in accordance with
general ‘rules’ of grammar (e.g., commit/ commits/ committed a crime/ crimes).
Frequently occurring formulaic sequences are believed to play a crucial role
in language acquisition because they provide the material for exemplar-based
learning. This is the position advocated by Ellis (2002), who maintains that,
with regard to L1 acquisition, the usual developmental sequence is from for-
mula, through low-scope slot-and-frame pattern, to creative construction. The
outcome of this development, according to Skehan (1998), is a dual system
for language processing which comprises both a rule-based mode, which al-
lows language users to produce ‘novel’ utterances, and a memory-based mode,
which allows for stringing together holistically stored exemplars at a fast pace.
Which of the two modes takes precedence over the other is believed to de-
pend on the particular circumstances of language use: employing the rule-
based mode requires conditions that allow for some planning time, whereas the
memory-based mode facilitates fluency under ‘real-time’ conditions. Memory-
based language production is characterised by ample use of formulaic se-
quences. As most instances of language use typically involve a mixture of both
rule-based and memory-based processing, formulaic sequences can serve as
stepping stones between ‘novel’ word sequences. They may afford speakers
“processing time while computation proceeds, enabling us to plan ahead for
the content of what we are going to say, as well as the linguistic form” (Ske-
han 1998: 40). How densely distributed the stepping stones are will depend
on the discourse genre and the situational context. The incidence of formulaic
sequences has been shown to be particularly high in situations where language
users have had little opportunity to think about precisely how they will express
their thoughts and where a relatively high pace of delivery is expected (Kuiper
1996).
Like Skehan, Wray (2002) also recognizes two complementary modes of
L1 language processing: while native speakers can adopt an analytic perspective
on language patterns as the need arises, they will nevertheless tend to process
word strings in their L1 ‘holistically’, i.e., as sequences that were stored as
units in memory during childhood. The large stock of such ‘prefabricated’ word
strings that native speakers have acquired facilitates fluent language processing
not only in language production, but also in language reception. Experiments
have shown that formulaic sequences can be recognized and interpreted by re-
spondents before completion of the entire word string (Conklin and Schmitt
2008). This is said to free up processing space for interpreting less predictable
parts of incoming discourse. Wray and Perkins put it as follows: “The advan-
tage of the creative system is the freedom to produce or decode the unexpected.
The advantage of the holistic system is economy of effort when dealing with
the expected.” (Wray and Perkins 2000: 11).
There is no definitive figure as to the amount of formulaic language in discourse,
but according to some estimates between 50 % and 70 % of adult native
English language consists of formulaic sequences (Altenberg 1990; Erman and
Warren 2000). There are at least three difficulties in calculating the extent to
which discourse is formulaic. Firstly, as mentioned above, the relative density
of formulaic sequences is likely to vary between discourse genres and situ-
ational contexts. Secondly, the notion of formulaic sequence itself is rather
vague. Wray’s often quoted definition is as follows: “a sequence, continuous
or discontinuous, of words or other elements, which is, or appears to be, pre-
fabricated: that is, stored and retrieved whole from memory at the time of use,
rather than being subject to generation or analysis by the language grammar”
(Wray 2002: 9). This definition allows for a fair degree of individual varia-
tion: a given word string may be (more) formulaic for one speaker than for
another. For example, the string (not) statistically significant is more likely to
be experienced as a formulaic sequence by researchers than by lay people. The
string formulaic sequence may have attained ‘chunk status’ in linguistics cir-
cles, but may be experienced as a novelty elsewhere. Indeed, when different
experts are asked to independently identify formulaic sequences in samples of
discourse, agreement among them tends to be far from absolute (Eyckmans et
al. 2007; Foster 2001). A third difficulty in counting the incidence of formulaic
sequences lies in their variability. As mentioned above, many of them occur
with syntagmatic extensions and/or paradigmatic substitutions. Even idioms
show a considerable degree of variation from their dictionary-type form (Her-
rera and White 2010; Moon 1998). As a result, automatic searches for these
types of formulaic sequences in electronic corpora are likely to yield underes-
timations of their frequency of occurrence. We shall return to the possibility of
corpus-based counts of formulaic sequences further below.
From a psycholinguistics perspective, the variability of many formulaic sequences
raises questions about their holistic representation in the mental lexicon
(Boers and Lindstromberg 2009: 32–33). Are different variants of ‘the
same’ formulaic sequence each stored as a pre-fabricated string (e.g., perform
surgery; performs surgery; performing surgery; performed surgery), each
ready to be selected for retrieval, or is only a ‘canonical’ form (e.g., the most
frequently encountered variant or one unspecified for verb endings) stored as
a unit, still to be modified for actual usage through procedural knowledge of
grammar? Either way, it seems probable that the fewer variants available, the
more processing advantage a given formulaic sequence awards in terms of pro-
ductive fluency.
Let’s now turn to the use of formulaic sequences in a second or foreign language.
It is now well recognized that mastery of formulaic language is a prerequisite
for learners to attain a native-like command of the language, because
it helps them produce stretches of discourse that sound natural (or idiomatic) to native
speakers and because it facilitates fluency (Dechert 1984; Granger 1998;
Pawley and Syder 1983; Skehan 1998; Wray 2002). Proposals for instructional
methods that stimulate acquisition of L2 formulaic sequences have been made
in recent years by Willis (1990) and more explicitly so by Nattinger and DeCar-
rico (1992). The learning contexts which the latter authors refer to are mostly
of the naturalistic type, i.e., situations in which learners are immersed in the
L2 community. Moreover, the focus is on English as the L2, while the associa-
tion of formulaic-sequence use and L2 proficiency clearly extends beyond that
particular L2. Forsberg (2008), for example, shows how this association also
holds in L2 French.
The learning context of the present study is an Instructed Second Language
Acquisition (ISLA) setting, i.e., a more ‘formal’ foreign language learning con-
text where students are exposed to the L2 predominantly in the structured en-
vironment of the language classroom at school, supplemented by individually
varying amounts of out-of-class exposure to the L2 through the media, indepen-
dent reading, homework and perhaps the occasional contact with L2 speakers
(Housen and Pierrard 2005). A proposal for a formulaic sequence-oriented pedagogy
in this type of learning context has been Michael Lewis’ (1993, 1997,
2000) Lexical Approach. A critical review of that approach, with proposals for
optimizing it, is offered by Boers and Lindstromberg (2009).
Empirical support for the claims about the beneficial role of formulaic se-
quences in L2 spoken English comes from an experimental study reported by
Boers et al. (2006), in which significant correlations were found between EFL
students’ use of formulaic sequences during semi-structured (exam) interviews
and the oral proficiency scores they were awarded on the basis of these in-
terviews. The more the students resorted to formulaic sequences to express
themselves, the more likely they were to be perceived by expert raters to be
idiomatic, fluent and accurate language users. Positive correlations between
knowledge of formulaic sequences and proficiency have also been reported in
connection with L2 written English (Margareta Lewis 2008). There thus seems
to be good backing for pedagogical approaches which give due attention to L2
formulaic sequences.
However, the calls for such instructional approaches have so far stayed con-
fined mostly to the teaching of English. In the present study, we evaluate their
potential merits beyond L2 English, by comparing the degree to which oral
proficiency ratings obtained by learners of English and learners of Spanish are
associated with their usage of formulaic sequences.
There are two general reasons for suspecting that the contribution of formu-
laic sequences to learners’ proficiency might be weaker in L2 Spanish. One
reason is the reputation that English has for being a very ‘idiomatic’ language.
As English grammar is popularly believed to be relatively straightforward, the
true challenge of the FL English learner is often said to lie in mastering the lan-
guage’s vast repertoire of standardized phrases. This belief is probably fuelled
by the availability of a wide choice (in print and on the web) of resources for
learning English idioms, phrasal verbs, etc. and by reiterated statements that
English “is very idiomatic”.² Stating that a given language is very idiomatic
carries the implicature that other languages, such as Spanish, might not be to
the same extent. This possibility was investigated by Stengers (2007, 2009:
35–73) who probed the relative pervasiveness of idiomaticity in English and
Spanish by means of two complementary studies. The first study bore on a nar-
row definition of idiom (Grant and Bauer 2004), namely the kind of phrases
that are included in so-called idiom dictionaries. Random sets of 500 idioms
were taken from comparable English and Spanish idiom dictionaries and the
frequency of occurrence of each of these idioms was checked in comparable
English and Spanish 56-million-word corpora. Mean frequencies of occurrence
turned out to be virtually the same: 24.56 and 25.57, respectively. The exer-
cise was extended to German by Accou (2009), who found an eerily similar
mean frequency of occurrence of her random set of 500 phrases taken from a
German idiom dictionary: 24.58. Stengers’ (2009) second comparative study
broadened the scope to formulaic sequences at large. One-hour recordings of
matched samples of English and Spanish radio interviews were compiled and
ten volunteers per language were asked to identify word strings they consid-
ered to be formulaic sequences (after having been briefed about this notion).
The mean number of formulaic sequences recognized per minute of recording
in either language turned out to be about the same: 5.88 and 5.38, respectively.
In short, neither of the comparative endeavours yields any statistical evidence
that English is any more idiomatic or formulaic than is Spanish.

2. Google searches for the word strings “English is very idiomatic” and “English is a very id-
iomatic language” generated about 2500 hits (on 14 December 2009). Similar searches for
Spanish generated only 9 hits.

This finding, however, does not necessarily imply that the learning of for-
mulaic sequences poses exactly the same challenges in both target languages,
since these languages differ in other ways which might affect the learning
process. In terms of morpho-syntactic typology, English is relatively analytic,
whereas Spanish is more synthetic. An analytic language has comparatively
little grammatical inflection. Syntactic and semantic roles are signalled mostly
by word order and the use of prepositions (or postpositions) rather than by in-
flection. By contrast, a synthetic language tends to use more affixes and mod-
ifications of roots. Word order tends to be more flexible, since syntactic func-
tions and semantic roles are often signalled by the inflected forms (Bauer 2003;
Haspelmath 2002). Synthetic languages are further classified as inflectional or
fusional, if the forms of the words themselves change to indicate how they re-
late to other words in a sentence, or as agglutinating, if words are formed by
the combination of morphemes. It is clear that Spanish is characterised by a
richer inflectional grammar than English. Adjectives agree with nouns for gen-
der as well as number, verbs are systematically marked for person, number and
(indicative or subjunctive) mood, etc.
When it comes to certain types of formulaic sequences, such as verb-noun
collocations, these will almost inevitably display greater morphological vari-
ability in Spanish than in English. While the verb in run a risk, for exam-
ple, has relatively few variants (i.e., run, running, runs and ran), its equiva-
lent in Spanish, correr un riesgo, has many more (e.g., corro, corres, corre,
corremos, corréis and corren, which are but the verb forms of the present
indicative; to these could be added many more belonging to past and future
tenses). The greater variability in verb inflections is also likely to play a role in the
case of verb-phrase idioms (e.g., echar un cable – ‘lend a hand’, perder la
cabeza – ‘lose one’s mind’), some similes (e.g., venderse como churros – ‘sell
like hotcakes’), and verb-preposition collocations (e.g., soñar con – ‘dream
of’). More variability in Spanish is also to be expected in the case of adverb-
adjective collocations, whose gender marking will be determined by the nature
of the noun they happen to modify (e.g., estrechamente unido / unida – ‘closely
linked’).
Given this wider morpho-syntactic variation in synthetic languages, one may
hypothesize that fluent production of correct formulaic sequences is somewhat
more difficult to achieve in general than in analytic languages, since either
rule-based knowledge will more often need to be employed to mould a canon-
ical or schematic representation of a formulaic sequence into a grammatically
appropriate form, or the grammatically appropriate ‘variant’ of a formulaic se-
quence will need to be selected from a greater set of ready-made forms. In
addition, one may hypothesize that L2 formulaic sequences will start perform-
ing their function as stepping stones less readily in a synthetic target language
if the learner’s L1 happens to lean more towards the analytic pole of the typology
continuum. In the study reported here, the participants’ L1 (Dutch)
was typologically closer to one target language (English) than the other (Spanish).

2. Research questions

Given the fact that Spanish is a relatively more synthetic, inflectional language
than English, we hypothesize that (L1 Dutch learners’) fluent production of for-
mulaic sequences presents a greater challenge in the case of L2 Spanish than
L2 English. This hypothesis is tested in a study which examines whether the
oral proficiency scores obtained by students of Spanish might be less clearly
associated with their use of formulaic sequences than has been reported in con-
nection with English (Boers et al. 2006). More specifically, the study addresses
the following research questions:
(a) Is the productive use of formulaic sequences by (L1 Dutch) students of
English and Spanish as foreign languages positively associated with their
oral proficiency in these languages, evaluated with regard to fluency, range
of expression and accuracy?
(b) If so, is the association equally strong in both languages?
(c) In case the association is weaker in L2 Spanish, could this be due to a
greater incidence of inflectional errors in the learners’ output?

3. Method

3.1. Participants

Participants were 60 Dutch-speaking students of modern languages at a university
college in Brussels, Belgium. Twenty-six students were majoring in English; 34
in Spanish. The English majors (aged 19–21) were in the second year of their
four-year training programme. The Spanish majors (aged 22–24) were in their
fourth year. These two cohorts were selected so that their levels of proficiency
in either language would be similar. Students of English tend to have a head
start over students of Spanish in Belgian higher education, as English is a com-
pulsory subject at secondary school and Spanish is not. Moreover, out-of-class
exposure to English (via the media, youth culture, etc.) is greater than exposure
to Spanish in the region where the students came from (Flanders). The official
end-of-year objective for both groups in terms of general proficiency was level
C1 of the Common European Framework of Reference (CEFR). At the time
of testing for the present study, the level of oral proficiency of both cohorts of
students was estimated by their teachers to be at level B2. In order to ascertain
the equivalence in proficiency levels, the students’ oral proficiency scores were
also subjected to a T-test statistic (see Section 4.2). The participants were not
told that they were taking part in an experimental study until after the study
had been completed.
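
The group comparison mentioned above can be illustrated with a minimal sketch. It assumes an independent-samples t-test in Welch's form (the text does not say which variant was used). The scores are invented for illustration and are not data from the study, and `welch_t` is a hypothetical helper name.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Invented oral proficiency scores (NOT the study's data).
english_scores = [13.5, 12.0, 14.5, 11.0, 13.0, 12.5]
spanish_scores = [12.5, 13.0, 11.5, 12.0, 14.0, 11.0, 12.5]

t, df = welch_t(english_scores, spanish_scores)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A t value that is small relative to its degrees of freedom would be consistent with the two cohorts being comparable in proficiency. Turning t into a p-value requires the t distribution, for which a statistics package would normally be used.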

3.2. Formulaic sequence counts

Students were given a re-tell task and their performance was recorded. This L2
sample served to count the students’ use of formulaic sequences and to gauge
their oral proficiency (see Section 3.3). The input for the task was identical for
both language groups. Students were asked to read a 600-word text in their L1
(Dutch) about a general interest topic. After reading the text, students returned
it to the teacher and they were given three minutes to prepare for the re-tell
task, which was to report the text’s content in their major foreign language.
Planning time was limited to ensure a fair degree of ‘real-time’ language pro-
duction. To help the students recollect the content of the input text, they were
given a list with single-word content cues, corresponding chronologically to
the paragraphs of the text. The reason for using an input text in the students’
mother tongue was to avoid a confounding variable in the Boers et al. (2006)
study, where many participants simply reproduced word strings from an L2
input text that were not necessarily part of their own repertoires of formulaic
sequences.
To identify formulaic sequences, one might resort to corpus data and rely on
statistical scores (so-called MI scores, T-scores or Z-scores) to decide which
words form strong partnerships. Unfortunately, corpus statistics do not always
generate word strings which coincide with people’s intuitions about what con-
stitutes a formulaic sequence (e.g., Ellis, Simpson-Vlach and Maynard 2008;
Nesselhauf 2003; Read and Nation 2004; Schmitt, Grandage and Adolphs
2004; Stengers 2007, 2009). Automatically generated corpus-based colloca-
tions lists may include combinations (e.g., in + the) that for lack of semantic
unity are not recognised as formulaic sequences by respondents. Other combi-
nations, which respondents do identify as formulaic sequences, may be absent
from automatically generated lists. For example, at the time of writing this ar-
ticle, consultation of the Collins Cobuild on-line collocations sampler yielded
no confirmation of the formulaic-sequence status of expressions such as throw
a party. For the present study we decided to resort to several native speak-
ers’ counts of what they considered to be formulaic sequences in the students’
spoken discourse. We realised that only partial agreement could be expected
among the respondents, but hoped to obtain sufficient inter-rater reliability as
calculated through correlation analyses. Three speakers of English and three
speakers of Spanish were asked to individually listen to the recordings and to
list all word strings they considered to be formulaic sequences. They had all
had training in linguistics and were familiar with the literature on phraseology,
including the notion of formulaic sequences.
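
The corpus-based association scores mentioned above can be sketched as follows. The formulas are the ones commonly used in corpus linguistics for a two-word combination: MI as the base-2 logarithm of observed over expected co-occurrence frequency, and T-score as observed minus expected, divided by the square root of observed. The frequency counts are invented for illustration and the function names are hypothetical.

```python
import math


def expected_frequency(freq_x, freq_y, corpus_size):
    """Co-occurrence frequency expected if the two words were independent."""
    return freq_x * freq_y / corpus_size


def mi_score(pair_freq, freq_x, freq_y, corpus_size):
    """Mutual information: log2 of observed over expected frequency."""
    return math.log2(pair_freq / expected_frequency(freq_x, freq_y, corpus_size))


def t_score(pair_freq, freq_x, freq_y, corpus_size):
    """T-score: (observed - expected) / sqrt(observed)."""
    expected = expected_frequency(freq_x, freq_y, corpus_size)
    return (pair_freq - expected) / math.sqrt(pair_freq)


# Invented counts in a hypothetical 1-million-word corpus.
CORPUS_SIZE = 1_000_000
pairs = {
    # words: (pair frequency, frequency of word 1, frequency of word 2)
    ("in", "the"): (5000, 20000, 60000),
    ("throw", "party"): (20, 400, 1000),
}

for words, (pair_freq, freq_x, freq_y) in pairs.items():
    mi = mi_score(pair_freq, freq_x, freq_y, CORPUS_SIZE)
    t = t_score(pair_freq, freq_x, freq_y, CORPUS_SIZE)
    print(words, f"MI = {mi:.2f}", f"T = {t:.2f}")
```

With these invented counts, the T-score ranks the frequent grammatical pair in + the well above throw + party, whereas the MI score does the reverse. This is one way in which statistical scores can diverge from intuitions about what counts as a formulaic sequence.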
As we (the authors) first piloted the counting procedure on a small L2 sam-
ple to assess the feasibility of the task we were planning to set for the six vol-
unteer formulaic-sequence counters, we quickly stumbled across word strings
that could perhaps be interpreted as students’ attempts at producing L2 formu-
laic sequences, but which deviated from the L2 targets in one way or another.
Students sometimes produced certain word strings smoothly and confidently –
leaving the impression of ‘holistic’ production – but the strings were actually
not target-like, possibly due to erroneous transfer from L1 (e.g., *if a person
has overweight; *it depends of . . . ). Students sometimes produced word com-
binations that did seem target-like at the level of content-word selection but that
failed to be correct at the level of morpho-syntax and/or function words (e.g.,
*they are desperate seeking for a partner . . . ). Deciding which of these erro-
neous strings might nevertheless be considered close enough approximations
of L2 targets to be included in the tally turned out to be difficult, and the possibility
of including such approximations seriously reduced the inter-rater agreement
among the three authors during the piloting. It therefore became clear that
asking our volunteer formulaic-sequence counters to make such decisions too
would seriously jeopardise the degree of inter-rater agreement we were hop-
ing to obtain. We thus chose to facilitate the task of our formulaic-sequence
counters as much as possible by asking them to list only formulaic sequences
that corresponded fully to L2 targets. This means, of course, that the figures we
will be reporting below are likely to be underestimations of the students’ actual
attempts at producing L2 formulaic sequences.
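
The inter-rater agreement referred to above is assessed through correlation analyses. A minimal sketch of such a check follows, assuming Pearson's correlation coefficient (the text does not name the coefficient) computed pairwise over each counter's tally per student. The tallies and rater labels are invented.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Invented formulaic-sequence tallies: one number per student, per counter.
counts = {
    "counter_a": [12, 7, 15, 9, 11],
    "counter_b": [10, 6, 16, 8, 12],
    "counter_c": [13, 5, 14, 10, 9],
}

# Correlate every pair of counters across the same students.
pairwise = {
    (a, b): pearson(counts[a], counts[b])
    for a, b in combinations(counts, 2)
}

for (a, b), r in pairwise.items():
    print(f"{a} vs {b}: r = {r:.2f}")
```

High pairwise coefficients would indicate that the counters rank the students similarly, even if their absolute tallies differ.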
In order to gauge whether language errors might counteract the positive in-
fluence of the use of formulaic sequences on students’ oral proficiency scores,
we³ made inventories of the students’ language errors independently of the
formulaic-sequence counters. For reliability’s sake we focused on errors at the
level of inflectional grammar (wrong verb conjugations, wrong adjective-noun
agreement, adjectival instead of adverbial form, etc.), since identification of
this type of error is pretty uncontroversial.

3.3. Measurement of oral proficiency

The assessment of learners’ spoken language proficiency has always been an
intricate matter (e.g., Luoma 2004). It is hard to pinpoint the features of a
stretch of spoken language that determine whether it will be perceived (by a

3. The first author counted errors in the L2 Spanish recordings; the second author did so in the
L2 English ones.

native speaker and/or language teacher) as a sample of ‘proficient’ FL use, and
many quantitative measures have been proposed to gauge various aspects of
proficiency (see Ellis and Barkhuizen 2005).
For instance, fluency may be measured on the basis of temporal aspects of
the speech production, e.g., in terms of syllables per minute (e.g., Yuan and Ellis
2003) or phonation/time ratio, that is, the time spent speaking
as a percentage of the time taken to produce the speech sample (e.g.,
Lennon 1990; Towell et al. 1996). Syntactic complexity may be measured by
the Sub-clause Ratio (Foster and Skehan 1996), whereas Lexical Frequency
Profiles (Laufer and Nation 1995), Guiraud’s Index (Vermeer 2000) and Vocd
(Malvern et al. 2004) are some of the measures that have been proposed for lex-
ical richness. Accuracy measures include the Target-like Usage score (Stauble
1978) for grammatical accuracy, while lexical accuracy may be measured by
the number of lexical errors per clause.
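
Three of the measures listed above are simple enough to sketch. The block assumes the usual definitions: speech rate as syllables divided by minutes, phonation/time ratio as speaking time over total time expressed as a percentage, and Guiraud's Index as the number of word types divided by the square root of the number of tokens. All input values are invented and the function names are hypothetical.

```python
import math


def syllables_per_minute(syllable_count, total_seconds):
    """Speech rate: syllables produced per minute of the sample."""
    return syllable_count / (total_seconds / 60)


def phonation_time_ratio(speaking_seconds, total_seconds):
    """Time spent speaking as a percentage of the total sample time."""
    return 100 * speaking_seconds / total_seconds


def guiraud_index(tokens):
    """Lexical richness: number of types / square root of number of tokens."""
    return len(set(tokens)) / math.sqrt(len(tokens))


# Invented values for one hypothetical two-minute speech sample.
rate = syllables_per_minute(syllable_count=300, total_seconds=120)
ratio = phonation_time_ratio(speaking_seconds=90, total_seconds=120)
richness = guiraud_index("the cat saw the dog".split())

print(rate, ratio, round(richness, 2))
```

The sketch treats a token list as already segmented and lower-cased. In practice, decisions about syllable counting, pause thresholds and lemmatisation would all affect the resulting figures.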
It is not certain, however, whether the application of these measures yields
meaningful results if one wants to compare the proficiency of groups of learn-
ers of different languages. For example, it cannot be taken for granted that
conventional speech rates (e.g., syllables per minute) are the same in different
languages (Stengers 2009: 60). Also expectations regarding syntactic complex-
ity may vary. Kaplan (1966) argued that rhetorical patterns may differ across
languages, and this may possibly have repercussions on sub-clause ratios. One
example of a typological difference which might affect lexical richness mea-
sures in certain text genres is the distinction between satellite-framed and verb-
framed languages (Talmy 2000), which pertains to the expression of motion
events. Consider the difference between The girl ran into the house and La
chica entró en la casa corriendo. In satellite-framed languages, such as
English, manner of motion is typically expressed by the verb, while a particle
or a prepositional phrase indicates direction. In verb-framed languages, such
as Spanish, the motion verb typically expresses direction and, if the manner of
motion is expressed at all, this is more often done by means of an adverbial
phrase. Comparative studies (e.g., Slobin 2000) show that satellite-framed lan-
guages express manner-of-motion more frequently than verb-framed languages
do and display a much richer range of manner-of-motion verbs (saunter, tiptoe,
hobble, sprint, pelt, veer, trudge, stagger, flit, etc.).
For the purpose of the present study – estimating to what extent learners’ use
of formulaic sequences helps them come across as proficient speakers – it was
decided to ask experienced assessors to rate the quality of the students’ re-tell
tasks, and to subsequently ascertain inter-rater agreement through correlation
analyses. Three teachers of English and three teachers of Spanish, who were
unaware of the aim of the study and who did not know the students, listened to
the recordings and awarded scores by considering fluency (i.e., pace of delivery
and relative absence of hesitations), range of expression (i.e., lexical richness
and syntactic complexity) and accuracy (relative absence of ‘language errors’).
The assessors were asked to give a mark on a scale from one to fifteen. Associated
with the scale were the CEFR level descriptors of oral proficiency, which
the judges were asked to read and refer to.4 A score from 1 to 3 corresponded
to level A1, a score from 4 to 6 to level A2, and so on. The assessors could thus
discriminate to some extent between performances corresponding to the same
proficiency bracket.5
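The marking scale can be pictured as a simple lookup. The sketch below is hypothetical (the function names are ours), and the continuation beyond A2, i.e., 7 to 9 for B1, 10 to 12 for B2 and 13 to 15 for C1, is inferred from ‘and so on’ and from footnote 5 rather than spelled out in the marking guidelines quoted here.

```python
CEFR_LEVELS = ("A1", "A2", "B1", "B2", "C1")


def cefr_bracket(mark: int) -> str:
    """Map a whole-number mark (1-15) to its CEFR bracket, three marks per level."""
    if not 1 <= mark <= 15:
        raise ValueError("marks run from 1 to 15")
    return CEFR_LEVELS[(mark - 1) // 3]


def position_in_bracket(mark: int) -> int:
    """Position of a mark within its bracket: 1 = lowest, 3 = highest."""
    return (mark - 1) % 3 + 1


for mark in (3, 4, 9, 15):
    print(mark, cefr_bracket(mark), position_in_bracket(mark))
```

The second function shows how three marks per level allow an assessor to separate, say, a weak B1 performance from a strong one.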

4. Results

4.1. Formulaic-sequence counts

As expected, the formulaic-sequence counts showed a fair degree of variation,
but Pearson correlation coefficients6 indicated satisfactory inter-rater agreement
as to which students had used relatively many or relatively few formulaic
sequences. Coefficients between the raters’ counts ranged from .592 to .941
(all significant at p < .001). The three raters’ raw counts were aggregated into
two different measures. Firstly, a mean type count was calculated per student,
reflecting the breadth of students’ use of formulaic sequences (e.g., get married;
chain smoker; a shoulder to cry on). The students of English were found
to produce significantly more formulaic-sequence types per recording than the
students of Spanish, with mean counts of 9.77 (SD 3.36) and 5.61 (SD 2.59),
respectively (T = 5.15; p < .001). Secondly, a mean token count was calculated
per minute of recording, reflecting the relative density of formulaic-sequence
use by each student. In this respect too, significantly more formulaic
sequences were counted in the L2 English than in the L2 Spanish sample,
with means of 2.75 (SD 0.76) and 1.56 (SD 0.62), respectively (T = 6.612;
p < .001).
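The two aggregate measures can be illustrated as follows. This is a minimal sketch under stated assumptions: each rater’s tally for one student is represented as a list of the sequences that rater accepted (repeats included), and the data, the recording length and the function names are invented for the example.

```python
from statistics import mean


def mean_type_count(rater_tallies: list[list[str]]) -> float:
    """Average, across raters, of the number of distinct sequences tallied."""
    return mean(len(set(tally)) for tally in rater_tallies)


def mean_tokens_per_minute(rater_tallies: list[list[str]], duration_seconds: float) -> float:
    """Average, across raters, of all tallied occurrences per minute of recording."""
    minutes = duration_seconds / 60.0
    return mean(len(tally) / minutes for tally in rater_tallies)


# Hypothetical tallies by three raters for one two-minute recording.
tallies = [
    ["get married", "chain smoker", "get married", "a shoulder to cry on"],
    ["get married", "chain smoker"],
    ["get married", "get married", "chain smoker"],
]
print(round(mean_type_count(tallies), 2), mean_tokens_per_minute(tallies, 120))
```

The type count ignores repetitions and so speaks to the breadth of a repertoire, whereas the tokens-per-minute figure rewards frequent use regardless of variety.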

4. We are aware that the CEFR scales for language proficiency are not beyond criticism (see, e.g.,
Hulstijn 2007 on the need for more solid empirical underpinnings). Also, the descriptors are
arguably not precise enough to warrant satisfactory inter-rater reliability. On the other hand,
these scales do now appear to be used quite commonly (in Europe) and so their employment
in the present study is at least ecologically valid.
5. Given the teachers’ previously indicated estimates of their students’ oral proficiency level, it
was felt unrealistic to include the highest CEFR level, i.e., C2, in the guidelines for marking.
A mark of 15 thus corresponded to a convincing performance at the C1 level.
6. All the data were tested for normal distribution by means of the Kolmogorov-Smirnov test.
When they were found to be normally distributed, statistical analyses were performed using
parametric tests. If not, non-parametric alternatives were used.


These results do not mean, however, that the students of Spanish necessar-
ily made fewer attempts at using L2 formulaic sequences than the students of
English. After all, the tallies include only word strings that fully correspond to
L2 targets. As we shall see below, there are reasons to assume that many more
word strings were excluded from the L2 Spanish counts because they contained
morpho-syntactic errors.

4.2. Oral proficiency scores

The degree to which the language proficiency assessors converged in their scor-
ing, though statistically significant, was not as high as we had hoped, with
Spearman rank correlations ranging from .255 to .716. Nevertheless, for both
language cohorts, the three raters’ scores were combined into one mean score
per student. These mean scores ranged from 7 to 12 (corresponding to CEFR
brackets B1 and B2) in both FL cohorts. Students’ mean group scores in the
re-tell task are summarised in Table 1.

Table 1. Comparison of re-tell oral proficiency scores (on a 15-point scale)

            English cohort   Spanish cohort   T        p
            (N = 26)         (N = 34)
Fluency     9.64 (1.10)      9.65 (1.25)      −.019    .985
Range       9.23 (1.11)      9.76 (0.93)      −2.033   .047
Accuracy    9.03 (1.26)      8.95 (1.25)      −1.985   .790

Application of a T-test indicates a (borderline) significant difference for the
Range parameter, on which L2 Spanish students were awarded higher scores
than their L2 English peers. The two cohorts were found to be on a par in
terms of Fluency and Accuracy, however. The mean scores on each parameter
are actually very similar, which is a reflection of the fact that our assessors
often gave ‘blanket’ scores per student (i.e., the same mark for each of the
three parameters). It is possible that our assessors generally found it hard to
draw a line between fluency, range of expression, and accuracy.
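For readers who wish to check a group comparison of this kind from summary statistics alone, a pooled-variance T statistic can be computed as sketched below. The sketch rests on two assumptions that the text does not confirm: that the test was an independent-samples Student’s T-test with pooled variance, and that the bracketed figures in Table 1 are standard deviations. Because the published means and dispersions are rounded, a recomputation can only approximate the reported values.

```python
import math


def pooled_t(mean1: float, sd1: float, n1: int,
             mean2: float, sd2: float, n2: int) -> tuple[float, int]:
    """Independent-samples T statistic with pooled variance, from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Range parameter, English cohort versus Spanish cohort (figures from Table 1).
t_range, df = pooled_t(9.23, 1.11, 26, 9.76, 0.93, 34)
print(round(t_range, 2), df)
```

The sign of the statistic simply reflects the order in which the two cohorts are entered.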


4.3. Correlation between formulaic-sequence counts and oral proficiency scores

The Pearson correlations between the formulaic-sequence counts and the oral
proficiency scores are summarised in Table 2 below. As mentioned in Section
3.4, we calculated mean type counts and mean tokens-per-minute counts.
Given the assumption that using formulaic sequences is a way of avoiding hesitations,
the correlation coefficient for the Fluency parameter was calculated
using the tokens-per-minute measure. For the Range parameter, the type counts
were used (as a reflection of the diversity or breadth of students’ repertoires of
formulaic sequences). As the occurrence of errors did not differ between tokens
of the same type, we also used the type counts in connection with Accuracy.
The formulaic-sequence counts and the oral proficiency scores are positively
correlated in both L2 samples. However, the correlations for L2 English
are clearly stronger than for L2 Spanish. An illustration of the difference
in linearity of the association between the two languages for the Range parameter is
given in Figures 1 and 2.

Table 2. Correlations of formulaic-sequence counts and oral proficiency scores

            ENG              SPA
Fluency     .550 (p < .01)   .361 (p < .05)
Range       .626 (p < .01)   .389 (p < .05)
Accuracy    .561 (p < .01)   .363 (p < .05)
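A coefficient of the kind reported in Table 2 pairs one count with one score per student. The following sketch shows the computation of Pearson’s r in plain Python; the five data points are invented for illustration and do not come from the study.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson product-moment correlation between two equally long series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(spread_x * spread_y)


# Hypothetical students: formulaic-sequence type counts and Range scores.
type_counts = [4, 6, 9, 11, 13]
range_scores = [8, 9, 9, 11, 12]
r = pearson_r(type_counts, range_scores)
print(round(r, 3))
```

A value close to 1 means that the points lie near a rising straight line, which is the sense in which the association is said to be more or less linear.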

 





Figure 1. Range score and formulaic-sequence (type) count correlations in L2 English

Figure 2. Range score and formulaic-sequence (type) count correlations in L2 Spanish

4.4. Morphological errors in students’ output

As we saw in the introduction, the inflectional properties of Spanish may
present L2 Spanish students with an extra challenge to produce ‘morphologically
correct’ formulaic sequences. This could account not only for the lower
mean chunk counts observed for Spanish; it may also be an additional explanation
for the weaker correlations with the oral proficiency ratings. In order to verify
this hypothesis, we decided to count the number of inflectional errors occurring
in the students’ recorded discourse. For the purposes of this study, we focused
on inflectional errors bearing on verb conjugations and adjective, noun, adverb
and article inflection. The results of this error count are presented in Table 3. As
the number of errors in the L2 English sample – which were few and far between
– did not show a normal distribution, a Mann-Whitney U test was carried
out.
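The U statistic itself is easy to compute by comparing every observation in one group with every observation in the other, counting ties as one half. The sketch below does so for invented error counts; it illustrates the statistic only, makes no claim about the software used in the study, and does not derive a p-value.

```python
def mann_whitney_u(sample_a: list[float], sample_b: list[float]) -> tuple[float, float]:
    """Return (U_a, U_b): pairwise 'wins' of each sample, with ties counted as 0.5."""
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(sample_a) * len(sample_b) - u_a
    return u_a, u_b


# Hypothetical numbers of inflectional errors per recording.
english_errors = [0, 0, 1, 0, 2]
spanish_errors = [3, 5, 2, 4, 6]
u_english, u_spanish = mann_whitney_u(english_errors, spanish_errors)
print(min(u_english, u_spanish))
```

The smaller of the two values is the one conventionally reported; the closer it lies to zero, the less the two distributions overlap.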
Clearly, the students of Spanish made significantly more inflectional errors
than the students of English. Examples of such errors are listed in the Appendix.

Table 3. Comparison of inflectional errors in re-tell task

                                              FL English cohort   FL Spanish cohort   U        p
                                              (N = 26)            (N = 34)
Number of inflectional errors per recording   0.56 (SD 0.84)      3.53 (SD 1.85)      29.500   < .001

Pearson correlation coefficients indicate that L2 Spanish students who made
relatively many inflectional errors also tended to be awarded lower oral proficiency
scores (see Table 4). In other words, the occurrence of inflectional errors
in L2 Spanish most probably affected the proficiency scores. Not surprisingly,
the strongest negative correlation is found for the parameter Accuracy.

Table 4. Correlations of number of inflectional errors and oral proficiency for L2 Spanish

Fluency     −.430 (p < .05)
Range       −.249
Accuracy    −.471 (p < .005)

Given the near absence of morpho-syntactic errors in the L2 English samples,
it is unsurprising that no such correlations were found between their incidence
and the oral proficiency scores awarded to the L2 English re-tell performances
(Spearman rank correlation coefficients: Fluency .09, Range .04,
Accuracy −.07).
At first sight, the greater incidence of inflectional errors in the Spanish re-tell
tasks seems to contradict the finding reported in section 4.2 that the students
of Spanish as a group were given ratings for the accuracy parameter that were
similar to those awarded to the students of English as a group. It is not unlikely,
however, that our assessors of the Spanish re-tell tasks, who knew about
the students’ L1 background, were relatively tolerant of these kinds
of language errors overall because they took into account the challenges posed
by the typological distance between the target language (Spanish) and the stu-
dents’ L1 (Dutch). What the correlation coefficients do suggest is that the num-
bers of language errors helped our assessors to distinguish between (what they
felt to be) relatively good and relatively weak performances within the group
of L2 Spanish students.

5. Discussion

The data collected reveal positive correlations between the numbers of formu-
laic sequences used by students and their oral proficiency as perceived by blind
judges. The correlations appear to be weaker in L2 Spanish, however, where the
higher incidence of inflectional errors generally seems to dampen the positive
impression which the appropriate use of formulaic sequences helps to make.
Because of the greater importance of inflection in Spanish, it also seems
that formulaic-sequence mastery under real-time conditions is more difficult
to achieve in Spanish than in English. Unlike the students of English, the stu-
dents of Spanish regularly made inflectional errors also in their attempts to
produce word strings which few would contest as being lexically semi-fixed
(e.g., strong collocations), as in *juegue un papel, *las parejas divorciados,
*una investigación llevado a cabo.
This finding again raises the question as to how formulaic sequences are to
be represented in the mental lexicon in order for them to afford their processing
advantage generally and to fulfil their function as stepping stones in language
production in particular. Is only a canonical form truly prefabricated, leaving
variants to be generated procedurally? Alternatively, are the different variants
stored as single units, each waiting to be retrieved ready-made? It is
not clear which of these two scenarios is most plausible from the perspective
of automaticity models (e.g., DeKeyser 2001). Automaticity is the ability to
produce streams of words and phrases with rapidity and accuracy (Segalowitz
2003). Two types of theories of automatisation have emerged, namely mem-
ory-based and process-based theories. According to memory-based theories
(e.g., Logan 1988; Schmidt 1992; MacKay 1982) automatisation is a matter of
strengthening associations in memory until the associated elements are stored
as a single unit, ready to be retrieved as a whole. Process-based theories (e.g.,
Anderson 1993), on the other hand, hold that automatisation is a matter of
practising cognitive procedures (such as the application of ‘grammar rules’) so
that their implementation becomes faster and faster. According to the power
law of learning (Ellis and Schmidt 1997), practice will at some point cease
to provide returns in terms of acceleration, because optimal performance has
been reached. In the process-based view, fluent production of formulaic sequences is a
consequence of the very fast assemblage of their components rather than holistic
retrieval from memory.
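The flattening of practice gains can be made concrete with the form in which the power law is commonly written, T(N) = a + b·N^(−c), where T is the time needed on the N-th trial. The sketch below assumes that form; the parameter values are hypothetical and serve only to show the shape of the curve.

```python
def predicted_time(trial: int, asymptote: float, gain: float, rate: float) -> float:
    """Power law of practice: T(N) = asymptote + gain * N ** (-rate)."""
    return asymptote + gain * trial ** (-rate)


def speed_up(trial: int, **params: float) -> float:
    """Time saved by one additional trial of practice."""
    return predicted_time(trial, **params) - predicted_time(trial + 1, **params)


params = {"asymptote": 0.5, "gain": 2.0, "rate": 0.5}  # hypothetical values, in seconds
early = speed_up(1, **params)    # gain from the second trial
late = speed_up(100, **params)   # gain from the 101st trial
print(round(early, 3), round(late, 4))
```

Under these assumptions the second trial saves far more time than the hundred-and-first, and performance approaches but never undercuts the asymptote.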
According to Wray (2002), formulaic sequences are processed differently
by natives and non-natives. Her account of the holistic processing of formu-
laic sequences by native speakers is reminiscent of memory-based models of
automatisation, whereas her concession that L2 learners may achieve fluency
in using formulaic sequences through profound procedural knowledge chimes
with process-based models. If Wray is right, then L2 learners will benefit from
the fluent use of formulaic sequences if they manage to assemble them at high
speed, i.e., as proceduralised strings.
It is also possible (and probable) that some L2 formulaic sequences are re-
trieved holistically by learners, while others are assembled procedurally. In this
respect, Schmitt (2005) states that "there seem to be a number of different kinds
of lexical chunks, and each category is likely to be used and even processed in
somewhat different ways" (Schmitt 2005: 21). To illustrate his claim, he exam-
ined three different types of formulaic language which have varying degrees of
fixedness: idioms, e.g., packed like sardines, variable expressions (i.e., phrases
containing open ‘slots’), e.g., put ___ to the test, and ‘lexical bundles’ (i.e.,
continuous strings of words identified by corpus analysis), e.g., the fact that.
Schmitt (2005) assumes that ‘holistic’ storage comes into play when addressing
a formulaic sequence which is an ‘intact, unchangeable whole’. Whenever
variation occurs, however, different processing strategies are relied on. In the
case of idioms, Schmitt speculates that only a ‘canonical’ form is stored as a
template, which can be accessed via one or more content words of the expres-
sion and from which creative forms can be generated. In the case of variable
expressions, either all chunk variants are stored as individual units in the ‘phra-
sicon’, e.g., a (week) ago, a (year) ago, a (minute) ago, etc., or, in the case of
a higher degree of possible variation such as tense variation, one ‘canonical’
form is stored containing a grammatical slot which requires appropriate in-
flection, e.g., [inflected form of stand] shoulder to shoulder. Finally, Schmitt
proposes that lexical bundles are likely to be stored as single units, given their
relative fixedness and high frequency.
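The contrast between retrieving a stored unit and filling a grammatical slot can be pictured with a toy data structure. The sketch below is a loose illustration of the two storage options described above, not an implementation of Schmitt’s proposal: the container names, the tiny inflection table and the functions are all hypothetical.

```python
# Units assumed to be stored and retrieved whole
# (a lexical bundle and enumerated variants of a variable expression).
STORED_UNITS = {"the fact that", "a week ago", "a year ago", "a minute ago"}

# Hypothetical inflection table standing in for procedural grammar knowledge.
INFLECTIONS = {"stand": {"base": "stand", "3sg": "stands", "past": "stood"}}

# Canonical forms with a grammatical slot: canonical form -> (lemma, fixed remainder).
SLOT_TEMPLATES = {"stand shoulder to shoulder": ("stand", "shoulder to shoulder")}


def retrieve_whole(sequence: str) -> bool:
    """True if the sequence is available as a ready-made unit."""
    return sequence in STORED_UNITS


def assemble(canonical: str, form: str) -> str:
    """Fill the grammatical slot of a canonical form with the requested inflection."""
    lemma, remainder = SLOT_TEMPLATES[canonical]
    return f"{INFLECTIONS[lemma][form]} {remainder}"


print(retrieve_whole("a week ago"), assemble("stand shoulder to shoulder", "past"))
```

The second route depends on the inflection table being complete and quickly accessible, which is where a richly inflected target language raises the stakes.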
At any rate, it stands to reason that formulaic sequences that vary least will
be stored as single units more easily than those which come in a variety of
forms. As the degree of variation at the level of inflection and word-order is
likely to be greater in synthetic languages than in analytic ones, the demands
on procedural knowledge for achieving fluent and accurate production of for-
mulaic sequences are also likely to be greater for learners of the former type of
language, perhaps especially so if their mother tongue is typologically distant
from it. It is therefore possible that learners of an inflectional language such
as Spanish will reap the benefits of formulaic-sequence mastery for real-time
production only at an even more advanced stage in their interlanguage development
than learners of analytic languages such as English, in accordance with
the extent to which a sizeable repertoire of word strings has been entrenched
in long-term memory and/or more procedural strategies are automatised (Towell
et al. 1996; Schmidt 1992).

6. Conclusion and directions

The findings presented in this article corroborate earlier claims that mastery of
formulaic sequences entails a processing advantage not only for native speakers
but also for L2 learners, although the evidence is more convincing in the L2 En-
glish data than in the L2 Spanish data. While this lends support to pedagogies
aiming to help L2 learners expand and employ their repertoire of formulaic
sequences (e.g., Boers and Lindstromberg 2009; Lewis 1993; Nattinger and
DeCarrico 1992), our analysis also suggests that mastery of L2 formulaic sequences
is more of a challenge in a synthetic language such as Spanish than in
an analytic language such as English, perhaps especially so when the learner’s
L1 leans towards the analytic pole of the typology continuum too. The classes
of formulaic sequences that are not fully fixed tend to be amenable to more
inflectional changes in Spanish than in English. Depending on the model of
automatisation one adheres to, this means that either more variants will need
to be stored holistically or procedural knowledge of grammar will need to be
sufficiently ‘in place’ to assemble formulaic word strings not only fast but also
accurately for real-time language production. Whatever the case may be, this
finding does not undermine the claim that knowledge of phraseology is useful,
as we obtained positive correlations in both FL cohorts. It merely
indicates that it may take longer for learners of inflectional languages to develop
this knowledge sufficiently for it to have a measurable impact on their
language proficiency, especially under real-time conditions. Also, in the case of
instructed SLA considered here (i.e., classroom-based learning), it seems that
learners of inflectional languages would benefit from a comparatively greater investment
in focus on form or even focus on forms (Long 2001; Doughty and Williams
1998),7 and this would then hold true not only for the learning of formulaic
sequences, of course. In short, while our findings suggest that formulaic sequences
serve the purpose of stepping stones that help learners attain a certain
level of fluency in oral interaction, many of these stepping stones may stay
slippery for longer in the case of synthetic languages.
We need to acknowledge, of course, that the different strengths of associa-
tion between formulaic sequence use and oral proficiency scores between L2
English and L2 Spanish that we have attested here may not be entirely due
to the nature of these target languages per se. Although no statistically sig-
nificant differences were found between the two groups’ proficiency scores,
we cannot rule out the possibility that the assessors of L2 Spanish were more
lenient in their scoring than the assessors of L2 English, due to different ex-
pectations. While the two cohorts were roughly matched as far as the quantity
of classroom-based instruction over time in either language was concerned,
we cannot ignore the possible influence of the broader circumstances in which
our participants were learning either of these languages. As was mentioned
above, students in Flanders tend to get much more out-of-class exposure to
English than to Spanish. As a consequence, learners of English may also get
more opportunities for exemplar-based learning – including opportunities for
the incidental acquisition of formulaic sequences. If the L2 English cohort was
slightly more advanced, after all, then this could add an explanation for the
stronger correlations between formulaic-sequence counts and oral proficiency
scores. The results of previous studies have suggested that it is especially at advanced
proficiency levels that the contribution made by formulaic sequences
becomes most tangible (Eyckmans 2007; Forsberg 2009). A replication of the
study with the participation of more rigorously matched groups of learners
would be worthwhile.
Besides replication, new studies may wish to extend the collection of data.
Our investigation considered students’ performance in just one particular speaking
task (i.e., a re-tell task). Future research will have to confirm whether the
same trends are observed in other (real-time) speaking activities, such as a conversation,
where pragmatic formulae or interaction routines play a greater part.
It is also likely that the appropriate use of some phrases (e.g., back to square
one) will affect a rater’s perception of a learner’s level of proficiency more
than others (e.g., You know what I mean). A qualitative breakdown per type
of formulaic sequence, while very intricate, could help fine-tune the attested
association of formulaic-sequence mastery and oral proficiency.

7. Focus on form refers to pedagogical approaches in which learners’ attention is drawn to linguistic
elements during a communicative activity, whereas focus on forms “entails teaching
discrete linguistic structures in separate lessons in a sequence determined by syllabus writers”
(Laufer and Girsai 2008: 2).

Erasmushogeschool Brussel
<helene.stengers@ehb.be>
Victoria University of Wellington
<frank.boers@vuw.ac.nz>
Vrije Universiteit Brussel
<housen@vub.ac.be>
Erasmushogeschool Brussel
Vrije Universiteit Brussel
<june.eyckmans@vub.ac.be>


Appendix
Examples of morpho-syntactic errors encountered in the re-tell tasks

L2 Spanish

– los solteros holandos
– las parejas casados
– unos contactos sexual
– suelen ser bastantes satisfecho
– el primer razón
– ahora la universidad quiera investigar
– las personas solteros
– mueren más rápida
– los viudos son más propensas a . . .
– los países europeas
– cuatros experimentos
– tiene mucho influencia
– según una investigación hecho en . . .
– son contento en su relaciones
– las personas que han investigadas
– disfruten más de la vida
– no puedan hablar
– estas síntomas
– la investigación llevado a cabo
– una día laboral
– ellos son más feliz
– una vida más aislado
– una mejora salud
– mueran más jóven
– estos soltero no consideren que
– las personas que nunca se casa
– no es que no quieren . . .
– los resultados son las siguientes:
– mejor que no fumas y buscas un novio
– más de dos millión de solteros
– los americanos investigadas
– son muy desesperado
– se cuenta dos millones de
– muchos más personas
– esto juegue un papel
– hay poco investigación
– suelen ser alegro
– muchos sufran de la sida
– no quieren vivir sola
– contar su expieriencias a su parejas

L2 English

– more happier
– look after themself
– much more singles are
– desperate in search of
– two millions
– the factors that influences this are

References

Accou, Sien. 2009. The repertoire of German figurative idioms. MA dissertation. Brussels: Eras-
mus University College Brussels.
Altenberg, Bengt. 1993. Recurrent verb-complement constructions in the London-Lund Corpus.
In Jan Aarts, Pieter de Haan & Nelleke Oostdijk (eds.), English language corpora: Design,
analysis and exploitation, 227–245. Amsterdam: Rodopi.
Anderson, John R. 1993. Rules of the mind. Hillsdale: Lawrence Erlbaum.
Ellis, Rod & Gary Barkhuizen. 2005. Analysing learner language. Oxford: Oxford University
Press.
Bauer, Laurie. 2003. Introducing linguistic morphology. Washington, D.C.: Georgetown University
Press.


Boers, Frank, June Eyckmans, Jenny Kappel, Hélène Stengers & Murielle Demecheleer. 2006.
Formulaic sequences and perceived oral proficiency: Putting a lexical approach to the test.
Language Teaching Research 10. 245–261.
Boers, Frank & Seth Lindstromberg. 2009. Optimizing a lexical approach to instructed second
language acquisition. Basingstoke: Palgrave Macmillan.
Conklin, Kathy & Norbert Schmitt. 2008. The processing advantage of formulaic sequences. Applied
Linguistics 29. 72–89.
Dechert, Hans W. 1984. Second language production: Six hypotheses. In Hans W. Dechert,
Dorothea Möhle & Manfred Raupach (eds.), Second language productions, 211–230.
Tübingen: Gunter Narr Verlag.
DeKeyser, Robert. 2001. Automaticity and automatization. In Peter Robinson (ed.), Cognition and
second language instruction, 125–151. Cambridge: Cambridge University Press.
Doughty, Catherine & Jessica Williams (eds.) 1998. Focus on form in classroom second language
acquisition. Cambridge: Cambridge University Press.
Ellis, Nick C. 1996. Sequencing in SLA: Phonological memory, chunking, and points of order.
Studies in Second Language Acquisition 18. 91–126.
Ellis, Nick C. 2002. Frequency effects in language acquisition: A review with implications for
theories of implicit and explicit language acquisition. Studies in Second Language Acquisition
24. 143–188.
Ellis, Nick C. & Richard Schmidt. 1997. Morphology and longer distance dependencies: Labora-
tory research illuminating the A in SLA. Studies in Second Language Acquisition 19. 145–171.
Ellis, Nick C., Rita Simpson-Vlach & Carson Maynard. 2008. Formulaic language in native and
second language speakers: Psycholinguistics, corpus linguistics, and TESOL. TESOL Quar-
terly 42. 375–396.
Erman, Britt & Beatrice Warren. 2000. The idiom principle and the open choice principle. Text 20.
87–120.
Eyckmans, June. 2007. Taking SLA research to interpreter training: Does knowledge of phrases
foster fluency? In Frank Boers, Jeroen Darquennes & Rita Temmerman (eds.), Multilingual-
ism and applied comparative linguistics: Pedagogical perspectives, 89–104. Newcastle: Cam-
bridge Scholars Publishing.
Eyckmans, June, Frank Boers & Hélène Stengers. 2007. Identifying chunks: Who can see the wood
for the trees? Language Forum 33. 85–100.
Forsberg, Fanny. 2008. Le langage préfabriqué: Formes, fonctions et fréquences en français parlé
L2 et L1. Bern: Peter Lang Verlag.
Forsberg, Fanny. 2009. Formulaic sequences: A distinctive feature at the advanced/very advanced
levels of second language acquisition. In Emmanuelle Labeau & Florence Myles (eds.), The
advanced learner variety: The case of French, 173–197. Bern: Peter Lang Verlag.
Foster, Pauline. 2001. Rules and routines: A consideration of their role in task-based language
production of native and non-native speakers. In Martin Bygate, Peter Skehan & Merrill Swain
(eds.), Researching pedagogic tasks: Second language learning, teaching, and testing, 75–93.
London: Longman.
Foster, Pauline & Peter Skehan. 1996. The influence of planning on performance in task-based
learning. Studies in Second Language Acquisition 18. 299–324.
Granger, Sylviane. 1998. Prefabricated patterns in advanced EFL writing: Collocations and
formulae. In Anthony P. Cowie (ed.), Phraseology: Theory, analysis, and applications, 145–160.
Oxford: Oxford University Press.
Grant, Lynn & Laurie Bauer. 2004. Criteria for re-defining idioms: Are we barking up the wrong
tree? Applied Linguistics 25. 38–61.
Haspelmath, Martin. 2002. Understanding morphology. London: Arnold.
Herrera, Honesto & Michael White. 2010. Canonicity and variation in idiomatic expressions:
Evidence from business press headlines. In Sabine De Knop, Frank Boers & Teun De Rycker
(eds.), Fostering language teaching efficiency through cognitive linguistics, 167–187. Berlin:
Mouton de Gruyter.
Housen, Alex & Michel Pierrard. 2005. Investigating instructed second language acquisition. In
Alex Housen & Michel Pierrard (eds.), Investigations in instructed second language
acquisition, 1–27. Berlin: Mouton de Gruyter.
Hulstijn, Jan. 2007. The shaky ground beneath the CEFR: Quantitative and qualitative dimensions
of language proficiency. The Modern Language Journal 91. 663–667.
Kaplan, Robert B. 1966. Cultural thought patterns in intercultural education. Language Learning
16. 1–20.
Kuiper, Koenraad. 1996. Smooth talkers: The linguistic performance of auctioneers and
sportscasters. Mahwah, NJ: Lawrence Erlbaum.
Laufer, Batia & Nany Girsai. 2008. Form-focused instruction in second language vocabulary learn-
ing: A case for contrastive analysis and translation. Applied Linguistics 29. 694–716.
Laufer, Batia & Paul Nation. 1995. Vocabulary size and use: Lexical richness in L2 written pro-
duction. Applied Linguistics 16. 307–322.
Lennon, Paul. 1990. Investigating fluency in EFL: A quantitative approach. Language Learning
40. 387–417.
Lewis, Margareta. 2008. The idiom principle in L2 English: Assessing elusive formulaic sequences
as indicators of idiomaticity, fluency, and proficiency. Stockholm: University of Stockholm.
PhD thesis.
Lewis, Michael. 1993. The lexical approach: The state of ELT and a way forward. Hove: Language
Teaching Publications.
Lewis, Michael. 1997. Implementing the lexical approach: Putting theory into practice. Hove:
Language Teaching Publications.
Lewis, Michael (ed.). 2000. Teaching collocation: Further developments in the lexical approach.
Hove: Language Teaching Publications.
Logan, Gordon. 1988. Toward an instance theory of automatization. Psychological Review 95.
492–527.
Long, Michael H. 1991. Focus on form: A design feature in language teaching methodology. In
Kees de Bot, Ralph B. Ginsberg & Claire Kramsch (eds.), Foreign language research in cross-
cultural perspective, 39–52. Amsterdam: John Benjamins.
Luoma, Sari. 2004. Assessing speaking. Cambridge: Cambridge University Press.
MacKay, Donald G. 1982. The problem of flexibility, fluency, and speed-accuracy trade off in
skilled behaviour. Psychological Review 89. 483–506.
Malvern, David D., Brian J. Richards, Ngoni Chipere & Pilar Durán. 2004. Lexical diversity and
language development: Quantification and assessment. Basingstoke: Palgrave Macmillan.
Moon, Rosamund. 1998. Fixed expressions and idioms in English. Oxford: Clarendon Press.
Myles, Florence, Janet Hooper & Rosamond Mitchell. 1998. Rote or rule: Exploring the role of
formulaic language in classroom foreign language learning. Language Learning 48. 323–364.
Nattinger, James R. & Jeannette S. DeCarrico. 1992. Lexical phrases and language teaching. Ox-
ford: Oxford University Press.
Nesselhauf, Nadja. 2003. The use of collocations by advanced learners of English and some im-
plications for teaching. Applied Linguistics 24. 223–242.
Read, John & Paul Nation. 2004. Measurement of formulaic sequences. In Norbert Schmitt (ed.),
Formulaic sequences: Acquisition, processing and use, 23–35. Amsterdam: John Benjamins.
Schmidt, Richard W. 1992. Psychological mechanisms underlying second language fluency. Stud-
ies in Second Language Acquisition 14. 357–385.
Schmitt, Norbert (ed.). 2004. Formulaic sequences: Acquisition, processing and use. Amsterdam:
John Benjamins.
Schmitt, Norbert, Sarah Grandage & Svenja Adolphs. 2004. Are corpus-derived recurrent clus-
ters psycholinguistically valid? In Norbert Schmitt (ed.), Formulaic sequences: Acquisition,
processing and use, 127–147. Amsterdam: John Benjamins.
Segalowitz, Norman S. 2003. Automaticity and second languages. In Catherine Doughty &
Michael H. Long (eds.), The handbook of second language acquisition, 282–308. Oxford:
Blackwell.
Skehan, Peter. 1998. A cognitive approach to language teaching. Oxford: Oxford University Press.
Slobin, Dan I. 2002. Verbalized events: A dynamic approach to linguistic relativity and
determinism. In Susanne Niemeier & René Dirven (eds.), Evidence for linguistic relativity, 107–138.
Amsterdam: John Benjamins.
Stauble, Ann-Marie E. 1978. The process of decreolization: A model for second language
development. Language Learning 28. 29–54.
Stengers, Hélène. 2007. Is English exceptionally idiomatic? Testing the waters for a lexical ap-
proach to Spanish. In Frank Boers, Jeroen Darquennes & Rita Temmerman (eds.), Multilin-
gualism and applied comparative linguistics: Pedagogical perspectives, 107–125. Newcastle:
Cambridge Scholars Publishing.
Stengers, Hélène. 2009. The idiom principle put to the test: An exercise in applied comparative
linguistics. Brussels: Vrije Universiteit Brussel. PhD thesis.
Talmy, Leonard. 2000. Toward a cognitive semantics, Vol. 1: Concept structuring systems.
Cambridge, MA: MIT Press.
Towell, Richard, Richard Hawkins & Nives Bazergui. 1996. The development of fluency in ad-
vanced learners of French. Applied Linguistics 17. 84–119.
Vermeer, Anne. 2000. Coming to grips with lexical richness in spontaneous speech data. Language
Testing 17. 65–83.
Weinert, Regina. 1995. The role of formulaic language in second language acquisition: A review.
Applied Linguistics 16. 185–205.
Willis, Dave. 1990. The lexical syllabus: A new approach to language teaching. London: Collins
ELT.
Wray, Alison. 2002. Formulaic language and the lexicon. Cambridge: Cambridge University Press.
Wray, Alison & Michael R. Perkins. 2000. The function of formulaic language: An integrated
model. Language and Communication 20. 1–28.
Yuan, Fangyuan & Rod Ellis. 2003. The effects of pre-task planning and on-line planning on
fluency, complexity and accuracy in L2 oral production. Applied Linguistics 24. 1–27.