
KONINKLIJKE VLAAMSE ACADEMIE VAN BELGIE VOOR WETENSCHAPPEN EN KUNSTEN

COMPLEXITY, ACCURACY AND FLUENCY IN SECOND LANGUAGE USE, LEARNING & TEACHING
29-30th March 2007 Van Daele, S., Housen, A., Kuiken, F., Pierrard, M. & Vedder, I. (eds.)


CONTACTFORUM


Proceedings of the contact forum "Complexity, Accuracy and Fluency in Second Language Use, Learning & Teaching" (29 and 30 March 2007, principal applicants: Prof. Michel Pierrard and Alex Housen), supported by the Koninklijke Vlaamse Academie van Belgie voor Wetenschappen en Kunsten (Royal Flemish Academy of Belgium for Science and the Arts). Apart from aligning the typeface and the paragraphs with the guidelines for the publication of the proceedings, the Academy has made no other changes to the text. The content, the order and the structure of the texts are the responsibility of the principal applicant (or editors) of the contact forum.

KONINKLIJKE VLAAMSE ACADEMIE VAN BELGIE VOOR WETENSCHAPPEN EN KUNSTEN
Paleis der Academiën, Hertogsstraat 1, 1000 Brussel

No part of this book may be reproduced in any form, by print, photo print, microfilm or any other means without written permission from the publisher.
Copyright 2007 KVAB D/XXXX/XXXX/XX Printed by XXXX


Complexity, Fluency and Accuracy in Second Language Use, Learning & Teaching

CONTENTS

1. Preface
2. Coarticulatory resistance as a basis for foreign accent: V-to-V coarticulation in German VCV-sequences: A pilot study
   Henrike Baumotte, Mario Lenz & Grzegorz Dogil
3. Measure for measure: Why type/token ratio based measures are not valid to assess lexical complexity/richness as a dimension of language proficiency
   Bram Bulté
4. Definitions of complexity
   Östen Dahl
5. Les compétences syntaxiques et lexicales dans les notions de fluidité, de complexité et d'exactitude
   Jean-Marc Defays, Sarah Deltour & Audrey Thonard
6. The effect of task complexity on fluency and functional adequacy of speaking performance
   Nivja H. de Jong, Margarita P. Steinel, Arjen Florijn, Rob Schoonen & Jan H. Hulstijn
7. Performance accuracy affected by control over bilingual language production: A study of balanced L2 users
   Julia Festman, Antoni Rodriguez-Fornells & Thomas Münte
8. Functional-cognitive correlates of complexity in the use of the English-gerund participle with perception verbs
   M. Angeles Gomez
9. Speaking and Writing in L2 French: Exploring effects on fluency, complexity and accuracy
   Jonas Granfeldt
10. Fluency and accuracy in the written production of L2 French
   Cecilia Gunnarsson
11. The effects of pre-task planning on L2 narrative tasks
   Haemoon Lee & Miyoung Oh
12. EFL learner performance variation as the effect of interlocutor type
   Haemoon Lee, Kyungjin Joo, Jungwon Moon & Yunsun Hong
13. Paraphrase as a tool for achieving lexical competence in L2
   Jasmina Milicevic
14. Complexité, exactitude et fluidité: Le rôle que jouent les séquences préfabriquées dans l'interlangue des débutants
   Florence Myles
15. Bilingual reading: An essential factor for the acquisition of written competence in a third language
   Helena Roquet Pugès & Carme Pérez Vidal
16. Stades de développement en français - perspectives historiques et futures
   Suzanne Schlyter, Jonas Granfeldt & Malin Ågren
17. Complexity, accuracy, fluency and lexis in task-based performance: A meta-analysis of the Ealing Research
   Peter Skehan & Pauline Foster
18. The complexities of selecting complex (& simple) forms in instructed SLA research
   Nina Spada & Yasuyo Tomita
19. Short-term changes in complexity, accuracy and fluency: Developing progress-sensitive proficiency tests
   Alan Tonkyn
20. Complexity, accuracy and fluency in second language acquisition research
   Richard Towell
21. Psycholinguistic mechanisms underlying the manifestation and development of 2nd language complexity, accuracy and fluency
   Siska Van Daele, Alex Housen & Michel Pierrard

PREFACE

One of the most frequently measured aspects of human behaviour is undoubtedly people's language proficiency. Speakers and learners of second or foreign languages in particular often have their proficiency in the second language assessed. But exactly what is it that is being assessed? What makes a second language speaker (or a native speaker for that matter) a proficient or non-proficient language user? And how can this be most efficiently (i.e. validly, reliably and feasibly) measured? Many SLA researchers and L2 practitioners assume that the construct of L2 proficiency is compositional in nature, and that its principal linguistic components can be captured by the notions of fluency, accuracy and complexity. Both within SLA research and in L2 pedagogy, these terms have long been used to investigate the development, the processing and the use of an L2, but their exact meanings and functions are still not clear.

A review of the literature shows that the origins of this three-fold distinction lie in research on L2 pedagogy, where in the 1980s a distinction was made between fluent and accurate L2 usage in order to investigate the development of oral L2 proficiency in classroom contexts. One of the first to use this dichotomy was Brumfit (1984), who distinguished between fluency-oriented activities, which stimulate spontaneous L2 production, and accuracy-oriented activities, which focus on linguistic form and the controlled production of grammatically accurate oral performance. In the 1990s the dichotomous approach towards the development of oral L2 proficiency was broadened by adding a third dimension, complexity (e.g. Skehan, 1992, 1996). Fluency, accuracy and complexity have thus been traditionally defined as, respectively, speaking with native-like rapidity (Lennon, 1990:390), generating error-free utterances, and using a wide range of structures and vocabulary (Wolfe-Quintero, Inagaki & Kim, 1998:4).
As such, these three constructs have figured predominantly (and prominently) as dependent variables in many SLA studies, including studies on ultimate attainment in SLA and on the effects of individuality features, age, learning context, instruction, task type and the year-abroad experience. Collectively, this research suggests that complexity, accuracy and fluency emerge as distinct components of L2 proficiency which may be differentially manifested under different task conditions and which may be differentially developed by different types of learners and under different learning conditions.

Since the mid-1990s, inspired by advances made in cognitive psychology and psycholinguistics (cf. Anderson, 1993; Levelt, 1989), fluency, accuracy and complexity have also increasingly figured as the primary foci of investigation, that is, as independent variables (e.g. Towell & Hawkins, 1994; Skehan, 1996, 1998; DeKeyser, 1998; Robinson, 2001; Segalowitz, 2000). Here fluency, accuracy and complexity emerge as primary epiphenomena of the psycholinguistic mechanisms and processes underlying the acquisition, representation and processing of L2 knowledge. It is now generally assumed that complexity and accuracy are both primarily linked to the current state of the learner's partly declarative, explicit and partly procedural, implicit interlanguage knowledge (L2 rules and lexico-formulaic knowledge), whereby complexity is viewed as the scope of expanding or restructured second language knowledge and accuracy as the conformity of second language knowledge to target language norms (Wolfe-Quintero et al., 1998:4). Thus, complexity and accuracy relate primarily to L2 knowledge representation, or the level of analysis of internalized linguistic information. In contrast, fluency is primarily related to the learner's control over his linguistic L2 knowledge, as reflected in the speed and efficiency with which he accesses relevant L2 information to communicate meanings in real time, with "control improv[ing] as the learner automatizes the process of gaining access" (Wolfe-Quintero et al., 1998:4).

There is less consensus about the relationship between fluency, accuracy and complexity and about the way these constructs should be measured. From a methodological perspective, the measurements used in SLA research to tap fluency, accuracy and complexity have come under scrutiny in recent years. Several studies have catalogued the various global and specific measures that assess each construct and examined their reliability, validity, comparability and practical feasibility (Ortega, 1999; Polio, 1997, 2003; Robinson, 2005; Skehan, 2003; Wolfe-Quintero et al., 1998). The great variety of measures used reflects the lack of consensus on how fluency, accuracy and complexity should be defined as constructs.

A second problem concerns the question to what extent these three dimensions are in(ter)dependent in both L2 acquisition and L2 production. According to Ellis (1994), an increase in fluency in L2 acquisition may well occur at the expense of the development of accuracy and complexity, due to the differential development of knowledge analysis and knowledge automatisation in L2 acquisition and the ways in which different forms of implicit and explicit knowledge influence the acquisition process. The differential evolution of fluency, accuracy and complexity is furthermore caused by the fact that the psycholinguistic processes involved in using L2 knowledge are distinct from those involved in acquiring new knowledge.
"To acquire the learner must attend consciously to the input and, perhaps also, make efforts to monitor output, but doing so may interfere with fluent reception and production" (Ellis, 1994:107). Researchers who subscribe to the view that the human attention mechanism and processing capacity are limited (e.g. Bygate, 1999; Skehan, 1996, 1998; Skehan & Foster, 1997, 1999) also see fluency as an aspect of L2 production which competes for attentional resources with accuracy, while accuracy in turn competes with complexity. Learners may focus (consciously or subconsciously) on one of the three dimensions to the detriment of the other two. A different view is proposed by Robinson (2001, 2003), who claims that learners can simultaneously access multiple and non-competitional attentional pools; as a result, manipulating task complexity by increasing the cognitive demands of a task can lead to simultaneous improvement of complexity and accuracy.

This colloquium was conceived as an international forum for presenting and comparing current converging approaches and methods that different disciplines (linguistic, cognitive, neurolinguistic, pedagogic) bring to the study of complexity, fluency and accuracy as fundamental dimensions of language proficiency in second/foreign language use, development, learning and teaching. The purpose of the colloquium, then, was not only to disseminate recent research findings and to test the thinking about one's research, but also to raise interest in collaboration and to identify means of working together on research procedures and theoretical models.

The present volume presents a selection of the contributions to this colloquium. Included are not only the plenary and invited papers but also several of the regular presentations. Collectively the different chapters reflect the wide range of questions addressed during the presentations and during the panel discussion which concluded the colloquium. These questions include the following:

How can complexity, fluency and accuracy be envisaged and defined as constructs?
What are possible linguistic, cognitive, psycholinguistic and neurolinguistic correlates of complexity, fluency and accuracy?
How can complexity, fluency and accuracy be operationalized and measured?
How do complexity, fluency and accuracy develop in the process of L2 learning? To what extent are they interdependent?
Which factors influence the development of complexity, fluency and accuracy in L2 learning and use? To what extent are they open to pedagogic intervention?

Clearly this list is not exhaustive and, indeed, many more questions are addressed in the contributions to this volume.

REFERENCES

Anderson, J.R. (1993). Rules of the Mind. Hillsdale, New Jersey: Lawrence Erlbaum.
Brumfit, C.J. (1984). Communicative Methodology in Language Teaching. Cambridge: Cambridge University Press.
Bygate, M. (1999). Quality of language and purpose of task: Patterns of learners' language on two oral communication tasks. Language Teaching Research 3, 185-214.
DeKeyser, R. (1998). Beyond focus on form: Cognitive perspectives on learning and practicing second language grammar. In C. Doughty & J. Williams (Eds.), Focus on Form in Classroom Second Language Acquisition (pp. 42-63). Cambridge: Cambridge University Press.
Ellis, R. (1994). A theory of instructed second language acquisition. In N. Ellis (Ed.), Implicit and Explicit Learning of Language (pp. 79-114). London: Academic Press.
Lennon, P. (1990). Investigating fluency in EFL: A quantitative approach. Language Learning 40, 387-417.
Levelt, W.J.M. (1989). Speaking: From Intention to Articulation. Cambridge, MA: MIT Press.
Ortega, L. (1999). Planning and focus on form in L2 oral performance. Studies in Second Language Acquisition 21, 109-148.
Polio, C. (1997). Measures of linguistic accuracy in second language writing research. Language Learning 47, 101-143.
Polio, C. (2003). Research on second language writing: An overview of what we investigate and how. In B. Kroll (Ed.), Exploring the Dynamics of Second Language Writing (pp. 35-65). New York: Cambridge University Press.
Robinson, P. (2001). Task complexity, task difficulty, and task production: Exploring interactions in a componential framework. Applied Linguistics 22(1), 27-57.
Robinson, P. (2003). Attention and memory during SLA. In C. Doughty & M. H. Long (Eds.), Handbook of Second Language Acquisition (pp. 631-678). Oxford: Basil Blackwell.
Robinson, P. (2005). Cognitive complexity and task sequencing: Studies in a componential framework for second language task design. International Review of Applied Linguistics 43, 1-32.
Segalowitz, N. (2000). Automaticity and attentional skill in fluent performance. In H. Riggenbach (Ed.), Perspectives on Fluency (pp. 200-219). Ann Arbor: University of Michigan Press.
Skehan, P. (1992). Strategies in second language acquisition. Thames Valley University Working Papers in English Language Teaching 1, 178-208.
Skehan, P. (1996). A framework for the implementation of task-based instruction. Applied Linguistics 17, 33-62.
Skehan, P. (1998). A Cognitive Approach to Language Learning. Oxford: Oxford University Press.
Skehan, P. (2003). Task based instruction. Language Teaching 36, 1-14.
Skehan, P. & Foster, P. (1997). Task type and task processing conditions as influences on foreign language performance. Language Teaching Research 1, 185-211.
Skehan, P. & Foster, P. (1999). The influence of task structure and processing conditions on narrative retellings. Language Learning 49, 93-120.
Towell, R. & Hawkins, R. (1994). Approaches to Second Language Acquisition. Clevedon, UK: Multilingual Matters.
Wolfe-Quintero, K., Inagaki, S., & Kim, H.-Y. (1998). Second Language Development in Writing: Measures of Fluency, Accuracy, and Complexity (Technical Report no. 17). Honolulu, HI: University of Hawai'i, Second Language Teaching and Curriculum Center.

ACKNOWLEDGEMENTS

The editors gratefully acknowledge the contributions of all participants at the Colloquium on Complexity, Fluency and Accuracy in Second Language Use, Learning & Teaching (Brussels, 29-31 March, 2007), who included the authors of the chapters in this volume and Rick De Graaff and Gabriele Pallotti, who served as discussants. Both the colloquium and the present volume were made possible by the financial contributions of the following agencies: The Royal Flemish Academy for Science and the Arts, Belgium (KVAB) and the Research Foundation Flanders (FWO).


COARTICULATORY RESISTANCE AS A BASIS FOR FOREIGN ACCENT: V-TO-V COARTICULATION IN GERMAN VCV-SEQUENCES: A PILOT STUDY

Henrike Baumotte, Mario Lenz & Grzegorz Dogil
University of Stuttgart, Germany

1. INTRODUCTION

This pilot study was carried out as part of the PhD thesis "State of the art of phonetic language aptitude linking phonetic & phonological models to empirical neuro-imaging (neurolinguistic) research", which is integrated in a large-scale project at the Institute of Natural Language Processing, University of Stuttgart, Germany, as well as in the Section of Experimental Magnetic Resonance Neuro-Imaging, Department of Neuroradiology, University of Tübingen, Germany. The project "Language talent and brain activity" started in May 2006. The first major task within the project consists of the implementation of phonetic experiments in order to detect individual differences and to measure the phonetic aptitude of German subjects in both English and their native language. The majority of subjects (male and female) have an academic background and started learning English at school at the age of 10. They had also spent a sizable period of time in an English-speaking environment. The test battery consists of extensive tests of phonetic language ability and assesses pronunciation talent in English. The individual tasks cover speech production, using various elicitation techniques (spontaneous speech, read speech, direct imitation, delayed imitation), as well as speech perception (discrimination and interpretation of prosody as well as accent identification).

In stage 2 of the project, subjects will participate in psychological experiments that include tests of phonological working memory, empathy, personality (e.g., learning style and motivation), verbal as well as non-verbal intelligence (Raven-Test), musicality (Gordon-Test) and mental flexibility. One of the aims of experimental phonetics is to uncover how brain mechanisms control speech production (Roach, 1992: 22). That is why those subjects identified as being particularly skilled or unskilled will be invited to take part in in-depth neuroanatomical and neurofunctional examinations, i.e. brain anatomy based on magnetic resonance (MR) imaging, white and grey matter density measurements (VBM) and white matter fiber tracking (DTI), as well as fMRI, MEG and EEG while undergoing phonetic tests similar to those performed earlier.

It is my particular objective to investigate differences in coarticulatory resistance between talented and untalented speakers of L2 English. Obviously, talented speakers of a foreign language have a less pronounced foreign accent than their untalented counterparts. Speech production is a very complex process demanding the activation of many muscles at the same time (Clark, Yallop & Fletcher, 2007: 11 ff). In the course of speech production, phonological processes play an important role. These processes are understood as the temporal overlap of the movements of different articulators with those of their neighbours during speech. As a result, the vocal tract is influenced at any point in time by more than one segment, a phenomenon known as coarticulation (Ashby & Maidment, 2005: 122 f.; Catford, 2001: 99; Farnetani, 1997: 371; Kühnert & Nolan, 1999: 9). Phonological processes vary across languages and consequently lead to different allophonic variations.

Coarticulation can manifest its influence by the addition of secondary articulation to a sound. Secondary articulation is defined as the fine adjustment of the place of articulation, the addition of nasalisation, or variations in duration (Ashby & Maidment, 2005: 132). Oral vowel articulation is seen as primary articulation. Articulatory gestures can be characterized as inherently context-sensitive (Recasens, 1984: 61). Up to now, coarticulation has been found in all languages analyzed, e.g. in French (Benguerel, Hirose, Sawashima & Ushijima, 1977a, 1977b), in English (Bladon & Al-Bamerni, 1976; Bladon & Nolan, 1977; Lehiste, 1964; Majewski, Rothman & Hollien, 1977), in Catalan (Recasens, 1984a, 1984b; Recasens, Fontdevila & Pallarès, 1995; Recasens & Pallarès, 2001), in German (Recasens, Fontdevila & Pallarès, 1995) and in Polish (Majewski, Rothman & Hollien, 1977). These findings support the assumption that coarticulation is a universal phenomenon (Farnetani, 1997: 376). At the same time, the literature listed above suggests that coarticulation is language-specific, because certain differences occur in phoneme sequences in one language but not in another. This leads to the question of how to handle these interlanguage differences.

According to Stevens and House (1963: 126 f.), Kühnert and Nolan (1999: 8) as well as Recasens (1999: 102 f.), coarticulation differs between speakers because it is governed by the different physical sizes of the vocal tract. In addition, prosodic features including syllable position, stress, speech rate, segmental duration, and syntactic and phonological boundaries play an important role. For example, different speaking rates create different formant frequencies when generating the same vowel. Speaker-specific speaking habits must be considered as well when analysing coarticulation and coarticulatory resistance. Those habitual vocal tract configurations can change in different situations. For example, in order to ensure intelligibility, we try to speak more clearly in a noisy as opposed to a quiet environment, resulting in less coarticulation and more clear-cut articulatory gestures; this is called hyper-articulation or hyper-speech (see also Farnetani & Recasens, 1999: 34 f.).

Coarticulation can be detected using different techniques: via indirect observation (aerometry, electromyography, acoustics) and via direct observation (radiography, endoscopy, photodetection, mechanical devices and more recent techniques such as ultrasound, magnetic resonance imaging, electromagnetic articulometry) (Chafcouloff, 1999: 284). Moreover, the analysis of spectrograms can show changes in acoustic characteristics, which we will investigate in this study. Finally, direct measurements of articulator positions can be performed as well.

An early study of interest for the present examination was conducted by Stevens and House (1963: 119), who examined the influence of consonant environment on vowel formant frequencies. The authors claim that the extent of this influence differs for each vowel depending on consonantal context. They also report systematic shifts in the vowel formant frequencies depending on the place of articulation of the consonant, its manner of articulation, and its voicing characteristics. Furthermore, they demonstrate the displacement of F2 toward a central position when data for vowels in consonantal contexts are contrasted with data for vowels in null contexts (1963: 121). Additionally, they show that voiced consonants lead to generally lower F1 values for vowels, while for their voiceless counterparts, the second formant is higher for front vowels and relatively unchanged for back vowels (except [u]). The comparison of fricative and stop contexts shows no consistent effect on F1, but generally higher F2 values for stop environments with front vowels, particularly [] and []. For fricative environments, higher F2 values are observed with the back vowels [] and [u].

Starting from a grouping of languages according to whether they have a velarized (dark) or non-velarized (clear) variety of the consonant [l], Recasens, Fontdevila and Pallarès (1995) analyzed velarization patterns and vowel-to-consonant coarticulation for German [l] in the sequences [ili] and [ala], contrasted with Catalan [l]. They observed a dorsal contact at the palatal zone for German [l] that goes beyond that for Catalan [l]. Because velarized Catalan [l] is produced with antagonistic tongue dorsum gestures, i.e. tongue dorsum lowering and retraction vs. raising and fronting, it is highly resistant to coarticulation with [i] (Recasens, 1985: 109; Recasens, Fontdevila & Pallarès, 1995: 38).

In view of the research presented above, we hypothesize for our future investigations that untalented speakers of a foreign language will not be able to coarticulate certain phoneme sequences if this requires overcoming the language-specific coarticulation of their mother tongue. This is called coarticulatory resistance, which is defined as the variation of speech segments according to their magnitude (Bladon & Al-Bamerni, 1976: 137). In the present pilot study, we follow Recasens (1984a, 1984b, 1985) and Recasens, Fontdevila and Pallarès (1995) in examining coarticulatory resistance in VCV-sequences embedded in German frequent and non-frequent real words vs. non-words. We perform acoustic measurements of the permeability of consonants to the influence of unrounded vs. rounded vowels. The results of this study should provide valuable input for the subsequent investigations of L2 English production in German subjects.

2. RECORDING PROCEDURE

The stimuli were recorded in a sound-attenuated room at the Institute of Natural Language Processing, Stuttgart, Germany, using an AKG C420 headset. Speech was digitized using a Yamaha O3D digital recording and mixing console at a sampling rate of 48 kHz with 16-bit quantization. Target words were embedded in a carrier sentence and read from a computer screen by a native speaker of German (male, age 25).

3. MATERIAL

The material consists of 60 German frequent vs. less frequent real words as well as 30 non-words. Stimuli and their frequency information were extracted from a large corpus by means of regular-expression queries over sequences of corpus positions, using CQP (Corpus Query Processor). They include VCV-sequences of the form [i: C i:] or [i: C y:]. We tested all German consonants except affricates, the glottal stop and [j]. In order to investigate rounded vs. unrounded vowels, the phonemic context was held constant. We contrasted non-words vs. frequent vs. rare real words, for example: real words: Kalidünger vs. idyllisch; non-words: idi vs. idü.

4. ACOUSTIC ANALYSIS 1

4.1 Method

In order to analyze coarticulatory resistance, the acoustic data were submitted to spectral analysis using the linear prediction coding method (LPC). Following Recasens (1985), vowels were first cut out of the VCV-sequences, and then the frequency of F3 (correlating with roundedness) was taken at the vowel midpoint in order to compare the frequencies produced by the subject. Formant values were measured manually using standard software (WaveSurfer 1.8.5) and read off the LPC spectra taken from the middle section of the vowel. After the measurements had been taken, the values for non-words vs. real words were compared.

4.2 Results

The results of the acoustic analysis for the three categories, i.e. non-words, frequent real words and rare real words, are represented in the following diagrams (Fig. 1-3). These diagrams show the difference values for the third formant for each of the above-mentioned word categories. The main result is the detection of coarticulation or coarticulatory resistance effects with regard to the degree to which a given consonant is permeable to the roundedness vs. unroundedness of the following vowel. Within our study, high difference values indicate coarticulation, while values around 0 Hz indicate coarticulatory resistance (CR): the closer the difference value comes to 0 Hz, the more coarticulatory resistance is present in the corresponding stimulus.

We first discuss the difference values for the non-words. Looking at the first diagram, a division into three groups is obvious. The first group shows high positive values, revealing that these consonants allow a great deal of coarticulation in this context. The second group lies within the low positive and negative values. The third group shows high negative values. In the context of non-words, the consonants [h], [s] and [v] have the largest degree of CR, while [f] and [R] are at the opposite end; the latter consonants allow the highest degree of coarticulation. [d] and [n] are the consonants with the most negative values. Eight consonants show signs of CR: [b, h, m, p, s, S, t, v].

A similar three-part division between high positive values, high negative values and values near 0 Hz is also found in the context of frequent real words. [h] and [b] allow the highest degree of coarticulation. In contrast, [d], [l] and [S] are most resistant. No consonants were found with higher negative values. In the context of frequent real words, the number of consonants with CR is nine, i.e. one more than in the non-word context: [d, f, g, k, l, p, s, S].
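The LPC-based formant measurement described in section 4.1 can be illustrated with a minimal sketch. It is not the WaveSurfer procedure used in the study: it assumes the plain autocorrelation method of linear prediction, omits the windowing and pre-emphasis that are customary on real speech, and runs on a synthetic vowel-like signal. The function names, the 16 kHz sampling rate, the LPC order and the formant and bandwidth values are all hypothetical choices made for the illustration.

```python
import numpy as np


def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC; returns the polynomial [1, a1, ..., a_order]."""
    frame = np.asarray(frame, dtype=float)
    # Autocorrelation of the frame at lags 0..order
    r = np.array([np.dot(frame[: len(frame) - k], frame[k:]) for k in range(order + 1)])
    # Solve the Toeplitz normal equations R a = -r[1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1:])
    return np.concatenate(([1.0], a))


def formants_from_lpc(a, sample_rate):
    """Formant estimates in Hz, taken from the angles of the LPC polynomial roots."""
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]  # keep one root of each conjugate pair
    return np.sort(np.angle(roots) * sample_rate / (2 * np.pi))


def synth_vowel(formant_hz, bandwidth_hz, sample_rate, n_samples):
    """Impulse response of an all-pole filter with the given resonances (test signal only)."""
    poles = np.array([
        np.exp(-np.pi * b / sample_rate + 2j * np.pi * f / sample_rate)
        for f, b in zip(formant_hz, bandwidth_hz)
    ])
    a = np.real(np.poly(np.concatenate((poles, poles.conj()))))
    y = np.zeros(n_samples)
    for n in range(n_samples):
        acc = 1.0 if n == 0 else 0.0  # unit impulse as excitation
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y[n] = acc
    return y


SAMPLE_RATE = 16000  # assumed for the demo; the study recorded at 48 kHz
frame = synth_vowel([300, 2300, 3000], [80, 100, 120], SAMPLE_RATE, 2048)
formants = formants_from_lpc(lpc_coefficients(frame, order=6), SAMPLE_RATE)
f3 = formants[2]  # third formant, the measure associated with roundedness in the text
```

On recorded speech the frame would be taken around the vowel midpoint, and the LPC order would have to be chosen with the sampling rate and the speaker in mind.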

[Figure 1: Coarticulation & coarticulatory resistance in non-words. Vertical axis: Formant 3 [Hz], approx. -300 to 600. Horizontal axis: stimulus pairs i:Ci: vs. i:Cy: for C = b, d, f, g, h, k, l, m, n, p, R, s, S, t, v.]

[Figure 2: Coarticulation & coarticulatory resistance of frequent real words. Vertical axis: Formant 3 [Hz], approx. -200 to 800. Horizontal axis: stimulus pairs i:Ci: vs. i:Cy: for C = b, d, f, g, h, k, l, m, n, p, R, s, S, t, v.]

[Figure 3: Coarticulation & coarticulatory resistance of rare real words. Vertical axis: Formant 3 [Hz], approx. -600 to 800. Horizontal axis: stimulus pairs i:Ci: vs. i:Cy: for C = b, d, f, g, h, k, l, m, n, p, R, s, S, t, v.]

Figures 1-3 show the difference values [Hz] for the third formant, calculated by subtracting the F3 values for [i:] in unrounded context from the F3 values in rounded context, for non-words (Fig. 1), frequent real words (Fig. 2) and rare real words (Fig. 3). SAMPA notation has been used for the transcriptions.
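The difference values plotted in the figures can be sketched as a small computation. The subtraction follows the caption (F3 in rounded context minus F3 in unrounded context); the 100 Hz cut-off for "values around 0 Hz" and the F3 figures are hypothetical, since the study reports neither a threshold nor the raw measurements.

```python
def f3_difference(f3_rounded_hz, f3_unrounded_hz):
    """Difference value as in the figure caption: rounded minus unrounded context."""
    return f3_rounded_hz - f3_unrounded_hz


def classify(diff_hz, threshold_hz=100.0):
    """Label a difference value; threshold_hz is an assumed cut-off, not from the study."""
    if abs(diff_hz) <= threshold_hz:
        return "coarticulatory resistance"
    return "high positive" if diff_hz > 0 else "high negative"


# Hypothetical F3 midpoint values in Hz: (rounded context, unrounded context)
measurements = {
    "h": (2950.0, 2930.0),
    "f": (3400.0, 2900.0),
    "d": (2700.0, 2950.0),
}

labels = {c: classify(f3_difference(r, u)) for c, (r, u) in measurements.items()}
```

The three labels mirror the three groups described in the results: values near 0 Hz, high positive values and high negative values.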

The last diagram illustrates differences in rare real words. The consonants [k] and [l] show the strongest coarticulation. The highest degree of CR is found in [f], [h] and [n]. The number of consonants that show tendencies of CR is lower than in the case of the non-word context. [p] and [s] are situated on the negative end. With regard to these results, it must be concluded that no consonant exhibits CR in all three contexts. The consonant [h] shows up as resistant in the context of rare real words and in the non-word context. In contrast to [h], [R] shows a high degree of permeability in all three contexts. 5. ACOUSTIC ANALYSIS 2 The results of the first acoustic analysis might have been confounded by accent effects caused by the production of larger units as opposed to syllables. Earlier studies (Fowler, 1981; 18

Magen, 1997) focused on stress effects on V-to-V coarticulation considering word distances. In those studies, an examination of stress was necessary to determine how coarticulation depends on neighbouring vowels. According to Fowler (1981), stressed vowels coarticulate less with their neighbours and at the same time have less coarticulatory influence on them. In contrast, unstressed vowels have considerably more coarticulatory impact on their neighbours. In our study, we recorded the target words within a higher prosodic level, i.e. sentences. In a second step of analysis we therefore tried to refine the interpretation of our data by concentrating on intonation. We expected this to resolve the inconsistent results of the first acoustic analysis. For this reason, we took into account the four hypotheses set up by Cho (2004: 144 f.), which deal with sentence-level stress or accent. Cho (2004: 146) distinguished different stress patterns: no stress or stresses below primary, primary word stress (lexical stress) and sentence stress (nuclear pitch-stress) (see also Shattuck-Hufnagel & Turk, 1996).

Hypothesis 1: Accent-induced coarticulatory reduction. According to Cho (2004: 144), accented vowels show less coarticulation with adjacent vowels than unaccented vowels.

Hypothesis 2: Boundary-induced coarticulatory reduction. Following Cho (2004: 144), vowels coarticulate less at domain edges than in domain-medial position, and V-to-V coarticulation is weaker across a higher prosodic boundary than across a lower one.

Hypothesis 3: Duration-dependent coarticulatory reduction. The degree of V-to-V coarticulation correlates with the V-to-V interval: the shorter this interval, the greater the degree of coarticulation.

Hypothesis 4: Accent-induced coarticulatory aggression. An accented vowel exerts greater coarticulatory influence on the neighbouring vowels than an unaccented one.

In summary, these four hypotheses state that prosody, and in particular intonation, influences coarticulatory effects.

5.1 Method

The duration of the consonantal segment was measured from the offset of V1 to the onset of V2. Afterwards, the sentences were transcribed (by one transcriber) following the criteria of the GToBI(S) transcription system (Mayer, 1995). The prosodic structure of the target stimuli was analyzed within the recorded sentence, because it depends on a broader, hierarchical level of stress, which ultimately determines the rhythmic structure (e.g. Beckman & Edwards, 1994). Intonation was measured using WaveSurfer as mentioned above (see 4.1 Method). The prosodic description included F0, intensity and segmental duration. Two different prosodic boundaries were observed: first, H*L, which refers to a high target on the accented/tonic syllable followed by a falling pitch; second, %, which identifies an intonational phrase boundary at the end of a higher prosodic domain (IP boundary). In order to test H3, we displayed the three categories (non-words, frequent real words and infrequent real words) in scatterplots. Afterwards, a statistical analysis in R, a program for statistical computing, was used to determine whether length and coarticulatory resistance values correlate.
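The correlation check described for H3 can be sketched as follows. This is a minimal illustration in Python rather than R, and it is not the authors' script: the function name `pearson_r` and all numbers are assumptions made up for the example, not measurements from the study. It only computes the correlation coefficient (the value reported as "cor" by a call such as R's `cor.test`); a significance test would additionally require the t distribution.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values for six stimuli (NOT the study's data):
# consonant duration in ms and formant difference value in Hz.
durations_ms = [62.0, 85.0, 71.0, 110.0, 54.0, 93.0]
diff_values_hz = [120.0, 40.0, 310.0, 95.0, 15.0, 260.0]

r = pearson_r(durations_ms, diff_values_hz)
print(f"r = {r:.3f}")
```

Under H3, shorter intervals should go with more coarticulation, so a clearly negative coefficient between duration and degree of coarticulation would be expected; a coefficient near zero would speak against the hypothesis.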

5.2 Results

This section starts with a summary of the results for the applicability of Cho's first hypothesis, beginning with the non-word category. The focus on accentuation makes it impossible to apply H1 and H4 to non-words, because the vowels could not be assigned a clear accentuation. No such difficulties arose in the other two contexts. The word-pair Profischiedsrichter (professional referee) - Knieschützer (knee-protectors) seems to confirm the first hypothesis for frequent real words. A further example accounted for by H1 is the word-pair legitim (legitimate) - Kreditüberwachung (credit control) (rare real words). But we also found some counterexamples which contradict the first hypothesis: the word-pairs konsolidiert (consolidated) - idyllisch (idyllic), Saloniki - Maniküre (manicure) and Stabilität (stability) - illyrisch (Illyrian). Although the majority of the examples in the frequent real word context were in accord with H1, we cannot conclude that this hypothesis works in all cases. For the test of rare real words against H1, we found many incorrect predictions and only a few correct ones. The word-pair Therapiewiderstand (therapy-resistance) - Autonomiewünsche (wishes for autonomy) shows coarticulatory reduction, as the first hypothesis predicts. liniert (lined) - Vinyl (vinyl) and minimieren (minimize) - Theoriemüdigkeit (tired of theory) represent two counterexamples to the first hypothesis. Our data lead to the conclusion that H1 does not generally predict the correct results.

Checking the validity of H2 is quite difficult, as some of the compound words in the experiment contain the relevant segments at domain edges, but nevertheless coarticulate a lot. In the non-word context, the second hypothesis is not applicable, because all non-words are three-letter sequences consisting of two syllables, so that the only prosodic boundary is a syllable boundary. Thus, a comparison between different prosodic boundaries is not possible. As for the rare real word context, all words with one exception are compounds. Compounds are built from component words, and the junction between the components is a word boundary. Only one word is not a compound, so its relevant boundary is a syllable boundary. In the word-pair liniert (lined) - Vinyl (vinyl), CR can be detected. The second hypothesis is not applicable here because the word consists of two syllables, which simply indicates a low boundary. For the word-pair legitim (legitimate) - Kreditüberwachung (credit control) (boundary-induced coarticulatory reduction) the second hypothesis is applicable. The comparison in this context is quite difficult. Therefore, H2 can best be tested in the final context, i.e. the frequent real words. A better picture of the applicability of the second hypothesis seems to be possible here because there are sufficient words which have a syllable or word boundary at the crucial letter sequence. There are examples which confirm H2, but there are also counterexamples. Examples that confirm H2 are the word-pairs beliebig (arbitrary) - Tribüne (stand), Risiko (risk) - Energiesystem (energy-system) and Qualifikation (qualification) - Technologieführer (guide for technology), referring to the consonants [b], [s] and [f]. [b] is very permeable and lies at a syllable boundary. [s] is located at a word boundary and is at the same time resistant. Finally, [f] is located at a word boundary and seems to be resistant. On the other hand, [h] in the word-pair Nihilismus (nihilism) - Skihütte (ski-hut) is situated at a word boundary and is very permeable, although according to H2 it should not be. In the word-pair Stabilität (stability) - illyrisch (Illyrian) we find a syllable boundary, but
nevertheless [l] is resistant. In many cases, the second hypothesis seems to predict the correct results, but sometimes it is entirely wrong. In conclusion, we cannot generally apply H2.

Our data lead to the conclusion that the third hypothesis does not apply in the context of non-words either. Although some consonants suggest that H3 makes correct predictions, other consonants speak against it. Thus, [R], [g] and [l] support the third hypothesis. Counterexamples are the sequences containing [f] and [h]. [f] is rather long, but has the second highest degree of coarticulation. By contrast, [h] belongs to the resistant consonants, but it is very short. We used scatterplots to show length as well as coarticulatory resistance of the stimuli used. In the plots no linear relationship could be observed in any of the three categories. Further, Pearson correlation tests at a significance level of 5% did not show any linear relationship either (non-words: p = 0.4315, cor = 0.149; rare real words: p = 0.1279, cor = -0.2843; frequent real words: p = 0.646, cor = 0.0874). Looking at the word-pairs in more detail leads us to the same conclusion. As in the non-word context, the third hypothesis seems to make the wrong predictions. Only a few consonants behave as predicted by H3, and many counterexamples can be found. Examples against the third hypothesis are [k] and [t]. [k] in the word-pair Bikini (bikini) - Galeriekünstler (gallery-artist) presents the highest degree of coarticulation, but it is rather long. [t] in the word-pair legitim (legitimate) - Kreditüberwachung (credit control) belongs to the resistant consonants, but is short. The data of the rare real word context show many counterexamples to the third hypothesis. Thus, we did not find any convincing evidence supporting H3. The examples were taken from the permeable consonants and the resistant consonants. Among the high negative values the result is also non-uniform: both short and long consonants occur. Finally, the third hypothesis does not apply in the context of frequent real words either. Again, some consonants comply with H3, while others do not. The examples [b] and [g] speak against H3. [b] in the word-pair beliebig (arbitrary) - Tribüne (stand) showed a lot of coarticulation and, according to the third hypothesis, should therefore be short, which it is not. [g] in the word-pair korrigiert (corrected) - Industriegüter (industry-goods) is resistant, but it is one of the shortest consonants observed. The consonants' behaviour in all contexts suggests that the third hypothesis cannot be maintained.

In the context of non-words, the influence of coarticulation and CR on the measured values cannot generally be attributed to H4. The problem of accent assignment mentioned earlier rules out its application. The power plot for the word-pair Nihilismus (nihilism) - Skihütte (ski-hut) shows that the accent is on [y]. Listening to these stimuli from the frequent real word class gives the same auditory impression. It is not possible to verify H4, and we therefore conclude that H4 is not applicable in the frequent real word context.


Figure 4: Sample of GToBI(S) transcription for Ich habe Skihütte gesagt. (I have said ski-hut.) with ACC-UNACC prosodic boundaries. H*L refers to a high target on the accented/tonic syllable followed by a falling pitch. Final % refers to the IP boundary (Intonational Phrase boundary, end of higher prosodic domain).

By contrast, in the rare real word context, the stress in the word-pair liniert (lined) - Vinyl (vinyl) is on [y:], which is an argument for H4 in this context. Hypothesis 4 seems to account for [i:ky:] and [i:ly:], which are embedded in the word-pairs Bikini (bikini) - Galeriekünstler (gallery-artist) and Energielieferant (supplier of energy) - Spielübersicht (overview of the game). Again, we found examples that contradict H4, such as Kiribati - Energierückstand (residue of energy) and legitim (legitimate) - Kreditüberwachung (credit control). Therefore, the last hypothesis is also not supported by our data. It is, however, more problematic to find explanations for word-pairs like liniert (lined) - Vinyl (vinyl), in which the third formant of the second [i:] is higher than that of the first. According to our basic assumption this should not be the case. Measured values which are not easily distinguishable from those of the first vowel [i:] could be explained by normal frequency fluctuations of the voice. In addition, there is the problem of high negative difference values in the word-pair liniert (lined) - Vinyl (vinyl). These larger negative values cannot be accounted for. We argue that coarticulation is not responsible for them, because the surrounding sounds tend to be of lower frequency. In the word-pair Deponiesickerwasser (disposal site's seepage) - harmoniesüchtig (addicted to
harmony) we calculated a difference of 300 Hz between frequencies. In this case the consonants' third formant values are nearly equal, while all sounds preceding [i:] have lower frequency values. Therefore, coarticulatory influence can be excluded. If we assume that accentuation possibly raises frequency values, we are obliged to explain why this happens only sporadically. The consonants in the pair Energiebilanz (overview of energy consumption) - antibürgerlich (anti-civil) behave very conspicuously, as the difference between the consonants amounts to 550 Hz, whereas the reduction of [i:] is only 100 Hz. It seems to us as though the consonant absorbs coarticulation.

Figure 5: Sample of GToBI(S) transcription for Ich habe harmoniesüchtig gesagt. (I have said addicted to harmony.) with ACC-UNACC prosodic boundaries. H*L refers to a high target on the accented/tonic syllable followed by a falling pitch. Final % refers to the IP boundary (Intonational Phrase boundary, end of higher prosodic domain).

Having examined the applicability of Cho's (2004: 144 f.) hypotheses, we conclude that H1 (Accent-induced coarticulatory reduction) does not hold for our data in all three conditions. The current finding does not support a relation between accented vowels (hyperarticulation) and reduced coarticulation, or vice versa. Likewise, the corpus of data studied here does not provide evidence for hypothesis 2. Due to high coarticulation values in relevant segments at domain edges, the results of the spectral analysis are very contradictory. The third hypothesis was not confirmed on the basis of our data either, as there is no correlation between the degree of V-to-V coarticulation and the V-to-V interval. Further, accent cannot be found to induce coarticulatory aggression, because the coarticulatory influence of vowels on their neighbours is not always greater when they are accented, and vice versa (H4).

6. DISCUSSION

The comparison between VCV-sequences of the form [i: C i:] or [i: C y:] in non-words, frequent real words and rare real words did not reveal consistent results. As a consequence, we suggest that the observed consonants do not show any regular sensitivity or resistance to roundedness. The unrounded [i:] has no uniform influence on the second rounded vowel of the
stimuli, and we found neither consistent anticipatory nor consistent carry-over coarticulation, i.e. the gestures are not in general anticipated or carried over during the production of a following gesture. However, we could detect regions of high values indicating coarticulation, in contrast to values around 0 Hz, which were taken to indicate coarticulatory resistance. Within the regions of coarticulatory resistance, the difference values were very low, and the lower those difference values, the larger the degree of coarticulatory resistance (CR). The results of this first analysis suggest that CR might be highly context-dependent and at the same time speaker-dependent. To establish the absence or presence of consistent coarticulation effects or coarticulatory resistance over the three conditions (non-words vs. frequent vs. rare real words), F1 as well as F2 should be investigated in a subsequent study.

In order to check the results described above, we tested the four hypotheses of Cho (2004: 144 f.). We could not find any confirmation for them so far, because the target stimuli were confounded by their embedding in a sentence. The results lend no support to hypothesis 1, Accent-induced coarticulatory reduction. In our stimuli, accented vowels (including hyper-articulation under accent) do not consistently show less coarticulation with adjacent vowels than unaccented vowels. Likewise, there is neither less coarticulation across the board at domain edges than in domain-medial position (boundary-induced coarticulatory reduction), nor less V-to-V coarticulation across a higher than across a lower prosodic boundary. For H3 (Duration-dependent coarticulatory reduction) we could not detect any correlation between the degree of V-to-V coarticulation and the consonantal length in any of the three cases. In conclusion, there is no support for H3 either. At the same time, no evidence for hypothesis 4 (Accent-induced coarticulatory aggression) was found, i.e. the coarticulatory influence of an accented vowel on the neighbouring vowels is not necessarily greater (higher F3 values) than the coarticulatory impact of an unaccented vowel.

Earlier studies (e.g., Recasens, 1984a, 1984b, 1985; Recasens, Fontdevila & Pallarès, 1995) indicated that the investigation of CVC- as well as CV-segments through phonological rules cannot successfully define the presence vs. absence of coarticulation, due to its graded nature and the linguistically relevant aspects of coarticulation which are connected with this graded nature. In other words, the same segments exhibit different degrees of coarticulation across languages. To account for these facts, factors outside the world of phonological features, namely articulatory constraints and aerodynamic-acoustic constraints, also need to be considered (Farnetani, 1997: 390). With regard to our three conditions, tested exclusively in German, no uniform pattern could be identified. Cho (2004) proposes that language-specific investigations should additionally look for stress differences. This is why we tested the applicability of Cho's (2004) hypotheses, which also failed to explain the different influence of roundedness on the three categories in the research presented here.

We are conscious of the fact that further research is necessary to account for CR in German VCV- and VC-sequences, examining more speakers (female and male) in order to consider coarticulation differences or similarities between speakers (Kühnert & Nolan, 1999: 8; Recasens 1999: 102 f.; Stevens & House, 1963: 126 f.). The stimuli for this pilot study were produced by only one male German speaker, so no further conclusions can be drawn from the data. Other phoneme combinations and the analysis of the first and second formant frequencies are necessary to continue research on language-specific coarticulation differences. Cho (2004: 168) suggested with regard to his
results that duration, stress and prosodic boundaries may not be handled in the same way, because coarticulation and duration are evidently related differently in each case. According to various authors (e.g., Cho, 2004; Fowler, 1981; Kühnert & Nolan, 1999) speech rate and phonological boundaries strongly affect coarticulation patterns. In future studies, we will analyze coarticulatory resistance in speakers who have been subjects in the experiments of the project Language Talent and Brain Activity mentioned above. Measurements from the native production data will be compared within the German and English subjects. Taking into consideration the linguistic suprasegmental structure, such as duration, different stress, prosodic boundaries as well as phonological boundaries, speech rate, speaking style and individual vocal tract differences (Farnetani & Recasens, 1999: 32; Kühnert & Nolan, 1999: 28), we will select the data strictly in order to avoid target stimuli which are not in uniformly chosen stress conditions (only syllable productions). This comparison should yield interesting insights with respect to differences in coarticulation between German and L2 English speech as well. The method described will provide objective quantified data that will differentiate talented (less coarticulatory resistance) from non-talented (sustained coarticulatory resistance) L2 learners.

REFERENCES

Ashby, M. & Maidment, J. (2005). Introducing Phonetic Science. Cambridge: Cambridge University Press.
Beckman, M. E. & Edwards, J. (1994). Articulatory evidence for differentiating stress categories. In P. A. Keating (Ed.), Papers in laboratory phonology III: Phonological structure and phonetic form, (pp. 7-33). Cambridge: Cambridge University Press.
Benguerel, A.-P., Hirose, H., Sawashima, M. & Ushijima, T. (1977a). Velar coarticulation in French: an electromyographic study. Journal of Phonetics 5, 159-167.
Benguerel, A.-P., Hirose, H., Sawashima, M. & Ushijima, T. (1977b). Velar coarticulation in French: a fiberscopic study. Journal of Phonetics 5, 149-158.
Bladon, R. A. W. & Al-Bamerni, A. (1976). Coarticulation resistance in English /l/. Journal of Phonetics 4, 137-150.
Bladon, R. A. W. & Nolan, F. J. (1977). A video-fluorographic investigation of tip and blade alveolars in English. Journal of Phonetics 5, 185-193.
Catford, J. C. (2001). A Practical Introduction to Phonetics. Oxford: Oxford University Press.
Chafcouloff, M. (1999). Transducers for investigating velopharyngeal function. In W. J. Hardcastle & N. Hewlett (Eds.), Coarticulation. Theory, Data and Techniques, (pp. 284-293). Cambridge: Cambridge University Press.
Cho, T. (2004). Prosodically conditioned strengthening and vowel-to-vowel coarticulation in English. Journal of Phonetics 32, 141-176.
Clark, J., Yallop, C. & Fletcher, J. (2007). An Introduction to Phonetics and Phonology. Oxford: Blackwell Publishing.
Farnetani, E. (1997). Coarticulation and Connected Speech Processes. In W. J. Hardcastle & R. D. Kent (Eds.), Coarticulation in recent speech production models, (pp. 115-133). Oxford: Blackwell Publishers.
Farnetani, E. & Recasens, D. (1999). Coarticulation models in recent speech production theories. In W. J. Hardcastle & N. Hewlett (Eds.), Coarticulation. Theory, Data and Techniques, (pp. 31-68). Cambridge: Cambridge University Press.
Fowler, C. A. (1981). Production and perception of coarticulation among stressed and unstressed vowels. Journal of Speech and Hearing Research 46, 127-139.
Kühnert, B. & Nolan, F. (1999). The origin of coarticulation. In W. J. Hardcastle & N. Hewlett (Eds.), Coarticulation. Theory, Data and Techniques, (pp. 7-30). Cambridge: Cambridge University Press.
Lehiste, I. (1964). Some Acoustic Characteristics of Selected English Consonants. Research Center in Anthropology, Folklore and Linguistics. Indiana University.
Magen, H. (1997). The extent of vowel-to-vowel coarticulation in English. Journal of Phonetics 25, 187-205.
Majewski, W., Rothman, H. B. & Hollien, H. (1977). Acoustic comparisons of American English and Polish. Journal of Phonetics 5, 247-251.
Mayer, J. (1995). Transcription of German intonation - the Stuttgart System. Ms., University of Stuttgart.
Recasens, D. (1984a). V-to-V coarticulation in Catalan VCV sequences: An articulatory and acoustical study. Journal of Phonetics 12, 61-73.
Recasens, D. (1984b). V-to-V coarticulation in Catalan VCV sequences. Journal of the Acoustical Society of America 76, 1624-1635.
Recasens, D. (1985). Coarticulatory patterns and degree of coarticulatory resistance in Catalan CV sequences. Language and Speech 28, 97-114.
Recasens, D., Fontdevila, J. & Pallarès, M. D. (1995). Velarization degree and coarticulatory resistance for /l/ in Catalan and German. Journal of Phonetics 23, 37-52.
Recasens, D. (1999). Lingual coarticulation. In W. J. Hardcastle & N. Hewlett (Eds.), Coarticulation. Theory, Data and Techniques, (pp. 80-104). Cambridge: Cambridge University Press.
Recasens, D. & Pallarès, M. D. (2001). Coarticulation, assimilation and blending in Catalan consonant clusters. Journal of Phonetics 29, 273-301.
Roach, P. (1992). Introducing Phonetics. New York: Penguin Books.
Saville-Troike, M. (2005). Introducing Second Language Acquisition. Cambridge:
Cambridge University Press.
Shattuck-Hufnagel, S. & Turk, A. E. (1996). A prosody tutorial for investigators of auditory sentence processing. Journal of Psycholinguistic Research 25 (2), 193-247.
Stevens, K. N. & House, A. S. (1963). Perturbation of Vowel Articulations By Consonantal Context: An Acoustical Study. Journal of Speech and Hearing Research 6, 111-128.

ACKNOWLEDGEMENTS

The authors would like to thank Matthias Jilka for his valuable input to this paper as well as the Institute for Natural Language Processing, University of Stuttgart, Germany, and especially the DFG project Language Talent and Brain Activity for their support and advice.


MEASURE FOR MEASURE: WHY TYPE-TOKEN RATIO BASED MEASURES ARE NOT VALID TO ASSESS LEXICAL COMPLEXITY/RICHNESS AS A DIMENSION OF LANGUAGE PROFICIENCY

Bram Bulté
Free University of Brussels, Belgium

1. CONTEXT

With the development of computational linguistics and the use of large corpora in research designs, quantitative measures are often used to assess language data. These measures, it seems, have to be first and foremost practical and easy to apply. The actual validity and reliability of these measures, however, is not always clear, and is often questioned (see Laufer & Nation, 1995; Malvern et al., 2004). Language acquisition researchers first strove to create a developmental index, i.e. a global measure for the development of overall language proficiency (see Wolfe-Quintero et al., 1998). In contrast, the measures that are used nowadays typically target one specific dimension of language proficiency, such as fluency, accuracy or complexity. The accuracy and complexity dimensions are sometimes further broken down into even more specific lexical, syntactic or morphological subcomponents. This article takes a critical look at measures of lexical complexity, and particularly at measures based on the type-token ratio.

2. COMPLEXITY AS A DIMENSION OF LANGUAGE PROFICIENCY

When assessing complexity as a dimension of language proficiency, it is important to determine what type of complexity is being measured. Lexical complexity can be defined either as mastering a wide range of lexical items or as mastering a repertoire consisting of complex lexical items. The first definition is based on the information-theoretic definition of complexity, which states that complexity refers to information that cannot be reduced (see Dahl, 2004). A vast vocabulary, consisting of many lexical items, would thus be more complex (less reducible) than a smaller one. This definition focuses on the lexicon as a system (systemic complexity). The second definition of complexity pertains to specific lexical items. Some items may be inherently more complex than others. In language acquisition research, complexity is often (mistakenly) equated with difficulty. Many researchers opt for a pragmatic interpretation of complexity, by assuming that what is acquired later is more complex, or that what is less frequent is more complex. Although these concepts are related, they are not isomorphic. Difficulty might be perceived as a consequence of complexity, and the frequency of linguistic structures and items can in turn be caused by their difficulty. Thus it is often argued that more complex items are less frequent and more difficult to acquire. However, the relative frequency of items may have little to do with complexity as such. The main questions that need to be answered are: (a) what makes one lexical item more complex than another? and (b) what objective criteria can be used to determine the complexity of lexical items?

3. MEASURING LEXICAL COMPLEXITY

Measures that are used to assess complexity should ideally be theoretically grounded. The measures of lexical complexity that are most widely used in language acquisition research are the type-token ratio (TTR) and its derivatives. The TTR is a purely quantitative measure; it only takes into account the number of individual words (tokens) and the number of different words (types) that appear in a given language sample. The qualitative characteristics of the words are thereby ignored. What the TTR measures is the (numerical) diversity of words in a text, or the lexical density. Since the TTR is influenced by the length of the texts that are being examined (it becomes increasingly hard to introduce new, different words in a longer text), different transformations have been proposed to eliminate the intervening effects of text length on the reliability of the TTR. Graph 1 shows how the TTR steadily drops when the length of a text is increased. The graph is based on 10 language samples of elicited narrative and descriptive Dutch discourse produced by five 18-year-old native speakers of Dutch (6BNN1-6BNN5) and five 18-year-old francophone L2 learners of Dutch (6BNF1-6BNF5). Each sample originally consists of more than 400 word tokens. By computing the TTR for the first 2, 10, 50, 100, 200, 300 and 400 tokens, a curve is obtained reflecting the unwanted influence of text length (a consistent measure would generate a straight line). For this (small) corpus, the average TTR for 50 tokens is approximately 0.8, for 100 tokens 0.7, for 200 tokens 0.6 and for 400 tokens 0.5. This illustrates that the TTR values of language samples of different lengths cannot be meaningfully compared.

[Graph 1: type-token ratio (vertical axis) plotted against number of tokens (horizontal axis, 0 to 400) for the samples 6BNN1-6BNN5 and 6BNF1-6BNF5 and their average.]

Graph 1: TTR drops when text-length increases.
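The procedure behind Graph 1 (computing the TTR over the first n tokens of a sample) can be sketched as follows. This is an illustrative sketch only: the function names, the whitespace tokenization and the toy sentence are assumptions made for the example and do not reproduce the corpus or the tooling used for the graph.

```python
def ttr(tokens):
    """Type-token ratio: number of distinct word forms divided by number of running words."""
    if not tokens:
        raise ValueError("empty sample")
    return len(set(tokens)) / len(tokens)

def ttr_curve(tokens, cutoffs=(2, 10, 50, 100, 200, 300, 400)):
    """TTR of the first n tokens, for every cutoff n that the sample is long enough for."""
    return {n: ttr(tokens[:n]) for n in cutoffs if n <= len(tokens)}

# Hypothetical toy sample (12 tokens), split on whitespace.
sample = "the boy walks to the tree and the boy takes an apple".split()

# Smaller cutoffs than in the text, because the toy sample is short.
curve = ttr_curve(sample, cutoffs=(2, 5, 12))
for n, value in curve.items():
    print(n, value)
```

Even in this toy sample the ratio falls as more tokens are included, because function words such as "the" start to repeat while the number of tokens keeps growing.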

The different transformations of the TTR essentially try to mend the shape of this curve (which is similar for different corpora), and turn it into a straight (consistent) line. A first measure, Guiraud's (1959) index (G), divides the number of types by the square root of the number of tokens. But, instead of cancelling out the effect of text length, this measure seems to favour longer texts. The square root in the denominator actually overcompensates for the effect of text length, resulting in an inversely shaped curve (see graph 2).

The fact that G correlates well with other measures of lexical complexity as a dimension of language proficiency (see Vermeer, 2000) may be due to the fact that more proficient language users produce longer texts.

[Graph 2: index of Guiraud (vertical axis, 0 to 14) plotted against number of tokens (horizontal axis, 0 to 400) for the samples 6BNN1-6BNN5 and 6BNF1-6BNF5 and their average.]

Graph 2: Guiraud's index increases when text length increases.
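The direction of this effect can be checked against the approximate average TTR values quoted earlier for this corpus, since G = types / sqrt(tokens) can be rewritten as TTR * sqrt(tokens). The sketch below is a rough illustration under that rewriting; the function name is an assumption, and the inputs are the rounded averages from the text, not the per-sample data behind Graph 2.

```python
import math

def guiraud_from_ttr(ttr, n_tokens):
    """Guiraud's index G = types / sqrt(tokens), rewritten as TTR * sqrt(tokens)."""
    return ttr * math.sqrt(n_tokens)

# Approximate corpus averages quoted in the text (number of tokens -> TTR).
average_ttr = {50: 0.8, 100: 0.7, 200: 0.6, 400: 0.5}

g_values = {n: guiraud_from_ttr(t, n) for n, t in average_ttr.items()}
for n, g in g_values.items():
    print(n, round(g, 2))
```

With these inputs G rises from roughly 5.7 at 50 tokens to 10 at 400 tokens, while the TTR falls from 0.8 to 0.5 over the same range: the square root overcorrects, so the longer sample now receives the higher score.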

A second measure, the index of Herdan (H), offers a logarithmic transformation, by dividing the logarithm of types by the logarithm of tokens. But this transformation does not seem to be able to cancel out the effect of text length either. In the Uber index (U) a more complicated logarithmic transformation is used (see Dugast, 1980), and this measure is probably more successful at neutralizing the effect of text length. "D" finally seems to offer the best mathematical transformation (see Malvern et al., 2004). The Guiraud, Herdan and Uber indexes all consist of one mathematical formula that favours longer texts over shorter ones. Such a formula is always a hypothetical approximation. D is calculated in a more individual way, by considering each transcription independently. A curve is created by measuring the TTR of different (random) parts of a language sample, with varying length. It is the shape of this curve that determines the value of D. Since this article concerns the foundations of the measures (which are the same for all of these measures), it is not necessary to go into too much detail about the methodology used to calculate "D". One has to keep in mind that these (mathematical) transformations of the TTR do not change the basic premise of the measure. All transformations are simply designed to neutralize the effect of text length. None of them adds any qualitative weight to the TTR: they are all exclusively based on the distinction between types and tokens and fundamentally measure the same construct.

4. DO THE MEASURES MEASURE WHAT WE WANT THEM TO MEASURE?

If these measures are applied to assess lexical complexity as a dimension of the language proficiency of a language user, they have to indicate whether the language user does indeed master a large vocabulary and can produce or understand complex lexical items. I argue that, for several (interrelated) reasons, the type-token ratio is not suited for this task: a) the concepts type and token ignore the qualitative characteristics of words, b) the TTR only looks at a language sample in isolation, c) complexity in performance is not necessarily an indication of proficiency, d) one language sample is not sufficient to assess lexical complexity, e) the TTR
is too much influenced by the task type and context, and f) the TTR is too sensitive to stylistic choices and features.

4.1 The quality of quantity

Consider the following (fictional) text samples:

The little boy walks to a tree. He takes one apple and eats it. Then he runs away. Next he goes where there is an animal and he gives food. => 26 types / 30 tokens

It is difficult to measure how complex a complex system really is. Complexity is per definition multi-layered. In order to assess it, the type of complexity must be determined. => 23 types / 30 tokens
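The counts for the first sample can be reproduced with a few lines of code. This is a minimal sketch, not the counting procedure used for the article: the function name and the tokenization rule (lower-cased alphabetic word forms, with hyphenated forms kept as one token) are assumptions, and counts for forms such as "multi-layered" depend on that tokenization choice.

```python
import re

def types_and_tokens(text):
    """Return (number of types, number of tokens) for lower-cased word forms."""
    # Assumed tokenization: runs of letters, hyphenated forms kept together.
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return len(set(tokens)), len(tokens)

sample = (
    "The little boy walks to a tree. He takes one apple and eats it. "
    "Then he runs away. Next he goes where there is an animal and he gives food."
)

n_types, n_tokens = types_and_tokens(sample)
print(n_types, n_tokens)  # 26 30
```

Nothing in this count refers to what the words are: replacing each word with a rarer or more abstract one would leave both numbers, and therefore the TTR, unchanged.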

As these examples illustrate, the TTR is a purely quantitative measure in which the quality of words is ignored. "Tree", "boy" and "food" are treated in the same way as "complexity", "multi-layered" or "assess". Although we feel that some words are more complex than others, it remains problematic to determine objectively what makes one word more complex than another. More recently developed measures of lexical complexity try to incorporate a more qualitative component (although based on quantitative data). The lexical frequency profile (Laufer & Nation, 1995) and other rare word measures compare the words in a language sample to existing word frequency lists. As indicated earlier, however, the link between frequency and complexity is not straightforward, and remains to be investigated. It is possible that more complex items are less frequent (as a result of their complexity), and that less frequent items might be more difficult to acquire. Although it is definitely promising to try to distinguish between different words, instead of treating them all the same, it remains to be seen whether looking at their relative frequency really offers the most valid path to follow.

4.2 Solitary confinement

TTR-based measures only look at one language sample in isolation. There are no external criteria and no possibilities for comparison across samples. A TTR does not compare the words in a text to any norm or reference, nor does it compare different texts. The language sample is treated as a closed object, without any context. The lexical frequency profile and other rare word measures compare the words in a text to external frequency lists. Of course, such global frequency lists have to be compiled and made accessible. It is also possible to calculate a corpus-based lexical frequency profile, for instance when no frequency lists are readily available. In this way, different language samples can be compared by looking at the relative frequency of words in the entire corpus that is being investigated. This method has both advantages and drawbacks. The language samples in the corpus are treated as the norm, which can be useful when texts discuss a specific topic or are written in a specific genre. Of course, the question remains to what extent frequency and complexity are related.

4.3 Complexity in performance does not equal proficiency

When complexity is being studied as a dimension of language proficiency, it is regarded as a positive feature. A higher level of complexity in this context contributes to a higher overall level of language proficiency. However, the relationship between lexical complexity and proficiency is not linear. There is in fact a trade-off point, where increasing lexical complexity actually detracts from (perceived) proficiency. Furthermore, proficiency is not only a linguistic phenomenon: sociolinguistic and discourse-pragmatic aspects should also be taken into account. Sometimes it might be considered proficient not to use overly complex language structures. This is a problem that cannot be dealt with by automatic (computerized) calculations of a quantitative measure.

4.4 The sum of one is not all

One language sample usually contains insufficient data to assess complexity as a dimension of language proficiency (see Meara & Fitzpatrick, 2000). To assess the richness or the vastness of vocabulary, it might be more useful to elicit performance or to use purpose-built tests. This raises an important classical question: what is the relationship between what people know (competence) and what they actually do (performance)? That a language learner or user does not use complex vocabulary in one instance of performance does not mean that he is unable to do so. When specific tests are compiled to assess vocabulary, they can be designed in such a way as to include words with different degrees of complexity. When the size or the complexity of the lexicon of a language user has to be assessed on the basis of one (or even more) instances of free language production, a large amount of data is needed to provide a sufficient overview. When only one language sample is examined to determine the proficiency of a language learner, psychological, physical and contextual variables can contaminate the results. Subjects can be tired, bored, not very motivated, and so forth.
4.5 Task and context

It is obvious that the task type can have a considerable impact on the TTR. Different tasks elicit different types of performance. When comparing language users, it is of course possible to provide the same task, but the internal structure and the number of topics discussed are also important factors. It is easier to use more different words when talking about various topics, but of course discussing a variety of topics does not necessarily point to a higher level of lexical complexity. These factors are not taken into account by TTR-based measures, and can contaminate results. This is a problem that is difficult to overcome, since quantification of topic changes is problematic. At any rate, the value obtained on quantitative measures should never be interpreted in isolation, but only in comparison with other values on the same measure computed for matched samples. Different language users might also possess a more complex lexicon in different contexts. They might be specialized and know complex vocabulary about a given topic. A certain type of task or context can privilege some language users, so when the proficiency of a learner is being assessed on the basis of one (or even two) specific language samples, results can be deceptive.

4.6 Style

Repetition can be a stylistic or rhetorical feature. Analyzing Martin Luther King's language proficiency based on his "I have a dream" speech by means of the TTR would lead to peculiar results. Repeating a word or phrase is not necessarily a sign of low language proficiency.

Conciseness, omission or simplification can also be stylistic elements. This should be taken into account when assessing lexical complexity, but doing so in an automatic (computerized) way seems to be impossible.

5. OBJECTIVE ASSESSMENT OF QUANTITATIVE MEASURES

It is often argued that measures that are able to discriminate between groups of language learners at different stages of development are good measures. This type of reasoning is somewhat circular, since the measures are designed to check whether there are any differences in the first place. Another way of assessing validity is by looking at the correlation between quantitative measures and purpose-built tests and subjective ratings (concurrent validity). This can also be deceptive, since a positive correlation says nothing about causality, and can be the result of external correlating variables ("post hoc fallacy"). That two measures correlate positively does not mean that they measure the same construct or are causally linked to each other. Probably the most valid way of checking the reliability of the measures is to have a group of language users complete the same type of task two or three times at a given moment in time. By checking the correlation between the scores for the different language samples, it can be determined whether the measures are consistent or not.

6. ALTERNATIVES AND LOOKING FORWARD

It has to be kept in mind that this article focuses on measures of lexical complexity as a dimension of language proficiency. TTR-based measures can be useful to assess the lexical diversity of single language samples, but only if the samples are considered as objects, and no claims are being made about the proficiency level of the language user that produced the sample. Even if this is the case, the link between this measure and theories of (lexical) complexity is not very strong. TTR-based measures do not carry much qualitative weight. They have one major advantage (and this might help to explain why they are still often used in research): they are easy to apply to large corpora and very practical to use. When TTR-based measures are used in research, "D" is probably the best option when the length of the language samples is variable.

But what are the alternatives to TTR-based measures? The lexical frequency profile and other measures that look at rare words add a more qualitative and external dimension to the measurement, but the link between frequency and complexity is not firmly established. Some of the problems that affect TTR-based measures apply to rare word measures as well: does one free instance of performance contain enough data to assess lexical complexity as a dimension of language proficiency? What about the appropriateness of utterances and stylistic features? Text length can also play a role with this kind of measure. It is obvious that longer texts offer more opportunity to use less frequent words. To neutralize the effect of text length, the ratio of rare words to non-rare words (or different frequency categories) can be determined, but then again this percentage can be deceptive.

A measure that is both easy to compute (automatically) and firmly rooted in complexity theory is hard to operationalise. Designing a tool that incorporates different aspects of the quality of words is a huge task. Coding has to be done for every language, and for every single word in that language. Maybe measures based on frequency lists can be used to filter out the most common words, and researchers can assess the remaining words manually, based on objective criteria that have been determined beforehand. It does not lie within the scope of this article to try to determine what criteria should be considered, but possible candidates are: semantic features (abstract or concrete words, ...), number of phonemes and number of morphemes.

Another alternative to objective corpus-based assessment of language proficiency is subjective rating. A group of trained professionals can judge the lexical complexity of a given sample, and try to form an image of how this relates to the language proficiency of the language user that produced the sample. Inter-rater reliability can be checked by looking at the degree of correlation between the scores given by different raters. I argue that when objective quantitative measures are used to assess proficiency, they should always be combined with subjective ratings.

Purpose-built tests are probably still a more valid way of assessing the vastness and complexity of vocabulary. If possible, performance-based tests should be accompanied by tests that try to assess the competence of a learner, by looking at what he knows, rather than what he does. Combining quantitative measures with qualitative assessment offers a wider picture of language proficiency. Of course, purpose-built tests can also be criticised for several reasons. An important advantage is that they can be adapted to the specific circumstances: the context and the (expected) global proficiency level of the language users that are being assessed can be taken into account.

Maybe TTR-based measures can give an indication of lexical complexity as a dimension of language proficiency, but this indication is at best rough, and the measures are definitely not appropriate to detect small differences in proficiency. It can be assumed that TTR-based measures are better suited to assess learners at lower levels of proficiency, and better suited to assess oral language samples. At more advanced stages of language proficiency they are unable to discriminate between different language users.

All research aimed at assessing or measuring lexical complexity as a dimension of language proficiency using TTR-based measures should be treated with much caution. The fact that no significant increase in (complexity as a dimension of) proficiency was found, or that no significant differences between groups of subjects were discovered, does not mean that no such increase or difference exists. Quantitative assessment methodology is not rigorous enough and has been insufficiently tested to be (exclusively) applied in research designs. Research results can be contaminated by shortcomings in methodology, and this can lead to misleading conclusions.

REFERENCES

Baker, D. (1989). Language Testing. A Critical Survey and Practical Guide. London: Edward Arnold.
Biber, D., Conrad, S. & Reppen, R. (1998). Corpus Linguistics: Investigating Language Structure and Use. Cambridge: Cambridge University Press.
Clapham, C. & Corson, D. (Eds.) (1997). Encyclopedia of Language and Education. Volume 7: Language Testing and Assessment. Dordrecht: Kluwer Academic Publishers.
Cumming, A. (1997). The testing of second-language writing. In C. Clapham (Ed.), Language Assessment, Vol. 7 of the Encyclopedia of language and education (pp. 51-63). Dordrecht: Kluwer.
Cummins, J. (1984). Bilingualism and Special Education. Issues in Assessment and Pedagogy. Clevedon: Multilingual Matters.
Dahl, Ö. (2004). The Growth and Maintenance of Linguistic Complexity. Amsterdam: John Benjamins.
Daller, H., Van Hout, R. & Treffers-Daller, J. (2003). Lexical richness in the spontaneous speech of bilinguals. Applied Linguistics 24(2), 197-222.
Dugast, D. (1980). La statistique lexicale. Genève: Slatkine.
Ellis, R. (1997). Second Language Acquisition. Oxford: Oxford University Press.
Guiraud, P. (1959). Problèmes et méthodes de la statistique linguistique. Dordrecht: Reidel.
Housen, A. & Pierrard, M. (2004). Meertaligheid in Nederlandstalig secundair onderwijs in Brussel: verslag van een effectenstudie. In Brusselse Thema's. Deel 12: Taal, Attitude en Onderwijs in Brussel (pp. 9-33). Brussel: VUBPress.
Housen, A., Pierrard, M. & Van Daele, S. (2005). Structure complexity and the efficacy of explicit grammar instruction. In A. Housen & M. Pierrard (Eds.), Investigations in Instructed Second Language Acquisition (pp. 235-267). Berlin/New York: Mouton de Gruyter.
Laufer, B. & Nation, P. (1995). Vocabulary size and use: Lexical richness in L2 written production. Applied Linguistics 16(3), 307-329.
MacWhinney, B. (1995). The CHILDES Project. Mahwah, NJ: Lawrence Erlbaum.
Malvern, D., Chipere, N., Richards, B. & Durán, P. (2004). Lexical Diversity and Language Development: Quantification and Assessment. Houndmills: Palgrave Macmillan.
Meara, P. & Fitzpatrick, T. (2000). Lex30: an improved method of assessing productive vocabulary in an L2. System 28, 19-30.
Meyer, C. F. (2002). English Corpus Linguistics. An Introduction. Cambridge: Cambridge University Press.
Pienemann, M. (1998). Language Processing and Second Language Development: Processability Theory. Amsterdam: John Benjamins.
Scholfield, P. (1995). Quantifying Language. Clevedon: Multilingual Matters.
Steels, L. (2000). Language as a complex adaptive system. In M. Schoenauer (Ed.), Proceedings of PPSN VI. Lecture Notes in Computer Science (pp. 17-26). Berlin: Springer-Verlag.
Van Hout, R. & Vermeer, A. (1988). Spontane taaldata en het meten van lexicale rijkdom in tweedetaalverwerving. Toegepaste Taalwetenschap in Artikelen 32, 108-122.
Van Mensel, L., Pierrard, M. & Housen, A. (2004). Taalvaardigheid van Nederlandstalige en Franstalige leerlingen in het Nederlandstalig secundair onderwijs in Brussel. In A. Housen, M. Pierrard & P. Van de Craen (Eds.), Brusselse Thema's. Deel 12: Taal, Attitude en Onderwijs in Brussel (pp. 67-110). Brussel: VUBPress.
Vermeer, A. (2000). Coming to grips with lexical richness in spontaneous speech data. Language Testing 17(1), 65-83.
Vermeer, A. (2004). The relation between lexical richness and vocabulary size in Dutch L1 and L2 children. In P. Bogaards & B. Laufer (Eds.), Vocabulary in a second language: Selection, acquisition and testing (pp. 173-189). Amsterdam: John Benjamins.
Wolfe-Quintero, K., Inagaki, S. & Kim, H.-Y. (1998). Second language development in writing: Measures of fluency, accuracy, and complexity. Honolulu: University of Hawaii, Second Language Teaching & Curriculum Center.


DEFINITIONS OF COMPLEXITY
Östen Dahl
Stockholm University, Sweden

1. INTRODUCTION

According to the lexical database WordNet, "complexity" has only one meaning, "the quality of being intricate and compounded". In fact, however, the ways the term "complexity" is understood in various disciplines are in themselves complex. In this paper, I shall give a brief account of my own understanding of linguistic complexity, as developed in more detail in my monograph Dahl (2004), relating it to some of the uses of the term "complexity" in the study of second language acquisition. To begin with, we have to make a distinction between two major ways of understanding complexity, which I shall call absolute and agent-related complexity, respectively.

2. ABSOLUTE COMPLEXITY

The first notion (for which I have sometimes used the term "objective complexity") is the one that is employed in information theory and the theory of complex systems, and involves the idea that complexity is an objective property of an object or a system. It is notoriously difficult to give a rigorous definition of complexity in this sense. The general idea is that the complexity of an object is related to the amount of information needed to re-create or specify it (or alternatively, the length of the shortest possible complete description of it). I shall give a simple example to show how this idea could be applied in practice. Suppose we have three strings of characters, "hahaha", "byebye", and "pardon". Although these all consist of 6 characters, they differ in that the first two strings can in fact be represented in a more compact way, e.g. as "3*ha" and "2*bye", whereas there is no way of compressing the string "pardon" in a similar way. We might therefore say that "hahaha" is the least complex string, since it can be reduced to 4 characters, while "byebye" takes minimally 5 and "pardon" 6 characters.
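The three-string example can be checked mechanically. The following Python sketch is only an illustration under a strong assumption: the sole compression device allowed is the "count*unit" notation used above, and the function name is invented for the purpose.

```python
def shortest_repeat_description(s):
    """Return the shortest 'count*unit' description of s, or s itself if none is shorter."""
    best = s
    for unit_len in range(1, len(s) // 2 + 1):
        if len(s) % unit_len:
            continue  # the unit must tile the string exactly
        unit, count = s[:unit_len], len(s) // unit_len
        if unit * count == s:
            candidate = f"{count}*{unit}"
            if len(candidate) < len(best):
                best = candidate
    return best


for word in ("hahaha", "byebye", "pardon"):
    description = shortest_repeat_description(word)
    print(word, description, len(description))
# hahaha 3*ha 4
# byebye 2*bye 5
# pardon pardon 6
```

A richer description language would find more regularities, which is one reason why the measure is hard to define rigorously.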
As applied to strings, this notion of complexity, which is sometimes called "Kolmogorov complexity" or "algorithmic information content", comes out as an inverse of compressibility: the most complex string is one which cannot be compressed at all. If we implement this idea in the most straightforward way, we obtain what Gell-Mann (1994) calls "effective complexity", which differs from Kolmogorov complexity in that it does not measure the length of the description of an object as a whole, but rather the length of the description of the set of regularities or structured patterns that it contains. A random string of characters, such as "w509mf0wr6435217ro0l71734", will have maximal Kolmogorov complexity (the string is its own shortest description), but no effective complexity since it contains no structured patterns. This corresponds better to an intuitive understanding of the notion of complexity. We also come close to a notion which may feel more familiar to linguists: the set of patterns that an object contains can be said to equal its structure, so the complexity of an object is really a measure of the complexity of its structure.
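The inverse relation between complexity and compressibility can be illustrated with an off-the-shelf compressor. This is a rough sketch only: Kolmogorov complexity itself cannot be computed, and the size of zlib output is merely an upper bound on it. The alphabet, the string lengths and the fixed random seed are assumptions chosen for the illustration.

```python
import random
import zlib


def compressed_size(s):
    """Size in bytes of the zlib-compressed string: a crude upper bound, not true Kolmogorov complexity."""
    return len(zlib.compress(s.encode("utf-8"), 9))


rng = random.Random(0)  # fixed seed so that the run is repeatable
alphabet = "abcdefghijklmnopqrstuvwxyz0123456789"
random_string = "".join(rng.choice(alphabet) for _ in range(1200))
patterned_string = "ha" * 600  # same length, but one simple regularity

print(compressed_size(patterned_string), compressed_size(random_string))
```

The patterned string shrinks to a small fraction of the size of the random one. Note that this proxy ranks the random string as the most complex, which is exactly the property that effective complexity is meant to avoid.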

3. SYSTEM COMPLEXITY

In linguistics, such an absolute complexity measure could apply to different things. Most importantly, it could apply on the one hand to a language seen as a system -- what I call system complexity -- and on the other to the structure of utterances and expressions -- what I call structural complexity. The notion of system complexity can be illustrated by comparing two languages with respect to some subcomponent of their grammars. Thus, the English Adjective+Noun construction is less complex than the corresponding one in French, as in English the adjective always has the same form, independent of the head noun, whereas in French the form of the adjective agrees with the noun and thus depends on its gender and number (e.g. petit garçon 'small boy' vs. petite fille 'small girl'). This means that the rules that specify the French construction have to be more complex than the rules for the English construction.

In Dahl (2004), system complexity, more specifically the complexity of grammars, was the focus of my interest: given that a speaker has chosen a set of lexemes to express a certain content, how complex are the rules that specify how these lexemes can be combined to express that content in a grammatical way? For instance, as we have just seen, the rules that determine how words meaning 'small' and 'girl' can be put together in a noun phrase are more complex in French than in English. I was particularly interested in processes of grammatical change that lead to an increase in complexity, in particular to an increase in what I called non-linearity, by which I mean all grammatical phenomena that cannot be described in terms of simple concatenation of lexical elements. Arguably, certain forms of non-linearity, such as the use of ablaut in verb inflection (as illustrated by the strong verbs of Germanic languages), take a long time to develop, and therefore can be seen as mature phenomena in language.
This, in fact, has a rather direct relationship to second language acquisition. In recent years, there has been a reaction to the generally assumed tenet of modern linguistics that human languages do not differ in complexity (see e.g. McWhorter (2001) and the other papers in the same journal issue). It has been argued that there are clear differences in phonological and grammatical complexity between languages and that this is correlated with the degree of contact that the languages in question have had with other languages. In particular, in situations where the transmission of a language from one generation to another is to a significant extent mediated by second-language speakers, there tends to be a reduction of phonological and grammatical complexity, which affects above all mature linguistic structures, i.e. those that have taken a long time to arise.

4. STRUCTURAL COMPLEXITY

If we now turn to the notion of structural complexity, it can of course be applied to one and the same expression at different structural levels -- syntax, morphology etc. -- and there are also various measures that could be used. For instance, we could consider the depth of the maximal sentential embedding in a sentence. On this measure, a sentence such as John said [that Mary said [that she was coming]] is more complex than John said [that he was coming]. Although system complexity is often accompanied by structural complexity, this is not necessarily always the case. For instance, plural formation in nouns is more complex in Swedish than in English, since there are a number of different plural suffixes to choose from (e.g. hund-ar 'dogs' vs. katt-er 'cats'), while in English there is only one (-s) except for a limited number of irregular nouns, but Swedish plural nouns are not structurally more complex than the English ones (cf. hund-ar vs. dog-s).

Another way of viewing the distinction is to say that system complexity concerns competence/langue while structural complexity concerns performance/parole. The notion of complexity that is most common in second language research, in particular when it is used together with the companion notions "accuracy" and "fluency", clearly relates to performance rather than to competence. More specifically, it could be characterized in terms of the average structural complexity of the output of a speaker, as measured for instance in the ratio of embedded clauses to sentences (which, in passing, seems a bit one-sided to me).

5. AGENT-RELATED COMPLEXITY

System complexity could be seen as a measure of the content that language learners have to master in order to be proficient in a language, in other words, the content of their competence. It does not as such tell us anything about the difficulty they have in learning, producing and understanding the language -- that would take us to the other notion of complexity, viz. agent-related complexity. Although agent-related complexity is perhaps the most popular way of understanding complexity in linguistics, I would in fact prefer to reserve the term "complexity" for absolute complexity and use other terms such as "cost", "difficulty", and "demandingness" to denote different aspects of complexity for a user. In Dahl (2004), I defined those terms as follows.

Cost is essentially the amount of resources -- in terms of energy, money or anything else -- that an agent spends in order to achieve some goal. What we can call cost-benefit considerations are certainly of central importance in explaining many aspects of communicative behaviour.
Difficulty is a notion that primarily applies to tasks and is always relative to an agent: it is easy or difficult for someone. Difficulty can be understood and measured in several different ways. One measure of the difficulty of a task is in terms of risk of failure: that is, if a large proportion of all agents fail, or one agent fails more often than he or she succeeds, the task is difficult for that agent or group of agents. There is an indirect relationship here to variation: if results vary, it means that the task is neither maximally easy nor maximally difficult. There is also a relationship to cost: tasks that demand a large expenditure of resources, or in particular those that force the agent to or beyond the limits of his or her capacity, are experienced as difficult.

Demandingness refers to the requirements a task puts on its performers: for instance, if you want to study physics, you have to have a certain knowledge of mathematics. Demandingness is different from difficulty in that the task is difficult only if you do not fulfil the requirements; if you do, it may be very easy. For instance, acquiring a human language natively is certainly demanding (only human children seem to fulfil the requirements), but it does not necessarily follow that children find it difficult.

6. TASK COMPLEXITY

In the context of second language acquisition, Robinson (2001) makes a distinction between task difficulty and task complexity, which is rather similar to the one I make here between difficulty and demandingness. For Robinson, task difficulty depends on learner factors which are a consequence of differentials between learners in their available attentional, memory, and reasoning resource pools, i.e. differentials in the resources they draw on in responding to the demands of tasks, while task complexity (p. 31) is the result of the attentional, memory, reasoning and other information processing demands imposed by the structure of the task on the language learner (p. 29).

In fact, the term "task complexity" has been used widely in the behavioral sciences, as noted by Gill & Hicks (2006), who identify no less than thirteen distinct definitions of task complexity in the literature, which they classify as follows:

Experienced complexity: definitions where complexity is defined in terms of what the task performer experiences.
Information processing complexity: definitions where complexity is defined to be the underlying source of IP capacity requirements or throughput experienced while performing the task.
Problem space complexity: definitions where complexity is measured as a function of the characteristics of a problem space used to perform the task.
Lack of structure complexity: definitions where complexity represents the degree to which a task is fully programmed.
Objective complexity: definitions where complexity is measured as a function dependent upon characteristics strictly specified by the task itself.

It seems that Robinson's two concepts, task difficulty and task complexity, come closest to the first two types of definitions. The last type, objective complexity, may appear similar to what I have called absolute complexity above, but it is not clear from the paper how it is related to that notion -- in fact, while Gill & Hicks discuss the notion of task complexity in detail, they are not very explicit about how they understand the notion of complexity in general.

7. REASONS FOR KEEPING ABSOLUTE AND AGENT-RELATED COMPLEXITY APART

Language obviously involves many different tasks and many different types of agents. This alone is a good reason for not identifying the (absolute) complexity of a language with difficulty, since there is a priori no reason for giving priority to any particular kind of difficulty. Factors of particular importance are, on the one hand, difficulty of processing, and learning or acquisition difficulty on the other. In the latter case, the distinction between first and second language acquisition is evident. When complexity is identified as (or related to) acquisition difficulty, it is more often than not second-language learning that people are thinking of, which is probably natural in view of the fact that it is much harder to identify any variation in the success of first-language acquisition. It is indeed important to know what is easy and what is difficult for learners of second languages, but nevertheless second-language learning difficulty should be labelled as such and not confused with (absolute) complexity. If this is acknowledged, we can formulate and hopefully find the answer to the empirical question of the relationship between second-language learning difficulty and various independently defined notions of complexity. Thus, I am not saying that absolute complexity and difficulty have nothing to do with each other. On the contrary, there is obviously a close relationship between these notions, but in order to be able to be clear about this relationship, we have to keep them apart in the analysis.

I mentioned above that certain changes that lead to a reduction in system complexity are typical of what I have called sub-optimal transmission of languages, i.e. when children get a large part of their linguistic input from second-language speakers. This is in contrast to the high fidelity of transmission obtained when only first-language speakers are involved. It does appear that children who learn their first language and adults who learn a second language differ in what kinds of linguistic complexity they find difficult -- a good reason for keeping absolute and agent-related complexity apart.

REFERENCES

1. INTRODUCTION La recherche que nous prsentons ici est partie dun simple constat: le flou persiste sur lextension donner au terme fluidit et sur les relations quentretiennent la fluidit, la complexit et lexactitude en matire dvaluation des comptences linguistiques. Bien sr, si ces notions sont connues des chercheurs francophones en enseignement, apprentissage et acquisition des langues trangres, elles ne font cependant pas partie de leurs angles de recherche habituels, et lon peut comprendre alors que leurs conceptions restent incertaines. Cependant, dans la littrature scientifique en anglais, o ces descripteurs sont frquemment utiliss, le dbat sur ce que recouvre la fluidit nest pas davantage tranch. Nous nous sommes donc interrogs sur linfluence que les comptences en syntaxe, en lexique et en prononciation exeraient sur la perception que des valuateurs de franais langue trangre ont de la fluidit, de la complexit et de lexactitude. 2. DEFINITIONS ET DEBATS 2.1 La fluidit: une dfinition plurielle? Dans sa premire acception, celle que lui donne le langage courant, le terme fluidit se confond avec la matrise gnrale de la langue (Wood, 2001). Celui qui est fluent comme on dit en franglais dans une langue trangre la parle avec la mme virtuosit, la mme aisance et la mme efficacit que le locuteur natif. Dans ce cas-ci, ds lors, se poser la question des critres qui concourent un usage fluide de la langue trangre, cest sinterroger sur ce que reprsente la matrise de cette langue. Une dfinition aussi large, o la fluidit englobe non seulement la complexit et lexactitude, mais aussi tout ce que lon peut inclure dans la comptence linguistique, nest gure oprante. Elle convient videmment dcrire 42

excellence, the competence of the perfect bilingual, but as soon as it comes to introducing nuances, to appreciating all the intermediate stages that separate the beginning learner from this perfect bilingual, it sends us back to the endless debate on the assessment of linguistic competence: should more weight be given to grammar, to vocabulary, to pronunciation, and so on; should one focus on meaning or on form; should one assess according to this or that norm?

In the communicative perspective on language teaching, the notion of fluency does not cover the same reality. The foreign-language learner who speaks fluently is one who is able, despite the gaps in his or her linguistic system, to achieve a communicative goal and to be understood (Chambers, 1998). From this standpoint, the relationships between fluency, complexity and accuracy are clearly overturned: it is no longer a matter of a fluency that would encompass, among other things, complexity and accuracy, but of a fluency that is contrasted with, and opposed to, these two notions. On the one hand, the emphasis is placed on meaning, on the content of the message; on the other, what matters is formal sophistication and the correctness of the message. In this context, the fluency of a foreign-language learner's speech is judged essentially through a qualitative assessment based on lexical criteria. Fluency then depends on lexical richness (Laufer, 1992) or on lexical variation, in other words on the ability to reformulate one's message when one realizes that one has not been understood. It may also be a function of lexical economy, which allows the speaker to take the most direct route to express ideas, or of lexical sophistication, which consists in using co-occurring words together (Raddaoui, 2004: 15).

Several researchers (Nattinger & DeCarrico, 1992; Towell et al., 1996: 112; Weinert, 1995; Wray & Perkins, 2000) identify another component of this fluency: the use of lexico-grammatical formulas, lexical expressions that play with the boundaries drawn between lexicon and syntax. These are semi-fixed stretches of discourse that the speaker completes according to context. Mastering them is said to be an indispensable condition for reaching a high degree of fluency.

In a final sense, fluency is an essentially temporal, rhythmic and prosodic phenomenon (Schmidt, 1992), limited to the ability to produce continuous oral discourse without untimely interruptions or hesitations (Raupach, 1980). This third conception of fluency once again alters its connections with complexity and accuracy: fluency is regarded as a phenomenon that neither competes with nor dominates complexity and accuracy. It also lends itself more readily to a quantitative assessment based on syntactic or prosodic descriptors. One thus calculates speech rate (the number of words or syllables uttered per minute; Nation, 1989), the rate of pauses, hesitations and self-corrections per T-Unit (see note 1) or per minute, the percentage of time devoted to these pauses, hesitations and self-corrections, and the number of words or the speaking time between two pauses. The nature of the pauses (filled or empty) and their position (less disruptive between two T-Units or two phrases than in the middle of one) are also taken into account (Lennon, 1990). Thus, whereas criteria were lacking to describe fluency in its broadest sense, and the components of communicative fluency seemed somewhat imprecise, here one seems to have fallen into the opposite excess.

However, researchers agree that the most significant elements are the frequency of pauses rather than their length, the duration of speaking time between two silences, and the position of interruptions within an utterance (Chambers, 1998).
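As a rough illustration of the temporal measures listed above, the following Python sketch computes speech rate, pause frequency, the share of time spent pausing, and the mean length of run from a hand-annotated monologue. The `Segment` structure, the function name and the sample figures are hypothetical and are not taken from any of the cited studies; hesitations and self-corrections are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One stretch of an annotated monologue (hypothetical structure)."""
    kind: str          # "speech" or "pause"
    duration_s: float  # length of the stretch in seconds
    words: int = 0     # number of words (speech segments only)


def temporal_fluency(segments):
    """Compute four temporal fluency measures from annotated segments."""
    total_s = sum(s.duration_s for s in segments)
    pauses = [s for s in segments if s.kind == "pause"]
    runs = [s for s in segments if s.kind == "speech"]
    total_words = sum(s.words for s in runs)
    minutes = total_s / 60
    return {
        # Speech rate: words uttered per minute of total time.
        "words_per_minute": total_words / minutes,
        # Pause frequency: number of pauses per minute.
        "pauses_per_minute": len(pauses) / minutes,
        # Share of the total time spent pausing, in percent.
        "pause_time_percent": 100 * sum(p.duration_s for p in pauses) / total_s,
        # Mean length of run: average number of words between two pauses.
        "mean_run_words": total_words / len(runs),
    }


# Invented 30-second sample: two runs of speech separated by pauses.
sample = [
    Segment("speech", 12.0, 30),
    Segment("pause", 3.0),
    Segment("speech", 9.0, 20),
    Segment("pause", 6.0),
]
measures = temporal_fluency(sample)
```

Keeping pause frequency and pause time as separate entries reflects the distinction drawn above between how often a speaker pauses and how long the pauses last.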
1. A T-Unit is a main clause together with all its subordinate elements, whether clausal or not (Hunt, 1970).


2.2 Accuracy: a translation problem?
The English term accuracy denotes both correctness and precision. That is why we chose to translate it into French as exactitude, a word with much the same ambiguity. More than a mere translation problem, this ambiguity tells us something about the nature of accuracy. If one chooses to see it mainly as precision, one leans towards a notion dependent on the lexicon. Conversely, if one gives it the sense of correctness, one subordinates it rather to syntax. Researchers, for their part, appear to have decided: they assess it notably by the percentage of clauses that contain no formal error, or by the percentage of verbs used correctly in terms of tense, aspect, modality, etc. (Yuan & Ellis, 2003).

2.3 Measures of complexity
Complexity, for its part, is measured in terms of the number of clauses per T-Unit, the total number of grammatical verb forms used (that is, the number of different combinations of tense, mood and voice that the speaker employs), and the ratio between the total number of words used and the number of different words used (Yuan & Ellis, 2003). This again shows the preference given to formal criteria and to a quantitative type of assessment.

2.4 The CECR and fluency, complexity, accuracy
Fluency, complexity and accuracy thus raise a good many questions for anyone interested in the teaching, learning and acquisition of foreign languages. Yet these notions remain largely ignored in the Common European Framework of Reference for Languages (Cadre européen commun de référence pour les langues, CECR). The Framework mentions the word fluidité (fluency) only once, without defining it. The term complexité (complexity) is most often used in its everyday sense, in contexts that have nothing to do with the assessment of linguistic performance. Exactitude (accuracy) suffers the same fate.
Two interesting terms do nevertheless appear in the self-assessment grid for the common reference levels: the word correction (correctness), which, although no better defined than the preceding terms, has been preferred to exactitude, and the term aisance (ease), which clearly covers the notion of fluency in its communicative sense. It is indeed specified that the learner who has reached the highest degree of correction in the foreign language is able to express himself or herself in natural discourse, avoiding difficulties or recovering from them so skilfully that the interlocutor hardly notices (CECR, 2001: 28).

3. THE SURVEY
To find out which conceptions of fluency, complexity and accuracy prevail among assessors who rely on the CECR, and what weight they give to pronunciation, vocabulary and grammar in their assessment of fluency, complexity and accuracy, we carried out a survey to which learners of French as a foreign language and specialist assessors contributed.

Seventeen students of French as a foreign language, taking courses at the Institut Supérieur des Langues Vivantes of the Université de Liège in 2007, were recorded for a few minutes, without interruption or encouragement, after being asked the following three questions about a photograph shown to them: What do you see in the photo? What happened before, and what will happen afterwards? What do you feel when you see this photo? The recordings of two of these students were split and mixed in with the others as if they came from different students, giving a total of 19 recordings of one minute thirty each, bearing no indication of the learner's identity.

These oral productions were submitted to 11 teacher-assessors of French as a foreign language, all experienced and trained in the use of the CECR, who were asked to give them a mark between 1 and 5 for each of the criteria we had entered in three parallel marking grids: the first covered vocabulary, grammar and pronunciation; the second concerned fluency, complexity and accuracy; on the third they had to place each monologue at a CECR level.
Table 1: Analytic and synthetic criteria.

Grid                               Criteria                               Scale (one entry per recording, 1 to 19)
First grid of analytic criteria    Pronunciation, Vocabulary, Grammar     1-2-3-4-5 for each criterion
Second grid of analytic criteria   Complexity, Fluency, Accuracy          1-2-3-4-5 for each criterion
Synthetic criterion                CECR level                             A1, A2, B1, B2, C1 or C2

In this way we hoped to discover whether the teachers established a link between the marks they gave for the criteria of the first, second and/or third grid. If their ratings for fluency and for the CECR level tended to rise together, this would indicate that they are more responsive to the first definition of fluency (fluency as command of the language). If they linked the vocabulary and fluency marks, they would be opting for the second definition (communicative fluency). Finally, if they made the fluency results depend on those for grammar and pronunciation, one could conclude that they favour the third definition (prosodic fluency).


3. RESULTS
3.1 Correlations between the criteria
The table below presents the correlations between the marks awarded on the various analytic criteria (pronunciation, vocabulary, grammar; complexity, fluency, accuracy) and for the CECR level. These results were computed from the ratings of the 19 recordings by the 11 assessors. The chart, for its part, presents the same results in another form: each column represents the sum of the correlations that each criterion has with the others.
Table 2: Correlations between the analytic criteria and the CECR.

        P      V      G      C      F      E      CECR
P       1.000
V       0.685  1.000
G       0.703  0.734  1.000
C       0.577  0.791  0.714  1.000
F       0.681  0.724  0.723  0.698  1.000
E       0.651  0.729  0.771  0.724  0.725  1.000
CECR    0.622  0.704  0.643  0.645  0.634  0.608  1.000

Figure 1: Correlations between the analytic criteria and the CECR. [Chart with one column per criterion: P, V, G, C, F, E, CECR.]
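The column heights of the chart, that is, the sum of the correlations each criterion has with the others, can be recomputed directly from Table 2. The Python sketch below does so; the variable and function names are illustrative, and only the numerical values come from the table.

```python
LABELS = ["P", "V", "G", "C", "F", "E", "CECR"]

# Lower triangle of Table 2, one row per criterion.
LOWER = [
    [1.000],
    [0.685, 1.000],
    [0.703, 0.734, 1.000],
    [0.577, 0.791, 0.714, 1.000],
    [0.681, 0.724, 0.723, 0.698, 1.000],
    [0.651, 0.729, 0.771, 0.724, 0.725, 1.000],
    [0.622, 0.704, 0.643, 0.645, 0.634, 0.608, 1.000],
]


def correlation(lower, i, j):
    """Look up r(i, j) in a lower-triangular matrix (the matrix is symmetric)."""
    return lower[max(i, j)][min(i, j)]


def summed_correlations(labels, lower):
    """Sum each criterion's correlations with all the other criteria."""
    n = len(labels)
    return {
        labels[i]: round(
            sum(correlation(lower, i, j) for j in range(n) if j != i), 3
        )
        for i in range(n)
    }


sums = summed_correlations(LABELS, LOWER)
# Print the criteria from the most to the least correlated with the rest.
for label, total in sorted(sums.items(), key=lambda item: item[1], reverse=True):
    print(f"{label:5s} {total:.3f}")
```

With the values of Table 2, vocabulary and grammar obtain the two largest sums and the CECR criterion the smallest.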

None of the analytic criteria is predominant. All the correlations lie between 0.577 (pronunciation/complexity) and 0.791 (vocabulary/complexity). In the course of their assessments, the teachers therefore seem to have taken care to grant each criterion much the same autonomy, and were not influenced by any one of them in particular. These results also refute one of the hypotheses stated above, concerning the synthetic character of fluency as a reflection of general command of the language. This criterion is, in fact, only weakly correlated with that of the CECR, which for its part ought to be synthetic. Yet it is no more strongly influenced by the other criteria.

The second definition of fluency does not seem to hold among the teachers who took part in our survey either. First, fluency, for them, is not tied to command of vocabulary, since the correlation between these two criteria does not exceed 0.724. Second, fluency is not opposed to complexity and accuracy: if it were, one would expect to find, if not negative correlations between the former criterion and the latter two, at least correlations significantly lower than those observed between the other criteria. However, the values we obtain do not seem significantly low (0.698 and 0.725 respectively). Nor do the assessors establish a link between fluency, on the one hand, and grammar and pronunciation, on the other. The correlations of 0.723 and 0.681 remain around the average of the results we obtained. The last definition of fluency, which equates it with an essentially rhythmic and prosodic phenomenon, therefore does not reflect the perception of the teachers we surveyed.

In fact, vocabulary and grammar are the most salient criteria, in that they are strongly correlated with the other criteria. These two criteria are the most influential (or the most influenced), both on the criteria of their own grid and on those of the second grid. In our results, vocabulary is closely linked to complexity; grammar, for its part, depends strongly on accuracy. Overall, the assessors describe as complex a production with a rich and varied vocabulary, and as accurate a production that is syntactically well constructed. They thus share the conception of accuracy held by most researchers in foreign-language teaching, learning and acquisition, but differ from them as regards complexity. The results also show that the correlations between the CECR level and the other criteria lie between 0.608 and 0.704.
This finding is surprising in that, as we have already noted, the CECR criterion ought to be synthetic and therefore strongly influenced by the other criteria. This apparent inconsistency could be explained in three ways. First, the assessors may consider that other elements of communicative competence, not included among the criteria pronunciation, vocabulary, grammar, complexity, fluency and accuracy, are far more important in defining the CECR level. The analytic criteria we selected would then have little influence on the result for the CECR criterion. Second, it is possible that the assessors, who have nonetheless been trained in the use of the CECR and/or train other (future) teachers in it, are unable to use it effectively. This raises serious questions about the relevance and clarity of the descriptors it presents. Finally, one must consider that the teachers may have been inattentive or careless, or may have tired after a few recordings, and that the results we obtained may have been biased.

However, this last explanation is contradicted by some of our data. The criterion least correlated with the others is pronunciation: its correlations with the other criteria lie between 0.577 and 0.703. According to the assessors, then, it is not ruled out that a good student may obtain a poor mark in pronunciation, and vice versa. Yet pronunciation is known to be, in foreign-language assessment, one of the major sources of halo effects. The absence of a strong correlation with pronunciation shows that the teachers were not misled by this phenomenon and remained discerning during the exercise we asked of them. Other results presented below point in the same direction, such as the assessors' consistency when they judge two different productions by the same student.

3.2 Correlations between the grids
Mean (P, V, G) and mean (C, F, E): correlation 0.87
CECR and mean (P, V, G): correlation 0.75
CECR and mean (C, F, E): correlation 0.7

Figures 2, 3 and 4: Schematic representations of the correlations between grids.

Examining the correlations between grids first reinforces one of the findings of the criterion-by-criterion analysis: in the assessors' minds, the CECR grid does not represent the sum of the other criteria, since it is the grid least strongly correlated with the others. The first two grids, by contrast, are strongly correlated with each other. Thus, although, as seen above, the results do not allow us to say exactly what place grammar, vocabulary and pronunciation occupy in the assessors' conception of fluency, phonetic, lexical and syntactic competences do, in their view, form an integral part of the notions of fluency, complexity and accuracy.

3.3 Consistency of the assessors
If one considers the marks awarded by the most severe and the least severe assessor, the smallest gaps between the percentage of marks above the mean and the percentage of marks below the mean concern the vocabulary criterion (a gap of 28%) and the complexity criterion (a gap of 31%). It is therefore on these two criteria (which are, moreover, strongly correlated with each other), and no doubt on their definitions, that the assessors are most unanimous. The largest gaps between the percentage above the mean and the percentage below the mean concern the CECR criterion (a gap of 65%), closely followed by the grammar criterion (a gap of 60%). The analysis of correlations between assessors thus shows once again how widely learners' marks vary when they are assessed on the CECR criterion. Of the 19 recordings making up the sample, 4 are placed at two CECR levels, 9 at three levels and 6 at four levels. More surprising still, one of these oral productions was assigned 5 different levels. Note that the assessors are not more or less severe on any particular analytic criterion.
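A tally of how many distinct CECR levels each recording receives, of the kind reported in section 3.3, can be obtained with a few lines of Python. The ratings below are invented for illustration (the raw ratings of the study are not reproduced here), and the function names are hypothetical.

```python
from collections import Counter


def level_spread(ratings_by_recording):
    """Map each recording to the number of distinct CECR levels it received."""
    return {
        recording: len(set(levels))
        for recording, levels in ratings_by_recording.items()
    }


def spread_histogram(ratings_by_recording):
    """Count how many recordings were placed at 1, 2, 3, ... distinct levels."""
    return Counter(level_spread(ratings_by_recording).values())


# Invented example: three recordings, each rated by a handful of assessors.
ratings = {
    "rec01": ["A2", "A2", "B1"],
    "rec02": ["B1", "B2", "C1", "B1"],
    "rec03": ["A1", "A2"],
}
spread = level_spread(ratings)
histogram = spread_histogram(ratings)
```

A recording on which all assessors agree has a spread of 1; the larger the spread, the less unanimous the assessors.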


3.4 Consistency between the ratings of the two recordings of the same two students
The assessors are self-consistent in the CECR level they assign to the students. Most of them rate the productions of these two students in exactly the same way. They agree more on the level to assign to the "weak" learner than on the level to assign to the "strong" learner. This may be because these teachers are more often exposed to students at beginner (A) and intermediate (B) levels than to students at advanced level (C), whose profile they would find harder to pin down. The assessors are more unanimous among themselves when they mark these two monologues on fluency, complexity, accuracy (second grid) and the CECR level than when they use the criteria of the first grid (vocabulary, pronunciation, grammar).

4. CONCLUSIONS AND PERSPECTIVES
Clearly, the assessors regard fluency neither as general command of the language nor as a purely phonetic and prosodic phenomenon. In their eyes, it is opposed neither to complexity nor to accuracy, and depends neither on pronunciation, nor on syntax, nor on vocabulary. The experiment therefore does not clearly validate any of the three senses of fluency set out above. This does not mean, however, that the assessors see this phenomenon as an undefined and nebulous notion. Indeed, the fluency criterion shows no salient result when one examines the assessors' consistency with one another and their self-consistency when judging two different productions by the same student. This constancy indicates that the assessors have a precise idea of fluency, even if it is not one of those invoked by researchers in foreign-language teaching, learning and acquisition.

Conversely, this survey leads us to question the "common" character of the Common European Framework of Reference. When one observes that the teachers in our survey, although experienced and trained in the use of the CECR, and most of whom belong to the same teaching team, are so far from unanimous when assigning a reference level to an oral production, and do not confirm the synthetic character of this assessment tool, one may question the validity of the approach proposed by the Framework.

However, to obtain still more reliable and more telling results, future research should, on the one hand, increase the number of learners and assessors taking part and, on the other, repeat the experiment with a grid offering a finer scale (marks out of 20 rather than out of 5). It would also be worth verifying that vocabulary and grammar are salient criteria, and assessing learners of the same level while neutralizing one of the variables. Learners at different stages of their learning should also be examined. We could, moreover, cross the teachers' ratings with a quantitative analysis of transcriptions of the oral productions. Finally, the results of this experiment could be compared with those of a survey of the assessors' representations of the notions of complexity, fluency and accuracy.

BIBLIOGRAPHY
Cadre européen commun de référence pour les langues: Apprendre, enseigner, évaluer. (2001). Paris: Conseil de l'Europe/Didier.
Chambers, F. (1998). What do we mean by fluency? System 25(4), 535-544.
Hunt, K.W. (1970). Syntactic maturity in school children and adults. Chicago: University of Chicago Press.
Laufer, B. (1992). How much lexis is necessary for reading comprehension? In P.J.L. Arnaud & H. Béjoint (Eds.), Vocabulary and applied linguistics (pp. 126-132). London: Macmillan.
Lennon, P. (1990). Investigating fluency in EFL: A quantitative approach. Language Learning 40(3), 387-417.
Mizera, G.J. (2006). Working memory and L2 oral fluency. University of Pittsburgh.
Nation, P. (1989). Improving speaking fluency. System 17(3), 377-384.
Nattinger, J.R. & DeCarrico, J.S. (1992). Lexical phrases and language teaching. Oxford: Oxford University Press.
Raddaoui, A.H. (2004). Fluency: A quantitative and qualitative account. The Reading Matrix 4(1).
Raupach, M. (1980). Temporal variables in first and second language speech production. In H.W. Dechert & M. Raupach (Eds.), Temporal variables in speech (pp. 263-270). The Hague: Mouton.
Schmidt, R. (1992). Psycholinguistic mechanisms underlying second language fluency. Studies in Second Language Acquisition 14, 357-385.
Towell, R., Hawkins, R. & Bazergui, N. (1996). The development of fluency in advanced learners of French. Applied Linguistics 17(1), 84-119.
Weinert, R. (1995). The role of formulaic language in second language acquisition: A review. Applied Linguistics 16(2), 180-205.
Wood, D. (2001). In search of fluency: What is it and how can we teach it? The Canadian Modern Language Review 57(4).
Wray, A. & Perkins, M.R. (2000). The functions of formulaic language: An integrated model. Language and Communication 20, 1-28.
Yuan, F. & Ellis, R. (2003). The effects of pre-task planning and on-line planning on fluency, complexity and accuracy in L2 monologic oral production. Applied Linguistics 24(1), 1-27.


THE EFFECTS OF TASK COMPLEXITY ON FLUENCY AND FUNCTIONAL ADEQUACY OF SPEAKING PERFORMANCE [i]
Nivja H. de Jong, Margarita P. Steinel, Arjen Florijn, Rob Schoonen & Jan H. Hulstijn
University of Amsterdam, The Netherlands

1. INTRODUCTION Research on the effect of task type on language performance is important for several reasons. For the purpose of sequencing tasks for syllabus design, for language assessment, and for understanding psycholinguistic mechanisms in different task performances, we need to know the effect of task type on task performance. In this study, we focus on the effect of task complexity on two different measures of task performance: fluency and functional speaking performance. In everyday language use, fluency encompasses overall (oral) proficiency. Being fluent in a second language means having an exceptional, native-like control over that second language. Usually this definition is narrowed down to oral proficiency, although the definition of fluency can even be extended to, for instance, fluency in reading (Segalowitz, 2000). Lennon (1990) distinguishes between two senses of the concept of oral fluency: a broad sense and a narrow sense. The broad definition refers to overall oral proficiency; the narrow sense refers to temporal measures of speech. In this paper, we investigate the effect of task complexity on fluency in the narrow sense. The narrow sense of fluency can be defined as the proficiency to fill time with talk without unnatural hesitations (Fillmore, 1979). Tavakoli & Skehan (2005) note that the narrow sense of fluency has a multifaceted nature. They distinguish between breakdown fluency, speed fluency, and repair fluency. Breakdown fluency can be measured as number of pauses, length of run, and length of pauses. A measure that summarizes these is phonation time ratio: the total length of speech divided by the total time, in other words, the ratio of time filled with speech. Speed fluency refers to how much is actually said per time unit. This can be measured with number of syllables and number of words per second/minute. 
Finally, repair fluency reflects hesitations and repairs and can be measured by counting the number of false starts and hesitations. Together, these measures reflect the different facets of fluency in the narrow sense. In this paper, we will explore how breakdown fluency and speed fluency are affected by task complexity.

According to Robinson's Cognition Hypothesis (2001), increasing cognitive demands heightens second language speakers' attention, pushing the grammatical accuracy and linguistic complexity of their L2 production. However, accuracy and linguistic complexity increase at the cost of fluency: complex tasks are performed less fluently than simple tasks (Gilabert, 2005; Robinson, 2001). According to Skehan and Foster's Limited Attentional Capacity model, complex tasks lead to a decrease in performance (Skehan & Foster, 2001). Following Robinson as well as Skehan and Foster, we predict that task complexity negatively influences non-native speakers' fluency.


However, neither the Cognition Hypothesis nor the Limited Attentional Capacity model makes clear predictions about native speakers' fluency. Native speakers are presumably less influenced by task complexity. Increased cognitive demands may lead to higher attention (Robinson) or fewer resources available (Skehan and Foster), but as speaking is mostly automatic for native speakers, complexity is not likely to have a substantial impact on fluency. On the other hand, Foster (2000) found that an increase in planning time positively affects fluency not only for non-native but also for native speakers. In line with these results, we might also find an attenuated, negative effect of task complexity on fluency for native speakers.

Researchers have explored the effect of task complexity on measures of grammatical accuracy, linguistic complexity, and fluency (e.g., Foster, 2000; Robinson, 2001, 2005; Skehan & Foster, 2001; Gilabert, 2005). However, it is unclear how these specific measures of linguistic output relate to (speaking) proficiency. At the same time, a (speaking) task is intended to result in language use that bears a resemblance, direct or indirect, to the way language is used in the real world (Ellis, 2003). Therefore, in addition to measuring linguistic characteristics such as accuracy, complexity, and fluency, it is important to also take functional performance into account. While performing speaking tasks, speakers are attempting to achieve an outcome that is communicatively adequate, with correct or appropriate propositional content. We will use the term functional speaking performance to refer to how well participants manage to fulfill the criteria set by the speaking situation. According to Skehan and Foster (2001), complex tasks will lead to a decrease in linguistic performance.
They claim that as attentional resources are limited, and learners prioritize meaning over form, an increase in task complexity should lead to a decrease in performance as measured by grammatical accuracy, linguistic complexity, and/or fluency. In terms of Levelt's model of speaking (1989; Levelt, Roelofs, & Meyer, 1999), performance of complex tasks directs attentional resources to the Conceptualizer, leaving fewer resources available for the Formulator. From the Limited Attentional Capacity model, we can therefore predict that individual differences in functional speaking performance are only likely to appear if the cognitive complexity of the task exceeds the ability of the speaker to perform the task adequately. If tasks are sufficiently complex, we predict from Skehan and Foster's model that complex tasks will be performed less well in terms of communicative adequacy. With respect to Robinson's Cognition Hypothesis, it is hard to make predictions, as Robinson assumes that measuring the three linguistic measures accuracy, complexity, and fluency together amounts to the same thing as testing learners' overall functional success on a task (Robinson, 2001). We may hypothesize functional speaking performance to be better in complex speaking tasks, as speakers also perform better in terms of grammatical accuracy and linguistic complexity.

In a large-scale study with non-native speakers as well as native speakers, we will explore the effect of task complexity on non-native and native speakers' fluency. In addition, we investigate the effect of task complexity on ratings of functional speaking performance. Following Robinson's Cognition Hypothesis and Skehan and Foster's model of Limited Attentional Capacity and in line with previous findings (e.g., Gilabert, 2005), we expect task complexity to negatively influence fluency of non-native speakers. For the native speakers, we expect to find a smaller effect of task complexity on fluency.
With respect to functional speaking performance, Robinson's Cognition Hypothesis would predict that complex tasks lead to heightened attention, which should result in higher scores, whereas the model of Skehan and Foster predicts the reverse.

2. METHOD
2.1 Participants
267 participants (208 non-native speakers of Dutch with various L1s and 59 native speakers) were paid to take part in this experiment. The investigation described in this paper is part of a large-scale research project on speaking proficiency. Participants were asked to carry out several linguistic tasks in addition to the speaking tasks reported here. In total, all tasks took approximately two and a half hours (in two sessions). Performance of the speaking tasks comprised the first activity of the first session.

2.2 Materials
We collected speech data using eight different speaking tasks. All speaking tasks were monologues and were administered by a computer program set in Authorware. We created our speaking tasks with contrasts on three dimensions: Complexity, Formality and Discourse Type. The operationalization of Complexity was inspired by the functional descriptors of the Common European Framework of Reference (Council of Europe, 2001). We defined Complexity as a combination of three dimensions: 1) complex tasks contain more elements than simple tasks; 2) complex tasks concern a more general topic, whereas simple tasks concern topics from personal life; 3) complex tasks involve more abstract notions, whereas simple tasks involve mostly concrete notions. Complexity was crossed with Formality (see Table 1). Although the tasks elicited monologues addressed to the computer, the task instructions specifically mentioned the audience that the participant should address in each task, and participants were instructed to 'role play' and act as if they were actually speaking to these different people.
In this way, we created 4 speaking tasks in formal settings (e.g., speaking to a judge or a neighborhood meeting with an audience of 100 people), and 4 speaking tasks in informal settings (e.g., speaking to a good friend). To obtain a broad range of types of speech data, we created a descriptive and an argumentative task (Discourse Type) for each of the cells in Table 1. We thus created four complex tasks and four simple tasks, balanced on Formality and Discourse Type. In Table 2, we give a short description of each of the 8 tasks.
Table 1: Eight speaking tasks differing in Complexity and Formality.

           Informal                          Formal
Complex    1 argumentative / 1 descriptive   1 argumentative / 1 descriptive
Simple     1 argumentative / 1 descriptive   1 argumentative / 1 descriptive

Table 2: Short descriptions of the 4 simple and 4 complex speaking tasks.

Descriptive, informal
Simple: Describe a living room to a friend: you are visiting friends who have moved a while ago and you are on the phone describing their living room to another friend. The living room is shown on the screen in a picture.
Complex: Describe a graph you have seen this morning in the newspaper. Your friend is unemployed and you have just seen a graph depicting unemployment figures in the last twelve years, with information of differences in unemployment for men versus women. You describe the graph (shown on the screen) to your friend.

Argumentative, informal
Simple: Your sister has to choose between two options: follow a two-year course in the week-time and work in the weekends or work at a company where she can follow a four-year course. Advise your sister to choose the second option.
Complex: You are discussing the problem of traffic jams with a friend. Convince your friend that your solution is best (choose between building more roads, building more bicycle paths, or improving public transport). Discuss the environmental consequences and mobility issues for these three options.

Descriptive, formal
Simple: You have seen an accident about a month ago. You are now in a courtroom and the judge asks you to describe what you have seen. The screen shows 4 consecutive pictures of a car colliding with a cyclist, and driving away.
Complex: You work for human resources at a hospital. The hospital is looking for a new nurse at the moment. You describe the job to a lady calling for information. The activities that have to be described are shown in pictures, organized in a pie-chart that shows the amount of time the activities will presumably take.

Argumentative, formal
Simple: In a neighborhood meeting, you are commenting on a speech by a spokesman of the municipality. He has just explained where a new playing ground will be built. You argue for a different location (a map of the neighborhood is shown on the screen, with a school and a road. The planned playground is at the other side of the road. An arrow points at the better location: at the same side of the road as the school).
Complex: In a neighborhood meeting, you are presenting a new plan to build more parking spaces near the supermarket. You are the owner of the supermarket and have to convince the people at the meeting to vote for one particular plan. Three plans are presented in a table, differing in total costs, number of parking spaces, consequences for the neighborhood, and noise pollution. The plan that you have to choose involves the lowest costs for the supermarket, but at the same time the plan is not ideal in terms of parking spaces, consequences for the neighborhood, and noise pollution.

2.3 Procedure

The tasks were presented in Authorware, version 7 (Macromedia Authorware 7, URL: http://www.macromedia.com/software/authorware). All tasks started with a presentation screen providing background information. Participants could click to go to the next screen; they could not return to this first screen after clicking. The information on the screen that contained the actual speaking task was divided into two parts. The first part explained the speaking task in general and, after a mouse click, a more detailed formulation of the assignment appeared underneath the general information. Depending on how much new information was given in this second part, the screen remained for 7-17 seconds. After this time, which was allocated for reading the new information, a time bar appeared, filling blue in 30 seconds. Participants were instructed to prepare their response during this time. After the bar had filled blue, a second bar appeared. This bar, which filled green in 120 seconds, was the cue to start and keep speaking.

The instruction to the speaking tasks urged participants to do their best to imagine that they actually were in the situation described in each task. The introduction also explained that participants need not keep speaking until the green time bar was filled completely, but that they could stop when they were ready. Following the introduction, participants carried out a short practice task, in which they told a friend about the experiment in which they were participating.

2.4 Apparatus

The speech was recorded with a directional microphone on the same computer that ran the Authorware presentation, using PRAAT with 11250 Hz sampling frequency.

2.5 Fluency measures

All recordings were automatically measured on phonation time ratio and syllables per second. We programmed two scripts in PRAAT (Boersma & Weenink, 2007) to obtain these two measures. For phonation time ratio, the script filters the sound and measures voiced and unvoiced speech to globally separate speech from silence. In a second step, more precise beginnings and endings of speech were measured using the intensity of the sound just before and after the voiced parts of speech found in the first step. Minimum silence duration was set to 350 milliseconds.

To measure syllables per second, a second PRAAT script filters the sound and measures intensity. Syllables were then detected as peaks in intensity above a certain threshold (defined as the median of all intensity measures of the sound) and with a preceding dip in intensity of at least 3 dB. For each speaking performance we measured the number of syllables as well as the total duration of the speaking performance, and thus calculated syllables per time unit (per second).
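The two fluency measures described above can be illustrated with a minimal Python sketch. It is not the authors' PRAAT script: the function names, the list-based intensity contour, and the input format (speech intervals in seconds) are assumptions made for illustration. The sketch also assumes that the "preceding dip" is measured between the previously accepted peak and the current one, and that silences shorter than the 350 ms minimum count as speaking time.

```python
from statistics import median


def count_syllables(intensity_db, min_dip_db=3.0):
    """Count intensity peaks above the median that follow a dip of >= min_dip_db.

    intensity_db: intensity contour in dB, one value per analysis frame
    (hypothetical input format).
    """
    threshold = median(intensity_db)
    count = 0
    dip = intensity_db[0]  # lowest value seen since the last accepted peak
    for i in range(1, len(intensity_db) - 1):
        value = intensity_db[i]
        dip = min(dip, value)
        is_peak = intensity_db[i - 1] < value >= intensity_db[i + 1]
        if is_peak and value > threshold and value - dip >= min_dip_db:
            count += 1
            dip = value  # restart the dip search after this peak
    return count


def phonation_time_ratio(speech_intervals, total_duration, min_silence=0.35):
    """Share of total time filled with speech.

    Gaps shorter than min_silence (seconds) are not treated as pauses,
    so the neighbouring speech intervals are merged (assumption).
    """
    merged = []
    for start, end in sorted(speech_intervals):
        if merged and start - merged[-1][1] < min_silence:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    speaking_time = sum(end - start for start, end in merged)
    return speaking_time / total_duration


def syllables_per_second(intensity_db, total_duration):
    """Detected syllables divided by the duration of the performance."""
    return count_syllables(intensity_db) / total_duration
```

In this sketch a small intensity bump that rises less than 3 dB above the preceding dip is ignored, which is what keeps a single syllable with a wobbly intensity contour from being counted twice.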
We computed both fluency measures using only the first 30 seconds of each speaking performance. Speaking performances ranged from a few seconds to the complete time allocated for speaking (2 minutes). We chose the first 30 seconds for two reasons. First, measures of fluency are more comparable if the sample of speaking time is comparable. Second, we found that for some speakers the time allocated to finish the speaking task was not sufficient, and speakers tended to speed up towards the end of the speaking performance if they felt they would not have enough time to finish. For technical details about these scripts and for a validation of these fluency measures, see De Jong & Wempe (in preparation).

2.6 Functional adequacy measures

Twelve students of the University of Amsterdam were paid to judge all speaking performances of two or three tasks. We deliberately selected non-experts (none of the students studied linguistics or languages) in order to obtain naive judgments on functional speaking performance. For each task, we constructed a rating scale with specific criteria. All criteria were functional speaking proficiency criteria, aiming to distinguish between degrees of successful fulfillment of each task. We divided the rating scale into six parts, and for each part we specified the criteria. To be able to distinguish more precisely between speaking performances, we divided each of these parts into five sub-scales, thus creating a rating scale ranging from 1 to 30. For all speaking tasks, the descriptors of the first three global scores (from 1 to 5, from 6 to 10 and from 11 to 15) described performances that did not suffice in functional terms, with descriptors such as unsuccessful, weak, and mediocre with respect to communicative adequacy. The last three global scores (from 16 to 20, from 21 to 25 and from 26 to 30) described performances that would be sufficient in functional terms, with descriptors ranging from sufficient and quite successful to very successful. For each speaking task, we constructed specific criteria suitable to distinguish between more and less successful fulfillments of that task. Note that in this way, it would be more difficult for participants to obtain a high rating for tasks that are more difficult than for tasks that are quite simple.

After an introductory training session, the judges received all (267) performances of either two or three tasks, such that for each speaking task, four judges rated all speaking performances. The judges rated these speaking performances at home, within three weeks. We calculated jury means for all speaking performances (averaging over four judges). Alpha measures calculated over all non-native speaker performances ranged from .88 to .91 on these 8 speaking tasks, which assured us that ratings were sufficiently reliable.

3. RESULTS

Due to technical failures, 125 speaking performances (6% of all speaking performances) were not recorded, or not recorded with sufficient quality to judge the speech and/or automatically obtain fluency measures. For forty speaking performances, speech lasted less than 5 seconds. We excluded these from the analyses, as (automatic) measures of fluency are unstable for very short speech samples.
Furthermore, we calculated means and standard deviations for native and non-native speakers separately and excluded speaking performances with fluency measures (phonation time ratio and syllables per second) below or above 3 standard deviations from their means, as well as speaking performances with ratings below or above 3 standard deviations. For the remaining 1962 speaking performances (over 97% of all technically sound recordings) we computed aggregated means for phonation time ratio, syllables per second, and ratings on functional adequacy of performance over complex and simple tasks.
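The exclusion steps described above (performances shorter than 5 seconds, then values more than 3 standard deviations from the group mean) can be sketched as follows. This is an illustrative sketch only, not the authors' analysis script: the dictionary keys (`group`, `duration`, `ptr`, `sps`, `rating`) and the function names are hypothetical, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def within_sd(values, n_sd=3.0):
    """Flag each value that lies within n_sd sample SDs of the mean."""
    m, s = mean(values), stdev(values)  # stdev needs at least two values
    return [abs(v - m) <= n_sd * s for v in values]


def select_performances(performances, min_duration=5.0, n_sd=3.0):
    """Apply the duration cut-off, then the 3-SD rule per speaker group.

    performances: list of dicts with the hypothetical keys
    'group' ('native' / 'non-native'), 'duration' (s), 'ptr', 'sps', 'rating'.
    """
    long_enough = [p for p in performances if p["duration"] >= min_duration]
    selected = []
    for group in sorted({p["group"] for p in long_enough}):
        members = [p for p in long_enough if p["group"] == group]
        keep = [True] * len(members)
        # Means and SDs are computed separately for each group and measure.
        for key in ("ptr", "sps", "rating"):
            flags = within_sd([p[key] for p in members], n_sd)
            keep = [k and f for k, f in zip(keep, flags)]
        selected.extend(p for p, k in zip(members, keep) if k)
    return selected
```

Note that with the sample standard deviation a single value can only exceed 3 SDs when the group contains at least eleven observations, so the rule has no effect on very small groups.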
Table 3: Means and standard deviations (in parentheses) for complex and simple speaking tasks.

                                   Native (n = 193)    Non-native (n = 54)
Functional adequacy (max = 30)
  Complex                          25.27 (2.19)        15.11 (4.27)
  Simple                           24.59 (2.14)        15.49 (4.03)
Phonation time ratio (max = 1)
  Complex                          0.836 (0.079)       0.757 (0.122)
  Simple                           0.823 (0.080)       0.771 (0.121)
Syllables per second (no max)
  Complex                          2.94 (0.34)         2.42 (0.40)
  Simple                           2.91 (0.38)         2.47 (0.37)

Table 3 shows the means and standard deviations for the functional adequacy measures, as well as the means and standard deviations of the two fluency measures. We report mixed ANOVA models with Complexity as within-subjects effect and Nativeness (native versus non-native) as between-subjects effect. For Functional Adequacy, we found a significant effect of Nativeness (F(1, 244) = 326.5, p < 0.001, ηp² = 0.58), no significant effect of Complexity (F(1, 244) = 2.8, p = 0.099, ηp² = 0.007), and a significant Complexity x Nativeness interaction (F(1, 244) = 46.9, p < 0.001, ηp² = 0.11). Follow-up ANOVAs, investigating the effect of complexity for native speakers and non-native speakers separately, showed that for native speakers, the complex tasks yielded higher functional adequacy scores than simple tasks (F(1, 53) = 24.4, p < 0.001, ηp² = 0.28). However, for non-native speakers we found the reverse: complex tasks yielded lower functional adequacy ratings than simple tasks (F(1, 191) = 22.7, p < 0.001, ηp² = 0.07).

Turning to the first fluency measure, phonation time ratio, we found a similar pattern: a significant effect of Nativeness (F(1, 245) = 22.1, p < 0.001, ηp² = 0.08), a (very small) significant effect of Complexity (F(1, 242) = 5.5, p = 0.0120, ηp² = 0.02), and a significant Complexity x Nativeness interaction (F(1, 242) = 13.6, p < 0.001, ηp² = 0.05). For native speakers, the complex tasks resulted in higher phonation time ratios than simple tasks (F(1, 53) = 9.8, p = 0.003, ηp² = 0.15). For non-native speakers, we found a significant effect in the reverse direction: complex tasks resulted in lower phonation time ratios than simple tasks (F(1, 188) = 14.6, p < 0.001, ηp² = 0.07).

With respect to our second fluency measure, syllables per second, we again found a significant effect of Nativeness (F(1, 245) = 124.2, p < 0.001, ηp² = 0.36), a (very small) significant effect of Complexity (F(1, 242) = 5.1, p = 0.025, ηp² = 0.02), and a significant Complexity x Nativeness interaction (F(1, 242) = 5.8, p = 0.017, ηp² = 0.02). For native speakers, there was no difference between complex and simple tasks (F(1, 53) = 1.4, p = 0.2, ηp² = 0.02). For non-native speakers, we found a small but significant effect: the simple tasks yielded higher scores of syllables per second than the complex tasks (F(1, 188) = 10.6, p = 0.001, ηp² = 0.05).

In line with our expectations, we found that non-native speakers are less fluent in complex speaking tasks. Interestingly, we also found that non-native speakers performed less well in terms of functional adequacy. For native speakers, we did not expect a difference between complex and simple tasks with respect to fluency, as a heightened attentional level (e.g. Robinson, 2001) or limited capacity (e.g. Skehan & Foster, 2001) would hardly influence automatic linguistic processing. However, we found that native speakers are more fluent in complex tasks than in simple tasks for one of the fluency measures (phonation time ratio).
In other words, they fill more time with speech but, at the same time, they are not more fluent with respect to the amount of linguistic content per time unit (syllables per second). We also found that native speakers perform better on the complex speaking tasks as measured by functional adequacy ratings.

4. DISCUSSION

In this paper, we investigated the influence of task complexity on the fluency and functional adequacy of the speaking performance of non-native and native speakers. Research on the influence of task complexity on non-native performance has so far focused on linguistic measurements such as grammatical accuracy, linguistic complexity, and fluency. At the same time, this type of research was initiated in order to gain insight into sequencing tasks for syllabus design in task-based language teaching. According to Robinson (2001: 33), the desired outcome of task-based instruction is the ability to achieve real world target task goals as measured by an estimate of successful performance. He then states that testing whether a desired outcome has been achieved can be done in two ways: directly, i.e. through tests where the criterion is whether or not the learner successfully fulfills the task, or indirectly, by measuring accuracy, linguistic complexity, and fluency. Skehan gives a similar rationale for measuring the three linguistic measures of output accuracy, complexity, and fluency. Instead of using global scales to measure overall performance, researchers into tasks have tended to use more precise operationalizations of underlying constructs (Skehan, 2001: 170). In other words, researchers investigating the Cognition Hypothesis as well as the Limited Capacity Model assume that measuring linguistic complexity, accuracy and fluency together amounts to the same thing as measuring global speaking proficiency, which includes functional adequacy (but see Wigglesworth, 1997).

However, we claim that functional adequacy is a dimension separate from accuracy, linguistic complexity, or fluency. For instance, we can imagine a very accurate and perhaps quite fluent performance that is not successful in functional terms. Munro and Derwing (2001), for example, show that a slow speaking rate is related to low comprehensibility, but that a speaking rate that is too high also leads to lower comprehensibility. In other words, the relation between speaking rate (a measure of fluency) and comprehensibility (a measure related to functional adequacy) is curvilinear, with the optimum in comprehensibility at a moderate speaking rate. For linguistic complexity, a similar effect may be expected. It is not necessarily the case that using low-frequency words and many subordinate clauses leads to higher comprehensibility and a higher functional performance. Therefore, to gain insight into how attention is distributed while performing complex and simple tasks, researchers should take the effect of task complexity on functional speaking performance into account. The results of our study confirm this claim.
In addition to the expected outcome that non-native speakers perform complex tasks less fluently, we also found that in terms of functional adequacy, non-native speakers scored lower on complex tasks. It might be the case that non-native speakers in the complex tasks used more complex language (e.g., more low-frequency words, more content words, more subordinate clauses) and made fewer grammatical errors than in the simple tasks, but at the same time, in terms of successful functional performance, non-native speakers score lower on complex than on simple tasks. Perhaps these complex speaking tasks required (complex) language use that participants had not yet fully acquired. These are matters that we aim to investigate in another paper.

For native speakers, performing a speaking task is mostly automatic at the linguistic level. Most of the attention needed to fulfill the requirements of the speaking task is directed to doing well in terms of the content of the task, in simple as well as complex tasks. In terms of Levelt's model of speaking (Levelt, 1989), complexity of tasks primarily taxes the Conceptualizer, whereas the Formulator and the Articulator operate automatically, whether the task is conceptually complex or simple. Therefore, if an effect of task complexity on native speakers' performance is anticipated, we would expect that a complex task results in lower functional adequacy scores. However, we clearly found that native speakers perform better functionally in complex tasks. Extending Robinson's model (2001, 2005) to functional speaking performance, we may hypothesize that native speakers have a heightened level of attention when performing complex tasks and that a heightened level of attention may indeed have a positive effect on functional performance.
In other words, native speakers seem to need a challenge in order to excel, such that in complex tasks a heightened level of attention pushes (linguistic as well as) functional output to a higher level.

We also found that native speakers perform complex tasks more fluently, as measured by phonation time ratio. However, phonation time ratio reflects only one type of fluency. For the measure syllables per second, as an index of speech rate, no significant difference was found. Previous research has shown that both types of measures (the ability to fill time with speech, and the rate of speech) correlate well with perceived fluency (Cucchiarini, Strik, & Boves, 2002). At the same time, we should keep in mind that filling time with speech is not the only way to exhibit fluency: speakers may lengthen their vowels in order to sound more fluent, and use more filled pauses. The automatic measure of phonation time ratio used in our study cannot distinguish between filled pauses and speech with content. We conclude that distinguishing between different types of fluency is crucial to a full understanding of the construct of fluency. As Tavakoli and Skehan (2005) have noted, fluency has a multifaceted nature. We find that being fluent in terms of breakdown fluency (i.e. scoring high on phonation time ratio) does not necessarily mean high speed fluency in terms of syllables per second.


REFERENCES

Boersma, P., & Weenink, D. (2007). PRAAT, http://www.praat.org (Version 4.5.25).
Cucchiarini, C., Strik, H., & Boves, L. (2002). Quantitative Assessment of Second Language Learners' Fluency: Comparisons between Read and Spontaneous Speech. The Journal of the Acoustical Society of America, 111(6), 2862-2873.
De Jong, N. H., & Wempe, T. (in preparation). Automatic measures of fluency in spoken Dutch.
Ellis, R. (2003). Task-based Language Teaching. Oxford: Oxford University Press.
Council of Europe (2001). Common European Framework of Reference for Languages: Learning, teaching, assessment. Cambridge: Cambridge University Press.
Fillmore, C. J. (1979). On Fluency. In C. J. Fillmore, D. Kempler & W. S.-Y. Wang (Eds.), Individual differences in language ability and language behavior (pp. 85-101). New York: Academic Press.
Foster, P. (2000). Attending to message and medium: The effects of planning time on the task-based language performance of native and non-native speakers. Unpublished Doctoral Thesis, King's College, London.
Gilabert, R. (2005, September 21-23). The effects of increasing cognitive complexity on L2 narrative oral production. Paper presented at the International Conference on Task-Based Language Teaching, Leuven, Belgium.
Lennon, P. (1990). Investigating Fluency in EFL: A Quantitative Approach. Language Learning 3, 387.
Levelt, W. J. M. (1989). Speaking: From intention to articulation. Cambridge: Cambridge University Press.
Levelt, W. J. M., Roelofs, A., & Meyer, A. S. (1999). A theory of lexical access in speech production. Behavioral and Brain Sciences 22, 1-38.
Munro, M. J., & Derwing, T. M. (2001). Modeling perceptions of the accentedness and comprehensibility of L2 speech: The Role of Speaking Rate. Studies in Second Language Acquisition 23(4), 451.
Robinson, P. (2001). Task complexity, task difficulty, and task production: Exploring interactions in a componential framework. Applied Linguistics 22(1), 27-57.
Robinson, P. (2005). Cognitive complexity and task sequencing: Studies in a componential framework for second language task design. IRAL 43, 1-32.
Segalowitz, N. (2000). Automaticity and attentional skill in fluent performance. In H. Riggenbach (Ed.), Perspectives on fluency (pp. 200-219). Ann Arbor: University of Michigan Press.
Skehan, P. (2001). Task and language performance assessment. In M. Bygate, P. Skehan, & M. Swain (Eds.), Researching Pedagogic Tasks (pp. 167-185). Harlow: Pearson Education Limited.
Skehan, P., & Foster, P. (2001). Cognition and tasks. In P. Robinson (Ed.), Cognition and Second Language Instruction. Cambridge: Cambridge University Press.
Tavakoli, P., & Skehan, P. (2005). Strategic planning, task structure, and performance testing. In R. Ellis (Ed.), Planning and Task Performance in a Second Language (pp. 239-276). Amsterdam/Philadelphia: John Benjamins Publishing Company.

Research funded by the Netherlands Organisation of Scientific Research (NWO) under grant 254-70-030; project title: Unraveling second language proficiency. Project leaders: Jan H. Hulstijn and Rob Schoonen. The authors wish to thank Renske Berns, Andrea Friedrich, and Kimberley Mulder for their help in running the experiment, and Ton Wempe for his help with programming the scripts for automatic fluency measures in PRAAT.


PERFORMANCE ACCURACY AFFECTED BY CONTROL OVER BILINGUAL PRODUCTION: A STUDY OF BALANCED L2 USERS

Julia Festman (1), Antoni Rodriguez-Fornells (2) & Thomas Münte (1)
(1) Otto-von-Guericke University; (2) ICREA Barcelona, Spain

1. BILINGUAL COGNITIVE ADVANTAGES: THE SAME FOR BILINGUALS?

Bilinguals are confronted constantly, and from early on, with the need to keep two languages separate to such a degree that, at any given moment, they can decide which language they want to use for their verbal output. This fact led researchers to think about the mechanisms involved in language control. Bialystok (1999) claimed that, due to the need to control both languages, the control mechanisms improve in bilinguals compared to monolinguals. Since the 1960s, researchers have shown that bilinguals (learners as well as proficient speakers) have cognitive advantages compared to monolingual speakers matched for age, sex, and socio-economic level. Among these cognitive advantages are higher verbal and non-verbal intelligence, verbal originality and divergence, greater facility in concept formation (Peal & Lambert, 1962), and a more developed metalinguistic awareness (for a review, see Hamers & Blanc, 1989). Some research also indicated that the cognitive advantages of bilinguality extend to non-verbal tasks: e.g., bilinguals showed greater originality in creative thinking (Torrance et al., 1970).

Bialystok (1988) suggested that bilingual and monolingual children might call to a different extent on strategies of analysis and control in language processing. In the 1990s, authors argued that bilingual children may have greater cognitive control of information processing than do monolingual children (Bialystok & Ryan, 1985; Bialystok, 1991). Mohanty and Perregaux (1997) suggested that bilingual children probably develop special reflective skills, which enable the child to exercise greater control over his cognitive functions and use them in more effective ways, resulting in better performance on a variety of academic tasks.
Vygotsky's view that improvement of control and self-regulation of cognitive processes is induced by the use of more than one language (1962: 110) was revived and reformulated; it was suggested that the experience with two languages enhances the awareness of the analysis and control components of language processing (Hamers & Blanc, 1989: 85). In sum, the fact that two languages have to be processed and dealt with on a constant basis (due to parallel activation) forces the bilingual speaker to practice control. This additional practice, which is not required of a monolingual speaker, was shown to influence effectiveness of processing.

More recently, Bialystok (2005) reported data suggesting that bilingual children perform better on problem solving that involves the inhibition of misleading information. Long and Prat (2002) claimed that bilinguals with higher working memory span were better able to prevent Stroop interference than individuals with lower working memory span, but only when the number of conflict trials was high. On the one hand, this finding shows that a bilingual Stroop advantage was only observed under particular circumstances. On the other hand, it indicates that bilinguals should not be considered one homogeneous group, and that their individual differences deserve a more careful look. Do all bilinguals profit from this control training in the same way? We hypothesize that some bilinguals make more, or more efficient, use of their control mechanisms during bilingual language processing than others.

A common assumption in research on bilingual language production is that bilingual speakers have the ability of language choice, but information on how frequently bilinguals decide to use only one or both of their languages is scarce. To our knowledge, only Grosjean (1982) described such observations: bilinguals seem to differ in their switching habits. Some L2 speakers prefer to use only one language at a time, even when talking to a bilingual, while others switch constantly between their two languages.

The only uniquely bilingual language production phenomenon is cross-language interference (Festman, 2004, 2007). It is usually defined as the unintended use of the non-target language during target-language production. Green (1986, 1993) claimed that interference is due to failures of control: when the control system fails to monitor which language the elements selected during speech preparation belong to, interference can occur. We then predict that:

1) If some bilinguals are better at controlling their language choice than others, then there should be differences in the number of errors of interference they produce.
2) If, however, the improvement of control is a group-general trait common to all bilinguals, no such differences in terms of interference should be observed.
3) If the control difference relies on differences of proficiency between the groups, with the non-switchers (those with few errors of interference) being more proficient than the switchers (those with many errors of interference), such a difference should be reflected in verbal production tasks by more correct responses.
4) If, however, the control difference is independent of proficiency, the performance in verbal production tasks should not differ between the groups in terms of number of correct responses.

To address this question, we used two bilingual verbal production tasks: a bilingual picture naming task and a verbal fluency task. The first task was intended to form two groups of bilinguals, switchers and non-switchers. The second task was employed to test the predictions with a different verbal production task. In these tasks, bilinguals face a particular difficulty compared to monolinguals: due to parallel activation of words from both languages, bilinguals have to consciously suppress non-target language words, which usually pop up during the stages of lexical search and retrieval. Moreover, the task demands vary depending on the proficiency level in both languages. Three factors have been suggested as influencing bilingual task performance: the size of the speaker's mental lexicon, the speed of retrieval, and the ability to control the non-target language. The stronger language is usually characterized by a larger vocabulary, faster retrieval and better control over the intruding non-target language, while the opposite is true for the weaker language. In sum, in a bilingual setting, these tasks measure not only language proficiency in the sense of lexical competence; in particular, they provide an indication of the speaker's ability to prevent interference and of the speed of cognitive processing.

2. EXPERIMENT 1: BILINGUAL PICTURE-NAMING TASK

2.1 Methods

Forty-nine bilingual speakers of Russian and German (ages 17 to 50) participated in Experiment 1. Most of them spoke Russian as their native language and had had some contact with German while they were living in the former Soviet Union. After migrating to Germany on the basis of cultural bonds dating back to their ancestors (who came to Russia in the time of Katharina II, 1762-1796), they acquired German and were fluent in both languages.

2.2 Materials and task

For this task, 240 pictures (+48 for practice) were selected from the Snodgrass and Vanderwart (1980) norms, which consist of black line drawings on a white background. Word frequency (i.e., lemma frequency per million) was determined for German with CELEX (Baayen, Piepenbrock, & Van Rijn, 1993), and for Russian with an on-line frequency dictionary (www.artint.ru/projects/frqlist.asp). Pictures were excluded if their names were cognates in both languages, if they were culturally specific (e.g., eskimo, banjo), or if they had no one-word translation into Russian (e.g., ambulance, typewriter). Pictures that had alternative names in German (e.g., Möhre or Karotte for carrot) were used as practice items.

Each participant was tested individually in a 30-minute session. Subjects were seated in a dimly lit, sound-attenuated room in front of a computer screen. All stimuli were displayed centered on a high-resolution screen. Viewing distance was approximately 60 cm. The DMDX program (www.u.arizona.edu/~kforster/dmastr/dmastr.htm) controlled the display of the visual stimuli and measured speech-onset latencies. All verbal responses were recorded. Participants were asked to name the pictures as fast as possible (trying neither to make errors nor to correct themselves). They were informed that the language in which a given picture had to be named was determined by the color (red or green) of a frame, which appeared prior to the picture.
Participants were instructed that the sequence of switching between the two languages was completely regular, i.e., a switch occurred on every second trial (cf. Jackson et al., 2001): two pictures in a row required a response in German, the next two in Russian, and so on (GG RR GG RR). Each trial had the following structure. First, a fixation cross appeared in the center of the screen for 100 ms. Then, a colored frame (red or green) was displayed that surrounded the fixation point. After 300 ms the stimulus picture was shown in the frame for 1500 ms. The next trial started after 1500 ms with the fixation point. The experiment consisted of six blocks, each with 40 pictures (20 per language). Before each experimental block, 8 practice trials were administered. Between the blocks, the participant was allowed to rest. The presentation order of the pictures was fully randomized. The association of one color with one response language was counterbalanced: half of the participants responded to a green frame with German, and half with Russian. Half of the subjects started with German, the others with Russian.

2.3 Results

2.3.1 Subject selection

The participants' language proficiency was assessed on the basis of their errors in both languages during picture naming. The mean number of errors produced in this task was 51.2 (SD = 17.6). The number of errors produced by 19 of our 49 participants was higher than the estimated upper confidence limit (56.2). Apparently, the proficiency level in at least one of the languages was too low for these subjects; they were excluded from analysis and further participation.

2.3.2 Subject background information

The remaining, more proficient participants were on average 25 years of age (SD = 5.09); most were female (n = 21; male n = 8). They had lived in Germany for 9 years on average (SD = 4) and were late L2 learners (age at acquisition of German was 11.37 years, SD = 6.09). In the language history questionnaire, 28 participants reported that they spoke Russian as their first language (L1) and German as their second language (L2), and one participant spoke German as L1 and Russian as L2. More than half of them were students (57%), 30% were pupils or had finished formal education, and few (13%) had achieved a higher academic degree.

2.3.3 Correct responses and errors

Overall, the performance in both languages was equally good: in Russian, 2843 correct responses were given (78.97% of all Russian trials), and in German, 2737 correct responses were given (76.03% of all German trials). A t-test for paired comparisons of the alternating runs in German and Russian showed no difference.

2.3.4 Errors

Overall, 1620 errors were produced (22.5% of all responses). Three major error categories were observed: (1) No responses (don't know): the participant did not provide any answer. (2) Within-language substitutions: the target word is replaced by another word from the target language that is only semantically close to the target word. (3) Errors of interference: a response in the non-target language (a translation equivalent or a word similar to the target meaning).

On 819 trials (~11% of all trials), no response was given (10.9% of all responses in Russian, and 12.11% in German). For substitutions, the similarity to the target word was rated according to visual, semantic and phonetic criteria (see Festman et al. 2007b for more detail on the rating) on a 4-point scale by 3 independent judges, and discussed until agreement was reached: responses were categorized as synonyms of, very similar to, of little similarity to, or not similar to the target word. In Russian 6.08% and in German 8.36% of all responses were substitution errors (see Table 1 below). Synonyms were not considered errors.
Table 1: Substitution errors in the Picture Naming Task.

                     Russian   German
Very similar           192       248
Little similarity       25        41
Not similar              2        12
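The counts and percentages reported for the picture naming task can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the figure of 3600 trials per language is not stated in the text but is an assumption inferred from the reported percentages, and all variable names are hypothetical.

```python
# Counts reported in sections 2.3.3-2.3.4 and in Table 1.
CORRECT = {"Russian": 2843, "German": 2737}
TOTAL_ERRORS = 1620
NO_RESPONSE = 819
SUBSTITUTIONS = {
    "Russian": {"very similar": 192, "little similarity": 25, "not similar": 2},
    "German": {"very similar": 248, "little similarity": 41, "not similar": 12},
}

# Assumption (not stated in the text): 3600 trials per language,
# inferred from 2843 correct responses being 78.97% of the Russian trials.
TRIALS_PER_LANGUAGE = 3600
ALL_TRIALS = 2 * TRIALS_PER_LANGUAGE


def percent(count, total):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


correct_pct = {lang: percent(n, TRIALS_PER_LANGUAGE) for lang, n in CORRECT.items()}
substitution_totals = {lang: sum(cells.values()) for lang, cells in SUBSTITUTIONS.items()}
substitution_pct = {
    lang: percent(n, TRIALS_PER_LANGUAGE) for lang, n in substitution_totals.items()
}

# Errors implied by the correct responses, and errors left over once
# no-responses and substitutions are subtracted from the reported total.
errors_from_correct = ALL_TRIALS - sum(CORRECT.values())
remaining_errors = TOTAL_ERRORS - NO_RESPONSE - sum(substitution_totals.values())

print(correct_pct)
print(substitution_pct)
print(errors_from_correct, percent(TOTAL_ERRORS, ALL_TRIALS))
print(remaining_errors)
```

Under the assumed trial count, the substitution rows of Table 1 sum to 219 (Russian) and 301 (German), and the errors left after subtracting no-responses and substitutions number 281, which agrees with the interference total reported for this task.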


The slightly higher error score for German compared to Russian might indicate a higher semantic sensitivity in the subjects' first language (Russian). Overall, 281 occurrences of interference were produced in this task (3.9% of all responses). Slightly more interference came from German in Russian responses (4.05% of all Russian responses) than from Russian in German responses (3.5% of all German responses). Additionally, 8 instances of interference from English were found during German naming.

2.3.5 Two groups of bilinguals?
To test our hypothesis that some bilinguals made significantly fewer errors of interference than others, we used Ward's method to group the remaining 30 participants into 2 clusters that should be as homogeneous as possible and similar in size. The first cluster comprised participants who produced between 1 and 8 errors of interference. We called this group the non-switchers, since we assume that they do not switch unintentionally. Participants with 10 to 21 errors constituted the second group, the switchers, who apparently have less control over unintentional switching. The difference in the number of interference errors between the two groups was highly significant (t(27) = -6.753; p < 0.001).

2.3.6 The role of proficiency: correct responses
As outlined in the prediction section, an alternative explanation could be that the group difference is due not to interference control but to proficiency, i.e., that the non-switchers are more proficient in both languages than the switchers. To test this, we compared the mean number of correct responses in this task for the two languages separately (see Table 2). The t-test comparing the mean number of correct Russian responses between non-switchers and switchers was not significant (t(27) = -0.243; p = 0.810). In German, however, the switchers produced significantly fewer correct responses than the non-switchers (t(27) = 2.246; p = 0.033).
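The grouping step in 2.3.5 can be illustrated with a minimal pure-Python sketch of Ward's agglomerative criterion on one-dimensional data. This is not the authors' analysis: the interference counts below are made up for illustration, the function names are hypothetical, and the statistical package actually used is not named in the text.

```python
from itertools import combinations


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return len(a) * len(b) / (len(a) + len(b)) * (mean_a - mean_b) ** 2


def ward_clusters(values, k=2):
    """Merge the cheapest pair of clusters repeatedly until k clusters remain."""
    clusters = [[v] for v in values]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: ward_cost(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    # Return clusters ordered by their mean, each sorted internally.
    return sorted((sorted(c) for c in clusters), key=lambda c: sum(c) / len(c))


# Made-up interference counts per participant (NOT the study's data).
interference_counts = [1, 2, 2, 3, 4, 15, 16, 17, 18, 21]
non_switchers, switchers = ward_clusters(interference_counts, k=2)
print(non_switchers)
print(switchers)
```

With these toy counts the low-interference participants end up in one cluster and the high-interference participants in the other; real data with a narrower gap between groups need not split as cleanly.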

Table 2: Group mean of correct responses for German and Russian.

Group                                non-switchers    switchers
Mean correct resp. (SD), German      95.69 (11.7)     85.69 (12.14)
Mean correct resp. (SD), Russian     94.13 (13.71)    95.23 (10.01)
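The group comparisons in Table 2 can be approximated from the summary statistics alone with a pooled-variance (Student) t-test. The sketch below rests on two assumptions that the text does not state: group sizes of 16 non-switchers and 13 switchers (chosen to match the reported 27 degrees of freedom) and the use of the equal-variance form of the test. The function name is hypothetical.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t for two independent groups, computed from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Assumed group sizes (not given in the text); 16 + 13 - 2 = 27 df.
N_NON_SWITCHERS, N_SWITCHERS = 16, 13

# Means and SDs from Table 2 (non-switchers first, then switchers).
t_german, df = pooled_t(95.69, 11.7, N_NON_SWITCHERS, 85.69, 12.14, N_SWITCHERS)
t_russian, _ = pooled_t(94.13, 13.71, N_NON_SWITCHERS, 95.23, 10.01, N_SWITCHERS)

print(f"German:  t({df}) = {t_german:.3f}")
print(f"Russian: t({df}) = {t_russian:.3f}")
```

Under these assumptions the statistics come out close to the reported values (about 2.25 for German and about -0.24 for Russian); the small remaining differences are consistent with rounding in the published means and SDs.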

2.4 Discussion
The interference difference between the two groups could be clearly attributed to differences in control abilities alone for Russian (L1), but not for German, the subjects' L2. This implies that there was a control difference between the groups, which was apparent for naming pictures in L1; a possible proficiency difference in L2 might, however, mask the effect for naming pictures in L2. Since our subjects were late bilinguals, this control effect was probably more prominent for first- than for second-language processing.


But the question remains whether the non-switcher group was more proficient in L2 German than the switcher group, so that differences in correct naming in Experiment 1 could be attributed to this factor. To investigate further the impact of control vs. proficiency, we conducted a second experiment with both groups.

3. EXPERIMENT 2: BILINGUAL VERBAL FLUENCY TASK
The same two groups (switchers and non-switchers) that were established in Experiment 1 participated in Experiment 2. The verbal fluency task in its bilingual version is used as a measure of proficiency in both languages (e.g. Luk & Bialystok 2007). In a verbal fluency test, a speaker is asked to name as many words as possible in response to a certain stimulus within a given time frame, typically one minute (Lezak et al. 2004). The speaker's oral production is measured by the number of correct responses. Response production is subject to predefined rules, such as the avoidance of repeated responses, proper names, numbers, and dialect or slang words. In the bilingual version, words belonging to the non-target language (i.e., interference) were not allowed either.

3.1 Procedure
Two subtests were administered twice for each language, each run lasting one minute (8 minutes of testing overall). In the category test, participants were asked to name as many words as possible within a minute that belong to a certain category. For German, the categories FOOD (large search area) and CLOTHING (medium search area) were used; for Russian, ANIMALS (large) and FLOWERS & PLANTS (medium). In the letter fluency test, participants were asked to name as many words as possible starting, in German, with S (very large search area) and H (large) and, in Russian, with P (very large) and R (large). The term "search area" denotes the frequency of words that start with a certain letter, determined by Aschenbrenner et al. (2000) according to a German dictionary; the same procedure was used for Russian. Half of the participants started in German, the other half in Russian. Instructions were provided in the target language by a native German and a native Russian experimenter.

3.2 Results
The overall number of responses was similar across both languages (German = 1825, Russian = 1767), and a similar number of these were errors (German = 139, Russian = 122). The largest error categories among the 261 errors in both languages were same root (e.g., Haut, Hautzelle, Hautkrebs) (n = 96; German n = 51 and Russian n = 45) and repetition (n = 57). In all other error categories only a few errors were observed: membership error (n = 27), lexical invention (n = 22), dialect/slang (n = 16), proper names (n = 12), and fragments (n = 10). Interference errors were infrequent (overall n = 21; n = 13 in German and n = 8 in Russian).

3.3 Group differences
The smallest, but very important, error category yielded a significant group difference: the mean number of interference errors collapsed over all conditions showed that the switchers produced significantly more errors of interference (t(27) = -1.998; p = 0.02), while the non-switchers produced hardly any errors of interference at all.
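The scoring rules described for the verbal fluency task can be sketched as a small tallying routine. This is a simplified illustration, not the scoring procedure actually used: it covers only repetition, interference and membership errors in a letter fluency run, ignores the other categories (same root, lexical invention, dialect/slang, proper names, fragments), and the word lists and function name are hypothetical.

```python
def score_letter_fluency(responses, letter, non_target_words):
    """Tally correct responses and three error types for one letter fluency run."""
    tally = {"correct": 0, "repetition": 0, "interference": 0, "membership": 0}
    seen = set()
    for raw in responses:
        word = raw.strip().lower()
        if word in seen:
            tally["repetition"] += 1        # response already given in this run
        elif word in non_target_words:
            tally["interference"] += 1      # word from the non-target language
        elif not word.startswith(letter.lower()):
            tally["membership"] += 1        # does not start with the cue letter
        else:
            tally["correct"] += 1
        seen.add(word)
    return tally


# Toy German run for the letter S; "sobaka" stands in for a Russian intrusion.
toy_responses = ["Sonne", "Salz", "sobaka", "Sonne", "Hund", "See"]
toy_non_target = {"sobaka"}  # hypothetical non-target lexicon
tally = score_letter_fluency(toy_responses, "S", toy_non_target)
print(tally)
```

In this toy run three responses count as correct and one response falls into each of the three error categories; a real scoring scheme would need lexical resources for both languages and human judgement for borderline cases.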

As can be seen in Table 3 below, apart from the interference difference, the group performance did not differ. None of the comparisons with regard to correct responses were significant; with regard to errors, non-switchers produced on average 3.6 errors in German and 3.9 in Russian (switchers 5.2 in German and 4.2 in Russian), but the differences were not significant. This is taken as evidence that the proficiency level of both groups was similar in both languages. Based on these results, we suggest that the size of the mental lexicon and the speed of retrieval are the same in both groups for both languages. Most importantly, however, the ability to control the non-target language is the main characteristic that differentiates the two groups.

3.4 Task demands and differences
The results from the two verbal fluency subtests indicate that the category fluency test was easier to execute than the letter fluency test, since in both groups more correct responses were given and fewer errors were made in this task. Even slight differences in the size of the search area could be observed in German and Russian in both groups (see Table 3). Although both subtests require a search of the subject's mental lexicon to produce responses, they differ in their task demands. The category test (e.g., naming different animals) is fairly easy for healthy subjects, because words are commonly searched according to their meaning. The main restriction here is category membership, so that the general search strategy is already established by the task requirement. In order to avoid repetition, one could name domestic animals, farm animals, wild animals, or birds, fish, mammals, etc. As soon as a subcategory is exhausted, the participant must efficiently switch to a new one (Troyer, Moscovitch & Winocur, 1997). A different strategy is necessary to complete the letter fluency task (words starting with a predefined letter): participants have to carry out a structured phonological and/or lexical search (Aschenbrenner et al., 2000). New strategies have to be both flexible, to adjust to new task demands, and efficient, to use the given time optimally. The task structure does not clearly define the necessary strategies, which poses an additional problem for task execution. The participant could search his or her mental lexicon for minimal pairs, or for words with the same initial consonant. This general difference in difficulty between the tasks was, however, not reflected in performance differences between the two groups. Taken together, the two subtests of verbal fluency give a clear picture: both groups seem to have the same level of proficiency in both languages. The main group difference observed in this test is the control effect with regard to interference: switchers produced more errors of interference than non-switchers.
Table 3: Mean number of correct responses and standard deviation (SD) per group (non-switcher and switcher) for each subtest.

Response language - subtest   Stimulus             Group          Mean    SD
German - category             Food                 non-switcher   21.75   4.57
                                                   switcher       20.62   4.52
                              Clothes              non-switcher   18.75   4.92
                                                   switcher       18.08   4.94
German - letter               S                    non-switcher   12.13   4.47
                                                   switcher       10.23   4.44
                              H                    non-switcher    9.50   3.44
                                                   switcher        8.77   2.45
Russian - category            Animals              non-switcher   20.63   6.11
                                                   switcher       17.69   4.97
                              Plants and Flowers   non-switcher   16.13   6.90
                                                   switcher       17.08   7.12
Russian - letter              P                    non-switcher   13.75   4.64
                                                   switcher       11.77   5.51
                              R                    non-switcher   11.69   4.87
                                                   switcher       11.85   3.26

3.5 Discussion
3.5.1 Is it proficiency or control that plays the crucial role?
We used two tasks that are commonly employed to assess bilinguals' language proficiency: a picture naming task and a verbal fluency task. Both require single-word production in both languages. Our major task restriction was that only the target language should be produced. Consequently, non-target-language productions could be classified as errors of interference, pointing to difficulties in controlling language choice and in monitoring language production with regard to the target language. The construct of accuracy was operationalized in two ways. With regard to content accuracy, in the picture naming task a target response served as the norm for describing the picture stimulus, so that the degrees of deviation (mainly semantic) could be determined and classified; in the verbal fluency task, content accuracy was evaluated according to the correctness of the response with regard to the stimulus (category or letter correctness) and with regard to the rules (no repetition, no dialect, etc.). With regard to language accuracy, task instructions and cues explicitly predefined the target language, and language inaccuracy was measured by means of errors of interference.

3.5.2 Linking accuracy and control
We suggest that control over language production is the principal mechanism underlying the construct of accuracy. With regard to accuracy, control aims at detecting, correcting and thus avoiding errors prior to their articulation, with regard to both content (lemma) and target language, during the internal preparation of a bilingual's language output (for the time being, phonological aspects are not included). Current psycholinguistic theorizing holds that in order to speak one language, bilinguals have to inhibit the other; control is required to ensure that the intended language (target language) is produced, and not the other (non-target language).
David Green's model of bilingual language production (1986, 1993) incorporates this idea of inhibition and control. The phenomenon of cross-language interference is attributed to failures of control. Furthermore, the two tasks that we employed differ in their switching demands: the picture naming task required frequent switching between both languages, while the verbal fluency task required responses only in single-language blocks. The results probably indicate the impact of predetermined switching on the switcher group, which showed more difficulty in particular in producing correct responses in German (L2). Comparison with the results of Experiment 2 showed that proficiency is not the factor underlying the weaker performance in German picture naming. Apparently, the command to switch to L2 was less controllable for this group than commands to use L1. Research on switch costs shows that switching to a stronger language takes more time. For our subjects, German is the language of their speech community, while Russian is mainly the language used at home. Rather than proficiency, it is the level of activation of a language that might cause the performance difference with regard to German. The higher the current level of activation (which is independent of language proficiency), the more the language needs to be inhibited to be available for production (Festman 2004). Our data point to the conclusion that controlling these activation states with regard to the language of their speech environment was more difficult for the switcher group than for the non-switcher group.

Executive functions are independent of language and are usually necessary for task-related performance. They are employed to monitor and plan behavior (verbal production included), to help focus on targets and to inhibit distracting information of all kinds (Lezak et al., 2004). Working memory is also considered one of the sub-functions of executive functions. We suggest more specifically that bilinguals differ not only in their ability to control target- and non-target language production with regard to interference, but that this phenomenon is just the tip of the iceberg, and that fundamental differences in executive functions cause the observed differences between our groups.

4. CONCLUSION
We presented data from two verbal production tasks which, taken together, favor the control-effect hypothesis over the purely proficiency-based account as an explanation of performance differences between subjects.
Several tests that rely on executive functions, intelligence tests, and EEG recordings during verbal and non-verbal tasks are currently being run in our lab on both groups to further our understanding of control and to specify the observed behavioral patterns. We found that bilinguals indeed differ in their language and executive control abilities (Festman et al., 2007a, 2007b), as well as in their ERPs (Festman et al., 2007c). This interdisciplinary approach could reveal differences in how efficiently cognitive and language control functions are applied.

REFERENCES
Aschenbrenner, S., Tucha, O. & Lange, K.W. (2000). Regensburger Wortflüssigkeitstest. Göttingen: Hogrefe.
Baayen, H., Piepenbrock, R. & Van Rijn, H. (1993). The CELEX Lexical Database. Philadelphia: University of Pennsylvania.
Bialystok, E. (1988). Levels of bilingualism and levels of linguistic awareness. Developmental Psychology 24, 560-567.
Bialystok, E. (1991). Metalinguistic dimensions of bilingual language proficiency. In E. Bialystok (Ed.), Language Processing in Bilingual Children. Cambridge: Cambridge University.
Bialystok, E. (1999). Cognitive complexity and attentional control in the bilingual mind. Child Development 70, 636-644.
Bialystok, E. (2005). Consequences of bilingualism for cognitive development. In J.R. Kroll & A. de Groot (Eds.), Handbook of bilingualism. Oxford: Oxford University.
Bialystok, E. & Ryan, E.B. (1985). Toward a definition of metalinguistic skills. Merrill-Palmer Quarterly 31, 229-251.
Festman, J. (2004). Lexical production phenomena as evidence for activation and control processes in trilingual lexical retrieval. Unpublished doctoral dissertation. Bar-Ilan University, Israel.
Festman, J. (2007). Cross-language interference during trilingual picture naming in single and mixed language conditions (submitted).
Festman, J., Derheim, S., Rodriguez-Fornells, A. & Münte, T.F. (2007a). Interference and tasks involving executive control (submitted).
Festman, J., Derheim, S., Rodriguez-Fornells, A. & Münte, T.F. (2007b). Interference effects and switch costs in a bilingual alternating-runs paradigm - a result of language proficiency or different control abilities? (in preparation).
Festman, J., Münte, T.F. & Rodriguez-Fornells, A. (2007c). Executive Control in Bilingual Language Processing: The Phenomenon of Interference. Poster presented at CNS, New York, May 5-8, 2007.
Green, D.W. (1986). Control, activation, and resource: A framework and a model for the control of speech in bilinguals. Brain and Language 27, 210-223.
Green, D.W. (1993). Towards a model of L2 comprehension and production. In R. Schreuder & B. Weltens (Eds.), The Bilingual Lexicon (249-277). Amsterdam: John Benjamins.
Grosjean, F. (1982). Life with Two Languages. An Introduction to Bilingualism. Cambridge, MA., London: Harvard.
Hamers, J.F. & Blanc, M. (1989). Bilinguality and Bilingualism. Cambridge: Cambridge University.
Jackson, G.M., Swainson, R., Cunnington, R. & Jackson, S.R. (2001). ERP correlates of executive control during repeated language switching. Bilingualism: Language and Cognition 4 (2), 169-178.
Lezak, M.D., Howieson, D.B. & Loring, D.W. (2004). Neuropsychological assessment. Oxford: Oxford University.
Long, D.L. & Prat, C.S. (2002). Memory for Star Trek: The role of prior knowledge in recognition revisited. Journal of Experimental Psychology: Learning, Memory, and Cognition 28, 1073-1082.
Luk, G. & Bialystok, E. (2007). Examining the Bilingual (dis)advantage on the Verbal Fluency Task. Poster presented at ISB6, Hamburg, Germany, May 30-June 2, 2007.
Mohanty, A.K. & Perregaux, C. (1997). Language acquisition and bilingualism. In J.W. Berry, P.R. Dasen & T.S. Saraswathi (Eds.), Handbook of Cross-Cultural Psychology. Vol. 2. Boston, MA: Allyn & Bacon.
Peal, E. & Lambert, W.E. (1962). The relation of bilingualism to intelligence. Psychological Monographs 76, 1-23.
Snodgrass, J.G. & Vanderwart, M. (1980). A standardized set of 260 pictures: Norms for name agreement, image agreement, familiarity, and visual complexity. Journal of Experimental Psychology: Human Learning and Memory 6, 174-215.
Torrance, A.P., Gowan, J.C., Wu, J.M. & Aliotti, N.C. (1970). Creative functioning of monolingual and bilingual children in Singapore. Journal of Educational Psychology 61, 72-75.
Troyer, A.K., Moscovitch, M. & Winocur, G. (1997). Clustering and switching as two components of verbal fluency: Evidence from younger and older healthy adults. Neuropsychology 11, 138-146.
Vygotsky, L. (1962). Thought and Language. Cambridge, MA: MIT.


FUNCTIONAL-COGNITIVE CORRELATES OF COMPLEXITY IN THE USE OF THE ENGLISH GERUND-PARTICIPLE WITH PERCEPTION VERBS
M. Angeles Gomez
Leuven University, Belgium

1. INTRODUCTION
It is assumed in the perception literature that the -ing form receives a participial interpretation with physical perception and a gerundial interpretation with cognitive perception. In order to provide the functional-cognitive correlates in the use of the English gerund-participle with perception verbs, two questions have to be taken into account: i) the distinction between gerunds and participles; and ii) the fact that some forms in -ing share gerundial and participial properties. In English, perception verbs can be followed by an NP (Noun Phrase) and an -ing form, as in:

(1) a. I see my father diving into the sea.
    b. I remember my father diving into the sea.

In I see my father diving into the sea the speaker places the emphasis on only part of the process. In I remember my father diving into the sea the main clause subject conceptualises only the internal configuration of the complement event (Hamawand, 2002: 65-66). The construction (NP + -ing form) has mainly been interpreted as two constituents (see note 1) in (1a), which implies that the -ing form functions as a participle, whereas the interpretation as a single unit in (1b) involves an -ing form functioning as a gerund.

i) The distinction between gerunds and participles can be put in the following terms: the more concrete the event, the more participial the -ing structure is, and the more abstract, the more gerundial (Verspoor, 1996: 417-454), as seen in figures 1 and 2.

Figure 1: Gerundial Interpretation

Figure 2: Participial Interpretation

ii) However, whenever the -ing complement is the object of a transitive verb and is preceded by a personal pronoun in the objective case or by an uninflected noun (as in I remember / see my father diving into the sea), the interpretation is unclear (concrete / abstract) and fluctuates with the context:

Figure 3: Gerundial / participial interpretation

Both sentences could be interpreted as:

(2) I see / remember [my father] [diving]
(3) I see / remember [my father diving]

In this study, we intend to provide semantic, syntactic and thematic evidence in favour of the analysis of NP + -ing form both as two constituents and as a single unit. We will support this hypothesis with an analysis of the relationship between physical and cognitive perception verbs. Finally, we will discuss the implications of these findings for the possible readings of the -ing form with perception verbs.

2. HYPOTHESIS AND AIM
We claim that cognitive and physical perception verbs belong to the same semantic field and consequently have some cognitive processes in common. Because of this cognitive resemblance, we expect a similar analysis to hold for both of them. The aim of this study is to show that the construction NP + -ing form allows the same readings (abstract / concrete) with both kinds of perception verbs. To this end, we examine the following three parameters: i) the semantics of the NP + -ing form; ii) its syntactic function with respect to the main verb; and iii) the argument structure of perception verbs.

3. THE RELATIONSHIP OF COGNITIVE AND PHYSICAL PERCEPTION VERBS
According to Givón's classification (1993a, 1993b), physical and cognitive perception verbs belong to the same semantic group: Perception-cognition-utterance (PCU) verbs. The term cognitive perception verb is restricted to verbs having a mental picture of the event depicted by the complement clause (e.g. imagine, recollect, remember, and see in its abstract sense) and implies the conceptualiser's ability to form pictures in her / his mind about what something could be like, something which is not actual before the eye or within experience (Hamawand, 2002: 208).

We briefly present the four cognitive processes these verbs share:
1) An episode of physical or cognitive perception has a limited duration that can be thought of as a temporal viewing frame.
2) There is always some kind of temporal overlap between the main verb and the -ing form: with perception verbs the main-clause process and the -ing complement coincide fully.
3) In perception, the main subject is an observer or an experiencer (Croft, 1993) (represented by a smiley) rather than an agent; the observer is under no obligation to carry out the complement content, and the main verb profiles a perceptual relationship between its subject and the complement scene.
4) In both physical and cognitive apprehension, the -ing form symbolizes a directly and immediately perceived event: the observer construes the event as seen from a very close perspective (this is represented by the presence on stage of the smiley) (Verspoor, 1996: 439).
The following figure illustrates these observations:

Figure 4: Cognitive processes shared by physical and cognitive perception verbs.

In figure 4, a) the inner rectangle stands for the temporal viewing frame; b) a bold straight line represents the portion of the event denoted by the -ing clause; c) the main verb is represented by a horizontal arrow and shows some temporal overlap with the -ing form; d) the smiley symbolizes the observer; and e) the double arrow stands for the two-way causal relation.


4. THE FUNCTIONAL-COGNITIVE CORRELATES IN THE USE OF THE ENGLISH GERUND-PARTICIPLE WITH PERCEPTION VERBS

4.1 The linguistic evidence of the concrete interpretation (participle)

4.1.1 The semantic evidence
From a semantic and syntactic point of view it is clear that these constructions can be treated as an NP filling the object position and an -ing form functioning as its complement:

(4) a. I see my father diving into the sea.
    b. I remember my father diving into the sea.

Semantically, I see / remember my father diving into the sea can entail I see / remember my father.

4.1.2 The syntactic evidence
Syntactically, my father functions as a unit in subject position in passive constructions (my father is seen and my father is remembered). This suggests that the -ing form functions as an object complement.

4.1.3 The thematic evidence
From a thematic point of view, perception verbs have two roles: an Experiencer and a Percept; the latter can be assigned to two semantic entities: an individual or an event. The Canonical Structural Realization of individuals is an NP:

(5) Percept role:
    a. Individual: NP (my father)
    b. Event: NP + -ing form (my father diving)

In I see / remember my father diving, the NP can be considered the sole argument (my father), and the -ing form functions as an adjunct.

4.2 The linguistic evidence of the abstract interpretation (gerund)

4.2.1 The semantics of the NP + -ing form
From a semantic point of view, the sequence NP + -ing form with perception verbs evokes an event that is analyzable as the direct object of the main verb. The direct-object function is characterized semantically as being filled by an element that designates that which is [verb]ed. The NP + -ing form can correspond semantically to that which is / was [verb]ed (Duffley, 1999: 227). In I see / remember my father diving into the sea, that which is seen / remembered is my father diving, not just my father, nor just diving; i.e., the NP + -ing form semantically fulfils the role of direct object in both cases.

4.2.2 The syntactic function of the NP + -ing form

There are various syntactic criteria which corroborate the analysis of NP + -ing form as the direct object of the main verb. Firstly, this construction can be reformulated by means of a genitive or a possessive pronoun; secondly, pseudo-cleft sentences are possible; and in addition one can refer to the construction by means of the pronoun it or that:

(6) a. I see my father diving into the sea.
    b. I remember my father diving into the sea.
    a. I see my father's diving / his diving / the diving of my father.
    b. I remember my father's diving / his diving / the diving of my father.
    a. What I see is my father diving into the sea.
    b. What I remember is my father diving into the sea.
    a. I see it / that.
    b. I remember it / that.

Yet, in the passive the NP and the -ing form do not behave as one constituent, as can be seen in (7):

(7) a. * My father diving is seen (by us).
    b. * My father diving is remembered.

There are two possible explanations for the ungrammaticality of the passive. Firstly, as has been observed by Reuland (1983), the gerund case-marks its subject; in addition, the NP is not a thematic argument on its own, because the argument is the event as a whole (Borgonovo, 1996: 8-9). Secondly, it is likely that the ungrammaticality of these passives has to do with the semantic conditions on passivization, although we do not have an explanation to offer at this point. Under such conditions, the passivization of NP + -ing form seems impossible.

4.2.3 The argument evidence of the NP + -ing form
We have just seen in 4.1.3 that a Percept can be assigned to two semantic entities: an individual or an event. The Canonical Structural Realization of events is typically the gerund (Borgonovo, 1996):

(8) Percept role:
    a. Individual: NP (my father)
    b. Event: NP + -ing form (my father diving)

In I see / remember my father diving, the NP + -ing form functions as a constituent (my father diving), as an internal argument of the matrix verb (see / remember) and, consequently, we can claim that the -ing form has an eventive reading.

5. RESULTS: THE ABSTRACT / CONCRETE READING OF PERCEPTION VERBS


Once the cognitive resemblance of perception verbs has been proved and linguistic evidence for both readings provided, we present the abstract and concrete readings for both physical and cognitive perception verbs. The concrete reading of physical and cognitive perception verbs is shown in the following figures (5) and (6) respectively:

Figure 5: Concrete reading of physical verbs. Figure 6: Concrete reading of cognitive verbs.

In the description of the concrete reading, both 1) the conceptualisation of the event and 2) the role of the main clause subject play an important role. 1) As far as the conceptualisation of the event is concerned, it is evoked as something incomplete: it entails a partial view as something caught at some point between its beginning and its end (Duffley, 1995: 4); consequently, the observer or experiencer views the situation as an ongoing state of affairs. 2) The main clause subject is not fully responsible for the content of the complement clause: the complement clause subject can suspend the action or decide to go out of the viewing frame (this is symbolized by a vertical dashed arrow). In sentences such as I remember my father / diving into the sea and I see my father / diving into the sea, the -ing form can be interpreted as I recall / see my father as he dived, with emphasis on the performer, hence a participle. The abstract reading of physical and cognitive perception verbs is shown in the following figures (7) and (8) respectively:

Figure 7: Abstract reading of physical verbs.

Figure 8: Abstract reading of cognitive verbs.


In the description of the abstract reading, there are two main issues: 1) the conceptualisation of the event; and 2) the role of the main clause subject. 1) In the abstract reading, the -ing complement clause evokes its event as a whole, seen in its entirety; the observer or experiencer conceptualises the internal configuration of the complement event: he / she conceptualises the event as a thing in itself (Duffley, 1995: 5). 2) The main clause subject is fully responsible for the content of the complement clause: he / she can take the initiative in suspending the event complement by stopping the remembrance, the image or the physical perception of it. We can conclude that in sentences such as I remember my father diving into the sea, or I see my father diving into the sea, the -ing form has the following meaning: I recall / see the diving performed by my father, with emphasis on the event, hence a gerund. We could summarize all previous observations in the following way:

(9) I see / remember [my father] [diving]
    Concrete; participial; two constituents; individual

(10) I see / remember [my father diving]
     Abstract; gerundial; one constituent; eventive

When the interpretation is concrete, it is always linked to the syntactic function of participle; the participle and the NP are interpreted as two constituents and have an individual² interpretation as in (9), henceforth concrete reading. In contrast, in an abstract interpretation, the -ing form syntactically functions as a gerund; in addition, the NP and the gerund constitute a single unit and have an eventive interpretation as in (10), henceforth abstract reading.

6. CONCLUSIONS
On the whole, our results suggest that in English both kinds of perception verbs allow the same readings: an abstract reading and a concrete reading. We think that it is precisely a question of first or second logical percept; thematically, the first logical percept of physical perception verbs is an individual and the second an event, whereas for cognitive perception verbs it is the other way around: an event and an individual as first and second logical percept respectively. The literature seems to provide the most logical and immediate reading for each kind of verb, but one should bear in mind the complete picture to understand the mechanism of gerundial and participial property sharing. Although our results are preliminary, we claim that the -ing form with perception verbs (preceded by an NP and having as main verb a physical perception verb) has two possible readings: a concrete reading, hence participle, and an abstract reading, hence gerund (not a prototypical one, as a more detailed study on the ungrammaticality of the passive is needed).

NOTES
1. Kortmann (1995), Quirk et al. (1985), Dirven (1989), Langacker (1991) and others.
2. The percept role of this construction is an individual (NP), and the participle functions as its complement; note that the term individual refers to all perceptible objects, either human entities or things (i.e. I see / remember the sea moving).

REFERENCES
van der Auwera, J. (1990). Coming to terms. Unpublished postdoctoral thesis, University of Antwerp.
Borgonovo, C. (1996). Gerunds and perception verbs. Langues et Linguistique 22, 1-19.
Croft, W. (1993). Case marking and the semantics of mental verbs. In J. Pustejovsky (Ed.), Semantics and the Lexicon (pp. 55-72). Netherlands: Kluwer Academic Publishers.
Dirven, R. (1989). A Cognitive Perspective on Complementation. In D. Jaspers, W. Klooster, Y. Putseys & P. Seuren (Eds.), Sentential Complementation and the Lexicon (pp. 113-139). Dordrecht: Foris.
Duffley, P. J. (1995). Defining the Potential Meaning of the English -ing Form in a Psychomechanical Approach. Langues et Linguistique 21, 1-11.
Duffley, P. J. (1999). The use of the Infinitive and the -ing after Verbs Denoting the Beginning, Middle and End of an Event. Folia Linguistica XXXIII 3(4). Berlin: Mouton de Gruyter.
Givón, T. (1993a). English Grammar. A function-based introduction 1. Amsterdam/Philadelphia: John Benjamins.
Givón, T. (1993b). English Grammar. A function-based introduction 2. Amsterdam/Philadelphia: John Benjamins.
Hamawand, Z. (2002). Atemporal Complement Clauses in English: A Cognitive Grammar Analysis. Muenchen: Lincom Europa.
Kortmann, B. (1995). Adverbial participial clauses in English. In M. Haspelmath & E. König (Eds.), Converbs in Cross-linguistic perspective. Structure and meaning of adverbial verb forms - adverbial participles, gerunds. Berlin: Mouton de Gruyter.
Langacker, R. (1991). Foundations of Cognitive Grammar 2: Descriptive Applications. Stanford, CA: Stanford University Press.
Quirk, R., Greenbaum, S., Leech, G. & Svartvik, J. (1985). A Comprehensive Grammar of the English Language. London: Longman.
Reuland, E. (1983). Governing -ing. Linguistic Inquiry 14, 101-136.
Verspoor, M. (1996). The story of -ing: A subjective perspective. In M. Pütz & R. Dirven (Eds.), The construal of Space in Language and Thought (pp. 417-454).


SPEAKING AND WRITING IN L2 FRENCH: EXPLORING EFFECTS ON FLUENCY, COMPLEXITY AND ACCURACY
Jonas Granfeldt
Lund University, Sweden

1. INTRODUCTION
Writing contrasts with speaking in a number of domains. With respect to processing constraints there are three simple yet important key differences: (i) the stability of the language signal; (ii) the degree of control of the language user over linguistic output; and (iii) the presence or absence of an audience during production (Ravid & Tolchinsky, 2002: 426). The first difference is a prerequisite for the second, the degree of control over the linguistic output. The writer, more than the speaker, can revise and edit the message before submitting it to the receiver. The third difference makes significantly longer planning possible in writing. Taken together, the three differences lead to a situation that typically allows for more control and focus on the message in writing as compared to speaking.

In L2 research it is debated how and to what extent the control, planning or monitoring possibilities affect the linguistic output of language learners. One question is whether the L2 learner can improve his or her performance under beneficial conditions where there is time to reflect and focus, for example, on grammatical form. Another is whether the language learner under such conditions can put to use a more complete inventory of his or her knowledge of the second language. The current discussions on these issues often differentiate between explicit/declarative and implicit/proceduralized knowledge. Somewhat simplified, we could say that if proceduralized knowledge is associated with automatic and fast language processing (Towell et al., 1996), then we can assume that oral production will be mostly influenced by this type of knowledge. There is simply not time to consistently draw on declarative knowledge when speaking. In writing the learner is given more time to plan and monitor the production.
We could hypothesize that, as a consequence, written production will additionally bear features of conscious declarative knowledge of the second language, which might lead to a more complex and more accurate performance. If we put this in a developmental perspective and adopt a version of the interface position with respect to declarative and proceduralized knowledge (Anderson, 1983 and later), we might additionally hypothesize that "recently acquired" linguistic knowledge appears first in written production and only later in oral production (when proceduralized). In this paper the oral-written comparison translates into the question of whether additional planning time and extended monitoring possibilities will bring out declarative knowledge that ultimately can differentiate the grammatical performance of the L2 learners in the two modalities.

Weissberg (2000), in a longitudinal case study of five ESL learners, set out to answer more or less this last question. Weissberg analyzed the syntactic innovation (i.e. the emergence) of different morphosyntactic features in oral and written production. The results showed that writing was the preferred modality for L2 development of syntax. In other words, it was in writing that most syntactic innovations occurred. But Weissberg also found differences between different morphosyntactic features. While regular past morphemes, modal auxiliaries and passives appeared first in writing, irregular verb forms and plurals more often appeared first in speaking. Last, Weissberg found important individual differences and suggested that the learners were driven by a modality preference.

Another recent comparative study, by Håkansson and Norrby (2007), looked at oral and written production within Processability Theory (PT) (Pienemann, 1998). In PT a central explanatory factor is working memory capacity or, more precisely, the limited capacity of this memory. When a beginner learner speaks in order to communicate propositional content, working memory capacity is insufficient to control all grammatical information. This limitation is of course also present when writing, but planning and monitoring might take some of the burden off working memory. The PT stages of development were originally elaborated on oral data and it has not been clear to what extent they are also valid for written data. Therefore, Håkansson and Norrby (2007) tested the same structures on oral and written data from the same learners of Swedish and found that the predictions of PT were followed in both modalities. The additional planning time and the extended possibility to control the written production did not lead to a difference with respect to processability levels. It appears, they say, "[t]hat time alone does not give differences in levels of processability" (Håkansson & Norrby, 2007). This confirms the result of a previous study by Hulstijn & Hulstijn (1984) on Dutch, where four different production conditions were compared. Only in conditions where the learners were explicitly told to focus on form could they change their grammatical production.
Also in the Hulstijn & Hulstijn study, (planning) time alone was not enough to change how the tested structures were produced (for example word order). But Håkansson & Norrby did find an increase in complexity in writing: there was a tendency for the learners to produce more subordinate clauses in written than in oral production. This result also seems consistent with many previous studies looking at planning effects. Yuan & Ellis (2003: 28) summarize a handful of studies and find that there is good evidence for a pre-task planning effect on complexity (measured by subordination ratio) and fluency. For accuracy, however, the results are mixed. Following these results we could expect greater syntactic complexity in written production but not necessarily an increase in accuracy.

A study that provided inferential evidence of higher accuracy rates in written than in oral production is Granfeldt (2005), where the marking of finiteness was analyzed in two groups of Swedish learners of French. Independent linguistic criteria placed the learners at stages two and four on the six-stage scale of Bartning & Schlyter (2004). At stage 2 in spoken learner French we should, according to Bartning & Schlyter (2004), expect between 20 and 30% of non-finite forms (or long forms) like /parle/ /sortir/ where the target language requires a finite verb form (e.g. /parl/ or /sor/), and even at the pre-advanced stage 4 some occurrences of this error should remain. But in written production Granfeldt (2005) found a ceiling effect for this phenomenon emerging already at stage 2, and at stage 4 the learners produced practically no non-finite forms at all in writing. This raised the interesting question of whether, in writing, the learners perhaps performed at a more advanced developmental stage. Since no oral data was available for the same learners, this hypothesis could not, however, be verified.
In the present study data has therefore been collected in both modalities from the same L2 learners.

2. METHOD

The research design was inspired by the one used in the project Developing literacy across genres (Berman & Verhoeven, 2002). The design of the study and some information on the participants are described below.

2.1 Participants
Subjects were recruited at the beginning of the fall semester of 2005 among the students of French at the Department of Romance Languages, Lund University, Sweden. Before the experiment they had had approximately 500 hours of teaching. On the basis of the scores from an in-house placement test, the six subjects were grouped into two subgroups: one group with a lower total score (NonPass group) and one group with a higher total score (Pass group). The subjects filled out a short background questionnaire about their mother tongue, years of studying French, age of onset, length and purpose of visits to French-speaking countries and use of French outside the classroom. The subjects also estimated their keyboard ability on a scale from 1 to 10, where 10 indicated perfect mastery. The result of this self-assessment and other relevant information on the subjects are presented in Table 1.
Table 1: Information on participants. Grammar, VOC I, VOC II and Total are scores on the placement test.

Subject   Group     Sex      Age    Keyboard ability score   Grammar   VOC I   VOC II   Total
1         NonPass   Male     19     9                        21        14      12       47
2         NonPass   Female   23     9                        25        19      4        48
5         NonPass   Female   21     8                        31,5      24      15       70,5
3         Pass      Female   22     4,5                      53        29      18,5     100,5
4         Pass      Female   20     7                        58        32      25       115
6         Pass      Female   21     8                        47        34      23,5     104,5
Mean                         21,0   6,5                      39,3      25,3    16,3     80,9
St.Dev.                      1,4    1,7                      15,5      7,8     7,8      29,8

Legend: VOC I = Vocabulary test part one: translation from French to Swedish (passive vocab); VOC II = Vocabulary test part two: translation from Swedish to French (active vocab). Max score on Grammar = 80, VOC I and II = 40 respectively and max total score = 160.
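The difference between the two groups on the placement test can be checked directly from the scores in Table 1. The sketch below is a minimal illustration that assumes a pooled-variance (Student) independent-samples t-test, a variant the paper does not name; the helper name pooled_t is hypothetical and only the Python standard library is used.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Student's t for two independent samples with pooled variance.

    Returns the t statistic (mean of a minus mean of b) and the degrees of freedom.
    """
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    standard_error = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / standard_error, df


# Placement test scores from Table 1: (NonPass subjects 1, 2, 5), (Pass subjects 3, 4, 6)
scores = {
    "Grammar": ([21, 25, 31.5], [53, 58, 47]),
    "VOC I": ([14, 19, 24], [29, 32, 34]),
    "VOC II": ([12, 4, 15], [18.5, 25, 23.5]),
    "Total": ([47, 48, 70.5], [100.5, 115, 104.5]),
}

for name, (nonpass, passed) in scores.items():
    t, df = pooled_t(nonpass, passed)
    print(f"{name}: t({df}) = {t:.3f}")
```

Under that assumption the four t values agree, to three decimals, with those reported for the placement test (for Grammar, t(4) = -6.081). The p-values are not computed here because they require the t distribution, which the standard library does not provide.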

A series of t-tests revealed significant differences between the groups for all three scores of the placement test and for the total score: for Grammar t(4) = -6,081, p = 0,004; for VOC I t(4) = -3,919, p = 0,017; for VOC II t(4) = -3,136, p = 0,035; and for Total t(4) = -5,848, p = 0,004.

2.2 Design of the study
All subjects produced four texts, two spoken and two written, in two genres². Data collection took place in two sessions. In the first session, all six subjects produced two expository texts. Half of the subjects spoke before writing and half of the subjects wrote before speaking. About a week later, all subjects came back for a second session and the same procedure was repeated but with two narrative texts.

² The study was also designed to evaluate the effect of genre on L2 performance, and production was elicited in two contrasting genres, expository and narrative. Due to limits of space I will, however, only report on the general comparison between the spoken and written data here. The effect of genre was addressed in the poster presentation and the results are available upon request.

2.3 Settings and procedures
The subjects watched a video with different scenes from an ordinary school day. The video is silent but accompanied by music. The scenes have in common that they tell mini-stories about different problems that school children might encounter in school. In all conditions the subjects were told not to focus on form but on meaning. Before producing any text, the subjects were asked to take some time to reflect on their production. The key-stroke logging software ScriptLog (Strömqvist & Malmsten, 1997) was used to collect the written texts. Each subject sat alone in a quiet room. Subjects were told that they could write for 20 minutes. After 15 minutes the experiment leader notified the subjects that 5 minutes remained. The spoken sessions took place in the same room. The experiment leader sat opposite the subject and acted as the listener. The experiment leader was deliberately silent and gave only some short feedback signals to the subject. The idea was to encourage the subjects to produce monologic texts without focusing too much on form and choice of words. The spoken texts were recorded on a computer and transcribed by the experiment leader in the CHAT format (MacWhinney, 2000).

2.4 Measures and analysis
In the first part of the study three sets of analyses were conducted on the data. Five different measures of fluency, complexity and accuracy were applied. These are described and operationalized in the following sections.

2.4.1 Fluency
Fluency measures can reveal how easy it is for the language learner to retrieve, process and produce the second language in real time. In this study, fluency is defined as a rate measure, words per minute.
Fluency in speaking and writing cannot, of course, be compared directly, since speaking in normal adults is about six times as fast as writing, but it seemed important to use the same quantitative measure for both modalities. All incomplete words, all non-French words and all repeated words were excluded. Words in this study thus refers to all meaningful non-repeated French words in the oral production and all words in the final edited text in the written production. Minutes refers to the amount of time, measured in minutes, that the subject spoke or wrote. Fluency was then calculated as the ratio words/minute with the above definitions.

2.4.2 Complexity
Measuring linguistic complexity is a way of defining the degree of variation and sophistication in the learners' productions. In this study I differentiate between grammatical and lexical complexity.

Grammatical complexity is defined here as a ratio, number of clauses per T-unit. The choice of this measure is motivated by the fact that it has been used in a large number of studies and has been found to show a linear relationship with proficiency levels, at least in writing (Wolfe-Quintero et al., 1998). The clause is defined here following Bardovi-Harlig and Bofman (1989). Clauses can be both finite and non-finite. Participle phrases, gerunds and infinitive phrases were all analyzed as clauses. T-unit is defined here following Hunt (1965) as a main clause plus any subordinate clauses. Differently from Hunt, I analyzed punctuated sentence fragments as T-units in writing. In speaking I analyzed prosodically marked fragments as T-units.

Lexical complexity is defined here as a measure of vocabulary diversity, D, developed by Richards, Malvern and colleagues (Richards & Malvern, 2004). This measure is a development of the traditional Type-Token Ratio (TTR). It was designed to solve some problems encountered with the TTR, specifically its sensitivity to sample length. The lexical diversity measure, D, is computed via a specific program in the CLAN toolbox, called vocd. Self-repetitions, code-switches and incomplete words were excluded. In the written production, I spell-corrected words where the proposed spelling did not alter the pronunciation of the word in relation to the norm. Words with a very deviant spelling were excluded. Finally, to avoid confusing this lexical measure with morphological development, all inflected forms were lemmatized before analysis.

2.4.3 Accuracy
In this study accuracy is defined as a ratio, number of errors per T-unit. There are several reasons for this choice. Errors per T-unit was one of the accuracy measures that were found to correlate significantly with overall proficiency scores in the meta-analysis of Wolfe-Quintero et al. (1998). I divided the error analysis into lexical and grammatical errors. The last category contained three subgroups.
Table 2: Error types.

Lexical errors
  Wrong choice of: prepositions; temporal auxiliaries
  Wrong meaning of a word in a particular context

Grammatical errors
  Syntactic errors: omissions (e.g. subjunctions, articles); word order errors
  Tense errors: absence of tense marking (non-finite forms); wrong tenses (passé composé for imparfait or vice versa; present tense for past tense; imparfait for plus-que-parfait)
  Morphological errors: agreement errors (S-V agreement); gender concord errors
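Once a text has been segmented into T-units and annotated, the rate and ratio measures of sections 2.4.1 to 2.4.3 reduce to simple arithmetic. The following is a minimal sketch under stated assumptions: the Token and TUnit records, their field names and the sample values are all hypothetical illustrations, not the annotation scheme or data of the study.

```python
from dataclasses import dataclass


@dataclass
class Token:
    form: str
    incomplete: bool = False   # abandoned word fragment
    french: bool = True        # False for non-French words (code-switches)
    repetition: bool = False   # self-repetition of the preceding word


@dataclass
class TUnit:
    clauses: int               # finite and non-finite clauses in the T-unit
    lexical_errors: int = 0
    grammatical_errors: int = 0


def count_words(tokens):
    """Count meaningful, non-repeated French words, as for the oral data."""
    return sum(1 for t in tokens
               if t.french and not t.incomplete and not t.repetition)


def words_per_minute(tokens, minutes):
    """Fluency: counted words divided by production time in minutes."""
    return count_words(tokens) / minutes


def clauses_per_t_unit(t_units):
    """Grammatical complexity: total clauses divided by number of T-units."""
    return sum(u.clauses for u in t_units) / len(t_units)


def errors_per_t_unit(t_units, kind):
    """Accuracy: errors of one kind ('lexical' or 'grammatical') per T-unit."""
    return sum(getattr(u, f"{kind}_errors") for u in t_units) / len(t_units)


# Invented sample for illustration only.
tokens = [Token("je"), Token("je", repetition=True), Token("pen", incomplete=True),
          Token("pense"), Token("que"), Token("okay", french=False), Token("oui")]
t_units = [TUnit(2, grammatical_errors=1), TUnit(1, lexical_errors=1),
           TUnit(3), TUnit(2)]

print(words_per_minute(tokens, 0.5))
print(clauses_per_t_unit(t_units))
print(errors_per_t_unit(t_units, "lexical"))
```

For the written data, where all words of the final edited text are counted, the same functions apply with no token flagged for exclusion.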

Given the amount of silent morphology in spoken French, it was clear from the beginning that there was a risk of underestimating the accuracy of the written production compared to the oral production. A lot of silent agreement morphology (person, number and gender) could not be scored in speaking. The very same morphology could potentially be scored in written French. But this would risk biasing the results, since it would considerably increase the number of possible contexts for errors in writing compared to speaking. The proposed solution is to score only audible morphological errors in both the spoken and the written data. In the written data this means that subject-verb agreement errors that involved silent morphology were not scored (e.g. *Il fais, *Ils parle) but only errors that would have been heard in spoken production (e.g. *Nous a, *Ils va).

3. RESEARCH HYPOTHESES
The following research hypotheses will be tested:

A. The written and oral productions of the Pass group will be characterized by an overall higher degree of fluency, complexity and accuracy than those of the NonPass group. This would be explained by a developmental effect. Following the results from the off-line placement test, the Pass group could be more advanced than the NonPass group.

B. The written production will be characterized by a higher degree of accuracy and complexity than the oral production in all learners. This would be explained by a planning effect, as discussed in the introduction.

C. The individual grammatical profiles of the written production will include more advanced structures than the grammatical profiles of the oral production. The written production of each learner will consequently be analyzed as reflecting a more advanced stage of development. This would be explained by the extended monitoring possibilities in writing, where the learners can also draw more consistently on declarative knowledge.

4. RESULTS
Tables 3 and 4 present the results for research hypothesis A. In the spoken tasks (cf. Table 3), the descriptive results suggest the expected difference. The learners in the Pass group speak more fluently, use more subordinate clauses per T-unit and use a more diversified vocabulary. As expected, the Pass group also makes fewer errors, both lexical and grammatical. But there is no significant effect of group on any of the measures, probably due to the very small size of each group in this pilot study (N = 2 × 3). Comparing the results in Tables 3 and 4, it is interesting to note that, for some measures, the differences between the Pass group and the NonPass group are leveled out in the written production when compared to the oral production (cf. the higher p-values in Table 4). This seems to be true for the complexity measures. For fluency and lexical errors the relationship is even reversed in the written production. The results suggest that the NonPass group produces more words/minute in writing than the Pass group. This could, however, be due to the greater keyboard ability in the NonPass group (mean self-estimated score of 8,7 compared to 6,5 in the Pass group, see Table 1). More notably, the supposedly more advanced group makes more lexical errors than the less advanced group, but only when writing French. I will come back to this somewhat unexpected result below.
Table 3: Means and standard deviations for fluency, complexity and accuracy in oral production.

Measure                            Pass M   Pass SD   NonPass M   NonPass SD   T        P
Fluency: Ws/min                    84,0     23,9      62,4        20,2         -1,685   0,123
Complexity: Clauses / T-unit       1,8      0,5       1,4         0,2          -1,779   0,106
Complexity: Vocab diversity (D)    52,0     7,5       43,1        9,1          -1,855   0,093
Accuracy: LexErrs / T-unit         0,2      0,1       0,4         0,2          -1,370   0,201
Accuracy: GrammErrs / T-unit       0,4      0,1       0,5         0,5          -1,786   0,486

Table 4: Means and standard deviations for fluency, complexity and accuracy in written production.

Measure                            Pass M   Pass SD   NonPass M   NonPass SD   T        P
Fluency: Ws/min                    8,5      1,9       10,8        3,0          1,554    0,151
Complexity: Clauses / T-unit       1,5      0,3       1,5         0,2          -0,102   0,921
Complexity: Vocab diversity (D)    65,3     21,8      59,5        19,0         -0,493   0,633
Accuracy: LexErrs / T-unit         0,6      0,3       0,5         0,3          -0,594   0,566
Accuracy: GrammErrs / T-unit       0,5      0,2       0,7         0,5          2,249    0,391

In real-time production the two groups seem closer than the initial off-line placement test suggested. There are several possible factors inherent to the design of the study that might explain this. But it is also possible that this result captures some more general difference between off-line language knowledge tests and on-line performance in the same language. Many teachers would probably confirm this. Some learners are simply better at performing on cloze tests, but this does not necessarily mean that they are more advanced language users. Since no measure revealed any significant difference between the two groups, I decided to treat the learners as one group when evaluating research hypothesis B, the effect of modality on the dependent variables of complexity and accuracy. Table 5 presents the relevant results.
Table 5: Means and standard deviations for complexity and accuracy in spoken and written production.

Measure                                  Oral M   Oral SD   Written M   Written SD   T      p
Complexity: Clauses / T-unit             1,6      0,4       1,5         0,2          0,24   0,629
Complexity: Vocabulary diversity (D)     47,5     9,2       62,4        19,8         5,55   0,028*
Accuracy: LexErrs / T-unit               0,3      0,2       0,5         0,3          5,1    0,035*
Accuracy: GrammErrs / T-unit             0,5      0,3       0,6         0,4          0,8    0,373

* = significant at the p < 0,05 level.
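For readers unfamiliar with D, the vocabulary diversity measure in Table 5, the idea behind it can be illustrated in a few lines. D is the single parameter of a curve describing how the type-token ratio falls as the sample grows, TTR = (D/N)[(1 + 2N/D)^(1/2) - 1], and it is estimated by fitting that curve to mean TTRs of random subsamples. The sketch below is a simplified illustration and not the CLAN vocd program: the subsample sizes (35 to 50 tokens), the 100 trials per size, the grid search and all function names are assumptions made for the example.

```python
import random
from math import sqrt


def ttr(tokens):
    """Type-token ratio: distinct word forms divided by total words."""
    return len(set(tokens)) / len(tokens)


def ttr_model(n, d):
    """TTR predicted by the D model for a sample of n tokens."""
    return (d / n) * (sqrt(1 + 2 * n / d) - 1)


def mean_ttr_curve(tokens, sizes=range(35, 51), trials=100, seed=0):
    """Mean TTR of random subsamples (without replacement) for each size.

    The token list must contain at least as many tokens as the largest size.
    """
    rng = random.Random(seed)
    curve = []
    for n in sizes:
        mean_ttr = sum(ttr(rng.sample(tokens, n)) for _ in range(trials)) / trials
        curve.append((n, mean_ttr))
    return curve


def fit_d(curve):
    """Grid-search D from 1.0 to 200.0 in steps of 0.1 for the least-squares fit."""
    best_d, best_err = None, float("inf")
    for tenths in range(10, 2001):
        d = tenths / 10
        err = sum((ttr_model(n, d) - observed) ** 2 for n, observed in curve)
        if err < best_err:
            best_d, best_err = d, err
    return best_d
```

Given a list of lemmatized words of sufficient length, fit_d(mean_ttr_curve(lemmas)) returns an estimate of D. Higher values indicate a more diverse vocabulary, and because the curve is fitted across several sample sizes, the estimate is less dependent on text length than a raw TTR.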


Looking first at the complexity measures, Table 5 shows that vocabulary diversity is significantly higher in writing than in speaking. This is in line with research hypothesis B. On the other hand, the ratio of clauses per T-unit is slightly lower in writing, although the difference is not significant. This result immediately raises two questions. First, there can be qualitative differences in the kind of subordinate clauses used in the spoken and the written production. One possibility is that the learners use a less varied and less advanced set of subordinate clauses in the spoken conditions. In fact, studies on discourse structuring and clause combining have shown that there is at least a qualitative development with respect to types of subordination in learner production (see Bartning & Schlyter, 2004, for a summary). I conclude here that a general measure of grammatical complexity does not reveal any differences between oral and written production, but in a future study the type of subordinate clauses will have to be looked at in more detail. Second, there might be a strong effect of genre on this measure. In the spoken expository texts one can expect to find a lot of relatively simple or formulaic subordinations like je crois/pense que X.

Contrary to hypothesis B, there are more lexical and more grammatical errors in the written production than in the oral production. The difference is significant at the p < 0.05 level for lexical errors. This is an unexpected result, since writing allows for more time to plan the production. I also hypothesized that writing would bring out more declarative knowledge and therefore lead to higher accuracy. The results show, however, the exact opposite of this prediction. I will come back to this in the final discussion.

4.1 Grammatical profiling
Tables 6 and 7 present individual grammatical profiling analyses of the personal narratives in the two groups. The morphosyntactic phenomena taken into account come from Bartning & Schlyter (2004), on the basis of which these authors identified six stages or profiles of development. The analysis was carried out with the Direkt Profil software (Granfeldt & Nugues, 2007). In the tables the phenomena are presented in their ranked order, with the early phenomena at the top and the late, more advanced phenomena at the bottom. The last row of the tables indicates the stage evaluation according to Direkt Profil.
Table 6: Grammatical profiles of personal narratives in speaking and writing, NonPass subjects. (+ = well acquired, target-like-use score above 75%.
Speaking 1 + + + + / + / (+) (+) (+) / 81% (+) / / (+) / 83%
Writing 1 + + + + + + (+) (+) (+) (+) / 80% / (+) (+) / 64% / / (+) / 66% / / (+) (+) 79%

Subjects
Passé composé
(ne) V pas
Subordinates
Modal verb + infinitive
Imparfait of lexical verbs
Finite forms of lexical verbs
Nous V-ons (+)
Ne (V) rien
3P.pl V-ont
Object pronouns
Conditionnel
% D-N gender agreement

5 + + + + + + (+) /

2 + + + + + +

5 + + + + + +

2 + + + + + +


Stage acc. DP 3 3 3-4 3 3 3
(+) = emerged structure but only occasional occurrences and/or errors; / = no occurrences of the structure; Stage acc. DP = stage evaluation according to Direkt Profil.)

Table 7: Grammatical profiles of personal narratives in speaking and writing, Pass subjects.

                                  SPEAKING                WRITING
Subjects                          3      6      4         3      6      4
Passé composé                     +      +      +         +      +      +
(ne) V pas                        +      +      +         +      +      +
Subordinates                      +      +      +         +      +      +
Modal verb + infinitive           +      +      +         +      /      +
Imparfait of lexical verbs        +      +      +         +      +      +
Finite forms of lexical verbs     +      +      +         +      +      +
Ne (V) rien                       /      +      +         /      /      /
3ppl -ont                         (+)    (+)    (+)       /      /      /
Object pronouns                   (+)    +      +         +      +      +
Pluperfect                        (+)    /      +         /      /      +
Futur simple                      (+)    /      (+)       /      /      +
Conditionnel                      (+)    /      (+)       (+)    /      /
3P.pl V-ent                       (+)    (+)    /         /      /      /
Subjonctif                        /      /      (+)       /      /      /
Gérondif                          /      /      (+)       /      /      /
% D-N gender agreement            80%    80%    96%       100%   50%    80%
Stage acc. DP                     4-5    4      4         4-5    3-4    3-4

The analysis shows that, contrary to research hypothesis C, the grammatical profiles of the written production do not generally include more advanced structures than the profiles of the oral production. The automatic stage evaluation of Direkt Profil is practically the same in speaking and in writing for all learners, with the learners in the Pass group being evaluated at approximately one stage of development above the learners in the NonPass group. This is consistent with the fact that the Pass group learners have acquired more advanced structures than the NonPass group (for example, object pronouns are well acquired by all Pass subjects except subject 3, while this is only an emerging structure in the NonPass group).

Although there is no general effect of modality here either, there may be individual differences. Especially in the Pass group, there is a tendency for the learners to use more advanced structures in oral production. This is the case for subjects 3 and 4, cf. Table 7. This result recalls the study of Weissberg (2000) discussed in the introduction. Weissberg found that learners have modality preferences when it comes to morphosyntactic constructions. It might be that the preferred modality for learners 3 and 4 is speaking. This in turn could be part of the explanation as to why the CAF differences between the two groups of learners were leveled out in writing (see the discussion of Tables 3 and 4 above). If writing was the dispreferred modality for two out of three learners in the Pass group, this can explain why the NonPass learners caught up with them in writing.

5. SHORT SUMMARY AND DISCUSSION


In this study I have analyzed the effect of modality on two sets of dependent measures: traditional CAF measures and grammatical profiles. In the context of CAF studies, the modality comparison could translate into a study of the effect of on-line planning. Previous studies (see Yuan and Ellis, 2003) have found a positive planning effect on complexity and fluency but only mixed results for accuracy. In this study, I found that lexical complexity, measured as vocabulary diversity, significantly increased in writing, but there was no effect on grammatical complexity, measured by a subclause ratio. Furthermore, and contrary to expectation, the learners produced more errors in writing than in speaking.

A first step in explaining this result is to consider some intervening factor(s) in writing. Two candidates come to mind. First, spelling might be a problem here. French has, like English, a deep orthography with a highly complex relationship between the oral and written systems. In the final product I have consistently neutralized this factor, since I spell-corrected all written texts before analysis (see Method). But it is possible that in the process of writing this factor had a greater influence than I originally thought. If the learners devoted much attention to spelling, this might have distracted them from formal grammatical aspects. This will be looked at in a future study.

A second possibility was discussed in relation to the results of the second analysis, of individual grammatical profiles. It was found that the grammatical profiles were not more advanced in the written than in the oral production. The extended possibilities to monitor and draw on declarative knowledge did not make the learners produce at a more advanced stage of development in writing (as evaluated by the Direkt Profil software). This result confirms Håkansson & Norrby's (2007) study on learners of L2 Swedish within Processability Theory.
But interestingly, some individual differences suggested that learners can have a modality preference (Weissberg, 2000) when it comes to morphosyntactic constructions. This might then be the second factor explaining why at least some learners did not perform better in writing.

REFERENCES
Anderson, J.R. (1983). The Architecture of Cognition. Cambridge, MA: Harvard University Press.
Bardovi-Harlig, K. & Bofman, T. (1989). Attainment of syntactic and morphological accuracy by advanced language learners. Studies in Second Language Acquisition 11, 17-34.
Bartning, I. & Schlyter, S. (2004). Itinéraires acquisitionnels et stades de développement en français L2. Journal of French Language Studies 14, 281-299.
Berman, R. & Verhoeven, L. (2002). Cross-linguistic perspectives on the development of text-production abilities in speech and writing. Parts 1 and 2. Special issue of Written Language and Literacy 5 (2).
Granfeldt, J. & Nugues, P. (2007). Evaluating stages of development in second language French: a machine-learning approach. In Proceedings of the 16th NODALIDA conference, Tartu, Estonia, 28 to 30th May, 2007.
Granfeldt, J. (2005). Direkt Profil et deux études sur la morphologie verbale et les stades de développement. In Granfeldt, J. & Schlyter, S. (Eds.), Acquisition et production de la morphologie flexionnelle. Actes du Festival de la morphologie. Petites études Romanes de Lund Extra Seriem (PERLES) 20 (pp. 65-85). Institut d'études romanes de Lund, Université de Lund.
Hulstijn, J. & Hulstijn, W. (1984). Grammatical errors as a function of processing constraints and explicit knowledge. Language Learning 34, 23-43.
Hunt, K.W. (1970). Syntactic maturity in school children and adults. Monographs of the Society for Research in Child Development 53 (134). Chicago: University of Chicago Press.
Håkansson, H. & Norrby, C. (2007). Processability Theory applied to written and oral L2 Swedish. In F. Mansouri (Ed.), Second language acquisition research: theory-construction and testing (pp. 81-94). Cambridge, UK: Cambridge Scholars Press.
Malvern & Richards et al. (2004). Lexical diversity and language development: quantification and assessment. New York: Palgrave MacMillan.
MacWhinney, B. (2000). The CHILDES project: Tools for analyzing talk. Mahwah, NJ: Lawrence Erlbaum Associates.
Pienemann, M. (1998b). Language Processing and Second Language Development: Processability Theory. Amsterdam/Philadelphia: John Benjamins.
Strömqvist, S. & Malmsten, L. (1997). ScriptLog Pro User's manual. Göteborg University, Department of Linguistics. http://www.scriptlog.net/
Ravid, D. & Tolchinsky, L. (2002). Developing linguistic literacy: a comprehensive model. Journal of Child Language 29, 417-447.
Towell, R., Hawkins, R. & Bazergui, N. (1996). The development of fluency in advanced learners of French. Applied Linguistics 17, 84-119.
Yuan, F. & Ellis, R. (2003). The effects of pre-task planning and on-line planning on fluency, complexity and accuracy in L2 monologic oral production. Applied Linguistics 24, 1-27.
Weissberg, B. (2000). Developmental relationships in the acquisition of English syntax: writing vs. speech. Learning and Instruction 10, 37-53.
Wolfe-Quintero, K., Inagaki, S. & Kim, H.-Y. (1998). Second language development in writing: measures of fluency, accuracy and complexity. Technical Report 17. Second Language Teaching & Curriculum Center, University of Hawaii.


FLUENCY AND ACCURACY IN THE WRITTEN PRODUCTION OF L2 FRENCH
Cecilia Gunnarsson
University of Lund, Sweden

1. INTRODUCTION

The aim of this study is to examine the relation between fluency and accuracy in the written production of L2 French. Research in L2 acquisition has long focused on learners' oral production. In the last few years, though, an interest in written production has emerged, and it has resulted in two models for written production in L2 (Zimmermann, 2000; Wang & Wen, 2002).

In a longitudinal study of 30 months, we followed the written production of 5 guided learners of L2 French, aged 16 to 19. The learners' computer-written production was recorded with the program ScriptLog (Strömqvist & Malmsten, 1997) together with a video-recorded think-aloud protocol (TAP). This methodology allows us to follow the written production in real time. The ScriptLog protocols and the TAPs provide the material for the analysis of the development of linguistic proficiency.

In a previous study we hypothesised a general development leading to more fluency and complexity (Gunnarsson, 2006). This hypothesis was not confirmed. Instead we observed important inter-individual differences among the 5 learners. One group of learners produced written L2 French with more fluency and less verbalised reflection in the TAPs, whereas the other group produced with less fluency and more verbalisations. Considering each learner's limited cognitive capacities (Fayol, 1994), we expected those who produced more fluently to show less complexity and vice versa. Such a simple relation between fluency and complexity could not be confirmed: we observed it in some learners but not in general. This lack of relation could be explained by the fact that writers in L2 are more preoccupied with the formulation process, where the ideas get their verbal form, than with the planning process, where the ideas are generated (Zimmermann, 2000, see his model in Figure 1). Compared to writers in L1, writers in L2 tend to rephrase more frequently (see the shaded middle section of the model in Figure 1). In the rephrasing, modification (Mod), repetition (Rep) and simplification (Simplify) are the techniques the writer uses to try out different tentative forms (Tent Form) during the formulation process.

[Figure 1: flow diagram running from PLAN to WRITE. Node labels: (Tent Form L1); Tent Form Other L2; (L2 Problem Solving); Tent Form1; Mod Tent Form1; Rep Tent Form1; Simplify; Tent Form2; Tent FormS; Tent FormN; Evaluation; Postpone; Reject; Accept.]
Figure 1: The formulation process in L2 according to Zimmermann (2000)
Legend: Underlining = pre-text; italics = our addition to the Zimmermann model; shading = (largely) L2-specific.

If we use the Zimmermann (2000) model for written production as a theoretical basis and consider the observations concerning the predominance of low-level linguistic aspects when writing in L2, we are more likely to find a relation between fluency and accuracy than between fluency and complexity in the data from our longitudinal study of 5 Swedish senior high school learners writing in L2 French. During the formulation process, emphasised by the model, L2 writers are more preoccupied with vocabulary, orthography and grammar (Barbier, 1997: 96) than with aspects linked to complexity. As the processes treating these low-level aspects are not as implicit in L2 writing as they are in L1 writing, writing is slower in L2 (Barbier, 1997: 218). Furthermore, compared to oral production in L2, which is considered to give evidence of the learner's implicit knowledge (Towell et al., 1996), written production offers the possibility of using explicit knowledge during production. As a matter of fact, writing is 5 to 8 times slower than speaking in the same individual (Fayol, 1997: 10). In this way written production facilitates the use of explicit knowledge and explicit control of the output, which might affect both fluency and accuracy.

In the TAPs from the writing sessions we get access to some of the explicit knowledge used when writing in L2 French. For this reason the verbalisations during the writing process are considered in the present study. We will mainly consider their frequency and their nature, i.e. the extent of L1 use (Wang & Wen, 2002; Zimmermann, 2000) and repetitions of tentative forms already tried out in L2 (Zimmermann, 2000, see Figure 1).

In order to write fluently you must plan what to write and how to write it almost in parallel with the writing. A fluent production is considered to reflect the use of implicit knowledge (Chenoweth & Hayes, 2001; Towell et al., 1996). In a senior high school L1 writer the use of implicit knowledge generally leads to an accurate text in terms of vocabulary, orthography and grammar, i.e. the low-level aspects preoccupying L2 writers of the same age. For L2 learners this is probably not the case. To be accurate in the low-level aspects of the text, learners would probably also need to use explicit knowledge for some of these aspects, especially when writing in L2 French, where pronunciation gives no cue to many of them, subject-verb agreement for example.
Considering the inter-individual differences observed in Gunnarsson (2006), we will here pursue the individual perspective and propose the following hypothesis:

Hypothesis: In written production, where the use of control and explicit knowledge is facilitated, we expect to find a relation between fluency and accuracy. We expect the more fluent production to be less accurate and the less fluent production to be more accurate.

2. METHOD

2.1 Participants

Participants in this study were 5 Swedish senior high school pupils studying French as an L2. The 5 learners, 4 girls and 1 boy, were selected from the same group of 15 learners, all in their 4th year of French studies and their first year of senior high school. They were selected on three main criteria: 1/ they were planning to study French until finishing senior high school; 2/ their competence in French was sufficient to produce written text; 3/ they accepted and managed the TAPs in front of a video camera. Furthermore, they were a quite homogeneous group with respect to the individual differences often referred to in the literature (e.g., Larsen-Freeman & Long, 1991; Lightbown & Spada, 1993). They were the same age, i.e. 16 years at the beginning of the study and 19 at the end, and they had the same educational level. They came from a similar social background, i.e. middle class, more or less well-off. All of them were preparing a senior high school exam aiming at university studies. Although all five were very motivated for school, their motivation for French varied somewhat. Three of the participants majored in languages (Christine, Emelie and Martine), one in natural sciences (Oscar) and one in social sciences (Sophie). The participants' linguistic level in French was evaluated using the developmental stages proposed by Bartning and Schlyter (2004). According to this evaluation the linguistic level of Christine, Emelie and Martine was slightly higher than that of Oscar and Sophie, especially at the end of the study, when Christine, Emelie and Martine had reached a higher stage than the other two (see Gunnarsson, 2006: 71-74 for details).

2.2 Data collection

The written production in this study was produced exclusively on a computer. The data were collected using ScriptLog, developed by Strömqvist and Malmsten (1997), a computer programme which records all keyboard activity in real time (Strömqvist & Ahlsén, 1999). The programme gives access not only to the final text but also to every prior version with its changes, corrections and revisions. The ScriptLog recordings were combined with a video-recorded think-aloud protocol (TAP). During the recordings the participants were alone with the computer and the video camera. The instruction given each time was the simplest possible: "Pronounce all your thoughts aloud while writing". There was no training session, but there was a pilot study, also included in the study, in which 15 learners were tested and the 5 learners who responded best to the experimental environment and met some other parameters were chosen (see Participants above). The longitudinal study of 30 months comprises 6 recording sessions in all, one each term, except the 4th term, which comprises two sessions, one before and one after an exchange trip to France. Two different narrative tasks in L2 French were recorded at each session (see note 3).
As our interest in the study was mainly in the linguistic processes (grammar, vocabulary) and the cognitive cost involved in these processes, we opted for the simplest possible writing task, knowledge telling (Scardamalia & Bereiter, 1987), which demands no planning other than "What comes next?". All the recordings were 20 minutes long.

3 In order to have material for comparing written production in L2 with that in L1, the task of retelling Little Red Riding Hood was given first in L2 and then, at the next recording session, in L1 (for the results of the comparative study, see Gunnarsson, 2006).

2.3 Data analysis

The ScriptLog logfiles enable us to follow the writing process in real time, to the hundredth of a second, keystroke by keystroke. In order to analyse the writing process in even greater detail we transcribed the TAPs directly into the ScriptLog logfiles. Example 1 is an extract from one of those combined ScriptLog logfiles and TAP protocols:

Example 1: Emelie 5 [il placer -> met]

time     from  to    key        verbalisation
603.50   517   517   .          oj jag missuppfattade hela historien där sorry (.) han hittar ingen fluga han lägger ut den [oops I've misunderstood the whole story sorry (.) he doesn't find a fly he puts it there]
629.45   308   308   <MOUSE>
630.13   308   308   <DELETE>
630.28   307   307   <DELETE>
630.50   306   306   <DELETE>
630.80   305   305   <DELETE>
631.26   304   304   <DELETE>
631.73   303   303   <DELETE>
632.28   302   302   p
632.46   303   303   l
632.70   304   304   a
634.05   305   305   c
634.21   306   306   e
634.41   307   307   r
643.18   336   336   <MOUSE>
648.46   308   308   <MOUSE>
648.98   308   308   <DELETE>
649.20   307   307   <DELETE>
649.43   306   306   <DELETE>
649.66   305   305   <DELETE>
650.16   304   304   <DELETE>
650.66   303   303   <DELETE>
651.83   302   302   m
652.05   303   303   e
652.30   304   304   t
661.30   333   333   <MOUSE>

Verbalisations from 629.45 to 661.30, in order of occurrence: XX; placera [to place]; pla cer; (..) kanske (..) [maybe]; nej han lägger ju mettre [no he puts mettre]; är det bäst han gör [he had better do that]; meT; je tu il inget, S S inget [je tu il nothing, S S nothing]

Legend: The first column shows the time in seconds. The second and third columns locate the keystroke in the text. The fourth column indicates which key is typed and the fifth column contains the TAP transcription. Swedish is given in normal font and what is said in French in italics. Our translation from Swedish into English is given within square brackets. A capital letter indicates that the learner has pronounced normally silent letters or spelled them out. The bold parts in the protocol indicate the location of a pause.
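The position and key columns of a logfile such as Example 1 can be replayed to see how a stretch of text changes keystroke by keystroke. The Python sketch below is only an illustration under stated assumptions: it assumes that each position is a 0-based cursor offset recorded before the keystroke, that <DELETE> behaves like a backspace and that <MOUSE> only moves the cursor. The function name replay and the toy sentence are hypothetical; they are taken neither from ScriptLog nor from the learner data.

```python
def replay(text, events):
    """Apply (position, key) events to a starting text and return the result.

    Assumptions (not confirmed by the source): positions are 0-based cursor
    offsets recorded before the keystroke, <DELETE> removes the character to
    the left of the cursor, and <MOUSE> only moves the cursor.
    """
    buffer = list(text)
    for position, key in events:
        if key == "<MOUSE>":
            continue  # cursor move only; the next event carries its own position
        if key == "<DELETE>":
            del buffer[position - 1]  # backspace
        else:
            buffer.insert(position, key)  # ordinary character
    return "".join(buffer)


# Hypothetical toy sentence: the infinitive "placer" is replaced by "met".
events = [
    (9, "<MOUSE>"),
    (9, "<DELETE>"), (8, "<DELETE>"), (7, "<DELETE>"),
    (6, "<DELETE>"), (5, "<DELETE>"), (4, "<DELETE>"),
    (3, "m"), (4, "e"), (5, "t"),
]
print(replay("il placer la mouche", events))
```

The descending positions of the six deletions followed by ascending positions for the typed letters mirror the pattern visible in Example 1, which is why a backspace reading of <DELETE> is assumed here.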

We analysed the participants' texts in order to establish their fluency, the importance of the verbalisations and their accuracy. To establish fluency we tried several measurements used in other studies (see e.g. Wolfe-Quintero et al., 1998; Chenoweth & Hayes, 2001). In this article we will only give an account of the fluency measurement that we found the most dependable (Gunnarsson, 2006), i.e. bursts of proposition, used by Chenoweth and Hayes (2001) for analysing written production by American students in L2 German and French (see note 4). A burst is the number of written words produced between two pauses (see note 5). It was therefore important to establish a threshold for pausing. The pauses were individually determined through an analysis of the participants' written production in L1, where the participants had achieved L1 expertise for the low-level linguistic aspects we were interested in, i.e. implicit production in L1. We established an individual inter-word pausing threshold for each participant. Figure 2 shows the data leading to the establishment of the inter-word threshold at 1.5 seconds for the participant Christine. To find a threshold, we used the 2-second pause proposed by ScriptLog as a starting point, to be modified by the individual inter-word pausing pattern in L1. The data in Figure 2 come only from the intra-phrase context. Pauses between phrases are of course longer, as the writer tends to plan what to write next.

4 A similar measurement is also used in the analysis of fluency in oral L2 production, see for example Towell et al. (1996).
5 Chenoweth and Hayes (2001), who worked with TAPs, based the bursts on the oral segments verbalised aloud before writing.
[Figure 2: plot of inter-word pause durations. Vertical axis: Duration (sec), from 0 to 6; horizontal axis: from 0 to 120.]
Figure 2: Individual distribution of pauses between words within clauses in the L1 written production of Christine. (The length of every inter-word pause in a phrasal context is indicated in seconds.)
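The burst measure described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually used in the study: the input format (one pair of preceding pause and word weight per written word), the function names and the sample values are assumptions made for the example. The weight 0.5 for an interrupted word follows the counting rule given in the legend of Figure 3, and a pause is taken to close a burst when it is at least as long as the individual threshold (1.5 seconds for Christine).

```python
from statistics import mean


def split_into_bursts(words, pause_threshold):
    """Return the size of each burst, in words.

    words: list of (pause_before_seconds, weight) pairs in writing order;
           weight is 1.0 for a complete word and 0.5 for a word that was
           started but interrupted (hypothetical input format).
    A pause at or above pause_threshold closes the current burst.
    """
    bursts = []
    current = 0.0
    for pause_before, weight in words:
        if pause_before >= pause_threshold and current > 0:
            bursts.append(current)
            current = 0.0
        current += weight
    if current > 0:
        bursts.append(current)
    return bursts


def mean_words_per_burst(words, pause_threshold):
    """Mean number of words per burst, or 0.0 when nothing was written."""
    bursts = split_into_bursts(words, pause_threshold)
    return mean(bursts) if bursts else 0.0


# Invented sample: seven words, one of them interrupted, two long pauses.
sample = [(0.0, 1.0), (0.4, 1.0), (0.9, 1.0), (2.1, 1.0),
          (0.3, 0.5), (1.8, 1.0), (0.6, 1.0)]
print(split_into_bursts(sample, 1.5))
print(mean_words_per_burst(sample, 1.5))
```

Keeping the threshold as a parameter reflects the fact that it was set separately for each participant on the basis of their L1 pausing pattern.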

We did not conduct a special test to evaluate the typing skills of each student, but they were all quite experienced in writing on a computer, and possible inter-individual differences in typing skills do not affect the comparisons of written production in L2 French within the same individual over the longitudinal study. Furthermore, the chosen measurement, bursts of proposition, is not very sensitive to typing skills.

As for the importance of the verbalisations, we evaluated it by a simple count of the words in the TAPs. We then calculated the ratio of the number of verbalised words to the number of words in the edited text. The nature of the verbalisations is evaluated by two criteria. The first is based on Zimmermann's (2000) observation that repetition of a tentative form is common in the formulation process in L2 (see Figure 1). Here we calculated the extent of repetitions of tentative forms in L2. The second criterion is based on Wang and Wen (2002) and Zimmermann (2000) and concerns the extent of L1 use when writing in L2.

Accuracy is accounted for in terms of correct subject-verb agreement in the present tense and correct choice of past tense. For subject-verb agreement we only consider the singular in the group of verbs where the participants had the greatest problems, i.e. verbs in -ir, -re and -oir, which comprise both regular and irregular verbs (Gunnarsson, 2006: 149). For the past tense we consider the choice between passé composé and imparfait, which is quite complicated for a Swedish learner, as the Swedish preterite is used in both passé composé (aoriste) and imparfait contexts (see Kihlstedt, 1998: 24-37, for details).
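The three verbalisation measures and the accuracy percentages are all simple proportions. The Python sketch below shows one way to compute them. The function names and the counts in the usage lines are invented for illustration and are not data from the study.

```python
def safe_ratio(numerator, denominator):
    """Divide two counts, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def verbalisation_ratio(spoken_words, edited_tokens):
    """Spoken words in the TAP per token of the final edited text."""
    return safe_ratio(spoken_words, edited_tokens)


def l2_repetition_share(repeated_l2_words, total_l2_words):
    """Share of the verbalised L2 words that repeat a tentative form."""
    return safe_ratio(repeated_l2_words, total_l2_words)


def l1_share(l1_words, total_spoken_words):
    """Share of all verbalised words that are spoken in the L1."""
    return safe_ratio(l1_words, total_spoken_words)


def percent_correct(correct_forms, total_contexts):
    """Accuracy as a percentage of all relevant contexts."""
    return 100 * safe_ratio(correct_forms, total_contexts)


# Invented counts for one hypothetical recording.
print(verbalisation_ratio(spoken_words=300, edited_tokens=150))
print(l2_repetition_share(repeated_l2_words=30, total_l2_words=120))
print(l1_share(l1_words=180, total_spoken_words=300))
print(percent_correct(correct_forms=13, total_contexts=20))
```

Returning 0.0 for an empty denominator is a design choice of this sketch; it keeps a learner who never verbalises in the L2, or who produces no relevant verb forms, from raising an error.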


3. FINDINGS

We have hypothesised that the possibility of control and use of explicit knowledge facilitated by the written production mode will influence fluency and accuracy. We expect to find that more fluent production is less accurate and less fluent production is more accurate. This will be examined on an individual basis. To examine the expected use of explicit knowledge during the formulation process in L2, we also take the verbalisations from the TAPs into consideration.

3.1 Fluency

In order to compare data that are as similar as possible for the developmental aspect of fluency in the learners, we here only give an account of the task "Telling a Memory". This task was given on four occasions, the first time in the second recording session and the last time in the last recording session, which means that 2 years passed between the first and the last of these recordings. In Figure 3, we show the development of fluency from the first (Memory 1) to the last (Memory 4) of the four recordings.

[Figure 3: diagram. Vertical axis: mean number of words / burst; horizontal axis: Memory 1 and Memory 4; one series per learner: Christine, Emelie, Martine, Oscar, Sophie.]
Figure 3: Fluency in bursts of proposition.
Legend: When calculating the number of words per burst, words started but interrupted were counted as 0.5 words. Elided forms such as d'argent and c'est are counted as one word. In the diagram, black is used for the three learners at the higher level of L2 French and grey for those at the lower level.

Figure 3 shows that some learners have a more fluent production than others. Furthermore, it is the learners with the most fluent written production in the first recording, i.e. Christine, Emelie and Sophie, who also show the largest increase in fluency during the study. Among these three learners, Christine stands out by almost doubling her fluency. The increase in the other two is more modest. Two learners, Martine and Oscar, remain at almost the same low level of fluency during the whole study. Oscar's fluency even decreases from the first to the last recording. We will now look at the data from the verbalisations to see whether they can inform us about the observed differences in fluency.

3.2 Verbalisation

According to Zimmermann (2000) and Barbier (1997), L2 writers tend to spend more time reflecting on how to write, i.e. on low-level linguistic aspects such as vocabulary, orthography and grammar, during the formulation process. These reflections involve explicit knowledge and are time-consuming. We therefore expect the less fluent learners to verbalise more than the more fluent learners. We will here give an account of the extent and nature of the verbalisations in the TAPs. In Figure 4 we can clearly see that the first recording gives the opposite picture of the fluency diagram. The learners who have less fluency, Martine and Oscar, have more verbalisations. In the last recording, even though she does not gain much in fluency, Martine seems to verbalise less, whereas Oscar doubles his verbalisations compared to the first recording. We also notice that the three more fluent learners show a lower frequency of verbalisation. Christine has the lowest frequency and keeps it very low. In Emelie we notice a decrease and in Sophie a slight increase.

[Figure 4: diagram. Vertical axis: words / edited token, from 0 to 6; horizontal axis: Memory 1 and Memory 4; one series per learner: Christine, Emelie, Martine, Oscar, Sophie.]
Figure 4: The importance of verbalisation.
Legend: The number of spoken words in the TAPs is divided by the number of edited tokens in the final version of the written text. For word counting and colour coding, see Figure 3.

When we compare the frequency of verbalisations with fluency, the two seem to be related. The most fluent participant, Christine, verbalises the least and the least fluent, Oscar, verbalises the most, and so on. We will now try to find out more about the nature of the verbalisations. We will look for two behaviours: the repetition of a tentative form, observed by Zimmermann (2000), and the use of L1 when writing in L2, observed by Zimmermann (2000) and Wang and Wen (2002).
Table 1: The nature of the verbalisations: repetition of L2 and use of L1.

             Rep L2 / L2    σ       L1 / Tot    σ
Christine    0,21           0,03    0           0
Emelie       0,25           0,09    0,69        0,09
Martine      0,56           0,10    0,64        0,05
Oscar        0,71           0,05    0,53        0,08
Sophie       0,21           0,07    0,84        0,02

Legend: The proportion of repetitions of tentative formulations in L2 and of the use of L1 is given as the result of simple divisions: the number of repeated words in L2 divided by the total number of words in L2, and the number of words in L1 divided by the total number of verbalised words; σ = standard deviation.

The data in Table 1 are the mean values of all four recordings of the "Telling a Memory" task. We could discern an increase or a decrease in the frequency of verbalisations in some learners, whereas the extent of repeated L2 and of L1 use remains at about the same level across all the recordings of the same learner. According to these data, the two groups that differ in fluency also differ in the repetition of tentative forms in L2. Martine and Oscar, the less fluent learners, tend to repeat two (Martine) or three (Oscar) times as often as the others. They use repetition to control what they are about to write or have just written. When it comes to the use of L1 while reflecting on what and how to write (Zimmermann, 2000: 87), we cannot separate one group from the other. Instead one learner, Christine, clearly stands out from the others. In the TAPs, Christine does not use the L1 at all while writing in L2 French. She speaks only French throughout all 12 recordings in the study and she never discusses aloud in L1 what and how to write. She is also by far the most fluent writer in L2. The other learners use their L1 in the TAPs from about 50% (Oscar, 0,53) to more than 80% of the time (Sophie, 0,84).

The data on fluency and verbalisations indicate the existence of an individual production mode. Every learner seems to have his/her own mode when producing in L2. Christine is the most extreme, with the greatest fluency, the fewest verbalisations and no L1 at all, whereas Oscar has the least fluency, the most frequent verbalisations and a lot of L1. We will now see whether fluency in written production has an impact on accuracy, as our hypothesis predicts.

3.3 Accuracy

Accuracy is first evaluated in terms of correct subject-verb agreement in the group of verbs in -ir, -re and -oir (see note 6). This is the group of regular and irregular verbs where the learners have the greatest difficulty finding the correct suffix in the singular throughout the longitudinal study. As the number of occurrences is quite low in some learners, the first diagram (Figure 5) shows the percentage of correct agreement for the verbs in the group across all the recordings. This is also why we will not take the developmental perspective into account when discussing accuracy.
6 The most frequent verbs avoir, être and faire are excluded from this group.

[Figure 5: diagram. Vertical axis: correct agreement in %, from 0 to 100; one bar per learner: Christine, Emelie, Martine, Oscar, Sophie.]
Figure 5: Accuracy in subject-verb agreement.
Legend: The proportion of correct subject-verb agreement in the singular in the verbs in -ir, -re and -oir is given as a percentage of the total production of subject-verb agreement for those verbs. The lighter colours are used for the learners at the lower linguistic level.

In Figure 5 our hypothesis seems confirmed. Subject-verb agreement is more often correct in the learners who produce with less fluency and more verbalisations, Martine and Oscar, than in the other learners at the same linguistic level of L2 French. Among the learners at the higher level of L2 French, Martine has the highest percentage of correct forms (86%), compared to Christine (56%) and Emelie (65%). Among the learners at the lower level the picture is the same: Oscar has the highest percentage of correct forms (70%), compared to Sophie (58%). Our second evaluation tool for accuracy is the choice between passé composé and imparfait, known to be difficult for Swedish learners. The diagram in Figure 6 shows the percentage of correct choices between passé composé and imparfait in all four "Telling a Memory" texts, where the past tense was elicited.
[Figure 6: diagram. Vertical axis: correct choice of the past tense in %, from 0 to 100; one bar per learner: Christine, Emelie, Martine, Oscar, Sophie.]
Figure 6: Correct choice in the past tense.
Legend: The proportion of correct choices between passé composé and imparfait is given as a percentage of the total number of contexts with the two tenses. For the colour coding, see Figure 5.

Once again the two learners Martine and Oscar are more accurate than the other learners at the same linguistic level. Nevertheless, when it comes to the choice between passé composé and imparfait, the linguistic level seems to have a greater impact than fluency. Christine, Emelie and Martine all have a more correct production than Oscar and Sophie. In the group of learners at the higher linguistic level in L2 French, the most fluent learner, Christine, is the least accurate in agreement, although her production in the past tense is very accurate. Indeed, the difference in accuracy between Christine (93%) and Martine (97%), the least fluent in this group, is small. As fluency and verbalisations do not have the same impact on the choice of past tense as on correct subject-verb agreement, we will take into account the nature of the past tense in the learners' texts. That is, we will evaluate the degree of variation between passé composé and imparfait. To be sure that the learners are at a level of L2 French where the two tenses are normally used in their contexts (see Bartning and Schlyter, 2004), we will only take the three learners at the higher level into consideration.
[Figure 7: diagram. Vertical axis: imperfective context in %, from 0 to 60; one bar per learner: Christine, Emelie, Martine.]
Figure 7: Variation between passé composé and imparfait.
Legend: The variation is given as the percentage of contexts with imparfait.

Figure 7 shows a great difference between the more fluent learners and the least fluent learner. The least fluent learner, Martine, has hardly any variation in her past-tense texts. She concentrates exclusively on passé composé, a conscious choice according to the TAPs. When there is a context requiring imparfait in her texts (3%), she uses the present tense instead. Her only concern is to tell what happened in passé composé (foreground), not to describe what it was like in imparfait (background). The more fluent learners, Christine and Emelie, give both foreground and background information. They seem eager to communicate to the reader both the events and how they experienced them. Their texts therefore alternate between passé composé and imparfait. Furthermore, these two learners use imparfait in its correct contexts rather than the present.


4. CONCLUSION AND DISCUSSION

We find a relation between fluency and accuracy, but it is not as simple as we hypothesised. The written production of the less fluent learners is quite accurate as to subject-verb agreement. It seems that these learners use explicit knowledge and control to a greater extent than the more fluent learners. When it comes to the past tense, the production of the less fluent learner is slightly more accurate, but we must add another parameter to accuracy: variation between passé composé and imparfait. When we study the least fluent learner's texts from this angle, we see that they are more uni-dimensional in terms of foreground (passé composé context) and background (imparfait context). The least fluent learner seems to concentrate on one of the two tenses, passé composé, and communicates no personalised background information to the reader, whereas the more fluent learners give the reader more subjective background information in imparfait.

In the two cases where accuracy is studied, fluency and accuracy do seem related. Nevertheless, the nature of the relation appears to vary according to the morphosyntactic phenomenon studied. The observed differences between more and less fluent learners, and their consequences for accuracy and variation in the past tense, seem to relate to the learners' different use of explicit and implicit knowledge when producing written text in L2 French. The observed differences in the treatment of subject-verb agreement and the past tense could illustrate the hypothesis of Krashen (1981) and Pica (1985) that simple structures, in this case subject-verb agreement, are easier to learn/produce in an explicit way (less fluent production), and that more complex structures, in this case the choice between passé composé and imparfait, are easier to learn/produce in an implicit way (more fluent production) (see note 7). Learners using more implicit knowledge seem to be more at ease when producing more complex structures, whereas learners using more explicit knowledge tend to play it safe and concentrate on simple structures, or simplify the production of the complex ones (see Martine's unidimensional use of passé composé). This is likely to have an impact on the individual learner's acquisition of the L2. As these observations come from a very small sample of L2 French learners, we propose more research on individual production or learning styles and their impact on the acquisition of different grammatical phenomena.

7 Krashen and Pica use the term "rules" (see Housen et al., 2005 for a discussion).

REFERENCES

Barbier, M-L. (1997). Rédaction de texte en langue première et en langue seconde: Indicateurs temporels et coût cognitif. Thèse de doctorat, Université de Provence.
Bartning, I. & Schlyter, S. (2004). Itinéraires acquisitionnels et stades de développement en français L2. Journal of French Language Studies 14, 281-299.
Chenoweth, N. A. & Hayes, J. R. (2001). Fluency in Writing: Generating Text in L1 and L2. Written Communication 18(1), 80-98.
Fayol, M. (1994). From declarative and procedural knowledge to the management of declarative and procedural knowledge. European Journal of Psychology of Education 9(3), 179-190.

Fayol, M. (1997). Des idées au texte. Paris: P.U.F.
Gunnarsson, C. (2006). Fluidité, complexité et morphosyntaxe dans la production écrite en FLE. Thèse de doctorat, Études romanes de Lund 78, Université de Lund.
Housen, A., Pierrard, M. & Van Daele, S. (2005). Structure complexity and the efficacy of explicit grammar instruction. In A. Housen & M. Pierrard (Eds.), Investigations in Instructed Second Language Acquisition (pp. 235-269). Berlin: Mouton de Gruyter.
Kihlstedt, M. (1998). La référence au passé dans le dialogue: Étude de l'acquisition de la temporalité chez des apprenants dits avancés de français. Thèse de doctorat, Forskningsrapporter 6, Stockholms Universitet.
Krashen, S. D. (1981). Second language acquisition and second language learning. New York: Prentice Hall.
Larsen-Freeman, D. & Long, M.H. (1991). An Introduction to Second Language Acquisition Research. London: Longman.
Lightbown, P. M. & Spada, N. (1993). How Languages are Learned. Oxford: Oxford University Press.
Pica, T. (1985). Linguistic simplicity and learnability: Implications for language syllabus design. In K. Hyltenstam & M. Pienemann (Eds.), Modelling and Assessing Second Language Acquisition (pp. 137-151). Clevedon: Multilingual Matters 18.
Scardamalia, M. & Bereiter, C. (1987). Knowledge telling and knowledge transforming in written composition. In S. Rosenberg (Ed.), Advances in applied psycholinguistics, vol 2: Reading, writing, and language learning (pp. 142-175). Cambridge: CUP.
Strömqvist, S. & Ahlsén, E. (1999). The process of writing: A progress report. Gothenburg: Gothenburg Papers in Theoretical Linguistics 83.
Towell, R., Hawkins, R. & Bazergui, N. (1996). The Development of Fluency in Advanced Learners of French. Applied Linguistics 17(1), 84-119.
Wang, W. & Wen, Q. (2002). L1 use in the L2 composing process: An exploratory study of 16 Chinese EFL writers. Journal of Second Language Writing 11, 225-246.
Wolfe-Quintero, K., Inagaki, S. & Kim, H-Y. (1998). Second language development in writing: Measures of fluency, accuracy and complexity. Honolulu: University of Hawaii.
Zimmermann, R. (2000). Writing: subprocesses, a model of formulating and empirical findings. Learning and Instruction 10, 73-79.


THE EFFECTS OF PRE-TASK PLANNING ON L2 NARRATIVE TASKS
Haemoon Lee & Miyoung Oh
Sungkyunkwan University, South Korea

1. INTRODUCTION The present study aims at examining the effect of planning time that Skehan (1998) proposed as one of the composing variables of task that can be manipulated for the three aspects of interlanguage performance: complexity, accuracy and fluency. According to him, learners attempt to produce the highly complex structure in their interlanguage is the driving force of L2 development because the growing degree of complexity is the sign that the learner is trying the cutting-edge of his/her interlanguage system. Accuracy, on the other hand, was proposed as playing the role of consolidating the existing interlanguage system and fluency was proposed as representing and contributing to automatic access to the current interlanguage system. Skehan (1998) and Robinson (2001) propose that while the learners use the target language, their proficiency improves through the combination of the three aspects. They suggest that different tasks contribute to different aspects among the three and that having planning time prior to the task performance is one variable that affects the three aspects of interlanguage performance. Therefore, in the present study, the second language learners language output is analyzed in terms of its complexity, accuracy and fluency when the planning time was given and when not in a narrative task. 2. REVIEW OF LITERATURE Previous studies have investigated the effects of planning from various perspectives: task type (Skehan & Foster, 1996), length of planning time (Mehnert, 1998), different operationalizations of planning such as detailed planning, undetailed planning, and no planning (Foster & Skehan, 1996), the effects of planning time depending on different proficiency of learners (Wigglesworth, 1997). They suggest that planning time is effective in the task-based instruction in general. 
However, regarding a trade-off among the three aspects of language output (complexity, accuracy and fluency), some studies (Mehnert, 1998; Ortega, 1999) show a trade-off between complexity and accuracy in particular, while others show that the two aspects are achieved together (Robinson, Ting & Urwin, 1996; Wigglesworth, 1997). The trade-off between complexity and accuracy has been reported by several studies. For example, in Mehnert's (1998) study of the length of planning time, although the overall effect of planning was confirmed, the author suggested that complexity and accuracy were not achieved together: 1-minute planners were much more accurate than non-planners, with no difference between 1-minute and 10-minute planners, whereas complexity improved only after 10 minutes of planning. The author therefore concluded that when minimal planning time is given, it seems to be allotted first to form, that is, accuracy, and that when more planning time is available, learners' attention turns to complexity. Ortega (1999) reports the effect of

planning was significantly large in fluency and complexity but not in accuracy, suggesting a trade-off between complexity and accuracy. Though both Mehnert (1998) and Ortega (1999) suggest a trade-off between complexity and accuracy as an effect of planning, the two studies report conflicting findings regarding which of the three aspects is improved by planning time, probably because of differences in the kind or difficulty of the tasks and in the level of the subjects. More fine-tuned findings on the trade-off effect have been reported with regard to task difficulty. In Song, Jeong Won's two studies (2005a, 2005b) of Korean adult learners, the planning effect occurred only in the unfamiliar task, which was therefore more difficult than the familiar task, and only for the complexity of the learner output. The slightly different findings of two similar studies, Foster and Skehan (1996) and Skehan and Foster (1997), also suggest a trade-off between complexity and accuracy, with difficult tasks resulting in improved complexity and easy tasks in improved accuracy. In Skehan and Foster (1997), the task was to narrate the story represented by cartoon strips with a clear story line. The results showed that in the narrative task, planning time was associated with fluency and accuracy, but not with complexity. In contrast, in Foster and Skehan (1996) the narrative task was to make up a story from a semi-related series of pictures with no obvious story line, which is more cognitively demanding than the task in Skehan and Foster (1997). The learners in the more demanding task in Foster and Skehan (1996) improved most in complexity, not accuracy.
The difference between the two narrative tasks lies in the cognitive load: subjects in Foster and Skehan (1996) had to create a story and seem to have given priority to story creation over other aspects of their language such as accuracy and fluency. By comparison, the learners with the clearly sequenced cartoon strips may have been less burdened by content because the story line was clear, and seem to have focused more on accuracy than on complexity. What the two studies have in common is the trade-off between complexity and accuracy, though which aspect improves varies with the difficulty of the task. Therefore, it is concluded that in task performance, as linguistic complexity increases, grammatical accuracy tends to decrease. In sum, the studies above agree that complexity and accuracy do not improve simultaneously. Depending on the cognitive load that each narrative task places on the learners, there seems to be a priority in how the planning time is used. On the other hand, unlike the above studies, Robinson, Ting and Urwin (1996) proposed that more difficult tasks lead learners to produce less fluent but more complex and accurate output, defining task difficulty as the composite result of the cognitive load of the content, planning time, and the prior information that the learner has about the task. In their study, the results partially supported their hypothesis. That is, in the difficult task, a there-and-then narrative, there was greater lexical density (one of the measures of complexity) and also greater accuracy, though there was no difference between the hard and easy task groups in fluency or in complexity as measured by subordinate clauses. In this study, accuracy and one type of complexity were thus achieved simultaneously, supporting their prediction. A similar finding to Robinson et al.
(1996) was reported by Wigglesworth (1997): high-level learners benefited from planning time in fluency, complexity and accuracy more in the difficult tasks than in the easier tasks. The low-level subjects showed no difference between the planning and non-planning conditions. The two studies above suggest

that there may be no trade-off between complexity and accuracy, particularly when the tasks are difficult and/or the level of the learners is high. In sum, planning seems to affect learners' speaking performance by reducing the cognitive load that the task imposes and easing the limits of the learners' cognitive capacity. However, it is uncertain whether planning time improves fluency, accuracy and complexity all together or with some trade-off, given that learners have a limited capacity. In particular, the controversy over the trade-off between complexity and accuracy is an important issue, not only because of the conflicting findings but also because these are the two most important mechanisms for interlanguage expansion, in comparison to fluency, which is mostly achieved by automatization through repetition. Both are needed for SLA, but it is not clear whether they can be achieved simultaneously given the limited cognitive capacity of human beings, let alone whether they should be. Whereas Robinson (2001) proposed that they can be and so should be, Skehan (1998) suggests that they are subject to a trade-off because of that limited capacity. Based on the literature reviewed so far, the present study formulated the following research questions:

1. Does planning time have an effect on the complexity, accuracy and fluency of Korean adult learners' output in a narrative task?
2. If so, is there any trade-off among the three aspects of the language output?

3. RESEARCH METHOD

3.1 Participants

The participants of the study were twenty-four students attending one of the major universities in Seoul, Korea. They were in their second year at the university and were volunteers from among the members of the same extra-curricular activity club.
They had different majors, included both sexes, and had no experience of studying English abroad. Of the 24 participants, 12 were the subjects of the study: six were the speakers in the planning group, and the other six were the speakers in the non-planning group. The other 12 participants served as listeners to the stories, each paired with a speaker subject. They were allowed to ask questions when needed. Since the subjects did not have any standardized English test scores, they were judged to be low-intermediate learners according to the ACTFL guidelines, on the basis of their speech performance in the task.

3.2 Task and procedure

The narrative task type was chosen for the study. A sequence of six pictures was used as the material, providing a visual stimulus for the subjects, as shown in APPENDIX A. It was taken from an ESL textbook used in a private foreign language institute. Since the book had been used as a textbook for an intermediate-level class, the material was not considered difficult for the subjects, and it was considered to give them opportunities to be creative in describing the pictures because the story was about aliens from outer space. The speaker and listener were seated face to face. In the planned condition, the speakers looked at the picture strip for 10 minutes to prepare

for what they were going to say. In the unplanned condition, the speakers told the story as soon as they looked at the pictures. The output of the subjects was audio-recorded and transcribed. Each interaction lasted five to ten minutes.

3.3 Measures

The recorded oral output was transcribed and analyzed in terms of complexity, accuracy and fluency, with measures partly adopted from Song, Jeong Won (2005a, b). Complexity was measured by subordinate clauses per T-unit and the total number of words per T-unit; accuracy was measured by error-free T-units and the number of errors per T-unit; fluency was measured by the ratio of the length of pauses to the length of the total production and the number of pauses per word. Regarding accuracy, T-units with no error and T-units with one error were both counted as error-free T-units. Grammatical errors, wrong word choices, and incorrect word order were counted as errors, but incorrect article use was not, because article errors so far outnumbered the other types that they could have masked differences in the other types of errors. With regard to fluency, only pauses of more than 1 second were considered significant. Pause length was recorded in whole seconds: a pause between 1 and 2 seconds was counted as 1 second, a pause between 2 and 3 seconds as 2 seconds, and so on.

3.4 Analyses

For complexity, the total number of words within T-units was divided by the number of T-units to give the average length of T-unit, and the number of subordinate clauses was divided by the total number of T-units to give the average number of subordinate clauses per T-unit. For accuracy, the number of error-free T-units was divided by the total number of T-units to give the ratio of error-free T-units, and the total number of errors was divided by the total number of T-units to give the average number of errors per T-unit.
For fluency, the total length of pauses was divided by the total length of output production to give the ratio of pause length to total production length, and the number of pauses was divided by the total number of words to give the average frequency of pauses per word. The statistical analysis of the data was performed using SPSS version 10.0.

4. RESULTS

The narratives spoken by the twelve subjects were transcribed into 1547 words in total, excluding hedges and fillers such as uhm, uh, and well. The non-planned group produced 594 words in total, an average of 99 words per subject. The planned group produced 953 words in total, an average of 159 words per subject. The t-test showed that the difference in the number of words produced approached significance (p = .057), with the planned group producing more, as shown in Table 1.
Table 1: The Number of Words Produced in Total.

Group         Total N. of words   Mean N. of words   SD      t-score   p-value (two-tailed)
No planning   594                 99                 16.57   -2.149    0.057
Planning      953                 159                66.16
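The t-score and p-value in Table 1 can be approximately recomputed from the reported summary statistics. The following is a minimal sketch, assuming an equal-variance independent-samples t-test with six speakers per group; the source only states that a t-test was run in SPSS, so the test variant is an assumption, and the function names are illustrative. The p-value is obtained by numerically integrating the Student t density rather than by calling a statistics library.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic and df, assuming equal variances."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value using Simpson's rule on [0, |t|]; steps must be even."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 <= T <= |t|)
    return 1 - 2 * central_area


# Summary statistics from Table 1 (assumption: mean = total words / 6 speakers).
t, df = pooled_t(594 / 6, 16.57, 6, 953 / 6, 66.16, 6)
p = two_tailed_p(t, df)
print(f"t = {t:.3f}, df = {df}, p = {p:.3f}")
```

Under these assumptions the result should land close to the values reported in Table 1; small differences are expected because the table gives rounded standard deviations.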


The data in Table 2 show that the planned group performed better in all three aspects of the learners' output. According to the descriptive statistics, the planned group produced more complex language in terms of the length of T-unit (9.83 vs. 7.80, p = .182) and the rate of subordination (0.24 vs. 0.12, p = .282). The planned group also produced more accurate language in terms of error-free T-units (0.56 vs. 0.49, p = .564) and the number of errors per T-unit (0.40 vs. 0.60, p = .099*). The planned group was more fluent than the non-planned group in terms of the number of pauses per word (0.11 vs. 0.22, p = .052*) and the ratio of pause length to total production length (0.16 vs. 0.27, p = .054*). However, the inferential statistics showed that none of the measures differed significantly between the two groups at the level of p < .05, though differences are suggested at the level of p < .1. Therefore, the comparison between groups has been interpreted at the tendency level, p < .1. At that level, the effect of planning tends to appear in accuracy as measured by the number of errors per T-unit and in the two measures of fluency, but not in either of the complexity measures. Therefore, in answer to Research Question 1, regarding the effect of planning on the complexity, accuracy and fluency of the learners' output, the effect of planning tended to appear in accuracy and fluency, but not in complexity. In answer to Research Question 2, regarding a possible trade-off among the three features of output, the study showed that a trade-off seems to have occurred between complexity and accuracy/fluency, since only accuracy and fluency tended to be better in the planned group, though the raw numbers suggest that complexity could also potentially be improved by planning.

Table 2: Complexity, Accuracy and Fluency in Planning and Non-Planning Groups (N = number, L = length, Sub = subordination, *p < .1)

Aspect       Measure                 Group         N   Mean   SD     t-score   p-value (two-tailed)
Complexity   N. words/T-units        No planning   6   7.80   1.86   -1.43     0.18
                                     Planning      6   9.83   2.93
             N. sub/T-units          No planning   6   0.11   0.09   -1.17     0.28
                                     Planning      6   0.23   0.23
Accuracy     Error-free/T-units      No planning   6   0.49   0.20   -0.59     0.56
                                     Planning      6   0.56   0.18
             Errors/T-units          No planning   6   0.59   0.19   1.8       0.09*
                                     Planning      6   0.40   0.19
Fluency      N. pause/Total words    No planning   6   0.21   0.10   2.2       0.05*
                                     Planning      6   0.11   0.03
             L. pause/Total length   No planning   6   0.27   0.11   2.18      0.05*
                                     Planning      6   0.16   0.04
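The two-threshold reading of the results (significance at p < .05, tendency at p < .1) can be made explicit with a short sketch. The p-values are the two-tailed values given in the Results text; the dictionary layout, labels and function names are illustrative assumptions, not part of the original analysis.

```python
# Two-tailed p-values as reported in the Results text (planning vs. no planning).
REPORTED_P = {
    "complexity": {"words per T-unit": 0.182, "subordination per T-unit": 0.282},
    "accuracy": {"error-free T-units": 0.564, "errors per T-unit": 0.099},
    "fluency": {"pauses per word": 0.052, "pause length ratio": 0.054},
}


def classify(p, alpha=0.05, tendency=0.10):
    """Label a p-value as significant, tendency-level, or not significant."""
    if p < alpha:
        return "significant"
    if p < tendency:
        return "tendency"
    return "n.s."


def measures_at_tendency(reported):
    """For each aspect, list the measures that reach at least tendency level."""
    return {
        aspect: [name for name, p in measures.items() if classify(p) != "n.s."]
        for aspect, measures in reported.items()
    }


labels = {
    name: classify(p)
    for measures in REPORTED_P.values()
    for name, p in measures.items()
}
summary = measures_at_tendency(REPORTED_P)
for aspect, names in summary.items():
    print(aspect, "->", names or "no measure below p < .1")
```

With these inputs no measure is labeled significant, and only one accuracy measure and the two fluency measures reach the tendency level, which is the pattern the text interprets as a possible trade-off with complexity.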


5. DISCUSSION

In this study, planners produced more fluent and more accurate speech (p < .1), although the gain in accuracy was not confirmed by both measures. Although the planners' speech was also more complex, the difference between non-planners and planners did not reach even the tendency level. Therefore, in the present study, the effect of planning was interpreted as present only in accuracy and fluency, not in complexity. The present study suggests a potential trade-off between complexity and accuracy/fluency, because accuracy and fluency were higher for the planners than for the non-planners whereas complexity was not. The learners in the present study seem to have used the planning time to increase accuracy/fluency rather than complexity. This adds to the findings of a number of previous studies that support a general trade-off between complexity and accuracy/fluency. As for why complexity was not improved by planning, the difficulty level of the task seems relevant. Among the previous studies reviewed, Skehan and Foster's (1997) structured narrative task and Song, Jeong Won's (2005a, b) narrative tasks with a familiar story were easier than the less structured narrative task and the task with the unfamiliar story, respectively. In both cases, complexity in the easy tasks was not affected by planning; only accuracy and/or fluency improved. Though it is not clear whether the task in the present study was difficult in any absolute terms or in relation to the students' level, in comparison to the tasks in other studies one possible explanation is that the clearly sequenced pictures made the task easy for the learners, reducing the cognitive burden of creating the story itself.
In this sense, the findings of the present study can be said to be in line with those of Skehan and Foster (1997) and Song, Jeong Won (2005a, b), which showed that the trade-off exists in both difficult and easy tasks, though in different patterns. On the other hand, regarding the position of Wigglesworth (1997) and Robinson et al. (1996) that the trade-off disappears when high-level learners perform difficult tasks, the present study provides no relevant data because task difficulty was not built into the research design as an independent variable. Task difficulty was proposed as dependent on the cognitive load of the content, planning time, and the prior information that the learner has about the task (Robinson, 2001). Whereas the first two are learner-external, the learner's prior information is learner-internal and includes the learner's proficiency level. Therefore, the difficulty of a task is determined by the relative match between task-inherent factors and the learner's level. That is, a task that is difficult for one learner could be easy for another. Further studies of task difficulty in relation to learner level seem to be needed in order to examine the proposals of Robinson et al. (1996) and Robinson (2001). Given that the present study could not treat task difficulty as an independent variable, the subjects' narratives were examined more closely to investigate whether narratives with more creative detail are also more complex, on the assumption that adding creative detail to a bare picture description, a naturally occurring voluntary behavior of the learners, makes the task more complex and difficult.

Planned Subject Number 2: descriptive narrator

At one night uh, three aliens uh, found the earth.
And they reached America. Uh, they found a

house in America. And they went went there. And in the house, two uh people, um a couple was in there. And they saw the UFO. So they were uh very surprised and scared. They uh but they thought uh it is uh they thought is is curious. So they went out and see the alien. And surprisingly the alien could tell uh speak um English. So they so they could communicate. And and then the three uh went to the house. And the alien was drinking a coke. And the couple was drinking coffee. And the alien even slept in the house. And the next morning uh, the alien uh could uh the alien tell told the man to go out with man.

Planned Subject Number 3: creative narrator

Yeah, uh I thought I thought it is about the alien travel like trip because the UFO you know the UFO is not broken and even they're back away. So so it seems to be has traveling the earth. And the ordinary old couple uh maybe I thought they have don't have uh child, because they look old. So they seems to be a good friend because the ordinary couple uh maybe may be they were lonely. So alien is food friend to them but he last picture seem to be the old man is exhausted. Uh maybe maybe the alien wanted too much about something like uh, something like playing. That's all about the story. It's end of my speech.

The transcripts above are two sample narratives: the first from a subject who narrates the story descriptively, the second from a subject who is creative in explaining the story behind the pictures. As shown in Table 3, the creative narrative was more complex and less accurate than average, whereas the descriptive narrative was the opposite, though the fluency of both narratives was near average. The creative narrative was above average in complexity, at 12.44 words per T-unit and 0.56 subordinate clauses per T-unit, whereas the descriptive narrative was below average, at 7.0 words per T-unit and 0.06 subordinate clauses per T-unit.
Accuracy showed the opposite pattern: the creative narrative was below average, at 0.44 error-free T-units per T-unit and 0.67 errors per T-unit, whereas the descriptive narrative was above average, at 0.69 error-free T-units per T-unit and 0.19 errors per T-unit (see APPENDIX B for the analyzed transcription).
Table 3: The Analysis of Two Sample Narratives (N = number, L = length, Sub = subordination).

                          Complexity                         Accuracy                            Fluency
Subject                   N. words/T-unit   N. sub/T-unit    Error-free/T-unit   Errors/T-unit   N. pauses/Total words   L. pause/Total length
Planned 2 (descriptive)   7.0               0.06             0.69                0.19            0.11                    0.11
Planned 3 (creative)      12.44             0.56             0.44                0.67            0.10                    0.14
Group average             9.83              0.23             0.56                0.40            0.11                    0.16

Therefore, it is interpreted that the creative story involved more subordinate clauses and more words per T-unit, resulting in higher complexity, while accuracy was not sustained. In contrast, the descriptive narrative was delivered through relatively accurate but less complex coordinate clauses. This cautiously suggests that learners may perform narrative tasks differently depending on whether the story is narrated creatively or descriptively, which in turn affects the degree of complexity and accuracy in a trade-off manner, probably because of the resulting difficulty of the task.


Story creation was interpreted as increasing cognitive load in the previous studies (Skehan, 1998) that distinguished semi-related pictures from the structured picture sequence task. The difference between semi-related pictures and a structured picture sequence is considered a matter of degree rather than a mutually exclusive classification. In this respect, the picture sequence used in the present study allows the speaker to be as creative as he or she wishes. Most of the subjects in the present study, however, were not very creative and kept faithfully to describing the pictures. Therefore, one reason why there was no significant difference in complexity between the two groups seems to be that most of the subjects treated the task as a descriptive one, without attempting to be creative. The analysis supports the view, suggested by many previous studies, that a complex or difficult narrative requires more complex language. Unlike in the previous studies, the support in the present study is qualitative. The two sample scripts were chosen from among the twelve as the most contrasting in terms of creativity/descriptivity, though the judgment of creativity was rather intuitive, based on whether parts of the student's production were not explicitly shown in the pictures. Standards for judging creativity are one issue that needs further study if the relationship between the degree of creativity and the complexity of the language involved is to be investigated systematically. On the other hand, such differences in creativity/descriptivity in storytelling might be a matter of individual difference; that is, individual students may have a stable tendency to be creative or descriptive when they tell a story. If so, how such individual differences are incorporated into the language learning process would be an interesting research question for future study.
Also, controlling for such individual differences, if they exist, and having the students perform both creative and descriptive storytelling for comparison would be another meaningful direction for future study.

6. LIMITATIONS OF THE STUDY AND IMPLICATIONS FOR TEACHING

Regarding the limitations of the present study, the measures used here may have affected the results. For complexity, this study used the number of subordinate clauses per T-unit, which is the general measure, and the length of T-unit, but type-token ratio can also be measured, as in Robinson et al. (1996). For fluency, this study considered only pauses, without considering replacement and repetition as in Song (2005a) and Foster and Skehan (1996). More comprehensive measurement of a larger data set would have provided a clearer picture of the impact of planning. There have been many studies of planning time as a pre-task activity in the task-based approach to second language learning. Many have shown that planning time before a task helps learners prepare for the task and achieve its goal. The present study adds to such findings, though not all three aspects of learner language improved significantly with planning. There was a tendency toward a trade-off between complexity and accuracy/fluency. However, it is not clear whether the trade-off occurred because the task was not difficult enough relative to the learners' proficiency level, in support of Robinson et al. (1996), or because trade-offs occur in any case owing to the limited cognitive capacity of human beings, as argued by Skehan (1998). Though the ideal type of task is one that promotes all three aspects of learner output, studies to date rarely report such a finding. Therefore, in practice, tasks would have to be designed to target different aspects of the language output.
On the other hand, fine-tuning the match between the level of tasks and that of the learners should be studied further to examine the possibility of maximizing human cognitive capacity

so that all three aspects can improve simultaneously. Meanwhile, as suggested in this study, encouraging the students to be creative seems to be one way to increase the complexity of their interlanguage performance.

REFERENCES

Foster, P., & Skehan, P. (1996). The influence of planning and task-type on second language performance. Studies in Second Language Acquisition 18, 299-323.
Mehnert, U. (1998). The effects of different lengths of time for planning on second language performance. Studies in Second Language Acquisition 20, 83-108.
Ortega, L. (1999). Planning and focus on form in L2 oral performance. Studies in Second Language Acquisition 21, 109-148.
Robinson, P., Ting, S., & Urwin, J. (1996). Investigating second language task complexity. RELC Journal 26, 62-79.
Robinson, P. (2001). Task complexity, cognitive resources, and syllabus design: A triadic framework for examining task influences on SLA. In P. Robinson (Ed.), Cognition and second language instruction (pp. 287-318). Cambridge: Cambridge University Press.
Skehan, P., & Foster, P. (1997). Task type and task processing conditions as influences on foreign language performance. Language Teaching Research 1(3), 185-211.
Skehan, P. (1998). A cognitive approach to language learning. Oxford: Oxford University Press.
Song, J. (2005a). Task-processing conditions as influences on spoken language. English Teaching 60(3), 117-137.
Song, J. (2005b). The spoken performance of advanced learners in a narrative task. Foreign Language Education 12(4), 85-106.
Wigglesworth, G. (1997). An investigation of planning time and proficiency level on oral test discourse. Language Testing 14, 85-106.


APPENDIX A Picture Sequence for the Narrative Task

APPENDIX B Analyzed Transcription of Two Sample Narratives

< >: error-free T-unit (including +1 error)
[ ]: subordinate clause
( ): pause in seconds
alien (italic): error

Planned-2

<At one night uh, three aliens uh, found the Earth.> <And they reached America.> <Uh, they found a house in America.> <And they went went there.>(1) And in the house (2), two uh people, um a couple was in there. <And they saw the UFO.>(1) <So they were uh very surprised and scared.>(1) They uh but they thought uh it is uh (1) they thought [it is curious]. So they went out and see the alien. <And surprisingly the alien could tell uh speak um English.> <So they (2) so they could communicate.>(1) <And and then (1) the three uh went to the house.> <And (1) the alien (2) was drinking a coke.> And the couple was drinking coffee. <And the alien even slept in the house.>(1) <And the next morning uh, the alien uh could uh the alien tell told the man (1) to go out with man.>

T-units: 16
Subordinate clause: 1, S/T=0.06
Words: 112, W/T=7.0
Errors: 3, E/T=0.19
Error-free T: 11, EF/T=0.69
No. of pauses: 12, P/W=0.11

Planned-3

Yeah, uh (1) I thought I thought it is about the alien travel like trip [because the UFO you know [the UFO is not broken and even (1) they're back away]].(1) So (1) so it seems to be has traveling the earth.(1) <And the ordinary old couple uh maybe I thought [they have don't have uh child],(1) [because they look old].> So they seems to be a good friend [because the ordinary couple uh maybe may be they were (1) lonely]. So alien is food friend to them. But he last picture seem to be the old man is exhausted.(1) <Uh (1) maybe maybe the alien wanted too much about (1) something like uh, (1) something like playing.> (3) <That's all about the story.> <It's end of my speech.>

T-units: 9
Subordination: 5, S/T=0.56
Words: 121-8=112, W/T=12.44
Errors: 6, E/T=0.67
Error-free T: 4, EF/T=0.44
N. Pauses: 11, P/W=0.10
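The ratios listed for the two sample narratives follow directly from the raw counts by the divisions described in Section 3.4. A minimal sketch of that arithmetic is given below; the class and function names are illustrative assumptions, and the counts are copied from the summaries above (with 112 taken as the word count for Planned-3, as used in the reported W/T).

```python
from dataclasses import dataclass


@dataclass
class NarrativeCounts:
    """Raw counts for one transcribed narrative."""
    t_units: int
    subordinate_clauses: int
    words: int
    errors: int
    error_free_t_units: int
    pauses: int


def caf_ratios(c):
    """Complexity, accuracy and fluency ratios, rounded to two decimals."""
    return {
        "W/T": round(c.words / c.t_units, 2),                 # words per T-unit
        "S/T": round(c.subordinate_clauses / c.t_units, 2),   # subordination per T-unit
        "E/T": round(c.errors / c.t_units, 2),                # errors per T-unit
        "EF/T": round(c.error_free_t_units / c.t_units, 2),   # error-free T-unit ratio
        "P/W": round(c.pauses / c.words, 2),                  # pauses per word
    }


planned_2 = NarrativeCounts(t_units=16, subordinate_clauses=1, words=112,
                            errors=3, error_free_t_units=11, pauses=12)
planned_3 = NarrativeCounts(t_units=9, subordinate_clauses=5, words=112,
                            errors=6, error_free_t_units=4, pauses=11)

for label, counts in (("Planned-2", planned_2), ("Planned-3", planned_3)):
    print(label, caf_ratios(counts))
```

The ratio of pause length to total production length is left out because the total speaking time of each narrative is not given in the appendix.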


EFL LEARNER PERFORMANCE VARIATION AS THE EFFECT OF INTERLOCUTOR TYPE. Haemoon Lee, Kyungjin Joo, Jungwon Moon & Yunsun Hong Sungkyunkwan University, South Korea

1. INTRODUCTION

English learners in Korea largely tend to prefer native speakers (hereafter NS) of English to non-native speakers (hereafter NNS) as English teachers, because native speakers provide authentic target language input, and so learners think that native speakers would be much more efficient interlocutors for improving their overall English proficiency (Lee, 2003; McDonough, 2004). In reality, however, NNS English teachers outnumber NS teachers in Korea, and most EFL learners in Korea interact with other learners in group or pair activities in communicatively oriented classrooms. Therefore, the present study compares the effects of these three types of interlocutor. Whereas previous studies examined the effect of interlocutor type on various interactional features such as meaning negotiation, pushed output production and focus on form (Lyster & Ranta, 1997; Mackey, Oliver, & Leeman, 2003; Pica & Doughty, 1985a, 1985b; Varonis & Gass, 1985), the present study analyzes the learners' language output in terms of its complexity, accuracy and fluency within Skehan's (1998) framework, so that the researchers can gain more direct access to the effect of interlocutor type on learners' IL output performance.

2. REVIEW OF THE LITERATURE

Though surveys show that EFL learners' preference for NS interlocutors is the norm (McDonough, 2004), empirical studies report diverse findings. Regarding meaning negotiation, Varonis and Gass (1985) reported that negotiation of meaning is most prevalent in NNS-NNS interaction, whereas Pica, Lincoln-Porter, Paninos, and Linnell (1996) argued that interaction between L2 learners can address some of their input, feedback, and output needs but does not provide as much modified input and feedback as interaction with native speakers does.
Regarding the amount of output, the studies of Zuengler and Bent (1991) and Selinker and Douglas (1985) suggest that expertise in a certain content domain makes learners participate in a conversation more actively regardless of the type of interlocutor, resulting in more output than the interlocutor produces. Regarding types of teacher in particular, Árva and Medgyes (2000) characterized NS teachers as less committed, focusing on fluency, meaning, and colloquial registers, and above all, as tolerating students' errors. NNS teachers, on the other hand, were more committed and stricter, focusing on forms, accuracy and formal registers and correcting students'

errors. In addition, Milambiling (1999) emphasizes the advantages of NNS teachers in bringing a variety of background experiences, attitudes and openness to their teaching situations in an EFL setting. She adds that NNS teachers have a clear advantage in certain teaching situations because they have had the experience of learning English themselves. Also, they can understand the students' needs and problems, both linguistic and cultural, since their L1 is the same as that of their students. In sum, studies of interaction by interlocutor type have reported mixed findings and were mostly conducted in ESL situations, where L2 learners vary in their L1 backgrounds and are more easily exposed to interaction with native speakers. On the other hand, considering that the weakness of the NNS interlocutor was a lack of target language proficiency, if NNS interlocutors are replaced by NNS teachers who have native-like L2 proficiency, the effects of interaction could be even greater than those gained from interaction with NS interlocutors or NS teachers. From the perspective of learner output, how such different roles of the interlocutor affect the learners' output needs to be examined in terms of the three aspects of output that are differently relevant to SLA. Therefore, the following research questions were formulated to examine, in the EFL situation, the role of interlocutor type in the resulting learner interlanguage.

(1) Is there any difference in L2 learners' language output in terms of complexity, accuracy, and fluency when they interact with different types of interlocutors in a controlled information gap task?
(2) If (1) is the case, is it due to the different interactional feedback of the interlocutors?

3. METHODOLOGY

There were 28 participants in total: 24 freshman students at a women's university in Seoul, Korea, 2 NS teachers and 2 NNS teachers.
All the student participants came from a single freshman English class of pharmacy majors, so their English proficiency was relatively homogeneous, judging by their English scores on the college entrance exam, though it was not strictly controlled. The teacher participants were English instructors at the same school who had never taught these students. The student participants were informed of the experiment by their English instructor during class and volunteered to sign up. The teacher participants were asked to take part and agreed to do so without payment. All the NS and NNS teachers had taught English in Korea for several years. The NNS teachers had native-like English proficiency and had lived in the United States for several years. One holds a master's degree earned in the United States, and the other holds a Ph.D. in English literature from a Korean university. The student participants were randomly divided into three groups, the NS teacher group, the NNS teacher group, and the NNS peer group, with 8 students in each, as shown in Table 1. The participants in the NNS peer group changed roles and performed the same task with different stories.
Table 1: Groups of Participants

            Native Teachers     Non-native Teachers     Non-native peers in pairs
Teachers    A        B          C         D
Students    1        5          9         13            17        21
            2        6          10        14            18        22
            3        7          11        15            19        23
            4        8          12        16            20        24

The student participants were told that the experiment had nothing to do with their grades, in order to reduce their anxiety about speaking English. The teacher participants were asked not to teach the students during the task but simply to communicate with them naturally for the purpose of understanding the students' output and performing the task. In other words, they were free to ask for repetition, clarification, or confirmation but were not to teach anything explicitly.

For the experiment, information gap tasks were taken from the book Can You Believe It? (Huizenga, 2000). One participant was responsible for arranging 6 pictures in the correct sequence according to a story told by the other participant. The student participants told a story to the teacher or peer participants based on a picture strip given to them (APPENDIX A). Four different stories of similar length from the same book were chosen so that no story was repeated for the same interlocutor or storyteller. To reduce serious confusion or communication breakdown, the participant responsible for telling the story was shown a script of the story written in Korean (APPENDIX B), which was taken away once she had finished reading it and begun to tell the story.
The recorded data of the participants' oral output were transcribed and analyzed in terms of complexity, accuracy, and fluency, which were measured with eight different standards as follows:

a) Complexity: The structural complexity of the learner output was measured by the number of t-units and the number of subordinate clauses per t-unit.
b) Accuracy: The well-formedness of the learner output was measured by three categories: lexical errors, morphosyntactic errors, and error-free t-units.
c) Fluency: The fluency of the learner output was measured by the number of repetitions, the number of replacements, and the number of hesitations.

Since the overall proficiency of the student participants was not high, a t-unit with one error was counted as an error-free t-unit (Song, 2003). Every subordinate clause was marked for measuring complexity. For the accuracy measures, all word choice problems were marked as lexical errors, and all other grammatical errors were marked as morphosyntactic errors. For the fluency measures, repetition was defined as simply repeating the same words, replacement as the self-repair of ungrammatical features (she have → she has), and hesitation as any utterance such as uh.., um.., well... After all the measures, except the ratio of subordinate clauses to t-units, were converted to a rate per 100 words produced by the individual learner, a one-way ANOVA was conducted in SPSS 12.0.

4. RESULTS

In terms of the total number of words in their output, the learners produced on average 94.5, 114.0, and 115.1 words with the NS teachers, the NNS teachers, and the NNS peers respectively, with no statistically significant difference, as shown in Table 2 below (p=.54). In other words, learners produced a similar amount of output regardless of the type of interlocutor they interacted with.

On the other hand, the quality of the language output produced by the participants in the three groups differed significantly in terms of complexity, accuracy, and fluency, as discussed in the following sections.
Table 2: Total Number of Words Produced in the Tasks (*p<.05)

Groups       N     Mean     SD      Min.    Max.    F       Sig.
NS T         8     94.5     38.7    51      160
NNS T        8     114.0    39.2    64      197
NNS peers    8     115.1    45.8    60      206
Total        24    107.9    40.7    51      206     0.63    0.54
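As a rough check on the one-way ANOVA reported in Table 2, the F-ratio can be recomputed from the printed group sizes, means, and standard deviations alone. The sketch below is not the SPSS analysis used in the study: it works from rounded summary statistics, the function name is hypothetical, and the closed-form p-value it uses holds only when there are exactly three groups (two numerator degrees of freedom).

```python
def anova_from_summary(groups):
    """One-way ANOVA F-ratio and p-value from (n, mean, sd) tuples.

    Assumption: exactly three groups, so the F distribution has two
    numerator degrees of freedom and its upper tail has a closed form.
    """
    k = len(groups)
    if k != 3:
        raise ValueError("closed-form p-value only holds for three groups")
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    # Between-group and within-group sums of squares
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between, df_within = k - 1, total_n - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    # Upper tail of F(2, df_within): (1 + 2F/df)^(-df/2)
    p_value = (1 + 2 * f_ratio / df_within) ** (-df_within / 2)
    return f_ratio, p_value


# (N, Mean, SD) rows copied from Table 2: NS T, NNS T, NNS peers
table2 = [(8, 94.5, 38.7), (8, 114.0, 39.2), (8, 115.1, 45.8)]
f_ratio, p_value = anova_from_summary(table2)
print(round(f_ratio, 2), round(p_value, 2))
```

With the Table 2 values, the recomputed figures round to the F of 0.63 and the significance of 0.54 printed in the table.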

4.1 Complexity

The complexity of the participants' language output was measured on two syntactic dimensions: the number of t-units produced per 100 words and the number of subordinate clauses per t-unit. As shown in Table 3, there was no significant difference between the three groups in the number of t-units; in the number of subordinate clauses per t-unit, however, the difference between the three groups was significant (p=.029). The NNS teacher group produced the most subordinate clauses per t-unit, 0.022, followed by the NNS peers and the NS teachers, both at 0.006. The Scheffé test shows that the NNS teacher group produced more complex output than the other two groups, with the difference approaching significance (p=.064), while the other two groups did not differ (p=1.0), as shown in Table 3.

Table 3: Complexity (*p<.05)

Dimensions            Groups       Mean (SD)        F-ratio / P-value    Post hoc Scheffé test (p-value)
t-unit                NS T         10.2 (1.6)       0.744 / 0.487
                      NNS T        9.1 (2.2)
                      NNS peers    9.2 (2.0)
                      Total        9.5 (2.0)
Subordinate clause    NS T         0.006 (0.007)    4.195 / 0.029*       NS T vs. NNS T: 0.064
                      NNS T        0.022 (0.019)                         NNS T vs. NNS Peers: 0.064
                      NNS peers    0.006 (0.009)                         NS T vs. NNS Peers: 1.000
                      Total        0.128 (0.188)

4.2 Accuracy

To measure the accuracy of the participants' language output, errors were counted in two categories: lexical errors and morphosyntactic errors. The mean number of lexical and morphosyntactic errors combined shows that the NNS teacher group was the most accurate, with a mean of 5.1, followed by the NS teacher group, with a mean of 9.7, and then by the NNS peer group, the least accurate, with a mean of 10.8, as shown in Table 4. The difference between groups was significant (p=.001). The NNS teacher group was significantly more accurate than the other two groups (p=.007 against the NS teacher group and p=.001 against the NNS peer group), whereas the other two groups did not differ (p=.694).
Table 4: Accuracy (*p<.05)

Groups       Mean (SD)      F-ratio / P-value    Post hoc Scheffé test (P-value)
NS T         9.7 (2.3)      10.89 / 0.001*       NS T vs. NNS T: 0.007*
NNS T        5.1 (2.4)                           NNS T vs. NNS Peers: 0.001*
NNS Peers    10.8 (3.0)                          NS T vs. NNS Peers: 0.694
Total        8.58 (3.56)
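The post hoc comparisons in Table 4 can be approximated in the same spirit from the printed means and standard deviations. The following is a minimal sketch of a Scheffé test under stated assumptions, not the SPSS procedure used in the study: it assumes exactly three groups (so the reference F distribution has two numerator degrees of freedom and a closed-form tail), it relies on rounded summary statistics, and the function name is hypothetical.

```python
from itertools import combinations


def scheffe_from_summary(groups):
    """Pairwise Scheffé p-values from {name: (n, mean, sd)} summary data.

    Assumption: exactly three groups, which gives the F(2, df) upper tail
    the closed form (1 + 2F/df)^(-df/2).
    """
    k = len(groups)
    if k != 3:
        raise ValueError("closed-form p-value only holds for three groups")
    total_n = sum(n for n, _, _ in groups.values())
    df_within = total_n - k
    # Pooled within-group variance (mean square within)
    ms_within = sum((n - 1) * sd ** 2 for n, _, sd in groups.values()) / df_within
    p_values = {}
    for (a, (n_a, m_a, _)), (b, (n_b, m_b, _)) in combinations(groups.items(), 2):
        # Scheffé statistic: squared contrast over its variance, divided by k - 1
        f_s = (m_a - m_b) ** 2 / (ms_within * (1 / n_a + 1 / n_b)) / (k - 1)
        p_values[(a, b)] = (1 + 2 * f_s / df_within) ** (-df_within / 2)
    return p_values


# (N, Mean, SD) per group, copied from Table 4
table4 = {
    "NS T": (8, 9.7, 2.3),
    "NNS T": (8, 5.1, 2.4),
    "NNS Peers": (8, 10.8, 3.0),
}
for pair, p in scheffe_from_summary(table4).items():
    print(pair, round(p, 3))
```

Recomputed this way, the three p-values come out at approximately .007, .70, and .001, close to the printed .007, .694, and .001; the small gap in the non-significant comparison is the kind of difference expected when working from rounded means and standard deviations.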

When the two categories of errors were analyzed separately, similar results were found, as shown in Table 5. For lexical errors, the NNS teacher, NS teacher, and NNS peer groups made 1.38, 1.88, and 4.13 errors per 100 words respectively. Both teacher groups were significantly more accurate than the NNS peer group (p=.01 for the NNS teacher group and p=.04 for the NS teacher group), but the two teacher groups did not differ from each other (p=.83). For morphosyntactic errors, the NNS teacher, NS teacher, and NNS peer groups made 4.0, 7.75, and 6.50 errors per 100 words respectively. The NNS teacher group was significantly more accurate than the NS teacher group (p=.02), with the NNS peer group in the middle and not significantly different from either group. In sum, when the categories were analyzed separately, the higher performance of the NNS teacher group was sustained: it outperformed the NNS peer group in lexical accuracy and the NS teacher group in morphosyntactic accuracy.

Table 5: Lexical, Morphosyntactic Errors per 100 Words (*p<.05)

Dimensions         Groups       Mean (SD)      F-ratio / P-value    Post hoc Scheffé test (P-value)
Lexical            NS T         1.88 (1.64)    6.481 / 0.006        NS T vs. NNS T: 0.83
                   NNS T        1.38 (0.92)                         NNS T vs. NNS Peers: 0.01
                   NNS Peers    4.13 (2.10)                         NS T vs. NNS Peers: 0.04
                   Total        2.46 (1.98)
Morphosyntactic    NS T         7.75 (2.49)    4.084 / 0.019        NS T vs. NNS T: 0.02
                   NNS T        4.00 (2.00)                         NNS T vs. NNS Peers: 0.15
                   NNS Peers    6.50 (2.83)                         NS T vs. NNS Peers: 0.15
                   Total        6.1 (2.8)

4.3 Fluency

Fluency was measured by the number of repetitions, replacements, and hesitations in the participants' language output during the task, as shown in Table 6. The NNS peer group produced the most repetitions, replacements, and hesitations, 17.8 per 100 words, followed by the NNS teacher group with 11.3, while the NS teacher group produced the fewest, 8.0, with a significant between-group difference (p=.027). The Scheffé test shows that the NS teacher group was significantly more fluent than the NNS peer group (p=.031), whereas the NNS teacher group, in the middle, was not significantly different from either of the two groups (p=.67 against the NS teacher group and p=.17 against the NNS peer group).
Table 6: Fluency (*p<.05)

Groups       Mean (SD)     F-ratio / P-value    Post hoc Scheffé test (P-value)
NS T         8.0 (6.9)     4.315 / 0.027*       NS T vs. NNS T: 0.667
NNS T        11.3 (4.6)                         NNS T vs. NNS Peers: 0.170
NNS Peers    17.8 (8.5)                         NS T vs. NNS Peers: 0.031*
Total        12.3 (7.8)

A closer examination of the three individual measures of fluency showed that replacement was the only one for which the group difference was significant, as shown in Table 7. The NNS peer group produced significantly more replacements, 3.5 per 100 words, than the other two groups, 1.7 for the NS teacher group and 2.7 for the NNS teacher group, although the NNS peer group had the highest rate on all three measures.

Table 7: Fluency in Terms of Repetition, Replacement, and Hesitation per 100 Words (*p<.05)

Dimensions     Groups       Mean (SD)    F-ratio / P-value    Post hoc Scheffé test (P-value)
Repetition     NS T         1.8 (1.1)    1.651 / 0.216
               NNS T        3.0 (1.6)
               NNS Peers    5.0 (5.7)
               Total        3.3 (3.6)
Replacement    NS T         1.7 (1.0)                         NS T vs. NNS T: 0.301
               NNS T        2.7 (1.2)                         NNS T vs. NNS Peers: 0.024*
               NNS Peers    3.5 (1.4)                         NS T vs. NNS Peers: 0.036*
               Total        2.7 (1.4)
Hesitation     NS T         4.5 (5.7)
               NNS T        5.8 (3.1)
               NNS Peers    9.3 (5.5)
               Total        6.6 (5.5)

5. DISCUSSION

Regarding the first research question, on the three aspects of learner language output, complexity and accuracy were both highest in the NNS teacher group. The NNS teacher group produced more complex language than the other two groups, with the difference approaching significance (p=.06), and significantly more accurate language than the other two groups, while those two groups did not differ from each other in either complexity or accuracy. That is, with regard to complexity and accuracy, the NNS teacher group is distinct from the other two groups. With regard to fluency, however, it is not: the NS teacher group was the most fluent and differed significantly from the NNS peer group, whereas the NNS teacher group was in the middle and did not differ from either group. The results are summarized in Figure 1.
[Figure 1: three bar charts comparing the NS T, NNS T, and NNS P groups. Panels: Complexity (subordinate clauses per t-unit); Accuracy (lexical and morphosyntactic errors per 100 words); Fluency (repetitions, replacements, and hesitations per 100 words).]

Figure 1: Summary of the Results.

For the second research question, on the interactional behavior of the different interlocutors, the overall higher performance of the NNS teacher group was examined further through the interaction transcripts. The interlocutors' feedback was counted, excluding backchannels and responsive fillers such as uh, uh huh, yeah, ok, or yes. Following Lyster and Ranta (1997), any type of response, including topic continuation, was counted as feedback. The NS teachers gave 19 instances of feedback in total, the NNS teachers 40, and the NNS peers 25 throughout the task, as shown in Table 8.

Table 8: Frequency of Feedback

          Total number of feedback    Error Correction    Facilitative Questions
NS T      19                          5                   0
NNS T     40                          7                   20
NNS P     25                          3                   3
Total     84                          15                  23
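The percentages of error correction and facilitative questions discussed in this section follow directly from the counts in Table 8. A small sketch of that arithmetic is given here; the dictionary layout and function name are illustrative choices, not part of the study.

```python
# Counts copied from Table 8:
# (total feedback, error corrections, facilitative questions)
FEEDBACK = {
    "NS T": (19, 5, 0),
    "NNS T": (40, 7, 20),
    "NNS P": (25, 3, 3),
}


def feedback_shares(total, corrections, questions):
    """Return each feedback type as a percentage of all feedback, to one decimal."""
    return (
        round(100 * corrections / total, 1),
        round(100 * questions / total, 1),
    )


for group, counts in FEEDBACK.items():
    correction_pct, question_pct = feedback_shares(*counts)
    print(f"{group}: error correction {correction_pct}%, "
          f"facilitative questions {question_pct}%")
```

Laid out this way, the contrast is easy to see: half of the NNS teachers' feedback consisted of facilitative questions, whereas the NS teachers asked none.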

The tendency in the NS teachers' feedback was that they did not necessarily attend to the students' contribution to the task or to the students' understanding of the interlocutor's feedback. Excerpts (1) and (2) show incidents in which the NS teachers did not understand the story line and posed clarification requests. However, the students in both excerpts did not understand the question, so the student went on with her task in excerpt (1), and the teacher gave up asking his question and let the student continue in excerpt (2). The NS teachers nevertheless performed the task correctly in every case, which suggests that they found the solution not in the learners' description but through their own reasoning or guessing.

(1) NS T: Who is watching TV.
    S:    Yes.
    NS T: Who?
    S:    Small child, children and uh, she ah.. he tell the mother that the tiger is in the bedroom and uh... mother go to the bedroom

The student's answer to "Who?", "small child", was not the correct answer but the subject of her following sentence. The student seems to have gone on with her work without attending to the NS teacher's response, and the NS teacher did not request further clarification.

(2) NS T: the bed there is a tiger on the looks like a pillow.
    S:    Yeah?
    NS T: O.K?
    S:    O.K and then soldiers or police have a gun.

Another behavior found in the NS teacher group was that the teachers directly corrected students' errors and the students repeated the teachers' corrections. Exact repetition as uptake of the interlocutor's feedback occurred only within the NS teacher group. Examples (3) and (4) show clear and direct error correction by NS teachers and the students' immediate repetition of the correction. This leads to the assumption that students grant full authority to the input from the NS teacher on the one hand, and feel passively dependent on native teachers on the other, since they almost automatically repeated what the NS teachers provided even though such exact repetition adds no new information to the ongoing communication.

(3) NS T: S:
    run ran run run run ah so ah run

(4) S:    and husband and wife found a bear and they gone to
    NS T: go out
    S:    go out yeah the car but bear don't move so they drive together.

In contrast, the NNS teachers asked many more facilitative questions, i.e., questions that led students to the next step of the story line or to more details of the story. In other words, these facilitative questions eventually led to negotiation of meaning. Example (5) shows the non-native teacher's facilitative questions, and example (6) shows not only the facilitative questions but also the process by which the student incorporated the correct form that resulted from the facilitative question into a sentence more complex than the one uttered before the teacher's question.

(5) NNS T: The bear climbing in the car or
    S:     Climbing in the car
    NNS T: Okay. Is there anyone else around the bear?
    S:     Yes.
    NNS T: Yes, is someone around the bear? not yet.
    S:     Not yet .um ..old people

(6) S:     What is the reason of the s s smell and when mechanic ...open..., opened the front door, front door
    NNS T: Front door?
    S:     Um yeah.. front cover
    NNS T: Um front cover, better
    S:     And when, when he checked the engine,. uh she was so amazed because ... in, in the cover, in the cover, there was a big snake.

In excerpt (6), the learner self-repaired the error, "front door", to the correct form, "front cover", in response to the NNS teacher's recast, "Front door?" Then, after being confirmed by the NNS teacher, the learner proceeded with a complex sentence containing two subordinate clauses, rephrasing the beginning of the utterance from "when mechanic ...open, opened the front door, front door" to "when he checked the engine". As a result, the sentence became a complete t-unit with a complex structure and an accurate expression.

Among the 40 feedback occurrences, direct error corrections and facilitative questions were counted separately, as shown in Table 8. The NNS teachers used more questions, 20 times (50%), than direct error correction, 7 times (17.5%), in contrast to the NS teachers, who did the opposite: 5 error corrections (26.3%) and no questions (0%) among 19 feedback occurrences. The NNS teachers' guiding questions seem to have elicited both complex and accurate output from the learners, as shown in the above excerpts. This suggests that, depending on the nature of the feedback, complexity and accuracy can be achieved together, unlike in most other studies (Lee, 2005; Song, 2003; see Skehan, 1998 for review), where a trade-off between complexity and accuracy occurred.

In the NNS peer group, few instances of error correction or questions were observed, 3 each (12%) among the 25 feedback occurrences, but an interesting observation was that the peers tried to help each other by looking for appropriate words together. Excerpts (7) and (8) below show the NNS peer interlocutor trying to find appropriate words for the storytelling partner, thus co-constructing a storyline.

(7) S:        And he.. then, then the piggy the piggy stand up and walk to walk to his house?
    NNS Peer: Home.
    S:        Yes..

(8) S: NNS Peer: S: NNS Peer:
    Uh.. So she go to car car Car center? Car center? Car hospital? Car hospital? Uh..huh..

Such co-constructing interaction does not seem to lead to higher complexity or accuracy, as the excerpts show, but it seems to help the task go on. Another interesting observation is that the NNS peer group produced significantly more replacements than the other two groups, which made it the least fluent group. However, replacement differs from the other two measures of fluency, repetition and hesitation, in that it involves a moment of noticing and self-correction, so the NNS peer group's frequent replacement may eventually promote language acquisition. Although the NNS peer group did not receive as many authoritative error corrections or guiding questions as the other groups, and performed worst on all three measures, the finding that its members attempted the most replacements and co-constructed the output holds promise for further studies on the precise effects of these features.

In sum, the NNS teachers were shown to be better interlocutors, guiding the learners to modify their output for complexity and accuracy, in comparison to the NS teachers, who were more oriented to performing the task itself through their own guesswork than through verbal interaction with the information provider. Though the NNS peers were not beneficial interlocutors in any of the three aspects of interlanguage output, their discourse behavior, such as the co-construction of output and the high frequency of replacement, calls for further study of its benefits.

The findings of the present study support those of Doughty (1996, cited in Doughty, 2000, and in Nakahama, Tyler, & Van Lier, 2001). Although her study was not aimed at comparing NS and NNS interlocutors, she pointed out the surprisingly low rate of actual uptake of the opportunities for interactional modification during the NS-NNS information gap task, 1 to 17 percent. Her observation of NS interlocutors seems very similar to the observation made in the present study, as quoted below: "it was apparent that sometimes during the tasks, the task doers, i.e., NS interlocutors, would select a piece and place it somewhere without having heard any direction pertaining to that piece." (Doughty, 2000, p.62), "The NS simply accepted NNS utterances without modifying them, i.e., NNS's non-target forms" (Doughty, 2000, p.63). Doughty's finding is well in line with our suggestion that the native interlocutors' nonresponsive attitude eventually yielded less self-repair and led to higher fluency, as she pointed out that NS interlocutors paid little attention to their NNS interlocutors' speech in a controlled task.

6. CONCLUSION

The aim of the study was to investigate whether L2 learners' language output differs depending on whom they interact with. At the heart of this research question lay the hope that NNS interlocutors, including NNS teachers and NNS peers, can be as qualified and effective as native-speaker interlocutors, and some of the findings of the current study are encouraging for NNS teachers. The results showed that the NNS teachers were the best interlocutors in terms of promoting the complexity and accuracy of learners' language output. Fluency was also not lower in the NNS teacher group than in the NS teacher group when the two were directly compared, though only the NS teachers, not the NNS teachers, were better interlocutors than the NNS peers for fluency. On the other hand, the NNS peers seem to put learners under less pressure for complexity, accuracy, and fluency. However, the major factor that made the NNS peer group less beneficial than the NS group for learners' fluency was the large number of self-repairs produced by the learners, and self-repair can be considered an obvious process of noticing, which is the necessary first step of SLA in Schmidt's (1990) noticing hypothesis and would eventually bring learners closer to language acquisition. In this sense, it should not be hastily concluded, before more research is conducted, that NNS peers provide no benefit.

Finally, though this study revealed some virtues of NNS teachers, it is limited in the number and diversity of its participants and in the types of task, owing to the practical difficulty of recruiting a large number of teachers. Another factor that makes the findings tentative is that the NNS teachers who participated had native-like proficiency, which is not representative of all NNS teachers in EFL settings.

REFERENCES

Árva, A., & Medgyes, P. (2000). Native and non-native teachers in the classroom. System 28(3), 355-372.
Doughty, C. (1996). Second language acquisition through conversational discourse. Paper presented at the meeting of the American Association for Applied Linguistics, Chicago, IL.
Doughty, C. (2000). Negotiating the L2 linguistic environment. University of Hawaii Working Papers in ESL 18(2), 47-83.
Huizenga, J. (2000). Can you believe it? Stories and idioms from real life, Book 1. Oxford: Oxford University Press.
Lee, H. (2003). The other side of the mirror: Intentional focus on form. Korean Journal of Applied Linguistics 18(2), 191-214.
Lee, H. (2005). The effects of interlocutor variable on the interlanguage performance variation. Korean Journal of Applied Linguistics 21(2), 41-56.
Lyster, R., & Ranta, L. (1997). Corrective feedback and learner uptake. Studies in Second Language Acquisition 19, 37-66.
Mackey, A., Oliver, R., & Leeman, J. (2003). Interactional input and the incorporation of feedback: An exploration of NS-NNS and NNS-NNS adult and child dyads. Language Learning 53(1), 35-66.
McDonough, K. (2004). Learner-learner interaction during pair and small group activities in a Thai EFL context. System 32, 207-224.
Milambiling, J. (1999). Native and non-native speakers: The view from teacher education. Paper presented at the annual meeting of the Midwest Modern Languages Association, Minneapolis, MN.
Nakahama, Y., Tyler, A., & Van Lier, L. (2001). Negotiation of meaning and conversational and information gap activities: A comparative discourse analysis. TESOL Quarterly 35(3), 377-405.
Pica, T., & Doughty, C. (1985a). Input and interaction in the communicative language classroom: A comparison of teacher fronted and group activities. In S. Gass & C. Madden (Eds.), Input in second language acquisition (pp. 115-132). Rowley, Mass: Newbury House.
Pica, T., & Doughty, C. (1985b). The role of group work in classroom second language acquisition. Studies in Second Language Acquisition 7, 233-248.
Pica, T., Lincoln-Porter, F., Paninos, D., & Linnell, J. (1996). Language learners' interaction: How does it address the input, output, and feedback needs of L2 learners? TESOL Quarterly 30(1), 59-83.
Robinson, P. (2001). Task complexity, cognitive resources, and syllabus design: A triadic framework for examining task influences on SLA. In P. Robinson (Ed.), Cognition and second language instruction (pp. 287-318). Cambridge: Cambridge University Press.
Robinson, P. (2003). Attention and memory in SLA. In C. Doughty & M. H. Long (Eds.), Handbook of second language acquisition (pp. 631-678). Oxford: Blackwell.
Schmidt, R. (1990). The role of consciousness in second language learning. Applied Linguistics 11, 129-158.
Selinker, L., & Douglas, D. (1985). Wrestling with context in interlanguage theory. Applied Linguistics 6(2), 190-204.
Skehan, P. (1998). A cognitive approach to language learning. Oxford: Oxford University Press.
Song, J. (2003). Effects of task processing conditions on the oral output of post beginners in a narrative task. English Teaching 58(4), 249-271.
Varonis, E., & Gass, S. (1985). Non-native/non-native conversations: A model for negotiation of meaning. Applied Linguistics 6, 71-90.
Zuengler, J., & Bent, B. (1991). Relative knowledge of content domain: An influence on native-nonnative conversations. Applied Linguistics 12, 397-415.

APPENDIX A
A Sample Picture for the Task

From Huizenga, J. (2000). Can you believe it? Stories and idioms from real life, Book 1 (p. 34). Oxford University Press.

APPENDIX B
A Sample of the Picture Story Given to the Subjects in Korean

An elderly couple is taking a vacation in Yellowstone Park. They stop their car to take a picture of some bears. They leave their car doors open. A young bear gets in. He is looking for food. The man and woman do their best to get rid of the bear, but he refuses to move. So they drive to a park ranger station with the bear in the back seat. When the man gets out to report the problem, the bear gets in the front seat. The rangers cannot believe their eyes! They find a woman in the passenger seat and a bear behind the wheel.

PARAPHRASE AS A TOOL FOR ACHIEVING LEXICAL COMPETENCE IN L2

Jasmina Milicevic
Dalhousie University & OLST, Université de Montréal, Canada

1. PARAPHRASING AND LEXICAL COMPETENCE

From the perspective of text production, the ultimate goal of language instruction in L2 is to help an L2 learner achieve near-native idiomaticity (naturalness of the produced text) and fluency (ease of speaking). Both the naturalness and the ease of speaking come from a good command of lexical relations and their active use in order to produce, when needed, a varied output, i.e., to paraphrase. Examples in (1) illustrate two major types of lexical relations: derivational relations and collocational relations. In (1a) we see the verb STEAL and three of its derivatives: nouns denoting, respectively, the agent, the affected object, and the action of stealing. (1b) features the noun REBUKE as a collocation base controlling two collocates: a support verb and an intensifying adjective.

(1) a. STEAL ~ THIEF [the person who steals] ~ LOOT [that which gets stolen] ~ THEFT [the act of stealing]
    b. give [Collocate-1] a sharp [Collocate-2] REBUKE [Collocation base]

Examples in (2) illustrate the use of both types of lexical relations in paraphrasing. In (2a), the verb STEAL is replaced by a semantically equivalent collocation featuring the action noun of the verb and a light verb for this noun; in (2b), the collocation base REBUKE is replaced by a quasi-synonymous noun, REPRIMAND, with the concomitant selection of appropriate collocates for the new base.

(2) a. STEAL ≡ commit [Collocate] a THEFT [Action.noun(STEAL) = THEFT; Collocation base]
    b. give [Collocate-1] a sharp [Collocate-2] REBUKE [Collocation base] ≡ issue [Collocate-1] a stern [Collocate-2] REPRIMAND [QSyn(REBUKE) = REPRIMAND; Collocation base]

The ability to paraphrase is part and parcel of linguistic competence (cf. Milicevic 2007: 2 and references therein). On the one hand, paraphrasing is used as a backup preventing a crash in text generation. Speaking is an extremely complex process, laden with pitfalls in the form of restricted lexical co-occurrence, lexical gaps, defective paradigms, etc., which can only be avoided if we have at our disposal alternative ways of expressing the same meaning, i.e., some spare paraphrastic variants that can supplant a sentence whose construction has gone wrong. Examples in (3) illustrate the paraphrasing necessary to get around a lexical gap: the verb ATTACK(V) does not have a collocate expressing the meaning 'intensely with respect to scope', but the corresponding deverbal noun, ATTACK(N), does, so the problematic verb is replaced by a semantically equivalent construction "deverbal N + light V of N" (this is the same type of substitution as the one seen in (2a)).

(3) a. The rebels attacked(V) the city ???____________ ['intensely with respect to scope'].
    b. The rebels launched(light V of N) a whole-scale['intense with respect to scope'] ATTACK(deverbal N) on the city.

On the other hand, paraphrasing is crucially involved in sophisticated linguistic tasks, such as technical writing, abstracting, reformulation, or translation; in fact, we can say that these tasks consist mainly in paraphrasing (cf. paraphrasing as a standard component of exercises in rhetoric/stylistics and the increasingly popular view of translation as inter-linguistic paraphrasing). Examples in (4) and (5) illustrate the paraphrasing needed to avoid a tedious lexical repetition: in (4), the offending lexeme is replaced by one of its quasi-synonyms, and in (5) the offending collocate of a lexeme L is replaced by another of L's collocates expressing the same meaning.
(4) a. Light CIGARETTES are as lethal as any other #CIGARETTES.

b. Light CIGARETTES are as lethal as any other SMOKES[Syn(CIGARETTE) = colloq. SMOKE] .


(5) [Both sides hope that the DEAL[Collocation base] will be reached[Collocate, Realization V].]

a. But if no DEAL is #reached, ... b. But if no DEAL[Collocation base] is struck[Collocate, Realization V], ... Given the crucial role of paraphrase in language production, 8 we can put forward the following requirement:

Language instruction in general, and that of an L2 in particular, should be centered on the acquisition of lexical relations and paraphrastic patterns based on these relations. This, in turn, implies the necessity of developing teaching tools geared toward paraphrasing. Two types of teaching tools can be envisaged: 1) manuals and learner dictionaries with a paraphrastic component; 2) diagnostic tests measuring learners paraphrasing ability and serving as an indicator of their over-all language proficiency/the efficiency of instruction. So far, with rare exceptions (Daunay, 2002, Tremblay, 2003, Polgure, 2004, Apresjan et al., 2007), the role of paraphrase/paraphrasing in language instruction has not even been recognized; paraphrase has been used only marginally in exercises and the aptitude to paraphrase has been but cursorily mentioned in L2 requirements. To the best of my knowledge, the only dictionary that makes use of paraphrasing and explicitly acknowledges its role in language teaching is Lexique actif du franais (Melcuk & Polgure, 2007). As for diagnostic tests based on paraphrasing, the only attempt to develop one specifically for language teaching seems to be Windsor (1976); Russo & Pippa (2004) propose a paraphrase-based test for measuring the aptitude to interprete. Clearly, then, there is a need to develop paraphrase-centered teaching tools. In this paper, I will present one such toola learner dictionary focusing on the acquisition and use of lexical and paraphrastic relations, called reformulation dictionary. The rest of the paper is structured as follows: general characterization of a reformulation dictionary (Section 2); description of a specific reformulation dictionary for intermediate-to-advanced learners of French (Section 3); summary of the research reported in the paper and preliminary conclusions (Section 4).

Paraphrase is also instrumental in language comprehension; for L2 see, e.g., Hsia (2000).


2. A TEACHING TOOL INTENDED TO BOOST LEXICAL COMPETENCE: A REFORMULATION DICTIONARY

This section presents the general principles guiding the elaboration of a reformulation dictionary and illustrates its intended use. The discussion is anchored in the Explanatory-Combinatorial Lexicology (Mel'čuk et al., 1995) of the Meaning-Text linguistic theory, a framework that has served for the construction of a number of production-oriented, formalized dictionaries (DEC, Mel'čuk et al., 1984-1999; DiCo, Polguère, 2000; DiCE, Alonso-Ramos, 2004; LAF, Mel'čuk & Polguère, 2007; Computer-Aided Lexical Acquisition Manual, Apresjan et al., 2007).

A reformulation dictionary is an Explanatory-Combinatorial (EC) Dictionary with two particularities: it is a learner dictionary and it is oriented towards paraphrasing. A general characterization of an EC learner dictionary can be found in Milićević et al. (2006).

A reformulation dictionary is a teaching tool supporting integrated and enlightened lexical acquisition via paraphrasing. Acquisition is integrated if a lexical item is learned not in isolation but by being situated within the lexical network of the language. It is enlightened if it helps learners grasp regularities in the lexicon, i.e., recurrent links between lexical items, and extract generalizations about the semantic and co-occurrence properties of these items. Since paraphrasing is instrumental in the reinforcement of lexical links already assimilated as well as in the activation of new ones, it is an ideal means for achieving this goal.

A dictionary of this type uses descriptive tools developed by theoretical linguistics, appropriately adapted to be as learner-friendly as possible. I have in mind, in particular, lexical functions (Wanner, 1996) and paraphrasing rules (Mel'čuk, 1992; Milićević, 2007) of the Meaning-Text theory, cross-linguistically valid formalisms for describing lexical and paraphrastic relations. Thus, lexicon acquisition with the aid of a reformulation dictionary is enlightened also in the sense that it presupposes the assimilation by the learner of some knowledge of linguistics (concepts and formalisms of the science of language, as opposed to knowledge of language). The advantages of using linguistic descriptive tools in language teaching are clear: they not only help learners adopt more efficient learning strategies but also give more depth to the learning experience, since, due to their universality, they allow for easy cross-linguistic and cross-cultural comparisons; cf. Wierzbicka's semantic primitives, which are intended to facilitate exactly this (e.g., Goddard & Wierzbicka, to appear).

In order to support the learning process and gradually lead learners towards independence in the use of a reformulation dictionary, the latter is integrated into a (computer-supported) learning environment comprising tutorials (on the different concepts and formalisms used in the dictionary) and exercises.

A dictionary of this type can be envisaged for any language, be it L1 or L2, and at any level of instruction. While the type and detail of the information to be included in a given reformulation dictionary will vary with the language and level of instruction, the general organizing principles just expounded remain valid in all cases. More concretely, lexicographic information in a reformulation dictionary is chosen, organized, and presented as a function of the task it is supposed to help its users accomplish, namely:


Given a sentence S containing a lexical unit L, paraphrase S by manipulating L, i.e., by using the available lexicographic information about L's meaning and co-occurrence properties.

Let me illustrate the type of paraphrasing intermediate-to-advanced learners (of French) may need to resort to. This will help us understand what type of information must be made available to them in a reformulation dictionary developed for the corresponding level. It is relatively easy to extrapolate from this the type of information required for other levels.

Paraphrasing may be needed either to correct an incorrect text or to improve a correct but insufficiently elegant text; cf., respectively, sentences (6a) and (6b).⁹ In both cases, the learner may be guided by the teacher's remarks concerning the type of error or impropriety to be addressed; cf. the grammaticality judgment (?) and error types (in parentheses) in the examples below.

(6) a. Souvent, il m'est arrivé de tomber de ma bicyclette et de subir une blessure (Collocation, style)
       'Often it happened that I fell from my bike and suffered an injury.'
    b. Les bicyclettes ne sont pas aussi rapides que les voitures et les camions, mais la société bénéficie de leur usage (Thematic Progression)
       'Bikes are not as fast as cars and trucks, but society benefits from their use.'

Sentence (6a) is deficient because of the improper use of the collocation subir une blessure 'suffer an injury'. This is a perfectly valid collocation, in which subir expresses the lexical-functional meaning 'start having'; however, it is used mostly in journalistic style, in reporting on events such as conflicts and sport competitions, which is not the case here. This is what the learner will discover if he looks up the lexical relations zone of the entry for BLESSURE 'wound' (the expressions in curly brackets are lexical functions [= LF] in soft encoding: they stand, respectively, for Oper1, IncepOper1 and Caus1Oper1 and are literal renditions of the meanings of those functions; more on this will be said in Section 3):

...
{avoir 'have'}                                                   avoir, porter [ART ~]; souffrir [de ART]
{commencer à avoir 'start having'}                               presse subir
{se causer de commencer à avoir 'cause oneself to start having'} se faire [de ART] // se blesser

At this point, the learner needs to select the value of another LF of BLESSURE with as close a meaning as possible to the initial one (or else attempt a more radical reformulation). The verb se faire 'make oneself', expressing the meaning of the LF 'cause oneself to start having', is a fair choice: it does add the meaning of auto-causation, but this is perfectly appropriate for the situation being described (someone falling from a bike and hurting himself in the process). Alternatively, the learner may opt for the verb se blesser 'hurt oneself', a fused value of the LF in question (a fused value of an LF expresses together, i.e., as one word, the meaning of the function and that of its keyword). In this way, the following reformulations will be obtained:
9

These sentences are taken from a corpus of texts written by intermediate-to-advanced FLS learners, used for the elaboration of the Dire autrement reformulation dictionary (see Section 3).


Souvent, il m'est arrivé de tomber de ma bicyclette et de me faire une blessure <de me blesser>.

Sentence (6b), a complex sentence made up of two coordinated clauses, is grammatical but communicatively deficient: the sentence is about bikes, and it would be in order to have BICYCLETTES 'bikes' as the Theme not only in the first conjunct but also in the second, where this role has been usurped by SOCIÉTÉ 'society'. This may be accomplished by substituting for the verb BÉNÉFICIER '(to) benefit' a construction consisting of the copula ÊTRE '(to) be' and the adjective BÉNÉFIQUE 'beneficial', characterizing the second actant of the verb (a Y from which X benefits is beneficial to X):

Les bicyclettes[Theme] ne sont pas aussi rapides mais elles[Theme] sont bénéfiques pour la société.

The corresponding information can be found in the paraphrastic zone of the lexical entry for BÉNÉFICIER, containing instantiations of paraphrasing rules in which this verb can participate, with pointers towards the rules themselves (to be found in the tutorial on paraphrasing).

X bénéficie de Y
~ X tire des bénéfices de Y    [Rule n. xx]
~ Y offre des bénéfices à X    [Rule n. xx]
~ Y est bénéfique pour X       [Rule n. xx]
etc.

The corresponding paraphrasing rule follows. It is given first in soft encoding, i.e., a learner-friendly variant of the standard Meaning-Text formalism, and then, for comparison, in the latter formalism as well.

X V-e [preposition] Y ≡ Y est Adj(V) preposition X
L(V) ≡ Oper1(A2(L(V))) -II→ A2(L(V))

In the tutorial on paraphrasing, the learner will find the necessary explanations concerning the rule, as well as its instantiations involving other lexemes: X doute de Y 'X doubts Y' ~ Y est douteux pour X 'Y is dubious to X', X effraye Y 'X frightens Y' ~ Y est effrayant pour X 'Y is frightening to X', X peut Y-er 'X can do Y' ~ Y est faisable pour X 'Y is doable for X', etc.

The examples above show that a reformulation dictionary for the intermediate-to-advanced level needs to contain fairly sophisticated information concerning the meaning and the lexical and syntactic co-occurrence of lexemes. In order to be able to find this information, its users need to have quite a bit of linguistic knowledge.
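As a rough illustration of how an instantiation of this rule might be computed from lexicographic data, the following Python sketch applies the soft-encoded rule to schematic actants. The mini-lexicon `A2_LEXICON`, the class `A2Entry` and the function `copular_paraphrase` are hypothetical names introduced here for illustration only; they are not part of the dictionary. The sketch covers only two of the instantiations quoted in the text and handles neither agreement nor inflection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class A2Entry:
    """Adjective characterizing the verb's second actant, with its preposition."""
    adjective: str    # A2(L(V)), e.g. "bénéfique" for "bénéficier"
    preposition: str  # preposition that introduces X after the adjective


# Hypothetical mini-lexicon: only pairs quoted in the text.
A2_LEXICON = {
    "bénéficier": A2Entry("bénéfique", "pour"),
    "douter": A2Entry("douteux", "pour"),
}


def copular_paraphrase(x: str, verb: str, y: str) -> str:
    """Rewrite 'X V (prep) Y' as 'Y est Adj(V) prep X' (no agreement handled)."""
    entry = A2_LEXICON[verb]  # raises KeyError if the verb has no recorded A2
    return f"{y} est {entry.adjective} {entry.preposition} {x}"


print(copular_paraphrase("X", "bénéficier", "Y"))  # Y est bénéfique pour X
print(copular_paraphrase("X", "douter", "Y"))      # Y est douteux pour X
```

Working on schematic actants (X, Y) mirrors the way the paraphrastic zone itself states its instantiations; producing an actual sentence would additionally require agreement of the copula and the adjective, which is left out here.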

3. DIRE AUTREMENT ('SAY IT DIFFERENTLY'): A REFORMULATION DICTIONARY FOR INTERMEDIATE-TO-ADVANCED FSL LEARNERS

Dire autrement (henceforth DA) is an electronic reformulation dictionary for intermediate-to-advanced FSL learners, currently being developed at the French Department of Dalhousie University (http://direautrement.french.dal.ca). Here I will focus on the organization and presentation of lexicographic information in a reformulation dictionary at the micro-level, i.e., within a lexicographic entry, which describes a single lexical unit (= a lexeme or an idiom). Determination of the lexical stock of the dictionary, as well as its macro-organization, i.e., the grouping of lexical units into vocables (= polysemous words) and semantic fields, is left out. For a description of the DA dictionary that includes these aspects of its organization, see Milićević & Hamel (in press).

Subsection 3.1 presents a standard DA entry and identifies in it the sources of paraphrases (i.e., the information that can be exploited for paraphrasing), described in turn in subsection 3.2.

3.1 Standard Lexicographic Entry in the DA Dictionary

The zones of a DA lexicographic entry, their content and (where applicable) the way it is presented are indicated in Figure 1. The information relevant to paraphrasing is in bold face. A DA entry is actually a hybrid entry in the sense that it combines types of lexicographic information and formalisms that can be found in other EC dictionaries, while at the same time adding information and formalisms peculiar to it. Thus, like ECD, it contains a full-fledged lexicographic definition of the keyword (sub-zone 2c), written in a somewhat simplified ECD style; like LAF, it uses soft encoding of lexical functions (zone 4); and it is the only EC dictionary that contains a paraphrasing component (zone 6), with soft encoding of paraphrasing rules. A sample DA entry is given in the APPENDIX.


1. Identification of the Lexical Unit L (= keyword of the article)
   Citation form of L, its part of speech and its grammatical characteristics (e.g., gender, defective paradigm, etc.)
2. Semantic characterization of L
   a) L's Semantic Field(s)
   b) L's Semantic Label: taxonomic characteristics of L (e.g., fact, act, entity, artifact, etc.) = its minimal paraphrase
   c) L's Definition: paraphrastic decomposition of L's meaning
   d) L's connotations
3. Syntactic characterization of L
   L's Government Pattern [Subcategorization Frame]
   Presented in soft encoding (to be developed)
4. Characterization of Lexical Relations of L
   Presented in soft encoding, in terms of natural language formulas paraphrasing the meaning of the corresponding lexical functions
5. Lexical units related to L (without being quasi-synonymous with it)
6. Paraphrases involving L
   Instantiations of paraphrasing rules applicable to L, with pointers towards the rules, presented in soft encoding (to be developed)
7. Examples of L's use
8. List of idioms containing L's signifier
WARNINGS alerting the user to potential difficulties associated with L's use (may be found in any of the zones 1-4; zone 5 is a warning in itself).

Figure 1: A DA Entry Template.
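For readers who think in terms of data structures, the template of Figure 1 can be approximated by a record with one field per (sub)zone. The following Python sketch is only an illustration under that assumption: the class `DAEntry`, its field names and the method `paraphrase_sources` are hypothetical and do not describe the actual implementation of the DA dictionary. The sample values are taken from the entry given in the APPENDIX.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DAEntry:
    """Hypothetical container mirroring the zones of Figure 1."""
    # Zone 1: identification of the lexical unit L
    citation_form: str
    part_of_speech: str
    grammatical_features: List[str] = field(default_factory=list)
    # Zone 2: semantic characterization (sub-zones 2a-2d)
    semantic_fields: List[str] = field(default_factory=list)
    semantic_label: str = ""
    definition: str = ""
    connotations: List[str] = field(default_factory=list)
    # Zone 3: government pattern, actant -> possible surface expressions
    government_pattern: Dict[str, List[str]] = field(default_factory=dict)
    # Zone 4: soft-encoded LF formula -> values
    lexical_relations: Dict[str, List[str]] = field(default_factory=dict)
    # Zone 5: related lexical units
    related_units: List[str] = field(default_factory=list)
    # Zone 6: (paraphrase pattern, pointer to the rule)
    paraphrases: List[Tuple[str, str]] = field(default_factory=list)
    # Zones 7 and 8, plus warnings
    examples: List[str] = field(default_factory=list)
    idioms: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)

    def paraphrase_sources(self) -> Dict[str, object]:
        """Return the (sub)zones that serve as sources of paraphrases."""
        return {
            "2b": self.semantic_label,
            "2c": self.definition,
            "3": self.government_pattern,
            "4": self.lexical_relations,
        }


entry = DAEntry(
    citation_form="CONSEILLER I.1",
    part_of_speech="verbe, transitif",
    semantic_label="communiquer verbalement",
    examples=["Son médecin lui conseilla le repos."],
)
print(list(entry.paraphrase_sources()))  # ['2b', '2c', '3', '4']
```

Keeping zone 6 as a separate field, rather than deriving it on the fly, follows the text's description of that zone as an explicit presentation of information deducible from the other zones.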

3.2 Sources of Paraphrases in a DA Entry

Sources of paraphrases in a DA entry are as follows: 1) semantic label [sub-zone 2b]; 2) definition [sub-zone 2c]; 3) Government Pattern [zone 3]; and 4) lexical relations [zone 4]. Sub-zone [2b] is a minor source of paraphrases only, i.e., the paraphrasing possibilities it offers are rather limited,¹⁰ while the other (sub)zones can be either minor or major sources. Part of the information relevant to paraphrasing that can be deduced from these sources is explicitly indicated in zone [6]. In what follows, I will discuss (sub)zones [2c] and [4], leaving aside (sub)zones [2b] and [3].

3.2.1 Lexicographic definition of L

The lexicographic definition of L allows for semantic paraphrasing, i.e., paraphrasing via manipulation of (configurations of) meanings. We can either 1) replace L by its definition, which is a semantic decomposition of L's meaning in terms of simpler meanings (this is a minor source of paraphrases), or 2) use L's definition as a point of entry for omitting or adding some meanings or changing their configurations (a major source).

The definition of L is actually a rule that establishes the equivalence between L (more precisely, L's propositional form: a formula containing L and its semantic actants) and L's semantic decomposition; cf.:

10

On semantic labels and their use in Meaning-Text lexicography, see Polguère (2003) and Milićević (1995).


RuleSEM 1
[L's propositional form = definiendum] ≡ [L's semantic decomposition = definiens]
X empêche Y de faire Z(Y) 'X prevents Y from doing Z(Y)' ≡ X cause que Y ne peut pas faire Z(Y) 'X causes Y to be unable to do Z(Y)'

(7) a. Ce bruit m'empêche de dormir 'This noise prevents me from sleeping.'
    b. Je ne peux pas dormir à cause de ce bruit 'I cannot sleep because of this noise.'

Even though the definition of L is an exact paraphrase of L's meaning, it cannot always replace L in texts, due to its insufficient idiomaticity. The result of substitution is quite acceptable in some cases (Mon gendre = Le mari de ma fille enseigne l'anglais 'My son-in-law = My daughter's husband teaches English'), but very often it is not (Jean s'est réveillé *= a cessé de dormir à 7 heures 'Jean woke up *= stopped sleeping at 7 o'clock'). This source of paraphrase should thus be used with caution.

The number of semantic decomposition rules is equal to the number of lexical units of the language (roughly, a million). Once completed, the DA dictionary will contain 1,000 such rules.

Semantic omissions or additions and modifications of meaning configurations are taken care of by semantic reconstruction rules. A rule of this type establishes a (quasi-)equivalence between two configurations of meanings that cannot be related via definitions; cf.:

RuleSEM 2
X fait Y habituellement 'X does Y habitually' ≈ X peut faire Y 'X can do Y'
(8) a. Cet enfant parle couramment 'This child speaks fluently.'
    b. Cet enfant peut parler couramment 'This child can speak fluently.'

Semantic reconstruction rules constitute the Meaning-Text semantic paraphrasing subsystem. Since these are fairly new rules, their number has yet to be assessed (my guess is that it should be around a few hundred). The DA dictionary will use only a small number of these rules.

Semantic decomposition and reconstruction rules as presented above are already in soft encoding; in their hard version they are written as (quasi-)equivalences between semantic networks. However, DA definitions are simplified with respect to those of ECD.

3.2.2 Lexical functions [LFs] of L

Lexical functions allow for lexical-syntactic paraphrasing, i.e., paraphrasing via synonymic lexical substitutions, possibly leading to modifications of the initial syntactic environment. We can either 1) replace an element of the value of an LF(L) by another such element or by a literal realization of the meaning of the LF (these are minor sources of paraphrases) or else 2) replace L by a lexeme L′ (or a lexical configuration) with which L is in a lexical-functional relation (a major source).

To illustrate the first case, let us consider the following LFs of the lexeme CONSEIL#I.1 'advice' and their respective values; each LF is indicated in its standard notation and in soft encoding, which is in fact a natural language formula paraphrasing the meaning of the LF (Popovic, 2003; Polguère, 2004).


LF                        Paraphrase of the LF (soft encoding)                       Value
1. Oper1                  {donner un C.} 'give A.'                                   donner [(ART) ~ NZ]; fournir, offrir [ART ~ NZ]
2. trop de C.+Labor12     {donner trop de C.} 'give too much A.'                     inonder [NZ de ~s], accabler [NZ de ~s]
3. essayer de Caus3Func3  {essayer de recevoir un C.} 'try to receive some A.'       demander [(ART) ~ NX], solliciter [ART ~ NX]

Figure 2: Three LFs of the noun CONSEIL#I.1 advice and their values.
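The content of Figure 2 lends itself to a simple lookup structure. The Python sketch below, offered only as an illustration, stores each LF with its soft-encoded formula and its values, and lists what can stand in for a given value: the other values first, the natural language formula last. The names `LexicalFunction`, `CONSEIL_LFS` and `alternatives` are hypothetical, and the government information given in square brackets in the figure is deliberately omitted.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class LexicalFunction:
    name: str                # standard notation, as printed in Figure 2
    formula: str             # soft encoding (natural language formula)
    values: Tuple[str, ...]  # values of the LF for the keyword CONSEIL#I.1


# Data copied from Figure 2; government patterns are left out of this sketch.
CONSEIL_LFS = [
    LexicalFunction("Oper1", "donner un C.", ("donner", "fournir", "offrir")),
    LexicalFunction("trop de C.+Labor12", "donner trop de C.", ("inonder", "accabler")),
    LexicalFunction("essayer de Caus3Func3", "essayer de recevoir un C.",
                    ("demander", "solliciter")),
]


def alternatives(lf_name: str, current: str) -> List[str]:
    """Other values of the same LF, followed by the formula as a last resort."""
    for lf in CONSEIL_LFS:
        if lf.name == lf_name:
            if current not in lf.values:
                raise ValueError(f"{current!r} is not a value of {lf_name}")
            return [v for v in lf.values if v != current] + [lf.formula]
    raise KeyError(lf_name)


print(alternatives("Oper1", "donner"))
# ['fournir', 'offrir', 'donner un C.']
```

Placing the formula at the end of the list reflects the fact that it is a literal, less idiomatic rendering of the LF's meaning than any of the listed values.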

For each of the LFs, any element of its value can, as a rule, be replaced by any other, since they are synonymous; cf.:

(9) a. Jean lui a donné <a fourni, a offert> un CONSEIL fort utile 'Jean gave <furnished, offered> him some very useful advice.'
    b. Tu les inondes <accables> de CONSEILS 'You flood <burden> them with advice.'
    c. Il faut demander <solliciter> son CONSEIL 'It is necessary to ask for <solicit> his advice.'

An element of the value of a given LF(L) can also be replaced by the natural language formula which encodes the meaning of this LF; cf.:

(10) a. Tu les inondes de CONSEILS 'You flood them with advice.' ≡ Tu leur donnes trop de CONSEILS 'You give them too much advice.'
     b. Il faut lui demander un CONSEIL 'It is necessary to ask for his advice.' ≡ Il faut essayer de recevoir son CONSEIL 'It is necessary to try to receive his advice.'

A natural language formula encoding an LF's meaning is of course less idiomatic than the values of the LF (for a given L); thus, the formulas should be used as a last resort, if the values themselves are not known or cannot be recalled.

Let me now turn to the second (major) case of lexical-syntactic paraphrasing: lexical-functional equivalences, modeled by means of paraphrasing rules, such as the following two.

RuleLEX-SYNT 1: L(V) ≡ Oper1(S0(L(V))) -II→ S0(L(V))
(11) a. Jean est tombé [= L(V)] 'Jean fell.'
     b. Jean a fait [= Oper1(S0(L))] une chute [= S0(L)] 'Jean took a fall.'

This rule describes a synonymic substitution: a replacement of a verbal lexeme L by a configuration consisting of the corresponding deverbal noun [S0] and the light verb [Oper1] that takes this noun as its object and the first syntactic actant of the noun as its subject.

RuleLEX-SYNT 2: L ≡ Anti(L) -ATTR→ NE...PAS

(12) a. Jean était absent [= L] 'Jean was absent.'
     b. Jean n'était pas présent [= Anti(L)] 'Jean was not present.'

This is an antonymic substitution: a replacement of a lexeme L by a configuration consisting of an antonym of L and the negation.
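The two lexical-syntactic rules above can be mimicked, at the level of lemmas, by table lookups. The following Python sketch is a simplified illustration under several assumptions: the tables `S0`, `OPER1` and `ANTI` and both function names are hypothetical, they contain only the lexemes of examples (11) and (12), and no inflection or syntactic restructuring is performed.

```python
from typing import Tuple

# Hypothetical LF tables, restricted to the lexemes of examples (11) and (12).
S0 = {"tomber": "chute"}      # S0(L): deverbal noun of the verb L
OPER1 = {"chute": "faire"}    # Oper1(N): light verb taking N as its object
ANTI = {"absent": "présent"}  # Anti(L): antonym of L


def support_verb_paraphrase(verb: str) -> Tuple[str, str]:
    """RuleLEX-SYNT 1: L(V) -> (Oper1(S0(L)), S0(L)), as a pair of lemmas."""
    noun = S0[verb]
    return OPER1[noun], noun


def antonymic_paraphrase(lexeme: str) -> Tuple[str, str]:
    """RuleLEX-SYNT 2: L -> negation plus Anti(L), as a pair of lemmas."""
    return "ne ... pas", ANTI[lexeme]


print(support_verb_paraphrase("tomber"))  # ('faire', 'chute')
print(antonymic_paraphrase("absent"))     # ('ne ... pas', 'présent')
```

Returning lemma pairs rather than finished sentences keeps the sketch honest about what the rules themselves specify: the lexical configuration, not its inflected surface form.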


There are about a hundred rules of this type; they constitute the Meaning-Text lexical-syntactic paraphrasing subsystem. Only one tenth of these rules are used in the DA dictionary (in a learner-friendly formalism that has yet to be worked out; cf. the rule given at the end of Section 2, written in such a formalism).

Finally, the paraphrastic zone of a DA entry [zone 6] presents explicitly the information that can be deduced from zones [2c], [3] and [4] (with the exception of the information coming from the so-called minor sources of paraphrases, cf. subsection 3.2). More specifically, it contains instantiations of paraphrasing rules (applicable to the keyword) of the following types: 1) semantic decomposition and reconstruction rules exploiting L's definition; 2) syntactic rules exploiting modifications of L's Government Pattern; and 3) lexical-syntactic rules exploiting L's lexical functions. Each rule instantiation is linked to the corresponding rule in the tutorial on paraphrasing. For an illustration of the paraphrastic zone in a DA entry, see the APPENDIX.

4. SUMMARY AND PRELIMINARY CONCLUSIONS

The paper has argued that lexical competence stems largely from paraphrasing competence, i.e., the ability to exploit lexical relations in order to reformulate one's discourse. It has pointed out the necessity of developing teaching tools geared toward paraphrasing and has presented one such tool: a learner's reformulation dictionary anchored in the Explanatory-Combinatorial Lexicology of the Meaning-Text linguistic theory. A specific reformulation dictionary for intermediate-to-advanced learners of French has been described.

Teaching tools intended to facilitate lexical acquisition must make use of concepts and formalisms developed by theoretical lexicology, as only the latter allow for coherent and rigorous structuring of (notoriously complex) lexicographic information. This, in turn, entails the necessity that their users, in particular L2 learners, assimilate those concepts and formalisms. This task, often intimidating for learners, can be facilitated if a learner-friendly version of linguistic descriptive tools is adopted. (However, as our experience so far suggests, learner-friendly does not necessarily mean less formalization.)

ACKNOWLEDGEMENTS

I am grateful to Yves Bourque, Marie-Josée Hamel, Igor Mel'čuk, Muriel Péguret and Alain Polguère for their comments on a pre-final version of this paper.

REFERENCES

Alonso Ramos, M. (2004). Elaboración del Diccionario de colocaciones en español y sus aplicaciones. In P. Bataner & J. de Cesaris (Eds.), De Lexicographia. Actes del I Symposium internacional de Lexicografia (pp. 149-162). Barcelona: IULA-Edicions Petició.
Apresjan, Yu., Djacenko, P., Lazurski, A. & Tsinman, L. (to appear). O kompjutornom učebnike leksiki russkogo jazyka. Russkij jazyk v naučnom osveščenii.
Daunay, B. (2002). La paraphrase dans l'enseignement du français. Bern: Peter Lang.


Goddard, C. & Wierzbicka, A. (to appear). Semantic Primes and Cultural Scripts in Language Learning and Intercultural Communication. In F. Sharifian & G. Palmer (Eds.), Applied Cultural Linguistics: Implications for Second Language Learning and Intercultural Communication.
Hsia, S. (2000). Grammaticality judgments, paraphrase and reading comprehension: evidence from European, Latin American, Japanese and Korean ESL learners. sunzi1.lib.hku.hk/hkjo/view/10/1000030.pdf
Mel'čuk, I. (1992). Paraphrase et lexique: la théorie Sens-Texte et le Dictionnaire explicatif et combinatoire. In I. Mel'čuk, N. Arbatchewsky-Jumarie, L. Iordanskaja, S. Mantha & A. Polguère (Eds.), Dictionnaire explicatif et combinatoire du français contemporain. Recherches lexico-sémantiques III (pp. 9-59). Montréal: Presses de l'Université de Montréal.
Mel'čuk, I., Clas, A. & Polguère, A. (1995). Introduction à la lexicologie explicative et combinatoire. Louvain-la-Neuve: Duculot.
Mel'čuk, I. & Polguère, A. (2007). Lexique actif du français. L'apprentissage du vocabulaire fondé sur 20 000 dérivations sémantiques du français. Louvain-la-Neuve: de Boeck Duculot.
Milićević, J. (1995). Étiquettes sémantiques dans un dictionnaire de type explicatif et combinatoire [mémoire de maîtrise]. Université de Montréal.
Milićević, J. (2007). La paraphrase. Modélisation de la paraphrase langagière. Bern: Peter Lang.
Milićević, J. & Hamel, M.-J. (in press). Un dictionnaire de reformulation pour les apprenants du français langue seconde. Actes du 29e colloque de l'Association linguistique des provinces de l'Atlantique, Moncton, 4-5 novembre 2005.
Milićević, J., Alonso-Ramos, M. & Hamel, M.-J. (2006). Un dictionnaire d'apprentissage de type explicatif et combinatoire. Principes et techniques d'élaboration. Le 30e colloque de l'Association linguistique des provinces de l'Atlantique, Halifax, 3-4 novembre 2006.
Polguère, A. (2002). Une base de données lexicales du français et ses applications possibles en didactique. Revue de linguistique et de didactique des langues (LIDIL) 21, 75-97.
Polguère, A. (2003). Étiquetage sémantique des lexies dans la base de données DiCo. Traitement Automatique des Langues 44(2), 39-68.
Polguère, A. (2004). La paraphrase comme outil pédagogique de modélisation des liens lexicaux. In E. Calaque & J. David (Eds.), Didactique du lexique: contextes, démarches, supports (pp. 115-125). Bruxelles: De Boeck.
Popovic, S. (2003). Paraphrasage des liens de fonctions lexicales [mémoire de maîtrise]. Université de Montréal.
Russo, M. & Pippa, S. (2004). Aptitude to Interpreting: Preliminary Results of a Testing Methodology Based on Paraphrase. META 49(2), 409-432.
Tremblay, O. (2003). Une approche structurée de l'enseignement du lexique en français langue maternelle basée sur la lexicologie explicative et combinatoire [mémoire de maîtrise]. Université de Montréal.
Wanner, L. (Ed.) (1996). Lexical Functions in Lexicography and Natural Language Processing. Amsterdam/Philadelphia: Benjamins.


Windsor, M. (1976). L'exercice de paraphrase dans l'enseignement du français. Audio visual language learning 14(1), 3-8.


APPENDIX

Entry for CONSEILLER I.1 '(to) advise' in the DA Dictionary

CONSEILLER I.1, verbe, transitif

Caractérisation sémantique
Champ sém.: communication verbale, acte de parole, aide
Étiquette sém.: communiquer verbalement
Définition: individu X ~ à individu Z de faire Y = X croit que Z veut savoir#I l'opinion#1 de X sur ce que Z doit faire dans la situation en question. || X communique à Z que faire action Y ou être en état Y serait dans l'intérêt de Z, le but de cette communication étant d'inciter Z à Y.
Emploi performatif est possible: Je vous/te conseille [Y]

Expression des actants
[qui conseille]         X = I = N
[ce qui est conseillé]  Y = II = N, de V-inf, PROPOSITION, LE(Pron.neutre)
[à qui on conseille]    Z = III = à N, IL(Clitique.datif)
Expression de Y est obligatoire
Jean [X] ~ à Marie [Z] la modération <d'être modérée> [Y]; Il [X] le [Y] lui [Z] ~; Jean [X] ~: Soyez modérés! [Y]
Emploi impersonnel est possible: Il est conseillé (à Z) de [Y]

Relations lexicales
{quasi-synonyme}                      recommander, suggérer
                                      recommander ≈ conseiller vivement; suggérer ≈ conseiller discrètement
{antonyme}                            déconseiller, atténué ne pas conseiller#I.1
                                      déconseiller ≈ conseiller de ne pas faire Y
{X qui C. habituellement}             conseiller(N)#I.1
{fait de C. ou ce qui a été C. [=Y]}  conseil#I.1
{intensément}                         fortement, vivement

Lexies de sens proche
PROPOSER#2a


Paraphrases
X ~ à Z de faire Y
1) X dit à Z qu'il sera dans l'intérêt de Z de faire Y    [Par décomposition]
2) X recommande <propose, suggère> à Z de faire Y         [Règle n° xx]
3) X donne à Z le conseil de faire Y                      [Règle n° xx]
4) Le conseil de X à Z est de faire Y                     [Règle n° xx]
X ne ~ pas à Z de faire Y
5) X déconseille à Z de faire Y                           [Règle n° xx]

Exemples
Son médecin lui conseilla le repos. | Nous vous conseillons vivement de faire votre demande en ligne. [Source: Google]

COMPLEXITY, ACCURACY AND FLUENCY: THE ROLE OF PREFABRICATED SEQUENCES IN BEGINNERS' INTERLANGUAGE¹¹

Florence Myles
University of Newcastle, England

11 This article, originally written in French, is a simplified and abridged version of Myles (2004). See also Myles et al. (1998; 1999) for more detailed studies of verbal and interrogative prefabricated sequences, respectively.

1. INTRODUCTION

The role that prefabricated sequences play in foreign language acquisition, as well as their status within the learner's emerging grammar, remains poorly understood. Many theories of learning presuppose that the presence of a grammatical structure in the learner's interlanguage means that the structure in question has been acquired. This presupposition is particularly problematic for the interlanguage of beginners, whose first steps towards basic communicative competence generally involve the rote learning of role plays, which enable them to carry out a few linguistic exchanges despite a lack of adequate grammatical competence. In the language classroom, for example, it is very common to hear absolute beginners exchange information about their hobbies, their families, etc., by means of questions that are structurally far more complex than their productive grammatical competence would allow.

Although the presence of these prefabricated sequences in learner language has been attested for more than thirty years (Hakuta, 1974, 1976; Myles, 2004; Myles, Hooper, & Mitchell, 1998; Myles, Mitchell, & Hooper, 1999; Pawley & Syder, 1983; Raupach, 1984; Skiba & Dittmar, 1992; Towell, 1987; Vihman, 1982; Weinert, 1995; Wong-Fillmore, 1976; Wray, 2002), the role they play in the development of the learner's grammatical system remains poorly understood. Do they serve only as temporary support until the learner has acquired the necessary grammatical competence? Do they contribute more directly to the learner's linguistic development? Do beginning learners use them to give an impression of fluency and grammatical complexity that goes beyond their rudimentary and hesitant system?

In this article, after briefly presenting some methodological problems of definition and identification, we will analyze the development of several prefabricated sequences in two groups of beginning learners who took part in a longitudinal and a cross-sectional study, respectively. We will then compare this development with the emergence of the productive grammatical system in these same learners, in order to study the relationship between the two. In a final part, we will analyze the status of these sequences within the learners' emerging grammatical system, as well as the role they play in the learner's linguistic development.

2. METHODOLOGY

In this first part, we will begin by addressing the questions raised by the definition of such sequences and by their identification in learners' interlanguage.

2.1 Definition

Prefabricated sequences are a very common phenomenon in all linguistic exchange and are not restricted to learner language. Several terms are used in linguistics to refer to them, such as expressions idiomatiques or expressions figées in French, and chunks, formulaic language, unanalysed formulae, rote-learned sequences and prefabricated routines in English, with slightly different meanings. We rely here on a psycholinguistic definition, such as the one given by Wray (2002):

a sequence, continuous or discontinuous, of words or other elements, which is, or appears to be, prefabricated: that is, stored and retrieved whole from memory at the time of use, rather than being subject to generation or analysis by the language grammar.

(In the author's French rendering: une séquence, continue ou discontinue, de mots ou autres éléments, qui est ou semble être préfabriquée: c'est-à-dire mémorisée et extraite de la mémoire en un tout au moment de la production, et non pas générée ou analysée par le système grammatical.¹²)

This definition is uncontroversial and reflects a consensus among researchers working in this area that any definition must include the notion of a multimorphemic unit memorized and produced as a whole, and not generated from grammatical rules combining individual lexical items (Myles et al., 1998; Myles et al., 1999). These sequences are very common among native speakers as well as among learners of a first or foreign language and among aphasics, and some researchers suggest that a large part of our everyday linguistic exchanges is in fact prefabricated (Wray, 2002). But while it is relatively easy to define these sequences, it is considerably more problematic to identify them: how can we know whether a production was generated by the grammar or whether it is an unsegmented unit?

2.2 Identification

The identification criteria all draw on the notions of complexity, fluency and accuracy of prefabricated sequences as compared with the productions generated in parallel by the learners' grammar. I will present here the most commonly used criteria, which were particularly relevant in our study. They are taken from Weinert (1995).

Length and complexity:

Les SP sont gnralement plus longues et plus complexes que les autres productions des apprenants. Par exemple, ces deux interventions ont t produites par le mme apprenant lors de la mme sance, aprs seulement un trimestre dapprentissage du franais langue trangre:
(1) (2)

quel ge as-tu ? (pronom interrogatif prpos; inversion du verbe et du sujet) *il ge frre ? (sens voulu13 (projet/prmdit): quel ge a ton frre ?) Fluidit phonologique:

12 My translation.
13 The meaning intended by the learner is easy to recover from the context.

SPs tend to be phonologically more fluent than the learner's other productions, and to be uttered without hesitation and without prosodic breaks. The following examples come from another pupil, during the same session after one term of French:

(3) quelle est la date de ton anniversaire ? (intended meaning: quelle est la date de SON anniversaire ?, 'what is the date of HIS/HER birthday?')
(4) *est elle bon anniversaire ? (intended meaning: quelle est la date de son anniversaire ?)

- Inappropriate use:

SPs are often used inappropriately, from a syntactic, semantic or pragmatic point of view, as we have already seen above and as the following example shows:

(5) *mon petit garçon euh où habites-tu ? (intended meaning: où habite votre petit garçon ?, 'where does your little boy live?')

This criterion is probably the most revealing of the prefabricated status of a production, and it is extremely common among our learners at the very beginning of learning.

- Invariance:

SPs do not generally lend themselves to modification. Learners are unable to substitute their constituent elements, such as the second person singular reference in the following example:

(6) as-tu des frères ou des sœurs ? (intended meaning: a-t-elle des frères ou des sœurs ?, 'does she have any brothers or sisters?')

Nowhere else in our data do we find the structure as-tu used by the learners.

- Grammatical accuracy and complexity:

In general, SPs are grammatically correct and more advanced than other productions at a given stage. They appear unrelated to the learner's productive capacity:

(7) comment t'appelles-tu ? (intended meaning: comment s'appelle-t-il ?, 'what is his name?')
(8) *euh une nom ? (intended meaning: comment s'appelle-t-il ?)

These two utterances were produced by the same learner, the first after only one term of French and the second after 7 terms, which could suggest that this learner is more advanced at the beginning than at the end of the study if the prefabricated status of the first example is not taken into account. Other criteria can be used to identify prefabricated sequences, for example criteria linked to the learning context (cf. Myles, Hooper, & Mitchell, 1998; Myles, Mitchell, & Hooper, 1999; Weinert, 1995). Those I have just presented are the most common and proved the most useful in our corpus. Essentially, what makes it possible to decide whether a production is prefabricated is the comparison with a learner's productive grammar: if a sequence is clearly more complex syntactically than the rest of a learner's productions, and is used in a way that is inappropriate to the context, its prefabricated status is very likely.

3. STUDY

This study follows the development of several prefabricated sequences in two oral learner corpora, both available electronically from the FLLOC database (French Learner Language Oral Corpora; www.flloc.soton.ac.uk).

3.1 Participants

- Beginner corpus: The first group of participants is drawn from a longitudinal study of 60 secondary school pupils beginning French in Great Britain; they were in Year 7 (equivalent to 6ème in France) at the start of the study, and took part in individual oral activities with a researcher once a term for two years, amounting to about 200 hours of recordings (terms 2 and 3 of Year 7; terms 1, 2 and 3 of Year 8; and term 1 of Year 9). The qualitative analysis that follows is based on sixteen of these learners.
- Post-beginner corpus: The second group of participants is drawn from a cross-sectional study of learners in Years 9, 10 and 11 in secondary schools in Great Britain. Twenty pupils in each of these years took part in oral activities similar to those of the longitudinal study. In the analysis that follows, we examine the use of SPs by the 20 learners in Year 11, that is, learners who have had two more years of French than those at the end of the preceding study.

3.2 Sequences studied and their development

The learners' very first utterances seemed to indicate that many of their productions containing verbs were prefabricated. They were often syntactically complex, as in the case of interrogatives, and coexisted for long periods with very simple utterances, generally verbless or at best containing an uninflected verb. For this reason we chose to study three verbal sequences that are very common across the corpus, together with four interrogative sequences produced from the very beginning of learning.

3.2.1 Verbal sequences: j'aime; j'adore; j'habite

The fact that our learners very often use these three sequences inappropriately, as in *Monique j'aime (= Monique aime, 'Monique likes') or *La garçon j'aime le cricket ? (= est-ce que le garçon aime le cricket ?, 'does the boy like cricket?'), clearly shows that they have not been analysed into their constituents and are prefabricated. To make sure of this, we searched the corpus for all occurrences of the pronoun j' outside these sequences, as well as of the verbs aimer, adorer and habiter with a subject other than j'.
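A search of this kind can be sketched in a few lines of Python. This is only a minimal illustration, not the procedure actually used in the study: the function name `count_j_uses`, the regular expressions and the assumption that utterances are available as plain strings are all hypothetical, and the transcription conventions of the real corpus may differ.

```python
import re
from collections import Counter

# Hypothetical patterns for illustration only.
SP_VERBS = ("aime", "adore", "habite")
# The pronoun j' followed by any word (straight or curly apostrophe).
J_TOKEN = re.compile(r"\bj['’](\w+)", re.IGNORECASE)
# aime/adore/habite NOT attached to j' (any other subject, or no subject at all).
VERB_TOKEN = re.compile(r"(?<!['’\w])(aime|adore|habite)\b", re.IGNORECASE)

def count_j_uses(utterances):
    """Count uses of j' inside and outside the three sequences."""
    counts = Counter()
    for utterance in utterances:
        for match in J_TOKEN.finditer(utterance):
            word = match.group(1).lower()
            if word in SP_VERBS:
                counts["j_in_sequence"] += 1      # j'aime, j'adore, j'habite
            elif word == "ai":
                counts["j_ai_excluded"] += 1      # j'ai is set aside, as in the text
            else:
                counts["j_other_verb"] += 1       # productive use of j'
        counts["verb_outside_sequence"] += len(VERB_TOKEN.findall(utterance))
    return counts

sample = ["Monique j'aime", "elle habite le Londres ?", "j'ai douze ans"]
print(count_j_uses(sample))
```

Keeping j'ai in a separate bucket mirrors the decision reported in the text to exclude it, since it also behaves like an SP in this corpus.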


We counted 329 occurrences of these three sequences in the whole beginner corpus (Years 7, 8 and 9). Of these 329 cases, about half (158) were used inappropriately, as for example in Richard j'aime le musée (= Richard aime le musée, 'Richard likes the museum'). The rest were used correctly, because the context required a first person pronoun. By contrast, there were only 3 occurrences of j' outside these sequences in the whole corpus (excluding j'ai, which also meets the criteria characteristic of SPs in our corpus). This represents less than 1% of the use of the pronoun j' by these learners. The Year 11 learners, on the other hand, while still using these three sequences frequently, use j' productively with other verbs, as Table 1 shows:
Table 1: j' followed by aime/adore/habite and followed by other verbs 14

                             j' + aime/adore/habite   j' + other verbs   Total
Beginners (Years 7, 8, 9)    329 (99.1%)              3 (0.9%)           332 (100%)
Post-beginners (Year 11)     26 (59.1%)               18 (40.9%)         44 (100%)
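The percentages in Table 1 follow directly from the raw counts. As a quick check, the short Python sketch below recomputes them; the helper name `row_shares` and the dictionary keys are illustrative only and are not part of the study.

```python
def row_shares(counts):
    """Return each count as a percentage of the row total, to one decimal place."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}

# Raw counts taken from Table 1.
beginners = {"j_in_sequence": 329, "j_other_verb": 3}
post_beginners = {"j_in_sequence": 26, "j_other_verb": 18}

print(row_shares(beginners))       # {'j_in_sequence': 99.1, 'j_other_verb': 0.9}
print(row_shares(post_beginners))  # {'j_in_sequence': 59.1, 'j_other_verb': 40.9}
```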

As for the use of the verbs aimer/adorer/habiter outside these sequences, it is not as frequent, but some learners nevertheless seem to have segmented the sequence in order to extract the verb. We found 39 occurrences of the verb aimer among the beginners, 34 of the verb habiter and 37 of the verb adorer. Interestingly, in all these cases the verb ends in a schwa, as in the SP. This contrasts sharply with the other verbal productions of the beginner learners, which are practically all uninflected, ending in [e], e.g. la mère et le garçon arriver le lac. In a previous study (Myles, 2005), we found three stages in the development of verbal morphology in these same learners:

1. verbless utterances
2. uninflected verbs
3. inflected verbs

Table 2 illustrates this development for a very common verb produced by all learners during a narrative 15, the verb regarder:
Table 2: proportion of inflected/uninflected forms of the verb regarder. 16

           Inflected form   Uninflected form   Total
Year 8     16 (26.2%)       45 (73.8%)         61 (100%)
Year 9     20 (34.5%)       38 (65.5%)         58 (100%)
Year 11    51 (66.2%)       26 (33.8%)         77 (100%)

14 Table taken from Myles (2004: 146).
15 All contexts required an inflected form.
16 Table taken from Myles (2004: 146).

It would seem that, unlike verbs originating in prefabricated sequences, other verbs first appear in uninflected form, for example *ma mère arriver au maison; *un journaliste parler le grande-mère 17. The number of inflected forms of the verb regarder increases gradually, from about a quarter in Year 8 to about two thirds in Year 11. This leads us to think that the verb forms aime/habite/adore, which are always produced in inflected form, come from the segmentation of SPs. We return to this point later. Indeed, if we look more closely at how these verbal SPs are used, we can see learners gradually realising that they refer to the first person, and that they must modify them in one way or another in order to change the reference. In a first stage, learners use several strategies to indicate the referent explicitly while keeping the SP intact:
(9) j'aime le sp- elle j'aime le sport () euh she likes euh elle j'aime la history museum (= elle aime le sport et elle aime les musées d'histoire)

In a second stage, the subject pronoun is identified and separated from the verb in order to make the third person reference explicit:

(10) j'ai no oh elle habite le [town name] ? (= habite-t-elle [town name] ?, 'does she live in [town name]?')

These examples show a link between the construction of the pronominal system and the analysis of SPs into their constituents.

3.2.2 Interrogative sequences

The analysis of verbal sequences thus seems to indicate that our beginner learners extract the verb from the SP, even though they are not yet able to vary its inflection or to use the first person personal pronoun productively. However, these verbal sequences are structurally relatively simple, and it is worth seeing what happens with syntactically more complex SPs. We therefore now turn to the development of interrogative sequences that are very common from the beginning of our study, all of which contain a fronted interrogative pronoun and inversion of the subject and the inflected verb. Exchanges of personal information from the very first language lessons contain structurally very complex questions, such as comment t'appelles-tu ? où habites-tu ? quel âge as-tu ? quelle est la date de ton anniversaire ? etc. From our first data collection, after only one term of French, this type of question appears. These interrogative sequences coexist with structurally very different questions, often within the same exchange. For example, one of our learners produced the following three questions a few seconds apart:

(11) quelle est la date de ton anniversaire ?
(12) euh tu âge ?
(13) nom ?

17 This is despite the fact that verbs are never taught uninflected; in the narrative in question, the learner has just heard the researcher produce regarde several times.

Interrogative clauses in French are complex, because they involve fronting of the interrogative pronoun as well as inversion of the verb and subject. In addition, a reflexive pronoun sometimes complicates matters further, as in the example below:

(14) comment     t'                   appelles         tu ?
     fronting    reflexive pronoun    inflected verb   subject personal pronoun

It would be very surprising if the grammar of beginner learners could generate such constructions. These sequences are therefore almost certainly prefabricated, the learners not yet having segmented them into their syntactic constituents. This becomes even clearer when we compare these structures with the other questions learners produce when they have no SP available.

3.2.2.1 Development of the interrogative system

Thirteen of our sixteen beginner learners produce comment t'appelles-tu from the first data collection, without any modification. By contrast, if we examine the questions they produce when they cannot fall back on SPs, the verb and subject are never inverted, the verb is generally not inflected, and most of these questions do not even contain a verb, as Table 3 shows 18:
Table 3: proportion of non-SP questions with/without a verb.

                  Questions without verb   Questions with verb   Total
Year 7, term 2    41 (95.3%)               2 (4.7%)              43 (100%)
Year 7, term 3    129 (83.8%)              25 (16.2%)            154 (100%)
Year 8, term 1    53 (82.8%)               11 (17.2%)            64 (100%)
Year 8, term 2    235 (87.4%)              34 (12.6%)            269 (100%)
Year 8, term 3    287 (79.5%)              74 (20.5%)            361 (100%)
Year 9, term 1    182 (81.3%)              42 (18.8%)            224 (100%)

18 Taken from Myles (2004: 148).

When our learners have no SP in their repertoire that meets their communicative needs, in about 80% or more of cases they resort to the simple juxtaposition of noun and/or prepositional phrases. When they do use a verb, it is never inverted and very rarely inflected. The development of their questions is summarised in Myles et al. (1999) as follows:

- Stage 1: no verbs
  je grand maison ?
  et activités soir la cinéma ?
- Stage 2: uninflected verbs
  euh la mère regarder la magasin ?
  umm euh jouer au tennis ?
- Stage 3: inflected verbs (very small number of learners in Year 9)
  la mère regarde euh lire euh la petite frère et sœur euh fêchent ? (= pêchent)
  une journaliste dit est le monstre de Lac Ness ?

It therefore seems clear that the developing interrogative system is structurally very different from the interrogative SPs that coexist with it. It is now worth seeing what becomes of these SPs in the course of acquisition: do learners abandon them when their productive system becomes more sophisticated? Or do they segment them in order to use their constituents productively? This is what we examine in the next section.

3.2.2.2 Development of interrogative SPs

In this section, we follow the development of the SP comment t'appelles-tu ? and its contexts of use. In particular, since this SP is in the second person singular (given the nature of the pair activities common in the classroom), we examine how our learners ask for someone's name in the third person, which several of the activities with the researcher require. Table 4 shows how many times they use the SP without modifying it (hence in the second person), and how many times they correctly indicate the third person in their questions:
Table 4: number of uses of the SP comment t'appelles-tu with reference to a third person. 19

           SP            3rd person    Total
Year 7     18 (52.9%)    16 (47.1%)    34 (100%)
Year 8     31 (39.2%)    48 (60.8%)    79 (100%)
Year 11    27 (22.7%)    92 (77.3%)    119 (100%)

19 Taken from Myles (2004: 150).

At the beginning of learning, more than half of the questions asking for a third party's name use the second person SP comment t'appelles-tu. The proportion then decreases, to less than a quarter of these questions in Year 11. Learners seem to analyse the constituents of this SP gradually. At first, they cannot modify it and do not yet have the linguistic means to indicate the third person referent, so they use it as it is (example 14). In a second stage, they realise that the SP does not have the correct reference, and they add a referent without modifying the SP itself (example 15). In a third stage, they omit the subject pronoun of the SP, knowing that the reference is incorrect, and sometimes replace it with a noun phrase (example 16). Then, in a fourth stage, they realise that the reflexive pronoun t' is also in the second person, and they replace it with s'; the subject is then either omitted or replaced by a noun phrase (example 17). Finally, in a fifth stage, the structure equivalent to the original SP, but in the third person, is used (example 18); only one beginner pupil reaches this stage in Year 9.

Stage 1: (14) comment t'appelles-tu ? (= comment s'appelle-t-il ?)
Stage 2: (15) comment t'appelles-tu le garçon ? (= comment s'appelle le garçon ?)
Stage 3: (16) comment t'appelle (la fille) ? (= comment s'appelle la fille ?)
Stage 4: (17) comment s'appelle, comment s'appelle gar- un garçon ? (= comment s'appelle le garçon ?)
Stage 5: (18) comment s'appelle-t-il ?

The development of this interrogative SP thus shows that it is analysed during the acquisition process, and that its constituents emerge from it. The next section explores the relationship between the analysis of SPs and the construction of a productive grammatical system in our learners.

4. DISCUSSION

Two main questions are addressed in this section. First, what is the relationship between learned knowledge (savoir appris), that is, what learners have learned by heart without analysing it, and acquired knowledge (savoir acquis), that is, the productive grammar that learners construct during the acquisition process? Do learners abandon SPs when their productive grammar is sophisticated enough to meet their communicative needs, or do these SPs feed the acquisition process itself? Second, we discuss the grammatical role of these SPs in the linguistic system under construction. If learners juxtapose them with other elements, as when they produce comment t'appelles-tu le garçon, what is their status: are they verbs, nouns?

4.1 Relationship between acquired and learned knowledge

The relationship between acquired and learned knowledge in second language acquisition remains controversial today, with some researchers arguing for such a relationship (Myles et al., 1998; 1999; Myles, 2004; Towell & Hawkins, 1994) and others claiming that the two types of knowledge are independent of each other (Krashen & Scarcella, 1978; Schwartz, 1993).
If SPs are an example of learned knowledge, since we have clearly shown that they are not constructed online through the usual grammatical operations, their decomposition as learning progresses allows us to explore their role in the construction of the productive grammatical system. We have seen that among beginners, verbal SPs differ from the productive verbal system because they contain an inflected verb, whereas other verbs are uninflected. We have also seen that when learners begin to segment SPs and to use their verb productively, the verb always remains inflected. This contrasts with other verbs, which start out uninflected before gradually becoming inflected. It would therefore seem that the productive verbal system in some sense catches up with the more advanced grammar contained in verbal SPs. Indeed, it is not the SPs that change to accommodate the current grammar, since we never find uninflected verbs originating in SPs. This becomes even clearer in the case of interrogative SPs. These break down during learning, but always remain at a more advanced grammatical level than the rest of the learners' questions. They are not simplified to fit the productive grammar, which in the early stages contains neither inversion nor inflected verbs and would yield, for example, comment t'appeler-tu; rather, they are modified to change their communicative scope while keeping their complexity. Moreover, in our beginner corpus, it is the learners whose productive grammatical system is most advanced who have the largest stock of SPs in their repertoire and, what is more, who analyse and segment them. This runs counter to the hypothesis that SPs serve only as communicative crutches until the productive system can fulfil the same functions, and that they are abandoned when no longer needed. It seems that SPs serve as a linguistic model that facilitates the development of more complex hypotheses. At the other extreme, some learners in this corpus did not manage to memorise SPs beyond the first data collection; these learners are also the ones who made very little progress after two years, and who are still at the verbless stage where they juxtapose noun and/or prepositional phrases.

4.2 Grammatical status of prefabricated sequences

But if it seems clear that SPs play a central role in the linguistic development of the beginner learner, what is their status in the learner's emerging linguistic system? We have seen that they seem to behave like an invariable lexical unit. If so, to which syntactic category do they belong?

4.2.1 The very beginnings of learning

Let us dwell for a moment on the task facing learners at the very beginning of learning.
They are immediately faced with three essential tasks: (a) establishing a correspondence between semantic representations and new phonological sequences, that is, building a lexicon (containing semantic, syntactic, morphological and phonological information); (b) constructing new representations of how words are combined (syntactic representations); (c) learning to access these representations in real time (in perception and in production). Their priority is undoubtedly to learn a certain number of words and expressions that will make their first communicative exchanges possible. Learning words and expressions is not simple, however. Learners must learn not only how to pronounce them, but also to associate with them both a syntactic category and a syntactic frame, that is, which other syntactic categories they combine with and how. Beginner learners tend to avoid verbs at the start of learning (Housen, 2002; Lakshmanan & Selinker, 2001; Myles, 2005) because verbs are more complex to acquire; they require learning not only their phonology and morphology but also their argument structure. One has to know how to realise the relations they hold with the other elements of the sentence, such as the subject and any complements, which is complex.


What I would like to suggest is that, at an initial stage, learners establish an approximate correspondence between a semantic representation and a phonological sequence, rather like children learning their first language who overgeneralise the meaning of words by fixing on one semantic feature of a word, which they then use for all objects sharing that feature, for example calling all animals chat ('cat'). Among L2 learners, this correspondence is also very approximate at first, given the limits of their lexical repertoire, and they will try to find the word or expression closest to their communicative need of the moment. For example, the semantic representation [ask name] will initially have only one phonological correspondence: [comment t'appelles-tu ?], just as [give name] = [je m'appelle]. At this stage, learners have not yet assigned a syntactic representation to this sequence. When they need to ask a boy's name, for example, they simply juxtapose the semantic representations in their repertoire that are closest to this goal: [ask name] + [boy] = [comment t'appelles-tu][le garçon]. The fact that L2 learners overgeneralise longer sequences than L1 children is probably due to their greater memory capacity. If L2 learners do not seem to go through the two-word stage typical of L1 acquisition, even though their first productions also seem to be morphosyntactically deficient, it is probably because their lexical units may appear multimorphemic when in fact they are not really so. If [comment t'appelles-tu ?] is in fact a single underspecified lexical unit, a production such as [comment t'appelles-tu][le garçon] is perhaps not really different syntactically from the productions typical of the two-word stage in children.
The acquisition of syntax, and of the online processing of linguistic information needed for reception and production, is considerably more complex and slower, and far less able to benefit from shortcuts or rote learning. L2 learners therefore resort to memorising SPs, which serve not only to let them communicate before their productive linguistic system is able to do so, but also to give the impression of a much greater command of the language than they really have, which can be very useful in examinations! The fact that a semantic representation does not necessarily correspond to a single word or morpheme is well recognised by semantic theorists; in fact, it is probably the only thing on which they all agree, according to Jackendoff (2002: 123-4), given here in translation 20:

[...] they are both [formal semantics and cognitive grammar] theories of meaning as a sophisticated combinatorial system. The units of this system are, in neither of them, nouns and verbs; they are entities such as individuals, predicates, variables and quantifiers. [...] Meaning therefore has an inventory of basic units and of means of combining them that is as different from syntax as syntax is from phonology.

Applied to our learning context, when the learner has to ask the boy's name, for example, the required semantic units can be described as follows: [Q(uestion); name; boy]. When one of our learners produces, in the same exchange, nom le garçon ? and comment t'appelles-tu le garçon ?, it would seem that the same semantic structure is realised in two different ways:
20 My translation.

[Q] → intonation; [name] → nom; [boy] → le garçon
[Q + name] → comment t'appelles-tu; [boy] → le garçon

At this first stage, we are dealing with a correspondence between the semantic representation and the rudimentary lexicon, with no syntactic structure. It is evident in these two examples that we cannot assign them a syntactic structure: the first has no verb, and the second would appear to have a verb with two subjects, tu and le garçon, whereas we know from the context that the subject can only be le garçon. In this analysis, SPs are sequences that contain more than one semantic unit, in our example [Q + name].

5. CONCLUSION

The starting point for L2 learners is to try to communicate with whatever means are at hand. To do so, they must first establish correspondences between semantics and phonological sequences, which they do by projecting lexical projections only, of varying sizes and often multimorphemic, as in the case of SPs. In a second step, they must assign syntactic specifications to these sequences (words or SPs). This takes time, and at the outset it is common for their productions to have a semantic but not yet a syntactic representation, as in the case of the SPs studied here. SPs generally contain verbs, which are more complex to acquire because of their architectural role in sentence syntax and their complex morphology. These SPs are segmented in the course of learning, and serve as a linguistic model for the learner in constructing the productive grammatical system. They enable the learner to communicate despite limited linguistic means, and to appear to have a more complex, fluent and accurate linguistic system than is really the case. When the SP is not appropriate to the context, the learner resorts to several strategies to get the message across, and begins to segment it. It is this tension between SPs that are complex and fluent but not yet analysed into their constituents (an analysis that would allow learners to modify them to meet the needs of the moment) and the emerging productive system that drives the learner forward.

REFERENCES

Hakuta, K. (1974). Prefabricated patterns and the emergence of structure in second language acquisition. Language Learning 24, 287-297.
Hakuta, K. (1976). A case study of a Japanese child learning English as a second language. Language Learning 26, 321-351.
Housen, A. (2002). A corpus-based study of the L2 acquisition of the English verb system. In S. Granger, J. Hung & S. Petch-Tyson (Eds.), Computer learner corpora, second language acquisition and foreign language learning (pp. 77-116). Amsterdam: John Benjamins.
Jackendoff, R. (2002). Foundations of language: Brain, meaning, grammar, evolution. New York: Oxford University Press.
Krashen, S., & Scarcella, R. (1978). On routines and patterns in language acquisition and performance. Language Learning 28, 283-300.
Lakshmanan, U., & Selinker, L. (2001). Analysing interlanguage: how do we know what learners know? Second Language Research 17(4), 393-420.
Myles, F. (2004). From data to theory: the over-representation of linguistic knowledge in SLA. In R. Towell & R. Hawkins (Eds.), Empirical evidence and theories of representation in current research in Second Language Acquisition. Transactions of the Philological Society.
Myles, F. (2005). The emergence of morpho-syntactic structure in French L2. In J.-M. Dewaele (Ed.), Focus on French as a foreign language: Multidisciplinary approaches. Clevedon: Multilingual Matters.
Myles, F., Hooper, J., & Mitchell, R. (1998). Rote or rule? Exploring the role of formulaic language in classroom foreign language learning. Language Learning 48(3), 323-363.
Myles, F., Mitchell, R., & Hooper, J. (1999). Interrogative chunks in French L2: A basis for creative construction? Studies in Second Language Acquisition 21(1), 49-80.
Pawley, A., & Syder, F. (1983). Two puzzles for linguistic theory: nativelike selection and nativelike fluency. In J. Richards & J. Schmidt (Eds.), Language and communication (pp. 191-266). London: Longman.
Raupach, M. (1984). Formulae in second language speech production. In D. Dechert, D. Möhle & M. Raupach (Eds.), Second language production (pp. 114-137). Tübingen: Gunter Narr.
Schwartz, B. (1993). On explicit and negative data effecting and affecting competence and linguistic behavior. Studies in Second Language Acquisition 15, 147-163.
Skiba, R., & Dittmar, N. (1992). Pragmatic, semantic, and syntactic constraints and grammaticalization. Studies in Second Language Acquisition 14, 323-349.
Towell, R. (1987). A discussion of the psycholinguistic bases for communicative language teaching in a foreign language teaching situation. British Journal of Language Teaching 25, 91-101.
Towell, R., & Hawkins, R. (1994). Approaches to second language acquisition. Clevedon: Multilingual Matters.
Vihman, M. (1982). Formulas in first and second language acquisition. In L. Obler & L. Menn (Eds.), Exceptional language and linguistics (pp. 261-284). New York: Academic Press.
Weinert, R. (1995). The role of formulaic language in second language acquisition: a review. Applied Linguistics 16, 180-205.
Wong-Fillmore, L. (1976). The second time around: cognitive and social strategies in second language acquisition. Unpublished doctoral dissertation, Stanford University, Stanford, CA.
Wray, A. (2002). Formulaic language and the lexicon. Cambridge: Cambridge University Press.


BILINGUAL READING: AN ESSENTIAL FACTOR FOR THE ACQUISITION OF WRITTEN COMPETENCE IN A THIRD LANGUAGE.

Helena Roquet Pugès & Carme Pérez Vidal
Universidad Pompeu Fabra, Spain

1. INTRODUCTION
Abundant research has shown that bilingual and multilingual subjects enjoy many advantages over monolinguals when acquiring a new language (Peal & Lambert, 1962; Cummins, 1976; Ben-Zeev, 1977; Ricciardelli, 1992; Cenoz & Genesee, 1998, among others). For this reason, experts in the field are now focusing on identifying these advantages and their specific characteristics (Klein, 1995; González, 1997; Sanz, 1997; Sanz, 2000, among others). Some of these studies make special reference to the degree of competence that bilinguals should have in their L1 and L2 in order to achieve better results in the acquisition of an L3 (Cummins, 1976; Bild & Swain, 1989; Swain, Lapkin, Rowen & Hart, 1990; Cenoz, 1992; Cenoz & Valencia, 1994; Lasagabaster, 1997; Sanz, 2000). Such research has motivated the present study, which seeks to document a relationship between levels of competence in Catalan and Spanish and the results obtained in an L3 (English), using data from Catalan students in their second year of middle school (ESO) or final year of high school (bachillerato). We observe that the level of balance between the L1 and L2 of these bilingual subjects affects their writing of English texts. Following the measures for the analysis of EFL writing proposed by Navés, Torras and Celaya (2003), we analyze syntactic and lexical complexity, accuracy and fluency in the subjects' written production in English, as well as in Catalan and Spanish. In this way we are able to determine profiles of written linguistic competence in the three languages. If the results confirm our hypotheses, this study could serve to encourage the development of balanced L1 and L2 reading habits among students in bilingual communities, as a means of improving their acquisition of new languages.

2. OBJECTIVES AND HYPOTHESES
Taking into account the results of a preliminary study on balance in bilingualism and its consequences for texts written in an L3 (Roquet, 2003), the primary objective of the present study is to measure the effect of our bilingual subjects' degrees of balance and threshold levels on their written L3 competence. Accordingly, it also investigates how the language or languages in which the subjects read may determine their degrees of balance and threshold levels.

Hypotheses


1. Those bilingual subjects who regularly read in Catalan and Spanish will show a greater tendency to overcome threshold levels in these languages (as demonstrated by their written production) than those subjects who regularly read in only one of the two languages.
2. Those bilingual subjects who regularly read in Catalan and Spanish will tend to reach higher levels of balance in the two languages (as demonstrated by their written production) than those subjects who regularly read in only one of the two languages.
3. Those bilingual subjects who regularly read in Catalan and Spanish and who, therefore, show a greater tendency to overcome threshold levels and achieve balance in these languages (as demonstrated by their written production) will, in general, reach higher levels of competence in their written production of L3 English (as quantified by syntactic and lexical complexity, accuracy, and degree of fluency).

3. METHODOLOGY
3.1 Approach
Our study measures levels of competence in English, Catalan, and Spanish with regard to the syntactic and lexical complexity, accuracy, and fluency observed in written samples, following the measurement matrix proposed by Pérez Vidal, Torras & Celaya (2000). The results for Catalan and Spanish are needed to measure the level of balance and to establish whether the threshold levels have been overcome. These data are then related to the profiles of English competence to verify whether threshold levels and balance in Catalan and Spanish correspond to different L3 levels.
3.2 Subjects
The sample for this study consists of 58 subjects between 13 and 17 years of age; all of these students are in their second year of ESO (middle school) or second (and final) year of bachillerato (high school).
Table 1. Distribution of the sample according to school level and geographic location (Source: Primary analysis).
Town                     2nd-year ESO (middle school)   2nd-year bachillerato (high school)
Cornellà de Llobregat    22                             10
Figueras                 19                             7

Although all subjects in the sample can speak both Catalan and Spanish, it was assumed that the majority of students from Cornellà de Llobregat, a town in the Barcelona metropolitan area, would have Spanish as their L1, since the area has been populated primarily by Spanish-speaking families since the 1950s and 1960s. In contrast, it was expected that the majority of subjects from Figueras would have Catalan as their L1, as this town is situated in the Alt Empordà area, where Catalan is the dominant language. As a result of these geographical differences, we obtained a highly varied sample with regard to the students' linguistic profiles.
3.3 Analysis
Each of the 58 subjects was administered a sociolinguistic questionnaire as well as written tests in Catalan, Spanish, and English (essays on specific topics). In this way we obtained 12 primary variables:
Table 2. Variables (Source: Primary analysis).
1. Level of English
2. Gender
3. Age
4. L1
5. Immediate environment language
6. Habitual language of reading
7. Habitual language of writing
8. Extracurricular L3
9. Motivation
10. Balance in written texts
11. Catalan threshold
12. Spanish threshold

The writing samples were then evaluated with a matrix of our own design, based on the measurement matrix proposed by Pérez-Vidal, Torras & Celaya (2000).
Table 3. Matrix to determine the profile of written linguistic competence (Source: Adapted from Pérez-Vidal, Torras & Celaya, 2000). The measures are grouped under four headings: syntactic complexity, lexical complexity, accuracy and fluency.
- Coordination index
- Nodes per sentence
- Auxiliary verbs per clause
- Discursive markers per clause
- Number of paragraphs per minute
- Lexical density
- Adverbs per clause
- Lexical verbs per clause
- Errors per clause
- % of error-free clauses
- Total number of clauses
- Total number of words
- Words per clause
- Total number of sentences
- Clauses per sentence
- Words per sentence
- Words per minute
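Several of the measures in Table 3 are simple ratios of raw counts. The following is a minimal sketch of how such ratios could be derived once an essay has been segmented and its errors counted by hand; the class, field and key names are hypothetical and the sample counts are invented for illustration, not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class TextCounts:
    """Raw counts for one essay (all field names are hypothetical)."""
    words: int
    clauses: int
    sentences: int
    errors: int
    error_free_clauses: int
    minutes: float  # time spent writing


def ratio_measures(c: TextCounts) -> dict:
    """Derive the ratio measures of Table 3 that follow from raw counts."""
    return {
        "words_per_clause": c.words / c.clauses,
        "clauses_per_sentence": c.clauses / c.sentences,
        "words_per_sentence": c.words / c.sentences,
        "words_per_minute": c.words / c.minutes,
        "errors_per_clause": c.errors / c.clauses,
        "pct_error_free_clauses": 100 * c.error_free_clauses / c.clauses,
    }


# Invented counts, for illustration only.
sample = TextCounts(words=120, clauses=20, sentences=10,
                    errors=8, error_free_clauses=14, minutes=15.0)
measures = ratio_measures(sample)
```

Measures such as the coordination index or lexical density need tagged text rather than plain counts, so they are left out of this sketch.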

4. RESULTS
(This article uses the data from the 13-year-old middle-school students.)
Graph 1 shows that more subjects from the group that read in both languages achieved balanced bilingualism with high threshold levels. The observed difference between subjects from the two groups is statistically significant (ANOVA p<0.05).
[Graph 1: bar chart, number of subjects (0 to 16). Bilingual reading (n = 21): 14 upper threshold & balanced, 7 lower threshold / not balanced. Monolingual reading (n = 20): 5 upper threshold & balanced, 15 lower threshold / not balanced.]


Graph 1. Threshold level and balance according to habitual reading language (Source: Primary analysis).

Graph 2 shows how those subjects with balanced bilingualism and high threshold levels obtained better results in their L3 English. The difference is significant. They were followed by those subjects who read in both languages but did not overcome the threshold level and were not balanced, and by those who read in only one language.
[Graph 2: plot of English level (0 to 10) for three groups: bilingual reading, above threshold and balanced (N = 14); bilingual reading (N = 7); monolingual reading (N = 20).]
Graph 2. Level of English according to habitual reading language, threshold level, and balance (Source: Primary analysis).

In graph 3 we see how those bilingual subjects who read in both languages obtained better results than those who read more (or only) in one of the two languages, for all the measures of fluency, accuracy, and complexity. The ANOVA analysis yields a significant difference: p<0.05. The level of accuracy was the measure for which those bilinguals who read in both languages enjoyed the greatest advantages.
[Graph 3: bar chart of English level (0 to 6.5) for fluency, accuracy, syntactic complexity and lexical complexity, by habitual reading language (Spanish, Catalan, bilingual).]
Graph 3. Level of fluency, accuracy, and complexity in English, according to habitual reading language (Source: Primary analysis).

5. DISCUSSION AND CONCLUSION
First, we can conclude that the 13-year-old subjects in our sample who regularly read in both Catalan and Spanish were more likely to overcome the threshold levels in written competence in these languages than those who habitually read in only one of the two. With regard to the degree of balance, the results for our second hypothesis show that the 13-year-old subjects who habitually read in Catalan and Spanish also achieved greater degrees of balance in these languages than those who habitually read in only one of the two. Finally, in relation to our third hypothesis, the results allow us to conclude that our 13-year-old bilingual subjects who read habitually in Catalan and Spanish overcame the threshold effect in these languages and had balanced levels in the two languages. Unlike the first two hypotheses, in which these factors were observed in isolation, this hypothesis allows us to confirm that there are more subjects with bilingual reading habits who overcome the threshold level and become balanced than subjects with bilingual reading habits who lack these characteristics. In addition, our results show that those subjects who regularly read in both Catalan and Spanish and who, as a result, overcame the threshold levels and were balanced in these languages tended to reach higher levels of competence in their L3 English than those subjects who habitually read in only Catalan or Spanish. In summary, we can conclude that the Catalan and Spanish reading habits of our 13-year-old bilingual subjects were a determining factor in their reaching higher levels of competence in written L3 English. This held independently of gender, motivation and the language spoken at home with the family. Our data confirm the results of other studies relating bilingual reading habits to higher levels of L3 competence.
Our findings indicate that subjects with bilingual reading habits tend to overcome threshold levels and, at the same time, reach balanced levels in their two languages. There are various psycholinguistic reasons that may explain why bilingualism, and especially balanced bilingualism, provides an advantage for the acquisition of an L3. In particular, it is helpful to invoke the hypothesis of interdependent development proposed by Cummins (1979), as it provides the groundwork for many of these explanations. According to this hypothesis, the linguistic abilities developed in each language form a common underlying competence that consists of the deep linguistic knowledge the individual possesses on the basis of the languages they know. Thus, when bilingual subjects learn an L3, they apply their knowledge of the two systems they already possess to approach the new linguistic code; the common underlying competence provides them with a set of references that helps them manage and appropriately use the new language. Logically, the greater and more balanced their prior knowledge is, the more effectively they will be able to apply it to the new code.

REFERENCES
Baetens Beardsmore, H. (1989). Principis bàsics del bilingüisme. Barcelona: Ediciones de la Magrana.
Ben-Zeev, S. (1977). The influence of bilingualism on cognitive strategy and cognitive development. Child Development 48, 1009-1018.
Bild, E.R. & Swain, M. (1989). Minority language students in a French immersion programme: Their French proficiency. Journal of Multilingual and Multicultural Development 10, 255-274.
Celaya, M.L., Pérez, C. & Torras, M.R. (2001). Matriz de criterios de medición para la determinación del perfil de competencia lingüística escrita en inglés lengua extranjera. RESLA 14, 87-89.
Cenoz, J. & Valencia, J.F. (1994). Additive trilingualism: Evidence from the Basque Country. Applied Psycholinguistics 15, 195-207. USA: Cambridge University Press.
Cenoz, J. & Genesee, F. (1998). Beyond bilingualism. Multilingualism and multilingual education. Clevedon: Multilingual Matters Ltd.
Cenoz, J. (1998). Multilingual education in the Basque Country. Beyond bilingualism. Multilingualism and multilingual education. Clevedon: Multilingual Matters Ltd.
Cook, V.J. (1995). Multi-competence and the learning of many languages. Language, Culture and Curriculum 8, 93-98.
Cummins, J. (1976). The influence of bilingualism on cognitive growth: A synthesis of research findings and explanatory hypothesis. Working Papers on Bilingualism 9, 1-43.
Cummins, J. (1979). Linguistic interdependence and the educational development of bilingual children. Review of Educational Research 49, 222-251.
Dewaele, J., Housen, A. & Wei, L. (2003). Bilingualism: Beyond basic principles. Clevedon: Multilingual Matters Ltd.
Díaz, L. & Pérez, C. (Eds.) (1997). Views on the acquisition and use of a second language. Barcelona: Universitat Pompeu Fabra.
Genesee, F. (1988). A case study of multilingual education in Canada. Beyond bilingualism. Multilingualism and multilingual education. Clevedon: Multilingual Matters Ltd.
González, P. (1997). Learning a second language in a third language environment. Eurosla: Views on the acquisition and use of a second language. Barcelona: Universitat Pompeu Fabra.
Hakuta, K. & Diaz, R. (1984). The relationship between degree of bilingualism and cognitive ability: A critical discussion and some new longitudinal data. In K.E. Nelson (Ed.), Children's language 5. Hillsdale, NJ: Lawrence Erlbaum Associates.
Hamers, J. & Blanc, M. (1989). Bilinguality and Bilingualism. Cambridge: Cambridge University Press.
Herdina, P. & Jessner, U. (2002). A dynamic model of multilingualism. Clevedon: Multilingual Matters Ltd.
Klein, E.C. (1995). Second versus third language acquisition: Is there a difference? Language Learning 45.
Lambert, W.E. (1981). Bilingualism and language acquisition. Native language and foreign language acquisition (pp. 9-22). New York: The New York Academy of Sciences.
Lasagabaster, D. (1997). Creatividad y conciencia metalingüística: incidencia en el aprendizaje del inglés como L3. Vitoria-Gasteiz: Servicio editorial de la Universidad del País Vasco.
Leki, I. (2000). Writing, literacy and applied linguistics. Annual Review of Applied Linguistics 20, 99-115.
Liceras, J. (1991). La adquisición de las lenguas extranjeras. Madrid: Visor.
Long, M. (1991). Measuring classroom language change. University of Hawaii at Manoa. Manuscript.
Miralpeix, I. & Navés, T. (2003). The influence of language dominance and language proficiency on L3 writing performance. Third International Conference on Third Language Acquisition and Trilingualism. Tralee, Eire.
Muñoz, C. (2000). Segundas lenguas. Barcelona: Ariel.
Navés, T., Torras, M.R. & Celaya, M.L. (2003). Long-term effects of an earlier start. An analysis of EFL written production. Eurosla Yearbook 3 (pp. 103-129). John Benjamins Publishing Company.
Peal, E. & Lambert, W.E. (1962). The relation of bilingualism to intelligence. Psychological Monographs 76.
Pérez-Vidal, C., Torras, M.R. & Celaya, M.L. (2000). Age and EFL written performance by Catalan/Spanish bilinguals. Spanish Applied Psycholinguistics 4(2), 267-290. University of Illinois.
Polio, Ch.G. (1997). Measures of linguistic accuracy in second language writing research. Language Learning 7(1), 101-13.
Ricciardelli, L.A. (1992). Bilingualism and cognitive development in relation to threshold theory. Journal of Psycholinguistic Research 21, 301-316.
Roquet, H. (2003). Efectes de l'equilibri en el bilingüisme sobre els textos escrits en una tercera llengua. Treball programa doctorat Bel, A., Díaz, L., i Pérez, C. Universitat Pompeu Fabra.
Sanz, C. (1997). L3 acquisition and the cognitive advantages of bilingualism: Catalans learning English. Eurosla: Views on the acquisition and use of a second language. Barcelona: Universitat Pompeu Fabra.
Sanz, C. (2000). Bilingual education enhances third language acquisition: Evidence from Catalonia. Applied Psycholinguistics 21, 23-44. USA: Cambridge University Press.
Skehan, P. (1989). Individual differences in second-language learning. London: Edward Arnold.
Swain, M., Lapkin, S., Rowen, N. & Hart, D. (1990). The role of mother tongue literacy in third language learning. Language, Culture and Curriculum 3(1).
Wolfe-Quintero, K., Inagaki, S. & Kim, H.Y. (1998). Second language development in writing: Measures of fluency, accuracy and complexity. Hawaii: University of Hawaii at Manoa.
Yugo, C. (2002). Efectos del equilibrio en bilingüismo sobre el procesamiento del vocabulario en una tercera lengua. Trabajo académico. Barcelona: Universitat Pompeu Fabra.


DEVELOPMENTAL STAGES IN FRENCH: HISTORICAL AND FUTURE PERSPECTIVES
Suzanne Schlyter, Jonas Granfeldt & Malin Ågren
Lund University, Sweden

1. HISTORICAL PERSPECTIVES
1.1 Profiles
The aim of this colloquium is to evaluate foreign language learners' production with respect to complexity, fluency and accuracy. Alongside these well-known measures, however, there is another tradition of assessment, the one we draw on here: the profile analysis of children or adult learners. This tradition goes back to Crystal, Fletcher and Garman (1976) and to Clahsen (1985), whose analyses are based on children's developmental sequences and stages. Clahsen's profile analyses (Profilanalyse) concern the first language (L1) acquisition of German-speaking children (Clahsen, 1986). Clahsen uses a large number of different grammatical criteria that have proved to be stable indicators of development. The same principle has also been proposed for the analysis of a second language: Clahsen (1985) proposed assessing the developmental level of the German of adult immigrants, learners for whom recordings of spontaneous speech were available but no tests.

In the DuFDE project (Deutsch und Französisch, Doppelter Erstsprachenerwerb; Hamburg, 1985-90), directed by Jürgen M. Meisel, Clahsen's (1986) Profilanalyse was used to assess the developmental level of the German and the French of bilingual children and to measure the balance between their two languages. The profile was partially adapted to L1 French by Teresa Parodi and Suzanne Schlyter, who were working in the project at the time (cf. Schlyter, 1988). Manfred Pienemann, who worked with Clahsen in the ZISA project (cf. Clahsen, Meisel & Pienemann, 1983), developed this idea of a profile in order to establish the developmental levels of German-speaking and English-speaking adult learners. His work led to a number of publications on developmental stages and ultimately to the already classic work on Processability Theory (Pienemann, 1998). The stages he proposes are based on developmental sequences studied across a large number of syntactic and morphological phenomena. Processability Theory is put forward to explain this general development.
The value of using these stages for assessment purposes was recognized very early (Bachman & Cohen, 1998). In a discussion of rating scales in L2 acquisition, Brindley (1998:130) quotes de Jong: "What we need to know if we want to develop good scales is not linguistic knowledge of how language is structured, what all the features of language are; we need to know how somebody acquires language, that is, what the developmental stages in language acquisition are". Pursuing the idea that developmental stages can serve as a basis for assessing L2 learners' free production, Pienemann developed a semi-computerized tool, Rapid Profile (Pienemann & Mackey, 1992). This tool was an important source of inspiration for our work, in particular for the Direkt Profil software, which is presented below.
1.2 Developmental stages in French
There is thus considerable practical and pedagogical interest in establishing developmental stages for several languages. While Pienemann and his colleagues have proposed stages for the acquisition of several languages, notably German, English, Swedish, Arabic and Italian, they have not (yet) done so for French. Other proposals exist, however, in other frameworks. Within the ESF programme (Klein & Perdue, 1997), three levels are proposed for untutored learners: the Pre-basic, Basic and Post-basic Varieties. These levels apply to the European languages studied in the ESF programme, French included, and are based on morphology, syntax, semantics and pragmatics. Being valid for several languages, these stages are necessarily less precise than those proposed within Processability Theory, and they cover only the beginning of acquisition, not the most advanced levels. The advanced levels have instead been studied by Bartning (1997; to appear), who proposes later stages.
Among the developmental stages that also cover French, mention should be made of the Common European Framework of Reference (Cadre Européen Commun de Référence), although it is grounded in pragmatics rather than morphosyntax. This document is excellent for steering language teachers towards more communicative teaching and away from the "grammar and translation" tradition. However, the framework is not always easy to use precisely as an assessment document, and there is a felt need to complement it with more exact assessments, for example of the morphosyntactic developmental-stage type.
This is why Inge Bartning and Suzanne Schlyter felt the need to synthesize their respective earlier work on the acquisition of various linguistic phenomena in French, with the aim of establishing morphosyntactic developmental stages for L2 French (Bartning & Schlyter, 2004). Two different corpora of Swedish-speaking adults learning French already existed: the InterFra corpus in Stockholm (Bartning; tutored and semi-tutored learners) and the Lund corpus (Schlyter; tutored and untutored learners), and studies had been carried out fairly independently of one another in the two projects. It was therefore possible to compare the results and so arrive at more general findings. The outcome is summarized in Bartning & Schlyter (2004), henceforth B&S, and in other publications (see Sanell, 2007, for an exhaustive bibliography).
1.3 Elaboration of the stages
In elaborating these stages, we started from our own studies, or those of our two teams, on the specific developmental sequences of certain phenomena, such as tense (Schlyter, 1996), verb agreement (Bartning, 1998), gender agreement (Bartning, 2000; Granfeldt, 2003), subject and object clitic pronouns (Granfeldt & Schlyter, 2004), etc. We filled the gaps with specific studies (published later), concerning for example the use of non-finite forms in finite contexts (Schlyter & Bartning, 2005) or negation (Sanell, 2007).
We had thus established, for example, the following sequences:
Tense: Present > PC > Fut Pr > Impf > Fut S > Ppf > Cond > Subj (Schlyter, 1996; Kihlstedt, 1998, etc.), where PC = passé composé, Fut Pr = futur proche, Impf = imparfait, Fut S = futur simple, Ppf = plus-que-parfait.
Object pronouns: omission > postposition > intermediate position > pre-auxiliary position
   Je voir _ > je voir lui > je vais le voir / *j'ai le vu > je l'ai vu (Granfeldt & Schlyter, 2004)
Negation: Neg X > uncertain placement > (ne) V pas > (ne) V rien/jamais > negated subject
   *Je non parler > *je ne comprends > je V pas > personne ne V (Sanell, 2007)
Subject-verb agreement: no agreement > agreement on singular auxiliaries/modals > agreement on lexical verbs, 1st person plural > agreement on auxiliaries, 3rd person plural > agreement on lexical verbs, 3rd person plural
   *je parler > je suis/il est > nous V-ons > ils sont > ils prennent (Schlyter & Bartning, 2005)
On the basis of these studies of specific phenomena, we grouped the data with the aim of observing relatively stable stages, at least for certain phenomena. It turned out, for example, that when learners used object clitics of the type je voir lui, this structure correlated with the use of a large number of non-finite forms and extremely little subject-verb agreement (> Stage 2). Conversely, the structure je l'ai vu, where the pronoun is placed before the temporal auxiliary, correlated with practically perfect subject-verb agreement and with the appearance of the pluperfect and other complex tense forms (> Stage 4). Such relations were the basis of our stages, whereas other phenomena, which were not linked to development, were excluded. Table 1 below, based on a subset of the recordings, illustrates this procedure: for each recording of a specific learner, we indicated emergence (+/-) and productive use (75% of occurrences marked), which made it possible to organize the (longitudinal and cross-sectional) data in an implicational scale, suggesting a developmental continuum. It seemed possible to identify clusters of developmental features that formed the cores of the six developmental stages we proposed: Stage 1 (initial); Stage 2 (post-initial); Stage 3 (intermediate); Stage 4 (low advanced); Stage 5 (mid advanced); Stage 6 (high advanced).
Table 1. Summary of the emergence and use of forms in the two corpora.
[Table 1: implicational scale. Rows (phenomena): PC; (ne) V pas; Subordin; Mod+Infin; Impf; Forme finie-c; Nous V-ons; FutP; voudr; (ne) V rien; 3ppl ils ont; Pr obj; Pqpf; FSmp; Cond; 3ppl Vlx-nt; Subj; rien ne V; géron; dont; Stade. Columns: individual recordings of learners from the two corpora, ordered from Stage 2 to Stages 5-6. Cells: -, -+ or +.]

Legend: Underlined learners: Lund corpus; bold: untutored. If the value of a feature differs within a cell, the value on the second line concerns the Lund corpus.
Abbreviations:
- voudr = (je) voudrais used as an all-purpose formula (not counted as a conditional);
- (ne) V rien: also includes ne V jamais, ne V personne;
- rien ne V: also includes personne ne V (negated subject).
Values, general case: - = no occurrences; -+ = the form appears; + = from about 75%.
Specific cases:
- Forme finie-c (short finite forms): - = 40-65%; -+ = 65-90%; + = >90%;
- Pr obj (object pronouns): - = *svo; -+ = s(v)oV or variable positions; + = sovV (before the auxiliary); recorded only for the Lund corpus;
- (ne) V pas (negation): - = no occurrences; -+ = correct and incorrect structures.
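The general-case value coding in the legend, together with the idea of an implicational scale, can be illustrated with a short sketch. It assumes that a feature is scored by counting marked occurrences out of obligatory contexts, and it reads "implicational" in one simple way: a form later in the tense sequence should never be more established than an earlier one. The function names, the English tense labels and the two learner profiles are hypothetical; they are not data from the corpora.

```python
# Tense sequence quoted in the text, from earliest to latest acquired.
TENSE_ORDER = ["Present", "PC", "FutPr", "Impf", "FutS", "Ppf", "Cond", "Subj"]

# Rank of each cell value; a higher rank means a more established form.
RANK = {"-": 0, "-+": 1, "+": 2}


def code_value(marked: int, contexts: int) -> str:
    """General case of the legend: '-' no occurrences, '-+' the form
    appears, '+' from about 75% marked occurrences (assumed ratio)."""
    if marked == 0:
        return "-"
    if contexts > 0 and marked / contexts >= 0.75:
        return "+"
    return "-+"


def is_implicational(profile: dict) -> bool:
    """True if values never rise again along the acquisition order."""
    ranks = [RANK[profile[feature]] for feature in TENSE_ORDER]
    return all(earlier >= later for earlier, later in zip(ranks, ranks[1:]))


# Hypothetical profiles, for illustration only.
learner_a = dict(zip(TENSE_ORDER, ["+", "+", "-+", "-+", "-", "-", "-", "-"]))
learner_b = dict(zip(TENSE_ORDER, ["+", "+", "-+", "-+", "-", "-", "+", "-"]))

fits_a = is_implicational(learner_a)  # consistent with the sequence
fits_b = is_implicational(learner_b)  # Cond established before FutS and Ppf
```

Recordings whose profiles fit the ordering can then be sorted by how far along the sequence they reach, which is what gives the table its staircase shape.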

The six stages we propose are as follows (for more details, see B&S 2004, Schlyter 2003, Bartning to appear):
Stage 1, initial: There is very little verb morphology. Learners use many non-finite forms in finite contexts (and vice versa), that is, they "speak in the infinitive" (je manger/). Often they use only noun or adjective phrases; negation mostly precedes these (non grand-lit); grammatical morphology is not entirely absent, since the definite and indefinite articles and the subject pronoun je are generally found, although very often stressed and not elided.
Stage 2, post-initial: verbal grammatical morphology begins to be used (PC, Mod+Inf), but tense or subject-verb agreement marking is still often missing; subordination appears here; negation begins to appear after the finite verb but is still placed in erroneous positions; object pronouns are postposed (*je voir le); prepositions are often not contracted: *à le, *de le, *au le (Granfeldt, 2003).
Stage 3, intermediate: verb morphology is fairly stable (PC, Mod+Inf, nous V-ons agreement), but many incorrect forms remain (non-finite forms, singular for plural, etc.); negation (ne V pas) is used as in the target language; object pronouns very often occur in the intermediate position (*j'ai le vu).
Stage 4, low advanced: non-finite forms in finite contexts have practically disappeared; complex verb forms (Ppf, Cond) appear but are not yet used correctly; more complex negations (with rien, jamais, etc.) appear; object pronouns are now clearly clitics, placed before the finite verb (je l'ai vu); preposition + article contraction (du, des, aux) works fairly well for several of the learners (Granfeldt, 2003).
Stage 5, mid advanced: complex verb forms are now used fairly correctly, and the subjunctive becomes more productive; 3rd person plural agreement is made with verbs in -ont (ils ont/sont/vont) but is often missing with other verbs (*les Xs vien); en and y appear; gerund-type packaging begins, and relative clauses introduced by dont appear.
Stage 6, high advanced: inflectional morphology is stable, even in multi-clause utterances. Only now is the subjunctive used as by native speakers, and 3rd person plural agreement (ils prennent) made correctly. A high degree of packaging, ellipsis and clause integration is observed.
It is nevertheless clear that the six stages proposed here are still hypothetical. We hope that later studies on richer written corpora (cf. Granfeldt & Nugues, 2007) will be able to verify and refine our results. We propose these stages only for adult Swedish-speaking learners of L2 French, and for spoken language, but they are probably valid for learners of French from all Scandinavian language backgrounds and, to some extent, from other Germanic languages.

2. FUTURE PERSPECTIVES
2.1 Relation between the stages and interaction?
If learners are assessed according to morphosyntactic developmental stages, the question obviously arises of how far these stages go hand in hand with stages defined on pragmatic grounds, such as those of the Common European Framework of Reference (CEFR). Given that B&S arrived at six stages, from the very beginning of acquisition up to a very advanced level, a comparison with the six CEFR levels A1 to C2 seems possible. The basic hypothesis would be that the stages in the two scales correspond one to one: 1 = A1; 2 = A2; 3 = B1; 4 = B2; 5 = C1; 6 = C2.
As a first step, we asked to what extent there could be a relation between morphosyntactic development according to B&S and the criteria of the CEFR levels for spoken language. In the CEFR grid for assessing spoken language (Table 3 of the CEFR), the following factors are mentioned: range, accuracy, fluency, interaction and coherence. The accuracy factor represents grammatical development, but compared with the B&S stages its criteria are very loosely specified, formulated in general terms such as "control", "accuracy", "simple structures", etc.
In our view, range could be measured with various automatic vocabulary measures, fluency by means of pauses, and coherence possibly through a study of connectives, all potentially by computer, whereas interaction has to be assessed through personal judgments. We also already had a study of these learners' interaction, partly combined with grammatical development (Bozier, 2005), which could serve as a basis for such a comparison. This is why, in a small pilot study (Schlyter & Bozier, 2006, on the Lund corpus), the B&S morphosyntactic stage of 5 learners was compared with their interactional ability according to this framework. Two independent judges rated the learners' interactional level according to the framework's criteria in 14 recordings. These are 2 to 3 successive recordings, made at intervals of 1 or 2 months, of 5 learners: Lisa, Sama and Nina (tutored learners with no stay in a French-speaking country), and Martin and Karl (untutored learners who acquired French in France without instruction).
Figure 1 below shows the grammatical stage as established in B&S (1st column) and the ratings of the two judges (columns 2 and 3). In these columns, the figures correspond not only to the B&S stages but also to the CEFR levels, following the principle proposed above. The result suggests, on the one hand, individual differences: the learner Lisa interacts competently, while her morphosyntactic level does not develop much; Nina, a typical tutored learner, conversely has a highly developed morphosyntactic level (according to B&S), especially relative to her rather low interactional competence, which nevertheless develops over time (see Bozier, 2005, for more details). On the other hand, fairly parallel developments in the two competences are observed in the two untutored learners, Martin and Karl. A more thorough study could show to what extent there is a difference between tutored and untutored learners, or whether there are only individual differences.

[Figure 1: bar chart "Grammar and interaction"; vertical axis 0 to 6; series: grammatical stage, interaction rating 1, interaction rating 2; recordings: Lisa 1, 3, 4; Sama 1, 2, 4; Nina 1, 3, 6; Martin 1, 2; Karl 1, 3, 5.]
Fig. 1. Grammatical stage and placement in the interaction stages for the L2 learners.

2.2 Stages of development in writing

After several decades of work centred on the spoken language of Swedish-speaking learners of French, our team has begun to take an interest in the written production of this same group of learners. In order to describe and explain the development of morpho-syntax in written production, we created the CEFLE Corpus (Corpus Écrit de Français Langue Étrangère). This corpus comprises about 400 texts written by learners of French at upper secondary school (lycée) in Sweden and by a French control group. The material ranges from beginners to advanced learners (aged between 16 and 19). Their mother tongue is Swedish. Each of them wrote four texts between September 2003 and May/June 2004. The tasks are narrative in nature: telling a story from a series of pictures, or recounting a travel memory. The emphasis is thus on content and on conveying the message rather than on form. The CEFLE Corpus served as a basis for developing the Direkt Profil software (see below) as well as for several studies on the development of morphology in written L2 French (Ågren, 2005; Hedbor & Ågren, 2006; Ågren, 2007, forthcoming). In this work, we used the linguistic criteria described by B&S (sentence organisation, finiteness, tense, aspect and mood) to classify the learners into different stages of development. It turns out that, according to these criteria, the learners of the CEFLE Corpus lie between the initial stage (1) and the pre-advanced stage (4).

2.3 Plural morphology in writing

The first analyses of the CEFLE Corpus dealt with the development of number morphology. Two different studies, one cross-sectional and the other longitudinal, showed a linear development of this morphology. As Table 2 shows, the route observed is the following: N > Det > V > A.
Table 2: The plural expressed on the various constituents of the sentence at stages 1, 2, 3 and 4 and in the native control group. Summary table from Ågren (2005).
Noun Indef. art. Poss. art. Def. art. Irr. V sont, ont -/+ + + + Reg. V (-nt) Irr. V morph. agreement (-nt) +/+ + Irr. V stem agreement +/+ Prenom. adj. Pred. adj. Postnom. adj.

Stade1 Stade2 Stade3 Stade4 Contr.

+ + + + +

+ + + +

+/+ + +

+/+ + +

+/+ +

+/+

+/+

+/+

Legend: +: correct agreement (>90%); +/-: productive agreement (>75%); -: agreement that is not productive (<75%).
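The thresholds in the legend can be read as a simple classification rule. The sketch below is only an illustration of that reading, not the procedure used in Ågren (2005): the function name `agreement_symbol` is hypothetical, and the treatment of rates of exactly 75% or 90% is an assumption, since the legend does not say how boundary values are handled.

```python
def agreement_symbol(correct: int, contexts: int) -> str:
    """Map an agreement rate onto the symbols of the Table 2 legend.

    correct  -- number of plural contexts with correct agreement
    contexts -- total number of plural contexts observed

    Assumption: rates of exactly 90% or 75% fall into the lower category,
    because the legend only states '>90 %', '>75 %' and '<75 %'.
    """
    if contexts <= 0:
        raise ValueError("at least one plural context is required")
    rate = correct / contexts
    if rate > 0.90:
        return "+"      # correct agreement
    if rate > 0.75:
        return "+/-"    # productive agreement
    return "-"          # agreement not productive
```

For instance, a learner group with 8 correct agreements in 10 plural contexts would be coded `+/-` under this reading.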

Already at the initial stage, the nominal plural is marked in 90% of plural contexts. As for determiners, plural forms are produced relatively early, and productively from stage 2 onwards. The noun is marked for plural even when the determiner does not express number, as in the following examples:
(Ex. 1) Anders (stage 1): deux *lhommes *parler
Barbara (stage 2): *le_ monsieurs ont une question

From stage 2 onwards, many texts show that verbs carry number marking when they agree with a plural subject. At stage 3, this agreement is relatively systematic, on regular as well as irregular verbs. Up to stage 4, learners realise verbal agreement morphologically rather than lexically (through the stem), as shown by numerous examples of irregular modal and thematic verbs:
(Ex. 2) Conny (stage 3): ils *ditent bonjour
les filles *conduitent

Finally, it is only at stage 4 that L2 learners begin to mark plural agreement on adjectives systematically. Number agreement on the adjective appears to be particularly difficult in written L2 French (despite the fact that Swedish also marks this agreement).


Table 3: Development of plural morphology in writing - longitudinal study. Example of the learners at stage 3.
Texts (rows, in order): Conny1 Conny2 Conny3 Conny4 Caesar1 Caesar2 Caesar3 Caesar4 Caroline1 Caroline2 Caroline3 Caroline4 Carole1 Carole2 Carole3 Carole4
Nom: + + + + + + + + + + + + + + + +
Dét: + + + + / / / + + + + + + + + +
V-irr: / + + / / + + + / + + + / + + +
V-autres: + / + / / + + / + + + + + + +
Adj: / / / / / / / / / + / / + +

Legend: +: emergence of the plural; -: the phenomenon under study has not emerged; /: there are too few occurrences to allow an analysis of emergence; Nom: nominal plural; Dét: plural of determiners; V-irr: plural of the irregular verbs être, avoir, aller and faire; V-autres: plural of regular verbs (parler) and of irregular verbs with stem change (prendre, boire) expressed with the morpheme -nt; Adj: adjectival plural.

At the individual level, the development of number morphology proceeds as expected: the route appears to be N > Det > V > A, as shown in Table 3 for four learners at stage 3. In general, the results show a clear development both between stages and within individuals. We note that the development of oral morpho-syntax, as observed by Bartning and Schlyter (2004), is accompanied by an equally clear morphological development in writing, at least as far as number morphology is concerned. The written results also indicate, besides the fact that developmental routes can be formulated for writing, that developments in the noun phrase and in the verb phrase go hand in hand. Finally, an interesting result, compared with spoken L2 French, is the observation that learners' written production shows a relatively early emergence of morphological agreement on several constituents of the sentence.

2.4 The Direkt Profil software

Given that developmental routes and stages are also found in free written production, and that we are beginning to know the criteria for these stages in writing, it has become possible to identify these stages with the help of a computer program. For some years now, Jonas Granfeldt, Pierre Nugues and their collaborators have thus been developing the Direkt Profil software.


Direkt Profil is an analyser of texts written in French as a foreign language. The idea behind the software is to obtain automatically a grammatical profile of learner texts and an indication of their level. The system consists of an analyser linked to a machine-learning module. First, we built an XML annotation of most of the morpho-syntactic phenomena discussed by B&S, but also of a large number of other quantitative indices (mean sentence length, etc.). The analyser goes through a learner's text, annotating and counting the occurrences of a particular phenomenon in its various forms. The result is a text profile based on these criteria and an indication of the level of the text. The program presents the results to the user by displaying the detected structures in different colours. The indication of stages is made possible by the machine-learning module. We trained three different algorithms on a manually annotated corpus (the CEFLE corpus, see above). Current results show that, for a classification into three stages, the rate of agreement between human and machine is around 80%. For a classification into five stages, the results fluctuate around 60-70% depending on the stage (Granfeldt & Nugues, 2007).

Several uses are envisaged for Direkt Profil. First, for researchers, the system allows:
(i) an analysis of texts and transcriptions. With its automatic morpho-syntactic annotation, the software can assist the analyst in day-to-day work.
(ii) the identification of most of the morpho-syntactic phenomena of B&S and the establishment of a grammatical profile. Thus, for the researcher who wishes to evaluate the impact of an independent factor on the grammatical profile of a learner production, Direkt Profil offers a quick and efficient way of doing so. An example of this use is found in the article by Granfeldt (this volume), in which the author compared the grammatical profiles of oral and written narratives by Swedish-speaking learners.

The machine-learning method also makes it possible:
(iii) to evaluate the impact of individual criteria (by identifying the most discriminating criteria for a given classification with an attribute-selection method, InfoGain). We introduce this method in Granfeldt & Nugues (2007).
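As a rough illustration of what attribute selection by information gain involves, the sketch below computes the textbook quantity: the reduction in the entropy of the stage labels obtained by splitting the texts on one discrete criterion. It is not the Direkt Profil implementation, which is described in Granfeldt & Nugues (2007); the function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their expected entropy after
    grouping the items by the value of one discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: four texts, their manually assigned stages,
# and two invented binary criteria observed in each text.
stages = [1, 1, 2, 2]
criterion_a = [False, False, True, True]   # separates the stages perfectly
criterion_b = [False, True, False, True]   # carries no information on stage

ranking = sorted(
    {"criterion_a": criterion_a, "criterion_b": criterion_b}.items(),
    key=lambda item: information_gain(item[1], stages),
    reverse=True,
)
```

Ranking criteria by this score puts the most discriminating ones first, which is the sense in which individual criteria can be evaluated for a given classification.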

Next, for teachers of French and for learners, we hope that the system will allow:
(iv) an evaluation of the level of grammatical development reflected in the texts. This type of evaluation, even if it will remain only diagnostic, can help the teacher to form homogeneous groups, choose appropriate material, etc. For pupils, Direkt Profil can function as a self-assessment system.

The system is accessible on the internet at http://www.rom.lu.se:8080/profil and is open to all interested users.

2.5 Final remarks


With the help of the work on stages, first in speech, as in B&S, and then in writing, as in Ågren (2005, 2007), as well as with the help of the Direkt Profil software (Granfeldt & Nugues, 2007), we hope to come ever closer to a partial evaluation of the level of the Swedish-speaking learner of French. Some points nevertheless call for caution:
- since the ability to communicate and interact is not exactly tied to grammatical development (cf. Bozier, 2005; Schlyter & Bozier, 2006), the grammatical stages cannot be treated as the sole criteria for a real evaluation of learners' free productions.
- these stages must not be used as a basis for teaching. The intention is not to teach these grammatical phenomena explicitly in the order observed by B&S, but to use them as an index of the maturation of the learner's internalised linguistic system. This system must, in our view, be acquired more or less implicitly in a rich and motivating linguistic environment, focused on content that can interest pupils and stimulate them to communicate.

BIBLIOGRAPHY

Ågren, M. (2005). Développement de la morphologie du nombre en français langue étrangère à l'écrit: étude transversale. Licentiatavhandling, Lund University (SOL: français), Sweden.
Ågren, M. (forthcoming). The Advanced L2 Writer of French: A Study of Number Agreement in Advanced Swedish Learners. In E. Labeau & F. Myles (Eds.), The Advanced Learner Variety: the Case of French. Peter Lang.
Bachman, L. F. & Cohen, A. D. (Eds.) (1998). Interfaces Between Second Language Acquisition and Language Testing Research. Cambridge: Cambridge University Press.
Bartning, I. (1997). L'apprenant dit avancé et son acquisition d'une langue étrangère. Tour d'horizon et esquisse d'une caractérisation de la variété avancée. AILE 9, 9-50.
Bartning, I. (1998). Procédés de grammaticalisation dans l'acquisition des prédications verbales en français parlé. Travaux de linguistique 36: 223-34.
Bartning, I. (2000). Gender agreement in L2 French: pre-advanced vs advanced learners. Studia Linguistica 54:2: 225-237.
Bartning, I. (forthcoming). The Advanced Learner Variety: 10 years later. In E. Labeau & F. Myles (Eds.), The Advanced Learner Variety: the Case of French. Peter Lang.
Bozier, C. (2005). La sollicitation dans l'interaction exolingue en français. Doctoral thesis, Lund University (SOL), Sweden.
Brindley, G. (1998). Describing language development? Rating scales and SLA. In L. F. Bachman & A. Cohen (Eds.), Interfaces Between Second Language Acquisition and Language Testing Research (p. 141-155). Cambridge: Cambridge University Press.


Crystal, D., Fletcher, P. & Garman, M. (1976). The grammatical analysis of language disability. London: Arnold.
Clahsen, H. (1985). Profiling second language development: A procedure for assessing L2 proficiency. In K. Hyltenstam & M. Pienemann (Eds.), Modelling and Assessing Second Language Acquisition. Multilingual Matters 18.
Clahsen, H. (1986). Die Profilanalyse. Ein linguistisches Verfahren zur Sprachdiagnose im Vorschulalter. Berlin: Marhold.
Clahsen, H., Meisel, J.-M. & Pienemann, M. (1983). Deutsch als Fremdsprache. Der Spracherwerb ausländischer Arbeiter. Tübingen: Narr.
Common European Frame of Reference: Conseil de l'Europe (1998). Les langues vivantes: apprendre, enseigner, évaluer. Un cadre européen commun de référence. Conseil pour la coopération culturelle, Comité de l'éducation, Strasbourg. (http://www.coe.fr)
Granfeldt, J. (2003). L'Acquisition des catégories fonctionnelles. Étude comparative du développement du DP français chez des enfants et des apprenants adultes. Études Romanes de Lund 67. Romanska institutionen, Lund University, Sweden.
Granfeldt, J. & Nugues, P. (2007). Évaluation des stades de développement en français langue étrangère. In Actes de Traitement Automatique des Langues Naturelles (TALN), Toulouse, 5-8 June 2007.
Granfeldt, J. & Schlyter, S. (2004). Cliticisation in the acquisition of French as L1 and L2. In P. Prévost & J. Paradis (Eds.), Acquisition of French in Different Contexts: Focus on Functional Categories (Language Acquisition and Language Disorders vol. 32), pp. 333-371. Amsterdam: Benjamins.
Hedbor, C. & Ågren, M. (2006). Acquisition et mise en oeuvre de la morphologie flexionnelle en français langue étrangère. In P. Largy & M.-P. Thibault (Eds.), La morphologie: Acquisition et mise en oeuvre, special issue of Rééducation orthophonique, no. 225: 159-189.
Kihlstedt, M. (1998). La référence au passé dans le dialogue: étude de l'acquisition de la temporalité chez des apprenants dits avancés de français. Cahier de la recherche 6, Doctoral thesis, Stockholm University, Sweden.
Klein, W. & Perdue, C. (1997). The Basic Variety (or: Couldn't natural languages be much simpler?). Second Language Research 13(4): 301-347.
Pienemann, M. (1998). Language processing and second language development. Processability Theory. Amsterdam: Benjamins.
Pienemann, M. & Mackey, A. (1992). An empirical study of children's ESL development and Rapid Profile. Sydney: NLLIA Language Acquisition Research Centre, University of Sydney, Australia.


Sanell, A. (2007). Parcours acquisitionnel de la négation et de quelques particules de portée en français L2. Cahiers de la Recherche 35, Doctoral thesis, Stockholm University, Sweden.
Schlyter, S. (1996). Télicité, passé composé et types de discours dans l'acquisition du français langue étrangère. Revue française de linguistique appliquée, 1.
Schlyter, S. (1988). Tidig simultan tvåspråkighet: Språknivå och obalans mellan språken. In B. L. Gunnarsson & C. Liberg (Eds.), Barns tvåspråkighet. Rapport från ASLA's höstsymposium, Uppsala, 6-7 november 1987. Uppsala, Sweden.
Schlyter, S. (2003). Stades de développement en français L2. Exemples d'apprenants suédophones, guidés et non-guidés, du Corpus Lund. Ms, Lund.
Schlyter, S. & Bartning, I. (2005). L'accord sujet-verbe en français L2 parlé. In J. Granfeldt & S. Schlyter (Eds.), Acquisition et production de la morphologie flexionnelle. Actes du Festival de la morphologie, mars 2005, Lund. PERLES (Petites Études Romanes de Lund, Extra Seriem) no. 20, SOL, Lund University.
Schlyter, S. & Bozier, Ch. (2006). Stades de développement grammatical et comportement interactif. Presentation at the colloquium "Recherches en acquisition et en didactique des langues étrangères et secondes" (Paris III, 6).



COMPLEXITY, ACCURACY, FLUENCY AND LEXIS IN TASK-BASED PERFORMANCE: A META-ANALYSIS OF THE EALING RESEARCH
Peter Skehan (1) & Pauline Foster (2)
(1) Chinese University of Hong Kong, (2) St. Mary's University College

1. INTRODUCTION

Many studies of second language performance, usually from within a task-based framework, use complexity, accuracy, and fluency to capture different aspects of second language learner language. Given the consistency that has emerged, it is interesting to explore the earlier research which led us to the use of separate measures in the three areas. A significant study in this regard is Ellis (1987), who explored accuracy only, in the context of a narrative retelling of a series of cartoon pictures under what Ellis proposed as different planning conditions. He reported an accuracy effect: engagement of planned discourse is associated with greater accuracy, while lack of opportunity to engage planned discourse is associated with lower accuracy. Crookes (1989) responded to this study critically, suggesting that Ellis's implementation of planning confounded spoken and written modalities. He defined planning as time-to-plan and then compared learner performance with or without planning time. He reported no accuracy effect, but interestingly he did report significant effects for complexity and fluency. As a result of these two studies, we now have an interesting independent variable, planning, and three dependent variables, with the possibility of different experimental influences on complexity, accuracy, and fluency, respectively. The present authors reacted to the Crookes study with the concerns that (a) the tasks that Crookes used were a little superficial in nature, so that more engaging tasks might lead to different results, and (b) the measures of complexity, accuracy, and fluency that he used were not the only ones possible. In particular, we were drawn to the idea of using a generalised measure of accuracy, rather than the specific measure, e.g. of article usage, that he had used. Foster and Skehan (1996) used three tasks, a personal information exchange, a narrative, and a decision-making task, and reported a significant effect for planning in all three areas, i.e. including accuracy. This and related studies (e.g. Skehan & Foster, 1997) led to a need to conceptualise the three performance areas in greater detail, since these were being proposed as the means by which second language learner performance on tasks could be evaluated. This conceptualisation can be done in two complementary ways. First, one can simply regard the three areas as operating simultaneously during performance. In this respect we offer the following propositions:

- attentional capacity is limited (Cowan, 2005)
- attending to one of the three performance areas may drain attention from other areas
- given this limitation, there is, in particular, a form-meaning tension, with meaning normally taking priority, and therefore reducing the attention available for form (VanPatten, 1990)
- even when there is attention available for form, there is a still further tension between form directed to the use of more complex, cutting-edge language (form-as-ambition) and attention within form directed to accurate, error-free language (form-as-conservatism)
- a trade-off hypothesis can be formulated which predicts that under certain conditions, raised levels in one performance area, when it consumes attention, may take attention away from other areas, with the result that performance in those areas may be lowered

Following this set of propositions, a central challenge for task-based instruction is to explore how tasks and task conditions can be manipulated to produce performance which maximises complexity, accuracy, and fluency even though these three areas may enter into competition with one another. We will explore this in detail below. But there is a second perspective on how the three areas inter-relate, one which focuses on acquisitional sequence. It can also be argued that the following relationship, shown in Table 1, captures how new language is developed and how, subsequently, greater control is achieved over this new language:
Table 1: A developmental sequence for complexity, accuracy, and fluency.
Complexity: This represents new cutting-edge and possibly risky language, and foreshadows growth in the interlanguage system.
Accuracy: This represents a striving for control and error avoidance, possibly by the avoidance of cutting-edge language, and by avoiding fluency to enable more time to be used to achieve higher accuracy.
Fluency: This represents a focus on meaning, automatisation, lexicalisation and a push for real-time processing.

In this view, the three performance areas are not simply aspects of performance, but also the sequence that will be followed as different sub-systems of language emerge and become controlled. In this case, a reason to research tasks and task performance, and to discover what independent variables influence the different areas, would be to help pedagogic decision making nurture new language more effectively and then enable this new language to be controlled and extended. A series of research studies has addressed these issues and established a number of findings.

Regarding difficulty, for example, that:
- tasks based on familiar information were easier
- tasks with more information transformation were more difficult

Regarding selective influences, it appeared that:
- tasks based on familiar or concrete information led to greater accuracy and fluency
- dialogic tasks led to greater accuracy and complexity, while monologic tasks generally produced the reverse results

Regarding task conditions:
- pre-task planning consistently produces greater complexity and fluency (cf. Crookes, 1989)
- pre-task planning sometimes produces more accurate language
- pre-task planning is more effectively done when led by the teacher, and least effectively done in a group of learners

These findings are drawn from individual research studies that we carried out, and one can offer some preliminary claims based on them. For example, it seems to be the case that different task features, or different task conditions, exert systematic influences on performance, and that if one conceives of performance in terms of complexity, accuracy, and fluency, many individual or combined effects are possible. For example, it can be claimed that complexity and accuracy often enter into competition with one another, so that the more usual outcome will be that one of these will show elevated performance, but the other will not. However, there may be times when both increase, but such occasions are less frequent. In contrast, producing simultaneous beneficial effects on complexity and fluency, or on accuracy and fluency, is not so much of a challenge. In a sense this detail develops the more general claims made earlier about a trade-off between performance areas. It also relates to the two interpretations of the three performance areas by allowing two challenges to be formulated regarding the impact of task research. The first challenge would be to use task research findings to identify or design tasks at different levels of difficulty. Meeting this challenge would mean being able to use tasks which make realistic processing demands on learners, so that they do not have to allocate all their attention to simply getting the task done. As a result, they might have some attention left over so that not only meaning, but also form could be brought into focus. The second challenge would be not only to be able to make effective predictions about difficulty, but also to identify or design tasks which promote the different performance areas (complexity, accuracy, fluency) in such a way that this fits in with pedagogic goals. This might either be to enable focus on areas of weakness, or it might be to work on the sequence of new language > control of error > achievement of fluency, which was described earlier.

Clearly, this interpretation of tasks presupposes a limited capacity attention system and the operation of a trade-off hypothesis. Since this is so fundamental, it is important to discuss an alternative position which takes a significantly different view of attentional functioning in task performance. Robinson (2001) has outlined the Cognition Hypothesis. This rejects the notion of a limited attentional capacity, instead proposing that we have available multiple attentional pools, and expandable resources which respond to the communicational needs that arise. Performance is driven, following this hypothesis, by the notion that task difficulty provokes wider attentional use, in that the more difficult the task, the more likely the language user will be to strive to match task difficulty with more complex language (consistent with claims made earlier), but also to strive to respond to the difficulty of the task by producing more precise and accurate language to ensure that meanings are communicated effectively. This leads to the prediction that more difficult tasks will be associated with both increased accuracy and complexity, since these are not seen as competing for attentional resources. (Robinson also predicts that such tasks will be associated with lower fluency.) These are two interesting contrasting positions on the linkage between attentional functioning and tasks. Robinson's position is perhaps clearer: more difficult tasks will lead to increased complexity and accuracy. Skehan (1998), in contrast, although working from a simple starting assumption, makes more varied predictions. The starting assumption is that attentional capacity is limited, and therefore there will be occasions when trade-off effects will be seen.
In a sense, this develops notions of task difficulty covered earlier. But Skehan also predicts that task performance will be the result of selective influences on different aspects of performance, such that higher complexity will be associated with some task characteristics and conditions; higher accuracy with others; and higher fluency with still others. What this means is that actual performance will depend upon how the different combinations of independent variables interact to influence the language that is produced. Some combinations of task characteristics and conditions will lead to trade-off effects, but it is also possible that on occasions, complexity and accuracy will both be raised, because independent influences are at work to support each. On such occasions, potential trade-off limitations will be overcome. In other words, both Robinson and Skehan make predictions that there will be times when accuracy and complexity are both raised, but they do so for different reasons. For Robinson, task difficulty is the driver. For Skehan, it is not difficulty that is the issue, but rather the combination of task characteristics and task conditions.

2. RESPONDING TO THE CHALLENGES

So far, we have briefly explored findings from individual studies from Foster and Skehan's research (termed the Ealing research, after its location in West London). We have also outlined the theoretical context with the opposing interpretations of Skehan and Robinson. The remainder of this chapter will be an attempt to extend the discussion of findings and also to address the alternative positions of Skehan and Robinson. Two additional features will be important in this regard. First, the conceptualisation of performance which underpins much existing task research will be challenged and extended. This concerns the development of some new measures of the existing constructs, but also the introduction of new aspects of performance, particularly lexis.
Second, the findings will be presented in the form of a meta-analysis, such that generalisations will be made from the range of studies that were conducted in the Ealing research. In this way, wider claims can be made since they are based on a larger database than simply individual studies.

2.1 Adapted and New Measures

Until now, performance, at least in the Ealing studies, has been conceptualised as:
- Complexity: where this was measured through a measure of subordination, in the form of the total number of clauses divided by the number of AS units (Foster et al., 2000). This generated a minimum value of 1, by definition, and typical values ranged between 1.20 and 2.00.
- Accuracy: where this was measured through the generalised measure of the proportion of error-free clauses, with typical values in the range of 0.40 to 0.80.
- Fluency: where this was measured in terms of Breakdown Fluency, essentially the number of pauses and the total amount of silence; and also in terms of Repair Fluency, with measures of things like reformulation, repetition, false starts etc., standardised to the number of each per 100 words.

These measures have proved serviceable and have generated a wide range of meaningful significant results. But there is clearly scope for improvement, and some modifications are worth discussing here. Nothing new is proposed here regarding complexity, although this is not to say that the area does not have potential for development. With accuracy, though, an alternative measure is proposed. Currently, the standard measure is the proportion of error-free clauses.

Unfortunately, this measure has the potential disadvantage that if a speaker uses many short correct utterances, the score which results may be inflated. Even short backchannels, if more than one word, might lead to a higher score than would otherwise be typical of a learner. For that reason, an alternative measure will be outlined here (and has once been reported: Skehan & Foster, 2005). Essentially, the measure explores the length of clause that can be accurately handled. To compute this measure, all clauses are grouped by length, so that, for example, all two-word clauses, all three-word clauses, ... all twelve-word clauses are brought together, and then the proportion of clauses of each length that is used correctly is computed. A criterion is set (usually 70% correct use) and then the maximum length which reaches this criterion is taken to be the clause length accuracy score. (Some rules are outlined in Skehan and Foster (2005) for handling cut-off points which are not simple.) This measure is proposed as a better measure of accuracy in performance since it avoids the problem of score inflation through correct short-clause use.

Earlier it was indicated that Breakdown Fluency was measured through a standardised measure of number of pauses and total silence. This measure also has a disadvantage. Native speakers obviously pause, and it would be useful if one could distinguish between pausing by native speakers and pausing by non-native speakers. To this end, it has been proposed (Davies, 2003) that pauses at clause boundaries are more characteristic of native speakers, who do not so often pause mid-clause. Accordingly, the meta-analysis to be reported here distinguishes between the two pause locations of end-of-clause (or end-of-AS unit) and mid-clause pausing. There may be differing effects, for native and non-native speakers, with these different pause types.
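The clause length accuracy measure described above can be sketched in a few lines, alongside the standard proportion of error-free clauses for comparison. This is a minimal illustration under stated assumptions, not the authors' scoring procedure: clauses are assumed to be already segmented and hand-coded as `(length_in_words, is_error_free)` pairs, the function names are hypothetical, and the special rules in Skehan and Foster (2005) for cut-off points which are not simple are not reproduced (the sketch simply takes the longest clause length that reaches the criterion).

```python
from collections import defaultdict


def error_free_proportion(clauses):
    """Standard accuracy measure: proportion of error-free clauses.

    clauses -- list of (length_in_words, is_error_free) pairs
    """
    return sum(1 for _, ok in clauses if ok) / len(clauses)


def clause_length_accuracy(clauses, criterion=0.70):
    """Longest clause length whose proportion of correct clauses
    reaches the criterion; 0 if no length reaches it.

    Assumption: non-simple cut-off cases are not treated specially.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for length, ok in clauses:
        totals[length] += 1
        if ok:
            correct[length] += 1
    passing = [n for n in totals if correct[n] / totals[n] >= criterion]
    return max(passing, default=0)


# Hypothetical coded sample: many short correct clauses, longer ones mostly wrong.
sample = (
    [(2, True)] * 6
    + [(3, True)] * 3 + [(3, False)]
    + [(5, True)] + [(5, False)] * 2
    + [(8, False)] * 2
)
standard_score = error_free_proportion(sample)   # boosted by the short clauses
length_score = clause_length_accuracy(sample)    # longest length at >= 70% correct
```

In this invented sample two thirds of all clauses are error-free, yet only clauses of up to three words reach the 70% criterion, which is the kind of inflation the alternative measure is meant to expose.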
So far, the additional measures that have been covered essentially tinker with the three performance areas of complexity, accuracy, and fluency but they do not challenge it fundamentally. However, one of the major omissions in performance measures of tasks (although not for a minute is this being proposed as the only major omission!) is that of lexis. This wide-ranging area has not so far been measured extensively or systematically, although there have been occasional attempts to incorporate its assessment (e.g. Foster & Skehan, 1996; Robinson, 2001). Two relevant measures are proposed here. First, there is the area of lexical density, or at least its operationalisation as the type-token ratio (the ratio of different words to total words which are used). The now well-recognised difficulty with this measure (Malvern & Richards, 2002) is that it is strongly influenced by text length (at least for the sort of text lengths typical in task-based performances) with a negative correlation between text length and type-token ratio of around 0.75 (Skehan, 2003). Fortunately, the CLAN suite of programmes (MacWhinney, 2000) offers a subroutine, VOCD, which handles this problem, and which provides a measure of lexical density corrected for text length, known as D (Malvern and Richards 2002). This will be reported on here. This measure is what has been termed text-internal, i.e. it only uses information from the actual text itself. So this leads to the need for a different, second measure which is textexternal, i.e. it uses information from outside the text, some sort of reference material, to compute an index of the extent to which a greater variety of words is drawn on (Daller et al., 2003). In the present work a measure will be presented, Lambda, which is an adaptation of work by Meara and Bell (2001) and Bell (2003). A text is divided into ten word chunks, and then the number of infrequent words used in each ten-word chunk is calculated. 
This yields a distribution of scores (for as many ten-word chunks as there are in a text) which can be modelled using a Poisson distribution. The modelling generates a value, lambda, which reflects the extent of lexical variety that the text exhibits. The value of D is taken to be an index of the extent to which speakers avoid recycling a small number of words. The value of lambda is taken to reflect the extent to which a speaker can access less frequent words and introduce a greater degree of lexical variety into performance through wider use of the second language lexicon. The two measures (Skehan, ms) seem to capture different aspects of lexical performance, with median correlations between them in the Ealing dataset close to zero.

2.2 The Meta-Analysis Database

Now that we have explored the new performance measures that will be used, we need to turn to outlining how this chapter will report on a meta-analysis which goes beyond any individual study. First of all, the range of studies to be drawn on will be described. Clearly, there is a lot that can already be gleaned from looking at these studies singly, but the next section will attempt to draw out more powerful generalisations by linking the different studies wherever possible.

Table 2: The Ealing Dataset. P = Personal task; N = Narrative task; D = Decision-making task.

St.1: Foster and Skehan (1996). Size: 25K
  Focus: P vs. N vs. D; Planning
  Results: Strong task effects; Selective planning effect (Comp. and fluency strongly affected; Acc. only slightly affected)

St.2: Skehan and Foster (1997). Size: 36K
  Focus: P vs. N vs. D; Planning; Post-task
  Results: Strong planning effect; Selective task effect; As above; Partial post-task accuracy effect

St.3: Skehan and Foster (2005). Size: 18K
  Focus: Decision-making task; Planning; Mid-task surprise; Time (5 vs. 10 mins)
  Results: Strong planning effect; No effect of mid-task surprise information; Strong time effect (5 mins > 10 mins on all measures)

St.4: Skehan and Foster (1999). Size: 30K
  Focus: Degree of structure; Processing load
  Results: Structured task was more fluent and sometimes more accurate; Simultaneous processing is very difficult compared to delayed

St.5: Foster and Skehan (1999). Size: 30K
  Focus: Source of planning; Focus of planning
  Results: Strong source effect with teacher planning, with C and A both raised; No focus effect: content planning no different to language

St.6: Unpublished. Size: 30K
  Focus: N vs. D; Post-task condition
  Results: Clear accuracy effect of post-task

St.7: Foster (2001). Size: 25K
  Focus: P vs. N vs. D; Planning; NS vs. NNS
  Results: Strong planning effect with complexity and fluency; Native speakers less formulaic when planned, non-native speakers the reverse

3. META-ANALYTIC RESULTS

First of all we can examine the complexity scores across the studies. These are shown in Table 3. Italicised figures are significantly different.
Table 3: Complexity Scores across four studies with and without planning.

Study                     Personal       Narrative      Decision-making
St.7: NS (Foster 2001)    1.13  1.30     1.20  1.49     1.26  1.46
St.1: NNS                 1.14  1.20     1.32  1.60     1.27  1.40
St.2                      1.26  1.31     1.27  1.36     1.32  1.75
St.3                      n/a            n/a            1.31  1.40

The first figure in each cell is for the unplanned condition, the second figure is for the planned condition.

There were four studies which used planning as a variable, two (i.e. Study 7 and Study 1) which used the same tasks and experimental conditions but which had native speaker and non-native speaker participants respectively, and two others which only had non-native speaker participants. The generalisations here are fairly clear. Planning has a significant (and beneficial) effect everywhere, except with the two non-native speaker studies with Personal tasks, i.e. the easiest tasks, based on familiar information, where it appears that there was little scope for planning to confer any advantage. Everywhere else significance is attained. On the whole, the degree of effect of planning on complexity is similar with the Narrative and Decision-making tasks. There is also a little variation within task type. The Study 2 Agony Aunt decision-making task (agree on advice in response to letters written to a magazine Agony Aunt) produced a greater complexity effect than did the Study 1 decision-making task, which required participants to act as judges and decide on the length of time to send people to prison for various crimes. Interestingly, in the Studies 1 and 7 comparison, native speakers behave in pretty much the same way as non-native speakers: planning, for them too, led to greater complexity in language. It appears that the opportunity for pre-task planning impacts upon the Conceptualiser stage of Levelt's (1989; Kormos, 2006) model for both native and non-native speakers.

Table 4 presents the comparable results for lambda, the index of lexical variety. The same studies and variables are involved, with planning once more the major task condition variable. (The results for D are not presented here because there are no significances as a result of planning. See Skehan (ms) for greater discussion of the lexical measures.)
Table 4: Lambda Scores across four studies with and without planning.

Study                     Personal       Narrative      Decision-making
St.7: NS (Foster 2001)    1.27  1.48     1.46  1.95     0.80  0.93
St.1: NNS                 0.94  1.12     1.32  1.60     0.54  0.79
St.2                      1.27  1.35     1.25  1.46     0.58  0.43
St.3                      n/a            n/a            0.71  0.85

Italicised figures are significantly different. The first figure in each cell is for the unplanned condition, the second figure is for the planned condition.

Clearly, and unsurprisingly, native speakers produce higher lambda figures than do non-native speakers. In other words, native speakers are able to draw upon a wider range of less frequent lexis while speaking, reflecting their richer, more accessible, and better organised lexicons. Not quite as obviously, the native speakers also show major planning effects on lexical variety, certainly for the Personal and Narrative tasks. While the planners produce a higher value for the Decision-making task, this is not significant. There is also a fairly clear trend that the order of the tasks for lambda is fairly constant, with Narratives generating the highest lambdas and Decision-making the lowest, and the Personal task is closer to the Narratives than to the Decision-making. This generalisation applies equally to the native and non-native speakers. Finally, although the Decision-making tasks show little variation, the different Personal and Narrative tasks for St.7 and St.1 compared to St.2 do generate somewhat different performance levels with lambda. Explaining how to get to your home (Personal, Studies One and Seven) is associated with lower lambdas than describing what surprises you about life in Britain (Personal, Study Two), while making up a story needed to link a series of pictures (Narratives, Studies One and Seven) produces higher lambdas than telling the story in an organised set of cartoon pictures (Narrative, Study Two).

Table 6: Accuracy Scores across four studies with and without planning.

Study    Measure    Personal       Narrative      Decision-making
St.1     EFC        0.58  0.62     0.57  0.52     0.60  0.60
         LAC        3.3   4.1      3.1   3.0      2.5   3.3
St.2     EFC        0.56  0.65     0.45  0.55     0.60  0.61
         LAC        2.6   5.0      2.1   2.8      2.8   3.3
St.3     EFC        n/a            n/a            0.62  0.68
         LAC        n/a            n/a            3.2   4.1

Italicised figures are significantly different. The top row in each cell gives the error-free clauses (EFC) score, the bottom row gives the Length of Accurate Clause (LAC). The first figure in each pair is for the unplanned condition, the second for the planned condition.

Table 6 gives the scores for accuracy. No scores are given for the Native Speakers from Study Seven since it is assumed that such speakers do not make errors, but merely occasional lapses. It is clear that the error-free clauses (EFC) scores here do not generate many significant differences. There is one occasion where planning is associated with a lower EFC score, some occasions where there is hardly any difference, and some where the planners are significantly more accurate. The scores for the Length of Accurate Clause (LAC) are more stable. This, it appears, is a better measure of accuracy, with a particularly clear example being the Decision-making task from Study One, where, while the EFC measure indicates no change, the LAC measure is clear in showing that planners perform at higher levels. Looking at the broad range of studies, it seems that planning has its greatest effects with the Personal and the Decision-making tasks, and the least with Narratives. Planning, in other words, seems less able to translate its benefits into avoiding error with the more monologic tasks. Otherwise, it is clear that, irrespective of whether there is planning or not, the Personal tasks generate the highest levels of accuracy, and the Narratives the least (although this is not true for Study One). This is not particularly surprising in that Personal tasks, based as they are on familiar information, are likely to enable more attention to be made available for the Formulator stage in performance.

We turn next to measures of fluency. The first measure of fluency to be considered is pausing and the standardised values (pauses per 100 words) are shown in Table 7.
Table 7: Pausing Scores across four studies with and without planning.

Study                      Measure       Personal          Narrative         Decision-making
St.7: NS (Foster, 2001)    AS            2.8    1.4        4.2    2.1        3.6    0.8
                           Mid-clause    1.1    1.3        1.3    2.6        1.6    0.4
                           Ratio         (2.55) (1.08)     (3.23) (0.81)     (2.25) (2.0)
St.1: NNS                  AS            1.6    1.6        3.9    1.4        3.7    1.5
                           Mid-clause    2.6    2.1        3.8    3.8        4.8    2.7
                           Ratio         (0.62) (0.76)     (1.03) (0.37)     (0.77) (0.56)
St.2                       AS            4.5    3.9        6.8    6.1        4.6    2.7
                           Mid-clause    8.5    8.4        10.1   12.9       10.6   9.2
                           Ratio         (0.53) (0.46)     (0.67) (0.47)     (0.43) (0.29)
St.3                       AS            n/a               n/a               1.9    1.4
                           Mid-clause    n/a               n/a               2.9    2.3
                           Ratio         n/a               n/a               (0.66) (0.61)

Italicised figures are significantly different. The upper row in each cell represents the number of AS pauses per 100 words, while the lower row represents the mid-clause pauses. Parenthesised figures are the ratio of AS pauses divided by mid-clause pauses. The first figure in each pair is for the unplanned condition, the second for the planned condition.
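The standardisation per 100 words and the parenthesised ratio can be made concrete with a short sketch. The function names and the raw counts are hypothetical; only the arithmetic (rate per 100 words, and AS-boundary rate divided by mid-clause rate) follows the description of Table 7.

```python
def per_100_words(count, total_words):
    """Standardise a raw count to a rate per 100 words."""
    return 100.0 * count / total_words


def pause_profile(as_boundary_pauses, mid_clause_pauses, total_words):
    """Return (AS-boundary rate, mid-clause rate, ratio of the two rates)."""
    as_rate = per_100_words(as_boundary_pauses, total_words)
    mid_rate = per_100_words(mid_clause_pauses, total_words)
    # A speaker with no mid-clause pauses has an unbounded ratio.
    ratio = as_rate / mid_rate if mid_rate else float("inf")
    return as_rate, mid_rate, ratio


# Invented 500-word performance with 14 AS-boundary and 7 mid-clause pauses.
print(pause_profile(14, 7, 500))  # (2.8, 1.4, 2.0)

# The ratio can also be taken directly from two reported rates, for example
# the native speaker Personal task without planning (2.8 and 1.1 per 100 words).
print(round(2.8 / 1.1, 2))        # 2.55
```

A ratio above 1 indicates that pauses fall mainly at AS-unit boundaries, and a ratio below 1 indicates that mid-clause pauses predominate.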

The first point of interest here concerns the relationship between the Native Speakers and the Non-native speakers. It appears to be the case that Native Speakers pause more often, or at least as often, as Non-native speakers at AS boundaries. (Although it should be said that when they pause, they pause for less time, on average.) In contrast, the non-native speakers pause very clearly more than Native Speakers at mid-clause points. The two groups, in other words, differ not just in pausing, but also in the pattern of pause locations. This is most clearly demonstrated by the parenthesised ratio value. Native speakers are able to engage in what might be considered to be a listener-friendly distribution of pauses, while non-native speakers clearly have pauses thrust upon them as they encounter difficulties in unpredictable places.

It is also clear that planning has intriguing effects. For Native and Non-native speakers alike, AS pauses generally reduce, suggesting that the discourse produced is more organised and more predictable. The situation with mid-clause pauses, though, is more complex. For the non-native speakers these pauses are generally reduced, but for two of the three native speaker data points, mid-clause pauses actually increase, suggesting a different approach to processing on the part of these speakers. Again it is the ratio figures for the two pause locations that capture this most clearly. Finally, there is a trend towards Narratives (i.e. the most monologic task) being the task type that provokes the most pauses, although the difference between this and Decision-making tasks is not large.

We turn next to other measures of fluency: repetition as an index of repair fluency (again standardised per 100 words), and length of run, i.e. the number of words produced, on average, without any dysfluency marker. These values are shown in Table 8.
Table 8: Repair and Length of Run Scores with and without planning.

Study                     Measure          Personal        Narrative       Decision-making
St.7: NS (Foster 2001)    Repetitions      1.2   1.04      1.5   0.9       1.4   0.5
                          Length of run    4.80  6.30      4.4   6.1       4.6   6.2
St.1: NNS                 Repetitions      3.6   4.5       3.6   4.7       5.0   6.5
                          Length of run    3.7   3.7       3.4   3.6       3.1   3.5
St.2                      Repetitions      4.6   3.1       5.1   4.9       5.7   6.2
                          Length of run    3.4   3.4       3.1   2.8       2.8   3.0
St.3                      Repetitions      n/a             n/a             4.3   4.6
                          Length of run    n/a             n/a             3.5   3.6

Italicised figures are significantly different. The upper row in each cell indicates the average number of repetitions while the lower row represents the length of run. The first figure in each pair is for the unplanned condition, the second for the planned condition.

There is an interesting contrast in these two dependent variables. The Repetition scores show two things happening. First, the native speakers repeat much less (and repeat even less in the planned condition). Second, the non-native speakers repeat very much more, and tend to repeat even more when there is opportunity to plan. Planning seems associated with greater involvement with the speech, and as a result more demanding but positive cognitive operations are engaged. This seems a somewhat similar effect to what happens with non-native speakers and mid-clause pausing.

Length of run, it will be recalled, is a measure of how long a stretch of language can be produced without any sort of interruption, whether this is in the form of a pause (filled or unfilled) or the use of a repair device. It has been taken (Towell et al., 1996) to be an indication of the degree of automatisation in speech performance. Here again there is a clear contrast between native and non-native speakers. The native speakers clearly perform at a significantly higher and fairly consistent level. Without planning the Length of Run index is around 4.6, almost regardless of task, whereas with planning it goes up to 6.2 or so, again regardless of task. Native speakers, that is, produce four and a half words, on average, without interruption in impromptu speech, and over 6 with the opportunity to prepare. This figure is interestingly close to Cowan's (2005) revision of the magic number of working memory or span of apprehension from Miller's (1956) 7, plus or minus 2, to closer to 5. In contrast, the non-native speakers, operating at a lower level, produce around 3.5 uninterrupted words, on average, and this is unaffected by task or by planning condition. It is also appreciably lower than the native speaker level, and one can more easily understand therefore why length of run may be an important indicator when fluency in spoken language is rated (Cucchiarini et al., 2002).
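As an illustration of how a length of run index might be computed, the sketch below assumes a transcript that has already been tokenised, with every pause (filled or unfilled) and repair device replaced by a tag. The tag names are hypothetical annotation conventions, not those of the studies reported here, and adjacent markers are assumed not to create empty runs.

```python
# Hypothetical annotation tags marking any interruption to the flow of speech.
DYSFLUENCY_MARKERS = {"<pause>", "<filled>", "<repair>"}


def runs(tokens, markers=DYSFLUENCY_MARKERS):
    """Return the lengths (in words) of each uninterrupted stretch of speech."""
    lengths, current = [], 0
    for tok in tokens:
        if tok in markers:
            if current:              # close the run that has just ended
                lengths.append(current)
            current = 0
        else:
            current += 1
    if current:                      # final run, if the text does not end on a marker
        lengths.append(current)
    return lengths


def mean_length_of_run(tokens, markers=DYSFLUENCY_MARKERS):
    """Average number of words produced between dysfluency markers."""
    lengths = runs(tokens, markers)
    return sum(lengths) / len(lengths) if lengths else 0.0


transcript = "i went to the <pause> shop and <repair> and bought <filled> <pause> milk".split()
print(runs(transcript))                # [4, 2, 2, 1]
print(mean_length_of_run(transcript))  # 2.25
```

Because both pauses and repairs end a run under this definition, a change in either one alters the index.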
In fact, it is worth saying that the lack of real improvement amongst the non-native speakers when there is planning reflects two competing effects. Pauses do slightly reduce (which would push up length of run scores), but repair indices tend to increase (which has the reverse effect). The result is two defining features of length of run working in opposite directions, and cancelling one another out. In itself, this finding has implications for the way length of run should best be defined.

The other task condition which was investigated was that of a post-task activity. There were two studies which explored this, and the findings are given in Table 9. The focus here is not on all aspects of performance, but only that of form, and therefore accuracy and complexity.
Table 9: Post-task effects on accuracy and complexity.

Study                              Measure       Personal       Narrative      Decision-making
St.2: Public Performance           LAC           3.37  3.82     2.67  2.35     2.93  3.23
                                   Complexity    1.32  1.22     1.32  1.34     1.45  1.70
St.6: Transcription Performance    LAC           n/a            1.80  3.29     2.44  5.13
                                   Complexity    n/a            1.28  1.42     1.29  1.62

Italicised figures are significantly different.

The figures in the top row of each cell give the Length of Clause Accuracy measure, and the lower row gives the AS-unit based Complexity score. The first figure in each pair is for performance without the post-task condition, the second for performance with it.

The operationalisation in the first of the two studies (Study 2) was to require some students to engage in a public performance after doing the task privately (Skehan & Foster, 1997). The hypothesis was that foreknowledge of such a post-task-to-come would cause participants to selectively prioritise accuracy, since they would be aware of the greater salience of pedagogic norms because of the public performance. In Study Two, this produced one significant result for accuracy, and none for complexity, with the accuracy value for the Narrative task even going down. It was decided that running a second study would be worthwhile but with a different operationalisation, with the need for participants to engage in transcription of their own performance post-task. It was thought this would more effectively lead to the predicted prioritisation of accuracy during the task. In addition, it was decided to retain the most favourable and least favourable tasks from Study 2, the Narrative and the Decision-making, to provide, simultaneously, the most stringent and supportive tests of the hypothesis that learners could be induced to selectively prioritise accuracy. In addition, it should be noted that all these results are presented in terms of the Length of Accurate Clause measure, which has been argued to be the most sensitive and valid measure of accuracy (Skehan & Foster, 2005). This contrasts with the EFC measure, which was reported in Skehan and Foster (1997). Three points can be made on the basis of the results in Table 9. First, it is clear that the transcription condition (Study Six) is more effective than the public performance (Study Two), and more consistently leads to significant differences in performance. Second, there is once again something about Decision-making tasks which leads to stronger effects than are found with Narratives. In other words, interactive tasks are more influenced by post-task manipulations targeting attention allocation. 
Third, whereas original predictions were in terms of accuracy only, there is also evidence here that with Decision-making tasks, complexity is also promoted, leading to the conclusion that the attention switching during performance is towards form in general, rather than selectively towards conservative form-as-accuracy.

We can now summarise the effects of task conditions across the range of studies we have examined. These suggest that:

- planning has a major impact upon the Conceptualiser, driving lexical complexity, lexical choice, and length of turn (not reported here)
- native speaker effects when there is planning are greater, but non-natives are basically the same in the changed performance that they show, although effects are slightly weaker. In addition (and see below) their ability to integrate lexis into performance is not so effective
- planning impacts more clearly on Length of Clause accuracy than on Error Free Clauses
- planning does not help non-natives with mid-clause problems (perhaps even making them worse) but does help natural performance and effective use of pauses at clause boundaries
- post-task conditions, provided that they are operationalised effectively, do impact on form in performance, suggesting that participants' priorities of attentional focus can be manipulated. However, this may not simply be accuracy, but could also be complexity as well.

4. SOME RECONCEPTUALISATIONS OF TASKS


So far, the emphasis in this meta-analysis has been on task conditions. As indicated earlier, though, there are also suggestions about consistent linkages between tasks and different performance areas, such as tasks based on familiar information favouring fluency and accuracy. The availability of this larger Ealing dataset enables these claims to be revisited and extended. In particular, here, we will examine three task variables, partly for the additional generalisations they provide, but also for their bearing on the debate between Skehan and Robinson on attentional limitations.

First of all, we can consider the effect of task structure. Initially, when the results of Studies 1 and 2 were compared, it was realised that a Personal Information Exchange task (Turn off the oven: Study One) and a Narrative (Sempé cartoon: Study 2) both produced higher than expected accuracy and fluency, and it was hypothesised, post hoc, that this was due to the storyline involved in each case, i.e. a clear beginning, middle, and end, with clear interrelationships between the stages. Accordingly, Study 4 was designed specifically to explore the effects of structure, with two Mr. Bean video-based narrative retellings. One was structured (the Restaurant task, with possible engagement of a restaurant script) while the other was not (the Crazy Golf task, with a series of unpredictable and unrelated events). The prediction was largely confirmed, leading to the suggestion that tasks containing a clear macrostructure ease Conceptualiser operations (Levelt, 1989) and, as a result, release more attention for the Formulator, with consequently greater accuracy and fluency. A further study was conducted (outside the Ealing research). Tavakoli and Skehan (2005) used the Winter-Hoey analysis of text types (Winter, 1976; Hoey, 1983), focussing on their Problem-Solution structure.
Iranian learners of English were required to tell four cartoon series narratives which varied in degree of structure (operationalised as the number of pictures in the picture series whose order could be changed without compromising the story). This study showed clear effects, with significantly greater accuracy and fluency the more structured the task. To generalise therefore, we can now see that structure can be operationalised in various ways. It can consist of narratives:

- with a clear number of component steps, or
- based on a clear script which brings structure to the story, or
- based on a discourse structure which contains integrated coherence

In all cases, the macrostructure appears to ease immediate processing burdens, requiring less input from the Conceptualiser, and therefore enabling the Formulator to have more attention available for processes of lemma retrieval and syntactic planning.

The second task feature for reconsideration and extension is that of information manipulation. Study One contained a narrative which required participants to look at a series of pictures for which there was no obvious set of linkages or storyline, but which did contain common characters. They had to devise a story which linked the pictures into a meaningful sequence. This led to high complexity scores (and lower accuracy scores), which were interpreted as learners being pushed to heavy Conceptualiser use as they needed to make transformations of the material in the set of pictures. The transformations, needed to bring about linkages, in turn required greater language complexity. In Tavakoli and Skehan (2005), where the major variable (as indicated in the last paragraph) was degree of task structure, one of the four cartoon-series based narratives produced slightly anomalous results. A particular story sequence showed two children going on a picnic; their dog, unknown to them, hid in the picnic basket and ate all the food, so that when they arrived at the picnic site, the story denouement was that they had no picnic! This structured story produced greater accuracy, but it also produced greater complexity. With hindsight, it was realised that the meaningfulness of this story required the teller to integrate what appeared to be background information in some pictures so that the connections underlying the story would be apparent later to explain the denouement. The result of this need for integration was greater language complexity.

In a sense this area is the reverse of the last. There the focus was on how knowledge of task structure eases processing demands and enables the Formulator to come into play. Here, the focus is on how the speaker, during performance, has to engage in on-line Conceptualiser work to address the organisation and expression of the more demanding information that needs to be conveyed. These demands may arise through the nature of the information (abstract rather than concrete); its dynamic pressure (as in the need to make transformations); or the need to make connections, as with the Tavakoli and Skehan study. Whichever of these is operative, the Conceptualiser has more difficulty preparing, and will require attentional resources during on-line operation, as the pushed content of the task generates higher complexity.

The third and final task area to be considered here is that of necessary elements when a task is being done. The effects of such elements can be seen through Table 10, where values for lambda, mid-clause pausing, and length of accurate clause (LAC) are given.
Table 10: The effects of necessary elements on performance.

         Narrative                           Decision-making
Study    Lambda   Mid-Cl Pauses   LAC        Lambda   Mid-Cl Pauses   LAC
St.1     1.46     4.5             3.0        0.65     3.8             2.9
St.2     1.38     11.5            2.5        0.49     9.9             3.1
St.4     1.66     7.3             2.4        n/a      n/a             n/a
St.6     1.45     4.2             3.0        0.48     2.0             4.4

The first point to make here is that the tasks contrast. Narrative tasks, as we saw earlier, generate higher levels of lambda. They contain input which is non-negotiable. The Decision-making tasks, in contrast, although they do have input, provide a greater scope for improvisation, avoidance, and development. As a result, particular words become less important than in the Narratives. Even within the Narratives there are differences. The (real-time) Mr. Bean narratives generate the highest lambdas, followed by the two cartoon picture sequences of Studies Two and Six. Taking lambda to be an indicator of how input is less negotiable, this is an interesting insight in itself. But even more interesting here is the relationship between the various measures shown. In general, higher lambdas are associated with more pauses and with lower accuracy, i.e. both features of Formulator operations. The need, in other words, to incorporate less avoidable elements seems to come at some cost. Needing to deal with more specific lexis provokes a greater need to pause, and less ability to avoid error. Necessary elements, in other words, although producing high lambdas, have a damaging effect on other aspects of performance. More generally it appears that second language speakers have poorer abilities at integrating lexis, especially lexis thrust upon them as opposed to lexis more easily chosen, with ongoing effective accurate language performance.

It is interesting that these influences do not manifest themselves with native speakers. They show a capacity to harness lexis effectively, and to associate higher lambdas with greater complexity. Lexis, in other words, can effectively drive syntax for these speakers, as one would expect from the Levelt model. The non-native speakers, in contrast, cannot achieve this integration, and the need for more difficult lexis impairs other performance areas.

5. COGNITION VERSUS TRADE-OFF

We are now in a position to return to two alternative accounts of task performance, by Robinson (2001) and Skehan (1998) respectively. The basic accounts that each provides are as follows:
Table 11: Contrasting Predictions of the Robinson and Skehan Accounts.

Cognition Hypothesis
- Task difficulty should lead to increased complexity and accuracy simultaneously
- Complexity and accuracy should correlate, and be mediated by difficulty of task

Tradeoff Hypothesis
- When attentional resources are limited, there will be competing priorities in performance
- In addition, task characteristics can have selective influences which can modify the effects of tradeoff

The crux here is the relationship between accuracy and complexity. Robinson (2001) predicts that task complexity will raise performance on these simultaneously. Note two implications of this. One (which has been the most influential so far) is that, where task difficulty is concerned, statistical significance will be achieved, with more difficult tasks leading to higher complexity and accuracy in groups who do difficult tasks. But the second is no less important. The Cognition Hypothesis ought to predict also that there will be a correlation between complexity and accuracy, i.e. at the individual level there will be simultaneously elevated performance, and the two measures should go together.

Turning now to the Trade-off Hypothesis, it should be noted that there is no prediction that one will always see raised performance in one area at the expense of performance in another. This may be the default position, as it were, in that, other things being equal, there will be pressure on limited attentional resources, so that task difficulty is likely to provoke lowered performance in some areas while other areas may not be so affected. But at every stage, the trade-off hypothesis has been paralleled by a concern to establish the selective effects of tasks, such that particular task characteristics are associated with elevated performance in certain areas, hence the claims about familiarity of information raising accuracy and fluency, as do structured tasks, while interactive tasks are likely to raise accuracy and complexity.

This leads to a major point of comparison between the two models. Both can predict that accuracy and complexity will go together, but they do so for different reasons. The Cognition Hypothesis predicts that it is task difficulty which leads to this result.
What might be termed the Extended Tradeoff Hypothesis predicts that accuracy and complexity will be simultaneously raised as a result of the conjunction of propitious and selective influences on task performance working in combination. In other words, it proposes an independent explanation for an accuracy-complexity relationship. This in turn leads to a re-examination of the evidence regarding accuracy-complexity relationships. So, drawing on the meta-analysis presented here, based as it is on an additional range of measures, three studies will be explored where accuracy and complexity are simultaneously raised, and it will be proposed that one does not need to invoke the Cognition Hypothesis to explain these results. In contrast, one can account for them more simply by other means, means which mediate the effects of the basic Tradeoff Hypothesis.

Foster and Skehan (1999), building upon previous studies which showed planning effects on performance, sought to explore whether varying the source of planning and the focus of planning would have differential effects on performance. The focus of planning (content vs. language) was ineffective. However, the source of planning yielded interesting results, since three conditions (in addition to control) were used. These were teacher-fronted planning, group-based planning, and individual planning (the condition used in earlier studies). The teacher-fronted planning group was clearly the most effective, but what was really interesting is that this group showed raised accuracy and complexity scores simultaneously. In other words, the role of the teacher in guiding the planning seemed to lead learners to be more effective at focussing on form in both dimensions. Through more effective use of planning time, performance could be more complex and, simultaneously, error could be avoided. It is hard to argue that the teacher-fronted planning condition produced a more difficult task. After all, it was the same task that was done (a Balloon debate) in all conditions. It seems more plausible therefore to see the simultaneously raised performance in the two areas as the consequence of the type of planning that was engaged in, as the teacher more effectively prepared the ground for integrated performance and for attention management that was more effectively handled and distributed.

A second study is that of Skehan and Foster (ms), discussed above. In this study, a post-task condition of transcribing performance was used, with the intention that participants, while they were doing the task, would make connections with the post-task, and therefore allocate attention selectively towards accuracy. This prediction was confirmed, for both Narrative and Decision-making tasks.
It also seemed clear that the experimental post-task condition was a more effective operationalisation than an earlier study which had used the threat of a public performance. But what was most interesting is that the complexity scores were also raised in each of the tasks. For the narrative, these were 1.28 (no post-task) and 1.42 (post-task) although these values were not significantly different. But for the Decision-making task the figures were 1.29 and 1.62 respectively, and this difference was significant. It appears that the post-task condition, although affecting accuracy more strongly, also has an impact on complexity, especially for the Decision-making, interactive task. This suggests that a successful post-task condition does cause attention allocation towards form, but this attention is less selective than had been anticipated. Learners are more aware of the language they are using, and this does not simply mean a greater awareness of error-avoidance. It also means that they are attending to the syntactic choices they are making. The two studies so far have been concerned with the conditions under which tasks are done, and have focussed on pre- and post-task phases respectively. The final study, also briefly discussed earlier, is concerned with task qualities themselves. Tavakoli and Skehan (2005) report a study in which level of structure was manipulated in several cartoon narrative retellings. The prediction was that structure would advantage accuracy and fluency. This prediction was confirmed, but for one of the picture series there was also a complexity effect, i.e. all three measured performance areas were elevated. The interpretation made is that in this Picnic story (and see above), two independent task variables (structure and information integration) operated simultaneously to produce a conjoint effect which then transcended the effects of any tradeoff. 
The variable of structure was covered earlier, as was information integration, in that for the Picnic cartoon series, it was necessary to integrate background and foreground information and it was this which produced higher complexity. In other words, it is possible, through task characteristic manipulation, to overcome tradeoff limitations because the different characteristics support different performance areas. Complexity and accuracy are not being driven forward by the same thing, task difficulty, but by two independent influences. Task structure leads to greater accuracy, while information integration produces higher complexity.

One final point from the meta-analysis (rather than any individual study) is worth restating because it is also relevant. It will be recalled from Table 10 that there was a clear difference between native and non-native speakers regarding capacity to integrate lexis. Levelt's (1989) model of speaking proposes that the Conceptualiser delivers a pre-verbal message which triggers Formulator processes of lemma access and consequent syntactic encoding. Native speakers seem able to handle lemma access pressures that arrive at the Formulator stage without undue disruption. Non-native speakers, in contrast, when they need to use less frequent lexical elements, see other areas of performance suffer, with reductions in accuracy and complexity. Indirectly, it is proposed that this too is evidence more consistent with tradeoffs in attentional resources. Lexical retrieval demands, although potentially having a good effect on task complexity, in the event turn out to be disruptive. The extra effort that is required for lemma access and the use of the lemma information so retrieved has a damaging effect on performance more generally. This does not encourage views that task difficulty will push, in non-native speakers, for higher levels of performance.

4. CONCLUSION

A first point to make here is that it is desirable to refine the methods by which task performance is measured. The different measure of accuracy, the more detailed measures of pausing, and especially the introduction of measures of lexical performance add to the richness with which differences in task language can be assessed.
It is claimed here that these measures are more valid, and are also likely to be more sensitive to different experimental manipulations. Certainly from now on, it has to be accepted that indices of lexical performance are needed to contribute a fourth general area to characterising task performance. A second point is that we have a more robust view of the effects of task conditions and task characteristics as a result of looking at this range of studies. Planning has clear effects on accuracy and complexity, and now we can also argue that it is not equally important for all aspects of fluency, with the need to distinguish between pause locations being central. In addition, the construct of length of run needs further research, since it is clear that measuring length of run for native speakers and non-native speakers is affected by different things. Finally, it is clear that additional task characteristics, beyond those covered in previous publications, e.g. Skehan (2003) are relevant, and that now we also need to consider more thoroughly areas such as degree of structure, the processes required for information manipulation, and the importance of necessary, non-negotiable elements, especially lexis, within tasks. But the third point is perhaps the most interesting. It is claimed here that the evidence from three studies is suggestive that the Trade-off hypothesis, supplemented by data on the selective effects of different task characteristics, is sufficient to account for occasions when accuracy and complexity go together. In other words, it is proposed that it is more plausible to believe that when these two performance areas do go together, they do so for reasons independent of task difficulty. The evidence in support of a task difficulty influence is reviewed critically


elsewhere. In this context, all that is being claimed is that when these areas do go together, satisfactory alternative explanations to the Cognition Hypothesis are available.

REFERENCES

Bell H. (2003). Using frequency lists to assess L2 texts. Unpublished Ph.D. thesis, University of Swansea.
Cowan N. (2005). Working Memory Capacity. New York: Psychology Press.
Crookes G. (1989). Planning and interlanguage variation. Studies in Second Language Acquisition 11, 367-383.
Elder C., Iwashita N. and McNamara T. (2002). Estimating the difficulty of oral proficiency tasks: What does the test-taker have to offer? Language Testing 19(4), 343-368.
Cucchiarini C., Strik H. and Boves L. (2002). Quantitative assessment of second language learners' fluency: comparisons between read and spontaneous speech. Journal of the Acoustical Society of America 111(6), 2862-2873.
Davies A. (2003). The Native Speaker: Myth and Reality (Second Edition). Clevedon, Avon: Multilingual Matters.
Ellis R. (1987). Interlanguage variability in narrative discourse: Style shifting in the use of the past tense. Studies in Second Language Acquisition 9, 12-20.
Foster P. (2001a). Rules and routines: A consideration of their role in the task-based language production of native and non-native speakers. In Bygate M., Skehan P. and Swain M. (Eds.), Researching Pedagogic Tasks: Second Language Learning, Teaching, and Testing (pp. 75-93). Harlow: Longman.
Foster P. (2001b). Lexical measures in task-based performance. Paper presented at the AAAL Conference, Vancouver, Canada.
Foster P. and Skehan P. (1999). The effect of source of planning and focus on planning on task-based performance. Language Teaching Research 3(3), 185-215.
Foster P. and Skehan P. (1996). The influence of planning on performance in task-based learning. Studies in Second Language Acquisition 18(3), 299-324.
Foster P., Tonkyn A. and Wigglesworth G. (2000). Measuring spoken language. Applied Linguistics 21(3), 354-375.
Hoey M. (1983). On the Surface of Discourse. London: George Allen and Unwin.
Kormos J. (2006). Speech Production and Second Language Acquisition. Mahwah, N.J.: Lawrence Erlbaum.
Levelt W.J. (1989). Speaking: From Intention to Articulation. Cambridge, MA: MIT Press.
MacWhinney B. (2000). The CHILDES Project: Tools for analysing talk: Volume 1: Transcription format and programs (3rd Edition). Mahwah, N.J.: Lawrence Erlbaum.
Malvern D. and Richards B. (2002). Investigating accommodation in language proficiency interviews using a new measure of lexical diversity. Language Testing 19(1), 85-104.
Meara P. and Bell H. (2001). P_Lex: A simple and effective way of describing the lexical characteristics of short L2 texts. Prospect 16(3), 5-19.
Miller G. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review 63, 81-97.
Robinson P. (2001). Task complexity, cognitive resources, and syllabus design: a triadic framework for examining task influences on SLA. In Robinson P. (Ed.), Cognition and Second Language Instruction (pp. 287-318). Cambridge: Cambridge University Press.
Skehan P. (1998). A Cognitive Approach to Language Learning. Oxford: Oxford University Press.
Skehan P. (2001). Tasks and language performance. In Bygate M., Skehan P. and Swain M. (Eds.), Researching Pedagogic Tasks: Second Language Learning, Teaching, and Testing. London: Longman.
Skehan P. (2003). Task-based instruction. Language Teaching 36(1), 1-14.
Skehan P. and Foster P. (1997). The influence of planning and post-task activities on accuracy and complexity in task-based learning. Language Teaching Research 1(3).
Skehan P. and Foster P. (1999). The influence of task structure and processing conditions on narrative retellings. Language Learning 49(1), 93-120.
Skehan P. and Foster P. (2001). Cognition and tasks. In Robinson P. (Ed.), Cognition and Second Language Instruction. New York: Cambridge University Press.
Tavakoli P. and Skehan P. (2005). Planning, task structure, and performance testing. In Ellis R. (Ed.), Planning and Task Performance in a Second Language. Amsterdam: John Benjamins.
Daller H., Van Hout R. and Treffers-Daller J. (2003). Lexical richness in the spontaneous speech of bilinguals. Applied Linguistics 24(2), 197-222.
Towell R., Hawkins R. and Bazergui N. (1996). The development of fluency in advanced learners of French. Applied Linguistics 17(1), 84-115.
Van Patten B. (1990). Attending to content and form in the input: an experiment in consciousness. Studies in Second Language Acquisition 12, 287-301.
Winter E. (1976). Fundamentals of information structure: a pilot manual for further development according to student.


THE COMPLEXITIES OF SELECTING COMPLEX (AND SIMPLE) FORMS IN INSTRUCTED SLA

Nina Spada & Yasuyo Tomita
University of Toronto, Canada

INTRODUCTION

In the instructed second language acquisition (SLA) literature there is a general consensus that instruction is beneficial for second language development (Ellis, 2001; Norris & Ortega, 2000; Spada, 1997). Several issues remain, however, including what types of knowledge and language abilities benefit most from instruction, whether particular types of instruction contribute more positively to L2 learning, and whether the benefits of instruction vary depending on the type of language feature targeted. The last question is the focus of this paper: whether simple and complex language features benefit differentially from instruction. We also address issues related to type of instruction and type of L2 learning outcomes.

The initial motivation for this research arose within the context of a research project investigating the contributions of two types of form-focused instruction (FFI) to L2 development: isolated and integrated FFI. Isolated FFI is defined as instruction that is provided in activities that are separate from the communicative use of language, but as part of a program that also includes communicative interaction and/or content-based instruction. Isolated FFI may be taught in preparation for a communicative activity or after it but is always separate from it. In integrated FFI, on the other hand, the learners' attention is drawn to language form during communicative or content-based instruction. The language features in focus may have been anticipated and planned for by the teacher or they may occur incidentally in the course of ongoing interaction (see Spada & Lightbown, in press, for a more detailed description of isolated and integrated FFI).

In selecting the language features to be targeted for integrated and isolated FFI, we wanted to include a complex and a simple feature in order to explore whether isolated or integrated FFI might contribute to the development of these features in different ways.
One could hypothesize, for example, that integrated FFI would be more beneficial for complex features and that isolated FFI would benefit simple features. This is based on claims in the SLA and the cognitive sciences literature that while easy rules can be taught, hard rules are by their very nature too complex to be successfully taught in isolation and thus difficult to learn through traditional explanation and practice pedagogy. They are thought to be best learned implicitly, embedded in meaning-based practice (Krashen, 1982, 1994; Reber, 1989). Others have claimed the opposite: that simple morphosyntactic rules are better learned under implicit conditions and that the learning of complex rules is better accomplished with explicit teaching (Hulstijn & de Graaff, 1994). The argument is that complex features are difficult to notice in naturally occurring input and that explicit instruction is necessary in order to help learners discover complex rules. Simple features, on the other hand, are easily available to be noticed (and subsequently learned) via input. While there is evidence to support the advantages of explicit instruction for complex features, there is also evidence that explicit instruction works equally well for simple features (de Graaff, 1997; Housen, Pierrard & Van Daele, 2005; Williams & Evans, 1998). Support for the claim that implicit-inductive instruction works best with simple rules is less clear (DeKeyser, 1995; Robinson, 1996).

In this paper we report on a meta-analysis carried out to investigate the effects of different types of instruction on complex and simple language features. The specific questions under investigation are:

1. Do simple and complex features benefit equally from explicit and implicit instruction in the short and long term?
2. Do explicit and implicit instruction lead to similar types of language ability for complex and simple forms?
3. Are there different learning outcomes for complex and simple forms depending on whether explicit/implicit instruction is provided in the classroom or laboratory?
4. Does length of explicit/implicit instruction make any difference in terms of learners' developing knowledge of complex and simple forms?

To explore these questions, we first reviewed the literature on how simple and complex features have been defined. We discovered that there is a clear lack of consensus about how they are conceptualized, and that they have been characterized inconsistently in empirical studies. Notwithstanding this lack of clarity, there appear to be two fundamentally different ways in which simple and complex features have been distinguished: in the psycholinguistic sense, as difficult to process and/or to learn, and in the linguistic sense, concerning the inherent complexity of a language feature (e.g. more marked/unmarked, more/fewer transformations). In the following quotation Housen, Pierrard and Van Daele (2005) outline some of the challenges that have resulted from previous efforts to define simple and complex language features.

Different studies use different criteria to distinguish between simple and complex structures. For example Krashen (1982) considers the 3rd person simple present -s marker in English as a formally simple structure because of its paradigmatic uniqueness while Ellis (1990) classifies it as formally complex because of the distance between the verb stem and the noun phrase with which it agrees. Both authors agree, however, that -s is a functionally simple structure. In contrast, DeKeyser (1998) considers -s to be functionally complex because of its highly syncretic nature, expressing several abstract grammatical functions simultaneously (present time, 3rd person, singular number). De Graaff (1997) operationalizes structure complexity as the total number of formal and functional grammatical criteria or features which determine the specific form and function of a given structure and which are essential for its effective noticing and processing. Yet another approach is exemplified by Robinson's (1996) study, where expert SLA teachers were asked to identify from a list of grammatical structures the ones they thought to be more difficult for their students. (Housen et al., 2005, p. 242)

In carrying out our research we were influenced by the criteria proposed by Hulstijn and de Graaff (1994) to distinguish simple and complex features. They propose that one way in which degree of complexity is determined is by the number of criteria to be applied in order to arrive at the correct form (p. 103). Their definition relies on linguistic transformation rules, which is similar to the way in which Celce-Murcia and Larsen-Freeman (1999) define linguistic complexity in their grammar book for second/foreign language teachers. By combining this feature of the Hulstijn and de Graaff framework with Celce-Murcia and Larsen-Freeman's criteria, we decided to use the number of linguistic transformation rules as the basis for distinguishing between simple and complex forms in our study. Table 1 provides an example of the number of transformations required to form a complex rule and a simple rule.
Table 1: Number of transformations: complex and simple rules.

Complex rule: Wh-question of an object of a preposition (Who did you talk to?)
    Wh-replacement (You [past] talk to who)
    Wh-fronting (Who you [past] talk to)
    Do support (Who you [past] do talk to)
    Subject/auxiliary inversion (Who [past] do you talk to)
    Affix attachment (Who [DO + past] you talk to)
    Morphological rules (Who did you talk to?)
    Fronting/leaving behind (To whom did you talk? / Who did you talk to?)

Simple rule: 3rd person singular possessive determiner
    NP [female] = her

Once we had established criteria for determining simple and complex forms, we began our review and synthesis of empirical studies investigating the effects of different types of instruction on specific language features. Below we outline the procedures carried out in the meta-analysis, including how the primary studies were searched, selected, and coded. This is followed by a description of how the effect sizes were calculated and the results of the meta-analysis.

2. METHODOLOGY

2.1 Data Collection

The data for the meta-analysis were collected first through an extensive online search of the instructed SLA literature. Norris and Ortega's (2000) examination of the studies published between 1980 and 1998 revealed that most studies of instructional effects on SLA were published after 1990. Therefore, only those studies published after 1990 were included in this meta-analysis. The Education Resources Information Center (ERIC), Scholars Portal, and Linguistics & Language Behavior Abstracts (LLBA) databases were selected as tools for the online search, and combinations of the following key words were utilized: English, instruction, treatment, grammar, form, acquisition, teaching, ESL, and EFL. An examination of the target grammar forms in the retrieved studies led to a further search through the same online databases, adding new key words such as: tense, past tense, possessive, determiner, article, plural, third person, dative, passive, pseudo cleft, relative, and question. In addition, the titles and abstracts of the following ten online retrievable journals were examined: Applied Linguistics, Canadian Modern Language Review, International Journal of Applied Linguistics, International Review of Applied Linguistics, Language and Education, Language Learning, Modern Language Journal, Second Language Research, Studies in Second Language Acquisition, and TESOL Quarterly. Finally, the references of the retrieved articles and the bibliography in Norris and Ortega's (2000) research synthesis were consulted along with some book chapters and journal articles. This literature search resulted in 103 published reports.

2.2 Criteria for Inclusion

The retrieved published reports were examined to determine whether they satisfied all of the following criteria for inclusion in the meta-analysis: (a) published between 1990 and 2006; (b) experimental or quasi-experimental design; (c) English grammar was the target form; (d) included comparisons of treatment and control/comparison groups and/or pretests and posttests; (e) included an instructional treatment; and (f) provided enough statistical information for computing effect sizes. Applying these criteria, 69 studies failed to meet them and were therefore excluded from the meta-analysis. It was often the case that a single publication reported the results of multiple comparisons, and 9 studies fell into this category. In some instances the same experimental study was reported in several publications. In cases like these we selected only the one that provided enough statistical information to calculate effect sizes. This procedure led to the exclusion of two more studies from the meta-analysis. Furthermore, there was one study that appeared in a bibliography but was not retrievable and another study that was retrievable but was not readable due to the poor condition of the PDF file. In the end, a total of 30 publications that included 41 separate studies were selected for the meta-analysis. Table 2 shows the publication dates of the 30 studies included in the meta-analysis.
Approximately ten studies examining the effects of instruction on the acquisition of specific English grammatical features were published in every five-year period from 1990 to 2004, whereas more recently, almost the same number of studies was published within a two-year period (2005 and 2006). Among the studies published in this two-year period, 90% were conducted in classroom contexts and both complex and simple forms were equally investigated. In contrast, the studies published between 1990 and 2004 were conducted more frequently in laboratory contexts (62.5%) with an emphasis on complex forms (70.8%). As with the Norris and Ortega (2000) meta-analysis, ours examines the effects of instruction on L2 learning. However, our meta-analysis is restricted to English grammar and focuses on the effects of instruction on simple and complex features. Using the criteria described above, the grammatical features characterized as simple were: tense, articles, plurals, prepositions, subject-verb inversions, possessive determiners, and participial adjectives. The features characterized as complex were: dative alternation, question formation, relativization, passives, and pseudo-cleft sentences. Thus, among the retrieved 41 study samples, 17 investigated simple forms, while 24 examined complex forms. APPENDIX A provides the detailed characteristics of the 30 studies included in the meta-analysis.
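The selection arithmetic in Section 2.2 can be checked with a minimal sketch. All counts are copied from the text; the variable names are illustrative only and are not part of the original study.

```python
# Counts as reported in Section 2.2 (copied from the text, not recomputed
# from the primary studies). Variable names are illustrative assumptions.
retrieved_reports = 103
failed_criteria = 69      # did not satisfy criteria (a)-(f)
duplicate_reports = 2     # same experiment reported in several publications
not_retrievable = 1       # listed in a bibliography but could not be obtained
unreadable_pdf = 1        # retrieved, but the PDF could not be read

included_reports = (retrieved_reports - failed_criteria - duplicate_reports
                    - not_retrievable - unreadable_pdf)

# Study samples by type of target feature, as reported in the text.
simple_samples = 17
complex_samples = 24
total_samples = simple_samples + complex_samples

print(included_reports, total_samples)
```

Both totals agree with the 30 publications and 41 study samples stated in the text.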
Table 2: Number of Study Reports and Publication Years.

| Year of publication | Number of study reports (a): Total | Complex: Class | Complex: Lab | Simple: Class | Simple: Lab |
|---|---|---|---|---|---|
| 2005-2006 | 10 | 4 | 1 | 5 | 0 |
| 2000-2004 | 7 | 0 | 4 | 2 | 1 |
| 1995-1999 | 10 | 2 | 5 | 2 | 1 |
| 1990-1994 | 7 | 2 | 4 | 1 | 0 |
| Total | | 8 | 14 | 10 | 2 |

a) Some study reports targeted both complex and simple linguistic features.
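The percentages quoted in Section 2.2 (90% classroom studies in 2005-2006; 62.5% laboratory studies and 70.8% complex forms in 1990-2004) can be reproduced from the cell counts of Table 2. The sketch below assumes the cells are read in the order complex/class, complex/lab, simple/class, simple/lab; the helper name `share` is hypothetical.

```python
# Cell counts of Table 2, read as
# (complex class, complex lab, simple class, simple lab) per period.
table2 = {
    "2005-2006": (4, 1, 5, 0),
    "2000-2004": (0, 4, 2, 1),
    "1995-1999": (2, 5, 2, 1),
    "1990-1994": (2, 4, 1, 0),
}

CLASS_COLS = (0, 2)    # column indices for classroom studies
LAB_COLS = (1, 3)      # column indices for laboratory studies
COMPLEX_COLS = (0, 1)  # column indices for complex forms


def share(periods, columns):
    """Percentage of the cell counts in `periods` that fall in `columns`."""
    total = sum(sum(table2[p]) for p in periods)
    part = sum(table2[p][c] for p in periods for c in columns)
    return 100 * part / total


recent = ["2005-2006"]
earlier = ["2000-2004", "1995-1999", "1990-1994"]

recent_class = share(recent, CLASS_COLS)
earlier_lab = share(earlier, LAB_COLS)
earlier_complex = share(earlier, COMPLEX_COLS)

print(round(recent_class, 1), round(earlier_lab, 1), round(earlier_complex, 1))
```

Under this reading the three shares round to 90.0, 62.5 and 70.8, which matches the figures given in the text.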

2.3 Coding

We began the coding with the independent variables. First, the low-inference features such as instructional contexts (i.e., classroom and laboratory settings) and the length of treatment were coded. Type of instruction was coded according to Norris and Ortega (2000). Instruction was considered to be explicit if it included linguistic rule explanation, and the learners' attention was mainly on forms. Instruction was coded as implicit if there was no rule explanation, or if students were not asked to directly draw their attention to forms.

The outcome measures were coded according to whether they were controlled or free constructed responses. This was motivated by the argument that while explicit instruction leads to monitored analyzed knowledge, implicit instruction is more likely to lead to unanalyzed, spontaneous knowledge (Krashen, 1982; Schwartz, 1993; Truscott, 1999). In their meta-analysis Norris and Ortega (2000) categorized outcome measures into four types: (a) metalinguistic judgments; (b) selected responses; (c) constrained constructed responses; and (d) free constructed responses. Metalinguistic judgments require learners to judge the grammaticality of a sentence, while selected responses ask learners to select a correct answer in the form of multiple-choice questions. Constrained constructed responses ask learners to produce language varying from one word to a complete sentence. Free constructed responses are unrestricted in terms of form and have meaning as the primary goal. Due to the smaller number of studies in our meta-analysis, we decided to collapse these categories into two types: controlled and free constructed tasks.

The overall characteristics of the studies (n = 41) are shown in Table 3. Although all studies reported sample sizes, some did not provide precise information about the length of instructional treatment (see APPENDIX B for the characteristics of each study). In addition, several studies did not provide exact information about the testing schedule (i.e. pretest and immediate/delayed posttest). Our examination of the narrative descriptions in these sample studies led us to interpret "one lesson" as one hour of treatment (n = 4). Descriptions of the timing of tests such as "within one week after the treatment" were coded as 3.5 days (n = 10) and "on the first day after the treatment" as two days (n = 2).
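The coding decisions described in Section 2.3 can be summarized as a small rule set. This is a sketch only: the mapping of Norris and Ortega's first three outcome types onto "controlled" is an assumption (the text says only that four types were collapsed into two), and all names below are hypothetical.

```python
# Assumption (not stated in the paper): the three non-free outcome types of
# Norris and Ortega (2000) all collapse into "controlled".
OUTCOME_COLLAPSE = {
    "metalinguistic judgment": "controlled",
    "selected response": "controlled",
    "constrained constructed response": "controlled",
    "free constructed response": "free",
}

# Day values the text reports for vague descriptions of test timing.
TIMING_DAYS = {
    "within one week after the treatment": 3.5,
    "on the first day after the treatment": 2.0,
}

HOURS_PER_LESSON = 1.0  # "one lesson" was interpreted as one hour of treatment


def code_instruction(rule_explanation: bool, attention_on_forms: bool) -> str:
    """Explicit only if rules are explained and attention is mainly on forms."""
    return "explicit" if rule_explanation and attention_on_forms else "implicit"


def code_outcome(measure_type: str) -> str:
    """Collapse a four-way outcome type into controlled vs. free."""
    return OUTCOME_COLLAPSE[measure_type]


def treatment_hours(lessons: int) -> float:
    """Convert a number of lessons into hours of treatment."""
    return lessons * HOURS_PER_LESSON


example = {
    "instruction": code_instruction(rule_explanation=True, attention_on_forms=True),
    "outcome": code_outcome("selected response"),
    "posttest_days": TIMING_DAYS["within one week after the treatment"],
    "hours": treatment_hours(3),
}
print(example)
```

Keeping the rules in lookup tables makes each coding assumption visible and easy to revise if the original coding sheet differs.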
Table 3: Overall Study Characteristics.

| Characteristics | Mean | SD | Min | Max | Mode(s) | n (a) |
|---|---|---|---|---|---|---|
| Sample size | 58.66 | 37.533 | 8 | 160 | 34 | 41 |
| Treatment length (hours) | 2.99 | 2.69 | 0.33 | 9 | 0.33, 1.5, 6 | 40 |
| Immediate posttest (days) | 4.89 | 7.52 | 0 | 28 | 0 | 41 |
| First delayed posttest (weeks) | 4.00 | 3.62 | 1 | 16 | 2 | 17 |
| Second delayed posttest (weeks) | 5 | 2.31 | 3 | 7 | 3, 7 | 4 |

a) Number of sample studies.

2.4 Meta-Analysis


To investigate the effects of instruction on the acquisition of simple and complex forms, we calculated the effect sizes for each study. There are several types of effect size, such as Cohen's d, Hedges' g, Pearson's r, and Glass's delta (Lipsey & Wilson, 2001).[ii] Norris and Ortega (2006) suggest that Cohen's d is the most appropriate effect size when comparing treatment and control groups. Thus we calculated Cohen's d for each study in this meta-analysis. Since most of the collected studies did not report Cohen's d, we calculated it based on the original data in three ways. First, if the original study provided means and standard deviations, Cohen's d was calculated as in (1), where M stands for the mean and SD for the standard deviation of each group (adapted from Vacha-Haase & Thompson, 2004, p. 474):

$$ d = \frac{M_1 - M_2}{\sqrt{\left(SD_1^2 + SD_2^2\right)/2}} \tag{1} $$
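Formula (1) can be sketched in a few lines of Python. The function name and the example scores are hypothetical and serve only to show the calculation.

```python
import math


def cohens_d_from_means(m1: float, sd1: float, m2: float, sd2: float) -> float:
    """Cohen's d as in (1): mean difference over the root mean square of the SDs."""
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (m1 - m2) / pooled_sd


# Hypothetical posttest scores: treatment M = 75, SD = 10; control M = 65, SD = 10.
d_example = cohens_d_from_means(75.0, 10.0, 65.0, 10.0)
print(d_example)
```

With equal standard deviations the denominator reduces to that common SD, so the hypothetical example yields a difference of exactly one standard deviation.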

Second, if the original study reported F values and the sample size, Cohen's d was computed as in (2), where n is the sample size of each group (adapted from Keck, Iberri-Shea, Tracy-Ventura, & Wa-Mbaleka, 2006, p. 106):

$$ d = \sqrt{\frac{F\,(n_1 + n_2)}{n_1 n_2}} \tag{2} $$
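A minimal sketch of formula (2) follows; the function name and input values are hypothetical. Note that the square root returns only the magnitude of d, so the direction of the effect has to be taken from the reported group means.

```python
import math


def cohens_d_from_f(f_value: float, n1: int, n2: int) -> float:
    """Cohen's d as in (2), from an F value for a two-group comparison."""
    return math.sqrt(f_value * (n1 + n2) / (n1 * n2))


# Hypothetical report: F = 4.0 with 20 learners in each group.
d_from_f = cohens_d_from_f(4.0, 20, 20)
print(round(d_from_f, 3))
```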

Third, if the original study reported only the percentage of participants who experienced improvement, Cohen's d was calculated based on arcsine transformations (adapted from Keck, Iberri-Shea, Tracy-Ventura, & Wa-Mbaleka, 2006, p. 106; Lipsey and Wilson, 2001, p. 188 and p. 204):

$$ d = \text{arcsine}_{\text{treatment}} - \text{arcsine}_{\text{control}} \tag{3} $$
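Formula (3) can be sketched as follows, assuming that the "arcsine" value of a proportion p is the usual transformation 2·arcsin(√p); the paper does not spell out the transformation, so this is an assumption, and the function names and proportions are hypothetical.

```python
import math


def arcsine_transform(p: float) -> float:
    """Arcsine-transformed proportion, assumed here to be 2 * arcsin(sqrt(p))."""
    return 2 * math.asin(math.sqrt(p))


def d_from_proportions(p_treatment: float, p_control: float) -> float:
    """Effect size as in (3): difference of the transformed proportions."""
    return arcsine_transform(p_treatment) - arcsine_transform(p_control)


# Hypothetical report: 70% of the treatment group improved vs. 50% of controls.
d_prop = d_from_proportions(0.70, 0.50)
print(round(d_prop, 2))
```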

Cohen's d was computed to examine between- and within-group comparisons. First, the effect size was calculated by comparing treatment groups and control/comparison groups at the immediate posttest to investigate the effects of instruction (i.e., implicit and explicit instruction) in relation to linguistic features (i.e., complex and simple features), contexts (i.e., classroom and laboratory), and outcome measures (i.e., controlled and free tasks). Second, to examine the durability of the instruction, effect sizes were calculated for delayed posttests (i.e., first and second delayed posttests). However, because only a small number of studies conducted second delayed posttests, these are not included in this meta-analysis. Third, to examine the effects of instruction observed within a group, the immediate posttest scores and pretest scores were compared within the same group to compute the effect size. This included control/comparison groups, which allowed us to investigate the growth observed within them. We calculated one effect size for each treatment group by averaging all effect sizes gained from dependent variables for the group. Because a central question motivating this research is whether instructional effectiveness is likely to vary depending on the grammatical feature, we considered each grammatical form as an independent variable. Such multiple effect sizes based on the same sample are non-independent observations (Lipsey & Wilson, 2001; Norris & Ortega, 2006). However, as Keck et al. (2006) suggest, as long as only descriptive statistics (i.e. no inferential testing) are carried out, the non-independence of observations does not present a problem for the meta-analysis.


Before comparing effect sizes, we followed Hedges' (1981) and Lipsey and Wilson's (2001) suggestions and calculated unbiased and weighted effect sizes for each treatment group. As Hedges (1981) points out, the standardized mean difference effect size tends to be upwardly biased when the sample size is small. Therefore, he provides a formula for correcting effect sizes as in (4), where d' is the unbiased effect size, N is the total sample size, and d is the biased or raw effect size (adapted from Hedges, 1981, p. 114; Lipsey & Wilson, 2001, p. 49):

$$ d' = \left[1 - \frac{3}{4N - 9}\right] d \tag{4} $$
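The small-sample correction in (4) can be sketched as below. The function name and the example values are hypothetical.

```python
def unbiased_d(d: float, n_total: int) -> float:
    """Small-sample correction as in (4): shrink the raw d by 1 - 3/(4N - 9)."""
    return (1 - 3 / (4 * n_total - 9)) * d


# Hypothetical study: raw d = 1.0 with a total of N = 20 participants.
d_small = unbiased_d(1.0, 20)
# The same raw d with N = 200 is hardly changed by the correction.
d_large = unbiased_d(1.0, 200)
print(round(d_small, 3), round(d_large, 3))
```

The comparison of the two hypothetical cases shows why the correction matters mainly for small studies: the shrinkage factor approaches 1 as N grows.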

In order to compute the weighted effect size, we calculated the inverse variance weight as in (5), where d' is the unbiased effect size from (4) (adapted from Lipsey and Wilson, 2001, p. 49):

$$ w = \frac{2 n_1 n_2 (n_1 + n_2)}{2 (n_1 + n_2)^2 + n_1 n_2 d'^2} \tag{5} $$

where w is the inverse variance weight. Finally, the weighted mean effect size, the standard error of the mean effect size (SE), and lower and upper 95% confidence intervals (CI) were calculated as in (6), (7), and (8) respectively, where $\bar{d}$ denotes the weighted mean effect size and the sums run over the unbiased effect sizes being combined (adapted from Lipsey and Wilson, 2001):

$$ \bar{d} = \frac{\sum w\, d'}{\sum w} \tag{6} $$

$$ SE = \sqrt{\frac{1}{\sum w}} \tag{7} $$

$$ CI = \bar{d} \pm 1.96\, SE \tag{8} $$
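Formulas (5) to (8) can be combined into one short routine. This is a sketch under the assumption that each treatment group contributes a pair of group sizes and one unbiased effect size; the function names and the three example groups are hypothetical, not data from the meta-analysis.

```python
import math


def inverse_variance_weight(n1: int, n2: int, d: float) -> float:
    """Inverse variance weight as in (5) for one unbiased effect size d."""
    return (2 * n1 * n2 * (n1 + n2)) / (2 * (n1 + n2) ** 2 + n1 * n2 * d ** 2)


def weighted_summary(groups):
    """Return (weighted mean d, SE, (CI lower, CI upper)) as in (6)-(8).

    `groups` is a list of (n1, n2, unbiased d) tuples.
    """
    weights = [inverse_variance_weight(n1, n2, d) for n1, n2, d in groups]
    sum_w = sum(weights)
    mean_d = sum(w * d for w, (_, _, d) in zip(weights, groups)) / sum_w
    se = math.sqrt(1 / sum_w)
    return mean_d, se, (mean_d - 1.96 * se, mean_d + 1.96 * se)


# Hypothetical treatment groups: (n treatment, n control, unbiased d).
hypothetical_groups = [(20, 20, 0.8), (15, 15, 0.4), (30, 30, 1.0)]
mean_d, se, (ci_lower, ci_upper) = weighted_summary(hypothetical_groups)
print(round(mean_d, 2), round(se, 2), round(ci_lower, 2), round(ci_upper, 2))
```

Larger groups receive larger weights, so they pull the weighted mean towards their own effect size and narrow the confidence interval.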

In the results section, we use d to refer to the weighted mean effect size based on unbiased effect sizes.

3. RESULTS

Table 4 and Figure 1 display the mean effect sizes and confidence intervals for explicit and implicit instruction on the acquisition of complex and simple language features in the classroom and lab settings. Table 4 indicates that the mean effect sizes for explicit instruction are larger than those for implicit instruction.[iii] Explicit instruction was also more effective in the classroom than in the lab. The largest effect sizes are for explicit instruction for both complex and simple forms in the classroom (d = 1.00 for complex forms and d = 0.81 for simple forms). All the effect sizes for implicit instruction are small, with minor differences between the lab and the classroom. The smallest effect sizes for implicit and explicit instruction are for simple forms in laboratory settings (d = 0.27 for explicit instruction and d = 0.22 for implicit instruction). It should be noted, however, that both include zero in their 95% confidence intervals, indicating that the results could simply be due to chance. This is likely related to small sample sizes in both cases. As Figure 1 displays, the confidence intervals for explicit and implicit instruction rarely overlap, indicating that the difference between explicit and implicit instruction is clear.[iv]
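Table 4: Type of Instruction and Language Feature.

| Instruction | Feature | Setting | n (a) | k (b) | Mean d (weighted) | SE | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|---|---|---|
| Explicit | Complex forms | Classroom | 7 | 10 | 1.00 | 0.08 | 0.84 | 1.16 |
| Explicit | Complex forms | Laboratory | 8 | 14 | 0.71 | 0.10 | 0.51 | 0.90 |
| Explicit | Simple forms | Classroom | 12 | 18 | 0.81 | 0.08 | 0.65 | 0.98 |
| Explicit | Simple forms | Laboratory | 1 | 2 | 0.27 | 0.20 | -0.12 | 0.66 |
| Implicit | Complex forms | Classroom | 4 | 4 | 0.29 | 0.11 | 0.09 | 0.50 |
| Implicit | Complex forms | Laboratory | 10 | 21 | 0.44 | 0.09 | 0.27 | 0.61 |
| Implicit | Simple forms | Classroom | 7 | 7 | 0.38 | 0.12 | 0.15 | 0.62 |
| Implicit | Simple forms | Laboratory | 3 | 4 | 0.22 | 0.21 | -0.19 | 0.64 |

a) Number of sample studies (e.g., a study report may include multiple sample studies) contributing to the meta-analysis. b) Number of treatment groups contributing to the meta-analysis.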
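The observation that only the two laboratory conditions for simple forms have confidence intervals that include zero can be checked from the rounded means and standard errors in Table 4. This is a sketch: because it starts from rounded values, the recomputed bounds may differ from the published ones in the second decimal place, and the labels and helper name are illustrative.

```python
# (condition, weighted mean d, SE) as reported in Table 4.
table4 = [
    ("explicit, complex, classroom", 1.00, 0.08),
    ("explicit, complex, laboratory", 0.71, 0.10),
    ("explicit, simple, classroom", 0.81, 0.08),
    ("explicit, simple, laboratory", 0.27, 0.20),
    ("implicit, complex, classroom", 0.29, 0.11),
    ("implicit, complex, laboratory", 0.44, 0.09),
    ("implicit, simple, classroom", 0.38, 0.12),
    ("implicit, simple, laboratory", 0.22, 0.21),
]


def ci95(d: float, se: float) -> tuple:
    """95% confidence interval: d plus or minus 1.96 standard errors."""
    return d - 1.96 * se, d + 1.96 * se


# Conditions whose interval contains zero, i.e. effects that may be due to chance.
includes_zero = [
    label for label, d, se in table4
    if ci95(d, se)[0] <= 0 <= ci95(d, se)[1]
]
print(includes_zero)
```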

Table 5 and Figure 2 present the effect sizes representing the magnitude of change within groups from pretest to posttest and the 95% confidence intervals. A similar pattern exists in that the largest effect sizes are for explicit instruction with complex and simple forms. The effect sizes for implicit instruction are either medium or small and, as expected, the smallest effect sizes are for control groups in both instructional settings. Similar to Norris and Ortega (2000), our results show evidence of some growth in the control group for learning simple forms (d = 0.28 and d = 0.29 in classroom and laboratory settings, respectively) but these effect sizes are small. Figure 2 reveals that among the control groups, only the effect size for simple forms in classroom settings is trustworthy (its confidence interval excludes zero), which may be a reflection of the natural maturation of simple forms via classroom input. It does not appear to make much of a difference whether explicit instruction is provided in the classroom or lab. The effect sizes are equally strong for both language features. The results indicate that there is a slight advantage for the implicit teaching of simple forms in the lab, but these effect sizes are all medium or low.

Figure 1. Mean effect sizes and 95% confidence intervals: Types of instruction and language feature. [Plot: effect size (d) by instructional treatment.]

Table 5: Effect Sizes: Pretest to Posttest.

| Group | Forms | Setting | n (a) | k (b) | Mean d (weighted) | SE | 95% CI Lower | 95% CI Upper |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Explicit | Complex | Classroom | 5 | 6 | 0.74 | 0.09 | 0.57 | 0.91 |
| Explicit | Complex | Laboratory | 6 | 10 | 1.08 | 0.13 | 0.82 | 1.34 |
| Explicit | Simple | Classroom | 12 | 18 | 0.88 | 0.08 | 0.72 | 1.05 |
| Explicit | Simple | Laboratory | -- | -- | -- | -- | -- | -- |
| Implicit | Complex | Classroom | 2 | 2 | 0.15 | 0.11 | -0.07 | 0.38 |
| Implicit | Complex | Laboratory | 6 | 14 | 0.40 | 0.11 | 0.19 | 0.61 |
| Implicit | Simple | Classroom | 4 | 4 | 0.64 | 0.16 | 0.33 | 1.00 |
| Implicit | Simple | Laboratory | 2 | 3 | 0.72 | 0.32 | 0.10 | 1.35 |
| Control | Complex | Classroom | 3 | 3 | -0.03 | 0.17 | -0.37 | 0.31 |
| Control | Complex | Laboratory | 5 | 5 | 0.07 | 0.18 | -0.27 | 0.42 |
| Control | Simple | Classroom | 11 | 11 | 0.28 | 0.10 | 0.09 | 0.48 |
| Control | Simple | Laboratory | 2 | 2 | 0.29 | 0.43 | -0.55 | 1.13 |

a) Number of sample studies (e.g., a study report may include multiple sample studies) contributing to the meta-analysis.
b) Number of treatment groups contributing to the meta-analysis.
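The judgement that only one control-group effect size is trustworthy can be checked from the reported values. The sketch below assumes a normal-approximation interval, d ± 1.96 × SE, which agrees with the bounds printed in Table 5 to within rounding; the function names are illustrative.

```python
def ci95(d, se, z=1.96):
    """Normal-approximation 95% confidence interval around an effect size."""
    return (d - z * se, d + z * se)


def excludes_zero(d, se):
    """True when the whole 95% interval lies on one side of zero."""
    lower, upper = ci95(d, se)
    return lower > 0 or upper < 0


# (label, mean d, SE) for the control-group rows of Table 5
control_rows = [
    ("Control, complex, classroom", -0.03, 0.17),
    ("Control, complex, laboratory", 0.07, 0.18),
    ("Control, simple, classroom", 0.28, 0.10),
    ("Control, simple, laboratory", 0.29, 0.43),
]

for label, d, se in control_rows:
    lower, upper = ci95(d, se)
    verdict = "excludes zero" if excludes_zero(d, se) else "includes zero"
    print(f"{label}: [{lower:.2f}, {upper:.2f}] {verdict}")
```

Although the two simple-form control groups have nearly identical mean effect sizes, the laboratory estimate rests on two studies and has a much larger standard error, so only the classroom interval stays above zero.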

Overall these results suggest a more positive role for explicit instruction and furthermore, that explicit instruction works better than implicit instruction for both simple and complex features in the classroom and laboratory. However, this analysis does not take into account the effects of instruction over time. These results are presented below.
Figure 2. Mean effect sizes and confidence intervals: Pretest to posttest. [Plot: effect size (d) by instructional treatment, including control groups.]

Table 6 and Figure 3 present mean effect sizes and 95% confidence intervals for delayed posttests. Table 6 shows that not all studies reported the results of delayed posttests with sufficient statistical information to calculate effect sizes. Nonetheless, of those studies that did, the effect sizes for explicit instruction of both complex and simple forms are once again larger than those for implicit instruction with one exception - implicit instruction of simple forms in the laboratory (d = 1.03). However, there is only one study represented in this category so no conclusions can be drawn. Figure 3 displays the effect sizes at two different times: immediate and delayed posttests. Three of the four explicit and implicit groups increase at the delayed posttest and the overall increase from the pretest is greatest with the explicit groups with one exception as noted above.
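Table 6: Effect Sizes: Delayed Posttests.

| Instruction | Forms | Setting | n (a) | k (b) | Mean d (weighted) | SE | 95% CI Lower | 95% CI Upper |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Explicit | Complex | Classroom | 2 | 2 | 1.14 | 0.10 | 0.93 | 1.34 |
| Explicit | Complex | Laboratory | 5 | 7 | 0.94 | 0.14 | 0.66 | 1.21 |
| Explicit | Simple | Classroom | 3 | 3 | 1.01 | 0.18 | 0.67 | 1.36 |
| Explicit | Simple | Laboratory | -- | -- | -- | -- | -- | -- |
| Implicit | Complex | Classroom | -- | -- | -- | -- | -- | -- |
| Implicit | Complex | Laboratory | 5 | 10 | 0.56 | 0.13 | 0.32 | 0.81 |
| Implicit | Simple | Classroom | 4 | 4 | 0.48 | 0.14 | 0.20 | 0.76 |
| Implicit | Simple | Laboratory | 1 | 1 | 1.03 | 0.62 | -0.20 | 2.25 |

a) Number of sample studies (e.g., a study report may include multiple sample studies) contributing to the meta-analysis.
b) Number of treatment groups contributing to the meta-analysis.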

Figure 3: Mean effect sizes of posttests: Immediate and delayed posttests. [Plot: effect size (d) by instructional treatment; series: immediate, delayed.]

Table 7 and Figure 4 present the mean effect sizes and 95% confidence intervals for types of outcome measures (i.e., controlled and free outcome measures) in relation to explicit instruction on complex/simple forms in classroom/laboratory settings. There are several large effect sizes in this table. The largest are for performance on controlled outcome measures in classroom studies of explicit instruction, for both complex forms (d = 0.93) and simple forms (d = 0.85). Regarding student performance on free outcome measures in classrooms, the effect sizes are small for simple forms (d = 0.39) and medium for complex forms (d = 0.71). These results are consistent with those obtained in Norris and Ortega (2000), in which free outcome measures had smaller effect sizes than controlled tasks in relation to form-focused instruction.
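Table 7: Effect Sizes for Explicit Instruction: Controlled and Free Outcome Measures.

| Forms | Setting | Outcome measure | n (a) | k (b) | Mean d (weighted) | SE | 95% CI Lower | 95% CI Upper |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Complex | Classroom (c) | (all) | 7 | 10 | 1.00 | 0.08 | 0.84 | 1.16 |
| Complex | Classroom | Free | 3 | 4 | 0.71 | 0.14 | 0.44 | 0.99 |
| Complex | Classroom | Controlled | 7 | 8 | 0.93 | 0.10 | 0.76 | 1.11 |
| Complex | Laboratory | (all) | 8 | 14 | 0.71 | 0.10 | 0.51 | 0.90 |
| Complex | Laboratory | Free | 1 | 1 | 0 | 0.63 | -1.24 | 1.24 |
| Complex | Laboratory | Controlled | 8 | 14 | 0.71 | 0.10 | 0.51 | 0.90 |
| Simple | Classroom | (all) | 12 | 18 | 0.81 | 0.08 | 0.65 | 0.98 |
| Simple | Classroom | Free | 4 | 9 | 0.39 | 0.11 | 0.18 | 0.61 |
| Simple | Classroom | Controlled | 6 | 9 | 0.85 | 0.13 | 0.60 | 1.11 |
| Simple | Laboratory | (all) | 1 | 2 | 0.27 | 0.20 | -0.12 | 0.66 |
| Simple | Laboratory | Free | 0 | 0 | -- | -- | -- | -- |
| Simple | Laboratory | Controlled | 1 | 1 | 0.55 | 0.28 | -0.00 | 1.11 |

a) Number of sample studies (e.g., a study report may include multiple sample studies) contributing to the meta-analysis.
b) Number of treatment groups contributing to the meta-analysis.
c) Some studies combined the scores on free and controlled tasks. These scores are not included in this table. Only those studies that reported separate scores for free and controlled tasks contributed to the meta-analysis.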

Figure 4: Mean effect sizes and confidence intervals for explicit instruction: Controlled and free outcome measures. [Plot: effect size (d) by instructional treatment (explicit instruction).]

Table 8 and Figure 5 present the mean effect sizes and 95% confidence intervals for types of outcome measures in relation to the implicit instruction of complex and simple forms. The largest effect size (d = 2.16) was obtained in classroom studies using implicit instruction to teach complex features and measuring learners' progress with controlled tasks. Figure 5 shows that while the confidence intervals of most instructional treatment categories overlap with each other, that of the largest effect size has little or no overlap with the others, which strengthens this finding. The lowest effect size (d = -0.04) was obtained when free tasks were used to measure learners' knowledge in classroom studies teaching complex features implicitly. This result contradicts the claim that the effects of implicit instruction are more likely to be reflected in implicit outcome measures; however, there were only three study comparisons in this analysis. More in keeping with conventional wisdom, the second largest effect size (d = 1.16) was obtained when free measures were used to examine the effects of implicit instruction on simple forms in the laboratory context. However, there was only one study contributing to this analysis.
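Table 8: Effect Sizes for Implicit Instruction: Controlled and Free Outcome Measures.

| Forms | Setting | Outcome measure | n (a) | k (b) | Mean d (weighted) | SE | 95% CI Lower | 95% CI Upper |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Complex | Classroom (c) | (all) | 4 | 4 | 0.29 | 0.11 | 0.09 | 0.50 |
| Complex | Classroom | Free | 3 | 3 | -0.04 | 0.11 | -0.25 | 0.18 |
| Complex | Classroom | Controlled | 1 | 1 | 2.16 | 0.54 | 1.11 | 3.21 |
| Complex | Laboratory | (all) | 10 | 21 | 0.44 | 0.09 | 0.27 | 0.61 |
| Complex | Laboratory | Free | 5 | 12 | 0.56 | 0.14 | 0.28 | 0.83 |
| Complex | Laboratory | Controlled | 9 | 13 | 0.35 | 0.10 | 0.16 | 0.54 |
| Simple | Classroom | (all) | 7 | 7 | 0.38 | 0.12 | 0.15 | 0.62 |
| Simple | Classroom | Free | 3 | 3 | 0.52 | 0.19 | 0.14 | 0.90 |
| Simple | Classroom | Controlled | 2 | 2 | 0.17 | 0.30 | -0.42 | 0.77 |
| Simple | Laboratory | (all) | 3 | 4 | 0.22 | 0.21 | -0.19 | 0.64 |
| Simple | Laboratory | Free | 1 | 1 | 1.16 | 0.76 | -0.34 | 2.66 |
| Simple | Laboratory | Controlled | 0 | 0 | -- | -- | -- | -- |

a) Number of sample studies (e.g., a study report may include multiple sample studies) contributing to the meta-analysis.
b) Number of treatment groups contributing to the meta-analysis.
c) Some studies combined the scores on free and controlled tasks. These scores are not included in this table. Only those studies that reported separate scores for free and controlled tasks contributed to the meta-analysis.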
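The remark about overlapping confidence intervals can be made concrete with the bounds reported in Table 8. The sketch below is illustrative only: the helper name is hypothetical, and comparing intervals for overlap is a rough visual heuristic rather than a formal significance test.

```python
def intervals_overlap(a, b):
    """True if the closed intervals a = (low, high) and b = (low, high) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


# 95% confidence intervals (lower, upper) as reported in Table 8
largest = (1.11, 3.21)  # complex forms, classroom, controlled measures (d = 2.16)
others = {
    "complex, classroom, free": (-0.25, 0.18),
    "complex, laboratory, free": (0.28, 0.83),
    "complex, laboratory, controlled": (0.16, 0.54),
    "simple, classroom, free": (0.14, 0.90),
    "simple, classroom, controlled": (-0.42, 0.77),
    "simple, laboratory, free": (-0.34, 2.66),
}

overlapping = [name for name, ci in others.items() if intervals_overlap(largest, ci)]
print(overlapping)
```

With these reported bounds, the only interval reaching into the range of the largest effect size is the very wide one for simple forms measured by free tasks in the laboratory, which itself rests on a single study.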

Figure 5. Mean effect sizes and confidence intervals for implicit instruction: Controlled and free outcome measures. [Plot: effect size (d) by instructional treatment (implicit instruction).]

Table 9 and Figure 6 show mean effect sizes and 95% confidence intervals for length of treatment in the two instructional environments (classroom versus laboratory). Many of the implicit instruction categories include zero in their 95% confidence intervals. Since this makes comparisons less trustworthy, Figure 6 shows only categories whose intervals do not include zero.

It should be noted that all of the explicit instruction for complex forms in the classroom regardless of length showed large effect sizes ranging from 1.03 to 3.14. While the effect sizes for explicit instruction of simple forms in the classroom were not as large, they were substantial. The effect sizes for implicit instruction of complex forms in the classroom suggested that a shorter length of time is better but there did not seem to be any differences regarding length of time spent on implicit instruction of complex forms in the laboratory setting. This was similar for the implicit teaching of complex forms in the laboratory. Although laboratory studies teaching simple forms implicitly for a longer period of time produced the largest effect (d = 1.16), there was only one study in this category.

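Figure 6: Mean effect sizes: Length of treatment: Brief (< 1h), Short (1h ≤ x < 3h), Medium (3h ≤ x < 7h), and Long (≥ 7h). [Plot: effect size (d) by instructional treatment; series: brief, short, medium, long.]

Table 9: Length of treatment: Brief (< 1h), Short (1h ≤ x < 3h), Medium (3h ≤ x < 7h), Long (≥ 7h).

| Instruction | Forms | Setting | Length | n (a) | k (b) | Mean d (weighted) | SE | 95% CI Lower | 95% CI Upper |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Explicit | Complex | Classroom | Brief | 1 | 1 | 1.03 | 0.48 | 0.10 | 2.00 |
| Explicit | Complex | Classroom | Short | 1 | 3 | 3.14 | 0.42 | 2.30 | 4.00 |
| Explicit | Complex | Classroom | Medium | 2 | 3 | 1.41 | 0.15 | 1.12 | 1.70 |
| Explicit | Complex | Classroom | Long | 2 | 2 | 1.18 | 0.21 | 0.77 | 1.58 |
| Explicit | Complex | Laboratory | Brief | 4 | 8 | 0.77 | 0.13 | 0.52 | 1.02 |
| Explicit | Complex | Laboratory | Short | 3 | 5 | 0.57 | 0.16 | 0.26 | 0.88 |
| Explicit | Complex | Laboratory | Medium | 1 | 1 | 1.27 | 0.63 | 0.04 | 2.50 |
| Explicit | Simple | Classroom | Brief | 3 | 6 | 0.53 | 0.14 | 0.24 | 0.81 |
| Explicit | Simple | Classroom | Short | 4 | 4 | 1.08 | 0.16 | 0.77 | 1.40 |
| Explicit | Simple | Classroom | Medium | 4 | 7 | 0.81 | 0.14 | 0.55 | 1.10 |
| Explicit | Simple | Classroom | Long | 1 | 1 | 1.52 | 0.48 | 0.58 | 2.47 |
| Explicit | Simple | Laboratory | Brief | 1 | 2 | 0.27 | 0.20 | -0.12 | 0.66 |
| Implicit | Complex | Classroom | Brief | 1 | 1 | 1.00 | 0.39 | 0.24 | 1.76 |
| Implicit | Complex | Classroom | Short | 1 | 1 | 0.75 | 0.39 | -0.02 | 1.52 |
| Implicit | Complex | Classroom | Long | 2 | 2 | 0.19 | 0.11 | -0.03 | 0.42 |
| Implicit | Complex | Laboratory | Brief | 4 | 8 | 0.34 | 0.15 | 0.05 | 0.62 |
| Implicit | Complex | Laboratory | Short | 6 | 11 | 0.56 | 0.12 | 0.32 | 0.80 |
| Implicit | Complex | Laboratory | Medium | 1 | 4 | 0.33 | 0.20 | -0.06 | 0.72 |
| Implicit | Simple | Classroom | Short | 5 | 5 | 0.33 | 0.15 | 0.05 | 0.62 |
| Implicit | Simple | Classroom | Medium | 1 | 1 | 0.70 | 0.27 | 0.18 | 1.22 |
| Implicit | Simple | Classroom | Long | 1 | 1 | 0.02 | 0.43 | -0.81 | 0.86 |
| Implicit | Simple | Laboratory | Brief | 1 | 1 | -0.06 | 0.28 | -0.60 | 0.48 |
| Implicit | Simple | Laboratory | Short | 1 | 2 | 0.50 | 0.37 | -0.22 | 1.12 |
| Implicit | Simple | Laboratory | Long | 1 | 1 | 1.16 | 0.76 | -0.34 | 2.66 |

a) Number of sample studies (e.g., a study report may include multiple sample studies) contributing to the meta-analysis.
b) Number of treatment groups contributing to the meta-analysis.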
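The four length-of-treatment categories can be expressed as a small coding function. This is a sketch only: the function name is hypothetical, and the boundary handling (each lower bound inclusive, each upper bound exclusive) is an assumption, since the comparison symbols in the table caption are only partly legible in the source.

```python
def treatment_length_category(hours):
    """Code a treatment duration in hours as Brief, Short, Medium, or Long."""
    if hours < 0:
        raise ValueError("treatment length cannot be negative")
    if hours < 1:
        return "Brief"   # less than 1 hour
    if hours < 3:
        return "Short"   # 1 hour up to, but not including, 3 hours
    if hours < 7:
        return "Medium"  # 3 hours up to, but not including, 7 hours
    return "Long"        # 7 hours or more


# Hypothetical treatment durations in hours
for hours in (0.5, 1, 2.5, 3, 6.9, 7, 20):
    print(hours, treatment_length_category(hours))
```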

4. DISCUSSION AND CONCLUSION

The results of the meta-analysis are discussed with reference to the four research questions outlined earlier, beginning with the first question:

Do simple and complex features benefit equally from explicit and implicit instruction in the short and long term?

The results of this meta-analysis indicate that the effect sizes for explicit instruction are consistently larger than those for implicit instruction. Explicit instruction was found to be beneficial for both simple and complex forms in both classroom and laboratory settings. These results support those of Norris & Ortega (2000) in terms of advantages for explicit over implicit instruction. The findings are also consistent with Robinson (1996), who reported benefits for explicit instruction with complex features, and with DeKeyser (1995) and Williams & Evans (1998), who reported advantages of explicit instruction for simple features (although not with complex features). Furthermore, these findings are similar to other research in which explicit instruction has been found to be equally beneficial for simple and complex features (de Graaff, 1997; Housen, Pierrard & Van Daele, 2005).

Do explicit and implicit instruction lead to similar types of language ability for complex and simple forms?

The results of this meta-analysis do not permit a clear answer to this question, mainly due to the small number of studies in some of the categories. Nonetheless, the effect sizes were highest for performance on controlled measures after explicit instruction for both simple and complex features. This suggests that explicit instruction leads to explicit knowledge for both complex and simple features. The highest effect size for implicit instruction was also on controlled measures, but there was only one study contributing to this category. The second largest effect size was for implicit instruction on free measures in the laboratory setting but again, there is only one study contributing to this analysis. These findings also reveal that implicit instruction did not have a large impact on learning outcomes regardless of linguistic complexity, except perhaps for simple forms in laboratory settings. This result may be due to the characteristics of the laboratory setting, where simple forms become salient to learners through intensive corrective feedback with a small number of students interacting together (Han, 2002).

In the Norris and Ortega (2000) meta-analysis only 16% of the studies used free outcome measures. This led them to question whether the benefits they observed for explicit instruction were related to the fact that the language measures were measuring explicit knowledge. In a re-analysis of the Norris and Ortega data, Doughty (2003) concluded that there was a bias in the studies and that the superior benefits of explicit instruction were directly related to the fact that the studies measured explicit knowledge. The studies included in this research were published as recently as 2006, and 50 percent of them utilized free outcome measures. In a recent meta-analysis of interaction-based SLA research, Mackey and Goo (in press) report that 52% of the outcome measures used in the studies were open-ended production measures (e.g. oral production tasks and writing tests). Thus, it would appear that SLA researchers have responded to the call to include more measures of less analyzed and more spontaneous L2 ability.

Furthermore, the effects of explicit instruction do not seem to be restricted to superior performance on tasks that measure controlled and analyzed knowledge. Relatively large effect sizes were also obtained on free outcome measures for explicit instruction of complex forms. These large effect sizes for free outcome measures following explicit instruction are consistent with the findings reported in Housen et al. (2005). In that study explicit instruction had a positive impact on learners' performance on a picture-cued oral production task in relation to complex forms (i.e., French passives). One could argue, however, that the measures characterized as free in this meta-analysis cannot be described as completely spontaneous and unrestricted, since most of them were oral/written picture description and picture-cued production tasks. More studies are therefore needed to examine whether explicit instruction promotes L2 learning of implicit knowledge or spontaneous performance.
Are there different learning outcomes for complex and simple forms depending on whether explicit/implicit instruction is provided in the classroom or laboratory?

The overall findings of this meta-analysis suggest that the differences are minimal between explicit instruction provided in the classroom and laboratory contexts. The only instance where we did observe differences was with implicit instruction. While explicit instruction appeared to be equally effective for both controlled and free tasks regardless of linguistic features and settings, implicit instruction was strong only in two instances: complex forms measured by controlled tasks in classroom settings and simple forms measured by free tasks in laboratory settings. However, because there was only one study contributing to each large effect size, we need more studies of implicit instruction before drawing any conclusions.

Does length of explicit/implicit instruction make any difference in terms of learners developing knowledge of complex and simple forms?

An investigation of this question revealed that while length of explicit and implicit instruction can make a difference, it does not always do so. For example, the effect sizes for explicit instruction were consistently large for complex forms regardless of the length of treatment. For the most part, there were also large effect sizes for simple forms in the classroom setting. However, length of treatment appeared to make a difference with implicit instruction of simple features: the longer the treatment, the better. One explanation for this may be that while simple features can be noticed in the input without the help of explicit instruction (Hulstijn & de Graaff, 1994), it may take a longer time for implicit instruction to show its positive effects on L2 learning (Ellis, 1993).
To conclude, the overall findings of this meta-analysis suggest that explicit instruction is effective for both simple and complex features in the classroom and the laboratory. Thus, these findings do not support the hypothesis that type of language feature is a crucial variable in instructed SLA research. An important caveat is in order, however, and is related to the way in which simple and complex features are defined and operationalized. If we had used different and/or a broader range of linguistic criteria to distinguish the two language features, the results might have been different. Using psycholinguistic criteria based on learning difficulty might also lead to different findings, although it is not clear what those psycholinguistic criteria would be for a range of different language features. Furthermore, as DeKeyser (2005) points out, complexity is an individual issue that can be described as the ratio of the rule's inherent linguistic complexity to the student's ability to handle such a rule. What is a rule of moderate difficulty for one student may be easy for a student with more language learning aptitude or language learning experience (DeKeyser, 2005, p. 331). Thus, the subjective difficulty of the language feature adds another complication to investigating the effects of instruction on the learning of different L2 features.

An interesting finding from this research is that while explicit instruction was superior in contributing to learners' controlled/analyzed knowledge of complex and simple forms, it also contributed to their ability to use language in less restricted ways. Furthermore, implicit instruction did not always lead to greater gains on free outcome measures but contributed to learners' controlled/analyzed knowledge as well. More research is clearly needed before we can be more confident of these findings. Specifically, there is a need for more studies investigating the effects of implicit instruction that measure learners' outcomes on spontaneous and controlled measures. There is also a need for more studies that provide instruction over longer periods of time and that include delayed post-tests in their design, particularly in the classroom setting.
This continued research will undoubtedly be of great help in future meta-analyses of the effects of different types of instruction on the learning of specific L2 language features.

REFERENCES [21]

[21] References with one asterisk are the 102 study reports that were retrieved through literature search.

*Alanen, R. (1995). Input enhancement and rule presentation in second language acquisition. In R. Schmidt (Ed.), Attention and awareness in foreign language learning and teaching (Technical Report No. 9), (pp. 259-302). Honolulu, HI: University of Hawaii, Second Language Teaching & Curriculum Center.
**Ammar, A., & Lightbown, P. M. (2005). Teaching marked linguistic structures: More about the acquisition of relative clauses by Arab learners of English. In A. Housen & M. Pierrard (Eds.), Investigations in instructed second language acquisition, (pp. 167-198). Amsterdam: Mouton de Gruyter.
*Bardovi-Harlig, K. (1994). Reverse-order reports and the acquisition of tense: Beyond the principle of chronological order. Language Learning 44, 243-282.
*Bardovi-Harlig, K. (1995). A narrative perspective on the development of the tense/aspect system in second language acquisition. Studies in Second Language Acquisition 17, 263-291.
*Bardovi-Harlig, K. (1997). Another piece of the puzzle: The emergence of the present perfect. Language Learning 47, 375-422.

*Bardovi-Harlig, K. (1998). Narrative structure and lexical aspect: Conspiring factors in second language acquisition of tense-aspect morphology. Studies in Second Language Acquisition 20, 471-508.
*Bardovi-Harlig, K. (1999). From morpheme studies to temporal semantics: Tense-aspect research in SLA. Studies in Second Language Acquisition 21, 341-382.
*Bardovi-Harlig, K. (2000). Tense and aspect in second language acquisition: Form, meaning, and use. Malden, MA: Blackwell.
*Bardovi-Harlig, K. (2002). Analyzing aspect. In R. Salaberry & Y. Shirai (Eds.), The L2 acquisition of tense-aspect morphology, (pp. 129-154). Amsterdam: Benjamins.
*Bardovi-Harlig, K., & Bergström, A. (1996). The acquisition of tense and aspect in SLA and FLL: A study of learner narratives in English (SL) and French (FL). Canadian Modern Language Review 52, 308-330.
*Bardovi-Harlig, K., & Reynolds, D.W. (1995). The role of lexical aspect in the acquisition of tense and aspect. TESOL Quarterly 29, 107-131.
**Benati, A. (2005). The effects of processing instruction, traditional instruction and meaning-output instruction on the acquisition of the English past simple tense. Language Teaching Research 9(1), 67-93.
**Bitchener, J., Young, S., & Cameron, D. (2005). The effect of different types of corrective feedback on ESL student writing. Journal of Second Language Writing 14, 191-205.
*Bouton, L.F. (1994). Can NNS skill in interpreting implicature in American English be improved through explicit instruction? A pilot study. Pragmatics and Language Learning 5, 89-109.
*Cadierno, T. (1995). Formal instruction from a processing perspective: An investigation into the Spanish past tense. The Modern Language Journal 79, 179-193.
*Carroll, S., Roberge, Y., & Swain, M. (1992). The role of feedback in adult second language acquisition: Error correction and morphological generalization. Applied Psycholinguistics 13, 173-189.
**Carroll, S., & Swain, M. (1993). Explicit and implicit negative feedback: An empirical study of the learning of linguistic generalizations. Studies in Second Language Acquisition 15(3), 357-386.
Celce-Murcia, M., & Larsen-Freeman, D. (1999). The grammar book. Boston, MA: Heinle & Heinle.
**Chan, A.Y.W. (2006). An algorithmic approach to error correction: An empirical study. Foreign Language Annals 39(1), 131-147.
*Clachar, A. (2005). Creole English speakers' treatment of tense-aspect morphology in English interlanguage written discourse. Language Learning 55(2), 275-334.

*Collins, L. (2002). The role of L1 influence and lexical aspect in the acquisition of temporal morphology. Language Learning 52(1), 43-94.
*Day, E., & Shapson, S. (1991). Integrating formal and functional approaches to language teaching in French immersion: An experimental study. Language Learning 41, 25-58.
*de Graaff, R. (1997). The eXperanto experiment: Effects of explicit instruction on second language acquisition. Studies in Second Language Acquisition 19, 249-297.
*DeKeyser, R.M. (1995). Learning second language grammar rules: An experiment with a miniature linguistic system. Studies in Second Language Acquisition 17, 379-410.
*DeKeyser, R.M. (1997). Beyond explicit rule learning: Automatizing second language morphosyntax. Studies in Second Language Acquisition 19, 195-221.
DeKeyser, R. (2005). What makes learning second-language grammar difficult? A review of issues. Language Learning 55(s1), 1-25.
*DeKeyser, R.M., & Sokalski, K.J. (1996). The differential role of comprehension and production practice. Language Learning 46, 613-642.
**Doughty, C. (1991). Second language instruction does make a difference: Evidence from an empirical study of SL relativization. Studies in Second Language Acquisition 13, 431-469.
Doughty, C. (2003). Instructed SLA: Constraints, compensation, and enhancement. In C. Doughty & M. Long (Eds.), The handbook of second language acquisition, (pp. 256-310).
*Doughty, C., & Varela, E. (1998). Communicative focus on form. In C. Doughty & J. Williams (Eds.), Focus on form in classroom second language acquisition, (pp. 114-138). Cambridge: Cambridge University Press.
Ellis, R. (1993). The structural syllabus and second language acquisition. TESOL Quarterly 27(1), 91-113.
Ellis, R. (2001). Investigating form-focused instruction. Language Learning 51, 1-46.
**Ellis, R., Loewen, S., & Erlam, R. (2006). Implicit and explicit corrective feedback and the acquisition of L2 grammar. Studies in Second Language Acquisition 28, 339-368.
*Ellis, R., Rosszell, H., & Takashima, H. (1994). Down the garden path: Another look at negative feedback. JALT Journal 16, 9-24.
**Fotos, S., & Ellis, R. (1991). Communicating about grammar: A task-based approach. TESOL Quarterly 25(4), 605-628.
*Gor, K., & Chernigovskaya, T. (2005). Formal instruction and the acquisition of verbal morphology. In A. Housen & M. Pierrard (Eds.), Investigations in instructed second language acquisition, (pp. 131-164). New York: Mouton de Gruyter.

*Griggs, P. (2005). Assessment of the role of communication tasks in the development of second language oral production skills. In A. Housen & M. Pierrard (Eds.), Investigations in instructed second language acquisition, (pp. 407-432). New York: Mouton de Gruyter.
**Han, Z. (2002). A study of the impact of recasts on tense consistency in L2 output. TESOL Quarterly 36(4), 543-572.
*Harley, B. (1989). Functional grammar in French immersion: A classroom experiment. Applied Linguistics 10, 331-359.
Hedges, L.V. (1981). Distribution theory for Glass's estimator of effect size and related estimators. Journal of Educational Statistics 6, 107-128.
*Herron, C., & Tomasello, M. (1988). Learning grammatical structures in foreign language: Modelling versus feedback. The French Review 61, 910-922.
*Hinkel, E. (1997). The past tense and temporal verb meanings in a contextual frame. TESOL Quarterly 31(2), 289-313.
*Housen, A. (2002). The development of tense-aspect in English as a second language and the variable influence of inherent aspect. In R. Salaberry & Y. Shirai (Eds.), The L2 acquisition of tense-aspect morphology, (pp. 155-197). Amsterdam: Benjamins.
Housen, A., Pierrard, M., & Van Daele, S. (2005). Rule complexity and the efficacy of explicit grammar instruction. In A. Housen & M. Pierrard (Eds.), Investigations in instructed second language acquisition, (pp. 235-269). Amsterdam: Mouton de Gruyter.
*Hulstijn, J.H. (1989). Implicit and incidental second language learning: Experiments in the processing of natural and partly artificial input. In H.W. Dechert & M. Raupach (Eds.), Interlingual processes, (pp. 49-73). Tübingen: Gunter Narr.
Hulstijn, J.H., & de Graaff, R. (1994). Under what conditions does explicit knowledge of a second language facilitate the acquisition of implicit knowledge? A research proposal. AILA Review 11, 97-112.
*Ionin, T., & Wexler, K. (2003). Why is 'is' easier than '-s'?: Acquisition of tense/agreement morphology by child second language learners of English. Second Language Research 18(2), 95-136.
**Izumi, S. (2002). Output, input enhancement, and the noticing hypothesis: An experimental study on ESL relativization. Studies in Second Language Acquisition 24, 541-577.
*Izumi, S., & Bigelow, M. (2000). Does output promote noticing and second language acquisition? TESOL Quarterly 34(2), 239-278.
*Izumi, S., Bigelow, M., Fujiwara, M., & Fearnow, S. (1999). Testing the output hypothesis: Effects of output on noticing and second language acquisition. Studies in Second Language Acquisition 21, 421-452.

**Izumi, Y., & Izumi, S. (2004). Investigating the effects of oral output on the learning of relative clauses in English: Issues in the psycholinguistic requirements for effective output tasks. The Canadian Modern Language Review 60(5), 587-609.
**Izumi, S., & Lakshmanan, U. (1998). Learnability, negative evidence and the L2 acquisition of the English passive. Second Language Research 14(1), 62-101.
*Jourdenais, R., Ota, M., Stauffer, S., Boyson, B., & Doughty, C. (1995). Does textual enhancement promote noticing? A think-aloud protocol analysis. In R. Schmidt (Ed.), Attention and awareness in foreign language learning (Technical Report No. 9), (pp. 182-216). Honolulu, HI: University of Hawaii, Second Language Teaching & Curriculum Center.
Keck, C.M., Iberri-Shea, G., Tracy-Ventura, N., & Wa-Mbaleka, S. (2006). Investigating the empirical link between task-based interaction and acquisition: A meta-analysis. In J. Norris & L. Ortega (Eds.), Synthesizing research on language learning and teaching, (pp. 91-131). Amsterdam: Benjamins.
*Kihlstedt, M. (2002). Reference to past events in dialogue: The acquisition of tense and aspect by advanced learners of English. In R. Salaberry & Y. Shirai (Eds.), The L2 acquisition of tense-aspect morphology, (pp. 323-361). Amsterdam: Benjamins.
Krashen, S. (1982). Principles and practice in second language acquisition. Oxford: Pergamon.
Krashen, S. (1994). The input hypothesis and its rivals. In N. Ellis (Ed.), Implicit and explicit learning of languages, (pp. 45-77). London: Academic Press.
*Kubota, M. (1993). Accuracy order and frequency order of relative clauses as used by Japanese senior high school students of EFL. Institute for Research in Language Teaching Bulletin 7, 27-53.
**Kubota, M. (1994). The role of negative feedback on the acquisition of the English dative alternation by Japanese college students of EFL. Institute for Research in Language Teaching Bulletin 8, 1-36.
*Kubota, M. (1995a). The Garden Path technique: Is it really effective? Working Papers of Chofu Gakuen Women's Junior College 27, 21-48.
*Kubota, M. (1995b). Teachability of conversational implicature to Japanese EFL learners. Institute for Research in Language Teaching Bulletin 9, 35-67.
*Kubota, M. (1996). The effects of instruction plus feedback on Japanese university students of EFL: A pilot study. Bulletin of Chofu Gakuen Women's Junior College 18, 59-95.
**Kuiken, F., & Vedder, I. (2002). The effect of interaction in acquiring the grammar of a second language. International Journal of Educational Research 37, 343-358.
*Lantolf, J.P., DiCamilla, F.J., & Ahmed, M.K. (1997). The cognitive function of linguistic performance: Tense/aspect use by L1 and L2 speakers. Language Sciences 19, 153-165.

*Larsen-Freeman, D., Kuehn, T., & Haccius, M. (2002). Helping students make appropriate English verb tense-aspect choices. TESOL Journal 11(4), 3-9.
*Leow, R.P. (1997). Attention, awareness, and foreign language behavior. Language Learning 47, 467-506.
*Leow, R.P. (1998a). The effects of amount and type of exposure on adult learners' L2 development in SLA. The Modern Language Journal 82, 49-68.
*Leow, R.P. (1998b). Toward operationalizing the process of attention in SLA: Evidence for Tomlin and Villa's (1994) fine-grained analysis of attention. Applied Psycholinguistics 19, 133-159.
Lipsey, M., & Wilson, D. (2001). Practical meta-analysis. London: Sage.
*Long, M.H., Inagaki, S., & Ortega, L. (1998). The role of implicit negative feedback in SLA: Models and recasts in Japanese and Spanish. The Modern Language Journal 82, 357-371.
*Loschky, L. (1994). Comprehensible input and second language acquisition: What is the relationship? Studies in Second Language Acquisition 16, 303-323.
*Lyster, R. (1994). The effect of functional-analytic teaching on aspects of French immersion students' sociolinguistic competence. Applied Linguistics 15, 263-287.
**Mackey, A. (1999). Input, interaction, and second language development: An empirical study on question formation in ESL. Studies in Second Language Acquisition 21, 557-587.
**Mackey, A. (2006). Feedback, noticing and instructed second language learning. Applied Linguistics 27(3), 405-430.
Mackey, A., & Goo, J. (in press). Interaction research in SLA: A meta-analysis and research synthesis. In A. Mackey (Ed.), Conversational interaction in second language acquisition: A series of empirical studies, (pp. 407-452). Oxford: Oxford University Press.
**Mackey, A., & Oliver, R. (2002). Interactional feedback and children's L2 development. System 30, 459-477.
**Mackey, A., & Philp, J. (1998). Conversational interaction and second language development: Recasts, responses, and red herrings? The Modern Language Journal 82(3), 338-356.
**Master, P. (1994). The effect of systematic instruction on learning the English article system. In T. Odlin (Ed.), Perspectives on pedagogical grammar, (pp. 229-252). Cambridge: Cambridge University Press.
*McDonough, K. (2004). Learner-learner interaction during pair and small group activities in a Thai EFL context. System 32, 207-224.
**McDonough, K. (2005). Identifying the impact of negative feedback and learners' responses on ESL question development. Studies in Second Language Acquisition 27, 79-103.

*McDonough, K. (2006). Interaction and syntactic priming: English L2 speakers' production of dative constructions. Studies in Second Language Acquisition 28, 179-207.
**Muranoi, H. (2000). Focus on form through interaction enhancement: Integrating formal instruction into a communicative task in EFL classrooms. Language Learning 50(4), 617-673.
*Murphy, V.A. (2004). Dissociable systems in second language inflectional morphology. Studies in Second Language Acquisition 26(3), 433-459.
*Nagata, N. (1993). Intelligent computer feedback for second language instruction. The Modern Language Journal 77, 330-339.
*Nagata, N. (1995). An effective application of natural language processing in second language instruction. CALICO Journal 13, 47-67.
*Nagata, N. (1997a). The effectiveness of computer-assisted metalinguistic instruction: A case study in Japanese. Foreign Language Annals 30, 187-200.
*Nagata, N. (1997b). An experimental comparison of deductive and inductive feedback generated by a simple parser. System 25, 515-534.
*Nagata, N. (1998). Input vs. output practice in educational software for second language acquisition. Language Learning & Technology 1(2), 23-40.
*Nakamori, T. (2002). Teaching relative clauses: How to handle a bitter lemon for Japanese learners and English teachers. ELT Journal 56(1), 29-40.
Norris, J., & Ortega, L. (2000). Effectiveness of L2 instruction: A research synthesis and quantitative meta-analysis. Language Learning 50, 417-528.
Norris, J., & Ortega, L. (2006). The value and practice of research synthesis for language learning. In J. Norris & L. Ortega (Eds.), Synthesizing research on language learning and teaching (pp. 3-52). Amsterdam: Benjamins.
*Nunan, D. (1994). Linguistic theory and pedagogic practice. In T. Odlin (Ed.), Perspectives on pedagogical grammar (pp. 253-270). Cambridge: Cambridge University Press.
Reber, A. S. (1989). Implicit learning and tacit knowledge. Journal of Experimental Psychology: General 118, 219-235.
*Robinson, P. (1996). Learning simple and complex second language rules under implicit, incidental, rule-search and instructed conditions. Studies in Second Language Acquisition 18, 27-67.
**Robinson, P. (1997a). Generalizability and automaticity of second language learning under implicit, incidental, enhanced and instructed conditions. Studies in Second Language Acquisition 19(2), 233-247.
**Robinson, P. (1997b). Individual differences and the fundamental similarity of implicit and explicit adult second language learning. Language Learning 47, 45-99.

*Rohde, A. (2002). The aspect hypothesis in naturalistic L2 acquisition: What uninflected and non-target-like verb forms in early interlanguage tell us. In R. Salaberry & Y. Shirai (Eds.), The L2 acquisition of tense-aspect morphology (pp. 199-220). Amsterdam: Benjamins.
*Salaberry, M.R. (1997). The role of input and output practice in second language acquisition. The Canadian Modern Language Review 53, 422-451.
*Salaberry, R. (2000). The acquisition of English in an instructional setting. System 28, 135-152.
*Salaberry, R., & Shirai, Y. (2002). L2 acquisition of tense-aspect morphology. In R. Salaberry & Y. Shirai (Eds.), The L2 acquisition of tense-aspect morphology (pp. 1-20). Amsterdam: Benjamins.
Schwartz, B. (1993). On explicit and negative data effecting and affecting competence and linguistic behavior. Studies in Second Language Acquisition 15, 147-162.
*Scott, V. (1989). An empirical study of explicit and implicit teaching strategies in French. The Modern Language Journal 72, 14-22.
*Scott, V. (1990). Explicit and implicit grammar teaching: New empirical data. The French Review 62, 779-788.
*Slabakova, R., & Montrul, S. (2002). On viewpoint aspect interpretation and its L2 acquisition: A UG perspective. In R. Salaberry & Y. Shirai (Eds.), The L2 acquisition of tense-aspect morphology (pp. 363-395). Amsterdam: Benjamins.
Spada, N. (1997). Form-focused instruction and second language acquisition: A review of classroom and laboratory research. Language Teaching 30, 73-87.
*Spada, N., & Lightbown, P.M. (1993). Instruction and the development of questions in the L2 classroom. Studies in Second Language Acquisition 15, 205-221.
**Spada, N., & Lightbown, P. M. (1999). Instruction, first language influence, and developmental readiness in second language acquisition. The Modern Language Journal 83(1), 1-22.
Spada, N., & Lightbown, P.M. (in press). Form-focused instruction: Isolated or integrated? TESOL Quarterly.
**Spada, N., Lightbown, P. M., & White, J. (2005). The importance of form/meaning mappings in explicit form-focused instruction. In A. Housen & M. Pierrard (Eds.), Current issues in instructed second language learning (pp. 199-234). Brussels: Mouton De Gruyter.
**Takashima, H., & Ellis, R. (1999). Output enhancement and the acquisition of the past tense. In R. Ellis (Ed.), Learning a second language through interaction (pp. 173-188). Amsterdam: Benjamins.
*Tickoo, A. (2002). On the use of then/after that in the marking of chronological order: Insights from Vietnamese and Chinese learners of ESL. System 30, 107-124.

Truscott, J. (1999). What's wrong with oral grammar correction. Canadian Modern Language Review 55, 437-456.
Vacha-Haase, T., & Thompson, B. (2004). How to estimate and interpret various effect sizes. Journal of Counseling Psychology 51(4), 473-481.
*van Baalen, T. (1983). Giving learners rules: A study into the effect of grammatical instruction with varying degrees of explicitness. Interlanguage Studies Bulletin Utrecht 7, 71-100.
*VanPatten, B., & Cadierno, T. (1993). Explicit instruction and input processing. Studies in Second Language Acquisition 15, 225-241.
*VanPatten, B., & Oikkenon, S. (1996). Explanation versus structured input in processing instruction. Studies in Second Language Acquisition 18, 495-510.
*VanPatten, B., & Sanz, C. (1995). From input to output: Processing instruction and communicative tasks. In F. Eckman, D. Highland, P. Lee, J. Mileham, & R. Weber (Eds.), SLA theory and pedagogy (pp. 169-185). Hillsdale, NJ: Lawrence Erlbaum.
**White, J., & Ranta, L. (2002). Examining the interface between metalinguistic task performance and oral production in a second language. Language Awareness 11(4), 259-290.
**White, L., Spada, N., Lightbown, P.M., & Ranta, L. (1991). Input enhancement and L2 question formation. Applied Linguistics 12, 416-432.
**Williams, J., & Evans, J. (1998). What kind of focus and on which forms? In C. Doughty & J. Williams (Eds.), Focus on form in classroom second language acquisition (pp. 139-155). New York: Cambridge University Press.
*Yang, L., & Givon, T. (1997). Benefits and drawbacks of controlled laboratory studies of second language acquisition. Studies in Second Language Acquisition 19, 173-194.
*Yang, S., & Huang, Y. (2004). The impact of the absence of grammatical tense in L1 on the acquisition of the tense-aspect system in L2. International Review of Applied Linguistics in Language Teaching 42(1), 49-70.
**Yip, Y. (1994). Grammatical consciousness-raising and learnability. In T. Odlin (Ed.), Perspectives on pedagogical grammar (pp. 123-139). Cambridge: Cambridge University Press.
*Zobl, H. (1985). Grammars in search of input and intake. In S. Gass & C. Madden (Eds.), Input in SLA (pp. 329-344). Rowley, MA: Newbury House.


APPENDIX A
Summary of Synthesized Studies

| Study | N | Context | Target (a) | Instruction (b) |
|---|---|---|---|---|
| Ammar & Lightbown (2005) | 34 | Classroom | Relativization (c) | E/E/E/C |
| Benati (2005): Study 1 | 47 | Classroom | Past tense (s) | E/E/E/ |
| Benati (2005): Study 2 | 30 | Classroom | Past tense (s) | E/E/E/ |
| Bitchener, Young, & Cameron (2005) | 53 | Classroom | Prep (s) | E/E/C |
| Bitchener, Young, & Cameron (2005): Study 2 | (53) | Classroom | Tense (s) | E/E/C |
| Bitchener, Young, & Cameron (2005): Study 3 | (53) | Classroom | Articles (s) | E/E/C |
| Chan (2006) | 160 | Classroom | Relative (c) | E |
| Ellis, Loewen, & Erlam (2006) | 34 | Classroom | Past tense (s) | I/E/C |
| Mackey (2006): Study 1 | 28 | Classroom | Plurals (s) | I |
| Mackey (2006): Study 2 | 28 | Classroom | Past tense (s) | I |
| Mackey (2006): Study 3 | 28 | Classroom | Questions (c) | I |
| Master (1994) | 47 | Classroom | Articles (s) | E/C |
| Muranoi (2000): Study 1 | 91 | Classroom | Def. Articles (s) | I/E/C |
| Muranoi (2000): Study 2 | 91 | Classroom | Inf. Articles (s) | I/E/C |
| Spada & Lightbown (1999) | 144 | Classroom | Questions (c) | I |
| Spada, Lightbown, & White (2005): Study 1 | 90 | Classroom | Questions (c) | E/E/C |
| Spada, Lightbown, & White (2005): Study 2 | 90 | Classroom | PDs (s) | E/E/C |
| Takashima & Ellis (1999) | 61 | Classroom | Past tense (s) | I/C |
| White & Ranta (2002) | 59 | Classroom | PDs (s) | E/C |
| White, Spada, Lightbown & Ranta (1991): Phase 1 | 129 | Classroom | Questions (c) | E/C |
| White, Spada, Lightbown & Ranta (1991): Phase 2 | 108 | Classroom | Questions (c) | E/C |
| Williams & Evans (1998): Study 1 | 33 | Classroom | Passives (c) | I/E/C |
| Williams & Evans (1998): Study 2 | 33 | Classroom | Participial adj (s) | I/E/C |
| Yip (1994) | 10 | Classroom | Passives (c) | E |
| Carroll & Swain (1993) | 100 | Laboratory | Datives (c) | I/E/E/C |
| Doughty (1991) | 20 | Laboratory | Relativization (c) | I/E/C |
| Fotos & Ellis (1991): Study 1 | 56 | Laboratory | Datives (c) | E/E/C |
| Fotos & Ellis (1991): Study 2 | 34 | Laboratory | Datives (c) | E/E/C |
| Han (2002) | 8 | Laboratory | Tense (s) | I/C |
| Izumi (2002) | 61 | Laboratory | Relativization (c) | I/I/I/I/C |
| Izumi & Izumi (2004) | 24 | Laboratory | Relativization (c) | I/I/C |
| Izumi & Lakshmanan (1998) | 15 | Laboratory | Passives (c) | E/C |
| Kubota (1994) | 100 | Laboratory | Dative (c) | E/E/I/I/C |
| Kuiken & Vedder (2002) | 34 | Laboratory | Passives (c) | I/C |
| Mackey (1999) | 34 | Laboratory | Questions (c) | I/C |
| Mackey & Oliver (2002) | 22 | Laboratory | Questions (c) | I/C |
| Mackey & Philp (1998) | 35 | Laboratory | Questions (c) | I/I/I/I/C |
| McDonough (2005) | 60 | Laboratory | Questions (c) | I/I/I/C |
| Robinson (1996, 1997b): Study 1 | 104 | Laboratory | Pseudo-cleft (c) | I/I/E/E/ |
| Robinson (1996, 1997b): Study 2 | 104 | Laboratory | S-V inversion (s) | I/I/E/E |
| Robinson (1997a) | 60 | Laboratory | Datives (c) | I/E/C |

a) Complex and simple forms are represented as (c) and (s), respectively.
b) Implicit, explicit, and control groups are represented as I, E, and C, respectively.
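As a reading aid for the Instruction column, the coding described in note (b) can be unpacked mechanically. The following is only an illustrative sketch, not part of the original synthesis: the function name `count_groups` is hypothetical, and the code simply counts how many implicit (I), explicit (E) and control (C) groups a code such as I/E/E/C denotes.

```python
from collections import Counter

# Labels taken from note (b) of Appendix A.
GROUP_LABELS = {"I": "implicit", "E": "explicit", "C": "control"}


def count_groups(code: str) -> dict:
    """Count the groups encoded in an Instruction code such as 'I/E/E/C'.

    Empty segments (for example after a trailing slash) are ignored.
    Any letter other than I, E or C raises a ValueError.
    """
    letters = [part.strip() for part in code.split("/") if part.strip()]
    unknown = [letter for letter in letters if letter not in GROUP_LABELS]
    if unknown:
        raise ValueError(f"unknown group code(s): {unknown}")
    counts = Counter(GROUP_LABELS[letter] for letter in letters)
    # Always report all three group types, including those with zero groups.
    return {label: counts.get(label, 0) for label in GROUP_LABELS.values()}


if __name__ == "__main__":
    print(count_groups("I/E/E/C"))
```

Reporting zero counts explicitly makes it easy to see at a glance which studies had no control group or only one type of instruction.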


APPENDIX B
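Length of Treatment and Timing of Posttests

| Study | Length (a) (hours) | Immediate (day) | 1st delayed (week) | 2nd delayed (week) |
|---|---|---|---|---|
| Ammar & Lightbown (2005) | 1.5 | 1 | | |
| Benati (2005): Study 1 | 6 | 0 | | |
| Benati (2005): Study 2 | 6 | 0 | | |
| Bitchener, Young, & Cameron (2005) | 0.33 | 4 weeks | | |
| Bitchener, Young, & Cameron (2005): Study 2 | 0.33 | 4 weeks | | |
| Bitchener, Young, & Cameron (2005): Study 3 | 0.33 | 4 weeks | | |
| Chan (2006) | 4 | In a week | | |
| Ellis, Loewen, & Erlam (2006) | 1 | 1 | 2 | |
| Mackey (2006): Study 1 | 2.5 | In a week | | |
| Mackey (2006): Study 2 | 2.5 | In a week | | |
| Mackey (2006): Study 3 | 2.5 | In a week | | |
| Master (1994) | 6 | In a week | | |
| Muranoi (2000): Study 1 | 1.5 | In a week | 5 | |
| Muranoi (2000): Study 2 | 1.5 | In a week | 5 | |
| Spada & Lightbown (1999) | 8 | 3 | 16 | |
| Spada, Lightbown, & White (2005): Study 1 | 6 | In a week | | |
| Spada, Lightbown, & White (2005): Study 2 | 6 | In a week | | |
| Takashima & Ellis (1999) | 3 lessons | 7 | 2 | 7 |
| White & Ranta (2002) | 3 | 7 | | |
| White, Spada, Lightbown & Ranta (1991): Phase 1 | 5 | On the 1st day | | |
| White, Spada, Lightbown & Ranta (1991): Phase 2 | 9 | On the 1st day | 5 | |
| Williams & Evans (1998): Study 1 | 8 | 2 weeks | | |
| Williams & Evans (1998): Study 2 | 8 | 2 weeks | | |
| Yip (1994) | 0.75 | 14 | | |
| Carroll & Swain (1993) | 2 lessons | 0 | 1 | |
| Doughty (1991) | 1.67 | 0 | | |
| Fotos & Ellis (1991): Study 1 | 0.5 | 0 | | |
| Fotos & Ellis (1991): Study 2 | 0.5 | 0 | 2 | |
| Han (2002) | 8 lessons | 5 | 2 | |
| Izumi (2002) | 4.5 | 3.43 | 4 | |
| Izumi & Izumi (2004) | 1.25 | 0 | | |
| Izumi & Lakshmanan (1998) | 3 | 5 | 8 | |
| Kubota (1994) | 3 lessons | 0 | 4 | |
| Kuiken & Vedder (2002) | 1.5 | 0 | 2 | |
| Mackey (1999) | 1 | 1 | | |
| Mackey & Oliver (2002) | 1.5 | 1 | 1 | 3 |
| Mackey & Philp (1998) | 1 | 1 | 1 | 3 |
| McDonough (2005) | 0.5 | In a week | 4 | 7 |
| Robinson (1996, 1997b): Study 1 | 0.17 | 0 | | |
| Robinson (1996, 1997b): Study 2 | 0.17 | 0 | | |
| Robinson (1997a) | 0.42 | 0 | | |

a) Some studies reported the length by the number of lessons. We considered each lesson to be an hour.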


THE EFFECTIVENESS OF A PHRASE-LEARNING APPROACH ON FLUENCY, COMPLEXITY AND ACCURACY IN AND BEYOND THE EFL CLASSROOM

Hélène Stengers¹, Alex Housen¹, Frank Boers² and June Eyckmans²
¹ Free University of Brussels, ² Erasmushogeschool Brussels

1. INTRODUCTION

Recently, educational linguists have acknowledged the importance of learners' mastery of multi-word lexical chunks at large, i.e. lexical items which consist of a sequence of two or more words forming a meaningful unit, such as idioms, and which are referred to in the literature as lexical phrases, multiword units, formulas, prefabricated chunks, formulaic sequences, etc. (e.g., Nattinger & DeCarrico, 1992; Pawley & Syder, 1983; Wray, 2002). These multiword expressions are a reflection of Sinclair's (1991) Idiom Principle, according to which language is organised largely in terms of semi-preconstructed phrases that fall outside the scope of grammar rules. As English grammar is generally said to be relatively straightforward, the true challenge for the learner of English is to master the language's vast repertoire of standardized phrases. Mastery of such phrases is said to be a prerequisite for attaining a native-like command of the language.

There are roughly three reasons why a command of formulaic sequences in L2 is believed to be beneficial to learners. Firstly, many standardized multiword expressions or formulaic sequences are predictable neither by grammar rules nor by the properties of the individual words they are composed of. In other words, they reflect Sinclair's (1991) idiom principle. Mastery of the idiomatic dimension of natural language can thus help learners come across as native-like. Secondly, since formulaic sequences are believed to be retrieved from memory holistically, i.e. as prefabricated, ready-made chunks, they are believed to facilitate fluent language production under real-time conditions (Skehan, 1998). In fact, one of the characteristics of a formulaic sequence in a speaker's real-time discourse is the absence of hesitations within the sequence. In this view, hesitations should occur only in the parts of discourse that connect the prefabricated chunks. This leads us to the third reason why mastery of formulaic sequences is believed to be beneficial to learners: formulaic sequences (at least those that are correctly committed to memory) constitute zones of safety, and appropriate use of them may thus confine the risk of erring to the spaces in between the formulaic sequences in one's discourse.

The pedagogical message to draw learners' attention to formulaic sequences has been conveyed to the teaching community perhaps most successfully by Lewis (1993), for whom chunk-noticing is at the heart of the Lexical Approach. In this approach, learners are systematically encouraged to notice recurring lexical chunks in the authentic L2 language they are exposed to. Lewis does not propose many mnemonic strategies to help learners commit those chunks to memory, but seems to rely mostly on the power of awareness-raising to trigger acquisition through imitation of sequences encountered either inside or outside the classroom. Empirical evidence in support of the claim that a Lexical Approach can indeed help learners of English reach a higher level of perceived oral proficiency, i.e. fluency, complexity and accuracy, has been reported by Boers et al. (2006).

So far, the call for such a phrase-learning approach, i.e. an approach which fully acknowledges the prevalence of idiomaticity, seems to remain largely confined to TEFL. Some recent articles, e.g. by Gómez Molina (2003) and by Alonso Raya (2003), suggest that a Lexical Approach could usefully be implemented in the teaching of Spanish, too. However, these articles fail to provide empirical support. Moreover, it has often been suggested to us that English may be exceptionally idiomatic and that ESL/EFL may therefore be exceptionally well suited to a pedagogical approach that prioritises mastery of (semi-)fixed phrases. To our knowledge, however, no empirical evidence has ever been adduced to substantiate the claim that English is any more idiomatic than other languages. Instead, it seems likely that this belief in the highly idiomatic nature of English is fuelled by the abundance of popular-academic articles, web pages, and course materials that are devoted to English idioms. This disproportionate attention given to idioms in English in comparison to other target languages is not in itself surprising, given the dominance of English-language publications and the abundance of English pedagogical materials in general.

Anyone claiming that English might be exceptionally idiomatic should also clarify what they understand by the term idiomatic. As Grant and Bauer (2004) point out, idiomaticity can be defined in two ways:
1. Narrowly, it pertains to the relatively high density of idioms, i.e. the use of the sort of standardized expressions that are typically included in idiom dictionaries.
2. Broadly, it refers to native-like selection in the language (Pawley & Syder, 1983), that is, to the whole of a language's stock of standardized phrases, which includes idioms, but also various other kinds of multiword units such as strong collocations, proverbs and formulae in general.

As part of an attempt to investigate the applicability of a Lexical Approach beyond ESL/EFL, we will present a quantitative comparison of the relative pervasiveness of idiomaticity in English and Spanish.
The investigation bears on both definitions, i.e. both the narrow and the broad conception of idiomaticity. It encompasses (i) a corpus-based comparison of the frequency of figurative idioms and (ii) a psycholinguistic experiment targeting the identification of English and Spanish multiword expressions at large. The results of this study are reported more exhaustively in Stengers (forthcoming).

2. COMPARING THE FREQUENCY OF FIGURATIVE IDIOMS

First, we report the results of a corpus-based, quantitative study in which the overall frequency of occurrence of a sample of 500 English figurative idioms was compared to that of a similar set of 500 Spanish idioms. The idioms selected for the study are figurative in the sense that their metaphorical meaning can still be traced back to their original, literal usage. They are standardized, (semi-)fixed, multiword expressions that instantiate implicit analogies referring to their origins, i.e. the source domains in which they were originally used in a literal sense. For example, to show someone the ropes ('teach someone how to do a certain task') can be traced back to its original, literal usage, where an experienced sailor shows a novice how to handle the ropes on a sailing boat. Idioms obviously display different degrees of transparency or opacity when it comes to their origins. Fortunately, several idiom-dictionary makers have recently begun to provide users with information about the origin of idioms, including the dictionaries that we chose to use: The Oxford Dictionary of Idioms (Speake, 1999) and The Collins Cobuild Dictionary of Idioms (Sinclair & Moon, 2002) for English, and the Diccionario Espasa de dichos y frases hechas (Buitrago Jiménez, 1998) for Spanish, supplemented with Del dicho al hecho (Giménez, 1998).

Each randomly selected idiom and its potential variants, i.e. all possible syntagmatic, lexical or stylistic pattern variations of the idiom, were subsequently retrieved from the Collins Online English and Spanish Wordbanks. A comparison of the results reveals a striking similarity in the overall frequency of occurrence of the idioms in both languages. Table 1 provides an overview of the overall frequencies of occurrence of our samples of 500 idioms in the English and Spanish corpora. As the corpora differed in size (56 million words for English, as opposed to 73 million words for Spanish), the frequencies for the Spanish idioms were converted to values for a 56-million-word corpus. The means of both samples then turn out to be virtually identical: the English idioms in our sample occur on average 24.56 times in the corpus, as compared to an average of 24.57 times for the Spanish idioms.

Table 1: Frequency of occurrence of 500 English and 500 Spanish idioms.

| Sample | Combined frequency per 56 million words | Mean frequency | Standard deviation |
|---|---|---|---|
| 500 English idioms | 12,278 | 24.56 | 39.67 |
| 500 Spanish idioms | 12,287 | 24.57 | 35.78 |

In both corpora around 75% of the idioms actually occurred only between 0 and 29 times, which also explains why the standard deviations given in Table 1 are quite similar: 39.67 for English and 35.78 for Spanish. We nevertheless applied a t-test to the seemingly different variability in frequency ranges between the English and Spanish samples, but this calculation reveals no significant difference at all (p = .29, two-tailed).
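The corpus-size conversion and the mean frequencies reported for the two idiom samples can be sketched as follows. This is a minimal illustration rather than the procedure actually used in the study: the function names are hypothetical, and the single raw count passed to `normalise_frequency` is an invented example. Only the corpus sizes and the combined frequencies are taken from the text and Table 1.

```python
# Corpus sizes as reported in the text.
ENGLISH_CORPUS_WORDS = 56_000_000
SPANISH_CORPUS_WORDS = 73_000_000


def normalise_frequency(raw_count: float, corpus_size: int, reference_size: int) -> float:
    """Rescale a raw corpus count to the count expected in a corpus of reference_size words."""
    if corpus_size <= 0 or reference_size <= 0:
        raise ValueError("corpus sizes must be positive")
    return raw_count * reference_size / corpus_size


def mean_frequency(combined_frequency: float, n_idioms: int) -> float:
    """Average number of occurrences per idiom in the sample."""
    return combined_frequency / n_idioms


if __name__ == "__main__":
    # Invented example: an idiom attested 73 times in the 73-million-word Spanish corpus.
    print(normalise_frequency(73, SPANISH_CORPUS_WORDS, ENGLISH_CORPUS_WORDS))
    # Combined frequencies from Table 1, divided over the 500 idioms in each sample.
    print(round(mean_frequency(12_278, 500), 2))
    print(round(mean_frequency(12_287, 500), 2))
```

Dividing the combined frequencies in Table 1 by the sample size of 500 reproduces the reported means of 24.56 and 24.57.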

In sum, the quantitative results suggest that, overall, figurative idioms are used just as often in Spanish as in English. So, at least as far as the narrow conception of idiomaticity is concerned (i.e. idiomaticity measured in terms of the density of occurrence of figurative idioms), we have found no evidence to support the widespread notion that English is an exceptionally idiomatic language.

3. COMPARING THE PERCEIVED PREFABRICATED-NESS

A broader degree of idiomaticity is currently being probed in an experiment involving English/Spanish native speakers' and language teachers' intuitive identification of formulaic sequences operating under the idiom principle. As mentioned above, idiomaticity in its broad sense refers to the whole of a language's stock of formulaic sequences and collocational patterns. Standardized phrases can be very diverse, in terms of lexical composition as well as function: they range from simple fillers (e.g., 'sort of') and functions (e.g., 'excuse me') over collocations (e.g., 'tell a story') and idioms (e.g., 'back to square one') to proverbs (e.g., 'let's make hay while the sun shines') and lengthy standardized phrases (e.g., 'there is a growing body of evidence that'). Wray (2002: 9) defines a formulaic sequence as "a sequence, continuous or discontinuous, of words or other elements, which is, or appears to be, prefabricated: that is, stored and retrieved whole from memory at the time of use, rather than being subject to generation or analysis by the language grammar". This definition clearly acknowledges the fuzzy nature of the category called formulaic sequences, since a stretch of words that is processed holistically by one individual need not be processed that way by another. Furthermore, holistic processing itself may be a matter of degree. As a result, the best one can expect in many cases is a degree of inter-subjective agreement as to what does or does not qualify as a multiword expression. If phrases make up a category that is so difficult to delineate, how then can one measure a language's phraseomaticity? In what follows we present an experimental study which attempted to do just that while acknowledging the reality of the concept's fuzziness.

The psycholinguistic experiment reported below is still in progress. It encompasses a study of what could be referred to as perceived idiomaticity or perceived prefabricated-ness. The purpose is to assess the extent to which native speakers and/or teachers of English and Spanish as foreign languages experience certain strings of words in a sample of their language as formulaic and holistically processed expressions. We chose to work with recorded spoken language, since the recognition of formulaic utterances can be aided considerably by auditory elements such as prosodic contrasts, pauses, intonation contours, etc. The audio samples which we compiled each consisted of approximately one hour of recorded native spoken English or Spanish. The samples (recorded mostly from BBC Radio 4 for English, and BBCMundo and Cadena Ser for Spanish) were made up of 18 interview fragments in English and 17 interview fragments in Spanish. The two samples were matched in terms of topics discussed, which came from the domains of politics, science, art, etc. The genre of most of the discourse can be described as semi-formal, semi-spontaneous.

Respondents were asked to listen to the recorded interviews once and to write down each word string they considered to be a standard formulaic sequence. They were asked to write down all instances, including phrases which happened to recur in the given fragment. This implies that the experiment aimed to chart respondents' chunk perception. All participants were given the same instructions and information as to what constitutes a phrase. They were presented with auxiliary identification criteria, such as standardization and semi-fixedness.

For a study that depends on a satisfactory degree of inter-subjective agreement, it is indispensable that sufficient numbers of respondents be involved. Since the experiment is still running, we can only present results based on the data obtained from the judges who have so far submitted their protocols: eight judges for the English sample (four natives and four teachers) and seven judges for the Spanish sample (three natives and four SFL teachers). The mean number of chunk instances perceived by the respondents in the complete recording, i.e. all interview fragments, is given for both languages in Table 2. On average, respondents seem to have perceived slightly more chunk instances in the Spanish sample than in the English sample. If we convert these results into the average number of formulaic sequences recognised per minute, we obtain a ratio of 5.78 chunks/minute (SD 2.42) for English, compared to 6.43 chunks/minute (SD 2.16) for Spanish. As expected, due to the fuzziness of the linguistic concept, the observed standard deviations in both samples are considerable, but comparable for both groups of respondents. Application of the Mann-Whitney U statistic reveals no significant difference between the two groups of respondents' counts. This means that if we rank all respondents' counts, the low and high counts are evenly distributed across the two groups of English and Spanish judges.

Table 2: Average numbers of word strings written down by respondents.

| Audio sample | Mean of all chunk moments (N = 8 English; N = 7 Spanish) | Standard deviation | Mean chunks per minute |
|---|---|---|---|
| English | 292.69 | 122.66 | 5.78 |
| Spanish | 310 | 109.40 | 6.12 |
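The rank-based comparison of the two groups of judges can be illustrated with a small pure-Python sketch of the Mann-Whitney U statistic. The per-judge counts below are invented for illustration (the individual counts are not reported here), and the function names are hypothetical. The sketch only computes the two U values; for samples as small as these, the smaller U would normally be compared against a table of critical values.

```python
def average_ranks(values):
    """Return 1-based ranks for values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared_rank = (pos + end) / 2 + 1  # mean of 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared_rank
        pos = end + 1
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) for two independent samples, using average ranks for ties."""
    n_a, n_b = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b


if __name__ == "__main__":
    # Invented chunk counts for eight English and seven Spanish judges.
    english_counts = [150, 210, 260, 300, 320, 350, 400, 480]
    spanish_counts = [180, 240, 290, 310, 330, 380, 440]
    u_english, u_spanish = mann_whitney_u(english_counts, spanish_counts)
    print(u_english, u_spanish, min(u_english, u_spanish))
```

When low and high counts are spread evenly over the two groups, as described above, the two U values lie close to each other and to half the product of the group sizes.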


As mentioned above, the identification of standardized phrases cannot be fully objective, but one should nevertheless aim at a satisfactory degree of inter-subjectivity. It appears that respondents for both samples were equally consistent in their identification of formulaic sequences. This equivalence in inter-respondent consistency is also reflected in the means of the correlation coefficients, which were .688 for English (SD .093) and .652 for Spanish (SD .119). Given the small size of the current database, it would be premature to draw any strong conclusions yet. Still, there seems to be no evidence so far in support of the claim that English is exceptionally idiomatic (and therefore better suited to a language teaching method based on a Lexical Approach). In fact, slightly more chunks were signalled by our respondents in the Spanish sample than in the English sample. It should be pointed out that, even if respondents were fairly consistent in their chunk perception, this does not mean that the same chunks were consistently noticed by all respondents. These quantitative results should therefore be complemented with a qualitative analysis of the identified chunks, aimed at assessing the overlap in chunk recognition between the different respondents within each language sample.

4. IMPLICATIONS FOR THE APPLICABILITY OF A LEXICAL APPROACH

Neither the corpus-based comparison of the frequency of occurrence of figurative idioms nor the experimental study on the perceived prevalence of prefabricated lexical chunks has yielded any evidence in support of the popular notion that English is an exceptionally idiomatic language. Both comparative studies reported in this paper indicate that Sinclair's Idiom Principle is at least as pervasive in Spanish as it is in English discourse. In fact, there is little theoretical ground for believing that some languages might be more idiomatic than others, either at the level of frequency of idioms or at the level of phraseology at large.

Firstly, according to Cognitive Linguistics at least, figurative idioms are the result of general and universal cognitive processes of figurative thought that enable human beings to communicate about non-tangible things and processes (Lakoff & Johnson, 1980; Lakoff, 1987). Whenever discourse revolves around abstract concepts, metaphors and figurative expressions are bound to be used, as they help us understand those concepts by making reference to more concrete experience. Given their similar conceptual and communicative needs, there seems to be little reason to believe that linguistic communities would differ significantly with regard to the overall volume or intensity of use of their idiom repertoires.

Secondly, research in applied psycholinguistics, although mainly based on English, suggests that prefabricated multiword chunks in general play a crucial role in enabling language users to process language in real time, i.e. without time-consuming planning or monitoring (e.g. Ellis, 2002; Skehan, 1998). We may assume that native speakers of all languages resort to a stock of ready-made phrases to process their language, especially under real-time conditions.

If there is neither ground nor evidence for believing in quantitative differences between languages with regard to idiomaticity (whether defined narrowly or broadly), then there would seem to be no reason to believe that a pedagogical approach that highlights the importance of phraseology would be less relevant to languages other than English, such as Spanish.


4.1 Pedagogical effectiveness of a Lexical Approach in EFL vs. SFL

However, English and Spanish have different typological properties, e.g. in terms of relative flexibility of word order and degree of inflection, which may have repercussions on the learning process. So, although a Lexical Approach certainly seems relevant for the teaching and learning of Spanish (or probably any natural language), its pedagogical effectiveness should not be taken for granted across the linguistic board. The greater importance of inflection in Spanish may create an extra challenge for learners, either because they have to apply more procedural knowledge of grammar to adapt the phrases they have learnt to fit the syntactic context, or because they need to commit to memory many more ready-made variants of the same canonical forms. It may thus take a comparatively greater amount of practice and exposure for a learner of Spanish to start using L2 phraseology accurately and fluently. In practice, this may mean that the benefits of a Lexical Approach for learners' real-time language production may become noticeable (in terms of correlations with, for example, accuracy and fluency ratings) earlier in the process of learning English than in that of learning Spanish.

An empirical comparison of the relative pedagogical effectiveness of a Lexical Approach to English and to Spanish as foreign languages is currently being carried out by means of two controlled experiments with a pre-test/post-test design. The experiments involve two equivalent groups of learners, one of English and one of Spanish as an L2, who are matched in terms of level of instruction and oral proficiency. Within the scope of the same proficiency course, each language group is split up into two further equivalent groups, i.e. an experimental group and a control group. Both groups are presented with the same teaching material, but whereas the instructional method in the experimental group is phraseology-oriented, the control group is exposed to more traditional, grammar-oriented instruction. All students have been pre-tested on the basis of two oral tasks and will be subjected to a similar post-test at the end of the instruction. Students' oral performance is evaluated by three trained blind judges, i.e. English or Spanish native speakers and/or language teachers, according to a 15-point scale associated with Common European Framework of Reference (CEFR) level descriptors for the oral proficiency parameters fluency, complexity and accuracy. In order to correlate students' perceived oral proficiency on these parameters with their appropriate use of multiword expressions, three other trained blind judges, again English or Spanish native speakers and/or language teachers, identify the phrases or multiword expressions in the students' recordings.

These experiments address the following research questions. Does the use of multiword units positively influence learners' perceived oral proficiency in the target languages? If so, precisely what qualitative dimension(s) of spoken language does it affect: does it help the L2 learners come across as more fluent, as having a wider range of expression or complexity, and/or as producing more accurate language? If the use of multiword units positively influences perceived oral proficiency, is this influence equally strong in different L2s, in our case English and Spanish? Or do language-typological variables play a part in the relative impact of phrasal knowledge? The results of this ongoing experiment, and the answers to these questions, will be presented and discussed in a future publication.

REFERENCES

Alonso Raya, R. (2003). Algunas aplicaciones del enfoque léxico. Mosaico. Revista de Difusión para la Promoción y Apoyo a la Enseñanza del Español 11, 9-14.
Buitrago Jiménez, A. (1997). Diccionario de dichos y frases hechas. Madrid: Espasa-Calpe.
Boers, F., Eyckmans, J., Kappel, J., Stengers, H. and Demecheleer, M. (2006). Formulaic sequences and perceived oral proficiency: putting a lexical approach to the test. Language Teaching Research 10 (3), 245-261.
Ellis, N. (2002). Frequency effects in language processing. A review with implications for theories of implicit and explicit language acquisition. Studies in Second Language Acquisition 24, 143-188.
Giménez, E. (1998). Del dicho al hecho. Argentina: Ed. San Pablo.
Gómez Molina, J.R. (2003). Las unidades léxicas: tipología y tratamiento en el aula de ELE. Mosaico. Revista de Difusión para la Promoción y Apoyo a la Enseñanza del Español 11, 4-9.
Grant, L. and Bauer, L. (2004). Criteria for re-defining idioms: Are we barking up the wrong tree? Applied Linguistics 25, 38-61.
Lakoff, G. (1987). Women, Fire and Dangerous Things: What Categories Reveal about the Mind. Chicago/London: University of Chicago Press.
Lakoff, G. and Johnson, M. (1980). Metaphors We Live By. Chicago/London: University of Chicago Press.
Lewis, M. (1993). The Lexical Approach. The state of ELT and a way forward. Hove: LTP.
Nattinger, J. R. & DeCarrico, J. S. (1992). Lexical phrases and language teaching. Oxford: OUP.
Pawley, A. & Syder, F. H. (1983). Two puzzles for linguistic theory: nativelike selection and nativelike fluency. In Richards, J.C. and Schmidt, R.W. (Eds.), Language and communication (pp. 191-225). London: Longman.
Sinclair, J. (1991). Corpus, concordance, collocation. Oxford: OUP.
Sinclair, J. & Moon, R. (Eds.) (2002) (2nd ed.). Collins Cobuild Dictionary of Idioms. Glasgow: Harper Collins.
Skehan, P. (1998). A Cognitive Approach to language learning. Oxford: OUP.
Speake, J. (Ed.) (1999). Oxford Dictionary of Idioms. Oxford: Oxford University Press.
Stengers, H. (in press). Is English exceptionally idiomatic? Testing the waters for a Lexical Approach to Spanish. In Boers, F., Darquennes, J. & Temmerman, R. (Eds.), Multilingualism and applied comparative linguistics, volume one: pedagogical perspectives. Cambridge: Cambridge Scholar Press.
Wray, A. (2002). Formulaic language and the lexicon. Cambridge: CUP.

SHORT-TERM CHANGES IN COMPLEXITY, ACCURACY AND FLUENCY: DEVELOPING PROGRESS-SENSITIVE PROFICIENCY TESTS

Alan Tonkyn
University of Reading, England

1. INTRODUCTION: THE PROBLEM OF MEASURING PROGRESS

At the beginning of the 1990s, Charles Alderson, reviewing work in language testing to date and looking to the future, argued that the development of progress-sensitive tests was one of the major tasks for the language testing profession (Alderson, 1991). However, the elusiveness of this type of measure was shown by the fact that in 2000 Alderson was still campaigning for measures which could chart gains by learners on specific programmes, and on high-stakes English for academic purposes programmes in particular:

"I would argue that we need to take seriously the need to measure achievement, to explore and demonstrate what can be measured, and if we cannot demonstrate achievement and progress, then we need to be clear about why not." (Alderson, 2000: 42)

Some commentators have doubted the possibility of much progress in spoken communication skills over a period such as two months, even when intensive study is involved (Lennon, 1995; Politzer & McGroarty, 1985). However, language course providers and students expect progress to be made in speaking skills over such periods, even in the case of intermediate-level (and above) learners. Therefore those involved in instruction and assessment must take up Alderson's challenge of providing appropriate progress-sensitive proficiency measures for these learning contexts. The study reported here is an attempt to respond to this challenge.

The dimensions of Complexity, Accuracy and Fluency, the focus of this volume, appear to be fruitful language features to examine with regard to L2²² proficiency development. Whether we adopt a componential model of speaking proficiency, such as that of Bachman (1990), or a process-oriented model, such as that of Levelt (1999), we can observe the simple truth embodied in Widdowson's statement: "Language learning has two sides to it: knowing and doing (competence and performance)" (Widdowson, 1990: 150).
Clearly, the 'knowing' and 'doing' elements interact in complex ways, but one could roughly link complexity and accuracy of speech with the former, and fluency with the latter. Let us now turn briefly to particular indices which have been used to operationalise these concepts.

²² L2 will be used in this paper to refer to languages additional to the mother-tongue learnt at adolescence or

2. COMPLEXITY INDICES

Grammatical complexity indices can be divided into those which deal with whole units of speech, and those which look at specific features within these units. Units of speech are typically linked to syntactic constructions, with the T-unit (Hunt, 1970), the C-unit (Loban, 1963) and the Analysis-of-Speech Unit, or AS-unit (Foster, Tonkyn & Wigglesworth, 2000) all serving as benchmarks in much previous research. Amongst the general complexity measures, a simple length metric (words/unit) has proved effective in distinguishing native from non-native speakers (Mendelsohn, 1983) and higher-rated from lower-rated non-native speakers (Halleck, 1995). Another general measure which has been frequently used is based on the number of subordinate clauses in a unit. This measure has been found to distinguish native from non-native speech in the Mendelsohn study alluded to above, and has also distinguished planned from unplanned speech in several investigations (Foster & Skehan, 1996; Mehnert, 1998; Skehan & Foster, 1997). An increase in certain types of subordination has also been observed in some longitudinal studies of spoken language development during periods of Study Abroad (Towell, 1994; Towell, Hawkins & Bazergui, 1996).

Finally, it can be noted that a number of general complexity metrics have been used in first and second language research which attempt to calculate complexity by counting the number of complexity features within a unit. A comparative review of three of these measures, namely Yngve depth, Frazier's count and the Botel, Dawkins and Granowski (BDG) syntactic complexity measure, was conducted by Cheung and Kemper (1990). All were found to be highly inter-correlated and to be superior to simple length or subordination metrics in measuring L1 grammatical complexity, in that they were seen to be sensitive to smaller structures. Cheung and Kemper concluded that the choice of which measure to use could be made according to practical considerations. This may favour the BDG metric, as being possibly the easiest to compute.
In addition to these more general measures, researchers have investigated a number of intraclausal features as possible indices of different levels of grammatical complexity. Several commentators have linked grammatical complexity and/or syntactic maturity to complexification of the Noun Phrase via modification (Akinnaso, 1982; Garman, 1990; Hunt, 1970). The ability to produce complex Verb Phrases has also been a focus of attention in a range of studies. Thus greater Verb Phrase complexity was associated with a planning condition in Foster and Skehan's (1996) investigation of task-based performance. Lennon's advanced learners increased their use of modal and catenative verbs over time (Lennon, 1987), while Towell (1994) observed similar changes in an advanced learner of French after a period in France. Finally, additions of different kinds of Adverbial have been seen as important ways of extending clause patterns in order to express or modulate clause structure (Garman, 1990: 146-147), and Lennon's (1987) longitudinal study provides some support for this, leading Lennon to refer later to the frequency of Adverbials and Prepositional Phrases as a partial indicator of complexification (Lennon, 1995: 99).

3. LANGUAGE ACCURACY INDICES

As with Grammatical Complexity, indices of Language Accuracy can be both general and specific. The argument of Albrechtsen et al. that listener irritation at error is directly predictable from the number of errors which an IL text contains, regardless of error type (Albrechtsen, Henrikson & Faerch, 1980: 394), can be used to defend more general error density measures, such as the number of words per error, the proportion of error-free units in a text or the average length of error-free units. Foster and Skehan, who have used percentage of error-free clauses as an accuracy metric in several studies, have argued that such a generalized measure of accuracy is more sensitive to detecting significant differences between experimental conditions (Foster & Skehan, 1999: 229).
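The three general error-density measures named above (words per error, proportion of error-free units, average length of error-free units) can be illustrated with a minimal Python sketch. The `ASUnit` record, the `accuracy_measures` function and the sample counts are hypothetical and introduced only for illustration; the sketch assumes that each unit has already been pruned and error-coded by hand.

```python
from dataclasses import dataclass


@dataclass
class ASUnit:
    """One coded unit of speech (hypothetical record for illustration)."""
    words: int   # word count after pruning extraneous words
    errors: int  # number of errors a coder identified in the unit


def accuracy_measures(units):
    """Return three general error-density measures for a list of units."""
    total_words = sum(u.words for u in units)
    total_errors = sum(u.errors for u in units)
    error_free = [u for u in units if u.errors == 0]
    return {
        # Higher values mean fewer errors per stretch of speech.
        "words_per_error": (
            total_words / total_errors if total_errors else float("inf")
        ),
        # Share of units that contain no error at all.
        "error_free_proportion": len(error_free) / len(units),
        # Average length, in words, of the error-free units.
        "words_per_error_free_unit": (
            sum(u.words for u in error_free) / len(error_free)
            if error_free else 0.0
        ),
    }


# Invented sample: four units with 36 words and 3 errors in total.
sample = [ASUnit(8, 1), ASUnit(6, 0), ASUnit(12, 2), ASUnit(10, 0)]
measures = accuracy_measures(sample)
print(measures)
```

Because all three measures are computed from the same two counts per unit, they tend to move together; that is one reason a single generalized measure may be preferred when comparing conditions.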


The intuitive appeal of the idea that different error types will have different degrees of gravity is one factor behind researchers' decisions to examine particular deviations from standard usage. However, as is well known, attempts to establish a hierarchy of error gravity have produced conflicting results, and Fulcher's (1993) attempt to link error types with levels of L2 oral proficiency failed to produce effective predictions of overall ratings. Within the grammatical area, there is a measure of agreement that Verb Phrase errors are regarded as more serious than those in the Noun Phrase (Chastain, 1981; Guntermann, 1978; Horner, 1987; Politzer, 1978; Rifkin, 1995), but the Chastain study found that word errors in the Noun Phrase were regarded as more serious than form or word errors in the Verb Phrase, which makes glib assertions about NP and VP errors difficult. Finally, it can be noted that Lennon (1995) has urged that a measure of lexical accuracy is a valuable complement to measures of lexical range in assessing short-term proficiency gains.

4. FLUENCY MEASURES

Fluency, in Lennon's (2000) sense of lower-order or narrow fluency, can be seen to be related to processing abilities whereby knowledge is accessed and mobilised with different degrees of automaticity and hence speed. Lennon has divided this type of fluency into two types, namely:
- temporal fluency, which can be measured by rate of speaking, the length of fluent runs between pauses of a standard length, and frequency, length and placement of pauses;
- vocal fluency, distinctive features of which will be false starts, reformulations and functionless repetitions.

Rate of speaking has frequently been found to be associated with judgements of fluency (Connors, 1983; Kormos & Dénes, 2004; Van Gelderen, 1994), and has been shown to increase over time in longitudinal L2 studies (Lennon, 1990; Towell, 2002; Towell et al., 1996). Similarly, length of fluent run has been found to distinguish intermediate from advanced learners (Kormos & Dénes, 2004), and to improve over time (Lennon, 1990; Towell, 2002). Amounts of pausing have been measured in various ways; an overall measure that has proved of value in links with judgements of fluency, as well as in longitudinal studies, is the ratio of phonation to total speaking time, sometimes called the silence ratio (Kormos & Dénes, 2004; Lennon, 1990; Temple, 2000; Towell et al., 1996). It may also be surmised that weighty pauses will be strong indicators of disfluency, and several researchers report links between reductions in numbers of pause clusters (combinations of silent and filled pauses) and different levels of fluency or developments in fluency over time (e.g. Riggenbach, 1991; Towell, 1987).

It is well established that, in normal L1 speech, pauses tend to be placed at grammatical junctures, typically at clause boundaries, or possibly before adverbials which are less integrated in the clause (Garman, 1990). Speech with this feature has been reported to be perceived as more fluent (Butcher, 1980) and to distinguish L2 learners at different levels (Riggenbach, 1991) or L1 from L2 speakers (Deschamps, 1980). Finally, recalling Fillmore's definition of fluency as the ability to fill time with talk (Fillmore, 1979), one may wish to include some measure of productivity as an index of fluency. Ejzenberg (2000: 293) associates a tendency to speak more with higher fluency, and Kormos and Dénes (2004) found that productivity correlated with teachers' ratings of fluency. In an interview, length of turn may be a useful measure of this feature.


Lennon's vocal fluency features have not proved unequivocal markers of different degrees of fluency, though they are useful complements to the temporal indices outlined above. Deese (1980: 80) has commented that hearers find speech which is dense with false starts and self-corrections unpleasant and difficult to listen to, but the Kormos and Dénes study (2004) failed to show a link between such features and teachers' fluency judgements, while some studies have found that L2 learners may not gain much in this regard over time (Lafford, 1995; Lennon, 1990). On the other hand, Deese's statement receives some support from other research on perceptions of fluency (Albrechtsen et al., 1980; Riggenbach, 1991; Van Gelderen, 1994), and it may be that a threshold is operating: below that threshold, these vocal disfluencies are seen as normal monitoring behaviour; above it, they are disfavoured as obstructive. A global measure of the extent of this kind of vocal disfluency or 'maze' is the proportion in the spoken text of extraneous words, that is, words which are involved in false starts, reformulations or functionless repetitions (Vann, 1979).

5. RESEARCH QUESTIONS

The indices of grammatical complexity, language accuracy and fluency outlined above suggest ways of dealing with the challenge of measuring short-term gains in speaking proficiency. In the study reported here, the following questions were asked:
1. What changes in the oral proficiency of instructed intermediate/upper intermediate learners of English as L2 occur during a typical intensive EAP course?
2. How are objective measurements of the performances of such learners related to subjective ratings by experienced judges?

6. SUBJECTS

The subjects were 24 postgraduate students studying on the 10-week summer Reading Pre-sessional English course for intending university matriculants. They constituted an opportunistic sample²³, meeting, inter alia, the following key conditions:
a) They had not studied on the pre-sessional course prior to the summer period;
b) They had ELTS M3²⁴ (speaking) scores band 6 on the 9-band scale prior to coming to the UK.

There were 5 females and 19 males, with a median age of 30. 10 came from South Asia, 5 from East and South-East Asia, 5 from North Africa, 3 from South America, and 1 from Europe.

²³ Although the subjects were an opportunistic sample, they could be seen as representative of the wider pre-sessional course group in terms of English proficiency. A comparison of scores achieved by the subject group (n = 23 on this occasion) and the non-subject group (n = 26) on the mid-course Test of English for Educational

²⁴ The ELTS test was the precursor to the current IELTS test of English proficiency for academic purposes.

7. THE DATA FOR THE STUDY

7.1 Interview data

The subjects were interviewed three times by the researcher during the pre-sessional programme, and substantial parts of the audio-recordings of the first and third interviews (henceforth interviews 1 and 2) were used as the data for this study. A period of approximately 9 weeks intervened between the interviews, involving about 210 hours of classroom work, plus additional homework. Interviews 1 and 2 were designed to be broadly parallel in theme, with each being divided into two sections dealing respectively with each subject's academic discipline and with their English language-learning experience at home (first interview) or on the pre-sessional course (final interview). Prior to each interview, the subjects had completed questionnaires which provided a standard basis for the discussion.

The tape-recorded interviews were orthographically transcribed. The subjects' output was then segmented into Analysis-of-Speech units (AS-units), following the principles outlined in Foster et al. (2000). For the purpose of this study, a Level 3 analysis, as defined by Foster et al., was adopted, excluding minor utterances (e.g. 'yeah', 'thanks'), verbatim echoes and certain verbless turn-initial units such as elliptical responses. The shortest interview produced 66 AS-units of the Level 3 standard defined above. Accordingly, 66 AS-units were selected from each of the interviews (n = 48) for analysis, with 33 taken from the first half of the interview, focusing mainly on the subject's academic subject, and 33 from the second half, focusing mainly on the subject's learning and use of English.

7.2 Grammatical complexity measures

The 66 AS-units from each subject were then analysed to produce a measure for each subject of the number of the following features in the sample, which reviews of previous research had suggested might prove fruitful indices of grammatical complexity:

Overall complexity measures:
a. Words;
b. Botel Dawkins Granowski (BDG) Syntactic Complexity count;
c. Subordinate clauses;

Phrase-level complexity measures:
d. Noun Phrase Premodifications;
e. Noun Phrase Postmodifications;
f. Primary Auxiliaries;
g. Modal Auxiliaries;
h. Catenative Verbs;
i. Adverbial: Adverbs;
j. Adverbial: Prepositional Phrases.

All these measures were based on pruned versions of the transcripts, that is, after the removal of extraneous words involved in false starts, reformulations and functionless repetitions.

7.3 Language Accuracy measures

Using the pruned transcripts, the following language accuracy measures were calculated:
a. Words / error;
b. Error-free AS-units / Total AS-units;
c. Words / Verb phrase error (i.e. inflection error, or auxiliary omission/misuse);
d. Words / Noun phrase error (i.e. number/case error; determiner omitted or misused);
e. Words / Syntactic error (i.e. word order; omission/misuse of contextually necessary clause or phrase element);
f. Words / Lexical error (i.e. wrong choice of open class word);
g. Words / error-free AS-unit.

As a check on the reliability of the researcher's judgement, the error analysis of 5% of every subject's output (i.e. 7/132 AS-units), randomly selected, was subjected to validation by an experienced applied linguist. The researcher's error analysis of 86% of these cases was judged completely acceptable by the validator, with a further 12.9% being judged possibly acceptable. On this basis, the researcher's error count was used in the analysis of the data.

7.4 Fluency measures

Finally, the tape recordings of the interview excerpts were subjected to analysis and a range of detailed measures of fluency, covering both its temporal and vocal aspects, as defined by Lennon (1990), were computed. The audio-recorded output was analysed on computer using the Macintosh-based speech analysis program Signalyze (version 3.12) (Keller, 1994) in order to provide accurate measures of speech rate and pause lengths (silent and filled). A minimum pause length of 0.3 seconds was established, the level used by Raupach (1980) and the minimum level for Riggenbach's hesitation pause (Riggenbach, 1991). Fluent runs were thus defined as runs of speech between pauses (silent or filled) of at least 0.3 seconds. In addition, minimal syntactic units, or text units (Garman, 1989), were identified in the transcripts for the purposes of establishing whether pauses were occurring at grammatical junctures or within grammatical units. Pause clusters were defined, following Towell (1987; 2002), as combinations of silent + filled + silent pauses. Finally, the proportion of words in each transcript which were extraneous, that is, involved in false starts, reformulations or functionless repetitions, was calculated.

On this basis, the following measures were computed for each subject's performance in the interviews:
a. Rate of speaking (all syllables and pruned syllables): syllables / minute;
b. Length of fluent runs (in syllables): all syllables and pruned syllables;
c. Phonation time / Total speaking time;
d. Proportion of total (silent and filled) pause time at text unit boundaries;
e. Turn length (average): AS-units;
f. Non-extraneous words / Total words;
g. Pause clusters (/ 66 AS-units).
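The temporal measures (a) to (c) can be sketched in Python as follows. This is a minimal illustration only, not the procedure carried out with Signalyze: the segment format, the function name `fluency_measures` and the sample timings are assumptions, and the sketch further assumes that pauses shorter than the 0.3-second minimum neither break a fluent run nor count as pause time.

```python
MIN_PAUSE_S = 0.3  # minimum pause length adopted in the study


def fluency_measures(segments, min_pause=MIN_PAUSE_S):
    """Compute three temporal fluency measures.

    segments: list of (kind, duration_s, syllables) tuples, where kind is
    "speech", "silent" or "filled" (a hypothetical annotation format).
    """
    total_time = sum(dur for _, dur, _ in segments)
    total_syll = sum(syll for kind, _, syll in segments if kind == "speech")

    pause_time = 0.0
    runs = []     # syllable counts of completed fluent runs
    current = 0   # syllables in the run currently being built
    for kind, dur, syll in segments:
        if kind == "speech":
            current += syll
        elif dur >= min_pause:
            # A silent or filled pause at or above the threshold ends a run.
            pause_time += dur
            if current:
                runs.append(current)
            current = 0
        # Shorter pauses are ignored: the run simply continues.
    if current:
        runs.append(current)

    return {
        "syllables_per_minute": total_syll / total_time * 60,
        "mean_fluent_run": sum(runs) / len(runs) if runs else 0.0,
        "phonation_ratio": (total_time - pause_time) / total_time,
    }


# Invented sample: 18 syllables in 6 seconds, two pauses above threshold.
sample = [
    ("speech", 2.0, 8), ("silent", 0.5, 0), ("speech", 1.0, 4),
    ("silent", 0.2, 0), ("speech", 1.5, 6), ("filled", 0.8, 0),
]
result = fluency_measures(sample)
print(result)
```

In the sample, the 0.2-second silence does not end a run, so the second run contains 4 + 6 = 10 syllables. Running the same function on pruned syllable counts would give the pruned variants of measures (a) and (b).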

7.5 Subjective Rating data


Data on judges' perceptions of the subjects' performances were provided by ratings of the audio-recordings made by a panel of four experienced raters trained by the University of Cambridge Local Examinations Syndicate to assess performances on the IELTS tests of speaking and writing. Two versions of the then current IELTS speaking rating scale were used in this assessment. One was the standard holistic 9-band global scale then used for the IELTS speaking module, and the other was a specially adapted analytical version of the scale, which was named the Oral Profile Rating Scale (hereafter OPR Scale; see APPENDIX). It is worth noting that UK universities will typically require an overall band of 6.5-7 for matriculants in most disciplines. Those achieving an overall band of 5-5.5 will typically be recommended by universities to take a pre-entry English course of 2-3 months. Three of the ratings provided by the raters were used in the study reported here, namely:
- Grammatical Complexity (OPR Scale B);
- Language Accuracy (OPR Scale D);
- Fluency (OPR Scale C).

The raters were also asked to indicate, on their rating pro-forma, the features of each performance which influenced their grading decisions. The recordings of all 48 interview excerpts were placed in two different randomised orders, with Version A given to raters 1 and 2, and Version B to raters 3 and 4. (This randomised presentation of the taped data was designed to ensure that assessors would have to provide independent ratings of the same subject's two performances, and would not be influenced by an assumption that the end-of-course interview must be better.) Statistical analysis of the ratings showed no effect of the Version used on rating behaviour.

In order to establish the extent of inter-rater agreement, a table of Perfect Agreements, Acceptable Disagreements (1 band or less), and Total Disagreements (>1 band) was drawn up for all the possible pairings of raters based on Overall band scores awarded, following the model of Barnwell's (1987) study of ratings on the 9-level ACTFL scale. This gave the following overall results, which are very similar to those reported by Barnwell: Total agreement: 42.7%; Acceptable disagreement: 47.2%; Total disagreement: 10.1%.

In order to establish the degree of individual rater consistency, the ratings were subjected to a multi-faceted Rasch analysis, with the 24 subjects, four raters, two occasions (pre- and post-course scores), and five types of rating (listed above) as the facets. This analysis was undertaken to enable the researcher to establish whether the average grades awarded by each rater (the observed grades) were close to a fair average or not. The resulting measurement report confirmed that there was significant disagreement amongst the raters, but that they appeared to be rating consistently, as was seen from the closeness of the observed and fair averages in each case. This was confirmed by the acceptable range of Infit and Outfit Mean Square figures, which were within the range (0.5-1.5) recommended by Lunz and Wright (1997) for situations where rater consistency rather than total agreement is acceptable.

On the basis of this finding it was decided to use the (observed) arithmetical mean of the raters' band scores as the subjective rating band in other calculations.

7.6 Analytical procedures

To answer Research Question 1, the statistical significance of changes in subjects' performances on all measures of Grammatical Complexity, Language Accuracy and Fluency was determined by analysis using Wilcoxon's Matched-Pairs Signed-Ranks test.

To answer Research Question 2, two groups were formed based on average ratings for each subject in each of the three speech areas for the first interview²⁵: a Lower level group of subjects with ratings ≤ 5.25 and an Upper level group with ratings ≥ 5.75.²⁶ The tape- and transcript-based measures for these ratings-based Lower and Upper groups relevant to each of the four specific parameters were then compared for Interview 1 using the Mann-Whitney U test for independent samples. This was to enable the researcher to see which features were most significantly different across two adjacent proficiency groups. These features would, it was hypothesised, be most useful as indicators of progress.

Finally, to assess the influence of features of L2 speech on judges' perceptions from another angle, the raters' open-ended comments on their rating decisions were examined. Special attention was paid to anomalous cases, where the judges' verdicts appeared at odds with the transcript-based evidence.

8. RESULTS

8.1 Gains in Complexity, Accuracy and Fluency

The results of the statistical analysis of gains in Grammatical Complexity are shown in Table 1 below. The results in Table 1 indicate that gains according to subjective ratings were disappointing, with 12 subjects achieving a minimum gain of 0.25 band on averaged Grammatical Complexity ratings (8 achieved 0.5 band). However, several complexity features showed statistically significant advances, with all the more general metrics (Words, Subordinate clauses and the BDG complexity measure) falling into this group. Modal and Catenative verbs (Lennon's Co-verbs) also showed significant gains, as did the use of Adverbs as clause elements.
²⁵ These groups were only examined for interview 1, as the Lower Group (Band 5) became unacceptably small for some analyses in Interview 2 (e.g. n = 4 for Lexical Range).

²⁶ Cambridge ESOL has estimated the Standard Error of Measurement for its speech ratings as 0.46 of a band (see 2005 test data at: http://www.ielts.org/teachersandresearchers/analysisoftestdata/article234.aspx). The interval between the two groups thus established, although it would entail the loss of some subjects' data, would therefore mean a high likelihood (over 70%) that they constituted genuinely distinctive proficiency levels, with the lower group being in or close to the Band 5 range and the upper group being in or close to the Band 6 range.

Table 1: Summary of measures: Grammatical Complexity gains (n = 24 at each interview)

| Measure | Int. 1 Mean | Int. 1 s.d. | Int. 1 Median | Int. 2 Mean | Int. 2 s.d. | Int. 2 Median | Subjects gaining by >5%²⁷ (n) | Wilcoxon Signed-Ranks z | p (2-tailed) |
|---|---|---|---|---|---|---|---|---|---|
| Gr C rating | 5.50 | 0.77 | 5.57 | 5.68 | 0.69 | 5.57 | 12 | -1.588 | .112 |
| Words | 598.29 | 82.93 | 600.00 | 673.08 | 81.57 | 660.50 | 17 | -3.329 | .001 *** |
| Subord. Cl.s | 23.88 | 9.99 | 23.00 | 39.00 | 11.53 | 36.50 | 20 | -4.189 | <.001 *** |
| Premod.s | 61.83 | 19.45 | 60.50 | 53.88 | 12.62 | 55.50 | 7 | -2.187 | .029 * † |
| Postmod.s | 24.50 | 10.79 | 22.50 | 23.25 | 9.12 | 21.00 | 10 | -0.746 | .455 |
| Primary Aux.s | 10.42 | 5.62 | 10.50 | 11.17 | 5.46 | 10.00 | 13 | -1.028 | .304 |
| Modal Aux.s | 9.08 | 4.74 | 9.00 | 17.83 | 6.74 | 18.00 | 23 / 23 | -4.623 | <.001 *** |
| Catenative Vbs | 2.29 | 2.85 | 1.00 | 3.58 | 3.37 | 2.50 | 14 | -2.026 | .043 * |
| A: Adverbs | 27.42 | 7.86 | 28.00 | 35.21 | 8.52 | 34.00 | 22 | -3.918 | <.001 *** |
| A: Prep. Phr.s | 42.83 | 10.74 | 44.50 | 39.46 | 10.91 | 39.50 | 7 | -1.258 | .208 |
| BDG | 192.04 | 40.27 | 200.00 | 225.92 | 33.70 | 228.50 | 17 | -3.443 | <.001 *** |

(* Significant at the p<.05 level; ** significant at the p<.01 level; *** significant at the p<.001 level; † significant in a negative direction)
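The kind of paired comparison summarised in Table 1 can be sketched with a small, self-contained Python implementation of the Wilcoxon matched-pairs signed-ranks test. This is an illustration under stated assumptions, not a reconstruction of the analysis reported here: it uses the large-sample normal approximation without tie or continuity corrections, the function names are hypothetical, and the scores are invented, so its output should not be expected to reproduce the tabulated z values.

```python
import math


def wilcoxon_signed_rank(before, after):
    """Return (W, z, two-tailed p) using the normal approximation."""
    # Zero differences carry no information about direction and are dropped.
    diffs = sorted((a - b for b, a in zip(before, after) if a != b), key=abs)
    n = len(diffs)
    if n == 0:
        return 0.0, 0.0, 1.0

    # Rank the absolute differences, giving tied values their average rank.
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[j + 1]) == abs(diffs[i]):
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)  # the smaller rank sum is the test statistic

    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed normal probability
    return w, z, p


def count_gainers(before, after, threshold=0.05):
    """Count subjects whose score rose by more than the given proportion."""
    return sum(1 for b, a in zip(before, after) if a > b * (1 + threshold))


# Invented subordinate-clause counts for eight subjects at two interviews.
interview_1 = [20, 25, 18, 30, 22, 27, 19, 24]
interview_2 = [26, 29, 17, 38, 25, 34, 21, 29]
w, z, p = wilcoxon_signed_rank(interview_1, interview_2)
gainers = count_gainers(interview_1, interview_2)
print(w, round(z, 3), round(p, 3), gainers)
```

Because the statistic is the smaller of the two rank sums, z is negative whenever the sample departs from the null hypothesis in either direction, which is why the direction of a change has to be read from the means and medians rather than from the sign of z.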

Table 2 below gives summary results for gains according to the Language Accuracy measures.
Table 2: Summary of measures: Language Accuracy gains (n = 24 at each interview)

| Measure | Int. 1 Mean | Int. 1 s.d. | Int. 1 Median | Int. 2 Mean | Int. 2 s.d. | Int. 2 Median | Subjects gaining by >5% (n) | Wilcoxon Signed-Ranks z | p (2-tailed) |
|---|---|---|---|---|---|---|---|---|---|
| Lang. Accuracy rating | 5.71 | 0.70 | 5.82 | 5.67 | 0.61 | 5.63 | 8 | -.296 | .767 |
| Words/error | 6.39 | 2.10 | 6.01 | 7.61 | 2.80 | 6.71 | 19 | -3.543 | <.001 *** |
| Error-free AS-units/Total | 0.27 | 0.11 | 0.23 | 0.31 | 0.11 | 0.28 | 13 | -2.557 | .011 * |
| Words/VP error | 29.90 | 18.78 | 24.62 | 55.44 | 35.98 | 47.96 | 17 | -3.714 | <.001 *** |
| Words/NP error | 24.08 | 10.12 | 22.30 | 28.62 | 14.26 | 24.33 | 19 | -2.800 | .005 ** |
| Words/Syntax error | 51.43 | 26.67 | 51.43 | 62.71 | 63.75 | 42.76 | 17 | -.514 | .607 |
| Words/Lexical error | 56.83 | 22.46 | 52.28 | 57.92 | 24.65 | 51.02 | 12 | .000 | 1.000 |
| Words/Error-free AS-unit | 7.03 | 0.91 | 7.18 | 7.62 | 1.02 | 7.40 | 12 | -2.114 | .034 * |

(* Significant at the p<.05 level; ** significant at the p<.01 level; *** significant at the p<.001 level.)

²⁷ For the ratings, a minimum of 0.25 of a band was used.

Yet again, accuracy was not perceived to improve significantly for the majority of the learners: 8 students achieved average minimum band gains of 0.25, and only 5 achieved a 0.5 band minimum gain. However, the transcripts reveal modest but significant improvements in overall error density (Words/error; Error-free AS-units / Total AS-units), in the ability of learners to construct longer units without errors (Words / Error-free AS-unit), and in the frequency of Noun Phrase errors (Words / NP error). The most striking improvement is in the frequency of Verb Phrase errors (Words / VP error), which may in part be due to constraints on the number of Verb Phrases in each unit, with productivity in words overall outstripping increases in the number of VPs in the sample.

Table 3 below summarises subjects' gains, assessed subjectively and objectively, in Fluency. Yet again, only a minority of the subjects convinced the raters that their band level had changed (10 and 7 students at the 0.25 and 0.5 band minima respectively). Rather surprisingly, with regard to the objective measures, significant increases (still relatively modest) were only recorded for the two Fluent Runs metrics and for length of Turn.


Table 3: Summary of measures: Fluency gains (n = 24 at each interview)

| Measure | Int. 1 Mean | Int. 1 s.d. | Int. 1 Median | Int. 2 Mean | Int. 2 s.d. | Int. 2 Median | Subjects gaining by >5% (n) | Wilcoxon Signed-Ranks z | p (2-tailed) |
|---|---|---|---|---|---|---|---|---|---|
| Fluency rating | 5.68 | 0.9 | 5.63 | 5.76 | 0.70 | 5.75 | 10 | -.520 | .603 |
| Sp. rate (all) (syll.s/min.) | 167.76 | 25.41 | 165.60 | 171.98 | 23.72 | 178.20 | 10 | -1.314 | .189 |
| Sp. rate (pruned) (syll.s/min.) | 149.50 | 29.45 | 147.9 | 151.50 | 23.05 | 157.5 | 9 | -.654 | .513 |
| Fluent runs (all) (syll.s) | 6.58 | 1.29 | 6.55 | 7.43 | 1.47 | 7.25 | 16 | -3.029 | .002 ** |
| Fluent runs (pruned) (syll.s) | 5.89 | 1.34 | 5.72 | 6.54 | 1.26 | 6.45 | 15 | -2.672 | .008 ** |
| Phonation/Time | 0.67 | 0.06 | 0.65 | 0.69 | 0.07 | 0.71 | 12 | -1.688 | .091 |
| Pause time inter-t-u/total | 0.64 | 0.08 | 0.64 | 0.58 | 0.08 | 0.59 | 1 | -3.458 | .001 ** † |
| Turn length (AS-units) | 3.03 | 1.20 | 2.94 | 3.97 | 1.59 | 3.41 | 17 | -2.451 | .014 * |
| Non-extr. words/total | 0.87 | 0.06 | 0.88 | 0.86 | 0.05 | 0.86 | 1 | -1.315 | .189 |
| Pause clusters | 7.69 | 7.29 | 6.5 | 6.63 | 7.33 | 5.5 | 11 | -.716 | .474 |

(† A significant difference in the opposite direction to that hypothesised.)
(* Significant at the p<.05 level; ** significant at the p<.01 level; *** significant at the p<.001 level.)

8.2 Perceptions of level
As mentioned above, two groups were formed, based on raters' average ratings for each of the three speech features under investigation. The groups comprised, respectively, subjects with average band scores for each feature of ≤ 5.25 and ≥ 5.75. (Subjects with average band scores between these points were removed from this analysis.) These two groups can be seen to be approximately in the IELTS Band 5 and Band 6 ranges respectively. Results are reported here only for those features in the tape/transcript-based analysis which were significantly different for the two groups, with the alpha level set at p<.05.

Figures 1 and 2 below show that the Grammatical Complexity features which significantly distinguished these two approximately adjacent groups (Band 5: n=11; Band 6: n=11) tended to be the more general measures, namely the overall length of their AS-units in Words, the general BDG complexity measure, and the number of subordinate clauses in the standard sample. In addition, the Band 6 group used, on average, more Primary Auxiliaries than the Band 5 group.

[Bar chart: Band 5 and Band 6 group means for Words and BDG; data labels in the source: 623, 554, 172.9, 206.6.]
Figure 1: Significant band group differences (interview 1): Overall Grammatical Complexity (Group means).

[Bar chart: Band 5 and Band 6 group means for Sub. Cl. and Primary aux.; data labels in the source: 7.2, 19.2, 13.1, 26.2.]
Figure 2: Significant band group differences (interview 1): Grammatical Complexity: Specific features (Group means).
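The formation of the two rating-based groups described above can be sketched as follows. This is only an illustration: the function names and the ratings are hypothetical, and the cut-offs are assumed to be inclusive (at or below 5.25 for the Band 5 group, at or above 5.75 for the Band 6 group).

```python
def band_group(mean_rating, low_cut=5.25, high_cut=5.75):
    """Return 'Band 5', 'Band 6', or None for a subject's mean rating on one feature."""
    if mean_rating <= low_cut:
        return "Band 5"
    if mean_rating >= high_cut:
        return "Band 6"
    return None  # between the cut-offs: excluded from the comparison


def split_by_band(mean_ratings):
    """Group subject IDs by band; subjects between the cut-offs are dropped."""
    groups = {"Band 5": [], "Band 6": []}
    for subject, rating in mean_ratings.items():
        band = band_group(rating)
        if band is not None:
            groups[band].append(subject)
    return groups


# Hypothetical mean fluency ratings for five subjects (not study data).
ratings = {"S01": 5.0, "S02": 5.25, "S03": 5.5, "S04": 5.75, "S05": 6.25}
print(split_by_band(ratings))
```

Because a separate split is made for each speech feature, the group sizes can differ from feature to feature, as the differing n values reported for complexity, accuracy and fluency show.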

The data in Figure 3 below suggest that, in relation to accuracy, the raters paid attention especially to overall error frequency (Words/error) and to Syntax (Words/syntax error) in assigning accuracy ratings. Band 6 (n = 15) subjects also significantly outperformed Band 5 (n = 7) with regard to Verb Phrase accuracy.
[Bar chart: Band 5 and Band 6 group means for Words/error, Words/VP error and Words/syntax error; data labels in the source: 63.5, 34.1, 28.2, 20.8, 4.8, 7.1.]
Figure 3: Significant band group differences (interview 1): Language Accuracy


Finally, it can be seen from Figure 4 that the most striking difference between the two ratings-based groups with regard to perceptions of fluency was in the area of Pause Clusters, which are far fewer in the case of the Band 6 group (n = 12) than in that of the Band 5 group (n = 9). In addition, however, the key temporal variables of Speaking rate and Fluent runs (both given in Figure 4 for unpruned text) also distinguished the groups. A modest but significant difference in the amount of non-extraneous output can also be noted.
[Bar chart: Band 5 and Band 6 group means for Speaking rate (sylls/min), Fluent runs (sylls), Non-extraneous words (% of total) and Pause clusters; data labels in the source: 178.4, 147.5, 84, 90, 5.7, 7.1, 13.7, 3.8.]
Figure 4: Significant band group differences (interview 1): Fluency.

8.3 Halo effects in ratings
The quantitative results for the Band 5 and Band 6 groups allow us to infer what was driving the raters' decisions with regard to proficiency levels, and what might therefore be helpful indices of progress for experienced judges. However, scholars have noted the problem of halo effects in subjective assessments of L2 speaking (e.g. Malvern et al., 2004), with contamination of ratings occurring as a rating in one area of performance influences that in another. Although the trends noted in Figures 1-4 above were strong for the majority of the learners, there were anomalous cases, where above- or below-average performance in one area of speaking (as measured objectively) was not perceived as such by the assessors, probably under the influence of other features of the speech.

Examination of the judges' open-ended comments suggested that several halo phenomena were occurring. Grammatically complex language might not be perceived as such if it involved considerable repetition of structures, or was contained in relatively short turns. Complex language might also not be recognised if it was felt to be imprecise or irrelevant to the discussion. Finally, complex language appeared to be hidden in some cases if it was produced laboriously in relatively non-fluent ways. On the other hand, fluent production of relatively short and simple AS-units might appear more complex than it actually was. Relatively sophisticated content, especially that associated with an academic discipline, also appeared to have an unduly positive effect on grammatical complexity ratings in some cases. Finally, there was at least one case where grammatical complexity and accuracy were confused, with high levels of the former masking low levels of the latter.


9. DISCUSSION

9.1 Complexity
These data suggest that the more general complexity measures (Words, Subordinate clauses, or the BDG measure of several complexity features) seem to be better progress-sensitive indices, and better aligned with judges' assessments of adjacent proficiency levels, than specific intra-clausal features, though elaboration of the Verb Phrase and the frequency of Adverb use are also possibly useful progress markers. Assessors may need guidance to discern complexity within disfluency and/or short turns, and to distinguish complexity from confident fluency or sophisticated content.

9.2 Language Accuracy
Overall Error density and Error frequency in the VP seem to be promising indices of progress, and to be aligned with judges' views of level. Syntax errors (e.g. word order errors or constituent omission) are less likely to show short-term gains, but seem very influential in judges' assessment of level, probably because syntactic errors will include omissions and disordered speech which will disturb coherence and hence comprehensibility. Assessors may need to be guided to distinguish accurate use of grammar from range of grammar displayed.

9.3 Fluency
These data showed surprisingly limited fluency gains over time, with Fluent runs and Turn length the only significant cases of short-term progress. It may be that the learners' greater ambition, realised in greater complexity, served to hold back gains in fluency. However, Speaking rate and frequencies of weighty Pause clusters appeared to be strong influences on judges' assessments of band group. The latter phenomenon, coupled with the failure of the pause placement measure to show a link with impression grade, may be evidence of the importance of the threshold effect in fluency, mentioned earlier. Thus badly placed pauses may only register strongly with hearers if they are of a certain length and thus more disruptive.

9.4 On-line rating
Some of the measures mentioned above will be of more interest to the researcher with time to spare and computer to hand than to the hard-pressed assessor judging live interviews. However, some indices, such as subordination, overall error density, syntax errors, and the presence of pause clusters, might well be incorporated into rating scales to distinguish performances within the intermediate to upper-intermediate range. At the same time, the problems of halo effects noted here suggest that, where circumstances permit, simultaneous ratings of complexity, accuracy and fluency by one rater should be abandoned in favour of separate ratings by different raters, or by a single rater listening to a recorded version three times.


REFERENCES

Akinnaso, F. N. (1982). On the differences between spoken and written language. Language and Speech 25(2), 97-125.
Albrechtsen, D., Henrikson, B. & Faerch, C. (1980). Native speaker reactions to learners' spoken interlanguage. Language Learning 30(2), 365-396.
Alderson, J. C. (1991). Language testing in the 1990s: How far have we come? How much further have we to go? In S. Anivan (Ed.), Current Developments in Language Testing. Singapore: SEAMEO Regional Language Centre.
Alderson, J. C. (2000). Testing in EAP: Progress? Achievement? Proficiency? In G. M. Blue, J. Milton & J. Saville (Eds.), Assessing English for Academic Purposes. Berne: Peter Lang.
Bachman, L. (1990). Fundamental Considerations in Language Testing. Oxford: Oxford University Press.
Barnwell, D. (1987). Who is to judge how well others speak? An experiment with the ACTFL/ETS oral proficiency scale. Paper presented at the Eastern States Conference of Linguistics, Ohio State University.
Butcher, A. (1980). Pause and syntactic structure. In H. W. Dechert & M. Raupach (Eds.), Temporal Variables in Speech: Studies in Honour of Frieda Goldman-Eisler. The Hague: Mouton.
Chastain, K. (1981). Native speaker evaluation of student composition errors. Modern Language Journal 65(3), 288-294.
Cheung, H. & Kemper, S. (1990). Complexity metrics and the production of complex sentences. Mid-America Linguistics Conference Papers, 58-70.
Connors, K. (1983). Performance measures in L2: classification and correlations. Bulletin of the Canadian Association of Applied Linguistics 5(2), 117-141.
Deese, J. (1980). Pauses, prosody, and the demands of production in language. In H. W. Dechert & M. Raupach (Eds.), Temporal Variables in Speech: Studies in Honour of Frieda Goldman-Eisler. The Hague: Mouton.
Deschamps, A. (1980). The syntactical distribution of pauses in English spoken as a second language by French students. In H. W. Dechert & M. Raupach (Eds.), Temporal Variables in Speech: Studies in Honour of Frieda Goldman-Eisler. The Hague: Mouton.
Ejzenberg, R. (2000). The juggling act of oral fluency: a psycho-sociolinguistic metaphor. In H. Riggenbach (Ed.), Perspectives on Fluency. Ann Arbor: University of Michigan.
Fillmore, C. J. (1979). On fluency. In C. J. Fillmore, D. Kempler & W. S.-Y. Wang (Eds.), Individual Differences in Language Ability and Language Behavior. New York: Academic Press.
Foster, P. & Skehan, P. (1996). The influence of planning on performance in task based learning. Studies in Second Language Acquisition 18(3), 299-324.
Foster, P. & Skehan, P. (1999). The influence of source of planning and focus of planning on task-based performance. Language Teaching Research 3(3), 215-247.
Foster, P., Tonkyn, A. & Wigglesworth, G. (2000). Measuring spoken language: a unit for all reasons. Applied Linguistics 21(3), 354-375.
Fulcher, G. (1993). The construction and validation of rating scales for oral tests in English as a foreign language. Unpublished PhD, University of Lancaster, Lancaster.
Garman, M. A. G. (1989). The role of linguistics in speech therapy assessment and remediation: assessment and interpretation. In P. Grunwell & A. James (Eds.), The Functional Evaluation of Speech Disorders. London: Croom Helm.
Garman, M. A. G. (1990). Psycholinguistics. Cambridge: Cambridge University Press.
Guntermann, G. (1978). A study of the frequency and communicative effects of errors in Spanish. Modern Language Journal 62(5/6), 249-253.
Halleck, G. B. (1995). Assessing oral proficiency: A comparison of holistic and objective measures. Modern Language Journal 79(2), 223-234.
Horner, D. (1987). The perception of error gravity by French native speakers. Franco-British Studies 3(Spring), 73-86.
Hunt, K. (1970). Syntactic Maturity in Schoolchildren and Adults. Monographs of the Society for Research into Child Development 35(1).
Keller, E. (1994). Signalyze (3.12). Lausanne: InfoSignal Inc.
Kormos, J. & Dénes, M. (2004). Exploring measures and perceptions of fluency in the speech of second language learners. System 32(1), 145-164.
Lafford, B. (1995). Getting into, through and out of a survival situation: a comparison of communicative strategies used by students studying Spanish abroad and at home. In B. Freed (Ed.), Second Language Acquisition in a Study Abroad Context. Amsterdam: John Benjamins.
Lennon, P. (1987). Second language acquisition of advanced German learners. Unpublished PhD, University of Reading, Reading.
Lennon, P. (1990). Investigating fluency in EFL: a quantitative approach. Language Learning 40, 387-417.
Lennon, P. (1995). Assessing short-term change in advanced oral proficiency: problems of reliability and validity in four case studies. ITL Review of Applied Linguistics 109-110, 75-109.
Lennon, P. (2000). The lexical element in spoken second language fluency. In H. Riggenbach (Ed.), Perspectives on Fluency. Ann Arbor: The University of Michigan Press.
Levelt, W. (1999). Producing spoken language: a blueprint of the speaker. In C. M. Brown & P. Hagoort (Eds.), The Neurocognition of Language. Oxford: Oxford University Press.
Loban, W. D. (1963). National Council of Teachers, Champaign, Illinois.
Lunz, M. E. & Wright, B. D. (1997). Latent trait models for performance examinations. In J. Rost & R. Langheine (Eds.), Applications of Latent Trait and Latent Class Models in the Social Sciences. Münster: Waxmann.
Mehnert, U. (1998). The effects of different lengths of time for planning on second language performance. Studies in Second Language Acquisition 20(1), 83-108.
Mendelsohn, D. J. (1983). The case for considering syntactic maturity in ESL and EFL. International Review of Applied Linguistics 21(4), 299-311.
Politzer, R. (1978). Errors of English speakers of German as perceived and evaluated by German natives. Modern Language Journal 62.
Politzer, R. & McGroarty, M. (1985). An exploratory study of learning behaviours and their relationship to gains in linguistic and communicative competence. TESOL Quarterly 19(1), 103-123.
Raupach, M. (1980). Temporal variables in first and second language speech production. In H. W. Dechert & M. Raupach (Eds.), Temporal Variables in Speech: Studies in Honour of Frieda Goldman-Eisler. The Hague: Mouton.
Rifkin, B. (1995). Error gravity in learners' spoken Russian: a preliminary study. Modern Language Journal 79(4), 477-490.
Riggenbach, H. (1991). Toward an understanding of fluency: a micro-analysis of nonnative speaker conversations. Discourse Processes 14, 423-441.
Skehan, P. & Foster, P. (1997). Task type and task processing conditions as influences on foreign language performance. Language Teaching Research 1(3), 185-211.
Temple, L. (2000). Second language learner speech production. Studia Linguistica 54(2), 288-297.
Towell, R. (1987). Approaches to the analysis of the oral language development of the advanced learner. In J. A. Coleman & R. Towell (Eds.), The Advanced Language Learner. London: AFLS/SUFLRA/CILT.
Towell, R. (1994). The growth of linguistic knowledge and language processing in advanced language learning. In G. Doble & P. Fawcett (Eds.), Applied Linguistics and Language Teaching: Bradford Occasional Papers No. 13. Bradford: Department of Modern Languages, University of Bradford.
Towell, R. (2002). Relative degrees of fluency: a comparative case study of advanced learners of French. International Review of Applied Linguistics 40, 117-150.
Towell, R., Hawkins, R. & Bazergui, N. (1996). The development of fluency in advanced learners of French. Applied Linguistics 17(1), 84-119.
Van Gelderen, A. (1994). Prediction of global ratings of fluency and delivery in narrative discourse by linguistic and phonetic measures - oral performances of students aged 11-12 years. Language Testing 11(3), 291-319.
Vann, R. J. (1979). Oral and written syntactic relationships in second language learning. In C. Yorio, K. Perkins & J. Schachter (Eds.), On TESOL '79: The Learner in Focus. Washington DC: TESOL.
Widdowson, H. (1990). Aspects of Language Teaching. Oxford: Oxford University Press.

Oral Profile Rating Scale
(Note: descriptors in italics are the author's additions to the pre-2001 IELTS scale; the other descriptors are derived, with minor alterations of wording, from that scale.)

Parameters: A. Communicative/Functional Range; B. Language Range and Complexity; C. Fluency; D. Language Accuracy

Band 9
A. Speech is situationally appropriate.
B. Capable of very complex speech at all times; wide language range.
C. Speech is fluent.
D. Speech is acceptable … times

Band 8
A. Communicates effectively on all general, academic, vocational or leisure topics relevant to own interests or experiences. Can use speculative, argumentative, descriptive and narrative language flexibly to convey precise meanings.
B. Communicates precisely using a wide range of language forms, with only occasional indications of limitations. Extended structuring, including cohesive features, is accurate and appropriate.
C. Speech is mainly fluent, though there may be an occasional hesitation caused by a language problem.
D. Occasional inappropriaci… non-systemat… errors in gr… and vocabula… occur w… impeding communicatio…

Band 7
A. Communicates effectively on a wide range of general, academic, vocational or leisure topics. Displays some flexibility in the use of speculative, argumentative, descriptive and narrative language.
B. Communicates fairly precisely using complex sentence forms and a wide range of modifiers, connectives, and cohesive features.
C. Speech is mainly fluent, though hesitations caused by language problems occur fairly regularly, without impeding communication.
D. Errors … vocabulary … structure may … without inh… communicatio…

Band 6
A. Generally communicates effectively on general topics and on other matters relevant to own immediate academic, vocational or leisure interests. Can present speculation, extended argument, and long or complex description or narration. Errors in structure or coherence may sometimes occur.
B. Can use complex sentence forms and a wide range of modifiers, connectives, and cohesive features to convey most meanings precisely. Is generally able to use circumlocution to cover gaps in vocabulary and structure.
C. Speech is fairly fluent, though hesitation and backtracking obstruct communication on occasion.
D. Errors in gr… and vocabula… occur … occasionally interfere … communicatio…

Band 5
A. Is broadly able to convey meaning on most general topics. Has difficulty in presenting speculation and extended argument, while long or complex description or narration may lose coherence.
B. Generally makes use of relevant connectives and other cohesive features. Has some ability to use complex sentence forms and modifiers.
C. Fluency problems are noticeable throughout, though able to keep going, even in longer utterances. Can engage in extended conversation.
D. Errors … structure … vocabular… interfere … communic…

Band 4
A. Can convey basic meaning on familiar topics. Limited ability to describe, give precise information or express attitudes.
B. Can use common question forms to elicit information. Has control of basic sentence forms. Can link simple sentences using the most frequently occurring connectives. Tentative use of modifiers.
C. Longer utterances tend to break down.
D. Errors … grammar … vocabular… frequent … may in… with communic…

Band 3
A. Can convey only simple meaning on very familiar topics. Can answer simple questions and respond to simple statements. Has only limited ability to take the initiative with original statements and questions.
B. Basic sentence forms appear to be used. Essentially no ability to link sentences or use modifiers.
C. Frequent pauses may occur as the candidate searches for words.
D. Grammati… errors … numerous … in mem… utterances of …

Band 2
A. Little communication is possible except for the most rudimentary information.
B. Uses very limited vocabulary, isolated words, or short memorized utterances.
C. Utterances consist of isolated words.

Band 1
A. Essentially unable to speak English. Limited to utterances of virtually no communicative significance.
B. Limited to a very few isolated words and memorized utterances.
C. Uses mainly isolated words and fragments.

COMPLEXITY, ACCURACY AND FLUENCY IN SECOND LANGUAGE ACQUISITION RESEARCH
Richard Towell
University of Salford, England

1. INTRODUCTION
This contribution is written from the point of view of a researcher into second language acquisition (SLA). Complexity, Accuracy and Fluency are, therefore, constructs to be determined in relation to a number of background assumptions related to the study of SLA.

First, I make the assumption that three different kinds of mental representation are implicated in SLA. Each of these specifies an area which is best looked at for research purposes independently of the other two.

The first need is for learners to acquire an appropriate mental representation for linguistic competence. The clearest specification of this concept is to be found in the works of Chomsky (1986) and, for SLA, in Hawkins (2001) and White (2003). Competence will require a system for word classification and mechanisms for realising the syntactic and morphological phenomena which express grammatical concepts like subject, object, agreement, tense, interrogation, passivisation, negation and clause embedding. These aspects of language are seen as fundamental for complexity.

The second need is for learners to be able to represent correctly learned linguistic knowledge. This is the kind of knowledge specified in dictionaries, thesauri, glossaries, style manuals and normative grammars. It includes the specification of items in the lexicon, form/function pairs, all morpho-syntactic forms, formulaic utterances, pragmatic, stylistic and discourse rules and the rules of written language, including spelling. Also included is the learning of grammar via explanations, as opposed to the learning implied by competence. This aspect is seen as fundamental for accuracy.

The third need is for learners to build a suitable mental representation for the procedures which enable the processing of language in real time. For Levelt (1989: 9-11, 149, 240) these are IF-THEN condition/action pairs which a speaker uses to construct and encode the message.
For Jackendoff (2002: 199) they might be part of the interface processors which permit the integrative processors to interact with one another. For learners, such structures will go through various stages as development takes place. Having a mental representation of these processes which permits real-time operation in comprehension and production is seen as fundamental for fluency.

Second, I wish to distinguish three kinds of learning. I will call these triggered, explicit and procedural learning. Triggered learning fits evidence to expectations, as will be explained below; it is unconscious, it cannot be explicitly formulated, and it is quick to store information. Explicit learning is conscious and can be explicitly formulated; information is stored quickly but is slow to be retrieved, and it can be the starting point for skill development. Procedural learning is also unconscious and cannot be explicitly formulated; it is slow to store information, and is closely linked to skill development. These three kinds of learning are related to the three kinds of mental representation. The relationship is, however, not a simple one-to-one: notably, what I will call the primary forms of knowledge, i.e. triggered and explicit knowledge, once initially acquired, need to be integrated within procedures which permit operation in real time and within memory constraints.

Third, I wish to indicate that the three kinds of learning are likely to be sensitive to frequency in different ways. Frequency effects will play a role both in comprehension/exposure and in production/practice. Triggered learning of competence involves the learner matching the data against a set of expectations given by Universal Grammar (UG): this may be achieved on the basis of relatively little exposure to positive evidence, and practice, if necessary at all, will play a minor role. Explicit learning of learnt linguistic knowledge has no such expectations and requires each form and its related categories to be perceived, learnt, stored and practised, so multiple exemplars and considerable practice are essential. Procedural learning is even more dependent on multiple exposure and extensive practice, as frequency, leading to the unconscious identification of patterns of repeated use, is likely to be a central factor in driving developmental change. This will lead to more economic and reliable storage of structures and forms through processes of restructuring, tuning and strengthening. These background assumptions can be summarised in the form of a table:

| Competence Knowledge | Learnt Linguistic Knowledge | Knowledge of Processes |
|---|---|---|
| Triggered | Explicit | Procedural |
| Less Frequency Sensitive | More Frequency Sensitive | Most Frequency Sensitive |

Figure 1: Linguistic Knowledge and Kinds of Learning.

Finally, as part of the background assumptions, I must include memory. The knowledge acquired has to be stored in memory and I assume that humans have three memory stores. The first of these is Declarative Memory. This memory stores explicit, propositional and conceptual knowledge: it is quick to store but slow to retrieve. The second is Procedural Memory: it builds on triggered, explicit and procedural knowledge to develop structured processes for skill-based activities. It is slow to store but quick to retrieve and will store non-conceptual knowledge, compiling the information it holds into the most economic units. It can be subdivided into associative procedural and autonomous procedural memories: at the associative stage it is possible for information to interact with other kinds of information in the store; at the autonomous stage, interaction is not possible and the information cannot come under conscious control. Essential to the functioning of both of these memories is Working Memory. This is an intermediary between the other two memories and performance. It is of limited storage capacity.

I then make the assumption that, in the memory systems, productions (condition/action pairs in the terminology of Levelt (1989)) must be created out of the linguistic knowledge of all kinds. I take the view that the units underlying language processing, both comprehension and production, must be created as productions through a process which, over time, carefully transforms them into the automatised units necessary for real-time communication.

The justification for these assumptions was first laid out in Approaches to Second Language Acquisition (Towell & Hawkins, 1994). Many publications since that date have provided accounts which chime with, but would bring modifications to, that justification: Paradis (2004) provides specific insight into implicit and explicit language processes.
A special issue of Studies in Second Language Acquisition (2005), edited by Hulstijn and Ellis, discusses many relevant issues in detail. Papers in Miyake and Shah (1999) and Andrade (2001) provide more detailed commentaries on Working Memory. Whilst this brief presentation has undoubtedly led to some oversimplification, it is intended simply as a means of providing a coherent set of background assumptions with which to define Complexity, Accuracy and Fluency from an acquisitionist standpoint.

2. DEFINITIONS OF THE CONSTRUCTS AND RELATIONSHIP WITH THE BACKGROUND ASSUMPTIONS
Under these assumptions, the ability to build complex sentences will critically depend on the ability to construct within linguistic competence a mental representation for the full syntactic tree of the second language and to operate merge and move according to the generative possibilities of that language. Such mental representations are a necessary condition for being accurate in that language but they are clearly not sufficient. Accurately stored and represented learnt linguistic knowledge is equally essential. Accuracy in production will depend on these two kinds of knowledge and how they are stored in the mind: the learner will be able to produce correct forms in real time only if the structures and forms have been created and stored in a way which permits immediate, unthinking, reliable access. Fluency should be defined in two ways: the first linked to comprehension and the ability to assimilate messages encoded in the second language; the second linked to production and the ability to encode and deliver messages in the second language. Both depend on fully integrating the three kinds of knowledge.

There are several implications to be drawn from this. First, using complex language accurately and fluently depends on a) successfully acquiring the two primary kinds of knowledge and b) integrating them with each other within procedures which are usable in real time. Second, it should also be noted that the balance between triggered and explicit knowledge in the L2 is likely to be different from the balance in the L1 because of age and context of learning. These factors will also cause the processing procedures to be different.
Third, the processing dimensions are crucial: it is through processing for comprehension that the developmental dimension of linguistic competence is realised, and it is through processing for production that the interaction between the different forms of knowledge takes place: triggered knowledge and learned linguistic knowledge must be built upon and stored in such a way as to be available for use in real-time performance. I will now look in turn at the learning of each of the three kinds of knowledge.

3. LINGUISTIC COMPETENCE: TRIGGERING MENTAL REPRESENTATIONS
The UG approach to second language acquisition assumes that second language learners, like first language learners, have access to an innate knowledge of linguistic principles. This knowledge guides or constrains language learning. It provides the set of expectations referred to above. There are, however, currently two contrasting points of view, known respectively as the Representational Deficit Hypothesis, usually accompanied by Modulated Structure Building, and the Full Transfer/Full Access Hypothesis, which I will combine here with the Missing Surface Inflection Hypothesis.

3.1 Representational Deficit Hypothesis and Modulated Structure Building
Under the Representational Deficit Hypothesis, it is assumed that the second language learner starts with a mental representation which is less than fully specified. He or she starts with a full but unconscious awareness of the UG phrase structure capabilities, specifically the X-bar projections, and then builds up the structure of the L2 on the basis of expectations plus intake, influenced by the L1 at key points. Hawkins (2001), the leading proponent of this view, argues that the evidence shows that, in many cases, learners find it impossible to construct a fully specified L2 feature system identical to that possessed by those for whom the language is an L1, although they can find other ways of producing language which is very like that of the L2.

3.2 Full Transfer/Full Access combined with the Missing Surface Inflection Hypothesis
From the point of view of the Full Transfer/Full Access hypothesis (Schwartz & Sprouse, 1996), the learner already has the relevant abstract syntactic categories: they do not need to be created on the basis of input. But the settings of the categories with regard to features or strength will be wrongly determined because, under this scenario, learners derive from their L1 a set of working assumptions about functional categories and about their features and strength. They then clearly need to modify these so that they come to match those of the L2. They can do so because they will be sensitive to the triggers in the input provided by UG. White has claimed: "it has been proposed that there must be designated, unambiguous and unique cues or triggers, consisting of partially or fully analysed structures" (White, 2003: 159). And furthermore, "Cues are part of the built-in knowledge supplied by UG" (White, 2003: 159).

From this perspective, there are two obstacles to full and immediate acquisition. First, the starting point of the L1 settings complicates the process. This influences outcomes in two possible ways: depending on the nature of the L1, the learner may find it difficult to re-work the existing knowledge. Also, the learner may parse the incoming data as if it were the L1: if this works, even in a less than optimal way, the learner may not be able to perceive the new knowledge (Fernandez, 1998). Second, even when the learner has acquired the new syntactic settings, it may be difficult for him or her to use them consistently: "Linguistic competence accounts for the distinction between grammatical and ungrammatical sentences; performance factors account for the failure to observe the distinction absolutely."
(White, 2003:29/30). This position has come to be known as the Missing Surface Inflection Hypothesis (Lardiere, 1998). 3.3 How can each of these hypotheses be moved forward or proven? What the Representational Deficit position needs is some way of proving that learners do indeed construct the intermediate representations that the researchers say they do. This often means demonstrating that the grammar has evolved over time and specifically that the learners move away from treating an item as lexically determined to one which plays a productive role in the syntax. What the Full Transfer/Full Access and Missing Surface Inflection hypothesis needs is a) some way of confirming that triggering has actually taken place and the categories etc have been established and b) some way of explaining exactly what is preventing the production of the morphological surface forms. Researchers working in these areas generally take the view that linguistic evidence is sufficient to demonstrate the existence of these supposed mental representations. However, I am looking towards psycholinguistic investigations to provide additional evidence. I will return to this issue in section 5.1.1. and 5.1.2. of this article 4. LEARNT LINGUISTIC KNOWLEDGE: BUILDING EXPLICIT MENTAL REPRESENTATIONS I will cite here two experiments which show that learnt linguistic knowledge (in this case knowledge of aspects of grammar) can be incorporated into spontaneous language production. In both accounts, the authors refer to this as the creation of implicit knowledge from explicit knowledge. From the point of view I have adopted here, it should be clear that I do not see it in


that light: I think that what they have observed is a process of the proceduralisation of learned linguistic knowledge.

4.1 The role of explicit instruction

The first study is that undertaken by Housen, Pierrard and Van Daele (2005). The results of this investigation of the learning of the French passive and the French negative allow the authors to state that: "a first series of analyses reflects a clear positive effect of explicit instruction on learners' mastery of the target structures. The strongest effect is found in the learners' unplanned oral production. This would suggest that explicit instruction promotes not only explicit grammatical knowledge as shown by previous studies but also implicit knowledge" (2005:235). From the point of view adopted here, this experiment does not deal with what we have been calling the triggering of knowledge on the comprehension side of SLA development but fits squarely into the framework of the proceduralisation of existing knowledge, something which the researchers acknowledge: "As Ellis (2001) has pointed out, time pressure does not necessarily guarantee a measure of implicit knowledge as some learners may have developed automatized explicit knowledge which they can apply even under time pressure. Consequently language tasks which allow little or no planning time (like the oral interview in this study) may not necessarily provide appropriate measures of learners' implicit knowledge but rather of procedural knowledge. Future research should attempt to develop tasks which can distinguish between proceduralized explicit knowledge and proceduralized implicit knowledge." (2005:262)

4.2 The role of explicit corrective feedback

A second example deals with the issue of the role of explicit corrective feedback. Ellis, Loewen and Irlam (2006) have offered evidence which they believe demonstrates that explicit corrective feedback is more effective in bringing about implicit learning than implicit feedback in the form of recasts.
However, the researchers state that: "Our purpose was not to examine whether corrective feedback assists the learning of a completely new structure, but whether it enables learners to gain greater control over a structure they have already partially mastered" (2006:351). They distinguish between explicit and implicit forms of corrective feedback and claim that the explicit forms are more effective. The researchers attach particular significance to the outcomes of an oral imitation test as they see this as an indication of implicit learning. Once again, from my point of view, this is good evidence that explicit instruction has a role to play in enabling learners to proceduralise and generalize from existing knowledge when it comes to making wider use of forms associated with a piece of linguistic knowledge which has already been established. It supports the notion that the learned linguistic knowledge dimension of acquisition is assisted by corrective feedback.

4.3 The way forward

The point of view I have adopted here suggests that there are two primary sources of knowledge. These concern, on the one hand, knowledge which is unconsciously triggered as the expectations of UG meet with the input from exposure and, on the other hand, knowledge which is more consciously acquired through explanations and experience. I have argued that these forms of knowledge are not acquired in the same way. However, once acquired, they must be combined and made usable in real time. Once the knowledge has been initially acquired then the important issues are about how the knowledge is restructured on the basis of exposure and practice


involving increased quantities of data and especially about how learners integrate competence and learnt linguistic knowledge as their skill base improves. This will be exemplified in section 5.3.

5. THE DUAL ROLE OF LANGUAGE PROCESSING

Language processing for comprehension and production is key to second language acquisition in two different ways, both equally essential. The study of learner comprehension holds the key to whether or not, and how, learners extract knowledge from surface structures to build their mental representations. The study of learner production holds the key to how procedural knowledge restructures and integrates the different kinds of knowledge within the procedures that are essential to performance. I will now examine these two aspects in turn.

5.1 Processing for comprehension/competence

There are a number of well known theories which address second language processing in so far as it relates to the extraction of relevant information in order to build linguistic knowledge: Processability Theory (Pienemann, 1998); Autonomous Induction Theory (Carroll, 2000); MOGUL or Acquisition by Processing (Truscott & Sharwood Smith, 2004) and the Shallow Structure Hypothesis (Clahsen & Felser, 2006; Felser & Clahsen, 2006). References to these are provided in the bibliography, but I will leave them aside here to concentrate on two recent attempts which have brought psycholinguistic methods to the fore.

5.1.1 Input Processing and Priming

The first, by Marsden, Altmann and St Clair (submitted; permission to quote obtained from the authors), makes use of priming to investigate empirically whether or not the kind of activation which is claimed under the Input Processing Hypothesis of Van Patten actually takes place. Van Patten assumes UG, limited availability of processing resources and the lexical preference principle.
The key issue within this approach has always been one of how to get learners to pay attention to the important grammatical forms which are often less salient or communicatively redundant. Van Patten argues that, as learners improve, less of their time will be spent understanding the essential lexical meaning and therefore their processing resources will be freed up to pay attention to the less communicatively essential parts of a sentence. The authors therefore set out to test in an empirical way whether communicative redundancy of a form affects subsequent activation of that form and whether comprehension of overall sentential meaning interacts with this. Specifically, they wished to investigate the extent to which a communicatively redundant form is activated during aural input at the same time as sentential meaning, by using implicit priming tests to measure the activation for the form in different exposure conditions: redundant vs non-redundant, and sentence comprehended versus not comprehended. The investigators have two predictions: 1. When a sentence is comprehended, redundancy or non-redundancy of a specific form will have no priming impact; but 2. When a sentence is not comprehended, non-redundancy will lead to priming effects. The form which interests the researchers is the French first person plural marker -ons. The design therefore required one set of forms where a given marker is redundant in terms of information and another where it is non-redundant. For the redundant use, they present this form to the learners with other first person plural markers, such as nous or X et moi (e.g. ma soeur et moi mangeons) and, for the non-redundant use, they present the form without the other markers in the imperative: Mangeons. In the case where the other


markers are present, these will always come first and therefore the inflection on the verb can be said to be redundant. If a learner understands the sentence which contains the other markers, he or she will not necessarily have had to pay any special attention to the inflection to do so. In the other case, where the marker is non-redundant, the plural meaning can only be obtained if the significance of the inflection is grasped. The researchers presented these cases to learners at different levels of proficiency: 50 beginners (100-200 hours of exposure) and 57 intermediates (700-800 hours of exposure). The assumption was that the beginners would not understand the sentence and be dealing with redundancy and non-redundancy from a position of non-comprehension, whereas the intermediates and natives would understand the sentence and therefore allow the study of the differential effect of comprehension. The experiment involved exposure followed by a perceptual identification task, where a word was partially covered by white noise, and a lexical decision task (word or non-word). The results showed that, when sentence meaning was not understood: Non-redundancy led to statistically significant priming effects when compared with the same amount of exposure to the target form in redundant contexts. When the sentence meaning was understood there were no differences in priming effects between the redundant and non-redundant groups. The authors state that: These findings suggest that when sentential meaning was easily understood but the verb inflection was communicatively redundant, representations of verb inflections were activated to the same extent as when the verb inflection was essential to the task.
The findings also suggest that when sentential meaning was not understood, representations of redundant verb inflections were not activated to the same extent as when the verb inflection was essential to the task. The findings could support a model of input processing in SLA where sentential processing and communicative redundancy influence how limited processing resources are used. As this represents an innovative approach, the authors rightly surround their findings with a number of limitations and in some ways it is not clear that these findings are unequivocal, but they do represent the kind of methodological collaboration which seems essential if we are to provide empirical demonstrations of the reality of morpho-syntactic mental representations.

5.1.2 Event Related Potentials

Another recent study, by Osterhout et al. (2006), makes use of Event Related Potentials. The focus was on how the mental processes underlying comprehension of the L2 change with increasing L2 exposure or proficiency. It concentrated on 14 novice learners of French and followed them over the first 8 months of their language learning, testing them at the end of the first, fourth and eighth month. Research over some years has demonstrated that the brain reacts differently to different kinds of anomalies in speech. Simplifying somewhat, it has been found that semantic anomalies produce what is called an N400. This is a negative wave-form which occurs approximately 400 milliseconds after the perception of the anomalous word. Syntactic anomalies produce what is called a P600. This is a positive wave-form which occurs approximately 600 milliseconds after the syntactic anomaly is perceived. The critical point for our purposes here is that Osterhout et al. make use of this finding to establish at what point in the learning process learners react to anomalies as semantic or syntactic.
The crucial piece of data from this experiment relates to an aurally-presented test of the reactions of the learners to a verbal person agreement condition where the agreement is phonologically realised, correctly and incorrectly:


Tu adores le français vs *Tu adorez le français. The results which are reported by Osterhout et al. concern the seven fast learners in their study. After one month, the learners' brains discriminated between the syntactically well-formed and ill-formed sentences above but by an N400, not a P600. However, by the fourth month, the N400 effect was replaced by a P600-like positivity. The authors attach considerable importance to this result. They comment: "If our interpretation is correct, then our adult learners grammaticalized this aspect of the L2 after just a few months of L2 instruction. Our results can be explained by assuming that learners (much like child L2 learners) initially memorize salient word sequences (e.g. Tu adores). Violations of the verbal person rule (e.g. Tu adorez) result in novel word combinations and hence, elicit an N400 effect. After more instruction, learners induce a general verbal person rule (tu -s, nous -ons, vous -ez, etc.): violations of the rule elicit a P600 effect." (2006:220-221). The longitudinal nature of this study allows the researchers to claim that what they have identified here is the point in time when, as far as this relatively simple rule of the L2 is concerned, the learners shifted from the memorized chunk to the rule. If this is the case, then it merits considerable attention. Researchers like Myles et al. (1998) have made the claim that learners do make this transition but have not been able to demonstrate where or when.

5.1.3 Summary

I have argued in this section that the study of the triggered learning of linguistic competence, which in my view underlies issues of complexity, can, by making use of techniques such as priming and event related potentials, provide an empirical demonstration of the actual state of mental representation in the learner and thus provide evidence to support the competing positions, described in section 3, based more on linguistic analysis and reasoning.
But, when it comes to issues of accuracy and fluency, linked to the proceduralisation of different kinds of knowledge, and the overall integration of the knowledge types, the role of language production seems more important. The argument here is that it is through language production that what is learnt becomes proceduralised and stored in the mind in ways which a) give it an economic and stable form and b) make it accessible in real time.

5.2 Processing for production - proceduralisation of knowledge

In this section I will briefly present some of the evidence from the studies I undertook with Roger Hawkins and Nives Bazergui of the oral production of English learners of French. Over a period of four years, we asked a group of 12 learners to perform tasks at defined intervals. We also asked the learners to undertake one task in their L1, English. The results here refer to the re-telling of short films. Two films were used in order to lessen the practice effect: one was the Pink Panther (for Year 1 and Year 4), the other was called Balablok (for Year 2 and Year 3 and in English). I will make use of two measures: Speaking Rate, expressed as syllables per minute, inclusive of pauses; and Mean Length of Run, expressed in syllables occurring between pauses of not less than 0.28 sec. The level of detail will be minimal, given the limitations on space, and I will refer the reader to articles where a fuller treatment is provided. First, I wish to present some evidence of the way the Speaking Rate developed over time by looking at two block graphs for the group on the same task, but with different stimuli, at different times.
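The two measures just defined can be illustrated with a minimal sketch. It assumes, purely for illustration, that a recording has already been segmented into stretches of speech, each described by a hypothetical triple (syllables, seconds of speech, seconds of silence that follow); neither this representation nor the function names come from the studies cited. Pauses shorter than the 0.28 sec. cut-off are treated as falling inside a run.

```python
# Cut-off from the text: a pause of at least 0.28 s ends a run.
PAUSE_THRESHOLD = 0.28

def speaking_rate(stretches):
    """Syllables per minute, with pause time included in the total time.

    `stretches` is a list of (syllables, speech_seconds, pause_after_seconds);
    this layout is an assumption made for the example.
    """
    if not stretches:
        raise ValueError("no speech stretches supplied")
    total_syllables = sum(syl for syl, _, _ in stretches)
    total_seconds = sum(speech + pause for _, speech, pause in stretches)
    return 60.0 * total_syllables / total_seconds

def mean_length_of_run(stretches, threshold=PAUSE_THRESHOLD):
    """Mean number of syllables between pauses of at least `threshold` seconds."""
    if not stretches:
        raise ValueError("no speech stretches supplied")
    runs, current = [], 0
    for syllables, _, pause_after in stretches:
        current += syllables
        if pause_after >= threshold:  # a long enough pause closes the run
            runs.append(current)
            current = 0
    if current:  # speech that ends without a closing pause still counts
        runs.append(current)
    return sum(runs) / len(runs)

# Invented data: the 0.10 s pause does not break the first run.
sample = [(6, 1.5, 0.10), (4, 1.0, 0.50), (5, 1.5, 0.40)]
print(speaking_rate(sample), mean_length_of_run(sample))
```

For the invented sample, the first two stretches merge into a single run of ten syllables because the pause between them is below the cut-off, which gives a mean run of 7.5 syllables and a rate of roughly 180 syllables per minute.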


The Pink Panther graph shows clearly that the group as a whole increased its score between Year 1 and Year 4.
Figure 1: Speaking rate averages for the Pink Panther tasks, Year 1 and Year 4 (y-axis: syllables per minute).

The second graph, based on the second film, Balablok, also contains information about the performance in English. It shows that the Speaking rate in the second language, even on the second occasion post residence abroad, did not match that of the first language.
Figure 2: Speaking rate averages on the Balablok task, Year 2, Year 3 and English (y-axis: syllables per minute).


The evidence suggests that more exposure to the L2 over time has led to a growth in processing production ability but that the L2 speaking rate does not match the L1 speaking rate. My key questions are: what changes have come about to produce this change? Can they be shown to be linked to restructuring and integration of knowledge? What differentiates the L1 performance from the L2 performance? In a more detailed study of two learners performing similar tasks, Towell (2002) argues that one learner can be shown to attain an average level from a low base by adjusting her pausing behaviour whilst another can be shown to attain a high level from an average base by an ability to process more complex syntax. Next, I wish to present some evidence of the level of consistency in individual performance in the second language. When I plot the scores of the individuals at different times in the L2, it becomes clear that within the group there are major differences in the level attained but that these differences are present in each performance.

Furthermore, when these differences are plotted against the scores in English, the same consistencies are revealed, with one exception.

The SR consistencies are confirmed by Spearman rank order correlations: PP1 with PP4: 0.84; Bal 2 with Bal 3: 0.86; Bal 2 with Bal Eng: 0.79; Bal 3 with Bal Eng: 0.73. Along with Speaking Rate, the growth in the Mean Length of Run of these speakers, expressed as the number of syllables in runs separated by pauses of not less than 0.28 sec., was also measured. Of interest here is the way in which a graph plotting the relative performances of individuals shows the same consistencies as were observed for Speaking Rate (including the odd result for Subject 5) and these operate across the L2 at different times and in the L1. It is argued that increases in MLRs must also be linked to the proceduralisation of knowledge. How this happens could be key to the development of the integration of the knowledge once it has been initially acquired from the two primary knowledge sources suggested here. What we see here are simple measures which show three main things. First, on the same task, using very nearly the same language, the Speaking Rate score for each individual increases over time. I wish to argue that Speaking Rate is likely to be a reliable measure of the speed at which learners can access and deliver their knowledge of the L2 in production. By extension, I wish to argue that this demonstrates that the knowledge in question has become more proceduralised. I think that this is reinforced by the evidence of similar patterning of the development of Mean Length of Run. Second, with one exception, the learners all have higher scores in English on this activity than in French. The differential in the scores in my view suggests that French is not stored in the same way as English. English as a native language was triggered and then proceduralised and subsequently maintained by regular use of the L1 over time; it has very high levels of resting activation; retrieval of terms in correct syntactic form is immediate, reliable and accurate.
French may have been triggered in part, but a considerable amount of the knowledge has been proceduralised from the analogical learning of patterns or from more or less explicit


knowledge. I would wish to argue that this difference in the balance of different kinds of knowledge lies behind the differential outcomes visible on the graphs. Third, we have seen that behind the group measures lie a number of individual differences and that those differences carry over from one time to another and between the L1 and the L2. This speaks to me of some individual factor which is continually influencing the Speaking Rate of each individual. The consistency of the relations between the performance in the L1 and the L2 rules out any notion that this is a reflection of relative proficiency in the L2. This could conceivably be a personality trait or, possibly, differences in individual working memory capacity. As the learners were not independently tested on their WM, this cannot be proven. Further attempts to explore these relationships can be found in Towell and Dewaele (2005).

7. THE RESEARCH CHALLENGES

To summarise: Accuracy, Complexity and Fluency are the (desired) behavioural outcomes of the interaction between the growth of linguistic knowledge, the development of learnt linguistic knowledge and the development of linguistic processing ability. Complexity, I have argued, is linked to the growth of linguistic competence. Understanding the factors which determine the growth of linguistic competence is linked to whether or not, and when, the triggering of the full UG defined syntactic tree takes place through exposure/comprehension. I have suggested here that the way forward for the Comprehension/Competence Dimension is to seek to support the Representational Deficit and MSB Hypothesis and/or the FT/FA/MSIH Hypothesis with findings from psycholinguistic studies of comprehension processing, such as priming and ERPs, which may be able to produce empirical evidence of the status of forms in the mind/brain of the individual.
I further argued that accuracy was related to the development of learned linguistic knowledge and the integration of that knowledge with linguistic competence. We looked at evidence which demonstrated how this kind of knowledge had been proceduralised. However, the way forward for the Production/Integrated Competence and Learnt Linguistic Knowledge dimension will require the use of processing findings to separate out triggered learning from procedural learning, to find empirical evidence of restructuring and to establish how different kinds of knowledge become integrated over time. Fluency, I suggested, had the twin dimensions of comprehension and production, comprehension linking squarely to the development of competence and production to the development of those psycholinguistic processes needed for the integration of the two primary forms of linguistic knowledge. Language processing has a dual role in permitting the development of both areas. A final challenge for further research is to investigate the relative importance of any personal dimension, such as Working Memory, which could be a determining factor in the learning process.


REFERENCES Andrade, J. (Ed.) (2001). Working memory in Perspective. Hove, UK: Psychology Press. Carroll, S. E. (2000). Input and Evidence: The raw material of second language acquisition. Amsterdam: John Benjamins. Chomsky, N. (1986). Knowledge of Language. New York: Praeger. Clahsen, H. & Felser C. (2006). Grammatical processing in language learners. Applied Psycholinguistics 27, 3-42. Clahsen, H. & C. Felser (2006). Continuity and shallow structures in language processing. Applied Psycholinguistics 27, 107-126. Ellis, R., Loewen S. & Irlam R. (2006). Implicit and Explicit Corrective Feedback and the Acquisition of L2 Grammar. Studies in Second Language Acquisition 28(2), 339-369. Fernandez, E.M. (1998). Processing Strategies in Second Language Acquisition: Some preliminary results. In E.C Klein & G. Martohardjono (Eds.), The Development of Second Language Grammars. A Generative Approach. Amsterdam: John Benjamins. Hawkins, R. (2001). Second Language Syntax Oxford: Blackwells. Housen, A, Van Daele, S. & Pierrard, M. (2005). Rule complexity and the effectiveness of explicit grammar instruction. In Housen A. & Pierrard M. (Eds.) Investigations in Instructed Second Language Acquisition. Berlin: Mouton de Gruyter. Hulstijn, J.H & Ellis R. (2005). Theoretical and Empirical Issues in the study of implicit and explicit second-language learning. Studies in Second Language Acquisition 27 (2). Jackendoff, R. (2002). Foundations of Language. Oxford: Oxford University Press. Lardiere, D. (1998). Dissociating syntax from morphology in a divergent end-state grammar. Second Language Research 14, 359-375. Levelt, W.J.M (1989). Speaking: from Intention to Articulation. Cambridge Mass.: MIT Press. Marsden, E, Altmann, G. & St Clair M. (submitted) Priming of verb inflections in L1 and L2 French: a comparison of communicatively redundant versus non-redundant training conditions. Miyake, A. & Shah P. (Eds.) (1999). 
Models of Working Memory: Mechanisms of active maintenance and executive control . Cambridge, UK: Cambridge University Press. Myles, F, Hooper J. & Mitchell R. (1998). Rote or Rule? Exploring the role of formulaic language in classroom foreign language learning. Language Learning 48, 323-363.


Osterhout, L. McLaughlin, J. Pitkanen, I. Frenck-Mestre, C. & Molinaro, N. (2006). Novice learners, longitudinal designs and event-related potentials: a means for exploring the neurocognition of second language processing. Language Learning 56(1), 199-230. Pienemann, M. (1998). Language processing and Second Language Development: Processability Theory. Amsterdam: Benjamins Schwartz, B.D. & Sprouse R. (1996). L2 cognitive states and the full transfer/full access model. Second Language Research 12, 40-72 Towell, R, (2002). Relative degrees of fluency: A comparative case study of advance learners of French. IRAL 40, 117-150. Towell, R & Hawkins R. (1994). Approaches to Second Language Acquisition. Clevedon: Multilingual Matters. Towell, R & Dewaele J-M (2005). The Role of Psycholinguistic Factors in the Development of Fluency amongst Advanced Learners of French. In J-M Dewaele (Ed.) Focus on French as a Foreign Language. Clevedon: Multilingual Matters. Truscott, J & Sharwood Smith M. (2004). Acquisition by Processing: A modular perspective on language development. Bilingualism, Language and Cognition 7 (1), 1-20. White, L (2003). Second Language Acquisition and Universal Grammar. Cambridge: Cambridge University press.


PSYCHOLINGUISTIC MECHANISMS UNDERLYING THE MANIFESTATION AND DEVELOPMENT OF 2ND LANGUAGE FLUENCY, ACCURACY AND COMPLEXITY. Siska Van Daele, Alex Housen and Michel Pierrard Free University of Brussels, Belgium

INTRODUCTION

Both the fields of second language acquisition and second language teaching have recently witnessed a growing interest in the fluency, accuracy and complexity of L2 production, but despite a growing body of empirical research in this area (e.g. Dewaele & Furnham, 1999, 2000; Ejzenberg, 2000; Ellis & Yuan, 2005; Gilabert, 2007; Kuiken, Mos & Vedder, 2005; Robinson, 2005; Skehan 1996, 1998) our knowledge of the psycholinguistic mechanisms which underlie these dimensions of linguistic proficiency is still limited. Previous studies of memory capacity in L2 processing suggest that second language learners have limited processing space, which leads to bottlenecks in working memory and may result in both momentary and longitudinal trade-off effects between fluency, accuracy and complexity (Ellis, 1990; Skehan, 1998). The study reported in this paper is part of a larger research project which further explores these issues by investigating the manifestation and development of productive oral proficiency in terms of fluency, accuracy and complexity in two L2s (French and English) by the same learners (native speakers of Dutch) and by comparing them to benchmark data from native speakers of English and French. In the research project various avenues were explored. From a cross-sectional perspective we considered if and how strategic pre-task planning opportunities could compensate for the limited processing capacity in the working memory of our learners. The planning construct ties in with cognitive models of SLA and can thus go some way towards understanding the psycholinguistic processes involved in the real-time processing of linguistic information (Ellis, 2005).
In line with the assumption that time allotted for linguistic preparation reduces the cognitive strain in the mind of the learner and thus enhances his or her L2 performance, it was hypothesized that the speech produced with opportunity for strategic pre-task planning would be characterized by overall higher fluency, accuracy and complexity than the speech produced under conditions without strategic pre-task planning time. Furthermore, we hypothesized that these effects would be independent of the target language. Statistical analysis revealed a similar positive effect of unguided pre-task strategic planning for both target languages as far as the fluency and complexity dimensions are concerned, but the effects for the two languages were clearly different in terms of lexical and grammatical accuracy. Secondly, we tried to explain the individual variability of our findings by taking into account the learner-internal personality variables extraversion (Eysenck & Eysenck, 1985) and foreign language anxiety (Horwitz, Horwitz & Cope, 1986). In this part of the research project we investigated the hypothesis that in introvert and anxious L2 users working memory overloads due to high levels of dopamine in the dorsolateral prefrontal cortex and that these individuals thus render a less fluent, accurate and complex L2 output. Statistical analysis confirmed this


hypothesis as far as lexical diversity was concerned, but showed a lack of effect or differential effects for the other dimensions. The effects of extraversion have been described in Van Daele et al. (2006). A fuller discussion of reasons for the outcomes concerning strategic planning and language anxiety will appear in later publications. Thirdly, we followed a longitudinal perspective by considering the development of the dimensions in the simultaneous acquisition of both L2s over three six-month intervals. Central to the discussion is the question how L2 fluency, accuracy and complexity develop in each foreign language, whether the different dimensions develop interdependently and how this relates to recent models of psycholinguistic processes underlying L2 acquisition such as the role of various types of memories and automatisation processes. These issues will be the focus of this article. The research questions to be examined here are:

1) a. How does the oral fluency, accuracy and complexity in the EFL and FFL production of the same L2 learners develop over time?
   b. How does the learners' fluency, accuracy and complexity in EFL and FFL compare to that of native speaker benchmarks?
2) Is the evolution of fluency, accuracy and complexity similar or different for both target languages (French & English)?
3) (How) can the production and development of these three dimensions of oral proficiency in two L2s be explained in terms of current psycholinguistic models of L2 processing and development?

2. METHODOLOGY

2.1 Participants

Participants in this study are 25 Dutch-speaking adolescents (aged 14 to 16) learning both English and French as foreign languages (FL) in a mainstream secondary school in Flanders, Belgium. At the onset of the study subjects were in the third year of secondary education and had received 390 hours of French classroom instruction and 180 hours of English classroom instruction. The sizeable discrepancy in amount of classroom input notwithstanding, learning curricula (systematized inventories of learning goals and contents) are the same for both languages from the first year of secondary education onwards and the educational authorities in Flanders maintain that, due to greater extra-curricular contact with English, pupils will have equal levels of proficiency in French and English by the time they reach the third year. This level roughly corresponds to the A2/B1 level in the Common European Framework of Reference of the Council of Europe (Morrow, 2004). The learners were therefore selected on the basis of the amount of English-FL and French-FL instruction they received in years 1, 2 and 3.


2.2 Instruments

The pupils' oral speech production in both L2s was tapped by means of an oral retell task based on a wordless picture story. Subjects were asked to retell one of several versions from a cartoon called Monsieur O (Mister O) (Trondheim, 2002) in both French and English. All versions included the same number of protagonists, a here-and-now contextualization and similar but not identical plot lines. All stories deal with a little O-shaped man who tries to cross an abyss by means of various strategies, but never succeeds. Following Ellis and Yuan (2005) the different versions were selected to reduce learning effects and it was believed that they would generate similar retellings. A narrative rather than a personal story telling, a decision making task or an interactive task was chosen in this study for two reasons: first, to allow comparison of results with other studies, which mostly employ the narrative task type, and second, to minimize the possibility of interlocutor reactions affecting learner outcomes and thus obscuring the analysis (Kawauchi, 2005).

2.3 Procedure

The oral retell task was administered 4 times in both target languages. During data collection a team of two researchers visited the school, where they were assigned two quiet rooms easily accessible to the pupils. From the moment pupils entered the interview room, they were addressed in either English or French. After a brief introduction, the pupils were asked to turn over one of the three picture sheets lying in front of them and describe the depicted story on the spot and under a time pressure of 5 minutes (Ellis & Yuan, 2005). This limit was imposed to minimize the effects of strategic and online planning. Each retelling was audio-recorded and the researchers intervened as little as possible during the time of narration. Only in the case of a complete breakdown of communication or an explicit question from the pupil did the researchers intervene (e.g.
by prompting the pupil to resume the narrative). After having completed the task in one language, pupils returned to their regular classes. The same procedure was used with each pupil for the other target language. There were at least 6 hours between the French and the English retelling.

Each recorded oral retelling was transcribed and coded for errors in basic CHAT format (MacWhinney, 2000). The fluency, accuracy and complexity of each recording were determined by calculating 6 linguistic measures. Oral fluency was measured using Speech Rate A (e.g. Dewaele & Furnham, 2000; Siegman, 1987) and Speech Rate B (e.g. Yuan & Ellis, 2003; Ellis & Yuan, 2005). Both measures calculate the average number of syllables per minute. Speech Rate A includes all syllables in the count, while Speech Rate B only counts meaningful syllables per minute and consequently excludes all hesitations and other performance phenomena.

For the complexity and accuracy dimensions a distinction was made between variables related to the interlanguage lexicon and the interlanguage grammar. Lexical complexity was measured by means of Guiraud's Index and the Uber Index (Vermeer, 2000), two transformations of the type/token ratio which measure lexical diversity and attempt to eliminate the effect of differences in text length. For syntactic complexity the Subclause Ratio (Ortega, 2001) and a self-designed Weighted Subclause Ratio were computed by dividing the total number of subordinate clauses or weighted subordinate clauses by the total number of clauses. The accuracy dimension was likewise evaluated by means of two ratio measures: a Lexical Accuracy ratio and a Grammatical Accuracy ratio. Both measures were calculated by taking the total number of errors (lexical and grammatical respectively) as the numerator and the total number of clauses as the denominator (Wolfe-Quintero, Inagaki & Kim, 1998).
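The measures described above can be illustrated with a minimal sketch in Python. It is an illustration only, not the scripts used in the study: the function names, the toy token list and the base-10 logarithm in the Uber Index are assumptions, and the formulas for Guiraud's Index (types divided by the square root of tokens) and the Uber Index (squared log of tokens divided by the difference between log tokens and log types) are the commonly cited definitions rather than details given in the text. The Weighted Subclause Ratio is left out because its weights are not specified.

```python
import math


def speech_rate(syllables, seconds):
    """Syllables per minute for a recording of the given length."""
    return syllables / (seconds / 60.0)


def guiraud_index(tokens):
    """Guiraud's Index: number of types / square root of number of tokens."""
    return len(set(tokens)) / math.sqrt(len(tokens))


def uber_index(tokens):
    """Uber Index: (log tokens)^2 / (log tokens - log types).

    Base-10 logarithms are an assumption of this sketch.
    """
    n_tokens = len(tokens)
    n_types = len(set(tokens))
    if n_types == n_tokens:
        # The denominator is zero when every token is a distinct type.
        raise ValueError("Uber Index is undefined when types == tokens")
    log_tokens = math.log10(n_tokens)
    return log_tokens ** 2 / (log_tokens - math.log10(n_types))


def per_clause_ratio(count, clauses):
    """Generic ratio measure: subclauses (or errors) per clause."""
    return count / clauses


# Hypothetical retelling: 180 syllables in 120 seconds,
# 30 of which are hesitations or other performance phenomena.
sra = speech_rate(180, 120)        # Speech Rate A: all syllables
srb = speech_rate(180 - 30, 120)   # Speech Rate B: meaningful syllables only

tokens = "the man jumps and the man falls".split()  # 7 tokens, 5 types
guiraud = guiraud_index(tokens)
uber = uber_index(tokens)

subclause_ratio = per_clause_ratio(3, 12)       # 3 subclauses in 12 clauses
grammatical_accuracy = per_clause_ratio(2, 12)  # 2 grammatical errors in 12 clauses
```

With these toy numbers Speech Rate A is 90 syllables per minute and Speech Rate B is 75. The same per-clause function serves both the Subclause Ratio and the two accuracy ratios, since all three divide a count by the total number of clauses.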


2.4 Design

In summary, the oral productive proficiency of twenty-five 14- to 16-year-old FL learners of French and English was tracked over a period of one and a half years and compared to baseline data from native speakers of English and French. Table 1 illustrates this research design.
Table 1: Research design

                                NOV 03   APRIL 04   NOV 04   APRIL 05
L2 learners (N=25)   English      X         X         X         X
                     French       X         X         X         X
L1 speakers (N=25)   Dutch                            X
                     French                           X
                     English                          X

2.6 Analysis

A series of two-way random-effect ANOVAs was performed to investigate how oral complexity, accuracy and fluency in the EFL and FFL production developed over time and how the learners' oral proficiency compared to that of the native speaker benchmarks. The linguistic measures and the different recording periods were entered as independent categorical variables; the variable pupil was introduced as a random effect to account for the fact that measures obtained from the same pupil are not statistically independent (Verbeke & Molenberghs, 2000). When necessary, the data were log-transformed before analysis to meet the normality assumptions. Degrees of freedom were calculated by means of the Kenward-Roger method (Kenward & Roger, 1997). All analyses made use of the SAS 8.1 PROC MIXED module (SAS Institute, 1999).

3. RESULTS

3.1 Fluency


The analysis revealed that in both the English and the French interlanguage there was no significantly different evolution between the two Speech Rate measures (F(4,193) = 0.20, p = 0.937 for English and F(4,189) = 0.36, p = 0.835 for French), but that the values for all the L2 observation periods differed significantly from the values obtained for the native speakers (F(4,174) = 13.30, p < 0.001 for English and F(4,168) = 104.25, p < 0.001 for French). A posteriori Tukey tests showed that the values for SRA and SRB were consistently higher for native speakers than for learners in both target languages (all p < 0.001), which indicates that L1 speakers of both target languages produced significantly more syllables per minute than the learners.

The English native speakers uttered around 150 syllables per minute on average; the EFL learners produced around 100 syllables per minute at the onset of the study and around 110 during the last two observation periods. Statistically, only the differences between periods 1 and 3 (p = 0.023) and between periods 1 and 4 (p = 0.043) were significant. This suggests that the values for both fluency measures increased at the beginning of the observation period but that a plateau was reached at the end. Figure 1 illustrates this finding.
Figure 1: Fluency in L1 and L2 English (SRA = Speech Rate A, SRB= Speech Rate B: all and meaningful syllables per minute respectively).

The French native speakers produced between 190 and 200 syllables per minute, which is more than four times as many as the FFL learners, who produced between 35 and 40 syllables per minute on average at the onset of the study and between 50 and 55 syllables during observation period 4. Statistically, only the differences between periods 2 and 3 (p = 0.239) and between periods 3 and 4 (p = 0.225) were not significant. All other values increased significantly (all p < 0.05). This suggests a slow average increase in the number of syllables produced per minute but, unlike in English, a plateau effect did not (yet) occur.


Figure 2: Fluency in French as an L1 and an L2 (SRA = Speech Rate A, SRB = Speech Rate B: all and meaningful syllables per minute respectively).

3.2 Accuracy

Regarding the accuracy measures we follow Corder (1973, 1981, 1983) in the assumption that native speakers do not make systemic errors. Hence we chose not to compare the L2 values obtained for the accuracy measures with the native speaker benchmark data.

3.2.1 Grammatical accuracy

The ANOVA for grammatical accuracy in the English learner data reveals that only the values obtained in period 4 differ significantly from the values in period 1 (p < 0.001) and period 2 (p < 0.001). None of the other periods differed significantly (p > 0.05). On average the EFL learners' speech samples contained about 15% grammatical errors in periods 1 and 2, 10% in period 3 and 5% in period 4. In French the learners produced around 40% grammatical errors in the first two speech samples, 30% in period 3 and about 27% in the last sample. Statistically, only the values obtained in observation period 2 differ significantly from periods 3 (p = 0.044) and 4 (p = 0.016). These results imply that for both languages the learners only improved in grammatical accuracy after the second data collection.

Figure 3: Grammatical accuracy in English and French as an L2 (Gram acc = Grammatical Accuracy: Grammatical errors per clause).

3.2.2 Lexical accuracy

The ANOVA for lexical accuracy reveals a slightly different development. For English the analysis revealed only a tendency for the values obtained in period 2 to be higher than the values collected in periods 3 (p = 0.056) and 4 (p = 0.065). The values obtained for all other data collections do not differ significantly (all p > 0.548). As Figure 4 illustrates, the learners' lexical accuracy in English first deteriorates and then improves. Between periods 1 and 2 the average proportion of lexical errors briefly increases from around 10% to 15%, but in the last speech samples the average error rate drops to about 8%. In French the learners' lexical accuracy improves from around 60% lexical errors in period 1 to 40% in period 2 and 25% in the last two periods. Only the values obtained in period 1 differ significantly from periods 2 (p = 0.045), 3 (p = 0.028) and 4 (p < 0.001). The other periods are not significantly different (all p > 0.199). This implies that improvement in lexical accuracy for French is mainly observed at the beginning of the study.


Figure 4: Lexical accuracy in English and French as an L2 (Lex acc = Lexical Accuracy: Lexical errors per clause).

3.3 Complexity

3.3.1 Syntactic complexity

Regarding syntactic complexity the analysis reveals that in both the English and the French interlanguage there was no significantly different evolution between the normal and the weighted subclause ratio (F(4,192) = 0.72, p = 0.581 for English and F(4,186) = 0.75, p = 0.560 for French). There is, however, a significant difference between the scores obtained from the learners and the scores obtained from the native speakers (F(4,172) = 3.81, p = 0.005 for English and F(4,162) = 32.30, p < 0.001 for French). In both languages the native speakers seem to outperform the learners. When we take a closer look at the values obtained in the English speech samples, it becomes clear, however, that only the scores obtained for the learners in observation period 2 are lower than the scores obtained for the native speakers (p = 0.056) and that across observation periods the EFL learners' scores only differed significantly between periods 1 and 2 (p = 0.012) and between periods 2 and 3 (p = 0.04). The differences between all other periods are not significant (all p > 0.773). These results suggest that the EFL learners go through a short period of backsliding, but that their initial and final knowledge of subordination rules equaled that of the native speakers.
Figure 5: Syntactic complexity in English as L1 and L2 (SCR = Subclause Ratio; WSCR = Weighted Subclause Ratio: subclauses per clause).


Figure 5: Syntactic complexity in French as L1 and L2 (SCR = Subclause Ratio; WSCR = Weighted Subclause Ratio).

In French this situation is reversed. Only the differences between periods 1 and 2 (p = 0.985) and between periods 3 and 4 (p = 0.823) are not significant. All other differences between periods, and the differences between the learner data and the native speaker data, are significant (all p < 0.001). Contrary to the developmental pattern for the English interlanguage, then, learners increasingly improve their use of subordination rules.

3.3.2 Lexical diversity

For the lexical diversity measures, finally, the two-way random-effect ANOVAs revealed that the two variations of the type/token ratio did not differ significantly from each other throughout the study (F(1,244) = 0.000, p = 1.000 for English and F(4,186) = 1.44, p = 0.223 for French). In English only the scores obtained in periods 3 and 4 differ significantly (p = 0.014). The values obtained for the native speakers are significantly higher than the scores obtained for the learners in periods 1 (p = 0.014) and 3 (p < 0.001), but at the end of the study (in period 4) the difference between the learners' values for the lexical diversity measures and the scores obtained by the native speakers was no longer significant (p = 0.526). This implies that at the end of the study the learners' speech samples were as lexically diverse as the native speakers'.

Figure 6: Lexical diversity in English as L1 and L2 (IG = Guiraud's Index; UBER = Uber Index: type/token variations).


In French the results are quite different again. The statistical analysis yields significant differences between the L1 and the L2 production (F(4,165) = 22.82, p < 0.001), but throughout the study no significant differences were revealed between the various observation periods for the learners (all p > 0.363).

Figure 7: Lexical diversity in French as L1 and L2 (IG = Guiraud's Index; UBER = Uber Index: type/token variations).

4. SUMMARY AND DISCUSSION

With regard to the first two research questions concerning the development of fluency, accuracy and complexity in the English and French interlanguages over time, their mutual relationship and their link to the scores obtained from native speakers of both languages, these results indicate that in English the participants of this study significantly improved their oral fluency, grammatical accuracy and lexical diversity, but that the lexical accuracy and syntactic complexity of their speech samples did not significantly change over time. In French the learners improved in all aspects of their oral productive proficiency except lexical diversity. Clearly, then, the developmental patterns for the two target languages are different. Regarding the relationship between these developments and the native speaker benchmarks, the results indicate that the native speakers consistently obtain higher scores than the learners, except where the use of English subordination rules and the lexical diversity of the speech samples are concerned. For all measures the learners were much more proficient in English than in French.

In order to explain these findings we draw on several cognitive and psycholinguistic models of language learning and processing. Each dimension will first be dealt with separately and for each language independently. Then the development of the overall productive proficiency in each language will be dealt with, taking into account possible longitudinal trade-off effects between the different dimensions.

4.1 Longitudinal development per dimension

4.1.1 Fluency

As mentioned before, the statistical analysis for fluency in English revealed that the learners' speech rate slowly increased but had reached a plateau at the end of the study. This finding confirms an earlier study by Towell, Hawkins and Bazergui (1996).
They remark that "advanced L2 subjects reach a plateau with respect to speaking rate and articulation rate (which is typically some way below their speaking rate and articulation rate in the L1). Such a plateau appears to be reached while development of proceduralization in the formulator is still in progress" (Towell, Hawkins & Bazergui, 1996: 113). When we define productive fluency as the proceduralised parts of the language system (or the relative activation and retrieval of L2 knowledge units in procedural memory), we could conclude that certain manifestations of proceduralised knowledge (in this case syllables per minute) can stagnate before the automatisation process of the underlying rules has ended and before native-speaker norms have been reached. Moreover, following Ullman (2001) and Towell and Dewaele (2005), we believe that where grammatical processing in an L1 depends principally on implicit knowledge from procedural memory, L2 processing relies heavily on declarative memory and explicit rules. A fast retrieval and procedural use of explicit rules does not, however, equal the automatic use of implicit knowledge units and structures (Paradis, 2000). Since the processing of linguistic knowledge in an L2 depends heavily on explicit knowledge, and the (partial) use of explicit knowledge requires more processing space in working memory than the automatic retrieval of fully proceduralised L1 knowledge, the speech rates in an L2 will mostly remain lower than the speech rates of native speakers.

When we apply these theoretical considerations to the results found for French, a similar picture regarding the interaction between the different types of knowledge and memory emerges. In this language the evolution did not stagnate at the end of the observation period (learners slowly but consistently improved their speech rate), but the difference between the scores obtained by the learners and the native speaker benchmarks was much larger. Seen in this light it is not surprising that the L2 learners (who were less proficient in French than in English) had not yet reached a plateau and had not proceduralised as many rules and structures in this language.
It is very likely, therefore, that they primarily relied on declaratively stored explicit knowledge, which is harder to process and yields slower speech rates.

4.1.2 Accuracy

In summary, the results for both subdimensions of linguistic accuracy in our English learner corpus indicate that both grammatical and lexical errors decreased over time, but that the improvement in grammatical accuracy only set in after the second observation period and that there seemed to be a non-significant backsliding of lexical accuracy between periods 1 and 2. Since the subsequent improvements on this last measure are only near significant (and are probably caused by the slight relapse), only the progress in grammatical accuracy can be ascertained. In order to explain this evolution we refer to Wolfe-Quintero, Inagaki and Kim (1998), who define the accuracy dimension as the degree to which the L2 language system coincides with the language system of native speakers. Regarding the psycholinguistic mechanisms underlying this dimension they note that "accuracy in language use can arise from three interacting sources: the degree of accuracy of the language representation itself, the strength of competing representations, and the degree of automatization" (Wolfe-Quintero, Inagaki & Kim, 1998: 33). This implies that other rules (possibly from another language) can interact with automatised rules and that some automatised rules can be defective (in the sense that they do not follow the norm). If the flawed morphological and syntactic rules are still undergoing a strengthening process (quantitative automatisation of explicit knowledge) they can easily be rectified, but if they have been restructured (qualitative automatisation resulting in a reorganisation of the language system and ultimately in implicit knowledge of the grammatical rules), these procedures are much harder to change (Segalowitz, 2003).
If we take into consideration the aforementioned difference between L1 and L2 processing (Ullman, 2001), the significant improvement in grammatical accuracy can be explained. Since L2 processing depends heavily on declarative memory and explicit rules, the learners in this study probably developed two strategies. On the one hand they strengthened and thus sped up the retrieval of explicit grammatical rules (quantitative automatisation), resulting in a faster activation; on the other hand they restructured and qualitatively altered the original rules and created implicit variants.

With regard to the French interlanguage the analysis revealed that both grammatical and lexical errors significantly decreased over time. Taking into account the lower proficiency level the learners obtained in this language, we believe that the increase in grammatical accuracy can be attributed mainly to a quantitative automatisation of explicitly stored rules and that the learners were thus able to access the rules more efficiently. Since we follow Ullman (2001) in the assumption that lexical items in all languages (including the L1) are always stored in declarative memory, it is our belief that qualitative automatisation of lexical items is not possible. Seen in this light, the increase in lexical accuracy in French is caused by a speeding up of lexical retrieval (quantitative automatisation) along the control dimension (Bialystok, 1991, 2001).

4.1.3 Complexity

4.1.3.1 Syntactic complexity

The random-effect ANOVAs showed that, in English, the syntactic complexity measures only differed significantly between period 2 and the other observation moments. Furthermore, the values obtained for the learners only diverged significantly from the scores collected for the native speakers when this weaker period is taken into account. We concluded that in English a form of backsliding occurred, but that overall the learners produced as many subordinate clauses as the native speakers. Following our view on automatisation processes explained above, the momentary relapse in syntactic complexity could then be explained as follows: during the first observation period the subordination rules were still stored explicitly in declarative long-term memory, but during the second data collection a qualitative automatisation of the rules was in progress.
During periods 3 and 4 an implicit variant of the subordination rules had been acquired and the rules could be retrieved automatically from procedural memory. Or as McLaughlin puts it: "once the procedures at any phase become automatized, consolidated, and function effectively, learners step up to a metaprocedural level, which generates representational change and restructuring" (McLaughlin, 1990: 120). The reorganisation of the underlying L2 system, then, caused a temporary drop in the syntactic complexity scores. This does not necessarily imply that the use of these rules was accurate in every case. The L2 learners proportionally made as much use of them as the native speakers, but as we have seen before, flawed rules can be automatised as well.

In French we observed a consistent increase in syntactic complexity which, contrary to the situation in English, never reached more than half of the average French benchmark scores. Since the proficiency levels are much lower in this language, we presume the automatisation processes to be of the quantitative kind, resulting in a faster and more efficient retrieval of declaratively stored rules.

This interpretation of the developmental patterns in both target languages gains further credibility when Pienemann's (1998, 2005) processability theory is considered. According to this theory learners can only acquire (and process) certain aspects of rule-based knowledge when a certain level of mental processing has been reached. When we apply this idea to the ability to restructure L2 rules and take into consideration the fact that the formation of intraphrasal relationships (and consequently the use of subordination rules) represents the highest hierarchic level in Pienemann's model, this yields the following picture: in French the learners are just beginning to proceduralise the relevant rules, but have not restructured (or qualitatively transformed) the explicit knowledge; in English the implicit variants of the explicit rules do exist and are (partly) retrieved from procedural memory.

4.1.3.2 Lexical diversity

Regarding lexical diversity in English, the analysis revealed that the values for all periods except observation period 4 differed significantly from the benchmark scores. At the end of the empirical study the EFL learners expressed themselves with as much lexical diversity as the English native speakers. Since according to Ullman (2001) lexical items are always stored in declarative memory, this implies a quantitative automatisation along the control dimension. As time went by learners could access L2 items (and chunks) more efficiently. The lack of progress in lexical diversity in French is harder to explain. Throughout the study learners produced half as many types as did the native speakers. We presume that although they probably had quicker access to the lexical items and chunks, they were not (yet) able to activate a larger diversity of word types.

4.2 Comparison across dimensions

4.2.1 English

As mentioned before, the results of the analysis indicate a significant improvement in fluency, grammatical accuracy and lexical diversity, but no significant change in lexical accuracy and syntactic complexity over time.
When we take a closer look at the developmental patterns, however, it becomes clear that while the scores for syntactic complexity briefly deteriorate (between periods 1 and 2), there is no significant increase in grammatical accuracy (the average proportion of grammatical errors remains the same throughout the first two observation periods), lexical accuracy (the differences between periods 1 and 2 are not significant) or lexical diversity (only the scores obtained in periods 3 and 4 differ significantly). This might imply that while the subordination rules (rules which require the highest processing level) are being restructured, the evolution of all other dimensions of language proficiency (except for fluency) temporarily slows down or halts.

4.2.2 French

In the French interlanguage a significant and gradual increase in the fluency and lexical accuracy dimensions was noted. Lexical diversity did not change over time, and the increase in the grammatical accuracy measures and the values for syntactic complexity only set in after observation period 2. These results might imply that during the first observation period learners concentrated on getting the meaning across (i.e. by focusing on their fluency), which resulted in a trade-off between fluency on the one hand and syntactic complexity/grammatical accuracy on the other hand. From period 2 onwards an additional trade-off effect between syntactic complexity/grammatical accuracy and lexical accuracy occurred. Quantitative automatisation of these dimensions seems to have impeded the elaboration of the lexicon.

5. CONCLUSION

The study reported on in this article dealt with the developmental patterns of productive oral proficiency in two L2s (French and English) by the same learners (native speakers of Dutch) and compared this evolution to benchmark data from native speakers of those languages. The results of the analyses were interpreted in terms of different types of automatisation processes and the way these processes allow L2 learners to speed up retrieval of linguistic knowledge from memory. We argued that throughout the observation periods the participants of this study automatised their linguistic knowledge mostly in a quantitative way (i.e. they were able to retrieve and use proceduralised explicit knowledge more efficiently), resulting in significant gains in most measures for fluency, accuracy and complexity. In English, however, the language in which our learners were most proficient, it was argued that some of the grammatical rules (especially those concerning subordination) were automatised in a qualitative way (i.e. restructured in long-term memory so that learners could (equally) draw on implicit knowledge from procedural memory).

It should be noted, however, that these interpretations are tentative and theoretical at best. Firstly, the analysis presented here only addresses the average developmental patterns in each target language. But as Larsen-Freeman justly remarks, "individual developmental paths, each with all its variation, may be quite different from one another, even though in a grand sweep view these developmental patterns are quite similar" (Larsen-Freeman, 2006: 615). Secondly, the measures used in this study, although validated by previous research projects, are quite broad and may not have captured all aspects of the developmental patterns in detail. In order to study more accurately what happens in the mind of L2 learners, follow-up studies should take into account the inter-individual variation in developmental patterns, add more (and more sensitive) linguistic measures and consider the data in a qualitative way.


REFERENCES

Bialystok, E. (1991). Metalinguistic dimensions of bilingual language proficiency. In E. Bialystok (Ed.), Language processing in bilingual children. Cambridge: Cambridge University Press.
Bialystok, E. (2001). Bilingualism in Development: Language, Literacy and Cognition. Cambridge: Cambridge University Press.
Dewaele, J.-M. & Furnham, A. (1999). Extraversion: the unloved variable in applied linguistics research. Language Learning 49(3), 509-544.
Dewaele, J.-M. & Furnham, A. (2000). Personality and speech production: a pilot study of second language learners. Personality and Individual Differences 28, 355-365.
Ejzenberg, R. (2000). The juggling act of oral fluency: A psycho-sociolinguistic metaphor. In H. Riggenbach (Ed.), Perspectives on fluency (pp. 287-315). Ann Arbor: University of Michigan Press.
Ellis, R. (1990). Instructed Second Language Acquisition. Oxford: Basil Blackwell.
Ellis, R. (Ed.) (2005a). Planning and Task Performance in a Second Language. Amsterdam: John Benjamins.
Ellis, R. & Yuan, F. (2005). The effects of careful within-task planning. In R. Ellis (Ed.), Planning and Task Performance in a Second Language (pp. 37-76). Amsterdam: John Benjamins.
Eysenck, H.J. & Eysenck, S.B.G. (1985). Personality and Individual Differences. New York: Plenum.
Gilabert, R. (2004). Task complexity and L2 narrative oral production. Unpublished PhD dissertation.
Horwitz, E.K., Horwitz, M.B. & Cope, J. (1986). Foreign language classroom anxiety. Modern Language Journal 70(2), 125-132.
Kawauchi, C. (2005). The effects of strategic planning. In R. Ellis (Ed.), Planning and Task Performance in a Second Language (pp. 37-76). Amsterdam: John Benjamins.
Kenward, M.G. & Roger, J.H. (1997). Small sample inference for fixed effects from restricted maximum likelihood. Biometrics 53, 983-997.
Kuiken, F., Mos, M. & Vedder, I. (2005). Cognitive task complexity and second language writing performance. In S. Foster-Cohen & P. García-Mayo (Eds.), Eurosla Yearbook (pp. 195-222). Amsterdam: John Benjamins.
Larsen-Freeman, D. (2006). The Emergence of Complexity, Fluency, and Accuracy in the Oral and Written Production of Five Chinese Learners of English. Applied Linguistics 27(4), 590-619.
McLaughlin, M.L. (1990). Restructuring. Applied Linguistics 21, 299-311.
Morrow, K. (Ed.) (2004). Insights from the Common European Framework. Oxford: Oxford University Press.
Ortega, L. (2005). Learner-driven attention to form during pre-task planning. In R. Ellis (Ed.), Planning and Task Performance in a Second Language (pp. 37-76). Amsterdam: John Benjamins.
Paradis, M. (2000). Awareness of observable input and output not of linguistic competence. Paper presented at Odense University, Denmark, April 2000.
Pienemann, M. (1998). Language Processing and Second Language Development: Processability Theory. Amsterdam: John Benjamins.
Pienemann, M. (2005). An introduction to Processability Theory. In M. Pienemann (Ed.), Cross-Linguistic Aspects of Processability Theory (pp. 1-60). Amsterdam: John Benjamins.
Robinson, P. (2005). Cognitive complexity and task sequencing: Studies in a componential framework for second language task design. IRAL 43, 1-32.
SAS Institute (1999). SAS/STAT user's guide, Version 8. Cary, North Carolina: SAS Institute.
Segalowitz, N. (2003). Automaticity and second languages. In C.J. Doughty & M.H. Long (Eds.), Handbook of Second Language Acquisition (pp. 382-408). Malden, MA: Blackwell.
Siegman, A.W. (1987). The tell-tale voice: Nonverbal messages of verbal communication. In A.W. Siegman & S. Feldstein (Eds.), Nonverbal behaviour and communications (pp. 351-434). Hillsdale, New Jersey: Erlbaum.
Skehan, P. (1996). A framework for the implementation of task-based instruction. Applied Linguistics 17, 33-62.
Skehan, P. (1998). A cognitive approach to language learning. Oxford: Oxford University Press.
Towell, R. & Dewaele, J.-M. (2005). The role of psycholinguistic factors in the development of fluency amongst advanced learners of French. In J.-M. Dewaele (Ed.), Focus on French as a Foreign Language: Multidisciplinary Approaches. Clevedon: Multilingual Matters.
Towell, R., Hawkins, R. & Bazergui, N. (1996). The development of fluency in advanced learners of English. Applied Linguistics 17, 84-115.
Trondheim, L. (2002). Monsieur O. Paris: Delcourt.
Ullman, M.T. (2001). A neurocognitive perspective on language: The declarative/procedural model. Nature Reviews Neuroscience 2, 717-726.
Verbeke, G. & Molenberghs, G. (2000). Linear mixed models for longitudinal data. Springer-Verlag.
Vermeer, A. (2000). Coming to grips with lexical richness in spontaneous speech data. Language Testing 17(1), 65-83.
Wolfe-Quintero, K., Inagaki, S. & Kim, H.Y. (1998). Second Language Development in Writing: Measures of Fluency, Accuracy and Complexity. Honolulu: University of Hawaii Press.
Yuan, F. & Ellis, R. (2003). The effects of pre-task planning and on-line planning on fluency, complexity and accuracy in L2 oral production. Applied Linguistics 24, 1-27.


The Koninklijke Vlaamse Academie van België voor Wetenschappen en Kunsten (Royal Flemish Academy of Belgium for Science and the Arts) coordinates up to 25 scientific meetings each year, also called contact forums (contactfora), in the fields of the natural sciences (including the biomedical sciences), the humanities and the arts. The contact forums aim to bring together Flemish scientists or artists around specific themes. The proceedings of these contact forums form a separate publication series of the Academy.

Contact forum "Complexity, Accuracy and Fluency in Second Language Use, Learning & Teaching" (, 2007, main applicant)

One of the most frequently measured aspects of human behaviour is undoubtedly people's language proficiency. Speakers and learners of second or foreign languages in particular often have their proficiency in the second language assessed. But exactly what is it that is being assessed? What makes a second language speaker (or a native speaker for that matter) a proficient or non-proficient language user? And how can this be most efficiently (i.e. validly, reliably and feasibly) measured? Many SLA researchers and L2 practitioners, including the contributors to this thematic issue, assume that the construct of L2 proficiency is compositional in nature, and that its principal linguistic components can be captured by the notions of fluency, accuracy and complexity. Both within SLA research and in L2 pedagogy, these terms have long been used to investigate the development, the processing and the use of an L2, but their exact meanings and functions are still not clear.


These effect sizes can be transformed from one to the other (Norris & Ortega, 2006; Rosenthal, Rosnow, & Rubin, 2000; Vacha-Haase & Thompson, 2004). The general guidelines for interpreting effect sizes are that those between 0.2 and 0.5 are small effects, those between 0.5 and 0.8 are medium-sized effects and those greater than 0.8 are considered large effects (Lipsey & Wilson, 2001). In the figures, the top and bottom of each box indicate the upper and lower bounds of the 95% confidence interval. A box that crosses the zero-value line means that the result may be due to chance and is therefore not trustworthy. When the boxes do not overlap with each other, the differences are reliable.
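These reading rules can be written out as a minimal sketch in Python. It is an illustration under stated assumptions rather than part of the original analysis: the function names are hypothetical, the label "negligible" for values below 0.2 and the assignment of the boundary values 0.5 and 0.8 to the medium category are choices made here, and the d-to-r conversion shown is the common formula for two groups of equal size, given as one example of transforming one effect size into another.

```python
import math


def classify_effect_size(d):
    """Label an effect size using the 0.2 / 0.5 / 0.8 guidelines."""
    magnitude = abs(d)
    if magnitude < 0.2:
        return "negligible"  # label assumed; the text does not name this range
    if magnitude < 0.5:
        return "small"
    if magnitude <= 0.8:
        return "medium"
    return "large"


def interval_excludes_zero(lower, upper):
    """True when a confidence interval does not cross the zero-value line."""
    return lower > 0 or upper < 0


def intervals_overlap(first, second):
    """True when two (lower, upper) confidence intervals share any value."""
    return first[0] <= second[1] and second[0] <= first[1]


def d_to_r(d):
    """Convert a standardized mean difference d to a correlation r.

    Assumes two groups of equal size: r = d / sqrt(d^2 + 4).
    """
    return d / math.sqrt(d * d + 4)


# Hypothetical effect sizes with their 95% confidence intervals.
group_a = {"d": 0.95, "ci": (0.60, 1.30)}
group_b = {"d": 0.30, "ci": (-0.10, 0.55)}

label_a = classify_effect_size(group_a["d"])
label_b = classify_effect_size(group_b["d"])
a_trustworthy = interval_excludes_zero(*group_a["ci"])
b_trustworthy = interval_excludes_zero(*group_b["ci"])
reliable_difference = not intervals_overlap(group_a["ci"], group_b["ci"])
```

In this toy case the first effect is large and its interval excludes zero, the second is small and its interval crosses zero, and the two intervals do not overlap, so the difference between them would be read as reliable.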
