ImazAgirreetal.2024.IRAL CAFInstructionmanuscriptAUTHORSVERSION
net/publication/381030941
Article in IRAL - International Review of Applied Linguistics in Language Teaching · May 2024
DOI: 10.1515/iral-2023-0137
All content following this page was uploaded by Roberto Arias-Hermoso on 03 June 2024.
Imaz Agirre, A., Arias-Hermoso, R. & Ipiña, N. (2024). The effect of an intervention focused on
academic language on CAF measures in the multilingual writing of secondary students. International
Abstract
The present study aims to explore the effect of an experimental intervention based on academic writing
instruction and scientific argumentation on the argumentative multilingual writing of secondary school
students. Complexity, accuracy, and fluency (CAF) measures were used to evaluate the texts. A quasi-
experimental study with a pre-test/post-test design was carried out with a control group (n=49) and an
experimental group (n=63) of Basque-Spanish bilingual Year 8 students. The students composed
scientific argumentative texts before and after a science unit was taught. Participants in the experimental
group received instruction on academic writing and the discourse aspects of argumentation. The corpus
of 672 texts was processed using MultiAzterTest and CAF measures were retrieved. Repeated measures
ANOVAs were used to compare pre-test and post-test results. The control group exhibited a significant
decrease in some fluency, syntactic complexity and accuracy measures, while the experimental group
showed a significant improvement in some syntactic complexity and accuracy measures. These results
suggest that the experimental intervention might have had a positive impact on written CAF measures.
This study emphasises the importance of teaching academic language in multilingual contexts.
1. Introduction and literature review
Multilingual writing is a relatively new area in second language (henceforth, L2) writing research
(Rinnert and Kobayashi 2016). Extensive research has been carried out on multilingual students’
language learning in different linguistic skills and areas, such as overall speaking or writing skills, but
most of this research has been carried out from a monolingual perspective (Cenoz and Gorter 2011),
and consequently, there remains a lack of understanding of students’ multilingual writing skills
(Granados et al. 2022). Our definition of multilingual writing is in line with that of Rinnert and
Kobayashi (2016:365), who define it as “the ability to write in two or more languages, so is intended to
subsume bilingual writing ability or biliteracy. The study of multilingual writing involves comparison
of writing by the same writers in more than one language.” As previous language experiences affect
subsequent language learning (Cenoz 2013; Cummins 2021), education needs to consider the whole
bilingual education contexts in which the language of instruction is not the first language (henceforth,
L1) or dominant language of the majority of the students, an aspect that requires careful consideration
so that students achieve balanced literacies in all of their languages (Lorenzo and Rodríguez 2014;
Although multilingual education programmes and second language (L2) acquisition have received
attention recently, multilingual writing remains under-researched (Granados et al. 2022). Thus, this
multilingual students (in this case, Basque). To this end, we analysed subject-specific argumentative
compositions written in Basque, Spanish and English by a control group and an experimental group of
multilingual secondary education students enrolled in an immersion programme. This study aims to
1.1. Academic and disciplinary literacies
The linguistic characteristics of academic settings differ from those of everyday communication
(see Cummins 1980, 2021). In fact, academic discourse has often been characterised by its higher
nominalisation, the advanced use of cohesion devices, or abstract terminology from the disciplines
(Granados et al. 2022). Students, therefore, need to master academic discourse to be able to succeed in
Shanahan 2008; Moje 2015). Furthermore, different disciplines shape the linguistic features required
for knowledge construction and communication, as some linguistic features are more prevalent in
certain disciplines. This is evident at different linguistic levels, such as terminology or discourse.
Regarding the lexical level, for example, social science and humanities tend to use more abstract terms,
while science and technology experts require more concrete language (Durrant 2017). Discourse-related
disciplinary specifics are also presented in terms of subject-specific cognitive discourse functions
(CDFs) (Dalton-Puffer 2013, 2016), which can be understood as discourse patterns that represent certain
cognitive actions and that arise in specific educational contexts, such as defining, arguing, hypothesising
and categorising. For example, in the subject of history, causal explanations are central for historical
narratives (Coffin 2006; Lorenzo 2017), whereas physical descriptions are more frequent in geography
(Llinares et al. 2012). These discourse-level patterns inherently require using distinctive
lexicogrammatical elements (i.e., past tenses and temporal connectives in historical narratives, or
Many of these subject-specific conventions appear in the form of written texts, as writing is a
fundamental practice of language learning (Manchón and Polio 2021) and of education overall (Christie
2012). For successful education, the learner’s home languages and all the languages of instruction need
to be taken into account in the development of literacies (Lorenzo and Trujillo 2017), as many students
in multilingual programs study through a language that is not their L1 (Cenoz 2009; Cenoz and Gorter
2011). Previous research has shown between-language commonalities in the development of academic
literacies in one’s languages (Granados et al. 2021). In order to capture the processes involved in
multilingual academic writing, in line with Cummins’ (1980) common underlying proficiency (CUP)
theory, Rinnert and Kobayashi (2016) proposed a model of multilingual writing, which is divided into
three important components: the writer’s repertoire of knowledge, the social context, and the written
outcome. The extent to which the linguistic and textual features of multilingual texts overlap largely
depends on the writer’s repertoire of knowledge and social context (Rinnert and Kobayashi 2016).
Focusing on student writing offers an outstanding opportunity to explore linguistic and educational
development (Christie 2012) and, along with corpus linguistic approaches, can serve to track
development (Durrant et al. 2021). One of the most influential methods in corpus analysis has been the
CAF construct, a set of measures of complexity, accuracy and fluency that have long been used in
research in linguistics and applied linguistics to capture written or oral performance (Wolfe-Quintero
et al. 1998). These measures have been shown to be appropriate in the evaluation of writing quality
(Crossley and Kim 2022). Typically, such measures include analytic and quantitative descriptors of
linguistic features related to syntactic and lexical complexity, accuracy and fluency. The CAF construct
should be viewed as “multifaceted, multi-layered and multidimensional in nature and [the dimensions]
are interrelated in complex and not necessarily linear ways” (Michel 2017:52; Bulté and Housen 2012).
Due to the complex nature of the CAF construct, researchers must define and operationalise its
dimensions (Bulté and Housen 2012). In this paper, we understand syntactic complexity as the
production of a “variety and degree of sophistication of the syntactic structures deployed in written
production” (Lu 2017:494). Measures of lexical complexity reveal both the size (diversity) and depth
(sophistication) of a writer’s lexicon (Crossley and McNamara 2014; Maamuujav 2021). Accuracy
refers to the ability to produce error-free writing or speech and to follow the rule system of a language
(Bui and Skehan 2018; Wolfe-Quintero et al. 1998), while fluency is directly related to a writer’s control
of the language, as it reflects the “speed and ease” with which they produce linguistic output (Housen
and Kuiken 2009:462), and is often measured as the number of utterances produced.
Due to the aforementioned multidimensional nature of the CAF construct, it is crucial to understand
that complex tasks such as writing will result in trade-off effects (Skehan 1998, 2015). In fact, writers
might not have the necessary resources to focus on the successful production of all CAF dimensions,
often producing texts that are either fluent or complex or accurate, but not all three, which has previously
been referred to as the limited attention capacity (LAC) hypothesis (Skehan 2009). Trade-off effects are
particularly seen between complexity and accuracy (Skehan 2009). For instance, as suggested by Michel
(2017), when L2 learners perform a task that requires complex language use, they might be less fluent
(e.g., produce shorter utterances, or write/speak more slowly), probably due to focusing on the
complexity of the task. Another clear example is the overuse of simple linguistic structures when
learning an L2 – learners might be more inclined to produce simple but accurate language.
Numerous studies have used some or all of the dimensions to analyse different sets of corpora, as
quantitative corpus linguistic approaches provide clear and objective indicators of a learner’s
performance (Durrant et al. 2021). Many studies, such as those by Navés (2011), Roquet and Pérez-
Vidal (2017), and Lahuerta (2020), have focused on the effects of type of instruction (e.g., Content and
Language Integrated Learning vs. English as a Foreign Language) on written outcomes by using CAF
measures. Other studies have used CAF measures to track developmental trajectories in subject-specific
language writing (e.g., Granados et al. 2021, 2022; Lorenzo and Rodríguez 2014). CAF measures have
also been used in process-oriented approaches, such as to explore how pre-planning (e.g., Ashoori
Tootkaboni and Pakzadian 2020), the writer’s perceptions (e.g., García-Ponce et al. 2022), genre
differences (e.g., Yoon and Polio 2017) or individual differences (e.g., Kormos 2012; Vasylets et al.
2022) affect writing outcomes. Most of these studies found that some or all CAF metrics were
Freeman 2009).
Research has produced mixed findings regarding the role of CAF measures in language
development, and the term development has often been overused (Polio and Park 2016), as an increase in certain
measures does not imply development per se. However, there seems to be agreement that CAF measures
develop over time, with writers producing more complex, accurate and fluent texts as they become more
proficient in the language (Berninger et al. 2011; Crossley et al. 2011; Durrant et al. 2021). Nonetheless,
the aforementioned development must be considered carefully, as the acquisition and development of a
language is a dynamic process (Larsen-Freeman 2009; Michel 2017) in which development is normally
non-linear in nature (Polio and Park 2016), especially in the development of academic written language
As mentioned by Granados et al. (2022), much research has used the dimensions of CAF to assess
the development of academic writing at different stages and levels. Durrant et al. (2021) provide a
thoughtful review of some CAF measures in (academic) writing, considering both L1 and L2 studies.
Their review highlights the significant development of certain measures across educational levels (or
age), such as T-unit length or lexical diversity, in addition to their relationship to written quality
assessment. However, not all measures develop with time, as is proposed by the non-linear and dynamic
nature of CAF development (Michel 2017). For example, Polio and Shea (2014) and Yoon and Polio
(2017) longitudinally analysed students’ compositions using complexity and accuracy measures to
capture academic language change in a one-semester-long language course, and found little significant
change in L2 English writing over that period. Literature focusing on the use of
CAF for academic writing development is well-established in the field (see Durrant et al. 2021 for a
review); nevertheless, instructional studies focusing on the effects of experimental interventions are
scarcer. For example, Marashi and Chizari (2016) explored the effect of critical discourse analysis; Teng
and Huang (2021), that of metacognitive strategies; and Fathi and Rahimi (2022), that of a flipped
classroom approach.
Many of the aforementioned studies, however, have focused on general academic topics, and little
is known about the development of CAF measures in subject-specific writing. The studies by Lorenzo and Rodríguez
(2014) and Granados et al. (2021, 2022) explored changes in CAF measures in secondary education
history writing, and found that some of the metrics changed over time from one year to another.
Granados et al. (2022) found a similar development in students’ L1 Spanish and L2 English during a
2. Research questions
To sum up, CAF measures have long been used in corpus linguistics and academic language
research, as they serve as objective indicators of linguistic performance (Granados et al. 2022; Lorenzo
and Rodríguez 2014). Using CAF measures requires defining and operationalising each dimension
(Bulté and Housen 2012) and being aware of their dynamic and multifaceted nature (Michel 2017), and
the potential trade-off effects in which learners’ attentional resources are involved (Skehan 1998, 2009,
2015; Robinson 2007). However, as has been mentioned, the number of instructional studies focusing
on CAF is still limited, even more so when considering both disciplinary and multilingual writing. In
addition, many studies utilising CAF have only employed one of the dimensions (Phuoc and Barrot
This study, therefore, seeks to fill these gaps by analysing the effects of instruction focused on
CDFs and academic language on students’ written production of CAF measures in disciplinary trilingual
writing. The main objective of the present study is to explore how CDF and academic language teaching
affect student subject-specific writing, answering the following research question: Does instruction on
academic language influence complexity, accuracy and fluency measures in students’ writing in
Three main hypotheses (H) have been formulated for this study:
H1. According to the multilingual writing model proposed by Rinnert and Kobayashi (2016),
H2. Literature suggests that the development of CAF measures is non-linear (Michel 2017) and
that trade-off effects appear in their development (Skehan 1998). It is predicted, therefore, that
not all CAF measures will develop positively, but rather, due to trade-off effects and the LAC
hypothesis (Skehan 1998, 2015), students will focus primarily on either accuracy or complexity
(Skehan 2009).
H3. The third hypothesis assumes that students in the experimental group will improve their
writing in the language of instruction (Basque), which will be reflected by improvement in CAF
measures. In addition, gains will also be significant in Spanish and English texts, as certain
characteristics are shared across languages and overlap (Rinnert and Kobayashi 2016). This is
supported by Granados et al. (2022) and Arias-Hermoso and Imaz Agirre (2023), who claim
3. Methodology
3.1. Participants
This was a quasi-experimental study with a pre-test/post-test design carried out in four Basque
immersion Model D schools in the Basque Autonomous Community (BAC). Under Model D
instruction, students are taught through the medium of Basque in all subjects, except for 3 hours a week
each of Spanish and English language classes (see Cenoz 2023 for more information on the language
models in the BAC). Therefore, students are exposed to Basque in school for 26 hours a week. These
four schools were selected because they are members of the same educational network and follow the
same pedagogical approaches, which should minimise the effect of confounding variables
such as different teaching methods, instructional materials or students’ previous knowledge of the topic.
Two intact Year 8 (13-14-year-old students) classes were selected in each of three initially chosen
schools, one assigned to the control group in 2021 and the other to the experimental group in 2022.
Table 1 shows the total number of participants, the number included in the final sample, and the number
that did not meet the inclusion criteria. Participants who were absent for at least one data collection
point were excluded from the final sample. Because the data were collected in March 2021, during the
COVID-19 pandemic, and because of the inclusion criteria, a fourth school was added to the experimental group.
Table 1: Number of participants included and excluded per school and group. School D was not
included in the control group.
            Control group           Experimental group
            Included   Excluded     Included   Excluded
School A    16         14           12         18
School B    12         5            14         5
School C    21         5            20         2
School D    –          –            17         14
Total       49         24           63         39
All participants had Basque, Spanish or both as their L1 or dominant language. The majority of
students reported having Spanish as their L1 (67.25%), followed by those having Basque (23.89%) or
both (8.85%). Despite L1 differences, all participants had sufficient command of the language of
instruction, Basque. They were expected to be advanced learners of both Basque and Spanish (around
B2), and initial-intermediate learners of English (around A2). In order to control for potential
differences in language proficiency between schools and groups, their knowledge of Basque, Spanish
and English was measured with a LexTale test (de Bruin et al. 2017; Izura et al. 2014; Lemhöfer and
Broersma 2012). Although LexTale focuses primarily on word recognition, scores are correlated with
general proficiency and language dominance (e.g., Bonvin et al. 2023). In the task, participants have to
indicate whether the words presented to them are real or pseudowords. Incorrect answers were
subtracted from correct ones to obtain a numerical score ranging from -1 to 1 for each student in each language, with
scores closer to 1 indicating more proficient learners. Both groups scored better in Spanish (M=0.66,
SD=0.2) than in Basque (M=0.55, SD=0.15) and English (M=0.17, SD=0.16). T-tests showed no
significant difference between the experimental and control groups regarding the scores (all ps>0.3);
therefore, a similar baseline proficiency between the groups can be assumed.
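The scoring step can be illustrated with a minimal sketch. It assumes that the score is the number of correct answers minus the number of incorrect ones, divided by the number of items; the text reports only the range of -1 to 1, so this normalisation is an assumption, and the function name and sample responses are hypothetical.

```python
def lextale_style_score(responses):
    """Return (correct - incorrect) / total for (answered_real, is_real) pairs.

    Assumption: dividing by the number of items is what maps the score
    onto [-1, 1]; the study reports the range, not the exact formula.
    """
    if not responses:
        raise ValueError("at least one response is required")
    correct = sum(1 for answered_real, is_real in responses
                  if answered_real == is_real)
    incorrect = len(responses) - correct
    return (correct - incorrect) / len(responses)


# Hypothetical participant: 8 of 10 items judged correctly.
sample = [(True, True)] * 5 + [(False, False)] * 3 + [(True, False)] * 2
print(lextale_style_score(sample))  # 0.6
```

Under this assumption, a participant who answers at chance level scores around 0, which is in line with the low English means reported above.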
Written informed consent from parents or guardians was obtained before data collection, and all
participants were informed about the procedure of the study. Data were treated confidentially and were
collected solely for research purposes, and participants were given the option to withdraw from
participating in the study at any time. Anonymity could not be guaranteed due to data collection via e-mail;
however, all personal information was pseudonymised. This study received approval from the
The participants were asked to write three texts as part of the pre-test before learning the subject content,
and three as part of the post-test after having covered the topic in class. For the control group, data were
collected in March 2021 (pre-test) and three months later, in June 2021 (post-test). The experimental
group completed the tests on the same dates in 2022. At each data collection point, students were asked
to write three letters, one each in Basque, Spanish and English. In order to control for potential task
repetition and practice effects, the order in which the letters were completed was pseudo-randomised
Basque-Spanish. Students in each class were randomly assigned to one of these sequences. They were
given no word limit for the task and had 55 minutes (the average duration of a school session) to
complete each essay. The first two texts were completed in two contiguous sessions, while the third was
completed the following day. Although this schedule might affect students’ writing due to recency,
tiredness and task familiarity, the pseudo-randomised order of completion was used to mitigate these
effects. The texts were written on a computer under individual test conditions, and the participants were
explicitly told not to use online resources, translators or any aid not provided by the researchers.
Teachers and researchers were present during the tests and students were not allowed to ask questions.
In addition to the texts, students also completed a background questionnaire and the LexTale tests in
In their letters, the students wrote to different people or institutions to define renewable energy and
argue in favour of its use in their schools. The prompts were identical but the texts in Basque were
intended for parents, the ones in Spanish for the Spanish Ministry of the Environment, and the ones in
3.3. The intervention design and its context
In line with the pedagogical approach of the network of schools of which they are members, all four
schools participating in the present study make use of a competence-based approach in which subjects
at the secondary level are organised as three three-month-long projects. This competence-based
curriculum emphasises both the acquisition and transfer of lifelong learning knowledge based on
students’ exit profile (Antero et al. 2023). These projects are not interdisciplinary by nature, as each
content subject is taught separately. The present study was carried out during a project focused on
renewable energies, whose main objective was to learn about energy sources, their characteristics and
their use, and to critically evaluate and express opinions on that topic. Thus, the students were expected
An interdisciplinary group was created for the study in order to collaboratively analyse and adapt
the teaching materials used to teach the project and to design subsequent instructional sequences. The
interdisciplinary group included six researchers, four secondary science teachers and two materials
designers who had designed the original materials and textbooks used for the project. The group held
six two-hour collaborative sessions between June 2021 and February 2022. The first two sessions were
mainly theoretical, focusing principally on training teachers and materials designers to acquaint them
with academic language, disciplinary literacies and the CDF construct. The third and fourth sessions
were aimed at collaboratively analysing the teaching materials already in use with a checklist (see
Lersundi 2023). The analysis showed that, although students were expected to develop scientific
arguments and explanatory discourse, there was little explicit instruction on either in the materials. As
a result, the final two sessions focused on proposing relevant modifications to the teaching materials
After having adapted the materials, the four science teachers participating in the project carried out
the modified intervention in the classroom between March 2022 and June 2022. All teaching was in
person and during school hours, during the usual time periods scheduled for the subject of science (3
hours a week). The language of instruction was Basque. The instruction modifications focused mainly
on three one-hour sessions, which were specifically designed to teach students to argue scientifically.
Following Rinnert and Kobayashi’s (2016) multilingual writing model, our intervention addressed the
students’ repertoire of knowledge, focusing on all four components (topic knowledge, genre knowledge,
disciplinary knowledge and multilingual writing knowledge). Figure 1 summarises the objectives and
content of each of the three sessions. Sessions 1 and 2 took place in the middle of the project, while
The first one-hour session aimed to teach the students the CDF argue. Science teachers provided
the students with a handout focusing on Toulmin’s Argumentation Pattern (1958). The main elements
of Toulmin’s Model (claim, data, warrants, rebuttals) were defined and supported with examples. The
importance of using evidence to support one’s opinion was highlighted in the handout. Subsequently,
the students were asked to analyse a text about eating sweets, identify Toulmin’s elements in the text,
and justify whether the author had produced a successful argument. The main objective of the session
was for the students to become aware of the main elements required for effective argumentation. The
second session focused on learning the lexicogrammar required to express cause and effect in Basque.
To do so, the students participated in an interactive digital simulation in which they were asked to
combine different elements such as solar panels, batteries, human force or LED lights to assess the
effect of the elements on each other. They were asked to orally explain the energy transformation
processes by using the lexicogrammatical resources that had been taught. In the third and final one-hour
session, the students had to prepare an oral defence of an infographic comparing renewable and non-
renewable energy sources. They were explicitly asked to analyse and provide arguments for the
advantages and disadvantages of each, after having carried out a small research project on the topic.
The only CAF dimension included in the instruction was the lexicogrammar needed to express
For coding, the final trilingual corpus consisted of 672 texts. Student essays were given a code
(including student ID, school, language and time) and were aligned by participant, language and time.
Each text was processed using MultiAzterTest (Bengoetxea et al. 2020), a computational multilingual
corpus analysis tool previously used to track academic language development in writing in the discipline
of history in Secondary Education (e.g., Granados et al. 2022). MultiAzterTest was chosen because it
is one of the few multilingual tools that supports the three languages under study and provides common
quantitative measures for them. MultiAzterTest provides 163 indices in English, 141 in Spanish and
125 in Basque, as some are language-dependent, e.g., Common European Framework of Reference
(CEFR) word classifications in English. For this study, only measures that were common to the three
languages and that had previously been used in the literature were considered. We acknowledge that
some measures might be more language-specific or more common in a certain language; however, the
objective of this study was to compare pre-post differences rather than differences across languages.
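The coding and alignment of the corpus described above can be sketched as follows. This is only an illustration: the structure of the code, the field names and the sample data are hypothetical, and only the language abbreviations (EU, ES, EN) and the group sizes (n=49 and n=63) come from the study.

```python
from collections import defaultdict
from typing import NamedTuple

LANGUAGES = ("EU", "ES", "EN")  # Basque, Spanish, English
TIMES = ("pre", "post")


class TextCode(NamedTuple):
    """Hypothetical layout of the code assigned to each text."""
    student_id: str
    school: str
    language: str
    time: str


def complete_participants(codes):
    """Return the IDs of students with one text per language and time point."""
    seen = defaultdict(set)
    for code in codes:
        seen[code.student_id].add((code.language, code.time))
    expected = {(lang, time) for lang in LANGUAGES for time in TIMES}
    return sorted(sid for sid, cells in seen.items() if cells == expected)


# Hypothetical data: S01 wrote all six texts, S02 missed the post-test.
codes = [TextCode("S01", "A", lang, t) for lang in LANGUAGES for t in TIMES]
codes += [TextCode("S02", "A", lang, "pre") for lang in LANGUAGES]
print(complete_participants(codes))              # ['S01']
print((49 + 63) * len(LANGUAGES) * len(TIMES))   # 672
```

The last line checks the corpus size: 112 participants writing in three languages at two time points yield the 672 texts of the final corpus.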
Due to the multilayered nature of the construct (Bulté and Housen 2012), it is recommended to use
more than one measure for each dimension. Therefore, in this study, at least three measures were used
to track development in each of the four dimensions. Concerning syntactic complexity, the mean
number of modifiers per noun phrase, the number of subordinate clauses and the mean sentence length
(mean words per sentence) were calculated; all of these measures have been employed in previous
research (e.g., Casal and Lee 2019; Lorenzo and Rodríguez 2014; Maamuujav et al. 2021). The number
of modifiers per noun phrase, for example, has been shown to be correlated with writing quality
(Crossley and McNamara 2014), and sentence length development is regarded as an expansion of
academic language (Lorenzo and Rodríguez 2014). Logical and causal connectives were also included
as a measure of syntactic complexity due to the textual characteristics of the elicited genre.
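As an illustration of one of these measures, mean sentence length (mean words per sentence) can be approximated as in the sketch below. It assumes a naive splitter based on sentence-final punctuation and whitespace tokenisation; MultiAzterTest relies on its own language-specific processing, so this is not the tool's implementation, and the function name and sample text are hypothetical.

```python
import re


def mean_sentence_length(text):
    """Mean words per sentence, using a naive punctuation-based splitter."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    word_counts = [len(s.split()) for s in sentences]
    return sum(word_counts) / len(sentences)


sample = ("Solar energy is renewable. "
          "We should use it at school because it is clean.")
print(mean_sentence_length(sample))  # 7.0
```

The sample contains one sentence of 4 words and one of 10 words, hence a mean of 7.0.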
Regarding lexical complexity, the measure of textual lexical diversity (MTLD), the number of rare
content words and the number of distinct rare content words were calculated. The MTLD is a validated
measure (McCarthy and Jarvis 2010) that is stable across texts of different lengths (Zenker and Kyle
2021). Rare words are defined as those with a word frequency lower than 4 in wordfreq (see Speer et
al. 2018). Lexical units of agglutinative languages (i.e., Basque) are segmented and lemmatised by
MultiAzterTest (Bengoetxea et al. 2020), a necessary step for the recognition of both the lemmas and
agglutinated functional words, i.e., determiners and/or declensions (Otegi et al. 2017). Semantic
similarity measures provided by the computational analysis were also included within lexical
complexity. Fluency was measured by the total number of words, sentences and paragraphs in the texts
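For readers unfamiliar with the MTLD, the sketch below shows one common formulation of the measure: the text is read token by token, a "factor" is counted each time the running type-token ratio falls to the 0.72 threshold proposed by McCarthy and Jarvis (2010), and the result is the mean of a forward and a backward pass. The use of `<=` in the comparison and the fallback for texts that never reach the threshold are assumptions of this sketch, and it is not necessarily how MultiAzterTest computes the index.

```python
def _mtld_pass(tokens, threshold):
    """Count MTLD factors in one direction and return tokens / factors."""
    factors = 0.0
    types, count = set(), 0
    for token in tokens:
        count += 1
        types.add(token)
        if len(types) / count <= threshold:
            factors += 1.0            # a full factor is complete: reset
            types, count = set(), 0
    if count:
        # Partial credit for the leftover segment below a full factor.
        ttr = len(types) / count
        factors += (1.0 - ttr) / (1.0 - threshold)
    if factors == 0:
        return float(len(tokens))     # assumption: fall back to text length
    return len(tokens) / factors


def mtld(tokens, threshold=0.72):
    """Mean of the forward and backward passes over a list of tokens."""
    if not tokens:
        return 0.0
    forward = _mtld_pass(tokens, threshold)
    backward = _mtld_pass(list(reversed(tokens)), threshold)
    return (forward + backward) / 2


tokens = "a b a b a b a b".split()
print(mtld(tokens))  # 4.0
```

In the toy example, the type-token ratio drops to 2/3 after every third token, so two factors are completed in eight tokens in each direction, giving 8/2 = 4.0.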
MultiAzterTest does not provide scores for accuracy, and the authors therefore analysed accuracy
manually for each of the texts using MAXQDA, which provides inter-rater agreement scores. The
authors first analysed all control texts from School C (n=126) independently. To gauge the consistency
of their evaluations, intercoder agreement for error frequency in each text was calculated with
MAXQDA and reached 91%. Discrepancies were discussed and the remaining
For the purposes of this study, the type of error was not relevant; therefore, all errors were coded
and quantified identically. If an error appeared more than once, all appearances were counted as errors.
For example, if a student wrote reniwable instead of renewable three times, it was counted as three
errors. Neither content-related errors (e.g., stating that solar energy is
non-renewable) nor style-related errors (e.g., a very informal greeting in the letter to the Ministry of the
Environment) were counted. However, if a content error produced textual incoherence, it was
considered an error. In summary, spelling, typographical, coherence, syntactic and lexical errors were
counted. It should be noted that words in non-standard Basque were also counted as errors. Previous
research has used error-free T-units to capture written accuracy; however, this measure might be
inappropriate for beginners, as they tend to make errors in every sentence (Polio and Shea 2014).
Consequently, three scores were used to measure accuracy: the total number of errors, errors per word
(Lahuerta 2020; Orcasitas-Vicandi 2021) and errors per sentence (Sagasta 2003; Wolfe-Quintero et al.
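The three accuracy scores can be illustrated with a short sketch. It assumes that the number of errors has already been coded manually and that words and sentences are counted with a naive whitespace and punctuation split; the function name and the sample text are hypothetical.

```python
import re


def accuracy_scores(text, n_errors):
    """Total errors, errors per word and errors per sentence for one text."""
    words = text.split()
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "total_errors": n_errors,
        "errors_per_word": n_errors / len(words) if words else 0.0,
        "errors_per_sentence": n_errors / len(sentences) if sentences else 0.0,
    }


# The repeated misspelling is counted once per appearance, i.e. three errors.
sample = ("Reniwable energy is clean. Reniwable energy is cheap. "
          "We need reniwable energy.")
scores = accuracy_scores(sample, n_errors=3)
print(scores["errors_per_word"], scores["errors_per_sentence"])  # 0.25 1.0
```

With 3 errors in 12 words and 3 sentences, the ratios are 0.25 errors per word and 1.0 error per sentence.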
Statistical analyses were performed with JAMOVI (2022) and showed that the data were not normally
distributed; therefore, non-parametric tests were carried out to address the research questions. Repeated
measures Friedman analyses of variance (ANOVAs) and Durbin-Conover post-hoc tests were
4. Results
In this section, the results of the statistical analyses are presented. To facilitate a clear and organised
presentation, the findings of each dimension are reported separately. Significant differences from the
pre-test to the post-test are indicated with asterisks next to the post-test results. Only variables exhibiting
a significant change from pre-test to post-test in any of the languages (Basque, EU; Spanish, ES;
English, EN) or groups (control, experimental) are included in the tables; those measures that did not
reach statistical significance in any language or group are omitted from the tables in the upcoming
sections.
Regarding syntactic complexity measures in student writing, some significant differences between
testing moments were found, as indicated in Table 3. The control group decreased their use of
subordinates significantly in Basque (χ²(5)=3.722, p<.001) and English (χ²(5)=2.91, p=.004), and
marginally in Spanish (χ²(5)=1.745, p=.082). In addition, a significant decrease in the use of logical
connectives was found in the texts in English (χ²(5)=2.516, p=.013). No other measure changed
significantly. In contrast, the experimental group showed no significant decreases in those measures but
improved in the mean number of words per sentence, significantly in Basque (χ²(5)=2.8, p=.006), and
marginally in Spanish (χ²(5)=1.95, p=.052). Participants in the experimental group also increased their
use of modifiers per noun phrase in Basque (χ²(5)=2.65, p=.009) and Spanish (χ²(5)=2.81, p=.006).
Changes in the other measures in this dimension (the number of causal connectives) did not reach
statistical significance.
Table 3: Syntactic complexity measures: ***p<0.001, **p<0.05, *p<0.09.
                                      CONTROL GROUP                                      EXPERIMENTAL GROUP
MEASURE                         T     EU               ES               EN               EU               ES               EN
N of Modifiers Per Noun Phrase  Pre   0.673 (0.574)    1.161 (0.169)    0.925 (0.229)    0.559 (0.103)    1.100 (0.184)    0.87 (0.1811)
                                Post  0.585 (0.133)    1.168 (0.173)    0.899 (0.223)    0.577 (0.150)**  1.216 (0.218)**  0.96 (0.24)
4.2. Lexical complexity
Few differences were found in lexical complexity measures between pre-test and post-test results.
As illustrated in Table 4, there was a significant increase in the number of rare distinct words in Spanish
used by students in the control group from the pre-test to the post-test (χ²(5)=2.08, p=.039). In contrast,
the MTLD significantly decreased in Basque in the experimental group (χ²(5)=2.20, p=.029). No other
significant changes were found in any group, as neither the number of rare words nor semantic similarity
measures showed significant differences between the pre-test and post-test in any language.
                                      CONTROL GROUP                                      EXPERIMENTAL GROUP
MEASURE                         T     EU               ES               EN               EU               ES               EN
N of distinct rare words        Pre   5.429 (2.189)    12.898 (5.080)   7.265 (3.904)    5.17 (2.29)      13.03 (3.95)     7.13 (3.58)
                                Post  4.837 (2.392)    14.388 (5.450)** 7.429 (4.088)    5.12 (2.28)      13.34 (6.17)     7.48 (3.86)
4.3. Accuracy
Regarding accuracy measures, the control group showed an increase in errors overall in all
languages, as shown in Table 5. However, significant differences from the pre-test to the post-test were
found only in the error per word ratio in Basque (χ²(5)=2.089, p=.038) and English (χ²(5)=2.059,
p=.041). The experimental group, however, improved significantly in Basque, as they produced fewer
total errors (χ²(5)=3.466, p<.001), fewer errors per word (χ²(5)=3.089, p=.002) and fewer errors per
sentence (χ²(5)=2.210, p=.028). There was also a marginally significant improvement in the total
number of errors in English (χ²(5)=1.854, p=.065). No other significant changes were found.
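The three accuracy scores used here (total errors, errors per word and errors per sentence) are simple ratios over counts obtained during coding. A minimal Python sketch is given below; the class and function names and the example counts are hypothetical and do not reproduce the coding procedure or the MultiAzterTest output used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccuracyScores:
    total_errors: int
    errors_per_word: float
    errors_per_sentence: float


def accuracy_scores(n_errors: int, n_words: int, n_sentences: int) -> AccuracyScores:
    """Derive the three accuracy scores from raw counts for one text."""
    if n_errors < 0:
        raise ValueError("n_errors must not be negative")
    if n_words <= 0 or n_sentences <= 0:
        raise ValueError("a text needs at least one word and one sentence")
    return AccuracyScores(
        total_errors=n_errors,
        errors_per_word=n_errors / n_words,
        errors_per_sentence=n_errors / n_sentences,
    )


# Hypothetical text: 12 coded errors in 150 words and 10 sentences.
example = accuracy_scores(12, 150, 10)
print(example)
```

Because the two ratios normalise by text length, a shorter post-test text can show fewer total errors and yet a higher error rate per word, which is why the three scores are reported separately.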
                                      CONTROL GROUP                                      EXPERIMENTAL GROUP
MEASURE                         T     EU               ES               EN               EU               ES               EN
Errors per word                 Pre   0.069 (0.049)    0.059 (0.053)    0.098 (0.060)    0.10 (0.08)      0.07 (0.06)      0.14 (0.1417)
                                Post  0.084 (0.047)**  0.077 (0.065)    0.136 (0.108)**  0.08 (0.05)**    0.07 (0.07)      0.11 (0.08)
4.4. Fluency
In regard to fluency metrics, the control group showed a significant decrease from the pre-test to
the post-test in the number of words in the texts in Spanish (χ²(5)=1.993, p=.047) and in English
(χ²(5)=3.675, p<.001), as indicated in Table 6, while the number of words did not change in the
experimental group (all p>.05). There were no significant changes in the number of sentences or
Table 6: Fluency measures: ***p<0.001, **p<0.05, *p<0.09.
CONTROL GROUP EXPERIMENTAL GROUP
MEASURE T
EU ES EN EU ES EN
5. Discussion
The main objective of this study was to explore how an experimental intervention focused on
complexity, lexical complexity, accuracy and fluency measures (Bulté and Housen 2012; Michel 2017;
Wolfe-Quintero et al. 1998). Our findings showed some significant differences between groups (control
and experimental), suggesting that the experimental intervention and the modified teaching materials
had a significant effect on the outcomes of the participants’ writing. Considering that all participants in
the sample, both control and experimental, completed the same tasks under identical conditions (i.e.,
exposure to the languages, teaching hours, school teachers, data collection schedule, science content)
and had a similar language proficiency (as measured by LexTale tests), the positive effect of the
Focusing explicitly on the students’ repertoire of knowledge positively influenced their written
outcomes, which supports Rinnert and Kobayashi’s (2016) multilingual writing model and confirms the
first hypothesis in the present study. The control group showed no significant changes from pre-test to
post-test in most of the measures, and did significantly worse in 4 of them, improving in only one. In
contrast, the experimental group showed significant improvements in 5 measures, mostly in Basque.
The fact that the latter showed more improvements in CAF measures might suggest that all four
components of the repertoire of knowledge require consideration when carrying out teaching
interventions. Teaching for the control group focused only on the topic and on disciplinary knowledge,
and the students in the control group showed no improvement in their writing from pre-test to post-test.
However, the proposed experimental instructional sequence also addressed the other two components
of knowledge (genre and multilingual writing knowledge). CDFs are considered the linguistic
realisation of cognitive activities (Dalton-Puffer 2013) and lie at the crossroads of language, content
and disciplinary literacies (Morton 2020). Therefore, the teaching of academic language and CDFs
might have tapped into all four components of the knowledge repertoire proposed by Rinnert and
Kobayashi (2016).
The experimental group performed worse in only one measure, the Basque MTLD. However, this
needs to be interpreted with caution due to the limitations of MultiAzterTest when performing
computational analysis of an agglutinative language such as Basque. In fact, the tool counts declined
words as different words when calculating the MTLD, which can result in inaccuracies in the analyses.
For example, the Basque word for renewable, "berriztagarri" (an adjective with no declension marks),
was counted differently depending on whether it included a declension mark, such as "berriztagarriak"
(plural accusative) or "berriztagarrienak" (plural accusative with superlative). Therefore, the decrease
might be due to fewer declension marks being used, rather than to the use of fewer different words.
Overall, teaching academic language was shown to be beneficial and had a significant positive effect
on students’ writing.
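The effect of counting declined forms as separate words can be made concrete with a small Python sketch. It uses a plain type-token ratio as a simpler stand-in for the MTLD, and the lemma lookup is a hypothetical toy table rather than the behaviour of MultiAzterTest or of any real Basque lemmatiser.

```python
from typing import Dict, List

# Hypothetical lemma lookup covering only the forms discussed above.
TOY_LEMMAS: Dict[str, str] = {
    "berriztagarriak": "berriztagarri",
    "berriztagarrienak": "berriztagarri",
}


def type_token_ratio(tokens: List[str]) -> float:
    """Number of distinct forms divided by the number of tokens."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    return len(set(tokens)) / len(tokens)


def lemmatise(tokens: List[str], lemmas: Dict[str, str]) -> List[str]:
    """Replace each token by its lemma when the lookup knows it."""
    return [lemmas.get(token, token) for token in tokens]


tokens = ["berriztagarri", "berriztagarriak", "berriztagarrienak", "berriztagarri"]

# Surface forms: three distinct types in four tokens.
surface_ttr = type_token_ratio(tokens)
# Lemmas: a single type in four tokens.
lemma_ttr = type_token_ratio(lemmatise(tokens, TOY_LEMMAS))

print(surface_ttr, lemma_ttr)
```

With surface forms, the same lexical item inflates the diversity score whenever it appears with different declension marks, so a text with fewer declined forms can score lower without containing fewer different words.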
H2 was partially confirmed. The control group showed non-significant changes from the pre-test to
the post-test in lexical and syntactic complexity measures, and worse performance in fluency and
accuracy, whereas the experimental group improved in two dimensions (accuracy and syntactic
complexity) at the expense of the other two (fluency and lexical complexity), which did not change
significantly. The worse performance of the control group might be explained by a lack of genre and
writing strategies, which might have influenced how the students expressed their recently acquired topic
knowledge (Rinnert and Kobayashi 2016). As for the experimental group, the students’ writing showed
Two possible explanations may account for these findings. One explanation could be that the present
findings might not provide evidence to support trade-off effects if we acknowledge that the lack of
change does not necessarily mean that students have not developed their writing skills (Durrant et al.
2021). The fact that not all measures improved from the pre-test to the post-test does not necessarily
mean that no linguistic improvement took place (Michel 2017). As suggested by previous research (see
for example Polio and Shea 2014; Yoon and Polio 2017), CAF measures might need more time to
change significantly, which would explain the lack of development in many of the measures.
Additionally, the time between the pre-test and the post-test (3 months) might not have been sufficient
for significant change to occur, probably due to the cognitive, linguistic and maturational features of
the participants. Furthermore, the experimental intervention was not focused on teaching CAF or on
raising the students’ awareness of them, and measuring the effects of explicit instruction of CAF along
Another possible interpretation of the findings could support the LAC hypothesis and trade-off
effects between dimensions (Skehan 1998, 2009, 2015; de Jong et al. 2015). Other analyses of the
corpus showed that students in the sample had a better command of argumentation skills after the
experimental intervention (Garro et al., under review), which further supports the LAC hypothesis, as
it suggests that students in the experimental group focused primarily on arguing properly in the post-
test, rather than on the linguistic features of their compositions. However, mastering discourse-related
aspects such as argumentation might be correlated with the development of CAF measures. The
resource-directing dimensions of tasks (see Robinson 2007; Robinson and Gilabert 2007) shown by
students could have resulted in beneficial effects on accuracy and complexity measures. Further
research is needed to shed light on these two possible explanations and their potential interpretations,
Our third hypothesis predicted that improvement in CAF measures would be present mainly in the
language of instruction (Basque), but parallelisms in the other two languages were also expected. This
hypothesis was confirmed by the findings, as Basque measures were the most influenced by the
consequence of Basque being the students’ language of instruction and the language in which the
experimental instruction was conducted. However, as hypothesised, improvement tendencies were also
found in Spanish (syntactic complexity measures) and English (accuracy measures). These findings
may suggest that interventions and instructional sequences aimed at fostering students’ academic
language skills, such as scientific literacy, expressing cause and effect or argumentation, are indeed
beneficial for their acquisition of balanced literacies in their languages (Lorenzo and Rodríguez 2014;
Lorenzo and Trujillo 2017). The results of the present study seem to support Cummins’ (1980, 2021)
CUP theory in that they confirm the idea that some aspects of language are crosslinguistic, going beyond
language barriers. Along the same lines, our results align with those reported by Granados et al. (2022)
and Arias-Hermoso and Imaz Agirre (2023) in that a parallel development of disciplinary academic
language takes place throughout a learner’s whole linguistic repertoire, presenting some overlapping
In conclusion, this study suggests that a focus on academic language and argumentation has a
positive impact on multilingual writing as measured by complexity, accuracy and fluency. Teaching
academic language was found to have a positive influence on students’ production of syntactic
complexity and accuracy in Basque, the language of instruction. In addition, trade-off effects were
observed in the results, as lexical complexity and fluency did not change significantly in the
experimental group. Unexpectedly, and in contrast to previous studies (e.g., de Jong et al. 2015), both
syntactic complexity and accuracy developed in parallel. This development, however, did not occur
only in the language of instruction but also in the students’ other two languages: their L1 (Spanish) and
the foreign language (English); this result supports multilingual models of writing (Arias-Hermoso and
Imaz Agirre 2023; Cummins 2021; Granados et al. 2022; Rinnert and Kobayashi 2016).
Certain limitations to the present study must be acknowledged. First, participant exclusion in this
study was high, and the resulting sample was smaller than expected. Data collection took place during
the 2020-2021 and 2021-2022 academic years, during which COVID-19 health recommendations and
policies resulted in the implementation of lockdown protocols for students who tested positive or who
had been in contact with someone who tested positive. Consequently, 34% of the participants had to be
excluded from the sample because they were absent for at least one data collection point. As a result,
the students could not be separated according to their L1, which could provide a clearer picture of the
effectiveness of the experimental intervention. Second, language proficiency was measured by LexTale
tests. While these tests provide a good representation of general proficiency (Bonvin et al. 2023; de Bruin
et al. 2017), they might not capture all skills related to language proficiency, as they focus solely on
word recognition. Moreover, the fact that Basque is an agglutinative language poses challenges when
conducting computational linguistic analyses. Some inaccuracies emerge, such as those mentioned in
the Discussion section regarding the MTLD in Basque. In addition, limitations related to the accuracy
analysis emerged during coding, including both the lack of inclusion of error types and the impact of
errors on lexical complexity measures. Finally, we have to acknowledge the potential effect of the
different registers in the essays, which could have affected the production of CAF to a certain extent.
More research is needed to fully understand how disciplinary or subject-specific writing develops,
and quantitative computational tools that can carry out subject-specific language analyses are needed to
can then be created. Future studies should further investigate the effects of different types of
interventions on student writing measured by the CAF construct, since few intervention studies with a
(quasi-)experimental design have been conducted. Moreover, researchers and curriculum designers should
consider interdisciplinary approaches in which science and language teachers collaborate to focus on
both disciplinary and language requirements. In addition, a larger sample of students with different
linguistic profiles and backgrounds is needed to explore whether similar developmental paths take place
after an intervention. Additionally, the attentional requirements of producing CDFs need to be addressed
in subsequent studies.
Some implications can be drawn from this study, at both the theoretical and practical levels. The
present study theoretically supports the multilingual writing model suggested by Rinnert and Kobayashi
(2016) in that the repertoire of knowledge of a writer affects subsequent written production.
Furthermore, our study contributes to the field of multilingual writing with findings from a subject-
specific trilingual corpus, which is scarce in research. However, the main implications of this study are
pedagogical, as several applications for teaching and learning (disciplinary) languages can be drawn
from the present findings. Our study highlights the importance of focusing on academic and subject-
specific language conventions, such as scientific argumentation, in this case, as it may benefit students
not only in genre mastery but also in their written production of CAF. Benefits were observed despite
Basque being the L2 of the majority of the subjects, which emphasises the prominent role of the
language of instruction in building disciplinary literacies. As a concluding remark, and also as suggested
by previous research (Banegas and Mearns 2023; Sato 2023), it is crucial that researchers, educators
and material developers collaborate to design learning contexts, activities and interventions that foster
REFERENCES
Arias-Hermoso, Roberto, & Imaz Agirre, Ainara. 2023. Exploring multilingual writers in secondary
education: insights from a trilingual corpus. European Journal of Applied Linguistics. Advance
Antero, Amaia, Artolazabal, Amaia, Garaialde, Esther, & Ibarzabal, Zigor. 2023. Bazatoz?
Ashoori Tootkaboni, A., & Pakzadian, M. 2020. Exploring the effects of pre-task planning time
on EFL learners’ narrative writing. Bellaterra Journal of Teaching & Learning Languages &
Banegas, Dario L., & Mearns, Tessa. 2023. The Language Quadriptych in content and language
integrated learning: Findings from a collaborative action research study. Journal of Multilingual
https://doi.org/10.1080/01434632.2023.2281393
Bengoetxea, Kepa, Gonzalez-Dios, Itziar, & Aguirregoitia, Andoni. 2020. AzterTest: Open Source
Linguistic and Stylistic Analysis Tool. Procesamiento Del Lenguaje Natural, 64, 61-68.
https://doi.org/10.26342/2020-64-7
Berninger, Virginia, Nagy, William, & Beers, Scott. 2011. Child writers’ construction and
https://doi.org/10.1007/s11145-010-9262-y
Bonvin, Audrey, Brugger, Ladina, & Berthele, Raphael. 2023. Lexical measures as a proxy for
Bui, Gavin, & Skehan, Peter. 2018. Complexity, Accuracy, and Fluency. The TESOL Encyclopedia
Bulté, Bram, & Housen, Alex. 2012. Defining and operationalising L2 complexity. In A. Housen,
Accuracy and Fluency in SLA (pp. 50-68). John Benjamins Publishing Company.
Casal, J. Elliot, & Lee, Joseph J. 2019. Syntactic complexity and writing quality in assessed first-
https://doi.org/10.1016/j.jslw.2019.03.005
Cenoz, Jasone. 2009. Towards multilingual education: Basque educational research from an
Cenoz, Jasone. 2013. The influence of bilingualism on third language acquisition: Focus on
Cenoz, Jasone. 2023. Plurilingual education in the Basque Autonomous Community. In J. M. Cots
(Ed.). Profiling plurilingual education: A pilot study of four Spanish autonomous communities (pp.
Cenoz, Jasone, & Gorter, Durk. 2011. A Holistic Approach to Multilingual Education:
4781.2011.01204.x
Christie, Frances. 2012. Language education throughout the school years: a functional
perspective. Wiley-Blackwell.
Coffin, Caroline. 2006. Historical discourse: the language of time, cause and evaluation.
Continuum.
Crossley, Scott A., & Kim, Minkyung. 2022. Linguistic Features of Writing Quality and
https://doi.org/10.37514/JWA-J.2022.6.1.04
Crossley, Scott A., & McNamara, Danielle S. 2014. Does writing development equal writing
Crossley, Scott A., Weston, Jennifer L., McLain Sullivan, Susan T., & McNamara, Danielle S. 2011.
The development of writing proficiency as a function of grade level: A linguistic analysis. Written
Cummins, Jim. 1980. The exit and entry fallacy in bilingual education. NABE Journal, 4(3), 25-
60. https://doi.org/10.1080/08855072.1980.10668382
Cummins, Jim. 2021. Rethinking the Education of Multilingual Learners: A Critical Analysis of
https://doi.org/10.21832/9781783096145-005
de Bruin, Angela, Carreiras, Manuel, & Duñabeitia, Jon Andoni. 2017. The BEST Dataset of
de Jong, Nivja H., Groenhout, Rachel, Schoonen, Rob, & Hulstijn, Jan H. 2015. Second language
fluency: Speaking style or proficiency? Correcting measures of second language fluency for first
https://doi.org/10.1017/S0142716413000210
Durrant, Philip. 2017. Lexical Bundles and Disciplinary Variation in University Students’ Writing:
https://doi.org/10.1093/applin/amv011
Durrant, Philip, Brenchley, Mark, & McCallum, Lee. 2021. Understanding Development and
https://doi.org/10.1017/9781108770101
Fathi, Jalil, & Rahimi, Masoud. 2022. Examining the impact of flipped classroom on writing
complexity, accuracy, and fluency: a case of EFL students. Computer Assisted Language Learning,
2022. Role of EFL learners’ perceptions of task difficulty in complexity, accuracy and fluency: An
https://doi.org/10.30827/portalin.vi37.15855
Granados, Adrián, Lorenzo-Espejo, Antonio, & Lorenzo, Francisco. 2021. Evidence for the
Granados, Adrián, Lorenzo-Espejo, Antonio, & Lorenzo, Francisco. 2022. A portrait of academic
https://doi.org/10.1080/09500782.2022.2079951
Housen, Alex, & Kuiken, Folkert. 2009. Complexity, Accuracy, and Fluency in Second Language
Izura, Cristina, Cuetos, Fernando, & Brysbaert, Marc. 2014. Lextale-Esp: a test to rapidly and
Kormos, Judit. 2012. The role of individual differences in L2 writing. Journal of Second
Lahuerta, Ana. 2020. Analysis of accuracy in the writing of EFL students enrolled on CLIL and
non-CLIL programmes: the impact of grade and gender. The Language Learning Journal, 48(2),
121-132. https://doi.org/10.1080/09571736.2017.1303745
Larsen-Freeman, Diane. 2009. Adjusting Expectations: The Study of Complexity, Accuracy, and
https://doi.org/10.1093/applin/amp043
Lemhöfer, Kristin, & Broersma, Mirjam. 2012. Introducing LexTALE: A quick and valid Lexical
Test for Advanced Learners of English. Behavior Research Methods, 44, 325-343.
https://doi.org/10.3758/s13428-011-0146-0
azterketa]. https://hdl.handle.net/20.500.11984/5964
Llinares, Ana, Morton, Tom, & Whittaker, Rachel. 2012. The Roles of Language in CLIL.
Lorenzo, Francisco. 2017. Historical literacy in bilingual settings: Cognitive academic language in
https://doi.org/10.1016/j.linged.2016.11.002
Lorenzo, Francisco, & Rodríguez, Leticia. 2014. Onset and expansion of L2 cognitive academic
https://doi.org/10.1016/j.system.2014.09.016
policymaking: Present state and future outcomes. European Journal of Applied Linguistics, 5(2),
177-197. https://doi.org/10.1515/eujal-2017-0007
research and implications for writing assessment. Language Testing, 34(4), 493-511.
https://doi.org/10.1177/0265532217710675
Maamuujav, Undarmaa. 2021. Examining lexical features and academic vocabulary use in
https://doi.org/10.1016/j.asw.2021.100540
Maamuujav, Undarmaa, Olson, Carol Booth, & Chung, Huy. 2021. Syntactic and lexical features
of adolescent L2 students’ academic writing. Journal of Second Language Writing, 53, 100822.
https://doi.org/10.1016/j.jslw.2021.100822
Manchón, Rosa M., & Polio, Charlene. 2021. L2 Writing and Language Learning. In R. M.
Manchón, & C. Polio (Eds.), The Routledge Handbook in Second Language Acquisition: Second
Marashi, Hamid, & Chizari, Azam. 2016. Using Critical Discourse Analysis Based Instruction to
Improve EFL Learners’ Writing Complexity, Accuracy and Fluency. Journal of English Language
McCarthy, Philip, & Jarvis, Scott. 2010. MTLD, vocd-D, and HD-D: A validation study of
https://doi.org/10.3758/BRM.42.2.381
Michel, Marije. 2017. Complexity, Accuracy and Fluency in L2 Production. In S. Loewen, & M.
Sato (Eds.), The Routledge Handbook of Instructed Second Language Acquisition (pp. 50-68).
Routledge.
Moje, Elizabeth Birr. 2015. Doing and teaching disciplinary literacy with adolescent learners: a
https://doi.org/10.17763/0017-8055.85.2.254
Morton, Tom. 2020. Cognitive Discourse Functions: A Bridge between Content, Literacy and
Language for Teaching and Assessment in CLIL. CLIL Journal of Innovation and Research in
Navés, Teresa. 2011. How promising are the results of integrating content and language for EFL
writing and overall EFL proficiency? In Y. Ruiz de Zarobe, J. M. Sierra, & F. Gallardo del Puerto
Otegi, Arantxa, Imaz, Oier, Díaz de Ilarraza, Arantza, Iruskieta, Mikel, & Uria, Larraitz. 2017.
ANALHITZA: a tool to extract linguistic information from large corpora in Humanities research.
Pessoa, Silvia, Miller, Ryan T., & Kaufer, David. 2014. Students’ challenges and development in
0006
Polio, Charlene, & Park, Ji-Hyun. 2016. Language development in second language writing. In R.
Manchón, & P. K. Matsuda (Eds.), Handbook of Second and Foreign Language Writing (pp. 287-
Polio, Charlene, & Shea, Mark C. 2014. An investigation into current measures of linguistic
accuracy in second language writing research. Journal of Second Language Writing, 26, 10-27.
https://doi.org/10.1016/j.jslw.2014.09.003
Rinnert, Caroline, & Kobayashi, Hiroe. 2016. Multicompetence and multilingual writing. In R. M.
Manchón, & P. Matsuda (Eds.), Handbook of Second and Foreign Language Writing (pp. 365-
Robinson, Peter. 2007. Task complexity, theory of mind, and intentional reasoning: effects on L2
speech production, interaction, uptake and perceptions of task difficulty. International Review of
Robinson, Peter, & Gilabert, Roger. 2007. Task complexity, the Cognition Hypothesis and second
Roquet, Helena, & Pérez-Vidal, Carmen. 2017. Do Productive Skills Improve in Content and
Language Integrated Learning Contexts? The Case of Writing. Applied Linguistics, 38(4), 489-
511. https://doi.org/10.1093/applin/amv050
Sagasta, María Pilar. 2003. Acquiring writing skills in a third language: The positive effects of
https://doi.org/10.1177/13670069030070010301
Sato, Masatoshi. 2023. Navigating the research–practice relationship: Professional goals and
Shanahan, Timothy, & Shanahan, Cynthia. 2008. Teaching disciplinary literacy to adolescents:
Skehan, Peter. 1998. A cognitive approach to language learning. Oxford University Press.
Skehan, Peter. 2009. Modelling second language performance: Integrating complexity, accuracy,
Skehan, Peter. 2015. Limited Attention Capacity and Cognition: Two hypotheses regarding second
language performance on tasks. In M. Bygate (Ed.). Domains and Directions in the Development
of TBLT: A decade of plenaries from the international conference. John Benjamins Publishing
Company.
Speer, Robin, Chin, Joshua, Lin, Andrew, Jewett, Sara, & Nathan, Lance. 2018.
Teng, Mark Feng, & Huang, Jing. 2021. The effects of incorporating metacognitive strategies
instruction into collaborative writing on writing complexity, accuracy, and fluency. Asia Pacific
The JAMOVI Project. 2022. JAMOVI (Version 2.3) [Computer Software]. Retrieved from
https://www.jamovi.org
Vasylets, Olena, Mellado, M. Dolores, & Plonsky, Luke. 2022. The role of cognitive individual
differences in digital versus pen-and-paper writing. Studies in Second Language Learning and
Wolfe-Quintero, Kate, Inagaki, Shunji, & Kim, Hae-Young. 1998. Second Language Development
Yoon, Hyung-Jo, & Polio, Charlene. 2017. The Linguistic Development of Students of English as
https://doi.org/10.1002/tesq.296
Zenker, Fred, & Kyle, Kristopher. 2021. Investigating minimum text lengths for lexical diversity