Couzijn - 1999 - Learning To Write by Observation of Writing
Abstract
A traditional writing pedagogy, learning-by-doing exercises, is criticised for its lack of focus
on balancing writing and learning processes. Three variants of learning-by-observation are
investigated as possible alternatives: observing writers as models (OW), observing both writers
and readers as models (OWR), and observing readers as feedback on writing performance
(FW).
Observations were made by means of authentic video-tape recordings of student writers or
readers (model conditions), or by live confrontations between writers and their readers
(feedback condition). Training focused on argumentative text. Participants were pre- and post-
tested on reading skill and writing skill in order to measure learning and transfer effects.
Results show that all observation conditions were more effective than the learning-by-doing
condition: OW and FW showed larger learning effects (on writing skill) and larger transfer
effects (on reading skill). Condition OWR only showed larger transfer effects. It is concluded
that the effective components of learning-by-observation deserve to be studied in more detail.
© 1999 Elsevier Science Ltd. All rights reserved.
1. Introduction
Traditional methods for writing instruction rely firmly on the effect of doing exer-
cises. Such a method can be summarised as follows:
PII: S0959-4752(98)00040-1
110 M. Couzijn / Learning and Instruction 9 (1999) 109–142
edge for skilful regulation and thus execution of the entire writing process. Being
self-aware of one’s writing activities and their consequences is an essential step
towards detecting possible flaws in, and possibilities to enhance, one’s writing. Thus,
good writers invest in staying aware of their activities during the course of the writ-
ing process.
How does this perspective on performance regulation relate to learning? Writers
in a learning situation, like students at school, should consider a writing task or
exercise as being part of a learning task. They must execute two processes at the
same time: a writing process (with a material aim: producing a text) and a learning
process (with a cognitive aim: acquiring knowledge about, and skill in producing
such texts). This parallel learning process can be represented with the same mor-
phology as the writing process, including executive activities (orientation on the
learning task, performing learning activities), monitoring activities (self-observation
and evaluation of learning activities) and regulation of learning (e.g. starting over
again, or deciding to skip parts of the exercise). The writing task ought to be instru-
mental to the learning task; thus the quality of learning depends on the way
the student is able to connect these two tasks.
This connection lies in the instructiveness of writing experiences evoked in edu-
cation. To be instructive, writing rules, techniques and strategies must not only be
executed, but also be consciously monitored and conceptualised (“What have I been
doing now? What should I call it? Which strategy must I choose? Have I done
anything similar before?”), and their positive or negative effects determined (“That
strategy has been very time-consuming. This brainstorm gave me a lot of material.
This type of sentence served well in the conclusion.”). In this way, student writers may
use their writing experiences or evaluations (the output of the monitoring processes
on level II) as input for their learning. Students who put some effort into the evalu-
ation of their working-method invest in the meaningfulness and effectiveness of
their learning.
In sum, learning-to-write by doing writing exercises makes a strong appeal to the
learners’ self-observing and self-regulative capacities. The regulation concerns three
aspects of the task: a) students must maintain a “double agenda” with activities aimed
at text production, and activities aimed at learning; b) for each of these agendas they
must effectively alternate executional, monitoring, and regulative activities, and c)
they must control a variety of executive activities for the composition or comprehen-
sion of text.
It is, however, neither possible nor productive for learners to be constantly aware of
all of these mental activities. Thus, a permanent awareness is not advocated here: it
takes learners much cognitive effort to switch between levels of task execution,
self-observation and regulation. As a result, it may be that when learning-by-doing,
the self-observation activities focus on writing performance at the cost of learning
performance: the short-term interest may dominate. If this turns out to be a weak
spot of the learning-by-doing method, compensation must be sought in alternative
methods.
avoiding. In this learning method, the observed behaviour represents the observer’s
learning goal, and the learner does not take part in the observed communication.
Observation-as-feedback, however, relies on the observers’ participation in the
communication. Their initial task execution (here: doing a writing exercise) is fol-
lowed by observation of a communicative partner who performs the complementary
task (here: analysing and comprehending the text). Thus, the observers acquire infor-
mation about the adequacy of their writing which they cannot acquire in any other way
(Schriver, 1992). The observed behaviour is complementary (from a communicative
perspective) to the observer’s learning goal, and the observer takes part in the com-
munication.
Both learning-by-observation methods may be superior to learning-by-doing,
although for different reasons. Observation-of-models may be effective because the
models offer concrete, realistic examples, and the focus is on monitoring and evaluat-
ing the execution processes. Observation-of-feedback may be effective because the
feedback is meaningful and authentic and the observer is personally involved. There
is no theory favouring one of these methods over the other.
In analogy to Sonnenschein and Whitehurst’s study, we included two variants of
the observation-of-models pedagogy: observing one communicative role (writers)
and observing both communicative roles (writers and readers, who perform a
complete communicative transfer). We expected that communication rules, observed
in both writing and reading contexts, would be acquired in a more abstract way
which enables active use in either of these modes. In other words, the acquired rule will
be transferred to the reading as well as the writing mode.
In sum, this experiment investigates the effectiveness of three functionally differ-
ent types of learning-by-observation in comparison to learning-by-doing. Two vari-
ants of observation-of-models are examined: Observing Writers as models (OW),
and Observing Writers and Readers as models (OWR). One variant of observation-
as-feedback is examined: observing readers as Feedback on one’s own Writing per-
formance (FW).
We distinguish effects on learning and effects on transfer. Learning stands for the
acquisition of skill within the same mode as the learning activities are aimed at;
transfer stands for acquisition of skill within the complementary mode. Since all
activities aim at writing skill acquisition, we call the effect on writing ability learning,
and the effect on reading skill transfer.
Fig. 2 represents the structural relations between the independent variables, and
indicates the research questions:
2.2. Operationalisation
3. Method
3.1. Design
3.2. Subjects
120 students from 8 city schools who had just finished the 9th grade (intermediate
and high levels) took part in the experiment. The average age was 15.5 years. 65%
of the participants were female; boys and girls were almost equally spread over the
conditions. For their participation, the students received a modest financial reward.
In the assignment of subjects, a stratification was applied regarding gender and
the level of education (intermediate vs. high); further assignment across the strata
was random. In each condition precisely 12 students from intermediate level and 18
from high level took part.
Table 1
The experimental design
The theory on argumentative texts forms the backbone of the four different courses
that were developed for the experimental conditions. Nevertheless, the subjects spent
about 70% of the time on the exercises in which the theory must be applied. The
nature of a course as a learning-by-doing course or a learning-by-observation course
is therefore not at all determined by the theory, but only by the type of exercises.
First, subjects in all conditions study the same theoretical part, and subsequently
answer one or two “control questions”. These questions ask for the gist of the part
that has just been studied and are intended to stimulate active reading of the theory.
Next, subjects apply the theory in one of four different types of exercises: individ-
ual writing exercises (DW), observation of writers (OW) or communicative dyads
(OWR), or observation as feedback on writing exercises (FW). After completing one
or more exercises, subjects continue with the next portion of theory, the next control
question, the next exercise, and so on.
The differences between the types of exercises can be explained by an example
from the first lesson. First, in the theoretical part, the two characteristics of argumen-
tative text have been introduced: a stated opinion, and one or more reasons for having
this opinion:
I think we should go to Italy for our holidays, because the weather is always
fine and the food is great.
You must really put the volume of your music down. I cannot work with all that
noise in my ears.
Thus, concept learning takes place (Mayer, 1983, 1987): subjects learn the concept
argumentative text, and learn to identify a text as belonging to a particular type,
according to a conceptual rule:
S and A are the essential parts/properties of a type B text
Such a neutral conceptual rule can be used in either receptive or productive communi-
cation. It would have to be adapted to receptive tasks, aimed at determining the
properties from which class membership is inferred:
Check again the three examples on page 2 and then write three new examples
of argumentative texts.
DW subjects must use the rule productively. In the workbooks a limited space is
reserved for the answer, so they must confine themselves to application of the rule.
More specifically, they must give meaning to the abstract concepts “opinion” and
“reason for having this opinion”, aided by the examples. Secondly, they must under-
stand that both characteristics are necessary to meet the rule: opinions only, however
floridly presented, will not suffice. Finally, they must generate concretisations of the
characteristic concepts.
You are going to see two students doing this assignment. It is your task to find
out what they do well, and what they do wrong. When you have observed both
students, you may advance to the next page.
You saw two students doing the assignment. They wrote the following texts:
— Student 1: “I don’t need a dog any more, because I already have three.”
— Student 2: “Dogs are more fun than cats, but they need much more attention.”
===>>> Which student did better, according to you? Student.....
===>>> Explain briefly why you think the other student did worse.
— Student..... did worse, because: …………………………………………
The subjects get oriented to the observation exercise by reading the writing assign-
ment. Next they are explicitly instructed to evaluate the observed students’ task per-
formance, which should stimulate engaged and therefore instructive observation.
Observation thus entails that the subject checks how the observed students apply
the rule — and which problems may arise.
After having observed two different student writers (see section Procedures) the
subject must determine if one of them did worse, and explain what exactly made
this performance less successful. In this way the subjects are forced to designate
“good models” and “worse models”, and not take anything for granted.
It should be noted that the subjects in this condition not only observe writing
processes, but also perform comprehension processes. In order to evaluate the texts
of the observed writers, they must analyse them in terms of the argumentative charac-
teristics. This side-note is important for the explanation of transfer effects.
The first student you will observe is the writer who was instructed to:
“Write a short argumentative text”
After (s)he wrote the text, the second student or reader was asked to:
“Determine if this is an argumentative text. Tell us why.”
Now you are going to observe both the writer and the reader. It is your task to
find out whether each of them does well, and what they may be doing wrong.
When you have observed both students, you may advance to the next page.
You saw two students doing writing and reading assignments. They answered:
— Writer: I will enjoy reading this book, because the introduction pleases me.
— Reader: Yes, that is an argumentative text. She gives her opinion about the book.
Explain briefly on which aspects the communication was successful or not.
— Did the writer do well? O Yes O No, because: ………………………..
— Did the reader do well? O Yes O No, because: ………………………….
OWR subjects must divide their attention between the two communication modes.
More than for the other subjects, it may become visible to them how strongly writing
and reading — or the construction and reconstruction of meaning — are related
through the use of a conceptual rule for ’argumentative text’. The subjects evaluate
writers and readers by their use of this rule; or, more precisely, of the two variants
of the conceptual rule mentioned above. Because of this varied representation of the
rule, the theoretical element may become more flexible and therefore more readily
transferable to both reading and writing.
Check again the three examples on page 2, and write a new example of argumenta-
tive text. When you finish your text, you present it to the reader.
Now you will see a student analysing your text while (s)he is reading aloud. It
is the reader’s task to find out: a) whether your text is argumentative or not, and
b) which part represents the opinion, and which the reasons for it.
Observe this reader’s performance. Don’t interrupt. It is your task to check if the
reader can fluently perform these tasks, and if not, to find out why not.
with students of the same age as students in the sample. These were not staged, but
authentic. 16 students did all four lessons in front of the video camera, while
thinking aloud whenever they had to do an exercise. Microphones helped to get the
necessary clear, comprehensible speech.
Thus, many successful and less successful task executions were collected from
which we could choose in editing the tapes for the experimental sessions. These
were edited such that for almost every exercise two different processes or solutions
were to be observed. This would provoke active interest from the observers: they
would have to choose the best of the two realistic solutions to the task.
3.5. Comparing learning activities in the DW, OW, OWR and FW conditions
We can make a comparison between the four conditions with respect to the type
of cognitive activities they require. By doing so, it becomes clearer to which differ-
ences in activities we may attribute possible differences in effectiveness (Table 2).
We see that the orientation and reflection steps in the exercises are the same for
each condition. In these steps, the learner’s cognition is construed (making an initial
Table 2
Learning activities in the four experimental conditions
The posttests are aimed at the measurement of the dependent variables, as
operationalised in four indicators (see Section 3). The quality of the posttest
measurement was enhanced by adding pretest scores in a covariance analysis, filtering
out potential disturbing effects, such as pre-experiment differences in ability between groups.
3.6.1. Posttests for learning (writing ability) and transfer (reading ability)
The four indicators for writing skill (1a to 1d in Appendix A) were measured in
three posttests; the four matching indicators for reading skill (2a to 2d) were meas-
ured in three other posttests. The indicators for reading and writing skill match (e.g.
1a with 2a; 1b with 2b etc.) in that they build on the same knowledge or content
of the lessons. Some indicators are repeatedly measured in more than one posttest.
In Appendix A, second column, it is reported how the indicators were measured.
Writing indicators 1a, 1b, 1c (simple) and 1d were measured in two posttests (W1
and W3). Only indicator 1c (complex) required a specific test (W2). There were
no specific expectations as to which of the abilities would profit most from the
experimental interventions.
Reading indicators 2a, 2b, 2c (complex) and 2d were measured by having the
students perform an analysis of two larger texts (R2 and R3). In the other reading
test (R1), indicator 2c (simple) was measured. It consisted of multiple choice items
in which argumentation had to be identified and categorised.
3.6.2. Pretests (covariates) for IQ, initial writing ability and initial reading ability
Covariate analysis requires pretest measurement of relevant variables, which may
be — unintentionally — included in the posttest measurement and influence the
experimental effects. In this case, the initial skill level in reading and writing
argumentative text is relevant but rather low (in the lower streams at school
there is not much systematic attention for argumentative texts), which impedes
measurement with the same instrument as used to measure the level after the training.
We have therefore not measured all indicators, and added three pretests for “intelli-
gence” as an alternative explanatory factor for differences in posttest performance.
Only the indicators 1a, 1b and 1c were measured in the writing pretests (PW1–
PW3), by asking the subjects to write two short argumentative essays, in which two
given standpoints had to be defended on the basis of some documentation.
Only indicator 2c was measured in a reading pretest (PR1): students had to do
two multiple choice tests, in which argumentation had to be identified (but not yet
categorised).
Added were three pretests for intelligence (IQ1–IQ3). Analysis of argumentation
has been considered an ability to discern abstract relations between verbal units
(Oostdam, 1991); we therefore chose two validated CMR tests (“Conclusions”
(Elshout, 1966) and “Verbal analogies” (DAT, 1984)) and one CMU test (“Word
list” (DAT, 1984)).
The variable list can now be summarised as follows:
                                          pretest    posttest
1a) Writing — social context:             PW1        W1D
1b) Writing — text structure:             PW2        W1A, W3B
1c) Writing — argumentation structure
    (simple):                             —          W3
    Writing — argumentation structure
    (complex):                            —          W1C, W2
1d) Writing — means for presentation:     PW3        W1B, W3A
2a) Reading — social context:             —          R3 ABCD
2b) Reading — text structure:             —          R2D, R3F
2c) Reading — argumentation structure
    (simple):                             PR1        R1
    Reading — argumentation structure
    (complex):                            —          R2C, R3E
2d) Reading — means for presentation:     —          R2AB, R3G
iq1 Intelligence — CMR Conclusions:       IQ1        —
iq2 Intelligence — CMU Word list:         IQ2        —
iq3 Intelligence — Verbal Analogies:      IQ3        —
3.7. Procedures
For each subject, participation in the experiment took place in two four-hour ses-
sions over two days. On the first day, the pretests were administered during the first
two hours, after which students followed lesson 1 and lesson 2. On the second day,
the course continued with lesson 3 and lesson 4. After lesson 4, the posttests were
administered during the last two hours. We varied the order in which the six writing
and reading posttests were made, so that no test systematically followed another.
All subjects from conditions DW, OW, OWR and FW worked individually from
a workbook, in which theory and exercises were combined. Condition FW required
some co-operation between the subjects, so these sessions were limited to small
groups only. Students were informed about the time every fifteen minutes, so they
would not be surprised by a sudden deadline. The video-conditions were also con-
trolled by the length of the videotape (57–63 minutes) and an on-screen timer dur-
ing viewing.
book in a normal tempo until the hour was over. The time estimation of one hour
appeared to be sufficient.
4. Results
The results of this study will be presented in two parts. First we will report on
the instrumentation for the measurement of pre- and posttest variables. Quality
assessment is necessary because the instruments differ in nature and length, and
because most of them were constructed for the purpose of this study and thus not
tried out elsewhere. We will also pay attention to pretest scores when reporting and
discussing their inter-correlations. Differences in pretest scores between groups are
not statistically tested, since we will only use them as covariates in the analysis of
posttest data. In such a covariance analysis, the posttest scores must be regressed
first on the relevant covariating pretest scores; then the part of the posttest score that
can be safely attributed to the pretest is subtracted from the posttest scores, and a
variance analysis is performed on the corrected posttest scores. To this end, quantitat-
ive relations between the posttests are tested and discussed using a correlation matrix.
In the second part, the research hypotheses will be statistically tested, and a report
is given on the MANOVAs performed on the posttest data using the relevant pretest
data as covariates. The first of the two sections in this part is about the learning
measures, the second about the transfer measures.
4.1. Instrumentation
We will list the instruments used for pre- and posttest measurement, give a short
description and some psychometric data: the number of items (standard, and rejected
after item analysis using an item-total correlation < 0.15 as a criterion) and
homogeneity (after removal of non-fitting items).
Indicators 1b, 1d, 2b, 2c (complex) and 2d were measured with more than one
test. The relevant parts of the tests were taken together in the analysis. The psycho-
metric data reported in this table is based on the two collapsed parts.
The quality of each test is indicated by its homogeneity (reliability) and other
aspects of validity. We must confine ourselves to the assessment of homogeneity,
since we have no other validity indicators than face-validity.
The homogeneity of the tests (Cronbach’s alpha without the rejected items) is in
most cases sufficient ( > 0.60), with the exception of the pretest measurement of
indicator 1a and the posttest measurements of indicators 2a and 2d. It is not surprising
that these tests all have low numbers of items. When corrected for test length (with
the Spearman-Brown formula), the reliability of these tests falls within acceptable
ranges.
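Both statistics used in this assessment follow simple closed formulas; the sketch below illustrates them with invented dichotomous item scores, not the study's data:

```python
# Cronbach's alpha as a homogeneity index, and the Spearman-Brown
# prophecy formula used above to correct reliability for test length.
# Item scores are invented for illustration.

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(items):
    """items: one list of person scores per item."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(i) for i in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

def spearman_brown(r, k):
    """Predicted reliability when the test is made k times as long."""
    return k * r / (1 + (k - 1) * r)

items = [[1, 0, 1, 1, 0, 1],
         [1, 0, 1, 0, 0, 1],
         [0, 1, 1, 1, 0, 1]]
alpha = cronbach_alpha(items)             # below 0.60 for this 3-item test
alpha_doubled = spearman_brown(alpha, 2)  # rises above 0.70 at double length
```

This mirrors the pattern reported above: a short test can show a low raw alpha while its length-corrected reliability falls within an acceptable range.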
Items with an item-total correlation < 0.15 were rejected from the tests: they may
have an unclear or ambiguous formulation (thus functioning as trap questions) or an
extraordinarily high p-value, so that they could not discriminate between overall
high scorers and overall low scorers (Table 3).
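The rejection step can be sketched as follows. The scores are invented, and the use of a corrected item-total correlation (each item against the total of the remaining items) is an illustrative assumption rather than the study's documented procedure:

```python
# Item analysis as described above: compute each item's correlation
# with the (corrected) total score and flag items below the 0.15
# criterion. Invented scores for illustration.

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def flag_items(items, criterion=0.15):
    """Indices of items whose item-total correlation is below criterion."""
    totals = [sum(scores) for scores in zip(*items)]
    rejected = []
    for i, item in enumerate(items):
        rest = [t - s for t, s in zip(totals, item)]  # total without item i
        if pearson_r(item, rest) < criterion:
            rejected.append(i)
    return rejected

items = [[0, 0, 0, 1, 1, 1],   # items keyed with overall performance
         [0, 0, 1, 1, 1, 1],
         [0, 1, 0, 1, 1, 1],
         [1, 1, 1, 0, 0, 0]]   # a "trap" item: high scorers miss it
print(flag_items(items))  # → [3]: only the trap item is rejected
```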
With the help of a pretest-posttest correlation table (Appendix C), we can deter-
mine which pretest variables correlate significantly with their corresponding posttest
scores, and thus may function as covariates. One should be careful not to include
too many covariates in the covariance-analytical model, since they decrease the
degrees of freedom in the final analysis of variance, and thus test power, while
they might not filter out any undesirable variance from the posttest scores.
It can be read from the correlation table (Appendix C) that only one of the
pretests correlates significantly with the posttest that was to measure the same
construct: only the (parallel) pre- and posttest measures of indicator 2c (simple)
correlate. Other theoretically related pre- and posttests apparently do not measure
the same construct. A possible explanation is that the students acquired genuinely
new knowledge, and that their behaviour in coping with argumentative texts changed
drastically compared to how they wrote and read before the experiment. In sum, only
one of the reading and writing pretests will function as covariate.
In the upper half of Appendix C one can see that the average intercorrelation of
Table 3
Number of Items, number of rejected items, and reliability scores for pre- and posttests
posttest measures within each of the modes (both reading and writing) is much higher
than the average intercorrelation between the modes. Thus, we cannot consider the
operationalizations of each dependent variable (the indicators) as independent. For
this reason we will use multivariate analysis of variance (MANOVA), a technique
that takes the mutual influence of dependent variables into account. Since our design
consists of only one independent variable (type of practice), we will perform a multi-
variate one-way analysis of variance.
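The core of such a one-way MANOVA is Wilks' lambda, the ratio of within-group to total generalised variance. A minimal sketch for two dependent variables with invented scores follows; a real analysis uses dedicated software and converts lambda to an approximate F value:

```python
# Wilks' lambda for a one-way multivariate comparison of groups on
# two correlated dependent variables: lambda = det(W) / det(T),
# where W and T are the within-group and total SSCP matrices.
# Group scores below are invented for illustration.

def mean2(points):
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def sscp(points, centre):
    """2x2 sums-of-squares-and-cross-products matrix about `centre`."""
    sxx = sum((x - centre[0]) ** 2 for x, _ in points)
    syy = sum((y - centre[1]) ** 2 for _, y in points)
    sxy = sum((x - centre[0]) * (y - centre[1]) for x, y in points)
    return [[sxx, sxy], [sxy, syy]]

def wilks_lambda(groups):
    """groups: list of groups, each a list of (dv1, dv2) score pairs."""
    allpts = [p for g in groups for p in g]
    total = sscp(allpts, mean2(allpts))
    within = [[0.0, 0.0], [0.0, 0.0]]
    for g in groups:
        s = sscp(g, mean2(g))
        for i in range(2):
            for j in range(2):
                within[i][j] += s[i][j]
    det = lambda m: m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return det(within) / det(total)

low = [(1, 2), (2, 1), (2, 3), (3, 2)]    # e.g. a learning-by-doing group
high = [(5, 6), (6, 5), (6, 7), (7, 6)]   # e.g. an observation group
lam = wilks_lambda([low, high])  # close to 0: groups clearly separated
```

A lambda near 1 means the group means coincide; a lambda near 0 means group membership accounts for most of the multivariate variance.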
The lower half of the table is used to select the covariates that can be included
in the analysis. Note that the first IQ pretest does not correlate with any posttest
variable, while the second and especially the third do, with two and with seven
posttests respectively. A covariate will only be included in the analysis of posttest
measures with which it is statistically and theoretically related.
The mean pretest scores for the conditions do not show relevant differences (see
Appendix D), as could be expected from the random assignment of subjects. No test
for differences was executed, since we chose to use pretest data as covariates in
posttest analysis: this procedure removes external influences at the individual level
rather than at the group level.
From the pretest intercorrelation table (Table 4) it can be concluded that the three
IQ subtests do not measure the same components of intelligence. IQ3 (verbal
analogies) is obviously an outsider, while IQ1 (logical operators) and IQ2 (word list)
share only little variance. Each of these measures may be independently added as
a covariate to the analyses of posttest data, insofar as it shares significant variance
with these tests.
The high correlation between IQ3 and PR1 (the identification of argumentative
relations) is remarkable. A general skill like the identification of abstract semantic
relations might underlie the strong relation between the two tests. However, this
must remain speculative.
The last conclusion is that the two tests for argumentation analysis or reading do
not feed on the same cognitive skill. Mere identification of argumentative relations
Table 4
Correlations between pretest scores
       IQ1       IQ2       IQ3       PW1       PW2       PW3       PW4       PR1       PR2
IQ1   1.0000
IQ2   0.2735**  1.0000
IQ3   0.0235    0.1510    1.0000
PW1   0.0638   -0.0027    0.1320    1.0000
PW2   0.0447    0.0489    0.0160    0.1316    1.0000
PW3  -0.0217   -0.0367   -0.0092    0.3168** -0.2763**  1.0000
PW4   0.0347    0.0138   -0.0180    0.3685**  0.0033    0.3697**  1.0000
PR1   0.0485    0.1980*   0.9244**  0.1298    0.0381   -0.0399    0.0131    1.0000
PR2  -0.0781    0.0130    0.0172    0.0390   -0.0631   -0.0152    0.0595   -0.0568    1.0000
Table 5 contains the mean posttest scores and standard deviations for each con-
dition.
In contrast to the pretest scores, there is much variance in mean posttest scores.
The groups apparently differ on most of the measures. On the other hand, the within-
group variance is considerable in comparison to the difference in mean scores. There-
fore it must be determined by a MANOVA whether the between-group differences
can be generalised. We test the hypotheses in multivariate procedures because the
Table 5
Means and standard deviations for posttest scores across conditions
DW: Learning by Doing Writing Exercises     1.51 (2.11)   7.51 (5.82)  21.51 (8.52)  16.55 (7.17)   4.13 (2.85)
OW: Learning by Observation (1 mode)        4.13 (2.32)   9.34 (5.63)  27.44 (8.06)  22.65 (5.82)   8.89 (3.53)
OWR: Learning by Observation (2 modes)      3.58 (2.35)   8.75 (5.16)  26.03 (8.05)  20.27 (7.92)   5.65 (3.65)
FW: Learning by Observation as Feedback     3.79 (2.09)   8.51 (4.68)  26.55 (7.51)  22.55 (6.83)   8.72 (3.82)
Max. score:                                 6             14           30            32             12

DW: Learning by Doing Exercises             5.72 (3.25)   8.97 (5.69)  23.32 (8.15)   8.58 (5.50)   5.36 (1.82)
OW: Learning by Observation (1 mode)        8.06 (2.64)  17.26 (4.59)  27.65 (10.05) 24.27 (4.62)  12.08 (2.17)
OWR: Learning by Observation (2 modes)      8.75 (3.18)  15.54 (5.05)  28.62 (9.74)  21.44 (5.43)  12.54 (2.80)
FW: Learning by Observation as Feedback     8.72 (2.21)  18.10 (4.84)  29.34 (8.75)  24.84 (4.39)  12.26 (1.96)
Max. score:                                 12            22           43            34             18
Table 6
MANOVA tests for between-group differences in ‘learning to write’. In the statistical design, all five
indicators are included in the construct ‘writing skill’
Table 7
MANOVA tests for between-group differences in ‘transfer to reading’. The design includes all five indi-
cators of the construct ‘reading skill’
and reading processes. According to our definitions of learning and transfer, we must
consider the increased reading or writing skill that results from these observations
as learning. Thus we cannot, and do not, pay attention to any possibly embedded
transfer effects between the two learning effects.
We should nevertheless compare the OWR score on the reading posttest to the
transfer-to-reading score of the DW condition. As we have seen, their scores on the
writing posttest are equal. A difference on the reading posttest would be an important
argument to favour one condition over the other. A MANOVA shows a large
significant result (F = 3.64; p < 0.01; n = 29) in favour of the OWR group in comparison
with DW. So we must conclude that, although learning-by-doing and learning-by-
observing both roles have equal learning effects, the latter method deserves more
credit because its transfer effects are larger. Its hidden strength lies in the learners’
ability to adapt their knowledge to complementary situations: the reading mode.
In summary, we found that transfer from writing practice to reading skill was
promoted more by two types of learning-by-observation than by learning-by-doing
activities. At this point we can only establish that more transfer takes place; in order
to establish how much more, we must use a kind of quantification which (1) enables
incorporation of the five indicators into one construct, writing skill or reading skill,
and which (2) is informative in that it expresses the achieved amount of transfer in
relation to some meaningful criterion.
The aim of this experiment was to test a theory about effective learning activities
for writing instruction; in this case regarding argumentative text. It was expected
that two instances of observational learning would yield higher learning effects on
writing and higher transfer effects on reading. The rationale for the learning activities
has been presented in Section 2.
The expectations were experimentally put to the test, using a full between-groups
pretest–posttest design, in which four groups of thirty 15-year-old high school
students took part. The four treatments consisted of short experimental courses aimed
at learning to write argumentative text. The presented subject-matter was the same
for each group, but the learning activities varied systematically: doing writing exer-
cises (DW), observing writers (OW), observing both writers and their readers
(OWR), and doing a writing exercise and observing a reader as feedback (FW). After
a pretest session and four one-hour training sessions, the same set of posttests for
reading and writing skill were administered to all participants.
Multivariate analysis of variance was used in order to test the hypotheses regarding
learning effects, using the instructional method as an independent variable and a set
of five indicators for writing skill as a complex dependent variable. The hypotheses
about transfer effects were tested in the same way, with a set of five indicators for
reading skill.
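The test statistic behind such a one-way MANOVA is Wilks' lambda, which compares within-group to total generalized variance across the set of indicators. A minimal sketch with made-up data; the group and indicator counts mirror the design, but none of the numbers are the study's:

```python
import numpy as np

def wilks_lambda(groups):
    """Wilks' lambda for a one-way MANOVA.

    groups: list of (n_i, p) arrays, one per condition, holding p
    indicator scores per subject. Values near 1 mean the group
    centroids barely differ; small values mean strong separation.
    """
    all_obs = np.vstack(groups)
    grand_mean = all_obs.mean(axis=0)
    p = all_obs.shape[1]
    W = np.zeros((p, p))  # within-group sums of squares and cross-products
    B = np.zeros((p, p))  # between-group SSCP
    for g in groups:
        m = g.mean(axis=0)
        centered = g - m
        W += centered.T @ centered
        d = (m - grand_mean).reshape(-1, 1)
        B += len(g) * (d @ d.T)
    # Wilks' lambda = |W| / |W + B|
    return np.linalg.det(W) / np.linalg.det(W + B)
```

In practice the lambda is converted to an approximate F statistic, as reported in the results above; libraries such as statsmodels provide that step directly.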
The main findings are that both types of learning-by-observation (observation-as-
model and observation-as-feedback) are more effective than learning-by-doing
5.1. Validity
The experiment was designed in such a way that several alternative explanations can be ruled out. The experimental groups can be considered comparable; time-on-task was equal for all students; no teacher could have influenced the results, since all courses were self-instructional; the students were equally motivated by a small reward; and the treatments and the tests correspond across conditions, because the subject matter was the same for everyone.
Nevertheless, the validity of the results can be criticised. For instance, an important difference between the learning-by-doing and the learning-by-observation conditions is that the former is very familiar to students while the latter is not. It may be that the novelty of the observation tasks (the use of video, the observation of live models) made them more interesting for the participants, which could partly account for the
experimental effects. On the other hand, research assistants noticed both enthusiastic and bored reactions from students in all conditions while they did the tests or worked in their workbooks. Boredom did not seem greater in any particular condition, although we did not check this systematically. One indication is that the number of incomplete workbooks or incomplete tests (a possible symptom of disinterest) does not vary across the groups. It should also be noted that a more active attitude may be inherent to learning activities that call for special attention, such as observation and evaluation.
Due to organisational requirements, working conditions were not equal across conditions. Subjects in the learning-by-doing group worked individually, seated in a large room with three to eight people at a table, leaving more than enough space to work; they were not allowed to co-operate or converse during the lessons. Subjects in the model conditions, who had to use a video set, were seated in a medium-sized room, each at a table of their own; at most six persons were in the room at the same time. Subjects in the feedback condition worked in a large room, with only two writers, the proof-reader and the research assistant present. If group size influences performance, this has worked to the advantage of the feedback condition. On the other hand, these subjects had to cope with more organisational overhead (walking to and from the proof-reader, keeping a very strict time schedule).
There are also weaknesses in the experimental and statistical design of the study. In the first place, the pretest-posttest design is not a genuine one, since the pre- and posttests are not identical. We had to use different pre- and posttests because two quite different levels of mastery had to be measured reliably, without bottom or ceiling effects. The pre- and posttests therefore aimed at the same main skills (aspects of reading and writing argumentative text), but may have assessed different subskills.
This is related to the problem of the covariates. Pretests were included in the design to enable covariance analysis, which would filter out undesirable effects in the posttest measurements. However, the majority of the pretests did not correlate with the posttests aimed at the same construct. It is uncertain what the pretests, which are in themselves sufficiently homogeneous, have measured. In any case it is unwise to use non-correlating pretests as covariates, so we left them out.
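The screening rule applied here (retain a pretest as covariate only if it correlates reliably with its matching posttest) can be sketched as follows; the indicator names, sample data and the Fisher z cutoff are illustrative:

```python
import numpy as np

def usable_covariates(pretests, posttests, z_crit=1.96):
    """Return the names of pretests that correlate significantly with
    their matching posttest, using the Fisher z-test for r != 0.

    pretests, posttests: dicts mapping an indicator name to a 1-D
    array of scores (same subjects, same order, in both dicts).
    """
    keep = []
    for name, pre in pretests.items():
        post = posttests[name]
        r = np.corrcoef(pre, post)[0, 1]
        n = len(pre)
        # Fisher transform: arctanh(r) * sqrt(n - 3) ~ N(0, 1) when r = 0
        z = np.arctanh(r) * np.sqrt(n - 3)
        if abs(z) > z_crit:
            keep.append(name)
    return keep
```

Only the names this function returns would then enter the covariance analysis; the rest are dropped, as described above.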
There are some threats to external validity too. Due to the organisation of the experiment, the posttests were administered almost immediately after the training had taken place. We therefore cannot be certain about the durability of the results. On the other hand, our aim was not to develop long-lasting skills in the students, but to answer our research questions about effective learning activities. Durability will be an important feature in real educational settings, but it did not have the highest priority for this study.
A similar threat to external validity may be that the pretests were administered immediately before the first training session. The pretests did not require much instruction and explanation, but it remains possible that they influenced the prior knowledge or attitudes of the participating students. Since all participants started with the pretests, this cannot account for between-group differences after the experiment. Nevertheless, it may limit the generality of the findings if the pretests served
are combined and the drawbacks compensated. We hope to have demonstrated that the qualities of learning-by-observation deserve to be studied in more detail.
Acknowledgements
The author wishes to thank two anonymous reviewers for their constructive and
instructive comments.
Indicator: Measurement/scoring:
Indicator: 2a) Reading — social context: the ability to identify certain concepts in the text, by which the text can be placed in the social context of an argumentative discussion (see 1a above for this set of concepts).
Measurement/scoring: Subjects must indicate the social parameters in several argumentative texts. An example: "'They are too lazy to work!' That is what I keep hearing when I ask people what we should do about the growing army of the unemployed (= attractor). I find more and more people talking about the question whether the labour act of 1963 shouldn't be sharpened (= issue). There is quite some disagreement: our government seems to be quite fond of the idea, and the Parliament has reacted rather moderately, but hasn't condemned the plan either (= other parties + standpoints). Personally I feel little sympathy for a change of law, and
Indicator: 2b) Reading — text structure: the ability to analyse argumentative texts in terms of a standard structure, which asks for specific subdivisions of introduction, body and ending (components: see above under 1b).
Measurement/scoring: In the posttest, students analyse two texts (400 and 500 words) using the same structural components as presented above under 1b). The texts have been specially constructed for the purpose, which makes the job doable. Each analysed component is scored.
Appendix C
Pretest-posttest correlation table
(Lower-triangular matrix; columns in the order W1, W2, W3-1, W3-2, W4, R1, R2, R3A, R3B, R4.)
W1
W2 0.1017
W3-1 0.4556** -0.2602**
W3-2 -0.0054 0.0763 -0.0741
W4 0.2996** 0.0734 0.4283** 0.0383
R1 0.1183 -0.0227 -0.0496 0.0849 0.1770*
R2 0.0890 0.1780 0.0166 0.1411 0.1909* 0.4523**
R3A 0.2166* 0.0191 0.1133 0.1226 0.1549 0.0584 0.1925*
R3B 0.0377 0.0342 0.0733 0.1479 0.2029* 0.4269** 0.6523** 0.1560
R4 0.1036 0.0826 -0.0167 0.1117 0.1137 0.4297** 0.6342** 0.2199* 0.5773**
IQ1 0.0147 -0.0751 0.1010 -0.0972 0.1223 0.0963 0.1994 -0.0141 0.1724 0.1470
IQ2 -0.1163 -0.0250 -0.0392 0.0231 0.0296 0.1271 0.2423** 0.0611 0.2905** 0.1355
IQ3 0.2420** 0.0277 0.1432 0.3670** 0.2108* 0.1541 0.4268** 0.4514** 0.2995** 0.4311**
PR1 0.1897 -0.0064 0.1601 0.3275** 0.2192* 0.1918* 0.4310** 0.4495** 0.3278** 0.3369**
PR2 0.0345 0.0805 0.1608 0.0205 0.0761 0.0692
* = p < 0.01; ** = p < 0.001
Appendix D
Means and standard deviations for pretest scores across conditions

CONDITION: W1 (soc. context), W2 (text structure), W3-1, W3-2, W4 (presentation) — M (SD) per indicator:
Learning by Doing Exercises: 19.53 (8.92), 46.50 (7.78), 22.51 (10.19), 22.10 (6.15), 5.55 (2.77)
Learning by Observation (1 mode): 20.60 (7.22), 50.55 (7.82), 23.31 (10.01), 22.03 (4.72), 6.06 (3.18)
Learning by Observation (2 modes): 20.79 (7.66), 52.31 (11.33), 25.13 (10.86), 23.13 (6.03), 4.58 (2.89)
Learning by Observation as Feedback: 18.82 (6.93), 52.55 (9.71), 25.51 (10.17), 23.62 (5.57), 4.00 (2.29)
Max. score: 40, 75, 50, 33, 20