School Effectiveness and School Improvement

Vol. 16, No. 2, June 2005, pp. 179 – 196

The Development of Metacognition in Primary School Learning Environments
Bernadet de Jager, Margo Jansen* and Gerry Reezigt
University of Groningen, The Netherlands

(Received 7 August 2003; accepted 8 July 2004)

Constructivist ideas have influenced recent major innovations in Dutch secondary education, and
new curricula for reading and math in primary education, for example, pay much more attention to
metacognition than before. In our study, we compared the growth of student metacognition in
two learning environments in primary school: direct instruction and cognitive apprenticeship.
The study also included a control group of teachers. In order to measure metacognition we
developed a questionnaire, with separate parts for metacognitive skills and metacognitive
knowledge. In the item selection procedure we made use of item response modeling. Pupils
in the direct instruction and cognitive apprenticeship groups obtained higher scores on
metacognitive skills and metacognitive knowledge than pupils in the control group.
No clear differences were found between direct instruction and cognitive apprenticeship.
Interactions of learning environment and student intelligence were non-significant for both
output measures.

Introduction
Constructivism has changed the traditional view of learning as knowledge absorption
into a view of learning as active knowledge construction. Students actively process
information, using prior knowledge, skills, and strategies (Resnick, 1989). Learning is
considered a constructive, cumulative, self-regulated, goal-oriented, situated,
collaborative, and individually different process of knowledge building and meaning
construction (De Corte, 2000). Education is no longer expected to focus solely on the
transfer of knowledge; it is also expected to foster the development of metacognition.

*Corresponding author. GION, Groningen Institute for Educational Research, University of
Groningen, PO Box 1286, 9701 BG Groningen, The Netherlands. Email: g.g.h.jansen@rug.nl
ISSN 0924-3453 (print)/ISSN 1744-5124 (online)/05/020179–18
© 2005 Taylor & Francis Group Ltd
DOI: 10.1080/09243450500114181

Constructivist ideas have influenced recent major innovations in Dutch secondary
education and are rapidly gaining ground in primary schools. New curricula for reading
and math in primary education, for example, pay much more attention to
metacognition than before. Dutch primary school teachers generally are more
sympathetic to the basic principles of constructivism than secondary school teachers
(Roelofs & Visser, 2001). However, they are often confused about the demands that
new educational goals such as the development of metacognition pose to them and
they are not sure which learning environments are most effective for metacognition.
The essential question for many teachers concerns the extent of structuring they
should provide for their students, especially when students of different intelligence
levels are grouped in heterogeneous classes, which is common practice in The
Netherlands. We have concentrated on this question by studying the impact of
learning environments which differ in their degree of structuring on student
metacognition.
In the research on metacognition, the actual measurement of metacognition is of
course vital but often problematic. There are several measurement methods, all with
their specific benefits and drawbacks. In our study, we used questionnaires to
measure metacognition, mainly for practical reasons such as the number of students
in our sample. We will discuss the benefits and drawbacks of the questionnaire
method and we will outline the item response theory (IRT) measurement model that
we used to examine the questionnaire data.

Background
Metacognition
The concept of metacognition was introduced by Flavell in 1976 and his
characterisations of the main elements of the concept are still in use (Boekaerts &
Simons, 1993; De Jong, 1992; Resnick, 1989; Simons, 2000).
Metacognition, according to Flavell, encompasses two elements: skills and
knowledge. Metacognitive skills, the self-regulating activities shown by learners, are
further subdivided by several authors into skills that can be used before, during,
and after learning activities (Bereiter & Scardamalia, 1989). Before starting to work
on a task, orientation and planning are important, while during the task monitoring,
testing, making a diagnosis, and repairing are necessary. After the
completion of a learning task, evaluation and reflection come into focus.
Metacognitive knowledge refers to the knowledge of learners about their own
cognition, cognitive functioning, and possibly that of others. This knowledge is
enlarged by reflection on learning experiences and can be used in the planning of
further learning tasks.
Because metacognition does not develop automatically in all students, teachers
play an essential part in its development. Some authors suggest that especially low
achievers need specific teacher support while high achievers develop metacognition
more easily without any teacher interference (Davidson, Deuser, & Sternberg, 1995;
Davidson & Sternberg, 1998). In the absence of teacher support, high achievers will
take more advantage of the education offered to them and extend their lead (Biemans,
Deel, & Simons, 2001; Bolhuis, 2000; Mayer, 2001). It seems evident that teachers at
least should teach students how to regulate their learning processes before they hand
over responsibilities for learning to them (Schoenfeld, 2001), and for obvious reasons
this is especially important for students who do not have metacognition at their
disposal without any help.
Another point is the relation between intelligence and metacognition (Minnaert &
Janssen, 1999; Veenman, 1992). Veenman (1992) discusses possible models for the
relationships between metacognition and intelligence. First, metacognition can be
viewed as an integral part of intelligence. The independence model rejects this
assumption: here, metacognitive skills and intelligence are considered independent
predictors of learning. In the mixed model it is assumed that metacognition and
intelligence overlap.

How to Measure Metacognition


Metacognition has been measured by means of questionnaires, interviews, thinking
aloud protocols or simulated tutoring (Desoete, Roeyers, Buysse, & De Clercq, 2002;
Kluvers & Simons, 1992; Meijer, Elshout-Mohr, & Van Hout-Wolters, 2001;
Pintrich & De Groot, 1990; Van Hout-Wolters, 2000). Compared to other data
collection methods which can be used to assess metacognition, questionnaires have
the advantage that they are both easy to administer, especially in large samples, and
easy to analyse (De Jong, 1992; Walraven, 1995), but there are also drawbacks. As
questionnaires ask students explicitly about metacognition, they may measure a
student’s perception of metacognition rather than the actual use of metacognition in
educational tasks. Also, they may not be suitable for children who find it hard to
reflect on learning behaviour (Klatter, 1996). Another drawback is that self-report
measures may be influenced by response tendencies such as social desirability.
Similar objections, however, hold for the interview method. Therefore, some
researchers have used thinking aloud protocols (Ericsson & Simon, 1993; Pressley &
Afflerbach, 1995; Van Someren, Barnard, & Sandberg, 1994), in which students
report what they do during task performance. Unfortunately, thinking aloud may
decrease the speed and influence the method of task execution (Dominowski, 1998).
Moreover, thinking aloud is easier for verbally skilled students. Also, metacognition
may in some students function on a subconscious level, so that they cannot report
about it, even though they have acquired it. Recently, researchers have applied
simulated tutoring, a combination of thinking aloud and interview procedures
(Simons, 2000), in which a student is asked to explain to an (imaginary) other student
how to execute a task. This method presupposes that metacognition can be made
visible without actual task execution by the students.
The adequacy of metacognition questionnaires, therefore, is an important but often
not explicitly studied topic. Still, because of the time-consuming character of
interviews, thinking aloud protocols, and simulated tutoring, questionnaires will
continue to be used in metacognition research even though the shortcomings are
evident. In our own study, we also used the questionnaire method for the
measurement of metacognition.

Learning Environments and Metacognition


There is an ongoing debate about which learning environments are
most suitable for the development of metacognition in students. Especially the role of
the teacher is discussed. Do teachers need to teach according to relatively new
models, based on constructivist theory? Or can teachers also rely on more traditional
models, which generally provide more structuring by teachers?
Veenman (1992), for example, suggests that the model of direct instruction,
provided that it is extended to encompass the training of metacognition, can be used
in modern educational practice. The impact of direct instruction on cognitive
outcomes has been widely demonstrated (Muijs & Reynolds, 2001; Pressley &
McCormick, 1995) and experiments have shown convincingly that teachers can be
trained successfully to implement the model in their classrooms (Hoogendijk &
Wolfgram, 1995; Sliepen & Reitsma, 1993; Veenman, Leenders, Meyer, & Sanders,
1993). However, it is not clear whether teachers can use the direct instruction model
for the development of metacognition in their students.
In contrast, other researchers suggest that teachers need instructional models such
as reciprocal teaching, procedural facilitation, modelling, and cognitive apprentice-
ship (Resnick, 1989) in order to achieve student metacognition. These models,
developed in the field of instructional psychology, are based on constructivist ideas
about learning, and aim especially at the development of metacognition. A major
difference in comparison with direct instruction is the low degree of structuring
offered by teachers. Research on the effects on metacognition showed some impact
(Brand-Gruwel, 1995; Rosenshine & Meister, 1994). However, studies often took
place in laboratory settings, where small groups of students were trained outside their
classrooms and instruction was generally provided by researchers instead of teachers
(De Corte, 2000). As a consequence, it is still unclear whether regular teachers can
successfully use these models for teaching metacognition.
Neither line of research has studied intelligence differences between students
extensively. So far, it is not clear whether direct instruction and constructivist models
are equally suitable for all students or only successful for specific groups of students.

Research Questions
Our study focused on the following research questions:

1. Can we measure metacognition adequately by means of a questionnaire?
2. Which effects do learning environments that differ in degree of teacher
structuring have on metacognition, and do these learning environments produce
differential effects for students of different intelligence levels?

Research Design
Sample
In the sampling stage of our study, we contacted all Dutch primary school teachers in
the northern part of The Netherlands who taught seventh grade and who used the
curriculum ‘‘I know what I read’’ (n = 83). The contacts were made by mail and by
telephone. Almost 25% of this group (20 teachers) participated voluntarily in our
study together with all their students in the seventh grade, who were on average 11
years of age. The teachers used the curriculum ‘‘I know what I read’’ (in Dutch: ‘‘Ik
weet wat ik lees’’), which pays attention to the development of metacognition, but
differed in the learning environment which they offered to their students. Assignment
to the experimental and control conditions was also based on voluntary participation
and therefore non-random. The teachers in the direct instruction (DI) and cognitive
apprenticeship (CA) groups received exemplary lessons specifically designed to
enhance the implementation of either DI or CA, as well as 15 hours of training. The
training was given in 5 sessions, in which the theory was explained and practice
and feedback were provided. Additionally, there were coaching sessions. The control
group consisted of teachers who had indicated that they practised no specific
instructional model. Teachers in this group received no training. Table 1 shows the
numbers of students and teachers in the research groups.

Table 1. Numbers of students and teachers in the research groups

Learning environment        Students   Teachers
Cognitive apprenticeship       118         8
Direct instruction              72         5
Control                         97         7
Total                          287        20

Measurement of Metacognition (Skills and Knowledge) and Student Intelligence


Because of the relatively large number of students, we used a questionnaire for the
measurement of metacognition, with separate parts for metacognitive skills and
knowledge. Both variables were measured twice, at the beginning and the end of the
school year 1998/1999. To construct the questionnaire, we used items from already
existing Dutch instruments, developed to measure metacognition in a reading setting
(Brand-Gruwel, 1995; Kluvers & Simons, 1992; Walraven, 1995).
The first part of the questionnaire about the use of metacognitive skills asked 22
questions. Students indicated to what extent the use of a skill described in an item
corresponded with their behaviour. They could choose between yes, sometimes, and
no. The items reflected skills in different stages of the reading comprehension
process:

. skills used before reading, for example ‘‘Before I start reading, I look at the pictures
and the title of the text’’;
. skills used during the reading process, such as ‘‘During reading, I think over how
the text will continue’’;
. skills aimed at repairing misunderstanding, such as ‘‘When I notice that I do not
understand a part of the text, I read difficult parts of the text once more’’;
. skills used after reading, for instance ‘‘When I have finished reading, I try to tell
myself what the text was about’’.

The second part of the questionnaire, about metacognitive knowledge, offered 12
questions reflecting these same stages. Students now had to pick the one of two given
answers that they thought was best. For example, one of the items asks: ‘‘What is
the best thing to do before you start reading?’’ The answers are: ‘‘to ask yourself what
the text will be about’’ and ‘‘to read the last sentence, so that you know how the text
comes to an end’’.
Student intelligence was measured once at the beginning of the school year. Given
the educational setting of the study, metacognitive knowledge and skills in a reading
comprehension context, we chose a non-verbal intelligence test, to avoid undue
overlap with reading skills. For this we used the analogies subtest of the Snijders-
Oomen Non-verbal Intelligence test (revised version, SON-R), that could be
administered to classes of students. The reliability of the analogies subtest, estimated
by Cronbach’s alpha, was .78 (N = 282), which is almost identical with the coefficient
of .79 in the norming sample (Laros & Tellegen, 1991). The score on the analogies
subtest is considered a good proxy for an IQ-score measured by the full SON-R. The
subtest consists of 30 changing geometrical figures. Students have to discover the
principle behind the change and apply this to another figure. They can choose from
four figures. Students had 15 min to complete the test.
Other characteristics of the students, such as gender, ethnicity, and SES, were also
available and were used to check for systematic differences between the groups. Ethnicity
is at present not an important factor in the northern part of The Netherlands.
In a preliminary analysis, we found small non-significant differences between the
three groups in mean IQ scores and in the boys-girls ratio. With regard to SES, a
difference was found between the two experimental groups. We assumed that using
both SES and IQ might lead to overcorrection. Only the intelligence measure was
used in the subsequent analyses as a covariate.

Measurement of Learning Environment


The learning environment as provided by the teacher was measured by means of
observations of reading comprehension lessons. The focus of the observations was on
the specific characteristics of direct instruction (DI) and cognitive apprenticeship
(CA). Before drawing any conclusions about the impact of learning environments on
metacognition, we wanted to be sure that teachers in different environments actually
differed in their behaviour during lessons. Both DI and CA teachers were supposed to
pay more attention to metacognition in their lessons than the control group, because
the materials they were using and the training that was offered to them explicitly
asked them to do so. The main characteristics of DI and CA are in Table 2.
The instructional behaviour of the teachers was registered
with high- and low-inference observational instruments, both focusing on the
characteristics of DI and CA. Several significant differences were found between the
control and the experimental groups and between the experimental groups, indicating
a sufficient degree of implementation. More detailed information is given by De Jager
(2002).

Analyses
To scale the items measuring metacognitive skills and knowledge (research question
1), we made use of item response theory, in particular the one parameter logistic
model (OPLM). The common idea behind item response models is that a single
latent variable determines the response behaviour of individual subjects on the items
of the test. Each subject has a position on the latent scale that can
only be inferred indirectly, from the item responses. The item response function
specifies the probability of a correct answer given the latent ability of the subject. Item
response models differ in the form of the assumed relation between the latent ability
and the item responses. In the Rasch model, the probability of a correct answer is
dependent on only one item characteristic, namely the difficulty (parameter) of the
item, which has to be estimated. In the so-called two-parameter logistic model items
are characterised by a difficulty and a discrimination parameter. As such, the second
model is more realistic, but the parameters are, theoretically and practically, more
difficult to estimate.

Table 2. Main characteristics of direct instruction and cognitive apprenticeship
learning environments

Direct instruction:
. teacher provides retrospect of prior lessons
. teacher summarises content and goal of the lesson
. teacher provides instruction in interaction with students
. teacher regulates guided practice
. teacher uses independent, individual seatwork
. teacher provides feedback during the lesson
. teacher provides whole-class feedback in the final stage of the lesson
. teacher concludes the lesson with a summary of the lesson content

Cognitive apprenticeship:
. teacher facilitates students in activating prior knowledge
. teacher poses problems and coaches problem-solving
. teacher models the use of skills
. teacher stimulates students to model
. teacher coaches and fades guidance during co-operative learning
. teacher enables articulation during co-operative learning, modelling, and reflection
. teacher offers opportunity for reflection in the final stage of the lesson
. teacher discusses applicability
OPLM combines the tractable mathematical properties of the Rasch model with
the greater flexibility of the two-parameter logistic model. In OPLM we have item
difficulty parameters which have to be estimated and discrimination indices with
imputed values (Glas & Verhelst, 1995; Verhelst, Glas, & Verstralen, 1995). The
Rasch model assumes dichotomous items, but OPLM can also be used if the items
are polytomously scored.
With OPLM a set of test items can be calibrated on a common scale, and several
item oriented statistical tests become available if, overall, the OPLM model shows a
reasonable fit. The model for polytomous items, with dichotomous items as a special
case, can be formulated as follows. It is assumed that the response to item i, denoted
by X_i, falls in the score range 0, 1, ..., m_i. The probability of observing X_i = j as a
function of the latent ability θ is given by

\[
P(X_i = j \mid \theta) = \frac{\exp\left(a_i\left(j\theta - \sum_{g=1}^{j}\beta_{ig}\right)\right)}{1 + \sum_{h=1}^{m_i}\exp\left(a_i\left(h\theta - \sum_{g=1}^{h}\beta_{ig}\right)\right)}
\]

With θ we denote the (latent) ability which the test is supposed to measure. For
an item with three response categories, as in our case, we have three characteristic
curves, linking the probability of a response in each category to the latent ability. The
item category parameters β correspond to the positions on the ability continuum where the
probabilities of responding in successive categories are equal; or, in other words,
where the curves of successive categories cross. In case of three categories, there are
two item parameters per item. The discrimination index a_i governs the steepness of
the curves: the larger the value of a_i, the steeper the curves. An item with a higher
index discriminates better in the ability region around the item parameters than an
item with a lower index. The discrimination indices a_i are supposed to be integer
constants. This assumption allows for conditional maximum likelihood estimation
of the item category parameters. Secondly, fit measures are available which focus on
the validity of the selected values of the discrimination indices and are informative
with respect to the direction in which they have to be changed in order to obtain a
better fit. The sum of the item scores, weighted by the discrimination indices, is a
sufficient statistic for the ability. This weighted sum is also used to calculate scale
scores for the subjects.
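To make the category probabilities and the scoring rule concrete, the following sketch implements the model formulated above in Python. It is an illustration only: the calibration in the study was carried out with the OPLM program (Verhelst, Glas, & Verstralen, 1995), and the function names and example values below are hypothetical (the category parameters are merely in the range reported in Table 3).

```python
import numpy as np

def oplm_category_probs(theta, a, betas):
    """Category probabilities P(X_i = j | theta) under the OPLM for one item.

    theta : latent ability
    a     : imputed integer discrimination index of the item
    betas : item category parameters (beta_i1, ..., beta_im)
    Returns an array with the probabilities of categories 0..m.
    """
    m = len(betas)
    # Log-weight of category j: a * (j*theta - sum of the first j betas);
    # for j = 0 the sum is empty, so the log-weight is 0.
    cum_betas = np.concatenate(([0.0], np.cumsum(betas)))
    log_w = a * (np.arange(m + 1) * theta - cum_betas)
    w = np.exp(log_w - log_w.max())  # shift for numerical stability
    return w / w.sum()

def weighted_sum_score(item_scores, a_indices):
    """Weighted sum of item scores: the sufficient statistic for ability,
    also used to calculate the scale scores for the subjects."""
    return float(np.dot(item_scores, a_indices))

# A three-category item (no / sometimes / yes) with a = 4 and two
# category parameters, evaluated at theta = 0.2:
print(oplm_category_probs(0.2, 4, [-0.13, 0.05]))
# Weighted score for three hypothetical items with indices 4, 3, and 2:
print(weighted_sum_score([2, 1, 0], [4, 3, 2]))  # -> 11.0
```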
The fit of the model can be assessed by inspecting the global fit-statistic R, and a
number of item-fit statistics. The M statistics are based on a rationale originally
developed by Molenaar (Glas & Verhelst, 1995; Verhelst et al., 1995). The subject
scores are partitioned into a high and a low score group (sometimes also into an
additional medium group). For each score group, the expected number of subjects
giving the correct answer (or scoring in a category of the item) is calculated using the
estimated model, and the differences between the observed and the expected number
are combined. A negative outcome indicates that the item in question discriminates
better than average while a positive value points to a low discriminating item. The
three Ms use different partitions. Large values suggest up- or downgrading of the
discrimination indices in order to increase the item fit.
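The exact standardisation of the M statistics is given by Verhelst et al. (1995); the sketch below only illustrates the observed-minus-expected rationale with a two-group partition, under the sign convention described above. All names are hypothetical.

```python
import numpy as np

def m_style_fit(item_scores, total_scores, expected_scores, cut):
    """Illustrative observed-minus-expected fit check for one item.

    item_scores     : observed item scores, one per subject
    total_scores    : weighted total scores used to partition the subjects
    expected_scores : model-implied expected item scores per subject
    cut             : score value separating the low and high groups

    A steeply discriminating item scores below expectation in the low
    group and above expectation in the high group, so the combined value
    below is negative, matching the interpretation in the text.
    """
    item_scores = np.asarray(item_scores, dtype=float)
    total_scores = np.asarray(total_scores, dtype=float)
    expected_scores = np.asarray(expected_scores, dtype=float)
    low = total_scores < cut
    diff_low = (item_scores[low] - expected_scores[low]).sum()
    diff_high = (item_scores[~low] - expected_scores[~low]).sum()
    return diff_low - diff_high
```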
To answer research question 2, we used analysis of covariance. We corrected the
dependent variables (student scores for metacognitive skills and knowledge at the
end of the school year) for their scores at the beginning of the school year, which
served as the covariate. In the final stage, we added student intelligence as a second
factor in addition to learning environment. Based on their score on the intelligence test, the students were
divided into four groups containing approximately 25% of the students each (lowest
scoring students, students that scored below average, students that scored above
average, highest scoring students).
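As an illustration of this design, the sketch below reproduces the analysis in Python with statsmodels: posttest scores corrected for the pretest covariate, with learning environment and intelligence quartile as factors. The data file and column names (skills1, skills2, group, iq) are hypothetical; the paper does not state which software was used.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical data layout: one row per student with pretest (skills1),
# posttest (skills2), learning environment (group), and IQ score (iq).
df = pd.read_csv("students.csv")

# Divide students into four roughly equal intelligence groups.
df["iq4"] = pd.qcut(df["iq"], q=4,
                    labels=["low", "below_avg", "above_avg", "high"])

# Covariance analysis: posttest corrected for pretest, with group,
# intelligence level, and their interaction as factors. Sum-to-zero
# contrasts make the Type III sums of squares meaningful.
model = smf.ols(
    "skills2 ~ skills1 + C(group, Sum) * C(iq4, Sum)", data=df
).fit()
print(sm.stats.anova_lm(model, typ=3))
```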

Results
Research Question 1: Measuring metacognition
The questionnaire for metacognition measured metacognitive skills and metacogni-
tive knowledge by separate sets of items. The metacognitive skills part consisted of 22
multiple choice items with three alternatives (Table 3). The items were scored
polytomously, in three successive categories.
A few subjects with missing values for one or more items were left out of the
analysis. The total number of subjects in the analysis was 267. The classical test
analysis resulted in an alpha coefficient of .64, which is fairly low, and we found that
six items had low or even negative item test correlations (2, 4, 8, 16, 20, 22).
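For reference, the classical indices reported here can be computed as follows. This is a minimal sketch; the use of corrected (rest-score) item-test correlations is an assumption, as the paper does not state which variant was used.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a subjects-by-items score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)

def item_rest_correlations(items):
    """Correlation of each item with the sum of the remaining items;
    low or negative values flag weak items such as 2, 4, 8, 16, 20, 22."""
    items = np.asarray(items, dtype=float)
    total = items.sum(axis=1)
    return np.array([
        np.corrcoef(items[:, i], total - items[:, i])[0, 1]
        for i in range(items.shape[1])
    ])
```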
In a first OPLM-analysis on all 22 items, we assumed equal discrimination indices
over items (the discrimination index is set to one for each item). Item and global fit
statistics were obtained (R1c = 534.2; df = 129; p = .00). Given the large value of R1c,
the global fit statistic, the Rasch model had to be rejected. In the next step, the model
fit of individual items was inspected. Large positive values of the M-statistic indicate
that an item discriminates less well than average, while items with negative values are
better than average. We found five items (7, 9, 10, 17, 18) discriminating better than
average, but large positive M-values were found for four items in particular (2, 8, 13,
20), indicating that these items discriminate badly. These were items where reversed
coding was used. This finding is not uncommon and it has been suggested in the
literature to place such items in a separate scale. However, inspecting them more
closely, we concluded that they were formulated somewhat ambiguously (in the sense
that ‘‘incorrect’’ answers were also defendable), an additional reason to discard them.
Removing seven items that did not discriminate well (2, 4, 8, 13, 16, 20, 22) resulted
in a global fit statistic of R1c = 105.8 (df = 87; p = .08), which is somewhat better but
still not very good. A less drastic variant, in which five items (2, 8, 13, 20, 22) were
removed, had a global fit statistic of R1c = 188.7, which is not acceptable (df = 99;
p = .00). We then tried to increase the fit to an acceptable level by changing the
discrimination indices, following the suggestions given by the item fit indices. In the
third and last analysis, the item indices were successively adapted. This resulted in a
reasonable global fit for 17 items (see Table 3).

Table 3. Metacognitive skills: Calibration results of the 17-item test with unequal
discrimination indices (R1c = 111.8; df = 99; p = .18; number of observations = 267).
For each item, the imputed discrimination index A is followed by the two item
category parameters B with their standard errors SE(B) and the M-statistics.

1. Before I start reading, I look at the pictures and the title of the text
   A = 2; B = -.74, .12; SE(B) = .09, .07; M = -.01, -.58
3. Before I start reading, I try to find out what the text is about
   A = 3; B = -.27, .05; SE(B) = .05, .05; M = -.21, .16
4. I prefer to start reading at once without further thinking
   A = 1; B = -.07, .56; SE(B) = .14, .16; M = 1.31, 1.44
5. Before I start reading, I predict what the text will be about
   A = 3; B = .15, .42; SE(B) = .05, .07; M = .85, -.32
6. Before I start reading, I skip through the text momentarily
   A = 2; B = .16, .42; SE(B) = .07, .10; M = -.13, -1.81
7. During reading, now and then I check whether I understand the text
   A = 4; B = -.13, .05; SE(B) = .04, .04; M = -.32, -.16
9. During reading, I try to find out what is important
   A = 4; B = -.16, -.15; SE(B) = .05, .04; M = -.45, -.12
10. During reading, I think over which parts I have to read extra well
   A = 4; B = -.14, .05; SE(B) = .04, .04; M = .79, -.17
11. During reading, I think over how the text will continue
   A = 3; B = .10, .15; SE(B) = .05, .06; M = .64, 1.41
12. When I notice that I do not understand a part of the text, I check whether there
    are words in the text that I don't know
   A = 3; B = -.03, .27; SE(B) = .05, .06; M = .55, .01
14. When I notice that I do not understand a part of the text, I check whether
    difficult words are explained elsewhere in the text
   A = 3; B = -.09, .08; SE(B) = .05, .05; M = 1.27, 1.09
15. When I notice that I do not understand a part of the text, I read difficult parts of
    the text once more
   A = 4; B = -.35, -.09; SE(B) = .05, .04; M = -1.44, 2.96
16. When I notice that I do not understand a part of the text, I just read on
   A = 1; B = -.25, .13; SE(B) = .15, .15; M = -2.05, .76
17. When I have finished reading, I reflect on whether I have understood the text well
   A = 5; B = -.11, -.08; SE(B) = .04, .03; M = -1.48, .99
18. When I have finished reading, I reflect on what I have learnt
   A = 5; B = -.28, -.02; SE(B) = .04, .03; M = -.37, -1.50
19. When I have finished reading, I try to reflect on whether I have dealt with the
    text properly
   A = 4; B = .02, .15; SE(B) = .04, .05; M = -2.00, -1.27
21. When I have finished reading, I try to tell myself what the text was about
   A = 4; B = -.01, .09; SE(B) = .04, .04; M = .07, -.45

Number and content of removed items:
2. Before I start reading, I first count the paragraphs
8. During reading, I try to memorise all sentences
13. When I notice that I do not understand a part of the text, I write down difficult words
20. When I have finished reading, I inspect the pictures
22. When I have finished reading, I read the first two sentences once more

In Table 3, the (imputed) values of the discrimination indices are given under A.
The estimated item category parameters and their standard errors are given under B
and SE(B), and the M-statistics follow. The reversed coded items have disappeared
from the table. For item 9, the two category parameters are very close together. In
earlier analyses we found a reversed rank order of the category parameters (β2 smaller
than or equal to β1) for some items, implying that the middle category is never the
modal category (the chance of a ‘‘sometimes’’ answer was always less than the chance
of a ‘‘yes’’ or a ‘‘no’’), no matter the trait value. This suggests that the item functions
as a dichotomous item. Items 4 and 16 receive low discrimination indices and
therefore, while not actually discarded, will have a limited influence on the weighted
total score, obtained by summing the item scores weighted with the corresponding
discrimination indices. Overall, the correlation between the weighted and
unweighted total scores is high.
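The dichotomous functioning noted above follows directly from the model formulated earlier. The odds of two successive categories are

\[
\frac{P(X_i = j \mid \theta)}{P(X_i = j - 1 \mid \theta)} = \exp\left(a_i(\theta - \beta_{ij})\right),
\]

so ‘‘sometimes’’ is more probable than ‘‘no’’ only for θ > β_i1 and more probable than ‘‘yes’’ only for θ < β_i2. The middle category is therefore modal exactly on the interval β_i1 < θ < β_i2, which is empty whenever β_i2 ≤ β_i1.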
We then performed a number of OPLM-analyses on the metacognitive knowledge
part of the questionnaire (Table 4). The scale initially consisted of 12 dichotomously
scored two-choice items. A few subjects with missing values for one or more items
were left out of the analysis. The total number of subjects in the analysis was 271. The
classical test analysis resulted in an alpha coefficient of .42, which can be considered
as unacceptably low. Three items (3, 6, 8) were found to have negative item test
correlations.
A first OPLM analysis with the full set of 12 items, with equal discrimination
indices, showed a poor fit globally. On an individual level, the fit indices of several
items such as item 3, 6, 8, and in particular item 11, suggested adaptations. Lowering
the discrimination index of item 11, however, was found to affect the global fit index
negatively. From a content-oriented view, this item also differed somewhat from the
others. From a psychometric point of view, the 11-item scale was acceptable. Leaving
out items 3, 6, 8, and 11 altogether, based on content arguments, and adapting the
indices resulted in an 8-item scale with a global fit statistic of R1c = 18.6 (df = 19,
p = .48). A summary of the results is in Table 4. The value of 99.99 for the M-statistic
of item 2 is a default value indicating that the actual value of the M-statistic could not
be calculated, because subjects could not be partitioned into a high and a low group.

Research Question 2: Effects of learning environments


In the preceding section, we demonstrated that it was possible to select subsets of
items satisfying the assumptions of OPLM and, in case of metacognitive knowledge,
in principle without losing a substantial number of items. As a consequence, we now
had tests for which we could feel reasonably confident that the items represent a
single latent continuum, and we had a scoring rule which makes the best use of the
available information: a weighted sumscore instead of the simple sumscore, with the
discrimination indices as weights. These weighted sumscores were
used in the subsequent analyses which we performed to answer research question 2.
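Written out, the scoring rule is simply

\[
S = \sum_i a_i x_i,
\]

where x_i is the observed score on item i (0, 1, or 2 for the skills items) and a_i is the discrimination index listed under A in Tables 3 and 4.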
Table 5 shows the metacognitive skills scores of students at the beginning of the
school year (skills1) and at the end (skills2) in the three research groups.

Table 4. Metacognitive knowledge: Calibration results of the 8 item test with unequal
discrimination indices (R1c = 18.6; df = 19; p = .48; number of observations = 271)

Item nr Item content A B SE(B) M

1 What is the best thing to do before you 2 .07 .08 7.61


start reading?
. Ask yourself what the text will be about
. Read the last sentence so that you know
how the text will end
2 What is the best thing to do before you 1 71.12 .17 99.99
start reading?
. Think about the title
. Count the paragraphs you have to read
4 What is the best thing to do during reading? 2 .23 .08 7.07
. Read the last sentence so that you know
how the text will end
. Pick the most important issues from the text
5 What is the best thing to do during reading? 3 .27 .07 7.58
. Read the text quickly
. Ask yourself whether you understand the text
7 What is the best thing to do after reading to 1 7.30 .14 .05
find out whether you have understood the text?
. Try to say in your own words what you have read
. Count the words that you do not understand
9 What is the best thing to do after reading to find 1 .64 .13 1.09
out whether you have understood the text?
. Try to pick the main issue from the text
. Read the title once again
10 What is the best thing to do when you do not 2 .10 .08 1.22
understand a sentence?
. Read the last sentence of the text
. Try to say the sentence in your own words
12 What is the best thing to do when you do not 3 .11 .07 7.11
understand a part of the text?
. Read on
. Read a part of the text once again

Number and content of removed items:


3 What is the best thing to do before you start reading?
. Inspect the title and the pictures
. Write down some difficult sentences
6 What is the best thing to do during reading?
. Stop now and then to predict how the text will continue
. Read all difficult words twice
8 What is the best thing to do after reading to find out whether you have understood
the text?
. Read the difficult sentences one more
. Write down the content of the text in a few sentences
11 What is the best thing to do when you do not understand a word?
. Inspect the words around the difficult word
. Write down the difficult word

Table 5. Metacognitive skills at the beginning and the end of the school year, in three research
groups

Group                                          Skills1   Skills2
Cognitive apprenticeship   Mean                  17.2      23.1
                           N                      107       109
                           Standard deviation     7.0       6.9
Direct instruction         Mean                  17.4      21.9
                           N                       65        56
                           Standard deviation     6.9       7.7
Control                    Mean                  14.1      15.9
                           N                       84        85
                           Standard deviation     6.2       7.0
Total                      Mean                  16.3      20.4
                           N                      256       250
                           Standard deviation     6.9       7.8

Table 6. Covariance analysis (tests of between-subjects effects) with metacognitive skills as the
dependent variable

Source                         Type III sum of squares   df   Mean square        F    Sign.
Corrected model                          4344.16*        12        362.01     8.09     .00
Intercept                                6207.05          1       6207.05   138.83     .00
Skills1                                  1156.13          1       1156.13    25.86     .00
Group (learning environment)             1853.45          2        926.73    20.73     .00
Intelligence                              160.13          3         53.44     1.19     .31
Group * Intelligence                      142.25          6         23.71      .53     .79
Error                                    9612.50        215         44.71
Total                                  108186.51        228
Corrected total                         13956.66        227

* R2 = .31 (adjusted R2 = .27)

Table 5 makes clear that there were a priori differences with respect to
metacognitive skills. While in the cognitive apprenticeship and the direct instruction
groups the mean scores on metacognitive skills were practically equal, the control
group scored lower. At the end of the school year, the score means had increased in
all groups. The largest gains were observed in the two experimental groups. To test for
the significance of the learning environment effect on metacognitive skills, an analysis
of covariance was performed with instruction group and student intelligence as
factors and pretest scores as the covariate (Table 6).
Table 6 shows a significant effect of learning environment on metacognitive skills.
The effects of intelligence and the interaction of learning environment and
intelligence were non-significant. The decision made earlier to use intelligence as a
blocking variable and not as a covariate may have resulted in some loss of statistical
power, but the conclusions would have been the same. In a preliminary analysis, we
found very small differences between correlations of the recoded and the raw IQ
scores.

Table 7. Estimated marginal means for metacognitive skills as the dependent variable

                                                           95% Confidence interval
Group                      Mean*   Standard error      Lower bound      Upper bound
Cognitive apprenticeship   22.92        .70                21.55            24.29
Direct instruction         21.35        .93                19.53            23.18
Control                    16.16        .80                14.58            17.74

* evaluated at the value of the covariate in the model: skills1 = 16.43

Table 8. Metacognitive knowledge at the beginning and the end of the school year, in three research
groups

Group                                          Know1   Know2
Cognitive apprenticeship   Mean                 6.43    7.27
                           N                     113     102
                           Standard deviation   1.73    1.41
Direct instruction         Mean                 6.32    7.26
                           N                      65      65
                           Standard deviation   1.61    1.38
Control                    Mean                 5.82    6.57
                           N                      84      87
                           Standard deviation   2.07    1.75
Total                      Mean                 6.21    7.03
                           N                     262     254
                           Standard deviation   1.83    2.55

For the cognitive apprenticeship and the direct instruction groups, the 95%
confidence intervals of the estimated means (corrected for the covariate, the pretest
skills measure) for metacognitive skills overlap strongly. The cognitive apprenticeship
and direct instruction groups both have significantly higher means than the control
group (see Table 7).
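As a check on Table 7, the bounds follow from the estimated means and standard errors as mean ± 1.96 × SE; for the cognitive apprenticeship group, for example,

\[
22.92 \pm 1.96 \times .70 = (21.55,\ 24.29).
\]

Up to rounding, the control interval, 16.16 ± 1.96 × .80 ≈ (14.58, 17.74), lies entirely below both experimental intervals.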
With regard to metacognitive knowledge, scaled scores were obtained using the
item weights of the OPLM analysis. We performed the same analyses as for
metacognitive skills. Table 8 shows the metacognitive knowledge scores of students at
the beginning of the school year (know1) and at the end (know2) in the three research
groups. Again, the mean scores on the pretest were very similar for the two
experimental groups, while the control group mean was lower. The same pattern was
observed on the posttest scores. All three groups showed an increase in metacognitive
knowledge.

Table 9. Covariance analysis (tests of between-subjects effects) with metacognitive knowledge as
the dependent variable

Source                         Type III sum of squares   df   Mean square        F    Sign.
Corrected model                           133.35*        12         11.11     5.4      .00
Intercept                                 387.02          1        387.02   189.2      .00
Know1                                      71.03          1         71.03    34.73     .00
Group (learning environment)               20.10          2         10.05     4.91     .01
Intelligence                                4.80          3          1.60      .78     .51
Group * Intelligence                        6.34          6          1.06      .52     .79
Error                                     462.24        226          2.05
Total                                   12344.89        239
Corrected total                           595.59        238

* R2 = .22 (adjusted R2 = .18)

Table 10. Estimated marginal means for metacognitive knowledge as the dependent variable

                                                           95% Confidence interval
Group                      Mean*   Standard error      Lower bound      Upper bound
Cognitive apprenticeship    7.18        .15                 6.89             7.47
Direct instruction          7.24        .19                 6.87             7.61
Control                     6.57        .16                 6.25             6.89

* evaluated at the value of the covariate in the model: know1 = 6.35

The analysis of covariance shows a significant effect of learning environment on
metacognitive knowledge. Once again, the effects of intelligence and the interaction
of learning environment and intelligence are non-significant (see Table 9).
In Table 10, we present the expected marginal means of metacognitive knowledge
and their 95% confidence intervals. The mean in the direct instruction group is
slightly larger than the mean in the cognitive apprenticeship group, but again there is
a large overlap between the intervals. The overlap of both intervals with that of the
control group is small.

Conclusions
In our study, we wanted to find out whether we could succeed in measuring
metacognition by means of a questionnaire. Although a questionnaire may not be
the optimal instrument to measure metacognition, it may be necessary to use this
instrument in studies with relatively large samples for pragmatic reasons. Other
more refined methods then may take too much time or may be too expensive. To
scale the items of the questionnaires, we made use of item response theory, in
particular the one parameter logistic model (OPLM). We succeeded in finding an
adequate fit for 17 items measuring metacognitive skills (5 initial items were
removed) and for 8 items measuring metacognitive knowledge (4 initial items were
removed). The correlation between the scaled scores of the metacognitive skills test
and the metacognitive knowledge test was, at .21, relatively low. These results,
though encouraging, do not guarantee the construct validity of the instruments.
We also wanted to know whether different learning environments would yield
different effects on metacognition. Teachers in our study practised direct instruction
(with relatively high levels of teacher structuring) or cognitive apprenticeship (with
relatively low levels of teacher structuring). The direct instruction and cognitive
apprenticeship teachers were trained to use these models in reading comprehension
lessons. They also were trained to focus on metacognition in their lessons. The study
also included a control group of teachers. Assignment to the experimental and
control conditions was based on voluntary participation and therefore non-random.
This procedure may lead to systematic a priori differences between the groups on
relevant variables. For the available background variables we only found a difference
in SES between the experimental groups.
A comparison of the cognitive apprenticeship and direct instruction
conditions on the one hand with the control group on the other clearly shows
that explicit teacher training and specific teacher attention to metacognition are
needed in order to enhance student metacognition. With regard to expected mean
scores, direct instruction and cognitive apprenticeship both clearly differ in a positive
sense from the control group. We have found no conclusive evidence for a systematic
difference between cognitive apprenticeship and direct instruction with regard to
metacognition. The differences in expected mean scores between direct instruction
and cognitive apprenticeship on metacognitive skills and knowledge are non-
significant. Interactions of learning environment and student intelligence were non-
significant for both output measures.
In summary, both direct instruction and cognitive apprenticeship were found to
foster the development of metacognition. It is also clear that teachers have to be
trained to implement the instructional models in their classrooms successfully. These
results have implications for educational practice.

References
Bereiter, C., & Scardamalia, M. (1989). Intentional learning as a goal of instruction. In L. B.
Resnick (Ed.), Knowing, learning and instruction (pp. 361 – 393). Hillsdale, NJ: Lawrence
Erlbaum.
Biemans, H. J. A., Deel, O. R., & Simons, P. R. J. (2001). Differences between successful and less
successful students while working with the CONTACT-2 strategy. Learning and Instruction,
11(4/5), 265 – 282.
Boekaerts, M., & Simons, P. R. J. (1993). Leren en instructie, psychologie van de leerling en het leerproces
[Learning and instruction, psychology of the pupil and the learning process]. Assen, The
Netherlands: Dekker & Van de Vegt.
Bolhuis, S. M. (2000). Naar zelfstandig leren: Wat doen en denken docenten? [Towards independent
learning: what do teachers do and think?] Apeldoorn, The Netherlands: Garant.

Brand-Gruwel, S. (1995). Onderwijs in tekstbegrip: Een onderzoek naar het effect van strategisch lees- en
luisteronderwijs bij zwakke lezers [Instruction in text comprehension: A study into the effects of
strategic reading and listening instruction for weak readers]. Ubbergen, The Netherlands:
Uitgeverij Tandem Felix.
Davidson, J. E., Deuser, R., & Sternberg, R. J. (1995). The role of metacognition in problem
solving. In J. Metcalfe & A. P. Shimamura (Eds.), Metacognition: Knowing about knowing (pp.
207 – 227). Cambridge, MA: MIT.
Davidson, J. E., & Sternberg, R. J. (1998). Smart problem solving: How metacognition helps. In D.
J. Hacker, J. Dunlosky, & A. C. Graesser (Eds.), Metacognition in educational theory and practice
(pp. 47 – 69). London: Lawrence Erlbaum.
De Corte, E. (2000). Marrying theory building and the improvement of school practice: A
permanent challenge for instructional psychology. Learning and Instruction, 10, 249 – 266.
De Jager, B. (2002). Teaching reading comprehension. The effects of direct instruction and cognitive
apprenticeship on comprehension skills and metacognition. Groningen, The Netherlands: GION.
De Jong, F. P. C. M. (1992). Zelfstandig leren: Regulatie van het leerproces en het leren reguleren: Een
procesbenadering [Independent learning: Regulation of the learning process and learning to
regulate: A process approach]. Tilburg, The Netherlands: KUB.
Desoete, A., Roeyers, H., Buysse, A., & De Clercq, A. (2002). Dynamic assessment of
metacognitive skills in young children with mathematics-learning disabilities. In G. M. van
de Aalsvoort, W. C. M. Resing, & A. J. J. M. Ruijssenaars (Eds.), Learning potential assessment
and cognitive training: Actual research and perspectives in theory building and methodology (Vol. 7,
pp. 307 – 333). Oxford: Elsevier.
Dominowski, R. L. (1998). Verbalization and problem solving. In D. J. Hacker, J. Dunlosky, & A.
C. Graesser (Eds.), Metacognition in educational theory and practice (pp. 25 – 47). London:
Lawrence Erlbaum.
Ericsson, K. A., & Simon, H. A. (1993). Protocol analysis: Verbal reports as data. Cambridge, MA:
MIT.
Glas, C. A. W., & Verhelst, N. D. (1995). Testing the Rasch model. In G. H. Fischer & I. W.
Molenaar (Eds.), Rasch models: Foundations, recent developments and applications (pp. 69 – 96).
New York: Springer-Verlag.
Hoogendijk, W., & Wolfgram, P. (1995). KEA, halverwege: Projectverslag schooljaar 1994 – 1995
[KEA, halfway: project report school year 1994 – 1995]. Rotterdam, The Netherlands:
CED.
Klatter, E. B. (1996). Studievaardigheden in de brugklas: Een vergelijking tussen algemene
en specifieke verwerkingsstijlen van brugklasleerlingen [Study skills in the first year of
secondary school: A comparison between general and specific processing styles of first-year
students]. Pedagogische Studiën, 73(4), 303 – 316.
Kluvers, K., & Simons, P. R. J. (1992). Zelfregulatievaardigheden en COO: Tussenrapportage
[Selfregulation skills and computer-assisted education: interim report]. Nijmegen, The
Netherlands: KU Nijmegen.
Laros, J. A., & Tellegen, P. J. (1991). Construction and validation of the SON-R 5½-17, the Snijders-
Oomen non-verbal intelligence test. Groningen, The Netherlands: Wolters-Noordhoff.
Mayer, R. E. (2001). Changing conceptions of learning: A century of progress in the scientific study
of education. In L. Corno (Ed.), Education across a century: The centennial volume (pp. 34 – 76).
Chicago: University of Chicago.
Meijer, J., Elshout-Mohr, M. E., & Van Hout-Wolters, B. H. A. M. (2001). An instrument for the
assessment of cross curricular skills. Educational Research and Evaluation, 7, 79 – 108.
Minnaert, A., & Janssen, P. J. (1999). The additive effect of regulatory activities on top of
intelligence in relation to academic performance in higher education. Learning and Instruction,
9(1), 77 – 91.
Muijs, D., & Reynolds, D. (2001). Effective teaching: Evidence and practice. Gateshead: Athenaeum
Press.
Pintrich, P. R., & De Groot, E. V. (1990). Motivational and self-regulated learning components of
classroom academic performance. Journal of Educational Psychology, 82(1), 33 – 40.

Pressley, M., & Afflerbach, P. (1995). Verbal protocols of reading: The nature of constructively responsive
reading. Hillsdale, NJ: Lawrence Erlbaum Associates.
Pressley, M., & McCormick, C. B. (1995). Cognition, teaching, and assessment. New York:
Longman.
Resnick, L. B. (1989). Introduction. In L. B. Resnick (Ed.), Knowing, learning and instruction (pp.
1 – 25). Hillsdale, NJ: Lawrence Erlbaum.
Roelofs, E., & Visser, J. (2001). Leeromgevingen volgens ouders en leraren: Voorkeuren en
realisatie [Learning environments according to parents and teachers: Preferences and
realisation]. Pedagogische Studiën, 78, 151 – 168.
Rosenshine, B., & Meister, C. (1994). Reciprocal teaching: A review of the research. Review of
Educational Research, 64(2), 201 – 243.
Schoenfeld, A. H. (2001). Mathematics education in the twentieth century. In L. Corno (Ed.),
Education across a century: The centennial volume (pp. 239 – 279). Chicago: University of
Chicago.
Simons, P. R. J. (2000). Review studie leren en instructie [Review study learning and instruction].
Nijmegen, The Netherlands: University of Nijmegen.
Sliepen, S. E., & Reitsma, P. (1993). Training van lom-leerkrachten in directe instructie van
begrijpend leesstrategieën [Training of teachers in special education in direct instruction in
comprehension skills]. Pedagogische Studiën, 70, 420 – 444.
Van Hout-Wolters, B. (2000). Assessing active self-directed learning. In R. J. Simons, J. van der
Linden, & T. Duffy (Eds.), New learning (pp. 83 – 100). Dordrecht, The Netherlands: Kluwer
Academic Publishers.
Van Someren, M. W., Barnard, Y. F., & Sandberg, J. A. C. (1994). The thinking aloud method: A
practical guide to modelling cognitive processes. London: Academic Press.
Veenman, S. A. M. (1992). Effectieve instructie volgens het directe instructiemodel [Effective
instruction based on the direct instruction model]. Pedagogische Studiën, 69, 242 – 269.
Veenman, S. A. M., Leenders, Y., Meyer, P., & Sanders, M. (1993). Leren lesgeven met het directe
instructiemodel [Learning to teach with the direct instruction model]. Pedagogische Studiën,
70, 2 – 16.
Verhelst, N. D., Glas, C. A. W., & Verstralen, H. H. F. M. (1995). The one-parameter logistic model
OPLM. Arnhem, The Netherlands: CITO.
Walraven, A. M. A. (1995). Instructie in leesstrategieën: Problemen met begrijpend lezen en het effect van
instructie aan zwakke lezers [Instruction in reading strategies: Problems with reading
comprehension and the effect of instruction to low-achieving pupils]. Amsterdam/
Duivendrecht: Paedologisch Instituut.
