Carroll 1997


ELSEVIER Acta Psychologica 95 (1997) 239-253

Tradeoff of semantic relatedness and degree of overlearning:
Differential effects on metamemory and on long-term retention
Marie Carroll a,*, Thomas O. Nelson b,c, Anne Kirwan a
a Centre for Applied Psychology, University of Canberra, P.O. Box 1, Belconnen, ACT 2616, Australia
b Psychology Department, University of Washington, Seattle, WA 98195, USA
c Psychology Department, University of Maryland, College Park, MD 20742, USA
Received 23 October 1995; revised 5 June 1996; accepted 16 June 1996

Abstract

We investigated people's recall and recognition of, and Judgments of Learning (JOLs) and
Feelings of Knowing (FOKs) about: (a) pairs of related words that were learned to a criterion of
two correct recalls (criterion-learned/related) versus (b) pairs of unrelated words that were
learned to a criterion of eight correct recalls (overlearned/unrelated). Recall, FOK on unrecalled
targets, and recognition were tested at either (between-subjects variable) two or six weeks after
learning. In Experiment 1, subjects' JOLs were greater in magnitude for criterion-learned/related
items than for overlearned/unrelated items, and they predicted that recall would be the same after
a 2-week retention interval as after a 6-week retention interval (between-subjects prediction). In
contrast, however, subsequent recall was greater on the 2-week retention test than on the 6-week
retention test and was greater for the overlearned/unrelated items than for the criterion-
learned/related items; also, subjects' FOKs (and recognition performance) were greater in
magnitude for the overlearned/unrelated items than for the criterion-learned/related items.
Experiment 2 revealed that the overweighting of the importance of relatedness disappears from
JOLs when those JOLs occur one day after the acquisition session. These findings imply that the
information tracked by metacognitive monitoring judgments is different for JOLs than for FOKs,
with the JOLs (relative to FOKs) based more on semantic relatedness and less on the degree of
learning during acquisition. Also, subjects' JOLs are not particularly good at accurately forecasting
their eventual level of recall on long-term retention tests that occur several weeks after
acquisition.

* Corresponding author. E-mail: carroll@science.canberra.edu.au, Fax: +61 6 201-5753.

0001-6918/97/$17.00 Copyright © 1997 Elsevier Science B.V. All rights reserved.
PII S0001-6918(96)00040-6

PsycINFO classification: 2343
Keywords: Episodic; Long-term memory; Forgetting; Consciousness; Metamemory

1. Introduction

Judgements of learning (JOL) are made during or at the end of learning and refer to
the subject's estimates of his or her likelihood of subsequently remembering the studied
items (reviewed in the theoretical framework of Nelson and Narens, 1990, 1994).
Feelings of knowing (FOK) are made on items not recalled at test and refer to the
subject's estimates of his or her likelihood of recognizing the nonrecalled items.
Leonesio and Nelson (1990) have shown that JOL and FOK have different bases, but
their research did not identify what those different bases are.
A primary goal of our research was to investigate how the two metamemory
judgments - JOL and FOK - track memory for two kinds of items: (1)
overlearned/semantically-unrelated items and (2) criterion-learned/semantically-related
items. A century of memory research has repeatedly shown that pre-existing semantic
associations between items will have the effect that paired-associate acquisition is easier
for semantically-related pairs (e.g., soil-rock) than for semantically-unrelated pairs (e.g.,
engine-disease). However, our interest was in how a lack of pre-existing association
may be compensated for by extra overlearning during the laboratory session, and what
effects that might have on different aspects of memory performance. Accordingly, we
manipulated both the degree of semantic relatedness and the degree of overlearning in a
way that would trade those two variables off.
Nelson et al. (1982) have already shown that the magnitude of FOK will increase
with the degree of overlearning. Subsequently, Leonesio and Nelson (1990) found the
magnitude of JOLs was greater for overlearned items than for criterion-learned items.
However, in accord with the small value of the correlation (0.17) between JOL and FOK
in their experiment, Leonesio and Nelson conjectured that JOL and FOK are based on
different types of information being monitored. One possibility is that both the JOL and
FOK are based on a multidimensional structure but that each of those aspects of
metamemory is sensitive to different dimensions of the underlying structure of whatever
produces memories. We explored that hypothesis here.

2. Experiment 1

In contrast to the research of Leonesio and Nelson (1990), our experiment was
designed to hold constant the cue for JOL and FOK, by having both kinds of judgments
be made to a cue consisting of the stimulus alone. This allowed a more direct
comparison to be made between JOL and FOK.
In Experiment 1, overlearned/unrelated items and criterion-learned/related items
were presented in a within-subject design. Subjects were tested on recall of the
cue-target pairs at either 2 or 6 weeks after acquisition. Thus, any advantage from
preexisting semantic associations was contrasted with the advantage from overlearning.
One goal was to investigate whether the importance of semantic relatedness was
different after longer delays than after shorter delays.

2.1. Method

2.1.1. Subjects and design


University undergraduates who received course credit for their participation acted as
subjects in the two sessions. In a mixed model design, each subject overlearned 20 pairs
of unrelated words and criterion-learned 20 pairs of weakly related words. The first
session lasted approximately 20 to 45 minutes and the second session, which occurred 2
or 6 weeks after, lasted about 20 minutes. The delay condition was between-subjects.
Twenty subjects participated in the 2-week delay condition and twenty in the 6-week
delay condition, for a total of 40 subjects.

2.1.2. Materials
Stimuli were 40 word pairs. The words were all nouns that had high frequency (A or
AA rating in the Thorndike-Lorge norms). Twenty word pairs were formed by
randomly combining the nouns and ensuring that there was no obvious association
between any pairs (e.g. engine-disease; pole-pupil), and 20 were chosen so that there
was some weak association between the words (e.g. soil-rock; student-paper). Each
subject received all 40 word pairs to learn, blocked by condition, with the order of
blocks counterbalanced over subjects. Subjects in both delay groups learned the 20
weakly associated pairs to a criterion of 2 correct recalls per item, and the 20 unrelated
pairs to a criterion of 8 correct recalls per item. Recall was non-consecutive; that is,
when a target was not correctly recalled, the experimenter showed the subject the correct
cue and target pair, and then moved on to the next list pair. Thus subsequent testing of
the incorrectly recalled target would recur only after the subject had been tested on all
other list items.

2.1.3. Procedure
Each subject was tested individually. There were two separate sessions per subject:
acquisition and retention.

2.1.3.1. Acquisition session. Subjects were informed that they would be shown one pair
of words at a time, printed on index cards. The left-hand word printed in lower case was
the cue word, and the right-hand word in upper case was the target word. It was
explained that the target word was required in response to the experimenter's saying the
cue. Each word pair was presented for 15 seconds, withdrawn, and the next card shown
immediately. At the conclusion of the card presentation, the experimenter spoke aloud
the cue words. Alternate subjects received all of the low-association set words en bloc,
or all of the no-association set words. The subject attempted to respond with the
appropriate target. If the subject did not respond correctly at any stage of learning, the
cue-target portion (on the flip-side of the card) was shown again for 10 seconds.
Subjects had to produce the correct target on a total of 8 occasions for the unrelated
pairs, and only twice for the weakly associated pairs. A dropout procedure was
employed (as in Leonesio and Nelson, 1990) whereby presentation ceased when the final
item reached 8 or 2 correct.
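
The learning-to-criterion cycle described above can be sketched as a small simulation. This is only an illustrative sketch, not the authors' procedure: the function name `learn_to_criterion`, the fixed per-test success probability `p_correct`, and the assumption that an item leaves the list as soon as it reaches its criterion are all hypothetical choices made for the example.

```python
import random


def learn_to_criterion(cues, criterion, p_correct=0.6, seed=0):
    """Simulate non-consecutive cued-recall tests with dropout.

    cues: list of cue words, one per pair.
    criterion: correct recalls required per item (2 or 8 in the text).
    p_correct: hypothetical chance of a correct recall on any single test.
    Returns a dict mapping each cue to the number of tests it received.
    """
    rng = random.Random(seed)
    correct = {cue: 0 for cue in cues}
    tests = {cue: 0 for cue in cues}
    active = list(cues)
    while active:
        remaining = []
        # One pass through the list: an item is re-tested only after
        # every other remaining item has been tested.
        for cue in active:
            tests[cue] += 1
            if rng.random() < p_correct:
                correct[cue] += 1
            # After an error the pair would be shown again (10 s in the text);
            # here the item simply stays in the list for the next pass.
            if correct[cue] < criterion:
                remaining.append(cue)
        active = remaining  # assumed dropout: items at criterion leave the list
    return tests


unrelated = learn_to_criterion(["engine", "pole"], criterion=8)
related = learn_to_criterion(["soil", "student"], criterion=2)
```

Under these assumptions every unrelated pair receives at least 8 tests and every related pair at least 2, which is the fourfold difference in retrieval practice that the design trades off against relatedness.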
After all pairs in the list had been learned to the appropriate criteria (i.e. the learning
phase was completed), the experimenter then read aloud each cue word once again,
during which the subject was asked to give a judgment of learning (JOL) rating on a
6-valued scale (0%, 20%, 40%, 60%, 80%, 100%) indicating how likely it was that he or
she would be able to produce the correct target word in response to the cue word when
asked to do so at their next session. At this stage, each subject was aware of the interval
of time that would separate the first and second sessions. Moreover, the JOL rating on a
given item occurred after several other items had intervened, so these were 'delayed
JOLs' (Nelson and Dunlosky, 1991).

2.1.3.2. Retention session. At test, 2 or 6 weeks later, the experimenter said the cue
word, and the subject attempted to recall the correct target. Guessing was encouraged
but not required. The session was self-paced. The experimenter first presented the entire
list of 40 cues, one at a time. No feedback was given about whether the response was
correct or wrong. Then the experimenter returned to those items for which the correct
target had not been recalled. In response to the cue for each such item, subjects had to
give an FOK rating from the same 6-point scale described for JOL. They were informed
that they had been incorrect on all of the items they were about to rate for FOK. This
rating indicated the degree of confidence that the correct target would be recognized
from a set of 4 possible responses. A zero rating would mean "I have no feeling of
knowing about this item, and my success at recognizing the correct answer should be
only at chance level"; a 100% rating would mean "I have a very strong feeling of
knowing about this item, and I am certain that I will recognize the correct answer".
FOK ratings on all unrecalled items were obtained. Finally, the experimenter presented
the cue word for each unrecalled target and offered a choice of 4 alternatives, one of
which was correct, and 3 of which were other unrecalled targets.
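
The four-alternative recognition test just described can be sketched as follows. This is a minimal illustration, assuming that distractors are sampled at random from the other unrecalled targets and that the option order is shuffled; the function name `build_recognition_trial` and the sampling details are hypothetical, since the text does not say how distractors were chosen.

```python
import random


def build_recognition_trial(cue, target, unrecalled_targets, rng):
    """Build one 4-alternative trial: the correct target plus 3 other unrecalled targets."""
    distractor_pool = [t for t in unrecalled_targets if t != target]
    if len(distractor_pool) < 3:
        raise ValueError("need at least three other unrecalled targets")
    options = rng.sample(distractor_pool, 3) + [target]
    rng.shuffle(options)  # assumed: presentation order is randomised
    return {"cue": cue, "options": options, "answer": target}


# With four alternatives, guessing succeeds one time in four.
CHANCE_LEVEL = 1 / 4

rng = random.Random(1)
unrecalled = ["ROCK", "PAPER", "DISEASE", "PUPIL"]  # example targets from the Materials section
trial = build_recognition_trial("soil", "ROCK", unrecalled, rng)
```

The constant `CHANCE_LEVEL` corresponds to the "chance level" anchor of the 0% FOK rating: a subject with no knowledge of the item would still be expected to pick the correct alternative a quarter of the time.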

2.2. Results

2.2.1. Magnitude of JOL
The mean JOL ratings are shown in the left panel of Fig. 1. The first result of interest
is that JOLs were higher for criterion-learned/related items than for overlearned/unrelated
items (F(1,38) = 14.28, MSe = 99.99, p < 0.001). This result reflects subjects'
beliefs that relatedness, as we examined it here, is a better predictor of future recall
performance than is the amount of learning engaged in. Both delay groups gave
consistently higher JOL ratings to the criterion-learned/related items than to the
overlearned/unrelated items (p < 0.01 for six weeks; p < 0.05 for two weeks), and
there was no interaction between group and criterion of learning (F < 1). JOLs did not
differ between 2-week and 6-week delay groups (F < 1): Subjects did not give
predictions of lower recall for 6 weeks than for 2 weeks, contrary to what might be
expected. This surprising finding may be due to the fact that the retention interval was a
between-subjects variable (see Carroll and Nelson, 1993).

Fig. 1. (left panel) Mean JOL rating and (right panel) mean proportion recalled as a function of week delay
and type of material, Experiment 1. CL/R = criterion-learned/related; OL/UN = overlearned/unrelated.

2.2.2. Recall
The pattern for the mean JOL ratings in Fig. 1 can be compared with the pattern for
the mean proportion of targets actually recalled during the retention test (see right panel
of Fig. 1). There was a main effect of group (F(1,36) = 15.81, MSe = 141.63, p < 0.001)
such that the 6-week group remembered fewer targets (mean = 0.35) than the 2-week
group (mean = 0.61), as might be expected with longer intervals. However, what is most
noteworthy is that there was greater recall of overlearned/unrelated targets (mean = 0.53
correct) than of criterion-learned/related targets (mean = 0.42 correct) (F(1,36) =
14.86, MSe = 141.63, p < 0.001). This difference was significant for both groups
(p < 0.05 for 6-week delay; p < 0.01 for 2-week delay) and there was no significant
interaction (F < 1).
Thus, although the subjects believed that recall would be affected more by related-
ness than by the amount of learning, in fact the outcome in terms of eventual recall was
exactly the opposite!

2.2.3. Magnitude of FOK


The same analyses carried out on FOK measures, recorded for targets that were not
recalled at test, yielded some differences from the JOL results (see left panel of Fig. 2).
Firstly, there was a significant effect of kind of items (F(1,36) = 13.25, MSe = 71.87,
p < 0.001) such that the overlearned/unrelated targets received higher FOK ratings
(mean = 52%) than the criterion-learned/related items (mean = 45%). Thus the
metacognitive ratings for FOK are the reverse of the JOL ratings; in rating FOK the
subjects are influenced by degree of prior learning more than by relatedness of the
material.

Fig. 2. (left panel) Mean FOK rating and (right panel) mean proportion correctly recognised as a function of
week delay and type of material, Experiment 1. CL/R = criterion-learned/related; OL/UN = overlearned/unrelated.

In addition, there was no effect of retention interval on FOK: The mean rating at 6
weeks (0.43) was not significantly different from the mean rating at 2 weeks (0.53)
( F = 1.95, p > 0.05), although this nonsignificant trend is in the direction of greater
expected recognition after a shorter retention interval than after a longer retention
interval. The lack of statistical significance for the effect of retention interval on FOK
(as with JOL) may be due to the relative insensitivity of between-subjects designs at
detecting a difference in FOK (Carroll and Nelson, 1993) presumably because the
threshold for predicting 'will recognize' might change substantially in between-subjects
designs. The interaction of retention interval × kind of items was not significant
(F = 1.03).

2.2.4. Recognition
Subsequent recognition of unrecalled items (see right panel of Fig. 2) showed a
pattern similar to recall. The 2-week group recognised more targets (mean = 0.87) than
the 6-week group (mean = 0.71) (F(1,36) = 10.53, MSe = 296.55, p < 0.005), as might
be expected, and in accord with the direction of the nonsignificant trend in FOK. The
interaction was not significant (F = 1.96, p > 0.05).
There was also a main effect of kind of items (F(1,36) = 3.80, MSe = 296.55,
p < 0.05) such that the overlearned/unrelated items were recognized better (mean =
0.83) than the criterion-learned/related items (mean = 0.75). These recognition data are
consistent with the pattern predicted by FOK (Fig. 2). That is, subjects feel they know
overlearned items better, even when unable to recall them, and they indeed do have
better memory for those items (in accord with Tables II and III in Nelson and Narens,
1990). As we suggested above, in making FOK ratings, subjects may not be tracking the
recallability of the response; rather, they might assess FOK according to earlier features
that they may have encoded, such as how often (they believe; see Nelson and Narens,
1990, Table II) the cue was presented to them. We mention that the FOK advantage of
overlearning reached significance only for the 2-week delay group (p < 0.005), and
recognition scores were significantly greater only in the overlearned 2-week delay group
(p < 0.01); and although the overlearning effect on FOK was not significant in the
6-week delay group, it was also not significant in recognition for that group.
Overall the JOL and FOK ratings are not completely comparable, because JOL
analyses were conducted on all items whereas FOK analyses were conducted only on
unrecalled items. Therefore another JOL analysis was performed on only the same
subset of items (namely, nonrecalled ones) that received FOK ratings. The results were
essentially the same as for all items: The only significant effect was that of amount of
learning (F(1,36) = 4.48, MSe = 164.99, p < 0.05) with criterion-learned/related items
receiving higher JOL ratings (mean = 54%) than overlearned/unrelated items (mean =
48%).

2.3. Discussion

As Nelson and Narens (1990) have stated, JOL can tap both item difficulty and
degree of learning because JOL occurs at the end of acquisition. Which is considered
more important by the subject: difficulty of learning or amount of learning? Although
each of those is a nondichotomous variable, in our experiment the JOLs were higher for
material judged less difficult to learn (weakly associated) than for material judged more
difficult to learn (unrelated items). This occurred even though the number of recalls was
4 times as great in the unrelated condition as in the weakly associated condition. One
important finding of this study, then, is that when making JOLs, subjects give relatively
too much weight to the relatedness of the two items comprising a given pair (where
'relatively too much weight' means that the eventual recall was less affected by
relatedness than by overlearning).
A primary goal of this research was to look at the relationship between JOL and
FOK. It had been established elsewhere (Leonesio and Nelson, 1990) that the two
measures are based on different types of information. We found further confirmation of
that hypothesis. At both delay intervals, JOL was sensitive to relatedness of the material
more than to overlearning, whereas FOK was sensitive to overlearning more than to
relatedness, particularly at the 2-week delay. It is clear that the two metamemory
measures are indeed sensitive to different types of information, with JOL (more than
FOK) reflecting the variables (such as relatedness) that affect the rate of learning. By
contrast, FOK is influenced less by relatedness of the items than by the degree of
learning. Perhaps this occurs because information about the degree of learning is
encoded with the stimulus-only cue (which occurred more often in the overlearned than
in the criterion learned condition), whereas relatedness manifests itself primarily when
the association (response) is recalled, as in the end-of-acquisition JOLs where all
responses could be recalled.
A secondary goal of this research was to investigate how recall changes over time,
and in particular, to explore whether semantic relatedness exerts a stronger influence
than the degree of overlearning at all retention intervals, which is what the JOLs predict
will happen. However, relatedness never exerted a stronger influence on recall than did
overlearning, even at the short retention interval.
Why were people's beliefs about what they would recall at odds with their eventual
recall performance? One possibility is that people's a priori belief that overlearning is
less important than semantic relatedness dominated their JOLs. Put differently, subjects
may have invoked a faulty theory of retention (Maki and Berry, 1984) rather than
monitoring retrievability of the items.

3. Experiment 2

In Experiment 1 subjects who were making JOLs gave relatively too much weight to
relatedness and relatively too little weight to the degree of overlearning. What is the
basis for this incorrect weighting? To explore possible answers to that question, we
turned to the Nelson and Dunlosky (1991) finding that people are inaccurate about JOLs
when these judgments are made immediately after the acquisition of information. The
assessment people make during immediate JOLs is presumably affected by noise and
interference from short term memory (Nelson and Dunlosky, 1991). Those authors
suggested that by delaying the JOL until short-term memory is filled with other
information, a more accurate assessment will be made by the subject about the amount
and nature of information that was acquired into LTM during the learning.
In further experiments, Dunlosky and Nelson (1994) found that delayed JOLs, but not
immediate JOLs, were sensitive to variables that consistently improve recall, i.e. spaced
versus massed learning, and imagery versus rote rehearsal. However, in both studies
(Nelson and Dunlosky, 1991, and Dunlosky and Nelson, 1994) the delayed JOL always
occurred shortly after study (approximately 30 seconds after study).
In our Experiment 1, all JOLs were 'delayed' for at least 30 seconds but nevertheless
were too heavily weighted towards the related items. Could this inaccuracy regarding the
eventual importance of relatedness be reduced by delaying the JOL in an even more
extreme fashion? If JOLs were to be made during a separate experimental session, then
perhaps 'noise' factors that are salient during learning, such as the relatedness of the
words in a given pair, might become less influential. Perhaps long-delayed JOLs would
no longer contain the overly heavy weighting of relatedness that occurred in Experi-
ment 1.
In Experiment 2, we attempted to replicate Experiment 1 by using the same materials
and testing intervals (2 weeks and 6 weeks), with the only change being the length of
delay between study and the JOL. One group of subjects was required to give JOL
ratings on completion of acquisition, as in Experiment 1, while a second group gave
JOL ratings the day after acquisition. We refer to the first group as the 'same-session'
JOL group; their ratings were given more than 30 seconds after learning was completed,
but occurred in the same session as acquisition. The second group is referred to as the
'one-day-later' JOL group; they returned to give ratings of JOL the day after acquisition.

3.1. Method

Forty university students, drawn from the same pool as those used in Experiment 1,
participated. The materials, design, and procedure were identical to those used in
Experiment 1, with one exception: Half of the subjects were required to return one day
after the acquisition session to give JOLs to each cue, while the other half gave JOLs at
the end of the acquisition session as in Experiment 1.

3.2. Results

3.2.1. Magnitude of JOL


As in Experiment 1, JOL ratings were higher for criterion-learned/related items
(mean = 62.6%) than for overlearned/unrelated items (mean = 58.3%) (F(1,56) = 5.39,
MSe = 103.71, p < 0.05). However, as the left panel of Fig. 3 shows, the interesting
finding was the interaction between the pattern of JOLs and the delay of the JOLs
(F(1,56) = 10.82, MSe = 103.71, p < 0.005). In particular, after one-day-later JOL,
ratings for the two criteria of learning did not differ significantly, and only after
same-session JOL was the magnitude of JOLs significantly higher for criterion-
learned/related items than for the overlearned/unrelated items. Thus the effect of
delaying the JOL by 24 hours was to eliminate the overly strong influence of relatedness
on the JOLs that occurs during and at the end of acquisition.

Fig. 3. (left panel) Mean JOL ratings and (right panel) mean proportion of targets recalled as a function of
JOL timing (same-session JOL vs. one-day-later JOL), week delay, and type of material, Experiment 2.
CL/R = criterion-learned/related; OL/UN = overlearned/unrelated.

As in Experiment 1, subjects were not significantly more likely to give higher ratings
to items that would be tested after 2 weeks (mean = 62%) than to those that would be
tested after 6 weeks (mean = 59%). Nor does delaying JOL by one day increase
significantly the sensitivity of subjects to the effect that long time intervals have on
memory, at least when the 2-week vs. 6-week delay is a between-subjects variable.

3.2.2. Recall
The lower half of Fig. 3 shows the mean proportions of items recalled in each
condition. There was a main effect of group (F(1,56) = 29.7, MSe = 600.83, p < 0.001)
as in Experiment 1, such that the 6-week group recalled fewer items (mean = 0.33) than
the 2-week group (mean = 0.60). These levels of recall are similar to those found in
Experiment 1.
As in Experiment 1, recall was significantly greater for the overlearned/unrelated
items (mean = 0.47) than for the criterion-learned/related items (mean = 0.43) (F(1,56)
= 3.80, MSe = 131.54, p < 0.05). Again, this finding was contrary to the subjects'
overall predictions about the kind of items that they would recall better. However, there
was an interaction between delay of JOL and kind of items (F(1,56) = 4.45, MSe =
131.54, p < 0.05): When JOL occurred in the same session, the overlearned/unrelated
and criterion-learned/related items were recalled approximately equally (not the differ-
ence that occurred in Experiment 1), whereas for the group in which the JOLs occurred
one day after acquisition, the overlearned/unrelated items were recalled better than the
criterion-learned/related items (similar to the difference that occurred in Experiment 1).
We have no explanation for the anomalous finding for same-session JOLs, except to
note that in neither experiment was the pattern of recall consistent with the pattern
predicted by the same-session JOLs.
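
The mismatch between predictions and recall for the two kinds of items can be made concrete with a rough calculation on the group means reported above for Experiment 2. This is only an illustrative sketch: it assumes a mean JOL rating can be read as a predicted recall percentage, it uses means pooled over JOL timing and week delay, and the function name `overprediction` is hypothetical.

```python
# Group means reported in the text for Experiment 2 (pooled over conditions).
means = {
    "criterion-learned/related": {"jol_pct": 62.6, "recall_prop": 0.43},
    "overlearned/unrelated": {"jol_pct": 58.3, "recall_prop": 0.47},
}


def overprediction(jol_pct, recall_prop):
    """Mean JOL (in %) minus mean recall (in %), rounded to one decimal place."""
    return round(jol_pct - 100 * recall_prop, 1)


# Positive values mean that predicted recall exceeded actual recall.
gaps = {kind: overprediction(**m) for kind, m in means.items()}
```

On these pooled means the gap is about 19.6 percentage points for criterion-learned/related items and about 11.3 for overlearned/unrelated items, so under this reading the overprediction is larger for the related items, in line with the overweighting of relatedness in the overall JOLs.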
There was a significant improvement in recall as a result of delaying JOL by one day
(F(1,56) = 6.51, MSe = 600.83, p < 0.01). Same-session JOL was followed by only
0.39 correct recall, whereas one-day-later JOL was followed by 0.50 correct recall, and
this superiority was present for both the 2- and 6-week groups. Dunlosky and Nelson
(1994, Experiment 1) mention a similar finding for immediate versus same-session
delayed JOL, although they note that it was not robust, and they did not discuss it
further. The data in Fig. 4 suggest that one-day-delayed JOL improves recall, especially
for a 6-week retention interval; this suggests that at least in some situations the JOL can,
perhaps by serving as retrieval practice, facilitate subsequent recall (in accord with
Spellman and Bjork, 1992).

3.2.3. Magnitude of FOK


The analysis on the FOK measure (elicited for targets not recalled at test) yielded a
main effect of type of material (F(1,56) = 15.61, MSe = 66.18, p < 0.001): Overlearned
unrelated material was given higher ratings (mean = 58%) than criterion learned
weakly associated material (mean = 52%). This shows that the FOK ratings are the
reverse of the overall JOL ratings and is therefore consistent with the findings of
Experiment 1, although the overall ratings are lower in Experiment 1 (those ratings were
52% and 45% respectively). As in Experiment 1, too, there was no effect of week delay
on FOK (F < 1).

Fig. 4. (left panel) Mean FOK rating and (right panel) mean proportion correctly recognised as a
function of JOL timing (same-session JOL vs. one-day-later JOL), week delay, and type of material,
Experiment 2. CL/R = criterion-learned/related; OL/UN = overlearned/unrelated.

There was an interaction (see left panel of Fig. 4) between week delay and JOL delay
(F(1,56) = 3.80, MSe = 673.52, p = 0.05).
After a 2-week delay, FOK ratings did not differ when JOL was one-day-later or
same-session, but after a 6-week delay the magnitude of ratings dropped considerably
when the JOL rating was same-session, while they remained high when JOL rating was
one-day-later.

3.2.4. Recognition
There was a main effect of week delay on correct recognition (F(1,56) = 3.71,
MSe = 431.46, p = 0.05); more items were correctly recognised after 2 weeks (mean =
0.86) than after 6 weeks (mean = 0.78), as in Experiment 1, where the means were 0.87
and 0.71 respectively. Although FOK ratings remained high after 6 weeks in the
one-day-later JOL condition, recognition did not reflect this; there was no significant
interaction between week delay and timing of JOL. However, unlike Experiment 1, the
main effect of type of material failed to reach significance ( F = 1.76). Although in
Experiment 2, subjects had higher FOKs for the overlearned material, this was not
reflected in greater accuracy of recognition for such material (mean overlearned/unre-
lated = 0.81; mean criterion learned related = 0.84).

3.3. Discussion

The timing of the JOL is critical in whether the JOL is dominated more by
relatedness or by the degree of learning (left panel, Fig. 3). When subjects made JOLs
within the same experimental session as that in which they learned the material, the
JOLs were dominated by semantic relatedness and were relatively less affected by the
compensatory effect that overlearning has on future recall (right panel, Fig. 3). However,
when JOL was delayed by one day, subjects no longer gave relatively greater weight to
semantic relatedness than to the degree of learning. This one-day delay allowed them to
assess their eventual retention more accurately.
The increase in total amount recalled when JOL was delayed can perhaps be
explained by retrieval practice (Bahrick and Hall, 1991; King et al., 1980), or by covert
retrieval and rehearsal (Spellman and Bjork, 1992), with the cues acting as prompts to
re-establish and maintain word pair associations. Subjects in this and in Nelson and
Dunlosky's experiment reported attempting to recall the target for each cue before
giving the estimate of future recall performance in the delayed JOL phase.
JOL and FOK magnitudes showed opposite patterns; degree of learning was most
influential for FOK magnitude. Nevertheless, FOK is sensitive to a delayed JOL
episode: after a 6-week study-test interval, subjects felt they would recognise as many
unrecalled items as after only a 2-week interval. This belief was not borne out in actual
recognition, however: Overall, mean recognition performance was better in the 2-week
than in the 6-week group. What is surprising, nevertheless, is that a single exposure to
the cue words just a day less than 6 weeks prior to testing could strengthen the feeling of
knowing about items which were unrecallable. The above-mentioned notion of covert
retrieval is strengthened by this finding. If FOK is predominantly affected by factors
affecting retrieval and not encoding, then it would be expected that this extra retrieval
episode would have a long term effect on FOK. At test, the events surrounding the study
context are not discriminably different in making the FOK judgment, even if they were
many or few weeks before. But after a long interval of 6 weeks, one event that is
separated from the study context constitutes part of the 'retrieval' episode, and has its
effect on FOK.

4. General discussion

The major new finding from Experiment 1 was subjects' belief that criterion learned
related material would benefit them in recall more than would overlearned unrelated
material. This was equally the case when they knew that recall would occur after an
interval of 2 or 6 weeks. In contrast to subjects' beliefs, in Experiment 1 (and in the
one-day-later condition of Experiment 2), objective performance, as measured by recall,
was better for overlearned material than for criterion-learned material.
Secondly, subjects' on-line monitoring did not indicate that the different retention
intervals of 2 versus 6 weeks would affect their recall. In contrast, recall after a 6-week
interval was worse than after a 2-week interval. However, the retention interval, as manipulated in
these experiments, was always a between-subjects variable. By incorporating their
general metacognitive knowledge alone (i.e., independent of the on-line monitoring of
the items) subjects' JOLs probably would discriminate between short versus long
retention intervals when retention interval is manipulated within-subjects (Kreutzer et
al., 1975).
Thirdly, FOK, unlike JOL, was dominated more by amount of learning than by
semantic relatedness. Recognition of unrecalled items was consistent with the FOK:
More overlearned unrelated items were correctly recognised than were criterion learned
related items. Similar findings were reported previously by Nelson et al. (1982). Also,
length of the study-test interval did affect FOK magnitude: these ratings were higher after
2 weeks than after 6 weeks, and recognition of items was indeed superior after 2 weeks.
In the condition of Experiment 2 which replicated Experiment 1 - same-session JOL
- subjects also believed that association value would be a better predictor of recall than
amount of learning, whereas recall performance did not validate such a belief. (Here,
unlike in Experiment 1, the recall performance was not superior for overlearned
material). However, the important new finding in Experiment 2 was that this over-
weighting of semantic relatedness was eliminated when the JOL was delayed by one
day. In this condition, criterion learned related items and overlearned unrelated items
were judged equally likely to be recalled. Actual recall performance in the one-day-later
condition produced better recall of overlearned unrelated than criterion learned related
items, as in Experiment 1. Thus at no stage was the recall of criterion-learned/related
information superior to that of overlearned/unrelated information, contrary to the
predictions from same-session JOLs.
In Experiment 2, as in Experiment 1, FOK magnitude was higher for overlearned
unrelated material than for criterion learned related material. However, recognition
performance was equal for both types of material, unlike Experiment 1. FOK was not
sensitive to 2-week vs. 6-week delay, although recognition was better for items tested
after 2 weeks than after 6 weeks. When JOL was delayed by one day, FOK magnitude
was significantly greater after 6 weeks than it was when JOL had occurred in the same
session.
With respect to the questions posed in the introduction, we can state the following:

4.1. Effects on recall

The question of interest was how the rate of forgetting compares for small amounts
of learning on episodes that contain preexisting semantic associations versus large
amounts of overlearning on episodes containing non-preexisting associations. The
answer seems to be that preexisting associations do not have an advantage for recall
when the delay is long enough or large amounts of learning occur. Thus large amounts
of learning can compensate for a lack of initial relatedness. This seems inconsistent with
any view that over time the episodic aspects would be lost faster and pre-existing
semantic aspects would determine recall. Despite confirmation from PET scans of the
distinction between episodic versus semantic processing (Nyberg et al., 1996), we find
no evidence in these experiments to support such an episodic/semantic memory
distinction.

4.2. Effects on metamemory

The effect of overlearning is clear: In both experiments, subjects believe that small
amounts of learning of related material will lead to better memory than large amounts of
learning of unrelated material, at least when the JOL rating is made in the acquisition
session. That is, they underweight the importance of overlearning. We suggested earlier
that JOL might be more sensitive to episodic factors, while FOK may be more sensitive
to semantic factors, assuming that such a distinction is sound. No evidence was found
for this assumption. Rather JOL may be heavily influenced by long-standing metacogni-
tive knowledge (Kreutzer et al., 1975) that is different from on-line metacognitive
monitoring of each item. For instance, Maki and Berry (1984) concluded that people
have a 'theory of retention' of how much they will typically remember. This might
include the idea that related items will be remembered better than unrelated items,
regardless of the degree of overlearning that might occur on the latter to compensate for
potential disadvantages relative to the former. Related to this is the idea that people have
long-standing metacognitive knowledge about the effectiveness of various acquisition
strategies that they employ to facilitate their learning (McDaniel and Kearney, 1984).
Such long-standing metacognitive knowledge may be combined in some way with the
information from the on-line monitoring of the items. In the limit, the long-standing
metacognitive knowledge might dominate the information the person obtains during
on-line monitoring, such that the JOL might be affected more by general beliefs (even if
they are sometimes incorrect) such as the importance of the role of semantic relatedness
rather than (or more than) by the information from on-line monitoring.
The temporal proximity of same-session JOL to acquisition factors might reasonably
focus on the salience of the cue and target pair, which of course includes their semantic
relationship. After one day, the salience of this relationship is reduced, and the
overweighting of relatedness is eliminated. However, during FOK judgments that do not
occur until several weeks later, subjects are not influenced by features of the target;
rather, they assess their likely recognition, among other things, on the basis of what they
can remember about the circumstances surrounding stimulus presentation. Stimuli
which were presented more often have acquired a greater familiarity than stimuli which
were presented less often. According to this view (cf. Koriat, 1993), FOK is not
monitoring memory for the unrecallable information about the target but for the
information that is recalled (e.g. about the cue, the context of learning, and/or about
partial-but-incomplete information concerning the target).
Thus the present findings support the claim of recent research (Mazzoni and Nelson,
1995) that JOLs do not merely track the probability of recall (aka memory strength).
Instead, JOLs track aspects of information acquired during a particular learning episode
and/or during much earlier learning (e.g., semantic relatedness), wherein some of those
aspects will be part of whatever underlies recall while other of those aspects will not.
Our research provides a beginning and an impetus for determining which aspects belong
to each of those two theoretical subdivisions of the information acquired during learning.

Acknowledgements

This research was supported by an Australian Research Council Grant to the first
author and by grant RO1-MH32205 and a career development award (K05-MH1075)
from the National Institute of Mental Health to the second author. We thank Louis
Narens for his suggestions.

References

Bahrick, H.P. and L.K. Hall, 1991. Lifetime maintenance of high school mathematics content. Journal of
Experimental Psychology: General 120, 20-33.
Carroll, M. and T.O. Nelson, 1993. Effect of overlearning on the feeling of knowing is more detectable in
within-subject than in between-subject designs. American Journal of Psychology 106, 227-235.
Dunlosky, J. and T.O. Nelson, 1994. Does the sensitivity of judgments of learning (JOLs) to the effects of
various study activities depend on when the JOLs occur? Journal of Memory and Language 33, 545-565.
King, J.F., E.B. Zechmeister and J.J. Shaughnessy, 1980. Judgments of knowing: the influence of retrieval
practice. American Journal of Psychology 93, 329-343.
Koriat, A., 1993. How do we know that we know? The accessibility model of the feeling of knowing.
Psychological Review 100, 609-639.
Kreutzer, M.A., C. Leonard and J.H. Flavell, 1975. An interview study of children's knowledge about
memory. Monographs of the Society for Research on Child Development 40, (1, Serial no. 159).
Leonesio, R.J. and T.O. Nelson, 1990. Do different metamemory judgments tap the same underlying aspects of
memory? Journal of Experimental Psychology: Learning, Memory and Cognition 16, 464-470.
Maki, R.H. and S.L. Berry, 1984. Metacomprehension of text material. Journal of Experimental Psychology:
Learning, Memory and Cognition 10, 663-679.
Mazzoni, G. and T.O. Nelson, 1995. Judgments of learning are affected by the kind of encoding in ways that
cannot be attributed to the level of recall. Journal of Experimental Psychology: Learning, Memory, and
Cognition.
McDaniel and Kearney, 1984. Optimal learning strategies and their spontaneous use: The importance of
task-appropriate processing. Memory and Cognition 12, 361-373.
Nelson, T.O. and J. Dunlosky, 1991. When people's judgments of learning (JOLs) are extremely accurate at
predicting subsequent recall: The 'Delayed-JOL Effect'. Psychological Science 2, 267-270.
Nelson, T.O., R.J. Leonesio, A.P. Shimamura, R.F. Landwehr and L. Narens, 1982. Overlearning and the
feeling of knowing. Journal of Experimental Psychology: Learning, Memory and Cognition 8, 279-288.
Nelson, T.O. and L. Narens, 1990. Metamemory: A theoretical framework and new findings. The Psychology
of Learning and Motivation 26, 125-141.
Nelson, T.O. and L. Narens, 1994. 'Knowing about knowing'. In: J. Metcalfe and A.P. Shimamura, Metacognition.
Cambridge, MA: Bradford.
Nyberg, L., R. Cabeza and E. Tulving, 1996. PET studies of encoding and retrieval: The HERA model.
Psychonomic Bulletin and Review 3, 135-148.
Spellman, B.A. and R.A. Bjork, 1992. People's judgments of learning are extremely accurate at predicting
subsequent recall when retrieval practice mediates both tasks. Psychological Science 3, 315-316.
