DOI: 10.1080/10986060701341332
Jane M. Watson
Faculty of Education
University of Tasmania, Australia
Rosemary A. Callingham
School of Education
University of New England, Australia
Ben A. Kelly
Faculty of Education
University of Tasmania, Australia
This study presents the results of a partial credit Rasch analysis of in-depth interview
data exploring statistical understanding of 73 school students in 6 contextual settings.
The use of Rasch analysis allowed the exploration of a single underlying variable
across contexts, which included probability sampling, representation of temperature
change, beginning inference, independent events, the relationship of sample and
population, and description of variation. Interpretation of the demands of increasing code
levels for the resulting variable revealed an increasing appreciation of and interaction
between the ideas of variation and expectation. Student progression in understanding
is illustrated with kidmaps, and educational implications are considered.
LITERATURE REVIEW
available in English until 1975 (Piaget & Inhelder, 1975). In this work, children’s
understanding of random events and centered distributions, and the conflict seen
when children attempted to reconcile the behavior of a biased spinner with
respect to its apparently random nature were described. Three stages of development
were identified. In the first, children were unable to distinguish between “…
the possible and the necessary” (p. 216); by the second stage, children recognized
the nonpredictability of chance events that were not amenable to deductive reasoning;
in the third stage, these ideas were reconciled and children could reason
deductively to identify outcomes and quantify these, but also accept the innate
unpredictability of chance events.
The history of more recent research into understanding of stochastic ideas has
followed the curriculum to some extent, with expectation reflected in research into
chance and probability, especially in the work of Green (e.g., 1983, 1986, 1991) and
Fischbein (e.g., Fischbein & Gazit, 1984; Fischbein, Nello, & Marino, 1991;
Fischbein & Schnarch, 1997) picking up the misconceptions identified by
Kahneman and Tversky (e.g., 1972). These classic studies focused almost entirely
on the expectations related to events in sample spaces. Although Kahneman and
Tversky implicitly recognized the importance of variation in some of their scenarios
(for example, the famous “hospital” problem), the explicit reference made was
to the representativeness of the sample in terms of sample size. Rubin, Bruce, and
Tenney (1991) were the first to comment specifically on students’ struggles with
expectation and variation in situations of samples of different sizes. Mokros and
Russell’s (1995) interest in the mean continued the research related to expectation;
they concluded that few students had an appreciation of the representative
nature of the mean in terms of the data set it represented.
Konold and Pollatsek (2002) used the metaphor of “noise” for variation and
“signal” for expectation. In thinking of a signal as a central tendency they meant
a stable value that (a) represents the signal in a variable process and (b) is better
approximated as the number of observations grows. … Processes with central tendencies
have two components: (a) a stable component, which is summarized by
the mean, for example, and (b) a variable component, such as the deviation of
individual scores around an average, which is often summarized by the standard
deviation. (p. 262)
THIS STUDY
Following the analysis of survey and interview data from tasks that suggested
students had difficulty reconciling expectation and variation, it was of interest to
combine interview data from a number of tasks based in different contexts using
Rasch measurement techniques, to provide a more complete model based on in-
depth reasoning in contexts involving experimentation and extended tasks. The
aims of this study were then to:
METHODOLOGY
Sample
The sample for this study consisted of 66 students selected from the sample of
746 surveyed by Watson et al. (2003), plus an extra 7 six-year-old children who
were near the end of a preparatory year of full-time schooling before entering
Grade 1 (Prep). The 66 students included 18 from Grade 3, 18 from Grade 5, 15
from Grade 7 and 15 from Grade 9, selected by the researchers for a range of
responses to the survey questions and by their teachers as being articulate and
willing to be interviewed by the researchers. The six-year-olds were chosen by
their teacher as articulate and happy to talk to the researchers; these students had
been involved in an enriched mathematics program but not one involving chance
and data. The other students had experienced a mathematics curriculum based on
state guidelines derived from A National Statement on Mathematics for
Australian Schools (AEC, 1991), although it was unknown what specific topics
they had studied in relation to chance and data.
Tasks
The tasks used in this study were based on five protocols in different contexts
involving various aspects of chance and data where variation and expectation play a
role in decision making, together with an explanation of words associated with variation.
The protocols are presented in Appendix A in the order in which they were
answered by most students. The Lollies Task was developed from the work of
Shaughnessy et al. (1999) and Torok and Watson (2000), and initial analysis of outcomes
for students in this study is given in Kelly and Watson (2002). The task
88 WATSON, CALLINGHAM, KELLY
involved 100 lollies (or candies) in a container, of which 50 were red, 20 yellow,
and 30 green. Students were presented with the container and asked to speculate in
various ways about the number of reds in a handful of 10 removed without looking.
They were then allowed to produce six handfuls of 10 (with replacement), asked if
they wished to change their estimates, and given the opportunity to represent the
outcomes of 40 such trials. Ideas related to expectation and variation occurred
throughout all parts of the protocol reflecting the initial concerns of Zawojewski
and Shaughnessy (2000).
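The expectation and variation built into the Lollies Task can be made concrete with a small calculation. The sketch below is illustrative only and is not part of the interview protocol: it assumes the container is well mixed, so that a handful of 10 behaves like a simple random sample drawn without replacement from the 100 lollies. The function name and its parameters are hypothetical.

```python
from math import comb, sqrt

def red_count_distribution(total=100, reds=50, handful=10):
    """Return [P(0 reds), ..., P(handful reds)] for one handful.

    Assumes a well-mixed container, i.e. a hypergeometric model
    (drawing without replacement within a single handful).
    """
    return [
        comb(reds, k) * comb(total - reds, handful - k) / comb(total, handful)
        for k in range(handful + 1)
    ]

probs = red_count_distribution()
# Expectation: the long-run average number of reds per handful.
expected = sum(k * p for k, p in enumerate(probs))
# Variation: spread of the number of reds around that expectation.
variance = sum((k - expected) ** 2 * p for k, p in enumerate(probs))

print(f"expected reds per handful: {expected:.2f}")
print(f"standard deviation: {sqrt(variance):.2f}")
```

Under this assumed model the expected value is 5 reds, and the standard deviation of about 1.5 gives a sense of how far individual handfuls can reasonably stray from it.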
The Weather Task was adapted from a protocol used by Torok and Watson
(2000) and was analyzed for the students in this study by Watson and Kelly
(2005). The protocol was based on the yearly average daily maximum temperature
in Hobart, Tasmania, of 17°C. Students were asked to explain what this
meant, suggest daily maxima throughout the year, draw a graph to represent the
maximum temperatures throughout the year, and interpret three graphs presented
to them. Initial expectation was expressed in the yearly average of 17°C, whereas
all questions were directed at variation about this expectation and the yearly trend
in expectation as recognized by the students.
The Comparing Groups protocol employed three of four parts of the protocol
first analyzed in relation to beginning inference by Watson and Moritz (1999) and
later analyzed for explicit features of variation discussed in the responses by
Watson (2001, 2002). Based on the work of Gal and his colleagues (e.g., Gal,
Rothschild, & Wagner, 1989) students were asked to compare three pairs of
graphs, each pair showing test outcomes for two classes of children. Two pairs of
graphs represented sets of the same size, whereas the third pair represented sets
of differing size. Of interest were the methods chosen by students to decide which
classes had done better and the notice taken of variation in the graphs during decision
making. Coding of responses with respect to two rubrics, one for expectation
and one for variation, was reported for a larger data set by Skalicky (2005).
The Spinners Task was adapted from the work of Zawojewski and Shaughnessy
(2000) and Shaughnessy and Ciancetta (2002), and analyzed by Watson and Kelly
(2004a). The task involved two circular spinners, each half black and half white.
The scenario was based on the chances of winning a game where it was necessary
to spin both spinners and have them land on black. Trials of the game were actually
performed with students and they were allowed to change their initial estimates of
the chances of winning. Although observation of difference from students’ stated
expectation was considered in coding, this task mainly addressed expectation
related to outcomes from the two independent spinners.
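The expectation targeted by the Spinners Task can be checked by listing the sample space. The following is a minimal sketch, assuming the two spinners are fair and independent, as the task description implies; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Each spinner is half black and half white, so both colors are equally likely.
outcomes = list(product(["black", "white"], repeat=2))

# The game is won only when both spinners land on black.
wins = [pair for pair in outcomes if pair == ("black", "black")]

p_win = Fraction(len(wins), len(outcomes))
print(p_win)  # 1/4
```

Only one of the four equally likely outcomes wins, which is why an initial estimate of a 50–50 chance overstates the chance of winning.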
The Population/Sample Means Task was adapted from a problem of Tversky
and Kahneman (1971) and was based on the difference in mean values for a random
sample of size 10 drawn from a population and for the corresponding sample of size
9 if one of the values from the original sample is known. In this case the context
was the weight of Grade 5 students from a population with a mean of 30 kg, for a
APPRECIATION OF EXPECTATION AND VARIATION 89
sample of size 10, where one value was known to be 39 kg (Watson & Kelly, 2006).
Again the initial questions in the protocol were stated in terms of expectation, but
appreciation of sampling variation was instrumental in achieving sophisticated
responses and two rubrics were devised to account for both ideas.
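The expected values underlying this task follow from simple arithmetic. The sketch below is an illustration, not the authors' coding rubric: it assumes the nine unknown weights are independent random draws from the population, so that knowing one value leaves their expected mean at the population mean.

```python
population_mean = 30.0  # kg, stated in the task
known_weight = 39.0     # kg, the one known value in the sample
sample_size = 10

# The other nine children are still a random draw, so their expected
# mean stays at the population mean (assumption: independent draws).
expected_mean_of_nine = population_mean

# The full sample of ten combines the known value with the other nine.
expected_mean_of_ten = (
    known_weight + (sample_size - 1) * expected_mean_of_nine
) / sample_size

print(expected_mean_of_nine, expected_mean_of_ten)
```

Under that assumption the sample of 10 has an expected mean of 30.9 kg, slightly above the population mean, while sampling variation means any actual sample could fall on either side of that figure.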
The task associated with explanation of words related to variation was developed
by the researchers and was analyzed in a similar fashion to that used by
Watson and Kelly (2003c). It included a hint associated with interpreting the sentence,
“The winds are variable.” All of the tasks except the last began with a question
concerning an expectation based on the context, but how variation was
observed and used in decision making or in the creation of representations
was a feature of the analysis.
The Prep students responded only to the Lollies Task and the Weather Task,
so they contributed to five items in the subsequent analysis.
Initial Analysis
The initial analysis of the responses to each of the protocols was informed in two
ways: by the structural taxonomy suggested by Biggs and Collis (1982, 1991) and
by the statistical appropriateness of the responses. The work of Biggs and Collis
is in the Piagetian tradition (Inhelder & Piaget, 1958), reflecting the development
of understanding observed in children as they progress through the school years.
The taxonomy presents four levels of interaction with the relevant elements of the
task presented: (a) at the prestructural or iconic level, responses do not employ
elements of the task and are likely to involve idiosyncratic reasoning; (b) at the
unistructural level, responses employ single elements and are likely not to be
aware of contradictory information; (c) at the multistructural level, responses use
multiple elements, usually in sequence, sometimes recognizing but not being able
to resolve conflicting information; and (d) at the relational level, responses integrate
multiple elements of the task to achieve closure, resolving any conflict
encountered. As well as being informed by this structure, the appropriateness of
the elements and their combination was important in coding. Although at times a
response could not be deemed “correct” or “incorrect” given the open-ended
nature of a question, it could be said that it was more or less statistically appropriate
given the nature of the task presented. The criteria for appropriateness are
described in detail in Appendix A.
The coding schemes for the tasks were developed and detailed in previous
studies (Kelly & Watson, 2002; Skalicky, 2005; Watson & Kelly, 2003c, 2004a,
2005, 2006). Codes for the Lollies Task were revised from those of Kelly and
Watson (2002) by the first and third authors according to criteria noted in the previous
paragraph, and applied by the two authors independently to the response
set. Any discrepancies were discussed and resolved in a fashion consistent with
the suggestions of Miles and Huberman (1994, p. 61). Appendix A contains the
coding schemes and the frequency with which each code was observed in the
overall sample for the 11 items defined based on the protocols. Table 1 outlines
the components of the protocols presented, the labels applied, the criteria for coding,
and the range of coding values possible.
Secondary Analysis
Coded data were analyzed using the Quest computer program (Adams & Khoo,
1996) employing the Partial Credit Model (PCM; Masters, 1982), with the aim of
identifying developmental pathways (Bond & Fox, 2001, Chapter 7). The relatively
small sample size (73 students) could lead to somewhat greater measurement
errors than seen with large-scale survey data, but the use of interview data
provided opportunity for more accurate coding according to the underlying developmental
model employed, and this, it was felt, would provide information on
likely progression that would outweigh the disadvantage of the small sample size.
The PCM (Masters, 1982) is one of the family of Rasch measurement models.
It makes use of the interaction between persons (in this case the interview subjects)
and items (the 11 tasks) to determine the relative positions of all persons
and all items on the same measurement scale. The unit of measure is the logit, the
natural logarithm of the odds of success (Wright & Masters, 1982). The PCM has
been used with interview data from 58 students in relation to Piagetian tasks
(Bond & Bunting, 1995; Bond & Fox, 2001), and is regarded as a useful model
TABLE 1
Coding Criteria for Each of the Tasks
Protocol (parts) | Label | Criteria for coding | Code range
Lollies (Parts 1–4) | LDN | Expectation and variation shown in discussion | 0–4
Lollies (Part 5) | LGR | Expectation and variation shown in graph | 0–4
Weather (Parts 1a, 1b, 1d) | WDN | Expectation and variation shown in discussion | 0–4
Weather (Parts 1c, 1e, 1f, 1g) | WDT | Consistency of variation shown in suggested temperatures | 0–3
Weather (Parts 1g, 2) | WGR | Expectation and variation shown in graph produced and graph interpretation | 0–4
Comparing Groups | CGX | Expectation in deciding differences between groups | 0–5
Comparing Groups | CGV | Variation in deciding differences between groups | 0–4
Spinners | SPN | Expectation in explaining outcomes | 0–4
Population/Sample Means | PSX | Expectation in suggesting means for two samples | 0–3
Population/Sample Means | PSV | Variation in suggesting means for two samples | 0–4
Definition of Variation | VDF | Appreciation of variation | 0–4
because it allows for a different number of coding steps for each item. In this
study the 11 tasks were coded from 0 to 3, from 0 to 4, or from 0 to 5, determined
through the initial analysis.
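To show how the Partial Credit Model accommodates items with different numbers of coding steps, the following minimal sketch computes category probabilities from the standard PCM formula (Masters, 1982). It is not the Quest implementation. The function name and the step difficulties in the example are hypothetical; they are not estimates from this study.

```python
from math import exp

def pcm_probabilities(theta, thresholds):
    """Category probabilities under the Partial Credit Model.

    theta: person ability in logits.
    thresholds: step difficulties delta_1..delta_m in logits.
    Returns [P(X=0), ..., P(X=m)], so an item with m steps has m + 1 codes.
    """
    # Cumulative sums of (theta - delta_j); the empty sum for code 0 is 0.
    numerators = [0.0]
    for delta in thresholds:
        numerators.append(numerators[-1] + (theta - delta))
    weights = [exp(value) for value in numerators]
    total = sum(weights)
    return [weight / total for weight in weights]

# Illustrative item coded 0-3, with made-up step difficulties.
probs = pcm_probabilities(theta=0.0, thresholds=[-1.0, 0.0, 1.5])
print([round(p, 3) for p in probs])
```

Because the list of step difficulties can have any length, the same function covers items coded 0 to 3, 0 to 4, or 0 to 5.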
Several statistics, produced by the Quest program, are used to evaluate the fit of
the data to the PCM. The first of these is the Infit Mean Square (IMSQ), a weighted
measure of the extent to which the fit of the items (item IMSQ) or persons (case
IMSQ) deviates from the expected value of 1.00. Acceptable values lie between 0.77
and 1.30 (Adams & Khoo, 1996; Keeves & Alagumalai, 1999). For both items (item
IMSQ = 1.01) and persons (case IMSQ = 1.00) the overall mean values in this study
were acceptable. Individual item fit was also considered. Only two items showed
small misfit: CGV had some indication of random behavior (IMSQ = 1.39) and LDN
behaved more consistently than the model expects (IMSQ = 0.71). Complete item
difficulties, measurement error, and fit values are provided in Appendix B. The
Separation Reliability measures how consistently the items (RI) or persons (RP)
behave. Each value may be interpreted as a reliability statistic with an ideal value of 1. For
both items (RI = 0.90) and persons (RP = 0.90), the figures were high, indicating that
the behaviors of both items and persons were consistent.
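The fit check described above can be sketched in a few lines. The code uses the usual definition of an information-weighted (infit) mean square: the sum of squared residuals divided by the sum of model variances. Whether Quest computes it in exactly this way is an assumption. The helper names are hypothetical; the acceptance range and the fit values are those quoted in the text.

```python
def infit_mean_square(observed, expected, variances):
    """Sum of squared residuals divided by the sum of model variances."""
    squared_residuals = sum((x - e) ** 2 for x, e in zip(observed, expected))
    return squared_residuals / sum(variances)

def is_acceptable(imsq, low=0.77, high=1.30):
    """Apply the acceptance range cited in the text (0.77 to 1.30)."""
    return low <= imsq <= high

# Fit values reported in the text.
reported = {"CGV": 1.39, "LDN": 0.71, "item mean": 1.01, "case mean": 1.00}
for label, value in reported.items():
    print(label, value, "acceptable" if is_acceptable(value) else "misfit")
```

Values above the range point to more noise than the model predicts, and values below it point to responses that are more predictable than the model predicts.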
The Quest program produced a variable map of the behavior of items and
students, which was interpreted by a qualitative analysis of the skills, knowledge,
and understanding required to respond to the particular items that were clustered
close together on the variable. Item clusters were initially identified by inspecting
the variable map and noting places along the variable where there was an apparent
“jump” in difficulty, shown by a gap or discontinuity among the item difficulties.
The items occurring in each cluster were then analyzed to distinguish
common cognitive demands, based on the item coding from the initial analysis.
Finally, a short descriptor of each cluster was synthesized. Discussion and agreement
among the authors determined the placement of lines on the variable map to
indicate different levels of cognitive demand along the variable. This procedure is
the same as that described in other studies (Callingham & Watson, 2004, 2005). It
should be noted that the lines between the levels are not considered as “hard”
boundaries. Error of measurement means that it is not possible to draw firm divisions
between items and that there is potential overlap among items occurring at
the margins. Rather, the levels are a convenient device for describing consistent
behaviors across a range of tasks at different points along a continuum, and thus
provide useful information about likely patterns of development among children.
This approach has been used elsewhere to provide a profile of students’ likely
development (Griffin, 1990). The characteristics of the item clusters appearing at
each level are described in the Results section.
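The first step of the procedure above, looking for "jumps" among item difficulties, can be approximated mechanically. The sketch below is only an illustration: the authors placed level boundaries by inspection and discussion, not by a fixed rule. The function name, the gap cutoff, and the difficulty values are all hypothetical, and the function assumes a non-empty list.

```python
def split_at_gaps(thresholds, min_gap):
    """Group (label, difficulty) pairs into clusters.

    A new cluster starts wherever consecutive difficulties, sorted in
    ascending order, differ by at least min_gap logits.
    """
    ordered = sorted(thresholds, key=lambda pair: pair[1])
    clusters = [[ordered[0]]]
    for previous, current in zip(ordered, ordered[1:]):
        if current[1] - previous[1] >= min_gap:
            clusters.append([])
        clusters[-1].append(current)
    return clusters

# Made-up difficulties, NOT the values from the variable map in Figure 1.
example = [("A.1", -3.2), ("B.1", -3.0), ("A.2", -1.4), ("C.1", -1.1), ("B.2", 0.6)]
for level, cluster in enumerate(split_at_gaps(example, min_gap=1.0), start=1):
    print(level, [label for label, _ in cluster])
```

A rule of this kind can only propose candidate boundaries; the qualitative analysis of cognitive demands within each cluster still has to confirm them.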
To illustrate the typical performance of students at the various levels, kidmaps
(Adams & Khoo, 1996) are presented. These show the most likely placement of individuals
with respect to the items and how they performed in terms of what would be
expected from the difficulty of the items. The dotted lines across the maps indicate
one standard error of measurement. In general, for items falling within this range, the
individuals have approximately a 50% chance of success. For categories below this
region, the chances are higher than 50%, whereas above this region the chances are
less than 50%. Anomalies of performance shown in the upper left quadrant are those
where students achieved on an item at a level higher than might have been expected.
Conversely, those items shown in the lower right of the map are those where students
did not achieve what would have been expected. Overall, the kidmaps show the consistency
of students’ responses in relation to the underlying variable.
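The link between position on a kidmap and the chance of success follows from the logit scale. The sketch below takes a simplified view, assuming each response category can be treated as a single Rasch step with one threshold, which is not necessarily how Quest locates categories on a kidmap. The ability and threshold values are hypothetical.

```python
from math import exp

def step_success_probability(ability, threshold):
    """Rasch probability of passing a step; both arguments are in logits."""
    return 1.0 / (1.0 + exp(-(ability - threshold)))

ability = 0.0  # hypothetical student
for threshold in (-1.0, 0.0, 1.0):  # hypothetical category thresholds
    chance = step_success_probability(ability, threshold)
    print(threshold, round(chance, 2))
```

A category located at the student's own ability has a 50% chance of success, categories below it are more likely than not to be achieved, and categories above it are less likely.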
RESULTS
FIGURE 1 Variable map for the conceptual understanding of expectation and variation.
Description of Levels
Level 1, Idiosyncratic. For most of the response categories appearing at
Level 1, iconic reasoning that was not related to issues involving expectation or
variation was likely to be displayed. For the Lollies protocol, students were
likely to explain outcomes (LDN.1) in terms of their favorite numbers, of the
position of lollies in the container, or of the sizes of their hands. Similarly for the
Spinners Task (SPN.1), explanations for observed outcomes from trials were
likely to be based on egocentric or anthropomorphic beliefs, for example, suggesting
“nine” black outcomes “because I’m turning nine this year.” For the
Weather protocol, explanations (WDN.1) were likely to be inconsistent across
parts, perhaps suggesting alternatives to the maximum or noting a single aspect
of the weather context but also focusing on personal experiences of cold weather
and choosing what clothes to wear. Graphs for the lowest response category for
the Weather protocol (WGR.1) consisted only of informal axes with no data or of
pictures of sun and rain; as well, there was an inability to interpret other graphs
shown in the protocol except for occasionally noting single values in one of
them. Figure 2 shows two examples of representations for the Weather Task
(WGR.1) at Level 1. For the initial response category for comparing graphs of
two data sets (CGX.1), single features were likely to be used to distinguish the
better set, for example, noticing placement along the number line for the Blue
and Red classes or the existence of a “7” for the Brown class (see Appendix A).
Responses in the lowest six response categories shown on the variable map showed an
appreciation of what the tasks were about but went little further with explanation
or representation.
The kidmap in Figure 3 shows the performance of a Prep student, S1, with an
ability estimate of –3.35 logits (cf. Figure 1), placing the student in the middle
range of Level 1. For the four response categories in the range where the odds of
success are 50–50, the student was successful on 3 of 4 items, reflecting idiosyncratic
explanations of variation and inconsistent suggestions of temperature data,
shown by the following excerpts from the interview.
The student did not achieve at the higher levels on any task attempted, as would be
expected from the student’s placement on the scale (as noted earlier, the seven Prep
students only responded to tasks LDN, LGR, WDN, WDT, and WGR). As shown in
Figure 2, S1 drew a picture of herself in the sun and explained what she would wear
if it were hot or cold for WGR. She also did not achieve a Code 1 response to LGR,
and her relatively poor performance in graphing tasks may reflect lack of prior
experience. In her responses, S1 typifies the behaviors expected within Level 1.
category are shown in Figure 5. These students were likely to know what the
general shape of a graph should be like but were unable to connect this with the
requirements of the task to show change over the year.
The kidmap in Figure 6 is from a Grade 3 student, S2, with an ability estimate
of –1.54 logits. This student was not asked about definitions of variation but elsewhere
provided Code 1 responses except for Comparing Groups (CGX.2).
Variation was acknowledged in the Spinners Task (SPN), but the explanation
given was anthropomorphic.
S2 (SPN): [Agree with Jeff 50–50 chance?] No. Because they might not land on
the same place because they don’t know whereabouts they are going
to land.
In the Comparing Groups Task coded for expectation (CGX), the response
focused on the class total only, using a stepwise calculation.
Graphing tasks produced the representation shown on the left of Figure 5 (LGR)
and the following limited response to WGR.
S2 (WGR): Writes Jan, Feb, Mar. Produces a list for Jan: 18, 9, 10, 22, 13, 18.
The student did not achieve a Code 2 on the Weather graphing task, but this result
is not unexpected, as the item falls within the range where there is a 50% chance
of success. This student was at the top of Level 2, possibly in transition to more
sophisticated thinking, in keeping with the idea that the boundaries identified are
not “hard” barriers.
FIGURE 7 Representations at Level 3 for Lollies task (LGR.2) and Weather Task (WGR.3).
S3 (CGX, CGV): [Yellow or Brown class better?] Yellow [How did you
decide?] Because there’s a whole lot of 5s (Yellow).
There’s two 6s and two 4s. And this one (Brown) has
only got 10 [points to the 3 and the 7] … [Pink or Black
class better?] Pink, because more of the bars are higher.
S3 (VDF): [Have you heard “the winds are variable”?] Yes. [What does it
mean?] It is changing or something.
The anomalous outcomes for S3 were for the items on populations and samples
(PSX and PSV) where Code 1 responses were not achieved. As noted earlier,
these items were generally more difficult for students, but S3’s responses may
also reflect a lack of opportunity to learn about these ideas in any formal sense in
the primary years of schooling, as shown in the following extract.
S3 (PSX, PSV): [Next 9 children, average weight?] 189. [How did you work
it out?] Because I was trying to use the 30 and 39. [The
whole sample of 10 children?] [Pause] I don’t know. [Do
you want to use a calculator?] Yes, writes down 4287.
[How?] First I started with 189 plus 39 …
Level 4, Consistent. Two of the tasks had their highest codes appearing at
Level 4: WDT and VDF. Students were likely to recognize the need for consistency
in suggesting ranges in relation to data values (WDT.3) and, when specifically
asked for explanations of words associated with variation, generally
provided satisfactory responses for all terms (VDF.4). They also usually compared
graphs of data sets of the same size successfully (CGX.3). In explaining
aspects of variation for drawing lollies from a container (LDN.3), responses
tended to refer to “more” or “half” with some appreciation of center but without a
strong appreciation of proportion across related tasks. Similarly explanations of
variation in temperatures (WDN.3) focused on comparisons between sites, alternatives
to a maximum, or multiple aspects of weather events without focusing
explicitly on center or distribution. For graphing of repeated outcomes from
drawing lollies from a container (LGR.3), responses were likely to be a time-
series type focusing realistically on the center or to be a frequency type without
specific reference to an appropriate center. Two graphs for the Lollies Task representative
of this level are shown in Figure 9.
Among the most consistent performances at Level 4 was that of the Grade 7
student, S4, with an ability estimate of 0.86 logits, who had no anomalous results.
The kidmap is shown in Figure 10. The student focused on center in predicting
lollies outcomes but also acknowledged variation (LDN.3).
S4 (LDN): [1(a) How many red in 10?] 5 [Why?] Because there are 50 in
there and the other two, 20, 30, that equals another 50, and that’s
100 and that’s the majority of them, so you might get 5. [2(a) Six
expected outcomes] 5, 4, 6, 5, 7, 6 (centered values, reasonable
The student was consistent in giving temperature values (WDT.3) but did not achieve
a Code 3 response for explaining the average temperature in Hobart (WDN).
S4 (WDN): [1(a) What does 17°C mean?] It is cold. [Anything else?] It is not
really a hot place, it is a more of a lower temperature place to live
in. [1(b) All days 17°C?] No. [Why?] Well on summer days …
this year we had a couple up to 30… and in the winter it has been
cold like 7 … and 12. [1(c) Suggested temperatures] 23, 31, 13,
19, 29, 27 [1(d) Explain choice of 6 temperatures] An average day
… not a warm day, just an in-between day, bit of a cold day.
On the Comparing Groups Task (CGV), S4 achieved a Code 2 response, but did
not reach the higher level Code 3 that fell within the 50%-chance zone.
S4 (CGV): [Yellow or Brown class better?] Exactly the same (added scores).
[Pink or Black class better?] Pink. They had more 6s, had more
5s, more 4s, more 3s.
ideas such as range. Five tasks, those for the Lollies, Weather, and Spinners protocols,
had their highest codes at Level 5. For the Lollies protocol (LDN.4), the
explanation was likely to include mention of shape (although the term distribution
was seldom used) with focus on proportional reasoning. Similarly for the Weather
protocol (WDN.4), responses generally focused explicitly on variation away from
the average maximum temperature. In the graphing task for the Lollies protocol
(LGR.4), graphs showing the appropriate shape of the relevant distribution were
likely to be drawn, although often with too much variation. Two examples are
shown in Figure 11. For the Weather protocol (WGR.4), graphs showed the
appropriate shape and variation throughout the year and the other graphs were
described correctly in terms of meaning and variation shown. Two graphs are
shown in Figure 12.
For the Spinners Task two codes appeared at this level, Code 3 and Code 4.
Students appeared to learn from the trialing of the spinners after initially suggesting
50–50 outcomes, and some students were able to use this prompt to reach the
higher level of response (SPN.4). For others, however, responses were unlikely
to be quantitatively appropriate (SPN.3). In comparing graphs of two data sets
(CGX), responses were likely to be successful in determining the better
groups when the groups were of unequal size, using a single feature of the graphs,
either the mean or the visual proportional aspect but not both. Generally at Level 5,
variation was appreciated in various contexts and this was stated appropriately in
relation to proportions (e.g., Lollies and Spinners protocols) and to averages (e.g.,
Weather protocol) and to an inference with a single measure (e.g., Comparing
Groups protocol). The intuitive dilemma of reconciling expectation and variation
was likely to be resolved for tasks based in straightforward situations.
At Level 5, a Grade 7 student, S5, with an ability estimate of 1.95 logits,
reached the highest response category for four of the tasks (LDN.4, LGR.4,
CGV.4, VDF.4), but performed unexpectedly poorly on suggesting consistent
temperatures (WDT). This discrepancy is difficult to explain but may relate to a
lack of interest in, or familiarity with, the weather topic. The kidmap for this student’s
responses is shown in Figure 13. The response to the Lollies Task (LDN) showed
the student’s understanding of expectation and variation.
S5 (LDN): [1(a) How many reds in 10?] Probably about 5. [Why?] … If you
choose 10, we have … half of the 100 is red. So I expect if you
pulled out 10, half of that many … would be red. [2(a) Six
FIGURE 13 Kidmap for student S5 at Level 5.
The response to the Weather Task (WDT), however, showed little appreciation of
the context, and did not take account of the information provided about the average
temperature.
S5 (WDT): [1(c), 1(d) Suggested temperatures] 31, 29, 27, 30, 26, 25, to
give a wide range of the possibilities because quite often you
have a very cold day but then you have very hot days and so the
rest are just spread out through the middle to show that they are
all different and you can get different temperatures. [1(e)
Highest and lowest maximums for the whole year] 32 and 25.
[1(f) Highest and lowest maximums for January] 17 and 10.
[1(g) Highest and lowest maximums for July] 23 and 20.
S5 (CGX, CGV): [Yellow or Brown class better?] I would have a look at the
scores. We have got 1, 2, 3, 4, 5 (Yellow). These guys
(Brown) have got less people getting the average but have
got more variety. Although they (Brown) have got less than
the Yellow class, the lowest person got less than the Yellow
class but they have also got a higher rate (Brown). So I
reckon they are about equal but … if I just look at it like
that I reckon they are around about equal but I would have
to see to be exact. … These people (Brown) got 3 too, so I
would just take that part out (Yellow) and then say these
people (Brown) got exact from there (Yellow) so I would
count this and this and these people (left, right, middle,
Brown) and those two (Yellow 5s) which is 10 so they got
equal. [Pink or Black class better?] I would have a look at
the highest and lowest scores. So the highest being 9
(Pink), the highest being 9 (Black). Four people in the second
highest which was 8 (Pink), four people here (Black).
Their third highest (Black) six people getting 7, same there
(Pink). Their lowest got 2 and 2 (Pink/Black), and then this
Level 6, Comparative distributional. The four items at this level are based
on two protocols and represent the highest codes for the tasks, each of which
required comparing and contrasting two data sets (two graphs of data or two
samples). For Comparing Groups, the two items required both visual comparison
and the use of means in comparing two groups (CGX), as well as an integrated
comparison of global features of variation for the two groups (CGV). The other
two items, for Population/Sample Means, required an understanding of the
sample mean as a representation of the population mean in the two sample sizes
(PSX), as well as the explanation of multiple aspects of potential variation
associated with the values in the two samples (PSV).
Only one student performed at Level 6. This student, S6, with an ability
estimate of 4.66 logits, was unsuccessful only at the highest response category
for the explanation of variation in the weather (WDN.4). This Grade 7 student’s kidmap is
shown in Figure 14. Of particular interest are responses to the Population/Sample
Task, where the student recognized and reconciled the expected value (average)
with likely variation.
S6 (PSX, PSV): [The next 9 children, average weight?] Umm, about, around
29, 30. [Why?] That’s just an average, it could be anything
but because the first one’s 39, I wouldn’t expect it to be more
than 30 but the average weight would be well over that one
because it is just a small sample. Yes, like they could be a lot
lighter, so could be, yes, could be anything but it would be
around there as an average. [Sample of 10?] Around 31, 32.
[Why?] Because the 39 you always know—you first know
that one’s a bit heavier than the average already, so if they’re
on average he would probably be about 31 all up … just a bit
heavier.
Similarly, on Comparing Groups (CGX, CGV) the student was able to take a
global perspective, using all the information.
S6 (CGX, CGV): [Yellow or Brown class better?] They were about even
because they (Yellow) had more on 5 and they (Yellow)
didn’t have any 3s, but they (Brown) had a 7 and they
(Yellow) didn’t have a 7, so it is pretty much exactly even.
[Pink or Black class better?] The black class scored a bit
FIGURE 14 Kidmap for student S6 at Level 6.
better because they got more higher but there’s many more
students in this (Pink) so on an average they (Pink) would
be about 5 or 6. On an average they (Black) would be about
a 6 so they would be around even. [How would you find
out?] Average all the scores up to get it to an average score.
Add all the scores together and divide it by how many.
DISCUSSION
Following the work of Bond and Bunting (1995) using Piaget’s pendulum task,
this study adds to the evidence supporting the use of Rasch analysis in
developmental research.
The use of Rasch modeling in the Bond and Bunting (1995) research showed
the value that the concept of order has within a framework of
unidimensionality. Interpretations of item and item step order as well as
person order are clearly central in developmental and educational research,
with clear implications for measuring physical skill acquisition and medical
rehabilitation as well.
Hand in hand with a clear concept of the variable under examination is the
Rasch concept of unidimensionality. Although this might seem a little esoteric
to some, the point is an important one in the application to Rasch measurement,
especially in novel settings. (Bond & Fox, 2001, p. 103)
TABLE 2
Levels of Performance Across Grades
Level      Students per grade (five grade columns)      Total
Level 1      4      2      0      0      0                 6
Level 2      3      3      1      0      0                 7
Level 3      0     10     11      2      1                24
Level 4      0      3      6      9     10                28
Level 5      0      0      0      3      4                 7
Level 6      0      0      0      1      0                 1
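As a quick consistency check on Table 2, the following Python sketch re-enters the counts as printed and confirms that each row total equals the sum of its grade columns and that the totals add to the 73 students interviewed. The grade labels for the five columns are not used, so the columns are indexed by position only; the names `table2` and `check_rows` are illustrative.

```python
# Counts from Table 2: five grade columns (by position) and the printed row total.
table2 = {
    "Level 1": ([4, 2, 0, 0, 0], 6),
    "Level 2": ([3, 3, 1, 0, 0], 7),
    "Level 3": ([0, 10, 11, 2, 1], 24),
    "Level 4": ([0, 3, 6, 9, 10], 28),
    "Level 5": ([0, 0, 0, 3, 4], 7),
    "Level 6": ([0, 0, 0, 1, 0], 1),
}

def check_rows(table):
    """Return the levels whose grade counts do not sum to the printed total."""
    return [level for level, (counts, total) in table.items()
            if sum(counts) != total]

# Overall number of students and number of students in each grade column.
grand_total = sum(total for _, total in table2.values())
column_totals = [sum(col) for col in zip(*(counts for counts, _ in table2.values()))]

print("rows with inconsistent totals:", check_rows(table2))
print("students in total:", grand_total)
print("students per grade column:", column_totals)
```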
There are three factors that can influence the error estimates and, hence,
confidence in the outcomes of a partial credit analysis (Bond & Fox, 2001, p. 100).
These factors are (a) a small sample size, (b) the item difficulties being off target
for the population and showing a ceiling or floor effect, and (c) a large number of
response codes per item that can lead to poor discrimination between categories.
Although the sample size and number of items in this study were not large, the
overall fit for the 11 items was good (see Appendix B), with only some indication
of random behavior for CGV and of unexpectedly consistent performance for
LEX. Errors of measurement (shown in Appendix B) were not unduly large,
despite the small sample size, and were in keeping with the size of the errors in the
Bond and Bunting study. The variable map (Figure 1) shows a reasonable
distribution of items and students. There is a slight “tail” in the students’ distribution,
but there are items at low levels that did allow these students to demonstrate what
they were able to do. Similarly, at the other end of the scale, there is a cluster of
items that demanded high levels of response. Item and case (person) separation
reliabilities were high, indicating that the tasks provided a wide-ranging variable
and that the students were spread out along it. Overall, based on the three criteria
of Bond and Fox, there can be confidence in the outcomes from the analysis.
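To make the partial credit model (Masters, 1982) behind these estimates more concrete, the sketch below computes the probability of each response code for one item at a given ability. It is an illustration only: it treats the four values printed for WDN in Appendix B as partial credit step difficulties, which is an assumption (the software may report thresholds on a different parameterization), and the function name `pcm_probabilities` is hypothetical.

```python
import math

def pcm_probabilities(theta, steps):
    """Partial credit model: probability of each code 0..m for one item.

    theta: person ability in logits.
    steps: step difficulties delta_1..delta_m in logits (assumed values).
    """
    # Cumulative sums of (theta - delta_j); code 0 has an empty sum of 0.
    cumulative = [0.0]
    for delta in steps:
        cumulative.append(cumulative[-1] + (theta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(cumulative)
    weights = [math.exp(c - top) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]

# Assumed step difficulties: the WDN values as printed in Appendix B.
wdn_steps = [-3.81, -0.12, 0.64, 2.02]
probs = pcm_probabilities(4.66, wdn_steps)  # 4.66 logits is S6's ability estimate
for code, p in enumerate(probs):
    print(f"WDN code {code}: {p:.3f}")
```

Under these assumptions, Code 4 is the most probable response for a student at 4.66 logits.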
Variable Interpretation
The decisions on the placement of boundaries in the variable map were based on
qualitative interpretations of the demands of the tasks and the jumps in difficulty
between adjacent item clusters. This approach was one way of segmenting the
continuum to provide a usable description for teachers that was not unduly detailed, but
inevitably it meant that there were some compromises. Generally, for each
individual item the increasing codes appeared in different identified levels of the variable,
the exceptions being PSV.2 and PSV.3, and SPN.3 and SPN.4, which appeared in
Level 5, and VDF.2 and VDF.3, appearing in Level 3. The appearance of both codes
within the same level suggests that there is a relatively small jump in understanding
between the two codes. In particular, PSV.2 focused on “balancing” alone, whereas
PSV.3 recognized a wider range of sources of variation in the task. It is likely that,
once students begin to balance possible outcomes, they are demonstrating early
understandings of the links between expectation and variation, providing a basis for
teacher intervention. In the Spinners Task, given that the spinners presented were
50–50 spinners, it may be that it was relatively easy for students to achieve a
theoretical solution (Code 4). From the responses to VDF, it would seem that being able
lines around the terms at Level 1. Students then appear to take on a primitive
appreciation of the ideas at Level 2 with some single descriptors but no
interaction of ideas. At Level 3, “more” becomes a proportional focus. “Anything can
happen,” however, is an explanation in chance settings for outcomes that vary
from those expected as “more.” This explanation is also used in data settings (e.g.,
with temperatures). There is little acknowledgement, however, of links between
the ideas. Gradually students then take on the idea of expectation as center or
trend, depending on the setting, and a more sophisticated idea of variation as
“small change” rather than “any change.” This beginning of appreciation of
interaction between the two ideas is indicated at Level 4 with a broken arrow. At Level
5 students begin to resolve the dilemma of reconciling expectation and variation
in a single setting by recognizing their connections with each other, indicated
with a solid arrow, and can question surprising outcomes of one with respect to
the other. Finally at Level 6, they can do this when asked to compare and contrast
data sets, indicated by crossing arrows.
It is of interest that this model has similarities to the stages of development
proposed by Piaget and Inhelder (1975). Whereas the earlier work focused on
experiments with different random generators, the study reported here included
more “real world” examples, which allowed for the social context of the questions
to be considered. At Level 6, students could reconcile opposing ideas regardless
of context, whereas at lower levels familiarity with the context in which the items
were based appeared to play a part. The inclusion of context is in keeping with the
thrust towards statistical literacy and reinforces the interaction of context with the
mathematical ideas underlying statistical literacy that has been described in
earlier studies based on survey data (Callingham & Watson, 2005; Watson &
Callingham, 2003; Watson et al., 2003).
Educational Implications
It may be deemed impractical in a classroom to carry out interviews with
individual students to uncover the details of understanding, as this study has
done based on the protocols in Appendix A. Some of the questions, however, can
be used to structure discussion and activities in the classroom that will elicit
students’ beliefs and understandings. The protocols can be adapted for group
work and report-writing for assessment purposes. If, as suggested by the analyses
in this paper, the tasks present a viable way of deciphering levels of student
understanding, then their use in the classroom to assist students to reach higher
levels of performance should be encouraged. Watson and Shaughnessy (2004)
made practical suggestions for the use of the Lollies and Comparing Groups
protocols that incorporate the importance of proportional reasoning in relation to the
expectation aspects of the tasks.
ACKNOWLEDGMENTS
This research was funded by the Australian Research Council, Grant numbers
A00000716 and DP0208607. The authors thank the referees for helpful
suggestions in revising this article.
REFERENCES
Adams, R. J., & Khoo, S. T. (1996). Quest: Interactive item analysis system. Version 2.1 [Computer
software]. Melbourne: Australian Council for Educational Research.
Australian Education Council. (1991). A national statement on mathematics for Australian schools.
Carlton, Vic: Author.
Biggs, J. B., & Collis, K. F. (1982). Evaluating the quality of learning: The SOLO taxonomy. New York:
Academic Press.
Biggs, J. B., & Collis, K. F. (1991). Multimodal learning and the quality of intelligent behaviour. In
H. A. H. Rowe (Ed.), Intelligence: Reconceptualization and measurement (pp. 57–76). Hillsdale,
NJ: Lawrence Erlbaum Associates, Inc.
Bond, T. G., & Bunting, E. M. (1995). Piaget and measurement III: Reassessing the méthode clinique.
Archives de Psychologie, 63, 231–255.
Bond, T. G., & Fox, C. M. (2001). Applying the Rasch Model: Fundamental measurement in the
human sciences. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Callingham, R. A., & Watson, J. M. (2004). A developmental scale of mental computation with part-
whole numbers. Mathematics Education Research Journal, 16(2), 69–86.
Callingham, R., & Watson, J. M. (2005). Measuring statistical literacy. Journal of Applied Measurement,
6(1), 19–47.
Capel, A. D. (1885). Catch questions in arithmetic & mensuration and how to solve them. London:
Joseph Hughes.
Fischbein, E. (1975). The intuitive sources of probabilistic thinking in children. Dordrecht: D. Reidel.
Fischbein, E., & Gazit, A. (1984). Does the teaching of probability improve probabilistic intuitions?
An exploratory research study. Educational Studies in Mathematics, 15, 1–24.
Fischbein, E., Nello, M. S., & Marino, M. S. (1991). Factors affecting probability judgements in
children and adolescents. Educational Studies in Mathematics, 22, 523–549.
Fischbein, E., & Schnarch, D. (1997). The evolution with age of probabilistic, intuitively based
misconceptions. Journal for Research in Mathematics Education, 28, 96–105.
Gal, I., Rothschild, K., & Wagner, D. A. (1989, April). Which group is better?: The development of
statistical reasoning in elementary school children. Paper presented at the meeting of the Society
for Research in Child Development, Kansas City, MO.
Green, D. R. (1983). A survey of probability concepts in 3000 pupils aged 11–16 years. In D. R. Grey,
P. Holmes, V. Barnett, & G. M. Constable (Eds.), Proceedings of the First International Conference
on Teaching Statistics (Vol. 2, pp. 766–783). Sheffield, England: Teaching Statistics Trust.
Green, D. R. (1986). Children’s understanding of randomness: Report of a survey of 1600 children aged
7–11 years. In R. Davidson & J. Swift (Eds.), Proceedings of the Second International Conference on
Teaching Statistics (pp. 287–291). Victoria, BC: The Organizing Committee, ICOTS2.
Green, D. (1991). A longitudinal study of pupils’ probability concepts. In D. Vere-Jones (Ed.),
Proceedings of the Third International Conference on Teaching Statistics: Vol. 1. School and
general issues (pp. 320–328). Voorburg, The Netherlands: International Statistical Institute.
Green, D. (1993). Data analysis: What research do we need? In L. Pereira-Mendoza (Ed.), Introducing
data analysis in the schools: Who should teach it? (pp. 219–239). Voorburg, The Netherlands:
International Statistical Institute.
Griffin, P. (1990). Profiling literacy development: Monitoring the accumulation of reading skills.
Australian Journal of Education, 34, 290–311.
Hart, W. L. (1953). College algebra (4th ed.). Boston: D. C. Heath.
Inhelder, B., & Piaget, J. (1958). The growth of logical thinking: From childhood to adolescence.
(A. Parsons & S. Milgram, Trans.). New York: Basic Books.
James, G., & James, R. C. (Eds.). (1959). Mathematics dictionary. Princeton, NJ: D. Van Nostrand
Company, Inc.
Jones, G. A. (1974). The performances of first, second and third grade children on five concepts of
probability and the effects of grade, I.Q. and embodiments on their performances. Unpublished
doctoral thesis. Bloomington: Indiana University.
Kahneman, D., & Tversky, A. (1972). Subjective probability: A judgement of representativeness.
Cognitive Psychology, 3, 430–454.
Keeves, J. P., & Alagumalai, S. (1999). New approaches to measurement. In G. N. Masters &
J. P. Keeves (Eds.), Advances in measurement in educational research and assessment (pp. 23–42).
Oxford: Pergamon.
Kelly, B. A., & Watson, J. M. (2002). Variation in a chance sampling setting: The lollies task. In
B. Barton, K. C. Irwin, M. Pfannkuch, & M. O. J. Thomas (Eds.), Mathematics education in the
South Pacific (Proceedings of the 26th annual conference of the Mathematics Education Research
Group of Australasia, Auckland, Vol. 2, pp. 366–373). Sydney, NSW: MERGA.
Konold, C., & Pollatsek, A. (2002). Data analysis as the search for signals in noisy processes. Journal
for Research in Mathematics Education, 33, 259–289.
Lee, C. (Ed.). (2003). Reasoning about variability: Proceedings of the Third International Research
Forum on Statistical Reasoning, Thinking, and Literacy [CD-ROM]. Mt. Pleasant, MI: Central
Michigan University.
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149–174.
Miles, M. B., & Huberman, A. M. (1994). Qualitative data analysis: An expanded sourcebook (2nd
ed.). Thousand Oaks, CA: Sage.
Mokros, J., & Russell, S. J. (1995). Children’s concepts of average and representativeness. Journal for
Research in Mathematics Education, 26, 20–39.
Moore, D. S. (1990). Uncertainty. In L. S. Steen (Ed.), On the shoulders of giants: New approaches to
numeracy (pp. 95–137). Washington, DC: National Academy Press.
National Council of Teachers of Mathematics. (1989). Curriculum and evaluation standards for
school mathematics. Reston, VA: Author.
National Council of Teachers of Mathematics. (2000). Principles and standards for school
mathematics. Reston, VA: Author.
Petrosino, A. J., Lehrer, R., & Schauble, L. (2003). Structuring error and experimental variation as
distribution in the fourth grade. Mathematical Thinking and Learning, 5(2&3), 131–156.
Piaget, J., & Inhelder, B. (1975). The origin of the idea of chance in children. (L. Leake Jr., P. Burrell,
& H. D. Fishbein, Trans.). New York: W.W. Norton and Company. (Original work published 1951)
Rasch, G. (1980). Probabilistic models for some intelligence and attainment tests. Chicago:
University of Chicago Press. (Original work published 1960)
Reading, C., & Shaughnessy, M. (2000). Student perceptions of variation in a sampling situation. In
T. Nakahara & M. Koyama (Eds.), Proceedings of the 24th annual conference of the International
Group for the Psychology of Mathematics Education (Vol. 4, pp. 89–96). Hiroshima, Japan:
Hiroshima University.
Reading, C., & Shaughnessy, M. (2004). Reasoning about variation. In J. Garfield & D. Ben-Zvi (Eds.),
The challenge of developing statistical literacy, reasoning, and thinking (pp. 201–226). Dordrecht:
Kluwer.
Rubin, A., Bruce, B., & Tenney, Y. (1991). Learning about sampling: Trouble at the core of statistics.
In D. Vere-Jones (Ed.), Proceedings of the Third International Conference on Teaching Statistics:
Vol. 1. School and general issues (pp. 314–319). Voorburg, The Netherlands: International
Statistical Institute.
Shaughnessy, J. M., Canada, D., & Ciancetta, M. (2003). Middle school students’ thinking about
variability in repeated trials: A cross-task comparison. In N. A. Pateman, B. J. Dougherty, & J. T. Zilliox
(Eds.), Proceedings of the 27th conference of the International Group for the Psychology of
Mathematics Education held jointly with the 25th conference of PME-NA (Vol. 4, pp. 159–165).
Honolulu, HI: Center for Research and Development Group, University of Hawaii.
Watson, J. M., & Moritz, J. B. (2003). Fairness of dice: A longitudinal study of students’ beliefs and
strategies for making judgments. Journal for Research in Mathematics Education, 34, 270–304.
Watson, J. M., & Shaughnessy, J. M. (2004). Proportional reasoning: Lessons from research in data
and chance. Mathematics Teaching in the Middle School, 10, 104–109.
Wild, C. J., & Pfannkuch, M. (1999). Statistical thinking in empirical enquiry. International Statistical
Review, 67, 223–265.
Wright, B. D., & Masters, G. N. (1982). Rating scale analysis: Rasch measurement. Chicago: MESA
Press.
Zawojewski, J. S., & Shaughnessy, J. M. (2000). Data and chance. In E. A. Silver & P. A. Kenney
(Eds.), Results from the Seventh Mathematics Assessment of the National Assessment of
Educational Progress (pp. 235–268). Reston, VA: National Council of Teachers of Mathematics.
1. Suppose you have a container with 100 lollies in it. 50 are red, 20 are yellow, and 30 are
green. The lollies are all mixed up in the container. You pull out 10 lollies.
(a) How many reds do you expect to get?
(b) Suppose you did this several times. Do you think this many would come out every
time? Why do you think this?
(c) How many reds would surprise you? Why do you think this?
2. Suppose six of you do this experiment.
What do you think is likely to occur for the numbers of red lollies that are written down?
______, ______, ______, ______, ______, ______ Why do you think this?
3. Look at these possibilities that some students have written down for the numbers they
thought likely.
(a) 5,9,7,6,8,7 (b) 3,7,5,8,5,4 (c) 5,5,5,5,5,5 (d) 2,3,4,3,4,4
(e) 7,7,7,7,7,7 (f) 3,0,9,2,8,5 (g) 10,10,10,10,10,10
Which one of these lists do you think best describes what might happen? Why do you think
this?
4. Suppose that 6 students did the experiment. What do you think the numbers will most likely
go from and to?
From __________ (lowest) to __________ (highest) number of reds. Why do you think this?
Now try it for yourself: ______, ______, ______, ______, ______, ______
Given the results, do you want to change any of your previous answers?
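For reference, the sampling situation in Question 1 (10 lollies drawn without replacement from 100, of which 50 are red) follows a hypergeometric distribution. The sketch below is not part of the interview protocol; it computes the expected number of reds and the chance of each outcome, and the function name `prob_reds` is illustrative.

```python
from math import comb

def prob_reds(k, total=100, red=50, drawn=10):
    """Probability of exactly k reds when drawing without replacement."""
    return comb(red, k) * comb(total - red, drawn - k) / comb(total, drawn)

# Probability of each possible number of reds, 0 through 10.
distribution = [prob_reds(k) for k in range(11)]
expected_reds = sum(k * p for k, p in enumerate(distribution))
# Chance that the number of reds falls between 3 and 7 inclusive.
central = sum(distribution[3:8])

print(f"expected reds: {expected_reds:.2f}")
for k, p in enumerate(distribution):
    print(f"{k:2d} reds: {p:.4f}")
print(f"P(3 to 7 reds): {central:.3f}")
```

Outcomes near 5 are the most likely, while 0 or 10 reds are rare enough to count as surprising.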
Code  Description                                                              N = 73
4     Distributional reasoning                                                  8
      • Strong appreciation of proportion
      • Consistency across questions
3     “More” or “half” red with centered reasoning                              18
      • Intuitive acknowledgment of center
      • Partially consistent over questions
      • No strong appreciation of proportion
2     “More red” but inconsistent reasoning                                     25
      • No explicit mention of proportion
      • An attempt to justify choices – inconsistent over different questions
1     Intuitive iconic reasoning                                                19
      • Favorite numbers
      • Guessing
      • Location of lollies in container/size of hand
      • Outcome approach
0     Limited reasoning                                                         3
      • Minimum of 3 “no” responses (e.g., don’t know, no reason)
      • All other responses iconic
5. Suppose that 40 students pulled out 10 lollies from the container, wrote down the number of
reds, put them back, mixed them up.
(a) Can you show what the number of reds look like in this case? (Use the blank space below)
(b) Now use the graph below to show what the number of reds might look like for the 40
students [axes provided on next page].
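Question 5 can also be explored by simulation. The sketch below is offered as one possible classroom illustration rather than as part of the protocol: it draws 10 lollies for each of 40 simulated students and prints a simple text plot of the number of reds. The seed and the function name `draw_reds` are arbitrary choices.

```python
import random
from collections import Counter

def draw_reds(rng, red=50, yellow=20, green=30, drawn=10):
    """Draw lollies without replacement and count the reds."""
    container = ["red"] * red + ["yellow"] * yellow + ["green"] * green
    return rng.sample(container, drawn).count("red")

rng = random.Random(2024)  # fixed seed so the run is repeatable
results = [draw_reds(rng) for _ in range(40)]
counts = Counter(results)

# One "x" per simulated student, stacked by number of reds (0 to 10).
for reds in range(11):
    print(f"{reds:2d} | {'x' * counts[reds]}")
```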
Code  Description                                                              N = 72
3     Without axes, logical time series graphs that focus on the center.       11
      With axes, data focused around “5”. Some reference to the center and
      variation in discussion.
1. Some students watched the news every night for a year, and recorded the daily maximum
temperature in Hobart. They found that the average maximum temperature in Hobart was 17 °C.
(a) What does this tell us about the temperature in Hobart?
(b) Do you think all the days had a maximum of 17 °C? Why or why not?
(c) (What do you think the maximum temperature in Hobart might be for 6 different days
in the year?)* ______, ______, ______, ______, ______, ______
(d) Why did you make these choices?
*Part (c) is not part of WDN but essential to understanding Part (d).
1. Some students watched the news every night for a year, and recorded the daily maximum
temperature in Hobart. They found that the average maximum temperature in Hobart was 17 °C.
(c) What do you think the maximum temperature in Hobart might be for 6 different days in
the year? ______, ______, ______, ______, ______, ______
(e) For the whole year, what do you think the highest and lowest daily maximum
temperature in Hobart would be? highest maximum _____ lowest maximum ____
(f) For the month of January, what do you think the highest and lowest daily maximum
temperature in Hobart would be? highest maximum _____ lowest maximum ____
(g) For the month of July, what do you think the highest and lowest daily maximum
temperature in Hobart would be? highest maximum _____ lowest maximum ____
1. Some students watched the news every night for a year, and recorded the daily maximum
temperature in Hobart. They found that the average maximum temperature in Hobart
was 17 °C.
2. Here are some ideas from other students. What do you think of them?
(a), (b), (c) [three student graphs, not reproduced]
Two schools are comparing some classes to see which is better at spelling.
a) [Two graphs of class scores; axis label: Number of People]
Now look at the scores of all students in each class, and then decide. Did the two classes score
equally well, or did one of the classes score better? Explain how you decided.
b) [Two graphs of class scores; axis label: Number of People]
Did the two classes score equally well, or did one of the classes score better? Explain how you
decided.
c) [Two graphs of class scores; axis label: Number of People]
Again look at the scores of all students in each class, and then decide. Did the two classes score
equally well, or did one of the classes score better? Explain how you decided.
The two fair spinners shown below are part of a carnival game. A player wins a prize only
when both arrows land on black after each spinner is spun once.
WIN LOSS
GAME 1
GAME 2
GAME 3
GAME 4
GAME 5
GAME 6
GAME 7
GAME 8
GAME 9
GAME 10
TOTAL
d) How does this compare with what you thought in Part (b)?
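Assuming each spinner is half black (the 50–50 spinners described under Variable Interpretation) and that the two spins are independent, the chance of winning a game is the product of the two probabilities. The following sketch computes that value and simulates a record of 10 games like the WIN/LOSS table above; the seed and the function name `play_games` are illustrative.

```python
import random
from fractions import Fraction

# Assumption: each spinner lands on black with probability 1/2.
p_black = Fraction(1, 2)
p_win = p_black * p_black          # independent events multiply
expected_wins_in_10 = 10 * p_win

def play_games(rng, games=10):
    """Return True (win) or False (loss) for each simulated game."""
    record = []
    for _ in range(games):
        first_black = rng.random() < 0.5    # spin the first spinner
        second_black = rng.random() < 0.5   # spin the second spinner
        record.append(first_black and second_black)
    return record

rng = random.Random(7)             # fixed seed for a repeatable run
record = play_games(rng)
print(f"P(win) = {p_win}; expected wins in 10 games = {float(expected_wins_in_10)}")
print(f"simulated wins: {sum(record)}, losses: {len(record) - sum(record)}")
```

Repeating the simulation with different seeds gives totals that vary around the expected value, which is the contrast between prediction and observed trials that Part (d) asks about.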
4 Relational 6
• Appropriate and theoretical reasoning and understanding of
independent events when predicting outcomes (a and b) and when
explaining outcomes observed from trials (d).
3 Multistructural 5
• Intuitive reasoning of independent events expressed in light of the
experimental outcome when explaining outcomes observed in trial (d).
2 Unistructural 37
• A focus on how the spinner is used when predicting outcomes (a and b)
and when explaining outcomes observed from trial (d).
• A focus on chance (50–50) or “anything can happen” when predicting
(a and b) and when explaining outcomes (d).
1 Iconic 16
• Intuitive beliefs when predicting outcomes (a and b), however,
egocentric or anthropomorphic views when explaining outcomes (d).
Let’s say that the average weight for Grade 5 children over the whole of Tasmania is 30 kg. A
researcher randomly chooses a sample of 10 Grade 5 children from the state. The first child
chosen weighs 39 kg.
(a) Now think about just the next 9 children in the sample. What do you think their average
weight will be?
Please explain your answer.
(b) Now think about the whole sample of 10 children together. What do you think their
average weight will be?
Please explain your answer.
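The normative reasoning behind this task can be written out as a short calculation. The first child's weight does not change what is expected of the remaining nine randomly chosen children, so their expected average is the population mean; the expected average of all 10 then combines the known 39 kg with nine expected values. The variable names below are illustrative.

```python
population_mean = 30.0   # kg, average for Grade 5 children given in the task
first_child = 39.0       # kg, the one known value
sample_size = 10

# (a) The other 9 children are still a random sample, so expect the population mean.
expected_next_nine = population_mean

# (b) Combine the known weight with nine expected weights.
expected_sample_mean = (
    first_child + (sample_size - 1) * expected_next_nine
) / sample_size

print(f"(a) expected average of next 9: {expected_next_nine:.1f} kg")
print(f"(b) expected average of all 10: {expected_sample_mean:.1f} kg")
```

S6's answers of about 29 to 30 for the next nine children and about 31 for the sample of 10 are close to these values.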
2 Balancing: Recognition of two sets (size 9 and 10) and the need to        3
compensate for the known value in some fashion.
4 Relational 14
• Appropriate description of “Variation”.
• Appropriate description of “Variable” without the need for a
familiar context (winds).
3 Multistructural 20
• Appropriate description of “Variation”.
• Appropriate description of “Variable” when offered in a
familiar context (winds).
2 Unistructural 7
• Unable to appropriately describe “Variation”.
• Appropriate understanding of “Variable” only when offered in a
familiar context (winds).
1 Prestructural 12
• Unable to appropriately describe “Variation”.
• Inappropriate understanding of “Variable” even when offered in a
familiar context (wind).
Item   Code 1        Code 2        Code 3        Code 4        Code 5        Fit (MNSQ)    Fit (t)
       Est.   Error  Est.   Error  Est.   Error  Est.   Error  Est.   Error
LEX    −4.13  0.88   −0.86  0.51   0.67   0.49   2.04   0.55                 0.71   0.73   −2.02  −1.4
LGR    −2.19  0.63   −0.04  0.47   1.15   0.52   2.17   0.62                 0.86   0.94   −0.82  −0.2
WDN    −3.81  0.81   −0.12  0.47   0.64   0.49   2.02   0.56                 1.01   1.15   0.11   0.74
WDT    −4.19  0.88   −0.28  0.48   1.08   0.5                                1.25   1.31   1.66   1.33
WGR    −3.69  0.94   −1.43  0.63   −0.12  0.53   1.75   0.52                 0.85   0.87   −0.9   −0.56
CGX    −3.09  0.97   −1.27  0.65   1.3    0.61   2.27   0.78   3.86   1.47   1.06   1.08   0.36   0.39
CGV    −1.63  0.63   −0.55  0.54   0.62   0.5    3.21   0.86                 1.39   1.42   2.05   1.72
SPN    −3.25  0.97   −0.75  0.55   1.58   0.61   2.05   0.64                 0.86   0.79   −0.66  −0.77
PSX    −1.31  0.78   0.72   0.64   2.8    0.93                               1.21   1.24   0.98   0.89
PSV    −1.84  0.88   1.6    0.88   1.92   0.95   2.54   1.13                 0.93   0.61   −0.08  −0.85
VDF    −1.72  0.69   −0.28  0.51   0.11   0.52   1.38   0.52                 0.96   1.01   −0.17  0.14
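The fit comments in the Discussion can be checked against the last columns of this table. The sketch below assumes that the final four values in each row are two mean-square fit statistics followed by their two standardized (t) forms, and it flags items whose first t value lies outside ±2, a commonly used rule of thumb. The cut-off and the names `fit`, `T_LIMIT`, and `flag_items` are assumptions of the sketch, not the authors' stated criterion.

```python
# Last four values printed for each item
# (assumed order: two mean squares, then two t values).
fit = {
    "LEX": (0.71, 0.73, -2.02, -1.40),
    "LGR": (0.86, 0.94, -0.82, -0.20),
    "WDN": (1.01, 1.15, 0.11, 0.74),
    "WDT": (1.25, 1.31, 1.66, 1.33),
    "WGR": (0.85, 0.87, -0.90, -0.56),
    "CGX": (1.06, 1.08, 0.36, 0.39),
    "CGV": (1.39, 1.42, 2.05, 1.72),
    "SPN": (0.86, 0.79, -0.66, -0.77),
    "PSX": (1.21, 1.24, 0.98, 0.89),
    "PSV": (0.93, 0.61, -0.08, -0.85),
    "VDF": (0.96, 1.01, -0.17, 0.14),
}

T_LIMIT = 2.0  # rule-of-thumb cut-off (assumption)

def flag_items(fit_table, limit=T_LIMIT):
    """Split flagged items into more erratic (t > limit) and
    more consistent (t < -limit) than the model expects."""
    erratic = [item for item, (_, _, t, _) in fit_table.items() if t > limit]
    consistent = [item for item, (_, _, t, _) in fit_table.items() if t < -limit]
    return erratic, consistent

erratic, consistent = flag_items(fit)
print("more erratic than expected:", erratic)
print("more consistent than expected:", consistent)
```

With the values as printed, only CGV and LEX are flagged, and both lie just beyond the cut-off.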