Receptive Skills



Reading, listening, and viewing comprehension in English as a foreign language: One or more constructs?

Article  in  Intelligence · November 2010


DOI: 10.1016/j.intell.2010.09.003




Intelligence 38 (2010) 562–573


Reading, listening, and viewing comprehension in English as a foreign language: One or more constructs?
Ulrich Schroeders ⁎, Oliver Wilhelm, Nina Bucholtz
Institute for Educational Progress, Humboldt-Universität zu Berlin, Germany

Article info

Article history:
Received 31 March 2010
Received in revised form 3 September 2010
Accepted 6 September 2010

Keywords:
Reading
Listening
Comprehension
Language assessment
Intelligence

Abstract

Receptive foreign language proficiency is usually measured with reading and listening comprehension tasks. A novel approach to assess such proficiencies – viewing comprehension – is based on the presentation of short instructional videos followed by one or more comprehension questions concerning the preceding video stimulus. In order to evaluate a newly developed viewing comprehension test 485 German high school students completed reading, listening, and viewing comprehension tests, all measuring the receptive proficiency in English as a foreign language. Fluid and crystallized intelligence were measured as predictors of performance. Relative to traditional comprehension tasks, the viewing comprehension task has similar psychometric qualities. The three comprehension tests are very highly but not perfectly correlated with each other. Relations with fluid and crystallized intelligence show systematic differences between the three comprehension tasks. The high overlap between foreign language comprehension measures and between crystallized intelligence and language comprehension ability can be taken as support for a uni-dimensional interpretation. Implications for the assessment of language proficiency are discussed.
© 2010 Elsevier Inc. All rights reserved.

⁎ Corresponding author. Institute of Psychology, University of Duisburg-Essen, Berliner Platz 6-8, 45127 Essen, Germany. E-mail address: ulrich.schroeders@uni-due.de (U. Schroeders).

1. Introduction

For decades the question whether language ability is represented through a single factor or through a couple of traits has been discussed intensively (cf. Bachman, 1990; Carroll, 1993; Purpura, 1999). Lado (1961) and Carroll (1961, 1968) proposed models that discriminate between four basic skills (reading, listening, writing, and speaking) and three components of language knowledge (structure, vocabulary, and phonology/graphology) to map native and secondary language proficiency. In contrast to the multi-constructs perspective, Oller (1976, 1979) proposed a model with only one general factor. This unified assumption was based on the concept of a pragmatic expectancy grammar that enables guessing in situations with reduced context. The single factor theory was rebutted in its extreme form of a single general factor accounting for a broad variety of language ability facets (Carroll, 1983), but survived in an alleviated form of the four language skill factors which postulates four first order factors (reading, listening, writing, and speaking) below an overarching second order factor.

In this study about the dimensionality of language skills we focus on receptive skills because of their fundamental role in second language acquisition. In order to be engaged in the – other things being equal – more challenging productive activities such as writing and speaking it is indispensable to comprehend text and speech. Besides the distinction between receptive (reading and listening) and productive (writing and speaking) skills, the four basic language skills can also be classified into oral (listening and speaking) vs. written-text skills (reading and writing; see Table 1). It is important to note that reading and listening abilities both can be assessed through comprehension tasks, but they do rely on different input modalities.

On a more fine-grained level it can be asked whether or not the mode of comprehension tasks is essential in receptive
foreign language ability testing. From a modality unspecific or single skill perspective (Spolsky, 1973; Oller, 1979, 1983) it has been proposed that a single comprehension factor is sufficient in order to explain individual differences in comprehension ability whereas from a modality specific or multiple skill perspective (Lado, 1961; Carroll, 1961, 1968) it has also been suggested that there are several distinguishable latent comprehension abilities. The modality unspecific view is mainly motivated by the concept of a unique mental representation that underlies reading and listening. As a result both reading and listening comprehension (henceforth, abbreviated to RC and LC) tasks supposedly rely on the same mental representation and imply the same cognitive processes. The fact that normally reading and listening ability in first-language acquisition develop simultaneously is often cited as evidence for the common representation approach. In order to support the modality specific view, many different arguments on how reading and listening comprehension differ have been proposed. First, speech is obviously more comprehensive than text and listening comprehension typically demands the integration of both linguistic and non-linguistic knowledge (Buck, 2001). For example, it is easier to understand a quarrel between a mother and her child if the speech reflects their emotions or if accentuation highlights critical information. Second, spoken text varies in accordance with the characteristics of the speaker's voice and speech. Prosodic features such as the speech rate (Carver, 1973), the timbre of voice, and dialects (Adank, Evans, Stuart-Smith, & Scott, 2009) may significantly contribute to required processing effort (for an overview see Samuels, 1987). Therefore, the difficulty of a listening item depends on its specific instantiation whereas the text of a reading item is static and more or less fixed. Third, spoken text enforces linear information encoding at an essentially compulsory pace. On the contrary, reading allows for re-reading a text passage several times looking for specific information. The opportunity to flexibly allocate processing resources to elements of information that are relevant for an individual understanding allows the application of more challenging items. Consistently, it has been shown that significantly more information of a complex text can be recalled in a reading than in a listening condition (Green, 1981; Lund, 1991). Fourth, listening and reading may use different encoding systems to store information in long-term memory. Through this point the discussion about the influence of modality on reading and listening comprehension ability becomes vividly connected to research on multimedia learning. One of the basic assumptions in the cognitive theory of multimedia learning (CTML) is that information is processed via two autonomous routes, one for visual and one for auditory input (Paivio, 1986; Mayer, 2005). Some researchers even assume two different representations, a descriptive representation for written text and mathematical equations and a depictive representation for pictures or physical models (Schnotz, 2005; Schnotz & Bannert, 2003).

Table 1
Taxonomy of global foreign language skills.

Input      Language skill
           Receptive               Productive
Oral       Listening, (viewing)    Speaking
Written    Reading                 Writing

However, research concerning multimedia learning is often limited to the first moment of ability distributions. Apparently, the difficulty of the task depends on the complexity and type of information provided and the modality that is involved. Whether or not the modality of a task affects the bivariate relations between measures is a different research question. For example, LC stimuli might be processed on a different processing route than RC stimuli; nevertheless individual differences in a LC and a RC measure might be perfectly correlated with each other. Such a result would emerge if the different processing routes do not affect the relative standing of any two individuals completing a LC and a RC task.

We want to pick up the controversy about the dimensionality of the comprehension abilities by focusing on individual differences and by implementing a relatively new type of comprehension task. Measuring comprehension ability in a foreign language often requires the participant to either read a text that is sometimes enriched with tables and figures or to listen to a spoken audio sequence. We want to introduce another type of comprehension task, labeled viewing comprehension (VC).¹ In a VC task the stimulus is a short video, for example, about how the city of Boston developed economically in the 20th century. Subsequent to the video presentation, multiple-choice questions are prompted asking for detailed information or conclusions that can be drawn out of the previously presented video. In order to respond correctly, first, auditory and visual information has to be extracted from the video. After gathering pieces of information the next process in the chain is relating the pieces to each other and connecting them with already stored knowledge in a mental model of the video sequence. In principle, this sequence of intertwined processing steps is similar to the processing steps in completing RC and LC tasks. Besides such similarities in information processing, there are obvious differences between VC and the traditional comprehension tasks. For instance, VC is not restricted to one modality (seeing or listening) but rather combines both modalities. The primary objective of this paper is the structural relationship of the constructs RC, LC, and VC.

¹ Alternatively the terms “audio-visual comprehension”, “video comprehension”, “multimedia comprehension” or “watching comprehension” are conceivable. We tried to coin a new succinct term in reference to the already existing labels of reading and listening comprehension that both describe activities rather than sensory input (visual and auditory) or stimulus material (text and audio). It is our understanding that the term “viewing comprehension” encompasses also the processing of auditory material.

1.1. Research questions

The first research question is whether the new VC task is a proper indicator for measuring proficiency in English as a foreign language at all. Including videos might add interesting, but unnecessary information to the test material. Even though there is an increased amount of information this may lead to poorer performance in a comprehension task (Mayer, Heiser, & Lonn, 2001). This paradox is also known as the seductive-details effect. Different explanations have been proposed to
account for the phenomenon: a) distraction, b) cognitive overload (Sanchez & Wiley, 2006), or c) activation of improper schemata (Harp & Mayer, 1998). Therefore, a potential gain in ecological validity – that is given if the material and setting of the testing approximates real-life (Shadish, Cook, & Campbell, 2002) – might be accompanied by a loss of psychometric quality. In order to evaluate the psychometric quality we establish measurement models for all comprehension tasks – reading, listening, and viewing – and compare them in terms of reliability of the measures and model fit.

The second research question addresses the nature of the supposedly new construct VC: is VC substantially different from RC and LC or do all three comprehension measures assess the same general comprehension ability? In a series of confirmatory measurement models we want to test whether the constructs are distinguishable or not. Therefore, we translate the two competing theories mentioned above – the modality unspecific and the modality specific theory – into testable measurement models.

From the modality unspecific perspective a parsimonious single-factor model can be derived (see Fig. 1, panel A). The modality specific theory is represented through a model with three factors, one factor for each comprehension domain (see Fig. 1, panel B). If modality affects the relative position of an individual in a comprehension task, then the correlation between LC and VC should be more pronounced than the correlations with RC because both rely (mainly) on auditory input. Furthermore, the correlation between RC and LC should be lowest because they rely on distinct modalities. This model is a relaxed version of a two-factor model in which – according to the modality involved – a RC factor and a combined LC–VC factor account for the bivariate relations. This two-factor model can be derived from the three factor model by constraining the correlation between LC and VC to unity and the remaining correlations (RC–LC and RC–VC) to equality. However, it is essential to bear in mind that the discussion of similarity between reading, listening, and viewing is not limited to a controversy about sensory input. Sensory input is one prominent source of potential differences between the different types of comprehension tasks, but not the only one. Differences may also lie in the form of representation (e.g., descriptive vs. depictive; Schnotz & Bannert, 2003). For this reason, we decided to set up a model with three correlated comprehension factors.

[Figure 1: path diagrams. In every panel the indicators are the reading items R01–R21, the listening items L01–L21, and the viewing items V01–V31.
(A) Modality-unspecific single-factor model: one Comprehension factor.
(B) Modality-specific model with three correlated comprehension factors: Listening, Reading, Viewing.
(C) Single-factor model with one nested method factor: one Comprehension factor plus a Method factor.]

Fig. 1. Competing measurement models. Note. The last panel shows just an example of all possible nested factor models. Actually, seven different models belong to this category: three with only one nested factor, three with two nested factors, and one with a nested factor for each comprehension domain.
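The count of seven nested factor models given in the note to Fig. 1 follows from enumerating every non-empty subset of the three comprehension domains. The following sketch reproduces that count; the function name is a hypothetical helper, not part of the original analysis.

```python
from itertools import combinations

DOMAINS = ("RC", "LC", "VC")

def nested_factor_models(domains=DOMAINS):
    """Every non-empty set of domains that could receive its own nested factor."""
    return [
        subset
        for size in range(1, len(domains) + 1)
        for subset in combinations(domains, size)
    ]

models = nested_factor_models()

# Tally the models by the number of nested factors they contain.
by_size = {size: sum(1 for m in models if len(m) == size) for size in (1, 2, 3)}

print(len(models))
print(by_size)
```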
An alternative modeling approach treats the different comprehension tasks as different methods, all assessing one general comprehension ability factor. By specifying additionally one or more nested factors that tap such method specific variance, we want to test if the individual differences in comprehension tasks can be explained better by taking into account such task specific components or measurement artifacts (see Fig. 1, panel C). Comparing model fit of both models and scrutinizing the magnitude of the correlations between latent variables will determine which theory is supported by empirical evidence on the level of individual differences.

The third research question is whether or not evidence exists for the specificity of VC by investigating established ability constructs as covariates. A convenient way to show that a supposedly new construct actually differs from similar constructs is to demonstrate differential relationships to covariates or to establish incremental validity in the prediction of relevant outcomes (Sechrest, 1963; Wilhelm, 2009). In order to provide evidence for the similarity or dissimilarity of the constructs we compare the correlations of the different comprehension types with fluid and crystallized intelligence. Differential correlations would provide valuable information about the nature of the differences in comprehension tasks. Using confirmatory factor analysis is essential for understanding individual differences in these constructs. Horn and Noll (1997) stated that this approach provides structural evidence. They also pointed out that other construct aspects – for instance, how abilities develop or how abilities are tied to the neural substrate – require other forms of validity evidence such as developmental or neurocognitive data.

2. Methods

2.1. Participants

This study was embedded in the process of establishing national performance scales in English as a foreign language in Germany. The RC and LC scales are based on the national educational standards in English as the first foreign language in the 5th or 6th year of foreign language education. The Data Processing Center of the International Association for the Evaluation of Educational Achievement selected the schools participating in the present study and also proctored the tests in schools according to guidelines provided by the authors. In order to increase feasibility and to cut costs only schools from one northern state of Germany were recruited. Participation was mandatory for the students. N = 485 German high school students of intermediate-track Realschule (n = 241) and academic-track Gymnasium (n = 244) participated in this study. Nearly two thirds of them attended 9th grade (n = 361) and the remaining third, 10th grade (n = 124). Fifty-four percent of the participants were female (n = 262). Mean age for the overall sample was about 16 years (M = 15.9, SD = 0.72, range 13.8–18.4).

2.2. Measures

Participants completed measures of RC, LC, and VC as well as tests for fluid and crystallized intelligence. After completing all ability tests participants completed a socio-demographic background questionnaire. Instructions for all comprehension measures were given in English.

2.2.1. Reading and listening comprehension measures

Items for RC and LC were drawn from a larger database of field-tested items assessing proficiency in English as a foreign language (Rupp, Vock, Harsch, & Köller, 2008). In order to compile multiple-choice items it was necessary to change the response format for some of the items. The RC measure consisted of 12 short texts covering various topics, for example Amazon parrots or Hurricane Katrina. The screen was split in two halves: on the left side participants saw the text, on the right side the questions. Test takers were given the opportunity to go back and forth within the RC part to review and change previous answers while the text was constantly visible. Although there is good empirical evidence that the majority of examinees will change only a few responses (Revuelta, Ximénez, & Olea, 2003), we added this functionality to keep the differences in response format across the three comprehension tasks as small as possible. The 7 audio stimuli for the LC items were mostly short dialogs, for instance, excerpts of a radio interview. Participants were able to read all questions concerning a specific item for a short period of time before presenting the audio stimulus and while listening to the audio input. RC and LC both had a testlet structure with 29 and 24 items, respectively.

2.2.2. Viewing comprehension measure

The newly developed VC task was based on short film sequences of the television production “USA – The sound of...” (information available at SWR broadcasting) lasting between 1.5 and 7.5 min. The 5 video stimuli provided historical, geographical and societal information about different regions of the USA, for example, about the consequences of the discovery of oil in Louisiana. Similar to a typical television report, a narrator or an interview partner provided critical information on the topic and the visual material only supported the understanding. For example, the narrator's text “In 1901 oil was discovered in Louisiana. One year later there were already 76 oil companies digging in the swamp for the black gold” was accompanied by old photos picturing a drilling rig framed with workers of an oil company. The essential information has to be processed via the auditory channel. Unlike the procedures in the LC task participants did not see the questions in the VC task while watching the videos once. VC had a testlet structure with 39 items. Items of all comprehension tasks had the same response format, that is, multiple-choice format with four response alternatives and only one correct solution. The items included all information that was necessary to answer the questions, thus minimizing the effect of prior knowledge. All three comprehension tests were designed to assess different comprehension subskills such as finding detailed information and deriving conclusions from different pieces of information.

2.2.3. Intelligence measures

In addition to the three comprehension tasks participants worked on measures assessing fluid and crystallized intelligence (Wilhelm, Schroeders, & Schipolowski, 2009). A common way to capture fluid intelligence (gf) is to assess
only its figural content that is considered prototypical for reasoning (Wilhelm, 2005).

More specifically, participants had to detect regularities by which geometric figures change their shape, position, shading et cetera in a sequence of squares. The task of participants was to complete two positions of the sequence by choosing amongst three alternatives. Fig. 2 shows a sample item of the test. The actual content of the gf measure was non-verbal; instructions were in German.

The measure for crystallized intelligence (gc) consisted of 64 multiple-choice declarative knowledge items covering three broad domains: natural sciences, humanities, and civics (see Fig. 2). The questions range from biology (What is the function of red blood cells?) to politics (Who or what is elected every 5 years at the “European Election”?) and do not overlap with the comprehension measure regarding the content. This approach of operationalizing crystallized intelligence by means of declarative knowledge tests (Ackerman, 2000; Rolfhus & Ackerman, 1999) varies from the predominantly used vocabulary measures (Carroll, 1993). Arguably, declarative knowledge measures represent a more adequate and prototypical form of assessment of crystallized intelligence (Beauducel, 2003). In contrast to the comprehension measures, the gc measure was administered in German.

Fig. 2. Sample items for fluid and crystallized intelligence.

2.2.4. Other measures

Socio-economic, biographical and computer-related background information was collected with questionnaires at the end of the testing session. Participants also worked on a completion test. Results for the C-Tests, the computer-related background information, and socio-economic status will be reported elsewhere.

2.3. Procedure

2.3.1. Data collection

Participants were randomly assigned to one of two groups in a within-subject-design. For group 1 the sequence of tests was: a) completion test (25 min), b) listening comprehension (25 min), c) reading comprehension (25 min), d) viewing comprehension (40 min), e) fluid intelligence (14 min), f) crystallized intelligence (20 min), and g) questionnaires (20 min). For group 2 only the order of the LC and the VC task was interchanged. Specifications of time represent effective test time without instructions. Total test time added up to 179 min. All testing was computer-based except the completion tests.

2.3.2. Data analysis

Analyses were computed with Mplus 5.21 (Muthén & Muthén, 2009). All measurement and structural models were based on the weighted least squares mean and variance adjusted estimator (WLSMV, B. O. Muthén, 1993) because simulation studies showed that for dichotomous data the WLSMV estimator is superior to the maximum likelihood estimator (ML) both in terms of model rejection rates and appropriate estimation of factor loadings (Beauducel & Herzberg, 2006). WLSMV is a robust weighted least squares estimator using a diagonal weight matrix (Muthén & Muthén, 2007) that is based on the asymptotic variances and covariances of tetrachoric (or polychoric) correlations. In a simulation study of Flora and Curran (2004) estimation with robust WLS resulted in accurate test statistics, parameter estimates and standard errors, even if the latent responses are not normally distributed and sample size is low.

Data were complete for RC (N = 485). LC and VC data for one case were missing (N = 484). N = 483 participants
completed the crystallized intelligence task and N = 478 the fluid intelligence task, respectively, resulting in a minimal covariance coverage of 98.4%. Because missingness was not an issue data were analyzed with pairwise present analysis, that is, by exploiting the information of all individuals who worked on a specific item pair (Muthén & Muthén, 2007).

3. Results

3.1. Item selection

Item selection was necessary for the comprehension tasks because new or modified parts of the item database were not sufficiently field-tested. Items with one of the following characteristics were removed (numbers of excluded items are denoted in parentheses for each task in the order RC/LC/VC): a) extreme proportion of correct responses, p_i > 0.95 or below guessing probability (1/3/6), b) non-significant loadings in one-factorial measurement models computed separately for each task (0/3/1), and c) considerable improvement in model fit under item exclusion in the single-factor model (2/2/1). This item selection procedure results in 21 items for both the RC and LC task and 31 items for the VC task. The sample was range-restricted in that only educational institutions at the upper end of the ability spectrum were included. The range restriction is partly responsible for the high rates of correct responses. The RC and LC items were originally constructed for paper-pencil-testing. Whether the change of test medium affected the psychometric properties of items will be addressed in another paper based on independent data using multi-group confirmatory factor analysis (Schroeders & Wilhelm, submitted for publication). For RC that was assessed with the same items used in this study we showed strict measurement invariance. The LC measure that was comprised of different items from a common item database was strongly invariant.

3.2. Measurement models

Descriptive statistics for the comprehension tasks and indices of model fit are listed in Table 2. Because in WLSMV estimation the degrees of freedom (df) are estimated rather than computed (as in the case of ML estimation) neither χ² nor df can be used as in models with ML estimation. However, the comparative fit index (CFI), the root mean square error of approximation (RMSEA), and the weighted root mean square residual (WRMR) are suitable statistics to evaluate goodness of fit. For categorical data and the present sample size, Yu (2002) recommends the following cutoff values indicating good model fit: CFI ≥ 0.96, RMSEA ≤ 0.05, and WRMR ≤ 0.95. After item selection measurement models of all comprehension tasks yielded excellent fit with regard to all goodness of fit statistics. The measurement models for both intelligence scales show acceptable fit without exclusion of items. Reliabilities by means of McDonald's omega (1999) – for the VC factor ω = 0.87, the RC factor ω = 0.85, the LC factor ω = 0.81, the gf factor ω = 0.82, and the gc factor ω = 0.79 – can all be considered high or at least sufficient. The misgiving that the VC task is more prone to testlet effects because in comparison with RC and LC more questions are asked per stimulus is unfounded: the model fit is good for all measures even without explicitly modeling these interdependencies. Different models that take the testlet structures into account have been proposed (for a comprehensive overview see Rijmen, 2009). The testlet model (Bradlow, Wainer, & Wang, 1999) is obtained by constraining the loadings on the testlet factor to be proportional to the loadings on the comprehension factor within each testlet. We checked all analyses with testlet models and compared them to the non-testlet models in order to evaluate the divergence. The differences are small and have no effect on the interpretation of results. In answer to the first research question, we therefore conclude that the psychometric quality of the VC, the LC, and the RC is decent and of comparable magnitude.
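Exclusion criterion a) of the item selection can be sketched in a few lines. This is a minimal illustration, assuming dichotomously scored responses and a guessing probability of 1/4 (four response alternatives, as described for the comprehension items); the function names and the toy response vectors are hypothetical and are not data from this study.

```python
# Illustrative sketch only: helper names and toy data are assumptions.

def proportion_correct(responses):
    """Share of correct answers for one item; None marks a missing response."""
    scored = [r for r in responses if r is not None]
    return sum(scored) / len(scored)

def flag_extreme_items(item_responses, upper=0.95, n_options=4):
    """Flag items with p_i above `upper` or below the guessing probability."""
    guessing = 1.0 / n_options
    flagged = {}
    for name, responses in item_responses.items():
        p = proportion_correct(responses)
        if p > upper or p < guessing:
            flagged[name] = p
    return flagged

# Toy responses (1 = correct, 0 = incorrect) for three hypothetical items.
toy_items = {
    "item_easy": [1] * 39 + [0],     # p = 0.975, above the upper bound
    "item_ok": [1] * 6 + [0] * 4,    # p = 0.60, retained
    "item_hard": [1] * 2 + [0] * 8,  # p = 0.20, below guessing probability
}

flagged = flag_extreme_items(toy_items)
print(sorted(flagged))
```

Criteria b) and c) depend on fitted measurement models and are therefore not covered by this sketch.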

Table 2
Measurement models of all three comprehension tasks, fluid and crystallized intelligence.

Scale   n[a]  ∅p[b]  p_min[c]  p_max[d]  χ²[e]  df[e]  p     CFI[f]  RMSEA[g]  WRMR[h]  ω[i]
RC      21    0.74   0.45      0.93      120.5  104    0.13  0.974   0.013     0.884    0.85
LC      21    0.69   0.39      0.93      121.2  113    0.28  0.983   0.012     0.857    0.81
VC      31    0.55   0.27      0.91      227.6  204    0.12  0.978   0.015     0.898    0.87
Gf      16    0.56   0.24      0.95      107.5  73     0.01  0.933   0.031     0.952    0.82
Gc[j]   16    0.63   0.54      0.81      101.5  78     0.04  0.977   0.025     0.736    0.79

Note. RC, reading comprehension (N = 485). LC, listening comprehension (N = 484). VC, viewing comprehension (N = 484). Gf, fluid intelligence (N = 478). Gc, crystallized intelligence (N = 483).
[a] Number of items after item selection.
[b] Mean item difficulty.
[c] Lowest item difficulty.
[d] Highest item difficulty.
[e] Because all models are based on the weighted least squares mean and variance adjusted estimator (WLSMV) the degrees of freedom cannot be interpreted as the number of unspecified model parameters.
[f] Comparative fit index.
[g] Root mean square error of approximation.
[h] Weighted root mean square residual.
[i] Reliability of the scale (McDonald, 1999).
[j] For gc item parcels were considered containing four items of a content domain (e.g., chemistry). To ensure comparability of the characteristics, scores were divided by four.
Note that test time for VC was nearly twice as much as for the other comprehension tasks (45 vs. 25 min).
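As a reading aid, the fit indices in Table 2 can be compared against the cutoff values attributed to Yu (2002) in the text (CFI ≥ 0.96, RMSEA ≤ 0.05, WRMR ≤ 0.95). The sketch below simply re-checks the tabled values; the dictionary layout and function name are illustrative choices, not part of the original analysis.

```python
# Cutoffs as cited in the text (Yu, 2002); values copied from Table 2.
CUTOFFS = {"CFI": 0.96, "RMSEA": 0.05, "WRMR": 0.95}

TABLE_2 = {
    "RC": {"CFI": 0.974, "RMSEA": 0.013, "WRMR": 0.884},
    "LC": {"CFI": 0.983, "RMSEA": 0.012, "WRMR": 0.857},
    "VC": {"CFI": 0.978, "RMSEA": 0.015, "WRMR": 0.898},
    "Gf": {"CFI": 0.933, "RMSEA": 0.031, "WRMR": 0.952},
    "Gc": {"CFI": 0.977, "RMSEA": 0.025, "WRMR": 0.736},
}

def meets_cutoffs(fit):
    """Return, per index, whether the value satisfies its cutoff."""
    return {
        "CFI": fit["CFI"] >= CUTOFFS["CFI"],        # higher is better
        "RMSEA": fit["RMSEA"] <= CUTOFFS["RMSEA"],  # lower is better
        "WRMR": fit["WRMR"] <= CUTOFFS["WRMR"],     # lower is better
    }

for scale, fit in TABLE_2.items():
    print(scale, meets_cutoffs(fit))
```

Under these cutoffs the three comprehension models and the gc model satisfy all three criteria, whereas the gf model falls short on CFI and marginally on WRMR, in line with the description of the intelligence scales as showing acceptable fit.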
The second research question examines the relation of the supposedly new construct VC to more traditional comprehension measures. Is it essential to differentiate these constructs, or can a single comprehension factor account for the variance of all indicators? To get a first impression of the relationship between the comprehension skills and the intelligence scales, Table 3 provides the correlations between the means (below the diagonal) and factors (above the diagonal).

Table 3
Descriptive statistics and correlation matrix of the means (below diagonal) and factors (above diagonal) for the comprehension tasks, fluid and crystallized intelligence.

         RC     LC     VC     Gf     Gc
N        485    485    484    478    483
Mean     0.74   0.69   0.55   0.55   2.54 (a)
SD       0.16   0.16   0.16   0.18   0.46
λ        0.81   0.69   0.85   0.42   0.75
SE(λ)    0.02   0.03   0.02   0.04   0.02

RC       1.00   0.86   0.94   0.55   0.83
LC       0.55   1.00   0.85   0.36   0.72
VC       0.68   0.56   1.00   0.45   0.85
Gf       0.38   0.22   0.32   1.00   0.52
Gc       0.60   0.49   0.65   0.37   1.00

Note. RC, reading comprehension. LC, listening comprehension. VC, viewing comprehension. Gf, fluid intelligence. Gc, crystallized intelligence. λ, factor loadings on the first unrotated factor in an exploratory factor analysis with maximum likelihood estimator. SE(λ), standard error of factor loadings.
(a) For gc, item parcels containing four items per content domain were considered.

In order to answer this question we translated the competing perspectives into testable models and compared them in terms of model fit. The single skill perspective, which treats all three comprehension tasks as different instantiations of a single construct, is represented through a single-factor model (model 1). The fit of this modality unspecific model is very good (χ² (326, N = 485) = 352.2, p = 0.15, CFI = 0.982, RMSEA = 0.013, WRMR = 0.927). The modality specific perspective treats the three comprehension tasks as separable but correlated constructs. Therefore, a model with three correlated content factors (i.e., RC, LC, and VC) represents the modality specific perspective (model 2). This model is equivalent to a higher-order model of comprehension ability in which the first order comprehension factors are regressed on a second order comprehension factor. The fit of this model is also excellent (χ² (326, N = 485) = 345.7, p = 0.22, CFI = 0.986, RMSEA = 0.011, WRMR = 0.918). If all factor correlations in model 2 are constrained to unity, model 1 results. Because model 1 is thus nested in model 2, the fit of both models can be compared with a likelihood-ratio test (Bollen, 1989). Such a comparison shows that model 2 offers a significantly better model fit (Δχ² (3, N = 485) = 29.4, p < 0.01). Hence, the correlations between the first order factors are smaller than unity. Moreover, constraining the three correlations to equality also leads to a decline in model fit (Δχ² (2, N = 485) = 7.5, p = 0.02). More specifically, the correlation between RC and VC is significantly higher (ρ(RC, VC) = 0.94) than the correlations between RC and LC (ρ(RC, LC) = 0.86; Δχ² (1, N = 485) = 5.5, p = 0.02) and LC and VC, respectively (ρ(VC, LC) = 0.85; Δχ² (1, N = 485) = 4.6, p = 0.03). Fixing this correlation between RC and VC to unity (and constraining the remaining correlations to equality) does not deteriorate model fit significantly (Δχ² (2, N = 485) = 5.6, p = 0.06).² However, this result should be interpreted with caution because fixing a correlation to the upper boundary is problematic (Stoel, Garre, Dolan, & van den Wittenboer, 2006).

² Please note that for these comparisons of correlations it is necessary to fix factor variances to one instead of scaling the factors by fixing a loading to one.

Additional models, for example, models that specify nested factors in addition to a single common factor – also termed hierarchical models – have inconclusive loading patterns for one or more nested factors. Therefore, with respect to the second research question we conclude that a model with correlated comprehension factors describes the data best.

In the third research question we evaluate the validity and specificity of the different comprehension constructs by introducing criteria into a structural model. The main interest here is to investigate the relations of the correlated comprehension factors with fluid and crystallized intelligence as criteria. Any difference in correlation could be interpreted as differential validity between the factors and would substantiate differences in comprehension constructs.

Fig. 3 depicts the measurement model for the comprehension measures in combination with gf and gc. The two latent factors for intelligence share roughly 25% of their variance (ρ(gf, gc) = 0.52). Overall the three comprehension factors correlated moderately with gf and highly with gc. For gf the comprehension tasks show a somewhat differential correlative pattern. More precisely, gf is more highly correlated with RC than with VC (ρ(RC, gf) = 0.55 vs. ρ(VC, gf) = 0.45; Δχ² (1, N = 485) = 5.2, p = 0.02). VC in turn might be correlated more highly with gf than LC (ρ(VC, gf) = 0.45 vs. ρ(LC, gf) = 0.36; Δχ² (1, N = 485) = 3.2, p = 0.08). The data support the conclusion that reasoning is significantly more strongly related to RC than to LC and VC (Δχ² (1, N = 485) = 14.7, p < 0.01). The correlations of gc with RC and VC, respectively, do not differ significantly (ρ(RC, gc) = 0.83 vs. ρ(VC, gc) = 0.85; Δχ² (1, N = 485) = 0.5, p = 0.47). However, the correlation with LC turned out to be significantly lower (ρ(LC, gc) = 0.72 vs. ρ(RC, gc) = 0.83; Δχ² (1, N = 485) = 6.0, p = 0.01; ρ(LC, gc) = 0.72 vs. ρ(VC, gc) = 0.85; Δχ² (1, N = 485) = 9.5, p < 0.01).

The model incorporating fluid and crystallized intelligence can also be specified as a regression model, in which both fluid and crystallized intelligence are used as predictors for the three comprehension factors (Fig. 4). In this model correlations of the residuals of the comprehension factors are estimated. Furthermore, the correlation between fluid and crystallized intelligence is replaced by a regression path from fluid to crystallized intelligence. This model is theoretically derived from Cattell's investment theory where gc is conceptualized as “a product of environmentally varying, experientially determined investments of gf” (Cattell, 1963, p. 4). Statistically, both models are equivalent (MacCallum, Wegener, Uchino, & Fabrigar, 1993). The conceptual difference between the regression model and the one with correlated group factors is that the weights of the intelligence factors are computed considering the collinearity of the predictors. Comparing the regression weights of the predictors shows that crystallized

[Fig. 3: path diagram. Indicators Gf01–Gf16 load on fluid intelligence, Gc01–Gc16 on crystallized intelligence, R01–R21 on Reading, L01–L21 on Listening, and V01–V31 on Viewing. Factor correlations: fluid with crystallized intelligence = .52; fluid intelligence with Reading = .55, Listening = .36 (c), Viewing = .45 (c); crystallized intelligence with Reading = .83 (a), Listening = .72, Viewing = .85 (a); Reading with Listening = .86 (b), Listening with Viewing = .85 (b), Reading with Viewing = .94.]

N = 485, χ² = 411.8, df = 375, p = .09, CFI = .977, RMSEA = .014, WRMR = .942

Fig. 3. Three correlated content factors with fluid and crystallized intelligence as covariates. Note. Only the correlations marked with the same letter, (a), (b), or (c), can be set to equality without leading to deterioration in model fit. Indicators for gc were domain-specific item parcels; all other indicators were dichotomous variables.
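
The p values attached to the model comparisons reported in the text can be checked against the χ² distribution. The following is a minimal sketch, not the authors' procedure: it only converts a reported Δχ² and its degrees of freedom into an upper-tail probability, assuming the reported statistic is referred to a central χ² distribution. With the WLSMV estimator the difference statistic itself is not the simple difference of two χ² values, so the sketch does not reproduce how Δχ² was obtained. The function name `chi2_sf` and the labels are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variate with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed-form starting values: df = 2 for even df, df = 1 for odd df.
    if df % 2 == 0:
        q, k = math.exp(-half), 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1
    # Recurrence: Q(k + 2) = Q(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return q


# (label, reported delta chi-square, df) taken from the model comparisons in the text
reported = [
    ("all factor correlations fixed to unity", 29.4, 3),
    ("three factor correlations constrained equal", 7.5, 2),
    ("rho(RC, VC) vs. rho(RC, LC)", 5.5, 1),
    ("rho(RC, VC) vs. rho(VC, LC)", 4.6, 1),
    ("rho(RC, VC) fixed to unity, others equal", 5.6, 2),
]

p_values = {label: chi2_sf(stat, df) for label, stat, df in reported}

for label, p in p_values.items():
    print(f"{label}: p = {p:.4f}")
```

Because the published Δχ² values are rounded to one decimal, the recomputed p values can differ from the printed ones in the last digit.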

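[Fig. 4: path diagram. Regression path from fluid to crystallized intelligence: .52. Regression weights of fluid intelligence on Reading, Listening, and Viewing: .17, −.02 (a), .01 (a). Regression weights of crystallized intelligence on Reading, Listening, and Viewing: .74, .73, .85. Residual correlations between the comprehension factors (dashed lines): .70, .65, .84; the diagram additionally prints .86, .85, and .94 between the comprehension factors.]

N = 485, χ² = 411.8, df = 375, p = .09, CFI = .977, RMSEA = .014, WRMR = .942

Fig. 4. Three correlated content factors with fluid and crystallized intelligence as predictors in a regression model. Note. Indicators are omitted. The correlations between the residuals are symbolized with a dashed line. (a) Regression weight is not significantly different from zero.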
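
The equivalence of the regression model and the correlated-factors model can be illustrated with path-tracing arithmetic on the reported standardized coefficients. The sketch below is an illustration only, assuming a fully standardized solution (all factor variances equal to one) and using the rounded values printed in the article, so results agree with the published figures only up to rounding. The variable and function names (`gf_gc`, `decompose`) are hypothetical.

```python
# Standardized path from fluid (gf) to crystallized intelligence (gc), as reported.
gf_gc = 0.52

# Reported standardized regression weights on the three comprehension factors.
beta_gc = {"RC": 0.74, "LC": 0.73, "VC": 0.85}
beta_gf = {"RC": 0.17, "LC": -0.02, "VC": 0.01}


def decompose(task):
    """Path-tracing quantities for one comprehension factor (assumes unit variances)."""
    b_f, b_c = beta_gf[task], beta_gc[task]
    indirect = gf_gc * b_c  # effect of gf transmitted through gc
    return {
        "indirect_gf": indirect,
        "total_gf": b_f + indirect,           # implied correlation with gf
        "implied_r_gc": b_c + gf_gc * b_f,    # implied correlation with gc
        "r_squared": b_f**2 + b_c**2 + 2 * b_f * b_c * gf_gc,  # explained variance
    }


results = {task: decompose(task) for task in beta_gc}
shared_variance = gf_gc**2  # proportion of variance shared by gf and gc

for task, res in results.items():
    print(task, {name: round(value, 3) for name, value in res.items()})
print("shared variance gf-gc:", round(shared_variance, 3))
```

Under these assumptions the total effect of gf (direct plus indirect) approximately recovers the factor correlations with gf from the correlated-factors model, and the explained variances come close to the values of 52%, 70%, and 73% given in the Discussion.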

intelligence is the decisive predictor for all three comprehension tasks (βgc(RC) = 0.74; βgc(LC) = 0.73; βgc(VC) = 0.85). Obviously, this is also due to the fact that the influence of gf is primarily mediated through gc. The indirect effect of fluid intelligence mediated through crystallized intelligence (βgf–gc(RC) = 0.39; βgf–gc(LC) = 0.38; βgf–gc(VC) = 0.44) is much higher than the small to negligible direct effects of gf on comprehension abilities (βgf(RC) = 0.17; βgf(LC) = −0.02; βgf(VC) = 0.01). This finding is in line with Cattell's investment theory that complex, acquired skills such as the measured comprehension skills are indicators of crystallized intelligence and that crystallized intelligence depends appreciably on the level of fluid intelligence (Cattell, 1987).

4. Discussion

We begin this discussion by highlighting the research questions and the results we found. The first research question dealt with the psychometric quality of the VC measure relative to established measures of LC and RC. Compared to the traditional comprehension indicators, the VC task yielded comparable psychometric features and was well accepted by participants. Because it comes closer to language comprehension in real life, where understanding is enhanced through visual material, it might even offer a higher authenticity. Authenticity might be considered one of the six major components of test usefulness (Bachman & Palmer, 1996). Items usually used in language assessment to assess LC map rare situations (e.g., listening to a podcast), whereas a movie or a real-life conversation provides visual stimuli that allow for compensating erroneous speech. VC also relates to the pupils' environment. The most likely way for German high school students to get in contact with the English language beyond lessons is to watch films and videos.

Although the VC measure is a psychometrically adequate, well accepted, and supposedly more authentic instrument, it has only been used sporadically in the context of measuring individual differences in language proficiency. Several reasons may account for this circumstance, for instance, the anticipated higher costs for acquiring test devices or constructing measurement instruments (for cost–benefit considerations, see Farcot & Latour, 2009). However, with the ongoing transition to computer-based assessment (Scheuermann & Björnsson, 2009) it is more likely that established constructs such as language comprehension will be assessed in new ways.

With the second research question we investigated aspects of validity of the construct VC. As reported, all comprehension tasks correlated very highly with each other, depicting a very high overlap between the constructs. This can also be seen from the factor loadings of the comprehension and intelligence skills on the first unrotated factor in an exploratory factor analysis (see Table 3, R² = 0.60). Although the one-dimensional model had to be rejected, the magnitude of the correlations comes close to unity. Taking into account the fact that no substantial nested method factors could be introduced to the single-factor model, the specificity of the different comprehension tasks is rather low. On the one hand, it is possible that the language skills of the participants, who had 5 years of foreign language education at the time of testing, had not yet differentiated (Garrett, 1946). On the other hand, the restriction of school type with the associated range restriction in foreign language skills might cause an underestimation of the relations between manifest variables. It also seems likely that different learning experiences affected the three comprehension skills differentially. For example, a student who watches foreign language films may have developed a more subtle ability to discriminate auditory stimuli than a student who likes reading the original crime stories of Agatha Christie. This raises the question of what the residuals of the first order comprehension factors express. They may reflect individual differences in specific language components such as syntactic proficiencies or receptive vocabulary (Joshi, 2005), which are conceptualized as independent of the actual skills (Carroll, 1961).

VC as a construct might reflect on the debate about the cause of the differences between RC and LC. Comparing the correlations between the different types of comprehension, we found the highest correlation between RC and VC. Put differently, the correlations involving LC (i.e., LC–RC and LC–VC) are lower than the relation between RC and VC. Reviewing the demands of the specific comprehension tasks from the perspective of CTML, it would have been reasonable to assume the highest correlation between LC and VC, because in both tasks the relevant information was primarily given in aural, not in visual form. Even though both LC and VC have the same flow and processing of information according to CTML, the relation between these two comprehension tasks was not higher than the correlation between LC and RC. Therefore, neither modality per se nor information processing seems to account for differences between RC and LC on the level of covariances. It is reasonable to suggest that different modalities affect the difficulty of a task. However, we cannot identify this effect from the covariance structure.

The Partnership for Accessible Reading Assessment (PARA) has stimulated considerable research to make large-scale assessment of reading proficiency more accessible for students with reading disabilities. In addition to the treatment of the actual disorder, the use of assistive technology has been suggested in order to tackle that issue. For example, Thurlow, Moen, Lekwa, and Scullin (2010) studied the use of a so-called reading pen that recognizes written text and produces synthesized speech out of it in the assessment of RC. However, they found no beneficial effects through the implementation of the tool. In this context, the result that RC and VC are essentially perfectly correlated gains new significance. Moreover, we replicated this finding in another study using a similar VC measure in the field of sciences that was administered in the first language (Schroeders, Bucholtz, Formazin, & Wilhelm, submitted for publication). Taken together, the findings of the two studies are encouraging in that VC might be used as a substitute for RC when the comprehension ability of students with reading disabilities is to be measured. Moreover, it seems possible that a large gap between the performances in an RC vs. a VC task indicates a specific learning disability.

Compared to the high correlation between RC and VC, the significantly lower correlations with LC may have several causes. One explanation might be that the differences between the comprehension skills could be attributed to differences in test compilation rather than differences in construct. Or, the more formal language used in documentaries and school books

might hold different demands in comparison to the more informal and casual language used in conversations. In order to test these hypotheses it would be necessary to unconfound items from tests in a subsequent study. Controlling for these differences is likely to result in equal and even higher correlations between the comprehension factors.

In the third research question we examined the specificity of the different comprehension skills by investigating their relationship to established measures of cognitive ability. Seemingly, fluid and crystallized intelligence contributed differently to the prediction of RC, LC, and VC as denoted by the diverging amounts of explained variance: 52% of the LC factor, 70% of the RC factor, and 73% of the VC factor. Although fluid intelligence is a major contributor to crystallized intelligence and comprehension ability in a foreign language, it does not account for all the variance. Other important determinants such as motivation, education by parents and quality of school instruction, and socio-economic status are not taken into account in the present study. Despite the collinearity between fluid and crystallized intelligence, the regression model helps in gauging the gf specific influence on the different comprehension tasks. The significantly higher regression weight of gf on RC – in comparison with LC and VC – indicates that RC is more reliant on reasoning processes. This could originate from different cognitive demands in comprehension skills (cp. Reves & Levine, 1988): Textual reasoning (i.e., creating a cognitive model and drawing inferences) could be more prevalent in RC whereas searching for details (i.e., detection and comparison of pieces of information; for a discussion concerning the dimensionality of RC see Rost, 1985) may be more prominent in the LC and VC tasks. However, the discussion whether reading and listening have different subskills has a long tradition and is still unsettled (Alderson, 2000; Song, 2008).

The high correlations between RC, LC, and VC indicate that all three tasks represent different instantiations of a higher-order ability to comprehend independent of the sensory input while the tasks are unique in some respects (cp. Buck, 2001). This comprehension ability in English as a foreign language is also highly correlated with a measure of crystallized intelligence. This finding is consistent with previous research on the influence of gc on reading comprehension for the corresponding age group (Benson, 2007) and underlines the importance of gc in higher-level cognition (Hambrick, 2005). The high correlations could also be interpreted as a consequence of beneficial interactions (or mutualism) between cognitive processes according to the dynamic model of general intelligence proposed by Maas et al. (2006). In the present study gc was assessed with a declarative knowledge test (see Ackerman, 2000; Rolfhus & Ackerman, 1999) rather than the often used vocabulary measures and was administered in German. Regardless of the ostensible heterogeneity of the demands of the gc task and the comprehension tasks, they correlated highly. This finding is in line with the conceptualization that gc subsumes primarily verbal or language-based knowledge that is acquired through cognitive investment during education and general life experiences. Therefore, gc should be more sensitive to effects of schooling than other cognitive abilities. In Carroll's Three-Stratum Theory of Intelligence (1993) foreign language proficiency in general and comprehension skills in particular are listed in the first stratum as narrow abilities below gc (cp. Carroll, 1993). In the Cattell–Horn–Carroll theory of cognitive abilities (Alfonso, Flanagan, & Radwan, 2005; McGrew, 2005, 2009), which integrates the three-level structure of Carroll's theory with the ability factors of the Cattell–Horn tradition (Cattell, 1987; Horn, 1968, 1991), the classification is equivocal: Reading comprehension and verbal language comprehension are sorted beneath a broad ability labeled reading and writing, whereas foreign language aptitude/proficiency and listening ability are still part of gc. This notion of a separate Stratum II or broad factor traces back to conceptualizations of the Cattell–Horn model (McGrew, 2009). Such an allocation and distinction seems inconclusive and is questioned through the findings of the present study. Gc is actually as highly correlated with comprehension skills as the comprehension skills are correlated with each other. Therefore, these results advocate the perspective that within the Carroll (1993) model comprehension ability should be understood as a Stratum I factor below a more general gc factor. This is not to say that comprehension measures per se should be placed below a gc factor. Reducing the relevance of prior knowledge and increasing the reasoning demand in comprehension measures should shift such tasks towards gf.

The correlative results may also reflect on the recent controversy about the role of g in large-scale student assessment. In short, some researchers argued that the performance measures included in large-scale assessments and intelligence tests essentially capture the same construct, that is, intelligence (Rindermann, 2006, 2007a; Frey & Detterman, 2004; Gottfredson, 2003), whereas other researchers showed their distinctiveness by means of confirmatory factor analysis (Brunner, 2008; Baumert, Lüdtke, Trautwein, & Brunner, 2009). In our opinion, the dispute may profit from a broader definition and assessment of intelligence. In educational research the influence of such a broadly defined construct of intelligence is often underestimated or neglected. In order to overcome this shortcoming, we would like to recommend two things for future research: First, intelligence is often simply modeled as the common variance of heterogeneous indicators of cognitive ability and the statistical abstraction is then treated as synonymous with fluid or general intelligence. However, this psychometric approach to g reveals little about the psychological construct behind g (Thorndike, 1994). Therefore, it seems advisable to assess fluid intelligence with an array of specific indicators across the three content domains: verbal, numerical, and figural (Wilhelm, 2005). Second, it seems necessary to extend the assessment of intelligence and include additional facets of intelligence such as gc. In contemporary theories of the structure of intelligence there is a broad consensus that both a decontextualized reasoning part and a domain-specific knowledge part are essential in order to capture the most general parts of intelligence (Ackerman, 1996; Ackerman & Beier, 2005; McGrew, 2005, 2009). This two-component approach should also affect modern conceptions of student achievement more profoundly. Moreover, the existence of these two components is recognized by supporters of both perspectives (e.g., Baumert et al., 2009; Rindermann, 2007b). In the current study we measured both components, decontextualized fluid intelligence and crystallized intelligence. We conclude that both are major contributors to comprehension ability

and account for roughly half to three quarters of the factors' variance. One reason why we might experience these empirical results as an uncomfortable truth is that our understanding of crystallized intelligence is much poorer than our understanding of fluid abilities.

Acknowledgements

We thank Patrick C. Kyllonen and three anonymous reviewers for their comments on an earlier draft of this article and Frank Rijmen for detailed information about the specification of the testlet model. Special thanks to the SWR for providing video material of their broadcasting Schulfernsehen.

References

Ackerman, P. L. (1996). A theory of adult intellectual development: Process, personality, interests, and knowledge. Intelligence, 22, 227–257.
Ackerman, P. L. (2000). Domain-specific knowledge as the “dark matter” of adult intelligence: Gf/gc, personality and interest correlates. Journal of Gerontology: Psychological Sciences, 55B, 69–84.
Ackerman, P. L., & Beier, M. E. (2005). Knowledge and intelligence. In O. Wilhelm, & R. W. Engle (Eds.), Handbook of understanding and measuring intelligence (pp. 125–139). London: Sage.
Adank, P., Evans, B. G., Stuart-Smith, J., & Scott, S. K. (2009). Comprehension of familiar and unfamiliar native accents under adverse listening conditions. Journal of Experimental Psychology: Human Perception and Performance, 35, 520–529.
Alderson, J. C. (2000). Assessing reading. Cambridge, UK: Cambridge University Press.
Alfonso, V. C., Flanagan, D. P., & Radwan, S. (2005). The impact of the Cattell–Horn–Carroll theory on test development and interpretation of cognitive and academic abilities. In D. Flanagan, & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (pp. 185–202). New York: Guilford.
Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford: Oxford University Press.
Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice. Oxford: Oxford University Press.
Baumert, J., Lüdtke, O., Trautwein, U., & Brunner, M. (2009). Large-scale student assessment studies measure the results of processes of knowledge acquisition: Evidence in support of the distinction between intelligence and student achievement. Educational Research Review, 4, 165–176.
Beauducel, A. (2003). Fluid and crystallized intelligence: Theory and measurement. In R. Fernández-Ballesteros (Ed.), Encyclopedia of psychological assessment, Vol. 1 (pp. 416–419). London: Sage.
Beauducel, A., & Herzberg, P. Y. (2006). On the performance of maximum likelihood versus means and variance adjusted weighted least square estimation in confirmatory factor analysis. Structural Equation Modeling, 13, 186–203.
Benson, N. (2007). Cattell Horn Carroll cognitive abilities and reading achievement. Journal of Psychoeducational Assessment, 26, 27–41.
Bollen, K. A. (1989). Structural equations with latent variables. Oxford, England: John Wiley & Sons.
Bradlow, E. T., Wainer, H., & Wang, X. (1999). A Bayesian random effects model for testlets. Psychometrika, 64, 153–168.
Brunner, M. (2008). No g in education? Learning and Individual Differences, 18, 152–165.
Buck, G. (2001). Assessing listening. Cambridge: Cambridge University Press.
Carroll, J. B. (1961). Fundamental considerations in testing for English language proficiency of foreign students. In Testing Center for Applied Linguistics, Washington, DC. Reprinted in H. B. Allen, & R. N. Campbell (Eds.), Teaching English as a Second Language: A Book of Readings (1972). McGraw Hill: New York.
Carroll, J. B. (1968). The psychology of language testing. In A. Davies (Ed.), Language testing symposium: A psycholinguistic approach (pp. 46–69). London: Oxford University Press.
Carroll, J. B. (1983). Psychometric theory and language testing. In J. W. Oller Jr. (Ed.), Issues in language testing research (pp. 80–107). Rowley, MA: Newbury House.
Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. New York: Cambridge University Press.
Carver, R. P. (1973). Effects of increasing the rate of speech presentation upon comprehension. Journal of Educational Psychology, 65, 118–126.
Cattell, R. B. (1963). Theory of fluid and crystallized intelligence: A critical experiment. Journal of Educational Psychology, 54, 1–22.
Cattell, R. B. (1987). Intelligence: Its structure, growth, and action. Amsterdam: Elsevier.
Farcot, M., & Latour, T. (2009). Transitioning to computer-based assessments: A question of costs. In F. Scheuermann & J. Björnsson (Eds.), The transition to computer-based assessment (pp. 108–116). JRC Scientific and Technical Reports.
Flora, D. B., & Curran, P. J. (2004). An empirical evaluation of alternative methods of estimation for confirmatory factor analysis with ordinal data. Psychological Methods, 9, 466–491.
Frey, M. C., & Detterman, D. K. (2004). Scholastic assessment or g? The relationship between the scholastic assessment test and general cognitive ability. Psychological Science, 15, 373–378.
Garrett, H. E. (1946). A developmental theory of intelligence. American Psychologist, 1, 372–378.
Gottfredson, L. S. (2003). G, jobs and life. In H. Nyborg (Ed.), The scientific study of general intelligence: Tribute to Arthur R. Jensen (pp. 293–342). Amsterdam: Pergamon Press.
Green, R. (1981). Remembering ideas from text: The effect of modality of presentation. British Journal of Educational Psychology, 51, 83–89.
Hambrick, D. Z. (2005). The role of domain knowledge in higher-level cognition. In O. Wilhelm, & R. W. Engle (Eds.), Handbook of understanding and measuring intelligence (pp. 361–372). London: Sage.
Harp, S. F., & Mayer, R. E. (1998). How seductive details do their damage: A theory of cognitive interest in science learning. Journal of Educational Psychology, 90, 414–434.
Horn, J. L. (1968). Organization of abilities and the development of intelligence. Psychological Review, 75, 242–259.
Horn, J. L. (1991). Measurement of intellectual capabilities: A review of theory. In K. S. McGrew, J. K. Werder, & R. W. Woodcock (Eds.), WJ-R technical manual (pp. 197–232). Chicago: Riverside.
Horn, J. L., & Noll, J. (1997). Human cognitive capabilities: Gf-Gc theory. In D. P. Flanagan, J. L. Genshaft, & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (pp. 53–91). New York: The Guilford Press.
Joshi, R. M. (2005). Vocabulary: A critical component of comprehension. Reading and Writing Quarterly: Overcoming Learning Difficulties, 21, 209–219.
Lado, R. (1961). Language testing: The construction and use of foreign language tests. London: Longman.
Lund, R. J. (1991). A comparison of second language listening and reading comprehension. Modern Language Journal, 75, 196–204.
Maas, H. L. J. V. D., Dolan, C. V., Grasman, R. P. P. P., Wicherts, J. M., Huizenga, H. M., & Raijmakers, M. E. J. (2006). A dynamical model of general intelligence: The positive manifold of intelligence by mutualism. Psychological Review, 113, 842–861.
MacCallum, R. C., Wegener, D. T., Uchino, B. N., & Fabrigar, L. R. (1993). The problem of equivalent models in applications of covariance structure analysis. Psychological Bulletin, 114, 185–199.
Mayer, R. E. (2005). Cognitive theory of multimedia learning. In R. E. Mayer (Ed.), The Cambridge handbook of multimedia learning (pp. 31–48). New York: Cambridge University Press.
Mayer, R. E., Heiser, J., & Lonn, S. (2001). Cognitive constraints on multimedia learning: When presenting more material results in less understanding. Journal of Educational Psychology, 93, 187–198.
McDonald, R. P. (1999). Test theory: A unified treatment. Mahwah, NJ: Erlbaum.
McGrew, K. S. (2005). The Cattell–Horn–Carroll theory of cognitive abilities: Past, present, and future. In D. P. Flanagan, & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (pp. 136–181). New York: The Guilford Press.
McGrew, K. S. (2009). CHC theory and the human cognitive abilities project: Standing on the shoulders of the giants of psychometric intelligence research. Intelligence, 37, 1–10.
Muthén, B. O. (1993). Goodness of fit with categorical and other non-normal variables. In K. A. Bollen, & J. S. Long (Eds.), Testing structural equation models (pp. 205–243). Newbury Park, CA: Sage.
Muthén, L. K., & Muthén, B. O. (2007). Mplus user's guide. Los Angeles: Muthén & Muthén.
Muthén, L. K., & Muthén, B. O. (2009). Mplus. Los Angeles: Muthén & Muthén.
Oller, J. W. (1976). Evidence of a general language proficiency factor: An expectancy grammar. Die Neueren Sprachen, 76, 165–174.
Oller, J. W. (1979). Language tests at school. London: Longman.
Oller, J. W. (1983). A consensus for the eighties? In J. W. Oller (Ed.), Issues in language testing research (pp. 351–356). Rowley, MA: Newbury House.
Paivio, A. (1986). Mental representations: A dual coding approach. New York: Oxford University Press.
Purpura, J. E. (1999). Learner strategy use and performance on language tests: A structural equation modeling approach. Cambridge: Cambridge University.

Reves, T., & Levine, A. (1988). The FL receptive skills: Same or different? System, 16, 327–336.
Revuelta, J., Ximénez, M. C., & Olea, J. (2003). Psychometric and psychological effects of item selection and review on computerized testing. Educational and Psychological Measurement, 63, 791–808.
Rijmen, F. (2009). Three multidimensional models for testlet-based tests: Formal relations and an empirical comparison. Educational Testing Service Research Report No: RR-09-37. Princeton, NJ: ETS.
Rindermann, H. (2006). What do international student assessment studies measure? School performance, student abilities, cognitive abilities, knowledge or general intelligence? Psychologische Rundschau, 57, 69–86.
Rindermann, H. (2007a). Intelligenz, kognitive Fähigkeiten, Humankapital und Rationalität auf verschiedene Ebenen. [Intelligence, cognitive abilities, human resources, and rationality on different levels.]. Psychologische Rundschau, 59, 137–145.
Rindermann, H. (2007b). The big G-factor of national cognitive ability (author's response on open peer commentary). European Journal of Personality, 21, 767–787.
Rolfhus, E., & Ackerman, P. (1999). Assessing individual differences in knowledge: knowledge, intelligence and related traits. Journal of Educational Psychology, 91, 511–526.
Rost, D. H. (1985). Dimensionen des Leseverständnisses [Dimensions of reading comprehension]. Braunschweig: Pedersen.
Rupp, A. A., Vock, M., Harsch, C., & Köller, O. (2008). Developing standards-based assessment tasks for English as a first foreign language. Context, processes, and outcomes in Germany. Münster: Waxmann.
Samuels, S. J. (1987). Factors that influence listening and reading comprehension. In R. Horowitz, & S. J. Samuels (Eds.), Comprehending oral and written language (pp. 295–325). San Diego: Academic Press.
Sanchez, C. A., & Wiley, J. (2006). An examination of the seductive details effect in terms of working memory capacity. Memory & Cognition, 34, 344–355.
Scheuermann, F., & Björnsson, J. (Eds.) (2009). The transition to computer-based assessment. JRC Scientific and Technical Reports. Retrieved March 1, 2010, from http://crell.jrc.it/RP/reporttransition.pdf
Schnotz, W. (2005). An integrated model of text and picture comprehension. In R. E. Mayer (Ed.), The Cambridge handbook of multimedia learning (pp. 49–69). New York: Cambridge University Press.
Schnotz, W., & Bannert, M. (2003). Construction and interference in learning from multiple representations. Learning and Instruction, 13, 141–156.
Schroeders, U., Bucholtz, N., Formazin, M., & Wilhelm, O. Modality specificity of individual differences in comprehension measures in sciences. Manuscript submitted for publication.
Schroeders, U., & Wilhelm, O. Equivalence of reading and listening comprehension across test media. Manuscript submitted for publication.
Sechrest, L. (1963). Incremental validity: A recommendation. Educational and Psychological Measurement, 23, 153–158.
Shadish, W., Cook, T., & Campbell, D. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton Mifflin.
Song, M.-Y. (2008). Do divisible subskills exist in second language (L2) comprehension? A structural equation modeling approach. Language Testing, 25, 435–464.
Spolsky, B. (1973). What does it mean to know a language: Or how do you get someone to perform his competence? In J. W. Oller, & J. C. Richards (Eds.), Focus on the learner: Pragmatic perspectives for the language teacher (pp. 164–176). Rowley, MA: Newbury House.
Stoel, R. D., Garre, F. G., Dolan, C., & van den Wittenboer, G. (2006). On the likelihood ratio test in structural equation modeling when parameters are subject to boundary constraints. Psychological Methods, 11, 439–455.
SWR broadcasting. “USA — The Sound of...”. Retrieved March 1, 2010, from http://www.planet-schule.de/wissenspool/usa-the-sound-of/inhalt.html
Thorndike, R. L. (1994). g. Intelligence, 19, 145–155.
Thurlow, M. L., Moen, R. E., Lekwa, A. J., & Scullin, S. B. (2010). Examination of a reading pen as a partial auditory accommodation for reading assessment. Minneapolis, MN: University of Minnesota, Partnership for Accessible Reading Assessment.
Wilhelm, O. (2005). Measuring reasoning ability. In O. Wilhelm, & R. W. Engle (Eds.), Handbook of understanding and measuring intelligence (pp. 373–392). London: Sage.
Wilhelm, O. (2009). Issues in computerized ability measurement: Getting out of the Jingle and Jangle Jungle. In F. Scheuermann, & J. Björnsson (Eds.), The transition to computer-based assessment (pp. 145–150). JRC Scientific and Technical Reports.
Wilhelm, O., Schroeders, U., & Schipolowski, S. (2009). BEFKI. Berliner Test zur Erfassung fluider und kristalliner Intelligenz [Berlin test of fluid and crystallized intelligence]. Unpublished manuscript.
Yu, C. Y. (2002). Evaluating cutoff criteria of model fit indices for latent variable models with binary and continuous outcomes. Doctoral dissertation, University of California, Los Angeles.
