
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/270585231

Measuring reasoning ability

Chapter · January 2005


DOI: 10.4135/9781452233529.n21


1 author:

Oliver Wilhelm
Ulm University

All content following this page was uploaded by Oliver Wilhelm on 12 January 2015.



21-Wilhelm.qxd 9/8/2004 5:09 PM Page 373

21
MEASURING REASONING ABILITY
OLIVER WILHELM

DEDUCTIVE AND INDUCTIVE REASONING

Reasoning is a thinking activity that is of crucial importance throughout our lives. Consequentially, the ability to reason is of central importance in all major theories of intelligence structure. Whenever we think about the causes of events and actions, when we pursue discourse, when we evaluate assumptions and expectations based on our prior knowledge, and when we develop ideas and plans, the ability to reason is pivotal.

The verb reason is associated with various highly overlapping meanings. Justifying and supporting concepts and ideas is as important as convincing others through good reasons and the “discovery” of conclusions through the analysis of discourse. In modern psychology, usually two to three forms of reasoning are distinguished. In deductive reasoning, we derive a conclusion that is necessarily true if the premises are true. In inductive reasoning, we try to infer information by increasing the semantic content when proceeding from the premises to the conclusion. Sometimes, a third form of reasoning is distinguished (Magnani, 2001). In abductive reasoning, we reason from a fact to the action that has caused it. Abductive reasoning has not been thoroughly investigated in intelligence research, and we can consider abductive reasoning to be a subset and mixture of inductive and deductive reasoning. In the remainder of this chapter, abductive reasoning will not be discussed.

In deduction, the premises necessarily entail or imply the conclusion. It is impossible that the premises are true and that the conclusion is false. Three perspectives on deduction can be distinguished. From a syntactic perspective, the relation between premises and conclusion is derivable independent of the instantiation of the premises. The criterion for the correctness of an argument is its derivability from the premises. From a semantic perspective, the conclusion is true in any possible model of the premises. The criterion for the correctness of an argument is its validity. From a pragmatic perspective, there is a learned or acquired relation between premises and conclusion that has no logical necessity. The criterion to assess the quality of an argument is its utility.

These perspectives cannot be applied to induction because the criteria to assess conclusions must be different. Carnap’s formalization has attracted considerable attention when it comes to distinguishing forms of induction. Carnap (1971) classifies inductive arguments as enumerative and eliminative. In enumerative induction, the premises assert something as true of a finite number of specific objects or subjects, and the conclusion infers that what is true for the finite number is true of all such objects or subjects. In eliminative induction, confirmation proceeds by falsifying competing alternative hypotheses. The problem with induction is that we cannot prove for any inductive inference


374– • – HANDBOOK OF UNDERSTANDING AND MEASURING INTELLIGENCE


with true premises that the inference provides us with a true conclusion (Stegmüller, 1996). Nevertheless, induction is of crucial importance in science whenever we talk about discovery. However, the testing of theories is a completely deductive enterprise. Induction and deduction hence both have their place in science, and the ability to draw good inductive and deductive inferences is of major importance in real life.

Historically, logic was primarily established through Aristotle. Although Aristotle viewed logic as the proper form of scientific investigation, he used the term as equivalent to verbal reasoning. The syllogistic form of reasoning, as established through Aristotle, dominated logic up until the middle of the 19th century. Throughout the second half of the 19th century, there was a rapid development of logic as a scientific discipline. Philosophers such as George Boole (1847) and Gottlob Frege (1879) started to develop formalizations of deductive logic as a language that went beyond the idea that logic should reflect common sense and sound reasoning. In a nutshell, logic was the manipulation of symbols by virtue of a set of rules. The “logical” truth of an argument was hence no longer assessed by agreement with some experts or through acceptance by common sense. Whether logical reasoning was correct could then be assessed by agreement with a calculus. In our historical excursion, we need to note, however, that George Boole did believe that the laws of thinking and the rules of logic are equivalent, and John Stuart Mill thought that the rules of logic are generalizations of forms of conclusions considered true by humans.

Apparently from the early days of logic to now, the puzzle remains that although humans “invented” logic, they are not able or willing to follow its standards in all instances. Humans are vulnerable to errors in reasoning and do not proceed consistently in deriving conclusions. The research on biases, contents, and strategies in reasoning has a long tradition in psychology. For example, Störing (1908) investigated thought processes in syllogistic reasoning and distinguished various strategies, Wilkins (1929) manipulated test content and observed effects on test properties, and Woodworth and Sells (1935) conducted outstanding research on a particular bias in syllogistic reasoning labeled the “atmosphere effect.”

In contemporary psychological research on reasoning, so-called dual-process theories dominate. In these theories, an associative, heuristic, implicit, experiential, and intuitive system of information processing is contrasted with a second rule-based, analytical, explicit, and rational system (Epstein, 1994; Evans, 1989; Hammond, 1996; Sloman, 1996; Stanovich, 1999). Most of the biases found in reasoning, judgment, and decision making can be located within the first system. A reasoning competence and propensity to think rationally can be located within the second system. In considering individual differences in reasoning ability, the interest is primarily in differences within the second system. Most of the differences could reflect individual differences in available resources for the computational work to be accomplished to obtain a correct response. An additional source of individual differences might be the probability with which individuals deliberately use the second system when responding to specific problems.

The discussion of individual differences in reasoning ability starts with the assertion that there are individual differences in the ability to reason according to some rational standard. Humans can be rational in principle, but they fail to a varying degree in practice. The principle governing this rationality is that people accept inferences as valid if there is no mental representation contradicting the conclusion (Johnson-Laird & Byrne, 1993; Stanovich, 1999). Individual differences from this perspective primarily arise from restrictions in the ability to create and manipulate mental representations. In other words, depending on our cognitive apparatus, we are able to find a good, or the correct, answer to some reasoning problems but not to other more difficult problems. In measuring reasoning ability, it is consequently assumed that individuals can think rationally but that there are individual differences in how well people can do so.

THOUGHT PROCESSES IN REASONING

There are several competing theories for the description and explanation of reasoning processes. The theories are distinguished by the broadness of the phenomena they can explain

and how profound the proposed explanations are. They are also different with respect to how much experimental research was done to investigate them and how much supportive evidence was collected. The theory of mental models (Johnson-Laird & Byrne, 1991) is one outstanding effort in describing and explaining what people do when they reason, and this theory will be described in more detail after briefly reviewing more specific accounts of deductive and inductive reasoning, respectively.

Besides many more specific accounts of reasoning, the mental logic approach to reasoning has many adherents and was applied to a broad range of reasoning problems (Rips, 1994). According to mental logic theories, individuals apply schemata of inference when they reason. Errors in reasoning occur when inference schemata are unavailable, corrupted, or cannot be applied. More complex inferences are accomplished by compiling several elemental schemata. The inference schemata in various mental logic theories are different from each other, from logical terms in natural language, and from logical terms in formal logic. The “psychology of proof” by Rips (1994) is the most elaborated and sophisticated theory of mental logic. However, the mental model theory covers a broader range of phenomena than mental logic accounts do. In addition, the experimental support seems to be in favor of the mental models theory. Finally, both sets of theories are closely related with each other, the major difference being that the mental model approach deals with reasoning on the semantic level, whereas mental logic theories investigate reasoning on the syntactic level.

Analogical reasoning is a subset of inductive thinking that has received considerable attention in cognitive psychology. For example, Holyoak and Thagard (1997) developed a multiconstraint theory of analogical reasoning. Three constraints are claimed to create coherence in analogical thought: similarity between the concepts involved; structural parallels (specifically, isomorphism) between the functions in the source and target domains; and guidance by the reasoner’s goals. This work was recently extended. Hummel and Holyoak (2003) developed a symbolic connectionist model of relational inference. The theory suggests that distributed symbolic representations are the basis of relational reasoning in working memory. There is no doubt substantial promise in extending these accounts of inductive thinking to available reasoning measures. So far, there is not enough experimental evidence available allowing derivation of predictions of item difficulties (but see Andrews & Halford, 2002), and there is not enough variability in the application of the theories to allow a broad application in predicting psychometric properties of reasoning tests in general. To illustrate the character and promise of theories of reasoning processes, I will limit the exposition to the mental model theory. It is hoped that the future will bring an integration of theories of inductive and deductive reasoning along with strong links to theories of working memory.

The mental model theory has been extensively applied to deductive reasoning (Johnson-Laird, 2001; Johnson-Laird & Byrne, 1991) and inductive thinking (Johnson-Laird, 1994b). Briefly, mental model theory views thinking as the manipulation of models (Craik, 1943). These models are analogous representations, meaning that the structure of the models corresponds to what they represent. Each entity is represented by an individual token in a model. Properties of and relations between entities are represented by properties of and relations between tokens, respectively. Negations of atomic propositions are represented as annotations of tokens. Information can be represented implicitly, and the implicit status of a model is part of the representation. If necessary, implicit representations can be fleshed out by simple mechanisms. The epistemic status of a model is represented as a propositional annotation in the model.

A major determinant of the difficulty of reasoning tasks is the number of mental models that are compatible with the premises. The premises “A is left of B. B is left of C. C is left of D. D is left of E.” can be easily integrated into one mental model:

A B C D E

This mental model supports conclusions such as “C is left of E.” However, the premises “A is left of B. B is left of C. C is left of E. D is


left of E.” call for the construction of two mental models. The first mental model places C left of D, whereas the second mental model places D left of C.

Model 1: A B C D E
Model 2: A B D C E

Both models are compatible with the premises. Generally, the more mental models that are compatible with the premises of a reasoning task, the harder the task will be. This prediction has been confirmed with a wide variety of reasoning tasks, including syllogisms, spatial and temporal reasoning, propositional reasoning, and probabilistic reasoning (Johnson-Laird, 1994a; Johnson-Laird & Byrne, 1991). In established measures of reasoning ability, it is hard or impossible to specify the nature and number of mental models a given item calls for (Yang & Johnson-Laird, 2001). This is because test construction is usually driven by applying psychometric criteria and not by creating indicators through the strictly theory-driven derivation from a cognitive model of thought processes. In specifically constructed measures, on the other hand, the nature and number of mental models that participants need to construct in order to solve an item correctly can be manipulated. The empirical study presented later in this chapter mixes measures with and without explicit manipulation of the number of mental models required for successful solution.

Inductive and deductive reasoning processes go through the same three stages of information processing. In the first stage, the premises are understood. Knowledge in general and literacy in dealing with the stimuli are critical in building a representation of the problem. Frequently, the problem will be verbal, and hence reading comprehension will be an important aspect of the creation of representations. However, it is well known that strategies can have an effect on encoding. In solving syllogisms, subgroups of individuals might follow different strategies for creating an initial representation of problem content (Ford, 1995; Stenning & Oberlander, 1995; Sternberg & Turner, 1981). As a result, specific groups of items are hard for one subgroup but not for another, whereas for a second group of items, the reverse is true.

In the second stage, a parsimonious description of the constructed model(s) is attempted. If the task is deductive reasoning, the resulting construction should include something that was not explicitly evident in the premises. Technically, no meaning is created in deduction. It is all implicit in the premises. Experientially, deductive conclusions do not seem to be completely obvious and apparent. If no such conclusion can be found, the answer to the problem can be that there is no conclusion to the problem. If the task is inductive reasoning, the resulting construction allows a conclusion that increases the semantic information of the premises. Hence, a tentative hypothesis is constructed that implies a semantically stronger description than evident in the premises. However, if background knowledge is operating besides the premises, an inductive problem might turn into an enthymeme, a deduction in which not all premises are explicit. Many of the so-called inductive tasks used in intelligence research technically might well be classified as enthymemes. Frequently used number-series problems could qualify as enthymemes. If the premises of such a number-series task are explicitly stated (for example, as “Continue the number series 1, 3, 5, 7, 9, 11 by one more number,” “The operations you can use are ‘+,’ ‘−,’ ‘/,’ and ‘*’ and all results are positive integers,” and “rules are indicating regularities in proceeding through the number series, and these regularities can include rule-based changes to the rule”), there might be just one option that meaningfully continues the sequence: 13.

In the third stage, models are evaluated, maintained, modified, or rejected. If the task is deductive reasoning, counterexamples to tentative conclusions are searched for. If no counterexample can be found, the conclusion is produced. If a counterexample is found, the process goes back to Stage 2. If the task is inductive reasoning, the conclusion adds information to the premises. The conclusion should be consistent with the premises and background knowledge. Obviously, inductive conclusions are not necessarily true. If an induction turns out to be wrong, either the premises are false or the induction was too strong. If a deduction turns out to be wrong, the premises must be false. Evidently, only the third stage is specific to inductive and deductive reasoning, respectively.
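The number-series example from the second stage can be made concrete with a small Python sketch. It is only an illustration under assumptions that go beyond the chapter: the function name next_term is hypothetical, only additive rules are considered (the premises quoted in the text also admit ‘−,’ ‘/,’ and ‘*’), and a “rule-based change to the rule” is read as a sequence of differences that itself follows an additive rule.

```python
def next_term(series):
    """Extrapolate one more term by repeated differencing.

    Assumption (not from the chapter): the hypothesis space is limited to
    additive rules, where the step between terms is either constant or
    itself changes according to an additive rule.
    """
    if not series:
        raise ValueError("series must contain at least one number")
    # Base case: a constant sequence simply continues with the same value.
    if all(x == series[0] for x in series):
        return series[0]
    # Otherwise, look at the steps between neighbours and extrapolate them.
    steps = [b - a for a, b in zip(series, series[1:])]
    return series[-1] + next_term(steps)


# The series from the text: constant step of 2.
print(next_term([1, 3, 5, 7, 9, 11]))
# A series whose step grows by 1 each time (a "change to the rule").
print(next_term([1, 2, 4, 7, 11]))
```

Once the admissible rules are fixed in this way, the continuation follows deductively from the stated premises, which is the sense in which such items can be read as enthymemes rather than as pure inductions.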

However, errors in answering reasoning problems can be located at any of the three stages. The relevance of the third stage as a primary source of errors can be debated. Johnson-Laird (1985) argues that the search for counterexamples is crucial for individual differences, yet Handley, Dennis, Evans, and Capon (2000) argue that individuals rarely engage in a search for counterexamples. Psychometrically, syllogisms and spatial relational tasks that do not rely on a search for counterexamples are as good as or better measures of reasoning ability than items that require such a search (Wilhelm & Conrad, 1998).

Theories about reasoning processes in general and the mental model theory in particular have been widely and successfully applied to reasoning problems. Few of these applications have considered problems from psychometric reasoning tasks (but see Yang & Johnson-Laird, 2001). We will now discuss the status of reasoning ability in various models of the structure of intelligence, as assessed by psychometric reasoning tasks, and then turn to formal and empirical classifications of reasoning measures. Ideally, a general theory of reasoning processes should govern test construction and confirmatory data analysis. In practice, theories of reasoning processes have rarely been considered when creating and using psychometric reasoning tasks.

REASONING IN VARIOUS MODELS OF THE STRUCTURE OF INTELLIGENCE

Binet’s original definition of intelligence focused on abilities of sensation, perception, and reasoning, but this definition was modified several times and ended up defining intelligence as the ability to adapt to novel situations (Binet, 1903, 1905, 1907). Structurally, Binet’s as well as Ebbinghaus’s (1895) earlier investigations do not fall within the realm of factor-analytic work, and consequently, they have been rarely discussed in this context.

Spearman’s invention of tetrad analysis as a means to assess the rank of correlation matrices was the starting point of factor-analytic work (Krueger & Spearman, 1906; Spearman, 1904). Spearman’s definition of general intelligence (g) focuses on the role of educing correlates and relations. The ability to educe relations and correlates is best reflected in reasoning measures. Other intelligence measures are characterized by varying proximity to the general factor. Reasoning measures are expected to have high g loadings and low proportions of specific variance. The g factor is said to be precisely defined and the core construct of human abilities (Jensen, 1998; but see Chapter 16, this volume). There are several more or less strict interpretations of the g factor theory (Horn & Noll, 1997). In its strictest form, one core process is causal for all communalities in individual differences. In a much more relaxed form of the theory, a general factor is supposed to capture the correlations between oblique first- or second-order factors. With respect to reasoning, Spearman (1923) considered inductive and deductive reasoning to be forms of syllogisms. Although Spearman (1927) did not exclude the option of a reasoning-specific group factor besides g, performance on reasoning measures was assumed to be primarily limited by mental energy, or g.

The controversy around Spearman’s theory was initially focused on statistical and methodological issues, and it was in the context of new statistical developments that Thurstone contributed his theory of primary mental abilities. Thurstone’s initial work on the structure of intelligence (1938) was substantially modified and improved by Thurstone and Thurstone (1941). In the later work, the primary factors of Space, Number, Verbal Comprehension, Verbal Fluency, Memory, Perceptual Speed, and Reasoning are distinguished. The initial distinction between inductive and deductive reasoning was abandoned, and the associated variances were allocated to Reasoning, Verbal Comprehension, Number, and Space. The Reasoning factor is marked mostly by inductive tasks. Several of the other factors have substantial loadings from reasoning tasks. In a sample of eighth-grade students, the Reasoning factor is the factor with the highest loading on a second-order factor. Further elaboration of deductive measures by creating better indicators, as suggested by the Thurstones, was attempted only by the research groups surrounding Colberg (Colberg, Nester, & Cormier, 1982; Colberg, Nester, & Trattner, 1985) and Guilford.

Guilford’s contribution to the measurement of reasoning ability is mostly in constructing and


popularizing reasoning measures. The structure-of-intellect (SOI) theory (Guilford, 1956, 1967) is mostly to be credited for its heuristic value in including some of what was previously “no-man’s-land” into intelligence research. For the present purposes, the focus is on reasoning ability exclusively, and Guilford’s major contributions to this topic can be located prior to the specification of the SOI theory. On the basis of a mixture of literature review, construction of specific tests, and empirical investigations of the structure of reasoning, Guilford proposed initially three, later four, reasoning factors (Guilford, Christensen, Kettner, Green, & Hertzka, 1954; Guilford, Comrey, Green, & Christensen, 1950; Guilford, Green, & Christensen, 1951; Hertzka, Guilford, Christensen, & Berger, 1954). These four factors (General Reasoning, Thurstone’s Induction, Commonalities, and Deduction) are hard to separate conceptually and empirically. Specifically, the first three factors are very similar on the task level, and empirically, inductive tasks load on all three of these reasoning factors. The deduction factor is marked weakly with tasks that are hard to distinguish from tasks assigned to other reasoning factors. The tasks popularized by Guilford are still in use today (Ekstrom, French, & Harman, 1976), but many measures are available that are much better conceptually and psychometrically.

The Berlin Intelligence Structure model (Jäger, Süß, & Beauducel, 1997; see Chapter 18, this volume) is a bimodal hierarchical perspective on cognitive abilities. Intelligence tasks are classified with respect to a content facet and an operation facet. On the content facet, Verbal, Quantitative, and Spatial intelligence are distinguished. On the operation facet, Creativity/Fluency, Memory, Processing Speed, and Reasoning are distinguished. The model has a surface similarity with Guilford’s SOI theory but avoids some of the technical pitfalls of Guilford’s model. The Reasoning factor on the operation facet is defined as information processing in tasks that require availability and manipulation of complex information. The processing thus reflects reasoning and judgment abilities. The Reasoning factor is defined across the content facet, and consequently, there are verbal, spatial, and numerical reasoning tasks.

In an epochal effort, Carroll (1993) summarized and reanalyzed factor-analytic studies of human cognitive abilities. The result of this work is an elaborated hierarchical theory that postulates a general factor, g, at the highest level. On a second level, broad ability factors are distinguished. The proposed abilities are fluid intelligence (Gf), crystallized intelligence (Gc), general memory and learning, broad visual perception, broad auditory perception, broad retrieval ability, broad cognitive speediness, and processing speed. Fluid intelligence is largely identified by three reasoning abilities distinguished on the lowest stratum of Carroll’s theory. The three reasoning factors are Sequential Reasoning, Induction, and Quantitative Reasoning. The Sequential Reasoning factor is measured by tasks that require participants to reason from premises, rules, or conditions to conclusions that properly and necessarily follow from them. In the remainder of this chapter, the terms sequential reasoning and deductive reasoning will be used interchangeably. The Induction factor is measured by tasks that provide individuals with materials that are governed by some rules, principles, similarities, or dissimilarities. Participants are supposed to detect and infer those features of the stimuli and apply the inferred rule. The Quantitative Reasoning factor is measured by tasks that ask the participant to reason with concepts involving numerical or mathematical relations. Figure 21.1 presents the classification of reasoning tasks according to Carroll (1993, p. 210).

The theory developed by Cattell and Horn (Horn & Noll, 1994, 1997) is very closely related to Carroll’s theory. In fact, Carroll’s theory is based more on Cattell’s and Horn’s work than the other way round. Their investigation of human cognitive capabilities was focused on five kinds of evidence in its development: first, structural evidence as expressed in the covariation of performances; second, developmental change through the life span; third, neurocognitive evidence; fourth, achievement evidence as expressed in the prediction of criteria involving cognitive effort; and fifth, behavioral-genetic evidence. A major difference between the three-stratum theory from Carroll and the Gf-Gc theory from Horn and Cattell is the lack of a general factor in the Cattell-Horn framework because, according to Horn and Noll

(1994), there is no unifying principle and hence no sufficient reason for specification of a general factor. However, for the present purposes, the proposed structure and interpretation of reasoning ability is of major importance. Horn and Noll interpret fluid intelligence as inductive and deductive reasoning that is critical in understanding relations among stimuli, comprehending implications, and drawing inferences. Horn and Noll (1997) also speak about conjunctive and disjunctive reasoning, but supposedly, these two forms fall under inductive and deductive reasoning. The Cattell-Horn theory assumes that both inductive and deductive reasoning tasks can have verbal as well as spatial content (Horn & Cattell, 1967). This idea can be extended, and both Gf and Gc can be measured with a broader variety of contents (Beauducel, Brocke, & Liepmann, 2001). In terms of the structure of reasoning ability, there is little difference between Carroll’s theory, on the one side, and the Cattell-Horn framework, on the other. The major difference is the postulation of a separate quantitative factor in the latter model, whereas Carroll subsumes quantitative reasoning under fluid intelligence.

Figure 21.1 Carroll’s Higher-Order Model of Fluid Intelligence (Reasoning)
[Diagram not reproduced. Node labels as printed: gf; Sequ. Reason.; Quantitat.; Inductive; Categor. Syllog.; Linear Syllog.; Gen. ver. Reason.; Quantit. Tasks; Multiple Exempl.; Matrix Tasks; Rule Discover; Odd Elements; Series Tasks; Analogies.]

Based on available psychometric reasoning tasks, reasoning ability has a central place in all of the above-discussed theories of the structure of intelligence. However, the manifold of available measures might still reflect a biased selection from all possible reasoning tests. The two following sections on formal and empirical classifications should contribute to deepening our understanding of reasoning measures and reasoning ability.

FORMAL CLASSIFICATIONS OF REASONING

There is certainly no lack of reasoning measures. Carroll (1993) lists a very broad variety of available reasoning tasks, and more, similar tests could be developed without major problems. Kyllonen and Christal (1990) summarize the situation as follows:

Since Spearman (1923) reasoning has been defined as an abstract, high-level process, eluding precise definition. Development of good tests of reasoning ability has been almost an art form, owing more to empirical trial-and-error than to systematic delineation of the requirements such tests must satisfy. (p. 426)

Although empirical evidence indicates that some measures are better indicators of reasoning ability than others, the theoretical knowledge about which measure is good for what reasons is still very limited. In addition, scientists and practitioners are left with little advice from test authors as to why a specific test has the form it has. It is easy to find two reasoning tests that are said to measure the same ability but that are vastly different in terms of their features, attributes, and requirements.

Compared to this bottom-up approach of test construction, a top-down approach could facilitate construction and evaluation of measures. There are four aspects of such a top-down approach that will be discussed subsequently: operation, content, instantiation and nonreasoning requirements, and vulnerability to reasoning strategies.


The first aspect to consider in the classification of reasoning measures is the formal operational requirement. Reasoning tasks can call for inductive and deductive inferences, and among various tests for fluid intelligence, there are additional tests that primarily call for judgment, decision making, and planning. In focusing on inductive and deductive reasoning, the distinction is that in inductive reasoning, individuals create semantic information; as a result, the inferences are not necessarily true. In deductive reasoning, however, individuals maintain semantic information and derive inferences that are necessarily true if the premises are true. Tasks that are commonly classified as requiring broad visualization (Carroll, 1993) usually satisfy the definition of deductive reasoning. However, the visualization demand of such tasks is pivotal and paramount (Lohman, 1996), and such tasks will consequently be excluded from further discussion.

A second aspect to consider in the classification of reasoning measures is the content of tasks. Tasks can have many contents, but the vast majority of reasoning measures employ figural, quantitative, or verbal stimuli. Many tasks also represent a mixture of contents. For example, arithmetic reasoning tasks can be both verbal and quantitative. Experimental manipulations of the content of measures are desirable to understand the structure of reasoning ability more profoundly.

A third aspect of relevance in classifying measures of reasoning ability has to do with the instantiation of reasoning problems. Reasoning problems have an underlying formal structure. If we decide to construct a measure of reasoning ability, we instantiate this general form and have a variety of options in doing so. In choosing between these options, essentially we go through a decision tree. A first choice might be to use either concrete or abstract forms of reasoning problems. In the abstract branch, we might choose between a “nonsense” instantiation and a “variable” instantiation. In the case of syllogistic reasoning tests, a nonsense instantiation might be “All Gekus are Lemis. All Lemis are Filop.” A “variable” instantiation of the same underlying logical form could be “All A are B. All B are C.” In the concrete branch of the decision tree, prior knowledge is of crucial importance. Instantiations of reasoning problems can either conform or not with our prior knowledge. Nonconforming instantiations can either be “counterfactual” or “impossible.” A “counterfactual” instantiation could be “All psychologists are Canadian. All Canadians drive Porsches.” An “impossible” instantiation could be “All cats are dogs. All dogs are birds.” In the branch that includes instantiations that conform to prior knowledge, we can distinguish “factual” and “possible” instantiations. A “factual” instantiation could be “All cats are mammals. All mammals have chromosomes.” A “possible” instantiation could be “All white cars in this garage are fast. All fast cars in this garage run out of petrol.”

It is well established that the form of the instantiation has substantial effects on the difficulty of structurally identical reasoning tasks (Klauer, Musch, & Naumer, 2000). It is also known that the form of the instantiation of reasoning tasks has some influence on the psychometric properties of reasoning tasks (Gilinsky & Judd, 1993). Abstract instantiations might induce test anxiety in some individuals because they look like formulas. Aside from this possible negative effect, abstract instantiations might be a good format for reasoning tasks. Instantiations that do not conform to prior knowledge are likely to be less good forms of reasoning problems because there is an apparent conflict between prior knowledge and the required thought processes. It is likely that some individuals are better able than others to abstract from their prior knowledge. However, such an abstraction would not be covered by a measurement intention that aims at assessing the ability to reason deductively. Instantiations that actually reflect prior knowledge are not good forms for reasoning problems because rather than reasoning, the easiest way to a solution is to recall the actual knowledge. Some of the most widely used tests of deductive reasoning are “impossible” instantiations. The psychometric differences between measures instantiated in a different way are likely to be not trivial.

The final aspect of a classification of reasoning measures discussed here deals with the vulnerability of a task to reasoning strategies. In measuring reasoning ability (like most other abilities), it is assumed that all individuals
approach the problems in the same way. Some individuals are more successful than others because they have “more” of the required ability. Consequently, it is implicitly assumed that individuals at the very top of the ability distribution proceed roughly in the same way through a reasoning test as individuals at the very bottom of the distribution. If a subgroup of participants chooses a different approach to work on a given test, the consequence is that the test is measuring different abilities for different subgroups. For syllogistic reasoning, it is known that there are two or three subgroups of individuals who approach syllogistic reasoning tests differently. Depending on which strategy is chosen, different items are easy and hard, respectively (Ford, 1995). Knowledge about strategies in reasoning is limited (but see Schaeken, de Vooght, Vandierendonck, & d’Ydewalle, 2000), and the role of strategies in established reasoning measures has been barely investigated.

The actual reasoning tasks that have been used in experimental investigations of reasoning processes and psychometric studies of reasoning ability have little to no overlap in surface features. However, there is now good evidence (Stanovich, 1999) that reasoning problems, as they have been used in cognitive psychology, are moderately correlated with reasoning measures as they have been used in individual-differences research. The experimentally used tasks have been thoroughly investigated, and we now know a lot about the ongoing thought processes involved in these tasks. One important conclusion from this research is that the instantiations of reasoning problems are appropriate to elicit the intended reasoning processes for the most part (Shafir & Le Boeuf, 2002; Stanovich, 1999). However, there are pervasive reliability issues because frequently, only a few such problems are used in any given experiment. Conversely, we do not know a lot about ongoing thought processes in established measures of reasoning ability as used in psychometric research. However, we do know a lot about their structure (Carroll, 1993), their relations with other measures of maximal behavior (Carroll, 1993; Jäger et al., 1997; Kyllonen & Christal, 1990), and their validity for the prediction of real-life criteria (Schmidt & Hunter, 1998). Both sets of reasoning tasks can and should be used when studying reasoning ability. The benefits would be mutual. For example, differences in correlations between various individual reasoning items as used in cognitive research and latent variables from reasoning ability tests might reveal important differences between the experimental tasks. Similarly, variability in the difficulties of items from standard psychometric reasoning tests can possibly be explained by application of various theories of reasoning processes, like the mental model theory that was sketched above.

EMPIRICAL CLASSIFICATIONS OF REASONING MEASURES

In psychology, inductive reasoning has frequently been equated with proceeding from specific premises to general conclusions. Conversely, deductive reasoning has frequently been equated with proceeding from general premises to specific conclusions. This definition can still be found in textbooks, but it is outdated. There are inductive arguments proceeding from general premises to specific conclusions, and there are deductive arguments proceeding from specific premises to general conclusions. For example, the argument “Almost all Swedes are blond. Jan is a Swede. Therefore Jan is blond.” is an inductive argument that violates the above definition, and the argument “Jan is a Swede. Jan is blond. Therefore some Swedes are blond.” is a deductive argument that also violates the above definition.

According to Colberg et al. (1982), most established reasoning tests confound the direction of inference (general or specific premises and general or specific conclusions) with deductive and inductive reasoning tasks. By constructing specific deductive and inductive reasoning tasks (Colberg et al., 1985), they present correlational evidence that seems to support the unity of inductive and deductive reasoning tasks. However, reliability of the measures is very low; the applied method of disattenuating correlations is not satisfying; and, most important, Shye (1988) reclassifies their tasks and finds support for a distinction between rule-inferring and rule-applying tasks (see Chapter 18, this volume). In the initial classification and
construction of tasks (Colberg et al., 1985), tests have been labeled as inductive when in fact they were probabilistic. Probabilistic tasks can, in principle, be deductive (Johnson-Laird, 1994a; Johnson-Laird, Legrenzi, Girotto, Legrenzi, & Caverni, 1999), and the probabilistic tasks used (Colberg et al., 1985) were in fact deductive tasks. What was shown by Colberg (Colberg et al., 1982, 1985), then, was the unity of some forms of deductive reasoning tasks, and what Shye demonstrated was that task classification is a sensitive business and that rule-applying tasks, as constructed by Colberg et al., fall into the periphery of a multidimensional scaling, with rule-inferring/inductive reasoning at the center of the solution.

The most sophisticated, ambitious, and advanced attempt to propose factors of reasoning ability comes from Carroll (1993). Carroll discusses the structure of reasoning ability, bearing in mind several objections and difficulties. Among those objections are that (a) reasoning tests are frequently complex, requiring both inductive and deductive thought processes; (b) reasoning measures are often short and administered under timed conditions; (c) reasoning tests are usually not carefully constructed and analyzed on the item level; (d) inductive and deductive reasoning processes are learned and developed together; and (e) many reasoning measures involve language, quantitative, or spatial skills to an unknown amount.

Carroll (1993) asserts that his proposal of the three reasoning factors (Induction, Deduction, and Quantitative Reasoning) is preliminary for several reasons (but see Carroll, 1989). First, in many of the reanalyzed studies, only one reasoning factor emerged. This is simply due to the fact that there was frequently not a sufficient number of reasoning tests included to examine the structure of reasoning ability in such studies. Second, in the 37 out of 176 data sets with more than one reasoning factor, most of the studies were never intended and designed to investigate the structure of reasoning ability. Third, those studies intended to investigate the structure of reasoning ability included insufficient numbers of reasoning measures. Other problems with investigating the structure of reasoning ability include variations in time pressure across tests and studies, variations in scoring procedures, variations in instructing participants, and, most important, individual measures that are classified post hoc rather than a priori.

In carefully examining Tables 6.1 and 6.2 from Carroll (1993), it is apparent that the deductive reasoning tasks are frequently verbal. Content for the inductive reasoning tasks is more diverse but tends to be figural-spatial. The last reasoning factor is rather unequivocally a quantitative factor. An explanation of the data in Carroll as indicating a distinction between inductive, deductive, and quantitative reasoning competes with an explanation that distinguishes between verbal, figural-spatial, and quantitative content. Inspection of Carroll’s reanalysis of individual data sets is compatible with an interpretation of the factor labeled as general sequential reasoning or deductive reasoning as a verbal reasoning factor. The inductive reasoning factor, on the other hand, could reflect figural-spatial reasoning. The quantitative reasoning factor apparently reflects numerical or quantitative reasoning. Compatible with this interpretation is that the deductive reasoning factor can frequently not be distinguished from a verbal factor and tends to have high loadings on a higher-order crystallized factor. In accord with the interpretation of the inductive reasoning factor, the figural-spatial reasoning processes measured with the associated tasks tend to be highly associated with a higher-order fluid reasoning factor. In line with this theorizing, the induction factor has the highest loading on g of all Stratum 1 factors. The deductive reasoning factor ranks only 10th among these loadings. The mean loading of induction on g is .57, whereas the mean loading of deductive reasoning is only .41. Besides the mean difference in the average magnitude of loadings, there is a higher dispersion of g loadings among the deductive tasks. Similarly, the fluid intelligence factor, Gf, is best defined by induction in Carroll’s reanalysis. Gf is defined by induction 19 times, with an average loading of .64. Deductive reasoning defined Gf only 6 times, with an average loading of .55. On the other hand, deductive reasoning appears among the variables defining crystallized intelligence. Deductive reasoning defined the Gc factor 7 times, with an average loading of .69. Induction does not appear on the list of Stratum 1 abilities defining crystallized
intelligence. Finally, deductive reasoning appears 8 times, with an average loading of .70 on a factor labeled 2H, reflecting a mixture of fluid and crystallized intelligence. Induction, on the other hand, appeared only twice, with an average loading of .41.

Given these considerations, the proposal of reasoning ability as being composed of inductive, deductive, and quantitative reasoning is competing with a proposal of verbal, figural-spatial, and quantitative reasoning. To investigate possible structures of reasoning ability, one should include tasks that allow for comparison between several competing theories. There are basically five theories competing as explanations for the structure of reasoning ability.

1. a general reasoning factor accounting for the communality of reasoning tasks varying with respect to content (verbal, quantitative, figural-spatial) and operation (inductive, deductive);

2. two correlated factors for inductive and deductive reasoning, respectively, without the specification of any content factors;

3. three correlated factors for verbal, quantitative, and figural-spatial reasoning, without distinguishing inductive and deductive reasoning processes;

4. a general reasoning factor along with nested and completely orthogonal factors for verbal and quantitative reasoning but no figural-spatial factor; and

5. two correlated factors for inductive and deductive reasoning along with completely orthogonal content factors for verbal and quantitative reasoning and again no figural-spatial factor.

For the evaluation of these models, it is important to avoid a confound between content and process on the task level. A second crucial aspect for exploring the structure of reasoning ability is to select appropriate tasks to measure the intended constructs. This is particularly hard in the domain of deductive reasoning. Following the above-presented definition of inductive and deductive reasoning, it is very difficult to find adequate measures of figural-spatial deductive reasoning. In fact, only 7 of all the tasks described in Carroll (1993) can be classified as deductive figural-spatial tasks. However, these tasks frequently represent a mixture with other demands. For example, “ship-destination” has quantitative demands; “match problems,” “plotting,” and “route planning” have visualization demands. In classifying 90 German intelligence tasks, Wilhelm (2000) could not find a single deductive figural-spatial measure.

To test the structure of reasoning ability, Wilhelm (2000) selected reasoning measures based on their cognitive demands and the content involved. In addressing the above-mentioned criticisms of existent reasoning tasks, several reasoning tasks were newly constructed. The following 12 measures were included in the study (D and I denote deductive and inductive reasoning; F, N, and V stand for figural, numerical, and verbal content, respectively).

DF1 (Electric Circuits): Positive and negative signals travel through various switches. The resulting signal has to be indicated. The number and kind of switches and the number of signals are varied (Gitomer, 1988; Kyllonen & Stephens, 1990).

DF2 (Spatial Relations): Spatial orientation of symbols is presented pairwise. The spatial orientation of two symbols that were not presented together can be derived from the pairwise presentations (Byrne & Johnson-Laird, 1989).

DN1 (Solving Equations): A series of equations is presented. Participants can derive values of variables deductively. Items vary by the number of variables and the difficulty of relation. A difficult sample item is “A plus B is C plus D. B plus C is 2*A. A plus D is 2*B. A + B is 11. A + C is 9.”

DN2 (Arithmetic Reasoning): Participants provide free responses to short verbally stated arithmetic problems from a real-life context.

DV1 (Propositions): Acts of a hypothetical machine are described, and the correct conclusion has to be deduced. The number of mental models, logical relation, and negation are varied in this multiple-choice test (Wilhelm & McKnight, 2002). A simple sample item is as follows: “If the lever moves and the valve closes, then the interrupter is switched. The lever moves. The valve closes.”

DV2 (Syllogisms): Verbally phrased quantitative premises are presented in which the number of
mental models is varied by manipulating the figure and quantifier (Wilhelm & McKnight, 2002). A sample item is as follows: “No big shield is red. All round shields are big.”

IF1 (Figural Classifications): Participants are asked to find the one pictorial figure that does not belong with four other figures based on various attributes of the figures.

IF2 (Matrices): Based on trends in rows and columns of 3*3 matrices, a figure that belongs in a specified cell has to be selected from several distractors.

IN1 (Number Series): Rule-ordered series of numbers are to be continued by two elements. The difficulty of the rule that has to be detected is varied.

IN2 (Unfitting Number): In a series of numbers, one that does not fit has to be identified.

IV1 (Verbal Analogies): Analogies as they are frequently used in intelligence research. The general form of the multiple-choice items is “? is to B as C is to ?.” The vocabulary of these double analogies is simple (i.e., participants are familiar with all terms), and the difficulty of the relationship is varied.

IV2 (Word Meanings): In this multiple-choice test, participants should identify a word that means approximately the same thing as a given word.

A total of 279 high school students with a mean age of 17.7 years and a standard deviation of 1.2 years completed all tests and several criterion measures. All tests were analyzed separately with item response theory models. For all tests, a two-parameter model assuming dispersion in item discrimination was superior to a Rasch model. The estimated person parameters from these two-parameter models were subsequently analyzed. For participants who got either all answers wrong or all answers right, person parameters were interpolated. Some of the reliabilities of the tasks are not satisfying. Coefficient Omega (McDonald, 1985) is only .50 for IF1 and .51 for IF2. The overall test length for individual measures might be responsible for these suboptimal results.

The core research question in the present context is which of the above-specified models provides the best fit for the data. A one-factor model simply specifies one latent reasoning factor with loadings from all indicators. A two-factor model specifies two correlated latent factors: one factor with loadings on all the inductive tasks, the other factor with loadings on all the deductive tasks. The correlation between the two factors is estimated freely. The three-factor model specifies three correlated content factors: a verbal factor with loadings from all the verbal tasks, a quantitative factor with loadings from all quantitative tasks, and a figural-spatial factor with loadings on all the figural-spatial tasks. The fourth model specifies a general reasoning factor and two orthogonal nested factors, one for the four verbal tasks and the other for the four quantitative tasks. The fifth model specifies an inductive reasoning factor with loadings from all inductive reasoning tasks and, likewise, a deductive reasoning factor with loadings from all the deductive reasoning tasks. In addition, the two content factors as in the fourth model are specified. The two reasoning factors are correlated, but the two content factors are not. Generally, there are, of course, other possible model architectures (see Chapter 14, this volume). However, the above-mentioned models provide a test of competing theories for the structure of reasoning ability. The last two models mentioned above specify content factors for the verbal and quantitative tasks only. For the figural-spatial tasks, such a content factor might not be necessary because such tasks have been said to require decontextualized reasoning, and observed individual differences do not reflect specific prior knowledge (Ackerman, 1989, 1996; Undheim & Gustafsson, 1987). Models with and without a first-order factor of figural-spatial reasoning (as specified in the current context) are nested and can be compared inferentially (see Chapter 14, this volume).

Table 21.1 summarizes the fit of the five confirmatory factor analyses. Comparing the general factor model with a model that specifies two correlated factors of inductive and deductive reasoning, respectively, reveals that there is no advantage in estimating the correlation between inductive and deductive reasoning freely (as opposed to restricting this correlation to unity). Indeed, the correlation between the two factors in Model 2 is estimated to be exactly 1. Consequently, when comparing these two models, the
Table 21.1 Fit Statistics of Five Competing Structural Explanations of Reasoning Ability

         g        Ind. Ded.   Cont.    g & Cont.   Ind. Ded. & Cont.
χ2       121.2    121.2       84.8     73.3        72.0
df       54       53          51       46          45
p        <.0001   <.0001      .002     .006        .006
CFI      .901     .900        .950     .960        .960
RMSEA    .067     .068        .049     .046        .046
BIC      316.0    324.1       303.9    333         339.8
CAIC     280.4    287         263.8    285.5       290.8

Note: Ind. Ded. = inductive and deductive; Cont. = contents; CFI = comparative fit index; RMSEA = root mean square error of approximation; BIC = Bayesian information criterion; CAIC = consistent Akaike’s information criterion.
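
Some of the entries in Table 21.1 can be cross-checked from the reported chi-square values. The Python sketch below recomputes RMSEA for each model and runs a chi-square difference test for the two pairs of models that differ by one degree of freedom. It rests on assumptions that the chapter does not state: the sample size N = 279 given in the text, the common formula RMSEA = sqrt(max(chi2 - df, 0) / (df * (N - 1))), and an ordinary chi-square reference distribution for the difference test. The function names are illustrative only.

```python
import math

N = 279  # number of participants reported in the chapter

# (chi-square, df) for each model, copied from Table 21.1
FIT = {
    "g": (121.2, 54),
    "Ind. Ded.": (121.2, 53),
    "Cont.": (84.8, 51),
    "g & Cont.": (73.3, 46),
    "Ind. Ded. & Cont.": (72.0, 45),
}


def rmsea(chi2, df, n):
    """RMSEA from chi-square, df, and sample size (assumed N - 1 variant)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def chi2_diff_p(restricted, free):
    """p value of a chi-square difference test; this sketch handles 1 df only."""
    (chi2_r, df_r), (chi2_f, df_f) = restricted, free
    if df_r - df_f != 1:
        raise ValueError("this sketch only handles a 1-df difference")
    delta = max(chi2_r - chi2_f, 0.0)
    # For 1 df, P(X > delta) = erfc(sqrt(delta / 2)).
    return math.erfc(math.sqrt(delta / 2.0))


for name, (chi2, df) in FIT.items():
    print(f"{name:18s} RMSEA = {rmsea(chi2, df, N):.3f}")

# Does freeing the inductive-deductive correlation improve fit?
print(chi2_diff_p(FIT["g"], FIT["Ind. Ded."]))
print(chi2_diff_p(FIT["g & Cont."], FIT["Ind. Ded. & Cont."]))
```

Under these assumptions, the recomputed RMSEA values agree with the tabled ones to three decimals. Freeing the correlation between inductive and deductive reasoning lowers chi-square by 0.0 and 1.3, respectively, and neither difference approaches significance with 1 df.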

general factor model is the better explanation of the data because it is more parsimonious than the two-factor model. However, neither model provides acceptable fit.

A model specifying three correlated group factors for content does substantially better in explaining the data. Although there is still room to improve fit, the model represents an acceptable explanation of the data. Given that the model is completely derived from theory, it can serve as a good starting point for future investigations. Comparing the two models with completely orthogonal content factors again demonstrates the superiority of the model that postulates the unity of inductive and deductive reasoning. In this data set, inductive and deductive reasoning are perfectly correlated. Introducing a distinction between the two factors is unnecessary and consequently does not improve model fit. Both models are substantially better than the initial one- and two-factor models. However, one of the loadings on the verbal factor is not significant and negative in sign. Given this departure from the theoretical expectation of positive and significant loadings, and keeping in mind interpretative issues with group factors in nested factor models (see Chapter 14, this volume), the best solution seems to be accepting the model based on the content factors. In this model, there are three content-related reasoning factors, each one of them subsuming inductive and deductive reasoning tasks. In the current study, the model with correlated group factors is equivalent to a second-order factor model. In this model, the correlations between factors are captured by a higher-order factor. This model is presented in Figure 21.2. The two content factors (Verbal and Quantitative Reasoning) reflect deductive and inductive reasoning with verbal and quantitative material, respectively. Due to the relevance of task content, it can be expected that the Verbal and the Quantitative Reasoning factors predict different aspects of criteria such as school grades, achievement, and the like. The loading of the Figural Reasoning factor on fluid intelligence is freely estimated to be 1. Not only are g and Gf very highly or perfectly correlated (Gustafsson, 1983), but the same is true between figural-spatial reasoning and fluid intelligence. Consequently, the current analysis extends Undheim and Gustafsson’s (1987) work to a lower stratum. It is a replicated finding that Gf is the Stratum 2 factor with the highest loading on g (Carroll, 1993). It has also been argued that this relation might be perfect (Gustafsson, 1983; Undheim & Gustafsson, 1987, but see Chapter 18, this volume). Figural-spatial reasoning, in turn, has the highest loading on fluid intelligence, and in the data presented in this chapter, the relation between figural-spatial reasoning and the factor labeled fluid intelligence is perfect. Hence, if we do want to measure g with a single task, we should select a task of figural-spatial reasoning. Matrices tasks have been considered particularly good measures of Gf and g. Spearman (1938) suggested the Matrices test from Penrose and Raven (1936), as well as the inductive figural measure from Line (1931), as the single best indicators of g. The latter test is less prominent than the Matrices test, but variants of it can be found in various intelligence
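[Path diagram: a second-order factor gf with three first-order factors, Verbal (loading .84), Figural (1.00), and Quant. (.83). Indicator loadings: Verbal: IV1 .35, IV2 .57, DV1 .50, DV2 .68; Figural: IF1 .33, IF2 .49, DF1 .45, DF2 .63; Quant.: IN1 .67, IN2 .60, DN1 .73, DN2 .69.]

Figure 21.2 Higher-Order Model of Fluid Intelligence (Reasoning)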
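
The equivalence between the correlated-factors model and the second-order model can be seen from a counting argument: three first-order factors have three correlations, and a single higher-order factor has three loadings, so the loadings reproduce the correlations exactly through r_ij = l_i * l_j. The Python sketch below illustrates this. The loadings .84, 1.00, and .83 for the Verbal, Figural, and Quantitative factors are an assumed reading of Figure 21.2 (only the Figural loading of 1 is confirmed in the text), and the function names are illustrative.

```python
import math

# Second-order loadings on gf; an assumed reading of Figure 21.2.
LOADINGS = {"Verbal": 0.84, "Figural": 1.00, "Quant.": 0.83}


def implied_correlations(loadings):
    """Factor correlations implied by one higher-order factor: r_ij = l_i * l_j."""
    names = list(loadings)
    return {
        (a, b): loadings[a] * loadings[b]
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


def loadings_from_correlations(r_vf, r_vq, r_fq):
    """Invert the mapping for three factors (requires positive correlations)."""
    return {
        "Verbal": math.sqrt(r_vf * r_vq / r_fq),
        "Figural": math.sqrt(r_vf * r_fq / r_vq),
        "Quant.": math.sqrt(r_vq * r_fq / r_vf),
    }


corr = implied_correlations(LOADINGS)
for pair, r in corr.items():
    print(pair, round(r, 3))

# Going back from the three correlations recovers the three loadings.
recovered = loadings_from_correlations(
    corr[("Verbal", "Figural")],
    corr[("Verbal", "Quant.")],
    corr[("Figural", "Quant.")],
)
print({name: round(value, 2) for name, value in recovered.items()})
```

With a Figural loading of 1, the implied correlations of the Verbal and Quantitative factors with the Figural factor simply equal their own loadings on gf.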

tests. Although it is not good practice to measure rather general constructs with single tasks, there is certainly evidence suggesting that, if need be, this sole task should be a figural-spatial reasoning measure. Whether such a task is classified as inductive or deductive is not important for that purpose.

Frequently, the composition of intelligence batteries is not well balanced in the sense that there are many indicators for one intelligence construct but few or no tests for other intelligence constructs. In such cases (e.g., Roberts et al., 2000), the overall solution can be dominated by tasks other than fluid intelligence tasks. As a result, figural-spatial reasoning tasks might not be the best selection in these cases to reflect the g factor of such a battery.

When interpreting the results from this study, it is important to keep in mind that the differences between various models were not that big. With different tasks and different participants, it is possible that different results emerge. The present results are preliminary and in need of replication and extension. The most important result from the study reported above is that in a critical test aimed to assess a distinction between inductive and deductive reasoning, no such distinction could be found. Latent factors of inductive and deductive reasoning are perfectly correlated in several models. The result of a unity of inductive and deductive reasoning was also obtained with multidimensional scaling, exploratory factor analysis, and tetrad analysis. It is important to note that this result emerged considering the desiderata for future research provided by Carroll (1993, p. 232). Specifically, the present tasks have been selected or constructed based on a careful review of the individual-differences and cognitive literature on the topic, the items were analyzed by latent item response theory, and the scales were analyzed by confirmatory factor analyses. The current tests include several new reasoning measures that are based on and informed through cognitive psychology.

WORKING MEMORY AND REASONING

There have been several attempts to explain reasoning ability in terms of other abilities that are considered more basic and tractable. Specifically, working memory has been proposed as the major limiting factor for human reasoning (Kyllonen & Christal, 1990; Süß, Oberauer, Wittmann, Wilhelm, & Schulze, 2002). The working definition of working memory has been that any task that requires individuals to simultaneously store and process information can be considered a working memory task (Kyllonen & Christal, 1990). This definition has been criticized because it seems to include all reasoning measures. The definition has also been criticized because its notions of “storage” and “processing” are imprecise and fuzzy (see Chapter 22, this volume). A critique of the “working memory = reasoning” hypothesis can also focus on the problem of the reduction of one construct
in need of explanation through another one (Deary, 2001) that is not doing any better. However, this critique is unjustified for several reasons.

1. It is easy to construct and create working memory tasks. Many tasks that satisfy the above definition work in the sense that they correlate highly with other working memory measures, reasoning, Gf, and g. In addition, it is easy and straightforward to manipulate the difficulty of a working memory item by manipulating the storage demand, the process demand, or the time available to do storage, processing, or both. Those manipulations account for a large amount of variance of task difficulty in almost all cases.

2. There is an enormous corpus of research on working memory and processes in working memory in cognitive psychology (Conway, Jarrold, Kane, Miyake, & Towse, in press; Miyake & Shah, 1999). It is fruitful to derive knowledge and hypotheses about individual differences in cognition from this body of research.

3. In the sense of a reduction of working memory to biological substrates, intensive and very productive research has linked working memory functioning to the frontal lobes and investigated the role of various physiological parameters in cognitive functioning (Kane & Engle, 2002; see Chapter 9, this volume, for a review of research linking reasoning to various neuropsychological parameters). Hence, the equation of working memory with reasoning is complemented by relating working memory to the frontal lobes and other characteristics and features of the brain.

The strengths of the relation found between latent factors of working memory and reasoning vary substantially, fluctuating between a low of .6 (Engle, 2002; Engle, Tuholski, Laughlin, & Conway, 1999; Kane et al., 2004) and a high of nearly 1 (Kyllonen, 1996). In the discussion of the strength of the relation, several sources that could cause an underestimation or an overestimation should be kept in mind.

1. The relation should be assessed on the level of latent factors because this is the level of major interest when it comes to assessing psychological constructs. There should be more than three indicators of sufficient psychometric quality for each construct to allow an evaluation of the measurement models on both sides.

2. Depending on the task selection and the breadth of the definition of both constructs, the specification of more than one factor on both sides might be necessary (Oberauer, Süß, Wilhelm, & Wittmann, 2003).

3. The definition of constructs and task classes is a difficult issue. Classifying anything as a working memory task that requires simultaneous storage and processing could turn out to be overinclusive. Restricting fluid intelligence to figural-spatial reasoning measures is likely to be underinclusive. The comments on tasks of reasoning ability presented in this chapter, as well as similar comments on what constitutes a good working memory task (see Chapters 5 and 22, this volume), might be a good starting point for definition of task classes.

4. Content variation in the operationalization for both constructs can have an influence on the magnitude of the relation. When assessing reasoning ability, one is well advised to use several tasks with verbal, figural, and quantitative content. The same is true for working memory. This chapter provided some evidence for the content distinction on the reasoning side. Similar evidence for the working memory side is evident in structural models that posit content-specific factors of working memory (Kane et al., 2004; Kyllonen, 1996; Oberauer, Süß, Schulze, Wilhelm, & Wittmann, 2000). Relating working memory tasks of one content with reasoning tasks of another content causes one to underestimate the true relation.

5. A mono-operation bias should be avoided in assessing both constructs. Using only complex span tasks or only dual-tasks to assess working memory functioning does not do justice to the much more general nature of the construct (Oberauer et al., 2000). Task class-specific factors or task-specific strategies might have an effect on the estimated relation.

6. Reasoning measures, like other intelligence tasks, are frequently administered under time constraints. Timed and untimed reasoning
ability are not perfectly correlated (Wilhelm & Schulze, 2002). Similarly, working memory tasks frequently have timed aspects (Ackerman, Beier, & Boyle, 2003). For example, there might be only a limited time to execute a process before the next stimulus appears, there might be a timed rate of stimulus presentation, and the like. Common speed variance could inflate the correlation between working memory and reasoning.

The assumption that working memory is a critical ingredient to success on reasoning tasks is compatible with experimental evidence and theories from cognitive psychology. The ability to successfully create and manipulate mental representations was argued to be the critical ingredient in reasoning. Whether the necessary representations can be created and manipulated depends crucially on working memory. This prediction has gained strong support from the correlational studies relating working memory and reasoning. If the individual differences in reasoning ability and working memory turn out to be roughly the same, the evidence supporting the predictive validity of reasoning ability and fluid intelligence applies to working memory capacity, too. After careful consideration of costs and benefits, it might be sensible to use more tractable working memory tasks for many practical purposes.

SUMMARY AND CONCLUSIONS

The fruitful avenue to future research on measuring and understanding reasoning ability is characterized by (a) more theoretically motivated work in the processes and resources involved in reasoning and (b) the use of confirmatory methods on the item and test level to investigate meaningful measurement and structural models. The major result of efforts directed that way would be a more profound understanding of important thought processes and an improved construction and design of measures of reasoning ability. A side product of such efforts will be generative item production and theoretically derived assumptions about psychometric properties of items and tests. Another side product would be the option to develop more appropriate means of altering reasoning ability. There are several very interesting attempts to develop training methods for reasoning ability, and the initial results are encouraging in some cases (Klauer, 1990, 2001). Although it was not possible to discriminate between inductive and deductive reasoning psychometrically, it could be possible that appropriate training causes differential gains in both forms of reasoning. The cognitive processes in inductive and deductive reasoning tasks might be different, but the individual differences we can observe on adequate measures are not. This does not exclude the option that both thought processes might be affected by different interventions.

REFERENCES

Ackerman, P. L. (1989). Abilities, elementary information processes, and other sights to see at the zoo. In R. Kanfer, P. L. Ackerman, & R. Cudeck (Eds.), Abilities, motivation, and methodology: The Minnesota symposium on learning and individual differences (Vol. 10, pp. 280–293). Hillsdale, NJ: Lawrence Erlbaum.
Ackerman, P. L. (1996). A theory of adult intellectual development: Process, personality, interests, and knowledge. Intelligence, 22, 229–259.
Ackerman, P. L., Beier, M. E., & Boyle, M. D. (2003). Individual differences in working memory within a nomological network of cognitive and perceptual speed abilities. Journal of Experimental Psychology: General, 131, 567–589.
Andrews, G., & Halford, G. S. (2002). A cognitive complexity metric applied to cognitive development. Cognitive Psychology, 45, 153–219.
Beauducel, A., Brocke, B., & Liepmann, D. (2001). Perspectives on fluid and crystallized intelligence: Facets for verbal, numerical, and figural intelligence. Personality and Individual Differences, 30, 977–994.
Binet, A. (1903). L’étude expérimentale de l’intelligence [Experimental studies of intelligence]. Paris: Schleicher Frères.
Binet, A. (1905). A propos de la mesure de l’intelligence [On the subject of measuring intelligence]. Année Psychologique, 12, 69–82.
Binet, A. (1907). La psychologie du raisonnement [The psychology of reasoning]. Paris: Alcan.
Boole, G. (1847). The mathematical analysis of logic: Being an essay towards a calculus of deductive reasoning. Cambridge, UK: Macmillan, Barclay, and Macmillan.
Byrne, R. M. J., & Johnson-Laird, P. N. (1989). Spatial reasoning. Journal of Memory and Language, 28, 564–575.
Carnap, R. (1971). Logical foundations of probability. Chicago: University of Chicago Press.
Carroll, J. B. (1989). Factor analysis since Spearman: Where do we stand? What do we know? In R. Kanfer, P. L. Ackerman, & R. Cudeck (Eds.), Abilities, motivation, and methodology: The Minnesota symposium on learning and individual differences (Vol. 10, pp. 43–70). Hillsdale, NJ: Lawrence Erlbaum.
Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. Cambridge, MA: Cambridge University Press.
Colberg, M., Nester, M. A., & Cormier, S. M. (1982). Inductive reasoning in psychometrics: A philosophical corrective. Intelligence, 6, 139–164.
Colberg, M., Nester, M. A., & Trattner, M. H. (1985). Convergence of the inductive and deductive models in the measurement of reasoning abilities. Journal of Applied Psychology, 70, 681–694.
Conway, A. R. A., Jarrold, C., Kane, M., Miyake, A., & Towse, J. (in press). Variation in working memory. Oxford, UK: Oxford University Press.
Craik, K. (1943). The nature of explanation. Cambridge, MA: Cambridge University Press.
Deary, I. J. (2001). Human intelligence differences: Towards a combined experimental-differential approach. Trends in Cognitive Science, 5, 164–170.
Ebbinghaus, H. (1895). Über eine neue Methode zur Prüfung geistiger Fähigkeiten und ihre Anwendung bei Schulkindern [On a new method to test mental abilities and its application with schoolchildren]. Zeitschrift für Psychologie und Physiologie der Sinnesorgane, 13, 401–459.
Ekstrom, R. B., French, J. W., & Harman, H. H. (1976). Manual for kit of factor-reference cognitive tests. Princeton, NJ: Educational Testing Service.
Engle, R. W. (2002). Working memory capacity as executive attention. Current Directions in Psychological Science, 11, 19–23.
Engle, R. W., Tuholski, S. W., Laughlin, J. E., & Conway, A. R. A. (1999). Working memory, short-term memory and general fluid intelligence: A latent variable approach. Journal of Experimental Psychology: General, 128, 309–331.
Epstein, S. (1994). Integration of the cognitive and the psychodynamic unconscious. American Psychologist, 49, 709–724.
Evans, J. St. B. T. (1989). Bias in human reasoning: Causes and consequences. Hove, UK: Lawrence Erlbaum.
Ford, M. (1995). Two modes of mental representation and problem solution in syllogistic reasoning. Cognition, 51, 1–71.
Frege, G. (1879). Begriffsschrift: Eine der arithmetischen nachgebildete Formelsprache des reinen Denkens [Begriffsschrift: A formula language modeled upon that of arithmetic, for pure thought]. Halle a.S.: L. Nebert.
Gilinsky, A. S., & Judd, B. B. (1993). Working memory and bias in reasoning across the life span. Psychology and Aging, 9, 356–371.
Gitomer, D. H. (1988). Individual differences in technical troubleshooting. Human Performance, 1, 111–131.
Guilford, J. P. (1956). The structure of intellect. Psychological Bulletin, 53, 267–293.
Guilford, J. P. (1967). The nature of human intelligence. New York: McGraw-Hill.
Guilford, J. P., Christensen, P. R., Kettner, N. W., Green, R. F., & Hertzka, A. F. (1954). A factor analytic study of Navy reasoning tests with the Air Force Aircrew Classification Battery. Educational and Psychological Measurement, 14, 301–325.
Guilford, J. P., Comrey, A. L., Green, R. F., & Christensen, P. R. (1950). A factor-analytic study on reasoning abilities: I. Hypotheses and description of tests. Reports from the Psychological Laboratory, University of Southern California, Los Angeles.
Guilford, J. P., Green, R. F., & Christensen, P. R. (1951). A factor-analytic study on reasoning abilities: II. Administration of tests and analysis of results. Reports from the Psychological Laboratory, University of Southern California, Los Angeles.
Gustafsson, J.-E. (1983). A unifying model for the structure of intellectual abilities. Intelligence, 8, 179–203.
Hammond, K. R. (1996). Human judgment and social policy: Irreducible uncertainty, inevitable error, unavoidable injustice. Oxford, UK: Oxford University Press.
Handley, S. J., Dennis, I., Evans, J. St. B. T., & Capon, A. (2000). Individual differences and the search for counter-examples in reasoning. In W. Schaeken, A. Vandierendonck, & G. de Vooght (Eds.), Deductive reasoning and strategies (pp. 241–266). Hillsdale, NJ: Lawrence Erlbaum.
Hertzka, A. F., Guilford, J. P., Christensen, P. R., & Berger, R. M. (1954). A factor analytic study of evaluative abilities. Educational and Psychological Measurement, 14, 581–597.
Holyoak, K. J., & Thagard, P. (1997). The analogical mind. American Psychologist, 52, 35–44.
Horn, J. L., & Cattell, R. B. (1967). Age differences in fluid and crystallized intelligence. Acta Psychologica, 26, 107–129.
Horn, J. L., & Noll, J. (1994). A system for understanding cognitive capabilities: A theory and the evidence on which it is based. In D. K. Detterman (Ed.), Current topics in human intelligence: Vol. 4. Theories of intelligence (pp. 151–203). Norwood, NJ: Ablex.
Horn, J. L., & Noll, J. (1997). Human cognitive capabilities: Gf-Gc theory. In D. P. Flanagan, J. L. Genshaft, & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (pp. 53–92). New York: Guilford.
Hummel, J. E., & Holyoak, K. J. (2003). A symbolic-connectionist theory of relational inference and generalization. Psychological Review, 110, 220–264.
Jäger, A. O., Süß, H.-M., & Beauducel, A. (1997). Berliner Intelligenzstruktur Test [Berlin Intelligence Structure test]. Göttingen: Hogrefe.
Jensen, A. R. (1998). The g factor: The science of mental ability. London: Praeger.
Johnson-Laird, P. N. (1985). Deductive reasoning ability. In R. J. Sternberg (Ed.), Human abilities: An information-processing approach (pp. 173–194). New York: Freeman.
Johnson-Laird, P. N. (1994a). Mental models and probabilistic thinking. Cognition, 50, 189–209.
Johnson-Laird, P. N. (1994b). A model theory of induction. International Studies in the Philosophy of Science, 8, 5–29.
Johnson-Laird, P. N. (2001). Mental models and deduction. Trends in Cognitive Science, 5, 434–442.
Johnson-Laird, P. N., & Byrne, R. M. J. (1991). Deduction. Hove, UK: Lawrence Erlbaum.
Johnson-Laird, P. N., & Byrne, R. M. J. (1993). Models and deductive rationality. In K. Manktelow & D. Over (Eds.), Rationality: Psychological and philosophical perspectives (pp. 177–210). London: Routledge.
Johnson-Laird, P. N., Legrenzi, P., Girotto, V., Legrenzi, M. S., & Caverni, J. P. (1999). Naïve probability: A mental model theory of extensional reasoning. Psychological Review, 106, 62–88.
Kane, M. J., & Engle, R. W. (2002). The role of prefrontal cortex in working-memory capacity, executive attention, and general fluid intelligence: An individual-differences perspective. Psychonomic Bulletin & Review, 9, 637–671.
Kane, M. J., Hambrick, D. Z., Tuholski, S. W., Wilhelm, O., Payne, T. W., & Engle, R. W. (2004). The generality of working-memory capacity: A latent-variable approach to verbal and visuo-spatial memory span and reasoning. Journal of Experimental Psychology: General, 133, 189–217.
Klauer, K. C., Musch, J., & Naumer, B. (2000). On belief bias in syllogistic reasoning. Psychological Review, 107, 852–884.
Klauer, K. J. (1990). A process theory of inductive reasoning tested by the teaching of domain-specific thinking strategies. European Journal of Psychology of Education, 5, 191–206.
Klauer, K. J. (2001). Handbuch kognitives training [Handbook of cognitive training]. Toronto: Hogrefe.
Krueger, F., & Spearman, C. (1906). Die Korrelation zwischen verschiedenen geistigen Leistungsfähigkeiten [The correlation between different mental abilities]. Zeitschrift für psychologie, 44, 50–114.
Kyllonen, P. C. (1996). Is working memory capacity Spearman’s g? In I. Dennis & P. Tapsfield (Eds.), Human abilities: Their nature and measurement (pp. 49–75). Mahwah, NJ: Lawrence Erlbaum.
Kyllonen, P. C., & Christal, R. E. (1990). Reasoning ability is (little more than) working-memory capacity?! Intelligence, 14, 389–433.
Kyllonen, P. C., & Stephens, D. L. (1990). Cognitive abilities as determinants of success in acquiring logic skill. Learning and Individual Differences, 2, 129–160.
Line, W. (1931). The growth of visual perception in children. British Journal of Psychology, 15.
Lohman, D. F. (1996). Spatial ability and g. In I. Dennis & P. Tapsfield (Eds.), Human abilities:
Their nature and measurement (pp. 97–116). Mahwah, NJ: Lawrence Erlbaum.
Magnani, L. (2001). Abduction, reason, and science: Processes of discovery and explanation. Dordrecht, the Netherlands: Kluwer Academic.
McDonald, R. P. (1985). Factor analysis and related methods. Hillsdale, NJ: Lawrence Erlbaum.
Miyake, A., & Shah, P. (1999). Models of working memory: Mechanisms of active maintenance and executive control. New York: Cambridge University Press.
Oberauer, K., Süß, H.-M., Schulze, R., Wilhelm, O., & Wittmann, W. W. (2000). Working memory capacity: Facets of a cognitive ability construct. Personality and Individual Differences, 29, 1017–1045.
Oberauer, K., Süß, H.-M., Wilhelm, O., & Wittmann, W. W. (2003). The multiple faces of working memory: Storage, processing, supervision, and coordination. Intelligence, 31, 167–193.
Penrose, L. S., & Raven, J. C. (1936). A new series of perceptual tests: Preliminary communication. British Journal of Medical Psychology, 16, 97–104.
Rips, L. J. (1994). The psychology of proof: Deductive reasoning in human thinking. Cambridge: MIT Press.
Roberts, R. D., Goff, G. N., Anjoul, F., Kyllonen, P. C., Pallier, G., & Stankov, L. (2000). The Armed Services Vocational Aptitude Battery: Not much more than acculturated learning (Gc)? Learning and Individual Differences, 12, 81–103.
Schaeken, W., de Vooght, G., Vandierendonck, A., & d’Ydewalle, G. (Eds.). (2000). Deductive reasoning and strategies. New York: Lawrence Erlbaum.
Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124, 262–274.
Shafir, E., & Le Boeuf, R. A. (2002). Rationality. Annual Review of Psychology, 53, 491–517.
Shye, S. (1988). Inductive and deductive reasoning: A structural reanalysis of ability tests. Journal of Applied Psychology, 73, 308–311.
Sloman, S. A. (1996). The empirical case for two systems of reasoning. Psychological Bulletin, 119, 3–22.
Spearman, C. (1904). “General intelligence” objectively determined and measured. American Journal of Psychology, 15, 201–293.
Spearman, C. (1923). The nature of ‘intelligence’ and the principles of cognition. London: Macmillan.
Spearman, C. (1927). The abilities of man: Their nature and measurement. New York: AMS.
Spearman, C. (1938). Measurement of intelligence. Scientia, 64, 75–82.
Stanovich, K. E. (1999). Who is rational: Studies of individual differences in reasoning. Mahwah, NJ: Lawrence Erlbaum.
Stegmüller, W. (1996). Das Problem der Induktion: Humes Herausforderung und moderne Antworten [The problem of induction: Hume’s challenge and modern answers]. Darmstadt: Wissenschaftliche Buchgesellschaft.
Stenning, K., & Oberlander, J. (1995). A cognitive theory of graphical and linguistic reasoning: Logic and implementation. Cognitive Science, 19, 97–140.
Sternberg, R. J., & Turner, M. E. (1981). Components of syllogistic reasoning. Acta Psychologica, 47, 245–265.
Störing, G. (1908). Experimentelle Untersuchungen über einfache Schlussprozesse [Experimental studies on simple inference processes]. Archiv für die gesamte Psychologie, 11, 1–27.
Süß, H.-M., Oberauer, K., Wittmann, W. W., Wilhelm, O., & Schulze, R. (2002). Working memory capacity explains reasoning ability—and a little bit more. Intelligence, 30, 261–288.
Thurstone, L. L. (1938). Primary mental abilities. Chicago: University of Chicago Press.
Thurstone, L. L., & Thurstone, T. G. (1941). Factorial studies of intelligence. Chicago: University of Chicago Press.
Undheim, J. O., & Gustafsson, J.-E. (1987). The hierarchical organization of cognitive abilities: Restoring general intelligence through the use of linear structural relations. Multivariate Behavior Research, 22, 149–171.
Wilhelm, O. (2000). Psychologie des schlussfolgernden Denkens: Differentialpsychologische Prüfung von Strukturüberlegungen [Psychology of reasoning: Testing structural theories]. Hamburg: Dr. Kovac.
Wilhelm, O., & Conrad, W. (1998). Entwicklung und Erprobung von Tests zur Erfassung des logischen Denkens [Development and evaluation of deductive reasoning tests]. Diagnostica, 44, 71–83.
Wilhelm, O., & McKnight, P. E. (2002). Ability and achievement testing on the World Wide Web. In
B. Batinic, U.-D. Reips, & M. Bosnjak (Eds.), Online social sciences (pp. 151–181). Toronto: Hogrefe.
Wilhelm, O., & Schulze, R. (2002). The relation of speeded and unspeeded reasoning with mental speed. Intelligence, 30, 537–554.
Wilkins, M. C. (1929). The effect of changed material on ability to do formal syllogistic reasoning. Psychological Archives, 16, (102).
Woodworth, R. S., & Sells, S. B. (1935). An atmosphere effect in formal syllogistic reasoning. Journal of Experimental Psychology, 18, 451–460.
Yang, Y., & Johnson-Laird, P. N. (2001). Mental models and logical reasoning problems in the GRE. Journal of Experimental Psychology: Applied, 7, 308–316.
