
Developing Integrative Task-based

Language Assessment:
A Case with Iranian EFL Learners

Natasha Pourdana
Preface

A prototype model of task-based language assessment, the Integrative
Task (IT) Model, was developed in an attempt to solve the longstanding
problem of generalizing language learners' performances on particular
language tasks to their future performances on tasks at similar levels
of difficulty. Two psychological doctrines contributed to the
development of the IT Model and were operationally defined as the
two-factor constructs underlying Integrative tasks: first, the core
operations identified in the eight categories of Multiple Intelligences
Theory (Gardner, 1983, 1985); and second, the objective behaviors
elaborated in the six hierarchical levels of the Cognitive Domain in the
Taxonomy of Educational Objectives (Bloom, 1956). The credibility,
consistency, trustworthiness, and predictive adequacy of Integrative
tasks were statistically examined on the basis of four original research
questions. Ample quantitative as well as qualitative data supported the
assumptions made about the accountability of the IT Model as a newly
developed language assessment framework. The findings may be
outlined as follows: (I) there was a significant positive relationship
between Integrative Task ratings and learners' self-ratings on task
performance outcomes; (II) Integrative Task ratings were consistent in
terms of inter-rater and inter-item reliability estimates; (III) Integrative
Task ratings were trustworthy in terms of content, criterion-related, and
construct validity estimates; and finally (IV) there was a significant
positive relationship between the real difficulty levels of Integrative
tasks and the difficulty levels predicted by the Integrative Task Scales.
Table of Contents

PREFACE iii
TABLE OF CONTENTS iv
ABBREVIATIONS ix

CHAPTER I
1 Background and Purpose 11
1.1 Statement of the Problem and Purpose of the Study 15

1.2 Significance of the Study 17

1.3 Research Questions and Hypotheses 20

1.4 Definition of Key Terms 22

1.5 Delimitations and Limitations of the Study 28

CHAPTER II

2 Review of the Related Literature 30

Part I Language Performance Assessment 31

2.1 Language Testing Alternatives 32

2.2 Language Performance Assessment in Use 35

2.3 Language Performance Assessment: Pros and Cons 37

2.4 Performance Assessment Tasks 39

Part II Multiple Intelligences Theory 44


2.5 Evidence in Favor of Multiple Intelligences Theory 46
2.5.1 Educational Evidence 46
2.5.2 Application Evidence 47
2.5.3 Biological and Cognitive Evidence 49
2.6 Multiple Intelligences Defined 52
2.6.1 Linguistic Intelligence: Core Operations 53
2.6.2 Musical Intelligence: Core Operations 55
2.6.3 Mathematical Intelligence: Core Operations 56
2.6.4 Spatial Intelligence: Core Operations 58
2.6.5 Kinesthetic Intelligence: Core Operations 60
2.6.6 Personal Intelligences: Core Operations 61
2.6.7 Naturalist Intelligence: Core Operations 62
2.7 MI Theory and Performance Assessment 63
Part III Taxonomy of Educational Objectives 65
2.8 Taxonomy of Educational Objectives Defined 68
2.8.1 Cognitive Domain 69
2.8.2 Affective Domain 72
2.8.3 Psychomotor Domain 76
Part IV Integrative Task (IT) Model 78
2.9 Task Characteristics: Current Approaches 79
2.10 Task Difficulty: Statement of the Problem 80
2.11 Integrative Task (IT) Model: An Alternative 83

CHAPTER III
3 Methodology 85
3.1 Participants 85

3.2 Design 87

3.3 Instruments 89

3.3.1 CELT 89

3.3.2 MIDAS 90

3.3.3 Integrative Tasks 91
3.4 Procedure 96
Phase I
3.4.1 Designing Checklist of Integrative Task Specifications 97
3.4.2 Designing Integrative Task Scales 98
3.4.3 Designing Self-Rating Scales 99
Phase II
3.4.4 Developing Integrative Tasks 101
Phase III
3.4.5 Piloting Integrative Tasks and Revision Process 101
Phase IV
3.4.6 Administering Integrative Tasks and Self-Rating Scales 103

CHAPTER IV
4 Data Analysis and Discussions 104
Research Question I
4.1 Integrative Tasks Ratings 105
4.2 Self-Ratings 108
4.3 Correlation of Performance Measures 109
Research Question II
4.4 Inter-Rater Reliability Estimates 111
4.5 Inter-Item Reliability Estimates 113
Research Question III
4.6 Content Validity 115
4.7 Criterion-Related Validity 117
CELT 117
4.7.1 CELT and Integrative Linguistic Tasks 119
4.7.2 CELT and Integrative Mathematical Tasks 121
4.7.3 CELT and Integrative Musical Tasks 123

4.7.4 CELT and Integrative Kinesthetic Tasks 125
4.7.5 CELT and Integrative Spatial Tasks 127
4.7.6 CELT and Integrative Intrapersonal Tasks 129
4.7.7 CELT and Integrative Interpersonal Tasks 131
4.7.8 CELT and Integrative Naturalist Tasks 133
MIDAS 137
4.7.9 MIDAS and Integrative Linguistic Tasks 138
4.7.10 MIDAS and Integrative Mathematical Tasks 140
4.7.11 MIDAS and Integrative Musical Tasks 142
4.7.12 MIDAS and Integrative Kinesthetic Tasks 144
4.7.13 MIDAS and Integrative Spatial Tasks 146
4.7.14 MIDAS and Integrative Intrapersonal Tasks 148
4.7.15 MIDAS and Integrative Interpersonal Tasks 150
4.7.16 MIDAS and Integrative Naturalist Tasks 152
4.8 Construct Validity 156
4.8.1 Factor Analysis 156
4.8.2 Two-Factor Within-Subject ANOVA 162
Research Question IV
4.9 Actual and Predictive Integrative Tasks Difficulty 166

References 170
Appendixes 184

Abbreviations

Inter/An Interpersonal Task at Analysis Subscale


Inter/Ap Interpersonal Task at Application Subscale
Inter/C Interpersonal Task at Comprehension Subscale
Inter/Eva Interpersonal Task at Evaluation Subscale
Inter/K Interpersonal Task at Knowledge Subscale
Inter/Syn Interpersonal Task at Synthesis Subscale
Intra/An Intrapersonal Task at Analysis Subscale
Intra/Ap Intrapersonal Task at Application Subscale
Intra/C Intrapersonal Task at Comprehension Subscale
Intra/Eva Intrapersonal Task at Evaluation Subscale
Intra/K Intrapersonal Task at Knowledge Subscale
Intra/Syn Intrapersonal Task at Synthesis Subscale
IT Integrative Task
Kin/An Kinesthetic Task at Analysis Subscale
Kin/Ap Kinesthetic Task at Application Subscale
Kin/C Kinesthetic Task at Comprehension Subscale
Kin/Eva Kinesthetic Task at Evaluation Subscale
Kin/K Kinesthetic Task at Knowledge Subscale
Kin/Syn Kinesthetic Task at Synthesis Subscale
Ling/An Linguistic Task at Analysis Subscale
Ling/Ap Linguistic Task at Application Subscale
Ling/C Linguistic Task at Comprehension Subscale
Ling/Eva Linguistic Task at Evaluation Subscale
Ling/K Linguistic Task at Knowledge Subscale
Ling/Syn Linguistic Task at Synthesis Subscale

Math/An Mathematical Task at Analysis Subscale
Math/Ap Mathematical Task at Application Subscale
Math/C Mathematical Task at Comprehension Subscale
Math/Eva Mathematical Task at Evaluation Subscale
Math/K Mathematical Task at Knowledge Subscale
Math/Syn Mathematical Task at Synthesis Subscale
MI Multiple Intelligences
Mus/An Musical Task at Analysis Subscale
Mus/Ap Musical Task at Application Subscale
Mus/C Musical Task at Comprehension Subscale
Mus/Eva Musical Task at Evaluation Subscale
Mus/K Musical Task at Knowledge Subscale
Mus/Syn Musical Task at Synthesis Subscale
Nat/An Naturalist Task at Analysis Subscale
Nat/Ap Naturalist Task at Application Subscale
Nat/C Naturalist Task at Comprehension Subscale
Nat/Eva Naturalist Task at Evaluation Subscale
Nat/K Naturalist Task at Knowledge Subscale
Nat/Syn Naturalist Task at Synthesis Subscale
r Pearson Product-moment Correlation Coefficient
CHAPTER I
Background and Purpose

There is no science without measurements,


no quality without testing,
and no global market without standards.
BAAL Language Learning and Teaching SIG (2007)

Assessment is a popular topic these days. Frequently encountered in
professional publications, workshops, in-service training, and college
courses, assessment meets certain criteria for being a cutting-edge topic
(Bachman, 1990). One useful way to think about assessment is to
contrast it with testing, an ever-present issue that confronts teachers and
learners in all disciplines. Tests have come to be an accepted
component of instructional programs throughout the world. Sometimes
tests are justified on the basis of accountability: Are learners learning
what they are supposed to learn? Decision-makers need this type
of evidence, for example, to make judgments about how to spend
resources.

Sometimes, tests are viewed as the feedback given to language


learners concerning their progress. As Oller (1979) states, "the purpose
of the tests is to measure the variance in performances of various sorts"
(p. 45). In this sense, testing serves as a monitoring device for learning.
Tests are given at a particular point in time to sample learners' learning.
Regularly, after a test is given, some type of reporting takes place, often
in the form of a single score or grade. Sometimes, decisions are made
based on test results (e.g., re-taking the test, passing the course, going
on to the next unit of instruction, etc.). The final important aspect of
testing is that the test is usually kept hidden from the examinees until it
is administered, indicating a degree of secrecy in order to assure
confidentiality (Blanche, 1990).

Let's assume that this simple characterization of tests and testing is


correct. Assessment then can be shown to be very different. In an
instructional program, assessment is usually an ongoing strategy
through which learners' learning is not only monitored--a trait shared
with testing--but by which learners are involved in making decisions

about the degree to which their performances match their abilities
(Kallenbach, 1999; Viens, 1999).

Spolsky (1992) rightly argues that diagnostic or formative
assessment is typically curriculum-driven. This type of assessment
shadows curriculum and provides feedback to learners and teachers.
Moreover, he wisely argues for a multilevel system that combines
testing and assessment. According to Spolsky, (a) learners are provided
with opportunities before and after units of instruction to assess their
own performance (self-assessment);

(b) teachers periodically assess learners' performance, and both teachers


and learners discuss their respective assessments (tests and
measurements); and finally (c) some external monitor assesses the
individual learner's (and perhaps the teacher's) performance and
discusses it with the teacher. Assessment, then, should be viewed as an
interactive process that engages both the teacher and the learner in
monitoring the learner's performance. Criterion-referenced testing is
clearly based on this way of relating teaching-testing-assessment for
congruence (Brown, 2001; Morris, 2002; Wolf, 1989).
Many of the reigning theoretical assumptions, on which
contemporary testing and assessment rely, are based on the
behavioristic views of cognition and development. In the 1990s, most
educators came to realize that new, alternative ways of thinking about
learning and assessing learning were needed. Fodor (1983), for
example, has espoused the view that there are separate analytic devices
involved in tasks like syntactic parsing, tonal recognition, and facial
perception. Gardner (1983) argues that there is a resurgence of interest
in the idea of multiplicity of intelligences. He and his followers claim
the existence of mental modules (i.e., fast-operating, reflex-like,
information-processing devices). Others (Gruber, 1985; Perkins, 1981;
Sternberg, 1993, to name a few) have investigated the concept of
creativity. Their studies show that creative individuals do not
necessarily have unique mental modules; rather, they use what they
have in efficient and flexible ways. Such individuals might be extremely
reflective about their activities, their use of time, and the quality of their
products (Gardner, 1993).
So, we need to ask: alternative to what? A case can be made in
second language testing for the alternative to the conventional ways of
assessing learners' language progress and performance. Alternative
assessment is an ongoing process involving the learner and the teacher
in making judgments about the learner's progress in language, using
some non-conventional strategies (Chen, 1995; Willis, 1996). In other
words, an assessment model, as an initiative in foreign and second language
studies, should acknowledge the effects of a number of factors on
learners' performance and provide the most appropriate accounts to
assess competence, including the factors that involve the individuals in
making self-assessments. Archibald and Newmann (1989) define
competence as:
It is the capacity to perform a range of occupationally or professionally
relevant communicative tasks with members of another cultural and
linguistic community using the language of that community, whether
that community is domestic or abroad. (p. 11)

They also call for a language testing framework to guide the
defining of competencies and
how these competencies are best measured, so as to focus scarce
resources in the most efficient manner possible on the application of
new teaching methodologies, teacher training and assessment, and
research related to language testing. (pp. 8-9)

As a solution to the longstanding need for language testing,


performance assessment has efficiently been pursued by those who wish
to move away from the traditional, standardized multiple-choice testing.
Performance assessment aims to model the real learning activities that
we wish our learners to engage with, such as oral and written
communication skills, problem-solving activities, and so forth (Gardner,
1999a). Heilenmann (1990) defines performance assessment as
[A] systematic attempt to measure a learner's ability to use previously
acquired knowledge in solving novel problems, or completing specific
tasks. In performance assessment, real-life or simulated assessment
tasks are used to elicit original responses which are directly observed
and rated by a qualified judge. (p. 7)

Most learners, however, know that sometimes they simply do not do


well on real tasks, often not because of a failure on their part to study or
prepare, but because language performance depends heavily on the
purposes for which learners are using the language and the context in
which it is done; the importance of opportunity for flexible and frequent
practice on the part of the learners cannot be overestimated (Yap, 1993).
In the real world, most of us have more than one option available to
demonstrate that we can successfully complete a task, whether at work
or in social settings. Therefore, it makes sense to provide similar
multiple opportunities for learners whose language performance is
being assessed (Gardner, 1999b).

The call for the increased use of meaningful, multiple assessment
models that set language learners free to demonstrate their knowledge
of language in a variety of tasks and to reflect on their own abilities
means that language teachers will have a wider range of evidence on
which to judge whether learners are competent, purposeful language
users (Hancock, 1994). It also means that language programs should
become more responsive to learners' differing learning styles
and value diversity therein.

1.1 Statement of the Problem and Purpose of the Study

At first glance, it may appear that task is a relatively new
term in the field of language testing. In fact, the push for
communicative competence in the 1970s, the proficiency movement in
the 1980s, and more recently the call for more performance-based
testing, has been accompanied by a concomitant emphasis among
language testers on assessments that share some features to be
considered as the core in language testing (Allwright, 1988;
Widdowson, 1990).

A move towards tasks in performance assessment poses problems


for ability-oriented proficiency testing. The most influential approaches
of this type (Bachman, 1990; Bachman & Palmer, 1996; Canale &
Swain, 1980) posit a structure for the underlying components of
competence and then propose mediating mechanisms by which such
components will have an impact upon performance. In principle, then,
performance assessment might be extremely rewarding but, in practice,
the codifying nature of the underlying competence-oriented models has
not interfaced easily with effective predictions to real-world
performances (Harley, 1998; Larsen-Freeman, 2007; Skehan & Foster,
1999).

At the most general level, the problem is that the underlying


language competence is not easily predicted across different contexts.
Moreover, language learners perform differently on different types of
tasks; moving from the underlying constructs to actual language use has
proved problematic (Gass, 2002). Furthermore, the generalizability of
learners' performance is a particular problem for performance
assessment since the direct assessment of a complex performance is not
usually generalized from one task to another; performance is heavily
task-dependent, that is, we cannot take performance on one task to
imply that the learner can do other tasks in the same domain (Johnson,
2007a). Task specificity is compounded by limited sampling from one
domain and the difficulty of generalizing the performance outcomes to
the whole domain.

Newton (2001) argues that to overcome this problem, we


should increase the number of tasks and ensure comprehensive
coverage of the domain in order to improve generalizability. In
assessment tasks, however, task length in terms of additional samples of
behavior seems critical to the reduction of chance errors in performance
assessment (Doughty, 2001). On the other hand, Skehan (2002) argues
that in relation to second language measures we need to rethink the
whole concept of consistency; "perhaps we may have to begin a
search for meaningful quality criteria [italics added] for the inclusion of
the items, rather [than] rely on a measure of internal consistency" (p. 117).

In line with these quests and questions, in this research project an


Integrative Task (IT) Model of language assessment is proposed. This
model attempts to portray the assessment event in a more
comprehensive way which (a) introduces a means of grading
competence components; and (b) clarifies how an assessment model
might be organized and validated most effectively to give an empirical
basis for the claims which are made about an individual's language
proficiency (Ghal-eh, 2007). In the IT Model, Gardner's (1983)
Multiple Intelligences (MI) Theory and Bloom's (1956) Taxonomy of
Educational Objectives are integrated into two-factor constructs
underlying a number of Integrative tasks in order to assess an
individual's language proficiency.

1.2 Significance of the Study

Gardner's (1983) MI Theory offers a definition of intelligence that


challenges the assessment of intelligence by standard psychometric tests. In
fact, Gardner (1999c) argues that because of the contextual and creative
aspects of intelligence, these tests are unable to adequately measure a
person's true ability potential.

Gardner (1983) defines intelligence as

[A] bio-psychological potential to incorporate information that can be


activated in a cultural setting to solve problems, and create the products
of value in a culture. (p. 34)

This definition stresses the interaction between the individuals'


biology and the context of their cultural environment. Intelligence, thus
defined, includes both convergent problem-solving abilities associated
with taking tests, as well as divergent cognitive abilities that influence

everyday practical behavior (Armstrong, 2003). "Such a broad concept
of intellectual potential cannot [sic] adequately be measured via short
answer problem-solving tests" (Wiggins, 1994, p. 43). The philosophy
of assessment from the MI perspective is closely in line with the
perspectives of a growing number of educators who, in recent years,
have argued that authentic measures of assessment probe learners'
understanding of material far more thoroughly than the traditional
multiple-choice or fill-in-the-blank tests. In fact, MI theory makes its
greatest contribution to language assessment by suggesting multiple
ways to evaluate learners.

Generally speaking, the biggest shortcoming of standardized tests is


that they require learners to show, in a narrowly defined way, what they
have learned. MI theory, on the other hand, supports the belief that
learners should be able to show competence in specific skills, subject
matters, or domains in a variety of ways (Lazear, 1992).

In spite of the widespread applicability of MI theory in educational


settings, especially in ESL/EFL environments, the theory has constantly
been criticized for an insufficient account of human intelligence in
terms of the underlying mental processing that contributes to
performance on specific intelligence tasks. Similar to most
psychometric theories of intelligence [Guilford, 1982; Spearman, 1927
(Cited in Sternberg, 1991); Sternberg, 1991; Thurstone, 1938 (cited in
Sternberg, 1991)], MI theory merely identifies eight latent traits or
categories of human intelligence: Linguistic, Mathematical, Musical,
Kinesthetic, Spatial, Intrapersonal, Interpersonal, and Naturalist. The
aim of this study, therefore, is to integrate an information-processing
theory with MI categories in order to construct a new psycholinguistic
model of language assessment which properly contributes to a better
understanding of the cognitive abilities involved in performing an
intelligence task. To this end, the Taxonomy of Educational Objectives:
Cognitive Domain (Bloom, 1956) has been identified as an appropriate
option.

Almost fifty years ago, Bloom (1956) published his famous


Taxonomy of Educational Objectives: Cognitive Domain, Handbook 1.
This work included a cognitive domain with six levels of complexity,
that is, Knowledge, Comprehension, Application, Analysis, Synthesis,
and Evaluation. The lower levels of the cognitive domain exert less
cognitive load compared to the higher levels. In other words, as one
moves up the hierarchy, the activities require more high-level thinking
skills (Bloom, 1956). Bloom's Taxonomy has widely been used by
educators to ensure that instruction stimulates and develops learners'
higher-order thinking capacities.

One of the consequences of Bloom's Taxonomy is that, not only do


the hierarchical levels in the taxonomy provide a framework for
formulating educational objectives, but also they serve as a means
through which evaluation tasks can be constructed (Smith, 2002).
Taxonomy of Educational Objectives provides a kind of quality control
mechanism through which a teacher may judge how deeply learners'
minds have been stirred by the assignment tasks. In the IT Model,
therefore, a number of language tasks are designed and schematized
based on eight categories of human intelligence. Next, these tasks are
graded in terms of their difficulty levels, based on the six levels of
Cognitive Domain in the Taxonomy of Educational Objectives. The
constructs underlying such language tasks, therefore, have two-variable
or two-factor qualities which are supposed to vary independently from
one another. Consequently, in the IT Model, the 48 language tasks are
called "Integrative" in order to signify the integration of these factors.

For assessment purposes, then, the researcher's concern is to


understand and model the relationship between language abilities and
language task characteristics in order to develop a framework for
designing and grading tasks so that predicting the performance
difficulty would be possible. This framework might provide a means for
making generalizations from performance on one task to the future
performance on other tasks with similar difficulty estimates. Such a
framework might also enable test users to investigate the extent to
which language learners possess the underlying abilities needed to
accomplish a range of related tasks. Addressing these goals, two
well-established doctrines, MI Theory (Gardner, 1983, 1985) and the
Taxonomy of Educational Objectives (Bloom, 1956), are utilized in
developing the Integrative Task (IT) Model of language assessment.

1.3 Research Questions and Hypotheses

In a nutshell, the purpose of this study is to develop a prototype


language assessment model for making accurate and dependable
generalizations out of the examinees' performances on Integrative
tasks. To assess learners' ability to use language in the real world, the
contents of Integrative tasks are schematized based on MI categories of
human intelligence. Addressing this goal, the researcher utilizes the
concept of task difficulty to grade the Integrative tasks based on the six
levels of Cognitive Domain proposed by Bloom (1956). Likewise, this
study tries to investigate the credibility, consistency, trustworthiness,
and predictive adequacy of the IT Model to the extent that it properly
determines the correspondence of the real difficulty levels of
Integrative tasks with their difficulty levels predicted by the Integrative
Task Scales. The initial investigations are guided by the following
research questions:
o Question I: Is there any statistically significant relationship between
Integrative Task ratings and learners' self-ratings on task performance
outcomes?
o Question II: Are Integrative Task ratings reliable? If yes,
IIa. What are Inter-Rater Reliability estimates?
IIb. What are Inter-Item Reliability estimates?
o Question III: Are Integrative Task ratings valid? If yes,
IIIa. What are Content Validity estimates?
IIIb. What are Criterion-Related Validity estimates?
IIIc. What are Construct Validity estimates?
o Question IV: Is there any statistically significant relationship between
the real difficulty levels of Integrative tasks and the difficulty levels
predicted by the Integrative Task Scales?
The main concern of Question I is to investigate the credibility of
the examinees' performances on Integrative tasks by comparing the
quantitative data obtained from Integrative Task ratings with the
qualitative data collected from individual examinees' self-ratings. The
main focus of Questions II and III is evaluating aspects of test
usefulness. In other words, the researcher intends to investigate various
reliability and validity estimates of the prototype IT Model through a
number of statistical measurements. Finally, to investigate the
predictive adequacy of the six levels of Cognitive Domain utilized in
constructing the Integrative Task Scales, the researcher looks for any
possible correspondence between the actual and the predicted levels of
task difficulty in Question IV. Thus, the following null hypotheses are
formulated based on the above-mentioned research questions:
o Null Hypothesis I: There is no statistically significant relationship
between Integrative Task ratings and learners' self-ratings on task
performance outcomes.
o Null Hypothesis II: Integrative Task ratings are not statistically
consistent in terms of (IIa) Inter-Rater and (IIb) Inter-Item reliability
estimates.
o Null Hypothesis III: Integrative Task ratings are not statistically
trustworthy in terms of (IIIa) Content, (IIIb) Criterion-Related, and
(IIIc) Construct validity estimates.
o Null Hypothesis IV: There is no statistically significant relationship
between the real difficulty levels of Integrative tasks and the difficulty
levels predicted by the Integrative Task Scales.
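Questions I and IV both rest on the Pearson product-moment correlation coefficient, the r listed in the Abbreviations. As a rough illustration only, with ratings invented purely for demonstration (they are not the study's data), the computation might be sketched as:

```python
import math

def pearson_r(x, y):
    # Pearson product-moment correlation coefficient: the "r" used
    # throughout the book's reliability and validity analyses.
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical scores, invented for illustration: one rater's Integrative
# Task ratings and the same examinees' self-ratings on a six-point scale.
task_ratings = [2, 3, 3, 4, 5, 5, 6]
self_ratings = [1, 3, 2, 4, 4, 6, 6]
print(pearson_r(task_ratings, self_ratings))
```

A value near +1 would support rejecting Null Hypothesis I; a value near 0 would support retaining it.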

1.4 Definition of Key Terms

Assessment
Assessment is usually conducted where a learner is asked to
perform a real-world task that demonstrates a meaningful
application of essential knowledge and skills (McLaughlin, 1987;
Morgan, 1996). According to Morgan (1996, pp. 78-9), whether
the teacher is assessing the learners' achievements on a classroom
level or an institutional level, she has to perform the quality
assessment which generally is
1. Systematic and ongoing; assessment programs should be an
organized and open method of acquiring evidence over time.
2. Cumulative; assessment efforts should build a body of evidence that
can be used for improvement.
3. Multi-faceted; evidence should be obtained on multiple dimensions,
using multiple methods and multiple sources.
4. Pragmatic; assessment information should be used to improve the
educational environments.

Assessment Tasks

Performance assessment is a measure of assessment based on such


authentic tasks as activities, exercises, or problems that require learners
to show what they can do (Christison, 1996). Generally speaking,
assessment tasks are designed to have learners demonstrate their
understanding by applying their knowledge to a particular situation
(Farrar, 1992). Assessment tasks often have more than one acceptable
solution; they may require a learner to create a response to a problem
and then explain or defend it. The process involves the use of
higher-order thinking skills, for instance, cause-and-effect analysis,
deductive or inductive reasoning, experimenting, or problem solving
(Nunan, 1991). Assessment tasks may be used primarily for assessment
at the end of a period of instruction but are frequently used for teaching,
as well as assessment.
Intelligence
In this study, Gardner's (1983) notion of human intelligence has been
adopted as follows:
Intelligence is an intellectual component that must entail a set of
skills of problem-solving, enabling the individual to resolve
genuine problems or difficulties--everyday or educational--that he or
she encounters, and when motivated, to create an effective product
and must also entail the potential to interact--verbally or
nonverbally--with other inhabitants of a cultural setting. (p. 56)

The Integrative Task (IT) Model


The Integrative Task (IT) Model is a prototype model of language
assessment. In this model, a matrix of forty-eight language tasks is
designed in six subscales, based on the six levels of the Taxonomy of
Educational Objectives: Cognitive Domain. These levels, or subscales,
are Knowledge, Comprehension, Application, Analysis, Synthesis, and
Evaluation. Likewise, each subscale consists of eight Integrative tasks.
The content of Integrative tasks is schematized based on the eight
categories of Multiple Intelligences, that is, Linguistic, Mathematical,
Musical, Kinesthetic, Spatial, Intrapersonal, Interpersonal, and
Naturalist.
Technically speaking, Integrative tasks are vertically schematized
based on the eight categories of human intelligence and are horizontally
graded based on the six levels of Cognitive Domain (See the Checklist
of Integrative Task Specifications, p. 129).
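The vertical (intelligence category) by horizontal (cognitive level) organization just described can be sketched in a few lines. The labels follow the book's own abbreviation list, except Spa for Spatial, which does not appear in that list and is assumed here for completeness:

```python
# Sketch of the Integrative Task matrix: 8 MI categories crossed with
# 6 Cognitive Domain levels yields the 48 Integrative tasks.
# "Spa" (Spatial) is an assumed abbreviation; the rest follow the book.
MI_CATEGORIES = ["Ling", "Math", "Mus", "Kin", "Spa", "Intra", "Inter", "Nat"]
COGNITIVE_LEVELS = ["K", "C", "Ap", "An", "Syn", "Eva"]  # Knowledge ... Evaluation

# Each task label pairs a category (vertical axis) with a difficulty
# level graded by the Cognitive Domain (horizontal axis), e.g. "Ling/K".
tasks = [f"{mi}/{level}" for mi in MI_CATEGORIES for level in COGNITIVE_LEVELS]

print(len(tasks))  # 48
```

The two axes vary independently, which is precisely the "two-factor" quality the model claims for its constructs.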
Model

A model is a theoretical construct that represents an entity through
a set of variables and the logical and quantitative
relationships among those variables (Po-Ying, 1999). Models, in this
sense, are constructed to enable reasoning within an idealized logical
framework about such relationships and, therefore, are an important
component of scientific theories (Nicholas et al., 2001). Idealized here
means that the model can make explicit assumptions that are known to
be false (or incomplete) in some details. Such assumptions may be
justified on the grounds that they simplify the model while, at the same
time, allowing the production of acceptably accurate solutions
(MacWhinney, 2000a).

Performance Assessment

Performance assessment, also known as alternative, or authentic


assessment, is a form of testing that requires learners to perform a task
rather than to select an answer from a ready-made list (Nemeth &
Komos, 2001). For example, a learner may be asked to explain
historical events, generate scientific hypotheses, solve math problems,
converse in a foreign language, or conduct research on an assigned
topic. In assessing learners' performances, experienced raters, either
teachers or trained staff members, usually judge the quality of the
learners' work based on an agreed-upon set of criteria (Skehan &
Foster, 2001).
Task
According to Richards and Renandya (2002), a task is an activity in
which (a) the focus is on process rather than product; (b) the basic elements
are purposeful activities that emphasize communication and meaning;
and (c) learners learn language by interacting communicatively and
purposefully while engaged in real-life activities.

Integrative Task Scales


In this study, for the purpose of grading the Integrative tasks, the
researcher has utilized the concept of task difficulty based on the
cognitive processing demands identified by Bloom (1956). In the six-
point Integrative Task Scales, the levels of Integrative task difficulty are
not graded based on examinees' ability to produce language content
needed to perform a task; rather they are scaled independently based on
the six hierarchical levels of
Cognitive Domain, that is, Knowledge, Comprehension, Application,
Analysis, Synthesis, and Evaluation. Therefore, Integrative Task Scales
are examples of task-independent rating scales (See Brown et al.,
2002).

Multiple Intelligences (MI) Theory


Gardner (1983, 1993) argues that human beings possess a number
of distinct intelligences that manifest themselves in different skills and
abilities. All human beings apply these intelligences to solve problems,
invent processes, and create novelties. According to the MI theory,
Intelligence is the ability to apply one or more of the intelligence
categories in the ways that are valued by a community or a culture
(Kornhaber, 2001; Rodgers, 2001). The current MI theory outlines
eight intelligence categories: Verbal-linguistic, Rhythmic-musical,
Logical-mathematical, Visual-spatial, Bodily-kinesthetic,
Social-interpersonal, Solidarity-intrapersonal, and Naturalist. Central to
each category is the existence of one or more basic mechanisms
dealing with specific behaviors. These basic information-processing
operations are called "core operations" to locate their neutral quality in
terms of different intelligence categories and to prove that the cores are
indeed separate (Gardner, 1983, p. 79).

Taxonomy

Taxonomy is the practice and the science of classification.
Taxonomies, or taxonomic schemes, "are composed of taxonomic units
or kinds of things that are frequently arranged in a hierarchical
structure" (Fotos, 1994, p. 98).

The Taxonomy of Educational Objectives

The Taxonomy of Educational Objectives, often called Bloom's
Taxonomy, is a classification of different objectives and skills that
educators set for their students. Bloom's Taxonomy divides
educational objectives into three domains: Cognitive, Affective, and
Psychomotor. Within each domain there are different levels of learning,
with higher levels considered more complex and closer to complete
mastery of the subject matter. The Cognitive Domain consists of six
hierarchical levels of thinking: Knowledge, Comprehension,
Application, Analysis, Synthesis, and Evaluation (Celce-Murcia, 1991).
1.5 Delimitations and Limitations of the Study

Similar to most studies in the field of social sciences, this research
project is subject to a number of delimitations and limitations. A
number of delimitations are set in advance to enhance the reliability
estimates of test outcomes: First, in validating the interpretations made
on the examinees' scores, the association between two standard tests of
CELT and Integrative Task ratings, on the one hand, and MIDAS and
Integrative Task ratings, on the other hand, are explored. However, the
probable associations between Integrative Task ratings and real-world
language use (i.e., outside of the testing context) are not examined.

Second, since the Integrative Task (IT) Model is not intended to
serve as an operational assessment framework, and it is not tied to
any particular curriculum or circumstances for testing purposes, the
consequential aspects of test use are not examined. The researcher's
justification is mainly that it is beyond the scope of a single
investigation to address all necessary questions in test validation.

Perhaps the most serious shortcoming of the research methodology
in this study has to do with the potentially time-consuming and
labor-intensive quality of data collection. This problem is the main cause
of a number of limitations: First, thoroughly supervised, the researcher
confines herself to collecting data from one group of examinees in only
one educational setting. Second, Integrative Task modality is limited to
the paper-and-pencil type, with the English language as the medium of
task performance. Third, the technique of "think aloud," or verbal
protocol, is not employed to detect the real processing components
underlying examinees' performances, in favor of a one-shot
paper-and-pencil assessment session. Admittedly, the researcher
recognizes that the power of the current study would be considerably
higher if she had done so. Ample results from her sample of 200 English
translation undergraduates, however, support the idea that the test
power is sufficient for the various analyses conducted on reliability
and validity estimates. Finally, in order to tackle the problem of
"task finiteness" (Coustan & Rocka, 1999), or the accountability of task
number and task type in a test battery, the researcher determines to
design only one Integrative task for each MI category (n=8) at each
level of Cognitive Domain (n=6).

In a nutshell, the Integrative Task (IT) Model is a prototype model of
task-based language assessment in which the two theories of Multiple
Intelligences and the Taxonomy of Educational Objectives: Cognitive
Domain are integrated into the two-factor constructs underlying 48
language tasks.

CHAPTER II

Review of the Related Literature

Hide not your talents, they for use were made.
What's a sundial in the shade?

Benjamin Franklin

Chapter II is divided into four parts: Part I starts with an overview
of the roots of language assessment, in general, and performance
assessment, in particular. Discussing task-based performance
assessment, Part I outlines the distinctive features of language
assessment tasks to compare them with language teaching tasks.
Part II starts with the notion of intelligence in the Multiple
Intelligences (MI) theory, and the impact of this notion on foreign
language teaching and testing. Moreover, Part II investigates MI theory
potentials in providing an opportunity for developing language
proficiency tasks in terms of the multidimensionality it grants to tests
and measures of language assessment.
Part III starts with an introduction to the Taxonomy of Educational
Objectives: Cognitive Domain, suggesting that the potential of MI
theory for developing language tasks might increase if one were
able to account for the real processing components involved in
language task performance. The integration of these two theories, therefore,
is considered as a solution to one of the shortcomings of MI
theory for which it is commonly blamed by its critics.
Finally, Part IV introduces the Integrative Task (IT) Model as an
alternative framework for assessing language proficiency. Part IV ends
with a detailed discussion of the significant features of the IT
Model.

Part I Language Performance Assessment


Summarizing recent language teaching research, Prabhu (1987)
posits that the primary objective of language instruction is to bring
about lasting change in the learner's abilities to spontaneously use
language. Such prioritization of performance ability in the outcomes of
second language instruction can also be seen, for example, in the
abiding influence of global proficiency models [American Council on
the Teaching of Foreign Languages (ACTFL), 1999], and the
Proficiency Movement in U.S. foreign language education.
As language instruction increasingly focuses on educational
outcomes, so does language assessment increasingly focus on evaluating what
learners can do with language. As such, language testing research has
recently turned its attention to the development of various approaches to
second language performance assessment (Brown et al., 2002;
Skehan, 1996).
Performance assessments have their roots in the notion of machine
performance, which has typically addressed how reliably and durably a
particular machine performs under normal operating conditions. There
is a short step from asking such a question about machine performance
to asking similar questions about people and their job performance
(Ellis, 2000; Foster, 2001). For many testing purposes, language
performance assessment may not be very different from job
performance assessment. Language performance assessment provides a
means for eliciting learners' performances on tasks that require the use
of second or foreign language in some integral way, and such
assessment may enable meaningful interpretations to be made about
learners' abilities to use the second language for actual communication
(Skehan, 1996; Yuan & Ellis, 2003).

2.1 Language Testing Alternatives
One way of associating language performance with approaches to
language testing is to consider the kinds of actions that are required of
learners on associated language testing tools. Brown et al. (2002) argue
that language test users have a number of such testing options at their
disposal. Accordingly, they classify language testing options into three
categories based on the types of response required of learners:
"Selected-responses, Constructed-responses, and Personal responses"
(p. 32).
1. Selected-response: Such test types may include a variety of true-false,
multiple-choice, and matching formats. Generally speaking, selected-
response tests have the advantage that they are relatively fast to
administer, quick to score, accurate, easy, and relatively objective
(Murphy, 2003). Selected-response tests can prove useful, for example,
when test users want information about learners' knowledge of
vocabulary, grammar, sound contrasts, or the receptive skills of reading
and listening (Gass, 2002).
However, such tests have the disadvantage that they are relatively
difficult to construct and require no productive language use. Arguably,
therefore, they may be at best only indirectly related to many of the
interpretations and decisions that are often based on them in second
language education contexts. In addition, they may not reflect the
instructional emphases within a language classroom or program.
2. Constructed-response: Constructed-response tests are based on
various types of fill-in, short-answer, and performance formats. As a
group, constructed-response tests have the advantage that they minimize
the effects of guessing, require productive language use, and make
possible the measurement of interactions between receptive and
productive skills (Samuda, 2001). Constructed-response tests can prove
useful for assessing the productive skills of writing and speaking or
their interactions with other skills (Skehan, 2002). Furthermore, since
they are often more closely aligned with second language pedagogy and
curricular objectives than are selected-response test types, second
language educators usually tend to utilize constructed-response tests in
order to better envision learners' abilities to actively use language in
meaningful communication (Murphy, 2003).
Unfortunately, constructed-response tests suffer from the
shortcoming that bluffing is possible; that is, a learner who is clever
enough can construct a linguistically sophisticated and well-organized
response that does not actually accomplish the task at hand but gets
partial or even full credit. Moreover, administration and scoring are
time-consuming, and scoring is both difficult and somewhat subjective.
In addition, as learners must generate the language that gets evaluated
on such tests, test methods (i.e., item directions, learner familiarity
with the test format, etc.) may unpredictably influence test scores.
3. Personal-response: Personal-response tests include conferences,
portfolios, and self- and peer-assessment formats, to name a few. These
types of assessment have the benefits of stimulating personal
communication, tending to be easy to integrate into a language
curriculum and associated pedagogical interventions, and enabling the
assessment of learning processes and products. As a result, they may
promote positive washback effects on instruction. Personal-response
assessment has proved useful for motivating learners to use the language,
individualizing assessment of learner progress and achievement,
incorporating the learner into the assessment process, assessing the
interaction of all four language skills, and tapping into higher-order
organizational and thinking abilities. In addition, personal-response test
types may serve as worthwhile learning activities in their own right.
However, personal-response tests have the demerits of being
difficult to plan and organize, difficult for learners to produce, and
difficult to score (perhaps because judgments are subjective, and test
methods and products may differ radically between learners,
classrooms, or programs). In addition, the intended role of
personal-response activities as pedagogical tools versus assessment tools
may often remain unclear in the minds of both teachers and learners if
care is not taken to designate the specific purposes for a given activity.
As Brown et al. (2002) discuss, each of these three different
categories of language testing options may prove useful and appropriate
for various assessment purposes in language education contexts.
However, within recent language testing research and practice, one can
see a special role emerging for performance assessment, in particular.
Therefore, constructed-response tests, with all of the associated
advantages and disadvantages listed above, have become a
contemporary language testing research and development priority
(Dornyei & Kormos, 2001; Skehan, 2002).
2.2 Language Performance Assessment in Use

Fundamental to second language performance assessment is the
direct observation and evaluation of learners using the second language
when engaged in extended acts of communication. As summarized by
Robinson (2001), language performance assessment of all sorts utilizes
test instruments and procedures which consistently (a) elicit meaningful
second or foreign language task performance; (b) elicit sufficient
amounts of second or foreign language communication so that
trustworthy interpretations can be made about learners' language
abilities; and (c) provide accountable data for the systematic and
meaningful, though subjective, evaluation of the communicative
performances of learners.
Although general, these characteristics are significant in
distinguishing what we consider as performance assessment from a
range of other kinds of assessment which do not focus on the elicitation
of extended stretches of meaningful communication and which are not
used for the evaluation of learners' abilities to use second or foreign
language. From this point of view, thus, language performance
assessment incorporates more traditional types of assessment, such as
essay writing, oral interviews, and written or oral narration, as well as a
variety of more recently-used assessment formats, such as interactive
pair and group tests, individual planned presentations or extended
compositions, and so forth (Hughes, 1998; Robinson, 2001).
Approaches to performance assessment may widely vary according
to the methods used for eliciting language performance, the types and
qualities of meaningful communication observed, and the ways in
which performance is evaluated. These variations depend, in turn, on
different purposes associated with performance assessment in language
education contexts. As Guignon (1998) states, they initially take into
account:
1. Who is using the test-based information?
2. What kinds of interpretations are being made?
3. What decisions or actions are being taken? and,
4. What educational and social consequences are associated
with the use of the test?

In general, as outlined by Brualdi (1996), three primary
interpretative purposes motivate performance assessment within most
language classrooms and programs: First, performance assessment may
be used for establishing whether or not learners can accomplish specific
target tasks that are directly related to learner, curricular, or professional
objectives. Second, performance assessment may be used for evaluating
various qualities of learners' language ability, including specific
qualities such as accuracy, complexity, or fluency of second language
production (Skehan, 2002), as well as, such holistic qualities as general
proficiency, and communicative competence. And, third, performance
assessment may be used for making interpretations about particular
aspects of language learning that are (or are not) occurring within
language classrooms and programs.
These three types of interpretive demands, commonly found in
second language education contexts, may all be met by differing
approaches to performance assessment (Yap, 1993). In addition, based
on such interpretations, a wide range of related decisions and actions
may be taken by teachers, learners, and those interested in the outcomes
of performance assessment. Second language performance assessment,
for example, may lead to the certification of job qualifications that
require the ability to accomplish particular tasks by using the second
language. Decisions may also be made about the admission and
placement of learners with respect to programs of study (e.g., through
the university entrance examinations).
Similarly, learners' performances on achievement measures may
figure centrally in the evaluation and revision of the curricular
objectives in a language program. The effectiveness of pedagogic
practices may also be evaluated with the help of performance
assessment outcomes (e.g., for the purpose of improving day-to-day
instructional effectiveness). Occasionally, performance assessment may
also be implemented for evolving a positive washback effect on
language pedagogy (e. g., in order to gradually shift language teaching
away from grammar drills towards meaningful communication).
In a nutshell, performance assessment seems to be a necessary and
desirable evaluative measure in many contemporary language education
contexts. It certainly meets the interpretive and decision-making needs
of teachers and educational researchers by providing them with
instruments for the evaluation of learners' performances on meaningful
communicative activities, especially in language classrooms and
programs that focus on enabling learners to engage in language use.

2.3 Language Performance Assessment: Pros and Cons


In addition to the previously mentioned strengths, several specific
benefits are often associated with the use of language performance tests.
According to Christison (1998, 1999), language performance tests are
commonly designed in order to (a) simulate authentic,
contextualized language use with high fidelity; (b)
compensate for the negative effects often associated with the traditional
standardized testing; and (c) initiate positive washback effects on
language pedagogy and curriculum design and development.
However, the literature presents numerous potential disadvantages or
problems with performance assessment (Breen, 1984; Celce-Murcia &
Larsen-Freeman, 1990; Larsen-Freeman & Long, 1991, to name a few).
In general, it has been observed that performance tests, like many other
test types, require needs analysis, input from content experts,
cooperation and coordination among teachers and test developers, and
the participation of learners and other test score users in the development
process (Mitchell & Myles, 1998).
In addition, as Sternberg (1991) states, performance tests typically
(a) require more time and resources to administer and score than other
test types do; (b) are accompanied by a variety of logistical problems
(e.g., transporting and storing materials and realia); (c) may cause
formidable reliability problems (due to both test administration and
scoring inconsistencies); (d) may only lead to very restricted kinds of
test-based interpretations; and (e) can face increased test security risks
(because of the difficulty of creating numerous tasks and the ease with
which learners can communicate).
Notwithstanding such disadvantages, performance
assessment seems essential to meet the kinds of assessment demands
that are increasingly associated with second language education
contexts (Skehan, 2002). The central issue to be dealt with before
engaging in performance assessment is to develop test instruments,
testing procedures, and performance scoring practices that can lead to
trustworthy, accurate, consistent, and useful interpretations
(Larsen-Freeman, 1997). In order to do so, test developers must first
specify particular interpretations to be made on the basis of
performance assessment. Next, they can select appropriate test tasks,
initiate consistent testing procedures, and develop adequate rating or
scoring criteria (Ghal-eh, 2007).
Generally speaking, the steps taken in test development will
extensively depend on the specificity (or generality) of intended
interpretations, the range of different interpretations, and the particular
focus of those interpretations. Hence, the types of performance tests and
forms of scoring practices will vary considerably in order to meet
differing interpretive demands. In addition, the range of development
and implementation problems listed above (especially time and resource
demands) will influence the shapes that performance tests take (Bygate,
2001). As such, idealized test instruments and procedures for
performance assessment, which provide highly accurate and reliable
measures of specific language performance abilities, may never actually
be realized (Cook, 2000).
In short, the purpose of performance assessment is to provide a
trustworthy benchmark for making critical interpretations or
complicated decisions in academic or job-training settings. However, it
should always be remembered that the best information a performance
test provides will be an estimate of a learner's actual abilities (Brown et
al., 2002).

2.4 Performance Assessment Tasks


Performance assessment may render a range of interpretations
necessary for different purposes in various language education contexts
(Ellis, 2000). In turn, the intended interpretations play a central role in
determining which language testing alternatives can most appropriately
be employed.
Recently, educators' attention has turned to task-based
approaches to language performance assessment, which provide new
types of information about learners' abilities to use the second or foreign
language (Brown et al., 2002; Robinson, 2002; Skehan, 2002, to name a
few). Due to the absence of adequate investigations of task-based
approaches, however, the literature reflects predictably inexact uses of
terms such as task-based and task-centered assessment. These notions
have, in some instances, also been used interchangeably with such
terms as performance-based or performance/performative assessment
(NEAToday Online TOPIC, 1999). Therefore, it can be helpful if a
clear distinction is drawn between task-based and other forms of
performance assessment.
Technically speaking, task-based language assessment represents a
specific approach to interpreting learners' abilities by using a particular
set of methods for the elicitation and evaluation of learners' language
performances. Accordingly, Larsen-Freeman's (2001) definition of
task-based assessment is adopted as follows:
Task-based language assessment takes the task itself as the fundamental
unit of analysis motivating item selection, test instrument construction,
and the rating of task performance. Task-based assessment does not
simply utilize the real-world task as a means for eliciting particular
components of the language system which are then measured or
evaluated; on the contrary, the construct of interest in task-based
assessment is performance of the task itself. (p. 90)

Briefly stated, in task-based language assessment, we are interested
in eliciting and evaluating learners' abilities to accomplish particular
tasks in which target language communication is essential (Hulstijn,
1989). Such assessment is obviously performance assessment because a
learner's language performance on the task is particularly evaluated. As
Pinnemann (1998) properly argues, this restricted definition of
taskbased performance assessment has some implications for
(a) educational contexts in which such assessment is used;
(b) test methods appropriate for such assessment; and
(c) performance evaluation methods and interpretive criteria related
to such assessment.
Particularly, where language instruction takes the development of
learner abilities to accomplish real-world communication tasks as its
goal, the measurement of instructional outcomes often takes the form of
task-based performance assessment. For example, task-based
assessment might prove appropriate in educational contexts where the
focus is on (a) academic- or specific-purposes instruction; (b)
content-based instruction; and (c) task-based language proficiency
assessment (See Breen, 1998; Byrnes, 2001; Skehan & Foster, 2002).
In such contexts, specific real-world tasks often provide curricular
and classroom instructional and evaluation objectives. Brown et al.
(2002) define real-world language tasks as "those activities that people
do in everyday life which require language for their accomplishment"
(p. 33). Where such tasks are prioritized, various test score users (e.g.,
teachers, learners, administrators, employers, etc.) will want to know, at
some point, whether or not learners can accomplish a necessary task. In
order to make such interpretations, task-based assessment is essential.
As Larsen-Freeman and Long (1991) explain, task-based methods of
assessment require learners to engage in some sort of behavior which
simulates, with as much fidelity as possible, the goal-oriented target
language use outside the language test situation. Performance on such
tasks is usually evaluated according to some pre-determined, real-world
criterion elements (i.e., task outcomes or processes). Interpretations
about task performance, then, are based on those criteria associated with
real-world expectations for task accomplishment. Further
inferences typically lead to program- or
profession-related actions, such as certification, achievement,
qualification decisions, entrance or exit decisions, or mastery decisions
(Lynch, 2001).
In accordance with such interpretations and uses, task-based
assessment prioritizes the simulation of real-world tasks, and associated
situational or interactional characteristics, wherein communication
plays a central role (See Bachman & Palmer, 1996). Moreover,
task-based assessment emphasizes the evaluation of task performance
according to whether or not, or to what extent, learners are able to
accomplish the task according to real-world criterion elements and
expectations. In order to find out whether or not learners are able to
accomplish curriculum-, program-, profession-, or learner-relevant
tasks, task-based performance assessment seems essential (Lantolf,
2000).
By way of contrast, other forms of performance assessment do not
necessarily focus on the accomplishment of particular tasks or task
types. Rather, a variety of approaches to performance assessment may
simply utilize particular tasks which are then evaluated according to
some criteria that may not be directly related to the tasks themselves
(e.g., criteria found in global language proficiency rating scales, second
language developmental sequences, language production characteristics
like accuracy, complexity, or fluency, etc.).
In short, assessment tasks are selected or designed in order to
promote particular language behaviors which are to be evaluated. Such
tasks thus provide the means through which language behavior or
knowledge is elicited so that interpretations about such abilities or
knowledge can be easily made. As such, assessment tasks prioritize test
method characteristics that may be indirectly related to the actual
conditions associated with the performance and accomplishment of
tasks. Apparently, the primary purpose of assessment tasks is to
facilitate interpretations about learners' abilities to accomplish particular
tasks. Therefore, the identification and specification of real-world
criteria for task accomplishment constitute the most important step in the
development of task-based testing procedures.
Real-world assessment criteria, however, are not always sufficiently
clear to allow teachers to make unambiguous judgments about
performance; the criteria in a task-based assessment system are not
specific enough for assessment purposes (Lynch & McLean, 2001). In
addition to the problem of stating a clear-cut set of real-world criteria,
there is some concern over the problem of generalizability; since direct
assessments of complex performance do not generalize well from one
task to another (perhaps because performance is heavily task-dependent),
performance on a task does not necessarily imply that the learner could
do similar tasks in the domain. This task specificity is compounded by
limited sampling from a domain and the difficulty of generalizing
performance to the whole domain.
Block (2001) argues that to overcome these problems one should
increase the number of tasks and ensure comprehensive coverage of the
domain in order to improve generalizability. Brown et al. (2002)
recommend the use of a wider range of assessment tasks and activities,
together with reemphasis of assessment for professional purposes rather
than for accountability or achievement records. Bruton (2002), on the
other hand, argues that, in relation to language assessment, we need to
rethink the very concepts of accuracy and consistency: "Perhaps we
may have to begin a search for meaningful quality criteria for the
inclusion of test items, rather than rely on a measure of internal
consistency" (p. 114). Similarly, Nakahama et al. (2001) call for the
development of alternative models of performance assessment for
warranting generalizability, internal consistency and construct validity
estimates.
As an alternative, in the current study, the long-standing problem of
generalizability is dealt with by using the potential of Multiple
Intelligences Theory in terms of "a grid design approach" (Skehan,
2002). In other words, the real-world task domains are specified in
advance and assessment tasks are created to systematically represent the
critical aspects of these domains based on the two well-established
doctrines: Multiple Intelligences core operations and hierarchical levels
of Cognitive Domain in the Taxonomy of Educational Objectives.
Primarily, the language contents of assessment tasks are schematized
based on the MI core operations and, secondarily, the assessment tasks
are graded according to the difficulty levels of Cognitive Domain.
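The grid design described above amounts to crossing two fixed lists. Purely as an illustration (the category and level names are taken from this study, while the dictionary structure and the task labels are hypothetical), the resulting 48-cell specification can be sketched as:

```python
from itertools import product

# The eight Multiple Intelligences categories (Gardner, 1983) and the
# six Cognitive Domain levels (Bloom, 1956), as named in this study.
MI_CATEGORIES = [
    "Linguistic", "Mathematical", "Musical", "Kinesthetic",
    "Spatial", "Intrapersonal", "Interpersonal", "Naturalist",
]
COGNITIVE_LEVELS = [
    "Knowledge", "Comprehension", "Application",
    "Analysis", "Synthesis", "Evaluation",
]

# One Integrative task per cell of the grid: 8 x 6 = 48 tasks, each
# schematized by an MI category and graded by a Cognitive Domain level.
task_grid = {
    (category, level): f"Integrative task: {category} / {level}"
    for category, level in product(MI_CATEGORIES, COGNITIVE_LEVELS)
}

print(len(task_grid))  # 48 task specifications
```

Each Cognitive Domain subscale then contains exactly eight tasks, one per intelligence category, mirroring the vertical schematization and horizontal grading described earlier.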
Introducing MI theory, Part II starts with an elaborate discussion of its
conceptual configurations.

Part II Multiple Intelligences Theory

Multiple Intelligences (MI) Theory was first published in Gardner's
Frames of Mind (1983) and quickly became established as a classical
model for understanding, teaching, and evaluating many aspects of
human intelligence, learning style, personality and behavior in
education and industry. Gardner initially developed his ideas of
multiple intelligences as a contribution to psychology; however, his
theory was soon embraced by the education, teaching, and testing
communities.

The Multiple Intelligences philosophy of assessment is closely in
line with the perspectives of a growing number of leading educators
who have recently argued that authentic measures of assessment probe
learners' understanding of material far more thoroughly than
multiple-choice or fill-in-the-blank tests (Gardner, 1995). Such
standardized measures almost always assess learners in artificial
settings far removed from the real world (Fodor, 1983). Perhaps, the
biggest shortcoming of standardized tests is that they require learners to
show, in a narrowly defined way, what they have learned. On the
contrary, authentic assessment allows learners to show what they have
learned in the context or in a setting that closely matches the real-life
environment in which they would be expected to show that learning.
MI theory provides its greatest contribution to assessment in
suggesting multiple ways to evaluate learners. MI theory supports the
belief that learners should be able to show competence in a specific
skill, subject, content area, or domain in a variety of ways. Moreover, just
as MI theory suggests that any instructional objective can be taught in at
least eight different ways based on eight categories of human
intelligence, so does it imply that any subject can be assessed in at least
eight different ways.
Accordingly, in an educational context, learners might be tested in
any number of ways, for example: (a) Learners could be exposed to all
eight performance tasks in an attempt to discover the area(s) in which
they are most successful; (b) Learners might be assigned a performance
task based on the teacher's understanding of their most developed
intelligence; and (c) Learners themselves could choose the manner in
which they would like to be assessed.

2.5 Evidence in Favor of Multiple Intelligences Theory

2.5.1 Educational Evidence


According to Smith (2002), though MI theory has not been readily
accepted in academic psychology, it has met with strongly positive
responses from many educators. It has been embraced by a range of
educational theorists and significantly applied by teachers and policy
makers to the problems of schooling. Recently, a large number of schools
in North America have sought to structure their curricula around valued
learner intelligence performances and to design classrooms, and even
whole schools, to reflect the understanding that MI theory has
developed. In addition, MI theory has been introduced to pre-schools,
higher vocational, and adult education initiatives (Ellis, 2000). Kornhaber
(2001) identifies a number of reasons why teachers and policy makers
respond positively to MI theory:
The theory validates educators’ everyday experience that the learners
think and learn in many different ways. It also provides educators with a
conceptual framework for organizing and reflecting on curriculum,
assessment, and pedagogical practices. In turn, this reflection has led many
educators to develop new approaches that might better meet the needs of
the range of learners in the classroom. (p. 229)
Similarly, Guignon (1998) points out that when MI theory burst onto
the scene, it seemed to answer many questions for experienced teachers. She further
mentions that:
We all had learners who did not fit the model. We knew that the learners
were bright, but they didn’t excel on tests. Gardner’s claim that there are
several kinds of intelligence gave us and others, involved with teaching
and learning, a way of beginning to understand those learners. We would
look at what they could do well, instead of what they could not do. (p. 1)
In addition, Armstrong (2003) outlines the following key points
that educators find attractive about MI theory:
1. Each person possesses all eight intelligences. In each person,
however, the eight intelligences function together in unique ways.
Some people have high levels of functioning in all or most of the
eight intelligences; a few people lack most of the rudimentary
aspects of an intelligence.
2. Intelligence can be developed. MI theory suggests that everyone
has the capacity to develop all eight intelligences to a reasonably
high level of performance with appropriate
encouragement, enrichment and instruction.
3. Intelligences work together in complex ways. No intelligence
really exists by itself in reality. Intelligences are always interacting
with one another.
4. There are many different ways to be intelligent. There is no
standard set of attributes that one must have in order to be
considered intelligent.

2.5.2 Application Evidence


As Guignon (1998) argues, MI theory has made great changes in the
way some educators teach; the psychological, educational, and
theoretical perspectives of MI arise from the assumption that learners
are active contributors to the learning process, though they need
support and facilitation suited to their capacity and power. Similarly,
Armstrong (2003) points out that one of the most remarkable features
of MI theory is "how to provide eight different potential pathways to
learning" (p. 23). He further mentions that MI theory facilitates
effective learning if a teacher has difficulty with teaching a learner in
the more traditional linguistic or logical way of instruction:
You don’t have to teach or learn something in all eight ways; just see what
the possibilities are, and then decide which particular pathways
interest you the most, or seem to be the most effective teaching or
learning tools… [T]he theory of multiple intelligences is so intriguing,
because it expands our horizon of available
teaching/learning tools beyond the conventional linguistic and logical
methods used in most schools (e.g., lecture, textbook, writing
assignment, formulas, etc.). (pp. 27-8)

Armstrong (2003) further mentions some outstanding benefits
of MI theory in application. His ideas can be summarized as follows:
o Providing a variety of curricular options to give opportunities to both
the teacher and learners in order to uncover their own strengths
and interests;
o Providing a variety of activities or entry points to develop
understanding or learning skills and allowing learners to learn in the
ways they are most comfortable;
o Expanding and analyzing instructional strategies and media based on
learners' multiple intelligences and areas of strengths and weaknesses
observed in the classroom;
o Assessing learners’ intelligences toward developing educational
activities; informal assessments based on observations, learner
checklists, portfolios, and questionnaires are some classroom activities
that provide a context to collect invaluable information about learners’
areas of competence;
o Expanding assessment options to allow for the learners’ use of areas
of strength in demonstrating their learning; analogous to providing
curricular options, giving learners options for showing their learning
allows them to use ways that are comfortable and through which they
can experience success.

2.5.3 Biological and Cognitive Evidence


According to Gardner (1983), from the biological and cognitive
perspectives, there are eight criteria in favor of MI theory:
1. The potential isolation by brain damage: The extent to which a
particular faculty can be destroyed, or spared, in isolation as a result
of brain damage has proved its relative autonomy from other human
faculties; the consequences of such injuries may well constitute the
single most instructive line of evidence regarding those distinctive
abilities or computations that lie at the core of a human intelligence.

2. The existence of idiot savants, prodigies and other exceptional
individuals: Second to brain damage in its persuasiveness is the
discovery of an individual who exhibits a highly uneven profile of
abilities and deficits. In the case of a prodigy, we encounter an
individual who is extremely precocious in one (or, occasionally in
more than one) area of human competence. In the case of the idiot
savant (and other retarded or exceptional individuals, including
autistic children), we behold the unique sparing of a particular human
ability against a background of mediocre or highly retarded human
performances in other domains.

3. An identifiable core operation or set of operations: Central to the
notion of intelligence is the existence of one or more basic core
operations or mechanisms, each one dealing with specific kinds of
input. One might go so far as to define a human intelligence as a
neural mechanism or computational system which is genetically
programmed to be activated or triggered by certain kinds of internally
or externally presented information. Given this definition, it becomes
crucial to be able to identify these core operations, to locate their
neural substrate, and to prove that these cores are indeed separate
(Morris, 2002).
4. A distinctive developmental history, along with a definable set of
expert "end-state" performances. An intelligence should have an
identifiable developmental history through which normal as well as
gifted individuals pass in the course of ontogeny. To be sure, the
intelligence will not develop in isolation, except in an unusual person,
so it becomes necessary to focus on those cases or situations where
the intelligence occupies a central place. In addition, it should prove
possible to identify disparate levels of expertise in the development
of an intelligence, ranging from the universal beginnings, through
which every novice passes, to exceedingly high levels of competence,
which may be visible only in individuals with unusual talents and/or
special forms of training.
5. An evolutionary history and evolutionary plausibility:
All species display areas of intelligence (and ignorance), and
human beings are no exceptions (Gardner, 1983). The roots of our
current intelligence reach back millions of years in the history of
species. A specific intelligence becomes more plausible to the
extent that one can locate its evolutionary antecedents, including
capacities (like bird song or primate social organization) that are
shared with other organisms. One must also be on the lookout for
specific computational abilities which appear to operate in
isolation in other species but have become yoked with one
another in human beings (Gardner, 1999).
6. Support from experimental psychological tasks: Many paradigms
in experimental psychology illuminate the operation of a certain
intelligence. Using the methods of cognitive psychology, one can, for
example, study details of linguistic or spatial processing with
exemplary specificity. Consequently, the relative autonomy of an
intelligence can be investigated. Especially suggestive are studies of
tasks that interfere with one another, tasks that transfer across
different contexts, and identification of different forms of memory,
attention or perception that may be peculiar to one kind of input.
Such experimental tests can provide convincing support for the claim
that particular abilities are (or are not!) manifestations of the same
intelligence.
7. Support from psychometric findings: Outcomes of psychological
experiments, such as the outcomes of IQ tests, provide valuable
sources of information relevant to intelligences. While the tradition of
intelligence testing did not develop with multiple intelligences in mind,
it is clearly relevant to our pursuit here. To the extent that the tasks that purportedly assess one
intelligence correlate with one another and less highly with those that
purportedly assess other intelligences, the formulation enhances its
credibility.
8. Susceptibility to encoding in a symbol system: Much of human
representation and communication of knowledge takes place via
symbol systems--culturally contrived systems of meaning that capture
important forms of information. Language, picturing, and
mathematics are only three of the symbol systems that have become
important in the world for human survival and human productivity. In
MI theory, one of the features that makes a raw computational
capacity useful (and exploitable) by human beings is its susceptibility
to marshaling by a culture’s symbol system. Viewed from the
opposite perspective, symbol systems may have evolved just in those
cases where there exists a computational capacity ripe for harnessing
by the culture. Consequently, these are some criteria by means of
which a candidate's competence can be judged as an intelligence in
MI theory. Following the accountability and evidentiality of MI
theory, the next section elaborates in detail on the eight categories of
intelligence originally identified by Gardner (1983).

2.6 Multiple Intelligences Defined

Almost eighty years after the first intelligence tests were developed,
Gardner challenged the commonly-held belief about the accountability
of those test results. Stating that our culture defined intelligence too
narrowly, he proposed in his Frames of Mind (1983) the existence of at
least seven basic intelligences. More recently, however, he has added an
eighth, and even discussed the possibility of a ninth, intelligence
(Gardner, 1999). In his MI Theory, Gardner tries to broaden the
scope of human potential beyond the confines of IQ scores. He seriously
questions the validity of determining an individual's intelligence through
the practice of taking a person out of his natural learning environment
and asking him to do isolated tasks he had never done before and
probably would never choose to do again. In sharp contrast with
current theories of intelligence, Gardner (1983) suggests that
intelligence has more to do with the capacity for
solving problems and fashioning products in a context-rich and naturalistic
setting. Once Gardner's broader and more pragmatic perspective is
taken, the concept of intelligence begins to lose its mystique and
becomes a functional concept that could be seen working in people's
lives in a variety of ways. Gardner has provided a means of mapping the
broad range of abilities that humans possess by grouping their
capabilities into eight comprehensive categories or intelligences.

Figure 2.1 Multiple Intelligences Defined (Presented in Armstrong, 2003, p. 102)
2.6.1 Linguistic Intelligence: Core Operations
[Linguistic Intelligence has] to do with words, spoken or written. People
with verbal-linguistic intelligence display a facility with words and
languages. (Gardner, 1983, p. 49)

According to Gardner's (1983) notion of Linguistic Intelligence, in a
poet or a lawyer one may see the core operations of linguistic
intelligence, that is, a sensitivity to the meaning of words, whereby an
individual appreciates the subtle shades of difference between spilling
ink "intentionally," "deliberately," or "on purpose" (p. 52). This extends to a
sensitivity to the order among words and the capacity to follow the rules of
grammar and, on carefully selected occasions, to violate them. At a more sensory level, Linguistic
Intelligence entails a sensitivity to sounds, rhythms, inflections, and meters of
words. This is the ability which makes poetry, even in a foreign language,
beautiful to hear. Finally, Linguistic Intelligence accounts for a sensitivity to
different functions of language; its potential to excite, to convince, to stimulate,
to convey information, or simply to please (Lightbown, 2000). But, for those
among us who are not practicing poets or lawyers, what are
some of the other major uses to which language can be put? First
of all, there are the rhetorical aspects of language, that is, the ability to use
language to convince other individuals of a course of action. Second,
there is the mnemonic potential of language or the capacity to use language to
help one remember information, ranging from a list of possessions and rules of
a game, to directions for finding one’s way, or procedures for operating a new
machine. A third aspect of Linguistic Intelligence is its role in explanation
(Morris, 2002). Much of teaching and learning occurs through language,
principally through oral instruction, employing verse or simple explanation
and, increasingly, through words in their written form. Finally, there is a
potential for language to explain its own activities, that is, the ability to use
language to engage in "meta-linguistic" analysis (Gardner, 1999).

2.6.2 Musical Intelligence: Core Operations


[Musical Intelligence has] to do with music, rhythm and hearing. [People
high in musical intelligence] often use songs or rhythms to learn and
memorize information, and may work best with music playing. (Gardner,
1983, p. 60)
Of all the gifts with which individuals may be endowed, none
emerges earlier than Musical Intelligence. The extent to which musical
talent is expressed publicly, however, will depend upon the milieu in
which one lives (Doughty, 2001). There is relatively little dispute over the
principal constituent elements of music, yet experts will differ on the
precise definition of each aspect.
Most central in Musical Intelligence are pitch (or melody) and rhythm:
sounds emitted at certain auditory frequencies and grouped according
to a prescribed system. Pitch is more central in certain cultures, such as in
oriental societies, while rhythm is correlatively emphasized in
sub-Saharan Africa. Next in importance only to pitch and rhythm in
Musical Intelligence is timbre, or the characteristic qualities of a tone (Spolsky,
1992). Many experts have gone on to place the affective aspects of music
close to its core. On Rogers' (2000) account, for instance, "music is the
controlled movement of sounds in time. … It is made by humans who
want it, enjoy it, and even love it" (p. 88).
An analogy to language may not be out of place here. Just as one can
tease apart a series of levels of language from the basic phonological levels
through a sensitivity to word order and word meaning, so can one not only
examine the sensitivity to individual tones or phrases, but also look at how
these fit together into larger musical structures (Sternberg, 1993). And just
as these different levels of analysis can be brought together in
apprehending a literary work like a poem or novel, so does the
apprehension of musical works require the ability to make the local
analysis of the "bottom-up" camp, as well as the "top-down"
schematizations of the Gestalt school (O’Connell, 1994).

2.6.3 Mathematical Intelligence: Core Operations


[Mathematical Intelligence has] to do with logic, abstraction, inductive
and deductive reasoning and numbers. One possibility is that formal,
symbolic logic and strict logic games are under the command of logical
intelligence, while skills such as fallacy hunting and argument
construction are under the command of verbal intelligence. (Gardner,
1983, p. 73)
In contrast to Linguistic and Musical Intelligences, the competence
that Gardner calls Mathematical Intelligence does not have its origins in
the auditory-oral sphere. Instead, this form of thought can be traced to a
confrontation with the world of objects. For it is in confronting objects,
in ordering and reordering them, and in assessing their quantity, that the
young child gains her or his initial and most fundamental knowledge
about the logical-mathematical realm (Gardner, 1999). From this
preliminary viewpoint, Mathematical Intelligence rapidly becomes remote
from the world of material objects; the individual becomes more able to
appreciate the actions that one can perform, the statements (or
propositions) that one can make about those actual or potential actions,
and the relationships among those statements.
Over the course of development, one proceeds from the realm of the
sensory-motor to the realm of pure abstraction and ultimately to the height
of logic and science. The chain is long and complex, but it need not be
mysterious: The roots of the highest regions of logical-mathematical talent
can be found in the simple actions of young children upon the physical
objects in their worlds (Sternberg, 1993; Willis, 1996). In short, according
to this analysis, the basis for all Mathematical forms of intelligence
initially inheres in the handling of objects. Such actions can, however, also
be conducted mentally, inside one’s head. And after some time, the
actions, in fact, become internalized. The child need not touch the objects
himself; he can simply make the required comparisons, additions, or
deletions in his head and, all the same, arrive at the correct answer.
Moreover, these mental operations become increasingly certain; no longer
does the child merely suspect that two different orders of counting will
yield ten objects; rather he is certain that they will. Logical necessity
comes to attend these operations; the child is no longer dealing merely
with empirical discoveries (Morris, 2002).
Whatever the views of the experts in these particular disciplines, then,
it seems legitimate from the psychological point of view to speak of a
family of interlocking capacities. Beginning with observations and
objects in the material world, the individual moves towards the
increasingly abstract formal systems whose interconnections become the
matter of logic rather than of empirical observations. As Chen (1995)
puts it succinctly, "as long as you are dealing with pure mathematics, you
are in the realm of complete and absolute abstraction" (p. 39).
Indeed, the mathematician ends up working within the world of invented
objects and concepts which may have no direct parallel in everyday
reality, while the logician’s primary interest falls on the relationships
among statements rather than on the relation of those
statements to the world of empirical facts (Gardner, 1983).
2.6.4 Spatial Intelligence: Core Operations
[Spatial Intelligence has] to do with vision and spatial judgment.
Some critics point out the high correlation between the spatial and
mathematical abilities, which seems to disprove the clear separation of
the intelligences. Although they may share certain characteristics, they
are easily distinguished by several factors. (Gardner, 1983, p. 85)

Central to Spatial Intelligence are the capacities to accurately
perceive the visual world, to perform transformations and modifications
upon one’s initial perceptions, and to be able to re-create aspects of
one’s visual experience, even in the absence of relevant physical
stimuli; one can be asked to produce forms or simply to manipulate
those that have been provided. These abilities are not clearly identical:
An individual may be acute, say, in visual perception, while having
little ability to draw, imagine, or transform an absent world. As
Musical Intelligence consists of rhythmic and pitch abilities which are
sometimes dissociated from one another, or as Linguistic Intelligence
consists of syntactic and pragmatic capacities which may also come
uncoupled, so does Spatial Intelligence emerge as an amalgam of
abilities. All the same, the individual with skills in several
aforementioned areas is most likely to achieve success in the spatial
domain. The fact that practice in one of those areas stimulates
development of skills in other areas is another reason that spatial skills
can reasonably be considered "of a piece" (Sternberg, 1993, p. 112).
The most elementary operation, upon which other aspects of
Spatial Intelligence rest, is the ability to perceive a form or an object.
One can test this ability by multiple-choice questions or by asking an
individual to copy a form; copying turns out to be a more demanding
assignment, and often latent difficulties in the spatial realm can be
detected through errors in a copying task. Analogous tasks can,
incidentally, be posed in the tactile modality, for both blind and sighted
individuals (Gardner, 1995).
Once an individual is asked to manipulate a form or an object,
appreciating how it would be perceived from another viewing angle, or how
it would look if it were turned around, s/he fully enters the spatial
realm, for such manipulation requires operating in "space"
(Morris, 2002). Such tasks of transformation can be demanding, as
one is required mentally to rotate complex forms through any number
of twists and turns. Problems of still greater difficulty can be posed in
the object or picture domain; when a problem is phrased verbally, a
clear option arises to solve the problem strictly through the plane of
words, without any resort to the creation of a mental image or picture
in the head.
In a nutshell, Spatial Intelligence entails a number of loosely
related capacities: The ability to recognize instances of the same
element, the ability to transform or recognize the transformation of
an element into other elements, the capacity to conjure up mental
imagery, the capacity to produce a graphic likeness of spatial
transformation, and the like (Perkins, 1981). Conceivably, these
operations are independent of one another and could develop or
break down separately. However, just as rhythm and pitch work
together in the area of music, so do the spatial capacities occur
together in the spatial realm. Indeed, they operate as a family, and the
use of an individual operation may well reinforce the use of the
others (Gardner, 1983).

2.6.5 Kinesthetic Intelligence: Core Operations
[Kinesthetic Intelligence has] to do with movement and action. Those
with strong Kinesthetic Intelligence seem to use what might be termed
muscle memory; i.e., they remember things through their body,
rather than through words (verbal memory) or images (visual memory).
(Gardner, 1983, p. 110)

Skillful use of one’s body has been important in the history of the
species for thousands of years. In speaking of masterful use of the
body, it is natural to think of the Greeks, and there is a sense in which
this form of intelligence reached its apogee in the West during the
Classic Era. The Greeks revered the beauty of the human form and, by
means of their artistic and athletic activities, sought to develop a body
that was perfectly proportioned and graceful in movement, balance,
and tone (Robinson, 2001). More generally, they sought a connection
between mind and body through a trained body that properly responds
to the power of mind. All of these, in particularly striking form,
exemplify the actions and capacities associated with a highly evolved
Kinesthetic Intelligence.
The prominent characteristic of such an intelligence is the ability to
use one’s body in highly differentiated and skilled ways. Another
Kinesthetic core operation is the capacity to work skillfully with
objects, both those that involve fine motor movements of one’s fingers
and hands and those that exploit gross motor movements of the body.
Gardner (1983) treats these two capacities--control of one’s bodily
motions and the capacity to handle objects skillfully--as the cores of
Kinesthetic Intelligence. It is rarely possible for these two core
elements to exist separately; however, in typical cases, skills in the use
of the body for functional or expressive purposes tend to go hand in
hand with attempts at the manipulation of objects.

2.6.6 Personal Intelligences: Core Operations


[Personal Intelligences have] to do with oneself and to do with others.
People in the category of Intrapersonal Intelligence are typically introverts
and prefer to work alone, while those who are strong in Interpersonal
Intelligence are usually extroverts and learn best by working with others
and often enjoy discussion and debate. (Gardner, 1983, p. 140)

To examine the development of both of these aspects of human nature
in turn, one can refer first to the development of the internal aspects of a
person. The core capacity at work here is the access to one’s own feeling
life, the capacity to discriminate among these feelings and, eventually, to
label them, to enmesh them in symbolic codes, and to draw upon them as
the means of understanding and guiding one’s behavior (Armstrong, 2003).
In its most primitive form, Intrapersonal Intelligence amounts to little
more than the capacity to distinguish a feeling of pleasure from that of
pain and, on the basis of such discrimination, to become more involved in
or to withdraw from a situation. At its most advanced level, Intrapersonal
Intelligence allows one to detect and to symbolize complex and highly
differentiated sets of feelings (Gardner, 1983).
The other personal intelligence, that is, Interpersonal Intelligence, turns
outward, toward other individuals. The core capacity here is the ability
to notice and make distinctions among other individuals and, in
particular, among their moods, temperaments, motivations, and
intentions. In its most elementary form, Interpersonal Intelligence entails
the capacity of the young child to distinguish among the individuals
around her and to detect their various moods (Armstrong, 2003). In its
advanced form, Interpersonal Intelligence permits a skillful adult to infer
individuals' intentions and desires, and to act upon her hunches, for
example, by encouraging a group of disparate individuals to behave
along their desired lines (Armstrong, 1993).

2.6.7 Naturalist Intelligence: Core Operations


[Naturalist Intelligence has] to do with nature, nurturing, and
classifications. This is the newest among the intelligences and is not as
widely accepted as the original seven. The theory behind Naturalist
Intelligence is often criticized, as it is seen by many as indicative not of
an intelligence but rather of an interest. (Gardner, 1995, p. 138)

The very term Naturalist combines a description of the core ability
with a characterization of a role that many cultures value. A naturalist
demonstrates expertise in the recognition and classification of the
numerous species--"the flora and fauna"--in her or his environment
(Gardner, 1999, p. 140). Every culture prizes people who not only can
recognize members of species that are especially valuable or notably
dangerous, but also can appropriately categorize new and unfamiliar
organisms.
In cultures without formal science, the naturalist is a person most
skilled in applying the accepted folk taxonomies; in cultures with a
scientific orientation, the naturalist is a biologist who recognizes and
categorizes specimens in terms of accepted formal taxonomies, such as
the botanical ones devised in the 1700s by the Swedish scientist
Carolus Linnaeus (Doughty, 2001).
Naturalist Intelligence proves as firmly entrenched as do the other
intelligences. There are some core capacities, such as the abilities to
recognize instances as members of a group (or, more formally, a species),
to distinguish among members of a species, to recognize the existence of
other neighboring species, and to chart out the relations among several
species (Yap, 1993). Clearly, the importance of Naturalist Intelligence is
well-established in evolutionary history, where the survival of an
organism depends on its ability to discriminate among similar species
(Gardner, 1995). The naturalist capacity presents itself not only in those
primates evolutionarily closest to human beings, but also in birds that can
discern differences among species of plants and animals and can even
recognize which forms in photographs are humans!

2.7 MI Theory and Performance Assessment

As children do not learn in the same way, they cannot be assessed in a
uniform fashion. Therefore, it is important that a teacher create
intelligence profiles for each learner. Knowing how each learner learns
will allow the teacher to properly assess the child's progress. This
individualized evaluation practice will allow the teacher to make more
informed decisions on what to teach and how to present information.
(Lazear, 1992, p. 122)
As Lazear (1992) suggests, MI theory can properly provide an
assessment framework within which learners can have their rich and
complex lives acknowledged, celebrated, and nurtured. In fact, MI
assessment and MI instruction represent flip sides of the same coin: MI
approaches to assessment are not likely to take more time and resources
to implement, as long as they are seen as an integral part of the
instructional process (Gardner, 1995).

Building on the view of assessment derived from MI theory,
researchers are exploring assessment techniques that are built around
authentic performances. In music, for example, a teacher evaluates a
learner's facility with a given piece by asking the learner to perform that
piece--the performance itself is the test. The assessment is authentic
because performance on the test draws directly on the skills that the
learner is trying to master, as the learner practices the performance
piece repeatedly, she is taking the test until she achieves mastery
(Armstrong, 2003).
The performance selected for assessment must reflect the actual
skills and competencies that are valued in the field. For example,
authentic skills in a chemistry class might include designing an
experiment around a question, gathering evidence, analyzing the
resulting data, and reporting the results in a coherent, convincing
manner. An authentic task in language studies might include reviewing
relevant information in the library, conducting original research, and
creating a well-documented report that presents the results. In each
case, learners practice these skills repeatedly until they master them.
A common difficulty in utilizing performance assessment is
determining how the learners' performances will be scored. Some
teachers try to solve the problem by using alternative assessment
techniques that may eventually distort learners' rich and complex works
into holistic scores or norm-referenced rankings like these: Learner A's
research project is at a novice level and Learner B's project is at a
mastery level; or, for example, "Jane has done a better job than John in
phonology, though John is better in linguistics" (Skehan, 2007). This
reductionism ends up looking very much like standardized testing in
some of its worst moments. In task-based assessment, such rating scales
are not typical. Instead, reflecting the different psycholinguistic
research tradition to which they belong, researchers have tended to use
more precise operationalisations of underlying constructs (see Skehan,
2002).
Research is continuing to investigate the impact of task
characteristics on learners' performance. As an alternative solution to
the potential problems of rating language learners' performances on
assessment tasks, this study incorporates the difficulty levels of the
Taxonomy of Educational Objectives: Cognitive Domain as a quality
control mechanism through which one can judge how deeply learners'
minds have been stirred by an assessment task. Part III introduces this
taxonomy in detail.

Part III Taxonomy of Educational Objectives

As discussed earlier in Part I, one of the thorny issues in performance
assessment concerns the problem of rating scales. Some investigators
have explored the use of performance ratings built on conventional
methods of language testing, and others have used the concept of task
difficulty (see Robinson, 2001; Skehan, 1993; Skehan & Foster, 2001).
Although there has been significant progress in developing better
performance measures, there are still areas of disagreement. Obviously,
this problem calls for more research. In Part III, therefore, Bloom's
Taxonomy of Educational Objectives: Cognitive Domain is introduced
as an alternative information-processing theory.
In the 1950s, Bloom recognized that one of the most important issues in
education was to help learners achieve the goals of the curriculum they
were studying, rather than to compare them with each other. Bloom also
wanted to reveal what learners are thinking about while teachers are
teaching, because he recognized that "it is what learners are experiencing
that ultimately matters" (Bloom, 1956, p. 67). The processes of teaching
and, necessarily, testing therefore needed to be geared towards the design
of tasks that would progressively and inevitably lead to the realization of
the objectives that defined the goals of the curriculum. The variable that
needed to be addressed, as Bloom saw it, was time. It makes no
pedagogical sense to expect all learners to take the same amount of time
to achieve the same objectives. There are individual differences among
learners, and the important thing is to accommodate those differences in
order to promote learning, rather than to hold time constant and expect
some learners to fail. Education is not a race! In addition, learners are
encouraged to help one another. Feedback and correction are immediate.
In short, what Bloom suggests is applying, in a very rational way, the
basic assumptions embraced by those who believe the educational
process should be geared towards the realization of educational
objectives. Moreover, he believes that such an approach to curriculum, to
teaching, and to assessment would enable virtually all youngsters to
achieve success in school. If they do not succeed, the problem lies in the
curriculum design and in forms of teaching and testing that are
inappropriate to promoting the realization of those goals.
His convictions about the power of the environment to influence
human performance are illustrated in Developing Talent in Young People
(Bloom et al., 1985). In this book, Bloom and his colleagues argue that
even world-famous high-achieving adults, such as champion tennis
players, mathematicians, scientists, and award-winning writers, have
seldom been regarded as child prodigies! What makes the difference,
Bloom believes, is the kind of attention and support those individuals
have received at home from their parents. Champion tennis players, for
example, have constantly profited from the instruction of increasingly
able teachers of tennis during the course of their childhood. Because of
this, and the amount of time and energy they spend in learning to play
championship tennis, their success is the outcome of guidance and effort
rather than raw genetic capacity; attainment is prominently a product of
learning, and learning is influenced by opportunity and effort. It was
then, and it is now, a powerful and optimistic conception of the
possibilities that education provides (Samuda, 2001).
Many of Bloom's proponents have also studied the impact of
environment on learners' performance, such as the educational environment
of the home or, at a broader scope, the immediate cultural setting. Bloom's
initial works focus on what might be called the operationalisation of
educational objectives based on a cognitive taxonomy. Bloom's Taxonomy
of Educational Objectives: Cognitive Domain is predicated on the idea that
cognitive operations can be ordered into six increasingly complex levels.
What is taxonomic about this taxonomy is that each subsequent level
depends upon the learner's ability to perform at the level or levels that
precede it. For example, "the ability to evaluate--the highest level in
Cognitive Domain--is predicated on the assumption that for the learner to
be able to evaluate, they would need to have the knowledge or necessary
information, comprehend the information s/he has, be able to apply it, be
able to analyze it, synthesize it and then eventually evaluate it [italics
added]" (Bloom, 1965, p. 35). One consequence of the three categories in
the taxonomy, that is, Cognitive, Affective and Psychomotor Domains, is
that they not only serve as a means through which evaluation tasks can be
formulated, but also provide a framework for the formulation of curricular
objectives.

2.8 Taxonomy of Educational Objectives Defined


Taxonomy of Educational Objectives identifies three distinctive
domains or categories: (a) Cognitive Domain (intellectual capability, i.e.,
knowledge, or "think"); (b) Affective Domain (feelings, emotions and
behaviors, i.e., attitude, or "feel"); and (c) Psychomotor Domain (manual
and physical skills, i.e., skills, or "do"). Generally speaking, Bloom's
taxonomy is based on the premise that the levels of each domain are
ordered in terms of degree of difficulty. Consequently, each level in these
categories must be mastered before progress is made to the next. As such,
the levels within each domain are in fact levels of learning development
with progressive difficulty.
Usefully, this matrix structure enables a checklist or template
to be constructed for the design of learning programs, training courses,
lesson plans, and evaluation. Effective learning--especially where training
is to be converted into organizational results--should arguably cover all the
levels of each domain relevant to the situation and to the learners. Learners
should benefit from the development of knowledge and intellect (Cognitive
Domain), attitudes and beliefs (Affective Domain), and the ability to put
physical and bodily skills into action (Psychomotor Domain). These
domains and their hierarchical levels are elaborated in turn.

2.8.1 Cognitive Domain

Figure 2.2 Levels of Cognitive Domain (Bloom, 1956, p. 50)

Knowledge
Knowledge is defined as the remembering of previously learned
materials. This may involve the recall of a wide range of material, from
specific facts to complete theories, but all that is required is the bringing
to mind of the appropriate information (Bloom et al., 1985). Knowledge
represents the lowest level of learning outcomes in Cognitive Domain.
Some examples of objective behaviors are "knowing common terms,
knowing specific facts, knowing methods and procedures, knowing basic
concepts, and knowing basic principles" (Bloom et al., 1985, p. 56).

Comprehension
Comprehension is defined as the ability to grasp the meaning of
materials. This may be shown by translating material from one form to
another (words to numbers, for instance), by interpreting material
(explaining or summarizing), and by estimating future trends (predicting
consequences or effects). These learning outcomes go one step beyond the
simple remembering of material, and represent the lowest level of
understanding. Some examples of objective behaviors are "understanding
facts and principles, interpreting verbal materials, interpreting charts and
graphs, translating verbal material into mathematical formulae, estimating
the future consequences implied in data, and justifying methods and
procedures" (Bloom et al., 1985, p. 66).
Application
Application refers to the ability to use learned materials in new and
concrete situations. This may include the application of such things as
rules, methods, concepts, principles, laws, and theories. Learning
outcomes in this area require a higher level of understanding than those
under comprehension. Some examples of objective behaviors are
"applying concepts and principles to new situations, applying laws and
theories to practical situations, solving mathematical problems,
constructing graphs and charts, and demonstrating the correct usage of a
method or procedure" (Bloom et al., 1985, p. 89).

Analysis
Analysis refers to the ability to break down materials into their
component parts so that their organizational structure may be understood.
This may include the identification of the parts, analysis of the
relationship between parts, and recognition of the organizational
principles involved. Learning outcomes here represent a higher
intellectual level than comprehension and application because they require
an understanding of both the content and the structural form of the
material. Some examples of objective behaviors are "recognizing unstated
assumptions, recognizing logical fallacies in reasoning, distinguishing
between facts and inferences, evaluating the relevancy of data, and
analyzing the organizational structure of a work (e.g., art, music or
writing)" (Bloom et al., 1985, p. 95).

Synthesis
Synthesis refers to the ability to put parts together to form a new
whole. This may involve the production of a unique communication
(theme or speech), a plan of operations (research proposal), or a set of
abstract relations (scheme for classifying information). Learning outcomes
in this area stress creative behaviors, with major emphasis on the
formulation of new patterns or structure. Some examples of objective
behaviors are "writing a well organized theme, giving a well organized
speech, writing a creative short story (or poem or music), proposing a plan
for an experiment, integrating learning from different areas into a plan for
solving a problem, and formulating a new scheme for classifying objects
(or events, or ideas)" (Bloom et al., 1985, p. 120).
Evaluation

Evaluation is concerned with the ability to judge the value of materials
(e.g., a statement, novel, poem, or research report) for a given purpose.
The judgments are to be based on definite criteria. These may be internal
criteria (organization), external criteria (relevance to the purpose), or the
learners themselves may determine the set of criteria. Learning outcomes
in this area are highest in the cognitive hierarchy because they contain
elements of all the other categories, plus conscious value judgments
based on clearly defined criteria. Some examples of objective behaviors
are "judging the logical consistency of written material, judging the
adequacy with which conclusions are supported by data, judging the
value of a work (art, music, writing) by the use of internal criteria, and
judging the value of a work (art, music, writing) by the use of external
standards of excellence" (Bloom et al., 1985, p. 130). For a
comprehensive outline of the levels of Cognitive Domain see Appendix
A1.
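The cumulative premise behind these six levels--that crediting a learner at any level presupposes mastery of every lower level--can be expressed as a small ordered scale. The following is an illustrative sketch only, not part of the IT Model itself; the function name and the rule that a gap invalidates credit at higher levels are the editor's assumptions based on the taxonomic description above.

```python
# Illustrative sketch: Bloom's Cognitive Domain as an ordered, cumulative scale.
# Level names follow Bloom (1956); the helper function is hypothetical.

COGNITIVE_LEVELS = [
    "Knowledge", "Comprehension", "Application",
    "Analysis", "Synthesis", "Evaluation",
]

def highest_consistent_level(mastered):
    """Return the highest level a learner can be credited with, honoring
    the premise that each level presupposes all of the lower ones."""
    credited = None
    for level in COGNITIVE_LEVELS:
        if level in mastered:
            credited = level
        else:
            break  # a gap blocks credit at any higher level
    return credited

# A learner showing Knowledge, Comprehension, and Analysis (but not
# Application) is credited only up to Comprehension under this premise.
print(highest_consistent_level({"Knowledge", "Comprehension", "Analysis"}))
```

Under this reading, isolated success at a high level (here, Analysis without Application) does not count as mastery at that level, which is exactly what makes the scale hierarchical rather than a simple checklist.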

2.8.2 Affective Domain

Figure 2.3 Levels of Affective Domain (Bloom, 1956, p. 60)

Receiving
Receiving is defined by the learner's openness to experiencing the
surrounding world and her or his willingness to feel through the senses.
At this level, learners are evaluated for their internal focus, their attitude
to the learning experience, and the amount of interest they show in the
teaching materials. Moreover, the amount of time the learners spend on
their academic achievement is positively valued. Intended outcomes
include the pupil's awareness that a thing exists. Some examples of
objective behaviors are "listening attentively, showing sensitivity to
social problems, and attending carefully" (Bloom et al., 1985, pp. 132-4).

Responding
Responding refers to learners' active participation, not only in engaging
with learning materials, but also in reacting and providing feedback to
their learning experiences. The learners' activeness and participation in
group tasks or in class discussions are highly graded at this level. Some
examples of objective behaviors are "completing homework, obeying
rules, participating in class discussion, showing interest in subject, and
enjoying helping others" (Bloom et al., 1985, p. 156).

Valuing
Valuing is the worth a learner attaches to a particular object,
phenomenon, or behavior. It commonly ranges from acceptance, through
commitment (e.g., assuming responsibility for the functioning of a group)
to attitudes and appreciation. Some examples of objective behaviors are
"demonstrating belief in democratic processes, appreciating the role of
science in daily life, showing concern for others' welfare, and
demonstrating a problem-solving approach" (Bloom et al., 1985, p. 170).

Organizing and Conceptualizing


Organizing and Conceptualizing mean bringing together different values,
resolving conflicts among them, and starting to build an internally consistent
value system. Learners feel enough autonomy to resolve their internal
conflicts, to compare, to relate and to synthesize values, and to develop a
philosophy of life. Some examples of objective behaviors are "recognizing
the need for balance between freedom and responsibility in a democracy,
understanding the role of systematic planning in solving problems, and
accepting responsibility for own behavior" (Bloom et al., 1985, p. 189).

Characterizing by a Value or Value Concept
Characterizing by a value or value concept is the level at which the
learner has held a value system controlling her or his behavior for a
sufficiently long time that a characteristic "life style" has developed.
The learner's behavior is pervasive, consistent, and predictable. Some
examples of objective behaviors are "concerning personal, social, and
emotional adjustment, displaying self-reliance in working
independently, cooperating in group activities, and maintaining good
health habits" (Bloom et al., 1985, p. 199). For a detailed description of
the levels of Affective Domain see Appendix A2.

2.8.3 Psychomotor Domain

Figure 2.4 Levels of Psychomotor Domain (Bloom, 1956, p. 85)

Psychomotor Domain has ostensibly been established to address
skills development relating to manual tasks and physical movement.
However, this domain also concerns and covers modern-day business
and social skills, such as communications and operations with IT
equipment, for example, telephone and keyboard skills, or public
speaking. Thus, motor skills extend beyond the originally imagined
manual and physical skills. Teachers and trainers are highly
recommended always to consider using Psychomotor Domain, even if
they think the learning environment is adequately covered by the
cognitive and affective domains (Bloom et al., 1985).

Imitation

Imitation is concerned with copying others' actions--either the
teacher's or the other learners'. This skill requires a sufficient degree of
observation and concentration on the part of the learners, which is
positively valued at this level of motor capacity. Some examples of
objective behaviors are "watching teacher or trainer, and repeating an
action, a process or activity" (Bloom et al., 1985, p. 203).

Manipulation
Manipulation requires the learner to rely on her or his memory in
reproducing the action, or to follow the given instructions. At this
level, learners are required to carry out a task from written or verbal
instructions. Some examples of objective behaviors are "rebuilding a
construction, remembering an instruction, and copying the actions"
(Bloom et al., 1985, p. 211).

Precision
Precision requires the learners to be autonomous--to act
independently and not to expect assistance. Some examples of objective
behaviors are "performing a task or activity with expertise and to high
quality without assistance or instruction, and being able to demonstrate
an activity to other learners" (Bloom et al., 1985, p. 220).

Articulation
Articulation refers to the learner's ability to adapt and integrate
expertise to satisfy a non-standard objective. The learner has to relate
and combine associated activities to develop methods and to meet
varying, novel requirements. Some examples of objective behaviors
are "solving a problem independently, integrating two previously
learned methods, and formulating a new path to reach the required
result" (Bloom et al., 1985, p. 234).

Naturalization
Naturalization refers to a learner's state of automated, unconscious
mastery of an activity and related skills at the strategic level. Learners are
able to define their aim, approach, and strategy for employing activities to
meet strategic needs. Some examples of objective behaviors are "being
able to manage projects, being able to design a work map, and specifying
goals and requirements" (Bloom et al., 1985, p. 254). For a detailed
outline of the levels of Psychomotor Domain see Appendix A3.
To conclude, from the thorough investigation and implementation of
Bloom's Taxonomy of Educational Objectives as an
information-processing theory in many educational settings, one can see
that the hierarchical levels of Cognitive Domain might properly be
utilized in grading learners' performances on language assessment
tasks. In this study, then, the problem of grading language tasks is
hypothetically stated in terms of the incorporation of the levels of
Cognitive Domain into a task-based model of language assessment. The
prototype Integrative Task (IT) Model is introduced in Part IV.

Part IV Integrative Task (IT) Model


Generally speaking, task-based language assessment focuses on the
elicitation of performance on relevant tasks under conditions that
approximate the real world as much as possible, as well as on the
evaluation of task performance according to real-world criteria. The
primary purpose of such assessment is to facilitate interpretations about
learners' abilities to accomplish the particular tasks that are tested. As
such, the identification of real-world criteria for task accomplishment
constitutes the most important step in the development of task-based testing
procedures (Robinson, 2001). However, in addition to making
interpretations about learners' abilities to accomplish particular tasks,
test users often wish to make several other interpretations that may be
related to task performances.
First, they may wish to know whether performance on one target
task can be used to inform interpretations about the learners' abilities to
accomplish other similar tasks. This type of interpretation requires a
basis for generalizing from performance on one task to performance on
a set of related tasks. Second, they may also wish to know what it is that
enables learners to accomplish target tasks, or what it is that is keeping
learners from accomplishing them. This type of interpretation requires a
basis for relating underlying abilities of learners with sources of
performance difficulty in various second or foreign language tasks.
Recent research on second language task difficulty has begun to reveal
a potential means for generalizing among tasks, as well as for better
understanding the relationship between language learners' underlying
abilities and their performances on tasks. Common to these views is the
notion that cognitive abilities interact with contextual demands placed
on performances by a range of variable task characteristics (Brown et
al., 2002; Skehan, 1998). For assessment
purposes, then, if the relationship between cognitive abilities and task
characteristics can be understood and modeled, developing a framework
for grading tasks according to performance difficulty would be possible.
This framework may, thus, provide a scaffold for making
generalizations from performance on one task to future performance on
other tasks with similar degrees of difficulty. Such a framework might
also enable test users to investigate the extent to which learners possess
the underlying abilities needed to accomplish a range of related tasks.
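The generalization step described above can be sketched in miniature. The sketch below is the editor's illustration, not the IT Model's actual procedure: it assumes each task has been assigned a numeric difficulty level, and it predicts success on a new task whenever the learner has already passed a task at the same or a higher level. The function name and threshold rule are hypothetical.

```python
# Hypothetical sketch of generalizing from observed task performances to a
# new task of known difficulty. Difficulty levels are assumed to be integers
# on a common scale (e.g., 1-6); the decision rule is illustrative only.

def predicted_success(performances, new_task_difficulty):
    """performances: list of (difficulty_level, passed) pairs observed so far.
    Predict success on the new task if the learner has already passed some
    task at the same or a higher difficulty level."""
    best_passed = max(
        (level for level, passed in performances if passed), default=None
    )
    return best_passed is not None and best_passed >= new_task_difficulty

observed = [(1, True), (2, True), (3, False)]
print(predicted_success(observed, 2))  # level-2 task: within reach -> True
print(predicted_success(observed, 3))  # level-3 task: not yet -> False
```

The point of the sketch is the design choice, not the arithmetic: once tasks share a common difficulty scale, a single observed performance licenses inferences about a whole band of related tasks, which is precisely what isolated task ratings cannot do.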

2.9 Task Characteristics: Current Approaches


The researchers who have taken a cognitive perspective on task
design usually focus on the psychological processes that learners are
typically engaged in while performing tasks. Skehan (2001) outlines
three areas that have recently influenced task-based studies: (a) the
analysis of how attentional resources are used during task completion;
(b) the influence of task characteristics on performance; and (c) the
impact of the different conditions under which tasks are
Recent research in task-based teaching claims that the manipulation
of task characteristics and processing conditions can focus a learner's
attention on the three competing goals of accuracy, fluency, and
complexity (Robinson, 2001; Skehan, 1998). There are two contrasting
approaches here, albeit with many similarities. Skehan (1998) proposes
that attentional resources are limited, and that attending to one aspect of
performance (complexity of language, accuracy, or fluency) may well
mean that other dimensions suffer. Skehan and Foster (2001) argue for
the existence of a trade-off in performance; that is, greater fluency, for
example, may be accompanied by greater accuracy or greater
complexity, but not by both.
Robinson (2001), in contrast, advocates two propositions: (a)
attentional resources are not limited in the way Skehan and Foster
(2001) argue; instead, learners can access multiple, non-competing
attentional pools; and (b) following Bygate (2001), complexity and
accuracy in a task correlate, since they are driven individually by the
nature of the functional linguistic demands of the task itself.
Whereas Skehan and Foster argue that fluency is correlated with
either complexity or accuracy (at best), Robinson argues that fluency
contrasts with complexity and accuracy, which both correlate with one
another. Empirical evidence, however, does not consistently support
either of these interpretations of attention, though it tends to favor the
limited-capacity view.

2.10 Task Difficulty: Statement of the Problem
Although there are still areas of disagreement within task difficulty
estimates, significant progress in developing better performance
measures can be seen. The complexity-accuracy-fluency dimensions of
task performance have been supported--both theoretically and
empirically--by their proponents (Skehan, 2002; Skehan & Foster,
2001).
Theoretically, as Skehan (1998) argues, the sequence of complexity-
accuracy-fluency implies three stages of change in the underlying
system as language learning takes place: (a) greater complexity, as more
complex interlanguage systems are developed; (b) greater accuracy, as
new interlanguage elements are used not simply haltingly and
incorrectly, but instead, with some reduction in error; and (c) greater
fluency, as language elements are routinzed and lexicalized.
Empirically, in addition, Skehan and Foster (2001) provide evidence
indicating how these different performance areas compete with one
another for limited attentional resources, suggesting that each area
needs to be included in a study if any wide-ranging claims about
performance are to be made. Research is continuing to establish just
how these three areas interrelate, and a growing number of
investigations in task-based assessment are based on carefully
computed data in each of these three areas. The research outcomes
reveal, however, that the competition among the complexity-accuracy-
fluency dimensions is predicted to have harmful effects on decisions
that are made about task difficulty (Bygate et al., 2001).
The counterarguments can be summarized as follows:
1. It is not clear to what theory or research in EFL the complexity-
accuracy-fluency criteria are held accountable; there is relatively little
reference to language learning criteria in the writing on this trilogy.
2. Grading and scaling tasks based on the complexity-accuracy-
fluency criteria both appear to be arbitrary processes, left partly to the
impressionistic judgments of the researchers (Murphy, 2003). As
Skehan (2007) states, using ad hoc criteria for assessing task difficulty,
such as completion of "at least half the task" by "at least half the
task-takers" (p. 54), is not a satisfactory solution, for it makes task
performance a norm-referenced issue and reveals nothing about what
made one task more difficult than another; it thereby precludes any
generalizations to new materials.
3. Findings from a large-scale study of eighty intermediate-level
English learners (Murphy, 2003) have led to unjustifiable, tentative
interpretations regarding the complexity-accuracy-fluency criteria.
The assumption that task characteristics and conditions can be
manipulated to produce desired effects on complexity, accuracy, and
fluency is an attractive one, as it holds out hope of a principled basis
on which to devise a sequence of tasks. However, the evidence from
Murphy's report (2003) strongly suggests that the influence of test
takers on task performance can seriously jeopardize the task
designer's competing goals.
4. With reference to what Long and Crookes (1992) call the problem
of finiteness, which afflicts task-based language assessment most,
questions such as "How many tasks and task types should be included
in a test?", "Where does one task end and the next begin?" or "How
many levels of task analysis are needed?" (p. 46) are left unanswered in
the current approaches to task-based language assessment. Some tasks,
say, doing the shopping, could involve other sub-tasks, for example,
catching a bus, paying the fare, choosing purchases, paying for
purchases, and so on. Moreover, some of those sub-tasks could easily be
broken down still further, for example, paying for purchases divided
into counting money and checking change.
5. Finally, based on the research findings reported by Bruton (2002),
language learners can successfully communicate in non-target
grammar contexts if they are given an interactive assessment task;
accuracy is not manipulated as predicted and promised by Skehan
(1998). Moreover, task takers usually stick to lexicalized
communication--a survival strategy--which counteracts Skehan's
claim of developing a complexity index.
2.11 Integrative Task (IT) Model: An Alternative
In evaluating the current approaches to task-based assessment, Long
and Crookes (1992) outline some shared problems that, despite periodic
discussion, have never been resolved: (a) the difficulty of differentiating
among tasks and their sub-tasks, which in turn raises such questions as
task finiteness or task generative capacity; (b) the issue of task
difficulty, that is, of determining the relevant grading and sequencing
criteria; and finally (c) the absence of rigorous field evaluation, to which
none of the current approaches and their proposals has yet been
subjected. As reviewed in the previous section, research findings on the
problem of task grading could not, as Long and Crookes (1992)
properly argue, "identify some valid, user-friendly sequencing criteria,
and it has remained one of the oldest unsolved problems in language
testing" (p. 46).
In conclusion, the need for an accountable set of criteria for grading
language assessment tasks in terms of difficulty estimates has motivated
this research project. The current study aims at proposing a task-based
framework for language assessment. A number of motives that
eventually led to the development of the Integrative Task (IT) Model are
outlined as follows:
First, due to several drawbacks and inefficiencies of the metalinguistic
theories underlying current language testing frameworks (for the full
discussion see Farhady, 2005), it seems reasonable to utilize a
psychometric theory of human intelligence, that is, Multiple
Intelligences (MI) theory, to devise a number of language assessment
tasks whose language contents are schematized based on the core
operations of the MI categories. MI theory, therefore, provides the
researcher with "a sort of biogenetic syllabus to design Integrative
tasks" (Ghal-eh, 2007).
Second, the Integrative Task (IT) Model benefits from a cognitive
information-processing theory, that is, Bloom's Taxonomy of
Educational Objectives: Cognitive Domain, which provides a well-suited
set of criteria for grading Integrative tasks. And, third, contrary to
current approaches to task-based language assessment that mainly
utilize linguistic grading criteria for scaling tasks (see Skehan, 1998;
Skehan & Foster, 2001), Integrative tasks are graded with respect to the
processing components involved in task performance, which constitutes
a sort of task-independent criterion; the Integrative Task Scales adopt
the hierarchical levels of Cognitive Domain as their cumulative
subscales to predict the difficulty levels of Integrative tasks. As an
immediate consequence of such task-independent rating scales, the
assessment of learners' performance in the IT Model is intended to be
criterion-referenced, as opposed to the current troublesome
norm-referenced trend in task-based assessment.
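The idea of cumulative subscales can be illustrated with a short sketch. This is the editor's reading of the paragraph above, not the published Integrative Task Scales: it assumes a task's predicted difficulty is the highest Cognitive Domain level among the operations the task demands, scored on a 1-6 scale. The operation labels and the example task are illustrative.

```python
# Illustrative sketch: predicting an Integrative task's difficulty level from
# the Cognitive Domain operations it demands. The 1-6 scoring convention
# and the example task are assumptions, not the study's actual scales.

LEVEL_ORDER = {name: rank + 1 for rank, name in enumerate(
    ["Knowledge", "Comprehension", "Application",
     "Analysis", "Synthesis", "Evaluation"])}

def predicted_difficulty(required_operations):
    """Map the cognitive operations a task calls for onto a 1-6 difficulty
    score: the highest level required, per the cumulative-subscale idea."""
    return max(LEVEL_ORDER[op] for op in required_operations)

# A task asking learners to summarize a text and then judge its logical
# consistency reaches Evaluation, so it scales at the top difficulty level.
print(predicted_difficulty(["Comprehension", "Evaluation"]))
```

Because the score depends only on the cognitive operations demanded, not on any particular linguistic content, the same scale can be applied across tasks, which is what makes a criterion-referenced (rather than norm-referenced) interpretation possible.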

In a nutshell, the Integrative Task (IT) Model is a prototype
framework of task-based language assessment in which a psychometric
theory of human intelligence (i.e., Multiple Intelligences Theory) and a
cognitive information-processing theory (i.e., Taxonomy of Educational
Objectives: Cognitive Domain) are integrated into the underlying
constructs of a number of language assessment tasks.

CHAPTER III

Methodology

If the only tool you have is a hammer,


everything around you looks like a nail!
Abraham Maslow

In order to investigate the four research questions listed in Chapter I,
the researcher collected several sources of data from a range of
examinees' performances on Integrative tasks. In Chapter III, therefore,
the pertinent characteristics of the examinees, the instruments, the task
administration, and the task rating procedure are discussed. In doing so,
the researcher hopes to explain clearly all the steps she has taken in
modeling the processes involved in assessing performances on
Integrative tasks under practical conditions of task-based test
development and use. To this end, she will provide a technical
description of her results not only in tables, but also in prose explaining
those tables.
These results will be further explored and discussed in Chapter IV,
where she will focus particularly on their relationship to the original
research questions and add several follow-up and summary analyses.
3.1 Participants
Two groups of examinees took part in this study. Initially, in order
to pilot the Integrative tasks' content quality and administration
procedure, a number of English translation undergraduates (n=37),
similar in age (ranging from 20 to 39), gender (female), and nationality
(Iranian) to the main sample of the study, voluntarily participated in the
trial run of the Integrative tasks on April 20, 2007. The pilot was
administered at the Islamic Azad University, Karaj Branch.
Analyzing the data from the pilot performances on the Integrative
tasks, the researcher's purpose was to examine (a) the examinees'
understanding of what they were expected to do on every task; (b)
the efficiency of the Integrative Task Scales; (c) the raters' evaluation
of the quality of the Integrative tasks' language content; and (d) whether
the amount of time allotted for task completion was reasonable.
Next, a total of 200 English translation undergraduates were sampled
non-randomly to participate in this research project as the main sample.
Since the Integrative tasks were developed specifically for university-
level non-native speakers of English (NNSs), the examinees in this
study were purposively recruited from an Iranian university-level
population studying at the Islamic Azad University, Karaj Branch.
The main sample size (n=200) was proportional to the size of the
whole population (n=400), based on Krejcie and Morgan's (1970) Table
of Sample Size. Moreover, three raters from Islamic Azad University,
Karaj Branch, and Allameh Tabatabai University, all specialists in
foreign language testing and social science research methodology,
participated in this research project.
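The sample-size figure reported above can be cross-checked against the formula underlying Krejcie and Morgan's (1970) table. The sketch below is illustrative, using the standard assumptions behind the published table (chi-square = 3.841 for one degree of freedom at the .05 level, an assumed population proportion of .50, and a .05 margin of error):

```python
def krejcie_morgan(population: int,
                   chi_sq: float = 3.841,   # chi-square, df=1, p=.05
                   p: float = 0.50,         # assumed population proportion
                   d: float = 0.05) -> int: # desired margin of error
    """Required sample size per Krejcie & Morgan's (1970) formula."""
    s = (chi_sq * population * p * (1 - p)) / (
        d ** 2 * (population - 1) + chi_sq * p * (1 - p))
    return round(s)

print(krejcie_morgan(400))  # 196
```

For a population of 400, the formula yields 196, close to the sample of 200 recruited here.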
Prior to embarking on the project, the raters were given the required
training for rating Integrative tasks, which primarily consisted of getting
to know the Integrative tasks, the Checklist of Integrative Task
Specifications, and the Integrative Task Scales. As key resources in this
project, the expert raters reliably interpreted and utilized the scales and
procedures originally designed by the researcher.
3.2 Design
Initially in this study, the two independent variables were arranged
in an 8 x 6 factorial design, with language proficiency as the dependent
variable. The first variable, the core operations defined in Multiple
Intelligences Theory, consisted of the eight categories of human
intelligence--(1) Linguistic, (2) Mathematical, (3) Musical, (4)
Spatial, (5) Kinesthetic, (6) Intrapersonal, (7) Interpersonal, and (8)
Naturalist--which were used as the content domains in designing the 48
Integrative tasks.
The second variable, the Taxonomy of Educational Objectives:
Cognitive Domain, consisted of six cumulative levels--(1)
Knowledge, (2) Comprehension, (3) Application, (4) Analysis, (5)
Synthesis, and (6) Evaluation--which were used to grade the Integrative
tasks in terms of their difficulty levels. In the Integrative Task (IT) Model,
therefore, these two variables were integrated into the two-factor constructs
underlying the 48 Integrative tasks. Table 3.1 displays the integration of
the two variables' sublevels in the IT Model.

Table 3.1
The 8 x 6 Factorial Design in the IT Model

                          8 MI Core Operations
                          Ling.  Math.  Mus.  Kinesth.  Spat.  Intra.  Inter.  Natural.
                          Int.   Int.   Int.  Int.      Int.   Int.    Int.    Int.
Evaluation Subscale
Synthesis Subscale
Analysis Subscale
Application Subscale
Comprehension Subscale
Knowledge Subscale
Prior to discussing the research instruments, two points should be
mentioned at the outset: First, what the researcher meant to investigate
in this study was merely the examinees' present language proficiency
(status quo), which was assessed through their performances on a group
of 48 Integrative tasks, so the researcher applied no manipulation or
treatment within this research project. Second, only one group of
English translation undergraduates (n=200) participated as the main
sample in this study.
3.3 Instruments
3.3.1 CELT
As a standardized language proficiency test, the Comprehensive
English Language Test (CELT) has earned a reputation for measuring
English as a second or foreign language proficiency. The test is
especially appropriate for assessing high school, college, and adult
non-native speakers (NNSs) learning English as their second or foreign
language.
For the interpretation of the CELT ratings, learners' Percent-Correct
scores are referenced against normative data for several reference
groups: (a) foreign learners admitted to American university ESL
programs; (b) adult ESL learners attending a language upgrading
program in a community college; (c) Francophone ESL learners from
Quebec; (d) high school ESL learners; (e) adult ESL learners in an
intensive language institute; and (f) adult non-native speakers (NNSs)
accepted for academic work.
The CELT consists of two sections: (a) a Structure Section, with 75
multiple-choice items, that measures English sentence structure and
language use through short conversations in which one or more
words are missing; and (b) a Vocabulary Section, with 25
multiple-choice items, that measures English vocabulary control and use.
The CELT Rating Scale ranges from zero (minimum language ability)
to 100 (maximum language ability). The participants in this study were
required to take the CELT in a two-hour session on April 29, 2007. For a
copy of the CELT administered in this project, see Appendix B.
3.3.2 MIDAS
As the other standardized instrument, the Multiple Intelligences
Developmental Assessment Scales (MIDAS) questionnaire was used in
this study to provide individual multiple intelligences profiles. The
results of MIDAS were intended to provide a reliable estimate of each
learner's intellectual disposition in the eight intelligence domains.
Designed and sponsored by Shearer (1999), MIDAS consists of 80
items that inquire about everyday abilities and require the
examinees to demonstrate their cognitive involvement and intellectual
judgment. MIDAS consists of eight main scales and 26 subscales.
The majority of the items (k=57) inquire about an examinee's level of
skill in, or performance of, a specific activity. Fewer items (k=13) ask the
respondent to assess the frequency or duration of participation in a
particular activity. Finally, the smallest group of items (k=10) inquires
about a person's displayed enthusiasm. Each item uses a five-point Likert
scale, and composite scores are interpreted in bands: 65-80 = Very high,
50-65 = High, 30-45 = Moderate, 15-30 = Low, and 0-15 = Very low.
For the MIDAS Rating Scales, see Appendix C1.
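The interpretive bands above can be sketched as a simple lookup. Treating the printed ranges as lower cut-offs is an assumption made here for illustration, since the exact MIDAS band edges are not reproduced in this text:

```python
# Band cut-offs as printed in the text, read as lower bounds (assumption).
BANDS = [(65, "Very high"), (50, "High"), (30, "Moderate"),
         (15, "Low"), (0, "Very low")]

def midas_band(composite: float) -> str:
    """Map a MIDAS composite score to its interpretive band."""
    for cutoff, label in BANDS:
        if composite >= cutoff:
            return label
    raise ValueError("composite score must be non-negative")

print(midas_band(72))  # Very high
print(midas_band(22))  # Low
```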
Response anchors are individually tailored to match the specific
content of individual questions; respondents are not forced to provide
generalized responses or to answer beyond their level of actual knowledge,
because a zero category is included for every item for when the respondent
"doesn't know" or the item "doesn't apply."

Extensively administered in Iranian educational contexts,
MIDAS has been normalized in several research studies (Ghal-eh,
2006; Rahimyan, 2003; Saeidi, 2004, to name a few) in terms of its
ecological validity and the threat of culture bias. Participants in this study
were required to complete the MIDAS questionnaire in a one-and-a-half-
hour session on March 19, 2007. For a copy of the MIDAS administered in
this study, see Appendix C2.

3.3.3 Integrative Tasks


As a prototype language assessment framework, the Integrative Task
(IT) Model consists of 48 paper-and-pencil, closed-ended and open-ended
language proficiency tasks (see Nunan, 1991, for a language task typology).
The tasks are called Integrative because of the two-factor constructs they
operationalize: Technically speaking, the language contents of the
Integrative tasks are vertically schematized, based on the eight
categories of Multiple Intelligences, while the tasks are horizontally graded,
based on the six levels of the Cognitive Domain in the Taxonomy of
Educational Objectives. (For the 8 x 6 factorial design in the IT Model,
see Table 3.1.)
In the Integrative Task matrix, therefore, every category of
intelligence, say Musical Intelligence, is the content domain--or the
biogenetic syllabus--for a group of six Integrative tasks, which are
graded according to the six levels of the Cognitive Domain in terms of their
difficulty levels, that is: (1) Scale 0-1: Integrative Musical Task at the
Knowledge Subscale; (2) Scale 0-2: Integrative Musical Task at the
Comprehension Subscale; (3) Scale 0-3: Integrative Musical Task at the
Application Subscale; (4) Scale 0-4: Integrative Musical Task at the
Analysis Subscale; (5) Scale 0-5: Integrative Musical Task at the Synthesis
Subscale; and (6) Scale 0-6: Integrative Musical Task at the Evaluation
Subscale. As examples of the 48 Integrative tasks in the IT Model,
the six Integrative Musical tasks are presented and briefly discussed below.
Musical Task at Knowledge Subscale
How many syllables are in the word engineer?
a. 2 b. 3 c. 4 d. 5

As a multiple-choice task, this item requires the examinees to
recognize the number of syllables and choose the correct response. The
item measures knowledge of the sounds of a language (here, English),
which is identified as a core operation in Musical Intelligence (Gardner,
1983). Meanwhile, "the ability to demonstrate the knowledge of a subject
matter" is an objective behavior identified at the Knowledge Level of the
Taxonomy of Educational Objectives: Cognitive Domain (Bloom, 1965,
p. 39).
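An item of this kind can be machine-checked with a rough vowel-group heuristic. The sketch below is naive rather than a genuine syllabifier (English spelling defeats simple rules such as silent final e), but it handles the sample item:

```python
import re

def count_syllables(word: str) -> int:
    """Naive syllable count: runs of vowel letters approximate syllables."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

print(count_syllables("engineer"))  # 3 (en-gi-neer), i.e. option (b)
```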

Musical Task at Comprehension Subscale


Paraphrase the following lines by John Keats.
A thing of beauty is a joy forever,
Its loveliness increases, it will
Never pass into nothingness.

As an open-ended task, this item requires the examinees (a) to use
their knowledge of English vocabulary to read the poem; and (b) to
reword the English verse into prose. The task content is a piece of poetry,
which calls on the examinees' ability to understand rhythmical language,
identified as a core operation in Musical Intelligence (Gardner,
1983). Meanwhile, "the ability to paraphrase" is an objective behavior
identified at the Comprehension Level of the Taxonomy of Educational
Objectives: Cognitive Domain (Bloom, 1965, p. 43).
Musical Task at Application Subscale
In poetry, an "iambic" line is one in which an unstressed syllable is
followed by a stressed one. Read the following line by S. T. Coleridge.
Is it iambic?
My way is to begin with the beginning.

As an open-ended task, this item requires the examinees (a) to
demonstrate the knowledge of English sounds and syllables; (b) to
comprehend the text; and (c) to apply the given definition to the text in
order to respond to the item. As a core operation in Musical Intelligence
(Gardner, 1983), familiarity with English sounds and syllables is
assessed in this item. Meanwhile, "the ability to apply a given rule" is
an objective behavior identified at the Application Level of the Taxonomy
of Educational Objectives: Cognitive Domain (Bloom, 1965, p. 57).

Musical Task at Analysis Subscale


Black and White, Up and Down, and Left and Right are examples of
juxtaposition. Complete the following poem with appropriate
juxtapositions so that the rhythm in each line is preserved.
I am the ……………………., you are the arrow.
You are the ……………….., I am the night.
I am …………………………, you are the wheel.
You're never wrong, I am ………………….. .

As an open-ended task, this item demands that the examinees (a)
demonstrate the knowledge of English sounds and syllables needed to
maintain the rhythm of the poem; (b) comprehend the poem; (c)
apply juxtaposition to the poem; and (d) analyze the relationships
among words in order to find the appropriate words missing from the
lines. As in the Musical tasks at the previous subscales, familiarity
with English sounds and syllables is identified as a core operation in
Musical Intelligence, while "the ability to examine the given
information and to break down the relationship among the components"
(here, the words) is identified as an objective behavior at the Analysis
Level of the Taxonomy of Educational Objectives: Cognitive Domain
(Bloom, 1965, p. 65).

Musical Task at Synthesis Subscale


Following the directions, complete this haiku so that the poem reflects your
definition of "jealousy" to the readers.
Jealousy, (a noun)
……………….., ferocious, (two adjectives)
Destroying, ………………… , ……………… , ………………….. , (four
present participles)
Makes everything become ………………………, (a four-word phrase
with a past participle)
…………………… . (a synonym of the noun in Line 1)

As an open-ended task, this item requires the examinees (a) to
demonstrate the knowledge of English sounds and syllables; (b) to
understand the theme of the poem; (c) to follow the directions in order
to complete the poem; and (d) to create their own meaning. As
previously mentioned, the knowledge of sounds and syllables is
identified as a core operation in Musical Intelligence (Gardner, 1983).
Likewise, "the ability to produce personal meaning which uniquely
communicates with the reader" is considered an objective behavior at the
Synthesis Level of the Taxonomy of Educational Objectives: Cognitive
Domain (Bloom, 1965, p. 70).

Musical Task at Evaluation Subscale


Meg argues emphatically for the effective role of music in TV programs.
Can you complete her comments with three advantages of using music
in TV programs?
Andy: "Why do they have music on TV news programs? To me it sounds
totally unnecessary!"
Meg: "Most probably they want to create a sense of drama! I think
music is supposed to …."

As an open-ended task type, this item requires the examinees (a) to


demonstrate a general appreciation for music and its potentials to affect
feelings; (b) to comprehend the theme and topic of the conversation; (c)
to collect positive attitudes towards music and to assert them in a series;
(d) to discriminate the positive from negative attributes; (e) to itemize
the positive attributes; and finally (f) to support their own position by
using effective language in terms of convincing examples and
explanations.
As general knowledge, the examinees' appreciation of music is
arguably distinct from their knowledge of language sounds and
sound qualities. However, "the ability to understand the relationship

between music and human feelings" (Gardner, 1983, p. 62) is identified
as a core operation in Musical Intelligence. Similarly, "the ability to
judge the value of material based on one's personal values or opinions"
is considered an objective behavior at the Evaluation Level of the
Taxonomy of Educational Objectives: Cognitive Domain (Bloom,
1965, p. 198). For detailed lists of the Multiple Intelligences Core
Operations and the Objective Behaviors at the Levels of the Cognitive
Domain, see Appendixes D and E, respectively.
As already discussed, the six-point Integrative Task Scales were
developed based on the hierarchical levels of the Cognitive Domain and
carefully utilized to evaluate the examinees' language performances.
The notion of hierarchy has two important implications in the IT
Model: First, the tasks higher in the Integrative Task Scales are
high-graders, and those lower in the scales are low-graders. Second,
according to the postulation made in Gestalt Theory, "the ability to
perform a complex behavior holistically sums up the number of abilities
needed to perform its component parts, and even something more"
[italics added] (cited in Bachman, 1990, p. 35). The difficulty
estimation in the Integrative Task Scales, therefore, is based on the
increasing difficulty levels of the objective behaviors identified in the
cumulative subscales. The descriptions of such objective behaviors
were incorporated into the Checklist of Integrative Task Specifications,
to be discussed in the next section.

3.4 Procedure
The procedure taken in this research project can be outlined in four
phases:
Phase I: Designing Checklist of Integrative Task Specifications,
Integrative Task Scales, and Self-Rating Scales
Phase II: Developing Integrative Tasks
Phase III: Piloting Integrative Tasks and Revision Process
Phase IV: Administering Integrative Tasks and Self-Rating Scales
Phase I
3.4.1 Designing Checklist of Integrative Task Specifications
As discussed earlier, in order to design the Checklist of Integrative
Task Specifications, the researcher needed to integrate the levels of the two
independent variables in this study--the categories of Multiple
Intelligences and the levels of the Cognitive Domain--into the two-factor
constructs underlying the Integrative tasks. Initially, an extensive list of MI
core operations was extracted from Gardner's (1983) Frames of Mind.
The list consisted of detailed operational definitions of the eight categories
of multiple intelligences, followed by the typical social roles and activity
types performed best by an individual strong in each intelligence.
For the full list of MI Core Operations, see Appendix D.
Next, a thorough list of objective behaviors was compiled from the
hierarchical levels of the Cognitive Domain in Bloom's (1956) Taxonomy
of Educational Objectives: Cognitive Domain, Handbook 1. For the
full list of objective behaviors at each level of the Cognitive Domain, see
Appendix E. Eventually, based on the two collected lists, the Checklist
of Integrative Task Specifications was designed, which enabled the
researcher to grade the Integrative tasks and to score the examinees'
performances on the tasks.
As Table 3.2 shows, in the Checklist of Integrative Task
Specifications, the 48 Integrative tasks could initially be checked for the
MI core operations they selectively schematized. For example, a
Linguistic task was assigned one plus sign in the Linguistic Intelligence
domain and seven minus signs in the other intelligence domains. Moreover,
as Table 3.2 displays, the Integrative tasks could be checked for the
objective behaviors they properly operationalized; Integrative tasks at
higher subscales, say at the Synthesis or Evaluation Subscales, would
require all the abilities listed in the preceding subscales, as well as
some additional cognitive demands on the part of the examinees.
Considering the example above, therefore, the Linguistic task at the
Analysis Subscale was checked with four plus signs, for the Knowledge,
Comprehension, Application, and Analysis Subscales, and two minus
signs, for the Synthesis and Evaluation Subscales.
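The plus/minus coding just described follows a mechanical rule: one plus sign for the task's own MI domain, and plus signs for every cognitive subscale up to and including the task's own level. A minimal sketch (the function and structure names are hypothetical):

```python
MI_DOMAINS = ["Linguistic", "Mathematical", "Musical", "Spatial",
              "Kinesthetic", "Intrapersonal", "Interpersonal", "Naturalist"]
SUBSCALES = ["Knowledge", "Comprehension", "Application",
             "Analysis", "Synthesis", "Evaluation"]

def checklist_row(task_domain: str, task_subscale: str) -> dict:
    """Plus/minus specification for one Integrative task."""
    level = SUBSCALES.index(task_subscale)
    # One "+" for the task's own MI domain, "-" for the other seven.
    domains = {d: "+" if d == task_domain else "-" for d in MI_DOMAINS}
    # Cumulative rule: every subscale up to the task's level gets "+".
    subscales = {s: "+" if i <= level else "-"
                 for i, s in enumerate(SUBSCALES)}
    return {**domains, **subscales}

row = checklist_row("Linguistic", "Analysis")
print(row["Linguistic"], row["Musical"])    # + -
print(row["Analysis"], row["Synthesis"])    # + -
```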

3.4.2 Designing Integrative Task Scales


The researcher's approach to designing the Integrative Task Scales was
to incorporate the objective behaviors identified at the six levels of the
Cognitive Domain, as the scaling criteria, into the difficulty estimation of
Integrative tasks. In order to generalize the examinees' performances on
Integrative tasks to real-world tasks, the researcher's initial assumption
was that if, for example, a learner could accomplish a task at the Analysis
Subscale, then that learner would likely be able to accomplish
real-world tasks with similar behavioral configurations.
As Table 3.3 displays, the Integrative Task Scales were designed to
consist of a cumulative range of scales. Therefore, as the difficulty level
of the subscales increased from the Knowledge to the Evaluation
Subscale, so did the credit given for task accomplishment. This six-point
cumulative scaling system can be considered a type of task-independent
scale, since the Integrative Task Subscales were not operationally defined
for the 48 individual Integrative tasks. This task independence was
anticipated to increase the predictive adequacy of the IT Model for
generalizing examinees'
success on the Integrative tasks to real-world tasks.
In conclusion, the design of the Integrative Task Scales has a number
of implications for the IT Model:
o As one moves up the Integrative Task Scales, the credit range expands
from the Knowledge Subscale (0-1), through Comprehension (0-2),
Application (0-3), Analysis (0-4), and Synthesis (0-5), to the Evaluation
Subscale (0-6).
o The scoring procedure needs to follow a Partial-Credit Rating System
rather than an Absolute-Credit one (see Bachman, 2004), since the credit
given for Integrative task accomplishment is cumulatively distributed to
include the credits of the preceding subscales (except for the Knowledge
Subscale).
o Regarding language task typology (see Nunan, 1991), the versatility of
the Integrative Task Scales allows both closed-ended and open-ended task
types, so that a window of opportunity is opened for the examinees to
demonstrate their abilities in a broader variety of ways.
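The partial-credit logic of the cumulative scales can be illustrated with a toy scorer, under the assumption (made here for illustration) that one credit point is awarded per subscale level a response satisfies, capped at the task's own ceiling:

```python
SUBSCALES = ["Knowledge", "Comprehension", "Application",
             "Analysis", "Synthesis", "Evaluation"]

def max_credit(task_subscale: str) -> int:
    """Ceiling of the cumulative credit range, e.g. Analysis -> 4 (scale 0-4)."""
    return SUBSCALES.index(task_subscale) + 1

def partial_credit(task_subscale: str, levels_met: int) -> int:
    """Partial-credit score: one point per subscale level satisfied."""
    return min(levels_met, max_credit(task_subscale))

print(max_credit("Evaluation"))       # 6
print(partial_credit("Analysis", 3))  # 3, of a possible 4
```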

3.4.3 Designing Self-Rating Scales


As qualitative support for the data quantitatively collected from
administering the Integrative tasks, designing a type of self-rating
scale seemed necessary in this study. As Table 3.4 shows, for
rating each Integrative task, a three-point Likert scale was designed that
permitted a range of responses to the self-rating questions, from one
(disagreement), through two (moderate agreement), to three (agreement),
for the individual Integrative tasks. The examinees were required to fill
in the Self-Rating sheets as a post-test questionnaire. They were asked to
report the degree of their familiarity with the individual Integrative tasks
(Question 1), their evaluation of their own performances on the individual
Integrative tasks (Question 2), and the ease with which they performed
the individual Integrative tasks (Question 3).

Table 3.4
Self-Rating Scales

Self-Rating Sheet

1. To what extent were you familiar with this task?
             Very familiar    Somewhat familiar    Not familiar
   Task 1         3                  2                  1
   Task 2         3                  2                  1
   Task 3         3                  2                  1
   Task 4         3                  2                  1
   ...

2. How well did you do on this task?
             I did very well.    I did okay.    I did not do well.
   Task 1         3                  2                  1
   Task 2         3                  2                  1
   Task 3         3                  2                  1
   Task 4         3                  2                  1
   ...

3. How easy or difficult was this task?
             Easy to do    Possible, but not easy    Difficult to do
   Task 1        3                   2                     1
   Task 2        3                   2                     1
   Task 3        3                   2                     1
   Task 4        3                   2                     1
   ...

In designing the Self-Rating Scales, the researcher's initial assumption
was that the examinees' perceptions of Self-Familiarity,
Self-Performance, and Self-Ease could be related to their performances
on Integrative tasks. The researcher, therefore, hypothesized that those
examinees who were more familiar with the Integrative task types, who
rated their own performances highly, and who found the Integrative tasks
relatively easy to perform would also receive higher performance ratings
on the Integrative tasks.
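Operationalizing this hypothesis begins with aggregating each examinee's three-point responses per question; a minimal sketch (the response values below are hypothetical):

```python
def mean_self_rating(responses):
    """Average an examinee's 1-3 Likert responses across tasks."""
    if not all(r in (1, 2, 3) for r in responses):
        raise ValueError("responses must be on the 1-3 scale")
    return sum(responses) / len(responses)

familiarity = [3, 2, 3, 1]            # hypothetical answers to Question 1
print(mean_self_rating(familiarity))  # 2.25
```

Per-examinee means of this kind can then be correlated with the task ratings, as reported in Chapter IV.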

Phase II
3.4.4 Developing Integrative Tasks
As mentioned, the first version of the 48 Integrative tasks was
carefully prepared from the two lists of core operations in the MI
categories and objective behaviors in the Cognitive Domain, to be piloted
for the appropriacy of their content quality and task format. It is also
worth mentioning that, as a lead-in to the examinees' performances on
Integrative tasks, all the tasks at the Knowledge Subscale were prepared
as closed-ended items, mainly of the multiple-choice type. As the
examinees moved through the items, they were required to work
with open-ended tasks, commonly perceived as more demanding in terms
of language production and use. For the Pilot Integrative tasks, see
Appendix F1.

Phase III
3.4.5 Piloting Integrative Tasks and Revision Process
After designing the first version of the Integrative tasks, the researcher
piloted them with 37 English translation undergraduates, similar in age
(ranging from 20 to 39), gender (female), and nationality (Iranian) to the
main sample of this study. Following the trial run of the Integrative tasks,
the three raters who participated in the study were required (a) to use the
Checklist of Integrative Task Specifications in evaluating the content
quality of the Integrative tasks; and (b) to use the Integrative Task Scales
in rating the Pilot examinees' performances on the individual Integrative
tasks.
Therefore, after a minimal training which primarily consisted of
getting to know the 48 Integrative tasks, the Checklist of Integrative Task
Specifications and the Integrative Task Scales, the raters independently
applied a coding system to evaluate the 48 Integrative task contents. To do
so, the raters simply assigned plus signs in the Checklist of Integrative
Task Specifications for the MI core operations properly schematized
through the content of an Integrative task in an MI category, and for the
objective behaviors adequately operationalized by the task.
After evaluating the Integrative tasks' content quality, the examinees'
performances on the Integrative tasks in the Pilot study were independently
scored by the raters using the Integrative Task Scales. The examinees' final
score on each task was then based on the average of the three ratings for
that task. The full descriptive data for Piloting the Integrative Tasks are
presented in Appendix G1.
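The final-score rule--the mean of the three independent ratings per task--can be sketched as follows (the ratings shown are hypothetical):

```python
def final_scores(ratings_by_rater):
    """Average parallel score lists from several raters, task by task.

    `ratings_by_rater` is a list of equal-length lists, one per rater.
    """
    n_raters = len(ratings_by_rater)
    return [sum(task_scores) / n_raters
            for task_scores in zip(*ratings_by_rater)]

# Hypothetical ratings for two tasks from three raters:
print(final_scores([[4, 6], [5, 6], [3, 6]]))  # [4.0, 6.0]
```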
As discussed in detail later in Chapter IV, the statistics revealed that the
distribution of scores for three items--the Intrapersonal Task at the
Application Subscale (Intra/Ap), the Musical Task at the Analysis Subscale
(Mus/An), and the Musical Task at the Synthesis Subscale (Mus/Syn)--was
positively skewed. Reported as problematic, these three items were revised
and re-rated before the final administration of the Integrative tasks. For the
Revised Integrative tasks, see Appendix F2.
Phase IV
3.4.6 Administering Integrative Tasks and Self-Rating Scales
The Revised Integrative tasks were administered in 120 minutes to 200
English translation undergraduates studying at the Islamic Azad University,
Karaj Branch, on June 3, 2007. The examinees' performances were
carefully scored by the three raters, so that the statistical data obtained for
the Integrative task administration were based on the average of the three
raters' scores for the individual Integrative tasks. Immediately following
the Integrative task administration, all examinees (n=200) completed the
Self-Rating sheets. To fill in this post-test questionnaire, the examinees
were required to rate their performance for the Self-Familiarity,
Self-Performance, and Self-Ease they experienced, answering Questions
1 to 3 for the individual Integrative tasks. It took about 20 minutes for the
examinees to complete the Self-Rating sheets.

CHAPTER IV

Data Analysis and Discussions

One of the most dangerous forms of human error is
forgetting what one is trying to achieve.

Paul Nitze

In Chapter IV, the original research questions posed in Chapter I are
addressed, and the statistical procedures carried out to answer them are
presented. To organize the chapter, the research questions themselves are
used as its subheadings.

Research Question I: Is there any statistically significant
relationship between Integrative task ratings and examinees' self-ratings
on task performance outcomes?

To answer Research Question I, the researcher initially discusses
the descriptive statistics for the Integrative task ratings and the examinees'
self-ratings, respectively. Next, Pearson product-moment correlations are
computed between the Integrative task ratings and the three subscales
of the learners' self-ratings in order to investigate the presence of
statistically significant correlations.
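For reference, the Pearson product-moment coefficient used throughout this chapter is computable directly from its definition; the sketch below is plain Python (the study itself presumably used a statistics package):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

print(round(pearson_r([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]), 3))  # 0.991
```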

4.1 Integrative Task Ratings


As discussed in Chapter III, the Integrative tasks were piloted with
a sample of 37 English translation undergraduates who voluntarily
participated in the trial run of the tasks. The Pilot performances on the
Integrative tasks were independently scored by the three raters who
participated in this study. The examinees' final score on an Integrative
task was then based on the average of the three ratings.
For the full data, see Appendix G1.
The descriptive statistics computed after Piloting the Integrative tasks
included the number of raters (1, 2, 3), the means (M), standard
deviations (SD), maximum scores (Max), minimum scores (Min), and
skewness (Skew). It should be mentioned that, throughout the rest of this
study, the researcher uses the terms items and tasks interchangeably to
refer to the same concept.
As Appendix G1 displays, each rater used the Integrative Task Scales
to evaluate the Pilot examinees' performances (n=37) on the individual
Integrative task items (k=48). The mean scores for each rater on the
Integrative tasks indicated that, on average, the raters varied from one
another on individual items by no more than .75 on the cumulative scales
of zero to six points. In fact, the raters' consistency was quite high for many
of the items. The standard deviations, minimum, and maximum values all
indicated that the raters produced a wide range of scores on the individual
Integrative tasks. In fact, the raters appeared to have used the full range
of possible values (from 0.00 to 6.00) in their scoring. However, the
skewness statistics for the three raters on each item indicated that the
distributions for the Intrapersonal Task at the Application Subscale
(Intra/Ap), the Musical Task at the Analysis Subscale (Mus/An), and the
Musical Task at the Synthesis Subscale (Mus/Syn) were positively skewed
across all raters (marked with asterisks and bolded statistics in Appendix
G1), while none of the other items was substantially skewed for any of the
raters. In addition, the raters appeared to have rated the Pilot performances
in generally similar ways. For example, in every subscale, the items with
lower mean ratings consistently (across raters) tended to have a positive
skew statistic, while the items with higher mean ratings showed consistently
negative skewness. The average statistics across the three raters for
Piloting the Integrative tasks are summarized in Table 4.1.
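The skew statistic used to flag problematic items can be sketched with the standard moment-based formula; the 1.0 flagging threshold and the score vectors below are illustrative assumptions, not the values used in the study:

```python
from math import sqrt  # not strictly needed; kept for clarity of m2**1.5

def skewness(scores):
    """Moment-based sample skewness: g1 = m3 / m2**1.5."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((s - mean) ** 2 for s in scores) / n
    m3 = sum((s - mean) ** 3 for s in scores) / n
    return m3 / m2 ** 1.5

def flag_positively_skewed(item_scores, threshold=1.0):
    """Item names whose score distribution skews markedly positive."""
    return [name for name, scores in item_scores.items()
            if skewness(scores) > threshold]

data = {"Intra/Ap": [0, 0, 0, 1, 1, 6],   # hypothetical pile-up at the low end
        "Ling/K":   [0, 2, 3, 3, 4, 6]}   # hypothetical symmetric item
print(flag_positively_skewed(data))        # ['Intra/Ap']
```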

Table 4.1
Descriptive Statistics for Piloting Integrative Tasks

Variables                  Mean   SD   Min   Max   Skew   N
Total Integrative Tasks    [values illegible in this copy; see Appendix G1]

Reported as positively skewed, the items Intrapersonal Task at the
Application Subscale (Intra/Ap), Musical Task at the Analysis Subscale
(Mus/An), and Musical Task at the Synthesis Subscale (Mus/Syn) were
revised before the administration of the Integrative tasks to the main
sample. For the Revised Integrative tasks, see Appendix F2. The revised
items were rated again for their content quality, and the computed Pearson
product-moment correlation coefficient was r = .87, statistically significant
at a two-tailed p < .01.
Revised and positively evaluated, the Integrative tasks were administered
in 120 minutes on June 3, 2007. As displayed in Appendix G2, statistics
similar to those computed for the Pilot administration were obtained, plus
one additional statistic for each item: the Item-Total Correlation (r
item-total, also called the Item Discrimination Index), which measured the
correlation between the individual items, on the one hand, and the total
Integrative Task ratings, on the other. This statistic is discussed further in
support of the Inter-Item Reliability of the Integrative Task ratings.
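The item-total correlation can be computed by correlating each item's scores with total scores; the corrected variant below, which excludes the item from its own total, is a common refinement and an assumption about the exact formula applied (the data are hypothetical):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def item_total_correlation(item, all_items, corrected=True):
    """r between one item's scores and (rest-of-)total scores.

    `all_items` is a list of per-item score lists over the same
    examinees; `item` is an index into that list.
    """
    totals = []
    for person in zip(*all_items):
        total = sum(person)
        if corrected:            # exclude the item's own score
            total -= person[item]
        totals.append(total)
    return pearson_r(all_items[item], totals)

items = [[1, 2, 3, 4], [2, 2, 4, 4], [4, 3, 2, 1]]   # hypothetical data
print(round(item_total_correlation(0, items), 3))    # -0.447
```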
The mean scores for the Integrative tasks administration indicated that the
items varied somewhat in average difficulty, from a low mean of .32 on a
two-point scale for the Integrative Musical Task at Knowledge Subscale
(Mus/K) to a high mean of 5.54 on a six-point scale for the Integrative
Linguistic Task at Evaluation Subscale (Ling/Eva), with an average mean
of 3.60 across all items. The average of the means indicated that the items
were reasonably well-centered, though there was some variation in the
difficulty of individual items. The standard deviations indicated that each
item produced a fairly high degree of variance, though the degree ranged
moderately from a rather low SD of 1.00 to a high SD of 1.85. The
minimum and maximum statistics indicated that, on average, the raters
utilized the entire range of each scale in assigning their scores for all 48
Integrative tasks. In addition, in terms of skewness, none of the items
appeared to be markedly skewed, even though there was moderate
variation in the difficulty of the tasks. Note also that the N-sizes were the
same (200) for all items. The summary of the descriptive statistics for the
Integrative tasks administration can be seen in Table 4.2.
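The descriptive statistics reported in Tables 4.1 and 4.2 (mean, SD, minimum, maximum, and skewness) can be sketched as follows. The ratings below are invented for illustration, not the study's data, and skewness is computed as the average cubed z-score.

```python
# Descriptive statistics of the kind reported in Table 4.2, computed for a
# hypothetical set of ratings on one Integrative task (sample data only).
from statistics import mean, pstdev

def describe(scores):
    """Return Mean, SD, Min, Max, skewness, and N for a list of ratings."""
    m = mean(scores)
    sd = pstdev(scores)
    # Skewness as the average cubed z-score: negative values mean the
    # ratings pile up at the high end of the scale, positive values at the low end.
    skew = sum(((x - m) / sd) ** 3 for x in scores) / len(scores)
    return {"M": round(m, 2), "SD": round(sd, 2),
            "Min": min(scores), "Max": max(scores),
            "Skew": round(skew, 2), "N": len(scores)}

ratings = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6]  # a six-point Integrative task scale
print(describe(ratings))
```

An item whose ratings cluster near the top of its scale would show a high mean and a negative skew statistic, matching the pattern described for the Piloting data.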

Table 4.2
Descriptive Statistics for Integrative Tasks Administration
Tasks: Total Integrative Tasks
Columns: Mean, SD, Min, Max, Skew, r item-total, N

4.2 Self-Ratings

After analyzing the correlation data for the Integrative tasks
administration, the statistics for the examinees' self-ratings were
computed. The descriptive data for the examinees' self-ratings consisted
of the means (M), standard deviations (SD), minimum scores (Min),
maximum scores (Max), and skewness (Skew). As Appendix H displays,
on average, the examinees appeared to have properly differentiated
among the Integrative task items on the three self-rating subscales, that
is, Self-Familiarity, Self-Performance, and Self-Ease.
The means of the self-ratings ranged across much of the three-point
scales (from an overall low mean of 1.09 to an overall high mean of
2.65), while the standard deviations, minimum, maximum, and skew
statistics showed that the examinees tended to use the entire scales in
evaluating their performances on the Integrative tasks. The lower means
(in combination with relatively high positive skew statistics) belonged to
the items for which the examinees rated their Self-Familiarity,
Self-Performance, or Self-Ease relatively low, and vice versa. The
standard deviations, ranging from .36 to .49, indicated that these
subscales were perhaps producing proportionally less variance than the
Integrative task ratings. At the same time, the minimum and maximum
statistics indicated that the examinees utilized most of the one-to-three
scales for all items. The skew statistics also showed that none of the
scores were markedly skewed. Table 4.3 summarizes the computed
results for the average ratings across all Integrative tasks on the subscales
of Self-Familiarity, Self-Performance, and Self-Ease.

Table 4.3
Average Self-Ratings across all Integrative Tasks
Variables: Self-Familiarity, Self-Performance, Self-Ease
Columns: M, SD, Min, Max, Skew, N
Note. All correlations were significant.

In conclusion, the qualitative data obtained from administering the
Self-rating Scales might suggest that the examinees properly and
consistently perceived differences in their performances on the 48
Integrative tasks, in terms of their familiarity with the tasks, the ease with
which each task was performed, and their own performances on the
individual Integrative tasks.

4.3 Correlation of Performance Measures


After computing the correlation statistics for the Integrative task ratings
and the examinees' self-ratings separately, the researcher obtained
correlation coefficients for both measures of the examinees'
performances, that is, Integrative tasks and self-ratings, in order to find
the extent to which these two sets of quantitative and qualitative data
were significantly correlated, as required by Research Question I.
Table 4.4 presents the computed correlation coefficients for ratings
across the performance measures: (a) Integrative Task ratings expressed
as raw scores (I-T Raw) and (b) average self-ratings (computed across all
Integrative tasks) for the three self-rating subscales (Self-Familiarity,
Self-Performance, Self-Ease).

Table 4.4
Correlation Statistics for Performance Measures
Variables: I-T Raw, Self-Fam, Self-Perf, Self-Ease
Note. All correlations were statistically significant (two-tailed).

As Table 4.4 shows, the performance measures (from I-T Raw scores
to the Self-Ease statistics) exhibited strong correlations with one another,
ranging from r=.64 (between I-T Raw scores and Self-Ease) to r=.90
(between Self-Performance and Self-Ease). The average examinees'
self-ratings correlated quite well with the Integrative Task ratings as well
as with each other. In addition, the average self-ratings of the examinees'
own performances (Self-Performance) correlated consistently higher than
the average self-ratings for familiarity with the tasks (Self-Familiarity) or
for ease in performing the tasks (Self-Ease). In short, the overall
correlation between the two sets of ratings in this project was better than
expected.
In conclusion, in posing Research Question I, the researcher's main
concern was to investigate the credibility of the Integrative tasks in terms
of the extent to which the quantitative data obtained from the examinees'
performances on the Integrative tasks would be supported by the
qualitative data obtained from the examinees' self-evaluations.
Technically speaking, the degree of association between two sets of data
is investigated by computing various measures of correlation. To this
end, therefore, the researcher computed the Pearson product-moment
correlations between the Integrative task raw scores and the three
subscales of self-ratings. The high positive measures of correlation
provided the researcher with reliable support to reject Null Hypothesis I,
namely that there is no statistically significant positive relationship
between Integrative Task ratings and learners' self-ratings on task
performance outcomes.

Research Question II
o Are Integrative Task ratings reliable? If yes,
(IIa) What are Inter-Rater reliability estimates?
(IIb) What are Inter-Item reliability estimates?

The intricacies of the rating criteria and scaling procedures, which
eventually yielded ample statistics, lay behind the researcher's early
concern for consistency in assessing the examinees' performances across
individual and overall Integrative tasks. In this study, therefore, two
common types of reliability estimates were computed: (IIa) Inter-Rater
and (IIb) Inter-Item reliability measures.

4.4 Inter-Rater Reliability Estimates


Test reliability indicates the degree to which a set of test scores is
consistent in what it measures. Inter-Rater Reliability is a type of
reliability which refers to the consistency of ratings given by different
raters to a sample of performances. In general, reliability estimates can
range from zero to 1.00, and they indicate the extent to which ratings can
be considered consistent. Any value of less than .70 would indicate
possible room for test improvement, either through better briefing of the
raters or through revision of the scoring system.
In this study, in order to estimate the degree of consistency among the
raters' scores given to the examinees' performances on 48 Integrative
tasks, the researcher computed the measures of Pearson product-moment
correlation as a widely-used test of Inter-Rater Reliability. Table 4.5
reports the correlation coefficients for pairs of raters across all Integrative
tasks: Rater 1 and Rater 2, Rater 1 and Rater 3, Rater 2 and Rater 3,
Average of the three sets of ratings (the Fisher z transformation method of
averaging), as well as the Full-Test Reliability measures adjusted by the
Spearman-Brown Prophecy formula. See Appendix I for the full
Inter-Rater Reliability estimates.
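The two summary statistics in Table 4.5 can be sketched as follows: averaging the pairwise inter-rater correlations through the Fisher z transformation, and adjusting the single-rating reliability with the Spearman-Brown Prophecy formula. The correlation values below are hypothetical, not the study's results.

```python
# Fisher z averaging of pairwise inter-rater correlations, plus the
# Spearman-Brown Prophecy adjustment for k combined ratings.
import math

def fisher_z_average(rs):
    """Average correlations in Fisher z space, then transform back to r."""
    zs = [math.atanh(r) for r in rs]      # r -> z
    return math.tanh(sum(zs) / len(zs))   # mean z -> r

def spearman_brown(r, k):
    """Reliability of k combined ratings, given single-rating reliability r."""
    return (k * r) / (1 + (k - 1) * r)

pairwise = [0.94, 0.91, 0.89]             # r12, r13, r23 (hypothetical)
avg_r = fisher_z_average(pairwise)
full = spearman_brown(avg_r, 3)           # three raters taken together
print(round(avg_r, 2), round(full, 2))
```

Averaging in z space avoids the slight downward bias of averaging r values directly, and the Spearman-Brown adjustment shows why the three ratings taken together are always more reliable than any single rating, as the chapter observes.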
Table 4.5
Inter-Rater Reliability Estimates for Integrative Task Ratings
Tasks: Total Integrative Tasks
Columns: r (Rater 1 & 2), r (Rater 1 & 3), r (Rater 2 & 3), average r, Adjusted S-B

As summarized in Table 4.5 and fully displayed in Appendix I, the
reliability of the individual ratings was shown by the Pearson
product-moment correlations, and the reliability of the three scores taken
together was represented by the Adjusted Spearman-Brown statistic in
the rightmost column. The reliability estimates for the individual ratings
varied considerably, from r=.45 to r=.98, while the reliability indexes for
the three ratings taken together (adjusted using the Spearman-Brown
Prophecy formula) on each item were generally higher (as would be
expected) and their range was considerably smaller (from r=.65 to r=.99).
The pairs of total scores correlated at r=.94, and the adjusted three-rating
reliability was r=.98, statistically significant at a two-tailed p<.05.
In short, in estimating the Inter-Rater reliability for the Integrative
task items, the researcher's speculation was that if the scores assigned to
the examinees' performances on the Integrative tasks were paired with
one another (across the three raters), they would show significant
correlations. The computed correlation statistics eventually proved quite
respectable and supported the consistency of the Integrative task ratings
in terms of Inter-Rater reliability.

4.5 Inter-Item Reliability Estimates


Inter-Item Reliability or Internal-Consistency Reliability is a measure of
the degree to which the items or parts of a test are homogeneous or
consistent with each other and with the total test scores. Similar to other
tests of reliability, the Inter-Item Reliability test ranges from zero to 1.00.
In this study, the measures of Item Total Correlation (r item-total, also
called the Item Discrimination Index) were computed, which show the
degree of correlation between individual items and the total ratings on a
test.
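A minimal sketch of the Item Total Correlation computation, assuming invented scores for three items and six examinees (the study used 48 items and 200 examinees):

```python
# Item Total Correlation: each item's scores are correlated with the
# examinees' total scores across all items (invented data for illustration).
def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# rows = items, columns = examinees
items = [
    [1, 2, 2, 3, 4, 5],   # item 1 scores for six examinees
    [2, 2, 3, 3, 5, 6],   # item 2
    [1, 1, 2, 4, 4, 5],   # item 3
]
totals = [sum(col) for col in zip(*items)]  # one total per examinee
for i, item in enumerate(items, 1):
    print(f"item {i}: r(item-total) = {pearson_r(item, totals):.2f}")
```

An item whose r item-total approaches 1.00 ranks the examinees much as the whole test does; a value near zero would flag an item that discriminates differently from the rest.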
As Table 4.6 shows, the statistics computed consisted of the means
(M), the standard deviations (SD), the minimum scores (Min), the
maximum scores (Max), measures of skewness (Skew), and the Item
Total Correlation for a sample size of 200. Since the other statistics were
presented earlier in Section 4.1, only the significance of the Item Total
Correlation coefficients remained to be discussed in this section. As
Table 4.6 displays, the average Item Total Correlation of r=.78 indicated
that all items discriminated in a manner quite similar to the total scores
on the Integrative tasks, though they did so to varying degrees, with
correlations ranging from r=.21 to r=.79 (for the full data, see Appendix
G2). It was also noticed that all the correlation coefficients were
statistically significant at a two-tailed p<.05.

Table 4.6
Inter-Item Reliability Estimates for Integrative Task Ratings
Tasks: Total Integrative Tasks
Columns: Mean, SD, Min, Max, Skew, r item-total, N

In a nutshell, the main concern in Research Questions IIa and IIb was to
investigate the degree of consistency of the Integrative Task ratings,
which was examined by testing Inter-Rater and Inter-Item Reliability. On
the one hand, the high correlation coefficients for the three raters' sets of
ratings revealed that the Integrative Task ratings remained consistent
across three independent scoring procedures. On the other hand, the
statistically significant measures of Item Total Correlation indicated that
the individual Integrative task ratings discriminated the examinees'
performances in a pattern similar to their total scores on the Integrative
tasks. Confidently enough, then, the researcher rejected Null Hypothesis
II, namely that Integrative Task ratings are not statistically consistent in
terms of (IIa) Inter-Rater and (IIb) Inter-Item reliability estimates.

Research Question III
o Are the Integrative Task ratings valid? If yes,
(IIIa) What are Content Validity estimates?
(IIIb) What are Criterion-Related Validity estimates?
(IIIc) What are Construct Validity estimates?

Simply defined, test validity is the degree to which the interpretations
or decisions we make on the basis of test scores are meaningful, useful,
and appropriate. In this study, therefore, three widely-used estimates of
validity, that is, content validity, criterion-related validity, and construct
validity, were computed to examine the trustworthiness of the Integrative
Task ratings; each is discussed in turn.

4.6 Content Validity


In simple words, the content validity of a test can be defined as the extent
to which the test adequately and sufficiently measures the particular skills
or behaviors it sets out to measure. From the beginning of this study, the
procedures for designing the Integrative tasks and developing the
Integrative Task Scales were carefully planned so as to enhance the
content validity of the Integrative tasks. Furthermore, as explained
earlier, three experienced language raters and research professionals
independently utilized the Checklist of Integrative Task Specifications to
evaluate the content validity of the 48 Integrative tasks designed in this
project.
The raters were asked to (a) use the Checklist of Integrative Task
Specifications to evaluate the content quality of the Integrative tasks and
(b) use the Integrative Task Scales to evaluate the examinees'
performances on the individual task items. After minimal training, the
raters independently applied a coding system to evaluate the contents of
the 48 Integrative tasks. To do so, the raters simply assigned plus signs
on the Checklist of Integrative Task Specifications for the core operations
properly schematized by an Integrative task in an MI category, and for
the objective behaviors adequately operationalized by the task in a
Cognitive Domain subscale (see the Checklist of Integrative Task
Specifications, p. 129). To compare their evaluations, the researcher
computed the Pearson product-moment correlations, which are
summarized in Table 4.7.

Table 4.7
Content Validity of Integrative Tasks

Correlations

rater1 rater2 rater3

rater1 Pearson Correlation 1 .510** .628**
Sig. (2-tailed) .000 .000
Sum of Squares and Cross-products 39.000 18.000 23.600
Covariance .394 .182 .238
N 48 48 48
rater2 Pearson Correlation .510** 1 .688**
Sig. (2-tailed) .000 .000
Sum of Squares and Cross-products 18.000 32.000 23.400
Covariance .182 .323 .236
N 48 48 48
rater3 Pearson Correlation .628** .688** 1
Sig. (2-tailed) .000 .000
Sum of Squares and Cross-products 23.600 23.400 36.160
Covariance .238 .236 .365
N 48 48 48
**. Correlation is significant at the 0.01 level (2-tailed).

As presented in Table 4.7, the correlation measures exhibited
moderate-to-strong relationships with one another, ranging from .510
(Rater 1 and Rater 2), through .628 (Rater 1 and Rater 3), to .688 (Rater 2
and Rater 3). The homogeneity of the covariances supported the
significance of the correlation coefficients at a two-tailed p<.01. The
statistically significant measures of correlation, therefore, showed that
the raters were consistent in their evaluation of the content quality of the
Integrative tasks. In addition, because the estimates of content validity
were based on the ratings of three expert raters, the reliability of their
judgments was enhanced.
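As a rough illustration (with invented codes, not the raters' actual checklists), the plus-sign codings of two raters can be compared by coding each checklist entry as 1 (plus assigned) or 0 and correlating the two binary vectors; for binary data the Pearson formula reduces to the phi coefficient.

```python
# Agreement between two raters' plus-sign codings on a content checklist,
# computed with the ordinary Pearson formula over 0/1 codes (hypothetical data).
def pearson_r(x, y):
    """Pearson product-moment correlation (phi coefficient for binary data)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

rater1 = [1, 1, 0, 1, 1, 0, 1, 0, 1, 1]  # plus signs over ten checklist entries
rater2 = [1, 1, 0, 1, 0, 0, 1, 1, 1, 1]
print(round(pearson_r(rater1, rater2), 2))
```

Computed over all 48 tasks per pair of raters, coefficients of this kind would populate a matrix such as Table 4.7.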


In a nutshell, the major concern in Research Question IIIa was to
investigate the appropriateness of the content of the Integrative tasks,
which was examined by computing the Pearson product-moment
correlations among the three professional raters' judgments. As expected,
the acceptable range of correlation coefficients revealed that the
interpretations made on the basis of the Integrative task ratings were
sufficiently trustworthy in terms of content validity.

4.7 Criterion-Related Validity


Criterion-Related Validity (also called Statistical Validity, Pragmatic
Validity, or External Validity) is a form of validity in which a test is
compared or correlated with an outside criterion measure. In this study,
the Comprehensive English Language Test (CELT) and the Multiple
Intelligences Developmental Assessment Scales (MIDAS) were
administered as two criterion tests of language proficiency and Multiple
Intelligences, respectively. The CELT and MIDAS ratings were
separately correlated with the examinees' performances on the
Integrative tasks in terms of Pearson product-moment coefficients.

CELT

As a prototype framework for assessing general language proficiency,
the IT Model required statistical evaluation to support the dependability
of the scaling criteria developed to score the Integrative tasks. To this
end, the Comprehensive English Language Test (CELT) was
administered as an external criterion test with which the Integrative task
ratings in every MI category were correlated. Initially, descriptive
statistics were obtained for the CELT scores.

Table 4.8
Descriptive Statistics for CELT Scores
Test: CELT (Valid N, listwise)
Columns: M, Max, Min, SD, N

As Table 4.8 shows, the statistics for the CELT scores included the mean
(M), maximum scores (Max), minimum scores (Min), standard deviation
(SD), and the number of learners (N). The mean, minimum, and
maximum scores all revealed that the test scores were well-centered,
while the standard deviation showed a rather wide range of CELT scores.
After administering the Integrative tasks, the researcher computed the
Pearson product-moment correlation coefficients to associate the
examinees' Integrative Task ratings with their CELT scores as the first
measure of criterion-related validity. The researcher speculated that
examinees with higher CELT scores would do better on the Integrative
tasks designed in each MI category.
Prior to the discussion of the CELT and Integrative task R-matrixes,
the concept of correlation should be elaborated at the outset. A
correlation coefficient is a statistic devised for measuring the strength, or
degree, of linear association between two variables. The most familiar
correlation coefficient is the Pearson product-moment correlation
coefficient (r). The Pearson correlation is so defined that it can take
values only within the range of -1 to +1. Pearson product-moment
correlations are normally represented in scatterplots. The larger the
absolute value of r (i.e., ignoring the sign), the narrower the ellipse and
the closer to the regression line the points in the scatterplot will fall. In
other words, the narrower the elliptical cloud of points, the stronger the
association and the greater the absolute value of the Pearson correlation.
Whenever there is no, or only a weak, association between two variables,
their scatterplot should be a roughly circular cloud. In summary, the
variation and clustering of the points around the regression line
determine the magnitude of the correlation coefficient. The supposition
of linearity between two variables, therefore, must be confirmed by
inspection of the scatterplots.
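The link described above between the size of |r| and the shape of the scatterplot cloud can be illustrated with synthetic data in the CELT score range (invented, not the study's scores): a variable correlates strongly with a slightly noisy copy of itself (a narrow ellipse) and weakly with a heavily noisy copy (a roughly circular cloud).

```python
# Larger |r| <-> tighter elliptical cloud, demonstrated with synthetic data.
import random

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

random.seed(1)
x = [random.uniform(50, 95) for _ in range(200)]  # CELT-like score range
tight = [v + random.gauss(0, 2) for v in x]       # little noise: narrow ellipse
loose = [v + random.gauss(0, 40) for v in x]      # heavy noise: circular cloud
print(round(pearson_r(x, tight), 2), round(pearson_r(x, loose), 2))
```

Plotted, the first pair of variables would hug the regression line while the second would scatter into a round cloud, which is exactly the diagnostic the chapter applies to the CELT scatterplots.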
With this introduction in mind, to estimate the criterion-related validity
of the Integrative task ratings, the Pearson product-moment coefficients
were first computed separately for the CELT-Integrative Task ratings
and the MIDAS-Integrative Task ratings. Next, the R-matrixes were
re-examined for their consistency and accuracy against the
accompanying scatterplots. Moreover, the patterns of correlation among
the individual Integrative tasks at different subscales are briefly
discussed.

4.7.1 CELT and Integrative Linguistic Tasks
As Table 4.9 shows, the examinees' CELT scores and their Integrative
Linguistic ratings demonstrated moderate-to-high correlation
coefficients, from r=.483, for CELT and the Integrative Linguistic task at
Analysis Subscale, to r=.647, for CELT and the Integrative Linguistic
task at Comprehension Subscale. All the correlation statistics proved
significant at a two-tailed p<.01.

Table 4.9
Pearson Product-moment Correlations for CELT and Integrative Linguistic Ratings (2-tailed)
Correlations

CELT LingK LingC LingAp LingAn LingS LingE

CELT Pearson Correlation 1 .589** .647** .487** .483** .581** .643**


Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
LingK Pearson Correlation .589** 1 .823** .503** .435** .435** .514**
Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
LingC Pearson Correlation .647** .823** 1 .720** .552** .546** .683**
Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
LingAp Pearson Correlation .487** .503** .720** 1 .654** .534** .596**
Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
LingAn Pearson Correlation .483** .435** .552** .654** 1 .820** .641**
Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
LingS Pearson Correlation .581** .435** .546** .534** .820** 1 .789**
Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
LingE Pearson Correlation .643** .514** .683** .596** .641** .789** 1
Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
**. Correlation is significant at the 0.01 level (2-tailed).

As Figure 4.1 displays, the examinees' CELT scores, ranging from 50.00
to 90.00, and their Integrative Linguistic ratings demonstrated narrow
linear patterns where the correlation statistics were high, for example for
CELT and the Integrative Linguistic task at Knowledge Subscale
(r=.589). Cloudy circular patterns, however, could be seen where the
correlation coefficients were lower, for example for CELT and the
Integrative Linguistic task at Application Subscale (r=.487). Moreover,
the narrow linear patterns among the Integrative Linguistic ratings, in
cases such as the Linguistic tasks at Comprehension and Application
Subscales (r=.720), and the circular patterns, in cases such as the
Linguistic tasks at Comprehension and Synthesis Subscales (r=.546), are
graphically displayed in Figure 4.1. The correlation statistics for CELT
and the Integrative Linguistic ratings, therefore, were statistically
significant, and the accuracy of the R-matrix was confirmed by the
correlation patterns displayed in the scatterplot.

Figure 4.1
Scatter Plot for CELT Scores and Integrative Linguistic Ratings

4.7.2 CELT and Integrative Mathematical Tasks


As Table 4.10 shows, the examinees' CELT scores and their Integrative
Mathematical ratings displayed low-to-moderate Pearson
product-moment correlation coefficients, ranging from r=.292, for CELT
and the Integrative Mathematical task at Comprehension Subscale, to
r=.464, for CELT and the Integrative Mathematical task at Synthesis
Subscale. As displayed in Table 4.10, all the correlation coefficients
proved significant at a two-tailed p<.01.

Table 4.10
Pearson Product-moment Correlations for CELT and Integrative Mathematical Ratings (2-tailed)
Correlations

CELT MathK MathC MathAp MathAn MathS MathE

CELT Pearson Correlation 1 .370** .292** .411** .458** .464** .364**
Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
MathK Pearson Correlation .370** 1 .667** .549** .758** .736** .583**
Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
MathC Pearson Correlation .292** .667** 1 .654** .627** .565** .532**
Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
MathAp Pearson Correlation .411** .549** .654** 1 .738** .581** .620**
Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
MathAn Pearson Correlation .458** .758** .627** .738** 1 .800** .688**
Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
MathS Pearson Correlation .464** .736** .565** .581** .800** 1 .698**
Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
MathE Pearson Correlation .364** .583** .532** .620** .688** .698** 1
Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
**. Correlation is significant at the 0.01 level (2-tailed).

Ranging from 50.00 to 95.00, as Figure 4.2 displays, the examinees'
CELT scores and their Integrative Mathematical ratings demonstrated
narrow linear patterns, in cases such as CELT and the Integrative
Mathematical task at Analysis Subscale (r=.458), and circular patterns, in
cases such as CELT and the Integrative Mathematical task at Application
Subscale (r=.411). Similarly, the high correlation coefficients for the
Integrative Mathematical tasks, in cases such as the Mathematical tasks
at Knowledge and Comprehension Subscales (r=.667), and the circular
patterns of correlation, in cases such as the Integrative Mathematical
tasks at Comprehension and Evaluation Subscales (r=.532), are
graphically displayed. The compatibility of the correlation statistics with
their graphic representations, therefore, supported the accuracy of the
data in the R-matrix.

Figure 4.2
Scatter Plot for CELT and Integrative Mathematical Ratings

4.7.3 CELT and Integrative Musical Tasks
As Table 4.11 indicates, the examinees' CELT scores and their
Integrative Musical ratings displayed low-to-moderate Pearson
product-moment correlation coefficients, ranging from r=.111, for CELT
and the Integrative Musical task at Comprehension Subscale, to r=.454,
for CELT and the Integrative Musical task at Synthesis Subscale. Except
for the correlation coefficient between CELT and the Integrative Musical
task at Comprehension Subscale (r=.111), which was not statistically
significant, all the correlation coefficients proved significant at a
two-tailed p<.01.

Table 4.11
Pearson Product-moment Correlations for CELT and Integrative Musical Ratings (2-tailed)
Correlations

CELT MusK MusC MusAp MusAn MusS MusE

CELT Pearson Correlation 1 .332** .111 .208** .314** .454** .347**
Sig. (2-tailed) .000 .117 .003 .000 .000 .000
N 200 200 200 200 200 200 200
MusK Pearson Correlation .332** 1 .821** .763** .733** .777** .677**
Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
MusC Pearson Correlation .111 .821** 1 .790** .670** .632** .515**
Sig. (2-tailed) .117 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
MusAp Pearson Correlation .208** .763** .790** 1 .757** .705** .510**
Sig. (2-tailed) .003 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
MusAn Pearson Correlation .314** .733** .670** .757** 1 .796** .611**
Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
MusS Pearson Correlation .454** .777** .632** .705** .796** 1 .760**
Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
MusE Pearson Correlation .347** .677** .515** .510** .611** .760** 1
Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
**. Correlation is significant at the 0.01 level (2-tailed).

Ranging from 50.00 to 95.00, as Figure 4.3 displays, the examinees'
CELT scores and their Integrative Musical ratings demonstrated linear
patterns of correlation, in cases such as CELT and the Integrative
Musical task at Synthesis Subscale (r=.454), and circular patterns, in
cases such as CELT and the Integrative Musical task at Application
Subscale (r=.208). Meanwhile, the narrow patterns of correlation for the
Integrative Musical tasks, in cases such as the Integrative tasks at
Knowledge and Comprehension Subscales (r=.821), and the circular
patterns, in cases such as the Integrative Musical tasks at Evaluation and
Application Subscales (r=.510), are properly displayed. In conclusion,
the correlation statistics and their graphic representations turned out to
be compatible, a sign of the accuracy of the findings in the R-matrix.

Figure 4.3
Scatter Plot for CELT and Integrative Musical Ratings

4.7.4 CELT and Integrative Kinesthetic Tasks


As Table 4.12 indicates, the examinees' CELT scores and their
Integrative Kinesthetic ratings had low Pearson product-moment
correlation coefficients, ranging from r=.137, for CELT and the
Integrative Kinesthetic task at Comprehension Subscale, to r=.242, for
CELT and the Integrative Kinesthetic task at Synthesis Subscale. Despite
being low, the Pearson product-moment correlation statistics proved
significant at a two-tailed p<.01.

Table 4.12
Pearson Product-moment Correlations for CELT and Integrative Kinesthetic Ratings (2-tailed)
Correlations

CELT KinK KinC KinAp KinAn KinS KinE

CELT Pearson Correlation 1 .206* .137* .225* .102* .242** .222**
Sig. (2-tailed) .003 .005 .002 .004 .001 .002
N 200 200 200 200 200 200 200
KinK Pearson Correlation .206* 1 .694** .612** .412** .530** .525**
Sig. (2-tailed) .003 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
KinC Pearson Correlation .137* .694** 1 .651** .584** .614** .684**
Sig. (2-tailed) .005 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
KinAp Pearson Correlation .225* .612** .651** 1 .770** .693** .528**
Sig. (2-tailed) .002 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
KinAn Pearson Correlation .102* .412** .584** .770** 1 .749** .514**
Sig. (2-tailed) .004 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
KinS Pearson Correlation .242** .530** .614** .693** .749** 1 .704**
Sig. (2-tailed) .001 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
KinE Pearson Correlation .222** .525** .684** .528** .514** .704** 1
Sig. (2-tailed) .002 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).

As Figure 4.4 displays, ranging from 50.00 to 95.00, the examinees'
CELT scores and their Integrative Kinesthetic ratings demonstrated
linear patterns of correlation, in cases such as CELT and the Integrative
Kinesthetic task at Knowledge Subscale (r=.206), and roughly circular
patterns of correlation, in cases such as CELT and the Integrative
Kinesthetic task at Analysis Subscale (r=.102). Moreover, as expected
based on the correlation data in the R-matrix, most Integrative
Kinesthetic tasks did not show linear patterns of correlation with the
CELT scores.

Figure 4.4
Scatter Plot for CELT and Integrative Kinesthetic Ratings

4.7.5 CELT and Integrative Spatial Tasks


As Table 4.13 indicates, the examinees' CELT scores and their
Integrative Spatial ratings showed low Pearson product-moment
correlation coefficients, ranging from r=.075, for CELT and the
Integrative Spatial task at Knowledge Subscale, to r=.328, for CELT and
the Integrative Spatial task at Application Subscale. Despite the low
range of the coefficients, all r-measures were statistically significant at a
two-tailed p<.01 or p<.05.

Table 4.13
Pearson Product-moment Correlations for CELT and Integrative Spatial Ratings (2-tailed)
Correlations

CELT SpatK SpatC SpatAp SpatAn SpatS SpatE

CELT Pearson Correlation 1 .075* .195 * .328 ** .141 * .198 ** .217 *


Sig. (2-tailed) .003 .006 .000 .004 .005 .002
N 200 200 200 200 200 200 200
SpatK Pearson Correlation .075* 1 .187 ** .097 .150 * .116 .087
Sig. (2-tailed) .003 .008 .170 .035 .103 .219
N 200 200 200 200 200 200 200
SpatC Pearson Correlation .195 ** .187 ** 1 .677 ** .685 ** .512 ** .708 **
Sig. (2-tailed) .006 .008 .000 .000 .000 .000
N 200 200 200 200 200 200 200
SpatAp Pearson Correlation .328 ** .097 .677 ** 1 .709 ** .523 ** .551 **
Sig. (2-tailed) .000 .170 .000 .000 .000 .000
N 200 200 200 200 200 200 200
SpatAn Pearson Correlation .141 * .150 * .685 ** .709 ** 1 .618 ** .633 **
Sig. (2-tailed) .004 .035 .000 .000 .000 .000
N 200 200 200 200 200 200 200
SpatS Pearson Correlation .198 * .116 .512 ** .523 ** .618 ** 1 .638 **
Sig. (2-tailed) .005 .103 .000 .000 .000 .000
N 200 200 200 200 200 200 200
SpatE Pearson Correlation .217 * .087 .708 ** .551 ** .633 ** .638 ** 1
Sig. (2-tailed) .002 .219 .000 .000 .000 .000
N 200 200 200 200 200 200 200
**. Correlation is significant at the 0.01 level (2-tailed).

*. Correlation is significant at the 0.05 level (2-tailed).

In Figure 4.5, ranging from 50.00 to 95.00, the examinees' CELT scores
and their Integrative Spatial ratings demonstrated rather linear patterns
of correlation, in cases such as CELT and the Integrative Spatial task at
Application Subscale (r=.328), and circular patterns, in cases such as
CELT and the Integrative Spatial task at Evaluation Subscale (r=.217).
Moreover, as Figure 4.5 shows, the statistically non-significant
correlations of the Integrative Spatial task at Knowledge Subscale with
most of the other Spatial tasks are properly displayed as flat patterns.
The narrow linear patterns, in cases such as the Spatial tasks at
Application and Analysis Subscales (r=.709), and the circular patterns,
in cases such as the Spatial tasks at Application and Evaluation
Subscales (r=.551), are also graphically displayed. As in the previous
figures, the compatibility of the correlation statistics and their graphic
representations supported the accuracy of the obtained correlation data.

[Scatterplot matrix of CELT against the Integrative Spatial ratings (SpatK, SpatC, SpatAp, SpatAn, SpatS, SpatE); score axes range from 50.00 to 95.00.]

Figure 4.5. Scatter Plot for CELT and Integrative Spatial Ratings

4.7.6 CELT and Integrative Intrapersonal Tasks


Examinees' CELT scores and their Integrative Intrapersonal ratings showed,
as Table 4.14 indicates, low Pearson product-moment correlation
coefficients, ranging from r=.208, for CELT and the Integrative
Intrapersonal task at the Analysis Subscale, to r=.319, for CELT and the
Integrative Intrapersonal task at the Comprehension Subscale. As presented
in Table 4.14, all r-measures were statistically significant at a two-tailed
p&lt;.01 or p&lt;.05.

Table 4.14
Pearson Product-moment Correlations for CELT and Integrative Intrapersonal Ratings (2-tailed)

Correlations

CELT IntraK IntraC IntraAp IntraAn IntraS IntraE

CELT Pearson Correlation 1 .314* .319** .237* .208* .214* .228**


Sig. (2-tailed) .005 .000 .003 .002 .004 .000
N 200 200 200 200 200 200 200
IntraK Pearson Correlation .314* 1 .674** .638** .532** .633** .637**
Sig. (2-tailed) .005 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200

IntraC Pearson Correlation .319** .674** 1 .550** .364** .291** .358**


Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
IntraAp Pearson Correlation .237* .380** .550** 1 .517** .322** .373**
Sig. (2-tailed) .003 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
IntraAn Pearson Correlation .208* .317** .364** .517** 1 .419** .394**
Sig. (2-tailed) .002 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
IntraS Pearson Correlation .214* .334** .291** .322** .419** 1 .480**
Sig. (2-tailed) .004 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
IntraE Pearson Correlation .228** .369** .358** .373** .394** .480** 1
Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
*. Correlation is significant at the 0.05 level (2-tailed).
**. Correlation is significant at the 0.01 level (2-tailed).

Ranging from 50.00 to 95.00, as Figure 4.6 displays, examinees' CELT scores
and their Integrative Intrapersonal ratings demonstrated linear patterns of
correlation, in cases such as CELT and the Intrapersonal task at the
Comprehension Subscale (r=.319), and circular patterns, in cases such as
CELT and the Intrapersonal task at the Evaluation Subscale (r=.228).
Moreover, narrow linear patterns, in cases such as the Intrapersonal tasks
at the Application and Analysis Subscales (r=.517), and roughly circular
patterns, in cases such as the Intrapersonal tasks at the Comprehension and
Synthesis Subscales (r=.291), were compatible with the statistical data in
the R-matrix, supporting the correlation findings.
[Scatterplot matrix of CELT against the Integrative Intrapersonal ratings (IntraK, IntraC, IntraAp, IntraAn, IntraS, IntraE); score axes range from 50.00 to 95.00.]

Figure 4.6. Scatter Plot for CELT and Integrative Intrapersonal Ratings

4.7.7 CELT and Integrative Interpersonal Tasks


Despite the low range of Pearson product-moment correlation coefficients,
examinees' CELT scores and their Integrative Interpersonal task ratings
were, as Table 4.15 indicates, correlated from r=.014, for CELT and the
Integrative Interpersonal task at the Analysis Subscale, to r=.208, for
CELT and the Integrative Interpersonal task at the Evaluation Subscale. All
measures of correlation proved statistically significant at a two-tailed
p&lt;.01 or p&lt;.05.

Table 4.15
Pearson Product-moment Correlations for CELT and Integrative Interpersonal Ratings (2-tailed)

Correlations

CELT InterK InterC InterAp InterAn InterS InterE
CELT Pearson Correlation 1 .052* .201* .200* .014* .146* .208*
Sig. (2-tailed) .005 .003 .003 .003 .004 .002
N 200 200 200 200 200 200 200
InterK Pearson Correlation .052* 1 .759** .609** .580** .614** .682**
Sig. (2-tailed) .005 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
InterC Pearson Correlation .201* .759** 1 .762** .614** .574** .661**
Sig. (2-tailed) .003 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
InterAp Pearson Correlation .200* .609** .762** 1 .735** .595** .622**
Sig. (2-tailed) .003 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
InterAn Pearson Correlation .014* .580** .614** .735** 1 .669** .656**
Sig. (2-tailed) .003 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
InterS Pearson Correlation .146* .614** .574** .595** .669** 1 .721**
Sig. (2-tailed) .004 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
InterE Pearson Correlation .208* .682** .661** .622** .656** .721** 1
Sig. (2-tailed) .002 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
*. Correlation is significant at the 0.05 level (2-tailed).
**. Correlation is significant at the 0.01 level (2-tailed).

Ranging from 50.00 to 95.00, as Figure 4.7 displays, examinees' CELT scores
and their Integrative Interpersonal ratings demonstrated roughly linear
patterns, in cases such as CELT and the Interpersonal task at the
Application Subscale (r=.200), and circular patterns, in cases such as CELT
and the Interpersonal task at the Synthesis Subscale (r=.146). Moreover,
the linear patterns of correlation, in cases such as the Interpersonal
tasks at the Application and Analysis Subscales (r=.735), and the slightly
circular patterns, in cases such as the Interpersonal tasks at the
Comprehension and Evaluation Subscales (r=.661), are graphically displayed
in Figure 4.7. As in the previous figures, the compatibility of the
correlation statistics with their graphic representations could be taken as
a sign of the accuracy of the correlation data in the R-matrix.
[Scatterplot matrix of CELT against the Integrative Interpersonal ratings (InterK, InterC, InterAp, InterAn, InterS, InterE); score axes range from 50.00 to 95.00.]

Figure 4.7. Scatter Plot for CELT and Integrative Interpersonal Ratings

4.7.8 CELT and Integrative Naturalist Tasks


As Table 4.16 indicates, examinees' CELT scores and their Integrative
Naturalist ratings had low Pearson product-moment correlation measures,
ranging from r=.036, for CELT and the Integrative Naturalist task at the
Comprehension Subscale, to r=.203, for CELT and the Integrative Naturalist
task at the Application Subscale. As presented in Table 4.16, all r-measures
were statistically significant at a two-tailed p&lt;.01 or p&lt;.05.

Table 4.16
Pearson Product-moment Correlations for CELT and Integrative Naturalist Ratings (2-tailed)

Correlations
CELT NatK NatC NatAp NatAn NatS NatE

CELT Pearson Correlation 1 .037* .036* .203* .055* .124* .144*


Sig. (2-tailed) .005 .005 .005 .004 .003 .004
N 200 200 200 200 200 200 200
NatK Pearson Correlation 0.037* 1 .682** .564** .530** .314** .387**
Sig. (2-tailed) .005 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
NatC Pearson Correlation .036* .682** 1 .583** .608** .327** .486**
Sig. (2-tailed) .005 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
NatAp Pearson Correlation .203* .564** .583** 1 .653** .354** .333**
Sig. (2-tailed) .005 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
NatAn Pearson Correlation .055 .530** .608** .653** 1 .516** .537**
Sig. (2-tailed) .004 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
NatS Pearson Correlation .124 .314** .327** .354** .516** 1 .450**
Sig. (2-tailed) .003 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
NatE Pearson Correlation .144* .387** .486** .333** .537** .450** 1
Sig. (2-tailed) .004 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
*. Correlation is significant at the 0.05 level (2-tailed).
**. Correlation is significant at the 0.01 level (2-tailed).

Ranging from 50.00 to 90.00, as Figure 4.8 displays, examinees' CELT scores and
their Integrative Naturalist task ratings demonstrated linear patterns of correlation, in
cases such as CELT and Naturalist task at Evaluation Subscale (r=.144), and circular
patterns, in cases such as CELT and Naturalist task at Analysis Subscale (r=.055).
Meanwhile, the narrow linear patterns of correlation, in cases such as the
Naturalist tasks at the Application and Analysis Subscales (r=.653), and
the circular patterns, in cases such as the Naturalist tasks at the
Comprehension and Synthesis Subscales (r=.327), were compatible with the
statistics in the R-matrix, confirming the accuracy of the correlation data.

[Scatterplot matrix of CELT against the Integrative Naturalist ratings (NatK, NatC, NatAp, NatAn, NatS, NatE); score axes range from 50.00 to 90.00.]

Figure 4.8. Scatter Plot for CELT and Integrative Naturalist Ratings

In a nutshell, as measures of criterion-related validity, the CELT and
Integrative task R-matrixes and their scatterplots were displayed in detail
in order to analyze the measures of Pearson product-moment correlation for
(a) CELT and Integrative Task ratings, and (b) individual Integrative Task
ratings designed in the MI categories at different subscales. The CELT and
Integrative task correlation data indicated a wide low-to-moderate range of
correlation coefficients, from the lowest coefficient of r=.014, for CELT
and the Integrative Interpersonal task at the Analysis Subscale, to the
highest coefficient of r=.647, for CELT and the Integrative Linguistic task
at the Comprehension Subscale. Except for one insignificant correlation
coefficient (r=.111), for CELT and the Integrative Musical task at the
Comprehension Subscale, all measures of Pearson product-moment correlation
proved to be positive and statistically significant at a two-tailed p&lt;.01
or p&lt;.05. Similarly, the Integrative tasks at different subscales in the
eight MI categories showed moderate-to-high measures of Pearson
product-moment correlation, while their patterns of fluctuation were quite
different from those observed for CELT and the Integrative tasks.
In conclusion, based on the CELT-Integrative Task ratings, a number of
distinctive findings and their interpretations are outlined:
1. Data analysis for the CELT-Integrative task R-matrixes revealed that the
majority of the Pearson product-moment correlation measures turned out to
be low, except for CELT and the Integrative Linguistic tasks. Such low
correlation coefficients might be interpreted to reflect the fact that, as
a standard test of general language proficiency, CELT is specifically
intended to assess examinees' linguistic competence; hence the high
correlation coefficients for CELT and the Integrative Linguistic tasks at
different subscales, and the low-to-moderate correlation measures for CELT
and the other Integrative tasks, were not far from the researcher's
expectation. However, as far as the trustworthiness of the Integrative
tasks was concerned, the compatibility of the correlation data with their
graphic representations in the scatterplots, as well as the statistically
significant measures of correlation, can be taken as respectable evidence
in favor of the CELT criterion-related validity of the IT Model.
2. Contrary to the CELT-Integrative task correlation statistics, individual
Integrative task ratings at different subscales demonstrated
moderate-to-high Pearson product-moment correlation coefficients, which
proved compatible with their graphic representations in the scatterplots.
The strength of the correlations observed among the Integrative tasks at
different subscales might be interpreted in favor of the high
trustworthiness of the Integrative task ratings as operational definitions
of their underlying two-factor constructs, that is, (a) the categories of
Multiple Intelligences at (b) different levels of the Cognitive Domain.
This critical set of findings will further be discussed as evidence of
construct validity for the IT Model.
3. Despite the moderate-to-high correlation measures of the Integrative
task ratings at different subscales, the patterns of fluctuation did not
seem quite predictable as far as specific subscales in the IT Model were
concerned. As an example, while the Intrapersonal tasks at the Knowledge
and Synthesis Subscales showed a high correlation measure (r=.633), the
Spatial tasks at the same two subscales demonstrated a rather low
correlation measure (r=.412). Such unpredictability might provide some
support for the absence of significant interaction among the Integrative
tasks; in other words, the independence of the Integrative tasks at the
different subscales of the IT Model.
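The significance flags reported in the R-matrixes above can be spot-checked from r and n alone. The following sketch uses the normal approximation to the t distribution, which is adequate at n = 200; the helper names are illustrative, not taken from the study:

```python
import math

def two_tailed_p(r, n):
    """Approximate two-tailed p-value for a Pearson r computed on n pairs.

    Uses t = r * sqrt((n - 2) / (1 - r**2)); since df = n - 2 is large
    here, the t distribution is approximated by the standard normal.
    """
    t = r * math.sqrt((n - 2) / (1 - r ** 2))
    # Two-tailed tail area under the standard normal curve
    return 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))

def star(r, n):
    """SPSS-style significance flag: '**' (p < .01), '*' (p < .05), or ''."""
    p = two_tailed_p(r, n)
    return "**" if p < 0.01 else "*" if p < 0.05 else ""

# Two cells from the CELT-Spatial R-matrix (n = 200): r = .328 is flagged
# with **, while r = .097 (SpatK with SpatAp, Sig. = .170) is nonsignificant.
flag_high = star(0.328, 200)  # "**"
flag_ns = star(0.097, 200)    # ""
```

Applied across a whole R-matrix, the same helper reproduces the one-star and two-star annotations that SPSS attaches to each coefficient.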

MIDAS
Next to the CELT criterion-related validity estimates, the statistical
findings for the Multiple Intelligences Developmental Assessment Scales
(MIDAS), as a standard test of Multiple Intelligences and the other
criterion test in this study, will be discussed. After administering MIDAS
to the main sample of participants, measures of Pearson product-moment
correlation were computed for the MIDAS-Integrative Task ratings.
Initially, therefore, the statistics obtained for the MIDAS ratings are
summarized in Table 4.17. The descriptive data for the MIDAS ratings
include the mean (M), maximum score (Max), minimum score (Min), standard
deviation (SD), and the number of examinees (N).

Table 4.17
Descriptive Statistics for MIDAS Ratings
[Table reporting M, Max, Min, SD, and N (Valid N listwise) for the MIDAS ratings; the numeric values are not legible in the source.]

Similar to the CELT statistics, the mean, minimum, and maximum scores
represented a wide range of scores, and the standard deviation indicated
that the learners appeared to produce the full range of possible values on
each section of the MIDAS. In order to investigate the associations between
examinees' performances on the Integrative tasks and their MIDAS ratings,
the researcher computed the measures of Pearson product-moment correlation
for the Integrative tasks designed in the eight MI categories and the MIDAS
ratings. Moreover, the accuracy of the correlation data represented in the
R-matrixes will be examined graphically with the following scatterplots.
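The descriptive measures listed above (M, Max, Min, SD, N) can be reproduced with Python's standard statistics module; the ratings below are hypothetical stand-ins for the MIDAS data:

```python
import statistics

# Hypothetical MIDAS ratings for ten examinees (not the study's data)
midas = [50.0, 58.5, 63.0, 66.0, 70.5, 74.0, 78.5, 82.0, 86.5, 90.0]

descriptives = {
    "M": statistics.mean(midas),
    "Max": max(midas),
    "Min": min(midas),
    "SD": statistics.stdev(midas),  # sample SD (n - 1 denominator), as SPSS reports it
    "N": len(midas),
}
```

SPSS's Descriptives output reports the same five quantities, with the standard deviation computed using the n - 1 denominator as here.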

4.7.9 MIDAS and Integrative Linguistic Tasks


As Table 4.18 indicates, examinees' MIDAS and Integrative Linguistic ratings
showed moderate-to-high correlation coefficients, ranging from r=.470, for
MIDAS and Integrative Linguistic task at Application Subscale, to r=.793, for
MIDAS and Integrative Linguistic task at Evaluation Subscale. All measures of
Pearson product-moment correlation proved to be significant at a two-tailed
p<.01.

Table 4.18
Pearson Product-moment Correlations for MIDAS and Integrative Linguistic Ratings (2-tailed)
Correlations

MIDAS LingK LingC LingAp LingAn LingS LingE

MIDAS Pearson Correlation 1 .495** .628** .470** .521** .748** .793**


Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
LingK Pearson Correlation .495** 1 .823** .503** .435** .435** .514**
Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
LingC Pearson Correlation .628** .823** 1 .720** .552** .546** .683**
Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
LingAp Pearson Correlation .470** .503** .720** 1 .654** .534** .596**
Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
LingAn Pearson Correlation .521** .435** .552** .654** 1 .820** .641**
Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
LingS Pearson Correlation .748** .435** .546** .534** .820** 1 .789**
Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
LingE Pearson Correlation .793** .514** .683** .596** .641** .789** 1
Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
**. Correlation is significant at the 0.01 level (2-tailed).

Ranging from 50.00 to 90.00, as Figure 4.9 displays, examinees' MIDAS
scores and their Linguistic ratings demonstrated linear patterns, in cases
such as MIDAS and the Linguistic task at the Comprehension Subscale
(r=.628), and circular patterns, in cases such as MIDAS and the Linguistic
task at the Knowledge Subscale (r=.495). As demonstrated in Figure 4.9, the
compatibility of the correlation measures with their graphic
representations in the scatterplot supports the accuracy of the numerical
data in Table 4.18.

140
LingK
LingC
LingAp
LingAn
LingS
LingE

MIDAS
50.00

60.0066.00
70.00
75.00

80.0085.00
90.00

LingK LingC LingAp LingAn LingS LingE

)LJXUH6FDWWHU3ORWIRU0,'$6DQG,QWHJUDWLYH/LQJXLVWLF
5DWLQJV

4.7.10 MIDAS and Integrative Mathematical Tasks

Table 4.19 indicates moderate Pearson product-moment correlation measures
for MIDAS and Integrative Mathematical ratings, ranging from r=.207, for
MIDAS and the Integrative Mathematical task at the Comprehension Subscale,
to r=.436, for MIDAS and the Integrative Mathematical task at the Synthesis
Subscale. All measures of Pearson product-moment correlation were
statistically significant at a two-tailed p&lt;.01.

Table 4.19
Pearson Product-moment Correlations for MIDAS and Integrative Mathematical Ratings (2-tailed)
Correlations

MIDAS MathK MathC MathAp MathAn MathS MathE

MIDAS Pearson Correlation 1 .225** .207** .380** .415** .436** .289**


Sig. (2-tailed) .001 .003 .000 .000 .000 .000
N 200 200 200 200 200 200 200
MathK Pearson Correlation .225** 1 .667** .549** .758** .736** .583**
Sig. (2-tailed) .001 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
MathC Pearson Correlation .207** .667** 1 .654** .627** .565** .532**
Sig. (2-tailed) .003 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
MathAp Pearson Correlation .380** .549** .654** 1 .738** .581** .620**
Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
MathAn Pearson Correlation .415** .758** .627** .738** 1 .800** .688**
Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
MathS Pearson Correlation .436** .736** .565** .581** .800** 1 .698**
Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
MathE Pearson Correlation .289** .583** .532** .620** .688** .698** 1
Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
**. Correlation is significant at the 0.01 level (2-tailed).

Ranging from 50.00 to 95.00, as Figure 4.10 displays, examinees' MIDAS
scores and their Integrative Mathematical ratings demonstrated linear
patterns of correlation, in cases such as MIDAS and the Mathematical task
at the Analysis Subscale (r=.415), and circular patterns, in cases such as
MIDAS and the Mathematical task at the Evaluation Subscale (r=.289).
Similarly, the accuracy of the correlation data in the R-matrix was
confirmed by the compatibility between the correlation measures and their
graphic representations in the following scatterplot.

[Scatterplot matrix of MIDAS against the Integrative Mathematical ratings (MathK, MathC, MathAp, MathAn, MathS, MathE); score axes range from 50.00 to 95.00.]

Figure 4.10. Scatter Plot for MIDAS and Integrative Mathematical Ratings

4.7.11 MIDAS and Integrative Musical Tasks

As Table 4.20 indicates, MIDAS and Integrative Musical ratings showed low-to-
high correlation coefficients ranging from r=.298, for MIDAS and Integrative
Musical task at Application Subscale, to r=.602, for MIDAS and Integrative Musical
task at Synthesis Subscale. Similar to the previous R-matrixes, all measures of
Pearson product-moment correlation were statistically significant at two-tailed p<.01
or p<.05.

Table 4.20
Pearson Product-moment Correlations for MIDAS and Integrative Musical Ratings (2-tailed)
Correlations

MIDAS MusK MusC MusAp MusAn MusS MusE

MIDAS Pearson Correlation 1 .385** .346* .298** .376** .602** .484**
Sig. (2-tailed) .000 .004 .000 .000 .000 .000
N 200 200 200 200 200 200 200
MusK Pearson Correlation .385** 1 .821** .763** .733** .777** .677**
Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
MusC Pearson Correlation .346* .821** 1 .790** .670** .632** .515**
Sig. (2-tailed) .004 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
MusAp Pearson Correlation .298** .763** .790** 1 .757** .705** .510**
Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
MusAn Pearson Correlation .376** .733** .670** .757** 1 .796** .611**
Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
MusS Pearson Correlation .602** .777** .632** .705** .796** 1 .760**
Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
MusE Pearson Correlation .484** .677** .515** .510** .611** .760** 1
Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).

Figure 4.11 presents the correlation patterns between the MIDAS ratings,
ranging from 50.00 to 90.00, and the Integrative Musical ratings. As
demonstrated, roughly linear patterns of correlation can be seen, in cases
such as MIDAS and the Musical task at the Synthesis Subscale (r=.602), and
circular patterns, in cases such as MIDAS and the Integrative Musical task
at the Application Subscale (r=.298). As expected, the dependability of the
correlation data in the R-matrix was supported by the correlation patterns
in the scatterplot.

[Scatterplot matrix of MIDAS against the Integrative Musical ratings (MusK, MusC, MusAp, MusAn, MusS, MusE); score axes range from 50.00 to 90.00.]

Figure 4.11. Scatter Plot for MIDAS and Integrative Musical Ratings

4.7.12 MIDAS and Integrative Kinesthetic Tasks

Table 4.21 shows moderate Pearson product-moment correlation measures for
MIDAS and Integrative Kinesthetic ratings, ranging from r=.215, for MIDAS
and the Integrative Kinesthetic task at the Application Subscale, to
r=.317, for MIDAS and the Integrative Kinesthetic task at the Evaluation
Subscale. All measures of Pearson product-moment correlation were
statistically significant at either a two-tailed p&lt;.01 or p&lt;.05.

Table 4.21
Pearson Product-moment Correlations for MIDAS and Integrative Kinesthetic Ratings (2-tailed)

Correlations

MIDAS KinK KinC KinAp KinAn KinS KinE

MIDAS Pearson Correlation 1 .300** .314* .215* .303* .270** .317**


Sig. (2-tailed) .000 .004 .003 .004 .000 .000
N 200 200 200 200 200 200 200
KinK Pearson Correlation .300** 1 .694** .612** .412** .530** .525**
Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
KinC Pearson Correlation .314** .694** 1 .651** .584** .614** .684**
Sig. (2-tailed) .004 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
KinAp Pearson Correlation .215** .612** .651** 1 .770** .693** .528**
Sig. (2-tailed) .003 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
KinAn Pearson Correlation .303* .412** .584** .470** 1 .749** .514**
Sig. (2-tailed) .004 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
KinS Pearson Correlation .270** .530** .614** .693** .749** 1 .704**
Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
KinE Pearson Correlation .317** .525** .684** .528** .514** .704** 1
Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).
Ranging from 50.00 to 90.00, as Figure 4.12 shows, the MIDAS ratings and
the Integrative Kinesthetic ratings demonstrated linear patterns of
correlation, in cases such as MIDAS and the Kinesthetic task at the
Knowledge Subscale (r=.300), and circular patterns, in cases such as MIDAS
and the Kinesthetic task at the Application Subscale (r=.215). As in the
previous figures, the graphic representations of the correlation data
effectively support the accuracy of the numerical data in the R-matrix.

[Scatterplot matrix of MIDAS against the Integrative Kinesthetic ratings (KinK, KinC, KinAp, KinAn, KinS, KinE); score axes range from 50.00 to 90.00.]

Figure 4.12. Scatter Plot for MIDAS and Integrative Kinesthetic Ratings

4.7.13 MIDAS and Integrative Spatial Tasks


As Table 4.22 shows, MIDAS and Integrative Spatial ratings were moderately
correlated, ranging from r=.289, for MIDAS and the Integrative Spatial task
at the Analysis Subscale, to r=.439, for MIDAS and the Integrative Spatial
task at the Synthesis Subscale. Except for the nonsignificant correlation
measure (r=.310) for MIDAS and the Spatial task at the Knowledge Subscale,
the other Pearson product-moment correlation coefficients were
statistically significant at a two-tailed p&lt;.01 or p&lt;.05.

Table 4.22
Pearson Product-moment Correlations for MIDAS and Integrative Spatial Ratings (2-tailed)
Correlations

MIDAS SpatK SpatC SpatAp SpatAn SpatS SpatE

MIDAS Pearson Correlation 1 .310 .326** .532** .289** .439** .360**


Sig. (2-tailed) .004 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
SpatK Pearson Correlation .310 1 .319** .310 .315* .412 .387
Sig. (2-tailed) .002 .001 .002 .003 .001 .002
N 200 200 200 200 200 200 200
SpatC Pearson Correlation .326** .319** 1 .677** .685** .512** .708**
Sig. (2-tailed) .000 .004 .000 .000 .000 .000
N 200 200 200 200 200 200 200
SpatAp Pearson Correlation .532** .510 .677** 1 .709** .523** .551**
Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
SpatAn Pearson Correlation .289** .615* .685** .709** 1 .618** .633**
Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
SpatS Pearson Correlation .439** .512 .512** .523** .618** 1 .638**
Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
SpatE Pearson Correlation .360** .509 .708** .551** .633** .638** 1
Sig. (2-tailed) .000 .002 .000 .000 .000 .000
N 200 200 200 200 200 200 200
**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).

Figure 4.13 shows the patterns of correlation for examinees' MIDAS ratings,
with a range of 50.00 to 90.00, and their Integrative Spatial ratings.
Compatible with the correlation data in the R-matrix, MIDAS and the
Integrative Spatial task ratings demonstrated flat or even patterns of
correlation, in the case of MIDAS and the Integrative Spatial task at the
Knowledge Subscale, linear patterns, in cases such as MIDAS and the
Integrative Spatial task at the Application Subscale (r=.532), and circular
patterns, in cases such as MIDAS and the Integrative Spatial task at the
Analysis Subscale (r=.289).

[Scatterplot matrix of MIDAS against the Integrative Spatial ratings (SpatK, SpatC, SpatAp, SpatAn, SpatS, SpatE); score axes range from 50.00 to 90.00.]

Figure 4.13. Scatter Plot for MIDAS and Integrative Spatial Ratings

4.7.14 MIDAS and Integrative Intrapersonal Tasks

As Table 4.23 indicates, the MIDAS ratings had moderate measures of Pearson
product-moment correlation with the Integrative Intrapersonal ratings,
ranging from r=.208, for MIDAS and the Integrative Intrapersonal task at
the Analysis Subscale, to r=.319, for MIDAS and the Integrative
Intrapersonal task at the Comprehension Subscale. All measures of
correlation proved statistically significant at a two-tailed p&lt;.01 or
p&lt;.05.

Table 4.23
Pearson Product-moment Correlations for MIDAS and Integrative Intrapersonal Ratings (2-tailed)

Correlations

MIDAS IntraK IntraC IntraAp IntraAn IntraS IntraE

MIDAS Pearson Correlation 1 .314* .319** .237* .208* .214* .228**


Sig. (2-tailed) .005 .000 .003 .002 .004 .000
N 200 200 200 200 200 200 200
IntraK Pearson Correlation .314* 1 .674** .638** .532** .633** .637**
Sig. (2-tailed) .005 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
IntraC Pearson Correlation .319** .674** 1 .550** .364** .291** .358**
Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
IntraAp Pearson Correlation .237* .380** .550** 1 .517** .322** .373**
Sig. (2-tailed) .003 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
IntraAn Pearson Correlation .208* .317** .364** .517** 1 .419** .394**
Sig. (2-tailed) .002 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
IntraS Pearson Correlation .214* .334** .291** .322** .419** 1 .480**
Sig. (2-tailed) .004 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
IntraE Pearson Correlation .228** .369** .358** .373** .394** .480** 1
Sig. (2-tailed) .000 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
*. Correlation is significant at the 0.05 level (2-tailed).
**. Correlation is significant at the 0.01 level (2-tailed).

Figure 4.14 displays the patterns of correlation for MIDAS, ranging from
50.00 to 90.00, and the Integrative Intrapersonal ratings. As expected,
some of these patterns turned out linear, in cases such as MIDAS and the
Intrapersonal task at the Knowledge Subscale (r=.314), while some were
circular, in cases such as MIDAS and the Intrapersonal task at the
Synthesis Subscale (r=.214). The compatibility of the data in the R-matrix
with their graphic representations in the scatterplot confirmed the
accuracy of the correlation findings.
[Scatterplot matrix of MIDAS against the Integrative Intrapersonal ratings (IntraK, IntraC, IntraAp, IntraAn, IntraS, IntraE); score axes range from 50.00 to 90.00.]

Figure 4.14. Scatter Plot for MIDAS and Integrative Intrapersonal Ratings

4.7.15 MIDAS and Integrative Interpersonal Tasks

As Table 4.24 displays, MIDAS and Integrative Interpersonal ratings showed
low-to-moderate correlation coefficients, ranging from r=.014, for MIDAS
and the Integrative Interpersonal task at the Analysis Subscale, to r=.208,
for MIDAS and the Integrative Interpersonal task at the Evaluation
Subscale. All measures of Pearson product-moment correlation were
significant at p&lt;.01 or p&lt;.05.

Table 4.24
Pearson Product-moment Correlations for MIDAS and Integrative Interpersonal Ratings (2-tailed)
[Table values not legible in the source.]
As Figure 4.15 displays, the MIDAS ratings, with a range of 50.00 to 95.00,
were correlated with the Integrative Interpersonal ratings. Compatible with
the correlation data in the R-matrix, MIDAS and the Integrative
Interpersonal task ratings show linear patterns, in cases such as MIDAS and
the Interpersonal task at the Application Subscale (r=.200), and circular
patterns, in cases such as MIDAS and the Interpersonal task at the
Knowledge Subscale (r=.052).
[Scatterplot matrix of MIDAS against the Integrative Interpersonal ratings (InterK, InterC, InterAp, InterAn, InterS, InterE); score axes range from 50.00 to 95.00.]

Figure 4.15. Scatter Plot for MIDAS and Integrative Interpersonal Ratings

4.7.16 MIDAS and Integrative Naturalist Tasks


Finally, Table 4.25 indicates low-to-moderate measures of Pearson product-
moment correlation for MIDAS and Integrative Naturalist ratings ranging from
r=.036, for MIDAS and Integrative Naturalist task at Comprehension Subscale, to
r=.203, for MIDAS and Integrative Naturalist task at Application Subscale. All
Pearson product-moment correlation coefficients proved statistically
significant at either a two-tailed p<.01 or p<.05.

Table 4.25
Pearson Product-moment Correlations for MIDAS and Integrative Naturalist Ratings (2-tailed)

Correlations

MIDAS NatK NatC NatAp NatAn NatS NatE
MIDAS Pearson Correlation 1 .037* .036* .203* .055* .124* .144*
Sig. (2-tailed) .053 .005 .053 .044 .028 .041
N 200 200 200 200 200 200 200
NatK Pearson Correlation .037* 1 .682** .564** .530** .314** .387**
Sig. (2-tailed) .053 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
NatC Pearson Correlation .036* .682** 1 .583** .608** .327** .486**
Sig. (2-tailed) .005 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
NatAp Pearson Correlation .203* .564** .583** 1 .653** .354** .333**
Sig. (2-tailed) .053 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
NatAn Pearson Correlation .055* .530** .608** .653** 1 .516** .537**
Sig. (2-tailed) .044 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
NatS Pearson Correlation .124* .314** .327** .354** .516** 1 .450**
Sig. (2-tailed) .028 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
NatE Pearson Correlation .144* .387** .486** .333** .537** .450** 1
Sig. (2-tailed) .041 .000 .000 .000 .000 .000
N 200 200 200 200 200 200 200
*. Correlation is significant at the 0.05 level (2-tailed).
**. Correlation is significant at the 0.01 level (2-tailed).

As Figure 4.16 displays, the patterns of correlation for the MIDAS ratings,
with a range of 50.00 to 90.00, and the Integrative Naturalist ratings
showed linear patterns, in cases such as MIDAS and the Naturalist task at
the Application Subscale (r=.203), and circular patterns, in cases such as
MIDAS and the Naturalist task at the Knowledge Subscale (r=.037).
Similarly, the graphic representations of the data in the R-matrix properly
supported the accuracy of the measures of correlation indicated in
Table 4.25.

Figure 4.16 Scatter Plot for MIDAS and Integrative Naturalist Ratings
[scatterplot matrix of MIDAS (50.00-90.00) against the NatK, NatC, NatAp, NatAn, NatS, and NatE ratings]

In short, the Pearson product-moment correlation coefficients for MIDAS and Integrative task ratings resulted in a wide low-to-moderate range of measures, from the lowest, r=.014 for MIDAS and the Interpersonal task at Analysis Subscale, to the highest, r=.793 for MIDAS and the Linguistic task at Evaluation Subscale. Except for MIDAS and the Integrative Spatial task at Knowledge Subscale, all correlation coefficients proved statistically significant. Moreover, the dependability of the correlation data in the MIDAS-Integrative Task ratings was investigated and successfully instantiated through the patterns of correlation in the scatterplots. In conclusion, a number of findings and logical implications can be drawn from the MIDAS-Integrative task correlation data that might be interpreted as evidential support for the MIDAS Criterion-Related Validity of the IT Model:

1. Data analysis for the MIDAS-Integrative task Pearson product-moment correlations revealed low-to-moderate though statistically significant r-measures. Contrary to what was initially expected, the examinees with high MIDAS profiles did not perform similarly on Integrative tasks. Statistically speaking, such a discrepancy requires more research and probably a careful revision of Integrative task contents. However, since the researcher's major concern was to obtain significant correlation measures, the low numerical values of the correlation coefficients were not as harmful as they might seem to the MIDAS Criterion-Related Validity estimation.

2. Despite their low measures, the Pearson product-moment correlations did not show a wide range of variation for MIDAS and Integrative Task ratings at different subscales. This limited range of variance can be interpreted as evidence in favor of the viability of the correlation data (for a detailed discussion, see Hatch & Farhady, 1981).

In conclusion, in posing Research Question IIIb, the researcher's initial assumption was that the examinees with higher ratings on the two standard tests, CELT and MIDAS, would perform better on Integrative tasks. The criterion-related validity of the Integrative tasks was investigated by computing Pearson product-moment correlations for the examinees' ratings on Integrative Tasks, CELT, and MIDAS, respectively. A wide range of statistically significant correlation measures was computed, which was interpreted as supportive evidence in favor of the criterion-related trustworthiness of Integrative Tasks.

4.8 Construct Validity
4.8.1 Factor Analysis
Simply defined, the construct validity of a test is an estimate of the degree to which the items in the test reflect the essential aspects of the theory on which the test is based (called the trait, or construct). As one of the most critical measures of test trustworthiness, construct validity is regarded as paramount for the accountability of a prototype model of assessment. In the current study, therefore, Exploratory Factor Analysis was performed as a test of construct validity to discern and quantify the variables, or factors, supposed to underlie the examinees' performances on the 48 Integrative tasks.

In factor analysis, the major assumption is that the mathematical factors represent latent variables (i.e., psychological dimensions), the nature of which can only be guessed by examining the nature of the tests that have sizeable co-ordinates on any particular axis (i.e., Factor Loadings). It should perhaps be said at the outset that this claim is still controversial; there have been notable psychologists (see Kim & Mueller, 1978a; Winer et al., 1991, for some notable discussions) who hold that the factors found with factor analysis are statistical realities but psychological fictions.
Using the SPSS 15.0 statistical program, a matrix of correlation coefficients was generated for all the variable combinations found in the IT Model, that is, the Integrative tasks designed in eight MI categories at six subscales (n=48). Theoretically, therefore, an individual Integrative task operationalized a two-factor construct: (a) a Multiple Intelligence core operation, as the first factor; and (b) a subscale of Cognitive Domain, as the second factor, both of which were integrated into the two-factor construct underlying that specific Integrative task. As Table 4.26 displays, the 48 Integrative Task ratings listed in the table were analyzed to explore the number of underlying factors. Consequently, after applying Varimax Rotation--a commonly-used rotation method in Factor Analysis--the researcher came up with nine factors which were selectively loaded by the 48 Integrative task ratings.
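The rotation step can be sketched with a standard varimax implementation (Kaiser's algorithm). This is a generic sketch in Python with NumPy, not the SPSS 15.0 routine, and the sample loading matrix is invented for illustration:

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a p x k factor-loading matrix."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)          # cumulative rotation matrix, kept orthogonal
    crit = 0.0
    for _ in range(max_iter):
        rotated = L @ R
        # SVD of the gradient of the varimax simplicity criterion
        u, s, vt = np.linalg.svd(
            L.T @ (rotated ** 3
                   - (gamma / p) * rotated @ np.diag((rotated ** 2).sum(axis=0))))
        R = u @ vt
        if s.sum() - crit < tol:
            break
        crit = s.sum()
    return L @ R

# Two mixed factors over six hypothetical variables:
L = np.array([[.80, .30], [.70, .40], [.75, .35],
              [.20, .90], [.30, .80], [.25, .85]])
rotated = varimax(L)
```

Because the rotation is orthogonal, each variable's communality (its row sum of squared loadings) is unchanged; only the distribution of loading across factors is simplified, which is what makes the "highly loaded" patterns reported below interpretable.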

A thorough analysis of the data presented in Table 4.26 is summarized as follows:

o Factor 1, or Musical Factor, was highly loaded by the Musical Task ratings, with correlation coefficients of .772, .743, .805, .783, .793, and .735, respectively.

o Factor 2, or Mathematical Factor, was highly loaded by the examinees' Mathematical Task ratings, with correlation indexes of .797, .801, .727, .811, .750, and .755, respectively.

o Factor 3, or Interpersonal Factor, was mainly loaded by the Interpersonal Task ratings, with correlation coefficients of .813, .841, .840, .837, .858, and .838, respectively.

o Factor 4, or Kinesthetic Factor, was loaded mainly by the examinees' Kinesthetic Task ratings, with correlation coefficients of .676, .777, .849, .765, and .689, respectively.

o The examinees' Linguistic Task ratings loaded mainly on Factor 5, or Linguistic Factor, with correlation indexes of .722, .611, .700, .845, .855, and .797, respectively.

o Factor 6, or Naturalist Factor, was highly loaded by the Naturalist Task ratings, with correlation coefficients of .784, .827, .791, .851, .601, and .647, respectively.

o Factor 7, or Intrapersonal Factor, was mainly loaded by the Intrapersonal Task ratings, with correlation statistics of .365, .474, .715, .773, .604, and .566, respectively.

o Factor 8, or Spatial Factor, was moderately loaded by the Spatial Task ratings, with correlation coefficients of .600, .509, .510, .511, .402, and .516, respectively.

o Finally, Factor 9, or IT Factor, was the one highly and exclusively loaded by the Integrative Task ratings at Knowledge and Comprehension Subscales (that is, Ling/K and C, Math/K and C, Mus/K and C, Kin/K and C, Spat/K and C, Intra/K and C, Inter/K and C, and Nat/K and C). All Integrative Task ratings at Application and Analysis Subscales (that is, Ling/Ap and An, Math/Ap and An, Mus/Ap and An, Kin/Ap and An, Spat/Ap and An, Intra/Ap and An, Inter/Ap and An, and Nat/Ap and An) loaded moderately on the IT Factor. Finally, the IT Factor was weakly loaded by all Integrative Task ratings at Synthesis and Evaluation Subscales (that is, Ling/Syn and Eva, Math/Syn and Eva, Mus/Syn and Eva, Kin/Syn and Eva, Spat/Syn and Eva, Intra/Syn and Eva, Inter/Syn and Eva, and Nat/Syn and Eva).

The exploratory factor analysis data, therefore, indicated the presence of eight factors which corresponded to the eight MI categories, and the presence of a ninth factor, or IT Factor, which seemed totally critical in this study. The IT Factor was loaded in descending order by the Integrative tasks at subscales of ascending order, so that it seemed to be a factor that was strongly assessed by Integrative tasks at Knowledge and Comprehension Subscales, moderately by Integrative tasks at Application and Analysis Subscales, and weakly by Integrative tasks at Synthesis and Evaluation Subscales.

Apparently, the nature of the IT Factor demands more investigation; however, since this factor predictably discriminated the Integrative tasks at different subscales, its presence supports the logical application of the Taxonomy of Educational Objectives as a yardstick for grading the Integrative tasks in the IT Model. Graphically displayed in a Ribbon-from-Pivot Figure--a graphic recently used to visualize the number and positions of the factors retrieved in Varimax Rotation--the nine factors explored in the IT Model are displayed in the rotated component matrix, after 17 iterations.

Rotated Component Matrix
[the 48 variables, LingK through NatE, plotted against the nine explored factors]

Figure 4.17 Ribbon from Pivot Figure for the Nine Factors Explored in the IT Model

In conclusion, the exploratory factor analysis statistics displayed the presence of eight factors selectively loaded by the eight MI categories of Integrative Task ratings, and an additional ninth factor, or IT Factor, with distinctive characteristics. In other words, the Factor Loading values on eight of the nine factors were determined by the MI categories of the Integrative tasks; for example, Factor 1, or Musical Factor, was highly loaded by Integrative Musical tasks and weakly by Integrative tasks in other MI categories. The Factor Loading values on the IT Factor, however, were determined by the subscale at which an Integrative task was located. As an example, one could have accurately predicted a high loading value on the IT Factor if an Integrative task were at Knowledge Subscale, and a low loading value if the task were at Evaluation Subscale. This critical characteristic might be interpreted as statistical support for the viability of the levels of Cognitive Domain in grading Integrative tasks into six distinctive subscales. The outcomes of exploratory factor analysis as a test of construct validity, therefore, supported the viability of the two-factor constructs underlying Integrative tasks.

The trustworthiness of the interpretations made of the exploratory factor analysis data was further garnered by conducting a Two-Factor Within-Subject Analysis of Variance (ANOVA). In other words, if the researcher could prove that the levels of the two factors--the eight MI categories and the six subscales of Cognitive Domain--which were integrated as the underlying constructs of the Integrative tasks were independent from one another, the interpretations made of the performance abilities assessed by particular Integrative tasks would be successfully supported. Statistically, the independence of the levels of the two factors, and the absence of possible interactions among them, are investigated through a Two-Factor Analysis of Variance (ANOVA).

4.8.2 Two-Way Within-Subject ANOVA


As mentioned, in this study the question of possible interaction among the levels of the two-factor constructs underlying the Integrative tasks was the main reason for computing a Two-Factor Within-Subject Analysis of Variance (ANOVA). One of the most important assumptions made in Two-Factor ANOVA is that the correlations among scores at the various levels of the two factors--here the eight MI categories at six subscales of Cognitive Domain--are homogeneous. This requirement is known as the assumption of Homogeneity of Covariances, or Sphericity. If this assumption is violated, a Type I error (i.e., rejecting the Null Hypothesis when it is true) becomes more likely. To explore the Homogeneity of Covariances, the SPSS 15.0 program utilizes Mauchly's Test of Sphericity. If the data fail the Sphericity Test (i.e., at a two-tailed p<.01 or at p<.05), the ANOVA F-test can be modified to make it more conservative (less likely to reject the Null Hypothesis). SPSS offers three corrections varying in degree of conservativeness: the Greenhouse-Geisser, the Huynh-Feldt, and the Lower-bound.
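Under the hood, the Greenhouse-Geisser correction rescales the F-test's degrees of freedom by an epsilon computed from the sample covariance matrix of the repeated measures (Box's formula). The sketch below, in Python with NumPy, uses invented covariance matrices rather than the study's data:

```python
import numpy as np

def gg_epsilon(S):
    """Greenhouse-Geisser epsilon (Box's formula) for a k x k covariance
    matrix S of k repeated measures. It lies in [1/(k-1), 1], and equals
    1 exactly when sphericity holds."""
    S = np.asarray(S, dtype=float)
    k = S.shape[0]
    grand = S.mean()             # grand mean of all covariance entries
    diag_mean = np.trace(S) / k  # mean of the variances
    row_means = S.mean(axis=1)
    num = (k * (diag_mean - grand)) ** 2
    den = (k - 1) * ((S ** 2).sum()
                     - 2 * k * (row_means ** 2).sum()
                     + k ** 2 * grand ** 2)
    return num / den

# A compound-symmetric matrix satisfies sphericity, so epsilon is 1:
cs = 0.5 * np.eye(6) + 0.5 * np.ones((6, 6))
# Markedly unequal variances violate sphericity, pushing epsilon below 1:
het = np.diag([1.0, 2.0, 3.0, 4.0])
```

The corrected F-test then uses epsilon-scaled numerator and denominator degrees of freedom, which is what makes it "more conservative" as described above.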
Ideally, in this study it was expected that no significant interaction would be found among the levels of the two factors, that is, (a) MI categories and (b) subscales of Cognitive Domain, as a sign of the total independence of these two factors. In the current study, therefore, Two-Factor ANOVA was conducted to examine (a) the Homogeneity of the Covariances between the eight MI categories (referred to as the INTELLIGENCE Factor in Table 4.27) and the six subscales of Cognitive Domain (referred to as the COGNITIVE LEVEL Factor), and (b) the statistical significance of the interaction between the INTELLIGENCE Factor and the COGNITIVE LEVEL Factor (referred to as the INT*COG Factor).

As Table 4.27 displays, the SPSS 15.0 output for Mauchly's Test of Sphericity and the three conservative tests indicated Mauchly's measures of F¹=.121* for the INTELLIGENCE Factor and F²=2.015* for the COGNITIVE LEVEL Factor, both statistically significant at a two-tailed p<.01. The F³=1.036 was not, however, statistically significant at p<.01 or at p<.05, indicating no significant interaction for the INT*COG Factor.

Table 4.27
F-ratios for Mauchly's Test of Sphericity (ANOVA)

Mauchly's Test of Sphericity(b)
Measure: MEASURE_1
                                                                    Epsilon(a)
Within Subjects   Mauchly's   Approx.                 Greenhouse-   Huynh-   Lower-
Effect            W           Chi-Square   df   Sig.  Geisser       Feldt    bound
INTELLIGENCE      .121*       414.279      27   .000  .572          .585     .143
COGNITIVE LEVEL   2.015*      824.469      14   .000  .316          .318     .200
INT*COG           1.036       10719.093    629  .063  .048          .048     .029

Tests the null hypothesis that the error covariance matrix of the orthonormalized transformed dependent variables is proportional to an identity matrix.
a. May be used to adjust the degrees of freedom for the averaged tests of significance. Corrected tests are displayed.
b. Design: Intercept. Within Subjects Design: int+cog+int*cog

The results of the ANOVA test are graphically displayed in a Clustered Boxplot (Figure 4.18)--a graphic appropriate for demonstrating different levels of two or more variables in factorial designs. Simply described, the box of a Boxplot represents the portion of the distribution of examinees' scores falling between the 25th and 75th percentiles, known as the Boxplot Hinges. The thick horizontal line across the interior of the box represents the median score, and the vertical lines outside the box, known as Whiskers, connect the largest and smallest values that are not categorized as outliers or extreme scores in the task-independent rating distribution. An Outlier (O) is defined as a score or value more than 1.5 box-lengths away from the box, and an Extreme Value (*) as one more than 3 box-lengths away from the box. An outlier case in this study was the Mus/Ap score of 91, and a case of extreme value was the Spat/K score of 28.

Skewness is also indicated by an eccentric location of the median within the box; the higher the horizontal line across the box, the higher the median score in each Integrative task. In Figure 4.18, therefore, the Boxplots queued within the clusters displayed the six levels of Cognitive Domain which were used to scale the eight MI categories of Integrative tasks, represented by the number of clusters.
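The outlier and extreme-value rules described above are Tukey's fences: 1.5 and 3 interquartile ranges beyond the hinges. A small sketch (the score list is invented for illustration, not the study's rating distribution):

```python
def tukey_fences(scores):
    """Flag boxplot outliers (>1.5 IQR beyond a hinge) and extremes (>3 IQR)."""
    s = sorted(scores)
    n = len(s)

    def quartile(q):
        # linear interpolation between order statistics
        pos = q * (n - 1)
        lo = int(pos)
        frac = pos - lo
        hi = min(lo + 1, n - 1)
        return s[lo] + frac * (s[hi] - s[lo])

    q1, q3 = quartile(0.25), quartile(0.75)
    iqr = q3 - q1
    outliers = [x for x in s
                if q1 - 3 * iqr <= x < q1 - 1.5 * iqr
                or q3 + 1.5 * iqr < x <= q3 + 3 * iqr]
    extremes = [x for x in s if x < q1 - 3 * iqr or x > q3 + 3 * iqr]
    return q1, q3, outliers, extremes

# Hypothetical rating list with one mild and one extreme high score:
hinges_and_flags = tukey_fences([50, 52, 53, 54, 55, 56, 57, 58, 68, 91])
```

Note that different packages interpolate quartiles slightly differently, so hinge values (and borderline flags) can vary a little between SPSS and other software.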

Figure 4.18 Clustered Boxplots in Two-way Within-Subject ANOVA for the IT Model

In a nutshell, in terms of the construct validity of the IT Model, and in order to examine the trustworthiness of the interpretations made of the Integrative task ratings, the researcher conducted two tests: exploratory factor analysis and Two-Way Within-Subject ANOVA. The exploratory factor analysis data indicated the presence of nine factors, eight of which were loaded selectively by the Integrative tasks designed in the eight MI categories. A ninth factor, or IT Factor, with a critical predictive adequacy, was explored which supported the viability of the subscales of Cognitive Domain, as the IT Factor was loaded in a descending order by the Integrative task ratings at the six ascending subscales of Cognitive Domain.

Moreover, the three F-ratios computed in the Two-Way Within-Subject ANOVA proved the significance of the INTELLIGENCE Factor and the COGNITIVE LEVEL Factor, as well as the independence of the levels of these two factors, in terms of the statistically insignificant value for the INT*COG Factor. Based on the statistical data obtained on the construct validity of the IT Model, therefore, Null Hypothesis III is rejected: The Integrative task ratings are significantly valid in terms of (IIIa) Content, (IIIb) Criterion-Related, and (IIIc) Construct perspectives.

Research Question IV

o Is there any significant relationship between the real difficulty levels of Integrative tasks and the difficulty levels predicted by the Integrative Task Scales?

4.9 Actual and Predicted Integrative Task Difficulty

As discussed earlier, one of the researcher's main concerns in this study was to investigate whether using the Taxonomy of Educational Objectives as the quality criterion for estimating the difficulty levels of the Integrative tasks was a logical assumption. Addressing this objective, the researcher mainly attempted to explore the possible relationship between the difficulty levels of the Integrative tasks predicted through the Integrative Task Scales and their actual difficulty levels as indicated in the examinees' performances on the Integrative tasks.

Therefore, a Pearson product-moment correlation analysis was conducted between the means of the Integrative Task ratings (across three raters) and their predicted difficulty levels, in terms of the credits given to the Integrative tasks at different subscales based on the Integrative Task Scales. As expected, the correlation findings appeared to be negative--the direction that would indicate agreement in difficulty levels--and significant at a two-tailed p<.05. Table 4.28 presents the high measures of correlation between the means of the Integrative task ratings and their predicted difficulty levels.

Table 4.28
Predicted and Actual Difficulty Levels of Integrative Tasks
[table listing the 48 Integrative tasks, Ling/K through Nat/Eva, with their predicted difficulty levels and IT rating means; correlation between IT means and predicted task difficulty levels: r=-.89]

As Table 4.28 displays, the 48 Integrative tasks were listed from their lowest scale (0-1) at Knowledge Subscale through (0-2) Comprehension Subscale, (0-3) Application Subscale, (0-4) Analysis Subscale, and (0-5) Synthesis Subscale, to (0-6) Evaluation Subscale. The researcher's early assumption was that as the Integrative Task Scales grew higher, so would the means of the examinees' ratings on the Integrative tasks. The mean of the negatively high correlation measures between the predicted difficulty levels of the Integrative tasks at different subscales and their actual difficulty levels turned out to be r=-0.89, statistically significant at a two-tailed p<.05.

In conclusion, the supportive findings for the predictive adequacy of the Integrative Task Scales added rich evidence for the psychological reality of the subscales in the IT Model and the logical application of the Taxonomy of Educational Objectives in grading the Integrative tasks. Based on the statistical data, therefore, Null Hypothesis IV can successfully be rejected: There is a significant relationship between the real difficulty levels of the Integrative tasks and the difficulty levels predicted by the Integrative Task Scales.
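Because the subscale credits form an ordered scale, the predicted-versus-actual agreement can also be checked with a rank correlation. A sketch in plain Python, complementary to the Pearson analysis reported above (the six mean ratings are invented, falling as the predicted difficulty credit rises):

```python
def spearman_rho(x, y):
    """Spearman rank correlation via the d-squared formula (assumes no ties)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

predicted_credit = [1, 2, 3, 4, 5, 6]          # Knowledge ... Evaluation
mean_rating = [5.4, 4.9, 4.1, 3.6, 2.8, 2.2]   # drops as tasks get harder
rho = spearman_rho(predicted_credit, mean_rating)  # -1.0: perfect rank reversal
```

A strongly negative coefficient in this direction is exactly the pattern the chapter treats as agreement: higher predicted difficulty, lower mean performance.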
In short, in Chapter IV the researcher presented and thoroughly discussed the statistical analyses conducted to investigate (I) the credibility of the Integrative tasks; (II) their consistency across different raters and various items; (III) the trustworthiness of the interpretations made of the Integrative Task ratings, in terms of their content quality, their togetherness with other standard tests, and the psychological reality of their underlying constructs; and finally (IV) the predictive adequacy of the scaling system utilized in grading the Integrative tasks. Eventually, the four original null hypotheses were confidently rejected in favor of the viability and dependability of the IT Model. Chapter V brings this study to a conclusion and discusses further practical implications.

References

Allwright, R. (1988). Autonomy and individuation in whole class instruction. New York: Longman.

Archbald, D., & Newmann, F. (1989). The functions of assessment and the nature of authentic academic achievement. In I. Berlock (Ed.), Assessing achievement: Toward the development of a new science of educational testing. Buffalo, New York: SUNY Press.

Armstrong, T. (2003). Multiple intelligences in the classroom. Retrieved July 23, 2005 from http://www.thomasarmstrong.com/multiple-intelligences.htm

Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford: Oxford University Press.

Bachman, L. (2004). Statistical analyses for language assessment. Cambridge: Cambridge University Press.

Bachman, L., & Palmer, A. (1996). Language testing in practice. Oxford: Oxford University Press.

Blanche, P. (1990). Using standardized achievement and oral proficiency tests for self-assessment purposes: The DLIFLC study. Language Testing, 7, 202-229.

Block, D. (2001). McCommunication: A problem in the frame for SLA. In D. Block & D. Cameron (Eds.), Globalization and language teaching. Clevedon, Avon: Multilingual Matters.
Bloom, B. (1965). Taxonomy of educational objectives: Cognitive domain, Handbook 1 (Trans.). New York: David McKay.

Bloom, B., Broder, J., & Dave, R. H. (1985). Developing talent in young people. New York: Ballantine.

Breen, M. P. (1984). Process syllabuses for the language classroom. In C. J. Brumfit (Ed.), General English syllabus design. London: Pergamon Press.

Brown, H. D. (2001). Teaching by principles: An interactive approach to language pedagogy (2nd Ed.). San Francisco: Longman.

Brown, J. D., & Hudson, Th. (2002). Alternatives in language assessment. TESOL Quarterly, 32 (4), 653-675.

Brown, J. D., Hudson, Th., Norris, J., & Bonk, W. (2002). An investigation of second language task-based performance assessments. Honolulu: University of Hawai'i Press.

Brualdi, A. (1996). Multiple intelligences: Gardner's theory. Retrieved August 19, 2007 from http://www.parenting-baby.com/GardnerERIC.html

Bruton, A. (2002). From tasking purposes to purposing tasks. English Language Teaching Journal, 56 (3), 280-288.

Bygate, M. (2001). Effects of task repetition on the structure and control of oral language. London: Longman.

Bygate, M., Skehan, P., & Swain, M. (2001). Researching pedagogic tasks: Second language learning, teaching and testing. London: Longman.

Canale, M., & Swain, M. (1980). Theoretical bases of communicative approaches to second language teaching and testing. Applied Linguistics, 1 (1), 1-47.

Celce-Murcia, M. (1991). Grammar pedagogy in second and foreign language teaching. TESOL Quarterly, 25 (2), 459-478.

Celce-Murcia, M., & Larsen-Freeman, D. (1999). The grammar book: An ESL/EFL teacher's course. Heinle & Heinle Publishers.

Chen, T. Y. (1995). In search of an effective grammar teaching model. Forum, 33 (3). Retrieved March 23, 2007 from http://www.exchanges.state.gove/forum/vols/no3/p58.html

Christison, M. A. (1996). Teaching and learning language through multiple intelligences. TESOL Journal, 6 (1), 10-14.

Christison, M. A. (1998). Applying multiple intelligences theory in pre-service and in-service TEFL education programs. English Teaching Forum, 36, 2-19.

Christison, M. A. (1999). Multiple intelligences: Theory and practice in adult ESL. Retrieved July 20, 2007 from http://www.cal.org/ncle/digest/MI.html

Cook, G. (2000). Language play, language learning. Oxford: Oxford University Press.

Coustan, T., & Rocka, L. (1999). Putting theory into practice. Focus on Basics, 30 (3). Retrieved July 17, 2007 from http://www.gse.harvard.edu/ncsall/fob/1999/coustan.html
Dornyei, Z., & Kormos, J. (2001). The role of individual and social variables in oral task performance. Language Testing Research, 13, 431-69.

Doughty, C. (2001). Cognitive aspects of focus on form. Cambridge: Cambridge University Press.

Ellis, R. (2000). Task-based research and language pedagogy. Language Teaching Research, 4 (3), 190-98.

Ellis, R. (2001). Non-reciprocal tasks, comprehension and second language acquisition. London: Longman.

Farhady, H. (2005). Language assessment: A linguametric perspective. Language Assessment Quarterly, 2 (2), 38-45.

Farrar, M. J. (1992). Negative evidence and grammatical morpheme acquisition. Developmental Psychology, 28, 80-89.

Fodor, J. (1983). The modularity of mind. Cambridge, MA: MIT Press.

Foster, P. (2001). Lexical measures in task-based performance. Paper presented at the AAAL Conference, Vancouver, Canada.

Fotos, S. (1994). Shifting the focus from forms to form in the EFL classroom. ELT Journal, 52 (4), 301-307.

Gardner, H. (1983). Frames of mind: The theory of multiple intelligences. New York: Basic Books.

Gardner, H. (1993). Multiple intelligences: The theory in practice. New York: Basic Books.

Gardner, H. (1995). Reflections on multiple intelligences: Myths and messages. Phi Delta Kappan Educational Foundation. Retrieved August 10, 2006 from http://www.byu.edu/pe/blakemore/reflection.html

Gardner, H. (1998). Are there additional intelligences? The case for naturalist, spiritual, and existential intelligences. In J. Kane (Ed.), Education, information, and transformation (111-131). Upper Saddle River, NJ: Merrill-Prentice Hall.

Gardner, H. (1999a). Intelligence reframed: Multiple intelligences for the 21st century. New York: Basic Books.

Gardner, H. (1999b). Who owns intelligence? Atlantic Monthly, 67-70.

Gardner, H. (1999c). Deeper into multiple intelligences: MI theory as a tool. MI-News, 1 (9). Retrieved August 19, 2006 from http://www.angelfire.com/oh/themidas/dec99_5sections.html

Gass, S. (2002). Interaction perspectives in second language acquisition. In R. Kaplan (Ed.), Handbook of applied linguistics. Oxford: Oxford University Press.

Ghal-eh, N. (2006). The contextualized intelligence: Legitimacy of Gardner's multiple intelligences theory in the Iranian context. Paper presented at the 2nd Conference on Issues of Language Teaching, Tehran, Iran.

Ghal-eh, N. (2007). An integrative model of language task-based assessment: The efficacy of the theory of multiple intelligences. Paper presented at the LAEL PG Conference, Lancaster, England.

Gruber, H. (1985). Giftedness and moral responsibility: Creative thinking and human survival. In F. Horowitz & M. O'Brien (Eds.), The gifted and the talented: Developmental perspectives. Washington, DC: American Psychological Association.

Guignon, A. (1998). Multiple intelligences: A theory for everyone. Education World. Retrieved July 20, 2007 from http://www.education-word.com/a-curr/curr054.html

Guilford, J. P. (1982). Cognitive psychology's ambiguities: Some suggested remedies. Psychological Review, 89, 48-59.

Hancock, C. R. (1994). Teaching, testing, and assessing: Making the connection. Northeast Conference Reports. Lincolnwood, IL: National Textbook Co.

Harley, B. (1998). The role of focus-on-form tasks in promoting child L2 acquisition. In C. Doughty & J. Williams (Eds.), Focus on form in classroom second language acquisition (154-175). Cambridge: Cambridge University Press.

Hatch, E., & Farhady, H. (1981). Research design and statistics for applied linguistics. Los Angeles: University of California Press.

Heilenmann, K. L. (1990). Self-assessment of second language ability: The role of response effects. Language Testing, 7, 174-201.

Hughes, A. (1998). Testing for language teachers. Cambridge: Cambridge University Press.
Hulstijn, J. (1989). Implicit and incidental language learning: Experiments in the processing of natural and partly artificial input. In H. Dechert & M. Raupach (Eds.), Interlingual processing (49-73). Tubingen: Gunter Narr.

Johnson, K. (2007a). Researching expertise in language teaching. Paper presented at the LAEL PG Conference, Lancaster, England.

Johnson, K. (2007b). Tracing pedagogical expertise: LATEX expertise. Paper presented at the BAAL Conference, Lancaster, England.

Kallenbach, S. (1999). Emerging themes in adult multiple intelligences research. Focus on Basics, 3, 17-20. Retrieved July 12, 2007 from http://www.gseweb.harvard.edu/ncsall/Fob/1999/kallen.html

Kornhaber, M. L. (2001). Howard Gardner. In J. A. Palmer (Ed.), Fifty modern thinkers on education: From Piaget to the present. London: Routledge.

Krejcie, R. V., & Morgan, D. W. (1970). Determining sample size for research activities. Educational and Psychological Measurement, 30, 607-610.

Lantolf, J. (2000). Sociocultural theory and second language learning. Oxford: Oxford University Press.

Larsen-Freeman, D., & Long, M. H. (1991). An introduction to second language acquisition research. London and New York: Longman.
Larsen-Freeman, D. (1997). Grammar and its teaching: Challenging the myths. ERIC Digests. Retrieved October 10, 2007 from http://www.ed.gov/databases/ERICDigest/ed406826.html

Larsen-Freeman, D. (2000). Grammar dimensions: Form, meaning, and use (2nd Ed.). Pacific Grove: Heinle & Heinle Publishers.

Larsen-Freeman, D. (2007). A retroductive approach to researched pedagogy. Paper presented at the BAAL Conference, Lancaster, England.

Lazear, D. (1992). Teaching for multiple intelligences. Phi Delta Kappan Educational Foundation, 223-225.

Lightbown, P. (2000). Classroom SLA research and language teaching. Applied Linguistics, 21 (4), 431-50.

Long, M., & Crookes, G. (1992). Three approaches to task-based syllabus design. TESOL Quarterly, 28 (1), 27-49.

Lynch, T. (2001). Seeing what they meant: Transcribing as a route to noticing. English Language Teaching Journal, 55 (2), 317-25.

Lynch, T., & McLean, J. (2001). A case of exercising: Effects of immediate task repetition on learners' performance. New York: Basic Books.

MacWhinney, B. (2000a). The CHILDES project: Tools for analyzing talk: Volume 1. Mahwah, NJ: Erlbaum.

McKenzie, W. (2002). Those who can teach multiple intelligences. Innovative Teaching Newsletter, 4. Retrieved April 23, 2003 from http://surfauarium.com/newsletter/mi.html
McLaughlin, B. (1987). Theories of second language acquisition. London: Edward Arnold.

Mitchell, R., & Myles, F. (1998). Second language learning theories. London: Arnold.

Morgan, H. (1996). An analysis of Gardner's theory of multiple intelligences. Roeper Review, 18, 263-270.

Morris, C. (2002). Intelligence reframed: Multiple intelligences for the 21st century. Retrieved August 15, 2004 from http://article/intell/review.html

Murphey, J. (2003). Task-based learning: The interaction between tasks and learners. ELT Journal, 58 (4), 36-49.

Nakahama, Y., Tyler, A., & Van Lier, L. (2001). Negotiation of meaning in conversational and information gap activities: A comparative discourse analysis. TESOL Quarterly, 35 (3), 377-407.

NEAToday Online (1999). Interview with Howard Gardner. Retrieved March 23, 2005 from http://www.nea.org/neatoday/9903/gardner.html

Nemeth, N., & Kormos, J. (2001). Pragmatic aspects of task performance: The case of argumentation. Language Teaching Research, 5 (3), 213-40.

Newton, J. (2001). Options for vocabulary learning through communication tasks. English Language Teaching Journal, 55 (1), 30-37.
Nicholas, H., Lightbown, P., & Spada, N. (2001). Recasts as feedback to language learners. Language Learning, 51(4), 719-58.

Nunan, D. (1991). Communicative tasks and the language curriculum. TESOL Quarterly, 25(2), 279-295.

Oller, J. W. (1979). Language tests at school. London: Longman.

Perkins, D. (1981). The mind's best work. Cambridge, MA: Harvard University Press.

Pienemann, M. (1998). Language processing and second language development: Processability theory. Amsterdam: John Benjamins.

Po-Ying, K. (1999). Multiple intelligences theory and English language teaching. Retrieved April 12, 2005 from http://www.highschool.english.nccu.edu.tw/paper/ying

Prabhu, N. S. (1987). Second language pedagogy. Oxford: Oxford University Press.

Rahimyan, Sh. (2003). The relationship between multiple intelligences and learner types on Iranian EFL learners. Unpublished MA thesis, Islamic Azad University, Science and Research Campus, Tehran, Iran.

Richards, J. C., & Renandya, W. A. (2002). Methodology in language teaching: An anthology of current practice. Cambridge: Cambridge University Press.

Robinson, P. (2001). Task complexity, cognitive resources, and syllabus design. In P. Robinson (Ed.), Cognition and second language instruction. Cambridge: Cambridge University Press.
Robinson, P. (2002). Attention and memory during SLA. In C. Doughty & M. Long (Eds.), The handbook of second language acquisition. Oxford: Blackwell.

Rodgers, T. S. (2001). Language teaching methodology. ERIC Digest. Retrieved April 2, 2006 from http://www.ericcll/digest/rodgers.

Saeidi, M. (2004). Grammar instruction: Multiple intelligence-based focus on form approach in an EFL context. Unpublished PhD dissertation, Islamic Azad University, Science and Research Campus, Tehran, Iran.

Samuda, V. (2001). Guiding relationships between form and meaning during task performance: The role of the teacher. Language Teaching Journal, 10(2), 23-30.

Shearer, B. (1999). The MIDAS: A guide to assessment and education for multiple intelligences. Ohio: Grayden Press.

Skehan, P. (1996). A framework for the implementation of task-based instruction. Applied Linguistics, 17(1), 38-62.

Skehan, P., & Foster, P. (1999). The influence of task structure and processing conditions on narrative retellings. Language Learning, 49(1), 93-120.

Skehan, P., & Foster, P. (2001). Cognition and tasks. In P. Robinson (Ed.), Cognition and second language instruction. Cambridge: Cambridge University Press.

Skehan, P. (2002). A non-marginal role for tasks. ELT Journal, 56(3), 289-96.
Skehan, P. (2007). Contrasting views on the psycholinguistics of task-based performance. Paper presented at the BAAL Conference, Lancaster, England.

Smith, M. K. (2002). Howard Gardner and multiple intelligences. The encyclopedia of informal education. Retrieved May 29, 2003 from http://www.infed.org/thinkers/gardner.html

Spolsky, B. (1992). Diagnostic testing revisited. In E. Shohamy & A. R. Walton (Eds.), Language assessment for feedback: Testing and other strategies (pp. 29-39). National Foreign Language Center. Dubuque, IA: Kendall/Hunt Publishing Co.

Sternberg, R. (1991). Death, taxes and bad intelligence tests. Intelligence, 15, 257-269.

Sternberg, R. (1993). The nature of creativity. Cambridge: Cambridge University Press.

Viens, J. (1999). Understanding multiple intelligences: The theory behind practice. Focus on Basics, 3. Retrieved June 12, 2005 from http://www.gse.harvard.edu/ncsall/fob/1999/viens.html

Widdowson, H. (1990). Aspects of language teaching. Oxford: Oxford University Press.

Wiggins, G. (1994). Toward more authentic assessment of language performances. In C. R. Hancock (Ed.), Teaching, testing, and assessment: Making the connection. Northeast Conference Reports. Lincolnwood, IL: National Textbook Co.

Willis, J. (1996). A framework for task-based learning. London: Longman.

Wolf, D. (1989). Portfolio assessment: Sampling student work. Educational Leadership, 46(7), 35-39.

Yap, K. O. (1993). Integrating assessment with instruction in ABE/ESL programs. Paper presented at the Annual Meeting of the American Educational Research Association.

Yuan, F., & Ellis, R. (2003). The effects of pre-task planning and on-line planning on fluency, complexity, and accuracy in L2 monologic oral production. Applied Linguistics, 24(1), 1-27.
Appendixes

Appendix A

Levels of Cognitive Domain (Bloom)

Knowledge: Recall data or information from memory.
Examples: Recite a policy. Quote prices from memory to a customer. Know the safety rules.
Key Words: defines, describes, identifies, knows, labels, lists, matches, names, outlines, recalls, recognizes, reproduces, selects, states

Comprehension: Understand the meaning, translation, interpolation, and interpretation of instructions and problems. State a problem in one's own words.
Examples: Rewrite the principles of test writing. Explain in one's own words the steps for performing a complex task. Translate an equation into a computer spreadsheet.
Key Words: comprehends, converts, defends, distinguishes, estimates, explains, extends, generalizes

Application: Use a concept in a new situation or unprompted use of an abstraction. Apply what was learned in the classroom into novel situations in the workplace.
Examples: Use a manual to calculate an employee's vacation time. Apply laws of statistics to evaluate the reliability of a written test.
Key Words: applies, changes, computes, constructs, demonstrates, discovers, manipulates, modifies, operates, predicts, prepares, produces, relates, shows

Analysis: Separate material or concepts into component parts so that its organizational structure may be understood. Distinguish between facts and inferences.
Examples: Troubleshoot a piece of equipment by using logical deduction. Recognize logical fallacies in reasoning. Gather information from a department and select the required tasks for training.
Key Words: analyzes, breaks down, compares, contrasts, diagrams, deconstructs, differentiates, discriminates, distinguishes, identifies, illustrates, infers, outlines, relates, selects, separates

Synthesis: Build a structure or pattern from diverse elements. Put parts together to form a whole, with emphasis on creating a new meaning or structure.
Examples: Write a company operations or process manual. Design a machine to perform a specific task. Integrate training from several sources to solve a problem. Revise and process to improve the outcome.
Key Words: categorizes, combines, compiles, composes, creates, devises, designs, explains, generates, modifies, organizes, plans, rearranges, reconstructs, relates, reorganizes, revises, rewrites, summarizes, tells, writes

Evaluation: Make judgments about the value of ideas or materials.
Examples: Select the most effective solution. Hire the most qualified candidate. Explain and justify a new budget.
Key Words: appraises, compares, concludes, contrasts, criticizes, critiques, defends, describes, discriminates, evaluates, explains, interprets, justifies, relates
Appendix A

Levels of Affective Domain (Bloom)

Receiving: Be aware. Have willingness to hear. Show selective attention.
Examples: Listen to others with respect. Listen for and remember the name of newly introduced people.
Key Words: asks, chooses, describes, follows, gives, holds, identifies, locates, names, points to, selects, sits, erects

Responding: Actively participate on the part of the learners. Attend and react to a particular phenomenon. Learning outcomes may emphasize compliance in responding, willingness to respond, or satisfaction in responding.
Examples: Participate in class discussions. Give a presentation. Question new ideals, concepts, models, etc. in order to fully understand them. Know the safety rules and practice them.
Key Words: answers, assists, aids, complies, conforms, discusses, greets, helps, labels, performs, practices, presents, reads, recites, reports, selects, tells, writes

Valuing: Attach worth or value to a particular object, phenomenon, or behavior. Internalize a set of specified values, while clues to these values are expressed in the learner's overt behavior and are often identifiable.
Examples: Demonstrate belief in the democratic process. Be sensitive towards individual and cultural differences (value diversity). Show the ability to solve problems. Propose a plan to social improvement and follow through with commitment. Inform management on matters that one feels strongly about.
Key Words: completes, demonstrates, differentiates, explains, follows, forms, initiates, invites, joins, justifies, proposes, reads, reports, selects, shares, studies, works

Organizing and conceptualizing: Organize values into priorities by contrasting different values. Resolve conflicts between them. Create a unique value system. The emphasis is on comparing, relating, and synthesizing values.
Examples: Recognize the need for balance between freedom and responsible behavior. Accept responsibility for one's behavior. Explain the role of systematic planning in solving problems. Accept professional ethical standards. Create a life plan in harmony with abilities, interests, and beliefs.
Key Words: adheres, alters, arranges, combines, compares, completes, defends, explains, formulates, generalizes, identifies, integrates, modifies, orders

Characterizing by a value or value concept: Have a value system that controls one's behavior. The behavior is pervasive, consistent, predictable. Instructional objectives are concerned with the student's general patterns of adjustment.
Examples: Show self-reliance when working independently. Cooperate in group activities (display teamwork). Use an objective approach in problem solving. Display a professional commitment to ethical practice on a daily basis. Revise judgments and change behavior in light of new evidence. Value people for what they are, not how they look.
Key Words: acts, discriminates, displays, influences, listens, modifies, performs, practices, proposes
Appendix A

Levels of Psychomotor Domain (Bloom)

Perception: Use sensory cues to guide motor activity. This ranges from sensory stimulation, through cue selection, to translation.
Examples: Detect non-verbal communication cues. Estimate where a ball will land after it is thrown, and then move to the correct location to catch the ball. Adjust heat of stove to correct temperature by smell and taste of food. Adjust the height of the forks on a forklift by comparing where the forks are in relation to the pallet.
Key Words: chooses, describes, detects, differentiates, distinguishes, identifies, isolates, relates, selects

Manipulation: Have readiness to act. It includes mental, physical, and emotional sets. These three sets are dispositions that predetermine a person's response to different situations.
Examples: Know and act upon a sequence of steps in a manufacturing process. Recognize one's abilities and limitations. Show desire to learn a new process (motivation).
Key Words: begins, displays, explains, moves, proceeds, reacts, shows, states, volunteers

Precision: This is the intermediate stage in learning a complex skill. Learned responses have become habitual and the movements can be performed with some confidence and proficiency.
Examples: Use a personal computer. Repair a leaking faucet.
Key Words: assembles, calibrates, constructs, dismantles, displays, fastens, fixes, grinds, heats, manipulates, measures, mends, mixes, organizes, sketches

Articulation: Show skillful performance of motor acts that involve complex movement patterns. Proficiency is indicated by a quick, accurate, and highly coordinated performance, requiring a minimum of energy.
Examples: Maneuver a car into a tight parallel parking spot. Operate a computer quickly and accurately. Display competence while playing the piano.
Key Words: assembles, builds, calibrates, constructs, dismantles, displays, fastens, fixes, grinds, heats, manipulates, measures, mends, mixes, organizes, sketches

Naturalization: Create new movement patterns to fit a particular situation or specific problem. Learning outcomes emphasize creativity based upon highly developed skills.
Examples: Construct a new theory. Develop a new and comprehensive training programming. Create a new gymnastic routine.
Key Words: arranges, builds, combines, composes, constructs, creates, designs, initiates, makes, originates
Appendix D

MI Core Operations Defined

Musical
Good at: Musical ability, awareness, appreciation and use of sound; recognition of tonal and rhythmic patterns; understanding the relationship between sounds and feelings.
As: Musicians, singers, composers, DJs, music producers, piano tuners, acoustic engineers, entertainers, party planners, environment and noise advisors.
Ask them to: Perform a musical piece. Sing a song. Review a musical work. Coach someone to play a musical instrument. Specify mood music for telephone systems and receptions.

Bodily-Kinesthetic
Good at: Body movement control; manual dexterity; physical agility and balance; eye and body coordination.
As: Dancers, demonstrators, actors, athletes, divers, sportspeople, soldiers, fire-fighters, performance artistes, ergonomists, osteopaths.
Ask them to: Juggle. Demonstrate a sports technique. Flip a beer-mat. Create a mime to explain something. Toss a pancake.

Spatial-Visual
Good at: Visual and spatial perception; interpretation and creation of visual images; pictorial imagination and expression; understanding the relationship between images and meanings.
As: Artists, designers, cartoonists, story-boarders, architects, photographers, sculptors, town planners, visionaries.
Ask them to: Design a costume. Interpret a painting. Create a room layout. Create a corporate logo. Design a building.

Interpersonal
Good at: Perception of other people's feelings; ability to relate to others; interpretation of behavior and communications; understanding the relationships between people and their situations.
As: Therapists, mediators, leaders, counselors, politicians, educators, salespeople, clergy, psychologists, teachers, doctors, healers, coaches.
Ask them to: Interpret moods from facial expressions. Demonstrate feelings through body language. Affect the feelings of others in a planned way. Coach or counsel another person.

Intrapersonal
Good at: Self-awareness; personal cognizance; personal objectivity; understanding oneself, one's relationship to others and the world, and one's own need for, and reaction to, change.
As: Anyone who is self-aware and involved in the process of changing personal thoughts, beliefs, and behavior in relation to their situation, other people, and their purpose and aims.
Ask them to: Consider and decide one's own aims and the personal changes required to achieve them. Consider one's own Johari Window and decide options for development. Consider one's own Maslow's Self-Actualization level.

Naturalist
Good at: Classification; nurturing; recognition of animal species; understanding the relationship between humans and other species.
As: Biologists, botanists, zoologists, science students, science teachers.
Ask them to: Work in nature. Find new species of plants and animals. Make a tour into nature.
Appendix E

Objective Behaviors in Levels of Cognitive Domain

Application
Able to: provide, relate, report, show, solve, teach, transfer, use, utilize

Analysis
The ability to:
• break down informational materials into their component parts
• examine and try to understand the organizational structure of such information to develop divergent conclusions by identifying motives or causes
• make inferences
• find evidence to support generalizations
Able to: break down, correlate, diagram, differentiate, discriminate, distinguish, focus, illustrate, infer, limit, outline, point out, prioritize, recognize, separate, subdivide

Synthesis
The ability to:
• produce a unique communication
• produce a plan or proposed set of operations
• derive a set of abstract relations
Able to: adapt, anticipate, categorize, collaborate, combine, communicate, compare, compile, compose, contrast, create, design, devise, express, facilitate, formulate, generate, incorporate, individualize, initiate, integrate, intervene, model, modify, negotiate, plan, progress, rearrange, reconstruct, reinforce, reorganize, revise, structure, substitute, validate

Evaluation
The ability to:
• judge the value of material based on personal values/opinions
• result in an end product, with a given purpose, without real right or wrong answers
Able to: appraise, compare, contrast, conclude, criticize, critique, decide, defend, judge, justify, interpret, reframe, support
Appendix F

The Pilot Integrative Tasks

Integrative Tasks at Knowledge Subscale

Linguistic Knowledge Task (Ling-K)
a. but   b. then   c. the   d. to

Mathematic Knowledge Task (Math-K)
How many angles are there in a cube?
a. …   b. …   c. …   d. …

Musical Knowledge Task (Mus-K)
How many syllables are in the word "Engineer"?
a. …   b. …   c. …   d. …

Kinesthetic Knowledge Task (Kin-K)
Where is the biggest muscle in your body?
a. arm   b. leg   c. neck   d. back

Spatial Knowledge Task (Spa-K)
A geometric shape with five sides is a ……
a. Rectangle   b. Triangle   c. Square   d. Pentagon

Intrapersonal Knowledge Task (Intra-K)
As an honest person, do you confess after making a mistake?
a. sometimes   b. always   c. it depends   d. never

Interpersonal Knowledge Task (Inter-K)
A person who always disagrees with others' attitudes is ……
a. determined   b. argumentative   c. unsociable   d. insincere

Naturalist Knowledge Task (Nat-K)
Which animal is within the Mammals Family?
a. cat   b. shark   c. eagle   d. frog

Integrative Tasks at Comprehension Subscale

Linguistic Comprehension Task (Ling-C)
Write down a synonym and an antonym for the verb "protect" in the following sentence:
Young children need to be protected from physical and mental abuse.

Mathematic Comprehension Task (Math-C)
Rewrite this mathematical statement into English prose:
…

Musical Comprehension Task (Mus-C)
Paraphrase the following lines by John Keats:
A thing of beauty is a joy forever:
Its loveliness increases; it will
Never pass into nothingness.

Kinesthetic Comprehension Task (Kin-C)
What sport does this logo represent?

Spatial Comprehension Task (Spa-C)
What is a three-dimensional shape which is made of four triangles and one square?

Intrapersonal Comprehension Task (Intra-C)
A friend has insulted you. How do you describe your ambivalence of feelings for cutting this friendship?

Interpersonal Comprehension Task (Inter-C)
Jenny doesn't count her chickens before they're hatched. How do you describe her character?

Naturalist Comprehension Task (Nat-C)
What is the natural disaster described in this sentence?
It lifted a car about … feet off the ground and then we saw it disappear far from us.
a. Thunder   b. Hurricane   c. Earthquake   d. Volcanic eruption

Integrative Tasks at Application Subscale

Linguistic Application Task (Ling-Ap)
Write down a sentence with the following words:
bus, we, up, or, hurry, will, the, miss

Mathematical Application Task (Math-Ap)
If your average saving is … dollars per month and this is only … percent of your payment, how much are you paid per month?

Musical Application Task (Mus-Ap)
In poetry, "Iambic" is a line in which an unstressed syllable is followed by a stressed one. Is this an iambic line?
My way is to begin with the beginning.

Kinesthetic Application Task (Kin-Ap)
Your body needs … grams protein per pound. Add … more if you are a man. What is your ideal daily protein consumption?

Spatial Application Task (Spa-Ap)
Your room is five meters long and four wide. What shape most probably does your room have?

Intrapersonal Application Task (Intra-Ap)
What would you do if one of your coat buttons came off at the street?

Interpersonal Application Task (Inter-Ap)
What would you do in this situation?
A friend borrows your pen and then loses it. When he apologizes, you want to reassure him.

Naturalist Application Task (Nat-Ap)
Animal species in a family share physical characteristics, such as the shape of the head or the quality of movement. Now mention four members in the Insects Family.

Integrative Tasks at Analysis Subscale

Linguistic Analysis Task (Ling-An)
Complete the following sentence with the best choice:
Even the most …… flower has thorns.
a. ugly   b. weathered   c. elusive   d. noxious   e. tempting

Mathematical Analysis Task (Math-An)
Which fraction is bigger?
a. The squared sum of differences between X and Y, divided by …
b. The squared X minus squared Y, divided by …

Musical Analysis Task (Mus-An)
Compare these two sentences. How are they different in meaning?
a. I could hear the kids' voices upstairs.
b. I could hear the kids' noises upstairs.

Kinesthetic Analysis Task (Kin-An)
John is an athlete. What sport game is he most probably talking about?
"I like the noise, the speed, and the danger – there is nothing more exciting to watch. Though Jenny is fond of the slow but smart movements on the board."

Spatial Analysis Task (Spa-An)
Which container would you use for keeping cereals and chocolates?
a. Bowl   b. Basket   c. Bucket   d. Bottle

Intrapersonal Analysis Task (Intra-An)
In what situation might you tiptoe around the house?

Interpersonal Analysis Task (Inter-An)
How do you describe a person who is kind of a cold fish?

Naturalist Analysis Task (Nat-An)
Which one is least like the others?
a. Horse   b. Kangaroo   c. Goat   d. Donkey

Integrative Tasks at Synthesis Subscale

Linguistic Synthesis Task (Ling-Syn)
Can you write down an idiom about the time a person cannot express himself?

Mathematical Synthesis Task (Math-Syn)
Imagine the ship you're traveling on is sinking and you are allowed to carry only … kilograms in your bag out of the ship. Which ones do you choose? Why?
a. rope (… kg)
b. medical kit (… kg)
c. … cans of food (… g each)
d. … bottles of water (… kg each)
e. short-wave radio (… kg)
f. ax (… kg)

Musical Synthesis Task (Mus-Syn)
What can make a hissing sound in the kitchen?

Kinesthetic Synthesis Task (Kin-Syn)
Can you write down three effects of regular body exercise on a person's life?

Spatial Synthesis Task (Spa-Syn)
Can you draw four squares sharing at least one side by means of less than … toothpicks?

Intrapersonal Synthesis Task (Intra-Syn)
Imagine that you have been elected as the manager of a big company like SONY; what do you do on the first day in your new position?

Interpersonal Synthesis Task (Inter-Syn)
Complete the following sentence:
"He is an egocentric person. He can't ………"

Naturalist Synthesis Task (Nat-Syn)
How can you protect yourself from diseases caused by certain mosquitoes' bites in a tropical area?

Integrative Tasks at Evaluation Subscale

Linguistic Evaluation Task (Ling-Eva)
In the following sentence, underline ungrammatical words and replace them with the correct ones:
There a few drugs are today that valued more then penicillin.

Mathematical Evaluation Task (Math-Eva)
Who is the oldest if each of these three friends makes one true and one false statement?
Alice: I'm older than Brenda. Carl is not the oldest.
Brenda: I'm the oldest. Carl is younger than Alice.
Carl: I'm older than Brenda. Alice is the youngest.

Musical Evaluation Task (Mus-Eva)
Meg urgently argues for the effective role that music plays on the audience in TV programs. Can you complete her comments with three advantages of using music in TV programs?
Andy: Why do they have music in TV News programs? To me it sounds totally unnecessary.
Meg: Most probably they want to create a sense of drama. I think music is supposed to …

Kinesthetic Evaluation Task (Kin-Eva)
Discuss which one is more suitable for children:
a. Basketball   b. Swimming   c. Gymnastics   d. Skating

Spatial Evaluation Task (Spa-Eva)
Which one do you prefer more to live in: a big house or a small flat? Bring your reasons.

Intrapersonal Evaluation Task (Intra-Eva)
You have already planned for a short vacation when you suddenly read in the newspaper that "You may be itching to travel today …". What is your reaction? Why?

Interpersonal Evaluation Task (Inter-Eva)
Discuss this Chinese proverb:
"One person's meal is another one's poison."

Naturalist Evaluation Task (Nat-Eva)
Since ages ago, Eskimos have built their houses in semicircle shape. How can you support this old tradition?
Appendix F

Revised Integrative Tasks

Integrative Tasks at Knowledge Subscale

Linguistic Knowledge Task (Ling-K)
a. but   b. then   c. the   d. to

Mathematic Knowledge Task (Math-K)
How many angles are there in a cube?
a. …   b. …   c. …   d. …

Musical Knowledge Task (Mus-K)
How many syllables are in the word "Engineer"?
a. …   b. …   c. …   d. …

Kinesthetic Knowledge Task (Kin-K)
Where is the biggest muscle in your body?
a. arm   b. leg   c. neck   d. back

Spatial Knowledge Task (Spa-K)
A geometric shape with five sides is a ……
a. Rectangle   b. Triangle   c. Square   d. Pentagon

Intrapersonal Knowledge Task (Intra-K)
As an honest person, do you confess after making a mistake?
a. sometimes   b. always   c. it depends   d. never

Interpersonal Knowledge Task (Inter-K)
A person who always disagrees with others' attitudes is ……
a. determined   b. argumentative   c. unsociable   d. insincere

Naturalist Knowledge Task (Nat-K)
Which animal is within the Mammals Family?
a. cat   b. shark   c. eagle   d. frog

Integrative Tasks at Comprehension Subscale

Linguistic Comprehension Task (Ling-C)
Write down a synonym and an antonym for the verb "protect" in the following sentence:
Young children need to be protected from physical and mental abuse.

Mathematic Comprehension Task (Math-C)
Rewrite this mathematical statement into English prose:
…

Musical Comprehension Task (Mus-C)
Paraphrase the following lines by John Keats:
A thing of beauty is a joy forever:
Its loveliness increases; it will
Never pass into nothingness.

Kinesthetic Comprehension Task (Kin-C)
What sport does this logo represent?

Spatial Comprehension Task (Spa-C)
What is a three-dimensional shape which is made of four triangles and one square?

Intrapersonal Comprehension Task (Intra-C)
A friend has insulted you. How do you describe your ambivalence of feelings for cutting this friendship?

Interpersonal Comprehension Task (Inter-C)
Jenny doesn't count her chickens before they're hatched. How do you describe her character?

Naturalist Comprehension Task (Nat-C)
What is the natural disaster described in this sentence?
It lifted a car about … feet off the ground and then we saw it disappear far from us.
a. Thunder   b. Hurricane   c. Earthquake   d. Volcanic eruption

Integrative Tasks at Application Subscale

Linguistic Application Task (Ling-Ap)
Write down a sentence with the following words:
bus, we, up, or, hurry, will, the, miss

Mathematical Application Task (Math-Ap)
If your average saving is … dollars per month and this is only … percent of your payment, how much are you paid per month?

Musical Application Task (Mus-Ap)
In poetry, "Iambic" is a line in which an unstressed syllable is followed by a stressed one. Is this an iambic line?
My way is to begin with the beginning.

Kinesthetic Application Task (Kin-Ap)
Your body needs … grams protein per pound. Add … more if you are a man. What is your ideal daily protein consumption?

Spatial Application Task (Spa-Ap)
Your room is five meters long and four wide. What shape most probably does your room have?

Intrapersonal Application Task (Intra-Ap)
If your friends blame you for lack of self-confidence, how do you defend yourself, bringing some real examples from your life?

Interpersonal Application Task (Inter-Ap)
What would you do in this situation?
A friend borrows your pen and then loses it. When he apologizes, you want to reassure him.

Naturalist Application Task (Nat-Ap)
Animal species in a family share physical characteristics, such as the shape of the head or the quality of movement. Now mention four members in the Insects Family.

Integrative Tasks at Analysis Subscale

Linguistic Analysis Task (Ling-An)
Complete the following sentence with the best choice:
Even the most …… flower has thorns.
a. ugly   b. weathered   c. elusive   d. noxious   e. tempting

Mathematical Analysis Task (Math-An)
Which fraction is bigger?
a. The squared sum of differences between X and Y, divided by …
b. The squared X minus squared Y, divided by …

Musical Analysis Task (Mus-An)
Black and White, Up and Down, Left and Right are examples of juxtapositions. Complete the following poem with appropriate juxtapositions so that the rhythm in each line is observed:
I am the …………… you are the arrow
You are the …………… I am the night
I am …………… you are the wheel
You're never wrong, I am ……………

Kinesthetic Analysis Task (Kin-An)
John is an athlete. What sport game is he most probably talking about?
"I like the noise, the speed, and the danger – there is nothing more exciting to watch. Though Jenny is fond of the slow but smart movements on the board."

Spatial Analysis Task (Spa-An)
Which container would you use for keeping cereals and chocolates?
a. Bowl   b. Basket   c. Bucket   d. Bottle

Intrapersonal Analysis Task (Intra-An)
In what situation might you tiptoe around the house?

Interpersonal Analysis Task (Inter-An)
How do you describe a person who is kind of a cold fish?

Naturalist Analysis Task (Nat-An)
Which one is least like the others?
a. Horse   b. Kangaroo   c. Goat   d. Donkey

Integrative Tasks at Synthesis Subscale

Linguistic Synthesis Task (Ling-Syn)
Can you write down an idiom about the time a person cannot express himself?

Mathematical Synthesis Task (Math-Syn)
Imagine the ship you're traveling on is sinking and you are allowed to carry only … kilograms in your bag out of the ship. Which ones do you choose? Why?
a. rope (… kg)
b. medical kit (… kg)
c. … cans of food (… g each)
d. … bottles of water (… kg each)
e. short-wave radio (… kg)
f. ax (… kg)

Musical Synthesis Task (Mus-Syn)
Following the directions, complete this haiku so that the poem reflects your definition of jealousy to the readers:
Jealousy (a noun)
…………… ferocious (two adjectives)
Destroying …………… (four present participles)
Makes everything become …………… (a four-word phrase with a past participle)
…………… (a synonym of the noun in Line …)

Kinesthetic Synthesis Task (Kin-Syn)
Can you write down three effects of regular body exercise on a person's life?

Spatial Synthesis Task (Spa-Syn)
Can you draw four squares sharing at least one side by means of less than … toothpicks?

Intrapersonal Synthesis Task (Intra-Syn)
Imagine that you have been elected as the manager of a big company like SONY; what do you do on the first day in your new position?

Interpersonal Synthesis Task (Inter-Syn)
Complete the following sentence:
"He is an egocentric person. He can't ………"

Naturalist Synthesis Task (Nat-Syn)
How can you protect yourself from diseases caused by certain mosquitoes' bites in a tropical area?

Integrative Tasks at Evaluation Subscale

Linguistic Evaluation Task (Ling-Eva)
In the following sentence, underline ungrammatical words and replace them with the correct ones:
There a few drugs are today that valued more then penicillin.

Mathematical Evaluation Task (Math-Eva)
Who is the oldest if each of these three friends makes one true and one false statement?
Alice: I'm older than Brenda. Carl is not the oldest.
Brenda: I'm the oldest. Carl is younger than Alice.
Carl: I'm older than Brenda. Alice is the youngest.

Musical Evaluation Task (Mus-Eva)
Meg urgently argues for the effective role that music plays on the audience in TV programs. Can you complete her comments with three advantages of using music in TV programs?
Andy: Why do they have music in TV News programs? To me it sounds totally unnecessary.
Meg: Most probably they want to create a sense of drama. I think music is supposed to …

Kinesthetic Evaluation Task (Kin-Eva)
Discuss which one is more suitable for children:
a. Basketball   b. Swimming   c. Gymnastics   d. Skating

Spatial Evaluation Task (Spa-Eva)
Which one do you prefer more to live in: a big house or a small flat? Bring your reasons.

Intrapersonal Evaluation Task (Intra-Eva)
You have already planned for a short vacation when you suddenly read in the newspaper that "You may be itching to travel today …". What is your reaction? Why?

Interpersonal Evaluation Task (Inter-Eva)
Discuss this Chinese proverb:
"One person's meal is another one's poison."

Naturalist Evaluation Task (Nat-Eva)
Since ages ago, Eskimos have built their houses in semicircle shape. How can you support this old tradition?
Appendix G

Descriptive Statistics for Pilot Administration

(Table: one row per Integrative Task, Ling-K through Nat-Eva, with columns Task, Rater, M, SD, Min, Max, N, and Skew.)

Note. All correlations were significant at p … (two-tailed).
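The statistics reported for each task and rater in the table above (M, SD, Min, Max, N, and Skew) are standard descriptives and can be reproduced for any set of ratings. The function below is a minimal sketch, assuming SPSS-style conventions (sample standard deviation and the adjusted Fisher-Pearson skewness coefficient); the ratings passed in are hypothetical, not data from the study.

```python
import math

def descriptives(scores):
    """Compute M, SD, Min, Max, N, and Skew for one list of task ratings."""
    n = len(scores)
    mean = sum(scores) / n
    # Sample variance and SD (n - 1 denominator), as in SPSS output.
    var = sum((x - mean) ** 2 for x in scores) / (n - 1)
    sd = math.sqrt(var)
    # Adjusted Fisher-Pearson skewness: g1 scaled by sqrt(n(n-1))/(n-2).
    m2 = var * (n - 1) / n            # population second moment
    m3 = sum((x - mean) ** 3 for x in scores) / n
    skew = (math.sqrt(n * (n - 1)) / (n - 2)) * (m3 / m2 ** 1.5)
    return {"M": mean, "SD": sd, "Min": min(scores), "Max": max(scores),
            "N": n, "Skew": skew}

# Hypothetical ratings on a 0-5 scale for one Integrative Task:
stats = descriptives([3, 4, 2, 5, 4, 3, 4, 1, 3, 4])
```

A negative Skew value, for instance, would indicate that most ratings cluster at the high end of the scale with a tail of low scores.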