
Faculty of Applied Social Sciences
ABPC1203
Psychology Test and Measurement
Copyright © Open University Malaysia (OUM)

ABPC1203
PSYCHOLOGY TEST AND MEASUREMENT
Dr Wan Shahrazad Wan Sulaiman
Dr Stuti K Mishra

Project Directors: Prof Dato’ Dr Mansor Fadzil
Assoc Prof Dr Mohd Yusuf Ahmad
Open University Malaysia

Module Writers: Dr Wan Shahrazad Wan Sulaiman
Universiti Kebangsaan Malaysia
Dr Stuti K Mishra
ACME Learning

Moderator: Dr Wong Huey Siew

Enhancer: Gan Chun Hong
Universiti Kebangsaan Malaysia

Developed by: Centre for Instructional Design and Technology

First Edition, April 2012

Second Edition, December 2014
Copyright © Open University Malaysia (OUM), December 2014, ABPC1203

All rights reserved. No part of this work may be reproduced in any form or by any means without
the written permission of the President, Open University Malaysia (OUM).
Table of Contents
Course Guide xiii–xviii
Topic 1 Introduction to Testing and Assessment 1

1.1 Definitions, Basic Principles and Types of Test 2
1.1.1 Definitions of Psychological Tests 3
1.1.2 Basic Principles 4
1.1.3 Types of Tests 7
1.2 History of Psychological Testing 11
1.2.1 Chinese Civilisation 12
1.2.2 Western Civilisation 13
1.2.3 Contemporary History of Psychological Testing 14
1.3 Advantages and Limitations of Testing 16
1.3.1 Advantages of Testing 16
1.3.2 Limitations of Testing 17
Summary 18
Key Terms 19
References 19
Topic 2 The Science of Psychological Measurement 20

2.1 Basic Statistics for Testing 21
2.1.1 The Importance of Statistics for Psychology Testing and Measurement 21
2.1.2 Scales of Measurement 21
2.1.3 Types of Scales 22
2.1.4 Describing Data 23
2.1.5 Norms 25
2.2 Correlation and Regression 26
2.2.1 Correlation Analysis 26
2.2.2 Significance of Measuring Correlation 28
2.2.3 Types of Correlation 29
2.2.4 Regression Analysis 31
2.2.5 Advantages of Regression Analysis 32
2.2.6 Differences between Correlation and Regression Analysis 33
2.3 Reliability and Validity 33
2.3.1 Reliability of Tests 34
2.3.2 Validity of Tests 35
Summary 37
Key Terms 38
References 38

Topic 3 Test Construction 39

3.1 Test Construction 40
3.2 Defining the Test: What to Measure? 41
3.3 Selecting a Scaling Method: Types of Item Formats 42
3.4 Constructing the Items: Writing Test Items 45
3.4.1 Writing Test Items 45
3.4.2 Essential Characteristics of Item Writers 46
3.5 Testing the Items (1): Items Evaluation 48
3.6 Testing the Items (2): Item Analysis 49
3.6.1 Item Difficulty Index 50
3.6.2 Item Discrimination Index 51
3.6.3 Item Reliability Index 53
3.6.4 Item Validity Index 53
Summary 54
Key Terms 55
References 55
Topic 4 Test Administration 57

4.1 Interviewing Techniques 58
4.1.1 Principles of Effective Interviewing 58
4.1.2 Types of Interview 61
4.2 Issues in Test Administration 63
4.2.1 The Examiner and the Subject 64
4.3 Practical Considerations in Test Administration 66
4.3.1 Physical Environment 67
4.3.2 Various Responsibilities of the Test Administrator 68
4.3.3 Duties of the Test Administrator during the Process of Psychology Testing and Measurement 70
4.3.4 Additional General Guidelines for Test Administrators to Follow 71
4.3.5 Test Administrator's Post-test Duties 72
4.4 Computerised Testing 72
Summary 74
Key Terms 75
References 76

Topic 5 Intelligence Test 78

5.1 The Concept of Intelligence and Its Definitions 79
5.2 Intelligence Test and Intelligence Quotient: The Development in Brief 80
5.2.1 The Early Development 80
5.2.2 Intelligence Quotient (IQ) 81
5.3 Models and Theories of Intelligence 82
5.3.1 Spearman's Two-Factor Theory of Intelligence: The "g" Factor 83
5.3.2 Thurstone's Multidimensional Model: Primary Mental Abilities 84
5.3.3 Guilford's Structure of Intellect Model 85
5.3.4 Cattell's Hierarchical Model: CHC Model 86
5.3.5 Gardner's Theory of Multiple Intelligence 87
5.4 The Stanford-Binet Intelligence Scale 89
5.5 The Wechsler Scales 91
5.6 Intelligence Tests for Military 93
5.6.1 Brief History 93
5.6.2 The Army Alpha Tests 93
5.6.3 The Army Beta Tests 94
5.6.4 Various Related Issues 94
5.7 Intelligence Tests Issues 97
5.7.1 Can Intellectual Abilities be Increased? 97
5.7.2 Culture and Intelligence 99
5.7.3 Genetic versus Environment 99
5.7.4 Use of IQ Score 100
Summary 101
Key Terms 101
References 102
Topic 6 Ability, Aptitude and Achievement Test 103

6.1 Definition of Ability, Aptitude and Achievement Tests 104
6.2 Structures of Aptitude and Achievement Tests 105
6.2.1 Characteristics of Aptitude and Achievement Tests 105
6.2.2 Methods of Tests Administration 106
6.2.3 Speed Tests versus Power Tests 106
6.2.4 The Contents 107
6.2.5 The Test Scores 109

6.3 Guidelines for Test Takers 110

6.3.1 Ask the Right Questions 110
6.3.2 Work Systematically 110
6.3.3 Confirm If in Doubt 111
6.3.4 Do Not Make Assumptions 112
6.3.5 Decide on a Practice Strategy 112
6.4 Group Tests 113
6.4.1 Advantages of Group Tests 113
6.4.2 Disadvantages of Group Tests 114
6.5 Multiple Aptitude Test Batteries 117
6.6 General Aptitude Test Battery (GATB) 118
6.7 Differential Aptitude Tests (DAT) 120
6.8 Kaufman Assessment Battery for Children-II 123
6.9 Other Tests in Education and Special Education 124
6.10 Application of Aptitude and Achievement Tests: Issues 128
6.10.1 Education 128
6.10.2 Civil Services 130
6.11 Aptitude Testing 131
6.11.1 Career Aptitude Tests versus Attainment Tests 131
6.11.2 Aptitude Tests versus Intelligence Quotient (IQ) Tests 132
6.11.3 Encounter with a Career Aptitude Test 132
6.11.4 What Characteristics Do Aptitude Tests Analyse? 132
Summary 133
Key Terms 134
References 134
Topic 7 Attitudes, Values and Interests Tests 136

7.1 The Concepts of Attitudes, Values and Interest 137
7.1.1 Attitudes 137
7.1.2 Values 140
7.1.3 Interest 141
7.2 The Strong Interest Inventory (SII) 142
7.2.1 Brief History of SII 143
7.2.2 The Features of SII 144
7.2.3 The Application of SII 147
7.3 Kuder Occupational Interest Survey (KOIS) 148
7.3.1 Some Brief Psychometric Features 149
7.3.2 The Kuder Test Survey 150
7.3.3 Kuder Journey 151

7.4 Career Assessment Inventory (CAI) 154

7.4.1 Key Features of CAI 155
7.4.2 The Usage of CAI 155
7.5 Jackson Vocational Interest Survey (JVIS) 156
7.5.1 Applications of JVIS 157
7.5.2 Description of JVIS 158
7.5.3 Basic Interest Scales of JVIS 158
7.5.4 The Scoring Methods 159
7.6 Psychology Tests and Measurement in Industries and Businesses 161
7.6.1 The Roles of Test and Assessment in Organisations 161
7.6.2 Aspects of Tests and Measurements 162
7.6.3 Assessment Centres 162
7.6.4 Biographical Data 164
7.6.5 Cognitive Ability Tests 164
7.6.6 Integrity Tests 164
7.6.7 Interviews 165
7.6.8 Job Knowledge Tests 165
7.6.9 Personality Tests 165
7.6.10 Physical Ability Tests 166
7.6.11 Work Samples and Simulations 166
Summary 167
Key Terms 169
References 169
Topic 8 Personality Test 171

8.1 Personality: The Concepts 171
8.2 Objective versus Projective 172
8.3 Development of Personality Testing 173
8.4 Objective Measures of Personality 174
8.4.1 California Psychological Inventory (CPI) 174
8.4.2 Personality Research Form (PRF) 175
8.4.3 Sixteen Personality Factor Questionnaire (16PF) 177
8.4.4 The Revised NEO Personality Inventory (NEO-PI-R) 178

8.5 Projective Personality Tests 179

8.5.1 Rorschach Inkblot Test 180
8.5.2 Thematic Apperception Test (TAT) 182
8.5.3 Draw-a-Person Test (DAP) 183
Summary 184
Key Terms 185
References 185
Topic 9 Psychology Test and Measurement in Counselling, Health and Clinical Psychology 188
9.1 Application in Counselling Settings 189
9.1.1 Counselling Related Tests 189
9.1.2 Testing Process 190
9.2 Application in Health Psychology and Healthcare 193
9.2.1 Lifestyle and Disease 193
9.2.2 Tests and Measurement 195
9.3 Neuropsychology Test and Measurement 197
9.3.1 Neuropsychology 197
9.3.2 Neuropsychological Tests and Measurement 198
9.4 Applications in Clinical Psychology 201
9.4.1 Psychopathology 201
9.4.2 Psychopathology as the Study of Mental Illness 202
9.4.3 Psychopathology as a Descriptive Term 202
9.5 The Minnesota Multiphasic Personality Inventory (MMPI) 203
9.5.1 Overviews, History and Development 203
9.5.2 Current Scale Composition 206
9.5.3 Scoring and Interpretation 210
9.6 The Millon Clinical Multiaxial Inventory (MCMI) 211
9.6.1 Psychometrics of MCMI-III 212
9.7 Diagnostic and Statistical Manual of Mental Disorders 217
9.7.1 History of DSM 218
9.7.2 Developments of DSM 219
Summary 224
Key Terms 225
References 226

Topic 10 Issues and Challenges of Testing 227

10.1 Overview on Psychological Testing Application 228
10.1.1 Uses of Psychological Tests 228
10.1.2 Information on Psychological Tests 229
10.2 Testing and Society 230
10.3 Societal Consequences of Tests 231
10.4 The Issues of Faking Tests 233
10.4.1 Some Techniques in Reducing Test Faking 233
10.4.2 This Personality Test Cannot Be Faked 236
10.5 Test Bias 238
10.5.1 Definition of Test Bias 238
10.5.2 Models of Test Bias 239
10.5.3 Test Bias in Industrial and Organisational Psychology 240
10.5.4 Test Fairness 240
10.6 Cultural Influence in Testing 241
10.6.1 Cultural Backgrounds 242
10.6.2 Language 243
10.6.3 Behaviour 243
10.6.4 Culture-free and Culture-fair Tests 244
10.7 Testing in a Cross-cultural Context 244
10.7.1 Developing a Cross-cultural Conceptual Model for Testing Organisational Commitment in UAE 244
10.7.2 Language Issues in Cross-cultural Usability Testing 245
10.8 Legal and Ethical Issues 247
10.8.1 Legal Issues of Testing in Educational Settings 247
10.8.2 Legal Issues of Testing in Entrepreneur Settings 250
10.8.3 Legal and Ethical Considerations 252
10.9 The Future of Testing 254
Summary 257
Key Terms 259
References 259


COURSE GUIDE


COURSE GUIDE DESCRIPTION

You must read this Course Guide carefully from beginning to end. It tells you briefly what the course is about and how you can work your way through the course material. It also suggests the amount of time you are likely to need to complete the course successfully. Keep referring to the Course Guide as you go through the course material, as it will help you to clarify important study components or points that you might otherwise miss or overlook.
INTRODUCTION
ABPC1203 Psychology Test and Measurement is one of the courses offered by the
Faculty of Applied Social Sciences at Open University Malaysia (OUM). This
course is worth three credit hours and should be covered over 8 to 15 weeks.
COURSE AUDIENCE
This course is offered to all students undertaking the Bachelor of Psychology with
Honours programme.
As an open and distance learner, you should be able to learn independently and optimise the learning modes and environment available to you. Before you begin this course, please ensure that you have the right course materials, understand the course requirements and know how the course is conducted.
STUDY SCHEDULE
It is standard OUM practice that learners accumulate 40 study hours for every
credit hour. As such, for a three-credit hour course, you are expected to spend
120 study hours. Table 1 gives an estimation of how the 120 study hours could be
accumulated.

Table 1: Estimation of Time Accumulation of Study Hours
Study Activities | Study Hours
Briefly go through the course content and participate in initial discussions | 3
Study the module | 60
Attend 3 to 5 tutorial sessions | 10
Online participation | 12
Revision | 15
Assignment(s), Test(s) and Examination(s) | 20
TOTAL STUDY HOURS | 120
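The arithmetic behind the study schedule can be checked in a few lines. The Python sketch below simply restates the figures above (with the activity labels shortened): it multiplies the 40-hours-per-credit norm by three credit hours and confirms that the Table 1 estimates add up to the same total.

```python
# Study-hour arithmetic from the Course Guide: 40 study hours per credit hour.
HOURS_PER_CREDIT = 40
CREDIT_HOURS = 3

# Activity estimates copied from Table 1 (labels shortened for readability).
activities = {
    "Course overview and initial discussions": 3,
    "Study the module": 60,
    "Tutorial sessions (3 to 5)": 10,
    "Online participation": 12,
    "Revision": 15,
    "Assignments, tests and examinations": 20,
}

expected_total = HOURS_PER_CREDIT * CREDIT_HOURS  # hours required by the norm
table_total = sum(activities.values())            # hours listed in Table 1

print(f"Expected: {expected_total} hours; Table 1 total: {table_total} hours")
```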
COURSE OUTCOMES
By the end of this course, you should be able to:
1. Discuss different categories of tests;
2. Identify several tests and their usefulness in each category;
3. Demonstrate the ability to determine if tests tend to provide reliable and valid scores;
4. Demonstrate an understanding of norms and basic statistics used in psychological testing;
5. Explain professional, legal and ethical issues in testing;
6. Describe the rationale for selecting tests to measure various characteristics of interest; and
7. Organise a test and interpret the results in a professional report.

COURSE SYNOPSIS
This course is divided into 10 topics. The synopsis for each topic is listed as
follows:
Topic 1 introduces psychological testing and assessment, including its historical, cultural and legal/ethical considerations and the applications and consequences of psychological tests.
Topic 2 describes the norms and basic statistics of testing, correlation and regression, and the reliability and validity of test items.
Topic 3 identifies the stages of test construction, the goals of a test, types of item
formats, steps in evaluating test items and item analysis in psychological tests.
Topic 4 describes interviewing techniques, types of interviews, important issues related to test administration, the various responsibilities of a test administrator, the ways to conduct a psychology test and measurement session effectively and the advantages and disadvantages of computerised testing.
Topic 5 discusses the concept of intelligence and its measurement, the different models and theories related to intelligence and major intelligence tests. This is followed by descriptions of the types of intelligence tests used by the US military and the critical issues related to intelligence tests.
Topic 6 describes individual tests of specific abilities, group tests, multiple aptitude test batteries, other tests of ability in education and special education and issues of aptitude testing. This topic also identifies the issues in standardised tests in the fields of education, civil service and the military.
Topic 7 explains the rationale for attitudes, values and interest testing and describes the Strong Interest Inventory, the Kuder Occupational Interest Survey, the Career Assessment Inventory and the Jackson Vocational Interest Survey (JVIS). It also discusses the various issues related to psychological testing in industrial and business settings.
Topic 8 examines the development of personality tests and objective measures of personality such as the California Psychological Inventory, the Personality Research Form, the Sixteen Personality Factor Questionnaire and the NEO-PI-R. It also explores the types of projective personality tests used to measure the personality of individuals.

Topic 9 describes psychopathology, the Minnesota Multiphasic Personality Inventory (MMPI), the Millon Clinical Multiaxial Inventory (MCMI) and the DSM-IV. This is followed by a discussion of the application of psychological tests in clinical, counselling, health psychology and healthcare settings.
Topic 10 discusses issues of faking, test bias, testing in a cross-cultural context, legal issues and the future of testing.
TEXT ARRANGEMENT GUIDE

Before you go through this module, it is important that you note the text
arrangement. Understanding the text arrangement will help you to organise your
study of this course in a more objective and effective way. Generally, the text
arrangement for each topic is as follows:
Learning Outcomes: This section refers to what you should achieve after you
have completely covered a topic. As you go through each topic, you should
frequently refer to these learning outcomes. By doing this, you can continuously
gauge your understanding of the topic.
Self-Check: This component of the module is inserted at strategic locations throughout the module. It may be inserted after one sub-section or a few sub-sections. It usually comes in the form of a question. When you come across this component, try to reflect on what you have already learnt thus far. By attempting to answer the question, you should be able to gauge how well you have understood the sub-section(s). Most of the time, the answers to the questions can be found directly in the module itself.
Activity: Like Self-Check, the Activity component is also placed at various locations or junctures throughout the module. This component may require you to solve questions, explore short case studies, or conduct an observation or research. It may even require you to evaluate a given scenario. When you come across an Activity, you should try to reflect on what you have gathered from the module and apply it to real situations. You should, at the same time, engage yourself in higher order thinking where you might be required to analyse, synthesise and evaluate instead of only having to recall and define.

Summary: You will find this component at the end of each topic. This component
helps you to recap the whole topic. By going through the summary, you should
be able to gauge your knowledge retention level. Should you find points in the
summary that you do not fully understand, it would be a good idea for you to
revisit the details in the module.
Key Terms: This component can be found at the end of each topic. You should go
through this component to remind yourself of important terms or jargon used
throughout the module. Should you find terms here that you are not able to
explain, you should look for the terms in the module.
References: The References section is where a list of relevant and useful textbooks, journals, articles, electronic contents or sources can be found. The list can appear in a few locations such as in the Course Guide (at the References section), at the end of every topic or at the back of the module. You are encouraged to read or refer to the suggested sources to obtain the additional information needed and to enhance your overall understanding of the course.
PRIOR KNOWLEDGE
No prior knowledge required.
ASSESSMENT METHOD
Please refer to myINSPIRE.
REFERENCES
Domino, G., & Domino, M. L. (2006). Psychological testing: An introduction (2nd ed.). New York, NY: Cambridge University Press.
Cohen, R. J., & Swerdlik, M. E. (2004). Psychological testing and assessment: An introduction to tests and measurement (6th ed.). Boston, MA: McGraw-Hill.
Kaplan, R. M., & Saccuzzo, D. P. (2004). Psychological testing: Principles, applications and issues (6th ed.). Belmont, CA: Wadsworth.
TAN SRI DR ABDULLAH SANUSI (TSDAS) DIGITAL LIBRARY
The TSDAS Digital Library has a wide range of print and online resources for
the use of its learners. This comprehensive digital library, which is accessible
through the OUM portal, provides access to more than 30 online databases
comprising e-journals, e-theses, e-books and more. Examples of databases
available are EBSCOhost, ProQuest, SpringerLink, Books24x7, InfoSci Books,
Emerald Management Plus and Ebrary Electronic Books. As an OUM learner, you
are encouraged to make full use of the resources available through this library.

Topic 1 Introduction to Testing and Assessment
LEARNING OUTCOMES
By the end of this topic, you should be able to:
1. Define test;
2. List the various definitions of psychological testing;
3. State the basic concepts and principles in psychological testing and
assessments;
4. Explain the historical development of psychological testing; and
5. Identify the advantages and limitations of psychological testing.
INTRODUCTION
By adulthood, most of us have already experienced many kinds of tests. The moment we were born, our length, weight and physical fitness were measured. Then, as we grew up, we underwent a series of tests, from assessments of our ability to crawl, stand and walk to routine health checks.
As school-going children, we were exposed to many more tests, which served to evaluate our cognitive development and academic performance. School and national examinations, for instance, are conducted to assess the knowledge and comprehension of students. Tests and examinations are even more common for students in higher educational institutions.

These examples show that we are no strangers to tests and testing. Most of the tests that we have undertaken were in academic settings, with the purpose of assessing how much knowledge we have acquired. Psychological testing works in much the same way. The difference is that psychological tests and measurements are used to assess the behaviour and psychological components of human beings, for example, intelligence, personality, self-esteem, motivation, quality of life, depression, stress and many other aspects.
In Topic 1, we will first learn the definitions, basic concepts and principles of
testing and assessment. We will then study the historical development of
psychological testing. At the end of the topic, we will evaluate the advantages
and limitations of psychological testing.
1.1 DEFINITIONS, BASIC PRINCIPLES AND TYPES OF TEST
In general, a "test" or an examination (or "exam" in short) is an assessment intended to measure a test taker's knowledge, skills, aptitude, physical fitness or classification in many other areas (e.g. beliefs). A test may be administered orally, on paper, on a computer or in a confined area that requires a test taker to physically perform a set of skills.
Tests vary in style, rigour and requirements. For example, in a closed book test, a
test taker is often required to rely on memory to respond to specific items, whereas
in an open book test, a test taker may use one or more supplementary tools such as
a reference book or calculator when responding to an item.
From a psychological perspective, a test is an instrument designed to measure selected psychological and mechanical attributes of an individual. Consider, for example, a psychological test applied in a business setting. Its objective is usually to enable the employer to predict what an individual will do in the future and whether he or she is the right fit for the organisation.
Some important prerequisites for an effective test are that it should be objective, reliable and valid. It should be clear about the property it aims to measure and have clear instructions for administration, scoring and interpretation of test results.

It is also an advantage if a test is economical in terms of the time and money needed to administer, score and interpret it. Above all, a good test is one that measures what it sets out to measure (Cohen et al., 2010).
The reliability of a test refers to the accuracy, dependability, consistency or repeatability of its results, whereas the validity of a test refers to the meaning and usefulness of those results (Kaplan et al., 2009).
We will explore the concepts of reliability and validity, which are vital in psychological testing and measurement, in Topic 2.
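One common way of expressing the consistency or repeatability described above is test-retest reliability: the same people take the same test twice, and the two sets of scores are correlated, with values close to +1 indicating highly consistent results. The Python sketch below computes a Pearson correlation coefficient for this purpose. The five test takers and their scores are invented purely for illustration; Topic 2 treats reliability properly.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance term and the two sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical scores of the same five people on two administrations of one test.
first_administration = [12, 15, 20, 22, 30]
second_administration = [13, 14, 21, 23, 29]

r = pearson_r(first_administration, second_administration)
print(f"Test-retest correlation: r = {r:.2f}")
```

This sketch only illustrates the test-retest idea; other reliability estimates, such as internal consistency, are computed differently.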
1.1.1 Definitions of Psychological Tests

Several definitions of psychological tests as proposed by psychometricians are
shown in Table 1.1 below.
Table 1.1: Definitions of Psychological Tests and the Sources
Definition | Source
A psychological test is a systematic and objective procedure for measuring a sample of behaviour. | Anastasi (1988)
A test is a standardised procedure for sampling behaviour and describing it with categories or scores. | Gregory (2007)
A test is a measurement device or technique used to quantify behaviour or aid in the understanding and prediction of behaviour. | Kaplan and Saccuzzo (2005)
Therefore, we can conclude that a psychological test is a tool (device or technique) used to quantify behaviour. It can also be referred to as a set of items (questions or problems) that are arranged in a way that gives an estimation of the intended behaviour to be measured.
There is a term that appears in the first paragraph of this section: psychometrician. Learners who study psychological tests and measurements should be familiar with this term. Psychometrics is closely related to psychological tests and measurements and may be defined as the science of psychological measurement. Variants of this word include the adjective "psychometric", which refers to measurement that is psychological in nature, and the nouns "psychometrist" and "psychometrician", both referring to psychological test users (Cohen et al., 2010).

1.1.2 Basic Principles

Based on the definitions given by psychometricians, most tests have these five
basic principles, as shown in Figure 1.1.
Figure 1.1: Five basic principles of psychological tests
Let us now examine these basic principles in greater detail.
(a) Standardised Procedure

A test is standardised when its administration, scoring and interpretation are carried out in the same way by all test users and researchers. Standardisation of administration means that psychologists administering the test must follow a standard method of administration, such as observing the set time limit for a test that measures speed and skills. This is why all psychologists must receive specific and adequate training before administering a test.
A standardised procedure also means that the test user must follow a specific procedure in scoring the test, which in turn leads to a standardised procedure for interpreting the test results. All the information on the procedures for using the test is usually included in the manual of each psychological test.

Think of a psychological test as being like a tool or electrical appliance that we buy from a department store. Every such appliance comes with a manual that guides users on how to use and operate it. The same goes for the manual of a psychological test. Anyone who wants to use a psychological test must first read and understand the manual before administering the test. In this way, we can avoid errors that may affect the accuracy of the test results.
(b) Sample of Behaviour

Each psychological test can measure only certain parts of a person's behaviour. In other words, only a sample of behaviour can be measured, since we cannot measure human behaviour in its entirety. For instance, no single psychological test can include all the items or questions needed to measure human emotions. What it can do is select and measure only a specific aspect of emotion, such as depression, which is measured by the Beck Depression Inventory (BDI). A good test therefore does not assess two different behaviours in one single test. This is why personality tests measure only personality, while intelligence tests measure only intelligence.
(c) Scores and Categories

Any good psychological test produces scores. The scores are used as information and evidence to place individuals into their respective categories. Without scores, the test is meaningless. Take the example of a mathematics test taken by a class of students. If the test does not produce a score for each student, we cannot tell how strong each child's ability in mathematics is. A low score implies that the child has low ability in mathematics, while a high score indicates that the child is good at mathematics.
The same logic applies to psychological tests. If a test that measures stress produces scores, then we can categorise individuals as having a low, moderate or high stress level. In the case of personality testing, the scores from a personality test can tell us what type of personality a person has.

(d) Norms and Standardisation

The score obtained from any test is meaningless on its own. It only makes sense when we compare it with the scores of other individuals. The scores and performance of each individual are described and interpreted by looking at the similarities and differences between that individual and others who have taken the same test. For instance, in educational settings, scores are used to rank students from the top performers to the weaker students.
However, the scores must be compared with those of people who take the same test. For this reason, selecting the sample of respondents on whom the test is standardised is crucial. This sample must represent the whole population so that, when it becomes the basis for comparison, it can be used reliably for everyone in that population. Imagine that you administer a motivation test to high performers in urban schools and use these results as the norm or basis for comparison. Now, imagine that a student of average performance from a rural school takes the test and obtains a score below the average. Is this comparison a valid one when the basis for comparison does not represent students in rural schools in the first place?
(e) Prediction of Behaviour

The objective of a psychological test is to help us predict human behaviour. Human behaviour can be measured using psychological tests, and prediction becomes possible when the norms developed give complete information on the behaviour measured. For example, we can identify a person's motivational level by using a motivational test.
Prediction of behaviour can be illustrated with the following example. Suppose a psychological test measures personality. From the personality type that the test identifies, we can predict the behaviours associated with that type. For instance, for an individual with an introverted personality type, we can predict behaviours that the test does not measure directly, such as preferring to do activities alone or not feeling comfortable at parties.
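Principles (a), (c) and (d) can be tied together in a short Python sketch: a fixed scoring rule is applied identically to every test taker (standardised scoring), the resulting raw score is compared with a norm group using a z-score and a percentile rank, and the result is placed in a category. Everything here is hypothetical: the five-item stress scale, its 0 to 3 response format, the reverse-keyed item, the norm-group scores and the low/moderate/high cut-offs at one standard deviation below and above the mean are invented for illustration and are not taken from any real test manual.

```python
from statistics import mean, stdev

# Hypothetical 5-item stress scale; each response is scored 0 to 3.
MAX_RESPONSE = 3
REVERSE_KEYED = {2}  # hypothetical: item index 2 is positively worded, so it is reversed


def raw_score(responses):
    """Apply the same scoring rule to every test taker (standardised scoring)."""
    if len(responses) != 5 or any(r not in range(MAX_RESPONSE + 1) for r in responses):
        raise ValueError("expected five responses, each from 0 to 3")
    return sum(MAX_RESPONSE - r if i in REVERSE_KEYED else r
               for i, r in enumerate(responses))


def z_score(score, norm_scores):
    """Distance of a score from the norm-group mean, in standard deviations."""
    return (score - mean(norm_scores)) / stdev(norm_scores)


def percentile_rank(score, norm_scores):
    """One common definition: % of the norm group below the score, ties counted half."""
    below = sum(s < score for s in norm_scores)
    equal = sum(s == score for s in norm_scores)
    return 100 * (below + 0.5 * equal) / len(norm_scores)


def category(z):
    """Hypothetical cut-offs: below -1 SD is low, above +1 SD is high."""
    if z < -1:
        return "low"
    if z > 1:
        return "high"
    return "moderate"


# Hypothetical raw scores from a representative norm (standardisation) sample.
norm_group = [4, 5, 6, 6, 7, 7, 7, 8, 8, 9, 10, 11]

score = raw_score([2, 2, 1, 2, 1])  # item 2 is reversed: 1 becomes 2
z = z_score(score, norm_group)
print(score, round(z, 2), round(percentile_rank(score, norm_group), 1), category(z))
```

Replacing `norm_group` with a sample that does not represent the test taker's population, as in the urban-versus-rural case in (d), would change the z-score, percentile rank and possibly the category for exactly the same raw score, which is why the representativeness of the norm sample matters.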

1.1.3 Types of Tests

As there are many types of behaviours and various psychological aspects that can be found in different individuals, there are many types of tests and many ways of classifying them. According to Kaplan and Saccuzzo (2009), there are two main ways to categorise psychological tests:
(a) Ways of administration; and
(b) Behavioural or psychological aspects measured.
These two ways to categorise tests are discussed further as follows:
(a) Ways of Administration (shown in Figure 1.2)

There are two ways of administering a test:
(i) Individual Tests

Tests that are given only to one person at a time are known as
individual tests. The examiner or test administrator (the person giving
the test) gives the test to only one person at a time, the same way that
psychotherapists see only one person at a time (Kaplan et al., 2009).
(ii) Group Tests

A group test is a test that is administered to more than one person at a
time by a single examiner.
Figure 1.2: Two major types of tests based on the methods of administration

(b) Behavioural or Psychological Aspects Measured (shown in Figure 1.3):

Under this method, tests can be further divided into two major types:
(i) Ability Tests

These tests contain items that can be scored in terms of speed, accuracy or both. The faster or more accurate a person's responses, the higher his or her score on the particular characteristics that the tests measure.
There are different types of abilities, namely achievement, aptitude and intelligence, which are measured by different types of tests.
• Achievement Tests
These tests assess abilities acquired as a result of previous learning. For example, a mathematics achievement test measures how many mathematical questions an individual can solve based on what he or she has learnt thus far.
Achievement tests are also known as proficiency tests. They can be used to judge the skills that a candidate has already acquired through education or experience. Such tests are often used when selecting job candidates. For example, a candidate for the post of stenographer may be given a test in typewriting and shorthand to see how accurately and how fast he or she can perform.
• Aptitude Tests
These tests assess an individual's potential for learning or acquiring a specific skill. Aptitude refers to the potential an individual has for learning the skills required to perform a task or job efficiently. For example, a mathematics aptitude test measures how many questions an individual might be able to solve given a certain amount of training, education and experience. These tests measure an individual's capacity and his or her potential for development. In industrial and business settings, aptitude tests are the most promising indices for predicting an employee's success.

• Intelligence Tests
Assess a person's general potential to solve problems, adapt to different
surroundings, think abstractly and make use of what he or she gains from
experience.
Intelligence tests differ from achievement and aptitude tests. Intelligence refers
to the general potential to achieve, and an intelligence test measures a person's
mental ability relative to his or her age. The general ability or potential
proposed by Thurstone (1938), for instance, includes dimensions such as
comprehension, vocabulary, numbers, spatial ability, memory, speed, perception
and reasoning.
There are many intelligence tests based on the models and theories of
intelligence proposed by different researchers. Topic 4 will focus on intelligence
tests as one of the major types of psychological tests and measurements,
especially for determining the mental and learning abilities of children.
Intelligence tests will be further discussed in Topic 5, while ability, aptitude and
achievement tests will be further introduced in Topic 6 of this module.
Figure 1.3: Two major types of tests and their subtypes

In contrast to ability tests, the second major type of psychological test is the
personality test.
(ii) Personality Tests

Personality tests are related to the overt and covert dispositions of an
individual. They reflect the tendency of an individual to show a specific
behaviour or response in a given situation. These tests measure characteristics
such as an individual's emotional maturity, sentimental balance, sociability and
objectivity. Some personality tests can even indicate whether a person has a
healthy or an unhealthy personality. Personality tests will be explained in
greater detail in Topic 7.
There are two major types of personality tests:
• Objective Personality Tests

Structured, or objective, personality tests present statements to which test
takers respond for themselves, usually in a self-report format.
• Projective Personality Tests

Projective personality tests are unstructured: the stimulus (test materials), the
required response, or both are ambiguous (Kaplan et al., 2009).
Apart from the categorisation on the types of tests discussed above, there are
many other types of tests which may not fully fit into the categorisation
presented so far. One of them is the interest test.
Interest tests are widely used by counsellors and by industrial and organisational
psychologists. This form of test identifies the pattern of interests, that is, the
areas in which individuals show special concern, fascination and involvement.
These tests can suggest what type of job an employee may find satisfying.
Interest tests are also used for vocational guidance, helping individuals to take
up occupations of their choice.
Other tests, such as neuropsychological tests, assess brain and nervous system
functions in relation to behaviour. There are also testing and screening tools that
determine levels of anxiety and stress and that assess quality of life and coping
strategies, which are essential in health psychology. All these types of tests will
be discussed further in their respective topics in this module.

ACTIVITY 1.1
1. State some of the tests that you have taken until now.
2. Based on a test that you are familiar with, analyse whether or not
the test fulfils the five basic principles of tests as discussed earlier.
SELF-CHECK 1.1
1. Define a test.
2. Compare various definitions of psychological tests and measurements.
What are their similarities and differences?
3. State two examples of personality tests.
1.2 HISTORY OF PSYCHOLOGICAL TESTING

In general, the development of psychological testing can be divided into three
eras based on historical context as shown in Figure 1.4.
Figure 1.4: The three eras from a historical context of the development of psychological
testing
Let us explore each era in greater detail in the following subtopics.

1.2.1 Chinese Civilisation

Evidence has shown that tests were first used systematically in China. Written
tests were introduced during the Han Dynasty (206 B.C.E. to 220 C.E.). The use
of test batteries was also quite common then. A test battery is a set of two or
more tests used together to obtain a holistic view of the individual being tested.
During the Han Dynasty, five important aspects were tested in order to select
individuals who were suitable to work in public office. Figure 1.5 shows these
five important aspects.
Figure 1.5: Important aspects of tests during the Han Dynasty
Tests had become quite well developed by the Ming Dynasty. There were
national multistage testing programmes conducted, involving local and regional
testing centres. Those who did well in tests at the local level went on to
provincial capitals for more extensive essay examinations. After this second
testing, those with the highest test scores proceeded to the national level for the
final round. Only those who passed this third set of tests were eligible for public
office. Thus, the first evidence of the systematic use of testing is found in
Chinese civilisation, and Western civilisation is believed to have modelled its
system for testing government officials on the Chinese one.

1.2.2 Western Civilisation

The Western world most likely learned about testing programmes from the
Chinese. Many psychologists have stated that testing in Western civilisation
began with Charles Darwin and his theory of individual differences, which he
published in the Origin of Species (1859; in Kaplan & Saccuzzo, 2005). His theory
proposed the concept of the survival of the fittest.
This theory was later extended by Sir Francis Galton (1869; in Kaplan &
Saccuzzo, 2005) in his book Hereditary Genius. Galton stated that only the fittest
human beings survive and pass on their genes to the next generation. He pursued
this idea further by studying individual differences in sensory-motor functions.
His interest in genetics led him to measure individual differences, for which he
introduced sensory, perceptual and motor tests. Evidence of these first tests by
Galton was recorded when visitors to the International Exposition in 1884 paid to
undergo Galton's simple measures of vision, hearing, physical strength and
reaction time.
The study of individual differences in adaptability was later taken up by J.
McKeen Cattell in the late 1890s. Cattell also contributed to the development of
rating scales, questionnaires and statistical methods. He assessed individual
differences in the "intellectual" levels of college students, with the primary
emphasis on sensory-motor tests such as reaction time, visual acuity and
memory. Cattell believed that complex mental abilities could be measured simply
by summing the scores on tests of basic human faculties. Cattell coined the term
"mental test" and went on to describe 10 basic tests:
(a) Dynamometer strength – the strength of a hand squeeze;
(b) Movement speed – the rate at which a hand moves through a distance of 50 cm;
(c) Two-point discrimination threshold – discriminating between two points of sensation on the skin;
(d) Pressure-pain threshold – force applied to a rubber-tipped needle, which is applied to the skin to determine pressure-pain thresholds;
(e) Just noticeable difference threshold for weights – discriminating between boxes of different weights;
(f) Reaction time – reaction time to an auditory stimulus;
(g) Colour naming – time taken to name a series of coloured patches;
(h) Size estimation – ability to place a sliding line as close as possible to the middle of a piece of wood which is 50 cm in length;
(i) Time judgement – ability to estimate the passage of 10 seconds; and
(j) Memory for letters – number of letters recalled after hearing a random list.
1.2.3 Contemporary History of Psychological Testing

Experimental psychology became popular in Europe and Britain in the late 1800s.
Although Wilhelm Wundt is considered to be the first psychologist to use a
laboratory, in Leipzig, Germany, in 1879, his emphasis was still on
visual-perceptual abilities.
Later, tests emerged from the important need to categorise and identify mental
and emotional retardation. Esquirol and Seguin defined the concept of mental
retardation and differentiated it from other mental illnesses. The earliest test
constructed was the Seguin Form Board Test, which was used to evaluate people
who were mentally disabled. Kraepelin (1912, in Kaplan & Saccuzzo, 2005), on the
other hand, devised a series of examinations for evaluating emotionally impaired
people.
An important development in psychological testing occurred in France when the
Ministry of Public Instruction of the French government formed a commission to
develop a test for identifying mentally retarded school children who needed
special instruction. This commission was headed by Alfred Binet, who, together
with Theodore Simon, developed the first intelligence test, called the Binet-Simon
Scale, in 1905. The scale was revised in 1911 with some minor changes, and it
could then be extended to adults. In 1916, Lewis Terman and his colleagues at
Stanford University improved the Binet-Simon test and renamed it the
Stanford-Binet Intelligence Test.
Another important milestone in the history of psychological testing was the
coining of the term intelligence quotient (IQ) by William Stern. IQ is a numerical
value given to intelligence, determined from the score on an intelligence test. It
is calculated by dividing a person's mental age (MA), which is obtained from the
test score, by his or her chronological age (CA) and then multiplying the result by
100. The following is the formula:
IQ = (MA / CA) × 100
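The ratio IQ formula can be checked with a few lines of Python. This is only an illustrative sketch: the function name `ratio_iq` and the example ages are hypothetical, and both ages are assumed to be expressed in the same unit (both in years or both in months).

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ = (mental age / chronological age) x 100.

    Both ages must use the same unit (e.g. both in years or both in months).
    """
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    # Multiply before dividing so whole-number ages give exact results
    return mental_age * 100 / chronological_age


# Hypothetical child: 10 years old, performing like a typical 12-year-old
print(ratio_iq(12, 10))  # 120.0
# A child whose mental age equals his or her chronological age scores exactly 100
print(ratio_iq(8, 8))    # 100.0
```

An IQ above 100 means the child performs like an older child; an IQ below 100 means the child performs like a younger one.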

During World War I, Robert M. Yerkes, Henry H. Goddard and Lewis Terman
developed tests for screening American army personnel, known as the Army
Alpha and Army Beta examinations. In the decades that followed, further tests
were developed, such as the Wechsler scales, the Scholastic Aptitude Test (SAT)
and the Graduate Record Examination (GRE). The College Entrance Examination
Board (CEEB) provided examinations to screen students for entrance into
educational institutions.
The first structured personality tests and trait tests emerged as a result of the
success of intelligence tests. The first personality test developed was the
Woodworth Personal Data Sheet. Projective personality tests such as the
Rorschach Inkblot Test and the Thematic Apperception Test (TAT) followed. The
Minnesota Multiphasic Personality Inventory (MMPI) was developed in the late
1930s and was rapidly adopted and improved. The success of the MMPI
encouraged the further development of personality tests such as the Sixteen
Personality Factor Questionnaire (16PF) by Raymond B. Cattell.
The Second World War accelerated the growth of tests in clinical and army settings.
However, between the 1950s and the 1970s, psychological testing went through a
relative decline and attracted a wide range of criticism because tests were abused
and misused.
From the 1980s to 2000, many new tests emerged in applied psychology, the most
important being in neuropsychology, health psychology, forensic psychology, child
psychology, space psychology and other areas. These new applied areas require
intensive and extensive testing and assessment, so the demand for tests in them
continues to rise. Improvements in test content and techniques, together with the
use of computers, have had positive impacts. Janda (1998) estimated that 3,009
psychological tests were commercially produced in 1994. Between 1992 and 1995,
418 new tests were produced. Today, 20,000 tests are produced each year in the
US alone. Most of these are used as research tools and are not standardised.
SELF-CHECK 1.2
1. List the five aspects that were used in testing during the Han
Dynasty.
2. Discuss the theory proposed by Francis Galton.

1.3 ADVANTAGES AND LIMITATIONS OF TESTING
Testing is useful and has its own advantages, but it also has some limitations.
These are discussed in the following subtopics.
1.3.1 Advantages of Testing

Table 1.2 shows five advantages of testing.
Table 1.2: Advantages of Testing
No. Advantages of Testing
1. A test is an objective and standardised sample of behaviour, which lends itself well
to statistical evaluation. Tests also tend to be less subject to bias, particularly tests
of aptitude and achievement.
2. Tests can help to uncover talent that may otherwise be overlooked and to
distinguish between the abilities required for a person's present job and those
required for a new one. In addition, a great deal of information about a person
can be collected in a relatively short time by using tests.
3. Tests reduce the cost of selection and placement because a large number of
applicants can be evaluated in the shortest possible time. If an employer expects
to remain competitive, the costs of hiring plus the costs of training must be kept
to a minimum. Psychological tests can reduce hiring costs by measuring
applicants' aptitude and predicting their success.
4. Tests provide a sound basis for comparing applicants' backgrounds. They compel
the supervisor and the interviewer to think through their evaluations more
carefully. Not only do tests compensate for weaknesses in the interviewer and
supervisor, they also increase the quality of the organisation's employees over
time.
5. Tests can be used for differential placement because testing focuses on the
qualifications required for a specific job. If an applicant fails the test or does very
well in it, his or her suitability for a job other than the one applied for can be
explored.

1.3.2 Limitations of Testing

There are various limitations to testing, which are shown in Table 1.3.
Table 1.3: Limitations of Testing
No. Limitations of Testing
1. Tests are often criticised for measuring only part of the total information needed
to make an accurate selection. This criticism is especially justified if tests are the
only selection method used.
2. Tests are rarely used as the only selection method. Our objective should be to
maximise accuracy in selection by choosing a proper combination of methods.
3. Tests are sometimes criticised on the grounds that they cannot predict an
applicant's chances of success, as he or she may be nervous at the time of testing.
It is true that tests are far from perfect, but other methods such as application
letters, interviews and reference checks are also of limited value.
4. No test can fully measure the complex combination of characteristics required for
many positions. However, it should be remembered that tests have been devised
in the past to measure far more complex qualities and faculties of individuals.
SELF-CHECK 1.3
1. Discuss the basic principles of a standard psychological test.
2. Do psychological tests have more advantages than limitations?
Discuss critically.

Summary
• In general, a "test" or an examination (or "exam" in short) is an assessment
intended to measure a test taker's knowledge, skills, aptitude, physical fitness
or classification in many other areas.
• A psychological test is defined as a systematic and objective procedure for
measuring a sample of behaviour.
• Tests can be divided into individual tests and group tests based on the mode
of administration.
• Based on the psychological aspects that tests measure, there are two major
types of tests: ability tests and personality tests.
• Standard tests have these five basic principles:
– Standardised procedure;
– Sample of behaviour;
– Scores and categories;
– Norms and standardisation; and
– Prediction of behaviour.
• Evidence shows that tests were first used systematically in China. Later, the
Western civilisation learned about testing programmes through the Chinese
and further developed tests.
• The early development of tests was closely related to the selection of
government officials. Later, psychological tests were developed, with an
initial focus on assessing individual differences in various physical abilities.
• In the early contemporary history of psychological testing, the major focus
was on intelligence tests for educational and army screening purposes. After
that, personality tests and a variety of other tests were developed, and these
are still used today.
• The main advantage of psychological testing is that it helps to measure,
quantify, assess and interpret in order to understand various psychological
aspects in human beings.

• The major limitation that needs attention when applying psychological tests
is that there is no single test that can measure in full the complexities of
human psychological dispositions.
Key Terms
Ability tests
Chronological age
Competence
Group tests
Individual differences
Individual tests
Intelligence quotient
Mental age
Norms and standardisation
Personality tests
Psychological test
Psychometrics
Sample of behaviour
Standardised procedure
References
Anastasi, A. (1988). Psychological testing. New York: Macmillan.
Cohen, R. J., & Swerdlik, M. E. (2010). Psychological testing and assessment: An
introduction to tests and measurement. Boston, MA: McGraw-Hill Higher
Education.
Gregory, R. J. (2007). Psychological testing: History, principles and applications
(5th ed.). Boston, MA: Allyn and Bacon.
Janda, L. H. (1998). Psychological testing: Theory and applications. Boston, MA:
Allyn and Bacon.
Kaplan, R. M., & Saccuzzo, D. P. (2009). Psychological testing: Principles,
applications, and issues. Belmont, CA: Wadsworth Cengage Learning.
Thurstone, L. L. (1938). Primary mental abilities. Chicago, IL: University of
Chicago Press.

Topic 2 The Science of Psychological Measurement
LEARNING OUTCOMES
1. Discuss the importance of basic statistics for tests and
measurement;
2. Identify the scales of measurement in tests;
3. Describe test data using basic statistical knowledge;
4. Explain the fundamentals of correlation and regression; and
5. Differentiate between the reliability and validity of tests.
INTRODUCTION
In the previous topic, we were introduced to psychological testing. We also learnt
about its history and development, as well as its advantages and limitations.
In this topic, we will go in-depth into the science of psychological testing and
measurement. We will first study the basic concepts, norms and statistics for
testing. Then, we will learn about correlation, regression, and the reliability and
validity of tests.

TOPIC 2 THE SCIENCE OF PSYCHOLOGICAL MEASUREMENT  21
2.1 BASIC STATISTICS FOR TESTING

Statistics are important for psychological testing and measurement. The
following subtopics explain statistics in terms of their importance, scales of
measurement, types of scales, ways to describe data, and norms.
2.1.1 The Importance of Statistics for Psychology Testing and Measurement
It will be very useful for you to have sound knowledge of psychological statistics
when studying psychological tests and measurement. This is because test scores
are frequently expressed as numbers and statistical tools are used to interpret
these numbers.
Statistical methods serve two important purposes in psychological testing:
(a) Description: numbers obtained from psychological tests and measurements
provide convenient summaries and allow us to evaluate certain observations
relative to others (Cohen & Lea, 2004; Pagano, 2004; Thompson, 2006); and
(b) Inference: we can use statistics to make inferences, which are logical
deductions about events that cannot be observed directly (Kaplan & Saccuzzo,
2009).
ACTIVITY 2.1
Can you think about examples that show the importance of statistics
for psychological testing and measurement?
2.1.2 Scales of Measurement

According to Cohen and Swerdlik (2010), measurement is the act of assigning
numbers or symbols to characteristics of things (for example, people or events)
according to certain rules.
The rules used in assigning numbers are guidelines for representing the
magnitude (or some other characteristic) of the object being measured. For
example, consider the ruler that you use to measure length. On a 12-inch ruler,
the rule is that the number 12 is assigned to every object that is exactly 12 inches
long.

A scale is a set of numbers, or other symbols, whose properties model empirical
properties of the objects to which the numbers are assigned. There are two major
ways of categorising scales.
A scale used to measure a continuous variable is usually referred to as a
continuous scale. For example, a child's height could in principle be measured in
ever finer units, even down to the micrometre, although in practice we usually
measure it to the nearest millimetre at most.
On the other hand, a scale used to measure a discrete variable is usually referred
to as a discrete scale. For example, research subjects may be categorised as
either female or male. In general, it is not meaningful to categorise a subject as
anything other than female or male.
2.1.3 Types of Scales

There are four types of scales that we need to understand in psychological testing
and measurement:
(a) Nominal Scales

Nominal scales are the simplest form of measurement. These scales involve
classification or categorisation based on one or more distinguishing
characteristics, where everything measured must be placed into mutually
exclusive and exhaustive categories (Cohen & Swerdlik, 2010).
For example, questions in psychological tests that require only a "yes" or "no"
answer use nominal scaling, such as:
(i) Do you like to read magazines related to cars and machinery?; and
(ii) For the past two months, have you experienced insomnia more than
three times?
(b) Ordinal Scale

Ordinal scales permit classification. In addition, rank ordering on some
characteristics is also permissible with ordinal scales (Cohen & Swerdlik,
2010). For example, in school settings, students are rank-ordered based on
their performance in the final exam so that the class placement can be made
based on their academic abilities.

(c) Interval Scale

Interval scales contain equal intervals between numbers. Each unit on the
scale is exactly equal to any other unit on the scale (Cohen & Swerdlik,
2010). However, like ordinal scales, interval scales have no absolute zero
point. An absolute zero point is a point at which none of the property being
measured exists.
A common example is the measurement of temperature. In our country,
temperature is usually measured in degrees Celsius. Although 0 represents the
freezing point of water on the Celsius scale, it is not an absolute zero: heat is
still present at 0 °C, and there is plenty of room on the thermometer below 0.
In other words, something still "exists" even though the temperature is "0".
(d) Ratio Scale

A ratio scale, however, has a true zero point. All mathematical operations
can meaningfully be performed because there are equal intervals between
the numbers on the scale as well as a true or absolute zero point (Cohen &
Swerdlik, 2010).
For instance, travelling at 0 kilometres per hour means that there is no speed
at all. If you are driving at 40 kilometres per hour and within 1 minute
increase your speed to 80 kilometres per hour, you have doubled your speed.
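The practical difference between interval and ratio scales can be shown numerically. The sketch below is illustrative only: the temperatures are hypothetical, the helper name `celsius_to_kelvin` is our own, and it relies on the standard conversion K = °C + 273.15. The Kelvin scale starts at absolute zero, so it behaves as a ratio scale.

```python
def celsius_to_kelvin(celsius: float) -> float:
    # Kelvin starts at absolute zero, so ratios of Kelvin values are meaningful
    return celsius + 273.15


warm_c, cool_c = 20.0, 10.0  # hypothetical temperatures on the interval (Celsius) scale

# Differences ARE meaningful on an interval scale: 10 degrees apart
difference = warm_c - cool_c

# Ratios are NOT meaningful on an interval scale: 20 °C is not "twice as hot" as 10 °C
misleading_ratio = warm_c / cool_c
true_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

# Speed is a ratio scale (0 km/h means no speed), so 80 km/h really is double 40 km/h
speed_ratio = 80 / 40

print(difference, misleading_ratio, round(true_ratio, 3), speed_ratio)
# 10.0 2.0 1.035 2.0
```

On the Celsius scale the ratio looks like 2, but measured from a true zero the warmer temperature is only about 3.5% higher, which is why statements such as "twice as much" require a ratio scale.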
ACTIVITY 2.2
Think about a psychological property that you want to measure by
using nominal, ordinal, interval and ratio scales so that you can
differentiate their characteristics. Discuss your ideas with your tutor
and coursemates.
2.1.4 Describing Data

Once you have obtained data from psychological tests and measurements, it is
essential to describe the data so that they are meaningful and understandable in
relation to the subjects you have tested. Here are some basic concepts that you
will need to know in order to describe and understand data.

(a) Frequency Distributions

Frequency distributions display scores on a variable or a measure to reflect
how frequently each value was obtained. With a frequency distribution, we
are able to define all possible scores and determine how many people
obtained each of those scores in a test (Kaplan & Saccuzzo, 2009).
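A frequency distribution can be built by counting how often each score occurs. The sketch below assumes whole-number scores; the function name `frequency_table` and the scores are hypothetical. Passing `low` and `high` lists every possible score in that range, including scores that nobody obtained.

```python
from collections import Counter


def frequency_table(scores, low=None, high=None):
    """Return (score, frequency) pairs for every whole-number score from low to high."""
    counts = Counter(scores)
    low = min(scores) if low is None else low
    high = max(scores) if high is None else high
    # A Counter returns 0 for scores that never occurred
    return [(score, counts[score]) for score in range(low, high + 1)]


scores = [3, 5, 4, 5, 2, 5, 4, 1]  # hypothetical scores on a 0-5 quiz
for score, frequency in frequency_table(scores, low=0, high=5):
    print(score, frequency)
```

Here the table shows that no one scored 0, one person each scored 1, 2 and 3, two scored 4 and three scored 5; the frequencies add up to the eight people tested.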
(b) Percentile Ranks

Percentile ranks replace simple ranks when we want to adjust for the number
of scores in a group. A percentile rank answers the question "What percent
of the scores fall below a particular score?" (Kaplan & Saccuzzo, 2009). For
example, if a student's intelligence score has a percentile rank of 2, only 2
percent of the students who took the same test scored below that student,
while the remaining 98 percent scored the same or higher.
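Following the definition above (the percentage of scores that fall below a given score), a percentile rank can be computed directly. This is a sketch with a hypothetical function name and hypothetical data; some textbooks also count half of any tied scores, which gives slightly different values.

```python
def percentile_rank(scores, score):
    """Percentage of scores in the group that fall strictly below `score`."""
    if not scores:
        raise ValueError("scores must not be empty")
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)


# Hypothetical test scores for a group of 10 students
group = [55, 60, 62, 70, 75, 80, 85, 90, 95, 98]
print(percentile_rank(group, 80))  # 50.0 -> half the group scored below 80
print(percentile_rank(group, 55))  # 0.0  -> nobody scored below the lowest score
```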
(c) Mean
The arithmetic average score in a distribution is called the mean. To
calculate the mean, we total the scores and divide the sum by the number of
cases (Kaplan & Saccuzzo, 2009).
(d) Median
Median is the middle score in a distribution and is a commonly used
measure of central tendency.
(e) Mode
Mode is the most frequently occurring score in a distribution of scores.
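The three measures of central tendency above can be computed by hand or with Python's standard `statistics` module. The scores below are hypothetical; the hand-computed mean simply follows the definition of totalling the scores and dividing by the number of cases.

```python
import statistics

scores = [7, 9, 9, 10, 12, 13, 9, 11]  # hypothetical test scores

# Mean: total the scores and divide the sum by the number of cases
mean_by_hand = sum(scores) / len(scores)

mean = statistics.mean(scores)
median = statistics.median(scores)  # middle value; average of the two middle values when n is even
mode = statistics.mode(scores)      # most frequently occurring score

print(mean_by_hand, mean, median, mode)
```

For these scores the mean is 10, the median is 9.5 (after sorting, the two middle scores are 9 and 10) and the mode is 9, which occurs three times.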
(f) Standard Deviation

Standard deviation is an approximation of the average deviation around
the mean (Kaplan & Saccuzzo, 2009). It is a measure of variability equal to
the square root of the average squared deviations around the mean. There
is a related concept to standard deviation: Variance. Variance is equal to the
arithmetic mean of the squares of the differences between the scores in a
distribution and their mean (Cohen & Swerdlik, 2010).
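The definitions above translate directly into code. In this sketch (hypothetical data and function names), the variance is the mean of the squared deviations from the mean, and the standard deviation is its square root. It divides by the number of scores N, as in the definition; many statistics packages divide by N - 1 instead when estimating a population value from a sample (Python's `statistics.variance` does this, whereas `statistics.pvariance` divides by N).

```python
import math
import statistics


def variance(scores):
    """Mean of the squared deviations from the mean (divides by N)."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores) / len(scores)


def standard_deviation(scores):
    """Square root of the variance."""
    return math.sqrt(variance(scores))


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores; mean = 5
print(variance(scores), standard_deviation(scores))  # 4.0 2.0

# The standard library's population versions give the same values
print(statistics.pvariance(scores), statistics.pstdev(scores))
```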
(g) Z Score
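One of the problems with means and standard deviations is that they do
not convey enough information for us to make meaningful assessments or
accurate interpretations of data. Therefore, the Z score is often used to
transform data into standardised units that are easier to interpret (Kaplan &
Saccuzzo, 2009). A Z score is obtained by subtracting the mean from a raw
score and dividing the difference by the standard deviation, so it shows how
many standard deviations the score lies above or below the mean.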
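The sketch below shows Z scores in practice with hypothetical numbers; the `z_score` name is our own. A positive Z score lies above the mean and a negative one below it, and converting a whole set of scores this way yields values whose mean is 0 and whose standard deviation is 1.

```python
import statistics


def z_score(x, mean, sd):
    """Z = (X - mean) / SD: distance from the mean in standard-deviation units."""
    return (x - mean) / sd


# Hypothetical test with a mean of 50 and a standard deviation of 10
print(z_score(65, 50, 10))  # 1.5  -> 1.5 SDs above the mean
print(z_score(40, 50, 10))  # -1.0 -> 1 SD below the mean

# Standardising a whole set of hypothetical scores
scores = [2, 4, 4, 4, 5, 5, 7, 9]
m, sd = statistics.mean(scores), statistics.pstdev(scores)
z_scores = [z_score(x, m, sd) for x in scores]
print(z_scores)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```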

(h) Standard Normal Distribution

Standard normal distribution is central to statistics and psychological
testing. When the data from tests are normally distributed, a normal curve
can be obtained which is a bell-shaped, smooth, mathematically defined
curve that is highest in its centre. From the centre it tapers on both sides
approaching the X-axis asymptotically (meaning that it approaches, but
never touches the axis) (Cohen & Swerdlik, 2010).
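Because the normal curve is mathematically defined, the proportion of scores expected below any Z score can be computed from its cumulative distribution function. The sketch below writes that function with the error function `math.erf` from Python's standard library (the name `phi` is our own) and cross-checks it against `statistics.NormalDist`, available in Python 3.8 and later. It only applies if the test scores really are normally distributed.

```python
import math
from statistics import NormalDist


def phi(z: float) -> float:
    """Proportion of a standard normal distribution lying below z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


print(round(phi(0.0), 4))               # 0.5    -> half the scores lie below the mean
print(round(phi(1.0) - phi(-1.0), 4))   # 0.6827 -> about 68% lie within 1 SD of the mean
print(round(phi(1.5), 4))               # 0.9332 -> a Z score of 1.5 is near the 93rd percentile

# The standard library's normal distribution gives the same result
print(round(NormalDist(mu=0, sigma=1).cdf(1.5), 4))  # 0.9332
```

This is how a Z score can be converted into an approximate percentile rank when the normality assumption holds.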
ACTIVITY 2.3
Do additional reading on the eight basic concepts introduced in section
2.1.4 for describing data, and discuss them with your face-to-face tutor and
e-tutor to further enhance your understanding. Find the relevant formulas
for the concepts where applicable.
2.1.5 Norms
Norms refer to the performances by defined groups on particular tests (Kaplan &
Saccuzzo, 2009).
For example, suppose a psychometrician develops a measure of stress level for
administrative employees working in universities. After establishing some
psychometric properties of the test, the psychometrician administers it to
normative groups of people working in various administrative positions in
universities.
Suppose the average score of the normative groups is 20. One of the employees,
Azhar, who works in OUM, takes the same test and obtains a score of 34. The
psychologist may then conclude that the stress Azhar experiences is above
average compared with the norms of the test.
The concept of norms is important in psychological testing and measurement
because, without norms, the score from a particular psychological test cannot be
compared with anything and so cannot provide a meaningful interpretation of the
performance of the people tested.

ACTIVITY 2.4
Below are the scores obtained from an intelligence test given to a group of
22 Year 4 primary school pupils.
89, 101, 87, 88, 70, 89, 64, 121, 90, 90, 65, 113, 100, 88, 60, 64, 79, 64, 113,
108, 99, 90
(a) Construct a table to show the frequency distribution of the scores
obtained in the test;
(b) Calculate the mean of the scores of this group of children;
(c) Determine the median score of this group; and
(d) Determine the mode of the scores in this group.
2.2 CORRELATION AND REGRESSION

Correlation and linear regression are the most frequently used techniques for
investigating the relationship between two quantitative variables. We will
discuss these in detail in the following subtopics.
2.2.1 Correlation Analysis

Statistical thinking enhances our understanding of how life works, allows some
control over social issues and helps employers make informed decisions. We
often need to analyse data on two or more quantitative variables to find out
whether there is a statistical relationship or association between them and to
describe its numerical features. Knowing such a relationship allows us to make
inferences about one variable from another in a given situation. A few instances
where knowledge of an association or relationship between two variables would
prove vital to decision-making are:
(a) Family income and expenditure on luxury items;
(b) Yield of a crop and quantity of the fertiliser used;
(c) Weight and height of an individual;
(d) Age and sign legibility distance;

(e) Frequency of smoking and lung damage;
(f) Sales revenue and expenses incurred on advertising; and
(g) Age and hours of TV viewing per day.
A statistical analysis used to indicate the strength and direction of the relationship
between two quantitative variables is called correlation analysis. Table 2.1 gives
two definitions of correlation analysis.
Table 2.1: Definitions of Correlation Analysis
Definitions of Correlation Analysis | Source
An analysis of the relationship between two or more variables is usually called correlation. | A. M. Tuttle (1957)
When the relationship is quantitative in nature, the appropriate statistical tool for discovering and measuring the relationship and expressing it in a brief formula is usually called correlation. | Croxton and Cowden (1939)
A careful correlation analysis can lead to a greater understanding of your data.
The coefficient of correlation is a number that indicates the strength (magnitude)
and direction of the statistical relationship between two variables, as shown in
Figure 2.1. It ranges from -1 (a perfect negative relationship) through 0 (no
linear relationship) to +1 (a perfect positive relationship).
Figure 2.1: Interpretation of correlation coefficient
The strength of the relationship is determined by how close the points lie to a
straight line when pairs of values of the two variables are plotted on a graph.
The straight line is used as a frame of reference to evaluate the relationship.
The direction is determined by whether one variable generally increases or
decreases when the other variable increases.
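The section does not name a specific coefficient; the sketch below assumes Pearson's product-moment correlation coefficient r, the most common choice for two quantitative variables. The function name `pearson_r` and the stress and sleep-quality scores are hypothetical. (Python 3.10 and later also provide `statistics.correlation` for the same calculation.)

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient between two lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2 pairs)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Cross-products of deviations: positive when x and y move in the same direction
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a variable with no variation has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: stress scores and sleep-quality ratings for five people
stress = [10, 12, 15, 18, 20]
sleep_quality = [8, 7, 6, 5, 3]
r = pearson_r(stress, sleep_quality)
print(round(r, 2))  # -0.98
```

The negative sign gives the direction (in these made-up data, higher stress goes with poorer sleep), and the value close to -1 shows that the points lie close to a straight line. Pearson's r captures only linear relationships, which matches the straight-line frame of reference described above.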

2.2.2 Significance of Measuring Correlation

The objective of any scientific or clinical research is to establish relationships
between two or more sets of observations and variables to arrive at conclusions
which are close to reality. Finding such relationships is often an initial step in
identifying causal relationships. A few advantages of measuring an association
(correlation) between two or more variables are:
(a) Correlation analysis helps to locate critically important variables on which others depend. In psychology, for example, it may reveal the connections through which disturbances in a particular behaviour spread, and suggest the paths through which stabilising forces may become effective;
(b) Correlation reduces the range of uncertainty in our predictions. When two variables are strongly correlated, predictions of one from the other are more reliable and closer to reality;
(c) In psychology, we come across several types of variables that are able to explain different kinds of relationships. For example, there is a relationship among stress levels, sleep quality and how frequently a person falls sick. Correlation analysis helps to quantify precisely the degree and direction of association in such relationships; and
(d) Correlations are useful in determining the validity and reliability of clinical
measures and in expressing how health problems are related to certain
biological or environmental factors.
SELF-CHECK 2.1
How would you justify the use of correlation analysis in psychological tests?

2.2.3 Types of Correlation

There are three broad types of correlations as shown in Figure 2.2.
Figure 2.2: Three broad types of correlations
The three types are discussed in greater detail in the following.
(a) Positive and Negative Correlations
A positive (direct) correlation exists when the values of two variables change in the same direction. In other words, if the values of both variables increase together or decrease together, the correlation is referred to as a positive correlation.
A negative (inverse) correlation exists when the values of two variables change in opposite directions: as one increases, the other decreases.
Example:
Positive Correlations
Increasing (x) 5 8 10 15 17
Increasing (y) 10 12 16 18 22
Decreasing (x) 17 15 10 8 5
Decreasing (y) 20 18 16 12 10

Negative Correlations
Increasing (x) 5 8 10 15 17
Decreasing (y) 20 18 16 12 10
Decreasing (x) 17 15 12 10 6
Increasing (y) 2 7 9 13 14
(b) Linear and Non-linear Correlations
A linear correlation implies that a change in the value of one variable is accompanied by a proportional change in the corresponding value of the other variable. In other words, when the changes in the values of two variables bear a constant ratio to each other, the correlation is said to be linear.
Example:
x 10 20 30 40 50
y 40 60 80 100 120
When the values of x and y are plotted on graph paper, the line joining these points will be a straight line.
A non-linear (curvilinear) correlation implies that the change in one variable is not proportional to the change in the other variable. In other words, the amount of change in one variable does not bear a constant ratio to the amount of change in the corresponding values of the other variable.
Example:
x 8 9 9 10 10 28 29 30
y 80 130 170 150 230 560 460 600
When the values of x and y are plotted on graph paper, the line joining these points will NOT be a straight line; it will be a curve.

(c) Simple, Partial and Multiple Correlations
If only two variables are chosen for a study of the correlation between them, the correlation is referred to as simple correlation. An example is a study of smoking behaviour in relation to self-esteem levels among adolescents.
In partial correlation, two variables are chosen for a study of the correlation between them, but the effect of other influencing variables is held constant. For example, attraction among people is influenced by physical proximity as well as by other factors such as appearance, cultural background, values and thoughts. A partial correlation between proximity and attraction is studied with the influence of these other factors held constant.
In multiple correlation, the relationship among three or more variables is studied simultaneously. For example, the employer-employee relationship in an organisation may be examined with reference to training and development facilities; medical, housing and children's education facilities; the salary structure; the grievance handling system and so on.
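The linear example in (b) and the idea of partial correlation in (c) can both be checked numerically. The Python sketch below redefines Pearson's r so that it stands alone, and applies the standard first-order partial correlation formula r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)), which removes the linear influence of one control variable z. This formula is not given in the module, and the proximity, attraction and similarity scores are invented purely for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson's r for two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_r(x, y, z):
    """First-order partial correlation of x and y, holding z constant."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Linear example from (b): y = 2x + 20, so the changes keep a constant ratio
r_linear = pearson_r([10, 20, 30, 40, 50], [40, 60, 80, 100, 120])
print(r_linear)  # 1.0

# Hypothetical scores (illustration only)
proximity  = [1, 2, 3, 4, 5, 6]
attraction = [2, 3, 5, 4, 6, 7]
similarity = [1, 2, 4, 3, 6, 5]   # "other factor" to be held constant
r_simple = pearson_r(proximity, attraction)
r_partial = partial_r(proximity, attraction, similarity)
print(f"simple r = {r_simple:.3f}, partial r = {r_partial:.3f}")
```

With these made-up numbers the partial correlation is noticeably smaller than the simple correlation, because part of the proximity-attraction association is shared with similarity of values.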
2.2.4 Regression Analysis

Correlation analysis covers the concept of statistical relationships between two variables, such as sexual satisfaction and personality types; prosocial behaviour and gender; and stress and immunity in people living with HIV. The relationships between such variables indicate the degree and direction of their association but fail to answer the following questions:
Is there any functional (algebraic) relationship between two variables?
If yes, can it be used to estimate the most likely value of one variable, given the value of the other variable?
The statistical technique that expresses the relationship between two or more variables in the form of an equation, in order to estimate the value of one variable from the given value of another, is called regression analysis. The variable whose value is estimated from the equation is called the dependent (or response) variable, and the variable whose value is used to make the estimate is called the independent (regressor or predictor) variable. The linear algebraic equation used for expressing a dependent variable in terms of an independent variable is called a linear regression equation.
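The text does not say how the regression equation is obtained. The usual method is ordinary least squares, which chooses the intercept a and slope b of y_hat = a + b*x that minimise the sum of squared prediction errors. The Python sketch below assumes that method; the function names `fit_line` and `predict` are illustrative. Applied to the linear example from Section 2.2.3(b), it recovers y = 20 + 2x.

```python
def fit_line(x, y):
    """Ordinary least-squares estimates (a, b) for y_hat = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx              # slope: change in y per one-unit change in x
    a = mean_y - b * mean_x    # intercept: the line passes through (mean_x, mean_y)
    return a, b

def predict(a, b, x_new):
    """Estimate the dependent variable for a given value of the independent variable."""
    return a + b * x_new

x = [10, 20, 30, 40, 50]       # independent (predictor) variable
y = [40, 60, 80, 100, 120]     # dependent (response) variable
a, b = fit_line(x, y)
print(a, b)                    # 20.0 2.0
print(predict(a, b, 35))       # 90.0
```

Note that the roles are not interchangeable: regressing x on y would generally give a different line, which is one way regression differs from correlation.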

2.2.5 Advantages of Regression Analysis

The following are some important advantages of regression analysis:
(a) Regression analysis helps in developing a regression equation by which the value of a dependent variable can be estimated, given a value of an independent variable;
(b) Regression analysis helps to determine the standard error of estimate, which measures the variability or spread of values of the dependent variable with respect to the regression line. The smaller the variance and the standard error of estimate, the closer the pairs of values (x, y) fall around the regression line and the better the line fits the data, which means that a good estimate can be made of the value of variable y. When all the points fall on the line, the standard error of estimate equals zero; and
(c) When the sample size is large, the standard error of estimate can be used to construct an interval estimate for the predicted value of the dependent variable, and such interval estimates are generally considered acceptable.
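Point (b) can be illustrated with a short calculation. A common definition of the standard error of estimate for simple linear regression is s = sqrt(sum((y - y_hat)^2) / (n - 2)), where y_hat is the value predicted by the least-squares line and n - 2 reflects the two estimated parameters. The module gives no formula, so this definition is an assumption. The Python sketch below returns s, which is zero when every point lies on the line; the data in the second call are hypothetical.

```python
import math

def standard_error_of_estimate(x, y):
    """Standard error of estimate for a least-squares line y_hat = a + b*x."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least three pairs of values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    a = mean_y - b * mean_x
    # Residuals: vertical distances of the observed points from the fitted line
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (n - 2))

# All points lie on y = 20 + 2x, so the standard error of estimate is zero
se_line = standard_error_of_estimate([10, 20, 30, 40, 50], [40, 60, 80, 100, 120])
# Hypothetical scattered data: points spread around the line, so s > 0
se_scatter = standard_error_of_estimate([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(se_line, round(se_scatter, 3))
```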
SELF-CHECK 2.2
Recall what you have learnt about basic statistics for psychology and check your understanding of the topic by answering the questions below.
1. In the example "frequency of smoking and lung damage" above, do you think you can make a definite conclusion on which is the dependent variable and which is the independent variable?
2. Do you think correlation analysis is able to determine causal relationships between two or more variables?

2.2.6 Differences between Correlation and Regression Analysis
The basic differences between correlation and regression are summarised in
Table 2.2.
Table 2.2: Basic Differences between Correlation and Regression
No. | Correlation | Regression
1. | Measuring the strength (degree) of the relationship between two variables is referred to as correlation analysis. | Developing an algebraic equation between two variables from sample data and predicting the value of one variable, given the value of another variable, is referred to as regression analysis.
2. | It determines an association between two variables x and y, but not that they have a cause-and-effect relationship. | It expresses a directional relationship between x and y: a change in the value of the independent variable x is associated with a corresponding change in the value of the dependent variable y, if all the other factors that affect y remain unchanged. Whether this reflects cause and effect depends on the research design, not on the regression itself.
3. | In correlation analysis, the two variables are treated symmetrically; neither is designated as dependent or independent. | In linear regression analysis, one variable is considered dependent and the other independent.
ACTIVITY 2.5
Do you consider regression analysis to be a relevant tool for industries in testing suitable candidates for job placements?
2.3 RELIABILITY AND VALIDITY

In general, reliability implies "dependability" or "consistency". Validity, on the other hand, is the degree to which a test measures what it claims to measure. A test must be valid for its results to be accurately applied and interpreted. We will discuss the reliability and validity of tests in the sections that follow.

2.3.1 Reliability of Tests

The reliability of a test is the consistency with which it yields the same scores for the same subjects across a series of measurements taken at different times. If a test is to be of any value, a person taking it should receive the same score, and his or her relative standing in the group should show little change, whether the test is taken on, say, April 25 or June 14 of the same year.
Reliability means that the same trait, measured by the same test at different times under similar conditions, should give the same results; that is, the consistency and uniformity of the test should be maintained.
Example: A test is carried out at one point in time to measure a particular trait, say introversion. The test is said to be reliable if, when the same test is carried out at some other time under similar conditions, it still yields the same results.
The main characteristics of an objective or reliable test are as follows:
(a) There will be no difference in the marks that a candidate receives if different examiners score the papers;
(b) There will not be much difference in the marks obtained by the candidates if they are re-tested with the same or a similar test; and
(c) The purpose of the test is clearly defined, so that another person working independently would arrive at the same conclusions about the candidates.
A considerable number of factors can cause tests to have low reliability. If a test is
not administered under standardised conditions, the reliability will tend to be
low. Thus, in a shorthand test for stenographers, if the material is not dictated
with the same degree of clarity and at the same speed every time, the test cannot
be expected to be reliable.
In addition, people vary from time to time in their emotional state, degree of attention, attitude, health, fatigue and so on. If a test is short and has few questions, chance factors may determine whether an individual does or does not know a particular fact. Lucky guessing in the selection of answers may also introduce variance into the scores.

2.3.2 Validity of Tests

A test prepared for a particular job is valid only when it is used to assess candidates for that job and not for some other job. More generally, validity is the extent to which a test measures what it claims to measure. It is vital for a test to be valid in order for the results to be accurately applied and interpreted.
Example: A test that has been designed for the job of a clerk in an organisation is said to be invalid if it is used to assess an individual for a managerial position. This is because a test designed for a specific job will not give correct results for other jobs.
The validity of a test is the degree to which it measures what it is intended to measure. In other words, it should show the extent to which a test does the job for which it is used. In terms of employment, a valid test is one that accurately predicts the criterion of job success. A criterion is a measurement of how satisfactory an employee is in a particular job or in relation to his or her total employment.
The procedure to determine the validity of a test is to compare the test with
performance on the job. A valid test that measures a specific ability must
differentiate between the more able and the less able. If it is unable to do this, it is
invalid, as it does not measure the ability in question. For example, a valid test in
a particular industry must be able to differentiate between poor and good
workers in that industry.
Validity is always specific, which implies that a good testing instrument is valid for a specific purpose only. For instance, a test may be valid for selecting a salesperson but invalid for selecting a scientist. It takes time to determine the validity of a test. The applicant must be tested, hired and put to work. After a period of time, his or her performance on the job should be measured and compared with the test scores to determine whether the applicants who had high scores on the test are the ones who have done better on the job.
The validity of a test is expressed in terms of a coefficient of correlation, in which the test score is correlated with a performance criterion. For instance, the validity of an intelligence test can be determined by correlating the test scores with the students' marks in examinations.
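Because a validity coefficient is a correlation between test scores and a criterion measure, it can be computed in the same way as any other correlation. The Python sketch below assumes Pearson's r (the text does not name the coefficient) and uses invented intelligence-test scores and examination marks for six students; the function name `validity_coefficient` is illustrative.

```python
import math

def validity_coefficient(test_scores, criterion):
    """Pearson correlation between test scores and a criterion measure."""
    n = len(test_scores)
    if n != len(criterion) or n < 2:
        raise ValueError("need paired test and criterion scores (n >= 2)")
    mt, mc = sum(test_scores) / n, sum(criterion) / n
    # Sums of cross-products and of squared deviations from each mean
    stc = sum((t - mt) * (c - mc) for t, c in zip(test_scores, criterion))
    stt = sum((t - mt) ** 2 for t in test_scores)
    scc = sum((c - mc) ** 2 for c in criterion)
    return stc / math.sqrt(stt * scc)

# Hypothetical data: intelligence-test scores and examination marks of six students
iq_scores = [95, 110, 102, 125, 88, 118]
exam_marks = [58, 71, 60, 80, 52, 69]
r_validity = validity_coefficient(iq_scores, exam_marks)
print(f"validity coefficient: {r_validity:.2f}")
```

A coefficient near zero would suggest that the test does not differentiate between students who do well and those who do poorly on the criterion, and so is not valid for that purpose.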

Broadly speaking, there are three types of validity as shown in Figure 2.3.
Figure 2.3: Three types of validity
Let us now discuss the types in greater detail.
(a) Logical Validity
A test consists of many items that measure capacities such as intelligence and aptitude. When these items are examined to find out whether they are relevant to the concept the test is meant to measure, the test's logical validity is determined. Logical validity therefore rests on a logical analysis of each item included in the test.
(b) Empirical Validity
In empirical validity, the relationship between the subject's performance in one situation and his or her performance in another situation is examined. For instance, if a test of a student's success in university examinations is related to his or her intellectual standing in other situations, it is said to have empirical validity. In empirical validity, therefore, the subject's performance on certain tests should correlate with certain criteria. For instance, intelligence test scores and scholastic achievement are correlated.
(c) Factorial Validity
Factorial validity is a form of construct validity that is established through factor analysis, a statistical technique that summarises the correlations among many items or subtests in terms of a smaller number of underlying factors. A test with factorial validity is a pure measure of certain capacities, and its validity is determined by its correlation with a factor identified by factor analysis. For instance, in an intelligence test, intelligence is the factor "g" that underlies all the items and subtests that appear to require intelligence.

SELF-CHECK 2.3
1. Give two examples of partial correlation.
2. Explain the types of regression by giving suitable examples.
3. State the difference between reliability and validity of tests.
4. How do we determine whether a test is reliable and valid?
Summary
• A test is an objective and standardised sample of behaviour, which lends itself well to statistical evaluation.
• Statistics is essential in psychological testing and measurement to describe the data obtained and to make inferences from the data.
• Nominal, ordinal, interval and ratio scales are the four major scales of measurement in psychological testing and measurement.
• Norms are the test performance data of a particular group of test takers that are used as a reference to evaluate or interpret individual test scores.
• Correlation analysis is a statistical analysis that is used to study the strength and direction of the relationship between two quantitative variables.
• Regression analysis is used to express the relationships between two or more variables in the form of an equation to estimate the value of a variable, based on the given value of another variable.
• For a test to be useful, it has to be reliable (uniformity and consistency are maintained) and valid (for example, a test designed for a particular job should be used for that job only).
• The reliability of a test is the consistency with which it yields the same scores for the same subjects across a series of measurements at different times.
• The validity of a test is the degree to which it measures what it is intended to measure. In other words, it should show the extent to which a test does the job for which it is used.

38  TOPIC 2 THE SCIENCE OF PSYCHOLOGICAL MEASUREMENT
Key Terms
Correlation
Interval scales
Nominal scales
Norms
Ordinal scales
Percentile ranks
Ratio scales
Regression
Reliability
Standard deviation
Standard normal distribution
Validity
References
Cohen, B. H., & Lea, R. B. (2004). Essentials of statistics for the social and behavioral sciences. Hoboken, NJ: Wiley.
Cohen, R. J., & Swerdlik, M. E. (2010). Psychological testing and assessment: An introduction to tests and measurement. Boston, MA: McGraw-Hill Higher Education.
Croxton, F. E., & Cowden, D. J. (1939). Applied general statistics. New York: Prentice-Hall.
Kaplan, R. M., & Saccuzzo, D. P. (2009). Psychological testing: Principles, applications, and issues. Belmont, CA: Wadsworth Cengage Learning.
Pagano, R. R. (2004). Understanding statistics in the behavioral sciences. Australia: Wadsworth/Thomson Learning.
Thompson, B. (2006). Foundations of behavioral statistics: An insight-based approach. New York: Guilford Press.
Tuttle, A. M. (1957). Elementary business and economic statistics. New York: McGraw-Hill.

Topic 3 Test Construction
LEARNING OUTCOMES
1. List the stages in test construction;
2. Differentiate the various types of item formats;
3. Describe test items effectively;
4. Discuss the steps taken in evaluating the test items; and
5. Explain various concepts related to item analysis.
INTRODUCTION
Valid tests do not just materialise out of thin air; they emerge gradually from an evolutionary, developmental process that builds in validity from the very beginning. Test construction is a developmental process that runs from the initial planning of a test to the stage where the test is judged to be of good quality and valid for use. Creating a new test involves both science and art (Gregory, 2007).

40  TOPIC 3 TEST CONSTRUCTION
3.1 TEST CONSTRUCTION

This subtopic will introduce and explain the different stages of test construction.
The process of developing educational and psychological tests commonly begins with a statement of the purpose(s) of the test and the construct or content domain to be measured (AERA/APA/NCME, 1999).
Gregory (2007) suggests that test construction comprises six intertwined stages, as shown in Table 3.1.
Table 3.1: Six Intertwined Stages in Test Construction
No. | Stage | Description
1. | Defining the test | Involves delimiting its scope and purpose, which must be known before the developer can proceed to test construction.
2. | Selecting a scaling method | A process of setting the rules by which numbers are assigned to test results.
3. | Constructing the items | Creativity of the test developer is required at this stage.
4. | Testing the items | Once a preliminary version of the test is available, the developer usually administers it to a modest-sized sample of subjects in order to collect initial data about test item characteristics. Testing the items involves a variety of statistical procedures referred to collectively as item analysis. The purpose of item analysis is to determine which items should be retained, revised or thrown out.
5. | Revising the test | Based on item analysis and other sources of information, the test is then revised. If the revisions are substantial, new items and additional pre-testing with new subjects may be required.
6. | Publishing the test | In addition to releasing the test materials, the test developer must produce a user-friendly manual.
Of the six intertwined stages in test construction listed above, the first four, which are the most essential, are discussed in detail in the following sections.

3.2 DEFINING THE TEST: WHAT TO MEASURE?
In order to construct a new test, the developer must have a clear idea of what the
test is to measure and how it is to differ from existing instruments. Test
development begins with a clear statement of purpose for the test. This statement
includes delineation of the trait(s) to be measured and the target audience for the
test. The statement should be formulated keeping in mind the type of
interpretation ultimately intended for the test score(s).
From a practical point of view, after the purpose of the test has been clearly
stated, one should not proceed immediately to build the test. The next step
should be to determine whether an appropriate test already exists for the same
purpose.
Kaufman and Kaufman (1983) provide a good model of the test definition process. In proposing the Kaufman Assessment Battery for Children (K-ABC), a new test of general intelligence in children, the authors listed six primary goals that defined the purpose of the test and distinguished it from existing measures, as shown in Figure 3.1.
Figure 3.1: The six primary goals

Several preliminary design issues must be considered in constructing a new test. In the earliest stages of test development, the test developer must make a number of decisions about the design of the test. These decisions are based on the test's purpose and intended score interpretations, as well as on practical considerations. The design issues shown in Table 3.2 must be taken into consideration when designing tests.
Table 3.2: Preliminary Design Issues
Preliminary Design Issue | Description
Mode of administration | Will the test be individually administered or amenable to administration to a group?
Length | Approximately how long will the test be? A short test is obviously more efficient, but it may have very limited reliability and yield only one score.
Item format | What item format will be used: multiple choice, true-false, agree-disagree or constructed response?
Number of scores | How many scores will the test yield? More scores allow for additional interpretations, but more scores require more items and, therefore, more testing time.
Score reports | What kind of score reports will be produced? Will there be a simple, handwritten record of the score or an elaborate set of computer-generated reports, possibly including narrative reports?
Administrator training | How much training will be required for test administration and scoring?
Background research | This research should include a literature search and may also include discussions with practitioners in the relevant fields.
3.3 SELECTING A SCALING METHOD: TYPES OF ITEM FORMATS
There are several item formats available for measuring the construct of human behaviour (refer to Figure 3.2). The discussion in this part focuses on several simple formats that are among the most widely used in psychological tests and inventories. The choice of item format depends on the use and type of test. The item formats most used in tests of ability and performance are multiple choice items and two-choice answer formats. Psychological tests in the form of surveys and inventories, on the other hand, are more likely to use the Likert scale and dichotomous response format.
Figure 3.2: Five formats in measuring the construct of human behaviour
Let us now examine the formats in greater detail.
(a) Two-choice Answer Format
This format offers two alternatives for each item, as in true-false items. Its advantages are that it is simple, easy to administer, quick to score and requires absolute judgement. However, it has several disadvantages: it encourages memorisation, and the test must include many items in order to be reliable.
(b) Multiple Choice Format

Each item has more than two alternatives. A point is given for the selection
of one of the alternatives and no point is given for selecting any other
choice. The incorrect choices are called distractors. A review of the
problems associated with selecting distractors suggests that it is usually
best to develop three or four good distractors for each item (Anastasi &
Urbina, 1997). Properly constructed items can measure conceptual as well
as factual knowledge. Multiple choice tests also permit quick and objective
machine scoring.
The major shortcomings of multiple choice questions are, first, the difficulty of writing good distractor options and, second, the possibility that the presence of the correct response among the options may cue a half-knowledgeable respondent to the correct answer.
(c) Likert Format
This format is used as part of Likert's (1932) method of attitude scale construction. A Likert scale presents the examinee with five responses ordered on an agree/disagree or approve/disapprove continuum. In some applications, six options are used to prevent respondents from choosing a neutral response: strongly disagree, moderately disagree, mildly disagree, mildly agree, moderately agree and strongly agree. Scoring requires that any negatively worded items be reverse scored; the responses are then summed.
(d) Category Format
This format is similar to the Likert format but uses an even greater number of choices.
Example: A 10-point rating system such as "On a scale of 1 to 10, with 1 as the lowest and 10 as the highest, how would you rate your new boyfriend in terms of attractiveness?" Experiments have shown that responses to items on 10-point scales are affected by the groupings of the people or things being rated. A variety of studies have shown that people will change ratings depending on context (Norman, 2003). When given a group of objects to rate, subjects have a tendency to spread their responses evenly across the 10 categories (Stevens, 1966).
(e) Checklists
One example of a checklist is an adjective checklist (Gough, 1960) in which
a subject receives a long list of adjectives and indicates whether each one is
a characteristic of himself or herself. An adjective checklist can be used for
describing either oneself or someone else. It requires subjects either to
endorse such adjectives or not to endorse them, thus allowing only two
choices for each item.
Example:
Traits that characterise a group of 40 graduate students.
(i) High Originality

Adventurous, alert, curious, quiet, imaginative and fair-minded.
(ii) Low Originality

Confused, conventional, defensive, polished, prejudiced and suggestible.
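The Likert scoring rule described in (c), reverse-scoring negatively worded items and then summing, can be sketched in Python as follows. On a scale running from scale_min to scale_max, a reversed response is scale_min + scale_max - response, so on the six-point scale above a 1 becomes 6 and a 6 becomes 1. The function name, item identifiers, the choice of which items are negatively worded and the responses are all hypothetical.

```python
def score_likert(responses, reversed_items, scale_min=1, scale_max=6):
    """Total score for one respondent on a Likert-type scale.

    responses: dict mapping item id -> chosen option (an integer on the scale)
    reversed_items: set of item ids that are negatively worded
    """
    total = 0
    for item, value in responses.items():
        if not scale_min <= value <= scale_max:
            raise ValueError(f"{item}: response {value} is off the scale")
        if item in reversed_items:
            # Reverse scoring, e.g. 1 <-> 6 and 2 <-> 5 on a 1-6 scale
            value = scale_min + scale_max - value
        total += value
    return total

# Hypothetical four-item attitude scale; items q2 and q4 are negatively worded
answers = {"q1": 5, "q2": 2, "q3": 6, "q4": 1}
print(score_likert(answers, reversed_items={"q2", "q4"}))  # 5 + 5 + 6 + 6 = 22
```

Reversing before summing ensures that a high total always means the same thing, for example a more favourable attitude, regardless of how each item was worded.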
SELF-CHECK 3.1
1. List the main steps involved in test construction.
2. Discuss the differences between the multiple choice format and the Likert scale.
3. Name two examples of tests that use the Likert scale.

3.4 CONSTRUCTING THE ITEMS: WRITING TEST ITEMS
In theory, all items are randomly chosen from a universe of item content. In practice, however, care in selecting and developing items is valuable.
DeVellis (1991) provides several simple guidelines for item writing, which are:
(a) Defining clearly what you want to measure;
(b) Generating an item pool;
(c) Avoiding exceptionally long items;
(d) Ensuring that the level of reading difficulty is appropriate;
(e) Avoiding double-barrelled items; and
(f) Considering mixing positively and negatively worded items.
We are going to discuss item writing in greater detail in this section.
3.4.1 Writing Test Items

Writing test items is a matter of precision. The items that a test taker is required to answer will most probably affect him or her in some way, especially if the test is used as part of a decision-making process. For example, a number of job applicants may depend on the results of a personality test for their chances of securing a dream job. The test items are the questions given to the individual applicants in order to assess the different personality traits required for the job. Test items should therefore be prepared with the basic needs and objectives of the psychological test in mind. A test item must focus the attention of the examinee on the principle or construct upon which the item is based.
For item writers, however, the task is to focus the attention of a group of potential test takers for a particular test (often with widely varying background experiences) on a single idea. Such communication requires extreme care in the choice of words, and it may be necessary to try the items out before problems can be identified.

3.4.2 Essential Characteristics of Item Writers

The following are some of the important characteristics that item writers should
possess before writing a test item:
(a) Knowledge and Understanding of the Material Being Tested

One vital point to consider during item writing is that the item writer should have a thorough knowledge of the material the items are to test. Unless the item writer has expertise and skills regarding the material to be tested, he or she cannot write accurate test items or evaluate them effectively. As a result, the test would be a formality that will never be able to produce reliable and valid results.
(b) Continuous Awareness of Objectives

A test must reflect the purposes of the instruction it is intended to assess.
This quality of a test, referred to as content validity, is assured by specifying
the nature and/or number of items prior to selecting and writing the items.
Instructors sometimes develop a chart or test blueprint to help guide the
selection of items. Such a chart may consider the modules or blocks of
content as well as the nature of the skills a test is expected to assess.
Take the example of constructing a test used in an educational setting to measure students' academic performance. In the case of criterion-referenced instruction, content validity is obtained by selecting a sample of criteria to be assessed. For content-oriented instruction, a balance may be achieved by selecting items in proportion to the amount of instructional time allotted to various blocks of material. An example of a blueprint for a test with 38 items is shown in Table 3.3.
Table 3.3: Test Blueprint (An Example)
Skill assessed | Reliability | Validity | Correlation | Total
Knowledge of terms | 3 | 1 | 1 | 5
Comprehension of principles | 3 | 4 | 4 | 11
Application of principles | 2 | 4 | 6 | 12
Analysis of situations | 1 | 4 | 2 | 7
Evaluation of solutions | 1 | 1 | 1 | 3
Total | 10 | 14 | 14 | 38
Source: https://www.msu.edu/dept/soweb/writitem.html

The blueprint specifies the number of items to be constructed for each cell
of the two-way chart. For example, in the test blueprint shown in Table 3.3,
two items are to involve the application of principles of reliability.
(c) Understanding of the Target Test Takers for whom the Items are Intended
Item difficulty should not be so high that the items go over the heads of the examinees and leave them feeling mentally pressured. On the other hand, item difficulty should not be so low that the items pose no challenge and the examinees start taking the test lightly. For example, in an achievement test for students, the students must be able to relate the test items to what they have covered in their studies.
(d) Skill in Written Communication

The test items given to the students should be understandable to them. The language used in the test items should not be so complicated that the students find it difficult to understand. An item writer's goal is to be clear and concise. The level of reading difficulty must be appropriate, and the wording must not be more complicated than that used in the instruction.
(e) Prepare Keys or Model Answers in Advance of Test Evaluation

Once the test has been taken, another important step is to evaluate the test.
For that purpose, the writers must prepare keys or answer booklets in
advance to avoid any confusion in the answers that are given on the test.
This model answer booklet will be helpful in avoiding any kind of
imprecision during the evaluation of the tests. Preparing a key for objective-
type items or a model answer for essays or short answer items is an
excellent way of checking the quality of the items. If there are major flaws
in the items, they are likely to be discovered in the keying process.
(f) Avoid Jargon and Textbook Language

It is sometimes essential to use technical terms in an area of study.
However, jargon and textbook phrases should be avoided as much as
possible.
(g) Place All Items of a Given Type or Format Together in the Test
The questions in the test should be organised so that items of the same type or format are kept together. For example, it is better to group items in the Likert format together if they are measuring the same concept. This allows the examinees to respond at one time to all items requiring a common mindset, instead of shifting back and forth from one type of task to another. Furthermore, when items are grouped by type, each item is contiguous with its appropriate set of directions.
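The test blueprint described earlier in this list is essentially a two-way table of item counts, so it can be checked mechanically before any items are written. The sketch below is a minimal illustration in Python; the content areas, cognitive levels and cell counts are hypothetical and are not the rows and columns of Table 3.3. It computes the row totals, column totals and grand total so that the writer can confirm that the plan adds up to the intended test length.

```python
# Hypothetical blueprint: content areas (rows) x cognitive levels (columns).
# Each cell is the number of items to be written for that combination.
BLUEPRINT = {
    "Reliability":   {"Knowledge": 2, "Application": 2, "Analysis": 1},
    "Validity":      {"Knowledge": 3, "Application": 2, "Analysis": 2},
    "Item analysis": {"Knowledge": 1, "Application": 3, "Analysis": 2},
}


def blueprint_totals(blueprint):
    """Return (row totals, column totals, grand total) for a two-way blueprint."""
    row_totals = {area: sum(cells.values()) for area, cells in blueprint.items()}
    col_totals = {}
    for cells in blueprint.values():
        for level, count in cells.items():
            col_totals[level] = col_totals.get(level, 0) + count
    return row_totals, col_totals, sum(row_totals.values())


if __name__ == "__main__":
    rows, cols, total = blueprint_totals(BLUEPRINT)
    print(rows)   # items per content area
    print(cols)   # items per cognitive level
    print(total)  # grand total for this hypothetical plan
```

Keeping the blueprint as data rather than as a printed chart makes it easy to re-check the totals whenever a cell is changed during planning.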
ACTIVITY 3.1
Think about a psychological property that you want to measure. Based
on the essential characteristics in writing better test items, write ten
related test items for measuring the chosen property. Discuss what you
have written with your study mates, face-to-face tutor and e-tutor to
check how effective your suggested test items are.
3.5 TESTING THE ITEMS (1): ITEM EVALUATION
Once the test items are prepared, the tedious task of evaluating how the test performs begins. Evaluating the test items is a major responsibility of the examiner: it tests the examiner's credibility and impartiality as well as his or her skill in writing test items.
Again, take an example of an achievement test for students in a school. After the
test is administered, there will be a wealth of information available about how
students performed on each test item. The most convenient way to organise all
this information is in an Item Analysis (IA). An IA provides a breakdown of how
different types of students performed on various aspects of each item. IAs are
particularly useful for multiple choice tests, but could conceivably be used for
other item types as well.
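The text does not list the contents of an IA report, but one breakdown commonly found in such reports is a tally of which option each scoring group chose for a multiple choice item. The Python sketch below assumes that the upper and lower groups are the top and bottom 27% of examinees by total score (one of the conventions discussed in Section 3.6.2); the function name and data layout are illustrative only.

```python
from collections import Counter


def option_breakdown(responses, totals, fraction=0.27):
    """Tally the options chosen on one item by the upper and lower scoring groups.

    responses: option chosen by each examinee for this item (e.g. "A".."D").
    totals:    each examinee's total test score, in the same order.
    fraction:  share of examinees placed in each extreme group (assumed 27%).
    """
    n = len(totals)
    k = min(max(1, round(n * fraction)), n // 2)  # keep the two groups disjoint
    ranked = sorted(range(n), key=lambda i: totals[i], reverse=True)
    upper = Counter(responses[i] for i in ranked[:k])
    lower = Counter(responses[i] for i in ranked[n - k:])
    return upper, lower


if __name__ == "__main__":
    # Hypothetical data for ten examinees; the keyed answer is "B".
    totals = [38, 35, 33, 30, 28, 25, 22, 20, 18, 15]
    responses = ["B", "B", "A", "B", "C", "B", "D", "A", "C", "A"]
    upper, lower = option_breakdown(responses, totals)
    print("Upper group:", dict(upper))
    print("Lower group:", dict(lower))
```

In this made-up example, distractor A is chosen more often by the lower group than by the upper group, which is the pattern expected of a working distractor; a distractor that attracts more high scorers than low scorers suggests that the item, or its key, needs checking.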
At some institutions, instructors who bring their test data to a testing and evaluation service for scanning and scoring receive a detailed IA report along with their scored rosters. The process of evaluating tests is described in Figure 3.3 below.
Figure 3.3: Evaluating the test

Source: testing.wisc.edu/LL01-041.pdf

While evaluating the test, the following points should be considered:
(a) Determine the purpose of evaluation;
(b) Include multiple types of evaluation criteria;
(c) Establish from the beginning what will be used to evaluate examinees;
(d) Determine whether the evaluation is norm-referenced or criterion-referenced. Grading on a curve is usually not a good idea;
(e) Be consistent and fair;
(f) Go over the answers to the test questions with psychometricians or other
psychologists to get broader views on the items written; and
(g) Focus on the major points that were not understood by the examinees and
make appropriate amendments on the particular test items.
The reason for evaluating psychological tests and measurement is to ensure that
the tests constructed are precise and appropriate to measure what they are
supposed to measure, in a reliable and valid manner.
3.6 TESTING THE ITEMS (2): ITEM ANALYSIS

In the process of test construction, test developers usually write many more items than they intend to use. Hogan (2003) suggests preparing twice the number of items needed for the final test. Aiken (2000) recommends that, when generating a pool of items for objective tests, 20% more items than actually needed should be prepared initially, so that enough good items will be available for the final version of the test.
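The two rules of thumb translate into simple arithmetic. The Python sketch below returns the pool size under each rule for a planned final test length; rounding Aiken's figure up to a whole item is an assumption of this sketch, since a fraction of an item cannot be written.

```python
def pool_sizes(final_items):
    """Return (Hogan's twice-as-many pool, Aiken's 20%-extra pool) for a final test length."""
    if final_items < 1:
        raise ValueError("final_items must be a positive integer")
    hogan = 2 * final_items
    aiken = (6 * final_items + 4) // 5  # integer ceiling of 1.2 * final_items
    return hogan, aiken


print(pool_sizes(40))  # (80, 48)
```

For a 40-item final test, Hogan's rule calls for 80 draft items and Aiken's for 48; integer arithmetic is used for the 20% figure to avoid floating-point rounding surprises.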
Facione (2000) makes a similar point: statistical analyses of the responses of a sufficiently large and representative sample of test takers allow developers to eliminate items that fail to discriminate adequately among test takers, items whose responses are inversely correlated with the overall test scores and, in the interest of brevity, items that add little or nothing to the refinement of overall scores.

How are the final test items chosen from the original pool? Test developers usually employ item analysis, a set of statistical procedures for identifying the best items to be included in a test. Generally, the objective of item analysis is to determine which items should be retained, improved or eliminated. Many item analysis methods originated in ability and performance testing, especially with multiple choice questions, where there are right and wrong answers. However, item analysis procedures are also used for tests in other domains, such as personality and attitude tests.
Whether a test is a good test depends on its reliability and validity, and a good test must consist of good items; good items must therefore also be reliable and valid. In addition, good test items must be able to discriminate between test takers. A good test item is one that high scorers on the test as a whole get right; an item that high scorers do not get right is probably not a good item. Likewise, a good test item is one that low scorers on the test as a whole get wrong; an item that low scorers get right may not be a good item.
3.6.1 Item Difficulty Index

Item difficulty refers to the percentage of examinees who answer a given item correctly or, for items with no correct answer, who respond in a certain direction, for example, responding "agree" to an attitude item. If everyone answers item 1 (Figure 3.4) in an examination correctly, can we say that item 1 is a good item? What if no one answers item 1 correctly?
Figure 3.4: Example of item 1 in an examination
If all test takers answer item 1 correctly, the item is too easy and is not a good item, because it tells us nothing about differences between the test takers. In contrast, if everyone answers item 1 incorrectly, the item is too difficult. The item difficulty index is therefore a useful technique for identifying items that need to be improved or eliminated.

The item difficulty index is defined as the proportion of individuals who answer an item correctly. For example, if 84% of the individuals taking the test answer item 24 correctly, the item difficulty index is .84. Item difficulty is represented by p, where "p" indicates percentage or proportion. These proportions do not really indicate item "difficulty" but item "easiness": the higher the proportion of people who get the item correct, the easier the item (Allen & Yen, 1979).
In practice, a good item usually has an item difficulty in the range of .30 to .70. The statistic referred to as an item difficulty index in achievement testing may be called an item-endorsement index in other contexts, such as personality testing. There, the statistic measures not the percentage of people passing the item but the percentage of people who said yes to, agreed with or endorsed the item. In most tests, the items should have a variety of difficulty levels because a good test discriminates at many levels (Kaplan & Saccuzzo, 2005).
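The item difficulty index is a simple proportion. The Python sketch below assumes each response is coded 1 (correct, or endorsed for personality and attitude items) or 0, computes p, and labels it against the .30 to .70 range mentioned above; the labels and function names are illustrative, not a standard scheme.

```python
def difficulty_index(item_scores):
    """Proportion p of examinees scoring 1 on an item.

    Code 1 = correct (ability tests) or endorsed/"agree" (attitude and
    personality items, where p is an item-endorsement index); 0 otherwise.
    """
    if not item_scores:
        raise ValueError("item_scores must not be empty")
    return sum(item_scores) / len(item_scores)


def describe_difficulty(p, low=0.30, high=0.70):
    """Label p using the .30 to .70 range described in the text."""
    if p > high:
        return "easy (above range)"
    if p < low:
        return "difficult (below range)"
    return "within range"


if __name__ == "__main__":
    item_24 = [1] * 21 + [0] * 4      # hypothetical: 21 of 25 examinees correct
    p = difficulty_index(item_24)
    print(p, describe_difficulty(p))  # 0.84 easy (above range)
```

Because a higher p means more people answered correctly, the "easy" label goes with values above the range, matching the point that p really measures item easiness.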
ACTIVITY 3.2
Read the following statement and answer the questions below.
63% of test takers get the correct answer for an item in a cognitive
ability test.
(a) What is the item difficulty index for this item?
(b) Is this item easy or difficult? Give reason(s) for your answer.
(c) Is this item a good item based on its item difficulty index? Provide
your thoughts on this.
3.6.2 Item Discrimination Index

The evaluation of item discrimination determines whether individuals who obtain high scores on the whole test also perform well on a given item. This involves the item discrimination index, which refers to an item's ability to discriminate statistically among groups of test takers. The index analyses the relationship between performance on an item and performance on the whole test. The degree of discrimination is represented by the symbol D (difference or discrimination).

The Brennan discrimination index was introduced by Brennan (1972; in Iran Herman & Muhamed Awang, 1999) and is given the symbol B. According to Brennan, the B index evaluates how effectively a test item discriminates between high and low groups after they have taken the test. The formula for calculating the Brennan discrimination index is:
$$B = \frac{U}{n_1} - \frac{L}{n_2}$$
where,
B = Brennan discrimination index
U = the number of individuals in the upper group who answered the item correctly
L = the number of individuals in the lower group who answered the item correctly
$n_1$ = the number of individuals in the upper group
$n_2$ = the number of individuals in the lower group
According to Brennan (1972; in Iran Herman & Muhamed Awang, 1999), the division into upper and lower groups must be based on a meaningful comparison value that truly separates the two groups. Allen and Yen (1979) defined the upper and lower groups as the highest- and lowest-scoring 10% to 33% of the individuals in a group. Kelley (1939) suggested that the upper and lower groups should each contain 27% of the individuals. However, the same decisions can be obtained using 30% or 50% for the upper and lower groups (Beuchert & Mendoza, 1979; in Iran Herman & Muhamed Awang, 1999).
Several steps need to be followed when calculating the item discrimination index. First, start with the distribution of total scores on the test. Next, identify two groups of people: a group who received low scores on the test and a group who received high scores. Then, subtract the proportion of low scorers who answered the item correctly from the proportion of high scorers who answered the item correctly; the difference is B.
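The steps above can be sketched in Python as follows. The sketch assumes dichotomously scored items (1 = correct, 0 = incorrect), uses Kelley's 27% for each extreme group by default, rounds the group size to the nearest whole person and breaks ties in total score by the examinees' original order; all of these are choices of this sketch rather than part of Brennan's definition.

```python
def brennan_b(item_scores, total_scores, fraction=0.27):
    """Brennan discrimination index B = U/n1 - L/n2 for one item.

    item_scores:  1/0 score of each examinee on the item.
    total_scores: each examinee's total test score, in the same order.
    fraction:     share of examinees in each of the upper and lower groups.
    """
    n = len(total_scores)
    if n < 2 or len(item_scores) != n:
        raise ValueError("need at least two examinees with matching score lists")
    k = min(max(1, round(n * fraction)), n // 2)  # groups must not overlap
    ranked = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper, lower = ranked[:k], ranked[n - k:]
    U = sum(item_scores[i] for i in upper)  # correct answers in the upper group
    L = sum(item_scores[i] for i in lower)  # correct answers in the lower group
    n1, n2 = len(upper), len(lower)
    return U / n1 - L / n2


if __name__ == "__main__":
    totals = [38, 35, 33, 30, 28, 25, 22, 20, 18, 15]  # hypothetical
    item = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]
    print(round(brennan_b(item, totals), 2))  # 0.33
```

With ten examinees, 27% gives groups of three; two of the top three and one of the bottom three answer the item correctly, so B = 2/3 - 1/3, or about .33. Passing `fraction=0.30` or `fraction=0.5` applies the other group sizes mentioned above.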
The B index ranges from -1.0 to +1.0; items with high positive values are better than items with lower values. To determine whether an item has discrimination power, Ebel (1965) proposed the guideline shown in Table 3.4 for interpreting an item discrimination index.

Table 3.4: Interpretation of an Item Discrimination Index
Discrimination Value     Discrimination Function and Evaluation
.40 and above            Good and satisfactory
.30 to .39               Reasonably good, but may need to be improved
.20 to .29               Weak and has to be improved
.19 and below            Too weak; item has to be eliminated
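Applying Table 3.4 to a whole set of items is a matter of checking each value against the band boundaries. The Python sketch below is one way to do this; the short labels paraphrase the table, and treating values between .19 and .20 as falling in the lowest band is an assumption, since the table leaves that gap unspecified.

```python
def ebel_band(d):
    """Return the Table 3.4 band (after Ebel, 1965) for a discrimination value d."""
    if d >= 0.40:
        return "good and satisfactory"
    if d >= 0.30:
        return "review and improve"
    if d >= 0.20:
        return "weak; improve"
    return "too weak; eliminate"


if __name__ == "__main__":
    # Hypothetical discrimination values for three items.
    for item, d in {"item 3": 0.52, "item 7": 0.34, "item 9": 0.08}.items():
        print(item, d, ebel_band(d))
```

The checks run from the highest band downwards, so each value falls into exactly one band.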
3.6.3 Item Reliability Index

Test developers may want an instrument with high internal consistency, in which the items are homogeneous. One simple way to determine whether an item depends on, is related to or hangs together with the other items in the test is to correlate that item with the total score on the test. The item reliability index combines this correlation with the item's dispersion: it is the product of the item's standard deviation (SD), which shows how much dispersion the item has, and the correlation between the item and the total score ($r_{it}$).
$$\text{Item reliability index} = (SD)(r_{it})$$
By computing the item reliability index for every item in the preliminary test, we can eliminate the "outlier" items with the lowest values of this index. Such items usually have poor internal consistency or weak dispersion of scores and therefore do not contribute to the goals of measurement.
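The item reliability index can be computed directly from item and total scores. The Python sketch below uses the population standard deviation of the item and the ordinary Pearson correlation between the item and the total score. As in the text, the total includes the item itself; a "corrected" item-total correlation, which excludes the item, is a common alternative but is not what the formula in this section describes. Function names are illustrative.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores with zero variance have no defined correlation")
    return sxy / sqrt(sxx * syy)


def item_reliability_index(item_scores, total_scores):
    """Item reliability index = SD of the item x item-total correlation (r_it)."""
    n = len(item_scores)
    mean = sum(item_scores) / n
    sd = sqrt(sum((s - mean) ** 2 for s in item_scores) / n)  # population SD
    return sd * pearson_r(item_scores, total_scores)


if __name__ == "__main__":
    item = [1, 1, 0, 0]     # hypothetical item scores
    totals = [10, 8, 6, 4]  # hypothetical total test scores
    print(round(item_reliability_index(item, totals), 3))  # 0.447
```

An item that every examinee answers the same way has zero dispersion, so the sketch raises an error for it rather than reporting a misleading value; such an item would be flagged anyway by the difficulty index.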
3.6.4 Item Validity Index

Test developers may also want an instrument whose items have high validity. One way is to determine whether each item is related to a test criterion, as is done in criterion-related validity. The item validity index is obtained by multiplying the standard deviation (SD) of an item, which shows how much dispersion the item has, by the item's correlation with the criterion score ($r_{ic}$).
$$\text{Item validity index} = (SD)(r_{ic})$$
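The item validity index has the same form as the item reliability index, with a criterion score in place of the total test score. The self-contained Python sketch below assumes the criterion is an external measure collected for the same examinees (the ratings in the example are hypothetical); it uses the population standard deviation of the item and the Pearson item-criterion correlation, and the function name is illustrative.

```python
from math import sqrt


def item_validity_index(item_scores, criterion_scores):
    """Item validity index = SD of the item x item-criterion correlation (r_ic)."""
    n = len(item_scores)
    if n != len(criterion_scores) or n < 2:
        raise ValueError("need two or more examinees with matching score lists")
    mi = sum(item_scores) / n
    mc = sum(criterion_scores) / n
    sic = sum((a - mi) * (b - mc) for a, b in zip(item_scores, criterion_scores))
    sii = sum((a - mi) ** 2 for a in item_scores)
    scc = sum((b - mc) ** 2 for b in criterion_scores)
    if sii == 0 or scc == 0:
        raise ValueError("zero variance: correlation is undefined")
    sd_item = sqrt(sii / n)       # population SD of the item
    r_ic = sic / sqrt(sii * scc)  # Pearson item-criterion correlation
    return sd_item * r_ic


if __name__ == "__main__":
    item = [1, 0, 1, 0, 1]     # hypothetical item scores
    ratings = [4, 2, 5, 3, 4]  # hypothetical criterion scores
    print(round(item_validity_index(item, ratings), 3))  # 0.431
```

Items with an index near zero contribute little to predicting the criterion and are candidates for revision or removal.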
After the four major stages discussed above are completed, the test is revised to improve its quality based on the characteristics of a good test. The test is then ready for use.

SELF-CHECK 3.2
1. What are the steps taken when evaluating test items?
2. Review the four types of analyses discussed in section 3.6. Explain which aspect of an item each analysis focuses on.
ACTIVITY 3.3
What would you do with an item in a test that you developed which
has an item discrimination index of 0.15? Justify your answer.
SUMMARY
• Gregory (2007) suggests that test construction consists of six intertwined stages: defining the test, selecting a scaling method, constructing the items, testing the items, revising the test and publishing the test.
• The item formats most used in tests of ability and performance are multiple choice items and two-choice answer formats, while psychological tests in the form of surveys and inventories usually use the Likert scale and the dichotomous response format.
• Test items should be written carefully so as to fulfil the objectives of the test. The language used should be simple.
• After the items are written, evaluation is also important because important decisions about future test takers may depend on the test. Evaluation should be fair, unbiased and accurate. Feedback should be given after the test.
• Test developers usually employ item analysis, a set of statistical procedures, to identify the best items to be included in a test.
• Four techniques of item analysis that can be used are the item difficulty index, item discrimination index, item reliability index and item validity index.

KEY TERMS
Criterion-referenced test
Dichotomous response format
Distractors
Item analysis
Item difficulty index
Item discrimination index
Item formats
Item reliability index
Item validity index
Likert scale
Multiple choice items
Norm-referenced test
Test blueprint
Test construction
REFERENCES
AERA/APA/NCME Joint Committee. (1999). Standards for educational and psychological testing. Washington, DC: American Psychological Association.
Aiken, L. R. (2000). Personality: Theories, assessment, research, and applications. Springfield, IL: Charles C. Thomas.
Allen, M. J., & Yen, W. M. (1979). Introduction to measurement theory. Monterey, CA: Brooks/Cole.
Anastasi, A., & Urbina, S. (1997). Psychological testing. Upper Saddle River, NJ: Prentice Hall.
DeVellis, R. F. (1991). Scale development: Theory and applications. Newbury Park, CA: Sage.
Ebel, R. L. (1965). Measuring educational achievement. Englewood Cliffs, NJ: Prentice-Hall.
Facione, P. A. (2000). The disposition toward critical thinking: Its character, measurement, and relationship to critical thinking skill. Informal Logic, 20(1), 61-84.
Gough, H. G. (1960). The adjective check list as a personality assessment research technique. Psychological Reports, 6(1), 107-122.
Gregory, R. J. (2007). Psychological testing: History, principles and applications (5th ed.). Boston, MA: Allyn and Bacon.
Hogan, T. P., & Cannon, B. (2003). Psychological testing: A practical introduction. New York: Wiley & Sons.
applications, and issues. Belmont, CA: Wadsworth Cengage Learning.
Kaufman, A. S., & Kaufman, N. L. (1983). K-ABC: Kaufman assessment battery for children. Circle Pines, MN: American Guidance Service.
Iran Herman & Muhamed Awang. (1999). Ujian dan pengukuran. Modul Pengajian Jarak Jauh. Bangi: Universiti Kebangsaan Malaysia.
Kelley, T. L. (1939). The selection of upper and lower groups for the validation of test items. Journal of Educational Psychology, 30(1), 17.
Likert, R. (1932). A technique for the measurement of attitudes. New York: The Science Press.
Norman, G. (2003). Hi! How are you? Response shift, implicit theories and differing epistemologies. Quality of Life Research, 12(3), 239-249.
Stevens, S. S. (1966). A metric for the social consensus. Science, 151, 530-541.

Topic 4 Test Administration
LEARNING OUTCOMES
By the end of this topic, you should be able to:
1. List the principles of effective interviewing;
2. Differentiate between various types of interviews;
3. Describe the important issues related to test administration;
4. Explain the various responsibilities of a test administrator and the ways to conduct a psychology test and measurement session effectively; and
5. Discuss the advantages and disadvantages of computerised testing.
INTRODUCTION
This topic discusses the administration of psychological testing. First, we will
look at interviews as a form of psychological testing used to obtain data on
human behaviour. This will include principles of effective interviewing as
suggested by Kaplan and Saccuzzo (2005). Several types of interviews and their
application in psychology will also be discussed.
After that, we will discuss the general process of administering psychological tests and measurement and the significant issues related to this form of testing. Computerised test administration, made possible by advances in computer technology, will also be discussed.

4.1 INTERVIEWING TECHNIQUES

Before administering psychological tests and measurement, the test administrator needs to start with an interview, and the interview continues even during the process of test administration. Effective interviewing enables test administrators to gather relevant and important information that helps them interpret the results of the psychological test effectively, keeping in mind their relevance to the test taker.
An interview is actually like a test. Similar to any psychological or educational test, an interview is a method of gathering data or information about an individual. The interview involves two or more people interacting with each other. Some interviews are conducted like individually administered tests, with the interviewer interacting with one individual at a time. However, in some interviews, such as family interviews, one interviewer works with two or more individuals at the same time, just like in a group test.
Many psychological tests, such as the Thematic Apperception Test (TAT), cannot
be properly used without conducting interviews. The interview remains one of
the most prevalent selection devices for employment (Posthuma, Morgeson &
Campion, 2002). Furthermore, interviewing is the chief method of collecting data
in clinical psychiatry (Allen & Smith, 1993; Groth-Marnat, 2003). Therefore,
interviewing is an important method of collecting data across many fields of
psychology such as clinical, industrial, counselling, school and correctional
psychology.
4.1.1 Principles of Effective Interviewing

Interviewing involves specific techniques and approaches, which vary depending on factors such as the type of interview (e.g., employment versus diagnostic) and the goals of the interviewer (e.g., description versus prediction). Conducting an interview requires flexibility, as the goal is to get as much information about interviewees as possible in order to understand them and predict their behaviour. Kaplan and Saccuzzo (2005) suggest four principles of effective interviewing, as shown in Figure 4.1.

Figure 4.1: Four principles of effective interviewing
Let us discuss the principles in detail.
(a) The Proper Attitudes

Good interviewing is a matter of attitude rather than skill (Duan & Kivlighan,
2002; Tyler, 1969). Experiments in social psychology have shown that
interpersonal influence (the degree to which one person can influence another)
is related to interpersonal attraction (the degree to which people share a feeling
of understanding, mutual respect, similarity and the like) (Dillard & Marshall,
2003; Green & Kenrick, 1994; Hensley, 1994). Attitudes related to good
interviewing skills include warmth, genuineness, acceptance, understanding,
openness, honesty and fairness. The interviewer should be warm, open,
concerned, involved, committed and interested, regardless of subject matter or
the type or severity of the problem.
(b) Responses to Avoid

There are several responses that an interviewer should avoid. One of them is making interviewees feel uncomfortable. When this happens, interviewees tend to be uncooperative and reveal very little information about themselves. Unless a purpose of the interview is to determine how well an individual behaves in difficult situations, interviewers who want to get as much information as possible should avoid the responses shown in Table 4.1.

Table 4.1: Four Responses that an Interviewer Should Avoid
Response                                Description
Judgemental or evaluative statements    Being judgemental means evaluating the thoughts, feelings or actions of another. These judgements prevent other people from revealing important information.
Probing statements                      The interviewer may push the interviewee to reveal something that the interviewee is unwilling to reveal, that is, demand more information than the interviewee wants to give voluntarily. If this happens, the interviewee will probably feel anxious and refuse to reveal additional information.
Hostility                               The interviewer uses hostile statements, which can anger the interviewee. Interviewers should avoid such responses unless they are necessary, for example, when determining how an interviewee reacts to anger.
False reassurance                       A reassuring statement attempts to comfort or support the interviewee. Although reassurance is sometimes appropriate, an interviewer should always avoid false reassurance.
(c) Effective Responses

One major principle of effective interviewing is to keep the interaction going. An interview can be started effectively with an open-ended question rather than a closed-ended question (for example, "Tell me about yourself" rather than "How old are you?"), because a closed-ended question tends to bring the interview to a stop. Open-ended questions give the interviewee freedom to choose the topics that are important to him or her. The difference between the two is that open-ended questions require the interviewee to produce something spontaneously, while closed-ended questions require the interviewee to recall something.
(d) Responses to Keep the Interaction Flowing

After asking an open-ended question, the interviewer lets the interviewee respond without interruption, remaining quiet and listening. Based on the interviewee's answers, the interviewer then responds using any of the types of statements shown in Table 4.2.

Table 4.2: Three Types of Statements during an Interview
Type of Statement               Description
Verbatim playback               The interviewer simply repeats the interviewee's last response.
Paraphrasing and restatement    A paraphrase is similar to the interviewee's response, though less close to it than a restatement. Both try to capture the meaning of the interviewee's response, communicate that the interviewer was listening and make it easy for the interviewee to elaborate further.
Summarising and clarification   In summarising, the interviewer extracts the meaning of several responses. A clarification statement, on the other hand, serves to clarify the interviewee's response.
ACTIVITY 4.1
Write a short dialogue for each of the three types of statements shown in Table 4.2 that reflects how an interviewer responds effectively to interviewees during an interview to keep the interaction flowing. Do additional readings to enhance your understanding.
4.1.2 Types of Interview

Generally, there are three types of interviews, as shown in Figure 4.2.
Figure 4.2: Three types of interviews

Let us examine each one in greater detail.
(a) Evaluation Interview

An evaluation interview begins with an open-ended question, with the interviewer "listening, facilitating and clarifying" during the initial phases of the interview (Maloney & Ward, 1976). It is also recommended that the interviewer use confrontation when gathering data in this form of interview. A confrontation is a statement that points out a discrepancy or inconsistency (Kaplan & Saccuzzo, 2005).
Carkhuff (1969) recognised three types of confrontation:
(i) A discrepancy between what the person is and what he or she wants to become;
(ii) A discrepancy between what the person says about himself or herself and what he or she does; and
(iii) A discrepancy between the person's perception of himself or herself and the interviewer's experience with the person.
Towards the end of the interview, direct questions can be used to fill in details or gaps in the interviewer's knowledge. The use of direct questions is necessary in three conditions:
(i) The data cannot be obtained in other ways;
(ii) Time is limited and the interviewer needs specific information; and
(iii) The interviewee cannot or will not cooperate with the interviewer.
(b) Structured Clinical Interview

Usually, structured interviews are conducted with a specific set of
questions presented in a particular order. In addition, there is usually a
specified set of rules for probing so that all interviewees are handled in the
same manner as in a standardised test. The development of structured
clinical interviews gained importance with the development of the
Diagnostic and Statistical Manual of Mental Disorders (DSM). A specific set
of questions has been developed to determine whether or not a person
meets the criteria for mental disorders.

Although the reliability of structured clinical interviews can be assessed, they lack flexibility. One major disadvantage of the structured interview is that it depends entirely on the respondent. It assumes that the respondent is honest and capable of accurate self-observation, and that he or she will provide frank and candid answers, even to embarrassing questions (Kaplan & Saccuzzo, 2005).
(c) Case History Interview

The interviewer obtains an in-depth description of the interviewee by asking specific questions. This is done to obtain a complete case history or a biographical sketch of the interviewee. Case history data may include a chronology of major events in the person's life, a work history, a medical history and a family history. In obtaining a case history, the interviewer often takes a developmental approach, examining an individual's entire life, beginning with infancy or the point at which the given type of history is first relevant. The purpose of obtaining a case history is to understand the individual's background so that one can accurately interpret individual test scores.
ACTIVITY 4.2
1. Select an individual as your subject. Conduct a case history interview with the subject.
2. Name two examples of how interviews are used in any field of psychology.
3. In the examples given above, discuss whether the principles of effective interviewing are applied.
4.2 ISSUES IN TEST ADMINISTRATION

After an initial intake interview is done to gather necessary data regarding
the examinee for a particular psychological test and measurement, test
administration begins to measure the related psychological properties.
A standardised test requires established norms and units of measurement, as well as standardised instructions and procedures. However, it is well known that situational factors can affect test scores. Variables such as the physical environment, place, time, light and temperature can influence an individual's performance on a test. The examiner's personality, his or her rapport with the subjects and the language used can also have an influence, which may result in examiner bias.

Apart from that, the perceptions, state of mind and previous experiences and
expectations of the examinee all play a role in a testing environment. As such, to
minimise the influence of these variables, all conditions of testing have to be
standardised.
Test administrators must also be sensitive to disabilities: impairments of hearing, vision, speech or motor control may skew test results. Vernon and Brown (1964) reported the case of a young girl who was admitted to a hospital as mentally retarded because the test administrator failed to recognise her physical disability. This insensitivity led to the inappropriate administration of an intelligence test and, in turn, to the girl's misdiagnosis.
Another major error that can be made during the administration of a group test is
the inaccurate allocation of time for tests which require time limits, such as the
Miller Analogies Test (MAT).
Further issues related to test administration are discussed in the following sections.
4.2.1 The Examiner and the Subject

Several issues have to be considered in administering psychological tests as
shown in Figure 4.3.
Figure 4.3: Issues in administering psychological tests

Let us take a look at the issues one by one.
(a) The Relationship between Examiner and Test Taker

Both the behaviour of the examiner and his relationship with the test taker can affect test scores. Studies have shown that familiarity with the test taker and perhaps pre-existing notions about the test taker's abilities can either positively or negatively bias test results.
(b) The Race of the Tester

Race or ethnicity could become a factor in affecting language
comprehension or cultural understanding during psychological testing and
measurement. However, there is little evidence that the race of the
examiner significantly affects intelligence scores (Sattler, 2002; 2004). This is
mainly because the procedures for properly administering an intelligence
test are so specific that anyone who is trained professionally in psychology
tests and measurement should follow the procedure strictly regardless of
the race or cultural background of both the examiner and examinee. Even
though race effects in test administration may be relatively small, efforts
must be made to reduce all potential bias. Greater standardisation and
procedures for fair test administration will be very useful for this purpose.
(c) Language of Test Taker

Translating tests is difficult and it cannot be assumed that the validity and
reliability of the translation are comparable to the English version.
Adaptation or modification of existing tests requires extensive translation.
Acceptable cross-cultural research involving language differences usually
must include rather sophisticated translation procedures, such as those
outlined by Brislin (1986). Many hours of careful, dedicated research may
be needed to make even a brief questionnaire appropriate for culture-
comparative research (Lonner, 1990).
(d) Training of Test Administrators

Test administrators must have good knowledge of the test manual and its instructions before administering the test. Although some group tests are relatively easy to administer, most individual tests have complicated procedures which, if not followed carefully, may cause test takers to fail certain items unnecessarily. For instance, test administrators must undergo training before they can administer the Stanford-Binet test or the Wechsler Adult Intelligence Scale (WAIS).
There are also many behavioural assessment procedures which require
training and evaluation but not a formal degree or diploma. Psychiatric
diagnosis is sometimes obtained using the Structured Clinical Interview for
DSM-IV (SCID) (Spitzer et al., 1997).

(e) Expectancy Effects

Data sometimes can be affected by what an experimenter expects to find.
Robert Rosenthal and his colleagues at Harvard University conducted
many experiments on such expectancy effects, often called Rosenthal effects
(Rosenthal, 2002). The results of several experiments have consistently
shown that subjects actually provide data that confirm the experimenter's
expectancies. This phenomenon may occur when administering a
standardised test as well.
(f) Effects of Reinforcing Responses

Because reinforcement affects behaviour, testers should always administer
tests under controlled conditions. After reviewing the literature on
procedural and situational variables in testing, Sattler and Theye (1967)
found that inconsistent use of feedback can damage the reliability and
validity of test scores. For instance, several studies have shown that
rewards can significantly affect test performance.
ACTIVITY 4.3
1. Discuss how the familiarity between a test examiner and test taker
can either positively or negatively bias test results.
2. Try administering a simple psychological test. While doing so, consider
the issues of test administration and how they can affect test scores.
4.3 PRACTICAL CONSIDERATIONS IN TEST ADMINISTRATION
Besides the issues related to the examiner and the subject discussed in Section 4.2,
there are many other practical considerations that need to be taken into
account when administering psychological tests and measurements.

4.3.1 Physical Environment

Controlling the physical environment is important to ensure the smooth
administration of psychological tests and measurements. Some important aspects
that test administrators should be aware of are:
(a) Light levels;
(b) Temperature;
(c) Ambient noise level;
(d) Ventilation; and
(e) Minimal distractions.
If a psychological test is conducted in a group, where everybody takes the test at
the same time in the same location, then any problems with the above factors
should affect all test takers equally. If more than one testing session is taking place,
then all sessions should be held under circumstances that are as nearly identical
as possible. Controlling these factors of the physical environment also helps to
ensure more reliable test scores.
ACTIVITY 4.4
1. Describe in detail the suitable conditions in terms of light levels,
temperature, ambient noise level, ventilation and minimal
distractions that ensure the smooth administration of psychology
tests.
2. How are tests affected by the psychological factors of an
individual?

4.3.2 Various Responsibilities of the Test Administrator
The test administrator is the person who administers the test. He or she
organises the test and takes care of every detail until the results are
produced. The different responsibilities of the administrator are described as
follows:
(a) Scheduling an Appropriate Time for Psychological Testing and Measurement
Discuss an appropriate time for the test with the test taker. Children,
however, may not be aware of what an appropriate time to take a test is.
The following points should be taken into consideration when scheduling
tests for children:
(i) Avoid scheduling tests during lunch or playground time;
(ii) It may be better not to schedule a test immediately after holidays or
exciting events; and
(iii) Ideally, do not test for longer than 1 hour (in general, the attention
span is about 30 minutes for preschool and elementary school children and
not longer than 90 minutes for secondary school children). However,
many psychological tests take longer, so allow breaks in between for the
children to rest.
(b) Inform the Test Takers Well before the Test

It is important to provide test takers with sufficient information about the
psychological test before it is administered. This information includes:
(i) When and where the test will be given;
(ii) What subject material will be covered;
(iii) What types of questions will be included in the test; and
(iv) How much time test takers will be allowed to complete the test.
Any other relevant information about a particular test and testing
situation should also be shared with the test takers. This information allows
test takers to prepare in advance and can reduce test-taking anxiety.

(c) Informed Consent

Sometimes, test takers will have to give their informed consent before a
psychological test is administered. Informed consent means that the person
taking the test knows:
(i) Why the test is being given;
(ii) Who will see the results of the test; and
(iii) What the results of the test will be used for.
For school children, a parent or legal guardian must give consent.
Depending on the law, informed consent may not be required for
psychological testing done for court proceedings when the testing is
mandated by law or by a governmental agency, or for standardised
educational testing conducted as a regular part of school activities for
evaluation and assessment purposes.
Even when consent is not legally required, test administrators should still
inform test takers about the specifics of a test.
(d) Administrator's Responsibilities

The administrator should read and master the test manual and practise the
test himself or herself before administering it to others, in order to better
understand the standard administration procedures of that particular
psychological test.
Understanding the test from "both sides of the fence" will make the testing
session run more smoothly, as the administrator will also understand it from
the perspective of the test takers.
Specific directions and procedures should also be reviewed one last time
immediately before the test begins.
Examiners must also become familiar with the security procedures for secure
tests such as the Scholastic Aptitude Test (SAT), Law School Admission
Test (LSAT) and Graduate Record Examination (GRE). Each exam should
be inspected and arranged in numerical order.
(e) Ensuring Satisfactory Testing Conditions

The administrator must ensure that sufficient seating, left-handed
accommodations and any other necessary physical arrangements are
available for test takers.
Achievement or ability tests in educational and school settings are often
administered to a large group; in such cases, opportunities for cheating
can be minimised through seating arrangements. Preparing different forms
of the test or multiple versions of the answer sheet will also improve
testing conditions.
In addition, the administrator must be aware that proper identification may
be required for certain tests.
4.3.3 Duties of the Test Administrator during the Process of Psychology Testing and Measurement
The following are some examples of the duties of an administrator in the process
of carrying out a psychology test and measurement:
(a) Ensure All Test Takers are Given Proper Instructions

Instructions should be given verbally so that all test takers hear the proper
directions at least once.
Directions should be read slowly and be easy to understand, which is why
familiarity with the test is essential.
Many tests have standardised instructions, which serve to keep the test
tasks identical for all respondents.
(b) Establishing Rapport with Test Takers

Test takers should be able to trust the examiner to administer the test
fairly, so that they feel secure when answering the questions. Establishing
a good relationship with the test taker is especially important when a test
measures sensitive psychological issues.
A good test administrator should be friendly, objective, authoritative and
polite, and appropriate in dress and manner.

(c) Remain Alert

If a psychological test is conducted in a group and has standardised answers,
as in many achievement and ability tests used in educational settings,
cheating by test takers must always be prevented.
Employing a number of proctors to oversee the room, answer questions and
deter cheating is one way to help control inappropriate test-taking conduct.
The test-taking environment should be protected against unwarranted
intrusions or disturbances. Loud, unruly behaviour cannot be tolerated
in a mass testing situation.
(d) Preparing for Special Situations

Do all students understand English? What equipment is allowed? For
example, in achievement and ability tests, are calculators, translators,
slide rules or scrap paper permitted? Can the test administrator deal with
sudden medical problems? Invigilators must remain alert and be flexible
enough to deal with special circumstances that may crop up during testing.
(e) Flexibility
Standardised directions may not cover all possible situations. The test
administrator should always be prepared to deal with novel problems.
Experience is sometimes the best teacher when it comes to bizarre testing
situations.
4.3.4 Additional General Guidelines for Test Administrators to Follow
There are some additional general guidelines for test administrators to follow
that can be summarised as follows:
(a) Provide ample time for the administration of a test;
(b) Allow the test taker to have sufficient practice on sample items;
(c) Use short testing periods if possible;

(d) Make arrangements for deficits in visual, auditory and other sensory-motor
systems;
(e) Be aware of fatigue and test anxiety and take them into account when
interpreting scores;
(f) Use encouragement and positive reinforcement whenever possible; and
(g) Do not force examinees to respond when they repeatedly decline to do so.
4.3.5 Test Administrator’s Post-test Duties

After the test is over, the test administrator has post-test duties to attend to.
The test administrator must ensure that:
(a) All required test items have been answered;
(b) All answer or scoring sheets have names or other necessary identification
indicating which test paper belongs to whom;
(c) The timing of feedback on the test results has been discussed with test takers;
(d) The test results remain confidential and the test and measurement
information is kept in a safe and proper place. The test administrator must
consider how confidentiality can be maintained even after he or she has left
the organisation. If necessary, private and confidential test results and data
should be deleted after feedback has been given and the results have served
their purpose; and
(e) The test room and testing tools are returned to their pre-test set-up for the
convenience of the next testing session.
4.4 COMPUTERISED TESTING

The advent of technology has also influenced the way we use psychological tests.
Easy access to computers and the internet has led to an increase in computerised
test administration. Several advantages of computerised testing are:
(a) An excellent level of standardisation, ensuring control;
(b) Precise timing of responses;
(c) Items can be presented in any order;
(d) It is less costly and frees the examiner to perform other duties;
(e) Subjects are more willing to be honest than during face-to-face
administration;
(f) It reduces scoring errors;
(g) Test takers may find it more interesting to interact with a computer; and
(h) Test takers are not rushed in answering the test.
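Two of the advantages above, presenting items in any order and timing responses precisely, can be seen in a small sketch. The Python below is hypothetical and is not taken from any particular testing platform: the item IDs, the `get_response` callback and the idea of seeding the shuffle with a test-taker ID are all assumptions made for illustration.

```python
import random
import time
from typing import Callable, List, Tuple


def item_order(items: List[str], test_taker_id: str) -> List[str]:
    """Return a random but reproducible item order for one test taker.

    Seeding the generator with the (hypothetical) test-taker ID means the
    same person always receives the same order, which helps when auditing.
    """
    rng = random.Random(test_taker_id)
    order = list(items)
    rng.shuffle(order)
    return order


def administer(items: List[str], test_taker_id: str,
               get_response: Callable[[str], str]) -> List[Tuple[str, str, float]]:
    """Present each item, then record the answer and response time in seconds."""
    log = []
    for item in item_order(items, test_taker_id):
        start = time.perf_counter()          # high-resolution timer
        answer = get_response(item)          # a real system would show the item and wait
        elapsed = time.perf_counter() - start
        log.append((item, answer, elapsed))
    return log


if __name__ == "__main__":
    items = [f"item_{n:02d}" for n in range(1, 6)]   # hypothetical item bank
    for item, answer, seconds in administer(items, "TT-001", lambda item: "A"):
        print(f"{item}: {answer} ({seconds:.6f} s)")
```

Here a lambda stands in for the test taker; in practice the callback would display the item on screen and wait for input, so every test taker's response time is measured by the same clock in the same way.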
However, there are also certain disadvantages to computerised administration,
such as:
(a) Results are easily misinterpreted, and this may cause harm to test takers;
(b) Possible routine errors or poor validation;
(c) Faulty computerised systems;
(d) Some programs may make untested claims;
(e) Computerised reports may be based on an obsolete database; and
(f) Relying on the computer to do all the thinking may cause the insights
and clinical judgement of well-trained clinical psychologists to be
overlooked.
SELF-CHECK 4.1
Explain the advantages and disadvantages of using computerised

testing.

ACTIVITY 4.5
1. Read Table 4.3 below regarding various testing issues and their
explanations. Discuss in tutorial or on the myVLE forum your
views on the testing issues highlighted and whether you agree
with the explanations, from the perspective of psychology testing
and measurement.
Table 4.3: Testing Issues
Testing Issue | Explanation
Pop quizzes and surprise exams | Should be avoided whenever possible.
Changing answers | Wisdom says most often your first hunch is the right one and changing answers usually lowers scores.
Guessing | Guessing usually results in higher scores when examinees can eliminate at least one false answer from the choices before guessing.
Being test wise | A person usually becomes more test wise with repeated exposure to a variety of testing situations.
Gender difference | Males tend to be wiser in tests than females.
2. List several psychological tests which involve computerised
administration.
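The Guessing entry in Table 4.3 can be checked with simple expected-value arithmetic. The sketch below assumes formula scoring, in which a correct answer earns 1 point and a wrong answer loses 1/(k - 1) points on an item with k options (a common correction for guessing); the function name and the scoring rule are illustrative assumptions, not rules stated in this module.

```python
from fractions import Fraction
from typing import Optional


def expected_guess_score(n_options: int, n_eliminated: int,
                         penalty: Optional[Fraction] = None) -> Fraction:
    """Expected score from a random guess on one multiple-choice item.

    Scoring assumption: +1 for a correct answer, -penalty for a wrong one.
    By default the penalty is 1/(n_options - 1), the classic correction for
    guessing; pass penalty=0 for simple number-right scoring.
    """
    if n_options < 2 or not 0 <= n_eliminated < n_options:
        raise ValueError("need at least two options and one left to choose from")
    if penalty is None:
        penalty = Fraction(1, n_options - 1)
    remaining = n_options - n_eliminated
    p_correct = Fraction(1, remaining)   # guessing uniformly among remaining options
    return p_correct - (1 - p_correct) * penalty


if __name__ == "__main__":
    for eliminated in range(3):
        print(eliminated, expected_guess_score(4, eliminated))
```

With four options, a blind guess has an expected formula score of 0, eliminating one false option raises it to 1/9 and eliminating two raises it to 1/3. Under number-right scoring (`penalty=0`) any guess has a positive expected value, so whether the table's explanation holds depends on how the test is scored.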
Summary
• An interview is a method for gathering data or information about an
individual.
• Several principles of effective interviewing are:
– Having the proper attitudes;
– Avoiding inappropriate responses;
– Enhancing effective responses; and
– Using responses to keep the interaction flowing.
• There are three types of interviews:
– Evaluation interview;
– Structured clinical interview; and
– Case history interview.
• Several issues have to be considered in administering psychological tests,
such as the relationship between the examiner and the subject, the language
of the test taker, the training of test administrators, the race of the tester,
expectancy effects and the effects of reinforcing responses.
• Various factors related to the physical environment are also important for
administering psychological tests and measurements effectively.
• A test administrator has various responsibilities to fulfil before, during
and after the administration of a psychological test.
• Computerised testing has advantages, such as excellent standardisation of
testing procedures and reduced costs and scoring errors; however, it also
has disadvantages, such as a lack of direct individual contact with the test
taker and the possibility of computer system failure.
Key Terms
Case history interview
Computerised testing
Effects of reinforcing responses
Evaluation interview
Expectancy effects
False reassurance
Flexibility
Informed consent
Interviewing techniques
Paraphrasing
Probing statements
Rapport establishment
Structured clinical interview
Test administration

References
Allen, N. J., Meyer, J. P., & Smith, C. A. (1993). Commitment to organizations and
occupations: Extension and test of a three-component conceptualization.
Journal of Applied Psychology, 78(4), 538.
Brislin, R. W. (1986). Intercultural interactions: A practical guide. Beverly Hills,
CA: Sage.
Carkhuff, R. R. (1969). Helping and human relations: A primer for lay and
professional helpers. New York: Holt, Rinehart and Winston.
Dillard, J. P., & Marshall, L. J. (2003). Persuasion as a social skill. Handbook of
communication and social interaction skills, 479–513.
Duan, C., & Kivlighan, D. M. (2002). Relationships among therapist presession
mood, therapist empathy, and session evaluation. Psychotherapy Research,
12(1), 23–37.
First, M. B., Spitzer, R. L., Gibbon, M., & Williams, J. B. (1997). Structured clinical
interview for DSM-IV axis I disorders SCID-I: Clinician version,
administration booklet. Arlington, VA: American Psychiatric Publishing.
Green, B. L., & Kenrick, D. T. (1994). The attractiveness of gender-typed traits at
different relationship levels: Androgynous characteristics may be desirable
after all. Personality and Social Psychology Bulletin, 20(3), 244–253.
Groth-Marnat, G. (2003). Digit span as a measure of everyday attention: A study
of ecological validity. Perceptual and Motor Skills, 97, 1209–1218.
Hensley, W. E. (1994). Height as a basis for interpersonal attraction. Adolescence,
29(114), 469–474.
Kaplan, R. M., & Saccuzzo, D. P. (2005). Psychological testing: Principles,
applications, and issues. Belmont, CA: Wadsworth Cengage Learning.
Lonner, W. J. (1990). An overview of cross-cultural testing and assessment. In R.
W. Brislin (Ed.), Applied cross-cultural psychology (56–76). Newbury Park,
CA: Sage.

Maloney, M. P., & Ward, M. P. (1976). Psychological assessment: A conceptual

approach. New York: Oxford University Press.
Posthuma, R. A., Morgeson, F. P., & Campion, M. A. (2002). Beyond employment

interview validity: A comprehensive narrative review of recent research
and trends over time. Personnel Psychology, 55(1), 1–81.
Rosenthal, R. (2002). Covert communication in classrooms, clinics, courtrooms,

and cubicles. American Psychologist, 57(11), 839.
Sattler, J. M. (2002). Assessment of children: Behavioral and clinical applications.

San Diego, CA: Jerome M. Sattler.
Sattler, J. M., & Dumont, R. (2004). Assessment of children: WISC-IV and WPPSI-
III supplement. San Diego, CA: Jerome M. Sattler.
Sattler, J. M., & Theye, F. (1967). Procedural, situational, and interpersonal variables
in individual intelligence testing. Psychological Bulletin, 68(5), 347.
Tyler, L. E. (1969). The work of the counselor (3rd ed.). New York: Appleton-
Century-Crofts.
Vernon, M., & Brown, D. W. (1964). A guide to psychological tests and testing
procedures in the evaluation of deaf and hard-of-hearing children. Journal
of Speech and Hearing Disorders, 29(4), 414.

Topic 5 Intelligence Test
LEARNING OUTCOMES
1. Describe the concept of intelligence and its measurement;
2. Identify the different models and theories in defining intelligence;
3. Explain major intelligence tests;
4. Describe the intelligence tests used for military purposes; and
5. Discuss critical issues regarding intelligence tests.
INTRODUCTION
Intelligence tests are widely used by clinical psychologists in Malaysia as part of
psychological assessment, especially in identifying psychological disorders
related to cognition and learning.
When people apply for the People with Disability Card (in Bahasa Malaysia: "Kad
Orang Kurang Upaya"), intelligence tests are often requested to determine the
intelligence quotient (IQ) of the applicants.
Intelligence tests are also commonly conducted in forensic psychology
assessment in our country, for both the accused and the victim, in order to help
the court better understand the mental status of the accused at the time the
crime was committed and the ability of the victim to give his or her testimony.

This topic focuses on intelligence tests, one of the major types of
psychological tests. Theories of intelligence such as Spearman's "g" factor theory,
Thurstone's theory of primary mental abilities and the multidimensional models
of intelligence are presented, as these theories provide the foundation for many
intelligence tests. Two main intelligence tests are described at length because of
their historical importance: the Stanford-Binet intelligence test and the Wechsler
scales of intelligence. For comparison, intelligence measurement for military use
will also be introduced, using the United States as an example. Finally, issues
related to intelligence testing will be presented.
5.1 THE CONCEPT OF INTELLIGENCE AND ITS DEFINITIONS
When psychologists were asked what intelligence is (Journal of Educational
Psychology, 1921), they gave different answers, although most of them agreed
that intelligence covers two main themes. Intelligence involves:
(a) The capacity to learn from experience; and
(b) An individual's ability to adapt to the demands of his or her environment.
In addition to examining what people in a particular culture consider to be
intelligent action, current cognitive psychologists also focus on
metacognition, which is an individual's ability to understand and control his
or her own thinking processes.
Based on the various perspectives proposed thus far, intelligence can be summarised
as an individual's capacity to learn from experience and the ability to use
metacognitive processes to enhance learning and to adapt to situations in
the environment, including adaptation to different social and cultural
contexts.

Table 5.1 provides six definitions of intelligence by prominent scholars in the

field of intelligence testing.
Table 5.1: Definitions of Intelligence
Definition of Intelligence | Source
The tendency to take and maintain a definite direction; the capacity to make adaptations for the purpose of attaining a desired end, and the power of autocriticism. | Binet (in Terman, 1916, p. 45)
The ability to deduce either relations or correlations. | Spearman (1923)
Adjustment or adaptation of the individual to his total environment, the ability to learn and the ability to carry on abstract thinking. | Freeman (1955)
The ability to plan and structure one's behaviour with an end in view. | Das (1973)
The ability to resolve genuine problems or difficulties as they are encountered. | Gardner (1983)
Mental activities involved in purposive adaptation to, shaping of and selection of real-world environments relevant to one's life. | Sternberg (1986, 1988)
SELF-CHECK 5.1
Compare and contrast the definitions of intelligence by different
scholars provided in Table 5.1. Do additional reading to further
your academic understanding of the concept of intelligence.
5.2 INTELLIGENCE TEST AND INTELLIGENCE QUOTIENT: THE DEVELOPMENT IN BRIEF
The history of intelligence test development can be traced back to the late
19th and early 20th centuries.
5.2.1 The Early Development

Research on and assessment of intelligence began at the end of the 19th century,
when Francis Galton (1822–1911) devised a laboratory complete with tools to test
various psychophysical abilities. The methods used to assess intelligence were
consistent with Galton's perspective, which held that human intelligence is
a function of psychophysical ability. The intelligence tests prepared in Galton's
laboratory aimed to measure a wide range of psychophysical abilities and
sensitivities, such as weight discrimination, sensitivity to sound and
physical strength.
The techniques used in Galton's tests of intelligence were widely used until the
emergence of an alternative approach developed by Alfred Binet (1857–1911)
together with his associate, Théodore Simon. At the request of the Ministry of
Public Instruction in France, Binet and Simon (1916) constructed a test
to measure intelligence, focusing on children's learning ability in academic
settings.
According to Binet and Simon, human intelligence depends on judgement, not on
accuracy, strength and psychophysical ability as suggested by Galton. Binet and
Simon assumed that intelligence consisted of three elements:
(a) Instruction – knowing what to do and how to do it;
(b) Adaptation – determining a strategy for performing a task and monitoring
that strategy while performing the task; and
(c) Critique – the ability to criticise one's own thinking and actions.
The priority that Binet and Simon's approach gives to instruction, adaptation
and critique is consistent with the current perspective on intelligence,
which also stresses metacognitive processes.
5.2.2 Intelligence Quotient (IQ)

At the initial stages of constructing the intelligence test, Binet and Simon were
interested in comparing the intelligence of a child with that of other children
of the same chronological age.
To achieve this objective, Binet and Simon determined the mental age (the average
level of intelligence of individuals at a certain age level) of each child. For
instance, if a child has a mental age of seven, this means his or her level of
thinking is similar to that of an average seven-year-old child.
The concept of mental age is suitable for comparing a child's intelligence
with the intelligence of other children at the same age level. However, a
problem arises when we are interested in comparing the relative intelligence of
children with different chronological ages.

To overcome this problem, Stern (1912) suggested that intelligence should be
measured using the intelligence quotient, or IQ, a ratio obtained by
dividing the mental age (MA) by the chronological age (CA) and multiplying
the result by 100:
IQ = (MA/CA) × 100
When a child's mental age is higher than his or her chronological age, this
formula produces an IQ score above 100. In contrast, if the chronological age is
higher than the mental age, the ratio produces an IQ score below 100, and when
the two ages are equal, the IQ is exactly 100.
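Stern's ratio formula translates directly into a few lines of Python. This is a minimal sketch for checking hand calculations; the function name `ratio_iq` is our own, and the ages used below are illustrative rather than taken from Activity 5.1.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ as proposed by Stern (1912): (MA / CA) x 100."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    # Multiplying before dividing keeps whole-number results exact.
    return mental_age * 100 / chronological_age


if __name__ == "__main__":
    print(ratio_iq(10, 8))   # 125.0: an 8-year-old performing like an average 10-year-old
    print(ratio_iq(8, 10))   # 80.0: a 10-year-old performing like an average 8-year-old
    print(ratio_iq(9, 9))    # 100.0: mental age equals chronological age
```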
ACTIVITY 5.1
Calculate the IQ of the subjects below:
Subject | Mental Age | Chronological Age
W | 15 years | 10 years
X | 30 years | 40 years
Y | 55 years | 55 years
Z | 35 years | 28 years
5.3 MODELS AND THEORIES OF INTELLIGENCE
Psychologists interested in studying the structure of intelligence generally
employ factor analysis as their main research tool. Factor analysis is a
statistical method used to divide the construct of intelligence into several
hypothetical factors or abilities that are believed to underlie individual
differences in performance on intelligence tests. The specific factors
obtained depend on the questions asked and the specific tasks assessed.
Research on intelligence mainly uses the correlation method, which involves
three steps:
(a) Administering several different tests of ability to a group of individuals;
(b) Determining the correlations among all the tests; and
(c) Performing statistical analyses on all the correlations to produce a smaller
number of factors that summarise the individuals' performance on the
intelligence tests.
Although there are variations, all researchers who use factor analysis follow
these steps. Several factorial theories have been proposed in the study
of intelligence and intelligence tests, including theories by Spearman, Thurstone,
Guilford, Cattell, Vernon and Carroll.
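The three steps of the correlation method can be illustrated with a small, self-contained Python sketch. The scores are invented for illustration, and step (c) uses principal components analysis (extracting the first component by power iteration), a simpler relative of factor analysis chosen here only because it needs no external libraries; real studies would use a dedicated factor-analysis routine and far larger samples.

```python
import math

# Step (a): hypothetical scores of six people on four ability tests.
scores = {
    "vocabulary": [12, 15, 9, 20, 14, 11],
    "analogies":  [10, 14, 8, 18, 13, 9],
    "arithmetic": [11, 13, 10, 17, 12, 10],
    "memory":     [9, 12, 8, 15, 13, 10],
}


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Step (b): correlation matrix among all the tests.
names = list(scores)
R = [[pearson(scores[a], scores[b]) for b in names] for a in names]


# Step (c), simplified: first principal component of R by power iteration.
def first_component(matrix, iterations=500):
    k = len(matrix)
    v = [1.0] * k
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient: the eigenvalue that belongs to the unit vector v.
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k))
    loadings = [x * math.sqrt(eigenvalue) for x in v]
    return eigenvalue, loadings


if __name__ == "__main__":
    eigenvalue, loadings = first_component(R)
    print(f"Share of variance explained: {eigenvalue / len(names):.0%}")
    for name, loading in zip(names, loadings):
        print(f"{name:<11} loading {loading:.2f}")
```

Large loadings of the same sign on the first component, as these made-up data produce, are broadly the pattern that Spearman interpreted as evidence of a general factor.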
Let us now discuss a few of the models and theories proposed in detail.
5.3.1 Spearman’s Two-Factor Theory of Intelligence: The “g” Factor
Spearman (1927), who developed factor analysis, concluded that human
intelligence can be understood not only in terms of one general factor that
influences performance on all tests of mental ability, but also in terms of
a set of specific factors, each of which determines performance on only one
test of mental ability (for example, a test of arithmetic calculation).
However, Spearman regarded the specific factors as being of secondary interest
because each of them has limited use. According to him, the general factor,
labelled "g", provides the most important understanding of intelligence and is
considered the basis of human mental energy.
His theory is also referred to as the two-factor theory of intelligence. The
general factor, often called the "g" factor, represents the portion of the
variance that all intelligence tests have in common, while the remaining
variance in each test is accounted for mainly by a specific factor unique to
that test. Figure 5.1 illustrates Spearman's concept of intelligence in brief.
Figure 5.1: Spearman's model of intelligence
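Spearman's two-factor model can be written as test score = (g loading × g) + specific part, which implies that the correlation between any two tests equals the product of their g loadings. The simulation below is a sketch that checks this implication; the three g loadings are made-up values, and the scores are standardised random numbers rather than real test data.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_two_factor(g_loadings, n_people=20000, seed=1):
    """Simulate standardised scores: score_j = l_j*g + sqrt(1 - l_j**2)*s_j.

    The sqrt term scales the specific factor so each test has unit variance.
    """
    rng = random.Random(seed)
    tests = [[] for _ in g_loadings]
    for _ in range(n_people):
        g = rng.gauss(0, 1)                      # general factor shared by every test
        for j, loading in enumerate(g_loadings):
            s = rng.gauss(0, 1)                  # specific factor unique to test j
            tests[j].append(loading * g + math.sqrt(1 - loading ** 2) * s)
    return tests


if __name__ == "__main__":
    loadings = [0.9, 0.8, 0.6]                   # hypothetical g loadings
    tests = simulate_two_factor(loadings)
    for j in range(len(loadings)):
        for k in range(j + 1, len(loadings)):
            predicted = loadings[j] * loadings[k]
            observed = pearson(tests[j], tests[k])
            print(f"tests {j + 1} & {k + 1}: predicted {predicted:.2f}, observed {observed:.2f}")
```

Because the specific factors are independent of one another, they contribute nothing to the correlations between tests; in this model, all shared variance runs through g.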

5.3.2 Thurstone’s Multidimensional Model: Primary Mental Abilities
In contrast to the single general factor emphasised by Spearman, the factor analysis
theory of Thurstone (1938) suggested that human intelligence is not based on
one single factor; instead, it comprises seven different factors, known as
primary mental abilities. This provided the impetus for a multidimensional model
to conceptualise intelligence.
Thurstone's primary mental abilities are shown in Table 5.2.
Table 5.2: Primary Mental Abilities
Primary Mental Ability | Description
Verbal understanding | Measured by using vocabulary tests.
Verbal fluency | Measured by using a timed test that requires test takers to think of as many words as they can beginning with certain letters given by the tester.
Inductive reasoning | Measured by using reasoning tests, such as analogies and number-series completion tasks.
Spatial visualisation | Measured by using tests that require test takers to mentally transform objects.
Numbers | Measured by using calculation and simple mathematical problem-solving tasks.
Memory | Measured by using tasks that require the recall of pictures and words.
Speed of perception | Measured by using tests that require test takers to recognise small differences between pictures, or to cross out the letter a wherever it appears in a list of words.

5.3.3 Guilford’s Structure of Intellect Model

Guilford (1967) suggested a total of 150 factors that contribute to the construction
of the human structure of intellect. According to Guilford, human intelligence
can be modelled using a cube with three dimensions, namely operations, contents
and products, which Guilford further explained as follows:
(a) Operation is a mental process, consisting of cognition, memory, divergent
production, convergent production and evaluation (making judgements);
(b) Content is the kind of information present in a problem, such as symbolic,
semantic, behavioural, auditory and visual content; and
(c) Product is the form of response required, such as a unit, class, relation,
system, transformation or implication.
With five operations, five contents and six products, the cube contains
5 × 5 × 6 = 150 cells, each representing one factor.
The cube representing Guilford's structure of intellect model is illustrated in
Figure 5.2.
Figure 5.2: Guilford's structure of intellect model

Source: http://www.instructionaldesign.org/theories/intellect.html
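Because the structure of intellect is a cube, its cells can be enumerated as every combination of one operation, one content and one product. This sketch assumes the five operations (cognition, memory, divergent production, convergent production and evaluation), five contents and six products of the 150-cell version of the model; the helper `describe` is our own illustration of the "operation of content product" style of labelling, such as "cognition of semantic units".

```python
from itertools import product

OPERATIONS = ["cognition", "memory", "divergent production",
              "convergent production", "evaluation"]
CONTENTS = ["visual", "auditory", "symbolic", "semantic", "behavioural"]
PRODUCTS = ["units", "classes", "relations", "systems",
            "transformations", "implications"]

# Every cell of the cube is one (operation, content, product) combination.
CELLS = list(product(OPERATIONS, CONTENTS, PRODUCTS))


def describe(cell):
    """Label a cell in the style 'cognition of semantic units'."""
    operation, content, product_ = cell
    return f"{operation} of {content} {product_}"


if __name__ == "__main__":
    print(len(CELLS))                                    # 150
    print(describe(("cognition", "semantic", "units")))  # cognition of semantic units
```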

5.3.4 Cattell’s Hierarchical Model: CHC Model

A model considered to be the most parsimonious in explaining the human mind
is the hierarchical model proposed by Cattell (1971), which was later
expanded by his student, Horn. Through the hierarchical model, Cattell
suggested that general intelligence comprises two main sub-factors:
(a) Fluid Intelligence

Speed and accuracy of abstract reasoning, especially involving problems
that have never been encountered before.
(b) Crystallised Intelligence

The accumulated knowledge and vocabulary.
Similar models have also been proposed by Vernon (1971) and Carroll (1993).
Carroll's model is similar to Cattell's. Using a total of 460 sets of data
collected since 1927, involving 130,000 individuals from various strata of
society across several countries that use English as their medium of
instruction, Carroll was able to map out his hierarchical model of intelligence.
According to Carroll, human intelligence comprises three strata, as shown
in Table 5.3.
Table 5.3: Three Strata of Human Intelligence
Stratum of Human Intelligence | Description
Stratum I | Includes specific abilities (for example, spelling ability and speed of reasoning).
Stratum II | Consists of various broad abilities (for example, fluid intelligence and crystallised intelligence).
Stratum III | Consists of a single general ability similar to Spearman's conception of "g".

Apart from fluid and crystallised intelligence, Carroll also placed learning and
memory processes, visual perception, auditory perception, idea generation and
speed (of both cognitive processing and response) in Stratum II.
Although Carroll did not propose entirely new abilities, he integrated a large
body of factor-analytic research on intelligence, which gives his model
considerable authority.
The combination of the Cattell-Horn theory and Carroll's three-stratum theory
of intelligence is referred to as the Cattell-Horn-Carroll (CHC) model of
cognitive abilities.
5.3.5 Gardner’s Theory of Multiple Intelligence

Gardner proposed the theory of holistic intelligence, known as the theory of
multiple intelligence, which considers human intelligence as comprising of
multiple intelligence, all of which combine to form intelligence and is not merely
made up of one single construct. GardnerÊs multiple intelligence theory
suggested that each ability is a separate intelligence and not a part of the whole
intelligence.
Gardner listed seven types of intelligence and the tasks that reflect the related
intelligence. The seven independent frames of mind or forms of intelligence are:
(a) Linguistic;
(b) Logical-mathematical;
(c) Musical;
(d) Spatial;
(e) Bodily-kinaesthetic (skilled motor performance);
(f) Intrapersonal intelligence (e.g. Mahatma Gandhi); and
(g) Interpersonal intelligence (e.g. Lyndon Johnson).

Gardner also proposed eight signs that serve as criteria for detecting the
existence of a distinct type of intelligence, as shown in Table 5.4.
Table 5.4: Eight Signs to Detect the Existence of Various Types of Intelligence
No.  Description
1.   Potential isolation by brain damage: damage to a discrete location in the brain
     (for example, the area associated with verbal aphasia) impairs one kind of
     intelligent action or, in contrast, leaves it intact.
2.   Existence of individuals with exceptional abilities (for example, in music or
     mathematics) who show high ability, or in contrast, a handicap in intelligent
     action in the related field.
3.   An identifiable core operation or set of operations (for example, the ability to
     identify relations among musical notes) that is considered necessary to
     perform a given type of intelligent action.
4.   A distinctive developmental history that takes individuals from the novice level
     to the master level, passing through clearly defined levels of expert
     performance.
5.   An evolutionary history, in which an increase in the intelligence can logically
     be related to increased adaptation to the environment.
6.   Support from experimental-cognitive studies, such as differences in
     performance on specific tasks across separate types of intelligence, together
     with similarities in performance across and within tasks belonging to the same
     intelligence.
7.   Support from psychometric test results that show discrete intelligences.
8.   Susceptibility to encoding in a symbol system (for example, language,
     mathematics, musical notation) or in an area of cultural creativity (for
     example, dance, athletics, theatre, engineering and surgery).
Having discussed five popular models and theories of intelligence, we now turn
to two major intelligence tests, which are explained in detail in the following
sections: the Stanford-Binet Intelligence Scale and the Wechsler scales.

SELF-CHECK 5.2
1. Do additional readings on the different models and theories which define
   human intelligence. Compare all the models that you discover to find out
   their similarities and differences.
2. Discuss the theories of intelligence presented previously in this section.
5.4 THE STANFORD-BINET INTELLIGENCE SCALE
Based on the scale developed by Binet and Simon in France, Lewis Terman at
Stanford University constructed an early version of the intelligence test known as
the Stanford-Binet Intelligence Scale. The original Binet-Simon scale was
constructed to identify mental retardation amongst children. Its earliest version,
published in 1905, was an individual test consisting of 30 items arranged
according to level of difficulty. However, this test had several disadvantages:
(a) It did not have a suitable measurement unit to explain the test results;
(b) It did not have enough normative data to support validity; and
(c) The norms were only based on 50 children who were considered normal
according to school performance.
The second version of the Binet-Simon scale, revised in 1908, introduced the
concept of an age scale: items were grouped according to age levels rather than
difficulty levels. However, its weakness was that it did not cover a varied range
of abilities, as the scale consisted only of language, reading and verbal skills.
Nevertheless, this version introduced the concept of mental age, and the norm
sample was increased to 203.

Terman's 1916 Stanford revision of the scale increased the sample size further.
However, the sample was not representative because it consisted of Caucasian
children in California. This version was the first to use the concept of IQ, which
is calculated from the mental and chronological ages as explained in the previous
section: the mental age obtained from the test scores is divided by the
chronological age and multiplied by 100 (IQ = MA/CA × 100).
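To make the ratio IQ formula concrete, here is a minimal Python sketch. The function name `ratio_iq` and the choice to express both ages in months are illustrative assumptions, not part of the Stanford-Binet materials.

```python
def ratio_iq(mental_age_months: float, chronological_age_months: float) -> float:
    """Ratio IQ = (MA / CA) x 100, with both ages in the same unit (months here)."""
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    return mental_age_months / chronological_age_months * 100


# Hypothetical child: chronological age 8 years (96 months) whose test
# performance matches that of a typical 10-year-old (mental age 120 months).
print(ratio_iq(120, 96))  # 125.0
```

A child whose mental age equals his or her chronological age always obtains a ratio IQ of 100; performing above the level expected for one's age pushes the ratio above 100.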
The 1937 version widened the age range down to two years and raised the
maximum mental age to 22 years, 10 months. The standardisation sample was
increased to 3,184. This version also introduced two alternate forms, Forms L
and M, which were designed to be equivalent in difficulty and content. With two
such forms, the psychometric properties of the scale could be readily examined.
The 1960 version introduced a standard score (the deviation IQ) with a mean of
100 and a standard deviation of 16. Its norms were based on a representative
sample of 2,100 children.
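A standard score of this kind expresses a test taker's standing relative to his or her own age group. The sketch below assumes a simple linear conversion through a z score; actual test manuals use published norm tables instead, and the age-group mean and standard deviation in the example are hypothetical.

```python
def deviation_iq(raw_score: float, age_group_mean: float, age_group_sd: float,
                 scale_mean: float = 100.0, scale_sd: float = 16.0) -> float:
    """Convert a raw score to a standard score with mean 100 and SD 16."""
    if age_group_sd <= 0:
        raise ValueError("standard deviation must be positive")
    # Distance of the raw score from the age-group mean, in standard deviation units.
    z = (raw_score - age_group_mean) / age_group_sd
    return scale_mean + scale_sd * z


# Hypothetical norms: the child's age group has a mean raw score of 50 and an SD of 8.
print(deviation_iq(62, 50, 8))  # z = 1.5, so 100 + 16 * 1.5 = 124.0
```

Unlike the ratio IQ, this score keeps the same meaning at every age, because each test taker is compared only with others of the same age.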
The modern Stanford-Binet Intelligence Scale was introduced with the 1986
revision, which incorporated the theory of fluid and crystallised intelligence
(gf-gc). Items in this version are arranged according to the three-level
hierarchical model shown in Figure 5.3.
Figure 5.3: Three-level hierarchical model in The Modern Binet Scale

The modern Stanford-Binet Intelligence Scale eliminates the age scale; instead,
items are grouped by content. The test is administered adaptively: the test
taker's score on the vocabulary test, together with his or her chronological age,
is used to determine the starting level for the remaining tests.
In addition, the basal level must be determined, which is the lowest level at
which the test taker correctly answers both items of the same difficulty level in
succession. The ceiling level is then determined, which is the point at which the
test taker fails at least three out of four consecutive items.
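The basal and ceiling rules described above can be read as simple stopping rules. The sketch below follows only the wording of this section, assuming two items per difficulty level and items listed from easiest to hardest; it is a simplification, and the official scoring rules are those given in the test manual. The function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

# One difficulty level = two items; True means the item was answered correctly.
Level = Tuple[bool, bool]


def basal_level(levels: Sequence[Level]) -> Optional[int]:
    """1-based number of the lowest level at which both items were passed."""
    for number, (first, second) in enumerate(levels, start=1):
        if first and second:
            return number
    return None  # no basal established


def ceiling_item(levels: Sequence[Level], window: int = 4,
                 min_failures: int = 3) -> Optional[int]:
    """1-based item position at which at least 3 of the last 4 consecutive items were failed."""
    items = [correct for level in levels for correct in level]
    for end in range(window, len(items) + 1):
        recent = items[end - window:end]
        if recent.count(False) >= min_failures:
            return end
    return None  # ceiling not reached


# Hypothetical record for six levels, easiest first.
record = [(True, False), (True, True), (True, True),
          (False, True), (False, False), (False, True)]
print(basal_level(record), ceiling_item(record))  # 2 10
```

In this hypothetical record, level 2 is the first level with both items passed, and item 10 is the first point at which three of the last four items were failed, so testing on this content area would stop there.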
The standardisation sample consisted of 5,000 subjects from 47 states in the
USA, selected by stratifying on geographical location, community size, ethnic
group, age and gender.
The reported reliability of the scale is good. Internal consistency, estimated
with the Kuder-Richardson Formula 20 (KR-20), exceeds .90, a level that is
necessary when scores are used to make decisions about individuals. Test-retest
reliability coefficients were .91 for five-year-old subjects and .90 for
eight-year-old subjects.
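KR-20 applies to items scored right (1) or wrong (0). A minimal sketch of the usual formula, KR-20 = k/(k − 1) × (1 − Σpq / σ²), is shown below, where k is the number of items, p the proportion passing each item, q = 1 − p and σ² the variance of total scores. It assumes the population variance of total scores (some texts use the sample variance), and the response data are hypothetical.

```python
from statistics import pvariance
from typing import Sequence


def kr20(responses: Sequence[Sequence[int]]) -> float:
    """Kuder-Richardson Formula 20 for dichotomously scored items.

    responses[i][j] is 1 if test taker i answered item j correctly, otherwise 0.
    """
    n_people = len(responses)
    n_items = len(responses[0])
    if n_people < 2 or n_items < 2:
        raise ValueError("need at least two test takers and two items")

    # Variance of the total scores (population variance in this sketch).
    totals = [sum(person) for person in responses]
    total_variance = pvariance(totals)
    if total_variance == 0:
        raise ValueError("total scores do not vary, so KR-20 is undefined")

    # Sum of p * q over items: p = proportion passing the item, q = 1 - p.
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(person[j] for person in responses) / n_people
        sum_pq += p * (1 - p)

    return (n_items / (n_items - 1)) * (1 - sum_pq / total_variance)


# Hypothetical data: 4 test takers x 3 items (1 = correct).
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(kr20(scores))  # 0.75
```

A real test such as the Stanford-Binet has far more items and test takers; with more items measuring the same ability, KR-20 tends to rise towards the .90 level reported above.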
5.5 THE WECHSLER SCALES

Wechsler (1939) defined intelligence as "the aggregate or global capacity of the
individual to act purposefully, to think rationally and to deal effectively with his
environment".
The Wechsler Intelligence Scales are the most commonly used tests in our country
for measuring an individual's level of intelligence. There are three Wechsler
scales of intelligence, as shown in Figure 5.4.
Figure 5.4: The three Wechsler Intelligence Scales

These three scales measure intelligence of different age groups:
(a) WAIS-IV is for individuals aged 16 to 90 years;
(b) WISC was constructed in 1949 to measure the general intelligence of
    children between five and 15 years old. Its latest version, the WISC-IV,
    has norms from six years to 16 years 11 months; and
(c) WPPSI-III is for children from 2 years 6 months to 7 years 3 months.
In general, all three Wechsler scales yield three types of scores: a verbal score,
a performance score and a total score. The verbal score is obtained from subtests
such as vocabulary and verbal similarities, while the performance score is
obtained from subtests such as picture completion and picture arrangement. The
total score combines the verbal and performance scores.
Like Binet, Wechsler assumed that human intelligence is broader than what the
test measures. Although Wechsler believed in intelligence assessment, he did not
limit the conception of intelligence to intelligence test scores. He regarded
intelligence as the basis of human life: individuals use intelligence not only to
sit for intelligence tests or complete school work, but also to interact with other
people, perform tasks effectively and manage their daily lives. In this view, the
assessment of intelligence is only one of several theoretical approaches to, and
areas of research on, intelligence.
All three Wechsler scales have good norms. Split-half reliability of the total
(full-scale) IQ is more than .95, while the reliabilities of the verbal IQ and
performance IQ are each in the range of .90 to .95. The validity of the WAIS is
also satisfactory: many studies correlating the WAIS-III with other intelligence
tests and with academic performance have shown good criterion validity.
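The text reports split-half reliability without showing how it is obtained. A common procedure is sketched below, assuming an odd-even split of the items and the Spearman-Brown correction (test manuals may split items differently): correlate the two half-test scores, then step the correlation up to estimate the reliability of the full-length test. The data and function names are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def spearman_brown(r_half: float) -> float:
    """Step up a half-test correlation to estimate full-length reliability."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(responses: Sequence[Sequence[int]]) -> float:
    """Odd-even split: correlate the two half scores, then apply Spearman-Brown."""
    odd_half = [sum(person[0::2]) for person in responses]   # items 1, 3, 5, ...
    even_half = [sum(person[1::2]) for person in responses]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd_half, even_half))


# Hypothetical data: 4 test takers x 4 items (1 = correct).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(split_half_reliability(scores), 3))  # 0.9
```

Here the two halves correlate at about .82, and the Spearman-Brown step raises the estimate to .90, because a full-length test is more reliable than either half on its own.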
SELF-CHECK 5.3
1. List the two standardised tests of intelligence and discuss their
   psychometric properties.
2. Find some examples of items of intelligence tests listed in this topic.

5.6 INTELLIGENCE TESTS IN THE MILITARY

Although it is still uncommon in our country to administer proper intelligence
tests to measure the intelligence levels of army personnel, the measurement of
intelligence in the military has a long history in the West.
In this section, we will explore intelligence testing and measurement in the
military as practised in the USA, and compare it with the popular intelligence
tests discussed so far to identify their similarities and differences.
5.6.1 Brief History

The measurement of intelligence forced its way into the public consciousness of
Americans during World War I, when some 1.7 million US recruits were tested
by the army under the direction of Col. Robert M. Yerkes. The findings provided
the first large-scale evidence from the "science of mental testing" that
American-born blacks and some of the foreign-born draftees scored lower on
intelligence tests than American-born whites.
After the war, the army's system of scoring was translated into mental age levels
and the results were made public. According to the scales and the method of
calculation used then, it was estimated that the average army draftee had a
mental age of about fourteen years. These tests initiated a debate that has gone
on ever since. What is intelligence? Can it be measured?
5.6.2 The Army Alpha Tests

The army had no intention of committing itself to a definition of intelligence. To
achieve the goal of classifying recruits quickly (weeding out the "feeble-minded"
and identifying candidates for officers' training), the army asked a
committee of psychologists to assemble a series of tests by drawing on the
different existing systems, including the Stanford-Binet test.
The committee tried out a series of tests in a few camps, timing the participants.
The number of test items and the time limits were then fixed so that only about
five percent of an average group would be able to finish the entire test in the time
allowed.

This cut-off determined the "A" man, a man supposedly of "very superior
intelligence". Between 100 and 200 men were ordered to report for testing at a
time. After a five-minute literacy test, those who could not read or write English
were withdrawn, and the rest were given pencils and printed forms of the Army
Group Examination Alpha. A senior officer stood at the front of the room and
read the general directions only once. Then, the men were given the tests.
5.6.3 The Army Beta Tests

While the Alpha tests were devised for literate, English-speaking recruits, the
Beta tests were devised to compensate for language differences among groups of
poorly educated soldiers.
The Beta tests were constructed so that the directions could be given in
pantomime. Test I, for example, was a maze. An assistant demonstrated by
tracing through a sample maze on a blackboard at the front of the room with a
piece of chalk. When he purposely went into a blind alley and crossed over a line,
the officer shook his head, said, "No, no" and took the demonstrator's hand back
to the place where he could get on the right track again. Then, he traced an
imaginary line with his finger through each maze on the sheet and said, "All
right. Go ahead. Do it. Hurry up". Speed was emphasised as orderlies walked
about the room motioning to men who were not working and telling them to "Do
it. Do it. Hurry up, quickly".
5.6.4 Various Related Issues

However, there were various issues regarding the implementation and
development of these intelligence tests, as detailed in the following:
(a) Flawed Test

The Beta test came under criticism and was not as successful as the Alpha.
For example, the Beta test taker was expected to know what was missing in
a picture of an electric light bulb without the filament or a tennis game
without a net. For many recruits in 1917 and 1918, however, electricity was
not available in their homes and tennis was a sport for the well-to-do.
Despite the flaws in the test, the individual's score did affect the army
careers of many men. Men who scored low were assigned to labour
battalions. In May 1918, Beta scores became the basis for putting men in
special development battalions for intensive training to see if there were
tasks that could be found for them in the army.

(b) Intelligence, Culture or Education

When psychological tests were first created early in the twentieth century,
little allowance was made for cultural or educational differences. Such tests
were developed to find out what kept children from learning and progressing in
school. The committee that constructed the army tests believed at the time
that they were measuring innate intelligence, not abilities developed through
schooling.
However, test results were closely related to the amount of schooling a man
had received. College men were at the upper end of the scale, while the
majority of those who had not advanced beyond grade school were
concentrated in the middle and lower end.
In the uproar that followed the publication of the test results, Lewis M.
Terman, the creator of the Stanford-Binet tests, pointed out that the mental
age standards for the army were established by giving both the Alpha and
the Beta tests to groups of schoolchildren. It came as no surprise to test
critics that the average fourteen-year-old student in school did as well as or
a little better than soldiers who on average had less formal education.
(c) Immigration Controversies

After the war, the scores of recent Polish, Russian, Jewish and Italian
immigrants in the United States were well below the scores of the
thoroughly acculturated immigrants from England and Western Europe.
This further fuelled the arguments of those professing that the new
immigrants were genetically inferior.
Members of the Eugenics Research Association and members of the House
Committee on Immigration and Naturalisation of the United States
Congress claimed that the tests had taken the national debate about
immigration, which had simmered during and after the war, "out of
politics" and placed it on "a scientific basis". In 1924, Congress
passed a law restricting the total number of immigrants, favouring those
from northern and western Europe. Immigration from the European
continent had become partitioned by geography.
(d) The Army Intelligence Tests: A Sample

During the war, the nature of the army's intelligence tests was a military
secret. Anyone caught revealing their contents faced a $10,000 fine, a
two-year prison term or both.

However, the March 1919 issue of The American Magazine carried what it called
a specimen set of the Army Alpha test under the heading "Try These Tests on
Yourself and Others":
With your pencil, make a dot over any one of these letters FGHIJ, and a
comma after the longest of these three words: boy mother girl. Then, if
Christmas comes in March, make a cross right here ... but if not, pass along
to the next question, and tell where the sun rises. If you believe that Edison
discovered America, cross out what you just wrote, but if it was someone
else, put in a number to complete this sentence: "a Horse has ... feet."
The entire sample took the average adult 125 seconds to answer. Fifty percent of
average educated adults took between 100 and 150 seconds. Those who took less
than 100 seconds were ranked in the superior 25 percent, while those who took
more than 150 seconds were placed in the poorest 25 percent. No one taking the
test scored the maximum. Scores were ranked according to the scale shown in
Table 5.5:
Table 5.5: Score Ranking
Ranking                 Points Right
A    Very Superior      135–212
B    Superior           105–134
C+   High Average       75–104
C    Average            45–74
C-   Low Average        25–44
D    Inferior           15–24
D-   Very Inferior      0–14
Source: Evelyn Sharp (1972)
An E rating was reserved for those who were considered unfit for duty because
of mental inferiority and who were then discharged from the army (about 0.5
percent).
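Table 5.5 is essentially a lookup from a points-right score to a rating band. A minimal sketch of that lookup is given below; the names `ALPHA_RATINGS` and `alpha_rating` are hypothetical. The E rating is left out because, as described above, it was assigned on the basis of fitness for duty rather than a points range.

```python
from typing import Tuple

# (lowest points, highest points, rating, description), taken from Table 5.5
ALPHA_RATINGS = [
    (135, 212, "A", "Very Superior"),
    (105, 134, "B", "Superior"),
    (75, 104, "C+", "High Average"),
    (45, 74, "C", "Average"),
    (25, 44, "C-", "Low Average"),
    (15, 24, "D", "Inferior"),
    (0, 14, "D-", "Very Inferior"),
]


def alpha_rating(points_right: int) -> Tuple[str, str]:
    """Return the (rating, description) band from Table 5.5 for a points-right score."""
    for low, high, rating, description in ALPHA_RATINGS:
        if low <= points_right <= high:
            return rating, description
    raise ValueError("score must be between 0 and 212")


print(alpha_rating(140))  # ('A', 'Very Superior')
print(alpha_rating(60))   # ('C', 'Average')
```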

SELF-CHECK 5.4
1. What are the abilities or qualities required for a person to qualify for an
   American military exam?
2. Compare the army intelligence tests with other intelligence tests.
5.7 INTELLIGENCE TESTS ISSUES

Based on the perspectives of current cognitive psychologists, there are two issues
that need to be given consideration regarding intelligence tests:
(a) Should researchers interested in intelligence focus on the structure of
    intelligence or on the processes behind intelligent behaviour?
(b) What is the foundation of intelligence: genetic heredity, attributes
    acquired through interaction with the environment, or an interaction
    between the two?
ACTIVITY 5.2
Think about the issues highlighted above regarding intelligence tests and
discuss them further with your face-to-face tutor and e-tutor on the MyVLE
forum.
This section discusses issues related to human intelligence, specifically the
possibility of using knowledge obtained from studies on intelligence in efforts to
increase human intelligence. Various other issues related to human intelligence
are highlighted as well.
5.7.1 Can Intellectual Abilities be Increased?

Some views hold that intellectual abilities are determined by genetics and that
the human brain cannot be changed to increase them. However, researchers who
are authorities on human intelligence and who have wide experience in
implementing programmes aimed at increasing human intelligence, particularly
among children, take a different view.

Drawing on various studies of intervention programmes, Detterman and
Sternberg (1982; in Iran Herman & Muhamed Awang, 1999) and Sternberg
(1996) provided evidence that human intelligence is malleable, meaning that it
can be modified and even increased through various kinds of intervention.
For example, the Head Start programme was implemented in the United States to
increase the intellectual capabilities and performance of preschool children.
Studies intended to evaluate its effectiveness showed that by middle adolescence,
children who participated in the Head Start programme from the beginning
obtained a performance level of one grade higher than children in the control
group who did not participate in the programme (Lazar & Darlington, 1982;
Zigler & Berman, 1983). Children who participated in the programme also
showed higher scores on various performance tests in school, did not require
remedial attention and showed fewer symptoms of behavioural problems.
Although these school performance measures are not actual measurements of
intelligence, they correlate positively and strongly with intelligence tests.
Apart from Head Start, several other programmes have also shown encouraging
success in increasing the intellectual abilities of children. One of them is the
Instrumental Enrichment programme, which involves training in various
abstract reasoning skills and appears to be effective in improving these skills in
children with mental retardation. Another programme, Philosophy for Children
(Lipman, Sharp & Oscanyan, 1980), has succeeded in teaching logical thinking
skills to children at primary and secondary school levels.
In addition, several aspects of the Intelligence Applied programme (Sternberg,
1994), which aims to teach intellectual abilities, have proven effective in
increasing insight skills (Davidson & Sternberg, 1984) and the ability to learn the
meanings of words from context, which is one way of acquiring new vocabulary
(Sternberg, 1994).
Several research programmes have aimed to enrich the living environment in
order to increase people's intellectual abilities, especially those of children.
Support for the importance of the living environment for children's intellectual
development was shown in a study by Bradley and Caldwell (1984; in Iran
Herman & Muhamed Awang, 1999), which found that several factors in the
preschool environment were correlated with IQ scores. These factors were:
(a) Emotional and verbal responsiveness of the closest caretakers;
(b) Their involvement with children;
(c) Avoidance of restriction and punishment;
(d) Organisation of the physical environment and scheduled activities;
(e) Provision of play materials; and
(f) Opportunities for variety in daily stimulation.
Bradley and Caldwell (1984; in Iran Herman & Muhamed Awang, 1999) also found
that these variables predicted IQ scores more effectively than socioeconomic
status did. More recent studies by Pianta and Egeland (1994) suggest that factors
such as social support and interactive behaviour play an important role in
determining the stability of scores on intellectual ability tests among children
between two and eight years old.
5.7.2 Culture and Intelligence

Research data should not be interpreted as evidence that demographic variables
do not influence IQ scores. On the contrary, throughout human history and
across cultures, many groups of people have been placed in the lowest social
order. In several cultures, people in the lowest strata (for example, the Maoris in
New Zealand compared with immigrants from Europe) show differences in
intelligence and aptitude test scores (Steele, 1990; Zeidner, 1990). The same is
true of the Burakumin in Japan, who were emancipated but never fully accepted
in Japanese society. However, Burakumin who migrated to the United States of
America showed IQ scores and school performance similar to those of other
Japanese Americans (Ogbu, 1986; in Iran Herman & Muhamed Awang, 1999).
5.7.3 Genetic versus Environment

Although genetic factors set the upper limit of intelligence, there is evidence
that the environment (Reed, 2000; Sternberg & Wagner, 1994), motivation
(Collier, 1994; Sternberg & Ruzgis, 1994) and training (Feuerstein, 1980;
Sternberg, 1994) also influence intellectual abilities.
This means that an individual's intelligence can be developed within a wide
range of potential. This view further suggests that we can still hope to help
individuals improve their intelligence, as each individual has yet to reach his or
her highest intellectual potential. This, in turn, creates the need for training and
schooling.

5.7.4 Use of IQ Score

Since intelligence tests are most closely associated with measures of achievement
in school, why can't we do away with these tests and rely solely on achievement
tests?
This is because many people believe that achievement test scores do not have the
same meaning as intelligence test scores. We tend to view an IQ score as
reflecting a general ability and hence, as having wider implications. We may
conclude that a person with low achievement test scores should have studied
harder in school, but we are likely to view a person with a low IQ score as being
less capable and by implication, a less worthy individual (Janda, 1998).
Brody (1992) argues that although a student's file containing both intelligence
and achievement test results holds some redundant information, the intelligence
test score offers a fairer standard for making decisions. This is because not all
students have had the same educational experiences.
In his defence of intelligence tests, Brody (1992) states that there is not another
single index that is as predictive of socially important outcomes as are tests of
general intellectual ability. Large numbers of professionals in educational and
clinical settings believe that these tests are useful in the decision-making process.
They also believe that without such tests, it would be impossible to conduct the
research necessary to expand our knowledge of intelligence and to learn more
about how we might maximise a person's potential (Janda, 1998).
ACTIVITY 5.3
1. After reading the issues related to intelligence tests discussed in
   Section 5.7, relate the issues to the Malaysian context.
2. Do additional readings to identify any other possible psychological
   issues related to intelligence tests which need to be focused on.

• Intelligence can be defined as an individual's capacity to learn from
  experience and the ability to use metacognitive processes to increase learning
  and adapt oneself to situations in the environment that may involve
  adaptation to different social and cultural contexts.
• Intelligence quotient, or IQ for short, is a ratio obtained by dividing the mental
  age (MA) by the chronological age (CA) and multiplying the result by 100.
• Factor analysis is a statistical method used to break down the construct of
  intelligence into several hypothetical factors or abilities believed to underlie
  basic individual differences in performance on intelligence tests.
• The Two-Factor Theory of Intelligence, Multidimensional Model, Structure of
  Intellect Model, CHC Model and Theory of Multiple Intelligence are among
  the popular models defining the concept of intelligence, which form the
  foundations for the construction of intelligence tests and measurement tools.
• Two major tests of intelligence which are widely used are the Stanford-Binet
  intelligence scale and the Wechsler scales of intelligence.
• The Army Alpha tests and the Army Beta tests are two intelligence tests
  initially used in the US military during World War I, but they raised many
  critical issues.
• The possibility of improving intellectual abilities, genetics versus
  environment, the use of IQ test results and cultural factors are among the
  issues related to intelligence tests that have sparked much debate.
Army Alpha tests
Army Beta tests
Chronological age
Crystallised intelligence
Fluid intelligence
Factor analysis
Intelligence quotient
Mental age
Multiple intelligence
Primary mental abilities
Spearman's "g" factor theory
Stanford-Binet intelligence scale
Structure of intellect model
Wechsler scales

102  TO
OPIC 5 INTELLLIGENCE TEST
Binet, A., & Simon, T. (1916). The intelligence of the feeble-minded. Baltimore,
MD: Williams & Wilkins.
Brody, N. (1992). Intelligence. San Diego, CA: Academic Press.
Das, J. P. (1973). Cultural deprivation and cognitive competence. In N. R. Ellis
(Ed.), International review of research in mental retardation. New York:
Academic Press.
Freeman, F. S. (1955). Theory and practice of psychological testing. New York:
Holt.
Gardner, H. (1983). Frames of mind: The theory of multiple intelligences. New
York: Basic Books.
Guilford, J. P. (1967). The nature of human intelligence. New York: McGraw-Hill.
Janda, L. H. (1998). Psychological testing: Theory and applications. Boston, MA:
Allyn and Bacon.
Sharp, E. (1972). The IQ cult. New York: Coward, McCann & Geoghegan.
Spearman, C. (1923). The nature of "intelligence" and the principles of cognition.
London, ENG: Macmillan.
Sternberg, R. J. (1986). Intelligence applied: Understanding and increasing your
intellectual skills. San Diego, CA: Harcourt Brace Jovanovich.
Sternberg, R. J. (1988). The nature of creativity: Contemporary psychological
perspectives. Cambridge, ENG: Cambridge University Press.
Structure of Intellect (Guilford, J. P.). (2013). Retrieved from
http://www.instructionaldesign.org/theories/intellect.html
Terman, L. M. (1916). The measurement of intelligence: An explanation of and a
complete guide for the use of the Stanford revision and extension of the
Binet-Simon intelligence scale. Boston, MA: Houghton Mifflin.
Thurstone, L. L. (1938). Primary mental abilities. Chicago, IL: University of
Chicago Press.
Wechsler, D. (1939). The measurement of adult intelligence. Baltimore, MD: The
Williams & Wilkins Company.

Topic  Ability,
6 Aptitude and
Achievement
Test
LEARNING OUTCOMES
1. Explain ability, aptitude and achievement tests;
2. Understand group tests and their advantages and disadvantages;
3. Describe the Multiple Aptitude Test Batteries and other specific
aptitude and achievement tests;
4. Analyse the other individual tests of ability in education and
special education; and
5. Discuss the issues in aptitude and achievement testing.
INTRODUCTION
In the previous topic, the theories of intelligence, popular intelligence testing
tools and the issues related to intelligence tests were discussed. In this topic, you
are going to learn about ability, aptitude and achievement tests. As these tests are
usually administered in groups, the issues related to group tests will be
highlighted as well. Furthermore, specific ability, aptitude and achievement tests
used in education, business and civil service settings will be introduced.
Towards the end of this topic, you will also learn about the various issues
concerning aptitude and achievement testing.

6.1 DEFINITION OF ABILITY, APTITUDE AND ACHIEVEMENT TESTS
Ability, aptitude and achievement tests are used as part of a sequence of
assessments to determine the giftedness of individuals and to identify their
strengths and weaknesses.
Ability tests are also known as aptitude or intelligence tests. These are
standardised batteries administered by qualified professionals that assess an
individual's overall thinking and reasoning abilities. The terms intelligence,
ability and aptitude are often used interchangeably to refer to behaviour that is
used to predict future learning or performance. However, subtle differences exist
between the terms, especially for intelligence tests.
Intelligence tests assess general intelligence. The Binet and Wechsler scales
introduced in Topic 5 are exceptionally good instruments for this. However, both
scales have limitations, one of which is that they cannot be used to assess a
person's special abilities.
Therefore, several individual tests have been created to meet special problems,
measure specific abilities or address the limitations of the Binet and Wechsler
scales (Kaplan & Saccuzzo, 2009). These are ability and aptitude tests and are
widely used in education and in particular, special education.
In the discussions that follow, both ability and aptitude tests are referred to as
"aptitude tests". As for the difference between aptitude and achievement tests,
aptitude tests tend to focus more on informal learning or life experiences,
whereas achievement tests tend to focus on the learning that has occurred as a
result of relatively structured input (Cohen & Swerdlik, 2010).
ACTIVITY 6.1
After doing additional readings, discuss in face-to-face tutorials and on the
MyVLE forum:
1. The difference and the relationship between the concepts of "aptitude"
   and "achievement".
2. Debate, based on your own opinion and understanding, how "intelligence"
   is related to the aptitude, ability and achievement of an individual.

6.2 STRUCTURES OF APTITUDE AND ACHIEVEMENT TESTS
In this section, the common structures of aptitude and achievement tests along
with an explanation of their attributes will be introduced.
6.2.1 Characteristics of Aptitude and Achievement Tests
Aptitude and achievement tests are designed to assess logical reasoning or
thinking performance. They typically consist of multiple-choice questions and are
administered under exam conditions. They are strictly timed; a typical test
might allow 30 minutes or more for 30 or so questions. The test taker's result is
then compared with that of a norm group so that judgements can be made about
his or her abilities.
Figure 6.1 shows the characteristics of aptitude and achievement tests.
Figure 6.1: Characteristics of aptitude and achievement tests

Source: www.scribd.com/doc/37565606/Aptitude-Tests

6.2.2 Methods of Test Administration

The test takers may be asked to answer the questions either on paper or online. The advantages of online testing include immediate availability of results and flexibility of location: in business and industrial settings, the test can be taken at an employment agency's premises or even at home. Because it is very cost-effective, online testing is particularly suitable for initial screening.
Figure 6.2 demonstrates the methods of test administration.
Figure 6.2: Methods of test administration

Source: www.psychometric-success.com/psychometric_tests/psychometric-aptitude-tests.htm
6.2.3 Speed Tests versus Power Tests

Aptitude and achievement tests can be classified as speed tests or power tests.
In speed tests, the questions are relatively straightforward, and the test is concerned with how many questions a test taker can answer correctly within the allotted time. In business and industry, speed tests tend to be used in selection at the administrative and clerical levels.
A power test, on the other hand, will present a smaller number of more complex
questions. For business and industry settings, power tests tend to be used more
at the professional or managerial levels.

6.2.4 The Contents

There are at least 5,000 aptitude and achievement tests on the market. Some contain questions that measure only one aspect (for example, verbal ability or numerical reasoning ability), while others are made up of questions that measure different aspects of aptitude and achievement, as shown in Figure 6.3.
Figure 6.3: Some aspects of measurement in aptitude and achievement tests

Source: www.psychometric-success.com/psychometric_tests/psychometric-aptitude-tests.htm

In Table 6.1, some of the common types of questions in aptitude and achievement
tests are explained in detail.
Table 6.1: Common Types of Questions
Verbal Ability: Includes spelling, grammar and the ability to understand analogies and follow detailed written instructions. These questions appear in most general aptitude tests to ascertain how well the test taker can communicate.
Numeric Ability: Includes basic arithmetic, number sequences and simple mathematics. In management-level tests, the test taker will often be presented with charts and graphs that need to be interpreted. These questions appear in most general aptitude tests because, in business settings for example, employers usually want some indication of a potential employee's ability to use numbers, even if this is not a major part of the job.
Abstract Reasoning: Measures the ability to identify the underlying logic of a pattern and then determine the solution. Abstract reasoning ability is believed to be the best indicator of fluid intelligence and of the ability to learn new things quickly; therefore, these questions appear in most general aptitude tests.
Spatial Ability: Measures the ability to manipulate shapes in two dimensions or to visualise three-dimensional objects presented as two-dimensional pictures. These questions are not usually found in general aptitude tests unless the job specifically requires good spatial skills.
Mechanical Reasoning: Designed to assess knowledge of physical and mechanical principles. Mechanical reasoning questions are used to select employees for a wide range of jobs in the civil service, including the military (Armed Services Vocational Aptitude Battery), police forces and fire services, as well as many craft, technical and engineering occupations.
Fault Diagnosis: Used to select technical personnel who need to be able to find and repair faults in electronic and mechanical systems. As modern equipment of all types becomes more dependent on electronic control systems (and arguably more complex), the ability to approach problems logically to find the cause of a fault is increasingly important.
Data Checking: Measures how quickly and accurately errors can be detected in data; used to select candidates for clerical and data input jobs.
Work Sample: Involves a sample of the work that the test taker will be expected to do. These tests can be very broad-ranging. They may involve exercises using a word processor or spreadsheet if the job is administrative, or they may include giving a presentation or in-tray exercises if the job is at management or supervisory level.
Source: http://www.psychometric-success.com/aptitude-tests/aptitude-tests-introduction.htm
6.2.5 The Test Scores

Scores on aptitude and achievement tests are compared with the results of a control group who have taken the tests in the past. This control group can consist of other graduates, current job holders or a sample of the population as a whole. The test takers' reasoning skills can then be assessed relative to this control group, and judgements can be made about their ability, as illustrated in Figure 6.4.
Figure 6.4: Test scores

Source: www.scribd.com/doc/37565606/Aptitude-Tests
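As a rough illustration of how a test taker's score might be located within a control group, the sketch below computes a z-score (distance from the group mean in standard-deviation units) and a percentile rank. The norm-group scores and the function name are hypothetical, and counting tied scores as half below is only one of several percentile-rank conventions; published tests use the norm tables in their manuals rather than a calculation like this.

```python
from statistics import mean, pstdev


def compare_with_norm_group(raw_score, norm_scores):
    """Locate one raw score within a (hypothetical) control group.

    Returns the z-score and the percentile rank, where the percentile rank
    is the percentage of the group scoring below, with ties counted as half.
    """
    group_mean = mean(norm_scores)
    group_sd = pstdev(norm_scores)  # population SD of the norm group
    z = (raw_score - group_mean) / group_sd
    below = sum(1 for s in norm_scores if s < raw_score)
    tied = sum(1 for s in norm_scores if s == raw_score)
    percentile = 100 * (below + 0.5 * tied) / len(norm_scores)
    return z, percentile


# Hypothetical raw scores (out of 30) from ten people who took the test earlier.
norm_group = [14, 16, 18, 18, 20, 20, 22, 22, 24, 26]
z, pr = compare_with_norm_group(22, norm_group)
print(f"z = {z:.2f}, percentile rank = {pr:.0f}")  # z = 0.58, percentile rank = 70
```

The same raw score of 22 would look quite different against a stronger control group, which is why the choice of comparison group (graduates, job holders or the general population) matters.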

SELF-CHECK 6.1
1. Give an example of a verbal reasoning question.
2. What type of aptitude question best serves the purpose of selecting an engineer?
6.3 GUIDELINES FOR TEST TAKERS

Test takers who sit for aptitude and achievement tests will always want to do their best to show that they are suitable for a potential job, or that they have made progress from the training or learning they have gone through. Several guidelines from the psychology testing and measurement perspective can help test takers prepare for and perform better in aptitude and achievement tests.
6.3.1 Ask the Right Questions

The first thing to do is to find out which types of questions will be asked in the test. Do not waste time practising questions that will not appear in the actual test.
6.3.2 Work Systematically

Spend the preparation time wisely. Most people find themselves with only one or two weeks to prepare for an aptitude or achievement test. Therefore, it is essential to work systematically by following the steps shown in Figure 6.5.

Figure 6.5: Steps to prepare for a test systematically

Source: http://www.psychometric-success.com/aptitude-tests/aptitude-tests-introduction.htm
6.3.3 Confirm If In Doubt

If test takers are applying for a job and are unsure what types of questions to expect, they should ask the human resources department of the organisation concerned. This will not count against them in any way, and the human resources personnel should be only too happy to give them a breakdown. Test takers have the right to prepare themselves for any tests they are asked to sit.

6.3.4 Do Not Make Assumptions

Try not to make any assumptions. For example, many people believe that they will not have any problems with verbal ability questions because they once got an "A" in English. They may have a point if they got the "A" a few months ago, but what if it was ten years ago? It is very easy to ignore the effects of not reading as much as one used to and of letting the spell-checker correct one's written English.
The same applies to numerical ability. Most people who have been out of education for more than a few years will have forgotten certain skills, such as how to multiply fractions and calculate volumes. While it is easy to dismiss these as "first grade" or elementary maths, most people simply do not do these things on a daily basis. So, do not assume anything; it is better to know for sure.
6.3.5 Decide on a Practice Strategy

Test takers should decide for themselves which types of questions to practise. They can either concentrate on their weakest areas or try to raise their score across all areas. Whichever strategy they choose, they should keep practising because, given the way aptitude tests are marked, even small improvements to the raw score can have a big impact on the chances of getting the job.
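To see why small raw-score gains can matter, suppose (purely as an assumption) that the control group's raw scores are roughly normally distributed with mean 20 and standard deviation 5. Near the middle of such a distribution many test takers are bunched together, so a couple of extra correct answers moves a candidate past a large share of them. The sketch uses Python's standard `statistics.NormalDist` (Python 3.8 or later); the mean, standard deviation and scores are hypothetical.

```python
from statistics import NormalDist

# Assumed (hypothetical) distribution of the norm group's raw scores.
norm_group = NormalDist(mu=20, sigma=5)


def percentile_rank(raw_score):
    """Approximate percentage of the norm group scoring below raw_score."""
    return 100 * norm_group.cdf(raw_score)


for raw in (20, 22, 30, 32):
    print(raw, round(percentile_rank(raw), 1))
```

Under these assumptions, moving from 20 to 22 raises the percentile rank from 50 to about 65.5, whereas moving from 30 to 32 raises it only from about 97.7 to 99.2.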
Whichever type of test is given, the questions are almost always presented in multiple-choice format and have definite correct and incorrect answers. As test takers proceed through the test, the questions may become more difficult, and they will usually find that there are more questions than they can comfortably complete in the time allowed. Very few people manage to finish these tests; the object is simply to give as many correct answers as possible.
ACTIVITY 6.2
Can you think of any disadvantage(s) of using multiple-choice questions for assessing individual applicants? Discuss your thoughts with your course mates.

6.4 GROUP TESTS

Some aptitude and achievement tests are administered individually; however, most are conducted in groups. Examples of group tests include the Multidimensional Aptitude Battery, the Cognitive Abilities Test and the Scholastic Assessment Tests.
Most people take a group-administered cognitive or achievement test at some point during their studies. Of the millions of cognitive tests administered to students annually, only a small fraction are individually administered (Cohen & Swerdlik, 2002). Because of their practicality, group tests are used across a variety of environments, including military, industrial/organisational and educational settings. Thus, group-administered tests have a broader application than individual tests (Aiken & Groth-Marnat, 2006).
6.4.1 Advantages of Group Tests

From their inception, group-administered tests have addressed some of the limitations inherent in individually administered tests. For example, because they use only printed materials and follow a standardised administration procedure, group-administered tests require far fewer financial and personnel resources than individually administered tests.
Most group-administered tests also have standardised and computerised scoring systems, which reduce the time required to score the protocols and minimise scoring errors.
Moreover, given the nature of the format, group-administered tests can be given
to as many students as can comfortably fit into a room, which reduces test
administration time and increases testing efficiency.
Finally, because a potentially unlimited number of students can take a group-administered test, its norms are often based on a much larger sample than those of individually administered tests. This allows for direct comparison of scores across selected demographic variables (for example, race and disability status), which might not be possible with individually administered tests.

Figure 6.6 summarises the advantages of group tests.
Figure 6.6: Advantages of group tests
6.4.2 Disadvantages of Group Tests

There are, however, a few important disadvantages to consider with group-administered tests. For example, the format does not allow for in-depth observation of individual students as they complete the test. Thus, behaviours and states that may interfere with performance, such as fatigue, low motivation, anxiety and hunger, go unobserved.

Like their individually administered counterparts, most group-administered tests consist of subtests that assess a variety of cognitive or academic domains, and they take the form of either timed (speed) or power tests. However, the response format for most group-administered tests is multiple-choice, which is less flexible and yields much less diagnostic information. For this reason, school-based group-administered tests are often used as screeners to determine whether further evaluation (often using an individually administered test) is warranted.
Furthermore, because the examiner may be less trained in the nuances of the test than those who administer individual tests, he or she may break standardisation and inadvertently (and inappropriately) answer students' queries, or may not be able to monitor the testing environment as closely as in individual testing.
Another limitation is the restriction of responses to multiple-choice answers, whereas items on many individually administered tests are scored at different levels depending on the complexity of the response. In this regard, group-administered items may unduly penalise creative or original thinkers.
Although the norm sample of a group-administered test may be large, it may not be representative of children from a particular demographic. For example, overseas, many group-administered cognitive and achievement tests are normed on students who take the test in the fall and in the spring. However, many students may choose not to take the test (when given a choice) or may not be motivated to perform their best (Aiken & Groth-Marnat, 2006).
Finally, the results of group-administered tests can be used inappropriately. For example, the data obtained from such tests may be used to diagnose students and place them in special programmes, decisions that should be made only on the basis of individually administered tests (Cohen & Swerdlik, 2002).

The disadvantages of group tests can be summarised as shown in Figure 6.7.
Figure 6.7: Disadvantages of group tests
After exploring some theoretical aspects of aptitude and achievement tests and
the issues related to group tests, we will move on to examine certain specific
aptitude and achievement testing tools.

6.5 MULTIPLE APTITUDE TEST BATTERIES

Multiple aptitude test batteries consist of a set of tests meant for general use, whereas special aptitude tests are used for special programmes.
The first multiple aptitude test battery was published in 1941 and was known as the Chicago Tests of Primary Mental Abilities. This battery was the direct outcome of Thurstone's factor analytic investigation. Thurstone's theory of intelligence centres on the existence of Primary Mental Abilities (PMA) and was in direct contrast with Spearman's theory of general intelligence.
Thurstone felt that differences in the results of intellectual tasks could be attributed to one or more of nine independent abilities. These nine abilities were named space, verbal comprehension, word fluency, number facility, induction, perceptual speed, deduction, rote memory and arithmetic reasoning. Some of these are explained below:
(a) Space PMA represents the ability to recognise that two shapes are the same when one has been rotated;
(b) Perceptual speed is the ability to recognise similarities and differences between pairs of stimuli;
(c) Verbal comprehension involves recognising synonyms and antonyms;
(d) Induction requires establishing a rule or pattern within a given set; and
(e) Deduction involves drawing a logical inference from a set of facts or premises.
Thurstone's theory was well supported by his early research with University of Chicago undergraduates. However, it did not hold up when he tested it with school-aged children. Apparently, the more intellectually elite subjects at the University of Chicago did not differ very much in their general intelligence, so their observable differences lay among the PMAs. The grade school children, on the other hand, were more diverse in their general intelligence. Therefore, the differences among their PMAs were not as notable as the differences in their general intelligence.

One of the most widely used multiple aptitude batteries is the Differential Aptitude Tests (DAT), developed by Bennett, Seashore and Wesman. The DAT was first published in 1947 and later revised in 1962 and 1974 (Bennett, Seashore & Wesman, 1974). It comprises the following eight subtests:
(a) Verbal reasoning;
(b) Numerical ability;
(c) Abstract reasoning;
(d) Mechanical reasoning;
(e) Clerical speed and accuracy;
(f) Space relations;
(g) Spelling; and
(h) Language usage.
6.6 GENERAL APTITUDE TEST BATTERY (GATB)
The General Aptitude Test Battery (GATB) was developed by the US Employment Service, which first published it in 1947 for use primarily in its employment offices for counselling and job placement. The GATB measures:
(a) Intelligence (G);
(b) Numerical aptitude (N);
(c) Verbal aptitude (V);
(d) Spatial aptitude (S);
(e) Form perception (P);
(f) Clerical perception (Q);
(g) Motor coordination (K);
(h) Finger dexterity (F); and
(i) Manual dexterity (M).

The GATB has been widely used in the employment service. Gradually, a
number of aptitude test batteries were developed for different purposes such as
the Flanagan Aptitude Classification Test (FACT) (Flanagan, 1964). This is a
multiple aptitude battery generally used for vocational counselling, rehabilitation
and occupational and employee selection. The resulting psychological profile is
used to determine appropriate career and training paths.
The GATB involves nine general aptitudes, measured by 12 separate subtests. These aptitudes are shown in Table 6.2.
Table 6.2: General Aptitude Test Battery (GATB)
General Learning Ability: Linked to the use of logic or scientific evidence to characterise problems and draw conclusions, make decisions and judgements, or plan and direct the work of others.
Verbal Aptitude: The skill to recognise the meaning of words and to use them effectively.
Numerical Aptitude: The capability to perform arithmetic operations quickly and accurately.
Spatial Aptitude: The skill to visualise geometric figures and to understand two-dimensional representations of three-dimensional objects.
Form Perception: The skill to perceive important details in objects or in pictorial or graphic materials.
Clerical Perception: The aptitude to observe significant details in verbal or tabular materials; the capability to observe distinctions in copy, to proofread words and numbers and to avoid perceptual errors in arithmetic calculation.
Motor Coordination: The skill to coordinate eyes and hands or fingers quickly and accurately in making precise movements with speed.
Finger Dexterity: The skill to move the fingers and manipulate small objects with them quickly and accurately.
Manual Dexterity: The capability to move the hands easily and skilfully.

Different jobs require different capabilities. Depending on the job requirements, these general aptitude tests may be used either individually or in combination.
When applicants apply for a job that requires most or all of these traits, they have to take the complete GATB, and data on their performance in the different areas is collected through the composite battery. For selection in particular areas or for particular occupations, only part of the GATB is administered.
For example, an applicant interested in an architectural job needs to score highly on the following four parts of the GATB:
(a) Computation;
(b) Three-dimensional space;
(c) Vocabulary; and
(d) Arithmetic reasoning.
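One simple way to make "scoring highly on these four parts" concrete is a multiple-cutoff rule: the applicant must reach a minimum score on every part the job requires, and other parts are ignored. The sketch below is illustrative only; the cut-off values, the score scale and the applicant's scores are hypothetical and are not GATB norms.

```python
# Hypothetical cut-off scores for the four GATB parts named above.
# These numbers are illustrative only, not official GATB norms.
ARCHITECTURE_CUTOFFS = {
    "Computation": 110,
    "Three-dimensional space": 120,
    "Vocabulary": 105,
    "Arithmetic reasoning": 110,
}


def meets_profile(scores, cutoffs):
    """Multiple-cutoff check: every required part must reach its cut-off.

    Returns (passed, shortfalls), where shortfalls maps each part the
    applicant fell short on to the number of points missing.
    """
    shortfalls = {
        part: cutoff - scores.get(part, 0)
        for part, cutoff in cutoffs.items()
        if scores.get(part, 0) < cutoff
    }
    return not shortfalls, shortfalls


applicant = {
    "Computation": 118,
    "Three-dimensional space": 112,
    "Vocabulary": 109,
    "Arithmetic reasoning": 115,
    "Finger dexterity": 95,  # not part of this job profile, so ignored
}
passed, gaps = meets_profile(applicant, ARCHITECTURE_CUTOFFS)
print(passed, gaps)  # False {'Three-dimensional space': 8}
```

A multiple-cutoff rule does not let a very high score on one part make up for a low score on another; a compensatory alternative would combine weighted part scores into a single composite.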
SELF-CHECK 6.2
1. How would you justify the use of individual tests over group tests?
2. What was the first multiple aptitude test battery to be published, and when?
6.7 DIFFERENTIAL APTITUDE TESTS (DAT)

Differential Aptitude Tests (DAT) are a widely used type of career aptitude test. They are considered a powerful tool for screening candidates for many kinds of jobs because they measure a candidate's aptitude in various areas, as shown in Table 6.3.

Table 6.3: Nine Different Areas of Measuring a Candidate's Aptitude
Verbal reasoning test: These tests generally involve grammar, verbal analogies and following detailed written instructions. They can also include spelling, sentence completion and comprehension.
Numerical ability test: Numerical aptitude tests are used by employers to assess a person's capability to carry out tasks involving the handling of numbers.
Abstract reasoning test: These tests assess a person's skill in analysing information and solving problems at a complex, conceptual level.
Mechanical reasoning: These tests evaluate a person's understanding of simple mechanical and physical concepts.
Space relations or spatial aptitude test: The space relations test evaluates a person's capability to visualise objects in three dimensions.
Spelling test: An evaluation of a person's (generally a student's) capability to spell words correctly.
Language usage test: The capability to use language is important in any job in which written or verbal communication is used.
Spatial aptitude test: Assesses a person's skill in manipulating shapes in two dimensions or visualising three-dimensional objects presented as two-dimensional pictures.
Perceptual speed and accuracy test: Evaluates the capability to work precisely with detail and at different speeds.
Differential aptitude testing offers eight sets of questions based on different aptitudes. It consists of multiple-choice questions, and test takers are required to select the correct option within a set time limit of 12 to 25 minutes for each test.
The DAT forms part of many job aptitude assessments because it tests an individual across a broad range of aptitudes and helps him or her decide which career to pursue. This decision is based on the marks obtained, the level of knowledge shown and the section that interests him or her the most.

Different individuals have varying levels of interest and intelligence in different fields. Some might be good at maths but poor at verbal reasoning, while others may be excellent in written language but very weak in calculations. A DAT can therefore help individuals find out whether they possess the skills required for their career of choice.
The verbal DAT measures the ability to find relations among words and to manipulate abstract ideas. The numerical DAT measures the capability to interpret numerical relationships between different figures. These two skills are required for most jobs.
Other types of DAT include the abstract reasoning test, which measures test takers' ability to quickly identify patterns, logical rules and trends in new data, integrate this information and apply it to solve problems. Mechanical reasoning tests measure test takers' ability to understand and apply mechanical concepts and principles to solve problems.
The spelling test measures the capability to recognise correctly spelt common English words. This DAT is used in English and writing courses, and to screen candidates for jobs in review writing and journalism and for management courses. To score well on this test, a test taker must have a basic knowledge of English grammar, punctuation and capitalisation rules.
The speed and accuracy test measures the ability to perform a job quickly and accurately. There are also some DATs that are required only for specific jobs. For instance, the space relations test measures the capability to analyse three-dimensional figures; this aptitude is essential for individuals seeking jobs in engineering, architecture or design.
SELF-CHECK 6.3
How do the spelling and language usage tests differ?

6.8 KAUFMAN ASSESSMENT BATTERY FOR CHILDREN-II
The Kaufman Assessment Battery for Children-II (KABC-II) is an individually administered test of cognitive ability for children and adolescents aged 3 to 18 (Kaufman & Kaufman, 1983). Several features of this test are as follows:
(a) It is based on two theoretical models of intelligence;
(b) It consists of different subtests and a global scale for each age group (ages 3, 4 to 6 and 7 to 18); and
(c) It provides a choice of non-verbal scales, which also vary according to age group.
Kaufman and Kaufman (1983) provide a good model of the test definition
process. In proposing the Kaufman Assessment Battery for Children (K-ABC), a
new test of general intelligence in children, the authors listed six primary goals
that define the purpose of the test and distinguish it from existing measures:
(a) Measure general intelligence from a strong theoretical and research basis;
(b) Separate acquired factual knowledge from the ability to solve unfamiliar problems;
(c) Yield scores that translate to educational intervention;
(d) Include novel tasks;
(e) Be easy to administer and objective to score; and
(f) Be sensitive to the diverse needs of preschool, minority-group and exceptional children.

6.9 OTHER TESTS IN EDUCATION AND SPECIAL EDUCATION
Individualised achievement tests are useful for assessing a student's academic abilities. They are designed to measure both pre-academic and academic behaviour, from the ability to match pictures and letters to more advanced literacy and mathematical skills. They can also be helpful in assessing a student's needs.
(a) The Peabody Individual Achievement Test (PIAT)
The Peabody Individual Achievement Test (PIAT) is an achievement test that is administered individually to students. Administered with a flip book and a record sheet, it is easy to give and requires little time. The results can be very helpful in identifying strengths and weaknesses. The PIAT is a norm-referenced test that provides age-equivalent and grade-equivalent scores.
(b) The Woodcock Johnson Test of Achievement
The Woodcock Johnson Test of Achievement is another individualised test, which measures academic areas and is appropriate for use with persons from as young as two to as old as "90-plus" according to the test manual (Cohen & Swerdlik, 2010).
The tester first establishes a basal, a designated number of consecutive correct answers, and then continues testing until reaching a ceiling, the same number of consecutive incorrect answers. The number of the highest item answered correctly, minus any incorrect responses below it, gives the raw score, which is quickly converted into a standard score, grade equivalent or age equivalent.
The Woodcock Johnson (WJ) also provides diagnostic information and grade-level performance on discrete literacy and mathematical skills, from letter recognition to mathematical fluency.
Ability/achievement discrepancy is the most common method used for determining eligibility for special programmes. The WJ III, the third edition, published in 2001, provides several options for calculating ability/achievement discrepancies. For the first time, an ability/achievement discrepancy can be calculated by using only the WJ III Tests of Achievement.

The oral language tests, formerly in the cognitive battery, are now part of the achievement battery. The Oral Language cluster is used as the "ability" score and is then compared with the achievement clusters. In this scenario, the individual's oral language ability becomes the predictor of his or her academic achievement.
The WJ III Tests of Achievement include the following five oral language tests:
(i) Story recall;
(ii) Understanding directions;
(iii) Picture vocabulary;
(iv) Oral comprehension; and
(v) Story recall-delayed.
Various combinations of these tests create the following clusters:
(i) Oral language-standard;
(ii) Oral language-extended;
(iii) Listening comprehension; and
(iv) Oral expression.
The oral language-extended cluster, the broadest measure of oral language ability, is used in the ability/achievement discrepancy calculation.
(c) The Brigance Comprehensive Inventory of Basic Skills
The Brigance Comprehensive Inventory of Basic Skills is another well-known and widely accepted criterion-referenced individual achievement test. The Brigance provides diagnostic information on reading, maths and other academic skills. It is one of the least expensive assessment instruments, and the publisher also provides software, called Goals and Objective Writers Software, to help write IEP (Individualised Education Programme) goals based on the assessments.

The Brigance Test of Basic Skills, also known as the Brigance Comprehensive Inventory of Basic Skills-Revised, is a criterion-referenced assessment that identifies a student's academic level of functioning. It is also used as a tool in standardised assessment for identifying a student's strengths and weaknesses.
The Brigance test is administered in a classroom setting. A teacher may administer the test to his or her own students. Students may be assessed in a group setting or on an individual basis. The Brigance test assesses:
(i) Reading;
(ii) Decoding;
(iii) Reading comprehension;
(iv) Writing;
(v) Listening comprehension; and
(vi) Math.
The Brigance Test of Basic Skills provides assessments for students ranging
from pre-kindergarten to ninth grade. The test kit contains materials that
enable teachers to maintain an accurate recording of student achievement.
The Inventory section provides test administration directions and the
sequence in which specific skills should be assessed.
There is a student record book that allows the teacher to track education
objectives, student responses and academic progress. The test also contains
student profile test booklets that archive assessments and are used as a tool
in placement decisions.
The Brigance test contains a CD that has goals for individualised education
programmes and a manual for test validation and standardisation.
Triplicate scoring sheets are included, which are used to share assessment
results with parents and service providers attending multidisciplinary team
meetings.

(d) KeyMath 3 Diagnostic Assessment
KeyMath 3 Diagnostic Assessment (DA) is both a diagnostic and a progress-monitoring tool for maths skills. It is broken into three areas: basic concepts, operations and applications. The instrument provides scores for each area as well as for each of the 10 subtests it contains. Along with the flip chart books and test booklets, KeyMath also provides scoring software to generate scores and reports. Figure 6.8 shows the manual, one of the KeyMath 3 Diagnostic Assessment materials.
Figure 6.8: KeyMath 3 Diagnostic Assessment manual
Source: http://www.pearsonclinical.com/education/products/100000649/keymath3-diagnostic-assessment.html#tab-pricing
A comprehensive, norm-referenced instrument, the KeyMath 3 DA includes content that covers the full spectrum of maths concepts and skills, ranging from early experiences with rote and rational counting to experiences with factoring polynomials and solving linear equations.
Two parallel forms (Form A and Form B) allow for test administration in alternating sequence every three months. Growth Scale Values (GSVs) enable educators and clinicians to measure progress accurately over time across the full range of maths concepts and skills.
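The Woodcock Johnson procedures described in (b) above, the basal/ceiling rule and the ability/achievement discrepancy, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the WJ III's published scoring: the run length of six, the standard-score mean of 100, the correlation of 0.6 and the example answers are all hypothetical. Two common ways of expressing a discrepancy are shown: a simple difference between ability and achievement scores, and a regression-based comparison of actual achievement with the achievement predicted from ability (for the WJ III, the Oral Language cluster would play the role of ability).

```python
# Hypothetical settings: illustrative assumptions, not values from a test manual.
RUN = 6      # consecutive answers needed for a basal or a ceiling
MEAN = 100   # assumed mean of the standard-score scale


def raw_score(responses, start_item, run=RUN):
    """Raw score under a basal/ceiling rule.

    responses: True/False answers for items given in order, starting at
    item number start_item (items are numbered from 1). The first `run`
    answers must all be correct (the basal), so earlier items are credited
    as correct. Testing stops after `run` consecutive errors (the ceiling).
    """
    if len(responses) < run or not all(responses[:run]):
        raise ValueError("basal not established; testing backwards is not modelled")
    given, wrong_in_a_row = 0, 0
    for answer in responses:
        given += 1
        wrong_in_a_row = 0 if answer else wrong_in_a_row + 1
        if wrong_in_a_row == run:  # ceiling reached, stop testing
            break
    administered = responses[:given]
    # Highest item answered correctly, minus the errors made below it
    # (equivalently: credited items below the basal plus correct answers).
    last_ok = max(i for i, ok in enumerate(administered) if ok)
    errors_below = administered[:last_ok].count(False)
    return (start_item + last_ok) - errors_below


def simple_difference(ability, achievement):
    """Ability minus achievement, in standard-score points."""
    return ability - achievement


def regression_discrepancy(ability, achievement, r):
    """Actual minus predicted achievement; the prediction regresses the
    ability score toward the mean in proportion to the correlation r."""
    predicted = MEAN + r * (ability - MEAN)
    return achievement - predicted


answers = [True] * 6 + [True, False, True] + [False] * 6 + [True]
print(raw_score(answers, start_item=10))     # 17 (testing stops before the last item)
print(simple_difference(110, 85))            # 25
print(regression_discrepancy(110, 85, 0.6))  # -21.0
```

A simple difference of 25 points and an achievement score 21 points below prediction would both point to a gap between ability and achievement; how large a gap counts as significant is set by each programme's eligibility rules, which this sketch does not model.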

ACTIVITY 6.3
Search for the revised version of the Woodcock Johnson Test of Achievement and state the difference(s) between the original version and the revised one.
6.10 APPLICATION OF APTITUDE AND ACHIEVEMENT TESTS: ISSUES
Standardised tests, many of which are aptitude and achievement tests, are used in education, the civil service and the military, mainly by schools, military organisations and civil service departments. The issues related to their application in these settings are detailed in the following sections.
6.10.1 Education
Schools use standardised tests to determine whether children are ready for school and to track them into instructional groups; to diagnose learning disabilities, intellectual disability and other disabilities; and to decide whether to promote them or retain them in their grade. Schools also use tests to guide and control curriculum content and teaching methods. Any test used as the sole or primary basis for such important educational decisions must therefore be of very high quality.
Readiness tests, used to determine whether a child is ready for school, are very inaccurate. They also encourage overly academic, developmentally inappropriate primary schooling. Such schooling does not suit the child's emotional, social or intellectual development, or the natural variation in how children develop.
Screening tests for disabilities are often not adequately validated; there is little proof that they accurately identify disabilities. They also promote a view of children as having deficits to be corrected, rather than individual differences and strengths on which to build. Although screening tests are supposed to be used to refer children for further diagnosis, they are often used to place children in special programmes.
Tracking hurts slower students and mostly does not help more advanced students. Retention in grade (failing a student or holding him or her back) is almost always academically and emotionally harmful, not helpful. Test content is a very poor basis for determining curriculum content, and teaching methods based on the test are themselves harmful.
In many countries, raising test scores has become the single most important indicator of school improvement. As a result, teachers and administrators feel enormous pressure to ensure that test scores go up. Schools narrow and change the curriculum to match tests. Teachers teach only what is covered on the tests. Methods of teaching conform to the multiple-choice format of the tests. Teaching more and more resembles testing.
For multiple-choice tests, "teaching to the test" means focusing on the content that will be on the test. Sometimes it even means drilling on test items and using the format of the test as a basis for teaching. This kind of teaching mainly improves test-taking skills. As a result, higher test scores do not necessarily mean real improvement in academic performance.
The US has been described as the only economically advanced nation to rely heavily on multiple-choice tests. Other nations use performance-based assessment, in which students are evaluated on real work such as essays, projects and activities. Ironically, these nations do not focus on teaching to multiple-choice tests, yet their students often score higher than US students even on those kinds of tests. Figure 6.9 shows an example of a multiple-choice answer sheet.
Figure 6.9: Answer sheet of multiple-choice tests
Source: http://www.wisegeek.com/what-are-the-different-types-of-standardized-test-questions.htm
Teaching to the test also narrows the curriculum. It forces teachers and students to concentrate on memorising isolated facts instead of developing fundamental and higher-order abilities. For example, multiple-choice writing tests are really copy-editing tests, which do not measure the ability to organise or communicate ideas. Practising on tests or test-like exercises is not how to learn even the mechanics of English, much less how to write like a writer.
Tests that measure as little and as poorly as multiple-choice tests cannot provide genuine accountability. Pressure to teach to the test distorts and narrows education. Instead of being accountable to parents, community, teachers and students, schools become "accountable" to a completely unregulated testing industry.
Better methods of evaluating students' needs and progress already exist. Good observational checklists used by trained teachers are more helpful than any screening test. Assessment based on student performance on real learning tasks is more useful and accurate for measuring achievement. It also provides more information than multiple-choice achievement tests.
Trained teams of judges can be used to rate performance in any academic or non-academic area. In the Olympic Games, for example, gymnasts and divers are rated by panels of judges, and the high and low scores are thrown out. Studies have shown that, with training, the level of agreement among judges (the "inter-rater reliability") is high. As with multiple-choice tests, it is necessary to enact safeguards to ensure that race, class, gender, linguistic or other cultural biases do not affect evaluation.
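The panel-scoring rule described above can be sketched in a few lines of Python: discard the highest and lowest ratings, then combine the rest. This is a minimal illustration under stated assumptions. It assumes exactly one highest and one lowest rating are dropped and the remaining ratings are averaged. Actual competition rules vary, and the function name `panel_score` is hypothetical.

```python
from statistics import mean


def panel_score(ratings: list[float]) -> float:
    """Average a panel's ratings after discarding the single highest and lowest.

    Assumption: one top and one bottom rating are dropped, as in the
    judging example above; real competitions may use other rules.
    """
    if len(ratings) < 3:
        raise ValueError("need at least three ratings to discard the extremes")
    trimmed = sorted(ratings)[1:-1]  # drop the lowest and the highest rating
    return mean(trimmed)


# Five hypothetical judges rating one performance.
print(panel_score([7.5, 8.0, 8.5, 9.5, 6.0]))  # 8.0
```

Dropping the extremes limits the influence of a single unusually harsh or lenient judge. It does not, by itself, guard against biases shared by the whole panel.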
6.10.2 Civil Services

The civil service exam is a comprehensive exam given to those who want to become civil servants, that is, people employed in professional government jobs. Passing the exam is a prerequisite for many government jobs at the local, state and federal levels in the US. Different civil service jobs may each require a different civil service exam, depending on the situation. Questions generally fall into two groups. One covers general knowledge and academics; the other covers the specific knowledge required for the job.
The general knowledge portion of the civil service exam covers basic areas such as arithmetic and, depending on the job, more advanced arithmetic. These questions may be tailored to the job, for example money-handling calculations or job-related word problems. Interpretation of graphs and statistics may also be part of the test. This is especially likely for candidates entering more analytical fields, such as finance and government accounting.

ACTIVITY 6.4
When they were young, students were given tests to gauge their
abilities. Do you think these tests were helpful in deciding their future?
6.11 APTITUDE TESTING

Aptitude tests are used in today's workplace, as well as in today's educational systems, for a variety of reasons. Employers use aptitude tests to screen potential job applicants and to determine which employees are naturally best suited for certain positions.
In the public educational system, aptitude tests are used to score students and to determine how well certain educational approaches work compared with others. Whatever the format of the actual aptitude test, practice aptitude tests come in many forms and formats. In fact, some businesses today specialise in selling sample aptitude tests to people who want to practise.
Aptitude tests are meant to measure mental development and intellectual abilities. They make test takers aware of how well they can perform in a given situation. Today, there are multiple resources for analysing a person's aptitude, including a number of paid and free career tests on the internet. Taking these tests gives candidates a sense of how well they can comprehend instructions. It also shows how well they apply previously acquired skills and knowledge to make good inferences. These tests are intended to indicate how a person is likely to perform in the future.
6.11.1 Career Aptitude Tests versus Attainment Tests

The career aptitude test, as mentioned above, analyses general capabilities in order to predict future performance. These tests are a standard part of psychological assessment, but they vary from one group of people to another: aptitude tests differ for people belonging to different cultural groups.
An attainment test, by contrast, is meant to measure academic achievement. Attainment tests are used to assess achievement in different subjects, including social studies, science and mathematics. They do not differ according to the culture in which they are applied.

6.11.2 Aptitude Tests versus Intelligence Quotient (IQ) Tests
How do you know whether you are a genius, of average intelligence or below average? Intelligence quotient (IQ) tests are designed to assess how an individual has developed mentally. Aptitude tests do something similar, but they measure an individual's intelligence in order to predict his or her future performance.
In many cases, an aptitude test may be much the same as an IQ test. Owing to court rulings, however, aptitude tests avoid the term IQ and do not report their results as IQ scores.
6.11.3 Encounter with a Career Aptitude Test

The process of sitting for a career aptitude test is usually done in the following
way:
(a) Before test takers start taking the aptitude test, they will be given a solved
practice test paper. The test takers need to understand the requirements of
the test by going through the given test paper;
(b) After this introductory preparation, the tester will provide the test takers
with a long questionnaire, containing multiple-choice questions; and
(c) They will need to answer all the multiple-choice questions within the
provided time limit.
Test takers should not worry if they are given a very large number of questions to answer. The volume of questions is intended to test their capability of handling stressful situations, because career aptitude tests assess both accuracy and speed.
6.11.4 What Characteristics Do Aptitude Tests Analyse?
By taking an aptitude test, a test taker will come to know about their ability to
perform a role in the future. These tests analyse some of the most essential
characteristics of a person, including:
(a) Logical thinking and analytical skills;
(b) Strengths and weaknesses;
(c) Leadership skills;
(d) Comprehension and communication skills;
(e) Capabilities a person can work upon and improve; and
(f) The hidden potentials that an individual can use to perform his or her role.
Psychological testing companies have also developed job-specific career tests. Specialised career aptitude tests assist employers in selecting the right candidates for specific job positions. For candidates, these tests can play a crucial role in determining the right career path, so jobseekers, students and career changers should take them seriously.
SELF-CHECK 6.3
1. Compare the pros and cons of individual tests and group tests.
2. Explain the various kinds of individual tests that are available.
3. What do you think is the future of aptitude tests?
4. Explain numerical aptitude tests with examples.
5. Explain the Woodcock Johnson Test of Achievement in detail.
SUMMARY
• Achievement tests are designed to measure accomplishment, that is, the knowledge an individual has already acquired before taking the test. Aptitude tests, by contrast, measure potential for future learning or performance.
• Different jobs may require different capabilities to perform them well. Depending upon the job requirements, general aptitude tests may be used either individually or in a composite way.
• Group-administered cognitive or achievement tests are cost effective but have limitations in flexibility.
• Multiple aptitude tests consist of a set of tests meant for general use, while special aptitude tests are used for special programmes.

• The first multiple aptitude test battery, known as the Chicago Tests of Primary Mental Abilities, was published in 1941. Later, based on Thurstone's factor analytic investigation, the Differential Aptitude Tests (DAT) and the General Aptitude Test Battery (GATB) were developed.
• Individualised achievement tests are useful for assessing a student's academic abilities. They are designed to measure both pre-academic and academic behaviour, from the ability to match pictures and letters to more advanced literacy and mathematical skills. They can be helpful in assessing needs as well.
• Aptitude tests are used in today's workplace, as well as in today's educational system, for a variety of reasons. Employers use aptitude tests to screen potential job applicants and determine which employees are naturally best suited for certain positions.
• Psychological testing companies have developed job-specific career tests. Specialised career aptitude tests assist employers in selecting the right candidates for specific job positions.
KEY TERMS
Ability test
Abstract reasoning
Achievement test
Aptitude test
Arithmetic reasoning
Attainment test
Form perception
General learning ability
Group tests
Individual tests
Multiple aptitude tests
Power test
Spatial aptitude
Speed test
REFERENCES
Aiken, L. R., & Groth-Marnat, G. (2006). Psychological testing and assessment. Boston, MA: Allyn and Bacon.
Aptitude Tests. (2010). Retrieved from http://www.scribd.com/doc/375665606/Aptitude-Tests

Aptitude Tests – What You Need to Know. (2013). Retrieved from http://www.psychometric-success.com/aptitude-tests/aptitude-tests-introduction.htm
Bennett, G. K., Seashore, H. G., & Wesman, A. G. (1974). Fifth edition manual for
the Differential Aptitude Tests. New York: Psychological Corporation.

introduction to tests and measurement. Mountain View, CA: McGraw-Hill.

introduction to tests and measurement. Boston, MA: McGraw-Hill Higher
Education.
Flanagan, J. C. (1964). Project TALENT: The American high school student. Pittsburgh, PA: University of Pittsburgh, Project TALENT Office.
KeyMath™-3 Diagnostic Assessment. (2014). Retrieved from http://www.pearsonclinical.com/education/products/100000649/keymath3-diagnostic-assessment.html#tab-pricing
Psychometric Success – Free Practice Aptitude Tests. (2013). Retrieved from http://www.psychometric-success.com
Thurstone, L. L. (1938). Primary mental abilities. Chicago, IL: University of Chicago Press.
What Are the Different Types of Standardized Test Questions? (2014). Retrieved from http://www.wisegeek.com/what-are-the-different-types-of-standardized-test-questions.htm

Topic  Attitudes,
7 Values and
Interests Tests
LEARNING OUTCOMES
1. State the concepts of attitudes, values and interests;
2. Explain the Strong Interest Inventory;
3. Understand the Kuder Occupational Interest Survey;
4. Describe the Career Assessment Inventory;
5. Discuss the Jackson Vocational Interest Survey (JVIS); and
6. Identify various issues of psychology tests and measurement in
business and industrial settings.
INTRODUCTION
An attitude is a hypothetical construct that represents an individual's degree of preference for an item. Attitudes are generally positive or negative views of a person, place, thing or event. The things towards which attitudes are directed are often referred to as attitude objects. People can also be conflicted or ambivalent towards an object, meaning that they simultaneously hold both positive and negative attitudes towards the item in question. We will discuss attitudes, values and interests in detail in this topic.
After discussing the concepts of attitudes, values and interests, the related
psychology tests and measurement tools will be introduced. At the end of this
topic, the various issues related to the applications of psychology testing and
measurement in industrial and business settings will be discussed as well.

TOPIC 7 ATTITUDES, VALUES AND INTERESTS TESTS  137
7.1 THE CONCEPTS OF ATTITUDES, VALUES AND INTEREST
Before discussing psychological tests and measurement of attitudes, values and interests, it is essential to understand these three related concepts theoretically.
7.1.1 Attitudes
Attitudes are judgements. They develop on the basis of affect (A), behaviour (B) and cognition (C), the so-called ABC model. Figure 7.1 shows the different components of the ABC model of attitudes and provides a description of each component.
Figure 7.1: The ABC model of attitudes
Most attitudes are the result of either direct experience or observational learning
from the environment. Unlike personality, attitudes are expected to change as a
function of experience.
Tesser (1993) argued that hereditary variables may affect attitudes, but believed that they do so indirectly. Attitudes are also shaped by the need for consistency. Consistency theories hold that people strive to keep their beliefs and values consistent with one another. The best-known theory of this kind is dissonance-reduction theory, associated with Leon Festinger and introduced in the Social Psychology course. Other theories, such as balance theory, also help to explain attitudes.

Attitude Change
Attitudes can be changed through persuasion, and attitude change should be understood as a response to communication. Experimental research has identified several factors that can affect the persuasiveness of a message, as shown in Figure 7.2.
Figure 7.2: Three factors that affect the persuasiveness of a message
Let us now discuss each factor in greater detail.
(a) Target Characteristics

These are characteristics that refer to the person who receives and processes
a message. The main target characteristics are as follows:
(i) One such trait is intelligence: it seems that more intelligent people are less easily persuaded by one-sided messages; and
(ii) Another variable that has been studied in this category is self-esteem. It is sometimes thought that people higher in self-esteem are less easily persuaded. However, there is some evidence that the relationship between self-esteem and persuasibility is actually curvilinear. On this view, people of moderate self-esteem are more easily persuaded than those of either high or low self-esteem (Rhodes & Wood, 1992). The frame of mind and mood of the target also play a role in this process.
(b) Source Characteristics

The major source characteristics are:
(i) Expertise;
(ii) Trustworthiness; and
(iii) Interpersonal attraction or attractiveness.

The perceived credibility of a message's source has been found to be a key variable here. For example, if one reads a report about health and believes it came from a professional medical journal, one may be more easily persuaded than if one believes it is from a popular newspaper.
Some psychologists have debated whether this is a long-lasting effect. Hovland and Weiss (1951) found that the effect of telling people a message came from a credible source disappeared after several weeks (the so-called "sleeper effect"). Whether there is a sleeper effect or not is controversial. Received wisdom is that a sleeper effect is less likely if people learn the source of a message before hearing it than if they are told the message first and its source afterwards.
(c) Message Characteristics

The nature of the message plays a role in persuasion. Sometimes presenting both sides of a story is useful in changing attitudes.
Attitude Change and Emotion

Apart from these three factors affecting the persuasiveness of a message, another human factor that can bring about attitude change is a person's emotional state.
By activating an affective or emotion node, attitude change may be possible, although affective and cognitive components tend to be intertwined. In primarily affective networks, it is more difficult to produce the cognitive counterarguments needed to resist persuasion and attitude change.
Affective forecasting, also known as the prediction of emotion, also affects attitude change. Research suggests that predicting emotions is an important component of decision-making, in addition to cognitive processes. How we feel about an outcome may override purely cognitive rationales.
In terms of research methodology, the challenge for researchers is measuring emotion and its subsequent impact on attitude. Since we cannot "see emotions" in the brain, various models and measurement tools have been constructed to obtain information about emotions and attitudes.
Measures may include physiological cues such as facial expressions, vocal changes and other bodily measures. For instance, fear is associated with raised eyebrows, increased heart rate and increased body tension (Dillard, 1994). Other methods include concept or network mapping and the use of primes or word cues.

Any discrete emotion can be used in a persuasive appeal; this may include
jealousy, disgust, indignation, fear and anger. Fear is one of the most studied
emotional appeals in communication and social influence research.
Important consequences of fear appeals and other emotional appeals include the possibility of reactance. Reactance may lead to rejection of the message or of its source, and hence to no attitude change. There is an optimal emotion level for motivating attitude change. If there is not enough motivation, an attitude will not change. If the emotional appeal is overdone, motivation can be paralysed, again preventing attitude change.
SELF-CHECK 7.1
List the various sources of attitudes, including how they are formed or changed.
7.1.2 Values
Do you know what values are? Let us look at the following definition of this
term.
Values are what an individual prizes, or the ideals an individual believes in (Cohen & Swerdlik, 2010). A value system is a set of consistent values and measures. A principal value is a foundation upon which other values and measures of integrity are based.
At a personal level, the values held by an individual can be absolute or relative. They usually form that individual's ethical values. These ethical values become assumptions that can serve as the basis for the ethical actions he/she takes in life.
Some values are physiologically determined, such as the desire to avoid physical pain and seek pleasure, and these are normally considered objective. Values that are not physiologically determined are considered subjective. They vary across individuals and cultures and are in many ways aligned with belief and belief systems.

Types of values include the following:
(a) Ethical/moral values;
(b) Doctrinal/ideological (religious, political) values;
(c) Social values; and
(d) Aesthetic values.
One ongoing debate concerns values that are not clearly physiologically determined. The first question is whether some of them, such as altruism, are intrinsic. The second is whether others, such as acquisitiveness, should be regarded as vices or virtues.
Values have typically been studied across various fields including:
(a) Sociology;
(b) Anthropology;
(c) Social psychology;
(d) Moral philosophy; and
(e) Business ethics.
SELF-CHECK 7.2
1. How can attitudes be positive or negative?
2. What is the difference between beliefs and values?
7.1.3 Interest
The word "interest" can be defined differently by different people, but how do we define it from the perspective of psychology?
From a psychological perspective, interest means a preference for doing something. Interest tests assess the various "interests" of an individual and classify them as high, medium or low.

It is generally better to choose a career in which one's interest is high, because the person is then more likely to find the job interesting. His/her productivity and personal job satisfaction are likely to be higher in such jobs, since people tend to do well in areas that interest them.
Today, there is a rush to enter professions such as management, software and information technology services. However, after entering such fields, many young people become bored and uninterested, which often leads to mediocre or poor performance. It is therefore necessary to identify an individual's areas of interest before suggesting careers to him/her. A good psychological test of an individual's interests will be helpful for this purpose.
ACTIVITY 7.1
After exploring the three concepts of attitudes, values and interests, try
to reflect upon your own personal attitudes, values and interests to
better understand yourself.
You may do these in small discussion groups in tutorial classes and in the myVLE forum, and then write them down after the discussions.
7.2 THE STRONG INTEREST INVENTORY (SII)

An interest inventory, also known as an interest test, is a testing instrument designed to measure and evaluate the level of an individual's interest in, or preference for, a variety of activities. What is the Strong Interest Inventory (SII)? Let us read its definition to know more.
The Strong Interest Inventory (SII) is an assessment that categorises your interests in leisure and work settings. Your interests are grouped into six career areas, and the results include lists of possible careers of interest.

Testing methods include direct observation of behaviour, ability tests and self-
reporting inventories of interest in educational, social, recreational and
vocational activities. The activities usually represented in interest inventories are
related to various occupational areas and these instruments and their results are
often used in vocational guidance.
7.2.1 Brief History of SII

The history of interest tests can be traced back to shortly after World War I. The first widely used interest inventory was the Strong Vocational Interest Blank (SVIB), developed in 1927 by psychologist E. K. Strong. He created it to help people leaving the military find suitable jobs. The SII developed from this test, which was later revised by Jo-Ida Hansen and David Campbell. The SII is an interest inventory used in career assessment. As one of the most popular career assessment tools, it is also frequently used for educational guidance.
The modern version is based on the typology (Holland Codes) of psychologist John L. Holland (see Figure 7.3). The newly revised inventory consists of 291 items, each of which asks respondents to indicate their preference by choosing one of five response options. It is an assessment of interests and should not be confused with personality assessments or aptitude tests.
Figure 7.3: John L. Holland

Source: http://www.self-directed-search.com/what-is-it-/john-holland

Since 1927, the SII and its predecessor have guided thousands of individuals in exploring careers and college majors. It is widely regarded as one of the most respected and widely used career-planning instruments in the world.
7.2.2 The Features of SII

SII is a professional career interest inventory that is:
(a) Well researched and extensively validated; and
(b) Used by career coaches and college counsellors worldwide.
SII comes in two versions:
(a) A simplified printed version: the Strong Interest Explorer; and
(b) An online Holland Code assessment that helps to identify interests, Holland Codes and careers.
The results of SII include:
(a) Scores on the level of interest on each of the six Holland Codes or General
Occupational Themes. Holland Code Themes include realistic,
investigative, artistic, social, enterprising and conventional;
(b) Scores on 25 Basic Interest Scales (e.g. art, science and public speaking);
(c) Scores on 211 Occupational Scales which indicate the similarity between the
respondentÊs interests and those of people working in each of the 211
occupations;
(d) Scores on four Personal Style Scales (learning, working, leadership and risk
taking); and
(e) Scores on three Administrative Scales used to identify test errors or unusual
profiles.

Figure 7.4 explains the functions of SII.
Figure 7.4: The functions of the Strong Interest Inventory
Figure 7.5 shows the four fundamental score scales of SII.
Figure 7.5: Four fundamental score scales of SII

Let us now discuss the scales one by one.
(a) General Themes

Description of the interrelationship between Holland Codes and interests,
work activities, potential skills and personal values.
(b) Basic Interest Scales

Basic Interest Scales identify your highest Holland Code Themes, Holland Theme Code, standard scores and interest levels. They point to work activities, projects, course work and leisure activities that are personally motivating and rewarding. The Interest Scale Levels are:
(i) Very Little;
(ii) Little;
(iii) Moderate;
(iv) High; and
(v) Very High.
(c) Occupational Scales

Comparison of your likes and dislikes with those of people who are satisfied with their work, across various occupations. The Occupational Scales match your interests to 122 occupations. Your score shows how closely your likes and dislikes match those of people who work in, and are satisfied with, each career. Each occupation is an example of a larger job cluster. The Top Ten Occupations are the careers that most closely match your interests. Within each Holland Code Theme, you will find careers that are Dissimilar, Midrange or Similar to your likes and dislikes.

(d) Personal Style Scales

Description of relationship among Holland Code Themes, work styles,
learning, risk taking and team work. Examples of Personal Style Scales
include:
(i) Working with people;
(ii) Enjoying helping others;
(iii) Preferring practical learning environments;
(iv) Preferring short-term training;
(v) Taking charge of others;
(vi) Taking risks;
(vii) Making quick decisions; and
(viii) Working in teams.
7.2.3 The Application of SII

The test can typically be taken in 25 minutes after which the results must be
scored by a computer. It is then possible to show how certain interests compare
with the interests of people successfully employed in specific occupations. Access
to the comparison database and interpretation of the results usually incurs a fee.
The SII is among the most widely used and respected instruments for career exploration in the world. In the US, the newly revised SII is considered a powerful tool because its content reflects the way people in the US work today. This includes the many changes in the workforce and in the very nature of the jobs people do, and a norm sample that mirrors the US population. In particular, its publisher, CPP (Consulting Psychologists Press), emphasises the large sample size. It also highlights the wide range of demographic, racial, ethnic and socio-economic data gathered to ensure the highest level of validity and reliability for the SII.

At its core, the SII is based on the idea that individuals are more satisfied and productive under two conditions. The first is working in jobs or at tasks they find interesting. The second is working with people whose interests are similar to their own. In other words, a person's interests are compared with those of thousands of individuals who report being happy and successful in their jobs and who, in general, are doing well in them.
Again, the SII does not examine your abilities and skills; it is an inventory of your interests. Consisting of 291 questions, the SII asks you to indicate your preference for a wide range of occupations, school subjects, activities and types of people. It takes about 30 to 45 minutes to complete, and its results can be viewed online. The result is a highly personalised report that identifies optimum career choices based on interests. It also includes additional related occupations with concise job descriptions.
For example, the results may tell you that your interests are similar to those of
engineers who are very satisfied with their career choice. The results however do
not tell you what you should be or whether you have an aptitude for the level of
mathematics involved in this career, i.e. whether you would be good at that job.
SELF-CHECK 7.3
1. What is the point of using the SII?
2. How does the SII indicate whether you will be good at your job or
not?
7.3 KUDER OCCUPATIONAL INTEREST SURVEY (KOIS)
Now, let us look at the definition of the Kuder Occupational Interest Survey.
The Kuder Occupational Interest Survey (also known as KOIS and "The Kuder") is a self-report vocational interest test used for vocational guidance and counselling. It originated in the work of G. Frederic Kuder, who first began publishing about the instrument in 1939.

The Kuder is often compared with other vocational interest tests, such as the Strong Interest Inventory. The SII compares a person's interests with those of groups of people holding certain occupations. The Kuder, by contrast, focuses on measuring the person's broad areas of interest. Thus, the Kuder yields the person's scores along ten vocational interest scales, as shown in Figure 7.6.
Figure 7.6: Ten vocational interest scales of the KOIS
7.3.1 Some Brief Psychometric Features

The Kuder test results are presented as percentile scores, and the report lists
them separately for men and women. It then compares the individual's scores on
these scales with scores obtained by people in certain professions and lists the
top matches. It also reports the match between the examinee's interests and the
interests reported by representative samples of students majoring in certain
academic fields.
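To make the idea of a percentile score concrete, the Python sketch below converts a raw scale score into a percentile rank within a norm group. It is a minimal illustration under stated assumptions: the mid-rank convention (scores below count fully, tied scores count half), the function name `percentile_rank` and all norm data are hypothetical, and the Kuder's actual norm tables are not reproduced here. The two separate norm lists mirror the report's practice of listing percentiles separately for men and women.

```python
def percentile_rank(score, norm_scores):
    """Percentile rank of `score` within a norm group (mid-rank convention)."""
    if not norm_scores:
        raise ValueError("norm group is empty")
    below = sum(1 for s in norm_scores if s < score)
    equal = sum(1 for s in norm_scores if s == score)
    # Scores below count fully; ties count half (a common convention, assumed here)
    return 100.0 * (below + 0.5 * equal) / len(norm_scores)


# Hypothetical raw scores on one interest scale for two norm groups
norms = {
    "men": [12, 15, 18, 20, 22, 25, 27, 30, 33, 35],
    "women": [14, 17, 19, 21, 24, 26, 28, 31, 34, 36],
}

raw = 25
for group, scores in norms.items():
    print(group, percentile_rank(raw, scores))
# men 55.0
# women 50.0
```

The same raw score of 25 falls at a different percentile in each group, which is why a reported percentile is only meaningful when the norm group is stated.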
The survey itself is a paper-and-pencil test consisting of 100 forced-choice
triads of activities. For each triad, the person marks the activity he/she prefers
most and the one he/she prefers least, leaving the intermediate choice blank. The
test usually takes about 30 minutes to complete. It is published by Science
Research Associates, Inc. in Chicago, IL. Professionals who purchase the test pay
for the self-report blanks and then mail the completed blanks to the company to
obtain a score report.
Internal consistency estimates for the vocational interest scales range from .47
to .85, with a median of .66. Median stability estimates over two weeks are .80
for the vocational interest scales and .90 for the specific occupation scales.
Validity research has generally been based on "hit rates" (how often the scale
scores match the actual occupations of the research participants) and on factor
analyses. The Kuder also has a dependability scale that signals when caution
should be exercised in interpreting the results because there are indications
that the person's interests "are not settled".
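The "hit rate" used in Kuder validity research can be illustrated with a short computation. The Python sketch below assumes that a hit is counted when a participant's highest-ranked occupational scale names the occupation he/she actually holds. The function name `hit_rate` and all data are hypothetical and do not come from any Kuder validity study.

```python
def hit_rate(predicted, actual):
    """Proportion of participants whose predicted occupation matches their actual one."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need one prediction per participant and at least one participant")
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)


# Hypothetical data: highest-ranked occupational scale vs. actual occupation
top_scale = ["engineer", "teacher", "nurse", "engineer", "accountant"]
actual_job = ["engineer", "teacher", "teacher", "engineer", "accountant"]

print(hit_rate(top_scale, actual_job))  # 0.8
```

Here four of the five hypothetical participants are hits, giving a hit rate of 0.8 (80%). A higher hit rate is taken as evidence that the scale scores point people towards occupations that people like them actually enter.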

7.3.2 The Kuder Test Survey

For 65 years since 1939, the Kuder Test Survey has helped millions of youths and
adults worldwide discover their interests, skills and work values. Figure 7.7 lists
the different groups of people who can benefit from the Kuder Test Survey.
Figure 7.7: Groups of people who can benefit from the Kuder Test Survey

7.3.3 Kuder Journey

The Kuder Journey is an easy, step-by-step process that helps with career
planning in many different ways, based on an individual's specific needs. The
Kuder Journey consists of the following six components:
(a) Three job career tests;
(b) College by major information;
(c) Career job finder;
(d) Career job descriptions;
(e) Career portfolio; and
(f) Resume tutorial.
Figure 7.8 lists the benefits of the Kuder Journey.
Figure 7.8: Benefits of the Kuder Journey

How can the Kuder Journey be beneficial in all these ways? It helps you to answer
the questions in Figure 7.9, which in turn enables you to achieve the benefits of
the Kuder Journey.
Figure 7.9: Questions asked in Kuder Journey
Generally, the Kuder Journey also helps you to:
(a) Identify skills, interests, abilities and values.
(b) Find a cluster of careers that match your skills, interests, abilities and
values.
(c) Prepare for post-secondary education.
(d) Highlight specific programmes based on your interests and skills.
(e) Search for jobs.
(f) Create a resume.
(g) Build a portfolio or e-portfolio.
(h) Focus on specific career job descriptions, including:
(i) Working conditions
(ii) Important skills
(iii) Important knowledge areas
(iv) Nature of the work
(v) Job titles
(vi) General work activities
(vii) Important interests
(viii) Trends
(ix) Detailed work activities
(x) Important abilities
(xi) Training
(xii) Additional information
(xiii) Important work values
(xiv) Specific tasks typical of the occupation
(i) Get information on:
(i) Application and admission factors and costs;
(ii) Types of instruction or programmes offered;
(iii) Costs and financial aid;
(iv) Specific instructional programmes;
(v) Major areas of instruction;
(vi) Graduation rates;
(vii) General campus and student body information;
(viii) Degree or certificate types offered or awarded; and
(ix) College and school results.
7.4 CAREER ASSESSMENT INVENTORY (CAI)

Next, let us examine the Career Assessment Inventory (CAI). Do you know what
it is about? Let us read the following definition of this term.
The Career Assessment Inventory (CAI) is an objective vocational interest
inventory that compares occupational interests and personality preferences
with those of individuals in over 100 specific careers.
It is used by school guidance counsellors (in high schools, community colleges
and universities), psychologists and personnel professionals for career guidance,
adult career development and human resource development.
Providing occupational scales for 111 occupations requiring varying amounts of
post-secondary education, the Enhanced Version of the CAI enables counsellors to
explore a variety of career possibilities with their clients.
The key areas measured by the CAI are shown in Table 7.1.
Table 7.1: Key Areas Measured in CAI
• Basic interest area       • Service professions          • Fine art
• Investigative             • Professional occupations     • Mechanical
• Conventional              • Educational orientation      • Realistic
• Occupational              • Occupational extroversion    • Artistic
• Skilled trades            • Occupational introversion    • Social
• Non-occupational          • Variability of interests     • Enterprising

7.4.1 Key Features of CAI

The key features of CAI are:
(a) Provides scales for 111 occupations requiring varying amounts of post-
secondary education;
(b) Easy to administer, taking only about 40 minutes to complete;
(c) Graphic and narrative test reports can be shared with the client, and the
narrative report provides a three-page counsellor's summary;
(d) Combined gender scales allow for the broadest interpretation of survey
results; and
(e) The inventory closely matches the distribution of professional and non-
professional jobs in the labour force, making it well-suited for assessing
groups with a variety of career aspirations (e.g., complete high school
populations).
7.4.2 The Usage of CAI

This inventory can be used for three main purposes, which are to:
(a) Teach students to focus on their patterns of interest that are important in
making educational and occupational choices;
(b) Help high school and college students identify career directions and major
areas of study; and
(c) Advise individuals who are re-entering the workforce, considering a career
change, or who have been displaced.
The CAI has been updated to provide additional occupations, new suggested
readings, new vocational codes and career resources on the web. The vocational
version of the CAI focuses on careers requiring less than two years of
post-secondary training.
The Enhanced Version of the CAI compares an individual's occupational interests
with those of individuals in 111 specific careers that reflect a broad range of
technical and professional positions in today's workforce.

The inventory is used by:
(a) Guidance counsellors to help students and adults develop their career and
study plans; and
(b) Psychologists and human resource professionals to advise individuals on
career development.
SELF-CHECK 7.4
How has the updated version of the CAI been expanded, and how does this make it
more useful?
7.5 JACKSON VOCATIONAL INTEREST SURVEY (JVIS)
Do you know what the Jackson Vocational Interest Survey (JVIS) is? Let us look
at its definition.
The Jackson Vocational Interest Survey (JVIS) is a comprehensive, accurate,
gender-fair career test that matches each individual's unique set of
interests with relevant academic and career fields.
The Jackson Vocational Interest Survey (JVIS) is an educational and career
planning tool. It provides a detailed snapshot of your interests and how they
relate to the world of study and work, and it helps focus your search for
professional and academic satisfaction.
You will have access to links, resources and industry contacts to help you learn
more about careers and university majors, which in turn will help you make the
most of your time and talent. It is well suited to people whose career path
includes a four-year university degree. It is also appropriate for individuals
considering a mid-career change.

Figure 7.10: Dr. Douglas Jackson
Source: http://psychology.uwo.ca/facultyremembrance.htm
The JVIS was written by Dr. Douglas Jackson (refer to Figure 7.10), the same
psychologist who developed the intelligence test used to screen NASA
astronauts. He is a world authority on the subject of human assessment and,
among other honours, was the President of the American Psychological
Association Division of Measurement, Evaluation and Statistics. The JVIS is one
of the most carefully and elaborately constructed psychological instruments ever
created.
7.5.1 Applications of JVIS
The JVIS can be applied in the following areas:
(a) Career and educational counselling for high school, college and university
students;
(b) Career planning for adults, including mid-life career redirection; and
(c) Corporate restructuring.

7.5.2 Description of JVIS

The JVIS consists of 289 pairs of statements describing professional job-related
activities. The forced-choice format requires an individual to choose between two
equally popular interests.
The JVIS assesses work role preferences (for example, engineering) and work
environment preferences (for example, job stability), and it also measures
potential academic satisfaction. Detailed reports provide links, resources and industry contacts to
help individuals learn more about their highest ranked careers and university
majors.
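Because every JVIS item is a forced choice between two statements, a basic interest scale score can be thought of as a count of how often the respondent chose statements keyed to that scale. The Python sketch below illustrates that idea under stated assumptions: the four-pair answer key, its scale assignments and the responses are hypothetical, and `tally_forced_choice` is an illustrative function, not the publisher's scoring procedure (the real JVIS has 289 pairs and 34 basic interest scales).

```python
from collections import Counter


def tally_forced_choice(responses, key):
    """Tally forced-choice responses into basic interest scale scores.

    responses: one "A" or "B" per item pair.
    key: one (scale_keyed_to_A, scale_keyed_to_B) tuple per item pair (hypothetical).
    """
    if len(responses) != len(key):
        raise ValueError("need exactly one response per item pair")
    scores = Counter()
    for choice, (scale_a, scale_b) in zip(responses, key):
        if choice == "A":
            scores[scale_a] += 1
        elif choice == "B":
            scores[scale_b] += 1
        else:
            raise ValueError(f"invalid response: {choice!r}")
    return scores


# Hypothetical four-pair key and one respondent's answers
key = [
    ("Engineering", "Teaching"),
    ("Creative arts", "Engineering"),
    ("Teaching", "Mathematics"),
    ("Engineering", "Mathematics"),
]
responses = ["A", "B", "B", "A"]

scores = tally_forced_choice(responses, key)
print(scores.most_common())  # [('Engineering', 3), ('Mathematics', 1)]
```

Since each pair awards exactly one point, the scale totals always add up to the number of pairs answered. Forced-choice scores therefore describe the relative strength of a person's interests rather than their absolute level.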
SELF-CHECK 7.5
Give examples of jobs where you can use JVIS.
7.5.3 Basic Interest Scales of JVIS

Interests differ from person to person. The JVIS has 34 basic interest scales,
some of which are presented in Table 7.2.
Table 7.2: Some of the Basic Interest Scales in JVIS
• Creative arts        • Social science        • Medical service
• Performing arts      • Adventure             • Dominant leadership
• Mathematics          • Nature-agriculture    • Job security
• Physical science     • Skilled trades        • Stamina
• Engineering          • Personal service      • Accountability
• Life science         • Family activity       • Teaching

7.5.4 The Scoring Methods

The JVIS manual and several research studies provide strong support for the
reliability and validity of this carefully constructed assessment. There are four
main methods of scoring the JVIS, summarised in Figure 7.11.
Figure 7.11: Four types of scoring methods for JVIS
Let us now discuss each one in greater detail.
(a) Hand Scoring

Hand scoring requires no template and is unusually easy, with a basic
interest profile plotted in ten minutes or less. Materials required for hand
scoring include:
(i) A manual;
(ii) One reusable test booklet;
(iii) One hand-scorable answer sheet; and
(iv) One profile sheet per respondent.

(b) Mail-in Scoring

Two reports are available via the mail-in service:
(i) The JVIS Extended Report includes the basic interest profile, a profile
for 10 general occupational themes, a profile of similarity to 17
educational major field clusters, a ranking of 32 occupational group
clusters, validity scales, an academic satisfaction score and other
information. A narrative summary of the three highest-ranked
educational and occupational clusters is particularly useful. Finally, a
section titled „Where to go from here‰ offers information on related
career exploration books and activities; and
(ii) The JVIS Basic Report contains the basic interest scales profile and
data similar to the Extended Report but with pre-printed interpretive
information rather than the personalised narrative summaries.
(c) Software
The SigmaSoft JVIS for Windows software allows you to administer and
score the JVIS on a computer. The JVIS for Windows software produces three
types of reports:
(i) The Extended Report is similar to that of the Mail-in Scoring service;
(ii) The Basic Report contains all of the profiles found in the Extended
Report, but does not provide explanatory text and career information;
and
(iii) The Data Report contains the scores found in the Basic Report in a
format designed for use by other programs.
According to product information from Sigma Assessment Systems
Inc., in addition to the software itself, purchasers will need to buy
enough coupons to pay for each report they wish to produce, as shown in
Table 7.3 below.
Table 7.3: Report Type and Coupons Required
Report Type      Coupons Required
Extended         6
Basic            4
Data             2
Source: www.sigmaassessmentsystems.com/assessments/jvis.asp

(d) Internet
The JVIS is available online in two formats, which are:
(i) SigmaTesting.Com: The main testing site, which gives the counsellor
complete control over administration and report handling; and
(ii) JVIS.Com: The career site which offers a more self-driven approach,
with an online report linked to numerous online career resources.
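Table 7.3 implies simple budgeting arithmetic when the JVIS for Windows software is used. The Python sketch below assumes that each report consumes exactly the number of coupons listed in Table 7.3 and that coupons are not shared between reports. The function name `coupons_needed` and the order quantities are hypothetical, and current pricing should be confirmed with the publisher.

```python
# Coupons consumed per report type, taken from Table 7.3
COUPONS_PER_REPORT = {"Extended": 6, "Basic": 4, "Data": 2}


def coupons_needed(orders):
    """Total coupons for a batch of reports, e.g. {"Extended": 10, "Basic": 5}."""
    total = 0
    for report_type, count in orders.items():
        if report_type not in COUPONS_PER_REPORT:
            raise ValueError(f"unknown report type: {report_type}")
        total += COUPONS_PER_REPORT[report_type] * count
    return total


# Hypothetical order: 10 Extended Reports and 5 Basic Reports
print(coupons_needed({"Extended": 10, "Basic": 5}))  # 80
```

Ten Extended Reports (10 × 6 = 60 coupons) plus five Basic Reports (5 × 4 = 20 coupons) require 80 coupons in total.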
SELF-CHECK 7.6
1. Define attitudes and values. What are the different types of values?
2. What are the consequences of being compelled to do a job that does not
interest you?
3. What are the key features that are examined by the Kuder Occupational
Interest Survey?
4. How can you determine the reliability and validity of the CAI?
5. What are the basic interest scales in JVIS?
7.6 PSYCHOLOGY TESTS AND MEASUREMENT IN INDUSTRIES AND BUSINESSES
From the discussion in this topic so far, you can see that attitude, value and
interest tests are among the more significant psychological tests and measures
used in career planning. Building on this, this section looks at various general
issues that arise when psychological tests and measurement are applied in
industries and businesses.
7.6.1 The Roles of Test and Assessment in Organisations
Organisations are social systems. Because they interact with their environment,
they are highly dynamic rather than stagnant. They keep changing as the
preferences of their stakeholders (employees, customers and management) change
and as competition grows. Globalisation has compelled organisations to create
strategic advantages that allow them to tackle the challenges they face and remain
competitive for their survival and growth.
Today's organisations are shifting from "earning organisations" to "learning
organisations". As a result of this paradigm shift, new organisations are
emerging that are more responsive to both internal and external environments.
Over the years, organisations have become increasingly aware of the importance
of human resources. To survive in a globalised world, they need a competent
workforce to carry out their functions and operations smoothly.
To achieve this, it is very important to select employees carefully according to
the job requirements. Therefore, psychological tests such as attitude, interest and
value tests, together with other relevant employment tests and assessments, are
useful in helping industrial and business organisations select suitable employees.
SELF-CHECK 7.7
What is the significance of psychological tests in the selection of new
employees?
7.6.2 Aspects of Tests and Measurements

Psychological tests and measurement can be applied to many employment-related
issues in business, industrial and organisational settings, such as personnel
selection, job placement, important business decisions and employee appraisal.
7.6.3 Assessment Centres

Assessment centres can be designed to measure many different types of
job-related skills and abilities, but are often used to assess the four skills
shown in Figure 7.12.
Figure 7.12: Skills assessed in assessment centres
The assessment centre typically consists of exercises that reflect job content and
types of problems faced on the job. For example, individuals might be evaluated
on their ability to make a sales presentation or on their behaviour in a simulated
meeting.
In addition to these simulation exercises, assessment centres often include other
kinds of tests such as:
(a) Cognitive ability tests;
(b) Personality inventories; and
(c) Job knowledge tests.
The assessment centre typically uses multiple raters who are trained to observe,
classify and evaluate behaviour. At the end of the assessment, the raters meet to
make overall judgements about the performance of the participants in the centre.

7.6.4 Biographical Data

The content of biographical data instruments varies widely and may include areas
such as leadership, teamwork skills, specific job knowledge and specific skills (e.g.
knowledge of certain software or of specific mechanical tools), interpersonal
skills, extraversion and creativity.
Biographical data instruments typically use questions about education, training,
work experience and interests to predict success on the job. Some biographical
data instruments also analyse an individual's attitudes, personal assessments of
skills and personality.
7.6.5 Cognitive Ability Tests

Cognitive ability tests typically use questions or problems to measure ability.
They assess a person's ability to learn, to use logic quickly and to reason, as well
as reading comprehension and other enduring mental abilities that are
fundamental to success in many different jobs.
Cognitive ability tests assess a person's aptitude or potential to solve job-related
problems by providing information about mental abilities such as verbal or
mathematical reasoning, and perceptual abilities such as speed in recognising
letters of the alphabet.
7.6.6 Integrity Tests

Integrity tests assess attitudes and experiences related to a person's honesty,
dependability, trustworthiness, reliability and pro-social behaviour.
These tests typically ask direct questions about previous experiences related to
ethics and integrity, or ask questions about preferences and interests from which
inferences are drawn about future behaviour in these areas. Integrity tests are
used to identify individuals who are likely to engage in inappropriate, dishonest
and anti-social behaviour at work.

7.6.7 Interviews
Interviews vary greatly in their content, but are often used to assess interpersonal
skills, communication skills and teamwork skills as well as to assess job
knowledge.
Well-designed interviews typically use a standard set of questions to evaluate the
knowledge, skills, abilities and other qualities required for the job. The interview
is the most commonly used type of test. Employers generally conduct interviews
either face-to-face or by phone.
7.6.8 Job Knowledge Tests

Job knowledge tests typically use multiple-choice questions or essay-type items to
evaluate the technical or professional expertise and knowledge required for
specific jobs or professions. Examples of job knowledge tests include tests of basic
accounting principles, A+/Net+ programming and blueprint reading.
7.6.9 Personality Tests

Personality tests typically measure traits related to behaviour at work,
interpersonal interactions and satisfaction with different aspects of work.
Personality traits commonly measured in work settings include extraversion,
conscientiousness and openness to new experiences, as well as optimism,
agreeableness, service orientation, stress tolerance, emotional stability and
initiative or proactivity.
Personality tests are often used to assess whether individuals have the potential
to be successful in jobs where performance requires a great deal of interpersonal
interaction or work in team settings.

7.6.10 Physical Ability Tests

Physical ability tests typically use tasks or exercises that require physical ability
to perform. These tests typically measure physical attributes and capabilities,
such as:
(a) Strength;
(b) Balance; and
(c) Speed.
7.6.11 Work Samples and Simulations

These tests typically focus on measuring specific job skills or job knowledge.
However, they can also assess more general skills such as organisational,
analytical and interpersonal skills.
Work samples and simulations typically require the performance of tasks that are
the same as, or similar to, those performed on the job, in order to assess the
individual's level of skill or competence. For example, work samples might
involve installing a telephone line, creating a document in MS Word or turning on
an engine.
ACTIVITY 7.2
There are various employment tests and measures used in business and
industrial settings. By doing additional reading and discussing with your
face-to-face and online tutors, identify the advantages and disadvantages of the
various employment testing and measurement methods that can be used in
organisations.

SELF-CHECK 7.8
1. Explain the nature of employment tests. Why are they used?
2. Briefly discuss the various kinds of psychological tests used for
employee selection purposes.
3. What is the best method of testing an individual applicant's knowledge
of his/her job?
4. How are psychological assessments beneficial in selecting the best
person for the job?

• Attitudes are judgements. They develop on the basis of affect (A), behaviour (B)
and cognition (C), the so-called ABC model.
• Target, source and message characteristics can affect the persuasiveness of a
message.
• Value is an absolute or relative ethical value.
• Interest, from a psychological perspective, indicates a preference for doing
something.
• An interest inventory, also known as an interest test, is a testing instrument
designed to measure and evaluate the level of an individual's interest in, or
preference for, a variety of activities.
• The Strong Interest Inventory (SII) is an assessment of interests, not to be
confused with personality assessments or aptitude tests.
• For nearly 80 years, the Strong Interest Inventory assessment has guided
thousands of individuals in exploring careers and college majors.
• The six Holland Codes are Realistic, Investigative, Artistic, Social,
Enterprising and Conventional.
• The Kuder Occupational Interest Survey ("The Kuder") is a self-report
vocational interest test used for vocational guidance and counselling.
• For 65 years since 1939, the Kuder Test Survey has helped millions of youths
and adults worldwide discover their interests, skills and work values.
• The Career Assessment Inventory (CAI) is an objective vocational interest
inventory that compares occupational interests and personality preferences
with those of individuals in over 100 specific careers.
• The Career Assessment Inventory has been updated to provide additional
occupations, new suggested readings, new vocational codes and career
resources on the web.
• The Jackson Vocational Interest Survey (JVIS) is an educational and career
planning tool. It provides a detailed snapshot of your interests and how they
relate to the world of study and work, and it helps focus your search for
professional and academic satisfaction.
• The JVIS manual and several research studies provide strong support for the
reliability and validity of this carefully constructed assessment.
• Matching an individual's physical, mental and temperamental pattern with the
requirements of a specific job is a difficult task. Psychological tests and
measurements can help industrial and business companies select suitable
employees for their organisations.
• Hundreds of tests are available to help employers make decisions. A test is
valid for application in business and industrial settings if the inferences made
from its scores are accurate. For example, a valid selection test allows us to
infer correctly, from how well an individual does on the test, how well he/she
will perform on the job.

Academic satisfaction        Investigative
Artistic                     Job knowledge tests
Attitudes                    Learning organisations
Biographical data            Message characteristics
Cluster                      Occupational interest
Conventional                 Realistic
Enterprising                 Social
Holland Codes                Source characteristics
Hypothetical construct       Target characteristics
Integrity tests              Value
Interest                     Vocational guidance
Inventory                    Work sample and simulation exercise
Cohen, R. J., & Swerdlik, M. E. (2010). Psychological testing and assessment: An
introduction to tests and measurement (7th ed.). New York: McGraw-Hill
Higher Education.
Dillard, J. P. (1994). Rethinking the study of fear appeals: An emotional
perspective. Communication Theory, 4, 295–323.
Hovland, C. I., & Weiss, W. (1951). The influence of source credibility on
communication effectiveness. The Public Opinion Quarterly, 15(4), 635–650.

Department of Psychology, The University of Western Ontario. (2014).
Psychology.uwo.ca. Retrieved from http://psychology.uwo.ca/facultyremembrance.htm
Rhodes, N., & Wood, W. (1992). Self-esteem and intelligence affect influenceability:
The mediating role of message reception. Psychological Bulletin, 111(1), 156–171.
Self-directed-search.com. (2014). John Holland. Retrieved from
http://www.self-directed-search.com/what-is-it-/john-holland
Tesser, A. (1993). The importance of heritability in psychological research: The
case of attitudes. Psychological Review, 100(1), 129–142.

Topic 8 Personality Test
LEARNING OUTCOMES
1. Explain the concepts of personality from the perspective of
psychology test and measurement;
2. State the development of personality testing;
3. Identify the objectives of personality testing; and
4. Describe projective personality tests.
INTRODUCTION
Personality tests are among the most popular tests in psychology, largely because
almost everyone is interested in knowing what type of personality he or she has.
In this topic, the concept of personality from the perspective of psychological
tests and measurement will be introduced. A few popular personality tests, based
on either objective or projective methods, will also be discussed.
8.1 PERSONALITY: THE CONCEPTS

Many definitions of personality exist. One of them is that personality consists of
the relatively stable and distinctive patterns of behaviour that characterise an
individual and his or her reactions to the environment.
A person can be categorised as shy, reserved, friendly, caring, manipulative,
thoughtful, systematic, organised and so on. These characteristics are referred to
as "personality types".

172  TOPIC 8 PERSONALITY TESTING
Personality types can be defined as a constellation of traits and states that is
similar in pattern to one identified category of personality within a
taxonomy of personalities (Cohen & Swerdlik, 2010).
Another common term is "personality traits", which refers to relatively enduring
dispositions: tendencies to act, think or feel in a certain manner. Personality
types, on the other hand, refer to general descriptions of people.
There is another term, "personality states", which refers to emotional reactions
that vary from one situation to another. In other words, personality states are
relatively temporary predispositions of a person (Chaplin, John & Goldberg,
1988).
We also commonly use the term "self-concept", which is related to personality and
refers to a person's self-definition, or "an organised and relatively consistent set
of assumptions that a person has about himself or herself".
8.2 OBJECTIVE VERSUS PROJECTIVE

Standardised personality tests can be categorised into two main types:
(a) Objective Personality Tests

Objective tests are defined as containing highly structured, clear,
unambiguous items, statements or questions that are objectively scored.
They are typically associated with so-called paper-and-pencil personality tests
and computer-administered personality tests.
(b) Projective Personality Tests

In contrast, projective tests have unstructured, ambiguous items, statements
or questions that require respondents to project their personality into the
tasks, drawing on their hidden wishes, attitudes and needs in responding to the
stimuli presented.

8.3 DEVELOPMENT OF PERSONALITY TESTING

Historically, personality testing began with the work of Fernald, who attempted
to measure character traits. He started developing the test by writing items
based on his theories about personality and then organised these items into a
personality test.
Many later personality tests were developed in response to the needs of society.
For instance, during World War I, there was a need to select individuals for
military service, and this led to the construction of tests that could predict
whether a recruit would be able to adjust to military life.
Two major approaches in the development of personality tests can be seen in this
early stage:
(a) Tests that were constructed from empirical methods; and
(b) Projective tests.
Tests using empirical methods did not use a theoretical framework to construct
test items but depended on the mathematical relationships that existed among
test items. These tests were later categorised into two types:
(a) Tests using factor analysis; and
(b) Tests that employed criterion keying.
One of the first projective tests, the Thematic Apperception Test (TAT), was
developed by Murray and his associates and quickly became popular. Another
well-known projective test is the Rorschach Inkblot Test, constructed by Hermann
Rorschach. These projective tests differed from objective tests in their use of
unstructured and ambiguous stimuli.

8.4 OBJECTIVE MEASURES OF PERSONALITY

This subtopic explains the California Psychological Inventory (CPI), the
Personality Research Form (PRF), the Sixteen Personality Factor Questionnaire
(16PF) and, finally, the NEO-PI-R in more detail. All these personality tests use
objective methods to measure personality.
8.4.1 California Psychological Inventory (CPI)

Table 8.1 describes the content and psychometric properties of the California
Psychological Inventory (CPI).
Table 8.1: The Content and Psychometric Properties of the California
Psychological Inventory (CPI)
Content of the test:
• The California Psychological Inventory (CPI) was developed by Harrison Gough
in 1957. This inventory measures the normal personality of adolescents and
adults.
• The third revision eliminated several items that were considered objectionable,
that were considered to violate privacy, or that conflicted with recent legislation
dealing with the rights of the disabled (Gough & Bradley, 1996).
• The CPI can be administered and scored individually or in a group setting, and
it can be completed in about an hour. Scoring is done by counting the number of
items endorsed on each scale and plotting the raw scores on a profile. The scores
are then converted to T-scores.
• The items on the CPI are grouped into 20 scales, as shown below:
  Achievement via independence   Dominance             Independence      Well-being
  Intellectual efficiency        Sociability           Empathy           Communality
  Psychological mindedness       Capacity for status   Responsibility    Tolerance
  Flexibility                    Social presence       Socialisation     Achievement via conformance
  Femininity/masculinity         Self-acceptance       Self-control      Good impression

Psychometric properties:
• The CPI has extensive norms based on a sample of 6,000 people, which provide
information on its validity and reliability. Research on the CPI (Megargee, 1972)
has established that it is extremely useful in predicting underachievement in
academic settings and potential delinquency.
• There is also evidence that the CPI can predict performance in careers and in
school. Deniston and Ramanaiah (1993) reported that the CPI had factor loadings
on four of the five factors of the five-factor model of personality (extroversion,
openness, neuroticism and conscientiousness) but did not show significant
loadings on agreeableness.
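The two CPI scoring steps in Table 8.1 (count the items endorsed in the keyed direction on each scale, then convert the raw score to a T-score) can be sketched in Python as follows. This assumes a linear T-score, T = 50 + 10 × (raw - M) / SD, where M and SD are the norm group's mean and standard deviation on that scale. The four-item scoring key, the answers, the norm values and the function names are hypothetical and are not taken from the CPI manual.

```python
def raw_score(answers, scoring_key):
    """Count items answered in the keyed direction for one scale.

    answers: {item_number: True/False} responses from the test taker.
    scoring_key: {item_number: keyed_answer} for the scale (hypothetical key).
    """
    return sum(1 for item, keyed in scoring_key.items() if answers.get(item) == keyed)


def t_score(raw, norm_mean, norm_sd):
    """Linear T-score: the norm group has mean 50 and standard deviation 10."""
    if norm_sd <= 0:
        raise ValueError("norm standard deviation must be positive")
    return 50 + 10 * (raw - norm_mean) / norm_sd


# Hypothetical four-item scale and one person's true/false answers
scoring_key = {1: True, 2: False, 3: True, 4: True}
answers = {1: True, 2: False, 3: False, 4: True}

raw = raw_score(answers, scoring_key)  # items 1, 2 and 4 match the key
print(raw, t_score(raw, norm_mean=2.0, norm_sd=0.5))  # 3 70.0
```

A T-score of 70 places this hypothetical person two standard deviations above the norm-group mean. Converting every scale to the same T-score metric is what allows the different scales to be plotted and compared on a single profile.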
8.4.2 Personality Research Form (PRF)

Table 8.2 shows the content and psychometric properties of the Personality
Research Form (PRF).
Table 8.2: The Content and Psychometric Properties of the Personality Research Form (PRF)
Content of the test  The Personality Research Form (PRF) was developed by Douglas Jackson in 1967 to measure dimensions of normal personality. It is based on the theoretical framework of Henry Murray and his colleagues at the Harvard Psychological Clinic (Murray, 1938).
 There are two sets of PRF forms:
(1) The short forms (Forms A and B) comprise 300 items measuring 14 personality dimensions and one validity scale; and
(2) The long forms (Forms AA and BB) comprise 440 items measuring 20 personality dimensions and two validity scales.
 Another version, Form E, comprises 352 items measuring 20 dimensions and two validity scales, and is similar to Forms AA and BB.
 The personality dimensions are interpreted using the bipolar method, meaning that a low score on any scale indicates not only the absence of the trait but also the presence of its opposite.

 The 22 scales are as follows:
Abasement, Achievement, Affiliation, Aggression, Change, Cognitive structure, Defendence, Dominance, Exhibition, Harm avoidance, Impulsivity, Nurturance, Play, Sentience, Social recognition, Succorance, Desirability, Infrequency, Understanding, Autonomy, Endurance and Order.
Psychometric properties  Reliability values obtained with Kuder-Richardson formula 20 (KR-20) for the 20 personality content scales have been shown to range between .87 and .94, with a median of .91 (Jackson, 1999).
 Test-retest reliability data over a one-week period collected by
Bentler (1964) revealed that the PRF personality scale scores
were quite stable over time, ranging from a low of .69 to a high
of .90.
 The PRF has been correlated with other personality and interest tests (Jackson, 1999) and has generally shown positive relationships with conceptually similar variables and low or near-zero relationships with conceptually unrelated variables.
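The KR-20 coefficient reported for the PRF applies to items scored 0 or 1, such as true/false items. The formula is KR-20 = [k / (k - 1)] × [1 - Σpq / σ²], where k is the number of items, p is the proportion of respondents answering an item in the keyed direction, q = 1 - p and σ² is the variance of the total scores. The Python sketch below is a minimal illustration of that formula; the response matrix is hypothetical, and population variances are assumed throughout.

```python
from statistics import pvariance


def kr20(item_matrix):
    """Kuder-Richardson formula 20 for dichotomously scored (0/1) items.

    item_matrix: list of respondents, each a list of 0/1 item scores.
    """
    n = len(item_matrix)       # number of respondents
    k = len(item_matrix[0])    # number of items
    # Variance of the respondents' total scores
    total_var = pvariance([sum(person) for person in item_matrix])
    # Sum of p * q over items, where p is the proportion scoring 1 on the item
    sum_pq = 0.0
    for i in range(k):
        p = sum(person[i] for person in item_matrix) / n
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / total_var)


# Hypothetical answers (1 = keyed response) of five people to four items
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(responses), 2))  # 0.8
```

If every respondent obtains the same total score, the total-score variance is zero and the coefficient is undefined, so real data sets must show some spread in scores.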

8.4.3 Sixteen Personality Factor Questionnaire (16PF)

Table 8.3 shows the content and psychometric properties of the Sixteen Personality Factor Questionnaire (16PF).
Table 8.3: The Content and Psychometric Properties of Sixteen Personality Factor Questionnaire (16PF)
The Development of Personality Tests  Description
Content of the test  The Sixteen Personality Factor Questionnaire was developed by Raymond B. Cattell in 1949. The test measures normal personality and aims to cover the broad range of characteristics and attributes of normal adults. Cattell began with a survey of the words in the English language that describe normal personality characteristics, drawing on the work of Allport and Odbert (1936), who found approximately 4,000 English adjectives that described personality characteristics.
 Using factor analysis, Cattell reduced these words to 12 factors, labelled with letters from A to O (A, B, C, E, F, G, H, I, L, M, N and O). Four other factors considered relevant were added and given the labels Q1, Q2, Q3 and Q4, giving 16 primary factors in all.
 The latest edition is the 16PF Fifth Edition (1993), which comprises 185 items and uses a three-point Likert scale. These items are grouped into 16 primary factor scales representing the dimensions of personality initially identified by Cattell. The raw scores are converted into standard scores known as stens (standard-ten scores, obtained by an area transformation onto a scale of 1 to 10 with a mean of 5.5 and a standard deviation of 2).
 The 16 scales in the 16PF are as follows:
Warmth, Liveliness, Vigilance, Tension, Reasoning, Rule-consciousness, Abstractedness, Openness to change, Emotional stability, Social boldness, Privateness, Self-reliance, Dominance, Sensitivity, Apprehension and Perfectionism.
Psychometric properties  The reliability and validity of the 16PF are reported in numerous studies (Conn & Rieke, 1994; Russell & Karol, 1994). In addition, evidence presented by Cattell and Cattell (1995) has strongly supported its proposed factor structure.
 The 16PF also has norms for high school, college and adult populations. It can also be used in personnel selection and placement, and can measure workers' leadership potential, decision-making ability and personal initiative.
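The conversion of 16PF raw scores to stens by area transformation can be sketched in Python as follows. This is an illustration of the general idea under stated assumptions, not the 16PF's published norm tables: the norm sample is hypothetical, the percentile rank uses the mid-point convention, the normalised z-score is read from the standard normal curve, and sten bands are taken as half a standard deviation wide around the mean of 5.5.

```python
import math
from statistics import NormalDist


def sten(raw, norm_scores):
    """Convert a raw score to a sten (1-10, mean 5.5, SD 2) by area transformation.

    norm_scores: raw scores from a (hypothetical) norm group.
    """
    n = len(norm_scores)
    below = sum(1 for s in norm_scores if s < raw)
    equal = sum(1 for s in norm_scores if s == raw)
    # Mid-point percentile rank, kept strictly between 0 and 1
    pr = (below + 0.5 * equal) / n
    pr = min(max(pr, 0.5 / n), 1 - 0.5 / n)
    # Normalised z-score: the point on the normal curve with this area below it
    z = NormalDist().inv_cdf(pr)
    # Bands half a standard deviation wide, centred on 5.5, limited to 1-10
    return min(10, max(1, math.floor(2 * z + 6)))


# Hypothetical raw scores of a norm group of ten people on one factor scale
norm_sample = [2, 4, 5, 5, 6, 6, 7, 7, 8, 10]
for raw in (5, 7):
    print(raw, sten(raw, norm_sample))
```

Because the transformation works on areas (percentile ranks) rather than on raw distances from the mean, the resulting stens are normally distributed even when the raw scores are skewed.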

8.4.4 The Revised NEO Personality Inventory (NEO-PI-R)
Another personality test is the Revised NEO Personality Inventory (NEO-PI-R), which is widely used in clinical applications and in research that involves personality assessment.
The content of the test and its psychometric properties are explained below:
(a) Content of the Test

The NEO in NEO-PI-R stands for the first three domains measured in the
test, which are:
(i) Neuroticism;
(ii) Extraversion; and
(iii) Openness.
Therefore, in full it is the Neuroticism, Extraversion and Openness Personality Inventory Revised. It was developed by Costa and McCrae.
It measures personality traits according to the five-factor model of personality.
The inventory has 240 items answered on a five-point Likert scale.
Each of the five personality dimensions of the NEO-PI-R has six specific facets, as shown in Table 8.4.
Table 8.4: Five Personality Dimensions of the NEO-PI-R
Personality Dimensions of the NEO-PI-R  Description
Neuroticism Anxiety, hostility, depression, self-consciousness,
impulsiveness, and vulnerability.
Extraversion Warmth, gregariousness, assertiveness, activity,
excitement-seeking and positive emotions.
Openness Fantasy, aesthetics, feelings, actions, ideas and
values.
Agreeableness Trust, straightforwardness, altruism, compliance,
modesty and tender-mindedness.
Conscientiousness Competence, order, dutifulness, achievement
striving, self-discipline, and deliberation.

(b) Psychometric Properties

The internal consistency reliability coefficients of the five dimensions range
from .86 to .95 (Aiken, 2003).
The internal consistency reliability coefficients of the facets range from .56
to .90 (Aiken, 2003).
Test-retest reliability over a six-month period ranges from .86 to .91 for the five dimensions and from .56 to .90 for the facet scales (Aiken, 2003).
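Internal consistency for rating-scale items such as the NEO-PI-R's five-point Likert items is commonly estimated with Cronbach's alpha, which generalises KR-20 to items that are not scored 0/1: alpha = [k / (k - 1)] × [1 - (sum of item variances) / (variance of total scores)]. The Python sketch below is a minimal illustration with hypothetical ratings; it assumes that any reverse-keyed items have already been recoded and uses population variances throughout.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for items scored on a rating (e.g. Likert) scale.

    item_scores: list of respondents, each a list of item ratings.
    Reverse-keyed items are assumed to have been recoded already.
    """
    k = len(item_scores[0])  # number of items
    # Variance of each item across respondents
    item_vars = [pvariance([person[i] for person in item_scores]) for i in range(k)]
    # Variance of the respondents' total scores
    total_var = pvariance([sum(person) for person in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical five-point ratings: 4 respondents x 3 items of one facet
ratings = [
    [4, 5, 4],
    [2, 2, 3],
    [3, 3, 3],
    [5, 4, 5],
]
print(round(cronbach_alpha(ratings), 2))  # 0.92
```

The coefficient rises when items vary together (large total-score variance relative to the item variances), which is why longer scales of related items, such as the five dimensions, tend to show higher values than the shorter facet scales.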
ACTIVITY 8.1
1. List some of the personality tests that you have taken so far. What personality type do you have?
2. Name two other examples of personality tests apart from the tests based on objective methods listed in this topic.
3. Discuss whether the personality tests you mentioned above can accurately categorise people.
8.5 PROJECTIVE PERSONALITY TESTS

The history of projective tests began with the use of inkblots to assess the imagination and intelligence of individuals.
This form of testing is based on the assumption that when individuals try to make sense of ambiguous stimuli, their interpretation of the stimuli will reflect their emotions, experience, thinking and needs. Because the stimuli are ambiguous, they are also thought to eliminate or reduce defensiveness and other conscious efforts to skew test results.
In addition, what subjects see reflects their personal characteristics, and some of their responses may unconsciously expose hidden aspects of their personality. Therefore, projective tests are considered sensitive in detecting hidden personality characteristics or thoughts in the unconscious mind.
However, the administration and interpretation of projective tests are quite difficult. The same response may carry different meanings depending on the individual's characteristics.

8.5.1 Rorschach Inkblot Test

One of the most famous projective tests is the Rorschach Inkblot Test. It was
developed by Hermann Rorschach in 1921. Figure 8.1 shows an image of
Hermann Rorschach.
Figure 8.1: Hermann Rorschach

Source: http://en.wikipedia.org/wiki/Hermann_Rorschach
The following explains the content of the test and its psychometric properties:
(a) Content of the Test
Rorschach developed stimulus cards by putting a blot of ink on a piece of paper and then folding the paper. The result was a unique, bilaterally symmetrical pattern.
After experimenting with thousands of inkblots, Rorschach finally selected 20 cards.
However, the final Rorschach Inkblot Test uses only 10 cards:
(i) Five black and white cards;
(ii) Two cards with black, grey and red inkblots; and
(iii) Three cards using various colours.

Figure 8.2 shows the first of the ten cards in the Rorschach Inkblot Test.
Figure 8.2: The first card in the Rorschach test
Source: http://en.wikipedia.org/wiki/Rorschach_test
This is an individual test presented with minimal structure, which means that there are no particular instructions on how to respond.
The test is administered by presenting the cards twice. During the first presentation, called the free association phase, the tester records the length of time taken by subjects to give responses and also the position of the card when each response is made. The second presentation is called the inquiry phase.
Table 8.5 shows the tester's records according to a subject's score.
Table 8.5: The Tester's Records According to a Subject's Score
The Tester's Records according to a Subject's Score  Description
Location  Whole blot, detail or uncommon detail.
Determinant  Form, colour, shading-texture, shading-dimension, chromatic colour, achromatic colour, movement or a combination of all these.
Content  Anatomy, blood, clouds, fire, geography and nature.
Popularity  Whether the response is a popular (commonly given) one.
(b) Psychometric Properties

Evidence of the reliability and validity of the Rorschach Inkblot Test is inadequate (Gleser, 1963; Zubin, Eron & Schumer, 1965). The reliability and validity data on the test do not meet the accepted values established for objective personality tests (Hiller, Rosenthal, Bornstein & Brunell-Neulieb, 1999).

8.5.2 Thematic Apperception Test (TAT)

Another famous projective test is the Thematic Apperception Test (TAT), which uses pictures to measure personality. The content of the test and its psychometric properties are as follows:

The Thematic Apperception Test was developed by Christiana Morgan and Henry Murray in 1935. It was based on Murray's theory of needs, which categorised human needs into 28 types, among them sexual needs, dominance and affiliation.
The TAT is more structured and clearer than the Rorschach Inkblot Test. It consists of 30 picture cards and one blank card, which provide stimuli for respondents to create stories about the relationships or social situations suggested by the pictures. Some cards are intended for male respondents and others for female respondents.
The administration of the TAT is similar to that of the Rorschach in that it is ambiguous and not standardised. The tester has to record the subject's responses verbatim and also note the subject's reaction time. Table 8.6 states the five aspects in the interpretation of the TAT.
Table 8.6: The Five Aspects in Interpretation of TAT
Five Aspects in Interpretation of TAT  Description
Hero  The character in the picture whom the subject identifies as himself or herself.
Needs The desires and motives of the hero or heroine in the
story.
Press  Environmental influences that hinder or facilitate the achievement of the subject's desires and needs.
Themes The theme of the story such as depression.
Outcomes The conclusion of the story such as failure.

The psychometric properties of the TAT, like those of the Rorschach Inkblot Test, are not convincing (Zubin, Eron & Schumer, 1965). The validity of the TAT is also unsubstantiated (Murphy & Davidshofer, 2001). Test-retest reliability coefficients have been low, with a median of .30.
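Test-retest reliability is usually expressed as the Pearson correlation between scores from two administrations of the same test to the same people. The Python sketch below is a minimal illustration with hypothetical scores for five subjects; the values are invented for illustration and are not TAT data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two lists of paired scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    covariation = sum(a * b for a, b in zip(dx, dy))
    return covariation / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


# Hypothetical scores of five subjects on the same scale at two administrations
time_1 = [12, 15, 9, 20, 17]
time_2 = [14, 13, 1018, 19][:0] or [14, 13, 10, 18, 19]
print(round(pearson_r(time_1, time_2), 2))  # 0.88
```

A coefficient of about .88, as in this example, indicates that subjects keep roughly the same rank order across administrations; a median of .30, as reported for the TAT, indicates that rank order changes considerably between administrations.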

8.5.3 Draw-a-Person Test (DAP)

Another group of projective tests are figure drawing tests. They use expressive techniques, in which a subject is asked to create something, usually a drawing (Kaplan et al., 2009). One example of this form of test is the Draw-a-Person Test. The content of the test and its psychometric properties are explained below:

The Draw-a-Person (DAP) Test was developed by Karen Machover in 1949. Human figure drawings were originally used to assess a person's intelligence, but their use has since been widened to measure personality and psychopathology.
The DAP rests on the assumption that individuals project themselves onto the human figures that they draw.
The DAP requires the subject to draw a picture of himself or herself and a picture of a person of the opposite gender. After the drawings are completed, the subject is asked to explain them, including the age, occupation and family relations of the figures drawn.
Several structural and formal aspects of the drawings are considered important in the interpretation process. Table 8.7 shows how these aspects are interpreted.
Table 8.7: The Interpretation Process
The Interpretation Process  Description
The size of the head Intellectual ability, impulse control or narcissism.
Facial expression Fear, hatred or aggressive behaviour.
Emphasis on the mouth Eating disorders, alcohol or gastric problems.
Eyes Self-concept, social problems or paranoia.
Hair Symbol of power or sexual problems.
Hands Degree of relationship with the environment or
openness to others.
Fingers Ability to manipulate other people.
Legs Support, sexual problems or aggressive behaviour.
Emphasis on the chest Sexual immaturity or neurosis.


Research has generally failed to demonstrate that human drawings can be
successfully used to assess personality, behaviour or psychopathology
(Kahill, 1984; Motta, Little & Tobin, 1993; Smith & Dumont, 1995).
Quantitative scoring systems have been developed for the DAP which may
help to reduce unreliability due to subjectivity in scoring (Naglieri, 1998).
SELF-CHECK 8.1
1. Discuss the meaning of human drawings among children.
2. Describe the method of interpretation of projective personality tests.
3. Discuss the issues of reliability and validity of personality tests using projective technique.
 Personality refers to the relatively stable and distinctive patterns of behaviour that characterise an individual and his or her reactions to the environment.
 The two major approaches to the development of personality tests in the early stage were:
– Tests that were constructed from empirical methods; and
– Projective tests.
 Objective tests are defined as containing highly structured, clear, unambiguous items, statements or questions that are objectively scored.
 Projective tests are tests that have unstructured, ambiguous items, statements
or questions.
 Projective tests are considered sensitive in detecting hidden personality characteristics and material in the unconscious mind.
 The California Psychological Inventory (CPI) can predict performance at work and at school.

 The Sixteen Personality Factor Questionnaire (16PF) measures normal personality and was developed using factor analysis.
 The NEO-PI-R measures personality traits according to the five-factor model of personality.
 The most popular projective personality tests are the Rorschach Inkblot Test and the Thematic Apperception Test (TAT).
 The Draw-a-Person Test is a typical example of the projective figure drawing tests, which ask subjects to create something, usually a drawing.
Facets of personality
Factor analysis
Five-factor model of personality
Human drawings
Inkblot test
Interpretation process
Normal personality
Objective personality tests
Personality dimensions
Personality states
Personality traits
Personality types
Projective personality tests
Psychopathology
Standard scores
Thematic Apperception Test
Aiken, L. R. (2003). Psychological testing and assessment. Boston, MA: Allyn and Bacon.
Allport, G. W., & Odbert, H. S. (1936). Trait-names: A psycho-lexical study. Psychological Monographs, 47(1).
Bentler, P. M. (1964). Response variability: Fact or artifact? Unpublished doctoral dissertation, Stanford University.
Cattell, R. B., & Cattell, H. E. (1995). Personality structure and the new fifth edition of the 16PF. Educational and Psychological Measurement, 55, 926–937.
Chaplin, W. F., John, O. P., & Goldberg, L. R. (1988). Conceptions of state and traits: Dimensional attributes with ideals as prototypes. Journal of Personality and Social Psychology, 54(4), 541–557.

introduction to tests and measurement. Boston, MA: McGraw-Hill Higher
Education.
Conn, S. R., & Rieke, M. L. (1994). The 16PF Fifth Edition technical manual.
Champaign, IL: Institute for Personality and Ability Testing.
Deniston, W. M., & Ramanaiah, N. V. (1993). California Psychological Inventory and the five-factor model of personality. Psychological Reports, 73, 491–496.
Gleser, G. C. (1963). Projective methodologies. Annual Review of Psychology, 14, 391–422.
Gough, H. G., & Bradley, P. (1996). CPI manual: California Psychological Inventory. Palo Alto, CA: Consulting Psychologists Press.
Hiller, J. B., Rosenthal, R., Bornstein, R. F., & Brunell-Neulieb, S. (1999). A comparative meta-analysis of Rorschach and MMPI validity. Psychological Assessment, 11, 278–296.
Jackson, J. L. (1999). Psychometric considerations in self-monitoring assessment. Psychological Assessment, 11, 439–447.
Jackson Vocational Interest Survey. (2012). Retrieved from http://www.sigmaassessmentsystems.com/assessments/jvis.asp
Kahill, S. (1984). Human figure drawing in adults: An update of the empirical evidence, 1967–1982. Canadian Psychology, 25(4), 269–292.
Machover, K. (1949). Personality projection in the drawing of the human figure. Springfield, IL: Charles C. Thomas.
Megargee, E. I. (1972). The California psychological inventory handbook. San Francisco, CA: Jossey-Bass.
Morgan, C. D., & Murray, H. A. (1935). A method for investigating fantasies: The thematic apperception test. Archives of Neurology and Psychiatry, 34(2), 289.
Motta, R. W., Little, S. G., & Tobin, M. I. (1993). The use and abuse of human figure drawings. School Psychology Quarterly, 8(3), 162–169.
Murphy, K. R., & Davidshofer, C. O. (2001). Psychological testing: Principles and applications. Upper Saddle River, NJ: Prentice Hall.
Murray, H. A. (and collaborators). (1938). Explorations in personality. New York: Oxford University Press.
Naglieri, J. A. (1998). Draw-a-Person: A quantitative scoring system. New York: Psychological Corporation.
Russell, M., & Karol, D. (1994). The 16PF fifth edition administrator's manual. Champaign, IL: Institute for Personality and Ability Testing.
Smith, D., & Dumont, F. (1995). A cautionary study: Unwarranted interpretations of the Draw-a-Person Test. Professional Psychology: Research and Practice, 26(3), 298–303.
Types of Employment Tests. (2014). Retrieved from http://www.siop.org/workplace/employment%20testing/testtypes.aspx
Zarrella, K. L., & Schuerger, J. M. (1990). Temporal stability of occupational interest inventories. Psychological Reports, 66(3), 1067–1074.
Zubin, J., Eron, L. D., & Schumer, F. (1965). An experimental approach to projective techniques. New York: Wiley.

Topic 9 Psychology Test and Measurement in Counselling, Health and Clinical Psychology
LEARNING OUTCOMES
1. Discuss the applications of psychology test and measurement in
clinical, health and counselling psychology;
2. Explain analytically the basic concept of psychopathology;
3. Analyse the Minnesota Multiphasic Personality Inventory (MMPI);
4. Describe the Millon Clinical Multiaxial Inventory (MCMI); and
5. Examine the Diagnostic and Statistical Manual of Mental Disorders.

TOPIC 9 PSYCHOLOGY TEST AND MEASUREMENT IN COUNSELLING,  189
HEALTH AND CLINICAL PSYCHOLOGY
 INTRODUCTION
In the previous topic, you were introduced to personality testing in psychology. You learnt about the development and objectives of personality tests. This topic focuses on the applications of psychology test and measurement in more specific fields of psychology, namely clinical, health and counselling psychology.
In relation to these fields, the concept of psychopathology will be discussed from psychology test and measurement perspectives. This topic also introduces in detail two more personality tests used for abnormal personality assessment, the Minnesota Multiphasic Personality Inventory (MMPI) and the Millon Clinical Multiaxial Inventory (MCMI), as well as the Diagnostic and Statistical Manual of Mental Disorders (DSM).
9.1 APPLICATION IN COUNSELLING SETTINGS

The psychology tests and measurement applied in counselling settings include interest tests, introduced in Topic 7, for career counselling purposes. Other tests and measures are widely used in counselling psychology to assess areas such as self-concept, emotions and related psychological issues. They help clients become more aware of themselves, achieve self-growth and make life decisions. The following sections discuss the general issues related to psychology tests and measurement when applied in a counselling setting.
9.1.1 Counselling Related Tests

Counselling can provide people with a new life experience. Psychological counselling can help people better understand their own environment and society. It also aims to help them deal with relationships and gradually change irrational thinking, feelings and responses, enabling them to improve the quality of their lives and develop self-worth.
Counsellors generally use tests for assessment, placement and guidance, as well as to help clients enhance their self-knowledge, practise decision-making and acquire new behaviours.

According to Goldman (1971), tests may be used in a variety of counselling settings, including for:
(a) Individuals;
(b) Marital purposes;
(c) Group and family; and
(d) Either informational or non-informational purposes.
Informational uses include gathering data about clients, assessing the level of traits such as stress and anxiety, and measuring clients' personality types. Non-informational uses aim to stimulate further interaction with the client.
Although published literature on testing has increased, proper test utilisation remains a problematic area. The issue is not so much whether a counsellor uses tests in counselling practice, but when and to what end tests will be used (Corey, Corey, & Callanan, 1984).
9.1.2 Testing Process

The steps involved in the process of using tests in counselling are as shown in
Figure 9.1.
Figure 9.1: Testing process in counselling

The five main steps in the testing process in counselling are explained further as follows:
(a) Selecting
Having defined the purpose for testing, the counsellor will look to a variety
of sources for information on available tests for the purpose determined.
Resources include review books, journals, test manuals and textbooks on
testing and measurement (Anastasi, 1988; Cronbach, 1979). The most
complete source of information on a particular test is usually the test
manual.
(b) Administering
Test administration is usually standardised by the developers of the test. The instructions in the manual need to be followed in order to make a valid comparison of an individual's score with the test's norm group.
Non-standardised tests used in counselling are best given under controlled circumstances. This allows the counsellor's experience with the test to become an internal norm.
Issues of individual versus group administration of tests need to be considered as well. The clients and the purpose for which they are being tested will contribute to decisions about group testing.
(c) Scoring
Scoring of tests follows the instructions provided in the test manual. The counsellor is sometimes given the option of having the test machine-scored rather than hand-scored. Both the positive and negative aspects of this choice need to be considered. It is generally believed that test scoring is best handled by a machine because machine scoring is free from scorer bias.
(d) Interpreting
The interpretation of test results is usually the area which allows for the
greatest flexibility within the testing process. Depending upon the
counsellor's theoretical point of view and the extent of the test manual
guidelines, interpretation may be brief and superficial, or detailed and
explicitly theory based (Tinsley & Bradley, 1986).

As this area allows for the greatest flexibility, it is also the area with the
greatest danger of misuse. While scoring is best done by a bias-free
machine, interpretation by machine is often too rigid. What is needed is the
experience of a skilled test user to individualise the interpretation of results.
(e) Communicating
Here, the therapeutic skills of counsellors come fully into play (Phelps,
1974). The counsellor will use verbal and non-verbal interaction skills to
convey messages to clients and to assess their understanding.
Confidentiality, counsellor preparation, computer testing and client involvement are all issues within the ethical realm of testing. Ultimately, tests used by counsellors must be seen as an adjunct to the entire counselling process. Test results provide descriptive and objective data, which help counsellors to better assist their clients in making life choices. In order to make the best use of available tests in a counselling relationship, the process of testing and the issues which surround the process must be examined.
ACTIVITY 9.1
1. Do you think it is necessary for counsellors to impose their suggestions on their clients? Discuss this with your coursemates and tutor by considering the roles of psychology test and measurement in counselling psychology.
2. As a counsellor, would you be biased if either a stranger or a friend came to you for advice? Discuss with your coursemates and tutor how you would handle both situations.
3. By doing additional readings, find and explain a few tests which measure self-concept.

9.2 APPLICATION IN HEALTH PSYCHOLOGY AND HEALTHCARE
Health psychologists are interested in how behaviour and attitudes affect our
health, with the aim of promoting and maintaining health in the population; but
what does it mean to be healthy?
The World Health Organisation (2003) defines health as "a state of complete physical, mental and social well-being and not merely the absence of disease or infirmity".
How many people can be considered healthy based on that definition? This
definition of what it means to be healthy probably creates an unrealistic goal for
a vast majority of people.
9.2.1 Lifestyle and Disease

The way in which we live can have profound effects on our health. Personal
habits and lifestyle choices are known as behavioural pathogens because they
influence the onset and progression of disease. This is evident if we look at how
patterns of illnesses change as the lifestyles of society change.
A century ago, contagious and infectious diseases like smallpox, rubella and
influenza were much bigger killers than they are today. Nowadays, more deaths
are caused by heart diseases, cancer and strokes. While advances in medical
science have made a big difference, our lifestyle choices have also contributed to
this changing trend.
The Greek philosopher Plato believed that "where temperance is, there health is speedily imparted". Findings in health psychology support Plato's view: research shows that moderation in all things is the key to a long and healthy life.

Seven healthy lifestyle habits from a Western perspective have been identified
and are shown in Figure 9.2.
Figure 9.2: Seven healthy lifestyle habits from a Western perspective

A group of people was studied over a 25-year period. Those who followed all seven healthy habits mentioned previously had significantly lower mortality rates than those who followed fewer than three.
People are now more aware of what is good for them and their health. However, knowledge by itself does not lead to changes in behaviour. Even when we are ill and have been prescribed medicine, many of us do not follow our doctor's advice.
Research has shown that people are more likely to be compliant if the doctor
adopts a friendly approach, communicates well with the patients and provides
them with information about their condition and its treatment.
Even the waiting time to see the doctor can affect how compliant people are. Many people who are made to wait for more than 30 minutes to see the doctor will be reluctant to follow their doctor's advice. In one study, only 31% of people who had been kept waiting a long time complied with their treatment. In contrast, 67% of people who were kept waiting for less than 30 minutes were quite happy to follow their doctor's orders.
9.2.2 Tests and Measurement

The previous section provides a general picture of how lifestyle and disease are related. It shows that physical health is related to psychological factors, which is the major interest of health psychology. For this reason, various tests and measures have been developed for use in health psychology and healthcare settings to enhance illness and disease prevention and to support the health management of the general public.

Table 9.1 further explains the aspects measured in health psychology and
discusses some of the tests used in each of the aspects.
Table 9.1: Aspects Measured in Health Psychology
Aspects Measured Description

Stress  Stress is commonly experienced by everyone in their daily life.
Excessive stress can damage health.
 In health psychology, there are various tests to assess anxiety and stress, for example the State-Trait Anxiety Inventory and the Holmes-Rahe Stress Inventory, which are used to measure a person's anxiety and stress levels so that they can better understand and handle them.
Coping  There are tests and measurement of coping to assess the coping
styles of individuals when they are faced with challenging
situations in their life.
 One of these measures is the Ways of Coping Scale developed
by Lazarus et al. (Lazarus, 1995; Lazarus & Folkman, 1984).
Quality-of-life  Quality-of-life assessments help individuals to gain a better
understanding of the level of their quality of life and whether
they are living healthily.
 This will provide guidance for them to further improve their
lifestyle in order to live better both psychologically and
physically.
 One of the examples of common methods for measuring
quality of life is the Medical Outcome Study Short Form-36
(SF-36).
Pain  There are also inventories to measure pain, which are
commonly used in health psychology to help individuals in
managing pain.
 Although pain is commonly viewed as purely a sensory
phenomenon with biological mechanisms involved, more and
more studies reveal that pain is a complex perceptual
phenomenon that involves the operation of numerous
psychological processes (Passer & Smith, 2008), where culture,
meanings, beliefs, personality factors and social supports can
influence how a person feels about the pain he/she
experiences.
 Therefore, psychological measurement tools on pain will help
to understand the psychological aspects of pain, which in turn
will be useful for pain management.

ACTIVITY 9.2
1. Read further on the psychology test and measurement tools commonly used in health psychology and healthcare settings. For each of the aspects listed below, identify a specific tool used to measure that particular aspect in relation to health psychology:
(a) Stress and anxiety;
(b) Coping;
(c) Quality of life; and
(d) Pain.
2. Discuss critically with your coursemates and tutors your opinion of this statement: "Where temperance is, there health is speedily imparted".
9.3 NEUROPSYCHOLOGY TEST AND MEASUREMENT
In this section, we will focus on a specific field in psychology, neuropsychology,
which is closely related to clinical and health psychology. First, we will introduce
what neuropsychology is.
9.3.1 Neuropsychology
Do you know what neuropsychology is? Let us read its definition.
Neuropsychology is the study of the brain and how it relates to behaviour.
Neuropsychologists study the brain and its many different disorders. Their areas of study include the following:
(a) How alcohol affects the functioning of the brain;
(b) How HIV changes brain functioning and leads to problems in memory;

(c) The effects of AlzheimerÊs disease;
(d) Deficiencies in language;
(e) Motor and movement problems;
(f) Malingering;
(g) Injury or disease of the brain;
(h) Epilepsy; and
(i) Learning disabilities such as Attention Deficit Hyperactive Disorder

(ADHD).
9.3.2 Neuropsychological Tests and Measurement

A substantial amount of the psychological testing and measurement done in healthcare and hospital settings involves neuropsychological tests. These tests are ideally administered by neuropsychologists. However, clinical psychologists with training in clinical neuropsychology can also administer them. In Malaysia, there is still a shortage of neuropsychologists. Thus, clinical psychologists with training in neuropsychology play an important role in administering neuropsychological tests and measurement in our country.
Neuropsychologists use neuropsychological tests to test for brain dysfunction that can affect behaviour. Through these tests, neuropsychologists can identify disorders of the brain and spinal cord, either for further treatment or for the sake of gaining knowledge. Neuropsychology is also a diverse field which overlaps with psychological testing, neurology and psychiatry.
Neuropsychologists use many techniques to diagnose disorders, relying on tests that measure the following areas:
(a) Sensory input;
(b) Attention and concentration;
(c) Learning and memory;
(d) Language;
(e) Executive functions; and
(f) Motor output.

Measuring these different areas allows neuropsychologists, other psychologists and healthcare practitioners to determine the area in which the dysfunction exists. Table 9.2 describes the different areas measured by neuropsychologists and the tests used for each area.
Table 9.2: Areas Measured by Neuropsychologists

Sensory input
• Sensory input is important because humans need to hear, feel, smell and perceive incoming stimuli accurately in order to learn and function well.
• Neuropsychologists measure the senses to see whether there is a deficiency or dysfunction.
• One example of a test involves touching one of the hands of a blindfolded subject and asking him/her to identify which hand has been touched. This test determines whether the sense of touch is operating correctly.
• In addition, a faint sound can be presented to one ear to test whether hearing is functioning correctly.

Attention and concentration
• Attention and concentration involve the ability to attend to stimuli, sustain attention, shift attention, ignore irrelevant stimuli and divide attention between different tasks at the same time.
• Most attention tasks measure only one component of attention.
• An example of an attention test is the continuous performance test, which measures sustained attention. In this test, a person is asked to press a key whenever a certain letter or shape appears on the computer monitor.

Learning and memory
• Learning and memory are connected to each other, so it is not meaningful to separate them in tests.
• Tests measure various aspects of learning and memory, such as short-term memory, long-term memory, auditory and visual memory and verbal memory.
• An example of a test for learning and memory is the Wechsler Memory Scale-3 (WMS-3).

Language
• Language ability covers the use and understanding of language, that is, the ability to speak, write and read.
• Expressive language is the ability to speak, whereas receptive language is the ability to understand language.
• A deficiency in language skills due to brain damage is known as aphasia.
• There are two major forms of aphasia: Wernicke's aphasia, a deficiency in understanding language, and Broca's aphasia, a difficulty in producing language.
• The brain damage underlying aphasia can be identified using MRI, which reveals structural damage to the brain.

Executive functioning
• To test executive functioning (conceptual reasoning, planning, organisation and flexibility in thinking), the Wisconsin Card Sorting Test can be used.
• In this test, the subject is given a number of cards showing varying numbers, colours and shapes and is asked to sort the cards according to different aspects of the cards. The test helps to determine whether there is frontal lobe damage.

Motor output
• The last area that neuropsychologists measure is motor output. These tests measure both motor speed and accuracy.
• An example is the finger tapping test, in which subjects are asked to tap the index finger of each hand as fast as they can for one minute; the number of taps is counted as an indication of motor speed.
• In the grooved pegboard test, subjects are asked to place pegs in holes within a certain amount of time, using their left and right hands. The number of pegs placed in the pegboard is also counted as an indication of motor speed.
Source:
http://en.wikibooks.org/wiki/Psychological_Testing/Testing_in_Health_Psychology
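The continuous performance test in Table 9.2 can be made more concrete with a small scoring sketch. It assumes a letter-based version in which the target is 'X' and counts three outcomes: hits, omissions (targets the person missed) and commissions (key presses on non-targets). The target letter, the stimulus sequence and the names score_cpt and CPTResult are hypothetical; published CPTs use their own, more detailed scoring rules.

```python
from dataclasses import dataclass


@dataclass
class CPTResult:
    hits: int         # target shown and key pressed
    omissions: int    # target shown but no key press (a lapse in sustained attention)
    commissions: int  # non-target shown but key pressed (failure to ignore an irrelevant stimulus)


def score_cpt(stimuli, responses, target="X"):
    """Score one hypothetical continuous performance test session.

    stimuli   -- the letters shown on the monitor, one per trial
    responses -- True if the person pressed the key on that trial, else False
    target    -- the letter the person was told to respond to (assumed to be "X")
    """
    if len(stimuli) != len(responses):
        raise ValueError("each trial needs exactly one response flag")
    hits = omissions = commissions = 0
    for letter, pressed in zip(stimuli, responses):
        if letter == target:
            if pressed:
                hits += 1
            else:
                omissions += 1
        elif pressed:
            commissions += 1
    return CPTResult(hits, omissions, commissions)


# Eight hypothetical trials: four targets (X) and four non-targets.
result = score_cpt(list("AXBXCXDX"), [False, True, False, False, True, True, False, True])
print(result)  # CPTResult(hits=3, omissions=1, commissions=1)
```

In this sketch, many omissions would point to difficulty sustaining attention, while many commissions would suggest difficulty ignoring irrelevant stimuli, two of the components of attention listed in the table.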
SELF-CHECK 9.1
1. What ability can be tested using the Wisconsin Card Sorting test?
2. Explain what the Wechsler Memory Scale-3 (WMS-3) measures.

9.4 APPLICATIONS IN CLINICAL PSYCHOLOGY

Clinical psychology is concerned with the understanding and treatment of psychological distress. Its research and psychotherapy focus on the more severe forms of behavioural and mental pathology, in contrast to counselling psychology, which focuses on 'everyday' concerns and problems such as those related to marriage, family, academics and career (Cohen & Swerdlik, 2010).
Since clinical psychology deals with behavioural and mental pathology, it is important first to understand the concept of psychopathology. After that, we will highlight the personality tests of psychopathology and the measures of psychological and mental disorders commonly used in clinical psychology.
9.4.1 Psychopathology
Psychopathology is the study of mental illness. A mental disorder or mental illness is a psychological or behavioural pattern, associated with distress or disability, that occurs in an individual and is not a part of normal development. The term is most commonly used within psychiatry, the branch of medicine that deals with the diagnosis, treatment and prevention of mental and emotional disorders, while 'pathology' refers to disease processes.
Psychiatry uses medical models to understand and treat psychopathology, whereas clinical psychology, which also deals with psychopathology, uses psychological and social models to investigate it and applies various psychotherapy approaches for intervention.
Another term closely related to psychopathology is abnormal psychology.
Abnormal psychology is the study of mental and emotional disorders, maladaptive behaviours or altered mental phenomena such as dreams, hypnosis and other altered states or levels of consciousness. This term is used more frequently in the non-medical field of psychology.

9.4.2 Psychopathology as the Study of Mental Illness

Many different professions may be involved in studying mental illness or distress. Most notably, psychiatrists and clinical psychologists are particularly interested in this area and may be involved in the clinical treatment of mental illness, in research into the origin, development and manifestations of such states, or often in both.
Other specialties may also be involved in the study of psychopathology. For example, a neuroscientist may focus on brain changes related to mental illness. Therefore, someone referred to as a psychopathologist may be any one of a number of professionals who have specialised in studying this area.
Psychopathology should not be confused with psychopathy, which refers to a specific type of personality disorder.
Psychiatrists in particular are interested in descriptive psychopathology, which aims to describe the symptoms and syndromes of mental illness. This serves both the diagnosis of individual patients (to see whether the patient's experience fits any pre-existing classification) and the creation of diagnostic systems (such as the Diagnostic and Statistical Manual of Mental Disorders (DSM) or the International Statistical Classification of Diseases and Related Health Problems (ICD)), which define exactly which signs and symptoms should make up a diagnosis and how experiences and behaviours should be grouped into particular diagnoses (e.g. clinical depression, paraphrenia, paranoia and schizophrenia).
9.4.3 Psychopathology as a Descriptive Term

The term psychopathology may also be used to denote behaviours or experiences that are indicative of mental illness, even if they do not constitute a formal diagnosis. For example, hallucinating may be considered a psychopathological sign, even if there are not enough symptoms present to fulfil the criteria for one of the disorders listed in the DSM or ICD.
In a more general sense, any behaviour or experience that causes impairment, distress or disability, particularly if it is thought to arise from a functional breakdown in either the cognitive or neuro-cognitive system in the brain, may be classified as psychopathology.

After understanding the concept of psychopathology, we can now move on to discuss the specific psychology tests and measurement tools related to psychopathology that are essential to the field of clinical psychology.
SELF-CHECK 9.2
1. Define psychopathology.
2. Differentiate between psychiatry and psychopathology.
3. How does hallucination affect human behaviour?
9.5 THE MINNESOTA MULTIPHASIC PERSONALITY INVENTORY (MMPI)
The Minnesota Multiphasic Personality Inventory (MMPI) is one of the most
frequently used personality tests in mental health. The test is used by trained
professionals to assist in identifying personality structure and psychopathology.
9.5.1 Overview, History and Development

The original authors of the MMPI were Starke R. Hathaway, PhD and
J. C. McKinley, MD. The MMPI is copyrighted by the University of Minnesota.
The standardised answer sheets can be hand scored with templates that fit over
the answer sheets, but most tests are computer scored.
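Whether it is done with a template or by computer, obtaining a raw score on a scale amounts to counting the answers that match the scale's keyed direction. The sketch below illustrates this with a hypothetical five-item key; the item numbers, keyed answers and the function name raw_score are invented for illustration and are not real MMPI items or an official scoring program.

```python
# Hypothetical scoring key: item number -> keyed answer (True = "true", False = "false").
# These items are invented for illustration and are NOT actual MMPI-2 items.
HYPOTHETICAL_KEY = {3: True, 8: False, 15: True, 21: True, 40: False}


def raw_score(answers, key):
    """Count the answers that match the keyed direction of a scale.

    answers -- dict of item number -> True/False, or None if the item was left blank
    key     -- dict of item number -> keyed answer for the scale
    Blank or missing items earn no point, just as a template would show no mark.
    """
    return sum(1 for item, keyed in key.items() if answers.get(item) == keyed)


answers = {3: True, 8: False, 15: False, 21: None, 40: False}
print(raw_score(answers, HYPOTHETICAL_KEY))  # 3
```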
Computer scoring programs for the current standardised version, the MMPI-2, are licensed by the University of Minnesota Press to Pearson Assessments and other companies located in different countries. The computer scoring programs offer a range of scoring profile choices, including the extended score report, which includes data on the newest and most psychometrically advanced scales: the Restructured Clinical scales (RC scales).
The extended score report also provides scores on the more traditionally used
clinical scales as well as content, supplementary and other subscales of potential
interest to clinicians.

The use of the MMPI is tightly controlled for ethical and financial reasons. The clinician using the MMPI has to pay for materials and for scoring and report services, as well as for installing the computerised program. The most historically significant stages in the development of the MMPI are:
(a) MMPI
The original MMPI was developed in 1939 (Groth-Marnat, 2009) using an empirical keying approach, which means that the clinical scales were derived by selecting items that were endorsed by patients known to have been diagnosed with certain pathologies.
The difference between this approach and other test development strategies used at the time was that it was atheoretical (not based on any particular theory); thus, the initial test was not aligned with the prevailing psychodynamic theories of the time.
The atheoretical approach to MMPI development seemingly enabled the test to capture aspects of human psychopathology that remained recognisable and meaningful despite changes in clinical theories.
However, because the MMPI scales were created based on groups with known psychopathologies, the scales themselves were not entirely atheoretical: the participants' clinical diagnoses were used to determine the content of the scales.
(b) MMPI-2
The first major revision of the MMPI was the MMPI-2, which was
standardised based on a new national sample of adults in the United States
and released in 1989. It is appropriate for use with adults aged 18 and over.
Subsequent revisions of certain test elements have been published, and a wide variety of subscales have also been introduced over the years to help clinicians interpret the results of the original clinical scales, which were found to contain a general factor that made interpretation of scores on the clinical scales difficult.
The current MMPI-2 has 567 items, all in true-or-false format, and usually takes between one and two hours to complete, depending on the participant's reading level.

There is an infrequently used abbreviated form of the test which consists of the MMPI-2's first 370 items. The shorter version has mainly been used in circumstances that did not allow the full version to be completed (e.g. illness or time pressure), but the scores available on the shorter version are not as extensive as those available in the 567-item version.
(c) MMPI-A
A version of the test designed for adolescents, the MMPI-A, was released in
1992. The MMPI-A has 478 items, with a short form of 350 items.
(d) MMPI-2 RF
A new and psychometrically improved version of the MMPI-2 has recently been developed, employing the rigorous statistical methods that were used to develop the restructured clinical (RC) scales in 2003. The new MMPI-2 Restructured Form (MMPI-2-RF) has now been released by Pearson Assessments.
The MMPI-2-RF produces scores on a theoretically grounded, hierarchically structured set of scales, including the RC scales. The modern methods used to develop the MMPI-2-RF were not available at the time the MMPI was originally developed. The MMPI-2-RF builds on the foundation of the RC scales, which have been extensively researched since their publication in 2003.
Publications on the RC scales include book chapters and multiple articles in peer-reviewed journals, which address the use of the scales in a wide range of settings. The MMPI-2-RF scales rest on an assumption that psychopathology is a homogeneous condition that is additive.
SELF-CHECK 9.3
1. What is the basic concept of MMPI?
2. Why did MMPI-2 come into the picture?
3. What is the basis for the origins of MMPI-2-RF?

9.5.2 Current Scale Composition

In this section, we will look at different scales and their respective composition.
(a) Clinical Scales

In the current version, the clinical scales range from Scale 1 to Scale 0, and each scale measures a different aspect of personality and psychopathology. Each scale is also known by a name closely related to what it measures, as explained in Table 9.3.
Table 9.3: New Clinical Scales of MMPI and Their Purposes
Scale 1 (The Hypochondriasis Scale): Measures a person's perception of, and preoccupation with, their health and health issues.
Scale 2 (The Depression Scale): Measures the level of a person's depressive symptoms.
Scale 3 (The Hysteria Scale): Measures the emotionality of a person.
Scale 4 (The Psychopathic Deviate Scale): Measures a person's need for control or their rebellion against control.
Scale 5 (The Femininity/Masculinity Scale): Measures how closely a person matches gender stereotypes compared with other people. For men, the stereotype would be the 'Marlboro man'; for women, it would be June Cleaver or Donna Reed.
Scale 6 (The Paranoia Scale): Measures a person's inability to trust.
Scale 7 (The Psychasthenia Scale): Measures a person's anxiety levels and tendencies.
Scale 8 (The Schizophrenia Scale): Measures a person's unusual or odd cognitive, perceptual and emotional experiences.
Scale 9 (The Hypomania Scale): Measures a person's energy.
Scale 0 (The Social Introversion Scale): Measures whether people enjoy and are comfortable being around other people.
Source: http://sevencounties.org/poc/view_doc.php?type=doc&id=8214&cn=18

The original clinical scales were designed to measure common diagnoses of the era, as shown in Table 9.4.
Table 9.4: The Original Clinical Scales of MMPI
(Each entry gives the scale number, abbreviation and name, what is measured and the number of items.)
1. Hs, Hypochondriasis: concern with bodily symptoms (32 items)
2. D, Depression: depressive symptoms (57 items)
3. Hy, Hysteria: awareness of problems and vulnerabilities (60 items)
4. Pd, Psychopathic Deviate: conflict, struggle, anger, respect for society's rules (50 items)
5. Mf, Masculinity/Femininity: stereotypical masculine or feminine interests/behaviours (56 items)
6. Pa, Paranoia: level of trust, suspiciousness, sensitivity (40 items)
7. Pt, Psychasthenia: worry, anxiety, tension, doubts, obsessiveness (48 items)
8. Sc, Schizophrenia: odd thinking and social alienation (78 items)
9. Ma, Hypomania: level of excitability (46 items)
10. Si, Social Introversion: people orientation (69 items)
Source:
http://en.wikipedia.org/wiki/Minnesota_Multiphasic_Personality_Inventory
Codetypes are combinations of the one, two or three (and, according to a few authors, even four) highest-scoring clinical scales (for example, scales 4, 8 and 2 give the codetype 482).
A codetype is interpreted as a single, wider-ranging elevation, rather than by interpreting each scale individually.
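The way a codetype is formed can be sketched in a few lines: rank the clinical scales by T-score and join the numbers of the highest ones. The profile below is hypothetical, chosen so that Scales 4, 8 and 2 are highest as in the example above, and the function name codetype is an illustrative assumption rather than part of any scoring program.

```python
def codetype(t_scores, n=2):
    """Build a codetype from the n highest-scoring clinical scales.

    t_scores -- dict of clinical scale number (1-9 and 0) -> T-score
    n        -- how many scales to combine (commonly 2 or 3)
    Ties keep the order in which scales appear in the dict (a simplifying assumption).
    """
    ranked = sorted(t_scores, key=lambda scale: t_scores[scale], reverse=True)
    return "".join(str(scale) for scale in ranked[:n])


# Hypothetical profile in which Scales 4, 8 and 2 are the three highest.
profile = {1: 55, 2: 70, 3: 58, 4: 82, 5: 50, 6: 60, 7: 62, 8: 76, 9: 48, 0: 52}
print(codetype(profile, n=3))  # 482
print(codetype(profile))       # 48
```

In practice, clinicians also check that the scales in a codetype are clinically elevated (on the MMPI-2, T-scores of 65 or above are conventionally treated as elevated); this sketch leaves that step out.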
(b) Validity Scales
The validity scales in the MMPI-2-RF are minor revisions of those contained in the MMPI-2, which include three basic types of validity measures:
(i) Those designed to detect non-responding or inconsistent responding (CNS, VRIN, TRIN);
(ii) Those designed to detect when clients are over-reporting or exaggerating the prevalence or severity of psychological symptoms (F, Fb, Fp, FBS); and
(iii) Those designed to detect when test takers are under-reporting or downplaying psychological symptoms (L, K).
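The first type of validity measure can be illustrated in code. The sketch below computes a CNS-style count of unanswered items, a VRIN-style count of similar-content item pairs answered inconsistently and a TRIN-style net score on opposite-content pairs. The item pairs and function names are hypothetical; the real VRIN and TRIN scales use their own published item pairs, weights and score conversions, which this sketch does not reproduce.

```python
# Hypothetical item pairs, invented for illustration (not real MMPI-2 item pairs).
SIMILAR_PAIRS = [(1, 2), (3, 4)]   # a consistent test taker answers these the same way
OPPOSITE_PAIRS = [(5, 6), (7, 8)]  # a consistent test taker answers these differently


def cannot_say(answers):
    """CNS-style count: the number of items left unanswered (None)."""
    return sum(1 for answer in answers.values() if answer is None)


def vrin_style(answers, similar_pairs):
    """VRIN-style count: similar-content pairs answered in opposite directions."""
    count = 0
    for a, b in similar_pairs:
        first, second = answers.get(a), answers.get(b)
        if first is not None and second is not None and first != second:
            count += 1
    return count


def trin_style(answers, opposite_pairs):
    """TRIN-style net score on opposite-content pairs.

    +1 for each pair answered True/True, -1 for each pair answered False/False.
    A large positive total suggests indiscriminate 'true' answering,
    a large negative total indiscriminate 'false' answering.
    """
    net = 0
    for a, b in opposite_pairs:
        if answers.get(a) is True and answers.get(b) is True:
            net += 1
        elif answers.get(a) is False and answers.get(b) is False:
            net -= 1
    return net


answers = {1: True, 2: False, 3: True, 4: True, 5: True, 6: True, 7: True, 8: True, 9: None}
print(cannot_say(answers), vrin_style(answers, SIMILAR_PAIRS), trin_style(answers, OPPOSITE_PAIRS))
# 1 1 2
```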
A new addition to the validity scales of the MMPI-2-RF is a scale for over-reporting of somatic symptoms (Fs), as shown in Table 9.5.
Table 9.5: Validity Scales for MMPI-2-RF
(Each entry gives the abbreviation and name, the version in which the scale was new, and what it assesses.)
CNS, 'Cannot Say' (new in version 1): questions not answered
L, Lie (new in version 1): client 'faking good'
F, Infrequency (new in version 1): client 'faking bad' (in first half of test)
K, Defensiveness (new in version 1): denial/evasiveness
Fb, Back F (new in version 2): client 'faking bad' (in last half of test)
VRIN, Variable Response Inconsistency (new in version 2): answering similar/opposite question pairs inconsistently
TRIN, True Response Inconsistency (new in version 2): answering questions all true/all false
F-K, F minus K (new in version 2): honesty of test responses/not faking good or bad
S, Superlative Self-Presentation (new in version 2): improving upon K scale, 'appearing excessively good'
Fp, F-Psychopathology (new in version 2): frequency of presentation in clinical setting
Fs, Infrequent Somatic Response (new in version 2-RF): over-reporting of somatic symptoms
Source:

(c) Content Scales

To supplement these multidimensional scales and to assist in interpreting the frequently seen diffuse elevations due to the general factor (removed in the RC scales), additional scales were also developed. Among the more frequently used are the substance abuse scales (MAC-R, APS, AAS), designed to assess the extent to which a client admits to, or is prone to, abusing substances, and the A (Anxiety) and R (Repression) scales, developed by Welsh after conducting a factor analysis of the original MMPI item pool.
Dozens of content scales currently exist, some samples of which are shown
in Table 9.6.
Table 9.6: Content Scales
Abbreviation Description
Es Ego Strength Scale
OH Over-Controlled Hostility Scale
MAC MacAndrews Alcoholism Scale
MAC-R MacAndrews Alcoholism Scale Revised
Do Dominance Scale
APS Addictions Potential Scale
AAS Addictions Acknowledgement Scale
SOD Social Discomfort Scale
A Anxiety Scale
R Repression Scale
TPA Type A Scale
MDS Marital Distress Scale
Source:
(d) PSY-5 Scales

Unlike the Content and Supplementary scales, the PSY-5 scales were not developed in response to some actual or perceived shortcoming in the MMPI-2 itself, but rather as an attempt to connect the instrument with more general trends in personality psychology.

The five factor model of human personality has gained wide acceptance in studies of non-pathological populations. The PSY-5 scales differ from the five factors identified in non-pathological populations in that they were meant to determine the extent to which personality disorders might manifest and be recognisable in clinical populations. The five components were labelled as:
(i) Negative Emotionality (NEGE);
(ii) Psychoticism (PSYC);
(iii) Introversion (INTR);
(iv) Disconstraint (DISC); and
(v) Aggressiveness (AGGR).
9.5.3 Scoring and Interpretation

Like many standardised tests, scores on the various scales of the MMPI-2 and the MMPI-2-RF do not represent either a percentile rank or how 'well' or 'poorly' someone has done on the test. Rather, analysis looks at the relative elevation of factors compared with the various norm groups studied.
Raw scores on the scales are transformed into a standardised metric known as T-scores (mean = 50, standard deviation = 10), making interpretation easier for clinicians. Test manufacturers and publishers ask test purchasers to prove that they are qualified to purchase the MMPI/MMPI-2/MMPI-2-RF and other tests (Sevencounties.org, 2014).
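The T-score transformation follows the formula T = 50 + 10 × (X - M) / SD, where X is a person's raw score and M and SD are the mean and standard deviation of the norm group on that scale. A minimal sketch, assuming a hypothetical norm group with a mean raw score of 20 and a standard deviation of 5 (the function name linear_t is also illustrative):

```python
def linear_t(raw, norm_mean, norm_sd):
    """Convert a raw score to a linear T-score (norm group mean 50, SD 10).

    norm_mean and norm_sd are the mean and standard deviation of the raw
    scores in the relevant norm group.
    """
    if norm_sd <= 0:
        raise ValueError("the norm group's standard deviation must be positive")
    z = (raw - norm_mean) / norm_sd  # distance from the norm mean in SD units
    return 50 + 10 * z


# Hypothetical norm group for one scale: mean raw score 20, SD 5.
print(linear_t(30, 20, 5))  # 70.0  (two SDs above the norm mean)
print(linear_t(15, 20, 5))  # 40.0  (one SD below the norm mean)
```

The published MMPI-2 norms use a related 'uniform T' procedure for most clinical scales, so that a given T-score corresponds to roughly the same percentile across scales; this sketch shows only the simpler linear version.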
SELF-CHECK 9.4
What do the scales in MMPI denote?

9.6 THE MILLON CLINICAL MULTIAXIAL INVENTORY (MCMI)
The Millon Clinical Multiaxial Inventory-III (MCMI-III) is a psychological assessment tool intended to provide information on psychopathology, including specific disorders outlined in the DSM-IV. It is intended for adults (18 and above) who have at least an eighth-grade reading level.
The MCMI was developed and standardised specifically on clinical populations (i.e. patients in psychiatric hospitals or people with existing mental health problems), and the authors were very specific that it should not be used with the general population. However, there is a strong evidence base showing that it retains its validity in non-clinical populations, and so psychologists will often also administer the test to members of the general population.
It is composed of 175 true-false questions that reportedly take 25 to 30 minutes to complete. It was created by Theodore Millon, Carrie Millon, Roger Davis and Seth Grossman. The test was normed on a sample of 998 male and female adults with a wide variety of clinical disorders.
The test is organised into the following four groups of scales:
(a) 14 Personality Disorder Scales;
(b) 10 Clinical Syndrome Scales;
(c) Five Correction Scales, which include:
(i) Three Modifying Indices (which determine the patient's response style and can detect random responding); and
(ii) Two Random Response Indicators; and
(d) 42 Grossman Personality Facet Scales (based on Seth Grossman's theories of personality and psychopathology).

9.6.1 Psychometrics of MCMI-III

The MCMI (Millon Clinical Multiaxial Inventory) is distinguished from other inventories primarily by its brevity, theoretical anchoring, multiaxial format, tripartite construction and validation schema, use of base rate scores and interpretive depth.
The Millon Clinical Multiaxial Inventory-III, Third Edition (MCMI-III) (2009) has
new norms and updated scoring.
Each generation of the MCMI inventory has attempted to keep the total number
of items small enough to encourage its use in all types of diagnostic and
treatment settings. Yet, it is kept large enough to permit the assessment of a wide
range of clinically relevant multiaxial behaviours.
At 175 items, the MCMI inventory is much shorter than comparable instruments.
Terminology is geared to an eighth-grade reading level. The inventory is almost
self-administering. A great majority of patients can complete the MCMI-III™ in
20 to 30 minutes, facilitating relatively simple and rapid administrations while
minimising patient resistance and fatigue.
According to Millon.net (2014), the following are some of the descriptions of the MCMI-III™:
(a) Theoretical Anchoring

Diagnostic instruments are more useful when they are linked systematically
to a comprehensive clinical theory. Unfortunately, assessment techniques
and personality theories have developed almost independently. As a result,
very few diagnostic measures have either been based on or have evolved
from clinical theory.
The MCMI is different. Each of its Axis II scales is an operational measure of a syndrome derived from a theory of personality (Millon, 1969, 1981, 1986a, 1986b, 1990; Millon & Davis, 1996). The scales and profiles of the MCMI thus measure these theory-derived and theory-refined variables directly and quantifiably.
With a firm foundation in measurement, scale elevations and configurations can be used to suggest specific patient diagnoses and clinical dynamics, as well as testable hypotheses about social history and current behaviour.

(b) Base Rate Scores

An important feature that distinguishes the MCMI inventory from other inventories is its use of actuarial base rate data, rather than normalised standard score transformations.
T-scores implicitly assume the prevalence rates of all disorders to be equal, for example, that there are equal numbers of people with depression and with schizophrenia. In contrast, the MCMI inventory is anchored to the percentages of patients who are actually found to have each disorder across diagnostic settings. These data not only provide a basis for selecting optimal differential diagnostic cutting lines, but also ensure that the frequency of MCMI-generated diagnoses and profile patterns will be comparable to representative clinical prevalence rates.
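The idea behind prevalence-anchored cutting lines can be sketched as follows. With a fixed T-score cut-off, approximately the same proportion of people is flagged on every scale; with a prevalence-anchored cut-off, each scale flags roughly the proportion of patients who actually have the disorder. This is only an illustration under stated assumptions: the clinical sample, the prevalence rates and the function names are hypothetical, and Millon's actual base rate transformation, which maps raw scores onto a base rate scale using anchor points, is not reproduced here.

```python
def prevalence_cutoff(sample_scores, prevalence):
    """Pick a raw-score cut-off that flags roughly `prevalence` of a clinical sample.

    sample_scores -- raw scores on one scale from a clinical reference sample
    prevalence    -- proportion (0-1) of that sample known to have the disorder
    Scores tied with the cut-off are also flagged, so ties can push the share up.
    """
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be between 0 and 1")
    ranked = sorted(sample_scores, reverse=True)
    n_flagged = max(1, round(prevalence * len(ranked)))
    return ranked[n_flagged - 1]


def share_flagged(sample_scores, cutoff):
    """Proportion of the sample scoring at or above the cut-off."""
    return sum(score >= cutoff for score in sample_scores) / len(sample_scores)


# Hypothetical clinical sample of 20 patients with raw scores 1..20 on two scales,
# where the two disorders are assumed to have different prevalence rates.
sample = list(range(1, 21))
for disorder, prevalence in [("depression", 0.30), ("schizophrenia", 0.10)]:
    cutoff = prevalence_cutoff(sample, prevalence)
    print(disorder, cutoff, share_flagged(sample, cutoff))
# depression 15 0.3
# schizophrenia 19 0.1
```

Because each cut-off is tied to its own prevalence rate, the more common disorder is flagged more often, which is the property the MCMI's base rate approach is designed to preserve.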
(c) Computer Scoring and Interpretation

Computer programs are available for rapid and convenient machine scoring in all major computing environments. Interpretive reports are available at two levels of detail.
The Profile Report presents the patient's MCMI scores and profile and is useful as a screening device to identify patients who may require more intensive evaluation or professional attention.
The Narrative Report integrates both the personological and symptomatic features of the patient and is arranged in a style similar to the reports prepared by clinical psychologists. Results are based on actuarial research, the MCMI's theoretical schema and relevant DSM diagnoses within a multiaxial framework. Therapeutic implications are included as well.
(d) Clinical Uses

The primary function of the MCMI inventory is to provide information to
clinicians, including psychologists, psychiatrists, counsellors, social
workers, physicians and nurses, who must make assessments and
treatment decisions concerning people with emotional and interpersonal
difficulties.
Due to its simplicity of administration and the availability of rapid computer scoring and interpretation, the MCMI inventory can be used on a routine basis in:
(i) Outpatient clinics;
(ii) Community agencies;
(iii) Mental health centres;
(iv) College counselling programmes;
(v) General and mental hospitals;
(vi) Independent and group practice offices; and
(vii) The courts.
(e) Research
Over 600 research studies have used the MCMI inventory in a significant
manner. Objective, quantified and theory-grounded individual scale scores
and profile patterns can be used to generate and test a variety of clinical,
experimental and demographic hypotheses. Research support is also
available through Pearson Assessments.
(f) Scales
The current version, the MCMI-III, is composed of 175 items that are scored to produce 28 scales divided into the following categories (Groth-Marnat, 2009):
(i) Clinical Personality Patterns;
(ii) Severe Personality Pathology;
(iii) Clinical Syndromes;
(iv) Severe Syndromes; and
(v) Modifying Indices.
The personality scales parallel the personality disorders of the DSM-III-R and DSM-IV, as refined by theory. They are grouped into two levels of severity:
(i) The Clinical Personality Patterns scales; and
(ii) The Severe Personality Pathology scales.
The Axis I scales represent clinical conditions frequently seen in clinical settings. They are also grouped into two levels of severity:
(i) The Clinical Syndrome scales; and
(ii) The Severe Syndrome scales.

The three Modifying Indices (Disclosure, Desirability and Debasement) assess response tendencies which are connected to particular personality patterns or Axis I conditions.
The contents of these scale categories in MCMI-III consist of:
(i) Eleven Clinical Personality Patterns scales;
(ii) Three Severe Personality Pathology scales;
(iii) Seven Clinical Syndrome Scales;
(iv) Three Severe Clinical Syndrome scales; and
(v) Three Modifying Indices and one Validity index.
Table 9.7 provides a clearer outline of the respective scale categories, their
name and the number of relevant items in measuring each scale.
Table 9.7: MCMI-III Scale Categories and Number of Items
(Each entry gives the scale name and its number of items.)

Modifying Indices
• Disclosure: NA
• Desirability: 21
• Debasement: 33
• Validity: 4
Clinical Personality Patterns
• Schizoid: 16
• Avoidant: 16
• Depressive: 15
• Dependent: 16
• Histrionic: 17
• Narcissistic: 24
• Antisocial: 17
• Aggressive (Sadistic): 20
• Compulsive: 17
• Passive-Aggressive (Negativistic): 16
• Self-Defeating: 15
Severe Personality Pathology
• Schizotypal: 16
• Borderline: 16
• Paranoid: 17
Clinical Syndromes
• Anxiety: 14
• Somatoform: 12
• Bipolar (Manic): 13
• Dysthymia: 14
• Alcohol Dependence: 15
• Drug Dependence: 14
• Post-traumatic Stress Disorder: 16
Severe Syndromes
• Thought Disorder: 17
• Major Depression: 17
• Delusional Disorder: 13
Source: Adapted from Millon (1997)
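As an arithmetic check on Table 9.7, the sketch below records its scale names and item counts and totals them. It confirms that the table lists 28 scales, matching the figure given above, and shows that the item counts add up to far more than the 175 items in the inventory, which implies that many items are scored on more than one scale. Disclosure is stored as None because its item count is given as NA; the variable names are illustrative.

```python
# Scale names and item counts copied from Table 9.7 (None = count given as NA).
TABLE_9_7 = {
    "Modifying Indices": {
        "Disclosure": None, "Desirability": 21, "Debasement": 33, "Validity": 4,
    },
    "Clinical Personality Patterns": {
        "Schizoid": 16, "Avoidant": 16, "Depressive": 15, "Dependent": 16,
        "Histrionic": 17, "Narcissistic": 24, "Antisocial": 17,
        "Aggressive (Sadistic)": 20, "Compulsive": 17,
        "Passive-Aggressive (Negativistic)": 16, "Self-Defeating": 15,
    },
    "Severe Personality Pathology": {"Schizotypal": 16, "Borderline": 16, "Paranoid": 17},
    "Clinical Syndromes": {
        "Anxiety": 14, "Somatoform": 12, "Bipolar (Manic)": 13, "Dysthymia": 14,
        "Alcohol Dependence": 15, "Drug Dependence": 14,
        "Post-traumatic Stress Disorder": 16,
    },
    "Severe Syndromes": {"Thought Disorder": 17, "Major Depression": 17, "Delusional Disorder": 13},
}

# Number of scales across all categories.
n_scales = sum(len(scales) for scales in TABLE_9_7.values())
# Total item count across all scales, skipping the NA entry.
n_item_slots = sum(
    count
    for scales in TABLE_9_7.values()
    for count in scales.values()
    if count is not None
)
print(n_scales)            # 28
print(n_item_slots)        # 441
print(n_item_slots > 175)  # True: items must be shared between scales
```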
The MCMI-III is a recent development in that it adds value to the basic inventory. For the first time, it includes a series of facet subscales for refining and maximising the utility of each of the major personality scales. Known as the Grossman Facet Scales, they provide information specifying patients' scores on several of the personological/clinical domains, such as problematic interpersonal conduct, cognitive styles, expressive behaviours and the like.
The facet scales thereby contribute useful diagnostic information that should help clinicians better understand the particular realms of functioning in which patients' difficulties manifest themselves. They should also provide the clinical practitioner with guidance for selecting specific therapeutic modalities that are likely to maximise the achievement of positive treatment goals.
Scale descriptions and detailed data on test development and validation can be obtained from the latest (2006) MCMI-III test manual.

SELF-CHECK 9.5
1. What is the basic use of MCMI?
2. What difference did the Third Edition of the MCMI, or MCMI-III, make?
3. How is MMPI different from the MCMI?
9.7 DIAGNOSTIC AND STATISTICAL MANUAL OF MENTAL DISORDERS
The Diagnostic and Statistical Manual of Mental Disorders (DSM), published by the American Psychiatric Association, is a widely used manual for assessing and diagnosing mental disorders. Many mental health professionals use the manual to determine and help communicate a patient's diagnosis after an evaluation; hospitals, clinics and insurance companies in the US also generally require a 'five axis' DSM diagnosis of all the patients treated.
The DSM can be used clinically and also to categorise patients using diagnostic criteria for research purposes. Studies of specific disorders often recruit patients whose symptoms match the criteria listed in the DSM for that disorder.
An international survey of psychiatrists in 66 countries comparing the use of the ICD-10 and DSM-IV found that the former was more often used for clinical diagnosis while the latter was more valued for research.
The DSM, including DSM-IV, is a registered trademark belonging to the American Psychiatric Association (APA). It is a bestselling publication from which the APA makes 'huge profits' and gains considerable clout in the world of psychiatry, especially as many reputable research journals require studies to use DSM classification in order to be published.

9.7.1 History of DSM

The initial impetus for developing a classification of mental disorders in the
United States was the need to collect statistical information.
The first official attempt was the 1840 census, which used a single category, 'idiocy/insanity'. The 1880 census distinguished among seven categories of mental illness, which are listed in Figure 9.3.
Figure 9.3: Seven categories of mental illness
In 1917, a "Committee on Statistics" from what is now the American
Psychiatric Association (APA) worked with the National Commission on
Mental Hygiene. Together they developed a new guide for mental hospitals called
the "Statistical Manual for the Use of Institutions for the Insane", which
included 22 diagnoses. The APA revised this manual several times over the years.
The APA and the New York Academy of Medicine also provided the
psychiatric nomenclature subsection of the US medical guide, the "Standard
Classified Nomenclature of Disease", referred to as the "Standard".

9.7.2 Developments of DSM

The DSM has gone through seven versions between 1952 and 2013, and this
section discusses each one in turn.
Figure 9.4 shows the seven versions of the DSM, from the earliest to the latest.
Figure 9.4: Seven DSMs from the earliest to the latest
The seven versions are described below, from the earliest to the latest:
(a) DSM-I (1952)
During World War II, US psychiatrists were heavily involved in the
selection, processing, assessment and treatment of soldiers. This moved the
focus away from mental institutions and traditional clinical perspectives.
A committee headed by psychiatrist and brigadier general William C.
Menninger developed a new classification scheme called Medical 203.
It was issued in 1943 as a "War Department Technical Bulletin" under the
auspices of the Office of the Surgeon General.
The foreword to the DSM-I states that the US Navy had made some
minor revisions of its own, but "the Army established a much more sweeping revision,
abandoning the basic outline of the Standard and attempting to express
present day concepts of mental disturbance. This nomenclature eventually
was adopted by all Armed Forces". It adds that "assorted modifications of the
Armed Forces nomenclature [were] introduced into many clinics and
hospitals by psychiatrists returning from military duty." The Veterans
Administration also adopted a slightly modified version of Medical 203.
In 1949, the World Health Organisation published the sixth revision of the
International Statistical Classification of Diseases (ICD). For the first time,
it included a section on mental disorders. According to the foreword to the
DSM-I, this section "categorised mental disorders in rubrics similar to those of
the Armed Forces nomenclature."

An APA Committee on Nomenclature and Statistics was given the task of
developing a version specifically for use in the United States. Its purpose was
to standardise the diverse and confused usage of different documents. In 1950,
the committee undertook a review and consultation. It circulated an adaptation
of Medical 203, the VA system and the Standard's Nomenclature to
approximately 10% of APA members.
Of these members, 46% replied, and 93% of those who replied approved. The
document was then revised further and became known as DSM-I. The Diagnostic
and Statistical Manual of Mental Disorders was approved in 1951 and published
in 1952. Its structure and conceptual framework were the same as in Medical 203,
and many passages of the text were identical. The manual was 130 pages long and
listed 106 mental disorders.
(b) DSM-II (1968)
The APA was closely involved in the next significant revision of the mental
disorder section of the ICD (version 8, in 1968). It also decided to go ahead
with a revision of the DSM. DSM-II was likewise published in 1968. It listed
182 disorders, was 134 pages long and was quite similar to DSM-I.
The term "reaction" was dropped, but the term "neurosis" was
retained. Both DSM-I and DSM-II reflected the predominant psychodynamic
psychiatry, although they also included biological perspectives and concepts
from Kraepelin's system of classification.
Symptoms were not specified in detail for specific disorders. Many disorders
were seen as reflections of broad underlying conflicts or maladaptive
reactions to life problems. This view was rooted in a distinction between
neurosis and psychosis. Roughly, neurosis meant anxiety or depression broadly
in touch with reality, while psychosis meant hallucinations or delusions that
appear disconnected from reality.
Sociological and biological knowledge were also incorporated, in a model
that did not emphasise a clear boundary between normality and
abnormality.
(c) DSM-III (1980)

In 1974, the decision to create a new revision of the DSM was made and
Robert Spitzer was selected as chairman of the task force. The initial
impetus was to make the DSM nomenclature consistent with the
International Statistical Classification of Diseases and Related Health
Problems (ICD), published by the World Health Organization.

The revision took on a far wider mandate under the influence and control
of Spitzer and his chosen committee members. One goal was to improve the
uniformity and validity of psychiatric diagnosis in the wake of a number of
critiques, including the famous Rosenhan experiment. There was also a
need to standardise diagnostic practices within the US and with other
countries after research showed that psychiatric diagnoses differed
markedly between Europe and the US. The establishment of these criteria
was also an attempt to facilitate the pharmaceutical regulatory process.
The criteria adopted for many of the mental disorders were taken from the
Research Diagnostic Criteria (RDC) and Feighner Criteria, which had just
been developed by a group of research-orientated psychiatrists based
primarily at Washington University in St. Louis and the New York State
Psychiatric Institute.
Other criteria and potential new categories of disorder were agreed by
consensus during meetings of the committee, chaired by Spitzer. A key aim
was to base categories on colloquial English descriptive language, which
Federal administrative offices would find easier to use, rather than on
assumptions about etiology. Even so, the categorical approach assumed that
each pattern of symptoms in a category reflected a particular underlying
pathology. This approach has been described as "neo-Kraepelinian".
The psychodynamic or physiologic view was abandoned in favour of a
regulatory or legislative model. A new "multiaxial" system was introduced.
It aimed to give a picture better suited to a statistical population census,
rather than just a simple diagnosis.
Spitzer argued that "mental disorders are a subset of medical disorders".
However, the task force decided on this DSM statement: "Each of the mental
disorders is conceptualised as a clinically significant behavioural or
psychological syndrome."
The first draft of the DSM-III was prepared within a year. Many new
categories of disorders were introduced; a number of the unpublished
documents that aim to justify them have recently come to light.
The US National Institute of Mental Health (NIMH) sponsored field trials
between 1977 and 1979 to test the reliability of the new diagnoses.
A controversy emerged over the deletion of the concept of neurosis. Neurosis
was a mainstay of psychoanalytic theory and therapy, but the DSM task force
saw it as vague and unscientific.
There was enormous political opposition. The DSM-III was in serious danger of
not being approved by the APA Board of Trustees unless "neurosis" was included
in some capacity. A political compromise reinserted the term in parentheses
after the word "disorder" in some cases.
Additionally, the diagnosis of ego-dystonic homosexuality replaced the
DSM-II category of "sexual orientation disturbance".
Finally published in 1980, the DSM-III was 494 pages long and listed 265
diagnostic categories. It rapidly came into widespread international use by
multiple stakeholders and has been termed a revolution or transformation
in psychiatry.
However, Robert Spitzer later criticised his own work on the DSM-III in an
interview with Adam Curtis. He said it had led to the medicalisation of 20 to 30
percent of the population, who may not have had any serious mental problems.
(d) DSM-III-R (1987)

In 1987, the DSM-III-R was published as a revision of DSM-III, under the
direction of Spitzer. Categories were renamed, reorganised and significant
changes in criteria were made.
Six categories were deleted and others were added. Controversial diagnoses,
such as premenstrual dysphoric disorder and masochistic personality disorder,
were considered and then discarded. "Sexual orientation disturbance" was also
removed. However, it was largely absorbed into "sexual disorder not otherwise
specified", which can include "persistent and marked distress about one's
sexual orientation." In total, DSM-III-R contained 292 diagnoses and was 567
pages long.
(e) DSM-IV (1994)

In 1994, DSM-IV was published, listing 297 disorders in 886 pages. The task
force was chaired by Allen Frances. A steering committee of 27 people was
introduced, including four psychologists. The steering committee created
13 work groups of five to 16 members. Each work group had approximately
20 advisers.
The work groups followed a three-step process. First, each group carried out
an extensive literature review of its diagnoses. Second, the groups asked
researchers for data and ran analyses to decide which criteria needed to
change. They were instructed to be conservative. Finally, they ran multi-centre
field trials that related the diagnoses to clinical practice.
A major change from earlier versions was a new clinical significance criterion,
added to almost half of all the categories. Under this criterion, symptoms had
to cause "clinically significant distress or impairment in social,
occupational or other important areas of functioning".
(f) DSM-IV-TR (2000)

A "Text Revision" of the DSM-IV, known as the DSM-IV-TR, was published
in 2000. The diagnostic categories and the vast majority of the specific
criteria for diagnosis were unchanged. The text sections which provided
additional information on each diagnosis were updated, as were some of
the diagnostic codes in order to maintain consistency with the ICD.
(g) DSM-5 (2013)

The latest edition of the DSM is DSM-5, published in May 2013. It made several
notable changes:
- Asperger syndrome is no longer a distinct classification.
- The subtypes for variant forms of schizophrenia have been removed.
- The "bereavement exclusion" for depressive disorders has been dropped.
- Gender identity disorder has been revised and renamed gender dysphoria.
- The A2 criterion for post-traumatic stress disorder (PTSD) has been removed.
Its requirement for specific emotional reactions to trauma did not apply to
combat veterans and first responders with PTSD.
For a brief general reference on DSM-5, see:
http://en.wikipedia.org/wiki/DSM-5.
SELF-CHECK 9.6
1. Why is it important to understand psychopathology in human
behaviour?
2. When did DSM first originate?
3. Write short notes on DSM.

SUMMARY
• In counselling psychology, tests and measurement mainly focus on interest
tests, self-concept, emotion and related psychological issues. They help
clients become more aware of themselves, make life decisions and achieve
self-growth.
• Tests and measurement of stress and anxiety, coping styles, quality of life
and pain are among those commonly used in health psychology and healthcare
settings.
• Neuropsychology is the study of the brain and how it relates to behaviour.
Neuropsychologists use neuropsychological tests to detect brain dysfunctions
that may affect behaviour.
• Clinical psychology is concerned with understanding and treating
psychological distress. It focuses on research into, and psychotherapy for,
the more severe forms of behavioural and mental pathology. Counselling
psychology, by contrast, focuses on "everyday" concerns and problems, such as
those related to marriage, family, academics and career.
• Psychopathology is the study of mental illness. Many professions may be
involved in studying mental illness or distress, particularly psychiatrists and
clinical psychologists. They may treat mental illness clinically, research the
origin, development and manifestations of such states, or often do both.
• The MMPI-2 is the personality test most commonly used by mental health
professionals to understand personality structure and to assess and diagnose
mental illness.
• The MMPI-2 is also used in fields outside clinical psychology. The test is
often used in legal cases, including criminal defence and custody disputes.
• The Millon Clinical Multiaxial Inventory-III (MCMI-III) is a psychological
assessment tool intended to provide information on psychopathology,
including specific disorders outlined in the DSM-IV.
• The MCMI was developed and standardised specifically on clinical
populations. However, a strong evidence base shows that it still retains
validity among non-clinical populations.
• The Diagnostic and Statistical Manual of Mental Disorders (DSM), published
by the American Psychiatric Association, is a widely used manual for assessing
and diagnosing mental disorders.
• Many mental health professionals use the DSM to determine and communicate
a patient's diagnosis after an evaluation. Hospitals, clinics and insurance
companies in the US also generally require a "five-axis" DSM diagnosis for
every patient they treat.
• In May 2013, the latest version, DSM-5, was published. It made several
significant changes to the categorisation of mental disorders.
KEY TERMS
APA
Clinical scales
Coping
Executive functioning
Mental disorder
Motor output
Multiaxial system
Neurocognitive
Neuropsychologist
Pain management
Pathology
Psychiatric diagnosis
Psychopathology
Quality of life
Self-concept
Spatial skills
Stress
Validity scales

REFERENCES
Cohen, R. J., & Swerdlik, M. E. (2010). Psychological testing and assessment: An introduction to tests and measurement (7th ed.). New York: McGraw-Hill Higher Education.
Groth-Marnat, G. (2009). Handbook of psychological assessment. Hoboken, NJ: John Wiley.
Lazarus, R. S. (1995). Psychosocial factors play a role in health, but we have to tackle them with more sophisticated research and thought. Advances, 11(2), 14–18.
Lazarus, R. S., & Folkman, S. (1984). Stress, appraisal, and coping. New York: Springer-Verlag.
Millon, T. (1997). MCMI-III: Millon Clinical Multiaxial Inventory-III manual (2nd ed.). Minneapolis, MN: National Computer Systems, Inc.
Millon.net. (2014). The Millon Clinical Multiaxial Inventory-III. Retrieved from http://www.millon.net/instruments/MCMI_III.htm
Passer, M. W., & Smith, R. E. (2008). Psychology: The science of mind and behavior (4th ed.). New York: McGraw-Hill Higher Education.
Sevencounties.org. (2014). Psychological testing: Minnesota Multiphasic Personality Inventory. Retrieved from http://sevencounties.org/poc/view_doc.php?type=doc&id=8214&cn=18
Wikibooks.org. (2010). Psychological testing/Testing in health psychology. Retrieved from http://en.wikibooks.org/wiki/Psychological_Testing/Testing_in_Health_Psychology
Wikipedia. (2014). Diagnostic and Statistical Manual of Mental Disorders. Retrieved from http://en.wikipedia.org/wiki/Diagnostic_and_Statistical_Manual_of_Mental_Disorders
Wikipedia. (2014). DSM-5. Retrieved from http://en.wikipedia.org/wiki/DSM-5
Wikipedia. (2014). Minnesota Multiphasic Personality Inventory. Retrieved from http://en.wikipedia.org/wiki/Minnesota_Multiphasic_Personality_Inventory

Topic 10 Issues and Challenges of Testing
LEARNING OUTCOMES
1. Examine the societal consequences of tests;
2. Explain the issues related to faking a test;
3. Differentiate between test bias and cultural bias;
4. Assess the cultural, legal and ethical considerations related to tests; and
5. Identify issues related to the future of testing.
INTRODUCTION
In the previous topic, you learnt how testing is applied in clinical, health
and counselling settings. The standardised tests used in these fields were
also discussed.
This final topic of the module gives an overview of psychological testing and
measurement. It also covers the issues and challenges related to testing,
including faking, test bias, cultural bias, and legal and ethical issues. Future
trends in testing are explained as well.

10.1 OVERVIEW OF PSYCHOLOGICAL TESTING APPLICATION
This subtopic gives an overview of the uses of psychological tests. It then
discusses in detail where information on psychological tests can be found.
10.1.1 Uses of Psychological Tests

As you have learnt in the previous topics in this module, psychological tests are
used in various fields and settings such as clinics, hospitals, organisations,
industries, businesses, schools and universities. They are also used in private
services, government services and in research and counselling.
In discussing the main objectives and uses of psychological tests, Aiken (2000)
states that tests are used today for the same purposes as in previous years and
centuries. They are used to assess behaviour, mental abilities and individual
characteristics, to help with decision-making, prediction and guidance.
Specifically, he lists six uses of psychological tests, including research in
general and the evaluation of programmes. All six uses are listed in
Figure 10.1.
10.1.2 Information on Psychological Tests

The use of psychological tests is widespread. They are used in many settings
and situations:
- with individuals and with groups;
- from simple tests, such as a self-checklist, to complex ones, such as
personality and neuropsychological tests; and
- for the diagnosis of mental disorders.
Tests continue to be constructed, developed, adapted and published.
Therefore, test users must know the nature of each test, its type and usage,
the research done on it, the literature discussing it and how it is applied.

Figure 10.1: Uses of psychological tests

Source: Aiken (2000)
Most importantly, test users must be able to evaluate the tests they intend to
use critically, scientifically and systematically. For example, they must be
able to judge whether a test is of high quality.
Test manuals and books on psychological testing are both important sources of
information about tests, their usage and their psychometric characteristics.
The book most often consulted by test users is the Mental Measurements
Yearbook series, edited by O.K. Buros. It covers thousands of standardised tests
that have been reviewed by experts. Buros also produced and edited four
other books:
(a) Tests in Print (1961);
(b) Intelligence Tests and Reviews (1975);
(c) Personality Tests and Reviews (1970); and
(d) Vocational Tests and Reviews (1975).
Papers and journals discussing psychological tests continue to be published.
Some discuss theories of testing and psychometric issues, while others cover
the application of tests in research. These journals include:
(a) Psychometrika;
(b) Educational and Psychological Measurement;
(c) Applied Psychological Measurement; and
(d) Journal of Educational Measurement.
10.2 TESTING AND SOCIETY

An in-depth knowledge of measurement principles and of the nature of tests
is important, whatever specialty area individuals eventually pursue. If
we are to study behaviour, we have to be able to measure it. At the same time,
psychological testing takes place in a context of social and political issues.
Both testing professionals and society in general must address these issues.
Predicting the future of psychological testing is especially difficult.
Society seems to be in a dilemma about the testing field itself, and about a
number of political issues that have important implications for testing.
On the one hand, the publication of new tests and the revision of existing tests
appear to be accelerating. A survey of psychological literature reveals that
psychologists are as enthusiastic about psychological tests as they have ever been
and nothing indicates that this interest will decline (Janda, 1998). Society, on the
other hand, appears to be increasingly sceptical about the widespread use of
tests.
Matarazzo (1992) observed that predictions are more often wrong than right
because no one can foresee the theoretical or technological innovations or the
changes in the social and political climate that influence the development of any
discipline.

Morganthau (1990) observed that many, if not most, Americans believe that tests
are "biased, mechanistic, dehumanising and inimical to learning" (p. 63) and that
they are used to control people for the benefit of those who use them. The
distrust of tests may result from the fact that many people experience them as
barriers that prevent them from attaining their educational, vocational or
professional goals.
10.3 SOCIETAL CONSEQUENCES OF TESTS

This discussion focuses on the consequences of using specific psychological
tests and on the interpretations people make of them. What are the
consequences, results or implications of using a test?
For example, suppose a test is used to identify students for remedial classes
in mathematics. We must then assess whether the test adequately covers the
contents of the syllabus. This is the issue of content validity. A different
question is whether using the test actually brings educational benefits to the
students identified. This is the issue of consequential validity.
Two considerations which have to be emphasised are:
(a) Claims regarding consequences made by test developers; and
(b) Consequences that may occur regardless of the claims of test developers.
For instance, a test developer may claim that a test of depression leads to
more effective therapy. In this case, evidence that therapy actually improves
should be collected to support the claim.
Consequential validity is still a relatively new concept. Some consider it
essential. Others feel that consequences, and the gathering of evidence about
them, are a matter of politics and policy-making.
Test bias means that a test functions differently for different groups. Test
bias can be studied using criterion-related validity methods. These ask whether
a test functions in the same way for different groups, even when the groups
differ in average performance because of real differences in the underlying
trait.

Jensen (1980) stated that "Most current standardised tests of mental ability yield
unbiased measures for all native-born English-speaking segments of American
society today, regardless of their sex or their racial and social-class background.
The observed mean differences in test scores between various groups are
generally not an artefact of the tests themselves, but are attributable to factors
that are causally independent of the tests."
Reynolds (1994) noted that "Only since the mid-1970s has considerable
research been published regarding race bias in testing". For the most part,
this research has failed to support the test bias hypothesis. Instead, it has
revealed that:
(a) Well-constructed, well-standardised educational and psychological tests
predict future performance in an essentially equivalent manner across race
for American-born ethnic minorities;
(b) The internal psychometric structure of tests is essentially not biased in
favour of one race over another; and
(c) The content of the items in tests is about equally appropriate for all these
groups.
Test bias is examined further in a later section of this topic.
ACTIVITY 10.1
Do some additional reading, then discuss the following with your coursemates
and tutors:
1. Do you like taking psychology tests? Share your reasons.
2. Debate critically whether psychology tests and measurement bring more
benefit or harm to our society.

10.4 THE ISSUES OF FAKING TESTS

Faking remains a continuing issue. However, designers of selection tests have
a number of tools and techniques they can use to counter faking, or at least
to detect it.
10.4.1 Some Techniques in Reducing Test Faking
This subsection introduces several techniques for reducing faking when test
takers undergo psychological tests and measurement. Figure 10.2 shows the
eight techniques used in reducing test faking.
Figure 10.2: Techniques in reducing test faking

The techniques used in reducing test faking are further explained as follows
(Changingminds.org, 2014):
(a) Initial Instructions
Before a test begins, candidates may be given instructions that warn them of
the consequences of detected faking and ask them to answer honestly. The
instructions may also ask candidates to answer quickly, giving the first answer
that comes to mind instead of pondering. Holden et al. (2001, p. 160) indicate
that lying takes time. Ekman (1985) supports this in his general study of lying.
(b) Trick Questions
A test can also include "trick" questions, where a fake response is easy to
identify. Such a response raises suspicion about all the other responses.
For example, when assessing a given set of skills, a multiple-choice question
may have no right answer. If the candidate still selects an answer to that
question, their earlier assertions may then be reviewed in detail.
(c) Multiple Sources
Self-report instruments may give false readings when candidates lack the
self-insight to answer questions fully. Collecting information from multiple
sources can reduce this problem. One example is the "assessment centre",
where multiple methods and assessors provide a range of data and viewpoints
that can be cross-checked.
(d) Safe Answer
Some test takers show a "central response tendency" and opt for "safe"
central options. They can be identified by asking different questions to which
a consistent respondent would give both high and low responses.
Individuals with a high need for approval tend towards positive "agree" and
"yes" responses. Reversing some questions can counter and detect this
tendency. Reversal also breaks up habitual patterns of similar responses.
Approval-seeking may also be detected by including a "social desirability"
scale in the questionnaire, which separates it from habitual response
patterns.

(e) Multiple Questions
Assessing the same attribute with several questions can also show whether
the candidate is averaging across questions ("I've been a bit negative; I think
I shall be positive for a while now."). Care is needed, however, to make sure
that similar questions are interpreted in the intended way. Analysing the
sequence of positive and negative responses across items may also reveal
uncertainty or deliberate averaging.
(f) Ipsative Questions
Normative items ask the candidate to rate their level of agreement with
statements. They give a good measure of psychological characteristics (Kline,
1993). However, concern about faking has led to an ipsative approach in many
contexts. Here, the test taker is forced to choose from a fixed number of
options. Ipsative questions offer either a choice between items from very
different areas (for example, "Which do you prefer, a poem or a gun?") or a
polar choice from the same scale, which may have a yes/no response.
As Johnson et al. (1988) point out, ipsative forced-choice approaches are
highly problematic. Forcing people to choose deprives them of their free will.
It also causes very real problems when respondents face a set of items among
which they have no clear preference. They then either second-guess the test or
choose at random.
Martin et al. (1995) show that test takers with a good understanding of job
needs can provide realistic faked responses. Ipsative methods nevertheless
persist, particularly where no sound alternatives exist. For example, the
Zuckerman, Eysenck and Eysenck (1978) sensation-seeking scale is still used,
even though Ridgeway and Russell (1980) reported unacceptably low
reliabilities for its sub-scales.
(g) Question Opacity
Item opacity, where the respondent does not know the right or wrong answers,
can also reduce faking. Biodata approaches are one example. They correlate
traits and past activities with the requirements of the job in question, and
can produce very opaque questions. During World War II, for instance, a
correlation was found between flying model aeroplanes in childhood and being
a good pilot.

(h) Including the Candidate
Including candidates in the assessment process can also reduce faking by
socialising them into giving honest responses. This can be done, for example,
in assessment centres, where candidates may take part in discussions about
their psychometric outcomes.
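Several of the techniques above come down to simple scoring rules: reverse-keyed items, questions where a consistent respondent gives both high and low answers, and several questions on the same attribute. The Python sketch below illustrates these rules under stated assumptions. The 5-point Likert scale, the item names (q1 to q4) and the pairing of items that measure the same attribute are all hypothetical. This is not a published scoring key, and it sets no cut-off for deciding that someone is faking.

```python
from statistics import mean

# Assumption: a hypothetical 5-point Likert scale (1 = strongly disagree, 5 = strongly agree).
SCALE_MIN, SCALE_MAX = 1, 5
MIDPOINT = 3


def reverse_score(response: int) -> int:
    """Flip a reverse-worded item so that a high score always means more of the trait."""
    return SCALE_MIN + SCALE_MAX - response


def score_items(responses: dict[str, int], reversed_items: set[str]) -> dict[str, int]:
    """Return trait-keyed scores, reversing only the items listed in reversed_items."""
    return {
        item: reverse_score(r) if item in reversed_items else r
        for item, r in responses.items()
    }


def acquiescence_index(responses: dict[str, int]) -> float:
    """Share of raw answers on the 'agree' side, computed BEFORE reverse-keying.
    On a balanced scale, a value near 1.0 suggests yes-saying rather than a true trait level."""
    return sum(r > MIDPOINT for r in responses.values()) / len(responses)


def midpoint_index(responses: dict[str, int]) -> float:
    """Share of answers on the 'safe' central option (central response tendency)."""
    return sum(r == MIDPOINT for r in responses.values()) / len(responses)


def pair_inconsistency(scored: dict[str, int], pairs: list[tuple[str, str]]) -> float:
    """Mean absolute gap between trait-keyed scores of items meant to measure the same attribute."""
    return mean(abs(scored[a] - scored[b]) for a, b in pairs)


# Hypothetical questionnaire: q2 and q4 are reverse-worded versions of q1 and q3.
REVERSED = {"q2", "q4"}
PAIRS = [("q1", "q2"), ("q3", "q4")]

yes_sayer = {"q1": 5, "q2": 5, "q3": 5, "q4": 5}   # agrees with everything
consistent = {"q1": 4, "q2": 2, "q3": 4, "q4": 2}  # answers in line with one trait level

for label, raw in [("yes_sayer", yes_sayer), ("consistent", consistent)]:
    scored = score_items(raw, REVERSED)
    print(label, acquiescence_index(raw), midpoint_index(raw),
          pair_inconsistency(scored, PAIRS))
```

Consider a respondent who answers "strongly agree" to every item. Their acquiescence index is 1.0, and once the reversed items are flipped they show a large gap within each pair. A respondent answering in line with a single trait level shows no gap at all. Any threshold for treating these indices as evidence of faking would need its own justification, for example against norms for the test.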
SELF-CHECK 10.1
1. How can faking be reduced in psychological tests?
2. Why do candidates need to be included in the assessment process of tests?
10.4.2 This Personality Test Cannot Be Faked

Psychological testing is often used to predict success in academic and creative
domains. Increasingly, psychological tests are also used in the corporate world
to determine whether an individual has the skill set a particular job requires.
However, test results can be distorted if an individual gives biased responses.
Researchers at the University of Toronto believe they have solved this problem.
They developed a personality inventory that can predict future performance even
when respondents are trying hard to fake their answers.
People very commonly try to make themselves look better than they really are on
these questionnaires, especially if they know they are being evaluated.
This sort of faking can distort the predictive validity of these tests, with
significant negative economic consequences. The researchers therefore aimed to
develop a measure that can predict real-world performance even when responses
are not completely honest.

The findings show that traditional personality inventories fail to predict
performance when respondents have strong incentives to fake their scores. The
new measure, by contrast, still predicted success even when respondents were
consciously trying to make themselves look good.
Personality remains an important predictor of performance. Trait
conscientiousness has consistently emerged as a major predictor of academic
success and workplace performance, while trait openness is a good predictor of
creative achievement.
Using formulas derived by Frank Schmidt (Iowa University) and John Hunter
(Michigan State University), the authors were able to estimate the potential
productivity gain associated with using the new measure in a workplace setting.
Since people differ widely in their individual abilities, even a small degree of
accuracy in testing can produce significant economic gains.
In the present study, the improvement in accuracy went beyond that small degree.
In fact, Schmidt and Hunter's formulas indicated that using the bias-resistant
test instead of currently available personality assessment methods could result
in a productivity gain of 23 percent per hired employee when response faking is
an issue (about $17,000 per year for every $75,000 of salary).
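The figures in brackets can be checked with simple arithmetic. The short Python sketch below converts the reported dollar gain into a percentage of salary. It does not reproduce Schmidt and Hunter's utility formulas, and the `scaled_gain` helper is a hypothetical illustration that assumes the gain scales in direct proportion to salary.

```python
# Figures reported in the article: about $17,000 of extra productivity
# per year for every $75,000 of salary.
REPORTED_GAIN = 17_000
REPORTED_SALARY = 75_000


def gain_as_percent(gain: float, salary: float) -> float:
    """Express a yearly productivity gain as a percentage of salary."""
    return 100.0 * gain / salary


def scaled_gain(salary: float, percent: float) -> float:
    """Hypothetical helper: assumes the gain is a fixed percentage of salary."""
    return salary * percent / 100.0


if __name__ == "__main__":
    pct = gain_as_percent(REPORTED_GAIN, REPORTED_SALARY)
    print(f"{pct:.1f}% of salary")  # 22.7% of salary, i.e. roughly 23 percent
    # Illustration only: the same percentage applied to a $50,000 salary.
    print(f"${scaled_gain(50_000, pct):,.0f} per year")
```

The 22.7 percent result agrees with the article's rounded figure of 23 percent.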
Potential gains of this magnitude should not be ignored. It is very important that
the right people be chosen for any competitive position. This questionnaire is a
step in the right direction.
Details of the discussion on these related issues can be found in the article
"This personality test cannot be faked" by Nauert (2008).
ACTIVITY 10.2
Discuss in tutorial class and in the myVLE forum what you have learnt
about test faking, especially in personality tests and measurement, after
reading the article "This personality test cannot be faked" as mentioned
in section 10.4.2.

10.5 TEST BIAS

Another common issue in the administration of psychological tests and
measurement is test bias. Test bias affects the accuracy with which test results
are interpreted. Therefore, it is essential for us to understand the concepts
related to test bias. First, let us define what test bias is.
10.5.1 Definition of Test Bias

A biased test is one in which there are systematic differences in the meaning of
test scores associated with group membership.
Another way of saying this is that a biased test is one in which people from two
groups who have the same observed score do not have the same standing on the
trait of interest.
A third way of saying this is that using a test to predict some criterion of interest
results in a systematic over- or under-prediction based on group membership.
Example: racist performance appraisal opened a Pandora's box in the US and
Germany.
Test bias is said to occur when a test yields higher or lower average scores for
specific criterion groups, such as people of a particular race or gender, than for
an average population sample. Negative bias is said to occur when the criterion
group scores lower than average, while positive bias is said to occur when the
group scores higher. The crux of the issue, then, is whether this occurs because
there is a real difference in the attribute being measured or because of cultural
test bias.

10.5.2 Models of Test Bias

Table 10.1 explains two models that help us understand the concept of test
bias.
Table 10.1: Models of Test Bias and Their Descriptions
Mean difference:
• The most intuitive definition of bias is the observation of a mean
difference between groups. For example, if we saw that females scored
higher than males on the SAT (Scholastic Aptitude Test) verbal test, we
might suspect that the test was biased. However, a mean difference by
itself is a poor model of bias, because a mean difference could
demonstrate bias, but it could also reflect a real difference between
groups.
• If you measure the height of a representative sample of adult males
and females in the US with a tape measure, you will find that males
are taller on average. Does this mean that the tape measure is
biased?
• People differ in many ways, so finding a mean difference between
groups does not necessarily mean that the test is biased. Conversely,
finding no mean difference does not necessarily mean a lack of bias.
• If you developed a new tape measure that showed no mean
difference between males and females in height, the new measure
would be biased, because there really is a difference. In essence, your
new measure would be adding inches to the height of females, and
this is what we would define as bias.
Equal regressions:
• The most widely accepted (but not the only) model of test bias is the
regression model, also known as the Cleary model. This model places
bias in the context of the interpretation of test scores (that is,
validity).
• The model says that if different groups share the same regression
line, the test is not biased (even if there are differences in means
across groups). If the groups have different regression lines, the
test is biased, because it is measuring different things for different
groups.
• The model says that people with the same test score should do
equally well on some external criterion. For example, if the test is not
biased, then Black and White students with the same SAT score will
get the same freshman grade point average (GPA). On the other hand,
if the SAT is biased against Black students, then Black students with
the same SAT scores as White students will have higher freshman
GPAs.
Source: http://luna.cas.usf.edu/~mbrannic/files/tnm/tstbias.htm
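To make the two models in Table 10.1 concrete, the Python sketch below uses small, entirely hypothetical data sets (test scores and freshman GPAs invented for illustration). Group B scores lower on average than Group A but lies on the same regression line, so under the Cleary model the test is not biased despite the mean difference. Group C has the same slope but a higher intercept, so the same score predicts a higher GPA for that group. The `same_regression_line` check and its tolerances are assumptions made for this sketch; a real analysis would test slope and intercept differences statistically (for example, with moderated multiple regression) rather than compare them against fixed cut-offs.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y on x; returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def same_regression_line(fit_a, fit_b, slope_tol=1e-4, intercept_tol=0.05):
    """Descriptive check: do two groups share (roughly) one regression line?

    The tolerances are arbitrary values chosen for this illustration only.
    """
    return (abs(fit_a[0] - fit_b[0]) <= slope_tol
            and abs(fit_a[1] - fit_b[1]) <= intercept_tol)


# Hypothetical data: test scores and freshman GPAs for three groups.
scores_a = [400, 500, 600, 700]
gpa_a = [2.2, 2.5, 2.8, 3.1]   # GPA = 1.0 + 0.003 * score
scores_b = [300, 400, 500, 600]
gpa_b = [1.9, 2.2, 2.5, 2.8]   # same line as group A, lower mean score
scores_c = [300, 400, 500, 600]
gpa_c = [2.2, 2.5, 2.8, 3.1]   # same slope, intercept 0.3 higher

fit_a = fit_line(scores_a, gpa_a)
fit_b = fit_line(scores_b, gpa_b)
fit_c = fit_line(scores_c, gpa_c)

if __name__ == "__main__":
    print(mean(scores_a) - mean(scores_b))     # 100: a mean difference exists
    print(same_regression_line(fit_a, fit_b))  # True: not biased (Cleary model)
    print(same_regression_line(fit_a, fit_c))  # False: same score, different GPA
```

In this sketch, Group A's line under-predicts Group C's GPA by 0.3 grade points at every score, which is the pattern the regression model would treat as bias against Group C.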
10.5.3 Test Bias in Industrial and Organisational Psychology
According to Aguinis, Culpepper and Pierce (2010), test bias is one of the issues
in industrial and organisational (I/O) psychology on which most researchers
agree, because the findings appear consistent.
The consensus in I/O psychology and related fields (e.g., education, human
resource management) concerned with high-stakes testing is that, in the instances
when it exists, test bias takes the form of intercept differences between groups,
namely over-prediction of performance for minority group members (i.e., a
smaller intercept for the ethnic minority group than for the majority group),
whereas no differences are found in slopes across groups (e.g., Cole, 1981;
Houston & Novick, 1987; Humphreys, 1986; Hunter, Schmidt, & Rauschenberger,
1984; Kuncel & Sackett, 2007; Linn, 1978; Rotundo & Sackett, 1999; Rushton &
Jensen, 2005; Sackett, Schmitt, Ellingson, & Kablin, 2001; Sackett & Wilk, 1994;
Schmidt & Hunter, 1981, 1998).
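The pattern described above (equal slopes but a smaller intercept for the minority group) can be illustrated with a short Python sketch. The data are hypothetical, chosen so that both groups have the same slope but the minority group's intercept is 0.3 lower. Fitting one common regression line to the pooled data and averaging each group's residuals (actual minus predicted performance) shows the direction of the prediction error: a negative mean residual indicates over-prediction. This is a descriptive illustration only, not the analysis used in the studies cited.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y on x; returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def mean_residual(x, y, line):
    """Average (actual - predicted) criterion score for one group.

    Negative: the line over-predicts the group's performance.
    Positive: the line under-predicts it.
    """
    slope, intercept = line
    return mean([b - (intercept + slope * a) for a, b in zip(x, y)])


# Hypothetical data: equal slopes (0.003) in both groups, but the
# minority group's intercept is 0.3 lower than the majority group's.
majority_scores = [400, 500, 600, 700]
majority_perf = [2.5, 2.8, 3.1, 3.4]   # 1.3 + 0.003 * score
minority_scores = [300, 400, 500, 600]
minority_perf = [1.9, 2.2, 2.5, 2.8]   # 1.0 + 0.003 * score

# One common regression line fitted to the pooled data.
common_line = fit_line(majority_scores + minority_scores,
                       majority_perf + minority_perf)

if __name__ == "__main__":
    maj_res = mean_residual(majority_scores, majority_perf, common_line)
    min_res = mean_residual(minority_scores, minority_perf, common_line)
    print(f"majority: {maj_res:+.3f}")  # majority: +0.125 (under-predicted)
    print(f"minority: {min_res:+.3f}")  # minority: -0.125 (over-predicted)
```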
This conclusion has been reached regarding selection tools used in both work
and other organisational settings to assess a heterogeneous set of constructs
ranging from general mental abilities (GMA; e.g., Hartigan & Wigdor, 1989) to
personality (e.g., Cortina, Doherty, Schmitt, Kaufman, & Smith, 1992; Saad &
Sackett, 2002) and safety suitability (Te Nijenhuis & van der Flier, 2004).
Details of the discussion on these related issues can be found in the article
"Revival of test bias research in pre-employment testing" by Aguinis, Culpepper
and Pierce (2010).
10.5.4 Test Fairness

Fairness concerns how a test is used. Fairness and bias are not the same
thing: a judgement of fairness rests on values, and reasonable people may
disagree about the fairness of a test even when both agree about the facts of the
matter.

Suppose we use a test to decide who will be admitted to college. An individualist
may say that the test should be administered to all applicants and those with
the highest scores should be admitted, regardless of race, gender or other
group membership, even if this means that some groups will be admitted in
greater numbers than others. Others may contend that admissions should be in
proportion to the number of applicants from each group, so the test should be
used to select the highest scorers within each group to ensure that admissions
are made in proper proportion.
A biased test may be used fairly. Suppose that a test is biased such that males
score 10 points higher on average than females with the same standing on the
trait. If we simply add 10 points to the observed scores of the females and use
the adjusted scores for making decisions, the biased test can be used fairly
(Aguinis, Culpepper and Pierce, 2010).
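The correction described above can be sketched in a few lines of Python. The sketch assumes, as the example does, that the size of the bias is known and is the same (10 points) at every level of the trait; the applicant scores and the `adjust_scores` function are hypothetical.

```python
def adjust_scores(scores, groups, offsets):
    """Add a known, constant correction to each score based on group.

    `offsets` maps a group label to the number of points to add; groups
    not listed in `offsets` receive no adjustment.
    """
    return [score + offsets.get(group, 0) for score, group in zip(scores, groups)]


# Hypothetical applicants: the first two have equal standing on the trait,
# but the biased test gives the male applicant 10 more points.
scores = [70, 60, 55]
groups = ["male", "female", "female"]
adjusted = adjust_scores(scores, groups, {"female": 10})

if __name__ == "__main__":
    print(adjusted)  # [70, 70, 65]
```

Decisions would then be made on the adjusted scores, so applicants with equal standing on the trait are treated alike.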
SELF-CHECK 10.2
1. Differentiate between test bias and test fairness.
2. Explain the models to understand test bias.
10.6 CULTURAL INFLUENCE IN TESTING

The growth and development of the field of psychological testing shows the
significant role of tests in society. Many tests have emerged from the practical
needs of society, whether to test children's mental ability or to screen large
groups of individuals for army service, as evidenced by the development of the
Army Alpha and Army Beta examinations.
These historical developments have gained popularity especially in Western
countries. Whether tests developed in Western countries are suitable for use in
other countries with different cultural backgrounds has been discussed at
length.

Cultural influence in testing will be discussed in terms of the four aspects shown in Figure 10.3.
Figure 10.3: Aspects of cultural influence in testing
10.6.1 Cultural Backgrounds

Test users need to consider that the cultural background of test takers will
influence the entire process of assessment.
Sattler (1988) states that cultural groups may differ in their values, language,
views of life and death, roles of family members, problem-solving strategies,
attitudes toward education, mental health, mental illness and stage of
acculturation.
The implication is that culturally based differences in response style may
influence how some test takers answer a test. Thus, test results should be
interpreted with caution, and this is especially important for test takers from
culturally diverse backgrounds.
In other words, the test scores obtained may be due to complex social-
psychological factors potentially influenced by national history, predicaments of
race and many other factors.

The assessment of individuals from different cultures raises important questions,
especially when test results are interpreted using norms developed from a
Western population. As a result, placement decisions and categorisation of
individuals based on these norms may not be accurate. Psychologists should
therefore display increased sensitivity to cultural variables in the practice of
testing.
10.6.2 Language
The field of psychology has also recognised that specialised practices may be
needed to achieve equitable testing with linguistic minorities. We could suggest
that a native language interpreter be used to facilitate the testing of examinees
whose first language is not English. However, testing specialists advise against
this practice because interpreters may substitute words, speak in a different
dialect or engage in subtle prompting that influences the examinees' responses
(Rogers, 1998). A well-trained psychologist would be preferable, but even this
practice is considered problematic by some (Figueroa, 1990).
The preferred option is to use tests translated into the examinees' native language
and normed on the relevant subpopulations. Tests translated from English into
the intended language must undergo back translation, as suggested by Brislin,
Lonner and Thorndike (1973), in order to achieve comparable meaning.
10.6.3 Behaviour
In addition to possible language barriers, test takers from different cultures may
lack familiarity with test taking, which further adds to their disadvantage.
Padilla and Medina (1996) made the following observation: "It is quite probable
that minority students are less familiar with standardised achievement testing
and thus less test wise than majority students, most of whom have been exposed
to standardised testing over an extended time."

10.6.4 Culture-Free and Culture-Fair Tests

Based on these considerations, psychometricians have suggested that culture-free
tests be developed. A related idea is the culture-fair test, which poses problems
that are equally familiar to all cultures. According to Tan and Tan (1998),
culture-fair tests are tests that reduce cultural factors as much as possible.
Examples of these tests are the Culture-Free Self-Esteem Inventories by Battle
and the Culture-Fair Intelligence Test by Cattell.
10.7 TESTING IN A CROSS-CULTURAL CONTEXT
In this section, extracts from two academic articles are presented to give you a
general idea of practical testing issues in a cross-cultural context.
10.7.1 Developing a Cross-Cultural Conceptual Model for Testing Organisational Commitment in the UAE
According to Anwar, Chaker and Ferhat (2003), the drive of international
business firms to secure the full commitment of their employees to the stated
organisational objectives in various parts of the world has been one of the
cornerstones of managerial action and contemporary research.
However, organisational commitment in the cross-cultural context of an open
Arabian economy such as the United Arab Emirates (UAE) has not been studied
adequately, presumably owing partly to a lack of conceptual clarity and partly
to a lack of empirical information. The main objective of the study is to present
a conceptual model for testing organisational commitment in the cross-cultural
context of the UAE and to explore its implications for managerial decisions.
Details of the discussion on these issues can be found in the article "Developing
a cross-cultural conceptual model for testing organisational commitment in the
UAE: A theoretical perspective" by Anwar, Chaker and Ferhat (2003).

10.7.2 Language Issues in Cross-Cultural Usability Testing
Another extract is from an article titled "Language issues in cross cultural
usability testing: A pilot study in China" by Sun and Shi (2007). Although this
article is not related to psychological tests and measurement, it reveals an
interesting phenomenon concerning language issues in cross-cultural settings.
According to Sun and Shi (2007), with the progress of economic globalisation,
more and more international enterprises have started to perform usability tests in
different cultures during the last decade. In China, only two or three years before
that study, "usability" was quite a new word for most people. By the time of the
study, the situation had changed dramatically. Many domestic enterprises,
especially IT businesses, had recognised the importance of usability tests for
their products.
Many Western researchers were interested in Chinese users' preferences,
behaviour and mental models. Unlike India and Singapore, China is not an
English-speaking country, and most users in China cannot speak English at all.
This creates major communication problems when usability tests are conducted
by international moderators.
There are several methods to avoid this problem:
(a) The first is using bilingual moderators to test users;
(b) The second is finding users who can speak English. However, both
professional moderators and English speakers are very rare in China, and
those who are available tend to be young and probably come from a
Western-educated background. Hence, this method cannot capture real
feedback from all kinds of users in China; and
(c) The third and most common way is to have remote and local moderators
work together with Chinese users to ensure that they really get feedback
from the right users and understand it.
Local moderators here are those who have received training in human factors
or have at least one year of working experience in usability testing in China.
They usually cannot speak English very well. Remote moderators have also
received training in human factors and have at least one year of usability
testing experience in foreign countries. They can usually speak both English
and their local language very well.

Previous studies on cross-cultural usability evaluation show that culture broadly
affects usability evaluation processes. Vatrapu and Pérez-Quiñones (2006)
investigated the evaluator effect and found that participants identified more
usability problems and made more suggestions to an interviewer from the same
(Indian) culture than to a foreign (Anglo-American) interviewer. The results of
the study empirically established that culture significantly affects the
effectiveness of structured interviews during international user testing.
The first thing we need to do is to identify the kinds of cultural factors that can
affect usability tests. Language was chosen as the factor to investigate because
it is a representation of culture, and the language situation in India, European
countries and China is very different. Although English is not a native language
for Indians or Danes, most people in these two countries can speak English very
well. In China, however, few people are proficient in English. Therefore, when
conducting a usability test in China, the first thing you have to do is to translate
the testing interface into Chinese.
It is often said that someone who is speaking English must be thinking in
English. So, if the test user and evaluator choose a specific language during the
usability test, they probably think in that language. This means that the
language spoken can affect the process of the usability test even if all the
participants are Chinese.
SELF-CHECK 10.3
1. How does testing vary in a cross-cultural context?
2. Explain the cultural influences at play when administering and
interpreting psychological tests and measurement.
ACTIVITY 10.3
After reading the two extracts in section 10.7 (or the full articles,
which can be found online), write short notes on how you can relate
the two articles and draw ideas from them to further your understanding
of cross-cultural issues in psychological tests and measurement. Discuss
your findings in class and on the myVLE forum.

10.8 LEGAL AND ETHICAL ISSUES

Psychological testing is becoming increasingly popular in our country. However,
there are several legal issues that need to be taken into consideration when
using psychological tests. In this section, we will learn about the legal issues
related to psychological tests and measurement by referring to experiences from
other countries, especially cases from the US.
10.8.1 Legal Issues of Testing in Educational Settings

Standardised tests are often used as a mechanism of social control. "If a
decision-maker can point to the results of an objective and valid test as the
information on which a control decision was based, those being controlled are
more likely to accept and internalise the decision and its consequences."
Tests as a social control mechanism are "open to criticism in proportion to the
extent to which those being controlled perceive it as irrational, capricious,
arbitrary or unjust" (Nitko, 1983).
Legal challenges to the use of tests for decision-making in schools have focused
on ability tracking, placement in special education classes, test scores as school
admissions criteria, test disclosure and teacher competency.
In general, the application of specific laws to claims of inappropriate test use
is unclear; instead, cases have been decided on the basis of the specific
circumstances of each case. Cases illustrating legal challenges are described in
greater detail as follows (ERIC, 1985):
(a) Ability Tracking

Many cases have been based on charges that tests have caused the
disproportionate placement of minority students in lower ability tracks. The
cases are usually based on the argument that the tests are biased against the
lower scoring group or that they reflect the effects of past segregation in
schools. The plaintiffs argue that use of the test denies them access to
certain programmes or to some certifications.
Court decisions have upheld these arguments to some extent. In Hobson vs.
Hansen (1967), it was ruled that the IQ tests used to track students were
culturally biased because they were based on a white, middle-class sample. It
was also ruled that these tests were inaccurate for lower-class and Black
students and the court abolished the tracking system used in the District of
Columbia. Later appeals allowed other forms of ability grouping, but would
not allow the use of tests that had racially discriminatory consequences.

The use of achievement tests instead of IQ tests may not be any more
appropriate. Moses vs. Washington Parish School Board (1971) involved the
use of both IQ and achievement tests. The IQ test scores were used for
special education placement; the achievement test scores were used for later
tracking. The case was also somewhat unique because it involved a recently
desegregated school. The courts ruled against test use for tracking under
these circumstances.
(b) Special Education Placement

The arguments against the use of tests for special education placement
decisions are the same as those against the use of tests for tracking. In
addition, the plaintiffs frequently argue that using a test to label a person is
illegal because it results in the stigmatisation of that person.
The best-known case focusing on special education placement is Larry P.
vs. Riles (1972). IQ tests were being used to place students in classes for
the educable mentally retarded (EMR). The defense argued that racial
imbalance in the EMR classes was not the result of test scores, since
parental consent for placement was required. The court decided that the
parents would also be influenced by the test scores and was not
sympathetic to the defense's argument that there was no better
alternative.
In later appeals, test validity became an important issue and the court set
standards for validity: the same pattern of scores must appear in different
subgroups, the mean score should be the same for different subgroups and
the results should correlate with relevant criterion measures. Though
experts argued that these standards were not psychometrically sound, the
court found that the racial differences in test scores were due to cultural
biases in the tests.
The Larry P. decision was rejected as a precedent by Judge Grady in
Parents in Action on Special Education (PASE) v. Hannon (1980). In this
case, IQ tests were being used for placement of students in EMR classes in
Chicago schools. The plaintiffs argued that the tests were culturally biased.
Since other criteria were also used for placement and many of the school
psychologists were Black, Grady found for the defendants.
Linguistic bias in IQ tests used to place students in special education classes
has also been the basis of legal challenges. One case of this type (Diana vs.
California State Board of Education, 1970) never actually came to court.
Research indicated that on the IQ tests used for placement in EMR classes,
Mexican-American students gained 15 points when they were allowed to
respond in Spanish. The consent decree allowed non-Anglo children to
choose the language in which they would respond, banned the use of verbal
sections of the test and required state psychologists to develop an IQ test
appropriate for Mexican-American and other non-English-speaking
students. Soon after the decree, the California state legislature passed a law
requiring that test scores used for placement be substantiated through an
evaluation of the student's developmental history, cultural background and
academic achievement.
(c) School Admissions

Test scores are frequently used as important information in a school's
decision about whether to admit a specific student. In Bakke vs. Regents of
the University of California (1976), test scores were used as evidence, but
the validity of the tests being used was not challenged.
Instead, the case focused on the admissions procedures at the UC-Davis
Medical School, where 16% of the admissions openings were reserved for
disadvantaged students. Many students admitted under this policy had
lower undergraduate grade point averages and test scores than regular
admittees. Bakke argued that the special admissions policy was
discriminatory against White applicants because race was a criterion for
disadvantagement.
(d) Test Disclosure

Many test takers or interested parties may want to know the content of a
particular test. For example, parents may want to examine contents of the
IQ test used to place their child in a special education class, or a college
applicant may want to examine the items of a college entrance exam. Most
arguments for test content disclosure begin with the Family Educational
Rights and Privacy Act (1974). It allows parents and eligible students access
to their education records and an opportunity to challenge those records,
including the test protocols used for placement of students.
In 1980, New York passed a Truth-in-Testing bill covering college
admissions tests, among others. Proponents of the bill argued that this
would humanise the admissions process, equalise opportunities for
minorities and ensure the accountability of test publishers. On the other
hand, opponents argued that the administration of secure tests minimises
costs to test takers, prevents unevenness across admission directors and
protects test score validity. Though a similar national bill was introduced, it
was not passed and further legislation in this direction seems unlikely in
the near future.

(e) Teacher Competency Testing

Legal issues related to teacher testing are similar to those in occupational
testing in general. States or school districts must be able to demonstrate that
a test is valid for the purpose for which it is being used. The example of the
use of the National Teacher Examinations (NTE) for certification and
promotion in South Carolina can be used to illustrate these issues. The use
of the NTE was challenged by the National Education Association, the
South Carolina Education Association and the US Justice Department on
grounds that the NTE were biased against minorities; many more black
than white applicants failed the test.
The court decided that the NTE were valid for these purposes, because
scores reflected presence or absence of knowledge. There was no intent to
discriminate, and an ETS validity study indicated that they were in
compliance with Title VII of the Civil Rights Act of 1964. Opponents of this
type of test continue to argue that certification should be based on a
performance test, rather than a paper-and-pencil test.
10.8.2 Legal Issues of Testing in Entrepreneur Settings
There are many business and industrial organisations that use psychological
tests and measurement as part of their process of hiring potential employees.
However, there are many legal requirements that entrepreneurs should be aware
of in relation to this practice.
In hiring potential employees, resumes and interviews are helpful, but
pre-employment testing is often the most direct way to verify a candidate's
qualifications and abilities. The problem is that pre-employment testing is
subject to strict legal restrictions, and if you do not know what they are, you
could find yourself in difficulty.
Overseas, especially in developed countries, pre-employment testing is subject to
legal restrictions; in the US, for example, these exist under both federal and
state law. However, since the laws are not necessarily clear-cut, business owners
often conduct testing that falls outside legal parameters. As a result, many small
business employers are completely unaware that their company is vulnerable to
lawsuits.

Although it is a good idea to consult an attorney before you conduct any
pre-employment tests, Table 10.2 describes some of the issues related to testing
in the US.
Table 10.2: Description of Issues Related to Pre-employment Tests in the US
Skill testing: There are a variety of skill tests you may want to conduct with a
potential employee. Everything from advanced mechanical ability to basic office
skills is fair game, as long as the tests are limited to the specific skills the
employee needs to perform his/her job.
Personality and psychological testing: Tests designed to assess an employee's
personality type or psychological profile can give you insight into an
individual's ability to interact with others in the workplace. Unfortunately, they
can also open the door to lawsuits because they can potentially reveal
information about the individual's religious beliefs, sexual preferences or
mental disabilities.
Medical exams: Generally speaking, you cannot ask job applicants to submit to a
physical examination before offering them a job. After offering the position to
an applicant, it is possible to require a medical test, but only if you require
every new employee to submit to the same exam. If an employee is singled out
for medical testing, you could face litigation or penalties for discrimination.
Lie detector tests: At some point, nearly every small business employer has
thought about submitting a job candidate or hired employee to a lie detector
test. However, the government is one step ahead of you. The federal Employee
Polygraph Protection Act prohibits most private employers from requiring
applicants to take a lie detector test unless the business is related to security
or pharmaceutical distribution. The Act also restricts polygraph testing of
existing employees to limited circumstances, such as an ongoing investigation
into a specific economic loss, and the practice is further restricted in many state
and local jurisdictions.
Drug tests: Laws governing drug testing vary from state to state. However,
certain jobs make drug testing a necessity. To be safe, talk to your lawyer before
you ask an employee to submit to testing.
Other concerns: Employers need to be especially careful when testing employees
or applicants with disabilities. Under certain circumstances, testing can be seen
as discriminatory. If there is any doubt about whether a test violates state, local
or federal law, contact the US Department of Labor for more information.
Source: http://www.gaebler.com/Employment-Testing-Legal-Issues.htm

In conclusion, as can be seen from the cases and scenarios described previously
in this section for educational and entrepreneur settings, many legal issues are
involved when tests are used as a mechanism for social control. In general, the
issues revolve around the validity of the test for a specific use. However, specific
legal decisions depend on "the particular circumstances surrounding a given
case, the evidence brought to bear in the case, and the opinion of the judge and
jury involved" (Nitko, 1983).
ACTIVITY 10.4
After reading section 10.8.1 on the experiences of testing-related legal
issues in the US, discuss with your coursemates and tutors how these
legal issues may be relevant in our country.
SELF-CHECK 10.4
Justify the importance of regulating testing through law.
10.8.3 Legal and Ethical Considerations

Psychological tests are used to measure, assess and describe human behaviour.
The widespread use of psychological tests may lead to the misuse and abuse of
testing. Even when the intention is sincere, ignorance in the field of psychological
testing can cause harm, not only to individuals but also to society as a whole.
When an inaccurate test is used, the evaluation and description of individuals
may also be inaccurate. Therefore, there must be a body that governs testing and
provides guidelines and standards for anyone who intends to use psychological
tests.
Several documents that are used as guidelines for ethics in test usage are:
(a) Specialty Guidelines (1981) for counselling and clinical psychologists;
(b) Casebook for Providers of Psychological Services (1982); and
(c) Standards for Educational and Psychological Testing (1985).

First, professional issues must be taken into consideration. One important aspect
that should be focused on is the competence of test purchasers. There is potential
for harm if the tests fall into the wrong hands.
The APA proposed that tests can be categorised into three levels of complexity
that require different degrees of expertise from the examiner as shown in
Table 10.3.
Table 10.3: Three Levels of Complexity that Require Different Degrees of Expertise from the Examiner
Level A:
• Requires minimal training.
• Test administration involves reading simple directions.
• Covers tests for educational achievement and job proficiency.
Level B:
• Requires some knowledge of the technical characteristics of tests.
• Covers tests such as group-administered mental ability tests and interest
inventories.
• Also requires knowledge of test construction and training in statistics
and psychology.
Level C:
• Requires advanced training in test theory and relevant content areas.
• Also requires substantial understanding of testing and supporting
topics.
• Requires a minimum of a master's degree in psychology.
• Covers individually administered intelligence tests and personality tests.
Next, ethical issues in using psychological tests must be emphasised, as shown in
Table 10.4.
Table 10.4: Three Ethical Issues in Using Psychological Tests
Informed consent: Informing test takers of the nature and purpose of the
assessment.
Knowledge of results: The patient, client or subject has the right to full
disclosure of test results.
Confidentiality: Test results should be treated as confidential information.

SELF-CHECK 10.5
1. We need a body that governs the activities of psychological testing.
Explain why.
2. Explain the ethical issues that are vital in using psychological tests.
3. Describe how the APA provides guidelines to ensure that psychological
tests are administered professionally by appropriately qualified examiners.
10.9 THE FUTURE OF TESTING

Psychological testing is an important part of the history of psychology, and
opinions about its future vary widely. Some psychologists believe it is a nearly
obsolete tool, while others see it as a specialty area with tremendous growth
potential. Rich (2007) describes the following developments related to
psychological testing:
(a) Payment Trends
Psychological testing has followed a reimbursement course similar to that
of psychotherapy. With the advent of managed care in the 1980s and 1990s,
payment amounts were reduced, and the kind and quantity of services
offered were limited by "medical-necessity" criteria. Third-party payers
enthusiastically adopted these criteria, which dramatically reduced
reimbursement for testing. Typically, psychological testing is considered
medically necessary if the diagnosis is still unclear after a thorough
diagnostic interview.
However, psychologists have had some good news with regard to
insurance reimbursement for testing. Although some master's-level mental
health practitioners have expanded their practices to include psychological
testing, insurance typically pays only when a psychologist performs the
test. Despite an overall decline in Medicare payments, the rate for
psychological testing has actually increased recently.
In January 2006, Medicare introduced new billing codes that distinguished
tests administered by psychologists from those administered by a
technician or assistant. This resulted in a 26 percent to 69 percent increase in
payment for outpatient testing by psychologists.

Aurelio Prifitera, Ph.D., psychologist and president of Harcourt Assessment
International, views mental health parity laws as another factor that will
reverse the trend towards declining payment for psychological testing.
Parity laws require that mental health insurance reimbursement be on an
equal footing with physical health payments.
(b) Survey of Testing Psychologists
In a 2007 survey of 32 psychologists across 18 states who used testing in
their practice, respondents were asked whether they saw growth in
psychological testing opportunities, and for their opinions about the future
of testing and its niches.
The most remarkable feature of the results was the diversity of opinions.
Forty-seven percent of the respondents believed that the market for
psychological testing was shrinking, while 22 percent saw the market as
growing. The remaining 31 percent had seen no change. Yet when these
psychologists were asked whether they had experienced growth in testing
opportunities in their own practices, 42 percent said they had.
One recurring theme among the most pessimistic psychologists was that
insurance and managed care had drastically reduced payments. A North
Carolina psychologist stated, “Managed care and insurance reimbursement
have been significantly cut. Testing such as the MMPI, which used to be a
regular part of the intake, is not done at all now.” Others lamented that
testing is receiving less emphasis in graduate training.
John L. Reeves II, Ph.D., ABPP, professor and director of behavioural
medicine services at UCLA's Orofacial Pain Clinic, said, “Sadly, the
majority of psychology graduate schools are doing a very poor job of
teaching psychometric testing and psychodiagnostics. Few graduate
departments even require a competency evaluation on such gold standard
tests as the MMPI.”
Psychologists with specialised training, such as in neuropsychology, or who
had found niches outside the traditional medical insurance arena, had the
most positive attitudes. Frank Cushing, Ph.D., practising in Rockford,
Illinois, was among the most optimistic respondents. He saw some societal
trends as opening new markets, including schools asking for violence-risk
assessments and courts asking for sex offender evaluations. He noted that
requests for “ADHD evaluations have increased with more data about
children being overmedicated.”

Surveyed psychologists who perform testing reported an average minimum
charge of $850 for a test battery and an average maximum of $2,550. Those
with a subspecialty in neuropsychology reported an average range of
$1,000 to $3,260.
There appears to be no shortage of opportunities for creative and
entrepreneurial psychologists. For example, professional school admission
tests such as the LSAT (Law School Admission Test) require specific
documentation before they will accommodate test takers with learning
disabilities, and psychologists can supply this documentation through testing.
Other new niches include testing for citizenship waivers, pre-surgical
evaluations, and assessments to screen candidates for the ministry or police
work. Matchmaking services such as eHarmony use psychological expertise
to design and validate tests that pair kindred spirits.
Prifitera commented that testing is expanding into business settings to
assess talent and into primary care medical settings to screen for mental
health needs. In these contexts, psychologists may have a less direct role
but are still needed to manage the assessment process.
Traditionally, testing has been done to guide mental health treatment
planning. While economic pressures have decreased the demand for this
application, new markets continue to make testing a viable, exciting and
expanding area of practice.
On the technical side, brain mapping has been described as a new form of
psychological testing. Neuropsychological, clinical neuroscience and
biohypnosis methods are said to be new testing approaches for family law
cases. They produce brain mapping images that are claimed to give insight
into the capacities and mental health of parents, teenagers and children.
Hypnoanalysis is also being revived, on the grounds that brain mapping
studies are said to have validated interviewing a person under hypnosis.
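The survey figures in item (b) can be checked with a short worked calculation. The 31 percent "no change" share follows directly from the two reported percentages. The headcounts below are only an approximation that assumes all 32 psychologists answered the question about the overall market, which the source does not state.

```latex
\begin{aligned}
\text{no change} &= 100\% - 47\% - 22\% = 31\% \\[4pt]
\text{shrinking} &\approx 0.47 \times 32 = 15.04 \approx 15 \text{ psychologists} \\
\text{growing}   &\approx 0.22 \times 32 = 7.04 \approx 7 \text{ psychologists} \\
\text{no change} &\approx 32 - 15 - 7 = 10 \text{ psychologists} \quad (10/32 \approx 31\%)
\end{aligned}
```

The 42 percent who reported growth in their own practices answered a different question, so that figure need not agree with the 22 percent who saw the overall market as growing; this gap is what makes the diversity of opinion noteworthy.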
ACTIVITY 10.5
After reading Section 10.9, discuss in class and in the forum the future of
psychological tests and measurement in Malaysia, comparing the possible
situation here with the situation in the US described in this section.

SELF-CHECK 10.6
1. Differentiate between faking and cheating.
2. What are the contemporary legal issues in testing?
3. What new ideas are expected to emerge in testing in the future?
4. Differentiate between cultural bias and test bias.
5. Define ability tracking.
SUMMARY
• Psychological tests are used in various fields and settings such as clinics, hospitals, organisations, industries, businesses, schools and universities. They are also used in private services, government services, research and counselling.
• Test users must be able to evaluate the tests they intend to use critically, scientifically and systematically.
• Although psychologists are enthusiastic about the publication of new tests and the revision of existing ones, society appears to be increasingly sceptical about the widespread use of tests, as many people experience them as barriers that prevent them from attaining their educational, vocational or professional goals.
• The consequences of tests for society can be looked at from two perspectives: the consequences claimed by test developers, and consequences that may occur regardless of those claims.
• Whilst faking is a continuing issue, designers of selection tests have a number of tools and techniques that can counteract or at least detect it, for example reversed questions, ipsative questions and item opacity.
• Psychological testing is often used to predict success in academic and creative domains, and in the corporate world to determine whether an individual has the skill sets a particular job requires. Therefore, personality tests that resist bias and faking will make such testing more effective.
• Test bias is said to occur when a test yields higher or lower average scores for specific criterion groups, such as people of a particular race or gender, than for an average population sample.
• The cultural influence in testing can be discussed from four aspects: the influence of different cultural backgrounds in using tests, the language of tests, test-taking behaviour, and culture-free and culture-fair tests.
• Legal challenges to the use of tests for decision-making in schools have focused on ability tracking, placement in special education classes, test scores as school admissions criteria, test disclosure and teacher competency.
• Psychologists and test users must be guided by a body that governs testing and provides guidelines and standards.
• Ability tracking, special education placement, school admissions and test disclosure are among the aspects that must be handled cautiously when applying psychological tests and measurement, as they are prone to legal challenges.
• Discussions of the future of psychological tests and measurement have focused on various issues, for example payment trends and the demand for psychological testing.
• Brain mapping is said to be a new psychological testing technique used in neuropsychology, clinical neuroscience and biohypnosis.
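Reversed questions, named above as one way to counteract or detect faking, can be illustrated with a minimal sketch. It assumes a 1 to 5 agreement scale and hypothetical pairs of items worded in opposite directions; the function names and example responses are illustrative only and are not taken from any published test. A response to a reverse-keyed item is flipped with (minimum + maximum - raw), and the average gap between each item and its flipped twin serves as a rough inconsistency index.

```python
# Minimal sketch of reverse-keyed ("reversed") items on a 1-5 agreement scale.
# The scale range, item wording and example responses are hypothetical.

SCALE_MIN, SCALE_MAX = 1, 5


def reverse_score(raw: int) -> int:
    """Flip a response to a reverse-keyed item onto the normal scoring direction."""
    if not SCALE_MIN <= raw <= SCALE_MAX:
        raise ValueError(f"response {raw} is outside {SCALE_MIN}-{SCALE_MAX}")
    return SCALE_MIN + SCALE_MAX - raw


def inconsistency(pairs: list[tuple[int, int]]) -> float:
    """Mean absolute gap between each normally keyed item and its reversed twin
    (after flipping). Large values suggest indiscriminate or careless answering."""
    gaps = [abs(normal - reverse_score(reversed_raw)) for normal, reversed_raw in pairs]
    return sum(gaps) / len(gaps)


# Each pair: (response to "I enjoy meeting new people",
#             response to "I prefer to avoid meeting new people")
consistent = [(5, 1), (4, 2), (4, 1)]
agree_with_everything = [(5, 5), (5, 4), (4, 5)]

print(inconsistency(consistent))
print(inconsistency(agree_with_everything))
```

In this sketch the respondent who agrees with every statement scores about 3.3, against about 0.3 for the consistent respondent. Reversed items mainly expose indiscriminate agreement or careless answering; a test taker who deliberately fakes good can still answer both wordings consistently, which is why designers combine them with other techniques such as ipsative questions and item opacity.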

KEY TERMS
Ability tracking
Brain mapping
Central response tendency
Cleary model
Confidentiality
Consequential validity
Content validity
Criterion-related validity
Cross-cultural context
Culture-fair test
Culture-free test
Equal regressions
Ethical issues
Faking
Informed consent
Ipsative tests
Item opacity
Mean difference
Mental Measurements Yearbooks
Predictive validity
Regression model
Reversed question
Social desirability
Special education placement
Test disclosure
Test fairness
Trick question
REFERENCES
Aguinis, H., Culpepper, S. A., & Pierce, C. A. (2010). Revival of test bias research
in preemployment testing. Journal of Applied Psychology, 95(4), 648–680.
Retrieved from http://www.apa.org/pubs/journals/releases/apl-95-4-648.pdf
Aiken, L. R. (2000). Psychological testing and assessment (11th ed.). Boston, MA:
Allyn & Bacon.
Anwar, S. A., Chaker, M. N., & Ferhat, N. R. (2003). Developing a cross-cultural
conceptual model for testing organisational commitment in the UAE: A
theoretical perspective. Journal for International Business and
Entrepreneurship Development, 1(1), 63–66. Retrieved from
http://www.inderscience.com/info/inarticle.php?artid=7809
Brislin, R. W., Lonner, W. J., & Thorndike, R. M. (1973). Cross-cultural research
methods. New York: John Wiley & Sons.
Buros, O. K. (1970). Personality: Tests and reviews. Highland Park, NJ: Gryphon
Press.
Buros, O. K. (1975). Intelligence: Tests and reviews. Highland Park, NJ: Gryphon
Press.
Buros, O. K. (1975). Vocational tests and reviews: A monograph consisting of the
vocational sections of the seven mental measurements yearbooks (1938–1972)
and Tests in print II (1974). Highland Park, NJ: Gryphon Press.
Buros, O. K., & Buros Institute of Mental Measurements. (1961). Tests in print.
Highland Park, NJ: Gryphon Press.
Changingminds.org. (2014). Reducing faking in tests. Retrieved from
http://changingminds.org/disciplines/hr/selection/reducing_faking.htm
Ekman, P. (1985). Telling lies: Clues to deceit in the marketplace, politics, and
marriage. New York: Norton.
ERIC Clearinghouse on Tests, Measurement, and Evaluation. (1985). Legal issues
in testing. Retrieved from http://www.ericdigests.org/pre-927/legal.htm
Figueroa, R. A. (1990). Best practices in the assessment of bilingual children. In
A. Thomas & J. Grimes (Eds.), Best practices in school psychology II.
Washington, DC: National Association of School Psychologists.
Janda, L. H. (1998). Psychological testing: Theory and applications. Boston, MA:
Allyn & Bacon.
Jensen, A. R. (1980). Bias in mental testing. New York: Free Press.
Luna.cas.usf.edu. (2014). Test bias. Retrieved from
http://luna.cas.usf.edu/~mbrannic/files/tnm/tstbias.htm
Nauert, R. (2008). This personality test cannot be faked. Psych Central News.
Retrieved from
http://psychcentral.com/news/2008/10/08/this-personality-test-cannot-be-faked/3088.html
Padilla, A., & Medina, A. (1996). Cross-cultural sensitivity in assessment:
Using tests in culturally appropriate ways. In L. A. Suzuki, P. J. Meller &
J. G. Ponterotto (Eds.), Handbook of multicultural assessment: Clinical,
psychological, and educational applications. Englewood Cliffs, NJ: Prentice
Hall.
Reynolds, C. R. (1994). Bias in testing. In R. J. Sternberg (Ed.), Encyclopedia of
human intelligence. New York: Macmillan.
Rich, J. (2007). Psychological testing: Old specialty, new markets. The National
Psychologist. Retrieved from
http://nationalpsychologist.com/2007/07/psychological-testing-old-specialty-new-markets/10933.html
Rogers, M. R. (1998). Psychoeducational assessment of culturally and linguistically
diverse children and youth. In H. B. Vance (Ed.), Psychological assessment
of children: Best practices for schools and clinical settings (2nd ed.).
New York: Wiley.
Sattler, J. M. (1988). Assessment of children (3rd ed.). San Diego, CA: Sattler.
Sun, X., & Shi, Q. (2007). Language issues in cross cultural usability testing: A
pilot study in China. Retrieved from
http://culturalusability.cbs.dk/downloads/HCI%202007/sunxianghong.pdf
Tan, U., & Tan, M. (1998). Curvilinear correlations between total testosterone
levels and fluid intelligence in men and women. International Journal of
Neuroscience, 95, 77–83.

MODULE FEEDBACK
If you have any comment or feedback, you are welcome to:
1. E-mail your comment or feedback to modulefeedback@oum.edu.my
OR
2. Fill in the Print Module online evaluation form available on myINSPIRE.
Thank you.
Centre for Instructional Design and Technology

Tel No.: 03-27732578
Fax No.: 03-26978702


Abpc1203 BM

Uploaded by

Copyright:

Available Formats

You might also like

Abpc1203 BM

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Abpc1203 BM

Uploaded by

Copyright:

Available Formats

Faculty of Applied Social Sciences

Copyright © Open University Malaysia (OUM)

Copyright © Open University Malaysia (OUM)

Module Writers: Dr Wan Shahrazad Wan Sulaiman

Moderator: Dr Wong Huey Siew

Enhancer: Gan Chun Hong

Developed by: Centre for Instructional Design and Technology

First Edition, April 2012

Copyright © Open University Malaysia (OUM), December 2014, ABPC1203

Topic 1 Introduction to Testing and Assessment 1

Topic 2 The Science of Psychological Measurement 20

Copyright © Open University Malaysia (OUM)

Topic 3 Test Construction 39

Topic 4 Test Administration 57

Copyright © Open University Malaysia (OUM)

Topic 5 Intelligence Test 78

Topic 6 Ability, Aptitude and Achievement Test 103

Copyright © Open University Malaysia (OUM)

6.3 Guidelines for Test Takers 110

Topic 7 Attitudes, Values and Interests Tests 136

Copyright © Open University Malaysia (OUM)

7.4 Career Assessment Inventory (CAI) 154

Topic 8 Personality Test 171

Copyright © Open University Malaysia (OUM)

8.5 Projective Personality Tests 179

Topic 9 Psychology Test and Measurement in Counselling, Health

Copyright © Open University Malaysia (OUM)

Topic 10 Issues and Challenges of Testing 227

Copyright © Open University Malaysia (OUM)

Copyright © Open University Malaysia (OUM)

Copyright © Open University Malaysia (OUM)

Copyright © Open University Malaysia (OUM)
