ABPC1203
Psychology Test and Measurement
Copyright © Open University Malaysia (OUM)
Table of Contents
Course Guide
INTRODUCTION
ABPC1203 Psychology Test and Measurement is one of the courses offered by the
Faculty of Applied Social Sciences at Open University Malaysia (OUM). This
course is worth three credit hours and should be covered over 8 to 15 weeks.
COURSE AUDIENCE
This course is offered to all students undertaking the Bachelor of Psychology with
Honours programme.
As an open and distance learner, you should be able to learn independently and
optimise the learning modes and environment available to you. Before you begin
this course, please ensure that you have the right course materials, understand
the course requirements, as well as know how the course is conducted.
STUDY SCHEDULE
It is standard OUM practice that learners accumulate 40 study hours for every
credit hour. As such, for a three-credit hour course, you are expected to spend
120 study hours. Table 1 gives an estimation of how the 120 study hours could be
accumulated.
Study Activities                                                               Study Hours
Briefly go through the course content and participate in initial discussions        3
Study the module                                                                    60
Attend 3 to 5 tutorial sessions                                                     10
Online participation                                                                12
Revision                                                                            15
Assignment(s), Test(s) and Examination(s)                                           20
TOTAL STUDY HOURS                                                                  120
COURSE OUTCOMES
By the end of this course, you should be able to:
COURSE SYNOPSIS
This course is divided into 10 topics. The synopsis for each topic is listed as
follows:
Topic 2 describes the norms and basic statistics of testing, correlation and
regression and reliability and validity of test items.
Topic 3 identifies the stages of test construction, the goals of a test, types of item
formats, steps in evaluating test items and item analysis in psychological tests.
Topic 5 discusses the concept of intelligence and its measurement, the different
models and theories related to intelligence and major intelligence tests. This is
then followed by descriptions of the types of intelligence tests used by the military in
the US and the critical issues related to intelligence tests.
Topic 7 explains the rationale for attitudes, values and interest testing, the Strong
Interest Inventory, the Kuder Occupational Interest Survey, Career Assessment
Inventory and Jackson Vocational Interest Survey (JVIS). It also discusses the
various issues related to psychological testing in industrial and business settings.
Learning Outcomes: This section refers to what you should achieve after you
have completely covered a topic. As you go through each topic, you should
frequently refer to these learning outcomes. By doing this, you can continuously
gauge your understanding of the topic.
Summary: You will find this component at the end of each topic. This component
helps you to recap the whole topic. By going through the summary, you should
be able to gauge your knowledge retention level. Should you find points in the
summary that you do not fully understand, it would be a good idea for you to
revisit the details in the module.
Key Terms: This component can be found at the end of each topic. You should go
through this component to remind yourself of important terms or jargon used
throughout the module. Should you find terms here that you are not able to
explain, you should look for the terms in the module.
PRIOR KNOWLEDGE
No prior knowledge required.
ASSESSMENT METHOD
Please refer to myINSPIRE.
REFERENCES
Domino, G., & Domino, M. L. (2006). Psychological testing: An introduction
(2nd ed.). New York: Cambridge University Press.
INTRODUCTION
As adults, we would have thus far experienced many kinds of tests in our lives.
The moment we were born, our length, weight and physical fitness were
measured. Then, as we grew up, we underwent a series of tests, from being
measured for our ability to crawl, stand and walk to having our health checked.
These examples show that we are no strangers to tests and testing. Most of the
tests that we have undertaken were in academic settings with the purpose of
assessing how much knowledge we have acquired. Psychological testing is no
different. The only difference is that psychological tests and measurements are
used to assess the behaviour and psychological components of human beings, for
example, intelligence, personality, self-esteem, motivation, quality of life,
depression, stress and many other aspects.
In Topic 1, we will first learn the definitions, basic concepts and principles of
testing and assessment. We will then study the historical development of
psychological testing. At the end of the topic, we will evaluate the advantages
and limitations of psychological testing.
Tests vary in style, rigour and requirements. For example, in a closed book test, a
test taker is often required to rely on memory to respond to specific items, whereas
in an open book test, a test taker may use one or more supplementary tools such as
a reference book or calculator when responding to an item.
Some important prerequisites for an effective test are that it should be objective,
reliable and valid. It should be clear about what property it aims to measure and
should have clear instructions for administration, scoring and the interpretation
of test results.
It is also a plus if a test is economical in terms of the time and money it
takes to administer, score and interpret. Most of all, a good test is one that
measures what it sets out to measure (Cohen & Swerdlik, 2010).
We will explore the concepts of reliability and validity which are vital in
psychological testing and measurement in Topic 2.
Definition                                                Source
A psychological test is a systematic and objective        Anastasi (1988)
procedure for measuring a sample of behaviour.
A test is a standardised procedure for sampling           Gregory (2007)
behaviour and describing it with categories or scores.
A test is a measurement device or technique used to       Kaplan and Saccuzzo (2005)
quantify behaviour or aid in the understanding and
prediction of behaviour.
There is a term that appears in the first paragraph of this section: Psychometrician.
This is a term that learners who study psychological tests and measurements
should be aware of. It comes from psychometrics, which is closely related to
psychological tests and measurements and may be defined as the
science of psychological measurement. Variants of the word include the
adjective "psychometric", which refers to measurement that is psychological in
nature, and the nouns "psychometrist" and "psychometrician", both referring to
psychological test users (Cohen & Swerdlik, 2010).
A standardised procedure also means that the test user must follow a
specific procedure in scoring the test. This will lead to a standardised
procedure in interpreting the test results. All the information on the
procedures of using the test is usually included in the manual of each
psychological test.
The same analogy can be used in psychological tests. If a test that measures
stress can produce scores, then we can categorise individuals as having a
low, moderate or high stress level. In the case of personality testing, the
scores from a personality test can tell us what type of personality a person
has.
However, the scores must be compared with those of other people who take the
same test. In this sense, selecting a sample of respondents and administering
the test to them is crucial. This sample must represent the whole population
so that when it becomes the basis for comparison, it can be used reliably for
all people in the population. Imagine that you administer a motivation test to
high performers in urban schools and these results are used as the norm or
basis for comparison. Now, imagine that a student of average performance
from a rural school takes the test and obtains a score below the average. Is
this comparison a valid one when the basis for comparison does not
represent those in the rural schools in the first place?
Figure 1.2: Two major types of tests based on the methods of administration
Achievement Tests
Assess the ability that is acquired as a result of previous learning.
For example, a mathematics achievement test measures how many
mathematical questions an individual can solve based on what has
been learnt thus far.
These tests are also known as proficiency tests. The skills already
acquired by the candidate either through his/her education or
experience can be judged through these tests. Such skills are
usually essential during job interviews. A candidate for the post of
a stenographer for example, may be given a test in typewriting
and shorthand to see how accurate and how fast he or she can
perform.
Aptitude Tests
Assess an individual's potential for learning or acquiring a specific
skill. Aptitude means the potential which an individual has for
learning the skills required to perform a task or job efficiently. For
example, a mathematics aptitude test measures how many
questions an individual might be able to solve given a certain
amount of training, education and experience. These tests measure
an individual's capacity and his or her potential for development.
In industrial and business settings, aptitude tests are the most
promising indices for predicting an employee's success.
Intelligence Tests
Assess a person's general potential to solve problems, ability to
adapt to different surroundings, capacity for abstract thinking and
the extent to which an individual is able to utilise what he or she gains
from experience.
Apart from the categorisation on the types of tests discussed above, there are
many other types of tests which may not fully fit into the categorisation
presented so far. One of them is the interest test.
Interest tests are widely used by counsellors and industrial and organisational
psychologists. This form of test identifies the pattern of interests in areas in
which individuals show special concern, fascination and involvement. These
tests will be able to suggest what type of a job may be satisfying to employees.
Interest tests are also used for vocational guidance. They help the individuals in
taking up occupations of their choice.
Other tests, for example neuropsychological tests, assess brain and nervous
system functions in relation to behaviour. There are also testing and screening
tools to determine levels of anxiety and stress to assess quality of life and coping
strategies, which are essential in health psychology. All these types of tests will
be discussed further in their respective topics in this module.
ACTIVITY 1.1
1. State some of the tests that you have taken until now.
2. Based on a test that you are familiar with, analyse whether or not
the test fulfils the five basic principles of tests as discussed earlier.
SELF-CHECK 1.1
1. Define a test.
Figure 1.4: The three eras from a historical context of the development of psychological
testing
Tests had become quite well developed by the Ming Dynasty. There were
national multistage testing programmes conducted, involving local and regional
testing centres. Those who did well in tests at the local level went on to
provincial capitals for more extensive essay examinations. After this second
testing, those with the highest test scores proceeded to the national level for the
final round. Only those who passed this third set of tests were eligible for public
office. Thus, the first evidence of systematic usage of testing can be found in the
Chinese civilisation. The Western civilisation is believed to have established their
testing system of government officials based on that of the Chinese civilisation.
This theory was later continued by Sir Francis Galton (1869; in Kaplan &
Saccuzzo, 2005) when he proposed his theory in the book Hereditary Genius.
Galton stated that only the fittest human beings survive and they pass on their
genes to the next generation. He further proved his theory by studying
individual differences in sensory-motor functions. His interest in genetics led
him to measure individual differences, where he introduced sensory, perception
and motor tests. Evidence of these first tests by Galton was recorded when
visitors to the International Exposition in 1884 paid to undergo Galton's simple
measures of vision, hearing, physical strength and reaction time.
(b) Movement speed: the rate at which a hand moves to a distance of 50 cm;
(h) Size estimation: ability to place a sliding line as close as possible to the
middle of a piece of wood which is 50 cm in length;
(j) Memory for letters: number of letters recalled after hearing a random list.
IQ = (MA / CA) × 100
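The ratio IQ formula above divides mental age (MA) by chronological age (CA) and multiplies the result by 100. The short Python sketch below illustrates the arithmetic; the function name `ratio_iq` and the sample ages are hypothetical and are not taken from the module.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ: mental age divided by chronological age, multiplied by 100."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return mental_age / chronological_age * 100


# Hypothetical children, all aged 10, with different mental ages.
for mental_age in (8, 10, 12):
    print(f"MA = {mental_age}, CA = 10, IQ = {ratio_iq(mental_age, 10):.0f}")
```

A child whose mental age equals his or her chronological age obtains an IQ of 100. A mental age above the chronological age gives a value above 100, and a mental age below it gives a value below 100.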
During World War I, Robert M. Yerkes, Goddard and Terman proposed a test
to be used on American army personnel. The test was called the Army Alpha and
Beta Examination. In the 1920s, a government agency called the National
Research Council developed tests for children like the Wechsler scales, the
Scholastic Aptitude Test and the Graduate Record Examination (GRE). The
College Entrance Examination Board (CEEB) was developed to screen students
for entrance into educational institutions.
The first structured personality tests and trait tests emerged as a result of the
success of intelligence tests. The first personality test developed was the
Woodworth Personal Data Sheet. Then, projective personality tests such as the
Rorschach Inkblot Test and the Thematic Apperception Test (TAT) emerged. The
Minnesota Multiphasic Personality Inventory (MMPI) was developed in the late
1930s and gained rapid growth and improvement. The success of the MMPI
encouraged further development of personality tests such as the Sixteen
Personality Factors (16PF) by Raymond B. Cattell.
The Second World War accelerated the growth of tests in clinical and army settings.
However, between the 1950s and the 1970s, the field of psychological testing
witnessed a relative decline and also gave rise to a wide range of criticisms because
of the abuse and misuse of tests.
From the 1980s to 2000, many new applied psychology tests emerged, the most
important of which relate to neuropsychology, health psychology and forensic,
child and space psychology, among others. These new applied areas of psychology
require intensive and extensive testing and assessment. As a result, the demand
for tests in these areas is continually rising. The improvement of test content
and techniques and the use of computers have had positive impacts. Janda (1998)
estimated that 3,009 psychological tests were commercially produced in 1994.
Between 1992 and 1995, 418 new tests were produced. Today, in the US alone,
20,000 tests are produced in a year. Most of these are used as tools for
research and are not standardised.
SELF-CHECK 1.2
1. List the five aspects that were used in testing during the Han
dynasty.
No Advantages of Testing
1. A test is an objective and standardised behaviour sample, which lends itself well
to statistical evaluation. Also, tests tend to be less subject to bias, particularly tests
of aptitude and achievement.
2. Tests can help to uncover talent that may otherwise be overlooked and to
differentiate between the abilities required for the present job and those required
for new ones. Another advantage is that a great deal of information about a person
can be collected in a relatively short period of time by using tests.
3. Tests reduce the cost of selection and placement because a large number of
applicants can be evaluated within the least possible time. If an employer expects
to continue in a competitive business, the costs of hiring plus the costs of training
must be kept to a minimum. Psychological tests can reduce the cost of hiring
people, by measuring their aptitude and predicting their success.
4. Tests provide a healthy basis for comparing an applicant's background. They
compel the supervisor and the interviewer to think through their evaluation more
carefully. Not only do tests compensate for weaknesses in the interviewer and
supervisor, they also have the effect of increasing the quality of the organisation's
employees over a period of time.
5. Tests can be used for differential placements because in testing, attention is
centred on the qualifications required for a specific job. If the applicant fails to
pass the test or does very well in the tests, his or her suitability for a job other
than the one applied for can be explored.
SELF-CHECK 1.3
Tests can be divided into individual tests and group tests based on the mode
of administration.
Based on the psychological aspects that tests measure, there are two major
types of tests: Ability tests and personality tests.
– Standardised procedure;
– Sample of behaviour;
– Prediction of behaviour.
Evidence shows that tests were first systematically used in China. Later, the
Western civilisation learned about testing programmes through the Chinese
and further developed tests.
The major limitation that needs attention when applying psychological tests
is that there is no single test that can measure in full the complexities of
human psychological dispositions.
Cohen, R. J., & Swerdlik, M. E. (2010). Psychological testing and assessment: An
introduction to tests and measurement. Boston, MA: McGraw-Hill Higher
Education.
Thurstone, L. L. (1938). Primary mental abilities. Chicago, IL: University of
Chicago Press.
INTRODUCTION
In the previous topic, we were introduced to psychological testing. We also learnt
its history and development and its advantages and limitations.
In this topic, we will go in-depth into the science of psychological testing and
measurement. We will first study the basic concepts, norms and statistics for
testing. Then, we will learn about correlation, regression, reliability and validity
of these test items.
(b) We can use statistics to make inferences, which are logical deductions about
events that cannot be observed directly (Kaplan & Saccuzzo, 2009).
ACTIVITY 2.1
Can you think about examples that show the importance of statistics
for psychological testing and measurement?
The rules used in assigning numbers are guidelines for representing the
magnitude, or some other characteristic, of the object being measured. For
example, check the ruler that you use to measure length. On a 12-inch ruler, the
number 12 is assigned to every length that is exactly 12 inches.
On the other hand, a scale used to measure a discrete variable is usually referred
to as a discrete scale. For example, research subjects are to be categorised as
either female or male. In general, it will not be meaningful to categorise a subject
as anything other than female or male.
(i) Do you like to read magazines related to cars and machinery?; and
(ii) For the past two months, have you experienced insomnia more than
three times?
For instance, when you say you are travelling at 0 kilometres per hour, it
means that there is no speed at all. If you are driving at
40 kilometres per hour and within 1 minute increase to 80 kilometres per hour,
then it can be said that you have doubled your speed.
ACTIVITY 2.2
(c) Mean
The arithmetic average score in a distribution is called the mean. To
calculate the mean, we total the scores and divide the sum by the number of
cases (Kaplan & Saccuzzo, 2009).
(d) Median
Median is the middle score in a distribution and is a commonly used
measure of central tendency.
(e) Mode
Mode is the most frequently occurring score in a distribution of scores.
(g) Z Score
One of the problems with means and standard deviations is that they do
not convey enough information for us to make meaningful assessments or
accurate interpretations of data. Therefore, the Z score is often used to
transform data into standardised units that are easier to interpret (Kaplan &
Saccuzzo, 2009).
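The measures described above can be sketched in Python using the standard `statistics` module. The scores are hypothetical and serve only as an illustration. The Z score is computed here with the usual definition, (raw score − mean) / standard deviation, and the population standard deviation is an assumed choice.

```python
import statistics

# Hypothetical raw scores on a short quiz (illustrative only).
scores = [4, 6, 6, 7, 8, 9, 9, 9, 10]

mean_score = statistics.mean(scores)      # arithmetic average
median_score = statistics.median(scores)  # middle score of the sorted list
mode_score = statistics.mode(scores)      # most frequently occurring score
sd_score = statistics.pstdev(scores)      # population standard deviation (assumed choice)


def z_score(raw, mean, sd):
    """Express a raw score in standard deviation units from the mean."""
    return (raw - mean) / sd


z_scores = [z_score(s, mean_score, sd_score) for s in scores]

print("Mean:", round(mean_score, 2))
print("Median:", median_score)
print("Mode:", mode_score)
print("Z scores:", [round(z, 2) for z in z_scores])
```

A positive Z score marks a raw score above the mean and a negative Z score marks one below it, which makes scores from different tests easier to compare.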
ACTIVITY 2.3
2.1.5 Norms
Norms refer to the performances by defined groups on particular tests (Kaplan &
Saccuzzo, 2009).
Let us say that, for the normative group of people who were administered the test,
the average score is 20. Azhar, an employee working at OUM, takes
the same test and obtains a score of 34. The psychologist may then conclude that
Azhar experiences above-average stress in comparison to the
norms of the test.
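The comparison in the example above can be sketched in Python. The normative sample below is hypothetical. It was constructed so that its mean is 20, as in the example, and its individual values do not come from the module. The percentile rank uses one common definition, the percentage of the normative group scoring below a given score.

```python
import statistics

# Hypothetical normative sample for a stress test, constructed so its mean is 20.
norm_scores = [8, 12, 15, 17, 19, 20, 20, 21, 23, 25, 28, 32]


def percentile_rank(score, norm_group):
    """Percentage of the normative group scoring below the given score."""
    below = sum(1 for s in norm_group if s < score)
    return 100 * below / len(norm_group)


norm_mean = statistics.mean(norm_scores)
azhar_score = 34  # score taken from the example in the text

print("Norm group mean:", norm_mean)
print("Azhar is above the norm mean:", azhar_score > norm_mean)
print("Azhar's percentile rank:", percentile_rank(azhar_score, norm_scores))
```

The conclusion depends entirely on the normative group. The same score of 34 could fall near the average if the norms were drawn from a group that experiences much more stress.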
ACTIVITY 2.4
89, 101, 87, 88, 70, 89, 64, 121, 90, 90, 65, 113, 100, 88, 60, 64, 79, 64, 113,
108, 99, 90
A statistical analysis used to indicate the strength and direction of the relationship
between two quantitative variables is called correlation analysis. Table 2.1 gives
two definitions of correlation analysis.
(c) In psychology, we come across several types of variables which are able
to explain different kinds of relationships. For example, there exists a
relationship among stress levels, sleep quality and how frequently a person
falls sick. Correlation analysis helps in quantifying precisely the degree of
association and direction of such relationships; and
(d) Correlations are useful in determining the validity and reliability of clinical
measures and in expressing how health problems are related to certain
biological or environmental factors.
SELF-CHECK 2.1
Example:
Positive Correlations
Increasing (x) 5 8 10 15 17
Increasing (y) 10 12 16 18 22
Decreasing (x) 17 15 10 8 5
Decreasing (y) 20 18 16 12 10
Negative Correlations
Increasing (x) 5 8 10 15 17
Decreasing (y) 20 18 16 12 10
Decreasing (x) 17 15 12 10 6
Increasing (y) 2 7 9 13 14
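The direction of the relationships in the examples above can be checked with the Pearson correlation coefficient. The sketch below is a minimal illustration that uses the first positive and the first negative data sets from the examples. The function name `pearson_r` is a hypothetical helper, and the Pearson coefficient is assumed here as the measure of correlation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variation_x = sum((a - mean_x) ** 2 for a in x)
    variation_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / sqrt(variation_x * variation_y)


x = [5, 8, 10, 15, 17]
y_increasing = [10, 12, 16, 18, 22]  # y rises as x rises
y_decreasing = [20, 18, 16, 12, 10]  # y falls as x rises

print("Positive example:", round(pearson_r(x, y_increasing), 3))
print("Negative example:", round(pearson_r(x, y_decreasing), 3))
```

The sign of the coefficient gives the direction of the relationship, and its distance from zero gives the strength.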
Example:
x 10 20 30 40 50
y 40 60 80 100 120
When the values of x and y are plotted on a graph paper, the line joining
these points will be a straight line.
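In the example above, every increase of 10 in x is matched by an increase of 20 in y, so all five points satisfy a single linear equation. The equation below is worked out from the listed values and is not given in the module.

```latex
y = 2x + 20, \qquad \text{for example } x = 30:\; y = 2(30) + 20 = 80
```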
Example:
x 8 9 9 10 10 28 29 30
y 80 130 170 150 230 560 460 600
When the values of x and y are plotted on a graph paper, the line joining
these points will NOT be a straight line; it will be curvilinear.
In partial correlation, two variables are chosen for a study of the correlation
between them, while the effect of other influencing variables is held constant.
For example, attraction among people is influenced by physical proximity
and by other factors such as appearance, cultural factors, values, thoughts and
so on. To study the correlation between attraction and proximity alone, the
other factors are assumed to stay at their average values.
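For one controlled variable, the idea can be written with the standard first-order partial correlation formula. The formula is not given in the module. The symbols are assumptions: x and y are the two variables under study, z is the variable held constant, and each r is a simple correlation coefficient.

```latex
r_{xy \cdot z} = \frac{r_{xy} - r_{xz}\, r_{yz}}{\sqrt{\left(1 - r_{xz}^{2}\right)\left(1 - r_{yz}^{2}\right)}}
```

When z is unrelated to both x and y, so that r_{xz} = r_{yz} = 0, the partial correlation equals the simple correlation r_{xy}.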
The statistical technique that expresses the relationships between two or more
variables in the form of an equation to estimate the value of a variable based on
the given value of another variable, is called regression analysis. The variable
whose value is estimated using an algebraic equation is called a dependent (or
response) variable and the variable whose value is used to estimate this value is
called an independent (regressor or predictor) variable. The linear algebraic
equation used for expressing a dependent variable in terms of an independent
variable is called a linear regression equation.
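A linear regression equation can be estimated with the least-squares method. The sketch below is a minimal illustration. It reuses the x and y values from the linear correlation example earlier in this topic. The function name `fit_line` and the new x value of 60 are hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for the line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


x = [10, 20, 30, 40, 50]     # independent (predictor) variable
y = [40, 60, 80, 100, 120]   # dependent (response) variable

slope, intercept = fit_line(x, y)
predicted_y = slope * 60 + intercept  # estimate y for a new x value of 60

print(f"y = {slope:.1f}x + {intercept:.1f}")
print("Predicted y when x = 60:", predicted_y)
```

Because these points lie exactly on a straight line, the fitted equation reproduces them without error. With real test data the points scatter around the line, and the prediction is only an estimate.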
(c) When the sample size is large, the interval estimation for predicting the
value of a dependent variable based on a standard error of estimate is
considered to be acceptable by changing the values of either x or y.
SELF-CHECK 2.2
ACTIVITY 2.5
Reliability means that a test measuring the same trait should produce the same
results at different times under similar conditions; that is, the test should
be consistent and uniform.
(b) There will not be much difference in the marks obtained by the candidates
if they are re-tested with the same or similar test; and
(c) The purpose of the test is clearly defined, so that another person working
independently would arrive at the same conclusion as that of the
candidates.
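The idea of re-testing candidates can be sketched in Python as a test-retest check. The marks below are hypothetical. Using the Pearson correlation between the two sittings as the reliability estimate is a common convention that is assumed here and is not prescribed by the module.

```python
from math import sqrt

# Hypothetical marks for six candidates on two administrations of the same test.
first_sitting = [12, 15, 9, 20, 17, 11]
second_sitting = [13, 14, 10, 19, 18, 11]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return covariation / sqrt(
        sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y)
    )


retest_r = pearson_r(first_sitting, second_sitting)
mean_abs_change = sum(
    abs(a - b) for a, b in zip(first_sitting, second_sitting)
) / len(first_sitting)

print("Test-retest correlation:", round(retest_r, 3))
print("Average change in marks:", round(mean_abs_change, 2))
```

A correlation close to 1 together with a small average change in marks suggests that the test yields consistent results across the two sittings.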
A considerable number of factors can cause tests to have low reliability. If a test is
not administered under standardised conditions, the reliability will tend to be
low. Thus, in a shorthand test for stenographers, if the material is not dictated
with the same degree of clarity and at the same speed every time, the test cannot
be expected to be reliable.
In addition, people vary from time to time in their emotional state, degree of
attention, attitude, health, fatigue and so on. If a particular test has few test
questions and is short, chance factors may determine whether an individual does
or does not know a particular fact. Also, luck in the selection of answers by
guessing may introduce variance into the scores.
Example: A test that has been designed for the job of a clerk in an organisation
is said to be invalid if it is used to assess an individual for a managerial
position. This is because a test which has been designed for a specific job
will not give the same or correct results for other jobs.
The procedure to determine the validity of a test is to compare the test with
performance on the job. A valid test that measures a specific ability must
differentiate between the more able and the less able. If it is unable to do this, it is
invalid, as it does not measure the ability in question. For example, a valid test in
a particular industry must be able to differentiate between poor and good
workers in that industry.
Validity is always specific, which implies that a good testing instrument is valid
for a specific purpose only. For instance, a test may be valid for selecting a sales
person, but invalid for selecting a scientist. It takes time to determine the validity
of a test. The applicants must be tested, hired and put to work (for example, on a
mechanical task). After a period of time, their performance on the job should be
measured and compared with their test scores to determine whether the applicants
who had high scores on the test are the ones who have done better on the job.
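The comparison described above can be sketched in Python. The records below are hypothetical pairs of a selection test score and a later job-performance rating. Splitting the applicants at the median test score is an assumed, simple way to separate high scorers from low scorers.

```python
import statistics

# Hypothetical (test score, later job-performance rating) pairs for hired applicants.
records = [
    (55, 3.1), (62, 3.4), (70, 3.9), (48, 2.8),
    (81, 4.5), (66, 3.6), (59, 3.0), (77, 4.2),
]

# Split the applicants at the median test score.
cutoff = statistics.median([score for score, _ in records])
high_scorers = [rating for score, rating in records if score > cutoff]
low_scorers = [rating for score, rating in records if score <= cutoff]

high_mean = statistics.mean(high_scorers)
low_mean = statistics.mean(low_scorers)

print("Median test score:", cutoff)
print("Mean job rating of high scorers:", round(high_mean, 2))
print("Mean job rating of low scorers:", round(low_mean, 2))
print("Test differentiates better from poorer workers:", high_mean > low_mean)
```

If the high scorers do not perform better on the job than the low scorers, the test does not differentiate between the more able and the less able for that job, and it cannot be treated as valid for that purpose.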
Broadly speaking, there are three types of validity as shown in Figure 2.3.
SELF-CHECK 2.3
Nominal, ordinal, interval and ratio scales are four major scales of
measurement in psychology tests and measurement.
Norms are the test performance data from a particular group of test takers
that are used as a reference to evaluate or interpret individual test scores.
The reliability of a test is the consistency with which it yields the same score
throughout a series of measurements at different times but on the same
subjects.
Correlation            Ratio scales
Interval scales        Regression
Nominal scales         Reliability
Norms                  Standard deviation
Ordinal scales         Standard normal distribution
Percentile ranks       Validity
Cohen, B. H., & Lea, R. B. (2004). Essentials of statistics for the social and
behavioral sciences. Hoboken, NJ: Wiley.
Cohen, R. J., & Swerdlik, M. E. (2010). Psychological testing and assessment: An
introduction to tests and measurement. Boston, MA: McGraw-Hill Higher
Education.
Kaplan, R. M., & Saccuzzo, D. P. (2009). Psychological testing: Principles,
applications, and issues. Belmont, CA: Wadsworth Cengage Learning.
Thompson, B. (2006). Foundations of behavioral statistics: An insight-based
approach. New York: Guilford Press.
INTRODUCTION
Valid tests do not just materialise out of thin air; they emerge gradually from an
evolutionary, developmental process that builds in validity from the very
beginning. Test construction is a developmental process from the beginning of its
construction to the stage where the test is determined to be of good quality and
valid to be used. Creating a new test involves both science and art (Gregory,
2007).
Gregory (2007) suggests that test construction comprises six intertwined stages
as shown in Table 3.1.
No. Stage: Description
1. Defining the test: Involves delimiting its scope and purpose, which must be
known before the developer can proceed to test construction.
2. Selecting a scaling method: A process of setting the rules by which numbers are
assigned to test results.
3. Constructing the items: Creativity of the test developer is required at this stage.
4. Testing the items: Once a preliminary version of the test is available, the
developer usually administers it to a modest-sized sample of subjects in order to
collect initial data about test item characteristics. Testing the items involves a
variety of statistical procedures referred to collectively as item analysis. The
purpose of item analysis is to determine which items should be retained, revised
or thrown out.
5. Revising the test: Based on item analysis and other sources of information, the
test is then revised. If the revisions are substantial, new items and additional
pre-testing with new subjects may be required.
6. Publishing the test: In addition to releasing the test materials, the test developer
must produce a user-friendly manual.
Of the six intertwined stages in test construction mentioned above, the first
four, which are the more essential ones, are discussed in detail below.
From a practical point of view, after the purpose of the test has been clearly
stated, one should not proceed immediately to build the test. The next step
should be to determine whether an appropriate test already exists for the same
purpose.
Kaufman and Kaufman (1983) provide a good model of the test definition
process. In proposing the Kaufman Assessment Battery for Children (K-ABC), a
new test of general intelligence in children, the authors listed six primary goals
that defined the purpose of the test, which distinguishes it from existing
measures (Kaufman & Kaufman, 1983), as shown in Figure 3.1.
The major shortcomings of multiple choice questions are, first, the difficulty
of writing good distractor options and second, the possibility that the
presence of the response may cue a half-knowledgeable respondent to the
correct answer.
(e) Checklists
One example of a checklist is an adjective checklist (Gough, 1960) in which
a subject receives a long list of adjectives and indicates whether each one is
a characteristic of himself or herself. An adjective checklist can be used for
describing either oneself or someone else. It requires subjects either to
endorse such adjectives or not to endorse them, thus allowing only two
choices for each item.
Example:
Traits that characterise a group of 40 graduate students.
SELF-CHECK 3.1
DeVellis (1991) provides several simple guidelines for item writing, which are:
For example, there are a number of job applicants who are depending on the
results of a personality test for their future in securing a dream job. The test items
are basically questions that are to be given to the individual applicants in order to
test their different personality traits as required for the job. Test items are to be
prepared keeping in mind the basic needs and objectives of the psychological
tests. A test item must focus the attention of the examinee on the principle or
construct upon which the item is based.
For item writers however, the task is to focus the attention of a group of potential
test takers for a particular test, often with widely varying background
experiences, on a single idea. Such communication requires extreme care in
choice of words and it may be necessary to try the items out before problems can
be identified.
Source: https://www.msu.edu/dept/soweb/writitem.html
The blueprint specifies the number of items to be constructed for each cell
of the two-way chart. For example, in the test blueprint shown in Table 3.3,
two items are to involve the application of principles of reliability.
(c) Understanding of the Target Test Takers for whom the Items are Intended
The item difficulty should be such that it does not go over the heads of the
examinees; they should not feel mentally pressured by what has
been given in the test. On the other hand, the item difficulty should also not
be so low that the items pose no challenge and the examinees or test
takers start taking the test lightly. For example, in an
achievement test for students, the students must be able to relate the test items to
what they themselves have covered in their studies.
(g) Place all Items of a Given Type of Format Together in the Test
The questions in the test should be organised properly so that items of the
same format are kept together in the test. For example, it is better to
group items in the Likert format together if they are measuring the same
concept. This allows the examinees to respond to all items requiring a
common mindset at one time. They do not have to shift back and forth from
one type of task to another. Furthermore, when items are grouped by type,
each item is contiguous with its appropriate set of directions.
ACTIVITY 3.1
Think about a psychological property that you want to measure. Based
on the essential characteristics in writing better test items, write ten
related test items for measuring the chosen property. Discuss what you
have written with your study mates, face-to-face tutor and e-tutor to
check how effective your suggested test items are.
Again, take an example of an achievement test for students in a school. After the
test is administered, there will be a wealth of information available about how
students performed on each test item. The most convenient way to organise all
this information is in an Item Analysis (IA). An IA provides a breakdown of how
different types of students performed on various aspects of each item. IAs are
particularly useful for multiple choice tests, but could conceivably be used for
other item types as well.
Instructors who bring their test data to Testing & Evaluation Services for
scanning and scoring will receive a detailed IA report along with their scored
rosters. The process of evaluating tests is described in Figure 3.3 below.
(c) Establish from the beginning what will be used to evaluate examinees;
(f) Go over the answers to the test questions with psychometricians or other
psychologists to get broader views on the items written; and
(g) Focus on the major points that were not understood by the examinees and
make appropriate amendments on the particular test items.
The reason for evaluating psychological tests and measurement is to ensure that
the tests constructed are precise and appropriate to measure what they are
supposed to measure, in a reliable and valid manner.
Facione (2000) agreed, noting that statistical analyses of the responses of a
sufficiently large and representative sample of test takers allow for the
elimination of items that fail to discriminate adequately among test takers, items
where the responses are inversely correlated with the overall scores of the test
and, in the interest of brevity, items that add little or nothing by way of further
refinement of overall scores.
How are the samples of final test items chosen from the original items? Test
developers usually employ item analysis, a set of statistical procedures to
identify the best items to be included in a test. Generally, the objective of item
analysis is to determine which items should be retained, improved or
eliminated. Many of the methods of item analysis originate from applications in
ability and performance testing, especially for multiple choice questions. In these
domains, there are right and wrong answers. However, item analysis procedures
are also used for tests in other domains, such as in personality and attitude tests.
The evaluation of whether a test is a good test depends on its reliability and
validity. Therefore, good items must also have reliability and validity. In other
words, a good test must consist of good items. In addition, good test items must
also be able to discriminate between test takers. This means that a good test item
is one that high scorers on the test as a whole get right. An item that high scorers
on the test as a whole do not get right is probably not a good item. We may also
describe a good test item as one that low scorers on the test as a whole get wrong.
An item that low scorers on the test as a whole get right may not be a good item.
If all test takers answer item 1 correctly it means that this item is too easy and is
not a good item. In contrast, if everyone answers item 1 incorrectly, this shows
that this item is too difficult. The item difficulty index is therefore a useful
technique for identifying items that need to be improved on or eliminated.
The item difficulty index is defined as the proportion of individuals getting the correct
answer for an item. For example, if 84% of individuals taking the test answer
item 24 correctly, then the item difficulty index is .84. Item difficulty is
represented by the value of p, where "p" indicates percentage or proportion.
These proportions do not really indicate item "difficulty" but item "easiness".
The higher the proportion of people who get the item correct, the easier the item
(Allen & Yen, 1979).
In practice, a good item usually has an item difficulty in the range of .30 to .70.
The statistic referred to as an item difficulty index in the context of achievement
testing may be called an item-endorsement index in other contexts, such as personality
testing. Here, the statistic provides not a measure of the percentage of people
passing the item but a measure of the percentage of people who said yes to,
agreed with or endorsed the item. In most tests, the items should have a variety
of difficulty levels because a good test discriminates at many levels (Kaplan &
Saccuzzo, 2005).
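The calculation described above can be sketched in a few lines of Python. This is only an illustration: the function names, the 0/1 coding of responses and the sample data are assumptions made for the example, and the .30 to .70 range is treated here as inclusive at both ends.

```python
def item_difficulty(responses):
    """Return p, the proportion of test takers who answered the item correctly.

    `responses` is a list of 1 (correct) and 0 (incorrect); this coding
    is an assumption made for the example.
    """
    if not responses:
        raise ValueError("at least one response is required")
    return sum(responses) / len(responses)


def in_recommended_range(p, low=0.30, high=0.70):
    """Check p against the .30 to .70 range (bounds assumed inclusive)."""
    return low <= p <= high


# Hypothetical data: 84 of 100 test takers answer item 24 correctly.
item_24 = [1] * 84 + [0] * 16
p = item_difficulty(item_24)
print(f"p = {p:.2f}")           # proportion answering correctly
print(in_recommended_range(p))  # is p inside the .30 to .70 range?
```

A high value of p marks an easy item, so a value above .70 suggests that the item may be too easy to discriminate well among test takers.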
ACTIVITY 3.2
63% of test takers get the correct answer for an item in a cognitive
ability test.
(a) What is the item difficulty index for this item?
(b) Is this item easy or difficult? Give reason(s) for your answer.
(c) Is this item a good item based on its item difficulty index? Provide
your thoughts on this.
B = (U/n1) − (L/n2)
where,
B = Brennan discrimination index
U = The number of individuals in the upper group who answered item 1
correctly
L = The number of individuals in the lower group who answered item 1
correctly
n1 = The number of individuals in the upper group
n2 = The number of individuals in the lower group
According to Brennan (1972; in Iran Herman & Muhamed Awang, 1999), the division
into upper and lower groups must be based on a meaningful comparison value
that can truly separate the two groups. Allen and Yen (1979) defined the upper
and lower groups as the highest-scoring and lowest-scoring 10% to 33% of the
individuals in a group. Kelley (1939) suggested that the upper and lower groups
should each contain 27% of the individuals. However, the same decision can be
obtained using 30% or 50% for the upper and lower groups (Beuchert &
Mendoza, 1979; in Iran Herman & Muhamed Awang, 1999).
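The Brennan index and the grouping step can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names, the (total score, item correct) record layout and the sample data are hypothetical, ties in total score are broken arbitrarily by sort order, and the default group fraction of 27% simply follows Kelley's suggestion.

```python
def brennan_b(upper_correct, upper_size, lower_correct, lower_size):
    """B = (U/n1) - (L/n2), using the symbols defined in the text."""
    return upper_correct / upper_size - lower_correct / lower_size


def discrimination_index(records, fraction=0.27):
    """Compute B for one item from (total_score, item_correct) pairs.

    The upper and lower groups are the highest- and lowest-scoring
    `fraction` of test takers, so n1 equals n2 in this sketch.
    """
    if not records:
        raise ValueError("at least one record is required")
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    ranked = sorted(records, key=lambda r: r[0], reverse=True)
    k = max(1, int(len(ranked) * fraction))   # size of each group
    upper, lower = ranked[:k], ranked[-k:]
    u = sum(correct for _, correct in upper)  # U: correct in upper group
    l = sum(correct for _, correct in lower)  # L: correct in lower group
    return brennan_b(u, k, l, k)


# Hypothetical data: ten test takers, total score and result on one item.
records = [(10, 1), (9, 1), (8, 1), (7, 1), (6, 0),
           (5, 1), (4, 0), (3, 0), (2, 1), (1, 0)]
b = discrimination_index(records, fraction=0.3)
print(round(b, 2))
```

With these data, all three test takers in the upper group and one of the three in the lower group answer the item correctly, so B is 1 minus 1/3.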
The B index ranges between −1.0 and +1.0, where items with high positive values
are better items than items with lower values. To determine whether an
item has discrimination power, Ebel (1965) proposed the guideline
shown in Table 3.4 for interpreting an item discrimination index.
By computing the item reliability index for every item in the preliminary test, we
can eliminate the "outlier" items that have the lowest value of this index. Such
items will usually possess poor internal consistency or weak dispersion of scores
and therefore do not contribute to the goals of measurement.
After the four major stages discussed above are completed, the test is then
revised to improve its quality based on the characteristics of a good test. After
that, the test is ready for use.
SELF-CHECK 3.2
ACTIVITY 3.3
What would you do with an item in a test that you developed which
has an item discrimination index of 0.15? Justify your answer.
The item formats most used in the test of ability and performance are
multiple choice items and two-choice answer formats, while psychological
tests in the form of surveys and inventories usually use the Likert scale and
dichotomous response format.
After writing, evaluation is also important as the future test taker for a
particular test may depend on it in making important decisions. It should be
fair, unbiased and accurate. Feedback should be given after the test.
Four techniques of item analysis that can be used are item difficulty index,
item discrimination index, item reliability index and item validity index.
Criterion-referenced test
Dichotomous response format
Distractors
Item analysis
Item difficulty index
Item discrimination index
Item formats
Item reliability index
Item validity index
Likert scale
Multiple choice items
Norm-referenced test
Test construction
Test blueprint
Aiken, L. R. (2000). Personality: Theories, assessment, research, and applications.
Springfield, IL: Charles C. Thomas.
Gough, H. G. (1960). The adjective check list as a personality assessment research
technique. Psychological Reports, 6(1), 107–122.
Iran Herman & Muhamed Awang. (1999). Ujian dan pengukuran. Modul
Pengajian Jarak Jauh. Bangi: Universiti Kebangsaan Malaysia.
Kelley, T. L. (1939). The selection of upper and lower groups for the validation of
test items. Journal of Educational Psychology, 30(1), 17.
Norman, G. (2003). Hi! How are you? Response shift, implicit theories and
differing epistemologies. Quality of Life Research, 12(3), 239–249.
Stevens, S. S. (1966). A metric for the social consensus. Science, 151, 530–541.
INTRODUCTION
This topic discusses the administration of psychological testing. First, we will
look at interviews as a form of psychological testing used to obtain data on
human behaviour. This will include principles of effective interviewing as
suggested by Kaplan and Saccuzzo (2005). Several types of interviews and their
application in psychology will also be discussed.
Many psychological tests, such as the Thematic Apperception Test (TAT), cannot
be properly used without conducting interviews. The interview remains one of
the most prevalent selection devices for employment (Posthuma, Morgeson &
Campion, 2002). Furthermore, interviewing is the chief method of collecting data
in clinical psychiatry (Allen & Smith, 1993; Groth-Marnat, 2003). Therefore,
interviewing is an important method of collecting data across many fields of
psychology such as clinical, industrial, counselling, school and correctional
psychology.
Responses | Description
Judgemental or evaluative statements | Being judgemental means evaluating the thoughts, feelings or actions of another. These judgements prevent other people from revealing important information.
Probing statements | The interviewer may push the interviewee to reveal something that the interviewee is unwilling to reveal. This means that the interviewer is demanding more information than the interviewee wants to give voluntarily. If this happens, the interviewee will probably feel anxious and therefore refuse to reveal additional information.
Hostility | The interviewer uses hostile statements which can anger the interviewee. Interviewers should avoid such responses unless necessary, for example, when determining how an interviewee reacts to anger.
False reassurance | A reassuring statement attempts to comfort or support the interviewee. Though reassurance is sometimes appropriate, an interviewer should always avoid false reassurance.
ACTIVITY 4.1
(i) A discrepancy between what the person is and what he or she wants
to become;
(ii) A discrepancy between what the person says about himself or herself
and what he or she does; and
Towards the end of the interview, direct questions can be used to fill in
details or gaps in the interviewer's knowledge. The use of direct questions
is necessary in three conditions:
(ii) Time is limited and the interviewer needs specific information; and
(iii) The interviewee cannot or will not cooperate with the interviewer.
ACTIVITY 4.2
1. Select an individual as your subject. Conduct a case history
interview on the subject.
Apart from that, the perceptions, state of mind and previous experiences and
expectations of the examinee all play a role in a testing environment. As such, to
minimise the influence of these variables, all conditions of testing have to be
standardised.
Another major error that can be made during the administration of a group test is
the inaccurate allocation of time for tests which require time limits, such as the
Miller Analogies Test (MAT).
ACTIVITY 4.3
1. Discuss how the familiarity between a test examiner and test taker
can either positively or negatively bias test results.
(b) Temperature;
Controlling these factors of the physical environment also helps to ensure a more
reliable testing device.
ACTIVITY 4.4
(iii) Ideally, do not test for longer than 1 hour (in general, the attention
span for preschool and elementary school children is 30 minutes and
not longer than 90 minutes for secondary school children). However,
many psychology tests need longer hours. Therefore, allow breaks in
between for the children to rest.
(iii) What type of test questions will be included in the test?; and
(iv) How much time will be allowed for test takers to complete the test?
Even when consent is not legally required, test administrators should still
inform test takers about the specifics of a test.
Understanding the test from "both sides of the fence" will make the testing
session run more smoothly as the administrator will understand it from the
perspective of test takers as well.
Specific directions and procedures should also be reviewed one last time
immediately before the test begins.
Examiners must also become familiar with security procedures for secure
tests such as the Scholastic Aptitude Test (SAT), Law School Admission
Test (LSAT) and Graduate Record Examination (GRE). Each exam should
be inspected and arranged in numerical order.
Many tests have standardised instructions, which serve to keep the test
tasks identical for all respondents.
(e) Flexibility
Standardised directions may not cover all possible situations. The test
administrator should always be prepared to deal with novel problems.
Experience is sometimes the best teacher when it comes to bizarre testing
situations.
(b) Allow the test taker to have sufficient practice on sample items;
(d) Make arrangements for deficits in visual, auditory and other sensory-motor
systems;
(e) Be aware of fatigue and test anxiety and take them into account when
interpreting scores;
(g) Do not force examinees to respond when they repeatedly decline to do so.
(b) All answer or scoring sheets have names or other necessary identification
indicating which test paper belongs to whom;
(c) Discuss with test takers when feedback on the test results can be given;
(d) The confidentiality of the test results is protected, and the test and measurement
information is kept in a safe and proper place. The test administrator must
consider how the confidentiality of the tests can still be maintained even
after he or she has left the organisation.
Delete test results and data which are private and confidential if necessary,
after feedback is given and the test results have served their purpose; and
(e) The test room and testing tools are returned to their pre-test set-up for the
convenience of the next testing session.
(d) Less costly and enables the examiner to perform other duties;
(g) Testers will find it more interesting to interact with a computer; and
(a) Results are easily misinterpreted and this may cause harm to test takers;
(f) Depending on the computer to do all the thinking may cause the insights
and clinical judgement made by well-trained clinical psychologists to not be
taken into consideration.
SELF-CHECK 4.1
ACTIVITY 4.5
1. Read Table 4.3 below regarding various testing issues and their
explanations. Discuss in tutorial or on the myVLE forum your
views on the testing issues highlighted and whether you agree
with the explanations, from the perspective of psychology testing
and measurement.
– Evaluation interview;
– Case history.
Carkhuff, R. R. (1969). Helping and human relations: A primer for lay and
professional helpers. New York: Holt, Rinehart and Winston.
Dillard, J. P., & Marshall, L. J. (2003). Persuasion as a social skill. Handbook of
communication and social interaction skills, 479–513.
First, M. B., Spitzer, R. L., Gibbon, M., & Williams, J. B. (1997). Structured clinical
interview for DSM-IV axis I disorders SCID-I: Clinician version,
administration booklet. Arlington, VA: American Psychiatric Publishing.
Hensley, W. E. (1994). Height as a basis for interpersonal attraction. Adolescence,
29(114), 469–474.
Kaplan, R. M., & Saccuzzo, D. P. (2005). Psychological testing: Principles,
applications, and issues. Belmont, CA: Wadsworth Cengage Learning.
Lonner, W. J. (1990). An overview of cross-cultural testing and assessment. In R.
W. Brislin (Ed.), Applied cross-cultural psychology (56–76). Newbury Park,
CA: Sage.
Sattler, J. M., & Dumont, R. (2004). Assessment of children: WISC-IV and WPPSI-
III supplement. San Diego, CA: Jerome M. Sattler.
Sattler, J. M., & Theye, F. (1967). Procedural, situational, and interpersonal variables
in individual intelligence testing. Psychological Bulletin, 68(5), 347.
Tyler, L. E. (1969). The work of the counselor (3rd ed.). New York: Appleton-
Century-Crofts.
Vernon, M., & Brown, D. W. (1964). A guide to psychological tests and testing
procedures in the evaluation of deaf and hard-of-hearing children. Journal
of Speech and Hearing Disorders, 29(4), 414.
LEARNING OUTCOMES
By the end of this topic, you should be able to:
1. Describe the concept of intelligence and its measurement;
2. Identify the different models and theories in defining intelligence;
3. Explain major intelligence tests;
4. Describe the intelligence tests used for military purposes; and
5. Discuss critical issues regarding intelligence tests.
INTRODUCTION
Intelligence tests are widely used by clinical psychologists in Malaysia as part of
psychological assessment especially in determining psychological disorders
related to cognition and learning.
When applying for the People with Disability Card (or in Bahasa Malaysia: "Kad
Orang Kurang Upaya"), intelligence tests are often requested to determine the
intelligence quotient (IQ) of the applicants.
This topic focuses on the discussion of intelligence tests, one of the major
types of psychological tests. Theories of intelligence such as Spearman's "g" factor theory,
Thurstone's theory of primary mental abilities and the multidimensional models
of intelligence are presented, as these theories provide the foundation for many
intelligence tests. Two main intelligence tests are described at length due to their
importance as the first psychological tests. They are the Stanford-Binet
intelligence test and the Wechsler scales of intelligence. As a comparison,
intelligence measurement for military use, using the United States as an
example, will be introduced as well. Finally, issues related to intelligence testing
will be presented.
SELF-CHECK 5.1
The techniques used in Galton's tests of intelligence were widely used until the
emergence of an alternative approach developed by Alfred Binet (1857–1911)
together with his associate, Théodore Simon. At the request of the Ministry of
Public Instruction in France at the time, Binet and Simon (1916) constructed a test
to measure intelligence, focusing on children's learning ability in academic
settings.
(c) Critique – the ability to criticise one's own thinking and actions. The priority given to
instructions, adaptation and critique in Binet and Simon's approach
can be considered consistent with the current perspective on
intelligence, which also stresses metacognitive processes.
To achieve this objective, Binet and Simon determined the mental age (average
level of intelligence of individuals at a certain age level) for each child. For
instance, if a child has a mental age of seven, this means his or her level of thinking is
similar to that of other seven-year-old children.
IQ = (MA/CA) × 100
Calculations using this formula, when the mental age of a child is higher than the
chronological age, will produce an IQ score of more than 100. In contrast, if the
chronological age is higher than the mental age, the ratio will produce an IQ
score of less than 100.
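The ratio formula above (mental age divided by chronological age, multiplied by 100) can be checked with a short Python sketch. The function name and the ages used are hypothetical and chosen only to show the three possible outcomes.

```python
def ratio_iq(mental_age, chronological_age):
    """Ratio IQ: mental age (MA) over chronological age (CA), times 100.

    Both ages must be expressed in the same unit (for example, years).
    """
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return mental_age / chronological_age * 100


# Hypothetical children, each with a chronological age of 8 years.
print(ratio_iq(10, 8))  # MA above CA: IQ above 100
print(ratio_iq(8, 8))   # MA equal to CA: IQ of exactly 100
print(ratio_iq(6, 8))   # MA below CA: IQ below 100
```

The three calls reproduce the pattern described in the paragraph above: the score rises above 100 when mental age exceeds chronological age and falls below 100 in the opposite case.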
ACTIVITY 5.1
W 15 years 10 years
X 30 years 40 years
Y 55 years 55 years
Z 35 years 28 years
Although there are variations, all researchers who use factor analysis adhere to
the steps mentioned. Several factorial theories have been proposed in the study
of intelligence and intelligence tests, including theories by Spearman, Thurstone,
Guilford, Cattell, Vernon and Carroll.
Let us now discuss a few of the models and theories proposed in detail.
His theory is also referred to as the two-factor theory of intelligence, with the
general factor, often called the "g" factor, representing the portion of the
variance that all intelligence tests have in common and the remaining portions of
the variance being accounted for mainly by specific components of this general
factor. Figure 5.1 illustrates Spearman's concept of intelligence in brief.
(b) Content is the situation that exists in a problem, such as a symbol, semantic,
behaviour, sound and visual; and
(c) Product is the response required, such as a unit, class, relationship, system,
transformation and implication.
Carroll's theory is similar to Cattell's model. By using a total of 460 sets of data
collected since 1927, involving 130,000 individuals from various strata of
society, across several countries which use English as their medium of
instruction, Carroll was able to map out his hierarchical model of intelligence.
According to Carroll, human intelligence comprises three strata, as shown
in Table 5.3.
Strata of Human Intelligence | Description
Stratum I | Includes specific abilities (for example, the ability to spell and speed of reasoning).
Stratum II | Consists of various general abilities (for example, fluid intelligence and crystallised intelligence).
Stratum III | Consists of a single general ability similar to Spearman's conception of "g".
Apart from fluid and crystallised intelligence, Carroll also suggested learning,
memory process, visual perception, auditory perception, idea generation and
speed (whether speed or accuracy of response) as abilities within Stratum II.
Although Carroll did not suggest anything new, he managed to integrate a
body of literature on intelligence based on factor analysis, which lends
considerable authority to his model.
Gardner listed seven types of intelligence and the tasks that reflect the related
intelligence. The seven independent frames of mind or forms of intelligence are:
(a) Linguistic;
(b) Logical-mathematical;
(c) Musical;
(d) Spatial;
Gardner also suggested eight signs that were considered as the criteria to detect
the existence of various types of intelligence, as shown in Table 5.4.
Table 5.4: Eight Signs to Detect the Existence of Various Types of Intelligence
No Description
1. Separation potential caused by brain deformity, which occurred due to damage to
a discrete location (for example, location related with verbal aphasia) that brings
about damage, or in contrast, retains intelligent actions.
2. Existence of individuals with special abilities (for example, ability in music and
mathematics) that show high ability, or in contrast, show handicap in intelligent
action in related fields.
3. Basic operation or a set of operations that can be identified (for example, the
ability to identify relations among musical notes) and which are considered
necessary to perform a type of intelligent action.
4. History of discrete development that propels individuals from the novice level to
the master level along with other levels of expert performance which are clear or
discrete.
5. History of evolution; through it an increase of intelligence is considered to be
related logically with the increase of adaptation to the environment.
6. Proof from the support of past experimental-cognitive studies, such as difference
of performance on specific tasks across separate types of intelligence, together
with similarities of performance across tasks and within tasks of discrete
intelligence.
7. Proof from the support of psychometric test results that show discrete
intelligence.
8. Susceptibility towards coding in the symbol system (for example, language,
mathematics, musical notes) or in the area of cultural creativity (for example,
dance, athletics, theatre, engineering and surgery).
After discussing the five popular models and theories of intelligence, two
major intelligence tests will be explained in detail in the following section. The
two major intelligence tests are the Stanford-Binet intelligence scale and the
Wechsler scales.
SELF-CHECK 5.2
(a) It did not have a suitable measurement unit to explain the test results;
(b) It did not have enough normative data to support validity; and
(c) The norms were only based on 50 children who were considered normal
according to school performance.
The second version of the Binet-Simon scale, revised in 1908,
introduced the concept of an age scale. Items were grouped according to age
levels and not according to difficulty levels. Its weakness was that it did
not vary the range of abilities: the scale comprised only language, reading and
verbal skills. However, this version introduced the concept of mental age. The
norms were also increased to 203 samples.
In the 1937 version, the scale widened its age range to the age level of two years
old and increased the maximum mental age to 22 years, 10 months. Samples used
were increased to 3,184. This version also included alternate forms such as Forms
L and M. Both Forms L and M were designed to be equivalent in terms of
difficulty and content. With two such forms, the psychometric properties of the
scale could be readily examined. However, it was similar with regards to
difficulty and content.
The 1960 version managed to establish the standard score using a mean of 100
and standard deviation of 16. Representative samples were chosen based on
2,100 children.
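A standard score with a mean of 100 and a standard deviation of 16 can be illustrated with the following Python sketch. It assumes that the standard score is a linear transformation of a z-score, which is the usual way such scores are defined; the function name, the raw score and the norm-group mean and standard deviation are hypothetical and are not taken from the 1960 norms.

```python
def standard_score(raw, norm_mean, norm_sd, mean=100.0, sd=16.0):
    """Rescale a raw score to a scale with the given mean and SD.

    norm_mean and norm_sd describe the raw scores of the norm group
    for the test taker's age; they are hypothetical in this example.
    """
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    z = (raw - norm_mean) / norm_sd  # distance from the norm mean in SD units
    return mean + sd * z


# Hypothetical norm group: mean raw score 50, standard deviation 10.
print(standard_score(60, 50, 10))  # one SD above the mean
print(standard_score(50, 50, 10))  # exactly at the mean
```

Under this scaling, a test taker who scores one standard deviation above the norm-group mean receives 116, while a test taker at the mean receives 100.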
The modern Stanford-Binet Intelligence Scales were introduced with the revision
made in 1986. This revision included the intelligence theory of fluid and
crystallised intelligence: gf-gc. Items in this version are arranged according to the
three-level hierarchical model as shown in Figure 5.3.
The modern Stanford-Binet Intelligence Scales eliminates the age scale. Items are
arranged according to content. The test format is in an adaptive form. It uses
subject scores in vocabulary tests and the chronological age.
In addition, basal age has to be determined, which refers to the lowest level in
which two items with the same level of difficulty can be answered consecutively.
Then, the ceiling age is also determined, which refers to the point where at least
three out of four items cannot be answered.
Standardised samples were taken from 5,000 subjects in 47 states in the USA. The
selection of samples was based on the strata of geographical location, community
size, ethnic groups, age and gender.
The reliability reported for the scale was good with internal consistency using the
KR20 method. The reliability index is more than .90. The high index is necessary
to make decisions on individuals. Test-retest reliability showed the reliability
index of .91 for five-year-old subjects and .90 for eight-year-old subjects.
The Wechsler Intelligence Scales are the most common intelligence tests used in
our country to measure the intelligence level of an individual. There are three
Wechsler scales of intelligence as shown in Figure 5.4.
(c) WPPSI-III is for children from age 2 years 6 months to 7 years 3 months.
In general, all three Wechsler scales produce three types of scores: verbal score,
performance score and total score. Verbal score is obtained from tests such as
vocabulary and verbal similarities, while performance score is obtained from
tests such as picture completion and picture arrangement. The total score is the
combination of the verbal and performance scores.
Like Binet, Wechsler also assumed that human intelligence is wider than what is
measured by the test. Although Wechsler believed in intelligence assessment, he
did not limit the conception of intelligence to the scores of intelligence tests.
Wechsler believed that intelligence is the basis of human life. Individuals use
intelligence not only to sit for intelligence tests or complete school work, but they
also use their intelligence to interact with other people, perform tasks effectively
and manage daily lives. Focus on assessment of intelligence is only one of several
theoretical approaches and research on intelligence.
All three Wechsler scales have good norms. Split-half reliability is more than .95,
while reliability of verbal IQ and performance IQ each is within the range of .90
to .95. The validity of WAIS is also satisfactory. Good criterion validity was
shown in many studies of the correlation between WAIS-III and other tests of
intelligence and academic performance.
SELF-CHECK 5.3
After the war, the army's system of scoring was translated into mental age levels
and the results were made public. According to the scales and the method of
calculation used then, it was estimated that the average army draftee had a
mental age of about fourteen years. These tests initiated a debate that has gone
on ever since. What is intelligence? Can it be measured?
The committee tried a series of tests out in a few camps, timing the participants.
The number of test items and the time limits were then fixed so that only about
five percent of an average group would be able to finish the entire test in the time
allowed.
This determined the "A" man, a man supposedly with "very superior
intelligence." Between 100 and 200 men were ordered to report for testing at a
time. After a five-minute literacy test, those who could not read or write English
were withdrawn, and the rest were given pencils and printed forms of the Army
Group Examination Alpha. A senior officer stood at the front of the room and
read the general directions only once. Then, the men were given the tests.
The Beta tests were constructed so that the directions could be given in
pantomime. Test I, for example, was a maze. An assistant demonstrated by
tracing through a sample maze on a blackboard at the front of the room with a
piece of chalk. When he purposely went into a blind alley and crossed over a line,
the officer shook his head, said, "No, no" and took the demonstrator's hand back
to the place where he could get on the right track again. Then, he traced an
imaginary line with his finger through each maze on the sheet and said, "All
right. Go ahead. Do it. Hurry up". Speed was emphasised as orderlies walked
about the room motioning to men who were not working and telling them to "Do
it. Do it. Hurry up, quickly".
In the uproar that followed the publication of the test results, Lewis M.
Terman, the creator of the Stanford-Binet tests, pointed out that the mental
age standards for the army were established by giving both the Alpha and
the Beta tests to groups of schoolchildren. It came as no surprise to test
critics that the average fourteen-year-old student in school did as well as or
a little better than soldiers who on average had less formal education.
However, the March 1919 issue of The American Magazine carried what it called
a specimen set of the Army Alpha test under the heading "Try These Tests on
Yourself and Others":
With your pencil, make a dot over any one of these letters FGHIJ, and a
comma after the longest of these three words: boy mother girl. Then, if
Christmas comes in March, make a cross right here ... but if not, pass along
to the next question, and tell where the sun rises. If you believe that Edison
discovered America, cross out what you just wrote, but if it was someone
else, put in a number to complete this sentence: "a Horse has ... feet."
The entire version of this sample took the average adult 125 seconds to answer.
Fifty percent of average educated adults came somewhere between 100 seconds
and 150 seconds. Those who took less than 100 seconds were ranked in the
superior 25 percent. Those who took more than 150 seconds were labelled in the
poorest 25 percent. No one taking the test scored the maximum. Scores were
ranked according to the following scale as shown in Table 5.5:
An E rating was reserved for those who were considered unfit for duty because
of mental inferiority and who were then discharged from the army (about 0.5
percent).
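The timing bands reported earlier for the magazine specimen (under 100 seconds, 100 to 150 seconds, and over 150 seconds) amount to a simple classification rule. The following is a minimal Python sketch of that rule only. The function name `timing_band` and the band labels are illustrative assumptions and are not part of the original Army Alpha scoring materials.

```python
def timing_band(seconds):
    """Place a completion time (in seconds) into one of the three bands in the text.

    Assumption: times of exactly 100 or 150 seconds fall in the middle band,
    since the text describes the middle group as lying "between" those values.
    """
    if seconds < 0:
        raise ValueError("completion time cannot be negative")
    if seconds < 100:
        return "superior 25 percent"
    if seconds <= 150:
        return "middle 50 percent"
    return "poorest 25 percent"


# Illustrative completion times, not data from the source.
for t in (90, 125, 160):
    print(t, "seconds:", timing_band(t))
```

Note that this rule ranks people purely on speed, which is one reason the text stresses how strongly the tests emphasised working quickly.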
SELF-CHECK 5.4
ACTIVITY 5.2
For example, the Head Start programme was implemented in the United States to
increase the intellectual capabilities and performance of preschool children.
Studies intended to evaluate its effectiveness showed that by middle adolescence,
children who participated in the Head Start programme from the beginning
obtained a performance level of one grade higher than children in the control
group who did not participate in the programme (Lazar & Darlington, 1982;
Zigler & Berman, 1983). Children who participated in the programme also
showed higher scores in various performance tests in school, did not require
remedial attention and showed fewer symptoms of behavioural problems.
Although it was not an actual measurement of intelligence, it was a form of
assessment that showed positive and strong correlations with intelligence tests.
Apart from Head Start, several other programmes have also shown encouraging
success in increasing the intellectual abilities of children. One of them was
the Instrumental Enrichment programme, which involved training in various
abstract reasoning skills and which seemed effective in improving the skills in
retarded children. Another programme, The Philosophy for Children (Lipman,
Sharp & Oscanyan, 1980), succeeded in teaching logical thinking skills to children
in primary and secondary school levels.
A study by Bradley and Caldwell (1984; in Iran Herman & Muhamed Awang,
1999) also found that the variables listed previously predict IQ scores more
effectively than socioeconomic status. Studies by Pianta and Egeland
(1994) suggest that factors such as social support and interactive behaviour play an
important role in determining the stability of scores on tests of intellectual ability
among children between two and eight years old.
This is because many people believe that achievement test scores do not have the
same meaning as intelligence test scores. We tend to view an IQ score as
reflecting a general ability and hence, as having wider implications. We may
conclude that a person with low achievement test scores should have studied
harder in school, but we are likely to view a person with a low IQ score as being
less capable and by implication, a less worthy individual (Janda, 1998).
Brody (1992) argues that there is redundant information in a student's file that
includes both intelligence and achievement test results, with the intelligence test
score offering a fairer standard for making decisions. This is true because not all
students have the same educational experiences.
In his defence of intelligence tests, Brody (1992) states that there is not another
single index that is as predictive of socially important outcomes as are tests of
general intellectual ability. Large numbers of professionals in educational and
clinical settings believe that these tests are useful in the decision-making process.
They also believe that without such tests, it would be impossible to conduct the
research necessary to expand our knowledge of intelligence and to learn more
about how we might maximise a person's potential (Janda, 1998).
ACTIVITY 5.3
Two major tests of intelligence which are widely used are the Stanford-Binet
intelligence scale and the Wechsler scales of intelligence.
The Army Alpha tests and the Army Beta tests are two intelligence tests
initially used in the USA's military during World War I, but with many
critical issues.
Brody, N. (1992). Intelligence. San Diego, CA: Academic Press.
Das, J. P. (1973). Cultural deprivation and cognitive competence. In Ellis, N. R.
(Ed.), International Review of Research in Mental Retardation. New York:
Academic Press.
Guilford, J. P. (1967). The nature of human intelligence. New York: McGraw-Hill.
Sternberg, R. J. (1988). The nature of creativity: Contemporary psychological
perspectives. Cambridge, ENG: Cambridge University Press.
Terman, L. M. (1916). The measurement of intelligence: An explanation of and a
complete guide for the use of the Stanford revision and extension of the
Binet-Simon intelligence scale. Boston, MA: Houghton Mifflin.
Wechsler, D. (1939). The measurement of adult intelligence. Baltimore, MD: The
Williams & Wilkins Company.
INTRODUCTION
In the previous topic, the theories of intelligence, popular intelligence testing
tools and the issues related to intelligence tests were discussed. In this topic, you
are going to learn about ability, aptitude and achievement tests. As these tests are
usually administered in groups, the issues related to group tests will be
highlighted as well. Furthermore, specific ability, aptitude and achievement tests
used in education, business and civil services settings will be introduced.
Towards the end of this topic, you will also learn about the various issues
concerning aptitude and achievement testing.
Ability tests are also known as aptitude or intelligence tests. These are
standardised batteries administered by qualified professionals that assess an
individual's overall thinking and reasoning abilities. The terms intelligence,
ability and aptitude are often used interchangeably to refer to behaviour that is
used to predict future learning or performance. However, subtle differences exist
between the terms, especially for intelligence tests.
Intelligence tests assess general intelligence. The Binet and Wechsler scales
introduced in Topic 5 are exceptionally good instruments for this. However, both
scales have limitations, one of which is that they cannot be used to assess a
person's special abilities.
Therefore, several individual tests have been created to meet special problems,
measure specific abilities or address the limitations of the Binet and Wechsler
scales (Kaplan & Saccuzzo, 2009). These are ability and aptitude tests and are
widely used in education and in particular, special education.
In this topic, both ability and aptitude tests are termed "aptitude tests" in the
discussions that follow. The primary difference between aptitude tests and
achievement tests is that
aptitude tests tend to focus more on informal learning or life experiences,
whereas achievement tests tend to focus on the learning that has occurred as a
result of relatively structured input (Cohen & Swerdlik, 2010).
ACTIVITY 6.1
In speed tests, the questions are relatively straightforward and the test is
concerned with how many questions a test taker can answer correctly within an
allotted time. In the context of business and industry application, speed tests tend
to be used in selection at the administrative and clerical levels.
A power test, on the other hand, will present a smaller number of more complex
questions. For business and industry settings, power tests tend to be used more
at the professional or managerial levels.
In Table 6.1, some of the common types of questions in aptitude and achievement
tests are explained in detail.
Source: http://www.psychometric-success.com/aptitude-tests/aptitude-tests-introduction.htm
SELF-CHECK 6.1
The same thing applies to numerical ability. Most people who have left education
for more than a few years will have forgotten certain skills such as how to
multiply fractions and calculate volumes. While it is easy to dismiss these as
"first grade" or elementary maths, most people simply do not do these things on
a daily basis. So, do not assume anything; it is better to know for sure.
Whichever type of test is given, the questions are almost always presented in
multiple-choice format and have definite correct and incorrect answers. As the
test takers proceed through the test, the questions may become more difficult and
they will usually find that there are more questions than they can comfortably
complete in the time allowed. Very few people manage to finish these tests and
the object is simply to give as many correct answers as a test taker can.
ACTIVITY 6.2
Moreover, given the nature of the format, group-administered tests can be given
to as many students as can comfortably fit into a room, which reduces test
administration time and increases testing efficiency.
Furthermore, since the examiner may be less trained in the nuances of the test (in
comparison to those who administer individual tests), the examiner may break
the standardisation and inadvertently (and inappropriately) answer students'
queries or not be able to monitor the testing environment with the same fidelity
as can be given to the individual testing environment.
Although the sample size of a group-administered test may be large, it may also
not be representative of children from a particular demographic. For example,
overseas, many group-administered cognitive and achievement tests are normed
on students who take the test in the fall and in the spring. However, many
students may choose not to take the test (when given a choice) or not be
motivated to perform their best on the test (Aiken & Groth-Marnat, 2006).
After exploring some theoretical aspects of aptitude and achievement tests and
the issues related to group tests, we will move on to examine certain specific
aptitude and achievement testing tools.
The first multiple aptitude test battery was published in 1941 and was known as
the Chicago Tests of Primary Mental Abilities. This battery was the direct
outcome of Thurstone's factor analytic investigation. Thurstone's theory of
intelligence centres on the existence of Primary Mental Abilities (PMA) and was
in direct contrast with Spearman's theory of general intelligence.
(a) Space PMA represents the ability to recognise that two shapes are the same
when one has been rotated;
(d) Induction requires establishing a rule or pattern within a given set; and
Thurstone's theory was well supported by his early research with subjects who
were University of Chicago undergraduates. It did not hold up, however, when
he tested the theory against school-aged children. Apparently, the more
intellectually elite subjects at the University of Chicago did not differ very much
in their general intelligence. Their observable differences were noted among the
PMAs. On the other hand, the grade school children were more diverse in their
general intelligence. Therefore, the differences among their PMAs were not as
notable as the differences among their general intelligence.
One of the most widely used multiple aptitude batteries is the Differential Aptitude Tests
(DAT). The DAT was first published in 1947 and later revised in 1962 and in
1974. It was developed by Bennett, Seashore and Wesman (1974). It comprises the
following eight subtests:
The GATB has been widely used in the employment service. Gradually, a
number of aptitude test batteries were developed for different purposes such as
the Flanagan Aptitude Classification Test (FACT) (Flanagan, 1964). This is a
multiple aptitude battery generally used for vocational counselling, rehabilitation
and occupational and employee selection. The resulting psychological profile is
used to determine appropriate career and training paths.
The battery involves nine different general aptitude tests involving 12 separate
subtests. These general aptitude tests are shown in Table 6.2.
When applicants apply for a job that requires several or most of the traits,
they have to complete the full GATB. Data on their
performance in the different areas are collected through the use of a composite battery.
For selection in particular areas or for particular occupations, only a part of
GATB is administered.
(a) Computation;
SELF-CHECK 6.2
1. How would you justify the usage of individual tests over group
tests?
2. Which was the first multiple aptitude test battery that came into
fruition and when?
Areas of Measuring a Candidate's Aptitude: Description
Verbal reasoning test: These tests generally involve grammar, verbal analogies and following detailed written instructions. They can also include spelling, sentence completion and comprehension.
Numerical ability test: Numerical aptitude tests are used by employers to assess one's capability to carry out tasks involving the management of numbers.
Abstract reasoning test: These tests assess the skills of a person in analysing information and solving problems on a complex, thought-based level.
Mechanical reasoning test: These tests evaluate one's understanding of simple mechanical and physical concepts.
Space relations or spatial aptitude test: The space relations test evaluates a person's capability to envisage objects in three dimensions.
Spelling test: A spelling test is an evaluation of a person's (generally a student's) capability to spell words properly.
Language usage test: The capability to utilise language is significant in any job in which communication, written or verbal, is used.
Spatial aptitude test: A spatial aptitude test assesses one's skill to manipulate shapes in two dimensions or to visualise three-dimensional objects presented as two-dimensional pictures.
Perceptual speed and accuracy test: This test evaluates the capability to work precisely with detail and at different speeds.
The DAT forms a part of almost all job aptitude tests because it assesses an
individual across all of these areas and helps him or her to decide which career
to choose. This decision is made on the basis of
the marks secured, the level of knowledge and the section that interests him or her
the most.
The verbal DAT measures the ability to find relations amongst words and
manipulate abstract ideas. The numerical DAT measures capability to interpret
numerical relationships between different figures. These two skills are required
for most jobs.
Other types of DAT include the abstract reasoning test, which measures test
takers' ability to quickly identify patterns, logical rules and trends in new data,
integrate this information and apply it to solve problems. Mechanical reasoning
tests measure the test takers' ability to understand and apply mechanical
concepts and principles to solve problems.
The speed and accuracy test measures the ability to perform a job quickly and
accurately. Then, there are some specific DATs that are required only for specific
jobs. For instance, the space relations test measures the capability to analyse
three-dimensional figures. This sort of an aptitude is a must when an individual
is looking for jobs in engineering, architecture or designing.
SELF-CHECK 6.3
(b) Consists of different subtests and a global scale for each group age (ages 3,
4 to 6 and 7 to 18); and
(c) Provides choices of non-verbal scales which also vary according to age
groups.
Kaufman and Kaufman (1983) provide a good model of the test definition
process. In proposing the Kaufman Assessment Battery for Children (K-ABC), a
new test of general intelligence in children, the authors listed six primary goals
that define the purpose of the test and distinguish it from existing measures:
(a) Measures general intelligence from a strong theoretical and research basis;
(b) Separates acquired factual knowledge from the ability to solve unfamiliar
problems;
The oral language tests, formerly in the cognitive battery, are now part of
the achievement battery. The Oral Language cluster is used as the „ability‰
score and is then compared to the achievement clusters. In this scenario, the
individual's oral language ability becomes the predictor of his or her
academic achievement.
The WJ III Tests of Achievement includes the following five oral language
tests:
(i) Reading;
(ii) Decoding;
(iv) Writing;
(vi) Math.
The Brigance Test of Basic Skills provides assessments for students ranging
from pre-kindergarten to ninth grade. The test kit contains materials that
enable teachers to maintain an accurate recording of student achievement.
The Inventory section provides test administration directions and the
sequence in which specific skills should be assessed.
There is a student record book that allows the teacher to track education
objectives, student responses and academic progress. The test also contains
student profile test booklets that archive assessments and are used as a tool
in placement decisions.
The Brigance test contains a CD that has goals for individualised education
programmes and a manual for test validation and standardisation.
Triplicate scoring sheets are included, which are used to share assessment
results with parents and service providers attending multidisciplinary team
meetings.
Figure 6.8: KeyMath 3 Diagnostic Assessment manual
Source: http://www.pearsonclinical.com/education/products/100000649/keymath3-diagnostic-assessment.html#tab-pricing
Two parallel forms (Form A and Form B) allow for test administration in
alternating sequence every three months. Growth Scale Values (GSVs)
enable educators and clinicians to measure progress accurately over time
across the full range of maths concepts and skills.
ACTIVITY 6.3
6.10.1 Education
Schools use standardised tests to determine if children are ready for school and
to track them into instructional groups; diagnose them for learning disability,
retardation and other handicaps; and decide whether to promote or retain them
in their grade. Schools also use tests to guide and control curriculum content and
teaching methods. A test must be good enough to serve as the sole or primary
basis for important educational decisions.
Readiness tests, used to determine if a child is ready for school, are very
inaccurate and encourage the use of overly academic, developmentally
inappropriate primary schooling (that is, schooling not appropriate to the child's
emotional, social or intellectual development and to the variation in children's
development).
Screening tests for disabilities are often not adequately validated; it is not proven
that they are accurately measuring for disabilities. They also promote a view of
children as having deficits to be corrected, rather than having individual
differences and strengths on which to build. While screening tests are supposed
to be used to refer children for further diagnosis, they often are used to place
children in special programmes.
Tracking hurts slower students and mostly does not help more advanced
students. Retention in grade (flunking a student or holding him or her back) is almost always
academically and emotionally harmful, not helpful. Test content is a very poor
Figure 6.9: Answer sheet of multiple-choice tests
Source: http://www.wisegeek.com/what-are-the-different-types-of-standardized-test-questions.htm
Teaching for the test also narrows the curriculum, forcing teachers and students
to concentrate on memorisation of isolated facts, instead of developing
fundamental and higher order abilities. For example, multiple-choice writing
tests are really copy-editing tests, which do not measure the ability to organise or
Tests that measure as little and as poorly as multiple-choice tests cannot provide
genuine accountability. Pressure to teach to the test distorts and narrows
education. Instead of being accountable to parents, community, teachers and
students, schools become "accountable" to a completely unregulated testing
industry.
Better methods of evaluating students' needs and progress already exist. Good
observational checklists used by trained teachers are more helpful than any
screening test. Assessment based on student performance on real learning tasks
is more useful and accurate for measuring achievement and provides more
information than multiple-choice achievement tests.
Trained teams of judges can be used to rate performance in any academic or non-
academic area. In the Olympic Games, for example, gymnasts and divers are
rated by panels of judges and the high and low scores are thrown out. Studies
have shown that, with training, the level of agreement among judges (the
"inter-rater reliability") is high. As with multiple-choice tests, it is necessary to enact
safeguards to ensure that race, class, gender, linguistic or other cultural biases do
not affect evaluation.
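The panel-scoring idea described above, in which the highest and lowest ratings are discarded before averaging, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names `panel_score` and `rating_spread` are hypothetical, the ratings are made up, and actual scoring rules differ between sports and assessment schemes. The spread is only a crude indication of how closely judges agree. It is not a formal inter-rater reliability coefficient.

```python
def panel_score(ratings):
    """Average a panel's ratings after dropping one highest and one lowest rating."""
    if len(ratings) < 3:
        raise ValueError("need at least three ratings to drop the high and low scores")
    ordered = sorted(ratings)
    kept = ordered[1:-1]  # discard the single lowest and single highest rating
    return sum(kept) / len(kept)


def rating_spread(ratings):
    """Gap between the highest and lowest rating; a rough sign of disagreement."""
    if not ratings:
        raise ValueError("no ratings supplied")
    return max(ratings) - min(ratings)


# Made-up ratings from a hypothetical five-judge panel.
ratings = [9.0, 8.5, 8.0, 9.5, 7.0]
print("panel score:", panel_score(ratings))
print("spread:", rating_spread(ratings))
```

Dropping the extreme ratings limits the influence of any single unusually harsh or lenient judge, which is the safeguard the paragraph above describes.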
The general knowledge portion of the civil service exam covers basic areas such
as arithmetic and possibly even advanced arithmetic, depending on the job.
These questions may be particularly suited to money handling or word problems
based on different jobs. Interpretation of graphs and statistics may also be a
portion of the test, especially for those going into fields that are more analytical
in nature, such as finance and government accounting jobs.
ACTIVITY 6.4
When they were young, students were given tests to gauge their
abilities. Do you think these tests were helpful in deciding their future?
In the public educational system, aptitude tests are used to score students and
determine how well certain educational approaches perform compared to others.
Regardless of the format of an actual aptitude test, practice aptitude tests come in
different forms and formats. In fact, there are businesses today that sell sample
aptitude tests to people wondering how to practise for an aptitude test.
On the other hand, an attainment test is different from a career aptitude test.
Attainment tests are meant to measure academic achievements. They are used to
predict achievement in different subjects including social studies, science and
mathematics. Attainment tests do not differ according to their application in
different cultures.
In most cases, an aptitude test may be the same as an IQ test. Owing to court
rulings, however, aptitude tests do not use the term IQ and their results are not
interpreted as IQ scores.
(a) Before test takers start taking the aptitude test, they will be given a solved
practice test paper. The test takers need to understand the requirements of
the test by going through the given test paper;
(b) After this introductory preparation, the tester will provide the test takers
with a long questionnaire, containing multiple-choice questions; and
(c) They will need to answer all the multiple-choice questions within the
provided time limit.
Test takers should not worry if they are given a large number of
questions to answer. These are given to candidates to test their capability of
handling stressful situations. Both accuracy and speed of candidates are tested
through career aptitude tests.
(f) The hidden potentials that an individual can use to perform his or her role.
SELF-CHECK 6.3
Multiple aptitude tests consist of a set of tests meant for general use, while
special aptitude tests are used for special programmes.
Aptitude tests are used in today's workplace, as well as in today's
educational system for a variety of reasons. For employers, aptitude tests are
used to screen potential job applicants to determine which employees are
naturally best suited for certain positions.
Ability test
Abstract reasoning
Achievement test
Aptitude test
Arithmetic reasoning
Attainment test
Form perception
General learning ability
Group tests
Individual tests
Multiple aptitude tests
Power test
Spatial aptitude
Speed test
Bennett, G. K., Seashore, H. G., & Wesman, A. G. (1974). Fifth edition manual for
the Differential Aptitude Tests. New York: Psychological Corporation.
What Are the Different Types of Standardized Test Questions? (2014). Retrieved
from http://www.wisegeek.com/what-are-the-different-types-of-standardized-test-questions.htm
INTRODUCTION
An attitude is a hypothetical construct that represents an individual's degree of
preference for an item. Attitudes are generally positive or negative views
towards a person, place, thing or event. These targets, on which attitudes are
projected, are often referred to as attitude objects. People can also be conflicted or
ambivalent towards an object, meaning that they simultaneously possess both
positive and negative attitudes towards the item in question. We will be
discussing attitudes, values and interests in detail in this topic.
After discussing the concepts of attitudes, values and interests, the related
psychology tests and measurement tools will be introduced. At the end of this
topic, the various issues related to the applications of psychology testing and
measurement in industrial and business settings will be discussed as well.
7.1.1 Attitudes
Attitudes are judgements. They develop on the basis of affect (A), behaviour (B)
and cognition (C), the so-called ABC model. Figure 7.1 shows the different
components of the ABC model of attitudes and also provides a description for
each component.
Most attitudes are the result of either direct experience or observational learning
from the environment. Unlike personality, attitudes are expected to change as a
function of experience.
Tesser (1993) argued that hereditary variables may affect attitudes but also
believed that they may do so indirectly. For example, consistency theories imply
that we must be consistent in our beliefs and values. The most famous example
of such a theory is the Dissonance-reduction theory, which has been introduced
in the course of Social Psychology and is associated with Leon Festinger,
although there are other theories for explaining attitudes as well, such as the
balance theory.
Attitude Change
Attitudes can be changed through persuasion and we should understand attitude
change as a response to communication. Experimental research into the factors
that can affect the persuasiveness of a message is as shown in Figure 7.2.
(i) One such trait is intelligence: it seems that more intelligent people
are less easily persuaded by one-sided messages; and
(ii) Another variable that has been studied in this category is self-esteem.
Although it is sometimes thought that those higher in self-esteem are
less easily persuaded, there is some evidence that the relationship
between self-esteem and persuasibility is actually curvilinear, with
people of moderate self-esteem being more easily persuaded than
those of both high and low self-esteem levels (Rhodes & Woods,
1992). The mind frame and mood of the target also play a role in this
process.
(i) Expertise;
Measures may include the use of physiological cues like facial expressions, vocal
changes and other body rate measures. For instance, fear is associated with raised
eyebrows, increased heart rate and increased body tension (Dillard, 1994). Other
methods include concept or network mapping and using primes or word cues.
Any discrete emotion can be used in a persuasive appeal; this may include
jealousy, disgust, indignation, fear and anger. Fear is one of the most studied
emotional appeals in communication and social influence research.
Important consequences of fear appeals and other emotional appeals include the
possibility of reactance, which may lead to either message rejection or source
rejection and the absence of attitude change. There is an optimal emotion level in
motivating attitude change. If there is not enough motivation, an attitude will not
change. If the emotional appeal is overdone, the motivation can be paralysed,
thereby preventing attitude change.
SELF-CHECK 7.1
7.1.2 Values
Do you know what values are? Let us look at the following definition of this
term.
An ongoing debate on values is whether some values that are not clearly
physiologically determined, such as altruism, are intrinsic, and whether others,
such as acquisitiveness, should be regarded as vices or virtues.
(a) Sociology;
(b) Anthropology;
SELF-CHECK 7.2
7.1.3 Interest
The word "interest" can be defined differently by different people, but how do
we define it from the perspective of psychology?
It is always better to choose a career in which one's interest is high because only
then will the person find the job interesting. His/her productivity and personal
job satisfaction will be high in such jobs. It is obvious that one will do well in an
area in which one is interested.
Today, there is a mad rush to enter into professions like management, software
and information technology services. However, after getting into such services,
many young people become bored and lose interest. This often leads to
mediocre or poor performance. It is therefore necessary to identify the areas of
interest of an individual before suggesting careers to him/her. A good
psychological test for determining an individual's interests will be
helpful for this purpose.
ACTIVITY 7.1
After exploring the three concepts of attitudes, values and interests, try
to reflect upon your own personal attitudes, values and interests to
better understand yourself.
Testing methods include direct observation of behaviour, ability tests and self-
reporting inventories of interest in educational, social, recreational and
vocational activities. The activities usually represented in interest inventories are
related to various occupational areas and these instruments and their results are
often used in vocational guidance.
Later, the SII was developed. SII is an interest inventory used in career
assessment. It is also frequently used for educational guidance as one of the most
popular career assessment tools. The test was initially developed in 1927 by
psychologist E. K. Strong, to help people exiting the military to find suitable jobs.
It was revised later by Jo-Ida Hansen and David Campbell.
For nearly 80 years, the SII assessment has guided thousands of individuals in
exploring careers and college majors. The assessment is the most respected and
widely used career-planning instrument in the world.
(a) Scores on the level of interest on each of the six Holland Codes or General
Occupational Themes. Holland Code Themes include realistic,
investigative, artistic, social, enterprising and conventional;
(b) Scores on 25 Basic Interest Scales (e.g. art, science and public speaking);
(c) Scores on 211 Occupational Scales which indicate the similarity between the
respondent's interests and those of people working in each of the 211
occupations;
(d) Scores on four Personal Style Scales (learning, working, leadership and risk
taking); and
(e) Scores on three Administrative Scales used to identify test errors or unusual
profiles.
(ii) Little;
(iii) Moderate;
SII is the most widely used and respected instrument for career exploration in the
world. For the US, the newly revised SII is a powerful tool as its content reflects
the way the people in the US work today. This includes the many changes in the
workforce, the very nature of the jobs they do and the mirroring of the US
population. In particular, the folks at CPP (Consulting Psychologists Press) are
most proud of the huge sampling size as well as the widest possible range of
demographic, racial, ethnic and socio-economic data gathered in ensuring the
highest level of validity and reliability for the SII.
At its core level, the SII is based on the idea that individuals are more satisfied
and productive when they work in jobs or at tasks that they find interesting and
when they work with people whose interests are similar to their own. Put
another way, a person's interests are compared with those of thousands of
individuals who report being happy and successful in their jobs.
To reiterate, the SII does not examine your abilities and skills; it is an inventory
of your interests. Consisting of 291 questions, the SII will ask you to indicate your
preference for a wide range of occupations, school subjects, activities and types of
people. It will take about 30 to 45 minutes to complete and its results can be
viewed online. The result is a highly personalised report, which
identifies optimum career choices based on interests. It also includes additional
related occupations with concise job descriptions.
For example, the results may tell you that your interests are similar to those of
engineers who are very satisfied with their career choice. The results, however, do
not tell you what you should be or whether you would be good at that job, for
example whether you have an aptitude for the level of mathematics involved in
this career.
SELF-CHECK 7.3
2. How does the SII indicate whether you will be good at your job or
not?
The Kuder Occupational Interest Survey (also known as KOIS and "The
Kuder") is a self-report vocational interest test used for vocational guidance
and counselling. It originated in the work of G. Frederic Kuder, who first
began publishing about the instrument in 1939.
The Kuder is often compared to other vocational interest tests, such as the Strong
Interest Inventory. While the SII compares the interests of the person to those
of certain groups of people holding certain occupations, the Kuder focuses on
measuring the person's broad areas of interest. Thus, the Kuder will yield the
person's scores along ten vocational interest scales as shown in Figure 7.6.
Internal consistency of the vocational interest scales ranges from .47 to .85, with a
median of .66. Median stability estimates over two weeks are .80 for the
vocational interest scales and .90 for the specific occupation scales. Validity
research has generally been based on "hit rates" (the proportion of scale scores which
match the actual occupations of the research participants) and factor analyses. The
Kuder has a dependability scale that signals the need for caution in interpreting the
results if there are indications that the person's interests "are not settled".
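To make these reliability and validity figures more concrete, the short Python sketch below shows one common way of estimating internal consistency (Cronbach's coefficient alpha) and a simple hit rate. It is an illustration only: the text does not state which formula was used for the Kuder figures, and the function names and the tiny data set are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Estimate coefficient alpha.

    item_scores: list of respondents, each a list of item scores
    (hypothetical data layout, assumed for this sketch).
    """
    k = len(item_scores[0])  # number of items in the scale
    # Variance of each item across respondents
    item_vars = [pvariance([resp[i] for resp in item_scores]) for i in range(k)]
    # Variance of the total scale score across respondents
    total_var = pvariance([sum(resp) for resp in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def hit_rate(predicted, actual):
    """Proportion of people whose top-scoring scale matches their actual occupation."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


# Three respondents answering three perfectly consistent items
responses = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha = cronbach_alpha(responses)

# Four people: occupation suggested by the inventory vs. occupation actually held
rate = hit_rate(
    ["engineer", "nurse", "teacher", "artist"],
    ["engineer", "nurse", "artist", "artist"],
)
```

In this made-up example the items agree perfectly, so alpha reaches its maximum, while three of the four suggested occupations match the occupations actually held. Real scales, such as those reported above, usually fall well below a perfect value.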
Figure 7.7: Groups of people who can benefit from the Kuder Test Survey
How is it possible that the Kuder Journey can be beneficial in all these ways? It is
possible because the Kuder Journey helps you to answer the following questions
in Figure 7.9, which will then enable you to achieve the Kuder Journey benefits.
(b) Find a cluster of careers that match your skills, interests, abilities and
values.
(viii) Trends
(xi) Training
Key areas measured in CAI include the following as shown in Table 7.1.
(a) Provides scales for 111 occupations requiring varying amounts of post-
secondary education;
(c) Graphic and narrative test reports can be shared with the client and the
narrative report provides a three-page counsellor's summary;
(d) Combined gender scales allow for the broadest interpretation of survey
results; and
(e) The inventory closely matches the distribution of professional and non-
professional jobs in the labour force, making it well-suited for assessing
groups with a variety of career aspirations (e.g., complete high school
populations).
(a) Teach students to focus on their patterns of interest that are important in
making educational and occupational choices;
(b) Help high school and college students identify career directions and major
areas of study; and
(c) Advise individuals who are re-entering the workforce, considering a career
change, or who have been displaced.
(a) Guidance counsellors to help students and adults develop their career and
study plans; and
SELF-CHECK 7.4
You will have access to links, resources and industry contacts to help you learn
more about the careers and university majors which will in turn help you to
make the most of your time and talent. It is well suited for people whose career
path includes a four-year university degree. It is also appropriate for individuals
considering a mid-career change.
The JVIS was written by Dr. Douglas Jackson (refer to Figure 7.10), the same
psychologist who developed the intelligence test used to screen NASA
astronauts. He is a world authority on the subject of human assessment and
among other honours, was the President of the American Psychological
Association Division of Measurement, Evaluation and Statistics. The JVIS is one
of the most carefully and elaborately constructed psychological instruments ever
created.
7.5.1 Applications of JVIS
The JVIS is applicable under the following categories:
The JVIS assesses work roles (for example, engineering) and work environment
preferences (for example, job stability), as well as measures potential academic
satisfaction. Detailed reports provide links, resources and industry contacts to
help individuals learn more about their highest ranked careers and university
majors.
SELF-CHECK 7.5
(i) A manual;
(i) The JVIS Extended Report includes the basic interest profile, a profile
for 10 general occupational themes, a profile of similarity to 17
educational major field clusters, a ranking of 32 occupational group
clusters, validity scales, an academic satisfaction score and other
information. A narrative summary of the three highest-ranked
educational and occupational clusters is particularly useful. Finally, a
section titled "Where to go from here" offers information on related
career exploration books and activities; and
(ii) The JVIS Basic Report contains the basic interest scales profile and
data similar to the Extended Report but with pre-printed interpretive
information rather than the personalised narrative summaries.
(c) Software
The SigmaSoft JVIS for Windows software allows you to administer and
score the JVIS on a computer. The JVIS for Windows software produces three
types of reports:
(i) The Extended Report is similar to that of the Mail-in Scoring service;
(ii) The Basic Report contains all of the profiles found in the Extended
Report, but does not provide explanatory text and career information;
and
(iii) The Data Report contains the scores found in the Basic Report in a
format designed for use by other programs.
Source: www.sigmaassessmentsystems.com/assessments/jvis.asp
(d) Internet
The JVIS is available online in two formats, which are:
(i) SigmaTesting.Com: The main testing site, which gives the counsellor
complete control over administration and report handling; and
(ii) JVIS.Com: The career site which offers a more self-driven approach,
with an online report linked to numerous online career resources.
SELF-CHECK 7.6
3. What are the key features that are examined by the Kuder Interest
Inventory?
4. How can you determine the reliability and validity of the CAI?
In order to achieve this, it is very important to select employees carefully according to
the job requirements. Therefore, psychological tests of attitudes, interests and
values, together with other relevant employment tests and assessments, are
useful in helping industrial and business organisations to select suitable
employees.
SELF-CHECK 7.7
The assessment centre typically consists of exercises that reflect job content and
types of problems faced on the job. For example, individuals might be evaluated
on their ability to make a sales presentation or on their behaviour in a simulated
meeting.
The assessment centre typically uses multiple raters who are trained to observe,
classify and evaluate behaviour. At the end of the assessment, the raters meet to
make overall judgements about the performance of the participants in the centre.
These tests typically ask direct questions about previous experiences related to
ethics and integrity or ask questions about preferences and interests from which
inferences are drawn about future behaviour in these areas. Integrity tests are
used to identify individuals who are likely to engage in inappropriate, dishonest
and anti-social behaviour at work.
7.6.7 Interviews
Interviews vary greatly in their content, but are often used to assess interpersonal
skills, communication skills and teamwork skills as well as to assess job
knowledge.
Personality tests are often used to assess whether individuals have the potential
to be successful in jobs where performance requires a great deal of interpersonal
interaction or work in team settings.
(a) Strength;
(c) Speed.
ACTIVITY 7.2
There are various employment tests and measurement used in business
and industrial settings. By doing additional readings and discussing
with your face-to-face and online tutors, identify the advantages and
disadvantages of various employment tests and measurement methods
that can be used in organisations.
SELF-CHECK 7.8
Attitudes are judgements. They develop from three components, namely affect (A),
behaviour (B) and cognition (C), in what is known as the ABC model.
Since 1939, for 65 years, the Kuder Test Survey has helped millions of youths and
adults worldwide discover their interests, skills and work values.
The JVIS manual and several research studies provide strong support for the
reliability and validity of this carefully constructed assessment.
Academic satisfaction      Investigative
Artistic                   Job knowledge tests
Attitudes                  Learning organisations
Biographical data          Message characteristics
Cluster                    Occupational interest
Conventional               Realistic
Enterprising               Social
Holland Codes              Source characteristics
Hypothetical construct     Target characteristics
Integrity tests            Value
Interest                   Vocational guidance
Inventory                  Work sample and simulation exercise
Cohen, R. J., & Swerdlik, M. E. (2010). Psychological testing and assessment: An
introduction to tests and measurement (7th ed.). New York: McGraw-Hill
Higher Education.
Hovland, C. I., & Weiss, W. (1951). The influence of source credibility on
communication effectiveness. The Public Opinion Quarterly, 15(4), 635–650.
Rhodes, N., & Wood, W. (1992). Self-esteem and intelligence affect
influenceability: The mediating role of message reception. Psychological
Bulletin, 111(1), 156–171.
LEARNING OUTCOMES
By the end of this topic, you should be able to:
1. Explain the concepts of personality from the perspective of
psychology test and measurement;
2. State the development of personality testing;
3. Identify the objectives of personality testing; and
4. Describe projective personality tests.
INTRODUCTION
Personality tests are the most popular tests in psychology. This is because almost
all people are interested in knowing what type of personality they have. In this
topic, the concept of personality from the perspective of psychology tests and
measurement will be introduced. Furthermore, a few popular personality tests
which are either based on objective methods or projective methods will also be
discussed.
We also commonly use the term "self-concept", which is related to personality and
refers to a person's self-definition, or "an organised and relatively consistent set
of assumptions that a person has about himself or herself".
Many personality tests after that were developed based on the needs of
society. For instance, during the World War, there was a need to select
individuals for military service and this led to the construction of tests which could
predict whether a recruit would be able to adjust to military life or not.
Two major approaches in the development of personality tests can be seen in this
early stage:
The first projective test developed by Murray and his associates was called the
Thematic Apperception Test and it quickly became popular. Another well-known
projective test is the Rorschach Inkblot Test constructed by Hermann Rorschach.
These projective tests differ from objective tests in that they use unstructured
and ambiguous stimuli.
The Development of Personality Tests    Description
Content of the test The California Psychological Inventory (CPI) was developed by
Harrison Gough in 1957. This inventory measures the normal
personality of adolescents and adults.
The third revision has eliminated several items that were
considered objectionable, or that were considered to violate
privacy considerations, or that were in conflict with the recent
legislation dealing with the rights of the disabled (Gough &
Bradley, 1996).
The CPI can be administered and scored individually or in a
group setting and can be answered in about an hour. Scoring
can be done by counting the number of items endorsed on each
scale and plotting the raw scores on a profile. The scores are
then converted to T-scores.
The items on the CPI are grouped into 20 scales which are as
shown below:
Achievement via independence   Dominance             Independence     Well-being
Intellectual efficiency        Sociability           Empathy          Communality
Psychological mindedness       Capacity for status   Responsibility   Tolerance
Flexibility                    Social presence       Socialisation    Achievement via conformance
Femininity/masculinity         Self-acceptance       Self-control     Good impression
Psychometric properties    The CPI has norms based on a large sample of 6,000
respondents and this provides information on its validity and reliability. Research
on the CPI (Megargee, 1972) has established that it is extremely
useful in predicting underachievement in academic settings and
potential delinquency.
There is also evidence that the CPI can predict performance on the
job and in school. Deniston and Ramanaiah
(1993) reported that the CPI had factor loadings on four
(extroversion, openness, neuroticism and conscientiousness) of
the five factors comprising the five-factor model of personality
but did not show significant loadings on agreeableness.
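The scoring procedure described for the CPI (counting endorsed items on a scale, then converting the raw score to a T-score) can be sketched in a few lines of Python. This is a minimal illustration and not the publisher's scoring key: the item numbers, the norm mean and the norm standard deviation are hypothetical. It relies on the usual convention that T-scores have a mean of 50 and a standard deviation of 10.

```python
def raw_scale_score(responses, scale_items):
    """Count how many of a scale's items the respondent endorsed.

    responses: dict mapping item number -> True/False (hypothetical layout).
    scale_items: item numbers belonging to one scale (hypothetical key).
    """
    return sum(1 for item in scale_items if responses.get(item, False))


def t_score(raw, norm_mean, norm_sd):
    """Convert a raw score to a T-score (mean 50, standard deviation 10)."""
    z = (raw - norm_mean) / norm_sd  # distance from the norm mean in SD units
    return 50 + 10 * z


# A respondent endorses items 1, 3 and 4; the made-up scale uses items 1, 2 and 3
answers = {1: True, 2: False, 3: True, 4: True}
raw = raw_scale_score(answers, scale_items=[1, 2, 3])

# A raw score of 30 against assumed norms (mean 25, SD 5) is one SD above average
example_t = t_score(30, norm_mean=25, norm_sd=5)
```

Because every scale is expressed on the same T-score metric, scores from scales with different numbers of items can be plotted on one profile and compared directly.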
Table 8.2: The Content and Psychometric Properties of Personality Research Form (PRF)
The Development of Personality Tests    Description
Content of the test The Personality Research Form (PRF) was developed by
Douglas Jackson in 1967. This test was developed by using the
theoretical framework of Henry Murray and his colleagues at
the Harvard Psychological Clinic (Murray, 1938) which
measures dimensions of normal personality.
There are two forms of the PRF:
(1) The short forms (Forms A and B) comprise 300 items
measuring 14 personality dimensions and one validity scale; and
(2) The long forms (Forms AA and BB) comprise 440 items
measuring 20 personality dimensions and two validity scales.
Another form is the PRF (Form-E) that comprises 352 items
measuring 20 dimensions and two validity scales and is similar
to Forms AA and BB.
The personality dimensions are interpreted using the bipolar
method, meaning that a low score on any scale indicates not only the
absence of the trait but also the presence of its opposite.
Table 8.3: The Content and Psychometric Properties of Sixteen Personality Factor
Questionnaire (16PF)
The Development of Personality Tests    Description
Content of the test The Sixteen Personality Factor Questionnaire was developed by
Raymond B. Cattell in 1949. The test measures normal personality
and comprises all the characteristics and attributes of normal
adults. Cattell began by conducting a survey of all the words in
the English language which described normal personality
characteristics. Building on the word list compiled by Allport and Odbert (1936), he
arrived at approximately 4,000 English adjectives that described
personality characteristics.
Using the method of factor analysis, he reduced these words
into 15 factors, which were simply labelled A through O. Other
factors considered relevant were added and were given the labels
Q1, Q2, Q3 and Q4.
The latest edition is the 16PF Fifth Edition (1993), which comprises
185 items and uses a three-point Likert scale. These items are
grouped into 16 primary factor scales representing the dimensions
of personality initially identified by Cattell. The raw scores are
converted into standard scores known as stens (area
transformation scores on a standard 10 base).
The 16 scales in the 16PF are as follows:
Warmth                Liveliness            Vigilance         Tension
Reasoning             Rule-consciousness    Abstractedness    Openness to change
Emotional stability   Social boldness       Privateness       Self-reliance
Dominance             Sensitivity           Apprehension      Perfectionism
Psychometric properties    The reliability and validity of the 16PF are reported in numerous
studies (Conn & Rieke, 1994; Russell & Karol, 1994). In addition,
evidence presented by R. B. Cattell and H. E. P. Cattell (1995) has strongly
supported its proposed factor structure.
The 16PF also has norms for high school, college and adult
populations. It can also be used in personnel selection and
placement and can measure workers' leadership potential,
decision-making ability and personal initiative.
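The idea of a sten as an "area transformation" score can be illustrated with the Python sketch below. It assumes one conventional approach: a raw score is first turned into a percentile rank within a norm group, the percentile is mapped to a normal-curve z value, and the z value is placed on a 10-point scale with a mean of 5.5 and a standard deviation of 2. The norm scores and function names are hypothetical, and the actual 16PF conversion tables are not reproduced here.

```python
from statistics import NormalDist


def percentile_rank(raw, norm_scores):
    """Proportion of the norm group scoring below raw (ties count as half)."""
    below = sum(s < raw for s in norm_scores)
    equal = sum(s == raw for s in norm_scores)
    return (below + 0.5 * equal) / len(norm_scores)


def sten_from_percentile(percentile):
    """Map a percentile (strictly between 0 and 1) onto the 1-10 sten scale."""
    z = NormalDist().inv_cdf(percentile)  # normal-curve equivalent of the area
    sten = round(5.5 + 2 * z)             # stens: mean 5.5, SD 2
    return max(1, min(10, sten))          # keep the result within 1 to 10


# Hypothetical norm group of five raw scores
norm_group = [8, 10, 12, 14, 16]
pr = percentile_rank(14, norm_group)
sten = sten_from_percentile(pr)
```

Working from percentiles rather than directly from raw scores means that the resulting stens follow the normal curve even when the raw scores in the norm group are skewed.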
The following explains the content of the test and its psychometric
properties:
(i) Neuroticism;
(iii) Openness.
Each of the five personality dimensions of the NEO-PI-R has six specific
facets, as shown in Table 8.4.
Personality Dimensions of the NEO-PI-R    Description
Neuroticism Anxiety, hostility, depression, self-consciousness,
impulsiveness, and vulnerability.
Extraversion Warmth, gregariousness, assertiveness, activity,
excitement-seeking and positive emotions.
Openness Fantasy, aesthetics, feelings, actions, ideas and
values.
Agreeableness Trust, straightforwardness, altruism, compliance,
modesty and tender-mindedness.
Conscientiousness Competence, order, dutifulness, achievement
striving, self-discipline, and deliberation.
The internal consistency reliability coefficients of the facets range from .56
to .90 (Aiken, 2003).
Test-retest reliability over a six-month period ranges from .86 to .91 for the
five dimensions and from .56 to .90 for the facet scales (Aiken, 2003).
ACTIVITY 8.1
1. List some of the personality tests that you have taken until now.
What personality type do you have?
2. Name two other examples of personality tests apart from the tests
based on objective methods listed in this topic.
This form of testing is based on the assumption that when individuals try to
understand ambiguous stimuli, their interpretation of the stimuli will reflect their
emotions, experience, thinking and needs. In other words, the ambiguous stimuli
eliminate or reduce defensiveness and other conscious efforts to
skew test results.
Apart from that, although what subjects see reflects their personal
characteristics, some responses may unconsciously expose their hidden
personalities. Therefore, projective tests are considered sensitive in detecting
hidden personality characteristics or thoughts from the unconscious mind.
The following explains the content of the test and its psychometric properties:
(ii) Two cards with black, grey and red inkblots; and
Figure 8.2 shows the first of the ten cards in the Rorschach inkblot test.
This test is an individual test presented with minimum structure, which
means there are no particular instructions on how to respond.
The test is administered by presenting the cards twice. The first presentation
is called the free association phase. The tester records the length of time taken
by subjects to give responses and also the location of the card when the
response is made. The next phase is called the inquiry phase.
Table 8.5: The Tester's Records According to a Subject's Score
The TAT is more structured and clearer than the Rorschach Inkblot Test. It
consists of 30 picture cards and one blank card that provide stimuli for
respondents to create stories on relationships or social situations as
suggested by the pictures. Some cards are meant for male respondents
while others are for female respondents.
The administration of the TAT is similar to that of the Rorschach in that it
is ambiguous and not standardised. The tester has to record the subject's
responses verbatim and also take note of the reaction time. Table 8.6 states
the five aspects in the interpretation of TAT.
Five Aspects in Interpretation of TAT    Description
Hero The character in the picture with whom the subject
identifies.
Needs The desires and motives of the hero or heroine in the
story.
Press Environmental influences that disturb or ease the
achievement of the subject's desires and needs.
Themes The theme of the story such as depression.
Outcomes The conclusion of the story such as failure.
The DAP requires the subject to draw a picture of himself or herself and
a picture of another person of the opposite gender. After the picture is
completed, the subject is required to explain the picture drawn including
age, occupation and family relations.
SELF-CHECK 8.1
1. Discuss the meaning of human drawings among children.
The two major approaches in the development of personality tests that can be
seen at the early stage are:
– Projective tests.
Projective tests are tests that have unstructured, ambiguous items, statements
or questions.
Projective tests are considered sensitive in detecting hidden personality
characteristics or whatever lies in the unconscious mind.
The NEO-PI-R measures personality traits according to the five factor model of
personality.
The most popular projective personality tests are the Rorschach Inkblot Test
and Thematic Apperception Test (TAT).
Facets of personality               Personality dimensions
Factor analysis                     Personality states
Five factor model of personality    Personality traits
Human drawings                      Personality types
Inkblot test                        Projective personality tests
Interpretation process              Psychopathology
Normal personality                  Standard scores
Objective personality tests         Thematic Apperception Test
Aiken, L. R. (2003). Psychological testing and assessment. Boston, MA: Allyn and
Bacon.
Chaplin, W. F., John, O. P., & Goldberg, L. R. (1988). Conceptions of state and
traits: Dimensional attributes with ideals as prototypes. Journal of
Personality and Social Psychology, 54(4), 541–557.
Conn, S. R., & Rieke, M. L. (1994). The 16PF Fifth Edition technical manual.
Champaign, IL: Institute for Personality and Ability Testing.
Morgan, C. D., & Murray, H. A. (1935). A method for investigating fantasies: The
thematic apperception test. Archives of Neurology and Psychiatry, 34(2),
289.
Motta, R. W., Little, S. G., & Tobin, M. I. (1993). The use and abuse of human
figure drawings. School Psychology Quarterly, 8(3), 162–169.
Russell, M., & Karol, D. (1994). The 16PF Fifth Edition administrator's manual.
Champaign, IL: Institute for Personality and Ability Testing.
INTRODUCTION
In the previous topic, you were introduced to personality testing in psychology.
You learnt about the development and objectives of personality tests. In this
topic, discussions will focus on the applications of psychology test and
measurement in more specific fields of psychology namely clinical, health and
counselling psychology.
Counsellors use tests generally for assessment, placement and guidance, as well
as to assist clients to enhance their self-knowledge, practise decision-making and
acquire new behaviours.
(a) Individuals;
Informational uses include gathering data on clients, assessing the level of
some traits such as stress and anxiety and measuring the clients' personality
types. The purpose of non-informational tests is to stimulate further interaction
with the client.
The five main steps in the testing process in counselling are explained further as
follows:
(a) Selecting
Having defined the purpose for testing, the counsellor will look to a variety
of sources for information on available tests for the purpose determined.
Resources include review books, journals, test manuals and textbooks on
testing and measurement (Anastasi, 1988; Cronbach, 1979). The most
complete source of information on a particular test is usually the test
manual.
(b) Administering
Test administration is usually standardised by the developers of the test.
Manual instructions need to be followed in order to make a valid
comparison of an individual's score with the test's norm group.
(c) Scoring
Scoring of tests follows the instructions provided in the test manual. The
counsellor is sometimes given the option of having the test machine scored
rather than hand scored. Both the positive and negative aspects of this
choice need to be considered. It is usually believed that test scoring is best
handled by a machine as this will make it free from bias.
(d) Interpreting
The interpretation of test results is usually the area which allows for the
greatest flexibility within the testing process. Depending upon the
counsellor's theoretical point of view and the extent of the test manual
guidelines, interpretation may be brief and superficial, or detailed and
explicitly theory based (Tinsley & Bradley, 1986).
As this area allows for the greatest flexibility, it is also the area with the
greatest danger of misuse. While scoring is best done by a bias-free
machine, interpretation by machine is often too rigid. What is needed is the
experience of a skilled test user to individualise the interpretation of results.
(e) Communicating
Here, the therapeutic skills of counsellors come fully into play (Phelps,
1974). The counsellor will use verbal and non-verbal interaction skills to
convey messages to clients and to assess their understanding.
ACTIVITY 9.1
How many people can be considered healthy based on that definition? This
definition of what it means to be healthy probably creates an unrealistic goal for
a vast majority of people.
A century ago, contagious and infectious diseases like smallpox, rubella and
influenza were much bigger killers than they are today. Nowadays, more deaths
are caused by heart diseases, cancer and strokes. While advances in medical
science have made a big difference, our lifestyle choices have also contributed to
this changing trend.
The Greek philosopher, Plato, believed that "where temperance is, there health is
speedily imparted". Plato has been proven right by health psychologists.
Research shows that moderation in all things is the key to a long and healthy life.
Seven healthy lifestyle habits from a Western perspective have been identified
and are shown in Figure 9.2.
A group of people were studied over a 25-year period. Those who followed all
the seven healthy habits previously mentioned had significantly lower mortality
rates than those who followed fewer than three.
People are now more aware of what is good for them and their health. However,
knowledge by itself does not lead to changes in behaviour. Even when we are ill
and have been prescribed medicine, many of us do not follow our doctor's
advice.
Research has shown that people are more likely to be compliant if the doctor
adopts a friendly approach, communicates well with the patients and provides
them with information about their condition and its treatment.
Even the waiting time to see the doctor can affect how compliant people are.
Many people who are made to wait for more than 30 minutes to see the doctor
will be reluctant to follow their doctor's advice. In one study, only 31% of
people who had waited longer than this complied with their treatment. In contrast,
67% of people who were kept waiting for less than 30 minutes were quite happy to
follow their doctor's orders.
Table 9.1 further explains the aspects measured in health psychology and
discusses some of the tests used in each of the aspects.
ACTIVITY 9.2
(b) Coping;
(d) Pain.
9.3.1 Neuropsychology
Do you know what neuropsychology is? Let us read its definition.
Neuropsychologists study the brain and its many different disorders. Some of
these include the following conditions:
(b) How the HIV virus changes brain functioning and leads to problems in
memory;
(f) Malingering;
(d) Language;
Source:
http://en.wikibooks.org/wiki/Psychological_Testing/Testing_in_Health_Psychology
SELF-CHECK 9.1
1. What ability can be tested using the Wisconsin Card Sorting test?
Since clinical psychology deals with behaviour and mental pathology, it is first
important to understand the concept of psychopathology. After that,
personality tests on psychopathology and measures of psychological and
mental disorders commonly used in clinical psychology will be highlighted.
9.4.1 Psychopathology
Psychopathology is the study of mental illness. A mental disorder or mental
illness is a psychological or behavioural pattern associated with distress or
disability that occurs in an individual and is not a part of normal development.
The term is most commonly used within psychiatry. Psychiatry is the branch of
medicine that deals with the diagnosis, treatment and prevention of mental and
emotional disorders, whereas pathology refers to disease processes.
SELF-CHECK 9.2
1. Define psychopathology.
2. Differentiate between psychiatry and psychopathology.
3. How does hallucination affect human behaviour?
The standardised answer sheets can be hand scored with templates that fit over
the answer sheets, but most tests are computer scored.
Computer scoring programs for the current standardised version, the MMPI-2,
are licensed by the University of Minnesota Press to Pearson Assessments and
other companies located in different countries. The computer scoring programs
offer a range of scoring profile choices including the extended score report,
which includes data on the newest and most psychometrically advanced scales,
the Restructured Clinical scales (RC scales).
The extended score report also provides scores on the more traditionally used
clinical scales as well as content, supplementary and other subscales of potential
interest to clinicians.
The use of the MMPI is tightly controlled for ethical and financial reasons. The
clinician using the MMPI has to pay for materials and for scoring and report
services, as well as for installing the computerised program. The most historically
significant developmental changes for MMPI include:
(a) MMPI
The original MMPI was developed in 1939 (Groth-Marnat, 2009) using an
empirical keying approach, which means that the clinical scales were
derived by selecting items that were endorsed by patients known to have
been diagnosed with certain pathologies.
The difference between this approach and other test development strategies
used around the time was that it was atheoretical (not based on any
particular theory) and thus, the initial test was not aligned with the
prevailing psychodynamic theories of the time.
However, because the MMPI scales were created based on a group with
known psychopathologies, the scales themselves were not entirely atheoretical:
the participants' clinical diagnoses were used to determine the content of
the scales.
(b) MMPI-2
The first major revision of the MMPI was the MMPI-2, which was
standardised based on a new national sample of adults in the United States
and released in 1989. It is appropriate for use with adults aged 18 and over.
The current MMPI-2 has 567 items, all in true-or-false format, and usually
takes between one and two hours to complete, depending on participants'
reading level.
(c) MMPI-A
A version of the test designed for adolescents, the MMPI-A, was released in
1992. The MMPI-A has 478 items, with a short form of 350 items.
(d) MMPI-2 RF
A new and psychometrically improved version of the MMPI-2 has recently
been developed, employing rigorous statistical methods that were used to
develop the restructured clinical (RC) scales in 2003. The new MMPI-2
Restructured Form (MMPI-2-RF) has now been released by Pearson
Assessments.
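The empirical keying approach described for the original MMPI can be illustrated with a small Python sketch. It is a simplified illustration under stated assumptions and not the procedure actually used by the test developers: it assumes that each item is answered true (1) or false (0), that a diagnosed (criterion) group is compared with a comparison group, and that an item is kept when the two endorsement rates differ by at least a chosen margin. The function names, the data and the 0.25 margin are all hypothetical.

```python
def endorsement_rate(group, item):
    """Proportion of people in a group who answered 'true' (1) to an item."""
    return sum(resp[item] for resp in group) / len(group)


def empirically_keyed_items(criterion_group, comparison_group, n_items, min_diff=0.25):
    """Select items whose endorsement rates separate the two groups.

    Returns (item index, keyed direction) pairs. No theory about item
    content is used; only the observed difference between groups matters.
    """
    keyed = []
    for item in range(n_items):
        diff = endorsement_rate(criterion_group, item) - endorsement_rate(
            comparison_group, item
        )
        if abs(diff) >= min_diff:
            keyed.append((item, "true" if diff > 0 else "false"))
    return keyed


# Made-up answers to three items from four diagnosed patients ...
patients = [[1, 0, 1], [1, 0, 0], [1, 1, 1], [1, 0, 1]]
# ... and from four people in a comparison group
comparison = [[0, 0, 1], [0, 1, 1], [1, 0, 1], [0, 0, 0]]

scale = empirically_keyed_items(patients, comparison, n_items=3)
```

In this made-up data only the first item separates the two groups, so it is the only item that would be kept for the scale, regardless of what the item appears to ask about.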
SELF-CHECK 9.3
1. What is the basic concept of MMPI?
2. Why did MMPI-2 come into the picture?
3. What is the basis for the origins of MMPI-2-RF?
Source: http://sevencounties.org/poc/view_doc.php?type=doc&id=8214&cn=18
No.  Abbreviation  Description             What is Measured                                          No. of Items
1.   Hs            Hypochondriasis         Concern with bodily symptoms                              32
2.   D             Depression              Depressive symptoms                                       57
3.   Hy            Hysteria                Awareness of problems and vulnerabilities                 60
4.   Pd            Psychopathic Deviate    Conflict, struggle, anger, respect for society's rules    50
5.   MF            Masculinity/Femininity  Stereotypical masculine or feminine interests/behaviours  56
6.   Pa            Paranoia                Level of trust, suspiciousness, sensitivity               40
7.   Pt            Psychasthenia           Worry, anxiety, tension, doubts, obsessiveness            48
8.   Sc            Schizophrenia           Odd thinking and social alienation                        78
9.   Ma            Hypomania               Level of excitability                                     46
10.  Si            Social Introversion     People orientation                                        69
Source:
http://en.wikipedia.org/wiki/Minnesota_Multiphasic_Personality_Inventory
Abbreviation  New in Version  Description                      Assesses
CNS           1               "Cannot Say"                     Questions not answered
L             1               Lie                              Client "faking good"
F             1               Infrequency                      Client "faking bad" (in first half of test)
K             1               Defensiveness                    Denial/Evasiveness
Fb            2               Back F                           Client "faking bad" (in last half of test)
VRIN          2               Variable Response Inconsistency  Answering similar/opposite question pairs inconsistently
TRIN          2               True Response Inconsistency      Answering questions all true/all false
F-K           2               F minus K                        Honesty of test responses/not faking good or bad
S             2               Superlative Self-Presentation    Improving upon K scale, "appearing excessively good"
Fp            2               F-Psychopathology                Frequency of presentation in clinical setting
Fs            2 RF            Infrequent Somatic Response      Over-reporting of somatic symptoms
Source:
http://en.wikipedia.org/wiki/Minnesota_Multiphasic_Personality_Inventory
Dozens of content scales currently exist, some samples of which are shown
in Table 9.6.
Abbreviation Description
Es Ego Strength Scale
OH Over-Controlled Hostility Scale
MAC MacAndrews Alcoholism Scale
MAC-R MacAndrews Alcoholism Scale Revised
Do Dominance Scale
APS Addictions Potential Scale
AAS Addictions Acknowledgement Scale
SOD Social Discomfort Scale
A Anxiety Scale
R Repression Scale
TPA Type A Scale
MDS Marital Distress Scale
Source:
http://en.wikipedia.org/wiki/Minnesota_Multiphasic_Personality_Inventory
The five factor model of human personality has gained great acceptance
amongst non-pathological populations. The PSY-5 scales differ from the
five factors identified in non-pathological populations in that they were
meant to determine the extent to which personality disorders might
manifest and be recognisable in clinical populations. The five components
were labelled as:
Raw scores on the scales are transformed into a standardised metric known as T-
scores (mean or average equals 50, standard deviation equals 10), making
interpretation easier for clinicians. Test manufacturers and publishers ask test
purchasers to prove they are qualified to purchase the MMPI/MMPI-2/MMPI-2-
RF and other tests (Sevencounties.org, 2014).
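As a rough illustration of the T-score metric described above, the following Python sketch applies the standard linear transformation, T = 50 + 10 x (raw score - normative mean) / normative standard deviation. The function name and the normative values are hypothetical and chosen only for illustration; they are not taken from any MMPI manual, and published scoring procedures may apply further adjustments that this sketch does not attempt to reproduce.

```python
def t_score(raw, norm_mean, norm_sd):
    """Convert a raw scale score to a linear T-score (mean 50, SD 10)."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    z = (raw - norm_mean) / norm_sd  # standard (z) score
    return 50 + 10 * z


# Hypothetical norms for one scale: mean 18, standard deviation 4.
example = t_score(raw=24, norm_mean=18, norm_sd=4)
print(example)  # 65.0
```

A raw score that is 1.5 standard deviations above the normative mean therefore maps to a T-score of 65, whatever the original raw-score range of the scale, which is what makes profiles comparable across scales.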
SELF-CHECK 9.4
What do the scales in MMPI denote?
The Millon Clinical Multiaxial Inventory-III, Third Edition (MCMI-III) (2009) has
new norms and updated scoring.
Each generation of the MCMI inventory has attempted to keep the total number
of items small enough to encourage its use in all types of diagnostic and
treatment settings. Yet, it is kept large enough to permit the assessment of a wide
range of clinically relevant multiaxial behaviours.
At 175 items, the MCMI inventory is much shorter than comparable instruments.
Terminology is geared to an eighth-grade reading level. The inventory is almost
self-administering. A great majority of patients can complete the MCMI-III™ in
20 to 30 minutes, facilitating relatively simple and rapid administrations while
minimising patient resistance and fatigue.
The Profile Report presents the patientÊs MCMI scores and profile and is
useful as a screening device to identify patients that may require more
intensive evaluation or professional attention.
(e) Research
Over 600 research studies have used the MCMI inventory in a significant
manner. Objective, quantified and theory-grounded individual scale scores
and profile patterns can be used to generate and test a variety of clinical,
experimental and demographic hypotheses. Research support is also
available through Pearson Assessments.
(f) Scales
The current version, the MCMI-III, is composed of 175 items that are scored
to produce 28 scales divided into the following categories (Groth-Marnat,
2009):
Table 9.7 provides a clearer outline of the respective scale categories, their
name and the number of relevant items in measuring each scale.
Self-Defeating
Clinical Syndromes
Anxiety 14
Somatoform 12
Bipolar: Manic 13
Dysthymia 14
Alcohol Dependence 15
Drug Dependence 14
Post-traumatic Stress Disorder 16
Severe Syndromes
Thought Disorder 17
Major Depression 17
Delusional Disorder 13
Scale descriptions and detailed data on test development and validation can
be obtained from the latest (2006) MCMI-III test manual.
SELF-CHECK 9.5
1. What is the basic use of MCMI?
DSM can be used clinically and also to categorise patients using diagnostic
criteria for research purposes. Studies done on specific disorders often recruit
patients whose symptoms match the criteria listed in the DSM for that disorder.
An international survey of psychiatrists in 66 countries comparing the use of the
ICD-10 and DSM-IV found the former was more often used for clinical diagnosis
while the latter was more valued for research.
The first official attempt was the 1840 census which used a single category,
"idiocy/insanity". The 1880 census distinguished among seven categories of
mental illness, which are listed in Figure 9.3.
Figure 9.4 shows the seven different DSMs, from the earliest to the latest ones.
The following are detailed descriptions of the seven different DSMs, from the
earliest to the latest:
The foreword to the DSM-I states that the US Navy had itself made some
minor revisions but "the Army established a much more sweeping revision,
abandoning the basic outline of the Standard and attempting to express
present day concepts of mental disturbance. This nomenclature eventually
was adopted by all Armed Forces", and "assorted modifications of the
Armed Forces nomenclature [were] introduced into many clinics and
hospitals by psychiatrists returning from military duty." The Veterans
Administration also adopted a slightly modified version of Medical 203.
In 1949, the World Health Organisation published the sixth revision of the
International Statistical Classification of Diseases (ICD) which included a
section on mental disorders for the first time. The foreword to DSM-I states
this "categorised mental disorders in rubrics similar to those of the Armed
Forces nomenclature."
46% replied, of which 93% approved and after further revisions (resulting
in it being called DSM-I), the Diagnostic and Statistical Manual of Mental
Disorders was approved in 1951 and published in 1952. Its structure and
conceptual framework were the same as in Medical 203 and many passages
of the text were identical. The manual was 130 pages long and listed 106
mental disorders.
The term "reaction" was dropped from it but the term "neurosis" was
retained. Both DSM-I and DSM-II reflected the predominant psychodynamic
psychiatry, although they also included biological perspectives and concepts
from Kraepelin's system of classification.
The revision took on a far wider mandate under the influence and control
of Spitzer and his chosen committee members. One goal was to improve the
uniformity and validity of psychiatric diagnosis in the wake of a number of
critiques, including the famous Rosenhan experiment. There was also a
need to standardise diagnostic practices within the US and with other
countries after research showed that psychiatric diagnoses differed
markedly between Europe and the US. The establishment of these criteria
was also an attempt to facilitate the pharmaceutical regulatory process.
The criteria adopted for many of the mental disorders were taken from the
Research Diagnostic Criteria (RDC) and Feighner Criteria, which had just
been developed by a group of research-orientated psychiatrists based
primarily at Washington University in St. Louis and the New York State
Psychiatric Institute.
The first draft of the DSM-III was prepared within a year. Many new
categories of disorders were introduced; a number of the unpublished
documents that aim to justify them have recently come to light.
Finally published in 1980, the DSM-III was 494 pages long and listed 265
diagnostic categories. It rapidly came into widespread international use by
multiple stakeholders and has been termed a revolution or transformation
in psychiatry.
SELF-CHECK 9.6
1. Why is it important to understand psychopathology in human
behaviour?
The MMPI-2 is the most commonly used personality test by mental health
professionals to understand personality structure and to assess and diagnose
mental illness.
The MMPI-2 is also utilised in other fields outside of clinical psychology. The
test is often used in legal cases, including criminal defence and custody
disputes.
In May 2013, the latest version of the DSM, DSM-5, was published with a few
significant changes in the categorisation of mental disorders.
INTRODUCTION
In the previous topic, you learnt about the application of testing in clinical, health
and counselling settings. The standardised tests in certain fields were also
discussed.
In the last topic of this module, an overview of psychology test and measurement
will be given. You will also learn about the various issues and challenges related
to psychology test and measurement which include faking tests, test bias,
cultural bias in testing and legal and ethical issues. The future trends in testing
will be explained as well.
In discussing the main objectives and uses of psychological tests, Aiken (2000)
states that psychological tests are used today in the same way as in previous
years and centuries. They are used to assess behaviour, mental abilities and an
individual's characteristics in order to help in making decisions, predicting and
guiding. Specifically, he lists six uses of psychological tests, which include
research in general and the evaluation of programmes. These six uses are listed
in Figure 10.1.
Most importantly, test users must be able to evaluate the tests they intend to
use critically, scientifically and systematically, for example by judging whether
the tests are of high quality.
In order to obtain information about tests and several important issues related to
their usage and psychometric characteristics, test manuals and books that
discuss psychological tests are both important sources. The most important
book referred to by many test users is the Mental Measurements Yearbook edited
by O. K. Buros. It includes thousands of standardised tests that have been
evaluated by many experts. In addition, Buros also produced and edited four
other books, which are:
(a) Psychometrical;
On the one hand, the publication of new tests and the revision of existing tests
appear to be accelerating. A survey of psychological literature reveals that
psychologists are as enthusiastic about psychological tests as they have ever been
and nothing indicates that this interest will decline (Janda, 1998). Society, on the
other hand, appears to be increasingly sceptical about the widespread use of
tests.
Matarazzo (1992) observed that predictions are more often wrong than right
because no one can foresee the theoretical or technological innovations or the
changes in the social and political climate that influence the development of any
discipline.
Morganthau (1990) observed that many, if not most, Americans believe that tests
are "biased, mechanistic, dehumanising and inimical to learning" (p. 63) and that
they are used to control people for the benefit of those who use them. The
distrust of tests may result from the fact that many people experience them as
barriers that prevent them from attaining their educational, vocational or
professional goals.
(b) Consequences that may occur regardless of the claims of test developers.
For instance, the test developer may claim that a test of depression leads to more
effective therapy. In this case, evidence of improved therapy should be collected
as proof.
Test bias means that a test functions differently for different groups. Studying
test bias can be done using criterion-related validity methods. Do the tests
function in the same way for different groups, even if the groups vary in average
performance related to real differences in underlying traits?
Jensen (1980) stated that "Most current standardised tests of mental ability yield
unbiased measures for all native-born English-speaking segments of American
society today, regardless of their sex or their racial and social-class background.
The observed mean differences in test scores between various groups are
generally not an artefact of the tests themselves, but are attributable to factors
that are causally independent of the tests."
Reynolds (1994), on the other hand, argued that "Only since the mid-1970s has
considerable research been published regarding race bias in testing". For the
most part, this research has failed to support the test bias hypothesis, revealing
instead that:
(c) The content of the items in tests is about equally appropriate for all these
groups.
We will further examine the issues of test bias in the later section of this topic.
ACTIVITY 10.1
By doing additional reading, discuss with your coursemates and tutors
the following:
1. Do you yourself like to take psychology tests and measurement?
Share your reasons.
2. Debate critically whether psychology tests and measurement
bring more benefits or harms to our society.
The techniques used in reducing test faking are further explained as follows
(Changingminds.org, 2014):
If individuals have a high need for approval, they usually tend towards
positive "agree" and "yes" responses. This may be countered and detected
by reversing some questions (reversing also breaks up habituating patterns
of similar responses). This tendency towards seeking approval may also be
detected by including a "social desirability" scale within the questions to
separate them from habitual response forming questions.
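The idea of reversing some questions can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: a hypothetical four-item questionnaire answered on a 1 to 5 agreement scale, two of whose items (q2 and q4) are reverse-keyed. The item names, the scale range and the cut-off for counting an answer as "agree" are all invented for the example.

```python
def reverse_score(response, low=1, high=5):
    """Flip a response on a low..high agreement scale (5 becomes 1, and so on)."""
    return low + high - response


def score_scale(raw, reversed_items, low=1, high=5):
    """Total score after reverse-keyed items have been flipped."""
    return sum(
        reverse_score(value, low, high) if item in reversed_items else value
        for item, value in raw.items()
    )


def agree_rate(raw, agree_from=4):
    """Share of raw answers on the 'agree' side, before any reversal."""
    return sum(1 for value in raw.values() if value >= agree_from) / len(raw)


# Hypothetical respondent who strongly agrees with every statement,
# including the two reverse-keyed ones (q2 and q4).
raw_answers = {"q1": 5, "q2": 5, "q3": 5, "q4": 5}
reversed_items = {"q2", "q4"}

total = score_scale(raw_answers, reversed_items)  # 5 + 1 + 5 + 1 = 12
rate = agree_rate(raw_answers)                    # 1.0
```

Because the reverse-keyed items pull the total back towards the middle of the range, a respondent who simply agrees with everything no longer obtains an extreme score, and a very high raw agreement rate across items that point in opposite directions can be treated as a possible sign of habitual "yes" responding.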
Martin et al. (1995) show that test takers with a good understanding of job
needs can provide realistic faked responses. Ipsative methods still persist,
in particular, where sound alternatives are not available, for example, the
Zuckerman, Eysenck & Eysenck (1978) scale of sensation-seeking is still
used, despite the report by Ridgeway & Russell (1980) on unacceptably low
reliabilities for the various sub-scales.
SELF-CHECK 10.1
It is very common for people to try and make themselves look better than they
actually are on these questionnaires, especially if they know that they are being
evaluated.
This sort of faking can distort the predictive validity of these tests, with
significant negative economic consequences. We want to develop a measure that
can predict real-world performance even in the absence of completely honest
responses.
Using formulas derived by Frank Schmidt (University of Iowa) and John Hunter
(Michigan State University), the authors were able to estimate the potential
productivity gain associated with using the new measure in a workplace setting.
Since people differ widely in their individual abilities, even a small degree of
accuracy in testing can produce significant economic gains.
In the present study, the tests were accurate beyond that small degree. In fact,
Schmidt and Hunter's formulas indicated that using the bias-resistant test instead
of currently available personality assessment methods could result in a
productivity gain of 23 percent per hired employee when response faking is an
issue (about $17,000 per year for every $75,000 of salary).
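The dollar figure quoted above can be checked with simple arithmetic. The short Python sketch below only converts a percentage gain into a yearly amount for a given salary; it does not reproduce Schmidt and Hunter's formulas, and the function name is hypothetical.

```python
def annual_gain(salary, gain_percent):
    """Yearly productivity gain, in currency units, for a percentage gain."""
    return salary * gain_percent / 100


gain = annual_gain(salary=75_000, gain_percent=23)
print(gain)  # 17250.0
```

A 23 percent gain on a $75,000 salary works out to $17,250 a year, which is consistent with the rounded figure of $17,000 given in the text.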
Potential gains of this magnitude should not be ignored. It is very important that
the right people be chosen for any competitive position. This questionnaire is a
step in the right direction.
Details of the discussion on these related issues can be found in the article:
"This personality test cannot be faked" by Nauert (2008).
ACTIVITY 10.2
Discuss in tutorial class and in the myVLE forum what you have learnt
about test faking, especially in personality tests and measurement, after
reading the article "This personality test cannot be faked" as mentioned
in section 10.4.2.
Another way of saying this is that a biased test is one in which people from two
groups who have the same observed score do not have the same standing on the
trait of interest.
A third way of saying this is that using a test to predict some criterion of interest
results in a systematic over- or under-prediction based on group membership.
Example: racist performance appraisal opened a Pandora's box in the US and
Germany.
Test bias is said to occur when a test yields higher or lower scores on average
when it is administered to specific criterion groups such as people of a particular
race or gender than to an average population sample. Negative bias is said to
occur when the criterion group scores lower than average, while positive bias is
said to occur when the group scores higher. The crux of the issue then is: does
this occur because there is a real difference in the attribute being measured or is
this due to cultural test bias?
Models and their descriptions:

Mean difference
The most intuitive definition of bias is the observation of a mean
difference between groups. So, for example, if we saw that females
scored higher than males on the SAT (Scholastic Aptitude Test)
verbal test, we might suspect that the test was biased. However, the
mean difference by itself is a poor model of bias. This is
because a mean difference could demonstrate bias, but it could also
reflect a real difference between groups.
If you measure the height of a representative sample of adult males
and females in the US with a tape measure, you will find that males
are taller on average. Does this mean that the tape measure is
biased?
People differ in a lot of ways, so finding a mean difference between
groups does not necessarily mean that the test is biased. On the other
hand, finding no mean difference does not necessarily mean lack of
bias.
If you developed a new tape measure that showed no mean
difference between males and females in height, the new measure
would be biased, because there really is a difference. In essence, your
new measure would be adding inches to the height of females and
this is what we would define to be bias.

Equal regressions
The most widely accepted (but not the only) model of test bias is the
regression model (which is also known as the Cleary model). This
model places bias into the context of the interpretation of test scores
(that is, validity).
The model says that if different groups share the same regression
line, the test is not biased (even if there are differences in means
across groups). If the groups have different regression lines, then the
test is biased because it is measuring different things for different
groups.
The model says that people with the same test scores should do
equally well on some external criterion. For example, if the test is not
biased, then Black and White students with the same SAT score will get
the same freshman grade point average (GPA). On the other hand, if the
SAT is biased against Black students, then Black students with the same
SAT scores as White students will have higher freshman GPAs.
Source: http://luna.cas.usf.edu/~mbrannic/files/tnm/tstbias.htm
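The equal regressions idea can be sketched in Python as follows. This is a minimal illustration, not the procedure used in any cited study: the test scores, the criterion values (labelled here as first-year GPA) and the two groups are made-up data, and the helper functions are hypothetical. Each group's regression line is fitted by ordinary least squares, a single common line is fitted to both groups together, and the average prediction error of the common line is then computed for each group.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_residual(x, y, slope, intercept):
    """Average of (actual - predicted); positive means under-prediction."""
    return sum(yi - (intercept + slope * xi) for xi, yi in zip(x, y)) / len(x)


# Hypothetical test scores and criterion values (first-year GPA).
scores = [400, 500, 600, 700]
gpa_a = [2.6, 3.0, 3.4, 3.8]  # group A
gpa_b = [2.2, 2.6, 3.0, 3.4]  # group B: same slope, lower intercept

slope_a, intercept_a = fit_line(scores, gpa_a)
slope_b, intercept_b = fit_line(scores, gpa_b)

# A single common line fitted to both groups together.
common_slope, common_intercept = fit_line(scores + scores, gpa_a + gpa_b)

resid_a = mean_residual(scores, gpa_a, common_slope, common_intercept)
resid_b = mean_residual(scores, gpa_b, common_slope, common_intercept)
```

In this made-up data the two groups have the same slope but different intercepts, so the common line under-predicts the criterion for group A (a positive mean residual of about 0.2) and over-predicts it for group B (about -0.2). Under the regression model, that systematic over- or under-prediction by group is what counts as bias. A real analysis would also test whether the slope and intercept differences are statistically significant rather than simply comparing the fitted values.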
The consensus in I/O psychology and related fields (e.g., education, human
resource management) concerned with high-stakes testing is that, in the instances
when it exists, test bias is found regarding intercept differences between groups
in the form of over-prediction of scores for minority group members (i.e., smaller
intercept for the ethnic minority group compared to the majority group), but no
differences are found regarding slopes across groups (e.g., Cole, 1981; Houston &
Novick, 1987; Humphreys, 1986; Hunter, Schmidt, & Rauschenberger, 1984;
Kuncel & Sackett, 2007; Linn, 1978; Rotundo & Sackett, 1999; Rushton & Jensen,
2005; Sackett, Schmitt, Ellingson, & Kabin, 2001; Sackett & Wilk, 1994; Schmidt &
Hunter, 1981, 1998).
This conclusion has been reached regarding selection tools used in both work
and other organisational settings to assess a heterogeneous set of constructs
ranging from general mental abilities (GMA; e.g., Hartigan & Wigdor, 1989) to
personality (e.g., Cortina, Doherty, Schmitt, Kaufman, & Smith, 1992; Saad &
Sackett, 2002) and safety suitability (Te Nijenhuis & van der Flier, 2004).
Details of the discussion on these related issues can be found in the article:
"Revival of test bias research in pre-employment testing" by Aguinis, Culpepper
and Pierce (2010).
A biased test may be used fairly. Suppose that a test is biased such that males
score 10 points higher on average than do females. If we simply add 10 points to
the observed scores of the females and use that score for making decisions, the
biased test will prove to be fair (Aguinis, Culpepper and Pierce, 2010).
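The adjustment described above can be illustrated with a few lines of Python. The scores are invented, and the sketch assumes that the entire 10-point difference is due to bias (that is, that the two groups do not really differ on the trait) and that the size of the bias is known in advance.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Hypothetical observed scores; the test is assumed to favour males by 10 points.
male_scores = [70, 80, 90]
female_scores = [60, 70, 80]
KNOWN_BIAS = 10  # assumed size of the bias, treated as known

adjusted_female = [score + KNOWN_BIAS for score in female_scores]

gap_before = mean(male_scores) - mean(female_scores)   # 10.0
gap_after = mean(male_scores) - mean(adjusted_female)  # 0.0
```

Decisions based on the adjusted scores no longer favour either group, which is the sense in which a biased test can still be used fairly. If part of the observed difference reflected a real difference in the trait, the same correction would itself introduce bias.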
SELF-CHECK 10.2
The cultural influence in testing will be discussed from the following four aspects
shown in Figure 10.3.
Sattler (1988) states that cultural groups may have variations based on their
different values, language, views of life and death, roles of family members,
problem-solving strategies, attitudes toward education, mental health, mental
illness and stage of acculturation.
In other words, the test scores obtained may be due to complex social-
psychological factors potentially influenced by national history, predicaments of
race and many other factors.
10.6.2 Language
The field of psychology has also recognised that specialised practices may be
needed to achieve equitable testing with linguistic minorities. We could suggest
that a native language interpreter be used to facilitate the testing of examinees
whose first language is not English. However, testing specialists advise against
this practice because interpreters may substitute words, speak in a different
dialect or engage in subtle prompting that influences the examinees' responses
(Rogers, 1998). A well-trained psychologist would be preferable, but even this
practice is considered problematic by some (Figueroa, 1990).
The preferred option is to use tests translated into the examinees' native language
and normed on the relevant subpopulations. The translation of tests from
English into the intended language must include back translation, as
suggested by Brislin, Lonner and Thorndike (1973), in order to achieve
comparable meaning.
10.6.3 Behaviour
In addition to possible language barriers, test takers of different cultures may
exhibit a lack of familiarity about test taking that further adds to their
disadvantage.
Padilla and Medina (1996) made the following observations: "It is quite probable
that minority students are less familiar with standardised achievement testing
and thus less test wise than majority students, most of whom have been exposed
to standardised testing over an extended time."
Culture-fair tests are tests that pose problems that are equally familiar to all
cultures. According to Tan and Tan (1998), culture-fair tests are tests that reduce
cultural factors as much as possible. Examples of these tests are the Culture-Free
Self-Esteem Inventory by Coopersmith and the Culture-Fair Intelligence Test by
Cattell.
Details of the discussion on these issues can be found in the article: "Developing
a cross-cultural conceptual model for testing organisational commitment in the
UAE: A theoretical perspective" by Anwar, Chaker and Ferhat (2003).
According to Sun and Shi (2007), with the progress of economic globalisation,
more and more international enterprises have started to perform usability tests in
different cultures during the last decade. In China, only two or three years ago,
"usability" was quite a new word for most of its people. Presently, the situation
has changed dramatically. Many domestic enterprises have considered the
importance of usability tests for their products, especially for IT businesses.
(b) The second is finding users who can speak English. However both
professional moderators and English speakers are very rare in China and
they are young and probably come from a Western educated background.
Hence, there is no way to get real feedback from all kinds of users in China;
and
(c) The third and the most regular way is by using both remote and local
moderators to work together with Chinese users to ensure that they really
get the feedback from the right users and understand it.
Local moderators here mean those who have received training in human factors
or those who have at least one year of working experience in usability testing in
China. They usually cannot speak English very well. Remote moderators are
those who have received training in human factors and have at least one year of
experience in usability testing in foreign countries. They usually can speak
English and their local language very well.
The first thing we need to do is to identify the kinds of cultural factors that can
affect usability tests. The reason why language is picked as a factor to be
investigated is that language is a representation of culture and the language
situation among India, European countries and China is totally different.
Although English is not a native language for Indians and Danes, most
people in these two countries can speak English very well. However in China,
few people are proficient in English. Therefore, if conducting a usability test in
China, the first thing you have to do is to change the testing interface into
Chinese.
SELF-CHECK 10.3
ACTIVITY 10.3
After reading the two abstracts of the articles in section 10.7 (or the full
articles which can be found online), write short notes on how you can
relate both articles and get ideas from them in further understanding
cross-cultural issues in psychology test and measurement. Discuss your
findings in class and on the myVLE forum.
Legal challenges to the use of tests for decision-making in schools have focused
on ability tracking, placement in special education classes, test scores as school
admissions criteria, test disclosure and teacher competency.
In general, the application of specific laws to claims of inappropriate test use
is unclear; instead, cases have been decided based on their specific
circumstances. Cases illustrating legal challenges are described in
greater detail as follows (ERIC, 1985):
Court decisions have upheld these arguments to some extent. In Hobson vs.
Hansen (1967), it was ruled that the IQ tests used to track students were
culturally biased because they were based on a white, middle-class sample. It
was also ruled that these tests were inaccurate for lower-class and Black
students and the court abolished the tracking system used in the District of
Columbia. Later appeals allowed other forms of ability grouping, but would
not allow the use of tests that had racially discriminatory consequences.
The use of achievement tests instead of IQ tests may not be any more
appropriate. Moses vs. Washington Parish School Board (1971) involved the
use of both IQ and achievement tests. The IQ test scores were used for
special education placement; the achievement test scores were used for later
tracking. The case was also somewhat unique because it involved a recently
desegregated school. The courts ruled against test use for tracking under
these circumstances.
In later appeals, test validity became an important issue and the court set
standards for validity: the same pattern of scores must appear in different
subgroups, the mean score should be the same for different subgroups and
the results should correlate with relevant criterion measures. Though
experts argued that these standards were not psychometrically sound, the
court found that the racial differences in test scores were due to cultural
biases in the tests.
The court decided that the NTE were valid for these purposes, because
scores reflected presence or absence of knowledge. There was no intent to
discriminate, and an ETS validity study indicated that they were in
compliance with Title VII of the Civil Rights Act of 1964. Opponents of this
type of test continue to argue that certification should be based on a
performance test, rather than a paper-and-pencil test.
In hiring potential employees, resumes and interviews are helpful, but pre-
employment testing is the only way to really verify a candidate's qualifications
and abilities. The problem is that pre-employment testing is subject to strict legal
restrictions and if you do not know what they are, you could find yourself in
difficulty.
Source: http://www.gaebler.com/Employment-Testing-Legal-Issues.htm
In conclusion, as can be seen from the cases and scenarios described previously
in this section for educational and entrepreneurial settings, many legal issues are
involved when tests are used as a mechanism for social control. In general, the
issues revolve around the validity of the test for a specific use. However, specific
legal decisions depend on "the particular circumstances surrounding a given
case, the evidence brought to bear in the case, and the opinion of the judge and
jury involved" (Nitko, 1983).
ACTIVITY 10.4
SELF-CHECK 10.4
When an inaccurate test is used, the evaluation and description of individuals
may also be inaccurate. Therefore, there must be a body that governs and
provides guidelines and standards that can be used by anyone with the intention
of using psychological tests.
Several documents that are used as guidelines for ethics in test usage are:
First, professional issues must be taken into consideration. One important aspect
that should be focused on is the competence of test purchasers. There is potential
for harm if the tests fall into the wrong hands.
The APA proposed that tests can be categorised into three levels of complexity
that require different degrees of expertise from the examiner as shown in
Table 10.3.
Level Description
Level A Requires minimal training.
Test administration involves reading simple directions.
Covers tests for educational achievement and job proficiency.
Level B Requires some knowledge of the technical characteristics of tests.
Covers tests such as group-administered mental ability and interest
inventories.
Also requires knowledge of test construction and training in statistics
and psychology.
Level C Requires advanced training in test theory and relevant content areas.
Also requires substantial understanding of testing and supporting
topics.
Requires a minimum of a master's degree in psychology.
Covers individually administered intelligence tests and personality tests.
SELF-CHECK 10.5
1. We need to have a body that governs the activities of
psychological testing. Explain why this is so.
2. Explain the ethical issues that are vital in using psychology tests.
What was most remarkable about the result was the diversity of opinions.
Forty-seven percent of the respondents believed that the market for
psychological testing was shrinking, while 22 percent saw the market as
growing. The remaining 31 percent had not seen a change. Yet when these
psychologists were asked whether they had experienced a growth in testing
opportunities in their own practice, 42 percent answered affirmatively.
One recurring theme among the most pessimistic psychologists was that
insurance and managed care had drastically reduced payments. A North
Carolina psychologist stated, „Managed care and insurance reimbursement
have been significantly cut. Testing such as the MMPI, which used to be a
regular part of the intake, is not done at all now.‰ Others lamented that
testing is being less emphasised in graduate training.
On the technical side, it is said that there is a new psychological test called
brain mapping. Neuropsychology, clinical neuroscience and biohypnosis
are the new testing methods for family law issues. They provide brain
mapping images that can give insight into the capacities and mental health of
parents, teens and children. Hypnoanalysis is being reinvented because of
the validation of interviewing a person under hypnosis as shown by brain
mapping studies.
ACTIVITY 10.5
After reading Section 10.9, discuss in class and in the forum the future
of psychological tests and measurement in Malaysia, comparing possible
situations here with the situation in the US as described.
SELF-CHECK 10.6
Psychological tests are used in various fields and settings such as clinics,
hospitals, organisations, industries, businesses, schools and universities. They
are also used in private services, government services and in research and
counselling.
Test bias is said to occur when a test yields higher or lower scores on average
when it is administered to specific criterion groups such as people of a
particular race or gender than to an average population sample.
The cultural influence in testing can be discussed from four aspects: the
influence of different cultural backgrounds in using tests, language of tests,
test taking behaviour, and culture-free and culture-fair tests.
Psychologists and test users must be guided by a body that governs and
provides guidelines and standards.
Various issues have been raised regarding the future of psychological tests
and measurement, for example payment trends and the demand for
psychological testing.
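The definition of test bias in the summary above compares the average score of a specific group with the average score of a general population sample. A minimal sketch of that comparison follows, assuming made-up scores; the function name mean_score_gap and both score lists are hypothetical and are not taken from any real test.

```python
from statistics import mean


def mean_score_gap(group_scores, sample_scores):
    """Return the group's mean score minus the comparison sample's mean score."""
    return mean(group_scores) - mean(sample_scores)


# Hypothetical scores, invented purely for illustration.
general_sample = [100, 95, 105, 110, 90]
specific_group = [92, 88, 96, 90, 94]

gap = mean_score_gap(specific_group, general_sample)
print(gap)  # a negative value means the group scored lower on average
```

A gap of this kind is only the starting point described in the definition: it shows that the test yields lower or higher scores on average for the group than for the comparison sample.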
Aiken, L. R. (2000). Psychological testing and assessment (11th ed.). Boston, MA:
Allyn & Bacon.
Buros, O. K. (1970). Personality: Tests and reviews. Highland Park, NJ: Gryphon
Press.
Buros, O. K. (1975). Intelligence: Tests and reviews. Highland Park, NJ: Gryphon
Press.
Buros, O. K., & Buros Institute of Mental Measurements. (1961). Tests in print.
Highland Park, NJ: Gryphon Press.
Ekman, P. (1985). Telling lies: Clues to deceit in the marketplace, politics, and
marriage. New York: Norton.
Nauert, R. (2008). This personality test cannot be faked – Psych Central News.
Psych Central.com. Retrieved from
http://psychcentral.com/news/2008/10/08/this-personality-test-cannot-
be-faked/3088.html
Rich, J. (2007). Psychological testing: Old specialty, new markets. The National
Psychologist. Retrieved from
http://nationalpsychologist.com/2007/07/psychological-testing-old-
specialty-new-markets/10933.html
Sattler, J. M. (1988). Assessment of children (3rd ed.). San Diego, CA: Sattler.
Sun, X., & Shi, Q. (2007). Language issues in cross cultural usability testing: A
pilot study in China. Retrieved from
http://culturalusability.cbs.dk/downloads/HCI%202007/sunxianghong.
pdf
Tan, U., & Tan, M. (1998). Curvelinear correlations between total testosterone
levels and fluid intelligence in men and women. International Journal of
Neuroscience, 95, 77–83.